Big data emerged in 2012 as a major pain point for many organisations. Fed by Cloud, mobility and social networking trends, it has had everyone scrambling to find ways to manage everything that comes in. ARN sat down with some of the dominant players in the space to talk through some of the issues behind big data.
Matthew Sainsbury (MS): It’s safe to say that big data has become a really big topic over the past couple of months. But what are some of the issues that have caused big data to become such a topic of interest?
Sean Kopelke (SK): The interesting part is in trying to get some agreement at the table about what the big data space is. You have a lot of technology players out there presenting solutions for big data without clarifying exactly what that actually means to the business. We need to understand what the challenge is around big data and from the Symantec point of view we see it from a number of different areas. It’s around having these mass volumes of information that are worth keeping, but a lot of organisations struggle over how to manage that sheer volume of data and make use of it.
Paul Harapin (PH): Customers at the moment are faced with masses of data flowing in from all different parts of their business. Part of the problem is analysing that, but there is also an opportunity for competitive advantage. We have two scenarios with our customers at the moment. One is a manufacturer that spends a lot of its marketing dollars trawling the Twitter feeds globally: every tweet, every day, globally. It analyses that data based on various parameters around individual brand names, hits on terms, positive and negative, and tries to distil marketing strategies from that. I also spoke to some utilities in Japan a few weeks ago which went from collecting meter data once every three months to smart meters collecting it every minute, and the massive amounts of data there provide huge opportunities. From our perspective, the opportunities lie in analysing that data and helping customers develop new apps that take advantage of these new ways of analysing it, so their lives are easier.
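The brand-tracking scenario above boils down to counting mentions plus positive and negative term hits per brand. A minimal sketch, assuming tweets arrive as plain strings and brand names are single words; the word lists and the `tally_brand_sentiment` function are hypothetical, and real systems would use far richer sentiment models.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "broken", "awful", "refund"}


def tally_brand_sentiment(tweets, brands):
    """Count mentions, positive-term hits and negative-term hits per brand.

    Assumes each brand name is a single alphabetic word.
    """
    stats = {brand: Counter() for brand in brands}
    for text in tweets:
        # Lower-case and split into a set of words.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for brand in brands:
            if brand.lower() in words:
                stats[brand]["mentions"] += 1
                stats[brand]["positive"] += len(words & POSITIVE)
                stats[brand]["negative"] += len(words & NEGATIVE)
    return stats


if __name__ == "__main__":
    sample = ["I love my Acme blender", "Acme kettle arrived broken, want a refund"]
    print(tally_brand_sentiment(sample, ["Acme"]))
```

Keeping the tallies per brand makes it easy to compare brands side by side, which is the kind of signal the marketing team in the example is trying to distil.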
Craig Baty (CB): We’re used to talking about business intelligence, which is often likened to big data, but we see them as two very different things. Business intelligence is all about structured data. You know where the data is. You know what it looks like. It has been designed for a certain purpose, and the business intelligence tools take that data and produce reports. It’s normally fairly static, and business intelligence describes what has happened. Sometimes, based on that, you can predict something you want to do, but generally the data is old. Big data comes from lots of sources whose structure you may not actually know, especially if you go into the Cloud and access databases that aren’t yours. Then, when you get the data, you need to be able to process it in real time. It’s unstructured, and it should enable you not only to predict, but also to get the information rapidly enough that you can prescribe an action.
SK: I was reading something in the US recently. There is a very large department store chain over there and they collect every piece of information they can on someone. They’ve got the department store cards, what you buy, where you buy, the type of things you buy, and they got pulled up after a mailing campaign sent a letter to someone saying, “congratulations, you’re pregnant”. The woman in question hadn’t actually told anybody that she was, but the department store had predicted, based on her buying habits and what she had been doing over the last six weeks, that she was pregnant, and they marketed stuff to her.
That’s a great example of taking a massive amount of information, analysing it in real time and making a strategic decision from a marketing point of view. In that particular scenario, it may not have generated the best publicity for them, but there is a great opportunity for businesses if they can predict what is actually happening out there through the behaviour of people. I’m finding a lot of customers are talking about how they don’t know what they can potentially do with the information, so they’re just going to hold on to it all. Then, as technology gets better and more solutions emerge to help them through it, they’ll be able to make some meaningful decisions out of that.
CB: The bit you mentioned about privacy has to come up as a discussion. Governments, taskforces and panels are all looking at this. Two or three weeks ago I was on a panel in Camden that was all about the Patriot Act, privacy and security, and Australians are actually quite concerned about all of that. So when we talk about big data, it’s just another thing that scares people about Cloud, and another thing that will draw this whole conversation to a head. The government or someone needs to take action about what you can and can’t do with the data.
Peter Prowse (PP): How do we gather the data, and then what do we do with it once we’ve got it? The organisations we’re speaking to are literally struggling under a blizzard of information, which they don’t know how to parse at the moment. They don’t know whether it’s worthwhile or whether it’s useless. Go back and think about the introduction of RFID, which was a technology that was going to revolutionise the world five years ago. That’s fifteen generations ago in IT terms, and the hype was that we were going to be able to capture all the information on shoppers as they left the supermarket or as they travelled through an airport. It died on the vine because there was no real business value that could be extracted from it. The challenge with big data, or any volume of data coming in unstructured like this, is that unless you’ve got the tools to truly analyse it and extract value from it, most businesses will just store it, and it will sit somewhere in their operation for years to come, costing them a lot of money. So the key thing is not really getting the data. It’s what you do with it once you’ve got it. That’s why you’re seeing the rise of tools like Hadoop, as an example, that provide some smart analytics and start extracting some of that business value.
For organisations like us, it’s about being able to assist customers in efficiently storing that data and being able to manage it through its lifecycle. How do you actually tier and archive your data, so that you’re moving a lot of this stuff into cheaper modes of storage?
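The tiering question can be illustrated with a simple age-based policy. A minimal sketch, assuming each item records its current tier and last-access date; the tier names, the 30- and 365-day boundaries and the `choose_tier` and `plan_moves` helpers are hypothetical, not any vendor’s product.

```python
from datetime import date, timedelta

# Hypothetical tier boundaries, checked in order (youngest first).
TIERS = [
    (timedelta(days=30), "primary"),    # recently used: fast, expensive storage
    (timedelta(days=365), "nearline"),  # occasionally used: cheaper disk
]
ARCHIVE = "archive"                     # anything older: cheapest media


def choose_tier(last_accessed, today):
    """Return the target tier based on time since last access."""
    age = today - last_accessed
    for max_age, tier in TIERS:
        if age < max_age:
            return tier
    return ARCHIVE


def plan_moves(items, today):
    """Given {name: (current_tier, last_accessed)}, return the moves needed
    as {name: (from_tier, to_tier)}; items already in place are skipped."""
    moves = {}
    for name, (current, last_accessed) in items.items():
        target = choose_tier(last_accessed, today)
        if target != current:
            moves[name] = (current, target)
    return moves


if __name__ == "__main__":
    inventory = {
        "logs-2011": ("primary", date(2011, 1, 1)),
        "orders-may": ("primary", date(2012, 5, 30)),
    }
    print(plan_moves(inventory, date(2012, 6, 1)))
```

Real lifecycle policies would also weigh retention rules, legal holds and retrieval cost, not just age since last access.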
PH: As I alluded to, it’s not just an issue of how you analyse the data. It’s a question of how you make that data instantaneously relevant to the business applications in the hands of your team. That’s part of why VMware has been focusing on the developer side. We’re looking at how we can help customers develop new applications knowing that this data capability is there, regardless of what tools they happen to use. The applications are being modernised and used in a way that says, “Hey, if somebody is making a transaction on a credit card, it’s not really helpful if it takes me a couple of hours to analyse their past buying patterns, because they’ve already taken the goods and they’re out of the shop.”
Louis Tague (LT): So where previously customers could be without data for a day or two, it’s now down to hours, minutes, seconds. That is creating a lot of complexity for many of our customers in terms of delivering that kind of availability back to their business. There are lots of solutions in terms of the customer being able to do that. It’s also driving virtualisation and Cloud discussions as well. For all of us, it’s an exciting time as they grapple with this massive data that they’re now having to manage.
Ronnie Altit (RA): I’m going to be a little bit contentious. What we’ve been talking about so far is how to extract value from the datasets we’re now capturing. The reality is that it’s no different to what it was five years ago. It’s just that the switch has now almost been flicked in people’s minds, and they realise there is a lot more they can do. Business intelligence systems have been around for many, many years. That’s not a novel concept. What is actually creating the concept of big data is social media. That, to me, is what is changing things. In reality, businesses are getting busier, so they are creating more data.
As a relatively new, young business, we leverage social media as much as we can, including LinkedIn, Twitter and Facebook, to get the message out, and the power of it is quite phenomenal. I know this isn’t a social media discussion, but here’s an example of what you can do with that information. We started to create a consolidated LinkedIn profile, and in our small 17-person organisation we made an edict that nobody posted anything about the business except through a central channel. We posted through our central LinkedIn channel, and we got 10,000 hits in a week once we got that going. That, to me, is what is important. That dataset is out there. It’s in the Cloud. I don’t really mind what’s going on with it; my point is that the nature of the data we’ve got hasn’t changed. It’s just growing. What we’re trying to do with it hasn’t changed either, and this fundamental issue still exists: what do I need to keep, and for how long? That was a question even in the years before big data was a topic. How many conversations have all of us had with end users about how long they need to keep their email?
PP: The point I’d make in return is that it reminds me a lot of the monitoring and management conversation. Just about every CIO I know has an unopened copy of HP OpenView sitting on their credenza somewhere. They go and buy all these tools, and they can’t get them to work in their environment for whatever reason. What you’re seeing now, from a monitoring and management perspective, is lots of niche organisations popping up to fill that gap. Similarly, in the data explosion world, you’re seeing niche players like Ruby on Rails because the traditional players cannot cope with the way the data is actually coming in at the moment. When it came into the database it was all pretty neat, and Teradata, for instance, could crunch away quite nicely. When it’s coming in from video, Twitter and Facebook, you need to be able to respond differently.
There is a well-known Australian organisation, which will remain nameless, that had some critical comments made about it on Facebook. Its response was to shut down its Facebook page and then its Twitter feed, so the people who were being critical of the organisation set up a new Twitter feed and a new Facebook page. The organisation wasn’t able to parse that information in real time and work out its best response. It probably has business intelligence tools out the wazoo, but it couldn’t use them in real time. It took weeks to respond, and then it made the wrong response when it got there.
Cam Wayland (CW): It comes back to understanding that it’s not only just the volume of data that’s coming in from everywhere whether it’s structured or unstructured. It’s the velocity that the data is coming in at and also the speed of business. Business can’t sit still now. If somebody is flaming you on Facebook you need to understand what has caused that, where it’s going on, and how to do something about it that’s not going to end up coming back and essentially biting you on the backside.
A lot of businesses are forced into social media because of that. They’re collecting the data, but they’re only looking at one portion of what it means for their business because they can’t analyse it, and therefore they’re making mistakes. Traditional analytics have been around for ages. That’s the relatively structured and easy way to deal with data, because it’s not necessarily at light speed. It’s when you mash it all together that organisations really struggle.
Luke McLean (LM): For me the key issue is the data management strategy. I know we can capture the whole lot, but we need to be careful about how we do this and work out the tiers, which is similar to Ronnie’s point, and I’ll put it in a similar vein to Cloud. Cloud has been around for a very, very long time, and if you look at multi-tenant datacentres and people accessing them, it’s now a lot richer. There is a lot more information. It’s the same with big data: it has been around for a very long time too, but now there is an influx of new information and storage, and how we tier and manage that data takes a combination of business consultancy, process and clever technology, because petabytes and exabytes of data are a lot for anyone to get across. We’re capturing the future, so we need to retain that data and work out a clever way to manage it and access it in the future.
CB: The conversation so far has been around social media farming. Three billion videos are uploaded on YouTube every day and 30 million extra Facebooks, so there is a lot there if someone wants to get hold of it. We’ve talked about enterprise use, but I see big data and the analysis of that data having more of an application in everyday life. Fujitsu has a concept called the Human Centric Intelligent Society, or human-centric computing. It sounds like a cliché, but it is the overall Japanese philosophy of making life easier for people. We’ve invested a lot in developing high performance computing, and we have the world’s fastest supercomputer. It’s almost four times faster than its nearest competitor, and this week we launched a social media campaign all about “what would you do with the world’s fastest supercomputer?” Somebody might say “go and invent another supercomputer” or “find a cure for cancer”, but the way we see big data fitting into that is that you need a lot of processing power, especially when you take it down to the individual.
One in four Australians and one in three New Zealanders have asthma. There’s an application that has been developed in the US by a company called Asthma Pulse. It has come up with a means of putting an RFID tag or wireless device on a Ventolin inhaler, and it collects a database of asthma medication use in real time. If the system senses that 5000 people have used their puffer in the last 10 minutes in a certain area of the city, the company can alert a doctor, who can get something on the radio saying there is something wrong with the air there, which in the end could save billions of dollars in healthcare. If you can keep somebody from having an asthma attack, you’ve kept them out of the medical system. That requires real-time processing.
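The puffer alert described above is essentially a sliding-window count with a threshold, kept separately for each area. A minimal sketch, assuming each inhaler use arrives as a (timestamp in seconds, area) pair in time order; the 5000-use threshold and 10-minute window come from the example, while the `PufferSpikeDetector` class and its method names are hypothetical.

```python
from collections import defaultdict, deque


class PufferSpikeDetector:
    """Hypothetical detector: flags an area when the number of inhaler uses
    inside a sliding time window reaches a threshold."""

    def __init__(self, threshold=5000, window_seconds=600):
        self.threshold = threshold
        self.window = window_seconds
        # One queue of event timestamps per area, oldest first.
        self.events = defaultdict(deque)

    def record(self, timestamp, area):
        """Record one inhaler use and return True if the area's count in
        the window (timestamp - window, timestamp] has reached the threshold.
        Assumes events for an area arrive in non-decreasing time order."""
        q = self.events[area]
        q.append(timestamp)
        # Drop events at or before the start of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) >= self.threshold


if __name__ == "__main__":
    detector = PufferSpikeDetector(threshold=3, window_seconds=600)
    for t in (0, 100, 200):
        print(t, detector.record(t, "inner-west"))
```

Each event is appended and expired at most once, so the cost per event stays constant on average; a production system would also need to handle late or out-of-order readings.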
PH: People have things on their running shoes now when they’re exercising, and they’re collecting all that data. So you start to aggregate it and see that this person is this age, they’re running in this environment under these conditions and their heart rate is here. If you aggregate that globally, you can do some medical analysis to predict whether they’re going to have health problems.
LT: So who owns that data, Paul?
PH: Well that’s a whole separate discussion, right? That’s the privacy side.
LT: But I think that’s the question of big data. It’s not just in an organisation’s datacentre anymore. It’s spread all over the place, across Cloud providers and everywhere else. The real issue is security: who owns it and who has access to it.
CB: We have an RFID system with Boeing that keeps track of every nut, bolt and piece in an airplane. There are more than a million parts, and if one of those isn’t in the right place it could cause damage, so the system keeps track of where they are and helps them do cabin inspections in one-tenth of the time because they know what’s missing. If you collected all of that from all of the planes in the world, in the airline industry alone, how much data would you get? And then you start doing it with cars, with utilities and so on.
SK: And there is big business value there. Say you’ve got your running shoes on and your band on, and you own that data because you collect that information. Now what is the value of that? That would be great information if you wanted to share it with your life insurance company. I want life insurance, but I don’t want to pay $500 a month like everyone else. I want to pay $200 because I jog every day. I walk. I do this, and this is my fitness, so why am I paying the same as everyone else? That could be a good business opportunity, though the reverse applies if you’re unhealthy.
PP: I’m going to challenge that, Sean. I’m going to challenge you on that data, because at the moment most people collect it from their mobile phone or some form of device around their arm, then they plug that into a computer and the data goes into a Cloud. So that data is then not residing on your physical property.
I come back to this flood of information. What will happen, in my view, is that organisations will be inundated with so much data from so many sources that they will physically not be able to cope. They can’t cope now. As the very smart marketing people decide they need to capture all of these Twitter feeds and Facebook information, at some point they’re just going to give up and stop capturing the data. That is exactly what happened on the monitoring and management front, where they turned off all of the monitoring tools because they couldn’t cope with the inundation.
RA: Do you think it’s that they couldn’t cope with the inundation or do you think that at some point they woke up and realised that a lot of the stuff they were capturing didn’t matter?
PH: Or it might not have been relevant. What was relevant was the root cause, and we’re seeing interest in that now. We’re now analysing it on a predictive basis, so we’re not waiting for something to happen and then saying “hey, this broke down, fix it”. We’re saying that, based on the patterns of your infrastructure over the past X years and months, we predict you will have a problem here and you’ll have a failure or an outage, so why don’t you go and replace the item or do something so that it doesn’t happen in the first place. You take it up to the next level and say, “well, I’m a company and I’m collecting data on my customers, and what I want is competitive advantage, so how do I take that data and provide you a better service or offer you the right product when you walk into the branch, based on questions you’ve asked before?” Privacy issues aside, companies are looking at this primarily from the perspective of how they can provide a better service to their customers, customised to what you want rather than mass-dumping what they think you might like, and you’ll see more of that through cable TV, ads targeting you based on what you’re watching, and so on.
LM: The Cloud is a great leveller, so if you are a smaller company you can turn on your Cloud, have on-demand storage and compute for whichever initiative or campaign you’re running, and then shut it down, as opposed to a large enterprise where you have all this infrastructure and you’ve maintained a whole bunch of information that’s hard to get through.
CB: The SMBs are also more agile and flexible.
PH: It also shows why the application side is important, with the Cloud levelling the playing field in terms of access to computing resources on an as-needed basis. The smaller organisation can put its money not into infrastructure but into the application side, and analyse or run its business, be it proactive data analysis or whatever it is, in ways it has never been able to before, simply because of the cost structures.
LM: And you might have vertical brokers all starting to come up and say “I’m in healthcare and I’m going to consolidate all this information and for a fee per month through Cloud you can access particular parts to get your projects done.”
CW: So is that creating an opportunity for the channel to develop those applications, or take them out to customers, so that it’s not just about the physical infrastructure or even the virtual infrastructure? I’ve got the data. I can outsource it to a Cloud or whatever. I don’t need to invest in that. Now what do I actually do with it? Somebody has to actually show them that opportunity, because I suspect a lot of customers haven’t even thought about it.
RA: What’s interesting there is that one of the things we’re good at in IT is giving things names. We’ve now given a name to Cloud, which isn’t anything new, and we weren’t even happy with that. We thought up Cloud, private Cloud and all the different types of Clouds. Now we’ve got big data. We’re good at coining phrases for things that already exist. Where the disconnect lies is in “just because we can doesn’t mean we should”. With all of this data out there right now, a lot of which could potentially be very beneficial, what I don’t see happening at the business level is clients and customers analysing the true value of the information they could potentially extract.