
Monday, December 21, 2015

How INOVVO Delivers Analysis that Leads to Greater User Retention and Loyalty for Mobile Operators

Transcript of a discussion on how advanced analytics drawing on multiple data sources provides wireless operators improved interactions with their subscribers and enhances customer experience through personalized insights.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HPE Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big-data case study discussion examines how INOVVO delivers impactful analytical services to mobile operators to help them engender improved end-user loyalty.

We'll see how advanced analytics, drawing on multiple data sources, enables INOVVO’s mobile carrier customers to provide mobile users with faster, more reliable, and relevant services.

To learn more about how INOVVO uses big data to make major impacts on mobile services, please join me in welcoming Joseph Khalil, President and CEO of INOVVO in Reston, Virginia. Welcome, Joseph.
Joseph Khalil: Thank you, Dana. I'm glad to be here.

Gardner: User experience and quality of service are so essential nowadays. What has been the challenge for you to gain an integrated and comprehensive view of subscribers and networks that they're on in order to uphold that expectation for user experience and quality?

Khalil: As you mentioned in your intro, we cater to the mobile telco industry. Our customers are mobile operators who have customers in North America, Europe, and the Asia-Pacific region. There are a lot of privacy concerns when you start talking about customer data, and we're very sensitive to that.

The challenge is to handle the tremendous volume of data generated by the wireless networks and still adhere to all privacy guidelines. This means we have to deploy our solutions within the firewalls of network operators. This is a big-data solution, and as you know, big data requires a lot of hardware and a big infrastructure.

So our challenge is how we can deploy big data with a small hardware footprint and high storage capacity and performance. That’s what we’ve been working on over the last few years. We have a very compelling offer that we've been delivering to our customers for the past five years. We're leveraging HPE Vertica for our storage technology, and it has allowed us to meet very stringent deployment requirements. HPE has been and still is a great technology partner for us.

Gardner: Tell us a little bit more about how you do that in terms of gathering that data, making sure that you adhere to privacy concerns, and at the same time, because velocity, as we know, is so important, quickly deliver analytics back. How does that work?

User experience

Khalil: We deal with a large number of records that are generated daily within the network. This is data coming from deep packet inspection probes. Almost every operator we talk to has them deployed, because they want to understand the user experience on their networks.

These probes capture large volumes of clickstream data. Then, they relay it to us in near real-time fashion. This is the velocity component. We leverage open-source technologies that we adapted to our needs that allow us to deal with the influx of streaming data.

We're now in discussion with HPE about their Kafka offering, which deals with streaming data and scalability issues and seems to complement our current solution and enhance our ability to deal with the velocity and volume issues. Then, our challenge is not just dealing with the data velocity, but also how to access the data and render reports in a few seconds.
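
To make that concrete, here is a minimal sketch of what such a streaming pipeline could look like: a Kafka consumer micro-batching clickstream records into Vertica with bulk COPY. The topic, table, record fields, and connection details are hypothetical, and the kafka-python and vertica-python client libraries are assumed; this is an illustration, not INOVVO's actual implementation.

```python
# Hedged sketch: micro-batch clickstream records from Kafka into Vertica.
# Topic, table, schema, and credentials are hypothetical placeholders.
import json
from kafka import KafkaConsumer     # pip install kafka-python
import vertica_python               # pip install vertica-python

conn_info = {"host": "vertica.example.net", "port": 5433,
             "user": "loader", "password": "change-me",
             "database": "analytics"}

consumer = KafkaConsumer(
    "clickstream",                                 # hypothetical topic name
    bootstrap_servers=["kafka.example.net:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

BATCH_SIZE = 10_000
rows = []

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    for msg in consumer:                           # consumes indefinitely
        r = msg.value
        rows.append((r["subscriber_id"], r["ts"], r["url"], r["bytes"]))
        if len(rows) >= BATCH_SIZE:
            # Bulk COPY is far faster in Vertica than row-by-row INSERTs.
            payload = "\n".join("|".join(str(c) for c in row) for row in rows)
            cur.copy("COPY clickstream_raw FROM STDIN DELIMITER '|'", payload)
            conn.commit()
            rows.clear()
```

Micro-batching is the design choice that reconciles the velocity requirement with Vertica's preference for bulk loads over single-row inserts.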

One of our offerings is a care product that's used by care organizations. They want to know what their customers did in the last hour on the network. So there's a near real-time urgency to have this data streamed, loaded, processed, and available for reporting. That's what our platform offers.

Gardner: Joseph, given that you're global in nature and that there are so many distribution points for the gathering of data, do you bring this all into a single data center? Do you use cloud or other on-demand elements? How do you manage the centralization of that data?

Khalil: We don’t have cloud deployments to date, even though our technology allows for it. We could deploy our software in the cloud, but again, due to privacy concerns with customers' data, we end up deploying our solutions in-network within the operators’ firewalls.

One of the big advantages of our solution is that we can choose to host it locally on customers’ premises. We typically store data for up to 13 months. So our customers can go and see the performance of everything that’s happened on the network for the last 13 months.

We store the data at different levels -- hourly, daily, weekly, monthly -- but to answer your question, we deploy on-site, and that’s where all the data is centralized.
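
As an illustration of that multi-level storage, the sketch below rolls hourly records up into a daily table and prunes anything older than 13 months. Only the hourly/daily granularity and the 13-month window come from the discussion; the schema, job logic, and connection details are invented for the example.

```python
# Hedged sketch: maintain daily rollups and a 13-month retention window
# in Vertica. Table and column names are hypothetical.
import vertica_python

ROLLUP_SQL = """
INSERT INTO usage_daily (subscriber_id, day, total_bytes, sessions)
SELECT subscriber_id,
       DATE_TRUNC('day', event_hour) AS day,
       SUM(total_bytes),
       SUM(sessions)
FROM usage_hourly
WHERE event_hour >= DATE_TRUNC('day', NOW()) - INTERVAL '1 day'
  AND event_hour <  DATE_TRUNC('day', NOW())
GROUP BY 1, 2
"""

PRUNE_SQL = """
DELETE FROM usage_hourly
WHERE event_hour < NOW() - INTERVAL '13 months'
"""

with vertica_python.connect(host="vertica.example.net", port=5433,
                            user="etl", password="change-me",
                            database="analytics") as conn:
    cur = conn.cursor()
    cur.execute(ROLLUP_SQL)   # fold yesterday's hourly rows into daily rows
    cur.execute(PRUNE_SQL)    # enforce the 13-month retention window
    conn.commit()
```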

Gardner: Let’s look at why this is so important to your customer, the mobile carrier, the mobile operator. What is it that helps their business and benefits their business by having this data and having that speed of analysis?

Customer care

Khalil: Our customer care module, the Subscriber Analytix Care, is used by care agents. These are the individuals that respond to 611 calls from customers complaining about issues with their devices, coverage, or whatever the case may be.

When they're on the phone with a customer and they put in a phone number to investigate, they want to be able to get the report to render in under five seconds. They don’t want to have the customer waiting while the tool is churning trying to retrieve the care dashboard. They want to hit "go," and have the information come on their screen. They want to be able to quickly determine if there's an issue or not. Is there a network issue, is it a device issue, whatever the case may be?

So we give them that speed and simplicity, because the data we're collecting is very complex, and we take all the complexity away. We have our own proprietary data analysis and modeling techniques, and it happens on-the-fly as the data is going through the system. So when the care agent loads that screen, it's right there at a glance. They can quickly determine what may be impacting the customer.
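
A hedged sketch of the kind of per-subscriber lookup such a care dashboard might issue: because the heavy modeling has already happened at load time, the interactive query reduces to a keyed scan of a pre-aggregated table. The table, columns, and metrics here are hypothetical, not INOVVO's schema.

```python
# Hedged sketch: a fast care lookup against a pre-aggregated Vertica table.
# msisdn (the phone number) is the lookup key; the schema is invented.
import vertica_python

CARE_QUERY = """
SELECT hour, dropped_sessions, avg_throughput_kbps, device_alerts
FROM care_subscriber_hourly
WHERE msisdn = :msisdn
  AND hour >= NOW() - INTERVAL '24 hours'
ORDER BY hour DESC
"""

def care_dashboard(cursor, msisdn):
    """Fetch the last 24 hours of pre-aggregated stats for one subscriber."""
    cursor.execute(CARE_QUERY, {"msisdn": msisdn})
    return cursor.fetchall()
```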

Our care module has been demonstrated to reduce the average call handle time, the time care personnel spend with the customer on the phone. For big operators, you could imagine how many calls they get every day. Shaving a few minutes off each call can amount to a lot of savings in terms of dollars for them.

Gardner: So in a sense, there's a force-multiplier in having this analysis. Not only do you head off problems and fix them before they become evident, which makes for a better user experience and happier customers who stay on the network. But then, when there are problems, you can empower the people who are solving them, who are dealing with that customer directly, to have the right information in hand.

Khalil: Exactly. They have everything. We give them all the tools that are available to them to quickly determine on the fly how to resolve the issue that the customer is having. That’s why speed is very important for a module like care.
For our marketing module, speed is important, but not as critical as for care, because now you don't have a customer waiting on the line while you run your report to see how subscribers are using the network or how they're using their devices. We still produce reports fairly quickly, in a few seconds, which is also what the platform can offer for marketing.

Gardner: So those are some of the immediate and tactical benefits, but I should think that, over time, as you aggregate this data, there is a strategic benefit, where you can predict what demands are going to be on your networks and/or what services will be more in demand than others, perhaps market by market, region by region. How does that work? How do you provide that strategic level of analysis as well?

Khalil: This is on the marketing side of our platform, Subscriber Analytix Marketing. It's used by the CMO organizations, by marketing analysts, to understand how subscribers are using the services. For example, an operator will have different rate plans or tariff plans. They have different devices, tablets, different offerings, different applications that they're promoting.

How are customers using all these services? Before the advent of deep packet inspection probes and before the advent of big data, operators were blind to how customers were using the services offered by the network. Traditional tools couldn't get anywhere near handling the amount of data generated by those services.

Specific needs

Today, we can look at this data and synthesize it for them, so they can easily look at it, slice and dice it along many dimensions such as age, gender, device type, location, time, you name it. Marketing analysts can then use these dimensions to ask very detailed questions about usage on the network. Based on that, they can target specific customers with specific offers that match their specific needs.
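
For illustration, one such slice might look like the following dimensional query. The schema, column names, and category value are invented; the real Subscriber Analytix model is not public.

```python
# Hedged sketch: one marketing "slice": video-streaming usage broken out by
# age band, gender, and device type over 90 days. Schema is hypothetical.
SEGMENT_SQL = """
SELECT s.age_band, s.gender, s.device_type,
       COUNT(DISTINCT u.subscriber_id) AS subscribers,
       SUM(u.bytes_used) / 1e9         AS gb_used
FROM usage_daily u
JOIN subscriber_dim s USING (subscriber_id)
WHERE u.app_category = 'video streaming'
  AND u.day >= NOW() - INTERVAL '90 days'
GROUP BY 1, 2, 3
ORDER BY gb_used DESC
"""
```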

Gardner: Of course, in a highly competitive environment, where there are multiple carriers vying for that mobile account, the one that’s first to market with those programs can have a significant advantage.

Khalil: Exactly. Operators are competing now based on the services they offer and their related costs. Ten to 15 years ago, radio coverage footprint and voice plans were the driving factors. Today, it's the data services offered and their associated rate plans.

Gardner: Joseph, let's learn a little bit more about INOVVO. You recently completed the purchase of comScore's wireless solutions division. Tell us a bit about how you've grown as a company, both organically and through acquisition, and maybe the breadth of your services beyond what we've already described?

Khalil: INOVVO is a new company. We started in May 2015, but the business is very mature. My senior managers and I have been in this business since 2005. We started the Subscriber Analytix product line back in 2005. Then, comScore acquired us in 2010, and we stayed with them for about 5 years, until this past May.

At that time, comScore decided that they wanted to focus more on their core business and they decided to divest the Subscriber Analytix group. My senior management and I executed a management buyout, and that’s how we started INOVVO.

However, comScore is still a key partner for us. A key component of our product is a dictionary for categorizing and classifying websites, devices, and mobile apps. That's produced by comScore, and comScore is known in this industry as the gold standard for these types of categorizations.

We have exclusive licensing rights to use the dictionary in our platform. So we have a very close partnership with comScore. Today, as far as the services that INOVVO offers, we have a Subscriber Analytix product line, which is for care, marketing, and network.

We talked about care and marketing, we also have a network module. This is for engineers and network planners. We help engineers understand the utilization of their network elements and help them plan and forecast what the utilization is going to be in the near future, given current trends, and help them stay ahead of the curve. Our tool allows them to anticipate when existing network elements exhaust their current capacity.

Gardner: And given that platform and technology providers like HPE are enabling you to handle streaming, real-time, highly voluminous data, where do you see your services going next?

It appears to me that more than just mobile devices will be on these networks. Perhaps we're moving towards the Internet of Things (IoT). We're looking more towards people replacing other networks with their mobile network for entertainment and other aspects of their personal and business lives. At that packet level, where you examine this traffic, it seems to me that you can offer more services to more people in the fairly near future.

Two paths

Khalil: IoT is big, and it's showing up on everybody's radar. We have two paths that we're pursuing on our roadmap. There is the technology component, and that's why HPE is a key partner for us. We believe in all the big-data components they offer. The other component for us is data science and data analysis.

The innovation is going to be in the type of modeling techniques that are used to help, in our case, our customers, the mobile operators. Down the road, there could be other beneficiaries of that data, for example, companies that are deploying the sensors that generate the data.

I'm sure they want some feedback on all the data that their sensors are generating. We have all the building blocks now to keep expanding what we have and start getting into advanced analytics, advanced methodologies, and predictive modeling. This is where we really see our core expertise, because we understand this data.

Today, you see a lot of platforms showing up that say, “Give me your data and I'll show you nice-looking reports.” But there is a key component that is missing, and that is the domain expertise in understanding the data. This is our core expertise.

Gardner: Before we finish up, I'd like to ask you about lessons learned that you might share with others. For those organizations that are grappling with the need for near real-time analytics on massive amounts of data, maybe on a network, maybe in a different environment, do you have any 20/20 hindsight that you might offer on how to make the best use of big data and monetize it?

Khalil: There is a lot of confusion in the industry today about big data. What is big data, and what do I need for big data? You hear the term Hadoop: "I have deployed a Hadoop cluster, so I have solved my big-data needs." You ask people what their big-data strategy is, and they say they have deployed Hadoop. Well then, what are you doing with Hadoop? How are you accessing the data? How are you reporting on the data?

My advice is that it's a new field, and you need to consider not just the Hadoop storage layer but the other analytical layers that complement it. Everybody is excited about big data. Everybody wants to have a strategy to use big data, and there are multiple components to it. We offer a key component. We don't pitch ourselves to our customers and say, "We are your big data solution for everything you have."

There is an underlying framework that they have to deploy, and Hadoop is one of them. Then comes our piece. It sits on top of the data-hosting infrastructure and feeds from all the different data types, because in our industry, typical operators have hundreds if not thousands of data silos in their organizations.

So you need a framework to host the various data sources, and Hadoop could be one of them. Then, you need a higher-level reporting layer, an analytical layer, that can start combining these data silos, making sense of them, and bringing value to the organization. So it's a complete strategy for how to handle big data.

Gardner: And that analytics layer is what HPE Vertica is doing for you.

Key component

Khalil: Exactly. HPE is a key component of what we do in our analytical layer. There are misconceptions. When we go talk to our customers, they say, "Oh, you're using your Vertica platform to replicate our big data store," and we say that we're not. The big data store is a lower level, and we're an analytical layer. We're not going to keep everything. We're going to look at all your data, throw away a lot of it, keep just what you really need, and then synthesize it to be modeled and reported on.

Gardner: I'm afraid we'll have to leave it there. We've been exploring how INOVVO delivers impactful analytical services to mobile operators so they can foster improved end-user loyalty, and we've identified how advanced analytics, drawing on multiple data sources, provides better network quality assurance and, of course, an all-important better user experience.
So join me in thanking Joseph Khalil, President and CEO of INOVVO in Reston, Virginia. And a big thank you as well to our audience for joining us for this big data innovation case study discussion.

I'm Dana Gardner; Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE-sponsored discussions. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Transcript of a discussion on how advanced analytics drawing on multiple data sources provides wireless operators improved interactions with their subscribers and enhances customer experience through personalized insights. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.


Tuesday, November 03, 2015

Big Data Generates New Insights into What’s Happening in the World's Tropical Ecosystems

Transcript of a discussion on how large-scale monitoring of rainforest, biodiversity and climate has been enabled and accelerated by cutting-edge, big-data capture, retrieval and analysis.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big-data case study discussion explores how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval, and analysis.

We'll learn how quantitative analysis and modeling are generating new insights into what’s happening in tropical ecosystems worldwide, and we'll hear how such insights are leading to better ways to attain and verify sustainable development and preservation methods and techniques.

To learn more about data science, and how hosting that data science in the cloud helps the study of biodiversity, we're pleased to welcome our guest, Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International in Arlington, Virginia. Welcome, Eric.

Eric Fegraus: Hi, Dana. It’s great to be here. Thank you.
Gardner: We're glad to have you. We're also here with Jorge Ahumada, Executive Director of the TEAM Network, also at Conservation International. Welcome, Jorge.

Jorge Ahumada: Great to be here.

Gardner: Let's start with the trends. Clearly, knowing what's going on in environments in the tropics helps us understand what to do and what not to do. How has that changed? We spoke about a year ago, Eric. Are there any trends or driving influences that have made this data gathering more important than ever?

Fegraus: Over this last year, we've been able to roll out our analytic systems across the TEAM Network. We're seeing more and more uptake among our protected-area managers using the system, and we have some good examples of the results being used.

For example, in Uganda, we noticed that a particular cat species was trending downward. The folks there were really curious why this was happening. At first, they were excited that there was this cat species, which was previously not known to be there.

This particular forest is a gorilla reserve, and one of the main economic drivers around the reserve is ecotourism, people paying to go see the gorillas. Once they saw that these cats were declining, they started asking what could be impacting them. Our system told them that the way they were bringing in the eco-tourists to see the gorillas had shifted, and that was potentially having an impact on where the cats were. It allowed them to readjust and rethink their practices for bringing tourists to the gorillas.

Information at work

Gardner: Information at work.

Fegraus: Information at work at the protected-area level.

Gardner: Just to be clear for our audience, TEAM stands for Tropical Ecology Assessment and Monitoring. Jorge, tell us a little bit about how the TEAM Network came about and what it encompasses worldwide?

Ahumada: The TEAM Network is a program that started about 12 years ago to fill a void in the information we have from tropical forests. Tropical forests cover a little less than 10 percent of the terrestrial area in the world, but they hold more than 50 percent of its biodiversity.

So they're the critical places to be conserved from that point of view, despite the fact we didn’t have any information about what's happening in these places. That’s how the TEAM Network was born, and the model was to use data collection methods that were standardized, that were replicated across a number of sites, and have systems that would store and analyze that data and make it useful. That was the main motivation.

Gardner: Of course, it’s super-important to be able to collect and retrieve and put that data into a place where it can be analyzed. It’s also, of course, important then to be able to share that analysis. Eric, tell us what's been happening lately that has led to the ability for all of those parts of a data lifecycle to really come to fruition?

Fegraus: Earlier this year, we completed our end-to-end system. We're able to take the data from the field, from the camera traps, from the climate stations, and bring it into our central repository. We then push the data into Vertica, which is used for the analytics. Then, we developed a really nice front-end dashboard that shows the results of species populations in all the protected areas where we work.

The analytical process also starts to identify what could be impacting the trends that we're seeing at a per-species level. This dashboard also lets the user look at the data in a lot of different ways. They can aggregate it and they can slice and dice it in different ways to look at different trends.

Gardner: Jorge, what sort of technologies are they using for that slicing and dicing? Are you seeing certain tools like Distributed R or visualization software and business-intelligence (BI) packages? What's the common thread or is it varied greatly?

Ahumada: It depends on the analysis, but we're really at the forefront of analytics in terms of big data. As Michael Stonebraker and other big data thinkers have said, the big-data analytics infrastructure has concentrated on the storage of big data, but not so much on the analytics. We break that mold because we're doing very, very sophisticated Bayesian analytics with this data.

One of the problems of working with camera-trap data is that you have to separate the detection process from the actual trend that you're seeing because you do have a detection process that has error.

Hierarchical models

We do that with hierarchical models, and it's a fairly complicated model. Running that kind of model on a normal computer would take days or even months. With the processing power of Vertica, we've been able to shrink that to a few hours. We can run 500 or 600 species from 13 sites all over the world in five hours. So it's a really good way to use the power of processing.
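
To give a flavor of what separating the detection process from the trend means, here is a toy single-species occupancy model in Python. The real TEAM analyses are hierarchical Bayesian models run at far larger scale; this maximum-likelihood miniature only illustrates the core idea that detection probability and occupancy are estimated jointly, so a low count is not mistaken for absence. All numbers are simulated.

```python
# Toy occupancy model: jointly estimate occupancy (psi) and detection (p)
# from repeated camera-trap "visits" at each site. A simplified stand-in
# for TEAM's hierarchical Bayesian models, using simulated data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
S, J = 200, 10                        # camera-trap sites, repeat visits
psi_true, p_true = 0.4, 0.3           # true occupancy and detection rates

z = rng.random(S) < psi_true          # latent: is each site occupied?
y = rng.binomial(J, p_true * z)       # detections observed per site

def neg_log_lik(theta):
    psi, p = 1.0 / (1.0 + np.exp(-np.asarray(theta)))  # logit -> prob
    det = y > 0
    # Sites with at least one detection must be occupied.
    ll = np.sum(np.log(psi) + y[det] * np.log(p)
                + (J - y[det]) * np.log(1.0 - p))
    # Sites with none: either occupied-but-missed, or truly unoccupied.
    ll += np.sum(~det) * np.log(psi * (1.0 - p) ** J + (1.0 - psi))
    return -ll

fit = minimize(neg_log_lik, x0=[0.0, 0.0])
psi_hat, p_hat = 1.0 / (1.0 + np.exp(-fit.x))
print(f"psi ~ {psi_hat:.2f} (true {psi_true}); p ~ {p_hat:.2f} (true {p_true})")
```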

More recently, we've also been working with Distributed R, a new package written by the HP folks at Vertica, to analyze satellite images, because we're also interested in what's happening at these sites in terms of forest loss. Satellite images are really complicated, because you have millions of pixels and you don't really know what each pixel is. Is it forest, agricultural land, or a house? So running that on normal R is a problem.
Distributed R is a package that takes some of those functions, like random forests and regression trees, and takes full advantage of the parallel processing power of Vertica. We've seen a 10-fold increase in performance with it, and it allows us to get much more information out of those images.
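
Distributed R itself is specific to that stack, but the underlying task, classifying each pixel's band values as forest or non-forest with a random forest, can be sketched in plain Python with scikit-learn. The band count, label rule, and data here are placeholders, not TEAM's actual training set.

```python
# Hedged sketch: per-pixel land-cover classification with a random forest.
# X holds spectral band values per pixel; y holds training labels
# (1 = forest, 0 = non-forest). Real pipelines read these from imagery.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.random((50_000, 7))                    # 7 spectral bands per pixel
# Stand-in label rule (roughly NDVI-like); real labels come from ground truth.
y = (X[:, 3] - X[:, 2] > 0.1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```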

Gardner: Not only are you on the cutting-edge for the analytics, you've also moved to the bleeding edge on infrastructure and distribution mechanisms. Eric, tell us a little bit about your use of cloud and hybrid cloud?

Fegraus: To back up a little bit, we ended up building a system that uses Vertica. It's an on-premise solution, and that's what we're using in the TEAM Network. We've since realized that this solution we built for the TEAM Network can also readily scale to other organizations, government agencies, and other groups that want to manage camera-trap data and do the analytics.

So now, we're at a point where we've been doing software development and producing software that's scalable. If an organization wants to replicate what we're doing, we have a solution we can spin up in the cloud, with all of the data management, analytics, data transformation and processing, collection, and data quality controls built into a single software instance.

Gardner: And when you say “in the cloud,” are you talking about a specific public cloud, in a specific country or all the above, some of the above?

Fegraus: All of the above. We're using Vertica, and we're actually going to transition our existing on-premise solution into Vertica OnDemand. The solution we're developing uses mostly open-source software, and it can be replicated in the Amazon cloud or other clouds that have the right environments for getting things up and running.

Gardner: Jorge, how important is it to have that global choice for cloud deployment, to attract users and also keep your costs limited?

Ahumada: It's really key, because in many of these countries, it's very difficult for the governments to build out their own solutions on the ground. Cloud solutions offer a very good, effective way to manage data. As Eric was saying, the big limitation is which cloud solutions are available in each country. Right now, we have something with Vertica OnDemand here, but in some countries, we might not have the same infrastructure. So we'll have to contract different vendors or whatever.

But it's a way to keep cost down, deliver the information really quick, and store the data in a way that is safe and secure.

What's next?

Gardner: Eric, now that we have this ability to retrieve, gather, analyze, and now distribute, what comes next in terms of having these organizations work together? Do we have any indicators of what the results might be in the field? How can we measure the effectiveness at the endpoint -- that is to say, in these environments, based on what you have been able to accomplish technically?

Fegraus: One of the nice things about the software we built, which can run in the various cloud environments, is that it can also be connected. For example, if we start putting these solutions on a particular continent, and countries that are doing this sit next to each other, they won't be silos that are unable to share data; they'll be able to share an aggregated level of data with each other, so that we can get a holistic picture of what's happening.

So that was very important when we started going down this process, because one of the big inhibitors for growth within the environmental sciences is that there are these traditional silos of data that people in organizations keep and sit on and essentially don't share. That was a very important driver for us as we were going down this path of building software.

Gardner: Jorge, what comes next in terms of technology? Are there scale issues that you need to get over? Are there analytics issues? What's the next requirements phase that you would like to work through technically to make this even more impactful?

Ahumada: As we scale up in size and start having more granularity in the countries where we work, the challenge is going to be keeping these systems responsive and the information coming. Right now, one of the big limitations is the analytics. We do have analytics running at top speeds, but once we start talking about whole countries, we're going to have on the order of many more species and many more protected areas to monitor.

This is something that the industry is starting to move forward on, incorporating more of the power of the hardware into the analytics, rather than just the storage and management of data. We look forward to continuing to work with our technology partners, and in particular HP, to help guide this process. As a case study, we're very well-positioned for that, because we already have that challenge.

Gardner: Also it appears to me that you are a harbinger, a bellwether, for the Internet of Things (IoT). Much of your data is coming from monitoring, sensors, devices, and cameras. It's in the form of images and raw data. Any thoughts about what others who are thinking about the impact of the IoT should consider, now that you have been there?

Fegraus: When we talk about big data, we're talking about data collected from phones, cars, and human devices. Humans are delivering the data. But here we have a different problem. We're talking about nature delivering the data and we don't have that infrastructure in places like Uganda, Zimbabwe, or Brazil.

So we have to start by building that infrastructure, and we have the camera traps as an example of that. We need to be able to deploy much larger-scale infrastructure to collect data and diversify the sensors we currently have, so that we can gather sound data, image data, temperature, and environmental data on a much larger scale.

Satellites can only take us some part of the way, because we're always going to have problems with resolution. So it's really deployment on the ground which is going to be a big limitation, and it's a big field that is developing now.

Gardner: Drones?

Using drones

Fegraus: Drones, for example, have that capacity, especially small drones that are proving to be intelligent enough to collect a lot of information autonomously. This is at the cutting edge of technological development right now, and we're excited about it.

Gardner: Well great. I'm afraid we will have to leave it there. We have been learning and exploring how large-scale monitoring of rainforest, biodiversity and climate has been enabled and accelerated by cutting-edge, big-data capture, retrieval, and analysis. And we've seen how quantitative analysis and modeling are generating new insights into what's happening in tropical ecosystems worldwide.

So a big thanks to our guests, Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International, and Jorge Ahumada, the Executive Director of the TEAM Network, also at Conservation International.
And a big thank you to our audience as well for joining us for this big data innovation case study discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Transcript of a discussion on how large-scale monitoring of rainforest, biodiversity and climate has been enabled and accelerated by cutting-edge, big-data capture, retrieval and analysis. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.


Tuesday, August 18, 2015

The Future of Business Intelligence as a Service with GoodData and HP Vertica

Transcript of a BriefingsDirect discussion on how GoodData helps customers gain new insights into their businesses with on-demand data analytics.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big data case study interview highlights how GoodData expands the realms and possibilities for delivering business intelligence (BI) and data warehousing as a service. We'll learn how they're exploring new technologies to make that more seamless across more data types for more types of users.

With that, we welcome Jeff Morris, Vice President of Marketing at GoodData in San Francisco. Welcome, Jeff. 

Jeff Morris: Thanks very much, Dana.
Gardner: We are also here with Chris Selland, Vice President for Business Development at HP Vertica. Welcome, Chris.

Chris Selland: Thanks, Dana. Great to be here with you both.

Gardner: First, Jeff, for those who might not be that familiar, tell us about GoodData, what you do and why it's different.

Morris: GoodData is an analytics platform as a service (PaaS). We cover the full-spectrum, end-to-end use case of creating an analytic infrastructure as a service and delivering it to our customers.

We take on the challenges of collecting the data, whatever it is, structured and unstructured. We use a variety of technologies as appropriate, as we do that. We warehouse it in our multitenant, massively scalable data warehouse that happens to be powered by HP Vertica.

We then combine and integrate it into whatever the customer's particular key performance indicators (KPIs) are. We aggregate that in our extensible analytics engine and then present it to the end users through desired dashboards, reports, or discoverable analytics.

Our business is set up such that about half of it operates on an internal use case, typically a sales, marketing, and social analytics kind of use case. The other half of our business we call "Powered by GoodData," and those customers are embedding the GoodData technology in their own products. So we have a number of companies creating customer-facing data products that ultimately generate new streams of revenue for their business.

40,000 customers

We've been at this since 2007. We're serving about 40,000 customers at this point and enjoying somewhere around 2.4 million data uploads a week. We've built out the service such that it's massively scalable. We deliver incredibly fast time to market. Last quarter, about two-thirds of our deployments were delivered in 16 weeks or less.

One of the divisions of HP, in fact, deployed GoodData in less than six weeks, and they're already getting their first set of KPIs and the value those deliver. What's making us different in the marketplace right now is that we're eliminating all of the headaches associated with creating your own big data lake-style BI infrastructure and environment.

What we end up doing is affording you the time to focus on the analytics and the results that you gain from them—without having to manage the back-end operations.

Gardner: What’s interesting to me is that you mentioned PaaS for BI. Instead of developing applications and then having a production environment that’s seamlessly available to you, you're creating analytic applications on datasets that are contributed to your platform. Is that right?

Morris: Yes, indeed. The datasets themselves also tend to be born in the cloud. As I said, the types of applications that we're building typically focus on sales and marketing and social, and e-commerce related data, all of which are very, very popular, cloud-based data sources. And you can imagine they're growing like crazy.

We see a leaning in our customer base toward integrating some on-premise information, typically from their legacy systems, and then marrying that up with the Salesforce, market, or social data that they want to integrate, to build a full view of their customers -- or a full exposure of what their own applications are doing.

Gardner: So, you're really providing an excellent example of how HP Vertica is a cloud-borne analytics platform and implementation. That’s kind of interesting.

But I wonder whether any of your clients, maybe not so much in the media, but some of the more traditional verticals like healthcare, retail, or government, are trying to do this across a hybrid model. For example, they're doing some BI and they have warehouses on-premises or maybe other hosting models, but they also want to start to dabble in moving this to the cloud and taking advantage of what the cloud does best. Are we now on the vanguard of hybrid BI?

Morris: We're getting there, and there are certainly some industries that are more cloud-friendly than others right now. Interestingly, the healthcare space is starting to, but it's still nascent. The financial services industry is still nascent. They're very protective of their information. But retailers, e-commerce organizations, technology ISVs, and digital media agencies have adopted the cloud-based model very aggressively.

We're seeing terrific growth and expansion there, and we do see use cases right now where we're beginning to park the cloud-based environment alongside more traditional analytics environments to create that hybrid effect. Often, those customers recognize that the speed at which data is growing in the cloud is driving them to look for a solution like ours.

Gardner: Chris, how unique is GoodData in terms of being all cloud moving toward hybrid, and does this really provide a poster child, in a sense, for Vertica as a service?

Special relationship

Selland: GoodData is certainly a very special partner and a very special relationship for us. As you said, Vertica is fundamentally a software platform that was purpose-built for big data that is absolutely cloud-enabled. But GoodData is the best representation of the partner who has taken our platform and then rolled out service offerings that are specifically designed to solve specific problems. It's also very flexible and adaptable.

So, it’s a special partnership and relationship. It's a great proof point for the fact that the HP Vertica platform absolutely was designed to be running in the cloud for those customers who want to do it.

As Jeff said, though, it really varies greatly by industry. A large majority of the customers on our customer advisory board (CAB), which tends to include some of our largest customers in some pretty well-known industries, were saying they will never put their data in the cloud.

Never is a very long time, but at the same time, there are other industries that are adopting it very rapidly. So there is a rate of change that’s going on in the industry. It varies by size of company, by the type of competitive environment, and by the type of data. And yes, there is a lot of hybridization going on out there. We're seeing more of the hybridization in existing organizations that are migrating to the cloud. There's a lot of new breed companies who started in the cloud and have every intent of staying there.

But there's a lot of dynamism in this industry, a lot of change, and this is a partnership that is a true win-win. As I said, it's a very special relationship for both companies.

Gardner: Jeff, given that we have such variability, vertical by vertical and company by company, a green-field company versus an established one will behave differently vis-à-vis its architecture and IT implementation. You need to be ready for any and all of that, and I suppose Vertica does as well.

We're also hearing about more than just HP Vertica here. We're talking about Haven, which includes Hadoop, Autonomy, security, and applications. Is there a path that you see whereby you can be as many things to as many types of customers and vertical industries as possible?

I'm thinking about Hadoop, security, and bringing some of the more enterprise-caliber KPIs and SLAs, so that some of those folks who are hesitant to move at least some of their data to the cloud would move in that direction. Is that a vision for you? Maybe you could explain where you see this going on a hybrid basis.

Morris: Absolutely. The HP Haven-style architecture is a vision in a direction that we are going. We do use Hadoop right now for special use cases of expanding and providing structure, creating structure out of unstructured information for a number of our customers, and then moving that into our Vertica-based warehouse.

The beauty of Vertica in the cloud is the way we have set this up, and this also helps address both the security and reliability issues that might be thought of as issues in the cloud. We're triple clustering each set of instances of our Vertica warehouses, so they are always reliable and redundant.
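
In Vertica terms, redundancy of this kind is expressed as K-safety: the cluster keeps K+1 copies of each data segment and survives K node failures. A hedged sketch of inspecting and raising it follows; the connection details are placeholders, and whether GoodData manages redundancy exactly this way is an assumption.

```python
# Hedged sketch: inspect and raise a Vertica cluster's K-safety level.
import vertica_python

with vertica_python.connect(host="vertica.example.net", port=5433,
                            user="dbadmin", password="change-me",
                            database="analytics") as conn:
    cur = conn.cursor()
    cur.execute("SELECT current_fault_tolerance FROM v_monitor.system;")
    print("current K-safety:", cur.fetchone()[0])
    # K-safety 2 keeps three copies of each segment (needs >= 5 nodes).
    cur.execute("SELECT MARK_DESIGN_KSAFE(2);")
```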

Daily updates

We, like the biggest enterprises out there, are vigilantly maintaining our network. We update our network on behalf of our customers on a daily basis, as necessary. We roll out and maintain a very standardized operating environment, including an OpenStack-based operating environment, so that customers never need to care about which versions of the SSL libraries or the VPN exist.

We're taking care of all of that really deep networking, the things that the most stalwart enterprise-style IT architects are concerned about. We have to do that, too, and we have to do it at scale for this multi-tenant kind of use case.

As I said, the architecture itself is very Haven-like; it just happens to be exclusively in the cloud -- which we find interesting and unique for us. Beyond the Hadoop piece, we don't use Autonomy yet, but there are some interesting use cases that we're exploring there. We use Vertica in a couple of places in our architecture, not only as that central data warehouse, but also as a high-performance storage vehicle for our analytic data marts.

So when our customers are pushing a lot of information through our system, we're tapping into Vertica’s horsepower in two spots. Then, our analytic engine can ingest and deal with those massive amounts of data as we start to present it to customers.
On the Haven architecture side, we're a wonderful example of where Haven ends up in the cloud. The kinds of applications customers are creating might be these hybrid styles, where they're drawing legacy information in from their existing on-premise systems. Then, they're gathering up, as I said before, their sales and marketing information and their social information.

The one that we see as a wonderful green field for us is capturing social information. We have our own social analytic maturity model that we share with customers and partners, showing how to capitalize on your campaigns and how to maximize your exposure through every single social channel you can think of.

We're very proficient at that, and that's what's really driving the immense data sizes our customers are asking for right now. Where we used to talk in tens of terabytes for a big system, we're now talking in hundreds, multiple hundreds of terabytes, for a system. Case by case, we're seeing this really take off.

Gardner: It's fine to talk about this as an abstraction, but it's really useful to hear some examples. Do you have any companies, either named or unnamed, that provide a great use-case example of PaaS for BI apps that take advantage of some of the attributes of HP Haven and Vertica?

Morris: One of our oldest and most dear customers is Zendesk. They have a very successful customer-support application in the cloud. They provide both a freemium model and degrees of for-fee products to their customers.

And the number one reason why their customers upgrade from freemium to the general level, and then from general to the gold level of product, is the analytics they're supplying inside it. They very recently announced a whole series of data products themselves, all powered by GoodData, as the embedded analytic environment within Zendesk.

We have another customer, Service Channel, which is a wonderful example of marrying together two very disparate user communities. Service Channel is a facilities-management enterprise resource planning (ERP) application. It brings together the facility managers of your favorite brick-and-mortar retailers with the suppliers who service those retail facilities: the janitorial services, the air-conditioning guy, the plumbers.

Disparate customers

By marrying these disparate types of customers, they create their own data products as well, integrating third-party information like weather data. They score their customers, both the retailers and the suppliers, and benchmark them against each other. They compare how well one vendor provides service versus another, and they also compare how much one retailer spends on maintaining its space.

Of course, Apple gets incredibly high marks. RadioShack, right now, as they transition their stores, not so much. Service Channel knew this information long before the industry did, because they're watching spend. They, too, are starting to create almost a bidding network.

When they integrated weather data into the environment, they started tracking things like, "Apple would like to gain first right of refusal on the services they need." So if Apple's air conditioning goes out, the service provider comes in and fixes it sooner than at Best Buy and all of their competitors. And they'll bid up for that. So they've created almost a marketplace. As I said before, these data products are really quite an advantage for us.

Gardner: Looking a bit to the future, we've heard of the interest in moving from predictive to prescriptive analytics. It seems to me that that's really a factor of the quality of the data and of getting data from different sources and bringing it together, something you can do in a cloud more easily and more efficiently than server by server, or cluster by cluster.

What kind of services should we envision as analytics as a business model unfolds in the cloud and you can start to do joins across different types of data for an industry, rather than just an enterprise? Is there an opportunity to get that prescriptive value as a provider with the PaaS capability? It sounds very exciting and interesting. What's coming next?

Morris: Most definitely, we're seeing a number of great opportunities, and many are created and developed by the technologies we've chosen as our platform. We love the idea of creating not only predictive, but prescriptive, types of applications in use cases on top of the GoodData environment. We have customers that are doing that right now and we expect to see them continue to do that.

What I think will become really interesting is when the GoodData community starts to share their analytic experiences or their analytic products with each other. We feel like we're creating a central location where analysts, data scientists, and regular IT can all come together and build a variety of analytic applications, because the data lives in the same place. The data lives in one central location, and that's an unusual thing. In most of the industry, your data is still siloed. Either you keep it to yourself on-premise, or your vendors keep it to themselves in the cloud and on-premise.

But we become this melting pot of information and of data that can be analytically evaluated and processed. We love the fact that Vertica has its own built-in analytic functions right in the database itself. We love the fact that they run our predictive language without any other issue and we see our customers beginning to build off of that capability.

My last point about the power of that central location and the power of GoodData is that our whole goal is to free time for those data scientists and those IT people to actually perform analytics and get out of the business of maintaining the systems that make analytics available, so that you can focus on the real intellectual capital that you want to be creating.

Identifying trends

Gardner: So, Chris, to cap this off, I think we've identified some trends. We have PaaS for BI. We have hybrid BI. We have cloud data joins and ecosystems that create a higher value abstraction from data. Any thoughts about how this comes together, and does this fit into the vision that you have at HP Vertica and that you're seeing in other parts of your business?

Selland: We're very much only at the front end of the big data analytics revolution. I ultimately don’t think we are going to be using the term "big data" in 10 years.

I often compare big data today to eBusiness 10 or 12 years ago. Nobody uses that term anymore, but that was when everything was going online. Now everything is online, and the whole world has changed. The same thing is happening with analytics today.

With a hundred times more data, we can actually get 10,000 times more insight. And that's true, but it's not just the amount of data; it's the ability to cross-correlate. That's the whole vision of what Jeff was just talking about, what GoodData is trying to do.

It's the vision of Haven: to bring in all types of data and be able to look at it more holistically. One of my favorite examples, just to make that concrete, is an airline we were talking to. They were having a customer service issue. A lot of their passengers were tweeting angrily about them, and they were trying to analyze the social media data to figure out how to make this stop and how to respond.

In a totally separate part of the organization, they had a predictive-maintenance project, almost an Internet-of-Things (IoT) type of project, going on. They were looking at data coming off the fleet and trying to do a better job of keeping their flights on time.

If you think about this, you say, "Duh." There was a correlation between the fact that they were having service problems and that the flights were late, and the fact that the passengers were angry. Suddenly, they realized that by focusing less on the social data in this case, or looking at it as the symptom as opposed to the cause, they were able to solve the problem much more effectively. That's a very, very simple example.

I cite that because it makes real for people that when you start cross-correlating data you wouldn't normally think belongs together -- social data and maintenance data, for example -- you get true insights. It's almost a silly-simple example, but we're going to see many more examples of this type. The more of this we can do, the more power we're going to get. The front end of the revolution is here.
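
That cross-correlation is easy to make concrete: join the two silos on a shared key (here, the day) and measure the correlation. A toy pandas sketch follows, with synthetic numbers standing in for the airline's real operational and social feeds.

```python
# Toy sketch: correlate daily flight delays with daily angry tweets.
# All data here is simulated; real inputs would come from two silos.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
days = pd.date_range("2015-06-01", periods=90, freq="D")
late_flights = rng.poisson(20, size=90)        # ops / maintenance silo
angry_tweets = rng.poisson(3 * late_flights)   # social media silo

ops = pd.DataFrame({"day": days, "late_flights": late_flights})
social = pd.DataFrame({"day": days, "angry_tweets": angry_tweets})

joined = ops.merge(social, on="day")           # the cross-silo join
print(joined["late_flights"].corr(joined["angry_tweets"]))
```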

Gardner: And then those insights become empirical, and not just intuitive or based on someone's observation. You have hard evidence.

Selland: Correct, exactly.

Gardner: All right. I'm afraid we have to leave it there. We have been learning about how GoodData delivers a platform as a service around business intelligence, built on HP Vertica, in the cloud. I'd like to thank our guests, Jeff Morris, the Vice President of Marketing at GoodData, and Chris Selland, Vice President for Business Development at HP Vertica.
And I'd like to thank our audience as well for joining us for this special new style of IT discussion. I'm Dana Gardner; Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP Enterprise.

Transcript of a BriefingsDirect discussion on how GoodData helps customers gain new insights into their businesses with on-demand data analytics. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.


Monday, August 10, 2015

How ECommerce Sites Harvest Big Data Across Multiple Clouds

Transcript of a BriefingsDirect discussion on how HP Vertica helps a big-data consultancy scale workloads for ecommerce sites.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation.

Our big data user interview highlights how a consultant is helping large ecommerce organizations better manage their big data and provide the insights that they need to thrive in a fast-paced environment.

With that, please join me in welcoming our guest, Jimmy Mohsin, Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. Welcome, Jimmy.

Jimmy Mohsin: Thank you, Dana.

Gardner: We've been hearing an awful lot of about some extraordinary situations where the fast-paced environment and data volumes that users are dealing with have left them with a need for a much better architecture.
Tell me what you're seeing in the marketplace. How desperate are people to find the right big data architecture?

Mohsin: There's a lot of interest in trying to deal with large data volumes, and not only large volumes, but also data that changes rapidly. There are many companies that have very large datasets, some in terabytes, some in petabytes, and on top of that they're getting live feeds.

The data is there and it’s changing rapidly. The traditional databases sometimes can’t handle that problem, especially if you're using that database as a warehouse and you're reporting against it.

Basically, we have kind of a moving-target situation. With HP Vertica, what we've seen is the ability to solve that problem in at least some of the cases that I've come across, and I can talk about specific use cases in that regard.

Input/output issues

Gardner: Before we get into a specific use case, I'm interested particularly in some of these input/output issues. People are trying to decide how to move the data around. They're toying with cloud. They're trying to bring in data from more types of traditional repositories. And, as you say, they're facing new types of data problems with streaming and real-time feeds.

How do you see them beginning this process when they have to handle so many variables? Is it something that’s an IT architecture, or enterprise architecture, or data architecture? Who's responsible for this, given that it’s now a rather holistic problem?

Mohsin: In my present project, we ran into that. The problem is that many companies don't even have a well-defined data-architecture team. Some of them do. You'll find a lot of companies with an enterprise-architect role, and some companies with only a haphazard definition of an architectural group.

Mohsin
Net-net, at least at this point, unless companies are more structured, it becomes a management issue in the sense that someone at the leadership level needs to know who has what domain knowledge and then form the appropriate team to skin this cat.

I know of a recent situation where we had to build a team of four people, and only one was an architect. That virtual team was able to assemble and collate all the repositories, which spanned 15 years and four different technology flavors, and then come up with an approach that resulted in a single repository in HP Vertica.

So there are no easy answers yet, because organizations just aren't uniformly structured.

Gardner: Well, I imagine they'll be adapting, just like we all are, to the new realities. In the meantime, tell me about a specific use case that demonstrates the intensity of scale and velocity, and how at least one architecture has been deployed to manage that?

Mohsin: One of my present projects deals with one of the world's largest retailers. It's eCommerce, online selling. One of the things they do, in addition to their transactions of buying and selling, is email campaign management. That means staying in touch with the customer on the basis of their purchases, their interests, and their profiles.

One of the things we do is see what a certain customer’s buying preferences have been over the past 90 days. Knowing that and the customer’s profile, we can try to predict what their buying patterns will be. So we send them a very tailored message in that regard. In this project, we're dealing with about 150 to 160 million emails a day. So this is definitely big data.
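To make that windowing concrete, here is a minimal, self-contained Python sketch of a trailing-90-day preference rollup of the sort Mohsin describes. The records, field names, and ranking rule are hypothetical stand-ins for illustration, not the project's actual logic:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical purchase records; the real feeds come from the warehouse.
purchases = [
    {"customer": 42, "category": "outdoor", "ts": datetime(2015, 7, 2)},
    {"customer": 42, "category": "outdoor", "ts": datetime(2015, 7, 20)},
    {"customer": 42, "category": "kitchen", "ts": datetime(2015, 8, 1)},
    {"customer": 7,  "category": "toys",    "ts": datetime(2015, 5, 1)},
]

def buying_preferences(records, as_of, window_days=90):
    """Rank each customer's purchase categories over a trailing window."""
    cutoff = as_of - timedelta(days=window_days)
    prefs = {}
    for r in records:
        if r["ts"] >= cutoff:
            prefs.setdefault(r["customer"], Counter())[r["category"]] += 1
    # Most-purchased category first; a tailored message keys off this ranking.
    return {cid: c.most_common() for cid, c in prefs.items()}

print(buying_preferences(purchases, as_of=datetime(2015, 8, 10)))
# {42: [('outdoor', 2), ('kitchen', 1)]} -- customer 7 falls outside the window
```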

Here we have online information coming into one warehouse as to what's happening in the world of buying and selling. Then, behind the scenes, while that information is being sent to the warehouse, we're trying to do these email campaigns.

This is where the problem becomes fairly complicated. We tried traditional relational database management systems (RDBMS), and they kind of worked, but we ran into a slew of speed and performance issues. That's where the big-data world was really beneficial. We were able to address that problem in about a seven-month project that we ran.

Gardner: And this was using Vertica?

Large organization

Mohsin: We did an evaluation. We looked at a few databases, and the corporate choice was Vertica. We saw that there are a whole bunch of big-data vendors. The issue is that many of the vendors don't have a large organization behind them, and Vertica does. The company management felt that this was a new big database, but HP was behind it, and the fact that they also use HP hardware helped a lot.

They chose Vertica. The team I was managing did a proof of concept (POC), and we were able to demonstrate that Vertica could handle the reporting tied to the email campaign management. We ran a 90-day POC, and the results were so positive that there was interest in going live. We went live in about another 90 days following that POC.

Gardner: I understand that Vertica is quite versatile. I've heard of a number of ways in which it's used technically. But this email campaign problem almost sounds like a transactional issue, a complex event processing issue, or a transfer agent scaling issue. How does big data, Vertica, and analytics come to bear on this particular problem?

Mohsin: It's exactly what you say it is. As we are reporting and pushing out the campaigns, new information is coming in every half hour, sometimes even more frequently. There's a live feed that's updating the warehouse. While the warehouse is being updated, we want to report against it in real time and keep our campaigns going.
The key point is that we can't really stop any of these processes. The customers who are managing the campaigns want to see information very frequently. We can’t even predict when they would want their information. At the same time, the transactional systems are sending us live feeds.

The problem we ran into with the traditional RDBMS is that the reporting didn't function when the live feeds were underway. We couldn't run our backend email campaign reports when new data was coming in.

One of the benefits Vertica has, due to its basic architecture and its columnar design, is that it's better positioned to do that. This is what we were able to demonstrate in the live POC, and nobody was going to take our word for it.
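As an illustrative aside, this is roughly what that load-while-reporting pattern looks like from a client. It is only a sketch: it assumes the open-source vertica-python driver, and the table, columns, file path, and connection details are all hypothetical, not taken from this project:

```python
import threading
import vertica_python  # open-source client: pip install vertica-python

# Hypothetical connection details -- not the project's actual environment.
conn_info = {"host": "vertica-node.example.com", "port": 5433,
             "user": "dbadmin", "password": "...", "database": "campaigns"}

def load_feed(path):
    """Micro-batch one live-feed file into the events table via COPY."""
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        with open(path, "rb") as fs:
            cur.copy("COPY events FROM STDIN DELIMITER ','", fs)
        conn.commit()

def run_campaign_report():
    """Query the same table while the load is underway; Vertica's columnar
    design is built to let reads proceed during loads."""
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        cur.execute("SELECT customer_id, COUNT(*) FROM events "
                    "WHERE event_ts > NOW() - INTERVAL '90 days' "
                    "GROUP BY customer_id")
        print(len(cur.fetchall()), "customers in the report window")

loader = threading.Thread(target=load_feed, args=("/data/feed_0130.csv",))
reporter = threading.Thread(target=run_campaign_report)
loader.start(); reporter.start()
loader.join(); reporter.join()
```

The point of the sketch is only the concurrency: the reporting query does not wait for the feed load to finish, which is the behavior the POC had to prove.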

The end user said, "Take a few of our largest clients. Take some of our clients that have a lot of transactions. Prove that the reports will work for those clients." That's what we did in 30 days. Then, we extended it, and in 90 days, we demonstrated the whole thing end to end. Following that was the go-live.

Gardner: You had to solve that problem of the live feeds, the rapidity of information. Rather than a stop, batch-process, analyze, repeat cycle, you've gained a solution to your problem.

But at the same time, it seems like you're getting data into an environment where you can analyze it and perhaps extract other forms of analysis, in addition to solving your email and eCommerce trajectory issues. It seems to me that you're now going to have the opportunity to add a new dimension of analysis to what's going on, and perhaps refine these transactions more toward a customer-inference benefit.

More than a database

Mohsin: One of the things I like to say internally is that Vertica isn't just a big database; it's more than a database. It's really a platform, because of the distributed capabilities and the other tools being published around it. When we adopted it and went live with this technology, we first solved the feeds-and-speeds problem, but now we're very much positioned to use some of the capabilities that exist in Vertica.

We have Distributed R as one of those capabilities, and inference analysis as another, so that we can build intelligent reports. To date, we've been building those outside the RDBMS; the RDBMS has no role in that. With Vertica, I call it more of a data platform. So we definitely will go there, but that would be our second phase.

As the system starts to function and deliver on the key use cases, the next stage would be to build more sophisticated reports. We definitely have the requirements and now we have the ability to deliver.

Gardner: Perhaps you could add visualization capabilities to that. You could make a data pool available to more of the constituents within the organization so that they could innovate and do experiments. That's very powerful stuff indeed.

Is there anything else you can tell us for other organizations that might be facing similar issues around real-time feeds and the need to analyze and react, now that you've been through this on this particular project? Are there any lessons learned for others?

If they're facing transactional issues and haven't thought about a big-data platform as part of the solution, what would you offer in terms of maybe lighting a light bulb in their minds about looking for alternatives to traditional middleware?

Mohsin: Like so many people try to do, we tried to see if anyone else had done this. One of the issues in big data, at least today, is that you can't find a whole slew of clients who have already gone live and who are in production.

There are lots of people in development, and some are live, but in our space, we couldn't find anyone who was live. We solved that issue via a quick-hit POC. The big lesson there was that we scoped the POC right. We didn’t want to do too much and we didn’t want to do too little. So that was a good lesson learned.

The other big thing is the data-migration question. Maybe, to some extent, this problem will never be solved. It's not so easy to pull data out of legacy database systems. Very few of them will give you good tools to migrate away from them. They all want you to stay. So we had to write our own tooling. We scoured the market for it, but we couldn’t find too many options out there.

Understand your data

So a huge lesson learned was, if you really want to do this, if you want to move to big data, get a handle on understanding your data. Make sure you have the domain experts in-house. Make sure you have the tooling in place, however rudimentary it might be, to be able to pull the data out of your existing database. Once you have it in the file system, Vertica can take it in minutes. That’s not the problem. The problem is getting it out.
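To ground that lesson, here is a hedged sketch of the kind of homegrown extraction tooling being described: dump a legacy table to a flat file with a generic DB-API driver, after which Vertica's COPY can bulk-load it. The driver choice, DSN, table name, and paths are hypothetical stand-ins:

```python
import csv
import pyodbc  # stand-in for whatever DB-API driver the legacy database offers

def dump_table_to_csv(dsn, table, out_path, batch=50000):
    """Pull rows out of a legacy RDBMS in batches and land them as a flat file."""
    conn = pyodbc.connect(dsn)
    cur = conn.cursor()
    cur.execute("SELECT * FROM {}".format(table))  # assumes a trusted table name
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([col[0] for col in cur.description])  # header row
        while True:
            rows = cur.fetchmany(batch)
            if not rows:
                break
            writer.writerows(rows)
    conn.close()

dump_table_to_csv("DSN=legacy_oltp;UID=etl;PWD=...", "orders", "/data/orders.csv")
# Once the file is on disk, the Vertica side is a single bulk load, e.g. in vsql:
#   COPY staging.orders FROM LOCAL '/data/orders.csv' DELIMITER ',' SKIP 1 DIRECT;
```

Batching the fetch keeps memory flat on very large legacy tables, which is usually the constraint in this kind of one-off migration tooling.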

We continue to grapple with that and we have made product enhancement recommendations. But in fairness to Vertica, this is really not something that Vertica can do much about, because this is more in the legacy database space.

Gardner: I've heard quite a few people say that, given the velocity with which they're seeing people move to the cloud, that obviously isn't part of their problem, as the data is already in the cloud. It's in the standardized architecture that the cloud is built around. If there is a platform-as-a-service (PaaS) capability, then getting at the data isn't so much of a problem, or am I not reading that correctly?

Mohsin: No, you're reading that correctly. The problem we have is that a lot of companies are still not in the cloud. There is still a lingering fear of the cloud. People will tell you that the cloud is not secure. If you have customer information, if you have personalized data, many organizations don't want to put it in the cloud.

Slowly, they are moving in that direction. If we were all there, I would completely agree with you, but since we still have so many on-premise deployments, we're still in a hybrid mode -- some is on-prem, some is in the cloud.

Gardner: I just bring it up because it gives yet another reason to seriously consider cloud. It’s a benefit that is actually quite powerful -- the data access and ability to do joins and bring datasets together because they're all in the same cloud.

Mohsin: I fundamentally agree with you. I believe in the cloud and that it really should be the way to go. Going through our very recent go-live, there is no way we could have had the same elasticity in an on-premises deployment that we can have in the cloud. I can pick up the phone, call a cloud provider, and have another machine the next day. I can't do that if it's on-premises.

Again, simply moving all the assets into the cloud, at least in some organizations, will take several months, if not years.

Gardner:  Very good. I'm afraid we will have to leave it there. We have been discussing how a specific enterprise in the eCommerce space has solved some unique problems using big data and, in particular, the HP Vertica platform.
That sets the stage for a wider use of big data for transactional problems and live-feed issues. It also shows why moving to the cloud has some potential benefits for speed, velocity, and dexterity when it comes to data across multiple data sources and implementations.

So with that, a big thank you to our guest, Jimmy Mohsin, Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. Thanks, Jimmy.

Mohsin: Thanks, Dana. Have a great day.

Gardner: And a big thank you to our audience as well, for joining us for this special new style of IT discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP Enterprise.

Transcript of a BriefingsDirect discussion on how HP Vertica helps a big-data consultancy scale workloads for ecommerce sites. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

You may also be interested in: