Let’s Meet Up at the Nashville Analytics Summit


The Nashville Analytics Summit will be on us before we know it. This special gathering of data and analytics professionals is scheduled for August 20th and 21st, and should be bigger and better than ever. From my first experience with the Summit in 2014, it has consistently been a highlight of my year. My first Summit took place at the Lipscomb Spark Center meeting space with about a hundred attendees. Just a few years later, we’d grown to more than 450 attendees and moved into the Omni Hotel.

Mark it on your calendar. I’ll give you five reasons why it is a can’t-miss event if you work with data:

  1. We’ve invited world-renowned keynote speakers like Stephen Few and Thomas Davenport. You won’t believe who we are planning to bring in this year.
  2. There isn’t a better networking event for analytics professionals in our region. Whether you’re looking for talent or looking for the next step in your career, you’ll meet kindred spirits, data lovers, and innovative businesses. For two years in a row, we have hired Juice interns directly from conversations at the Summit. 
  3. It’s for everyone who works with data. Analyst, Chief Data Officer, or Data Scientist… we’ve got you covered. There are technical workshops and presentations for the hands-on practitioner and case studies and management strategies for the executive. We’re committed to bringing you quality and diverse content.
  4. It’s a “Goldilocks” conference. Some conferences go on for days. Some are a sea of people; others are too small to expand your horizons. The Analytics Summit is two days, roughly 500 people, and conveniently located in the cozy confines of the Omni Hotel. It is easy to meet new people and connect with people you know.
  5. See what’s happening. Nashville has a core of companies committed to building a special and innovative analytics community. We have innovators like Digital Reasoning, Stratasan, and Juice Analytics. We have larger companies making a deep commitment to analytics like Asurion, HCA, and Nissan. The Summit is the best chance to see the state of our thriving analytics community.

Now that you’re convinced you can’t miss out, you may wonder what to do next. First, block out your calendar (August 20 and 21). Next, find a colleague you’d like to go with. Want to be even more involved? We’ve invited dozens of local professionals to speak at the Summit, and you can submit a proposal to present.

Finally, if you don’t want your company to miss out on the opportunity to reach our entire analytics community, there are still slots for sponsors.

I hope to see you there.

learn more and register

Originally Posted at: Let’s Meet Up at the Nashville Analytics Summit by analyticsweek

The Future Of Big Data Looks Like Streaming

Big data is big news, but it’s still in its infancy. While most enterprises at least talk about launching Big Data projects, the reality is that very few do in any significant way. In fact, according to new survey data from Dimensional, while 91% of corporate data professionals have considered investment in Big Data, only 5% actually put any investment into a deployment, and only 11% even had a pilot in place.

Real Time Gets Real

ReadWrite: Hadoop has been all about batch processing, but the new world of streaming analytics is all about real time and involves a different stack of technologies.

Langseth: Yes, however I would not entangle the concepts of real-time and streaming. Real-time data is obviously best handled as a stream. But it’s possible to stream historical data as well, just as your DVR can stream Gone with the Wind or last week’s American Idol to your TV.

This distinction is important, as we at Zoomdata believe that analyzing data as a stream adds huge scalability and flexibility benefits, regardless of whether the data is real-time or historical.

RW: So what are the components of this new stack? And how is this new big data stack impacting enterprise plans?

JL: The new stack is in some ways an extension of the old stack, and in some ways really new.

Data has always started its life as a stream. A stream of transactions in a point-of-sale system. A stream of stocks being bought and sold. A stream of agricultural goods being traded for valuable metals in Mesopotamia.

Traditional ETL processes would batch that data up and kill its stream nature. They did so because the data could not be transported as a stream; it had to be loaded onto removable disks and tapes to be moved from place to place.

But now it is possible to take streams from their sources, through any enrichment or transformation processes, through analytical systems, and into the data’s “final resting place”—all as a stream. There is no real need to batch up data given today’s modern architectures such as Kafka and Kinesis; modern data stores such as MongoDB, Cassandra, HBase, and DynamoDB, which can accept and store data as a stream; and modern business intelligence tools like the ones we make at Zoomdata, which can process and visualize these streams as well as historical data in a very seamless way.

Just like your home DVR can play live TV, rewind a few minutes or hours, or play movies from last century, the same is possible with data analysis tools like Zoomdata that treat time as a fluid.
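The DVR analogy can be sketched in code. Below is a toy Python illustration (my own sketch, not Zoomdata’s implementation): a single stream-processing pipeline that neither knows nor cares whether the records it consumes are historical or arriving live.

```python
from typing import Iterable, Iterator

def historical_source(records: list) -> Iterator[dict]:
    """Replay stored records as a stream, like a DVR playing last week's show."""
    for record in records:
        yield record

def running_average(stream: Iterable[dict], field: str) -> Iterator[float]:
    """Consume any stream record-by-record; no batch window required."""
    total, count = 0.0, 0
    for record in stream:
        total += record[field]
        count += 1
        yield total / count

# The same pipeline works whether `sales` was loaded from disk or is
# arriving live from a broker such as Kafka.
sales = [{"amount": 10.0}, {"amount": 20.0}, {"amount": 30.0}]
averages = list(running_average(historical_source(sales), "amount"))
print(averages)  # [10.0, 15.0, 20.0]
```

Swapping `historical_source` for a live consumer changes nothing downstream, which is the point Langseth is making about end-to-end stream handling.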

Throw That Batch In The Stream

Also we believe that those who have proposed a “Lambda Architecture,” effectively separating paths for real-time and batched data, are espousing an unnecessary trade-off, optimized for legacy tooling that simply wasn’t engineered to handle streams of data be they historical or real-time.

At Zoomdata we believe it is not necessary to track real-time and historical data separately, as there is now end-to-end tooling that can handle both, from sourcing to transport to storage to analysis and visualization.

RW: So this shift toward streaming data is real, and not hype?

JL: It’s real. It’s affecting modern deployments right now, as architects realize that it isn’t necessary to ever batch up data, at all, if it can be handled as a stream end-to-end. This massively simplifies Big Data architectures if you don’t need to worry about batch windows, recovering from batch process failures, etc.

So again, even if you don’t need to analyze data from five seconds or even five minutes ago to make business decisions, it still may be simplest and easiest to handle the data as a stream. This is a radical departure from the way things in big data have been done before, as Hadoop encouraged batch thinking.

But it is much easier to just handle data as a stream, even if you don’t care at all—or perhaps not yet—about real-time analysis.

RW: So is streaming analytics what Big Data really means?

JL: Yes. Data is just like water, or electricity. You can put water in bottles, or electricity in batteries, and ship them around the world by planes, trains, and automobiles. For some liquids, such as Dom Perignon, this makes sense. For other liquids, and for electricity, it makes sense to deliver them as a stream through wires or pipes. It’s simply more efficient if you don’t need to worry about batching it up and dealing with it in batches.

Data is very similar. It’s easier to stream big data end-to-end than it is to bottle it up.

Article originally appeared HERE.

Source by analyticsweekpick

Statistics: Is This Big Data’s Biggest Hurdle?

Big Data is less about the data itself and more about what you do with the data. The application of statistics and statistical principles on the data helps you extract the information it contains. According to Wikipedia, statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. The American Statistical Association defines statistics as “the science of learning from data, and of measuring, controlling, and communicating uncertainty.”

Statistics is considered to be one of the three primary pillars of the field of data science (the other two are content domain knowledge and computer science skills). Content domain expertise provides the context through which you identify the relevant questions to ask; computer science skills help you access the relevant data and prepare them for analysis; and statistics helps you interrogate those data to answer your questions.

The Rise of Statistics

We have a lot of data and are generating a lot more of it. IDC says that we created 2.8 zettabytes in 2012. They estimate that number will grow to 40 zettabytes by 2020. It’s not surprising that Hal Varian, chief economist at Google, in 2009, said that “the sexy job in the next 10 years will be statisticians.” Statistics, after all, helps make sense of and get insight from data. The importance of statistics and statistical thinking in our datafied world can also be found in this excellent slideshare by Diego Kuonen, a statistician.

Figure 1. The Hottest Skill on LinkedIn in 2014: Statistical Analysis and Data Mining

Statistical skills are receiving increasing attention in the world of business and education. LinkedIn found that statistical analysis and data mining was the hottest skill in 2014 (see Figure 1).

Many companies are pursuing statistics-savvy people to help them make sense of their quickly-expanding, ever-growing, complex data. Job postings on Indeed show that the number of data science jobs continue to grow (see Figure 2).

Figure 2. Growth rate for Data Science jobs continues to increase.

University students are flocking to the field of statistics. Of the STEM Professions, statistics has been the fastest growing undergraduate degree over the past four years (see Figure 3).

Figure 3. Of the STEM fields, statistics has the highest growth rate.

The Fall of Statistics

The value of statistics is evident by the increase in number of statistics degrees and the Big Data jobs requiring statistical skills. These are encouraging headlines, no doubt, as more businesses are adopting what scientists have been using to solve problems for decades. But here are a few troubling trends that need to be considered in our world of Big Data.

McKinsey estimates that the US faces a shortage of up to 190,000 people with analytics expertise to fill these data science jobs as well as a shortage of 1.5 million people to fill managerial and analyst jobs who can understand and make decisions based on the data. Where will we find these statistics-savvy people to fill the jobs of tomorrow? We may have to look outside the US.

Figure 4. USA Ranks 27th in the world on math literacy of 15-year-old students.

In a worldwide study of 15-year-old students’ reading, mathematics, and science literacy (the Program for International Student Assessment, or PISA), researchers found that US teenagers ranked 27th out of 34 countries in math literacy (see Figure 4), with many countries scoring significantly higher than the US. According to the NY Times, while 13% of students across industrialized nations reached the top two levels of proficiency in math, just 9% of US students did. In comparison, 55% of students from Shanghai reached that level of proficiency. In Singapore, 40% did.

Even the general US public is showing decreased interest in statistics. Using Google Trends, I looked at the popularity of the search term “statistics” among the general US public, comparing it with “analytics” and “big data.” While the number of searches for “big data” and “analytics” has increased, the number of searches for “statistics” has decreased steadily since 2004.

Summary and Major Trends

Statistics is the science of learning from data. Statistics and statistical thinking help people understand the importance of data collection, analysis, interpretation, and reporting of results.

In our Big Data world, statistical skills are becoming increasingly important for businesses. Companies are creating analytics-intensive jobs for statistics-savvy people, and universities are churning out more graduates with statistics degrees. On the other hand, there is expected to be a huge talent gap in the analytics industry. Additionally, the math literacy of US students is very low compared to the rest of the world. Finally, the US general public’s interest in statistics has been decreasing steadily for about a decade.

Knowledge of Statistics Needed for Both Analysts and Consumers

Statistics and statistical knowledge are not just for people who analyze data. They are also for people who consume, interpret, and make decisions based on the analysis of those data. Think of the data from wearable devices, home monitoring systems, and health records, and how they are turned into reports for fitness buffs, homeowners, and patients. Think of CRM systems, customer surveys, social media posts, and review sites, and how dashboards are created to help front-line employees make better decisions to improve the customer experience.

The better their grasp of statistics, the more insight, value, and use people will get from the data. In a recent study, I found that customer experience professionals had difficulty estimating the size of customer segments based on customer survey metrics. Even though these professionals commonly use customer survey metrics to understand their customers, they showed extreme bias when solving this relatively simple problem. I assert that they would likely make fewer errors if they understood statistics.
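To make the flavor of that estimation problem concrete, here is a minimal Python sketch with invented numbers (hypothetical, not from the study): if a survey metric is roughly normally distributed, basic statistics gives the expected share of customers above a cutoff.

```python
from statistics import NormalDist

# Hypothetical survey metric: mean score 7.2, standard deviation 1.5
scores = NormalDist(mu=7.2, sigma=1.5)

# Estimated share of customers scoring 9 or above on a 0-10 scale
promoter_share = 1 - scores.cdf(9)
print(f"{promoter_share:.1%}")  # about 11.5%
```

Professionals who reason from the mean alone tend to over- or under-estimate such segment sizes; a one-line distributional calculation avoids that bias.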

To get value from the data, you need to make sense of it, do something with it. How you do that is through statistics and applying statistical thinking to your data. Statistics is a way of helping people get value from their data. As the number of things that get quantified (e.g., data) continues to grow, so will the value of statistics.

The Most Important Thing People Need to Know about Statistics

Statistics is the language of data. Like knowledge of your native language helps you maneuver in the world of words, statistics will help you maneuver in the world of data. As the world around us becomes more quantified, statistical skills will become more and more essential in our daily lives. If you want to make sense of our data-intensive world, you will need to understand statistics.

I’m not saying that everyone needs an in-depth knowledge of statistics, but I do believe that everybody would benefit from knowing basic statistical concepts and principles. What is the most important thing you think people need to know about statistics and why? I would love to hear your answers in the comments section. Here is my take on this question.


Source by bobehayes

Big data and predictive analytics likely to dominate recruitment, TeamLease report says

CHENNAI: New areas such as big data and predictive analytics are emerging as the most coveted in the Indian recruitment space and also accelerating the need for a highly sophisticated workforce, says TeamLease Services’ employment outlook report for the half year period from April to September 2015.

The report expects demand for analytics skills to far outstrip supply. The skills required for a data analytics function typically relate to mathematics and statistics, besides programming skills and an ability to comb through the lines of data generated by businesses to unearth valuable patterns.

According to TeamLease, other key trends that are likely to dominate recruitment industry over the next six months include increasing demand for Information Technology (IT), engineering and other blue collar jobs, emergence of startups as key hirers and the increased adoption of Recruitment Process Outsourcing (RPO).

The report states that both the business and employment outlooks are likely to dip marginally in the April-September period, indicating a consolidation by India Inc. While business sentiment is expected to see a one-point drop, the employment outlook is likely to be down by two points.

Photo courtesy of The Times of India

“However, the cautiousness seems to have not dampened job growth and it remains strong at 11.3%, although lower than last half year, it is significantly better than the previous year,” TeamLease said in a release.

Industry-wise analysis showed that retail (led largely by online) and manufacturing/engineering, which clocked three- and four-point increases in business and employment outlook respectively, are pushing the overall sentiment upwards, while telecom seems to be the laggard.

Geographically, Mumbai and Delhi continue to showcase robust business and hiring activity, while Chennai’s growth is restricted only to business.

Pune, with a three-point increase in hiring, is on a hiring spree, the report says. It expects hiring in tier 2 towns to fall by two points in the current half year.

Functionally, the report says sales & marketing seems to have lost its sheen in the employment market, with the focus shifting to IT and engineering roles.

“As the economic fundamentals that drive business and hiring sentiments last half year continue to exist, the current dip is more a course correction than a downturn. The pro-industry announcements and easing of norms coupled by the resurgence in the GDP growth will definitely pull the hiring back onto the growth trajectory,” Kunal Sen, senior vice-president of TeamLease Services, said in a statement.

The report stresses the growing requirement for talent in delivery/logistics, facility management, mobile applications, and data science, and lists content curator, dental hygienist, and valuation and market risk analyst as a few sought-after roles.

To read the original article from The Times of India, click here.

Originally Posted at: Big data and predictive analytics likely to dominate recruitment, TeamLease report says by analyticsweekpick

The Blueprint for Becoming Data Driven: Data Quality

Data quality is the foundation upon which data-driven culture rests. Before upper level management can embrace data-centric processes, before analytics and reliance on big data becomes pervasive, and before data stewardship can truly extend outside IT’s boundaries to become embraced by the business, organizations must trust their data.

Trustworthy data meets data quality measures for…

  • Parsing
  • Cleansing
  • Profiling
  • De-duplicating
  • Modeling

…and is the reliable, consistent basis for accurate analytics and the value data provides to optimize business processes.

By ensuring data quality, organizations are laying the foundation for becoming data driven both explicitly and implicitly. Explicit manifestations of this shift include an increased reliance on data, greater valuation of data as an asset, and an entrenchment of data as a means of optimizing business. Implicitly, the daily upkeep of data-driven processes becomes second nature as aspects of data stewardship, provenance, integration, and even modeling simply become due diligence for everyone’s job.

According to Tamr head of product and strategy Nidhi Aggarwal, the quintessential manifestation of a data-centered culture may be reflected in another way that delivers even greater pecuniary benefits—the utilization of all enterprise data. “People talk about the democratization of analytics and about being truly data driven,” Aggarwal commented. “You cannot do that if you’re only using 20 percent of your data.”

Cognitive Data Science Automates Data Quality Measures
Data quality is largely conceived of as the output of assiduous data preparation. It is effectively ensured via the deployment of the machine learning technologies that are responsible for automating critical facets of data science: particularly the data cleansing and preparation that can otherwise become too laborious. Contemporary platforms for data quality establish machine learning models that map data of all types to specific measures for quality—and which can also include additional facets of data preparation, such as transformation. “Our machine learning models are able to do that really fast, really cheap, and improve it over time as we see more and more data,” Aggarwal noted.

With machine learning, quality measures begin by mapping relevant data sources to one another to determine how their attributes relate. The cognitive prowess of these models is demonstrated in their ability to sift through individual records for these data sources (which may be bountiful). In doing so, they identify points of redundancy, relationships between data, names and terms, how recent data are, and many other facets of data quality. “It’s very difficult for a human to do that,” Aggarwal said. “With machine learning, by doing statistical analysis, by looking at all of the attributes, by looking at these rules that some domain experts provide to the models, by looking at how the humans answered the questions that we presented as samples to them, it makes decisions about how these things should be de-duplicated.”
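As a greatly simplified, hypothetical sketch of the de-duplication idea (Tamr’s actual models use learned features, expert rules, and human feedback, none of which appear here), even a naive string-similarity pass can flag candidate duplicate records for review:

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; real systems use richer features."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicates(records: list, threshold: float = 0.65) -> list:
    """Flag record pairs similar enough to be candidate duplicates.

    The threshold is a tuning choice; in practice it would be learned
    from domain-expert answers rather than hard-coded.
    """
    return [(a, b) for a, b in combinations(records, 2)
            if similarity(a, b) >= threshold]

suppliers = ["Acme Corp.", "ACME Corporation", "Globex Inc", "Initech"]
dupes = find_duplicates(suppliers)
print(dupes)  # [('Acme Corp.', 'ACME Corporation')]
```

A human reviewer (or a trained model) would then confirm that the flagged pair really is the same supplier, which is the loop of sampled questions Aggarwal describes.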

Reinforcing Trust with Provenance and Natural Language Processing
Competitive preparation platforms that facilitate data quality temper the quality measures of cognitive computing with human involvement. The result is extremely detailed data provenance which reinforces trust in data quality, and which is easily traced for the purposes of assurance. The decisions that domain experts make about how sources are unified and relate to each other for specific data types—which is critical to establishing data quality—are recorded and stored in platforms for traceability. Thus, there is little ambiguity about who made a decision, when, and what effect it had on the underlying machine learning model for how data was unified and defined to establish data quality. Natural Language Processing is involved in the data quality process (especially with unstructured text) by helping to reconcile definitions, different terms, and commonalities between terms and how they are phrased. The pivotal trust required for becoming data-driven is therefore facilitated with both machine learning and human expertise.

Metadata and Evolving Models
The granular nature of a machine learning approach tempered by human oversight naturally lends itself to metadata and to incorporating new data sources into quality measures. Metadata is identified and compared between sources to ensure unification for specific use cases and requisite data quality. The true value of this cognitive approach to data quality is evinced when additional data sources are included. According to Aggarwal: “People can do this manual mapping if they only wanted to do it manually once. But the trouble is when they have to add a new data source, it’s almost as much effort as doing it the first time.” However, the semantic technologies that form the crux of machine learning are able to incorporate new sources into models so that “the model can actually look at the new data set, profile it really quickly, and figure out where it maps to all the things that it previously knows about,” Aggarwal said.

More significantly, the underlying machine learning model can evolve alongside data sets that are radically dissimilar from its initial ones. “Then the model updates itself to include this new data,” Aggarwal mentioned. “So when a new data set comes in further down the line, the chances are that it will be completely new and that the models don’t align with it go lower and lower every time.” The time saved by this expedited process of updating the models required for data quality underscores the agility needed to further trust data when transitioning to becoming data driven.

Using All Data
When organizations are able to trust their data because of the aforementioned rigorous standards for data quality, they are able to incorporate more data into business processes. The mapping procedures previously outlined help organizations bring all of their data together and determine which of it relates to a specific use case. The monetary boons of incorporating all enterprise data into business processes are exemplified by a use case from the procurement vertical. Were a company attempting to determine how many suppliers it had and whether it was getting the best payment terms from them, those that were not data savvy could only use a finite amount of their overall data—limited to particular business units—to answer the question. Those that were truly data driven and able to incorporate all of their data for this undertaking could draw on input from a greater number of business units. According to Aggarwal, who encountered this situation with a Tamr customer:

“There were wildly different payment terms for the same supplies. When we dug into what parts they were buying from the suppliers and at what prices across the different business units, there were sometimes 300X differences in the price of the same part.” Unifying one’s data for uniform quality measures is integral to identifying these variances, which translates into quantifiable financial advantages. “An individual decision might save them a few hundred dollars here and there,” Aggarwal remarked. “Collectively, optimizing their decisions every single day has saved them millions and millions of dollars over time. That’s the power of bringing all data together.”

Citizen Stewardship and Business Engagement
The pervasiveness of data reliance and the value it creates for decision-making and business processes are intrinsically engendered through the trust gained from a firm foundation in data quality. By utilizing timely, reliable data that is consistent in terms of metadata, attributes, and records management, organizations can transition to a data-centric culture. The products of such a culture are the foregoing cost advantages attributed to improved decision-making. The by-products are streamlined data preparation, improved provenance, upper-level management support, aligned metadata, and an appreciation of data’s value and upkeep on the part of the business users who depend on it most.

Aggarwal commented that increased data quality processes facilitated by machine learning and human oversight result in: “A broader dialogue about data in terms of stewardship. Today stewardship is in the hands of IT people basically who don’t have business context. What [we do] is take that stewardship and engage the business people who actually know something about the data much sooner in the process of data quality. That’s how they get to higher data quality, faster.”

And that’s also how they become data driven, faster.

Originally Posted at: The Blueprint for Becoming Data Driven: Data Quality

3 S for Building Big Data Analytics Tool of the Future


There is a huge debate over what constitutes the Big Data analytics tool of the future, and many have jumped into the race with their own flavor of solutions or problem-solving techniques addressing the critical use cases at play in data-laden businesses. While new businesses work at it, what fundamental theory of product design strategy could help create something built for the future: solutions with the ability to stay competitive and relevant in the current times?

Searching for some answers, I stumbled upon a video of Christopher Lynch from Atlas Venture (the Finance/Insurance track keynote at @AnalyticsWeek Boston’s first unconference). He made some interesting points on promising focus areas for new opportunities. You can watch the video below (click to jump to the specific bit; I’d also recommend watching the whole talk, as it has lots of great points on the current big data ecosystem). He touched on three S’s, Simplicity, Scalability, and Security, as the fundamental areas for big data analytics companies. He certainly has an interesting perspective and provides good coverage of the current disruptive opportunity areas. I have some coinciding thoughts briefly mentioned in the ebook Data Driven Innovation – A Primer (download free here), where I touched on the 3 S’s we should build into our products to drive much-needed disruption in the big data space. My list was Small, Simple, and Scale, so it’s great that two of my three areas match the ones Chris slated.

3S’s that I think will shape the future of Big Data Analytics and Why:

1. Small: Yes, Big Data is big, but the solution should be small. Reduce the scope of the product to the one magical thing that solves a potential use case. Wearing a systems architect’s hat, one could easily vote for this: small solutions tend to scale well and are more often than not simple to understand. Small is where the toughest part of planning goes. If 80% is planning and 20% is execution, it is safe to say that 80% should be spent making the solution smaller: a quick bite size for easy adoption.

2. Simple: This is a no-brainer in the world of software engineering. Simplicity always triumphs over a ginormous, complicated product. Sure, complexity sells, but as a service, not as a product. Who hasn’t heard the quote, “If you can’t explain it simply, you don’t understand it well enough”? It applies to good system design and hence to good big data product design. Simple solutions are understood quickly and therefore meet easy adoption, and hence better sales. In fact, it is safe to say that simplicity is the most important of the three aspects listed here.

3. Scale: This mostly comes free if you get the first two right, but there have been times when a simple, small solution failed to scale. Scalability is another good area of focus for disruption. A good unit-size, simple tool can be replicated over and over, which builds the element of scalability into the tool. A good system should be able to grow with the company it is helping to grow; a tool that cannot travel for the long ride will often see diminishing adoption right from the beginning. Happily, this point is the easiest to achieve if the two above are taken into consideration. Scalability is especially important for adoption among big businesses that deal with big blobs of data.

I certainly agree with Chris’s point about the importance of security for easy adoption in the enterprise world, and I should probably add it as my fourth S. So, all power to you, and congratulations on your disruptive platform if you’ve built your science around those four S’s. The world needs your product. The Big Data analytics world is craving disruption from tools that today serve only the 1%, while the other 99% wait and watch for the tools to reach their capability levels, or else must up their game and buy into an ocean of tools that are complicated, super sticky, and expensive to fail with.

Till that day arrives, all we have to do is write in front of us: Simple, Small & Scalable, and keep them in mind as we build solutions.

Here’s the video (short on time? Skip ahead to 4m 10s):

Originally Posted at: 3 S for Building Big Data Analytics Tool of the Future

The One Number You Need to Grow (A Replication)

The one number you need to grow.

That was the title of the 2003 HBR article by Fred Reichheld that introduced the Net Promoter Score as a way to measure customer loyalty.

It’s a strong claim that a single attitudinal item can portend company success. And strong claims need strong evidence (or at least corroborating evidence).
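For context, the NPS is computed from the single 0–10 “How likely are you to recommend…” item: the percentage of promoters (9–10) minus the percentage of detractors (0–6). A minimal sketch, using hypothetical ratings:

```python
def net_promoter_score(ratings):
    """Return the NPS (-100 to 100) from a list of 0-10 LTR ratings.

    9-10 = promoters, 7-8 = passives, 0-6 = detractors.
    """
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / n

# Hypothetical sample: 5 promoters, 3 passives, 2 detractors out of 10
ratings = [10, 9, 9, 10, 9, 8, 7, 7, 5, 3]
print(net_promoter_score(ratings))  # -> 30.0
```

Note that passives affect the score only by diluting the denominator, which is one reason mean scores and net scores can rank companies differently.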

In an earlier article, I examined the original evidence put forth by Reichheld, looked for other published evidence, and discussed the findings at the event How Harmful Is the Net Promoter Score?

To establish the validity and make the claim that the NPS predicts growth, Fred Reichheld reported that the NPS was the best or second-best predictor of growth in 11 of 14 industries (p. 28).

The data he provided in the appendix of his 2006 book The Ultimate Question to support the relationship shows data from 35 companies in six industries (computers, life insurance, Korean auto insurance, U.S. airlines, Internet Service Providers, and UK supermarkets). His 2003 HBR article contained five more companies and one additional industry (rental cars) for a total of 40 companies and 7 industries.

Close examination of the data reveals that Reichheld used historical, not future growth. He showed the three-year average growth rates (1999–2002) correlated with the two-year average Net Promoter Scores (2001–2002). In other words, the NPS correlated with past growth rates (as opposed to future growth rates). This does establish validity (a sort of concurrent validity) but not predictive validity.

To assess the predictive ability of the NPS, I looked at the U.S. airline industry in 2013 and found a strong correlation between future growth and NPS (but only after accounting for a major merger in the industry).

The published literature on the topic in the last 15 years isn’t terribly helpful either. I found eight other studies that examined the NPS’s predictive ability (Figure 1). I was, however, a bit disappointed in the quality of many of the studies given the ubiquity of the Net Promoter Score.

As Figure 1 shows, three of the eight studies found medium to strong correlations but used historical or current revenue (not future). Of the five remaining studies that used future metrics, two were authored by a competitor of Satmetrix (a possible competitive bias) and one was from a book with connections to Satmetrix that was not peer reviewed (and had an agenda to promote the NPS).

Figure 1: Summary of papers examining the NPS and growth (many used historical revenue or had methodological flaws—like not actually using the 11-point LTR item).

Surprisingly, two of the three studies that looked at future metrics didn’t use the 11-point Likelihood to Recommend question (Keiningham et al., 2007b; Morgan and Rego, 2006). One study that used a 10-point version found no correlation with business growth, and in fact no correlation with any firm-level metric for the three Norwegian industries it examined (Keiningham et al., 2007a), an unusual finding given that all the other studies found some correlation.

Only the study by de Haan et al. (2015) actually used the 11-point Likelihood to Recommend item and found the Net Promoter Score did have a small correlation with future intent (collected in a longitudinal study). It wasn’t the best predictor, but it did correlate with future metrics (which was similar to the finding from the study by Keiningham et al., 2007b using a 5-point LTR).

I think there are at least two reasons for the dearth of published data examining the NPS and growth:

  1. Little upside: There’s little upside for Satmetrix and Reichheld to fund and publish more research to establish the predictive validity of the NPS. If it’s already in wide usage (most Fortune 500 companies use it), then there’s little to gain. That Reichheld didn’t include more data in his 2nd edition of The Ultimate Question likely supports this. (He even excluded the appendix that was in the 1st.)
  2. It’s difficult: Predicting revenue at the customer or company level requires data from two points in time. Longitudinal data takes time to collect (by definition years in this case). It’s also hard to associate attitudinal data to financial performance. Companies have little reason to expose their own data and third-party firms have trouble getting access.

Predicting Future Growth with the Original Data

A few of the papers cited above pointed out the problem with Reichheld using historical revenue to show future growth, but none that I found actually checked whether the published NPS data predicted future growth for the same industries. Keiningham (2007a) did use some of Reichheld’s data to show that the American Customer Satisfaction Index was an equal or better predictor of historical revenue, but didn’t look at future growth.

So, I revisited the very data used to establish the NPS validity—the 1999–2002 Net Promoter Score data Reichheld published in his 2006 book appendix and 2003 HBR article.

With the help of research assistants, I dug through old annual reports, press releases, articles, and the Internet Archive to match the financial metrics collected more than 15 years ago. It wasn’t easy, as many companies merged or went out of business, and whole industries morphed (AOL anyone?). We had to piece together numbers from many different sources and make some assumptions (noted below).

After several weeks of digging we had good results and were able to find data for the same six industries used in the 2006 book plus the one industry included in the HBR article for the years 2002–2006. Table 1 shows the industry, the metric we used, the year the NPS data was reported in Reichheld’s book, the current/historical years Reichheld used, and then the years we found data for to predict future growth.

Industry | Metric | NPS Data | Reichheld Years | Our Future Years
U.S. PC market | PC shipments | 2001-2001 | 1999-2002 | 2002-2005
U.S. life insurance market | Life premiums | 2001-2002 | 1999-2003 | 2002-2005
U.S. airlines market | Sales | 2001-2002 | 1999-2002 | 2002-2005
U.S. Internet Service Providers | Sales | 2002 | 1999-2002 | 2002-2005
U.S. car rental market | Revenue | 2002 | 1999-2002 | 2002-2005
UK supermarkets | Sales | 2003 | 1999-2003 | 2003-2006
Korean auto insurance | Sales | 2003 | 2001-2003 | 2003-2006

Table 1: Industries used to establish the predictive ability of the Net Promoter Score from The Ultimate Question and the 2003 HBR article.


We used two future growth periods to assess the predictive validity of the NPS. The first is the two years immediately following the NPS data (graphed below): 2002–2003 for the U.S. industries and 2003–2004 for the international industries (matching the years of NPS data Reichheld used). The second is a longer period of three to four years of growth (2002–2005 for U.S. industries and 2003–2006 for international). We computed Pearson correlations for each industry, averaged the correlations using the Fisher z transformation to account for the non-normality of correlations, and finally converted the averages to R2 values to match the fit statistic reported in The Ultimate Question.
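The averaging step can be sketched as follows; the per-industry correlations shown are hypothetical placeholders, not our actual values:

```python
import math

def fisher_avg_r(correlations):
    """Average Pearson correlations via the Fisher r-to-z transform.

    Averaging in z-space avoids the bias from averaging skewed r values
    directly; atanh is the r-to-z transform and tanh converts back.
    """
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))

# Hypothetical per-industry correlations of NPS with growth
rs = [0.52, 0.62, 0.28, 0.45, 0.28, 0.87, 0.69]
r_bar = fisher_avg_r(rs)
r2 = r_bar ** 2  # convert back to R^2 to match The Ultimate Question
print(round(r2, 2))
```

With equal correlations the transform is a no-op; the difference from a plain arithmetic mean grows as the correlations spread toward the extremes.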

Reichheld notes that using the log of the change in NPS boosted the explanatory power (R2) of the NPS, but he reported only raw NPS numbers in the appendix. With only one year of NPS data, we had no changes in the NPS, so we replicated the appendix approach using the single Net Promoter Scores.

Table 2 shows the results for Reichheld’s originally reported R2 values using current or historical revenue and our R2 values for the subsequent two and four years.

A bit to my surprise (given the many vocal critics and lack of published data), we found evidence that the Net Promoter Score predicted growth in both the subsequent two- and four-year periods. On average we found the Net Promoter Scores reported by Reichheld explained 38% of the changes in growth for the seven industries examined for the immediate two years (low of 8% to a high of 76%). The explanatory power decreased some when the future period increased (which is not too surprising given what can change in four years). For the four-year period, the average explanatory power of the NPS is still 30% (low of 4% to a high of 79%).

To put these R2 values into perspective, the SAT can explain (predict) around 25% of first year college grades, which means these R2 values are impressively large.

Industry | Reichheld Historical R2 | 2-Year Future Growth R2 | 4-Year Future Growth R2
U.S. PC market | 68% | 27% | 75%
U.S. life insurance market | 86% | 39% | 4%
U.S. airlines market | 68% | 8% | 22%
U.S. Internet Service Providers | 93% | 20% | 2%
U.S. car rentals | 28% | 8% | 8%
UK supermarkets | 84% | 76% | 79%
Korean auto insurance | 68%* | 48% | 12%
Avg R2 (Fisher transformed) | 76% | 38% | 30%

Table 2: R2 values of seven industries from Reichheld’s NPS data compared to historically reported revenue and two-year and four-year growth rates by industry. The Fisher R to Z transformation was used to average the correlations before converting to R2 averages. *Reichheld reported an R2 of 68% for Korean auto but our replication from the scatterplots generated a value of ~30%. See other notes below by industry.

Below we have re-created the bubble scatterplots from Reichheld and compared them with our two-year future data. We estimated the regression lines, R2 values, and bubble sizes using an approach similar to the one described in Keiningham et al. (2007a).
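For any one chart, the estimate reduces to fitting a line to the digitized points and squaring the Pearson correlation. A minimal sketch with hypothetical NPS/growth pairs (not values read from any of Reichheld’s charts):

```python
import numpy as np

# Hypothetical points digitized from one bubble chart:
# x = company NPS, y = revenue growth rate (%)
nps = np.array([20.0, 35.0, 50.0, 60.0, 75.0])
growth = np.array([-2.0, 1.0, 4.0, 8.0, 11.0])

# Ordinary least squares fit for the regression line
slope, intercept = np.polyfit(nps, growth, 1)

# Pearson correlation; its square is the R^2 reported on each chart
r = np.corrcoef(nps, growth)[0, 1]
print(round(slope, 3), round(r ** 2, 3))
```

A weighted fit (weighting points by company size, as the bubble areas suggest) would be a natural variation, but the simple unweighted version shown here is the baseline calculation.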

PC Shipments

Historical R2 = 74% Future (2 Years): R2 = 27%

Note: Compaq was purchased by HP, so it is not included in future years. IBM sold its PC business to Lenovo in 2005, so its calculation only includes growth rates between 2002–2004 instead of 2002–2005. Gateway merged with eMachines in 2004; its growth rates are also 2002–2004 and include only Gateway numbers.

US Life Insurance

Historical R2 = 86% Future (2 Years): R2 = 39%

Note: For Prudential we used growth rates in British pounds, but bubble size on the chart is determined by converted number of life premiums in U.S. dollars.

US Airlines

Historical R2 = 66% Future (2 Years): R2 = 8%

Note: TWA stopped operations in 2001 and wasn’t included in the calculation for future years. America West Airlines’ four-year growth period is 2002–2004, as it merged with US Airways Group in 2005.

Internet Service Providers (ISPs)

Historical R2 = 89% Future (2 Years): R2 = 20%


UK Grocery Stores

Historical R2 = 81% Future (2 Years): R2 = 76%

Note: For ASDA we used growth rates in USD, but the bubble size on the chart is determined by converted number of sales in British pounds.

Korean Auto Insurance

Historical R2 = 68%/30%* Future (2 Years): R2 = 48%

Note: Reichheld reports an R2 of 68% but we calculated a much lower R2 of 30% from the same data.

U.S. Rental Cars

Historical R2 = 28% Future (2 Years): R2 = 17%

Note: In 2003, Vanguard Car Rental Group purchased the National and Alamo brands and didn’t report their revenue separately, so they are excluded from the future analysis.



A re-examination of the original NPS data using future (rather than historical revenue growth) found:

The NPS explains immediate firm growth in selected industries. On average we found NPS data can explain 38% of the variability in company growth metrics in seven industries at the company/firm level. This is less than half the explanatory power of historical growth reported by Reichheld (76%) but still represents a substantial amount relative to other behavioral science measures. While not as impressive, it still suggests the NPS is a leading indicator of future growth rates, at least in some selected industries for some time periods at the company level.

The NPS is still predictive of more distant growth. The explanatory power of the NPS still remained at a solid 30% for a four-year future growth period. This suggests that established company policies and growth patterns can remain in effect for years (but not always) and the NPS may still portend the more distant future (again in these selected industries and years).

Industry changes are hard to predict with few data points. Companies merge, industries morph, and unexpected changes can happen that affect a company’s growth and consequently the predictive ability of any measure, including the NPS. This was seen in the car rental industry (National merged), the PC industry (IBM sold to Lenovo), and the airline industry (TWA was acquired after bankruptcy). When an industry has few data points (e.g., ISPs with only three), only the strongest relationships are detectable, and small changes in one year can completely remove any evidence of a relationship between the NPS and growth.

Prediction is imprecise. The NPS may be a victim of its own success: the hype leads many to dismiss it unless it’s a perfect predictor of growth. (After all, the headline said it’s the ONE number you need to grow!) Making predictions is difficult and imprecise, but this analysis suggests the NPS has reasonable predictive ability, at least as high as other high-stakes measures like college entrance exams. Given our earlier analyses on satisfaction, it’s unlikely to always be the superior measure in every industry, but this data again suggests it may be an adequate proxy for future growth in many industries.

There is a possible selection bias. We limited our analysis to the industries, companies, and metrics reported by Reichheld. It’s likely that these are the best illustrations of the NPS’s predictive (or post-dictive) ability and may not be representative of all industries. Reichheld himself reported that the NPS wasn’t always the best predictor of growth (only in 11/14 industries). A future analysis will look at a broader range of the seven industries shown here as well as examinations at the customer level.



Below are the sources where we found growth rates to match those reported in Reichheld so you can check our work and assumptions (let us know if you see a discrepancy).

US PC market (All Firms)

US Life insurance market

US Airlines

US Internet Service Providers

UK supermarkets

Korean auto insurance

US Car rental


Source: The One Number You Need to Grow (A Replication) by analyticsweek

Nate-Silvering Small Data Leads to Internet Service Provider (ISP) industry insights

There is much talk of Big Data and how it is changing/impacting how businesses improve the customer experience. In this week’s post, I want to illustrate the value of Small Data.

Internet Service Providers (ISPs) receive the lowest customer satisfaction ratings among the industry sectors measured by the American Customer Satisfaction Index (ACSI). As an industry, then, ISPs have much room for improvement, some more than others. This week, I will use several data sets to determine intra-industry rankings of ISPs and how they can improve their inter-industry ranking.

Table 1. Internet Service Provider Ratings

I took to the Web to find several publicly available and relevant data sets regarding ISPs. In all, I found 12 metrics from seven different sources for 27 ISPs. I combined the data sets by ISP. By merging the different data sources, we will be able to uncover greater insights about these different ISPs and what they need to do to increase customer loyalty. The final data set appears in Table 1. The description of each metric appears below:

  • Broadband type: The types of broadband come from a PCMag article.
  • Actual ISP Speed: Average speed for Netflix streams in November 2012, measured in megabits per second (Mbps).
  • American Customer Satisfaction Index (ACSI): an overall measure of customer satisfaction from 2013. Ratings can vary from 0 to 100.
  • Temkin Loyalty Ratings: Based on three likelihood questions (repurchase, switch and recommend) from 2012. Questions are combined and reported as a “net score,” similar to the NPS methodology. Net scores can range from -100 to 100.
  • JD Power: A 5-star rating system for overall satisfaction from 2012. 5 Star = Among the best; 4 Star = Better than most; 3 Star = About average; 2 Star = The rest.
  • PCMag Ratings (6 metrics: Recommend to Fees): Ratings based on customer survey that measured different CX areas in 2012. Ratings are based on a 10-point scale.
  • DSL Reports: The average customer rating across five areas: 1) Pre-Sales Information, 2) Install Coordination, 3) Connection Reliability, 4) Tech Support, and 5) Value for Money. Data were pulled from the site on 6/30/2013. Ratings are based on a 5-point scale.

As you can see in Table 1, there is much missing data for some of the 27 ISPs. The missing data do not necessarily reflect the quality of the data that appear in the table. These sources simply did not collect data to provide reliable ratings for each ISP or simply did not attempt to collect data for each ISP. The descriptive statistics for and correlations among the study variables appear in Table 2.
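Because each metric is missing for different ISPs, correlations like those in Table 2 have to be computed on pairwise-complete observations, i.e., each correlation uses only the ISPs that have both metrics present. A small sketch with hypothetical numbers (pandas does this by default):

```python
import numpy as np
import pandas as pd

# Toy slice of a ratings table (hypothetical values) with missing entries,
# as in the real data: not every source rated every ISP.
df = pd.DataFrame({
    "speed_mbps": [2.2, 1.9, 1.5, 1.4, np.nan],
    "acsi":       [73, 71, 68, np.nan, 63],
    "temkin":     [18, np.nan, 5, -2, -10],
})

# DataFrame.corr() computes Pearson correlations on pairwise-complete
# observations; min_periods guards against cells based on too few pairs.
print(df.corr(min_periods=2))
```

One caveat of the pairwise approach: different cells of the correlation matrix can be based on different subsets of ISPs, so the matrix should be read with the sample sizes in mind.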

Table 2. Descriptive Statistics of and Correlations among Study Variables

It’s all about Speed

Customer experience management research tells us that one way to improve satisfaction is to improve the customer experience. We see that the actual speed of an ISP is positively related to most customer ratings, suggesting that ISPs with faster speeds have more satisfied customers than ISPs with slower speeds. The only exception is satisfaction with Fees: ISPs with faster actual speeds tend to have customers who are less satisfied with fees than ISPs with slower speeds.

Nate-Silvering the Data

Table 3. Rescaled Values of Customer Loyalty Metrics for Internet Service Providers

Recall that Nate Silver aggregated several polls to make accurate predictions about the results of the 2012 presidential elections. Even though different polls, due to sampling error, had different outcomes (sometimes Obama won, sometimes Romney won), the aggregation of different polls resulted in a clearer picture of who was really likely to win.

In the current study, we have five different survey vendors (ACSI, Temkin, JD Power, PCMag, and DSLReports.com) assessing customer satisfaction with ISPs. Depending on which survey vendor you use, the rankings of ISPs differ. We can get a clearer picture of the rankings by combining the different data sources, because a single study is less reliable than the combination of many studies. While the outcome of aggregating customer surveys may not be as interesting as aggregating presidential polls, the general approach Silver used can be applied to the current data (I call it Nate-Silvering the data).

Given that the average correlations among the loyalty-related metrics in Table 2 are rather high (average r = .77; median r = .87), aggregating each metric to form an Overall Advocacy Loyalty metric makes mathematical sense. This overall score would be a much more reliable indicator of the quality of an ISP than any single rating by itself.

To facilitate the aggregation process, I first transformed the customer ratings to a common 100-point scale using the following methods. I transformed the Temkin ratings (a net score) into mean scores using a mathematical model developed for this purpose (see: The Best Likelihood to Recommend Metric: Mean Score or Net Promoter Score?); this value was then multiplied by 10. The remaining metrics were transformed to a 100-point scale by multiplying by 20 (JD Power, DSLReports) or by 10 (PCMag Sat, PCMag Rec). These rescaled values are located in Table 3. While the transformations altered the average of each metric, they did not appreciably alter the correlations among the metrics (average r = .75, median r = .82).
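The rescaling and averaging can be sketched as follows. The ratings shown are hypothetical, and the Temkin net-score-to-mean conversion model is only referenced in the text, so it is omitted here:

```python
def to_100(value, scale_max):
    """Linearly rescale a rating from a scale_max-point scale to 0-100."""
    return value * (100.0 / scale_max)

# Hypothetical ratings for one ISP from three vendors
ratings = {"jd_power": 4.0, "dsl_reports": 3.5, "pcmag_sat": 7.8}
scales = {"jd_power": 5, "dsl_reports": 5, "pcmag_sat": 10}  # max of each scale

rescaled = {name: to_100(v, scales[name]) for name, v in ratings.items()}
overall = sum(rescaled.values()) / len(rescaled)  # overall advocacy score
print(rescaled, round(overall, 1))
```

Because the transformation is linear, it changes the means but leaves each metric’s correlations with the others essentially unchanged, which is why the aggregation is safe.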

Table 4. Rankings of Internet Service Providers based on the average loyalty ratings.

The transformed values were averaged for each of the ISPs. These results appear in Table 4. As seen in this table, the top 5 rated ISPs (overall advocacy ratings) are:

  1. WOW!
  2. Verizon FiOS
  3. Cablevision
  4. Earthlink
  5. Bright House

The bottom 5 rated ISPs (overall advocacy ratings) are:

  1. Windstream
  2. CenturyLink
  3. Frontier
  4. WildBlue
  5. HughesNet


Small Data, like its big brother, can provide good insight (with the help of the right analytics, of course) about a given topic. By combining small data sets about ISPs, I was able to show that:

  1. Actual ISP speed is related to customer satisfaction with speed of ISP. ISPs that have objectively faster speed receive higher ratings on satisfaction with speed.
  2. Different survey vendors provide reliable and valid results about customer satisfaction with ISPs (there was a high correlation among different survey vendors).
  3. Improving customer loyalty with ISPs is a function of actual ISP speed.

The bottom line is that you shouldn’t forget the value of small data.

Source: Nate-Silvering Small Data Leads to Internet Service Provider (ISP) industry insights

#Compliance and #Privacy in #Health #Informatics by @BesaBauta


In this podcast, @BesaBauta from MercyFirst talks about the compliance and privacy challenges faced in a hyper-regulated industry. Drawing on her experience in health informatics, Besa shares best practices and challenges faced by data science groups in health informatics and similar groups in regulated spaces. This podcast is great for anyone looking to learn about data science compliance and privacy challenges.

Besa’s Recommended Read:
The Art Of War by Sun Tzu and Lionel Giles https://amzn.to/2Jx2PYm

Podcast Link:
iTunes: http://math.im/itunes
GooglePlay: http://math.im/gplay

Besa’s BIO:
Dr. Besa Bauta is the Chief Data Officer and Chief Compliance Officer for MercyFirst, a social service organization providing health and mental health services to children and adolescents in New York City. She oversees the Research, Evaluation, Analytics, and Compliance for Health (REACH) division, including data governance and security measures, analytics, risk mitigation, and policy initiatives.
She is also an Adjunct Assistant Professor at NYU, and previously worked as a Research Director for a USAID project in Afghanistan, and as the Senior Director of Research and Evaluation at the Center for Evidence-Based Implementation and Research (CEBIR). She holds a Ph.D. in implementation science with a focus on health services, an MPH in Global Health and an MSW. Her research has focused on health systems, mental health, and integration of technology to improve population-level outcomes.

About #Podcast:
The #FutureOfData podcast is a conversation starter that brings leaders, influencers, and leading practitioners on the show to discuss their journeys in creating the data-driven future.

Want to sponsor?
Email us @ info@analyticsweek.com

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Source: #Compliance and #Privacy in #Health #Informatics by @BesaBauta by v1shal

Big Data in China Is a Big Deal

Big data means different things in different regions – in China retailers are finding ways to make it useful.

One thing Western brands have learned from expanding into the East is that Chinese shoppers are a discerning consumer group. They want genuine quality (fake items are no longer acceptable), value (demonstrated by Singles’ Day’s record-shattering sales levels), and VIP treatment.

They also spend a lot of money, with around 250 million of them parting with approximately $275 billion each year on Internet purchases alone. That’s a massive 60 percent of all online purchases in Asia.

It’s a hugely significant retail market, and key to leveraging its potential is the intelligent use of the wealth of data gathered each time a shopper researches a product, visits a store, or makes a purchase.

“Big data” is often a loaded term – it can mean different things to different people, depending on the industry. But retailers have gone some way toward pinning it down and making it useful.

Targeting With Precision 

Perhaps the most important – and profitable – use of consumer data is extracting preferences and patterns of purchase and using the information to offer highly targeted value-added services and products.

For example, Alibaba’s Open Data Processing Service (ODPS) has allowed it to analyze millions of transactions and set up a highly effective loan service for small online businesses. Data from Alipay and Alibaba’s shopping sites, including purchases, reviews, and credit ratings, is used to assess a borrower’s ability to repay a loan.

The use of more than 100 computing models and around 80 billion data entries has allowed Alibaba to reduce the cost of lending to a fraction of the cost of a traditional bank loan.

Of course, this kind of accuracy in consumer targeting opens the door to clienteling and the personalized service customers in China are looking for – the ability to identify with precision the needs and wants of people looking for a superior service.

For a small fee, retailers can use the might of the ODPS’ processing power to identify trends, pinpoint key demographics, and plan future ranges and campaigns aimed to meet the exact requirements of their customers.

Tracking Rogue Traders

Concern over counterfeit products means that consumers are prepared to pay a premium for genuine Western items and will choose online stores such as TMall, which has a reputation for trading in authentic brands. However, the proliferation of fake goods is still a problem, and businesses are turning to big data to help tackle the issue. Following a report by the Chinese government, an e-commerce union comprising key online firms has been established to pool vendor data in an attempt to identify rogue traders through their online shops, transactions, and other sales activity.

The amount of detailed information available to sales platforms should, in theory, mean that there is simply nowhere left to hide for sellers with less than scrupulous product standards.

Transfer of Intelligence

According to the Chinese University of Hong Kong, the three biggest players in China’s online industry, known as “BAT” (Baidu, Alibaba, Tencent) are “sitting on a goldmine of big data.”

The potential for cross-industry application is huge – data integration and mining between retail and financial institutions, for example, will drive the future of both online and physical commerce.

Tapping into the skills and experience of BAT – Baidu alone has thousands of analysts assessing data every day – will give retailers and other industries unprecedented accuracy in profiling the people buying their products and using their services, rich in both detail and opportunity.

The bottom line is that, especially for retailers, big data is a big deal. Expect to see even more sophisticated targeting models and customer-centric business operations coming from China over the next year, thanks to the intelligent use of information.


Originally Posted at: Big Data in China Is a Big Deal