Big data: are we making a big mistake?

Big data is a vague term for a massive phenomenon that has rapidly become an obsession with entrepreneurs, scientists, governments and the media.


Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world’s top scientific journals, Nature. Without needing the results of a single medical check-up, they were nevertheless able to track the spread of influenza across the US. What’s more, they could do it more quickly than the Centers for Disease Control and Prevention (CDC). Google’s tracking had only a day’s delay, compared with the week or more it took for the CDC to assemble a picture based on reports from doctors’ surgeries. Google was faster because it was tracking the outbreak by finding a correlation between what people searched for online and whether they had flu symptoms.

Not only was “Google Flu Trends” quick, accurate and cheap, it was theory-free. Google’s engineers didn’t bother to develop a hypothesis about what search terms – “flu symptoms” or “pharmacies near me” – might be correlated with the spread of the disease itself. The Google team just took their top 50 million search terms and let the algorithms do the work.


The success of Google Flu Trends became emblematic of the hot new trend in business, technology and science: “Big Data”. What, excited journalists asked, can science learn from Google?

As with so many buzzwords, “big data” is a vague term, often thrown around by people with something to sell. Some emphasise the sheer scale of the data sets that now exist – the Large Hadron Collider’s computers, for example, store 15 petabytes a year of data, equivalent to about 15,000 years’ worth of your favourite music.

But the “big data” that interests many companies is what we might call “found data”, the digital exhaust of web searches, credit card payments and mobiles pinging the nearest phone mast. Google Flu Trends was built on found data and it’s this sort of data that interests me here. Such data sets can be even bigger than the LHC data – Facebook’s is – but just as noteworthy is the fact that they are cheap to collect relative to their size, they are a messy collage of datapoints collected for disparate purposes and they can be updated in real time. As our communication, leisure and commerce have moved to the internet and the internet has moved into our phones, our cars and even our glasses, life can be recorded and quantified in a way that would have been hard to imagine just a decade ago.

Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends: that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote “The End of Theory”, a provocative essay published in Wired in 2008, “with enough data, the numbers speak for themselves”.

Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be “complete bollocks. Absolute nonsense.”

Found data underpin the new internet economy as companies such as Google, Facebook and Amazon seek new ways to understand our lives through our data exhaust. Since Edward Snowden’s leaks about the scale and scope of US electronic surveillance it has become apparent that security services are just as fascinated with what they might learn from our data exhaust, too.

Consultants urge the data-naive to wise up to the potential of big data. A recent report from the McKinsey Global Institute reckoned that the US healthcare system could save $300bn a year – $1,000 per American – through better integration and analysis of the data produced by everything from clinical trials to health insurance transactions to smart running shoes.

But while big data promise much to scientists, entrepreneurs and governments, they are doomed to disappoint us if we ignore some very familiar statistical lessons.

“There are a lot of small data problems that occur in big data,” says Spiegelhalter. “They don’t disappear because you’ve got lots of the stuff. They get worse.”

. . .

Four years after the original Nature paper was published, Nature News had sad tidings to convey: the latest flu outbreak had claimed an unexpected victim: Google Flu Trends. After reliably providing a swift and accurate account of flu outbreaks for several winters, the theory-free, data-rich model had lost its nose for where flu was going. Google’s model pointed to a severe outbreak but when the slow-and-steady data from the CDC arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two.

The problem was that Google did not know – could not begin to know – what linked the search terms with the spread of flu. Google’s engineers weren’t trying to figure out what caused what. They were merely finding statistical patterns in the data. They cared about correlation rather than causation. This is common in big data analysis. Figuring out what causes what is hard (impossible, some say). Figuring out what is correlated with what is much cheaper and easier. That is why, according to Viktor Mayer-Schönberger and Kenneth Cukier’s book, Big Data, “causality won’t be discarded, but it is being knocked off its pedestal as the primary fountain of meaning”.

But a theory-free analysis of mere correlations is inevitably fragile. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down. One explanation of the Flu Trends failure is that the news was full of scary stories about flu in December 2012 and that these stories provoked internet searches by people who were healthy. Another possible explanation is that Google’s own search algorithm moved the goalposts when it began automatically suggesting diagnoses when people entered medical symptoms.

Google Flu Trends will bounce back, recalibrated with fresh data – and rightly so. There are many reasons to be excited about the broader opportunities offered to us by the ease with which we can gather and analyse vast data sets. But unless we learn the lessons of this episode, we will find ourselves repeating it.

Statisticians have spent the past 200 years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster and cheaper these days – but we must not pretend that the traps have all been made safe. They have not.

In 1936, the Republican Alfred Landon stood for election against President Franklin Delano Roosevelt. The respected magazine, The Literary Digest, shouldered the responsibility of forecasting the result. It conducted a postal opinion poll of astonishing ambition, with the aim of reaching 10 million people, a quarter of the electorate. The deluge of mailed-in replies can hardly be imagined but the Digest seemed to be relishing the scale of the task. In late August it reported, “Next week, the first answers from these ten million will begin the incoming tide of marked ballots, to be triple-checked, verified, five-times cross-classified and totalled.”

After tabulating an astonishing 2.4 million returns as they flowed in over two months, The Literary Digest announced its conclusions: Landon would win by a convincing 55 per cent to 41 per cent, with a few voters favouring a third candidate.

The election delivered a very different result: Roosevelt crushed Landon by 61 per cent to 37 per cent. To add to The Literary Digest’s agony, a far smaller survey conducted by the opinion poll pioneer George Gallup came much closer to the final vote, forecasting a comfortable victory for Roosevelt. Mr Gallup understood something that The Literary Digest did not. When it comes to data, size isn’t everything.

Opinion polls are based on samples of the voting population at large. This means that opinion pollsters need to deal with two issues: sampling error and sampling bias.

Sampling error reflects the risk that, purely by chance, a randomly chosen sample of opinions does not reflect the true views of the population. The “margin of error” reported in opinion polls reflects this risk and the larger the sample, the smaller the margin of error. A thousand interviews is a large enough sample for many purposes and Mr Gallup is reported to have conducted 3,000 interviews.
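
To make the link between sample size and margin of error concrete, here is a minimal sketch using the standard normal approximation for a proportion from a simple random sample. The function name `margin_of_error` and the worst-case assumption of a 50/50 split are illustrative choices, not something taken from Gallup's or the Digest's methods.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n; p=0.5 is the worst case
    and z=1.96 is the usual 95% normal quantile.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes mentioned in the text: a typical poll, Gallup's reported
# 3,000 interviews and the Digest's 2.4 million returns.
for n in (1_000, 3_000, 2_400_000):
    print(f"n = {n:>9,}: +/- {100 * margin_of_error(n):.2f} percentage points")
```

The margin shrinks only with the square root of the sample size, and it says nothing at all about whether the sample was randomly chosen in the first place.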

But if 3,000 interviews were good, why weren’t 2.4 million far better? The answer is that sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all. George Gallup took pains to find an unbiased sample because he knew that was far more important than finding a big one.

The Literary Digest, in its quest for a bigger data set, fumbled the question of a biased sample. It mailed out forms to people on a list it had compiled from automobile registrations and telephone directories – a sample that, at least in 1936, was disproportionately prosperous. To compound the problem, Landon supporters turned out to be more likely to mail back their answers. The combination of those two biases was enough to doom The Literary Digest’s poll. For each person George Gallup’s pollsters interviewed, The Literary Digest received 800 responses. All that gave them for their pains was a very precise estimate of the wrong answer.
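
The Digest's predicament can be reproduced in a toy simulation. The sketch below is not a reconstruction of the 1936 election: the electorate size, the share of "prosperous" voters and their voting rates are all invented for illustration. It only shows that a small random sample can land near the truth while a far larger sample drawn from one slice of the population gives a confident wrong answer.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical electorate: 30% are "prosperous" and favour candidate A
# much more often than everyone else. All rates here are invented.
POP_SIZE = 500_000
population = []
for _ in range(POP_SIZE):
    prosperous = random.random() < 0.30
    p_votes_a = 0.65 if prosperous else 0.25
    population.append((prosperous, random.random() < p_votes_a))

true_share = sum(votes_a for _, votes_a in population) / POP_SIZE

# A small sample chosen at random from the whole electorate.
small_random = random.sample(population, 3_000)
small_est = sum(votes_a for _, votes_a in small_random) / len(small_random)

# A huge sample drawn only from prosperous voters, standing in for a
# mailing list built from car registrations and phone directories.
prosperous_only = [person for person in population if person[0]]
big_biased = random.sample(prosperous_only, 100_000)
big_est = sum(votes_a for _, votes_a in big_biased) / len(big_biased)

print(f"true share for A:        {true_share:.3f}")
print(f"3,000 random voters:     {small_est:.3f}")
print(f"100,000 biased responses: {big_est:.3f}")
```

Adding more responses to the biased sample would tighten its estimate around the wrong number, which is exactly the "very precise estimate of the wrong answer" described above.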

The big data craze threatens to be The Literary Digest all over again. Because found data sets are so messy, it can be hard to figure out what biases lurk inside them – and because they are so large, some analysts seem to have decided the sampling problem isn’t worth worrying about. It is.

Professor Viktor Mayer-Schönberger of Oxford’s Internet Institute, co-author of Big Data, told me that his favoured definition of a big data set is one where “N = All” – where we no longer have to sample, but we have the entire background population. Returning officers do not estimate an election result with a representative tally: they count the votes – all the votes. And when “N = All” there is indeed no issue of sampling bias because the sample includes everyone.

But is “N = All” really a good description of most of the found data sets we are considering? Probably not. “I would challenge the notion that one could ever have all the data,” says Patrick Wolfe, a computer scientist and professor of statistics at University College London.

An example is Twitter. It is in principle possible to record and analyse every message on Twitter and use it to draw conclusions about the public mood. (In practice, most researchers use a subset of that vast “fire hose” of data.) But while we can look at all the tweets, Twitter users are not representative of the population as a whole. (According to the Pew Research Internet Project, in 2013, US-based Twitter users were disproportionately young, urban or suburban, and black.)

There must always be a question about who and what is missing, especially with a messy pile of found data. Kaiser Fung, a data analyst and author of Numbersense, warns against simply assuming we have everything that matters. “N = All is often an assumption rather than a fact about the data,” he says.

Consider Boston’s Street Bump smartphone app, which uses a phone’s accelerometer to detect potholes without the need for city workers to patrol the streets. As citizens of Boston download the app and drive around, their phones automatically notify City Hall of the need to repair the road surface. Solving the technical challenges involved has produced, rather beautifully, an informative data exhaust that addresses a problem in a way that would have been inconceivable a few years ago. The City of Boston proudly proclaims that the “data provides the City with real-time information it uses to fix problems and plan long term investments.”

Yet what Street Bump really produces, left to its own devices, is a map of potholes that systematically favours young, affluent areas where more people own smartphones. Street Bump offers us “N = All” in the sense that every bump from every enabled phone can be recorded. That is not the same thing as recording every pothole. As Microsoft researcher Kate Crawford points out, found data contain systematic biases and it takes careful thought to spot and correct for those biases. Big data sets can seem comprehensive but the “N = All” is often a seductive illusion.

Who cares about causation or sampling bias, though, when there is money to be made? Corporations around the world must be salivating as they contemplate the uncanny success of the US discount department store Target, as famously reported by Charles Duhigg in The New York Times in 2012. Duhigg explained that Target has collected so much data on its customers, and is so skilled at analysing that data, that its insight into consumers can seem like magic.

Duhigg’s killer anecdote was of the man who stormed into a Target near Minneapolis and complained to the manager that the company was sending coupons for baby clothes and maternity wear to his teenage daughter. The manager apologised profusely and later called to apologise again – only to be told that the teenager was indeed pregnant. Her father hadn’t realised. Target, after analysing her purchases of unscented wipes and magnesium supplements, had.

Statistical sorcery? There is a more mundane explanation.

“There’s a huge false positive issue,” says Kaiser Fung, who has spent years developing similar approaches for retailers and advertisers. What Fung means is that we didn’t get to hear the countless stories about all the women who received coupons for babywear but who weren’t pregnant.

Hearing the anecdote, it’s easy to assume that Target’s algorithms are infallible – that everybody receiving coupons for onesies and wet wipes is pregnant. This is vanishingly unlikely. Indeed, it could be that pregnant women receive such offers merely because everybody on Target’s mailing list receives such offers. We should not buy the idea that Target employs mind-readers before considering how many misses attend each hit.
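
The arithmetic behind the false positive issue is worth seeing once. The sketch below applies Bayes' rule to invented numbers: the base rate, hit rate and false positive rate are assumptions chosen for illustration and have nothing to do with Target's actual figures, which are not public.

```python
def share_of_flagged_who_are_pregnant(base_rate, hit_rate, false_positive_rate):
    """Fraction of flagged customers who really are pregnant (Bayes' rule).

    base_rate: share of all customers who are pregnant
    hit_rate: share of pregnant customers the model flags
    false_positive_rate: share of non-pregnant customers the model flags
    """
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Hypothetical model: 2% of customers are pregnant, the model catches 80%
# of them and wrongly flags 5% of everyone else.
share = share_of_flagged_who_are_pregnant(0.02, 0.80, 0.05)
print(f"{share:.0%} of flagged customers are pregnant")
```

Under these made-up assumptions, roughly three in four coupon books for babywear would go to women who are not pregnant, even though the model sounds impressive when described by its hit rate alone.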

In Charles Duhigg’s account, Target mixes in random offers, such as coupons for wine glasses, because pregnant customers would feel spooked if they realised how intimately the company’s computers understood them.

Fung has another explanation: Target mixes up its offers not because it would be weird to send an all-baby coupon-book to a woman who was pregnant but because the company knows that many of those coupon books will be sent to women who aren’t pregnant after all.

None of this suggests that such data analysis is worthless: it may be highly profitable. Even a modest increase in the accuracy of targeted special offers would be a prize worth winning. But profitability should not be conflated with omniscience.

In 2005, John Ioannidis, an epidemiologist, published a research paper with the self-explanatory title, “Why Most Published Research Findings Are False”. The paper became famous as a provocative diagnosis of a serious issue. One of the key ideas behind Ioannidis’s work is what statisticians call the “multiple-comparisons problem”.

It is routine, when examining a pattern in data, to ask whether such a pattern might have emerged by chance. If it is unlikely that the observed pattern could have emerged at random, we call that pattern “statistically significant”.

The multiple-comparisons problem arises when a researcher looks at many possible patterns. Consider a randomised trial in which vitamins are given to some primary schoolchildren and placebos are given to others. Do the vitamins work? That all depends on what we mean by “work”. The researchers could look at the children’s height, weight, prevalence of tooth decay, classroom behaviour, test scores, even (after waiting) prison record or earnings at the age of 25. Then there are combinations to check: do the vitamins have an effect on the poorer kids, the richer kids, the boys, the girls? Test enough different correlations and fluke results will drown out the real discoveries.

There are various ways to deal with this but the problem is more serious in large data sets, because there are vastly more possible comparisons than there are data points to compare. Without careful analysis, the ratio of genuine patterns to spurious patterns – of signal to noise – quickly tends to zero.
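
A short simulation shows how quickly fluke results pile up. In this sketch every "outcome" is pure noise, so any significant result is a false discovery by construction. The group size, the number of outcomes and the use of a simple z-test are assumptions made for illustration; the Bonferroni correction shown is one standard remedy among the "various ways" mentioned above, not one the article itself names.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the sketch is repeatable

def noise_only_z(n=50):
    """z-score for a treated-vs-control difference when there is no real effect."""
    treated = [random.gauss(0, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    # Both groups have known variance 1, so the difference in means
    # has standard deviation sqrt(2 / n).
    return (fmean(treated) - fmean(control)) / math.sqrt(2 / n)

N_OUTCOMES = 1_000  # height, weight, test scores, subgroups, ...
ALPHA = 0.05

z_scores = [noise_only_z() for _ in range(N_OUTCOMES)]

# Naive approach: call anything beyond the usual 5% threshold significant.
naive_hits = sum(abs(z) > 1.96 for z in z_scores)

# Bonferroni correction: divide the significance level by the number of tests.
bonferroni_cut = NormalDist().inv_cdf(1 - ALPHA / (2 * N_OUTCOMES))
bonferroni_hits = sum(abs(z) > bonferroni_cut for z in z_scores)

print(f"'significant' at 5% with no real effects: {naive_hits}")
print(f"still significant after Bonferroni:      {bonferroni_hits}")
```

With a 5 per cent threshold, around one in twenty noise-only comparisons should clear the bar, so a thousand comparisons yield dozens of apparent discoveries. The correction removes most of them, at the cost of making real effects harder to detect.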

Worse still, one of the antidotes to the multiple-comparisons problem is transparency, allowing other researchers to figure out how many hypotheses were tested and how many contrary results are languishing in desk drawers because they just didn’t seem interesting enough to publish. Yet found data sets are rarely transparent. Amazon and Google, Facebook and Twitter, Target and Tesco – these companies aren’t about to share their data with you or anyone else.

New, large, cheap data sets and powerful analytical tools will pay dividends – nobody doubts that. And there are a few cases in which analysis of very large data sets has worked miracles. David Spiegelhalter of Cambridge points to Google Translate, which operates by statistically analysing hundreds of millions of documents that have been translated by humans and looking for patterns it can copy. This is an example of what computer scientists call “machine learning”, and it can deliver astonishing results with no preprogrammed grammatical rules. Google Translate is as close to a theory-free, data-driven algorithmic black box as we have – and it is, says Spiegelhalter, “an amazing achievement”. That achievement is built on the clever processing of enormous data sets.

But big data do not solve the problem that has obsessed statisticians and scientists for centuries: the problem of insight, of inferring what is going on, and figuring out how we might intervene to change a system for the better.

“We have a new resource here,” says Professor David Hand of Imperial College London. “But nobody wants ‘data’. What they want are the answers.”

To use big data to produce such answers will require large strides in statistical methods.

“It’s the wild west right now,” says Patrick Wolfe of UCL. “People who are clever and driven will twist and turn and use every tool to get sense out of these data sets, and that’s cool. But we’re flying a little bit blind at the moment.”

Statisticians are scrambling to develop new methods to seize the opportunity of big data. Such new methods are essential but they will work by building on the old statistical lessons, not by ignoring them.

Recall big data’s four articles of faith. Uncanny accuracy is easy to overrate if we simply ignore false positives, as with Target’s pregnancy predictor. The claim that causation has been “knocked off its pedestal” is fine if we are making predictions in a stable environment but not if the world is changing (as with Flu Trends) or if we ourselves hope to change it. The promise that “N = All”, and therefore that sampling bias does not matter, is simply not true in most cases that count. As for the idea that “with enough data, the numbers speak for themselves” – that seems hopelessly naive in data sets where spurious patterns vastly outnumber genuine discoveries.

“Big data” has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.


Tim Harford’s latest book is ‘The Undercover Economist Strikes Back’.

Originally posted via “Big data: are we making a big mistake?”


Originally Posted at: Big data: are we making a big mistake? by anum

How Google does Rapid Prototyping? Tom Chi’s Perspective [video]

In this TEDEducation video, Tom Chi of the Google Glass team explains how rapid prototyping is done. The video is a good, snappy tutorial for entrepreneurs looking for ways to prototype rapidly. Rapid prototyping not only helps surface product problems early in the product lifecycle so they can be fixed quickly, but also helps with one of a startup’s key problems: having a prototype for validation.

Let us know if you have thoughts on how to do it effectively.


Free Research Report on the State of Patient Experience in US Hospitals

Download Free Report from TCELab: Improving the Patient Experience

The Centers for Medicare & Medicaid Services (CMS) will be using patient feedback about their care as part of their reimbursement plan for acute care hospitals (see Hospital Value-Based Purchasing (VBP) program). The purpose of the VBP program is to promote better clinical outcomes for patients and improve their experience of care during hospital stays. Not surprisingly, hospitals are focusing on improving the patient experience (PX) to ensure they receive the maximum of their incentive payments.

Free Download of Research Report on the Patient Experience

I spent the past few months conducting research on and writing about the importance of patient experience (PX) in US hospitals. My partners at TCELab have helped me summarize these studies into a single research report, Improving the Patient Experience. As far as I am aware, this series of studies is the first to integrate these disparate US hospital data sources (e.g., Patient Experience, Health Outcomes, Process of Care, and Medicare spending per patient) and apply predictive analytics to identify the reasons behind a loyal patient base.

While this research is really about the entirety of US hospitals, hospitals still need to dig deeper into their own specific patient experience data to understand what they need to do to improve the patient experience. This report is a good starting point for hospitals to learn what they need to do to improve the patient experience and increase patient loyalty. Read the entire press release about the research report, Improving the Patient Experience.




Source by bobehayes

How Big Data Analytics Can Help Track Money Laundering

Criminal and terrorist organizations are increasingly relying on international trade to hide the flow of illicit funds across borders. Big data analytics may be the key to tracking these financial flows.

For the past decade, governments around the world have established international anti-money laundering (AML) and counter-terrorist financing programs in an effort to shut down the cross-border flow of funds to criminal and terrorist organizations. Their success has encouraged criminals to move their cash smuggling away from the financial system to the byzantine world of global trade. According to PwC US, big data analytics are becoming essential to tracking these activities.

It’s easy to understand why criminal and terrorist organizations would turn to the global merchandise export trade to hide the movement of their funds. It’s a classic needle in a haystack — an $18.3 trillion business formed of a “web of complexity that involves finance, shipping and insurance interests operating across multiple legal systems, multiple customs procedures, and multiple languages, using a set of traditional practices and procedures that in some instances have changed little for centuries,” PwC says.

Watching the Money Flow

There’s no real way to quantify how much money criminals are invisibly exchanging using this system. PwC notes that the Global Financial Integrity (GFI) research and advocacy organization estimates that 80 percent of illicit financial flows from developing countries are accomplished through trade-based money laundering (TBML), from more than $200 billion in 2002 to more than $600 billion in 2011. GFI believes more than $101 billion was illicitly smuggled into China in 2012 via over-invoicing, which is only one of the common TBML techniques.

“At its core, trade finance is an old-fashioned business,” the report says. “As other industries have adopted more technology- and data-driven infrastructures, trade finance has remained extremely document-intensive and paper-based, moored on a framework of instruments, systems, and practices that have proven their effectiveness and earned global trust over the generations.”

But they are also opaque, PwC says, making it extremely difficult for AML efforts to see what’s going on.

“For example, trade finance’s legacy procedures affect the relationship management aspect of AML, which includes know-your-customer (KYC) procedures and examination of customer documentation prior to transaction approval,” the report says. “In this paper-intensive environment, AML remains a largely manual procedure and thus prone to human error. It remains reliant upon established “red flag” checklists provided by regulators, in which transactions are manually reviewed by analysts, escalated should any concerns be raised, and then subjected to further manual review if wrongdoing is suspected.”

The Need to Share Data

This state of affairs is exacerbated by a number of factors, especially the lack of data sharing between customs, tax and legal authorities and a tendency to rely on AML procedures designed to target cash smuggling and financial system misuse. Instead, PwC says, authorities need to develop targeted TBML responses that focus on data sharing and text and data analytics.

So what exactly does TBML look like? Common TBML techniques include the following:

Under-invoicing. The exporter invoices trade goods at a price below the fair market price. This allows the exporter to effectively transfer value to the importer, as the payment for the trade goods will be lower than the value the importer receives when reselling the goods on the open market.
Over-invoicing. This technique is much the same as the first, except in reverse. The exporter invoices trade goods at a price above the fair market value, allowing the importer to transfer value to the exporter.
Multiple invoicing. With this technique, a money launderer or terrorist financier issues multiple invoices for the same international trade transaction, justifying multiple payments for the same shipment. “Payments can originate from different financial institutions, adding to the complexity of detection, and legitimate explanations can be offered if the scheme is uncovered (e.g., amendment of payment terms, payment of late fees, etc.),” the report explains.
Over- and under-shipment. In some cases, the parties simply overstate or understate the quantities of goods shipped relative to the payments sent or received. PwC calls out an extreme example of this, known as “phantom shipping,” in which no goods are exchanged at all, but shipping and customs documents are processed as normal.
False description of trade goods. With this technique, money launderers misrepresent the quality or type of trade goods. For instance, they might replace an expensive item listed on the invoice and customs documents with an inexpensive item.
Informal money transfer systems (IMTS). These networks have, in many cases, been co-opted by criminals and terrorists. PwC points to Colombia’s Black Market Peso Exchange (BMPE) as a prime example. Established by Colombian businesses trying to get around Colombia’s restrictive currency exchange policies, the BMPE allows users to sell dollars to a broker, who then trades them for pesos to a legitimate Colombian business that needs hard U.S. currency to purchase goods for shipment to South America. It’s not just Colombian drug traffickers repatriating their profits either; PwC notes that similar systems exist around the world, including the hawala/hundi system on the Indian sub-continent and others in Venezuela, Argentina, Brazil and Paraguay.
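
As a minimal sketch of how one of the techniques above, multiple invoicing, might be screened for once trade documents have been digitized: group invoices by the shipment they claim to cover and flag any shipment that has more than one. The record layout, the field names `invoice_id` and `bill_of_lading`, and the sample data are all hypothetical and are not taken from the PwC report.

```python
from collections import defaultdict

# Hypothetical invoice records; in practice these would be extracted from
# commercial invoices and bills of lading.
invoices = [
    {"invoice_id": "INV-001", "bill_of_lading": "BL-100", "amount": 50_000},
    {"invoice_id": "INV-002", "bill_of_lading": "BL-101", "amount": 12_500},
    {"invoice_id": "INV-003", "bill_of_lading": "BL-100", "amount": 50_000},
    {"invoice_id": "INV-004", "bill_of_lading": "BL-102", "amount": 8_000},
]

def find_multiple_invoicing(records):
    """Return shipments that are covered by more than one invoice."""
    by_shipment = defaultdict(list)
    for record in records:
        by_shipment[record["bill_of_lading"]].append(record["invoice_id"])
    return {bl: ids for bl, ids in by_shipment.items() if len(ids) > 1}

print(find_multiple_invoicing(invoices))
```

A flag here is only a prompt for review. As the report notes, duplicate invoices can have legitimate explanations such as amended payment terms or late fees.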

What Can Big Data Do?

So how can big data analytics help organizations find these illicit transactions in an $18.3 trillion haystack? Well, for one, the sea of documents generated by this activity — the commercial invoices, bills of lading, insurance certificates, inspection certificates, certificates of origin and more — that make it so difficult to see what’s truly happening may also be the point of vulnerability.

“A global, one-stop solution to TBML is highly unlikely,” PwC says. “The most effective solution would involve the imposition of bank-like compliance requirements on all organizations that trade internationally. But while this would create transparency across transactions, it would also create a massive layer of red tape that would adversely impact the preponderance of traders and related parties who are engaged in legitimate activity. The largely unquantifiable nature of the TBML problem makes it difficult to justify such an intrusive, expensive and vastly complicated solution. Short of global regulation, we have global analytics.”

In other words, automating anti-TBML monitoring — extracting and analyzing in-house and external data, both structured and unstructured — is of critical importance.

PwC believes such a program must properly align across key business areas and incorporate automated processes using a variety of advanced techniques, including:

Text analytics. The capability to extract data from text files in an automated fashion can unlock a massive amount of data that can be used for transaction monitoring.
Web analytics and Web-crawling. These tools can systematically scan the web to review shipment and customs details and compare them against corresponding documentation.
Unit price analysis. This statistics-driven approach uses publicly available data and algorithms to detect if unit prices exceed or fall far below global and regional established thresholds.
Unit weight analysis. This technique involves searching for instances where money launderers are attempting to transfer value by overstating or understating the quantity of goods shipped relative to payments.
Network (relationship) analysis of trade partners and ports. Enterprise analytics software tools can identify hidden relationships in data between trade partners and ports, and between other participants in the trade lifecycle. They can also identify potential shell companies or outlier activity.

International trade and country profiling analysis. An analysis of publicly available data may establish profiles of the types of goods that specific countries import and export, flagging outliers that might indicate TBML activity.
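
Unit price analysis, from the list above, is the easiest of these techniques to sketch. The example below compares an invoice's implied unit price with a reference price and flags large deviations in either direction. The commodity names, the reference prices and the 50 percent tolerance band are invented for illustration; a real system would draw its thresholds from published trade statistics.

```python
# Hypothetical reference prices in US dollars per unit.
REFERENCE_PRICES = {
    "copper_cathode_kg": 8.50,
    "cotton_tshirt_unit": 3.00,
}

def flag_unit_price(commodity, quantity, invoice_total, tolerance=0.5):
    """Classify an invoice by how far its unit price sits from the reference.

    tolerance=0.5 means prices within +/-50% of the reference pass unflagged.
    """
    unit_price = invoice_total / quantity
    ratio = unit_price / REFERENCE_PRICES[commodity]
    if ratio > 1 + tolerance:
        return "possible over-invoicing"
    if ratio < 1 - tolerance:
        return "possible under-invoicing"
    return "within range"

# 1,000 T-shirts invoiced at $9,000 implies $9 each, three times the reference.
print(flag_unit_price("cotton_tshirt_unit", 1_000, 9_000))
# The same shipment invoiced at $1,000 implies $1 each, a third of the reference.
print(flag_unit_price("cotton_tshirt_unit", 1_000, 1_000))
# $3,200 implies $3.20 each, close to the reference.
print(flag_unit_price("cotton_tshirt_unit", 1_000, 3_200))
```

A fixed tolerance band is the simplest possible rule. It will miss schemes that stay just inside the band and will flag legitimate premium or distressed goods, so it serves as a first filter rather than a verdict.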

Thor Olavsrud

Originally posted via “How Big Data Analytics Can Help Track Money Laundering”

Source: How Big Data Analytics Can Help Track Money Laundering by anum

The Modern Day Software Engineer: Less Coding And More Creating

Last week, I asked the CEO of a startup company in Toronto, “How do you define a software engineer?”

She replied, “Someone who makes sh*t work.”

This used to be all you needed. If your online web app starts to crash, hire a software engineer to fix the problem.

If your app needs a new feature, hire a software engineer to build it (AKA weave together lines of code to make sh*t work).

We need to stop referring to an engineer as an ‘engineer’. CEOs of startups need to stop saying ‘we need more engineers’.

The modern day ‘engineer’ cannot simply be an engineer. They need to be a renaissance person; a person who is well versed in multiple aspects of life.

Your job as a software engineer cannot be to simply ‘write code’. That’s like saying a Canadian lawyer’s job is to speak English.

English and code are means of doing the real job: Produce value that society wants.

So, to start pumping out code to produce a new feature simply because it’s on the ‘new features list’ is mindless. You can’t treat code as an end in itself.

The modern day engineer (MDE) needs to understand the modern day world. The MDE cannot simply sit in a room alone and write code.

The MDE needs to understand the social and business consequences of creating and releasing a product.

The MDE cannot leave it up to the CEOs and marketers and business buffs to come up with the ‘why’ for a new product.

Everyone should be involved in the ‘why’, as long as they are in the ‘now’.

New frameworks that emphasize less code and more productivity are being released almost every day.

We are slowly moving towards a future where writing code will be so easy that it would be unimpressive to be someone who only writes code.

In the future, Google Translate will probably add JavaScript and Python (and other programming languages) to its list of languages. Then all you will have to do is type in English and get a JavaScript translation. In fact, who will need a programming language like JavaScript or Python when you can use English to tell a computer directly what to do?

Consequently, code becomes a language that can be spoken by all. So, to write good code, you need to be more than an ‘engineer’. You need to be a renaissance person and a person who understands the wishes, wants, emotions and needs of the modern day world.

Today (October 22nd, 2015), I was at a TD Canada Trust networking event designed for ‘tech professionals’ in Waterloo ON, Canada. The purpose of this event was to demo new ‘tech’ (the word has so many meanings nowadays) products to young students and professionals. The banking industry is in the process of a full makeover, if you didn’t know. One of the TD guys, let’s call him Julio, was telling me a little summary of what TD was (and is) trying to do with its recruitment process.

Let me give you the gist of what he said:

“We have business professionals (business analysts, etc) whose job is to understand the 5 W’s of the product. Also, we have engineers/developers/programmers who just write code. What we are now looking for is someone who can engage with others as well as do the technical stuff.”

His words were wise, but I was not sure if he fully understood the implications of what he was talking about. This is the direction we have been heading for quite some time now, but it’s about time we kick things up a notch.

Expect more of this to come.
Expect hybrid roles.
Expect it to become easier and easier to write code.
Expect to be valued for your social awareness paired with your ability to make sh*t work.

Perhaps software tech is at the beginning of a new Renaissance era.

*View the original post here*

Twitter: @nikhil_says


Originally Posted at: The Modern Day Software Engineer: Less Coding And More Creating by nbhaskar

Customer Loyalty and Goal Setting

All companies who use customer loyalty surveys strive to see increases in their customer loyalty scores. Improving customer loyalty has been shown to have a positive impact on business results and long-term business success. Toward that end, executives implement various company-wide improvements in hopes that improvements in customer loyalty scores will follow.

One common method for improving performance is goal setting. There is a plethora of research on the effectiveness of goal setting in improving performance. In the area of customer satisfaction, what typically occurs is that management sees that their customer loyalty score is 7.0 (on a 0-10 scale) at the start of the year. They then set a customer loyalty goal of 8.0 for the end of the fiscal year. What happens at the end of the year? The score remains about 7.0. While their intentions are good, management does not see the increases in loyalty scores that they set out to attain. What went wrong? How can this company effectively use goal setting to improve their customer loyalty scores?

Here are a few characteristics of goals that improve the probability that goals will improve performance:

Specific. Goals need to be specific and clearly define what behaviors/actions are going to be taken to achieve the goal and in what time-frame or frequency these behaviors/actions should take place. For example, a goal stating, “Decrease the number of contacts with the company a customer needs to resolve an issue” does little to help employees focus their efforts because there is no mention of a rate/frequency associated with the decrease. A better goal would be, “Resolve customer issues in three or fewer contacts.”

Measurable. A measurement system needs to be in place to track/monitor progress toward the goal. The measurement system is used to determine whether the goal has been achieved and provides a feedback loop to the employees who are achieving the goal.

A common problem with using customer loyalty scores as the metric to track or monitor improvements is that satisfaction goals are still vague with respect to what the employees can actually do to impact satisfaction/loyalty scores. Telling the technical support department that the company’s customer loyalty goal is 8.0 provides no guidance on how its employees can affect that score. A better measure for the technical support department would be “satisfaction with technical support” or other technical support questions on the survey (e.g., “technical support responsiveness,” “technical support availability”). We know that satisfaction with technical support is positively related to customer loyalty. Using these survey questions for goal setting has a greater impact on changing employees’ behaviors than using vague loyalty questions. Because satisfaction with technical support is related to customer loyalty, improvements in technical support satisfaction should lead to improvements in loyalty scores.

An even better measure would be to use operational metrics for goal setting. The company must first identify the key operational metrics that are statistically related to customer satisfaction/loyalty. This process involves in-depth research via linkage analysis (e.g., linking satisfaction scores with operational measures such as hold time, turnaround time, and number of transfers) but the payoffs are great; once identified, the customer-centric operational metrics can be used for purposes of goal setting.

Difficult but attainable. Research has shown that difficult goals lead to better performance compared to goals that are easy. Difficult goals focus attention on the problem at hand. Avoid setting goals, however, that are too difficult and, consequently, not achievable. One way to set difficult and attainable goals is to use historical performance data to determine the likelihood of achieving different performance levels.

Relevant. Goals for the employees should be appropriate for the employees’ role; can the employee impact the goal? Additionally, the goal should be relevant to both the employee and the organization. Holding employees responsible for goals that are outside of their control (e.g., technical support representatives being responsible for product quality) is unfair and can lead to low morale.

Accepted (or mutually set). For goal setting to increase performance, employees should be allowed to participate in setting their goals. Goals that are not accepted by the recipient are not likely to be internalized and motivating. A good approach would be to get employees involved early in the process of goal setting. Let them help in identifying the problem, selecting (or understanding) the key measures to track, and setting the goal.
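The “difficult but attainable” point above, using historical performance data to judge how likely a goal is, can be sketched very simply. The example below assumes a made-up monthly history of the average number of contacts needed to resolve an issue (lower is better) and reports how often each candidate target was actually met in the past. The function name and the data are illustrative only.

```python
def attainment_rate(history, target):
    """Fraction of past periods in which the metric met the target.

    Assumes a lower-is-better metric, so a period counts as a hit
    when its value is less than or equal to the target.
    """
    hits = sum(1 for value in history if value <= target)
    return hits / len(history)


# Hypothetical history: average contacts per resolved issue, by month.
monthly_contacts = [4.1, 3.8, 3.5, 3.9, 3.2, 2.9, 3.4, 3.0, 3.6, 2.8, 3.1, 3.3]

for target in (4.0, 3.0, 2.5):
    rate = attainment_rate(monthly_contacts, target)
    print(f"target <= {target}: met in {rate:.0%} of past months")
```

With this history, a target of 4.0 was met almost every month (too easy), 2.5 was never met (probably unattainable), and 3.0 was met a quarter of the time, which makes it a plausible candidate for a difficult but attainable goal.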


The following are key characteristics of effective goals:

  • Specific
  • Measurable
  • Difficult but attainable
  • Relevant
  • Accepted (or mutually set)

Goal setting can be an effective management tool. Incorporating this methodology can build a customer-centric culture by ensuring employees’ behaviors are guided by measures that matter to the customer.

Source: Customer Loyalty and Goal Setting

Two Underutilized Heroes of Data & Innovation: Correlation & Covariance


Yes, data-driven innovation is fun, and it gets more done with less. But let’s talk about some math that is not as well known in the enterprise world as it should be. Correlation and covariance are two underutilized measures that can have an outsized impact on even a complicated business model.

First, a quick high-level math primer (taken from Wikipedia): In probability theory and statistics, the mathematical descriptions of covariance and correlation are very similar.[1][2] Both describe the degree of similarity between two random variables or sets of random variables.
Correlation refers to any of a broad class of statistical relationships involving dependence.
Covariance, by contrast, is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive.[1] In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative.

That is enough math talk. You can find more by searching for covariance and correlation, and if you are not blown away by their capabilities, take some extra time to read about cross-correlation and cross-covariance. They lead into predictive modeling and plenty of other savvy things you can do with these two interesting and powerful concepts.
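To make the definitions above concrete, here is a minimal sketch that computes the sample covariance and the Pearson correlation coefficient (one common member of the broad class of correlation measures) from scratch. The variable names and the numbers are made up for illustration.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to the range [-1, 1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: longer hold times go with lower satisfaction.
hold_time = [1, 2, 3, 4, 5]
satisfaction = [9, 8, 6, 5, 2]

print(covariance(hold_time, satisfaction))   # negative: opposite behavior
print(correlation(hold_time, satisfaction))  # close to -1: strong inverse link
```

Covariance tells you the direction of the relationship but depends on the units of the two variables. Correlation removes the units, which makes values comparable across different pairs of variables.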

Traditionally, a company is only as analytically smart as the analytics team it employs. But it does not have to be like this. A smarter approach, such as applying correlation and covariance to your captured data, could do the heavy lifting for you and help you focus on the areas that are really having a significant impact on your business. As you have already read, covariance and correlation can by definition help you understand the relationship between two sets of data.

In most of the companies I have spoken with, this math has been applied to known sets of data within the boundaries of a single project. For example, a project’s data and its variables can be correlated with each other to find hidden relationships. If these relationships are not identified, it could cost your business significantly. If you are not doing this yet, stop reading now and get your correlation and covariance mojo active, at least within your projects.

If your organization is already doing it within projects, you are part of a savvy organization that takes the success and failure of its projects too seriously to leave them solely to the professionals running them. Now you might ask: what next? Where is the next big wave? Innovation is the next big thing, and it rides on the data that correlation and covariance can provide to your organization. How about applying them across different projects, departments, silos and so on? Consider a case where one project is impacting another: one tiny dependency on a remote department could cause a significant impact on a totally unrelated department in the business.

Yes, you guessed right, we are talking about a big-data problem, or maybe one of the biggest big-data problems for your organization.

Correlation and covariance have the power to identify hidden relationships that you would never have guessed existed, and then help you find the extent of their dependency: how much one variable varies with the other. Once you have a model in place to comb your organization’s data for correlations and measure their covariance, you will understand how much one event is linked to another and to what degree. This will help your business identify high-impact areas that you can then map to high performance. All you need to do is determine whether the identified relationship is known or unknown. If it is known, you have validated that sometimes the world is as sane as you expect it to be. If not, voilà, you have just identified a potential area to investigate and worry about, to make sure all relationships in your business are accounted for.
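The combing step described above can be sketched as a scan over every pair of metrics that flags strong correlations which are not already on a list of known relationships. Everything in this example is hypothetical: the metric names, the sample values, the 0.8 threshold and the `known_pairs` set. It is a sketch of the idea rather than a production design, and a flagged pair is only a candidate for investigation, since correlation alone does not show that one thing causes the other.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def unexpected_relationships(metrics, known_pairs, threshold=0.8):
    """Return strongly correlated metric pairs that are not already known."""
    findings = []
    for a, b in combinations(sorted(metrics), 2):
        r = pearson(metrics[a], metrics[b])
        if abs(r) >= threshold and frozenset((a, b)) not in known_pairs:
            findings.append((a, b, round(r, 2)))
    return findings


# Hypothetical weekly metrics pulled from different departments.
metrics = {
    "support_tickets": [10, 12, 15, 11, 18, 20],
    "release_count": [1, 1, 2, 1, 3, 3],
    "office_coffee_orders": [30, 28, 31, 29, 30, 32],
    "late_shipments": [5, 6, 8, 5, 9, 10],
}
# Relationships the business already knows about and expects.
known_pairs = {frozenset(("release_count", "support_tickets"))}

findings = unexpected_relationships(metrics, known_pairs)
for a, b, r in findings:
    print(f"investigate: {a} vs {b} (r = {r})")
```

In this made-up data, late shipments turn out to move with both release count and support tickets, a cross-department link nobody had listed as known.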

If your data is combed properly for correlations and covariance, far fewer things will fall through the cracks. Your radar will pick up potential problem areas as soon as a relationship shows up in the data. And yes, that will save your business some cash and help it run optimally.

So, to do a quick recap:
1. Make sure you understand what correlation and covariance are, and as an added bonus, read about cross-correlation and cross-covariance.
2. Make sure the projects in your company are using correlation and covariance to find hidden dependencies that could jeopardize their success.
3. Make sure you have a big-data setup that can connect data across projects, departments and business units to find possible correlations and their covariance.
4. Make sure you have the right triggers, alarms and action plans set up for investigating any identified relationships further.
5. Make sure you have an automated system that combs the business data and helps identify possible cracks in real time.

If you are done with those 5 steps, your business is destined for consistent improvements and sustained data driven innovations.

And yes, as I always rant, you don’t have to do it in-house. It probably makes better business sense to get it built outside and then, once it is validated, bring it in-house. All you need is a good data analytics/visualization platform that can take in any amount of structured and unstructured data and find correlations within it.

Originally Posted at: Two Underutilized Heroes of Data & Innovation: Correlation & Covariance

Bob Hayes to Address Vovici Vision 2010 Users Conference, May 10-12, 2010

Dulles, VA – November 2, 2009 – Vovici, the leading provider of survey software and enterprise feedback management (EFM) solutions, will hold its user conference, Vision 2010, May 10-12, 2010 in Reston, Virginia.

Vision 2010 will bring together feedback management leaders and experts across multiple industries to participate in compelling educational sessions, training, and peer networking opportunities. Among the confirmed keynote presenters will be three customer loyalty luminaries:

  • Jeb Dasteel, Chief Customer Officer of Oracle (NASDAQ: ORCL)
  • Jeanne Bliss, author of I Love You More Than My Dog and Chief Customer Officer
  • Bob Hayes, Ph.D., author of Beyond the Ultimate Question and recognized loyalty expert

“At Oracle, executive leadership is relentlessly focused on listening to customers and prioritizing feedback to drive customer strategy at all levels,” said Dasteel. Dasteel has been with Oracle for 11 years, five of which have been spent running Oracle’s Global Customer Programs and as CCO for the last year. Dasteel was named the 2009 Chief Customer Officer of the Year at the first Chief Customer Officer Summit.

Jeanne Bliss spent 25 years in the role of Chief Customer Officer at Lands’ End, Allstate, Microsoft, Mazda and Coldwell Banker. Today her firm, CustomerBLISS, consults around the world, teaching and guiding companies and leaders how to wrap their business around customer relationships and business prosperity. “Leading companies understand the importance of listening to customers, using feedback to deliver an experience with impact, and creating a lasting bond,” noted Bliss. Her first book, Chief Customer Officer (Jossey-Bass, 2006), was based on 25 years of reporting to the CEOs of five major corporations.

Bob Hayes, Ph.D., is the president and founder of Business Over Broadway. He is a recognized expert in customer satisfaction and loyalty measurement, and has conducted survey research for enterprise companies, including Siebel Systems, Oracle, Agilent Technologies, and Cisco Systems. “There are key ingredients to a successful customer feedback program. Adoption of these elements is critical to improving both customer relationship management and customer loyalty, and Vision 2010 will offer a great opportunity to learn how to accomplish these,” said Hayes.

To register for Vision 2010, please visit:

“Vovici is the Voice of the Customer platform that is helping Fortune 500 companies to emotionally connect to customers,” said Greg Stock, chairman and CEO of Vovici. “We are very excited to bring this amazing group together to share insights and proven methodologies that actually achieve higher level business objectives and make the customer’s vision a reality.”

Source: Bob Hayes to Address Vovici Vision 2010 Users Conference, May 10-12, 2010 by bobehayes

Every step you take: Who owns our mobile health data?

Gadgets that track your steps, sleeping and heart rate could help us live longer and cut national healthcare costs by billions – or so we are told.

Microsoft has just launched its first wearable health gadget, the Band, in the US ahead of its global launch.

Similar products from Samsung and Google are already on the market and early next year the much-hyped Watch from Apple will go on sale.

Millions of us are going to be having our most intimate bodily functions monitored by these gadgets, creating more health data than has ever existed before.

Why do these machines help us stay fit and more importantly what happens to all that information we are generating and sharing?

[Image: Tim Cook introducing the Apple Watch. Apple will soon follow Microsoft and Google into the mobile health device market]

Massive market

Before the giants of the tech world realised that wearable, health-focused gadgets were the new big thing, the market was already thriving.

In March the European Commission published its green paper on mobile health, which contained some mind-boggling statistics.

It suggests that 97,000 apps are on sale in the mobile health sector, which includes tracking apps but also apps that help patients make appointments and keep track of medication.

It predicts that by 2017 more than 1.5 billion people around the world will be using these apps, generating total revenues of £14.5bn ($23bn).

In the EU alone it is estimated that these apps and gadgets could reduce health costs by £77.5bn (99bn euros).

Sector pioneers

Most of the growth has come from start-ups that saw the potential early and now face a competitive onslaught from the big technology companies.

Five years ago French firm Withings launched its wireless scales – the device feeds data back to you, by plotting a graph of your weight over time.

“It started with the scales because we thought that was the one dimension that would make sense for people to track,” Julien De Preaumont, chief marketing officer at Withings, says.

“The first rule of data is to make people aware of their health to make them realise how their weight is evolving.

[Image: black wireless scales by Withings. The wireless scales by Withings use data visualisation to help dieters lose weight]

“The curve reveals the impact of life changes, it will show how a divorce, a diet or a new job will affect your weight.”

After the scales took off, Withings launched wearable gadgets that track your movement, heart rate, blood pressure and sleep.

The company maintains that the data it collects belongs to the user only.

But it has published reports revealing the most obese cities in France and the US, as well as another study showing sleep patterns across Europe.

Withings says this does not compromise the privacy of the individual user’s data because it is aggregated and anonymised.

Business games

While Withings has grown to be a global business, US firm Fitbit has also seen its business thrive beyond its borders.

Founded in 2007 Fitbit offers wireless scales, wearable devices that monitor movement, heart rate, sleep and blood pressure, and is evangelical about the motivating power of targets and data on our health.

Fitbit also offers companies its gadgets and software for corporate use.

Its “corporate wellness” scheme started in the US and companies can use the scheme to get a rebate on their taxes.

[Image: a screengrab from a Fitbit challenge. Games and challenges can be used to motivate people to compete against each other]

Clients so far include blue-chip multinationals such as BP and Time Warner.

Employees can sign up and different divisions can compete against each other over the number of steps taken or stairs climbed.

“The key is to make the product sticky,” says Gareth Jones from Fitbit, and the key to that is gamification.

“Our software incorporates challenges like daily showdowns and weekend warriors which motivate people and keep them coming back.”

But should employees be worried about sharing their every movement, 24 hours a day with a corporate scheme?

“We don’t have data about this, it’s very much a choice of the individual as to whether they sign in for the programme. We see the result of that as purely the people who agree to participate and the people who don’t,” says Mr Jones.

“We might share with the corporate administrator information that 50 people have been invited and 45 have said yes. How the company uses that information is up to the company.”

‘In the hands of the people’

The potential of all the data that is now being collected is huge, both for business and for public health bodies.

Imagine going to the doctor and being able to show them how much exercise you do, how much sleep you get and your blood pressure for the last year.

While the insurance industry is using mobile applications for arranging appointments and giving health information, they are yet to fully embrace the use of wearable devices and the data they collect, though it is a development that could completely change their business as many research papers suggest.

Meanwhile the use of the data for medical research is also a long way off.

Professor John Newton from Public Health England would like to see a more joined-up approach.

“We’ve got the world of apps, a huge investment from the technology companies, but the healthcare sector hasn’t made the link,” he says.

“If you were able to make the link between a hospital service like a diabetic clinic with a patient’s mobile phone data, they could tell immediately whether that person’s diabetes was going out of control.”

His message is clear: “Put the data into the hands of the people who can use it to make a difference.”

As with all the new data that is being recorded and analysed, the possibilities are massive, but the ethical and privacy issues surrounding our personal information will not go away quickly.

Originally posted via “Every step you take: Who owns our mobile health data?”

How Oracle Uses Big Data to Improve the Customer Experience

Data Silo for each Business Data Source

Customer experience management (CEM) programs are no stranger to the use of data. CEM professionals use data to gain insight about their customers to help improve the customer experience and optimize customer loyalty. Not surprisingly, CEM programs typically rely on customer feedback as their main data source (e.g., social media, customer emails, tech support notes, formal customer surveys). Customer feedback data, however, are only one type of business data that are used to improve business decisions.

Big Data

The concept of Big Data is a broad one, and I consider it an amalgamation of different areas that help us try to get a handle on, insight from and use out of data. Big Data, including the tools, processes and solutions to wrangle the ever-increasing size, complexity and velocity of business data, can help companies extract value from collecting, processing and analyzing vast quantities of data. Businesses that can get a better handle on these data will be more likely to outperform their competitors that do not.

I recently wrote about the implications of Big Data on the practice of CEM and how Big Data providers can help companies integrate all their different business data (e.g., operational, financial, constituency, customer) to understand how different data sources impact customer satisfaction and loyalty. With the ever-increasing hype around the promise of Big Data, there has been a call for practitioners to provide real world examples of Big Data solutions in use. I offer up one example below. The example was first presented in my book on CEM best practices, Beyond the Ultimate Question, and highlights Oracle’s use of Big Data principles to improve their service request (SR) process.

Oracle Understands Value of Integrating Data Silos

Jeb Dasteel, Oracle’s senior vice president and chief customer officer, understands the value of integrating different data sources with their customer metrics:

“It is important to understand how the operational measures that we use to drive our business correlate to the satisfaction of our customers. Our studies have helped determine the areas of operational performance that are the key drivers of our customer’s satisfaction. This has provided an opportunity to focus our improvement initiatives specifically on those areas that are of greatest importance to our customers.”

Jeb Dasteel, SVP, Chief Customer Officer, Oracle
from Beyond the Ultimate Question

By integrating different types of metrics (from disparate data silos), Oracle is able to expand how they think about their customer experience improvement initiatives. Rather than focusing solely on their customer metrics to gain customer insights, Oracle links different data sources to get a holistic understanding of all the business areas that impact customer loyalty. Here is how they accomplished this Big Data project.

Oracle’s Service Request Process

Oracle customers can request help in the form of service requests (SRs). The quality of these SRs is typically measured using objective operational metrics that are automatically generated in their CRM system. Oracle’s system tracks many operational metrics. For this illustration, we will look at three:

  • Total Time to Resolve (Close Date – Open Date)
  • Initial Response Time
  • Number of SR Ownership Changes

In addition to the operational metrics that are captured as part of their SR process, Oracle solicits feedback from their customers about the quality of their specific SR experience (via transaction-based survey). These customer feedback data are housed in a separate system apart from the operational metrics.

Oracle wanted to understand how their operational metrics were related to satisfaction with the service request.

Data Federation of Operational Metrics and Customer Metrics

Figure 1. Data Model for Linking Operational Metrics and Customer Metrics (result of data federation)

Oracle used data federation to pull together metrics from the two disparate data sources (one for operational metrics and one for customer satisfaction metrics). The data were linked together at the transaction level. The data model for this Big Data project appears in Figure 1.

After the data were linked together, segments for each operational variable were created (from low to high values) to understand how customer satisfaction varied over different levels of the operational metric.
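The linking and segmenting steps described above can be sketched in a few lines. This is only an illustration, not Oracle’s actual implementation: the field names (`sr_id`, `hours_to_resolve`, `satisfaction`), the sample values and the segment boundaries are all made up.

```python
from collections import defaultdict


def link_by_transaction(operational, survey):
    """Join operational rows to survey rows on the service request ID.

    Service requests without a survey response are dropped.
    """
    satisfaction_by_sr = {row["sr_id"]: row["satisfaction"] for row in survey}
    return [
        {**row, "satisfaction": satisfaction_by_sr[row["sr_id"]]}
        for row in operational
        if row["sr_id"] in satisfaction_by_sr
    ]


def mean_satisfaction_by_segment(linked, metric, boundaries):
    """Average satisfaction per segment of an operational metric.

    `boundaries` are ascending upper bounds; segment 0 holds the lowest
    values and the last segment holds everything above the final bound.
    """
    buckets = defaultdict(list)
    for row in linked:
        segment = sum(row[metric] > bound for bound in boundaries)
        buckets[segment].append(row["satisfaction"])
    return {seg: sum(vals) / len(vals) for seg, vals in sorted(buckets.items())}


# Hypothetical extracts from the two separate systems.
operational = [
    {"sr_id": 1, "hours_to_resolve": 4},
    {"sr_id": 2, "hours_to_resolve": 10},
    {"sr_id": 3, "hours_to_resolve": 30},
    {"sr_id": 4, "hours_to_resolve": 50},
    {"sr_id": 5, "hours_to_resolve": 80},
    {"sr_id": 6, "hours_to_resolve": 100},  # no survey response
]
survey = [
    {"sr_id": 1, "satisfaction": 9},
    {"sr_id": 2, "satisfaction": 8},
    {"sr_id": 3, "satisfaction": 7},
    {"sr_id": 4, "satisfaction": 6},
    {"sr_id": 5, "satisfaction": 3},
]

linked = link_by_transaction(operational, survey)
# Segments: up to 24 hours, 24 to 72 hours, more than 72 hours.
by_segment = mean_satisfaction_by_segment(linked, "hours_to_resolve", [24, 72])
print(by_segment)
```

Comparing the average satisfaction across segments is what reveals whether an operational metric matters to customers: a clear downward trend across segments suggests it does, while a flat profile suggests it does not.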

Results of Analyses

Analyses revealed some interesting insights about how the three operational metrics impact customer satisfaction with the transaction. The relationship of each operational metric with overall satisfaction with the SR is presented in Figures 2, 3 and 4.

Using Total Time to Resolve the SR, Oracle found that customers were more satisfied with their SRs that were resolved more quickly compared to customers whose SRs took longer to resolve (see Figure 2).

Figure 2. Relationship between time to resolve SR and customer satisfaction with SR

Using Initial Response Time to the SR, Oracle found that customers were no more satisfied or dissatisfied with their SRs whether the initial response time was fast or slow (see Figure 3). Despite the expectations that the Initial Response Time to the SR would greatly impact the customers’ satisfaction with the SR, this study showed that the initial response time had no impact on the satisfaction of customers.

Figure 3. Relationship between initial response time and customer satisfaction with SR

Using Number of Ownership Changes, Oracle found that customers were more satisfied with their SRs that had fewer ownership changes compared to customers whose SRs had more ownership changes (see Figure 4).

The application of Big Data solutions at Oracle has provided much insight regarding how the management of customers through the Service Request process can be facilitated with the use of operational metrics. The analyses showed that not all operational metrics are predictive of customer satisfaction;  initial response time was unrelated to customer satisfaction, suggesting that monitoring metrics associated with that aspect of the SR process is unnecessary in improving customer satisfaction. To improve the customer experience with the SR process (e.g., improve customer satisfaction), changes to the SR process are best directed at elements of the SR process that will impact the resolution time and the number of ownership changes.

Figure 4. Relationship between number of SR ownership changes and customer satisfaction with SR

Benefits of Big Data

Linking disparate data silos proved useful for Oracle. They were able to identify the operational metrics that were important to customers. More importantly, they were able to identify operational metrics that were not important to driving customer satisfaction. Demonstrating the statistical relationship between operational metrics and customer satisfaction can help you in three ways:

  1. Build/Identify customer-centric business metrics: You can identify/create key operational metrics that are statistically linked to customer satisfaction and focus only on those that are important to your customers.
  2. Manage customer relationships using objective operational metrics: Driving business growth now becomes a process of using the operational metrics to manage customer relationships. Big Data studies can help you identify appropriate operational performance goals (using operational metrics) that ensure customers will be satisfied.
  3. Reward employee behavior that will drive customer satisfaction: Because of their reliability and specificity, operational metrics are good candidates for use in goal setting and employee incentive programs. Rewarding employee performance based on customer-centric operational metrics ensures employees are aligned with the needs of the customers.


Proper application of Big Data principles helps expand the types of metrics you can use as part of your customer experience strategy. By taking a customer-centric approach in their analyses of their Big Data, Oracle was able to link operational metrics to customer feedback metrics to identify how the operational metrics are related to customer satisfaction. This type of approach to understanding all your business data will help you build customer-centric operational metrics, manage customer relationships using operational metrics and reward employees based on operational metrics that matter to the customer.

Originally Posted at: How Oracle Uses Big Data to Improve the Customer Experience