Organizations want real-time big data and analytics capability because of an emerging need for big data that can be immediately actionable in business decisions. An example is the use of big data in online advertising, which immediately personalizes ads for viewers when they visit websites based on their customer profiles that big data analytics have captured.
“Customers now expect personalization when they visit websites,” said Jeff Kelly, a big data analytics analyst from Wikibon, a big data research and analytics company. “There are also other real-time big data needs in specific industry verticals that want real-time analytics capabilities.”
The financial services industry is a prime example. “Financial institutions want to cut down on fraud, and they also want to provide excellent service to their customers,” said Kelly. “Several years ago, if a customer tried to use his debit card in another country, he was often denied because of fears of fraud in the system processing the transaction. Now these systems better understand each customer’s habits and the places that he is likely to travel to, so they do a better job at preventing fraud, but also at enabling customers to use their debit cards without these cards being locked down for use when they travel abroad.”
Kelly believes that in the longer term this ability to apply real-time analytics to business problems will grow as the Internet of Things (IoT) becomes a bigger factor in daily life.
“The Internet of Things will enable sensor tracking of consumer-type products in businesses and homes,” he said. “You will be able to collect and analyze data from various pieces of equipment and appliances and optimize performance.”
The process of harnessing IoT data is highly complex, and companies like GE are now investigating the possibilities. If this IoT data can be captured in real time and acted upon, preventive maintenance analytics can be developed to preempt performance problems on equipment and appliances, and it might also be possible for companies to deliver more rigorous sets of service level agreements (SLAs) to their customers.
Kelly is excited at the prospects, but he also cautions that companies have to change the way they view themselves and their data to get the most out of IoT advancement.
“There is a fundamental change of mindset,” he explained, “and it will require different ways of approaching application development and how you look at the business. For example, a company might have to redefine itself from thinking that it only ‘makes trains’ to a company that also ‘services trains with data.’”
The service element, warranties, service contracts, how you interact with the customer, and what you learn from these customer interactions that could be forwarded into predictive selling are all areas that companies might need to rethink and realign in their business as more IoT analytics come online. The end result could be a reformation of customer relationship management (CRM) to a strictly customer-centric model that takes into account every aspect of the customer’s “life cycle” with the company — from initial product purchases, to servicing, to end of product life considerations and a new beginning of the sales cycle.
This is Part 2 of a series on the Development of the Customer Sentiment Index (see introduction, and Part 1). The CSI assesses the extent to which customers describe your company/brand with words that reflect positive or negative sentiment. This post covers the development of a judgment-based sentiment lexicon and compares it to empirically-based sentiment lexicons.
Last week, I created four sentiment lexicons for use in a new customer experience (CX) metric, the Customer Sentiment Index (CSI). The four sentiment lexicons were empirically derived using data from a variety of online review sites: IMDB, Goodreads, OpenTable and Amazon/Tripadvisor. This week, I develop a sentiment lexicon using a non-empirical approach.
Human Judgment Approach to Sentiment Classification
The judgment-based approach does not rely on data to derive the sentiment values; rather, this method requires the use of subject matter experts to classify words into sentiment categories. This approach is time-consuming, requiring the subject matter experts to manually classify each of the thousands of words in our empirically-derived lexicons. To minimize the work required by the subject matter experts, an initial set of opinion words was generated using two studies.
In the first study, as part of an annual customer survey, a B2B technology company included an open-ended survey question, “Using one word, please describe COMPANY’S products/services.” From 1619 completed surveys, 894 customers provided an answer for the question. Many respondents used multiple words or the company’s name as their response, reducing the number of useful responses to 689. Across these responses, customers used a total of 251 usable unique words.
Also, the customer survey included questions that required customers to provide ratings on measures of customer loyalty (e.g., overall satisfaction, likelihood to recommend, likelihood to buy different products, likelihood to renew) and satisfaction with the customer experience (e.g., product quality, sales process, ease of doing business, technical support).
In the second study, as part of a customer relationship survey, I solicited responses from customers of wireless service providers (B2C sample). The sample was obtained using Mechanical Turk by recruiting English-speaking participants to complete a short customer survey about their experience with their wireless service provider. In addition to the standard rated questions in the customer survey (e.g., customer loyalty, CX ratings), the following question was used to generate the one-word opinion: “What one word best describes COMPANY? Please answer this question using one word.”
From 469 completed surveys, 429 customers provided an answer for the question. Many respondents used multiple words or the company’s name as their response, reducing the number of useful responses to 319. Across these responses, customers used a total of 85 usable unique words.
Sentiment Rating of Opinion Words
The list of customer-generated words for each sample was independently rated by two experts. I was one of those experts. My good friend and colleague was the other expert. We both hold a PhD in industrial-organizational psychology and specialize in test development (him) and survey development (me). We have extensive graduate-level training on the topics of statistics and psychological measurement principles. Also, we have applied experience, helping companies gain value from psychological measurements. We each have over 20 years of experience in developing/validating tests and surveys.
For each list of words (N = 251 and N = 85), each expert was given the list of words and was instructed to “rate each word on a scale from 0 to 10; where 0 is most negative sentiment/opinion and 10 is most positive sentiment/opinion; and 5 is the midpoint.” After providing their first rating of each word, each of the two raters was then given the opportunity to adjust their initial ratings for each word. For this process, each rater was given the list of 251 words with their initial ratings and was asked to make any adjustments to their initial ratings.
Results of Human Judgment Approach to Sentiment Classification
Descriptive statistics of and correlations among the expert-derived sentiment values of customer-generated words appear in Table 1. As you can see, the two raters assigned very similar sentiment ratings to words for both sets. Average ratings were similar. Also, the inter-rater agreement between the two raters for the 251 words was r = .87 and for the 85 words was .88.
After slight adjustments, the inter-rater agreement between the two raters improved to r = .90 for the list of 251 words and .92 for the list of 85 words. This high inter-rater agreement indicated that the raters were consistent in their interpretation of the two lists of words with respect to sentiment.
Because of the high agreement between the raters and comparable means between raters, an overall sentiment score for each word was calculated as the average of the raters’ second/adjusted rating (See Table 1 or Figure 2 for descriptive statistics for this metric).
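As a rough illustration of the two calculations described above, the sketch below computes inter-rater agreement as a Pearson correlation between two raters' ratings and then averages the two ratings into one sentiment score per word. The text reports agreement as r, so a Pearson correlation is assumed here. The function names and the five example words with their ratings are hypothetical and are not taken from the studies.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant ratings")
    return cov / sqrt(var_x * var_y)


def word_sentiment(ratings_a, ratings_b):
    """Average the two raters' ratings for every word both of them rated."""
    return {w: (ratings_a[w] + ratings_b[w]) / 2 for w in ratings_a if w in ratings_b}


# Hypothetical adjusted ratings on the 0-10 scale (not data from the studies).
rater_a = {"reliable": 8, "slow": 2, "expensive": 3, "excellent": 10, "average": 5}
rater_b = {"reliable": 9, "slow": 3, "expensive": 2, "excellent": 9, "average": 5}

words = sorted(rater_a)
r = pearson_r([rater_a[w] for w in words], [rater_b[w] for w in words])
scores = word_sentiment(rater_a, rater_b)

print(f"inter-rater agreement r = {r:.2f}")
print(scores)
```

Averaging is only reasonable when agreement is high and the raters' means are comparable, which is the condition stated above.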
Comparing Empirically-Derived and Expert-Derived Sentiment
In all, I have created five lexicons; four lexicons are derived empirically from four data sources (i.e., OpenTable, Amazon/Tripadvisor, Goodreads and IMDB) and one lexicon is derived using subject matter experts’ sentiment classification.
I compared these five lexicons to better understand the similarity and differences of each lexicon. I applied the four empirically-derived lexicons to each list of customer-generated words. So, in all, for each list of words, I have 5 sentiment scores.
The descriptive statistics of and correlations among the five sentiment scores for the 251 customer-generated words appear in Table 2. Table 3 houses the information for the 85 customer-generated words.
As you can see, there is high agreement among the empirically-derived lexicons (average correlation = .65 for the list of 251 words and .79 for the list of 85 words).
There are statistically significant mean differences across the empirically-derived lexicons; Amazon/Tripadvisor has the highest average sentiment value and Goodreads has the lowest. Lexicons from IMDB and OpenTable provide similar means. The expert judgment lexicon provides the lowest average sentiment ratings for each list of customer-generated words. The absolute sentiment value of a word is dependent on the sentiment lexicon you use. So, pick a lexicon and use it consistently; changing your lexicon could change your metric.
Looking at the correlations of the expert-derived sentiments with each of the empirically-derived sentiments, we see that the OpenTable lexicon had a higher correlation with the experts than the Goodreads lexicon did. The pattern of results makes sense. The OpenTable sample is much more similar to the sample on which the experts provided their sentiment ratings. OpenTable represents a customer/supplier relationship regarding a service, while the Goodreads sample represents a different type of relationship (customer/book quality).
Summary and Conclusions
These two studies demonstrated that subject matter experts are able to scale words along a sentiment scale. There was high agreement among the experts in their classification.
Additionally, these judgment-derived lexicons were very similar to four empirically-derived lexicons. Lexicons based on subject matter experts’ sentiment classification/scaling of words are highly correlated with empirically-derived lexicons. It appears that each of the five sentiment lexicons tells you roughly the same thing as the other lexicons.
The empirically-derived lexicons are less comprehensive than the subject matter experts’ lexicons regarding customer-generated words. By design, the subject matter experts classified all words that were generated by customers; some of the words that were used by the customers do not appear in the empirically-derived lexicons. For example, the OpenTable lexicon only represents 65% (164/251) of the customer-generated words for Study 1 and 71% (60/85) of the customer-generated words for Study 2. Empirically-derived lexicons used to calculate the Customer Sentiment Index could be augmented with lexicons that are based on subject matter experts’ classification/scaling of words.
In the next post, I will continue presenting information about validating the Customer Sentiment Index (CSI). So far, the analysis shows that the sentiment scores of the CSI are reliable (we get similar results using different lexicons). We now need to understand what the CSI is measuring. I will show this by examining the correlation of the CSI with other commonly used customer metrics, including likelihood to recommend (e.g., NPS), overall satisfaction and CX ratings of important customer touch points (e.g., product quality, customer service). Examining correlations of this nature will also shed light on the usefulness of the CSI in a business setting.
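The coverage figures and the idea of augmenting an empirical lexicon can be sketched as follows. This is a minimal illustration, not the procedure used in the studies: the function names and toy lexicons are hypothetical, and it assumes both lexicons have already been placed on a comparable scale, since absolute sentiment values differ from one lexicon to another.

```python
def coverage(words, lexicon):
    """Fraction of unique words that have a sentiment value in the lexicon."""
    unique_words = set(words)
    if not unique_words:
        raise ValueError("need at least one word")
    covered = sum(1 for word in unique_words if word in lexicon)
    return covered / len(unique_words)


def augment(primary, fallback):
    """Use the primary lexicon where it has a value and the fallback elsewhere."""
    merged = dict(fallback)
    merged.update(primary)  # primary values win for words in both lexicons
    return merged


# Coverage figures reported in the text for the OpenTable lexicon.
print(f"Study 1: {164 / 251:.0%}")
print(f"Study 2: {60 / 85:.0%}")

# Hypothetical toy lexicons (values are illustrative only).
empirical = {"reliable": 7.9, "slow": 3.1}
expert = {"reliable": 8.5, "slow": 2.5, "clunky": 2.0}
customer_words = ["reliable", "slow", "clunky"]

combined = augment(empirical, expert)
print(coverage(customer_words, empirical), coverage(customer_words, combined))
```

Letting the empirical value win when a word appears in both lexicons is one possible design choice; the reverse priority, or an average of the two, would be equally easy to implement.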
In the next few blog posts, I will introduce a new metric, the Customer Sentiment Index (CSI). Integrated into your customer relationship survey, the CSI assesses the degree to which customers possess a positive or negative attitude about you. The development of the CSI involved the application of different disciplines including psychometrics, sentiment analysis and predictive analytics.
Each weekly blog post will describe a step in the CSI development process. Even though the blog posts in the series were designed to complement and build on one another, each post will be able to stand alone with respect to its topic. The upcoming topics include:
Measuring customers’ attitudes using structured and unstructured data
Sentiment analysis and sentiment lexicons
Developing sentiment lexicons using an empirically-based and judgment-based approach
Reliability, validity and usefulness of the CSI
Applications of the CSI: Improving customer experience, mobile surveys
As a whole, these posts will represent my ongoing research and development of the Customer Sentiment Index. If any companies are interested in getting involved in the CSI through sponsorship or partnership, please contact me.
Executives at a European Financial Services firm had a clear vision. The company would create a data analytics application for all the markets it served. It would then collect data about its customers’ behaviours and preferences and, through analysis of the data, could identify opportunities that would enable the firm to present the right offer to the right customer at the right time. The company, thereby, would become more central to the financial lives of its customers. Rising revenues, of course, would follow.
Partway through the work of building the application, however, cost pressures at the firm whittled away at the scope of the project. Instead of an application that would address all its markets, the firm decided to prioritise one market and launch the application there. But the company had neglected to establish frameworks for defining and categorising the data assets being collected, making it difficult (if not impossible) for the application to recognise how data points related to each other.
Note: This article originally appeared in FTIConsulting.
Effective data governance consists of protocols, practices, and the people necessary for implementation to ensure trustworthy, consistent data. Its yields include regulatory compliance, improved data quality, and data’s increased valuation as a monetary asset that organizations can bank on.
Nonetheless, these aspects of governance would be impossible without what is arguably its most important component: the common terminologies and definitions that are sustainable throughout an entire organization, and which comprise the foundation for the aforementioned policy and governance outcomes.
When intrinsically related to the technologies used to implement governance protocols, terminology systems (containing vocabularies and taxonomies) can unify terms and definitions at a granular level. The result is a greatly increased ability to tackle the most pervasive challenges associated with big data governance including recurring issues with unstructured and semi-structured data, integration efforts (such as mergers and acquisitions), and regulatory compliance.
A Realistic Approach
Designating the common terms and definitions that are the rudiments of governance varies according to organization, business units, and specific objectives for data management. Creating policy from them and embedding them in technology that can achieve governance goals is perhaps most expediently and sustainably facilitated by semantic technologies, which are playing an increasingly pivotal role in the overall implementation of data governance in the wake of big data’s emergence.
Once organizations adopt a glossary of terminology and definitions, they can then determine rules about terms based on their relationships to one another via taxonomies. Taxonomies are useful for disambiguation purposes and can clarify preferred labels (among any number of synonyms) for different terms in accordance with governance conventions. These definitions and taxonomies form the basis for automated terminology systems that label data according to governance standards via inputs and outputs. Ingested data adheres to terminology conventions and is stored according to preferred labels. Data captured prior to the implementation of such a system can still be queried according to the system’s standards.
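To make the idea of preferred labels concrete, here is a minimal sketch of a terminology lookup, assuming a simple in-memory mapping from synonyms to one preferred label. The class name, method names and medical terms are hypothetical illustrations; a production terminology system built on semantic technologies would typically rely on standards-based vocabularies rather than a Python dictionary.

```python
class TerminologySystem:
    """Maps any known synonym to the preferred label set by governance policy."""

    def __init__(self):
        self._preferred = {}  # normalized label -> preferred label

    @staticmethod
    def _key(label):
        # Ignore case and extra whitespace when comparing labels.
        return " ".join(label.lower().split())

    def add_term(self, preferred, synonyms=()):
        for label in (preferred, *synonyms):
            self._preferred[self._key(label)] = preferred

    def normalize(self, label):
        """Return the preferred label, or None if the term is unknown."""
        return self._preferred.get(self._key(label))

    def matches(self, stored_label, query_label):
        """True when both labels resolve to the same preferred label."""
        resolved = self.normalize(stored_label)
        return resolved is not None and resolved == self.normalize(query_label)


terms = TerminologySystem()
terms.add_term("myocardial infarction", synonyms=["heart attack", "MI"])

# Ingested data is stored under the preferred label.
incoming = ["Heart Attack", "mi", "broken arm"]
stored = [terms.normalize(label) for label in incoming]

# Data captured before the system existed can still be queried by its standards.
legacy = ["heart  attack", "MI", "sprain"]
hits = [label for label in legacy if terms.matches(label, "myocardial infarction")]
print(stored, hits)
```

Unknown terms resolve to None here so they can be routed to a person for review rather than stored under a guessed label.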
Linking Terminology Systems: Endless Possibilities
The possibilities that such terminology systems produce (especially for unstructured and semi-structured big data) are virtually limitless, particularly with the linking capabilities of semantic technologies. In the medical field, a handwritten note hastily scribbled by a doctor can be readily transcribed by the terminology system in accordance with governance policy using preferred terms, effectively giving structure to unstructured data. Moreover, it can be linked to billing coding systems per business functions. That structured data can then be stored in a knowledge repository and queried along with other data, adding to the comprehensive integration and accumulation of data that gives big data its value.
Focusing on common definitions and linking terminology systems enables organizations to leverage business intelligence and analytics on different databases across business units. This method is also critical for determining customer disambiguation, a frequently occurring problem across vertical industries. In finance, it is possible for institutions with numerous subsidiaries and acquisitions (such as Citigroup, Citibank, Citi Bike, etc.) to determine which subsidiary actually spent how much money with the parent company and additional internal, data-sensitive problems by using a common repository. Also, linking the different terminology repositories for these distinct yet related entities can achieve the same objective.
The primary way in which semantics addresses linking between terminology systems is by ensuring that those systems are utilizing the same words and definitions for the commonality of meaning required for successful linking. Vocabularies and taxonomies can provide such commonality of meaning, which can be implemented with ontologies to provide a standards-based approach to disparate systems and databases.
Subsequently, all systems that utilize those vocabularies and ontologies can be linked. In finance, the Financial Industry Business Ontology (FIBO) is being developed to grant “data harmonization and…the unambiguous sharing of meaning across different repositories.” The life sciences industry is similarly working on industry-wide standards so that numerous databases can be made available to all within this industry, while still restricting access to internal drug discovery processes according to organization.
Regulatory Compliance and Ontologies
In terms of regulatory compliance, organizations can account for new requirements more flexibly and quickly when data throughout disparate systems and databases are linked and commonly shared, requiring just a single update as opposed to numerous time-consuming updates in multiple places. Issues of regulatory compliance are also assuaged in a semantic environment through the use of ontological models, which provide the schema that can create a model specifically in adherence to regulatory requirements.
Organizations can use ontologies to describe such requirements, then write rules for them that both restrict and permit access and usage according to regulations. Although ontological models can also be created for any other sort of requirements pertaining to governance (metadata, reference data, etc.) it is somewhat idealistic to attempt to account for all facets of governance implementation via such models. The more thorough approach is to do so with terminology systems and supplement them accordingly with ontological models.
The true value in utilizing a semantic approach to big data governance that focuses on terminology systems, their requisite taxonomies, and vocabularies pertains to the fact that this method is effective for governing unstructured data. Regardless of what particular schema (or lack thereof) is available, organizations can get their data to adhere to governance protocols by focusing on the terms, definitions, and relationships between them. Conversely, ontological models have a demonstrated efficacy with structured data. Given the fact that the majority of new data created is unstructured, the best means of wrapping effective governance policies and practices around them is through leveraging these terminology systems and semantic approaches that consistently achieve governance outcomes.
About the Author: Dr. Jans Aasman, Ph.D., is the CEO of Franz Inc., an early innovator in Artificial Intelligence and leading supplier of Semantic Graph Database technology. Dr. Aasman’s previous experience and educational background include:
• Experimental and cognitive psychology at the University of Groningen, specialization: Psychophysiology, Cognitive Psychology.
• Tenured Professor in Industrial Design at the Technical University of Delft. Title of the chair: Informational Ergonomics of Telematics and Intelligent Products
• KPN Research, the research lab of the major Dutch telecommunication company
• Carnegie Mellon University. Visiting Scientist at the Computer Science Department of Prof. Dr. Allen Newell