Data Analytics Success Starts with Empowerment
Being data driven is less a tech challenge than an adoption challenge. Adoption has its roots in the cultural DNA of an organization. Great data-driven organizations weave the data-driven culture into the corporate DNA. A culture of connection, interaction, sharing and collaboration is what it takes to be data driven. It’s about being empowered more than it’s about being educated.
[ DATA SCIENCE Q&A]
Q:Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
A: Naïve: the features are assumed independent/uncorrelated
Assumption not feasible in many cases
Improvement: decorrelate features (covariance matrix into identity matrix)
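The decorrelation step in the answer above can be sketched as a whitening transform. This is a minimal sketch, assuming NumPy is available; the helper name `whiten` and the synthetic data are hypothetical, and PCA whitening is only one common way to turn the covariance matrix into the identity. Note that whitening removes linear correlation only; it does not guarantee the full independence that naive Bayes assumes.

```python
import numpy as np

def whiten(X, eps=1e-8):
    """PCA-whiten X (n_samples, n_features): zero mean, ~identity covariance."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the eigenvectors, then rescale each axis to unit variance
    return X_centered @ eigvecs / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))
# Two strongly correlated synthetic features (illustrative data only)
X = np.column_stack([base[:, 0], base[:, 0] + 0.1 * base[:, 1]])

X_white = whiten(X)
cov_white = np.cov(X_white, rowvar=False)  # close to the 2x2 identity matrix
```

The whitened features would then be passed to the naive Bayes classifier in place of the raw ones.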
On this International Women’s Day, it might be wise to learn how women are shaping the entrepreneurial landscape. Not only is the impact impressive and growing, it is also building sustained growth. In some respects, the impact equals or exceeds that of their male counterparts.
Women entrepreneurs have been on the rise for some time. More specifically, we’ve grown twice as fast as men between 1997 and 2007, with 44% growth in women-owned businesses. If that is not a cool stat, I’m not sure what is.
Here are a dozen interesting factoids about how women are shaping the business landscape:
In 2005, there were 7 women CEOs in the Fortune 500. As of May 2011, there were 12 women CEOs in Fortune 500 companies: not many, but growing.
Approximately 32% of women business owners believe that being a woman in a male-dominated industry is beneficial.
The number of women-owned companies with 100 or more employees has increased at nearly twice the growth rate of all other companies.
The vast majority (83%) of women business owners are personally involved in selecting and purchasing technology for their businesses.
The workforces of women-owned firms show more gender equality. Women business owners overall employ a roughly balanced workforce (52% women, 48% men), while men business owners employ 38% women and 62% men, on average.
3% of all women-owned firms have revenues of $1 million or more compared with 6% of men-owned firms.
Women business owners are nearly twice as likely as men business owners to intend to pass the business on to a daughter or daughters (37% vs. 19%).
Between 1997 and 2002, women-owned firms increased their employment by 70,000, whereas firms owned by men lost 1 million employees.
One in five firms with revenue of $1 million or more is woman-owned.
Women owners of firms with $1 million or more in revenue are more likely to belong to formal business organizations, associations or networks than other women business owners (81% vs. 61%).
Women-owned firms in the U.S. are more likely than all firms to offer flex-time, tuition reimbursement and, at a smaller size, profit sharing to their workers.
86% of women entrepreneurs say they use the same products and services at home that they do in their business, for familiarity and convenience.
The road is well traveled, and boy, have we covered some distance. Let us embrace it and keep breaking the glass ceiling. Finally, Happy International Women’s Day to you all!
Data is being collected in droves, but most of the time, people don’t know what to do with it. That’s why data scientists are hot commodities in the startup world right now. In fact, between 2003 and 2013, employment in data industries grew about 21 percent, nearly 16 percent more than overall employment growth. It’s a fairly new concept, but these people are so valuable because they understand the significance of data for your business and how you can use it.
Using analytics, firms can discover patterns and stories in data, build the infrastructure needed to properly collect and store it, inform business decisions and guide strategy. Access to sufficient and robust data is vital to sustained startup growth.
Companies need to incorporate data science into their business models as early as possible while they’re taking risks and making crucial decisions about the future. But how do you know whether your company is ready to go the extra mile and hire a data scientist?
First, you need to make sure you can afford to hire one. On average, a single data scientist costs a company $100,000 annually. A team of data engineers, machine learning experts and modelers can cost millions.
Smaller companies may need to create software solutions and invest time in building revenue to ensure they can actually utilize a data scientist’s skills. Tools such as Tableau, Qlik and Google Charts can help you plot and visualize the results of your data collection, connect this information to dashboards and quickly glean actionable insights.
Once your business is ready to make a larger investment to gain a competitive edge, there are several key traits to seek out in potential candidates. The best data scientists are:
All the data in the world won’t illuminate much if the scientist analyzing it doesn’t possess practical IT skills, experience with the tools mentioned above and a thorough understanding of basic security practices. A solid background in mathematics and statistics is also an indispensable trait; this demonstrates an intellectual rigor and the ability to confidently synthesize and massage many types of data sets.
Armed with a thorough understanding of the pressures inherent to certain industries, skilled data scientists can effectively enlighten the decision-making process. To this end, interview recruits about how they view the competitive climate at the moment.
A good way to guarantee you hire the best data scientist for your needs is to ask each contender to develop a sample presentation based on a specific set of data you provide. Then, pursue the candidates who convey real vision, robust understanding and deep insight.
Data scientists energize enterprise through discovery. Natural curiosity and enthusiasm for solving big problems coupled with an ability to transform data into a product may place one candidate above the rest.
Just as successful startup teams depend on across-the-board versatility, data scientists must be agile enough to quickly modify their methods to suit changes within a particular industry.
You want this person to beat you to the punch when it comes to anticipating questions that data could answer. Look for someone who has a keen sense for future data applications.
7. Strong communication.
Insight that can’t be expressed is worthless. Good data scientists are able to uncover data patterns and are willing to explain those patterns in clear and helpful ways through thoughtful and open communication. They should know how to present visualizations of data and tell a story through numbers.
The perfect complement to a scaling, booming startup is a data scientist with a killer skill set. By sharing the burden and excitement of making crucial business decisions, this single hire can take your startup from data zero to data hero in no time.
Note: This article originally appeared in Entrepreneur.
Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.
[ DATA SCIENCE Q&A]
Q:How frequently must an algorithm be updated?
A: You want to update an algorithm when:
– You want the model to evolve as data streams through infrastructure
– The underlying data source is changing
– Example: a retail store model that remains accurate as the business grows
– Dealing with non-stationarity
– Incremental algorithms: the model is updated every time it sees a new training example
Note: simple, and you always have an up-to-date model, but you can’t incorporate data to different degrees.
Sometimes mandatory: when data must be discarded once seen (privacy)
– Periodic re-training in batch mode: simply buffer the relevant data and update the model every-so-often
Note: more decisions and more complex implementations
– Is the sacrifice worth it?
– Data horizon: how quickly do you need the most recent training example to be part of your model?
– Data obsolescence: how long does it take before data is irrelevant to the model? Are some older instances more relevant than the newer ones?
Economics: generally, newer instances are more relevant than older ones. However, data from the same month or quarter of the previous year can be more relevant than data from other periods of the current year. In a recession, data from previous recessions can be more relevant than newer data from different economic cycles.
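The contrast between incremental updates and periodic batch re-training can be sketched with a deliberately tiny "model" that just predicts the mean of the values it has seen. This is an illustrative sketch only: the class names `IncrementalMean` and `BatchMean`, the `period` parameter and the sample stream are all hypothetical, and a real system would replace the mean with an actual learning algorithm.

```python
class IncrementalMean:
    """Toy model: predicts the running mean of the values seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental update: the example is folded in and can then be discarded
        self.n += 1
        self.mean += (x - self.mean) / self.n


class BatchMean:
    """Same toy model, re-trained from a buffer every `period` examples."""

    def __init__(self, period):
        self.period = period
        self.buffer = []
        self.mean = 0.0

    def observe(self, x):
        self.buffer.append(x)
        if len(self.buffer) % self.period == 0:
            # Periodic re-training over everything buffered so far
            self.mean = sum(self.buffer) / len(self.buffer)


stream = [10.0, 12.0, 11.0, 13.0, 14.0]
inc, batch = IncrementalMean(), BatchMean(period=2)
for x in stream:
    inc.update(x)
    batch.observe(x)
```

After the fifth example the incremental model already reflects all five values, while the batch model still reflects only the first four. That lag is the data horizon trade-off described above; in exchange, the batch version keeps the raw data and can choose to drop or reweight obsolete examples at re-training time.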
In 2015, more than 169 million personal records were exposed, ranging from financial records and trade secrets to important files from the education, government and healthcare sectors. Though big organizations are the usual victims of data breaches, an ongoing trend shows that small businesses are rapidly becoming a favored target for hackers. In 2017, fixing security loopholes should be high on every business’s agenda.
Here’s a great cheat sheet of 8 data security tips that will come in handy if you need to revisit your data security strategy.
The 8 pointers are:
Designate Computer Access Levels
Enable Two-Factor Authentication
Secure Wireless Network Connection
Use SSL for exchanging Sensitive Data
Use Trusted Resources for storage
Store Encrypted Data Backups
Make your staff aware
Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users and community members need to step up to create awareness within the organization. An aware organization goes a long way in helping get quick buy-in and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.
[ DATA SCIENCE Q&A]
Q:What is the maximal margin classifier? How can this margin be achieved?
A: * When the data can be perfectly separated using a hyperplane, there actually exists an infinite number of these hyperplanes
* Intuition: a hyperplane can usually be shifted a tiny bit up, or down, or rotated, without coming into contact with any of the observations
* Large margin classifier: choosing the hyperplane that is farthest from the training observations
* This margin can be achieved using support vectors
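To make the idea of "farthest from the training observations" concrete, the margin of a candidate hyperplane can be computed directly. This is a small sketch using made-up two-dimensional points and a hypothetical helper named `margin`; it only compares two hand-picked separating hyperplanes and does not solve the optimization problem that a real maximal margin classifier (for example, a support vector machine) would.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    The result is positive only if every point lies on the correct side.
    """
    norm = math.hypot(*w)
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Made-up, perfectly separable data: labels are -1 or +1
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]
labels = [-1, -1, 1, 1]

# Two hypothetical separating hyperplanes of the form x1 + x2 = c
m_centered = margin((1.0, 1.0), -6.0, points, labels)  # x1 + x2 = 6
m_shifted = margin((1.0, 1.0), -5.0, points, labels)   # x1 + x2 = 5
```

Both hyperplanes separate the classes, but the centered one has the larger margin. The points that sit exactly at that minimum distance, here (2, 2) and (4, 4), are the support vectors: they alone determine where the maximal margin hyperplane lies.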
As recently as 2009 there were only a handful of big data projects and total industry revenues were under $100 million. By the end of 2012 more than 90 percent of the Fortune 500 will likely have at least some big data initiatives under way.
One of the byproducts of technology’s continued expansion is a high volume of data generated by the web, mobile devices, cloud computing and the Internet of Things (IoT). Converting this “big data” into usable information has created its own side industry, one that businesses can use to drive strategy and better understand customer behavior.
The big data industry requires analysts to stay up to date with the machinery, tools and concepts associated with big data, and how each can be used to grow the field. Let’s explore three trends currently shaping the future of the big data industry:
Big Data Analytics Degrees
Mostly due to a lack of know-how, businesses aren’t tapping into the full potential of big data. In fact, most companies only analyze about 12 percent of the emails, text messages, social media, documents or other data-collecting channels available to them (Forrester). Many universities now offer big data analytics degree programs to address this skills gap directly. The programs are designed to develop analytical talent and teach the skill sets needed to interpret big data, such as programming language proficiency, quantitative analysis tool expertise and statistical knowledge. Analysts predict the demand for industry education will only grow, making it essential for universities to adopt analytics-based degree programs.
Predicting Consumer Behaviors
Big data allows businesses to access and extract key insights about their consumers’ behavior. Predictive analytics challenges businesses to take data interpretation a step further by not only looking for patterns and trends, but using them to predict future purchasing habits or actions. In essence, predictive analytics, which is a branch of big data and data mining, allows businesses to make more data-based predictions, optimize processes for better business outcomes and anticipate potential risk.
Another benefit of predictive analytics is the impact it will have on industries such as health informatics. Health informatics uses electronic health record (EHR) systems to solve problems in healthcare such as effectively tracking a patient’s medical history. By documenting records in electronic format, doctors can easily track and assess a patient’s medical history from any certified access port. This allows doctors to make assumptions about a patient’s health using predictive analytics based on documented results.
Cognitive Machine Improvements
A key trend evolving in 2016 is cognitive improvement in machinery. As humans, we crave relationships and identify with brands, ideas and concepts that are relatable and easy to use. We expect technology will adapt to this need by “humanizing” the way machines retain memories and interpret and process information.
Cognitive improvement aims to solve computing errors, yet still predict and improve outcomes as humans would. It also looks to solve human mistakes, such as medical errors or miscalculated analytics reports. A great example of cognitive improvement is IBM’s Watson supercomputer. It’s classified as the leading cognitive machine to answer complex questions using natural language.
The rise of big data mirrors the rise of tech. In 2016, we will start to see trends in big data education, as well as a shift in data prediction patterns and error solutions. The future is bright for business and analytic intelligence, and it all starts with big data.
Dr. Athanasios Gentimis
Dr. Athanasios (Thanos) Gentimis is an Assistant Professor of Math and Analytics at Florida Polytechnic University. Dr. Gentimis received a Ph.D. in Theoretical Mathematics from the University of Florida, and is knowledgeable in several computer programming/technical languages that include C++, FORTRAN, Python and MATLAB.
A successful customer experience management (CEM) program requires the collection, synthesis, analysis and dissemination of different types of business metrics, including operational, financial, constituency and customer metrics (see Figure 1). The quality of customer metrics necessarily impacts your understanding of how to best manage customer relationships to improve the customer experience, increase customer loyalty and grow your business. Using the wrong customer metrics could lead to sub-optimal decisions while using the right customer metrics can lead to good decisions that give you a competitive edge. How do you know if you are using the right customer metrics in your CEM program? This post will help formalize a set of standards you can use to evaluate your customer metrics.
Customer metrics are numerical scores or indices that summarize customer feedback results. They can be based on either customer ratings (e.g., average satisfaction rating with product quality) or open-ended customer comments (via sentiment analysis). Additionally, customer ratings can be based on a single item or an aggregated set of items (averaging over a set of items to get a single score/metric).
Meaning of Customer Metrics
Customer metrics represent more than just numerical scores. Customer metrics have a deeper meaning, representing some underlying characteristic/mental processes about your customers: their opinions and attitudes about and intentions toward your company or brand. Figure 2 depicts this relationship between the feedback tool (questions) and the overall score that we label as something. Gallup claims to measure customer engagement (CE11) using 11 survey questions. Other practitioners have developed their unique metrics that assess underlying customer attitudes/intentions. The SERVQUAL method assesses several dimensions of service quality; the RAPID Loyalty approach measures three types of customer loyalty: retention, advocacy and purchasing. The Net Promoter Score® measures likelihood to recommend.
Customer Metrics are Necessary for Effective CEM Programs but not Frequently Used
Despite the usefulness of customer metrics, few businesses gather them. In a study examining the use of customer experience (CX) metrics, Bruce Temkin found that only about half (52%) of businesses collect and communicate customer experience (CX) metrics. Even fewer of them review CX metrics with cross-functional teams (39%), tie compensation to CX metrics (28%) or make trade-offs between financial and CX metrics (19%).
Evaluating Your Customer Metrics
As companies continue to grow their CEM programs and adopt best practices, they will rely more and more on the use of customer metrics. Whether you are developing your own in-house customer metric or using a proprietary customer metric, you need to be able to critically evaluate them to ensure they are meeting the needs of your CEM program. Here are four questions to ask about your customer metrics.
1. What is the definition of the customer metric?
Customer metrics need to be supported by a clear description of what they are measuring. Basically, the customer metric is defined the way that words are defined in the dictionary. The definitions are unambiguous and straightforward. The definition, referred to as the constitutive definition, not only tells you what the customer metric is measuring, it also tells you what the customer metric is not measuring.
The complexity of the definition will match the complexity of the customer metric itself. Depending on the customer metric, definitions can reflect a narrow concept or a more complex concept. For single-item metrics, definitions are fairly narrow. For example, a customer metric based on the satisfaction rating of a single overall product quality question would have the following definition: “Satisfaction with product quality”. For customer metrics that are made up of several items, a well-articulated definition is especially important. These customer metrics measure something more nuanced than single-item customer metrics. Try to capture the essence of the commonality shared across the different items. For example, if the ratings of five items about the call center experience (e.g., technical knowledge of rep, professionalism of rep, resolution) are combined into an overall metric, then the definition of the overall metric would be: “Overall satisfaction with call center experience.”
2. How is the customer metric calculated?
Closely related to question 1, you need to convey precisely how the customer metric is calculated. Understanding how the customer metric is calculated requires understanding two things: 1) the specific items/questions in the customer metric; 2) how items/questions were combined to get to the final score. Knowing the specific items and how they are combined helps define what the customer metric is measuring (operational definition). Any survey instructions and information about the rating scale (numerical and verbal anchors) need to be included.
3. What are the measurement properties of the customer metric?
Measurement properties refer to scientifically derived indices that describe the quality of a customer metric. Applying the field of psychometrics and scientific measurement standards (Standards for Educational and Psychological Testing), you can evaluate the quality of customer metrics. By analyzing existing customer feedback data, you can evaluate customer metrics along two criteria: 1) reliability and 2) validity. Reliability refers to measurement precision/consistency. Validity is concerned with what is being measured. Providing evidence of the reliability and validity of your customer metrics is essential to establishing a solid set of customer metrics for your CEM program. The relationship between these two measurement criteria is depicted in Figure 3. Your goal is to develop/select customer metrics that are both reliable and valid (top right quadrant).
While there are different kinds of reliability (see Figure 4), one in particular is especially important when the customer metric is made up of multiple items (e.g., most commonly, items are averaged to get one overall metric). Internal consistency reliability is a great summary index that tells you if the items should be combined. Higher internal consistency (above .80 is good; 1.0 is the maximum possible) tells you that the items measure one underlying construct; aggregating them makes sense. Low internal consistency tells you that the items are likely measuring different things and should not be aggregated together.
There are three different lines of validity evidence that help show that the customer metric actually measures what you think it is measuring. To establish that a customer metric assesses something real, you can look at the content of the items to determine how well they represent your variable of interest (establishing evidence of content validity), you can calculate how well the customer metric correlates with some external criteria (establishing evidence of criterion validity) and you can understand, through statistical relationships among different metrics, how your customer metric fits into a theoretical framework that distinguishes your customer metric from other customer metrics (e.g., How is the customer engagement metric different from the customer advocacy metric?), establishing evidence of construct validity.
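Internal consistency is often estimated with Cronbach's alpha. The post does not name a specific index, so treating alpha as the estimate is an assumption here, and the function name `cronbach_alpha` and the ratings below are made up for illustration. A minimal sketch using only the Python standard library:

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of ratings per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up ratings: 5 respondents x 3 call center items on a 1-5 scale
ratings = [
    [5, 5, 4],
    [4, 4, 4],
    [2, 3, 2],
    [3, 3, 3],
    [1, 2, 1],
]
alpha = cronbach_alpha(ratings)
```

With these illustrative ratings the three items rise and fall together across respondents, so alpha comes out well above the .80 guideline and averaging them into one score would be reasonable. Items that did not track each other would pull alpha down and argue against aggregating them.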
These three different lines of validity evidence demonstrate that the customer metric measures what it is intended to measure. Criterion-related validity evidence often involves linking customer metrics to other data sources (operational metrics, financial metrics, constituency metrics).
Exploring the reliability and validity of your current customer metrics has a couple of extra benefits. First, these types of analyses can improve the measurement properties of your current customer metrics by identifying unnecessary questions. Second, reliability and validity analysis can improve the overall customer survey by identifying CX questions that do not help explain customer loyalty differences. Removal of specific CX questions can significantly reduce survey length without loss of information.
4. How useful is the customer metric?
While customer metrics can be used for many types of analyses (e.g., driver, segmentation), their usefulness is demonstrated by the number and types of insights they provide. Your validation efforts to understand the quality of the customer metrics create a practical framework for making real organizational changes. Specifically, by understanding the causes and consequences of the customer metric, you can identify/create customer-centric operational metrics (See Figure 5) to help manage call center performance, understand how changes in the customer metric correspond to changes in revenue (See Figure 6) and identify customer-focused training needs and standards for employees (See Figure 7).
Below are two articles on the development and validation of four customer metrics. One article focuses on three related customer metrics. The other article focuses on an employee metric. Even though this present blog post talked primarily about customer metrics, the same criteria can be applied to employee metrics.
In each article, I present the information needed to critically evaluate each customer metric: 1) a clear definition of the customer metrics, 2) a description of how the metrics are calculated, 3) measurement properties (reliability/validity), 4) evidence that the metrics are related to important outcomes (e.g., revenue, employee satisfaction). The articles are:
Hayes, B. E. (2011). Lessons in loyalty. Quality Progress, March, 24-31. Paper discusses the development and validation of the RAPID Loyalty approach. Three reliable customer loyalty metrics are predictive of different types of business growth.
Hayes, B. E. (1994). How to measure empowerment. Quality Progress, 27(2), 41-46. Paper discusses need to define and measure empowerment. Researcher develops reliable measure of employee perceptions of empowerment, the Employee Empowerment Questionnaire (EEQ). The EEQ was related to important employee attitudes (job satisfaction).
A customer metric is good when: 1) it is supported with a clear definition of what it measures and what it does not measure; 2) there is a clear method of how the metric is calculated, including all items and how they are combined; 3) there is good reliability and validity evidence regarding how well the customer metric measures what it is supposed to measure; 4) it is useful in helping drive real internal changes (e.g., improved marketing, sales, service) that lead to measurable business growth (e.g., increased revenue, decreased churn).
Using customer metrics that meet these criteria will ensure your CEM program is effective in improving how you manage the customer relationship. Clear definitions of the metrics and accompanying descriptions of how they are calculated help improve communications regarding customer feedback. Different employees, across job levels or roles, can now speak a common language about feedback results. Establishing the reliability and validity of the metrics gives senior executives the confidence they need to use customer feedback as part of their decision-making process.
The bottom line: a good customer metric provides information that is reliable, valid and useful.
Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics for decision making? Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, and we are not creating practice grounds to test those models fast enough. Such snug-looking models have hidden nails that could cause uncharted pain if they go unchecked. This is the right time to start thinking about setting up an Analytics Club [Data Analytics CoE] in your workplace to lab out best practices and provide a test environment for those models.
[ DATA SCIENCE Q&A]
Q:Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy? Depends on the context?
A: * Premature optimization is the root of all evil
* At the beginning: quick-and-dirty model is better
* Optimization later
– Depends on the context
– Is error acceptable? Fraud detection, quality assurance
The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T’s extensive calling records.
We all know what a “focus group” is and what it is used for. What we don’t readily admit is that it has little use and that we keep using it out of old-school habit. With a changing consumer ecosystem, we should consider other, more quantitative techniques that are more relevant today. With ever-evolving technology and sophisticated tools, there is no reason to feel otherwise. The focus group was never an efficient way to measure product-market fit. But because it was the only easily available option that could provide a decent start, the industry went with it. We are now at a point where we can upgrade to better ways of measuring potential product need and adoption.
A few of the downsides of using focus groups
Unnatural settings for participants
Consider a situation where a bunch of strangers come together to discuss a product they have not seen before. When in real life would that occur? Why would someone speak honestly without any trust between moderator and participant? This is not a natural setting in which anyone experiences a real product. So why should we use this template to make decisions?
Not in accord with how a real decision process works
Calling people in, having them sit in a group and vouch for a product is not how we should judge the attractiveness or adoption of a product. Several other things work in tandem to influence our decision to spend on a product, and they are almost impossible to replicate in focus group sessions. For example, in real life most people depend on word of mouth and suggestions from friends and family to try and adopt a new product. This flaw introduces a greater margin of error in the data gathered from such groups.
Motivation for the participants is different
This is another area that makes the focus group less reliable. Consider why someone would detach from their day-to-day life to come to a focus group. The reasons could be many: money, being an early adopter, the chance to meet and network with people, and so on. Such variation in participants’ experience and motivation introduces more noise than signal.
Not the right framework for asking for snap judgments on products
Another point against the focus group template is that it gathers people out of the blue, has them experience a product for the first time and asks for their opinion. Everyone brings their own speed to the table when it comes to understanding a product. So how can it not be flawed when everyone is asked to share an opinion after the same short interval? This also introduces error in the findings.
Little is useless and more is expensive
We all know that participants’ backgrounds are highly variable, and it is almost impossible to carve a niche out of them. If few participants are invited, it is extremely hard to pinpoint their needs; if too many are invited, the model becomes expensive while keeping all its errors and flaws. This makes the focus group model both useless and costly.
It is not about the product but the experience
A product never works on its own; it often works in conjunction with an experience delivered by other dependent areas, and the cumulative interactions deliver the product experience. In a focus group, it is extremely difficult to deliver that exact experience because it has not been built into the mix yet. Experience comes after numerous product iterations with customers. So, in the initial stages, it is extremely difficult to conclude anything from a quick hands-on with a product that has no experience built around it.
Consider a case where iTunes is pitched to a focus group: “iTunes is a place where you can buy individual songs and not the whole album. Yes, online, and no, no CDs.” Have you ever wondered how that would fly? A focus group is great at suggesting something right up the alley of what already exists today. A groundbreaking product whose market has not yet been explored could cause some uneasiness and easily meet with huge rejection. So, focus groups are pretty much innovation killers.
People might unintentionally be dishonest
Consider a case where you are asked about your true feelings for a product in a room full of people who think highly of it. Wouldn’t that skew your observations as well? We all have a strong tendency to bend toward political correctness, which skews the actual findings. Other biases, such as groupthink or a dominating personality in the room, have also been identified as invalidating the findings of focus group sessions. This introduces errors in judgment and makes the collected data erroneous.
The reasons stated above are a few of many that make the focus group obsolete, erroneous and unreliable. So we should avoid focus groups and replace them with other, more effective methods.
So, what’s next? What should companies do? Let’s leave that for another day and another blog. Catch you all soon.