Jun 15, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data Accuracy  Source

[ AnalyticsWeek BYTES]

>> 5 Steps Required to Building a Best Practice Digital Analytics Function by analyticsweekpick

>> 100 Greatest Quotes On Leadership by v1shal

>> For the airline industry, big data is cleared for take-off by anum

Wanna write? Click Here

[ NEWS BYTES]

>> The Rise of Network Functions Virtualization – Virtualization Review Under Virtualization

>> Data Science Up and Down the Ladder of Abstraction – InfoQ.com Under Data Science

>> Wildly inaccurate election forecasts highlight Big Data challenges – ZDNet Under Big Data Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action

image

Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

How to Create a Mind: The Secret of Human Thought Revealed

image

Ray Kurzweil is arguably today’s most influential—and often controversial—futurist. In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics for decision making? Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, yet we are not building practice grounds to test those models fast enough. Such snug-looking models have hidden nails that could cause uncharted pain if left unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to lab out the best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy? Depends on the context?
A: * “premature optimization is the root of all evil”
* At the beginning: quick-and-dirty model is better
* Optimization later
Other answer:
– Depends on the context
– Is error acceptable? Fraud detection, quality assurance

Source

[ VIDEO OF THE WEEK]

Surviving Internet of Things

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

In God we trust. All others must bring data. – W. Edwards Deming

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T’s extensive calling records.

Sourced from: Analytics.CLUB #WEB Newsletter

Why Focus Groups Don’t Work And Cost Millions

We all know what a “focus group” is and what it is used for. What we are slow to admit is that it has little use and that we keep relying on it out of old-school habit. With a changing consumer ecosystem, we should look for more quantitative techniques that are more relevant to the current stage. With ever-evolving technology and sophisticated tools, there is no reason to feel otherwise. The focus group was never an efficient way to measure product-market fit. But because it was the only easily available method that could provide a decent start, the industry went with it. We are now at a point where we can upgrade ourselves and harness better ways to measure potential product need and adoption.

A few of the downsides of using focus groups

Unnatural settings for participants
Consider a situation where a bunch of strangers come together and discuss a product that they have not seen before. When in real life would such an incident occur? Why would someone speak honestly without any trust between the moderator and the participants? This is not a natural setting in which anyone experiences a real product. So why should we use this template to make decisions?

Not in accord with how a real decision process works
Calling people in, having them sit in a group and vouch for a product is not how we should judge the attractiveness or adoption of a product. Several other factors work in tandem to influence our decision to spend on a product, and those are almost impossible to replicate in focus group sessions. For example, in real life most people depend on word of mouth and suggestions from friends and family when trying and adopting a new product. This gap introduces a greater margin of error into the data gathered from such groups.

Motivation for the participants is different
This is another factor that makes focus groups less reliable. Consider why someone would ever detach from their day-to-day life to come to a focus group. The reasons could be many: money, being an early adopter, the chance to meet and network with people, and so on. Such variation in participants’ experience and motivation introduces more noise than signal.

Not the right framework for asking for snap judgments on products
Another point against the focus group template is its framework: gather people out of the blue, have them experience a product for the first time, and ask for their opinion. Everyone brings their own speed to the table when it comes to understanding a product. So how can the result not be flawed when everyone is asked to share an opinion after the same short interval? This also introduces error into the findings.

Too few is useless and too many is expensive
We all know that the background of the participants is highly variable, and it is almost impossible to carve a niche out of them. If only a few participants are invited, it is extremely hard to pinpoint their needs; if we invite too many, the model becomes expensive while keeping all of its errors and flaws. This makes the focus group model both useless and costly.

It is not about the product but the experience
A product never works on its own; it often works in conjunction with an experience delivered by other dependent areas, and the cumulative interactions deliver the product experience. In a focus group, it is extremely difficult to deliver that exact experience because it has not yet been built into the mix. Experience comes after numerous product iterations with customers. So, in the initial stages, it is extremely difficult to conclude anything from a quick hands-on with the product and no experience built around it.

Innovation suppressant
Consider a case where iTunes is pitched to a focus group. “iTunes is a place where you could buy individual songs and not the whole album, yes online and no, No CDs”. Have you ever wondered how that would fly? A focus group is great at suggesting something right up the alley of what already exists today. A groundbreaking product whose market has not yet been explored could cause some uneasiness and could easily meet with outright rejection. So focus groups are pretty much innovation killers.

People might unintentionally be dishonest
Consider a case where you are asked about your true feelings for a product in a room full of people who think highly of it. Wouldn’t that skew your response as well? We all have a strong tendency to bend towards political correctness, which skews the actual findings. Other biases, caused by groupthink, a dominating personality in the room and so on, have also been identified as invalidating the findings of focus group sessions. This introduces error in judgment and makes the collected data erroneous.

The reasons stated above are a few of the many that make focus groups obsolete, erroneous and unreliable. So we should avoid using them and substitute other, more effective methods.

So, what’s next? What should companies do? Let’s leave it to another day, and another blog. Catch you all soon.

Source: Why Focus Groups Don’t Work And Cost Millions by d3eksha

Jun 08, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Fake data  Source

[ AnalyticsWeek BYTES]

>> IBM and Hadoop Challenge You to Use Big Data for Good by bobehayes

>> AtScale opens Hadoop’s big-data vaults to nonexpert business users by anum

>> Hacking the Data Science by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

Machine Learning

image

6.867 is an introductory course on machine learning which gives an overview of many concepts, techniques, and algorithms in machine learning, beginning with topics such as classification and linear regression and ending … more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts

image

This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among the stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way in helping get quick buy-in and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:What is a decision tree?
A: 1. Take the entire data set as input
2. Search for a split that maximizes the “separation” of the classes. A split is any test that divides the data in two (e.g. if variable2>10)
3. Apply the split to the input data (divide step)
4. Re-apply steps 1 to 3 to each part of the divided data
5. Stop when you meet some stopping criteria
6. (Optional) Clean up the tree when you went too far doing splits (called pruning)

Finding a split: methods vary, from greedy search (e.g. C4.5) to randomly selecting attributes and split points (random forests)

Purity measure: information gain, Gini impurity, Chi Squared values

Stopping criteria: methods vary from minimum size, particular confidence in prediction, purity criteria threshold

Pruning: reduced error pruning, out of bag error pruning (ensemble methods)
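
The split search in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it handles a single numeric variable, uses Gini impurity as the purity measure, and the names gini and best_split as well as the sample data are hypothetical, not taken from any particular library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Greedy search over one numeric variable.

    Tries every test of the form `value <= threshold` and keeps the one with
    the lowest size-weighted Gini impurity. Returns (threshold, impurity);
    threshold is None when no split improves on the unsplit data.
    """
    n = len(values)
    best_threshold, best_impurity = None, gini(labels)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical toy data: the classes separate cleanly between 9 and 12.
variable2 = [3, 7, 9, 12, 15, 20]
classes = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(variable2, classes)
print(threshold, impurity)
```

A full tree would call best_split recursively on each side of the chosen threshold until a stopping criterion is met.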

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Big Data Analytics

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

In God we trust. All others must bring data. – W. Edwards Deming

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with David Rose, @DittoLabs

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data production will be 44 times greater in 2020 than it was in 2009.

Sourced from: Analytics.CLUB #WEB Newsletter

What is the Value of International Polls about the US Presidential Candidates?

I saw the results of a recent opinion poll about the US presidential election that amazed me. While many recent polls of US voters reveal a virtual tie in the presidential race between Barack Obama and Mitt Romney, a BBC poll surveying citizens of other countries about the US presidency found overwhelming support for Barack Obama over Mitt Romney. In this late summer/early fall study by GlobeScan and PIPA of over 20,000 people across 21 countries, 50% favored Obama and 9% favored Romney.

Global Businesses Need Global Feedback

Companies conducting international business regularly poll their customers and prospects across the different countries they serve in hopes of getting better insights about how to run their business. They use this feedback to help them understand where to enter new markets, guide product development, and improve service quality, just to name a few. The end goal is to create a loyal customer base (e.g., customers who come back, recommend and expand the relationship).

The US government’s policies impact international relations on many levels (e.g., economically, financially and socially). Could there be some value from this international poll for the candidates themselves and their constituencies?

Looking at the results of the poll, there are a few implications that stand out to me:

  1. The Romney brand has little international support. Mitt Romney has touted that his business experience has prepared him to be an effective president. How can he use these results to improve his image abroad?
  2. Many international citizens do not care about the US presidency (in about half of the countries, fewer than 50% of respondents expressed a preference for either Obama or Romney).
  3. After four years of an Obama presidency, the international community continues to support the re-election of Obama. Obama received comparable results in 2008.

I like to use data whenever possible to help me guide my decisions. However, I will be the first to admit that I am no expert on international relations. So, I am seeking help from my readers. Here are three questions:

  1. Are these survey results useful to help guide US constituencies’ voting decision?
  2. Are international citizenry survey results about the US presidential candidates analogous to international customer survey results about US companies?
  3. If you owned a company and were selling the Obama and Romney brands, how would you use these survey results (barring simply ignoring them) to improve international customer satisfaction?

I would love to hear your opinions.

Source: What is the Value of International Polls about the US Presidential Candidates?

Jun 01, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Ethics  Source

[ AnalyticsWeek BYTES]

>> Big Data has Big Implications for Customer Experience Management by bobehayes

>> Optimizing your customer relationship survey by bobehayes

>> Big Data Insights in Healthcare, Part II. A Perspective on Challenges to Adoption by froliol

Wanna write? Click Here

[ NEWS BYTES]

>> Red Hat launches OpenShift on Google Cloud – ZDNet Under Cloud

>> Justifying the Hybrid Cloud: It’s All About the Application – IT Business Edge (blog) Under Hybrid Cloud

>> MEF, PNDA Roll Out Initiative Focusing on LSO Analytics – SDxCentral Under Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Machine Learning

image

6.867 is an introductory course on machine learning which gives an overview of many concepts, techniques, and algorithms in machine learning, beginning with topics such as classification and linear regression and ending … more

[ FEATURED READ]

On Intelligence

image

Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replaces, judgement
Data is a tool and a means to help build consensus and facilitate human decision-making, not to replace it. Analysis converts data into information, and information placed in context leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:Examples of NoSQL architecture?
A: * Key-value: in a key-value NoSQL database, all of the data within consists of an indexed key and a value. Cassandra, DynamoDB
* Column-based: designed for storing data tables as sections of columns of data rather than as rows of data. HBase, SAP HANA
* Document Database: map a key to some document that contains structured information. The key is used to retrieve the document. MongoDB, CouchDB
* Graph Database: designed for data whose relations are well-represented as a graph and has elements which are interconnected, with an undetermined number of relations between them. Polyglot Neo4J

Source

[ VIDEO OF THE WEEK]

Reimagining the role of data in government

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data that is loved tends to survive. – Kurt Bollacker, Data Scientist, Freebase/Infochimps

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with  John Young, @Epsilonmktg

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

By 2020, we will have over 6.1 billion smartphone users globally (overtaking basic fixed phone subscriptions).

Sourced from: Analytics.CLUB #WEB Newsletter

Are U.S. Hospitals Delivering a Better Patient Experience?

The Centers for Medicare & Medicaid Services (CMS) use patient feedback about their care as part of their reimbursement plan for acute care hospitals. Under the Hospital Value-Based Purchasing Program, CMS makes value-based incentive payments to acute care hospitals, based either on how well the hospitals perform on certain quality measures or how much the hospitals’ performance improves on certain quality measures from their performance during a baseline period. This program began in FY 2013 for discharges occurring on or after October 1, 2012.

A standard patient satisfaction survey, known as HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems), is the source of the patient feedback for the reimbursement program. I have previously used these publicly available HCAHPS data to understand the state of affairs for US hospitals in 2011 (see Big Data Provides Big Insights for U.S. Hospitals). Now that the Value-Based Purchasing program has been in effect since October 2012, I wanted to revisit the HCAHPS patient survey data to determine if US hospitals have improved. First, let’s review the HCAHPS survey.

The HCAHPS Survey

The survey asks a random sample of recently discharged patients about important aspects of their hospital experience. The data set includes patient survey results for US hospitals on ten measures of patients’ perspectives of care. The 10 measures are:

  1. Nurses communicate well
  2. Doctors communicate well
  3. Received help as soon as they wanted (Responsive)
  4. Pain well controlled
  5. Staff explain medicines before giving to patients
  6. Room and bathroom are clean
  7. Area around room is quiet at night
  8. Given information about what to do during recovery at home
  9. Overall hospital rating
  10. Recommend hospital to friends and family (Recommend)

For questions 1 through 7, respondents were asked to provide frequency ratings about the occurrence of each attribute (Never, Sometimes, Usually, Always). For question 8, respondents were provided a Y/N option. For question 9, respondents were asked to provide an overall rating of the hospital on a scale from 0 (Worst hospital possible) to 10 (Best hospital possible). For question 10, respondents were asked to provide their likelihood of recommending the hospital (Definitely no, Probably no, Probably yes, Definitely yes).

The Metrics

The HCAHPS data sets report metrics for each hospital as percentages of responses. Because the data sets have already been somewhat aggregated (e.g., percentages reported for group of response options), I was unable to calculate average scores for each hospital. Instead, I used top box scores as the metric of patient experience. I found that top box scores are highly correlated with average scores across groups of companies, suggesting that these two metrics tell us the same thing about the companies (in our case, hospitals).

Top box scores for the respective rating scales are defined as: 1) Percent of patients who reported “Always”; 2) Percent of patients who reported “Yes”; 3) Percent of patients who gave a rating of 9 or 10; 4) Percent of patients who said “Definitely yes.”

Top box scores provide an easy-to-understand way of communicating the survey results for different types of scales. Even though there are four different rating scales for the survey questions, using a top box reporting method puts all metrics on the same numeric scale. Across all 10 metrics, hospital scores can range from 0 (bad) to 100 (good).
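
As a rough illustration of the top box idea, the Python sketch below turns raw response counts into a 0 to 100 score. It is an assumption-laden example: the HCAHPS files already report percentages, so the counts here are made up, and the function name top_box_score is hypothetical.

```python
def top_box_score(response_counts, top_options):
    """Percent of respondents who chose one of the top-box options.

    response_counts maps each response option to the number of patients who
    chose it; top_options lists the options that count as "top box".
    """
    total = sum(response_counts.values())
    if total == 0:
        raise ValueError("no responses to score")
    top = sum(response_counts.get(option, 0) for option in top_options)
    return 100.0 * top / total


# Hypothetical counts for one hospital on a frequency-scale question.
nurse_communication = {"Never": 5, "Sometimes": 15, "Usually": 30, "Always": 50}
frequency_score = top_box_score(nurse_communication, ["Always"])

# Hypothetical counts for the 0-10 overall rating; top box is a 9 or 10.
overall_rating = {0: 1, 5: 9, 7: 20, 8: 30, 9: 25, 10: 15}
rating_score = top_box_score(overall_rating, [9, 10])

print(frequency_score, rating_score)
```

Because every question is reduced to the same percentage scale, scores from the four different rating scales can be compared side by side.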

I examined PX ratings of acute care hospitals across two time periods. The two time periods were 2011 (Q3 2010 through Q2 2011) and 2013 (Q4 2012 through Q3 2013). The data from the 2013 time-frame are the latest publicly available patient survey data as of this writing.

Results: Patient Satisfaction with US Hospitals Increasing

Figure 1. Patient advocacy has increased for US hospitals

Figure 1 contains the comparisons for patient advocacy ratings for US hospitals across the two time periods. Paired T-tests comparing the three loyalty metrics across the two time periods were statistically significant, showing that patients are reporting higher levels of loyalty toward hospitals in 2013 compared to 2011. This increase in patient loyalty, while small, is still real.

Greater gains in patient loyalty have been seen for Overall Hospital Rating (increase of 2.26) compared to Recommend (increase of 1.09).

Figure 2. Patient satisfaction with their in-patient experience has increased for US hospitals

Figure 2 contains the comparisons for patient experience ratings for US hospitals across the two time periods. Again, paired T-tests comparing the seven PX metrics across the two time periods were statistically significant, showing that patients are reporting higher levels of satisfaction with their in-patient experience in 2013 compared to 2011.

The biggest increases in satisfaction were seen in “Given information about recovery,” “Staff explained meds” and “Responsive.” The smallest increases in satisfaction were seen for “Doctor communication” and “Pain well controlled.”
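
For readers unfamiliar with the paired t-tests mentioned above, here is a minimal Python sketch of the test statistic using only the standard library. The hospital scores are invented for illustration and the function name paired_t_statistic is hypothetical; the actual analysis presumably used a statistics package that also reports the p-value.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for a paired t-test.

    Each position in `before` and `after` must refer to the same hospital.
    The statistic is the mean of the per-hospital differences divided by
    the standard error of those differences.
    """
    if len(before) != len(after):
        raise ValueError("samples must be paired, so lengths must match")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical top box scores for four hospitals in the two periods.
scores_2011 = [68, 72, 70, 65]
scores_2013 = [70, 73, 73, 67]
t, df = paired_t_statistic(scores_2011, scores_2013)
print(round(t, 3), df)
```

A large positive t relative to the t distribution with df degrees of freedom indicates that the 2013 scores are reliably higher than the 2011 scores for the same hospitals.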

Summary

Hospital reimbursements are based, in part, on their patient satisfaction ratings. Consequently, hospital executives are focusing their efforts on improving the patient experience.

Comparing HCAHPS patient survey results from 2011 to 2013, it appears that hospitals have improved how they deliver patient care. Patient loyalty and PX metrics show significant improvements from 2011 to 2013.

Originally Posted at: Are U.S. Hospitals Delivering a Better Patient Experience? by bobehayes

May 25, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Big Data knows everything  Source

[ AnalyticsWeek BYTES]

>> Your Agile Data Warehousing Architect: Excel by v1shal

>> The future of marketing automation depends on data analytics at scale by anum

>> Google loses data as lightning strikes by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Big Data Startup Tamr Wins Financial Investment From GE Ventures – CRN Under Big Data

>> Securonix Unveils Big Data Security Analytics Platform With Unprecedented Threat Prediction, Detection and … – Broadway World Under Big Data Security

>> Installing Ubuntu On Windows 10 — On vSphere – Virtualization Review Under Virtualization

More NEWS ? Click Here

[ FEATURED COURSE]

Hadoop Starter Kit

image

Hadoop learning made easy and fun. Learn HDFS, MapReduce and introduction to Pig and Hive with FREE cluster access…. more

[ FEATURED READ]

Machine Learning With Random Forests And Decision Trees: A Visual Guide For Beginners

image

If you are looking for a book to help you understand how the machine learning algorithms “Random Forest” and “Decision Trees” work behind the scenes, then this is a good book for you. Those two algorithms are commonly u… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living and breathing zombie in today’s analytical models is the pulsating absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models rake in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween that we never want to see.

[ DATA SCIENCE Q&A]

Q:What is: collaborative filtering, n-grams, cosine distance?
A: Collaborative filtering:
– Technique used by some recommender systems
– Filtering for information or patterns using techniques involving collaboration of multiple agents: viewpoints, data sources.
1. A user expresses his/her preferences by rating items (movies, CDs, etc.)
2. The system matches this user’s ratings against other users’ and finds the people with the most similar tastes
3. From those similar users, the system recommends items that they have rated highly but that this user has not yet rated
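
The three steps above can be sketched as a tiny user-based recommender in Python. Everything here is an illustrative assumption: the ratings are invented, the function names are hypothetical, and the similarity measure (one over one plus the mean absolute rating difference on shared items) is just one simple choice among many.

```python
def similarity(ratings_a, ratings_b):
    """Similarity in (0, 1] based on items both users rated; 0.0 if none."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(ratings_a[i] - ratings_b[i]) for i in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)


def recommend(user, ratings, k=1, min_rating=4):
    """Recommend items that the k most similar users rated highly."""
    others = [u for u in ratings if u != user]
    # Step 2: rank the other users by how closely their ratings match.
    neighbours = sorted(
        others, key=lambda u: similarity(ratings[user], ratings[u]), reverse=True
    )[:k]
    # Step 3: keep highly rated items that this user has not rated yet.
    seen = set(ratings[user])
    picks = {
        item
        for u in neighbours
        for item, score in ratings[u].items()
        if score >= min_rating and item not in seen
    }
    return sorted(picks)


# Step 1: hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Casablanca": 1, "Dune": 5, "Eraserhead": 2},
    "cat": {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Fargo": 5},
}
suggestions = recommend("ann", ratings)
print(suggestions)
```

Here bob's ratings track ann's closely while cat's run opposite, so only bob's highly rated, unseen items are suggested.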

n-grams:
– Contiguous sequence of n items from a given sequence of text or speech
– “Andrew is a talented data scientist”
– Bi-grams: “Andrew is”, “is a”, “a talented”.
– Tri-grams: “Andrew is a”, “is a talented”, “a talented data”.
– An n-gram model models sequences using statistical properties of n-grams; see: Shannon Game
– More concisely, n-gram model: P(X_i | X_{i−(n−1)}, …, X_{i−1}): Markov model
– N-gram model: each word depends only on the n−1 last words

Issues:
– when facing infrequent n-grams
– solution: smooth the probability distributions by assigning non-zero probabilities to unseen words or n-grams
– Methods: Good-Turing, Backoff, Kneser-Ney smoothing
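
A short Python sketch of n-gram extraction and a smoothed bigram probability follows. Note the assumption: it uses add-one (Laplace) smoothing, which is simpler than the Good-Turing, backoff and Kneser-Ney methods listed above and is shown only to illustrate giving unseen n-grams a non-zero probability. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous runs of n tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(prev, word, tokens):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocabulary = set(tokens)
    bigram_counts = Counter(ngrams(tokens, 2))
    # How often `prev` appears as the first word of a bigram.
    context_counts = Counter(tokens[:-1])
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocabulary))


tokens = "Andrew is a talented data scientist".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

seen = bigram_probability("is", "a", tokens)       # bigram occurs in the text
unseen = bigram_probability("is", "data", tokens)  # bigram never occurs
print(bigrams[:3], trigrams[:3], seen, unseen)
```

The unseen bigram still receives a small non-zero probability, which is the point of smoothing.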

Cosine distance:
– How similar are two documents? Measured by cosine similarity; cosine distance is 1 minus that similarity
– Perfect similarity/agreement: 1
– No agreement : 0 (orthogonality)
– Measures the orientation, not magnitude

Given two vectors A and B representing word frequencies:
cosine-similarity(A, B) = ⟨A, B⟩ / (‖A‖ · ‖B‖)
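
The formula can be tried out with a few lines of Python. This is a minimal sketch assuming documents are represented as word-frequency counts; the sample sentences and the function name cosine_similarity are illustrative only.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-frequency mappings."""
    # Inner product over the words the two documents share.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no orientation
    return dot / (norm_a * norm_b)


doc = Counter("big data needs big models".split())
doubled = Counter("big data needs big models big data needs big models".split())
unrelated = Counter("cats chase mice".split())

same_direction = cosine_similarity(doc, doubled)   # same proportions of words
no_overlap = cosine_similarity(doc, unrelated)     # no shared words
print(same_direction, no_overlap)
```

Doubling a document changes the magnitude of its vector but not its orientation, so the similarity stays at 1 (up to floating-point rounding), while documents with no words in common are orthogonal and score 0.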

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek: Big Data Health Informatics for the 21st Century: Gil Alterovitz

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Torture the data, and it will confess to anything. – Ronald Coase

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Conversation With Sean Naismith, Enova Decisions

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data volumes are exploding: more data has been created in the past two years than in the entire previous history of the human race.

Sourced from: Analytics.CLUB #WEB Newsletter