[ COVER OF THE WEEK ]
Trust the data Source
[ LOCAL EVENTS & SESSIONS]
More WEB events? Click Here
[ AnalyticsWeek BYTES]
>> July 31, 2017 Health and Biotech analytics news roundup by pstein
>> For Musicians and Songwriters, Streaming Creates Big Data Challenge by analyticsweekpick
>> Simplifying Data Warehouse Optimization by analyticsweekpick
Wanna write? Click Here
[ NEWS BYTES]
Is Population Health on the Agenda as Google Nabs Geisinger CEO? – Health IT Analytics Under Health Analytics
How to catch security blind spots during a cloud migration – GCN.com Under Cloud
Data Analytics Outsourcing Market Application Analysis, Regional Outlook, Growth Trends, Key Players and Forecasts … – AlgosOnline (press release) (blog) Under Social Analytics
More NEWS ? Click Here
[ FEATURED COURSE]
Statistical Thinking and Data Analysis
[ FEATURED READ]
Big Data: A Revolution That Will Transform How We Live, Work, and Think
[ TIPS & TRICKS OF THE WEEK]
Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. With data analysis, the same type of thinking goes. It’s not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and co-operating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.
[ DATA SCIENCE Q&A]
Q:What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
Its measure of performance of a targeting model (or a rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random choice targeting model. Lift is simply: target response/average response.
Suppose a population has an average response rate of 5% (mailing for instance). A certain model (or rule) has identified a segment with a response rate of 20%, then lift=20/5=4
Typically, the modeler seeks to divide the population into quantiles, and rank the quantiles by lift. He can then consider each quantile, and by weighing the predicted response rate against the cost, he can decide to market that quantile or not.
if we use the probability scores on customers, we can get 60% of the total responders wed get mailing randomly by only mailing the top 30% of the scored customers.
– Key performance indicator
– A type of performance measurement
– Examples: 0 defects, 10/10 customer satisfaction
– Relies upon a good understanding of what is important to the organization
Marketing & Sales:
– New customers acquisition
– Customer attrition
– Revenue (turnover) generated by segments of the customer population
– Often done with a data management platform
– Mean time between failure
– Mean time to repair
– Statistics with good performance even if the underlying distribution is not normal
– Statistics that are not affected by outliers
– A learning algorithm that can reduce the chance of fitting noise is called robust
– Median is a robust measure of central tendency, while mean is not
– Median absolute deviation is also more robust than the standard deviation
– How well a statistical model fits a set of observations
– Examples: AIC, R2, Kolmogorov-Smirnov test, Chi 2, deviance (glm)
Design of experiments:
The design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.
In its simplest form, an experiment aims at predicting the outcome by changing the preconditions, the predictors.
– Selection of the suitable predictors and outcomes
– Delivery of the experiment under statistically optimal conditions
– Blocking: an experiment may be conducted with the same equipment to avoid any unwanted variations in the input
– Replication: performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process
– Interaction: when an experiment has 3 or more variables, the situation in which the interaction of two variables on a third is not additive
– Pareto principle
– 80% of the effects come from 20% of the causes
– 80% of your sales come from 20% of your clients
– 80% of a company complaints come from 20% of its customers
[ VIDEO OF THE WEEK]
#FutureOfData with @theClaymethod, @TiVo discussing running analytics in media industry
Subscribe to Youtube
[ QUOTE OF THE WEEK]
Big Data is not the new oil. – Jer Thorp
[ PODCAST OF THE WEEK]
@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast
[ FACT OF THE WEEK]
According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.