[ COVER OF THE WEEK ]
Data security Source
[ LOCAL EVENTS & SESSIONS]
- Feb 22, 2018 #WEB An Introduction to Big Data with Hadoop & Spark – Free Webinar
- Feb 21, 2018 #WEB Washington Python for Data Science
- Feb 22, 2018 #WEB New Approach To Building Big Data Apps: Product Recommendations, Fast Data & ML
[ FEATURED READ]
Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn …
[ TIPS & TRICKS OF THE WEEK]
Data aids judgement, it does not replace it
Data is a tool: a means to help build consensus and facilitate human decision-making, not to replace it. Analysis converts data into information, and information in context leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.
[ DATA SCIENCE Q&A]
Q: How do you know if one algorithm is better than another?
A: * In terms of performance on a given data set?
* In terms of performance on several data sets?
* In terms of efficiency?
In terms of performance on several data sets:
– Does learning algorithm A have a higher chance of producing a better predictor than learning algorithm B in the given context?
– "Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets", A. Lacoste and F. Laviolette
– "Statistical Comparisons of Classifiers over Multiple Data Sets", Janez Demsar
In terms of performance on a given data set:
– One wants to choose between two learning algorithms
– Need to compare their performances and assess the statistical significance
One approach (Not preferred in the literature):
– Multiple k-fold cross validation: run CV multiple times and take the mean and sd
– You have: algorithm A (mean and sd) and algorithm B (mean and sd)
– Is the difference meaningful? (Paired t-test)
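The paired comparison above can be sketched in a few lines. This is a minimal illustration, not a recommended procedure: the function name `paired_t_statistic` and the score lists are made up for the example, and it assumes both algorithms were scored on the same sequence of CV runs. It returns only the t statistic and degrees of freedom; the p-value would come from a Student t table or a statistics library. Because repeated CV runs reuse the same data, the scores are not independent, which is one reason this approach is not preferred.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired score lists.

    scores_a[i] and scores_b[i] must come from the same CV run.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if len(scores_a) < 2:
        raise ValueError("need at least two paired runs")

    # Work on the per-run differences A - B.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies from five repeated CV runs.
acc_a = [0.81, 0.79, 0.84, 0.80, 0.83]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.80]
t_stat, dof = paired_t_statistic(acc_a, acc_b)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

A large absolute t value relative to the critical value for the given degrees of freedom suggests the mean difference is unlikely to be zero.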
Sign-test (classification context):
Simply counts the number of times A has a better metric than B and assumes this count comes from a binomial distribution. We can then obtain a p-value for the null hypothesis H0: A and B are equal in terms of performance.
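As a rough sketch of the sign-test, the exact two-sided p-value can be computed from the win counts alone. The function name `sign_test_p_value` and the counts below are illustrative assumptions; ties are assumed to have been dropped before counting.

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Exact two-sided sign-test p-value.

    Under the null hypothesis each non-tied comparison is a fair coin
    flip, so the win count follows Binomial(n, 0.5).
    """
    n = wins_a + wins_b
    if n == 0:
        raise ValueError("need at least one non-tied comparison")

    # Probability of a result at least as lopsided in one direction.
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    # Double for the two-sided test and cap at 1.
    return min(1.0, 2 * tail)


# Hypothetical outcome: A beats B on 8 of 10 data sets.
print(sign_test_p_value(8, 2))
```

The test ignores how large each win is, which makes it robust but less powerful than tests that use the size of the differences.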
Wilcoxon signed rank test (classification context):
Like the sign-test, but the wins (A is better than B) are weighted by the rank of the size of the difference, and the differences are assumed to come from a distribution symmetric around a common median. We then obtain a p-value for the null hypothesis H0.
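The signed rank idea can be sketched as follows for a small number of paired results. This is an illustrative version under stated assumptions: the name `wilcoxon_exact` and the differences are hypothetical, zero differences are dropped, tied magnitudes share an average rank, and the p-value is obtained by enumerating every sign assignment, which is only practical for small samples (roughly 20 pairs or fewer).

```python
from itertools import product


def wilcoxon_exact(diffs):
    """Return (W_plus, two-sided exact p-value) for paired differences."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all differences are zero")

    # Rank the absolute differences, averaging ranks over exact ties.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # W+ is the sum of ranks belonging to positive differences (A wins).
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical per-data-set differences in accuracy (A minus B).
print(wilcoxon_exact([0.03, 0.02, 0.04, -0.01, 0.05, 0.06]))
```

Unlike the sign-test, a single small loss for A costs little here because it receives a low rank.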
Other (without hypothesis testing):
[ QUOTE OF THE WEEK]
"Without big data, you are blind and deaf and in the middle of a freeway." (Geoffrey Moore)
[ FACT OF THE WEEK]
Distributed computing (performing computing tasks using a network of computers in the cloud) is very real. Google uses it every day to involve about 1,000 computers in answering a single search query, which takes no more than 0.2 seconds to complete.