[ COVER OF THE WEEK ]
Complex data
[ LOCAL EVENTS & SESSIONS]
- Jan 23, 2018 #WEB Lean Six Sigma Black Belt-4 days Classroom Training in Atlanta
- Dec 20, 2017 #WEB Connected Devices in a Network: What we should Care About?
- Dec 12, 2017 #WEB Lean Six Sigma Black Belt-4 days Classroom Training in Indianapolis IN
[ AnalyticsWeek BYTES]
[ NEWS BYTES]
[ FEATURED COURSE]
[ FEATURED READ]
Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor…
[ TIPS & TRICKS OF THE WEEK]
Finding success in data science? Find a mentor
Yes, most of us don’t feel the need for one, but most of us really could use one. Because most data science professionals work in isolation, getting an unbiased perspective is not easy. It is often just as hard to see how your data science progression is going to unfold. A network of mentors addresses both issues: it gives data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and to keep using it as they succeed.
[ DATA SCIENCE Q&A]
Q: What is a random forest? Why is it good?
A: Random forest (intuition):
– Underlying principle: several weak learners combined provide a strong learner
– Builds several decision trees on bootstrapped training samples of data
– On each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates, out of all p predictors
– Rule of thumb: at each split, m = √p
– Predictions: by majority vote across the trees
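The steps above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: each "tree" is a depth-1 stump with a single split, and the function names, toy data, and default of 25 trees are all made up for the example.

```python
import math
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Find the single-threshold split with the fewest errors among feature_ids."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every candidate feature is constant: always predict the majority
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], -math.inf, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, int(math.sqrt(p)))  # rule of thumb: m = sqrt(p) split candidates
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        candidates = rng.sample(range(p), m)  # random subset of m out of p predictors
        forest.append(fit_stump(Xb, yb, candidates))
    return forest


def predict_forest(forest, row):
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]  # majority vote across the trees


# Toy data (made up): class 0 has small values, class 1 has large values.
X_toy = [[i, i + 0.1, i + 0.2, i + 0.3] for i in range(1, 7)] + \
        [[i, i + 0.1, i + 0.2, i + 0.3] for i in range(11, 17)]
y_toy = [0] * 6 + [1] * 6

forest = fit_forest(X_toy, y_toy)
print(predict_forest(forest, [0, 0, 0, 0]))
print(predict_forest(forest, [20, 20, 20, 20]))
```

A real random forest grows full trees and redraws the m candidate predictors at every split. With a stump there is only one split, so one draw per tree plays the same role.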
Why is it good?
– Very good performance (the random choice of predictors decorrelates the trees)
– Can model non-linear class boundaries
– Generalization error for free: no separate cross-validation is needed, because the out-of-bag samples (rows left out of each tree’s bootstrap sample) give a roughly unbiased estimate of the generalization error as the trees are built
– Generates variable importance
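As a rough illustration of the last two points, scikit-learn's `RandomForestClassifier` exposes an out-of-bag score and per-feature importances. The synthetic dataset and parameter values below are assumptions chosen only for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumed for the example): 9 predictors, 3 of them informative.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=3, n_redundant=0, random_state=0
)

rf = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",  # m = sqrt(p) candidate predictors at each split
    oob_score=True,       # score each row using only trees that did not see it
    random_state=0,
)
rf.fit(X, y)

# Out-of-bag accuracy: an estimate of generalization without a held-out set.
print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")

# Impurity-based variable importances, normalized to sum to 1.
ranked = sorted(enumerate(rf.feature_importances_), key=lambda kv: -kv[1])
for i, imp in ranked:
    print(f"feature {i}: {imp:.3f}")
```

The out-of-bag score is computed during fitting, so it costs little extra. A held-out test set is still a sensible sanity check when the data allow it.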
[ VIDEO OF THE WEEK]
[ QUOTE OF THE WEEK]
“It’s easy to lie with statistics. It’s hard to tell the truth without statistics.” - Andrejs Dunkels
[ PODCAST OF THE WEEK]
[ FACT OF THE WEEK]
In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day.