Feb 27, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

Warning: file_get_contents(http://events.analytics.club/tw/eventpull.php?cat=WEB): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://events.analytics.club/tw/eventpull.php?cat=WEB): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://events.analytics.club/tw/eventpull.php?cat=WEB): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://events.analytics.club/tw/eventpull.php?cat=WEB): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://events.analytics.club/tw/eventpull.php?cat=WEB): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://events.analytics.club/tw/eventpull.php?cat=WEB): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

[  COVER OF THE WEEK ]

image
statistical anomaly  Source

[ AnalyticsWeek BYTES]

>> Voices in AI – Episode 101: A Conversation with Cindi Howsen by analyticsweekpick

>> Machine Learning and Information Security: Impact and Trends by administrator

>> Autoscaling of Cloud-Native Apps Lowers TCO and Improves Availability by analyticsweek

Wanna write? Click Here

[ FEATURED COURSE]

The Analytics Edge

image

This is an Archived Course
EdX keeps courses open for enrollment after they end to allow learners to explore content and continue learning. All features and materials may not be available, and course content will not be… more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance

image

Statistical significance is a way of determining if an outcome occurred by random chance, or did something cause that outcome to be different than the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck towards achieving the comparative enterprise adoption. One of the primal reason is lack of understanding and knowledge within the stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members needs to step up to create awareness within the organization. An aware organization goes a long way in helping get quick buy-ins and better funding which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:What is an outlier? Explain how you might screen for outliers and what would you do if you found them in your dataset. Also, explain what an inlier is and how you might screen for them and what would you do if you found them in your dataset
A: Outliers:
– An observation point that is distant from other observations
– Can occur by chance in any distribution
– Often, they indicate measurement error or a heavy-tailed distribution
– Measurement error: discard them or use robust statistics
– Heavy-tailed distribution: high skewness, can’t use tools assuming a normal distribution
– Three-sigma rules (normally distributed data): 1 in 22 observations will differ by twice the standard deviation from the mean
– Three-sigma rules: 1 in 370 observations will differ by three times the standard deviation from the mean

Three-sigma rules example: in a sample of 1000 observations, the presence of up to 5 observations deviating from the mean by more than three times the standard deviation is within the range of what can be expected, being less than twice the expected number and hence within 1 standard deviation of the expected number (Poisson distribution).

If the nature of the distribution is known a priori, it is possible to see if the number of outliers deviate significantly from what can be expected. For a given cutoff (samples fall beyond the cutoff with probability p), the number of outliers can be approximated with a Poisson distribution with lambda=pn. Example: if one takes a normal distribution with a cutoff 3 standard deviations from the mean, p=0.3% and thus we can approximate the number of samples whose deviation exceed 3 sigmas by a Poisson with lambda=3

Identifying outliers:
– No rigid mathematical method
– Subjective exercise: be careful
– Boxplots
– QQ plots (sample quantiles Vs theoretical quantiles)

Handling outliers:
– Depends on the cause
– Retention: when the underlying model is confidently known
– Regression problems: only exclude points which exhibit a large degree of influence on the estimated coefficients (Cook’s distance)

Inlier:
– Observation lying within the general distribution of other observed values
– Doesn’t perturb the results but are non-conforming and unusual
– Simple example: observation recorded in the wrong unit (°F instead of °C)

Identifying inliers:
– Mahalanobi’s distance
– Used to calculate the distance between two random vectors
– Difference with Euclidean distance: accounts for correlations
– Discard them

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Big Data Analytics

 @AnalyticsWeek Panel Discussion: Big Data Analytics

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

He uses statistics as a drunken man uses lamp posts—for support rather than for illumination. – Andrew Lang

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership

 #FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

And one of my favourite facts: At the moment less than 0.5% of all data is ever analysed and used, just imagine the potential here.

Sourced from: Analytics.CLUB #WEB Newsletter

Leave a Reply

Your email address will not be published. Required fields are marked *