Dec 14, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data interpretation  Source

[ NEWS BYTES]

>>
 AI and machine learning will make everyone a musician – Wired.co.uk Under  Machine Learning

>>
 Barnes & Noble Inc (NYSE:BKS) Institutional Investor Sentiment Analysis – WeeklyHub Under  Sentiment Analysis

>>
 Reposition DCIM systems for virtualization, container management – TechTarget Under  Virtualization

[ FEATURED COURSE]

Probability & Statistics

This course introduces students to the basic concepts and logic of statistical reasoning and gives the students introductory-level practical ability to choose, generate, and properly interpret appropriate descriptive and… more

[ FEATURED READ]

Thinking, Fast and Slow

Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Data aids judgement; it does not replace it
Data is a tool and a means to help build consensus and facilitate human decision-making, not to replace it. Analysis converts data into information; information, placed in context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start: context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:What is random forest? Why is it good?
A: Random forest (intuition):
– Underlying principle: several weak learners combined provide a strong learner
– Builds several decision trees on bootstrapped training samples of the data
– In each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates out of all p predictors
– Rule of thumb: at each split, $m \approx \sqrt{p}$ (for classification)
– Predictions: majority vote across the trees (averaging for regression)

Why is it good?
– Very good performance (the random choice of split candidates decorrelates the trees)
– Can model non-linear class boundaries
– Generalization error for free: no cross-validation needed; the out-of-bag observations (those left out of each bootstrap sample) give an approximately unbiased estimate of the generalization error as the trees are built
– Generates variable importance

Source
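
To make the recipe above concrete, here is a minimal from-scratch sketch in plain Python, assuming numeric features and integer class labels. It bootstraps the rows, lets each split consider only m ≈ √p randomly chosen predictors, predicts by majority vote, and scores every row with the trees that never saw it (the out-of-bag error). The function names and the depth limit of 4 are illustrative choices, not a reference implementation; real libraries grow much deeper trees and are far faster.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, m, rng, depth=0, max_depth=4):
    """Grow one classification tree. Internal nodes are (feature, threshold, left, right)
    tuples; leaves are class labels (assumed to be ints, never tuples)."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for j in rng.sample(range(len(X[0])), m):       # m random split candidates out of p predictors
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not right:                            # t is the largest value: not a real split
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:                                 # every candidate predictor was constant here
        return majority(y)
    _, j, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], m, rng, depth + 1, max_depth)

    return (j, t, grow(left), grow(right))


def predict_tree(node, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


def random_forest(X, y, n_trees=15, seed=0):
    """Bagged trees with per-split predictor sampling; returns (trees, out-of-bag error)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, round(math.sqrt(p)))                  # rule of thumb: m ≈ sqrt(p)
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample, drawn with replacement
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], m, rng)
        trees.append(tree)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:                      # row i was not used to grow this tree
                oob_votes[i][predict_tree(tree, X[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    oob_error = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in scored) / len(scored)
    return trees, oob_error


def predict_forest(trees, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in trees])
```

For example, on a few hundred random rows with four features of which only two determine the label, the returned out-of-bag error serves as the generalization estimate without a separate cross-validation loop.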

[ VIDEO OF THE WEEK]

Understanding Data Analytics in Information Security with @JayJarome, @BitSight

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

@CRGutowski from @GE_Digital on Using #Analytics to #Transform Sales #FutureOfData #Podcast

[ FACT OF THE WEEK]

235 terabytes of data had been collected by the U.S. Library of Congress as of April 2011.

Sourced from: Analytics.CLUB #WEB Newsletter

Dec 07, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Complex data  Source

[ AnalyticsWeek BYTES]

>> Big Data Analytics Bottleneck Challenging Global Capital Markets Ecosystem, Says TABB Group by analyticsweekpick

>> Best & Worst Time for Cold Call by v1shal

>> The real-time machine for every business: Big data-driven market analytics by thomassujain

[ NEWS BYTES]

>>
 More hospital closings in rural America add risk for pregnant women – Reuters Under  Health Analytics

>>
 Statistics show fatal, injury crashes up at Sturgis Rally compared to this time last year – KSFY Under  Statistics

>>
 The power of machine learning reaches data management – Network World Under  Machine Learning

[ FEATURED COURSE]

Introduction to Apache Spark

Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals…. more

[ FEATURED READ]

Thinking, Fast and Slow

Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Looking for success in data science? Find a mentor
Yes, most of us don’t feel the need, but most of us could really use one. Because many data science professionals work in isolation, getting an unbiased perspective is not easy. It is also often hard to see how a data science career is going to progress. A network of mentors addresses both issues: it gives data professionals an outside perspective and an unbiased ally. It is extremely important for data science professionals to build a mentor network and to use it throughout their careers.

[ DATA SCIENCE Q&A]

Q:What is random forest? Why is it good?
A: Random forest (intuition):
– Underlying principle: several weak learners combined provide a strong learner
– Builds several decision trees on bootstrapped training samples of the data
– In each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates out of all p predictors
– Rule of thumb: at each split, $m \approx \sqrt{p}$ (for classification)
– Predictions: majority vote across the trees (averaging for regression)

Why is it good?
– Very good performance (the random choice of split candidates decorrelates the trees)
– Can model non-linear class boundaries
– Generalization error for free: no cross-validation needed; the out-of-bag observations (those left out of each bootstrap sample) give an approximately unbiased estimate of the generalization error as the trees are built
– Generates variable importance

Source

[ VIDEO OF THE WEEK]

RShiny Tutorial: Turning Big Data into Business Applications

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset #FutureOfData #Podcast

[ FACT OF THE WEEK]

In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day.

Sourced from: Analytics.CLUB #WEB Newsletter

Nov 30, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

SQL Database  Source

[ AnalyticsWeek BYTES]

>> Getting a 360° View of the Customer – Interview with Mark Myers of IBM by bobehayes

>> The Blueprint for Becoming Data Driven: Data Quality by jelaniharper

>> May 04, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

[ NEWS BYTES]

>>
 Why Google’s Artificial Intelligence Confused a Turtle for a Rifle – Fortune Under  Artificial Intelligence

>>
 Microsoft Workplace Analytics helps managers understand worker … – TechCrunch Under  Analytics

>>
 Storytelling – Two Essentials for Customer Experience Professionals – Customer Think Under  Customer Experience

[ FEATURED COURSE]

A Course in Machine Learning

Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need… more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

A strong business case could save your project
Like anything in corporate culture, a project is often about the business, not the technology. The same thinking applies to data analysis: it’s not always about the technicalities but about the business implications. Data science project success criteria should therefore include project management success criteria as well. This ensures smooth adoption, easy buy-in, room for wins, and cooperative stakeholders. So a good data scientist should also possess some of the qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:How to clean data?
A: 1. First: detect anomalies and contradictions
Common issues:
* Tidy data (Hadley Wickham’s “Tidy Data” paper):
  – column headers are values, not variable names, e.g. 26-45…
  – multiple variables are stored in one column, e.g. m1534 (males aged 15-34)
  – variables are stored in both rows and columns, e.g. tmax and tmin in the same column
  – multiple types of observational units are stored in the same table, e.g. a song dataset and a rank dataset in the same table
  – a single observational unit is stored in multiple tables (they can be combined)
* Data-type constraints: values in a particular column must be of a particular type: integer, numeric, factor, boolean
* Range constraints: numbers or dates must fall within a certain range, i.e. they have minimum/maximum permissible values
* Mandatory constraints: certain columns can’t be empty
* Unique constraints: a field must be unique across a dataset, e.g. each person must have a unique SS number
* Set-membership constraints: the values in a column must come from a set of discrete values or codes, e.g. gender must be female or male
* Regular expression patterns: for example, a phone number may be required to match the pattern (999)999-9999
* Misspellings
* Missing values
* Outliers
* Cross-field validation: certain conditions that involve multiple fields must hold. For instance, in laboratory medicine, the percentages of the different white blood cell types must sum to 100. In a hospital database, a patient’s date of discharge can’t be earlier than the admission date
2. Clean the data using:
* Regular expressions: misspellings, regular expression patterns
* KNN imputation and other missing-value imputation methods: missing values
* Coercing: data-type constraints
* Melting: tidy data issues
* Date/time parsing
* Removing observations

Source
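
Several of the constraints listed above translate directly into record-level checks. The sketch below assumes records are plain Python dicts with hypothetical fields (`ssn`, `age`, `gender`, `phone`, `admitted`, `discharged`); the 0 to 120 age range and the choice of mandatory fields are assumptions for illustration. It only detects violations; fixing them (imputation, coercion, melting) is the separate step 2.

```python
import re
from collections import Counter
from datetime import date

PHONE = re.compile(r"\(\d{3}\)\d{3}-\d{4}")   # the (999)999-9999 pattern from the list
GENDERS = {"female", "male"}                   # set-membership constraint from the list
MANDATORY = ("ssn", "admitted")                # hypothetical required fields


def validate(rec):
    """Return the constraint violations found in one record (a dict)."""
    problems = [f"missing {field}" for field in MANDATORY if rec.get(field) in (None, "")]
    age = rec.get("age")
    if not isinstance(age, int):                              # data-type constraint
        problems.append("age is not an integer")
    elif not 0 <= age <= 120:                                 # range constraint (assumed limits)
        problems.append("age out of range")
    if rec.get("gender") not in GENDERS:                      # set-membership constraint
        problems.append("gender not in allowed set")
    if not PHONE.fullmatch(rec.get("phone") or ""):           # regular-expression pattern
        problems.append("phone does not match (999)999-9999")
    adm, dis = rec.get("admitted"), rec.get("discharged")
    if isinstance(adm, date) and isinstance(dis, date) and dis < adm:   # cross-field validation
        problems.append("discharge before admission")
    return problems


def duplicate_keys(records, key="ssn"):
    """Unique constraint: non-empty values of `key` that occur more than once."""
    counts = Counter(r.get(key) for r in records if r.get(key))
    return {k for k, c in counts.items() if c > 1}
```

Keeping each rule as a small, named check makes it easy to report how many records fail each constraint before deciding whether to fix, impute, or drop them.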

[ VIDEO OF THE WEEK]

@AngelaZutavern & @JoshDSullivan @BoozAllen discussed Mathematical Corporation #FutureOfData

[ QUOTE OF THE WEEK]

What we have is a data glut. – Vernon Vinge

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Eloy Sasot, News Corp

[ FACT OF THE WEEK]

The U.S. could face a shortage of 140,000 to 190,000 people with deep analytical skills to fill the demand for Big Data jobs by 2018.

Sourced from: Analytics.CLUB #WEB Newsletter

Nov 23, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Ethics  Source

[ AnalyticsWeek BYTES]

>> Apr 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> How AI is hacking humanity! Lesson from #Brexit & #Election2016 by v1shal

>> Startup Movement Vs Momentum, a Classic Dilemma by v1shal

[ NEWS BYTES]

>>
 Ditching Engagement in Favor of Blunt-Force Awareness Is a Temptation Marketers Must Avoid – Adweek Under  Social Analytics

>>
 Big Data Set to Get Much Bigger by 2021 – Which-50 (blog) Under  Big Data

>>
 Weak cyber-security protocols can rob companies off clients say experts – Exchange4Media Under  cyber security

[ FEATURED COURSE]

Hadoop Starter Kit

Hadoop learning made easy and fun. Learn HDFS, MapReduce and introduction to Pig and Hive with FREE cluster access…. more

[ FEATURED READ]

On Intelligence

Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from the zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the absence of error bars. Not every model scales or holds its ground as data grows. The error bars attached to almost every model should be properly calibrated: as business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween we never want to see.

[ DATA SCIENCE Q&A]

Q:Is it better to design robust or accurate algorithms?
A: A. The ultimate goal is to design systems with good generalization capacity, that is, systems that correctly identify patterns in data instances not seen before
B. The generalization performance of a learning system strongly depends on the complexity of the model assumed
C. If the model is too simple, the system can only capture the actual data regularities in a rough manner. In this case, the system has poor generalization properties and is said to suffer from underfitting
D. By contrast, when the model is too complex, the system can identify accidental patterns in the training data that need not be present in the test set. These spurious patterns can be the result of random fluctuations or of measurement errors during the data collection process. In this case, the generalization capacity of the learning system is also poor. The learning system is said to be affected by overfitting
E. Spurious patterns, which are only present by accident in the data, tend to have complex forms. This is the idea behind the principle of Occam’s razor for avoiding overfitting: simpler models are preferred if more complex models do not significantly improve the quality of the description for the observations
Quick response: Occam’s Razor. It depends on the learning task. Choose the right balance
F. Ensemble learning can help balance bias and variance (several weak learners together = strong learner)
Source
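
The trade-off in points B to E can be seen with any model whose complexity is tunable. This sketch swaps in k-nearest-neighbour regression (not mentioned in the answer above) on hypothetical data, a noisy sin(x) curve: k = 1 memorises the training set (zero training error, overfitting), while k = 200 averages all training points into one constant (underfitting). Choosing k on held-out data is one simple way to pick the balance.

```python
import math
import random


def knn_predict(train, x, k):
    """k-nearest-neighbour regression: average the targets of the k closest training points."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model fitted on `train`, evaluated on `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


rng = random.Random(42)


def sample(n):
    """Hypothetical data: the signal sin(x) plus Gaussian measurement noise (sd 0.3)."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]


train, valid = sample(200), sample(200)
ks = (1, 3, 5, 9, 15, 30, 200)        # k = 1 is the most complex model, k = 200 the simplest
for k in ks:
    print(f"k={k:>3}  train MSE={mse(train, train, k):.3f}  validation MSE={mse(train, valid, k):.3f}")
best_k = min(ks, key=lambda k: mse(train, valid, k))
```

The training error alone always favours the most complex model; only the validation column reveals the underfitting and overfitting ends of the range.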

[ VIDEO OF THE WEEK]

RShiny Tutorial: Turning Big Data into Business Applications

[ QUOTE OF THE WEEK]

The data fabric is the next middleware. – Todd Papaioannou

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @DavidRose, @DittoLabs

[ FACT OF THE WEEK]

According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day, and has more than 465 million accounts.

Sourced from: Analytics.CLUB #WEB Newsletter

Nov 16, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Insights  Source

[ AnalyticsWeek BYTES]

>> IT jobs to shift to new tech, data analytics, cloud services by analyticsweekpick

>> The Increasing Influence of Cloud Computing by jelaniharper

>> Factoid to Give Big-Data a Perspective by v1shal

[ NEWS BYTES]

>>
 Microsoft Azure customers now can run workloads on Cray supercomputers – ZDNet Under  Data Scientist

>>
 ‘Cyber security a major challenge for govt organisations’ – Hindu Business Line Under  cyber security

>>
 Master of machines: the rise of artificial intelligence calls for postgrad experts – The Guardian Under  Artificial Intelligence

[ FEATURED COURSE]

Applied Data Science: An Introduction

As the world’s data grow exponentially, organizations across all sectors, including government and not-for-profit, need to understand, manage and use big, complex data sets—known as big data…. more

[ FEATURED READ]

Rise of the Robots: Technology and the Threat of a Jobless Future

What are the jobs of the future? How many will there be? And who will have them? As technology continues to accelerate and machines begin taking care of themselves, fewer people will be necessary. Artificial intelligence… more

[ TIPS & TRICKS OF THE WEEK]

Keeping biases in check during the last mile of decision making
Today a data-driven leader, data scientist or data expert is constantly put to the test, helping their team solve problems with their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds bias to our judgement and taints the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we fall into small traps and find ourselves caught in biases that impair our judgement. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q:You have data on the durations of calls to a call center. Generate a plan for how you would code and analyze these data. Explain a plausible scenario for what the distribution of these durations might look like. How could you test, even graphically, whether your expectations are borne out?
A: 1. Exploratory data analysis
* Histogram of durations
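* Histogram of durations per service type, per day of week, per hour of day (durations can be systematically longer from 10am to 1pm, for instance), per employee…
2. Distribution: plausibly lognormal, since durations are positive and right-skewed (many short calls, a long tail of very long ones)

3. Test graphically with a QQ plot: sample quantiles of log(durations) vs. normal quantiles; points close to a straight line support the lognormal hypothesis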

Source
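
Points 2 and 3 can be checked numerically as well as graphically. This sketch simulates call durations from a lognormal distribution (mu = 5.0 and sigma = 0.8 are assumptions, not real call-center data), builds QQ pairs against standard-normal quantiles with `statistics.NormalDist`, and summarises how straight the QQ plot is with the correlation of the pairs. Plotting the same pairs with any charting library gives the QQ plot itself.

```python
import math
import random
from statistics import NormalDist


def qq_points(sample):
    """Pair standard-normal quantiles with the sorted sample (plotting positions (i - 0.5) / n)."""
    xs = sorted(sample)
    n = len(xs)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))


def qq_correlation(pairs):
    """Pearson correlation of the QQ pairs: close to 1 when the points lie on a straight line."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
# Simulated durations in seconds; mu = 5.0 and sigma = 0.8 are illustrative assumptions.
durations = [rng.lognormvariate(5.0, 0.8) for _ in range(500)]
r_log = qq_correlation(qq_points([math.log(d) for d in durations]))
r_raw = qq_correlation(qq_points(durations))
print(f"QQ correlation: log(durations) {r_log:.3f}, raw durations {r_raw:.3f}")
```

If the durations really are lognormal, the QQ correlation for log(durations) should be very close to 1 and noticeably higher than for the raw, right-skewed durations.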

[ VIDEO OF THE WEEK]

@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset #FutureOfData #Podcast

[ QUOTE OF THE WEEK]

In God we trust. All others must bring data. – W. Edwards Deming

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Dr. Nipa Basu, @DnBUS

[ FACT OF THE WEEK]

Retailers who leverage the full power of big data could increase their operating margins by as much as 60%.

Sourced from: Analytics.CLUB #WEB Newsletter

Nov 09, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Accuracy check  Source

[ AnalyticsWeek BYTES]

>> Surviving the Internet of Things by v1shal

>> Map of US Hospitals and their Health Outcome Metrics by bobehayes

>> Eradicating Silos Forever with Linked Enterprise Data by jelaniharper

[ NEWS BYTES]

>>
 The Importance of TSP Snapshot Statistics – FEDweek Under  Statistics

>>
 World’s largest data center to be built in Arctic Circle – CNBC Under  Data Center

>>
 Hybrid cloud and blockchain solutions will be the future for data … – Information Age Under  Hybrid Cloud

[ FEATURED COURSE]

A Course in Machine Learning

Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need… more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

Looking for success in data science? Find a mentor
Yes, most of us don’t feel the need, but most of us could really use one. Because many data science professionals work in isolation, getting an unbiased perspective is not easy. It is also often hard to see how a data science career is going to progress. A network of mentors addresses both issues: it gives data professionals an outside perspective and an unbiased ally. It is extremely important for data science professionals to build a mentor network and to use it throughout their careers.

[ DATA SCIENCE Q&A]

Q:What is statistical power?
A: * Sensitivity of a binary hypothesis test
* Probability that the test correctly rejects the null hypothesis $H_0$ when the alternative $H_1$ is true
* Ability of a test to detect an effect, if the effect actually exists
* $\text{Power} = P(\text{reject } H_0 \mid H_1 \text{ is true}) = 1 - \beta$, where $\beta$ is the probability of a Type II error
* As power increases, the chance of a Type II error (false negative) decreases
* Used in the design of experiments, to calculate the minimum sample size required to reasonably detect an effect, e.g. ‘how many times do I need to flip a coin to conclude it is biased?’
* Used to compare tests. Example: between a parametric and a non-parametric test of the same hypothesis

Source
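
The coin-flip question can be answered with the usual normal approximation to the binomial test. The sketch below assumes a two-sided test of H0: p = 0.5 at level alpha = 0.05 and a hypothetical true probability of heads p = 0.6; it ignores the far rejection tail, which contributes negligibly. `min_flips` uses the standard sample-size formula and `power_coin` evaluates the approximate power for a given number of flips.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal: mean 0, sd 1


def power_coin(n, p1, p0=0.5, alpha=0.05):
    """Approximate power of a two-sided test of H0: p = p0 after n flips when the true
    probability of heads is p1 (normal approximation; the far rejection tail is ignored)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se0 = math.sqrt(p0 * (1 - p0) / n)     # standard error of the sample proportion under H0
    se1 = math.sqrt(p1 * (1 - p1) / n)     # ... and under the alternative
    return STD_NORMAL.cdf((abs(p1 - p0) - z_crit * se0) / se1)


def min_flips(p1, p0=0.5, alpha=0.05, power=0.8):
    """Smallest n for which the approximate power reaches `power`."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / abs(p1 - p0)) ** 2)


n = min_flips(0.6)
print(n, round(power_coin(n, 0.6), 3))
```

Under these assumptions `min_flips(0.6)` returns 194 flips for 80% power; a subtler bias such as p = 0.55 needs roughly four times as many, since the required n grows with the inverse square of the effect size.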

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting

[ QUOTE OF THE WEEK]

You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment. – Alvin Toffler

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MPFlowersNYC, @enigma_data

[ FACT OF THE WEEK]

571 new websites are created every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

Nov 02, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data shortage  Source

[ AnalyticsWeek BYTES]

>> Malaysia opens digital government lab for big data analytics by analyticsweekpick

>> Sep 07, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> A Visual Approach to Data Management: The Transcendent Power of Data Visualizations by jelaniharper

[ NEWS BYTES]

>>
 Big Data and Drone Tech Can Help Fight Famine – The Cipher Brief Under  Big Data

>>
 New Jersey Resources Corp (NYSE:NJR) Institutional Investor Sentiment Analysis – Finance News Daily Under  Sentiment Analysis

>>
 Different types of virtualization – RCR Wireless – RCR Wireless News Under  Virtualization

[ FEATURED COURSE]

Artificial Intelligence

This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored f… more

[ TIPS & TRICKS OF THE WEEK]

A strong business case could save your project
Like anything in corporate culture, a project is often about the business, not the technology. The same thinking applies to data analysis: it’s not always about the technicalities but about the business implications. Data science project success criteria should therefore include project management success criteria as well. This ensures smooth adoption, easy buy-in, room for wins, and cooperative stakeholders. So a good data scientist should also possess some of the qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:How do you take millions of users with hundreds of transactions each, across tens of thousands of products, and group the users into meaningful segments?
A: 1. Some exploratory data analysis (get a first insight)

* Transactions by date
* Count of customers vs. number of items bought
* Total items vs. total baskets per customer
* Total items vs. total baskets per area

2. Create new features (per customer):

Counts:

* Total baskets (unique days)
* Total items
* Total spent
* Unique product IDs

Distributions:

* Items per basket
* Spent per basket
* Product IDs per basket
* Duration between visits
* Product preferences: proportion of items per product category per basket

3. Too many features? Reduce dimensionality, e.g. with PCA

4. Clustering:

* Cluster the customers on the PCA-reduced features (e.g. with k-means)

5. Interpreting model fit
* View the clustering on pairs of principal component axes, e.g. PC1 vs. PC2, PC2 vs. PC3
* Interpret each principal component from the linear combination it is obtained from; example: PC1 = “spendy” axis (proportion of baskets containing spendy items, raw counts of items and visits)

Source
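
Step 2 is mostly aggregation. The sketch below assumes transactions arrive as hypothetical `(customer_id, day, product_id, category, price)` tuples with integer day numbers, treats each customer-day as one basket (as the “unique days” count suggests), and computes the count features plus the mean of each distribution feature. Category preference is simplified to each customer’s overall share of items per category rather than a per-basket proportion. The resulting per-customer vectors would then be scaled and passed to PCA and clustering.

```python
from collections import Counter, defaultdict
from statistics import mean


def customer_features(transactions):
    """Per-customer count and distribution features from (customer, day, product, category, price) rows."""
    rows_by_customer = defaultdict(list)
    for cust, day, product, category, price in transactions:
        rows_by_customer[cust].append((day, product, category, price))

    features = {}
    for cust, rows in rows_by_customer.items():
        baskets = defaultdict(list)                  # one basket per customer per day
        for day, product, category, price in rows:
            baskets[day].append((product, category, price))
        days = sorted(baskets)
        gaps = [later - earlier for earlier, later in zip(days, days[1:])]
        category_counts = Counter(category for _, _, category, _ in rows)
        features[cust] = {
            # counts
            "total_baskets": len(baskets),
            "total_items": len(rows),
            "total_spent": sum(price for _, _, _, price in rows),
            "unique_products": len({product for _, product, _, _ in rows}),
            # distributions, summarised here by their mean
            "items_per_basket": mean(len(b) for b in baskets.values()),
            "spent_per_basket": mean(sum(price for _, _, price in b) for b in baskets.values()),
            "mean_days_between_visits": mean(gaps) if gaps else None,
            # preferences (simplified: overall share of items per category)
            "category_share": {c: k / len(rows) for c, k in category_counts.items()},
        }
    return features
```

At the scale in the question this loop would normally run as a group-by in a database or distributed engine; the feature definitions stay the same.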

[ VIDEO OF THE WEEK]

@AngelaZutavern & @JoshDSullivan @BoozAllen discussed Mathematical Corporation #FutureOfData

[ QUOTE OF THE WEEK]

He uses statistics as a drunken man uses lamp posts—for support rather than for illumination. – Andrew Lang

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MichOConnell, @Tibco

[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

Sep 21, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Trust the data  Source

[ AnalyticsWeek BYTES]

>> April 3, 2017 Health and Biotech analytics news roundup by pstein

>> CEOs to Employees – Vote for Romney else Face Layoffs. A Good Strategy? by v1shal

>> 8 big trends in big data analytics by analyticsweekpick

[ NEWS BYTES]

>>
 The Hybrid Cloud Depends on Solid Networking – EnterpriseNetworkingPlanet (blog) Under  Hybrid Cloud

>>
 Hoteliers witness revenue surge by four pct with DJUBO adoption – Yahoo News Under  Sales Analytics

>>
 FX Volatility Focused on Weak USD As JPY Firms; EUR Pushes To 1.2000 – DailyFX Under  Sentiment Analysis

[ FEATURED COURSE]

CS229 – Machine Learning

This course provides a broad introduction to machine learning and statistical pattern recognition. … more

[ FEATURED READ]

Introduction to Graph Theory (Dover Books on Mathematics)

A stimulating excursion into pure mathematics aimed at “the mathematically traumatized,” but great fun for mathematical hobbyists and serious mathematicians as well. Requiring only high school algebra as mathematical bac… more

[ TIPS & TRICKS OF THE WEEK]

Data aids judgement; it does not replace it
Data is a tool and a means to help build consensus and facilitate human decision-making, not to replace it. Analysis converts data into information; information, placed in context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start: context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
A: * In long-tailed distributions, a high-frequency population is followed by a low-frequency population, which gradually tails off asymptotically
* Rule of thumb: the majority of occurrences (more than half, and 80% when the Pareto principle applies) are accounted for by the first 20% of items in the distribution
* The least frequently occurring 80% of items are more important as a proportion of the total population
* Related laws: Zipf’s law, the Pareto distribution, power laws

Examples:
1. Natural language
– Given some corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table
– The most frequent word will occur approximately twice as often as the second most frequent, three times as often as the third most frequent…
– ‘the’ accounts for 7% of all word occurrences (about 70,000 out of 1 million)
– ‘of’ accounts for 3.5%, followed by ‘and’…
– Only 135 vocabulary items are needed to account for half the English corpus!

2. Allocation of wealth among individuals: the larger portion of the wealth of any society is controlled by a smaller percentage of the people

3. File size distribution of Internet traffic

Additional: hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, sizes of meteorites

Importance in classification and regression problems:
– Skewed distribution
– Which metrics to use? Accuracy paradox (classification), F-score, AUC
– Issue when using models that assume linearity (linear regression): you may need to apply a monotone transformation to the data (logarithm, square root, sigmoid function…)
– Issue when sampling: your data becomes even more unbalanced! Use stratified sampling instead of random sampling, SMOTE (‘Synthetic Minority Over-sampling Technique’, N. V. Chawla), or an anomaly detection approach

Source
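
Two quick numbers illustrate the points above. Under an idealised Zipf’s law with exponent 1 over 1,000 hypothetical items, the top-ranked item is exactly twice as frequent as the second, and the top 20% of items account for about 78.5% of occurrences. The second part shows the accuracy paradox on a made-up 5%/95% label split: always predicting the majority class scores 95% accuracy but an F-score of zero.

```python
def zipf_frequencies(n_items, s=1.0):
    """Relative frequencies where the item of rank r has weight 1 / r**s (Zipf's law)."""
    weights = [1 / r ** s for r in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(freqs, fraction=0.2):
    """Share of all occurrences accounted for by the most frequent `fraction` of items."""
    k = max(1, int(len(freqs) * fraction))
    return sum(sorted(freqs, reverse=True)[:k])


freqs = zipf_frequencies(1000)
print(f"rank 1 / rank 2 = {freqs[0] / freqs[1]:.2f}; top 20% share = {top_share(freqs):.3f}")

# Accuracy paradox: 5 positives, 95 negatives, and a model that always predicts 0.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
predicted_pos = sum(y_pred)
precision = tp / predicted_pos if predicted_pos else 0.0
recall = tp / sum(y_true)
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"accuracy = {accuracy:.2f}, F-score = {f1:.2f}")
```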

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Marketing Analytics

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with John Young, @Epsilonmktg

[ FACT OF THE WEEK]

14.9 percent of marketers polled in Crain’s BtoB Magazine are still wondering ‘What is Big Data?’

Sourced from: Analytics.CLUB #WEB Newsletter

Sep 14, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Correlation-Causation  Source

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz

Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts

This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Data aids judgement; it does not replace it
Data is a tool and a means to help build consensus and facilitate human decision-making, not to replace it. Analysis converts data into information; information, placed in context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start: context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:Explain selection bias (with regard to a dataset, not variable selection). Why is it important? How can data management procedures such as missing data handling make it worse?
A: * Selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved
Types:
– Sampling bias: systematic error due to a non-random sample of a population causing some members to be less likely to be included than others
– Time interval: a trial may terminated early at an extreme value (ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all the variables have similar means
– Data: “cherry picking”, when specific subsets of the data are chosen to support a conclusion (citing examples of plane crashes as evidence of airline flight being unsafe, while the far more common example of flights that complete safely)
– Studies: performing experiments and reporting only the most favorable results
– Can lead to unaccurate or even erroneous conclusions
– Statistical methods can generally not overcome it

Why can missing data handling make it worse?
– Example: individuals who know or suspect that they are HIV positive are less likely to participate in HIV surveys
– Missing data handling will amplify this effect, because the values filled in are based on the respondents, who are mostly HIV negative
– Prevalence estimates will be inaccurate (biased downward)
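The HIV example can be made concrete with a little arithmetic. The numbers below are hypothetical, not real survey data: 10,000 people, a true prevalence of 10%, and assumed response rates of 50% for HIV-positive and 90% for HIV-negative individuals. The missing statuses are filled with mode imputation (the most common observed value), which is just one simple missing data method chosen for illustration.

```python
from collections import Counter

# Hypothetical survey (illustrative numbers, not real data).
N = 10_000
true_positives = 1_000           # true prevalence = 10%
response_rate_positive = 0.5     # assumed: HIV-positive people respond less often
response_rate_negative = 0.9

# Expected number of respondents in each group.
resp_pos = round(true_positives * response_rate_positive)        # 500
resp_neg = round((N - true_positives) * response_rate_negative)  # 8,100
missing = N - resp_pos - resp_neg                                # 1,400

true_prevalence = true_positives / N

# Complete-case analysis: drop the non-respondents entirely.
complete_case = resp_pos / (resp_pos + resp_neg)

# Mode imputation: fill every missing status with the most common observed value.
observed = ["positive"] * resp_pos + ["negative"] * resp_neg
mode_value = Counter(observed).most_common(1)[0][0]  # "negative"
completed = observed + [mode_value] * missing
imputed = completed.count("positive") / len(completed)

print(f"true prevalence:         {true_prevalence:.3f}")
print(f"complete-case estimate:  {complete_case:.3f}")
print(f"mode-imputed estimate:   {imputed:.3f}")
```

Complete-case analysis already underestimates prevalence (500 / 8,600, about 5.8%, against a true 10%), and mode imputation makes it worse (500 / 10,000 = 5%) because every non-respondent is assumed to be negative.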

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Big Data Analytics

Subscribe to YouTube

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @DavidRose, @DittoLabs

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

In late 2011, IDC Digital Universe published a report indicating that some 1.8 zettabytes of data would be created that year.

Sourced from: Analytics.CLUB #WEB Newsletter

Sep 07, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Human resource  Source

[ AnalyticsWeek BYTES]

>> 4 Marketing Analytics Tools That Are Shaping the Industry by analyticsweekpick

>> Employee Productivity in 40 Hours Work Week [Infographics] by v1shal

>> 20 Best Practices for Customer Feedback Programs: Strategy and Governance by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>> Artificial intelligence holds great potential for both students and teachers – but only if used wisely – The Conversation AU Under Artificial Intelligence

>> Oracle’s New Video Series Shows Off Its Customer Experience Chops – Adweek Under Customer Experience

>> What’s next for wireless? – Telegraph.co.uk Under IOT

More NEWS ? Click Here

[ FEATURED COURSE]

Learning from data: Machine learning course

This is an introductory course in machine learning (ML) that covers the basic theory, algorithms, and applications. ML is a key technology in Big Data, and in many financial, medical, commercial, and scientific applicati… more

[ FEATURED READ]

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

In the world’s top research labs and universities, the race is on to invent the ultimate learning algorithm: one capable of discovering any knowledge from data, and doing anything we want, before we even ask. In The Mast… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist, or data-driven expert is constantly put to the test when helping a team solve a problem with their skills and expertise. Believe it or not, part of that decision tree comes from intuition, which adds bias to our judgement and can taint our suggestions. Most skilled professionals understand and manage these biases well, but in a few cases we give in to small traps and find ourselves caught in biases that impair our judgement. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q:How do you test whether a new credit risk scoring model works?
A: * Test on a holdout set
* Kolmogorov-Smirnov test

Kolmogorov-Smirnov test:
– Non-parametric test
– Compare a sample with a reference probability distribution or compare two samples
– Quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution
– Or between the empirical distribution functions of two samples
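– Null hypothesis (two-sample test): the samples are drawn from the same distribution
– Can be modified to serve as a goodness-of-fit test
– In our case: compare the cumulative percentage of goods with the cumulative percentage of bads across the score range; the KS statistic is the maximum gap between the two curves, and a larger gap means the model separates good and bad accounts better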
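The credit-scoring use of the KS test can be sketched in plain Python. `ks_statistic` is a hypothetical helper that computes the two-sample KS statistic as the largest gap between the cumulative share of goods and the cumulative share of bads at each score; the scores are invented for illustration.

```python
def ks_statistic(good_scores, bad_scores):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of the two score samples."""
    goods = sorted(good_scores)
    bads = sorted(bad_scores)
    n_good, n_bad = len(goods), len(bads)
    i = j = 0
    max_gap = 0.0
    # The empirical CDFs only change at observed scores, so it is enough
    # to check the gap at every distinct score value in increasing order.
    for s in sorted(set(goods) | set(bads)):
        while i < n_good and goods[i] <= s:
            i += 1
        while j < n_bad and bads[j] <= s:
            j += 1
        # i / n_good: cumulative share of goods with score <= s
        # j / n_bad:  cumulative share of bads with score <= s
        max_gap = max(max_gap, abs(i / n_good - j / n_bad))
    return max_gap


# Hypothetical holdout scores (made up for illustration).
good = [620, 650, 680, 700, 720, 750, 780, 800]
bad = [560, 590, 610, 640, 660, 690]
print(f"KS = {ks_statistic(good, bad):.3f}")  # KS = 0.625
```

`scipy.stats.ks_2samp` computes the same two-sample statistic together with a p-value, so it is a reasonable choice when SciPy is available; the hand-rolled version simply makes the "maximum gap between cumulative percentages" idea visible.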

Source

[ VIDEO OF THE WEEK]

Big Data Introduction to D3

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Big Data is not the new oil. – Jer Thorp

[ PODCAST OF THE WEEK]

#DataScience Approach to Reducing #Employee #Attrition

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Facebook users send on average 31.25 million messages and view 2.77 million videos every minute.
