[ COVER OF THE WEEK]
Big Data knows everything
[ LOCAL EVENTS & SESSIONS]
- Jun 14, 2017 #WEB Big Data & Analytics for Retail Summit – Day 1
- May 31, 2017 #WEB Empowering Yourself with Data Science & Case Study: Classifying Cars
- Jun 27, 2017 #WEB Sentiment Analysis Symposium – Day 1
[ AnalyticsWeek BYTES]
[ NEWS BYTES]
[ FEATURED COURSE]
[ FEATURED READ]
If you are looking for a book to help you understand how the machine learning algorithms “Random Forest” and “Decision Trees” work behind the scenes, then this is a good book for you. Those two algorithms are commonly u…
[ TIPS & TRICKS OF THE WEEK]
Save yourself from the zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds its ground as data grows, so the error bars attached to a model should be duly calibrated. As a business rakes in more data, error bars keep the model sensible and in check; if they are not accounted for, our models become susceptible to failures that lead us to a Halloween we never want to see.
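The tip doesn’t prescribe a method, but one common way to put an error bar on a model metric is a bootstrap confidence interval. A minimal sketch, assuming you have per-prediction errors on holdout data (the function name, the toy numbers, and the 95% level are illustrative):

import numpy as np

def bootstrap_error_bar(residuals, n_boot=1000, seed=0):
    # 95% bootstrap confidence interval for the mean model error
    rng = np.random.default_rng(seed)
    means = [rng.choice(residuals, size=len(residuals), replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, [2.5, 97.5])

# Example: absolute errors from some model's holdout predictions (made-up numbers)
errors = np.array([0.8, 1.2, 0.5, 2.1, 0.9, 1.7, 0.4, 1.1])
low, high = bootstrap_error_bar(errors)
print(f"mean abs error: {errors.mean():.2f}, 95% CI: [{low:.2f}, {high:.2f}]")

Recomputing the interval as data volumes grow is exactly the recalibration the tip asks for: a widening interval is the early warning that the model is not scaling.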
[ DATA SCIENCE Q&A]
Q: What are collaborative filtering, n-grams, and cosine distance?
A: Collaborative filtering:
– Technique used by some recommender systems
– Filtering for information or patterns using techniques that involve collaboration among multiple agents (viewpoints, data sources, etc.)
1. A user expresses his/her preferences by rating items (movies, CDs, etc.)
2. The system matches this user’s ratings against other users’ ratings and finds the people with the most similar tastes
3. The system then recommends items that these similar users have rated highly but that this user has not yet rated (see the sketch below)
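A toy user-based sketch of those three steps, using cosine similarity between rating vectors (the ratings matrix, helper names, and the similarity choice are illustrative, not prescribed by the answer):

import numpy as np

# Toy user-item rating matrix (rows = users, columns = items); 0 = not yet rated.
ratings = np.array([
    [5, 4, 0, 1],   # user 0 has not rated item 2
    [4, 5, 5, 1],
    [1, 0, 5, 4],
], dtype=float)

def cosine(u, v):
    # Cosine similarity between two rating vectors
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def recommend(user, k=1):
    # Step 2: find the other user with the most similar ratings
    sims = [(cosine(ratings[user], ratings[other]), other)
            for other in range(len(ratings)) if other != user]
    _, nearest = max(sims)
    # Step 3: rank the target user's unrated items by the neighbour's ratings
    unseen = np.where(ratings[user] == 0)[0]
    ranked = sorted(unseen, key=lambda i: -ratings[nearest, i])
    return [int(i) for i in ranked[:k]]

print(recommend(user=0))  # [2]: user 1 is most similar and rated item 2 highly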
N-grams:
– Contiguous sequence of n items from a given sequence of text or speech
– Example: “Andrew is a talented data scientist”
– Bi-grams: “Andrew is”, “is a”, “a talented”, …
– Tri-grams: “Andrew is a”, “is a talented”, “a talented data”, …
– An n-gram model models sequences using the statistical properties of n-grams (see the Shannon Game)
– More concisely, an n-gram model estimates P(x_i | x_{i-(n-1)}, …, x_{i-1}): a Markov model (see the sketch after this list)
– In an n-gram model, each word depends only on the last n-1 words
– Issue: infrequent or unseen n-grams get zero or unreliable probability estimates
– Solution: smooth the probability distributions by assigning non-zero probabilities to unseen words or n-grams
– Methods: Good-Turing, backoff, Kneser-Ney smoothing
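A minimal sketch of the example above, plus an unsmoothed maximum-likelihood bigram estimate that shows why smoothing is needed (variable names and the one-sentence corpus are illustrative):

from collections import Counter

def ngrams(tokens, n):
    # Contiguous n-grams of a token sequence
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "Andrew is a talented data scientist".split()
print(ngrams(tokens, 2))  # [('Andrew', 'is'), ('is', 'a'), ('a', 'talented'), ...]
print(ngrams(tokens, 3))  # [('Andrew', 'is', 'a'), ('is', 'a', 'talented'), ...]

# Unsmoothed maximum-likelihood bigram model:
# P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})
bigrams = Counter(ngrams(tokens, 2))
unigrams = Counter(tokens)
print(bigrams[("data", "scientist")] / unigrams["data"])  # 1.0 in this tiny corpus
# Any bigram absent from the corpus gets probability 0; hence the need for smoothing.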
Cosine distance:
– Answers the question: how similar are two documents?
– Perfect similarity/agreement: 1
– No agreement: 0 (orthogonality)
– Measures orientation, not magnitude
– Given two vectors A and B representing word frequencies, cosine similarity is (see the sketch below):
cos(θ) = (A · B) / (||A|| × ||B||)
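A minimal sketch with word-frequency vectors stored as counters (the documents and the helper name are illustrative):

import math
from collections import Counter

def cosine_similarity(a, b):
    # Cosine of the angle between two word-frequency vectors
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

A = Counter("big data knows everything about big data".split())
B = Counter("big data analytics".split())
print(round(cosine_similarity(A, B), 3))  # 0.696: similar orientation despite different lengths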
[ VIDEO OF THE WEEK]
[ QUOTE OF THE WEEK]
“Torture the data, and it will confess to anything.” – Ronald Coase
[ PODCAST OF THE WEEK]
[ FACT OF THE WEEK]
Data volumes are exploding: more data has been created in the past two years than in the entire previous history of the human race.