Why Using the ‘Cloud’ Can Undermine Data Protections

By Jack Nicas

While the increasing use of encryption helps smartphone users protect their data, another, sometimes related technology, cloud computing, can undermine those protections.

The reason: encryption can keep certain smartphone data outside the reach of law enforcement. But once the data is uploaded to companies’ computers connected to the Internet–referred to as “the cloud”–it may be available to authorities with court orders.
“The safest place to keep your data is on a device that you have next to you,” said Marc Rotenberg, head of the Electronic Privacy Information Center. “You take a bit of a risk when you back up your device. Once you do that it’s on another server.”

Encryption and cloud computing “are two competing trends,” Mr. Rotenberg said. “The movement to the cloud has created new privacy risks for users and businesses. Encryption does offer the possibility of restoring those safeguards, but it has to be very strong and it has to be under the control of the user.”

Apple is fighting a government request that it help the Federal Bureau of Investigation unlock the iPhone of Syed Rizwan Farook, the shooter in the December terrorist attack in San Bernardino, Calif.

The FBI believes the phone could contain photos, videos and records of text messages that Mr. Farook generated in the final weeks of his life.

The data produced before then? Apple already provided it to investigators, under a court search warrant. Mr. Farook last backed up his phone to Apple’s cloud service, iCloud, on Oct. 19.

Encryption scrambles data to make it unreadable until accessed with the help of a unique key. The most recent iPhones and Android phones come encrypted by default, with a user’s passcode activating the unique encryption key stored on the device itself. That means a user’s contacts, photos, videos, calendars, notes and, in some cases, text messages are protected from anyone who doesn’t have the phone’s passcode. The list includes hackers, law enforcement and even the companies that make the phones’ software: Apple and Google.
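
As a rough illustration of the passcode-to-key idea described above, the sketch below derives an encryption key from a passcode using PBKDF2 from Python's standard library. It is not how Apple or Google implement device encryption; the function name `derive_key`, the salt size and the iteration count are assumptions chosen only for illustration.

```python
import hashlib
import hmac
import os

def derive_key(passcode: str, device_salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key from a passcode and a per-device salt (illustrative only)."""
    return hashlib.pbkdf2_hmac("sha256", passcode.encode("utf-8"), device_salt, iterations)

# Hypothetical per-device random value that never leaves the device.
device_salt = os.urandom(16)

key = derive_key("123456", device_salt)
same_again = derive_key("123456", device_salt)
wrong_guess = derive_key("123457", device_salt)

# The same passcode on the same device reproduces the key; any other passcode does not.
print(hmac.compare_digest(key, same_again))
print(hmac.compare_digest(key, wrong_guess))
```

The point of the sketch is that whoever lacks the passcode (and the on-device salt) cannot reproduce the key, whereas a cloud backup encrypted under a key the provider holds does not have that property.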

However, Apple and Google software prompt users to back up their devices on the cloud. Doing so puts that data on the companies’ servers, where it is more accessible to law enforcement with court orders.

Apple says it encrypts data stored on its servers, though it holds the encryption key. The exception is so-called iCloud Keychain data that stores users’ passwords and credit-card information; Apple says it can’t access or read that data.

Officials appear to be asking for user data more often. Google said that it received nearly 35,000 government requests for data in 2014 and that it complies with the requests in about 65% of cases. Apple’s data doesn’t allow for a similar comparison since the company reported the number of requests from U.S. authorities in ranges in 2013.

Whether or not they back up their smartphones to the cloud, most users generate an enormous amount of data that is stored outside their devices, and is thus more accessible to law enforcement.

“Your phone is an incredibly intricate surveillance device. It knows everyone you talk to, where you are, where you live and where you work,” said Bruce Schneier, chief technology officer at cybersecurity firm Resilient Systems Inc. “If you were required to carry one by law, you would rebel.”

Google, Yahoo Inc. and others store users’ emails on their servers. Telecom companies keep records of calls and some standard text messages. Facebook Inc. and Twitter Inc. store users’ posts, tweets and connections.

Even Snapchat Inc., the messaging service known for photo and video messages that quickly disappear, stores some messages. The company says in its privacy policy that “in many cases” it automatically deletes messages after they are viewed or expire. But it also says that “we may also retain certain information in backup for a limited period or as required by law” and that law enforcement sometimes requires it “to suspend our ordinary server-deletion practices for specific information.”

Snapchat didn’t respond to a request for comment.

Write to Jack Nicas at jack.nicas@wsj.com
(END) Dow Jones Newswires
02-18-16 1938ET
Copyright (c) 2016 Dow Jones & Company, Inc.

Source: Why Using the ‘Cloud’ Can Undermine Data Protections by analyticsweekpick

Nov 09, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Accuracy check

[ AnalyticsWeek BYTES]

>> Surviving the Internet of Things by v1shal

>> Map of US Hospitals and their Health Outcome Metrics by bobehayes

>> Eradicating Silos Forever with Linked Enterprise Data by jelaniharper

[ NEWS BYTES]

>> The Importance of TSP Snapshot Statistics – FEDweek (under Statistics)

>> World’s largest data center to be built in Arctic Circle – CNBC (under Data Center)

>> Hybrid cloud and blockchain solutions will be the future for data … – Information Age (under Hybrid Cloud)

[ FEATURED COURSE]

A Course in Machine Learning

Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need… more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. Because most data science professionals work in isolation, getting an unbiased perspective is not easy. It is also often hard to see how a data science career is going to progress. A network of mentors addresses both issues: it gives data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE Q&A]

Q: What is statistical power?
A: * Sensitivity of a binary hypothesis test
* Probability that the test correctly rejects the null hypothesis H0 when the alternative H1 is true
* Ability of a test to detect an effect, if the effect actually exists
* Power = P(reject H0 | H1 is true)
* As power increases, the chance of a Type II error (false negative) decreases
* Used in the design of experiments to calculate the minimum sample size required so that one can reasonably detect an effect, e.g. ‘how many times do I need to flip a coin to conclude it is biased?’
* Used to compare tests. Example: between a parametric and a non-parametric test of the same hypothesis
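
The coin-flip question in the list above can be sketched with an exact one-sided binomial test. This is a minimal illustration, not a reference implementation: the function names, the alternative p1 = 0.75, alpha = 0.05 and the 80% power target are all assumptions made for the example.

```python
from math import comb

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, i, p) for i in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest k such that P(X >= k | p0) <= alpha (one-sided rejection region)."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += binom_pmf(n, k, p0)
        if tail > alpha:
            return k + 1
    return 0

def power(n, p0, p1, alpha=0.05):
    """P(reject H0 | H1 is true) when H0: p = p0 and H1: p = p1 > p0."""
    return upper_tail(n, critical_value(n, p0, alpha), p1)

def min_flips(p0, p1, alpha=0.05, target=0.8, max_n=500):
    """First n whose power reaches the target, or None if none is found up to max_n."""
    for n in range(1, max_n + 1):
        if power(n, p0, p1, alpha) >= target:
            return n
    return None

n = min_flips(0.5, 0.75)
print(n, round(power(n, 0.5, 0.75), 3))
```

Because the binomial distribution is discrete, exact power does not rise smoothly with n, so `min_flips` reports the first n that reaches the target rather than a guarantee for every larger n.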

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting

[ QUOTE OF THE WEEK]

You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment. – Alvin Tof

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MPFlowersNYC, @enigma_data

[ FACT OF THE WEEK]

571 new websites are created every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

Surge in real-time big data and IoT analytics is changing corporate thinking

Big data that can be immediately actionable in business decisions is transforming corporate thinking. One expert cautions that a mindset change is needed to get the most from these analytics.

Gartner reported in September 2014 that 73% of respondents in a third quarter 2014 survey had already invested or planned to invest in big data in the next 24 months. This was an increase from 64% in 2013.

The big data surge has fueled the adoption of Hadoop and other big data batch processing engines, but it is also moving beyond batch and into a real-time big data analytics approach.

Organizations want real-time big data and analytics capability because of an emerging need for big data that can be immediately actionable in business decisions. An example is the use of big data in online advertising, which immediately personalizes ads for viewers when they visit websites based on their customer profiles that big data analytics have captured.

“Customers now expect personalization when they visit websites,” said Jeff Kelley, a big data analytics analyst from Wikibon, a big data research and analytics company. “There are also other real-time big data needs in specific industry verticals that want real-time analytics capabilities.”

The financial services industry is a prime example. “Financial institutions want to cut down on fraud, and they also want to provide excellent service to their customers,” said Kelley. “Several years ago, if a customer tried to use his debit card in another country, he was often denied because of fears of fraud in the system processing the transaction. Now these systems better understand each customer’s habits and the places that he is likely to travel to, so they do a better job at preventing fraud, but also at enabling customers to use their debit cards without these cards being locked down for use when they travel abroad.”

Kelly believes that in the longer term this ability to apply real-time analytics to business problems will grow as the Internet of Things (IoT) becomes a bigger factor in daily life.

“The Internet of Things will enable sensor tracking of consumer type products in businesses and homes,” he said. “You will be able to collect and analyze data from various pieces of equipment and appliances and optimize performance.”

The process of harnessing IoT data is highly complex, and companies like GE are now investigating the possibilities. If this IoT data can be captured in real time and acted upon, preventive maintenance analytics can be developed to preempt performance problems on equipment and appliances, and it might also be possible for companies to deliver more rigorous sets of service level agreements (SLAs) to their customers.

Kelly is excited at the prospects, but he also cautions that companies have to change the way they view themselves and their data to get the most out of IoT advancement.

“There is a fundamental change of mindset,” he explained, “and it will require different ways of approaching application development and how you look at the business. For example, a company might have to redefine itself from thinking that it only ‘makes trains,’ to a company that also ‘services trains with data.’”

The service element, warranties, service contracts, how you interact with the customer, and what you learn from these customer interactions that could be forwarded into predictive selling are all areas that companies might need to rethink and realign in their business as more IoT analytics come online. The end result could be a reformation of customer relationship management (CRM) to a strictly customer-centric model that takes into account every aspect of the customer’s “life cycle” with the company — from initial product purchases, to servicing, to end of product life considerations and a new beginning of the sales cycle.

Originally Posted at: Surge in real-time big data and IoT analytics is changing corporate thinking by analyticsweekpick

Development of the Customer Sentiment Index: Lexical Differences

This is Part 2 of a series on the Development of the Customer Sentiment Index (see introduction, and Part 1). The CSI assesses the extent to which customers describe your company/brand with words that reflect positive or negative sentiment. This post covers the development of a judgment-based sentiment lexicon and compares it to empirically-based sentiment lexicons.

Last week, I created four sentiment lexicons for use in a new customer experience (CX) metric, the Customer Sentiment Index (CSI). The four sentiment lexicons were empirically derived using data from a variety of online review sites from IMDB, Goodreads, OpenTable and Amazon/Tripadvisor. This week, I develop a sentiment lexicon using a non-empirical approach.

Human Judgment Approach to Sentiment Classification

The judgment-based approach does not rely on data to derive the sentiment values; rather this method requires the use of subject matter experts to classify words into sentiment categories. This approach is time-consuming, requiring the subject matter experts to manually classify each of the thousands of words in our empirically-derived lexicons. To minimize the work required by the subject matter experts, an initial set of opinion words were generated using two studies.

In the first study, as part of an annual customer survey, a B2B technology company included an open-ended survey question, “Using one word, please describe COMPANY’S products/services.” From 1619 completed surveys, 894 customers provided an answer for the question. Many respondents used multiple words or the company’s name as their response, reducing the number of useful responses to 689. These respondents used a total of 251 usable unique words.

Also, the customer survey included questions that required customers to provide ratings on measures of customer loyalty (e.g., overall satisfaction, likelihood to recommend, likelihood to buy different products, likelihood to renew) and satisfaction with the customer experience (e.g., product quality, sales process, ease of doing business, technical support).

In the second study, as part of a customer relationship survey, I solicited responses from customers of wireless service providers (B2C sample). The sample was obtained using Mechanical Turk by recruiting English-speaking participants to complete a short customer survey about their experience with their wireless service provider. In addition to the standard rated questions in the customer survey (e.g., customer loyalty, CX ratings), the following question was used to generate the one word opinion: “What one word best describes COMPANY? Please answer this question using one word.”

From 469 completed surveys, 429 customers provided an answer for the question. Many respondents used multiple words or the company’s name as their response, reducing the number of useful responses to 319. These respondents used a total of 85 usable unique words.

Sentiment Rating of Opinion Words

The list of customer-generated words for each sample was independently rated by the two experts. I was one of those experts. My good friend and colleague was the other expert. We both hold a PhD in industrial-organizational psychology and specialize in test development (him) and survey development (me). We have extensive graduate-level training on the topics of statistics and psychological measurement principles. Also, we have applied experience, helping companies gain value from psychological measurements. We each have over 20 years of experience in developing/validating tests and surveys.

For each list of words (N = 251 and N = 85), each expert was given the list of words and was instructed to “rate each word on a scale from 0 to 10; where 0 is most negative sentiment/opinion and 10 is most positive sentiment/opinion; and 5 is the midpoint.” After providing their first rating of each word, each of the two raters was then given the opportunity to adjust their initial ratings. For this process, each rater was given the list of 251 words with their initial ratings and was asked to make any adjustments.

Results of Human Judgment Approach to Sentiment Classification

Table 1. Descriptive Statistics and Correlations of Sentiment Values across Two Expert Raters

Descriptive statistics of and correlations among the expert-derived sentiment values of customer-generated words appear in Table 1. As you can see, the two raters assigned very similar sentiment ratings to words in both sets. Average ratings were similar. Also, the inter-rater agreement between the two raters was r = .87 for the 251 words and r = .88 for the 85 words.

After slight adjustments, the inter-rater agreement between the two raters improved to r = .90 for the list of 251 words and .92 for the list of 85 words. This high inter-rater agreement indicated that the raters were consistent in their interpretation of the two lists of words with respect to sentiment.
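
For readers who want to see how an inter-rater agreement figure like the ones above is obtained, here is a minimal sketch of the Pearson correlation between two raters. The ratings are made up for illustration and are not the study data; `pearson_r`, `rater_a` and `rater_b` are hypothetical names.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical 0-10 sentiment ratings of the same six words by two raters.
rater_a = [9, 2, 7, 5, 10, 1]
rater_b = [8, 3, 7, 4, 9, 2]

print(round(pearson_r(rater_a, rater_b), 2))
```

A value near 1 means the raters order and space the words in nearly the same way, even if one rater is systematically a little harsher than the other.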

Figure 1. Distribution of Sentiment Values of Customer-Generated Words using Subject Matter Experts’ Sentiment Lexicon

Because of the high agreement between the raters and comparable means between raters, an overall sentiment score for each word was calculated as the average of the raters’ second/adjusted rating (See Table 1 or Figure 2 for descriptive statistics for this metric).
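
The averaging step described above can be sketched as follows. The words and ratings are invented for illustration, and `adjusted_ratings` and `expert_lexicon` are hypothetical names rather than anything from the study.

```python
# Hypothetical second/adjusted ratings (rater 1, rater 2) on the 0-10 scale.
adjusted_ratings = {
    "reliable": (8, 9),
    "expensive": (3, 2),
    "adequate": (5, 6),
}

# Overall sentiment score per word: the mean of the two raters' adjusted ratings.
expert_lexicon = {
    word: sum(ratings) / len(ratings)
    for word, ratings in adjusted_ratings.items()
}

for word, score in expert_lexicon.items():
    print(word, score)
```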

Comparing Empirically-Derived and Expert-Derived Sentiment

In all, I have created five lexicons; four lexicons are derived empirically from four data sources (i.e., OpenTable, Amazon/Tripadvisor, Goodreads and IMDB) and one lexicon is derived using subject matter experts’ sentiment classification.

Table 2. Descriptive Statistics and Correlations among Sentiment Values of Customer-Generated Words across Five Sentiment Lexicons (N = 251)

I compared these five lexicons to better understand the similarity and differences of each lexicon. I applied the four empirically-derived lexicons to each list of customer-generated words. So, in all, for each list of words, I have 5 sentiment scores.

The descriptive statistics of and correlations among the five sentiment scores for the 251 customer-generated words appear in Table 2. Table 3 houses the information for the 85 customer-generated words.

Table 3. Descriptive Statistics and Correlations among Sentiment Values of Customer-Generated Words across Five Sentiment Lexicons (N = 85)

As you can see, there is high agreement among the empirically-derived lexicons (average correlation = .65 for the list of 251 words and .79 for the list of 85 words).
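
An average correlation across lexicons like the one quoted above can be computed by correlating every pair of lexicons over the same words and taking the mean. The sketch below assumes made-up scores and generic lexicon names; it does not reproduce the study's numbers.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical sentiment values for the same five words under three lexicons.
scores = {
    "lexicon_a": [8.1, 2.0, 6.5, 9.0, 3.2],
    "lexicon_b": [7.5, 3.1, 6.0, 8.8, 2.5],
    "lexicon_c": [6.9, 2.8, 5.1, 9.3, 4.0],
}

def average_pairwise_r(scores):
    """Mean Pearson r over all unordered pairs of lexicons."""
    pairs = list(combinations(scores, 2))
    return sum(pearson_r(scores[a], scores[b]) for a, b in pairs) / len(pairs)

print(round(average_pairwise_r(scores), 2))
```

In practice each pair would be correlated only over the words both lexicons contain, since the empirically-derived lexicons do not cover every customer-generated word.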

There are statistically significant mean differences across the empirically-derived lexicons; Amazon/Tripadvisor has the highest average sentiment value and Goodreads has the lowest. Lexicons from IMDB and OpenTable provide similar means. The expert judgment lexicon provides the lowest average sentiment ratings for each list of customer-generated words. The absolute sentiment value of a word is dependent on the sentiment lexicon you use. So, pick a lexicon and use it consistently; changing your lexicon could change your metric.

Looking at the correlations of the expert-derived sentiment values with each of the empirically-derived sentiment values, we see that the OpenTable lexicon had a higher correlation with the experts than the Goodreads lexicon did. The pattern of results makes sense. The OpenTable sample is much more similar to the sample on which the experts provided their sentiment ratings. OpenTable represents a customer/supplier relationship regarding a service while the Goodreads’ sample represents a different type of relationship (customer/book quality).

Summary and Conclusions

These two studies demonstrated that subject matter experts are able to scale words along a sentiment scale. There was high agreement among the experts in their classification.

Additionally, these judgment-derived lexicons were very similar to four empirically derived lexicons. Lexicons based on subject matter experts’ sentiment classification/scaling of words are highly correlated to empirically-derived lexicons. It appears that each of the five sentiment lexicons tells you roughly the same thing as the other lexicons.

The empirically-derived lexicons are less comprehensive than the subject matter experts’ lexicons regarding customer-generated words. By design, the subject matter experts classified all words that were generated by customers; some of the words that were used by the customers do not appear in the empirically-derived lexicons. For example, the OpenTable lexicon only represents 65% (164/251) of the customer-generated words for Study 1 and 71% (60/85) of the customer-generated words for Study 2. Using empirically-derived lexicons for the purposes of calculating the Customer Sentiment Index could be augmented using lexicons that are based on subject matter experts’ classification/scaling of words.
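
The coverage figures above are simple proportions, and the suggested augmentation amounts to falling back on the expert lexicon when a word is missing from the empirical one. A minimal sketch, with invented words and scores and hypothetical function names:

```python
def coverage(words, lexicon):
    """Share of customer-generated words that appear in a lexicon."""
    return sum(1 for w in words if w in lexicon) / len(words)

def score_with_fallback(word, primary, fallback):
    """Use the primary (empirical) lexicon, else the expert-derived one, else None."""
    if word in primary:
        return primary[word]
    return fallback.get(word)

# Coverage reported in the text for the OpenTable lexicon.
print(round(100 * 164 / 251), round(100 * 60 / 85))

# Hypothetical lexicons and customer words.
empirical = {"good": 8.0, "slow": 3.0}
expert = {"good": 7.5, "slow": 2.5, "clunky": 2.0}
customer_words = ["good", "slow", "clunky"]

print(coverage(customer_words, empirical))
print(score_with_fallback("clunky", empirical, expert))
```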

In the next post, I will continue presenting information about validating the Customer Sentiment Index (CSI). So far, the analysis shows that the sentiment scores of the CSI are reliable (we get similar results using different lexicons). We now need to understand what the CSI is measuring. I will show this by examining the correlation of the CSI with other commonly used customer metrics, including likelihood to recommend (e.g., NPS), overall satisfaction and CX ratings of important customer touch points (e.g., product quality, customer service). Examining correlations of this nature will also shed light on the usefulness of the CSI in a business setting.

Originally Posted at: Development of the Customer Sentiment Index: Lexical Differences by bobehayes

Nov 02, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data shortage

[ AnalyticsWeek BYTES]

>> Malaysia opens digital government lab for big data analytics by analyticsweekpick

>> Sep 07, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> A Visual Approach to Data Management: The Transcendent Power of Data Visualizations by jelaniharper

[ NEWS BYTES]

>> Big Data and Drone Tech Can Help Fight Famine – The Cipher Brief (under Big Data)

>> New Jersey Resources Corp (NYSE:NJR) Institutional Investor Sentiment Analysis – Finance News Daily (under Sentiment Analysis)

>> Different types of virtualization – RCR Wireless – RCR Wireless News (under Virtualization)

[ FEATURED COURSE]

Artificial Intelligence

This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored f… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. With data analysis, the same type of thinking goes. It’s not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and co-operating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q: How do you take millions of users with hundreds of transactions each, across tens of thousands of products, and group the users together in meaningful segments?
A: 1. Some exploratory data analysis (get a first insight)

* Transactions by date
* Count of customers Vs number of items bought
* Total items Vs total basket per customer
* Total items Vs total basket per area

2. Create new features (per customer):

Counts:

* Total baskets (unique days)
* Total items
* Total spent
* Unique product id

Distributions:

* Items per basket
* Spent per basket
* Product id per basket
* Duration between visits
* Product preferences: proportion of items per product cat per basket

3. Too many features, dimension-reduction? PCA?

4. Clustering:

* PCA

5. Interpreting model fit
* View the clustering by principal component axis pairs PC1 Vs PC2, PC2 Vs PC1.
* Interpret each principal component regarding the linear combination it’s obtained from; example: PC1=spendy axis (proportion of baskets containing spendy items, raw counts of items and visits)
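
The feature-building and grouping steps above can be sketched in plain Python. This is a toy illustration under stated assumptions: the transactions are invented, only the four "Counts" features are built, and a small k-means routine is swapped in for the grouping step instead of the PCA-based view the answer describes. All names are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (customer_id, date, product_id, quantity, amount).
transactions = [
    ("c1", "2017-10-01", "p1", 1, 5.0),
    ("c2", "2017-10-02", "p1", 2, 9.0),
    ("c3", "2017-10-01", "p7", 10, 400.0),
    ("c3", "2017-10-02", "p8", 10, 420.0),
    ("c3", "2017-10-03", "p9", 10, 410.0),
    ("c4", "2017-10-04", "p7", 9, 390.0),
    ("c4", "2017-10-05", "p8", 11, 430.0),
    ("c4", "2017-10-06", "p9", 10, 400.0),
]

def customer_features(rows):
    """Per customer: [total baskets (unique days), total items, total spent, unique products]."""
    days, products = defaultdict(set), defaultdict(set)
    items, spent = defaultdict(int), defaultdict(float)
    for customer, date, product, quantity, amount in rows:
        days[customer].add(date)
        products[customer].add(product)
        items[customer] += quantity
        spent[customer] += amount
    return {c: [len(days[c]), items[c], spent[c], len(products[c])] for c in days}

def standardize(vectors):
    """Rescale each feature to zero mean and unit variance so no feature dominates."""
    columns = list(zip(*vectors))
    means = [sum(col) / len(col) for col in columns]
    stds = [(sum((v - m) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(vec, means, stds)] for vec in vectors]

def kmeans(points, k, iterations=20):
    """Tiny k-means with a deterministic start (the first k points as centers)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

features = customer_features(transactions)
labels = kmeans(standardize(list(features.values())), k=2)
print(dict(zip(features, labels)))
```

At real scale the same shape applies, but the aggregation would run in a database or a distributed framework, and a dimension-reduction step such as PCA would usually sit between the feature table and the clustering.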

[ VIDEO OF THE WEEK]

@AngelaZutavern & @JoshDSullivan @BoozAllen discussed Mathematical Corporation #FutureOfData

[ QUOTE OF THE WEEK]

He uses statistics as a drunken man uses lamp posts—for support rather than for illumination. – Andrew Lang

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MichOConnell, @Tibco

[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

Development of the Customer Sentiment Index: Introduction

In the next few blog posts, I will introduce a new metric, the Customer Sentiment Index (CSI). Integrated into your customer relationship survey, the CSI assesses the degree to which customers possess a positive or negative attitude about you. The development of the CSI involved the application of different disciplines including psychometrics, sentiment analysis and predictive analytics.

Each weekly blog post will describe a step in the CSI development process. Even though the series of blog posts was designed so that the posts complement and build on one another, each post will be able to stand alone with respect to its topic. The upcoming topics include:

  • Measuring customers’ attitudes using structured and unstructured data
  • Sentiment analysis and sentiment lexicons
  • Developing sentiment lexicons using an empirically-based and judgment-based approach
  • Reliability, validity and usefulness of the CSI
  • Applications of the CSI: Improving customer experience, mobile surveys

As a whole, these posts will represent my ongoing research and development of the Customer Sentiment Index. If any companies are interested in getting involved in the CSI through sponsorship or partnership, please contact me.

Source: Development of the Customer Sentiment Index: Introduction by bobehayes

Sep 21, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Trust the data

[ AnalyticsWeek BYTES]

>> April 3, 2017 Health and Biotech analytics news roundup by pstein

>> CEOs to Employees – Vote for Romney else Face Layoffs. A Good Strategy? by v1shal

>> 8 big trends in big data analytics by analyticsweekpick

[ NEWS BYTES]

>> The Hybrid Cloud Depends on Solid Networking – EnterpriseNetworkingPlanet (blog) (under Hybrid Cloud)

>> Hoteliers witness revenue surge by four pct with DJUBO adoption – Yahoo News (under Sales Analytics)

>> FX Volatility Focused on Weak USD As JPY Firms; EUR Pushes To 1.2000 – DailyFX (under Sentiment Analysis)

[ FEATURED COURSE]

CS229 – Machine Learning

This course provides a broad introduction to machine learning and statistical pattern recognition. … more

[ FEATURED READ]

Introduction to Graph Theory (Dover Books on Mathematics)

A stimulating excursion into pure mathematics aimed at “the mathematically traumatized,” but great fun for mathematical hobbyists and serious mathematicians as well. Requiring only high school algebra as mathematical bac… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replaces, judgement
Data is a tool and a means to help build consensus and facilitate human decision-making, not to replace it. Analysis converts data into information; information, via context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
A: * In long tailed distributions, a high frequency population is followed by a low frequency population, which gradually tails off asymptotically
* Rule of thumb: the majority of occurrences (more than half, and when the Pareto principle applies, 80%) are accounted for by the first 20% of items in the distribution
* The least frequently occurring 80% of items are more important as a proportion of the total population
* Zipf’s law, Pareto distribution, power laws

Examples:
1. Natural language
– Given some corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table
– The most frequent word will occur twice as often as the second most frequent, three times as often as the third most frequent…
– “The” accounts for 7% of all word occurrences (70,000 per 1 million)
– “of” accounts for 3.5%, followed by “and”…
– Only 135 vocabulary items are needed to account for half the English corpus!

2. Allocation of wealth among individuals: the larger portion of the wealth of any society is controlled by a smaller percentage of the people

3. File size distribution of Internet Traffic

Additional: Hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, sizes of meteorites
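
The natural-language example above follows an idealized Zipf law, where the share of the word at a given rank is the top word's share divided by the rank. The sketch below builds a rank-frequency table from a toy token list and checks the 7% and 3.5% figures; the corpus and the function names are assumptions made for illustration.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count, share) rows, most frequent word first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, n, n / total) for rank, (word, n) in enumerate(ranked, start=1)]

def zipf_share(rank, top_share):
    """Share predicted by an idealized Zipf law: top_share / rank."""
    return top_share / rank

# Toy corpus: counts fall off roughly as 1/rank.
tokens = ["the"] * 6 + ["of"] * 3 + ["and"] * 2 + ["data"]
for row in rank_frequency(tokens):
    print(row)

# Figures from the text: 70,000 occurrences per million for the top word,
# and the second-ranked word at half that share.
top_share = 70_000 / 1_000_000
print(top_share, zipf_share(2, top_share))
```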

Importance in classification and regression problems:
– Skewed distribution
– Which metrics to use? Accuracy paradox (classification), F-score, AUC
– Issue when using models that make assumptions on the linearity (linear regression): need to apply a monotone transformation on the data (logarithm, square root, sigmoid function…)
– Issue when sampling: your data becomes even more unbalanced! Use stratified sampling instead of random sampling, SMOTE (“Synthetic Minority Over-sampling Technique”, NV Chawla) or an anomaly detection approach
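
The accuracy paradox mentioned in the list above is easy to see with a small made-up example: on a heavily skewed label distribution, a classifier that always predicts the majority class scores high accuracy while never finding a single positive. The data and function names below are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1_score(y_true, y_pred):
    """F1 for the positive class; defined as 0.0 when there are no true positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical skewed data: 5 rare positives among 100 cases.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

print(accuracy(y_true, always_negative))
print(f1_score(y_true, always_negative))
```

This is why metrics such as F-score or AUC are preferred over raw accuracy when the classes of interest sit in the long tail.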

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Marketing Analytics

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with  John Young, @Epsilonmktg

[ FACT OF THE WEEK]

14.9 percent of marketers polled in Crain’s BtoB Magazine are still wondering ‘What is Big Data?’

Sourced from: Analytics.CLUB #WEB Newsletter