Jan 31, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Convincing  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Data Science is more than Machine Learning  by analyticsweek

>> The Upper Echelons of Cognitive Computing: Deriving Business Value from Speech Recognition by jelaniharper

>> Talend and Splunk: Aggregate, Analyze and Get Answers from Your Data Integration Jobs by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Regulating the Internet of Things – RFID Journal Under Internet Of Things

>> Analytics Use Grows in Parallel with Data Volumes – Datanami Under Analytics

>> Startup right in 2019: how to set up your customer experience for success – SmartCompany.com.au Under Customer Experience

More NEWS ? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action

image

Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

Introduction to Graph Theory (Dover Books on Mathematics)

image

A stimulating excursion into pure mathematics aimed at “the mathematically traumatized,” but great fun for mathematical hobbyists and serious mathematicians as well. Requiring only high school algebra as mathematical bac… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from zombie apocalypse from unscalable models
One living and breathing zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated; as business models rake in more data, the error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading to a Halloween we never want to see.

[ DATA SCIENCE Q&A]

Q:You are compiling a report for user content uploaded every month and notice a spike in uploads in October. In particular, a spike in picture uploads. What might you think is the cause of this, and how would you test it?
A: * Halloween pictures?
* Look at uploads in countries that don’t observe Halloween as a sort of counter-factual analysis
* Compare mean uploads in October with mean uploads in September: hypothesis testing (a quick sketch follows below)
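
The hypothesis-testing idea in the last bullet can be sketched in a few lines. This is a minimal illustration assuming you have daily picture-upload counts for September and October; the arrays and the Poisson placeholder data are hypothetical.

```python
# Minimal sketch: compare mean daily picture uploads in September vs. October.
# `september` and `october` are hypothetical arrays of daily upload counts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
september = rng.poisson(lam=1000, size=30)   # placeholder data
october = rng.poisson(lam=1150, size=31)     # placeholder data (suspected spike)

# Welch's two-sample t-test: does October's mean differ from September's?
t_stat, p_value = stats.ttest_ind(october, september, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A complementary check: run the same comparison for uploads in countries that
# do not observe Halloween (a rough counterfactual) against those that do.
```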

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek #FutureOfData with Robin Thottungal(@rathottungal), Chief Data Scientist at @EPA

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

It is a capital mistake to theorize before one has data. Insensibly, one begins to twist the facts to suit theories, instead of theories to suit facts. – Arthur Conan Doyle

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Dr. Nipa Basu, @DnBUS

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Retailers who leverage the full power of big data could increase their operating margins by as much as 60%.

Sourced from: Analytics.CLUB #WEB Newsletter

Key Considerations for Converting Legacy ETL to Modern ETL

Recently, there has been a surge in customers who want to move away from legacy data integration platforms and adopt Talend as their one-stop shop for all their integration needs. Some of these organizations have thousands of legacy ETL jobs to convert to Talend before they are fully operational. The big question that lurks in everyone’s mind is how to get past this hurdle.

Defining Your Conversion Strategy

To begin with, every organization undergoing such a change needs to focus on three key aspects:

  1. Will the source and/or target systems change? Is this just an ETL conversion from their legacy system to modern ETL like Talend?
  2. Is the goal to re-platform as well? Will the target system change?
  3. Will the new platform reside on the cloud or continue to be on-premise?

This is where Talend’s Strategic Services can help carve a successful conversion strategy and implementation roadmap for our customers. In the first part of this three-blog series, I will focus on the first aspect of conversion.

Before we dig into it, it’s worthwhile to note a very important point – the architecture of the product itself. Talend is a Java code generator: unlike its competitors, which migrate source code from one environment to another, Talend builds the code and migrates the built binaries between environments. In many organizations it takes a few sprints to fully internalize this fact, as architects and developers are used to the old way of thinking about code migration.

The upside of this architecture is that it enables a continuous integration environment that was not possible with legacy tools. A complete architecture of Talend’s platform includes not only the product itself but also third-party products such as Jenkins, an artifact repository such as Nexus, and a source control repository such as Git. Compare this to a Java programming environment and you can clearly see the similarities. In short, it is extremely important to understand that Talend works differently, and that’s what sets it apart from the rest of the crowd.

Where Should You Get Started?

Let’s focus on the first aspect: conversion. Assuming that nothing changes except the ETL jobs that integrate, cleanse, transform, and load the data, this becomes a lucrative opportunity to leverage a conversion tool – something that ingests legacy code and generates Talend code. It is not a good idea to try to replicate the entire business logic of all ETL jobs manually, as there is a great risk of introducing errors and prolonging QA cycles. However, it is also not a good idea to rely completely on the automated conversion process, since the comparison may not always be apples to apples. The right approach is to use the automated conversion process as an accelerator, with some manual intervention.

Bright minds bring in success. Keeping that mantra in mind, first build your team:

  • Core Team – Identify architects, senior developers and SMEs (data analysts, business analysts, people who live and breathe data in your organization)
  • Talend Experts – Bring in experts on the tool so that they can guide you and provide best practices and solutions for all your conversion-related efforts; they will also participate in performance tuning activities
  • Conversion Team – A team that leverages a conversion tool to automate the conversion process: a solid team with a solid tool, open to enhancing the tool along the way to automate new designs and specifications
  • QA Team – Seasoned QA professionals that help you breeze through your QA testing activities

Now comes the approach – Follow this approach for each sprint:

Categorize 

Analyze the ETL jobs and categorize them depending on the complexity of the jobs based on functionality and components used. Some good conversion tools provide analyzers that can help you determine the complexity of the jobs to be converted. Spread a healthy mix of varying complexity jobs across each sprint.

Convert 

Leverage a conversion tool to automate the conversion of the jobs. There are certain functionalities, such as an “unconnected lookup,” that can be achieved through an innovative method in Talend; seasoned conversion tools will help automate such functionalities.

Optimize

Focus on job design and performance tuning. This is your chance to revisit the design, if required, either to leverage better components or to go for a complete redesign. Also focus on performance optimization. For high-volume jobs, you can increase throughput and performance by leveraging Talend’s big data components; it is not uncommon for a converted Talend Data Integration job to be completely redesigned as a Talend Big Data job to drastically improve performance. Another advantage is that standard data integration jobs can be executed seamlessly alongside big data jobs.

Complete 

Unit test and ensure all functionalities and performance acceptance criteria are satisfied before handing over the job to QA

QA 

Use an automated QA approach to compare the result sets produced by the old and new ETL jobs. At a minimum, focus on the following (a minimal comparison sketch in Python follows the list):

  • Compare row counts from the old process to that of the new one
  • Compare each data element loaded by the old process to that of the new one
  • Verify “upsert” and “delete” logic work as expected
  • Introduce an element of regression testing to ensure fixes are not breaking other functionalities
  • Performance testing to ensure SLAs are met
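
As referenced above, here is a minimal sketch of such an automated comparison using pandas. The file names, the business key, and the assumption that both outputs share the same schema are hypothetical; the idea is simply to compare row counts, key coverage, and column values between the legacy output and the converted job’s output.

```python
# Minimal sketch of an automated result-set comparison between the legacy
# ETL output and the converted Talend job output. Names are hypothetical;
# in practice you would pull both sides from your target databases.
import pandas as pd

legacy = pd.read_csv("legacy_output.csv")     # hypothetical extract of the old job's target table
converted = pd.read_csv("talend_output.csv")  # hypothetical extract of the new job's target table
key = ["customer_id"]                         # hypothetical business key

# 1) Row counts
print("row counts:", len(legacy), "vs", len(converted))

# 2) Keys present on one side only
merged = legacy.merge(converted, on=key, how="outer",
                      indicator=True, suffixes=("_old", "_new"))
print(merged["_merge"].value_counts())

# 3) Column-by-column comparison for rows present on both sides
both = merged[merged["_merge"] == "both"]
for col in [c for c in legacy.columns if c not in key]:
    mismatches = (both[f"{col}_old"] != both[f"{col}_new"]).sum()
    print(f"{col}: {mismatches} mismatching rows")
```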

Now, for several reasons, there can be instances where one needs to design a completely new ETL process for a certain functionality in order to continue processing data in the same way as before. For such situations, leverage the “Talend Experts” team, which not only liaises with the team doing the automated conversion but also works closely with the core team to ensure the best solution is proposed. That solution is then converted into a template and handed to the conversion team, which can automate the new design into the affected jobs.

As you can see, these activities can be part of the “Categorize” and “Convert” phases of the approach.

Finally, I would suggest chunking the conversion effort into logical waves. Do not go for a big bang approach since the conversion effort could be a lengthy one depending on the number of legacy ETL jobs in an organization.

Conclusion:

This brings me to the end of the first part of the three-blog series. Below are the five key takeaways of this blog:

  1. Define scope and spread the conversion effort across multiple waves
  2. Identify core team, Talend experts, a solid conversion team leveraging a solid conversion tool and seasoned QA professionals
  3. Follow an iterative approach for the conversion effort
  4. Explore Talend’s big data capabilities to enhance performance
  5. Innovate new functionalities, create templates and automate the conversion of these functionalities

Stay tuned for the next two!!

The post Key Considerations for Converting Legacy ETL to Modern ETL appeared first on Talend Real-Time Open Source Data Integration Software.

Source: Key Considerations for Converting Legacy ETL to Modern ETL

Jan 24, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Accuracy check  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> January 30, 2017 Health and Biotech analytics news roundup by pstein

>> Oct 18, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> How will social media analytics bring your business closer to success? by thomassujain

Wanna write? Click Here

[ NEWS BYTES]

>> 8 common questions from aspiring data scientists, answered – Tech in Asia Under Data Scientist

>> D-Link Camera Poses Data Security Risk, Consumer Reports Finds … – ConsumerReports.org Under Data Security

>> Cyber Security – KSNF/KODE – FourStatesHomepage.com Under Cyber Security

More NEWS ? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action

image

Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t

image

People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, a project is oftentimes about the business, not the technology. The same thinking applies to data analysis: it’s not always about the technicalities but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-in, room for wins, and cooperating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:Give examples of bad and good visualizations?
A: Bad visualization:
– Pie charts: difficult to make comparisons between items when area is used, especially when there are lots of items
– Color choice for classes: abundant use of red, orange and blue. Readers can think that the colors could mean good (blue) versus bad (orange and red) whereas these are just associated with a specific segment
– 3D charts: can distort perception and therefore skew data
– Using a solid line in a line chart: dashed and dotted lines can be distracting

Good visualization:
– Heat map with a single color: some colors stand out more than others, giving more weight to that data. A single color with varying shades shows the intensity better
– Adding a trend line (regression line) to a scatter plot helps the reader see trends (illustrated in the sketch below)
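
A small illustration of the last two “good visualization” points, using matplotlib with placeholder data: a single-hue heat map and a scatter plot with a fitted trend line.

```python
# Illustrative sketch: a single-hue heat map and a scatter plot with a trend line.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Heat map using a single color with varying shades ("Blues").
grid = rng.random((8, 8))
im = ax1.imshow(grid, cmap="Blues")
fig.colorbar(im, ax=ax1)
ax1.set_title("Single-hue heat map")

# Scatter plot with a fitted trend (regression) line to highlight the relationship.
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.8, size=100)
slope, intercept = np.polyfit(x, y, 1)
ax2.scatter(x, y, alpha=0.6)
ax2.plot(np.sort(x), slope * np.sort(x) + intercept, color="black")
ax2.set_title("Scatter with trend line")

plt.tight_layout()
plt.show()
```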

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Keynote: The CMO isn’t satisfied: Judah Phillips

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

I’m sure, the highest capacity of storage device, will not enough to record all our stories; because, everytime with you is very valuable da

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MPFlowersNYC, @enigma_data

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

100 terabytes of data are uploaded to Facebook daily.

Sourced from: Analytics.CLUB #WEB Newsletter

Do Detractors Really Say Bad Things about a Company?

Can you think of a bad experience you had with a company?

Did you tell a friend about the bad experience?

Negative word of mouth can be devastating for company and product reputation. If companies can track it and do something to fix the problem, the damage can be contained.

This is one of the selling points of the Net Promoter Score. That is, customers who rate companies low on a 0 to 10 scale (6 and below) are dubbed “Detractors” because they are more likely to spread negative word of mouth and discourage others from buying from a company. Companies with too much negative word of mouth would be unable to grow as much as others that have more positive word of mouth.
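
For reference, the standard NPS arithmetic implied here is easy to sketch: classify each 0 to 10 response, then subtract the percentage of detractors from the percentage of promoters. A minimal Python version with made-up scores:

```python
# Standard NPS classification and score calculation from 0-10
# likelihood-to-recommend responses.
def classify(score):
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"          # 0-6

def net_promoter_score(scores):
    n = len(scores)
    promoters = sum(1 for s in scores if classify(s) == "promoter")
    detractors = sum(1 for s in scores if classify(s) == "detractor")
    return 100 * (promoters - detractors) / n

print(net_promoter_score([10, 10, 9, 9, 9, 8, 7, 6, 5, 2]))  # 50% promoters - 30% detractors = 20.0
```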

But is there any evidence that low scorers are really more likely to say bad things?

Is the NPS Scoring Divorced from Reality?

There is some concern that these NPS designations are divorced from reality. That is, there’s no evidence (or reason) given for detractors being classified as 0 to 6 and promoters as 9 to 10. If these designations are arbitrary or make no sense, then that is indeed concerning. (See the tweet comment from a vocal critic in Figure 1.)

Figure 1 Validity of NPS designations

Figure 1: Example of a concern being expressed about the validity of the NPS designations.

To look for evidence of the designations, I re-read the 2003 HBR article by Fred Reichheld that made the NPS famous. Reichheld does mention that the reason for the promoter classification is customer referral and repurchase rates but doesn’t provide a lot of detail (not too surprising given it’s an HBR article) or mention the reason for detractors here.

Figure 2 Quote HBR article

Figure 2: Quote from the HBR article “The One Number You Need to Grow,” showing the justification for the designation of detractors, passives, and promoters.

In his 2006 book, The Ultimate Question, Reichheld further explains the justification for the cutoff of detractors, passives, and promoters. In analyzing several thousand comments, he reported that 80% of the Negative Word of Mouth comments came from those who responded from 0 to 6 on the likelihood to recommend item (pg 30). He further reiterated the claim that 80% of the customer referrals came from promoters (9s and 10s).

Contrary to at least one prominent UX voice on social media, there is some evidence and justification for the designations. It’s based on referral and repurchase behaviors and the sharing of negative comments. This might not be enough evidence to convince people (and certainly not dogmatic critics) to use these designations though. It would be good to find corroborating data.

The Challenges with Purchases and Referrals

Corroborating the promoter designation means finding purchases and referrals. It’s not easy associating actual purchases and actual referrals with attitudinal data. You need a way to associate customer survey data with purchases and then track purchases from friends and colleagues. Privacy issues aside, even in the same company, purchase data is often kept in different (and guarded) databases making associations challenging. It was something I dealt with constantly while at Oracle.

What’s more, companies have little incentive to share repurchase rates and survey data with outside firms and third parties may not have access to actual purchase history. Instead, academics and researchers often rely on reported purchases and reported referrals, which may be less accurate than records of actual purchases and actual referrals (a topic for an upcoming article). It’s nonetheless common in the Market Research literature to rely on stated past behavior as a reasonable proxy for actual behavior. We’ll also address purchases and referrals in a future article.

Collecting Word-of-Mouth Comments

But what about the negative comments used to justify the cutoff between detractors and passives? We wanted to replicate Reichheld’s findings that detractors accounted for a substantial portion of negative comments using another dataset to see whether the pattern held.

We looked at open-ended comments we collected from about 500 U.S. customers regarding their most recent experiences with one of nine prominent brands and products. We collected the data ourselves from an online survey in November 2017. It included a mix of airlines, TV providers, and digital experiences. In total, we had 452 comments regarding the most recent experience with the following brands/products:

  • American Airlines
  • Delta Airlines
  • United Airlines
  • Comcast
  • DirecTV
  • Dish Network
  • Facebook
  • iTunes
  • Netflix

Participants in the survey also answered the 11-point Likelihood to Recommend question, as well as a 10-point and 5-point version of the same question.

Coding the Sentiments

The open-ended comments were coded into sentiments by two independent evaluators. Negative comments were coded -1, neutral 0, and positive 1. During the coding process, the evaluators didn’t have access to the raw LTR scores (0 to 10) or other quantitative information.

In general, there was good agreement between the evaluators. The correlation between sentiment scores was high (r = .83) and they agreed 82% of the time on scores. On the remaining 18% where there was disagreement, differences were reconciled, and a sentiment was selected.
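
A minimal sketch of how those two agreement checks can be computed, assuming each evaluator’s codes are stored as an array aligned by comment (the codes below are made up):

```python
# Sketch of the inter-rater agreement checks described above, with two lists
# of sentiment codes (-1, 0, 1) for the same comments.
import numpy as np

rater_a = np.array([1, 0, -1, 0, 1, -1, 0, 1])   # hypothetical codes
rater_b = np.array([1, 0, -1, 1, 1, -1, 0, 0])   # hypothetical codes

# Correlation between the two sets of sentiment codes
r = np.corrcoef(rater_a, rater_b)[0, 1]

# Percent of comments on which the evaluators gave the same code
agreement = (rater_a == rater_b).mean()

print(f"r = {r:.2f}, agreement = {agreement:.0%}")
```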

Most comments were neutral (43%) or positive (39%), with only 21% of the comments being coded as negative.

Examples of positive comments

“I flew to Hawaii for vacation, the staff was friendly and helpful! I would recommend it to anyone!”—American Airlines Customer

“I love my service with Dish network. I use one of their affordable plans and get many options. I have never had an issue with them, and they are always willing to work with me if something has financially changed.”—Dish Network Customer

Examples of neutral comments

“I logged onto Facebook, checked my notifications, scrolled through my feed, liked a few things, commented on one thing, and looked at some memories.”—Facebook User

“I have a rental property and this is the current TV subscription there. I access the site to manage my account and pay my bill.”—DirecTV User

Examples of negative comments

“I took a flight back from Boston to San Francisco 2 weeks ago on United. It was so terrible. My seat was tiny and the flight attendants were rude. It also took forever to board and deboard.”—United Airlines Customer

“I do not like Comcast because their services consistently have errors and we always have issues with the internet. They also frequently try to raise prices on our bill through random fees that increase over time. And their customer service is unsatisfactory. The only reason we still have Comcast is because it is the best option in our area.”—Comcast Customer

Associating Sentiments to Likelihood to Recommend (Qual to Quant)

We then associated each coded sentiment with the 0 to 10 values on the Likelihood to Recommend item provided by the respondent. Figure 3 shows this relationship.

Figure 3: Likelihood to Recommend

Figure 3: Percent of positive or negative comments associated with each LTR score from 0 to 10.

For example, 24% of all negative comments were associated with people who gave a 0 on the Likelihood to Recommend scale (the lowest response option). In contrast, 35% of positive comments were associated with people who scored the maximum 10 (most likely to recommend). This is further evidence for the extreme responder effect we’ve discussed in an earlier article.

You can see a pattern: As the score increases from 0 to 10, the percent of negative comments goes down (r = -.71) and the percent of positive comments goes up (r = .87). There isn’t a perfect linear relationship between comment sentiment and scores (otherwise the correlation would be r = 1). For example, the percent of positive comments is actually higher at responses of 8 than 9, and the percent of negative comments is actually higher at 5 than 4 (possibly an artifact of this sample size). Nonetheless, this relationship is very strong.

Detractor Threshold Supported

What’s quite interesting from this analysis is that at a score of 6, the ratio of positive to negative comments flips. Respondents with scores above a 6 (7s-10s) are more likely to make positive comments about their most recent experience. Respondents who scored their Likelihood to Recommend at 6 and below are more likely to make negative comments (spread negative word of mouth) about their most recent experience.

At a score of 6, a participant is about 70% more likely to make a negative comment than a positive comment (10% vs 6% respectively). As scores go lower, the ratio goes up dramatically. At a score of 5, participants are more than three times as likely to make a negative comment as a positive comment. At a score of 0, customers are 42 times more likely to make a negative rather than a positive comment (24% vs. 0.6%, respectively).
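
The positive-to-negative ratio at each scale point can be derived directly from the coded data. A minimal pandas sketch, with hypothetical column names and placeholder rows:

```python
# Sketch of the per-score breakdown described above, assuming one row per
# respondent: an `ltr` score (0-10) and a coded `sentiment` (-1 negative,
# 0 neutral, 1 positive). Column names and data are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "ltr":       [0, 0, 3, 5, 6, 6, 7, 8, 9, 10, 10],   # placeholder data
    "sentiment": [-1, -1, -1, -1, -1, 1, 0, 1, 1, 1, 1],
})

# Share of all negative and all positive comments at each LTR score
neg = df[df.sentiment == -1].groupby("ltr").size() / (df.sentiment == -1).sum()
pos = df[df.sentiment == 1].groupby("ltr").size() / (df.sentiment == 1).sum()

summary = pd.DataFrame({"pct_of_negatives": neg, "pct_of_positives": pos}).fillna(0)
print(summary)  # the score where the two columns "flip" marks the detractor cutoff
```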

When aggregating the raw scores into promoters, passives, and detractors, we can see that a substantial 90% of negative comments are associated with detractors (0 to 6s). This is shown in Figure 4.

The positive pattern is less pronounced, but still a majority (54%) of positive comments are associated with promoters (9s and 10s). It’s also interesting to see that the passives (7s and 8s) have a much more uniform chance of making a positive, neutral, or negative comment.

This corroborates the data from Reichheld, which showed 80% of negative comments were associated with those who scored 0 to 6. He didn’t report the percent of positive comments with promoters and didn’t associate the responses to each scale point as we did here (you’re welcome).

Figure 4: Percent of positive and negative comments

Figure 4: Percent of positive or negative comments associated with each NPS classification.

If your organization uses a five-point Likelihood to Recommend scale (5 = extremely likely and 1 = not at all likely), there are similar patterns, albeit on a more compressed scale (see Figure 5 ). At a response of 3, the ratio of positive to negative comments also flips—making responses 3 or below also good designations for detractors. At a score of 3, a customer is almost four times as likely to make a negative comment about their experience than a positive comment.

Figure 5: Percent positive or negative comments LTR

Figure 5: Percent of positive or negative comments associated with each LTR score from 1 to 5 (for companies that use a 5-point scale).

Summary & Takeaways

An examination of 452 open-ended comments about customers’ most recent experiences with nine prominent brands and products revealed:

  • Detractors accounted for 90% of negative comments. This independent evaluation corroborates the earlier analysis by Reichheld that found detractors accounted for a majority of negative word-of-mouth comments. This smaller dataset actually found a higher percentage of negative comments associated with 0 to 6 responses than Reichheld reported.
  • Six is a good threshold for identifying negative comments. The probability a comment will be negative (negative word of mouth) starts to exceed positive comment probability at 6 (on the 11-point LTR scale) and 3 (on a 5-point scale). Researchers looking at LTR scores alone can use this threshold to provide some idea about the probability of the customer sentiment about their most recent experience.
  • Repurchase and referral rates need to be examined. This analysis didn’t examine the relationship between referrals or repurchases (reported and observed) and likelihood to recommend, a topic for future research to corroborate the promoter designation.
  • Results are for specific brands used. In this analysis, we selected a range of brands and products we expected to represent a good range of NPS scores (from low to high). Future analyses can examine whether the pattern of scores at 6 or below correspond to negative sentiment in different contexts (e.g. for the most recent purchase) or for other brands/products/websites.
  • Think probabilistically. This analysis doesn’t mean a customer who gave a score of 6 or below necessarily had a bad experience or will say bad things about a company. Nor does it mean that a customer who gives a 9 or 10 necessarily had a favorable experience. You should think probabilistically about UX measures in general and NPS too. That is, it’s more likely (higher probability) that as scores go down on the Likelihood to Recommend item, the chance someone will be saying negative things goes up (but doesn’t guarantee it).
  • Examine your relationships between scores and comments. Most companies we work with have a lot of NPS data associated with verbatim comments. Use the method of coding sentiments described here to see how well the detractor designation matches sentiment and, if possible, see how well the promoter designations correspond with repurchase and referral rates or other behavioral measures (and consider sharing your results!).
  • Take a measured approach to making decisions. Many aspects of measurement aren’t intuitive and it’s easy to dismiss what we don’t understand or are skeptical about. Conversely, it’s easy to accept what’s “always been done” or published in high profile journals. Take a measured approach to deciding what’s best (including on how to use the NPS). Don’t blindly accept programs that claim to be revolutionary without examining the evidence. And don’t be quick to toss out the whole system because it has shortcomings or is over-hyped (we’d have to toss out a lot of methods and authors if this were the case). In all cases, look for corroborating evidence…probably something more than what you find on Twitter.

Source: Do Detractors Really Say Bad Things about a Company? by analyticsweek

Jan 17, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data Storage  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Big Data Analytics, Supercomputing Seed Growth in Plant Research by analyticsweekpick

>> August 14, 2017 Health and Biotech analytics news roundup by pstein

>> Refugee migration: Where are people fleeing from and where are they going? by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>> Prescriptive and Predictive Analytics Market Will Boast Developments in Global Industry by 2018-2025 – Leading Journal (blog) Under Talent Analytics

>> Machine-learning algorithm predicts how cells repair broken DNA – EurekAlert (press release) Under Machine Learning

>> 200 jobs in Belfast being created by US cyber security firm – RTE.ie Under Cyber Security

More NEWS ? Click Here

[ FEATURED COURSE]

Machine Learning

image

6.867 is an introductory course on machine learning which gives an overview of many concepts, techniques, and algorithms in machine learning, beginning with topics such as classification and linear regression and ending … more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals

image

Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist, or data-driven expert is constantly put to the test by helping their team solve problems using their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgment and taints the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find ourselves caught in biases that impair judgment. So, it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q:Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy? Depends on the context?
A: * “premature optimization is the root of all evils”
* At the beginning: quick-and-dirty model is better
* Optimization later
Other answer:
– Depends on the context
– Is error acceptable? Fraud detection, quality assurance

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Eloy Sasot, News Corp

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Big Data is not the new oil. – Jer Thorp

[ PODCAST OF THE WEEK]

@JohnTLangton from @Wolters_Kluwer discussed his #AI Lead Startup Journey #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data.

Sourced from: Analytics.CLUB #WEB Newsletter

Interpreting Single Items from the SUS

The System Usability Scale has been around for decades and is used by hundreds of organizations globally.

The 10-item SUS questionnaire is a measure of a user’s perception of the usability of a “system.”

A system can be just about anything a human interacts with: software apps (business and consumer), hardware, mobile devices, mobile apps, websites, or voice user interfaces.

The SUS questionnaire is scored by combining the 10 items into a single SUS score ranging from 0 to 100. From its creation, though, John Brooke cautioned against interpreting individual items:
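
For context, the standard SUS scoring rule combines the items as follows: each odd-numbered (positive-tone) item contributes its response minus 1, each even-numbered (negative-tone) item contributes 5 minus its response, and the sum of contributions is multiplied by 2.5 to map onto 0 to 100. A minimal implementation:

```python
# Standard SUS scoring: odd items contribute (response - 1), even items
# contribute (5 - response), and the total is multiplied by 2.5.
def sus_score(responses):
    """responses: list of 10 ratings, items 1-10 in order, each 1-5."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Example: a fairly positive respondent
print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # 80.0
```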

“Note that scores for individual items are not meaningful on their own”~John Brooke.

Brooke’s caution against examining scores for the individual items of the SUS was appropriate at the time. After all, he was publishing a “quick and dirty” questionnaire with analyses based on data from 20 people.

There is a sort of conventional wisdom that multiple items are superior to single items; in fact, single-item measures and analyses are often dismissed in peer-reviewed journals.

More items will by definition increase the internal consistency reliability of a questionnaire when measured using Cronbach’s alpha. In fact, you can’t measure internal consistency reliability with only one item. However, other methods measure reliability, including test-retest reliability. Single measures, such as satisfaction, brand attitude, task ease, and likelihood to recommend, also exhibit sufficient test-retest reliability and little if anything may be gained by using multiple items.

SUS Benchmarks

John Brooke didn’t publish any benchmarks or guidance for what makes a “good” SUS score. But because the SUS has been used extensively by other researchers who have published the results, we have been able to derive a database of scores. Table 1 shows SUS grades and percentiles that Jim Lewis and I put together from that database, which itself is an adaptation of work from Bangor and Kortum.

Grade   SUS         Percentile Range
A+      84.1-100    96-100
A       80.8-84.0   90-95
A-      78.9-80.7   85-89
B+      77.2-78.8   80-84
B       74.1-77.1   70-79
B-      72.6-74.0   65-69
C+      71.1-72.5   60-64
C       65.0-71.0   41-59
C-      62.7-64.9   35-40
D       51.7-62.6   15-34
F       0-51.6      0-14

Table 1: SUS scores, grades, and percentile ranks.

To use the table, find your raw SUS score in the middle column and then find its corresponding grade in the left column and percentile rank in the right column. For example, a SUS score of 75 is a bit above the global average of 68 and nets a “B” grade. A SUS score below 50 puts it in the “F” grade with a percentile rank among the worst interfaces (worse than 86% or better than only 14%).
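
A small sketch of that lookup, encoding Table 1’s grade bands so a raw SUS score can be converted to its letter grade programmatically:

```python
# Lookup that maps a raw SUS score to the grade from Table 1.
GRADE_BANDS = [  # (minimum SUS score, grade), from Table 1 above
    (84.1, "A+"), (80.8, "A"), (78.9, "A-"), (77.2, "B+"), (74.1, "B"),
    (72.6, "B-"), (71.1, "C+"), (65.0, "C"), (62.7, "C-"), (51.7, "D"), (0.0, "F"),
]

def sus_grade(score):
    for minimum, grade in GRADE_BANDS:
        if score >= minimum:
            return grade
    return "F"

print(sus_grade(75))  # "B", a bit above the global average of 68
print(sus_grade(49))  # "F"
```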

Why Develop Item-Level Benchmarks?

While the SUS provides an overall measure of perceived ease and our grading scale provides a way to interpret the raw score, researchers may want to measure and set targets for other more specific experience attributes (e.g. perceptions of findability, complexity, consistency, and confidence). To do so, researchers would need to develop specific items to measure those more specific attributes.

Some attributes, such as findability, do not appear in the 10 SUS items. Other attributes, such as perceived complexity (Item 2), perceived ease of use (Item 3), perceived consistency (Item 6), perceived learnability (Item 7), and confidence in use (Item 9) do appear in the SUS.

Researchers who use the SUS and who also need to assess any of these specific attributes would need to decide whether to ask participants in their studies to rate this attribute twice (once in the SUS and again using a separate item) or to use the response to the SUS item in two ways (contributing to the overall SUS score and as a measure of the specific attribute of interest). The latter, using the response to the SUS item in two ways, is the more efficient approach.

In short, using item benchmarks saves respondents time as they answer fewer items and saves researchers time as they don’t have to derive new items and get the bonus of having benchmarks to make the responses more meaningful.

Developing SUS Item Level Benchmarks

To help make the process of understanding individual SUS items better, Jim Lewis and I compiled data from 166 unpublished industrial usability studies/surveys based on scores from 11,855 individual SUS questionnaires.

We then used regression equations to predict overall SUS scores from the individual items. We found each item explained between 35% and 89% of the full SUS score (a large percentage for a single item). Full details of the regression equations and process are available in the Journal of Usability Studies article.

To make item benchmarks easy to reference, we computed the score you’d need for an average “C” score of 68 or a good score of 80, an “A-.“ Why 80? We’ve found that a SUS of 80 has become a common industrial goal. It’s also a good psychological threshold that’s attainable. Achieving a raw SUS score of 90 sounds better but is extraordinarily difficult (only one study in the database exceeded 90–data from Netflix).

Table 2 shows the mean score you would need for each item to achieve an average “C” or good “A-“ score.

SUS Item Target for Average Score Target for Good Score
1. I think that I would like to use this system frequently. ≥ 3.39 ≥ 3.80
2. I found the system unnecessarily complex. ≤ 2.44 ≤ 1.85
3. I thought the system was easy to use. ≥ 3.67 ≥ 4.24
4. I think that I would need the support of a technical person to be able to use this system. ≤ 1.85 ≤ 1.51
5. I found the various functions in this system were well integrated. ≥ 3.55 ≥ 3.96
6. I thought there was too much inconsistency in this system. ≤ 2.20 ≤ 1.77
7. I would imagine that most people would learn to use this system very quickly. ≥ 3.71 ≥ 4.19
8. I found the system very cumbersome to use. ≤ 2.25 ≤ 1.66
9. I felt very confident using the system. ≥ 3.72 ≥ 4.25
10. I needed to learn a lot of things before I could get going with this system. ≤ 2.09 ≤ 1.64

Table 2: Benchmarks for average and good scores for the 10 SUS items.

For example, if you’re using Item 3, “I thought the system was easy to use,” then a mean score of 3.67 would correspond to a SUS score of 68 (an average overall system score). For an above average SUS score of 80, the corresponding target for Item 3 would be a mean score of at least 4.24.

Note that due to the mixed tone of the SUS, the directionality of the item targets is different for odd- and even-numbered items. Specifically, for odd-numbered items, means need to be greater than the targets; for even-numbered items, observed means need to be less than the targets. For example, for Item 2, “I found the system unnecessarily complex,” you would want to have a mean below 2.44 to achieve an average score (SUS equivalent of 68) and below 1.85 for a good score (SUS equivalent of 80).
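
A minimal sketch of applying these item-level targets, using the “good score” column of Table 2 and flipping the comparison direction for odd- versus even-numbered items:

```python
# Check observed item means against the Table 2 targets. For odd-numbered
# items the mean must be at or above the target; for even-numbered items
# it must be at or below it.
GOOD_TARGETS = {  # item number -> target for a "good" (SUS ~80) experience, from Table 2
    1: 3.80, 2: 1.85, 3: 4.24, 4: 1.51, 5: 3.96,
    6: 1.77, 7: 4.19, 8: 1.66, 9: 4.25, 10: 1.64,
}

def meets_good_target(item_number, observed_mean):
    target = GOOD_TARGETS[item_number]
    if item_number % 2 == 1:          # positive-tone items: higher is better
        return observed_mean >= target
    return observed_mean <= target    # negative-tone items: lower is better

print(meets_good_target(3, 4.30))  # True: above the 4.24 target for Item 3
print(meets_good_target(2, 2.10))  # False: above the 1.85 ceiling for Item 2
```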

Summary

The popularity of the SUS has allowed for the creation of normalized databases and guidance on what constitutes poor, good, or excellent scores. Researchers on some occasions may want to use single items from the SUS to benchmark more specific constructs (e.g. “I felt very confident using the system” representing user confidence). Using data from almost 12,000 participants we were able to create benchmarks for individual SUS items to achieve average “C” scores and high “A-“ SUS equivalent scores. These benchmarks allow researchers to know what mean value to aim for to achieve an average or good experience when interpreting single items from the SUS.

Originally Posted at: Interpreting Single Items from the SUS

Borrowing Technology from Media & Entertainment for Big Data Analytics in the Cloud

For most of computing’s history, data meant “structured” data or data that fits neatly into pre-defined categories and rows stored in databases or spreadsheets. But the big data movement has changed all of that with the proliferation of unstructured data analysis. Unstructured data is any data that doesn’t fit into a predefined data model. It includes things like video, images, text, and all the data being logged by sensors and the myriad of digital devices. Where structured data is relatively easy to store and analyze using traditional technology, unstructured data isn’t.

Nonetheless, today massive collections of unstructured data are being analyzed for altruistic purposes like combating crime and preventing disease, but also for profit-motivated goals like spotting business trends. And, as we’ve entered an era of pervasive surveillance – including aerial surveillance by drones and low earth orbit satellites capable of delivering 50 cm resolution imagery – media content (photos, videos, and audio) is more relevant to big data analytics than ever before.

Unstructured data tends to be vastly larger than structured data, and is mostly responsible for our crossing the threshold from regular old data to “big data.” That threshold is not defined by a specific number of terabytes or even petabytes, but by what happens when data accumulates to an amount so large that innovative techniques are required to store, analyze and move it. Public cloud computing technology is one of these innovations that’s being applied to big data analytics because it offers a virtually unlimited elastic supply of compute power, networking and storage with a pay-for-use pricing model (all of which opens up new possibilities for analyzing both unstructured and structured big data).

Before its recent and unfortunate shutdown, the respected tech news and research site GigaOm released a survey on enterprise big data. In it, over 90% of participants said they planned to move more than a terabyte of data into the cloud, and 20% planned to move more than 100 TB. Cloud storage is a compelling solution as both an elastic repository for this overflowing data and a location readily accessible to cloud-based analysis.

However, one of the challenges that come with using public cloud computing and cloud storage is getting the data into the cloud in the first place. Moving large files and bulk data sets over the Internet can be very inefficient with traditional protocols like FTP and HTTP (the most common way organizations move large files, and the foundation for most options cloud storage providers offer to get your data to them besides shipping hard drives).

In that same GigaOm survey, 24% expressed concern about whether their available bandwidth can accommodate pushing their large data volumes up to the cloud, and 21% worry that they don’t have the expertise to carry out the data migration (read about all the options for moving data to any of the major cloud storage providers, and you too might be intimidated).

While bandwidth and expertise are very legitimate concerns, there are SaaS (Software as a Service) large file transfer solutions that can make optimal use of bandwidth, are very easy to use and integrate with Amazon S3, Microsoft Azure and Google Cloud. In fact, the foundation technology of these solutions was originally built to move very large media files throughout the production, post production and distribution of film and television.

Back in the early 2000’s, when the Media & Entertainment industry began actively transitioning from physical media including tape and hard drives to digital file-based workflows, they had a big data movement problem too. For companies like Disney and the BBC, sending digital media between their internal locations and external editing or broadcast partners was a serious issue. Compared to everything else moving over the Internet, those files were huge. (And broadcast masters are relatively small compared to the 4K raw camera footage being captured today. For example, an hour of raw camera footage often requires a terabyte or more of storage.)

During M&E’s transition from physical media to file-based media, companies like Signiant started developing new protocols for the fast transfer of large files over public and private IP networks, with the high security that the movie industry requires for their most precious assets. The National Academy of Television Arts and Sciences even recognized Signiant’s pioneering role with a Technology and Engineering Emmy award in 2014.

Today, that technology has evolved in step with the cloud revolution, and SaaS accelerated large file transfer technology is expanding to other industries. Far faster and more reliable than older technologies like FTP and HTTP, this solution can also be delivered as a service, so users do not have to worry about provisioning hardware and software infrastructure, including scaling and balancing servers for load peaks and valleys. The “expertise” many worry about needing is a non-issue because the solution is so simple to use. And it’s being used in particular to push large volumes to cloud storage for all kinds of time-sensitive projects, including big data analytics. For example, scientists are analyzing images of snow and ice cover to learn more about climate change, and (interesting though less benevolent) businesses are analyzing images of competitors’ parking lots — counting cars by make and model — in order to understand the shopping habits and demographics of their customers.

It’s always fascinating to see how innovation occurs. It almost never springs from nothing, but is adapted from techniques and technologies employed somewhere else to solve a different challenge. Who would have thought, at the turn of the century, that the technology developed for Media & Entertainment would be so relevant to big data scientific, government and business analytics? And that technology used to produce and delivery entertainment could be leveraged for the betterment of society?

Originally posted via “Borrowing Technology from Media & Entertainment for Big Data Analytics in the Cloud”

Originally Posted at: Borrowing Technology from Media & Entertainment for Big Data Analytics in the Cloud

Jan 10, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
SQL Database  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Improving the Customer Experience Through Big Data [VIDEO] by bobehayes

>> Accelerating Discovery with a Unified Analytics Platform for Genomics by analyticsweek

>> The UX of Brokerage Websites by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>> Italy-America Chamber, Luxury Marketing Council host 2nd Annual Luxury Summit – Luxury Daily Under Social Analytics

>> Top five business analytics intelligence trends for 2019 – Information Age Under Analytics

>> Billions of dollars have not helped Indian e-tailers figure out AI and big data – Quartz Under Big Data Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Artificial Intelligence

image

This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

The Industries of the Future

image

The New York Times bestseller, from leading innovation expert Alec Ross, a “fascinating vision” (Forbes) of what’s next for the world and how to navigate the changes the future will bring…. more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with decision making. Data- and analytics-driven decision making is rapidly sneaking its way into our core corporate DNA, and we are not churning out practice grounds to test those models fast enough. Such snug-looking models have hidden nails that could induce uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to help work out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
A: Naïve: the features are assumed independent/uncorrelated
Assumption not feasible in many cases
Improvement: decorrelate features (transform the covariance matrix into the identity matrix; see the sketch below)
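
A minimal sketch of that improvement, assuming a whitening transform (here PCA with whiten=True) in front of a Gaussian naive Bayes model; the data is made up and the pipeline is only illustrative, not a full spam-detection setup:

```python
# Sketch of the "decorrelate features" idea: PCA whitening transforms the
# features so their covariance matrix becomes (approximately) the identity,
# which makes the naive independence assumption less unrealistic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
# Placeholder data with strongly correlated features
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = make_pipeline(PCA(whiten=True), GaussianNB())
model.fit(X, y)

# Covariance of the whitened features is approximately the identity matrix
print(np.cov(PCA(whiten=True).fit_transform(X), rowvar=False).round(2))
```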

Source

[ VIDEO OF THE WEEK]

@JustinBorgman on Running a data science startup, one decision at a time #Futureofdata #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data beats emotions. – Sean Rad, founder of Ad.ly

[ PODCAST OF THE WEEK]

@JustinBorgman on Running a data science startup, one decision at a time #Futureofdata #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

73% of organizations have already invested or plan to invest in big data by 2016

Sourced from: Analytics.CLUB #WEB Newsletter

Customer Loyalty Feedback Meets Customer Relationship Management

In my new book, Total Customer Experience, I illustrate why three types of customer loyalty are needed to understand the different ways your customers can show their loyalty towards your company or brand. The three types of loyalty are:

  1. Retention Loyalty: likelihood of customers to stay with a company
  2. Advocacy Loyalty: likelihood of customers to recommend the company/ advocate on the company’s behalf
  3. Purchasing Loyalty: likelihood of customers to expand their relationship with the company

Using this multi-faceted model, I developed a loyalty measurement approach, referred to as the RAPID Loyalty Approach, to help companies get a more comprehensive picture of customer loyalty. Understanding the factors that impact these different types of loyalty helps companies target customer experience improvement strategies to increase different types of customer loyalty.

Data Integration

When companies are able to link these RAPID loyalty metrics with other customer information, like purchase history, campaign responses and employee/partner feedback, the customer insights become deeper. TCELab  (where I am the Chief Customer Officer) is working with Clicktools to help Salesforce customers implement the RAPID Loyalty Approach. This partnership brings together TCELab’s survey knowledge and advisory services with Clicktools’ exceptional feedback software and Salesforce integration; for the fifth consecutive year, Clicktools has received the Salesforce AppExchange™ Customer Choice Award for Best Survey App.

TCELab will include RAPID surveys in Clicktools’ survey library, available in all Clicktools editions and integrated easily with a RAPID Salesforce.com custom object. Salesforce reports and dashboards, including linkage analysis, will follow. Customers can call on the expertise of TCELab for advice on tailoring the surveys for their organization and for support in analysis and reporting.

Joint Whitepaper from TCELab and Clicktools

David Jackson, founder and CEO of Clicktools, and I have co-written a whitepaper titled, “RAPID Loyalty: A Comprehensive Approach to Customer Loyalty,” to present the basic structure and benefits of the RAPID approach and to offer Clicktools customers access to a special program for getting started.

Download the Whitepaper >>

Originally Posted at: Customer Loyalty Feedback Meets Customer Relationship Management by bobehayes