Elsevier: How to Gain Data Agility in the Cloud

Presenting at Talend Connect London 2018 is Reed Elsevier (part of RELX Group), a $7 billion data and analytics company with 31,000 employees, serving scientists, lawyers, doctors, and insurance companies among its many clients. The company helps scientists make discoveries, lawyers win cases, doctors save lives, and insurance companies offer customers lower prices, and it saves taxpayers money by preventing fraud.

Standardizing business practices for successful growth

As the business grew over the years, different parts of the organization began buying and deploying integration tools, which created management challenges for central IT. It was a “shadow IT” situation, where individual business departments were implementing their own integrations with their own different tools.

With no standardization, integration was handled separately by each unit, which made it more difficult for different parts of the enterprise to share data. Central IT wanted to bring order to the process and deploy a system that was effective at meeting the company’s needs and scalable enough to keep pace with growth.

Moving to the cloud

One of the essential requirements was that any new solution be a cloud-based offering. Elsevier a few years ago became a “cloud first” company, mandating that any new IT services be delivered via the cloud and nothing be hosted on-premises. It also adopted agile methodologies and a continuous deployment approach, to become as nimble as possible when bringing new products or releases to market.

Elsevier selected Talend as a solution and began using it in 2016. Among the vital selection factors were platform flexibility, alignment with the company’s existing infrastructure, and its ability to generate Java code as output and support microservices and containers.

In its Talend Connect session, Delivering Agile Integration Platforms, Elsevier will discuss how it got up and running rapidly with Talend despite having a diverse development environment, and how it is using Talend, along with Amazon Web Services, to build a data platform for transforming raw data into insight at scale across the business. You’ll learn how Elsevier created a dynamic platform using containers, serverless data processing, and continuous integration/continuous deployment to reach new levels of agility and speed.

Agility is among the most significant benefits of this approach using Talend. Elsevier spins up servers as needed and enables groups to develop integrations independently on a common platform, without central IT becoming a bottleneck. Since the platform was built, internal demand has far surpassed the company’s expectations, as it delivers cost savings and insight at a whole new level.

Attend this session to learn more about how you can transform your integration environment.

 

The post Elsevier: How to Gain Data Agility in the Cloud appeared first on Talend Real-Time Open Source Data Integration Software.

Source: Elsevier: How to Gain Data Agility in the Cloud by analyticsweekpick

Dec 06, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security


[ AnalyticsWeek BYTES]

>> Kaggle Joins Google Cloud by analyticsweek

>> Customer Loyalty 2.0 Article in Quirk’s Marketing Research Review by bobehayes

>> The Big Data Problem in Customer Experience Management: Understanding Sampling Error by bobehayes


[ NEWS BYTES]

>> Want Safer Internet of Things? Change Government Buying Rules. – Nextgov Under Internet Of Things

>> Winter May Bring Bouts of Extreme Cold to Some, Drought Relief to … – Global Banking And Finance Review (press release) Under Financial Analytics

>> Will Cloud and Improving Margins Dominate Amazon’s Earning Report? – Motley Fool Under Cloud


[ FEATURED COURSE]

Hadoop Starter Kit


Hadoop learning made easy and fun. Learn HDFS, MapReduce and introduction to Pig and Hive with FREE cluster access…. more

[ FEATURED READ]

Thinking, Fast and Slow


Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today, a data-driven leader, data scientist, or data-driven expert is constantly put to the test by helping his or her team solve problems. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgement and taints the resulting suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to small traps and find ourselves caught in biases that impair our judgement. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q:What do you think about the idea of injecting noise in your data set to test the sensitivity of your models?
A: * Effect would be similar to regularization: avoid overfitting
* Used to increase robustness

Source
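To make the idea above concrete, here is a minimal, hedged sketch (not from the Q&A source) of measuring how a model’s test accuracy degrades as Gaussian noise is injected into the features; the classifier choice and function name are illustrative assumptions.

```python
# Illustrative sketch: probe model sensitivity by injecting Gaussian noise
# into the test features (assumes X and y are dense NumPy arrays you supply).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def sensitivity_to_noise(X, y, noise_levels=(0.0, 0.05, 0.1, 0.2), seed=0):
    rng = np.random.default_rng(seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    results = {}
    for level in noise_levels:
        # Zero-mean noise scaled to each feature's standard deviation.
        noise = rng.normal(0.0, level * X_test.std(axis=0), size=X_test.shape)
        results[level] = accuracy_score(y_test, model.predict(X_test + noise))
    # A model whose accuracy collapses at small noise levels is likely
    # overfit; a robust (well-regularized) model degrades gracefully.
    return results
```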

[ VIDEO OF THE WEEK]

@EdwardBoudrot / @Optum on #DesignThinking & #DataDriven Products #FutureOfData #Podcast


[ QUOTE OF THE WEEK]

Data really powers everything that we do. – Jeff Weiner

[ PODCAST OF THE WEEK]

#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency


[ FACT OF THE WEEK]

Akamai analyzes 75 million events per day to better target advertisements.

Sourced from: Analytics.CLUB #WEB Newsletter

How to use the TRACE function to track down problems in QlikView

Mark Viecelli
Business Intelligence Specialist, Qlik
John Daniel Associates, Inc.

Mark’s Profile

What is the TRACE Function?

Many Qlik developers will tell you that one of the most tedious tasks when developing an application is tracking down where and why your script has failed, especially when you are working on an application with complex scripting. Sometimes these errors are obvious, but oftentimes you can spend unfathomable amounts of time rereading your script trying to find out exactly where things went wrong. Throughout my time working with Qlik, I have personally experienced countless iterations of digging through my failed script executions until finally locating my problems after what always seems like an eternity… That was until I stumbled across the TRACE function.

The TRACE function has become one of my favorite tools in my arsenal of QlikView tips and tricks due to its ability to save time and eliminate many of the frustrations that come with debugging your script. In fact, I have found this function so useful that I have made a habit of adding it to all of my applications no matter the size or complexity. I believe that after reading this blog you too will find yourself adding this TRACE function to each and every one of your scripts!

In its simplest form, the TRACE function writes a string to the Script Execution Progress window and the script log file as shown in the images below.

Trace Function 1

Trace Function 2
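For readers who prefer script to screenshots, a minimal sketch of this kind of usage might look like the following; the table, field, and file names are hypothetical, not taken from the article.

```
// Minimal QlikView load script sketch; table, field, and file names are hypothetical.
TRACE ..........Starting Glossary Load..........;

Glossary:
LOAD Term,
     Definition
FROM Glossary.xlsx (ooxml, embedded labels, table is Glossary);

TRACE ..........Finished Glossary Load..........;

TRACE ..........Starting Change Log Load..........;

ChangeLog:
LOAD ChangeID,
     ChangeDate,
     ChangeDescription
FROM ChangeLog.xlsx (ooxml, embedded labels, table is ChangeLog);

TRACE ..........Finished Change Log Load..........;
```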

As you can see in the images above, the TRACE function allows you to set markers within the Script Execution Progress window and log file so that if a script fails you can walk through your script and follow each marker, eventually pin-pointing the exact area where things went awry. These “markers,” or messages, are versatile and customizable, which allow you to tailor your TRACE markers to any application or situation.

One important thing to remember is that the TRACE function does not specifically spell out what failed, but rather it shows you what pieces of the script were successfully completed, thus allowing the developer to follow the success markers up until the point of script failure. The image below shows what would happen in both the Script Execution Progress window and the application log file if the script were to fail.

Trace Function 3

Trace Function 4

You can see that the Glossary Load started and finished, which tells me that everything seems to be working fine with my Glossary. I then scroll down to see that my Change Log Load starts, but I do not see any marker telling me that the load of this table was finished. Due to the missing TRACE marker in the Script Execution window and the presence of the “Execution Failed” notification in the log file I know that something went wrong within my Change Log table load. These events and markers allow me to save time and jump right to that piece of script within my application and continue my research.

While this is a simple example used for the sake of this blog and explanation, in larger and more complex scripts being able to see what completed fully and what did not can be a lifesaver. If you are running your script manually within the QlikView application you will still receive the normal script error dialog box, but the TRACE function can be invaluable when running jobs from the QMC, where specific error notifications are not as pronounced unless you view the log file.

How is the TRACE function used?

There are a variety of ways to use the TRACE function, but the two I find most useful are as follows:

  1. Designating a specific string by simply typing the desired output directly into the script as you have seen throughout this blog

Trace Function 5

Notice that you do not need quotes around your string unless you are adding additional components, such as a variable. In that case you would use a format like ‘This is an example ’& $(vExample).

  2. Creating a variable that feeds into your TRACE function

Trace Function 6

Trace Function 7
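A sketch of this second approach, with hypothetical variable names and message text, could look like this:

```
// Sketch of feeding a variable into TRACE; variable name and text are hypothetical.
LET vTraceMsg = '..........Starting Change Log Load..........';
TRACE $(vTraceMsg);

// The variable can be rebuilt as the script runs, for example to include a timestamp.
LET vTraceMsg = '..........Change Log Load started at ' & Timestamp(Now()) & '..........';
TRACE $(vTraceMsg);
```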

NOTE: If you are running the script manually from within the Qlik application, it is VERY helpful to deselect the “Close when finished” checkbox at the bottom left-hand side of the Script Execution Progress window. This eliminates the need to access the log file in the event of a failure: you can simply scroll within the Script Execution Progress window to locate the area of your script error. This can also be configured in the User Preferences menu under Settings. Both methods are shown below.

Trace Function 8

Trace Function 9

*DEVELOPMENT TIP* – Because the TRACE function does not print anything to signify where a TRACE has been inserted, it is sometimes hard to locate the markers that you have worked into your script. I find it extremely helpful to insert a string of characters that stands out into your TRACE function. Strings such as ‘……….’ (this string is used in the example image above), ‘********’, or ‘>>>>>>>’ will make the TRACE much easier to read when scrolling through your Script Execution Progress window or application log file.

Here’s a quick side-by-side look at what would happen in the Script Execution Progress window if I removed the preceding ‘……….’ before my TRACE string:

Trace Function 10

Trace Function 11

While the TRACE information is still present, it does not jump out at the developer the same as it would if they were to include preceding characters. For best results, use something that sticks out to you. Again, this helps to save time and increase readability.

 

NOTE: The TRACE function must be used before or after your script statements. For example, it CANNOT be used in the middle of a LOAD, such as after a field name.

As I have stated earlier, one limitation of the TRACE function is that it will not give you the exact record or field the script fails on. This is mainly due to the fact that the TRACE cannot be placed within a given statement. Although the TRACE function will not point out an exact record or field of error, it does have the ability to substantially narrow down the possible sources of errors. Because of this, I offer the following development tip:

 

*DEVELOPMENT TIP* – Use as many TRACE functions as you see fit. The more TRACE functions you use the easier it will be to debug your application.

For experienced developers it is easy to ignore certain functions or practices because they seem trivial or take an extra few seconds to enter into your script. However, I maintain that it is these “basic” functions we so often overlook that can be game changers when crunch time comes. It is important not to become jaded by the development experience you gain, and to remember that going back to basics is not a sign of ignorance, but rather one of the smartest moves developers of all experience levels can make.

The TRACE function is debugging made easy. The next time you develop an application or begin to make changes to an old script give this method a try… You won’t be disappointed!


The post How to use the TRACE function to track down problems in QlikView appeared first on John Daniel Associates, Inc..

Source by analyticsweek

Don’t Let your Data Lake become a Data Swamp

In an always-on, competitive business environment, organizations are looking to gain an edge through digital transformation. Subsequently, many companies feel a sense of urgency to transform across all areas of their enterprise—from manufacturing to business operations—in the constant pursuit of continuous innovation and process efficiency.

Data is at the heart of all these digital transformation projects. It is the critical component that helps generate smarter, better decision-making by empowering business users to eliminate gut feelings, unclear hypotheses, and false assumptions. As a result, many organizations believe building a massive data lake is the ‘silver bullet’ for delivering real-time business insights. In fact, according to a survey by CIO Review from IDG, 75 percent of business leaders believe their future success will be driven by their organization’s ability to make the most of their information assets. However, only four percent of these organizations said they have set up a data-driven approach that successfully benefits from their information.

Is your Data Lake becoming more of a hindrance than an enabler?

The reality is that all these new initiatives and technologies come with a unique set of generated data, which creates additional complexity in the decision-making process. To cope with the growing volume and complexity of data and alleviate IT pressure, some are migrating to the cloud.

But this transition, in turn, creates other issues. For example, once data is made more broadly available via the cloud, more employees want access to that information. Growing numbers and varieties of business roles are looking to extract value from increasingly diverse data sets, faster than ever, putting pressure on IT organizations to deliver real-time data access that serves the diverse needs of business users looking to apply real-time analytics to their everyday jobs. However, it’s not just about better analytics: business users also frequently want tools that allow them to prepare, share, and manage data.

To minimize tension and friction between IT and business departments, moving raw data to one place where everybody can access it sounded like a good move. James Dixon, who coined the term, envisioned the data lake as a large body of raw data in a more natural state, where different users come to examine it, delve into it, or extract samples from it. Increasingly, however, organizations are beginning to realize that all the time and effort spent building massive data lakes has frequently made things worse, due to poor data governance and management, resulting in the formation of so-called “Data Swamps”.

Bad data clogging up the machinery

In the same way that data warehouses failed to deliver on data analytics a decade ago, data lakes will undoubtedly become “Data Swamps” if companies don’t manage them correctly. Putting all your data in a single place won’t in and of itself solve a broader data access problem. Leaving data uncontrolled, un-enriched, unqualified, and unmanaged will dramatically hamper the benefits of a data lake, as it will still only be usable by a limited number of experts with a unique set of skills.

A successful system of real-time business insights starts with a system of trust. To illustrate the negative impact of bad data and bad governance, let’s take a look at Dieselgate. The Dieselgate emissions scandal highlighted the difference between real-world and official air pollutant emissions data. In this case, the issue was not a problem of data quality but of ethics, since some car manufacturers misled the measurement system by injecting fake data. This resulted in fines for car manufacturers exceeding tens of billions of dollars and consumers losing faith in the industry. After all, how can consumers trust the performance of cars now that they know the system of measure has been intentionally tampered with?

The takeaway in the context of an enterprise data lake is that its value will depend on the level of trust employees have in the data contained in the lake. Failing to control data accuracy and quality within the lake will create mistrust amongst employees, seed doubt about the competency of IT, and jeopardize the whole data value chain, which then negatively impacts overall company performance.

A cloud data warehouse to deliver trusted insights for the masses

Leading firms believe governed cloud data lakes represent an adequate solution for overcoming some of these traditional data lake stumbling blocks. The following four-step approach helps modernize a cloud data warehouse while providing better insight across the entire organization.

  1. Unite all data sources and reconcile them: Make sure the organization has the capacity to integrate a wide array of data sources, formats and sizes. Storing a wide variety of data in one place is the first step, but it’s not enough. Bridging data pipelines and reconciling them is another way to gain the capacity to manage insights. Verify the company has a cloud-enabled data management platform combining rich integration capabilities and cloud elasticity to process high data volumes at a reasonable price.
  2. Accelerate trusted insights to the masses: Efficiently manage data with cloud data integration solutions that help prepare, profile, cleanse, and mask data while monitoring data quality over time, regardless of file format and size (a minimal sketch of profiling and masking follows this list). When coupled with cloud data warehouse capabilities, data integration can enable companies to create trusted data for access, reporting, and analytics in a fraction of the time and cost of traditional data warehouses.
  3. Collaborative data governance to the rescue: The old schema of a data value chain where data is produced solely by IT in data warehouses and consumed by business users is no longer valid. Now everyone wants to create content, add context, enrich data, and share it with others. Take the example of the internet and a knowledge platform such as Wikipedia, where everybody can contribute, moderate, and create new entries in the encyclopedia. In the same way Wikipedia established collaborative governance, companies should instill collaborative governance in their organizations by delegating appropriate role-based authority or access rights to citizen data scientists, line-of-business experts, and data analysts.
  4. Democratize data access and encourage users to be part of the Data Value Chain: Without making people accountable for what they’re doing, analyzing, and operating, there is little chance that organizations will succeed in implementing the right data strategy across business lines. Thus, you need to build a continuous Data Value Chain where business users contribute, share, and enrich the data flow in combination with a cloud data warehouse multi-cluster architecture that will accelerate data usage by load balancing data processing across diverse audiences.
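As a minimal illustration of the profiling and masking mentioned in step 2, here is a hedged sketch using pandas as a stand-in for a cloud data integration tool; the DataFrame and column names are hypothetical.

```python
# Illustrative sketch only: profile null rates and mask a PII column before
# sharing data more broadly (pandas stands in for an integration tool here).
import hashlib
import pandas as pd

def profile_and_mask(df: pd.DataFrame, pii_columns=("email",)) -> pd.DataFrame:
    # Simple profile: share of missing values per column, worst first.
    null_rates = df.isna().mean().sort_values(ascending=False)
    print("Null rate per column:\n", null_rates)

    # Mask PII columns with a one-way hash so the data can be shared safely.
    masked = df.copy()
    for col in pii_columns:
        if col in masked.columns:
            masked[col] = masked[col].astype(str).map(
                lambda v: hashlib.sha256(v.encode("utf-8")).hexdigest()[:12])
    return masked
```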

In summary, think of data as the next strategic asset. Right now, it’s more like a hidden treasure at the bottom of many companies. Once modernized, shared and processed, data will reveal its true value, delivering better and faster insights to help companies get ahead of the competition.

The post Don’t Let your Data Lake become a Data Swamp appeared first on Talend Real-Time Open Source Data Integration Software.

Source by analyticsweek

Nov 29, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Fake data


[ AnalyticsWeek BYTES]

>> Data Science is more than Machine Learning  by analyticsweek

>> How data analytics can drive workforce diversity by analyticsweekpick

>> Steph Curry’s Season Stats in 13 lines of R Code by stattleship


[ NEWS BYTES]

>> Infor upgrades Talent Science solution – Trade Arabia Under Talent Analytics

>> Fascinating Cricket Statistics – Radio New Zealand Under Statistics

>> Job Role: IoT Solutions Architect – Techopedia (press release) Under IOT


[ FEATURED COURSE]

Intro to Machine Learning


Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most stra… more

[ FEATURED READ]

Thinking, Fast and Slow


Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being data driven is not as much a tech challenge as it is an adoption challenge, and adoption has its roots in the cultural DNA of any organization. Great data-driven organizations weave the data-driven culture into their corporate DNA. A culture of connection, interaction, sharing, and collaboration is what it takes to be data driven. It’s about being empowered more than it is about being educated.

[ DATA SCIENCE Q&A]

Q:Is it beneficial to perform dimensionality reduction before fitting an SVM? Why or why not?
A: * When the number of features is large compared to the number of observations (e.g., a document-term matrix)
* The SVM will often perform better in the reduced space

Source
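A minimal, hedged sketch of the comparison described above (the dataset, kernel, and component count are placeholder assumptions, and dense features are assumed):

```python
# Illustrative sketch: compare an SVM with and without PCA-based
# dimensionality reduction via cross-validation (X, y supplied by you).
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def compare_svm_with_pca(X, y, n_components=50):
    # n_components should be well below the original feature count.
    svm_only = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    pca_svm = make_pipeline(StandardScaler(),
                            PCA(n_components=n_components),
                            SVC(kernel="rbf"))
    return {
        "svm_only": cross_val_score(svm_only, X, y, cv=5).mean(),
        "pca_then_svm": cross_val_score(pca_svm, X, y, cv=5).mean(),
    }
```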

[ VIDEO OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast


[ QUOTE OF THE WEEK]

You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment. – Alvin Toffler

[ PODCAST OF THE WEEK]

Dave Ulrich (@dave_ulrich) talks about role / responsibility of HR in #FutureOfWork #JobsOfFuture #Podcast


[ FACT OF THE WEEK]

The amount of data created each day is equivalent to every person in the US tweeting three tweets per minute for 26,976 years.

Sourced from: Analytics.CLUB #WEB Newsletter

Estimating Other “Likelihood to Recommend” Metrics from Your Net Promoter Score (NPS)

In the realm of customer experience management, businesses can employ different summary metrics of customer feedback ratings. That is, the same set of data can be summarized in different ways. Popular summary metrics include mean scores, net scores, and customer segment percentages. Prior analyses of different likelihood-to-recommend metrics reveal, however, that they are highly correlated; that is, different summary metrics of the “likelihood to recommend” question tell you essentially the same thing about your customers. This post presents information to help you compare different likelihood-to-recommend summary metrics.

Table 1. Correlations among different summary metrics of the likelihood to recommend question.

Sample

The data were from three separate studies, each examining consumer attitudes toward either their PC Manufacturer or Wireless Service Provider. Here are the details for each study:

  1. PC manufacturer: Survey of 1058 general US consumers in Aug 2007 about their PC manufacturer. All respondents for this study were interviewed to ensure they met the correct profiling criteria, and were rewarded with an incentive for filling out the survey. Respondents were ages 18 and older. GMI (Global Market Insite, Inc., www.gmi-mr.com) provided the respondent panels and the online data collection methodology.
  2. Wireless service provider: Survey of 994 US general consumers in June 2007 about their wireless provider. All respondents were from a panel of General Consumers in the United States ages 18 and older. The potential respondents were selected from a general panel which is recruited in a double opt-in process; all respondents were interviewed to ensure they meet correct profiling criteria. Respondents were given an incentive on a per-survey basis. GMI (Global Market Insite, Inc., www.gmi-mr.com) provided the respondent panels and the online data collection methodology.
  3. Wireless service providers: Survey of 5686 worldwide consumers from Spring 2010 about their wireless provider. All respondents for this study were rewarded with an incentive for filling out the survey. Respondents were ages 18 or older. Mob4Hire (www.mob4hire.com)  provided the respondent panels and the online data collection methodology.
Figure 1. Regression equations using NPS as the predictor and other summary metrics as the criteria.

From these three studies across nearly 8000 respondents, I calculated six customer metrics for 48 different brands/companies for those that had 30 or more responses. Of the 48 different brands, most were from the Wireless Service provider industry (N = 41). The remaining seven brands were from the PC industry. I calculated six different “likelihood to recommend” metrics for each of these 48 brands.

Results

The descriptive statistics of six different metrics and the correlations among them appear in Table 1. As you can see, five of the six summary metrics are highly related to each other. The correlations among these metrics vary from .85 to .97 (the negative correlations with Bottom 7 Box indicate that the bottom box score is a measure of badness; higher scores indicate more negative customer responses). The metric regarding the Passives segment is weakly related to the other metrics because the customer segment that it represents reflects the middle of the distribution of ratings.

The extremely high correlations among the rest of the metrics tell us that these five metrics tell us roughly the same thing about the 48 brands. That is, brands with high Net Promoter Scores are those that are getting high Mean Scores, high Top Box Scores (Promoters), Low Bottom Box Scores (Detractors) and Positive Scores (% ≥ 6).

Comparing Different Summary Metrics

It is easy to compare your company’s Net Promoter Score to those of other companies when they also report a Net Promoter Score. When different companies summarize their “likelihood to recommend” question using a Mean Score or Top/Bottom Box Scores, this comparison across companies becomes difficult. However, we can use the current findings to help translate NPS scores into other summary metrics. Because the different metrics are so highly related, we can, with great precision, estimate different metrics from the NPS via regression analysis. Using regression analysis, I estimated the other five summary metrics from the Net Promoter Score.

I calculated five different regression equations using NPS as the predictor and each of the other summary metrics as the criterion. I selected regression equations (e.g., linear, polynomial) that optimized the percent of variance explained by the model. The Mean Score was predicted using a linear model. The remaining scores were predicted using a polynomial model. The regression equations for each of the metrics (along with a scatter plot of the associated data) appear in Figure 1. As you can see in Figure 1, most of the regression equations explain a large percent of the variance in the outcome variables.
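To make the mechanics concrete, here is a small, hedged sketch (not the author’s code, and not reproducing the coefficients shown in Figure 1) of fitting such an equation with NumPy, assuming you have brand-level NPS values and the corresponding summary metric for your own data.

```python
# Illustrative sketch only: fit a regression of a summary metric on NPS and
# use it to estimate the metric for a new NPS value.
import numpy as np

def estimate_metric_from_nps(nps_values, metric_values, new_nps, degree=1):
    """degree=1 gives a linear fit (as used for the Mean Score model);
    higher degrees give the polynomial fits used for the other metrics."""
    coeffs = np.polyfit(nps_values, metric_values, deg=degree)
    return np.polyval(coeffs, new_nps)
```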

Table 2. Net Promoter Scores and Predicted Values of Other Summary Metrics.

Using these regression equations, you can calculate the expected summary score from any Net Promoter Score. Simply substitute the x value with your Net Promoter Score and solve for y. Table 2 provides a summary of predicted values of other summary metrics given different Net Promoter Scores. For example, an NPS of -70 is equivalent to a Mean Score of 4.9. An NPS of 0.0 is equivalent to a Mean Score of 7.1. 

Summary

Customer feedback data can be summarized in different ways. The current analysis showed that different summary metrics (e.g., Mean Scores, Net Scores, Top/Bottom Box Scores), not surprisingly, tell you the same thing; that is, summary metrics are highly correlated with each other. Using regression equations, you can easily transform your Net Promoter Score into other summary metrics.

In an upcoming post, I will examine how well customer feedback professionals are able to estimate different summary metrics (customer segment percentages) from Net Promoter Scores and Mean Scores without the assistance of regression equations. This upcoming post will highlight potential biases that professionals have when interpreting Net Promoter and Mean Scores.

Source

The Evolution of Internet of Things (Infographics)

The Internet of Things is a new revolution of the Internet. Objects make themselves recognizable and gain intelligence by communicating information about themselves and by accessing information that has been aggregated by other things.

THE EVOLUTION OF THE INTERNET OF THINGS

Source Casaleggio.it

Source

Nov 22, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Trust the data


[ AnalyticsWeek BYTES]

>> July 31, 2017 Health and Biotech analytics news roundup by pstein

>> For Musicians and Songwriters, Streaming Creates Big Data Challenge by analyticsweekpick

>> Simplifying Data Warehouse Optimization by analyticsweekpick


[ NEWS BYTES]

>> Is Population Health on the Agenda as Google Nabs Geisinger CEO? – Health IT Analytics Under Health Analytics

>> How to catch security blind spots during a cloud migration – GCN.com Under Cloud

>> Data Analytics Outsourcing Market Application Analysis, Regional Outlook, Growth Trends, Key Players and Forecasts … – AlgosOnline (press release) (blog) Under Social Analytics


[ FEATURED COURSE]

Statistical Thinking and Data Analysis


This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

Big Data: A Revolution That Will Transform How We Live, Work, and Think


“Illuminating and very timely . . . a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy, and even on the way we think… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. With data analysis, the same type of thinking goes. It’s not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and co-operating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
A: Lift:
It’s a measure of the performance of a targeting model (or rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. Lift is simply: target response / average response.

Suppose a population has an average response rate of 5% (to a mailing, for instance). If a certain model (or rule) has identified a segment with a response rate of 20%, then lift = 20/5 = 4.

Typically, the modeler seeks to divide the population into quantiles, and rank the quantiles by lift. He can then consider each quantile, and by weighing the predicted response rate against the cost, he can decide to market that quantile or not.
“if we use the probability scores on customers, we can get 60% of the total responders we’d get mailing randomly by only mailing the top 30% of the scored customers”.
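As a quick, hedged sketch of the lift calculation described above (the scores and responses are arrays you supply; the decile split is one common convention, not the only one):

```python
# Illustrative sketch: lift per decile = decile response rate / overall rate.
import numpy as np

def lift_by_decile(scores, responses):
    order = np.argsort(scores)[::-1]            # highest model scores first
    sorted_responses = np.asarray(responses)[order]
    overall_rate = sorted_responses.mean()
    deciles = np.array_split(sorted_responses, 10)
    return [d.mean() / overall_rate for d in deciles]

# With a 5% overall response rate and a 20% rate in the top segment,
# lift = 0.20 / 0.05 = 4, matching the example above.
```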

KPI:
– Key performance indicator
– A type of performance measurement
– Examples: 0 defects, 10/10 customer satisfaction
– Relies upon a good understanding of what is important to the organization

More examples:

Marketing & Sales:
– New customers acquisition
– Customer attrition
– Revenue (turnover) generated by segments of the customer population
– Often done with a data management platform

IT operations:
– Mean time between failure
– Mean time to repair

Robustness:
– Statistics with good performance even if the underlying distribution is not normal
– Statistics that are not affected by outliers
– A learning algorithm that can reduce the chance of fitting noise is called robust
– Median is a robust measure of central tendency, while mean is not
– Median absolute deviation is also more robust than the standard deviation

Model fitting:
– How well a statistical model fits a set of observations
– Examples: AIC, R2, Kolmogorov-Smirnov test, Chi 2, deviance (glm)

Design of experiments:
The design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.
In its simplest form, an experiment aims at predicting the outcome by changing the preconditions, the predictors.
– Selection of the suitable predictors and outcomes
– Delivery of the experiment under statistically optimal conditions
– Randomization
– Blocking: an experiment may be conducted with the same equipment to avoid any unwanted variations in the input
– Replication: performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process
– Interaction: when an experiment has 3 or more variables, the situation in which the combined influence of two variables on a third is not additive

80/20 rule:
– Pareto principle
– 80% of the effects come from 20% of the causes
– 80% of your sales come from 20% of your clients
– 80% of a company’s complaints come from 20% of its customers

Source

[ VIDEO OF THE WEEK]

#FutureOfData with @theClaymethod, @TiVo discussing running analytics in media industry


[ QUOTE OF THE WEEK]

Big Data is not the new oil. – Jer Thorp

[ PODCAST OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast


[ FACT OF THE WEEK]

According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.

Sourced from: Analytics.CLUB #WEB Newsletter

The Challenges Canadian Companies Face When Implementing Big Data

Big Data is a big deal but implementing it is seldom simple or cheap.

According to new Accenture research that surveyed senior executives from seven industries in 19 countries, including Canada, the three biggest challenges Canadian companies face when implementing big data are budget, a shortage of skilled professionals, and security.

Canadian executives said the three main ways their companies use big data are to identify new sources of revenue, retain and acquire customers, and develop new products and services. And they’re seeing tangible business outcomes from big data, too, in the form of customer experience enhancement and new sources of revenue.

“Businesses are at a transition point­­ where instead of just talking about the potential results that can be achieved from big data, they are realizing actual benefits including increasing revenues, a growing base of loyal customers, and more efficient operations,” said Narendra Mulani, senior managing director, Accenture Analytics, part of Accenture Digital.

“They’re recognizing that big data is one of the cornerstones of digital transformation,” Mulani added.

 

But half of Canadian execs cite budget as a challenge to applying big data, while 40% struggle to find talent. These are obstacles they must overcome, however, as 90% rate big data as “extremely” or “very” important to their business’ digital transformation, and 86% of those who have applied big data are satisfied with the results, according to the report.

“We’ve seen organizations overcome big data implementation challenges by remaining flexible and recognizing that no single solution suits every situation,” explained Vince Dell’Anno, managing director and global information management lead, Accenture Analytics, part of Accenture Digital. “If a particular approach doesn’t work, organizations quickly try another one, learning as they grow.

“They also start small and stay realistic in their expectations,” he added. “Rather than attempting to do everything at once, they focus resources around proving value in one area, and then let the results cascade from there.”

“Today, even the most basic items like water pipes can generate and provide data,” continued Mulani. “While the Internet of Things is giving rise to massive sources and quantities of data, new big data technologies are emerging that help uncover crucial business insights from the data.”

“Companies not implementing big data solutions are missing an opportunity to turn their data into an asset that drives business and a competitive advantage,” Mulani affirmed.

Originally posted via “The Challenges Canadian Companies Face When Implementing Big Data”

Originally Posted at: The Challenges Canadian Companies Face When Implementing Big Data by analyticsweekpick

Dreaming of being Data Driven? A Case for Central Data Office


Enough has been said about big data and the enterprise journey that takes companies there. One core area where the buck stops is “who owns the data?”, or rather “no one owns the data”, which makes data accessibility a nightmare scenario. Companies spend a lot of time and energy getting around this shortcoming, and not many move on to planning it the right way. If you have spent more than a month in analytics, you know the chill that comes with pulling data sets from another department or silo. Have you ever wondered why this is a problem, and why, when almost everyone is suffering, nothing has been done about it? Let’s start thinking about giving your business a centralized data office that takes care of enterprise data. The dream is to make data availability neither an IT nightmare nor an interdepartmental wrestling match, but a smooth process. There are numerous ways a central data office moves you along this journey. Following are a few examples.

1. Reduce time to data: Yes, this should come as no surprise: when everything is managed in one place, there are bound to be centralized processes and resource allocation, making access more structured and fast. You don’t have to invoke laws of reciprocity to get that magical data set that will help you do your job better; you will simply gain access to your data faster. On the other side, you will now know the process for sharing data, and will share it without getting into ad hoc processes and timings. Both sides get a methodical, accountable system with set expectations. This also unhitches IT and other departments from dealing with lots of people requesting random data at random times, and provides a central funnel for all dispensing.

2. Give every data set an owner and thereby reduce bad data: How many times have you heard “it’s not my data”, “I don’t own it”, or “I just requested it from IT, and I don’t know much about it”? This is another big problem plaguing large enterprises. Lack of ownership comes with a price: more often than not, it adds loads of overhead and restricts capability. Having a centralized data office reduces such problems by giving ownership to someone besides IT, whose primary job is not to understand data in the first place. With ownership in place, it becomes far easier to get to the owners, understand more about the data, and qualify it better for faster, more effective analytics.

3. More transparency, and thereby fewer traps and greasy edges: Having a centralized data office shines more light on data, its relevance, and how each data set contributes to the bigger enterprise vision. A central office gives data strategy teams a bigger lens for understanding how data is shaping their business. It makes things more transparent, showing pitfalls and greasy spots clearly. This helps produce better, clearer, and more fail-safe strategies, as they will now be based on better-qualified data. This ultimately helps companies steer their data focus effectively.

4. Accelerate the journey to a data-driven enterprise: Yes, really. Being a data-driven enterprise is an elite status that helps assure your sustainable existence. It certainly does not come cheap or easy, but requires a meticulous and consistent stride toward data-based decision-making. Having a centralized data office helps make it a reality. Data is the toughest part of any data-driven strategy: getting data has consistently been a struggle, and not having a standardized process has made it anything but better. A centralized body will embrace better, consistent data quality standards, making more and more data usable, which ultimately plays a crucial role in creating a data-driven organization.

5. Make the overall data analytics strategy a reality, not a hidden dream: A hidden message behind creating a centralized data office is not just to make data more manageable but to empower current analytics capabilities to use more of the data in their analysis. Company-level analytics has been the function of the people and teams directly working on it, and we humans tend to introduce errors over time, so what makes us think our hand-built analytics will be a true representation of the company? Having a centralized data approach helps here by providing a more holistic view of the overall analytics strategy, and it increases the capability to leverage all the data elements in defining that strategy.

So, it should come as no surprise that a central data office is the start of a long data-driven journey, as well as a sustained and profitable vision of using data for effective decision-making. It will not only make data easily accessible but also encourage people to use more data and less gut feeling in their decision-making processes.

Originally Posted at: Dreaming of being Data Driven? A Case for Central Data Office