Oct 18, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


Ethics  Source


More WEB events? Click Here


 Cray Inc (NASDAQ:CRAY) Institutional Investor Sentiment Analysis – Thorold News Under  Sentiment Analysis

 Crimson Hexagon’s Plight In Five Words: Facebook Doesn’t Want … – AdExchanger Under  Social Analytics

 Unisys Unveils TrustCheck™, the First Subscription-Based Service … – APN News Under  Risk Analytics

More NEWS ? Click Here


Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more


The Industries of the Future


The New York Times bestseller, from leading innovation expert Alec Ross, a “fascinating vision” (Forbes) of what’s next for the world and how to navigate the changes the future will bring…. more


Data Analytics Success Starts with Empowerment
Being data driven is not so much a tech challenge as an adoption challenge. Adoption has its roots in the cultural DNA of an organization. Great data driven organizations work the data driven culture into their corporate DNA. A culture of connection, interaction, sharing, and collaboration is what it takes to be data driven. It's about being empowered more than it's about being educated.


Q:How would you define and measure the predictive power of a metric?
A: * Predictive power of a metric: how accurately the metric predicts the empirically observed outcome
* Definitions are all domain specific
* Example: in a field like manufacturing, failure rates of tools are easily observable; a metric can be trained, and its success measured as its deviation over time from the observed rates
* In information security: if the metric says that an attack is coming and one should do X, did the recommendation stop the attack, or would the attack never have happened anyway?
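To make the manufacturing example concrete, a sketch with made-up numbers: the metric's predictive power can be measured as its mean deviation over time from the observed failure rates.

```python
import numpy as np

# Hypothetical monthly failure rates: what the metric predicted vs. what was observed.
predicted = np.array([0.10, 0.12, 0.15, 0.11])
observed = np.array([0.09, 0.13, 0.14, 0.12])

# Predictive power measured as mean absolute deviation from the observed rates.
mae = np.abs(predicted - observed).mean()
```

The smaller the deviation, the more predictive the metric; tracking this number over time also shows whether the metric is drifting.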



Understanding #BigData #BigOpportunity in Big HR by @MarcRind #FutureOfData #Podcast


Subscribe to  Youtube


You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment. – Alvin Toffler


@Schmarzo @DellEMC on Ingredients of healthy #DataScience practice #FutureOfData #Podcast



iTunes  GooglePlay


IDC estimates that by 2020, business transactions on the internet – business-to-business and business-to-consumer – will reach 450 billion per day.

Sourced from: Analytics.CLUB #WEB Newsletter

Periodic Table Personified [image]

Have you ever tried memorizing the periodic table? It is a daunting task, as it has a lot of elements, all coded with one- or two-letter symbols. So, what is the solution? There are various methods for doing it. For one, check out Wonderful Life with the Elements: The Periodic Table Personified by Bunpei Yorifuji. In this effort, Bunpei personified all the elements. It is a fun way to identify each element and make it easily recognizable.

In his book, Yorifuji makes the many elements seem a little more individual by illustrating each one as an anthropomorphic cartoon character, with distinctive hairstyles and clothes to help readers tell them apart. For example, the nitrogen atoms have mohawks because they “hate normal,” while the noble gases have afros because they are “too cool” to react to extreme heat or cold. Man-made elements are depicted in robot suits, while elements used in industrial applications wear business attire.

Image by Wired

Source: Periodic Table Personified [image] by v1shal

Exploring the Structure of High-Dimensional Data with HyperTools in Kaggle Kernels


The datasets we encounter as scientists, analysts, and data nerds are increasingly complex. Much of machine learning is focused on extracting meaning from complex data. However, there is still a place for us lowly humans: the human visual system is phenomenal at detecting complex structure and discovering subtle patterns hidden in massive amounts of data. Every second that our eyes are open, countless data points (in the form of light patterns hitting our retinas) are pouring into visual areas of our brain. And yet, remarkably, we have no problem at all recognizing a neat looking shell on a beach, or our friend’s face in a large crowd. Our brains are “unsupervised pattern discovery aficionados.”

On the other hand, there is at least one major drawback to relying on our visual systems to extract meaning from the world around us: we are essentially capped at perceiving just 3 dimensions at a time, and many datasets we encounter today are higher dimensional.

So, the question of the hour is: how can we harness the incredible pattern-recognition superpowers of our brains to visualize complex and high-dimensional datasets?

Dimensionality Reduction

In comes dimensionality reduction, stage right. Dimensionality reduction is just what it sounds like– transforming a high-dimensional dataset into a lower-dimensional dataset. For example, take this UCI ML dataset on Kaggle comprising observations about mushrooms, organized as a big matrix. Each row comprises a bunch of features of the mushroom, like cap size, cap shape, cap color, odor etc. The simplest way to do dimensionality reduction might be to simply ignore some of the features (e.g. pick your favorite three—say size, shape, and color—and ignore everything else). However, this is problematic if the features you drop contained valuable diagnostic information (e.g. whether the mushrooms are poisonous).

A more sophisticated approach is to reduce the dimensionality of the dataset by only considering its principal components, or the combinations of features that explain the most variance in the dataset. Using a technique called principal components analysis (or PCA), we can reduce the dimensionality of a dataset, while preserving as much of its precious variance as possible. The key intuition is that we can create a new set of (a smaller number of) features, where each of the new features is some combination of the old features. For example, one of these new features might reflect a mix of shape and color, and another might reflect a mix of size and poisonousness. In general, each new feature will be constructed from a weighted mix of the original features.

Below is a figure to help with the intuition. Imagine that you had a 3 dimensional dataset (left), and you wanted to reduce it to a 2 dimensional dataset (right). PCA finds the principal axes in the original 3D space where the variance between points is the highest. Once we identify the two axes that explain the most variance (the black lines in the left panel), we can re-plot the data along just those axes, as shown on the right. Our 3D dataset is now 2D. Here we have chosen a low-dimensional example so we could visualize what is happening. However, this technique can be applied in the same way to higher-dimensional datasets.
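The 3D-to-2D reduction in the figure can be sketched in a few lines of numpy. (HyperTools wraps scikit-learn's PCA; this hand-rolled SVD version, on toy data, is just for intuition.)

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # a toy 3D dataset
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # make one dimension nearly redundant

Xc = X - X.mean(axis=0)                          # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2d = Xc @ Vt[:2].T                              # project onto the top two principal axes
explained = S**2 / (S**2).sum()                  # fraction of variance along each axis
```

Because the third column is almost a copy of the first, the top two axes capture nearly all the variance, so the 2D projection loses very little information.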

We created the HyperTools package to facilitate these sorts of dimensionality reduction-based visual explorations of high-dimensional data. The basic pipeline is to feed in a high-dimensional dataset (or a series of high-dimensional datasets) and, in a single function call, reduce the dimensionality of the dataset(s) and create a plot. The package is built atop many familiar friends, including matplotlib, scikit-learn and seaborn. HyperTools is designed with ease of use as a primary objective. We highlight two example use cases below.

Mushroom foraging with HyperTools: Visualizing static ‘point clouds’

First, let’s explore the mushrooms dataset we referenced above. We start by importing the relevant libraries:

import pandas as pd
import hypertools as hyp

and then we read in our data into a pandas DataFrame:

data = pd.read_csv('../input/mushrooms.csv')
index  class  cap-shape  cap-surface  cap-color  bruises  odor  gill-attachment
0      p      x          s            n          t        p     f
1      e      x          s            y          t        a     f
2      e      b          s            w          t        l     f
3      p      x          y            w          t        p     f
4      e      x          s            g          f        n     f
5      e      x          y            y          t        a     f

Each row of the DataFrame corresponds to a mushroom observation, and each column reflects a descriptive feature of the mushroom (only some of the rows and columns are shown above). Now let’s plot the high-dimensional data in a low dimensional space by passing it to HyperTools. To handle text columns, HyperTools will first convert each text column into a series of binary ‘dummy’ variables before performing the dimensionality reduction. For example, if the ‘cap size’ column contained ‘big’ and ‘small’ labels, this single column would be turned into two binary columns: one for ‘big’ and one for ‘small’, where 1s represents the presence of that feature and 0s represents the absence (for more on this, see the documentation for the get_dummies function in pandas).
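The hypothetical 'cap size' column from the paragraph above looks like this in pandas:

```python
import pandas as pd

# A hypothetical text column, as described above.
df = pd.DataFrame({'cap-size': ['big', 'small', 'big']})

# get_dummies turns it into one binary column per label.
dummies = pd.get_dummies(df)
print(list(dummies.columns))  # ['cap-size_big', 'cap-size_small']
```

Each original row now has a 1 in the column matching its label and 0s elsewhere, which is what makes the text data usable for dimensionality reduction.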

hyp.plot(data, 'o')

In plotting the DataFrame, we are effectively creating a three-dimensional “mushroom space,” where mushrooms that exhibit similar features appear as nearby dots, and mushrooms that exhibit different features appear as more distant dots. By visualizing the DataFrame in this way, it becomes immediately clear that there are multiple clusters in the data. In other words, all combinations of mushroom features are not equally likely, but rather certain combinations of features tend to go together. To better understand this space, we can color each point according to some feature in the data that we are interested in knowing more about. For example, let’s color the points according to whether the mushrooms are (p)oisonous or (e)dible (the class_labels feature):

hyp.plot(data,'o', group=class_labels, legend=list(set(class_labels)))
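Note that class_labels is not defined in the snippets above; presumably it is the 'class' column of the DataFrame, along the lines of:

```python
import pandas as pd

# Stand-in for the mushrooms DataFrame; the real one comes from mushrooms.csv.
data = pd.DataFrame({'class': ['p', 'e', 'e', 'p']})

class_labels = data['class']
legend = list(set(class_labels))  # unique labels for the plot legend
```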

Visualizing the data in this way highlights that mushrooms’ poisonousness appears stable within each cluster (e.g. mushrooms that have similar features), but varies across clusters. In addition, it looks like there are a number of distinct clusters that are poisonous/edible. We can explore this further by using the ‘cluster’ feature of HyperTools, which colors the observations using k-means clustering. In the description of the dataset, it was noted that there were 23 different types of mushrooms represented in this dataset, so we’ll set the n_clusters parameter to 23:

hyp.plot(data, 'o', n_clusters=23)

To gain access to the cluster labels, the clustering tool may be called directly using hyp.tools.cluster, and the resulting labels may then be passed to hyp.plot:

cluster_labels = hyp.tools.cluster(data, n_clusters=23)
hyp.plot(data, group=cluster_labels)
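Under the hood, HyperTools' clustering uses k-means (via scikit-learn). A minimal numpy sketch of that step, on toy data, shows what the cluster labels represent:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    # Initialize centroids from k points spread across the dataset.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two well-separated toy clusters.
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
labels = kmeans(X, 2)
```

Each observation gets an integer cluster label, which is exactly the kind of array you can pass back to hyp.plot as the group argument.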

By default, HyperTools uses PCA to do dimensionality reduction, but with a few additional lines of code we can use other dimensionality reduction methods by directly calling the relevant functions from sklearn. For example, we can use t-SNE to reduce the dimensionality of the data using:

from sklearn.manifold import TSNE
TSNE_model = TSNE(n_components=3)
reduced_data_TSNE = TSNE_model.fit_transform(hyp.tools.df2mat(data))
hyp.plot(reduced_data_TSNE,'o', group=class_labels, legend=list(set(class_labels)))

Different dimensionality reduction methods highlight or preserve different aspects of the data. A repository containing additional examples (including different dimensionality reduction methods) may be found here.

The data expedition above provides one example of how the geometric structure of data may be revealed through dimensionality reduction and visualization. The observations in the mushrooms dataset formed distinct clusters, which we identified using HyperTools. Explorations and visualizations like this could help guide analysis decisions (e.g. whether to use a particular type of classifier to discriminate poisonous vs. edible mushrooms). If you’d like to play around with HyperTools and the mushrooms dataset, check out and fork this Kaggle Kernel!

Climate science with HyperTools: Visualizing dynamic data

Whereas the mushrooms dataset comprises static observations, here we will take a look at some global temperature data, which will showcase how HyperTools may be used to visualize timeseries data using dynamic trajectories.

This next dataset is made up of monthly temperature recordings from a sample of 20 global cities over the 138-year interval from 1875 to 2013. To prepare this dataset for analysis with HyperTools, we created a time by cities matrix, where each row is a temperature recording for subsequent months, and each column is the temperature value for a different city. You can replicate this demo by using the Berkeley Earth Climate Change dataset on Kaggle or by cloning this GitHub repo. To visualize temperature changes over time, we will use HyperTools to reduce the dimensionality of the data, and then plot the temperature changes over time as a line:

hyp.plot(temps)


Well, that just looks like a hot mess, now doesn’t it? However, we promise there is structure in there – so let’s find it! Because each city is in a different location, the mean and variance of its temperature timeseries may be higher or lower than those of the other cities. This will in turn affect how much that city is weighted when dimensionality reduction is performed. To normalize the contribution of each city to the plot, we can set the normalize flag (default value: False). Setting normalize='across' will normalize (z-score) each column of the data. HyperTools incorporates a number of useful normalization options, which you can read more about here.

hyp.plot(temps, normalize='across')
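For intuition, normalize='across' amounts to z-scoring each column (city), which can be sketched in plain numpy on a toy matrix:

```python
import numpy as np

# Toy months x cities temperature matrix.
temps = np.array([[10.0, 30.0],
                  [12.0, 34.0],
                  [14.0, 38.0]])

# z-score each column: subtract the column mean, divide by the column std.
z = (temps - temps.mean(axis=0)) / temps.std(axis=0)
```

After this, every city has mean 0 and standard deviation 1, so no single city dominates the dimensionality reduction.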

Now we’re getting somewhere! Rotating the plot with the mouse reveals an interesting shape to this dataset. To help highlight the structure and understand how it changes over time, we can color the lines by year, where redder lines indicate earlier timepoints and bluer lines indicate later ones:

hyp.plot(temps, normalize='across', group=years.flatten(), palette='RdBu_r')

Coloring the lines has now revealed two key structural aspects of the data. First, there is a systematic shift from blue to red, indicating a systematic change in the pattern of global temperatures over the years reflected in the dataset. Second, within each year (color), there is a cyclical pattern, reflecting seasonal changes in the temperature patterns. We can also visualize these two phenomena using a two dimensional plot:

hyp.plot(temps, normalize='across', group=years.flatten(), palette='RdBu_r', ndims=2)

Now, for the grand finale. In addition to creating static plots, HyperTools can also create animated plots, which can sometimes reveal additional patterns in the data. To create an animated plot, simply pass animate=True to hyp.plot when visualizing timeseries data. If you also pass chemtrails=True, a low-opacity trace of the data will remain in the plot:

hyp.plot(temps, normalize='across', animate=True, chemtrails=True)

That pleasant feeling you get from looking at the animation is called “global warming.”

This concludes our exploration of climate and mushroom data with HyperTools. For more, please visit the project’s GitHub repository, readthedocs site, a paper we wrote, or our demo notebooks.


Andrew is a Cognitive Neuroscientist in the Contextual Dynamics Laboratory. His postdoctoral work integrates ideas from basic learning and memory research with computational techniques used in data science to optimize learning in natural educational settings, like the classroom or online. Additionally, he develops open-source software for data visualization, research and education.

The Contextual Dynamics Lab at Dartmouth College uses computational models and brain recordings to understand how we extract information from the world around us. You can learn more about us at http://www.context-lab.com.

Source: Exploring the Structure of High-Dimensional Data with HyperTools in Kaggle Kernels

Oct 11, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


statistical anomaly  Source


More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> How to Successfully Incorporate Analytics Into Your Growth Marketing Process by analyticsweek

>> How the lack of the right data affects the promise of big data in India by analyticsweekpick

>> SDN and network function virtualization market worth $ 45.13 billion by 2020 by analyticsweekpick

Wanna write? Click Here


 Marketing Analytics Software Market Effect and Growth Factors Research and Projection – Coherent News (press release) (blog) Under  Marketing Analytics

 Streaming Analytics Market Research Study including Growth Factors, Types and Application by regions from 2017 to … – managementjournal24.com Under  Streaming Analytics

 State Street: Latest investor sentiment towards Brexit – Asset Servicing Times Under  Risk Analytics

More NEWS ? Click Here


CPSC 540 Machine Learning


Machine learning (ML) is one of the fastest growing areas of science. It is largely responsible for the rise of giant data companies such as Google, and it has been central to the development of lucrative products, such … more


The Future of the Professions: How Technology Will Transform the Work of Human Experts


This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more


Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up with industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up to create awareness within the organization. An aware organization goes a long way in helping get quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.


Q:What are the drawbacks of linear model? Are you familiar with alternatives (Lasso, ridge regression)?
A: * Assumption of linearity of the errors
* Can’t be used for count outcomes, binary outcomes
* Can’t vary model flexibility: overfitting problems
* Alternatives: see question 4 about regularization
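For the regularized alternatives mentioned, ridge regression has a closed form that is easy to sketch (Lasso needs an iterative solver). A toy numpy comparison against ordinary least squares:

```python
import numpy as np

# Toy regression problem with known true weights [1, -2, 0.5].
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

alpha = 1.0  # regularization strength
# Ridge closed form: w = (X'X + alpha*I)^(-1) X'y
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)
# Ordinary least squares for comparison: w = (X'X)^(-1) X'y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

Ridge shrinks the coefficients toward zero, trading a little bias for lower variance, which is exactly the lever that addresses the overfitting problem listed above.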



Understanding #FutureOfData in #Health & #Medicine - @thedataguru / @InovaHealth #FutureOfData #Podcast


Subscribe to  Youtube


Data matures like wine, applications like fish. – James Governor


Discussing Forecasting with Brett McLaughlin (@akabret), @Akamai



iTunes  GooglePlay


94% of Hadoop users perform analytics on large volumes of data not possible before; 88% analyze data in greater detail; while 82% can now retain more of their data.

Sourced from: Analytics.CLUB #WEB Newsletter

The intersection of analytics, social media and cricket in the cognitive era of computing

Photo Credit: Getty Images.

Since 1975, every fourth year, world-class cricketing nations have come together for a month-long extravaganza. Since February 14th, 2015, 14 teams have been battling it out in a six-week-long Cricket World Cup tournament at 14 venues across Australia and New Zealand. During these six weeks, millions of cricket fans alter their schedules to relish the game of cricket worldwide in unison. It is a cricket carnival of sorts.

The Cricket World Cup is at its peak. All eyes are glued to the sport, and not much is going unobserved – on the field or off it. Whether it is Shikhar Dhawan scoring a century or Virat Kohli’s anger, cricket enthusiasts are having a ball. There is, however, another segment that is thriving as cricket fever reaches a high: the dedicated cricket follower who follows each ball and takes stock of each miss; for him, each run is a statistic and each delivery an opportunity. I remember how, years ago, we all used to be hooked to a radio while on the move, keeping track of ball-by-ball updates; then came television; and now, with social media, stakeholder engagement has become phenomenally addictive.

Such a big fan base is bound to open opportunities for sports tourism, brand endorsements and global partnerships. CWC is an event that many business enterprises tap in order to make their presence felt. With adequate assistance of technology and in-depth insights the possibilities of stakeholder engagement and scaling up ventures is huge like never before.

The sports industry is perhaps one of the biggest enterprises to have willingly adopted technology to change the game for players, viewers, organizers, and broadcasters. Pathbreaking advances in technology have ensured that the experience of followers of the game has become finer and more nuanced. It is no longer just about what is happening on the field, but about what happened on similar occasions in the past, and what could possibly happen given the past records of the team and the players. This ever-growing back and forth between information and analysis makes for a cricket lover’s paradise.

Cognitive analysis of such large data is no longer just a dream. Machine learning algorithms running on cloud computing clusters are getting smarter day by day, and that is about to change the whole landscape of human experience and involvement. To understand what the CWC means to different people from various backgrounds, it is important to understand their psychology and perception of the game. A deeper look can bring us closer to understanding how technology, analytics, and big data are in fact changing the dynamics of cricket.

A common man’s perspective

Cricket world cup to a common man is about sneaking a win from close encounters, high scoring run fests, electric crowd, blind faith in their teams and something to chew on, spicing their opinion after the game is over. With small boundaries, better willows, fit athletes and pressure situations to overcome to be victorious, every contest is an absolute delight to watch and closely follow. Cricket fans are so deeply attached to the game and the players that every bit of juicy information about the game enthralls them.

A geek’s perspective

In the last forty years, the use of technology has changed the game of cricket on the field. Years ago, the snickometer was considered revolutionary; then came the pathbreaking Hawk-Eye, followed by Pitchvision, DRS (the Decision Review System), and now flashing cricket bails. For cricketers, this has meant a better reviewing process. Now they understand their game better, correct their mistakes, prepare against their weaknesses, and plan specific strategies against individual players of the opposing team. For cricket followers and business houses, it has meant better engagement with the audience, a deeper personalised experience, and a detailed understanding of what works and what does not.

This increase in the viewer-engagement quotient has been boosted with Matchups covering past records on player performance, match stats etc. Wisden captures data from each match and provides the basis of comparatives around player potential, strike rate, runs in the middle over, important players in the death overs etc.

While Wisden India provides all the data points, IBM’s Analytics engine processes the information into meaningful assets of historical data, making it possible to predict future outcomes. For CWC 2015, IBM has partnered with Wisden to provide viewers with live match analysis and player performance data, which are frequently used by commentators and coaches to keep viewers glued to the match proceedings.

Just as it makes insightful observations from a vast trove of cricket data, IBM’s Analytics Engine equips organizations to take advantage of all their data to make better business decisions. Using analytics – instead of instinct – can help organizations provide personalized experiences to their customers, spot new opportunities and risks, and create new business models.

Similarly, with social media outreach, the overall engagement of viewers in the game has become crucial in boosting confidence of a team or succumbing to the pressure of the masses.

Aggregating shared opinion on social sites is a key to highlighting the expectations, generating perceived predictions about the teams, potential wins, most popular players etc.

To give an idea of the numbers and technology involved: as part of its Twitterati analysis, IBM processed about 3 million tweets on average on a two-match day, analysed at 10-minute intervals.

IBM Cloudant was used to store tweets crawled from Twitter bearing match- or tournament-specific hashtags. As needed, IBM fetched the tweets from Cloudant and generated the events specific to every match. IBM Bluemix automates the process of getting tweets from Twitter and generating the events corresponding to every match, given the schedule of the Cricket World Cup. The application is hosted in Bluemix. Apart from these technologies, IBM developed the core engine that identifies events from the Twitter feed.

The Social Sentiment Index analyzed around 3.5 million tweets, by tracking about 700 match-specific events daily in Twitter. IBM Data Curation and Integration capabilities were used on BigInsights and Social Data Accelerator (SDA) to extract social insights from streaming twitter feed in real time.

Moreover, IBM Text Analytics and Natural Language Processing performs fine grained temporal analytics around events that have short lifespan but are important — events like boundaries, sixes and wickets.

IBM Social Media Analytics also examines the quantum of discussion around teams, players, and events. It examines sentiment across different entities, identifies topics that are trending, and understands ways in which advertisers can use the discussion to appropriately position their products and services.
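IBM's actual pipeline is proprietary, but the core idea of aggregating sentiment from tweets can be illustrated with a toy lexicon tally (the team names, word lists, and tweets below are made up):

```python
from collections import Counter

# Toy sentiment lexicon; real systems use far richer NLP models.
POSITIVE = {'great', 'win', 'brilliant'}
NEGATIVE = {'poor', 'loss', 'dropped'}

tweets = [
    ('india', 'great win by india'),
    ('india', 'brilliant century'),
    ('australia', 'poor bowling and a dropped catch'),
]

# Net sentiment per team: positive word hits minus negative word hits.
scores = Counter()
for team, text in tweets:
    words = set(text.split())
    scores[team] += len(words & POSITIVE) - len(words & NEGATIVE)
```

Aggregated over millions of tweets and short time windows, tallies like this are what turn raw social chatter into team-level sentiment trends.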

IBM Content Analytics examines the large social content more deeply and tries to mimic human cognition and learning behavior to answer complex questions like the impact of certain player or attributes determining the outcome of the game.

An enterprise perspective

What is most interesting to businesses, however, is that observing these campaigns helps in understanding consumer sentiment to drive sales initiatives. With the right business insights in the nick of time, in line with social trends, several brands have come up with lucrative offers one can’t refuse. In earlier days, this kind of marketing required pumping in a lot of money and waiting several weeks before one could analyse and prove the commercial success of a business idea. With tools like IBM Analytics at hand, one can not only grab the data needed and assess it so it makes business sense, but also anticipate the market response.

Imagine how, in the right hands, especially in the data sensitive industry, the facility of analyzing large scale structured and unstructured data combined with cloud computing and cognitive machine learning can lead to capable and interesting solutions with weighted recommendations at your disposal.

The potential of the idea already sounds like a game-changer to me. When I look around, every second person is tweeting and posting about everything around them. There are volumes of data waiting to be analyzed. With the power to process the virality of events in real time across devices, sensors, and applications, I can vouch that, with data mining and business intelligence capabilities, cloud computing can significantly improve and empower businesses to run focused campaigns.

With engines like Social Data Accelerator, Cloudant, and Social Data Curation at your service, social data analysis can be democratized to a fairly accurate degree, opening new channels of business that have not been identified so far. The CWC 2015 insight is just the beginning. Howzzat?

Originally posted via “The intersection of analytics, social media and cricket in the cognitive era of computing”

Source: The intersection of analytics, social media and cricket in the cognitive era of computing by analyticsweekpick

Oct 04, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)





Data interpretation  Source


More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> The End of Transformation: Expediting Data Preparation and Analytics with Edge Computing by jelaniharper

>> Movie Recommendations? How Does Netflix Do It? A 9 Step Coding & Intuitive Guide Into Collaborative Filtering by nbhaskar

>> Your Firm’s Culture Need to Catch Up with its Business Analytics? by analyticsweekpick

Wanna write? Click Here


Artificial Intelligence


This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more


Big Data: A Revolution That Will Transform How We Live, Work, and Think


“Illuminating and very timely . . . a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy, and even on the way we think… more


Save yourself from zombie apocalypse from unscalable models
One living and breathing zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds ground with increasing data. The error bars attached to almost every model should be duly calibrated. As business models rake in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading to a Halloween we never want to see.


Q:Give examples of bad and good visualizations?
A: Bad visualization:
– Pie charts: difficult to make comparisons between items when area is used, especially when there are lots of items
– Color choice for classes: abundant use of red, orange and blue. Readers can think that the colors could mean good (blue) versus bad (orange and red) whereas these are just associated with a specific segment
– 3D charts: can distort perception and therefore skew data
– Using a solid line in a line chart: dashed and dotted lines can be distracting

Good visualization:
– Heat map with a single color: some colors stand out more than others, giving more weight to that data. A single color with varying shades show the intensity better
– Adding a trend line (regression line) to a scatter plot helps the reader spot trends



Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast


Subscribe to  Youtube


The most valuable commodity I know of is information. – Gordon Gekko


#BigData @AnalyticsWeek #FutureOfData with Jon Gibs(@jonathangibs) @L2_Digital



iTunes  GooglePlay


The Hadoop (open source software for distributed computing) market is forecast to grow at a compound annual growth rate of 58%, surpassing $1 billion by 2020.

Sourced from: Analytics.CLUB #WEB Newsletter

Seven ways predictive analytics can improve healthcare


Everyone is a patient at some time or another, and we all want good medical care. We assume that doctors are all medical experts and that there is good research behind all their decisions.

Physicians are smart, well trained and do their best to stay up to date with the latest research. But they can’t possibly commit to memory all the knowledge they need for every situation, and they probably don’t have it all at their fingertips. Even if they did have access to the massive amounts of data needed to compare treatment outcomes for all the diseases they encounter, they would still need time and expertise to analyze that information and integrate it with the patient’s own medical profile. But this kind of in-depth research and statistical analysis is beyond the scope of a physician’s work.

That’s why more and more physicians – as well as insurance companies – are using predictive analytics.

Predictive analytics (PA) uses technology and statistical methods to search through massive amounts of information, analyzing it to predict outcomes for individual patients. That information can include data from past treatment outcomes as well as the latest medical research published in peer-reviewed journals and databases.

Not only can PA help with predictions, but it can also reveal surprising associations in data that our human brains would never suspect.

In medicine, predictions can range from responses to medications to hospital readmission rates. Examples are predicting infections from methods of suturing, determining the likelihood of disease, helping a physician with a diagnosis, and even predicting future wellness.

The statistical methods are called learning models because they can grow in precision with additional cases. There are two major ways in which PA differs from traditional statistics (and from evidence-based medicine):

  • First, predictions are made for individuals and not for groups
  • Second PA does not rely upon a normal (bell-shaped) curve.

Prediction modelling uses techniques such as artificial intelligence to create a prediction profile (algorithm) from past individuals. The model is then “deployed” so that a new individual can get a prediction instantly for whatever the need is, whether a bank loan or an accurate diagnosis.
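As an illustration of that train-then-deploy loop, here is a deliberately tiny sketch: a 1-nearest-neighbour "prediction profile" built from hypothetical past cases and queried for a new individual. The features and outcomes are invented for illustration; real clinical models use far richer data and validation.

```python
# Minimal sketch of "train on past individuals, deploy for a new one":
# a 1-nearest-neighbour prediction profile built from past cases.
def predict(past_cases, new_case):
    """past_cases: list of (features, outcome); features are numeric tuples."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # The "model" here is simply the stored history; deployment means
    # looking up the most similar past individual.
    nearest = min(past_cases, key=lambda case: dist(case[0], new_case))
    return nearest[1]

# Hypothetical past patients: (age, biomarker level) -> readmitted within 30 days?
history = [((70, 8.1), True), ((45, 3.2), False),
           ((62, 7.5), True), ((33, 2.0), False)]
print(predict(history, (68, 7.9)))  # True: closest past case was readmitted
```

The model "grows in precision with additional cases" in the sense the article describes: appending more past individuals to `history` refines what the lookup can match against.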

In this post, I discuss the top seven benefits of PA to medicine – or at least how they will be beneficial once PA techniques are known and widely used. In the United States, many physicians are just beginning to hear about predictive analytics and are realizing that they have to adapt as government regulations and demands change. For example, under the Affordable Care Act, one of the first Meaningful Use mandates penalizes hospitals when patients are readmitted within 30 days of discharge. Hospitals will need predictive models to accurately assess when a patient can safely be released.

1. Predictive analytics increase the accuracy of diagnoses.

Physicians can use predictive algorithms to help them make more accurate diagnoses. For example, when patients come to the ER with chest pain, it is often difficult to know whether the patient should be hospitalized. If doctors could enter answers to questions about the patient and his condition into a system with a tested and accurate predictive algorithm that assessed the likelihood that the patient could be sent home safely, then their own clinical judgments would be aided. The prediction would not replace their judgments but rather would assist.

In a visit to one’s primary care physician, the following might occur: The doctor has been following the patient for many years. The patient’s genome includes a gene marker for early-onset Alzheimer’s disease, identified by researchers using predictive analytics. This gene is rare and runs in one side of the patient’s family. Several years ago, when the marker was first discovered, the patient agreed to have his blood tested to see whether he had the gene. He did. There was no gene treatment available, but evidence-based research pointed the PCP to interventions that may help many early Alzheimer’s patients.

Ever since, the physician has had the patient engaging in exercise, good nutrition, and brain games apps that the patient downloaded on his smart phone and which automatically upload to the patient’s portal. Memory tests are given on a regular basis and are entered into the electronic medical record (EMR), which also links to the patient portal. The patient himself adds data weekly onto his patient portal to keep track of time and kinds of exercises, what he is eating, how he has slept, and any other variable that his doctor wishes to keep track of.

Because the PCP has a number of Alzheimer’s patients, the PCP has initiated an ongoing predictive study, with the hope of developing a predictive model for an individual’s likelihood of memory maintenance, and uses, with permission, the data entered through the patients’ portals. At this visit, the physician shares the good news that a gene therapy has been discovered for the patient’s specific gene and recommends that the patient receive it.

2. Predictive analytics will help preventive medicine and public health.

With early intervention, many diseases can be prevented or ameliorated. Predictive analytics, particularly within the realm of genomics, will allow primary care physicians to identify at-risk patients within their practice. With that knowledge, patients can make lifestyle changes to avoid risks. (An interview with Dr. Tim Armstrong on this WHO podcast explores the question: Do lifestyle changes improve health?)

As lifestyles change, population disease patterns may dramatically change, with resulting savings in medical costs. As Dr. Daniel Kraft, Medicine and Neuroscience Chair at Stanford University, points out in his video Medicine 2064:

During the history of medicine, we have not been involved in healthcare; no, we’ve been consumed by sick care. We wait until someone is sick and then try to treat that person. Instead, we need to learn how to avoid illness and learn what will make us healthy. Genomics will play a huge part in the shift toward well-living.

As Dr. Kraft mentions, our future medications might be designed just for us because predictive analytics methods will be able to sort out what works for people with “similar subtypes and molecular pathways.”

3. Predictive analytics provides physicians with answers they are seeking for individual patients.

Evidence-based medicine (EBM) is a step in the right direction and provides more help than simple hunches for physicians. However, what works best for the middle of a normal distribution of people may not work best for an individual patient seeking treatment. PA can help doctors decide the exact treatments for those individuals. It is wasteful and potentially dangerous to give treatments that are not needed or that won’t work specifically for an individual. (This topic is covered in a paper by the Personalized Medicine Coalition.) Better diagnoses and more targeted treatments will naturally lead to increases in good outcomes and fewer resources used, including the doctor’s time.

4. Predictive analytics can provide employers and hospitals with predictions concerning insurance product costs.

Employers providing healthcare benefits for employees can input characteristics of their workforce into a predictive analytic algorithm to obtain predictions of future medical costs. Predictions can be based upon the company’s own data or the company may work with insurance providers who also have their own databases in order to generate the prediction algorithms. Companies and hospitals, working with insurance providers, can synchronize databases and actuarial tables to build models and subsequent health plans. Employers might also use predictive analytics to determine which providers may give them the most effective products for their particular needs. Built into the models would be the specific business characteristics. For example, if it is discovered that the average employee visits a primary care physician six times a year, those metrics can be included in the model.

Hospitals will also work with insurance providers as they seek to increase optimum outcomes and quality assurance for accreditation. In tailoring treatments that produce better outcomes, accreditation standards are both documented and increasingly met. (Likewise, predictive analytics can support the Accountable Care Organization (ACO) model, in that the primary goal of an ACO is to reduce costs by treating specific patient populations successfully.) Supply chain management (SCM) for hospitals and insurance providers will change as needs for resources change; in fact, when using PA, those organizations may see otherwise hidden opportunities for savings and increased efficiency. PA has a way of bringing our attention to that which may not have been seen before.

5. Predictive analytics allow researchers to develop prediction models that do not require thousands of cases and that can become more accurate over time.

In huge population studies, even very small differences can be “statistically significant.” Researchers understand that randomly assigned case control studies are superior to observational studies, but often it is simply not feasible to carry out such a design. From huge observational studies, the small but statistically significant differences are often not clinically significant. The media, ignorant of research nuances, may then focus on those small but statistically significant findings, convincing and sometimes frightening the public. Researchers also are to blame as sometimes they themselves do not understand the difference between statistical significance and clinical significance.
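The gap between statistical and clinical significance is easy to show with a quick calculation: the same tiny difference that is nowhere near significant in a small sample becomes "highly significant" once the sample is huge. The numbers below are illustrative only.

```python
# With a huge sample, a clinically trivial difference still produces a
# large z statistic (and hence a tiny p-value). Numbers are invented.
import math

def z_statistic(mean_a, mean_b, sd, n_per_group):
    # Two-sample z statistic with equal group sizes and a shared sd.
    se = sd * math.sqrt(2.0 / n_per_group)
    return (mean_a - mean_b) / se

# A 0.1-point difference on a scale with sd 10: negligible clinically...
small_n = z_statistic(50.1, 50.0, 10.0, 100)        # ~0.07, not significant
large_n = z_statistic(50.1, 50.0, 10.0, 1_000_000)  # ~7.07, "highly significant"
print(round(small_n, 2), round(large_n, 2))
```

The effect size is identical in both calls; only the sample size changed, which is exactly why large observational studies can headline differences no clinician would act on.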

For example, in a TEDxColumbiaEngineering talk, Dr. David H. Newman spoke about the recent recommendation by the media that small to moderate alcohol consumption by women can result in higher levels of certain cancers. Many news programs and newspapers loudly and erroneously warned women not to drink even one alcoholic drink per day.

In contrast, with predictive analytics, initial models can be generated with smaller numbers of cases, and their accuracy can improve over time as cases are added. The models are alive, learning, and adapting with added information and with changes that occur in the population over time.

In order to make use of data across practices, electronic data record systems will need to be compatible with one another; interoperability, or this very coordination, is important and has been mandated by the US government. Governance around the systems will require transparency and accountability. One program suite, STATISTICA, is familiar with governance as it has worked with banks, pharmaceutical industries and government agencies. Using such a program will be crucial in order to offer “transparent” models, meaning they work smoothly with other programs, such as Microsoft and Visual Basic. In addition, STATISTICA can provide predictive models using double-blind elements and random assignment, satisfying the continued need for controlled studies.

On the other hand, some programs are proprietary, and users often have to pay the statistical company to use their own data. In addition, they may find that the system is not compatible with other systems if they need to make changes. When dealing with human life, the risks of making mistakes are increased, and the models used must lend themselves to making the systems valid, sharable and reliable.

6. Pharmaceutical companies can use predictive analytics to best meet the needs of the public for medications.

There will be incentives for the pharmaceutical industry to develop medications for ever smaller groups. Old medications, dropped because they were not used by the masses, may be brought back because drug companies will find it economically feasible to do so. In other words, former mass-market medications are certain to be used less if they are found not to help many of those who were prescribed them. Less-used medications will become economically lucrative to revive and develop as research is able to predict who might benefit from them. For example, if 25,000 people need to be treated with a medication “shotgun-style” in order to save 10 people, then much waste has occurred. All medications have unwanted side effects. The shotgun-style delivery method can expose patients to those risks unnecessarily if the medication is not needed for them. Dr. Newman (above) discussed the probable overuse of statins as one example.

7. Patients have the potential benefit of better outcomes due to predictive analytics.

There will be many benefits in quality of life to patients as the use of predictive analytics increases. Potentially, individuals will receive treatments that work for them, be prescribed medications that work for them, and not be given unnecessary medications just because a medication works for the majority of people. The patient role will change as patients become more informed consumers who work with their physicians collaboratively to achieve better outcomes. Patients will become aware of possible personal health risks sooner due to alerts from their genome analysis, from predictive models relayed by their physicians, from the increasing use of apps and medical devices (i.e., wearable devices and monitoring systems), and due to better accuracy of what information is needed for accurate predictions. They will then have decisions to make about lifestyles and their future well-being.


Conclusion:  Changes are coming in medicine worldwide.

In developed nations, such as the United States, predictive analytics are the next big idea in medicine – the next evolution in statistics – and roles will change as a result.

  • Patients will have to become better informed and will have to assume more responsibility for their own care, if they are to make use of the information derived.
  • Physician roles will likely change to more of a consultant than decision maker, who will advise, warn and help individual patients. Physicians may find more joy in practice as positive outcomes increase and negative outcomes decrease. Perhaps time with individual patients will increase and physicians can once again have the time to form positive and lasting relationships with their patients. Time to think, to interact, to really help people; relationship formation is one of the reasons physicians say they went into medicine, and when these diminish, so does their satisfaction with their profession.
  • Hospitals, pharmaceutical companies and insurance providers will see changes as well. For example, there may be fewer unnecessary hospitalizations, resulting initially in less revenue. Over time, however, admissions will be more meaningful, the market will adjust, and accomplishment will rise. Initially, revenues may also be lost by pharmaceutical and device companies, but then more specialized and individualized offerings will increase profits. They may be forced to find newer and better solutions for individuals, ultimately providing them with fresh sources of revenue. There may be increased governmental funds offered for those who are innovative in approach.

All in all, changes are coming. The genie is out of the box and, in fact, is building boxes for the rest of us. Smart industries will anticipate and prepare.

These changes can literally revolutionize the way medicine is practiced, leading to better health and reduced disease.

I think about the Bayer TV commercial in which a woman gets a note that says, “Your heart attack will arrive in two days.” The voiceover proclaims, “Laura’s heart attack didn’t come with a warning.” Not so with predictive analytics. That very message could be sent to Laura from her doctor who uses predictive analytics. Better yet, in our bright future, Laura might get the note from her doctor that says, “Your heart attack will occur eight years from now, unless …” – giving Laura the chance to restructure her life and change the outcome.

Note: This article originally appeared in Elsevier. Click for link here.

Source: Seven ways predictive analytics can improve healthcare

Sep 27, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


Complex data  Source


More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Practical Tips for Running a PURE Evaluation by analyticsweek

>> Big Data Explained in Less Than 2 Minutes – To Absolutely Anyone by analyticsweekpick

>> 20 Best Practices in Customer Feedback Programs: Building a Customer-Centric Company by bobehayes

Wanna write? Click Here


 DENAVE launches DENSALES – an end-to-end sales force automation solution – Express Computer (press release) (blog) Under  Sales Analytics

 Hyperconverged Startup Cohesity Hits $200M Annual Sales Pace – Data Center Knowledge Under  Data Center

 Like Magic: Seamless Customer Care In An IoT-Connected World – Forbes Under  Customer Experience

More NEWS ? Click Here


CS229 – Machine Learning


This course provides a broad introduction to machine learning and statistical pattern recognition. … more


The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t


People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more


Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our growing dependence on data analytics to help with decision making. Data- and analytics-driven decision making is rapidly making its way into our core corporate DNA, yet we are not building practice grounds to test those models fast enough. Snug-looking models can hide nails that cause uncharted pain if they go unchecked. This is the right time to start thinking about setting up an Analytics Club [Data Analytics CoE] in your workplace to lab out best practices and provide a test environment for those models.


Q:Compare R and Python
A: R
– Focuses on better, user friendly data analysis, statistics and graphical models
– The closer you are to statistics, data science and research, the more you might prefer R
– Statistical models can be written with only a few lines in R
– The same piece of functionality can be written in several ways in R
– Mainly used for standalone computing or analysis on individual servers
– Large number of packages, for anything!

Python
– Used by programmers that want to delve into data science
– The closer you are working in an engineering environment, the more you might prefer Python
– Coding and debugging is easier mainly because of the nice syntax
– Any piece of functionality is always written the same way in Python
– When data analysis needs to be implemented with web apps
– Good tool to implement algorithms for production use
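To make the "few lines" point concrete on the Python side, here is a minimal standard-library snippet for descriptive statistics; in practice one would usually reach for pandas or statsmodels, which are not assumed here.

```python
# A few lines of standard-library Python cover basic descriptive
# statistics; the sample data is invented for illustration.
import statistics

data = [2.1, 2.5, 2.2, 2.8, 3.0, 2.4]
print(statistics.mean(data))    # 2.5
print(statistics.median(data))  # 2.45
print(statistics.stdev(data))   # sample standard deviation, ~0.346
```

The same one-call style applies whichever library is used, which is part of why "any piece of functionality is always written the same way in Python".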



#BigData @AnalyticsWeek #FutureOfData #Podcast with Joe DeCosmo, @Enova


Subscribe to  Youtube


Data really powers everything that we do. – Jeff Weiner


Discussing #InfoSec with @travturn, @hrbrmstr(@rapid7) @thebearconomist(@boozallen) @yaxa_io



iTunes  GooglePlay


More than 200bn HD movies – which would take a person 47m years to watch.

Sourced from: Analytics.CLUB #WEB Newsletter

Piwik PRO Introduces New Pricing Packages For Small and Medium Enterprises

We’re happy to announce that new pricing plans are available for Piwik PRO Marketing Suite. The changes will allow small and medium enterprises to take advantage of affordable privacy-compliant marketing tools (including Consent Manager) to meet the requirements of the GDPR.

In recent weeks, one of the most restrictive data privacy regulations the world has ever seen came into force – we’re obviously talking about GDPR.

Now every company that processes the personal data of EU citizens has to make sure that their internal processes, services and products are in line with the provisions of the new law (we wrote about it in numerous articles on our blog, be sure to check them out). Otherwise, they risk severe fines.

Among many other things, they have to collect active consents from visitors before they start processing their data.

The new rules apply not only to large corporations, but also to small and medium-sized enterprises.

When the market standard is not enough

The worry for many of them stems from the fact that the most popular freemium analytics software provider has decided to limit its support in this matter to the bare minimum.

Although Google introduced some product updates that aim to help their clients comply with the new regulation (like data retention control and a user deletion tool), they decided that their clients (data controllers) are the ones who have to develop their own mechanism for collecting, managing, and storing consents (via opt-in) from visitors (for both Google Analytics and Google Tag Manager).

Following all these rules can be a hassle for website owners, especially small to medium enterprises with often limited resources of time and workforce.

Important note! Recent events indicate that Google could be an unreliable partner in the face of the new EU regulations. On the first day after the regulation came into force, Google was sued for violating provisions of the GDPR by Max Schrems, an Austrian lawyer and privacy activist. You can read more about it in this article by The Verge.

How Piwik PRO can help you with the task

Luckily, there are many vendors who decided to create a tool to mediate between visitors and analytics software. Depending on the provider, it’s called Cookie Consent Manager, Cookie Widget, GDPR Consent Manager, etc.

These tools are a kind of gatekeeper that passes information about consents between individual visitors and your analytics system. That way, you make sure that the data you’re operating on has been collected in compliance with the new law.
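As a hypothetical sketch of that gatekeeper role (the class and method names below are invented for illustration, not Piwik PRO's actual API), a consent manager forwards tracking events only for visitors who have opted in:

```python
# Hypothetical consent "gatekeeper": tracking calls reach the analytics
# sink only for visitors who have recorded consent for that purpose.
class ConsentGate:
    def __init__(self):
        self._consents = {}  # visitor_id -> set of consented purposes

    def record_consent(self, visitor_id, purpose):
        self._consents.setdefault(visitor_id, set()).add(purpose)

    def track(self, visitor_id, purpose, event, sink):
        """Forward the event to the analytics sink only with consent."""
        if purpose in self._consents.get(visitor_id, set()):
            sink.append((visitor_id, event))
            return True
        return False

gate = ConsentGate()
events = []
gate.record_consent("v1", "analytics")
print(gate.track("v1", "analytics", "pageview", events))  # True: consented
print(gate.track("v2", "analytics", "pageview", events))  # False: no consent
```

The design choice this illustrates is the one the paragraph describes: the analytics system never sees data for a visitor unless the gate has a matching consent on record.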

One of the companies developing this type of product is Piwik PRO. You can read more about our solution here.

New pricing plan for small and medium enterprises

Due to the growing interest in our GDPR Consent Manager among small and medium enterprises, we decided to prepare a special offer tailored to their needs.

All companies wanting to collect data about the behavior of their website’s visitors in a privacy-compliant manner will be able to take advantage of the new “Business Plan” pricing package. The offer is intended for businesses with up to 2 million monthly actions on their websites.

It includes the following products:

The combined forces of these products will help you collect all the relevant information about visitors without violating the provisions of the new law (and also other data privacy laws including Chinese Internet Law and Russian law 526-FZ).

Additionally, your data will be stored in a highly secure environment:

  • ISO 27001 Certified private cloud data center
  • Fully-redundant infrastructure with 99% SLA
  • Microsoft Azure GDPR-compliant cloud infrastructure, hosted in the location of your choice: Germany, Netherlands, USA

What’s more, you’ll count on professional customer support including:

  • Email support
  • Live chat
  • User training
  • Professional Onboarding

Sound interesting? Then give it a (free) spin! All you have to do is register for a 30-day free trial. Our sales representatives will contact you within 24 hours!

You also can read more about the offer on our pricing page.


The post Piwik PRO Introduces New Pricing Packages For Small and Medium Enterprises appeared first on Piwik PRO.

Source: Piwik PRO Introduces New Pricing Packages For Small and Medium Enterprises by analyticsweek

Big Data’s Big Deal is HUMAN TOUCH, not Technology



I have been involved in marketing analytics work for some years now. It requires me to regularly talk to CXOs about their big data challenges and their plans to leverage this data to improve business decision making. I am constantly surprised by how much misconception exists among executives. All of them read about new technologies and platforms coming out of Silicon Valley that magically clean, organize, analyze and visualize data for them. As if they just have to implement some technology, press a button, and insights will start flowing.

This is a myth. There is no such (magical) technology-based analysis. Period.

Big Data’s big deal is not about technology platforms – it is rather about appropriate human interface with data technology.

I am myself guilty of selling big data solutions under the facade of technology and platforms. In many ways, I have contributed to this misconception about Big Data technology. So, I hope you believe me when I tell you – Big Data’s big deal is not about technology platforms – it is rather about appropriate human interface with data technology. Let’s not continue to speculate that technology platforms would save the enterprise from all data problems. I have seen the most advanced technology platforms that exist today. There is only one thing I know – these platforms would serve no purpose if we don’t have trained data professionals who know three basic things – business/domain knowledge, analytical experience, and ability to embrace new data technology.


We all know there is data everywhere. In the past couple of years, the world has generated more data than in all of prior history combined. Whether it is content posted on the web and social media, data transmitted from sensors in cars, appliances, buildings and airplanes, or streamed to your mobile, television or computers, we are surrounded and overwhelmed by data. Advancements in technology are the main driver of this data deluge, but similar advancements have taken place in the technology to collect and store data. This has made it economical for organizations to build infrastructure to store and manage large sets of data. But the real problem is deriving value out of this data and making it useful. This is where most of the stagnation is today. According to International Data Corporation (IDC), only one percent of the digital data generated is currently being analyzed.


Everyone agrees there is a big data revolution happening, but it is not about the volume and scale of data being generated. The revolution is about the ability to actually do something with that data. What used to take millions of dollars to first build the infrastructure and then hire really smart and expensive individuals to analyze data can now be done for thousands. It all comes down to using the right set of new-age technologies and implementing the right set of rules (read: algorithms) to deliver answers that weren’t possible earlier. This is where new-age data computation and analysis shines. We have come a long way in leveraging machine learning, graph analysis, predictive modeling algorithms and other techniques to uncover patterns and correlations that may not be readily apparent, but may turn out to be highly beneficial for business decision making.

There have been vast improvements in how and what type of datasets can be linked together to capture insights that aren’t possible with singular datasets. An example that everyone understands is how Amazon links together shopping and purchase history of customers to make product recommendations. Along with linking of datasets, improvements in visualization tools have made it much easier for humans to analyze data and see patterns. These technologies are now making inroads into all types of disparate use cases to solve complex problems ranging from pharmaceutical drug discovery to providing terrorism alerts.
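A toy version of the Amazon-style dataset linking might look like this: co-purchase counts are built from invented purchase histories, then joined against a customer's own items to score a recommendation. All names and data are made up.

```python
# Toy sketch of linking datasets: co-purchase counts joined against a
# customer's own history to recommend an item they don't already own.
from collections import Counter
from itertools import combinations

purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "carol": {"book", "desk"},
}

# Count how often pairs of items are bought together across customers.
co_counts = Counter()
for items in purchases.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1

def recommend(customer):
    owned = purchases[customer]
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a in owned and b not in owned:
            scores[b] += n
        if b in owned and a not in owned:
            scores[a] += n
    return scores.most_common(1)[0][0] if scores else None

print(recommend("alice"))  # 'desk': it co-occurs most with alice's items
```

Neither dataset is useful alone; the insight only appears once the co-purchase table is linked back to an individual's history, which is the point the paragraph makes.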


Insights can only be delivered by data scientists. And, there is a huge shortage of people who are comfortable with handling large amounts of data. Data collection is easy and cheap, and the general approach is to collect everything and worry later about relevancy and finding patterns. This can be a mistake especially with large datasets because there can be numerous possible correlations that increase the number of false positives that can surface. No matter how sophisticated technologies get, we need more data scientists outside of academia and working full-time on solving real world problems.

Machines cannot replace human beings when it comes to asking the right questions, deciding what to analyze, what data to use, identifying patterns and interpreting results for the business. Machines are good at fast computation and analysis, but we need data scientists to build hypotheses, design tests, and use the data to confirm those hypotheses. Traditional data scientists are not the solution, though. There are many generalists in the data science field who claim that if you throw data at them, they can deliver insights. But here’s the reality – someone who doesn’t have knowledge of your business can only have limited (if any) impact. In addition, data scientists need to make sure decision makers are not presented with too much data, because it quickly becomes useless. This is where technology and analytical experience come in handy – techniques that help aggregate, organize, filter and cluster data are extremely important to reduce datasets to digestible chunks.
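That aggregation step can be sketched in a few lines: raw rows are grouped and summarised so a decision maker sees a digestible per-group view rather than the full dataset. The data and field names are invented for illustration.

```python
# Sketch of "reduce datasets to digestible chunks": aggregate raw rows
# into per-group averages before showing them to a decision maker.
from collections import defaultdict

rows = [
    ("north", 120), ("south", 80), ("north", 100),
    ("south", 90), ("north", 130),
]

totals = defaultdict(int)
counts = defaultdict(int)
for region, revenue in rows:
    totals[region] += revenue
    counts[region] += 1

# Five raw rows collapse into two digestible numbers.
summary = {r: totals[r] / counts[r] for r in totals}
print(summary)
```

In practice this is one `groupby` in pandas or SQL; the point is that the filtering and clustering, not the raw collection, is what makes the data usable.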


Company executives need to understand that human touch plays a fundamental role in the big data journey. Insights delivered by technology without proper human interface can put their business at risk, alienate customers, and damage their brand. Given the current advancements, it comes down to putting the right technologies to use and getting the right people (who know your business) in the room to derive value out of the ‘Big Data’. Is that an easy thing to do? What has been your experience?

You can also contact me on Twitter @jasmeetio or on my personal website – Jasmeet Sawhney.

Source: Big Data’s Big Deal is HUMAN TOUCH, not Technology