Are APIs becoming the keys to customer experience?

In recent years, APIs have encouraged the emergence of new services by facilitating collaboration between the applications and databases of one or more companies. Beyond catalyzing innovation, APIs have also revolutionized the customer-company relationship: they allow companies to build an accurate and detailed picture of the consumer at a time when a quality customer experience counts as much as the price or capabilities of the product.

APIs: A Bridge Between the Digital and Physical World

Over the years, customer relationship channels have multiplied: consumers can now interact with brands through stores, voice, email, mobile applications, the web or chatbots. These multiple points of interaction have made the customer journey more complex, forcing companies to consider data from all of these channels to deliver the most seamless customer experience possible. To do this, they must synchronize data from one channel to another and cross-reference it with each customer’s history with the brand. This is where APIs come into play. These interfaces make it possible to process that data, refine customer knowledge and deliver a personalized experience.

Thanks to a 360° customer view, the digital experience can be extended in store. The API acts as a bridge between the digital and physical world.

APIs also allow organizations to work with data in a more operational way and, above all, in real time. However, many companies still treat their loyal customers as if they had never interacted before. It is therefore not uncommon for customers to have to identify themselves again after several requests or to retrace the history of previous interactions, which can seriously damage the customer relationship.

The challenge for companies is to deliver a seamless, consistent and personalized experience through real-time analysis. This will provide relevant information to account managers during interaction and allow them to have guidance on the next best action to take, in line with the client’s expectations.

Even better, with APIs, we can predict the customer’s buying behavior and suggest services or products that meet their needs. Indeed, with the data collected, and thanks to the use of artificial intelligence, the cross-tabulations and instant analysis make it possible to refine the selection to offer an increasingly relevant and fluid experience, increasing customer loyalty and thus the economic performance of companies.

The Importance of APIs with GDPR

Recently, there has been a trend to empower consumers to control their data, following new regulations that took effect in 2018: the European Payment Services Directive (PSD2) in January and GDPR in May.

What do they have in common? They both give individuals control over their personal data with the ability to request, delete or share it with other organizations. Thus, within the framework of PSD2, it is now possible to manage your bank account or issue payments through an application that is not necessarily that of your bank. Through this, APIs provide companies the opportunity to offer a dedicated portal to their customers to enable them to manage their data autonomously and offer new, innovative payment services.

For their part, companies will be able to better manage governance and the risks of fraudulent access to data. With an API, a company can proactively detect abnormal or even suspicious data access behaviors in near real time.

APIs are the gateways between companies and their business data, and they answer real needs that the market is only beginning to address in customer experience. However, many organizations have not yet understood the importance of implementing an API strategy, which is as essential to digital transformation as the cloud and the emergence of increasingly data-driven organizations. APIs are the missing link between data and customer experience, a key that companies need to start using.

The post Are APIs becoming the keys to customer experience? appeared first on Talend Real-Time Open Source Data Integration Software.

Originally Posted at: Are APIs becoming the keys to customer experience? by analyticsweekpick

Sep 19, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[ AnalyticsWeek BYTES]

>> The What and Where of Big Data: A Data Definition Framework by bobehayes

>> Do Attitudes Predict Behavior? by analyticsweek

>> Discussing #InfoSec with @travturn @hrbrmstr @thebeareconomist @yaxa_io – Playcast – Data Analytics Leadership Playbook Podcast by v1shal


Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more


Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython


Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored f… more


Finding success in data science? Find a mentor
Yes, most of us don’t feel the need for one, but most of us really could use one. Because most data science professionals work in isolation, getting an unbiased perspective is not easy. It is often just as hard to see how a career in data science is going to progress. A network of mentors addresses both issues, giving data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and to keep using it as they progress.


Q: Explain Tufte’s concept of ‘chart junk’.
A: All visual elements in charts and graphs that are not necessary to comprehend the information represented, or that distract the viewer from this information.

Examples of unnecessary elements include:
– Unnecessary text
– Heavy or dark grid lines
– Ornamented chart axes
– Pictures
– Background
– Unnecessary dimensions
– Elements depicted out of scale to one another
– 3-D simulations in line or bar charts



Venu Vasudevan @VenuV62 (@ProcterGamble) on creating a rockstar data science team #FutureOfData #Podcast


It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels


@DrewConway on fabric of an IOT Startup #FutureOfData #Podcast


Every person in the US tweeting three tweets per minute for 26,976 years.

Sourced from: Analytics.CLUB #WEB Newsletter

What Are the 3 Critical Keys to Healthcare Big Data Analytics?


Healthcare big data analytics isn’t just a “use it or lose it” proposition for the provider community – it’s quickly becoming a “use it if you want to hold on to anything at all” situation for organizations that must invest in population health management, clinical analytics, and risk stratification if they are to succeed in a value-based reimbursement world.

Maintaining market share during this shift away from the simpler cash transactions of a fee-for-service environment requires organizations to take a proactive dive into their financial and clinical data, yet developing the technological and organizational competencies to take advantage of big data tools is just as complex as it sounds.

Despite the vital importance of using big data to describe, predict, and prevent costly events in large patient populations, providers of all types and sizes are struggling to collect, categorize, store, retrieve, and analyze their data assets.

In a recent industry poll, Stoltenberg Consulting found that big data confuses half of providers, and six percent of participants were too intimidated by the process to even consider starting a healthcare big data analytics program.

Why is big data analytics such a difficult topic to tackle for healthcare organizations?  How do successful providers begin the process?  In this article, healthcare stakeholders weigh in on the three most important foundational steps for beginning a big data analytics and population health management program.

Define a direction and outline specific goals

Big data may be nearly infinite in scope, but having data for the sake of having data will not help achieve measurable organizational objectives.  Healthcare providers must start off their big data journey by defining clear, bite-sized problems that need solving.  Often, these problems are the “low hanging fruit” of healthcare operations: preventable readmissions, emergency department overuse, chronic disease management, patient engagement, and primary care screening rates.

“I would highly encourage people to start around specific high-value use cases,” suggested Marc Perlman, Global Vice President of Healthcare and Life Sciences at Oracle during a 2014 interview.  “I think the average hospital or health system has over 400 different interfaces and integration points, but it may take seven of them working together to give you some value out of your data.  Hospitals and health systems need to think about what they’re trying to fix, and then based upon that, what data sources they need.”

“We think it makes a lot of sense to think about what they are going to try to accomplish, what the methods are that they’re trying to drive, and how they are going to focus on sustainability,” he continued.  “I would say the most important thing is have a vision of your solution, what you’re trying to fix, and know where you’re going.  And then the technology will follow.”

Successful healthcare organizations often flag financial pain points as the gateway into big data analytics.  A March 2015 survey found that 59 percent of hospitals identified higher-than-necessary costs of care as one of their top motivating factors for implementing big data analytics.  Organizations are also seeking the clinical and financial insights necessary to engage in pay-for-performance reimbursement structures and combat the pressures of accountable care.

To do this, organizations implemented clinical analytics technologies that integrate EHR data and patient outcomes into a rich portrait of a patient’s journey through the care continuum.  More than 60 percent of organizations that took this approach have been able to improve their 30-day preventable readmissions rates and cut their mortality rates.

“Start out slow and have realistic goals,” says Dr. Robert M. Fishman, DO, FACP, who has helped to lead Valley Health Partners to the highest level of patient-centered medical home (PCMH) recognition thanks to investments in health IT and a fundamental attitude shift within the physician hospital organization (PHO).  “And when you achieve those goals, set out new goals.”

“We set up a very modest program where we would pick a couple of diagnoses in internal medicine and a couple of diagnoses in pediatrics and we would begin to set up policies and think in a more patient-centric way to reach certain goals,” he said.

“We concentrated on congestive heart failure and COPD, because those two conditions produce a lot of readmissions and emergency department visits.  There’s a lot of expense.  If we could really get a handle on those conditions, we could improve care, improve outcomes and decrease costs.  Three things that we’re all very interested in.”

By focusing on specific use cases for big data analytics, these organizations have shown that small initial investments can justify larger ones in the future.

Measure twice, cut once to plan a health IT infrastructure

After outlining a strategy, healthcare organizations can start making investments in advanced health IT products to support their efforts.  While the vast majority of providers already have a basic EHR infrastructure in place – and EHRs are increasingly coming packaged with population health management tools that meet many big data needs – sometimes more sophisticated products are required to get the ball rolling.

But crafting an interoperable health IT infrastructure with a high degree of flexibility and usability is a difficult ask.  Stakeholders are focusing so heavily on the desperate need for interoperability between data systems precisely because it is so rare to find an ecosystem of well-integrated vendor products that communicate freely with one another.

Many organizations continue to be challenged by their convoluted legacy systems, and most providers cannot afford to rip out and replace an entire infrastructure.  For those lucky few starting from scratch, 2015 is a good time to get into the big data analytics game.

As vendors focus on developing plug-and-play technologies and health information exchanges take on more mature roles in the local provider community, crafting an interoperable health IT system is easier than ever – if the right rules are on the table from the beginning.

“When it comes to doing big data analytics, you have to have two things right off the bat,” stated Richard C. Howe, PhD, FHIMSS, Executive Director of the North Texas Regional Extension Center, which also functions as the Dallas-Fort Worth region’s primary HIE.

“The first is strong governance. You have to get all the participants in the same room so they can determine what data they are going to contribute.  Get the absolute governance structure outlined at the beginning so you know what direction you’re trying to go.”

“The second thing is to start simple.  If you start off with thousands of different data elements, you’re just going to drown in the data before you see any results.  We started with claims data, and we found that there is a lot of really good information there that has been valuable to our hospital members even before we started adding more clinical information.  I would say good governance has to start simple.”

An organization’s data governance plan can make or break a big data analytics project: the old “garbage in, garbage out” rule will always apply.  Even the most robust tools require their users to understand their potential and their limitations, both of which rely on the quality of data moving through the system.

Understanding the scope of health IT tools, the data standards they are built upon, and the data integrity requirements of leveraging health IT for actionable insights will help organizations pick the best products for their needs in the long term, not just in the next six months to a year.

“When it comes to tools, providers should consider products that can give them the functionality that they need to answer the questions they’re asking today, but can also grow with them to continue answering those questions three years and five years down the road,” said Shane Pilcher, Vice President at Stoltenberg Consulting.

“They have to be thinking three, five and maybe even ten years down the road in terms of what they anticipate their questions are going to be, so that they know they’re on the right track with collecting the data that’s going to answer them.

“Even from the EHR perspective, this is where that long-term plan comes into play.  They know the type of questions that they’re looking for today,” he added.  “They need to anticipate the type of questions they’re going to be asking in the future.  But in most cases, you don’t know what you don’t know, so you’ve got to be as creative, as imaginative as you can today when you’re setting up your roadmap.  That’s going to give you the information that you need to start defining what needs to be collected today in the EHR and what you need to grow.”

A far-sighted approach to big data analytics may help organizations avoid or mitigate some of the interoperability problems that have plagued the industry for so long.  Investing in products that encourage health information exchange through standardized data elements will make analytics easier and ensure that organizations are set up for meeting ongoing mandates such as meaningful use.

Ensure support from executives and buy-in from clinical staff

No healthcare big data analytics plan can succeed without enthusiasm and support from all levels of the organization.  The board room must provide the funding and the direction; the clinical end-users must understand and embrace new technologies and new workflows.  Big data analytics isn’t just an IT project, but an organizational transformation from top to bottom.

Starting small, measuring results, and demonstrating improvement is often the key to securing executive buy-in, Pilcher says.  “Once you start picking up traction, you can start to identify that low-hanging fruit that can lead to cost savings and improve patient care.  These are cost savings that go directly to the bottom line of the organization and also show return on investment.  By being able to show ROI, the administration may be more inclined to invest more time, more labor, and more capital to further enhance the program and go after bigger and greater fruits.”

And executive leaders may not take too much convincing these days.  Despite the fact that more than a third of providers feel that a lack of leadership is a major barrier to big data analytics success, executives largely recognize the critical role that data competency will play in the immediate future.

Eighty-nine percent of hospital executives participating in a recent PwC poll are taking action to become more innovative and nimble through big data analytics adoption, and 95 percent are seeking to harness the potential of analytics technologies to extract actionable insights from their big data.

During the HIMSS15 Leadership Survey, three-quarters of organizations agreed with the notion that health IT is vital for achieving strategic goals and improvements in patient care.  Over half believe that health IT has helped them improve their population health management programs.   More than four in ten respondents think their executive leaders have a “fairly sophisticated understanding” of big data analytics technologies and the need to leverage them.

C-suite leaders are among the most likely to express an intention to purchase data analytics tools, with Chief Information Officers and Chief Medical Information Officers being the most eager to invest in new health IT products.  Even Chief Financial Officers are recognizing the fundamental need for analytics infrastructure to cut costs, raise revenues, and utilize resources more appropriately.

Getting clinicians to understand why their workflows are suddenly changing can be even more complicated than securing funds to purchase new tools, however.  It is important for providers to develop a multi-disciplinary team for big data analytics: one that includes representatives from all areas of the organization.

Clinical champions can help to explain to their peers why certain tasks are changing, why new metrics may be pointing out flaws in the patient care process, and why it is important to adapt to an evolving health IT landscape.  Above all, both executive leaders and staff-level super users must be able to point to clear and immediate benefits when introducing a new tool, or risk rebellion among dissatisfied clinicians.

“When we are talking about big data, I think there needs to be a clear purpose,” says Tina Esposito, Vice President of the Center for Health Information Services at Advocate Health Care.  “There has to be a core need or a well-defined problem that you are trying to solve.”

“Big data is a means to an end for solving problems.  So you got to be very clear that you are not pulling this data together just to put it together. There has got to be a focused effort from the right people to leverage that information so that ultimately you are supporting the business and your population health goals.”

“You need to be sure that what you are creating is usable in the most efficient and easiest way, and that it makes a positive impact on clinicians,” she said. “Is the clinician leveraging that intelligence that you are providing as part of their workflow in the EHR?  Are they seeing a benefit from it?  That’s going to be the most important piece of any big data project.”

Note: This article originally appeared in Health IT Analytics.

Source: What Are the 3 Critical Keys to Healthcare Big Data Analytics?

Sep 12, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[ AnalyticsWeek BYTES]

>> Which Customer Loyalty Metric is the Best? My Interview with Jeff Olsen of Allegiance Radio by bobehayes

>> For Musicians and Songwriters, Streaming Creates Big Data Challenge by analyticsweekpick

>> 3 Emerging Big Data Careers in an IoT-Focused World by kmartin


Storytelling with Data: A Data Visualization Guide for Business Professionals


Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more


Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.


Q: When you sample, what bias are you inflicting?
A: Selection bias:
– An online survey about computer use is likely to attract people more interested in technology than is typical

Undercoverage bias:
– Too few observations are sampled from a segment of the population

Survivorship bias:
– Observations at the end of the study are a non-random set of those present at the beginning of the investigation
– In finance and economics: the tendency for failed companies to be excluded from performance studies because they no longer exist



@RCKashyap @Cylance on State of Security & Technologist Mindset #FutureOfData #Podcast


The temptation to form premature theories upon insufficient data is the bane of our profession. – Sherlock Holmes


@ReshanRichards on creating a learning startup for preparing for #FutureOfWork #JobsOfFuture #Podcast


29 percent report that their marketing departments have ‘too little or no customer/consumer data.’ When data is collected by marketers, it is often not appropriate to real-time decision making.

Sourced from: Analytics.CLUB #WEB Newsletter

Sep 05, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[ AnalyticsWeek BYTES]

>> How to Win Business using Marketing Data [infographics] by v1shal

>> Serverless: A Game Changer for Data Integration by analyticsweekpick

>> Self-Reported Intentions vs Actual Behaviors: Comparing Two Employee Turnover Metrics by bobehayes


R, ggplot, and Simple Linear Regression


Begin to use R and ggplot while learning the basics of linear regression… more


The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t


People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more



Q: Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
A: Naive: the features are assumed to be independent/uncorrelated
– This assumption does not hold in many cases
– Improvement: decorrelate the features (transform them so that the covariance matrix becomes the identity matrix)
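The decorrelation idea in the answer above can be sketched in Octave. This is only an illustration under stated assumptions: the toy matrix X and every variable name are made up for this example, and eigenvalue-based whitening is one common way to turn a covariance matrix into the identity, not necessarily what the answer's author had in mind.

```octave
% Toy data (hypothetical): 5 observations (rows) of 2 correlated features (columns).
X = [1 2.0; 2 4.1; 3 5.9; 4 8.2; 5 9.8];

n  = size(X, 1);
Xc = X - repmat(mean(X), n, 1);     % center each feature

C = (Xc' * Xc) / (n - 1);           % sample covariance matrix
C = (C + C') / 2;                   % enforce exact symmetry before eig

[V, D] = eig(C);                    % C = V * D * V'
W  = V * diag(1 ./ sqrt(diag(D)));  % whitening transform
Xw = Xc * W;                        % decorrelated features

Cw = (Xw' * Xw) / (n - 1);          % covariance after whitening
disp(Cw)                            % approximately the 2 x 2 identity matrix
```

The transform only works when the covariance matrix has no zero eigenvalues, so perfectly redundant features would have to be dropped first.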



Unraveling the Mystery of #BigData


Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world. – Atul Butte, Stanford


@JohnNives on ways to demystify AI for enterprise #FutureOfData #Podcast


And one of my favourite facts: At the moment less than 0.5% of all data is ever analysed and used, just imagine the potential here.

Sourced from: Analytics.CLUB #WEB Newsletter

April 17, 2017 Health and Biotech analytics news roundup

Introducing Verily Study Watch: The device has multiple sensors, a long battery life, and a large capacity for storage. It is not available to the public but is currently being used in clinical studies.

Sansoro Health raises $5.2 million led by Bain Capital Ventures: The company’s product allows a link between customers and electronic health records.

Hospital cuts costly falls by 39% due to predictive analytics: The system, at El Camino Hospital in California, flags high-risk patients at admission, then continually updates the risk throughout their stay.

Winning with analytics in the pharmaceutical industry: The industry can use analytics to improve efficiency and reduce costs across the business.

Why HIT tools can help organizations navigate the challenges of growth: Most health systems have implemented electronic health records. Along with improving care, these records can help distribute patients throughout a large system and better administer transfers.

Originally Posted at: April 17, 2017 Health and Biotech analytics news roundup

Movie Recommendations? How Does Netflix Do It? A 9 Step Coding & Intuitive Guide Into Collaborative Filtering

‘Movies recommended for you’ – Netflix
‘Videos recommended for you’ – YouTube
‘Restaurants recommended for you’ – Some smart restaurant finder app

Notice a trend? Your favorite apps ‘know’ you (or at least they think they do). They gradually learn your preferences over time (or in a matter of hours) and suggest new products which they think you’ll love.

How is this done? I can’t speak for how Netflix actually makes movie recommendations, but the fundamentals are largely intuitive, actually.

If you keep ‘five-starring’ Stoner Comedy movies like the whole ‘Harold and Kumar’ series on Netflix, it makes sense for Netflix to assume that you may also enjoy ‘Ted’, or any other Stoner Comedy film on Netflix.

To make recommendations in a real world application, let’s take our intuition and apply it to a machine learning algorithm called Collaborative Filtering.

The following guide will be done in the ‘Octave’ programming language, so we can properly understand what is going on under the hood of collaborative filtering. Let’s get started.

Step 1 – Initialize The Movie Ratings

Simple but scalable scenario

  • 10 movies
  • 5 users
  • 3 features (we’ll discuss this in Step 3)

Here is an example diagram of movie ratings. Our rating system is from 1-10:


Let’s initialize a 10 X 5 matrix called ‘ratings’; this matrix holds all the ratings given by all users, for all movies. Note: Not all users may have rated all movies, and this is okay.

Note 2: I simply made up some data for ‘ratings’. The point of this step is to simply start off with a dataset that we can work with.

This matrix below contains the same ratings data you saw in the picture above. Here is how we declare it in Octave:

ratings = [
8 4 0 0 4;
0 0 8 10 4;
8 10 0 0 6;
10 10 8 10 10;
0 0 0 0 0;
2 0 4 0 6;
8 6 4 0 0;
0 0 6 4 0;
0 6 0 4 10;
0 4 6 8 8];

Learner’s check:

  • Each column represents all the movies rated by a single user
  • Each row represents all the ratings (from different users) received by a single movie


Recall that our rating system is from 1-10. Notice how there are 0’s to denote that no rating has been given.

Step 2 – Determine Whether a User Rated a Movie

To make our life easier, let’s also declare a binary matrix (0’s and 1’s) to denote whether a user rated a movie.

1 = the user rated the movie.
0 = the user did not rate the movie.

Let’s call this matrix ‘did_rate’. Note it has the same dimensions as ‘ratings’, 10 X 5:

did_rate = ratings ~= 0;

The command above should give you the following binary matrix:

did_rate = 
1 1 0 0 1
0 0 1 1 1
1 1 0 0 1
1 1 1 1 1
0 0 0 0 0
1 0 1 0 1
1 1 1 0 0
0 0 1 1 0
0 1 0 1 1
0 1 1 1 1

Learner’s check:

  • did_rate(2, 3) = 1: This means the 3rd user did rate the 2nd movie
  • did_rate(6, 4) = 0: This means the 4th user did not rate the 6th movie
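As a quick sanity check on ‘did_rate’, its column and row sums tell us how many ratings each user gave and how many each movie received. Here is a minimal sketch that reuses the Step 1 data; the names ratings_per_user and ratings_per_movie are illustrative and not part of the original guide:

```octave
ratings = [
8 4 0 0 4;
0 0 8 10 4;
8 10 0 0 6;
10 10 8 10 10;
0 0 0 0 0;
2 0 4 0 6;
8 6 4 0 0;
0 0 6 4 0;
0 6 0 4 10;
0 4 6 8 8];
did_rate = ratings ~= 0;

% Summing down each column counts the movies each user rated (1 X 5)
ratings_per_user = sum(did_rate, 1);
% Summing across each row counts the users who rated each movie (10 X 1)
ratings_per_movie = sum(did_rate, 2);

disp(ratings_per_user);
disp(ratings_per_movie');
```

Notice that the 5th movie has no ratings at all in this starting data, which is worth keeping in mind when we average ratings per movie later on.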

Step 3 – User Preferences and Movie Features/Characteristics

This is where it gets interesting. In order for us to build a robust recommendation engine, we need to know user preferences and movie features (characteristics). After all, a good recommendation is based off of knowing this key user and movie information.

For example, a user preference could be how much the user likes comedy movies, on a scale of 1-5. A movie characteristic could be to what degree is the movie considered a comedy, on a scale of 0-1.

Example 1: User preferences -> Sample preferences for a single user Chelsea


Example 2: Movie features -> Sample features for a single movie Bad Boys


Note: The user preferences are the exact same as the movie features; in other words, we can map each user preference to a movie feature. This makes sense; if a user has a huge preference for comedy, we’d like to recommend a movie with a high degree of comedy. If we add a new preference for the user, say ‘romantic-comedy’, we should also add it as a new feature for a movie, so that our recommendation algorithm can fully use this feature/preference when making a prediction.

Note 2: We can use these numbers that I purposely came up with to ‘predict’ ratings for movies. For example, let’s predict what Chelsea would rate Bad Boys, below:

Chelsea's (C) rating (R) of Bad Boys (BB): R_{C,BB} = comedy product + action product + romance product, where each product is the user's preference multiplied by the matching movie feature.
R_{C,BB} = (4.5 * 0.8) + (4.9 * 0.5) + (3.6 * 0.4)
R_{C,BB} = 7.49
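The same arithmetic can be written as a single vector product in Octave. This is only a sketch of the hand calculation above; the variable names chelsea_prefs and bad_boys_features are made up for illustration, and the order [comedy, action, romance] is assumed from the sum:

```octave
% Values from the Chelsea / Bad Boys example: [comedy, action, romance]
chelsea_prefs = [4.5 4.9 3.6];       % user preferences, scale 1-5
bad_boys_features = [0.8 0.5 0.4];   % movie features, scale 0-1

% Row vector times column vector = sum of the element-wise products
predicted_rating = chelsea_prefs * bad_boys_features';
fprintf('Predicted rating: %.2f\n', predicted_rating);
```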

5 big problems: This seems great, but:

  1. Who has time to sit down and come up with a list of features for users and movies?
  2. It would be very time consuming to come up with a value for each feature, for each and every user and movie.
  3. Why did I pick 1-5 as the range for user preferences and 0-1 as the range for movie features? It seems a bit forced.
  4. How does the product (multiplication) of user_prefs and movie_features magically give us a predicted rating?
  5. Why did I pick ‘comedy’, ‘romance’ and ‘action’ as the features? This seems manual and forced. There must be a better way to generate features

The solution: 

Before we dive deep into the collaborative filtering solution to answer our 5 big problems, let’s quickly introduce some key matrices that we’ll be needing.

The user features (preferences) can be represented by a matrix ‘user_prefs’. In our example, we have 5 users and 3 features. So, ‘user_prefs’ is a 5 X 3 matrix.

Here is an example diagram to help visualize the data ‘user_prefs’ contains:


The movie features can also be represented by a matrix ‘movie_features’. In our example, we have 10 movies and 3 features. So, ‘movie_features’ is a 10 X 3 matrix.

Here is an example diagram to help visualize the data ‘movie_features’ contains:


Step 4: Let’s Rate Some Movies

I have a list of 10 movies here, in a text file:

1 Harold and Kumar Escape From Guantanamo Bay (2008)
2 Ted (2012)
3 Straight Outta Compton (2015)
4 A Very Harold and Kumar Christmas (2011)
5 Notorious (2009)
6 Get Rich Or Die Tryin' (2005)
7 Frozen (2013)
8 Tangled (2010)
9 Cinderella (2015)
10 Toy Story 3 (2010)

Now, let’s rate some movies. Our ratings can be represented by a 10 X 1 column vector my_ratings. Let’s initialize it to 0’s and make some ratings:

my_ratings = zeros(10, 1);
my_ratings(1) = 7;
my_ratings(5) = 8;
my_ratings(8)= 3;

Learner’s check:

  • I gave Harold and Kumar Escape From Guantanamo Bay a 7
  • I gave Notorious an 8
  • I gave Tangled a 3

Let’s update ratings and did_rate with our ratings, my_ratings:

ratings = [my_ratings ratings];
did_rate = [(my_ratings ~= 0) did_rate];

Learner’s check:

  • ‘ratings’ is now a 10 X 6 matrix
  • ‘did_rate’ is now a 10 X 6 matrix

Step 5: Mean Normalize All The Ratings

Once we get to Step 7: Minimize The Cost Function,  you may see why mean normalizing the ‘ratings‘ matrix is necessary.

What is mean normalization?

It is much easier to understand the ‘what’ if we understand the why. Why normalize the ‘ratings’ matrix?

Consider the following scenario:

A user (Christie) rated 0 movies. Our collaborative filtering algorithm that we are about to build will then go on to predict that Christie will rate all movies as 0. You may see why in the further steps when we cover the cost function and gradient descent. Don’t worry about it for now.

This is no good, because then we won’t be able to suggest Christie anything.  After all, a recommendation is simply based off of what movie(s) we predict the user to rate the highest.

So how do we recommend a movie to a user who has never placed a rating?

We simply suggest the highest average rated movie. That’s the best we can do, since we know nothing about the user. This is made possible because of mean normalization.
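To make that fallback concrete, here is a small sketch on made-up data (3 movies, 3 users). The matrix values and the names movie_means, best_mean and best_movie are hypothetical and only there for illustration:

```octave
% Hypothetical toy data: 3 movies (rows) X 3 users (columns), 0 = not rated
ratings = [8 6 0;
           0 9 10;
           4 0 5];
did_rate = ratings ~= 0;

% Average rating per movie, counting only the ratings actually given
% (max(..., 1) guards against dividing by zero for an unrated movie)
movie_means = sum(ratings, 2) ./ max(sum(did_rate, 2), 1);

% A brand-new user simply gets the movie with the highest average
[best_mean, best_movie] = max(movie_means);
fprintf('Suggest movie %d (average rating %.1f)\n', best_movie, best_mean);
```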

What is mean normalization?

Mean normalization, in our case, is the process of making the average rating received by each movie equal to 0.

Take a look at our Step 1 example the ‘ratings’ matrix, again:


Each row represents all the ratings received by one movie. Here’s how to normalize a matrix:

  1. Find the average of the 1st row. In other words, find the average rating received by the first movie ‘Harold and Kumar Escape From Guantanamo Bay’
  2. Subtract this average from each rating (entry) in the 1st row
  3. The first row has now been normalized. This row now has an average of 0.
  4. Repeat steps 1 & 2 for all rows.
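Here are those steps worked through for a single row. As a sketch, I use the first row of the ‘ratings’ matrix exactly as it appeared in Step 1 (before my_ratings was added); the names row, rated, row_mean and row_norm are illustrative only:

```octave
% First row of the Step 1 'ratings' matrix (0 = not rated)
row = [8 4 0 0 4];
rated = row ~= 0;

% Average over the rated entries only: (8 + 4 + 4) / 3
row_mean = mean(row(rated));

% Subtract the average from the rated entries; unrated entries stay 0
row_norm = zeros(size(row));
row_norm(rated) = row(rated) - row_mean;

disp(row_mean);
disp(row_norm);
```

The rated entries of row_norm now average to 0 (up to floating-point rounding), which is exactly what mean normalization is after.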

Here is the implementation for mean normalization in Octave:

function [ratings_norm, ratings_mean] = normalizeRatings(ratings, did_rate)
  [m, n] = size(ratings);
  ratings_mean = zeros(m, 1);
  ratings_norm = zeros(size(ratings));
  for i = 1:m
    % indexes of the users who rated movie i (where did_rate is 1)
    idx = find(did_rate(i, :) == 1);

    % average only over the ratings that were actually given
    ratings_mean(i) = mean(ratings(i, idx));
    ratings_norm(i, idx) = ratings(i, idx) - ratings_mean(i);
  end
end

We can call this function and store its two outputs: the normalized matrix (which overwrites ‘ratings’) and the vector of per-movie averages.

[ratings, ratings_mean] = normalizeRatings(ratings, did_rate);

Learner’s check:

‘ratings’ contains the normalized ‘ratings’ matrix. Of course, it’s still a 10 X 6 matrix. Here it is below:

ratings =
1.25000 2.25000 -1.75000 0.00000 0.00000 -1.75000
0.00000 0.00000 0.00000 0.66667 2.66667 -3.33333
0.00000 0.00000 2.00000 0.00000 0.00000 -2.00000
0.00000 0.40000 0.40000 -1.60000 0.40000 0.40000
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 -2.00000 0.00000 0.00000 0.00000 2.00000
0.00000 2.00000 0.00000 -2.00000 0.00000 0.00000
-1.33333 0.00000 0.00000 1.66667 -0.33333 0.00000
0.00000 0.00000 -0.66667 0.00000 -2.66667 3.33333
0.00000 0.00000 -2.50000 -0.50000 1.50000 1.50000

‘ratings_mean’ is a 10 X 1 column vector whose ith row contains the average rating of the ith movie. Here it is below:

ratings_mean =
5.75000
7.33333
8.00000
9.60000
8.00000
4.00000
6.00000
4.33333
6.66667
6.50000

Step 6: Collaborative Filtering via Linear Regression

If you are unfamiliar with how a linear regression works, these links should be helpful.

The simplest way to think about it is that we are simply fitting a line to (i.e., learning from) a scatter plot (in the case of a uni-linear regression):


In our case, we face a multi-linear regression problem. But don’t worry, we’ll briefly cover the intuition in a few seconds.

Helpful intuition: A user’s big preference for comedy movies (e.g. 4.5/5) paired with a movie’s high ‘level of comedy’ (e.g. 0.8/1) tends to be positively correlated with the user’s rating for that movie. For the most part, this correlation is continuous.

Conversely, a user’s hate for comedy (1/5), still paired with a high movie’s ‘level of comedy’ (i.e 0.8/1) tends to be negatively correlated with the user’s rating for that movie. This is another reason for mean normalization. If you notice in the ‘ratings’ matrix above, there are some negative ratings. These ratings are negative because they have been rated below average.

If you are familiar with a linear regression, you may know that the goal of a linear regression is to minimize the sum of squared errors (the squared differences between our predicted values and observed values), in order to come up with the best learning algorithm for predicting new outputs, or in the case of a uni-linear regression, the best ‘line of best fit’.

Note: In our case, we face a multi-linear regression problem, since we have many more than 1 feature.

A linear regression is associated with some cost function; our goal is to minimize this cost function (Step 7), and thus minimize the sum of squared errors.

A vectorized implementation of a linear regression is as follows:

Y = X * θ^T

Learner’s check:

  • θ is our parameter (user preferences, in our case) vector
  • X is our vector of features (movie features, in our case)

To fit our example, we can rename the variables as such:

ratings = movie_features * user_prefs^T

We want to simultaneously find optimal values of movie_features and user_prefs such that the sum of squared errors (cost function) is minimized. How can we do this?
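Before answering that, it helps to see why the shapes line up. The sketch below uses placeholder values (all ones), with 10 movies, 6 users (the original 5 plus us) and the 3 features from Step 1; none of these numbers are learned values:

```octave
num_movies = 10;
num_users = 6;
num_features = 3;

% Placeholder values only; the real values are learned in Step 7
movie_features = ones(num_movies, num_features);   % 10 X 3
user_prefs = ones(num_users, num_features);        % 6 X 3

% (10 X 3) * (3 X 6) = 10 X 6: one predicted rating per movie/user pair
predictions = movie_features * user_prefs';
disp(size(predictions));
```

Entry (i, j) of the result is the product of movie i’s feature row with user j’s preference row, the same kind of sum we computed by hand for Chelsea and Bad Boys.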

Step 7: Minimize The Cost Function

We will allow our collaborative filtering algorithm to simultaneously come up with the appropriate values of ‘movie_features’ and ‘user_prefs’, by minimizing the sum of squared errors, through a process called gradient descent. For our case, the minimizer we’ll be using in Octave is fmincg, a conjugate-gradient relative of gradient descent. It is not part of core Octave, so fmincg.m needs to be on your path.

Note: If you are unfamiliar with gradient descent, worry not. All you need to understand is that gradient descent is an iterative algorithm that helps us minimize, in our specific case, the sum of squared errors. Consequently, we will have ‘learned’ the appropriate values of ‘user_prefs’ and ‘movie_features’ to make accurate predictions on movie ratings for every user.

We need to provide our fmincg function with 2 things: a cost function and its gradients (the slopes/partial derivatives of the cost function).

Here is my cost function, with regularization (to prevent overfitting, i.e., high variance):

predictions = X * Theta';
difference = predictions - Y;
J = sum(difference(R==1) .^ 2) / 2;
thetaReg = sum(sum(Theta .^ 2)) * (lambda / 2);
xReg = sum(sum(X .^ 2)) * (lambda / 2);
J = J + thetaReg + xReg;

Learner’s check:

  • Remember, this code is inside of a function. So when this function is executed and it returns its matrix(s), the X matrix will hold the learned data (numbers) for ‘movie_features’ and the Theta matrix will hold the learned data (numbers) for ‘user_prefs’. You will see this shortly, if you are confused
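To see what each line of the cost contributes, here is the same code run on a tiny made-up problem (2 movies, 2 users, 2 features). All the numbers are hypothetical and chosen so the arithmetic is easy to follow by hand:

```octave
% Hypothetical toy problem: 2 movies, 2 users, 2 features
X = [1 0; 0 1];        % movie_features
Theta = [1 2; 3 1];    % user_prefs
Y = [2 0; 0 3];        % ratings (0 = not rated)
R = [1 0; 0 1];        % did_rate
lambda = 1;

predictions = X * Theta';                        % [1 3; 2 1]
difference = predictions - Y;                    % [-1 3; 2 -2]
J = sum(difference(R == 1) .^ 2) / 2;            % rated entries only: (1 + 4) / 2 = 2.5
thetaReg = sum(sum(Theta .^ 2)) * (lambda / 2);  % 15 / 2 = 7.5
xReg = sum(sum(X .^ 2)) * (lambda / 2);          % 2 / 2 = 1
J = J + thetaReg + xReg;                         % 2.5 + 7.5 + 1 = 11
disp(J);
```

Notice that the two unrated entries (where R is 0) have large differences, but they never reach the cost because of the R == 1 mask.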


Here is my gradient code:

for i = 1 : num_movies
  withoutReg = ((X(i, :) * Theta' - Y(i, :)) .* R(i, :) * Theta);
  reg = lambda * X(i, :);
  X_grad(i, :) = withoutReg + reg;
end
for j = 1 : num_users
  withoutReg = ((X * Theta(j, :)' - Y(:, j)) .* R(:, j))' * X;
  reg = lambda * Theta(j, :);
  Theta_grad(j, :) = withoutReg + reg;
end

And here is the full implementation of the entire function (calculating cost and its gradients):

function [J, grad] = costFunc(params, Y, R, num_users, num_movies, ...
                              num_features, lambda)
  % Unpack the parameter vector into X (movie_features) and Theta (user_prefs)
  X = reshape(params(1:num_movies*num_features), num_movies, num_features);
  Theta = reshape(params(num_movies*num_features+1:end), ...
                  num_users, num_features);
  J = 0; % cost (sum of squared differences)
  X_grad = zeros(size(X)); % partial derivatives of J with respect to X (movie_features)
  Theta_grad = zeros(size(Theta)); % partial derivatives of J with respect to Theta (user_prefs)

  % Cost function with regularization
  predictions = X * Theta';
  difference = predictions - Y;
  J = sum(difference(R==1) .^ 2) / 2;

  thetaReg = sum(sum(Theta .^ 2)) * (lambda / 2);
  xReg = sum(sum(X .^ 2)) * (lambda / 2);

  J = J + thetaReg + xReg;

  % gradients
  for i = 1 : num_movies
    withoutReg = ((X(i, :) * Theta' - Y(i, :)) .* R(i, :) * Theta);
    reg = lambda * X(i, :);
    X_grad(i, :) = withoutReg + reg;
  end

  for j = 1 : num_users
    withoutReg = ((X * Theta(j, :)' - Y(:, j)) .* R(:, j))' * X;
    reg = lambda * Theta(j, :);
    Theta_grad(j, :) = withoutReg + reg;
  end

  % Pack both gradients into one column vector, in the same order as params
  grad = [X_grad(:); Theta_grad(:)];
end
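A common way to gain confidence in hand-written gradients like these is to compare them against finite differences. The sketch below does that on a made-up toy problem. The data, the step size epsilon and the vectorized rewrite of the cost and gradients are illustrative assumptions rather than part of the original code; the vectorized form should match the two loops above row for row.

```octave
% Hypothetical toy problem: 3 movies, 2 users, 2 features
X = [0.5 -0.2; 0.1 0.3; -0.4 0.2];
Theta = [0.3 0.1; -0.2 0.4];
Y = [1 -1; 0 2; -1 0];
R = [1 1; 0 1; 1 0];
lambda = 1.5;

% Regularized cost as a function of X and Theta (multiplying by R
% zeroes out the unrated entries, same effect as indexing with R==1)
cost = @(X, Theta) sum(sum(((X * Theta' - Y) .* R) .^ 2)) / 2 + ...
    (lambda / 2) * (sum(sum(X .^ 2)) + sum(sum(Theta .^ 2)));

% Analytic gradients, vectorized
err = (X * Theta' - Y) .* R;
X_grad = err * Theta + lambda * X;
Theta_grad = err' * X + lambda * Theta;

% Central finite differences, nudging one parameter at a time
epsilon = 1e-4;
X_num = zeros(size(X));
for k = 1:numel(X)
  bump = zeros(size(X));
  bump(k) = epsilon;
  X_num(k) = (cost(X + bump, Theta) - cost(X - bump, Theta)) / (2 * epsilon);
end
Theta_num = zeros(size(Theta));
for k = 1:numel(Theta)
  bump = zeros(size(Theta));
  bump(k) = epsilon;
  Theta_num(k) = (cost(X, Theta + bump) - cost(X, Theta - bump)) / (2 * epsilon);
end

max_diff = max([abs(X_num(:) - X_grad(:)); abs(Theta_num(:) - Theta_grad(:))]);
fprintf('Largest gap between analytic and numerical gradient: %g\n', max_diff);
```

If the gap is tiny, the analytic gradients agree with the slope of the cost function. A large gap usually points to a sign error or a missing mask.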


Before we actually execute this function, we need to initialize our parameters user_prefs (Theta) and movie_features (X) to random small numbers. To do this in Octave, I have used the randn function. This function returns a matrix of random elements that are normally distributed, with a mean of 0 and a variance of 1:

num_users = size(ratings, 2);
num_movies = size(ratings, 1);
num_features = 5;
% Initialize Parameters Theta (user_prefs), X (movie_features)
movie_features = randn(num_movies, num_features);
user_prefs = randn(num_users, num_features);
initial_parameters = [movie_features(:); user_prefs(:)];

Now, let’s set some options for our cost minimizing fmincg function:

options = optimset('GradObj', 'on', 'MaxIter', 100);

Finally, let’s run fmincg, which will evaluate costFunc repeatedly for up to 100 iterations. Notice that fmincg takes our costFunc function as an argument. This is what fmincg needs to minimize our cost function and find the best parameters for predicting movie ratings:

lambda = 10; % regularization weight/parameter
optimal_prefs_and_features = fmincg (@(t)(costFunc(t, ratings, did_rate, num_users, num_movies, ...
num_features, lambda)), ...
initial_parameters, options);

Learner’s check:

  • If you are unfamiliar with regularization, you don’t need to worry about what lambda means.
  • optimal_prefs_and_features is the column vector returned from fmincg. It contains optimal values for user preferences and movie features that minimize our cost function
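If you don’t have fmincg at hand, or just want to see the iterative idea laid bare, a plain gradient descent loop does the same job more slowly. To be clear, this is a swapped-in technique for illustration and not what the guide runs: the toy data, the step size alpha and the iteration count are all assumptions.

```octave
% Hypothetical toy problem: 3 movies, 2 users, 2 features
Y = [1 -1; 0 2; -1 0];             % mean-normalized ratings
R = [1 1; 0 1; 1 0];               % 1 where a rating exists
X = [0.5 -0.2; 0.1 0.3; -0.4 0.2];
Theta = [0.3 0.1; -0.2 0.4];
lambda = 1.5;
alpha = 0.01;                      % step size (assumed; tune for real data)
num_iters = 200;

J_history = zeros(num_iters + 1, 1);
for iter = 1:num_iters
  err = (X * Theta' - Y) .* R;     % errors on rated entries only
  J_history(iter) = sum(sum(err .^ 2)) / 2 + ...
      (lambda / 2) * (sum(sum(X .^ 2)) + sum(sum(Theta .^ 2)));

  X_grad = err * Theta + lambda * X;
  Theta_grad = err' * X + lambda * Theta;

  % Step both parameter sets downhill at the same time
  X = X - alpha * X_grad;
  Theta = Theta - alpha * Theta_grad;
end

% Cost after the final update
err = (X * Theta' - Y) .* R;
J_history(end) = sum(sum(err .^ 2)) / 2 + ...
    (lambda / 2) * (sum(sum(X .^ 2)) + sum(sum(Theta .^ 2)));

fprintf('Cost went from %.4f to %.4f\n', J_history(1), J_history(end));
```

The point is only that each pass nudges ‘movie_features’ and ‘user_prefs’ a little further downhill on the cost; fmincg does this far more cleverly.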

We need to extract ‘user_prefs’ and ‘movie_features’ from optimal_prefs_and_features, so we can start making some predictions:

movie_features = reshape(optimal_prefs_and_features(1:num_movies*num_features), num_movies, num_features);
user_prefs = reshape(optimal_prefs_and_features(num_movies*num_features+1:end), ...
num_users, num_features);
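If the packing and unpacking feels like magic, this small round trip shows what (:) and reshape are doing. The matrices here are tiny made-up examples, and n_x is just an illustrative helper name:

```octave
num_movies = 3;
num_users = 2;
num_features = 2;
movie_features = [1 2; 3 4; 5 6];   % 3 X 2
user_prefs = [7 8; 9 10];           % 2 X 2

% Pack: (:) stacks each matrix column by column into one long vector
params = [movie_features(:); user_prefs(:)];

% Unpack: the first num_movies*num_features entries belong to movie_features
n_x = num_movies * num_features;
X = reshape(params(1:n_x), num_movies, num_features);
Theta = reshape(params(n_x+1:end), num_users, num_features);

disp(params');
```

Because reshape fills column by column as well, X and Theta come back identical to the matrices we started with.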

Step 8: Make Movie Predictions!…Finally

Recall Step 4: Let’s Rate Some Movies. We rated some movies. Now, let’s use the learning algorithm we just built to predict the ratings we would give to every movie, based on our ‘my_ratings’ column vector:

all_predictions = movie_features * user_prefs';
my_predictions = all_predictions(:,1) + ratings_mean;

‘my_predictions’ is a 10 X 1 column vector:

my_predictions =

Learner’s check:

  • Recall in Step 5 where we mean normalized all the ‘ratings’. Since we subtracted the mean of the movie’s ratings from each rating for that movie, we added back ‘ratings_mean’ to our predicted ratings.

Let’s display our predictions:

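% movie_list is assumed here: a cell array holding the 10 titles from Step 4
% (the original argument to fprintf was lost from this listing)
[r, ix] = sort(my_predictions, 'descend');
fprintf('\nTop recommendations for you:\n');
for i = 1:10
  j = ix(i);
  fprintf('Predicting rating %.1f for movie %s\n', my_predictions(j), ...
          movie_list{j});
end
fprintf('\n\nOriginal ratings provided:\n');
for i = 1:length(my_ratings)
  if my_ratings(i) > 0
    fprintf('Rated %d for %s\n', my_ratings(i), ...
            movie_list{i});
  end
end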

The result looks as follows:

Top recommendations for you:
Predicting rating 9.6 for movie Straight Outta Compton (2015)
Predicting rating 8.0 for movie A Very Harold and Kumar Christmas (2011)
Predicting rating 8.0 for movie Notorious (2009)
Predicting rating 7.3 for movie Ted (2012)
Predicting rating 6.7 for movie Cinderella (2015)
Predicting rating 6.5 for movie Toy Story 3 (2010)
Predicting rating 6.0 for movie Frozen (2013)
Predicting rating 5.8 for movie Harold and Kumar Escape From Guantanamo Bay (2008)
Predicting rating 4.3 for movie Tangled (2010)
Predicting rating 4.0 for movie Get Rich Or Die Tryin' (2005)
Original ratings provided:
Rated 7 for Harold and Kumar Escape From Guantanamo Bay (2008)
Rated 8 for Notorious (2009)
Rated 3 for Tangled (2010)

Step 9: Take It Further

You should try to build your own recommendation engine. Perhaps not just for movies, but for anything else you can think of. We can’t always find what we are looking for by ourselves. Sometimes a good recommendation is all we need.

Perhaps you can implement a clustering algorithm such as k-means or DBSCAN to group users with similar features together, and thereby recommend the same movies to users belonging to the same cluster.

In our example, the more movies you rate, the more ‘personalized’ (and possibly accurate) your recommendations will be. This is because you are giving the recommendation engine (learning algorithm) more of your data to observe and learn from.

So, maybe if you actually ‘Netflix and chill’ed more often, Netflix will know you better and make better movie recommendations for you 😉

Nikhil Bhaskar

*Original post here*

Source: Movie Recommendations? How Does Netflix Do It? A 9 Step Coding & Intuitive Guide Into Collaborative Filtering by nbhaskar

Aug 29, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[ AnalyticsWeek BYTES]

>> Is Your Data Product Ready for Launch? by analyticsweek

>> Are U.S. Hospitals Delivering a Better Patient Experience? by bobehayes

>> 50 UX Metrics, Methods, & Measurement Articles from 2018 by analyticsweek


Master Statistics with R


In this Specialization, you will learn to analyze and visualize data in R and created reproducible data analysis reports, demonstrate a conceptual understanding of the unified nature of statistical inference, perform fre… more


On Intelligence


Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more


Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data could lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric. This is the metric that matters the most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation and it helps you understand the entire quantified business.


Q: Given two fair dice, what is the probability of getting scores that sum to 4? To 8?
A: * Total: 36 combinations
* Of these, 3 sum to 4: (1,3), (3,1), (2,2)
* So: 3/36 = 1/12
* Of these, 5 sum to 8: (2,6), (3,5), (4,4), (6,2), (5,3)
* So: 5/36
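The counting above can be double-checked by enumerating all 36 outcomes. This is a minimal sketch in Octave; the variable names are illustrative:

```octave
% All 36 equally likely outcomes of two fair dice
[die1, die2] = meshgrid(1:6, 1:6);
totals = die1 + die2;

p_sum_4 = sum(totals(:) == 4) / numel(totals);   % 3/36
p_sum_8 = sum(totals(:) == 8) / numel(totals);   % 5/36
fprintf('P(sum = 4) = %.4f, P(sum = 8) = %.4f\n', p_sum_4, p_sum_8);
```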



@AnalyticsWeek: Big Data Health Informatics for the 21st Century: Gil Alterovitz


Information is the oil of the 21st century, and analytics is the combustion engine. – Peter Sondergaard


#FutureOfData Podcast: Conversation With Sean Naismith, Enova Decisions


Bad data or poor data quality costs US businesses $600 billion annually.

Sourced from: Analytics.CLUB #WEB Newsletter

August 28, 2017 Health and Biotech analytics news roundup

Genome sequencing method can detect clinically relevant mutations using 5 CTCs: Researchers showed that a technique that can sequence very long stretches of the genome can accurately quantify mutations using only 5 ‘circulating tumor cells’ (although they used 34 in this study).

Artificial intelligence predicts dementia before onset of symptoms: Using only one scan of the brain per patient, McGill scientists were able to accurately predict Alzheimer’s 2 years before its onset.

Using machine learning to improve patient care: Two papers from MIT made strides in the field, one that used ICU data to predict necessary treatments and another that trained models of mortality and length of stay based on electronic health record data.

How CROs Are Helping With Healthcare’s Data Problem: Clinical trial costs are a major cause of rising health care costs. To help streamline this, pharmaceutical companies are increasingly using ‘contract research organizations’ to conduct trials, as they can use their expertise and specialized business intelligence tools to cut costs.

I was worried about artificial intelligence—until it saved my life: Krista Jones had a rare form of cancer that was only able to be correctly treated with machine learning technology.

Genomic Medicine Has Entered the Building: Some types of genome sequences now cost as much as an MRI, which has allowed organizations to undertake large-scale studies in personalized medicine.

Source: August 28, 2017 Health and Biotech analytics news roundup

The 37 best tools for data visualization

Creating charts and info graphics can be time-consuming. But these tools make it easier.

It’s often said that data is the new world currency, and the web is the exchange bureau through which it’s traded. As consumers, we’re positively swimming in data; it’s everywhere from labels on food packaging design to World Health Organisation reports. As a result, for the designer it’s becoming increasingly difficult to present data in a way that stands out from the mass of competing data streams.

One of the best ways to get your message across is to use a visualization to quickly draw attention to the key messages, and by presenting data visually it’s also possible to uncover surprising patterns and observations that wouldn’t be apparent from looking at stats alone.


Not a web designer or developer? You may prefer free tools for creating infographics.

As author, data journalist and information designer David McCandless said in his TED talk: “By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful.”

There are many different ways of telling a story, but everything starts with an idea. So to help you get started we’ve rounded up some of the most awesome data visualization tools available on the web.

01. Dygraphs

Help visitors explore dense data sets with JavaScript library Dygraphs

Dygraphs is a fast, flexible open source JavaScript charting library that allows users to explore and interpret dense data sets. It’s highly customizable, works in all major browsers, and you can even pinch to zoom on mobile and tablet devices.

02. ZingChart

ZingChart lets you create HTML5 Canvas charts and more

ZingChart is a JavaScript charting library and feature-rich API set that lets you build interactive Flash or HTML5 charts. It offers over 100 chart types to fit your data.

03. InstantAtlas

InstantAtlas enables you to create highly engaging visualisations around map data

If you’re looking for a data viz tool with mapping, InstantAtlas is worth checking out. This tool enables you to create highly-interactive dynamic and profile reports that combine statistics and map data to create engaging data visualizations.

04. Timeline

Timeline creates beautiful interactive visualizations

Timeline is a fantastic widget which renders a beautiful interactive timeline that responds to the user’s mouse, making it easy to create advanced timelines that convey a lot of information in a compressed space.

Each element can be clicked to reveal more in-depth information, making this a great way to give a big-picture view while still providing full detail.

05. Exhibit

Exhibit makes data visualization a doddle

Developed by MIT, and fully open-source, Exhibit makes it easy to create interactive maps, and other data-based visualizations that are orientated towards teaching or static/historical based data sets, such as flags pinned to countries, or birth-places of famous people.

06. Modest Maps

Integrate and develop interactive maps within your site with this cool tool

Modest Maps is a lightweight, simple mapping tool for web designers that makes it easy to integrate and develop interactive maps within your site, using them as a data visualization tool.

The API is easy to get to grips with, and offers a useful number of hooks for adding your own interaction code, making it a good choice for designers looking to fully customise their user’s experience to match their website or web app. The basic library can also be extended with additional plugins, adding to its core functionality and offering some very useful data integration options.

07. Leaflet

Use OpenStreetMap data and integrate data visualisation in an HTML5/CSS3 wrapper

Another mapping tool, Leaflet makes it easy to use OpenStreetMap data and integrate fully interactive data visualisation in an HTML5/CSS3 wrapper.

The core library itself is very small, but there are a wide range of plugins available that extend the functionality with specialist functionality such as animated markers, masks and heatmaps. Perfect for any project where you need to show data overlaid on a geographical projection (including unusual projections!).

08. WolframAlpha

Wolfram Alpha is excellent at creating charts

Billed as a “computational knowledge engine”, the Google rival WolframAlpha is really good at intelligently displaying charts in response to data queries without the need for any configuration. If you’re using publicly available data, this offers a simple widget builder to make it really simple to get visualizations on your site.

09. makes data visualization as simple as it can be is a combined gallery and infographic generation tool. It offers a simple toolset for building stunning data representations, as well as a platform to share your creations. This goes beyond pure data visualisation, but if you want to create something that stands on its own, it’s a fantastic resource and an info-junkie’s dream come true!

10. Visualize Free

Make visualizations for free!

Visualize Free is a hosted tool that allows you to use publicly available datasets, or upload your own, and build interactive visualizations to illustrate the data. The visualizations go well beyond simple charts, and the service is completely free. While development work requires Flash, output can be done through HTML5.

11. Better World Flux

Making the ugly beautiful – that’s Better World Flux

Orientated towards making positive change to the world, Better World Flux has some lovely visualizations of some pretty depressing data. It would be very useful, for example, if you were writing an article about world poverty, child undernourishment or access to clean water. This tool doesn’t allow you to upload your own data, but does offer a rich interactive output.

12. FusionCharts

FusionCharts Suite XT
A comprehensive JavaScript/HTML5 charting solution for your data visualization needs

FusionCharts Suite XT brings you 90+ charts and gauges, 965 data-driven maps, and ready-made business dashboards and demos. FusionCharts comes with extensive JavaScript API that makes it easy to integrate it with any AJAX application or JavaScript framework. These charts, maps and dashboards are highly interactive, customizable and work across all devices and platforms. They also have a comparison of the top JavaScript charting libraries which is worth checking out.

13. jqPlot

jqPlot is a nice solution for line and point charts

Another jQuery plugin, jqPlot is a nice solution for line and point charts. It comes with a few nice additional features such as the ability to generate trend lines automatically, and interactive points that can be adjusted by the website visitor, updating the dataset accordingly.

14. Dipity

Dipity has free and premium versions to suit your needs

Dipity allows you to create rich interactive timelines and embed them on your website. It offers a free version and a premium product, with the usual restrictions and limitations present. The timelines it outputs are beautiful and fully customisable, and are very easy to embed directly into your page.

15. Many Eyes

Many Eyes was developed by IBM

Developed by IBM, Many Eyes allows you to quickly build visualizations from publicly available or uploaded data sets, and features a wide range of analysis types including the ability to scan text for keyword density and saturation. This is another great example of a big company supporting research and sharing the results openly.

16. D3.js

You can render some amazing diagrams with D3

D3.js is a JavaScript library that uses HTML, SVG, and CSS to render some amazing diagrams and charts from a variety of data sources. This library, more than most, is capable of some seriously advanced visualizations with complex data sets. It’s open source, and uses web standards so is very accessible. It also includes some fantastic user interaction support.

17. JavaScript InfoVis Toolkit

JavaScript InfoVis Toolkit includes a handy modular structure

A fantastic library written by Nicolas Belmonte, the JavaScript InfoVis Toolkit includes a modular structure, allowing you to only force visitors to download what’s absolutely necessary to display your chosen data visualizations. This library has a number of unique styles and swish animation effects, and is free to use (although donations are encouraged).

18. jpGraph

jpGraph is a PHP-based data visualization tool

If you need to generate charts and graphs server-side, jpGraph offers a PHP-based solution with a wide range of chart types. It’s free for non-commercial use, and features extensive documentation. By rendering on the server, this is guaranteed to provide a consistent visual output, albeit at the expense of interactivity and accessibility.

19. Highcharts

Highcharts has a huge range of options available

Highcharts is a JavaScript charting library with a huge range of chart options available. The output is rendered using SVG in modern browsers and VML in Internet Explorer. The charts are beautifully animated into view automatically, and the framework also supports live data streams. It’s free to download and use non-commercially (and licensable for commercial use). You can also play with the extensive demos using JSFiddle.

20. Google Charts

Google Charts has an excellent selection of tools available

The seminal charting solution for much of the web, Google Charts is highly flexible and has an excellent set of developer tools behind it. It’s an especially useful tool for specialist visualizations such as geocharts and gauges, and it also includes built-in animation and user interaction controls.

21. Excel

It isn’t graphically flexible, but Excel is a good way to explore data: for example, by creating ‘heat maps’ like this one

You can actually do some pretty complex things with Excel, from ‘heat maps’ of cells to scatter plots. As an entry-level tool, it can be a good way of quickly exploring data, or creating visualizations for internal use, but the limited default set of colours, lines and styles makes it difficult to create graphics that would be usable in a professional publication or website. Nevertheless, as a means of rapidly communicating ideas, Excel should be part of your toolbox.

Excel comes as part of the commercial Microsoft Office suite, so if you don’t have access to it, Google’s spreadsheets – part of Google Docs and Google Drive – can do many of the same things. Google ‘eats its own dog food’, so the spreadsheet can generate the same charts as the Google Chart API. This will get you familiar with what is possible before stepping off and using the API directly for your own projects.


22. CSV/JSON

CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) aren’t actual visualization tools, but they are common formats for data. You’ll need to understand their structures and how to get data in or out of them.
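
To make the two structures concrete, here is a minimal Python sketch that reads CSV text into a list of records, writes those records out as JSON, and then goes back to CSV. It uses only the standard library. The sample data and column names (`city`, `year`, `visitors`) are invented for illustration.

```python
import csv
import io
import json

# Hypothetical sample data; the column names are made up for illustration.
CSV_TEXT = """city,year,visitors
Reykjavik,2012,1200
Oslo,2012,3400
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts, one per row, keyed by the header."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # CSV carries no type information, so numeric columns are converted by hand.
        row["year"] = int(row["year"])
        row["visitors"] = int(row["visitors"])
        records.append(row)
    return records


def records_to_csv(records, fieldnames):
    """Write a list of dicts back out as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = csv_to_records(CSV_TEXT)

# JSON keeps numbers as numbers and allows nesting, which CSV cannot express.
json_text = json.dumps(records, indent=2)

# Going back the other way: JSON text -> records -> CSV text.
round_trip = records_to_csv(json.loads(json_text), ["city", "year", "visitors"])
```

The main difference to keep in mind is that CSV is a flat table of strings, while JSON preserves types and nested structure, so a conversion from CSV usually needs an explicit typing step like the one above.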

23. Crossfilter

Crossfilter in action: by restricting the input range on any one chart, data is affected everywhere. This is a great tool for dashboards or other interactive tools with large volumes of data behind them

As we build more complex tools to enable clients to wade through their data, we are starting to create graphs and charts that double as interactive GUI widgets. JavaScript library Crossfilter can be both of these. It displays data, but at the same time, you can restrict the range of that data and see other linked charts react.
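
The linked-chart idea can be sketched in a few lines of Python. This is not Crossfilter’s API (Crossfilter is a JavaScript library), just an illustration of the concept under some assumptions: the `LinkedCharts` class, its method names and the sample records are all hypothetical. In this sketch, each chart’s counts reflect the ranges set on the other charts, while its own range is ignored so the chart can still show the full distribution being brushed.

```python
from collections import Counter

# Hypothetical records; the field names are assumptions for illustration.
FLIGHTS = [
    {"hour": 6, "delay": 5, "carrier": "AA"},
    {"hour": 9, "delay": 40, "carrier": "BA"},
    {"hour": 9, "delay": 0, "carrier": "AA"},
    {"hour": 18, "delay": 75, "carrier": "BA"},
    {"hour": 21, "delay": 10, "carrier": "AA"},
]


class LinkedCharts:
    """Toy model of linked charts: one optional range filter per field."""

    def __init__(self, records):
        self.records = records
        self.filters = {}  # field name -> (low, high), both inclusive

    def set_range(self, field, low, high):
        """Restrict `field` to the inclusive range [low, high]."""
        self.filters[field] = (low, high)

    def clear(self, field):
        """Remove any range restriction on `field`."""
        self.filters.pop(field, None)

    def counts(self, field):
        """Count values of `field` among records passing every OTHER field's filter."""
        others = {f: rng for f, rng in self.filters.items() if f != field}
        selected = [
            rec for rec in self.records
            if all(low <= rec[f] <= high for f, (low, high) in others.items())
        ]
        return Counter(rec[field] for rec in selected)


charts = LinkedCharts(FLIGHTS)
charts.set_range("delay", 30, 120)          # brush the delay chart

carrier_counts = charts.counts("carrier")   # reacts to the delay brush
hour_counts = charts.counts("hour")         # reacts to the delay brush
delay_counts = charts.counts("delay")       # ignores its own brush
```

A real library does this incrementally rather than rescanning every record on each change, which is what makes the approach workable with large volumes of data.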

24. Tangle

Tangle creates complex interactive graphics. Pulling on any one of the knobs affects data throughout all of the linked charts. This creates a real-time feedback loop, enabling you to understand complex equations in a more intuitive way

The line between content and control blurs even further with Tangle. When you are trying to describe a complex interaction or equation, letting the reader tweak the input values and see the outcome for themselves provides both a sense of control and a powerful way to explore data. JavaScript library Tangle is a set of tools to do just this.

Dragging on variables enables you to increase or decrease their values and see an accompanying chart update automatically. The results are only just short of magical.
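
The underlying pattern is simple: a set of input values, plus an update function that recomputes every derived value whenever an input changes. The Python sketch below illustrates that loop only. It is not Tangle’s API, and the `ReactiveModel` class, the `cookie_update` function and the numbers are all hypothetical.

```python
class ReactiveModel:
    """Holds input values and recomputes derived outputs whenever one changes."""

    def __init__(self, inputs, update):
        self._inputs = dict(inputs)
        self._update = update
        # Compute the outputs once so the model starts in a consistent state.
        self.outputs = update(self._inputs)

    def set(self, name, value):
        """Change one input (for example, after a drag) and recompute the outputs."""
        if name not in self._inputs:
            raise KeyError(name)
        self._inputs[name] = value
        self.outputs = self._update(self._inputs)
        return self.outputs


def cookie_update(values):
    """Hypothetical model: total calories for a given number of cookies."""
    return {"calories": values["cookies"] * values["calories_per_cookie"]}


model = ReactiveModel({"cookies": 3, "calories_per_cookie": 50}, cookie_update)
initial = dict(model.outputs)

# The reader drags the "cookies" value from 3 to 4; the outputs follow at once.
after_drag = model.set("cookies", 4)
```

In a browser, the outputs would be bound to text and charts on the page, so each drag redraws them immediately.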

25. Polymaps

Aimed more at specialist data visualisers, the Polymaps library creates image and vector-tiled maps using SVG

Polymaps is a mapping library that is aimed squarely at a data visualization audience. Offering a unique approach to styling the maps it creates, analogous to CSS selectors, it’s a great resource to know about.

26. OpenLayers

It isn’t easy to master, but OpenLayers is arguably the most complete, robust mapping solution discussed here

OpenLayers is probably the most robust of these mapping libraries. The documentation isn’t great and the learning curve is steep, but for certain tasks nothing else can compete. When you need a very specific tool no other library provides, OpenLayers is always there.

27. Kartograph

Kartograph’s projections breathe new life into our standard slippy maps

Kartograph’s tag line is ‘rethink mapping’ and that is exactly what its developers are doing. We’re all used to the Mercator projection, but Kartograph brings far more choices to the table. If you aren’t working with worldwide data, and can place your map in a defined box, Kartograph has the options you need to stand out from the crowd.
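
For a sense of what a projection actually does, here is a small Python sketch of the spherical Mercator formula next to the simpler equirectangular projection. It is not Kartograph code, and the function names are only illustrative. Both functions work on a unit sphere and take degrees as input.

```python
import math


def mercator(lon_deg, lat_deg):
    """Spherical Mercator on a unit sphere.

    Latitude must be strictly between -90 and 90 degrees, because the
    formula goes to infinity at the poles.
    """
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    return lon, math.log(math.tan(math.pi / 4 + lat / 2))


def equirectangular(lon_deg, lat_deg):
    """Equirectangular projection: longitude and latitude map straight to x and y."""
    return math.radians(lon_deg), math.radians(lat_deg)


# Compare the vertical position of the same latitude under both projections.
_, y_mercator = mercator(0, 60)
_, y_plain = equirectangular(0, 60)
stretch_at_60 = y_mercator / y_plain  # greater than 1: Mercator stretches high latitudes
```

The stretch grows quickly towards the poles, which is why regions far from the equator look oversized on the standard slippy maps and why an alternative projection can be worth the effort.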

28. CartoDB

CartoDB provides an unparalleled way to combine maps and tabular data to create visualisations

CartoDB is a must-know site. The ease with which you can combine tabular data with maps is second to none. For example, you can feed in a CSV file of address strings and it will convert them to latitudes and longitudes and plot them on a map, but there are many other uses. It’s free for up to five tables; after that, there are monthly pricing plans.

29. Processing

Processing provides a cross-platform environment for creating images, animations, and interactions

Processing has become the poster child for interactive visualizations. It enables you to write much simpler code which is in turn compiled into Java.

There is also a Processing.js project to make it easier for websites to use Processing without Java applets, plus a port to Objective-C so you can use it on iOS. It is a desktop application, but can be run on all platforms, and given that it is now several years old, there are plenty of examples and code from the community.

30. NodeBox

NodeBox is a quick, easy way for Python-savvy developers to create 2D visualisations

NodeBox is an OS X application for creating 2D graphics and visualizations. You need to know and understand Python code, but beyond that it’s a quick and easy way to tweak variables and see results instantly. It’s similar to Processing, but without all the interactivity.

31. R

A powerful free software environment for statistical computing and graphics, R is the most complex of the tools listed here

How many other pieces of software have an entire search engine dedicated to them? A statistical package used to parse large data sets, R is a very complex tool, and one that takes a while to understand, but it has a strong community and a package library that keeps growing.

The learning curve is one of the steepest of any tool listed here, but you must be comfortable using it if you want to work at this level.

32. Weka

A collection of machine-learning algorithms for data-mining tasks, Weka is a powerful way to explore data

When you get deeper into being a data scientist, you will need to expand your capabilities from just creating visualizations to data mining. Weka is a good tool for classifying and clustering data based on various attributes – both powerful ways to explore data – but it also has the ability to generate simple plots.

33. Gephi

Gephi in action. Coloured regions represent clusters of data that the system is guessing are similar

When people talk about relatedness, social graphs and co-relations, they are really talking about how two nodes are related to one another relative to the other nodes in a network. The nodes in question could be people in a company, words in a document or passes in a football game, but the maths is the same.

Gephi, a graph-based visualiser and data explorer, not only crunches large data sets and produces beautiful visualizations, but also allows you to clean and sort the data. It’s a very niche use case and a complex piece of software, but it puts you ahead of anyone else in the field who doesn’t know about this gem.
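
To illustrate the node-and-edge view described above, here is a minimal Python sketch that builds a network from a list of passes and scores each player by degree centrality, one of the simplest measures of how connected a node is relative to the rest of the network. This is not Gephi’s own code or algorithm, and the players and passes are made up.

```python
from collections import defaultdict

# Hypothetical passes between players: (from, to).
PASSES = [
    ("Ana", "Ben"),
    ("Ben", "Cal"),
    ("Ana", "Cal"),
    ("Cal", "Dee"),
    ("Ben", "Ana"),
]


def build_graph(edges):
    """Build an undirected graph as a mapping of node -> set of neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def degree_centrality(neighbours):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(neighbours)
    if n <= 1:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


graph = build_graph(PASSES)
centrality = degree_centrality(graph)
most_connected = max(centrality, key=centrality.get)
```

Swap the players for people in a company or words in a document and the same code applies unchanged, which is the point made above about the maths being the same.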

34. iCharts

iCharts can have interactive elements, and you can pull in data from Google Docs

The iCharts service provides a hosted solution for creating and presenting compelling charts for inclusion on your website. There are many different chart types available, and each is fully customisable to suit the subject matter and colour scheme of your site.

Charts can have interactive elements, and can pull data from Google Docs, Excel spreadsheets and other sources. The free account lets you create basic charts, while you can pay to upgrade for additional features and branding-free options.

35. Flot

Create animated visualisations with this jQuery plugin

Flot is a specialised plotting library for jQuery, but it has many handy features and crucially works across all common browsers including Internet Explorer 6. Data can be animated and, because it’s a jQuery plugin, you can fully control all the aspects of animation, presentation and user interaction. This does mean that you need to be familiar with (and comfortable with) jQuery, but if that’s the case, this makes a great option for including interactive charts on your website.

36. Raphaël

This handy JavaScript library offers a range of data visualisation options

This handy JavaScript library offers a wide range of data visualization options which are rendered using SVG. This makes for a flexible approach that can easily be integrated within your own web site/app code, and is limited only by your own imagination.

That said, it’s a bit more hands-on than some of the other tools featured here (a victim of being so flexible), so unless you’re a hardcore coder, you might want to check out some of the more point-and-click orientated options first!

37. jQuery Visualize

jQuery Visualize Plugin is an open source charting plugin

Written by the team behind jQuery’s ThemeRoller and jQuery UI websites, jQuery Visualize Plugin is an open source charting plugin for jQuery that uses HTML Canvas to draw a number of different chart types. One of the key features of this plugin is its focus on achieving ARIA support, making it friendly to screen-readers. It’s free to download from this page on GitHub.

Further reading

  • A great Tumblr blog for visualization examples and
  • Nicholas Felton’s annual reports are now infamous, but he also has a Tumblr blog of great things he finds.
  • From the guy who helped bring Processing into the
  • Stamen Design is always creating interesting
  • Eyeo Festival brings some of the greatest minds in data visualization together in one place, and you can watch the videos online.

Brian Suda is a master informatician and author of Designing with Data, a practical guide to data visualisation.

Originally posted via “The 37 best tools for data visualization”