Sep 26, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

Conditional Risk  Source

[ AnalyticsWeek BYTES]

>> Big data solves mystery: Why humans have no more genes than worms by analyticsweekpick

>> Making Big Data Work: Supply Chain Management by analyticsweekpick

>> @Schmarzo @DellEMC on Ingredients of healthy #DataScience practice #FutureOfData #Podcast by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

Probability & Statistics


This course introduces students to the basic concepts and logic of statistical reasoning and gives the students introductory-level practical ability to choose, generate, and properly interpret appropriate descriptive and… more

[ FEATURED READ]

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World


In the world’s top research labs and universities, the race is on to invent the ultimate learning algorithm: one capable of discovering any knowledge from data, and doing anything we want, before we even ask. In The Mast… more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been a bottleneck to enterprise adoption, and one of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way toward quick buy-in and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q: How frequently must an algorithm be updated?
A: You want to update an algorithm when:
– You want the model to evolve as data streams through the infrastructure
– The underlying data source is changing
– Example: a retail store model that must remain accurate as the business grows
– You are dealing with non-stationarity

Some options:
– Incremental algorithms: the model is updated every time it sees a new training example
Note: simple, and you always have an up-to-date model, but you cannot weight different portions of the data differently.
Sometimes mandatory: when data must be discarded once seen (privacy)
– Periodic re-training in “batch” mode: simply buffer the relevant data and update the model every so often
Note: more decisions and more complex implementations

How frequently?
– Is the added complexity worth it?
– Data horizon: how quickly do you need the most recent training example to be part of your model?
– Data obsolescence: how long does it take before data becomes irrelevant to the model? Are some older instances more relevant than newer ones?
Economics: generally, newer instances are more relevant than older ones. However, for seasonal effects, data from the same month or quarter of the previous year can be more relevant than more recent periods of the current year. In a recession, data from previous recessions can be more relevant than newer data from a different economic cycle.
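
As a rough illustration (not part of the original Q&A), here is a minimal Octave sketch of the two options for a simple linear model y = w' * x. The function names, the learning rate alpha, and the buffer X are illustrative assumptions, not a prescribed implementation:

% Option 1: incremental update – one stochastic-gradient step each time a new example (x, y) arrives
function w = incremental_update(w, x, y, alpha)
  err = w' * x - y;          % prediction error for this single example
  w = w - alpha * err * x;   % nudge the weights against the error; alpha is the learning rate
end

% Option 2: periodic "batch" re-training – buffer recent examples and refit every so often
function w = batch_retrain(X, y)
  % X is an (examples x features) buffer, y the observed targets; ordinary least-squares refit
  w = (X' * X) \ (X' * y);
end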

Source

[ VIDEO OF THE WEEK]

Discussing Forecasting with Brett McLaughlin (@akabret), @Akamai


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment. – Alvin Toffler

[ PODCAST OF THE WEEK]

Want to fix #DataScience ? fix #governance by @StephenGatchell @Dell #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

A quarter of decision-makers surveyed predict that data volumes in their companies will rise by more than 60 per cent by the end of 2014, with the average of all respondents anticipating a growth of no less than 42 per cent.

Sourced from: Analytics.CLUB #WEB Newsletter

What Your Social Data Knows About You – @SethS_D Author #NYTBestSeller Everybody Lies


In this podcast, Seth Stephens-Davidowitz (@SethS_D), author of the New York Times bestseller Everybody Lies, discusses what our social data knows about us. He shares some critical insights into the human psyche and how humans behave differently toward machines than toward fellow humans. This sheds interesting light on how the #JobsOfFuture could use our social and technology interactions to create experiences that best represent and benefit us. He also shares his thoughts on what the future of work might look like and on how businesses could use data to create a great experience for employees, workers, clients, and partners. This is a great podcast for anyone looking to understand the depth of insight that data can create.

Seth’s Book:
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz amzn.to/2OA0YBs

Seth’s Recommended Read:
Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker amzn.to/2Kl2nsr

Podcast Link:
iTunes: math.im/jofitunes
Youtube: math.im/jofyoutube

Seth’s BIO:
Seth Stephens-Davidowitz has used data from the internet — particularly Google searches — to get new insights into the human psyche.

Seth has used Google searches to measure racism, self-induced abortion, depression, child abuse, hateful mobs, the science of humor, sexual preference, anxiety, son preference, and sexual insecurity, among many other topics.

His 2017 book, Everybody Lies, published by HarperCollins, was a New York Times bestseller; a PBS NewsHour Book of the Year; and an Economist Book of the Year.

Seth worked for one-and-a-half years as a data scientist at Google and is currently a contributing op-ed writer for the New York Times. He is a former visiting lecturer at the Wharton School at the University of Pennsylvania.
He received his BA in philosophy, Phi Beta Kappa, from Stanford, and his PhD in economics from Harvard.

In high school, Seth wrote obituaries for the local newspaper, the Bergen Record, and was a juggler in theatrical shows. He now lives in Brooklyn and is a passionate fan of the Mets, Knicks, Jets, Stanford football, and Leonard Cohen.

About #Podcast:
#JobsOfFuture was created to spark the conversation around the future of work, workers, and the workplace. The podcast invites movers and shakers in the industry who are shaping, or helping us understand, the transformation of work.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest @ play.analyticsweek.com/guest/

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#JobsOfFuture #FutureOfWork #FutureOfWorker #FutureOfWorkplace #Work #Worker #Workplace

Originally Posted at: What Your Social Data Knows About You – @SethS_D Author #NYTBestSeller Everybody Lies

Are APIs becoming the keys to customer experience?

In recent years, APIs have encouraged the emergence of new services by facilitating collaboration between the applications and databases of one or more companies. Beyond catalyzing innovation, APIs have also revolutionized the customer-company relationship, allowing companies to build an accurate and detailed picture of the consumer at a time when a quality customer experience counts as much as the price or capabilities of the product.

APIs: A Bridge Between the Digital and Physical World

Over the years, customer relationship channels have multiplied: consumers can now interact with their brands through stores, voice, email, mobile applications, the web, or chatbots. These many points of interaction have made the customer journey more complex, forcing companies to draw on data from all of these channels to deliver the most seamless customer experience possible. To do this, they must synchronize data from one channel to another and cross-reference it with the customer’s history with the brand. This is where APIs come into play. These interfaces allow data processing that refines customer knowledge and delivers a personalized experience.

Thanks to a 360° customer view, the digital experience can be extended in store. The API acts as a bridge between the digital and physical world.

APIs also allow organizations to work with data in a more operational way, and especially in real time. However, many companies still treat their loyal customers as if they’ve never interacted before. It is therefore not uncommon for customers to have to identify themselves again across several requests, or to retrace the history of previous interactions, which can seriously damage the customer relationship.

The challenge for companies is to deliver a seamless, consistent, and personalized experience through real-time analysis. This provides relevant information to account managers during an interaction and gives them guidance on the next best action to take, in line with the client’s expectations.

Even better, with APIs we can predict the customer’s buying behavior and suggest services or products that meet their needs. With the data collected, and with the help of artificial intelligence, cross-tabulations and instant analysis make it possible to refine the selection and offer an increasingly relevant and fluid experience, increasing customer loyalty and thus the economic performance of companies.

The Importance of APIs with GDPR

Recently, there has been a trend to empower consumers to control their data, after new regulations such as the European Payment Services Directive (PSD2) and GDPR came into force in May 2018.

What do they have in common? They both give individuals control over their personal data with the ability to request, delete or share it with other organizations. Thus, within the framework of PSD2, it is now possible to manage your bank account or issue payments through an application that is not necessarily that of your bank. Through this, APIs provide companies the opportunity to offer a dedicated portal to their customers to enable them to manage their data autonomously and offer new, innovative payment services.

For their part, companies will be able to better manage governance and the risks of fraudulent access to data. With an API, a company can proactively detect abnormal or even suspicious data-access behavior in near real time.

APIs are the gateways between companies and their business data, and they answer real customer-experience needs that the market is only beginning to meet. However, many organizations have not yet understood the importance of implementing an API strategy, an essential part of digital transformation alongside the cloud and the emergence of increasingly data-driven organizations. APIs are the missing link between data and customer experience — a key companies need to start using.

Ready to Learn More? 

<< Watch the webinar on-demand “APIs for Dummies” >>

The post Are APIs becoming the keys to customer experience? appeared first on Talend Real-Time Open Source Data Integration Software.

Originally Posted at: Are APIs becoming the keys to customer experience? by analyticsweekpick

Sep 19, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

Data Mining  Source

[ AnalyticsWeek BYTES]

>> The What and Where of Big Data: A Data Definition Framework by bobehayes

>> Do Attitudes Predict Behavior? by analyticsweek

>> Discussing #InfoSec with @travturn @hrbrmstr @thebeareconomist @yaxa_io – Playcast – Data Analytics Leadership Playbook Podcast by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython


Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored f… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. Since most data science professionals work in isolation, getting an unbiased perspective is not easy. Many times, it is also not easy to see how your data science career will progress. Building a network of mentors addresses these issues: it gives data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE Q&A]

Q: Explain Tufte’s concept of ‘chart junk’.
A: Chart junk refers to all visual elements in charts and graphs that are not necessary to comprehend the information represented, or that distract the viewer from that information.

Examples of unnecessary elements include:
– Unnecessary text
– Heavy or dark grid lines
– Ornamented chart axes
– Pictures
– Background
– Unnecessary dimensions
– Elements depicted out of scale to one another
– 3-D simulations in line or bar charts

Source

[ VIDEO OF THE WEEK]

Venu Vasudevan @VenuV62 (@ProcterGamble) on creating a rockstar data science team #FutureOfData #Podcast


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

@DrewConway on fabric of an IOT Startup #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Every person in the US tweeting three tweets per minute for 26,976 years.

Sourced from: Analytics.CLUB #WEB Newsletter

What Are the 3 Critical Keys to Healthcare Big Data Analytics?


Healthcare big data analytics isn’t just a “use it or lose it” proposition for the provider community – it’s quickly becoming a “use it if you want to hold on to anything at all” situation for organizations that must invest in population health management, clinical analytics, and risk stratification if they are to succeed in a value-based reimbursement world.

Maintaining market share during this shift away from the simpler cash transactions of a fee-for-service environment requires organizations to take a proactive dive into their financial and clinical data, yet developing the technological and organizational competencies to take advantage of big data tools is just as complex as it sounds.

Healthcare big data analytics

Despite the vital importance of using big data to describe, predict, and prevent costly events in large patient populations, providers of all types and sizes are struggling to collect, categorize, store, retrieve, and analyze their data assets.

In a recent industry poll, Stoltenberg Consulting found that big data confuses half of providers, and six percent of participants were too intimidated by the process to even consider starting a healthcare big data analytics program.

Why is big data analytics such a difficult topic to tackle for healthcare organizations?  How do successful providers begin the process?  In this article, healthcare stakeholders weigh in on the three most important foundational steps for beginning a big data analytics and population health management program.

Define a direction and outline specific goals

Big data may be nearly infinite in scope, but having data for the sake of having data will not help achieve measurable organizational objectives.  Healthcare providers must start off their big data journey by defining clear, bite-sized problems that need solving.  Often, these problems are the “low hanging fruit” of healthcare operations: preventable readmissions, emergency department overuse, chronic disease management, patient engagement, and primary care screening rates.

“I would highly encourage people to start around specific high-value use cases,” suggested Marc Perlman, Global Vice President of Healthcare and Life Sciences at Oracle during a 2014 interview.  “I think the average hospital or health system has over 400 different interfaces and integration points, but it may take seven of them working together to give you some value out of your data.  Hospitals and health systems need to think about what they’re trying to fix, and then based upon that, what data sources they need.”

“We think it makes a lot of sense to think about what they are going to try to accomplish, what the methods are that they’re trying to drive, and how they are going to focus on sustainability,” he continued.  “I would say the most important thing is have a vision of your solution, what you’re trying to fix, and know where you’re going.  And then the technology will follow.”

Successful healthcare organizations often flag financial pain points as the gateway into big data analytics.  A March 2015 survey found that 59 percent of hospitals identified higher-than-necessary costs of care as one of their top motivating factors for implementing big data analytics.  Organizations are also seeking the clinical and financial insights necessary to engage in pay-for-performance reimbursement structures and combat the pressures of accountable care.

To do this, organizations implemented clinical analytics technologies that integrate EHR data and patient outcomes into a rich portrait of a patient’s journey through the care continuum.  More than 60 percent of organizations that took this approach have been able to improve their 30-day preventable readmissions rates and cut their mortality rates.

“Start out slow and have realistic goals,” says Dr. Robert M. Fishman, DO, FACP, who has helped to lead Valley Health Partners to the highest level of patient-centered medical home (PCMH) recognition thanks to investments in health IT and a fundamental attitude shift within the physician hospital organization (PHO).  “And when you achieve those goals, set out new goals.”

“We set up a very modest program where we would pick a couple of diagnoses in internal medicine and a couple of diagnoses in pediatrics and we would begin to set up policies and think in a more patient-centric way to reach certain goals,” he said.

“We concentrated on congestive heart failure and COPD, because those two conditions produce a lot of readmissions and emergency department visits.  There’s a lot of expense.  If we could really get a handle on those conditions, we could improve care, improve outcomes and decrease costs.  Three things that we’re all very interested in.”

By focusing on solving specific use cases for big data analytics, these organizations have proven that small big data investments are worth larger ones in the future.

Measure twice, cut once to plan a health IT infrastructure

After outlining a strategy, healthcare organizations can start making investments in advanced health IT products to support their efforts.  While the vast majority of providers already have a basic EHR infrastructure in place – and EHRs are increasingly coming packaged with population health management tools that meet many big data needs – sometimes more sophisticated products are required to get the ball rolling.

But crafting an interoperable health IT infrastructure with a high degree of flexibility and usability is a difficult ask.  Stakeholders are only focusing on the desperate need for interoperability between data systems because it is so rare to find an ecosystem of well-integrated vendor products that communicate freely with one another.

Many organizations continue to be challenged by their convoluted legacy systems, and most providers cannot afford to rip out and replace an entire infrastructure.  For those lucky few starting from scratch, 2015 is a good time to get into the big data analytics game.

As vendors focus on developing plug-and-play technologies and health information exchanges take on more mature roles in the local provider community, crafting an interoperable health IT system is easier than ever – if the right rules are on the table from the beginning.

“When it comes to doing big data analytics, you have to have two things right off the bat,” stated Richard C. Howe, PhD, FHIMSS, Executive Director of the North Texas Regional Extension Center, which also functions as the Dallas-Fort Worth region’s primary HIE.

“The first is strong governance. You have to get all the participants in the same room so they can determine what data they are going to contribute.  Get the absolute governance structure outlined at the beginning so you know what direction you’re trying to go.”

“The second thing is to start simple.  If you start off with thousands of different data elements, you’re just going to drown in the data before you see any results.  We started with claims data, and we found that there is a lot of really good information there that has been valuable to our hospital members even before we started adding more clinical information.  I would say good governance has to start simple.”

An organization’s data governance plan can make or break a big data analytics project: the old “garbage in, garbage out” rule will always apply. Even the most robust tools require their users to understand their potential and their limitations, both of which rely on the quality of data moving through the system.

Understanding the scope of health IT tools, the data standards they are built upon, and the data integrity requirements of leveraging health IT for actionable insights will help organizations pick the best products for their needs in the long term, not just in the next six months to a year.

“When it comes to tools, providers should consider products that can give them the functionality that they need to answer the questions they’re asking today, but can also grow with them to continue answering those questions three years and five years down the road,” said Shane Pilcher, Vice President at Stoltenberg Consulting.

“They have to be thinking three, five and maybe even ten years down the road in terms of what they anticipate their questions are going to be, so that they know they’re on the right track with collecting the data that’s going to answer them.

“Even from the EHR perspective, this is where that long-term plan comes into play.  They know the type of questions that they’re looking for today,” he added.  “They need to anticipate the type of questions they’re going to be asking in the future.  But in most cases, you don’t know what you don’t know, so you’ve got to be as creative, as imaginative as you can today when you’re setting up your roadmap.  That’s going to give you the information that you need to start defining what needs to be collected today in the EHR and what you need to grow.”

A far-sighted approach to big data analytics may help organizations avoid or mitigate some of the interoperability problems that have plagued the industry for so long.  Investing in products that encourage health information exchange through standardized data elements will make analytics easier and ensure that organizations are set up for meeting ongoing mandates such as meaningful use.

Ensure support from executives and buy-in from clinical staff

No healthcare big data analytics plan can succeed without enthusiasm and support from all levels of the organization.  The board room must provide the funding and the direction; the clinical end-users must understand and embrace new technologies and new workflows.  Big data analytics isn’t just an IT project, but an organizational transformation from top to bottom.

Starting small, measuring results, and demonstrating improvement is often the key to securing executive buy-in, Pilcher says.  “Once you start picking up traction, you can start to identify that low-hanging fruit that can lead to cost savings and improve patient care.  These are cost savings that go directly to the bottom line of the organization and also show return on investment.  By being able to show ROI, the administration may be more inclined to invest more time, more labor, and more capital to further enhance the program and go after bigger and greater fruits.”

And executive leaders may not take too much convincing these days.  Despite the fact that more than a third of providers feel that a lack of leadership is a major barrier to big data analytics success, executives largely recognize the critical role that data competency will play in the immediate future.

Eighty-nine percent of hospital executives participating in a recent PwC poll are taking action to become more innovative and nimble through big data analytics adoption, and 95 percent are seeking to harness the potential of analytics technologies to extract actionable insights from their big data.

During the HIMSS15 Leadership Survey, three-quarters of organizations agreed with the notion that health IT is vital for achieving strategic goals and improvements in patient care.  Over half believe that health IT has helped them improve their population health management programs.   More than four in ten respondents think their executive leaders have a “fairly sophisticated understanding” of big data analytics technologies and the need to leverage them.

C-suite leaders are among the most likely to express an intention to purchase data analytics tools, with Chief Information Officers and Chief Medical Information Officers being the most eager to invest in new health IT products.  Even Chief Financial Officers are recognizing the fundamental need for analytics infrastructure to cut costs, raise revenues, and utilize resources more appropriately.

Getting clinicians to understand why their workflows are suddenly changing can be even more complicated than securing funds to purchase new tools, however.  It is important for providers to develop a multi-disciplinary team for big data analytics: one that includes representatives from all areas of the organization.

Clinical champions can help to explain to their peers why certain tasks are changing, why new metrics may be pointing out flaws in the patient care process, and why it is important to adapt to an evolving health IT landscape.  Above all, both executive leaders and staff-level super users must be able to point to clear and immediate benefits when introducing a new tool, or risk rebellion among dissatisfied clinicians.

“When we are talking about big data, I think there needs to be a clear purpose,” says Tina Esposito, Vice President of the Center for Health Information Services at Advocate Health Care.  “There has to be a core need or a well-defined problem that you are trying to solve.”

“Big data is a means to an end for solving problems.  So you have got to be very clear that you are not pulling this data together just to put it together. There has got to be a focused effort from the right people to leverage that information so that ultimately you are supporting the business and your population health goals.”

“You need to be sure that what you are creating is usable in the most efficient and easiest way, and that it makes a positive impact on clinicians,” she said. “Is the clinician leveraging that intelligence that you are providing as part of their workflow in the EHR?  Are they seeing a benefit from it?  That’s going to be the most important piece of any big data project.”

Note: This article originally appeared in Health IT Analytics. Click for link here.

Source: What Are the 3 Critical Keys to Healthcare Big Data Analytics?

Sep 12, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

Data Mining  Source

[ AnalyticsWeek BYTES]

>> Which Customer Loyalty Metric is the Best? My Interview with Jeff Olsen of Allegiance Radio by bobehayes

>> For Musicians and Songwriters, Streaming Creates Big Data Challenge by analyticsweekpick

>> 3 Emerging Big Data Careers in an IoT-Focused World by kmartin

Wanna write? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals


Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.

[ DATA SCIENCE Q&A]

Q: When you sample, what bias are you inflicting?
A: Selection bias:
– An online survey about computer use is likely to attract people more interested in technology than is typical

Undercoverage bias:
– Sampling too few observations from a segment of the population

Survivorship bias:
– Observations at the end of the study are a non-random set of those present at the beginning of the investigation
– In finance and economics: the tendency for failed companies to be excluded from performance studies because they no longer exist

Source

[ VIDEO OF THE WEEK]

@RCKashyap @Cylance on State of Security & Technologist Mindset #FutureOfData #Podcast


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The temptation to form premature theories upon insufficient data is the bane of our profession. – Sherlock Holmes

[ PODCAST OF THE WEEK]

@ReshanRichards on creating a learning startup for preparing for #FutureOfWork #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

29 percent report that their marketing departments have ‘too little or no customer/consumer data.’ When data is collected by marketers, it is often not in a form suitable for real-time decision making.

Sourced from: Analytics.CLUB #WEB Newsletter

Sep 05, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

Weak data  Source

[ AnalyticsWeek BYTES]

>> How to Win Business using Marketing Data [infographics] by v1shal

>> Serverless: A Game Changer for Data Integration by analyticsweekpick

>> Self-Reported Intentions vs Actual Behaviors: Comparing Two Employee Turnover Metrics by bobehayes

Wanna write? Click Here

[ FEATURED COURSE]

R, ggplot, and Simple Linear Regression


Begin to use R and ggplot while learning the basics of linear regression… more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t


People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. Since most data science professionals work in isolation, getting an unbiased perspective is not easy. Many times, it is also not easy to see how your data science career will progress. Building a network of mentors addresses these issues: it gives data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE Q&A]

Q: Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
A: It is “naive” because the features are assumed to be independent/uncorrelated, an assumption that is not realistic in many cases.
Improvement: decorrelate the features (transform them so their covariance matrix becomes the identity matrix)
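
As a rough illustration of that improvement (not part of the original answer), here is a minimal Octave sketch of a whitening transform that decorrelates a toy feature matrix; the toy data and variable names are made up:

X = randn(200, 3) * [1 0.8 0; 0 1 0.5; 0 0 1];   % toy correlated features (rows = examples)
Xc = X - mean(X);                                % center each feature
[V, D] = eig(cov(Xc));                           % eigenvectors/eigenvalues of the covariance matrix
X_white = Xc * V * diag(1 ./ sqrt(diag(D)));     % whitened features: cov(X_white) is approximately the identity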

Source

[ VIDEO OF THE WEEK]

Unraveling the Mystery of #BigData


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world. – Atul Butte, Stanford

[ PODCAST OF THE WEEK]

@JohnNives on ways to demystify AI for enterprise #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

And one of my favourite facts: At the moment less than 0.5% of all data is ever analysed and used, just imagine the potential here.

Sourced from: Analytics.CLUB #WEB Newsletter

April 17, 2017 Health and Biotech analytics news roundup

Introducing Verily Study Watch: The device has multiple sensors, a long battery life, and a large capacity for storage. It is not available to the public but is currently being used in clinical studies.

Sansoro Health raises $5.2 million led by Bain Capital Ventures: The company’s product allows a link between customers and electronic health records.

Hospital cuts costly falls by 39% due to predictive analytics: The system, at El Camino Hospital in California, flags high-risk patients at admission, then continually updates the risk throughout their stay.

Winning with analytics in the pharmaceutical industry: The industry can use analytics to improve efficiency and reduce costs across the business.

Why HIT tools can help organizations navigate the challenges of growth: Most health systems have implemented electronic health records. Along with improving care, these records can help distribute patients throughout a large system and better administer transfers.

Originally Posted at: April 17, 2017 Health and Biotech analytics news roundup

Movie Recommendations? How Does Netflix Do It? A 9 Step Coding & Intuitive Guide Into Collaborative Filtering

‘Movies recommended for you’ – Netflix
‘Videos recommended for you’ – YouTube
‘Restaurants recommended for you’ – Some smart restaurant finder app

Notice a trend? Your favorite apps ‘know’ you (or at least they think they do). They gradually learn your preferences over time (or in a matter of hours) and suggest new products which they think you’ll love.

How is this done? I can’t speak for how Netflix actually makes movie recommendations, but the fundamentals are largely intuitive.

If you keep ‘five-starring’ Stoner Comedy movies like the whole ‘Harold and Kumar’ series on Netflix, it makes sense for Netflix to assume that you may also enjoy ‘Ted’, or any other Stoner Comedy film on Netflix.

To make recommendations in a real world application, let’s take our intuition and apply it to a machine learning algorithm called Collaborative Filtering.

The following guide will be done in the ‘Octave’ programming language, so we can properly understand what is going on under the hood of collaborative filtering. Let’s get started.

Step 1 – Initialize The Movie Ratings

Simple but scalable scenario

  • 10 movies
  • 5 users
  • 3 features (we’ll discuss this in Step 3)

Here is an example diagram of movie ratings. Our rating system is from 1-10:

RatingsOne

Let’s initialize a 10 X 5 matrix called ‘ratings’; this matrix holds all the ratings given by all users, for all movies. Note: Not all users may have rated all movies, and this is okay.

Note 2: I simply made up some data for ‘ratings’. The point of this step is to simply start off with a dataset that we can work with.

This matrix below contains the same ratings data you saw in the picture above. Here is how we declare it in Octave:

ratings = [
8 4 0 0 4;
0 0 8 10 4;
8 10 0 0 6;
10 10 8 10 10;
0 0 0 0 0;
2 0 4 0 6;
8 6 4 0 0;
0 0 6 4 0;
0 6 0 4 10;
0 4 6 8 8];

Learner’s check:

  • Each column represents all the movies rated by a single user
  • Each row represents all the ratings (from different users) received by a single movie

 

Recall that our rating system is from 1-10. Notice how there are 0’s to denote that no rating has been given.

Step 2 – Determine Whether a User Rated a Movie

To make our life easier, let’s also declare a binary matrix (0’s and 1’s) to denote whether a user rated a movie.

1 = the user rated the movie.
0 = the user did not rate the movie.

Let’s call this matrix ‘did_rate’. Note it has the same dimensions as ‘ratings’, 10 X 5:

did_rate = ratings ~= 0;

This above command should give you the following binary matrix:

did_rate = 
1 1 0 0 1
0 0 1 1 1
1 1 0 0 1
1 1 1 1 1
0 0 0 0 0
1 0 1 0 1
1 1 1 0 0
0 0 1 1 0
0 1 0 1 1
0 1 1 1 1

Learner’s check:

  • did_rate(2, 3) = 1: This means the 3rd user did rate the 2nd movie
  • did_rate(6, 4) = 0: This means the 4th user did not rate the 6th movie

Step 3 – User Preferences and Movie Features/Characteristics

This is where it gets interesting. In order for us to build a robust recommendation engine, we need to know user preferences and movie features (characteristics). After all, a good recommendation is based on knowing this key user and movie information.

For example, a user preference could be how much the user likes comedy movies, on a scale of 1-5. A movie characteristic could be the degree to which the movie is considered a comedy, on a scale of 0-1.

Example 1: User preferences -> Sample preferences for a single user Chelsea

SampleUserPrefs

Example 2: Movie features -> Sample features for a single movie Bad Boys

SampleMovieFeature

Note: The user preferences are the exact same as the movie features; in other words, we can map each user preference to a movie feature. This makes sense; if a user has a huge preference for comedy, we’d like to recommend a movie with a high degree of comedy. If we add a new user preference, say ‘romantic comedy’, we should also add it as a new feature for each movie, so that our recommendation algorithm can fully use this feature/preference when making a prediction.

Note 2: We can use these numbers that I purposely came up with to ‘predict’ ratings for movies. For example, let’s predict what Chelsea would rate Bad Boys, below:

Chelsea's (C) rating (R) of Bad Boys (BB) is the sum of each preference multiplied by the matching movie feature:
R(C,BB) = (4.5 * 0.8) + (4.9 * 0.5) + (3.6 * 0.4)
R(C,BB) = 7.49
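
The same calculation can be written as a dot product in Octave. This is just a sanity check using the made-up numbers above (the variable names are mine, not part of the tutorial's code):

chelsea_prefs = [4.5 4.9 3.6];         % Chelsea's preferences for [comedy, action, romance], scale 1-5
bad_boys_features = [0.8 0.5 0.4];     % Bad Boys' degree of [comedy, action, romance], scale 0-1
predicted_rating = chelsea_prefs * bad_boys_features';   % dot product = 7.49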

5 big problems: This seems great, but:

  1. Who has time to sit down and come up with a list of features for users and movies?
  2. It would be very time consuming to come up with a value for each feature, for each and every user and movie.
  3. Why did I pick 1-5 as the range for user preferences and 0-1 as the range for movie features? It seems a bit forced.
  4. How does the product (multiplication) of user_prefs and movie_features magically give us a predicted rating?
  5. Why did I pick ‘comedy’, ‘romance’ and ‘action’ as the features? This seems manual and forced. There must be a better way to generate features

The solution: 

Before we dive deep into the collaborative filtering solution to answer our 5 big problems, let’s quickly introduce some key matrices that we’ll be needing.

The user features (preferences) can be represented by a matrix ‘user_prefs’. In our example, we have 5 users and 3 features. So, ‘user_prefs’ is a 5 X 3 matrix.

Here is an example diagram to help visualize the data ‘user_prefs’ contains:

SampleUserPrefs2

The movie features can also be represented by a matrix ‘movie_features’. In our example, we have 10 movies and 3 features. So, ‘movie_features’ is a 10 X 3 matrix.

Here is an example diagram to help visualize the data ‘movie_features’ contains:

SampleMovieFeatures2

Step 4: Let’s Rate Some Movies

I have a list of 10 movies here, in a text file:

1 Harold and Kumar Escape From Guantanamo Bay (2008)
2 Ted (2012)
3 Straight Outta Compton (2015)
4 A Very Harold and Kumar Christmas (2011)
5 Notorious (2009)
6 Get Rich Or Die Tryin' (2005)
7 Frozen (2013)
8 Tangled (2010)
9 Cinderella (2015)
10 Toy Story 3 (2010)

Now, let’s rate some movies. Our ratings can be represented by a 10 X 1 column vector my_ratings. Let’s initialize it to 0’s and make some ratings:

my_ratings = zeros(10, 1);
my_ratings(1) = 7;
my_ratings(5) = 8;
my_ratings(8)= 3;

Learner’s check:

  • I gave Harold and Kumar Escape From Guantanamo Bay a 7
  • I gave Notorious an 8
  • I gave Tangled a 3

Let’s update ratings and did_rate with our own ratings my_ratings:

ratings = [my_ratings ratings];
did_rate = [(my_ratings ~= 0) did_rate];

Learner’s check:

  • ‘ratings’ is now a 10 X 6 matrix
  • ‘did_rate’ is now a 10 X 6 matrix

Step 5: Mean Normalize All The Ratings

Once we get to Step 7: Minimize The Cost Function,  you may see why mean normalizing the ‘ratings‘ matrix is necessary.

What is mean normalization?

It is much easier to understand the ‘what’ if we understand the why. Why normalize the ‘ratings’ matrix?

Consider the following scenario:

A user (Christie) rated 0 movies. The collaborative filtering algorithm that we are about to build will then go on to predict that Christie will rate all movies as 0. You may see why in the later steps when we cover the cost function and gradient descent. Don’t worry about it for now.

This is no good, because then we won’t be able to suggest anything to Christie.  After all, a recommendation is simply based on what movie(s) we predict the user will rate the highest.

So how do we recommend a movie to a user who has never given a rating?

We simply suggest the highest average-rated movie. That’s the best we can do, since we know nothing about the user. This is made possible because of mean normalization.

What is mean normalization?

Mean normalization, in our case, is the process of making the average rating received by each movie equal to 0.

Take a look at our Step 1 example the ‘ratings’ matrix, again:

RatingsOne

Each row represents all the ratings received by one movie. Here’s how to normalize a matrix:

  1. Find the average of the ratings actually given in the 1st row (ignoring the 0’s). In other words, find the average rating received by the first movie, ‘Harold and Kumar Escape From Guantanamo Bay’
  2. Subtract this average from each rating (entry) that was actually given in the 1st row
  3. The first row has now been normalized. Its given ratings now have an average of 0.
  4. Repeat steps 1 & 2 for all rows.

Here is the implementation for mean normalization in Octave:

function [ratings_norm, ratings_mean] = normalizeRatings(ratings, did_rate)
  [m, n] = size(ratings);
  ratings_mean = zeros(m, 1);
  ratings_norm = zeros(size(ratings));
  for i = 1:m
    % all the indexes where there is a 1 (i.e. the movie was rated)
    idx = find(did_rate(i, :) == 1);

    % only find the mean over the ratings that were actually given
    ratings_mean(i) = mean(ratings(i, idx));
    ratings_norm(i, idx) = ratings(i, idx) - ratings_mean(i);
  end
end

We can call this function and capture its two return values (the normalized ratings matrix and the vector of per-movie mean ratings):

[ratings, ratings_mean] = normalizeRatings(ratings, did_rate);

Learner’s check:

‘ratings’ contains the normalized ‘ratings’ matrix. Of course, it’s still a 10 X 6 matrix. Here it is below:

ratings =
1.25000 2.25000 -1.75000 0.00000 0.00000 -1.75000
0.00000 0.00000 0.00000 0.66667 2.66667 -3.33333
0.00000 0.00000 2.00000 0.00000 0.00000 -2.00000
0.00000 0.40000 0.40000 -1.60000 0.40000 0.40000
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 -2.00000 0.00000 0.00000 0.00000 2.00000
0.00000 2.00000 0.00000 -2.00000 0.00000 0.00000
-1.33333 0.00000 0.00000 1.66667 -0.33333 0.00000
0.00000 0.00000 -0.66667 0.00000 -2.66667 3.33333
0.00000 0.00000 -2.50000 -0.50000 1.50000 1.50000

‘ratings_mean’ is a 10 X 1 column vector whose ith row contains the average rating of the ith movie. Here it is below:

ratings_mean =
5.7500
7.3333
8.0000
9.6000
8.0000
4.0000
6.0000
4.3333
6.6667
6.5000

Step 6: Collaborative Filtering via Linear Regression

If you are unfamiliar with how a linear regression works, these links should be helpful.

The simplest way to think about it is that we are simply fitting a line to a scatter plot, i.e. learning a relationship from the data points (in the case of a single-variable linear regression):

SimpleLinearRegression

In our case, we face a multi-linear regression problem. But don’t worry, we’ll briefly cover the intuition in a few seconds.

Helpful intuition: A user’s strong preference for comedy movies (e.g. 4.5/5) paired with a movie’s high ‘level of comedy’ (e.g. 0.8/1) tends to be positively correlated with the user’s rating for that movie. For the most part, this correlation is continuous.

Conversely, a user’s dislike of comedy (1/5), still paired with a movie’s high ‘level of comedy’ (e.g. 0.8/1), tends to be negatively correlated with the user’s rating for that movie. This is another reason for mean normalization. If you look at the normalized ‘ratings’ matrix above, there are some negative ratings. These ratings are negative because they are below that movie’s average.

If you are familiar with linear regression, you may know that its goal is to minimize the sum of squared errors (the squared differences between our predicted values and the observed values), in order to come up with the best learning algorithm for predicting new outputs, or in the case of a single-variable linear regression, the best ‘line of best fit’.

Note: In our case, we face a multi-linear regression problem, since we have more than 1 feature.

A linear regression is associated with some cost function; our goal is to minimize this cost function (Step 7), and thus minimize the sum of squared errors.

A vectorized implementation of a linear regression is as follows:

Y = X * θᵀ

Learner’s check:

  • θ is our parameter (user preferences, in our case) vector
  • X is our vector of features (movie features, in our case)

To fit our example, we can rename the variables as such:

ratings = movie_features * user_prefsᵀ

We want to simultaneously find optimal values of movie_features and user_prefs such that the sum of squared errors (cost function) is minimized. How can we do this?

Step 7: Minimize The Cost Function

We will allow our collaborative filtering algorithm to simultaneously come up with the appropriate values of ‘movie_features’ and ‘user_prefs’ by minimizing the sum of squared errors through an iterative optimization process similar to gradient descent. For our case, the minimization function we’ll be using in Octave is fmincg.

Note: If you are unfamiliar with gradient descent, worry not. All you need to understand is that gradient descent is an iterative algorithm that helps us minimize, in our specific case, the sum of squared errors. Consequently, we will have ‘learned’ the appropriate values of ‘user_prefs’ and ‘movie_features’ to make accurate predictions on movie ratings for every user.

We need to provide our fmincg function with 2 things: a cost function and its gradients (the partial derivatives of the cost function with respect to each parameter).

Here is my cost function, with regularization (to prevent overfitting, i.e high variance):

predictions = X * Theta';
difference = predictions - Y;
J = sum(difference(R==1) .^ 2) / 2;
thetaReg = sum(sum(Theta .^ 2)) * (lambda / 2);
xReg = sum(sum(X .^ 2)) * (lambda / 2);
J = J + thetaReg + xReg;

Learner’s check:

  • Remember, this code is inside a function. When this function is executed and returns its matrices, the X matrix will hold the learned data (numbers) for ‘movie_features’ and the Theta matrix will hold the learned data (numbers) for ‘user_prefs’. You will see this shortly, if you are confused

 

Here is my gradient code:

for i = 1 : num_movies
withoutReg = ((X(i, :) * Theta' - Y(i, :)) .* R(i, :) * Theta);
reg = lambda * X(i, :);
X_grad(i, :) = withoutReg + reg;
end;
for j = 1 : num_users
withoutReg = ((X * Theta(j, :)' - Y(:, j)) .* R(:, j))' * X;
reg = lambda * Theta(j, :);
Theta_grad(j, :) = withoutReg + reg;
end;

And here is the full implementation of the entire function (calculating cost and its gradients):

function [J, grad] = costFunc(params, Y, R, num_users, num_movies, ...
num_features, lambda)
X = reshape(params(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(params(num_movies*num_features+1:end), ...
num_users, num_features);
J = 0; % cost (sum of squared differences)
X_grad = zeros(size(X)); % (partial derivatives of J with respect to X (movie_features))
Theta_grad = zeros(size(Theta)); % (partial derivatives of J with respect to Theta (user_prefs))

% Cost function with regularization
predictions = X * Theta';
difference = predictions - Y;
J = sum(difference(R==1) .^ 2) / 2;

thetaReg = sum(sum(Theta .^ 2)) * (lambda / 2);
xReg = sum(sum(X .^ 2)) * (lambda / 2);

J = J + thetaReg + xReg;

% gradients
for i = 1 : num_movies
withoutReg = ((X(i, :) * Theta' - Y(i, :)) .* R(i, :) * Theta);
reg = lambda * X(i, :);
X_grad(i, :) = withoutReg + reg;
end;

for j = 1 : num_users
withoutReg = ((X * Theta(j, :)' - Y(:, j)) .* R(:, j))' * X;
reg = lambda * Theta(j, :);
Theta_grad(j, :) = withoutReg + reg;
end;

grad = [X_grad(:); Theta_grad(:)];

end

Before we actually execute this function, we need to initialize our parameters user_prefs (Theta) and movie_features (X) to random small numbers. To do this in Octave, I have used the randn function. This function returns a matrix of random elements that are normally distributed, with a mean of 0 and a variance of 1:

num_users = size(ratings, 2);
num_movies = size(ratings, 1);
num_features = 5;
% Initialize Parameters Theta (user_prefs), X (movie_features)
movie_features = randn(num_movies, num_features);
user_prefs = randn(num_users, num_features);
initial_parameters = [movie_features(:); user_prefs(:)];

Now, let’s set some options for our cost minimizing fmincg function:

options = optimset('GradObj', 'on', 'MaxIter', 100);

Finally, let’s run fmincg, which will consequently run costFunc 100 times. Notice, fmincg takes our costFunc function as an argument. This is what fmincg needs to minimize our cost function and calculate the best learning algorithm for predicting movie ratings:

lambda = 10; % regularization weight/parameter
optimal_prefs_and_features = fmincg (@(t)(costFunc(t, ratings, did_rate, num_users, num_movies, ...
num_features, lambda)), ...
initial_parameters, options);

Learner’s check:

  • If you are unfamiliar with regularization, you don’t need to worry about what lambda means.
  • optimal_prefs_and_features is the column vector returned from fmincg. It contains optimal values for user preferences and movie features that minimize our cost function

We need to extract ‘user_prefs’ and ‘movie_features’ from optimal_prefs_and_features, so we can start making some predictions:

movie_features = reshape(optimal_prefs_and_features(1:num_movies*num_features), num_movies, num_features);
user_prefs = reshape(optimal_prefs_and_features(num_movies*num_features+1:end), ...
num_users, num_features);

Step 8: Make Movie Predictions!…Finally

Recall Step 4: Let’s Rate Some Movies, where we rated a few movies. Now, let’s use the learning algorithm we just built, together with our ‘my_ratings’ column vector, to predict the ratings we would give each movie:

all_predictions = movie_features * user_prefs';
my_predictions = all_predictions(:,1) + ratings_mean;

‘my_predictions’ is a 10 X 1 column vector:

my_predictions =
5.7500
7.3333
8.0000
9.6000
8.0000
4.0000
6.0000
4.3333
6.6667
6.5000

Learner’s check:

  • Recall in Step 5 where we mean normalized all the ‘ratings’. Since we subtracted the mean of the movie’s ratings from each rating for that movie, we added back ‘ratings_mean’ to our predicted ratings.

Let’s display our predictions:

% movieList is assumed to be a 10 x 1 cell array holding the movie titles from Step 4
[r, ix] = sort(my_predictions, 'descend');
fprintf('\nTop recommendations for you:\n');
for i = 1:10
  j = ix(i);
  fprintf('Predicting rating %.1f for movie %s\n', my_predictions(j), ...
    movieList{j});
end

fprintf('\n\nOriginal ratings provided:\n');
for i = 1:length(my_ratings)
  if my_ratings(i) > 0
    fprintf('Rated %d for %s\n', my_ratings(i), ...
      movieList{i});
  end
end

The result looks as follows:

Top recommendations for you:
Predicting rating 9.6 for movie Straight Outta Compton (2015)
Predicting rating 8.0 for movie A Very Harold and Kumar Christmas (2011)
Predicting rating 8.0 for movie Notorious (2009)
Predicting rating 7.3 for movie Ted (2012)
Predicting rating 6.7 for movie Cinderella (2015)
Predicting rating 6.5 for movie Toy Story 3 (2010)
Predicting rating 6.0 for movie Frozen (2013)
Predicting rating 5.8 for movie Harold and Kumar Escape From Guantanamo Bay (2008)
Predicting rating 4.3 for movie Tangled (2010)
Predicting rating 4.0 for movie Get Rich Or Die Tryin' (2005)
Original ratings provided:
Rated 7 for Harold and Kumar Escape From Guantanamo Bay (2008)
Rated 8 for Notorious (2009)
Rated 3 for Tangled (2010)

Step 9: Take It Further

You should try to build your own recommendation engine. Perhaps not just for movies, but for anything else you can think of. We can’t always find what we are looking for by ourselves. Sometimes a good recommendation is all we need.

Perhaps you can implement a clustering algorithm such as k-means or DBSCAN to group users with similar features together, and thereby recommend the same movies to users belonging to the same cluster.
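
For instance, here is a rough, self-contained k-means sketch over the learned 'user_prefs' matrix from Step 7. The number of clusters K, the iteration count, and the variable names are arbitrary choices for illustration, not part of the original tutorial:

K = 2;                                % number of user clusters (arbitrary for this sketch)
num_users = size(user_prefs, 1);
centroids = user_prefs(randperm(num_users, K), :);   % pick K random users as the initial centroids
assignments = zeros(num_users, 1);
for iter = 1:20
  % assign each user to the nearest centroid
  for i = 1:num_users
    [~, assignments(i)] = min(sum((centroids - user_prefs(i, :)) .^ 2, 2));
  end
  % move each centroid to the mean of the users assigned to it
  for k = 1:K
    if any(assignments == k)
      centroids(k, :) = mean(user_prefs(assignments == k, :), 1);
    end
  end
end
% users sharing an 'assignments' value could be shown each other's highly rated movies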

In our example, the more movies you rate, the more ‘personalized’ (and possibly accurate) your recommendations will be. This is because you are giving the recommendation engine (learning algorithm) more of your data to observe and learn from.

So, maybe if you actually ‘Netflix and chill’ed more often, Netflix will know you better and make better movie recommendations for you 😉

Nikhil Bhaskar

*Original post here*

Source: Movie Recommendations? How Does Netflix Do It? A 9 Step Coding & Intuitive Guide Into Collaborative Filtering by nbhaskar