Finding success in your data science career? Find a mentor
Yes, most of us don't feel the need, but most of us really could use one. Since most data science professionals work in their own silos, getting an unbiased perspective is not easy. Often it is also not easy to see how a data science career will progress. A network of mentors addresses these issues easily: it gives data professionals an outside perspective and an unbiased ally. It's extremely important for successful data science professionals to build a mentor network and lean on it throughout their careers.
[ DATA SCIENCE Q&A]
Q: What is a random forest? Why is it good?
A: Random forest (intuition):
– Underlying principle: several weak learners combined provide a strong learner
– Builds several decision trees on bootstrapped training samples of data
– On each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates, out of all p predictors
– Rule of thumb: at each split, m = √p
– Predictions: by majority vote (or averaging, for regression)
Why is it good?
– Very good performance (decorrelates the features)
– Can model non-linear class boundaries
– Generalization error for free: no cross-validation needed; the out-of-bag samples give an unbiased estimate of the generalization error as the trees are built
– Generates variable importance
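The points above can be sketched in a few lines, assuming scikit-learn is available (the dataset and parameter values here are illustrative, not part of the original Q&A):

```python
# Sketch: random forest with the sqrt(p) rule, out-of-bag error,
# and variable importance (toy data; parameters are illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16,
                           n_informative=5, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # the m = sqrt(p) rule of thumb at each split
    oob_score=True,        # "generalization error for free"
    random_state=0,
)
forest.fit(X, y)

print("OOB accuracy:", round(forest.oob_score_, 3))
print("Most important feature:", int(np.argmax(forest.feature_importances_)))
```

The `oob_score_` attribute is the out-of-bag estimate mentioned above, and `feature_importances_` is the per-variable importance.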
Master Data Management (MDM) is the process of establishing and implementing standards, policies and tools for the data that is most important to an enterprise, including but not limited to information on customers, employees, products and suppliers.
In business, master data management comprises the processes, governance, policies, standards and tools that consistently define and manage the critical data of an organization to provide a single point of reference.
The data that is mastered may include:
master data – the business objects for transactions, and the dimensions for analysis
reference data – the set of permissible values to be used by other data fields
transactional data – supports applications
analytical data – supports decision making
In computing, an MDM tool can be used to support master data management by removing duplicates, standardizing data (mass maintenance), and incorporating rules that keep incorrect data from entering the system, in order to create an authoritative source of master data. Master data are the products, accounts and parties for which business transactions are completed. The root problem stems from business unit and product line segmentation, in which the same customer is serviced by different product lines, with redundant data being entered about the customer (aka the party in the role of customer) and the account in order to process the transaction. The redundancy of party and account data is compounded in the front-to-back-office life cycle, where an authoritative single source for party, account and product data is needed but is often once again redundantly entered or augmented.
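As an illustrative sketch of the de-duplication and standardization an MDM tool performs, assuming pandas (the customer records and cleanup rules here are made up for illustration):

```python
# Sketch: standardize and de-duplicate party records to create a
# single authoritative master record per customer (toy data).
import pandas as pd

customers = pd.DataFrame({
    "name":  ["Acme Corp", "ACME CORP ", "Beta LLC", "beta llc"],
    "phone": ["555-0100", "555-0100", "555-0199", "555-0199"],
})

# Standardization (mass maintenance): trim whitespace, normalize case
customers["name"] = customers["name"].str.strip().str.title()

# Remove duplicates so each party has one authoritative record
master = customers.drop_duplicates().reset_index(drop=True)
print(master)
```

Real MDM hubs add survivorship rules, fuzzy matching and governance on top, but the core idea is the same: one clean record per party.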
So, with a task this important, master data management must be designed appropriately and after careful consideration of the many details that determine the success or failure of the project. Following are the top 21 best practices to consider when putting together a good data management strategy.
1. Define “What is the business problem we’re trying to solve?”:
With so much data and so many disparate data sources, it is very easy to get lost in translation. A mental road map of the overall objective will help keep the effort streamlined.
2. Understand how the project helps to prep you for big data:
Yes, growing data is a concern, and it should be sorted out at the planning stage. It is important to identify how the master data management strategy will prepare your organization not only for generic enterprise data but also to cope with ever-increasing big data.
3. Devise a good IT strategy:
A good IT strategy always goes hand in hand with a good data strategy. A dysfunctional IT strategy can throw off even the most efficiently designed data management strategy, while a good one increases the chances of success for an MDM strategy by several degrees.
4. Business “users” must take full ownership of the master data initiative:
It's important that the business and its users take full ownership of the initiative. Well-defined ownership saves the project from the communication failures that are so often responsible for project failure.
5. Allow ample time for evaluation and planning:
A well-laid-out planning stage ensures all the cracks and crevices are sorted out before the project is rolled out. A rushed project often increases the risk of failure. Don't underestimate the time and expertise needed to develop foundational data models.
6. Understand your MDM hub's data model and how it integrates with your internal source systems and external content providers:
When data model problems cropped up relatively late in the project, whether it was a disconnect between the hub and an important source system, or a misalignment between data modeled in the hub and an external information provider, it was very disruptive. These problems can be avoided by really understanding how the hub is designed, and then mapping that back to your source systems and your external information sources.
7. Identify the project’s mission and business values:
This is another important area that needs its due attention. A clear project mission and business value definition help make sure a high ROI is planned for from the start. One must link the initiative to actionable insights.
8. Choose the best technology platform:
Choosing a good technology is important as well. Remember, you don't change your technology daily, so putting some thought and research into it makes a lot of difference in the sustainability of the project. A good technology should support the organization's growth for the next several years without presenting too many bottlenecks.
9. Be real and plan a multi-domain design:
In the real world, many MDM technologies grew up managing one particular type of master data. A good strategy must be consistent across domains, so applying the same approach to the various master data domains, whether customer, product, asset, supplier, location or person, is a good strategy.
10. Active, involved executive sponsorship:
Most organizations are very comfortable with their "islands of data" and with technology being implemented in silos. For someone in the organization to come along and suggest changing that status quo, and to start managing critical information centrally, treating it as a true corporate asset, is going to mean some serious cultural change.
11. Use a holistic approach: people, process, technology and information:
This may be the most important best practice. You've got to start with the people, the politics, the culture, and then make sure you spend at least as much time on the business processes involved in data governance and data stewardship. These really deserve a separate article of their own.
12. Pay attention to organizational governance:
You must have a very strong governance model that addresses issues such as change management and knowledge transfer. After all, the culture of an organization is a powerful force, and a deliberate plan to de-risk the project against it ensures success.
13. Build your processes to be ongoing and repeatable, supporting continuous improvement:
Data governance is a long-term proposition. As long as an enterprise is in business, it will be creating, modifying, and using master data. So if everyone in the company relies on those data, but no one is specifically accountable for maintaining and certifying their level of quality, it shouldn't be a surprise that, over time, like everything else, they become more and more chaotic and unusable. So plan from the beginning for a "way of life", not a project.
14. Have a big vision, but take small steps:
Consider the ultimate goal, but limit the scope of the initial deployment, users told Ventana. Once master data management is working in one place, extend it step by step, they advised. Business processes, rather than technology, are often the limiting factor, they said, so it's important to get end-user input early in the process.
15. Consider potential performance problems:
Performance is the 800-pound gorilla quietly lurking in the master data management discussion, Loshin cautioned. Different architectures can carry different performance penalties, so leave some room for tuning and repair.
16. Management needs to recognize the importance of a dedicated team of data stewards:
Just as books belong in a library and a library needs librarians, master data belongs in a dedicated repository of some type, and that repository needs to be managed by data stewards. It is crucial to start by convincing management of the need for a small team of data stewards who are 100% dedicated to managing the enterprise's master data.
17. Consider the transition plan:
Then, there's the prospect of rolling out a program that has an impact on many critical processes and systems, which is no trivial concern. Loshin recommended that companies plan a master data management transition strategy that allows for static and dynamic data synchronization.
18. Resist the urge to customize:
Now that commercial off-the-shelf hub platforms have matured a bit, it should be easier to resist the temptation to get under the hood and customize them. Most vendors are still revving their products as often as twice a year, so you definitely don't want to get into a situation where you are "rev locked" to an older version.
19. Stay current with vendor-provided patches:
Given the frequency of point releases, patches and major upgrades, you should probably plan for at least one major upgrade during the initial implementation, and be sure to build "upgrade competency" into the team that will maintain the hub platform after the initial project goes live.
20. Carefully plan deployment:
With increasing MDM complexity, training of business and technical people is more important than ever. Relying on untrained or semi-trained systems integrators, and on outsourcing, has caused major problems and project delays for master data management users.
21. Test, test, test and then test again:
This is like the old saying about what's important in real estate: "location, location, location". Your MDM hub environment is going to be different, by definition, from every other environment in the world.
Before we dive into the topic, I want to take a step back and explain what a QR Code is: QR Code (abbreviated from Quick Response Code) is the trademark for a type of matrix barcode (or two-dimensional code) first designed for the automotive industry. More recently, the system has become popular outside the industry due to its fast readability and large storage capacity compared to standard UPC barcodes. The code consists of black modules (square dots) arranged in a square pattern on a white background. The information encoded can be made up of four standardized kinds ("modes") of data (numeric, alphanumeric, byte/binary, Kanji), or, through supported extensions, virtually any kind of data. (per Wikipedia)
To me, the QR Code is an amazing magic wand that has the power to connect the analog world to the digital world. It has the power to engage motivated customers who scan a QR Code and convert them into loyalists. From the day I was introduced to QR Codes to today, I have been extremely excited about what the QR Code is worth, but at the same time disappointed by how underutilized it is. For the sake of this blog, and to understand what nearby stores are doing with their QR Codes, I visited my nearest mall and photographed the first few QR executions I saw. To my surprise, it did not take me long to find and snap quick photos of a few different types of QR implementations. But the amazing thing is that they are all doing it wrong. I will get to that soon. The QR Code is facing some challenges with adoption, but with capable mobile devices, it is bound to pick up if it has not already. With this slow QR Code adoption, the last thing retailers need is a lousy execution scaring users away from this amazing digital wonder of the world.
So, what are retailers doing wrong?
The QR Codes deployed cover use cases ranging from "sign up for our mailing list" and "download our app" to "visit our social page". There was no consistency in execution; every store wants users to juggle in a different way. Below are 5 use cases that I came across. It is very likely that most retail store QR Code implementations fall into one of these. I can understand that retailers are still experimenting with QR Code projects and measuring the impact. But consider this: a user who is motivated to scan a QR Code puts in considerable effort and plenty of clicks to get to the other side. So, what is it all worth: a Facebook like, a Twitter follow, an app download or a mailing list signup? Having a QR Code should be treated like having a domain. Try having one domain name point to all of those services. Just like domain names, QR Codes are precious. A QR Code is a perfect way to engage an already committed user, so why throw a vague call to action at them? Why not grab their attention for something that is a win-win for both the retailer and the customer?
Following are implementations of retail QR Codes – "The Good, The Bad and The Ugly".
1. I got this image from someone and found it very interesting to share. It has some pros and cons to it.
The Good: QR Code is sitting in the primary location, gate is the first interface and attaching QR Code made it easily accessible. So, kudos for that.
The Bad: Is a "Facebook like" or "Twitter follow" that important? If a user jumps through hoops to scan a QR Code just to like a Facebook page, is that appreciative of their effort? Is it providing enough value to the retailer or the user?
The Ugly: See at the end.
2. This is another example from a smoothie joint near the mall area, closer to my place.
The Good: It is great to separate interest groups; people with different intents will pick the appropriate call to action. Here the Yelp and Facebook audiences are given separate QR Codes.
The Bad: Confusion. With limited adoption and involvement, it is way too risky to have 2 QR Codes. It also exposes the campaign to technical issues: what if a user scans from a distance and both codes show up in the frame? This implementation raises more questions than answers.
The Ugly: See at the end.
3. This one is from a Van Heusen outlet store in a nearby mall.
The Good: $5 is real appreciation for the effort the user is going through; it gratifies users for their effort. The $5 also works magic when it comes to drawing eyes to the banner.
The Bad: A confusing plate. Two offers are bundled into one plate, which could confuse users, and text and QR Code are packaged together.
The Ugly: See at the end.
4. This image was taken at a local GMC store. I like the way they explained the use case, and I have no problem understanding how to use it. But then, I am not sure it's usable for every audience.
The Good: A very well-laid-out plan for how to engage with the use case.
The Bad: Only caters to deal hunters; what if you are not here for deals?
The Ugly: See at the end.
5. This image was taken from a nearby Costco. I visit there often and never paid attention to this banner until recently.
The Good: The position of the banner. It was placed right above the checkout counter, so if a long queue is waiting, users could use that waiting time to engage with the banner.
The Bad: A wordy banner, with fine print, asking the user to download an app. An app is a very intimate commitment for users, given the limited real estate on a mobile phone. It asks too much of a user who is just waiting in line.
The Ugly: Almost all of the use cases suffer from the same issue: they do not create a bi-directional engagement interface with the user. A QR Code is scanned when the user is physically present in the store. At that moment, it is not known what the user needs, so selling them something without knowing what they want to buy is not a great idea. It is therefore important for retail stores to provide a dashboard that addresses the user's current need first (tools to help a browsing user) and, once the current issue is addressed, provides an opportunity to convert those users by offering app download links, social follow buttons or email newsletter signups.
From the observations above, a few things stand out. Retailers get the importance of the QR Code but still lack a use case that helps customers engage better with their brand. As with all great brands, listening is as important a task as talking. So why should QR Codes be any different? They should give retailers the ability to listen to customers as well as talk to them. Therefore, a QR Code should be thought of primarily as a tool to engage an active customer.
It is important to look at the QR Code through a different lens. Unlike a Facebook like, a mobile app or a social follow, a QR Code is used when the user is actively engaged in a store, so selling them engagement tools for the future is not very targeted. One could obviously cross-sell those tools on the landing page after the user scans the QR Code, but QR Code interfaces should be handled differently and should not be mixed up with loyalty tools.
So, an ideal QR Code implementation should have the following components:
1. Single QRCode addressing all the needs of the user.
2. A well accessible placement of the QRCode, making it easily discoverable.
3. Well laid out procedure to help users engage with QRCode.
4. The QR Code bringing users to a super dashboard that helps them in every way it can, e.g. product descriptions, deals, specials, live chats, app links etc.
5. Providing capability for users to leave comments, complaints, suggestion and fill surveys.
6. Ability to further help users extend the engagement by providing links to social media channels, apps, email list, newsletters etc.
7. Providing access to email list.
Based on the business, the users, and the use case, there may be more or fewer use cases than stated above, but the overall coverage should be pretty much the same.
So, my advice to all retailers: get back to the whiteboard and rethink your existing QR Code strategy. It is a big pot of gold if done right. With the holiday season approaching, this could be a great opportunity to connect with the masses and engage them by designing a perfect system.
A strong business case could save your project
Like anything in corporate culture, a project is oftentimes about the business, not the technology. The same thinking applies to data analysis: it's not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This ensures smooth adoption, easy buy-ins, room for wins and cooperative stakeholders. So, a good data scientist should also possess some qualities of a good project manager.
[ DATA SCIENCE Q&A]
Q:How to clean data?
A: 1. First: detect anomalies and contradictions
* Tidy data: (Hadley Wickham's paper)
column names are values, not names, e.g. 26-45
multiple variables are stored in one column, e.g. m1534 (males 15-34 years old)
variables are stored in both rows and columns, e.g. tmax, tmin in the same column
multiple types of observational units are stored in the same table. e.g, song dataset and rank dataset in the same table
*a single observational unit is stored in multiple tables (can be combined)
* Data-Type constraints: values in a particular column must be of a particular type: integer, numeric, factor, boolean
* Range constraints: number or dates fall within a certain range. They have minimum/maximum permissible values
* Mandatory constraints: certain columns can't be empty
* Unique constraints: a field must be unique across a dataset: each person must have a unique SS number
* Set-membership constraints: the values for a column must come from a set of discrete values or codes: e.g. a gender recorded as female or male
* Regular expression patterns: for example, phone number may be required to have the pattern: (999)999-9999
* Missing values
* Cross-field validation: certain conditions that utilize multiple fields must hold. For instance, in laboratory medicine, the different white blood cell counts must sum to 100% (they are all percentages). In a hospital database, a patient's date of discharge can't be earlier than the admission date
2. Clean the data using:
* Regular expressions: misspellings, regular expression patterns
* KNN-impute and other missing values imputing methods
* Coercing: data-type constraints
* Melting: tidy data issues
* Date/time parsing
* Removing observations
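The detection steps above can be sketched with pandas (toy data; the specific constraints and column names are illustrative, not from the original answer):

```python
# Sketch: detecting constraint violations before cleaning (toy data).
import pandas as pd

df = pd.DataFrame({
    "age":    ["34", "151", None, "28"],            # type + range + missing
    "gender": ["female", "male", "male", "robot"],  # set-membership
    "phone":  ["(999)555-1234", "12345",
               "(999)555-9999", "(999)555-0000"],   # regex pattern
})

# Data-type coercion: ages should be numeric
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Range constraint: ages must fall in [0, 120]
bad_range = df[(df["age"] < 0) | (df["age"] > 120)]

# Set-membership constraint on gender
bad_gender = df[~df["gender"].isin({"female", "male"})]

# Regular-expression pattern for phone numbers: (999)999-9999
bad_phone = df[~df["phone"].str.match(r"\(\d{3}\)\d{3}-\d{4}$")]

# Missing values (mandatory constraint)
missing = df[df["age"].isna()]

print(len(bad_range), len(bad_gender), len(bad_phone), len(missing))
```

Each flagged subset can then be fixed with the corresponding technique from step 2 (coercion, imputation, or removing the observation).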
Data scientists are not a dime a dozen, and they are not in abundance either. The buzz around big data has produced a job category that is not only confusing but has been costing companies a lot as they comb through the talent pool to dig up a so-called data scientist. So, what exactly is the problem, and why are we suddenly seeing lots of data scientists emerging from nowhere with very different skill sets? To understand this we need to understand the big data phenomenon.
With the emergence of big data companies like Google, Facebook and Yahoo, and their amazing contributions to open source, new platforms have been developed to process huge amounts of data on commodity hardware in fast yet cost-efficient ways. Now every company wants to get savvier about managing data to gain insights and ultimately build a competitive edge over their competitors. But companies are used to understanding small pieces of data through their business analysts. Add more data and more tools, and who fits in? So they went on the lookout for a special breed of professional with the capability to deal with big data and its hidden insights.
So, where is the problem? The problem lies in the fact that only one job title emerged from this phenomenon: data scientist. Professionals who were already practicing some data science via business analysis, data warehousing or data design jumped on the bandwagon and grabbed the title of data scientist. What is interesting here is that the data scientist job, as explained above, does not fit into a single job description and should be handled accordingly. It was never a magical job title with all the answers for any data-curious organization wanting to understand, develop and manage a data project.
Before we go into what companies should do, let's reiterate what a data scientist is. As the name suggests, it is something to do with data and science, which means the job description should include data engineering, data automation, and scientific computing with a hint of business capability. If we extrapolate, we are looking at a professional with a computer science degree, a doctorate in statistical computing and an MBA in business. What luck would you have finding that candidate, who, by the way, should have some industry domain expertise as well? What is the likelihood that such a talent exists? Rare. But even if such talent were abundant, companies should tackle this problem at a more granular and sustainable scale. One more thing to note: no two data scientist job requirements are the same. Your data scientist requirement could be extremely different from what anyone else is looking for in a data scientist. So why should we have one title for such a diverse category?
So, what should companies do? First, it is important to understand that companies are building data scientist capabilities and should not be hiring a herd of data scientists. This means that companies and hiring managers should understand that they are not looking for a particular individual but for a team as a solution. It is important for businesses to clearly articulate the magic skill sets that their so-called data scientist should carry. Following this drill, companies should split the skill set into categories: data analyst, business analyst, data warehousing professional, software developer, and data engineer, to name a few. Finding a common island where business analysts, statistical computing modelers and data engineers work in harmony on a system that handles big data is a great start. Think of it as putting together a central data office. Huh! Another buzzword. Don't worry; I will go into more detail in follow-up blogs. Think of it as a department where business, engineering and statisticians work together on a common objective. Data science is nothing but the art of finding value in lots of data, and big data is about building the capability to parse and analyze lots of data. So a business should work through its laundry list of skills: first identify internal resources that could cover that list, and then form a hard matrix structure to prove the idea of a set of people working together as a data scientist. By the way, I am not saying that you need one individual from each category; rather, together the team should have all the skills mentioned above.
One important takeaway for companies is that the moment they come across a so-called data scientist, they should understand which side of data science the talent represents. Placing that talent in the respective silo provides a clearer vision when it comes to understanding the talent and spotting the voids that remain if the other roles are not filled. Living in this convoluted world of the data scientist is hard and tricky, but with some chops in understanding data science as a talent, companies can really play the big data talent game to their advantage, lure some cutting-edge people and grow sustainably.
When you encounter apparently unmanageable and insurmountable Big Data sets, it seems practically impossible to get easy access to the correct data. The fact is, Big Data management could prove to be highly tricky and challenging and may come up with a few issues. However, effective data access could still be attained. Here are a few strategies for effectively achieving superlative data connectivity.
Hadoop is an ecosystem designed to help organizations store mammoth quantities of Big Data. As companies handle the challenge of integrating Big Data with their existing data infrastructure, it is important to have a sound understanding of how to successfully bring your Big Data into, and take it out of, Hadoop so that you can move ahead effectively.
Integrating Cloud Data with Already Existing Reporting Applications
Integrating Cloud Data with already existing reporting applications such as Salesforce Dx has totally transformed the way you perceive and work with your customer data. These systems, however, can face certain complications in acquiring real-time reports. You rely on these reports for sound business decision-making, which creates demand for an effective solution that allows this kind of real-time reporting.
Do Not Let the Sheer Scale of Big Data Get to You
Big Data can be hugely advantageous for businesses, but if your organization is not ready to handle it effectively, you may have to do without the business value Big Data actually has on offer. Only some organizations have the scalable, flexible data infrastructure required to exploit Big Data for crucial business insight.
Access Salesforce Data via SQL
Salesforce data provides great value for numerous organizations; however, access issues can be a major obstacle that prevents organizations from reaping the fullest possible advantage. Now, though, businesses can get easy access to Salesforce data through ODBC and SQL. These smart drivers allow you to create a connection and start running your queries in just a few minutes.
Do Accurate Analysis of Big Data
The accuracy of your Big Data analysis depends on the technology utilized. There are several Big Data platforms a company can choose, such as Apache Spark and Hadoop, that can deliver unique and accurate analysis of Big Data sets, and more cutting-edge Big Data technology generates more state-of-the-art Big Data models. Many organizations opt for a reliable Big Data provider. With the great variety of options open to them, businesses today can easily locate a Big Data provider that suits their specific requirements and delivers accurate results.
Business organizations must take extra initiative in assessing and analyzing the data they collect. They must make sure the data comes from an authentic and reliable source and identify the context behind its generation. Every step of the analysis process must be observed carefully, from proper data ingestion to enrichment and preparation. Protecting the data from external interference is essential.
Sujain Thomas is a Salesforce consultant and discusses the benefits of Salesforce Dx in her blog posts. She is an avid blogger and has an impressive fan base.
Save yourself from the zombie apocalypse of unscalable models
One living, breathing zombie in today's analytical models is the glaring absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated: as business models rake in more data, the error bars keep them sensible and in check. If error bars are not accounted for, we make our models susceptible to failure, leading us to a Halloween we never want to see.
[ DATA SCIENCE Q&A]
Q:Is it better to design robust or accurate algorithms?
A: A. The ultimate goal is to design systems with good generalization capacity, that is, systems that correctly identify patterns in data instances not seen before
B. The generalization performance of a learning system strongly depends on the complexity of the model assumed
C. If the model is too simple, the system can only capture the actual data regularities in a rough manner. In this case, the system has poor generalization properties and is said to suffer from underfitting
D. By contrast, when the model is too complex, the system can identify accidental patterns in the training data that need not be present in the test set. These spurious patterns can be the result of random fluctuations or of measurement errors during the data collection process. In this case, the generalization capacity of the learning system is also poor. The learning system is said to be affected by overfitting
E. Spurious patterns, which are only present by accident in the data, tend to have complex forms. This is the idea behind the principle of Occam's razor for avoiding overfitting: simpler models are preferred if more complex models do not significantly improve the quality of the description of the observations
Quick response: Occam's razor. It depends on the learning task. Choose the right balance
F. Ensemble learning can help balance bias/variance (several weak learners together = a strong learner)
Everybody has an opinion about Steve Jobs. Please tell me what you think of him and how he has impacted your life in this brief survey.
Steve Jobs, co-founder of Apple, passed away earlier this week at the age of 56. In the process of writing about how he impacted my life in my blog, I created an image of him. To make this image, I collected quotes and articles that were written about him in the day following his passing. The quotes were from such notables as President Obama, Mark Zuckerberg, Guy Kawasaki, and Bill Gates, to name a few. Using these descriptive words about Steve Jobs, I created a word cloud in the form of his soon-to-be iconic image on Apple.com.
In the word cloud, the font size of a word reflects how frequently it was used to describe Steve Jobs; the larger the font, the more often the word appears. This picture essentially represents how these people define and remember him.
I now want to be more purposeful in creating the same image using words from people who never met him but whose lives may have been impacted by him. Could you please complete my one-minute survey about Steve Jobs? I am also going to conduct sentiment analysis on your comments to understand the sentiment behind them. So… your survey responses help to create art and advance science. In addition to feeling good about yourself, I will notify you when this project is completed (if you provide your email address in the survey).
Keeping Biases in Check During the Last Mile of Decision Making
Today a data-driven leader, a data scientist, or any data expert is constantly put to the test, helping his team solve problems with his skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which introduces a bias into our judgement and taints the resulting suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find ourselves caught in biases that impair judgement. So it is important to keep intuition bias in check when working on a data problem.
[ DATA SCIENCE Q&A]
Q:You have data on the durations of calls to a call center. Generate a plan for how you would code and analyze these data. Explain a plausible scenario for what the distribution of these durations might look like. How could you test, even graphically, whether your expectations are borne out?
A: 1. Exploratory data analysis
* Histogram of durations
* Histogram of durations per service type, per day of week, per hour of day (durations can be systematically longer from 10am to 1pm, for instance), per employee
2. Distribution: lognormal?
3. Test graphically with a QQ plot: sample quantiles of log(durations) vs. normal quantiles
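The QQ-plot check in step 3 can be sketched numerically: if durations are lognormal, then log(durations) is normal, and the sample quantiles of log(durations) plotted against theoretical normal quantiles should fall on a straight line. A minimal sketch with simulated, entirely hypothetical data:

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(42)

# Hypothetical call durations, simulated as lognormal
# (so log(duration) is normally distributed by construction)
durations = [random.lognormvariate(4.0, 0.6) for _ in range(1000)]

log_d = sorted(math.log(d) for d in durations)
n = len(log_d)

# Theoretical normal quantiles at plotting positions (i + 0.5) / n
norm_q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# On a QQ plot we would draw norm_q (x) against log_d (y); here we
# check linearity numerically via the Pearson correlation
def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(norm_q, log_d)
print(round(r, 3))  # close to 1.0 when the lognormal assumption holds
```

With real call-center data, a correlation well below 1 (or visible curvature on the plot) would argue against the lognormal hypothesis.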
Given that hospitals have a variety of metrics at their disposal, it would be interesting to understand how these different metrics are related to each other. Do hospitals that receive higher PX ratings (e.g., more satisfied patients) also have better scores on other metrics (lower mortality rates, better process of care measures) than hospitals with lower PX ratings? In this week’s post, I will use the following hospital quality metrics:
Patient Experience (PX)
Process of Care
Health Outcomes (mortality rates, readmission rates)
I will briefly cover each of these metrics below.
1. Patient Experience
Patient experience (PX) reflects patients’ perceptions of their recent inpatient experience. PX is collected by a survey known as HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems). HCAHPS (pronounced “H-caps”) is a national, standardized survey of hospital patients, developed by a partnership of public and private organizations to publicly report the patient’s perspective of hospital care.
The survey asks a random sample of recently discharged patients about important aspects of their hospital experience. The data set includes patient survey results for over 3,800 US hospitals on ten measures of patients’ perspectives of care (e.g., nurse communication, pain well controlled). I combined two general questions (overall hospital rating and recommend) to create a patient advocacy metric. Thus, a total of 9 PX metrics were used. Across all 9 metrics, hospital scores can range from 0 (bad) to 100 (good). You can see the PX measures for different US hospitals here.
2. Process of Care
Process of care measures show, in percentage form or as a rate, how often a health care provider gives recommended care; that is, the treatment known to give the best results for most patients with a particular condition. The process of care metric is based on medical information from patient records reflecting the rate or percentage across 12 procedures related to surgical care. Some of these procedures concern antibiotics being given/stopped at the right times and treatments to prevent blood clots. These percentages were translated into scores ranging from 0 (worst) to 100 (best). Higher scores indicate that the hospital has a higher rate of following best practices in surgical care. Details of how these metrics were calculated appear below the map.
I calculated an overall process of care metric by averaging the 12 process of care scores. This composite was used because it has good measurement properties (internal consistency was .75) and thus reflects a good overall measure of process of care. You can see the process of care measures for different US hospitals here.
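As a rough illustration of the internal-consistency figure, Cronbach's alpha can be computed from the item variances and the variance of the per-hospital totals: alpha = k/(k-1) * (1 - sum of item variances / variance of totals). The hospitals and scores below are made up for the sketch; real data would use all 12 items and thousands of hospitals:

```python
from statistics import pvariance

# Hypothetical scores for 5 hospitals on 3 (of the 12) process-of-care items
items = [
    [70, 80, 90, 60, 85],  # item 1, one score per hospital
    [65, 82, 88, 55, 80],  # item 2
    [72, 78, 92, 58, 84],  # item 3
]

def cronbach_alpha(items):
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-hospital totals
    item_var = sum(pvariance(it) for it in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

a = cronbach_alpha(items)
print(round(a, 2))  # high alpha: these invented items move together
```

An alpha around .75, as reported for the real metric, indicates the items are consistent enough to justify averaging them into one composite.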
3. Health Outcomes
Measures that tell what happened after patients with certain conditions received hospital care are called “Outcome Measures.” We use two general types of outcome measures: 1) 30-day Mortality Rate and 2) 30-day Readmission Rate. The 30-day risk-standardized mortality and 30-day risk-standardized readmission measures for heart attack, heart failure, and pneumonia are produced from Medicare claims and enrollment data using sophisticated statistical modeling techniques that adjust for patient-level risk factors and account for the clustering of patients within hospitals.
The death rates focus on whether patients died within 30 days of their hospitalization. The readmission rates focus on whether patients were hospitalized again within 30 days.
Three mortality rate and readmission rate measures were included in the healthcare dataset for each hospital. These were:
30-Day Mortality Rate / Readmission Rate from Heart Attack
30-Day Mortality Rate / Readmission Rate from Heart Failure
30-Day Mortality Rate / Readmission Rate from Pneumonia
Mortality/readmission rates are measured per 1,000 patients. So, if a hospital has a heart attack mortality rate of 15, then for every 1,000 heart attack patients, 15 of them die (or, for the readmission rate, 15 are readmitted). You can see the health outcome measures for different US hospitals here.
The three types of metrics (PX, health outcomes, process of care) were housed in separate databases on the data.medicare.gov site. As explained elsewhere in my post on Big Data, I linked these three data sets together by hospital name. Basically, I federated the necessary metrics from their respective databases and combined them into a single data set.
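The federation step can be sketched as a simple inner join on hospital name. The hospital names, field names, and values below are invented stand-ins for the data.medicare.gov extracts; a real pipeline would load them from the downloaded CSV files:

```python
# Hypothetical records keyed by hospital name, one dict per source database
px = {"General Hospital": {"advocacy": 72},
      "City Medical": {"advocacy": 65}}
outcomes = {"General Hospital": {"mortality_ha": 15.1},
            "City Medical": {"mortality_ha": 16.4}}
process = {"General Hospital": {"care": 94.0},
           "City Medical": {"care": 88.5}}

def federate(*datasets):
    """Inner-join several dicts on hospital name into one record each."""
    common = set(datasets[0])
    for d in datasets[1:]:
        common &= set(d)  # keep only hospitals present in every source
    merged = {}
    for name in common:
        record = {}
        for d in datasets:
            record.update(d[name])
        merged[name] = record
    return merged

combined = federate(px, outcomes, process)
print(combined["General Hospital"])
# {'advocacy': 72, 'mortality_ha': 15.1, 'care': 94.0}
```

Joining on free-text hospital names is fragile in practice (spelling variants, duplicates), which is why keyed identifiers are preferable when available.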
Descriptive statistics for each variable are located in Table 1. The correlations of each of the PX measures with each of the health outcome and process of care measures are located in Table 2. As you can see, the correlations of PX with the other hospital metrics are very low, suggesting that the PX measures assess something quite different from the health outcome and process of care measures.
Patient Loyalty and Health Outcomes and Process of Care
Patient loyalty/advocacy (as measured by the Patient Advocacy Index) is logically correlated with the other measures (except for death rate from heart failure). Hospitals with higher patient loyalty ratings have lower death rates and readmission rates and higher process of care scores. The degree of relationship, however, is quite small (the percent of variance explained by patient advocacy is only 3%).
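As a quick check on the arithmetic, the percent of variance explained is the square of the Pearson correlation, so the 3% figure corresponds to a correlation of roughly .17:

```python
import math

# Variance explained = r^2, so r = sqrt(variance explained);
# the 0.03 here is the 3% figure from the text
variance_explained = 0.03
r = math.sqrt(variance_explained)
print(round(r, 2))  # 0.17
```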
Patient Experience and Health Outcomes and Process of Care
Patient experience (PX) shows a complex relationship with the health outcome and process of care measures. It appears that hospitals with higher PX ratings also report higher death rates. However, as expected, hospitals with higher PX ratings report lower readmission rates. Although statistically significant, all of the correlations of PX metrics with the other hospital metrics are low.
The PX dimension that had the highest correlation with readmission rates and process of care measures was “Given information about my recovery upon discharge”. Hospitals that received high scores on this dimension also experienced lower readmission rates and higher process of care scores.
Hospitals track different types of quality metrics that are used to evaluate their performance. Three metrics for US hospitals were examined to understand how closely they are related to each other (there are many other metrics on which hospitals can be compared). Results show that patient experience and patient loyalty are only weakly related to the other hospital metrics, suggesting that improving the patient experience will have little impact on the other hospital measures (health outcomes, process of care).