Hacking the Data Science

In my previous blog on the convoluted world of the data scientist, I shed some light on who exactly a data scientist is. There was a brief mention of how the data scientist is a mix of Business Intelligence, Statistical Modeler and computer-savvy IT folks. The write-up discussed how businesses should look at data science as a capability of their workforce, and not at data scientist as a job. One area that has not been given its due share is how to get going in building a data science practice: how businesses should proceed in filling their data science space. So, in this blog, I will spend some time explaining an easy hack to get going on your data science journey without bankrupting yourself by hiring a boatload of data scientists.

Let’s first revisit what is already published in this area. A quick thought that comes to mind is the image that shows data science as three overlapping circles: one is business, one is statistical modeling and one is technology. The common area shared by all three is labeled data science. This is a great representation of where data science lies, but it sometimes confuses the viewer as well. From the look of it, one could guess that the overlapping region comprises professionals who possess all 3 talents, and that it is about people. All it is actually suggesting is that the overlapping region contains the common use cases that require all 3 areas: business, statistics and technology.

Also, the image of 3 overlapping circles does not convey the complete story. It suggests an overlap of some common use cases, but not how resources will work across the three silos. We need a better representation to convey the accurate story, one that keeps in mind the workforce represented by these circles. This will help in understanding how businesses should go about attacking the data science vertical in an effective manner. Let’s group these resources into 3 broad silos: Business Intelligence folks represented by the Business circle, statistical modelers represented by the Statistics circle and IT/computer engineers represented by the Technology circle. For simplicity, let’s assume these circles are not touching each other and are represented as 3 separate islands. This will provide a clear canvas of where we are and where we need to get.

This journey from 3 separate circles to 3 overlapping circles tells us something about how to achieve this capability from the talent perspective. We are given 3 separate islands and a task to join them. There are a few alternatives that come to mind:

1. Hire data scientists and have them build their own circle in the middle. Let them keep expanding their capability in all 3 directions (Business, Statistics and Technology). This will keep growing the middle bubble to the point where it touches and overlaps all 3 circles. This particular solution has resulted in mass hiring of data scientists by mid-size and large enterprises. Most of these data scientists were not given real use cases, and they are trying to find how they could bring the 3 circles closer to make the overlap happen. It does not take long to understand that this is not the most efficient process, as it costs businesses a bunch. This method sounds juicy because it gives Human Resources a good night’s sleep: HR gets to acquire data scientist talent for an area that is high in demand. Now everyone needs to work with those hires and teach them the 3 circles and the culture associated with each. The good thing about this method is that these professionals are extremely skilled and could get things rolling pretty quickly. However, they might set their own pace when it comes to identifying use cases and justifying the investment in them.

2. Another way that comes to mind is to set aside some professionals from each circle to start digging their way to a common area where all three could meet and learn from each other. Collaboration brings out the best in most people. Working collaboratively, these professionals bring their respective cultures and the subject matter experts (SMEs) from their line of business to the mix. This method looks to be the optimum solution, as it requires no outside help. It does provide an organic way to build data science capability, but it could take forever before these 3 camps get on the same page. This slowness also rules out this particular method as the most efficient one.

3. If one method is expensive but fast and the other is cost-effective but slow, what is the best method? It is somewhere on the slider between fast-and-expensive and slow-and-cost-effective. So, a hybrid looks to bring the best of both worlds. Having a data scientist and a council of SMEs from the respective circles working together could keep the expense in check and at the same time bring the three camps closer, faster, via their SMEs. How many data scientists to begin with? The answer depends on the company’s size, its complexity and its wallet size. You could further hack the process by hiring contract data scientists to work as liaisons until the three camps find their overlap in a professional capacity. So, this is the particular method that businesses could explore to hack the data science world and become big data compliant and data-driven.

So, experimenting with a hybrid of shared responsibility between the 3 circles of business, statistics and technology, with a data scientist as a manager, will bring businesses up to speed when it comes to adapting themselves to big data ways of doing things.

Source by v1shal

New MIT algorithm rubs shoulders with human intuition in big data analysis

We all know that computers are pretty good at crunching numbers. But when it comes to analyzing reams of data and looking for important patterns, humans still come in handy: We’re pretty good at figuring out what variables in the data can help us answer particular questions. Now researchers at MIT claim to have designed an algorithm that can beat most humans at that task.

Max Kanter, who created the algorithm as part of his master’s thesis at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) along with his advisor Kalyan Veeramachaneni, entered the algorithm into three major big data competitions. In a paper to be presented this week at IEEE International Conference on Data Science and Advanced Analytics, they announced that their “Data Science Machine” has beaten 615 of the 906 human teams it’s come up against.

The algorithm didn’t get the top score in any of its three competitions. But in two of them, it created models that were 94 percent and 96 percent as accurate as those of the winning teams. In the third, it managed to create a model that was 87 percent as accurate. The algorithm used raw datasets to make models predicting things such as when a student would be most at risk of dropping an online course, or what indicated that a customer during a sale would turn into a repeat buyer.

Kanter and Veeramachaneni’s algorithm isn’t meant to throw human data scientists out — at least not anytime soon. But since it seems to do a decent job of approximating human “intuition” with much less time and manpower, they hope it can provide a good benchmark.

“If the Data Science Machine performance is adequate for the purposes of the problem, no further work is necessary,” they wrote in the study.

That might not be sufficient for companies relying on intense data analysis to help them increase profits, but it could help answer data-based questions that are being ignored.

“We view the Data Science Machine as a natural complement to human intelligence,” Kanter said in a statement. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

This post has been updated to clarify that Kalyan Veeramachaneni also contributed to the study. 

View original post HERE.

Originally Posted at: New MIT algorithm rubs shoulders with human intuition in big data analysis

Aug 03, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[ AnalyticsWeek BYTES]

>> Periodic Table Personified [image] by v1shal

>> How to Win Business using Marketing Data [infographics] by v1shal

>> October 31, 2016 Health and Biotech analytics news roundup by pstein


 Israeli cyber co Waterfall teams with insurance specialists – Globes Under  cyber security

 Call Centers, Listen Up: 3 Steps to Define the Customer Experience at “Hello” – Customer Think Under  Customer Experience

 Australian companies spending up on big data in 2017 – ChannelLife Australia Under  Big Data Analytics


CS109 Data Science


Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data managem… more


The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t


People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more


Looking for success in your data science work? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. As most data science professionals work in isolation, getting an unbiased perspective is not easy. Many times, it is also not easy to understand how one’s data science progression is going to unfold. A network of mentors addresses these issues easily: it gives data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and use it throughout their careers.


Q: Explain the difference between “long” and “wide” format data. Why would you use one or the other?
A:
* Long: one column containing the values and another column listing the context of each value
Fam_id year fam_inc

* Wide: each different variable in a separate column
Fam_id fam_inc96 fam_inc97 fam_inc98

Long vs. Wide:
– Data manipulations such as summarizing and filtering are much easier when data is in the wide format
– Program requirements: the software you use may expect one format or the other
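
The two layouts can be sketched in a few lines of plain Python. This is only an illustration: the income figures are made up, and the helper names `long_to_wide` and `wide_to_long` are hypothetical, chosen to mirror the Fam_id / year / fam_inc columns in the answer above.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per (family, year) observation.
long_rows = [
    {"fam_id": 1, "year": 96, "fam_inc": 40000},
    {"fam_id": 1, "year": 97, "fam_inc": 40500},
    {"fam_id": 1, "year": 98, "fam_inc": 41000},
    {"fam_id": 2, "year": 96, "fam_inc": 45000},
    {"fam_id": 2, "year": 97, "fam_inc": 45400},
    {"fam_id": 2, "year": 98, "fam_inc": 45800},
]


def long_to_wide(rows):
    """Pivot long rows into one row per family with one column per year."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["fam_id"]][f"fam_inc{row['year']}"] = row["fam_inc"]
    return [{"fam_id": fam_id, **cols} for fam_id, cols in sorted(wide.items())]


def wide_to_long(rows):
    """Unpivot wide rows back into one row per (family, year)."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key.startswith("fam_inc"):
                year = int(key[len("fam_inc"):])
                out.append({"fam_id": row["fam_id"], "year": year, "fam_inc": value})
    return out


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

In practice a data-frame library would usually handle this reshaping; the sketch is only meant to make the difference between the two layouts concrete.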



Discussing Forecasting with Brett McLaughlin (@akabret), @Akamai


Getting information off the Internet is like taking a drink from a firehose. – Mitchell Kapor


#BigData @AnalyticsWeek #FutureOfData #Podcast with @MPFlowersNYC, @enigma_data


In 2015, a staggering 1 trillion photos will be taken and billions of them will be shared online. By 2017, nearly 80% of photos will be taken on smart phones.

Sourced from: Analytics.CLUB #WEB Newsletter

From Mad Men to Math Men: Is Mass Advertising Really Dead?

The Role of Media Mix Modeling and Measurement in Integrated Marketing.

At the end of the year Ogilvy is coming out with a new book on the role of Mass Advertising in the new world of Digital Media, Omni-channel and quantitative marketing in which we now live. Forrester Research speculated about a more balanced approach to marketing measurement at its recent conference. Forrester proclaimed that gone are the days of the unmeasured Mad Men approach to advertising, with its large spending on ad buys that largely drove only soft metrics such as Brand Impressions and Customer Consideration. A more Mathematical approach, where programs have hard metrics, many of which are financial (Sales, ROI, Quality Customer Relationships), is here to stay. The hypothesis that Forrester put forward in its talk was that marketing has almost gone too far toward Quantitative Marketing, and it suggested the best blend is one where Marketing Research and quantitative and behavioral data both have a role in measuring integrated campaigns. So what does that mean for Mass Advertising, you ask?

Firstly, and the ad agencies can breathe a sigh of relief, Mass Marketing is not dead, but it is subject to many new standards, namely:

1. Mass will be a smaller part of the Omni-Channel mix of activities that CMOs can allocate their spending toward, but that allocation should be guided by concrete measures; for very large buckets of spend, Media Mix or Marketing Mix Modeling can help with decision making. The last statistic that we saw from Forrester was that Digital Media spend was about to surpass, or had already surpassed, mass media ad spend. So CMOs should not be surprised to see that SEM, Google AdWords, Digital/Social and Direct Marketing are a significant 50% or more of the overall investment.

2. CFOs are asking CMOs for the returns on programmatic and digital media buys. How much money does each media buy make, and how do you know it’s working? Gone are the days of “always on” mass advertising that could get away with reporting back only GRPs or brand health measures. The faster CMOs get on board with this shift, the better they can ensure a dynamic process for marketing investments.

3. The willingness to turn off mass media and test it in matched markets to ensure that it is still working. Many firms need to assess whether TV and Print work for their brand or campaign in light of their current target markets. Many ad agencies and marketing service providers (Nielsen, Acxiom and many more) have advanced audience selection and matching tools to help with this problem. These tools typically integrate audience profiling as well as privacy-protected behavior information.

4. The need to run more integrated campaigns with standard offers across channels and a smarter way of connecting the call to action and the drivers to Omni-channel within each media. For example, mention in other online advertising that consumers can download the firm’s app in the app store. The integration of channels within a campaign will require more complex measurement attribution rules as well as additional test marketing and test-and-learn principles.

In this post we briefly explore two of these changes, namely media mix modeling and advertising measurement options. If you want more specifics, please feel free to contact us at CustomerIntelligence.net.

First, there is always a way to measure mass advertising, and it is not true that you need to leave it turned on for all eternity to do so. For example, if you want to understand whether your media buy in NYC, or in a matched market like LA (large markets), brings in the level of sales and inquiries to other channels that you need at a certain threshold, we posit that you can always:

1. Conduct a simple pre- and post-campaign lift analysis to determine the level of sales and other performance metrics before, during and after the ad campaign runs.
2. Hold out a group of matched markets to serve as a control for performance comparison against the market you are running the ad in.
3. Shut off the media in a market for a very brief period of time. This allows you to compare “dark” period performance with the “on” program period, with seasonality adjustments, to derive some intelligence about performance and perhaps create a performance factor or baseline from which to measure going forward. Such a factor can be leveraged for future measurement without shutting off programs. This is what we call dues paying in marketing analytics. You may have to sacrifice a small amount of sales to read the results each year. This is one way to ensure measurement of mass advertising.
4. Study any behavioral changes in cross-sell and upsell rates of current customers who may increase their relationship because of the campaign you are running.
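
The pre/post comparison against a matched control market can be sketched as a simple difference in growth rates. This is a minimal illustration, not a full measurement methodology: the function names and the sales figures are hypothetical, and it assumes the control market captures whatever would have happened without the campaign (seasonality included).

```python
def pct_change(before, after):
    """Relative change from the pre-campaign to the post-campaign period."""
    return (after - before) / before


def campaign_lift(test_pre, test_post, control_pre, control_post):
    """Growth in the test market minus growth in the matched control market.

    The control market's change stands in for seasonality and other
    background effects, so the difference is attributed to the campaign.
    """
    return pct_change(test_pre, test_post) - pct_change(control_pre, control_post)


# Hypothetical weekly sales: the test market ran the ad, the control did not.
lift = campaign_lift(test_pre=1000, test_post=1200, control_pre=800, control_post=840)
print(f"Estimated lift attributable to the campaign: {lift:.1%}")
```

Here the test market grew 20% and the control grew 5%, so roughly 15 points of growth are credited to the campaign. The same calculation works for a "dark" period test by treating the shut-off period as the comparison.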

Another point to make is that Enterprise Marketing Automation can help with the tracking and measuring of ad campaigns. For really large Integrated Marketing budgets we recommend Media Mix Modeling or Marketing Mix Modeling. There are a number of firms (Market Share Partners Inc is one) that provide these models, and we can discuss that in future posts. The basic mechanics of Marketing Mix Modeling (MMM) are as follows:

MMM uses econometrics to help understand the relationship between Sales and the various marketing tactics that drive Sales:

Combines a variety of marketing tactics (channels, campaigns, etc.) with large data sets of historical performance data
Regression modeling and statistical analysis are often performed on available data to estimate the impact of various promotional tactics on sales, in order to forecast future sets of promotional tactics
This analysis allows you to predict sales based on mathematical correlations to historical marketing drivers & market conditions
Ultimately, the model uses predictive analytics to optimize future marketing investments to drive increases in sales, profits & share
Allows you to understand ROI for each media channel, campaign and execution strategy
Shows which media vehicles/campaigns are most effective at driving revenue/profit and share
Shows you what your incremental sales would be at different levels of media spend
Identifies the optimal spending mix by channel to generate the most sales
The model establishes a link between individual drivers and Sales
Allows you to identify a sales response curve to advertising
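
The regression step at the heart of these mechanics can be sketched with ordinary least squares. This is a deliberately simplified illustration and not any vendor's model: the spend and sales numbers are synthetic, the function names are hypothetical, and a real MMM would typically add effects such as carry-over, diminishing returns and seasonality.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(features, target):
    """Ordinary least squares with an intercept, via the normal equations."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_sales(coefs, tv, digital):
    """Predicted sales at a given level of TV and digital spend."""
    return coefs[0] + coefs[1] * tv + coefs[2] * digital


# Synthetic weekly spend (TV, digital) and sales built as 50 + 3*tv + 2*digital.
spend = [(10, 5), (12, 7), (8, 6), (15, 4), (11, 9), (9, 3), (14, 8), (13, 2)]
sales = [50 + 3 * tv + 2 * digital for tv, digital in spend]

coefs = fit_ols(spend, sales)
intercept, tv_coef, digital_coef = coefs
print(f"baseline={intercept:.2f}, per-unit TV={tv_coef:.2f}, per-unit digital={digital_coef:.2f}")
```

Each fitted coefficient is the incremental sales per unit of spend in that channel, which is what the ROI and spending-mix comparisons are built on. Because this sketch is linear, its response to advertising is a straight line; a curved sales response would need a nonlinear transformation of spend.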

So the good news is … Mass Advertising is far from dead! Its effectiveness will be looked at in the broader context of integrated campaigns, with an eye toward contributing hard returns such as more customers and quality relationships. In addition, Mass Advertising will be looked at in the context of how it integrates with digital: for example, when the firm runs the TV ad, do searches for the firm’s products and brand increase, and do those searches then convert to sales? The funnel in the new Omni-channel world is still being established.

In summary, Mass Advertising must be understood at the segment, customer and brand level, as we believe it has a role in the overall marketing mix when targeted and used in the most efficient way. A more thoughtful view of marketing efficiency is now emerging; it includes such aspects as matching the TV ad on the right channel to the right audience, understanding metrics and measures, and understanding the integration points where mass can complement digital channels as part of an integrated Omni-Channel strategy. Viewing advertising as a separate discipline from Digital Marketing is on its way to disappearing, and marketers must be well versed in both online and offline as the lines continue to blur and change. Flexibility is key. The organizations inside companies will continue to merge to reflect this integration and to avoid siloed thinking and sub-par results.

We look forward to dialoguing and getting your thoughts and experience on these topics and understanding counterpoints and other points of view to ours.

Thanks, Tony Branda

CEO CustomerIntelligence.net.

Source: From Mad Men to Math Men: Is Mass Advertising Really Dead?

Better Recruiting Metrics Lead to Better Talent Analytics


According to Josh Bersin in Deloitte’s 2013 report, Talent Analytics: From Small Data to Big Data, 75% of HR leaders acknowledge analytics are important to the success of their organizations. But 51% have no formal talent analytics plans in place. Nearly 40% say they don’t have the resources to conduct sound talent analytics. Asked to rate their own workforce analytics skills, another 56% said poor.

As Bersin further noted in a recent PeopleFluent article, HR Forecast 2014, “Only 14% of the companies we studied are even starting to analyze people-related data in a statistical way and correlate it to business outcomes. The rest are still dealing with reporting, data cleaning and infrastructure challenges.”

There’s a striking gap between the large number of companies that recognize the importance of metrics and talent analytics and the smaller number that actually have the means and expertise to put them to use.

Yes, we do need to gather and maintain the right people data first, such as when and where applicants apply for jobs, and the specific skills an employee has. But data is just information captured by the recruiting system or software already in place. It doesn’t tell any story.

Compare data against goals or thresholds and it turns into insight, a.k.a. workforce metrics: measurements with a goal in mind, otherwise known as Key Performance Indicators (KPIs), all of which gauge quantifiable components of a company’s performance. Metrics reflect critical factors for success and help a company measure its progress towards strategic goals.

But here’s where it gets sticky. You don’t set off on a cross-country road trip until you know how to read the map.

For companies, it’s important to agree on the right business metrics – and it all starts with recruiting. Even with standard metrics for retention and attrition in place, some companies also track dozens of meaningless metrics that are not tied to specific business goals and do not help improve business outcomes.

I’ve seen recruiting organizations spend all their time in the metrics-gathering phase, and never get around to acting on the results — in industry parlance, “boiling the ocean.” You’re far better off gathering a limited number of metrics that you actually analyze and then act upon.

Today many organizations are focused on developing recruiting metrics and analytics because there’s so much data available on candidates and internal employees (regardless of classification). Based on my own recruiting experience and that of many other recruiting leaders, here are what I consider the Top 5 Recruiting Metrics:

1. New growth vs. attrition rates. What percentage of the positions you fill are newly created roles vs. replacements for attrition? This shows what true growth really looks like. If you are hiring mostly due to attrition, it would indicate that selection, talent engagement, development and succession planning need attention. You can also break this metric down by division/department, by manager and more.

2. Quality of hires. More and more, the holy grail of hiring. Happily, all measurable: what individual performances look like, how long they stay, whether or not they are top performers, what competencies comprise their performance, where are they being hired from and why.

3. Sourcing. Measuring not just the what but the why of your best talent pools: job boards, social media, other companies, current employees, etc. This metric should also be applied to quality of hire: you’ll want to know where the best candidates are coming from. Also, if you want to know the percentage rate for a specific source, divide the number of source hires by the number of external hires. (For example, total Monster job board hires divided by total external hires.)

4. Effectiveness ratio. How many openings do you have versus how many you’re actually filling? You can also measure your recruitment rate by dividing the total number of new hires per year by the total regular headcount reporting to work each year. Your requisitions filled percent can be tallied by dividing the total number of filled requisitions by the total number of approved requisitions.

5. Satisfaction rating. An important one, because it’s not paid much attention to when your other metrics are in good shape. Satisfaction ratings can be gleaned from surveys of candidates, new hires and current employees looking for internal mobility. While your overall metrics may be positive, it’s important to find out how people experience your hiring process.
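
The ratios described in metrics 3 and 4 are simple divisions, sketched below. The function names and all of the counts are hypothetical, chosen only to show the arithmetic.

```python
def source_rate(source_hires, external_hires):
    """Share of external hires that came from one source (e.g. a job board)."""
    return source_hires / external_hires


def recruitment_rate(new_hires_per_year, regular_headcount):
    """New hires per year relative to regular headcount."""
    return new_hires_per_year / regular_headcount


def requisitions_filled_pct(filled_requisitions, approved_requisitions):
    """Filled requisitions as a fraction of approved requisitions."""
    return filled_requisitions / approved_requisitions


# Hypothetical yearly counts for one recruiting organization.
monster_rate = source_rate(source_hires=30, external_hires=120)
hire_rate = recruitment_rate(new_hires_per_year=50, regular_headcount=400)
filled_pct = requisitions_filled_pct(filled_requisitions=45, approved_requisitions=60)

print(f"Monster source rate: {monster_rate:.1%}")
print(f"Recruitment rate: {hire_rate:.1%}")
print(f"Requisitions filled: {filled_pct:.1%}")
```

With these made-up counts, the job board accounts for a quarter of external hires, the recruitment rate is 12.5%, and three quarters of approved requisitions are filled.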

As your business leaves behind those tedious spreadsheets and manual reports and moves into Talent Analytics, metrics are going to be what feeds those results. Consider which metrics are the most appropriate for your business — and why. And then, the real analysis can begin, and help your organization make better talent-related decisions.

Article originally appeared HERE.

Source by analyticsweekpick