Save yourself from the zombie apocalypse of unscalable models
One living, breathing zombie in today's analytical models is the glaring absence of error bars. Not every model is scalable or holds up as data grows. The error bars attached to almost every model should be duly calibrated: as business models rake in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween we never want to see.
[ DATA SCIENCE Q&A]
Q:What are confounding variables?
A: * An extraneous variable in a statistical model that correlates, directly or inversely, with both the dependent and the independent variable
* A spurious relationship is a perceived relationship between an independent variable and a dependent variable that has been estimated incorrectly
* The estimate fails to account for the confounding factor
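A toy simulation makes the point concrete (hypothetical variables, NumPy only): a confounder Z drives both X and Y, producing a spurious X-Y relationship that disappears once Z is controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)          # confounder (e.g., temperature)
x = 2 * z + rng.normal(size=n)  # "independent" variable driven by z
y = 3 * z + rng.normal(size=n)  # dependent variable also driven by z

# Naive slope of y on x looks strong even though x has no causal effect on y.
naive_slope = np.cov(x, y)[0, 1] / np.var(x)

# Adjusting for z: residualize both x and y on z, then regress the residuals.
x_res = x - np.cov(x, z)[0, 1] / np.var(z) * z
y_res = y - np.cov(y, z)[0, 1] / np.var(z) * z
adjusted_slope = np.cov(x_res, y_res)[0, 1] / np.var(x_res)

print(naive_slope)     # noticeably positive (spurious)
print(adjusted_slope)  # close to zero once z is accounted for
```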
In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues.
I am happy to announce that I have joined Gleanster’s Thought Leader group as a contributing analyst. Gleanster is a market research and advisory services firm that benchmarks best practices in technology-enabled business initiatives, delivering actionable insights that allow companies to make smart business decisions and match their needs with vendor solutions.
In my role at Gleanster, I will be involved in providing insight into the Enterprise Feedback Management (EFM) and Customer Experience Management (CEM) space. Building on Gleanster’s 2010 Customer Feedback Management report as well as my own research on best practices in customer feedback programs (see Beyond the Ultimate Question for the complete results of my research), I will be directing Gleanster’s upcoming benchmark study on Customer Experience Management. In this study, we will identify specific components of CEM that are essential in helping companies deliver a great customer experience that increases customer loyalty.
“We are excited to have Dr. Hayes as part of our distinguished thought leader group. Dr. Hayes brings over 20 years of experience to bear on important issues in customer experience management and enterprise feedback management. Specifically, his prior research on the measurement and meaning of customer loyalty and best practices in customer feedback programs has helped advance the field tremendously. His scientific research is highly regarded by his industry peers, and we are confident that Dr. Hayes’ continuing contributions to the field will bring great value to the Gleanster community.”
Jeff Zabin, CEO
As a proud member of the 1% for the Planet alliance, Gleanster is committed to donating at least 1% of their annual sales revenue to nonprofit organizations focused on environmental sustainability.
It’s 5:05pm EST. Bob, CFO of ABC Inc., is about to get on an earnings call after just reporting a 20% miss on earnings due to slower revenue growth than forecast. ABC’s stock price is plummeting, down 25% in extended-hours trading. The board is furious, and investors demand answers on the discrepancies.
Inaccurate revenue forecasts remain one of the biggest risks for CFOs. In a recent study, more than 50% of companies said their pipeline forecast is only about 50% accurate. Projecting a $30M revenue target and coming in $6M short can leave investors and employees frustrated and feeling misled about the company’s growth trajectory.
In the past 10 years, the supply chain has become much more complex, with omni-channel distribution and an increasing number of indirect participants that can influence product demand. Advertising and promotions can create an uplift in demand that spikes sales by 20% or more. In addition, different types of customers have different purchasing behaviors. These behaviors are driven by a myriad of underlying indicators and should be modeled individually. Yet Financial Planning and Analysis (FP&A) has not changed fundamentally despite the changing landscape in the way companies do business. The process is still largely manual and dependent on time-series estimation techniques dating back to the 1980s.
Machine learning is a technology that uses algorithms to learn from data and guide us toward more informed decisions. Leveraging the power of machines allows us to consider more scenarios and combine the effects of thousands of indicators to improve forecast accuracy. For revenue forecasting, machine learning excels in the following three areas:
1. Trend discovery from unlimited amounts of data
With advances in big data technologies, computers can cost-effectively crunch through data of all types and sizes. Unlike humans, algorithms can simulate numerous scenarios and recognize patterns that keep re-emerging in the data. They are also not limited to structured data and can examine unstructured data, such as emails and logs, to extract meaningful indicators.
2. Granularity of forecast
Instead of looking at product line level aggregate sales values, machine learning algorithms can detect patterns at SKU, purchase order and invoice levels to discover interesting relationships and dependencies. For example, algorithms may find that the demand of one product (iPhone 6) is a leading indicator of demand for another product (iPhone 6 accessories).
3. Adaptive and Dynamic
Machines can also automatically adapt and re-run forecasting scenarios to adjust to changing market conditions and consumer demands.
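As a rough sketch of the idea (this is not Flowcast's actual system; the indicator features and data here are entirely synthetic), a tree-based model can combine many indicators into a revenue forecast that a single trend line would miss:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 500
# Hypothetical indicators: promotion uplift, seasonality, lagged demand, etc.
X = rng.normal(size=(n, 10))
# Revenue depends nonlinearly on a few of them, plus noise.
y = (100 + 20 * X[:, 0]
     + 15 * np.where(X[:, 1] > 0, 1.0, 0.0) * X[:, 2]
     + rng.normal(scale=5, size=n))

train, test = slice(0, 400), slice(400, None)
model = GradientBoostingRegressor().fit(X[train], y[train])
pred = model.predict(X[test])

mae = np.mean(np.abs(pred - y[test]))
print(f"mean absolute error: {mae:.1f}")  # small relative to the ~100 revenue scale
```

The model picks up the promotion-style interaction between indicators automatically, which is exactly the kind of pattern a manual, aggregate-level FP&A process tends to miss.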
Companies such as Flowcast are leading the charge in introducing machine learning techniques to the finance department of organizations. This will arm CFOs with much greater confidence in their analyses and projections.
Finding success in your data science career? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. Since most data science professionals work in isolation, getting an unbiased perspective is not easy. It is often also hard to see what a data science career progression looks like. A network of mentors addresses these issues: it gives data professionals an outside perspective and an unbiased ally. It is extremely important for data science professionals to build a mentor network and use it throughout their careers.
[ DATA SCIENCE Q&A]
Q:Do we always need the intercept term in a regression model?
A: * It guarantees that the residuals have a zero mean
* It guarantees the least squares slopes estimates are unbiased
* The regression line floats up and down, by adjusting the constant, to the point where the mean of the residuals is zero
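A quick numerical check of the first point, using plain least squares in NumPy on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
x = rng.normal(loc=2.0, size=n)
y = 2 * x + 10 + rng.normal(size=n)  # true intercept is 10

# With an intercept: design matrix [1, x]
X1 = np.column_stack([np.ones(n), x])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid_with = y - X1 @ beta1

# Without an intercept: the fitted line is forced through the origin
beta0, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
resid_without = y - x * beta0[0]

print(np.mean(resid_with))     # essentially zero, by construction
print(np.mean(resid_without))  # clearly nonzero; the slope estimate is biased
```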
In this podcast, Sid Probstein, CTO of AI Foundry, talks about the mindset of a technology transformist in a data-driven world. He discusses some of the challenges he faces as a technologist and offers ways to mitigate them. Sid also talks about the technologist’s mindset at a startup versus a larger enterprise. It is a must-listen conversation for technology folks in the industry trying to navigate the technology-business divide.
Sid Probstein is the CTO and VP of Solution Delivery for AI Foundry, the enterprise software arm and new face of Kodak Alaris. AI Foundry is disrupting the mortgage business by taking origination automation to the next level – enabling self-service, distributed capture and the automatic classification and extraction of scanned & imaged documents into actionable intelligence. He was previously co-founder and CTO at Attivio and held executive positions at FAST Search & Transfer, Northern Light Technology and John Hancock Financial Services.
#FutureOfData podcast is a conversation starter that brings leaders, influencers and leading practitioners on the show to discuss their journeys in creating the data-driven future.
As big data continues to grow, companies around the world are on the hunt for technical recruits, a shift experts predict will continue through 2014 and beyond. WinterWyman’s NY Tech Search division, for example, has seen a 300% increase in demand for data scientists and engineers since 2013.
The hottest sectors for big data growth are ad tech, financial services, ecommerce and social media: those with the highest opportunity for revenue.
Below is a quick guide for both companies and job seekers seeking big data opportunities.
What is big data?
Big data refers to advances in data management technology that allow for an increase in the scale and manipulation of a company’s data. It allows companies to know more about their customers, products and their own infrastructure. More recently, people have become increasingly focused on the monetization of that data.
How can companies benefit by using big data; and more importantly, which industries use it?
Big data is everywhere; the volume of data produced, saved and mined is mind-boggling. Today, companies use data collection and analysis to formulate more cogent business strategies. This will continue to be an emerging area for all industries.
What is the current hiring landscape for big data? What are the salary ranges?
Currently, tech positions in big data are hard to fill because the demand is overwhelming and the talent pool is so small. It is difficult to find job candidates with the specific skill sets needed while balancing the cost of that talent. Companies need to ensure they can make money off of the data to justify offering candidates a large salary.
The highest demand is for data engineers who can code, utilize data analytics and manipulate data for marketing purposes. The newest, and most sought-after, role is for data scientists who can integrate big data into both the company’s IT department and its business functions.
These positions are all within a salary range of $90,000 to $180,000, depending on the individual role and experience. The typical time to hire is less than three weeks.
What is a data scientist?
Data scientists integrate big data technology into both IT departments and business functions. Many have a formal education in computer science and math, focusing on architecture/code, modeling, statistics and analytics. There is also a trend toward data-oriented master’s degree programs being offered at many colleges and universities. A data scientist must also understand the business applications of big data and how it will affect the business organization, and be able to communicate with IT and business management.
What can job seekers do to get the skills they need for a job in big data?
You need to be a marketable programmer already, or enroll in a program or school like General Assembly. To make yourself more marketable for a transition into a big data job, aim to work on projects using platforms like Hadoop or MongoDB.
3 tips for hiring companies
1. Don’t delay time to hire: These candidates have a lot of career options. They often have multiple job offers and are not on the market long. Wait too long, and they won’t be available.
2. Promote your company: It’s good to have a distinct company culture, but candidates are more concerned with how their job will evolve, who they will work with, what technology they will use, etc. Be sure that you’re hitting all of these key points throughout the interview process and on your company website’s “careers” section.
3. Practice flexibility when considering candidates’ qualifications: Because of the limited number of qualified candidates, companies must be open to considering candidates from different industries who have transferable job skills. Focus on a candidate’s potential to learn and grow with the company rather than strict prerequisites for hard skills.
3 tips for job seekers
1. You have options: The market is strong, and this is a great time to be looking for employment. You have negotiating power when it comes to salary, benefits, etc.
2. Contract or permanent jobs abound: Most job candidates can convert a contract/temporary position to permanent employment. There are more opportunities today than in the past few years to transition from a contract position to a full-time one. That being said, developing a strong portfolio of contract/freelance work can prove lucrative; you’ll need to decide what option works best for your needs, goals and schedule.
3. Your technical skills are hot: People with strong tech backgrounds on their resumes are being bombarded with offers. You can afford to be selective about the company you decide to ultimately join.
Predictions for the future of big data
In my expert opinion, there will be continued hiring demand for big data-related positions in industries such as mobile, healthcare and financial services, but industries that have the ability to monetize big data, such as ad tech, will likely have a longer, deeper and steeper hiring demand for big data-related positions.
What is so exciting is that big data applies to almost all industries. As a data scientist, you can work for any number of companies or industries. As an employer, it’s all about finding the right talent to fit your big data needs.
What is big data?
Ideally, the definition of big data changes from case to case, but Wikipedia does a good job of summarizing it: In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools. The challenges include capture, storage, search, sharing, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”
Why is it important for insurers, and why should they care?
Insurance companies have been gathering data (both structured and unstructured) for years now, so the current big data landscape fits the bill well. Insurers could use big data to analyze their data, glean a great deal of insight to serve customers better, and differentiate themselves from competitors. Here are a few use cases that should motivate insurers to embrace big data in their arsenal.
1. Do linkage analysis of structured and unstructured data: Insurance companies have been collecting data for eons, in both structured and unstructured form. Before the age of sophisticated analytical tools, combing this data for further insights was nearly impossible given the effort and cost versus the expected outcome. Thanks to big data, many tools have emerged that are well capable of doing that task with minimal resources and promise great outcomes. Insurance companies should take this as an opportunity to look deep into their data silos and process them to find meaningful correlations and insights that further help the business.
2. Use public data from the social web to scout for prospect signals: Another big area unleashed by sophisticated big data tools is capturing the social web, searching it for meaningful keywords, and using it to understand the insurance landscape. For example, consider looking for the keywords used to describe one’s business and see how much you lead that space. There are many other use cases that are critical to insurance and could be solved by big data tools.
3. Use data from the social web to spy on the competition: This is another powerful use case, employed by many companies to better understand their competition, its brand perception and its social media footprint. It is done by sniffing public web activity about competitors and analyzing the findings to learn more about them. The real-time nature of the data makes it all the more interesting, keeping the information current.
4. Sniff and process all product interfaces for insights: This is another big area harnessed by big data tools. With their superior analytical capabilities, big data tools can provide real-time insights from data collected across all product interfaces, whether verbal (call-center logs, queries, etc.) or non-verbal (logs, activity reports, market conditions, etc.). Once an appropriate model framework to consume that data is built, big data tools can get to work on real-time analysis of customers and sales, providing invaluable, actionable insights.
5. Big data for data-driven innovation: I have been a strong advocate of data-driven innovation, that is, innovating using the power of data. Once the modules that could drive innovation are identified, their information can be processed and monitored for correlations with business-critical KPIs. Once a direct link is established, tweaking the process and monitoring its impact on the system can quickly reveal areas for improvement. This module can thus be used to create innovation and promote lean build-measure-learn loops for faster learning and deployment, drastically reducing the execution cycle for testing innovations.
I am certain there are numerous other areas in which big data could help insurers. Feel free to share your thoughts in the comments.
Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric: the metric that matters most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.
[ DATA SCIENCE Q&A]
Q:What is statistical power?
A: * sensitivity of a binary hypothesis test
* Probability that the test correctly rejects the null hypothesis H0 when the alternative H1 is true
* Ability of a test to detect an effect, if the effect actually exists
* Power = P(reject H0 | H1 is true)
* As power increases, the chance of a Type II error (false negative) decreases
* Used in the design of experiments to calculate the minimum sample size required to reasonably detect an effect, e.g., how many times do I need to flip a coin to conclude it is biased?
* Used to compare tests. Example: between a parametric and a non-parametric test of the same hypothesis
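The coin-flip question above can be answered with the standard one-proportion sample-size formula (normal approximation; the bias p = 0.6 is an assumed effect size, not from the source):

```python
from math import sqrt
from statistics import NormalDist

alpha, power = 0.05, 0.80
p0, p1 = 0.5, 0.6  # fair-coin null vs. assumed biased alternative

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
z_beta = NormalDist().inv_cdf(power)

# Minimum flips so a bias of p1 is detected with the requested power
n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1)))
     / (p1 - p0)) ** 2
print(round(n))  # roughly 194 flips
```

Shrinking the assumed bias toward 0.5 (or demanding higher power) grows the required sample size quickly, which is why power analysis is done before the experiment.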
In teams currently driving data science, there has been an onslaught of discussion around which machine learning method to use and which algorithms perform optimally for which problems.
There are several dependencies to make that decision. Some are primarily linked to:
1. Type of data: its quantity, quality and variety
2. Resources for the task
3. Expected time for the task
4. Expectation from the data
Our friends at SAS have put together a great cheat sheet that can serve as a starting point.