Improving Big Data Governance with Semantics

By Dr. Jans Aasman, Ph.D., CEO of Franz Inc.

Effective data governance consists of protocols, practices, and the people necessary for implementation to ensure trustworthy, consistent data. Its yields include regulatory compliance, improved data quality, and data’s increased valuation as a monetary asset that organizations can bank on.

Nonetheless, these aspects of governance would be impossible without what is arguably its most important component: the common terminologies and definitions that are sustainable throughout an entire organization, and which comprise the foundation for the aforementioned policy and governance outcomes.

When intrinsically related to the technologies used to implement governance protocols, terminology systems (containing vocabularies and taxonomies) can unify terms and definitions at a granular level. The result is a greatly increased ability to tackle the most pervasive challenges associated with big data governance including recurring issues with unstructured and semi-structured data, integration efforts (such as mergers and acquisitions), and regulatory compliance.

A Realistic Approach
Designating the common terms and definitions that are the rudiments of governance varies according to organization, business units, and specific objectives for data management. Creating policy from them and embedding them in technology that can achieve governance goals is perhaps most expediently and sustainably facilitated by semantic technologies, which are playing an increasingly pivotal role in the overall implementation of data governance in the wake of big data’s emergence.

Once organizations adopt a glossary of terminology and definitions, they can then determine rules about terms based on their relationships to one another via taxonomies. Taxonomies are useful for disambiguation purposes and can clarify preferred labels (among any number of synonyms) for different terms in accordance with governance conventions. These definitions and taxonomies form the basis for automated terminology systems that label data according to governance standards via inputs and outputs. Ingested data adheres to terminology conventions and is stored according to preferred labels. Data captured prior to the implementation of such a system can still be queried according to the system’s standards.
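As a rough illustration of how a terminology system might store ingested data under preferred labels, here is a minimal Python sketch. The glossary entries, the field name, and the function names are all hypothetical; a production system would draw on a managed vocabulary or taxonomy rather than a hard-coded dictionary.

```python
from typing import Optional

# Hypothetical glossary: each preferred label maps to the synonyms it replaces.
GLOSSARY = {
    "myocardial infarction": ["heart attack", "mi", "cardiac infarction"],
    "hypertension": ["high blood pressure", "htn"],
}

# Invert the glossary so any synonym (or the preferred label itself) resolves directly.
SYNONYM_TO_PREFERRED = {
    synonym.lower(): preferred
    for preferred, synonyms in GLOSSARY.items()
    for synonym in synonyms + [preferred]
}


def preferred_label(term: str) -> Optional[str]:
    """Return the governed preferred label for a term, or None if it is unknown."""
    return SYNONYM_TO_PREFERRED.get(term.strip().lower())


def normalize_record(record: dict, field: str) -> dict:
    """Return a copy of the record with `field` stored under its preferred label."""
    label = preferred_label(record[field])
    normalized = dict(record)
    if label is not None:
        normalized[field] = label
    return normalized


def matches_query(record: dict, field: str, query_term: str) -> bool:
    """Compare a stored value and a query term by preferred label, so data
    captured before the glossary existed can still be found."""
    wanted = preferred_label(query_term)
    return wanted is not None and preferred_label(record[field]) == wanted
```

Here `normalize_record` covers the ingestion path, while `matches_query` shows one way older records that still carry a synonym could be queried against the same standard.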

Linking Terminology Systems: Endless Possibilities
The possibilities that such terminology systems produce (especially for unstructured and semi-structured big data) are virtually limitless, particularly with the linking capabilities of semantic technologies. In the medical field, a handwritten note hastily scribbled by a doctor can be readily transcribed by the terminology system in accordance with governance policy with preferred terms, effectively giving structure to unstructured data. Moreover, it can be linked to billing coding systems per business functions. That structured data can then be stored in a knowledge repository and queried along with other data, adding to the comprehensive integration and accumulation of data that gives big data its value.

Focusing on common definitions and linking terminology systems enables organizations to leverage business intelligence and analytics on different databases across business units. This method is also critical for determining customer disambiguation, a frequently occurring problem across vertical industries. In finance, it is possible for institutions with numerous subsidiaries and acquisitions (such as Citigroup, Citibank, Citi Bike, etc.) to determine which subsidiary actually spent how much money with the parent company and additional internal, data-sensitive problems by using a common repository. Also, linking the different terminology repositories for these distinct yet related entities can achieve the same objective.

The primary way in which semantics addresses linking between terminology systems is by ensuring that those systems are utilizing the same words and definitions for the commonality of meaning required for successful linking. Vocabularies and taxonomies can provide such commonality of meaning, which can be implemented with ontologies to provide a standards-based approach to disparate systems and databases.

Subsequently, all systems that utilize those vocabularies and ontologies can be linked. In finance, the Financial Industry Business Ontology (FIBO) is being developed to grant “data harmonization and…the unambiguous sharing of meaning across different repositories.” The life sciences industry is similarly working on industry-wide standards so that numerous databases can be made available to all within this industry, while still restricting access to internal drug discovery processes according to organization.

Regulatory Compliance and Ontologies
In terms of regulatory compliance, organizations are much more flexible and quicker to account for new requirements when data throughout disparate systems and databases are linked and commonly shared, requiring just a single update as opposed to numerous time-consuming updates in multiple places. Issues of regulatory compliance are also eased in a semantic environment through the use of ontological models, which provide the schema that can create a model specifically in adherence to regulatory requirements.

Organizations can use ontologies to describe such requirements, then write rules for them that both restrict and permit access and usage according to regulations. Although ontological models can also be created for any other sort of requirements pertaining to governance (metadata, reference data, etc.) it is somewhat idealistic to attempt to account for all facets of governance implementation via such models. The more thorough approach is to do so with terminology systems and supplement them accordingly with ontological models.

Terminologies First
The true value in utilizing a semantic approach to big data governance that focuses on terminology systems, their requisite taxonomies, and vocabularies pertains to the fact that this method is effective for governing unstructured data. Regardless of what particular schema (or lack thereof) is available, organizations can get their data to adhere to governance protocols by focusing on the terms, definitions, and relationships between them. Conversely, ontological models have a demonstrated efficacy with structured data. Given the fact that the majority of new data created is unstructured, the best means of wrapping effective governance policies and practices around them is through leveraging these terminology systems and semantic approaches that consistently achieve governance outcomes.

About the Author: Dr. Jans Aasman, Ph.D., is the CEO of Franz Inc., an early innovator in Artificial Intelligence and leading supplier of Semantic Graph Database technology. Dr. Aasman’s previous experience and educational background include:
• Experimental and cognitive psychology at the University of Groningen, specialization: Psychophysiology, Cognitive Psychology.
• Tenured Professor in Industrial Design at the Technical University of Delft. Title of the chair: Informational Ergonomics of Telematics and Intelligent Products
• KPN Research, the research lab of the major Dutch telecommunication company
• Carnegie Mellon University. Visiting Scientist at the Computer Science Department of Prof. Dr. Allen Newell

Source: Improving Big Data Governance with Semantics by jaasman

Why the time is ripe for security behaviour analytics


Behaviour analytics technology is being developed or acquired by a growing number of information security suppliers. In July 2015 alone, European security technology firm Balabit released a real-time user behaviour analytics monitoring tool called Blindspotter and security intelligence firm Splunk acquired behaviour analytics and machine learning firm Caspida. But what is driving this trend?

Like most trends, there is no single driver, but several key factors that come together at the same time.

In this case, storage technology has improved and become cheaper, enabling companies to store more network activity data; distributed computing capacity is enabling real-time data gathering and analysis; and at the same time, traditional signature-based security technologies or technologies designed to detect specific types of attack are failing to block increasingly sophisticated attackers.
As companies have deployed security controls, attackers have shifted focus from malware to individuals in organisations, either stealing their usernames and passwords to access and navigate corporate networks without being detected or getting their co-operation through blackmail and other forms of coercion.

Stealing legitimate user credentials for both on-premise and cloud-based services is becoming increasingly popular with attackers as a way into an organisation that enables them to carry out reconnaissance, and it is easily done, according to Matthias Maier, European product marketing manager for Splunk.

“For example, we are seeing highly plausible emails that appear to be from a company’s IT support team telling a targeted employee their email inbox is full and their account has been locked. All they need to do is type in their username and password to access the account and delete messages, but in doing so, the attackers are able to capture legitimate credentials without using any malware and access corporate IT systems undetected,” he said.

An increase in such techniques by attackers is driving a growing demand from organisations for technologies such as behaviour analytics that enable them to build an accurate profile of normal business activities for all employees. This means if credentials are stolen or people are being coerced into helping attackers, these systems are able to flag unusual patterns of behaviour.
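To make the idea of profiling normal activity concrete, the following Python sketch builds a very simple per-user baseline and flags values that fall far outside it. It is only an illustration: the metric (files accessed per day), the sample numbers and the three-standard-deviation threshold are assumptions, and commercial behaviour analytics products use far richer models.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarise a user's normal activity as (mean, standard deviation)."""
    return mean(history), stdev(history)


def is_unusual(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    average, spread = baseline
    if spread == 0:
        # No variation was observed, so any different value counts as unusual.
        return value != average
    return abs(value - average) / spread > threshold


# Hypothetical daily counts of files accessed by one employee.
history = [20, 22, 19, 21, 23, 20, 22]
baseline = build_baseline(history)

print(is_unusual(21, baseline))   # a typical day
print(is_unusual(300, baseline))  # e.g. bulk access with stolen credentials
```

The point of the design is that nothing here depends on a malware signature: stolen but legitimate credentials still produce activity that departs from the employee's own history.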

Originally Posted at: Why the time is ripe for security behaviour analytics

Real-Time, Predictive Data Modeling

Traditionally, data modeling has been one of the most time-consuming facets of leveraging data-driven processes. This reality has become significantly aggravated by the variety of big data options, their time-sensitive needs, and the ever growing complexity of the data ecosystem which readily meshes disparate data types and IT systems for an assortment of use cases.

Attempting to design schema for such broad varieties of data in accordance with the time constraints required to act on those data and extract value from them is difficult enough in relational environments. Incorporating such pre-conceived schema with semi-structured, machine-generated data (and integrating them with structured data) complicates the process, especially when requirements dynamically change over time.

Subsequently, one of the most significant trends to impact data modeling is the emerging capability to produce schema on-the-fly based on the data themselves, which considerably accelerates the modeling process while simplifying the means of using data-centric options.

According to Loom Systems VP of Product Dror Mann, “We’ve been able to build algorithms that break the data and structure it. We break it for instance to lift the key values. We understand that this is the constant, that’s the host, that’s the celerity, that’s the message, and all the rest are just properties to explain what’s going on there.”
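A small Python sketch can illustrate what breaking a raw log line into key values and properties might look like. Note the substitution: the algorithms Mann describes infer structure from the data, whereas this sketch uses a fixed regular expression purely for illustration. The line layout (timestamp, host, severity, message) and the `key=value` property convention are assumptions, not Loom Systems' format.

```python
import re

# Assumed layout: "<timestamp> <host> <SEVERITY> <free-text message>".
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<host>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<message>.*)"
)
# Assumed convention: extra properties appear as key=value tokens in the message.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def structure_line(line):
    """Split one log line into key fields plus a dict of properties.

    Returns None when the line does not fit the assumed layout.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["properties"] = dict(KEY_VALUE.findall(record["message"]))
    return record


sample = "2015-07-01T12:00:00Z web-01 ERROR disk almost full path=/var used=91%"
print(structure_line(sample))
```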

Algorithmic Data Modeling
The expanding reliance on algorithms to facilitate data modeling is one of the critical deployments of Artificial Intelligence technologies such as machine learning and deep learning. These cognitive computing capabilities are underpinned by semantic technologies which prove influential in on-the-fly data modeling at scale. The foregoing algorithms are effectual in such time-sensitive use cases partly because of classification technologies which “measure every type of metric in a single one,” Mann explained. The automation potential of the use of classifications with AI algorithms is an integral part of hastening the data modeling process in these circumstances. As Mann observed, “For our usual customers, even if it’s a medium-sized enterprise, their data will probably create more than tens of thousands of metrics that will be measured by our software.” The classification enabled by semantic technologies allows for the underlying IT system to understand how to link the various data elements in a way which is communicable and sustainable according to the ensuing schema.

Pervasive Applicability
The result is that organizations are able to model data of various types in a way in which they are not constrained by schema, but rather mutate schema to include new data types and requirements. This ability to create schema as needed is vital to avoiding vendor lock-in and enabling various IT systems to communicate with one another. In such environments, the system “creates the schema and allows the user to manipulate the change accordingly,” Mann reflected. “It understands the schema from the data, and does some of the work of an engineer that would look at the data.” In fact, one of the primary use cases for such modeling is the real-time monitoring of IT systems which has become increasingly germane to both operations and analytics. Crucial to this process is the real-time capabilities involved, which are necessary for big data quantities and velocities. “The system ingests the data in real time and does the computing in real time,” Mann revealed. “Through the data we build a data set where we learn the pattern. From the first several minutes of viewing samples it will build a pattern of these samples and build the baseline of these metrics.”
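The baseline-building step Mann describes can be sketched as an online calculation that learns from the first few samples and then checks each new sample as it arrives. This is a minimal sketch under stated assumptions: it uses Welford's running mean and variance, a hypothetical warm-up of five samples and a three-standard-deviation threshold, none of which come from the vendor.

```python
import math


class RunningBaseline:
    """Online mean and variance (Welford's algorithm) for a single metric."""

    def __init__(self, warmup=5):
        self.warmup = warmup  # samples to learn from before flagging anything
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0        # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one sample into the baseline."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def std(self):
        """Sample standard deviation of everything learned so far."""
        return math.sqrt(self._m2 / (self.count - 1)) if self.count > 1 else 0.0

    def observe(self, value, threshold=3.0):
        """Return True if `value` deviates from the baseline.

        Normal samples are folded into the baseline; deviating ones are not,
        so a single spike does not distort what counts as normal.
        """
        ready = self.count >= self.warmup
        spread = self.std()
        deviates = ready and spread > 0 and abs(value - self.mean) > threshold * spread
        if not deviates:
            self.update(value)
        return deviates


baseline = RunningBaseline(warmup=5)
samples = [100, 102, 98, 101, 99, 100, 250, 101]  # hypothetical response times
flags = [baseline.observe(sample) for sample in samples]
print(flags)
```

Because each sample is processed once and only three numbers are kept per metric, the same approach scales to the tens of thousands of metrics mentioned earlier without storing the raw history.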

From Predictive to Preventive
Another pivotal facet of automated data modeling fueled by AI is the predictive functionality which can prevent undesirable outcomes. These capabilities are of paramount importance in real-time monitoring of information systems for operations, and are applicable to various aspects of the Internet of Things and the Industrial Internet as well. Monitoring solutions employing AI-based data modeling are able to determine such events before they transpire due to the sheer amounts of data they are able to parse through almost instantaneously. When monitoring log data, for instance, these solutions can analyze such data and their connotations in a way which vastly exceeds that of conventional manual monitoring of IT systems. In these situations “the logs are being scanned in real time, all the time,” Mann noted. “Usually logs tell you a much richer story. If you are able to scan your logs at the information level, not just at the error level…you would be able to predict issues before they happen because the logs tell you when something is about to be broken.”

Going Forward
Data modeling is arguably the foundation of nearly every subsequent data-focused activity from integration to real-time application monitoring. AI technologies are currently able to accelerate the modeling phase in a way that enables these activities to be determined even more by the actual data themselves, as opposed to relying upon predetermined schema. This flexibility has manifold utility for the enterprise, decreases time to value, and increases employee and IT system efficiency. Its predictive potential only compounds the aforementioned boons, and could very well prove a harbinger of the future for data modeling. According to Mann:

“When you look at statistics, sometimes you can detect deviations and abnormalities, but in many cases you’re also able to detect things before they happen because you can see the trend. So when you’re detecting a trend you see a sequence of events and it’s trending up or down. You’re able to detect what we refer to as predictions which tells you that something is about to take place. Why not fix it now before it breaks?”
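The trend-based prediction Mann describes can be illustrated with a few lines of Python. This sketch fits a straight line to recent samples by least squares and estimates how many more samples remain before a limit is reached. The metric, the sample values and the limit are hypothetical, and a linear fit stands in for whatever model a real monitoring product would use.

```python
def linear_trend(values):
    """Least-squares slope and intercept for evenly spaced samples.

    Requires at least two samples.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def steps_until(values, limit):
    """Estimate how many more samples pass before the trend reaches `limit`.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical disk usage percentages sampled at regular intervals.
usage = [70, 72, 74, 76, 78]
print(steps_until(usage, limit=90))
```

An estimate like this is what turns a deviation alert into a warning that arrives before the failure, which is the shift from predictive to preventive described above.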

Source by jelaniharper

Data Scientists and the Practice of Data Science

I was recently involved in a couple of panel discussions on what it means to be a data scientist and to practice data science. These discussions/debates took place at IBM Insight in Las Vegas in late October. I attended the event as IBM’s guest. The panels, moderated by Brian Fanzo, included me and these data experts:

I enjoyed our discussions and their take on the topic of data science. Our discussion was opened by the question “What is the role of a data scientist in the insight economy?” You can read each of our answers to this question on IBM’s Big Data Hub. While we come from different backgrounds, there was a common theme across our answers. We all think that data science is about finding insights in data to help make better decisions. I offered a more complete answer to that question in a prior post. Today, I want to share some more thoughts about other areas of the field of data science that we talked about in our discussions. The content below reflects my opinion.

What is a Data Scientist?

Figure 1. The three skills of data scientists

As more data professionals are now calling themselves data scientists, it’s important to clarify exactly what a data scientist is. One way to understand data scientists is to understand what kind of skills they bring to bear on analytics projects. It’s generally agreed that a successful data scientist is one who possesses skills across three areas: subject matter expertise in a particular field, programming/technology and statistics/math (see DJ Patil and Hilary Mason’s take, Drew Conway’s Data Science Venn Diagram (see Figure 1) and a review of many experts’ opinions on this topic).

AnalyticsWeek and I recently took an empirical approach to understanding the skills of data scientists by asking over 500 data professionals about their job roles and their proficiency across 25 data skills in five areas (i.e., business, technology, programming, math/modeling and statistics). A factor analysis of their proficiency ratings revealed three factors: business acumen, technology/programming skills and statistics/math knowledge.

Figure 2. Data professionals in different job roles are proficient in different data skills.

A data scientist who possesses expertise in all data skills is rare. In our survey, none of the respondents were experts in all five skill areas. Instead, our results identified four different types of data scientists, each with varying levels of proficiency in data skills; as expected, different data professionals possessed role-specific skills (see Figure 2). Business Management professionals were the most proficient in business skills. Developers were the most proficient in technology and programming skills. Researchers were most proficient in math/modeling and statistics. Creatives did not possess great proficiency in any one skill.

The Practice of Data Science: Getting Insights from Data

Gil Press offers a great summary of the field of data science. He traces the literary history of the term (the term first appears in use in 1974) and settles on the idea that data science is a way of extracting insights from data using the powers of computer science and statistics applied to data from a specific field of study.

Figure 3. Six Phases of the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology. Download the IBM SPSS Modeler CRISP-DM Guide here.
But how do you get insights from data? Bernard Marr offers his 5-step SMART approach to extract information. SMART stands for:

  • S = Start with Strategy
  • M = Measure Metrics and Data
  • A = Apply Analytics
  • R = Report Results
  • T = Transform your Business

Another approach is the 6-step CRISP-DM (Cross Industry Standard Process for Data Mining) method (see Figure 3). In a KDNuggets Poll in 2014, the CRISP-DM method was the most popular methodology (43%) used by data professionals for analytics, data mining, and data science projects.

These two approaches have a lot in common with each other, and both share a lot with a method that has been around for about 1,000 years: the scientific method (see Alhazen, a forerunner of the scientific method). The scientific method follows these general steps (see Figure 4):

Figure 4. The scientific method is a way to get insights from your data
  1. Formulate a question or problem statement
  2. Generate a hypothesis that is testable
  3. Gather/Generate data to understand the phenomenon in question. Data can be generated through experimentation; when we can’t conduct true experiments, data are obtained through observations and measurements.
  4. Analyze data to test the hypotheses / Draw conclusions
  5. Communicate results to interested parties or take action (e.g., change processes) based on the conclusions. Additionally, the outcome of the scientific method can help us refine our hypotheses for further testing.

The value of data is measured by what you do with it. Whether you’re investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge, the scientific method is an effective way to systematically interrogate your data. Scientists may differ with respect to the variables they use and the problems they study (e.g., medicine, education and business), but they all use the scientific method to advance bodies of knowledge.

Data is, has been and forever will be at the heart of science. The scientific method necessarily involves the collection of empirical evidence, subject to specific principles of reasoning. That is the practice of science, a way of extracting knowledge from data. Data science is science.

The Democratization of Data Science

Taking a scientific approach to analyzing data is not only valuable to data workers; it is also valuable for people who consume, interpret and make decisions based on the analysis of those data. In business, data users need to think critically about sales reports, social media metrics and quarterly reports. Application vendors are marketing their tools and platforms as a way of making everybody a data scientist, enabling end users (i.e., data users) to get advanced statistical and visualization capabilities to find insights (see Prelert’s take on this here, Tableau’s ideas here and Umbel’s call here).

I believe that the democratization of data science is not only a software problem but also an education problem. Companies need to provide their employees training on statistics and statistical concepts. This type of training gives the employees the ability to think critically about the data (e.g., data source, measurement properties and relevance of the metrics). The better the grasp of statistics employees have, the more insight/value/use they will get from the software they use to analyze/visualize that data.

Statistics is the language of data. Just as knowledge of your native language helps you maneuver in the world of words, statistics will help you maneuver in the world of data. As the world around us becomes more quantified, statistical skills will become more and more essential in our daily lives. If you want to make sense of our data-intensive world, you will need to understand statistics.

Conclusions and Final Thoughts

Businesses are relying on data professionals with unique skills to make sense of their data. These data professionals apply their skills to improve decision-making in humans or algorithms. Getting from data to insights, data professionals can adopt a systematic approach to optimize the use of their skills. Following are some conclusions about data scientists and the practice of data science.

  • The practice of data science requires three skills: subject matter expertise, computing skills and statistical knowledge.
  • The general term, ‘data scientist,’ is ambiguous. Our research studied four different types of data scientists: Business management, Programmer, Creative and Researcher. Each role possessed different strengths.
  • Science is a way of thinking, a way of testing ideas using data. An effective practice of data science includes the scientific method. I think that the term, ‘data science,’ is redundant. It’s just science. Science requires the use of data, data to help you understand your business and how the world really works.
  • Offer employees training on statistics. Giving people analytics software and expecting them to excel at data science is like giving them a stethoscope and expecting them to excel at medicine. The better they understand the language of data, the more value they will get from the analytics software they use.

I’ll leave you with some thoughts on data science I shared with Nick Dimeo at IBM Insight.

I would love to hear your thoughts on data scientists and the practice of data science. What do those terms mean to you?

Originally Posted at: Data Scientists and the Practice of Data Science by bobehayes

Smart Data Modeling: From Integration to Analytics

There are numerous reasons why smart data modeling, which is predicated on semantic technologies and open standards, is one of the most advantageous means of effecting everything from integration to analytics in data management.

  • Business-Friendly—Smart data models are innately understood by business users. These models describe entities and their relationships to one another in terms that business users are familiar with, which serves to empower this class of users in myriad data-driven applications.
  • Queryable—Semantic data models are able to be queried, which provides a virtually unparalleled means of determining provenance, source integration, and other facets of regulatory compliance.
  • Agile—Ontological models readily evolve to include additional business requirements, data sources, and even other models. Thus, modelers are not responsible for defining all requirements upfront, and can easily modify them at the pace of business demands.

According to Cambridge Semantics Vice President of Financial Services Marty Loughlin, the most frequently used benefits of this approach to data modeling are operational: “There are two examples of the power of semantic modeling of data. One is being able to bring the data together to ask questions that you haven’t anticipated. The other is using those models to describe the data in your environment to give you better visibility into things like data provenance.”

Implicit in those advantages is an operational efficacy that pervades most aspects of the data sphere.

Smart Data Modeling
The operational applicability of smart data modeling hinges on its flexibility. Semantic models, also known as ontologies, exist independently of infrastructure, vendor requirements, data structure, or any other characteristic related to IT systems. As such, they can incorporate attributes from all systems or data types in a way that is aligned with business processes or specific use cases. “This is a model that makes sense to a business person,” Loughlin revealed. “It uses terms that they’re familiar with in their daily jobs, and is also how data is represented in the systems.” Even better, semantic models do not necessitate all modeling requirements prior to implementation. “You don’t have to build the final model on day one,” Loughlin mentioned. “You can build a model that’s useful for the application that you’re trying to address, and evolve that model over time.” That evolution can include other facets of conceptual models, industry-specific models (such as FIBO), and aspects of new tools and infrastructure. The combination of smart data modeling’s business-first approach, adaptable nature and relatively rapid implementation speed is greatly contrasted with typically rigid relational approaches.

Smart Data Integration and Governance
Perhaps the most cogent application of smart data modeling is its deployment as a smart layer between any variety of IT systems. By utilizing platforms reliant upon semantic models as a staging layer for existing infrastructure, organizations can simplify data integration while adding value to their existing systems. The key to integration frequently depends on mapping. When mapping from source to target systems, organizations have traditionally relied upon experts from each of those systems to create what Loughlin called “a source to target document” for transformation, which is given to developers to facilitate ETL. “That process can take many weeks, if not months, to complete,” Loughlin remarked. “The moment you’re done, if you need to make a change to it, it can take several more weeks to cycle through that iteration.”

However, since smart data modeling involves common models for all systems, integration merely includes mapping source and target systems to that common model. “Using common conceptual models to drive existing ETL tools, we can provide high quality, governed data integration,” Loughlin said. The ability of integration platforms based on semantic modeling to automatically generate the code for ETL jobs not only reduces time to action, but also increases data quality while reducing cost. Additional benefits include the relative ease with which systems and infrastructure are added to this process, the tendency for deploying smart models as a catalog for data mart extraction, and the means to avoid vendor lock-in from any particular ETL vendor.
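The benefit of mapping every system to one common model, rather than to every other system, can be shown with a short Python sketch. The system names, field names and common-model terms below are hypothetical, and plain dictionaries stand in for a real semantic model and the ETL code a platform would generate from it.

```python
# Hypothetical mappings from each system's local field names to a shared model.
TO_COMMON = {
    "crm": {"cust_nm": "customerName", "acct_no": "accountId"},
    "billing": {"client": "customerName", "account": "accountId"},
}


def to_common(system, record):
    """Rename a system's local fields to the common model's terms."""
    mapping = TO_COMMON[system]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_common(system, record):
    """Rename common-model terms back to a system's local fields."""
    reverse = {common: local for local, common in TO_COMMON[system].items()}
    return {reverse[key]: value for key, value in record.items() if key in reverse}


def translate(source, target, record):
    """Move a record between two systems by way of the common model."""
    return from_common(target, to_common(source, record))


print(translate("crm", "billing", {"cust_nm": "Acme", "acct_no": "42"}))
```

With this arrangement, adding a new system means writing one mapping to the common model instead of one source-to-target document per pair of systems, which is where the time savings Loughlin describes would come from.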

Smart Data Analytics—System of Record
The components of data quality and governance that are facilitated by deploying semantic models as the basis for integration efforts also extend to others that are associated with analytics. Since the underlying smart data models are able to be queried, organizations can readily determine provenance and audit data through all aspects of integration, from source systems to their impact on analytics results. “Because you’ve now modeled your data and captured the mapping in a semantic approach, that model is queryable,” Loughlin said. “We can go in and ask the model where data came from, what it means, and what conversion happened to that data.” Smart data modeling provides a system of record that is superior to many others because of the nature of analytics involved. As Loughlin explained, “You’re bringing the data together from various sources, combining it together in a database using the domain model the way you described your data, and then doing analytics on that combined data set.”

Smart Data Graphs
By leveraging these models on a semantic graph, users are able to reap a host of analytics benefits that they otherwise couldn’t because such graphs are focused on the relationships between nodes. “You can take two entities in your domain and say, ‘find me all the relationships between these two entities’,” Loughlin commented about solutions that leverage smart data modeling in RDF graph environments. Consequently, users are able to determine relationships that they did not know existed. Furthermore, they can ask more questions based on those relationships than they otherwise would be able to ask. The result is richer analytics results based on the overarching context between relationships that is largely attributed to the underlying smart data models. The nature and number of questions asked, as well as the sources incorporated for such queries, is illimitable. “Semantic graph databases, from day one have been concerned with ontologies…descriptions of schema so you can link data together,” explained Franz CEO Jans Aasman. “You have descriptions of the object and also metadata about every property and attribute on the object.”
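The kind of question Loughlin describes, finding all the relationships between two entities, can be sketched over a handful of subject-predicate-object triples. This is an illustration only: the entity and relationship names are made up, and a plain Python list stands in for an RDF graph database, which would normally be queried with SPARQL.

```python
# Hypothetical triples (subject, predicate, object), in the spirit of an RDF graph.
TRIPLES = [
    ("AcmeBank", "subsidiaryOf", "AcmeGroup"),
    ("AcmeBank", "lendsTo", "AcmeGroup"),
    ("AcmeGroup", "owns", "AcmeBank"),
    ("AcmeBank", "locatedIn", "London"),
]


def relationships_between(triples, first, second):
    """Return every triple that directly connects the two entities, in either direction."""
    return [
        triple for triple in triples
        if {triple[0], triple[2]} == {first, second}
    ]


for subject, predicate, obj in relationships_between(TRIPLES, "AcmeGroup", "AcmeBank"):
    print(subject, predicate, obj)
```

Because the query names only the two entities and not the relationship types, it can surface connections the user did not know to ask about.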

Modeling Models
When one considers the different facets of modeling that smart data modeling includes (business models, logical models, conceptual models, and many others), it becomes apparent that the true utility in this approach is an intrinsic modeling flexibility upon which other approaches simply can’t improve. “What we’re actually doing is using a model to capture models,” Cambridge Semantics Chief Technology Officer Sean Martin observed. “Anyone who has some form of a model, it’s probably pretty easy for us to capture it and incorporate it into ours.” The standards-based approach of smart data modeling provides the sort of uniform consistency required at an enterprise level, which functions as a means to make data integration, data governance, data quality metrics, and analytics inherently smarter.


Hacking the Data Science

In my previous blog post on the convoluted world of the data scientist, I shed some light on who exactly a data scientist is. It briefly mentioned that the data scientist is a mix of Business Intelligence analyst, statistical modeler, and computer-savvy IT professional. The write-up discussed why businesses should treat data science as a capability of their workforce rather than as a single job title. One area that has not been given its due is how to get started building a data science practice, and how businesses should go about filling their data science space. So, in this post, I will spend some time explaining an easy hack to get going on your data science journey without bankrupting yourself by hiring a boatload of data scientists.

Let’s first revisit what has already been published in this area. The image that quickly comes to mind shows data science as three overlapping circles: one for business, one for statistical modeling, and one for technology. The region shared by all three is labeled data science. This is a great representation of where data science lies, but it sometimes confuses the viewer. At first glance, one could guess that the overlapping region consists of professionals who possess all three talents, and that it is about people. All it actually suggests is that the overlapping region contains the common use cases that require all three areas: business, statistics, and technology.

The image of three overlapping circles also does not tell the complete story. It suggests an overlap of common use cases, but not how resources will work across the three silos. We need a better representation, one that keeps in mind the workforce these circles stand for, to understand how businesses should attack the data science vertical effectively. Let’s group these resources into three broad silos: Business Intelligence folks, represented by the business circle; statistical modelers, represented by the statistics circle; and IT/computer engineers, represented by the technology circle. For simplicity, let’s assume these circles do not touch one another and are instead three separate islands. This provides a clear canvas of where we are and where we need to get.

The journey from three separated circles to three overlapping circles offers some signals about how to achieve this capability from a talent perspective. We are given three separate islands and the task of joining them. A few alternatives come to mind:

1. Hire data scientists and have them build their own circle in the middle. Let them keep expanding their capability in all three directions (business, statistics, and technology). The middle bubble keeps growing until it touches and overlaps all three circles. This solution is what drives the mass hiring of data scientists by mid-size and large enterprises. Many of these data scientists are not given real use cases, and are left trying to work out how to bring the three circles closer to make the overlap happen. It does not take long to see that this is not the most efficient process, as it costs businesses a great deal. The method sounds appealing because it lets Human Resources sleep well at night: HR gets to acquire data science talent for an area that is in high demand. But then everyone has to work with those hires and teach them the three circles and the culture associated with each. The good thing about this method is that these professionals are extremely skilled and can get rolling quickly. However, they may set their own pace when it comes to identifying use cases and justifying the investment in them.

2. Another way is to set aside some professionals from each circle to start digging their way toward a common area where all three can meet and learn from each other. Collaboration brings out the best in most people. Working collaboratively, these professionals bring their respective cultures and the subject-matter experts (SMEs) from their lines of business into the mix. This method looks like the optimum solution, as it requires no outside help. It does provide an organic way to build data science capability, but it could take forever before the three camps get on the same page. That slowness rules this method out as one of the most efficient.

3. If one method is fast but expensive and the other is cost-effective but slow, what is the best method? It lies somewhere on the slider between fast-and-expensive and slow-and-cost-effective. A hybrid looks to bring the best of both worlds. Having a data scientist work together with a council of SMEs from the respective circles keeps expenses in check while bringing the three camps closer, faster, through their SMEs. How many data scientists should you begin with? The answer depends on the company’s size, its complexity, and the size of its wallet. You could hack the process further by hiring contract data scientists to act as liaisons until the three camps find their overlap in a professional capacity. This is the method businesses could explore to hack the data science world and become big-data-ready, data-driven businesses.

So, experimenting with a hybrid of shared responsibility among the three circles of business, statistics, and technology, with a data scientist as manager, will bring businesses up to speed in adapting to big data ways of doing things.

Source by v1shal

New MIT algorithm rubs shoulders with human intuition in big data analysis

We all know that computers are pretty good at crunching numbers. But when it comes to analyzing reams of data and looking for important patterns, humans still come in handy: We’re pretty good at figuring out what variables in the data can help us answer particular questions. Now researchers at MIT claim to have designed an algorithm that can beat most humans at that task.


Max Kanter, who created the algorithm as part of his master’s thesis at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) along with his advisor Kalyan Veeramachaneni, entered the algorithm into three major big data competitions. In a paper to be presented this week at IEEE International Conference on Data Science and Advanced Analytics, they announced that their “Data Science Machine” has beaten 615 of the 906 human teams it’s come up against.

The algorithm didn’t get the top score in any of its three competitions. But in two of them, it created models that were 94 percent and 96 percent as accurate as those of the winning teams. In the third, it managed to create a model that was 87 percent as accurate. The algorithm used raw datasets to make models predicting things such as when a student would be most at risk of dropping an online course, or what indicated that a customer during a sale would turn into a repeat buyer.

Kanter and Veeramachaneni’s algorithm isn’t meant to throw human data scientists out — at least not anytime soon. But since it seems to do a decent job of approximating human “intuition” with much less time and manpower, they hope it can provide a good benchmark.


“If the Data Science Machine performance is adequate for the purposes of the problem, no further work is necessary,” they wrote in the study.

That might not be sufficient for companies relying on intense data analysis to help them increase profits, but it could help answer data-based questions that are being ignored.

“We view the Data Science Machine as a natural complement to human intelligence,” Kanter said in a statement. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

This post has been updated to clarify that Kalyan Veeramachaneni also contributed to the study. 


Originally Posted at: New MIT algorithm rubs shoulders with human intuition in big data analysis

From Mad Men to Math Men: Is Mass Advertising Really Dead?

The Role of Media Mix Modeling and Measurement in Integrated Marketing.

At the end of the year, Ogilvy is coming out with a new book on the role of mass advertising in the new world of digital media, omni-channel, and quantitative marketing in which we now live. Forrester Research speculated about a more balanced approach to marketing measurement at its recent conference. Forrester proclaimed that gone are the days of the unmeasured Mad Men approach to advertising, with its large spending on ad buys that largely drove only soft metrics such as brand impressions and customer consideration. A more mathematical approach to ad measurement, in which programs have hard metrics, many of them financial (sales, ROI, quality customer relationships), is here to stay. The hypothesis Forrester put forward in its talk was that marketing has almost gone too far toward quantitative marketing; it suggested that the best blend is one in which marketing research and quantitative and behavioral data both have a role in measuring integrated campaigns. So what does that mean for mass advertising, you ask?

First, and ad agencies can breathe a sigh of relief, mass marketing is not dead, but it is subject to many new standards, namely:

Mass will be a smaller part of the omni-channel mix of activities toward which CMOs can allocate their spending, but that allocation should be guided by concrete measures; for very large buckets of spend, Media Mix or Marketing Mix Modeling can help with decision making. The last statistic we saw from Forrester was that digital media spend was about to surpass, or had already surpassed, mass media ad spend. So CMOs should not be surprised to see that SEM, Google AdWords, digital/social, and direct marketing make up a significant 50% or more of the overall investment.
CFOs are asking CMOs for the returns on programmatic and digital media buys. How much money does each media buy make, and how do you know it’s working? Gone are the days of “always on” mass advertising that could get away with reporting back only GRPs or brand health measures. The faster CMOs get on board with this shift, the better they can ensure a dynamic process for marketing investments.
A willingness to turn off mass media and run matched-market tests to ensure that it is still working. Many firms need to assess whether TV and print work for their brand or campaign in light of their current target markets. Many ad agencies and marketing service providers (Nielsen, Acxiom, and many more) have advanced audience selection and matching tools to help with this problem. These tools typically integrate audience profiling as well as privacy-protected behavioral information.
The need to run more integrated campaigns, with standard offers across channels and a smarter way of connecting the call to action and the drivers to omni-channel within each medium. For example, mention in other online advertising that consumers can download the firm’s app in the app store. Integrating channels within a campaign will require more complex measurement and attribution rules, as well as additional test marketing and test-and-learn principles.
In this post we briefly explore two of these changes, namely media mix modeling and advertising measurement options. If you want more specifics, please feel free to contact us if we can be helpful at

First, let’s establish that there is always a way to measure mass advertising, and that you do not need to leave it turned on for all eternity to do so. For example, if you want to know whether a media buy in NYC, or in a matched market such as LA (both large markets), brings in the level of sales and inquiries to other channels that you need at a certain threshold, we posit that you can always:

Conduct a simple pre- and post-campaign lift analysis to determine the level of sales and other performance metrics before, during, and after the ad campaign has run.
Hold out a group of matched markets to serve as a control for performance comparison against the market in which you are running the ad.
Shut off the media in a market for a very brief period of time. This lets you compare “dark” period performance with “on” period performance, with seasonality adjustments, to derive some intelligence about performance and perhaps create a performance factor or baseline from which to measure going forward. Such a factor can be leveraged for future measurement without shutting off programs. This is what we call dues paying in marketing analytics: you may have to sacrifice a small amount of sales to read the results each year. It is one way to ensure measurement of mass advertising.
Study any behavioral changes in the cross-sell and upsell rates of current customers who may increase their relationship because of the campaign you are running.
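
The first two options above, a pre/post lift read and a matched control market, can be combined into one simple calculation. The sketch below is only an illustration under stated assumptions: the function names and the weekly sales figures are invented, seasonality adjustment is left out, and a real read would use many weeks of data and a significance test.

```python
def pct_change(before, after):
    """Relative change from the pre-campaign period to the campaign period."""
    return (after - before) / before


def campaign_lift(test_pre, test_post, control_pre, control_post):
    """Change in the test market net of the change in the matched control.

    Subtracting the control market's change removes movement that would
    have happened anyway (for example, a market-wide seasonal upswing).
    """
    return pct_change(test_pre, test_post) - pct_change(control_pre, control_post)


# Hypothetical average weekly sales, before and during the campaign.
lift = campaign_lift(
    test_pre=1000.0, test_post=1200.0,      # market where the ad ran
    control_pre=800.0, control_post=840.0,  # matched market with no ad
)
```

With these made-up numbers the test market grew 20% and the control grew 5%, so roughly 15 points of growth would be attributed to the campaign rather than to general market conditions.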
Another point to make is that Enterprise Marketing Automation can help with tracking and measuring ad campaigns. For really large integrated marketing budgets we recommend Media Mix Modeling or Marketing Mix Modeling. A number of firms (Market Share Partners Inc is one) provide these models, and we can discuss that in future posts. The basic mechanics of Marketing Mix Modeling (MMM) are as follows:

MMM uses econometrics to help understand the relationship between Sales and the various marketing tactics that drive Sales

Combines a variety of marketing tactics (channels, campaigns, etc.) with large data sets of historical performance data
Regression modeling and statistical analysis are performed on the available data to estimate the impact of various promotional tactics on sales, in order to forecast the results of future sets of promotional tactics
This analysis allows you to predict sales based on mathematical correlations to historical marketing drivers and market conditions
Ultimately, the model uses predictive analytics to optimize future marketing investments to drive increases in sales, profits and share
Allows you to understand ROI for each media channel, campaign and execution strategy
Shows which media vehicles and campaigns are most effective at driving revenue, profit and share
Shows you what your incremental sales would be at different levels of media spend
Identifies the optimal spending mix by channel to generate the most sales
Establishes a link between individual drivers and sales
Allows you to identify a sales response curve to advertising
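
To make the regression step concrete, here is a minimal sketch that fits sales against spend in two channels using ordinary least squares in plain Python. It is a toy under loud assumptions: the spend and sales figures are synthetic (generated from sales = 100 + 2 x TV + 3 x digital with no noise), the helper names are invented, and commercial MMM models typically add effects such as carry-over, saturation, and seasonality that are not modeled here.

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(features, target):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, spend):
    """Predicted sales for one (tv, digital) spend combination."""
    return coefs[0] + sum(c * s for c, s in zip(coefs[1:], spend))


# Synthetic history: (tv_spend, digital_spend) per period and resulting sales.
spend_history = [(10, 5), (20, 3), (30, 8), (40, 2), (50, 9), (60, 4)]
sales_history = [135, 149, 184, 186, 227, 232]

coefs = fit_ols(spend_history, sales_history)
forecast = predict(coefs, (70, 6))
```

The fitted coefficients play the role of the link between individual drivers and sales: each one estimates the incremental sales per unit of spend in that channel, and `predict` shows what sales would be at a different level of media spend.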
So the good news is … mass advertising is far from dead! Its effectiveness will be looked at in the broader context of integrated campaigns, with an eye toward contributing hard returns such as more customers and quality relationships. In addition, mass advertising will be looked at in the context of how it integrates with digital. For example, when the firm runs a TV ad, do searches for the firm’s products and brand increase, and do those searches then convert to sales? The funnel in the new omni-channel world is still being established.

In summary, mass advertising must be understood at the segment, customer, and brand level; we believe it has a role in the overall marketing mix when targeted and used in the most efficient way. A more thoughtful view of marketing efficiency is now emerging. It includes matching the TV ad on the right channel to the right audience, understanding metrics and measures, and identifying the integration points where mass can complement digital channels as part of an integrated omni-channel strategy. The view of advertising as a discipline separate from digital marketing is on its way to disappearing, and marketers must be well versed in both online and offline as the lines continue to blur and change. Flexibility is key. Organizations inside companies will continue to merge to reflect this integration and to avoid siloed thinking and sub-par results.

We look forward to dialoguing and getting your thoughts and experience on these topics and understanding counterpoints and other points of view to ours.

Thanks Tony Branda


Mad Men and Math Men

Source: From Mad Men to Math Men: Is Mass Advertising Really Dead?

Better Recruiting Metrics Lead to Better Talent Analytics


According to Josh Bersin in Deloitte’s 2013 report, Talent Analytics: From Small Data to Big Data, 75% of HR leaders acknowledge analytics are important to the success of their organizations. But 51% have no formal talent analytics plans in place. Nearly 40% say they don’t have the resources to conduct sound talent analytics. Asked to rate their own workforce analytics skills, another 56% said poor.

As Bersin further noted in a recent PeopleFluent article, HR Forecast 2014, “Only 14% of the companies we studied are even starting to analyze people-related data in a statistical way and correlate it to business outcomes. The rest are still dealing with reporting, data cleaning and infrastructure challenges.”

There’s a striking gap between the large number of companies that recognize the importance of metrics and talent analytics and the smaller number that actually have the means and expertise to put them to use.

Yes, we do need to gather and maintain the right people data first, such as when and where applicants apply for jobs, and the specific skills an employee has. But data is just information captured by the recruiting system or software already in place. It doesn’t tell any story.

Compare data against goals or thresholds and it turns into insight, a.k.a. workforce metrics: measurements with a goal in mind, otherwise known as Key Performance Indicators (KPIs), all of which gauge quantifiable components of a company’s performance. Metrics reflect critical factors for success and help a company measure its progress towards strategic goals.

But here’s where it gets sticky. You don’t set off on a cross-country road trip until you know how to read the map.

For companies, it’s important to agree on the right business metrics, and it all starts with recruiting. Even with standard metrics for retention and attrition in place, some companies also track dozens of meaningless metrics that are not tied to specific business goals and do not help to improve business outcomes.

I’ve seen recruiting organizations spend all their time in the metrics-gathering phase, and never get around to acting on the results — in industry parlance, “boiling the ocean.” You’re far better off gathering a limited number of metrics that you actually analyze and then act upon.

Many organizations are now focused on developing recruiting metrics and analytics because so much data is available on candidates and internal employees (regardless of classification). Based on my own recruiting experience and that of many other recruiting leaders, here are what I consider the Top 5 Recruiting Metrics:

1. New growth vs. attrition rates. What percentage of the positions you fill are new hires vs. attrition? This shows what true growth really looks like. If you are hiring mostly due to attrition, it would indicate that selection, talent engagement, development and succession planning need attention. You can also break this metric down by division/department, by manager and more.

2. Quality of hires. More and more, the holy grail of hiring. Happily, all measurable: what individual performances look like, how long they stay, whether or not they are top performers, what competencies comprise their performance, where are they being hired from and why.

3. Sourcing. Measuring not just the what but the why of your best talent pools: job boards, social media, other companies, current employees, etc. This metric should also be applied to quality of hire: you’ll want to know where the best candidates are coming from. Also, if you want to know the percentage rate for a specific source, divide the number of source hires by the number of external hires. (For example, total Monster job board hires divided by total external hires.)

4. Effectiveness ratio. How many openings do you have versus how many you’re actually filling? You can also measure your recruitment rate by dividing the total number of new hires per year by the total regular headcount reporting to work each year. Your requisitions-filled percentage can be tallied by dividing the total number of filled requisitions by the total number of approved requisitions.

5. Satisfaction rating. An important one, because it’s not paid much attention to when your other metrics are in good shape. Satisfaction ratings can be gleaned from surveys of candidates, new hires and current employees looking for internal mobility. While your overall metrics may be positive, it’s important to find out how people experience your hiring process.
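
The ratios described in metrics 3 and 4 are simple divisions, and a short sketch makes the inputs explicit. The function names and the hiring figures below are hypothetical, chosen only to illustrate the arithmetic; your own recruiting system would supply the real counts.

```python
def source_rate(source_hires, external_hires):
    """Share of external hires that came from one source (e.g. a job board)."""
    return source_hires / external_hires


def recruitment_rate(new_hires_per_year, regular_headcount):
    """New hires per year divided by regular headcount reporting to work."""
    return new_hires_per_year / regular_headcount


def requisitions_filled_pct(filled_requisitions, approved_requisitions):
    """Filled requisitions as a percentage of approved requisitions."""
    return 100.0 * filled_requisitions / approved_requisitions


# Hypothetical annual figures for one company.
job_board_share = source_rate(source_hires=30, external_hires=120)
hiring_rate = recruitment_rate(new_hires_per_year=150, regular_headcount=1000)
filled_pct = requisitions_filled_pct(filled_requisitions=90, approved_requisitions=120)
```

With these invented counts, one job board accounts for a quarter of external hires, the recruitment rate is 15% of headcount, and three quarters of approved requisitions were filled. Tracking each value against a target is what turns the raw counts into a metric.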

As your business leaves behind those tedious spreadsheets and manual reports and moves into Talent Analytics, metrics are going to be what feeds those results. Consider which metrics are the most appropriate for your business — and why. And then, the real analysis can begin, and help your organization make better talent-related decisions.


Source by analyticsweekpick