Impacting Insurance Company’s bottom line through Big-data

In a recent blog post, What Insurance companies could do to save others from eating their lunch, I stated that good data management is one of the essential components of business growth. Big data is right up that alley.

What is big-data?
The definition of big data changes from case to case, but Wikipedia summarizes it well: In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools. The challenges include capture, storage, search, sharing, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”

Why is it important for insurance and why should they care?
Insurance companies have been gathering data (both structured and unstructured) for years, so the current big-data landscape suits them well. They could use big-data tools to analyze that data and gain insights that help them serve customers better and differentiate themselves from competitors. Here are a few use cases that should motivate insurers to add big data to their arsenal.

1. Do linkage analysis of structured and unstructured data: Insurance companies have been collecting data for eons and storing it in either structured or unstructured form. Before the age of sophisticated analytical tools, combing that data for further insights was nearly impossible given the effort and cost relative to the expected outcome. Thanks to big data, many tools have emerged that can do the task with minimal resources and promise great outcomes. Insurance companies should take this as an opportunity to look deep into their data silos and process them to find meaningful correlations and insights that help the business.

2. Use public data from the social web and scour it for prospect signals: Another big area opened up by sophisticated big-data tools is capturing the social web, searching it for meaningful keywords, and using the results to understand the insurance landscape. For example, consider tracking the keywords used to describe your line of business and seeing how much of that space you lead. Many other use cases that are critical to insurance could be solved by big-data tools.

3. Use data from the social web to watch the competition: This is another powerful use case that many companies rely on to better understand their competition, its brand perception and its social media footprint. It is done by monitoring competitors’ public web activity and analyzing the findings to learn more about them. The real-time nature of the data makes it all the more interesting, because it keeps the information current.

4. Sniffing and processing all the product interfaces for insights: This is another big area harnessed by big-data tools. Thanks to their analytical capabilities, big-data tools can also provide real-time insights from data collected across all the product interfaces, whether verbal (call-center logs, queries, etc.) or non-verbal (logs, activity reports, market conditions, etc.). Once an appropriate model framework to consume that data is built, big-data tools can get to work on real-time analysis of customers and sales and provide invaluable, actionable insights.

5. Big-data for data driven innovation: I have been a strong advocate for data driven innovation, which means innovating using the power of data. Once the modules that could drive innovation are identified, their information can be processed and monitored for correlations with business-critical KPIs. Once a direct link is established, tweaking the process and monitoring its impact on the system quickly helps in understanding the areas for improvement. So, this module could be used to create innovation and promote lean build-measure-learn loops for faster learning and deployment. This will drastically reduce the execution cycle for testing innovations.

I am certain that there are numerous other areas in which insurers could put big data to work. Feel free to share your thoughts in the comments.

Source by d3eksha

Which Machine Learning to use? A #cheatsheet

In teams driving data science today, there has been an onslaught of discussions about which machine learning method to use and which algorithms perform best for which problems.

Several factors drive that decision. Some of the main ones are:
1. Type of data: such as quantity, quality and variety of the data.
2. Resources for the task
3. Expected time for the task
4. Expectation from the data

Our friends at SAS have put together a cheat sheet that could work as a great starting point.

Chart picked from: SAS Blog

Originally Posted at: Which Machine Learning to use? A #cheatsheet

Data And Analytics Collaboration Is A Win-Win-Win For Manufacturers, Retailers And Consumers

Collaboration between consumer packaged goods (CPG) manufacturers and retailers is more important than ever. Why? It’s because manufacturers struggle with fading brand loyalty from price sensitive consumers who are increasingly switching to private label, niche or locally-produced products. And at the same time, retailers face two major challenges of their own: showrooming—when shoppers come into the store to browse before buying online—and pressures from aggressive online retailers.

To best counter these and other issues, manufacturers and retailers need to engage as partners. Through data and analytic collaboration, they can establish a critical connection that enables them to work together to co-create differentiated in-store experiences that deliver mutual benefits.

Everybody Wins

Consumers used to get product information from advertisements and by talking to salespeople in brick-and-mortar stores. Today, shoppers aggressively research and compare products before setting foot inside a store, or even while in the store, by checking competing retailers on their smartphones and tablets. Due to the mobile revolution, prices, product variations and reviews are more available and easier to compare than ever.
This showrooming trend can result in lost customers and lost revenues. It also renders traditional approaches to collaboration between manufacturers and retailers ineffective. Making decisions based on historical sales data is no longer sufficient. Driving category growth is increasingly about serving the right information to the shopper at the right time to support a purchase decision.

What’s needed is cooperation along with analysis of integrated data to deliver actionable insights that enable better brand, product, packaging, supply chain and business planning decisions; and to power shopper marketing programs in-store and online. This approach benefits not only manufacturers and retailers, but also consumers who enjoy the advantages of shopper reward and loyalty programs.

The Winning Road

With many shopping decisions being made outside the store environment, there is an increased priority placed on understanding and influencing shopper behavior at many points along the path to purchase. Mobile, social networks, Web and email channels are the new media used every day by marketers to target content and offers that drive purchase activity. One-to-one relationships are becoming the new currency upon which the most valued brands are based, while creating unique shopper experiences has come to define retail excellence.

Leading CPG companies have differentiated themselves by executing laser-focused consumer connection strategies based on data analytics. A variety of data-driven decisions, from assortment and inventory planning through pricing and trade promotion, all affect shopper purchase outcomes.

Based on integrated and detailed data from sources such as the retailer’s point-of-sale system, loyalty programs, syndicated sources and data aggregators, analytics allows CPG companies to become more relevant to their consumers by meeting their needs, earning their loyalty and building relationships. In short, advanced analytics separate successful retail-manufacturer partnerships from those that aren’t.

Focus On Demand

Maintaining an efficient distribution and inventory process is critical to maximizing financial performance and meeting buyers’ expectations. Sharing shopper data and insights supports concepts such as collaborative demand forecasting, dynamic replenishment and vendor-managed inventory.

Price, promotion and shelf placement are critical areas that drive collaboration, but the efforts are often based on summary-level and infrequently updated data. To effectively move the needle in managing a category at the shelf, organizations must have a strong analytics foundation. Armed with better insights, category managers, store operations leaders, merchandise planners and allocation decision-makers can optimize the factors that influence sales performance of products in specific categories, geographies and stores.

Value In Data-Driven Collaboration

CPG manufacturer and retail business executives recognize the value of fact-based decision making enabled by integrated data and real-time analytics. Data-driven collaboration establishes a beneficial connection that allows both sides to achieve common objectives, including increased product sales, growth in revenue and brand loyalty.

Gib Bassett is the global program director for Consumer Goods at Teradata.

Justin Honaman is a Partner with Teradata and leads the National Consumer Goods Industry Consulting practice.

This story originally appeared in the Q2 2014 issue of Teradata Magazine.

Originally posted via “Data And Analytics Collaboration Is A Win-Win-Win For Manufacturers, Retailers And Consumers”.


Caterpillar digs in to data analytics—investing in hot startup Uptake

The mining and construction equipment maker wants a piece of the industrial Internet. Its strategy? Turn to the startup world for help.

General Electric isn’t the only industrial giant attempting to jumpstart its business with data and software services. On Thursday morning Caterpillar announced it has made a minority investment in Uptake, a Chicago-based data analytics platform co-founded by long-time entrepreneur Brad Keywell, who is profiled in the current issue of Fortune.

As part of the agreement, Caterpillar and Uptake will co-develop “predictive diagnostics” tools for the larger company’s customers. (Uptake says it is also working with other players in insurance, automotive and healthcare, though it won’t disclose other names.) The idea? To take the mountains of data spewing from bulldozers and hydraulic shovels and turn it into meaningful information that can help Caterpillar’s customers catch potential maintenance issues before breakdowns occur, minimizing downtime.

“We had some experience in this [data analytics] because of our autonomous mining equipment,” Doug Oberhelman, Caterpillar’s CEO, said in an interview with Fortune last month. “But we were really looking for somebody to help us jumpstart this. And that’s where the lightbulb went on between Brad and I.”

Oberhelman’s company, based in Peoria, Ill., is a 90-year-old manufacturer whose performance is often seen as a gauge of the health of the global economy. Uptake, meanwhile, is the brainchild of Keywell, a serial entrepreneur best known for co-founding daily-deal site Groupon. But the two CEOs bonded at a Chicago-area breakfast hosted by—of all people—British prime minister David Cameron.

“This is an early example of something that will become commonplace at some point—entrepreneurs trying to disrupt within an industry rather than disrupt an industry from the outside,” Keywell told Fortune.

Of course, disrupting GE’s $1 billion head start in data analytics (which includes a massive software center the company has built in the Bay Area) won’t be easy. CEO Jeff Immelt has made it clear that the so-called “industrial Internet” will bring about the next wave of (much-needed) growth for his company, and the company has been hard at work developing applications based on its Predix platform and lining up customers like United Airlines and BP.

Uptake, located in the former Montgomery Ward building in Chicago (where many of Keywell’s other startups are also headquartered), has about 100 employees, including a handful of data scientists from nearby universities. It is a speck compared to GE, the 27th largest company in the world. But its ability to snag Caterpillar as both an investor and a customer won’t go unnoticed: Not everyone, especially competitors, will want to buy software from GE.

Caterpillar isn’t disclosing how much money it is putting into Uptake, but the two companies have already been working closely for several months (Keywell’s venture capital fund, Lightbank, has also invested in the startup). The ROI for Caterpillar could be far-reaching. While it takes three to five years to design and build a new bulldozer, Uptake’s product development is measured in days and weeks. In an industry that will increasingly have to rely on software services for growth, incorporating some of the startup world’s DNA—speed and agility—is a smart bet for Caterpillar.

Originally posted via “Caterpillar digs in to data analytics—investing in hot startup Uptake”


5 Brands That Are Using the Internet of Things in Awesome Ways

Is your refrigerator running a spam operation? Thanks to the Internet of Things, the answer to that question could be yes.

Despite some dystopian fears, like that spamming refrigerator, the Internet of Things isn’t just an eerie term that sounds like it was plucked from Brave New World. It is a vague one though, so to clear up any uncertainty, here’s the dictionary definition: “a proposed development of the Internet in which everyday objects have network connectivity, allowing them to send and receive data.”

As Altimeter Group points out in its new report, “Customer Experience in the Internet of Things,” brands are already using this sci-fi technology in amazing ways to build customer relationships and optimize their products. In reality, it’s more evolution than revolution, as companies are already tracking smartphone and Internet usage to gather data that provides crucial feedback about consumer behavior. As the report states, the Internet of Things only “brings us closer than ever to the ultimate marketing objective: delivering the right content or experience in the right context.”

Talk of trackers and sensors and refrigerators gone wild may sound intimidating for brands that are still getting their content operations up and running, but some major companies are already exploring the new frontier of the Internet of Things. Here are the five brands doing it best.

1. Walgreens

Have you ever found yourself searching for a specific item in a pharmacy, wishing you could click control-F to locate it, pay, and leave quickly? Aisle411, Google Tango, and Walgreens teamed up to create a new mobile app that can grant harried shoppers that wish. By using Google’s virtual indoor 3D mapping technology, Aisle411 created a mobile shopping platform that lets consumers search and map products in the store, take advantage of personalized offers, and easily collect loyalty points.

“This changes the definition of in-store advertising in two key ways,” Aisle411 CEO Nathan Pettyjohn told Mobile Commerce Daily. “Advertising becomes an experience—imagine children in a toy store having their favorite toy guide them through the store on a treasure hunt in the aisles of the store—and the end-cap is everywhere; every inch of the store is now a digital end cap.”

According to a Forrester study, 19 percent of consumers are already using their mobile devices to browse in stores. Instead of forcing consumers to look away from their screens, Walgreens is meeting them there.

2. Taco Bell

Nowadays, practically everyone is reliant on their GPS to get them places. That’s why Taco Bell is targeting consumers based on location by advertising and messaging them on mobile platforms like Pandora, Waze (a navigation app purchased by Google), and weather apps.

Digiday reports that in 2014, Taco Bell positioned ads on Waze for their 12 Pack product each Saturday morning to target drivers who might’ve been on their way to watch college football games. The Waze integration was so successful that Taco Bell decided to do the same thing on Sundays during the NFL season—this time advertising its Cool Ranch Doritos Locos Taco.

3. Home Depot

Home Depot has previously used augmented reality in its mobile app to allow users to see how certain products would look in their homes. IKEA is also known for enticing consumers with this mobile strategy. But now, Home Depot is making life even easier for shoppers by piloting a program that connects a customer’s online shopping carts and wish lists with an in-store mobile app.

As explained in the Altimeter report, upon entering a Home Depot, customers who are part of the Pro Rewards program will be able to view the most efficient route through the store based on the products they shopped for online. And anyone who’s been inside a Home Depot knows how massive and overwhelming those places can be without directions.

Creepy? Maybe. But helpful? Definitely. Michael Hibbison, VP of marketing and social media at Home Depot, defends the program to Altimeter Group: “Loyalty programs give brands more rope when it comes to balancing risks of creep. The way we think of it is we will be as personalized as you are loyal.”

4. Tesla Motors

Getting your car fixed can be as easy as installing a software update on your phone—at least for Tesla customers. Tesla’s cars are electric, powered by batteries similar to those that fuel your laptop and mobile device. So when Tesla had to recall almost 30,000 Model S cars because their wall chargers were overheating, the company was able to do the ultimate form of damage control. Instead of taking the products back or bothering customers to take them to a dealership, Tesla just updated the software of each car, effectively eliminating the problem in all of their products.

Tesla also used this connectedness by crowdsourcing updated improvements for their products. As reported by Altimeter, a customer recently submitted a request for a crawl feature that allows the driver to ease into a slow cruise control in heavy traffic. Tesla not only granted the customer’s request, but they added the feature to their entire fleet of cars with just one software update.

5. McDonald’s

McDonald’s may be keeping it old school with their Monopoly contest, which, after 22 years, can still be won by peeling back stickers on your fries and McNuggets. But for their other marketing projects, McDonald’s is getting pretty tech savvy.

McDonald’s partnered with Piper, a Bluetooth low-energy beacon solution provider, to greet customers on their phones as they enter the restaurant. Through the app, consumers are offered coupons, surveys, Q&As, and even information about employment opportunities.

What does McDonald’s get out of it? Data. Lots of data. When customers enter comments, their feedback is routed to the appropriate manager who can respond to the request before the person leaves the establishment.

Too close for comfort? Not compared to the company’s controversial pay-with-a-hug stunt. And at least this initiative is working. According to Mobile Commerce Daily, in the first month of the app’s launch McDonald’s garnered more than 18,000 offer redemptions, and McChicken sales increased 8 percent.

By tapping into the Internet of Things, brands can closely monitor consumer behavior, and—even though it may sound a bit too invasive—put the data they collect to good use. With sensors, a product can go from being a tool to an actual medium of communication between the marketer and the consumer. That sounds pretty cool. But, just to be safe, if you get a shady email from your fridge, maybe don’t open it.

 To read the original article on The Constant Strategist, click here.


Source: 5 Brands That Are Using the Internet of Things in Awesome Ways

Business Linkage Analysis: An Overview

Customer feedback professionals are asked to demonstrate the value of their customer feedback programs. They are asked: Does the customer feedback program measure attitudes that are related to real customer behavior? How do we set operational goals to ensure we maximize customer satisfaction? Are the customer feedback metrics predictive of our future financial performance and business growth? Do customers who report higher loyalty spend more than customers who report lower levels of loyalty? To answer these questions, companies look to a process called business linkage analysis.

Figure 1. Companies who adopt linkage analysis get the insight that drives customer loyalty

Business Linkage Analysis is the process of combining different sources of data (e.g., customer, employee, partner, financial, and operational) to uncover important relationships among important variables (e.g., call handle time and customer satisfaction). For our context, linkage analysis will refer to the linking of other data sources to customer feedback metrics (e.g., customer satisfaction, customer loyalty).

Business Case for Linkage Analyses
Based on a recent study on customer feedback program best practices (Hayes, 2009), I found that companies who regularly conduct operational linkage analyses with their customer feedback data had higher customer loyalty (72nd percentile) compared to companies who do not conduct linkage analyses (50th percentile). Furthermore, customer feedback executives were substantially more satisfied with their customer feedback program in helping them manage customer relationships when linkage analyses (e.g., operational, financial, constituency) were a part of the program (~90% satisfied) compared to their peers in companies who did not use linkage analyses (~55% satisfied). Figure 1 presents the effect size for VOC operational linkage analyses.

Linkage analysis appears to have a positive impact on customer loyalty by providing executives the insights they need to manage customer relationships. These insights give loyalty leaders an advantage over loyalty laggards. Loyalty leaders apply linkage analysis results in a variety of ways to build a more customer-centric company: determine the ROI of different improvement efforts, create customer-centric operational metrics (important to customers) and set employee training standards to ensure customer loyalty, to name a few. In upcoming posts, I will present specific examples of linkage analyses using customer feedback data.

Linkage Analysis: A Data Management and Analysis Problem

Figure 2. Linking Disparate Business Data Sources Leads to Insight

You can think of linkage analysis as a two-step process: 1) organizing two disparate data sources into one coherent dataset and 2) conducting analyses on that aggregated dataset. The primary hurdle in any linkage analysis is organizing the data so that the resulting linked dataset makes logical sense for our analyses (appropriate unit of analysis). Therefore, data management and statistical skills are essential in conducting a linkage analysis study. More on that later.
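As a rough illustration of the two steps, here is a minimal Python sketch. The records, the field names (`call_id`, `satisfaction`, `handle_time_min`) and the helper functions are all hypothetical, invented for this example; the unit of analysis is assumed to be a single call-center transaction.

```python
from statistics import mean

# Hypothetical transaction-based survey responses (one row per call).
survey_rows = [
    {"call_id": 1, "satisfaction": 9},
    {"call_id": 2, "satisfaction": 8},
    {"call_id": 3, "satisfaction": 7},
    {"call_id": 4, "satisfaction": 5},
    {"call_id": 5, "satisfaction": 4},
    {"call_id": 6, "satisfaction": 7},  # no matching operational record
]

# Hypothetical operational records from a call-center system.
operational_rows = [
    {"call_id": 1, "handle_time_min": 4.0},
    {"call_id": 2, "handle_time_min": 6.0},
    {"call_id": 3, "handle_time_min": 8.0},
    {"call_id": 4, "handle_time_min": 10.0},
    {"call_id": 5, "handle_time_min": 12.0},
]


def link(survey, operations, key):
    """Step 1: combine two sources on a shared key, keeping only matches."""
    ops_by_key = {row[key]: row for row in operations}
    linked = []
    for row in survey:
        match = ops_by_key.get(row[key])
        if match is not None:
            linked.append({**match, **row})
    return linked


def pearson(xs, ys):
    """Step 2: Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


linked = link(survey_rows, operational_rows, "call_id")
r = pearson(
    [row["handle_time_min"] for row in linked],
    [row["satisfaction"] for row in linked],
)
print(f"{len(linked)} linked rows, r = {r:.3f}")
```

In this made-up data, longer handle times go with lower satisfaction, so the correlation is strongly negative. The survey response with no operational match is dropped during linking, which is one reason the choice of key and unit of analysis matters.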

Once the data are organized, the researcher is able to conduct nearly any kind of statistical analysis he or she wants (e.g., regression, ANOVA, multivariate), as long as it makes sense given the types of variables (e.g., nominal, interval) being used.

Figure 3. Common Types of Linkages among Disparate Data Sources

Types of Linkage Analyses

In business, linkage analyses are conducted using the following types of data (see Figure 2):

  1. Customer Feedback
  2. Financial
  3. Operational
  4. Employee
  5. Partner

Even though I discuss these data sources as if they are distinct, separate sources of data, it is important to note that some companies have some of these data sources housed in one dataset (e.g., call center system can house transaction details including operational metrics and customer satisfaction with that transaction). While this is an advantage, these companies still need to ensure their data are organized together in an appropriate way.

With these data sources, we can conduct three general types of linkage analyses:

  1. Financial: linking customer feedback to financial metrics
  2. Operational: linking customer feedback to operational metrics
  3. Constituency: linking customer feedback to employee and partner variables

Before we go further, I need to make an important distinction between two different types of customer feedback sources: 1) relationship-based and 2) transaction-based. In relationship-based feedback, customer ratings (data) reflect their overall experience with and loyalty towards the company. In transaction-based feedback, customer ratings (data) reflect their experience with a specific event or transaction. This distinction is necessary because different types of linkage analyses require different types of customer feedback data (See Figure 3). Relationship-based customer feedback is needed to conduct financial linkage analyses and transaction-based customer feedback is needed to conduct operational linkage analyses.

Statistical Analyses

The term “linkage analysis” is actually a misnomer. Linkage analysis is not really a type of analysis; it is used to denote that two different data sources have been “linked” together. In fact, several types of analyses can be employed after two data sources have been linked together. Three general types of analyses that I use in linkage analyses are:

  1. Factor analysis of the customer survey items: This analysis helps us create indices from the customer surveys. These indices will be used in the analyses. These indices, because they are made up of several survey questions, are more reliable than any single survey question. Therefore, if there is a real relationship between customer attitudes and financial performance, the chances of finding this relationship greatly improve when we use metrics rather than single items.
  2. Correlational analysis (e.g., Pearson correlations, regression analysis): This class of analyses helps us identify the linear relationship between customer satisfaction/loyalty metrics and other business metrics.
  3. Analysis of Variance (ANOVA): This type of analysis helps us identify the potentially non-linear relationships between the customer satisfaction/loyalty metrics and other business metrics. For example, it is possible that increases in customer satisfaction/loyalty will not translate into improved business metrics until customer satisfaction/loyalty reaches a critical level. When ANOVA is used, the independent variables in the model (x) will be the customer satisfaction/loyalty metrics and the dependent variables will be the financial business metrics (y).
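To make the ANOVA case concrete, here is a minimal Python sketch of a one-way ANOVA F statistic computed by hand. The revenue figures and the three satisfaction groups are invented for illustration, and the `one_way_anova_f` helper is a hypothetical name; in practice a statistics package would also report the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical annual revenue per customer, grouped by satisfaction level (x).
revenue_by_satisfaction = {
    "low": [100.0, 110.0, 90.0],
    "medium": [105.0, 95.0, 100.0],
    "high": [150.0, 160.0, 155.0],
}

group_means = {level: mean(vals) for level, vals in revenue_by_satisfaction.items()}
f_stat = one_way_anova_f(list(revenue_by_satisfaction.values()))
print(group_means, round(f_stat, 2))
```

In this invented data, the low and medium groups have the same mean revenue and only the high group differs. That is the kind of threshold pattern a linear correlation would understate and a comparison of group means can expose.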


Business linkage analysis is the process of combining different sources of data to uncover important insights about the causes and consequences of customer satisfaction and loyalty. For VOC programs, linkage analyses fall into three general types: financial, operational, and constituency. Each of these types of linkage analyses provides useful insight that can help senior executives better manage customer relationships and improve business growth. I will provide examples of each type of linkage analysis in following posts.

Download a free white paper titled, “Linkage Analysis in your Voice of the Customer Program.”

Source by bobehayes

Meet the startup that is obsessed with tracking every other startup in the world


At their previous jobs at venture capital firms, Sequoia Capital and Accel Partners, respectively, Neha Singh and Abhishek Goyal often had to help identify prospective startups and make investment decisions.

But it wasn’t always easy.

Startups usually don’t disclose information about themselves, since they are privately held firms and are under no compulsion to share data publicly. So, Singh and Goyal had to constantly struggle to collate information from multiple sources.

Eventually, fed up with the lack of a single source for data, the Indian Institute of Technology graduates quit their jobs in 2013 to start an analytics firm, Tracxn!. Their ambition: To become the Gartner—the go-to firm for information technology research—of the startup ecosystem.

“It’s almost surprising,” Singh told Quartz in an email interview, “that despite billions of dollars invested in each of the sectors (be it foodtech or mobile commerce, or payments, etc), thousands of people employed in this ecosystem and many more aspiring to start something here, there is not a single source which tracks and provides insights into these private markets.”

Tracxn! started operations in May 2013, working from Lightspeed Venture Partners’ office in Menlo Park, California, with angel funding from founders of e-commerce companies like Flipkart and Delhivery. In 2014, the startup began its emerging markets operation with focus on India and China.

“After our first launch in April last year, we scaled the revenues quickly and turned profitable last September, (and) grew to a team of 40,” Singh said. Most of its analysts are based in Bengaluru.

Tracxn! follows a SaaS (software as a service) business model, charging subscribers between $20,000 and $90,000 per year. With a database of over 7,000 Indian and 21,000 US startups, Singh and Goyal now count over 50 venture capital funds among their clients, which also include mergers and acquisitions specialists, product managers, founders and aspiring entrepreneurs.

While firms like Mattermark, Datafox and CB Insights provide similar services, Tracxn! allows investors to get an overview of a sector within the ecosystem before drilling down to individual companies.

“For many funds, we have become a primary source of their deal discovery,” said Singh. “We want to become the default research platform for anyone looking for information and trends on these private markets and companies.”

In April this year, Tracxn! received $3.5 million in funding from private equity firm, SAIF Partners, which it plans to use to ramp up its analyst strength to 150 by the end of the year.

“We keep getting inquiries from investors across various countries (like from Europe, parts of Southeast Asia, etc),” explained Singh. “But we cannot launch them because we don’t have analyst teams for it.”

But with money on its way, Tracxn! now wants to expand coverage into Malaysia, Indonesia, Singapore, Philippines, Vietnam and Europe to build its global database.


Originally Posted at: Meet the startup that is obsessed with tracking every other startup in the world

The What and Where of Big Data: A Data Definition Framework

I recently read a good article on the difference between structured and unstructured data. The author defines structured data as data that can be easily organized. As a result, these types of data are easily analyzable. Unstructured data refers to information that either does not have a pre-defined data model and/or is not organized in a predefined manner. Unstructured data are not easy to analyze. A primary goal of a data scientist is to extract structure from unstructured data. Natural language processing is a process of extracting something useful (e.g., sentiment, topics) from something that is essentially useless (e.g., text).

While I like the definitions she offers, she included an infographic that is confusing. It equates the structural nature of the data with the source of the data, suggesting that structured data are generated solely from internal/enterprise systems while unstructured data are generated solely from social media sources. I think it would be useful to separate the format (structured vs. unstructured) of the data from the source (internal vs. external) of the data.

Sources of Data: Internal and External

Generally speaking, business data can come from either internal sources or external sources. Internal sources of data reflect those data that are under the control of the business. These data are housed in financial reporting systems, operational systems, HR systems and CRM systems, to name a few. Business leaders have a large say in the quality of internal data; they are essentially a byproduct of the processes and systems the leaders use to run the business and generate/store the data.

External sources of data, on the other hand, are any data generated outside the walls of the business. These data sources include social media, online communities, open data sources and more. Because they originate outside the business, external sources of data are under less control by the business than internal sources are. These data are collected by other companies, each using its own systems and processes.

Data Definition Framework

Figure 1. Data Definition Framework

This 2×2 data framework is a way to think about your business data (See Figure 1). This model distinguishes the format of data from the source of data. The 2 columns represent the format of the data, either structured or unstructured. The 2 rows represent the source of the data, either internal or external. Data can fall into one of the four quadrants.

Using this framework, we see that unstructured data can come from both internal sources (e.g., open-ended survey questions, call center transcripts) and external sources (e.g., Twitter comments, Pinterest images). Unstructured data is primarily human-generated. Human-generated data are those that are input by people.

Structured data also can come from both inside (e.g., survey ratings, Web logs, process control measures) and outside (e.g., GPS for tweets, Yelp ratings) the business. Structured data includes both human-generated and machine-generated data. Machine-generated data are those that are calculated/collected automatically and without human intervention (e.g., metadata).
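To make the 2×2 framework concrete, here is a minimal sketch of how a business might catalog its data assets by source and format and then look for empty quadrants. The asset list reuses the examples from the text; the names `DATA_ASSETS`, `build_framework` and `find_gaps` are hypothetical and only for illustration.

```python
# Hypothetical catalog of data assets: (name, source, format).
# The entries echo the examples given in the text.
DATA_ASSETS = [
    ("survey ratings", "internal", "structured"),
    ("Web logs", "internal", "structured"),
    ("call center transcripts", "internal", "unstructured"),
    ("open-ended survey questions", "internal", "unstructured"),
    ("Yelp ratings", "external", "structured"),
    ("Twitter comments", "external", "unstructured"),
]

SOURCES = ("internal", "external")        # the 2 rows
FORMATS = ("structured", "unstructured")  # the 2 columns


def build_framework(assets):
    """Group data assets into the four (source, format) quadrants."""
    quadrants = {(s, f): [] for s in SOURCES for f in FORMATS}
    for name, source, fmt in assets:
        if (source, fmt) not in quadrants:
            raise ValueError(f"unknown quadrant for {name!r}: {source}/{fmt}")
        quadrants[(source, fmt)].append(name)
    return quadrants


def find_gaps(quadrants):
    """Return the quadrants that contain no data assets."""
    return [key for key, names in quadrants.items() if not names]


framework = build_framework(DATA_ASSETS)
for (source, fmt), names in framework.items():
    print(f"{source:8} / {fmt:12}: {', '.join(names) or '(none)'}")
```

Calling `find_gaps(framework)` on your own catalog would list any quadrant for which you currently collect nothing, which is one simple way to spot gaps in data collection.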

The quality of any analysis is dependent on the quality of the data. You are more likely to uncover something useful in your analysis if your data are reliable and valid. When measuring customers’ attitudes, we can use customer ratings or customer comments as our data source. Customer satisfaction ratings, due to the nature of the data (structured / internal), might be more reliable and valid than customer sentiment metrics from social media content (unstructured / external); as a result, the use of structured data might lead to a better understanding of your data.

Data format is not the same as data source. I offer this data framework as a way for businesses to organize and understand their data assets. Identify strengths and gaps in your own data collection efforts. Organize your data to help you assess your Big Data analytic needs. Understanding the data you have is a good first step in knowing what you can do with it.

What kind of data do you have?



Why You Must Not Have Any Doubts About Cloud Security

The fear of the unknown grips everyone when adopting anything new, so it is natural that there are many skeptics when it comes to Cloud computing, a new technology that not everybody understands. The lack of understanding creates fear that makes people worry without reason before they take the first step in adopting the latest technology. The pattern has been evident during the introduction and launch of every new technology, and the advent of the Cloud is no exception. It is therefore not a surprise that, when it comes to Cloud computing, the likely stakeholders, comprising IT professionals and business owners, are wary of the technology and often suspicious about its security level.

Despite wide-scale adoption (more than 90% of enterprises in the United States use the Cloud), there are mixed feelings about Cloud security among companies. Interestingly, it is not only enterprises that use Cloud services; the Cloud attracts a large section of small and medium businesses too, with 52% of SMBs utilizing the platform for storage. The numbers indicate that users have been able to overcome their initial fear and are now trying to figure out what the new technology is. There is a feeling that Cloud security is inferior to the security offered by legacy systems, and in this article we will try to understand why the Cloud is so useful and why there should not be concerns about its security.

The perception of Cloud security

The debate rages around whether the Cloud is much more secure or somewhat more secure than legacy systems. A survey revealed that 34% of IT professionals feel that the Cloud is slightly more secure, but not secure enough to give them the confidence to rank it a few notches above legacy systems. The opinion stems from the fact that there have been some high-profile data breaches in the Cloud at Apple iCloud, Home Depot, and Target, but those breaches resulted not from shortcomings of Cloud security but from human factors. Misinformation and lack of knowledge are what make people skeptical about Cloud security.

Strong physical barriers and close surveillance

There used to be a time when legacy system security was not an issue, because denying access to on-premise computers was good enough to thwart hackers and other intrusions. However, it can be difficult to implement proper security in legacy systems comprising workstations, terminals, and browsers, which makes them unreliable. Businesses are now combining legacy systems with Cloud infrastructure and with backup and recovery services, which makes them more vulnerable to security threats from hackers. Moreover, it is not easy to assess the security of legacy systems: the assessment is a multi-step process, which tends to indicate that replacing the legacy system is the better option.

While a locked door is the only defense in most offices to protect the computer system, Cloud service providers have robust arrangements for the physical security of data centers, comprising barbed wire, high fences, concrete barriers, security cameras and guards patrolling the area. Besides preventing people from entering the data center, these arrangements also monitor activities in the adjoining spaces.

Access is controlled

The threat comes not only from online attackers who try to breach the system but also from people who gain easy physical access to it. Cloud service providers protect data by encrypting it during storage, and organizations are now turning to selective data storage, using the Cloud to store sensitive data offsite and keep it inaccessible to unauthorized persons. This reduces the human risk of damage, since only authorized users get access to sensitive data that remains securely stored in the Cloud. No employees, vendors or third parties can access the data by breaching the security cordon.

Assured cybersecurity

Cloud service providers are well aware of the security concerns and adopt robust security measures to ensure that once data reaches the data centers, it remains wholly protected. The Cloud is under close monitoring and surveillance round the clock, which gives users more confidence about data security. When using Cloud services, you not only get access to a top-class data center that offers flexibility and security, but you also receive the support of qualified experts who help you make better use of the resource for your business.

Auditing security system

To ensure flawless security for their clients, Cloud service providers conduct frequent audits of their security features to identify possible weaknesses and take measures to eradicate them. Although a yearly audit is the norm, an interim audit may also take place if the need arises.

As the number of Cloud service users keeps increasing, the growth itself helps to quell the security fears.


21 Big Data Master Data Management Best Practices


Master Data Management (MDM) is the process of establishing and implementing standards, policies and tools for data that’s most important to an enterprise, including but not limited to information on customers, employees, products and suppliers.

Per Wiki:
In business master data management (MDM) comprises the processes, governance, policies, standards and tools that consistently defines and manages the critical data of an organization to provide a single point of reference.[1]
The data that is mastered may include:
master data – the business objects for transactions, and the dimensions for analysis
reference data – the set of permissible values to be used by other data fields
Transactional data – supports applications
Analytical data – supports decision making [2]
In computing, an MDM tool can be used to support master data management by removing duplicates, standardizing data (mass maintaining), and incorporating rules to eliminate incorrect data from entering the system in order to create an authoritative source of master data. Master data are the products, accounts and parties for which the business transactions are completed. The root cause problem stems from business unit and product line segmentation, in which the same customer will be serviced by different product lines, with redundant data being entered about the customer (aka party in the role of customer) and account in order to process the transaction. The redundancy of party and account data is compounded in the front to back office life cycle, where the authoritative single source for the party, account and product data is needed but is often once again redundantly entered or augmented.
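As a rough illustration of the three tasks named above (standardizing data, applying rules that keep incorrect data out, and removing duplicates), the following sketch merges customer records entered by different product lines into one master record per customer. It is not how any particular MDM product works; the field names, the matching key (a normalized email address) and the validation rule are all assumptions made for the example.

```python
def standardize(record):
    """Normalize the fields used for matching (an illustrative rule set)."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "source": record["source"],
    }


def is_valid(record):
    """A toy validation rule: reject records without a plausible email."""
    return "@" in record["email"]


def build_master(records):
    """Merge records that share a standardized email into one master record."""
    master = {}
    for raw in records:
        rec = standardize(raw)
        if not is_valid(rec):
            continue  # the rule keeps incorrect data out of the master set
        entry = master.setdefault(
            rec["email"],
            {"name": rec["name"], "email": rec["email"], "sources": []},
        )
        entry["sources"].append(rec["source"])
    return list(master.values())


# Hypothetical records for the same customer entered by different product lines.
records = [
    {"name": "jane  doe", "email": "Jane.Doe@Example.com ", "source": "auto"},
    {"name": "Jane Doe", "email": "jane.doe@example.com", "source": "home"},
    {"name": "Bad Row", "email": "not-an-email", "source": "life"},
]
customers = build_master(records)
print(customers)
```

Real matching is usually fuzzier than an exact key comparison (names, addresses and phone numbers rarely agree character for character), so treat the email key here as a stand-in for whatever matching logic your hub provides.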

So, with a task this important, master data must be designed appropriately, after careful consideration of the various bells and whistles that are responsible for the success or failure of the project. Following are the top 21 best practices that need to be considered before applying a good data management strategy.

1. Define “What is the business problem we’re trying to solve?”:
With so much data and so many disparate data sources, it is very easy to get lost in translation. So, a mental road map of the overall objective will help keep the effort streamlined.

2. Understand how the project helps to prep you for big data:
Yes, growing data is a concern, and it should be sorted out at the planning stage. It is important to identify how the master data management strategy will prepare your organization not only for generic enterprise data but also to cope with ever-increasing big data.

3. Devise a good IT strategy:
A good IT strategy always goes hand in hand with a good data strategy. A dysfunctional IT strategy could really throw off even the most efficiently designed data management strategy. A good IT strategy increases the chances of success for a good MDM strategy by several degrees.

4. Business “users” must take full ownership of the master data initiative:
It’s important that the business and its users take full ownership of the initiative. Well-defined ownership will save the project from the communication failures that are almost always responsible for project failure.

5. Allow ample time for evaluation and planning:
A well-laid-out planning stage ensures all the cracks and crevices are sorted out before the project is rolled out. A rushed project often increases the risk of failure. Don’t underestimate the time and expertise needed to develop foundational data models.

6. Understand your MDM hub’s data model and how it integrates with your internal source systems and external content providers:
When data model problems cropped up relatively late in the project, whether it was a disconnect between the hub and an important source system, or a misalignment between data modeled in the hub and an external information provider, it was very disruptive. These problems can be avoided by really understanding how the hub is designed, and then mapping that back to your source systems and your external information sources.

7. Identify the project’s mission and business values:
This is another important area that needs its due attention. A clear definition of the project’s mission and business value helps ensure that a high ROI is thought through and planned for in the process. One must link the initiatives to actionable insights.

8. Choose the best technology platform:
Choosing a good technology is important as well. Remember, you don’t change your technology daily, so putting some thought and research into it makes a lot of difference to the sustainability of the project. A good technology should help the organization grow over the next several years without presenting too many growth bottlenecks.

9. Be real and plan a multi-domain design:
In the real world, many MDM technologies grew up managing one particular type of master data. A good strategy must be consistent across domains. So, applying the same approach to the various master data domains, whether those are customer, product, asset, supplier, location or person, is a good strategy.

10. Active, involved executive sponsorship:
Most organizations are very comfortable with their “islands of data” and with technology being implemented in silos. For someone in the organization to come along and suggest changing that status quo, and to start managing critical information centrally, treating it as a true corporate asset, is going to mean some serious cultural change.

11. Use a holistic approach – people, process, technology and information:
This may be the most important best practice. You’ve got to start with the people, the politics, the culture, and then to make sure you spend at least as much time on the business processes involved in data governance and data stewardship. These really deserve a separate article of their own.

12. Pay attention to organizational governance:
You must have a very strong governance model that addresses issues such as change management and knowledge transfer. After all, culture is one of the most important factors in an organization, and a well-thought-out plan to de-risk the project from it improves the odds of success.

13. Build your processes to be ongoing and repeatable, supporting continuous improvement:
Data governance is a long-term proposition. As long as an enterprise is in business, it will be creating, modifying, and using master data. So if everyone in the company relies on them, but no one is specifically accountable for maintaining and certifying their level of quality, it shouldn’t be a surprise that, over time, like everything else, they become more and more chaotic and unusable. So plan from the beginning for a “way of life”, not a project.

14. Have a big vision, but take small steps:
Consider the ultimate goal, but limit the scope of the initial deployment, users told Ventana. Once master data management is working in one place, extend it step by step, they advised. Business processes, rather than technology, are often the mitigating factor, they said, so it’s important to get end-user input early in the process.

15. Consider potential performance problems:
Performance is the 800-pound gorilla quietly lurking in the master data management discussion, Loshin cautioned. Different architectures can mean different performance penalties. So, make some room for repair.

16. Management needs to recognize the importance of a dedicated team of data stewards:
Just as books belong in a library and a library needs librarians, master data belongs in a dedicated repository of some type, and that repository needs to be managed by data stewards. It is crucial to start by convincing management of the need for a small team of data stewards who are 100% dedicated to managing the enterprise’s master data.

17. Consider the transition plan:
Then, there’s the prospect of rolling out a program that has an impact on many critical processes and systems, which is no trivial concern. Loshin recommended that companies plan a master data management transition strategy that allows for static and dynamic data synchronization.

18. Resist the urge to customize:
Now that commercial off-the-shelf hub platforms have matured a bit, it should be easier to resist the temptation to get under the hood and customize them. Most vendors are still revving their products as often as twice a year, so you definitely don’t want to get into a situation where you are “rev locked” to an older version.

19. Stay current with vendor-provided patches:
Given the frequency of point releases, patches and major upgrades, you should probably plan for at least one major upgrade during the initial implementation, and be sure to build “upgrade competency” in the team that will maintain the hub platform after the initial project goes live.

20. Carefully plan deployment:
With increasing MDM complexity, training of business and technical people is more important than ever. Using untrained or semi-trained systems integrators and outsourcing attempts caused major problems and project delays for master data management users.

21. Test, test, test and then test again:
This is like the old saying about what’s important in real estate – “location, location, location”. Your MDM hub environment is going to be different, by definition, than every other environment in the world.

Originally Posted at: 21 Big Data Master Data Management Best Practices