Dec 28, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Accuracy check  Source

[ AnalyticsWeek BYTES]

>> Ensuring Data Quality in the Big Data Age by jelaniharper

>> Measurement and Meaning of Customer Loyalty by bobehayes

>> Measuring The Customer Experience Requires Fewer Questions Than You Think by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>> Gang statistics for second quarter of 2017 have been compiled by San Bernardino County DA’s Office – Fontana Herald-News  Under: Statistics

>> 53% Of Companies Are Adopting Big Data Analytics – Forbes  Under: Big Data Analytics

>> BaseHealth gets $8.5M to help providers find ‘invisible patients … – MobiHealthNews  Under: Health Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Intro to Machine Learning


Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most stra… more

[ FEATURED READ]

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World


In the world’s top research labs and universities, the race is on to invent the ultimate learning algorithm: one capable of discovering any knowledge from data, and doing anything we want, before we even ask. In The Mast… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading to a Halloween we never want to see.

[ DATA SCIENCE Q&A]

Q:How to clean data?
A: 1. First: detect anomalies and contradictions
Common issues:
* Tidy data (Hadley Wickham’s paper):
column names are values, not names, e.g. 26-45…
multiple variables are stored in one column, e.g. m1534 (males aged 15-34)
variables are stored in both rows and columns, e.g. tmax, tmin in the same column
multiple types of observational units are stored in the same table. e.g, song dataset and rank dataset in the same table
* a single observational unit is stored in multiple tables (can be combined)
* Data-Type constraints: values in a particular column must be of a particular type: integer, numeric, factor, boolean
* Range constraints: number or dates fall within a certain range. They have minimum/maximum permissible values
* Mandatory constraints: certain columns can’t be empty
* Unique constraints: a field must be unique across a dataset, e.g., each person must have a unique SSN
* Set-membership constraints: the values for a column must come from a set of discrete values or codes, e.g., gender must be either female or male
* Regular expression patterns: for example, phone number may be required to have the pattern: (999)999-9999
* Misspellings
* Missing values
* Outliers
* Cross-field validation: certain conditions that involve multiple fields must hold. For instance, in laboratory medicine, the percentages of the different white blood cell types must sum to 100 (they are all percentages). In a hospital database, a patient’s date of discharge can’t be earlier than the admission date
2. Clean the data using:
* Regular expressions: misspellings, regular expression patterns
* KNN-impute and other missing values imputing methods
* Coercing: data-type constraints
* Melting: tidy data issues
* Date/time parsing
* Removing observations

Source
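To make a few of these cleaning steps concrete, here is a minimal pandas sketch; the dirty_df DataFrame, its column names, and the phone format are hypothetical, chosen only to mirror the issues listed above.

```python
import pandas as pd
import numpy as np

# Hypothetical messy data: column names hold values (a tidy-data issue),
# "m1534"/"f1534" pack sex and age band into one name, and the phone
# column mixes formats.
dirty_df = pd.DataFrame({
    "country": ["US", "US", "FR"],
    "m1534": [10, 12, 7],           # males aged 15-34
    "f1534": [11, np.nan, 9],       # females aged 15-34
    "phone": ["(999)999-9999", "999-9999", "(888)123-4567"],
})

# Melting (tidy-data fix): turn the value-bearing column names into rows,
# then split the packed sex/age code into two variables.
tidy = dirty_df.melt(id_vars="country", value_vars=["m1534", "f1534"],
                     var_name="sex_age", value_name="count")
tidy["sex"] = tidy["sex_age"].str[0]
tidy["age"] = tidy["sex_age"].str[1:3] + "-" + tidy["sex_age"].str[3:]
tidy = tidy.drop(columns="sex_age")

# Coercing (data-type constraint): force counts to numeric; bad entries become NaN.
tidy["count"] = pd.to_numeric(tidy["count"], errors="coerce")

# Regular-expression pattern constraint: flag phones not matching (999)999-9999.
valid_phone = dirty_df["phone"].str.match(r"^\(\d{3}\)\d{3}-\d{4}$")

# Missing values: a simple median fill (KNN-impute is another option).
tidy["count"] = tidy["count"].fillna(tidy["count"].median())

print(tidy)
print(valid_phone)
```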

[ VIDEO OF THE WEEK]

Discussing Forecasting with Brett McLaughlin (@akabret), @Akamai


Subscribe to YouTube

[ QUOTE OF THE WEEK]

If we have data, let’s look at data. If all we have are opinions, let’s go with mine. – Jim Barksdale

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MichOConnell, @Tibco


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data production will be 44 times greater in 2020 than it was in 2009.

Sourced from: Analytics.CLUB #WEB Newsletter

Caterpillar digs in to data analytics—investing in hot startup Uptake

The mining and construction equipment maker wants a piece of the industrial Internet. Its strategy? Turn to the startup world for help.

General Electric isn’t the only industrial giant attempting to jumpstart its business with data and software services. On Thursday morning Caterpillar announced it has made a minority investment in Uptake, a Chicago-based data analytics platform co-founded by long-time entrepreneur Brad Keywell, who is profiled in the current issue of Fortune.

As part of the agreement, Caterpillar and Uptake will co-develop “predictive diagnostics” tools for the larger company’s customers. (Uptake says it is also working with other players in insurance, automotive and healthcare, though it won’t disclose other names.) The idea? To take the mountains of data spewing from bulldozers and hydraulic shovels and turn it into meaningful information that can help Caterpillar’s customers catch potential maintenance issues before breakdowns occur, minimizing downtime.

“We had some experience in this [data analytics] because of our autonomous mining equipment,” Doug Oberhelman, Caterpillar’s CEO, said in an interview with Fortune last month. “But we were really looking for somebody to help us jumpstart this. And that’s where the lightbulb went on between Brad and I.”

Oberhelman’s company, based in Peoria, Ill., is a 90-year-old manufacturer whose performance is often seen as a gauge of the health of the global economy. Uptake, meanwhile, is the brainchild of Keywell, a serial entrepreneur best known for co-founding daily-deal site Groupon. But the two CEOs bonded at a Chicago-area breakfast hosted by—of all people—British prime minister David Cameron.

“This is an early example of something that will become commonplace at some point—entrepreneurs trying to disrupt within an industry rather than disrupt an industry from the outside,” Keywell told Fortune.

Of course, disrupting GE’s $1 billion head start in data analytics (which includes a massive software center the company has built in the Bay Area) won’t be easy. CEO Jeff Immelt has made it clear that the so-called “industrial Internet” will bring about the next wave of (much-needed) growth for his company, and the company has been hard at work developing applications based on its Predix platform and lining up customers like United Airlines and BP.

Uptake, located in the former Montgomery Ward building in Chicago (where many of Keywell’s other startups are also headquartered), has about 100 employees, including a handful of data scientists from nearby universities. It is a speck compared to GE, the 27th largest company in the world. But its ability to snag Caterpillar as both an investor and a customer won’t go unnoticed: Not everyone, especially competitors, will want to buy software from GE.

Caterpillar isn’t disclosing how much money it is putting into Uptake, but the two companies have already been working closely for several months (Keywell’s venture capital fund, Lightbank, has also invested in the startup). The ROI for Caterpillar could be far-reaching. While it takes three to five years to design and build a new bulldozer, Uptake’s product development is measured in days and weeks. In an industry that will increasingly have to rely on software services for growth, incorporating some of the startup world’s DNA—speed and agility—is a smart bet for Caterpillar.

Originally posted via “Caterpillar digs in to data analytics—investing in hot startup Uptake”


5 Brands That Are Using the Internet of Things in Awesome Ways

Is your refrigerator running a spam operation? Thanks to the Internet of Things, the answer to that question could be yes.

Despite some dystopian fears, like that spamming refrigerator, the Internet of Things isn’t just an eerie term that sounds like it was plucked from Brave New World. It is a vague one though, so to clear up any uncertainty, here’s the dictionary definition: “a proposed development of the Internet in which everyday objects have network connectivity, allowing them to send and receive data.”

As Altimeter Group points out in its new report, “Customer Experience in the Internet of Things,” brands are already using this sci-fi technology in amazing ways to build customer relationships and optimize their products. In reality, it’s more evolution than revolution, as companies are already tracking smartphone and Internet usage to gather data that provides crucial feedback about consumer behavior. As the report states, the Internet of Things only “brings us closer than ever to the ultimate marketing objective: delivering the right content or experience in the right context.”

Talk of trackers and sensors and refrigerators gone wild may sound intimidating for brands that are still getting their content operations up and running, but some major companies are already exploring the new frontier of the Internet of Things. Here are the five brands doing it best.

1. Walgreens

Have you ever found yourself searching for a specific item in a pharmacy, wishing you could click control-F to locate it, pay, and leave quickly? Aisle411, Google Tango, and Walgreens teamed up to create a new mobile app that can grant harried shoppers that wish. By using Google’s virtual indoor 3D mapping technology, Aisle411 created a mobile shopping platform that lets consumers search and map products in the store, take advantage of personalized offers, and easily collect loyalty points.

“This changes the definition of in-store advertising in two key ways,” Aisle411 CEO Nathan Pettyjohn told Mobile Commerce Daily. “Advertising becomes an experience—imagine children in a toy store having their favorite toy guide them through the store on a treasure hunt in the aisles of the store—and the end-cap is everywhere; every inch of the store is now a digital end cap.”

According to a Forrester study, 19 percent of consumers are already using their mobile devices to browse in stores. Instead of forcing consumers to look away from their screens, Walgreens is meeting them there.

2. Taco Bell

Nowadays, practically everyone is reliant on their GPS to get them places. That’s why Taco Bell is targeting consumers based on location by advertising and messaging them on mobile platforms like Pandora, Waze (a navigation app purchased by Google), and weather apps.

Digiday reports that in 2014, Taco Bell positioned ads on Waze for their 12 Pack product each Saturday morning to target drivers who might’ve been on their way to watch college football games. The Waze integration was so successful that Taco Bell decided to do the same thing on Sundays during the NFL season—this time advertising its Cool Ranch Doritos Locos Taco.

3. Home Depot

Home Depot has previously used augmented reality in its mobile app to allow users to see how certain products would look in their homes. IKEA is also known for enticing consumers with this mobile strategy. But now, Home Depot is making life even easier for shoppers by piloting a program that connects a customer’s online shopping carts and wish lists with an in-store mobile app.

As explained in the Altimeter report, upon entering a Home Depot, customers who are part of the Pro Rewards program will be able to view the most efficient route through the store based on the products they shopped for online. And anyone who’s been inside a Home Depot knows how massive and overwhelming those places can be without directions.

Creepy? Maybe. But helpful? Definitely. Michael Hibbison, VP of marketing and social media at Home Depot, defends the program to Altimeter Group: “Loyalty programs give brands more rope when it comes to balancing risks of creep. The way we think of it is we will be as personalized as you are loyal.”

4. Tesla Motors

Getting your car fixed can be as easy as installing a software update on your phone—at least for Tesla customers. Tesla’s cars are electric, powered by batteries similar to those that fuel your laptop and mobile device. So when Tesla had to recall almost 30,000 Model S cars because their wall chargers were overheating, the company was able to do the ultimate form of damage control. Instead of taking the products back or bothering customers to take them to a dealership, Tesla just updated the software of each car, effectively eliminating the problem in all of their products.

Tesla has also used this connectedness to crowdsource improvements to its products. As reported by Altimeter, a customer recently submitted a request for a crawl feature that allows the driver to ease into a slow cruise control in heavy traffic. Tesla not only granted the customer’s request, but added the feature to its entire fleet of cars with just one software update.

5. McDonald’s

McDonald’s may be keeping it old school with their Monopoly contest, which, after 22 years, can still be won by peeling back stickers on your fries and McNuggets. But for their other marketing projects, McDonald’s is getting pretty tech savvy.

McDonald’s partnered with Piper, a Bluetooth low-energy beacon solution provider, to greet customers on their phones as they enter the restaurant. Through the app, consumers are offered coupons, surveys, Q&As, and even information about employment opportunities.

What does McDonald’s get out of it? Data. Lots of data. When customers enter comments, their feedback is routed to the appropriate manager who can respond to the request before the person leaves the establishment.

Too close for comfort? Not compared to the company’s controversial pay-with-a-hug stunt. And at least this initiative is working. According to Mobile Commerce Daily, in the first month of the app’s launch McDonald’s garnered more than 18,000 offer redemptions, and McChicken sales increased 8 percent.

By tapping into the Internet of Things, brands can closely monitor consumer behavior, and—even though it may sound a bit too invasive—put the data they collect to good use. With sensors, a product can go from being a tool to an actual medium of communication between the marketer and the consumer. That sounds pretty cool. But, just to be safe, if you get a shady email from your fridge, maybe don’t open it.

 To read the original article on The Constant Strategist, click here.

 

Source: 5 Brands That Are Using the Internet of Things in Awesome Ways

Dec 21, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Accuracy  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> February 20, 2017 Health and Biotech analytics news roundup by pstein

>> Oracle introduces big data-infused marketing services by analyticsweekpick

>> 2017 Trends in Cognitive Computing: Humanizing Artificial Intelligence by jelaniharper

Wanna write? Click Here

[ NEWS BYTES]

>> Machine Learning & EHSQ: An Overview – Techzone360  Under: Machine Learning

>> IFINSEC Financial Sector IT Security Conference And Exhibition – ISBuzz News  Under: Big Data Security

>> FinancialForce has fingertip on the pulse of AI – Enterprise Times  Under: Prescriptive Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Pattern Discovery in Data Mining


Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern disc… more

[ FEATURED READ]

Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th Edition


The eagerly anticipated Fourth Edition of the title that pioneered the comparison of qualitative, quantitative, and mixed methods research design is here! For all three approaches, Creswell includes a preliminary conside… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading to a Halloween we never want to see.

[ DATA SCIENCE Q&A]

Q:Examples of NoSQL architecture?
A: * Key-value: in a key-value NoSQL database, all of the data within consists of an indexed key and a value. Cassandra, DynamoDB
* Column-based: designed for storing data tables as sections of columns of data rather than as rows of data. HBase, SAP HANA
* Document Database: map a key to some document that contains structured information. The key is used to retrieve the document. MongoDB, CouchDB
* Graph Database: designed for data whose relations are well represented as a graph, with interconnected elements and an undetermined number of relations between them. Neo4J

Source
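For intuition only, here is a plain-Python sketch of how one record might be shaped under each of these models; no real database client is used, and the record and field names are made up.

```python
# Key-value (Cassandra/DynamoDB style): an opaque value behind an indexed key.
kv_store = {"user:42": '{"name": "Ada", "city": "London"}'}

# Column-based (HBase/SAP HANA style): data grouped into column families.
column_store = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2017-12-21"},
    }
}

# Document database (MongoDB/CouchDB style): the key retrieves a structured document.
document_store = {
    "user:42": {"name": "Ada", "city": "London", "orders": [{"id": 1, "total": 30}]}
}

# Graph database (Neo4J style): interconnected nodes with explicit relations.
graph = {
    "nodes": {"user:42": {"name": "Ada"}, "user:7": {"name": "Grace"}},
    "edges": [("user:42", "FOLLOWS", "user:7")],
}

# Retrieval differs accordingly: by key, by column family, by document field, or by traversal.
print(kv_store["user:42"])
print(column_store["user:42"]["activity"]["last_login"])
print(document_store["user:42"]["orders"][0]["total"])
print([edge for edge in graph["edges"] if edge[0] == "user:42"])
```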

[ VIDEO OF THE WEEK]

Understanding Data Analytics in Information Security with @JayJarome, @BitSight


Subscribe to YouTube

[ QUOTE OF THE WEEK]

Getting information off the Internet is like taking a drink from a firehose. – Mitchell Kapor

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with  John Young, @Epsilonmktg


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

For a typical Fortune 1000 company, just a 10% increase in data accessibility will result in more than $65 million additional net income.

Sourced from: Analytics.CLUB #WEB Newsletter

Business Linkage Analysis: An Overview

Customer feedback professionals are asked to demonstrate the value of their customer feedback programs. They are asked: Does the customer feedback program measure attitudes that are related to real customer behavior? How do we set operational goals to ensure we maximize customer satisfaction? Are the customer feedback metrics predictive of our future financial performance and business growth? Do customers who report higher loyalty spend more than customers who report lower levels of loyalty? To answer these questions, companies look to a process called business linkage analysis.

Figure 1. Companies who adopt linkage analysis get the insight that drives customer loyalty

Business Linkage Analysis is the process of combining different sources of data (e.g., customer, employee, partner, financial, and operational) to uncover important relationships among important variables (e.g., call handle time and customer satisfaction). For our context, linkage analysis will refer to the linking of other data sources to customer feedback metrics (e.g., customer satisfaction, customer loyalty).

Business Case for Linkage Analyses
Based on a recent study of customer feedback program best practices (Hayes, 2009), I found that companies that regularly conduct operational linkage analyses with their customer feedback data had higher customer loyalty (72nd percentile) compared to companies that do not conduct linkage analyses (50th percentile). Furthermore, customer feedback executives were substantially more satisfied with their customer feedback program’s ability to help them manage customer relationships when linkage analyses (e.g., operational, financial, constituency) were part of the program (~90% satisfied) compared to their peers in companies that did not use linkage analyses (~55% satisfied). Figure 1 presents the effect size for VOC operational linkage analyses.

Linkage analysis appears to have a positive impact on customer loyalty by providing executives the insights they need to manage customer relationships. These insights give loyalty leaders an advantage over loyalty laggards. Loyalty leaders apply linkage analysis results in a variety of ways to build a more customer-centric company: determining the ROI of different improvement efforts, creating customer-centric operational metrics (important to customers), and setting employee training standards to ensure customer loyalty, to name a few. In upcoming posts, I will present specific examples of linkage analyses using customer feedback data.

Linkage Analysis: A Data Management and Analysis Problem

Figure 2. Linking Disparate Business Data Sources Leads to Insight

You can think of linkage analysis as a two-step process: 1) organizing two disparate data sources into one coherent dataset and 2) conducting analyses on that aggregated dataset. The primary hurdle in any linkage analysis is organizing the data in an appropriate way so that the resulting linked dataset makes logical sense for our analyses (appropriate unit of analysis). Therefore, data management and statistical skills are essential in conducting a linkage analysis study. More on that later.

Once the data are organized, the researcher is able to conduct nearly any kind of statistical analysis he/she wants (e.g., regression, ANOVA, multivariate), as long as it makes sense given the types of variables (e.g., nominal, interval) being used.
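As a concrete illustration of the two steps, here is a minimal pandas sketch; the file names, column names, and the transaction_id key are hypothetical. It links transaction-level operational data to transaction-based survey ratings at the same unit of analysis and then checks the linear relationship.

```python
import pandas as pd

# Step 1: organize two disparate sources into one coherent dataset,
# linked at the appropriate unit of analysis (here, the transaction).
operations = pd.read_csv("call_center_operations.csv")  # transaction_id, call_handle_time
feedback = pd.read_csv("transaction_survey.csv")        # transaction_id, satisfaction

linked = operations.merge(feedback, on="transaction_id", how="inner")

# Step 2: analyze the aggregated dataset, e.g., the Pearson correlation
# between an operational metric and customer satisfaction.
r = linked["call_handle_time"].corr(linked["satisfaction"])
print(f"Correlation between handle time and satisfaction: {r:.2f}")
```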

Figure 3. Common Types of Linkages among Disparate Data Sources

Types of Linkage Analyses

In business, linkage analyses are conducted using the following types of data (see Figure 2):

  1. Customer Feedback
  2. Financial
  3. Operational
  4. Employee
  5. Partner

Even though I discuss these data sources as if they are distinct, separate sources of data, it is important to note that some companies have some of these data sources housed in one dataset (e.g., call center system can house transaction details including operational metrics and customer satisfaction with that transaction). While this is an advantage, these companies still need to ensure their data are organized together in an appropriate way.

With these data sources, we can conduct three general types of linkage analyses:

  1. Financial: linking customer feedback to financial metrics
  2. Operational: linking customer feedback to operational metrics
  3. Constituency: linking customer feedback to employee and partner variables

Before we go further, I need to make an important distinction between two different types of customer feedback sources: 1) relationship-based and 2) transaction-based. In relationship-based feedback, customer ratings (data) reflect their overall experience with and loyalty towards the company. In transaction-based feedback, customer ratings (data) reflect their experience with a specific event or transaction. This distinction is necessary because different types of linkage analyses require different types of customer feedback data (See Figure 3). Relationship-based customer feedback is needed to conduct financial linkage analyses and transaction-based customer feedback is needed to conduct operational linkage analyses.

Statistical Analyses

The term “linkage analysis” is actually a misnomer. Linkage analysis is not really a type of analysis; it is used to denote that two different data sources have been “linked” together. In fact, several types of analyses can be employed after two data sources have been linked together. Three general types of analyses that I use in linkage analyses are:

  1. Factor analysis of the customer survey items: This analysis helps us create indices from the customer surveys; these indices will be used in the subsequent analyses. Because they are made up of several survey questions, the indices are more reliable than any single survey question. Therefore, if there is a real relationship between customer attitudes and financial performance, the chances of finding this relationship greatly improve when we use these metrics rather than single items.
  2. Correlational analysis (e.g., Pearson correlations, regression analysis): This class of analyses helps us identify the linear relationship between customer satisfaction/loyalty metrics and other business metrics.
  3. Analysis of Variance (ANOVA): This type of analysis helps us identify the potentially non-linear relationships between the customer satisfaction/loyalty metrics and other business metrics. For example, it is possible that increases in customer satisfaction/loyalty will not translate into improved business metrics until customer satisfaction/loyalty reaches a critical level. When ANOVA is used, the independent variables in the model (x) will be the customer satisfaction/loyalty metrics and the dependent variables will be the financial business metrics (y).
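As one hedged sketch of the third analysis type, the snippet below bins a loyalty metric into levels and runs a one-way ANOVA on a financial metric with scipy; the dataset and the three loyalty bands are invented purely for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical linked dataset: one row per customer, with a loyalty
# rating (0-10) and annual revenue from the financial system.
df = pd.DataFrame({
    "loyalty": [2, 3, 4, 5, 6, 7, 8, 9, 9, 10],
    "revenue": [120, 110, 140, 150, 160, 210, 260, 300, 320, 340],
})

# The independent variable (x): loyalty binned into discrete levels.
df["loyalty_band"] = pd.cut(df["loyalty"], bins=[-1, 4, 7, 10],
                            labels=["low", "medium", "high"])

# One-way ANOVA: does mean revenue (the dependent variable, y) differ
# across loyalty bands? A significant F with a jump only at "high"
# would suggest the threshold-style, non-linear relationship described above.
groups = [g["revenue"].values for _, g in df.groupby("loyalty_band")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```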

Summary

Business linkage analysis is the process of combining different sources of data to uncover important insights about the causes and consequences of customer satisfaction and loyalty. For VOC programs, linkage analyses fall into three general types: financial, operational, and constituency. Each of these types of linkage analysis provides useful insight that can help senior executives better manage customer relationships and improve business growth. I will provide examples of each type of linkage analysis in the following posts.

Download a free white paper titled, “Linkage Analysis in your Voice of the Customer Program.”

Source by bobehayes

Meet the startup that is obsessed with tracking every other startup in the world


At their previous jobs at venture capital firms, Sequoia Capital and Accel Partners, respectively, Neha Singh and Abhishek Goyal often had to help identify prospective startups and make investment decisions.

But it wasn’t always easy.

Startups usually don’t disclose information about themselves, since they are privately held firms and are under no compulsion to share data publicly. So, Singh and Goyal had to constantly struggle to collate information from multiple sources.

Eventually, fed up with the lack of a single source for data, the Indian Institute of Technology graduates quit their jobs in 2013 to start an analytics firm, Tracxn!. Their ambition: To become the Gartner—the go-to firm for information technology research—of the startup ecosystem.

“It’s almost surprising,” Singh told Quartz in an email interview, “that despite billions of dollars invested in each of the sectors (be it foodtech or mobile commerce, or payments, etc), thousands of people employed in this ecosystem and many more aspiring to start something here, there is not a single source which tracks and provides insights into these private markets.”

Tracxn! started operations in May 2013, working from Lightspeed Venture Partners’ office in Menlo Park, California, with angel funding from founders of e-commerce companies like Flipkart and Delhivery. In 2014, the startup began its emerging markets operation with focus on India and China.

“After our first launch in April last year, we scaled the revenues quickly and turned profitable last September, (and) grew to a team of 40,” Singh said. Most of its analysts are based in Bengaluru.

Tracxn! follows a SaaS (software as a service) business model, charging subscribers between $20,000 and $90,000 per year. With a database of over 7,000 Indian and 21,000 US startups, Singh and Goyal now count over 50 venture capital funds among their clients, which also include mergers and acquisitions specialists, product managers, founders and aspiring entrepreneurs.

While firms like Mattermark, Datafox and CB Insights provide similar services, Tracxn! allows investors to get an overview of a sector within the ecosystem before drilling down to individual companies.

“For many funds, we have become a primary source of their deal discovery,” said Singh. “We want to become the default research platform for anyone looking for information and trends on these private markets and companies.”

In April this year, Tracxn! received $3.5 million in funding from private equity firm, SAIF Partners, which it plans to use to ramp up its analyst strength to 150 by the end of the year.

“We keep getting inquiries from investors across various countries (like from Europe, parts of Southeast Asia, etc),” explained Singh. “But we cannot launch them because we don’t have analyst teams for it.”

But with money on its way, Tracxn! now wants to expand coverage into Malaysia, Indonesia, Singapore, Philippines, Vietnam and Europe to build its global database.

Originally posted at: http://qz.com/401931/meet-the-startup-that-is-obsessed-with-tracking-every-other-startup-in-the-world/


Dec 14, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data interpretation  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ NEWS BYTES]

>> AI and machine learning will make everyone a musician – Wired.co.uk  Under: Machine Learning

>> Barnes & Noble Inc (NYSE:BKS) Institutional Investor Sentiment Analysis – WeeklyHub  Under: Sentiment Analysis

>> Reposition DCIM systems for virtualization, container management – TechTarget  Under: Virtualization

More NEWS ? Click Here

[ FEATURED COURSE]

Probability & Statistics


This course introduces students to the basic concepts and logic of statistical reasoning and gives the students introductory-level practical ability to choose, generate, and properly interpret appropriate descriptive and… more

[ FEATURED READ]

Thinking, Fast and Slow


Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replaces, judgment
Data is a tool and a means to help build consensus and facilitate human decision-making, not replace it. Analysis converts data into information; information, via context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:What is random forest? Why is it good?
A: Random forest? (Intuition):
– Underlying principle: several weak learners combined provide a strong learner
– Builds several decision trees on bootstrapped training samples of data
– On each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates, out of all p predictors
– Rule of thumb: at each split, m = √p
– Predictions: by the majority rule

Why is it good?
– Very good performance (decorrelates the features)
– Can model non-linear class boundaries
– Generalization error for free: no cross-validation needed; gives an unbiased estimate of the generalization error as the trees are built
– Generates variable importance

Source
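A minimal scikit-learn sketch of these points, using a synthetic dataset purely for illustration: max_features="sqrt" implements the m = √p rule of thumb, oob_score=True gives the “free” generalization estimate, and feature_importances_ provides the variable importance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 1,000 samples, p = 16 predictors.
X, y = make_classification(n_samples=1000, n_features=16, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,      # number of bootstrapped trees
    max_features="sqrt",   # m = sqrt(p) candidate predictors at each split
    oob_score=True,        # out-of-bag estimate of generalization error
    bootstrap=True,
    random_state=0,
)
rf.fit(X, y)

print("OOB accuracy estimate:", rf.oob_score_)         # no cross-validation needed
print("Variable importance:", rf.feature_importances_)
```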

[ VIDEO OF THE WEEK]

Understanding Data Analytics in Information Security with @JayJarome, @BitSight


Subscribe to YouTube

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

@CRGutowski from @GE_Digital on Using #Analytics to #Transform Sales #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

By April 2011, the U.S. Library of Congress had collected 235 terabytes of data.

Sourced from: Analytics.CLUB #WEB Newsletter

The What and Where of Big Data: A Data Definition Framework

I recently read a good article on the difference between structured and unstructured data. The author defines structured data as data that can be easily organized. As a result, these types of data are easily analyzable. Unstructured data refers to information that either does not have a pre-defined data model and/or is not organized in a predefined manner. Unstructured data are not easy to analyze. A primary goal of a data scientist is to extract structure from unstructured data. Natural language processing is a process of extracting something useful (e.g., sentiment, topics) from something that is essentially useless (e.g., text).
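As one small, hedged example of extracting structure from text (the comments below are made up), a bag-of-words representation turns free-form feedback into a numeric table that can then be analyzed like any other structured data:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Unstructured input: free-form customer comments.
comments = [
    "great support, fast response",
    "slow response and unhelpful support",
    "fast shipping, great product",
]

# Extracted structure: a document-by-term count matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(comments)

print(vectorizer.get_feature_names_out())  # vocabulary (the columns)
print(X.toarray())                         # one structured row per comment
```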

While I like the definitions she offers, she included an infographic that is confusing. It equates the structural nature of the data with the source of the data, suggesting that structured data are generated solely from internal/enterprise systems while unstructured data are generated solely from social media sources. I think it would be useful to separate the format (structured vs. unstructured) of the data from the source (internal vs. external) of the data.

Sources of Data: Internal and External

Generally speaking, business data can come from either internal sources or external sources. Internal sources of data reflect those data that are under the control of the business. These data are housed in financial reporting systems, operational systems, HR systems and CRM systems, to name a few. Business leaders have a large say in the quality of internal data; these data are essentially a byproduct of the processes and systems the leaders use to run the business and generate/store the data.

External sources of data, on the other hand, are any data generated outside the walls of the business. These data sources include social media, online communities, open data sources and more. Due to the nature of their source, external data are under less control by the business than internal data. These data are collected by other companies, each using its own unique systems and processes.

Data Definition Framework

Figure 1. Data Definition Framework

This 2×2 data framework is a way to think about your business data (See Figure 1). This model distinguishes the format of data from the source of data. The 2 columns represent the format of the data, either structured or unstructured. The 2 rows represent the source of the data, either internal or external. Data can fall into one of the four quadrants.

Using this framework, we see that unstructured data can come from both internal sources (e.g., open-ended survey questions, call center transcripts) and external sources (e.g., Twitter comments, Pinterest images). Unstructured data is primarily human-generated. Human-generated data are those that are input by people.

Structured data also can come from both inside (e.g., survey ratings, Web logs, process control measures) and outside (e.g., GPS for tweets, Yelp ratings) the business. Structured data includes both human-generated and machine-generated data. Machine-generated data are those that are calculated/collected automatically and without human intervention (e.g., metadata).
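A tiny sketch that restates the 2×2 framework as a lookup from (source, format) to the example data sources named above:

```python
# The 2x2 framework: (source, format) -> example data sources.
data_framework = {
    ("internal", "structured"):   ["survey ratings", "web logs", "process control measures"],
    ("internal", "unstructured"): ["open-ended survey questions", "call center transcripts"],
    ("external", "structured"):   ["GPS for tweets", "Yelp ratings"],
    ("external", "unstructured"): ["Twitter comments", "Pinterest images"],
}

for (source, fmt), examples in data_framework.items():
    print(f"{source:8s} / {fmt:12s} -> {', '.join(examples)}")
```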

The quality of any analysis is dependent on the quality of the data. You are more likely to uncover something useful in your analysis if your data are reliable and valid. When measuring customers’ attitudes, we can use customer ratings or customer comments as our data source. Customer satisfaction ratings, due to the nature of the data (structured / internal), might be more reliable and valid than customer sentiment metrics from social media content (unstructured / external); as a result, the use of structured data might lead to a better understanding of your data.

Data format is not the same as data source. I offer this data framework as a way for businesses to organize and understand their data assets. Identify strengths and gaps in your own data collection efforts. Organize your data to help you assess your Big Data analytic needs. Understanding the data you have is a good first step in knowing what you can do with it.

What kind of data do you have?

 

Source

Why You Must Not Have Any Doubts About Cloud Security

The fear of the unknown grips everyone when adopting anything new, so it is natural that there are more skeptics when it comes to Cloud computing, a new technology that not everybody understands. The lack of understanding creates fear that makes people worry without reason before they take the first step in adopting the latest technology. The pattern has been evident during the introduction of every new technology, and the advent of the Cloud is no exception. It is therefore no surprise that when it comes to Cloud computing, the likely stakeholders, comprising IT professionals and business owners, are wary about the technology and often suspicious about its security.

Despite wide-scale adoption (more than 90% of enterprises in the United States use the Cloud), there are mixed feelings about Cloud security among companies. Interestingly, it is not the enterprise alone that uses Cloud services; the platform also attracts a large section of small and medium businesses, with 52% of SMBs utilizing it for storage. The numbers indicate that users have been able to overcome the initial fear and are now trying to figure out what the new technology is. There is a feeling that Cloud security is inferior to the security offered by legacy systems, and in this article we will try to understand why the Cloud is so useful and why there should not be concerns about its security.

The perception of Cloud security

The debate rages around whether the Cloud is much more secure or only somewhat more secure than legacy systems. A survey revealed that 34% of IT professionals feel the Cloud is slightly more secure, but not secure enough to give them the confidence to rank it a few notches above legacy systems. The opinion stems from the fact that there have been some high-profile data breaches in the Cloud at Apple iCloud, Home Depot, and Target, but the breaches resulted not from shortcomings of Cloud security but from human factors. Misinformation and lack of knowledge are the reasons people remain skeptical about Cloud security.

Strong physical barriers and close surveillance

There used to be a time when legacy system security was not an issue, because denying access to on-premise computers was enough to thwart hackers and other intrusions. However, it can be difficult to implement proper security in legacy systems comprising workstations, terminals, and browsers, which makes them unreliable. Businesses are now combining legacy systems with Cloud infrastructure, together with backup and recovery services, making them more exposed to security threats from hackers. Moreover, assessing the security of legacy systems entails a multi-step process, which tends to indicate that replacing the legacy system is the better option.

While a locked door is the only defense in most offices to protect the computer system, Cloud service providers have robust arrangements for the physical security of data centers, comprising barbed wire, high fences, concrete barriers, security cameras and guards patrolling the area. Besides preventing people from entering the data center, this surveillance also covers activities in the adjoining spaces.

Access is controlled

The threat comes not only from online attackers trying to breach the system, but also from people gaining easy physical access to it. Cloud service providers ensure data security through encryption during storage, and organizations are now turning to selective data storage, using the Cloud facility to store sensitive data offsite and keep it inaccessible to unauthorized persons. This reduces the human risk of causing damage, since only authorized users get access to sensitive data that remains securely stored in the Cloud. No employees, vendors, or third parties can access the data by breaching the security cordon.

Assured cybersecurity

Cloud service providers are well aware of the security concerns and adopt robust security measures to ensure that once data reaches the data centers, it remains wholly protected. The Cloud is under close monitoring and surveillance round the clock that gives users more confidence about data security. When using the Cloud services, you not only get access to the top class data center that offers flexibility and security but also you receive the support of qualified experts who help to make better use of the resource for your business.

Auditing security system

To ensure flawless security for their clients, Cloud service providers conduct frequent audits of their security features to identify possible weaknesses and take measures to eradicate them. Although a yearly audit is the norm, an interim audit may also take place if the need arises.

As the number of Cloud service users keeps increasing, these measures go a long way toward quelling the security fears.

Source

Dec 07, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Complex data  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Big Data Analytics Bottleneck Challenging Global Capital Markets Ecosystem, Says TABB Group by analyticsweekpick

>> Best & Worst Time for Cold Call by v1shal

>> The real-time machine for every business: Big data-driven market analytics by thomassujain

Wanna write? Click Here

[ NEWS BYTES]

>> More hospital closings in rural America add risk for pregnant women – Reuters  Under: Health Analytics

>> Statistics show fatal, injury crashes up at Sturgis Rally compared to this time last year – KSFY  Under: Statistics

>> The power of machine learning reaches data management – Network World  Under: Machine Learning

More NEWS ? Click Here

[ FEATURED COURSE]

Introduction to Apache Spark


Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals…. more

[ FEATURED READ]

Thinking, Fast and Slow


Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. Because most data science professionals work in isolation, getting an unbiased perspective is not easy. Many times, it is also not easy to see where a data science career progression is headed. A network of mentors addresses these issues easily: it gives data professionals an outside perspective and an unbiased ally. It is extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE Q&A]

Q:What is random forest? Why is it good?
A: Random forest? (Intuition):
– Underlying principle: several weak learners combined provide a strong learner
– Builds several decision trees on bootstrapped training samples of data
– On each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates, out of all p predictors
– Rule of thumb: at each split, m = √p
– Predictions: by the majority rule

Why is it good?
– Very good performance (decorrelates the features)
– Can model non-linear class boundaries
– Generalization error for free: no cross-validation needed; gives an unbiased estimate of the generalization error as the trees are built
– Generates variable importance

Source

[ VIDEO OF THE WEEK]

RShiny Tutorial: Turning Big Data into Business Applications


Subscribe to YouTube

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day.

Sourced from: Analytics.CLUB #WEB Newsletter