@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset

[youtube https://www.youtube.com/watch?v=XoEPadDSp6E]

In this podcast, Brian Haugli of The Hanover Insurance Group sat down with Vishal to talk about the security-led leader’s mindset. The conversation ranges from leadership philosophy to tactical, practitioner-level guidance to help future security leaders understand how to secure their organizations. This session is great for any security-passionate leader looking to build a growth mindset grounded in security.

Brian’s Read Recommendation:
On The Road by Jack Kerouac http://amzn.to/2hMhOhG

Podcast Link:
iTunes: http://math.im/itunes

GooglePlay: http://math.im/gplay

Brian’s BIO:
Brian Haugli is a Certified Information Systems Security Professional (CISSP) and a Global Industrial Cyber Security Professional (GICSP). Brian previously served as a senior advisor on cyber security and information risk management for the Department of Defense, US Army ITA, and Pentagon. He has 20 years of professional experience and expertise in network topologies, design, implementation, architecture, and cyber security. He has extensive knowledge of and has implemented risk management frameworks, methodologies, and processes. He has been responsible for creating compliant and secure networks for multiple sites through his extensive background in intrusion detection and full network end-to-end testing. He has outstanding communication skills, a positive demeanor, and the ability to interface with all levels of an organization.

About #Podcast:
The #FutureOfData podcast is a conversation starter that brings leaders, influencers and leading practitioners on the show to discuss their journeys in creating the data-driven future.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
FutureOfData
Data
Analytics
Leadership Podcast
Big Data
Strategy

Originally Posted at: @BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset by v1shal

Apache Spark for Big Analytics

Apache Spark is hot.   Spark, a top-level Apache project, is an open source distributed computing framework for advanced analytics in Hadoop.  Originally developed as a research project at UC Berkeley’s AMPLab, the project achieved incubator status in Apache in June 2013 and top-level status in February 2014.

Spark seeks to address the critical challenges for advanced analytics in Hadoop.  First, Spark is designed to support in-memory processing, so developers can write iterative algorithms without writing out a result set after each pass through the data.  This enables truly high-performance advanced analytics; for techniques like logistic regression, project sponsors report runtimes in Spark 100X faster than what they can achieve with MapReduce.

Second, Spark offers an integrated framework for advanced analytics, including a machine learning library (MLLib); a graph engine (GraphX); a streaming analytics engine (Spark Streaming) and a fast interactive query tool (Shark).  (Update: Databricks recently announced Alpha availability of Spark SQL.)  This eliminates the need to support multiple point solutions, such as Giraph and GraphLab for graph engines; Storm and S4 for streaming; or Hive and Impala for interactive queries.  A single platform simplifies integration and ensures that users can produce consistent results across different types of analysis.


At Spark’s core is an abstraction layer called Resilient Distributed Datasets, or RDDs.  RDDs are read-only partitioned collections of records created through deterministic operations on stable data or other RDDs.  RDDs include information about data lineage together with instructions for data transformation and (optional) instructions for persistence.  They are designed to be fault tolerant, so that if an operation fails it can be reconstructed.
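
The lineage idea is easy to demonstrate in miniature. The toy Python class below is an illustration only, not Spark’s actual implementation: each derived dataset records its parent and the deterministic transformation used to build it, so recomputing a lost result is just a walk up that chain.

```python
# Toy illustration of lineage-based recomputation (not Spark's real RDD code).
class MiniRDD:
    def __init__(self, source=None, parent=None, transform=None):
        self.source = source        # stable input data (for a root dataset)
        self.parent = parent        # parent dataset in the lineage graph
        self.transform = transform  # deterministic op applied to the parent

    def map(self, fn):
        return MiniRDD(parent=self, transform=lambda recs: [fn(r) for r in recs])

    def filter(self, pred):
        return MiniRDD(parent=self, transform=lambda recs: [r for r in recs if pred(r)])

    def compute(self):
        # If a cached partition were lost, a system like Spark would re-run
        # exactly this walk up the lineage chain to reconstruct it.
        if self.parent is None:
            return list(self.source)
        return self.transform(self.parent.compute())

base = MiniRDD(source=range(10))
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.compute())  # [0, 4, 16, 36, 64]
```

Because every step is deterministic and read-only, any intermediate result can be discarded and rebuilt on demand, which is the essence of the fault-tolerance claim above.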

For data sources, Spark works with any file stored in HDFS, or any other storage system supported by Hadoop (including local file systems, Amazon S3, Hypertable and HBase).  Hadoop supports text files, SequenceFiles and any other Hadoop InputFormat.

Spark’s machine learning library, MLLib, is rapidly growing.   In the latest release it includes linear support vector machines and logistic regression for binary classification; linear regression; k-means clustering; and alternating least squares for collaborative filtering.  Linear regression, logistic regression and support vector machines are all based on a gradient descent optimization algorithm, with options for L1 and L2 regularization.  MLLib is part of a larger machine learning project (MLBase), which includes an API for feature extraction and an optimizer (currently in development with planned release in 2014).
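
To make the optimization concrete, here is a minimal plain-Python sketch of gradient descent for logistic regression with L2 regularization, the same family of algorithm MLLib parallelizes over an RDD. The data, learning rate and iteration count are illustrative, not MLLib defaults.

```python
import math

# Gradient descent for logistic regression with L2 regularization
# (single-machine sketch of what MLLib distributes across a cluster).
def train_logistic(X, y, lr=0.5, reg=0.01, iters=200):
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(iters):
        grad = [reg * wj for wj in w]  # L2 penalty contribution
        for xi, yi in zip(X, y):
            # sigmoid of the linear score
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Tiny made-up dataset: first column is a constant bias feature.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, 3.0]]
y = [0, 1, 0, 1]
w = train_logistic(X, y)
predict = lambda xi: 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, xi))))
print(predict([1.0, 2.5]) > 0.5)  # classifies a large feature value as positive
```

Swapping the log-loss gradient for a squared-error or hinge-loss gradient gives linear regression and linear SVMs, which is why all three MLLib methods share one optimizer.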

GraphX, Spark’s graph engine, combines the advantages of data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark framework.  It enables users to interactively load, transform, and compute on massive graphs.  Project sponsors report performance comparable to Apache Giraph, but in a fault tolerant environment that is readily integrated with other advanced analytics.

Spark Streaming offers an additional abstraction called discretized streams, or DStreams.  DStreams are a continuous sequence of RDDs representing a stream of data; they are created from live incoming data or generated by transforming other DStreams.  Spark receives data, divides it into batches, then replicates the batches for fault tolerance and persists them in memory where they are available for mathematical operations.
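
The micro-batch model can be sketched in a few lines of plain Python. This is a rough analogue only; in Spark each batch becomes an RDD in the DStream and is replicated before processing.

```python
# Chop a live stream into fixed-size batches and process each batch as a unit,
# mimicking Spark Streaming's discretized-stream (micro-batch) model.
def micro_batches(stream, batch_size):
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch  # in Spark: replicate, persist in memory, then compute
            batch = []
    if batch:
        yield batch  # final partial batch

incoming = ["click", "view", "click", "view", "view", "click", "view"]
counts = [sum(1 for r in b if r == "click") for b in micro_batches(incoming, 3)]
print(counts)  # [2, 1, 0]
```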

Currently, Spark supports programming interfaces for Scala, Java and Python.  For R users, the team at Berkeley’s AMPLab released a developer preview of SparkR in January.

There is an active and growing developer community for Spark; 83 developers contributed to Release 0.9.  In the past six months, developers contributed more commits to Spark than to all of the other Apache analytics projects combined.   In 2013, the Spark project published seven double-dot releases, including Spark 0.8.1 published on December 19; this release included YARN 2.2 support, high availability mode for cluster management, performance optimizations and improvements to the machine learning library and Python interface.  The Spark team released 0.9.0 in February 2014, and 0.9.1, a maintenance release, in April 2014.  Release 0.9 includes Scala 2.10 support, a configuration library, improvements to Spark Streaming, the Alpha release of GraphX, enhancements to MLLib and many other improvements.

In a nod to Spark’s rapid progress, Cloudera announced immediate support for Spark in February.   MapR recently announced that it will distribute the complete Spark stack, including Shark (Cloudera does not distribute Shark).  Hortonworks also recently announced plans to distribute Spark for machine learning, though it plans to stick with Storm for streaming analytics and Giraph for graph engines.  Databricks offers a certification program for Spark; participants currently include Adatao, Alpine Data Labs, ClearStory and Tresata.

In December, the first Spark Summit attracted more than 450 participants from more than 180 companies.  Presentations covered a range of applications such as neuroscience, audience expansion, real-time network optimization and real-time data center management, together with a range of technical topics.  The 2014 Spark Summit will be held in San Francisco this June 30-July 2.

In recognition of Spark’s rapid development, on February 27 Apache announced that Spark is a top-level project.  Developers expect to continue adding machine learning features and to simplify implementation.  Together with an R interface and commercial support, we can expect continued interest in and adoption of Spark.   Enhancements are coming rapidly — expect more announcements before the Spark Summit.

Source

ADP Launches Its Own Big Data Analytics Cloud Platform

Automatic Data Processing—better known as ADP, the payroll service to much of the world—went into the big data platform business on May 12. And why not, with more than 600,000 client businesses and 24 million employees in the U.S. alone from which to access piles of metadata.
Roseland, N.J.-based ADP is the payroll accounting service that uses its aggregate metadata to generate a monthly jobs report that is respected by many economists. In fact, because it is the largest payroll processor in the nation, ADP’s vast big data silo from U.S. companies and their employees ranks second only to the employee records of the U.S. federal government.
So ADP has started to use all that data for a new business wing: a big data-based cloud service. ADP DataCloud is designed to put day-to-day analytics capabilities in the hands of line-of-business accounting and HR staff members, enabling them to obtain insights from the workforce data already embedded in their individual ADP human capital management systems.
ADP DataCloud aims to boost business and workforce management goals, such as workforce productivity, talent development, retention and the identification of flight risks. More than 1,000 ADP clients already are using these analytics capabilities.
ADP DataCloud provides companies with critical information that can help answer key questions facing not only HR, but the overall business. A consumer-grade user experience blended with analytics allows clients to obtain deep enterprise insights across the organization’s HCM data. The big data platform is embedded within ADP’s core solutions, including ADP Vantage HCM, Enterprise HR, ADP Workforce Now and ADP Time & Attendance.

Key features include the following:

  • Benchmarking: Offers companies the ability to compare HCM metrics with an aggregated, anonymized market benchmark at the industry, location and job-title level to inform key workforce decisions.
  • Data exchange: Provides companies with the ability to combine workforce data with other types of business data, such as sales or customer satisfaction scores, from non-ADP platforms to identify deeper business insights and actions.
  • Predictive analytics: Utilizes predictive models derived from ADP data to help employers make smarter, forward-looking workforce decisions by providing insight into the likelihood of specific workforce management outcomes. The first such capability helps employers identify employees likely to leave the organization.
A new research project commissioned by ADP found that 75 percent of companies with 1,000 or more employees have access to data to inform business decisions, but only 46 percent are using workforce analytics capabilities to improve business decision making. The same study found that 42 percent of company finance executives and 32 percent of midlevel managers want to utilize workforce analytics.
The study consisted of a 2015 survey of 300 HR executives, finance executives and managers at companies with 1,000+ employees.

Originally posted at: http://www.eweek.com/enterprise-apps/adp-launches-its-own-big-data-analytics-platform.html

Source

How Google Understands You [Infographic]


You thought the human brain was complex? With its ability to retrieve stored memories from years past and forge connections from seemingly disparate topics, it truly seems like the brain is a miraculous organ that rules our everyday lives. But what about the Google brain? Just as intricate and just as ever-changing as a human’s brain, the Google search engine works to make associations, recommendations, and analysis based upon your search phrases.

However, the question remains: how does Google understand what we want from it? When we ask it a question, how do those millions of results show up for us effortlessly, ranked in terms of relevancy and authority? Every one of us takes this process for granted, so in this infographic we’ll look at the inner mechanics of the Google search engine that produce the results you see on your screen.

How Google Understands You [INFOGRAPHIC]
Infographic by Vertical Measures

Originally Posted at: How Google Understands You [Infographic] by v1shal

Using Big Data to Kick-start Your Career

Gordon Square Communications and WAAT offer tips about how to make the most of online resources to land a dream job – all without spending a penny.

Left to right: Vamory Traore, Sylvia Arthur and Grzegorz Gonciarz

You are probably familiar with Monster.com or Indeed.com, huge jobs websites where you can upload your CV along with 150 million other people every month.

The bad news is that it is unlikely your CV will ever get seen on one of these websites, as attendees of the London Technology Week event Using Tech to Find a Job at Home or Abroad discovered.

“There are too many people looking for a small number of jobs,” says Sylvia Arthur, Communicator Consultant at Gordon Square Communications and author of the book Get Hired! out on 30th June.

“The problem is that only 20% of jobs are advertised, while 25% of people are seeking a new job. If you divide twenty by twenty-five, the result of the equation is that you lose,” explains Ms Arthur.

So, how can we use technology to effectively find a job?

The first step is to analyse the “Big Data” – all the information that tells us about trends or associations, especially relating to human behaviour.

For example, if we were looking for a job in IT, we could read in the news that a new IT company has opened in Shoreditch, and from there understand that there are new IT jobs available in East London.

Big Data also tells us about salaries and cost of living in different areas, or what skills are required.

“Read job boards not as much to find a job as to understand what are the growing sectors and the jobs of the future,” is Ms Arthur’s advice.

Once you know where to go with the skills you have, you need to bear in mind that most recruiters receive thousands of CVs for a single job and they would rather ask a colleague for a referral than scan through all of them.

So if you are not lucky enough to have connections, you need to be proactive and make yourself known in the industry. “Comment, publish, be active in your area, showcase your knowledge,” says Ms Arthur.

“And when you read about an interesting opportunity, be proactive and contact the CEO, tell them what you know and what you can do for them. LinkedIn Premium free trial is a great tool to get in touch with these people.”

Another good piece of advice is to follow the key people in your sector on social media. Of all the jobs posted on social media, 51% are on Twitter, compared to only 23% on LinkedIn.

And for those looking for jobs in the EEA, it is worth checking out EURES, a free online platform where job seekers across Europe are connected with validated recruiters.

“In Europe there are some countries with a shortage of skilled workers and others with high unemployment,” explain Grzegorz Gonciarz and Vamory Traore from WAAT.

“The aim of EURES is to tackle this problem.”

Advisers with local knowledge also help jobseekers to find more information about working and living in another European country before they move.

As for recent graduates looking for experience, a new EURES program called Drop’pin will start next week.

The program aims to fill the skills gap that separates young people from recruitment through free training sessions both online and on location.

To read the original article on London Technology Week, click here.

Originally Posted at: Using Big Data to Kick-start Your Career

Benchmarking the share of voice of Coca-Cola, Red Bull and Pepsi

Today we’re comparing three soft drink brands: Coca Cola, Pepsi and Red Bull. All are big names in the beverages industry. We’ll use BuzzTalk’s benchmark tool to find out which brand is talked about the most and how people feel about this brand. As you probably know it’s not enough if people talk about your brand. You want them to be positive and enthusiastic.

Coca Cola has the largest Share of Voice

In order to benchmark these brands we’ve created three Media Reports in BuzzTalk, all set up the same way. We include news sites, blogs, journals and Twitter for the time period starting 23 September 2013. In these reports we didn’t include printed media.

As you can see, Coca Cola (blue) is the dominant brand online. Nearly 45% of the publications mention Coca Cola. Red Bull (green) and Pepsi Cola (red) follow close to each other at 29% and 26%.

Benchmarking the Buzz as not all buzz is created equal

Coca Cola doesn’t dominate everywhere on the web. If we take a closer look, the dominance of Coca Cola is predominantly caused by its share of tweets. When we zoom in on news sites, we notice it’s Red Bull who’s got the biggest piece of the pie. On blogs (not shown) Coca Cola and Red Bull match up.


Is Coca Cola’s dominance on Twitter due to Beliebers?

About 99.6% of Coca Cola related publications are on Twitter. Most of these tweets relate to the Coca-Cola.FM radio station in South America in connection with Justin Bieber. On 12th November Coca Cola streamed a concert by this young pop star, and what we’re seeing here is the effect of ‘Beliebers’ on the share of voice.


The Coca Cola Christmas effect can still be detected

The Bieber effect is even stronger than Christmas (42,884 versus 2,764 tweets).


Last year we demonstrated what’s marking the countdown to the holidays: it’s the release of the new Coca Cola TV-commercial. What we noticed then was a sudden increase in the mood state ‘tension’. In the following graph you can see it’s still there (Coca Cola is still in blue).

The mood state ‘tension’ relates to both anxiety and excitement. It’s the emotion we pick up during large product releases. If this is the first time you’re reading about mood states, we recommend reading this blogpost as an introduction. Mood states are an interesting add-on to sentiment to be used in predictions about human behavior. The ways in which actual predictions can be made are the subject of ongoing research.

How do we feel about these brands?

Let’s examine some more mood states and see whether we can find a mood state that’s clearly associated with a brand. As you can see in the graphs below, each soft drink brand gets its fair share of the mood state tension. Tension is not specific to Coca Cola, though it is more prominent during the countdown towards Christmas.

Pepsi Cola evokes the most ‘confusion’ and slightly more ‘anger’. The feelings of confusion are often related to feeling guilty after drinking (too much) Pepsi.


Red Bull generates the most mood states, as it dominates not only for fatigue but also – to a lesser extent – for depression, tension and vigor.

 

Striking is the number of publications for Red Bull in which the mood state fatigue can be detected. They say “Red Bull gives you wings” and this tag line has become famous. People now associate tiredness with the desire for Red Bull. But people also blame Red Bull for (still) feeling tired or more tired. At least it’s good to see Red Bull also has its share in the ‘vigor’ mood state department.

To read the original article on BuzzTalk, click here.

Originally Posted at: Benchmarking the share of voice of Coca-Cola, Red Bull and Pepsi by analyticsweekpick

What Crying Baby Could Teach Big Data Discovery Solution Seekers?


Yes, you read that right. It is a light title for a serious problem. I spoke with big-data scientists at some Fortune 100 companies and tried to probe them on how they want to tackle big data and how they are figuring out the method/tool that works best for them. It was interesting to hear their stories, to learn all the options available to them and how they ended up picking their tools. I was mulling over the problem, and then one night I saw my 2-year-old daughter cry non-stop. We all huddled to find what was troubling her. Then it occurred to me that this is the same situation companies are facing today.

First, let me explain what happened, and then I will try to make the connection as to why and how it is relevant. One evening, my daughter, who had just turned two, started acting fussy compared to her normal state. There were some guests at home, so as normal parents we started figuring out what was bothering her to calm her down, but nothing seemed to be working. One of the guests put forward a suggestion for the reason for her fussiness, and then other theories got added. All of us were trying to find the right reason for her fussiness from our individual experience, and soon a combination of various tricks worked and she found her peace. The reason for the fussiness is not all that important here; the good part is that she became relaxed.

Now, this is the problem most companies are facing today. Like my daughter, they are all fussy: they have a big-data problem, with a lot of unknowns hiding in their data. They can barely understand how to find those unknowns, let alone how to put them to use. And if we compare visualization tools to the guests, parents and everybody around my daughter, each trying to figure out their own version of what is happening, it’s chaos. If you let just one of the many work out their version of what it is, they may be off track for quite some time, which could be painful, discomforting and wrong. On the other hand, a model of collective wisdom worked best, as everyone gave their quick thoughts, which helped us collaborate, iterate on the information and figure out the best path.

Now consider companies using multiple tools on their problem, and babysitting them for days/months/years, costing time, money and resources. These tools could end up becoming the best nanny there is, or the worst one. The outcome is anyone’s guess, but even if you get a good tool, will you ever find out whether there is a better or best tool out there? That is the problem the big-data industry is facing today. Unlike other traditional appliances/tools, a big-data tool requires a considerable cash influx and time/resource commitment, so going through a long sales cycle and marrying a single tool should not be high on anyone’s charts.

Before you go hunting, make sure to create a small data set that best defines your business chaos. The data should contain almost every aspect of your business, in a way that lets it work as a good recruiting tool for a data discovery platform. I will go a bit deeper into what some good preparatory steps entail before you go shopping. But for this blog, let’s make sure we have our basic data set ready for testing the tools.

Now, the best approach to recruiting a visualization framework goes through one of three ways:
1. Hiring an independent consultancy: Just as we consult pediatricians for their expertise in dealing with baby problems, we could hire a specialized shop that works closely with your business and with the data visualization vendors. These consultants help companies recruit tools by acting as a mediation layer, filtering out any bias or technological challenge that restricts your decision-making capabilities. They could sit with your organization, understand its requirements and go tool fishing, recommending the tool that best suits your needs.

2. Maximizing the use of trial periods: Just as we quickly try things out and validate which method pacifies the kid, rather than getting into a long cycle of failures, we could treat tools the same way. This technique is painful but still does relatively less damage than going full throttle with one tool on a long journey of failure. This approach requires the mindset and the tactical and strategic agenda to hire/fire tools fast and pick the one delivering maximum value per dataset. This technique is the most expensive of the three, and it could introduce some bias into the decision making.

3. Going with platform plays: Like a pediatric clinic, where you can find almost everything that could help pacify the situation, there are vendors that provide a platform to let you experiment with all those methodologies and pick the best combination for your system. These vendors are not tied to any one visualization technique; they make everything available to clients and help them land on the best package out there. With such a system in place you can make sure that your business interest gets the highest precedence, and not any specific visualization/discovery technique. To keep this blog clean of shout-outs, I will keep company names out of the text, but do let me know if you are interested in which companies provide a platform play for you to experiment with.

And that way, you could make the baby stop crying in the fastest, most cost-effective and business-responsive manner.

Originally Posted at: What Crying Baby Could Teach Big Data Discovery Solution Seekers?

Lavastorm Democratizing Big Data Analytics in Face of Skills Shortage

Democratizing Big Data refers to the growing movement of making products and services more accessible to non-specialist staffers, such as business analysts, along the lines of “self-service business intelligence” (BI).

In this case, the democratized solution is “the all-in-one Lavastorm Analytics Engine platform,” the Boston company said in today’s announcement of product improvements. It “provides an easy-to-use, drag-and-drop data preparation environment to provide business analysts a self-serve predictive analytics solution that gives them more power and a step-by-step validation for their visualization tools.”

It addresses one of the main challenges to successful Big Data deployments, as listed in study after study: lack of specialized talent.

“Business analysts typically encounter a host of core problems when trying to utilize predictive analytics,” Lavastorm said. “They lack the necessary skills and training of data scientists to work in complex programming environments like R. Additionally, many existing BI tools are not tailored to enable self-service data assembly for business analysts to marry rich data sets with their essential business knowledge.”

The Lavastorm Analytics Engine (source: Lavastorm Analytics)

That affirmation has been confirmed many times. For example, a recent report by Capgemini Consulting, “Cracking the Data Conundrum: How Successful Companies Make Big Data Operational,” says that lack of Big Data and analytics skills was reported by 25 percent of respondents as a key challenge to successful deployments. “The Big Data talent gap is something that organizations are increasingly coming face-to-face with,” Capgemini said.

Other studies indicate they haven’t been doing such a good job facing the issue, as the self-service BI promises remain unfulfilled.

Enterprises are trying many different approaches to solving the problem. Capgemini noted that some companies are investing more in training, while others try more unconventional techniques, such as partnering with other companies in employee exchange programs that share more skilled workers or teaming up with or outright acquiring startup Big Data companies to bring skills in-house.

Others, such as Altiscale Inc., offer Hadoop-as-a-Service solutions, or, like BlueData, provide self-service, on-premises private clouds with simplified analysis tools.

Lavastorm, meanwhile, uses the strategy of making the solutions simpler and easier to use. “Demand for advanced analytic capabilities from companies across the globe is growing exponentially, but data scientists or those with specialized backgrounds around predictive analytics are in short supply,” said CEO Drew Rockwell. “Business analysts have a wealth of valuable data and valuable business knowledge, and with the Lavastorm Analytics Engine, are perfectly positioned to move beyond their current expertise in descriptive analytics to focus on the future, predicting what will happen, helping their companies compete and win on analytics.”

The Lavastorm Analytics Engine comes in individual desktop editions or in server editions for use in larger workgroups or enterprise-wide.

New predictive analytics features added to the product as listed today by Lavastorm include:

  • Linear Regression: Calculate a line of best fit to estimate the values of a variable of interest.
  • Logistic Regression: Calculate probabilities of binary outcomes.
  • K-Means Clustering: Form a user-specified number of clusters out of data sets based on user-defined criteria.
  • Hierarchical Clustering: Form a user-specified number of clusters out of data sets by using an iterative process of cluster merging.
  • Decision Tree: Predict outcomes by identifying patterns from an existing data set.
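
As a concrete illustration of what the K-Means Clustering feature computes, here is a minimal plain-Python version of the underlying algorithm on one-dimensional data. Lavastorm exposes this as a drag-and-drop component; the data and parameters below are made up for illustration.

```python
import random

# Minimal k-means: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
def kmeans(points, k, iters=20, seed=0):
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)  # initialize centers from the data
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
print(kmeans(data, 2))  # two centers, one near 1.0 and one near 10.0
```

The "user-defined criteria" in the feature description would correspond to choices like the distance measure and the number of clusters k.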

These and other new features are available today, Lavastorm said, with more analytical component enhancements to the library on tap.

The company said its approach to democratizing predictive analytics gives business analysts drag-and-drop capabilities specifically designed to help them master predictive analytics.

“The addition of this capability within the Lavastorm Analytics Engine’s visual, data flow-driven approach enables a fundamentally new method for authoring advanced analyses by providing a single shared canvas upon which users with complementary skill sets can collaborate to rapidly produce robust, trusted analytical applications,” the company said.

About the Author- David Ramel is an editor and writer for 1105 Media.

Originally posted via “Lavastorm Democratizing Big Data Analytics in Face of Skills Shortage”

Source: Lavastorm Democratizing Big Data Analytics in Face of Skills Shortage