Don’t Let your Data Lake become a Data Swamp

In an always-on, competitive business environment, organizations are looking to gain an edge through digital transformation. Consequently, many companies feel a sense of urgency to transform across all areas of their enterprise, from manufacturing to business operations, in the constant pursuit of continuous innovation and process efficiency.

Data is at the heart of all these digital transformation projects. It is the critical component that drives smarter, improved decision-making by empowering business users to eliminate gut feelings, unclear hypotheses, and false assumptions. As a result, many organizations believe building a massive data lake is the ‘silver bullet’ for delivering real-time business insights. In fact, according to a survey by CIO review from IDG, 75 percent of business leaders believe their future success will be driven by their organization’s ability to make the most of their information assets. However, only four percent of these organizations said they have set up a data-driven approach that lets them successfully benefit from their information.

Is your Data Lake becoming more of a hindrance than an enabler?

The reality is that all these new initiatives and technologies come with a unique set of generated data, which creates additional complexity in the decision-making process. To cope with the growing volume and complexity of data and alleviate IT pressure, some are migrating to the cloud.

But this transition, in turn, creates other issues. For example, once data is made more broadly available via the cloud, more employees want access to that information. Growing numbers and varieties of business roles are looking to extract value from increasingly diverse data sets, faster than ever. That puts pressure on IT organizations to deliver real-time data access that serves the diverse needs of business users looking to apply real-time analytics to their everyday jobs. However, it’s not just about better analytics: business users also frequently want tools that allow them to prepare, share, and manage data.

To minimize tension and friction between IT and business departments, moving raw data to one place where everybody can access it sounded like a good move. The concept of the data lake, a term coined by James Dixon in 2010, envisioned a large body of raw data in a more natural state, where different users come to examine it, delve into it, or extract samples from it. However, organizations are increasingly beginning to realize that all the time and effort spent building massive data lakes has frequently made things worse because of poor data governance and management, resulting in the formation of so-called “Data Swamps”.

Bad data clogging up the machinery

In the same way that data warehouses failed to manage data analytics a decade ago, data lakes will undoubtedly become “Data Swamps” if companies don’t manage them correctly. Putting all your data in a single place won’t in and of itself solve a broader data access problem. Leaving data uncontrolled, unenriched, unqualified, and unmanaged will dramatically reduce the benefits of a data lake, because it will still be usable only by a limited number of experts with a specialized set of skills.

A successful system of real-time business insights starts with a system of trust. To illustrate the negative impact of bad data and bad governance, let’s take a look at Dieselgate. The Dieselgate emissions scandal highlighted the difference between real-world and official air pollutant emissions data. In this case, the issue was not a problem of data quality but of ethics, since some car manufacturers misled the measurement system by injecting fake data. This resulted in fines for car manufacturers exceeding tens of billions of dollars and in consumers losing faith in the industry. After all, how can consumers trust the performance of cars now that they know the system of measurement has been intentionally tampered with?

The takeaway in the context of an enterprise data lake is that its value will depend on the level of trust employees have in the data contained in the lake. Failing to control data accuracy and quality within the lake will create mistrust amongst employees, seed doubt about the competency of IT, and jeopardize the whole data value chain, which then negatively impacts overall company performance.

A cloud data warehouse to deliver trusted insights for the masses

Leading firms believe governed cloud data lakes represent an adequate solution to some of these more traditional data lake stumbling blocks. The following four-step approach helps modernize a cloud data warehouse while providing better insight across the entire organization.

  1. Unite all data sources and reconcile them: Make sure the organization has the capacity to integrate a wide array of data sources, formats and sizes. Storing a wide variety of data in one place is the first step, but it’s not enough. Bridging data pipelines and reconciling them is another way to gain the capacity to manage insights. Verify the company has a cloud-enabled data management platform combining rich integration capabilities and cloud elasticity to process high data volumes at a reasonable price.
  2. Accelerate trusted insights to the masses: Efficiently manage data with cloud data integration solutions that help prepare, profile, cleanse, and mask data while monitoring data quality over time regardless of file format and size.  When coupled with cloud data warehouse capabilities, data integration can enable companies to create trusted data for access, reporting, and analytics in a fraction of the time and cost of traditional data warehouses. 
  3. Collaborative data governance to the rescue: The old model of a data value chain, where data is produced solely by IT in data warehouses and consumed by business users, is no longer valid. Now everyone wants to create content, add context, enrich data, and share it with others. Take the example of the internet and a knowledge platform such as Wikipedia, where everybody can contribute, moderate and create new entries in the encyclopedia. In the same way Wikipedia established collaborative governance, companies should instill collaborative governance in their organization by delegating the appropriate role-based authority or access rights to citizen data scientists, line-of-business experts, and data analysts.
  4. Democratize data access and encourage users to be part of the Data Value Chain: Without making people accountable for what they’re doing, analyzing, and operating, there is little chance that organizations will succeed in implementing the right data strategy across business lines. Thus, you need to build a continuous Data Value Chain where business users contribute, share, and enrich the data flow in combination with a cloud data warehouse multi-cluster architecture that will accelerate data usage by load balancing data processing across diverse audiences.
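Step 2 above mentions profiling, cleansing, and masking data. As a rough illustration only, the following Python sketch shows what those three operations could look like on a tiny batch of records. The field names, the sample records, and the helper functions `profile`, `mask`, and `cleanse` are all hypothetical and are not taken from any particular data integration product; a truncated, unsalted hash is used purely to keep the example short and would not be adequate masking in production.

```python
import hashlib


def profile(records, required_fields):
    """Return per-field completeness (share of non-empty values) for a batch."""
    total = len(records)
    report = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / total if total else 0.0
    return report


def mask(value):
    """Replace a sensitive value with a short, stable SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def cleanse(records, required_fields, sensitive_fields):
    """Drop rows with missing required fields and mask sensitive fields."""
    clean = []
    for r in records:
        if any(r.get(f) in (None, "") for f in required_fields):
            continue  # incomplete row: keep it out of the trusted set
        masked = dict(r)  # copy so the raw record is left untouched
        for f in sensitive_fields:
            if masked.get(f):
                masked[f] = mask(masked[f])
        clean.append(masked)
    return clean


# Hypothetical sample batch, invented for illustration.
records = [
    {"customer_id": "c1", "email": "a@example.com", "country": "FR"},
    {"customer_id": "c2", "email": "", "country": "US"},
    {"customer_id": "c3", "email": "c@example.com", "country": None},
]
required = ["customer_id", "email", "country"]

report = profile(records, required)
trusted = cleanse(records, required, sensitive_fields=["email"])
```

Tracking the `report` values batch after batch is one simple way to monitor data quality over time, while `trusted` is the subset that could be exposed to a wider audience.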

In summary, think of data as the next strategic asset. Right now, it’s more like a hidden treasure at the bottom of many companies. Once modernized, shared and processed, data will reveal its true value, delivering better and faster insights to help companies get ahead of the competition.

The post Don’t Let your Data Lake become a Data Swamp appeared first on Talend Real-Time Open Source Data Integration Software.

Source by analyticsweek

Estimating Other “Likelihood to Recommend” Metrics from Your Net Promoter Score (NPS)

In the realm of customer experience management, businesses can employ different summary metrics of customer feedback ratings. That is, the same set of data can be summarized in different ways. Popular summary metrics include mean scores, net scores and customer segment percentages. Prior analysis of different likelihood to recommend metrics reveals, however, that they are highly correlated; that is, different summary metrics regarding the “likelihood to recommend” question tell you essentially the same thing about your customers. This post presents information to help you compare different likelihood to recommend summary metrics.

Table 1. Correlations among different summary metrics of the likelihood to recommend question.


The data were from three separate studies, each examining consumer attitudes toward either their PC Manufacturer or Wireless Service Provider. Here are the details for each study:

  1. PC manufacturer: Survey of 1058 general US consumers in Aug 2007 about their PC manufacturer. All respondents for this study were interviewed to ensure they met the correct profiling criteria, and were rewarded with an incentive for filling out the survey. Respondents were ages 18 and older. GMI (Global Market Insite, Inc.) provided the respondent panels and the online data collection methodology.
  2. Wireless service provider: Survey of 994 US general consumers in June 2007 about their wireless provider. All respondents were from a panel of General Consumers in the United States ages 18 and older. The potential respondents were selected from a general panel which is recruited in a double opt-in process; all respondents were interviewed to ensure they met the correct profiling criteria. Respondents were given an incentive on a per-survey basis. GMI (Global Market Insite, Inc.) provided the respondent panels and the online data collection methodology.
  3. Wireless service providers: Survey of 5686 worldwide consumers from Spring 2010 about their wireless provider. All respondents for this study were rewarded with an incentive for filling out the survey. Respondents were ages 18 or older. Mob4Hire provided the respondent panels and the online data collection methodology.

Figure 1. Regression equations using NPS as the predictor and other summary metrics as the criteria.

From these three studies, covering nearly 8,000 respondents, I calculated six different “likelihood to recommend” metrics for each of the 48 brands/companies that had 30 or more responses. Most of the 48 brands were from the wireless service provider industry (N = 41); the remaining seven were from the PC industry.
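To make the six metrics concrete, here is a minimal Python sketch that computes them from one brand's 0 to 10 ratings. It assumes the conventional cut-offs (Promoters are 9 to 10, Passives 7 to 8, Detractors 0 to 6) together with the "% ≥ 6" positive score used in this post; the function name `recommend_metrics` and the sample ratings are invented for illustration and are not data from the studies.

```python
def recommend_metrics(ratings):
    """Summarize 0-10 'likelihood to recommend' ratings six different ways."""
    n = len(ratings)
    if n == 0:
        raise ValueError("need at least one rating")

    def pct(condition):
        # Percentage of respondents whose rating satisfies the condition.
        return 100 * sum(1 for r in ratings if condition(r)) / n

    promoters = pct(lambda r: r >= 9)        # Top Box (9-10)
    passives = pct(lambda r: 7 <= r <= 8)    # middle of the scale (7-8)
    detractors = pct(lambda r: r <= 6)       # Bottom 7 Box (0-6)
    positive = pct(lambda r: r >= 6)         # Positive score (% >= 6)

    return {
        "mean": sum(ratings) / n,
        "nps": promoters - detractors,       # Net Promoter Score
        "top_box": promoters,
        "bottom_box": detractors,
        "passives": passives,
        "positive": positive,
    }


# Invented ratings for a single hypothetical brand.
ratings = [10, 9, 9, 8, 7, 6, 5, 3, 10, 0]
metrics = recommend_metrics(ratings)
```

Running the same function once per brand would give the brand-level table of metrics that the correlations and regressions in this post are based on.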


The descriptive statistics of six different metrics and the correlations among them appear in Table 1. As you can see, five of the six summary metrics are highly related to each other. The correlations among these metrics vary from .85 to .97 (the negative correlations with Bottom 7 Box indicate that the bottom box score is a measure of badness; higher scores indicate more negative customer responses). The metric regarding the Passives segment is weakly related to the other metrics because the customer segment that it represents reflects the middle of the distribution of ratings.

The extremely high correlations among the remaining metrics show that these five metrics tell us roughly the same thing about the 48 brands. That is, brands with high Net Promoter Scores are those that are getting high Mean Scores, high Top Box Scores (Promoters), low Bottom Box Scores (Detractors) and high Positive Scores (% ≥ 6).

Comparing Different Summary Metrics

It is easy to compare your company’s Net Promoter Score to other companies when other companies also report their Net Promoter Score. When different companies summarize their “likelihood to recommend” question using a Mean Score or Top/Bottom Box Scores, this comparison across companies becomes difficult. However, we can use the current findings to help translate Net Promoter Scores into other summary metrics. Because the different metrics are so highly related, we can, with great precision, estimate the other metrics from the NPS via regression analysis. Using regression analysis, I estimated the other five summary metrics from the Net Promoter Score.

I calculated five different regression equations using NPS as the predictor and each of the other summary metrics as the criterion for each equation. I selected regression equations (e.g., linear, polynomial) that optimized the percent of variance explained by the model. The Mean Score was predicted using a linear model. The remaining scores were predicted using a polynomial model. The regression equations for each of the metrics (along with a scatter plot of the associated data) appear in Figure 1. As you can see in Figure 1, most of the regression equations explain a large percent of the variance in the outcome variables.
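For readers who want to try this kind of model fitting themselves, the sketch below fits a straight line by ordinary least squares and reports the percent of variance explained (R²). It is only an illustration of the general technique: the four brand-level points are invented, not taken from the studies, and the helper names `fit_line` and `r_squared` are my own. A polynomial model could be scored with the same `r_squared` function and compared on that basis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, predict):
    """Share of the variance in ys explained by the model `predict`."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented brand-level points (NPS, Mean Score), for illustration only.
nps_values = [-50, -20, 0, 30]
mean_scores = [5.5, 6.5, 7.0, 8.0]

slope, intercept = fit_line(nps_values, mean_scores)
r2 = r_squared(nps_values, mean_scores, lambda x: intercept + slope * x)
```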

Table 2. Net Promoter Scores and Predicted Values of Other Summary Metrics.

Using these regression equations, you can calculate the expected summary score from any Net Promoter Score. Simply substitute your Net Promoter Score for x and solve for y. Table 2 provides a summary of predicted values of other summary metrics given different Net Promoter Scores. For example, an NPS of -70 corresponds to a predicted Mean Score of 4.9, and an NPS of 0 corresponds to a predicted Mean Score of 7.1.
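As a worked example of "substitute x and solve for y", the sketch below builds a straight line through the two Mean Score values quoted above (NPS of -70 gives 4.9, NPS of 0 gives 7.1). This is an approximation reconstructed from those two points only; it is not the fitted equation from Figure 1, and the function names are hypothetical. It applies only to the Mean Score, which was the one metric modeled linearly.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# The two (NPS, Mean Score) pairs quoted in the text.
mean_slope, mean_intercept = line_through((-70.0, 4.9), (0.0, 7.1))


def predicted_mean_score(nps):
    """Approximate Mean Score for a given NPS using the two-point line."""
    return mean_intercept + mean_slope * nps


halfway = predicted_mean_score(-35)  # midpoint between the two quoted scores
```

With this line, each additional 10 points of NPS corresponds to roughly 0.31 points of Mean Score, and an NPS of -35 lands at about 6.0.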


Customer feedback data can be summarized in different ways. The current analysis showed that different summary metrics (e.g., Mean Scores, Net Scores, Top/Bottom Box Scores), not surprisingly, tell you the same thing; that is, summary metrics are highly correlated with each other. Using regression equations, you can easily transform your Net Promoter Score into other summary metrics.

In an upcoming post, I will examine how well customer feedback professionals are able to estimate different summary metrics (customer segment percentages) from Net Promoter Scores and Mean Scores without the assistance of regression equations. This upcoming post will highlight potential biases that professionals have when interpreting Net Promoter and Mean Scores.


The Evolution of Internet of Things (Infographics)

The Internet of Things is a new revolution of the Internet. Objects make themselves recognizable and gain intelligence because they can communicate information about themselves and access information that has been aggregated by other things.




The Challenges Canadian Companies Face When Implementing Big Data

Big Data is a big deal but implementing it is seldom simple or cheap.

According to new Accenture research that surveyed senior executives from seven industries in 19 countries, including Canada, the three biggest challenges Canadian companies face when implementing big data are budget, a shortage of skilled professionals, and security.

Canadian executives said the three main ways their companies use big data are to identify new sources of revenue, retain and acquire customers, and develop new products and services. And they’re seeing tangible business outcomes from big data, too, in the form of customer experience enhancement and new sources of revenue.

“Businesses are at a transition point where instead of just talking about the potential results that can be achieved from big data, they are realizing actual benefits including increasing revenues, a growing base of loyal customers, and more efficient operations,” said Narendra Mulani, senior managing director, Accenture Analytics, part of Accenture Digital.

“They’re recognizing that big data is one of the cornerstones of digital transformation,” Mulani added.


But half of Canadian execs cite budget as a challenge to applying big data, while 40% struggle to find talent. These are obstacles they must overcome, however, as 90% rate big data as “extremely” or “very” important to their business’ digital transformation, and 86% of those who have applied big data are satisfied with the results, according to the report.

“We’ve seen organizations overcome big data implementation challenges by remaining flexible and recognizing that no single solution suits every situation,” explained Vince Dell’Anno, managing director and global information management lead, Accenture Analytics, part of Accenture Digital. “If a particular approach doesn’t work, organizations quickly try another one, learning as they grow.

“They also start small and stay realistic in their expectations,” he added. “Rather than attempting to do everything at once, they focus resources around proving value in one area, and then let the results cascade from there.”

“Today, even the most basic items like water pipes can generate and provide data,” continued Mulani. “While the Internet of Things is giving rise to massive sources and quantities of data, new big data technologies are emerging that help uncover crucial business insights from the data.”

“Companies not implementing big data solutions are missing an opportunity to turn their data into an asset that drives business and a competitive advantage,” Mulani affirmed.


Originally Posted at: The Challenges Canadian Companies Face When Implementing Big Data by analyticsweekpick

Dreaming of being Data Driven? A Case for Central Data Office


Enough has been said about big data and the enterprise journey that leads there. One core area where the buck stops is the question “who owns the data?”, or the answer that “no one owns the data”, which makes data accessibility a nightmare. Companies spend a lot of time and energy working around this shortcoming, and not many go on to plan for it the right way. If you have spent more than a month in analytics, you may get chills at the thought of bringing in data sets from another department or silo. Have you ever wondered why this is a problem, and why nothing has been done about it even though almost everyone suffers from it? Let’s start thinking about giving your business a centralized data office that takes care of the enterprise data. The dream is to make data availability neither an IT nightmare nor an interdepartmental wrestling match, but a smooth process. There are numerous ways a central data office helps on this journey; the following are a few examples.

1. Reduce time to data: This should come as no surprise. When data is all managed in one place, there are bound to be centralized processes and resource allocation, which make access more structured and faster. You don’t have to trade favors to get that magical data that will help you do your job better. You will gain access to your data sets faster. On the other side, you will know the process for sharing data and can share it without falling back on ad hoc processes and timings. Both sides get a methodical, accountable system with set expectations. This also frees IT and other departments from dealing with many people requesting random data at random times, and provides a single funnel for dispensing data.

2. Give every data set an owner and thereby reduce bad data: How many times have you heard “it’s not my data”, “I don’t own it”, or “I just requested it from IT, and I don’t know much about it”? This is another big problem plaguing big enterprises. Lack of ownership comes at a price: more often than not, it adds overhead and restricts capability. Having a centralized data office reduces such problems by giving ownership to someone other than IT, whose primary job is not to understand the data in the first place. Once that is done, it becomes easy to reach the owners, understand more about the data, and qualify it better for faster and more effective analytics.

3. More transparency and thereby fewer traps and greasy edges: Having a centralized data office sheds more light on data, its relevance, and how each data set contributes to the bigger enterprise dream. A central office gives data strategy teams a bigger lens for understanding how data is shaping their existence. It makes things more transparent and thereby shows pitfalls and greasy spots clearly. This helps produce better, clearer, and more fail-safe strategies, as they will be based on better-qualified data. Ultimately, this helps companies steer their data focus effectively.

4. Accelerate the journey to a Data Driven Enterprise: Wow. Really? Yes, really. Data Driven Enterprise is an elite status that helps assure a sustainable existence. It certainly does not come cheap or easy; it requires a meticulous and consistent stride toward data-based decision-making. Having a centralized data office helps make that a reality. Data is the toughest part of any data driven strategy. Getting data has consistently been a struggle, and the lack of a standardized process has not made it any better. A centralized body can establish better and more consistent data quality standards, making more and more data usable, which ultimately plays a crucial role in creating a data driven organization.

5. Make the overall data analytics strategy a reality and not a hidden dream: The point of creating a centralized data office is not just to make data more manageable but to empower current analytics capabilities to use more of the data in their analysis. Company-level analytics has been a function of the people and teams working directly on it. We humans tend to introduce errors over time, so what makes us think that the analytics we design will be a true representation of the company? A centralized data approach helps in this regard by providing a more holistic view of the overall analytics strategy. It increases the ability to leverage all the data elements in defining that strategy.

So, it should come as no surprise that a central data office is the start of a long data driven journey, as well as of a sustained and profitable vision of using data for effective decision-making. It will not only make data easily accessible but also encourage people to use more data and less gut in their decision-making processes.

Originally Posted at: Dreaming of being Data Driven? A Case for Central Data Office

Challenges for Data Driven Organization

Along with each new invention come its side effects and new challenges. This is true of capturing and harnessing data, too. Data is a holy grail for data scientists and organizations, as it can help them reach the highest pinnacles of productivity, innovation, and growth, but it comes with great responsibility. Organizations have to proactively prepare themselves in the domains of data policies, data security, legal issues, technology, organizational change and talent, and access to data in order to successfully leverage the potential of data.

Data policies: As organizations start capturing and analyzing larger amounts of data, they need to set up policies that respect issues around the cross-national flow of data, intellectual property, and liability. Data can easily flow across international borders, and the country of origin may differ from the country of analysis. This needs to be moderated, and there are policies that restrict such wide transfers for specific types of data, such as health information. There also need to be policies about who can analyze sensitive data about individuals. Policies restricting the use of data such as credit scores and SSNs are important for privacy and for preventing misuse of sensitive data. The increasing concerns around the privacy of consumer data have been driven by the practices of some firms that have used consumers’ data for their own benefit. This needs to be mitigated by policies that protect the use of consumer data, especially health and financial data. Thus there is a tradeoff between utility and privacy that needs to be resolved.

Data Security: There are concerns around the security of data. Once there are policies to manage who has access and how much, we need to make sure that those policies are adhered to. In the recent past, there have been increasing instances of breaches of consumer data by hackers and ill-intentioned organizations. This has led to panic and concerns about the security of data. As more and more consumer, organizational and national data is digitized, it will become important to protect that data with better technology and policies.

Legal Issues: Issues around the use of data, the ownership of data and the liability arising from the use of data are new and need to be understood and resolved. Data is different from other assets and can be easily transferred, copied and manipulated. This can lead to ownership issues that can become very important in a competitive situation, both within and across organizations. There could be other issues related to the liability arising from the use and analysis of data, especially incorrect analysis or implementation. This could have a severe impact on the organization and will need clarification, probably over time, before the full potential of data can be captured.

Technology and techniques: The need for data capture and analysis has brought organizations to a point where it is important to merge and use various data systems and data marts to harness the complete value of that data. So, new techniques and technologies need to be employed to achieve this goal. Organizations need to develop the basic infrastructure and capability to support data capture, data integration, data analysis and reporting. This also implies that you need to invest in new technology, upgrade legacy systems and do change management to train personnel. There is also a need for new technologies that make it easier to manipulate and consume data.

Organizational change and talent: This is a difficult issue and has many aspects to it. On one side, leadership may lack the understanding of big data and its potential benefits needed to promote and approve initiatives that build capabilities. On the other side, there might be a lack of talent in the organization to effectively handle data and analyze it. Having that talent can be a big competitive advantage for companies that use their data to succeed in the market. Another issue is the lack of organizational structures and incentives that optimize the use of data to make better, more informed decisions. So, organizations have to take threefold action: educate the leadership on the importance of big data and get their support; develop in-house capability or hire people who can handle big data; and create organizational structures to promote and optimize the use of data.

Access to data: The power of data multiplies when it is integrated with other data sources to bring interesting insights to light. In most organizations, different departments use different systems with little scope for data integration. Also, as already stated, data ownership can give some people in the organization a feeling of power and competitive advantage, leading to reluctance to share data and optimize its use. So, we need to make sure that economic incentives are aligned within an organization to make the most effective use of data by sharing and integrating it. To transform an organization, you may also need data from third-party sources, which might not be very easy to access and use. New business models are evolving and are being considered by different organizations to make such transactions easy.

Industry structure: Some industry structures have not evolved to embrace the basic principles of efficiency and productivity. These industries are not affected by competitive pressures and adopt data at a different rate. For example, government and health care are industries where performance transparency is low and where data has not made many inroads. These industries need to improve their productivity by using data more intensively to make more informed decisions. Organization leaders will have to determine how to evolve the structure of these organizations in an increasingly integrated and competitive world, and how to use data to achieve and optimize that evolution.

Thus, data as a business driver can be transformative for organizations if the challenges listed above can be tackled and the power of data is realized and utilized. All the stakeholders involved, from leadership to data scientists to policy makers, need to understand the growing challenges as data evolves and proactively counter them, so that we can create a culture that promotes and appreciates the use of data for everyone’s benefit.

Source by d3eksha

The 3 Step Guide CIOs Need to Build a Data-Driven Culture

Today’s CIO has more data available than ever before. This creates an opportunity for big improvements in decision-making outcomes, but it carries huge complexity and responsibility in getting it right.

Many have already got it wrong, and this is largely down to organisational culture. At the centre of creating a successful analytics strategy is building a data-driven culture.

According to a report by Gartner, more than 35% of the top 5,000 global companies will fail to make use of the insight derived from their data. In another report by Eckerson, just 36% of the respondents gave their BI program a grade of ‘Excellent’ or ‘Good’.

With the wealth of data already available in the world and the promise that it will continue to grow at an exponential rate, it seems inevitable that organisations attempt to leverage this resource to its fullest to improve their decision-making capabilities.

Before we move forward, it’s important to state that the success of these steps depends on ensuring that all employees who have direct involvement with the data or the insight generated are able to contribute. This point is highlighted in a case study of Warby Parker, which illustrates the importance of self-service technologies that help all users meet their own data needs. According to Carl Anderson, the director of Data Science, this is essential in realising a data-driven culture.

Set Realistic Goals

I suppose this step is generic and best practice across all aspects of an organisation. However, I felt it needed to be mentioned because there are a number of examples available where decision-makers have become disillusioned with their analytics program due to it not delivering what they had expected.

Therefore, CIOs should take the time to research their organisation in depth; I recommend they look at current and future challenges facing their organisation and tailor their analytics strategy around solving these.

During this process, it is important to have a full understanding of the data sources currently used for analysis and reporting by the organisation as well as considering the external data sources available to the organisation that are not yet utilised.

By performing extensive research and gaining an understanding of the data sources available to the organisation, CIOs will find it easier to set realistic and clear goals that address the challenges facing the business. There is still work to be done on how the analytics strategy will go about achieving these goals, and it’s at this point that CIOs need to get creative with the data available to them.

For example, big data has brought with it a wealth of unstructured data, and many analysts believe that tapping into this unstructured data is paramount to obtaining a competitive advantage in the years to come. However, it appears to be something that most will not realise any time soon: recent studies estimate that only around 0.5% of the world’s unstructured data is analysed.

Build the Right Infrastructure

Once the plan has been formulated, the next step for CIOs is to ensure that their organisation’s IT infrastructure is aligned with the strategy so that the set goals can be achieved.

There is no universal “one way works for all” solution on building the right infrastructure; the most important factor to consider is whether the IT infrastructure can work according to the devised strategy.

A key requirement and expectation underpinning all good, modern infrastructures is the capability to integrate all of the data sources in the organisation into one central repository. The benefit is that combining all of the data sources gives users a holistic view of the entire organisation.

For example, in a data environment where all of the organisation’s data is stored in silos, analysts may identify a trend or correlation in one data source but lack the full perspective they would have if the data were unified, i.e. what can our other data sources tell us about what has contributed to this correlation?
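
As a minimal sketch of that idea, the Python snippet below joins two small, made-up silos on a shared key. The data sets, field names and the unify helper are all hypothetical and exist only for illustration; a real central repository would do this at far larger scale with dedicated tooling.

```python
# Hypothetical silo 1: weekly unit sales per store (illustrative data only).
sales = [
    {"store": "north", "week": 1, "units": 120},
    {"store": "north", "week": 2, "units": 60},
    {"store": "south", "week": 1, "units": 110},
    {"store": "south", "week": 2, "units": 115},
]

# Hypothetical silo 2: customer complaints logged by a separate support system.
tickets = [
    {"store": "north", "week": 2, "complaints": 14},
    {"store": "south", "week": 2, "complaints": 1},
]


def unify(sales_rows, ticket_rows):
    """Left-join sales to tickets on (store, week); missing counts become 0."""
    complaints = {(t["store"], t["week"]): t["complaints"] for t in ticket_rows}
    return [
        {**row, "complaints": complaints.get((row["store"], row["week"]), 0)}
        for row in sales_rows
    ]


unified = unify(sales, tickets)
for row in unified:
    print(row)
```

Looked at alone, the sales silo only shows that units at the north store halved in week 2. In the unified rows, the same record also carries the spike in complaints, which is the kind of context an analyst would otherwise have to go looking for in another system.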

Legacy technologies that are now obsolete should be replaced in favour of more modern approaches to processing, storing and analysing data – one example, as cited by Gartner, is technology built on search engines.

Enable Front-Line Employees and Other Business Users

Imperative to succeeding now is ensuring that front-line employees (those whose job roles can directly benefit from access to data) and other business users (managers, key business executives, etc.) are capable of self-serving their own data needs.

CIOs should look to acquire a solution built specifically for self-service analysis over large volumes of data and capable of seamless integration with their IT infrastructure.

A full analysis of employee skill-set and mind-set should be undertaken to determine whether certain employees need training in particular areas to bolster their knowledge or simply need to adapt their mind-set to a more analytical one.

Whilst it is essential that front-line employees and other business users are given access to self-service analysis, they are likely to be “less-technical users”. Therefore, ensuring they have the right access to training and other learning tools is vital to guarantee that they don’t become frustrated or disheartened.

Investing in employee development in these areas now will save time and money further down the line by removing an over-reliance on both internal and external IT experts.

Source: The 3 Step Guide CIO’s Need to Build a Data-Driven Culture

Using sparklyr with Microsoft R Server

The sparklyr package (by RStudio) provides a high-level interface between R and Apache Spark. Among many other things, it allows you to filter and aggregate data in Spark using the dplyr syntax. In Microsoft R Server 9.1, you can now connect to a Spark session using the sparklyr package as the interface, allowing you to combine the data-preparation capabilities of sparklyr and the data-analysis capabilities of Microsoft R Server in the same environment.

In a presentation at the Spark Summit (embedded below, and you can find the slides here), Ali Zaidi shows how to connect to a Spark session from Microsoft R Server and use the sparklyr package to extract a data set. He then shows how to build predictive models on this data (specifically, a deep neural network and a boosted trees classifier). He also shows how to build general ensemble models and cross-validate hyper-parameters in parallel, and even gives a preview of forthcoming streaming analysis capabilities.


An easy way to try out these capabilities is with Azure HDInsight 3.6, which provides a managed Spark 2.1 instance with Microsoft R Server 9.1.

Spark Summit: Extending the R API for Spark with sparklyr and Microsoft R Server

Originally Posted at: Using sparklyr with Microsoft R Server

@DrJasonBrooks talked about the Fabric and Future of Leadership #JobsOfFuture #Podcast


In this podcast, Jason talked about the fabric of great transformative leadership. He shared some tactical steps that current leaders can follow to ensure their relevance and their association with transformative teams. Jason emphasized the role of the team, the leader and the organization in creating a healthy, future-proof culture. It is a good session for the leadership of tomorrow.

Jason’s Recommended Read:
Reset: Reformatting Your Purpose for Tomorrow’s World by Jason Brooks
Essentialism: The Disciplined Pursuit of Less by Greg McKeown

Podcast Link:

Jason’s BIO:
Dr. Jason Brooks is an executive, entrepreneur, consulting and leadership psychologist, bestselling author, and speaker with over 24 years of demonstrated results in the design, implementation and evaluation of leadership and organizational development, organizational effectiveness, and human capital management solutions. He works to grow leaders and enhance workforce performance and overall individual and company success. He is a results-oriented, high-impact executive leader with experience in start-up, high-growth, and operationally mature multi-million and multi-billion dollar companies in multiple industries.

About #Podcast:
The #JobsOfFuture podcast is a conversation starter that brings leaders, influencers and lead practitioners onto the show to discuss their journey in creating the data-driven future.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest @

Want to sponsor?
Email us @

#JobsOfFuture #Leadership #Podcast #Future of #Work #Worker & #Workplace

Source: @DrJasonBrooks talked about the Fabric and Future of Leadership #JobsOfFuture #Podcast by v1shal

Making sense of unstructured data by turning strings into things

We all know about the promise of Big Data Analytics to transform our understanding of the world. The analysis of structured data, such as inventory, transactions, close rates, and even clicks, likes and shares, is clearly valuable, but the curious fact about the immense volume of data being produced is that the vast majority of it is unstructured text. Content such as news articles, blog posts, product reviews, and yes, even the dreaded 140-character novella contains tremendous value, if only it could be connected to things in the real world – people, places and things. In this talk, we’ll discuss the challenges and opportunities that result when you extract entities from Big Text.
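
To make "turning strings into things" concrete, here is a minimal Python sketch of one very simple approach: looking up known names in a small dictionary (a gazetteer) and mapping each match to an entity identifier. This is not the method presented in the talk; the gazetteer contents, the identifiers and the link_entities helper are hypothetical, and production systems rely on statistical models and disambiguation rather than exact lookups.

```python
import re

# Hypothetical gazetteer: lower-cased surface string -> (entity id, entity type).
GAZETTEER = {
    "edinburgh": ("place:edinburgh", "PLACE"),
    "university of edinburgh": ("org:university_of_edinburgh", "ORGANISATION"),
    "basis technology": ("org:basis_technology", "ORGANISATION"),
}


def link_entities(text):
    """Return (surface, entity_id, type) tuples in order of appearance.

    Longer names are matched first, so "University of Edinburgh" wins over
    the shorter "Edinburgh" inside it. Assumes ASCII text, where lower-casing
    does not change character offsets.
    """
    lowered = text.lower()
    taken = []  # character spans already claimed by a longer match
    found = []
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", lowered):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps an entity that was already linked
            taken.append((m.start(), m.end()))
            entity_id, kind = GAZETTEER[name]
            found.append((m.start(), text[m.start():m.end()], entity_id, kind))
    return [(surface, entity_id, kind) for _, surface, entity_id, kind in sorted(found)]


sentence = "He studied at the University of Edinburgh before joining Basis Technology."
for entity in link_entities(sentence):
    print(entity)
```

Even this toy version shows the core difficulty: the string "Edinburgh" can refer to a city or be part of an institution's name, and deciding which "thing" a string points to is where most of the real work lies.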

Gregor Stewart – Director of Product Management for Text Analytics at Basis Technology

As Director of Product Management, Mr. Stewart helps to ensure that Basis Technology’s offerings stay ahead of the curve. Previously Mr. Stewart was the CTO of a storage services startup and a strategy consultant. He holds a Masters in Natural Language Processing from the University of Edinburgh, a BA in PPE from the University of Oxford, and a Masters from the London School of Economics.

Thanks to our amazing sponsors:

MicrosoftNERD for Venue

Basis Technology for Food and Kindle Raffle



Originally Posted at: Making sense of unstructured data by turning strings into things by v1shal