Nov 22, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Trust the data  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> July 31, 2017 Health and Biotech analytics news roundup by pstein

>> For Musicians and Songwriters, Streaming Creates Big Data Challenge by analyticsweekpick

>> Simplifying Data Warehouse Optimization by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Is Population Health on the Agenda as Google Nabs Geisinger CEO? – Health IT Analytics Under Health Analytics

>> How to catch security blind spots during a cloud migration – GCN.com Under Cloud

>> Data Analytics Outsourcing Market Application Analysis, Regional Outlook, Growth Trends, Key Players and Forecasts … – AlgosOnline (press release) (blog) Under Social Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis


This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

Big Data: A Revolution That Will Transform How We Live, Work, and Think


“Illuminating and very timely . . . a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy, and even on the way we think… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. With data analysis, the same type of thinking goes. It’s not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and co-operating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
A: Lift:
It’s a measure of the performance of a targeting model (or rule) at predicting or classifying cases as having an enhanced response (relative to the population as a whole), measured against a random-choice targeting model. Lift is simply: target response / average response.

Suppose a population has an average response rate of 5% (mailing for instance). A certain model (or rule) has identified a segment with a response rate of 20%, then lift=20/5=4

Typically, the modeler divides the population into quantiles and ranks the quantiles by lift. By weighing each quantile’s predicted response rate against the cost, the modeler can then decide whether or not to market to that quantile. A typical conclusion reads:
“If we use the probability scores on customers, we can get 60% of the total responders we’d get mailing randomly by only mailing the top 30% of the scored customers.”
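As a rough illustration of the lift calculation and the quantile ranking described above, here is a minimal sketch in Python. The function names (lift, lift_by_quantile) and the toy scores are assumptions made up for this example, not part of any particular library.

```python
def lift(segment_response_rate, average_response_rate):
    """Lift = response rate of the targeted segment / average response rate."""
    if average_response_rate <= 0:
        raise ValueError("average response rate must be positive")
    return segment_response_rate / average_response_rate


def lift_by_quantile(scores, responded, n_quantiles=10):
    """Rank customers by model score, split them into quantiles, and
    return the lift of each quantile (best-scored quantile first)."""
    if len(scores) != len(responded) or len(scores) < n_quantiles:
        raise ValueError("need matching inputs with at least one row per quantile")
    overall_rate = sum(responded) / len(responded)
    if overall_rate == 0:
        raise ValueError("no responders, lift is undefined")
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // n_quantiles
    lifts = []
    for q in range(n_quantiles):
        start = q * size
        # The last quantile absorbs any leftover rows.
        end = len(ranked) if q == n_quantiles - 1 else start + size
        chunk = ranked[start:end]
        rate = sum(flag for _, flag in chunk) / len(chunk)
        lifts.append(rate / overall_rate)
    return lifts


# The mailing example from the text: 20% segment response vs. 5% average.
print(lift(20, 5))

# Hypothetical scored customers (1 = responded, 0 = did not respond).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_by_quantile(scores, responded, n_quantiles=5))
```

A quantile with lift above 1 responds better than random targeting; whether it is worth mailing still depends on the cost side, which this sketch leaves out.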

KPI:
– Key performance indicator
– A type of performance measurement
– Examples: 0 defects, 10/10 customer satisfaction
– Relies upon a good understanding of what is important to the organization

More examples:

Marketing & Sales:
– New customers acquisition
– Customer attrition
– Revenue (turnover) generated by segments of the customer population
– Often done with a data management platform

IT operations:
– Mean time between failure
– Mean time to repair

Robustness:
– Statistics with good performance even if the underlying distribution is not normal
– Statistics that are not affected by outliers
– A learning algorithm that can reduce the chance of fitting noise is called robust
– Median is a robust measure of central tendency, while mean is not
– Median absolute deviation is also more robust than the standard deviation
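To make the point about robust statistics concrete, here is a small sketch using only the Python standard library. The two toy samples are invented for illustration, and median_absolute_deviation is a helper defined here, not a standard-library function.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 1000]  # one corrupted measurement

for name, sample in (("clean", clean), ("with outlier", with_outlier)):
    print(
        name,
        "mean:", statistics.mean(sample),
        "median:", statistics.median(sample),
        "stdev:", round(statistics.stdev(sample), 2),
        "MAD:", median_absolute_deviation(sample),
    )
```

The single outlier drags the mean and the standard deviation far away from the bulk of the data, while the median and the median absolute deviation stay where they were.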

Model fitting:
– How well a statistical model fits a set of observations
– Examples: AIC, R², Kolmogorov-Smirnov test, chi-squared test, deviance (GLM)
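Of the goodness-of-fit measures listed above, R² is the simplest to compute by hand. The sketch below assumes you already have observed values and a model’s predictions; r_squared is a name chosen for this example.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant, R^2 is undefined")
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, [1.0, 2.0, 3.0, 4.0]))  # perfect fit
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean
```

A model that only ever predicts the mean of the observations scores 0, and a perfect fit scores 1, which is why R² is read as the share of variance the model explains.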

Design of experiments:
The design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.
In its simplest form, an experiment aims at predicting the outcome by changing the preconditions, the predictors.
– Selection of the suitable predictors and outcomes
– Delivery of the experiment under statistically optimal conditions
– Randomization
– Blocking: grouping experimental runs into homogeneous blocks (for example, runs conducted with the same equipment) to avoid any unwanted variations in the input
– Replication: performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process
– Interaction: when an experiment has 3 or more variables, the situation in which the simultaneous influence of two variables on a third is not additive
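Randomization, blocking and replication can be combined in what is usually called a randomized block design. The sketch below is one simple way to do it, under the assumption that each block (here a hypothetical machine) holds a multiple of the number of treatments; all names are invented for the example.

```python
import random


def randomized_block_design(units_by_block, treatments, seed=None):
    """Within each block, assign every treatment equally often (replication)
    and in random order (randomization). Returns {unit: (block, treatment)}."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in units_by_block.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block!r} size must be a multiple of the treatment count")
        replicates = len(units) // len(treatments)
        labels = list(treatments) * replicates
        rng.shuffle(labels)  # random order inside the block only
        for unit, treatment in zip(units, labels):
            assignment[unit] = (block, treatment)
    return assignment


# Hypothetical setup: two machines (blocks), four runs each, two treatments.
units_by_block = {
    "machine_1": ["run_1", "run_2", "run_3", "run_4"],
    "machine_2": ["run_5", "run_6", "run_7", "run_8"],
}
plan = randomized_block_design(units_by_block, ["control", "variant"], seed=42)
for unit, (block, treatment) in sorted(plan.items()):
    print(unit, block, treatment)
```

Because every treatment appears the same number of times on each machine, differences between machines cannot be mistaken for a treatment effect.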

80/20 rule:
– Pareto principle
– 80% of the effects come from 20% of the causes
– 80% of your sales come from 20% of your clients
– 80% of a company complaints come from 20% of its customers
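Whether the 80/20 rule actually holds for a given data set is easy to check. The sketch below computes the share of the total that comes from the top fraction of contributors; the sales figures are made up for illustration and share_from_top is a hypothetical helper name.

```python
def share_from_top(values, top_fraction=0.2):
    """Share of the total contributed by the largest `top_fraction` of values."""
    if not values:
        raise ValueError("values must not be empty")
    total = sum(values)
    if total == 0:
        raise ValueError("total is zero, share is undefined")
    ordered = sorted(values, reverse=True)
    top_n = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / total


# Hypothetical sales per client.
sales = [400, 400, 50, 50, 25, 25, 20, 15, 10, 5]
print(share_from_top(sales, top_fraction=0.2))
```

In practice the split is rarely exactly 80/20, so it is worth running this kind of check before building a strategy on the assumption.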

Source

[ VIDEO OF THE WEEK]

#FutureOfData with @theClaymethod, @TiVo discussing running analytics in media industry


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Big Data is not the new oil. – Jer Thorp

[ PODCAST OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.

Sourced from: Analytics.CLUB #WEB Newsletter

The Challenges Canadian Companies Face When Implementing Big Data

Big Data is a big deal but implementing it is seldom simple or cheap.

According to new Accenture research that surveyed senior executives from seven industries in 19 countries, including Canada, the three biggest challenges Canadian companies face when implementing big data are budget, a shortage of skilled professionals, and security.

Canadian executives said the three main ways their companies use big data are to identify new sources of revenue, retain and acquire customers, and develop new products and services. And they’re seeing tangible business outcomes from big data, too, in the form of customer experience enhancement and new sources of revenue.

“Businesses are at a transition point where instead of just talking about the potential results that can be achieved from big data, they are realizing actual benefits including increasing revenues, a growing base of loyal customers, and more efficient operations,” said Narendra Mulani, senior managing director, Accenture Analytics, part of Accenture Digital.

“They’re recognizing that big data is one of the cornerstones of digital transformation,” Mulani added.

 

But half of Canadian execs cite budget as a challenge to applying big data, while 40% struggle to find talent. These are obstacles they must overcome, however, as 90% rate big data as “extremely” or “very” important to their business’ digital transformation, and 86% of those who have applied big data are satisfied with the results, according to the report.

“We’ve seen organizations overcome big data implementation challenges by remaining flexible and recognizing that no single solution suits every situation,” explained Vince Dell’Anno, managing director and global information management lead, Accenture Analytics, part of Accenture Digital. “If a particular approach doesn’t work, organizations quickly try another one, learning as they grow.

“They also start small and stay realistic in their expectations,” he added. “Rather than attempting to do everything at once, they focus resources around proving value in one area, and then let the results cascade from there.”

“Today, even the most basic items like water pipes can generate and provide data,” continued Mulani. “While the Internet of Things is giving rise to massive sources and quantities of data, new big data technologies are emerging that help uncover crucial business insights from the data.”

“Companies not implementing big data solutions are missing an opportunity to turn their data into an asset that drives business and a competitive advantage,” Mulani affirmed.

Originally posted via “The Challenges Canadian Companies Face When Implementing Big Data”

Originally Posted at: The Challenges Canadian Companies Face When Implementing Big Data by analyticsweekpick

Dreaming of being Data Driven? A Case for Central Data Office


Enough has been said about big data and the enterprise journey toward it. One core area where the buck stops is “who owns the data?”, or rather “no one owns the data”, which makes data accessibility a nightmare scenario. Companies spend a lot of time and energy working around this shortcoming, and not many go on to plan for it the right way. If you have spent more than a month in analytics, you may get chills when it comes to bringing in data sets from another department or silo. Have you ever wondered why this is a problem and why, when almost everyone is suffering, nothing has yet been done? Let’s start scratching our heads and think about giving your business a centralized data office that takes care of the enterprise data. The dream is to make data availability neither an IT nightmare nor an interdepartmental wrestling match, but a smooth process. There are numerous ways a central data office helps you on this journey. The following are a few examples.

1. Reduce time to data: This should be no surprise. When data is all managed in one place, there are bound to be centralized processes and resource allocation, making access more structured and faster. You don’t have to invent laws of reciprocity to get that magical data that will help you do your job better. You will gain access to your data sets faster. On the other side, you will know the process for sharing data and hence will share it without falling back into random processes and timings. Both sides get a methodical, accountable system with set expectations. This also frees IT and other departments from dealing with lots of people requesting random data at random times, and provides a central funnel for all dispensing.

2. Give every data set an owner and thereby reduce bad data: How many times have you heard “it’s not my data”, “I don’t own it” or “I just requested it from IT, and I don’t know much about it”? This is another big problem plaguing big enterprises. Lack of ownership comes at a price: more often than not, it adds loads of overhead and restricts capability. Having a centralized data office reduces such problems by giving ownership to someone other than IT, whose primary job is not to understand the data in the first place. With ownership assigned, it becomes easy to reach the owners, understand more about the data, and qualify it better for faster and more effective analytics.

3. More transparency and thereby fewer traps and greasy edges: Having a centralized data office sheds more light on data, its relevance, and how each data set contributes to the bigger enterprise dream. A central office gives data strategy teams a bigger lens for understanding how data is shaping their existence. It makes things more transparent, thereby showing pitfalls and greasy spots clearly. This helps produce better, clearer and fail-safe strategies, as they will be based on better-qualified data. This ultimately helps companies steer their data focus effectively.

4. Accelerate the journey to a Data Driven Enterprise: Wow. Really? Yes, really. Being a Data Driven Enterprise is an elite status that helps assure your sustainable existence. It certainly does not come cheap or easy, but requires a meticulous and consistent stride toward data-based decision-making. Having a centralized data office helps make it a reality. Data is the toughest part of any data driven strategy. Getting data has consistently been a struggle, and not having a standardized process has made it anything but better. A centralized body will embrace better and more consistent data quality standards, making more and more data usable, which ultimately plays a crucial role in creating a data driven organization.

5. Make the overall data analytics strategy a reality and not a hidden dream: The hidden message behind creating a centralized data office is not just to make data more manageable but to empower current analytics capabilities to use more data in their analysis. Company-level analytics has been a function of the people and teams working directly on it. We humans tend to introduce some error over time, so what makes us think that the analytics we design will be a true representation of the company? A centralized data approach helps in this regard by providing a more holistic view of the overall analytics strategy. It increases the capability to leverage all the data elements in defining that strategy.

So it should be no surprise that a central data office is the start of a long data driven journey, as well as of a sustained and profitable vision of using data for effective decision-making. It will not only make data easily accessible but also encourage people to use more data and less gut in their decision-making processes.

Originally Posted at: Dreaming of being Data Driven? A Case for Central Data Office

Nov 15, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Tour of Accounting  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> 5 Advantages of Using a Redshift Data Warehouse by analyticsweek

>> January 23, 2017 Health and Biotech analytics news roundup by pstein

>> May 25, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>> ND vital statistics hold steady in 2017 – Bismarck Tribune Under Statistics

>> The Use of Ramped Rep Equivalents (RREs) in Sales Analytics and Modeling – Enterprise Irregulars (blog) Under Sales Analytics

>> State Street: Latest investor sentiment towards Brexit – Asset Servicing Times Under Risk Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Probability & Statistics


This course introduces students to the basic concepts and logic of statistical reasoning and gives the students introductory-level practical ability to choose, generate, and properly interpret appropriate descriptive and… more

[ FEATURED READ]

On Intelligence


Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from the zombie apocalypse of unscalable models
One living and breathing zombie in today’s analytical models is the pulsating absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models rake in more data, the error bars keep them sensible and in check. If error bars are not accounted for, we make our models susceptible to failure, leading us to a Halloween that we never want to see.

[ DATA SCIENCE Q&A]

Q:How do you handle missing data? What imputation techniques do you recommend?
A: * If data is missing completely at random: deletion introduces no bias, but decreases the power of the analysis by decreasing the effective sample size
* Recommended: kNN imputation, Gaussian mixture imputation
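To show the difference between deletion and kNN imputation, here is a deliberately small, pure-Python sketch. It is an illustration under simplifying assumptions (numeric columns, missing values marked as None, distance measured only on the columns two rows share), not a replacement for a tested library implementation; drop_incomplete and knn_impute are names invented for this example.

```python
import math


def drop_incomplete(rows):
    """Listwise deletion: keep only rows without missing values."""
    return [row for row in rows if all(value is not None for value in row)]


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows
    that have the column present."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            if not donors:
                continue  # no usable neighbour, leave the gap
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, None],   # missing second feature
    [10.0, 100.0],
]
print(drop_incomplete(rows))    # three rows survive, sample size shrinks
print(knn_impute(rows, k=2))    # the gap is filled from the two closest rows
```

Deletion keeps the data honest but shrinks the sample; imputation keeps every row at the price of a modelling assumption about what the neighbours can tell you.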

Source

[ VIDEO OF THE WEEK]

#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Torture the data, and it will confess to anything. – Ronald Coase

[ PODCAST OF THE WEEK]

Solving #FutureOfWork with #Detonate mindset (by @steven_goldbach & @geofftuff) #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.

Sourced from: Analytics.CLUB #WEB Newsletter

Challenges for Data Driven Organization

Along with each new invention come its side effects and new challenges. This is true even of capturing and harnessing data. Data is a holy grail for data scientists and organizations, as it can help them reach the highest pinnacles of productivity, innovation and growth, but it comes with great responsibility. Organizations have to proactively prepare themselves in the domains of data policies, data security, legal issues, technology, organizational change and talent, and access to data in order to successfully leverage the potential of data.

Data policies: As organizations start capturing and analyzing larger amounts of data, they need to set up policies that respect issues around the cross-national flow of data, intellectual property, and liability. Data can easily flow across international borders, and the country of origin could be different from the country of analysis. This needs to be moderated, and there are policies that restrict such wide transfers for specific types of data, like health information. There also need to be policies around who can analyze sensitive data about individuals. Policies restricting the use of data like credit scores and SSNs are important for privacy and for preventing misuse of sensitive data. The increasing concerns around the privacy of consumer data have been driven by the practices of some firms that have used consumers’ data for their own benefit. This needs to be mitigated by policies protecting the use of consumer data, especially health and financial data. Thus there is a tradeoff between utility and privacy that needs to be resolved.

Data Security: There are concerns around the security of data. Once there are policies to manage who has access and how much, we need to make sure that those policies are adhered to. In the recent past, there have been increasing instances of breaches of consumer data by hackers and ill-minded organizations. This has led to panic and concerns about the security of data. As more and more consumer, organizational and national data is digitized, it will become important to protect that data with better technology and policies.

Legal Issues: Issues around use of data, ownership of data and liability arising from the use of data are new and would need to be understood and resolved. Data is different from other assets and can be easily transferred, copied and manipulated. So, this can lead to ownership issues that can become very important in a competitive situation, both within and across the organizations. There could be other issues related with the liability arising from the use and analysis of data, esp. incorrect analysis or implementation. This could have severe impact on the organization and would need clarification probably over time, to capture the full potential of data.

Technology and techniques: The need for data capture and analysis has brought organizations to a point where it is important to merge and use various data systems and data marts to harness the complete value of that data. New techniques and technologies need to be employed to achieve this goal. Organizations need to develop the basic infrastructure and capability to support data capture, data integration, data analysis and reporting. This also implies that you need to invest in new technology, upgrade legacy systems and do change management to train personnel. There is also a need for new technologies that make data maneuvering and consumption easier.

Organizational change and talent: This is a difficult issue and has many aspects to it. On one side, leadership may lack the understanding of big data and its potential benefits needed to promote and approve initiatives to build capabilities. On the other side, there might be a lack of talent in the organization to effectively handle data and analyze it. Having that talent can be a big competitive advantage for companies that can use this data to succeed in the market. Another issue is the lack of organizational structure and incentives to optimize the use of data to make better and informed decisions. So, organizations have to take threefold action: educate the leadership on the importance of big data and get their support; develop in-house capability or hire people that can handle big data; and create organizational structures to promote and optimize the use of data.

Access to data: The power of data multiplies when it is integrated with other data sources to bring interesting insights to light. In most organizations, different departments use different systems with little scope for data integration. Also, as already stated, data ownership can give some people in an organization a feeling of power and competitive advantage, leading to reluctance to share it and optimize its use. So, we need to make sure that economic incentives are aligned within an organization to make the most effective use of data by sharing and integrating. To transform an organization, you may also need data from third-party sources, and that might not be very easy to access and use. New business models are evolving and are being considered by different organizations to make such transactions easy.

Industry structure: Some industry structures have not evolved to imbibe the basic principles of efficiency and productivity. These industries are not impacted by competitive pressures and have a different rate of use of data. For example, government and health care are industries where performance transparency is low and where data has not made many inroads. These industries need to improve their productivity by using data more intensively to make more informed decisions. Organization leaders will have to determine how to evolve the structure of these organizations in an increasingly integrated and competitive world, and how to use data to achieve and optimize that.

Thus, data as a business driver can be transformative for organizations if the above listed challenges can be tackled and the power of data is realized and utilized. All the stakeholders involved from leadership, to data scientists to policy makers need to understand the growing challenges as the data evolves and proactively counter them, so that we can create a culture that promotes and appreciates the use of data for everyone’s benefits.

Source by d3eksha

The 3 Step Guide CIO’s Need to Build a Data-Driven Culture

Today’s CIO has more data available than ever before. This is an opportunity for potentially big improvements in decision-making outcomes, but it carries huge complexity and responsibility in getting it right.

Many have already got it wrong, and this is in large part down to organisational culture. At the centre of creating a successful analytics strategy is building a data-driven culture.

According to a report by Gartner, more than 35% of the top 5,000 global companies will fail to make use of the insight derived from their data. In another report by Eckerson, just 36% of the respondents gave their BI program a grade of ‘Excellent’ or ‘Good’.

With the wealth of data already available in the world and the promise that it will continue to grow at an exponential rate, it seems inevitable that organisations attempt to leverage this resource to its fullest to improve their decision-making capabilities.

Before we move forward, it’s important to state that underpinning the success of these steps is to ensure all employees who have a direct involvement with the data or the insight generated are able to contribute. This point is highlighted in a case study of Warby Parker who illustrate the importance of utilising self-service technologies that help all users meet their own data needs, which, according to Carl Anderson, the director of Data Science, is essential in realising a data-driven culture.

Set Realistic Goals

I suppose this step is generic and best practice across all aspects of an organisation. However, I felt it needed to be mentioned because there are a number of examples available where decision-makers have become disillusioned with their analytics program due to it not delivering what they had expected.

Therefore, CIO’s should take the time to prepare in-depth research into their organisation; I recommend they look at current and future challenges facing their organisation and tailor their analytics strategy appropriately around solving these.

During this process, it is important to have a full understanding of the data sources currently used for analysis and reporting by the organisation as well as considering the external data sources available to the organisation that are not yet utilised.

By performing extensive research and gaining understanding on the data sources available to the organisation, it will be easier for CIO’s to set realistic and clear goals that address the challenges facing the business. Though there is still work to be done addressing how the analytics strategy will go about achieving these goals, it’s at this point where CIO’s need to get creative with the data available to them.

For example, big data has brought with it a wealth of unstructured data, and many analysts believe that tapping into this unstructured data is paramount to obtaining a competitive advantage in the years to come. However, it appears to be something that most will not realise any time soon: recent studies estimate that only around 0.5% of unstructured data in the world is analysed.

Build the Right Infrastructure

Once the plan has been formulated, the next step for CIO’s is to ensure that their organisation’s IT infrastructure is aligned with the strategy so that the set goals can be achieved.

There is no universal “one way works for all” solution on building the right infrastructure; the most important factor to consider is whether the IT infrastructure can work according to the devised strategy.

A key requirement and expectation underpinning all good, modern infrastructures is the capability to integrate all of the data sources in the organisation into one central repository. The benefit being that by combining all of the data sources it provides users with a fully holistic view of the entire organisation.

For example, in a data environment where all of the organisation’s data is stored in silos, analysts may identify a trend or correlation in one data source but not have the full perspective afforded if the data were unified, i.e. what can our other data sources tell us about what has contributed to this correlation?

Legacy technologies that are now obsolete should be replaced in favour of more modern approaches to processing, storing and analysing data. One example is technologies built on search-engine technology, as cited by Gartner.

Enable Front-Line Employees and Other Business Users

Imperative to succeeding now is ensuring that front-line employees (those whose job roles can directly benefit by having access to data) and other business users (managers, key business executives, etc.) are capable of self-serving their own data needs.

CIO’s should look to acquire a solution built specifically for self-service analysis over large-volumes of data and capable of seamless integration with their IT infrastructure.

A full analysis of employee skill-set and mind-set should be undertaken to determine whether certain employees need training in particular areas to bolster their knowledge or simply need to adapt their mind-set to a more analytical one.

Whilst it is essential that the front-line employees and other business users are given access to self-service analysis, inherently they will likely be “less-technical users”. Therefore ensuring they have the right access to training and other learning tools is vital to guarantee that they don’t become frustrated or disheartened.

By investing in employee development in these areas now, it will save time and money further down the line, removing an over reliance on both internal and external IT experts.

Source: The 3 Step Guide CIO’s Need to Build a Data-Driven Culture

Nov 08, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data shortage  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python


The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking


Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way in helping get quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:What is your definition of big data?
A: Big data is high volume, high velocity and/or high variety information assets that require new forms of processing
– Volume: big data doesn’t sample, just observes and tracks what happens
– Velocity: big data is often available in real-time
– Variety: big data comes from texts, images, audio, video…

Difference big data/business intelligence:
– Business intelligence uses descriptive statistics with data with high density information to measure things, detect trends etc.
– Big data uses inductive statistics (statistical inference) and concepts from non-linear system identification to infer laws (regression, classification, clustering) from large data sets with low density information to reveal relationships and dependencies or to perform prediction of outcomes or behaviors

Source

[ VIDEO OF THE WEEK]

@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

If you can’t explain it simply, you don’t understand it well enough. – Albert Einstein

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MPFlowersNYC, @enigma_data


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

140,000 to 190,000: the projected shortage of people with deep analytical skills needed to fill the demand of Big Data jobs in the U.S. by 2018.

Sourced from: Analytics.CLUB #WEB Newsletter

Using sparklyr with Microsoft R Server

The sparklyr package (by RStudio) provides a high-level interface between R and Apache Spark. Among many other things, it allows you to filter and aggregate data in Spark using the dplyr syntax. In Microsoft R Server 9.1, you can now connect to a Spark session using the sparklyr package as the interface, allowing you to combine the data-preparation capabilities of sparklyr and the data-analysis capabilities of Microsoft R Server in the same environment.

In a presentation at the Spark Summit (embedded below, and you can find the slides here), Ali Zaidi shows how to connect to a Spark session from Microsoft R Server and use the sparklyr package to extract a data set. He then shows how to build predictive models on this data (specifically, a deep neural network and a boosted trees classifier). He also shows how to build general ensemble models and cross-validate hyper-parameters in parallel, and even gives a preview of forthcoming streaming analysis capabilities.

[youtube https://www.youtube.com/watch?v=8-xvKlz26vg?rel=0&w=500&h=281]

An easy way to try out these capabilities is with Azure HDInsight 3.6, which provides a managed Spark 2.1 instance with Microsoft R Server 9.1.

Spark Summit: Extending the R API for Spark with sparklyr and Microsoft R Server

Originally Posted at: Using sparklyr with Microsoft R Server

Nov 01, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Accuracy check  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> The User Experience of State Government Websites by analyticsweek

>> Marginal gains: the rise of data analytics in sport by analyticsweekpick

>> The Pitfalls of Using Predictive Models by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>>
 How to Avoid the Trap of Fragmented Security Analytics – Security Intelligence (blog) Under  Analytics

>>
 Are You Spending Too Much (or Too Little) on Cybersecurity? – Data Center Knowledge Under  Data Center

>>
 Most UK businesses are not insured against security breaches and data loss, says study – Information Age Under  Data Security

More NEWS ? Click Here

[ FEATURED COURSE]

Python for Beginners with Examples

image

A practical Python course for beginners with examples and exercises…. more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies

image

The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today, a data-driven leader, data scientist, or other data-driven expert is constantly put to the test by helping the team solve problems with their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgment and taints our suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find ourselves caught in biases that impair our judgment. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: You have data on the durations of calls to a call center. Generate a plan for how you would code and analyze these data. Explain a plausible scenario for what the distribution of these durations might look like. How could you test, even graphically, whether your expectations are borne out?
A: 1. Exploratory data analysis
* Histogram of durations
* Histogram of durations per service type, per day of week, per hour of day (durations can be systematically longer from 10am to 1pm, for instance), per employee…
2. Distribution: lognormal?
3. Test graphically with a QQ plot: sample quantiles of log(durations) vs. normal quantiles
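The QQ check in step 3 can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not a definitive analysis: the call durations are simulated from a lognormal distribution with made-up parameters, and the correlation between the paired quantiles is used as a numeric stand-in for "the QQ plot looks like a straight line". The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    observed = sorted(sample)
    n = len(observed)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def qq_correlation(sample):
    """Correlation of the QQ points: close to 1 when they lie on a line."""
    theoretical, observed = normal_qq_points(sample)
    return pearson(theoretical, observed)


# Simulated call durations in seconds (assumed lognormal, made-up parameters).
rng = random.Random(42)
durations = [rng.lognormvariate(5.0, 0.8) for _ in range(2000)]
log_durations = [math.log(d) for d in durations]

r_raw = qq_correlation(durations)
r_log = qq_correlation(log_durations)
print(f"QQ correlation, raw durations: {r_raw:.3f}")
print(f"QQ correlation, log durations: {r_log:.3f}")
```

If the lognormal expectation holds, the points for log(durations) fall close to a straight line and the raw durations bend away from it in the upper tail. The pairs returned by normal_qq_points can be passed to any plotting library to inspect the plot by eye.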

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek #FutureOfData with Robin Thottungal(@rathottungal), Chief Data Scientist at @EPA

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein

[ PODCAST OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Brands and organizations on Facebook receive 34,722 Likes every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

@DrJasonBrooks talked about the Fabric and Future of Leadership #JobsOfFuture #Podcast

[youtube https://www.youtube.com/watch?v=SB29nSaCppU]

In this podcast, Jason talked about the fabric of great transformative leadership. He shared some tactical steps that current leaders could follow to ensure their relevance and their association with transformative teams. Jason emphasized the role of the team, the leader, and the organization in creating a healthy, future-proof culture. It is a good session for the leadership of tomorrow.

Jason’s Recommended Read:
Reset: Reformatting Your Purpose for Tomorrow’s World by Jason Brooks https://amzn.to/2rAuywh
Essentialism: The Disciplined Pursuit of Less by Greg McKeown https://amzn.to/2jOX8Xi

Podcast Link:
iTunes: http://math.im/itunes
GooglePlay: http://math.im/gplay

Jason’s BIO:
Dr. Jason Brooks is an executive, entrepreneur, consulting and leadership psychologist, bestselling author, and speaker with over 24 years of demonstrated results in the design, implementation and evaluation of leadership and organizational development, organizational effectiveness, and human capital management solutions. He works to grow leaders and enhance workforce performance and overall individual and company success. He is a results-oriented, high-impact executive leader with experience in start-up, high-growth, and operationally mature multi-million and multi-billion dollar companies in multiple industries.

About #Podcast:
The #JobsOfFuture podcast is a conversation starter that brings leaders, influencers and lead practitioners onto the show to discuss their journeys in creating the data-driven future.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest @ info@analyticsweek.com

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#JobsOfFuture #Leadership #Podcast #Future of #Work #Worker & #Workplace

Source: @DrJasonBrooks talked about the Fabric and Future of Leadership #JobsOfFuture #Podcast by v1shal