Dec 27, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


Data shortage  Source



[ AnalyticsWeek BYTES]

>> Large Visualizations in canvasXpress by analyticsweek

>> How to pick the right sample for your analysis by jburchell

>> How Google Understands You [Infographic] by v1shal



Meet data center compliance standards in hybrid deployments – TechTarget Under Data Center

Approaching The Hybrid Cloud Computing Model For Modern Government – Forbes Under Cloud

Financial Analytics Market 2018 Report with Manufacturers, Dealers, Consumers, Revenue, Regions, Types, Application – The Iowa DeltaChi Under Financial Analytics



Introduction to Apache Spark


Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals…. more


Rise of the Robots: Technology and the Threat of a Jobless Future


What are the jobs of the future? How many will there be? And who will have them? As technology continues to accelerate and machines begin taking care of themselves, fewer people will be necessary. Artificial intelligence… more


Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist, or domain expert is constantly put to the test when helping a team solve a problem with their skills and expertise. Believe it or not, part of that decision process is driven by intuition, which introduces bias into our judgement and can taint our recommendations. Most skilled professionals understand and handle these biases well, but in a few cases we fall into small traps and find ourselves caught in biases that impair our judgement. So it is important to keep intuition bias in check when working on a data problem.


Q: How do you assess the statistical significance of an insight?
A: The underlying question is whether the insight was observed just by chance or reflects a real effect.
Statistical significance can be assessed using hypothesis testing:
– First, we state a null hypothesis, which is usually the opposite of what we wish to show (classifiers A and B perform equivalently, treatment A is equal to treatment B)
– Then, we choose a suitable statistical test and the test statistic used to reject the null hypothesis
– Also, we choose a significance level (for example 0.05), which defines a critical region for the statistic that is extreme enough for the null hypothesis to be rejected
– Finally, we calculate the observed test statistic from the data and check whether it lies in the critical region (equivalently, whether the p-value is below the significance level)
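
The steps above can be sketched with a permutation test, which needs nothing beyond the Python standard library. This is only an illustration under stated assumptions: the accuracy scores are made-up data, the function name `permutation_test` is hypothetical, and the test statistic (difference in means) and significance level (0.05) are choices made for the example, not recommendations from the original answer.

```python
import random

def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both samples come from the same distribution,
    so the group labels are exchangeable.
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)  # observed test statistic
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel the observations at random
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical accuracy scores for two classifiers (illustrative data only).
scores_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85]
scores_b = [0.74, 0.76, 0.73, 0.77, 0.75, 0.72, 0.78, 0.74]

alpha = 0.05  # significance level, fixed before looking at the data
observed, p_value = permutation_test(scores_a, scores_b)
reject_null = p_value < alpha
print(f"difference={observed:.4f}, p={p_value:.4f}, reject H0: {reject_null}")
```

A permutation test is used here because it makes the null hypothesis explicit in code; any of the parametric tests listed under "Common tests" follows the same four steps with a different statistic and reference distribution.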

Common tests:
– One-sample Z-test
– Two-sample Z-test
– One-sample t-test
– Paired t-test
– Two-sample pooled t-test (equal variances)
– Two-sample unpooled t-test for unequal variances and possibly unequal sample sizes (Welch’s t-test)
– Chi-squared test for variances
– Chi-squared test for goodness of fit
– ANOVA (for instance: are the two regression models equal? F-test)
– Regression F-test (i.e., is at least one of the predictors useful in predicting the response?)
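
As one concrete instance from this list, the statistic for Welch's t-test can be computed by hand. The sketch below is a minimal illustration: the function name `welch_t` and the sample values are hypothetical, and it stops at the t statistic and the Welch-Satterthwaite degrees of freedom because the standard library has no t distribution. In practice the p-value would come from a t table or from a statistics package (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    var_a, var_b = variance(a), variance(b)  # sample variances (n - 1 denominator)
    n_a, n_b = len(a), len(b)
    se_sq = var_a / n_a + var_b / n_b        # squared standard error of the difference
    t = (mean(a) - mean(b)) / sqrt(se_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical measurements with different spreads and sample sizes.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.4, 4.1, 4.6, 4.3, 4.5]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```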



#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency




War is 90% information. – Napoleon Bonaparte


Scott Harrison (@SRHarrisonJD) on leading the learning organization #JobsOfFuture #Podcast





Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

2018 Trends in Blockchain

If the defining characteristic of data management in 2018 is the heterogeneity of contemporary computing environments, then Blockchain is a considerable factor contributing to its decentralization.

Expectations for this distributed ledger technology are decidedly high. Its low latency, irrefutable transaction capabilities are overtaking so many verticals that one of Forrester’s Top 10 Technology Trends To Watch: 2018 to 2020 predicts that by 2019 “a viable blockchain-based market will be commercialized.”

Blockchain’s growing popularity is directly attributed to its utilitarian nature, which supersedes individual industries and use cases. It’s not just a means of revolutionizing finance via cryptocurrencies such as Bitcoin, but of implementing new security paradigms, legal measures, and data sources for Artificial Intelligence. Most importantly, it could very well herald the end of silo culture.

By helping to seamlessly connect heterogeneous databases around the world in a peer-to-peer fashion, its overall impact is projected to be “as disruptive as the internet was 20 years ago—and still is” according to Algebraix Data CEO Charlie Silver.

For it to realize this future, however, there are a number of points of standardization within and between blockchain networks which must solidify.

They should begin doing so in earnest in the coming year.

Private Blockchains, Centralized Authority
The most common use of blockchain is for validating transactions and issuing monetary value for cryptocurrencies. These are typical instances of what are known as public blockchains, in which the ledger is distributed among individuals or businesses for sharing and determining the integrity of transactional data. What could truly trigger the expansion of Blockchain’s adoption, however, is the growing credence associated with private blockchains. These networks extend only to a well-defined set of participants (i.e. they are not open to the general public), such as the members of a supply chain or of some other discrete business purpose. The strength of public blockchains lies largely in the lack of a central authority, which adds to the indisputable nature of transactions. In private blockchains, however, a centralized authority is the key to assuring the integrity of data exchanges. “What that means is there is a blockchain orchestrator that enables the interactions between the parties, coordinates all those things, provides the governance, and then when the transaction is done…you have permanent immutability, and transparency with permissions and so on,” commented One Network SVP of Products Adeel Najmi.
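
To make the idea of immutability more concrete, here is a minimal sketch of a hash-chained ledger in Python. It is an assumption-laden toy, not how any particular blockchain platform (or the orchestrator described above) works: the function names, the block layout, and the sample supply-chain payloads are all hypothetical, and there is no consensus, networking, or permissioning. It only shows why altering a recorded transaction is detectable when each block stores the hash of its predecessor.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, payload):
    """Hash the block contents in a deterministic serialization."""
    record = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })

def is_valid(chain):
    """Recompute every hash and check that the links are unbroken."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["prev_hash"], block["payload"]):
            return False
        prev_hash = block["hash"]
    return True

# Hypothetical supply-chain transactions (illustrative data only).
ledger = []
append_block(ledger, {"from": "supplier", "to": "retailer", "units": 120})
append_block(ledger, {"from": "retailer", "to": "customer", "units": 3})

ok_before = is_valid(ledger)
ledger[0]["payload"]["units"] = 999  # tamper with a recorded transaction
ok_after = is_valid(ledger)
print(ok_before, ok_after)
```

In a private blockchain, the party running a check like `is_valid` would be the central orchestrator; in a public one, every participant can run it independently.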

Smart Contracts
In addition to providing governance standards for all parties in the network, a centralized mediator also facilitates consistency in semantics and metadata, which is crucial for exchanging data. Without that centralization, blockchain communities must define their own data governance protocols, semantic standards, and Master Data Management modeling conventions. The notion of standards and the legality of exchanges between blockchains in the form of smart contracts will also come to prominence in 2018. Smart contracts denote what various exchanges of data mean, what takes place when such data is transmitted, and what the parties must do in agreement with one another for any variety of transactions. However, the dearth of standards for blockchain, particularly as they might apply between blockchains, raises questions about the legality of certain facets of smart contracts. According to Gartner: “Much of the legal basis for identity, trust, smart contracts, and other components are undefined in a blockchain context. Established laws still need to be revised and amended to accommodate blockchain use cases, and financial reporting is still unclear.” These points of uncertainty regarding blockchain affect its adoption rate, yet are to be expected for an emerging technology. Silver noted, “Like in the Internet of Things, there’s all kinds of standards debates going on. All new technologies have it.”

Artificial Intelligence
One of the more exciting developments to emerge in 2018 will be the synthesis of blockchain technologies with those for AI. There are a number of hypothetical ways in which these two technologies can influence (and aid) one another. Perhaps one of the more concrete ones is that the amounts of data involved in blockchain make excellent sources to feed the neural networks which thrive on copious quantities of big data. According to Gartner VP Distinguished Analyst Whit Andrews, in this respect Blockchain’s impact on AI is similar to the Internet of Things’ impact on AI. “Just like IoT, [Blockchain’s] creating a whole lot of data about different things making it possible for organizations to serve as an authority where previously they had to rely on others,” Andrews explained. “That’s where Blockchain changes everything.” In public decentralized blockchains, the newfound authorization of business partners, individuals, or companies can enable the sort of data quantities which, if properly mined, contribute to newfound insights. “So, maybe Artificial Intelligence again emerges as an exceptional way of interpreting that data stream,” Andrews remarked.

What is certain, however, is that the intersection of these two technologies is still forthcoming. Andrews indicated approximately one in 25 CIOs are employing AI today, and “the figure is similar with blockchain.” 2018 advancements related to deploying AI with Blockchain pertain to resolving blockchain’s scalability to encompass exorbitant big data amounts. Najmi observed that, “Due to scalability and query limitations of traditional blockchains it is difficult to implement intelligent sense and respond capabilities with predictive, prescriptive analytics and autonomous decision making.”

Low Latency
Blockchain is reducing the persistence of silo culture throughout data management in two fundamental ways. The first is related to its low latency. The boons of a shared network are diminished if transactions take an inordinate amount of time. Granted, one of the factors in decentralized blockchains is that there is a validation period. Transactions might appear with low or no latency, but they still require validation. In private blockchains with a centralized mediator, that validation period is reduced. The second, and main, way implementing blockchain reduces silos is simply by connecting databases via the same ledger system. This latter aspect of blockchain is one of the reasons it is expanding across industries such as insurance and real estate. “In the U.S. and maybe in Western Europe there is good infrastructure for finding out real estate information such as who owns what, who’s got a mortgage, etc.,” Silver said. “But 90 percent of the world doesn’t have that infrastructure. So think about all the global real estate information now being accessible, and the lack of silos. That’s the perfect use case of information getting de-siloed through Blockchain.”

Growing Influence
At this point, the potential for Blockchain likely exceeds its practical utility for information assets today. Nonetheless, with its capabilities applicable to so many different facets of data management, its influence will continue to grow throughout 2018. Those capabilities encompass significant regions of finance, transactional data, legality (via smart contracts), and AI. Adoption rates ultimately depend on the viability of the public and private paradigms—both how the latter can impact the former and vice versa. The key issue at stake with these models is the resolution of standards, semantics, and governance needed to institutionalize this technology. Once that’s done, Blockchain may provide a novel means of integrating both old and new IT systems.

“If you think about the enterprise, it’s got 20, 30 years of systems that need to interoperate,” Silver said. “Old systems don’t just die; they just find a new way to integrate into a new architecture.”

Exactly what blockchain’s role in that new architecture will be remains to be seen.


Sharing R Notebooks using RMarkdown

At Databricks, we are thrilled to announce the integration of RStudio with the Databricks Unified Analytics Platform. You can try it out now with this RMarkdown notebook (Rmd | HTML) or visit us at

Databricks Unified Analytics Platform now supports RStudio Server (press release). Users often ask if they can move notebooks between RStudio and Databricks workspace using RMarkdown — the most popular dynamic R document format. The answer is yes, you can easily export any Databricks R notebook as an RMarkdown file, and vice versa for imports. This allows you to effortlessly share content between a Databricks R notebook and RStudio, combining the best of both environments.

What is RMarkdown
RMarkdown is the dynamic document format RStudio uses. It is normal Markdown plus embedded R (or any other language) code that can be executed to produce outputs, including tables and charts, within the document. Hence, after changing your R code, you can just rerun all code in the RMarkdown file rather than redo the whole run-copy-paste cycle. And an RMarkdown file can be directly exported into multiple formats, including HTML, PDF, and Word.
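
To illustrate the "Markdown plus embedded code" structure, the following Python sketch pulls the code chunks out of a small RMarkdown document. It is a rough illustration only: the sample document, the `extract_chunks` helper, and the regular expression are assumptions made for this example, and real RMarkdown tooling (knitr, RStudio, or the Databricks import) handles many cases this does not.

```python
import re

# Built programmatically so the sample can be embedded without literal fences.
FENCE = "`" * 3

# A tiny, made-up RMarkdown document: YAML header, prose, and two R chunks.
SAMPLE_RMD = "\n".join([
    "---",
    "title: Demo",
    "---",
    "",
    "Some ordinary Markdown text.",
    "",
    FENCE + "{r summary, echo=FALSE}",
    "summary(cars)",
    FENCE,
    "",
    "More prose, then a plot:",
    "",
    FENCE + "{r}",
    "plot(cars)",
    FENCE,
])

# A chunk starts with a fence followed by {engine label, options}
# and ends at the next line that contains only a fence.
CHUNK_RE = re.compile(
    r"^`{3}\{(\w+)([^}]*)\}[ \t]*\n(.*?)^`{3}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

def extract_chunks(text):
    """Return the engine, header options, and code of each embedded chunk."""
    chunks = []
    for match in CHUNK_RE.finditer(text):
        engine, options, code = match.groups()
        chunks.append({
            "engine": engine,
            "options": options.strip(" ,"),
            "code": code.rstrip("\n"),
        })
    return chunks

chunks = extract_chunks(SAMPLE_RMD)
for chunk in chunks:
    print(chunk["engine"], "|", chunk["options"], "|", chunk["code"])
```

Everything outside the matched chunks is plain Markdown, which is why the same file can be rendered to HTML, PDF, or Word after the code has been run.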

Exporting an R Notebook to RMarkdown
To export an R notebook to an RMarkdown file, first open the notebook, then select File > Export > RMarkdown, as shown in the figure below.

This will create a snapshot of your notebook and serialize it as an RMarkdown file, which your browser will then download.

You can then launch RStudio and upload the exported RMarkdown file. Below is a screenshot:

Importing RMarkdown files as Databricks Notebooks
Importing an RMarkdown file is no different from importing any other file type. The easiest way to do so is to right-click where you want it to be imported and select Import in the context menu:

A dialog box pops up, just as it does when importing any other file type. Importing from either a file or a URL is supported:

You can also click next to a folder’s name, at the top of the workspace area, and select Import:

Using RMarkdown, content can be easily shared between a Databricks R notebook and RStudio. That completes the seamless integration of RStudio in Databricks’ Unified Platform. You are welcome to try it out on the Databricks Community Edition for free. For more information, please visit

Read More
To read more about our efforts with SparkR on Databricks, we refer you to the following assets:



The post Sharing R Notebooks using RMarkdown appeared first on Databricks.

Originally Posted at: Sharing R Notebooks using RMarkdown