Borrowing Technology from Media & Entertainment for Big Data Analytics in the Cloud

For most of computing’s history, data meant “structured” data: data that fits neatly into predefined categories and rows, stored in databases or spreadsheets. But the big data movement has changed all of that with the proliferation of unstructured data analysis. Unstructured data is any data that doesn’t fit into a predefined data model. It includes things like video, images, text, and all the data being logged by sensors and myriad digital devices. Where structured data is relatively easy to store and analyze using traditional technology, unstructured data isn’t.

Nonetheless, massive collections of unstructured data are today being analyzed for altruistic purposes like combating crime and preventing disease, but also for profit-motivated goals like spotting business trends. And, as we’ve entered an era of pervasive surveillance – including aerial surveillance by drones and low-earth-orbit satellites capable of delivering 50 cm resolution imagery – media content (photos, videos and audio) is more relevant to big data analytics than ever before.

Unstructured data tends to be vastly larger than structured data, and is mostly responsible for our crossing the threshold from regular old data to “big data.” That threshold is not defined by a specific number of terabytes or even petabytes, but by what happens when data accumulates to an amount so large that innovative techniques are required to store, analyze and move it. Public cloud computing technology is one of these innovations that’s being applied to big data analytics because it offers a virtually unlimited elastic supply of compute power, networking and storage with a pay-for-use pricing model (all of which opens up new possibilities for analyzing both unstructured and structured big data).

Before its recent and unfortunate shutdown, the respected tech news and research site GigaOm released a survey on enterprise big data. In it, over 90% of participants said they planned to move more than a terabyte of data into the cloud, and 20% planned to move more than 100 TB. Cloud storage is a compelling solution as both an elastic repository for this overflowing data and a location readily accessible to cloud-based analysis.

However, one of the challenges that come with using public cloud computing and cloud storage is getting the data into the cloud in the first place. Moving large files and bulk data sets over the Internet can be very inefficient with traditional protocols like FTP and HTTP, which remain the most common way organizations move large files and the foundation for most of the options cloud storage providers offer for getting your data to them, short of shipping hard drives.

In that same GigaOm survey, 24% of respondents expressed concern about whether their available bandwidth could accommodate pushing their large data volumes up to the cloud, and 21% worried that they lacked the expertise to carry out the data migration (read about all the options for moving data to any of the major cloud storage providers, and you too might be intimidated).

While bandwidth and expertise are very legitimate concerns, there are SaaS (Software as a Service) large file transfer solutions that make optimal use of bandwidth, are very easy to use, and integrate with Amazon S3, Microsoft Azure and Google Cloud. In fact, the foundation technology of these solutions was originally built to move very large media files throughout the production, post production and distribution of film and television.

Back in the early 2000s, when the Media & Entertainment industry began actively transitioning from physical media, including tape and hard drives, to digital file-based workflows, it had a big data movement problem too. For companies like Disney and the BBC, sending digital media between their internal locations and external editing or broadcast partners was a serious issue. Compared to everything else moving over the Internet, those files were huge. (And broadcast masters are relatively small compared to the 4K raw camera footage being captured today. For example, an hour of raw camera footage often requires a terabyte or more of storage.)

During M&E’s transition from physical media to file-based media, companies like Signiant started developing new protocols for the fast transfer of large files over public and private IP networks, with the high security that the movie industry requires for its most precious assets. The National Academy of Television Arts and Sciences even recognized Signiant’s pioneering role with a Technology and Engineering Emmy award in 2014.

Today, that technology has evolved in step with the cloud revolution, and SaaS-based accelerated large file transfer is expanding into other industries. Far faster and more reliable than older technologies like FTP and HTTP, it can also be delivered as a service, so users do not have to worry about provisioning hardware and software infrastructure, including scaling and balancing servers for load peaks and valleys. The “expertise” many worry about needing is a non-issue because the solution is so simple to use. And it’s being used in particular to push large volumes to cloud storage for all kinds of time-sensitive projects, including big data analytics. For example, scientists are analyzing images of snow and ice cover to learn more about climate change, and (interesting, though less benevolent) businesses are analyzing images of competitors’ parking lots — counting cars by make and model — in order to understand the shopping habits and demographics of their customers.

It’s always fascinating to see how innovation occurs. It almost never springs from nothing, but is adapted from techniques and technologies employed somewhere else to solve a different challenge. Who would have thought, at the turn of the century, that the technology developed for Media & Entertainment would be so relevant to big data scientific, government and business analytics? And that technology used to produce and deliver entertainment could be leveraged for the betterment of society?

Originally Posted at: Borrowing Technology from Media & Entertainment for Big Data Analytics in the Cloud

Jan 10, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Cover image: SQL Database (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Improving the Customer Experience Through Big Data [VIDEO] by bobehayes

>> Accelerating Discovery with a Unified Analytics Platform for Genomics by analyticsweek

>> The UX of Brokerage Websites by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>> Italy-America Chamber, Luxury Marketing Council host 2nd Annual Luxury Summit – Luxury Daily (under Social Analytics)

>> Top five business analytics intelligence trends for 2019 – Information Age (under Analytics)

>> Billions of dollars have not helped Indian e-tailers figure out AI and big data – Quartz (under Big Data Analytics)

More NEWS? Click Here

[ FEATURED COURSE]

Artificial Intelligence


This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

The Industries of the Future


The New York Times bestseller, from leading innovation expert Alec Ross, a “fascinating vision” (Forbes) of what’s next for the world and how to navigate the changes the future will bring…. more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, yet we are not building practice grounds to test those models fast enough. Snug-looking models can have hidden nails that cause uncharted pain if they go unchecked. This is the right time to start thinking about setting up an Analytics Club [Data Analytics CoE] in your workplace to incubate best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q: Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
A: “Naive” refers to the assumption that the features are independent/uncorrelated.
That assumption is not realistic in many cases (in spam detection, word features tend to co-occur).
Improvement: decorrelate the features (transform the covariance matrix toward the identity matrix); see the sketch below.

Source
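
As a rough illustration of the suggested improvement, here is a minimal sketch, assuming scikit-learn and synthetic data (not any particular spam corpus), that whitens the features with PCA before fitting Gaussian naive Bayes:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Synthetic, deliberately correlated features standing in for spam inputs.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_redundant=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Plain Gaussian NB: the independence assumption is violated by design here.
plain = GaussianNB().fit(X_tr, y_tr)

# Whitening with PCA rotates the features onto uncorrelated axes and scales
# them to unit variance, pushing the covariance matrix toward the identity.
whitened = make_pipeline(PCA(whiten=True), GaussianNB()).fit(X_tr, y_tr)

print("plain NB accuracy:   ", plain.score(X_te, y_te))
print("whitened NB accuracy:", whitened.score(X_te, y_te))
```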

[ VIDEO OF THE WEEK]

@JustinBorgman on Running a data science startup, one decision at a time #Futureofdata #Podcast


Subscribe on YouTube

[ QUOTE OF THE WEEK]

Data beats emotions. – Sean Rad, founder of Ad.ly

[ PODCAST OF THE WEEK]

@JustinBorgman on Running a data science startup, one decision at a time #Futureofdata #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

73% of organizations have already invested or plan to invest in big data by 2016

Sourced from: Analytics.CLUB #WEB Newsletter

Customer Loyalty Feedback Meets Customer Relationship Management

In my new book, Total Customer Experience, I illustrate why three types of customer loyalty are needed to understand the different ways your customers can show their loyalty towards your company or brand. The three types of loyalty are:

  1. Retention Loyalty: likelihood of customers to stay with a company
  2. Advocacy Loyalty: likelihood of customers to recommend the company/ advocate on the company’s behalf
  3. Purchasing Loyalty: likelihood of customers to expand their relationship with the company

Using this multi-faceted model, I developed a loyalty measurement approach, referred to as the RAPID Loyalty Approach, to help companies get a more comprehensive picture of customer loyalty. Understanding the factors that impact these different types of loyalty helps companies target customer experience improvement strategies to increase different types of customer loyalty.

Data Integration

When companies are able to link these RAPID loyalty metrics with other customer information, like purchase history, campaign responses and employee/partner feedback, the customer insights become deeper. TCELab  (where I am the Chief Customer Officer) is working with Clicktools to help Salesforce customers implement the RAPID Loyalty Approach. This partnership brings together TCELab’s survey knowledge and advisory services with Clicktools’ exceptional feedback software and Salesforce integration; for the fifth consecutive year, Clicktools has received the Salesforce AppExchange™ Customer Choice Award for Best Survey App.

TCELab will include RAPID surveys in Clicktools’ survey library, available in all Clicktools editions and integrated easily with a RAPID Salesforce.com custom object. Salesforce reports and dashboards, including linkage analysis, will follow. Customers can call on the expertise of TCELab for advice on tailoring the surveys for their organization and for support in analysis and reporting.

Joint Whitepaper from TCELab and Clicktools

David Jackson, founder and CEO of Clicktools, and I have co-written a whitepaper titled, “RAPID Loyalty: A Comprehensive Approach to Customer Loyalty,” to present the basic structure and benefits of the RAPID approach and to offer Clicktools customers access to a special program for getting started.

Download the Whitepaper >>

Originally Posted at: Customer Loyalty Feedback Meets Customer Relationship Management by bobehayes

Emergence of #DataOps Age – @AndyHPalmer #FutureOfData #Podcast


Youtube: https://youtu.be/ER9mHaWMMww
iTunes: http://math.im/itunes

In this podcast @AndyPalmer from @Tamr sat with @Vishaltx from @AnalyticsWeek to talk about the emergence of, need for, and market for DataOps, a specialized capability arising from the merger of the data engineering and DevOps ecosystems, driven by increasingly convoluted data silos and complicated processes. Andy shared his perspective on what some businesses and their leaders are doing wrong and how businesses need to rethink their data silos to future-proof themselves. This is a good podcast for any data leader thinking about cracking the code on getting high-quality insights from data.

Andy’s Recommended Read:
Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker https://amzn.to/2Lc6WqK
The Three-Body Problem by Cixin Liu and Ken Liu https://amzn.to/2rQyPvp

Andy’s BIO:
Andy Palmer is a serial entrepreneur who specializes in accelerating the growth of mission-driven startups. Andy has helped found and/or fund more than 50 innovative companies in technology, health care and the life sciences. Andy’s unique blend of strategic perspective and disciplined tactical execution is suited to environments where uncertainty is the rule rather than the exception. Andy has a specific passion for projects at the intersection of computer science and the life sciences.

Most recently, Andy co-founded Tamr, a next generation data curation company and Koa Labs, a start-up club in the heart of Harvard Square, Cambridge, MA.

Specialties: Software, Sales & Marketing, Web Services, Service Oriented Architecture, Drug Discovery, Database, Data Warehouse, Analytics, Startup, Entrepreneurship, Informatics, Enterprise Software, OLTP, Science, Internet, ecommerce, Venture Capital, Bootstrapping, Founding Team, Venture Capital firm, Software companies, early stage venture, corporate development, venture-backed, venture capital fund, world-class, stage venture capital

About #Podcast:
#FutureOfData podcast is a conversation starter that brings leaders, influencers and leading practitioners on the show to discuss their journeys in creating the data-driven future.

Wanna Join?
If you or anyone you know wants to join in,
register your interest by emailing info@analyticsweek.com

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Originally Posted at: Emergence of #DataOps Age – @AndyHPalmer #FutureOfData #Podcast by v1shal

Jan 03, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Cover image: Conditional Risk (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Looking out for Big Data Capital of the World by v1shal

>> It’s Official! Talend to Welcome Stitch to the Family! by analyticsweekpick

>> Data Management Rules for Analytics by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>> Startups aspiring to market like big brands: with Smartech & AI, today they can – YourStory.com (under Prescriptive Analytics)

>> Ecolab Inc (NYSE:ECL) Institutional Investor Sentiment Analysis – The Cardinal Weekly (press release) (under Sentiment Analysis)

>> Data center outsourcing faces a legal test – DatacenterDynamics (under Data Center)

More NEWS? Click Here

[ FEATURED COURSE]

Python for Beginners with Examples


A practical Python course for beginners with examples and exercises…. more

[ FEATURED READ]

The Black Swan: The Impact of the Highly Improbable


A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone OnDemand pointed out the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates better flow of ideas, improved team dynamics, rapid learning, and an increased ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q: How do you test whether a new credit risk scoring model works?
A: * Test on a holdout set
* Kolmogorov-Smirnov test

Kolmogorov-Smirnov test:
– Non-parametric test
– Compares a sample with a reference probability distribution, or compares two samples
– Quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution
– Or between the empirical distribution functions of two samples
– Null hypothesis (two-sample test): the samples are drawn from the same distribution
– Can be modified into a goodness-of-fit test
– In our case: compare the cumulative percentages of good accounts with the cumulative percentages of bad accounts (see the sketch below)

Source
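
As a hedged sketch of that check, assuming SciPy and synthetic holdout scores (not real credit data), the two-sample KS test can be run like this:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Hypothetical holdout-set scores; in practice these come from the new model.
scores_good = rng.normal(0.70, 0.12, 5000)  # accounts that repaid
scores_bad = rng.normal(0.45, 0.12, 500)    # accounts that defaulted

# Null hypothesis: both samples are drawn from the same distribution.
# The KS statistic is the maximum gap between the two empirical CDFs
# (the cumulative percentages of good and bad); a large value with a
# small p-value is evidence the scoring model separates the classes.
stat, p_value = ks_2samp(scores_good, scores_bad)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3g}")
```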

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting


Subscribe on YouTube

[ QUOTE OF THE WEEK]

You can have data without information, but you cannot have information without data. – Daniel Keys Moran

[ PODCAST OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

2.7 Zettabytes of data exist in the digital universe today.

Sourced from: Analytics.CLUB #WEB Newsletter

Discussing the World of Crypto with @JoelComm / @BadCrypto


Youtube: https://youtu.be/xJucEIDitas
iTunes: http://apple.co/2ynxopz

In this podcast Joel Comm from The Bad Crypto Podcast sat with Vishal Kumar, CEO of AnalyticsWeek, to discuss the world of cryptocurrencies. The discussion sheds light on the nuances of this rapidly exploding world and some of the thinking behind the currencies, as well as the opportunities and risks in the industry. Joel shares his insights on how to think about these currencies and the long-term implications of the algorithms that run them. The podcast is a great listen for anyone who wants to understand the world of crypto.

*Please note: this podcast and/or its content in no way constitutes investment advice, nor is it intended to generate any positive or negative influence. Cryptocurrencies are highly volatile in nature, and any investor must use absolute caution and care when evaluating such currencies.*

Joel’s Recommended Read:
Cryptocurrencies 101 By James Altucher http://bit.ly/2Bi5FMv

Podcast Link:
iTunes: http://math.im/itunes
GooglePlay: http://math.im/gplay

Joel’s BIO:
As a knowledgeable & inspirational speaker, Joel speaks on a variety of business and entrepreneurial topics. He presents a step-by-step playbook on how to use social media as a leveraging tool to expand the reach of your brand, increase your customer base, and create fierce brand loyalty for your business. Joel is also able to speak with authority on the various ways to harness the marketing power of technology to explode profits. He offers an inspiring yet down-to-earth call to action for those who dream of obtaining growth and financial success. As someone who went from having only 87 cents in his bank account to creating multiple successful businesses, Joel is uniquely poised to instruct and inspire when it comes to using the various forms of new media as avenues towards the greater goal of business success. He is a broadcast veteran with thousands of hours in radio, podcasting, television and online video experience. Joel is the host of two popular, yet completely different podcasts. FUN with Joel Comm features the lighter side of top business and social leaders. The Bad Crypto Podcast makes cryptocurrency and bitcoin understandable to the masses.

Joel is the New York Times best-selling author of 14 books, including The AdSense Code, Click Here to Order: Stories from the World’s Most Successful Entrepreneurs, KaChing: How to Run an Online Business that Pays and Pays, Twitter Power 3.0 and Self Employed: 50 Signs That You Might Be an Entrepreneur. He has also written over 40 ebooks. He has appeared in The New York Times, on Jon Stewart’s The Daily Show, on CNN online, on Fox News, and many other places.

About #Podcast:
#FutureOfData podcast is a conversation starter to bring leaders, influencers and lead practitioners to come on show and discuss their journey in creating the data driven future.

Wanna Join?
If you or anyone you know wants to join in,
register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Source

Nick Howe (@Area9Nick) talks about fabric of learning organization to bring #JobsOfFuture #podcast

Youtube: https://www.youtube.com/watch?v=-1ZP_tbZFgI

In this podcast Nick Howe (@NickJHowe) from @Area9Learning talks about the transforming learning landscape. He sheds light on some of the challenges in learning and on ways learning could keep pace with the evolving world and its needs, and outlines tactical steps businesses could adopt to create a world-class learning organization. This podcast is a must for any learning organization.

Nick’s Recommended Read:
The End of Average: Unlocking Our Potential by Embracing What Makes Us Different by Todd Rose https://amzn.to/2kiahYN
Superintelligence: Paths, Dangers, Strategies by Nick Bostrom https://amzn.to/2IAPURg

Podcast Link:
iTunes: http://math.im/jofitunes
GooglePlay: http://math.im/jofgplay

Nick’s BIO:
Nick Howe is an award-winning Chief Learning Officer and business leader with a focus on the application of innovative education technologies. He is the Chief Learning Officer at Area9 Lyceum – one of the global leaders in adaptive learning technology – a Strategic Advisor to the Institute of Simulation and Training at the University of Central Florida, and a board advisor to multiple EdTech startups.

For twelve years Nick was the Chief Learning Officer at Hitachi Data Systems where he built and led the corporate university and online communities serving over 50,000 employees, resellers and customers.

With over 25 years’ global sales, sales enablement, delivery and consulting experience with Hitachi, EDS Corporation and Bechtel Inc., Nick is passionate about the transformation of customer experiences, partner relationships and employee performance through learning and collaboration.

About #Podcast:
#JobsOfFuture podcast is a conversation starter that brings leaders, influencers and leading practitioners on the show to discuss their journeys in creating the data-driven future.

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#JobsOfFuture #Leadership #Podcast #Future of #Work #Worker & #Workplace

Originally Posted at: Nick Howe (@Area9Nick) talks about fabric of learning organization to bring #JobsOfFuture #podcast

Dec 27, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Cover image: Data shortage (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Large Visualizations in canvasXpress by analyticsweek

>> How to pick the right sample for your analysis by jburchell

>> How Google Understands You [Infographic] by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> Meet data center compliance standards in hybrid deployments – TechTarget (under Data Center)

>> Approaching The Hybrid Cloud Computing Model For Modern Government – Forbes (under Cloud)

>> Financial Analytics Market 2018 Report with Manufacturers, Dealers, Consumers, Revenue, Regions, Types, Application – The Iowa DeltaChi (under Financial Analytics)

More NEWS? Click Here

[ FEATURED COURSE]

Introduction to Apache Spark


Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals…. more

[ FEATURED READ]

Rise of the Robots: Technology and the Threat of a Jobless Future


What are the jobs of the future? How many will there be? And who will have them? As technology continues to accelerate and machines begin taking care of themselves, fewer people will be necessary. Artificial intelligence… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist or data expert is constantly put to the test by helping their team solve problems using their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgement and taints the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find ourselves caught in biases that impair judgement. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: How do you assess the statistical significance of an insight?
A: Ask first: is this insight just observed by chance, or is it a real insight?
Statistical significance can be assessed using hypothesis testing:
– State a null hypothesis, which is usually the opposite of what we wish to test (classifiers A and B perform equivalently; treatment A is equal to treatment B)
– Choose a suitable statistical test and the test statistic used to reject the null hypothesis
– Choose a critical region for the statistic to lie in that is extreme enough for the null hypothesis to be rejected (p-value)
– Calculate the observed test statistic from the data and check whether it lies in the critical region

Common tests (see the sketch below for one of them):
– One-sample Z test
– Two-sample Z test
– One-sample t-test
– Paired t-test
– Two-sample pooled equal variances t-test
– Two-sample unpooled unequal variances t-test with unequal sample sizes (Welch’s t-test)
– Chi-squared test for variances
– Chi-squared test for goodness of fit
– ANOVA (for instance: are two regression models equal? F-test)
– Regression F-test (i.e., is at least one of the predictors useful in predicting the response?)

Source
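
Here is a minimal sketch of the recipe above for one of the listed tests, Welch’s t-test, assuming SciPy and synthetic sample values chosen purely for illustration:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
# Synthetic metric values for two treatments (unequal variances and sizes).
treatment_a = rng.normal(10.0, 2.0, 800)
treatment_b = rng.normal(10.3, 3.0, 650)

# Null hypothesis: treatment A is equal to treatment B (equal means).
stat, p_value = ttest_ind(treatment_a, treatment_b, equal_var=False)  # Welch

alpha = 0.05  # critical region: reject the null when p-value < alpha
verdict = "reject" if p_value < alpha else "fail to reject"
print(f"t = {stat:.2f}, p = {p_value:.4f} -> {verdict} the null hypothesis")
```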

[ VIDEO OF THE WEEK]

#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency


Subscribe on YouTube

[ QUOTE OF THE WEEK]

War is 90% information. – Napoleon Bonaparte

[ PODCAST OF THE WEEK]

Scott Harrison (@SRHarrisonJD) on leading the learning organization #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

2018 Trends in Blockchain

If the defining characteristic of data management in 2018 is the heterogeneity of contemporary computing environments, then Blockchain is a considerable factor contributing to its decentralization.

Expectations for this distributed ledger technology are decidedly high. Its low latency, irrefutable transaction capabilities are overtaking so many verticals that one of Forrester’s Top 10 Technology Trends To Watch: 2018 to 2020 predicts that by 2019 “a viable blockchain-based market will be commercialized.”

Blockchain’s growing popularity is directly attributed to its utilitarian nature, which supersedes individual industries and use cases. It’s not just a means of revolutionizing finance via cryptocurrencies such as Bitcoin, but of implementing new security paradigms, legal measures, and data sources for Artificial Intelligence. Most importantly, it could very well herald the end of silo culture.

By helping to seamlessly connect heterogeneous databases around the world in a peer-to-peer fashion, its overall impact is projected to be “as disruptive as the internet was 20 years ago—and still is” according to Algebraix Data CEO Charlie Silver.

For it to realize this future, however, there are a number of points of standardization within and between blockchain networks which must solidify.

They should begin doing so in earnest in the coming year.

Private Blockchains, Centralized Authority
The most common use of blockchain is for validating transactions and issuing monetary value for cryptocurrencies. These are typical instances of what are known as public blockchains, in which the ledger is distributed amongst individuals or businesses for sharing and determining the integrity of transactional data. What could truly trigger the expansion of blockchain’s adoption, however, is the growing credence associated with private blockchains. These networks extend only to well-defined participants (i.e., they are not open to the general public), such as those in a supply chain or serving some other discrete business purpose. The strength of public blockchains lies largely in the lack of a central authority, which adds to the indisputable nature of transactions. In private blockchains, however, that centralized authority is the key to assuring the integrity of data exchanges. “What that means is there is a blockchain orchestrator that enables the interactions between the parties, coordinates all those things, provides the governance, and then when the transaction is done…you have permanent immutability, and transparency with permissions and so on,” commented One Network SVP of Products Adeel Najmi.
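
To make the immutability idea concrete, the toy sketch below shows a hash-chained ledger in which each block commits to its predecessor’s hash; it illustrates the general mechanism only and is not One Network’s or any vendor’s implementation:

```python
import hashlib
import json

def block_hash(prev_hash: str, tx: str) -> str:
    """Hash a block's contents together with its predecessor's hash."""
    payload = json.dumps({"prev_hash": prev_hash, "tx": tx}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append_block(chain: list, tx: str) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64  # genesis sentinel
    chain.append({"prev_hash": prev, "tx": tx, "hash": block_hash(prev, tx)})

def verify(chain: list) -> bool:
    prev = "0" * 64
    for block in chain:
        if (block["prev_hash"] != prev or
                block["hash"] != block_hash(block["prev_hash"], block["tx"])):
            return False
        prev = block["hash"]
    return True

ledger: list = []
append_block(ledger, "A pays B 10")  # toy transactions, for illustration
append_block(ledger, "B pays C 4")
print(verify(ledger))            # True: the chain is intact

ledger[0]["tx"] = "A pays B 99"  # tamper with recorded history...
print(verify(ledger))            # ...and verification now fails
```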

Smart Contracts
In addition to providing governance standards for all parties in the network, a centralized mediator also facilitates consistency in semantics and metadata, which is crucial for exchanging data. Without that centralization, blockchain communities must define their own data governance protocols, semantic standards, and Master Data Management modeling conventions. The notion of standards and the legality of exchanges between blockchains in the form of smart contracts will also come to prominence in 2018. Smart contracts involve denoting what various exchanges of data mean, what takes place when such data is transmitted, and what parties must do in agreement with one another for any variety of transactions. However, the dearth of standards for blockchain—particularly as they might apply between blockchains—leads to questions about the legality of certain facets of smart contracts. According to Gartner: “Much of the legal basis for identity, trust, smart contracts, and other components are undefined in a blockchain context. Established laws still need to be revised and amended to accommodate blockchain use cases, and financial reporting is still unclear.” These points of uncertainty regarding blockchain correlate to its adoption rate, yet are to be expected for an emerging technology. Silver noted, “Like in the Internet of Things, there’s all kinds of standards debates going on. All new technologies have it.”

Artificial Intelligence
One of the more exciting developments to emerge in 2018 will be the synthesis of blockchain technologies with those for AI. There are a number of hypothetical ways in which these two technologies can influence—and aid—one another. Perhaps one of the more concrete ones is that the amounts of data involved in blockchain make excellent sources to feed the neural networks which thrive on copious big data quantities. According to Gartner VP Distinguished Analyst Whit Andrews, in this respect Blockchain’s impact on AI is similar to that of the Internet of Things’ impact on AI. “Just like IoT, [Blockchain’s] creating a whole lot of data about different things making it possible for organizations to serve as an authority where previously they had to rely on others,” Andrews explained. “That’s where Blockchain changes everything.” In public decentralized blockchains, the newfound authorization of business partners, individuals, or companies can enable the sort of data quantities which, if properly mined, contribute to newfound insights. “So, maybe Artificial Intelligence again emerges as an exceptional way of interpreting that data stream,” Andrews remarked.

What is certain, however, is that the intersection of these two technologies is still forthcoming. Andrews indicated approximately one in 25 CIOs are employing AI today, and “the figure is similar with blockchain.” 2018 advancements related to deploying AI with Blockchain pertain to resolving blockchain’s scalability to encompass exorbitant big data amounts. Najmi observed that, “Due to scalability and query limitations of traditional blockchains it is difficult to implement intelligent sense and respond capabilities with predictive, prescriptive analytics and autonomous decision making.”

Low Latency
Blockchain is reducing the persistence of silo culture throughout data management in two fundamental ways. The first is related to its low latency. The boons of a shared network become minimized if transactions take inordinate amounts of time. Granted, one of the factors in decentralized blockchains is that there is a validation period. Transactions might appear with low or no latency, but they still require validation. In private blockchains with a centralized mediator, that validation period is reduced. Nonetheless, the main way implementing blockchain reduces silos is simply by connecting databases via the same ledger system. This latter aspect of blockchain is one of the reasons it is expanding across industries such as insurance and real estate. “In the U.S. and maybe in Western Europe there is good infrastructure for finding out real estate information such as who owns what, who’s got a mortgage, etc.,” Silver said. “But 90 percent of the world doesn’t have that infrastructure. So think about all the global real estate information now being accessible, and the lack of silos. That’s the perfect use case of information getting de-siloed through Blockchain.”

Growing Influence
At this point, the potential for Blockchain likely exceeds its practical utility for information assets today. Nonetheless, with its capabilities applicable to so many different facets of data management, its influence will continue to grow throughout 2018. Those capabilities encompass significant regions of finance, transactional data, legality (via smart contracts), and AI. Adoption rates ultimately depend on the viability of the public and private paradigms—both how the latter can impact the former and vice versa. The key issue at stake with these models is the resolution of standards, semantics, and governance needed to institutionalize this technology. Once that’s done, Blockchain may provide a novel means of integrating both old and new IT systems.

“If you think about the enterprise, it’s got 20, 30 years of systems that need to interoperate,” Silver said. “Old systems don’t just die; they just find a new way to integrate into a new architecture.”

Exactly what blockchain’s role in that new architecture will be remains to be seen.

Source

Sharing R Notebooks using RMarkdown

At Databricks, we are thrilled to announce the integration of RStudio with the Databricks Unified Analytics Platform. You can try it out now with this RMarkdown notebook (Rmd | HTML) or visit us at www.databricks.com/rstudio.

Introduction
Databricks Unified Analytics Platform now supports RStudio Server (press release). Users often ask if they can move notebooks between RStudio and Databricks workspace using RMarkdown — the most popular dynamic R document format. The answer is yes, you can easily export any Databricks R notebook as an RMarkdown file, and vice versa for imports. This allows you to effortlessly share content between a Databricks R notebook and RStudio, combining the best of both environments.

What is RMarkdown
RMarkdown is the dynamic document format RStudio uses. It is normal Markdown plus embedded R (or any other language) code that can be executed to produce outputs, including tables and charts, within the document. Hence, after changing your R code, you can just rerun all code in the RMarkdown file rather than redo the whole run-copy-paste cycle. And an RMarkdown file can be directly exported into multiple formats, including HTML, PDF, and Word.

Exporting an R Notebook to RMarkdown
To export an R notebook to an RMarkdown file, first open up the notebook, then select File > Export > RMarkdown, as shown in the figure below.

This will create a snapshot of your notebook and serialize it as an RMarkdown file, which will be downloaded to your browser.

You can then launch RStudio and upload the exported RMarkdown file. Below is a screenshot:

Importing RMarkdown files as Databricks Notebooks
Importing an RMarkdown file is no different than importing any other file type. The easiest way to do so is to right-click where you want it to be imported and select Import in the context menu:


A dialog box will pop up, just as it would when importing any other file type. Importing from both a file and a URL is supported:

You can also click next to a folder’s name, at the top of the workspace area, and select Import:

Conclusion
Using RMarkdown, content can be easily shared between a Databricks R notebook and RStudio. That completes the seamless integration of RStudio in Databricks’ Unified Platform. You are welcome to try it out on the Databricks Community Edition for free.  For more information, please visit www.databricks.com/rstudio.

Read More
To read more about our efforts with SparkR on Databricks, we refer you to the following assets:

 

Try Databricks for free. Get started today.

The post Sharing R Notebooks using RMarkdown appeared first on Databricks.

Originally Posted at: Sharing R Notebooks using RMarkdown