Borrowing Technology from Media & Entertainment for Big Data Analytics in the Cloud

For most of computing’s history, data meant “structured” data or data that fits neatly into pre-defined categories and rows stored in databases or spreadsheets. But the big data movement has changed all of that with the proliferation of unstructured data analysis. Unstructured data is any data that doesn’t fit into a predefined data model. It includes things like video, images, text, and all the data being logged by sensors and the myriad of digital devices. Where structured data is relatively easy to store and analyze using traditional technology, unstructured data isn’t.

Nonetheless, massive collections of unstructured data are being analyzed today for altruistic purposes like combating crime and preventing disease, but also for profit-motivated goals like spotting business trends. And, as we’ve entered an era of pervasive surveillance, including aerial surveillance by drones and low earth orbit satellites capable of delivering 50 cm resolution imagery, media content (photos, videos and audio) is more relevant to big data analytics than ever before.

Unstructured data tends to be vastly larger than structured data, and is mostly responsible for our crossing the threshold from regular old data to “big data.” That threshold is not defined by a specific number of terabytes or even petabytes, but by what happens when data accumulates to an amount so large that innovative techniques are required to store, analyze and move it. Public cloud computing technology is one of these innovations that’s being applied to big data analytics because it offers a virtually unlimited elastic supply of compute power, networking and storage with a pay-for-use pricing model (all of which opens up new possibilities for analyzing both unstructured and structured big data).

Before its recent and unfortunate shutdown, the respected tech news and research site GigaOM released a survey on enterprise big data. In it, over 90% of participants said they planned to move more than a terabyte of data into the cloud, and 20% planned to move more than 100 TB. Cloud storage is a compelling solution as both an elastic repository for this overflowing data and a location readily accessible to cloud-based analysis.

However, one of the challenges that come with using public cloud computing and cloud storage is getting the data into the cloud in the first place. Moving large files and bulk data sets over the Internet can be very inefficient with traditional protocols like FTP and HTTP, which are the most common way organizations move large files and the foundation for most of the options cloud storage providers offer for getting your data to them, besides shipping hard drives.

In that same GigaOM survey, 24% expressed concern about whether their available bandwidth could accommodate pushing their large data volumes up to the cloud, and 21% worried that they didn’t have the expertise to carry out the data migration (read about all the options for moving data to any of the major cloud storage providers, and you too might be intimidated).
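To put the bandwidth concern in perspective, here is a rough back-of-the-envelope sketch in Python. It only divides data size by link speed; the link speeds and the `efficiency` factor (the fraction of the nominal bandwidth a transfer actually achieves) are illustrative assumptions, not figures from the survey.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb (decimal terabytes) over a link_mbps link.

    efficiency is an assumed fraction (0-1] of the nominal bandwidth that the
    transfer protocol actually manages to use.
    """
    size_bits = size_tb * 1e12 * 8                 # terabytes -> bits
    effective_bps = link_mbps * 1e6 * efficiency   # megabits/s -> usable bits/s
    return size_bits / effective_bps / 3600        # seconds -> hours


# Data sizes echo the survey (1 TB and 100 TB); link speeds are hypothetical.
for size_tb in (1, 100):
    for link_mbps in (100, 1000):
        hours = transfer_hours(size_tb, link_mbps)
        print(f"{size_tb:>3} TB over {link_mbps:>4} Mbps: {hours:8.1f} hours")
```

Even at full line rate, 1 TB over a 100 Mbps link takes roughly 22 hours, and 100 TB over a 1 Gbps link takes more than nine days. Any protocol inefficiency (an `efficiency` below 1.0) stretches those numbers further, which is why making optimal use of the available bandwidth matters.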

While bandwidth and expertise are very legitimate concerns, there are SaaS (Software as a Service) large file transfer solutions that can make optimal use of bandwidth, are very easy to use and integrate with Amazon S3, Microsoft Azure and Google Cloud. In fact, the foundation technology of these solutions was originally built to move very large media files throughout the production, post production and distribution of film and television.

Back in the early 2000s, when the Media & Entertainment industry began actively transitioning from physical media, including tape and hard drives, to digital file-based workflows, it had a big data movement problem too. For companies like Disney and the BBC, sending digital media between their internal locations and external editing or broadcast partners was a serious issue. Compared to everything else moving over the Internet, those files were huge. (And broadcast masters are relatively small compared to the 4K raw camera footage being captured today. For example, an hour of raw camera footage often requires a terabyte or more of storage.)

During M&E’s transition from physical media to file-based media, companies like Signiant started developing new protocols for the fast transfer of large files over public and private IP networks, with the high security that the movie industry requires for their most precious assets. The National Academy of Television Arts and Sciences even recognized Signiant’s pioneering role with a Technology and Engineering Emmy award in 2014.

Today, that technology has evolved in step with the cloud revolution, and SaaS accelerated large file transfer technology is expanding to other industries. Far faster and more reliable than older technologies like FTP and HTTP, this solution can also be delivered as a service, so users do not have to worry about provisioning hardware and software infrastructure, including scaling and balancing servers for load peaks and valleys. The “expertise” many worry about needing is a non-issue because the solution is so simple to use. And it’s being used in particular to push large volumes to cloud storage for all kinds of time-sensitive projects, including big data analytics. For example, scientists are analyzing images of snow and ice cover to learn more about climate change, and (interesting though less benevolent) businesses are analyzing images of competitors’ parking lots — counting cars by make and model — in order to understand the shopping habits and demographics of their customers.

It’s always fascinating to see how innovation occurs. It almost never springs from nothing, but is adapted from techniques and technologies employed somewhere else to solve a different challenge. Who would have thought, at the turn of the century, that the technology developed for Media & Entertainment would be so relevant to big data scientific, government and business analytics? And that technology used to produce and deliver entertainment could be leveraged for the betterment of society?

Originally posted via “Borrowing Technology from Media & Entertainment for Big Data Analytics in the Cloud”

Customer Loyalty Feedback Meets Customer Relationship Management

In my new book, Total Customer Experience, I illustrate why three types of customer loyalty are needed to understand the different ways your customers can show their loyalty towards your company or brand. The three types of loyalty are:

  1. Retention Loyalty: likelihood of customers to stay with a company
  2. Advocacy Loyalty: likelihood of customers to recommend the company or advocate on the company’s behalf
  3. Purchasing Loyalty: likelihood of customers to expand their relationship with the company

Using this multi-faceted model, I developed a loyalty measurement approach, referred to as the RAPID Loyalty Approach, to help companies get a more comprehensive picture of customer loyalty. Understanding the factors that impact these different types of loyalty helps companies target customer experience improvement strategies to increase different types of customer loyalty.

Data Integration

When companies are able to link these RAPID loyalty metrics with other customer information, like purchase history, campaign responses and employee/partner feedback, the customer insights become deeper. TCELab (where I am the Chief Customer Officer) is working with Clicktools to help Salesforce customers implement the RAPID Loyalty Approach. This partnership brings together TCELab’s survey knowledge and advisory services with Clicktools’ exceptional feedback software and Salesforce integration; for the fifth consecutive year, Clicktools has received the Salesforce AppExchange™ Customer Choice Award for Best Survey App.

TCELab will include RAPID surveys in Clicktools’ survey library, available in all Clicktools editions and easily integrated with a RAPID custom object. Salesforce reports and dashboards, including linkage analysis, will follow. Customers can call on the expertise of TCELab for advice on tailoring the surveys for their organization and for support in analysis and reporting.

Joint Whitepaper from TCELab and Clicktools

David Jackson, founder and CEO of Clicktools, and I have co-written a whitepaper titled, “RAPID Loyalty: A Comprehensive Approach to Customer Loyalty,” to present the basic structure and benefits of the RAPID approach and to offer Clicktools customers access to a special program for getting started.

Download the Whitepaper >>

Originally Posted at: Customer Loyalty Feedback Meets Customer Relationship Management by bobehayes

Emergence of #DataOps Age – @AndyHPalmer #FutureOfData #Podcast


In this podcast, @AndyPalmer from @Tamr sat down with @Vishaltx from @AnalyticsWeek to talk about the emergence of, need for, and market for DataOps, a specialized capability arising from the merging of the data engineering and DevOps ecosystems in response to increasingly convoluted data silos and complicated processes. Andy shared his journey, what some businesses and their leaders are doing wrong, and how businesses need to rethink their data silos to future-proof themselves. This is a good podcast for any data leader thinking about cracking the code on getting high-quality insights from data.

Andy’s Recommended Read:
Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker
The Three-Body Problem by Cixin Liu and Ken Liu

Andy’s BIO:
Andy Palmer is a serial entrepreneur who specializes in accelerating the growth of mission-driven startups. Andy has helped found and/or fund more than 50 innovative companies in technology, health care and the life sciences. Andy’s unique blend of strategic perspective and disciplined tactical execution is suited to environments where uncertainty is the rule rather than the exception. Andy has a specific passion for projects at the intersection of computer science and the life sciences.

Most recently, Andy co-founded Tamr, a next generation data curation company and Koa Labs, a start-up club in the heart of Harvard Square, Cambridge, MA.

Specialties: Software, Sales & Marketing, Web Services, Service Oriented Architecture, Drug Discovery, Database, Data Warehouse, Analytics, Startup, Entrepreneurship, Informatics, Enterprise Software, OLTP, Science, Internet, ecommerce, Venture Capital, Bootstrapping, Founding Team, Venture Capital firm, Software companies, early stage venture, corporate development, venture-backed, venture capital fund, world-class, stage venture capital

About #Podcast:
#FutureOfData podcast is a conversation starter that brings leaders, influencers and lead practitioners on the show to discuss their journey in creating the data-driven future.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest and email at

Want to sponsor?
Email us @

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Originally Posted at: Emergence of #DataOps Age – @AndyHPalmer #FutureOfData #Podcast by v1shal

Discussing the World of Crypto with @JoelComm / @BadCrypto


In this podcast, Joel Comm from The Bad Crypto Podcast sat down with Vishal Kumar, CEO of AnalyticsWeek, to discuss the world of cryptocurrencies. The discussion sheds light on the nuances of the rapidly exploding world of cryptocurrencies, some of the thinking behind the currencies, and the opportunities and risks in the industry. Joel shares his insights on how to think about these currencies and the long-term implications of the algorithms that run them. The podcast is a great listen for anyone who wants to understand the world of cryptocurrencies.

*Please note: this podcast and its content in no way constitute investment advice and are not intended to exert any positive or negative influence. Cryptocurrencies are highly volatile in nature, and any investor must use absolute caution and care while evaluating such currencies.*

Joel’s Recommended Read:
Cryptocurrencies 101 By James Altucher

Joel’s BIO:
As a knowledgeable & inspirational speaker, Joel speaks on a variety of business and entrepreneurial topics. He presents a step-by-step playbook on how to use social media as a leveraging tool to expand the reach of your brand, increase your customer base, and create fierce brand loyalty for your business. Joel is also able to speak with authority on the various ways to harness the marketing power of technology to explode profits. He offers an inspiring yet down-to-earth call to action for those who dream of obtaining growth and financial success. As someone who went from having only 87 cents in his bank account to creating multiple successful businesses, Joel is uniquely poised to instruct and inspire when it comes to using the various forms of new media as avenues towards the greater goal of business success. He is a broadcast veteran with thousands of hours in radio, podcasting, television and online video experience. Joel is the host of two popular, yet completely different podcasts. FUN with Joel Comm features the lighter side of top business and social leaders. The Bad Crypto Podcast makes cryptocurrency and bitcoin understandable to the masses.

Joel is the New York Times best-selling author of 14 books, including The AdSense Code, Click Here to Order: Stories from the World’s Most Successful Entrepreneurs, KaChing: How to Run an Online Business that Pays and Pays, Twitter Power 3.0 and Self Employed: 50 Signs That You Might Be an Entrepreneur. He has also written over 40 ebooks. He has appeared in The New York Times, on Jon Stewart’s The Daily Show, on CNN online, on Fox News, and many other places.

About #Podcast:
#FutureOfData podcast is a conversation starter that brings leaders, influencers and lead practitioners on the show to discuss their journey in creating the data-driven future.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest @

Want to sponsor?
Email us @

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy


Nick Howe (@Area9Nick) talks about fabric of learning organization to bring #JobsOfFuture #podcast


In this podcast, Nick Howe (@NickJHowe) from @Area9Learning talks about the transforming learning landscape. He sheds light on some of the learning challenges and on ways learning could keep pace with the evolving world and its learning needs. Nick also outlines some tactical steps that businesses could adopt to create a world-class learning organization. This podcast is a must for any learning organization.

Nick’s Recommended Read:
The End of Average: Unlocking Our Potential by Embracing What Makes Us Different by Todd Rose
Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

Nick’s BIO:
Nick Howe is an award-winning Chief Learning Officer and business leader with a focus on the application of innovative education technologies. He is the Chief Learning Officer at Area9 Lyceum, one of the global leaders in adaptive learning technology, a Strategic Advisor to the Institute of Simulation and Training at the University of Central Florida, and board advisor to multiple EdTech startups.

For twelve years Nick was the Chief Learning Officer at Hitachi Data Systems where he built and led the corporate university and online communities serving over 50,000 employees, resellers and customers.

With over 25 years of global sales, sales enablement, delivery and consulting experience with Hitachi, EDS Corporation and Bechtel Inc., Nick is passionate about the transformation of customer experiences, partner relationships and employee performance through learning and collaboration.

About #Podcast:
#JobsOfFuture podcast is a conversation starter that brings leaders, influencers and lead practitioners on the show to discuss their journey in creating the data-driven future.

Want to sponsor?
Email us @

#JobsOfFuture #Leadership #Podcast #Future of #Work #Worker & #Workplace

Originally Posted at: Nick Howe (@Area9Nick) talks about fabric of learning organization to bring #JobsOfFuture #podcast

2018 Trends in Blockchain

If the defining characteristic of data management in 2018 is the heterogeneity of contemporary computing environments, then Blockchain is a considerable factor contributing to its decentralization.

Expectations for this distributed ledger technology are decidedly high. Its low-latency, irrefutable transaction capabilities are overtaking so many verticals that one of Forrester’s Top 10 Technology Trends To Watch: 2018 to 2020 predicts that by 2019 “a viable blockchain-based market will be commercialized.”

Blockchain’s growing popularity is directly attributable to its utilitarian nature, which transcends individual industries and use cases. It’s not just a means of revolutionizing finance via cryptocurrencies such as Bitcoin, but of implementing new security paradigms, legal measures, and data sources for Artificial Intelligence. Most importantly, it could very well herald the end of silo culture.

By helping to seamlessly connect heterogeneous databases around the world in a peer-to-peer fashion, its overall impact is projected to be “as disruptive as the internet was 20 years ago—and still is” according to Algebraix Data CEO Charlie Silver.

For it to realize this future, however, there are a number of points of standardization within and between blockchain networks which must solidify.

They should begin doing so in earnest in the coming year.

Private Blockchains, Centralized Authority
The most common use of blockchain is for validating transactions and issuing monetary value for cryptocurrencies. These are typical instances of what are known as public blockchains, in which the ledger is distributed amongst individuals or businesses for sharing and determining the integrity of transactional data. What could truly trigger the expansion of Blockchain’s adoption, however, is the growing credence associated with private blockchains. These networks extend only to a well-defined set of participants (i.e. not open to the general public), such as those in a supply chain or those collaborating for some other discrete business purpose. The strength of public blockchains lies largely in the lack of a central authority, which adds to the indisputable nature of transactions. In private blockchains, however, that centralized authority is the key to assuring the integrity of data exchanges. “What that means is there is a blockchain orchestrator that enables the interactions between the parties, coordinates all those things, provides the governance, and then when the transaction is done…you have permanent immutability, and transparency with permissions and so on,” commented One Network SVP of Products Adeel Najmi.
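The “permanent immutability” mentioned above comes from chaining records together by hash. The following is a minimal Python sketch of that idea only. It is not the design of any particular blockchain platform, it omits consensus, networking and permissions entirely, and all function names and sample records are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a record that is linked to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link; any edit breaks the chain."""
    expected_prev = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        expected_prev = block["hash"]
    return True


ledger = []
append_block(ledger, {"from": "supplier", "to": "retailer", "units": 40})
append_block(ledger, {"from": "retailer", "to": "store", "units": 25})
print(is_valid(ledger))            # True

ledger[0]["data"]["units"] = 400   # tamper with an already-recorded transaction
print(is_valid(ledger))            # False
```

Because each block’s hash covers the previous block’s hash, altering an old record invalidates it and everything recorded after it. That is what lets participants detect tampering, whether validation is done by an open network or by a central orchestrator.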

Smart Contracts
In addition to providing governance standards for all parties in the network, a centralized mediator also facilitates consistency in semantics and metadata, which is crucial for exchanging data. Without that centralization, blockchain communities must define their own data governance protocols, semantic standards, and Master Data Management modeling conventions. The notion of standards and the legality of exchanges between blockchains in the form of smart contracts will also come to prominence in 2018. Smart contracts involve denoting what various exchanges of data mean, what takes place when such data is transmitted, and what parties must do in agreement with one another for any variety of transactions. However, the dearth of standards for blockchain, particularly as they might apply between blockchains, leads to questions about the legality of certain facets of smart contracts. According to Gartner: “Much of the legal basis for identity, trust, smart contracts, and other components are undefined in a blockchain context. Established laws still need to be revised and amended to accommodate blockchain use cases, and financial reporting is still unclear.” These points of uncertainty regarding blockchain correlate to its adoption rate, yet are to be expected for an emerging technology. Silver noted, “Like in the Internet of Things, there’s all kinds of standards debates going on. All new technologies have it.”
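As a rough illustration of what “what takes place when such data is transmitted, and what parties must do” can look like in practice, here is a toy Python sketch of a contract as a set of rules that executes automatically once its conditions are met. It is plain Python rather than code for any real smart-contract platform, and the class, parties and amounts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryContract:
    """Toy agreement: the seller is paid once the buyer has funded the
    contract and confirmed delivery. All names here are hypothetical."""
    buyer: str
    seller: str
    price: int
    deposited: int = 0
    delivered: bool = False
    settled: bool = False
    log: list = field(default_factory=list)  # ordered record of every event

    def deposit(self, party: str, amount: int) -> None:
        if party != self.buyer:
            raise PermissionError("only the buyer may fund the contract")
        self.deposited += amount
        self.log.append(("deposit", party, amount))
        self._settle_if_ready()

    def confirm_delivery(self, party: str) -> None:
        if party != self.buyer:
            raise PermissionError("only the buyer may confirm delivery")
        self.delivered = True
        self.log.append(("delivered", party))
        self._settle_if_ready()

    def _settle_if_ready(self) -> None:
        # The agreed rule: release payment only when both conditions hold.
        if not self.settled and self.delivered and self.deposited >= self.price:
            self.settled = True
            self.log.append(("release", self.seller, self.price))


contract = DeliveryContract(buyer="retailer", seller="supplier", price=100)
contract.deposit("retailer", 100)
print(contract.settled)   # False: funded, but delivery not yet confirmed
contract.confirm_delivery("retailer")
print(contract.settled)   # True: both conditions met, payment released
```

The sketch also hints at why shared standards matter: both parties have to agree in advance on what “deposit” and “delivered” mean before code like this can stand in for an agreement.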

Artificial Intelligence
One of the more exciting developments to emerge in 2018 will be the synthesis of blockchain technologies with those for AI. There are a number of hypothetical ways in which these two technologies can influence, and aid, one another. Perhaps one of the more concrete ones is that the amounts of data involved in blockchain make excellent sources to feed the neural networks which thrive on copious big data quantities. According to Gartner VP Distinguished Analyst Whit Andrews, in this respect Blockchain’s impact on AI is similar to that of the Internet of Things’ impact on AI. “Just like IoT, [Blockchain’s] creating a whole lot of data about different things making it possible for organizations to serve as an authority where previously they had to rely on others,” Andrews explained. “That’s where Blockchain changes everything.” In public decentralized blockchains, the newfound authorization of business partners, individuals, or companies can enable the sort of data quantities which, if properly mined, contribute to newfound insights. “So, maybe Artificial Intelligence again emerges as an exceptional way of interpreting that data stream,” Andrews remarked.

What is certain, however, is that the intersection of these two technologies is still forthcoming. Andrews indicated approximately one in 25 CIOs are employing AI today, and “the figure is similar with blockchain.” 2018 advancements related to deploying AI with Blockchain pertain to resolving blockchain’s scalability to encompass exorbitant big data amounts. Najmi observed that, “Due to scalability and query limitations of traditional blockchains it is difficult to implement intelligent sense and respond capabilities with predictive, prescriptive analytics and autonomous decision making.”

Low Latency
Blockchain is reducing the persistence of silo culture throughout data management in two fundamental ways. The first is related to its low latency. The boons of a shared network become minimized if transactions take inordinate amounts of time. Granted, one of the factors in decentralized blockchains is that there is a validation period. Transactions might appear with low or no latency, but they still require validation. In private blockchains with a centralized mediator, that validation period is reduced. Nonetheless, the main way implementing blockchain reduces silos is simply by connecting databases via the same ledger system. This latter aspect of blockchain is one of the reasons it is expanding across industries such as insurance and real estate. “In the U.S. and maybe in Western Europe there is good infrastructure for finding out real estate information such as who owns what, who’s got a mortgage, etc.,” Silver said. “But 90 percent of the world doesn’t have that infrastructure. So think about all the global real estate information now being accessible, and the lack of silos. That’s the perfect use case of information getting de-siloed through Blockchain.”

Growing Influence
At this point, the potential for Blockchain likely exceeds its practical utility for information assets today. Nonetheless, with its capabilities applicable to so many different facets of data management, its influence will continue to grow throughout 2018. Those capabilities encompass significant regions of finance, transactional data, legality (via smart contracts), and AI. Adoption rates ultimately depend on the viability of the public and private paradigms—both how the latter can impact the former and vice versa. The key issue at stake with these models is the resolution of standards, semantics, and governance needed to institutionalize this technology. Once that’s done, Blockchain may provide a novel means of integrating both old and new IT systems.

“If you think about the enterprise, it’s got 20, 30 years of systems that need to interoperate,” Silver said. “Old systems don’t just die; they just find a new way to integrate into a new architecture.”

Exactly what blockchain’s role in that new architecture will be remains to be seen.


Sharing R Notebooks using RMarkdown

At Databricks, we are thrilled to announce the integration of RStudio with the Databricks Unified Analytics Platform. You can try it out now with this RMarkdown notebook (Rmd | HTML) or visit us at

Databricks Unified Analytics Platform now supports RStudio Server (press release). Users often ask if they can move notebooks between RStudio and Databricks workspace using RMarkdown — the most popular dynamic R document format. The answer is yes, you can easily export any Databricks R notebook as an RMarkdown file, and vice versa for imports. This allows you to effortlessly share content between a Databricks R notebook and RStudio, combining the best of both environments.

What is RMarkdown
RMarkdown is the dynamic document format RStudio uses. It is normal Markdown plus embedded R (or any other language) code that can be executed to produce outputs, including tables and charts, within the document. Hence, after changing your R code, you can just rerun all code in the RMarkdown file rather than redo the whole run-copy-paste cycle. An RMarkdown file can also be directly exported into multiple formats, including HTML, PDF, and Word.
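To make the format concrete, here is a small Python sketch that assembles the text of a minimal RMarkdown file: a YAML header, ordinary Markdown prose, and one executable R chunk. The helper name, title and prose are hypothetical, and the output is only an illustration of the general file layout, not necessarily what a Databricks export looks like.

```python
# The chunk delimiter is three backticks; it is built programmatically so
# that this example does not itself contain a literal fence line.
FENCE = "`" * 3


def minimal_rmd(title: str, r_code: str) -> str:
    """Return the text of a minimal RMarkdown document (hypothetical helper)."""
    lines = [
        "---",                      # YAML header: document metadata
        f'title: "{title}"',
        "output: html_document",    # target format when the file is rendered
        "---",
        "",
        "Ordinary Markdown prose describing the analysis goes here.",
        "",
        FENCE + "{r}",              # start of an executable R code chunk
        r_code,
        FENCE,                      # end of the chunk
        "",
    ]
    return "\n".join(lines)


doc = minimal_rmd("Example report", "summary(cars)")
print(doc)
```

When such a file is rendered, the R chunk is executed and its output is placed in the document alongside the prose, which is what removes the run-copy-paste cycle.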

Exporting an R Notebook to RMarkdown
To export an R notebook to an RMarkdown file, first open up the notebook, then select File > Export > RMarkdown, as shown in the figure below.

This will create a snapshot of your notebook and serialize it as an RMarkdown file, which will be downloaded through your browser.

You can then launch RStudio and upload the exported RMarkdown file. Below is a screenshot:

Importing RMarkdown files as Databricks Notebooks
Importing an RMarkdown file is no different from importing any other file type. The easiest way to do so is to right-click where you want it to be imported and select Import in the context menu:

A dialog box will pop up, just as it would when importing any other file type. Importing from either a file or a URL is supported:

You can also click next to a folder’s name, at the top of the workspace area, and select Import:

Using RMarkdown, content can be easily shared between a Databricks R notebook and RStudio. That completes the seamless integration of RStudio in Databricks’ Unified Analytics Platform. You are welcome to try it out on the Databricks Community Edition for free. For more information, please visit

Read More
To read more about our efforts with SparkR on Databricks, we refer you to the following assets:


The post Sharing R Notebooks using RMarkdown appeared first on Databricks.

Originally Posted at: Sharing R Notebooks using RMarkdown