Using sparklyr with Microsoft R Server

The sparklyr package (by RStudio) provides a high-level interface between R and Apache Spark. Among many other things, it allows you to filter and aggregate data in Spark using the dplyr syntax. In Microsoft R Server 9.1, you can now connect to a a Spark session using the sparklyr package as the interface, allowing you to combine the data-preparation capabilities of sparklyr and the data-analysis capabilities of Microsoft R Server in the same environment.

In a presentation by at the Spark Summit (embedded below, and you can find the slides here), Ali Zaidi shows how to connect to a Spark session from Microsoft R Server, and use the sparklyr package to extract a data set. He then shows how to build predictive models on this data (specifically, a deep Neural Network and a Boosted Trees classifier). He also shows how to build general ensemble models, cross-validate hyper-parameters in parallel, and even gives a preview of forthcoming streaming analysis capabilities.

[youtube https://www.youtube.com/watch?v=8-xvKlz26vg?rel=0&w=500&h=281]

Any easy way to try out these capabilities is with Azure HDInsight 3.6, which provides a managed Spark 2.1 instance with Microsoft R Server 9.1.

Spark Summit: Extending the R API for Spark with sparklyr and Microsoft R Server

Originally Posted at: Using sparklyr with Microsoft R Server

Leave a Reply

Your email address will not be published. Required fields are marked *