Topic: “Introduction to Apache Spark on SHARCNET”
Speaker: Jose Nandez, SHARCNET
Webinar link: SN-Seminars Vidyo room
Apache Spark is a general purpose light cluster computer platform. Spark extends the MapReduce model to support more computations. Spark is accessible through Python, Scala, Java, R or SQL. Spark can run on Hadoop clusters or in a standalone mode, and access any Hadoop data, including databases from Cassandra, Hive or Hbase. It has been recently paired with MongoDB. In this talk, I will discuss Spark’s main data structure and commands. Some real examples where Spark can be used will be presented. I will also show how to load Spark module on SHARCNET clusters, how to submit a Spark script to the SHARCNET scheduler, and how to use SHARCNET resources for developing your Spark program using Python.
Need help attending a webinar? See the SHARCNET Help Wiki.