Success & Challenges running Spark on Kubernetes at scale (Data Mechanics)

About the webinar

Jean-Yves is the co-founder of Data Mechanics, a cloud-native Spark platform for data engineers that aims to be a more developer-friendly and cost-effective alternative to services like EMR, Dataproc, and Databricks. The Data Mechanics platform is deployed on a managed Kubernetes cluster inside the customer's cloud account (AWS, GCP, or Azure). Prior to Data Mechanics, JY was one of the early engineers at Databricks.

Since 2018, Spark users have been able to run Spark natively on Kubernetes instead of using Hadoop YARN as the resource manager. Kubernetes offers advantages such as native containerization, faster startup times, cost reductions, and a rich ecosystem of tools. The release of Spark 3.1, expected in early 2021, declares the Kubernetes integration generally available (GA) and production-ready. With this new development, Kubernetes is expected to become the new standard for running Spark.

In this talk, JY will start with a brief introduction to Spark on Kubernetes. He will then share the lessons he has learned while helping customers adopt this technology as CEO of Data Mechanics.

Expect a technical talk with concrete code examples, tips, and a risky live demo! No prior experience with Spark or Kubernetes is required, though basic technical knowledge of one or the other will help you make the most of this talk.