Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing,