Apache Spark Scala Interview Questions- Shyam Mallesh
Mastering Apache Spark with Scala requires more than reading a cheat sheet. It requires the architectural mindset taught by experts like Shyam Mallesh: an understanding of memory management, shuffle mechanics, and the Scala functional paradigm.
⚠️ coalesce(1) avoids a full shuffle, but it funnels all data into a single partition, so one task does all the work and the result can be badly skewed. It is only safe when the current partitions are small enough to fit comfortably on one executor.
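The trade-off in the warning above can be sketched by comparing `coalesce(1)` with `repartition(1)`. This is a minimal illustration, not a benchmark: the object name `CoalesceVsRepartition`, the helper `numPartitions`, the local master setting, and the generated `id` column are assumptions made for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object; names here are illustrative only.
object CoalesceVsRepartition {

  // Number of partitions backing a DataFrame.
  def numPartitions(df: DataFrame): Int = df.rdd.getNumPartitions

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("coalesce-vs-repartition")
      .master("local[4]") // assumption: local run for demonstration
      .getOrCreate()

    // Start from 8 partitions so the reduction is visible.
    val df = spark.range(0, 1000000).toDF("id").repartition(8)

    // coalesce merges existing partitions without a full shuffle.
    val narrowed = df.coalesce(1)

    // repartition always shuffles, so upstream work stays parallel.
    val shuffled = df.repartition(1)

    println(s"original:    ${numPartitions(df)}")
    println(s"coalesce:    ${numPartitions(narrowed)}")
    println(s"repartition: ${numPartitions(shuffled)}")

    // The physical plans differ: look for a Coalesce node in the
    // first plan and an Exchange (shuffle) node in the second.
    narrowed.explain()
    shuffled.explain()

    spark.stop()
  }
}
```

Both calls end with a single partition. The difference is in how they get there: `coalesce(1)` can pull the upstream computation into one task, while `repartition(1)` pays for a shuffle but lets the preceding stage run in parallel.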
val df = spark.read.json("data.json")
| Method | Storage Level | Purpose |
|--------------|------------------------------|---------|
| cache() | Default level: MEMORY_ONLY for RDDs, MEMORY_AND_DISK for DataFrames/Datasets | Speed up repeated actions |
| persist() | Choose level (MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY, etc.) | Fine-grained control over where cached data is kept |
| checkpoint() | Saves to HDFS/S3 (reliable storage) | Break lineage, reduce driver memory, avoid recomputation chain |
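The three methods in the table can be exercised side by side. The following is a hedged sketch rather than a recommended production pattern: the object name `PersistenceSketch`, the `run` helper, the temporary checkpoint directory, and the local master are all assumptions introduced for illustration. On a real cluster the checkpoint directory would point at reliable storage such as HDFS or S3.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; names here are illustrative only.
object PersistenceSketch {

  // Returns (RDD cache level, DataFrame cache level, checkpointed row count).
  def run(spark: SparkSession, checkpointDir: String): (StorageLevel, StorageLevel, Long) = {
    // checkpoint() requires a checkpoint directory to be set first.
    spark.sparkContext.setCheckpointDir(checkpointDir)

    // cache() on an RDD uses MEMORY_ONLY.
    val rdd = spark.sparkContext.parallelize(1 to 1000).cache()
    val rddLevel = rdd.getStorageLevel

    // cache() on a DataFrame uses a memory-and-disk level.
    val df = spark.range(0, 1000).toDF("id").cache()
    df.count() // caching is lazy; an action materializes it
    val dfLevel = df.storageLevel

    // persist() lets the caller pick the storage level explicitly.
    val evens = df.filter("id % 2 = 0").persist(StorageLevel.DISK_ONLY)
    evens.count()

    // checkpoint() writes the data out and returns a Dataset whose
    // lineage is truncated; it is eager by default.
    val checkpointed = evens.checkpoint()
    val rows = checkpointed.count()

    // Release cached blocks once they are no longer needed.
    evens.unpersist()
    df.unpersist()
    rdd.unpersist()

    (rddLevel, dfLevel, rows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("persistence-sketch")
      .master("local[*]") // assumption: local run for demonstration
      .getOrCreate()

    // Assumption: a local temp directory stands in for HDFS/S3.
    val dir = Files.createTempDirectory("spark-checkpoints").toString
    val (rddLevel, dfLevel, rows) = run(spark, dir)

    println(s"RDD cache level:       $rddLevel")
    println(s"DataFrame cache level: $dfLevel")
    println(s"checkpointed rows:     $rows")

    spark.stop()
  }
}
```

Note that `cache()` and `persist()` keep the lineage, so lost blocks can be recomputed, whereas `checkpoint()` trades that away for a shorter plan. That is why it tends to appear in long iterative jobs.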