Spark Performance Optimization Series: #1. Skew
In a Spark cluster, data is typically read in as 128 MB partitions, which ensures an even distribution of data. However, as the data is transformed (e.g. aggregated), it is possible to have significantly…
Spark's Skew Problem - Does It Impact Performance?, by Aditya Sahu, Curious Data Catalog
Spark Job Optimization: Dealing with Data Skew
List: Apache Spark, Curated by Luan Moreno M. Maciel
Handling Data Skew in Apache Spark, by Dima Statz
Spark Performance Tuning: Skewness Part 1, by Wasurat Soontronchai
List: Reading list, Curated by mohit chaurasia
List: Spark Optimization, Curated by Ashwin Krishnan
Apache Spark Core—Deep Dive—Proper Optimization
Performance Optimization of Spark-SQL
Apache Spark 3.0 and skew join optimization in the Adaptive Query Execution
High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark, 1st edition, by Holden Karau and Rachel Warren (eBook)