Spark Performance Optimization Series: #1. Skew

By A Mystery Man Writer

In a Spark cluster, data is typically read in as 128 MB partitions, which keeps the initial distribution of data even. However, as the data is transformed (e.g. aggregated), it is possible to have significantly…
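To make the idea concrete, here is a minimal sketch in plain Python (not Spark itself) of how grouping rows by key can leave partitions badly unbalanced even when the input was evenly split. Everything in it is an assumption made for illustration: the `customer_*` keys, the 8-partition count, the CRC32-based stand-in for a hash partitioner, and the helper names `partition_of`, `partition_sizes`, and `skew_ratio`.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical shuffle partition count


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner: equal keys share a partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions: int):
    """Count how many rows land in each partition after grouping by key."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes) -> float:
    """Largest partition divided by the mean partition size (1.0 means perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Hypothetical dataset: one "hot" key owns 9,000 of the 10,000 rows.
keys = ["customer_0"] * 9000 + [f"customer_{i}" for i in range(1, 1001)]

sizes = partition_sizes(keys, NUM_PARTITIONS)
ratio = skew_ratio(sizes)

print("rows per partition:", sizes)
print("skew ratio (max / mean):", round(ratio, 2))
```

Because every row with the same key must go to the same partition, the partition holding the hot key receives at least 9,000 rows while the mean is 1,250, so one task would do most of the work while the others sit idle.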

Spark's Skew Problem — Does It Impact Performance?, by Aditya Sahu, Curious Data Catalog

Spark Job Optimization: Dealing with Data Skew

List: Apache Spark, Curated by Luan Moreno M. Maciel

Handling Data Skew in Apache Spark, by Dima Statz

Spark Performance Tuning: Skewness Part 1, by Wasurat Soontronchai

List: Reading list, Curated by mohit chaurasia

List: Spark Optimization, Curated by Ashwin Krishnan

Apache Spark Core—Deep Dive—Proper Optimization

Performance Optimization of Spark-SQL

Apache Spark 3.0 and skew join optimization in the Adaptive Query Execution

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark 1, Karau, Holden, Warren, Rachel, eBook
