Code block to find how the data is partitioned across Spark partitions

Fri Feb 06 2026 13:15:22 GMT+0000 (Coordinated Universal Time)

Saved by @Saravana_Kumar #python

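from pyspark.sql import functions as sf

# mw_dataset is an existing DataFrame; count the rows held by each Spark partition
(
    mw_dataset.withColumn("partition_id", sf.spark_partition_id())  # tag each row with its partition
              .groupBy("partition_id")
              .agg(sf.count(sf.col("partition_id")).alias("partition_count"))
              .orderBy(sf.desc("partition_count"))  # largest partitions first
              .show()  # prints only the first 20 rows by default
)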
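
The per-partition counts are mostly useful for spotting skew, so a rough sketch of how they could be summarized may help. The names `partition_counts` and `skew_summary` are hypothetical helpers, not part of PySpark, and the sample numbers at the bottom are made up for illustration rather than real output. The PySpark import is done lazily inside `partition_counts` on the assumption that the summary step may be run on its own.

```python
def partition_counts(df):
    """Collect (partition_id, row_count) pairs from a PySpark DataFrame."""
    # Lazy import: only needed when a real DataFrame is passed in.
    from pyspark.sql import functions as sf

    rows = (
        df.withColumn("partition_id", sf.spark_partition_id())
          .groupBy("partition_id")
          .agg(sf.count(sf.col("partition_id")).alias("partition_count"))
          .collect()
    )
    return [(r["partition_id"], r["partition_count"]) for r in rows]


def skew_summary(counts):
    """Summarize per-partition row counts (hypothetical helper)."""
    sizes = [n for _, n in counts]
    if not sizes:
        return {"partitions": 0, "min": 0, "max": 0, "mean": 0.0, "skew_ratio": 0.0}
    avg = sum(sizes) / len(sizes)
    return {
        "partitions": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": avg,
        # Largest partition relative to the average; 1.0 means perfectly even.
        "skew_ratio": max(sizes) / avg if avg else 0.0,
    }


# Made-up counts for illustration; with Spark, use partition_counts(mw_dataset).
example_counts = [(0, 100), (1, 100), (2, 400)]
print(skew_summary(example_counts))
```

A `skew_ratio` well above 1 suggests that one partition holds far more rows than the average, which is usually the case worth a closer look. Note that empty partitions never appear in the grouped result, so the partition total here can be lower than the DataFrame's actual partition count.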