SQL fundamentals + correctness
Joins (inner/left), window functions, deduplication, aggregation, NULL handling
“Latest record per key”, sessionization, retention-style queries
Performance thinking: partition pruning, avoiding joins that fan out and multiply rows, filtering early so less data is scanned
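
To make the "latest record per key" pattern concrete, here is a minimal sketch that runs the SQL through Python's built-in sqlite3 module. The events table, its columns (user_id, status, updated_at) and the sample rows are all hypothetical, and the query assumes a SQLite build with window-function support (3.25 or newer). It is one common way to write the pattern, not the only one.

```python
import sqlite3

# Hypothetical sample data: an exact duplicate row and a NULL timestamp
# are included on purpose to exercise deduplication and NULL handling.
SAMPLE_ROWS = [
    (1, "trial", "2024-01-01"),
    (1, "paid", "2024-03-01"),
    (1, "paid", "2024-03-01"),   # exact duplicate of the row above
    (2, "trial", "2024-02-15"),
    (2, "unknown", None),        # missing timestamp
]

LATEST_PER_KEY_SQL = """
    SELECT user_id, status, updated_at
    FROM (
        SELECT user_id,
               status,
               updated_at,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM events
    ) AS ranked
    WHERE rn = 1
    ORDER BY user_id
"""


def latest_per_key(rows):
    """Return one row per user_id: the one with the greatest updated_at."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (user_id INTEGER, status TEXT, updated_at TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(LATEST_PER_KEY_SQL).fetchall()
    finally:
        conn.close()
```

Calling latest_per_key(SAMPLE_ROWS) keeps the 2024-03-01 row for user 1 and the 2024-02-15 row for user 2. SQLite sorts NULL as the smallest value, so with ORDER BY ... DESC the NULL-timestamp row ranks last; other engines differ, which is why stating NULLS FIRST/LAST explicitly (where supported) is worth mentioning in an answer. If two distinct rows can share the same updated_at, add a tie-breaker column to the ORDER BY so the result is deterministic.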
Python fundamentals
Writing clean functions, parsing/transforming data, error handling
Working with dates, JSON, config-driven pipelines
Unit-test mindset (what would you test, edge cases)
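
As an illustration of the Python points above, the sketch below parses one JSON line into a clean record, with explicit error handling and timezone-aware dates. The names parse_event and RecordError, the field names user_id and ts, and the rule that naive timestamps are treated as UTC are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone


class RecordError(ValueError):
    """Raised when an input line cannot be turned into a valid record."""


def parse_event(line, required=("user_id", "ts")):
    """Parse one JSON line into {'user_id': str, 'ts': aware UTC datetime}.

    `required` plays the role of configuration: the caller decides which
    fields must be present and non-null.
    """
    try:
        raw = json.loads(line)
    except json.JSONDecodeError as exc:
        raise RecordError(f"invalid JSON: {exc.msg}") from exc

    if not isinstance(raw, dict):
        raise RecordError("expected a JSON object")

    missing = [key for key in required if raw.get(key) is None]
    if missing:
        raise RecordError(f"missing fields: {missing}")

    try:
        ts = datetime.fromisoformat(raw["ts"])
    except (TypeError, ValueError) as exc:
        raise RecordError(f"bad timestamp: {raw['ts']!r}") from exc

    if ts.tzinfo is None:
        # Assumption for this sketch: naive timestamps are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)

    return {"user_id": str(raw["user_id"]), "ts": ts.astimezone(timezone.utc)}
```

The edge cases worth testing follow directly from the branches: malformed JSON, a JSON value that is not an object, a missing or null field, a timestamp that is not a string, a timestamp that is not ISO 8601, and a timestamp with a non-UTC offset.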
PySpark / distributed data
DataFrame API vs SQL, when to use each
Joins (broadcast vs shuffle), repartition/coalesce, caching, skew awareness
Window functions in Spark, incremental processing, idempotency
File formats (Parquet), partitioning strategies, reading/writing S3/HDFS
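
The idempotency and partitioning ideas can be sketched without a cluster. The example below uses plain Python as a stand-in for Spark, so it is an analogy rather than PySpark code: it writes records into Hive-style dt=<date> directories and replaces each touched partition as a whole, so rerunning the same batch leaves the output unchanged. The function names, the part-0.json file name and the record fields (id, dt, v) are hypothetical. In Spark the closest counterpart is an overwrite write with spark.sql.sources.partitionOverwriteMode set to dynamic.

```python
import json
import os
import shutil
import tempfile
from collections import defaultdict


def write_partitions(records, root):
    """Write records under root/dt=<date>/, replacing every partition touched.

    Partitions that do not appear in `records` are left alone, so a rerun
    or a one-day correction never duplicates or disturbs other days.
    """
    os.makedirs(root, exist_ok=True)
    by_date = defaultdict(list)
    for rec in records:
        by_date[rec["dt"]].append(rec)

    for dt, recs in by_date.items():
        final_dir = os.path.join(root, f"dt={dt}")
        # Stage the new partition first, then swap it into place.
        tmp_dir = tempfile.mkdtemp(dir=root, prefix=".tmp-")
        with open(os.path.join(tmp_dir, "part-0.json"), "w") as fh:
            for rec in sorted(recs, key=lambda r: r["id"]):
                fh.write(json.dumps(rec, sort_keys=True) + "\n")
        shutil.rmtree(final_dir, ignore_errors=True)
        os.replace(tmp_dir, final_dir)

    return sorted(by_date)


def read_all(root):
    """Read every record back, ordered by partition and then by id."""
    rows = []
    for name in sorted(os.listdir(root)):
        if not name.startswith("dt="):
            continue  # skip staging directories
        with open(os.path.join(root, name, "part-0.json")) as fh:
            rows.extend(json.loads(line) for line in fh)
    return rows
```

Writing the same batch twice yields identical output, and rewriting a single day only changes that day's directory. An append-style write would instead double the rows on every retry, which is the failure mode that idempotent design is meant to rule out.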