SQL:
1. Find the average for the top merchant where the mode is online (given 2 tables: merchant and payment).
2. Find the average and sum for each merchant category for the last quarter.
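The sketch below answers both SQL questions against an in-memory SQLite database through Python's built-in `sqlite3`, so it runs without any other setup. The column names (`merchant_id`, `category`, `amount`, `mode`, `paid_on`) and the sample rows are hypothetical. It also makes two interpretive assumptions: "top merchant" means the merchant with the highest total online amount, and "last quarter" means the previous calendar quarter.

```python
import sqlite3
from datetime import date

# Hypothetical schema: the question only names the tables "merchant" and "payment".
SCHEMA = """
CREATE TABLE merchant (merchant_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE payment (
    payment_id INTEGER PRIMARY KEY,
    merchant_id INTEGER REFERENCES merchant(merchant_id),
    amount REAL,
    mode TEXT,      -- e.g. 'online' or 'offline'
    paid_on TEXT    -- ISO date 'YYYY-MM-DD'
);
"""

# Q1: "top merchant" assumed = highest total online amount; report its average online amount.
TOP_ONLINE_AVG = """
SELECT m.name, AVG(p.amount) AS avg_amount
FROM payment AS p
JOIN merchant AS m ON m.merchant_id = p.merchant_id
WHERE p.mode = 'online'
GROUP BY m.merchant_id, m.name
ORDER BY SUM(p.amount) DESC
LIMIT 1;
"""

# Q2: average and sum per merchant category for payments in [start, end).
CATEGORY_STATS = """
SELECT m.category, AVG(p.amount) AS avg_amount, SUM(p.amount) AS total_amount
FROM payment AS p
JOIN merchant AS m ON m.merchant_id = p.merchant_id
WHERE p.paid_on >= ? AND p.paid_on < ?
GROUP BY m.category
ORDER BY m.category;
"""

def last_quarter_bounds(today: date) -> tuple[str, str]:
    """Return [start, end) ISO dates of the calendar quarter before today's quarter."""
    q_start_month = 3 * ((today.month - 1) // 3) + 1  # first month of the current quarter
    current_start = date(today.year, q_start_month, 1)
    if q_start_month == 1:
        prev_start = date(today.year - 1, 10, 1)
    else:
        prev_start = date(today.year, q_start_month - 3, 1)
    return prev_start.isoformat(), current_start.isoformat()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO merchant VALUES (?, ?, ?)",
                 [(1, "A", "food"), (2, "B", "travel"), (3, "C", "food")])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 100.0, "online", "2024-02-10"),
    (2, 1, 300.0, "online", "2024-03-05"),
    (3, 2, 500.0, "offline", "2024-01-20"),
    (4, 2, 150.0, "online", "2024-02-01"),
    (5, 3, 50.0, "online", "2024-04-02"),
])

top = conn.execute(TOP_ONLINE_AVG).fetchone()
start, end = last_quarter_bounds(date(2024, 5, 15))
stats = conn.execute(CATEGORY_STATS, (start, end)).fetchall()
```

The quarter boundaries are passed in as parameters, which keeps the query itself dialect-neutral. Engines with richer date functions, such as PostgreSQL's `date_trunc('quarter', ...)`, can compute them inline instead.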
PySpark:
1. Given 3 CSVs: movie (rating, box office, genre, etc.), actor (id, name, etc.) and movieActorMap. Find the total box office of actorA's films that are in GenreA and have rating > 5.
2. Get the top 3 actors by total box office who acted in more than 6 films with rating > 5.
3. PySpark internals question: coalesce() vs. repartition(). When does a shuffle happen?
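The sketch below covers the join/filter/aggregate logic of PySpark questions 1 and 2. It does not assume a Spark installation, so it swaps in SQLite: the three CSVs are loaded with Python's `csv` and `sqlite3` modules. In PySpark you could run the same statements with `spark.sql(...)` after reading each file with `spark.read.csv(path, header=True, inferSchema=True)` and registering it with `createOrReplaceTempView`. The column names, the `movie_actor_map` table name and the sample data are hypothetical. The sketch also assumes that "rating > 5" filters films before both counting and summing.

```python
import csv
import io
import sqlite3

# Hypothetical CSV contents standing in for the movie, actor and map files.
MOVIE_CSV = """movie_id,title,genre,rating,box_office
1,M1,Drama,6.5,10
2,M2,Comedy,6.5,20
3,M3,Drama,6.5,30
4,M4,Comedy,6.5,40
5,M5,Drama,6.5,50
6,M6,Comedy,6.5,60
7,M7,Drama,6.5,70
8,M8,Comedy,6.5,80
9,M9,Drama,4.0,90
10,M10,Comedy,4.0,100
"""
ACTOR_CSV = """actor_id,name
1,actorA
2,actorB
3,actorC
4,actorD
5,actorE
"""
# Hypothetical casting: actor_id -> movie_ids the actor appears in.
CASTING = {1: range(1, 9), 2: range(1, 8), 3: range(2, 9),
           4: [*range(1, 7), 9, 10], 5: [*range(1, 7), 8]}
MAP_CSV = "movie_id,actor_id\n" + "\n".join(
    f"{m},{a}" for a, movies in CASTING.items() for m in movies)

def load(conn, ddl, table, text):
    """Create `table` and insert the CSV rows; column affinity turns numeric text into numbers."""
    conn.execute(ddl)
    header, *rows = csv.reader(io.StringIO(text))
    marks = ", ".join("?" * len(header))
    conn.executemany(f"INSERT INTO {table} ({', '.join(header)}) VALUES ({marks})", rows)

conn = sqlite3.connect(":memory:")
load(conn, "CREATE TABLE movie (movie_id INTEGER, title TEXT, genre TEXT, "
           "rating REAL, box_office INTEGER)", "movie", MOVIE_CSV)
load(conn, "CREATE TABLE actor (actor_id INTEGER, name TEXT)", "actor", ACTOR_CSV)
load(conn, "CREATE TABLE movie_actor_map (movie_id INTEGER, actor_id INTEGER)",
     "movie_actor_map", MAP_CSV)

JOINED = """
FROM movie AS m
JOIN movie_actor_map AS ma ON ma.movie_id = m.movie_id
JOIN actor AS a ON a.actor_id = ma.actor_id
"""

# Q1: total box office of one actor's films in one genre with rating > 5.
q1 = conn.execute(
    "SELECT SUM(m.box_office)" + JOINED + "WHERE a.name = ? AND m.genre = ? AND m.rating > 5",
    ("actorA", "Drama")).fetchone()[0]

# Q2: top 3 actors by box office, counting only films rated > 5, with more than 6 such films.
q2 = conn.execute(
    "SELECT a.name, SUM(m.box_office) AS total" + JOINED +
    """WHERE m.rating > 5
    GROUP BY a.actor_id, a.name
    HAVING COUNT(DISTINCT m.movie_id) > 6
    ORDER BY total DESC
    LIMIT 3""").fetchall()
```

With the DataFrame API, Q2 follows the same shape:

1. Join the three DataFrames.
2. Filter on `rating > 5`.
3. `groupBy` the actor with `F.sum` and `F.countDistinct`.
4. Filter on the distinct count.
5. Finish with `orderBy(..., ascending=False)` and `limit(3)`.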
Unix:
Basic file commands (grep, zip, etc.). Find the latest 3 files whose names start with "Latest" and zip them to this location.