There was a programming (homework) question.

Orbital Insight Interview Problem

Please write a program in Python that reads a text file and outputs the k most common tokens (contiguous sequences of characters that are neither whitespace nor “.” [period] nor “,” [comma] nor “-” [hyphen]) in the input.

The consideration of token equality should be case-insensitive; that is, foo, FOO and fOo are all considered the same token. Furthermore, this text has been machine scanned, so there are a few other common pairs of characters that can get confused in the scanning process: 0 and O (that’s zero and capital oh), b and 6, B and 8, and 1, I, and l (that’s numeral 1, capital i, and lowercase L). You should consider tokens that are equal except for case and these equivalences to be the same for counting purposes, but you must track all of the exact variants that are seen.

The value of k (the number of most common tokens you should output) should be taken as a command-line argument. If it is missing, a default of 10 should be used. Feel free to read the text from stdin, or to take a filename as a command-line argument and read the text from that file, or both.

The output should be the various token variations seen (separated by the “|” [pipe] character) and the count of each token, in descending order of number of occurrences, with the variations and the count separated by a “@” character.
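For anyone working through it, here is one possible sketch, not a reference answer. It rests on a few assumptions the problem statement leaves open: case folding and the scanner confusions are combined transitively, so b, B, 6 and 8 all land in one class (likewise 0/o/O and 1/i/I/l/L); variants are listed in first-seen order; ties in count keep first-seen order. The names `canonical`, `top_tokens`, `format_line` and `main`, and the `[k] [filename]` argument order, are made up for illustration.

```python
import re
import sys
from collections import Counter

# Separators per the problem statement: whitespace, period, comma, hyphen.
TOKEN_RE = re.compile(r"[^\s.,\-]+")

# Assumption: fold scanner look-alikes after lowercasing, so the classes are
# {0, o, O}, {b, B, 6, 8} and {1, i, I, l, L}.
OCR_FOLD = str.maketrans({"0": "o", "6": "b", "8": "b", "1": "l", "i": "l"})


def canonical(token):
    """Map a token to the key shared by all of its equivalent variants."""
    return token.lower().translate(OCR_FOLD)


def top_tokens(text, k=10):
    """Return up to k (variants, count) pairs, most frequent first."""
    counts = Counter()
    variants = {}  # canonical key -> dict used as an insertion-ordered set
    for token in TOKEN_RE.findall(text):
        key = canonical(token)
        counts[key] += 1
        variants.setdefault(key, {})[token] = None
    # sorted() is stable, so equal counts keep first-seen order.
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])[:k]
    return [(list(variants[key]), n) for key, n in ranked]


def format_line(variant_list, count):
    """Join variants with '|' and append '@count'."""
    return "|".join(variant_list) + "@" + str(count)


def main(argv):
    # Hypothetical CLI shape: script.py [k] [filename]; stdin if no filename.
    k = int(argv[1]) if len(argv) > 1 else 10
    if len(argv) > 2:
        with open(argv[2], encoding="utf-8") as f:
            text = f.read()
    else:
        text = sys.stdin.read()
    for variant_list, count in top_tokens(text, k):
        print(format_line(variant_list, count))
```

Calling `main(sys.argv)` under an `if __name__ == "__main__":` guard turns this into the command-line program. Keeping the canonical key separate from the variant set is what lets the count merge look-alike tokens while the output still shows every exact spelling that was seen.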