The interview included a take-home programming question.

Orbital Insight Interview Problem
Please write a program in Python that reads a text file and outputs the k most
common tokens (contiguous sequences of characters that are neither
whitespace nor “.” [period] nor “,” [comma] nor “-” [hyphen]) in the input.
The consideration of token equality should be case-insensitive; that is, foo, FOO,
and fOo are all considered the same token. Furthermore, this text has been
machine scanned, so there are a few other common pairs of characters that
can get confused in the scanning process: 0 and O (that’s zero and capital oh),
b and 6, B and 8, and 1, I, and l (that’s numeral 1, capital i, and lowercase L).
You should consider tokens that are equal except for case and these
equivalences to be the same for counting purposes, but you must track all of
the exact variants that are seen.
The value of k (the number of most common tokens you should output) should
be taken as a command-line argument. If it is missing, a default of 10 should
be used. Feel free to read the text from stdin, or to take a filename as a
command-line argument and read the text from that file, or both.
The output should be the various token variations seen (separated by the “|”
[pipe] character) and the count of each token, in descending order of number of
occurrences, with the variations and the count separated by a “@” character.
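
One possible approach, offered as a rough sketch rather than the expected answer: lowercase each token, map the scanner look-alikes onto a single representative character, count by that canonical key, and remember every exact spelling seen. The names `canonical`, `top_tokens`, and `main`, and the argument order `[k] [filename]`, are assumptions made for illustration. The sketch also assumes the equivalences are transitive, so that b, B, 6, and 8 fall into one class once case is ignored, as do 1, I, i, and l.

```python
import re
import sys
from collections import Counter, defaultdict

# A token is a run of characters that are not whitespace, ".", "," or "-".
TOKEN_RE = re.compile(r"[^\s.,\-]+")

# Applied after lowercasing. Assumption: equivalences are transitive, so
# 6 and 8 both map to "b", and 1 and i both map to "l".
CONFUSABLE = str.maketrans({"0": "o", "6": "b", "8": "b", "1": "l", "i": "l"})


def canonical(token):
    """Return the key under which equivalent tokens are counted together."""
    return token.lower().translate(CONFUSABLE)


def top_tokens(text, k=10):
    """Return up to k (variants, count) pairs, most frequent first."""
    counts = Counter()
    # canonical key -> exact variants; a dict keeps first-seen order
    variants = defaultdict(dict)
    for token in TOKEN_RE.findall(text):
        key = canonical(token)
        counts[key] += 1
        variants[key][token] = None
    return [("|".join(variants[key]), n) for key, n in counts.most_common(k)]


def main(argv, stdin=None):
    # Assumed CLI: [k] [filename]. Both are optional; k defaults to 10 and
    # the text is read from stdin when no filename is given.
    k = int(argv[1]) if len(argv) > 1 else 10
    if len(argv) > 2:
        with open(argv[2], encoding="utf-8") as f:
            text = f.read()
    else:
        text = (stdin or sys.stdin).read()
    for joined, count in top_tokens(text, k):
        print(f"{joined}@{count}")
```

For the input `Bob b0b 808 bob. foo FOO-fOo, Ill 1ll` with k = 2, `top_tokens` returns `Bob|b0b|808|bob` with count 4 and `foo|FOO|fOo` with count 3. To use it as a script, call `main(sys.argv)` under the usual `if __name__ == "__main__":` guard. Ties are broken by first appearance, which the problem statement leaves unspecified.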
