They invite people to 2 rounds of technical interviews, apparently as a stand-in for a proper “CV review”, and then reject them. The first round was remote and covered simple LLM evaluation using a golden dataset. The second was a simple live-coding exercise: a KV cache in Python.
The interviewer was also not great at interviewing people.
Interview questions [1]
Question 1
They actually didn’t ask any LLM / AI engineering questions.
I applied through a recruiter. The process took 1 week. I interviewed at ellamind (Berlin) in Dec 2025.
Interview
- In the first screening, I talked with a headhunter from LinkedIn.
- Then an external HR company did the second screening, because the company is so small that it doesn't have its own HR.
- After passing those, they sent me a coding challenge, which was very easy for me. There is a small problem with this step, though: once you open the challenge a timer starts, and you have to submit within 90 minutes. If anything is unclear, there is no way to ask a question, get an answer, and still complete the challenge in time.
- Even though I reached 100% accuracy in the coding challenge, they sent me a rejection email in less than a day. It startled me, and I asked for feedback on where I had failed and how I could improve. They didn't answer.
So I went through 3 interview steps without seeing or talking to anyone from ellamind, and they left my question unanswered. My take is that they are a very small team in which time and speed are valued more than professionalism and kindness.
Interview questions [1]
Question 1
Implement an LLM-as-a-judge. Its task is to judge the outputs of other LLMs.
Your judge will be presented with four completions to a given prompt. It needs to judge which one is the best.
Use Gemini 2.0 Flash and try to achieve a high accuracy on the provided dataset.
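For readers unfamiliar with the pattern, here is a minimal sketch of how such a judge could be structured. It is not the challenge's reference solution. The function names, the prompt wording, and the `call_model` callable (a stand-in for whatever client actually sends the prompt to Gemini 2.0 Flash) are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional, Sequence

LABELS = "ABCD"  # one letter per candidate completion


def build_judge_prompt(prompt: str, completions: Sequence[str]) -> str:
    """Format the task prompt and the four candidates into one judge prompt."""
    if len(completions) != len(LABELS):
        raise ValueError("expected exactly four completions")
    parts = [
        "You are judging four completions to the same prompt.",
        "Reply with the letter of the single best one (A, B, C or D) and nothing else.",
        "",
        f"Prompt:\n{prompt}",
    ]
    for label, text in zip(LABELS, completions):
        parts.append(f"\nCompletion {label}:\n{text}")
    parts.append("\nBest completion:")
    return "\n".join(parts)


def parse_choice(reply: str) -> Optional[int]:
    """Return the 0-based index of the first standalone capital A-D, else None.

    Deliberately naive: a reply that starts with the word "A" would be
    misread, which is why the prompt asks for the letter only.
    """
    match = re.search(r"\b([ABCD])\b", reply)
    return LABELS.index(match.group(1)) if match else None


def judge(prompt: str, completions: Sequence[str],
          call_model: Callable[[str], str]) -> int:
    """Ask the model for a verdict; fall back to index 0 if it is unparseable."""
    choice = parse_choice(call_model(build_judge_prompt(prompt, completions)))
    return 0 if choice is None else choice


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of examples where the judge picked the labelled best completion."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

Keeping the model call behind a plain callable means the prompt building and answer parsing can be checked offline with a fake model before spending any API quota.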
## Requirements
1. Install dependencies: `pip install -r requirements.txt` (tested with Python 3.10).
2. Copy `.env_example` to `.env` and fill in the API key we provided.
3. Implement your full solution in `main.py` only. Do not add any other files. They will be ignored.
4. Before submitting, run `python run_submission.py --debug` to check if your solution is gradable and bug-free.
- If you want to check the accuracy on the full dataset, omit the `--debug` flag.
- Make sure that your solution runs in under 20 minutes on the full dataset.
- If you run into rate limits you can lower `CONCURRENCY` in `run_submission.py`.
Please leave the `requirements.txt` unchanged and don't add any other dependencies.
You can request usage information for the provided API key from [OpenRouter](https://openrouter.ai/docs/api-reference/api-keys/get-current-api-key). The provided API key will be deactivated after the challenge.
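The rate-limit note above refers to a `CONCURRENCY` setting in `run_submission.py`, but the review does not show how that script uses it. One common way to cap the number of in-flight API requests is an `asyncio.Semaphore`, sketched below. The value `4`, the `run_bounded` helper, and the `worker` coroutine are hypothetical and not taken from the challenge code.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

CONCURRENCY = 4  # assumed value; lower it if the API starts rate-limiting


async def run_bounded(items: Iterable[T],
                      worker: Callable[[T], Awaitable[R]],
                      concurrency: int = CONCURRENCY) -> List[R]:
    """Run worker(item) for every item with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item: T) -> R:
        # Wait for a free slot before starting the (hypothetical) API call.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return list(await asyncio.gather(*(guarded(item) for item in items)))
```

A lower limit trades throughput for fewer rate-limit errors, so it has to be balanced against the 20-minute runtime budget on the full dataset.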
## Submission
- Create a private GitHub repository.
- If your GitHub repo is public, you will spoil the challenge for others. We will automatically reject your submission.
- Please name the repository `ellamind_coding_challenge_John_Doe` where `John_Doe` is your name.
- Invite `ellamind-admin` as collaborator to your repository.
- Ensure your solution is present on the main branch.
- No further changes will be accepted after the two hour time limit.
- If you don't follow these instructions the grading will fail.
## Code Storage and Analysis
Your submitted code will be stored and analyzed by ellamind for the purpose of evaluating your application and improving our recruitment process.
## Additional Notes
- Grading will use a private test-set with the same structure.
- Perfect results aren't expected.