Job interviews in data science - my experience copy-pasting solutions

The talent war and the talent shortage are common myths about tech in general, and data science in particular. In a previous blog post, I explained why I think the problem lies instead with Human Resources: an inability to design appropriate recruitment processes.

Now I have more facts to support this claim. I recently decided to apply for data scientist positions, given the new opportunities in remote work opened by the coronavirus pandemic. I got interviews at Toptal, Sleepiz and Sensyne.

In this blog post, I am going to share some details about what really happens in these job interviews, from my personal perspective as a copy-paste data scientist, who acquired the reflex of asking Google, GitHub and StackOverflow before starting to code by myself.

Toptal: Test with StackOverflow solution

First, I applied to Toptal, a selective freelancing platform claiming to “hire” the top 3% of freelance talent. This means that only 3% of applicants pass the entry test, which opens access to their closed freelancing platform.

At Toptal, they are now “hiring” data scientists, although they don’t provide any information about current opportunities available on their platform (vs. competitors like Upwork and Freelancer.com, where anyone can see projects from customers):

I applied, and after the HR interview, I was invited to take Codility tests. They consist of complex coding puzzles, with no relevance to data science whatsoever (a Datacamp test would make more sense). Codility is a branch of software engineering in itself, with few real-world applications beyond job interviews. Despite widespread criticism (here, and here…), these tests are still used in the software industry.

Some job candidates go through special training to acquire Codility skills. I preferred to ask my best friend: Google! Codility solutions can often be found on Stackoverflow and other coding forums, although the Codility team works hard to modify their tests against copy-paste developers like me.

I passed one test and failed the other two. The Toptal team asked for my feedback, and I naively answered that I had unearthed the exact solution to one test on an obscure forum, but that the other two would have required some tweaking of Stackoverflow posts. After all, googling and stackoverflowing are not trivial. They are highly specialized skills; otherwise Stackoverflow wouldn’t be littered with so many duplicate questions. I wanted to present myself as an elite copy-paste programmer who could find a role on their elitist freelancing platform. They didn’t appreciate my answer:

Their “analysis” and “investigation” consisted of reading my honest feedback email. If copy-paste developers don’t have integrity, who does?

In the end, I landed in their bottom 97% of tech talent.

Update 2022: The Toptal HR team contacted me again, and a confidentiality agreement is now required before starting the Toptal screening process:

Toptal tries to keep control of the narrative about the ‘top 3%’, and does not tolerate any free expression by the 97% of freelancers.

Sleepiz: Test with Kaggle solution

Then came Sleepiz, a spin-off from the prestigious ETH Zurich in Switzerland.

Sleepiz is “Europe’s most promising digital health startup”, building medical-grade sleep monitoring at home.

Quite surprisingly, their data science test had nothing to do with their core business in digital health: it was about road safety, specifically the task of predicting police officer attendance at car accidents:

Road safety is very far from my domain expertise too. I expected digital health material, so I felt I should learn a bit about road safety before jumping into the assignment.

After some intensive googling, I was surprised to find the exact task on Kaggle! OMG!

I immediately reported this stunning discovery to the Sleepiz team, insisting that a senior ML engineer/researcher shouldn’t re-invent the wheel. However, it wasn’t enough for Sleepiz:

Despite a 72-hour deadline to complete the assignment, the Sleepiz Lead Data Scientist felt that the comprehensive Kaggle notebook was too “preliminary”. The hiring bar at Sleepiz is extraordinarily high.

Sensyne Health: Test with Wikipedia solution

The third company was Sensyne Health, another promising digital health startup, a spin-off of the University of Oxford in the United Kingdom:

The HR interview went well: the interviewer appreciated my previous experience with medical data.

The next steps were twofold: a “conversation” (as she called it) with their ML research team, to “tell you more about what Sensyne Health does”, and a data science test. About the test, I explicitly asked whether I was allowed to use Google, after telling her about my unfortunate experiences with Toptal and Sleepiz. She laughed: “We would never do that! Of course!”

However, she never sent me the test. I only had a “conversation” with the ML team.

I wanted this conversation to be productive, so I took the initiative to look at my interviewers’ recent publications, hoping to find something inspiring. My attention was caught by their recent paper about Covid-19: “Early risk assessment for COVID-19 patients from emergency department data using machine learning”. Sensyne Health even brags about this paper on Twitter:

I had a similar project within the COVIND medical data consortium that I launched at the beginning of the pandemic. So I was baffled that their NHS data, which they also brag a lot about (“one of the world’s largest and most valuable datasets”), didn’t even include patient comorbidities and previous conditions (like diabetes and asthma), which matter a lot in Covid-19, as everyone knows. As a result, their “risk assessment” is unsurprisingly trivial: the “age” feature dwarfs all the others:

It will be hard to demonstrate the value of machine learning for Covid-19 with such a paper. From my quick and superficial reading (I didn’t want to over-prepare for the interview, to keep the tone conversational), I had the feeling that it was yet another RIRO model (Rubbish-In, Rubbish-Out), polluting the ML literature. Deeper domain expertise would be needed to salvage the paper, which might explain why they were hiring a new data scientist. But I wanted to hear the authors’ perspective.

Instead of defending the quality of their data and research, the Sensyne Health researchers preferred to quickly move on to Wikipedia-style questions about basic ML algorithms: how decision trees work, how convnets work, logistic regression, and so on. I only gave high-level answers, as it was just a “conversation”, and I don’t dig into the details of those algorithms every day. Anyway, they already have mature implementations in scikit-learn and Keras. I didn’t expect such questions, but apparently that’s what Sensyne Health is “very passionate about” (instead of ML research for healthcare).
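
For what it’s worth, the core of such a textbook answer fits in a few lines. Below is a minimal sketch of how a decision tree could pick a split on one numeric feature, assuming Gini impurity as the criterion (the one used by CART-style trees). The function names `gini` and `best_threshold` and the toy ages and outcomes are hypothetical, made up purely for illustration; this is not Sensyne Health’s data, nor how any particular library implements it.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    n = len(values)
    best = (None, gini(labels))
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: patient ages and a binary outcome (invented for this sketch).
ages = [25, 32, 47, 51, 68, 74, 80, 85]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_threshold(ages, outcome)
print(threshold, impurity)
```

A full tree simply repeats this search on each side of the chosen threshold, for every feature, until the nodes are pure or too small. That is exactly the kind of bookkeeping that mature libraries already handle.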

If I had known this “conversation” would be about in-depth and detailed explanations of standard and boring machine learning algorithms, I would have browsed Wikipedia in parallel, to refresh my memory.

When a whole nation behaves like those startups: the case of Algeria

We are in the 21st century, with smartphones everywhere. I was surprised to find such weird interviews at tech startups, which are supposed to build the future (ETH Zurich’s motto is “Where the future begins”. How ironic). They remind me of the Algerian political regime, which cuts the internet every year during the national baccalaureate exam in order to avoid leaks.

Algerian students are very passionate about copy-pasting, and authorities could have seized this opportunity to drop the whole practice of exams altogether, and leapfrog to project-based education, like Finland. Instead of reforming the Algerian education system, they prefer to brutally unplug the whole country every year. Are Algerian authorities recruited through Toptal, or poached from Zurich and Oxford startups?

This repressive attitude towards the Internet is not surprising in Algeria, a banana republic ruled by old military boomers:

Since 2019, this old military regime has been disrupted by Algerian youth, who are asking for modern governance, a kind of Algeria 2.0:

But will Algeria actually improve by becoming a startup nation governed by Gen Z data scientists, instead of Generals from the military? After my job interviews at Toptal, Sleepiz and Sensyne, I am no longer optimistic…

So what would be a better job interview?

This blog post is already pretty long, so I will present my alternative perspective on tech recruitment in the 21st century, a recruitment that embraces copy-pasting, in a follow-up blog post, if enough people ask (my Twitter @mostafabenh). In fact, I am building my own startup team, in parallel with my job hunt.
