Will ChatGPT kill pre-employment tests?

The rise of ChatGPT and similar LLMs has raised many questions about their impact on various industries, both positive and negative. One question I get asked regularly is whether this will be the end of automated skill tests.

To understand how ChatGPT impacts skill tests and coding tests, we first need to understand what ChatGPT can and cannot do:

What ChatGPT can do

Quick Solutions: ChatGPT can provide code snippets or solutions to common problems in seconds. This could potentially reduce the need for entry-level coders to tackle basic tasks.
Debugging Help: By describing a particular error or issue, developers can get advice on debugging, reducing the time and effort in troubleshooting.
Learning & Tutorials: Beginners can quickly ask and learn from the model, which may reduce dependency on traditional learning methods or platforms.

What ChatGPT can't do (yet)

Complex Problem Solving: Real-world software development often requires a deep understanding of specific requirements, business logic, and user experience. ChatGPT cannot yet fully grasp or create intricate, bespoke solutions on its own.
Deep Technical Knowledge: While ChatGPT has extensive knowledge, there are still niches and cutting-edge fields where human experts outshine it.
Accuracy: One of the biggest drawbacks of ChatGPT is that it gets confused easily when a question is complex enough, and it can be very confident about an incorrect answer.

I was surprised by a talk Yejin Choi (an NLP expert) gave yesterday in Berkeley, on some surprising weaknesses of GPT4:
As many humans know, 237*757=179,409
but GPT4 said 179,289.

For the easy problem of multiplying two 3 digit numbers, they measured GPT4 accuracy being only… pic.twitter.com/kp3TDBaWId
— Alex Dimakis (@AlexGDimakis) August 16, 2023
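
The arithmetic in the quoted tweet is easy to check directly. A quick sketch in Python (the variable names are only for illustration):

```python
# Verify the multiplication quoted in the tweet above.
correct = 237 * 757
gpt4_answer = 179_289  # the answer attributed to GPT4 in the tweet

print(correct)                # 179409
print(correct - gpt4_answer)  # 120, how far off the quoted answer is
```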

Elimination as opposed to selection

If you think that ChatGPT will render your pre-employment tests useless, you're probably using them incorrectly. The point of a skills test is to eliminate, not to select. So the metric you should be optimizing for is the number of unqualified candidates it enables you to confidently reject, versus the number of candidates who could cheat on the test and do well.

The automated skills test is meant to be a first step in companies that receive too many applications. Rather than having a recruiter go through every resume and decide who to move forward based on what the resume says, a 30-40 min test can quickly help you identify the top 20%.

From there, the rest of your process can be semi-automated or manual.

As a rule of thumb, the test should be so easy that if someone doesn't pass it, it doesn't make sense to interview them. As part of working with thousands of companies to screen candidates, we've seen that a short, easy 40 min test can filter out 70-80% of candidates. If a few of the top 20% candidates have made it through using unfair means, that's usually very straightforward to detect in the interview process.

Also, given the proctoring features that are in place today, it's genuinely difficult to cheat. If someone manages to get through the proctored test, does well and ends up interviewing with your company, they're probably smart and you should interview them. They know how to get things done.
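
To make the 70-80% figure concrete, here is a rough sketch of the funnel arithmetic. The applicant count of 1,000 and the function name are hypothetical, chosen only for illustration:

```python
def candidates_remaining(applicants: int, filtered_out_percent: int) -> int:
    """Number of candidates left after the test filters out the given percentage."""
    return applicants * (100 - filtered_out_percent) // 100

applicants = 1000  # hypothetical number of applications for one role

# The 70-80% range mentioned above.
for percent in (70, 80):
    print(percent, candidates_remaining(applicants, percent))
# 70 300
# 80 200
```

Under these assumptions, the rest of the process only has to handle 200-300 candidates instead of 1,000.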

Impact on skill tests

When Integrated Development Environments (IDEs) first came into prominence, there were concerns they would make certain coding tasks obsolete, thanks to features like auto-completion, error highlighting, and automated refactoring. While they did streamline many processes, IDEs didn't replace the need for skilled developers or for coding tests. They became tools that enhanced productivity.

Given that ChatGPT can easily solve basic, textbook-style questions, recruiters need to be more careful about the quality of questions on their tests. This has always been the case: answers to textbook-style questions are easily available online. It is now important to make sure that the questions are AI-proof in addition to being Google-proof.

Aptitude tests are a great way to filter candidates across roles. A simple 25 min aptitude test can give you a lot of data points about the learnability of candidates. Across the decades, several research studies have concluded that multi-measure or aptitude tests are the best predictors of on-the-job success.

Skills testing platforms can implement stricter proctoring (anti-cheating) controls to prevent candidates from using unfair means or consulting AI models like ChatGPT during the test.

Image + text based questions

At Adaface, we test our questions against the most advanced LLM chatbots in the market every few weeks. This is done to ensure the integrity of the test.

The majority of Adaface questions have images, where the image contains critical information needed to solve the question. This makes it harder for the candidate to use ChatGPT (or other AI bots) to solve them.

One of the most recent proctoring features we've launched at Adaface is screen share proctoring.

It's an optional feature. If enabled, Ada (the assessment chatbot) will ask candidates to share their entire screen during the test. If they leave the window or open another window during the test, Ada will capture a screenshot of their screen. The scorecard will show the captured images for advanced cheating detection.

So if your candidates are using ChatGPT to answer questions on the test, you'll know that from their scorecards.

Sample scorecard with screen share proctoring

Copy-paste protection + coding timeline

The coding editor has copy-paste and cut-paste protection. Candidates won't be allowed to copy code from elsewhere and paste it into their coding editor on Adaface. You will also be notified if a candidate uses the Developer Console to override the protection.

You can also see the full timeline of how the candidate wrote the code for a particular question. The timeline helps you visualize the candidate's thought process. If they paste code from somewhere else, you will see a sudden jump in the timeline.
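
As a rough illustration of what a sudden jump in a timeline looks like, here is a minimal sketch. It is not Adaface's implementation: the snapshot format, the function name and the threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical snapshot format: (seconds since start, characters in the editor).
Snapshot = Tuple[int, int]

def find_sudden_jumps(timeline: List[Snapshot], max_chars_per_step: int = 80) -> List[Tuple[int, int]]:
    """Return (time, characters added) for each step where the code grew
    by more than max_chars_per_step since the previous snapshot."""
    jumps = []
    for (_, prev_len), (t, cur_len) in zip(timeline, timeline[1:]):
        added = cur_len - prev_len
        if added > max_chars_per_step:  # the threshold is an arbitrary assumption
            jumps.append((t, added))
    return jumps

# Steady typing, then 349 characters appear within one 10-second step.
timeline = [(0, 0), (10, 35), (20, 71), (30, 420), (40, 455)]
print(find_sudden_jumps(timeline))  # [(30, 349)]
```

A flagged step is only a signal worth a closer look in the timeline, not proof of cheating on its own.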

Conclusion

Skill testing platforms might evolve, but they won't disappear. They'll shift focus, adapting to become ChatGPT-proof and implementing advanced anti-cheating solutions.

Companies also need to keep in mind that as long as they are using skill tests for elimination (not selection), a small percentage of candidates cheating their way through to the next round doesn't render the skills tests useless. Those candidates can easily be identified in the interviews.