
Adaface Sample Natural Language Processing Questions

Here are some sample Natural Language Processing questions from our premium question library (10,273 non-googleable questions).


Hate Speech Detection Challenge (Medium)
Skills: Text Classification, Data Imbalance

You are working on a project to detect hate speech in social media posts. Your initial model, a basic binary classification model, has achieved high accuracy during training, but it's not performing well on the validation set. You also notice that your dataset has significantly more non-hate-speech examples than hate-speech examples. Given this situation, which of the following strategies could likely improve the performance of your model?
A: Collect more data and retrain the model.
B: Introduce data augmentation techniques specifically for hate-speech examples.
C: Change the model architecture from binary classification to multi-class classification.
D: Replace all the words in the posts with their synonyms to increase the diversity of the data.
E: Remove the non-hate-speech examples from the dataset to focus on the hate-speech examples.
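
One widely used remedy for this kind of imbalance, in the same spirit as the augmentation idea in option B, is to re-weight the loss by class frequency. A minimal scikit-learn sketch; the toy posts and labels are invented for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy, heavily imbalanced data: hate speech is the rare class.
    posts = ["have a nice day", "lovely weather today",
             "great game last night", "I hate group X"]
    labels = [0, 0, 0, 1]  # 1 = hate speech

    # class_weight="balanced" scales each class's loss inversely to its
    # frequency, so the scarce hate-speech examples are not drowned out.
    clf = make_pipeline(
        TfidfVectorizer(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    clf.fit(posts, labels)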

Identifying Fake Reviews (Easy)
Skills: Text Classification

You are a data scientist at an online marketplace company. Your task is to develop a solution to identify fake reviews on your platform. You have a dataset where each review is marked as either 'genuine' or 'fake'. After developing an initial model, you find that it's accurately classifying 'genuine' reviews but performing poorly with 'fake' ones. Which of the following steps can likely improve your model's performance in this context?
A: Use a more complex model to capture the intricacies of 'fake' reviews.
B: Obtain more data to improve the overall performance of the model.
C: Implement a cost-sensitive learning approach, placing a higher penalty on misclassifying 'fake' reviews.
D: Translate the reviews to another language and then back to the original language to enhance their clarity.
E: Remove the 'genuine' reviews from your training set to focus on 'fake' reviews.
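
For reference, a cost-sensitive setup of the kind option C describes can be expressed directly through class weights. A minimal scikit-learn sketch, with toy reviews invented for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy data; a real dataset would be far larger.
    reviews = ["works exactly as described, arrived on time",
               "BEST product EVER!!! buy now, amazing, perfect",
               "solid build quality, happy with the purchase",
               "five stars best seller ever amazing deal"]
    labels = ["genuine", "fake", "genuine", "fake"]

    # A higher weight on 'fake' raises the penalty for misclassifying it,
    # pushing the decision boundary toward fewer missed fakes.
    clf = make_pipeline(
        TfidfVectorizer(),
        LinearSVC(class_weight={"genuine": 1.0, "fake": 5.0}),
    )
    clf.fit(reviews, labels)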

Sentence probability (Medium)
Skills: N-Grams, Language Models

Consider the following pseudo code for calculating the probability of a sentence using a bigram language model:
[pseudo code image]
Assume that the bigram and unigram counts are as follows:

    bigram_counts = {("i", "like"): 2, ("like", "cats"): 1, ("cats", "too"): 1}
    unigram_counts = {"i": 2, "like": 2, "cats": 2, "too": 1}
    vocabulary_size = 4

What is the probability of the sentence "I like cats too" using the bigram language model?
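
Since the pseudo code in the original question is only available as an image, here is one plausible reconstruction. It assumes add-one (Laplace) smoothing, P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + V), which the vocabulary_size in the setup suggests:

    bigram_counts = {("i", "like"): 2, ("like", "cats"): 1, ("cats", "too"): 1}
    unigram_counts = {"i": 2, "like": 2, "cats": 2, "too": 1}
    vocabulary_size = 4

    def sentence_probability(words):
        # Multiply the smoothed conditional probability of each bigram.
        prob = 1.0
        for w1, w2 in zip(words, words[1:]):
            prob *= (bigram_counts.get((w1, w2), 0) + 1) / (
                unigram_counts[w1] + vocabulary_size
            )
        return prob

    print(sentence_probability("i like cats too".split()))
    # (3/6) * (2/6) * (2/6) ≈ 0.056

Under that smoothing assumption the sentence probability is 3/6 * 2/6 * 2/6 ≈ 0.056; with unsmoothed maximum-likelihood estimates it would be 2/2 * 1/2 * 1/2 = 0.25.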

Tokenization and Stemming (Easy)
Skills: Stemming

You are working on a natural language processing project and need to preprocess the text data for further analysis. Your task is to tokenize the text and apply stemming to the tokens. Assuming you have an English text corpus, which of the following combinations of tokenizer and stemmer would most likely result in the best balance between token granularity and generalization?
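
A common baseline for such a pipeline pairs NLTK's Treebank-style word tokenizer with the Porter stemmer. A minimal sketch; the sample sentence is invented, and it assumes the NLTK "punkt" tokenizer data can be downloaded:

    import nltk
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)  # tokenizer models, fetched once

    text = "The runners were running quickly through the cities."
    tokens = word_tokenize(text)               # token granularity
    stemmer = PorterStemmer()
    stems = [stemmer.stem(t) for t in tokens]  # generalization across inflections
    print(stems)
    # ['the', 'runner', 'were', 'run', 'quickli', 'through', 'the', 'citi', '.']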

Word Sense Disambiguation (Medium)

You have been provided with a pre-trained BERT model (pretrained_bert_model) and you need to perform Word Sense Disambiguation (WSD) on the word "bat" in the following sentence:

"The bat flew around the room."

You have also been provided with a function called cosine_similarity(vec1, vec2) that calculates the cosine similarity between two vectors.

Which of the following steps should you perform to disambiguate the word "bat" in the given sentence using the BERT model and cosine similarity?

1. Tokenize the sentence and pass it through the pre-trained BERT model.
2. Extract the embeddings of the word "bat" from the sentence.
3. Calculate the cosine similarity between the "bat" embeddings and each sense's representative words.
4. Choose the sense with the highest cosine similarity.
5. Calculate the Euclidean distance between the "bat" embeddings and each sense's representative words.
6. Choose the sense with the lowest Euclidean distance.
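
A minimal sketch of steps 1-4 using the Hugging Face transformers library; the model name, sense glosses, and mean-pooling choice are illustrative assumptions, not part of the question:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def cosine_similarity(vec1, vec2):
        # Same contract as the helper described in the question.
        return torch.nn.functional.cosine_similarity(vec1, vec2, dim=0).item()

    def embed(text):
        # Mean-pool the last hidden states over all tokens of a short text.
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            return model(**inputs).last_hidden_state[0].mean(dim=0)

    # Steps 1-2: contextual embedding of "bat" inside the sentence.
    sentence = "The bat flew around the room."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    bat_vec = hidden[tokens.index("bat")]

    # Steps 3-4: compare against representative words for each sense and
    # pick the sense with the highest cosine similarity.
    senses = {"animal": "flying nocturnal mammal with wings",
              "sports": "baseball club used to hit a ball"}
    best = max(senses, key=lambda s: cosine_similarity(bat_vec, embed(senses[s])))
    print(best)  # expected: "animal"
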
🧐 Question | 🔧 Skill | 💪 Difficulty | ⌛ Time
--- | --- | --- | ---
Hate Speech Detection Challenge | Text Classification, Data Imbalance, Natural Language Processing | Medium | 2 mins
Identifying Fake Reviews | Text Classification, Natural Language Processing | Easy | 2 mins
Sentence probability | N-Grams, Language Models, Natural Language Processing | Medium | 2 mins
Tokenization and Stemming | Stemming, Natural Language Processing | Easy | 2 mins
Word Sense Disambiguation | Natural Language Processing | Medium | 2 mins
