Search test library by skills or roles
⌘ K
Basic SQL Queries interview questions
1. Imagine you have a table of toys. How do you ask the computer to show you all the toys that are red?
2. Let's say you have a list of students and their ages. How would you find the oldest student?
3. If you have a table of books, how can you find all the books written by 'Author X'?
4. Suppose you have a table of customer orders. How would you list all the orders placed yesterday?
5. You have a table of products with prices. How do you find the product with the highest price?
6. If you have a table of employees, how do you list them in alphabetical order by last name?
7. Imagine a table of songs. How do you find all songs longer than 3 minutes?
8. If there is a table with customer details, and another with order information, how would you retrieve customer name along with what the customer ordered?
9. You have a table of transactions. How do you find the total amount of all transactions?
10. Given a table of events with start and end times, how can you list all events that happen on a specific date?
11. Suppose you have a list of products with their prices and quantities. How do you calculate the total value of each product (price times quantity)?
12. If you have a table of cities and their populations, how can you find the average population of all cities?
13. Imagine a table of blog posts. How do you select all the columns of data of the blogposts?
14. How would you retrieve data from the 'employees' table, but only include records where the 'department' is 'Sales'?
15. If you want to sort the 'products' table by 'price' in descending order, what SQL query would you use?
16. If you have a table of items with a category column, how do you find how many items are in each category?
17. Explain how you would find the second highest salary from an employee table.
18. Given two tables, 'customers' and 'orders', how would you combine them to show each customer's information along with their order details?
19. How do you write a query to retrieve only unique entries from a particular column in a table?
20. If you have a date column, how can you retrieve records for entries only in the last 7 days?
21. You have customer data in one table, and order data in another. How would you find all customers who have placed at least one order?
Intermediate SQL Queries interview questions
1. How would you find all employees who earn more than the average salary in their department?
2. Write a query to identify customers who have placed orders on consecutive days.
3. Can you explain how to use window functions to calculate a rolling average of sales over a 7-day period?
4. How would you retrieve the top 3 products in each category based on their sales figures?
5. Describe a SQL query to find the second highest salary in a company.
6. Explain how to identify duplicate records in a table based on specific columns.
7. Write a query to pivot a table, transforming row values into column headers.
8. How can you calculate the cumulative distribution of salaries within a company?
9. Design a query to find all departments where the average salary is higher than the company average.
10. How would you retrieve records that exist in one table but not in another, based on a common key?
11. Explain how to use Common Table Expressions (CTEs) to solve a hierarchical data problem, like finding all descendants of a node in a tree structure.
12. Write a query to identify the longest streak of consecutive days a customer has placed orders.
13. How would you implement pagination in a SQL query to retrieve records in batches?
14. Describe a scenario where you would use a recursive CTE and provide an example query.
15. Explain how to optimize a slow-running SQL query. What tools and techniques would you use?
16. How can you find the percentage change in sales from one month to the next for each product?
17. Write a query that determines if two date ranges overlap.
18. How would you find employees who have never completed a training course?
19. Explain how to handle NULL values in aggregate functions and comparisons.
20. Write a query to calculate the median salary of employees in each department.
21. How would you use a window function to rank customers based on their total spending?
22. Describe how to identify and correct data inconsistencies in a database.
23. Explain how to audit data changes in a table.
24. Write a query to transform JSON data stored in a column into relational data.
25. Explain the difference between correlated and non-correlated subqueries, and provide an example of each.
26. Write a query to identify the top 5 customers who contributed to 80% of the total sales.
27. How would you design a SQL query to find all possible pairs of employees who work in the same department?
Advanced SQL Queries interview questions
1. How would you identify the top 3 products in each category based on sales, using window functions?
2. Can you explain how to optimize a slow-performing query involving multiple joins and subqueries?
3. Describe a scenario where you'd use a recursive common table expression (CTE) and provide an example.
4. How do you handle duplicate data in a table and ensure data integrity?
5. Explain the difference between clustered and non-clustered indexes and their impact on query performance.
6. How would you design a database schema to efficiently store and retrieve hierarchical data, like an organizational chart?
7. Describe how to use partitioning to improve query performance on a very large table.
8. Explain how to implement row-level security in a database.
9. How would you audit changes made to specific tables in a database?
10. Can you describe the process of database normalization and its benefits?
11. How do you handle deadlocks in a database environment?
12. Explain the use of different isolation levels in transaction management.
13. How would you implement a full-text search feature in a database?
14. Describe how to use database triggers to enforce complex business rules.
15. How can you optimize queries that use the `LIKE` operator with wildcards?
16. Explain the difference between `UNION` and `UNION ALL` and when to use each.
17. How would you design a database schema for storing time-series data?
18. Describe different strategies for backing up and restoring a database.
19. How do you monitor database performance and identify potential bottlenecks?
20. Explain how to use stored procedures to encapsulate complex business logic.
21. How would you migrate a database from one platform to another (e.g., MySQL to PostgreSQL)?
22. Describe the concept of sharding and its advantages in a large-scale database environment.
23. Explain how to use database views to simplify complex queries and improve security.
24. How would you implement a data warehousing solution for business intelligence reporting?
25. Describe different techniques for data compression in a database.
26. Explain how to use database functions to extend the functionality of SQL.
Expert SQL Queries interview questions
1. How do you optimize a query that is running slowly due to a large number of joins?
2. Describe a scenario where using a window function would be more efficient than using a subquery. Provide an example.
3. Explain how you would diagnose and resolve a deadlock situation in a database.
4. How can you implement row-level security in SQL to restrict access to certain data based on user roles?
5. Design a SQL query to identify and remove duplicate records from a table, keeping only the most recent entry.
6. Explain the difference between clustered and non-clustered indexes, and when you would use each.
7. How would you design a database schema to efficiently store and query time-series data?
8. Describe how you would use partitioning to improve the performance of a large table. What are the considerations?
9. Write a SQL query to calculate a running total or cumulative sum of a column in a table.
10. How can you use Common Table Expressions (CTEs) to simplify complex queries? Give an example.
11. Explain how you would use SQL to perform data masking or anonymization for privacy purposes.
12. Describe how you would monitor and optimize database performance. What tools would you use?
13. How would you implement a full-text search feature in your database? What are the options and trade-offs?
14. Explain how you can use SQL to audit data changes and track who made them. Provide an example.
15. Describe a scenario where you would use a recursive CTE and provide an example query.
16. How would you handle a situation where a query is timing out due to resource constraints? What steps would you take to resolve it?
17. Explain the ACID properties of database transactions and how they ensure data integrity.
18. How do you handle null values in SQL queries to avoid unexpected results? Give examples.
19. Describe how you would use SQL to perform data validation and ensure data quality.
20. How would you optimize SQL queries that involve subqueries? What are the alternatives?
21. Explain how you would use database triggers to automate tasks or enforce business rules.
22. Design a query to find the top N records within each group, such as the top 3 customers with the highest sales in each region.
23. How can you use explain plans to understand the execution plan of a query and identify potential bottlenecks?

97 MySQL interview questions that you should ask to hire top engineers


Siddhartha Gunti Siddhartha Gunti

September 09, 2024


SQL skills are foundational across many technical roles, from Data Analysts to Database Administrators. Effectively assessing a candidate's SQL abilities is therefore, key to ensuring they can meet the demands of the role; just like identifying the right skills for a data engineer.

This blog post provides a range of SQL interview questions, categorized by skill level: basic, intermediate, advanced, and expert, in addition to MCQs. You will find carefully curated questions designed to evaluate candidates' SQL proficiency and problem-solving abilities.

By using these questions, you will confidently assess candidates' SQL knowledge and practical skills, and reduce time to hire by using Adaface's SQL assessments before the interview.

Table of contents

Basic SQL Queries interview questions
Intermediate SQL Queries interview questions
Advanced SQL Queries interview questions
Expert SQL Queries interview questions
SQL Queries MCQ
Which SQL Queries skills should you evaluate during the interview phase?
3 Tips for Using SQL Queries Interview Questions
Hire Top SQL Talent with Targeted Assessments
Download SQL Queries interview questions template in multiple formats

Basic SQL Queries interview questions

1. Imagine you have a table of toys. How do you ask the computer to show you all the toys that are red?

Assuming we're using SQL, you'd use a SELECT statement with a WHERE clause to filter by color. It would look something like this:

SELECT * FROM toys WHERE color = 'red';

This SQL query selects all columns (*) from the toys table, but only includes rows where the color column has the value 'red'.

2. Let's say you have a list of students and their ages. How would you find the oldest student?

To find the oldest student, I would iterate through the list, keeping track of the oldest student found so far and their age. For each student, I would compare their age to the age of the current oldest student. If the current student is older, I would update the oldest student. At the end of the iteration, the variable would hold the oldest student.

In Python, this can be done using a loop:

students = [{"name": "Alice", "age": 20}, {"name": "Bob", "age": 22}, {"name": "Charlie", "age": 19}]
oldest_student = students[0]
for student in students:
    if student["age"] > oldest_student["age"]:
        oldest_student = student
print(oldest_student)

3. If you have a table of books, how can you find all the books written by 'Author X'?

To find all books written by 'Author X' from a table of books, you would use a SQL query with a WHERE clause. Assuming your table is named books and has a column named author, the query would look like this:

SELECT * FROM books WHERE author = 'Author X';

This query selects all columns (*) from the books table where the value in the author column is equal to 'Author X'. The result will be a list of all books written by that author.

4. Suppose you have a table of customer orders. How would you list all the orders placed yesterday?

To list all orders placed yesterday, you would use a SQL query similar to the following:

SELECT * 
FROM Orders
WHERE OrderDate = CURRENT_DATE - INTERVAL '1 day';

This query assumes that OrderDate is the column storing the date of the order. The CURRENT_DATE - INTERVAL '1 day' portion calculates yesterday's date. If the database system uses a different date function (e.g., DATE() in MySQL or GETDATE() in SQL Server), the query would need to be adjusted accordingly. Also the above query returns just the date, but you might need to adjust the format of date-time column to just date for comparison, depends on your database system.

5. You have a table of products with prices. How do you find the product with the highest price?

To find the product with the highest price, you'd typically use SQL. Assuming your table is named products and has columns product_name and price, the query would look like this:

SELECT product_name, price
FROM products
ORDER BY price DESC
LIMIT 1;

This query orders the table by the price column in descending order and then limits the result to the first row, which contains the product with the highest price. Some databases might offer alternative ways to achieve the same result, such as using a MAX() function in a subquery, but the above approach is generally efficient and readable.

6. If you have a table of employees, how do you list them in alphabetical order by last name?

Assuming your employee table has columns like first_name and last_name, you can use an SQL query like this:

SELECT * FROM employees ORDER BY last_name ASC;

This query selects all columns (*) from the employees table and orders the results alphabetically in ascending order (ASC) based on the values in the last_name column.

7. Imagine a table of songs. How do you find all songs longer than 3 minutes?

Assuming the table is named songs and has a column duration_seconds representing the song's length in seconds, the following SQL query would retrieve all songs longer than 3 minutes (180 seconds):

SELECT * FROM songs WHERE duration_seconds > 180;

This query selects all columns (*) from the songs table where the duration_seconds column is greater than 180.

8. If there is a table with customer details, and another with order information, how would you retrieve customer name along with what the customer ordered?

To retrieve customer name along with order details from two tables, one with customer details and another with order information, you would typically use a SQL JOIN operation. Assuming the tables are named Customers and Orders, and they share a common column such as CustomerID, the query would look like this:

SELECT c.CustomerName, o.OrderDetails
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID;

This query joins the Customers table (aliased as c) with the Orders table (aliased as o) based on the CustomerID. It then selects the CustomerName from the Customers table and OrderDetails from the Orders table for each matching row.

9. You have a table of transactions. How do you find the total amount of all transactions?

To find the total amount of all transactions in a table, you would use the SUM() aggregate function in SQL.

SELECT SUM(transaction_amount) AS total_amount
FROM transactions;

Replace transaction_amount with the actual column name that holds the transaction amounts in your transactions table. The AS total_amount part is optional; it just gives a name to the resulting sum.

10. Given a table of events with start and end times, how can you list all events that happen on a specific date?

To list all events that happen on a specific date, you need to query the table and filter the results based on the event's start and end times. Assuming your table has columns start_time and end_time (both with appropriate datetime datatypes), and you want to find events happening on a date, say '2024-03-15', you can use the following SQL-like logic:

SELECT * FROM events
WHERE start_time <= '2024-03-15 23:59:59' AND end_time >= '2024-03-15 00:00:00';

This query selects all events where the start_time is before the end of the given date, and the end_time is after the beginning of the given date. This ensures that events that start before, end after, or occur entirely within the specified date are included in the results. Replace '2024-03-15' with the desired date.

11. Suppose you have a list of products with their prices and quantities. How do you calculate the total value of each product (price times quantity)?

To calculate the total value of each product, you simply multiply the price of the product by its quantity. For example, if a product has a price of $10 and a quantity of 5, the total value is $10 * 5 = $50.

If you are working with a list or array of products, you would iterate through the list and apply this calculation to each product. In Python, this could look like:

products = [{"name": "A", "price": 10, "quantity": 5}, {"name": "B", "price": 20, "quantity": 2}]

for product in products:
    total_value = product["price"] * product["quantity"]
    print(f"Total value of {product['name']}: ${total_value}")

12. If you have a table of cities and their populations, how can you find the average population of all cities?

To find the average population of all cities, you would sum the population of each city and then divide by the total number of cities. This can be done using SQL with the following steps:

  • Use the SUM() aggregate function to calculate the total population of all cities.
  • Use the COUNT(*) aggregate function to count the total number of cities.
  • Divide the total population by the total number of cities to get the average. For example:
SELECT SUM(population) / COUNT(*) AS average_population
FROM cities;

13. Imagine a table of blog posts. How do you select all the columns of data of the blogposts?

To select all columns from a table named blogposts, you would use the following SQL query:

SELECT * FROM blogposts;

The * symbol is a wildcard that represents all columns in the table. This query retrieves every column (e.g., id, title, content, author, date_published) for each row in the blogposts table.

14. How would you retrieve data from the 'employees' table, but only include records where the 'department' is 'Sales'?

To retrieve data from the 'employees' table where the 'department' is 'Sales', you would use a SQL query with a WHERE clause. Here's an example:

SELECT * FROM employees WHERE department = 'Sales';

This query selects all columns (*) from the 'employees' table, but filters the results to only include rows where the value in the 'department' column is equal to 'Sales'.

15. If you want to sort the 'products' table by 'price' in descending order, what SQL query would you use?

To sort the 'products' table by 'price' in descending order, you would use the following SQL query:

SELECT * FROM products ORDER BY price DESC;

The ORDER BY clause is used to sort the result set. DESC keyword specifies that the sorting should be in descending order. If you wanted ascending order (which is the default), you would use ASC or omit the keyword.

16. If you have a table of items with a category column, how do you find how many items are in each category?

To find how many items are in each category in a table, you would use a GROUP BY clause in SQL. This clause groups rows that have the same value in a specified column into summary rows, like counting the occurrences of each category.

For example, if your table is named items and has a column named category, the query would look like this:

SELECT category, COUNT(*) AS item_count
FROM items
GROUP BY category;

This query selects the category and the count of items (COUNT(*)) for each category. The GROUP BY category clause ensures that the count is calculated separately for each distinct category.

17. Explain how you would find the second highest salary from an employee table.

To find the second highest salary from an employee table, you can use SQL. One common approach involves using a subquery or window functions. Using a subquery, you'd first find the maximum salary, and then select the maximum salary that is less than the maximum salary.

SELECT MAX(salary) FROM employee WHERE salary < (SELECT MAX(salary) FROM employee);

Alternatively, using window functions:

SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as rank_num FROM employee) AS ranked_salaries WHERE rank_num = 2 LIMIT 1;

This approach ranks the salaries in descending order and selects the salary with rank 2.

18. Given two tables, 'customers' and 'orders', how would you combine them to show each customer's information along with their order details?

To combine the customers and orders tables and show customer information along with order details, you'd typically use a JOIN operation in SQL. Specifically, a LEFT JOIN or INNER JOIN are common choices depending on the desired result. An INNER JOIN will only return rows where there's a match in both tables. A LEFT JOIN will return all rows from the customers table and the matching rows from the orders table. If there's no matching order for a customer, the order details columns will contain NULL values.

Here's an example SQL query assuming both tables have a common column named customer_id:

SELECT
  customers.*,
  orders.*
FROM
  customers
LEFT JOIN
  orders ON customers.customer_id = orders.customer_id;

19. How do you write a query to retrieve only unique entries from a particular column in a table?

To retrieve only unique entries from a particular column in a table, you can use the DISTINCT keyword in SQL. The basic syntax is:

SELECT DISTINCT column_name FROM table_name;

This query will return a list of all the distinct values present in the specified column, effectively removing any duplicate entries. For example, if you have a table named 'Customers' and you want to retrieve all the unique cities from the 'City' column, the query would be SELECT DISTINCT City FROM Customers;

20. If you have a date column, how can you retrieve records for entries only in the last 7 days?

To retrieve records from a date column for entries only in the last 7 days, you would typically use a WHERE clause in your SQL query that compares the date column to the current date minus 7 days.

For example, assuming your date column is named date_column and your table is named your_table, the SQL query would look like this:

SELECT * FROM your_table WHERE date_column >= DATE('now', '-7 days');

This query selects all records from your_table where the date_column is within the last 7 days, including today.

21. You have customer data in one table, and order data in another. How would you find all customers who have placed at least one order?

To find all customers who have placed at least one order, I would use a JOIN operation between the customer and order tables. Assuming the customer table has a customer_id and the order table has a customer_id referencing the customer, a SQL query would look like this:

SELECT DISTINCT c.customer_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;

This query joins the two tables based on the customer_id. The DISTINCT keyword ensures that each customer is only listed once, even if they have placed multiple orders. If you don't have SQL you can also use pandas/spark or any other data processing tool to achieve the same result using similar logic.

Intermediate SQL Queries interview questions

1. How would you find all employees who earn more than the average salary in their department?

To find all employees who earn more than the average salary in their department, you can use SQL. The general approach involves using a subquery or a window function.

Here's an example using a subquery:

SELECT employee_name, salary, department
FROM employees e
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department = e.department
);

Alternatively, using a window function:

SELECT employee_name, salary, department
FROM (
    SELECT employee_name, salary, department, AVG(salary) OVER (PARTITION BY department) as avg_dept_salary
    FROM employees
) AS subquery
WHERE salary > avg_dept_salary;

Both queries achieve the same goal, but the window function approach can sometimes be more efficient for larger datasets.

2. Write a query to identify customers who have placed orders on consecutive days.

To identify customers who have placed orders on consecutive days, you can use a self-join or window functions. Here's an example using SQL with window functions:

SELECT DISTINCT customer_id
FROM (
    SELECT 
        customer_id,
        order_date,
        LAG(order_date, 1, order_date) OVER (PARTITION BY customer_id ORDER BY order_date) as prev_order_date
    FROM 
        orders
) sub
WHERE DATEDIFF(order_date, prev_order_date) = 1;

This query uses the LAG window function to get the previous order date for each customer. Then, it filters for rows where the difference between the current order date and the previous order date is exactly 1 day. The outer SELECT DISTINCT ensures each customer is only listed once.

3. Can you explain how to use window functions to calculate a rolling average of sales over a 7-day period?

To calculate a rolling 7-day average of sales using window functions, you would typically use SQL. The key is the AVG() aggregate function combined with the OVER() clause to define the window. Within OVER(), you specify ORDER BY to define the order and ROWS BETWEEN to define the window frame, in this case, 6 preceding days and the current day.

Here's an example SQL query:

SELECT
    sale_date,
    sales_amount,
    AVG(sales_amount) OVER (ORDER BY sale_date ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_avg
FROM
    sales_table;

This query calculates the average sales_amount for each sale_date, considering the current day and the 6 days before it.

4. How would you retrieve the top 3 products in each category based on their sales figures?

To retrieve the top 3 products in each category based on sales figures, I'd typically use a window function in SQL. Assuming you have a table named products with columns like category_id, product_id, and sales, the query would look like this:

SELECT category_id, product_id, sales
FROM (
  SELECT category_id, product_id, sales,
         RANK() OVER (PARTITION BY category_id ORDER BY sales DESC) as sales_rank
  FROM products
) ranked_products
WHERE sales_rank <= 3;

This query first ranks products within each category based on sales in descending order using the RANK() window function. Then, it selects only those products where the rank is less than or equal to 3, effectively giving you the top 3 products per category. The PARTITION BY clause ensures that the ranking is done separately for each category.

5. Describe a SQL query to find the second highest salary in a company.

To find the second highest salary, you can use a subquery or window functions.

Using a subquery:

SELECT MAX(salary)
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee);

Using window functions (more robust for handling edge cases like ties):

SELECT salary
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as salary_rank
    FROM employee
) AS ranked_salaries
WHERE salary_rank = 2;

The window function approach is generally preferred because it can handle scenarios where multiple employees might have the same highest salary, ensuring that you still retrieve the second distinct salary.

6. Explain how to identify duplicate records in a table based on specific columns.

To identify duplicate records in a table based on specific columns, you can use SQL queries. The basic idea is to group the data by the columns you want to check for duplicates and then count the occurrences of each group. If any group has a count greater than 1, it indicates duplicate records.

Here's a general SQL approach:

SELECT column1, column2, COUNT(*) 
FROM table_name
GROUP BY column1, column2
HAVING COUNT(*) > 1;

Replace column1, column2 with the names of the columns you want to check for duplicates. The HAVING COUNT(*) > 1 clause filters the results to show only those groups (combinations of column1 and column2 values) that appear more than once in the table.

7. Write a query to pivot a table, transforming row values into column headers.

Pivoting data transforms row values into column headers. The exact syntax varies depending on the database system.

For example, in SQL Server, you could use the PIVOT operator:

SELECT * FROM
(
    SELECT category, product, sales FROM SalesTable
) AS SourceTable
PIVOT
(
    SUM(sales)
    FOR category IN ([Category1], [Category2], [Category3]) -- specify all possible categories
) AS PivotTable;

In other databases like MySQL, you might need to use conditional aggregation:

SELECT
    product,
    SUM(CASE WHEN category = 'Category1' THEN sales ELSE 0 END) AS Category1,
    SUM(CASE WHEN category = 'Category2' THEN sales ELSE 0 END) AS Category2,
    SUM(CASE WHEN category = 'Category3' THEN sales ELSE 0 END) AS Category3
FROM
    SalesTable
GROUP BY product;

8. How can you calculate the cumulative distribution of salaries within a company?

To calculate the cumulative distribution of salaries, you need a dataset of all salaries. First, sort the salaries in ascending order. Then, for each salary, calculate the cumulative probability by dividing its rank (position in the sorted list) by the total number of salaries. This result represents the proportion of employees earning at or below that salary. The result is the cumulative distribution.

For example, if you have 100 salaries, the employee with the lowest salary has a cumulative probability of 1/100 = 0.01, meaning 1% of employees earn at or below that salary. The employee in the middle (50th position) has a cumulative probability of 50/100 = 0.5, or 50%. The highest paid employee would have a cumulative probability of 1.0 (100%). This can be easily visualized with a cumulative distribution function (CDF) plot.

9. Design a query to find all departments where the average salary is higher than the company average.

SELECT department
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);

This query first calculates the average salary for each department using GROUP BY and AVG(salary). Then, the HAVING clause filters these departments, keeping only those where the average salary is greater than the overall company's average salary (calculated by the subquery SELECT AVG(salary) FROM employees).

10. How would you retrieve records that exist in one table but not in another, based on a common key?

To retrieve records that exist in one table (Table A) but not in another (Table B) based on a common key, you can use a LEFT JOIN or a NOT EXISTS clause in SQL. A LEFT JOIN includes all rows from the left table (Table A) and the matching rows from the right table (Table B). By filtering for rows where the right table's join column is NULL, you can identify records present only in Table A.

Alternatively, the NOT EXISTS clause checks for the absence of matching records in Table B for each record in Table A. This approach is often more performant, especially with proper indexing.

Example using LEFT JOIN:

SELECT A.*
FROM TableA A
LEFT JOIN TableB B ON A.CommonKey = B.CommonKey
WHERE B.CommonKey IS NULL;

Example using NOT EXISTS:

SELECT A.*
FROM TableA A
WHERE NOT EXISTS (SELECT 1 FROM TableB B WHERE A.CommonKey = B.CommonKey);

11. Explain how to use Common Table Expressions (CTEs) to solve a hierarchical data problem, like finding all descendants of a node in a tree structure.

CTEs provide a clean and efficient way to traverse hierarchical data. For finding all descendants of a node, a recursive CTE can be used. The CTE consists of two parts: an anchor member and a recursive member. The anchor member selects the initial node (the root node for finding all descendants). The recursive member then joins the CTE with the original table to find the children of the previously selected nodes.

Here's a general example:

WITH RECURSIVE Descendants AS (
    SELECT id, parent_id, name
    FROM your_table
    WHERE id = 'target_node_id' -- Anchor member: start from the target node

    UNION ALL

    SELECT t.id, t.parent_id, t.name
    FROM your_table t
    INNER JOIN Descendants d ON t.parent_id = d.id -- Recursive member: find children
)
SELECT * FROM Descendants;

your_table contains the hierarchical data, id is the node identifier, and parent_id represents the parent of a given node. The initial SELECT statement fetches the target node. The UNION ALL combines the initial selection with the results of the recursive step. The recursive step joins your_table with the CTE (Descendants) to find child nodes. The recursion continues until no more descendants are found.

12. Write a query to identify the longest streak of consecutive days a customer has placed orders.

To identify the longest streak of consecutive days a customer has placed orders, you can use SQL. Here's a general approach:

WITH RankedOrders AS (
 SELECT
 customer_id,
 order_date,
 ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
 FROM
 orders
 GROUP BY customer_id, order_date -- handles multiple orders on the same day
),
ConsecutiveDays AS (
 SELECT
 customer_id,
 order_date,
 DATE(order_date, '-' || (rn - 1) || ' days') AS date_group
 FROM
 RankedOrders
),
Streaks AS (
 SELECT
 customer_id,
 date_group,
 COUNT(*) AS streak_length
 FROM
 ConsecutiveDays
 GROUP BY
 customer_id, date_group
)
SELECT
 customer_id,
 MAX(streak_length) AS longest_streak
FROM
 Streaks
GROUP BY customer_id
ORDER BY longest_streak DESC;

First, we assign a rank to each order date for each customer. Then, we calculate a date_group by subtracting the rank minus one from the order date. Consecutive dates will have the same date_group. Finally, we group by customer_id and date_group to calculate the streak_length and find the maximum streak for each customer.

13. How would you implement pagination in a SQL query to retrieve records in batches?

Pagination in SQL is typically implemented using the LIMIT and OFFSET clauses. LIMIT specifies the maximum number of rows to return, while OFFSET specifies the number of rows to skip from the beginning of the result set.

For example, to retrieve records 11-20 (a batch of 10 records starting from the 11th record), the query would look like this:

SELECT * FROM table_name ORDER BY id LIMIT 10 OFFSET 10;

In this query LIMIT 10 restricts the number of returned rows to 10, and OFFSET 10 skips the first 10 rows. This is suitable for a single database. Use a keyset pagination method if you want to scale to multiple DB shards.

14. Describe a scenario where you would use a recursive CTE and provide an example query.

A recursive CTE is useful when you need to query hierarchical or tree-structured data. For example, imagine an employee table where each employee can report to another employee (forming a management hierarchy). If you need to find all subordinates of a given manager, a recursive CTE is an efficient approach.

WITH RECURSIVE EmployeeHierarchy AS (
 SELECT id, name, manager_id, 0 AS level
 FROM Employees
 WHERE id = 'specific_manager_id' -- Start with the specific manager

 UNION ALL

 SELECT e.id, e.name, e.manager_id, eh.level + 1
 FROM Employees e
 INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.id -- Find subordinates
) 
SELECT id, name, level FROM EmployeeHierarchy;

This query starts with a specific manager and then recursively joins the Employees table with the CTE to find all employees who report to someone already in the CTE, effectively traversing the hierarchy downwards. The level column indicates the depth of each employee in the hierarchy relative to the initial manager. The query will continue until it reaches the bottom of the hierarchy (employees with no subordinates).

15. Explain how to optimize a slow-running SQL query. What tools and techniques would you use?

To optimize a slow-running SQL query, I'd start by identifying the bottleneck. I'd use tools like the database's query execution plan analyzer (e.g., EXPLAIN in MySQL or PostgreSQL) to understand how the database is executing the query and pinpoint slow operations, such as full table scans or inefficient joins. Common techniques include:

  • Index Optimization: Ensuring appropriate indexes exist on columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses. Adding, removing, or modifying indexes based on the query's execution plan can significantly improve performance.
  • Query Rewriting: Restructuring the query to be more efficient. This might involve simplifying complex WHERE clauses, avoiding SELECT *, using JOINs instead of subqueries (or vice-versa depending on the context), and optimizing GROUP BY and ORDER BY operations.
  • Data Partitioning: For large tables, partitioning data can reduce the amount of data scanned by the query.
  • Caching: Implementing caching mechanisms (e.g., using database-level caching or an external caching layer like Redis) to store frequently accessed data.
  • Hardware Upgrades: If the database server is resource-constrained (CPU, memory, disk I/O), upgrading the hardware can improve overall performance.
  • Statistics Updates: Ensuring that the database optimizer has up-to-date statistics about the data distribution in tables. These statistics are used to generate efficient execution plans.
  • Analyze Query Patterns: Identifying common slow queries and optimizing them or re-designing related functionality.

I would also use database monitoring tools to observe query performance over time and identify any recurring issues. I would profile the query with the execution plan frequently to observe differences as I implemented new indexes or re-wrote the query.

16. How can you find the percentage change in sales from one month to the next for each product?

To find the percentage change in sales from one month to the next for each product, you can use the following formula:

  • Percentage Change = ((Current Month Sales - Previous Month Sales) / Previous Month Sales) * 100

You would apply this formula to each product individually. For example, if Product A had sales of 100 in January and 120 in February, the percentage change would be ((120 - 100) / 100) * 100 = 20%. This calculation would be repeated for each product and each month-to-month comparison.

17. Write a query that determines if two date ranges overlap.

To determine if two date ranges overlap, you can use the following logic. Let's say we have two ranges, Range A (start_a, end_a) and Range B (start_b, end_b). The ranges overlap if and only if: start_a <= end_b AND start_b <= end_a. This condition verifies that the starting point of either range falls before the ending point of the other range.

Here's an example SQL query illustrating this:

SELECT 
    CASE 
        WHEN start_a <= end_b AND start_b <= end_a THEN 'Overlap' 
        ELSE 'No Overlap' 
    END AS OverlapStatus
FROM 
    YourTable;

18. How would you find employees who have never completed a training course?

To find employees who haven't completed a specific training course, you'd typically query a database with employee and training information. Assuming you have an employees table and a training_completions table, you can use a LEFT JOIN and check for NULL values in the training_completions table.

For example, in SQL:

SELECT e.employee_id, e.employee_name
FROM employees e
LEFT JOIN training_completions tc ON e.employee_id = tc.employee_id AND tc.course_id = 'specific_course_id'
WHERE tc.employee_id IS NULL;

This query retrieves all employees from the employees table and attempts to match them with records in the training_completions table for the specified course. The WHERE clause filters for cases where there's no matching record in training_completions (i.e., tc.employee_id IS NULL), indicating the employee hasn't completed the course.

19. Explain how to handle NULL values in aggregate functions and comparisons.

Aggregate functions (like SUM, AVG, COUNT, MIN, MAX) typically handle NULL values by excluding them from the calculation. For example, SUM(column_with_nulls) will sum only the non-NULL values in that column. COUNT(*) counts all rows, including those with NULLs, while COUNT(column_name) counts only rows where column_name is not NULL.

Comparisons with NULL always evaluate to UNKNOWN (which is treated as neither true nor false in SQL). Therefore, you cannot use =, !=, <, >, etc. to directly compare with NULL. Instead, use IS NULL and IS NOT NULL to check for NULL values. For example, WHERE column_name IS NULL will select rows where column_name contains a NULL value.

20. Write a query to calculate the median salary of employees in each department.

Calculating the median salary requires different approaches depending on the database system. Here's a general outline using SQL and some common approaches:

WITH DepartmentSalaries AS (
 SELECT
 department_id,
 salary,
 ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary) AS row_num,
 COUNT(*) OVER (PARTITION BY department_id) AS total_employees
 FROM
 employees
),
MedianCalculator AS (
 SELECT
 department_id,
 AVG(salary) AS median_salary
 FROM
 DepartmentSalaries
 WHERE
 row_num BETWEEN (total_employees + 1) / 2 AND (total_employees + 2) / 2
 GROUP BY
 department_id
)
SELECT
 mc.department_id,
 mc.median_salary
FROM
MedianCalculator mc;

This SQL code calculates the median salary for each department by first assigning a row number to each employee within their department based on salary. Then, it calculates the median by averaging the salary of the middle one or two employees (depending on whether the total number of employees is odd or even). This approach provides a general solution that can be adapted to various SQL databases.

21. How would you use a window function to rank customers based on their total spending?

To rank customers based on their total spending using a window function, you'd typically use the RANK(), DENSE_RANK(), or ROW_NUMBER() functions in SQL. These functions allow you to assign a rank to each row within a partition of your result set, without grouping the rows.

For example, assuming you have a table named orders with columns customer_id and order_amount, and you want to rank customers based on their total spending, you could use the following SQL query:

SELECT
  customer_id,
  SUM(order_amount) AS total_spending,
  RANK() OVER (ORDER BY SUM(order_amount) DESC) AS spending_rank
FROM
  orders
GROUP BY
  customer_id;

This query calculates the sum of order_amount for each customer_id, and then the RANK() function assigns a rank based on the total spending, with the customer having the highest spending getting rank 1. DENSE_RANK() would assign consecutive ranks without gaps, even if there are ties in total_spending. ROW_NUMBER() assigns a unique rank to each row.

22. Describe how to identify and correct data inconsistencies in a database.

Identifying and correcting data inconsistencies involves several steps. First, data profiling helps understand data patterns and identify anomalies like out-of-range values or unusual distributions. Data inconsistencies can also be detected by implementing data validation rules at the application or database level (e.g., using constraints, triggers) to prevent invalid data entry. Then, perform data quality audits, by comparing data across different sources or within the same database using SQL queries, reporting tools, or dedicated data quality software. This involves looking for duplicate records, orphaned records, and conflicting values. Finally, use tools or custom scripts to perform data cleansing and transformation and correct discrepancies, such as data deduplication, standardization of formats, and resolving conflicting values based on business rules or source of truth.

Correcting inconsistencies requires choosing the right approach based on the specific issue. This might involve updating records manually, using SQL scripts to batch update data, or creating more complex data transformation pipelines. Always back up data before making any changes and document the correction process.

23. Explain how to audit data changes in a table.

Auditing data changes in a table involves tracking who made what changes and when. A common approach is to use triggers. You create a trigger that fires on INSERT, UPDATE, or DELETE operations on the table you want to audit. This trigger then inserts a record into an audit table.

The audit table typically includes the following columns:

  • table_name: The name of the table being audited.
  • column_name: The name of the column changed.
  • primary_key_value: The primary key of the affected row.
  • old_value: The value before the change.
  • new_value: The value after the change.
  • changed_by: The user who made the change.
  • changed_at: The timestamp of the change.

Here's an example of a trigger:

CREATE TRIGGER audit_employee
AFTER UPDATE ON employee
FOR EACH ROW
BEGIN
  INSERT INTO employee_audit (
    table_name,
    column_name,
    primary_key_value,
    old_value,
    new_value,
    changed_by,
    changed_at
  )
  VALUES (
    'employee',
    'salary',
    OLD.id,
    OLD.salary,
    NEW.salary,
    USER(),
    NOW()
  );
END;

Some databases also offer built-in auditing features that can simplify this process.

24. Write a query to transform JSON data stored in a column into relational data.

The specific query depends heavily on the database system you're using and the structure of your JSON data. However, the general approach involves using database-specific JSON functions to extract values from the JSON column and map them to relational columns.

For example, in PostgreSQL:

SELECT
  json_data ->> 'field1' AS column1,
  json_data ->> 'field2' AS column2
FROM
  your_table;

Where json_data is the name of the column containing JSON, and field1, field2 are keys within the JSON object. The ->> operator extracts the value as text. Similar functions exist in other databases like MySQL (JSON_EXTRACT), SQL Server (JSON_VALUE), and others. Consult the documentation for your specific database for the correct functions and syntax.

25. Explain the difference between correlated and non-correlated subqueries, and provide an example of each.

A correlated subquery depends on the outer query for its values. The inner query executes once for each row processed by the outer query. In contrast, a non-correlated subquery is independent and executes only once. Its result is then used by the outer query.

Example:

  • Non-correlated: SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees); - The subquery calculates the average salary once.
  • Correlated: SELECT * FROM employees e WHERE salary > (SELECT AVG(salary) FROM departments WHERE department_id = e.department_id); - The subquery calculates the average salary for each employee's department.

26. Write a query to identify the top 5 customers who contributed to 80% of the total sales.

WITH CustomerSales AS (
 SELECT
 customer_id,
 SUM(sale_amount) AS total_sales
 FROM
 sales_table
 GROUP BY
 customer_id
),
RankedSales AS (
 SELECT
 customer_id,
 total_sales,
 SUM(total_sales) OVER (ORDER BY total_sales DESC) AS cumulative_sales,
 SUM(total_sales) OVER () AS total_overall_sales
 FROM
 CustomerSales
),
PercentageSales AS (
 SELECT
 customer_id,
 total_sales,
 cumulative_sales,
 total_overall_sales,
 (cumulative_sales * 1.0 / total_overall_sales) * 100 AS cumulative_percentage
 FROM
 RankedSales
)
SELECT
 customer_id,
 total_sales
FROM
 PercentageSales
WHERE
 cumulative_percentage <= 80
ORDER BY
 total_sales DESC
LIMIT 5;

This SQL query identifies the top 5 customers contributing to 80% of total sales. It calculates each customer's total sales, then cumulatively sums the sales in descending order to determine the percentage contribution. Finally, it filters for customers whose cumulative percentage is less than or equal to 80% and selects the top 5 by total sales.

27. How would you design a SQL query to find all possible pairs of employees who work in the same department?

To find all possible pairs of employees who work in the same department, you can use a self-join. Here's the SQL query:

SELECT e1.employee_id, e2.employee_id
FROM employees e1
JOIN employees e2 ON e1.department_id = e2.department_id AND e1.employee_id < e2.employee_id;

This query joins the employees table to itself (e1 and e2) based on the condition that their department_id values are equal. The e1.employee_id < e2.employee_id condition prevents duplicate pairs (e.g., (A, B) and (B, A)) and also avoids pairing an employee with themselves.

Advanced SQL Queries interview questions

1. How would you identify the top 3 products in each category based on sales, using window functions?

To identify the top 3 products in each category based on sales using window functions in SQL, you can use the RANK() or DENSE_RANK() function. First, partition the data by category. Then, order the products within each category by sales in descending order. Assign a rank to each product based on its sales within its category. Finally, filter the result to include only those products with a rank less than or equal to 3.

For example, if you're using SQL:

WITH RankedSales AS (
 SELECT
 category,
 product_name,
 sales,
 RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS sales_rank
 FROM
 sales_table
)
SELECT
 category,
 product_name,
 sales
FROM
 RankedSales
WHERE
 sales_rank <= 3;

2. Can you explain how to optimize a slow-performing query involving multiple joins and subqueries?

To optimize a slow query with multiple joins and subqueries, start by analyzing the query execution plan using tools like EXPLAIN. Identify the most expensive operations, such as full table scans or poorly indexed joins. Focus on optimizing the worst offenders first. Index relevant columns used in JOIN conditions and WHERE clauses. Rewrite subqueries as joins where possible or consider using temporary tables to store intermediate results. Check statistics and update them to ensure the query optimizer has the best information.

Further optimizations may include rewriting the query to minimize the amount of data processed, filtering data as early as possible, or using appropriate data types. Partitioning large tables can also improve performance. Finally, consider denormalizing the database schema if read performance is critical and write performance is less of a concern. For example if using MySQL, check slow_query_log and also performance_schema for bottlenecks and specific query insights.

3. Describe a scenario where you'd use a recursive common table expression (CTE) and provide an example.

A recursive CTE is useful when you need to traverse a hierarchical data structure, such as an organizational chart, a bill of materials, or a graph of relationships. For example, imagine you have a table Employee with columns EmployeeID and ManagerID. You want to find all employees who report, directly or indirectly, to a specific manager.

Here's an example:

WITH RECURSIVE EmployeeHierarchy AS (
    -- Anchor member: select the initial manager
    SELECT EmployeeID, ManagerID, EmployeeID AS RootManagerID
    FROM Employee
    WHERE ManagerID IS NULL  -- Assuming top manager's ManagerID is NULL

    UNION ALL

    -- Recursive member: find subordinates of the current level
    SELECT e.EmployeeID, e.ManagerID, eh.RootManagerID
    FROM Employee e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)

SELECT EmployeeID FROM EmployeeHierarchy WHERE RootManagerID = 123; --Example:Find all under ManagerID 123

4. How do you handle duplicate data in a table and ensure data integrity?

To handle duplicate data and ensure integrity, I'd first identify the duplicates. This can be done using SQL queries with GROUP BY and HAVING COUNT(*) > 1 clauses to find rows with identical values in key columns. Then, I'd choose a strategy for dealing with them. Options include:

  • Removing duplicates: Use ROW_NUMBER() and a common table expression (CTE) to partition the data, assign a unique number to each row within the partition, and then delete rows with row_number > 1.
  • Preventing duplicates: Implement unique constraints or indexes on the relevant columns during table creation or alteration. This will prevent future insertion of duplicate rows.
  • Data validation: Implement validation rules in the application layer or using database triggers to catch and handle potential duplicates before they are inserted. Another approach is to implement MERGE statements for upsert logic where existing records are updated or new ones are created.

5. Explain the difference between clustered and non-clustered indexes and their impact on query performance.

Clustered indexes determine the physical order of data in a table. A table can only have one clustered index. Because the data is physically sorted according to the clustered index, queries that use the clustered index to retrieve data are generally very fast. Think of it like a phone book, where the data is physically ordered by last name.

Non-clustered indexes, on the other hand, are like a separate lookup table that contains a pointer to the actual data rows. A table can have multiple non-clustered indexes. When a query uses a non-clustered index, the database engine first finds the row in the index, and then uses the pointer to retrieve the actual data from the table. This can be slower than using a clustered index because of the extra step required to retrieve the data. However, non-clustered indexes can still significantly improve query performance, especially for queries that only need to retrieve a few columns of data. For example, imagine a library catalog; it can quickly point you to the location of a book without having to scan every book on the shelves.

6. How would you design a database schema to efficiently store and retrieve hierarchical data, like an organizational chart?

There are a few common ways to model hierarchical data in a relational database. An adjacency list model uses a simple self-referencing foreign key. A table would contain columns like id, name, and parent_id. parent_id would reference the id of the parent node, or be null for the root. Retrieval is straightforward for direct children, but becomes complex for deeper hierarchies requiring recursive queries or stored procedures.

Alternatively, a materialized path model stores the entire path to a node as a string (e.g., '1/5/12'). This allows for efficient retrieval of descendants using LIKE clauses. However, updates (insertions/deletions) require updating many rows to maintain path integrity. Another approach is the nested set model which uses left and right values to represent the tree structure. Retrieving descendants is efficient (e.g., WHERE left BETWEEN node.left AND node.right), but insertions/deletions are complex and computationally expensive. The optimal choice depends on the frequency of reads versus writes and the depth of the hierarchy.

7. Describe how to use partitioning to improve query performance on a very large table.

Partitioning a large table divides it into smaller, more manageable pieces based on a chosen partitioning key (e.g., date, region). This improves query performance by allowing the database to scan only the relevant partitions instead of the entire table. This drastically reduces I/O operations and CPU usage. For example, a table of customer orders partitioned by order date enables queries for orders in a specific month to only scan that month's partition.

Different partitioning strategies exist (range, list, hash), each suited for different use cases. Range partitioning (e.g., by date) is common for time-series data. Proper partition key selection is critical; frequently queried columns are good candidates. Query predicates should include the partition key to achieve optimal performance benefits. Also remember to consider maintenance (adding/removing partitions) which may impact performance.

8. Explain how to implement row-level security in a database.

Row-level security (RLS) restricts data access at the row level, enabling users to see only the data relevant to them. Implementation typically involves creating security policies associated with database tables. These policies define filter predicates that determine which rows a user can access based on attributes like user ID, group membership, or other relevant criteria.

Common methods include using database views with WHERE clauses based on user context (less secure, harder to manage), or implementing dedicated RLS features (if available) like in PostgreSQL with CREATE POLICY. RLS policies are generally more efficient and robust since the filtering is handled by the database engine during query execution. For example, in PostgreSQL:

CREATE POLICY user_policy ON my_table
FOR SELECT
TO app_user
USING (user_id = current_user);

9. How would you audit changes made to specific tables in a database?

To audit changes to specific tables, several approaches can be used. One common method is to use database triggers. A trigger can be set up to automatically execute a function whenever data is inserted, updated, or deleted in the target table. This trigger function captures the old and new values of the changed row and inserts them into an audit table. The audit table typically includes columns for the table name, the type of operation (insert, update, delete), the timestamp of the change, the user who made the change, and the changed data itself.

Another method involves using database-specific audit features. Many modern databases offer built-in audit capabilities that allow you to track changes to data and database objects. For example, in PostgreSQL, you can use extensions like pgaudit. These features offer more robust and efficient auditing compared to triggers. Choosing between triggers and native auditing depends on the specific database system, the level of detail required, and the performance considerations.

10. Can you describe the process of database normalization and its benefits?

Database normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It typically involves dividing databases into two or more tables and defining relationships between the tables. This is achieved by following a series of normal forms (1NF, 2NF, 3NF, etc.), each with specific rules to eliminate different types of redundancy.

The benefits of normalization include reduced data redundancy, improved data consistency, easier data modification, and faster query performance in some cases. By minimizing redundant data, updates only need to be made in one place, preventing inconsistencies. This also leads to a smaller database size and improved data integrity, as data is stored logically and consistently across the database.

11. How do you handle deadlocks in a database environment?

Deadlocks occur when two or more transactions are blocked indefinitely, waiting for each other to release the resources that they hold. We can handle deadlocks using several strategies including:

  • Deadlock prevention: This involves designing the system to prevent deadlocks from occurring in the first place. This can include ordering resource access, acquiring all necessary locks at once, or using timeouts to release locks.
  • Deadlock detection and recovery: This approach allows deadlocks to occur, but then detects them and takes action to resolve them. Databases often have a deadlock detection mechanism that identifies cycles in the 'waits-for' graph. Once a deadlock is detected, one of the transactions involved (the 'victim') is rolled back, releasing its locks and allowing other transactions to proceed. The victim transaction may then be retried. This approach can lead to 'livelock' where a transaction is repeatedly rolled back and retried. In practice, databases typically employ this detection and recovery because prevention can be restrictive and impact performance. Setting appropriate lock timeouts is also vital.

12. Explain the use of different isolation levels in transaction management.

Transaction isolation levels control the degree to which transactions are isolated from one another's modifications. Lower isolation levels permit higher concurrency but increase the risk of phenomena like dirty reads, non-repeatable reads, and phantom reads. Common isolation levels include: Read Uncommitted (lowest level, allows dirty reads), Read Committed (prevents dirty reads), Repeatable Read (prevents dirty reads and non-repeatable reads), and Serializable (highest level, prevents all the mentioned phenomena by essentially serializing transactions).

13. How would you implement a full-text search feature in a database?

To implement full-text search, I would typically use a dedicated full-text search engine like Elasticsearch or Solr alongside the database. These engines index the text data, allowing for efficient searching based on keywords, stemming, and other text analysis techniques. The database would then be used to store the core data, with the search engine handling text-based queries. When a search query is received, it's sent to the search engine, which returns the relevant document IDs. These IDs are then used to retrieve the full records from the database.

Alternatively, some databases offer built-in full-text search capabilities. For instance, PostgreSQL has tsvector and tsquery types along with functions for indexing and querying text data. MySQL has FULLTEXT indexes. While simpler to set up initially, these built-in features might not offer the same level of performance or advanced features as dedicated search engines for large datasets or complex search requirements.

14. Describe how to use database triggers to enforce complex business rules.

Database triggers can be used to enforce complex business rules by automatically executing SQL code in response to specific database events such as INSERT, UPDATE, or DELETE operations on a table. For example, you might use a trigger to prevent an order from being created if the customer's credit limit is exceeded, or to automatically update an inventory count when a new order is placed.

To implement a complex rule, you would typically define a trigger that fires before the relevant data modification. The trigger logic then checks if the proposed change violates the business rule. If it does, the trigger can ROLLBACK the transaction, preventing the change. SQL code example:

CREATE TRIGGER check_credit_limit
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
  IF (SELECT credit_limit FROM Customers WHERE customer_id = NEW.customer_id) < NEW.order_total THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Credit limit exceeded';
  END IF;
END;

15. How can you optimize queries that use the `LIKE` operator with wildcards?

Optimizing LIKE queries with wildcards depends on the wildcard's position. Leading wildcards (%value) prevent index usage, requiring a full table scan, which is inefficient.

To optimize:

  • Avoid leading wildcards: If possible, restructure the query to avoid % at the beginning. For example, instead of LIKE '%value', consider LIKE 'value%' which can use an index.
  • Full-text search: For complex wildcard searches, consider using full-text search capabilities offered by your database (e.g., MySQL's MATCH AGAINST, PostgreSQL's full-text search). These are optimized for text searching.
  • Database-specific optimizations: Explore database-specific features and indexing strategies. For example, some databases offer trigram indexes or other techniques that can improve LIKE performance, regardless of wildcard position.
  • Limit result set: Add LIMIT clause to reduce the number of rows searched/returned, when only a limited number of result is sufficient.
  • Use CONTAINS instead of LIKE in certain scenarios (e.g., SQL Server provides CONTAINS which can be more efficient for full-text searches if properly configured).

16. Explain the difference between `UNION` and `UNION ALL` and when to use each.

The UNION and UNION ALL operators in SQL are used to combine the result sets of two or more SELECT statements. The key difference is that UNION removes duplicate rows from the combined result set, while UNION ALL includes all rows, including duplicates.

Use UNION when you need a distinct set of rows from multiple sources and duplicate rows are not desired. Use UNION ALL when you want to combine all rows from multiple sources, regardless of duplicates, and performance is a concern, as removing duplicates can be a costly operation. For example, if you are archiving data and want to add one table's values into another table to create an archive, then UNION ALL should be used.

17. How would you design a database schema for storing time-series data?

A common approach is to use a wide-column store or a relational database with a specific schema. For relational databases, a simple schema could include a table with columns for timestamp, metric_name, and value. Partitioning the table by metric_name and/or time ranges (e.g., monthly) significantly improves query performance. Indexing timestamp is also crucial. For a larger scale implementation, consider using specialized time-series databases like InfluxDB or TimescaleDB, which offer optimized storage and querying capabilities for time-series data.

An alternative, often more scalable, approach involves a wide-column store, such as Cassandra or HBase. The row key would consist of the metric_name and timestamp, and the column value would hold the actual data. This allows for efficient retrieval of data for a specific metric over a time range. Data aggregation can also be performed using map-reduce jobs or other distributed computing frameworks if needed.

18. Describe different strategies for backing up and restoring a database.

Database backup and restoration are crucial for data protection and disaster recovery. Common backup strategies include: 1. Full backups: Copy the entire database. They are simple but time-consuming. 2. Differential backups: Back up only the changes since the last full backup. Restoration requires the last full and the latest differential. 3. Incremental backups: Back up only the changes since the last backup (full or incremental). Restoration requires the last full and all subsequent incremental backups in order. 4. Transaction log backups: Capture all transaction logs since the last full, differential, or transaction log backup. Used for point-in-time recovery. Restoration often involves restoring the full backup and then replaying the transaction logs. 5. Snapshot backups: Create a point-in-time copy using storage system capabilities; quick but might not be transactionally consistent.

Restoration strategies depend on the backup type and the desired recovery point. Full backups offer the simplest restore, while differential and incremental backups require restoring multiple backup sets. Transaction log backups allow for precise point-in-time recovery. Backup frequency depends on data change rate and acceptable data loss.

19. How do you monitor database performance and identify potential bottlenecks?

Database performance monitoring involves tracking key metrics and identifying deviations from established baselines. I typically use a combination of tools like built-in database monitoring features (e.g., Performance Monitor in SQL Server, Performance Schema in MySQL), third-party monitoring solutions (e.g., Datadog, New Relic), and custom scripts to gather data. Key metrics to monitor include CPU utilization, memory usage, disk I/O, query execution times, lock contention, and buffer cache hit ratios. I analyze slow queries using query analyzers/profilers to identify inefficient code, missing indexes, or poorly designed schemas. I would also check for deadlocks, excessive blocking, and long-running transactions which can point to bottlenecks.

20. Explain how to use stored procedures to encapsulate complex business logic.

Stored procedures are precompiled SQL statements stored within the database. They encapsulate complex business logic by providing a modular and reusable way to perform database operations. Instead of embedding SQL directly in applications, you call a stored procedure, passing parameters as needed. This approach offers several advantages.

Using stored procedures, you can centralize and maintain complex logic in one place. This simplifies application code, reduces redundancy, and improves security by controlling data access. For example, a stored procedure might handle a series of updates across multiple tables when a customer order is processed. You would create the stored procedure, for example in SQL Server:

CREATE PROCEDURE ProcessOrder (@CustomerID INT, @ProductID INT, @Quantity INT)
AS
BEGIN
  -- Logic to update inventory, create order record, etc.
END

Then, your application simply calls EXEC ProcessOrder @CustomerID = 123, @ProductID = 456, @Quantity = 1.

21. How would you migrate a database from one platform to another (e.g., MySQL to PostgreSQL)?

Migrating a database involves several steps, from planning to execution and validation. First, assessment is crucial. Analyze the schemas, data types, stored procedures, and application dependencies to understand the scope and potential incompatibilities. Then, schema conversion may be needed. Tools like pgloader (for PostgreSQL) can assist in converting the schema and data, but manual adjustments are often necessary due to differences in data types or SQL syntax.

After the schema is ready, the next step is data migration. This can be achieved through logical replication, dump and restore, or ETL tools. Logical replication provides continuous synchronization, useful for minimal downtime migrations. Dump and restore involves exporting data from the source and importing it into the destination. Finally, thorough testing and validation are essential to ensure data integrity and application functionality in the new environment. Monitor performance closely after the migration and have a rollback plan in place.

22. Describe the concept of sharding and its advantages in a large-scale database environment.

Sharding is a database architecture pattern where a large database is horizontally partitioned into smaller, more manageable databases called shards. Each shard contains a subset of the overall data, and they all reside on separate database servers. This distribution enables parallel processing and reduces the load on any single server.

The advantages of sharding in a large-scale environment include improved performance due to data localization and parallel query execution, increased storage capacity as data is spread across multiple servers, and enhanced availability and fault tolerance because failure of one shard does not necessarily bring down the entire system. It also allows for easier scaling, as you can add more shards to handle increasing data volumes or user traffic.

23. Explain how to use database views to simplify complex queries and improve security.

Database views are virtual tables based on the result-set of a SQL statement. They simplify complex queries by encapsulating them into a single, named object. Instead of writing the same complex query repeatedly, you can query the view as you would a regular table. This promotes code reusability and makes queries easier to understand and maintain.

Views also enhance security. You can grant users access to a view that only shows specific columns or rows from underlying tables, without granting them direct access to the tables themselves. This restricts access to sensitive data. For example, a view might only expose customer names and order dates but not credit card information. You can create views using SQL CREATE VIEW command. For example:

CREATE VIEW customer_orders AS
SELECT customer_name, order_date
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id;

24. How would you implement a data warehousing solution for business intelligence reporting?

A data warehousing solution typically involves these steps: First, identify the key business metrics and data sources. Then, extract, transform, and load (ETL) data from various sources (databases, APIs, flat files) into a staging area. Implement data cleansing and transformation logic, often using tools like Apache Spark or cloud-based ETL services. Finally, load the transformed data into a data warehouse (e.g., Snowflake, Amazon Redshift, Google BigQuery).

For reporting, I'd build a star schema or snowflake schema on top of the data warehouse. Then, use business intelligence tools like Tableau, Power BI, or Looker to create reports and dashboards. Regularly monitor data quality and refresh schedules to ensure accurate and timely insights. Considerations include scalability, security, and cost optimization across all stages.

25. Describe different techniques for data compression in a database.

Data compression in databases reduces storage space and improves I/O performance. Several techniques exist, including:

  • Row/Page Level Compression: Compresses entire rows or pages of data using algorithms like Lempel-Ziv (LZ77) or Lempel-Ziv-Welch (LZW). This is often transparent to applications.
  • Columnar Compression: Compresses individual columns separately, especially effective for data warehouses where similar data is stored in columns. Techniques include run-length encoding (RLE), dictionary encoding, and bit-vector encoding.
  • Data Deduplication: Identifies and eliminates redundant data copies, storing only unique instances. Often used for backups and archival data.
  • Dictionary Compression: Builds a dictionary of frequently occurring values and replaces them with shorter codes.
  • Delta Compression: Stores only the differences (deltas) between successive versions of data. Useful for time-series data or audit trails.

26. Explain how to use database functions to extend the functionality of SQL.

Database functions extend SQL functionality by allowing you to create custom routines that perform specific tasks within the database. These functions can be written in a variety of languages, depending on the database system (e.g., PL/SQL in Oracle, T-SQL in SQL Server, PL/pgSQL in PostgreSQL). They encapsulate complex logic, calculations, or data manipulations that would be cumbersome or impossible to express directly in SQL.

To use them:

  1. Define the function: You define a function within the database using a CREATE FUNCTION statement, specifying its name, input parameters, return type, and the code to be executed.

    CREATE FUNCTION calculate_discount (price DECIMAL, discount_percent DECIMAL) 
    RETURNS DECIMAL 
    AS $$ 
    BEGIN
      RETURN price * (1 - discount_percent/100);
    END; 
    $$ LANGUAGE plpgsql;
    
  2. Call the function: Once defined, you can call the function directly in your SQL queries, just like built-in SQL functions. For example:

    SELECT item_name, calculate_discount(price, 10) AS discounted_price FROM products;
    

Expert SQL Queries interview questions

1. How do you optimize a query that is running slowly due to a large number of joins?

To optimize a slow query with numerous joins, consider several strategies. First, ensure appropriate indexes exist on the join columns of all tables involved. This dramatically speeds up the join process by allowing the database to quickly locate matching rows. Secondly, review the join order. The order in which tables are joined can significantly impact performance; joining the smallest tables first can reduce the size of intermediate result sets. Sometimes rewriting the query to use subqueries, temporary tables, or common table expressions (CTEs) can help the database optimizer choose a more efficient execution plan. Finally, analyze the query execution plan provided by the database system (EXPLAIN PLAN in many SQL databases) to identify bottlenecks and areas for improvement, such as full table scans or inefficient join algorithms. Regularly updating table statistics also aids the query optimizer in making informed decisions.

Additionally, consider denormalization as a last resort if read performance is paramount and write performance is less critical. Reducing the number of joins by incorporating relevant data into a single table can improve query speed, but it comes with the cost of increased data redundancy and potential update anomalies. Carefully weigh the trade-offs before opting for denormalization. Review if the join types are appropriate, for example using LEFT JOIN when an INNER JOIN may be more appropriate if the joined record is required.

2. Describe a scenario where using a window function would be more efficient than using a subquery. Provide an example.

Window functions often outperform subqueries, especially for ranking and aggregation tasks within partitions of data. A common scenario is calculating a moving average or running total. Imagine you have a table of daily sales data for different stores and need to calculate a 7-day moving average of sales for each store. Using a subquery, you would need to execute a separate subquery for each row to calculate the average, which is computationally expensive. A window function can calculate this in a single pass.

Here's an example using SQL:

SELECT
    store_id,
    sale_date,
    daily_sales,
    AVG(daily_sales) OVER (PARTITION BY store_id ORDER BY sale_date ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_average_7_day
FROM
    sales_table;

In this example, the AVG() OVER() function calculates the average daily sales within a window of the current row and the 6 preceding rows, partitioned by store_id. This approach is typically much more efficient than achieving the same result with correlated subqueries, especially with larger datasets.

3. Explain how you would diagnose and resolve a deadlock situation in a database.

To diagnose a deadlock, I would first identify the processes involved. Most database systems offer tools or views to detect deadlocks, showing which processes are blocked and which resources they are waiting for. For example, in SQL Server, I'd use sp_who2 or query sys.dm_os_waiting_tasks. In PostgreSQL, I'd use pg_locks and pg_stat_activity. The output would reveal the conflicting transactions and the resources (e.g., tables, rows) they're contending for.

To resolve a deadlock, the most common approach is deadlock detection and automatic rollback (victim selection). The database system typically chooses one of the deadlocked transactions as a victim, rolls it back, and releases its resources, allowing the other transaction(s) to proceed. While automatic rollback is the standard solution, preventing deadlocks through careful transaction design is preferable. This involves ordering operations within transactions consistently, using appropriate locking strategies, and keeping transactions short. In applications, implementing retry logic after a rollback exception is crucial.

4. How can you implement row-level security in SQL to restrict access to certain data based on user roles?

Row-level security (RLS) in SQL restricts data access based on user roles. You typically achieve this using security policies or views. A security policy is a predicate function associated with a table that determines which rows a user can access. This predicate often checks user context (e.g., USER_NAME(), application role) against a column in the table or a separate roles table.

For example, in SQL Server, you would use CREATE SECURITY POLICY to define a filter predicate, which is then applied to the table. Alternatively, views can be created that filter data based on user roles; however, security policies are generally preferred for maintainability and performance. Another important feature is to implement data masking techniques to prevent exposing sensitive data.

5. Design a SQL query to identify and remove duplicate records from a table, keeping only the most recent entry.

To identify and remove duplicate records, keeping the most recent, you can use a combination of window functions and a common table expression (CTE). Assuming you have a table named your_table with columns like id, data, and created_at (timestamp), the following SQL query can achieve this:

WITH RankedRows AS (
    SELECT
        id,
        data,
        created_at,
        ROW_NUMBER() OVER(PARTITION BY data ORDER BY created_at DESC) AS rn
    FROM
        your_table
)
DELETE FROM your_table
WHERE id IN (SELECT id FROM RankedRows WHERE rn > 1);

This query first assigns a rank to each row within partitions defined by the data column, ordering by created_at in descending order. The ROW_NUMBER() function assigns rank 1 to the most recent entry. Then, it deletes all records from the original table whose id is present in the RankedRows CTE with rank greater than 1 (duplicate entries). Adapt the PARTITION BY clause to match the columns that define a duplicate, and the ORDER BY clause for the timestamp column.

6. Explain the difference between clustered and non-clustered indexes, and when you would use each.

A clustered index determines the physical order of data in a table. There can only be one clustered index per table because data can only be physically sorted in one way. A non-clustered index is like an index in a book; it stores a separate structure containing the indexed columns and pointers back to the data rows in the table. There can be multiple non-clustered indexes per table.

Use a clustered index when you frequently retrieve data in a specific order or need efficient range queries. Good candidates for clustered indexes are columns that are often used in ORDER BY clauses or for retrieving data within a specific date range. Use non-clustered indexes when you need to speed up queries based on specific column values without affecting the physical order of the data, or when you have multiple columns you need to frequently search on but cannot all fit in a clustered index. They're also useful for columns that are not frequently used in range queries.

7. How would you design a database schema to efficiently store and query time-series data?

For efficient time-series data storage and querying, I'd opt for a schema optimized for write-heavy workloads and range-based queries. A common approach involves a table with columns for timestamp, metric name, and value. Indexing the timestamp column is crucial. Furthermore, consider using a columnar database or a time-series database (TSDB) like TimescaleDB or InfluxDB. These are specifically designed to handle time-series data efficiently, offering features such as data compression, retention policies, and optimized query engines. Partitioning the table by time (e.g., daily or weekly) can significantly improve query performance by reducing the amount of data scanned.

An example schema might look like this:

CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    metric_name TEXT NOT NULL,
    value DOUBLE PRECISION NOT NULL
);

CREATE INDEX ON metrics (time);

8. Describe how you would use partitioning to improve the performance of a large table. What are the considerations?

Partitioning a large table involves dividing it into smaller, more manageable parts based on a specific key. This improves performance by allowing queries to scan only relevant partitions instead of the entire table. For example, if you have a table of sales data, you could partition it by date (e.g., monthly or yearly). When querying for sales in a specific month, the database only needs to scan that month's partition, drastically reducing I/O and improving query speed.

Considerations include choosing the appropriate partitioning key (a frequently queried column with good cardinality), the number of partitions (too many can lead to overhead, too few negates the benefits), and the type of partitioning (range, list, hash). Also, one must consider maintenance overhead like managing partitions. Data skew can affect performance if some partitions are significantly larger than others. Proper index maintenance on partitioned tables is also crucial.

9. Write a SQL query to calculate a running total or cumulative sum of a column in a table.

To calculate a running total in SQL, you can use window functions. The SUM() OVER() clause is the key. The ORDER BY clause within OVER() specifies the order in which the running total is calculated. If no partitioning is done, running total will be calculated over the whole table.

SELECT
    column_to_sum,
    SUM(column_to_sum) OVER (ORDER BY order_column)
AS running_total
FROM
    your_table;

Replace column_to_sum with the name of the column you want to sum, order_column with the column that defines the order for the running total calculation, and your_table with the actual table name. The PARTITION BY clause can be added in OVER() to calculate running totals for each partition separately.

10. How can you use Common Table Expressions (CTEs) to simplify complex queries? Give an example.

CTEs, or Common Table Expressions, simplify complex queries by breaking them down into smaller, more manageable, named result sets. These named result sets can then be referenced within the main query, improving readability and maintainability. Essentially, a CTE acts like a temporary view that exists only for the duration of a single query.

For example, consider a scenario where you want to find employees who earn more than the average salary of their department. Without CTEs, this query can be quite complex with nested subqueries. With CTEs, you can first define a CTE to calculate the average salary per department and then join it with the employee table to filter employees. Here's an example:

WITH AvgSalaries AS (
 SELECT department, AVG(salary) AS avg_salary
 FROM employees
 GROUP BY department
)
SELECT e.name, e.salary, a.department, a.avg_salary
FROM employees e
JOIN AvgSalaries a ON e.department = a.department
WHERE e.salary > a.avg_salary;

In this example, AvgSalaries CTE first calculates the average salary for each department, and then it is joined with employees table to filter the records.

11. Explain how you would use SQL to perform data masking or anonymization for privacy purposes.

SQL can perform data masking and anonymization in several ways to protect sensitive information. Common techniques include:

  • Substitution: Replacing actual data with realistic but fake values. For example, substituting names with randomly generated names.
  • Encryption: Encrypting sensitive columns using encryption functions. Only users with appropriate keys can decrypt the data.
  • Hashing: Using hashing functions (e.g., SHA256) to irreversibly transform data. Useful when the original value is not needed, only its uniqueness.
  • Redaction: Removing parts of the data. For instance, masking all but the last four digits of a credit card number.
  • Aggregation: Replacing individual data points with aggregate values (e.g., averages or counts).

For example, to mask email addresses, you could use UPDATE users SET email = CONCAT('user', id, '@example.com');. To redact part of a credit card column UPDATE customers SET credit_card = CONCAT('XXXXXXXXXXXX', RIGHT(credit_card, 4));. These operations are usually carried out on a copy of the production data or in a secure development/testing environment.

12. Describe how you would monitor and optimize database performance. What tools would you use?

To monitor database performance, I would focus on key metrics like query execution time, CPU utilization, memory usage, disk I/O, and lock contention. I'd use tools like the database's built-in performance monitor (e.g., Performance Monitor in SQL Server, pg_stat_statements in PostgreSQL, or MySQL Performance Schema). Other useful tools include dedicated monitoring solutions such as Prometheus with Grafana, Datadog, or New Relic. These tools provide dashboards and alerts to identify performance bottlenecks.

To optimize performance, I would start by identifying slow-running queries using the monitoring tools. I'd then analyze the execution plans to understand how the database is processing the query. Common optimization techniques include: indexing frequently queried columns, rewriting inefficient queries, optimizing data types, partitioning large tables, and caching frequently accessed data. Regularly updating database statistics is also important for the query optimizer to make informed decisions. For example, to get the execution plan in postgres:

EXPLAIN ANALYZE SELECT * FROM mytable WHERE column_name = 'some_value';

13. How would you implement a full-text search feature in your database? What are the options and trade-offs?

Implementing full-text search involves several options, each with its own trade-offs. One approach is using the database's built-in full-text search capabilities, such as PostgreSQL's tsvector and tsquery or MySQL's FULLTEXT indexes. This is convenient but might offer limited customization and performance compared to dedicated search engines. The trade-off is ease of implementation versus control and scalability.

Another option is to integrate a dedicated search engine like Elasticsearch or Apache Solr. These provide advanced features like stemming, synonyms, and relevance ranking, along with better performance for large datasets. However, this adds complexity to the system as it requires maintaining a separate service and keeping the search index synchronized with the database. The trade-off here is increased complexity and resource requirements for enhanced search functionality and performance.

14. Explain how you can use SQL to audit data changes and track who made them. Provide an example.

SQL auditing of data changes involves tracking modifications (inserts, updates, deletes) made to a table and recording who made them, and when. This can be achieved using triggers. A trigger is a special type of stored procedure that automatically executes in response to certain events on a table.

Here's an example using a table named employees and an audit table named employee_audit:

-- Audit table
CREATE TABLE employee_audit (
    audit_id INT PRIMARY KEY AUTO_INCREMENT,
    employee_id INT,
    old_name VARCHAR(255),
    new_name VARCHAR(255),
    updated_by VARCHAR(255),
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Trigger to track updates to the employees table
CREATE TRIGGER employees_AFTER_UPDATE
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
    INSERT INTO employee_audit (employee_id, old_name, new_name, updated_by)
    VALUES (OLD.id, OLD.name, NEW.name, USER());
END;

This trigger, employees_AFTER_UPDATE, will automatically insert a row into the employee_audit table whenever a row in the employees table is updated. The USER() function captures the username of the user who performed the update. Similar triggers can be created for INSERT and DELETE operations to capture all changes. OLD and NEW keywords allows for comparison.

15. Describe a scenario where you would use a recursive CTE and provide an example query.

A recursive CTE (Common Table Expression) is useful for querying hierarchical or tree-structured data. A classic example is an organizational chart or a bill of materials. Imagine a table Employee with columns EmployeeID and ManagerID. You want to find all employees who report (directly or indirectly) to a specific manager.

Here's an example query using a recursive CTE to achieve this:

WITH RECURSIVE EmployeeHierarchy AS (
 -- Anchor member: Select the direct reports of the manager
 SELECT EmployeeID, ManagerID, 0 AS Level
 FROM Employee
 WHERE ManagerID = 'SpecificManagerID'

 UNION ALL

 -- Recursive member: Select employees reporting to those in the previous level
 SELECT e.EmployeeID, e.ManagerID, eh.Level + 1 AS Level
 FROM Employee e
 INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT EmployeeID
FROM EmployeeHierarchy;

16. How would you handle a situation where a query is timing out due to resource constraints? What steps would you take to resolve it?

When a query times out due to resource constraints, I would first identify the resource bottleneck, which could be CPU, memory, or I/O. I'd use monitoring tools to pinpoint which resource is being exhausted. Then, I'd focus on optimizing the query itself by:

  • Analyzing the query plan: Use EXPLAIN (or the equivalent in the specific database) to understand how the database is executing the query and identify potential performance bottlenecks like full table scans or missing indexes.
  • Adding indexes: If the query is performing full table scans, adding appropriate indexes to the columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses can significantly improve performance.
  • Rewriting the query: Consider rewriting the query to be more efficient, possibly using alternative join strategies, subqueries, or by breaking down the query into smaller, more manageable parts.
  • Limiting the result set: If the query returns a large number of rows, consider using LIMIT to retrieve only the necessary data.

If query optimization doesn't fully resolve the issue, I'd consider increasing the allocated resources (e.g., more memory, faster CPUs) to the database server. As a temporary workaround, I might increase the query timeout duration, but this is not a long-term solution. Tuning database configurations, such as buffer pool size, can also help. Finally, if the workload is consistently exceeding available resources, I would consider database sharding/partitioning strategies.

17. Explain the ACID properties of database transactions and how they ensure data integrity.

ACID properties are a set of principles that guarantee database transactions are processed reliably. They are: Atomicity (all or nothing, a transaction completes fully or not at all), Consistency (the transaction must maintain database integrity constraints, moving the database from one valid state to another), Isolation (concurrent transactions do not interfere with each other, appearing as if they are executed sequentially), and Durability (once a transaction is committed, its changes are permanent, even in the event of system failures).

These properties ensure data integrity by preventing partial updates (Atomicity), enforcing rules (Consistency), avoiding conflicts between users (Isolation), and guaranteeing data survival (Durability). For example, consider a bank transfer. Atomicity ensures that either both the debit from one account and the credit to another happen, or neither does. Isolation prevents issues if two transfers to the same account happen at the same time.

18. How do you handle null values in SQL queries to avoid unexpected results? Give examples.

To handle null values in SQL queries, I primarily use the IS NULL and IS NOT NULL operators for filtering. For example, to select all rows where the email column is null, I would use SELECT * FROM users WHERE email IS NULL;. Conversely, to select rows where email is not null: SELECT * FROM users WHERE email IS NOT NULL;.

Additionally, the COALESCE() function is valuable for replacing null values with a specified default. For instance, SELECT COALESCE(email, 'no_email@example.com') FROM users; would return the email address if it exists, or 'no_email@example.com' if the email is null. These techniques help prevent unexpected behavior in comparisons and calculations involving nulls, ensuring query results are predictable and accurate. Using functions like NVL() and NULLIF() (depending on the database system) serves a similar purpose.

19. Describe how you would use SQL to perform data validation and ensure data quality.

SQL can be used to enforce data validation through several mechanisms. CHECK constraints can be defined on tables to ensure that values in a column meet specific criteria (e.g., a date is within a valid range, a string matches a certain pattern). NOT NULL constraints prevent missing data in required columns. UNIQUE constraints guarantee that there are no duplicate values in a column or set of columns. Foreign key constraints (FOREIGN KEY) maintain referential integrity between tables, ensuring that relationships between tables are valid. Furthermore, stored procedures can be created to perform more complex validation logic before data is inserted or updated.

To further enhance data quality, SQL can be used for data profiling to identify inconsistencies or anomalies. For example, you can use COUNT(DISTINCT column_name) to check for data uniqueness, or aggregate functions like MIN, MAX, AVG to find outliers and potential data errors. SQL queries can also be scheduled to run regularly to identify data quality issues and trigger alerts. Using CASE statements helps to categorize and validate data based on different rules.

20. How would you optimize SQL queries that involve subqueries? What are the alternatives?

To optimize SQL queries with subqueries, focus on rewriting them. Subqueries can often be inefficient, especially correlated subqueries, because they might execute for each row processed by the outer query. Here are some alternatives:

  • JOINs: Replace subqueries (especially in the WHERE clause) with JOIN operations. This allows the database optimizer to efficiently combine data from multiple tables.
  • Temporary Tables/Common Table Expressions (CTEs): For complex subqueries, create a temporary table or use a CTE to store the intermediate result. Then, query the temporary table/CTE. CTEs help improve readability and can sometimes allow the optimizer to make better choices.
  • Indexes: Ensure appropriate indexes are in place on the columns involved in the subquery and the outer query's join conditions or WHERE clause. Indexes significantly speed up data retrieval.
  • Rewriting with EXISTS or NOT EXISTS: Sometimes, IN or NOT IN subqueries can be replaced with EXISTS or NOT EXISTS for better performance, particularly with large datasets.

For example, instead of SELECT * FROM orders WHERE customer_id IN (SELECT customer_id FROM customers WHERE city = 'New York'), you might use SELECT o.* FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE c.city = 'New York'. Also, consider using your database's query execution plan to analyze and identify bottlenecks. Based on the execution plan, fine-tune indexes or query structure for optimal performance.

21. Explain how you would use database triggers to automate tasks or enforce business rules.

Database triggers are special stored procedures that automatically execute in response to certain events on a particular table or view. I would use them to automate tasks like auditing changes to data, automatically updating derived data, or enforcing complex business rules. For example, I could create a trigger that fires AFTER INSERT on an Orders table to automatically update the Customer table with the customer's total order count. This removes the need to update Customer in the application code, centralizing the logic in the database.

Another use case is to enforce business rules. A trigger can BEFORE INSERT on a Products table to ensure that the product price is within an acceptable range. IF NEW.price < minimum_price THEN SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Price below minimum'; END IF;. This enforces a business rule at the database level, preventing bad data from entering the system regardless of the application used to insert the data.

22. Design a query to find the top N records within each group, such as the top 3 customers with the highest sales in each region.

To find the top N records within each group (e.g., top 3 customers with highest sales in each region), you can use window functions (also known as analytic functions) in SQL. The ROW_NUMBER(), RANK(), or DENSE_RANK() window functions are very effective for this task.

Here's a general SQL query structure using ROW_NUMBER():

SELECT * FROM (
    SELECT 
        *, 
        ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales DESC) as rn
    FROM 
        customers
) AS subquery
WHERE rn <= 3;

In this example:

  • PARTITION BY region divides the data into groups based on the region column.
  • ORDER BY sales DESC orders the customers within each region by their sales in descending order.
  • ROW_NUMBER() OVER (...) assigns a unique rank to each customer within each region based on sales.
  • The outer query filters the results to include only the top 3 customers (where rn <= 3) in each region.

23. How can you use explain plans to understand the execution plan of a query and identify potential bottlenecks?

Explain plans are invaluable tools for understanding how a database executes a query. By examining the explain plan, you can see the sequence of operations the database will perform, such as table scans, index lookups, joins, and sorts. This allows you to identify potential bottlenecks like full table scans on large tables (where an index might be more efficient), inefficient join algorithms, or missing indexes.

To use an explain plan, you generally prefix your query with EXPLAIN (or a database-specific equivalent like EXPLAIN ANALYZE for PostgreSQL). The output details each step in the execution. Key metrics to analyze include the estimated cost, number of rows processed, and the type of access method used (e.g., index seek vs. full table scan). By focusing on the most expensive steps, you can then explore optimization strategies like adding indexes, rewriting the query, or updating statistics.

SQL Queries MCQ

Question 1.

Consider two tables: Employees (EmpID, EmpName, DeptID, Salary) and Departments (DeptID, DeptName). Which SQL query retrieves the names of employees who work in the 'Sales' department and have a salary greater than $60,000?

Options:
Question 2.

Which SQL query calculates the total number of orders placed by each customer and orders the results by the number of orders in descending order?

Options:
Question 3.

Which SQL query retrieves the names of all products with a price greater than the average product price in the 'Products' table, and orders the results by price in descending order?

Options:

Options:
Question 4.

Which SQL query calculates the average salary for each department, but only includes departments with more than 5 employees?

options:

Options:
Question 5.

Which SQL query retrieves the name of the customer who placed the most recent order, assuming you have tables named Customers (with columns CustomerID, CustomerName) and Orders (with columns OrderID, CustomerID, OrderDate)?

Options:
Question 6.

Which SQL query retrieves the names of departments where the maximum salary of employees in that department is greater than the average salary of all employees across all departments?

Options:
Question 7.

Which SQL query retrieves the names of employees who are managers and have a salary greater than the average salary of all employees who are analysts?

Options:
Question 8.

Which SQL query retrieves the names of employees who have the same salary as the highest salary within their respective department?

Options:
Question 9.

Which SQL query retrieves the names of employees whose salary is greater than the salary of at least one employee in the 'Marketing' department?

options:

Options:
Question 10.

Which SQL query retrieves the names of all departments that do not have any employees?

Options:

Options:
Question 11.

Given a table Products with columns product_name, category, and price, which SQL query retrieves the names of products whose price is higher than the average price of products within the same category?

options:

Options:
Question 12.

Which SQL query correctly calculates the total salary for each department?

Options:
Question 13.

Which SQL query retrieves the names of employees who earn more than all employees in the 'Finance' department?

Options:

Options:
Question 14.

Which SQL query retrieves the names of employees who have a salary greater than the salary of every employee in the 'Marketing' department?

Options:
Question 15.

Which SQL query retrieves the names of departments where the average salary of employees in that department is greater than the overall average salary of all employees across all departments?

Options:

  • A) SELECT department_name FROM departments WHERE AVG(salary) > (SELECT AVG(salary) FROM employees);
  • B) SELECT d.department_name FROM departments d JOIN employees e ON d.department_id = e.department_id GROUP BY d.department_name HAVING AVG(e.salary) > (SELECT AVG(salary) FROM employees);
  • C) SELECT department_name FROM departments WHERE salary > (SELECT AVG(salary) FROM employees);
  • D) SELECT d.department_name FROM departments d JOIN employees e ON d.department_id = e.department_id WHERE AVG(e.salary) > (SELECT AVG(salary) FROM employees);
Options:
Question 16.

Which SQL query retrieves the department with the highest average salary?

options:

Options:
Question 17.

Which SQL query identifies customers who have placed a number of orders that is greater than the average number of orders placed by all customers?

Options:

Options:
Question 18.

Which SQL query retrieves the names of all departments where every employee's salary is greater than $50,000?

Options:
Question 19.

Which SQL query retrieves the names of employees who report to a manager whose salary is greater than $80,000? Assume the table Employees has columns employee_id, employee_name, manager_id, and salary. The manager_id column references the employee_id of the manager.

Options:
Question 20.

Which SQL query retrieves the second highest salary from the 'Employees' table? Assume the table has columns 'employee_id', 'name', and 'salary'.

Options:

Options:
Question 21.

Which SQL query retrieves the department(s) with the lowest average salary?

options:

Options:
Question 22.

Write an SQL query to retrieve the names of employees who joined the company before any employee in the 'Sales' department. Consider the employees table with columns employee_id, employee_name, department, and join_date.

Options:
Question 23.

What SQL query retrieves the total sales for each product category, but only includes those categories where the total number of sales exceeds 100?

Options:

Options:
Question 24.

Which SQL query retrieves the names of departments where the minimum salary of employees is greater than $50,000 and the average salary is less than $70,000?

Options:

Options:
Question 25.

Which SQL query retrieves the names of employees who share both the same department and job title with at least one other employee?

options:

Options:

Which SQL Queries skills should you evaluate during the interview phase?

You can't assess everything about a candidate in a single interview. However, for SQL queries, evaluating a few core skills will significantly indicate a candidate's abilities. These include understanding database fundamentals, writing effective queries, and optimizing performance.

Which SQL Queries skills should you evaluate during the interview phase?

SQL Fundamentals

See if candidates truly know their SQLs with an SQL assessment asking targeted questions. This can help you filter candidates who have a solid base and those who don't.

To assess their grasp, try a question that requires them to demonstrate their understanding of SQL basics.

Write a SQL query to select all columns from a table named 'Employees' where the 'Department' is 'Sales'.

Look for the candidate to use SELECT *, FROM Employees, and WHERE Department = 'Sales' correctly. They should also understand how to specify the table name and the condition in the WHERE clause.

Query Writing

Screen candidates on their SQL query writing skills using relevant MCQs. This helps to identify candidates who can write complex queries and manipulate data effectively.

You can evaluate this with a question that requires them to join tables and use aggregate functions.

Given two tables, 'Orders' and 'Customers', write a query to find the total number of orders placed by each customer.

The candidate should be able to use a JOIN clause to combine the tables, GROUP BY to group by customer, and COUNT() to count the number of orders. Correctly using aliases for clarity is also a plus.

Query Optimization

Identify candidates with optimization knowledge with MCQs focused on indexing and query execution plans. These tests help weed out candidates who only write queries but don't understand performance.

Pose a question about optimization to check for this skill.

How would you optimize a slow-running query that retrieves data from a large table?

Look for answers that mention indexing, analyzing the query execution plan, and rewriting the query to avoid full table scans. Bonus points if they mention specific techniques like using WHERE clauses with indexed columns.

3 Tips for Using SQL Queries Interview Questions

Before you start putting what you've learned to use, here are our top three tips to help you make the most of SQL queries in your interview process. Following these suggestions will improve your chances of identifying the best candidates.

1. Leverage SQL Skill Tests to Streamline Initial Screening

Save valuable interview time by using SQL skill tests as part of your initial screening process. This will allow you to quickly assess candidates' practical SQL abilities and filter out those who don't meet your requirements.

Consider using Adaface's SQL Online Test or SQL Coding Test to evaluate candidates' proficiency in writing and debugging SQL queries. For those requiring expertise in specific database systems, we also offer tests tailored for MySQL, PostgreSQL, MS SQL Server, and more. Check out our comprehensive test library for more options.

By integrating these assessments, you can objectively measure a candidate's SQL skills, ensuring that interview time is reserved for more in-depth discussions with qualified individuals. This approach helps you make data-driven decisions early in the hiring process.

2. Strategically Outline Key SQL Interview Questions

Time is a precious commodity during interviews, so be strategic about the SQL questions you select. Choosing a focused set of relevant questions helps you maximize your evaluation of candidates on the most critical aspects of the role.

Beyond SQL-specific questions, consider exploring related areas that complement database skills. Interview questions on data modeling and data analysis can reveal a candidate's ability to design and interpret data structures, while questions assessing communication skills can gauge their ability to explain technical concepts clearly.

Carefully curate your question set to ensure that you cover essential SQL concepts and related skills, providing a well-rounded assessment of each candidate's potential.

3. Master the Art of Asking Follow-Up Questions

Simply posing SQL questions is not enough; the real insights come from the follow-up. Thoughtful follow-up questions are key to truly understanding a candidate's depth of knowledge.

For instance, after asking a candidate to write a SQL query to retrieve specific data, follow up with questions like, 'How would this query perform with a very large dataset?' This will reveal their understanding of query optimization and scalability.

Hire Top SQL Talent with Targeted Assessments

Looking to hire candidates with strong SQL skills? Accurately assessing their abilities is key. Using skill tests is the most effective way to validate their expertise. Explore Adaface's range of SQL tests, including tests for MySQL, PostgreSQL, and more.

Once you've identified top performers through skills assessments, it's time to invite them for interviews. Ready to get started? Sign up or learn more about our online assessment platform.

SQL Online Test

25 mins | 10 MCQs
The SQL online test evaluates a candidate's ability to design and build relational databases and tables from scratch, apply CRUD options, write efficient queries and subqueries to filter data and create efficient indexes for faster SQL queries.
Try SQL Online Test

Download SQL Queries interview questions template in multiple formats

SQL Queries Interview Questions FAQs

What are some basic SQL interview questions?

Basic SQL interview questions cover topics like SELECT statements, WHERE clauses, JOINs, and understanding primary and foreign keys. They assess a candidate's ability to retrieve and manipulate data from a database.

What are some intermediate SQL interview questions?

Intermediate SQL interview questions might involve subqueries, aggregate functions (COUNT, SUM, AVG), GROUP BY, HAVING clauses, and understanding different types of JOINs (LEFT, RIGHT, INNER, OUTER). These questions evaluate a candidate's ability to perform more complex data analysis.

What are some advanced SQL interview questions?

Advanced SQL interview questions could cover topics like window functions, common table expressions (CTEs), stored procedures, triggers, indexing, query optimization, and transaction management. They test a candidate's ability to design and optimize database solutions.

What are some expert SQL interview questions?

Expert SQL interview questions might delve into database architecture, performance tuning at scale, replication, sharding, security best practices, and advanced data modeling techniques. They assess a candidate's mastery of SQL and database administration.

How can I best use SQL interview questions during hiring?

Use SQL interview questions to evaluate a candidate's understanding of database concepts, problem-solving skills, and ability to write clean, optimized SQL code. Combine theoretical questions with practical exercises to get a holistic view.

Besides technical skills, what else should I look for in a SQL candidate?

Look for candidates who demonstrate strong communication skills, the ability to work in a team, a passion for data, and a willingness to learn and adapt to new technologies. Problem-solving aptitude and attention to detail are also good indicators.

Related posts

Free resources

customers across world
Join 1200+ companies in 80+ countries.
Try the most candidate friendly skills assessment tool today.
g2 badges
logo
40 min tests.
No trick questions.
Accurate shortlisting.