Article written by Rishabh Dev Choudhary, under the guidance of Lakshmi Pavan Daggubati, a Data Science Manager at Meta. Reviewed by Vishal Rana, a versatile ML Engineer with deep expertise in data engineering, big data pipelines, advanced analytics, and AI-driven solutions.
The growing demand for data analysts, driven by automation and large-scale digital transformation, has positioned this profession among the most secure and future-focused career paths in today’s data economy. Preparation for SQL interview questions and answers is central to landing these roles.
As organisations become increasingly data driven, demand for analysts who can query, transform, and interpret data continues to rise.
In many technical interviews, SQL is the first skill to be tested, and often the one that decides whether you progress.
This guide compiles over 50 carefully selected questions spanning basic SQL fundamentals, advanced techniques, and hands-on query-writing scenarios.
Candidates preparing for SQL interviews can use it as a structured roadmap from core concepts to interview-ready fluency.
Basic SQL rounds evaluate whether you possess a dependable working foundation that reflects real on-the-job data work. Interviewers commonly assess your command of joins, aggregations, filtering logic, NULL handling, and data integrity principles.
These topics frequently appear in SQL interview questions for data analyst roles, so expect them early in most screening rounds.
SQL (Structured Query Language) is the standard language for querying and managing data stored in relational databases. A DBMS (Database Management System) is software that allows you to store, retrieve, and manage that data in an organised way.
SQL is the most widely required technical skill in analyst job descriptions because it is how analysts extract and prepare the data behind every report, dashboard, and analysis.
To understand how SQL works, you first need a clear picture of how a relational database organises data into tables of rows and columns.
Joins are used to combine rows from two or more tables based on a related column. Data analysts rely on joins constantly, for example, linking a customers table to an orders table to see which customers made purchases.
A JOIN combines rows from two or more tables based on a related column shared between them.
How do two tables connect?
customers table              orders table
+-------------+              +-------------+
| customer_id |<─────────────| customer_id |
| name        |              | order_id    |
| email       |              | amount      |
+-------------+              | order_date  |
                             +-------------+
Shared column: customer_id
Example
SELECT c.name, o.order_id, o.amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
Although all JOIN operations combine data from related tables, they differ in how they treat unmatched rows.
To clearly understand how these JOIN types differ, let us examine what each one returns and the situations where it is most appropriately used. The comparison below highlights how each JOIN handles matching and non-matching rows.
| Join Type | What It Returns | When to Use It |
| INNER JOIN | Only rows where the join condition matches in both tables | When you only want records that exist in both tables |
| LEFT JOIN | All rows from the left table, plus matching rows from the right (NULLs where no match) | When you want all records from the main table, regardless of whether a match exists |
| RIGHT JOIN | All rows from the right table, plus matching rows from the left (NULLs where no match) | Less common; same as LEFT JOIN with table order reversed |
| FULL OUTER JOIN | All rows from both tables; NULLs fill the unmatched sides | When you want a complete picture with no records dropped |
-- INNER JOIN
SELECT c.name, o.order_id
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- LEFT JOIN
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- FULL OUTER JOIN
SELECT c.name, o.order_id
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
A SELF JOIN joins a table to itself. This is useful when rows in the same table have a relationship with each other, for example, an employees table where each employee has a manager_id that references another row in the same table.
Example: Finding each employee and their manager:
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
A CROSS JOIN returns the Cartesian product of two tables: every row in the first table is paired with every row in the second. If table A has 3 rows and table B has 4 rows, the result has 12 rows.
Common use case: Generating all possible combinations. For example, pairing every product with every store location to create a template for sales reporting.
SELECT p.product_name, s.store_name
FROM products p
CROSS JOIN stores s;
| Feature | JOIN | UNION |
| Direction | Combines columns from multiple tables | Combines rows from multiple queries |
| Requirement | Tables share a related column | Queries must return the same number of columns with compatible data types |
| Duplicates | Not a concern in the same way | UNION removes duplicates; UNION ALL keeps them |
| Use case | Enriching a dataset with additional columns | Stacking similar datasets from different sources |
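To make the contrast concrete, here is a small runnable sketch using SQLite from Python. The table names (online_sales, store_sales) are illustrative, not from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE online_sales (product TEXT, amount INTEGER);
    CREATE TABLE store_sales  (product TEXT, amount INTEGER);
    INSERT INTO online_sales VALUES ('Widget', 100), ('Gadget', 200);
    INSERT INTO store_sales  VALUES ('Widget', 100), ('Doohickey', 50);
""")

# JOIN adds columns: rows are matched across tables on a shared column
joined = cur.execute("""
    SELECT o.product, o.amount, s.amount
    FROM online_sales o
    JOIN store_sales s ON o.product = s.product
""").fetchall()

# UNION stacks rows and removes exact duplicates
union_rows = cur.execute("""
    SELECT product, amount FROM online_sales
    UNION
    SELECT product, amount FROM store_sales
""").fetchall()

# UNION ALL stacks rows and keeps duplicates
union_all_rows = cur.execute("""
    SELECT product, amount FROM online_sales
    UNION ALL
    SELECT product, amount FROM store_sales
""").fetchall()

print(len(joined), len(union_rows), len(union_all_rows))  # 1 3 4
```

UNION silently dropped the duplicate ('Widget', 100) row, while UNION ALL kept all four rows; the JOIN produced one row because only 'Widget' exists in both tables.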
Aggregate functions summarise multiple rows into a single value. In data analysis, they are the core of any reporting query, calculating totals, averages, counts, and finding extremes across groups of data.
Aggregate functions perform a calculation across a set of rows and return a single result.
| Function | What It Does |
| COUNT() | Counts the number of rows |
| SUM() | Adds up all values in a column |
| AVG() | Returns the average value |
| MIN() | Returns the smallest value |
| MAX() | Returns the largest value |
SELECT
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM orders;
This is one of the most frequently tested basic SQL questions. The key distinction is when the filtering happens.
| Feature | WHERE | HAVING |
| Filters | Individual rows | Grouped results |
| Used with | Any query | Must follow GROUP BY |
| Can reference aggregate functions? | No | Yes |
| Executes | Before aggregation | After aggregation |
-- WHERE filters rows before grouping
SELECT region, SUM(amount) AS total_sales
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY region;
-- HAVING filters groups after aggregation
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
HAVING SUM(amount) > 100000;
GROUP BY collapses all rows that share the same value in one or more columns into a single summary row. Any column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY clause.
Example: Total sales by region:
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
ORDER BY total_sales DESC;
Use COUNT(DISTINCT column_name) to count unique values, excluding duplicates and NULLs.
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
It is particularly useful in data analysis when you need to know how many unique users, products, or events exist in a dataset, rather than the raw row count which may include repeats.
| Variation | What It Counts | Includes NULLs |
| COUNT(*) | All rows, including those with NULL values in any column | Yes |
| COUNT(column) | Only rows where the specified column is NOT NULL | No |
| COUNT(DISTINCT column) | Only unique non-NULL values in the specified column | No |
-- Counts all rows
SELECT COUNT(*) FROM orders;
-- Counts only rows where email is not NULL
SELECT COUNT(email) FROM customers;
-- Counts unique non-NULL product categories
SELECT COUNT(DISTINCT category) FROM products;
Filtering and sorting determine which rows your query returns and the order in which they appear, making them foundational operations that are tested in almost every SQL screening.
Use the WHERE clause to filter rows based on one or more conditions. The WHERE clause is evaluated before any aggregation.
-- Single condition
SELECT * FROM orders WHERE amount > 500;
-- Multiple conditions
SELECT * FROM orders
WHERE amount > 500
AND order_date >= '2025-01-01';
-- Using OR
SELECT * FROM customers
WHERE country = 'India' OR country = 'Singapore';
| Operator | Purpose | Example |
| BETWEEN | Filters rows within a range (inclusive) | amount BETWEEN 100 AND 500 |
| IN | Filters rows matching any value in a list | country IN ('India', 'US', 'UK') |
| LIKE | Filters rows matching a pattern | name LIKE 'A%' |
-- BETWEEN
SELECT * FROM orders WHERE amount BETWEEN 100 AND 500;
-- IN
SELECT * FROM customers WHERE country IN ('India', 'US', 'UK');
-- LIKE (% = any sequence of characters, _ = single character)
SELECT * FROM customers WHERE name LIKE 'A%';
SELECT * FROM products WHERE code LIKE 'SKU_001';
Use ORDER BY to sort results. By default, sorting is ascending (smallest to largest). Add DESC for descending order.
-- Ascending (default)
SELECT name, salary FROM employees ORDER BY salary;
-- Descending
SELECT name, salary FROM employees ORDER BY salary DESC;
-- Sort by multiple columns
SELECT department, name, salary
FROM employees
ORDER BY department ASC, salary DESC;
Both restrict the number of rows returned, but they belong to different SQL dialects.
| Keyword | Database | Syntax |
| LIMIT | MySQL, PostgreSQL, SQLite | At the end of the query |
| TOP | SQL Server, MS Access | At the start of the SELECT clause |
-- PostgreSQL / MySQL
SELECT * FROM orders ORDER BY amount DESC LIMIT 10;
-- SQL Server
SELECT TOP 10 * FROM orders ORDER BY amount DESC;
NULL values represent missing or unknown data. NULLs can silently distort aggregations and comparisons if not handled correctly; for example, AVG() ignores NULLs, which may give a misleading average if many values are missing.
NULL means the value is unknown or absent. It is not the same as 0 (which is a real number) or '' (which is a real but empty string). Think of it like a blank cell on a form versus a cell where someone wrote "0" or "N/A."
The distinction matters because NULL behaves differently in comparisons and arithmetic: any arithmetic operation involving NULL returns NULL.
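These NULL behaviours can be demonstrated directly. A minimal SQLite sketch, driven from Python:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Arithmetic involving NULL propagates NULL (surfaced as None in Python)
null_sum = cur.execute("SELECT 10 + NULL").fetchone()[0]
print(null_sum)  # None

# NULL = NULL is not true, so this filter matches zero rows
matches = cur.execute(
    "SELECT COUNT(*) FROM (SELECT 1) WHERE NULL = NULL"
).fetchone()[0]
print(matches)  # 0

# IS NULL is the correct test (1 means true in SQLite)
is_null = cur.execute("SELECT NULL IS NULL").fetchone()[0]
print(is_null)  # 1
```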
Use IS NULL or IS NOT NULL. Never use = NULL. It is a very common mistake that always returns no results because NULL cannot be equal to anything, including itself.
-- Correct
SELECT * FROM customers WHERE phone IS NULL;
SELECT * FROM customers WHERE phone IS NOT NULL;
-- WRONG -- this returns no rows
SELECT * FROM customers WHERE phone = NULL;
COALESCE returns the first non-NULL value in a list of arguments. It is widely used in data cleaning to replace NULLs with a default or fallback value.
Example: Replacing NULL phone numbers with a placeholder:
SELECT
name,
COALESCE(phone, 'No phone on record') AS contact_number
FROM customers;
-- Also useful in calculations to avoid NULL propagation
SELECT
order_id,
amount + COALESCE(discount, 0) AS adjusted_amount
FROM orders;
| Function | What It Does |
| COALESCE(a, b) | Returns the first non-NULL value from the list |
| NULLIF(a, b) | Returns NULL if a equals b; otherwise returns a |
NULLIF is often used to avoid division-by-zero errors.
-- COALESCE: replace NULL with a default
SELECT COALESCE(NULL, 0); -- returns 0
SELECT COALESCE(NULL, NULL, 5); -- returns 5
-- NULLIF: return NULL when values match (prevents divide-by-zero)
SELECT total_revenue / NULLIF(total_orders, 0) AS avg_order_value
FROM summary;
Constraints enforce rules on the data stored in a table, ensuring accuracy and consistency. Knowing them shows interviewers that you understand data integrity: not just how to query data, but how it is structured.
A primary key is a column (or combination of columns) that uniquely identifies each row in a table. No two rows can share the same primary key value, and a primary key column cannot contain NULL.
Example: In a customers table, customer_id is typically the primary key. Every customer gets a unique ID, and no customer record can exist without one.
A foreign key is a column in one table that references the primary key of another table, establishing a link between the two. It enforces referential integrity: you cannot insert a foreign key value that does not exist in the referenced table.
Example:
customers table              orders table
+-------------+              +-------------+
| customer_id |<─────────────| customer_id |
| name        |              | order_id    |
+-------------+              | amount      |
 PRIMARY KEY                 +-------------+
                              FOREIGN KEY (customer_id)
An orders row with customer_id = 999 cannot exist unless a customer with ID 999 exists in the customers table.
| Feature | Primary Key | Unique Key |
| Uniqueness | Must be unique | Must be unique |
| NULL values | Not allowed | Allowed (usually one NULL per column) |
| Number per table | Only one | Multiple unique keys allowed |
| Purpose | Uniquely identifies each row | Ensures no duplicates in a column |
Constraints are rules applied to columns to limit the type of data that can be entered.
| Constraint | What It Enforces |
| PRIMARY KEY | Unique, non-NULL identifier for each row |
| FOREIGN KEY | References a valid value in another table |
| UNIQUE | No duplicate values in the column |
| NOT NULL | Column cannot be left empty |
| CHECK | Values must satisfy a specified condition (e.g. age > 0) |
| DEFAULT | Assigns a default value when none is provided |
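A short sketch showing several of these constraints in action, using SQLite from Python. The schema is illustrative; note that SQLite requires foreign key enforcement to be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,      -- unique, non-NULL row identifier
        email       TEXT UNIQUE NOT NULL,     -- no duplicates, must be provided
        age         INTEGER CHECK (age > 0),  -- values must satisfy the condition
        country     TEXT DEFAULT 'Unknown'    -- fallback when no value is given
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    INSERT INTO customers (customer_id, email, age) VALUES (1, 'a@example.com', 30);
""")

conn.execute("INSERT INTO orders VALUES (1, 1)")  # valid foreign key: accepted

try:
    conn.execute("INSERT INTO orders VALUES (2, 999)")  # no customer 999 exists
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```

The second insert fails with an IntegrityError, exactly the referential-integrity guarantee described above: an orders row cannot point at a customer that does not exist.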
Normalization is the process of organising a database to reduce redundancy and improve data integrity. A properly normalised schema makes data easier to maintain and less prone to update anomalies.
Normalization means restructuring tables so that each piece of information is stored in only one place. For example, instead of storing a customer’s name and address in every order row, you split the data into a customers table and an orders table linked by a foreign key. This prevents inconsistencies: if the customer’s address changes, you update it in one place, not across hundreds of order rows.
| Normal Form | Rule (Plain Language) | What It Fixes |
| 1NF – First Normal Form | Each column holds a single, atomic value. No repeating groups. | A phone_numbers column storing multiple numbers as "123, 456"; split them into separate rows or a separate table. |
| 2NF – Second Normal Form | Must be in 1NF. Every non-key column depends on the whole primary key, not part of it. | Applies to composite keys. If a table uses (order_id, product_id) as a key, a column like customer_name that depends only on order_id violates 2NF; move it to an orders table. |
| 3NF – Third Normal Form | Must be in 2NF. Non-key columns must depend only on the primary key, not on other non-key columns. | If a table stores zip_code and city, and city is determined by zip_code rather than the primary key, move them into a zip_codes lookup table. |
| Feature | Normalization | Denormalization |
| Goal | Reduce redundancy, ensure integrity | Improve read performance |
| Data duplication | Minimised | Intentionally introduced |
| Write operations | Faster, fewer anomalies | Slower, more data to maintain |
| Read operations | More joins required | Fewer joins, simpler queries |
| Best for | Transactional systems (OLTP) | Reporting and analytics (OLAP) |
In data analyst workflows, denormalization is common in data warehouses where fast reads matter more than write efficiency.
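The update-in-one-place benefit of normalization is easy to demonstrate. In this small SQLite sketch (illustrative schema), an address change is a single UPDATE that every order immediately reflects through the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised: customer details live in exactly one place
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Old Street');
    INSERT INTO orders VALUES (101, 1), (102, 1), (103, 1);
""")

# One UPDATE touches exactly one row...
conn.execute("UPDATE customers SET address = 'New Street' WHERE customer_id = 1")

# ...and every order row sees the change through the join
rows = conn.execute("""
    SELECT o.order_id, c.address
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
""").fetchall()
print(rows)  # every order now shows 'New Street'
```

In a denormalized design, the same change would require updating the address on all three order rows, and missing one would leave the data inconsistent.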
A subquery is a query nested inside another query. Subqueries allow you to use the result of one query as input for another.
A subquery (also called an inner query or nested query) is a SELECT statement written inside another SQL statement. It runs first, and its result is used by the outer query.
When to use a subquery vs. a join: subqueries are often more readable when filtering against aggregate results, while joins are generally preferred for combining tables and usually perform better.
-- Find all customers who have placed at least one order above $1000
SELECT name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
WHERE amount > 1000
);
| Feature | Non-Correlated | Correlated |
| Dependency on outer query | Independent, runs once | Depends on the outer query, runs once per row |
| Performance | Generally faster | Can be slow on large datasets |
| Typical use | Filtering against a fixed set | Row-level comparisons |
-- Non-correlated: inner query runs once and returns a fixed list
SELECT name FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);
-- Correlated: inner query references the outer query's current row
SELECT name FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department -- references outer query
);
EXISTS stops as soon as it finds one matching row (short-circuits), making it more efficient than IN when the subquery returns a large result set. IN is simpler for small, fixed lists.
-- IN: compares a value against a fixed list of results
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders WHERE amount > 500
);
-- EXISTS: checks whether the subquery returns any rows at all (returns TRUE/FALSE)
SELECT name FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id AND o.amount > 500
);
Advanced SQL rounds go beyond syntax. Interviewers at this level want to see that you can write performant, readable queries for real-world analytical problems: ranking data, calculating running totals, working with hierarchies, and optimising queries that run against millions of rows.
The following advanced SQL interview questions for data analysts cover window functions, CTEs, query optimisation, and the patterns that come up most in mid-level and senior SQL data analysis interview questions.
Window functions perform calculations across a set of rows related to the current row without collapsing the table the way GROUP BY does. They are used heavily in analyst work for rankings, running totals, period-over-period comparisons, and percentile calculations.
Key Tip: Window functions do not reduce the number of rows returned. This is the most common point of confusion. Every row in the original dataset still appears in the output; the window function just adds a new calculated column alongside it.
A window function operates on a “window” of rows defined by an OVER() clause, calculating a value for each row based on its relationship to other rows in that window without grouping rows together. Think of it like this: GROUP BY collapses your table into summaries, but a window function lets each row keep its identity while still “looking at its neighbours.”
-- Average salary across the entire table, shown alongside each employee row
SELECT
name,
department,
salary,
AVG(salary) OVER() AS company_avg_salary
FROM employees;
All three assign a number to each row, but they handle ties differently.
| Function | Behaviour on Ties | Gaps in Sequence? |
| ROW_NUMBER() | Each tied row gets a unique, arbitrary number | No gaps |
| RANK() | Tied rows share the same rank; the next rank skips ahead | Yes, gaps after ties |
| DENSE_RANK() | Tied rows share the same rank; next rank is consecutive | No gaps |
SELECT
name,
salary,
ROW_NUMBER() OVER(ORDER BY salary DESC) AS row_num,
RANK() OVER(ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER(ORDER BY salary DESC) AS dense_rnk
FROM employees;
-- Example output for two employees tied at $90,000:
-- name | salary | row_num | rnk | dense_rnk
-- Alice | 90000 | 1 | 1 | 1
-- Bob | 90000 | 2 | 1 | 1
-- Carol | 85000 | 3 | 3 | 2 <-- RANK skips to 3; DENSE_RANK goes to 2
PARTITION BY divides the result set into groups for a window function to operate within without collapsing the rows. GROUP BY collapses rows into summaries.
| Feature | GROUP BY | PARTITION BY |
| Rows in output | One row per group | All original rows preserved |
| Used with | Aggregate functions | Window functions |
| Can show individual row details? | No | Yes |
Real-World Context: Used when ranking salespeople within each region, while still showing each salesperson’s individual row.
-- GROUP BY: one row per department, total salary
SELECT department, SUM(salary) AS dept_total
FROM employees
GROUP BY department;
-- PARTITION BY: all rows kept, each showing its department's total
SELECT
name,
department,
salary,
SUM(salary) OVER(PARTITION BY department) AS dept_total
FROM employees;
LAG accesses data from a previous row; LEAD accesses data from a following row. Both use an OVER() clause and are commonly used for period-over-period comparisons.
Real-World Context: Comparing this month’s revenue to last month’s in a financial dashboard.
SELECT
month,
revenue,
LAG(revenue, 1) OVER(ORDER BY month) AS prev_month_revenue,
LEAD(revenue, 1) OVER(ORDER BY month) AS next_month_revenue,
revenue - LAG(revenue, 1) OVER(ORDER BY month) AS mom_change
FROM monthly_revenue;
Use SUM() as a window function with ORDER BY inside the OVER() clause. It tells SQL to sum all rows from the start of the partition up to and including the current row.
Real-World Context: Cumulative revenue over time in a sales report.
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER(ORDER BY order_date) AS running_total
FROM daily_sales;
Use AVG() as a window function with a ROWS BETWEEN frame clause to define the rolling window.
Real-World Context: Smoothing daily traffic spikes in web analytics or financial dashboards.
SELECT
date,
daily_sessions,
AVG(daily_sessions) OVER(
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7day_avg
FROM web_traffic;
This frame says: “For each row, average the current row and the 6 rows before it”, giving a 7-day window.
NTILE(n) divides rows into n equal-sized groups (buckets) and assigns each row a bucket number from 1 to n.
Real-World Example: Splitting customers into quartiles (4 groups) by total spend to identify top spenders.
SELECT
customer_id,
total_spend,
NTILE(4) OVER(ORDER BY total_spend DESC) AS spend_quartile
FROM customer_summary;
-- Quartile 1 = top 25% spenders
CTEs are named, temporary result sets defined at the top of a query using the WITH keyword. They make complex queries dramatically easier to read by breaking logic into labelled steps, rather than nesting subquery upon subquery.
A CTE defines a named temporary result set that you can reference in the query that follows. Unlike a subquery, which is embedded inline, a CTE sits at the top of the query and can be referenced multiple times. This makes the query far easier to read, debug, and maintain.
-- Without CTE (hard to read)
SELECT customer_id, total_orders
FROM (
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
) sub
WHERE total_orders > 5;
-- With CTE (clear and readable)
WITH order_counts AS (
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
)
SELECT customer_id, total_orders
FROM order_counts
WHERE total_orders > 5;
A recursive CTE references itself, allowing it to iterate. This is useful for traversing hierarchical or graph-structured data.
Real-World Example: Navigating an org chart to find all reports under a given manager, regardless of depth.
WITH RECURSIVE org_chart AS (
-- Anchor: start with the top-level manager
SELECT employee_id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: join each employee to their manager found in the previous step
SELECT e.employee_id, e.name, e.manager_id, oc.level + 1
FROM employees e
INNER JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart ORDER BY level;
Note: Recursive CTEs are tested in senior-level interviews. Understanding the anchor/recursive member structure is key.
| Feature | CTE | Temporary Table |
| Lifespan | Exists only for the duration of the single query | Persists for the duration of the session |
| Storage | Not physically stored | Written to disk (tempdb or equivalent) |
| Reusability | Within a single query only | Can be queried multiple times across statements |
| Indexable | No | Yes |
| Best for | Complex single queries, improving readability | Multi-step ETL processes, reusing intermediate results |
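The lifespan difference in the table above can be seen directly. In this SQLite sketch (run from Python), the temp table survives across several separate statements, while the CTE is visible only inside the single query that defines it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 600), (1, 700), (2, 100);
""")

# Temp table: created once, then queried by multiple later statements
conn.execute("""
    CREATE TEMP TABLE big_orders AS
    SELECT customer_id, amount FROM orders WHERE amount > 500
""")
big_count = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]
big_sum   = conn.execute("SELECT SUM(amount) FROM big_orders").fetchone()[0]
print(big_count, big_sum)  # 2 1300

# CTE: exists only inside the one statement that defines it
cte_sum = conn.execute("""
    WITH big_orders_cte AS (
        SELECT amount FROM orders WHERE amount > 500
    )
    SELECT SUM(amount) FROM big_orders_cte
""").fetchone()[0]
print(cte_sum)  # 1300
```

Referencing big_orders_cte in a later, separate statement would fail with "no such table", whereas big_orders remains queryable for the rest of the session.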
Both produce intermediate result sets, but CTEs are named and defined upfront, making them far more readable when the same logic is needed more than once.
-- Same logic as a subquery (embedded, harder to read)
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders WHERE amount > 500
);
-- Same logic as a CTE (named, reusable, readable)
WITH high_value_orders AS (
SELECT customer_id FROM orders WHERE amount > 500
)
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM high_value_orders
);
For a single-use filter, a subquery is fine. When the same intermediate result is used in multiple places in a query, a CTE is cleaner and avoids repetition.
Query performance matters in analyst roles because slow queries delay dashboards, reports, and ad-hoc investigations. Interviewers therefore expect a basic understanding of how SQL engines retrieve data and how to help them do it faster.
An index is a separate data structure that the database maintains to speed up data retrieval. Think of it like the index at the back of a book: instead of reading every page to find a topic, you go straight to the page numbers listed in the index. Without an index, the database performs a full table scan, reading every row, which becomes very slow on large tables.
With an index on a frequently queried column (like customer_id or order_date), the database jumps directly to the relevant rows. The trade-off is that indexes consume storage and slightly slow down INSERT, UPDATE, and DELETE operations because the index must be updated too.
| Index Type | Description |
| Clustered Index | Physically reorders the table rows to match the index. One per table. Typically on the primary key. |
| Non-Clustered Index | A separate structure that points to row locations. Multiple allowed per table. |
| Unique Index | Enforces uniqueness on a column (often created automatically by PRIMARY KEY and UNIQUE constraints). |
| Composite Index | Index on multiple columns, most effective when queries filter on those columns in the same order. |
| Full-Text Index | Optimised for searching large text columns (e.g. product descriptions). |
| Partial / Filtered Index | Index on a subset of rows matching a condition, keeps the index small and targeted. |
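How an index changes the access path can be observed with SQLite's EXPLAIN QUERY PLAN. A small Python sketch follows; the exact plan wording varies slightly by SQLite version, but the scan-vs-search distinction is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")

def access_path(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = access_path(query)
print(plan_before)   # a full table scan, e.g. "SCAN orders"

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = access_path(query)
print(plan_after)    # an index search, e.g. "SEARCH orders USING INDEX idx_orders_customer ..."
```

Before the index, the engine must read every row ("SCAN"); afterwards it jumps straight to the matching rows via the index ("SEARCH ... USING INDEX"), which is the behaviour described above.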
The following SQL data analyst interview questions and answers cover hands-on write-the-query problems, the format used in live coding rounds, take-home assessments, and on-site whiteboard interviews.
| Column | Type | Description |
| order_id | INT | Primary key |
| customer_id | INT | Foreign key to customers |
| order_date | DATE | Date the order was placed |
| amount | DECIMAL | Order value |
Task: Return customer_id and the count of their orders for any customer who placed more than 3 orders in the last 30 days.
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY customer_id
HAVING COUNT(*) > 3
ORDER BY order_count DESC;
| Column | Type | Description |
| employee_id | INT | Primary key |
| name | VARCHAR | Employee name |
| salary | DECIMAL | Annual salary |
Task: Return the second highest distinct salary.
-- Approach 1: Using subquery with MAX
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (
SELECT MAX(salary) FROM employees
);
-- Approach 2: Using DENSE_RANK (handles ties and is more scalable)
WITH ranked AS (
SELECT salary, DENSE_RANK() OVER(ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT salary AS second_highest_salary
FROM ranked
WHERE rnk = 2
LIMIT 1;
| Column | Type | Description |
| customer_id | INT | Primary key |
| email | VARCHAR | Customer email address |
| name | VARCHAR | Customer name |
Task: Find email addresses that appear more than once in the table.
Real-world context: Used to identify duplicate email addresses before a CRM migration or merge.
SELECT
email,
COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
-- To see the full rows, not just the duplicate emails:
SELECT *
FROM customers
WHERE email IN (
SELECT email
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
)
ORDER BY email;
| Column | Type | Description |
| employee_id | INT | Primary key |
| name | VARCHAR | Employee name |
| department | VARCHAR | Department name |
| salary | DECIMAL | Annual salary |
Task: Rank each employee by salary within their department. Highest salary = rank 1.
Real-world context: Commonly used in HR analytics dashboards to surface the highest earners per team.
SELECT
name,
department,
salary,
DENSE_RANK() OVER(
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees
ORDER BY department, salary_rank;
| Column | Type | Description |
| month | DATE | First day of the month |
| revenue | DECIMAL | Total revenue for that month |
Task: Show each month’s revenue alongside the previous month’s, and calculate the absolute change.
Real-world context: This is a standard business reporting query used in finance and sales dashboards.
SELECT
month,
revenue,
LAG(revenue, 1) OVER(ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue, 1) OVER(ORDER BY month) AS mom_change,
ROUND(
100.0 * (revenue - LAG(revenue, 1) OVER(ORDER BY month))
/ NULLIF(LAG(revenue, 1) OVER(ORDER BY month), 0),
2
) AS pct_change
FROM monthly_revenue
ORDER BY month;
| Customers | Type | Description |
| customer_id | INT | Primary key |
| name | VARCHAR | Customer name |
| Orders | Type | Description |
| order_id | INT | Primary key |
| customer_id | INT | FK to customers |
| order_date | DATE | Order date |
Task: Return customers who have not placed any order in the last 90 days (including customers who have never ordered).
-- Anti-join approach using LEFT JOIN
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
AND o.order_date >= CURRENT_DATE - INTERVAL '90 days'
WHERE o.order_id IS NULL;
-- Alternative using NOT EXISTS
SELECT customer_id, name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= CURRENT_DATE - INTERVAL '90 days'
);
| Column | Type | Description |
| product_id | INT | Primary key |
| product_name | VARCHAR | Product name |
| category | VARCHAR | Product category |
| revenue | DECIMAL | Total revenue |
Task: Return the top 3 products by revenue within each category.
WITH ranked_products AS (
SELECT
product_name,
category,
revenue,
DENSE_RANK() OVER(
PARTITION BY category
ORDER BY revenue DESC
) AS revenue_rank
FROM product_sales
)
SELECT product_name, category, revenue, revenue_rank
FROM ranked_products
WHERE revenue_rank <= 3
ORDER BY category, revenue_rank;
| Column | Type | Description |
| sale_date | DATE | Date of sales |
| daily_revenue | DECIMAL | Revenue for that day |
Task: Show each day’s revenue alongside the running cumulative total from the beginning of the dataset.
Real-world context: Used in financial and sales dashboards to track progress toward revenue targets.
SELECT
sale_date,
daily_revenue,
SUM(daily_revenue) OVER(
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM daily_sales
ORDER BY sale_date;
| Column | Type | Description |
| user_id | INT | User identifier |
| login_date | DATE | Date of login |
Task: Find users who logged in on at least two consecutive calendar days.
Real-world context: Used in retention and engagement analysis to identify active streaks.
WITH login_gaps AS (
SELECT
user_id,
login_date,
LAG(login_date) OVER(PARTITION BY user_id ORDER BY login_date) AS prev_login
FROM user_logins
)
SELECT DISTINCT user_id
FROM login_gaps
WHERE login_date - prev_login = 1;
-- In MySQL: DATEDIFF(login_date, prev_login) = 1
| Column | Type | Description |
| product | VARCHAR | Product name |
| month | VARCHAR | Month name (e.g. 'Jan', 'Feb') |
| revenue | DECIMAL | Revenue for that month |
Before (row format):
| Product | Month | Revenue |
| Widget A | Jan | 5000 |
| Widget A | Feb | 6000 |
| Widget B | Jan | 3000 |
After (column format):
| Product | Jan | Feb |
| Widget A | 5000 | 6000 |
| Widget B | 3000 | NULL |
Real-world context: Required when preparing data for a spreadsheet or executive report where each month needs its own column.
-- PostgreSQL / Standard SQL using conditional aggregation
SELECT
product,
SUM(CASE WHEN month = 'Jan' THEN revenue ELSE 0 END) AS Jan,
SUM(CASE WHEN month = 'Feb' THEN revenue ELSE 0 END) AS Feb,
SUM(CASE WHEN month = 'Mar' THEN revenue ELSE 0 END) AS Mar
FROM monthly_sales
GROUP BY product;
-- SQL Server supports PIVOT syntax:
SELECT product, [Jan], [Feb], [Mar]
FROM monthly_sales
PIVOT (
SUM(revenue)
FOR month IN ([Jan], [Feb], [Mar])
) AS pivot_table;
SQL interview questions for data analysts cover several core concepts such as joins, aggregations, and NULL handling, progressing into advanced techniques such as window functions, CTEs, and pivoting.
You must practise writing queries independently, recreate schemas, and solve problems without referring to answers to build true fluency under pressure.
For structured preparation, you may explore Interview Kickstart’s Data Analyst Interview Prep program for guided practice and feedback.
The four categories that appear most consistently are joins (especially INNER JOIN and LEFT JOIN), aggregation functions with GROUP BY and HAVING, window functions (particularly ROW_NUMBER, RANK, and LAG/LEAD), and query logic problems involving subqueries or CTEs. Most interviews layer these together, for example, asking you to rank or aggregate within groups, so practising combinations is more effective than drilling each topic in isolation.
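As a quick illustration of that layering, here is a minimal sketch of a ranking-within-groups query: ROW_NUMBER() with PARTITION BY inside a CTE, filtered in the outer query. The employees table and its values are invented for demonstration, and the query is run through Python's built-in sqlite3 module (SQLite 3.25+ is required for window functions).

```python
import sqlite3

# In-memory database with a small hypothetical employees table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Asha', 'Sales', 70000),
  ('Ben',  'Sales', 85000),
  ('Chen', 'Sales', 60000),
  ('Dev',  'Tech',  95000),
  ('Eli',  'Tech',  90000);
""")

# Layered pattern: rank within each department using a window function
# inside a CTE, then filter on the rank in the outer query.
query = """
WITH ranked AS (
  SELECT name, department, salary,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
  FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 2
ORDER BY department, salary DESC;
"""
rows = list(conn.execute(query))
for row in rows:
    print(row)
```

This is exactly the "top N per group" shape from the ranked_products example above, so practising it once covers both.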
Aim to thoroughly understand around 50-70 questions across all difficulty levels rather than skimming through 200. Depth matters more than breadth in interviews. Interviewers follow up and probe your reasoning. If you can write, explain, and optimise queries for the 20 most common patterns cold, you will be better prepared than someone who has read 100 questions without writing any.
SQL is necessary but not sufficient on its own. Most analyst roles also expect proficiency in at least one data visualisation tool (Tableau, Power BI, or Looker), familiarity with spreadsheets, and increasingly some Python or R for statistical analysis. SQL handles the data extraction and transformation layer; the other tools handle the communication of results. Demonstrating all three in a portfolio project is the strongest signal you can give a hiring team.
Entry-level roles typically expect solid command of SELECT queries with WHERE, GROUP BY, ORDER BY, and HAVING; all four standard join types; basic aggregate functions; and NULL handling. You may also be asked to write a simple subquery. Window functions and CTEs are increasingly common even at entry level, especially in tech companies, so covering them in your preparation is worthwhile even if you are not required to have deep experience with them.
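A single query can exercise most of that entry-level checklist. The sketch below, run against a small invented orders table via Python's built-in sqlite3 module, combines WHERE (row filter), GROUP BY (grouping), HAVING (group filter), and ORDER BY with aggregate functions.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES
  ('Asha', 120.0), ('Asha', 80.0), ('Ben', 50.0),
  ('Chen', 200.0), ('Chen', 40.0), ('Chen', 60.0);
""")

# WHERE filters individual rows, HAVING filters whole groups:
# Ben has only one qualifying order, so HAVING drops him.
query = """
SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
FROM orders
WHERE amount > 30
GROUP BY customer
HAVING COUNT(*) >= 2
ORDER BY total DESC;
"""
rows = list(conn.execute(query))
for row in rows:
    print(row)
```

Being able to explain why Ben is excluded by HAVING rather than WHERE is a classic entry-level follow-up.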
The most effective preparation combines writing queries on real datasets with reviewing worked examples. Use platforms like LeetCode (Database section), Mode Analytics, or Kaggle SQL labs to write queries against actual tables, not just read answers. Focus on getting queries wrong the first time and debugging them; that process mirrors what happens in live interviews. Time yourself on medium-difficulty problems to simulate interview pressure.
Yes. Window functions appear in a large proportion of mid-to-senior analyst interviews and are increasingly common even at entry level. The functions that come up most are ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), and SUM() OVER() for running totals. If you can explain the difference between PARTITION BY and GROUP BY and write a clean ranking query from scratch, you will handle the majority of window function questions.
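The GROUP BY versus PARTITION BY distinction is easiest to see side by side: GROUP BY collapses each group to one row, while a window function keeps every row and attaches the group total to it. The sketch below uses a small invented sales table, run through Python's built-in sqlite3 module (SQLite 3.25+ for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('East', 10), ('East', 20), ('West', 5);
""")

# GROUP BY collapses each region to a single summary row...
grouped = list(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region;"))

# ...while PARTITION BY keeps all three rows and adds the region total to each.
windowed = list(conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount;
"""))
print(grouped)   # one row per region
print(windowed)  # original rows preserved, total attached
```

If you can narrate this difference while writing the query, you have answered the most common window-function probe.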