Article written by Rishabh Dev Choudhary, under the guidance of Lakshmi Pavan Daggubati, a Data Science Manager at Meta. Reviewed by Vishal Rana, a versatile ML Engineer with deep expertise in data engineering, big data pipelines, advanced analytics, and AI-driven solutions.
The growing demand for data analysts, driven by automation and large-scale digital transformation, has positioned this profession among the most secure and future-focused career paths in today’s data economy. Preparation for SQL interview questions and answers is central to landing these roles.
As organisations become increasingly data driven, demand for analysts who can query, transform, and interpret data continues to rise.
In many technical interviews, SQL is the first skill to be tested, and often the one that decides whether you progress.
This guide compiles over 50 carefully selected questions spanning basic SQL fundamentals, advanced techniques, and hands-on query-writing scenarios.
Candidates preparing for SQL interviews can use it as a structured roadmap from core concepts to interview-ready fluency.
Basic SQL rounds evaluate whether you possess a dependable working foundation that reflects real on-the-job data work. Interviewers commonly assess your command of joins, aggregations, filtering logic, NULL handling, and data integrity principles.
These topics frequently appear in SQL interview questions for data analyst roles, so expect them early in most screening rounds.
SQL (Structured Query Language) is the standard language for querying and managing data stored in relational databases. A DBMS (Database Management System) is software that allows you to store, retrieve, and manage that data in an organised way.
SQL is the most widely required technical skill in analyst job descriptions because it is how analysts extract and prepare the data behind every report, dashboard, and analysis.
To understand how SQL works, you first need a clear picture of how a relational database organises data into tables of rows and columns.
Joins are used to combine rows from two or more tables based on a related column. Data analysts rely on joins constantly, for example, linking a customers table to an orders table to see which customers made purchases.
A JOIN combines rows from two or more tables based on a related column shared between them.
How do two tables connect?
customers table              orders table
+-------------+              +-------------+
| customer_id |<─────────────| customer_id |
| name        |              | order_id    |
| email       |              | amount      |
+-------------+              | order_date  |
                             +-------------+
Shared column: customer_id
Example
SELECT c.name, o.order_id, o.amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
Although all JOIN operations combine data from related tables, they differ in how they treat unmatched rows.
To clearly understand how these JOIN types differ, let us examine what each one returns and the situations where it is most appropriately used. The comparison below highlights how each JOIN handles matching and non-matching rows.
| Join Type | What It Returns | When to Use It |
| INNER JOIN | Only rows where the join condition matches in both tables | When you only want records that exist in both tables |
| LEFT JOIN | All rows from the left table, plus matching rows from the right (NULLs where no match) | When you want all records from the main table, regardless of whether a match exists |
| RIGHT JOIN | All rows from the right table, plus matching rows from the left (NULLs where no match) | Less common; same as LEFT JOIN with table order reversed |
| FULL OUTER JOIN | All rows from both tables; NULLs fill the unmatched sides | When you want a complete picture with no records dropped |
-- INNER JOIN
SELECT c.name, o.order_id
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- LEFT JOIN
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- FULL OUTER JOIN
SELECT c.name, o.order_id
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
A SELF JOIN joins a table to itself. This is useful when rows in the same table have a relationship with each other, for example, an employees table where each employee has a manager_id that references another row in the same table.
Example: Finding each employee and their manager:
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
A CROSS JOIN returns the Cartesian product of two tables: every row in the first table is paired with every row in the second. If table A has 3 rows and table B has 4 rows, the result has 12 rows.
Common use case: Generating all possible combinations. For example, pairing every product with every store location to create a template for sales reporting.
SELECT p.product_name, s.store_name
FROM products p
CROSS JOIN stores s;
| Feature | JOIN | UNION |
| Direction | Combines columns from multiple tables | Combines rows from multiple queries |
| Requirement | Tables share a related column | Queries must return the same number of columns with compatible data types |
| Duplicates | Not a concern in the same way | UNION removes duplicates; UNION ALL keeps them |
| Use case | Enriching a dataset with additional columns | Stacking similar datasets from different sources |
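To make the contrast concrete, here is a small runnable sketch using SQLite from Python. The table names (online_sales, store_sales) are illustrative, not from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE online_sales (product TEXT, amount INTEGER);
    CREATE TABLE store_sales  (product TEXT, amount INTEGER);
    INSERT INTO online_sales VALUES ('Widget', 100), ('Gadget', 200);
    INSERT INTO store_sales  VALUES ('Widget', 100), ('Doohickey', 50);
""")

# JOIN adds columns: rows are matched across tables on a shared column
joined = cur.execute("""
    SELECT o.product, o.amount, s.amount
    FROM online_sales o
    JOIN store_sales s ON o.product = s.product
""").fetchall()

# UNION stacks rows and removes exact duplicates
union_rows = cur.execute("""
    SELECT product, amount FROM online_sales
    UNION
    SELECT product, amount FROM store_sales
""").fetchall()

# UNION ALL stacks rows and keeps duplicates
union_all_rows = cur.execute("""
    SELECT product, amount FROM online_sales
    UNION ALL
    SELECT product, amount FROM store_sales
""").fetchall()

print(len(joined), len(union_rows), len(union_all_rows))  # 1 3 4
```

UNION silently dropped the duplicate ('Widget', 100) row, while UNION ALL kept all four rows; the JOIN produced one row because only 'Widget' exists in both tables.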
Aggregate functions summarise multiple rows into a single value. In data analysis, they are the core of any reporting query, calculating totals, averages, counts, and finding extremes across groups of data.
Aggregate functions perform a calculation across a set of rows and return a single result.
| Function | What It Does |
| COUNT() | Counts the number of rows |
| SUM() | Adds up all values in a column |
| AVG() | Returns the average value |
| MIN() | Returns the smallest value |
| MAX() | Returns the largest value |
SELECT
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM orders;
This is one of the most frequently tested basic SQL questions. The key distinction is when the filtering happens.
| Feature | WHERE | HAVING |
| Filters | Individual rows | Grouped results |
| Used with | Any query | Must follow GROUP BY |
| Can reference aggregate functions? | No | Yes |
| Executes | Before aggregation | After aggregation |
-- WHERE filters rows before grouping
SELECT region, SUM(amount) AS total_sales
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY region;
-- HAVING filters groups after aggregation
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
HAVING SUM(amount) > 100000;
GROUP BY collapses all rows that share the same value in one or more columns into a single summary row. Any column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY clause.
Example: Total sales by region:
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
ORDER BY total_sales DESC;
Use COUNT(DISTINCT column_name) to count unique values, excluding duplicates and NULLs.
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
It is particularly useful in data analysis when you need to know how many unique users, products, or events exist in a dataset, rather than the raw row count which may include repeats.
| Variation | What It Counts | Includes NULLs |
| COUNT(*) | All rows, including those with NULL values in any column | Yes |
| COUNT(column) | Only rows where the specified column is NOT NULL | No |
| COUNT(DISTINCT column) | Only unique non-NULL values in the specified column | No |
-- Counts all rows
SELECT COUNT(*) FROM orders;
-- Counts only rows where email is not NULL
SELECT COUNT(email) FROM customers;
-- Counts unique non-NULL product categories
SELECT COUNT(DISTINCT category) FROM products;
Filtering and sorting determine which rows your query returns and the order in which they appear, making them foundational operations that are tested in almost every SQL screening.
Use the WHERE clause to filter rows based on one or more conditions. The WHERE clause is evaluated before any aggregation.
-- Single condition
SELECT * FROM orders WHERE amount > 500;
-- Multiple conditions
SELECT * FROM orders
WHERE amount > 500
AND order_date >= '2025-01-01';
-- Using OR
SELECT * FROM customers
WHERE country = 'India' OR country = 'Singapore';
| Operator | Purpose | Example |
| BETWEEN | Filters rows within a range (inclusive) | amount BETWEEN 100 AND 500 |
| IN | Filters rows matching any value in a list | country IN ('India', 'US', 'UK') |
| LIKE | Filters rows matching a pattern | name LIKE 'A%' |
-- BETWEEN
SELECT * FROM orders WHERE amount BETWEEN 100 AND 500;
-- IN
SELECT * FROM customers WHERE country IN ('India', 'US', 'UK');
-- LIKE (% = any sequence of characters, _ = single character)
SELECT * FROM customers WHERE name LIKE 'A%';
SELECT * FROM products WHERE code LIKE 'SKU_001';
Use ORDER BY to sort results. By default, sorting is ascending (smallest to largest). Add DESC for descending order.
-- Ascending (default)
SELECT name, salary FROM employees ORDER BY salary;
-- Descending
SELECT name, salary FROM employees ORDER BY salary DESC;
-- Sort by multiple columns
SELECT department, name, salary
FROM employees
ORDER BY department ASC, salary DESC;
Both restrict the number of rows returned, but they belong to different SQL dialects.
| Keyword | Database | Syntax |
| LIMIT | MySQL, PostgreSQL, SQLite | At the end of the query |
| TOP | SQL Server, MS Access | At the start of the SELECT clause |
-- PostgreSQL / MySQL
SELECT * FROM orders ORDER BY amount DESC LIMIT 10;
-- SQL Server
SELECT TOP 10 * FROM orders ORDER BY amount DESC;
NULL values represent missing or unknown data. NULLs can silently distort aggregations and comparisons if not handled correctly; for example, AVG() ignores NULLs, which may give a misleading average if many values are missing.
NULL means the value is unknown or absent. It is not the same as 0 (which is a real number) or '' (which is a real but empty string). Think of it like a blank cell on a form versus a cell where someone wrote "0" or "N/A."
The distinction matters because NULL behaves differently in comparisons and arithmetic: any arithmetic operation involving NULL returns NULL.
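These NULL behaviours can be demonstrated directly. A minimal SQLite sketch, driven from Python:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Arithmetic involving NULL propagates NULL (surfaced as None in Python)
null_sum = cur.execute("SELECT 10 + NULL").fetchone()[0]
print(null_sum)  # None

# NULL = NULL is not true, so this filter matches zero rows
matches = cur.execute(
    "SELECT COUNT(*) FROM (SELECT 1) WHERE NULL = NULL"
).fetchone()[0]
print(matches)  # 0

# IS NULL is the correct test (1 means true in SQLite)
is_null = cur.execute("SELECT NULL IS NULL").fetchone()[0]
print(is_null)  # 1
```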
Use IS NULL or IS NOT NULL. Never use = NULL. It is a very common mistake that always returns no results because NULL cannot be equal to anything, including itself.
-- Correct
SELECT * FROM customers WHERE phone IS NULL;
SELECT * FROM customers WHERE phone IS NOT NULL;
-- WRONG -- this returns no rows
SELECT * FROM customers WHERE phone = NULL;
COALESCE returns the first non-NULL value in a list of arguments. It is widely used in data cleaning to replace NULLs with a default or fallback value.
Example: Replacing NULL phone numbers with a placeholder:
SELECT
name,
COALESCE(phone, 'No phone on record') AS contact_number
FROM customers;
-- Also useful in calculations to avoid NULL propagation
SELECT
order_id,
amount + COALESCE(discount, 0) AS adjusted_amount
FROM orders;
| Function | What It Does |
| COALESCE(a, b) | Returns the first non-NULL value from the list |
| NULLIF(a, b) | Returns NULL if a equals b; otherwise returns a |
NULLIF is often used to avoid division-by-zero errors.
-- COALESCE: replace NULL with a default
SELECT COALESCE(NULL, 0); -- returns 0
SELECT COALESCE(NULL, NULL, 5); -- returns 5
-- NULLIF: return NULL when values match (prevents divide-by-zero)
SELECT total_revenue / NULLIF(total_orders, 0) AS avg_order_value
FROM summary;
Constraints enforce rules on the data stored in a table, ensuring accuracy and consistency. Knowing them shows interviewers that you understand data integrity: not just how to query data, but how it is structured.
A primary key is a column (or combination of columns) that uniquely identifies each row in a table. No two rows can share the same primary key value, and a primary key column cannot contain NULL.
Example: In a customers table, customer_id is typically the primary key. Every customer gets a unique ID, and no customer record can exist without one.
A foreign key is a column in one table that references the primary key of another table, establishing a link between the two. It enforces referential integrity: you cannot insert a foreign key value that does not exist in the referenced table.
Example:
customers table              orders table
+-------------+              +-------------+
| customer_id |<─────────────| customer_id |
| name        |              | order_id    |
+-------------+              | amount      |
 PRIMARY KEY                 +-------------+
                              FOREIGN KEY (customer_id)
An orders row with customer_id = 999 cannot exist unless a customer with ID 999 exists in the customers table.
| Feature | Primary Key | Unique Key |
| Uniqueness | Must be unique | Must be unique |
| NULL values | Not allowed | Allowed (usually one NULL per column) |
| Number per table | Only one | Multiple unique keys allowed |
| Purpose | Uniquely identifies each row | Ensures no duplicates in a column |
Constraints are rules applied to columns to limit the type of data that can be entered.
| Constraint | What It Enforces |
| PRIMARY KEY | Unique, non-NULL identifier for each row |
| FOREIGN KEY | References a valid value in another table |
| UNIQUE | No duplicate values in the column |
| NOT NULL | Column cannot be left empty |
| CHECK | Values must satisfy a specified condition (e.g. age > 0) |
| DEFAULT | Assigns a default value when none is provided |
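A short sketch showing several of these constraints in action, using SQLite from Python. The schema is illustrative; note that SQLite requires foreign key enforcement to be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,      -- unique, non-NULL row identifier
        email       TEXT UNIQUE NOT NULL,     -- no duplicates, must be provided
        age         INTEGER CHECK (age > 0),  -- values must satisfy the condition
        country     TEXT DEFAULT 'Unknown'    -- fallback when no value is given
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    INSERT INTO customers (customer_id, email, age) VALUES (1, 'a@example.com', 30);
""")

conn.execute("INSERT INTO orders VALUES (1, 1)")  # valid foreign key: accepted

try:
    conn.execute("INSERT INTO orders VALUES (2, 999)")  # no customer 999 exists
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```

The second insert fails with an IntegrityError, exactly the referential-integrity guarantee described above: an orders row cannot point at a customer that does not exist.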
Normalization is the process of organising a database to reduce redundancy and improve data integrity. A properly normalised schema makes data easier to maintain and less prone to update anomalies.
Normalization means restructuring tables so that each piece of information is stored in only one place. For example, instead of storing a customer’s name and address in every order row, you split the data into a customers table and an orders table linked by a foreign key. This prevents inconsistencies: if the customer’s address changes, you update it in one place, not across hundreds of order rows.
| Normal Form | Rule (Plain Language) | What It Fixes |
| 1NF – First Normal Form | Each column holds a single, atomic value. No repeating groups. | A phone_numbers column storing multiple numbers as "123, 456"; split them into separate rows or a separate table. |
| 2NF – Second Normal Form | Must be in 1NF. Every non-key column depends on the whole primary key, not part of it. | Applies to composite keys. If a table uses (order_id, product_id) as a key, a column like customer_name that depends only on order_id violates 2NF; move it to an orders table. |
| 3NF – Third Normal Form | Must be in 2NF. Non-key columns must depend only on the primary key, not on other non-key columns. | If a table stores zip_code and city, and city is determined by zip_code rather than the primary key, move them into a zip_codes lookup table. |
| Feature | Normalization | Denormalization |
| Goal | Reduce redundancy, ensure integrity | Improve read performance |
| Data duplication | Minimised | Intentionally introduced |
| Write operations | Faster, fewer anomalies | Slower, more data to maintain |
| Read operations | More joins required | Fewer joins, simpler queries |
| Best for | Transactional systems (OLTP) | Reporting and analytics (OLAP) |
In data analyst workflows, denormalization is common in data warehouses where fast reads matter more than write efficiency.
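The update-in-one-place benefit of normalization is easy to demonstrate. In this small SQLite sketch (illustrative schema), an address change is a single UPDATE that every order immediately reflects through the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised: customer details live in exactly one place
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Old Street');
    INSERT INTO orders VALUES (101, 1), (102, 1), (103, 1);
""")

# One UPDATE touches exactly one row...
conn.execute("UPDATE customers SET address = 'New Street' WHERE customer_id = 1")

# ...and every order row sees the change through the join
rows = conn.execute("""
    SELECT o.order_id, c.address
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
""").fetchall()
print(rows)  # every order now shows 'New Street'
```

In a denormalized design, the same change would require updating the address on all three order rows, and missing one would leave the data inconsistent.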
A subquery is a query nested inside another query. Subqueries allow you to use the result of one query as input for another.
A subquery (also called an inner query or nested query) is a SELECT statement written inside another SQL statement. It runs first, and its result is used by the outer query.
When to use a subquery vs. a join: subqueries are often more readable when filtering against aggregate results, while joins are generally preferred for combining tables and usually perform better.
-- Find all customers who have placed at least one order above $1000
SELECT name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
WHERE amount > 1000
);
| Feature | Non-Correlated | Correlated |
| Dependency on outer query | Independent, runs once | Depends on the outer query, runs once per row |
| Performance | Generally faster | Can be slow on large datasets |
| Typical use | Filtering against a fixed set | Row-level comparisons |
-- Non-correlated: inner query runs once and returns a fixed list
SELECT name FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);
-- Correlated: inner query references the outer query's current row
SELECT name FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department -- references outer query
);
EXISTS stops as soon as it finds one matching row (short-circuits), making it more efficient than IN when the subquery returns a large result set. IN is simpler for small, fixed lists.
-- IN: compares a value against a fixed list of results
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders WHERE amount > 500
);
-- EXISTS: checks whether the subquery returns any rows at all (returns TRUE/FALSE)
SELECT name FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id AND o.amount > 500
);
Advanced SQL rounds go beyond syntax. Interviewers at this level want to see that you can write performant, readable queries for real-world analytical problems: ranking data, calculating running totals, working with hierarchies, and optimising queries that run against millions of rows.
The following advanced SQL interview questions for data analysts cover window functions, CTEs, query optimisation, and the patterns that come up most in mid-level and senior SQL data analysis interview questions.
Window functions perform calculations across a set of rows related to the current row without collapsing the table the way GROUP BY does. They are used heavily in analyst work for rankings, running totals, period-over-period comparisons, and percentile calculations.
Key Tip: Window functions do not reduce the number of rows returned. This is the most common point of confusion. Every row in the original dataset still appears in the output; the window function just adds a new calculated column alongside it.
A window function operates on a “window” of rows defined by an OVER() clause, calculating a value for each row based on its relationship to other rows in that window without grouping rows together. Think of it like this: GROUP BY collapses your table into summaries, but a window function lets each row keep its identity while still “looking at its neighbours.”
-- Average salary across the entire table, shown alongside each employee row
SELECT
name,
department,
salary,
AVG(salary) OVER() AS company_avg_salary
FROM employees;
All three assign a number to each row, but they handle ties differently.
| Function | Behaviour on Ties | Gaps in Sequence? |
| ROW_NUMBER() | Each tied row gets a unique, arbitrary number | No gaps |
| RANK() | Tied rows share the same rank; the next rank skips ahead | Yes, gaps after ties |
| DENSE_RANK() | Tied rows share the same rank; next rank is consecutive | No gaps |
SELECT
name,
salary,
ROW_NUMBER() OVER(ORDER BY salary DESC) AS row_num,
RANK() OVER(ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER(ORDER BY salary DESC) AS dense_rnk
FROM employees;
-- Example output for two employees tied at $90,000:
-- name | salary | row_num | rnk | dense_rnk
-- Alice | 90000 | 1 | 1 | 1
-- Bob | 90000 | 2 | 1 | 1
-- Carol | 85000 | 3 | 3 | 2 <-- RANK skips to 3; DENSE_RANK goes to 2
PARTITION BY divides the result set into groups for a window function to operate within without collapsing the rows. GROUP BY collapses rows into summaries.
| Feature | GROUP BY | PARTITION BY |
| Rows in output | One row per group | All original rows preserved |
| Used with | Aggregate functions | Window functions |
| Can show individual row details? | No | Yes |
Real-World Context: Used when ranking salespeople within each region, while still showing each salesperson’s individual row.
-- GROUP BY: one row per department, total salary
SELECT department, SUM(salary) AS dept_total
FROM employees
GROUP BY department;
-- PARTITION BY: all rows kept, each showing its department's total
SELECT
name,
department,
salary,
SUM(salary) OVER(PARTITION BY department) AS dept_total
FROM employees;
LAG accesses data from a previous row; LEAD accesses data from a following row. Both use an OVER() clause and are commonly used for period-over-period comparisons.
Real-World Context: Comparing this month’s revenue to last month’s in a financial dashboard.
SELECT
month,
revenue,
LAG(revenue, 1) OVER(ORDER BY month) AS prev_month_revenue,
LEAD(revenue, 1) OVER(ORDER BY month) AS next_month_revenue,
revenue - LAG(revenue, 1) OVER(ORDER BY month) AS mom_change
FROM monthly_revenue;
Use SUM() as a window function with ORDER BY inside the OVER() clause. It tells SQL to sum all rows from the start of the partition up to and including the current row.
Real-World Context: Cumulative revenue over time in a sales report.
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER(ORDER BY order_date) AS running_total
FROM daily_sales;
Use AVG() as a window function with a ROWS BETWEEN frame clause to define the rolling window.
Real-World Context: Smoothing daily traffic spikes in web analytics or financial dashboards.
SELECT
date,
daily_sessions,
AVG(daily_sessions) OVER(
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7day_avg
FROM web_traffic;
This frame says: “For each row, average the current row and the 6 rows before it”, giving a 7-day window.
NTILE(n) divides rows into n equal-sized groups (buckets) and assigns each row a bucket number from 1 to n.
Real-World Example: Splitting customers into quartiles (4 groups) by total spend to identify top spenders.
SELECT
customer_id,
total_spend,
NTILE(4) OVER(ORDER BY total_spend DESC) AS spend_quartile
FROM customer_summary;
-- Quartile 1 = top 25% spenders
CTEs are named, temporary result sets defined at the top of a query using the WITH keyword. They make complex queries dramatically easier to read by breaking logic into labelled steps, rather than nesting subquery upon subquery.
A CTE defines a named temporary result set that you can reference in the query that follows. Unlike a subquery, which is embedded inline, a CTE sits at the top of the query and can be referenced multiple times. This makes the query far easier to read, debug, and maintain.
-- Without CTE (hard to read)
SELECT customer_id, total_orders
FROM (
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
) sub
WHERE total_orders > 5;
-- With CTE (clear and readable)
WITH order_counts AS (
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
)
SELECT customer_id, total_orders
FROM order_counts
WHERE total_orders > 5;
A recursive CTE references itself, allowing it to iterate. This is useful for traversing hierarchical or graph-structured data.
Real-World Example: Navigating an org chart to find all reports under a given manager, regardless of depth.
WITH RECURSIVE org_chart AS (
-- Anchor: start with the top-level manager
SELECT employee_id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: join each employee to their manager found in the previous step
SELECT e.employee_id, e.name, e.manager_id, oc.level + 1
FROM employees e
INNER JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart ORDER BY level;
Note: Recursive CTEs are tested in senior-level interviews. Understanding the anchor/recursive member structure is key.
| Feature | CTE | Temporary Table |
| Lifespan | Exists only for the duration of the single query | Persists for the duration of the session |
| Storage | Not physically stored | Written to disk (tempdb or equivalent) |
| Reusability | Within a single query only | Can be queried multiple times across statements |
| Indexable | No | Yes |
| Best for | Complex single queries, improving readability | Multi-step ETL processes, reusing intermediate results |
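The lifespan difference in the table above can be seen directly. In this SQLite sketch (run from Python), the temp table survives across several separate statements, while the CTE is visible only inside the single query that defines it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 600), (1, 700), (2, 100);
""")

# Temp table: created once, then queried by multiple later statements
conn.execute("""
    CREATE TEMP TABLE big_orders AS
    SELECT customer_id, amount FROM orders WHERE amount > 500
""")
big_count = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]
big_sum   = conn.execute("SELECT SUM(amount) FROM big_orders").fetchone()[0]
print(big_count, big_sum)  # 2 1300

# CTE: exists only inside the one statement that defines it
cte_sum = conn.execute("""
    WITH big_orders_cte AS (
        SELECT amount FROM orders WHERE amount > 500
    )
    SELECT SUM(amount) FROM big_orders_cte
""").fetchone()[0]
print(cte_sum)  # 1300
```

Referencing big_orders_cte in a later, separate statement would fail with "no such table", whereas big_orders remains queryable for the rest of the session.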
Both produce intermediate result sets, but CTEs are named and defined upfront, making them far more readable when the same logic is needed more than once.
-- Same logic as a subquery (embedded, harder to read)
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders WHERE amount > 500
);
-- Same logic as a CTE (named, reusable, readable)
WITH high_value_orders AS (
SELECT customer_id FROM orders WHERE amount > 500
)
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM high_value_orders
);
For a single-use filter, a subquery is fine. When the same intermediate result is used in multiple places in a query, a CTE is cleaner and avoids repetition.
Query performance matters in analyst roles because slow queries delay dashboards, reports, and ad-hoc investigations. Interviewers therefore expect a basic understanding of how SQL engines retrieve data and how to help them do it faster.
An index is a separate data structure that the database maintains to speed up data retrieval. Think of it like the index at the back of a book: instead of reading every page to find a topic, you go straight to the page numbers listed in the index. Without an index, the database performs a full table scan, reading every row, which becomes very slow on large tables.
With an index on a frequently queried column (like customer_id or order_date), the database jumps directly to the relevant rows. The trade-off is that indexes consume storage and slightly slow down INSERT, UPDATE, and DELETE operations because the index must be updated too.
| Index Type | Description |
| Clustered Index | Physically reorders the table rows to match the index. One per table. Typically on the primary key. |
| Non-Clustered Index | A separate structure that points to row locations. Multiple allowed per table. |
| Unique Index | Enforces uniqueness on a column (often created automatically by PRIMARY KEY and UNIQUE constraints). |
| Composite Index | Index on multiple columns, most effective when queries filter on those columns in the same order. |
| Full-Text Index | Optimised for searching large text columns (e.g. product descriptions). |
| Partial / Filtered Index | Index on a subset of rows matching a condition, keeps the index small and targeted. |
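How an index changes the access path can be observed with SQLite's EXPLAIN QUERY PLAN. A small Python sketch follows; the exact plan wording varies slightly by SQLite version, but the scan-vs-search distinction is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")

def access_path(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = access_path(query)
print(plan_before)   # a full table scan, e.g. "SCAN orders"

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = access_path(query)
print(plan_after)    # an index search, e.g. "SEARCH orders USING INDEX idx_orders_customer ..."
```

Before the index, the engine must read every row ("SCAN"); afterwards it jumps straight to the matching rows via the index ("SEARCH ... USING INDEX"), which is the behaviour described above.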
The following SQL data analyst interview questions and answers cover hands-on write-the-query problems, the format used in live coding rounds, take-home assessments, and on-site whiteboard interviews.
| Column | Type | Description |
| order_id | INT | Primary key |
| customer_id | INT | Foreign key to customers |
| order_date | DATE | Date the order was placed |
| amount | DECIMAL | Order value |
Task: Return customer_id and the count of their orders for any customer who placed more than 3 orders in the last 30 days.
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY customer_id
HAVING COUNT(*) > 3
ORDER BY order_count DESC;
| Column | Type | Description |
| employee_id | INT | Primary key |
| name | VARCHAR | Employee name |
| salary | DECIMAL | Annual salary |
Task: Return the second highest distinct salary.
-- Approach 1: Using subquery with MAX
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (
SELECT MAX(salary) FROM employees
);
-- Approach 2: Using DENSE_RANK (handles ties and is more scalable)
WITH ranked AS (
SELECT salary, DENSE_RANK() OVER(ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT salary AS second_highest_salary
FROM ranked
WHERE rnk = 2
LIMIT 1;
| Column | Type | Description |
| customer_id | INT | Primary key |
| email | VARCHAR | Customer email address |
| name | VARCHAR | Customer name |
Task: Find email addresses that appear more than once in the table.
Real-world context: Used to identify duplicate email addresses before a CRM migration or merge.
SELECT
email,
COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
-- To see the full rows, not just the duplicate emails:
SELECT *
FROM customers
WHERE email IN (
SELECT email
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
)
ORDER BY email;
| Column | Type | Description |
| employee_id | INT | Primary key |
| name | VARCHAR | Employee name |
| department | VARCHAR | Department name |
| salary | DECIMAL | Annual salary |
Task: Rank each employee by salary within their department. Highest salary = rank 1.
Real-world context: Commonly used in HR analytics dashboards to surface the highest earners per team.
SELECT
name,
department,
salary,
DENSE_RANK() OVER(
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees
ORDER BY department, salary_rank;
| Column | Type | Description |
| month | DATE | First day of the month |
| revenue | DECIMAL | Total revenue for that month |
Task: Show each month’s revenue alongside the previous month’s, and calculate the absolute change.
Real-world context: This is a standard business reporting query used in finance and sales dashboards.
SELECT
month,
revenue,
LAG(revenue, 1) OVER(ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue, 1) OVER(ORDER BY month) AS mom_change,
ROUND(
100.0 * (revenue - LAG(revenue, 1) OVER(ORDER BY month))
/ NULLIF(LAG(revenue, 1) OVER(ORDER BY month), 0),
2
) AS pct_change
FROM monthly_revenue
ORDER BY month;
| Customers | Type | Description |
| customer_id | INT | Primary key |
| name | VARCHAR | Customer name |
| Orders | Type | Description |
| order_id | INT | Primary key |
| customer_id | INT | FK to customers |
| order_date | DATE | Order date |
Task: Return customers who have not placed any order in the last 90 days (including customers who have never ordered).
-- Anti-join approach using LEFT JOIN
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
AND o.order_date >= CURRENT_DATE - INTERVAL '90 days'
WHERE o.order_id IS NULL;
-- Alternative using NOT EXISTS
SELECT customer_id, name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= CURRENT_DATE - INTERVAL '90 days'
);
| Column | Type | Description |
| product_id | INT | Primary key |
| product_name | VARCHAR | Product name |
| category | VARCHAR | Product category |
| revenue | DECIMAL | Total revenue |
Task: Return the top 3 products by revenue within each category.
WITH ranked_products AS (
SELECT
product_name,
category,
revenue,
DENSE_RANK() OVER(
PARTITION BY category
ORDER BY revenue DESC
) AS revenue_rank
FROM product_sales
)
SELECT product_name, category, revenue, revenue_rank
FROM ranked_products
WHERE revenue_rank <= 3
ORDER BY category, revenue_rank;
| Column | Type | Description |
| sale_date | DATE | Date of sales |
| daily_revenue | DECIMAL | Revenue for that day |
Task: Show each day’s revenue alongside the running cumulative total from the beginning of the dataset.
Real-world context: Used in financial and sales dashboards to track progress toward revenue targets.
SELECT
sale_date,
daily_revenue,
SUM(daily_revenue) OVER(
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM daily_sales
ORDER BY sale_date;
| Column | Type | Description |
| user_id | INT | User identifier |
| login_date | DATE | Date of login |
Task: Find users who logged in on at least two consecutive calendar days.
Real-world context: Used in retention and engagement analysis to identify active streaks.
WITH login_gaps AS (
SELECT
user_id,
login_date,
LAG(login_date) OVER(PARTITION BY user_id ORDER BY login_date) AS prev_login
FROM user_logins
)
SELECT DISTINCT user_id
FROM login_gaps
WHERE login_date - prev_login = 1;
-- In MySQL: DATEDIFF(login_date, prev_login) = 1
| Column | Type | Description |
| product | VARCHAR | Product name |
| month | VARCHAR | Month name (e.g. 'Jan', 'Feb') |
| revenue | DECIMAL | Revenue for that month |
Before (row format):
| Product | Month | Revenue |
| Widget A | Jan | 5000 |
| Widget A | Feb | 6000 |
| Widget B | Jan | 3000 |
After (column format):
| Product | Jan | Feb |
| Widget A | 5000 | 6000 |
| Widget B | 3000 | NULL |
Real-world context: Required when preparing data for a spreadsheet or executive report where each month needs its own column.
-- PostgreSQL / Standard SQL using conditional aggregation
SELECT
product,
SUM(CASE WHEN month = 'Jan' THEN revenue ELSE 0 END) AS Jan,
SUM(CASE WHEN month = 'Feb' THEN revenue ELSE 0 END) AS Feb,
SUM(CASE WHEN month = 'Mar' THEN revenue ELSE 0 END) AS Mar
FROM monthly_sales
GROUP BY product;
-- SQL Server supports PIVOT syntax:
SELECT product, [Jan], [Feb], [Mar]
FROM monthly_sales
PIVOT (
SUM(revenue)
FOR month IN ([Jan], [Feb], [Mar])
) AS pivot_table;
SQL interview questions for data analysts cover several core concepts such as joins, aggregations, and NULL handling, progressing into advanced techniques such as window functions, CTEs, and pivoting.
You must practise writing queries independently, recreate schemas, and solve problems without referring to answers to build true fluency under pressure.
For structured preparation, you may explore Interview Kickstart’s Data Analyst Interview Prep program for guided practice and feedback.
The four categories that appear most consistently are joins (especially INNER JOIN and LEFT JOIN), aggregation functions with GROUP BY and HAVING, window functions (particularly ROW_NUMBER, RANK, and LAG/LEAD), and query logic problems involving subqueries or CTEs. Most interviews layer these together, for example, asking you to rank or aggregate within groups, so practising combinations is more effective than drilling each topic in isolation.
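As a quick illustration of that layering, here is a minimal sketch of a ranking-within-groups query: ROW_NUMBER() with PARTITION BY inside a CTE, filtered in the outer query. The employees table and its values are invented for demonstration, and the query is run through Python's built-in sqlite3 module (SQLite 3.25+ is required for window functions).

```python
import sqlite3

# In-memory database with a small hypothetical employees table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Asha', 'Sales', 70000),
  ('Ben',  'Sales', 85000),
  ('Chen', 'Sales', 60000),
  ('Dev',  'Tech',  95000),
  ('Eli',  'Tech',  90000);
""")

# Layered pattern: rank within each department using a window function
# inside a CTE, then filter on the rank in the outer query.
query = """
WITH ranked AS (
  SELECT name, department, salary,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
  FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 2
ORDER BY department, salary DESC;
"""
rows = list(conn.execute(query))
for row in rows:
    print(row)
```

This is exactly the "top N per group" shape from the ranked_products example above, so practising it once covers both.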
Aim to thoroughly understand around 50-70 questions across all difficulty levels rather than skimming through 200. Depth matters more than breadth in interviews. Interviewers follow up and probe your reasoning. If you can write, explain, and optimise queries for the 20 most common patterns cold, you will be better prepared than someone who has read 100 questions without writing any.
SQL is necessary but not sufficient on its own. Most analyst roles also expect proficiency in at least one data visualisation tool (Tableau, Power BI, or Looker), familiarity with spreadsheets, and increasingly some Python or R for statistical analysis. SQL handles the data extraction and transformation layer; the other tools handle the communication of results. Demonstrating all three in a portfolio project is the strongest signal you can give a hiring team.
Entry-level roles typically expect solid command of SELECT queries with WHERE, GROUP BY, ORDER BY, and HAVING; all four standard join types; basic aggregate functions; and NULL handling. You may also be asked to write a simple subquery. Window functions and CTEs are increasingly common even at entry level, especially in tech companies, so covering them in your preparation is worthwhile even if you are not required to have deep experience with them.
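A single query can exercise most of that entry-level checklist. The sketch below, run against a small invented orders table via Python's built-in sqlite3 module, combines WHERE (row filter), GROUP BY (grouping), HAVING (group filter), and ORDER BY with aggregate functions.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES
  ('Asha', 120.0), ('Asha', 80.0), ('Ben', 50.0),
  ('Chen', 200.0), ('Chen', 40.0), ('Chen', 60.0);
""")

# WHERE filters individual rows, HAVING filters whole groups:
# Ben has only one qualifying order, so HAVING drops him.
query = """
SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
FROM orders
WHERE amount > 30
GROUP BY customer
HAVING COUNT(*) >= 2
ORDER BY total DESC;
"""
rows = list(conn.execute(query))
for row in rows:
    print(row)
```

Being able to explain why Ben is excluded by HAVING rather than WHERE is a classic entry-level follow-up.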
The most effective preparation combines writing queries on real datasets with reviewing worked examples. Use platforms like LeetCode (Database section), Mode Analytics, or Kaggle SQL labs to write queries against actual tables, not just read answers. Focus on getting queries wrong the first time and debugging them; that process mirrors what happens in live interviews. Time yourself on medium-difficulty problems to simulate interview pressure.
Yes. Window functions appear in a large proportion of mid-to-senior analyst interviews and are increasingly common even at entry level. The functions that come up most are ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), and SUM() OVER() for running totals. If you can explain the difference between PARTITION BY and GROUP BY and write a clean ranking query from scratch, you will handle the majority of window function questions.
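The GROUP BY versus PARTITION BY distinction is easiest to see side by side: GROUP BY collapses each group to one row, while a window function keeps every row and attaches the group total to it. The sketch below uses a small invented sales table, run through Python's built-in sqlite3 module (SQLite 3.25+ for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('East', 10), ('East', 20), ('West', 5);
""")

# GROUP BY collapses each region to a single summary row...
grouped = list(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region;"))

# ...while PARTITION BY keeps all three rows and adds the region total to each.
windowed = list(conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount;
"""))
print(grouped)   # one row per region
print(windowed)  # original rows preserved, total attached
```

If you can narrate this difference while writing the query, you have answered the most common window-function probe.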