Article written by Rishabh Dev Choudhary, under the guidance of Ning Rui, 20+ yrs leading machine learning & engineering teams. Reviewed by Mrudang Vora, an Engineering Leader with 15+ years of experience.
SQL interview questions are a near-universal fixture in tech interviews — from software engineering and backend development to data analytics, data engineering, and database administration. Whether you are a fresher getting started with relational databases, an intermediate developer brushing up on query patterns, or an experienced engineer preparing for FAANG-level rounds, this guide covers the full spectrum of SQL interview questions and answers you need.
SQL (Structured Query Language) is used to query, manage, and manipulate data stored in relational databases — powering everything from e-commerce backends and financial systems to analytics pipelines and reporting layers. In this guide, you will find answers covering core SQL concepts, advanced SQL interview questions on window functions, CTEs, and execution plans, plus hands-on SQL query problems written the way interviewers actually test them.
The guide is organized by difficulty — beginner through advanced — so you can move at your own pace or jump directly to the level that matches your target role. Each section includes code snippets, comparison tables, and interview-specific tips to help you prepare efficiently.
These are the most commonly asked SQL interview questions across all experience levels.
SQL (Structured Query Language) is a domain-specific language designed to query, insert, update, and manage data held in relational database management systems. It is declarative — you describe what data you want, and the database engine decides how to retrieve it. SQL powers virtually every application that handles persistent data, from banking platforms and inventory systems to analytics dashboards and mobile apps.
SQL commands fall into four main categories:
| Category | Abbreviation | What It Does |
|---|---|---|
| Data Definition Language | DDL | Create, alter, and drop database structures (tables, indexes, schemas) |
| Data Manipulation Language | DML | Insert, update, delete, and query rows of data |
| Data Control Language | DCL | Grant and revoke user permissions |
| Transaction Control Language | TCL | Commit, rollback, and manage transactions |
Before writing a single SQL query, interviewers want to know you understand what you are querying — that is where RDBMS comes in.
An RDBMS organizes data into tables (rows and columns) that are related to each other through shared keys. SQL is the language used to communicate with it. What makes a database relational is not just that data is stored in tables — it is that tables are connected via foreign keys that reference primary keys in other tables, enforcing referential integrity across the dataset.
Think of a general DBMS as a filing cabinet: any data goes in any folder. An RDBMS is a filing cabinet where every folder references others by labeled tabs — the folders stay consistent with each other because the system enforces those references.
| Feature | General DBMS | RDBMS |
|---|---|---|
| Data structure | Files, hierarchies, or objects | Tables with rows and columns |
| Relationships | Not enforced | Enforced via foreign keys |
| SQL support | Varies | Yes — core to operation |
| Integrity enforcement | Minimal | Referential integrity, constraints |
| Examples | XML stores, flat files | MySQL, PostgreSQL, Oracle, SQL Server |
| Command Type | Commands | Purpose |
|---|---|---|
| DDL — Data Definition Language | CREATE, ALTER, DROP, TRUNCATE | Define and modify database structure |
| DML — Data Manipulation Language | SELECT, INSERT, UPDATE, DELETE | Manipulate data within tables |
| DCL — Data Control Language | GRANT, REVOKE | Manage access permissions |
| TCL — Transaction Control Language | COMMIT, ROLLBACK, SAVEPOINT | Control transaction boundaries |
| DQL — Data Query Language | SELECT | Query and retrieve data (sometimes classified separately) |
DDL statements change the structure of the database. DML statements operate on the data inside those structures. DCL handles who can do what. TCL manages whether a set of changes is saved or reversed as a single unit.
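To make the TCL part concrete, here is a minimal sketch of a transaction that is saved or reversed as a single unit. It uses Python's built-in `sqlite3` module purely as a convenient, runnable RDBMS; the `accounts` table and the transfer scenario are hypothetical.

```python
import sqlite3

# In-memory database for illustration; the accounts table is hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

# A transfer wrapped in a transaction: both updates commit, or neither does
try:
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    # Business rule: no negative balances allowed
    (overdrawn,) = conn.execute(
        "SELECT COUNT(*) FROM accounts WHERE balance < 0").fetchone()
    if overdrawn:
        raise ValueError("insufficient funds")
    conn.commit()        # TCL: make both changes permanent together
except ValueError:
    conn.rollback()      # TCL: undo everything since the last COMMIT

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)          # both rows are unchanged after the rollback
```

Because the rule check fails, the ROLLBACK leaves both balances exactly as they were, which is the whole point of treating the two updates as one transaction.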
| Feature | DELETE | TRUNCATE | DROP |
|---|---|---|---|
| What it removes | Rows matching the WHERE clause (all rows if omitted) | All rows | Entire table + structure |
| WHERE clause | Yes — filter rows | No | No |
| Transaction log | Row-by-row — logged | Minimal logging | Minimal logging |
| Rollback possible | Yes (within transaction) | Yes in some DBs (PostgreSQL) | Generally No |
| Triggers fired | Yes (AFTER DELETE) | No | No |
| Auto-increment reset | No | Yes | Yes (table gone) |
```sql
-- DELETE: remove rows conditionally
DELETE FROM orders WHERE status = 'cancelled';

-- TRUNCATE: wipe all rows, keep structure
TRUNCATE TABLE orders;

-- DROP: destroy the table entirely
DROP TABLE orders;
```
A Primary Key (PK) is a column (or combination of columns) that uniquely identifies every row in a table. It must be UNIQUE and NOT NULL, and each table can have only one primary key. When two or more columns together form the primary key, it is called a composite primary key.
A Foreign Key (FK) is a column in one table that references the primary key in another. It enforces referential integrity — a FK value must either match an existing PK value in the referenced table or be NULL.
```sql
-- Parent table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100)
);

-- Child table: order_id is PK; customer_id is FK
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
```
💡 Pro Tip: A composite primary key uses 2+ columns together — common interview follow-up. E.g., PRIMARY KEY (student_id, course_id) in an enrollments table.
Constraints are your database’s built-in ruleset — they stop bad data before it enters. Imagine an orders table with no constraints: a price could be negative, the same email could appear twice in a users table, an order could reference a customer that does not exist. Each constraint below is the fix for one of these scenarios.
| Constraint | What It Prevents | Quick Example |
|---|---|---|
| NOT NULL | Blank/empty entries in required fields | email VARCHAR(100) NOT NULL |
| UNIQUE | Two rows sharing the same value | UNIQUE (email) — no duplicate emails |
| PRIMARY KEY | Duplicate or missing row identifiers | customer_id INT PRIMARY KEY |
| FOREIGN KEY | Orphaned references (no matching parent) | FOREIGN KEY (customer_id) REFERENCES customers(customer_id) |
| CHECK | Values that violate a business rule | CHECK (price > 0) |
| DEFAULT | Missing values on insert | status VARCHAR(20) DEFAULT 'pending' |
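Each of those "bad data" scenarios can be watched failing in a runnable sketch. This demo uses Python's built-in `sqlite3` (SQLite needs `PRAGMA foreign_keys = ON` to enforce FKs); the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK checks
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    price       REAL CHECK (price > 0),
    status      TEXT DEFAULT 'pending'
);
INSERT INTO customers VALUES (1, 'ada@example.com');
""")

rejected = []
for stmt in [
    "INSERT INTO customers VALUES (2, 'ada@example.com')",                    # UNIQUE: duplicate email
    "INSERT INTO orders (order_id, customer_id, price) VALUES (1, 99, 10)",   # FK: customer 99 does not exist
    "INSERT INTO orders (order_id, customer_id, price) VALUES (1, 1, -5)",    # CHECK: negative price
]:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as exc:
        rejected.append(type(exc).__name__)

print(rejected)  # every bad insert is blocked before it lands in the table
```

All three inserts raise `IntegrityError`: the constraints stop the bad rows at the door rather than leaving cleanup to application code.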
A JOIN combines rows from two or more tables based on a related column. Choosing the right join type is one of the most consistently tested skills in SQL interviews.
| Join Type | Returns | When to Use |
|---|---|---|
| INNER JOIN | Only rows with matching values in both tables | When you only want records that exist in both tables |
| LEFT JOIN | All rows from left + matching right rows (NULL if no match) | When left table must be fully preserved |
| RIGHT JOIN | All rows from right + matching left rows (NULL if no match) | When right table must be fully preserved |
| FULL OUTER JOIN | All rows from both tables, NULLs where no match | To find all data across two sets including unmatched |
| CROSS JOIN | Cartesian product — every left row × every right row | Generating combinations (e.g., sizes × colors) |
| SELF JOIN | Rows from same table joined to itself | Hierarchy queries (employees + managers in same table) |
```sql
-- INNER JOIN
SELECT o.order_id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;

-- LEFT JOIN (keep all customers, even without orders)
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

-- SELF JOIN (employee + their manager)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
```
| Feature | WHERE | HAVING |
|---|---|---|
| When it runs | Before aggregation (filters raw rows) | After GROUP BY (filters grouped results) |
| Works with aggregates | No — cannot use SUM(), COUNT(), etc. | Yes — designed for aggregate conditions |
| Used with | SELECT, UPDATE, DELETE | SELECT with GROUP BY |
| Example | WHERE salary > 50000 | HAVING COUNT(*) > 5 |
```sql
-- WHERE filters individual rows before grouping
-- HAVING filters groups after aggregation
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE status = 'active'     -- filter rows first
GROUP BY department
HAVING COUNT(*) > 5;        -- then filter groups
```
An index is a database structure that speeds up data retrieval by creating a separate lookup path into a table — similar to an index at the back of a textbook. Without an index, the database performs a full table scan, reading every row to find matches. With an index, it follows the lookup path directly to matching rows.
The trade-off: indexes consume storage and slow down INSERT/UPDATE/DELETE operations (because the index must also be updated). Always index columns used in WHERE clauses, JOIN conditions, and ORDER BY.
| Index Type | Description | When to Use |
|---|---|---|
| Clustered | Physically reorders table data by key; one per table | Primary key; range queries on the key column |
| Non-Clustered | Separate structure pointing to row locations; multiple allowed | Frequently filtered non-PK columns |
| Unique | Enforces uniqueness + speeds lookups | Email addresses, usernames, natural keys |
| Composite | Index on two or more columns together | Multi-column WHERE / ORDER BY patterns |
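A quick way to see the textbook-index analogy in action is to compare query plans before and after creating an index. This sketch uses Python's built-in `sqlite3` and its `EXPLAIN QUERY PLAN` output; the `employees` table is hypothetical, and exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, email TEXT, dept TEXT)")

# Without an index, filtering on email forces a full table scan
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE email = 'a@b.com'"
).fetchall()

# A non-clustered index on email gives the engine a direct lookup path
conn.execute("CREATE INDEX idx_employees_email ON employees(email)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE email = 'a@b.com'"
).fetchall()

print(plan_before[0][-1])  # a SCAN: every row is read
print(plan_after[0][-1])   # a SEARCH using idx_employees_email
```

The plan flips from a scan to an index search, which is exactly the trade the section describes: faster reads in exchange for extra storage and write overhead.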
Normalization is the process of organizing a database to reduce data redundancy and improve data integrity. It involves decomposing large tables into smaller, related tables linked by keys. Most interviews test up to 3NF; BCNF is a bonus.
| Normal Form | Rule | Example Fix |
|---|---|---|
| 1NF — First Normal Form | Each column holds atomic (indivisible) values; no repeating groups | Split 'phones: 555-1234, 555-5678' into separate rows |
| 2NF — Second Normal Form | Meets 1NF + no partial dependency (every non-key column depends on the whole PK) | Move 'customer_name' out of an orders table if PK is (order_id, product_id) |
| 3NF — Third Normal Form | Meets 2NF + no transitive dependency (non-key columns depend only on PK, not on each other) | Move 'zip_code → city' mapping to a separate zip_codes table |
| BCNF — Boyce-Codd Normal Form | Stricter 3NF: every determinant must be a candidate key | Resolves edge cases where 3NF still allows anomalies with overlapping composite keys |
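The 3NF fix from the table can be sketched end to end: move the `zip_code → city` mapping into its own table, then recover the denormalized view with a JOIN. The demo runs on Python's built-in `sqlite3`; all table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 3NF decomposition: city depends on zip_code, not on customer_id,
# so the zip_code -> city mapping lives in its own table
conn.executescript("""
CREATE TABLE zip_codes (
    zip_code TEXT PRIMARY KEY,
    city     TEXT NOT NULL
);
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    zip_code    TEXT REFERENCES zip_codes(zip_code)
);
INSERT INTO zip_codes VALUES ('10001', 'New York'), ('94105', 'San Francisco');
INSERT INTO customers VALUES (1, 'Ada', '10001'), (2, 'Grace', '10001');
""")

# Each city is stored exactly once per zip code, so an update cannot
# leave inconsistent copies; a JOIN rebuilds the wide view on demand
rows = conn.execute("""
    SELECT c.name, z.city
    FROM customers c
    JOIN zip_codes z ON c.zip_code = z.zip_code
    ORDER BY c.customer_id
""").fetchall()
print(rows)  # [('Ada', 'New York'), ('Grace', 'New York')]
```

Two customers share one zip code, but 'New York' exists in only one row of `zip_codes`: that is the redundancy reduction 3NF buys you.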
GROUP BY collapses rows that share the same value in the specified column(s) into a single row, allowing you to apply aggregate functions (COUNT, SUM, AVG, MIN, MAX) to each group.
```sql
-- Count employees per department
SELECT department,
       COUNT(*) AS headcount,
       AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
```
Every column in the SELECT list must either be in the GROUP BY clause or wrapped in an aggregate function. GROUP BY is evaluated after WHERE and before HAVING — which is why HAVING is needed to filter on aggregated values.
A subquery (also called an inner query or nested query) is a SELECT statement embedded inside another SQL statement. The outer query uses the result of the inner query as a value or set of values.
```sql
-- Subquery in WHERE: find employees who earn above the company average
SELECT name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary) FROM employees
);
```
Subqueries can appear in the WHERE clause (as a filter), the FROM clause (as an inline table), or the SELECT list (as a scalar value). See the Correlated vs Non-Correlated Subqueries section in the Advanced block for deeper coverage of performance trade-offs.
These hands-on SQL query questions test your ability to write and optimize queries under real interview conditions. Where possible, multiple approaches are shown – interviewers appreciate knowing more than one solution. For an extended set of coding problems, explore SQL query interview questions for experienced developers that go deeper on multi-step and performance-focused scenarios.
Understanding execution order is one of the most insightful answers you can give in an SQL interview — it explains behavior that candidates often find confusing.
| Written Order | Execution Order | Clause | What it Does |
|---|---|---|---|
| 2nd | 1st | FROM / JOIN | Identify source tables and join them |
| 3rd | 2nd | WHERE | Filter rows before grouping |
| 4th | 3rd | GROUP BY | Group remaining rows |
| 5th | 4th | HAVING | Filter groups after aggregation |
| 1st | 5th | SELECT | Evaluate expressions, aliases, DISTINCT |
| 6th | 6th | ORDER BY | Sort the final result set |
| 7th | 7th | LIMIT / OFFSET | Restrict the number of rows returned |
```sql
-- A query using every clause — on a sales_transactions table
SELECT region, SUM(amount) AS total_sales
FROM sales_transactions
WHERE status = 'completed'
GROUP BY region
HAVING SUM(amount) > 10000
ORDER BY total_sales DESC
LIMIT 5;
```
ORDER BY sorts the result set in ascending (ASC, default) or descending (DESC) order. The real interview angle is multi-column sorting, which is what interviewers actually probe.
```sql
-- Basic sort: highest score first
SELECT player, score
FROM leaderboard
ORDER BY score DESC;

-- Multi-column sort: primary sort by score DESC, tie-break by name ASC
SELECT player, score, region
FROM leaderboard
ORDER BY score DESC, player ASC;
```
When sorting on multiple columns, SQL processes them left-to-right. The first column is the primary sort; subsequent columns break ties within groups that share the same first-column value.
The performance angle is what separates a basic answer from a strong one — UNION ALL is generally faster than UNION because it skips the de-duplication step.
| Behavior | UNION | UNION ALL |
|---|---|---|
| Duplicates | Removed (de-duplication step) | Kept — all rows returned |
| De-duplication step | Yes — typically an implicit sort or hash to find duplicates | None |
| Speed | Slower (extra processing) | Faster — no overhead |
| When to use | Duplicates exist and must be removed | Duplicates are acceptable or already impossible |
```sql
-- UNION: combine active + archived employees, remove duplicates
SELECT employee_id, name FROM active_employees
UNION
SELECT employee_id, name FROM archived_employees;

-- UNION ALL: same query, keep all rows (faster)
SELECT employee_id, name FROM active_employees
UNION ALL
SELECT employee_id, name FROM archived_employees;
```
Set operations are often cleaner than JOINs for certain comparisons — here is when to reach for them.
| Operation | Returns | Replaces This JOIN Pattern | Use Case |
|---|---|---|---|
| UNION / UNION ALL | All rows from both sets (with or without duplicates) | OR conditions across tables | Merge active + archived product lists |
| INTERSECT | Only rows that appear in BOTH sets | INNER JOIN on all selected columns | Find products that appear in both current catalog and last month’s orders |
| EXCEPT (MINUS in Oracle) | Rows in first set that are NOT in second | LEFT JOIN … WHERE right IS NULL | Find products that were sold but never returned |
```sql
-- EXCEPT: user IDs in Version A but not in Version B
SELECT user_id FROM app_version_a
EXCEPT
SELECT user_id FROM app_version_b;

-- INTERSECT: products in both current catalog and last month's orders
SELECT product_id FROM current_catalog
INTERSECT
SELECT product_id FROM last_month_orders;
```
There are three common approaches — knowing all three signals strong SQL depth.
```sql
-- Approach 1: Nested subqueries (works everywhere)
SELECT MAX(salary) AS third_highest
FROM employees
WHERE salary < (
    SELECT MAX(salary) FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
);

-- Approach 2: DENSE_RANK() — preferred in interviews
SELECT salary
FROM (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
) ranked
WHERE rnk = 3;

-- Approach 3: LIMIT + OFFSET (MySQL / PostgreSQL)
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 2;
```
```sql
-- Preferred approach: CTE + ROW_NUMBER()
-- (deleting through a CTE works in SQL Server; in PostgreSQL/MySQL,
--  delete by joining back to the table on a unique key instead)
WITH ranked_rows AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
    FROM users
)
DELETE FROM ranked_rows
WHERE rn > 1;

-- Alternative: GROUP BY approach (returns unique rows, no DELETE)
SELECT email, MAX(created_at) AS latest
FROM users
GROUP BY email;
```
ROW_NUMBER() assigns a sequential number within each group defined by PARTITION BY. Rows numbered > 1 are duplicates. This approach lets you keep the most recent record (controlled by ORDER BY in the window).
```sql
-- Departments with more than 5 active employees
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE status = 'active'
GROUP BY department
HAVING COUNT(*) > 5
ORDER BY headcount DESC;
```
WHERE cannot reference aggregate results (COUNT(*)) because it runs before grouping. HAVING is the correct clause for post-aggregation filters.
```sql
-- INNER JOIN: orders with customer names (only matched records)
SELECT o.order_id,
       c.name AS customer_name,
       o.amount,
       o.order_date
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
ORDER BY o.order_date DESC;
```
```sql
-- Fetch odd rows using ROW_NUMBER()
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY employee_id) AS rn
    FROM employees
) numbered
WHERE MOD(rn, 2) = 1;  -- Change to = 0 for even rows
```
MOD(rn, 2) returns 1 for odd row numbers and 0 for even (SQL Server uses the % operator instead: rn % 2 = 1). ROW_NUMBER() assigns a sequential number to each row; the ORDER BY inside OVER controls which rows get which numbers.
LIKE is SQL’s pattern-matching operator. The real interview angle is performance — not just syntax.
| Pattern | Matches | Does Not Match | Index Used? |
|---|---|---|---|
| '%engineer%' | senior engineer, data engineer, engineering lead | developer, analyst | No — leading wildcard disables index |
| 'engineer%' | engineer I, engineering manager | senior engineer | Yes — trailing wildcard only |
| '%engineer' | senior engineer, staff engineer | engineering lead | No — leading wildcard disables index |
| '_A%' | BA, CA, DAta | ABC (second character is not A) | Depends on DB |
```sql
-- Find all job titles containing 'engineer'
SELECT title FROM job_postings
WHERE title LIKE '%engineer%';

-- Find customers with .edu email domains
SELECT name, email FROM users
WHERE email LIKE '%.edu';

-- Case-insensitive search in PostgreSQL
SELECT title FROM job_postings
WHERE title ILIKE '%engineer%';
```
The classic interview trap: SELECT * FROM users WHERE phone = NULL returns zero rows, even if phone is NULL for many users. Why? Because NULL comparisons in SQL return UNKNOWN, not TRUE or FALSE.
SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. Any comparison with NULL (using =, !=, <, >) yields UNKNOWN — and WHERE only passes rows that evaluate to TRUE. The correct operators are IS NULL and IS NOT NULL.
```sql
-- WRONG: returns zero rows even when phone IS NULL
SELECT * FROM users WHERE phone = NULL;

-- CORRECT
SELECT * FROM users WHERE phone IS NULL;
SELECT * FROM users WHERE phone IS NOT NULL;

-- COALESCE: substitute a default when NULL
SELECT name, COALESCE(phone, 'No phone on file') AS phone_display
FROM users;

-- NULLIF: returns NULL if two values are equal
SELECT NULLIF(discount, 0) AS active_discount
FROM orders;
```
| Operation | With Regular Value | With NULL |
|---|---|---|
| = comparison | Returns TRUE or FALSE | Returns UNKNOWN (row excluded) |
| COUNT(column) | Counts non-NULL values | NULLs are excluded from the count |
| COUNT(*) | Counts all rows | Counts all rows including NULLs |
| SUM(column) | Adds non-NULL values | NULLs are silently ignored |
| AVG(column) | Sum / count of non-NULL rows | Denominator is count of non-NULL rows only |
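Every row of that table can be verified with a small runnable experiment. This sketch uses Python's built-in `sqlite3`; the `orders` table and its `discount` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, discount INTEGER)")
conn.execute("INSERT INTO orders (id, discount) VALUES (1, 10), (2, NULL), (3, 20)")

# COUNT(*) counts rows; COUNT(col), SUM, AVG silently skip NULLs
count_star, count_col, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(discount), SUM(discount), AVG(discount) FROM orders"
).fetchone()
print(count_star, count_col, total, average)  # 3 2 30 15.0

# = NULL matches nothing (UNKNOWN is not TRUE); IS NULL is the correct test
(eq_null,) = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE discount = NULL").fetchone()
(is_null,) = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE discount IS NULL").fetchone()
print(eq_null, is_null)  # 0 1
```

Note that AVG comes out as 15.0, not 10.0: the denominator is the count of non-NULL values (2), exactly as the table states.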
Interviewers often ask all three together once they hear 'COALESCE.'
| Function | Returns | Best Used When | Database Support |
|---|---|---|---|
| COALESCE(a, b, c, …) | First non-NULL value from the list | Multi-level fallback logic (preferred_name → first_name → 'Guest') | ANSI standard — works everywhere |
| NULLIF(a, b) | NULL if a equals b; otherwise returns a | Avoiding divide-by-zero: NULLIF(denominator, 0) | ANSI standard — works everywhere |
| ISNULL(a, b) | b if a is NULL; otherwise a | Simple two-argument NULL replacement | SQL Server only |
```sql
-- COALESCE: display name fallback chain
SELECT COALESCE(preferred_name, first_name, 'Guest') AS display_name
FROM user_profiles;

-- NULLIF: safe division (avoid divide-by-zero)
SELECT total_revenue / NULLIF(num_orders, 0) AS avg_order_value
FROM sales_summary;
```
These advanced SQL interview questions are tested at senior, lead, and FAANG-level interviews – covering window functions, recursive CTEs, optimization, and transactions. Expect follow-up questions on trade-offs and performance. For more depth on multi-step query logic and nested patterns, explore complex SQL interview questions that push further into real-world problem-solving scenarios.
A window function performs a calculation across a set of rows related to the current row — without collapsing those rows into a single result. Unlike GROUP BY aggregation, window functions add a computed column alongside each original row.
```sql
SELECT employee_id,
       department,
       salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
       RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_dense_rank,
       LAG(salary)  OVER (PARTITION BY department ORDER BY salary DESC) AS prev_salary,
       LEAD(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS next_salary,
       SUM(salary)  OVER (PARTITION BY department) AS dept_total
FROM employees;
```
OVER() defines the window. PARTITION BY divides rows into groups (like GROUP BY but without collapsing). ORDER BY within OVER controls the order of calculation within each partition.
| Function | Tie Behavior | Gaps in Numbering? | Best Use Case |
|---|---|---|---|
| ROW_NUMBER() | Each row gets a unique number even on ties | N/A — always consecutive | Pagination, deduplication (need a unique ID per row) |
| RANK() | Tied rows get the same rank; next rank skips | Yes — 1, 1, 3 (gap after ties) | Leaderboards where position gaps matter |
| DENSE_RANK() | Tied rows get the same rank; next rank does NOT skip | No — 1, 1, 2 (no gap) | Top-N queries, finding Nth highest salary |
```sql
-- All three on the same dataset (two employees tied at rank 1):
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;

-- Example output:
-- Alice | 90000 | 1 | 1 | 1
-- Bob   | 90000 | 2 | 1 | 1   <-- same salary
-- Carol | 80000 | 3 | 3 | 2   <-- RANK skips to 3; DENSE_RANK goes to 2
```
Before window functions, comparing a row to its previous row required a self-join — messy and slow. LAG and LEAD are the cleaner alternative: they access previous or next row values in a single pass.
Function signature: LAG(column, offset, default) — the offset (how many rows back) and default (value when no previous row exists) are often missed in interview answers.
```sql
-- Month-over-month revenue change using LAG
SELECT month,
       revenue,
       LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_revenue,
       revenue - LAG(revenue, 1, 0) OVER (ORDER BY month) AS change
FROM monthly_revenue
ORDER BY month;

-- Detect churn: users who did not log in for 30+ days
-- (window function results cannot be filtered directly in WHERE/HAVING,
--  so wrap the window query and filter in an outer query)
WITH login_gaps AS (
    SELECT user_id,
           login_date,
           LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) AS prev_login,
           login_date - LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) AS days_gap
    FROM user_logins
)
SELECT user_id, login_date, prev_login, days_gap
FROM login_gaps
WHERE days_gap > 30;
```
A CTE is a named, temporary result set defined with the WITH keyword. It exists only for the duration of the single statement it belongs to, making complex queries significantly more readable by letting you name and reference intermediate steps.
```sql
-- Basic CTE: find average salary per department, then filter
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN dept_avg d ON e.department = d.department
WHERE e.salary > d.avg_salary;
```
CTEs are particularly useful for: breaking complex multi-step logic into readable named blocks, avoiding repeated identical subqueries, and enabling recursive queries (which are impossible with regular subqueries).
Some data is inherently hierarchical — employee-manager trees, category hierarchies, network paths. Recursive CTEs are built for these structures.
A recursive CTE has two parts connected by UNION ALL: the anchor query (the starting point) and the recursive member (each iteration that builds on the previous one). Recursion terminates when the recursive member returns no new rows.
```sql
-- Traverse an employee org chart: all reports under manager ID 1
WITH RECURSIVE org_tree AS (
    -- Anchor: start with the top-level manager
    SELECT employee_id, name, manager_id, 0 AS depth
    FROM employees
    WHERE employee_id = 1

    UNION ALL

    -- Recursive member: find direct reports of each level
    SELECT e.employee_id, e.name, e.manager_id, ot.depth + 1
    FROM employees e
    INNER JOIN org_tree ot ON e.manager_id = ot.employee_id
    WHERE ot.depth < 10  -- prevents infinite recursion
)
SELECT * FROM org_tree
ORDER BY depth, name;
```
This is a decision framework — here is what to ask yourself before writing either.
| Factor | Prefer CTE | Prefer Subquery |
|---|---|---|
| Reusability | Need to reference the same logic multiple times in one query | Logic is only used once |
| Readability | Multi-step logic that benefits from a named intermediate result | Simple, single-level filter or scalar lookup |
| Recursion needed | Yes — only CTEs support recursion | No |
| Query complexity | 3+ logical steps or nested logic | One or two levels deep |
| Performance | Similar in modern engines (PostgreSQL 12+ inlines non-recursive CTEs; older versions materialized them) | Slightly less overhead for very simple cases |
```sql
-- Same question: find departments whose average salary exceeds company average

-- As a subquery:
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > (
    SELECT AVG(salary) FROM employees
);

-- As a CTE (preferred for interviews — shows structured thinking):
WITH company_avg AS (
    SELECT AVG(salary) AS avg_sal FROM employees
)
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT avg_sal FROM company_avg);
```
| Feature | Non-Correlated Subquery | Correlated Subquery |
|---|---|---|
| Dependency on outer query | None — runs once independently | References outer query column — runs once per outer row |
| Execution | Computed once, result reused | Computed for every row in the outer query |
| Performance | Generally faster | Can be slow on large datasets |
| Typical use | Single value or set comparison | Row-by-row comparisons, 'for each row' logic |
```sql
-- Non-correlated: inner query runs once
SELECT name
FROM employees
WHERE salary > (
    SELECT AVG(salary) FROM employees
);

-- Correlated: inner query runs once PER employee row
SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department = e1.department  -- references outer query
);
```
Correlated subqueries are powerful but expensive: the inner query executes once per outer row, so the work grows linearly with the outer table (and with it the inner query's own cost). On large tables, consider rewriting as a JOIN or window function for better performance.
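Here is a sketch of the window-function rewrite for the classic "above department average" question: compute the per-department average once as a window column, then filter in an outer query. The demo runs on Python's built-in `sqlite3` (SQLite 3.25+ supports window functions); the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Alice', 'Eng',   90000), ('Bob',  'Eng',   70000),
    ('Carol', 'Sales', 60000), ('Dave', 'Sales', 40000);
""")

# Single-pass rewrite: AVG(...) OVER (PARTITION BY department) attaches the
# department average to every row, replacing the per-row correlated subquery
rows = conn.execute("""
    SELECT name, salary
    FROM (
        SELECT name, salary,
               AVG(salary) OVER (PARTITION BY department) AS dept_avg
        FROM employees
    ) AS t
    WHERE salary > dept_avg
    ORDER BY name
""").fetchall()
print(rows)  # [('Alice', 90000), ('Carol', 60000)]
```

Alice beats the Eng average (80000) and Carol beats the Sales average (50000), matching what the correlated version would return in a single scan of the table.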
CASE is SQL’s if-else — it lets you transform, classify, or pivot data without touching application code. There are two forms:
```sql
-- Simple CASE (match a value):
SELECT order_id,
       CASE status
           WHEN 'P' THEN 'Pending'
           WHEN 'C' THEN 'Completed'
           WHEN 'X' THEN 'Cancelled'
           ELSE 'Unknown'
       END AS status_label
FROM orders;

-- Searched CASE (evaluate conditions):
SELECT order_id, amount,
       CASE
           WHEN amount < 50 THEN 'Small'
           WHEN amount < 200 THEN 'Medium'
           ELSE 'Large'
       END AS order_size
FROM order_items;

-- CASE inside aggregate — the pivot pattern FAANG interviewers love:
SELECT SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) AS completed_revenue,
       SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) AS pending_revenue,
       COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_count
FROM order_items;
```
MERGE solves the classic ETL problem: here is a source table with new data — insert what is missing, update what changed, delete what was removed. All in one atomic statement.
```sql
-- Sync daily_inventory_feed (source) into master inventory (target)
-- (SQL Server syntax; NOT MATCHED BY SOURCE is a T-SQL extension)
MERGE INTO master_inventory AS target
USING daily_inventory_feed AS source
    ON target.product_id = source.product_id
WHEN MATCHED AND target.quantity <> source.quantity THEN
    UPDATE SET target.quantity = source.quantity,
               target.updated_at = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (product_id, product_name, quantity)
    VALUES (source.product_id, source.product_name, source.quantity)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```
If you are working in a Microsoft SQL Server environment, syntax patterns like GETDATE(), ISNULL(), and the MERGE statement are covered in depth in T-SQL interview questions – including T-SQL specific functions and procedural extensions that differ from ANSI SQL.
This scenario-based question is a senior-level favourite. Always measure before making changes — jumping straight to indexing without measuring is a red flag.
| Step | Action | What to Look For |
|---|---|---|
| 1. Reproduce & measure | Run EXPLAIN ANALYZE; capture exact runtime | Row estimates vs actual rows processed |
| 2. Check stale statistics | Run ANALYZE (PostgreSQL) or UPDATE STATISTICS (SQL Server) | If estimates are wildly off, stats need a refresh |
| 3. Look for missing indexes | Check if slow columns are indexed; verify the index is being used | Seq Scan on a large table is a warning sign |
| 4. Audit sargability | Functions on indexed columns (WHERE YEAR(col) = 2024) kill index use | Rewrite as range: WHERE col >= '2024-01-01' AND col < '2025-01-01' |
| 5. Eliminate excess data | Remove SELECT *; add WHERE filters earlier; check for JOIN row explosions | Row count ballooning after a JOIN |
| 6. Rewrite anti-patterns | Replace correlated subqueries with JOINs; split OR conditions into UNION ALL | N+1 subquery execution pattern |
| 7. Architectural fixes | Materialized view for repeated aggregations; partitioning for date-range scans | Large fact tables scanned repeatedly for the same date windows |
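The sargability audit (step 4) is easy to demonstrate with query plans. This sketch uses Python's built-in `sqlite3` and SQLite's `strftime` in place of `YEAR()`; the `orders` table is hypothetical, and exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")
conn.execute("CREATE INDEX idx_orders_date ON orders(order_date)")

# Non-sargable: wrapping the column in a function hides it from the index
bad = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE strftime('%Y', order_date) = '2024'").fetchall()

# Sargable: a range predicate on the bare column can use the index
good = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'").fetchall()

print(bad[0][-1])   # a SCAN: the index cannot help
print(good[0][-1])  # a SEARCH using idx_orders_date
```

Same logical question, but only the range form lets the optimizer seek into `idx_orders_date` instead of scanning every row.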
An execution plan is a map of how the database engine plans to execute your query. Reading it is a genuine superpower in interviews — it demonstrates production experience.
```sql
-- PostgreSQL: show the plan AND actually run the query
EXPLAIN ANALYZE
SELECT c.name, SUM(o.amount)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY c.name
ORDER BY 2 DESC;
```
| Red Flag in Plan | What it Means | How to Fix |
|---|---|---|
| Seq Scan on large table | No suitable index exists for the filter column | Add an index on the WHERE / JOIN column |
| High rows estimate vs actual | Table statistics are stale | Run ANALYZE / UPDATE STATISTICS |
| Nested Loop on large tables | Wrong join algorithm for the data volume | Consider Hash Join; ensure join columns are indexed |
| Sort without index | ORDER BY on an unindexed column causes an in-memory sort | Add an index on the ORDER BY column(s) |
| Very high cost node | Bottleneck step in the query | Focus optimisation effort here first |
This section covers SQL objects and features that appear frequently in intermediate and senior interviews.
A view is a saved SELECT query with a name — it behaves like a table but stores no data of its own. Every time you query a view, the underlying SELECT runs fresh.
```sql
-- Create a view for the reporting team
CREATE VIEW active_customer_orders AS
SELECT c.customer_id, c.name, o.order_id, o.amount, o.order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'active';

-- Query it like a table:
SELECT * FROM active_customer_orders WHERE amount > 500;
```
Key use cases: (1) simplify a complex JOIN for the reporting team, (2) expose only specific columns to a read-only database user, (3) provide a stable interface when underlying table structures may change.
A regular view reruns its query every time you access it. A materialized view pre-computes and stores the result like a snapshot — it trades freshness for speed.
| Feature | Regular View | Materialized View |
|---|---|---|
| Data stored? | No — query runs every time | Yes — result set is physically stored |
| Performance | As fast as the underlying query | Instant reads; refreshes cost time |
| Data freshness | Always current | Only as fresh as the last refresh |
| Maintenance | None required | Must schedule or trigger refresh |
| Best for | Simple JOINs, security filtering | Heavy aggregations, dashboards, BI reports |
```sql
-- Create a materialized view for nightly sales summary
CREATE MATERIALIZED VIEW nightly_sales_summary AS
SELECT region, DATE(order_date) AS sale_date, SUM(amount) AS total
FROM orders
WHERE status = 'completed'
GROUP BY region, DATE(order_date);

-- Manually refresh (ON DEMAND):
REFRESH MATERIALIZED VIEW nightly_sales_summary;
```
Interviewers nearly always ask stored procedures and functions together. Lead with the comparison.
| Feature | Stored Procedure | Function |
|---|---|---|
| Return value | Optional; OUT parameters | Must return a single value (or table) |
| DML allowed | Yes — INSERT, UPDATE, DELETE | No DML in most databases (pure functions only) |
| Transaction control | Yes — COMMIT/ROLLBACK inside | No — cannot control transactions |
| Called how | EXEC / CALL statement | Inline in SELECT, WHERE, HAVING |
| Reusable in SELECT | No | Yes |
```sql
-- Stored procedure: process payroll for a department
CREATE PROCEDURE process_payroll (@department_id INT)
AS
BEGIN
    UPDATE employees
    SET salary = salary * 1.05
    WHERE department_id = @department_id;

    INSERT INTO audit_log (action, ts)
    VALUES ('payroll_processed', GETDATE());
END;

-- Execute the stored procedure
EXEC process_payroll @department_id = 3;
```
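For contrast with the procedure above, here is a minimal scalar function sketch in SQL Server syntax; the `annual_salary` name and `employees` columns are illustrative:

```sql
-- Scalar function: must RETURN a value, no DML inside
CREATE FUNCTION annual_salary (@monthly_salary DECIMAL(10,2))
RETURNS DECIMAL(12,2)
AS
BEGIN
    RETURN @monthly_salary * 12;
END;

-- Unlike a procedure, it can be called inline in a query:
SELECT name, dbo.annual_salary(salary) AS yearly_pay
FROM employees;
```

Note the schema-qualified call (`dbo.annual_salary`): SQL Server requires it for user-defined scalar functions.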
Cursors exist, but experienced developers and interviewers know they are usually the wrong choice — and the best answer demonstrates why.
A cursor processes rows one at a time in a loop (DECLARE → OPEN → FETCH → CLOSE → DEALLOCATE). The critical problem: every row needs its own FETCH and its own per-row work, so on large tables cursors run orders of magnitude slower than an equivalent set-based statement.
| Cursor Type | Behavior | When to Use |
|---|---|---|
| FORWARD_ONLY | Read-only, one direction, no backtracking | Most common — fastest option when a cursor is unavoidable |
| STATIC | Snapshot of data at open time — changes not reflected | Read-only reporting on a stable dataset |
| DYNAMIC | Reflects inserts/updates/deletes as they happen | When live data visibility is required during iteration |
| KEYSET | Key set fixed at open; value changes reflected | Middle ground between STATIC and DYNAMIC |
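The full lifecycle, and the set-based statement that replaces it, can be sketched in T-SQL; the `employees` table and the 5% raise are illustrative:

```sql
-- Cursor lifecycle: row-by-row 5% raise for one department
DECLARE @emp_id INT;
DECLARE emp_cursor CURSOR FORWARD_ONLY FOR
    SELECT employee_id FROM employees WHERE department_id = 3;

OPEN emp_cursor;
FETCH NEXT FROM emp_cursor INTO @emp_id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE employees SET salary = salary * 1.05 WHERE employee_id = @emp_id;
    FETCH NEXT FROM emp_cursor INTO @emp_id;
END;
CLOSE emp_cursor;
DEALLOCATE emp_cursor;

-- Set-based equivalent: one statement does the same work
UPDATE employees SET salary = salary * 1.05 WHERE department_id = 3;
```

In an interview, showing the one-line equivalent is the point: it proves you reach for cursors only when per-row logic genuinely cannot be expressed as a set operation.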
Triggers are powerful but invisible — they fire automatically and can cause unexpected behavior if you do not know they exist.
A trigger is a stored procedure that executes automatically in response to an INSERT, UPDATE, or DELETE event on a table.
```sql
-- AFTER INSERT trigger: log every new order to an audit table
CREATE TRIGGER trg_order_audit
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
    INSERT INTO audit_log (table_name, action, record_id, created_at)
    VALUES ('orders', 'INSERT', NEW.order_id, NOW());
END;
```
| Use Case | Trigger Type | Better Alternative? |
|---|---|---|
| Audit logging | AFTER INSERT/UPDATE/DELETE | Keep the trigger — audit tables are a classic trigger use case |
| Cascading business rules | BEFORE / AFTER | Often better as application-layer logic or a stored procedure |
| Enforcing complex constraints | BEFORE INSERT/UPDATE | Consider CHECK constraints or application-layer validation |
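For the last row of the table, a declarative CHECK constraint is usually the cleaner option. A sketch, with a hypothetical `order_items` table:

```sql
-- Instead of a BEFORE INSERT trigger that rejects bad quantities,
-- a CHECK constraint enforces the rule with no hidden procedural logic:
ALTER TABLE order_items
    ADD CONSTRAINT chk_quantity_positive CHECK (quantity > 0);
```

The constraint is visible in the table definition, which avoids the "invisible trigger" problem entirely.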
This three-way decision is exactly how it comes up in senior interviews.
| Factor | Temporary Table | CTE | Subquery |
|---|---|---|---|
| Persists after query? | Yes — until session ends (#) or dropped | No — single statement only | No — inline only |
| Reusable in same query? | Yes — reference it as many times as needed | Yes — within the same WITH clause | No — must repeat it |
| Indexable? | Yes — CREATE INDEX on it | No | No |
| Best for | Large intermediate results, multi-step analytics | Complex readable multi-step logic | Simple one-off filters or scalar values |
| Performance on large data | Best — indexed, materialized | Optimizer dependent (often same as subquery) | Worst for repeated use |
Local temp tables (#table) are visible only to the current session and are automatically dropped at session end. Global temp tables (##table) are visible to all sessions — use sparingly and drop explicitly.
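The trade-off in the table can be shown side by side; the `orders` schema and the 10,000 threshold are illustrative, and the temp table half uses T-SQL syntax:

```sql
-- CTE: exists only for this single statement
WITH high_value AS (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 10000
)
SELECT c.name, h.total
FROM high_value h
JOIN customers c ON c.customer_id = h.customer_id;

-- Local temp table: survives for the session and can be indexed
SELECT customer_id, SUM(amount) AS total
INTO #high_value
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 10000;

CREATE INDEX idx_hv_customer ON #high_value (customer_id);
-- #high_value can now be reused by several later queries in this session
```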
| Function | What it Computes | NULL Behavior | Example |
|---|---|---|---|
| COUNT(*) | Total row count including NULLs | NULLs counted | COUNT(*) → 10 rows |
| COUNT(column) | Count of non-NULL values in column | NULLs excluded | COUNT(phone) → 7 (3 NULLs skipped) |
| COUNT(DISTINCT col) | Count of unique non-NULL values | NULLs excluded | COUNT(DISTINCT city) → 4 unique cities |
| SUM(column) | Total of all non-NULL values | NULLs ignored | SUM(amount) skips NULL amounts |
| AVG(column) | Sum / count of non-NULL values | Denominator = non-NULL count only | AVG(salary) over 10 rows with 3 NULLs divides by 7 |
| MIN / MAX | Lowest / highest non-NULL value | NULLs ignored | MIN(price), MAX(score) |
```sql
-- All three COUNT variants on the same table:
SELECT
    COUNT(*)             AS total_rows,
    COUNT(phone)         AS rows_with_phone,
    COUNT(DISTINCT city) AS unique_cities
FROM users;
```
ACID properties, isolation levels, and deadlocks are tested in almost every backend, DBA, and senior SQL interview.
Understanding ACID through failure scenarios demonstrates real comprehension — not just memorized definitions.
| Property | What it Guarantees | What Breaks Without It | Real-World Failure Example |
|---|---|---|---|
| Atomicity | All operations in a transaction succeed, or none do | Partial writes corrupt data | Bank transfer: Account A is debited but the server crashes before crediting Account B — money disappears |
| Consistency | Every transaction leaves the database in a valid state per defined rules | Business rule violations persist | Inventory system allows product quantity to go negative — an impossible real-world state |
| Isolation | Concurrent transactions do not interfere with each other | Dirty reads, phantom reads, double-booking | Two booking systems both see seat 14A as available; both confirm it — two passengers, one seat |
| Durability | Committed transactions survive system failures | Confirmed data lost after crash | A confirmed order disappears after server restart because it was only held in memory |
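The atomicity row maps directly onto the bank-transfer example. A sketch in T-SQL syntax, with an illustrative `accounts` table:

```sql
-- Atomicity in practice: both updates commit together or not at all
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';

COMMIT;  -- if the server crashes before this line, both updates roll back
```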
Isolation levels exist to prevent three categories of anomaly:
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Typical Use Case |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Rough analytics where performance beats accuracy |
| READ COMMITTED (default) | Prevented | Possible | Possible | Most OLTP applications — a good balance |
| REPEATABLE READ | Prevented | Prevented | Possible (varies) | Financial reports that must be consistent within a transaction |
| SERIALIZABLE | Prevented | Prevented | Prevented | Critical financial or booking transactions |
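Choosing a level is a one-line statement. A sketch of the seat-booking scenario from the Isolation row, in SQL Server syntax (the `seats` table is illustrative; PostgreSQL uses `BEGIN ISOLATION LEVEL SERIALIZABLE` instead):

```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
-- Under SERIALIZABLE, two concurrent bookings cannot both see 14A as free
SELECT seat_id FROM seats WHERE flight_id = 101 AND status = 'available';
UPDATE seats SET status = 'booked' WHERE flight_id = 101 AND seat_id = '14A';
COMMIT;
```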
Deadlock story: Transaction A locks the orders table and waits for inventory. Transaction B locks inventory and waits for orders. Neither can move forward — that is a deadlock. The database resolves it by selecting one transaction as the ‘deadlock victim,’ rolling it back, and letting the other proceed.
Important distinction: blocking = one transaction waiting on another (will eventually resolve on its own). Deadlock = circular wait (never resolves without intervention — the victim must be rolled back).
Prevention checklist:
Always access tables in the same order across all transactions — circular waits cannot form if all code acquires locks in the same sequence
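The lock-ordering rule is easiest to see with the two transactions from the deadlock story rewritten to touch tables in the same sequence; column and key values are illustrative:

```sql
-- Convention: every transaction updates orders BEFORE inventory.
-- Transaction A:
BEGIN TRANSACTION;
UPDATE orders    SET status = 'shipped' WHERE order_id = 42;
UPDATE inventory SET qty = qty - 1      WHERE product_id = 7;
COMMIT;

-- Transaction B follows the same order, so a circular wait cannot form:
BEGIN TRANSACTION;
UPDATE orders    SET status = 'cancelled' WHERE order_id = 43;
UPDATE inventory SET qty = qty + 1        WHERE product_id = 9;
COMMIT;
```

At worst, B briefly blocks behind A on a shared row; it never deadlocks, because neither transaction can hold `inventory` while waiting for `orders`.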
SQL injection occurs when user-supplied input is concatenated directly into a query string, allowing an attacker to modify the query’s logic.
```sql
-- VULNERABLE: string concatenation (application pseudo-code)
query = "SELECT * FROM users WHERE username = '" + username + "'";
-- Injected input:   ' OR '1'='1
-- Resulting query:  SELECT * FROM users WHERE username = '' OR '1'='1'
-- Result: returns ALL users — authentication bypassed

-- SAFE: parameterized query
SELECT * FROM users WHERE username = @username;
-- The input is passed as a parameter, treated as pure data — never executed as SQL
```
| Layer | Defence | What it Stops |
|---|---|---|
| Layer 1 (Primary) | Parameterized queries / prepared statements | All injection via user input — the parameter is treated as data, never code |
| Layer 2 (Secondary) | Input validation — allow-list expected formats | Catches invalid data types before they reach the database |
| Layer 3 (Blast radius) | Least-privilege database accounts | Limits damage if injection somehow succeeds — a read-only account cannot DROP tables |
This guide has covered the full spectrum of SQL interview questions — from foundational concepts like DDL/DML and primary keys, through intermediate query patterns including JOINs, GROUP BY, and NULL handling, to advanced SQL interview questions on window functions, CTEs, execution plans, transactions, and ACID properties. Whether you are preparing for your first technical screen or a senior-level FAANG interview, the topics above represent the core of what interviewers consistently test.
SQL remains one of the most in-demand technical skills across engineering and data roles. According to Stack Overflow’s Developer Survey, SQL consistently ranks among the top five most-used languages by professional developers globally — making structured SQL preparation directly valuable to your career trajectory.
To go deeper on the topics covered here, explore our related guides — in particular, the advanced SQL interview questions page for more complex optimization scenarios, and the SQL query interview questions resource for hands-on practice problems.
Ready to sharpen your SQL skills with real interview practice? Interview Kickstart’s Data Engineering Interview prep course is taught by FAANG hiring managers — so you practise exactly what gets tested.
The basics — SELECT, WHERE, JOIN, GROUP BY — are learnable in a few weeks with consistent practice. Mastery of advanced topics like window functions, query optimization, and execution plans takes longer and comes from solving real problems. For interview purposes, focus on depth over breadth: knowing five topics well beats knowing twenty topics shallowly.
SQL is the standard language for querying relational databases — used in backend application development, data analysis, business intelligence reporting, data engineering pipelines, and database administration. If data is stored in a structured database, SQL is how you access and manipulate it.
Yes — SQL is tested across software engineer, data analyst, data engineer, and DBA roles. For data-heavy roles it is often the primary technical screen. For software engineers, it typically appears as a secondary skill. FAANG-level interviews often include advanced SQL questions on window functions, optimization, and schema design.
The most fundamental commands are SELECT (retrieve data), INSERT (add rows), UPDATE (modify rows), DELETE (remove rows), CREATE (define tables/schemas), and DROP (remove tables). These split into DDL (CREATE, DROP, ALTER — structure) and DML (SELECT, INSERT, UPDATE, DELETE — data).
SQL is a language standard — the grammar you write queries in. MySQL, PostgreSQL, Oracle, and SQL Server are database systems (engines) that implement and extend that standard. Knowing SQL means you can work across all of them; each system adds its own extensions and features on top of the common SQL foundation.