Want to learn a new skill in 2024?
Make it
#SQL
!
Oracle Dev Gym has a series of FREE classes to help you do this
Start with Databases for Developers: Foundations
Join at
Make learning
#SQL
your New Year's Resolution!
Oracle Dev Gym has many FREE courses to help you with this
Join these at
Beginners
Improvers
Performance tuning
Analytic SQL
Indexes and transactions are fundamental concepts for relational
#databases
But how do they work?
Read this primer by
@myusuf3
for an introduction to these
Want to select values from the previous/next row in the results?
Use
LAG | LEAD ( ... ) OVER ( ORDER BY ... )
Control how many rows to look forward/back with the second parameter
LAG ( x, 3 ) OVER ( ORDER BY x )
Gets the value of X three rows before the current
#SQL
Use the WITH clause to create named subqueries
WITH cte AS ( ... )
SELECT ... FROM cte
This makes
#SQL
easier to read
Using it you can write queries in a way that matches the logical order of processing
Or define common joins/expressions once and reuse them in a query
With GROUP BY you can get summaries of all the rows with the same value
But what if you want to see all the rows too?
Remove GROUP BY and add
OVER ( PARTITION BY ... )
After the aggregate functions to preserve the result set; e.g.
COUNT (*) OVER ( PARTITION BY ... )
#SQL
Want to join rows from one table to the top-N from another?
e.g. customers and their last three orders?
You can use a LATERAL join
SELECT ... FROM t1
LEFT JOIN LATERAL (
SELECT * FROM t2 WHERE t1.col = t2.col
ORDER BY ...
)
@tobias_petry
explains
Want a running total of the past N rows in
#SQL
?
Set the window to:
OVER (
ORDER BY ...
ROWS BETWEEN n PRECEDING AND CURRENT ROW
)
You can even include rows after the current with
OVER (
ORDER BY ...
ROWS BETWEEN n PRECEDING AND n FOLLOWING
)
Want to return every row from one table combined with every value from another table?
e.g. show every customer & their order summary for each day - including days with no orders
Use a partitioned outer join!
FROM outer_tab
LEFT JOIN inner_tab PARTITION BY ( inner_cols )
#SQL
Want to stop people running SELECT * on a table?
Add a virtual column returning an error, such as divide by zero:
alter table t
add zero_divide int as ( 1/0 );
select * from t;
ORA-01476: divisor is equal to zero
#SQL
You can combine tables with UNION
This deduplicates the rows, adding an extra sort/hash step
For large tables this can slow your
#SQL
down
If you know the tables contain different data UNION ALL is better
This returns all rows from both tables & avoids the deduplication op
You can traverse hierarchies in
#SQL
using recursive WITH
WITH tree ( ... ) AS (
SELECT ...
UNION ALL
SELECT ...
FROM tree ...
)
SELECT ...
FROM tree
This enables you to follow parent-child relationships in your data
@vlad_mihalcea
demos
Accidentally dropped the wrong table?
In production? 😱
Get it back with
FLASHBACK TABLE <tablename>
TO BEFORE DROP;
You can also use flashback to recover wrongly deleted rows with
FLASHBACK TABLE <tablename>
TO TIMESTAMP ...;
#SQL
You can turn rows to columns with filtered sums, like:
SELECT year
,SUM(...) FILTER (WHERE month = 1) jan
,SUM(...) FILTER (WHERE month = 2) feb
...
FROM ...
Or CASE expressions for databases without FILTER
Or the PIVOT clause
@MarkusWinand
demos
When counting rows
COUNT (*) = number of rows in group
COUNT ( col ) = number of rows where COL IS NOT NULL in group
This distinction is particularly important for outer joins
To count matching rows from the inner table, you must count a column on this table
#SQL
Window functions compute their result across a sliding window of rows
This makes it easy to calculate running totals in
#SQL
This cheat sheet from
@vertabelo
gives an overview of how these work
Want to split rows into N buckets with equal-sized ranges?
Use
WIDTH_BUCKET ( val, min, max, N )
This creates N intervals between min and max, each with the same range of values
Generate the min/max values from the data by passing
MIN | MAX ( val ) OVER ()
#SQL
Looking for free resources to help you learn and use
@OracleDatabase
?
Join Oracle Dev Gym to learn
#SQL
for free
Use it for free with
* Oracle Cloud Free Tier
* XE
* Live SQL
* LiveLabs
*
There are three parts to the OVER clause:
- PARTITION BY
- ORDER BY
- Window frame
This cheat sheet from
@vertabelo
explains how these clauses work and details common functions that use them
#SQL
You write
#SQL
queries starting with SELECT, but the first operation is FROM!
@lukaseder
explains the true logical order is
FROM
WHERE
GROUP BY
HAVING
WINDOW
SELECT
DISTINCT
UNION, INTERSECT, EXCEPT
ORDER BY
OFFSET
LIMIT, FETCH, TOP
Want to "fill-down" the previous non-null value in the results to replace nulls?
You can do this with
LAST_VALUE ( ... ) IGNORE NULLS OVER ( ... )
#SQL
Want to learn
#SQL
?
Oracle Dev Gym has these FREE online classes including certificates of completion
Beginners'
Improvers'
SQL tuning basics
Analytic SQL
A
#SQL
query with
GROUP BY ROLLUP ( c1, c2 )
Calculates the subtotals for each group, working right-to-left through the rollup.
This creates number of cols + 1 subtotals
So the above returns the totals for these groups
C1, C2
C1
Grand total
It's nearly the end of 2019!
To find all the rows for this year, avoid
WHERE EXTRACT(YEAR FROM <dt>) = 2019
Instead use
WHERE <dt> >= DATE'2019-01-01'
AND <dt> < DATE'2020-01-01'
So your
#SQL
can use an index & other benefits
@MarkusWinand
discusses
If you have many OVER clauses in a
#SQL
query with the same grouping or sorting, you can define it once in the WINDOW clause & reuse it
SELECT
count ( ... ) OVER ( W ),
sum ( ... ) OVER ( W )
FROM ...
WINDOW W as ( PARTITION BY ... ORDER BY ... )
Now in Oracle Database 21c
You can do IF-THEN logic in
#SQL
with a searched CASE expression
CASE
WHEN condition1 THEN ...
WHEN condition2 THEN ...
...
ELSE ...
END
Or compare a single value with a simple case expression
CASE v
WHEN val1 THEN ...
WHEN val2 THEN ...
...
ELSE ...
END
Get the first or last value in a sorted group in
#SQL
{ MIN | MAX } (...) KEEP (
DENSE_RANK { FIRST | LAST } ORDER BY ...
)
When there are ties MIN returns the lowest value, MAX the highest
Use this for problems like
Find the highest salary for the most recently hired staff
A LEFT (OUTER) JOIN returns
All the rows from the table on the left side of the join
Any rows matching the join criteria from the table on the right
If there are no matching rows in the right table, the result show NULL for these columns
A RIGHT JOIN does the opposite
#SQL
Say NO to Venn Diagrams When Explaining JOINs!
They show set operations
Which are the
#SQL
operators UNION, INTERSECT, & MINUS/EXCEPT
@lukaseder
explains a better way to visualize joins
Using SELECT DISTINCT in
#SQL
removes duplicates rows
But often adding this is a sign you should change your query e.g.
To find all the departments with employees earning > X
Instead of
SELECT DISTINCT ... JOIN
Use
EXISTS for just the depts
or
GROUP BY to count the emps
Want to get your
#SQL
query results as CSV, JSON or INSERT statements?
Do this in
@OracleSQLDev
by adding the format as a comment to the statement
For example
SELECT /*csv*/ ...
Then run the query in script mode (F5 by default)
This also supports XML, HTML, and fixed-width
Want to learn a new programming language in 2021?
Choose
#SQL
!
It's widely used and here to stay
These free courses on Oracle Dev Gym will get you started
Beginners
Intermediate
SQL tuning
Want to find duplicate values in a column with
#SQL
?
Use
SELECT col1, COUNT(*)
FROM ...
GROUP BY col1
HAVING COUNT (*) > 1
To find pairs of columns with duplicates list both in the SELECT & GROUP BY
SELECT col1, col2, COUNT(*)
FROM ...
GROUP BY col1, col2
HAVING COUNT (*) > 1
Want to get the Nth row (only) from a table?
Use
select * from (
select
row_number() over (
order by <sort cols>
) as rn,
...
from <table>
)
where rn = N
For other techniques, see
#SQL
Use window functions to get running totals in
#SQL
Just remember: the window is calculated after filtering
So placing the function in a subquery can give different results
e.g. these are not the same
select count () over () ...
from (
select count () over () ...
)
where ...
Want to convert rows to columns in
#SQL
?
Do this in Oracle Database with
PIVOT (
fn FOR source_column IN ( v1, v2, ... )
)
You can pivot many functions; every value in the IN list has a column for each fn
Columns in the table not in the PIVOT form an implicit GROUP BY
Want to get the value at position N in a result set?
You can use
NTH_VALUE ( col, N ) OVER ( ORDER BY ... )
Filter on this to find all rows this value
Note that without the window
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
This returns null up to row N
#SQL
In the JOIN clause you can go for:
t1 JOIN t2 ON t1.c1 = t2.c1
or
t1 JOIN t2 USING ( c1 )
The key difference is USING de-duplicates the join columns in the results when you use SELECT *
@vlad_mihalcea
explains
#SQL
Are you using the Spring Data JPA?
Avoid using the findById method to lookup the parent value for foreign keys when inserting child rows
This runs an extra SQL query to get values you already have!
Use getReferenceById instead
@vlad_mihalcea
explains
Venn diagrams are a common way to visualise
#SQL
joins
But this is WRONG!
These show set operators (UNION, etc.)
Instead, think of them filtering the Cartesian product of the tables
@lukaseder
explains
The
#SQL
ANY operator compares the lefthand value with all those on the right
It returns all the rows where at least one comparison is true
e.g.
where col > ANY ( select id ... )
Gets all the rows where COL > at least one ID
@vlad_mihalcea
discusses
Normalize to avoid data errors!
1NF - All tables are relations
2NF - All non-key values depend on the whole key
3NF - All non-key values only depend on the key
4NF & 5NF are for many-to-many relationships
William Kent explained with examples in 1982!
You can traverse data trees in
#SQL
using recursive with:
with tree ( ... ) as (
select ... from tab where root = 'Y'
union all
select ...
from tree
join tab
on tab.parent =
)
select * from tree
Want a
#SQL
query returning sales for every product for every week?
But some products had no orders in some weeks?
A partitioned outer join can help
This returns
Every row from the outer table
Combined with every value in the partition columns from the inner table
Calculate moving averages with in
#SQL
AVG ( ... ) OVER ( ORDER BY ... [ ROWS | RANGE ] n PRECEDING )
ROWS => two previous ROWS & current
RANGE => rows between current value - 2 and current value
These can give different results when there are gaps in the data
A full outer join returns
Every row from BOTH tables
If no row in the other table has the same value for the join columns, the other table's columns are null in the output
#SQL
Want to filter rows in the RIGHT table in a LEFT OUTER JOIN?
Ensure you do this in the join (ON) clause
Filtering the right table in the WHERE effectively makes it an INNER JOIN!
This is because the condition is NOT TRUE for all the values missing from the right table
#SQL
Want to get what percentage a value is of a total in
#SQL
?
Use
RATIO_TO_REPORT ( val ) OVER ( ... )
This returns
val / SUM ( val ) OVER ( ... )
To find the percentage within a subgroup of the values, add
PARTITION BY group
To the OVER clause
Are you learning
#SQL
on Oracle Database and looking for FREE instances to practice on?
@thatjeffsmith
shows you how using
Live SQL
Always Free Oracle Autonomous Cloud Database Service
VirtualBox Database Appliance
Oracle Database XE
Want to add row numbers to a
#SQL
result set?
[ ROW_NUMBER | RANK | DENSE_RANK ] OVER ( ORDER BY ... )
The difference comes when the sort expression has duplicates:
Row_number - unique, no gaps
Rank - ties have same number, gaps
Dense_rank - ties have same number, no gaps
Use Common Table Expressions (CTEs) aka the WITH clause to name subqueries:
with oft_buy_together as (...),
oft_buy_soon_after as (...),
product_recs as (...)
select *
from product_recs
making your
#SQL
easier to read & debug
@MarkusWinand
explains
Want to generate subtotals for all combinations of columns in a GROUP BY?
Use
GROUP BY CUBE ( c1, c2 )
This creates 2^num cols groups
So the above creates subtotals for these groups
C1, C2
C1
C2
Grand total
#SQL
Join inside a subquery with CROSS APPLY
A handy way to find matching rows and return
The top-N
CROSS APPLY (
select ... from right_t rt
where lt.col = rt.col
FETCH FIRST ...
)
Aggregations
CROSS APPLY (
select SUM, AVG, ... from right_t rt
where lt.col = rt.col
)
#SQL
Want to stop a long-running query in Oracle Database?
Instead of killing the session, from 18c you can stop it with
ALTER SYSTEM CANCEL SQL '<sid>, <serial#>'
Want to find consecutive rows with
#SQL
?
This links consecutive rows into one group with pattern matching in Oracle Database:
SELECT * FROM ... MATCH_RECOGNIZE (
ORDER BY ...
MEASURES ...
PATTERN ( init consecutive* )
DEFINE consecutive AS col = PREV ( col ) + 1
)
The first clause in
#SQL
queries is SELECT
But (logically) the first clause processed is FROM!
@lukaseder
explains in detail in his beginner’s guide to the true order of SQL operations, that goes:
FROM
WHERE
GROUP BY
aggregations
HAVING
WINDOW
SELECT
Joining tables on columns with the same name?
Instead of checking they're equal in the ON clause you can use USING
t1 JOIN t2
USING ( col_on_both_tables )
This deduplicates the join columns so they only appear once in SELECT *
@vlad_mihalcea
discusses
You can convert rows to columns in Oracle Database with
PIVOT ( FN for col IN ( 'val1', 'val2', ... ) )
This creates a column for each value in the IN list
But what if you want to generate these dynamically?
@stewashton
shows you how using SQL macros
Get the top or bottom value in an ordered data set with FIRST_VALUE & LAST_VALUE
But beware!
The default window for LAST_VALUE stops on the current row, NOT the last in the results!
To get the final value, set this to
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Want to get the first non-null expression from a list?
Use
COALESCE ( v1, v2, v3, v4, ... )
This uses short-circuit evaluation
So it stops processing the remaining expressions after the first to return a non-null value
#SQL
#SQL
is the most in-demand US tech skill (according to Indeed)
And has been for years!
Learn this valuable language in the FREE online course, Databases for Developers:
Need to find & remove duplicate rows in Oracle Database?
Find them with
SELECT ... FROM ...
GROUP BY ...
HAVING COUNT (*) > 1
And remove with
DELETE ... WHERE ROWID NOT IN (
SELECT MIN ( ROWID ) FROM ... GROUP BY ...
)
If you want to filter the result of aggregate functions (COUNT, SUM, etc.) in
#SQL
, you do this in the HAVING clause
Not the WHERE clause!
WRONG
select ... WHERE count(*) > 1 group by ...
RIGHT
select ... group by ... HAVING count(*) > 1
Got a large
#SQL
execution plan and trying to find its slowest part?
Tricky, isn't it?
Visualizing it as a flame graph can make analyzing it easier
@TanelPoder
has built a script you can use to get the file in a flame graph friendly format
Want to find all the duplicate values in a column with
#SQL
?
Use
SELECT <column>
FROM ...
GROUP BY <column>
HAVING COUNT ( <column> ) > 1
@javinpaul
demos this and more ways to find duplicate values
Want to view data as it existed at some time in the past?
Do this in Oracle Database with flashback query
select ... from ...
as of timestamp ...
Returns the table state at the specific timestamp
This uses undo to reconstruct the past; it's limited to 15 minutes by default
Looking for free
#SQL
quizzes to practice your skills?
Oracle Dev Gym has a library of hundreds for you to choose from and includes
Workouts to train you on specific topics
Complete classes to teach you SQL
Start training today
Want to find all the DML changes to a table in a period?
Do this in Oracle Database with
select ... from ...
VERSIONS BETWEEN TIMESTAMP
<start> AND <end>
Use the VERSIONS_* pseudo columns to find the what & when
As with flashback query, the default is 15 mins of changes
You can get SQL group totals for SUM, COUNT, etc. with GROUP BY or window functions
What's the difference?
GROUP BY combines data and returns one row for each group
Window functions preserve data and return every row
@sundaskhalid6
explains
#SQL
Get a running total in
#SQL
with
SUM ( ... ) OVER ( ORDER BY y frame 2 PRECEDING )
frame defines the rows to include
ROWS the 2 previous rows & the current
RANGE all rows between (y - 2) & y
GROUPS all rows with the same value as the 2 previous & the current - added in 21c
There are three key join methods in
#SQL
* Nested loops
* Hash join
* Sort merge join
@sqlmaria
explains how these work, how the optimizer chooses which to use, and adaptive joins which allow the plan to switch between nested loops and hash joins
Looking for some top
#SQL
tuning tips?
Check out this presentation from
@sqlmaria
, with details on how to
Check row estimates and table stats
Check the order of columns in indexes
Avoid functions on columns
Want to get the most recent row (max date) for every user_id in a table?
You can use
SELECT ... FROM (
SELECT ...
MAX ( my_date ) OVER ( PARTITION BY user_id ) max_date
FROM ...
)
WHERE my_date = max_date
But there are many more
#SQL
solutions, see
After grouping rows in
#SQL
, sometimes you want to filter the result of aggregate functions (e.g. count, max, avg)
Do this in the HAVING clause, e.g:
HAVING COUNT (*) > 2
Not the WHERE clause!
There are many ways to get
#SQL
query plans, but many omit important details!
One way to get plans with these in Oracle Database is
SELECT *
FROM table (
DBMS_XPLAN.DISPLAY_CURSOR (
FORMAT=>'ALLSTATS LAST ALL +OUTLINE'
)
);
@vldbb
shows more ways
Got scalar subqueries aggregating the same table?
SELECT ...,
( SELECT MIN (...) FROM tab ... ),
( SELECT MAX (...) FROM tab ... )
FROM ...
This is unnecessary work!
Read TAB once by either
Moving the subqueries into one OUTER APPLY
Joining & use window functions
#SQL
Generate subtotals for specific columns with
GROUPING SETS ( ( c1 ), ( c2 ), () )
This adds totals for
c1
c2
Overall total
Use GROUPING_ID to see which columns are aggregated based on their bit-position in the call
GROUPING_ID ( c1, c2 )
c1 => 1
c2 => 2
c1 & c2 => 3
#SQL
Convert
#JSON
arrays to rows-and-columns with
#SQL
in Oracle Database with
JSON_TABLE (
jdoc,
'$.arrayPath[*]'
COLUMN (
col <data_type> '$.jsonPath'
)
)
Add row numbers with a FOR ORDINALITY column
Add DEFAULT val ON ERROR to set a return value when there are problems
Use NTILE to split rows into N buckets with an equal number of rows in each, e.g.
NTILE ( 4 ) OVER ( ORDER BY col )
Splits the data into four groups, sorted by col
Add PARTITION BY to get N groups for each value for c2:
NTILE ( 4 ) OVER ( PARTITION BY c2 ORDER BY col )
#SQL
Need to DELETE millions of rows quickly?
Do this fast in
@OracleDatabase
with a filtered table move:
alter table
move including rows
where rows_to_keep = 'Y'
@ChrisRSaxon
discusses other options to wipe data fast
Want to learn how to make
#SQL
faster?
There's a new FREE performance tuning course on Oracle Dev Gym
Databases for Developers: Performance
This will teach you the basics of optimizing SQL
The class starts on Monday 3 August, registration is now open
Say NO to Venn Diagrams When Explaining JOINs!
They show set operations, are the
#SQL
operators UNION, INTERSECT, & MINUS/EXCEPT
@lukaseder
discusses a better way
A cross join combines every row in the left table with every row in the right table
i.e. the Cartesian product
This can be handy for data generation, but it's rare you'll use cross joins in a real app!
#SQL
To get the first, last, or Nth value using
#SQL
use
FIRST_VALUE ( val ) - Start val
NTH_VALUE ( val, N ) - Val at row N
LAST_VALUE ( val ) - Final val
Add OVER ( ORDER BY ... ) to sort
But beware!
The default window stops on the current value => LAST_VALUE = current last
To run a
#SQL
statement the
#database
takes these steps
Parse
Optimize
Execute
Optimizing is the process of deciding how to access and join the tables
This is resource intensive
@vlad_mihalcea
explains how prepared statements help reduce this work
Coming in Oracle Database 23c
GROUP BY alias or position
Avoid duplicating expressions by referring to them by alias in the GROUP BY
SELECT EXTRACT ( YEAR FROM datecol ) AS yr, ...
FROM ...
GROUP BY yr
#SQL
Got a long-running
#SQL
statement that Just Won't STOP?!
You could kill the session, but that's overkill!
From Oracle Database 18c there's a better solution, cancel it with:
alter system cancel sql '<sid>, <serial>'
@oraclebase
demos
Looking to learn a new skill in 2023?
Make it
#SQL
!
This long-lasting tech will be invaluable throughout your career
Get started with this FREE course on Oracle Dev Gym
Find more free resources for Oracle Database at
Get the top-N rows in
#SQL
with
order by ...
fetch first N rows only
If many rows have the same value for the sort column(s), the result is non-deterministic
Get all the rows with the same value as the Nth of the sorting columns with
order by ...
fetch first N rows WITH TIES
Want to learn about
#SQL
tuning in Oracle Database?
@SQLMaria
has shared the slides from her recent workshop
* Understand the Optimizer
* Managing Optimizer Statistics
* Explain the Explain Plan
* Influencing Plans with Hints
* Tuning Tips and Tricks
For each
#SQL
statement you run, the database must
Parse it
Optimize it
Execute it
But if you're just changing values in your where clause, re-running the first two steps is a waste
Use PreparedStatements to avoid this
@vlad_mihalcea
explains
Running
#SQL
in parallel can make it run much faster...
...but it does lead to more complex execution plans!
@SQLMaria
gives a step-by-step guide to following these plans in
@OracleDatabase
Writing out lots of columns in a query just to omit a couple is a pain; it would be handy to write
SELECT * EXCEPT cols
instead - especially to remove working columns from multi-stage
#SQL
@ChrisRSaxon
shows you how to emulate this in Oracle Database