četrtek, 28. februar 2019

Subquery vs join performance example

Issue

We had very slow query running from 20 to 70 minutes. Also, it affected some other database processes during execution. We replaced subquery with join and overall performance of query is much better now. Let's see performance graph created with OBI. Performance improved to 6 minutes in average. You can see minutes on Y axis.













In this article I try to restore issue on EMP, DEPT example. Please download and run emp_dept.sql script. It will insert 4004 records in DEPT table and 1300013 records in EMP table.


Non-optimized query

select e.job, e.deptno from emp e
group by e.job, e.deptno 
having avg(e.sal) > 
(
    select avg(e1.sal) from emp e1 where e1.deptno = e.deptno
); 

It takes more than 2 minutes on Oracle database version 11.2.0.4.0 to fetch first 500 of 4004 rows. 


Optimized query

with avgs as
(
select deptno, avg(sal) avg_sal from emp group by deptno
select e.job, e.deptno from emp e
join avgs on avgs.deptno = e.deptno
group by e.job, e.deptno, avgs.avg_sal 
having avg(e.sal) > avgs.avg_sal;

It takes 1 second to fetch first 500 of 4004 rows!


Execution plans


Non-optimized query

Plan hash value: 2799045373
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  1303K|    39M|  2606   (6)| 00:00:32 |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   HASH GROUP BY     |      |  1303K|    39M|  2606   (6)| 00:00:32 |
|   3 |    TABLE ACCESS FULL| EMP  |  1303K|    39M|  2499   (2)| 00:00:30 |
|   4 |   SORT AGGREGATE    |      |     1 |    26 |            |          |
|*  5 |    TABLE ACCESS FULL| EMP  | 13031 |   330K|  2505   (2)| 00:00:31 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(AVG("E"."SAL")> (SELECT AVG("E1"."SAL") FROM 
              "BK_APEX_MANUAL_CMS"."EMP" "E1" WHERE "E1"."DEPTNO"=:B1))
   5 - filter("E1"."DEPTNO"=:B1)
Note
-----
   - dynamic sampling used for this statement (level=2)
   - SQL plan baseline "SQL_PLAN_9rx6m0mzg4ja1984ce69e" used for this statement

Note use of binded variable :B1 in the plan. Following query uses exactly the same execution plan.

select e.job, e.deptno from emp e
group by e.job, e.deptno 
having avg(e.sal) > 
(
    select avg(e1.sal) from emp e1 where e1.deptno = :b1
); 

Check hash value of the plan. It's equal to previous plan hash:

Plan hash value: 2799045373
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  1303K|    39M|  2606   (6)| 00:00:32 |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   HASH GROUP BY     |      |  1303K|    39M|  2606   (6)| 00:00:32 |
|   3 |    TABLE ACCESS FULL| EMP  |  1303K|    39M|  2499   (2)| 00:00:30 |
|   4 |   SORT AGGREGATE    |      |     1 |    26 |            |          |
|*  5 |    TABLE ACCESS FULL| EMP  | 13031 |   330K|  2516   (3)| 00:00:31 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(AVG("E"."SAL")> (SELECT AVG("E1"."SAL") FROM 
              "BK_APEX_MANUAL_CMS"."EMP" "E1" WHERE "E1"."DEPTNO"=TO_NUMBER(:B1)))
   5 - filter("E1"."DEPTNO"=TO_NUMBER(:B1))
Note
-----
   - dynamic sampling used for this statement (level=2)

It's obvious that execution plan doesn't know how many times subquery must be evaluated. Subquery is evaluated for each deptno (4004 times). Subquery unnesting doesn't take place here. 
Execution plan doesn't reflect real effort to process query with nested subquery. 


Optimized query

Execution plan reflects real processing: 

Plan hash value: 652522642
---------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |   987M|    53G|       |   136K (93)| 00:27:20 |
|*  1 |  FILTER                |      |       |       |       |            |          |
|   2 |   HASH GROUP BY        |      |   987M|    53G|       |   136K (93)| 00:27:20 |
|*  3 |    HASH JOIN           |      |   987M|    53G|    47M| 18931  (48)| 00:03:48 |
|   4 |     VIEW               |      |  1303K|    32M|       |  2606   (6)| 00:00:32 |
|   5 |      HASH GROUP BY     |      |  1303K|    32M|       |  2606   (6)| 00:00:32 |
|   6 |       TABLE ACCESS FULL| EMP  |  1303K|    32M|       |  2499   (2)| 00:00:30 |
|   7 |     TABLE ACCESS FULL  | EMP  |  1303K|    39M|       |  2499   (2)| 00:00:30 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("AVGS"."AVG_SAL"<AVG("E"."SAL"))
   3 - access("AVGS"."DEPTNO"="E"."DEPTNO")
Note
-----
   - dynamic sampling used for this statement (level=2)
   - SQL plan baseline "SQL_PLAN_1kp0nyy5tdw9r987d3348" used for this statement