- Get link
- X
- Other Apps
By
Unknown
-
What order does PROC SQL do your work?
To optimize a Structured Query Language (SQL), the database professional must be friend this order and perhaps even embrace it.
The best onlie and class room training center for SAS, Signetsoft, Bangalore.
Who
is your best friend? I’m talking about the order in which SQL processes
your statements. Simply put, in what order does SQL do your work? (From
my previous post you know another SQL rule, the order in which YOU
submit statements to SQL.) I just want us to be clear that these are two
very different orders.
First up, what makes SQL different from other programming languages?
It’s the way SQL processes code. Most programming languages including
SAS start at the top of your code and make their way to the bottom.
However, SQL processes them in a unique order, known as Logical Query
Processing Phase. This phase (also referred to as a clause) generates a
series of virtual tables with each virtual table feeding into the next
clause or phase. And no, we can’t view these virtual tables, sorry about
that. That’s just the way SQL works!
When I shared the processing order with my recent SQL1 class, Lindsay Jacks of Cancer Care Ontario asked
if I would blog about it. I hope this sharing will help breakdown any
surprises you may expect to see when you submit SAS code. And in turn
help in understanding how to optimize SQL queries.
Here then is the logical query processing phase order.
1. FROM
The 1st clause
SQL processes is the FROM. It tells SQL where to grab your tables from.
Given that this clause is the first to execute, it’s also our first
chance to boil down your table sizes. From an efficiency perspective,
this is why we may need to think about putting as many ON clauses as
possible on JOINs versus putting them on the WHERE clause.
2. WHERE
The 2nd clause
SQL processes is the WHERE. The WHERE clause pre-processes data. It
selects just the rows that meet the WHERE criteria. The results of
WHERE clause processing are stored in an intermediate table.
SQL coders who have sometimes been surprised to see the WHERE clause
fail may now have their answer. In the example that follows, the WHERE
has no idea how to filter Bonus and doesn’t know what Bonus means. Just
because we constructed Bonus on the SELECT doesn’t mean anything to the
WHERE as the SELECT is one of the last clauses in the processing order
& yet to be looked at by SQL. Consider the following example where
the Log rightly complains:
43 /* Try to subset by referencing a new column with its alias */ 44 proc sql; 45 select Employee_ID, Employee_Gender, Salary, 46 Salary * .10 as Bonus 47 from orion.Employee_payroll 48 where Bonus < 3000; ERROR: The following columns were not found in the contributing tables: Bonus. 49 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
3. GROUP BY
The 3rd clause
SQL processes is the GROUP BY. The role of the GROUP BY is to get your
data into groups. To do this grouping, sometimes SQL may need to order
your data, but there’s no guarantee that data will be in ascending or
descending order. Explicitly placing an ORDER BY clause is our only
guarantee that data will be in order.
4. HAVING
The 4th clause
SQL processes is the HAVING. This clause subsets the groups created by
the GROUP BY clause based on the HAVING clause predicates, and, like
the WHERE clause, builds an intermediate table.
5. SELECT
The 5th clause
SQL processes is the SELECT. This is what tells SQL what columns to
pull into the query, existing or calculated. Note: Columns are
calculated on the SELECT only one time regardless of how often they are
referenced, whether on the WHERE (with the calculated keyword) or the
ORDER BY (without the calculated keyword). The SELECT statement is able
to use the results of anything it calculated in other instances using
the same calculation.
6. ORDER BY
The 6th clause
SQL processes is the ORDER BY. By now it should come as no surprise
that the ORDER BY has access to columns created on the SELECT but the
WHERE doesn’t (see 2 above to as a reminder).
Any earlier confusion about the calculated SELECT columns (see 2 again)
not being available in the WHERE clause, but available in the ORDER BY
clause would now make perfect sense. Because the SELECT is executed
before the ORDER BY, all columns in the SELECT will be available at the
time of ORDER BY execution. So the following code works perfectly:
68 proc sql; 69 select Employee_ID, Employee_Gender, salary, 70 Salary * .10 as Bonus 71 from orion.Employee_payroll 72 order by 4; 73 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.03 seconds cpu time 0.03 seconds
Itzik Ben-Gan makes a good point with a poster (i can provide in my next post) that captures the processing order.
Why do I call this processing order your best friend in the database
world? Because we have likely at some time come across never ending
queries and not known how to optimize. Here then is a starting point for
us to try to keep in mind next time we submit a query, so we can modify
the way we optimize SQL queries.
(courtesy : sasblogs
SIGNETSOFT is the best Training center in Bangalore for SAS, SAS clinical, SAS projects, SAS Certification, Advanced Excel, VBA, Clinical Data Management.
A practical approach! and real time expert faculty, good in placement record.
contact us: info@signetsoft.com
mob: 9844559330
Phone: 080-41 666 550 Website: http://www.signetsoft.com
- Get link
- X
- Other Apps
Comments
Post a Comment