Yep – there’s a Part 3 to this Closer Look at SELECTs series. Who knew there was this much to say about the SELECT statement?
(If you missed Part 1 and Part 2, check them out!)
By looking at SELECT statement, you can tell a lot about your database design before looking at a data model or checking system tables or executing any of the sp_help* stored procs. You’ll want to consult any and all of those to help you along the way but the SELECT statements can help give clues about what you will find or what you should find.
One obvious thing you can find out is what the relationships between your tables are. From there, it should be easy to see what the foreign key are. How those queries are written can also give the key as to why that may not be.
So what do I mean?
Take the following statement:
SELECT a.Account_ID, a.Account_Name, a.Primary_Contact_Person_ID, a.Alternate_Contact_Person_ID, p.Full_Name as Primary_Contact_Person FROM dbo.Account as a JOIN dbo.Person as p ON a.Primary_Contact_Person_ID = p.Person_ID ;
Just from this single statement, you can guess that the Primary_Contact_Person_ID on Account is the foreign key to Person’s Person_ID field. If we looked at the execution plan, we can confirm that the foreign key has the proper index supporting it and whether it’s being used. You may also want to see if the foreign key is in place as well and if there isn’t one, find out why and see what you can do about that.
And of course, if you notice that these objects don’t exist, you can add them.
What other types of relationships can we learn about our design? Let’s look at the following SELECT statements:
SELECT a.ID, a.Name, c.ID, c.Title FROM tableA as a JOIN tableC as c ON a.ID = c.ID WHERE a.TypeID = 1; SELECT a.ID, a.Name, d.ID, d.Agency FROM tableA as a JOIN tableD as d ON a.ID = d.ID WHERE a.TypeID = 2; SELECT a.ID, a.Name, e.ID, e.Position FROM tableA as a JOIN tableE as e ON a.ID = e.ID WHERE a.TypeID = 3;
This suggests a super\subtype relationship with tableA being the super type table. Here’s the fun part of this – you usually identify these relationships when creating a logical model but here we can see one way that it’s implemented in the physical design.
Let’s modify those queries and rename these tables to make more sense.
SELECT a.Person_ID, a.Name, c.ID, c.Title FROM Person as a JOIN Staff as c ON a.Person_ID = c.Person_ID ; SELECT a.Person_ID, a.Name, d.ID, d.Agency FROM Person as a JOIN Agent as d ON a.Person_ID = d.Person_ID ; SELECT a.Person_ID, a.Name, e.ID, e.Position FROM Person as a JOIN Player as e ON a.Person_ID = e.Person_ID ;
Now we’re adding context to help better understand what we’re doing. We’re look at people and data related to specific roles. You’ll notice that I removed the WHERE clause where I used a TypeID in the dummy example. When I filled in the table names, I realized that it was probably likely that there may be situations where a person could have information in more than one table. This was a business logic decision that I made as part of my design. If I decided to use a different example for data, such as ways to contact a person (postal, email, phone, etc.), I may not have made the same decision. But by looking at the statements, you can get a hint at the business logic involved.
But that’s my point. You can write other SELECT statements that reference these tables and see if your application is designed to handle the proper business requirements. You may already have the queries in your database or application logic but you can always write various checks yourself.
Also, pay attention to the types of JOINs that are being used. If you see any OUTER JOINs, it means the relationships are optional. But keep in mind, if you see outer joins but the foreign key fields are NOT NULL, it may be an indicator of other issues. For example, should the OUTER JOIN be an INNER JOIN or are you allowing for orphaned data and what are the implications of that decision? Is something missing that shouldn’t be?
Examining your SELECT statements will also tell you what the important fields in a table are. What fields do you always see listed in the column list or are always listed in the GROUP BY when writing aggregates? What fields do you always see in the WHERE clause? If you’re looking at overall database performance, these are the fields you’ll want to keep an eye to see if they’re indexed properly.
You may also notice things like columns which are constantly converted to other types or always have ISNULL statements around them. This could indicate things like the data type for the underlying wasn’t set up properly. You can look at a nullable column when that NULL isn’t treated as the absence of a value but a stand in for something. For example, I find this happens with bit datatype fields where NULLs are treated the same as 0. If a NULL isn’t different than a zero, then do you really want the column to be nullable? As you read SELECT statements more, you’ll be able to start seeing patterns.
Once you start seeing these patterns, you really start to understand the data in your database. From here, you can analyze how are the tables are being set up, determining whether the proper indexes are in place, figuring out if constraints and indexes are missing, whether there are database design issues that should be addressed, etc. I’ve pointed out little ways you do that along the way but now you can see the bigger picture. You can start using execution plans to further look at the queries to help you answer these questions. You can use a data modeling tool to visualize your database and use the notes and description options for the extra information about fields and tables that may not be stored elsewhere. From here, you ensure your data model is up to date as you use it to make changes to the database going forward.
Overall, you put yourself in a better position to troubleshoot performance and perhaps even speed up the process because you already understand the database or know what you’re trying to return with your data. You may even have the ability to be proactive to prevent your customers from knowing there was ever an issue.
But you did catch the trend here, right? While you can find a good amount of information from a single SELECT, it’s the combination of all the SELECTs where you’re going to get the most information. So just as you’re building your skills from a simple SELECT statement to more advanced complex statements, you’re doing the same thing with your overall understanding of how your database works and what you can do to improve it.
Is it a stretch to say this is all because of the humble little SELECT? Absolutely. But it does get to the importance of understanding how SELECT statements work as individual statements and how you use them altogether in your database. That understanding will help make your skill set even more valuable.
One thought on “A Closer Look at SELECTs – Part 3”