Posted in SQL Server, T-SQL Tuesday

T-SQL Tuesday #88: The Daily DB WTF

It’s time for another T-SQL Tuesday!

This month is hosted by Kennie Pontoppidan (b|t). The topic is “the daily (database-related) WTF”. The challenge is to be inspired by IT horror stories from thedailywtf.com and add our own stories.

I’m one of the DBA types who works on the product/application development side of the house. It is my job to make sure that WTF database designs do not happen. However, I’m usually outnumbered and there’s only so much I can do. But if I can win these arguments 50.1% of the time, it still means I’m winning more than I’m losing. It also means that I still lose quite a bit. And when you come up with a compromise, things can get weird.

I feel safe in sharing this story for a number of reasons.

  1. It was over a decade ago so I’m hoping the statutes of limitations have expired.
  2. It was not the outcome I wanted. In other words, this is part of the 49.9% of the times I lost.
  3. I am hopefully now relatively secure enough that I can admit that I didn’t help the situation.
  4. I found a way to turn this into a learning opportunity on how to avoid this in the future.

I’ll admit that I’m fairly fuzzy on some of the details of things now. But I remember the important pieces.

The database had a high number of transactions per second. The most frequently updated table was also one of the most central tables in the database. The transactions for this table were mostly updates; inserts and deletes were rare. It had about 40 columns, with a good percentage of those being nullable fields. I don’t remember the exact number, but I think the table would have had about 30,000 records on average.

The programmers came to me and said they needed to add a large number of columns to this table for one piece of functionality. It would more than double the total number of columns on the table. Oh, and all of the new columns would be nullable since we would only need to populate them if that functionality was being used, and even then, not all of them would require data. The final result would be that 65-75% of the table’s columns would be nullable, with the majority of those having NULL for the value.

I said what I think any sane DBA would say to this request: No.

[Bad Idea gif]

The programmers said: But this is what we want to do.

Again, I said what I think any sane DBA would say to this request: No.

But I clarified: This table would end up fairly wide. It’s used by almost everything in the database and it’s constantly being updated, which means locking and blocking are issues. For performance reasons, we shouldn’t be doubling its width with all of these nullable columns if they’re not going to be used by the majority of the records.

I even came up with an alternate solution:

[Good Idea gif]

Let’s create a sub-type table, or a vertical partition of the main table, to hold the columns needed by the new functionality so that only the records that need to store this data will have it. It will have fewer rows, which means the functionality that uses these fields can be more efficient since it’s looking at a smaller subset of data. Plus, there will be less contention on the main table that’s constantly being updated.
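In rough terms, this is the shape of what I was proposing. The table and column names here are hypothetical, since I don’t remember the real schema:

    -- Hypothetical names; the real core table had roughly 40 columns.
    CREATE TABLE dbo.CoreEntity
    (
        CoreEntityID int      NOT NULL CONSTRAINT PK_CoreEntity PRIMARY KEY,
        StatusCode   tinyint  NOT NULL,
        LastUpdated  datetime NOT NULL
        -- ... the rest of the existing columns ...
    );

    -- Sub-type / vertical partition: at most one row per core record,
    -- holding only the columns the new functionality needs.
    CREATE TABLE dbo.CoreEntityNewFeature
    (
        CoreEntityID  int NOT NULL
            CONSTRAINT PK_CoreEntityNewFeature PRIMARY KEY
            CONSTRAINT FK_CoreEntityNewFeature_CoreEntity
                REFERENCES dbo.CoreEntity (CoreEntityID),
        FeatureValueA int         NULL,
        FeatureValueB varchar(50) NULL
        -- ... the rest of the new-feature columns ...
    );

Queries for the new functionality would join the two tables on CoreEntityID; everything else would keep hitting just the core table.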

The programmers said: But we really want this on the same table….

Racking my brain for some sort of compromise, I came up with an idea. We would have been on SQL Server 2000 at the time. I had read something about table design which said that because of the way data was written to and read from disk, the order of columns by size and nullability impacted performance. So, if all the NOT NULL fields were listed first, ordered from smallest to largest data size, followed by the NULL fields, also ordered by data size, it was faster to read because instead of having to jump back and forth to accommodate different data lengths and missing data along the way, it created a little more order to speed things along. Or something like that.

As an aside, I don’t know if how data is read from disk on this level is something we even worry about these days with the newer disk technologies. But this theory is why I am still in the habit of putting primary key and alternate key fields at the top of the list of columns on a table. (Anyone else remember this or is this one of those obscure things I incorrectly picked up somewhere?)
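For what it’s worth, the compromise looked roughly like the sketch below. The names and types are hypothetical since I don’t remember the real schema, but the idea was: NOT NULL columns first, smallest to largest, then the nullable columns ordered the same way.

    -- The compromise, roughly: one wide table, columns reordered by
    -- nullability and then data size (hypothetical names and types).
    CREATE TABLE dbo.CoreEntity
    (
        -- NOT NULL columns, smallest data size first
        StatusCode    tinyint  NOT NULL,
        TypeCode      smallint NOT NULL,
        CoreEntityID  int      NOT NULL CONSTRAINT PK_CoreEntity PRIMARY KEY,
        LastUpdated   datetime NOT NULL,
        -- nullable columns, also ordered by data size
        FeatureFlag   tinyint     NULL,
        FeatureValueA int         NULL,
        FeatureValueB varchar(50) NULL
        -- ... and so on for the other 80 or so columns ...
    );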

The logic seemed sound at the time. And it appealed to the programmers since it met their goal of one table. I was still pretty skeptical of this solution. So, I was asked to test the different solutions and figure out which option had the best performance.
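I no longer have the original test scripts, but the comparison boiled down to running a representative update workload against each candidate design and comparing the timing and I/O numbers, something along these lines (table names hypothetical):

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Option 1: the single reordered wide table
    UPDATE dbo.CoreEntity
    SET LastUpdated = GETDATE()
    WHERE StatusCode = 1;

    -- Option 2: the core table plus the sub-type table
    UPDATE dbo.CoreEntity_Split
    SET LastUpdated = GETDATE()
    WHERE StatusCode = 1;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;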

The good news: One of my solutions proved to have the better set of performance results.

The bad news: It was the one where everything was on the same table but reordered based on nullability and size.

[Picard facepalm]

I thought it was the wrong solution, even though I suggested it. Forget the past tense verb – I still think it was the wrong solution. But it was the only one I was allowed to go with. So it’s what we implemented – our most used and updated table ended up being one of the widest with the majority of fields being NULL. But it was nicely ordered.

*sigh*

Soon after this, we were hiring a database architect. One of my interview questions for the candidates was: “The programmers want to add 50 nullable columns to the table with the most reads and writes in the database. What do you do?”

And now, I’m off to find my dunce cap and sit in the corner for this one. Please be kind in your responses …


3 thoughts on “T-SQL Tuesday #88: The Daily DB WTF”

  1. Hi,
    Nice article, however I think the following statement doesn’t make any sense in SQL Server reality:

    “Racking my brain for some sort of compromise, I came up with an idea. We would have been on SQL Server 2000 at the time. I had read something about table design which said that because of the way data was written to and read from disk, the order of columns by size and nullability impacted performance. So, if all the NOT NULL fields were listed first, ordered from smallest to largest data size, followed by the NULL fields, also ordered by data size, it was faster to read because instead of having to jump back and forth to accommodate different data lengths and missing data along the way, it created a little more order to speed things along. Or something like that.”

    Reason is described here:
    http://aboutsqlserver.com/2013/10/15/sql-server-storage-engine-data-pages-and-data-rows/

    but also in books, like “SQL Server Internals”

    In short, SQL Server doesn’t take the order of columns into account that much. By default it first stores the fixed-length columns, then metadata about the variable-length columns, and then the variable-length column data.

    So fixed-length columns are always grouped and stored at the beginning of the row, and variable-length data is stored at the end.


    1. Hi Alexandr – Thanks for the link! That’s a great explanation of page storage.

      You’re right that my logic doesn’t make sense. I don’t know if there was something specific to the way things were stored and read in SQL Server 2000 or earlier that may have changed with SQL Server 2005 or later. I don’t usually work with database internals this way so I’m not 100% sure how it has worked or changed over the different versions. Of course, the other possibility is that I just misinterpreted the way that data storage works. I’m willing to admit that.

      I think the main point I was trying to make was that just recreating the table with the columns rearranged by fixed and varying data lengths, NULLs and NOT NULLs aside, should not have had the best performance and should not have been the best solution. Especially if I didn’t understand how to do it properly and set things up incorrectly. It probably means there was something else at play that we never looked into. As I said, I think I made matters worse because it worked. Although if we think about it, my applying incorrect logic almost makes it an even better DB WTF story to share.

      But thanks for your comment. I appreciate the feedback!

