Posted in Database Development, Source Control, T-SQL Tuesday

T-SQL Tuesday #152 – My rant

Welcome to T-SQL Tuesday! I’m your host this month…

You can find the full invitation here. But to summarize, I’m asking everyone to share that rant they have with their coworkers, the one that makes them throw “It Depends…” out the window.

When I said this was inspired by conversations with coworkers, I wasn’t lying. I realized that I would finish each “rant” with a “I really should write a blog about this”. (And yes, I may have a new session I want to put together that includes this.) I don’t know if this is a rant but as a philosophy I’ve developed. In either case, here’s that blog post:

My coworkers and I were talking about database deployments. The vast majority of my career has been centered around making sure that whatever we develop can be deployed to clients cleanly and repeatably (if that’s a word). I‘ve only worked for software companies so in most of these places, our clients maintained their databases on their own systems or we managed their instances for them. We have multiple versions of the software out in production at any time. One product I worked on was supported on 3 different RDBMS systems. So we had to make sure we have a reliable to make sure that the database changes deployed successfully every single time. I had to make sure that whether we were installing the database at a new client or whether we were upgrading an existing client, the database at the end are the same and have everything that they should have.

That last sentence is key: At any point and time, you should be able to create a brand new version of your database with the basic data set needed to get started. And you should be able to deploy changes to any existing database to get to that same version of your database. How do you do this? You start with a “source of truth”. <begin rant>

The “source of truth” is my newly made up phrase for whatever you are using to say this is my database schema and initial data needed to start up the application. This can be your script directory; this can be a dacpac or bacpac; this can be your data model; this can be a combination of these things. My go-to “source of truth” right now is my source control repository. I’ve got both the schema and the default data needed in the same location. In the past, I would have probably included the data model as way to help me make sure whatever database table changes I have in my source control are there, especially for that one database which only had tables and views. (A different rant for a different time.) Whatever you use, it absolutely CANNOT be an actual database. There are two main reasons for this:

First, an actual database is being used for multiple different purposes whether it’s for production or testing and is actively being modified so it no longer is a reliable source. Because you can’t trust what work is being done at any point and time. If you take a production database, you have production data that you will need to account for and be prepared to clean out, if you even have a good way of know what can be deleted, what the client can and has changed and what you must have. And as much as you like to think that no one is making changes to a production database with you knowing, you know that’s just not the case. You need to be identify what those changes are and determine if and how they need to be backported and made standard for all databases. That gets back to the Catch-22 of how do you know that a database change is standard if you don’t have a way to know what is standard? You can’t trust a QA or Release environment either because there may still be ongoing changes to make sure that the database schema is correct.

(Ask me about the time that the consultant took the QA test database, cleaned out the data, and restored to client production environments only for me to discover this when installers failed in production because he grabbed a backup that had objects that only existed for one day since I completely redesigned them the following day…)

Second, you should have your source of truth checked into source control. This is not just so you can keep track of the changes you’re making now. You will also need to have multiple versions of your “source of truth” at any time. If your clients are all on the same production version, consider yourself really lucky. But what if you need to look at the version before the one that’s currently in production? Or the one before that? I’m used to multiple versions being out in production at the same time so I had to be able to find out what’s in those versions as well. I’ve had to go back to older versions of the database schema in order to troubleshoot a problem because the client was on an older version than the one I was testing in development. This is where source control can help you keep track of what was released and when.

The implication of this is that you will end up having multiple “sources of truth” – one for the different versions for production releases and the version(s) you need as you go through the development lifecycle, such as QA and Release. Why multiple production versions? You need to know what has gone out into production at any point. But the QA and Release versions can help ensure that you know what’s being tested. Those “source of truths” are changing but that’s to be expected. You can still use them to make sure you know what is supposed to be in QA and Release and make adjustments if something is missing or was added before it should have been.

But here’s the real thing about having a “source of truth” – there are multiple people who are likely working on any database at any given point. There needs to be a reliable way to look at source control and find what the definition of the object you need. If I want to see what a table looks like, I should have a standard place for me to go to find it wherever I need to go in source control to find it. If a stored procedure is different in two different databases, I should know where I can find out what that procedure is supposed to look like to help me understand why they are different. If you are on a newer team – maybe the development teams got reshuffled or you just joined a new company, you may not have the institutional knowledge and history of the databases so you can’t guarantee what you’re working with is correct and more importantly, what’s going out into production is correct. This is what a “source of truth” is for. This creates that shared knowledge that you can trust.

My career has been about making sure that we can always do a clean install of the database and upgrade the database to the given schema version, regardless of where we started. I constantly did compares between new installs and upgrades – between the databases and between the databases and the data modeling tool. I made sure that everyone knew how to work the upgrades and was responsible for teaching developers and helping them as the process changed. I could tell you where to find the object in source control and you could be sure that it was the version. There are times where I felt like my job was “SQL Script Cop” because I spent so much time making sure that the scripts we were using and testing were correct. I couldn’t have done any of this if I didn’t create and maintain a reliable “source of truth”. This wasn’t just about making sure that I had a “source of truth”, it was taking ownership and responsibility for the database. And the “source of truth” could go through QA cycles to make sure everything was valid.

When we talk about the importance of data to our applications, it’s this level of ownership that makes our products better for our customers. It streamlines upgrades, and downtime for upgrades, and makes things more reliable. It makes our processes repeatable because we know what to expect so we can better test them to make sure everything works the way we expect it. Every process that we do regarding the database will be built around this concept of a “source of truth”.

Whatever you deem as your “source of truth”, it must be something that everyone knows and is committed to maintain. This has to be built into the development cycle. Trying to create a “source of truth” for a database that has been around for a while is a monumental task to make sure you get correct and you need heavy buy-in and support from the top.

Let’s be honest – this is not an easy thing to create. It’s an even bigger pain to maintain, especially if you don’t have the same level of support from those around you. But to me, it’s a central and essential place to start if we are going to make sure we handle database development and deployment properly. After all, you have to know what your database looks like. Because in the end, you need to be able to create a new database or upgrade any existing database to a given version at any time – consistently and reliably.

</end rant>

Thank you for coming to my TED Talk. Did I answer your question? Wait, what was the question again?

I am looking forward to reading all your posts!

Advertisement