Now that we understand a little more about how dynamic SQL works, let's see how it helped me solve the problem.
What is OPENROWSET?
OPENROWSET is a function that allows you to access data sources outside your current server. This could be reading from an Excel file or calling another SQL Server instance. You're able to treat that other data source as a record set, or derived table, and work with the rows returned as you would a local table. One reason you may want to do this is that you need a stored procedure to query data from other servers and bring the data together, effectively creating an ELT (Extract – Load – Transform) process without having to use SSIS or Azure Data Factory (ADF).
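To make that concrete, here's a minimal sketch of an ad hoc OPENROWSET call against another SQL Server instance. The server name, connection string, and table are all placeholders, and the 'Ad Hoc Distributed Queries' option has to be enabled on the instance before this will run.

-- Enable ad hoc distributed queries first (an advanced option, so
-- 'show advanced options' may need to be turned on before this):
-- EXEC sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE;

-- RemoteServer, SalesDB, and dbo.Customers are placeholder names.
SELECT rs.CustomerID, rs.CustomerName
FROM OPENROWSET(
    'MSOLEDBSQL',
    'Server=RemoteServer;Database=SalesDB;Trusted_Connection=yes;',
    'SELECT CustomerID, CustomerName FROM dbo.Customers'
) AS rs;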
I admit it – I do waaayyyy too much with dynamic SQL. But I just keep running into situations that require it. That being said, I ran into an interesting problem that had me puzzled. I found a bunch of different blog posts that pointed me in the right direction, but it took a little extra work to find the solution.
There are several concepts at play here, so I'll try to break this out so we can put the pieces together. The first one is centered around dynamic SQL. There are two parts of this I want to make sure we understand first – how it fits into sessions and how it gets executed.
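As a quick sketch of the execution side: the dynamic string runs in its own scope inside your session, so local variables from the outer batch aren't visible inside it, and values have to be passed in as parameters – typically through sp_executesql. (The table name here is a made-up value; sys.tables keeps the example self-contained.)

-- @TableName is a hypothetical value to search for.
DECLARE @TableName sysname = N'MyTable';
DECLARE @sql nvarchar(max) =
    N'SELECT name, object_id FROM sys.tables WHERE name = @Name;';

-- The outer variable is passed in as a parameter because the dynamic
-- SQL executes in a separate scope and can't see it directly.
EXEC sys.sp_executesql @sql, N'@Name sysname', @Name = @TableName;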
Happy T-SQL Tuesday! It's the first one of 2023, so in the spirit of starting the new year on the right foot, it feels wrong not to join in.
This month, our host is Raul Gonzalez (b | t). His challenge for us: are there cases where the commonly agreed upon "worst practices" are actually useful?
I feel like this is where I should say something like, "Hi, my name is Deborah and I've used nolock in production." I would also have to confess to doing things like using correlated subqueries, not using foreign keys, implementing complicated triggers, etc. I often talk about how the first real SQL script I wrote had cursors running over temp tables in SQL Server 6.5, which I'm fairly certain was one of the first things I read you were NOT supposed to do. And oh, hello there, denormalized table and dynamic SQL! I'm sure I've done more things than this too. These are just the ones I can remember doing, or at least am willing to admit in public.
I'm not sure if I can remember all the specifics, but what I do remember is that there was a reason for each one. More importantly, these were also exceptions to the rule.
Why did we use nolocks? Because we had spent more than two weeks doing a deep-dive investigation of the deadlock situations, and we were on an older version of SQL Server (or at least had the requirement to support a legacy version) where snapshot isolation wasn't an option. Using a combination of nolock, index hints, and the application developer adding logic to immediately retry the query if a deadlock happened mitigated the problem. We tested the solution thoroughly before sending it to production.
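The retry piece lived in the application, but a rough T-SQL sketch of the same idea looks like this (dbo.Orders and the predicate are made up):

DECLARE @tries int = 0;
WHILE @tries < 3
BEGIN
    BEGIN TRY
        -- The hinted query that was deadlocking; hypothetical table.
        SELECT OrderID, OrderDate
        FROM dbo.Orders WITH (NOLOCK)
        WHERE OrderID = 42;
        BREAK;  -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 1205 THROW;  -- only retry deadlock victims
        SET @tries += 1;
    END CATCH
END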
Why did I OK a correlated subquery? Because I knew the SELECT statement was only going to return one or two rows, and while it wasn't great, it shouldn't have been much of a problem. I made a note in the pull request that this should be looked at with the next change so we could keep track of the problem. There may have been other times where we used the STUFF + FOR XML string aggregation logic to take a bunch of related rows and return them in a single column value as part of a larger dataset.
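For anyone who hasn't seen that pattern, here's a rough sketch of the STUFF + FOR XML PATH trick, with made-up table and column names. (On SQL Server 2017 and later, STRING_AGG does the same job with less ceremony.)

-- Collapse each order's line items into a single comma-separated column.
-- dbo.Orders and dbo.OrderItems are hypothetical tables.
SELECT o.OrderID,
       STUFF((SELECT ', ' + oi.ProductName
              FROM dbo.OrderItems AS oi
              WHERE oi.OrderID = o.OrderID
              FOR XML PATH('')), 1, 2, '') AS ProductList  -- strip leading ', '
FROM dbo.Orders AS o;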
The cursor over the temp table? I blame a really, really bad database design that even I, a very new and junior DBA at the time, could tell was just a really, really bad design. And redesigning the database wasn’t an option. But I always come back to this code because it’s really how I learned to write SQL.
Sometimes the worse practices are there because people haven't run into the issues yet. Or you question whether it's a worse practice because you personally don't like it, or because it's an actual problem, or because it's only a problem under certain circumstances. I like CTEs, but I'm cautious with them because I've been burned by bad performance. Does that fall under "worse practice" or does that fall under "do extra testing first"? What about the MERGE syntax? I know people who use it and haven't run into an issue, but I've read about all the bugs with it, so it really feels like something we don't want to use. So is that a "worse" practice, or is it a "do extra testing first" or a "better safe than sorry so let's just avoid it" situation? But more importantly, is it a worse practice to use CTEs or MERGE because of the problems associated with certain use cases, or do we just make sure we use them when appropriate – with the proper testing, of course?
Here's the thing – there is a reason these things we consider bad practices are available in the first place. There are situations where they can be useful and help solve problems. I think what makes them bad practices is implementing them as a rule instead of as an exception: "This solved\prevented this one problem, so therefore it must solve\prevent all the problems." Another issue is when those practices are embedded in either legacy code or legacy work culture. It's harder to educate people on why you have to change to the best practices, especially if you run into that case where you really do need that "worst" practice. And when the worse practice is embedded in the code and culture to the point you can't change it later, you start to learn the true cost of Tech Debt the hard (and very expensive) way.
The other catch to all of this is that while there was a reason these "worse" practices were implemented at the time, we have to remember to keep track of them, so that when newer features and functionality come along that could be better solutions, we can use those to get rid of the things we shouldn't be doing. For example, introducing snapshot isolation where nolocks were used – since nolock hints and their dirty reads take precedence over the optimistic row versioning, the hints have to come out as part of the change.
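As a sketch of that cleanup (the database and table names are placeholders), flipping the database to read committed snapshot is one statement, but the old hints override it, so they have to be removed too:

-- Turn on optimistic row versioning for the default isolation level.
-- This needs a moment with no other active connections to take effect.
ALTER DATABASE SalesDB SET READ_COMMITTED_SNAPSHOT ON;

-- A leftover hint like this still forces a dirty read under versioning:
-- SELECT OrderID FROM dbo.Orders WITH (NOLOCK);
-- The cleaned-up query drops the hint and gets a versioned read instead:
SELECT OrderID FROM dbo.Orders;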
I guess it comes down to this: you have to test your code and be willing to defend your decision to use a "worse" practice to someone else. It is very much a calculated risk, so you have to make sure you understand what you're doing and why, and keep track of when you make these calls. More importantly, you have to make sure that you can pivot to fix the problem should you accidentally make the wrong choice. Good luck!
Thanks again to Raul for a great topic. Looking forward to reading all the “bad” things you have been doing in your databases.
It’s another T-SQL Tuesday! This month, Mala Mahadevan (t | b) is hosting. Her question for us this month is:
What are the T-SQL coding rules that are appropriate for where you work and what you do?
When I think of what my coding standards are, I tend to think of it as how I would review scripts for a pull request (PR). I think my past coworkers can attest that I can get quite picky when I look at code. So if I were your reviewer, what sort of things would I comment on?
Thanks to Rie Merritt (t) for this month’s topic. She’s asked us to write about various aspects about running a user group. You can find the official invitation from Rie here.
For those of you who don’t know me, I’m a board member of the NESQL User Group, the Microsoft Data Platform user group in the greater Boston, Massachusetts (USA) area. I mostly help out with communications – sending out the monthly tweets and posting reminders to our group on LinkedIn.
Essentially, I want to make sure you know about and remember to attend the great event that we’re hosting each month.
This month is hosted by Brent Ozar (t | b). Brent asked us to talk about our favorite or least favorite data type. You can check out the invitation here.
I never thought about whether I had a favorite data type. I mean, I’m a DBA so naturally, my favorite data type depends on the data.
But I think I'll "wax poetic" about datetimes and all of their varieties. When I first got started with SQL Server back in the SQL 6.5 days, we really just had datetime, which was the date and time together. If you needed just the time, you ignored the date portion. If you needed just the date, you put midnight (00:00:00.000) for the time. And when you dealt with the time portion, you dealt with values rounding to increments of .000, .003, or .007 seconds.
Now, we have all sorts of options – datetime, smalldatetime, datetime2, datetimeoffset, date, and time. The problem is when to use each of these.
The easiest ones are date and time. Now, if you just need the date portion, you just use the date data type, and you don't have to deal with midnight or ignore the time portion of a datetime. If you just need the time, again, you don't have to use some random substitute date or ignore the date portion, because you can use the time data type. Date can hold any calendar date from January 1, 0001 through December 31, 9999, and time can hold any time from 00:00:00.0000000 through 23:59:59.9999999.
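A tiny sketch of the two in action:

-- date and time split apart what datetime used to bundle together.
DECLARE @today date    = SYSDATETIME();  -- keeps only the date portion
DECLARE @now   time(7) = SYSDATETIME();  -- keeps only the time portion
SELECT @today AS JustTheDate, @now AS JustTheTime;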
It starts to get tricky from here. If you have to deal with multiple time zones, datetimeoffset is going to be your best bet. But then you're having to deal with time zones, and we know how well we deal with those… or rather, don't deal with them. *Checks calendar and sees Daylight Saving Time coming to the US in a few days…*
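If you do take the time zone plunge, datetimeoffset pairs with AT TIME ZONE (available since SQL Server 2016), which at least applies the daylight saving rules for you:

-- Capture the current instant with its offset, then view it elsewhere.
DECLARE @here datetimeoffset = SYSDATETIMEOFFSET();
SELECT @here                                      AS LocalWithOffset,
       @here AT TIME ZONE 'Eastern Standard Time' AS EasternTime,
       @here AT TIME ZONE 'UTC'                   AS UniversalTime;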
So now we're left with smalldatetime, datetime, and datetime2. If you're dealing with temporal tables, SQL Server says you have to use datetime2 for the system start and end times. Otherwise, datetime2 is really a combination of date (all dates) and time (all times, out to 100 nanoseconds). Smalldatetime can hold any date from January 1, 1900 through June 6, 2079 (really – June 6??), with the time portion accurate to the minute. Datetime can hold any date from January 1, 1753 through December 31, 9999, with times accurate to roughly 3 milliseconds.
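You can see those precision differences by casting one moment to each type – a minimal sketch:

DECLARE @moment datetime2(7) = '2023-03-14 15:09:26.5358979';
SELECT CAST(@moment AS smalldatetime) AS ToTheMinute,  -- 15:09:00
       CAST(@moment AS datetime)      AS ToThreeMs,    -- 15:09:26.537
       @moment                        AS ToHundredNanoseconds;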
The fun really starts when you realize that what SQL Server stores internally isn't what we see. We just see a formatted date and time, based on the system default or whatever format we've specified somewhere. Datetime and smalldatetime can be converted to integers, and it looks like all of these types can be converted to varbinary. What's even cooler is you can start to see how these different data types are related, regardless of the formatting that gets pulled back.
(Screenshot: a query setting the different date types to the current datetime, and the results of that query.)
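Here's a hedged reconstruction of the kind of query in that screenshot – not the original, but the same idea of casting one moment to each type and peeking at the internal bytes:

-- Not the original query from the post; a reconstruction of the idea.
DECLARE @dt  datetime      = GETDATE();
DECLARE @sdt smalldatetime = GETDATE();
DECLARE @dt2 datetime2(7)  = SYSDATETIME();

SELECT CAST(@dt  AS int)           AS DaysSince1900,      -- datetime -> int
       CAST(@dt  AS varbinary(8))  AS DatetimeBytes,      -- days + clock ticks
       CAST(@sdt AS varbinary(8))  AS SmallDatetimeBytes, -- days + minutes
       CAST(@dt2 AS varbinary(10)) AS Datetime2Bytes;     -- time, then date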
Which date type to choose really depends on the dates and times you need to store and how exact you need to be. You need to understand your data and what you're really trying to capture. The problem, of course, is when you're stuck with legacy systems and can't replace that datetime column with a date, even when a date is truly all you need.
But if I had to choose, what’s my favorite date? Well, I’ll just leave you with this:
Thanks for the great topic, Brent! Can’t wait to read what data types everyone else likes or dislikes…
It’s another T-SQL Tuesday! Thanks to Mikey Bronowski (t|b) for hosting. Our challenge:
I would like you to write about the tools that help you at work, those that helped you the most or were the most effective.
Whenever there is talk of switching tools, I always hear someone say, "once we are on <insert tool>, we won't have X problem anymore." This statement is one of my pet peeves. It's not the tool that solves the problem; it's making sure we implement and use the tool properly to help solve the problem. The tools themselves are just that – tools. If we don't understand the problem, then we're just shifting the problem to another tool.
There are so many tools that get implemented at work these days. While I use them to varying degrees, I'm not always using these tools on a daily basis. And in many cases, I'm not the person responsible for setting them up. I'm a SQL developer, so my daily toolset is much smaller.
Honestly, the vast majority of my time is split between Management Studio (SSMS) and Azure Data Studio. I'm pretty simple\straightforward this way. I started playing a lot more with Azure Data Studio over the past year, but I find I'm not able to make the switch to using it full time. It really depends on the task I need to do.
So what tasks do I do often and which tool do I use?
2020 ended up being a year for the record books and 2021 is already making its mark. So I appreciate the topic that our host, James McGillivray (b|t), has given us.
Breaks are critical for our mental health. Write a post about relaxation techniques, dream destinations, vacation plans or anything else relating to taking a break for your own mental health.