Now that we have a Person table to answer the first half of the first question on the questionnaire, it’s time to look at the second half: what is your current post office address.
Again, it’s something that seems something fairly simple. So let’s start with the basic of what the data we’re trying to find. For me, this means we need to create address or location tables. Addresses are fairly common things to model, which means there are a lot of ways you can do this. This also means that it’s easy to overthink or even over-design. So to help prevent that, let’s define what are our requirements for these tables.
- We’ll need to store the state.
- We’ll need to capture either the county or city.
- I don’t know if we will get street information.
- We won’t get zip codes since those weren’t introduced until the 1960s. (Fun Fact!)
- We may need to store countries if any addresses in our survey are outside of the US.
- Taking a look through all of the questions, we may have multiple types of address information for a person throughout the survey. For example, this is the current location but we may have a location where they enlisted, were born, grew up, etc.
- After browsing the questionnaires, we may need to capture changes to locations as part of future questions. For example, West Virginia broke from Virginia and became its own state during the Civil War. One of the survey examples notes that the county name changed.
So how do we design our tables to reflect these requirements?
Most of the requirements listed above go together. To include the items except #7, I’ve come up with this:
I’m not calling it an Address table but a Location table because I think there are other cases for this table where using location may be more appropriate. Note that for locations, I’m making the alternate key the person and the location type. In many situations, this may not work because there could be several location types for each person. Right now, I’m going to start with the criteria that there’s only going to be one location for each type. As this is just a model right now, I can change it later as we take a closer look at other questions.
The last piece for this is how to handle changes for things such as county names or city locations. Again this is another place where there are a lot of options. In figuring out which way to go, I decided I wanted to go with a simple solution that focused on the information that I cared about. I don’t care about tracking that West Virginia was once part of Virginia but I do care about knowing that the Wheeling, Virginia before the war is now Wheeling, West Virginia and I do care that parts of Grainger County, Tennessee became Hamblen County.
I thought about just using the Location table as a way to keep track of city\state or county\state changes but we could also be dealing with cities or counties that are renamed. I think this would be a cleaner way to create that connection.
With that, the first question is modeled. So to double check that I’m on the right track, can I look at this model and see if it can hold the data to answer the question:
I think the answer right now is yes.
[If you want to see the thought process behind modeling first part of this question (State your full name), check out the post here.]