In a previous post, I discussed the challenges of collecting data in one or more spreadsheets. A defining characteristic of a spreadsheet in this context is usually that it is a single table containing information about many types of objects (people, places, works, events for example). Another characteristic is that relationships between these objects are defined by being on a single line in the spreadsheet. So to say person A is an associate of person B, his name will need to be in a column named Associate (maybe Associate1 or 2 etc.) where person B’s name is in an another column such as Author on that same row. If person A is also person B’s sibling his name will need to be in an additional column. Because the only way to connect between objects is to put them on the same row, the work’s name will also need to be on the Author’s table row, as seen below.
Author | Work | Associate1 | Associate2 | Sibling1 | Sibling2 |
---|---|---|---|---|---|
PersonB | Book Name | PersonA | PersonC | PersonA |
Relational data modeling uses an approach called reaching normal form. A good description of the normal form rules can be found here for example. I like to think of somewhat simpler rules:
A one-to-many relationship is demonstrated in the previous simplified diagram where we show tables for people and for works. The author of a work is indicated by the AuthorId column in the works table, which will have the Id of someone from the people table, not their name. The AuthorId column is an FK on the Works table. This is a one-to-many relationship because a work has a single author (in this scenario) but a person can be the author of many works. The authors’ name is only included once in the database (first rule) although they may have authored many works. People and works are in separate tables (second rule), and the PK of the People table is independent of additional information (fourth rule).
This diagram shows a many-to-many relationship between the works and people tables, implementing the possibility of having an author and collaborators on a certain work, as was indicated in the spreadsheet above. The PeopleAndWorks table is a junction table in which there will be a row for each relationship between a work and a person. The type of the relationship, be it author or collaborator, is indicated with an FK to an additional table, the RelationshipTypes table. This fourth table makes it easy to add a new relationship type, Editor for example, a change that in the spreadsheet would require the addition of a column.
In the spreadsheet example above, another type of a relationship is included, that of a sibling. This is a case of a many-to-many relationship between a table (People) and itself. Designing a junction table to capture this is left to the reader…
This post is an extremely abbreviated description of relational data modeling principles. Further reading is recommended and I can try and address specific questions in the comments.