I have a newbie question about a database I am trying to create. I have a list of publications, with this general order:
UID, Author, URL, Title, Publication, start page, end page, volume, year
Then I realized that a publication can have multiple authors, so I began trying to normalize the database for that. Then I realized that the order of the authors is important, and that a journal article can have anywhere from one author to dozens, or possibly even more.
Should I just create a table with a fixed number of nullable author columns (like 12 or something)? Or is there a way to have a variable number of columns depending on the number of authors?
Database model
You basically need a many-to-many relationship between Authors and Publications, since one author can write many publications, and one publication can be written by more than one author.
This requires you to have 3 tables.
Author - general info about every author (without publications_id)
Publication - general info about every publication (without author_id)
AuthorPublication - columns author_id and publication_id that are references to tables Author and Publication.
This way you're not binding a specific author to a publication, but you can have more of them, and the same thing the other way around.
Additional notes
If you would like to distinguish each author's role in a particular publication, you could also add a column like id_role that references a dictionary table listing all possible roles for an author. This way you could distinguish between lead authors, co-authors, etc. You could also store information about the people who translated the book, but then you should perhaps rename Author to something less specific.
Order of appearance
You can ensure a proper ordering of your authors by adding a column in AuthorPublication which you would increment separately for every Publication. This way you would be able to preserve the ordering as you need it.
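As a rough illustration of the three-table design with an ordering column, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (author_publication, position) and the sample rows are assumptions made for the example, not something fixed by the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE publication (
    publication_id INTEGER PRIMARY KEY,
    title          TEXT NOT NULL
);
-- Junction table: one row per (publication, author) pair.
CREATE TABLE author_publication (
    publication_id INTEGER NOT NULL REFERENCES publication(publication_id),
    author_id      INTEGER NOT NULL REFERENCES author(author_id),
    position       INTEGER NOT NULL,  -- 1 = first author, 2 = second, ...
    PRIMARY KEY (publication_id, author_id),
    UNIQUE (publication_id, position)  -- no two authors share a slot
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Ada"), (2, "Bob"), (3, "Cy")])
conn.execute("INSERT INTO publication VALUES (1, 'Paper A')")
conn.executemany("INSERT INTO author_publication VALUES (?, ?, ?)",
                 [(1, 3, 1), (1, 1, 2), (1, 2, 3)])


def authors_in_order(conn, publication_id):
    """Return the author names of one publication, first author first."""
    rows = conn.execute(
        """SELECT a.name
           FROM author_publication ap
           JOIN author a ON a.author_id = ap.author_id
           WHERE ap.publication_id = ?
           ORDER BY ap.position""",
        (publication_id,),
    ).fetchall()
    return [name for (name,) in rows]


ordered = authors_in_order(conn, 1)
print(ordered)
```

The number of authors per publication is then just the number of rows in the junction table, so no publication needs spare null columns.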
You have a many-to-many relationship between the Publication entity and the Author entity.
A publication can have many authors, and an author can have many publications.
So, you should create table for this relationship. For example table Authors_Publications with columns: UID, author_id, publication_id, order
You should create an author table that has a many-to-many relationship with the publication table.
An author has some information of its own,
and a publication also has its own information,
so you should have separate tables like author and publication.
Each has a primary key, like author_id and publication_id,
and the many-to-many relationship is expressed through these two keys.
Actually your scenario is even more complicated.
A publication can have more than one author. An author can write more than one published article or book. That is a Many-to-Many relationship.
We always(*) represent a many-to-many with a third table, sometimes called a bridge table. This third table, authorship_, is a child table with at least two columns, both foreign keys holding the primary key from each of its parent tables, the pub_ and author_ tables. We transform the many-to-many into a pair of one-to-many relationships.
By the way, this books-author scenario is the canonical example used when teaching relational database design.
You can have additional fields on this third table. In your case, we need a priority_ column of an integer type to sort the list of primary vs secondary authors.
Each author’s compensation fee or royalty would be additional columns on this bridge table. If you were tracking each author needing to sign a contract for their work on that publication, the authorship_ table would have a date, date-time, or boolean column contract_signed_. So you can see that the bridge table represents anything to do with one particular author’s involvement on one particular publication.
(*) Not merely an opinion or suggestion. Relational database design is proven by entire books filled with mathematical proofs. This includes the need to break up a many-to-many with a third table. Relational database design is the only case of true information engineering backed by mathematical description and proofs. Search for relation (a concept in mathematics), and for E.F. Codd and Chris Date, to learn more.
Related
I'm creating a database for personnel records and trying to ease record creation for the user and avoid a kludgy solution. The tables are:
people:
people_id,
person_name,
person_category_id
person_category:
person_category_id,
person type
document_requirement:
document_requirement_id,
document_requirement_name,
person_category_id,
document_section_id
document_section:
document_section_id,
document_section
I've created an append query (inner join) that populates a table called document_repository which contains all of the required documents for all of the people. (I use a primary key composed of people_id & document_id to avoid duplicates when the append query runs.) Here is the document_repository table.
document_repository:
document_repository_id,
people_id,
person_category_id,
document_id,
document_section_id,
document_attachment
I'd like to allow the user to create a document requirement that is applicable to multiple person categories. I understand I should avoid multivalued fields, which don't work with inner joins anyway. For example, if the person categories include doctors and nurses, I'd like to be able to create a new document requirement that applies to both categories, without having to create two separate document requirements.
More information needed?
Suggestions on design changes and/or queries?
Thanks!
snapshot of tables and relationships
What you describe is a many to many relationship. Each document requirement can be applicable to multiple person categories and different document requirements can be applicable to the same person category.
To have a many to many relationship between two entities (tables) in your database, you need another table to relate them. This additional table contains the primary key of both tables and each record in this table represents a link between the two entities.
Your naming is different between your text and your diagram, but I'll assume you want to have document_requirement records that can link to zero or more person_category records.
You need a table which for example could be called document_requirement_person_category and contains the following fields:
document_requirement_id - foreign key referencing PK of document_requirement
person_category_id - foreign key referencing PK of person_category
You then add a record to this link table for each person category that relates to each document requirement.
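A minimal sketch of that link table, using Python's built-in sqlite3 module. It assumes person_category_id is moved out of document_requirement and into the link table, keeps only the columns needed for the illustration, and uses made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_category (
    person_category_id INTEGER PRIMARY KEY,
    person_type        TEXT NOT NULL
);
CREATE TABLE people (
    people_id          INTEGER PRIMARY KEY,
    person_name        TEXT NOT NULL,
    person_category_id INTEGER REFERENCES person_category(person_category_id)
);
CREATE TABLE document_requirement (
    document_requirement_id   INTEGER PRIMARY KEY,
    document_requirement_name TEXT NOT NULL
);
-- Link table: one row per (requirement, category) pair.
CREATE TABLE document_requirement_person_category (
    document_requirement_id INTEGER NOT NULL
        REFERENCES document_requirement(document_requirement_id),
    person_category_id      INTEGER NOT NULL
        REFERENCES person_category(person_category_id),
    PRIMARY KEY (document_requirement_id, person_category_id)
);

-- Hypothetical sample data: one requirement shared by doctors and nurses.
INSERT INTO person_category VALUES (1, 'doctor'), (2, 'nurse'), (3, 'clerk');
INSERT INTO people VALUES (1, 'Dana', 1), (2, 'Nick', 2), (3, 'Cleo', 3);
INSERT INTO document_requirement VALUES (10, 'Medical licence');
INSERT INTO document_requirement_person_category VALUES (10, 1), (10, 2);
""")

# Every (person, required document) pair, derived through the link table.
required = conn.execute("""
    SELECT p.person_name, dr.document_requirement_name
    FROM people p
    JOIN document_requirement_person_category link
      ON link.person_category_id = p.person_category_id
    JOIN document_requirement dr
      ON dr.document_requirement_id = link.document_requirement_id
    ORDER BY p.person_name
""").fetchall()
print(required)
```

The requirement is stored once, and the inner joins still work because each link row holds a single category.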
Edit: BTW, (if I'm reading your schema correctly), you already have a many to many relationship in your schema: document_repository allows a relationship between multiple people and a document requirement as well as multiple document requirements and a person. That's a many to many relationship.
I'm designing a simple db (still learning)
Here's the entities
User, Topic, Article
User to Topic is many-to-many (a user can be interested in many topics)
Topic to User is many-to-many (a topic can interest many users)
User to Blog is one-to-many (a user can write many blogs)
Blog to User is one-to-one (a blog can be authored by only one user)
here's the question:
Shall I still make a one to many relationship between Topic and Blogs?
For instance, if I want to find all the latest blogs for certain topics, one way is to find all the users, and find all the blogs for those users, rank by time.
Another way is, if we keep redundancy by having a Topic to Blog relationship (one to many), then we can get all the blogs from the topic directly, then sort by time.
I'm kind of confused: shall I have this redundancy? What's the best practice here, in terms of ease of programming for backend programmers and query efficiency? (Estimated sizes: 100k users, 200k blogs, 20 topics.)
Thanks a lot!
For many-to-many relationships you should use an associative entity, which means adding a bridge table to store the relations; for one-to-many relationships, simply store the relation key as a foreign key.
That said, your schema could look like:
Users
Topic
Users_Topic (where primary key would consist of columns identifying both Users and Topic)
Article (where it has a foreign key to table Users through a column eg. user_id)
Regarding the relation between Article and Topic, there's no need for redundancy by storing it in a separate table; you could include a foreign key to Topic within Article. If you are worried about the cost of querying the Article table, which I don't think should be your concern right now, you can move the article content, which is what takes up most of the space, into a separate table called Article_Content whose primary key is also a foreign key to Article.
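To make the suggested schema concrete, here is a small sketch using Python's built-in sqlite3 module. The column names, the created_at column, the index, and the sample rows are assumptions added for illustration; the point is only that a topic_id foreign key on the article table lets "latest articles for a topic" be answered without going through users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE topic (topic_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Bridge table for the many-to-many "user is interested in topic".
CREATE TABLE users_topic (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    topic_id INTEGER NOT NULL REFERENCES topic(topic_id),
    PRIMARY KEY (user_id, topic_id)
);
-- One-to-many relations are plain foreign keys on the "many" side.
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    topic_id   INTEGER NOT NULL REFERENCES topic(topic_id),
    title      TEXT NOT NULL,
    created_at TEXT NOT NULL  -- ISO date, so text order is time order
);
-- Assumed index to support "latest articles per topic".
CREATE INDEX article_topic_time ON article (topic_id, created_at);

INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
INSERT INTO topic VALUES (1, 'sql'), (2, 'swift');
INSERT INTO article VALUES
    (1, 1, 1, 'Joins',     '2024-01-01'),
    (2, 2, 1, 'Indexes',   '2024-03-01'),
    (3, 1, 2, 'Optionals', '2024-02-01'),
    (4, 2, 1, 'Views',     '2024-02-15');
""")

# Latest two articles for topic 1, newest first.
latest = [title for (title,) in conn.execute(
    """SELECT title FROM article
       WHERE topic_id = ?
       ORDER BY created_at DESC
       LIMIT 2""",
    (1,),
)]
print(latest)
```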
I'm porting a MySQL database to Core Data for a Mac OS app. I have two many to many tables in my database. In addition to containing the foreign keys, there are a few data columns. Is it possible to add attributes to a many to many relationship in Core Data? It doesn't look like it to me. My fallback is to replicate the linkage table in Core Data. Are there any problems doing this?
An example:
A record has one or more artists performing on it.
An artist performs on zero or more records.
The linkage table row contains a foreign key for the record, a foreign key for the artist, the instruments the player performed with, and a notes column that adds additional information such as which track the artist performed on.
You are correct: relationships themselves cannot have attributes. And you are on the right track in modelling the linking table as an intermediate entity. This approach is alluded to in the CoreData Programming Guide section on "Modelling a relationship based on its semantics". In their case, they model a (reflexive) many-many relationship from Person to Person using an intermediate FriendsInfo entity with a ranking attribute.
In your example, you might have a Record entity, an Artist entity, and an intermediate Appearance entity. The Appearance entity would have attributes for Instruments and Notes, and (to-one) relationships to Record and Artist (each with a to-many inverse).
The slight downside is that you have to create the Appearance object in order to link a Record object and an Artist object, rather than just adding them to the relevant relationship. You will also have to watch for uniqueness of the combination of Record/Artist, if that's important to you: by default there could be many Appearances for the same Record and Artist.
I was studying relationships in RDBMSs. I have understood the basic concept behind mapping relationships, but I am not able to spot them.
The three possibilities:
one-to-many (most common) requires a PK-FK relationship. Two tables involved.
many-to-many (less common) requires a junction table. Three tables involved.
one-to-one (very rare). One table involved.
When I begin a project, I am not able to tell the first two cases apart, and they are not clear in my head.
Examples help for a brief moment while I study, but not when I need to put these principles into practice.
This is the place where most beginners falter.
How can I spot these relationships? Is there a simpler way?
Don't look at relationships from a technical perspective. Use analogies and real-life examples when trying to envision relationships in your head.
For example, let's say we have a library database.
A library must have books.
M:M
Each Book may have been written by multiple Authors and each Author may have written multiple Books. Thus it is a many-to-many relationship, which translates into 3 tables in the database.
1:M
Each Book must also have a Publisher, but a Book may only have one Publisher and a Publisher can publish many Books. Thus it is a one-to-many relationship, and it is implemented by having the Books table reference PublisherId.
A simple analogy like this one explains relationships to their core. When you try to look at them through a technical lens you're only making it harder on yourself. What's actually difficult is applying real world data scenarios when constructing your database.
I think the reason you are not getting the answers that you need is because of the way you are framing the question. Instead of asking “How do I spot the correct type of relationship between entities”, think about “How do my functional needs dictate what relationship to implement”. Database design doesn’t drive the function; it’s the functional needs that drive the relationships you need to implement.
When designing a database structure, you need to identify all the entities. Entities are all the facts that you want to store: lists of things like book titles, invoices, countries, dog species, etc. Then to identify your relationships, you have to consider the types of questions you will want to ask your database. It takes a bit of forward thinking sometimes… just because nobody is asking the question now doesn’t mean that it might not ever be asked. So you can’t ask the universe “what is the relationship between these lists of facts?” because there is no definitive answer. You define the universe… I only want to know answers to these types of questions; therefore I need to use this type of relationship.
Let’s examine an example relation between two common entities: a table of customers and a table of store locations. There is no “correct” way to relate these entities without first defining what you need to know about them. Let’s say you work for a retailer and you want to give a customer a default store designation so they can see products on the website that their local store has in stock. This only requires a one-to-many relationship between a store and the customer. Designing the relationship this way ensures that one store can have many customers as their default and each customer can only have one default store. To implement this relationship is as easy as adding a DefaultStore field to your Customer table as a foreign key that links to the primary key of the Store table.
The same two entities above might have different requirements for the relationship definition in a different context. Let’s say that I need to give the customer the opportunity to select a list of favorite stores so that they can query in-stock information for all of them at once. This requires a many-to-many relationship because you want one customer to be able to relate to many stores and each store can also relate to many customers. Implementing a many-to-many relationship requires a little more overhead because you will have to create a separate table to define the relationship links, but you get this additional functionality. You might call your relationship table something like CustomerStoreFavorites, and its primary key would be the combined primary keys of the two entities: (CustomerID, StoreID). You could also add attributes to the relationship, like possibly a LastOrderDate field to specify the last date that the customer ordered something from a particular store.
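Both variants can live side by side. The sketch below, using Python's built-in sqlite3 module, shows a default-store foreign key on the customer table and a favorites table whose composite primary key rejects a duplicate (customer, store) pair. The snake_case names and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (store_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    -- One-to-many: each customer has at most one default store.
    default_store INTEGER REFERENCES store(store_id)
);
-- Many-to-many: one row per (customer, favorite store) pair.
CREATE TABLE customer_store_favorites (
    customer_id     INTEGER NOT NULL REFERENCES customer(customer_id),
    store_id        INTEGER NOT NULL REFERENCES store(store_id),
    last_order_date TEXT,  -- attribute of the relationship itself
    PRIMARY KEY (customer_id, store_id)
);

INSERT INTO store VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO customer VALUES (1, 'Kim', 1);
INSERT INTO customer_store_favorites VALUES (1, 1, '2024-05-01'), (1, 2, NULL);
""")

# The composite primary key refuses a second row for the same pair.
try:
    conn.execute("INSERT INTO customer_store_favorites VALUES (1, 2, NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

favorites = [name for (name,) in conn.execute(
    """SELECT s.name
       FROM customer_store_favorites f
       JOIN store s ON s.store_id = f.store_id
       WHERE f.customer_id = 1
       ORDER BY s.name"""
)]
print(duplicate_rejected, favorites)
```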
You could technically define both types of relationships for the same two entities. As an example: maybe you need to give the customer the option to select a default store, but you also need to be able to record the last date that a customer ordered something from a particular store. You could implement the DefaultStore field on the Customer table with the foreign key to the Store table and also create a relationship table to track all the stores that a customer has ordered from.
If you had some weird situation where every customer had their own store, then you wouldn’t even need to create two tables for your entities because you can fit all the attributes for both the customer and the store into one table.
In short, the way you determine which type of relationship to implement is to ask yourself what questions you will need to ask the database. The way you design it will restrict the relational data you can collect as well as the queries you can ask. If I design a one-to-many relationship from the store to the customer, I won’t be able to ask questions about all the stores that each customer has ordered from unless I can get to that information through other relationships. For example, I could create an entity called "purchases" which has a one-to-many relationship to the customer and store. If each purchase is defined to relate to one customer and one store, now I can query “what stores has this customer ordered from?” In fact with this structure I am able to capture and report on a much richer source of information about all of the customer's purchases at any store. So you also need to consider the context of all the other relationships in your database to decide which relationship to implement between two particular entities.
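The "purchases" idea can be sketched with Python's built-in sqlite3 module. The purchase table, its columns, and the sample rows are assumptions for illustration; each purchase row points at exactly one customer and one store, and the question "what stores has this customer ordered from?" becomes a join plus a GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE store    (store_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Each purchase relates to exactly one customer and one store.
CREATE TABLE purchase (
    purchase_id  INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    store_id     INTEGER NOT NULL REFERENCES store(store_id),
    purchased_on TEXT NOT NULL  -- ISO date
);

INSERT INTO customer VALUES (1, 'Kim'), (2, 'Lee');
INSERT INTO store VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO purchase VALUES
    (1, 1, 1, '2024-01-05'),
    (2, 1, 2, '2024-02-10'),
    (3, 1, 1, '2024-03-15'),
    (4, 2, 2, '2024-01-20');
""")

# Stores customer 1 has ordered from, with order count and last order date.
stores_for_customer = conn.execute(
    """SELECT s.name, COUNT(*) AS orders, MAX(p.purchased_on) AS last_order
       FROM purchase p
       JOIN store s ON s.store_id = p.store_id
       WHERE p.customer_id = ?
       GROUP BY s.store_id, s.name
       ORDER BY s.name""",
    (1,),
).fetchall()
print(stores_for_customer)
```

In this sketch the last order date is derived from the purchase rows rather than stored as a separate attribute.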
There is no magic formula, so it just takes practice, experience, and a little creativity. ER Diagrams are a great way to get your design out of your head and onto paper so that you can analyze your design and ensure that you can get the right types of questions answered. There are also a lot of books and resources to learn about database architecture. One good book I learned a lot from was “Database System Concepts” by Abraham Silberschatz and Henry Korth.
Say you have two tables A and B. Consider an entry from A and think of how many entries from B it could possibly be related with at most: only one, or more? Then consider an entry from B and think of how many entries in A it could be related with.
Some examples:
Table A: Mothers, Table B: Children. Each child has only one mother but a mother may have one or more children. Mothers and Children have a one-to-many relationship.
Table A: Doctors, Table B: Patients. Each patient may be visiting one or more doctors and each doctor treats one or more patients. So they have a many-to-many relationship.
An example of one to one:
LicencePlate to Vehicle. One licence plate belongs to one vehicle and one vehicle has one licence plate.
I am creating a database for a university dept(for internal use) and this database tracks issues related to people. I get information of only employees at the university and I am tracking them using their university ID. But the database is also intended for people who are not employees at the university or even sometimes people outside the university. I want to assign an ID to these people but store values within same column as university id. Any ideas how I should tackle this issue? I don't know how to keep the univ id and the no. I am going to give to the others in the same column and yet treat them differently (when needed). How do people usually tackle such issues?
PS: I do not like auto numbers since I cannot delete an ID and get it back into the database
Your best bet here is probably to create a common Person table for all people which simply tracks an Id, then relate separate tables for each of the different kinds of people (e.g. Employee, Student, etc.) to it. Note that all of these names are just examples. Each of these tables would contain a foreign key to the Person table. In effect, this logically makes each of them a different "subclass" of person.
For example, Employee may be related to Person via the foreign key
[Person.PersonId] {PK} <==(1-0)==> {FK:Person.PersonId} Employee.PersonId
This is generic for any Other entity:
[Person.PersonId] {PK} <==(1-0)==> {FK:Person.PersonId} Other.PersonId
Note that any columns common to all types of person may exist directly on the Person table, with each of the "subclasses" of person only recording columns specific to its particular type.
In addition, you may create a view which joins these tables to present a "combined" generic record for people. In some cases, it's useful to include a column in the view that simply indicates which table an individual record came from (e.g. class or type).
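A minimal sketch of the Person "subclass" layout and the combined view, using Python's built-in sqlite3 module. The external_person table, the column names, the person_type labels, and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Common table: columns shared by every kind of person.
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
-- "Subclass" tables: the primary key is also the foreign key to person.
CREATE TABLE employee (
    person_id     INTEGER PRIMARY KEY REFERENCES person(person_id),
    university_id TEXT NOT NULL UNIQUE
);
CREATE TABLE external_person (
    person_id    INTEGER PRIMARY KEY REFERENCES person(person_id),
    organisation TEXT
);
-- Combined view with a column saying which subclass a row came from.
CREATE VIEW person_combined AS
SELECT p.person_id,
       p.full_name,
       CASE WHEN e.person_id IS NOT NULL THEN 'employee'
            WHEN x.person_id IS NOT NULL THEN 'external'
            ELSE 'unknown'
       END AS person_type,
       e.university_id
FROM person p
LEFT JOIN employee e        ON e.person_id = p.person_id
LEFT JOIN external_person x ON x.person_id = p.person_id;

INSERT INTO person VALUES (1, 'Eve'), (2, 'Xan');
INSERT INTO employee VALUES (1, 'U1001');
INSERT INTO external_person VALUES (2, 'City Council');
""")

combined = conn.execute(
    """SELECT person_id, full_name, person_type, university_id
       FROM person_combined
       ORDER BY person_id"""
).fetchall()
print(combined)
```

Issues can then reference person_id alone, while the university ID stays in its own column on the employee table.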
I personally would not want to mix the two different kinds of IDs within one field. This is just waiting for trouble when you migrate, since you need to keep them apart by convention or through other fields.
You will of course use the uni id to search the person ...
(uni id and an id for people not on uni)
Approach 1 - introduce an additional table
Issue
- id
- person_id
Person
- id
- university_person_id (optional)
University Person (table)
- id
Approach 2 - make the university ref id optional
Issue
- person_id
Person
- id
- university_ref (optional)
What do you think?
But if you really want to mix the two kinds of IDs within one field, I suggest using a prefix followed by a generated number.
EXTERNAL-123456
You can also introduce an additional field external_contact (boolean) or contact_type 'Uni' or 'External'. Also add a unique index on the two combined.
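One way to read the contact_type suggestion is sketched below with Python's built-in sqlite3 module. The ref_number column, the index name, and the display_id helper are hypothetical names introduced for the example; the unique index covers the type and the number together, so the same number can exist once per type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id           INTEGER PRIMARY KEY,
    contact_type TEXT NOT NULL CHECK (contact_type IN ('Uni', 'External')),
    ref_number   TEXT NOT NULL
);
-- Unique on the two columns combined, not on the number alone.
CREATE UNIQUE INDEX person_type_ref ON person (contact_type, ref_number);
""")

conn.execute("INSERT INTO person (contact_type, ref_number) VALUES ('Uni', '123456')")
conn.execute("INSERT INTO person (contact_type, ref_number) VALUES ('External', '123456')")

# The same number under the same type is refused by the unique index.
try:
    conn.execute("INSERT INTO person (contact_type, ref_number) VALUES ('Uni', '123456')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True


def display_id(contact_type, ref_number):
    """Build the prefixed form for external people (hypothetical helper)."""
    if contact_type == "External":
        return f"EXTERNAL-{ref_number}"
    return ref_number


labels = [display_id(t, r) for t, r in conn.execute(
    "SELECT contact_type, ref_number FROM person ORDER BY id"
)]
print(duplicate_rejected, labels)
```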
Hope this will give you some food for thought :)