Design tables for data in excel sheet - mysql

I am new in DB design,need advice to design correct number of tables and columns from excel sheet with this columns:
If multiple locations available or Foil is Y for a cardname then you see multiple rows for that cardname.

Identify the entities that your data model needs to represent.
An entity is briefly defined as: a person, place, thing, concept or event that can be uniquely identified, is important to the business, and we can store information about.
Also, identify the relationships between the entities.
The usual approach is consultation with the data owner/system owner; asking appropriate questions, throwing some ideas down in an entity relationship model, and asking further questions reviewing what is right about the model, and what isn't... altering and refining the model.
From the very brief description in the question, we would likely propose, as a starting point, some proposed entities: card and location.
Identify candidate keys:
What uniquely identifies a card?
What uniquely identifies a location?
Identify the cardinality of relationships (how many on each side of the relationship?
Can a location be related to more than one card?
Can a card have more than one location?
Must a card be related to a location (can we have a card that doesn't have a location?
etc.
Then, then assign the non-key attributes to the appropriate entities. Aim for third normal form.
"Each attribute is dependent on the key, the whole key, and nothing but the key. So help me Codd."

Related

How to relate an Entity to another one that can (but not always) belong to

I want to store information about houses. Those houses can be independent or belong to buildings. I want to store information about those buildings as well. So, a building can contain one or more houses and a house can be contained in zero or one building.
The question is how to relate in a mysql database those two entities.
The solucion I'm considering is adding to the house table an id_building that can be null but I'm not sure this is a good idea provided that it'd be a foreign key. Thank you very much in advance!
Your idea is the correct way to implement this relationship. It is a 0/1-->n relationship.
You capture the "0" relationship with a NULL value for building_id. You capture the "1" relationship with a valid value for building_id.

SQL Table Design Issue

So I am building out a set of tables in an existing database at the moment, and have run into a weird problem.
First things first, the tables in question are called Organizations, Applications, and PostOrganizationsApplicants.
Organizations is a pre-existing table that is already populated with lots of data in regards to an organization's information which has been filled out in another form on another portal. EDIT: I cannot edit this table.
Applications is a table that records all information that a user inputs in the application form of the website. It is a new table.
PostOrganizationsApplicants is basically a copy of Organizations. This is also a new table.
The process goes:
1. Go to website and choose between two different web forms, Form A pertains to companies who are in the Organizations table, and Form B pertains to companies who are not in that table.
2a. If Form A is chosen, a lot of the fields in the application will be auto-populated because of their previous submission.
2b. If Form B is chosen, the company has to start from scratch and fill out the entire application from scratch.
3. Any Form B applicants must go into the PostOrganizationsApplicants table.
Now I am extremely new to SQL and Database Management so I may sound pretty stupid, but when I am linking the Organizations and PostOrganizationsApplicants tables to the Applications table, FK's for the OrganizationsID column and PostOrganizationsApplicantsID columns will have lots of empty spaces.
Is this good practice? Is there a better way to structure my tables? I've been racking my brain over this and just can't figure out a better way.
No, it's not necessarily bad practice to allow NULL values for foreign key columns.
If an instance of an entity doesn't have a relationship to an instance of another entity, then storing a NULL in the foreign key column is the normative practice.
From your description of the use case, a "Form A" Applications won't be associated with a row in Organizations or a row in PostOrganizationsApplicants.
Getting the cardinality right is what is important. How manyOrganizations can a given Applications be related to? Zero? One? More than One? And vice versa.
If the relationship is many-to-many, then the usual pattern is to introduce a third relationship table.
Less frequently, we will also implement a relationship table for very sparse relationships, when a relationship to another entity is an exception, rather than the rule.
I'm assuming that the OrganizationsID column you are referring to is in the PostOrganizationsApplicants table (which would mean that a PostOrganizationsApplicants can be associated with (at most) one Organizations.
I'm also assuming that PostOrganizationsApplicantsID column is in the Applications table, which means an instance of Applications can be associated with at most one PostOrganizationsApplicants.
Bottomline, have a "zero-or-one to many" relationship is valid, as long as that supports a suitable representation of the business model.
Why not just add a column to the Organizations table that indicates that the Organization is a "Post" type of organization and set it for the Form B type of applicants? - then, all your orgs are in one table - with a single property to indicate where they came from.
If you can add a new record to Organizations (I hope you can) just
create FK from Organizations as PK of PostOrganizationsApplicants. So
if Organizations has corresponding record in PostOrganizationsApplicants - it's "Post"!
Thanks everybody, I think I found the most efficient way to do it inspired by all of your answers.
My solution below, in case anyone else has a similar problem...
Firstly I will make the PK of PostOrganizationsApplicants the FK of Organizations by making a "link" table.
Then I am going to add a column in PostOrganizationsApplicants which will take in a true/false value on whether they completed the form from the other portal or not. Then I will ask a question in the form whether they have already done the other version of the form or not. If the boolean value is true, then I will point those rows to the Organizations table to auto-populate the forms.
Thanks again!

How to spot the relationship in RDBMS?

I was studying about relationships in RDBMS.I have understood the basic concept behind mapping relation ship,but I am not able to spot them.
The three possibilities :
one to many(Most common) requires a PK - FK relationsip.Two tables involved
many to many(less common) requires a junction table.Three tables Involved
one to one(very rare). One table involved.
When I begin a project,I am not able to separate the first two conditions and I am not clear in my head.
Examples when I study help for a brief moment,but not when I need to put these principles in to practice.
This is the place where most begineers falter.
How can I spot these relationships.Is there a simpler way?
Don't look at relationships from a technical perspective. Use analogies and real-life examples when trying to envision relationships in your head.
For example, let's say we have a library database.
A library must have books.
M:M
Each Book may have been written by multiple Authors and each Author may have written multiple Books. Thus it is a many-to-many relationship which will reflect into 3 tables in the database.
1:M
Each Book must also have a Publisher, but a Book may only have one Publisher and a Publisher can publish many Books. Thus it is a one-to-many relationship and it reflects with the PublisherId being referenced in the Books table.
A simple analogy like this one explains relationships to their core. When you try to look at them through a technical lens you're only making it harder on yourself. What's actually difficult is applying real world data scenarios when constructing your database.
I think the reason you are not getting the answers that you need is because of the way you are framing the question. Instead of asking “How do I spot the correct type of relationship between entities”, think about “How do my functional needs dictate what relationship to implement”. Database design doesn’t drive the function; it’s the functional needs that drive the relationships you need to implement.
When designing a database structure, you need to identify all the entities. Entities are all the facts that you want to store: lists of things like book titles, invoices, countries, dog species, etc. Then to identify your relationships, you have to consider the types of questions you will want to ask your database. It takes a bit of forward thinking sometimes… just because nobody is asking the question now doesn’t mean that it might not ever be asked. So you can’t ask the universe “what is the relationship between these lists of facts?” because there is no definitive answer. You define the universe… I only want to know answers to these types of questions; therefore I need to use this type of relationship.
Let’s examine an example relation between two common entities: a table of customers and a table of store locations. There is no “correct” way to relate these entities without first defining what you need to know about them. Let’s say you work for a retailer and you want to give a customer a default store designation so they can see products on the website that their local store has in stock. This only requires a one-to-many relationship between a store and the customer. Designing the relationship this way ensures that one store can have many customers as their default and each customer can only have one default store. To implement this relationship is as easy as adding a DefaultStore field to your Customer table as a foreign key that links to the primary key of the Store table.
The same two entities above might have alternate requirements for the relationship definition in a different context. Let’s say that I need to be able to give the customer the opportunity to select a list of favorite stores so that they can query about in stock information about all of them at once. This requires a many-to-many relationship because you want one customer to be able to relate to many stores and each store can also relate to many customers. To implement a many-to-many relationship requires a little more overhead because you will have to create a separate table to define the relationship links, but you get this additional functionality. You might call your relationship table something like CustomerStoreFavorites and would have as its primary key as the combined primary keys from each of the entities: (CustomerID, StoreID). You could also add attributes to the relationship, like possibly a LastOrderDate field to specify the last date that the customer ordered something from a particular store.
You could technically define both types of relationships for the same two entities. As an example: maybe you need to give the customer the option to select a default store, but you also need to be able to record the last date that a customer ordered something from a particular store. You could implement the DefaultStore field on the Customer table with the foreign key to the Store table and also create a relationship table to track all the stores that a customer has ordered from.
If you had some weird situation where every customer had their own store, then you wouldn’t even need to create two tables for your entities because you can fit all the attributes for both the customer and the store into one table.
In short, the way you determine which type of relationship to implement is to ask yourself what questions you will need to ask the database. The way you design it will restrict the relational data you can collect as well as the queries you can ask. If I design a one-to-many relationship from the store to the customer, I won’t be able to ask questions about all the stores that each customer has ordered from unless I can get to that information though other relationships. For example, I could create an entity called "purchases" which has a one-to-many relationship to the customer and store. If each purchase is defined to relate to one customer and one store, now I can query “what stores has this customer ordered from?” In fact with this structure I am able to capture and report on a much richer source of information about all of the customer's purchases at any store. So you also need to consider the context of all the other relationships in your database to decide which relationship to implement between two particular entities.
There is no magic formula, so it just takes practice, experience, and a little creativity. ER Diagrams are a great way to get your design out of your head and onto paper so that you can analyze your design and ensure that you can get the right types of questions answered. There are also a lot of books and resources to learn about database architecture. One good book I learned a lot from was “Database System Concepts” by Abraham Silberschatz and Henry Korth.
Say you have two tables A and B. Consider an entry from A and think of how many entries from B it could possibly be related with at most: only one, or more? Then consider an entry from B and think of how many entries in A it could be related with.
Some examples:
Table A: Mothers, Table B: Children. Each child has only one mother but a mother may have one or more children. Mothers and Children have a one-to-many relationship.
Table A: Doctors, Table B: Patients. Each patient may be visiting one or more doctors and each doctor treats one or more patients. So they have a many-to-many relationship.
An example of one to one:
LicencePlate to Vehicle. One licence plate belongs to one vehicle and one vehicle has one licence plate.

Database Design - structure

I'm designing a website with courses and jobs.
I have a jobs table and courses table, and each job or course is offered by a 'body', which is either an institution(offering courses) or a company(offering jobs). I am deciding between these two options:
option1: use a 'Bodies' table, with a body_type column for both insitutions and companies.
option2: use separate 'institution' and 'company' tables.
My main problem is that there is also a post table where all adverts for courses and jobs are displayed from. Therefore if I go with the first option, I would just need to put a body_id as a record for each post, whereas if I choose the second option, I would need to have an extra join somewhere when displaying posts.
Which option is best? or is there an alternative design?
Don't think so much in terms of SQL syntax and "extra joins", think more in terms of models, entities, attributes, and relations.
At the highest level, your model's central entity is a Post. What are the attributes of a post?
Who posted it
When it was posted
Its contents
Some additional metadata for search purposes
(Others?)
Each of these attributes is either unique to that post and therefore should be in the post table directly, or is not and should be in a table which is related; one obvious example is "who posted it" - this should simply be a PostedBy field with an ID which relates another table for poster/body entities. (NB: Your poster entity does not necessarily have to be your body entity ...)
Your poster/body entity has its own attributes that are either unique to each poster/body, or again, should be in some normalized entity of their own.
Are job posts and course posts substantially different? Perhaps you should consider CoursePosts and JobPosts subset tables with job- and course-specific data, and then join these to your Posts table.
The key thing is to get your model in such a state that all of the entity attributes and relationships make sense where they are. Correctly modeling your actual entities will prevent both performance and logic issues down the line.
For your specific question, if your bodies are generally identical in terms of attributes (name, contact info, etc) then you want to put them in the same table. If they are substantially different, then they should probably be in different tables. And if they are substantially different, and your jobs and courses are substantially different, then definitely consider creating two entirely different data models for JobPosts versus CoursePosts and then simply linking them in some superset table of Posts. But as you can tell, from an object-oriented perspective, if your Posts have nothing in common but perhaps a unique key identifier and some administrative metadata, you might even ask why you're mixing these two entities in your application.
When resolving hierarchies there are usually 3 options:
Kill children: Your option 1
Kill parent: Your option 2
Keep both
I get the issue you're talking about when you kill the parent. Basically, you don't know to what table you have to create a foreign key. So unless you also create a post hierarchy where you have a post related to institution and a separate post table relating to company (horrible solution!) that is a no go. You could also solve this outside the design itself adding metadata in each post stating which table they should join against (not a good option either as your schema will not be self documentation and the data will determine how to join tables... which is error prone).
So I would discard killing the parent. Killing the children works good if you don't have too many different fields between the different tables. Also you should bear in mind that that approach is not good to solve issues wether the children can be both: institution and companies but it doesn't seem to be the case. Killing the children is also the most efficient one.
The third option that you haven't evaluated is the keeping both approach. This way you keep a dummy table containing the shared values between the bodies and each of the bodies have a FK to this "abstract" table (if you know what I mean). This is usually the least efficient way but most likely the most flexible. This way you can easily handle bodies that are of both types, and also that are only of type "body" but not a company nor an institution themselves (if that is even possible or might be possible in the future). You should note that in order to join a post to an institution you should always reference the parent table and then join the parent with the children.
This question might also be useful for you:
What is the best database schema to support values that are only appropriate to specific rows?

Database Normalization - I think?

We have a J2EE content management and e-commerce system, and in this system – for sake of a simple example – let’s say that we have 100 objects. All of these objects extend the same base class, and all share many of the same fields.
Let’s take two objects as an example: a news item that would be posted on a website, and a product that would be sold on a website. Both of these share common properties:
IDs: id, client ID, parent ID (long)
Flags: deleted, archived, inactive (boolean)
Dates: created, modified, deleted (datetime)
Content: name, description
And of course they have some properties that are different:
News item: author, posting date
Product: price, tax
So (finally) here is my question. Let’s say we have 100 objects in our system, and they all follow this pattern. They have many fields that overlap, and some unique fields. In terms of a relational database, would we be better off with:
Option One: Less Tables, Common Tables
table_id: id, client ID, parent ID (long) (id is the primary key, a GUID for all objects)
table_flag: id, deleted, archived, inactive (boolean)
table_date: id, created, modified, deleted (datetime)
table_content: id, name, description
table_news: id, author, posting date
table_product: id, price, tax
Option Two: More Tables, Common Fields Repeated
table_news: id, client ID, parent ID, deleted, archived, inactive, name, description, author, posting date
table_product: id, client ID, parent ID, deleted, archived, inactive, name, description, price, tax
For full disclosure – I am a developer and not a DBA, and because of that I prefer option one. But there is another team member that prefers option two, and I think he makes valid points.
Option One: Pros and Cons
Pro: Encapsulates common fields into common tables.
Pro: Need to change a common field? Change it in one place.
Pro: Only creates new fields/tables when they are needed.
Pro: Easier to create the queries dynamically, less repetitive code
Con: More joining to create objects (not sure of DB impact on that)
Con: More complex queries to store objects (not sure of DB impact on that)
Con: Common tables will become huge over time
Option Two: Pros and Cons
Pro: Perhaps it is better to distribute the load of all objects across tables?
Pro: Could index the news table on the client ID, and index the product table on the parent ID.
Pro: More readable to human eye: easy to see all the fields for an object in one table.
My Two Cents
For me, I much prefer the elegance of the first option – but maybe that is me trying to force object oriented patterns on a relational database. If all things were equal, I would go with option one UNLESS a DB expert told me that when we have millions of objects in the system, option one is going to create a performance problem.
Apologies for the long winded question. I am not great with DB lingo, so I probably could have summarized this more succinctly if I better understood terms like normalization. I tried to search for answers on this topic, and while I found many that were close (I suspect this is a common DB issue) I could not find any that answered all my questions. I read through this article on normalization:
But I did not totally understand it. On the one hand it was saying that you should remove any redundancies. But on the other hand, it was saying that each attribute should define only one object.
Thanks,
John
You should read Patterns of Enterprise Application Architecture by Martin Fowler. He writes about several options for the scenario you describe:
Single Table Inheritance: One table for all object subtypes. Stores all attributes, setting them NULL where they are inapplicable to the row's object subtype.
Class Table Inheritance: One table for column common to all subtypes, then one table for each subtype to store subtype-specific columns.
Concrete Table Inheritance: One table for each subtype, storing both subtype-specific columns and columns common to all subtypes.
Serialized LOB: One table for all object subtypes. Store common attributes as conventional columns, but combine optional or subtype-specific columns as fields in a BLOB that stores XML or JSON or whatever format you want.
Each one of these designs has pros and cons, so choose a solution depending on the most common way you access your data.
However, notice I use the word subtype above. I would use these designs only if the different object types are subtypes of a common base class. I'm assuming that News item and Product don't actually share a logical base class (besides Object); they are not subtypes of a common superclass.
So for the sake of OO design, I would choose Concrete Table Inheritance. This avoids any inappropriate coupling between these subtypes. There are columns the two tables have in common, but they basically amount to bookkeeping, not anything to do with the function of the class and hence the table.