I'm trying to determine the best way to represent the concept of "linked" contacts (people) in an existing mysql database. Let me explain what that means:
I have a table with different contacts, each with an ID. I have another table with different time periods that a contact can belong to. Each period also has an ID. There's an N:1 relationship between contacts and periods. I'd like to be able to represent that two (or more) contacts in different periods are actually the same person (for analytics' sake). So if two contacts are linked, they should be treated as one for analytics purposes. For example, I might want to represent that contacts 1 and 3 are the same person.
The difficulty is in representing this concept of "linked" for more than two contacts (e.g. contacts 1, 3, and 4). If there were just two, I could simply have a table with two columns representing the connection. However, since there could be three contacts who are linked (or any number, really), this doesn't work out of the box.
I've come up with two possible solutions for now:
1) Have one table (let's call it linked_contacts) that has two columns, one for each contact ID. This will represent a connection. If there are more than 2 linked contacts, then add a connection to one of the linked contacts. The upside of this is that it's cheap to add another contact into the link. The downside is that in order to get all contacts in the link, I essentially have to construct the graph by making a query for each connection.
2) Still have the linked_contacts table with two columns. When adding a new contact to the link, generate a new row for each connection between the new contact and the pre-existing ones. The upside is that I can get all the linked contacts for a particular contact in a single query. The downside is a larger table.
Additional notes
It's not expected that there will be many links with more than 2 linked contacts, but I would like to support more than 2.
The main use case for retrieving this data is getting all contacts that are linked to a specific contact.
It is very unlikely that a contact will be removed from a link. It will be more common for a contact to be added to a link.
Performance of retrieval is the main factor, since I'll be displaying which contacts are linked to a particular contact on each contact's page.
I'm leaning towards option 2 for ease/speed of retrieval of all contacts in a link given a particular contact. Is that the right way to go about it? Are there any other factors that I should take into account? I'm open to other design strategies as well!
I think the "linked list" is the wrong abstraction for your requirements.
If you need to represent the concept of "person" in the database, then just represent it directly, as its own table. For example:
All contacts of the same person share the same PERSON_ID, which implicitly connects them.
Effectively, you turn the CONTACT into a "junction" (aka. "link") table between PERSON and PERIOD, which is a standard way of modeling many-to-many relationship. You can then easily find all persons whose contacts belong to the given period, similarly to this:
SELECT DISTINCT PERSON_ID
FROM CONTACT
WHERE PERIOD_ID = <whatever>
And since there is an index on {PERIOD_ID, PERSON_ID} [1], this query can be satisfied by a simple index range scan, which is very fast [2].
You can also easily JOIN with the PERSON table if you need to get the person's other fields, and/or with PERIOD if you need other fields from there.
[1] Implicitly created underneath the UNIQUE constraint, indicated by U1 in the diagram above.
[2] BTW, you'll also probably need an index in the "opposite" direction: {PERSON_ID, PERIOD_ID}, to satisfy searching for contacts of a given person.
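Since the diagram is not reproduced here, the following is only a rough sketch of what the PERSON / PERIOD / CONTACT layout might look like. It uses Python's built-in sqlite3 as a stand-in for MySQL; CONTACT_ID, the sample data, and the helper linked_contacts are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PERSON (PERSON_ID INTEGER PRIMARY KEY);
CREATE TABLE PERIOD (PERIOD_ID INTEGER PRIMARY KEY);
CREATE TABLE CONTACT (
    CONTACT_ID INTEGER PRIMARY KEY,           -- assumed surrogate key
    PERIOD_ID  INTEGER NOT NULL REFERENCES PERIOD (PERIOD_ID),
    PERSON_ID  INTEGER NOT NULL REFERENCES PERSON (PERSON_ID),
    UNIQUE (PERIOD_ID, PERSON_ID)             -- also provides the first index
);
-- Index in the "opposite" direction, for finding a person's contacts.
CREATE INDEX CONTACT_PERSON_IDX ON CONTACT (PERSON_ID, PERIOD_ID);

INSERT INTO PERSON VALUES (10), (20);
INSERT INTO PERIOD VALUES (1), (2), (3);
-- Contacts 1, 3 and 4 are the same person (10) in different periods.
INSERT INTO CONTACT VALUES (1, 1, 10), (2, 1, 20), (3, 2, 10), (4, 3, 10);
""")

def linked_contacts(contact_id):
    """Return the other contacts that share a PERSON_ID with contact_id."""
    rows = conn.execute(
        """SELECT c2.CONTACT_ID
           FROM CONTACT c1
           JOIN CONTACT c2 ON c2.PERSON_ID = c1.PERSON_ID
           WHERE c1.CONTACT_ID = ? AND c2.CONTACT_ID <> c1.CONTACT_ID
           ORDER BY c2.CONTACT_ID""",
        (contact_id,),
    ).fetchall()
    return [r[0] for r in rows]

# The query from the answer: all persons with a contact in period 1.
persons_in_period_1 = [r[0] for r in conn.execute(
    "SELECT DISTINCT PERSON_ID FROM CONTACT WHERE PERIOD_ID = 1 ORDER BY PERSON_ID")]
```

With this layout, "all contacts linked to contact 3" is a single self-join on PERSON_ID, no matter how many contacts the person has.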
Since you don't expect to have many links, and you are concerned about performance as well as ease of retrieval, and management is insert-heavy, then I would have a separate table with multiple columns representing the links.
CREATE TABLE links (
contact_id0 BIGINT UNSIGNED NULL REFERENCES contacts (id),
contact_id1 BIGINT UNSIGNED NULL REFERENCES contacts (id),
contact_id2 BIGINT UNSIGNED NULL REFERENCES contacts (id),
contact_id3 BIGINT UNSIGNED NULL REFERENCES contacts (id)
);
INSERT INTO links VALUES (1,3,4,NULL);
-- find 3's links
SELECT * FROM links WHERE 3 IN (contact_id0, contact_id1, contact_id2, contact_id3);
You might also consider proper normal form of either an edge list or adjacency matrix, but the design above seems to meet your requirements with minimum complexity.
The one design tradeoff is that the maximum number of links between contacts is fixed by the number of columns. But as mentioned in my comments, columns are cheap and can be added if demand warrants.
FWIW, if you wanted either your Option 1 or Option 2 and weren't open to other ideas, I would go with your #2.
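For comparison, here is a rough sketch of the asker's Option 2 (one row per pair of linked contacts), using Python's sqlite3 in place of MySQL. The column names contact_a / contact_b and the helpers linked_to and add_to_link are hypothetical; storing each pair once with the smaller id first is just one possible convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE linked_contacts (
    contact_a INTEGER NOT NULL,
    contact_b INTEGER NOT NULL,
    PRIMARY KEY (contact_a, contact_b),
    CHECK (contact_a < contact_b)   -- store each pair once, smaller id first
)""")

def linked_to(contact_id):
    """All contacts directly paired with contact_id, in one query."""
    rows = conn.execute(
        """SELECT contact_b FROM linked_contacts WHERE contact_a = ?
           UNION
           SELECT contact_a FROM linked_contacts WHERE contact_b = ?""",
        (contact_id, contact_id),
    ).fetchall()
    return sorted(r[0] for r in rows)

def add_to_link(existing_id, new_id):
    """Pair new_id with existing_id and with everything already linked to it."""
    members = linked_to(existing_id) + [existing_id]
    with conn:  # commit all pair rows together, or none of them
        conn.executemany(
            "INSERT OR IGNORE INTO linked_contacts VALUES (?, ?)",
            [(min(m, new_id), max(m, new_id)) for m in members],
        )

add_to_link(1, 3)   # link contacts 1 and 3
add_to_link(3, 4)   # add contact 4 to the same link
pair_count = conn.execute("SELECT COUNT(*) FROM linked_contacts").fetchone()[0]
```

The table grows quadratically with the size of a link (three contacts need three rows, four need six), which is acceptable only because large links are expected to be rare.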
I've made a project that's essentially an online bookstore where one can buy books and place the order.
My database contains various tables like:
user
user_shipping_address
user_payment_mode
user_order
order_shipping_address
order_billing_address
order_payment_details
I tried to construct the EERD diagram for this but I am confused about one thing: A user_order can only have one shipping address. I've created a foreign key order_id in the order_shipping_address table that references the primary key order.id. I also have a shipping_address_id foreign key in the table order that references order_shipping_address.id.
When I try to generate the ER diagram, it gives me two different relationships. A 1:1 relationship between the order and the shipping address and a 1:M relationship from the shipping address to the order. I don't know how to structure the foreign key constraints because I feel the order table should contain the shipping_address_id and the shipping address should contain the order_id, right? This just made everything more confusing.
Please help me with this.
Here is my EERD :
This happens because your current design means it's possible for multiple user_order rows to reference the same single shipping_address row.
You need to change the design so that it's impossible for multiple user_order rows to reference the same single shipping_address row.
There are at least two different possible solutions:
Add a UNIQUE constraint on user_order.shipping_address_id
Or: invert the relationship (this is my preferred option as it eliminates unneeded surrogate keys):
Remove the user_order.shipping_address_id column.
Change shipping_address.id to shipping_address.order_id, so that it's a foreign key referencing user_order.id.
Make shipping_address.order_id the new primary key of shipping_address.
Note that both of these options are a denormalization, as they prevent the sharing of shipping addresses between different orders (e.g. if the same customer makes the same repeat order a lot), though this can be intentional - so that if a user's address changes in the future it won't unintentionally retroactively update old order shipping records.
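A minimal sketch of the inverted relationship, run through Python's sqlite3 rather than MySQL. The street column and the sample rows are made up for illustration; the point is only that making order_id both the primary key and a foreign key leaves room for at most one address per order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE user_order (id INTEGER PRIMARY KEY);
CREATE TABLE shipping_address (
    -- Primary key AND foreign key: at most one address row per order.
    order_id INTEGER PRIMARY KEY REFERENCES user_order (id),
    street   TEXT NOT NULL              -- hypothetical address column
);
INSERT INTO user_order VALUES (1);
INSERT INTO shipping_address VALUES (1, '1 Main St');
""")

# A second address for the same order violates the primary key.
try:
    conn.execute("INSERT INTO shipping_address VALUES (1, '2 Side St')")
    second_address_rejected = False
except sqlite3.IntegrityError:
    second_address_rejected = True

# An address for a non-existent order violates the foreign key.
try:
    conn.execute("INSERT INTO shipping_address VALUES (99, 'Nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```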
A few other tips:
Consider using int rather than bigint for your identities - I doubt you'll have over 2 billion rows in each table.
Don't blindly use varchar(255) for all text columns - use it to enforce reasonable constraints on data length, for example state doesn't need to be longer than 2 characters if you're storing abbreviations, ditto zipcode which can be varchar(10) if you're using ZIP+4.
DO NOT STORE FULL CREDIT CARD NUMBERS IN YOUR DATABASE! (as seen in your payment table) - This is a violation of PCI Rules, a massive liability and probably illegal negligence in your jurisdiction. Your payment processor will provide you with a substitute opaque token value (or something similar) as a means of identifying charge-cards and applying future charges to stored payment details - the most you can reasonably store is the last 4 digits. Whether or not you encrypt the data is largely irrelevant.
I have a database with contacts in it. There are two different types of contacts, Vendors and Clients.
The Vendor table has a vendor_contacts table attached via foreign key value to allow for a one to many relationship. The client has a similar table.
These contacts can have a one-to-many relationship with a phone numbers table. Should I have a separate phone numbers table for each of these, or one shared phone number table with two foreign keys, allowing one to be null?
OPTION 1
Here I would have to enforce that one of vendor_id or client_id was NULL and the other not NULL in the shared phone table.
OPTION 2
Here each table would have its own phone number table.
TBH I would merge the vendor and client tables and have a 'contact' table. This could have a contact type and would allow for newer contacts to be added.
Consider that you want to add something to your contacts, say an address: you may have to change each table in the same way. Then you want a birthday (OK, maybe not, but just as an example) and again that means changes to multiple tables. Whereas if you have a single table, it can reduce the overhead of managing this.
This will also mean you have one contact phone number table!
"wasting space" is not really a meaningful concern in modern database systems - and "null" values are usually optimized by the storage engine to take no space anyway.
Instead, I think you need to look at likely query scenarios, at maintainability, and at intelligibility of your schema.
So, in general, a schema that repeats itself - many tables with similar columns - suggests poor maintainability, and often leads to complicated queries.
In your example, imagine a query to find out who called from a given number, and whom they might have been trying to reach.
In option 1, you query the phone number, and outer join it to the two contact tables - relatively easy. In option 2, you have a union of two similar queries (only the table names would change) - duplication and lots of chance for bugs.
Imagine you want to break the phone number into country, region and phone number - in option 2, you have to do this twice (and modify all the queries twice); in option 1, you have to do this only once.
In general terms, repetition is a sign of a bad software design; this also counts for database schemas.
That's also a reason (as @siggisv and @NigelRen suggested) to flatten the vendor_contact and client_contact tables into a single table with a "contact_type" column.
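To make the flattened design concrete, here is a small sketch using Python's sqlite3 as a stand-in. The table names contact and phone_number, their columns, and the who_has helper are assumptions; only the "contact_type" column comes from the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    contact_type TEXT NOT NULL CHECK (contact_type IN ('vendor', 'client'))
);
CREATE TABLE phone_number (
    id         INTEGER PRIMARY KEY,
    contact_id INTEGER NOT NULL REFERENCES contact (id),
    number     TEXT NOT NULL
);
INSERT INTO contact VALUES (1, 'Joe', 'vendor'), (2, 'Ann', 'client');
INSERT INTO phone_number VALUES (1, 1, '555-0100'), (2, 2, '555-0199');
""")

def who_has(number):
    """Find who called from a number: one join, no UNION over two tables."""
    return conn.execute(
        """SELECT c.name, c.contact_type
           FROM phone_number p
           JOIN contact c ON c.id = p.contact_id
           WHERE p.number = ?
           ORDER BY c.id""",
        (number,),
    ).fetchall()
```

The "who called from this number" query no longer needs to know whether the caller is a vendor or a client; the type is just another column in the result.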
I would use two different tables, a vendor_contacts table and a client_contacts table.
If you only have one table, you always waste space, as each row will have a null column.
option 2
but change vendor_contact and client_contact to 'contact'
and add a 'type' column to 'contact' that identifies 'client' or 'vendor' if you need to separate the records.
I would do as others have suggested and merge vendor_contact and client_contact into one contact table.
But on top of that, I doubt that contact<->phone is a one-to-many relationship. If you consider this example you will see that it's a many-to-many relationship:
"Joe and Mary are both vendors, working in the same office. Therefore they both have the same landline number. They also have each their own mobile number."
So in my opinion you would need to add a contact_number table with two columns of foreign keys, contact_id and phone_id.
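The Joe and Mary example could be sketched like this, with Python's sqlite3 standing in for the real database. The contact and phone tables, their columns, and the two helper functions are assumed for illustration; contact_number with contact_id and phone_id is the junction table described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phone   (id INTEGER PRIMARY KEY, number TEXT NOT NULL UNIQUE);
CREATE TABLE contact_number (
    contact_id INTEGER NOT NULL REFERENCES contact (id),
    phone_id   INTEGER NOT NULL REFERENCES phone (id),
    PRIMARY KEY (contact_id, phone_id)
);
INSERT INTO contact VALUES (1, 'Joe'), (2, 'Mary');
INSERT INTO phone VALUES (1, '555-0100'), (2, '555-0111'), (3, '555-0122');
-- Phone 1 is the shared landline; phones 2 and 3 are the two mobiles.
INSERT INTO contact_number VALUES (1, 1), (2, 1), (1, 2), (2, 3);
""")

def contacts_for(number):
    """Names of every contact reachable on the given number."""
    rows = conn.execute(
        """SELECT c.name FROM phone p
           JOIN contact_number cn ON cn.phone_id = p.id
           JOIN contact c ON c.id = cn.contact_id
           WHERE p.number = ? ORDER BY c.name""",
        (number,),
    ).fetchall()
    return [r[0] for r in rows]

def numbers_for(name):
    """Every number on which the named contact can be reached."""
    rows = conn.execute(
        """SELECT p.number FROM contact c
           JOIN contact_number cn ON cn.contact_id = c.id
           JOIN phone p ON p.id = cn.phone_id
           WHERE c.name = ? ORDER BY p.number""",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]
```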
I have a table called inventory_movements, and I'm planning to save product movements in and out of the warehouse. It has fields like:
1- movement_id(PK)
2- product_id(FK)
3- quantity int
4- unit_price decimal
5- movement ENUM('in','out')
6- date datetime
7- ????????? (reference), e.g. sale (out), purchase (in), fire loss (out), sales return (in), purchase return (out)
My problem is that I want to store the reference of the movement (the cause of the movement), whether it is the order id, purchase id, purchase return id, etc.
But I also want to put a constraint on this field to make sure that no invalid data (e.g. a purchase that does not exist) will be stored in the database. Of course, I can't make one foreign key reference many tables (sales, purchases, purchase returns, etc.).
A very bad solution is to add a column for every reference type (sale id, purchase id, sales return id, etc.), fill in the right one for each movement and leave the others null, but this is of course against normalization, and I can't easily add more reference types later.
What can I do in this situation?
Please consider that I'm very much a newbie. Thanks.
You have a few approaches. One is to have one foreign key per table type with a constraint that ensures that exactly one is not null. I agree that is clunky but some people prefer it (David Fetter, for example, has blogged about the benefits of this approach).
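As a rough illustration of the first approach, here is a sketch with one nullable foreign key per referenced table and a CHECK that exactly one of them is set. It runs on Python's sqlite3; the sales and purchases tables are reduced to bare ids, only two reference types are shown, and try_insert is a hypothetical helper. Note that MySQL only enforces CHECK constraints from version 8.0.16 onward; older versions parse and ignore them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE sales     (id INTEGER PRIMARY KEY);
CREATE TABLE purchases (id INTEGER PRIMARY KEY);
CREATE TABLE inventory_movements (
    movement_id INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    sale_id     INTEGER REFERENCES sales (id),
    purchase_id INTEGER REFERENCES purchases (id),
    -- Each comparison yields 0 or 1, so the sum counts the non-null keys.
    CHECK ((sale_id IS NOT NULL) + (purchase_id IS NOT NULL) = 1)
);
INSERT INTO sales VALUES (1);
INSERT INTO purchases VALUES (1);
""")

def try_insert(sale_id, purchase_id):
    """Insert a movement; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                """INSERT INTO inventory_movements
                       (product_id, quantity, sale_id, purchase_id)
                   VALUES (7, 5, ?, ?)""",
                (sale_id, purchase_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A movement with no reference, with two references, or with a reference to a purchase that does not exist is rejected by the database itself.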
Another approach is to factor out the common parts of the referenced tables into a single, easily referenced table. If you cannot do this, you can have a trigger-maintained table instead. That would mean something like:
A transaction documents table
A table for sales/purchase data (or maybe different tables for this).
If that cannot be done, you can instead have another table which just stores the ids, the relevant table names, and an id for reference purposes. That table is maintained with a trigger, and your foreign key constraint refers to it.
Either way, long-run you are probably going to end up with the second solution (a master transaction journal, and then other tables that extend it).
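One way the "master journal plus extension tables" idea might look, sketched with Python's sqlite3. The names transaction_documents, doc_type, document_id and customer, and both helper functions, are assumptions; the answer does not prescribe any particular layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
-- Master journal: one row per business document of any kind.
CREATE TABLE transaction_documents (
    id       INTEGER PRIMARY KEY,
    doc_type TEXT NOT NULL CHECK (doc_type IN
        ('sale', 'purchase', 'sales_return', 'purchase_return', 'fire_loss'))
);
-- Extension table: sale-specific data, keyed by the journal id.
CREATE TABLE sales (
    document_id INTEGER PRIMARY KEY REFERENCES transaction_documents (id),
    customer    TEXT NOT NULL
);
-- Movements need only one foreign key, whatever caused them.
CREATE TABLE inventory_movements (
    movement_id INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    document_id INTEGER NOT NULL REFERENCES transaction_documents (id)
);
INSERT INTO transaction_documents VALUES (1, 'sale');
INSERT INTO sales VALUES (1, 'ACME');
INSERT INTO inventory_movements VALUES (1, 7, 5, 1);
""")

def record_movement(product_id, quantity, document_id):
    """Insert a movement; return False if the document does not exist."""
    try:
        with conn:
            conn.execute(
                """INSERT INTO inventory_movements
                       (product_id, quantity, document_id) VALUES (?, ?, ?)""",
                (product_id, quantity, document_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def movement_causes():
    """Each movement together with the type of document that caused it."""
    return conn.execute(
        """SELECT m.movement_id, d.doc_type
           FROM inventory_movements m
           JOIN transaction_documents d ON d.id = m.document_id
           ORDER BY m.movement_id""").fetchall()
```

New reference types then mean a new doc_type value and possibly a new extension table, while inventory_movements itself stays unchanged.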
(Original design question answer below.)
Depending on how you want to address this I can see one of two ways of doing it.
The first is to use a basic convention of positive numbers coming in and negative numbers going out. This works for global movements (purchases and sales) but it breaks down for local movements (moving between warehouses).
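The sign convention can be sketched as follows, again using Python's sqlite3. The reduced inventory_movements table, the sample rows, and the stock_on_hand helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory_movements (
    movement_id INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL CHECK (quantity <> 0)  -- +in, -out
);
INSERT INTO inventory_movements VALUES
    (1, 7, 100),   -- purchase (in)
    (2, 7, -30),   -- sale (out)
    (3, 7,   5),   -- sales return (in)
    (4, 7,  -2);   -- fire loss (out)
""")

def stock_on_hand(product_id):
    """Current stock is simply the sum of all signed movements."""
    row = conn.execute(
        """SELECT COALESCE(SUM(quantity), 0)
           FROM inventory_movements WHERE product_id = ?""",
        (product_id,),
    ).fetchone()
    return row[0]
```

With signed quantities the separate 'in'/'out' column becomes redundant, since the direction is carried by the sign.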
One option here is to have a separate "states" table which represents both global and local states. For example, purchases, sales, different warehouses, etc. Then you represent the transfer as a graph link between the states. You can also have a documents table which can represent purchases and sales, with appropriate classifications etc. This allows a three-way relationship between an in-state, an out-state, and a document. For example a sale could have an in-state as inventory (or a particular warehouse), an out-state of sale, and a document of the sales invoice.
Of course you can do both, storing global inventory in one way and warehouse movements in the other.
I have a database (for a pet sitting company) containing tables for the following:
Customers
emergency contacts
phone numbers
phone types
The phone numbers are stored in a separate table to allow for efficient storage of a virtually unlimited number of phone numbers per customer. The phone numbers table stores both the customer ID and the phone type ID in addition to the primary key. My question is - is the best way to allow for emergency contacts to have the same functionality with phone number records to add another field to the phone numbers table "emergency contact ID"? Or should I be storing emergency contacts in the same table as customer (and rename it Individuals)? If so, please tell me how to create a relationship between records in the same table.
Thanks so much,
Jessica
You have to ask yourself the question: how far should you really go to respect data normalisation rules?
I'm not sure what the PhoneTypes table would contain, but if it is a list of things like Mobile, Work, Home, iPhone you are probably going a bit too far already: you're not building a contacts application, you're building a Pet Sitting application, there are probably more important areas of the software that would demand your development time.
Increasing the complexity of software is costly: the time to implement features tends to creep up along with the complexity, and with it the risk of errors and the cost of maintenance; more often than not, performance suffers as well.
These contacts details are really just properties of the Customer.
A Customer can have multiple telephone numbers and multiple emergency contacts.
Usually these should be listed in order of importance, so if the need arises, you call the most relevant person first.
Without more information, the way I would handle this would simply be to leave 2 memo fields in my Customer table where the user of the app can enter that data in any way she pleases, so she can list it in the right order and make annotations as necessary (Call on Mondays only, Customer's mum, call after 11:00am only, etc).
You can further constrain data input if you wish, for example with a textbox where the user enters the details before clicking an 'Add' button that appends the data to the field, using a semicolon or simply CrLf to separate records. The data can then be split on the semicolon or CrLf and shown in a listbox on the form for better presentation.
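The answer has VBA in an Access form in mind; purely to illustrate the splitting and appending logic, here is the same idea in Python. The function names split_contacts and append_contact are hypothetical.

```python
import re

def split_contacts(memo):
    """Split a memo field on semicolons or line breaks, dropping blank entries."""
    parts = re.split(r";|\r\n|\n", memo)
    return [p.strip() for p in parts if p.strip()]

def append_contact(memo, entry):
    """Return the memo text with one more entry appended, semicolon-separated."""
    entries = split_contacts(memo) + [entry.strip()]
    return "; ".join(entries)
```

The list returned by split_contacts is what would be shown in the listbox, in the order the user typed the entries.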
You can handle both Customer Phone numbers and Emergency contact numbers in the same way.
This makes things simple: all the Customer data is in one table instead of being split across multiple tables, with no unnecessary joins, and it won't take more space than using multiple tables (actually, it will save space). It makes reporting easy (you can simply show the customer list and that will show all available phone numbers for all customers without you having to do anything fancy), and it makes searching easy as well.
Having multiple values in a single field is quite common for peripheral data.
Unless you absolutely need to separate contacts, and make complex reports based on them or make sure you can re-use them, you do not have to create tables for every bit of information. Let the application user enter what is relevant for the Customer.
Constrain data entry to format it and check its consistency if you want, but ultimately, unless the purpose of the software is to maintain a complex contact list, don't make it harder than it needs to be. A bit of VBA and some string manipulation is sufficient to constrain the data and allow the user to rearrange it in the order that's most relevant, and it will make your app snappier by avoiding some complexity.
I would start with something simple anyway and see later if splitting the data across multiple tables makes sense.
Avoid premature optimisation.
However, if you feel you really need to handle this by the book, I would probably handle it as follows:
Store everything in a Contact table that could have properties like these:
ID: unique contact ID
PhoneNumber: TEXT
PhoneTypeID: (whatever that is if it links to your PhoneType table)
IsEmergencyContact: BOOLEAN
ContactName: TEXT, freeform, how to address the contact person
CustomerID: foreign key linking to the Customer table
Notes: MEMO, any useful info about the contact
Rank: INTEGER, a sortable rank of importance for this contact
If you want to decouple the Customer from the Contact, so you can re-use the contact for multiple customers, you would need an intermediary table:
The Contact table would become:
ID: unique contact ID
PhoneNumber: TEXT
PhoneTypeID: (whatever that is if it links to your PhoneType table)
ContactName: TEXT, freeform, how to address the contact person
Notes: MEMO, any useful info about the contact
And the CustomerContact table (that makes the many-to-many relationship possible):
CustomerID: foreign key linking to the Customer table
ContactID: foreign key linking to The Contact table
IsEmergencyContact: BOOLEAN
Rank: INTEGER, a sortable rank of importance for this contact
To display and manage the list of Contacts and list of Emergency contacts, you simply need to filter each listbox or subform where you show the information based on whether IsEmergencyContact is true or false.
Now, if you want the same contact to have multiple phone numbers, you will have to split everything even further:
The Contact table would become:
ID: unique contact ID
ContactName: TEXT, freeform, how to address the contact person
Notes: MEMO, any useful info about the contact
A PhoneNumber table would contain:
ID: Phone record ID
ContactID: foreign key linking to The Contact table
PhoneNumber: TEXT
PhoneTypeID: (whatever that is if it links to your PhoneType table)
Notes: MEMO, any useful info about this particular phone number
Now you have 4 tables to store all the info you need and share it any way you want, so a Customer can have multiple contacts (emergency or not), Contacts can have multiple phone numbers, Contacts can be shared across customers (so one customer's contact is another customer's emergency contact):
Customer
Contact
PhoneNumber
CustomerContact
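The four tables could be sketched roughly as below, using Python's sqlite3 instead of Access. PhoneTypeID is left out for brevity, Customer.Name and the sample rows are made up, and emergency_numbers is a hypothetical helper showing how the emergency list would be filtered and ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Contact (
    ID          INTEGER PRIMARY KEY,
    ContactName TEXT NOT NULL,
    Notes       TEXT
);
CREATE TABLE PhoneNumber (
    ID          INTEGER PRIMARY KEY,
    ContactID   INTEGER NOT NULL REFERENCES Contact (ID),
    PhoneNumber TEXT NOT NULL
);
CREATE TABLE CustomerContact (
    CustomerID         INTEGER NOT NULL REFERENCES Customer (ID),
    ContactID          INTEGER NOT NULL REFERENCES Contact (ID),
    IsEmergencyContact INTEGER NOT NULL DEFAULT 0,
    "Rank"             INTEGER NOT NULL,
    PRIMARY KEY (CustomerID, ContactID)
);
INSERT INTO Customer VALUES (1, 'Jessica'), (2, 'Sam');
INSERT INTO Contact VALUES
    (1, 'Jessica (self)', NULL),
    (2, 'Mum', 'Call after 11:00am only');
INSERT INTO PhoneNumber VALUES
    (1, 1, '555-0100'), (2, 2, '555-0111'), (3, 2, '555-0122');
-- Contact 2 is shared: an emergency contact for both customers.
INSERT INTO CustomerContact VALUES (1, 1, 0, 1), (1, 2, 1, 2), (2, 2, 1, 1);
""")

def emergency_numbers(customer_id):
    """Emergency contacts of a customer with their numbers, most important first."""
    return conn.execute(
        """SELECT c.ContactName, p.PhoneNumber
           FROM CustomerContact cc
           JOIN Contact c     ON c.ID = cc.ContactID
           JOIN PhoneNumber p ON p.ContactID = c.ID
           WHERE cc.CustomerID = ? AND cc.IsEmergencyContact = 1
           ORDER BY cc."Rank", p.ID""",
        (customer_id,),
    ).fetchall()
```

Even this reduced version needs a three-table join to answer "whom do I call for this customer", which is the extra complexity the answer warns about.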
As I said, doing it the right way will entail a lot more complexity than maybe you really need.
Be careful about not building complexity prematurely. It's nice to anticipate the worst-case scenario, but often, it means that you are prematurely optimising and therefore spending time on an area of the software that is not as important as the core of your app.
You always have to ask yourself: should I spend 2 days implementing this or spend 2 days refining the UI, testing or adding code to ensure data integrity, etc?
More often than not, YAGNI
I'd store it the same way as you store phone numbers; in its own table. This allows you the ability to store multiple numbers, and some people may have multiple emergency contact numbers. You always want to think about scalability when designing a database, and plan for the most complex situations. For example, I would imagine with pet sitting that a lot of your customers would come through word-of-mouth, and it's very possible you'll use the same contact for multiple clients.
Your first instinct (storing customers and contacts in a single table) was correct. If you think about it, customers and contacts are both people. It's just that both customers and emergency contacts are a specialised case of people. We can model this using a relational DB.
Let's create a table to hold info about people:
create table tblPeople (
ID autoincrement primary key
, FirstName varchar(100)
, LastName varchar(100)
, Notes memo
)
Now let's have a table to hold info about customers, but enforcing the fact that customers must also be people:
create table tblCustomers (
ID long primary key
constraint Customers_ID
references tblPeople (ID)
, EmergencyContactID long
constraint Customers_EmergencyContactID
references tblPeople (ID)
)
This is called a one-to-one relationship and is used to implement specialisation--like inheritance in object-oriented programming.
You have a choice here. Do you want to let each person have an arbitrary number of phone numbers of arbitrary types? This is obviously more general and more powerful. But also more complicated. Or do you want to go back and just store a fixed number of phone numbers for each person?
Let's say you want to do the former just to take it all the way. In that case, you need a table to hold phone numbers:
create table tblPhoneNumbers (
ID autoincrement primary key
, PhoneNumber varchar(15)
)
Notice how we don't specify here anything about what type of phone number it is. That part is next:
create table tblPhoneNumberTypes (
ID autoincrement primary key
, PhoneNumberType varchar(20) not null
)
Now we associate each person with a phone number and type:
create table tblPeople_to_PhoneNumberTypes_to_PhoneNumbers (
PersonID long not null
references tblPeople (ID)
, PhoneNumberTypeID long not null
references tblPhoneNumberTypes (ID)
, PhoneNumberID long not null
references tblPhoneNumbers (ID)
, constraint People_to_PhoneNumberTypes_to_PhoneNumbers_PK
primary key (
PersonID
, PhoneNumberTypeID
, PhoneNumberID
)
)
Here each person (and therefore each customer and each emergency contact) can have an arbitrary number of phone numbers of arbitrary types. Hence this is actually a many-to-many-to-many link table. I believe that is the key (or let's say 'secret sauce') to your contact-phone type-phone number model.
In link tables like the above I prefer to use multiple-column primary keys as I feel there is no useful purpose to an integer primary key column. Here the primary key enforces the fact that each combination of person, phone number type, and phone number is listed only once.
Note that the above is all valid Access ANSI-92 SQL.
I want to store user details in the database, with columns like first name, last name, username, password, email, cellphone number, activation codes, gender, birthday, occupation, and a few more. Is it good to store all of these in the same table, or should I split them between two tables, users and profile?
If those are attributes of a User (and they are 1-1) then they belong in the user table.
You would only normally split if there were many columns; then you might create another table in a 1-1 mapping.
Another table is obviously required if there are many profile rows per user.
One table should be good enough.
Two tables, or more generally vertical partitioning, comes in when you want to scale out. You split your table into multiple tables, where the partitioning criterion is usually usage, i.e. the attributes most commonly used together are housed in one table and the others in another table.
One table should be okay. I'd be storing a hash in the password column.
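As a hedged sketch of what "storing a hash" could mean in practice, here is a salted PBKDF2 hash built from Python's standard library. The iteration count, salt size, and function names are assumptions chosen for illustration, not a recommendation from the answer; the salt and digest would go into two columns of the user table.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest); both are stored instead of the password itself."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```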
I suggest you read the Wikipedia article about database normalization.
It describes the different possibilities and the pros and cons of each. It really depends on what else you want to store and the relationship between the user and its properties.
Ideally one table should be used. If the number of columns becomes harder to manage only then you should move them to another table. In that case, ideally, the two tables should have a one-one relationship which you can easily establish by setting the foreign key in the related table as the primary key:
User
-------------------------------
UserID INT NOT NULL PRIMARY KEY
UserProfile
-------------------------------------------------------
UserID INT NOT NULL PRIMARY KEY REFERENCES User(UserID)
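The one-to-one layout above can be tried out with Python's sqlite3. The Username and Occupation columns and the sample rows are assumptions added so that there is something to insert; the key definitions follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE User (
    UserID   INTEGER NOT NULL PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE
);
CREATE TABLE UserProfile (
    -- Same column is primary key and foreign key: at most one profile per user.
    UserID     INTEGER NOT NULL PRIMARY KEY REFERENCES User (UserID),
    Occupation TEXT
);
INSERT INTO User VALUES (1, 'alice');
INSERT INTO UserProfile VALUES (1, 'Engineer');
""")

# A second profile row for the same user violates the primary key.
try:
    conn.execute("INSERT INTO UserProfile VALUES (1, 'Pilot')")
    duplicate_profile_rejected = False
except sqlite3.IntegrityError:
    duplicate_profile_rejected = True

# Reading a user together with the optional profile is a single LEFT JOIN.
user_row = conn.execute(
    """SELECT u.Username, p.Occupation
       FROM User u LEFT JOIN UserProfile p ON p.UserID = u.UserID
       WHERE u.UserID = 1""").fetchone()
```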
Depending on what kind of application it is, the answer might be different.
For an enterprise application where the users are also the employees, I would suggest two tables:
tbl_UserPersonallInformation (contains the personal information, like name, address, email, ...)
tbl_UserSystemInformation (contains other information, like Title, JoinedTheCompanyOn, LeftTheCompanyOn)
In systems such as document management or project information management, this might be necessary.
For example, in a company an employee might leave and rejoin after a few years, possibly with a different job title. The employee has some activities and records under the old title and will have more under the new one. So the system should record under which title (authority) each action was performed.