QUESTION: Is it okay to have "shortcut" identifiers in a table so that I don't have to do a long string of joins to get the information I need?
To understand what I'm talking about, I'm going to have to lay ouf an example here that looks pretty complicated but I've simplified the problem quite a bit here, and it should be easily understood (I hope).
The basic setup: A "company" can be an "affiliate", a "client" or both. Each "company" can have multiple "contacts", some of which can be "users" with log in privileges.
`Company` table
----------------------------------------------
ID Company_Name Address
-- ----------------------- -----------------
1 Acme, Inc. 101 Sierra Vista
2 Spacely Space Sprockets East Mars Colony
3 Cogswell Cogs West Mars Colony
4 Stark Industries Los Angeles, CA
We have four companies in our database.
`Affiliates` table
---------------------
ID Company_ID Price Sales
-- ---------- ----- -----
1 1 50 456
2 4 50 222
3 1 75 14
Each company can have multiple affiliate id's so that they can represent the products at different pricing levels to different markets.
Two of our companies are affiliates (Acme, Inc. and Stark Industries), and Acme has two affiliate ID's
`Clients` table
--------------------------------------
ID Company_ID Referring_affiliate_id
-- ---------- ----------------------
1 2 1
2 3 1
3 4 3
Each company can only be a client once.
Three of our companies are clients (Spacely Space Sprockets, Cogswell Cogs, and Stark Industries, who is also an affiliate)
In all three cases, they were referred to us by Acme, Inc., using one of their two affiliate ID's
`Contacts` table
-----------------------------------------
ID Name Email
-- -------------- ---------------------
1 Wylie Coyote wcoyote#acme.com
2 Cosmo Spacely boss#spacely.com
3 H. G. Cogswell ceo#cogs.com
4 Tony Stark tony#stark.com
5 Homer Simpson simpson#burnscorp.com
Each company has at least one contact, but in this table, there is no indication of which company each contact works for, and there's also an extra contact (#5). We'll get to that in a moment.
Each of these contacts may or may not have a login account on the system.
`Contacts_type` table
--------------------------------------
contact_id company_id contact_type
---------- ---------- --------------
1 1 Administrative
2 2 Administrative
3 3 Administrative
4 4 Administrative
5 1 Technical
4 2 Technical
Associates a contact with one or more companies.
Each contact is associated with a company, and in addition, contact 5 (Homer Simpson) is a technical contact for Acme, Inc, and contact 4 (Tony Stark) is a both an administrative contact for company 4 (Stark Industries) and a technical contact for company 3 (Cogswell Cogs)
`Users` table
-------------------------------------------------------------------------------------
ID contact_id company_id client_id affiliate_id user_id password access_level
-- ---------- ---------- --------- ------------ -------- -------- ------------
1 1 1 1 1 wylie A03BA951 2
2 2 2 2 NULL cosmo BF16DA77 3
3 3 3 3 NULL cogswell 39F56ACD 3
4 4 4 4 2 ironman DFA9301A 2
The users table is essentially a list of contacts that are allowed to login to the system.
Zero or one user per contact; one contact per user.
Contact 1 (Wylie Coyote) works for company 1 (Acme) and is a customer (1) and also an affiliate (1)
Contact 2 (Cosmo Spacely) works for company 2 (Spacely Space Sprockets) and is a customer (2) but not an affiliate
etc...
NOW finally onto the problem, if there is one...
Do I have a circular reference via the client_id and affiliate_id columns in the Users table? Is this a bad thing? I'm having a hard time wrapping my head around this.
When someone logs in, it checks their credentials against the users table and uses users.contact_id, users.client_id, and users.affiliate_id to do a quick look up rather than having to join together a string of tables to find out the same information. But this causes duplication of data.
Without client_id in the users table, I would have to find the following information out like this:
affiliate_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `affiliates`.`company_id`
client_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `clients`.`company_id`
company_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `company`.`company_id`
user's name: join `users`.`contact_id` to `contacts_types`.`contact_id` to `contacts`.`contact_id` > `name`
In each case, I wouldn't necessarily know if the user even has an entry in the affiliate table or the clients table, because they likely have an entry in only one of those tables and not both.
Is it better to do these kinds of joins and thread through multiple tables to get the information I want, or is it better to have a "shortcut" field to get me the information I want?
I have a feeling that over all, this is overly complicated in some way, but I don't see how.
I'm using MySQL.
it's better to do the joins. you should only be denormalizing your data when you have timed evidence of a slow response.
having said that, there are various ways to reduce the amount of typing:
use "as" to give shorter names to your fields
create views. these are "virtual tables" that already have your standard joins built-in, so that you don't have to repeat that stuff every time.
use "with" in sql. this lets you define something like a view within a single query.
it's possible mysql doesn't support all the above - you'll need to check the docs [update: ok, recent mysql seems to support views, but not "with". so you can add views to do the work of affiliate_id, client_id etc and treat them just like tables in your queries, but keeping the underlying data nicely organised.]
Related
I am designing a database, and I would like to know;
Can I answer this question with queries, how much skill employees earned from this trainings?
Is this a good structure to do it?
how much money spent per department
how much skill earned per employee
how much skill earned per department
id session_name Skill impact sugg dept function training_value training no
1 PHP Software 3 Sales 2 100usd 1
2 PHP Software 3 Finance 2 100usd 1
3 PHP communication 2 Sales 2 100usd 1
4 PHP communication 2 Finance 2 100usd 1
5 ASP Software 4 Sales 2 200usd 2
6 ASP Software 4 Finance 2 200usd 2
7 ASP database 1 Sales 2 200usd 2
8 ASP database 1 Finance 2 200usd 2
attended training table
id student_id training_no
1 1 1
1 1 2
student table
id name department
1 John 1
2 Mary 2
department table
id name
1 sales
2 finance
In the end I need to find skills for each student
john
software 7
communication 2
database 1
total spent
john 300 usd
total spent by department
sales 300 usd
Your schema looks OK to me.
You should, however, think about entities and relationships.
Your entities seem to be trainings, people, and departments.
You have a many:many relationship for people:trainings. That's good.
You have a one:many relationship for departments:people. That's also good.
It looks like you want some kind of relationship for trainings:departments. I'm guessing here, but you have a sugg dept column in your trainings table. Is that supposed to have a direct relationship to your departments table?
Do you actually need an extra entity called "attendance" rather than just a many-to-many relationship people:trainings. Do you want to record when a person did a training? Do you want to record how much that particular attendance cost? How about what marks they received if there was a quiz?
In that case, you'll want relationships where each person has zero or more attendances, each attendance has exactly one training, and each training has zero or more attendances.
My point: do the hard work of thinking through your entities and relationships, and the result will be a good design for your tables.
If I may put it another way: What part of the real world are you trying to capture in your data base? What's valuable in the real world that you want your data base to hold? In your application ...
Students are people. They are, umm, inherently valuable and persistent entities.
Trainings represent the labor and cost of creating them and presenting them.
Attendances represent the effort of students.
Departments probably pay the bill for attendances. They certainly represent power centers in your application.
What other items of value exist in this corner of the real world? Teachers? Managers? Venues (classrooms)? Equipment? Customers?
My point is, figure out your entities -- the items of value -- and the relationships between them. Then write your table definitions.
I have a table in Mysql that looks like this:
Name Slot_1 Slot_2 Slot_3
------ ------ ------ --------
MyName MyItem
MyName2
And I have this app with an interface to buy some products. When the client "MyName" buy something in the app, it would store it on "Slot_2" because "Slot_1" is already used; On the other hand, if MyName2 buys the same product, it would be stored on Slot_1 because it's the first free slot found. I searched through several links, but all I found was how to update the first empty row found, and not the column.
Note: The app is made in VB.net, so if you think it would be better to do this app-level, feel free to comment.
Note 2: This is the best db-design I found to do this "inventory". If you think there is a better way to do it, please point me to it.
That is really a bad design. I really suggest to read something basic on building relational databases.
Said that the only correct approach is to have at least three tables
The first table contains the customer informations
IDCustomer Name Address City
---------- ------ ------ --------
1 Steve XXXX YYYY
2 Andre KKKK ZZZZ
The second table contains the Items that can be sold
IDItem Item Price
---------- ------ ------
1 Apple 1
2 Orange 2
The third table links the Customers with the Items sold
IDItem IDCustomer Quantity
------- ----------- ---------
1 1 5
2 2 10
1 2 3
Also if you want to limit the items sold to a customer to only three, that is a business rule that should not be enforced by a database design like that.
My question about SEARCH query performance.
I've flattened out data into a read-only Person table (MySQL) that exists purely for search. The table has about 20 columns of data (mostly limited text values, dates and booleans and a few columns containing unlimited text).
Person
=============================================================
id First Last DOB etc (20+ columns)...
1 John Doe 05/02/1969
2 Sara Jones 04/02/1982
3 Dave Moore 10/11/1984
Another two tables support the relationship between Person and Activity.
Activity
===================================
id activity
1 hiking
2 skiing
3 snowboarding
4 bird watching
5 etc...
PersonActivity
===================================
id PersonId ActivityId
1 2 1
2 2 3
3 2 10
4 2 16
5 2 34
6 2 37
7 2 38
8 etc…
Search considerations:
Person table has potentially 200-300k+ rows
Each person potentially has 50+ activities
Search may include Activity filter (e.g., select persons with one and/or more activities)
Returned results are displayed with person details and activities as bulleted list
If the Person table is used only for search, I'm wondering if I should add the activities as comma separated values to the Person table instead of joining to the Activity and PersonActivity tables:
Person
===========================================================================
id First Last DOB Activity
2 Sara Jones 04/02/1982 hiking, snowboarding, golf, etc.
Given the search considerations above, would this help or hurt search performance?
Thanks for the input.
Horrible idea. You will lose the ability to use indexes in querying. Do not under any circumstances store data in a comma delimited list if you ever want to search on that column. Realtional database are designed to have good performance with tables joined together. Your database is relatively small and should have no performance issues at all if you index properly.
You may still want to display the results in a comma delimted fashion. I think MYSQL has a function called GROUP_CONCAT for that.
I would like to create a relational database in which there are two tables Users and products. Each item of each table can be related to many items of the second table.
My current implementation is as follows:
Two main tables-
->Users
User ID
UserInfo
->Products
Product ID
ProductInfo
Two different lookuptables
->UserToProduct
UserID
ProductID
->ProductToUSer
ProductID
UserID
Each time a relation from a user to a product is added, i just add an extra row to the first lookup table, and vice versa.
Is this the right way to do it? Are there any standard models for such scenarios that I can refer to?
You don't need two lookup tables, all you need is users_products. As far as resources, there are zillions, just google "database many to many".
UPDATE
Consider this data:
products
------------
id info
------------
1 car
2 flute
3 football
users
------------
id info
------------
10 bob
20 tim
30 manning
Now, as a simple example, let's say manning owns a football and and a car. Let's say tim owns a flute and a football. Now here's your lookup table:
users_products
----------------------
user_id product_id
----------------------
20 2
20 3
30 3
30 1
That's it. Now you can do queries like "give me all the users that have cars", or "give me all the cars that a user has", etc.
Cheers
You really don't need or want two different lookup tables. You should just have one (either of your tables, UserToProduct or ProductToUser, would be fine). The primary key of the lookup table should be a composite key consisting of both ProductID and UserID.
I am trying to store a family tree.
Here is the platform that I am using, Zend framework, Mysql, Ajax
I have searched stackoverflow I came across this post which is very helpful in handling data in terms of objects.
"Family Tree" Data Structure
I'll Illustrate my use case in brief.
User can create family members or friends based on few relations defined in database. I have Model for relations too. User can create family members like Divorced spouse, frineds. Max the Tree can be deep that we are assuming max to kids of the grandchildren but it can expand in width too. Brother/sister & their family.
I am looking an efficient database design for lesser query time. If I have to use the data structures described in above post where I must keep them as they necessary have to be a Model.
For representation I am planning to use Visualization: Organizational Chart from
http://code.google.com/apis/chart/interactive/docs/gallery/orgchart.html#Example
I'll summarize what I need
Database design
Placing of controllers (ajax) & models
The people that the user will create they will not be any other users. just some another data
yeah thats it! I'll post a complete solution on this thread when I'll be completing the project, of course with help of expertise of u guys
Thanks in advance
EDIT I I'll Contribute more to elaborate my situation
I have a user table, a relation table, & last family/family tree table
the Family table must have similar structure to following
ID userid relation id Name
1 34 3 // for son ABC
2 34 4 // for Wife XYZ
3 34 3 // for Mom PQR
4 34 3 // for DAd THE
5 34 3 // for Daughter GHI
6 34 3 // for Brother KLM
The drawback for this approach is generating relations to the other nodes like daughter-in-law, wifes brother & their family.
The ideal way of doing is for a user we can add Parents, siblings, children & for extra relations they must be derived from the family members relation i.e. Brother-in-law must be derived as sister's husband, or wife's brother.
THis is what I can think now. I just need Implementation guidelines.
Hope this helps u guys to provide a better solution.
I guess that from the database point of view it would be best to implement it like
id | name | parent_male | parent_female
Other option would be string prefixing
id | name | prefix
1 | Joe | 0001
2 | Jack | 000100001 //ie. Joes son
3 | Marry| 0001 //ie. Jacks mother
4 | Eve | 0002 // new family tree
5 | Adam | 00020001 // ie. Eves son
6 | Mark | 000200010001 // ie. Adams son
Other (more effective) algorithms like MPTT assume that the data is a tree, which in this case is not (it has circles).
To show it would work - to select Mark's grandparents:
--Mark
SELECT prefix FROM family_tree WHERE id = 6;
-- create substring - trim N 4-character groups from the end where N is N-th parent generation => 2 for grandparent ==> 0002
--grandparents
SELECT * FROM family_tree WHERE prefix = '0002'
-- same for other side of family
-- cousins from one side of family
SELECT * FROM family_tree WHERE prefix LIKE '0002%' AND LENGTH(prefix) = 12