Database design naming: should I use "id" or "table_name" + "id"? - mysql

For each table there is an id column, generally INT(11) AUTO_INCREMENT NOT NULL.
When I was in school I mostly named it id.
However, when I ran into more complex database designs, I found that if I use "id", I have to do more work in the SELECT query.
For example, there are tables "customer" and "customer_group".
So I simply get a customer and their customer_group info like this:
SELECT *
FROM customer
JOIN customer_group
ON customer.group_id = customer_group.id
Notice that two id columns will be returned. If I want to differentiate them, I need to do:
SELECT customer.id AS cid
,customer_group.id AS cgid
,customer.NAME
,.......
FROM customer
JOIN customer_group
ON customer.group_id = customer_group.id
That makes the work tedious and the query long. So what is the usual practice for naming the id: should I use table_name + "id"? Thanks

We can use table aliases to make this less tedious and more readable:
SELECT c.id AS cid
,cg.id AS cgid
,c.NAME
,.......
FROM customer c
JOIN customer_group cg
ON c.group_id = cg.id
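A small sketch of why the aliases matter, using Python's built-in sqlite3 module as a stand-in for MySQL (the table contents are made up for illustration). It collects the output column names of the query with and without aliases.

```python
import sqlite3

def column_names(conn, sql):
    """Return the output column names a query produces."""
    return [d[0] for d in conn.execute(sql).description]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_group (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (
        id       INTEGER PRIMARY KEY,
        name     TEXT,
        group_id INTEGER REFERENCES customer_group(id)
    );
    INSERT INTO customer_group VALUES (1, 'wholesale');
    INSERT INTO customer VALUES (10, 'Alice', 1);
""")

# SELECT * over the join returns two columns that are both named "id".
star_cols = column_names(conn, """
    SELECT * FROM customer
    JOIN customer_group ON customer.group_id = customer_group.id
""")

# Table aliases (c, cg) keep the query short; column aliases keep the ids apart.
aliased_sql = """
    SELECT c.id AS cid, cg.id AS cgid, c.name
    FROM customer c
    JOIN customer_group cg ON c.group_id = cg.id
"""
aliased_cols = column_names(conn, aliased_sql)
first_row = conn.execute(aliased_sql).fetchone()

print(star_cols)
print(aliased_cols)
print(first_row)
```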
When it comes to naming the ID, the best practice is a matter of opinion or of whatever implementation standard you follow; each approach has its benefits depending on the situation.
Generally, the ID column of table X should be named ID, but when it is used as a foreign key in another table, say table Y, it becomes X_ID.
This makes it clear in table Y that X_ID refers to (comes from) table X's ID.
Naming conventions start out as a matter of developer convenience, then become a methodology, and later become a standard.
But conventions also change as the development environment changes.
For a commercial project, I suggest choosing the naming convention your standard prescribes.
For a personal project, choose whatever suits your habits.

Related

Multivalued attributes in MySQL

I am working on a database in MySQL to show multivalued attributes. I am trying to find a way to create a parent and child table. The idea I am working with is having an employee table and a hobby table. The hobbies in the hobby table would be considered the multivalued attributes, since employees can have more than one hobby. My question is: when creating these tables, should I use 3NF and add a third table to show the relation between the two, or is there a way to implement this idea with just two tables? I'm not very familiar with multivalued attributes, so I need help designing the tables: what kind of keys each would use, as well as the final form. I believe I would need to put it in 3NF and have the hobbies as the multivalued attribute, with the hobby id from the hobby table as one primary key and the employee id as another primary key, and then have the relational table contain both the employee id and the hobby id as foreign keys referencing the other two tables. Any help or suggestions on how to show multivalued attributes would be greatly appreciated!
You have a table for employees already. It probably has a primary key we'll call employee_id.
You need a table for hobbies. It will need a primary key we'll call hobby_id.
Then, you need a way to relate employees and hobbies many-to-many. That's implemented with a third table, let's call it employees_hobbies. Using a name like that is a good idea, because the next guy to work on your code will recognize its purpose right away.
employees_hobbies should have two columns, employee_id and hobby_id. Those two columns together should be the composite primary key. Then, to confer a hobby on an employee, you add a row to employees_hobbies containing the two id values. If an employee drops a hobby, you delete the row.
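A minimal sketch of that three-table layout, using Python's sqlite3 in place of MySQL so it runs without a server. The employee and hobby rows are made up, and the helper names are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE hobbies (hobby_id INTEGER PRIMARY KEY, hobbyname TEXT NOT NULL);
    CREATE TABLE employees_hobbies (
        employee_id INTEGER NOT NULL REFERENCES employees(employee_id),
        hobby_id    INTEGER NOT NULL REFERENCES hobbies(hobby_id),
        PRIMARY KEY (employee_id, hobby_id)  -- composite key, no surrogate id
    );
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO hobbies VALUES (1, 'chess'), (2, 'hiking');
""")

def add_hobby(employee_id, hobby_id):
    """Confer a hobby on an employee: one row per (employee, hobby) pair."""
    conn.execute("INSERT INTO employees_hobbies VALUES (?, ?)", (employee_id, hobby_id))

def drop_hobby(employee_id, hobby_id):
    """The employee drops the hobby: delete that pair's row."""
    conn.execute(
        "DELETE FROM employees_hobbies WHERE employee_id = ? AND hobby_id = ?",
        (employee_id, hobby_id))

def rejected(employee_id, hobby_id):
    """True if the database refuses the pairing (duplicate or unknown id)."""
    try:
        add_hobby(employee_id, hobby_id)
        return False
    except sqlite3.IntegrityError:
        return True

add_hobby(1, 1)
add_hobby(1, 2)
duplicate_rejected = rejected(1, 1)   # the composite primary key blocks repeats
unknown_rejected = rejected(99, 1)    # the foreign key blocks unknown employees
drop_hobby(1, 2)
remaining = conn.execute("SELECT employee_id, hobby_id FROM employees_hobbies").fetchall()
```

The composite primary key doubles as the uniqueness rule, which is one reason a separate surrogate id adds little here.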
If you want a list of employees showing their hobbies, you do this
SELECT e.name, GROUP_CONCAT(h.hobbyname) hobbies
FROM employees e
LEFT JOIN employees_hobbies eh ON e.employee_id = eh.employee_id
LEFT JOIN hobbies h ON eh.hobby_id = h.hobby_id
GROUP BY e.employee_id, e.name
Use LEFT JOIN operations here to keep employees without any hobbies (all work and no play) in your list.
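To see the LEFT JOIN point concretely, here is a sketch with sqlite3 standing in for MySQL and invented data in which Bob has no hobbies. The same query run with INNER JOIN drops him, while LEFT JOIN keeps him with a NULL hobby list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hobbies (hobby_id INTEGER PRIMARY KEY, hobbyname TEXT);
    CREATE TABLE employees_hobbies (
        employee_id INTEGER, hobby_id INTEGER,
        PRIMARY KEY (employee_id, hobby_id));
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO hobbies VALUES (1, 'chess'), (2, 'hiking');
    INSERT INTO employees_hobbies VALUES (1, 1), (1, 2);  -- Bob has no hobbies
""")

def hobby_list(join_kind):
    """Run the employee/hobby listing with either LEFT or INNER joins."""
    sql = f"""
        SELECT e.name, GROUP_CONCAT(h.hobbyname) AS hobbies
        FROM employees e
        {join_kind} JOIN employees_hobbies eh ON e.employee_id = eh.employee_id
        {join_kind} JOIN hobbies h ON eh.hobby_id = h.hobby_id
        GROUP BY e.employee_id, e.name
        ORDER BY e.name
    """
    return conn.execute(sql).fetchall()

left = hobby_list("LEFT")    # Bob is kept, with hobbies = None (SQL NULL)
inner = hobby_list("INNER")  # Bob disappears from the result
print(left)
print(inner)
```

GROUP_CONCAT does not guarantee the order of the concatenated names here, so the check below compares them as a set.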
If you want to find the five most common hobbies and the employees doing them, try this
SELECT COUNT(e.employee_id) hobbycount, h.hobbyname,
GROUP_CONCAT(e.name ORDER BY e.name) people
FROM hobbies h
LEFT JOIN employees_hobbies eh ON h.hobby_id = eh.hobby_id
LEFT JOIN employees e ON eh.employee_id = e.employee_id
GROUP BY h.hobbyname
ORDER BY 1 DESC
LIMIT 5
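One detail worth checking in a query like this: with LEFT JOIN, a hobby that nobody has still produces one row (with NULL employee columns), so COUNT(*) reports 1 for it, while COUNT(e.employee_id) skips NULLs and reports 0. This sketch compares the two using sqlite3 as a stand-in for MySQL and made-up data; GROUP_CONCAT's ORDER BY option is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hobbies (hobby_id INTEGER PRIMARY KEY, hobbyname TEXT UNIQUE);
    CREATE TABLE employees_hobbies (
        employee_id INTEGER, hobby_id INTEGER,
        PRIMARY KEY (employee_id, hobby_id));
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO hobbies VALUES (1, 'chess'), (2, 'hiking'), (3, 'origami');
    INSERT INTO employees_hobbies VALUES (1, 1), (2, 1), (1, 2);  -- nobody does origami
""")

def top_hobbies(count_expr, limit=5):
    """Most common hobbies, counted with the given aggregate expression."""
    sql = f"""
        SELECT {count_expr} AS hobbycount, h.hobbyname,
               GROUP_CONCAT(e.name) AS people
        FROM hobbies h
        LEFT JOIN employees_hobbies eh ON h.hobby_id = eh.hobby_id
        LEFT JOIN employees e ON eh.employee_id = e.employee_id
        GROUP BY h.hobbyname
        ORDER BY hobbycount DESC, h.hobbyname
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()

counts_star = {name: n for n, name, _ in top_hobbies("COUNT(*)")}
counts_col = {name: n for n, name, _ in top_hobbies("COUNT(e.employee_id)")}
print(counts_star)  # origami is counted once even though nobody has it
print(counts_col)   # origami is counted as 0
```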
This way of handling many-to-many relationships gives you all kinds of ways of slicing and dicing your data.
MySQL is made for this sort of thing and handles it very efficiently at small scale and large, opinions to the contrary notwithstanding.
(Avoid putting a surrogate primary id key into your employees_hobbies table. It adds no value.)
MVA (multi-valued attributes) are not a great fit for MySQL.
The best way to store them depends on the price you can pay and the access you need. If you need an index because the database is big and heavily loaded, then you will probably need 2 tables.
employee( id , name )
employee_hobbies ( id , employeeid , hobbyid )
But in the simplest case, or if you need easy access, you can just add a text field to the employee table, store a comma-separated list of hobby ids there, and then select with the FIND_IN_SET() function.
E.g., a single table:
employee( id , name , MVA VARCHAR(512) )
You need to be sure that all the comma-separated ids will fit within the field's size.
SELECT * from employee where FIND_IN_SET(some_hobbyid , MVA)
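SQLite has no FIND_IN_SET(), so this sketch registers a small Python function that mimics MySQL's documented behavior (1-based position of the item in a comma-separated list, 0 if it is absent or the list is empty, NULL if either argument is NULL) and runs the single-table query against made-up rows. It shows that the match is on whole items, so searching for hobby 3 does not match "13". A predicate like this generally cannot use an index, so every row gets scanned.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Stand-in for MySQL FIND_IN_SET: 1-based position of needle in a
    comma-separated list, 0 if absent or the list is empty, None if NULL."""
    if needle is None or haystack is None:
        return None
    if haystack == "":
        return 0
    items = haystack.split(",")
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
# Register the Python function under the MySQL name so the SQL reads the same.
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, MVA VARCHAR(512));
    INSERT INTO employee VALUES (1, 'Ann', '3,7,12'), (2, 'Bob', '1,13'), (3, 'Cy', '');
""")

def employees_with_hobby(hobby_id):
    """Names of employees whose MVA list contains hobby_id as a whole item."""
    rows = conn.execute(
        "SELECT name FROM employee WHERE FIND_IN_SET(?, MVA) ORDER BY id",
        (str(hobby_id),))
    return [name for (name,) in rows]

print(employees_with_hobby(3))   # only Ann: '13' in Bob's list is not a match
print(employees_with_hobby(13))
```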
The advantage of this method is fewer queries;
the disadvantage is that it may be slower than the first.
There are also advantages for high-load systems when importing into Sphinx... but that is another story...

Whats the best way to implement a database with multivalued attributes?

I am trying to implement a database that has multi-valued attributes and to create a filter-based search. For example, I want my people_table to contain id, name, address, hobbies, and interests (hobbies and interests are multi-valued). The user will be able to check many attributes, and SQL will return only those people who have all of them.
I did some research and found several ways to implement this, but I can't decide which one is best.
The first is to have one table with the basic info of people (id, name, address), two more for the multi-valued attributes, and one more that contains only the keys of the other tables (I understand how to create these tables; I don't yet know how to implement the search).
The second is to have one table with the basic info and then one for each attribute. So I will have 20 or more tables (football, paint, golf, music, hiking, etc.) that only contain the ids of the people. Then, when the user checks the hobbies and activities, I will get the desired results using JOINs (I am not sure about the complexity, so I don't know how fast it will be if the user checks many boxes).
The last is an implementation I didn't find on the internet (and I know there is a reason :) ), but in my mind it is the easiest to implement and the fastest in terms of complexity: use only one table that has the basic info as normal and also all the attributes as boolean columns. So if I have 1000 people in my table, there will be only 1000 loops, which I imagine will be fast enough with AND conditions.
So my question is: can I use the third implementation, or is there a big disadvantage that I'm missing? And which of the first two ways do you suggest?
That is a typical n-to-m relation. It works like this:
persons table
------------
id
name
address
interests table
---------------
id
name
person_interests table
----------------------
person_id
interest_id
person_interests contains a record for each interest of a person. To get the interests of a person do:
select i.name
from interests i
join person_interests pi on pi.interest_id = i.id
join persons p on pi.person_id = p.id
where p.name = 'peter'
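A runnable sketch of this layout and lookup, with sqlite3 standing in for MySQL. The rows are invented, the name is passed as a parameter instead of a literal, and the composite primary key on person_interests is an added assumption rather than part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_interests (
        person_id   INTEGER REFERENCES persons(id),
        interest_id INTEGER REFERENCES interests(id),
        PRIMARY KEY (person_id, interest_id));   -- assumed, one row per pair
    INSERT INTO persons VALUES (1, 'peter', 'Main St'), (2, 'mary', 'Elm St');
    INSERT INTO interests VALUES (1, 'music'), (2, 'golf'), (3, 'painting');
    INSERT INTO person_interests VALUES (1, 1), (1, 3), (2, 2);
""")

def interests_of(person_name):
    """Names of the interests linked to the given person."""
    rows = conn.execute("""
        SELECT i.name
        FROM interests i
        JOIN person_interests pi ON pi.interest_id = i.id
        JOIN persons p ON pi.person_id = p.id
        WHERE p.name = ?
        ORDER BY i.name
    """, (person_name,))
    return [name for (name,) in rows]

print(interests_of("peter"))
```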
You could also create tables for hobbies. To get the hobbies, do the same in a separate query. To get both in one query, you can do something like this:
select p.id, p.name,
i.name as interest,
h.name as hobby
from persons p
left join person_interests pi on pi.person_id = p.id
left join interests i on pi.interest_id = i.id
left join person_hobbies ph on ph.person_id = p.id
left join hobbies h on ph.hobby_id = h.id
where p.name = 'peter'
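A caveat with the combined query, shown here with sqlite3 as a stand-in for MySQL and made-up data: because both join tables hang off the same person, every interest is paired with every hobby, so a person with 2 interests and 2 hobbies comes back as 4 rows. Separate queries, or aggregating each side before joining, avoid that multiplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hobbies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_interests (person_id INTEGER, interest_id INTEGER);
    CREATE TABLE person_hobbies (person_id INTEGER, hobby_id INTEGER);
    INSERT INTO persons VALUES (1, 'peter');
    INSERT INTO interests VALUES (1, 'music'), (2, 'history');
    INSERT INTO hobbies VALUES (1, 'golf'), (2, 'hiking');
    INSERT INTO person_interests VALUES (1, 1), (1, 2);
    INSERT INTO person_hobbies VALUES (1, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT p.id, p.name, i.name AS interest, h.name AS hobby
    FROM persons p
    LEFT JOIN person_interests pi ON pi.person_id = p.id
    LEFT JOIN interests i ON pi.interest_id = i.id
    LEFT JOIN person_hobbies ph ON ph.person_id = p.id
    LEFT JOIN hobbies h ON ph.hobby_id = h.id
    WHERE p.name = 'peter'
""").fetchall()

# Each interest is paired with each hobby: 2 interests x 2 hobbies = 4 rows.
pairs = {(interest, hobby) for _, _, interest, hobby in rows}
print(len(rows), sorted(pairs))
```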
The basic way to deal with this is with a many-to-many join table. Each user can have many hobbies. Each hobby can have many users. That's basic stuff you can find information about anywhere, and @juergend already covered that.
The harder part is tracking different information about various hobbies and interests. Like if their hobby is "baseball" you might want to track what position they play, but if their hobby is "travel" you might want to track their favorite countries. Doing this with typical SQL relationships will lead to a rapid proliferation of tables and columns.
A hybrid approach is to use the new JSON data type to store some unstructured data. To expand on @juergend's example, you might add a column to Person_Interests that stores some of those details about that person's interest.
create table Person_Interests (
InterestID integer references Interests(ID),
PersonID integer references Persons(ID),
Details JSON
);
And now you could add that Person 45 has Interest 12 (travel), their favorite country is Djibouti, and they've been to 45 countries.
insert into person_interests
(InterestID, PersonID, Details)
values (12, 45, '{"favorite_country": "Djibouti", "countries_visited": 45}');
And you can use JSON search functions to find, for example, everyone whose favorite country is Djibouti.
select p.id, p.name
from person_interests pi
join persons p on p.id = pi.personid
where pi.details->"$.favorite_country" = "Djibouti"
The advantage here is flexibility: interests and their attributes aren't limited by your database schema.
The disadvantage is performance. The JSON data type isn't the most efficient, and indexing a JSON column in MySQL is complicated. Good indexing is critical to good SQL performance. So as you figure out common patterns, you might want to turn commonly used attributes into real columns in real tables.
The other option would be to use table inheritance. This is a feature of Postgres, not MySQL, and I'd recommend considering switching. Postgres also has better and more mature JSON support and JSON columns are easier to index.
With table inheritance, rather than having to write a completely new table for every different interest, you can make specific tables which inherit from a more generic one.
create table person_interests_travel (
FavoriteCountry text,
CountriesVisited text[]
) inherits(person_interests);
This still has InterestID, PersonID, and Details, but adds specific columns for tracking their favorite country and the countries they've visited.
Note the text[] type. PostgreSQL also supports arrays, so you can store real lists without having to create another join table. You can also do this in MySQL with a JSON column, but arrays offer type constraints that JSON does not.

Aid with SQL query for my Java application

I'm not much of an SQL guy, so forgive me if something similar has been asked before. I'm not even sure what I would need to search for in order to learn this. Since I only need to do something like this once, I thought I could justify asking.
I'm writing one of my first Android applications that needs to talk to an online database, and I have successfully written a couple of SQL queries that work well with my application, but this one is slightly too complicated for my basic knowledge.
Below I have provided a sample of what I need in what I feel is understandable by anyone with at least a basic knowledge of SQL. I am wondering if any kind soul would be able to help scratch up a query or give me a little insight for what I would need to do. Thanks in advance!
Pseudo Sample:
SELECT *
FROM events
WHERE user_has_event.user_user_id = user.user_id AND user_has_event.attendance = 1 OR 2
JOIN attendance
Here is a basic visual of my tables (Without user table):
Event Table
-----------------------------------
|event_id|event_name|event_society|
-----------------------------------
|        |          |             |
User_has_event Table
----------------------------------------
|user_user_id|event_event_id|attendance|
----------------------------------------
|            |              |          |
Here is my desired outcome:
Outcome Table
----------------------------------------------
|event_id|event_name|event_society|attendance|
----------------------------------------------
|        |          |             |          |
Since your knowledge of SQL is basic, I'll expand a bit (well, as it turns out, rather a lot) on Andy's answer. First, the t1 and t2 aliases are not required, but they are a convenience. You can refer to a table directly, and you don't even have to qualify a field with its table if the field name is unique. You could do this:
SELECT
events.event_id,
events.event_name,
events.event_society,
user_has_event.attendance
FROM
events
INNER JOIN user_has_event ON events.event_id = user_has_event.event_event_id
As you can see, that is rather long-winded and tedious. So when you first reference a table, you can immediately follow it with an abbreviation, as Andy has done; this is generally considered best practice. Now, you could also do this:
SELECT
event_id,
event_name,
event_society,
attendance
FROM
events
INNER JOIN user_has_event ON event_id = event_event_id
You can get away with this because all of the field names are unique across the tables in your SELECT statement. Since this often isn't true, it's not a good idea: it's too easy to miss an ambiguous reference. Andy's is the best way to do it. Now, you might have gone out of your way to use different field names because you didn't know that you could reference a field with Table.Field syntax. It's often clearer to use the same field name, though different people feel differently about this. I generally just use "ID" for the primary key in each table. That works because you can resolve ambiguities by using Table.Field to refer to a field.
This leads to the next thing you will find it helpful to know, which is that you can assign whatever field name you want to the output with the AS keyword. Suppose I rename your fields thus:
Events
ID
Name
Society
UserEvent
ID
EventID
Attendance
Now, have a look at this:
SELECT
e.ID AS 'Event ID',
e.Name AS 'Event Name',
e.Society AS 'Event Society',
ue.Attendance
FROM
Events e
INNER JOIN UserEvent ue ON e.ID = ue.EventID
Now you have decoupled the name of the selected field from the name of the field in the outcome, which should save you headaches down the line. An important principle is that the way that you store the data and the way that you format data output should be loosely coupled. You don't want considerations of how you want your output data to look to dictate how you should name your fields, so you need to know this stuff.
Now, let's pretend that you also have a User table (you probably do). Let's say it looks like this (it probably doesn't):
User
ID
FirstName
LastName
OtherStuff
Now, we'll modify the UserEvent table thus, to include a foreign key to the User table:
UserEvent
ID
EventID
UserID
Attendance
Now, have a look at this:
SELECT
e.Name AS 'Event Name',
e.Society AS 'Event Society',
CONCAT(u.LastName, ', ', u.FirstName) AS 'User Name',
ue.Attendance
FROM
Events e
JOIN UserEvent ue ON e.ID = ue.EventID
JOIN User u ON u.ID = ue.UserID
This should give you the basics, except for the WHERE clause, the basics of which you can probably pick up on your own (feel free to ask questions about the WHERE clause as well).
One side note: JOIN is the same as INNER JOIN, the most common type of join, which returns only the rows that match in both tables. There are also LEFT and RIGHT (outer) joins, and in some databases, though not MySQL, FULL OUTER joins. I generally just say JOIN rather than INNER JOIN; again, different people feel differently about this. Consistency is the most important principle here.
You can add in the USER table similarly... but for the basic output you requested, see the following:
SELECT
t1.event_id,
t1.event_name,
t1.event_society,
t2.attendance
FROM
events t1
INNER JOIN user_has_event t2 ON t1.event_id = t2.event_event_id
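The desired outcome also needs the filter from the pseudo sample. Here is a hedged sketch with sqlite3 standing in for MySQL and invented rows. The condition "attendance = 1 OR 2" is parsed as (attendance = 1) OR 2, and since 2 is always true it filters nothing, whereas attendance IN (1, 2) does what was meant. The ? placeholder for the user id works the same way with JDBC's PreparedStatement on the Java side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_name TEXT, event_society TEXT);
    CREATE TABLE user_has_event (
        user_user_id INTEGER, event_event_id INTEGER, attendance INTEGER);
    INSERT INTO events VALUES (1, 'Freshers Fair', 'Chess'), (2, 'AGM', 'Chess'),
                              (3, 'Pub Quiz', 'Drama');
    INSERT INTO user_has_event VALUES (7, 1, 1), (7, 2, 2), (7, 3, 0), (8, 1, 1);
""")

def events_for_user(user_id, attendance_filter):
    """Andy's join plus a user filter and the given attendance condition."""
    sql = f"""
        SELECT t1.event_id, t1.event_name, t1.event_society, t2.attendance
        FROM events t1
        INNER JOIN user_has_event t2 ON t1.event_id = t2.event_event_id
        WHERE t2.user_user_id = ? AND ({attendance_filter})
        ORDER BY t1.event_id
    """
    return conn.execute(sql, (user_id,)).fetchall()

correct = events_for_user(7, "t2.attendance IN (1, 2)")
# Parsed as (attendance = 1) OR 2; 2 counts as true, so nothing is filtered.
naive = events_for_user(7, "t2.attendance = 1 OR 2")
print(correct)
print(naive)
```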

What is the proper way to store friendship associations in a mysql DB

I want to create a table where my users can record a friendship with one another. At the same time, this table will work in conjunction with what I would like to be a one-to-many relation between various other tables I am working on.
Right now I am thinking of something like this
member_id, friend_id, active, date
member_id would be the column for the user making the request, friend_id the column for the friend they are trying to connect to, active a toggle of sorts (0 = pending, 1 = active), and date just a logged date of the last activity on that particular row.
My confusion is that when I query, I would typically query by member_id and then base the rest of the query on the associated friend_ids to display data to the right people. With that logic in mind, I think I would need 2 rows per request: one with the requesting member_id and the requested friend_id, and one the other way around, so I could query the same way every time. So in essence it's like double dipping: for every action on this table, I need to perform 2 similar actions to make it work.
Overall, that does not make sense to me as far as optimization goes. So my question is: what is the proper way to handle data for relations like this? Or am I actually thinking sanely about this approach?
If a friendship is always mutual, then you can choose between data redundancy (i.e. both directions having a row) for the sake of simpler queries, or learn to live with slightly more complex queries. I'd personally avoid data redundancy unless there is a compelling reason otherwise - you're not just wasting space and performance, but you'll need to be careful when enforcing it - a simple CHECK is incapable of referencing other rows and depending on your DBMS a trigger may be limited in what it can do with a mutating table.
An easy way to ensure only one row per friendship is to always insert the lower value in member_id and the higher value in friend_id (add a constraint CHECK (member_id < friend_id) to enforce it; note that MySQL only enforces CHECK constraints from 8.0.16 on, and earlier versions parse and ignore them). Then, when you query, you'll have to search in both directions; for example, finding all friends of a given person (identified by person_id) would look something like this:
SELECT *
FROM
person
WHERE
id <> :person_id
AND (
id IN (
SELECT friend_id
FROM friendship
WHERE member_id = :person_id
)
OR
id IN (
SELECT member_id
FROM friendship
WHERE friend_id = :person_id
)
)
BTW, in this scheme, you'd probably want to rename member_id and friend_id to, say, friend1_id and friend2_id...
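A minimal sketch of the one-row-per-friendship scheme, with sqlite3 standing in for MySQL and made-up people. The befriend helper is hypothetical: it normalizes the pair so the lower id goes first, the CHECK rejects a reversed pair, and the two-direction query is the one above with its :person_id named parameter. The primary key on the pair is an added assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendship (
        member_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (member_id, friend_id),
        CHECK (member_id < friend_id)   -- one canonical row per pair
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Di');
""")

def befriend(a, b):
    """Store the pair lower id first, so (3, 2) and (2, 3) are the same row."""
    conn.execute("INSERT INTO friendship VALUES (?, ?)", (min(a, b), max(a, b)))

def friends_of(person_id):
    """Search both columns, since a friend may sit on either side."""
    rows = conn.execute("""
        SELECT name FROM person
        WHERE id <> :person_id
          AND (id IN (SELECT friend_id FROM friendship WHERE member_id = :person_id)
               OR id IN (SELECT member_id FROM friendship WHERE friend_id = :person_id))
        ORDER BY id
    """, {"person_id": person_id})
    return [name for (name,) in rows]

befriend(1, 2)
befriend(3, 2)   # stored as (2, 3)
befriend(2, 4)

try:
    conn.execute("INSERT INTO friendship VALUES (4, 1)")  # violates the CHECK
    reversed_pair_rejected = False
except sqlite3.IntegrityError:
    reversed_pair_rejected = True

print(friends_of(2))
```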
Two ways to look at it:
WHERE ((friend_id = x AND member_id = y) OR (friend_id = y AND member_id = x))
would allow you to query by simply stating one side of the relationship. If both sides are added, this method would still work without causing duplicate rows to be returned.
Conversely, adding both sides of the relationship, so that your queries consist of
WHERE friend_id = x AND member_id = y
not only makes queries easier to write, but also easier to plan (meaning better DB performance).
My vote is for the latter option.
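If you store both directions, the main risk is ending up with only one of the two rows. A hedged sketch with sqlite3 as a stand-in for MySQL: the hypothetical befriend helper writes both rows inside one transaction, so a failure rolls back both, and lookups then only need one side of the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friendship (
        member_id INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (member_id, friend_id))
""")

def befriend(a, b):
    """Insert both directions atomically: both rows land or neither does."""
    with conn:  # commits on success, rolls back if either INSERT fails
        conn.execute("INSERT INTO friendship VALUES (?, ?)", (a, b))
        conn.execute("INSERT INTO friendship VALUES (?, ?)", (b, a))

def are_friends(x, y):
    """One-sided lookup; it works because the reverse row is stored too."""
    row = conn.execute(
        "SELECT 1 FROM friendship WHERE friend_id = ? AND member_id = ?", (x, y)
    ).fetchone()
    return row is not None

def friends_of(member_id):
    rows = conn.execute(
        "SELECT friend_id FROM friendship WHERE member_id = ? ORDER BY friend_id",
        (member_id,))
    return [fid for (fid,) in rows]

befriend(1, 2)
befriend(3, 1)
try:
    befriend(5, 5)   # the second INSERT duplicates the first, so both roll back
except sqlite3.IntegrityError:
    pass

print(friends_of(1))
```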
Beautiful - there's no problem with your table as-is.
ALSO:
I'm not sure if this cardinality is "one to many", or "many to many":
http://en.wikipedia.org/wiki/Cardinality_%28data_modeling%29
Q: I were to query I would typically query for member_id then base the
rest of the query off of associated friend_id's to display data
accordingly to the right people
A: Frankly, I don't see any problem querying "member to friend", or "friend to member" (or any other combinations - e.g. friends who share friends). Again, it looks good.
Introduce a helper table like:
users
user_id, name, ...
friendship
user_id, friend_id, ....
select u.name as user, u2.name as friend from users u
inner join friendship f on f.user_id = u.user_id
inner join users u2 on u2.user_id = f.friend_id
I think this is pretty similar to what you have, just putting a query as an example.

MySQL join tables based on column data and table name

I'm wondering if this is even possible.
I want to join 2 tables based on the data in table 1.
For example, table 1 has a column food whose data is "hotdog".
And I have a table called hotdog.
Is it possible to do a JOIN like this:
SELECT * FROM table1 t join t.food on id = foodid
I know it doesn't work, but is it even possible? Is there a workaround?
Thanks in advance.
No, you can't join to a different table per row in table1, not even with dynamic SQL as @Cade Roux suggests.
You could join to the hotdog table for rows where food is 'hotdog' and join to other tables for other specific values of food.
SELECT * FROM table1 JOIN hotdog ON id = foodid WHERE food = 'hotdog'
UNION
SELECT * FROM table1 JOIN apples ON id = foodid WHERE food = 'apples'
UNION
SELECT * FROM table1 JOIN soups ON id = foodid WHERE food = 'soup'
UNION
...
This requires that you know all the distinct values of food, and that all the respective food tables have compatible columns so you can UNION them together.
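A sketch of the UNION approach with sqlite3 as a stand-in for MySQL. The table layout (table1 with id, food and foodid) and the detail column are assumptions for illustration. It lists explicit columns so both branches line up, and it uses UNION ALL instead of UNION: the branches cannot overlap because each filters on a different food value, so the duplicate-removal step UNION performs is unnecessary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, food TEXT, foodid INTEGER);
    CREATE TABLE hotdog (id INTEGER PRIMARY KEY, detail TEXT);
    CREATE TABLE apples (id INTEGER PRIMARY KEY, detail TEXT);
    INSERT INTO table1 VALUES (1, 'hotdog', 10), (2, 'apples', 20), (3, 'hotdog', 11);
    INSERT INTO hotdog VALUES (10, 'chili dog'), (11, 'corn dog');
    INSERT INTO apples VALUES (20, 'granny smith');
""")

# One branch per known food value; every branch returns the same three columns.
rows = conn.execute("""
    SELECT t.id, t.food, x.detail
    FROM table1 t JOIN hotdog x ON x.id = t.foodid
    WHERE t.food = 'hotdog'
    UNION ALL
    SELECT t.id, t.food, x.detail
    FROM table1 t JOIN apples x ON x.id = t.foodid
    WHERE t.food = 'apples'
    ORDER BY 1
""").fetchall()
print(rows)
```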
What you're doing is called polymorphic associations. That is, the foreign key in table1 references rows in multiple "parent" tables, depending on the value in another column of table1. This is a common design mistake of relational database programmers.
For alternative solutions, see my answers to:
Possible to do a MySQL foreign key to one of two possible tables?
Why can you not have a foreign key in a polymorphic association?
I also cover solutions for polymorphic associations in my presentation Practical Object Oriented Models In SQL, and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Only with dynamic SQL. It is also possible to LEFT JOIN many different tables and use CASE based on the type, but the tables would all have to be known in advance.
It would be easier to recommend an appropriate design if we knew more about what you are trying to achieve, what your design currently looks like and why you've chosen that particular table design in the first place.
-- Say you have a table of foods:
id INT
foodtype VARCHAR(50) (right now it just contains 'hotdog' or 'hamburger')
name VARCHAR(50)
-- Then hotdogs:
id INT
length INT
width INT
-- Then hamburgers:
id INT
radius INT
thickness INT
Normally I would recommend some system for constraining only one auxiliary table to exist, but for simplicity, I'm leaving that out.
SELECT f.*, hd.length, hd.width, hb.radius, hb.thickness
FROM foods f
LEFT JOIN hotdogs hd
ON hd.id = f.id
AND f.foodtype = 'hotdog'
LEFT JOIN hamburgers hb
ON hb.id = f.id
AND f.foodtype = 'hamburger'
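A runnable version of this query using sqlite3 as a stand-in for MySQL, with made-up rows. A stray hotdogs row with id 2 is included on purpose: the f.foodtype condition in the ON clause keeps it from attaching to the hamburger, and each result row carries NULLs in the other subtype's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foods (id INTEGER PRIMARY KEY, foodtype VARCHAR(50), name VARCHAR(50));
    CREATE TABLE hotdogs (id INTEGER PRIMARY KEY, length INT, width INT);
    CREATE TABLE hamburgers (id INTEGER PRIMARY KEY, radius INT, thickness INT);
    INSERT INTO foods VALUES (1, 'hotdog', 'Chicago dog'), (2, 'hamburger', 'Cheeseburger');
    INSERT INTO hotdogs VALUES (1, 15, 3), (2, 99, 99);  -- (2, ...) is a stray row
    INSERT INTO hamburgers VALUES (2, 6, 2);
""")

rows = conn.execute("""
    SELECT f.*, hd.length, hd.width, hb.radius, hb.thickness
    FROM foods f
    LEFT JOIN hotdogs hd
        ON hd.id = f.id
        AND f.foodtype = 'hotdog'      -- only hotdog rows pick up hotdog columns
    LEFT JOIN hamburgers hb
        ON hb.id = f.id
        AND f.foodtype = 'hamburger'
    ORDER BY f.id
""").fetchall()
print(rows)
```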
Now you can see that such a query could be code-generated (or, for a very slow prototype, built as dynamic SQL on the fly) from SELECT DISTINCT foodtype FROM foods, given certain assumptions about table names and access to the table metadata.
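A sketch of that code generation, again with sqlite3 standing in for MySQL. The naming rule (foodtype 'x' lives in table 'xs', keyed by id) is a hypothetical assumption, and SQLite's PRAGMA table_info plays the role of the metadata lookup (in MySQL you would query information_schema instead). Because the type values are pasted into the SQL text, the generator refuses anything that is not a plain identifier.

```python
import sqlite3

def build_food_query(conn):
    """Generate the per-type LEFT JOIN query from the data plus table metadata.
    Hypothetical naming rule: foodtype 'x' is stored in table 'xs', keyed by id."""
    types = [t for (t,) in conn.execute(
        "SELECT DISTINCT foodtype FROM foods ORDER BY foodtype")]
    select_cols, joins = ["f.*"], []
    for n, ftype in enumerate(types):
        if not ftype.isidentifier():  # values go into SQL text, so be strict
            raise ValueError(f"unexpected foodtype {ftype!r}")
        table, alias = ftype + "s", f"t{n}"
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")
                if r[1] != "id"]
        select_cols += [f"{alias}.{c}" for c in cols]
        joins.append(f"LEFT JOIN {table} {alias} "
                     f"ON {alias}.id = f.id AND f.foodtype = '{ftype}'")
    return "SELECT " + ", ".join(select_cols) + "\nFROM foods f\n" + "\n".join(joins)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foods (id INTEGER PRIMARY KEY, foodtype VARCHAR(50), name VARCHAR(50));
    CREATE TABLE hotdogs (id INTEGER PRIMARY KEY, length INT, width INT);
    CREATE TABLE hamburgers (id INTEGER PRIMARY KEY, radius INT, thickness INT);
    INSERT INTO foods VALUES (1, 'hotdog', 'Chicago dog'), (2, 'hamburger', 'Cheeseburger');
    INSERT INTO hotdogs VALUES (1, 15, 3);
    INSERT INTO hamburgers VALUES (2, 6, 2);
""")

sql = build_food_query(conn)
print(sql)
rows = conn.execute(sql + "\nORDER BY f.id").fetchall()
```

Adding a new food type and table changes the generated column list, which is exactly what the consumer of the result then has to cope with.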
The problem is that ultimately whoever consumes the result of this query will have to be aware of new columns showing up whenever a new table is added.
So the question moves back to your client/consumer of the data - how is it going to handle the different types? And what does it mean for different types to be in the same set? And if it needs to be aware of the different types, what's the drawback of just writing different queries for each type or changing a manual query when new types are added given the relative impact of such a change anyway?