Table join--multiple rows to/ from one column (/ cell) - mysql

I have searched for a solution for this problem, but haven't found it (yet), probably because I don't quite know how to explain it properly myself. If it is posted somewhere already, please let me know.
What I have is three databases that are related to each other; main, pieces & groups. Basically, the main database contains the most elementary/ most used information from a post and the pieces database contains data that is associated with that post. The groups database contains all of the (long) names of the groups a post in the main database can be 'posted in'. A post can be posted in multiple groups simultaneously. When a new post is added to my site, I check the pieces too see if there are any duplicates (check if the post has been posted already). In order to make the search for duplicates more effective, I only check the pieces that are posted in the same group(s).
Hopefully you're still with me, cause here's where it starts to get really confusing I think (let me know if I need to specify things more clearly): right now, both the main and the pieces database contain the full name of the group(s) (basically I'm not using the groups database at all). What I want to do is replace the names of those groups with their associated IDs from the groups database. For example, I want to change this:
from:
MAIN_table:
id  |  group_posted_in
--------|---------------------------
1   | group_1, group_5
2   | group_15, group_75
3   | group_1, group_215
GROUPS_table:
id  |  group_name
--------|---------------------------
1   | group_1
2   | group_2
3   | group_3
etc...
into:
MAIN_table:
id  |  group_posted_in
--------|---------------------------
1   | 1,5
2   | 15,75
3   | 1,215
Or something similar to this. However, This format specifically causes issues as the following query will return all of the rows (from the example), instead of just the one I need:
SELECT * FROM main_table WHERE group = '5'
I either have to change the query to something like this:
...WHERE group = '5' OR group = '5,%' OR group = '%,5,%' OR group = '%,5'
Or I have to change the database structure from Comma Separated Values to something like this: [15][75]. The accompanying query would be simpler, but it somehow seems like a cumbersome solution to me. Additionally, (simple) joins will not be easy/ possible at all. It will always require me to run a separate query to fetch the names of the groups--whether a user searches for posts in a specific group (in which case, I first have to run a query to fetch the id's, then to search for the associated posts), or whether it is to display them (first the posts, then another query to match the groups).
So, in conclusion: I suppose I know there is a solution to this problem, but my gut tells me that it is not the right/ best way to do it. So, I suppose the question that ties this post together is:
What is the correct method to connect the group database to the others?

For a many-to-many relationship, you need to create a joining table. Rather than storing a list of groups in a single column, you should split that column out into multiple rows in a separate table. This will allow you to perform set based functions on them and will significantly speed up the database, as well as making it more robust and error proof.
Main
MainID ...
Group
GroupID GroupName
GroupsInMain
GroupsInMainID MainID(FK) GroupID(FK)
So, for MainID 1, you would have GroupsInMain records:
1,1,1
2,1,5
This associates groups 1 and 5 with MainID 1
FK in this case means a Foreign Key (i.e. a reference to a primary key in another table). You'd probably also want to add a unique constraint to GroupsInMain on MainID and GroupID, since you'd never want the same values for the pairing to show up more than once.
Your query would then be:
select GroupsInMain.MainID, Group.GroupName
from Group, GroupsInMain
where Group.GroupID=GroupsInMain.GroupID
and Group.GroupID=5

Related

SQL only get rows that matches full number split by a comma

I'm working on something that shows shops under a specific category, however I have an issue because I store the categories of a shop like this in a record with the id of a category. "1,5,12". Now, the problem is if I want to show shops with category 2, it "mistakens" 12 as category 2. This is the SQL right now.
SELECT * FROM shops WHERE shop_cats LIKE '%".$sqlid."%' LIMIT 8
Is there a way to split the record "shop_cats" by a comma in SQL, so it checks the full number? The only way I can think of is to get all the shops, and do it with PHP, but I don't like that as it will take too many resources.
This is a really, really bad way to store categories, for many reasons:
You are storing numbers as strings.
You cannot declare proper foreign key relationships.
A (normal) column in a table should have only one value.
SQL has poor string functions.
The resulting queries cannot take advantage of indexes.
The proper way to store this information in a database is using a junction table, with one row per shop and per category.
Sometimes, we are stuck with other people's really bad design decisions. If this is your case, then you can use FIND_IN_SET():
WHERE FIND_IN_SET($sqlid, shop_cats) > 0
But you should really fix the data structure.
If you can, the correct solution should be to normalize the table, i.e. have a separate row per category, not with commas.
If you can't, this should do the work:
SELECT * FROM shops WHERE CONCAT(',' , shop_cats , ',') LIKE '%,".$sqlid.",%' LIMIT 8
The table shops does not follow 1NF (1st Normal Form) i.e; every column should exactly one value. To avoid that you need to create another table called pivot table which relates two tables (or entities). But to answer your question, the below SQL query should do the trick.
SELECT * FROM shops WHERE concat(',',shop_cats,',') LIKE '%,".$sqlid.",%' LIMIT 8

What is the benefit of creating a SQL table that has numbers 1 to 500,000 listed?

I was poking around a TFS database today to try and run some statistics and I came across a table called tbl_Number. This table contains one column Number, and all the values are just the values 1 to 500,000. None of the values differ from their respective index in the list, as you can see in the screenshot from queries I ran in LinqPad:
Tbl_Numbers.Max(x => x.Number).Dump(); //max value
Tbl_Numbers.Count().Dump(); //number of entries
var asList = Tbl_Numbers.ToList();
asList.Where(x => asList[x.Number - 1].Number != x.Number).Any().Dump();
//False shows that every entry matches the value at its ordinal location in the list
My question is: What would the use of such a table be? Is this in case one of the referenced numbers needs to change for some reason? The only way to identify a number from this table is by using that same number, so I don't see what use this table could be.
I realize this question could lead to answers that are conjecture, but I'd be interested to see if there's some programming principal that I'm unaware of that's being used here.
It can be used in OUTER JOINS to make sure that you always get all the numbers in a given range, even if there is no data related to that number.
For example, suppose I want to return the count of customers who bought 3,4 or 5 products on their last order. But in fact, there are no customers who bought 4 products. If I just ran a count query on my data, I wouldn't get a row for the customers who bought 4 products at all.
However, If I query my numbers table and LEFT JOIN to my data, I will get the number 4, and a count of 0 or NULL, depending on how I wrote my query.
People also often do this with Date tables, by the way.

MySQL Performance: one query using left join vs. multiple queries

I would like to achieve something like you see on Facebook:
- Posting status
- Comment status
- Like status (like for comments not implemented yet)
My tables structure is like this :
Posts Users Comments Likes
------- ------- -------- -------
ID ID ID ID
UserID Username PostID PostID
Content UserID UserID
Date Content
Date
So at this time when someone access to the main page the system is going to show the 10 lasts posts. My query uses LEFT JOIN on theses tables.
If for example there is 10 posts without any comments and any likes the query will return 10 records.
But for each comment or likes my query will return a new record (row) with some NULL value in the corresponding column.
At the end by simply wanting to retrieve 10 posts my query will return at least 50 rows (if each post has some comments and likes).
I was wondering if that will cause problem in the future. And I was wondering if I should better use multiple queries and parse all the results into an array like:
1. Select the 10 last posts
2. Save the IDs into array and all data into global array
3. Parse the array and make a prepared query for the comments something like:
SELECT * FROM COMMENTS WHERE PostID IN (1, 2, 3, 4, 5, 6,...)
4. Save the result into global array
5. Repeat again for the like table
I hope my explanation was clear enough :) Thank you
Doing one 50 row query reduces the overhead when communicating with the server, on the other hand it adds processing after the rows are retrieved.
It really depends on the overall solution.
However, unless the application is performance critical with the server being the bottleneck, i would go with 10 result sets - one per row, probably using some class/widget/object to display the post on the page.
I'm not an expert, if I understand correctly your option are:
A) the single mega query that will return a lot of NULL's and repeated values.
[Note: By "all" I mean, all you are interested in]
B) Three queries: One for all posts, one for all comments, and one for all likes (all joined with the users table), and then you can process them into objects or structs or dictionaries with whatever language you are using to query the database.
I would go with the second because It is easier, and the increase in order of magnitude seems benign, and probably even more flexible design wise.
What I would prefer NOT to do is one query per post. That would probably become a problem sooner than later. At least much sooner than A or B.

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP MySQL. I want to make my forum as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problems is that there is a like and dislike button under the each comment. I have to store the user_name which has clicked the Like or Dislike Button with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store the comma separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the Likes and Dislikes and to comply my database design with 1NF.
But the problem is, If I make a third table tbl_user_opinion and make two fields like this
1. comment_id
2. type (like or dislike)
So, will I have to run as many sql queries as there are comments on my page to get the like and dislike data for each comment. Will it not inefficient. I think there is some confusion on my part here. Can some one clarify this.
You have a Relational Scheme like this:
There are two ways to solve this. The first one, the "clean" one is to build your "like" table, and do "count(*)'s" on the appropriate column.
The second one would be to store in each comment a counter, indicating how many up's and down's have been there.
If you want to check, if a specific user has voted on the comment, you only have to check one entry, wich you can easily handle as own query and merge them two outside of your database (for this use a query resulting in comment_id and kind of the vote the user has done in a specific thread.)
Your approach with a comma-seperated-list is not quite performant, due you cannot parse it without higher intelligence, or a huge amount of parsing strings. If you have a database - use it!
("One Information - One Dataset"!)
The comma-separate list violates the principle of atomicity, and therefore the 1NF. You'll have hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
COMMENT.COMMENT_ID,
<other COMMENT fields>,
COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
LEFT JOIN UP_VOTE
ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE
ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
COMMENT.COMMENT_ID = <whatever>
GROUP BY
COMMENT.COMMENT_ID,
<other COMMENT fields>;
[SQL Fiddle]
Please measure on realistic amounts of data if that works fast enough for you. If not, then denormalize the model and cache the total score in the COMMENT table, and keep it current it through triggers every time a new row is inserted to or deleted from *_VOTE tables.
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.1
1 This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.

Store Voting Information - Database Outline

Summary: What is the most efficient way to store information similar to the like system on FB. Aka, a tally of likes is kept, the person who like it is kept etc.
It needs to be associated with a user id so as to know who liked it. The issue is, do you have a column that has a comma delimited list of the id of things that were liked, or do you have a separate column for each like (way too many columns). The info that's stored would be a boolean value (1/0) but needs to be associated with the user as well as the "page" that was liked.
My thought was this:
Column name = likes eg.:
1,2,3,4,5
Aka, the user has "like" the pages that have an id of 1, 2, 3, 4 and 5. To calculate total "likes" a tally would need to be taken and then stored in a database associated with the pages themselves (table already exists).
That seems the best way to me but is there a better option that anyone can think of?
P.S. I'm not doing FB likes but it's the easiest explanation.
EDIT: Similar idea to the plus/neg here on stackoverflow.
In this case the best way would be to create a new table to keep track of the likes. So supposing you have table posts, which has a column post_id which contains all the posts (on which the users can vote). And you have another table users with a column user_id, which contains all the users.
You should create a table likes which has at least two columns, something like like_postid and like_userid. Now, everytime a user likes a post create a new row in this table with the id of the post (the value of post_id from posts) that is liked and the id of the user (the value of user_id from users) that likes the post. Of course you can enter some more columns in the likes table (for instance to keep track of when a like is created).
What you have here is called a many-to-many relationship. Google it to get some more information about it and to find some more advice on how to implement them correctly (you will find that a comma seperated lists of id's will not be one of the best practices).
Update based on comments:
If I'm correct; you want to get a list of all users (ordered by name) who have voted on an artist. You should do that something like:
SELECT Artists.Name, User.Name
FROM Artists
JOIN Votes
ON Votes.page_ID = Artists.ID
JOIN Users
ON Votes.Votes_Userid = Users.User_ID
WHERE Artists.Name = "dfgdfg"
ORDER BY Users.Users_Name
There a strange thing here; the column in your Votes table which contains the artist id seems to be called page_ID. Also you're a bit inconsistent in column names (not really bad, but something to keep in mind if you want to be able to understand your code after leaving it alone for 6 months). In your comment you say that you only make one join, but you actually do two joins. If you specify two table names (like you do: JOIN Users, Votes SQL actually joins these two tables.
Based on the query you posted in the comments I can tell you haven't got much experience using joins. I suggest you read up on how to use them, it will really improve your ability to write good code.