I'm working on something that shows shops under a specific category, however I have an issue because I store the categories of a shop like this in a record with the id of a category. "1,5,12". Now, the problem is if I want to show shops with category 2, it "mistakens" 12 as category 2. This is the SQL right now.
SELECT * FROM shops WHERE shop_cats LIKE '%".$sqlid."%' LIMIT 8
Is there a way to split the record "shop_cats" by a comma in SQL, so it checks the full number? The only way I can think of is to get all the shops, and do it with PHP, but I don't like that as it will take too many resources.
This is a really, really bad way to store categories, for many reasons:
You are storing numbers as strings.
You cannot declare proper foreign key relationships.
A (normal) column in a table should have only one value.
SQL has poor string functions.
The resulting queries cannot take advantage of indexes.
The proper way to store this information in a database is using a junction table, with one row per shop and per category.
Sometimes, we are stuck with other people's really bad design decisions. If this is your case, then you can use FIND_IN_SET():
WHERE FIND_IN_SET($sqlid, shop_cats) > 0
But you should really fix the data structure.
If you can, the correct solution should be to normalize the table, i.e. have a separate row per category, not with commas.
If you can't, this should do the work:
SELECT * FROM shops WHERE CONCAT(',' , shop_cats , ',') LIKE '%,".$sqlid.",%' LIMIT 8
The table shops does not follow 1NF (1st Normal Form) i.e; every column should exactly one value. To avoid that you need to create another table called pivot table which relates two tables (or entities). But to answer your question, the below SQL query should do the trick.
SELECT * FROM shops WHERE concat(',',shop_cats,',') LIKE '%,".$sqlid.",%' LIMIT 8
Related
I have a bunch of products, and a bunch of category pages. One product can be in multiple categories. So in my database I have a products table with a "categories" column. In this column I store the ID's of all the categories that the current product is stored in, its a string seperated with semicolons.
Example: 1;5;23;35;49;.
When I browse to Category Page ID 5, I want to see all products that have 5; in its categories-column. I currently do this by
SELECT * FROM products WHERE categories LIKE "%".category.";%"
The problem is that this matches more than just 5. It matches 15; or 25; aswell.
So questions:
How do I make sure that I only select the number I want? If category is "5" I do not want it to match 15, 25, 35 and so on.
Maybe this is a very bad way of storing the category-ids. Do you have any suggestions of a different way of storing what products that belong to what category?
Others have mentioned that a junction table is the right way to design the database. SQL has a very nice data structure for storing lists. It is not called a "string", it is called a "table".
But, sometimes one is stuck with data in this format and needs to work with it. In that case, the key is to put the delimiters on both side to prevent the problem you are having:
SELECT *
FROM products
WHERE concat(';', categories) LIKE "%;".category.";%"
Your list already ends in a semicolon, so that is not necessary.
Another more typical MySQL solution is find_in_set():
SELECT *
FROM products
WHERE find_in_set(category, replace(categories, ';', ',') > 0;
It is designed for comma-delimited lists. Odd that MySQL supports such a function when storing lists this way is generally a bad idea, but it does. Still, a junction table is better for performance reasons (and for other reasons).
Answers/comments to your two questions:
The only way I can think of that you could do this without modifying your schema (see #2) is to use a MySQL regular expression but this is really not a good idea. See http://dev.mysql.com/doc/refman/5.1/en/regexp.html for documentation though
You are right - this is not a good way to store categories. What you want is a join also known as a junction table (see http://en.wikipedia.org/wiki/Junction_table). One way would be to have three tables: product, category, and a product_categories table. Product and category would have a unique ID as you already have and the product_categories table would have two columns: product_id and category_id. If product 1 belongs to categories 10 and 11, you would have two rows in the product_categories table: 1,10 and 1,11.
I can elaborate if you need more help but this should get you started in re-architecting your database (more) correctly.
You can try changing your like criteria to "%;".category.";%"
We're working a web application (Ruby/Rails + Backbone,jQuery,Javascript) where a user can manage a booklist and drag and drop books to rearrange their order within the list, which has to be persisted.
We have books and a custom collection of books called booklist, for which we have two tables: book and booklist. Since a book could belong to multiple booklists, and a booklist consists of multiple books, they have an m x n relationship, and we have another additional table to store the mapping. Lets say we use this for all purposes. Now when the user wants to re-order the books in her bookshelf, we'd need to store that order.
I can totally see the sense about why storing ids in a column is evil , no doubts about it. What if we have the tables normalized, and for all other cases we'd go through the standard operations.
There are quite a few approaches on storing an additional order column. But still it seems like bad design to store the ids of the books in a booklist in a comma separated list in the booklist table, even assuming that integrity is maintained.
We'd never run into this...
SELECT * FROM users WHERE... OH F#$%CK -
Yes it's bad, you can't order, count, sum (etc) or even do a simple report without depending
on a top level language.
because we'd simply be selecting books based on the booklist id using the join table like the standard approach. (In any case, we're only getting the books as an array as part of the backbone booklist model)
So what if we retrieve the booklist and books for the booklist, and do the sorting programatically on the client side (in this case Javascript?) based on the CSV column.
It appears to be a simple solution because:
Every time the user reorders a book, we simply store all the ids in this one column freshly again. (A user will have at the most 20 to 30 books in a booklist).
We could of course simply ignore invalid ids, i.e. books that have been deleted after the booklist had been created.
What are the disadvantages of this approach, which seems to be simpler than maintaining the sort order and updating other columns every time an order is changed, or using a float or weightage, etc.
As per my knowldege its really violating the rule of RDBMS.Which causes facing many difficulties when applying JOIN.
Hope it will help you.
I have a table that is going to have a bunch of stock numbers and some other information. I need to be able to create groups of stock numbers e.g. products 123A,456B, and 789C all are in the same group. Each stock number has its own record in the table. I don't really care how it is accomplished so long as I can make sure that the items are grouped and that group numbers are unique. What is the best way to accomplish this?
There are a few ways I can think of to accomplish this but I'm not really sure what the best way is or if there are hidden drawbacks:
1) I could have a separate table with one auto increment column that is responsible for generating group id's and then add the Id to the stock number table once it's generated but that semes wasteful.
2) Would selecting the max groupId from the stock number table and adding one to it be good to get new group id's?
3) I could not have groupId in the stock number table and do it using a separate table with a GroupID column and a StockNumberId column. Still how would I get the groupId doing it this way?
You don't have to respond to all three. In indication of which would be the most appropriate way and any notes on implementation or pitfalls would be helpful. Also if there is a better way I did not innumerate please enlighten me. Thank you.
If you only need to set a groupId for each stock, and if each stock belongs to one and only one group then solution (2) would work just fine.
I would also create an index on the groupId column for easy retrieval of stocks that belong to the same group, as well as for getting the MAX(groupId) more efficiently.
I would use solution (1) if I had to also store group information, e.g., groupName, etc.
Solution (3) would be more towards a Many-To-Many relation between Stocks and Groups where the table you are describing sits in between the two. But in your case this would not be a good solution.
I guess that title isn't very descriptive, so I will explain! I have table called users_favs where is stored all info about which posts user has liked, which post he has favourited and the same for comments. info there is stored as serealized array / or JSON who cares.
Question: What is better? Stay like this or to make 4 tables for each of the fields and store not in serealized version but like user_id => post_id???
What I think about second option is that after some time this field will be GIGANTIC. Also, I will need to make 4 queries (or with JOINS) to take all of the info from these tables.
Keeping it in 1 table means that you'll only need 1 table access and 0 joins to get all the data. While storing it in 4 tables, you'll need at least 1 table access and n-1 joins, when you need n fields of information. Your result set at the end of the query will probably be the same, so the amount of data send over the network is independent of your table structure.
I presume a scenario when you will have data for fav_categories and other columns are null. Similarly for columns fav_posts, liked_posts, liked_comments. So there is a high probability that in each row , only three columns will have data most of the time (id,user_id,any one of rest). If my assumptions are right and the use cases as well , then i would definitely go four four tables.
To add to above you can always choose from whether you want to make read-friendly or write-friendly.
I have searched for a solution for this problem, but haven't found it (yet), probably because I don't quite know how to explain it properly myself. If it is posted somewhere already, please let me know.
What I have is three databases that are related to each other; main, pieces & groups. Basically, the main database contains the most elementary/ most used information from a post and the pieces database contains data that is associated with that post. The groups database contains all of the (long) names of the groups a post in the main database can be 'posted in'. A post can be posted in multiple groups simultaneously. When a new post is added to my site, I check the pieces too see if there are any duplicates (check if the post has been posted already). In order to make the search for duplicates more effective, I only check the pieces that are posted in the same group(s).
Hopefully you're still with me, cause here's where it starts to get really confusing I think (let me know if I need to specify things more clearly): right now, both the main and the pieces database contain the full name of the group(s) (basically I'm not using the groups database at all). What I want to do is replace the names of those groups with their associated IDs from the groups database. For example, I want to change this:
from:
MAIN_table:
id | group_posted_in
--------|---------------------------
1 | group_1, group_5
2 | group_15, group_75
3 | group_1, group_215
GROUPS_table:
id | group_name
--------|---------------------------
1 | group_1
2 | group_2
3 | group_3
etc...
into:
MAIN_table:
id | group_posted_in
--------|---------------------------
1 | 1,5
2 | 15,75
3 | 1,215
Or something similar to this. However, This format specifically causes issues as the following query will return all of the rows (from the example), instead of just the one I need:
SELECT * FROM main_table WHERE group = '5'
I either have to change the query to something like this:
...WHERE group = '5' OR group = '5,%' OR group = '%,5,%' OR group = '%,5'
Or I have to change the database structure from Comma Separated Values to something like this: [15][75]. The accompanying query would be simpler, but it somehow seems like a cumbersome solution to me. Additionally, (simple) joins will not be easy/ possible at all. It will always require me to run a separate query to fetch the names of the groups--whether a user searches for posts in a specific group (in which case, I first have to run a query to fetch the id's, then to search for the associated posts), or whether it is to display them (first the posts, then another query to match the groups).
So, in conclusion: I suppose I know there is a solution to this problem, but my gut tells me that it is not the right/ best way to do it. So, I suppose the question that ties this post together is:
What is the correct method to connect the group database to the others?
For a many-to-many relationship, you need to create a joining table. Rather than storing a list of groups in a single column, you should split that column out into multiple rows in a separate table. This will allow you to perform set based functions on them and will significantly speed up the database, as well as making it more robust and error proof.
Main
MainID ...
Group
GroupID GroupName
GroupsInMain
GroupsInMainID MainID(FK) GroupID(FK)
So, for MainID 1, you would have GroupsInMain records:
1,1,1
2,1,5
This associates groups 1 and 5 with MainID 1
FK in this case means a Foreign Key (i.e. a reference to a primary key in another table). You'd probably also want to add a unique constraint to GroupsInMain on MainID and GroupID, since you'd never want the same values for the pairing to show up more than once.
Your query would then be:
select GroupsInMain.MainID, Group.GroupName
from Group, GroupsInMain
where Group.GroupID=GroupsInMain.GroupID
and Group.GroupID=5