Single MySQL field with comma separated values [duplicate] - mysql

Closed 10 years ago.
Possible Duplicate:
mysql is array in multiple columns
I have two tables:
Posts Table
PostID | Gallery
1 | 4,7,8,12,13
2 | 1,2,3,4
3 | 5,8,9
4 | 3,6,11,14
The values in Gallery are the primary keys in the Images table:
Images Table
ImageID | FileName
1 | something.jpg
2 | lorem.jpg
3 | ipsum.jpg
4 | what.jpg
5 | why.jpg
The reason I do this instead of just adding a PostID key to the Images table is because those images can be associated with a lot of different posts. I suppose I could add another table for the relationships, but the comma-separated value is easier to work with as far as the jQuery script I am using to add to it.
If I'm on a page that requires the images associated with PostID 3, what kind of query can I run to output all of the FileNames for it?

You can use this solution:
SELECT b.filename
FROM posts a
INNER JOIN images b ON FIND_IN_SET(b.imageid, a.gallery) > 0
WHERE a.postid = 3
However, you should really normalize your design and use a cross-reference table between posts and images. This would be the best and most efficient way of representing N:M (many-to-many) relationships. Not only is it much more efficient for retrieval, but it will vastly simplify updating and deleting image associations.
...but the comma-separated value is easier to work with as far as the jQuery script I am using to add to it.
Even if you properly represented the N:M relationship with a cross-reference table, you can still get the imageid's in CSV format:
Suppose you have a posts_has_images table with primary key fields (postid, imageid):
You can use GROUP_CONCAT() to get a CSV of the imageid's for each postid:
SELECT postid, GROUP_CONCAT(imageid) AS gallery
FROM posts_has_images
GROUP BY postid
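As a rough illustration of the cross-reference approach, here is a minimal sketch in Python. It assumes SQLite (via the standard `sqlite3` module) as a stand-in for MySQL, and the helper names `filenames_for_post` and `gallery_csv` are made up for this example. Only images that appear in the question's Images excerpt are inserted.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (imageid INTEGER PRIMARY KEY, filename TEXT NOT NULL);
    CREATE TABLE posts_has_images (
        postid  INTEGER NOT NULL,
        imageid INTEGER NOT NULL REFERENCES images(imageid),
        PRIMARY KEY (postid, imageid)
    );
    INSERT INTO images VALUES
        (1, 'something.jpg'), (2, 'lorem.jpg'), (3, 'ipsum.jpg'),
        (4, 'what.jpg'), (5, 'why.jpg');
    -- One row per post/image association instead of a CSV column.
    INSERT INTO posts_has_images VALUES (2, 1), (2, 2), (2, 3), (2, 4), (3, 5);
""")


def filenames_for_post(postid):
    """Return the file names linked to a post (hypothetical helper)."""
    rows = conn.execute(
        """SELECT i.filename
           FROM posts_has_images x
           JOIN images i ON i.imageid = x.imageid
           WHERE x.postid = ?
           ORDER BY i.imageid""",
        (postid,),
    ).fetchall()
    return [r[0] for r in rows]


def gallery_csv(postid):
    """Rebuild the CSV string for client-side code (hypothetical helper)."""
    row = conn.execute(
        "SELECT group_concat(imageid) FROM posts_has_images WHERE postid = ?",
        (postid,),
    ).fetchone()
    return row[0]  # None when the post has no images


print(filenames_for_post(3))
print(gallery_csv(2))
```

The CSV is produced only at the edge, for the script that wants it; the stored data stays one association per row.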

In terms of proper SQL, you definitely should have another table to relate the two rather than the delimited column.
That said, here's how you could do it:
SELECT * FROM Images i WHERE EXISTS (SELECT 1 FROM Posts p WHERE p.PostID = 3 AND FIND_IN_SET(i.ImageID, p.Gallery) > 0)

Here is your problem. This is bad design as you need to search for specific values of Gallery field. You can use FIND_IN_SET, but your query will be slow. Turn to atomic values for Gallery - normalize it.

How to select comma-separated values from a field in one table joined to another table with a specific where condition?

I'm working on a MySQL SELECT query and cannot find a solution for this tricky problem.
There's one table, "words", with the ids and names of objects (in this case, possible objects in a picture).
words
ID object
house
tree
car
…
In the other table, "pictures", all the information about a picture is saved. Besides information such as resolution, it records which objects appear in the picture. They are saved in the column objectsinpicture as ids from the table words, like 1,5,122,345, etc.
The table pictures also has a column "location", which holds the id of the place where I took the picture.
pictures
location objectsinpicture ...
1 - 1,2,3,4
2 - 1,5,122,34
1 - 50,122,345
1 - 91,35,122,345
2 - 1,14,32
1 - 1,5,122,345
To tag new pictures of a particular place, I want to get suggestions from the information already saved. That way I can create buttons in PHP to update the database instead of using a dropdown with multiple select.
What I have tried so far is the following:
SELECT words.id, words.object
FROM words, pictures
WHERE location = 2 AND FIND_IN_SET(words.id, pictures.objectsinpicture)
GROUP BY words.id
ORDER BY words.id
This nearly shows the expected values, but some information is missing: it doesn't show all the possible objects, and I cannot find any reason for this.
What I want is, for example, all ids for location 2 joined to the table words, with duplicate entries of objectsinpicture grouped:
1,5,122,34
1,14,32
1,5,14,32,34,122
house
...
...
...
...
...
Maybe I need to use group_concat with a comma separator, but this doesn't work either. The problem seems to be the where condition on the location.
I hope that someone has an idea for solving this.
Thanks in advance for any support!!!
This is a classic case of denormalization causing problems.
What you need to do is store each object/picture association separately, in another table:
create table objectsinpicture (
picture_id int,
object_id int,
primary key (picture_id, object_id)
);
Instead of storing a comma-separated list, you would store one association per row in this table. It will grow to a large number of rows of course, but each row is just a pair of id's so the total size won't be too great.
Then you can query:
SELECT w.id, w.object
FROM pictures AS p
JOIN objectsinpicture AS o ON o.picture_id = p.id
JOIN words AS w ON o.object_id = w.id
WHERE p.location = 2;
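To see the association table in action, here is a small sketch using Python's `sqlite3` module as a stand-in for MySQL. The word ids, the `id` column on pictures, and the helper name `suggestions` are assumptions made for illustration, since the question does not show them. `DISTINCT` is added so that an object appearing in several pictures of the same location is suggested only once.

```python
import sqlite3

# SQLite stands in for MySQL; ids and sample rows below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, object TEXT NOT NULL);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, location INTEGER NOT NULL);
    CREATE TABLE objectsinpicture (
        picture_id INTEGER NOT NULL,
        object_id  INTEGER NOT NULL,
        PRIMARY KEY (picture_id, object_id)
    );
    INSERT INTO words VALUES (1, 'house'), (5, 'tree'), (14, 'car');
    INSERT INTO pictures VALUES (1, 1), (2, 2), (3, 2);
    -- Pictures 2 and 3 were both taken at location 2 and both show a house.
    INSERT INTO objectsinpicture VALUES (1, 1), (2, 1), (2, 5), (3, 1), (3, 14);
""")


def suggestions(location):
    """Return each (id, object) seen at a location once (hypothetical helper)."""
    return conn.execute(
        """SELECT DISTINCT w.id, w.object
           FROM pictures AS p
           JOIN objectsinpicture AS o ON o.picture_id = p.id
           JOIN words AS w ON o.object_id = w.id
           WHERE p.location = ?
           ORDER BY w.id""",
        (location,),
    ).fetchall()


print(suggestions(2))
```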

How to design a relational database for associating multiple tags with id?

I'm developing a Q/A website project, and a question on this website can be linked with multiple tags.
For example: How can I implement quicksort in C++ ?
Tags for this question can be C, C++, Algorithm
My question is how can I store these tags in MySQL table ?
My approach:
The question above has id = 123
-------------------------
|question_id | tags |
-------------------------
| 123 | C |
-------------------------
| 123 | C++ |
-------------------------
| 123 |Algorithm |
-------------------------
| 124 | Java |
-------------------------
But if I create the table this way, it will become very large when there are a lot of questions.
Is there a better, more efficient way to store this kind of data?
Managing tags in information systems can be done by two methods. There is a trade-off: one is simple but slower, the other is more complex and harder to maintain but faster.
First Solution: Using a TAG table and a many-to-many relationship between Our_Table and the TAG table (as @tadman describes).
Second Solution:
If we want a very quick and high performance way to retrieve data related to specific tags, we can use Bit-Mask Solution.
Bit-Mask Solution for TAG Management System (previously described for similar problem here)
In this method we have the same tables (as @tadman said).
Just add one field, a long or bigint (depending on the DBMS), to the table that needs TAGs (like Question).
This field holds the TAGs of the Question in binary format. For example, suppose that we have 8 records in the TAG table.
1- some TAG 1
2- some TAG 2
...
8- some TAG 8
Then, if we want to assign TAGs 1, 3, 6 and 7 to a Question, we just use the number 01100101. (I suggest this reversed placement, with TAG 1 in the rightmost bit, so that additional TAGs can be supported in the future.)
We can store the number in base 10 in the database (101 instead of 01100101).
Then, to find the appropriate Questions for any given TAGs, just select from Question and use a bitwise AND, as below.
A: (shows the TAGs sequence that we are looking in Question, like 5 as 00000101)
B: (shows TAGs of any Question like 101 as 01100101)
select * from Question q
where A & q.B = q.B
This query returns all Questions whose TAGs are a subset of the given TAGs in A.
We can use other bitwise comparisons as well, such as A & q.B > 0 to return Questions that share at least one TAG with A, and so on.
Notice:
1: We have some extra difficulties when a TAG is added or deleted. But this is a trade-off; adding or deleting TAGs happens less often.
2: We should still use the Question, Tags and Question_Tags tables. But for searching, we only use the new field.
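The bit arithmetic above can be checked with a short Python sketch. The function names are hypothetical; the only assumption carried over from the answer is that TAG 1 maps to the rightmost bit. The sketch also shows the reverse comparison (`A & B = A`), which selects questions that carry every searched tag.

```python
def tags_to_mask(tag_ids):
    """Encode 1-based tag ids as an integer; tag 1 is the lowest bit."""
    mask = 0
    for tag_id in tag_ids:
        mask |= 1 << (tag_id - 1)
    return mask


def mask_to_tags(mask):
    """Decode a mask back into a sorted list of 1-based tag ids."""
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def question_tags_within(search_mask, question_mask):
    """Same test as `A & q.B = q.B`: the question has no tag outside A."""
    return (search_mask & question_mask) == question_mask


def question_has_all(search_mask, question_mask):
    """Reverse test, `A & q.B = A`: the question carries every tag in A."""
    return (search_mask & question_mask) == search_mask


def shares_any(search_mask, question_mask):
    """Same test as `A & q.B > 0`: at least one tag in common."""
    return (search_mask & question_mask) > 0


b = tags_to_mask([1, 3, 6, 7])  # the question's tags from the example
a = tags_to_mask([1, 3])        # the tags being searched for
print(b, format(b, "08b"), a, format(a, "08b"))
```

Note that Python integers are unbounded, whereas a MySQL BIGINT column limits this scheme to 64 tags.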
Have an extra table with exactly 3 columns: id of item being tagged, tag, ordering info.
When you add or delete an item, add or delete all the relevant rows from this table.
The details are here: http://mysql.rjweb.org/doc.php/lists
Efficient, fast, easy, and highly scalable.
To keep your tags compact, make the tags themselves a first-class entity, not just some random string column:
CREATE TABLE tags (
id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
label VARCHAR(255) NOT NULL,
UNIQUE KEY `index_on_label` (label)
);
Then have a small, simple join table between the questions and tags:
CREATE TABLE question_tags (
question_id INT NOT NULL,
tag_id INT NOT NULL,
UNIQUE KEY `index_on_question_tag` (question_id, tag_id),
KEY `index_on_tag` (tag_id)
);
Which should cover querying both what tags a question has and what questions a tag has.
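A minimal sketch of how these two tables could be used, written in Python with `sqlite3` standing in for MySQL (so the DDL is adapted to SQLite syntax). The helper names `tag_question`, `tags_for_question` and `questions_for_tag` are made up for this example; the sample data comes from the question.

```python
import sqlite3

# SQLite stands in for MySQL; the schema mirrors the two tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        label TEXT NOT NULL UNIQUE
    );
    CREATE TABLE question_tags (
        question_id INTEGER NOT NULL,
        tag_id      INTEGER NOT NULL REFERENCES tags(id),
        UNIQUE (question_id, tag_id)
    );
    CREATE INDEX index_on_tag ON question_tags (tag_id);
""")


def tag_question(question_id, label):
    """Attach a tag to a question, creating the tag row on first use."""
    conn.execute("INSERT OR IGNORE INTO tags (label) VALUES (?)", (label,))
    tag_id = conn.execute(
        "SELECT id FROM tags WHERE label = ?", (label,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO question_tags VALUES (?, ?)", (question_id, tag_id)
    )


def tags_for_question(question_id):
    """Labels of all tags on one question."""
    rows = conn.execute(
        """SELECT t.label FROM question_tags qt
           JOIN tags t ON t.id = qt.tag_id
           WHERE qt.question_id = ? ORDER BY t.label""",
        (question_id,),
    ).fetchall()
    return [r[0] for r in rows]


def questions_for_tag(label):
    """Ids of all questions carrying one tag."""
    rows = conn.execute(
        """SELECT qt.question_id FROM question_tags qt
           JOIN tags t ON t.id = qt.tag_id
           WHERE t.label = ? ORDER BY qt.question_id""",
        (label,),
    ).fetchall()
    return [r[0] for r in rows]


for qid, label in [(123, "C"), (123, "C++"), (123, "Algorithm"),
                   (124, "Java"), (124, "C")]:
    tag_question(qid, label)

print(tags_for_question(123), questions_for_tag("C"))
```

Each label is stored once in tags, so the join table holds only pairs of integers no matter how many questions share a tag.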

How to implement cross-reference among 3+ records?

I have a table with this structure
(int)id | (text)title | (text)description | (int)related
and a query which joins the table with itself
SELECT t1.*, t2.title as relatedTitle
FROM mytable t1 LEFT JOIN mytable t2 ON t2.related=t1.id
to produce in one SELECT list like this
title: Hi, description: informal greetings, see also: Hello
When a new record is stored into the table, only one other record can be referenced
What I try to achieve is cross reference
which can be among 2-5 objects
All objects should be cross referenced in every combination. I want this feature: if related is set, the script should automagically create cross reference in the related records. If record is deleted, the script should update the reference in the related records.
For 3+ records cross referenced, I am considering this joining table
(int)id | (int)related
but it would be 20 records for 5 cross referenced objects. I could also create one-column table
(varchar)relatedList
but how would I create the left join and delete relations in this structure? Or should I try some other approach like triggers, views or temporary tables? I want to avoid redundancy and keep it as simple as possible, and I just can't figure this out.
If your groups are typically bigger than 2, then you should create a list of groups - if A is connected to B and C it makes a group A,B,C.
So, as soon as a relation is inserted, you check if the related item is already in a group. If it is set, then the "new" related entry is also in that group.
If not, you just created a new group which contains those two Items.
So if from your Example "Hi" is alone, and "Ho" gets connected to "Hi", then both form a new group.
When "Ahoi" also gets into connection to "Hi", it just needs to copy the group_id from Hi.
EDIT: according to the comment asking for the select:
The structure:
table groups: group_id int not null primary key auto_increment, created_tmstmp timestamp
table items: item_id int, group_id int default null
The select:
select * from items i1
inner join items i2 on i2.item_id != i1.item_id
and i2.group_id = i1.group_id
where i1.item_id = <given item>
Inserting a relation may coincide with inserting one of the items; this depends on the original poster's scenario. If it is a new relation for both entries, then a new group is inserted.
Other questions are: is an item only ever in one group? Otherwise one needs an item_group table to connect an item to more than one group.
No join on strings; sorry if my earlier wording could be misread that way. ;-)
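The group logic described in this answer could look roughly like the following Python sketch, again with `sqlite3` standing in for MySQL. The groups table is named `relation_groups` here only to avoid clashing with a SQL keyword, the `title` column and the function names are invented for the example, and merging two existing groups is an assumption the answer does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relation_groups (
        group_id       INTEGER PRIMARY KEY AUTOINCREMENT,
        created_tmstmp TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE items (
        item_id  INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        group_id INTEGER DEFAULT NULL REFERENCES relation_groups(group_id)
    );
    INSERT INTO items (item_id, title)
    VALUES (1, 'Hi'), (2, 'Ho'), (3, 'Ahoi'), (4, 'Bye');
""")


def group_of(item_id):
    row = conn.execute(
        "SELECT group_id FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


def relate(a, b):
    """Put items a and b into the same group, creating one if needed."""
    ga, gb = group_of(a), group_of(b)
    if ga is None and gb is None:
        # Neither item is grouped yet: both form a new group.
        ga = conn.execute("INSERT INTO relation_groups DEFAULT VALUES").lastrowid
        conn.execute(
            "UPDATE items SET group_id = ? WHERE item_id IN (?, ?)", (ga, a, b)
        )
    elif ga is None:
        conn.execute("UPDATE items SET group_id = ? WHERE item_id = ?", (gb, a))
    elif gb is None:
        conn.execute("UPDATE items SET group_id = ? WHERE item_id = ?", (ga, b))
    elif ga != gb:
        # Assumption: linking two grouped items merges their groups.
        conn.execute("UPDATE items SET group_id = ? WHERE group_id = ?", (ga, gb))


def related_titles(item_id):
    """Titles of every other item in the same group (the select above)."""
    rows = conn.execute(
        """SELECT i2.title FROM items i1
           JOIN items i2 ON i2.item_id != i1.item_id
                        AND i2.group_id = i1.group_id
           WHERE i1.item_id = ? ORDER BY i2.item_id""",
        (item_id,),
    ).fetchall()
    return [r[0] for r in rows]


relate(2, 1)  # "Ho" gets connected to "Hi": a new group is created
relate(3, 1)  # "Ahoi" copies the group_id from "Hi"
print(related_titles(1), related_titles(3))
```

Deleting an item only removes its row; the remaining group members stay related without any further bookkeeping.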
As mzedeler said a many-to-many relation is usually realized using a join table. Please consider using a separate id column so you'd get
Id,rel1,rel2
That way frameworks like Hibernate would know what to do. This makes even more sense considering that you are talking about a transitive and symmetric relation. So for 4 related items, 3 entries could suffice if your script does some basic inference. Of course, you would need to fill the gap in a given relation chain if a connecting entry were removed.
Go with this one:
(int)id | (int)related
It is a very common approach.
If you use the list approach, your SQL queries will be extremely complicated.
Generally, SQL engines are very good at optimizing queries against tables with very large numbers of rows, so at most reasonable hardware, you shouldn't have to worry about millions of rows in such table. (Depending on what you are going to use it for, of course.)
To model non-directed edges, always insert the lower id in the id column (you can add a CHECK constraint to enforce this). By doing this, you'll eliminate half of the tuples.
If you run into performance issues because you want to model a complete graph, consider only using the table above for "neighbor" connections, calculate the completion of the graph and insert it into a table that contains partitions of all items, one partition for each complete subgraph:
(int)partition | (int)id
Let's take a look at an example. Given the items (1 .. 8) and the edges (1, 2), (1, 3), (4, 3), (6, 5), (6, 7), not including the edges that are required to complete the graph, you get
(int)id | (int)related
1 | 2
1 | 3
3 | 4
5 | 6
6 | 7
(And no records with the item 8.)
And then in the partition table:
(int)partition | (int)id
1 | 1
1 | 2
1 | 3
1 | 4
2 | 5
2 | 6
2 | 7
3 | 8
To check if an item is related to another item will only be a self-join on the partition table, but changing the graph requires manipulation of both tables.
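One way to compute the partition table from the neighbor edges is a union-find pass, sketched below in Python. This is an illustration of the idea, not something the answer prescribes; the function names are hypothetical, and the input is the example's item range and edge list.

```python
def partition_rows(items, edges):
    """Return (partition, id) rows, one partition per connected component."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the component's representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as representative of the merged component.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    numbering = {}  # representative -> partition number, in order of first use
    rows = []
    for item in sorted(items):
        root = find(item)
        partition = numbering.setdefault(root, len(numbering) + 1)
        rows.append((partition, item))
    return rows


def are_related(rows, a, b):
    """Two items are related when they share a partition."""
    partition_of = {item: partition for partition, item in rows}
    return partition_of[a] == partition_of[b]


edges = [(1, 2), (1, 3), (3, 4), (5, 6), (6, 7)]
rows = partition_rows(range(1, 9), edges)
print(rows)
```

Adding an edge can merge two partitions and removing one can split a partition, which is why changing the graph touches both tables.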

Table with a lot of attributes

I'm planning to build a database project.
One of the tables has a lot of attributes.
My question is: which is better, to divide the class into 2 separate tables or to put all of the attributes into one table? Below is an example:
create table User ( id, name, surname, ..., show_name, show_photos, ... )
or
create table User ( id, name, surname, ... )
create table UserPrivacy ( usr_id, show_name, show_photos, ... )
I suppose the performance is similar, since I can use an index.
It's best to put all the attributes in the same table.
If you start storing attribute names in a table, you're storing meta data in your database, which breaks first normal form.
Besides, keeping them all in the same table simplifies your queries.
Would you rather have:
SELECT show_photos FROM User WHERE user_id = 1
Or
SELECT up.show_photos FROM User u
LEFT JOIN UserPrivacy up USING(user_id)
WHERE u.user_id = 1
Joins are okay, but keep them for associating separate entities and 1->N relationships.
There is a limit to the number of columns, and only if you think you might hit that limit would you do anything else.
There are legitimate reasons for storing name value pairs in a separate table, but fear of adding columns isn't one of them. For example, creating a name value table might, in some circumstances, make it easier for you to query a list of attributes. However, most database access layers, including PDO in PHP, include reflection methods whereby you can easily get a list of columns for a table (attributes for an entity).
Also, please note that your id field on User should be user_id, not just id, unless you're using a framework such as Ruby on Rails, whose conventions expect just id. 'user_id' is preferred because with just id, your joins look like this:
ON u.id = up.user_id
Which seems odd, and the preferred way is this:
ON u.user_id = up.user_id
or more simply:
USING(user_id)
Don't be afraid to 'add yet another attribute'. It's normal, and it's okay.
I'd say the 2 separate tables, especially if you are using an ORM. In most cases it's best to have each table correspond to a particular object and have its fields or "attributes" be the things that are required to describe that object.
You don't need 'show_photos' to describe a User but you do need it to describe UserPrivacy.
You should consider splitting the table if all of the privacy attributes are nullable and will most probably have values of NULL.
This will help you to keep the main table smaller.
If the privacy attributes will mostly be filled, there is no point in splitting the table, as it will require extra JOINs to fetch the data.
Since this appears to be a one to one relationship, I would normally keep it all in one table unless:
You would be near the limit of the number of bytes that can be stored in a row - then you should split it out.
Or if you will normally be querying the main table separately and won't need those fields much of the time.
If some columns of the table are not identified by or dependent on the primary key, or hold values drawn repeatedly from a definite/fixed set, make a different table for those columns and maintain a one-to-one relationship.
Why not have a User table and Features table, e.g.:
create table User ( id int primary key, name varchar(255) ... )
create table Features (
user_id int,
feature varchar(50),
enabled bit,
primary key (user_id, feature)
)
Then the data in your Features table would look like:
| user_id | feature | enabled
| -------------------------------
| 291 | show_photos | 1
| -------------------------------
| 291 | show_name | 1
| -------------------------------
| 292 | show_photos | 0
| -------------------------------
| 293 | show_name | 0
I would suggest something different. It seems likely that in the future you will be asked for 'yet another attribute' to manage. Rather than add a column, you could just add a row to an attributes table:
TABLE Attribute
(
ID
Name
)
TABLE User
(
ID
...
)
TABLE UserAttributes
(
UserID FK User.ID
Attribute FK Attribute.ID
Value...
)
Good comments from everyone. I should have been clearer in my response.
We do this quite a bit to handle special-cases where customers ask us to tailor our site for them in some way. We never 'pivot' the NVP's into columns in a query - we're always querying "should I do this here?" by looking for a specific attribute listed for a customer. If it is there, that's a 'true'. So rather than having these be a ton of boolean-columns, most of which would be false or NULL for most customers, AND the tendency for these features to grow in number, this works well for us.
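The "should I do this here?" lookup described above can be sketched as an existence check. The example below uses Python with `sqlite3` as a stand-in for whatever database is actually in use; the sample users and attribute names are borrowed from elsewhere in this thread or invented, and `has_attribute` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attribute (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
    CREATE TABLE User (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE UserAttributes (
        UserID    INTEGER NOT NULL REFERENCES User(ID),
        Attribute INTEGER NOT NULL REFERENCES Attribute(ID),
        Value     TEXT,
        PRIMARY KEY (UserID, Attribute)
    );
    INSERT INTO Attribute VALUES (1, 'show_photos'), (2, 'show_name');
    INSERT INTO User VALUES (291, 'alice'), (292, 'bob');
    -- A row's presence means "true"; absent rows mean "false".
    INSERT INTO UserAttributes VALUES (291, 1, NULL), (291, 2, NULL), (292, 2, NULL);
""")


def has_attribute(user_id, name):
    """True when the named attribute is listed for the user."""
    row = conn.execute(
        """SELECT 1 FROM UserAttributes ua
           JOIN Attribute a ON a.ID = ua.Attribute
           WHERE ua.UserID = ? AND a.Name = ?""",
        (user_id, name),
    ).fetchone()
    return row is not None


print(has_attribute(291, "show_photos"), has_attribute(292, "show_photos"))
```

Adding a new feature flag is then an INSERT into Attribute rather than an ALTER TABLE, at the cost of losing per-column types and constraints.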