I have two tables, meta_works(docID, tag, content) and meta_authors(authID, tag, content). Both tables have rows which include:
"xID", "authEng", "Author Name";
"xID", "authDate", "date range string";
"xID", "authLang", "language";
A distinct grouping in either table on the content of the authEng + authDate + authLang tags will always have exactly one meta_authors.authID associated with it. One problem is that I don't know how to do this distinct grouping across rows, when the values involved sit in multiple rows of the same column.
I need to insert this distinct authID from meta_authors into meta_works for each distinct meta_works.docID as (docID=meta_works.docID, tag="authID", content=meta_authors.authID) so I can delete the other fields in meta_works and grab data from meta_authors instead.
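To illustrate the target, the statement I imagine needing would have roughly this shape - a rough, untested guess, and I'm not even sure the grouping part is valid (it assumes every work and author has all three tags):

-- Rough, untested sketch: pivot the three author tags in each table,
-- match the triples, and write one (docID, 'authID', authID) row per work.
-- Assumes every docID and every authID has all three tags present.
INSERT INTO meta_works (docID, tag, content)
SELECT w.docID, 'authID', a.authID
FROM (
    SELECT docID,
           MAX(CASE WHEN tag = 'authEng'  THEN content END) AS authEng,
           MAX(CASE WHEN tag = 'authDate' THEN content END) AS authDate,
           MAX(CASE WHEN tag = 'authLang' THEN content END) AS authLang
    FROM meta_works
    GROUP BY docID
) AS w
JOIN (
    SELECT authID,
           MAX(CASE WHEN tag = 'authEng'  THEN content END) AS authEng,
           MAX(CASE WHEN tag = 'authDate' THEN content END) AS authDate,
           MAX(CASE WHEN tag = 'authLang' THEN content END) AS authLang
    FROM meta_authors
    GROUP BY authID
) AS a ON a.authEng = w.authEng
      AND a.authDate = w.authDate
      AND a.authLang = w.authLang;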
The point of this attempted structure is to allow an arbitrary number of tags in either table associated with a work or author, and avoid duplicating the author data for each work, which helps with updating etc. The idea is to then expand this with a meta_sections table to allow meta tagging of work subsections as well.
I have tried playing around with inner/left/right joins and subqueries (I've never had to use these before due to limited MySQL use) but cannot come up with anything that seems like even a start. The lack of examples is not for lack of trying/research, but with no experience this is proving difficult to approach. Perhaps it would be better to do this outside MySQL and then update the table, but it intuitively feels well within the strengths of SQL, so I feel it is worth the question.
What methods are recommended for selecting multiple columns within a nested subquery? It's been a while since I've coded any queries and I'm having some difficulty wrapping my head around this. The specific challenge is on the second line of the code below. The IN operator doesn't quite work here (see the error message below), and I'm not sure whether it's simply a matter of the syntax I'm using, or whether there is a much better way to go about this (e.g. using a HAVING clause or a JOIN):
SELECT * FROM Rules WHERE Rules.LNRule_id
IN(SELECT LNRule_id1,LNRule_id2,LNRule_id3,LNRule_id4 FROM Silhouette
WHERE Silhouette.Silhouette_Skirt=(SELECT Silhouette_Skirt FROM Style
WHERE Style.Style_Skirt='$Style_Skirt')
)
The purpose of this query is to SELECT all the relevant rows in table Rules for a particular value in table Style (i.e. $Style_Skirt), which it does by matching it to one of several factors - in this case the garment's Silhouette. What I am therefore trying to do in this portion of the query is SELECT all rows in table Rules whose ID (LNRule_id) matches a value in any of the specified columns in table Silhouette:
(SELECT LNRule_id1,LNRule_id2,LNRule_id3,LNRule_id4 FROM Silhouette WHERE Silhouette.Silhouette_Skirt=(...))
Edit
There is a many-to-many relationship (each Silhouette has several applicable Rules, and each Rule can apply to several Silhouettes). All the rules reside in the table 'Rules' (one per row), and each rule has an id ('LNRule_id'). The table 'Silhouette' has columns (LNRule_id1, LNRule_id2, LNRule_id3, LNRule_id4) which indicate which rows need to be fetched from 'Rules'; they store the ids of the relevant rows in table 'Rules'.
The error message currently being generated by the IN operator is:
SQLSTATE[21000]: Cardinality violation: 1241 Operand should contain 1
column(s)
I think you want this query
SELECT * FROM Rules r
JOIN Silhouette s
  -- match the rule id against any of the four LNRule_idN columns
  ON (r.LNRule_id = s.LNRule_id1
   OR r.LNRule_id = s.LNRule_id2
   OR r.LNRule_id = s.LNRule_id3
   OR r.LNRule_id = s.LNRule_id4)
JOIN Style st
  ON s.Silhouette_Skirt = st.Silhouette_Skirt
WHERE st.Style_Skirt = '$Style_Skirt'
MySQL is complaining that on one side of IN you have a single column, while on the other side you have a multi-column rowset. For the IN operator to work, the rowset on the right side of IN must have exactly the same number of columns as the left side; in this case, one column.
What you are trying to accomplish could perhaps be achieved if you did something like WHERE LNRule_id IN( SELECT LNRule_Id1 ...) OR LNRule_id IN( SELECT LNRule_Id2 ...) OR ... OR ... but the resulting query would be a monstrosity, and its performance would be horrendous. There may be other ways to go about it too, but anything you try will probably be similarly atrocious.
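Spelled out against the tables from your question, that workaround would look roughly like this (for illustration only; please don't actually use it):

-- The same correlated subquery is repeated once per LNRule_idN column,
-- which is both ugly and slow.
SELECT *
FROM Rules
WHERE Rules.LNRule_id IN (SELECT LNRule_id1 FROM Silhouette
                          WHERE Silhouette_Skirt = (SELECT Silhouette_Skirt FROM Style
                                                    WHERE Style_Skirt = '$Style_Skirt'))
   OR Rules.LNRule_id IN (SELECT LNRule_id2 FROM Silhouette
                          WHERE Silhouette_Skirt = (SELECT Silhouette_Skirt FROM Style
                                                    WHERE Style_Skirt = '$Style_Skirt'))
   OR Rules.LNRule_id IN (SELECT LNRule_id3 FROM Silhouette
                          WHERE Silhouette_Skirt = (SELECT Silhouette_Skirt FROM Style
                                                    WHERE Style_Skirt = '$Style_Skirt'))
   OR Rules.LNRule_id IN (SELECT LNRule_id4 FROM Silhouette
                          WHERE Silhouette_Skirt = (SELECT Silhouette_Skirt FROM Style
                                                    WHERE Style_Skirt = '$Style_Skirt'));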
I do not have enough information to be absolutely sure about what I am saying, but it seems to me that the reason why you have this problem is that your database schema is not normalized. Generally, whenever you see a table with a group of columns having names that all begin with the same prefix and end with a number, it means that someone, somewhere, did not normalize their data.
To address the edit in your question: what you have implemented might conceptually be a many-to-many relationship, but as far as relational databases are concerned (the science, the theory, the established practices, the approaches necessary to get things to actually work), it is definitely not a many-to-many relationship. Many-to-many relationships are most certainly not implemented with column1, column2, column3, ... columnN. To be sure that I am not making this stuff up, you can read what others say about many-to-many relationships here:
Many-to-many relations in RDBMS databases
So, my suggestion, if I correctly understand what is going on, would be to introduce a new table, called SilhouetteRules, which contains two columns, silhouette_id and rule_id. This table will implement a many-to-many relationship between silhouettes and your rules. Then of course you get rid of all the rule1, rule2, rule3, etc. columns from Silhouette.
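As a rough sketch (the column types are my assumption), the junction table could be defined like this:

-- Sketch only: one row per (silhouette, rule) pair; INT ids assumed.
CREATE TABLE SilhouetteRules (
    silhouette_id INT NOT NULL,
    rule_id       INT NOT NULL,
    PRIMARY KEY (silhouette_id, rule_id),
    FOREIGN KEY (silhouette_id) REFERENCES Silhouette(id),
    FOREIGN KEY (rule_id)       REFERENCES Rules(id)
);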
Once you have done that, you can obtain all silhouettes and all rules associated with them using a query like this:
SELECT * FROM Silhouette
LEFT JOIN SilhouetteRules ON
Silhouette.id = SilhouetteRules.silhouette_id
LEFT JOIN Rules ON
SilhouetteRules.rule_id = Rules.id
The above query will yield multiple rows for each silhouette, where the silhouette fields will be identical from row to row, and only the rule fields will differ. Do not be surprised by this, that's how relational databases work.
Given a specific silhouette id (given_silhouette_id below), you can retrieve all rules associated with it using a query like this:
SELECT * FROM Rules
LEFT JOIN SilhouetteRules ON
Rules.id = SilhouetteRules.rule_id
WHERE
SilhouetteRules.silhouette_id = given_silhouette_id
So, you are going to be using this query as a subquery in queries like the one in the question.
Now, regarding the query in the question, I am unable to tell you exactly how you would need to modify it to work with the normalization I proposed, because I cannot make sense of it. Even if you fix the problem you currently have with SELECT * FROM table WHERE single-column IN multi-column-rowset, there is another problem further down: the WHERE Silhouette.Silhouette_Skirt=(SELECT ... part will only work if that subquery returns exactly one row and one column. So, I do not know what you are trying to do there. Hopefully, once you normalize your schema and fix the first problem with your query, the solution to the second problem will become obvious, or you can ask another question on Stack Overflow.
P.S. did Mihai's answer work?
I am developing a forum in PHP and MySQL, and I want to make it as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problem is that there are Like and Dislike buttons under each comment. I have to store the user_name of whoever clicked the Like or Dislike button, along with the comment_id. I have made a user_likes column and a user_dislikes column in tbl_comments to store comma-separated user_names. But on this forum I have read that this is not an efficient way, and I have been advised to create a third table to store the likes and dislikes so that my database design complies with 1NF.
But the problem is, if I make a third table tbl_user_opinion with two fields like this:
1. comment_id
2. type (like or dislike)
So, will I have to run as many SQL queries as there are comments on my page to get the like and dislike data for each comment? Won't that be inefficient? I think there is some confusion on my part here. Can someone clarify this?
You have a Relational Scheme like this:
There are two ways to solve this. The first, "clean" one is to build your "like" table and do COUNT(*) on the appropriate column.
The second would be to store a counter in each comment, indicating how many up and down votes it has received.
If you want to check whether a specific user has voted on a comment, you only have to check one entry, which you can easily handle as its own query and merge the two result sets outside the database (for this, use a query returning the comment_id and the kind of vote the user cast in a specific thread).
Your approach with a comma-separated list does not perform well, because you cannot use it without extra logic or a huge amount of string parsing. If you have a database - use it!
("One piece of information - one dataset!")
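To make the first, "clean" option concrete, a single query like this (a sketch, assuming the tbl_user_opinion(comment_id, user_name, type) table proposed in the question and a comment_text column in tbl_comments) returns the like and dislike counts for every comment at once, so you do not need one query per comment:

-- Sketch: one query for all comments, not one query per comment.
-- Assumes tbl_user_opinion(comment_id, user_name, type) and that
-- tbl_comments has comment_id and comment_text columns; add a WHERE
-- on the thread id to limit it to one page of comments.
SELECT c.comment_id,
       c.comment_text,
       COALESCE(SUM(o.type = 'like'), 0)    AS likes,
       COALESCE(SUM(o.type = 'dislike'), 0) AS dislikes
FROM tbl_comments c
LEFT JOIN tbl_user_opinion o ON o.comment_id = c.comment_id
GROUP BY c.comment_id, c.comment_text;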
The comma-separated list violates the principle of atomicity, and therefore 1NF. You'll have a hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
COMMENT.COMMENT_ID,
<other COMMENT fields>,
COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
LEFT JOIN UP_VOTE
ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE
ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
COMMENT.COMMENT_ID = <whatever>
GROUP BY
COMMENT.COMMENT_ID,
<other COMMENT fields>;
Please measure on realistic amounts of data whether that works fast enough for you. If not, then denormalize the model: cache the total score in the COMMENT table and keep it current through triggers every time a row is inserted into or deleted from the *_VOTE tables.
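For example, a pair of triggers on UP_VOTE might look roughly like this (just a sketch; it assumes a SCORE column has been added to COMMENT, and DOWN_VOTE would get mirror-image triggers):

-- Sketch: keep a cached COMMENT.SCORE in sync with UP_VOTE rows.
-- Assumes COMMENT has a SCORE column; the DOWN_VOTE triggers would
-- subtract on insert and add back on delete.
CREATE TRIGGER up_vote_after_insert AFTER INSERT ON UP_VOTE
FOR EACH ROW
  UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = NEW.COMMENT_ID;

CREATE TRIGGER up_vote_after_delete AFTER DELETE ON UP_VOTE
FOR EACH ROW
  UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = OLD.COMMENT_ID;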
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above. [1]
[1] This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.
Apologies if this is redundant (it probably is); I had a look but couldn't find a question here that covered what I wanted to know.
Basically we have a table with about ~50000 rows, and it's expected to grow much bigger than that. We need to be able to allow admin users to add in custom data to an item based on its category, and users can just pick which fields defined by the administrators they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields with item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values, which links values with fields; that is how we handled things in .NET. The project is using CakePHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm however thinking of maybe just adding an item_custom_fields table that is essentially the item_id and a text field that stores XMLish formatted data. This is just for the values of the custom fields.
There are no problems if I want to fetch an item by its id, as the required data is stored in the items table, but what if I want to search based on a custom field? Would a query like
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user-input issues aside) be practical if I wanted to fetch items made of plastic in this case? How slow would that be?
Thanks.
Edit: I was afraid of that; realistically this table will be around 400k rows at launch. Thanks, guys.
Any LIKE pattern that starts with % cannot use an index on that column, so the query will scan the whole table to find the result.
The response time for that depends highly on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster.
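For comparison, with the item_fields / item_field_values design you described, a search on a custom field can use an index. A rough sketch (the column names are my guesses, since the question doesn't give them):

-- Sketch: find items whose "material" custom field equals 'Plastic'.
-- Assumes item_fields(field_id, name) and
-- item_field_values(item_id, field_id, value); an index on
-- (field_id, value) lets this avoid a full-table scan, unlike
-- LIKE '%<material>Plastic</material>%' over a text blob.
SELECT v.item_id
FROM item_field_values v
JOIN item_fields f ON f.field_id = v.field_id
WHERE f.name = 'material'
  AND v.value = 'Plastic';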
I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I'm stuck on the database table design for handling this functionality. The solution would be trivial if we only needed to do this for one type of thing (e.g. photos), but I need to enable it for 5 different things (for now; I also assume this number will grow as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is how to design the database properly, efficiently, and flexibly, so that it can store comments, likes, and tags for different tables. A design pattern as an answer would be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged-in user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationship tables: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same goes for comments.
c) I will create tables LikedPhotos [idUser, idPhoto], LikedArticles [idUser, idArticle], LikedPlace [idUser, idPlace]. The number of likes will be calculated by queries (which, I assume, is bad). And...
I really don't like this design for the last part; it smells bad to me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement], and the same for Comments and Tags with the proper columns for each. Now, when I want to mark a photo as liked, I will insert:
typeId = SELECT idType FROM ElementType WHERE TypeName = 'Photo'
INSERT INTO LikedElement (idUser, idElementType, idLikedElement) VALUES (userId, typeId, photoId)
and for places:
typeId = SELECT idType FROM ElementType WHERE TypeName = 'Place'
INSERT INTO LikedElement (idUser, idElementType, idLikedElement) VALUES (userId, typeId, placeId)
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
Lastly, I also wonder which is the best place to store the counter for how many times an element was liked. I can think of only two ways:
in the element (Photo/Article/Place) table
by SELECT COUNT().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
The entity-relationship term for this is "category" (see the ERwin Methods Guide, section "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, the same tag can be used for more than one entity, but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
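For illustration, a rough DDL sketch of the third approach (all table and column names here are invented, and the answer's diagram is not reproduced):

-- Sketch only: one abstract "entity" table plus one table per concrete type;
-- likes, tags and comments all hang off entity_id.
CREATE TABLE entity (
    entity_id   INT AUTO_INCREMENT PRIMARY KEY,
    entity_type ENUM('photo', 'article', 'place') NOT NULL
);

CREATE TABLE photo (
    entity_id INT PRIMARY KEY,
    url       VARCHAR(255) NOT NULL,
    FOREIGN KEY (entity_id) REFERENCES entity (entity_id)
);
-- article and place would mirror photo, each with their own columns.

CREATE TABLE user_likes (
    user_id   INT NOT NULL,
    entity_id INT NOT NULL,
    PRIMARY KEY (user_id, entity_id),
    FOREIGN KEY (entity_id) REFERENCES entity (entity_id)
);
-- comment and tag tables link to entity_id in the same way.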
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
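For instance, a minimal add_like procedure might look like this (purely a sketch; the likes table and its columns are placeholders for whatever your schema ends up being):

-- Sketch: application code calls add_like(...) and never touches the tables
-- directly; likes(user_id, entity_id) is a placeholder schema, and the
-- INSERT IGNORE assumes a unique key on (user_id, entity_id) so that
-- repeated likes are no-ops.
CREATE PROCEDURE add_like (IN p_user_id INT, IN p_entity_id INT)
    INSERT IGNORE INTO likes (user_id, entity_id) VALUES (p_user_id, p_entity_id);

-- Usage from application code: CALL add_like(42, 17);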
This is a general idea; please don't pay much attention to the field name styling, but more to the relations and structure.
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
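The actions table implied by those queries might be shaped roughly like this (a sketch; only id_Stuff, typeStuff and typeAction come from the pseudocode above, the rest is assumed):

-- Sketch of a generic "actions" table: one row per like/comment/tag on any
-- kind of stuff. Column names id_Stuff, typeStuff and typeAction follow the
-- queries above; the other columns are assumed for illustration.
CREATE TABLE actions (
    id_action  INT AUTO_INCREMENT PRIMARY KEY,
    id_user    INT NOT NULL,
    id_Stuff   INT NOT NULL,
    typeStuff  ENUM('photo', 'article', 'place') NOT NULL,
    typeAction ENUM('like', 'comment', 'tag') NOT NULL,
    content    TEXT NULL  -- comment body or tag text; NULL for likes
);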
As far as I understand, several tables are required, with many-to-many relations between them:
A table which stores user data such as name, surname and birth date, with an identity field.
A table which stores data types. These types may be photos, shares, links. Each type must have a unique table; therefore, there is a relation between their individual tables and this table.
Each different data type has its own table, for example status updates, photos, links.
The last table is for the many-to-many relation, storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to be made particularly difficult or inefficient by one design choice or the other?
If not, favour the one that requires fewer tables.
In this case:
Add comment: you either pick a particular many-to-many table or insert into a common table with a known identifier for what is being commented on. I think the client code will be slightly simpler in your second case.
Find comments for an item: here using a common table seems slightly easier - we just have a single query parameterised by the type of entity.
Find comments by a person about one kind of thing: a simple query in either case.
Find all comments by a person about all things: this seems a little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using a table per entity for comments etc. More tables means better sharding and scaling, and it's not a problem to manage many similar tables in any framework I know.
One day you'll need to optimize reads from such a structure. You can easily create aggregating tables over the base ones and lose a bit on writes.
One big table with a type dictionary may become unmanageable one day.
Definitely go with the second approach, where you have one table and store the element type for each row; it will give you a lot more flexibility. Basically, when something can logically be done with fewer tables, it is almost always better to go with fewer tables. One advantage that comes to mind for your particular case: suppose you want to delete all liked elements of a certain user. With the first approach you need to issue one query per element type, but with the second approach it can be done with a single query. Or consider adding a new element type: with the first approach that means creating a new table for each new type, but with the second approach you don't have to do anything...
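To make that concrete (userId here is just a placeholder for the user in question):

-- Second approach: one statement removes everything a user has liked.
DELETE FROM LikedElement WHERE idUser = @userId;

-- First approach: the same cleanup needs one statement per element type.
DELETE FROM LikedPhotos   WHERE idUser = @userId;
DELETE FROM LikedArticles WHERE idUser = @userId;
DELETE FROM LikedPlace    WHERE idUser = @userId;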
Apologies for the long topic; I didn't intend for it to be this long, but it's a pretty simple issue I've been having. :)
Let's say you have a simple table called tags that has columns tag_id and tag. The tag_id is simply an auto increment column and the tag is the title of the tag. If I need to add a description field, that would be around 1-2 paragraphs on average (max around 3-4 paragraphs probably), should I simply add a description field to the table or should I create a new table called tag_descriptions and store the descriptions with the tag_id?
I remember reading that it is better to do this because, even in a query that doesn't select the description, that description field will still slow MySQL down. Is this true? I don't even remember where I read it, but I've been sort of following it for a couple of years now... I now question whether I need to do this; I have a feeling I don't. You'd also need an inner join whenever you need the description field.
Another question I have: is it generally bad to create new tables that will only ever hold a few rows at most? What if this data doesn't fit anywhere else?
I have a simple case below which relates to these two questions.
I have three tables content, tags, and content_tags that make up a many to many relationship:
content
  content_id
  region (enum column with about 6-7 different values; most likely won't grow later on)

tags
  tag_id
  tag

content_tags
  content_id
  tag_id
I want to store a description around 1-2 paragraphs for each tag, but also for each region. I'm wondering what would be the best way to do this?
Option A:
Just add a description column to the tags table
Create a new table for region_descriptions

Option B:
Create a new table called descriptions with fields: id, description, and type
The id would be the id of the content or the id of the enum field
The type would be whether it is a tag description or a region description (would use the enum column for this)
Maybe have a primary key on the id and type? (See the sketch after this list.)

Option C:
Create a new table for tag_descriptions
Create a new table for region_descriptions
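For concreteness, Option B's table might look something like this (a sketch; the column types are assumptions):

-- Sketch of Option B: one shared descriptions table keyed by (id, type).
CREATE TABLE descriptions (
    id          INT NOT NULL,                  -- tag_id, or the region enum value's index
    type        ENUM('tag', 'region') NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (id, type)
);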
Option A seems to be a good choice if adding the description column doesn't slow down MySQL SELECT queries that don't need the description.
Assuming the description column would slow down MySQL, option B might be a good choice. It also removes the need for a small table with just 6-7 rows to hold the region descriptions. Although, now that I think of it, would it be slow to join to this table when, originally, to get a region description you'd only need to go through very few rows?
Option C would be ideal if the description columns would slow down MySQL and if a small table like region_descriptions would not matter.
Maybe none of these options are the best, feel free to offer another option. Thanks.
P.S. What would be an ideal column type to use to hold data that is usually 1-2 paragraphs, but might be a little more sometimes?
I don't think it really matters if you don't handle thousands of queries per minute. If you are going to have a zillion queries per minute, then I would implement the various options and perform benchmarks for all these options. Based on the results, you can make a decision.
In my (admittedly somewhat uninformed) opinion, it really depends on how much you'll be using both of them.
If properly indexed, that JOIN should not be very expensive. Also, a larger table will be slower. It inhibits caching, and takes longer to access stuff, although indexing seriously mitigates this problem.
If you'll be joining tag names to tag IDs a LOT, and only rarely will be using the descriptions, I'd say go with separate tables. If you'll be using the descriptions more often, go with one table.
For the first part of your question: if you have a tag with an id, a name and a description, you should save it in 1 table.
Now, this query
SELECT name FROM tags WHERE id = 1;
will NOT slow down if you have 1, 2 or 20 extra fields in there.