Tag Database Schema - mysql

I have three tables for tagging. The first is the question table, which lists questions, each with an ID. The second is the tag table, which lists tag names, each with an ID. The third is the question_tag table, which links questions to tags, so a question with multiple tags has multiple rows in question_tag. I thought of storing a serialized array in the question_tag table instead, but storing arrays inside an SQL database is generally not a good idea.
Below is the schema. Arrow denoting foreign key.
                       -------------------
-----------------      |  question_tag   |
|   question    |      -------------------      ------------------
-----------------      | question_tag_ID |      |      tag       |
| question_ID   | ---> | question_ID     |      ------------------
-----------------      | tag_ID          | <--- | tag_ID         |
                       -------------------      | tag_name       |
                                                ------------------
I want a query that outputs the following table:
----------------------------------------------------
| question_id | tag_name                           |
----------------------------------------------------
| 1           | algebra, calculus, differentiation |
| 2           | calculus                           |
| 3           | algebra, trigonometry              |
----------------------------------------------------
How can I write this query? I thought about selecting from question and joining a derived table built from SELECT tag.tag_name FROM tag WHERE question_tag.tag_ID = tag.tag_ID, but how do I produce the right-hand column (tag_name) as in the table above?
I would really appreciate help with this SQL query. I am guessing that I need a nested SELECT query for the right-hand (tag_name) column and then need to JOIN it to the question table, but I am not sure how to nest the SELECT queries.
This is what I have come up with:
SELECT * FROM question as Q LEFT JOIN (SELECT T.tag_name FROM tag as T WHERE T.tag_id IN (SELECT QT.tag_id FROM question_tag AS QT WHERE QT.question_ID = Q.id)) AS QT_T

You'll need to aggregate and concatenate. Please check out some of the related questions:
Does T-SQL have an aggregate function to concatenate strings?
Implode type function in SQL Server 2000?
Concatenate row values T-SQL
How to use GROUP BY to concatenate strings in MySQL?
Aggregate String Concatenation in Oracle 10g
Create a delimitted string from a query in DB2
How to concatenate strings of a string field in a PostgreSQL 'group by' query?
UPDATE
While I don't have a MySQL database to test this on, based on one of the links above (and knowing this is MySQL), I've put together the following query:
SELECT
    qt.question_id,
    GROUP_CONCAT(t.tag_name SEPARATOR ', ') AS tag_name
FROM question_tag qt
LEFT JOIN tag t ON t.tag_id = qt.tag_id
GROUP BY qt.question_id;
Please try this out and let me know if it works.
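To check the GROUP_CONCAT approach without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. Assumptions: the sample rows are invented to reproduce the table in the question, and SQLite's group_concat takes the separator as a second argument instead of MySQL's SEPARATOR keyword. In MySQL you can also write GROUP_CONCAT(t.tag_name ORDER BY t.tag_name SEPARATOR ', ') to fix the order inside each list; SQLite does not guarantee that order, so the sketch sorts the names in Python.

```python
import sqlite3

def tags_per_question(conn):
    """Return {question_ID: "tag1, tag2, ..."} built with group_concat."""
    rows = conn.execute(
        """
        SELECT qt.question_ID, group_concat(t.tag_name, ', ') AS tag_name
        FROM question_tag qt
        LEFT JOIN tag t ON t.tag_ID = qt.tag_ID
        GROUP BY qt.question_ID
        ORDER BY qt.question_ID
        """
    ).fetchall()
    # Order inside group_concat is unspecified in SQLite, so sort for a stable result.
    return {qid: ", ".join(sorted(names.split(", "))) for qid, names in rows}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE question (question_ID INTEGER PRIMARY KEY);
    CREATE TABLE tag (tag_ID INTEGER PRIMARY KEY, tag_name TEXT NOT NULL);
    CREATE TABLE question_tag (
        question_tag_ID INTEGER PRIMARY KEY,
        question_ID INTEGER NOT NULL REFERENCES question(question_ID),
        tag_ID INTEGER NOT NULL REFERENCES tag(tag_ID)
    );
    INSERT INTO question VALUES (1), (2), (3);
    INSERT INTO tag VALUES (1, 'algebra'), (2, 'calculus'),
                           (3, 'differentiation'), (4, 'trigonometry');
    INSERT INTO question_tag (question_ID, tag_ID) VALUES
        (1, 1), (1, 2), (1, 3), (2, 2), (3, 1), (3, 4);
    """
)

for qid, names in tags_per_question(conn).items():
    print(qid, names)
```

Each question appears once, with all of its tag names joined into a single column, which is exactly what the aggregate-and-concatenate answer relies on.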

Related

ER_NON_UNIQ_ERROR and how to design tables correctly

I have come across this problem and I've been trying to solve it for a few days now.
Let's say I have the following tables:
properties
-----------------------------------------
| id | address | building_material |
-----------------------------------------
| 1 | Street 1 | 1 |
-----------------------------------------
| 2 | Street 2 | 2 |
-----------------------------------------
building_materials
-----------------------------
| id | building_material |
-----------------------------
| 1 | Wood |
-----------------------------
| 2 | Stone |
-----------------------------
Now, I would like to provide an API where you could send a request asking for every property whose building material is wood, like this:
myapi.com/properties?building_material=Wood
So I would like to query database like this (I want to return the string value of building_material not the numeric value):
SELECT p.id, p.address, bm.building_material
FROM properties as p
JOIN building_materials as bm ON (p.building_material = bm.id)
WHERE building_material = "Wood"
But this will give me an error
Column 'building_material' in where clause is ambiguous
Also, if I want to get the property with an id of 1:
SELECT p.id, p.address, bm.building_material
FROM properties as p
JOIN building_materials as bm ON (p.building_material = bm.id)
WHERE id = 1
Column 'id' in where clause is ambiguous
I understand that the error means that I have same column name in two tables and I don't specify which id I want like p.id.
The problem is that I don't know how many query parameters the API user is going to send, and I would like to avoid looping through them and changing id to p.id and building_material to bm.building_material. I also don't want users to have to send requests to the API like this:
myapi.com/properties?bm.building_material=Wood
I've thought about changing the properties table building_material to fk_building_material and changing properties table id to property_id.
I just don't like the idea that on the client side I would then have to refer to a property's building material as fk_building_material. Is this a valid way to solve the problem, or what would be the correct way to design these tables?
The query mentions two tables, so all the columns in both tables are available for use anywhere in the query.
In one table, building_material is an id that links to the other table; in the other table, it is a string. While this is allowed, it is confusing to the reader, and to the parser. To resolve the confusion, you must qualify building_material with the one you want by putting the table alias (or table name) in front of it, as you already did everywhere else.
The two id columns are ambiguous for the same reason. But naming each table's key id is a common convention among table designers, so it is fine for the id in one table to mean something different from the id in the other: p.id refers to a property, bm.id to a building material. Just qualify it (WHERE p.id = 1).
SELECT p.id, p.address, bm.building_material
FROM properties as p
JOIN building_materials as bm ON (p.building_material = bm.id)
WHERE bm.building_material = "Wood" -- Note "bm."
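The question's remaining worry is that API parameter names don't match the qualified column names. One common approach, sketched here with sqlite3 standing in for MySQL, is a whitelist that maps each public parameter name to its qualified column, so clients can keep sending building_material=Wood. FILTERS and find_properties are hypothetical names, not part of any framework.

```python
import sqlite3

# Hypothetical whitelist: public API parameter name -> fully qualified column.
FILTERS = {
    "id": "p.id",
    "address": "p.address",
    "building_material": "bm.building_material",
}

BASE_QUERY = """
    SELECT p.id, p.address, bm.building_material
    FROM properties AS p
    JOIN building_materials AS bm ON p.building_material = bm.id
"""

def find_properties(conn, params):
    """Build a WHERE clause from API params, qualifying each column name."""
    clauses, values = [], []
    for name, value in params.items():
        if name not in FILTERS:
            raise ValueError(f"unsupported filter: {name}")
        # Only whitelisted, pre-qualified column names reach the SQL text;
        # the user-supplied value is always bound as a parameter.
        clauses.append(f"{FILTERS[name]} = ?")
        values.append(value)
    sql = BASE_QUERY
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql + " ORDER BY p.id", values).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE building_materials (id INTEGER PRIMARY KEY, building_material TEXT);
    CREATE TABLE properties (
        id INTEGER PRIMARY KEY,
        address TEXT,
        building_material INTEGER REFERENCES building_materials(id)
    );
    INSERT INTO building_materials VALUES (1, 'Wood'), (2, 'Stone');
    INSERT INTO properties VALUES (1, 'Street 1', 1), (2, 'Street 2', 2);
    """
)

print(find_properties(conn, {"building_material": "Wood"}))  # [(1, 'Street 1', 'Wood')]
```

Because only whitelisted column names are ever interpolated and every value is bound as a parameter, this also keeps the endpoint safe from SQL injection.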

MySQL query - how to look for certain string in the field

I have a table "story" as follows:
+++++++++++++++++++++++++++++++++++++++++++
| id | keywords |
+++++++++++++++++++++++++++++++++++++++++++
| 1 | romance,movie,drama |
| 2 | newmovie,horor,comedy |
| 3 | movie,scifi |
| 4 | newmovie,romance,drama,asia |
| 5 | kids,movie |
+++++++++++++++++++++++++++++++++++++++++++
I tried the following query to search for 'movie' in the keywords field:
SELECT id FROM story WHERE keywords LIKE '%movie%'
and the result is
1,2,3,4,5
but in this case I want the result to be 1,3,5 (rows whose keywords only contain newmovie should not be included). Can someone help me write the query for this?
Thank you for your help.
You want to use find_in_set like this:
SELECT id FROM story WHERE find_in_set('movie', keywords) > 0;
Though you should really consider normalizing your table structure.
In this case, you could've stored one single keyword in one row, then the query would be simply like:
select id from story where keyword = 'movie';
and that would've been the end of it. No heavy string functions needed.
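You could have a structure like this, with a junction table holding one row per story/keyword pair:
keywords(id, name);
story(id, . . .);
story_keywords(story_id, keyword_id);
then you could simply join them like this:
select s.*
from story s
inner join story_keywords sk on sk.story_id = s.id
inner join keywords k on k.id = sk.keyword_id
where k.name = 'movie';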
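Both suggestions can be compared side by side. This sketch uses sqlite3 as a stand-in for MySQL. SQLite has no FIND_IN_SET, so the first query uses the equivalent trick of wrapping both the stored list and the search term in commas. The second query uses a story_keywords junction table (a hypothetical name) filled from the existing comma-separated data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE story (id INTEGER PRIMARY KEY, keywords TEXT);
    INSERT INTO story VALUES
        (1, 'romance,movie,drama'), (2, 'newmovie,horor,comedy'),
        (3, 'movie,scifi'), (4, 'newmovie,romance,drama,asia'), (5, 'kids,movie');
    """
)

# 1) Keep the comma-separated column: ',movie,' only matches a whole list element.
whole_word = [
    r[0] for r in conn.execute(
        "SELECT id FROM story "
        "WHERE ',' || keywords || ',' LIKE '%,' || ? || ',%' ORDER BY id",
        ("movie",),
    )
]

# 2) Normalized design: one row per (story, keyword) pair.
conn.executescript(
    """
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE story_keywords (
        story_id INTEGER REFERENCES story(id),
        keyword_id INTEGER REFERENCES keywords(id),
        PRIMARY KEY (story_id, keyword_id)
    );
    """
)
for story_id, keywords in conn.execute("SELECT id, keywords FROM story").fetchall():
    for word in keywords.split(","):
        conn.execute("INSERT OR IGNORE INTO keywords (name) VALUES (?)", (word,))
        conn.execute(
            "INSERT INTO story_keywords SELECT ?, id FROM keywords WHERE name = ?",
            (story_id, word),
        )

normalized = [
    r[0] for r in conn.execute(
        """
        SELECT s.id FROM story s
        JOIN story_keywords sk ON sk.story_id = s.id
        JOIN keywords k ON k.id = sk.keyword_id
        WHERE k.name = ? ORDER BY s.id
        """,
        ("movie",),
    )
]
print(whole_word, normalized)  # [1, 3, 5] [1, 3, 5]
```

Both return 1, 3 and 5, but only the normalized version can use an index on keywords.name instead of scanning every row with a string match.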
Your problem is that "newmovie" also matches "%movie%"; you need to match only the whole keyword "movie".

Counting instances in table

I've got this tag system for tagging blog entries and such. The tags are in one table, containing only a tag name and a primary key. Then I have another table with objects that are using the tags.
It could look something like this:
_________________________________
| tags |
--------------------------------|
| id | name |
|-------------------------------|
| 1 | Scuba diving |
| 2 | Dancing |
---------------------------------
_________________________________
| tag_objects |
--------------------------------|
| id | tag | object |
|-------------------------------|
| 1 | 2 | 13 |
| 2 | 2 | 18 |
| 3 | 1 | 24 |
---------------------------------
Now, what I need to accomplish is to add a column to the tags table, called "occurrences" or something. For each tag in tags, occurrences should be set to the number of times that tag is used in tag_objects.
So basically something like (obviously pseudo-code):
foreach(tags):
UPDATE tags
SET occurrences = (SELECT COUNT(id)
FROM tag_objects
WHERE tag = tags.id);
When people create new posts and stuff in the future, I'll just have a trigger to update the count, but I have a couple of thousand rows already that I need to count first. I don't know how to do this, so any assistance would be appreciated.
The easiest way to do this, without any extra tables, would be:
First add the extra field:
mysql> alter table tags add occurs int default 0;
Then just update this new field with the number of occurrences:
mysql> update tags
    -> left join (select tag, count(id) as cnt
    ->            from tag_objects
    ->            group by tag) as subq
    ->   on tags.id = subq.tag
    -> set occurs = coalesce(subq.cnt, 0);
Note the use of the left join to ensure all tags are counted, even the unused ones. The coalesce-function will convert NULL to 0.
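Here is a runnable sketch of the add-column-then-backfill approach, using sqlite3 in place of MySQL. SQLite does not accept MySQL's UPDATE ... LEFT JOIN syntax, so this version uses a correlated subquery instead; it yields the same counts, including 0 for unused tags. The 'Unused' tag is an invented row added only to show that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag_objects (id INTEGER PRIMARY KEY, tag INTEGER, object INTEGER);
    INSERT INTO tags VALUES (1, 'Scuba diving'), (2, 'Dancing'), (3, 'Unused');
    INSERT INTO tag_objects VALUES (1, 2, 13), (2, 2, 18), (3, 1, 24);

    -- Step 1: add the column with a default of 0.
    ALTER TABLE tags ADD COLUMN occurs INTEGER DEFAULT 0;

    -- Step 2: one-off backfill. The subquery is evaluated per tag, and
    -- COUNT(*) over zero matching rows is 0, so unused tags get 0.
    UPDATE tags
    SET occurs = (SELECT COUNT(*) FROM tag_objects WHERE tag_objects.tag = tags.id);
    """
)

print(conn.execute("SELECT id, name, occurs FROM tags ORDER BY id").fetchall())
# [(1, 'Scuba diving', 1), (2, 'Dancing', 2), (3, 'Unused', 0)]
```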
You have done good work; your query should work.
But it will likely perform poorly. I advise you to create a new table instead:
CREATE TABLE newTags AS
SELECT t.id, t.name, COUNT(*) AS occurrences
FROM tags t
INNER JOIN tag_objects tobj
ON tobj.tag = t.id
GROUP BY t.id, t.name;
This should be very fast. (Because of the INNER JOIN, tags that are never used will not appear in newTags.)
Unless you really need to denormalize your data, you should stay away from that. Counting on indexed columns is usually very fast. I am a big fan of clean and normalized data ;-)
I would generally not want to store computed values in columns on the database - it's messy, can easily get out of sync, and offends the deities of normalization.
However, if you really must have a database entity that exposes the count, I'd create a view (http://dev.mysql.com/doc/refman/5.0/en/create-view.html) using the SQL provided by Scorpio. A MySQL view does not store a pre-computed value; it runs its query every time it is read, so the count can never get out of sync.
CREATE VIEW tag_occurrences AS
SELECT t.id, t.name,
COUNT(*) AS occurrences
FROM tags t
INNER JOIN tag_objects tobj
ON tobj.tag = t.id
GROUP BY t.id, t.name;
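The key property of a view is that nothing is stored: the count is recomputed on every read. The sketch below uses sqlite3 in place of MySQL, and the 'Unused' tag and the extra tag_objects row are invented. It also swaps the INNER JOIN and COUNT(*) for a LEFT JOIN and COUNT(tobj.id), so that tags with no objects show up with 0 instead of disappearing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag_objects (id INTEGER PRIMARY KEY, tag INTEGER, object INTEGER);
    INSERT INTO tags VALUES (1, 'Scuba diving'), (2, 'Dancing'), (3, 'Unused');
    INSERT INTO tag_objects VALUES (1, 2, 13), (2, 2, 18), (3, 1, 24);

    -- LEFT JOIN + COUNT(tobj.id) keeps unused tags with a count of 0.
    CREATE VIEW tag_occurrences AS
    SELECT t.id, t.name, COUNT(tobj.id) AS occurrences
    FROM tags t
    LEFT JOIN tag_objects tobj ON tobj.tag = t.id
    GROUP BY t.id, t.name;
    """
)

def counts():
    return conn.execute(
        "SELECT id, occurrences FROM tag_occurrences ORDER BY id"
    ).fetchall()

before = counts()  # [(1, 1), (2, 2), (3, 0)]
# The view is re-evaluated on each read, so a new tagging shows up immediately.
conn.execute("INSERT INTO tag_objects (tag, object) VALUES (3, 30)")
after = counts()   # [(1, 1), (2, 2), (3, 1)]
print(before, after)
```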
I think you will get better performance if you increment and decrement the occurrences value in insert/delete triggers on the tag_objects table.
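A minimal sketch of the trigger approach, again with sqlite3 as a stand-in; the trigger names are hypothetical. In MySQL the same triggers would be written with FOR EACH ROW, for example CREATE TRIGGER tag_objects_ai AFTER INSERT ON tag_objects FOR EACH ROW UPDATE tags SET occurrences = occurrences + 1 WHERE id = NEW.tag; whereas SQLite triggers are always row-level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tags (
        id INTEGER PRIMARY KEY, name TEXT,
        occurrences INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE tag_objects (
        id INTEGER PRIMARY KEY, tag INTEGER REFERENCES tags(id), object INTEGER
    );

    -- Keep tags.occurrences in step with tag_objects on every insert/delete.
    CREATE TRIGGER tag_objects_ai AFTER INSERT ON tag_objects
    BEGIN
        UPDATE tags SET occurrences = occurrences + 1 WHERE id = NEW.tag;
    END;

    CREATE TRIGGER tag_objects_ad AFTER DELETE ON tag_objects
    BEGIN
        UPDATE tags SET occurrences = occurrences - 1 WHERE id = OLD.tag;
    END;

    INSERT INTO tags (id, name) VALUES (1, 'Scuba diving'), (2, 'Dancing');
    INSERT INTO tag_objects (id, tag, object) VALUES (1, 2, 13), (2, 2, 18), (3, 1, 24);
    DELETE FROM tag_objects WHERE id = 2;
    """
)

print(conn.execute("SELECT id, occurrences FROM tags ORDER BY id").fetchall())
# [(1, 1), (2, 1)]
```

If a tag_objects row can also have its tag changed by an UPDATE, a third trigger would need to decrement OLD.tag and increment NEW.tag; existing rows still need the one-off backfill before the triggers take over.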
Your pseudo-code will work exactly as written (without the foreach loop). At least it would in Oracle; I assume MySQL also lets you use a correlated subquery as the value.
For the inserting of new rows you could use a query like:
INSERT INTO tags VALUES(x,y,z,1) ON DUPLICATE KEY UPDATE occurrences = occurrences+1;
I didn't check the syntax, but something like that.

SQL statement to return elements from a column only if no elements from a different column match

Sorry for the confusing question, I will try to clarify.
I have an SQL database (that I did not create) that I would like to write a query for. I know very little about SQL, so it is hard for me to even know what to search for to see if this question has already been asked, so sorry if it has. It should be an easy solution for those in the know.
The query I need is for a search I would like to perform on an existing data management system. I want to return all the documents that a given user has NOT signed off on, as indicated by rows in a signoffs_table. The data is stored roughly as follows (this is actually a simplification of the real schema and hides several LEFT JOINs and columns):
signoffs_table:
| id | user_id | document_id | signers_list |
The naive solution I had was to do something like the following:
SELECT document_id from signoffs_table WHERE (user_id <> $BobsID) AND signers_list LIKE "%Bob%";
This works if ONLY Bob signs the document. The problem is that if Bob and Mary have signed the document then the table looks like this:
signoffs_table:
-----------------------------------------------
| id | user_id | document_id | signers_list |
-----------------------------------------------
| 1 | 10 | 100 | "Bob,Mary,Jim" |
| 2 | 20 | 100 | "Bob,Mary,Jim" |
-----------------------------------------------
(Assume Bob's ID = 10 and Mary's ID = 20.)
When I run the query, I get back document_id 100 (from row #2), because there is a row that Bob should have signed but did not.
Is what I am trying to do possible with the given database structure? I can provide more details if needed. I am not sure how much details are needed.
I guess this query is what you mean:
SELECT DISTINCT t1.document_id FROM signoffs_table AS t1
WHERE signers_list LIKE "%Bob%"
AND NOT EXISTS (
SELECT 1 FROM signoffs_table AS t2
WHERE (t2.user_id = $BobsID) AND t2.document_id = t1.document_id )
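Here is the NOT EXISTS query run against the rows from the question plus one invented document (200) that only Mary has signed. It uses sqlite3 in place of MySQL and a bound parameter in place of $BobsID; the naive query from the question is included for contrast.

```python
import sqlite3

BOBS_ID = 10  # from the question: Bob's user_id is 10

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE signoffs_table (
        id INTEGER PRIMARY KEY, user_id INTEGER,
        document_id INTEGER, signers_list TEXT
    );
    -- Document 100: Bob (10) and Mary (20) have both signed.
    INSERT INTO signoffs_table VALUES
        (1, 10, 100, 'Bob,Mary,Jim'), (2, 20, 100, 'Bob,Mary,Jim');
    -- Document 200 (invented): only Mary has signed so far.
    INSERT INTO signoffs_table VALUES (3, 20, 200, 'Bob,Mary');
    """
)

# Naive version: any row not written by Bob matches, even if Bob signed elsewhere.
naive = [
    r[0] for r in conn.execute(
        "SELECT document_id FROM signoffs_table "
        "WHERE user_id <> ? AND signers_list LIKE '%Bob%' ORDER BY document_id",
        (BOBS_ID,),
    )
]

# NOT EXISTS version: keep a document only if Bob has no sign-off row for it.
unsigned = [
    r[0] for r in conn.execute(
        """
        SELECT DISTINCT t1.document_id
        FROM signoffs_table AS t1
        WHERE t1.signers_list LIKE '%Bob%'
          AND NOT EXISTS (
              SELECT 1 FROM signoffs_table AS t2
              WHERE t2.user_id = ? AND t2.document_id = t1.document_id
          )
        ORDER BY t1.document_id
        """,
        (BOBS_ID,),
    )
]
print(naive, unsigned)  # [100, 200] [200]
```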
I believe your design is incorrect. You have a many-to-many relationship between documents and signers. You should have a junction table, something like:
ID DocumentID SignerID
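A sketch of the junction-table design suggested above, with sqlite3 standing in for MySQL. The table names and the signed flag are assumptions, not the poster's schema: each row says that a given user must sign a given document, and whether they have done so yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (document, required signer) pair.
    -- signed is a hypothetical flag: 0 = still pending, 1 = signed.
    CREATE TABLE document_signers (
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL REFERENCES documents(id),
        signer_id INTEGER NOT NULL REFERENCES users(id),
        signed INTEGER NOT NULL DEFAULT 0,
        UNIQUE (document_id, signer_id)
    );
    INSERT INTO users VALUES (10, 'Bob'), (20, 'Mary'), (30, 'Jim');
    INSERT INTO documents VALUES (100, 'Doc A'), (200, 'Doc B');
    INSERT INTO document_signers (document_id, signer_id, signed) VALUES
        (100, 10, 1), (100, 20, 1), (100, 30, 0),
        (200, 10, 0), (200, 20, 1);
    """
)

def pending_for(user_id):
    """Return ids of documents the user must sign but has not signed yet."""
    rows = conn.execute(
        "SELECT document_id FROM document_signers "
        "WHERE signer_id = ? AND signed = 0 ORDER BY document_id",
        (user_id,),
    )
    return [r[0] for r in rows]

print(pending_for(10), pending_for(30))  # [200] [100]
```

With one row per signer there is no string matching at all, so a signer called "Bobby" can never be confused with "Bob".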

How do I store URL fragments in a database?

How are URLs (fragments) stored in a relational database?
In the following URL fragment:
~/house/room/table
it lists all the information on a table, and perhaps some information about the table.
This fragment:
~/house
outputs: Street 13 and Room, Garage, Garden
~/house/room
outputs: My room and Chair, Table, Window
What would the database schema look like? And what happens if I rename house to flat?
Possible solution
I was thinking that I could create a hash for the URL and store it along with parentID and information. If I rename some upper-level segment I would then need to update all the rows which contain the given segment.
Then I thought I would store each segment along with its information and its level:
SELECT FROM items WHERE key=house AND level=1 AND key=room AND level=2
How do I solve this problem if the URL can be arbitrarily deep?
Check out the Adjacency List Model and the Nested Set Model described in Joe Celko's Trees and Hierarchies in SQL for Smarties.
You should find plenty of information on this topic; one article is here.
Update
The Nested Set Model is very good if you are looking for a task like 'Retrieving a Single Path'. What you have is 'Find the Immediate Subordinates of a Node'. Here the Adjacency List Model is better.
| id | p_id | name |
| 1 | null | root |
| 2 | 1 | nd1.1 |
| 3 | 2 | nd1.2 |
| 4 | 1 | nd2.1 |
SQL to get a row with the name of a fragment and its direct sub-items:
SELECT
p.name,
GROUP_CONCAT(
c.name
SEPARATOR '/'
) AS subList
FROM _table p
INNER JOIN _table c
ON p.id = c.p_id
WHERE p.name = 'root'
GROUP BY p.id, p.name
P.S. Prefer WHERE p.id = 1: id is unique, whereas name can be ambiguous.
See the MySQL GROUP_CONCAT function documentation for more syntax details.
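To answer the arbitrary-depth part with the adjacency list model, a URL can be resolved one segment at a time, each step looking up a child by name under the previous node. This is a sketch under assumptions: sqlite3 stands in for MySQL, and the nodes table, its info column, and the resolve/children helpers are hypothetical names built from the house/room example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Adjacency list: each segment stores only its own name and its parent.
    CREATE TABLE nodes (
        id INTEGER PRIMARY KEY,
        p_id INTEGER REFERENCES nodes(id),
        name TEXT NOT NULL,
        info TEXT,
        UNIQUE (p_id, name)
    );
    INSERT INTO nodes VALUES
        (1, NULL, 'house', 'Street 13'),
        (2, 1, 'room', 'My room'),
        (3, 1, 'garage', NULL),
        (4, 1, 'garden', NULL),
        (5, 2, 'chair', NULL),
        (6, 2, 'table', NULL),
        (7, 2, 'window', NULL);
    """
)

def resolve(path):
    """Walk '~/a/b/c' one segment at a time; return the node id or None."""
    node_id = None  # the root segment has p_id NULL
    for segment in path.strip("~/").split("/"):
        row = conn.execute(
            "SELECT id FROM nodes WHERE p_id IS ? AND name = ?", (node_id, segment)
        ).fetchone()
        if row is None:
            return None
        node_id = row[0]
    return node_id

def children(node_id):
    """Names of the immediate sub-items of a node."""
    rows = conn.execute("SELECT name FROM nodes WHERE p_id = ? ORDER BY id", (node_id,))
    return [r[0] for r in rows]

# Renaming 'house' to 'flat' touches exactly one row; descendants keep their p_id.
conn.execute("UPDATE nodes SET name = 'flat' WHERE p_id IS NULL AND name = 'house'")
print(resolve("~/flat/room/table"), resolve("~/house/room"))  # 6 None
print(children(resolve("~/flat")))  # ['room', 'garage', 'garden']
```

Renaming is cheap in this model because only the renamed row changes; the cost is one query per path segment (MySQL 8.0 and SQLite can also walk the tree in a single WITH RECURSIVE query).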