Multi-conditional join through a link table - mysql

Bear with me; this needs a lot of up-front information to explain what I am trying to do. I have tried to genericize it as much as possible to make things clearer. In a single query, I am hoping to pull out a list of pages that match tags linked in another table, and these tags are in groups. I would like to use the textual name of each item instead of its id. If nothing else, I could run two up-front queries to get the tag_id and taggroup_id, but I am hoping not to have to do that.
DB Schema:
+-----------------------------------+
| taggroups |
+------------------+----------------+
| taggroup_id | group_name |
+------------------+----------------+
| 1 | fruits |
+------------------+----------------+
+-----------------------------------------------+
| tags |
+-------------+-----------------+---------------+
| tag_id | taggroup_id | tag_name |
+-------------+-----------------+---------------+
| 1 | 1 | apple |
| 2 | 1 | orange |
| 3 | 1 | grape |
+-------------+-----------------+---------------+
+--------------------------------------+
| pages |
+------------------+-------------------+
| page_id | title |
+------------------+-------------------+
| 99 | Doctor a day |
+------------------+-------------------+
+--------------------------------------------------+
| tags_to_pages |
+------------+----------+---------------+----------+
| join_id | tag_id | taggroup_id | page_id |
+------------+----------+---------------+----------+
| 1 | 1 | 1 | 99 |
| 2 | 2 | 1 | 99 |
+------------+----------+---------------+----------+
Test Query:
Got this far and can't seem to get it to work.
SELECT
pages.*, tags.tag_name, taggroups.group_name
FROM
tags_to_pages
INNER JOIN taggroups as grp ON (
grp.group_name = 'fruits'
AND
tags_to_pages.taggroup_id = grp.taggroup_id
)
INNER JOIN tags as val ON (val.tag_name = 'apple' AND tags_to_pages.tag_id = val.tag_id)
LEFT JOIN pages ON (tags_to_pages.page_id = pages.page_id)
Additionally, which tables should have indexes, and what should those indexes be, for optimization?

I'd do it this way:
SELECT
pages.*, tags.tag_name, taggroups.group_name
FROM
tags_to_pages
JOIN taggroups AS grp ON grp.taggroup_id = tags_to_pages.taggroup_id
JOIN tags AS val ON val.taggroup_id = grp.taggroup_id
JOIN pages ON tags_to_pages.page_id = pages.page_id
WHERE
grp.group_name='fruits'
AND val.tag_name = 'apple'
This isn't that different to what you have, but I'm putting the join criteria in the JOIN clauses and the selection criteria in the WHERE clauses, which seems tidier to me.
While re-typing this query I spotted that you were using a tag_id somewhere I thought should have been a taggroup_id, so I changed it, but I'm afraid I can no longer see where.
I'd also worry about the selection criteria: what if an apple doesn't happen to be a fruit? Obviously it is in this case, and indeed in reality :-), but I think you should only be specifying the tag name (e.g. 'apple') in your query, not both the tag and group names, and leave the database to sort it out for itself.
Also, why use an INNER JOIN for tags, tags_to_pages and taggroups but an OUTER JOIN for pages? Surely if there are no pages, you're better off getting no rows returned rather than one row, half full of NULLs?
I'd index the id columns only.
Just my 2p worth, really.
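To make "index the id columns" concrete, here is a minimal sketch of the DDL, run through Python's built-in sqlite3 module as a stand-in for MySQL (assumption: the statements stick to syntax both engines accept; the index names are hypothetical). Each table's id is its primary key, and the id columns that appear in JOIN conditions get secondary indexes.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL; the DDL sticks to
# syntax that both engines accept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each table's own id is its PRIMARY KEY, which is indexed automatically.
    CREATE TABLE taggroups (taggroup_id INTEGER PRIMARY KEY, group_name VARCHAR(50));
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, taggroup_id INTEGER, tag_name VARCHAR(50));
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, title VARCHAR(100));
    CREATE TABLE tags_to_pages (
        join_id INTEGER PRIMARY KEY,
        tag_id INTEGER,
        taggroup_id INTEGER,
        page_id INTEGER
    );
    -- Hypothetical index names. The id columns used in JOIN conditions get
    -- secondary indexes; (tag_id, page_id) also serves lookups by tag_id alone.
    CREATE INDEX idx_ttp_tag_page ON tags_to_pages (tag_id, page_id);
    CREATE INDEX idx_ttp_page ON tags_to_pages (page_id);
    CREATE INDEX idx_ttp_group ON tags_to_pages (taggroup_id);
    CREATE INDEX idx_tags_group ON tags (taggroup_id);
""")

# PRAGMA index_list returns one row per index; column 1 is the index name.
ttp_indexes = sorted(row[1] for row in conn.execute("PRAGMA index_list('tags_to_pages')"))
tags_indexes = sorted(row[1] for row in conn.execute("PRAGMA index_list('tags')"))
print(ttp_indexes)   # ['idx_ttp_group', 'idx_ttp_page', 'idx_ttp_tag_page']
print(tags_indexes)  # ['idx_tags_group']
```

Whether the tag_name and group_name columns also deserve indexes depends on how often you filter by name; the answer only suggests the id columns.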
EDIT
I have set up a demo of this on www.sqlfiddle.com. Your original query worked fine once I changed the aliases in the SELECT list. My attempt above didn't work terribly well :-(. I had the same problems with the aliases and once I fixed them the query returned the same row twice.
I re-wrote it again from scratch as
SELECT pages.*, tags.tag_name, taggroups.group_name
FROM pages
JOIN tags_to_pages AS ttp ON ttp.page_id = pages.page_id
JOIN tags ON tags.tag_id = ttp.tag_id
JOIN taggroups ON taggroups.taggroup_id = ttp.taggroup_id
WHERE taggroups.group_name = 'fruits' AND tags.tag_name='apple';
and it works fine in the demo.
Setting up this demo made me wonder why you were saving the taggroup_id in the tags_to_pages table. I'm "self-taught" in SQL and databases (translate that to: "I make it up as I go along, rely on doing things that 'work' and trust to intuition to find out what's 'right'"), but doesn't this break the idea of normalisation? Shouldn't the connection between tags and taggroups be defined only via the taggroup_id column in the tags table? Perhaps someone who really understands databases will come along and put me right.
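Here is a small sketch that checks the working query and the normalisation point together, using Python's sqlite3 module as a stand-in for MySQL (assumption: both queries use only syntax the two engines share). It loads the question's sample data and runs the working query alongside a variant that reaches the tag group through tags.taggroup_id. Dropping taggroup_id from tags_to_pages would require that variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taggroups (taggroup_id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, taggroup_id INTEGER, tag_name TEXT);
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags_to_pages (join_id INTEGER PRIMARY KEY, tag_id INTEGER,
                                taggroup_id INTEGER, page_id INTEGER);
    INSERT INTO taggroups VALUES (1, 'fruits');
    INSERT INTO tags VALUES (1, 1, 'apple'), (2, 1, 'orange'), (3, 1, 'grape');
    INSERT INTO pages VALUES (99, 'Doctor a day');
    INSERT INTO tags_to_pages VALUES (1, 1, 1, 99), (2, 2, 1, 99);
""")

# Working query: the group is found via tags_to_pages.taggroup_id.
current = conn.execute("""
    SELECT pages.*, tags.tag_name, taggroups.group_name
    FROM pages
    JOIN tags_to_pages AS ttp ON ttp.page_id = pages.page_id
    JOIN tags ON tags.tag_id = ttp.tag_id
    JOIN taggroups ON taggroups.taggroup_id = ttp.taggroup_id
    WHERE taggroups.group_name = 'fruits' AND tags.tag_name = 'apple'
""").fetchall()

# Normalised variant: the group is reached through tags.taggroup_id, so
# tags_to_pages.taggroup_id is never read and could be dropped.
normalised = conn.execute("""
    SELECT pages.*, tags.tag_name, taggroups.group_name
    FROM pages
    JOIN tags_to_pages AS ttp ON ttp.page_id = pages.page_id
    JOIN tags ON tags.tag_id = ttp.tag_id
    JOIN taggroups ON taggroups.taggroup_id = tags.taggroup_id
    WHERE taggroups.group_name = 'fruits' AND tags.tag_name = 'apple'
""").fetchall()

print(current)     # [(99, 'Doctor a day', 'apple', 'fruits')]
print(normalised)  # [(99, 'Doctor a day', 'apple', 'fruits')]
```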
Finally, I've no idea why PHPMyAdmin just hung up when you tried your query. Good luck!

Related

MySQL linking two tables with another table and running filtered queries

Sorry if this has already been posted, I've had a look but don't really know what to search for!
Problem
I'm currently setting up a system to 'tag' a student society with various tags. The idea is that we can create and apply any tags we want, then using an API use the tags to authorise different societies for different processes (i.e. check if they have a certain tag applied).
We're also using these tags to create groups of students. I want a user to be able to set up tag filters (for example, they should be able to say 'I want to make a group of all Academic societies who are in the Science faculty', where these societies would be tagged with both the 'Academic' tag and the 'Science' tag).
The design for our tables is outlined below:
Societies Table:
+------------+-------------------+
| Society ID | Society Name |
+------------+-------------------+
| 1 | Physics Society |
| 2 | Chemistry Society |
+------------+-------------------+
Tags Table (Where the tags are defined):
+--------+--------------+
| Tag ID | Tag Name |
+--------+--------------+
| 1 | Academic |
| 2 | Science |
| 3 | Volunteering |
+--------+--------------+
Linking Table:
+---------+--------+----------+
| Link ID | Tag ID | Group ID |
+---------+--------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 3 | 2 |
+---------+--------+----------+
In this case, the Physics society has been tagged with the Academic and Science tags.
Attempt at a Solution
I now need a way to search for specific societies. To ensure any groups can be made, I think we need to include the operators AND, OR and NOT as well as brackets, then construct an SQL query from this? Please correct me if there's a better way to do it!
The user will enter filters in a format like below (this doesn't have to be the case but I couldn't think of any other ways!):
({1,3} OR {6}) NOT {5}
Using PHP, I then convert this to an SQL query:
WHERE (tag_id IN (1,3) OR tag_id IN (6)) AND tag_id NOT IN (5)
giving a complete query
SELECT tag_links.society_id FROM tag_links WHERE (tag_id IN (1,3) OR tag_id IN (6)) AND tag_id NOT IN (5)
This unfortunately doesn't quite work: the results contain the same society ID multiple times. I think this is because there are multiple rows per society, so the query won't exclude a society tagged with tag_id 5; it will just leave out the one row linking that society to tag_id 5.
Is there a better way to do this? I haven't had a huge amount of experience with SQL queries beyond the basics, so may be missing something obvious...
Thanks a lot for any help!
Toby
I think you want a way to find societies having a certain set of tags. If so, then here is a general query which should work:
SELECT
s.ID, s.Name
FROM Societies s
INNER JOIN Linking lnk
ON s.ID = lnk.Soc_ID
INNER JOIN Tags t
ON lnk.Tag_ID = t.ID
WHERE
t.Name IN ('Academic', 'Science')
GROUP BY
s.ID
HAVING COUNT(DISTINCT t.ID) = 2;
The above query would find all societies having both the 'Academic' and 'Science' tags. Based on your sample data, this would actually find both societies, since the Chemistry Society (ID 2) is linked to those two tags as well as to Volunteering.
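A quick check of this query against the question's sample data, using Python's sqlite3 module as a stand-in for MySQL. The table and column names (Societies, Tags, Linking, Soc_ID and so on) are the answer's assumed names, with the question's "Group ID" column stored as Soc_ID. Because the Chemistry Society also carries both tags, both societies come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Societies (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Tags (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Linking (Link_ID INTEGER PRIMARY KEY, Tag_ID INTEGER, Soc_ID INTEGER);
    INSERT INTO Societies VALUES (1, 'Physics Society'), (2, 'Chemistry Society');
    INSERT INTO Tags VALUES (1, 'Academic'), (2, 'Science'), (3, 'Volunteering');
    INSERT INTO Linking VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 2, 2), (5, 3, 2);
""")

rows = conn.execute("""
    SELECT s.ID, s.Name
    FROM Societies s
    INNER JOIN Linking lnk ON s.ID = lnk.Soc_ID
    INNER JOIN Tags t ON lnk.Tag_ID = t.ID
    WHERE t.Name IN ('Academic', 'Science')
    GROUP BY s.ID
    -- keep only societies that matched every listed tag
    HAVING COUNT(DISTINCT t.ID) = 2
    ORDER BY s.ID
""").fetchall()
print(rows)  # [(1, 'Physics Society'), (2, 'Chemistry Society')]
```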
To address your comment below, if you wanted to find societies having two tags, and also not having one or more tags, then the query gets a bit uglier, because we need conditional aggregation:
SELECT
s.ID, s.Name
FROM Societies s
INNER JOIN Linking lnk
ON s.ID = lnk.Soc_ID
INNER JOIN Tags t
ON lnk.Tag_ID = t.ID
GROUP BY
s.ID
HAVING
SUM(CASE WHEN t.Name = 'Academic' THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN t.Name = 'Science' THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN t.Name = 'Volunteering' THEN 1 ELSE 0 END) = 0;
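The same conditional-aggregation idea can also express the question's own filter. The sketch below (sqlite3 standing in for MySQL, sample data from the question) first runs the query above, then lifts "(tag_id IN (1,3) OR tag_id IN (6)) AND tag_id NOT IN (5)" from a per-row test to a per-society test. It keeps the IN semantics of the question's generated SQL, so {1,3} means "has tag 1 or tag 3". The excluded tag is a parameter so the NOT part can be seen working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Societies (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Tags (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Linking (Link_ID INTEGER PRIMARY KEY, Tag_ID INTEGER, Soc_ID INTEGER);
    INSERT INTO Societies VALUES (1, 'Physics Society'), (2, 'Chemistry Society');
    INSERT INTO Tags VALUES (1, 'Academic'), (2, 'Science'), (3, 'Volunteering');
    INSERT INTO Linking VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 2, 2), (5, 3, 2);
""")

# The answer's query: Academic AND Science AND NOT Volunteering.
answer_rows = conn.execute("""
    SELECT s.ID, s.Name
    FROM Societies s
    INNER JOIN Linking lnk ON s.ID = lnk.Soc_ID
    INNER JOIN Tags t ON lnk.Tag_ID = t.ID
    GROUP BY s.ID
    HAVING
        SUM(CASE WHEN t.Name = 'Academic' THEN 1 ELSE 0 END) > 0 AND
        SUM(CASE WHEN t.Name = 'Science' THEN 1 ELSE 0 END) > 0 AND
        SUM(CASE WHEN t.Name = 'Volunteering' THEN 1 ELSE 0 END) = 0
""").fetchall()
print(answer_rows)  # [(1, 'Physics Society')]

# The question's filter, evaluated once per society:
# (has a tag in {1,3} OR has tag 6) AND does not have the excluded tag.
FILTER_SQL = """
    SELECT lnk.Soc_ID
    FROM Linking lnk
    GROUP BY lnk.Soc_ID
    HAVING (SUM(CASE WHEN lnk.Tag_ID IN (1, 3) THEN 1 ELSE 0 END) > 0
            OR SUM(CASE WHEN lnk.Tag_ID IN (6) THEN 1 ELSE 0 END) > 0)
       AND SUM(CASE WHEN lnk.Tag_ID = ? THEN 1 ELSE 0 END) = 0
    ORDER BY lnk.Soc_ID
"""
excluding_5 = conn.execute(FILTER_SQL, (5,)).fetchall()
excluding_3 = conn.execute(FILTER_SQL, (3,)).fetchall()
print(excluding_5)  # [(1,), (2,)]: no society has tag 5
print(excluding_3)  # [(1,)]: the Chemistry Society has tag 3
```

Each society now appears at most once, because the tests run on the group rather than on individual link rows.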

Removing Records with String Contained in Other Records using 3 tables and Joins

I previously got a great answer (thank you @Paul Spiegel) on removing records from a table whose string was contained at the end of another record (for example, removing 'Farm' when 'Animal Farm' existed), grouped by a Client field.
The problem is, in fact, a little more complex and spans three tables. I'd hoped I could extend the logic easily, but it turns out to be challenging too (for me). Instead of one table with Client and Term, I have three tables:
Terms
Clients
Look-up-Table (LUT) where I store pairs of TermID and ClientID
I have made some progress since first posting this question: I now have the joins and the resulting SELECT returning the rows I want to delete from the Look-up-Table (LUT):
http://sqlfiddle.com/#!9/479c72/45
The final select being:
Select Distinct(C.Title),T2.Term From LUT L
Inner Join Terms T
On L.TermID=T.ID
Inner Join Terms T2
On T.Term Like Concat('% ', T2.Term)
Inner Join Clients C
On C.ID=L.ClientID;
I am in the process of trying to turn this into a Delete with little success.
Append this to your query:
Inner Join LUT L2
On L2.ClientID = L.ClientID
And L2.TermID = T2.ID
That ensures that the clients match, and you will get the following result:
| ClientID | TermID | ID | Term | ID | Term | ID | Title | ClientID | TermID |
|----------|--------|----|---------------|----|-----------|----|-------|----------|--------|
| 1 | 2 | 2 | Small Dog | 1 | Dog | 1 | Bob | 1 | 1 |
| 2 | 5 | 5 | Big Black Dog | 3 | Black Dog | 2 | Alice | 2 | 3 |
To delete the corresponding rows from the LUT table, replace Select * with Delete L2.
But deleting the terms is trickier. Since it's a many-to-many relation, a term may belong to multiple clients, so you can't just delete them. You will need to clean up the table in a second statement, which can be done with the following:
Delete T
From Terms T
Left Join LUT L
On L.TermID = T.ID
Where L.TermID Is Null
Demo: http://sqlfiddle.com/#!9/b17659/1
Note that in this case the term Medium Dog will also be deleted, since it doesn't belong to any client.
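Here is a runnable sketch of both steps, using Python's sqlite3 module on a small hypothetical data set shaped like the result table above (these are not the fiddle's actual rows). SQLite has no multi-table DELETE ... JOIN, so this version swaps in correlated EXISTS / NOT EXISTS subqueries. MySQL would reject step 1 in this form (error 1093, a subquery reading the table being deleted from), which is why the answer uses the JOIN form there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Terms (ID INTEGER PRIMARY KEY, Term TEXT);
    CREATE TABLE Clients (ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE LUT (ClientID INTEGER, TermID INTEGER);
    -- Hypothetical rows shaped like the result table above.
    INSERT INTO Terms VALUES (1, 'Dog'), (2, 'Small Dog'), (3, 'Black Dog'),
                             (4, 'Medium Dog'), (5, 'Big Black Dog');
    INSERT INTO Clients VALUES (1, 'Bob'), (2, 'Alice');
    INSERT INTO LUT VALUES (1, 1), (1, 2), (2, 3), (2, 5);
""")

# Step 1: delete a client's link to a term (the outer LUT row plays L2) when the
# same client also links a longer term ending in ' ' || that term.
# SQLite uses || where MySQL uses CONCAT().
conn.execute("""
    DELETE FROM LUT
    WHERE EXISTS (
        SELECT 1
        FROM LUT AS L
        JOIN Terms AS T  ON L.TermID = T.ID
        JOIN Terms AS T2 ON T.Term LIKE '% ' || T2.Term
        WHERE L.ClientID = LUT.ClientID
          AND T2.ID = LUT.TermID
    )
""")

# Step 2: delete terms that no client links to any more.
conn.execute("""
    DELETE FROM Terms
    WHERE NOT EXISTS (SELECT 1 FROM LUT WHERE LUT.TermID = Terms.ID)
""")

links = conn.execute("SELECT ClientID, TermID FROM LUT ORDER BY ClientID, TermID").fetchall()
terms = [row[0] for row in conn.execute("SELECT Term FROM Terms ORDER BY ID")]
print(links)  # [(1, 2), (2, 5)]
print(terms)  # ['Small Dog', 'Big Black Dog']
```

As in the answer, the unlinked 'Medium Dog' disappears in step 2 along with the terms whose links were removed in step 1.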

MySQL get data from three tables using one id

I have three different tables, which have following structure:
Food
ID | title
---+----------
1 | sandwich
2 | spaghetti
Ingridients
ID | food_reference | type | location | bought
----+----------------+------+----------+----------
100 | 1 | ham | storeA | 11-1-2013
101 | 1 | jam | storeB | 11-1-2013
102 | 2 | tuna | storeB | 11-6-2013
Tags
ID | food_reference | tag
----+----------------+-----
1000| 1 | Tag
1001| 1 | Tag2
1002| 2 | fish
and using one select I want to get all information from these three tables (title,type,location,bought,tag) for one specific ID.
I have tried something like
SELECT food.*,ingridients.*,tags.* FROM food
JOIN ingridients
ON :id=ingridients.food_reference
JOIN tags
ON :id=tags.food_reference
WHERE id=:id
BUT for id=1 this query returns only one row from ingridients and tags, even though there are two matching rows in each (ham and jam, Tag and Tag2). Could you tell me what I am doing wrong?
EDIT: I tried LolCoder's solution, but I still got only one result, even though it seems to work in the fiddle. However, I tried:
SELECT F.*, group_concat(I.type), group_concat(I.location),
group_concat(I.bought), group_concat(T.tag)
FROM feeds F
INNER JOIN ingridients I
ON :id = I.food_reference
INNER JOIN tags T
ON :id=T.food_reference
WHERE F.id=:id
This finds data from ALL matching rows, but several times, i.e. I get (for id=1)
sandwich,ham,ham,ham,jam,jam,jam,tag,tag,tag,tag2,tag2,tag2
EDIT2: magic happened and LolCoder's solution works, so thank you :-)
Try with this query:
SELECT food.*,ingridients.*,tags.* FROM food
JOIN ingridients
ON food.id=ingridients.food_reference
JOIN tags
ON food.id=tags.food_reference
WHERE food.id=1
Check SQLFIDDLE
Use aliases in the joins and qualify each column as alias.column_name.
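The repeated values in the GROUP_CONCAT attempt come from the two joins multiplying each other: every ingredient row is paired with every tag row for the same food. A minimal sketch with the question's sample data (sqlite3 standing in for MySQL; both support GROUP_CONCAT(DISTINCT ...) and named :id parameters) shows the row count and one way to collapse the repeats:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE food (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ingridients (id INTEGER PRIMARY KEY, food_reference INTEGER,
                              type TEXT, location TEXT, bought TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, food_reference INTEGER, tag TEXT);
    INSERT INTO food VALUES (1, 'sandwich'), (2, 'spaghetti');
    INSERT INTO ingridients VALUES (100, 1, 'ham', 'storeA', '11-1-2013'),
                                   (101, 1, 'jam', 'storeB', '11-1-2013'),
                                   (102, 2, 'tuna', 'storeB', '11-6-2013');
    INSERT INTO tags VALUES (1000, 1, 'Tag'), (1001, 1, 'Tag2'), (1002, 2, 'fish');
""")

# The answer's joins: each child table is tied to food.id, not to the parameter.
rows = conn.execute("""
    SELECT food.title, ingridients.type, tags.tag FROM food
    JOIN ingridients ON food.id = ingridients.food_reference
    JOIN tags ON food.id = tags.food_reference
    WHERE food.id = :id
""", {"id": 1}).fetchall()
print(len(rows))  # 4: 2 ingredients x 2 tags

# DISTINCT inside GROUP_CONCAT removes the repeats that the pairing creates.
title, types, tag_list = conn.execute("""
    SELECT food.title, GROUP_CONCAT(DISTINCT ingridients.type),
           GROUP_CONCAT(DISTINCT tags.tag)
    FROM food
    JOIN ingridients ON food.id = ingridients.food_reference
    JOIN tags ON food.id = tags.food_reference
    WHERE food.id = :id
    GROUP BY food.id
""", {"id": 1}).fetchone()
# GROUP_CONCAT order is not guaranteed here, so sort before printing.
print(title, sorted(types.split(",")), sorted(tag_list.split(",")))
```

One caveat of this design: DISTINCT also merges genuinely identical values (two separate 'ham' rows would show once), so per-column lists of location or bought dates may lose information.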

mysql get table based on common column between two tables

While trying to learn SQL, I came across "Learn SQL The Hard Way" and started reading it.
Everything was going fine; then I thought, as a way to practice, I'd make something like the example given in the book (the example consists of 3 tables, pet, person and person_pet, and the person_pet table 'links' pets to their owners).
I made this:
report table
+----+-------------+
| id | content |
+----+-------------+
| 1 | bank robbery|
| 2 | invalid |
| 3 | cat on tree |
+----+-------------+
notes table
+-----------+--------------------+
| report_id | content |
+-----------+--------------------+
| 1 | they had guns |
| 3 | cat was saved |
+-----------+--------------------+
wanted result
+-----------+--------------------+---------------+
| report_id | report_content | report_notes |
+-----------+--------------------+---------------+
| 1 | bank robbery | they had guns |
| 2 | invalid | null or '' |
| 3 | cat on tree | cat was saved |
+-----------+--------------------+---------------+
I tried a few combinations but no success.
My first thought was
SELECT report.id,report.content AS report_content,note.content AS note_content
FROM report,note
WHERE report.id = note.report_id
but this only returns the ones that have a match (would not return the invalid report).
After this I tried adding IF conditions, but I just made it worse.
My question is: is this something I will figure out after getting past basic SQL,
or can it be done in a simple way?
Anyway, I would appreciate any help; I'm pretty much lost with this.
Thank you.
EDIT: I have looked into related questions but haven't yet found one that solves my problem.
I probably need to look into other statements, such as JOIN, to sort this out.
You need to get to the chapter on OUTER JOINs, specifically the LEFT JOIN:
SELECT report.id,report.content AS report_content,note.content AS note_content
FROM report
LEFT JOIN note ON report.id = note.report_id
Note the ANSI-92 JOIN syntax, as opposed to using WHERE x=y.
(Some databases, such as older versions of SQL Server, had a non-standard outer-join syntax, WHERE report.id *= note.report_id, but MySQL does not support it, so use the syntax above.)
You are doing a join. The kind of join you have is an inner join, but you want an outer join:
SELECT report.id,report.content AS report_content,note.content AS note_content
FROM report
LEFT JOIN note on report.id = note.report_id
Note that the LEFT table (report) is the one whose rows are all kept; where there is no match, the columns from the right table (note) come back as NULL.
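A minimal sketch of the LEFT JOIN on the question's sample data, using Python's sqlite3 module as a stand-in for MySQL. The table is named note to match the queries (the question's heading says "notes table"). The second query uses COALESCE to return '' instead of NULL, the other option listed in the wanted result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE note (report_id INTEGER, content TEXT);
    INSERT INTO report VALUES (1, 'bank robbery'), (2, 'invalid'), (3, 'cat on tree');
    INSERT INTO note VALUES (1, 'they had guns'), (3, 'cat was saved');
""")

# LEFT JOIN keeps every report; a report without a note gets NULL (None in Python).
left = conn.execute("""
    SELECT report.id, report.content AS report_content, note.content AS note_content
    FROM report
    LEFT JOIN note ON report.id = note.report_id
    ORDER BY report.id
""").fetchall()
print(left)
# [(1, 'bank robbery', 'they had guns'), (2, 'invalid', None), (3, 'cat on tree', 'cat was saved')]

# COALESCE replaces the NULL with an empty string.
with_blank = conn.execute("""
    SELECT report.id, COALESCE(note.content, '') AS note_content
    FROM report
    LEFT JOIN note ON report.id = note.report_id
    ORDER BY report.id
""").fetchall()
print(with_blank)  # [(1, 'they had guns'), (2, ''), (3, 'cat was saved')]
```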

Is this good Database Normalization?

I am a beginner at using MySQL and I am trying to learn best practices. I have set up a structure similar to the one below.
(main table that contains all unique entries) TABLE = 'main_content'
+------------+---------------+------------------------------+-----------+
| content_id | (deleted) | title | member_id |
+------------+---------------+------------------------------+-----------+
| 6 | | This is a very spe?cal t|_st | 1 |
+------------+---------------+------------------------------+-----------+
(Provides the total of each difficulty and joins id --> actual name) TABLE = 'difficulty'
+---------------+-------------------+------------------+
| difficulty_id | difficulty_name | difficulty_total |
+---------------+-------------------+------------------+
| 1 | Absolute Beginner | 1 |
| 2 | Beginner | 1 |
| 3 | Intermediate | 0 |
| 4 | Advanced | 0 |
| 5 | Expert | 0 |
+---------------+-------------------+------------------+
(This table ensures that multiple values can be inserted for each entry. For example,
this specific entry indicates that there are 2 difficulties associated with the submission)
TABLE = 'lookup_difficulty'
+------------+---------------+
| content_id | difficulty_id |
+------------+---------------+
| 6 | 1 |
| 6 | 2 |
+------------+---------------+
I am joining all of this into a readable query:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN difficulty ON difficulty.difficulty_id
IN (SELECT difficulty_id FROM main_content, lookup_difficulty WHERE lookup_difficulty.content_id = main_content.content_id )
INNER JOIN member ON member.member_id = main_content.member_id
The above works fine, but I am wondering whether this is good practice. I pretty much followed the structure laid out in Wikipedia's Database Normalization example.
When I run the above query with EXPLAIN, it reports 'Using where; Using join buffer', and also that I am using 2 DEPENDENT SUBQUERYs. I don't see any way to avoid sub-queries and still achieve the same effect, but then again I'm a noob, so perhaps there is a better way...
The DB design looks fine - regarding your query, you could rewrite it exclusively with joins like:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
INNER JOIN member ON member.member_id = main_content.member_id
GROUP BY main_content.content_id, member.member_name
If lookup_difficulty provides the link between content and difficulty, I would suggest you take out the difficulty_id column from your main_content table. Since each content_id can have multiple lookups, you would need some extra business logic to decide which difficulty_id to put in main_content (or multiple rows in main_content, one per difficulty_id, but that goes against normalization practices): for example, the biggest value, the smallest value, or a random one. In any case, it does not make much sense.
Other than that the table looks fine.
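A sketch of the join-only rewrite run against the question's sample rows, using Python's sqlite3 module as a stand-in for MySQL and the column names from the question's schema (content_id, difficulty_id). The member table is hypothetical: the question never shows its columns or rows, so member_id/member_name and the 'example_member' row are assumptions, as is the placeholder title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_content (content_id INTEGER PRIMARY KEY, title TEXT, member_id INTEGER);
    CREATE TABLE difficulty (difficulty_id INTEGER PRIMARY KEY, difficulty_name TEXT,
                             difficulty_total INTEGER);
    CREATE TABLE lookup_difficulty (content_id INTEGER, difficulty_id INTEGER);
    -- Hypothetical member table and rows (not shown in the question).
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, member_name TEXT);
    INSERT INTO main_content VALUES (6, 'example title', 1);
    INSERT INTO difficulty VALUES (1, 'Absolute Beginner', 1), (2, 'Beginner', 1),
                                  (3, 'Intermediate', 0), (4, 'Advanced', 0), (5, 'Expert', 0);
    INSERT INTO lookup_difficulty VALUES (6, 1), (6, 2);
    INSERT INTO member VALUES (1, 'example_member');
""")

# Plain joins through the lookup table; no IN (subquery) needed.
rows = conn.execute("""
    SELECT main_content.content_id,
           GROUP_CONCAT(difficulty.difficulty_name) AS difficulty,
           member.member_name
    FROM main_content
    INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
    INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
    INNER JOIN member ON member.member_id = main_content.member_id
    GROUP BY main_content.content_id, member.member_name
""").fetchall()

content_id, names, member_name = rows[0]
# GROUP_CONCAT order is not guaranteed without an ORDER BY, so sort the names.
print(content_id, sorted(names.split(",")), member_name)
```

The GROUP BY gives one row per content entry; without it the aggregate would collapse every entry into a single row. In MySQL you can also write GROUP_CONCAT(difficulty.difficulty_name ORDER BY difficulty.difficulty_id) for a stable order.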
Update
Saw you updated the table :)
Just as a side note: using IN with a subquery can slow a query down. Older MySQL versions often executed it as a dependent subquery, re-running it for each outer row, which is what your EXPLAIN output reports. From MySQL 5.6 on, the optimizer handles this case much better.