Replace nested select query by self join - mysql

I recently asked a question here concerning an SQL query: Trouble wrapping head around complex SQL delete query
I now understand that what I'm trying to do is too complex to pull off with a single query or even multiple queries without some way to keep results in between. Therefore I decided to create a bash script (the end result will be something to do with a cronjob so bash is the most straightforward choice).
Consider the following table:
AssociatedClient:
+-----------+-----------------+
| Client_id | Registration_id |
+-----------+-----------------+
| 2 | 2 |
| 3 | 2 |
| 3 | 4 |
| 4 | 5 |
| 3 | 6 |
| 5 | 6 |
| 3 | 8 |
| 8 | 9 |
| 7 | 10 |
+-----------+-----------------+
What I want to do is select all Registration_ids where the Client_id is in the list of Client_ids associated with a specific Registration_id.
Although I'm pretty noob with SQL, I found this query relatively easy:
SELECT `Registration_id` FROM `AssociatedClient` ac1
WHERE ac1.`Client_id` IN
(SELECT `Client_id` FROM `AssociatedClient` ac2
WHERE ac2.`Registration_id` = $reg_id);
where $reg_id is just a bash variable.
This works but I would like to see it done with a self join, because it looks nicer, especially within a bash script where a lot of character clutter occurs. I'm afraid my SQL skills just don't reach that far.

If I've understood correctly, you should just be able to do a simple self join like so:
SELECT ac1.registration_id
FROM associatedclient ac1
JOIN associatedclient ac2 ON ac2.client_id = ac1.client_id
WHERE ac2.registration_id = $reg_id
So what you are doing is joining the table to itself wherever the client_id matches. Then you restrict the joined rows to those where the second copy of the table (ac2) has the specific registration_id. That leaves the rows of the first copy (ac1) that share a client with that registration, and you just pick the registration_id from those rows.
So, given the example of a variable value of 6, try running the following statement:
SELECT
ac1.client_id AS client_id_1
, ac1.registration_id AS reg_id_1
, ac2.client_id AS client_id_2
, ac2.registration_id AS reg_id_2
FROM associatedclient ac1
JOIN associatedclient ac2 ON ac1.client_id = ac2.client_id
and you'll notice the full set of joins. Then try adding the WHERE restriction and notice which rows come back. Then finally just pick the column you want.
You can check out a SQLFiddle I set up which tests it with a value of 6
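As a rough check that the self join returns the same rows as the nested query, here is a minimal sketch that loads the sample table into an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the variable names (`subquery_sql`, `join_sql`, `reg_id`) are made up for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AssociatedClient (Client_id INTEGER, Registration_id INTEGER)"
)
conn.executemany(
    "INSERT INTO AssociatedClient VALUES (?, ?)",
    [(2, 2), (3, 2), (3, 4), (4, 5), (3, 6), (5, 6), (3, 8), (8, 9), (7, 10)],
)

reg_id = 6  # plays the role of $reg_id in the bash script

# Original nested-select form.
subquery_sql = """
SELECT ac1.Registration_id
FROM AssociatedClient ac1
WHERE ac1.Client_id IN (
    SELECT ac2.Client_id
    FROM AssociatedClient ac2
    WHERE ac2.Registration_id = ?
)
"""

# Self-join form from the answer.
join_sql = """
SELECT ac1.Registration_id
FROM AssociatedClient ac1
JOIN AssociatedClient ac2 ON ac2.Client_id = ac1.Client_id
WHERE ac2.Registration_id = ?
"""

subquery_result = sorted(r[0] for r in conn.execute(subquery_sql, (reg_id,)))
join_result = sorted(r[0] for r in conn.execute(join_sql, (reg_id,)))

print(subquery_result)  # [2, 4, 6, 6, 8]
print(join_result)      # [2, 4, 6, 6, 8]
```

Registration 6 appears twice because clients 3 and 5 are both linked to it; add DISTINCT if each registration should be listed once. The value is passed as a bound parameter here rather than spliced into the SQL string, which is worth considering in the bash script as well.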

Related

Removing Records with String Contained in Other Records using 3 tables and Joins

I previously got a great answer (thank you @Paul Spiegel) on removing records from a table whose string was contained at the end of another record (for example, removing 'Farm' when 'Animal Farm' existed), grouped by a Client field.
The problem is, in fact, a little more complex and spans three tables. I'd hoped I could extend the logic easily, but it turns out to be challenging (for me). Instead of one table with Client and Term, I have three tables:
Terms
Clients
Look-up-Table (LUT) where I store pairs of TermID and ClientID
I have made some progress since initially posting this question. Where I stand now is that I have the joins in place, and the resulting Select returns the rows I want to delete from the Look-up-Table (LUT):
http://sqlfiddle.com/#!9/479c72/45
The final select being:
Select Distinct(C.Title),T2.Term From LUT L
Inner Join Terms T
On L.TermID=T.ID
Inner Join Terms T2
On T.Term Like Concat('% ', T2.Term)
Inner Join Clients C
On C.ID=L.ClientID;
I am in the process of trying to turn this into a Delete with little success.
Append this to your query:
Inner Join LUT L2
On L2.ClientID = L.ClientID
And L2.TermID = T2.ID
That will ensure that the clients match, and you will get the following result:
| ClientID | TermID | ID | Term | ID | Term | ID | Title | ClientID | TermID |
|----------|--------|----|---------------|----|-----------|----|-------|----------|--------|
| 1 | 2 | 2 | Small Dog | 1 | Dog | 1 | Bob | 1 | 1 |
| 2 | 5 | 5 | Big Black Dog | 3 | Black Dog | 2 | Alice | 2 | 3 |
To delete the corresponding rows from the LUT table, replace Select * with Delete L2.
But deleting the terms is trickier. Since it's a many-to-many relation, a term may belong to multiple clients, so you can't just delete it. You will need to clean up the Terms table in a second statement. That can be done with the following statement:
Delete T
From Terms T
Left Join LUT L
On L.TermID = T.ID
Where L.TermID Is Null
Demo: http://sqlfiddle.com/#!9/b17659/1
Note that in this case the term Medium Dog will also be deleted, since it doesn't belong to any client.
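The two-step cleanup can be tried out with a minimal sketch like the one below. It assumes a small data set reconstructed from the result table above (the id 4 for 'Medium Dog' is a guess), and it uses SQLite from Python as a stand-in for MySQL. SQLite has no multi-table `Delete L2 ... Join` syntax, so the first step is expressed as a delete by `rowid` over the same join; the Clients table is left out because the delete does not need it.

```python
import sqlite3

# SQLite stand-in for MySQL; sample rows are assumed from the result table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Terms (ID INTEGER PRIMARY KEY, Term TEXT);
CREATE TABLE LUT (ClientID INTEGER, TermID INTEGER);
INSERT INTO Terms VALUES
    (1, 'Dog'), (2, 'Small Dog'), (3, 'Black Dog'),
    (4, 'Medium Dog'), (5, 'Big Black Dog');
INSERT INTO LUT VALUES (1, 1), (1, 2), (2, 3), (2, 5);
""")

# Same join as in the answer: L2 is the LUT row of the shorter term (T2)
# that belongs to the same client as the longer term (T).
redundant_rows_sql = """
SELECT L2.rowid
FROM LUT L
JOIN Terms T  ON L.TermID = T.ID
JOIN Terms T2 ON T.Term LIKE '% ' || T2.Term
JOIN LUT L2   ON L2.ClientID = L.ClientID AND L2.TermID = T2.ID
"""

# Step 1: remove the redundant client/term links.
conn.execute(f"DELETE FROM LUT WHERE rowid IN ({redundant_rows_sql})")

# Step 2: remove terms that no client references any more.
conn.execute("DELETE FROM Terms WHERE ID NOT IN (SELECT TermID FROM LUT)")

remaining_links = sorted(conn.execute("SELECT ClientID, TermID FROM LUT"))
remaining_terms = [r[0] for r in conn.execute("SELECT Term FROM Terms ORDER BY ID")]

print(remaining_links)  # [(1, 2), (2, 5)]
print(remaining_terms)  # ['Small Dog', 'Big Black Dog']
```

With this sample data 'Dog' and 'Black Dog' disappear from Terms as well, because after step 1 no client references them.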

MySQL split and join the values

I have a table [mapping] with 2 columns similar to below
id | values
1 | 1,2
2 | 1,2,3
3 | 1,1
4 | 1,1,2
and another table [map] is similar to this
sno | values
1 | Test
2 | Hello
3 | Hai
My expected output is
id | values
1 | Test,Hello
2 | Test,Hello,Hai
3 | Test,Test
4 | Test,Test,Hello
Is it possible? If it is please can anybody build a query for me.
You can use MySQL's FIND_IN_SET() to join the tables and GROUP_CONCAT() to concatenate the values:
SELECT t.id, GROUP_CONCAT(s.`values`) AS `values`
FROM mapping t
INNER JOIN map s ON FIND_IN_SET(s.sno, t.`values`)
GROUP BY t.id
Note: You should know that this is a very bad DB structure. It leads to more complicated queries and will force you to overcomplicate things. You should normalize your data: split the lists and place each ID in a separate record!
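To illustrate the normalized layout suggested in the note, here is a minimal sketch using SQLite from Python as a stand-in for MySQL. The table `mapping_item` and the columns `mapping_id`, `pos` and `label` are hypothetical names invented for this example; each comma-separated entry becomes one row, with `pos` recording its place in the original list.

```python
import sqlite3

# SQLite stand-in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE map (sno INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE mapping_item (mapping_id INTEGER, pos INTEGER, sno INTEGER);
INSERT INTO map VALUES (1, 'Test'), (2, 'Hello'), (3, 'Hai');
""")

# The comma-separated lists from the question, split into one row per entry.
raw_mapping = {1: "1,2", 2: "1,2,3", 3: "1,1", 4: "1,1,2"}
for mapping_id, csv_values in raw_mapping.items():
    for pos, sno in enumerate(csv_values.split(","), start=1):
        conn.execute(
            "INSERT INTO mapping_item VALUES (?, ?, ?)",
            (mapping_id, pos, int(sno)),
        )

# A plain equality join replaces FIND_IN_SET; duplicates survive because
# every list entry is its own row.
rows = conn.execute("""
SELECT mi.mapping_id, m.label
FROM mapping_item mi
JOIN map m ON m.sno = mi.sno
ORDER BY mi.mapping_id, mi.pos
""")

labels_by_id = {}
for mapping_id, label in rows:
    labels_by_id.setdefault(mapping_id, []).append(label)
labels = {k: ",".join(v) for k, v in labels_by_id.items()}

print(labels)
# {1: 'Test,Hello', 2: 'Test,Hello,Hai', 3: 'Test,Test', 4: 'Test,Test,Hello'}
```

The concatenation is done in Python here to keep the order deterministic; in MySQL the same result should be obtainable with GROUP_CONCAT(label ORDER BY pos) and GROUP BY mapping_id.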
SELECT
`ids`.`id`,
GROUP_CONCAT(`values`.`texts`) AS texts
FROM
`ids`
INNER JOIN `values` ON FIND_IN_SET(`values`.`id`, `ids`.`values`)
GROUP BY
`ids`.`id`
It works like this: Example

Multi-conditional join through a link table

Bear with me, this needs a lot of up front info to explain what I am trying to do. I have tried to genericize it as much as possible to make things clearer. In a single query I am hoping to pull out a list of pages which match against tags linked in another table, and these tags are in groups. I am hoping to use the textual representation of the item instead of its id, but if nothing else I could do 2 up front queries to get the tag_id and taggroup_id - just hoping not to have to do that.
DB Schema:
+-----------------------------------+
| taggroups |
+------------------+----------------+
| taggroup_id | group_name |
+------------------+----------------+
| 1 | fruits |
+------------------+----------------+
+-----------------------------------------------+
| tags |
+-------------+-----------------+---------------+
| tag_id | taggroup_id | tag_name |
+-------------+-----------------+---------------+
| 1 | 1 | apple |
| 2 | 1 | orange |
| 3 | 1 | grape |
+-------------+-----------------+---------------+
+--------------------------------------+
| pages |
+------------------+-------------------+
| page_id | title |
+------------------+-------------------+
| 99 | Doctor a day |
+------------------+-------------------+
+--------------------------------------------------+
| tags_to_pages |
+------------+----------+---------------+----------+
| join_id | tag_id | taggroup_id | page_id |
+------------+----------+---------------+----------+
| 1 | 1 | 1 | 99 |
| 2 | 2 | 1 | 99 |
+------------+----------+---------------+----------+
Test Query:
Got this far and can't seem to get it to work.
SELECT
pages.*, tags.tag_name, taggroups.group_name
FROM
tags_to_pages
INNER JOIN taggroups as grp ON (
grp.group_name = 'fruits'
AND
tags_to_pages.taggroup_id = grp.taggroup_id
)
INNER JOIN tags as val ON (val.tag_name = 'apple' AND tags_to_pages.tag_id = val.tag_id)
LEFT JOIN pages ON (tags_to_pages.page_id = pages.page_id)
Additionally, which tables should have indexes, and what should the indexes be for best optimization?
I'd do it this way:
SELECT
pages.*, tags.tag_name, taggroups.group_name
FROM
tags_to_pages
JOIN taggroups AS grp ON grp.taggroup_id = tags_to_pages.taggroup_id
JOIN tags AS val ON val.taggroup_id = grp.taggroup_id
JOIN pages ON tags_to_pages.page_id = pages.page_id
WHERE
grp.group_name='fruits'
AND val.tag_name = 'apple'
This isn't that different to what you have, but I'm putting the join criteria in the JOIN clauses and the selection criteria in the WHERE clauses, which seems tidier to me.
While re-typing this query I spotted that you were using a tag_id somewhere I thought should have been a taggroup_id, so I changed it, but I regret I can't see it again now.
I'd also worry about the selection criteria - what if an apple doesn't happen to be a fruit? Obviously it is in this case, and indeed in reality :-), but I think you should only be specifying the fruit name in your query, not fruit and group names, and leave the database to sort it out for itself.
Also, why use an INNER JOIN for tags, tags_to_pages and taggroups and an OUTER JOIN for pages? Surely if there are no pages, you're better off getting no rows returned rather than one row half full of NULLs?
I'd index the id columns only.
Just my 2p worth, really.
EDIT
I have set up a demo of this on www.sqlfiddle.com. Your original query worked fine once I changed the aliases in the SELECT list. My attempt above didn't work terribly well :-(. I had the same problems with the aliases and once I fixed them the query returned the same row twice.
I re-wrote it again from scratch as
SELECT pages.*, tags.tag_name, taggroups.group_name
FROM pages
JOIN tags_to_pages AS ttp ON ttp.page_id = pages.page_id
JOIN tags ON tags.tag_id = ttp.tag_id
JOIN taggroups ON taggroups.taggroup_id = ttp.taggroup_id
WHERE taggroups.group_name = 'fruits' AND tags.tag_name='apple';
and it works fine link.
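For anyone who wants to reproduce this without the fiddle, here is a minimal sketch that loads the schema and rows from the question into an in-memory SQLite database (a stand-in for MySQL) and runs the rewritten query. The column types are assumptions, since the question only shows the data.

```python
import sqlite3

# SQLite stand-in for MySQL; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taggroups (taggroup_id INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, taggroup_id INTEGER, tag_name TEXT);
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags_to_pages (
    join_id INTEGER PRIMARY KEY, tag_id INTEGER,
    taggroup_id INTEGER, page_id INTEGER
);
INSERT INTO taggroups VALUES (1, 'fruits');
INSERT INTO tags VALUES (1, 1, 'apple'), (2, 1, 'orange'), (3, 1, 'grape');
INSERT INTO pages VALUES (99, 'Doctor a day');
INSERT INTO tags_to_pages VALUES (1, 1, 1, 99), (2, 2, 1, 99);
""")

# The rewritten query, with the two literals passed as bound parameters.
query = """
SELECT pages.*, tags.tag_name, taggroups.group_name
FROM pages
JOIN tags_to_pages AS ttp ON ttp.page_id = pages.page_id
JOIN tags ON tags.tag_id = ttp.tag_id
JOIN taggroups ON taggroups.taggroup_id = ttp.taggroup_id
WHERE taggroups.group_name = ? AND tags.tag_name = ?
"""
rows = conn.execute(query, ("fruits", "apple")).fetchall()

print(rows)  # [(99, 'Doctor a day', 'apple', 'fruits')]
```

Only one row comes back because tags is joined on tag_id, so the 'orange' link for the same page is filtered out by the WHERE clause instead of being duplicated.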
Setting up this demo, it struck me to wonder why you were saving the taggroup_id in the tags_to_pages table. I'm "self-taught" in SQL and databases (translate that to: "I make it up as I go along, rely on doing things that 'work' and trust to intuition to find out what's 'right'") but doesn't this break the idea of normalisation? Shouldn't the connection between tags and taggroups be defined only via the taggroup_id column in the tags table? Perhaps someone who really understands databases will come along and put me right.
Finally, I've no idea why PHPMyAdmin just hung up when you tried your query. Good luck!

Is this good Database Normalization?

I am a beginner at using mysql and I am trying to learn the best practices. I have setup a similar structure as seen below.
(main table that contains all unique entries) TABLE = 'main_content'
+------------+---------------+------------------------------+-----------+
| content_id | (deleted) | title | member_id |
+------------+---------------+------------------------------+-----------+
| 6 | | This is a very spe?cal t|_st | 1 |
+------------+---------------+------------------------------+-----------+
(Provides the total of each difficulty and joins id --> actual name) TABLE = 'difficulty'
+---------------+-------------------+------------------+
| difficulty_id | difficulty_name | difficulty_total |
+---------------+-------------------+------------------+
| 1 | Absolute Beginner | 1 |
| 2 | Beginner | 1 |
| 3 | Intermediate | 0 |
| 4 | Advanced | 0 |
| 5 | Expert | 0 |
+---------------+-------------------+------------------+
(This table ensures that multiple values can be inserted for each entry. For example,
this specific entry indicates that there are 2 difficulties associated with the submission)
TABLE = 'lookup_difficulty'
+------------+---------------+
| content_id | difficulty_id |
+------------+---------------+
| 6 | 1 |
| 6 | 2 |
+------------+---------------+
I am joining all of this into a readable query:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN difficulty ON difficulty.difficulty_id
IN (SELECT difficulty_id FROM main_content, lookup_difficulty WHERE lookup_difficulty.content_id = main_content.content_id )
INNER JOIN member ON member.member_id = main_content.member_id
The above works fine, but I am wondering if this is good practice. I practically followed the structure laid out in Wikipedia's Database Normalization example.
When I run the above query using EXPLAIN, it says: 'Using where; Using join buffer' and also that I am using 2 DEPENDENT SUBQUERY (s). I don't see any way to NOT use sub-queries to achieve the same effect, but then again I'm a noob so perhaps there is a better way....
The DB design looks fine - regarding your query, you could rewrite it exclusively with joins like:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
INNER JOIN member ON member.member_id = main_content.member_id
If lookup_difficulty provides the link between content and difficulty, I would suggest you take the difficulty_id column out of your main_content table. Since you can have multiple lookups for each content_id, you would need some extra business logic to determine which difficulty_id to put in main_content (for example the biggest value, the smallest value, or a random value), or multiple entries in main_content for each difficulty_id, which goes against normalization practices. In either case, it does not make much sense.
Other than that the table looks fine.
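A minimal sketch of the join-only version, run against the tables from the question in an in-memory SQLite database (a stand-in for MySQL). The member table is not shown in the question, so its layout and the name 'example_member' are assumptions, and a GROUP BY is added here so that each content row gets its own concatenated list.

```python
import sqlite3

# SQLite stand-in for MySQL; the member table and its row are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_id INTEGER PRIMARY KEY, member_name TEXT);
CREATE TABLE main_content (
    content_id INTEGER PRIMARY KEY, title TEXT, member_id INTEGER
);
CREATE TABLE difficulty (
    difficulty_id INTEGER PRIMARY KEY, difficulty_name TEXT,
    difficulty_total INTEGER
);
CREATE TABLE lookup_difficulty (content_id INTEGER, difficulty_id INTEGER);
INSERT INTO member VALUES (1, 'example_member');
INSERT INTO main_content VALUES (6, 'This is a very spe?cal t|_st', 1);
INSERT INTO difficulty VALUES
    (1, 'Absolute Beginner', 1), (2, 'Beginner', 1),
    (3, 'Intermediate', 0), (4, 'Advanced', 0), (5, 'Expert', 0);
INSERT INTO lookup_difficulty VALUES (6, 1), (6, 2);
""")

# Joins only: main_content -> lookup_difficulty -> difficulty, plus member.
query = """
SELECT group_concat(difficulty.difficulty_name) AS difficulty,
       member.member_name
FROM main_content
JOIN lookup_difficulty
  ON main_content.content_id = lookup_difficulty.content_id
JOIN difficulty
  ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
JOIN member
  ON member.member_id = main_content.member_id
GROUP BY main_content.content_id, member.member_name
"""
rows = conn.execute(query).fetchall()

# One row per content entry; the order inside the list is not guaranteed.
for difficulty_list, member_name in rows:
    print(member_name, sorted(difficulty_list.split(",")))
```

Running EXPLAIN on the MySQL equivalent should no longer report a DEPENDENT SUBQUERY, since the lookup table is joined directly.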
Update
Saw you updated the table :)
Just as a side-note. Using IN can slow down your query (IN can cause a table-scan). In any case, it used to be that way, but I'm sure that these days the SQL compiler optimizes it pretty well.

Chaining results from multiple tables using SQL

I have a set of tables with following structures
**EntityFields**
fid | pid
1 | 1
2 | 1
3 | 2
4 | 2
5 | 1
**Language**
id | type | value
1 | Entity | FirstEntity
2 | Entity | SecondEntity
1 | Field | Name
2 | Field | Age
3 | Field | Name
4 | Field | Age
5 | Field | Location
Now as you may have understood, the first table gives the EntityField assignment to each Entity. The second table gives out the names for those IDs. What I want to output is something like the following
1 | FirstEntity / Name (i.e. a concat of the Entity and the EntityField name)
2 | FirstEntity / Age
3 | FirstEntity / Location
4 | SecondEntity / Name
5 | SecondEntity / Age
Is this possible?
Thank you for the answers. Unfortunately, the table structure is something that I cannot change. The table structure itself belongs to another data directory system which is quite flexible and which I am using to pull out data. I know that without the necessary background this table structure looks quite weird, but it is something that works quite well (except in this scenario).
I will try out the examples here and will let you know.
For your current table structure, I think the following will work
SELECT EntityFields.fid, CONCAT(L1.value, ' / ', L2.value)
FROM EntityFields INNER JOIN Language as L1 ON EntityFields.pid=L1.id and L1.type='Entity'
INNER JOIN Language as L2 ON EntityFields.fid=L2.id and L2.type='Field'
ORDER BY EntityFields.fid
However, this query could be made much easier by having a better table structure. For example, with the following structure:
**EntityFields**
fid | pid | uid
1 | 1 | 1
2 | 1 | 2
1 | 2 | 3
2 | 2 | 4
3 | 1 | 5
**Entities**
id | value
1 | FirstEntity
2 | SecondEntity
**Fields**
id | value
1 | Name
2 | Age
3 | Location
you can use the somewhat simpler query:
SELECT uid, CONCAT(Entities.value, ' / ', Fields.value)
FROM EntityFields INNER JOIN Entities ON EntityFields.pid=Entities.id
INNER JOIN Fields ON EntityFields.fid=Fields.id
ORDER BY uid
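The double join on Language for the original, unchanged table structure can be checked with a minimal sketch like this one, which loads the sample rows from the question into an in-memory SQLite database from Python. SQLite is a stand-in for MySQL, so CONCAT is written with the `||` operator, and the column types are assumptions.

```python
import sqlite3

# SQLite stand-in for MySQL; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EntityFields (fid INTEGER, pid INTEGER);
CREATE TABLE Language (id INTEGER, type TEXT, value TEXT);
INSERT INTO EntityFields VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 1);
INSERT INTO Language VALUES
    (1, 'Entity', 'FirstEntity'), (2, 'Entity', 'SecondEntity'),
    (1, 'Field', 'Name'), (2, 'Field', 'Age'), (3, 'Field', 'Name'),
    (4, 'Field', 'Age'), (5, 'Field', 'Location');
""")

# Language is joined twice: L1 resolves the entity name via pid,
# L2 resolves the field name via fid.
query = """
SELECT ef.fid, L1.value || ' / ' || L2.value AS entity_field
FROM EntityFields ef
JOIN Language AS L1 ON ef.pid = L1.id AND L1.type = 'Entity'
JOIN Language AS L2 ON ef.fid = L2.id AND L2.type = 'Field'
ORDER BY ef.fid
"""
rows = conn.execute(query).fetchall()

for fid, entity_field in rows:
    print(fid, entity_field)
# 1 FirstEntity / Name
# 2 FirstEntity / Age
# 3 SecondEntity / Name
# 4 SecondEntity / Age
# 5 FirstEntity / Location
```

The rows are keyed by fid, so the numbering differs from the sequential list in the question; order by the entity name first if the grouped layout matters.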
Well, I have no idea what you're trying to accomplish here. The fact that you label some records "Entity" and others "Field" and then try to connect them to each other makes it look to me like you are mixing two totally different things in the same table. Why not have an Entity table and a Field table?
You could get the results you seem to want by writing
select fid, le.value, lf.value
from entityfields e
join language le on e.pid=le.id and le.type='Entity'
join language lf on e.fid=lf.id and lf.type='Field'
order by fid
But I think you'd be wise to rethink your table design. Perhaps you could explain what you're trying to accomplish.
SELECT ef.fid AS id
, CONCAT(e.value, ' / ', f.value)
AS entity_field
FROM EntityFields ef
JOIN Language AS e
ON e.id = ef.pid
AND e.type = 'Entity'
JOIN Language AS f
ON f.id = ef.fid
AND f.type = 'Field'
ORDER BY ef.pid
, ef.fid
If I understand your question, which I don't think I do, this is simple. It appears to be a set of very poorly designed tables (Language doing more than one thing, for example). And it appears that the Language table has two types of records: a) The Entity records, which have type='Entity' and b) Field records, which have type='Field'.
At any rate, the way I would approach it is to treat the Language table as if it were two tables:
select ef.fid, Entities.value, Fields.value
from entityfields ef
inner join language Entities
on Entities.id = ef.pid
and Entities.type = 'Entity'
inner join language Fields
on Fields.id = ef.fid
and Fields.Type = 'Field'
order by 2, 3
First stab, anyway. That should help you get the answer.