Querying a Many-to-Many Linking Table - mysql

I have a linking table for a many-to-many relationship, with the fields:
idNote
idTag
I would like to filter for all the tags that are associated with the notes that contain a specified set of tags.
For example, if I select the tags 'Running', 'Form', and 'Times', I would then like to see all the tags that are associated with the notes that have these 3 tags.
This process will be used by the user on the front end to refine the results they are looking for, so I need to be able to generate this SQL with code (node.js), with the filtering by tags potentially occurring many times over.
I have the below SQL code, which can query for two tags, but there are some problems with it:
It does not seem efficient
It cannot be easily generated through code if another layer of filters needs to be added
SELECT DISTINCT idtag FROM table WHERE idnote IN (SELECT idnote FROM
(SELECT * FROM table WHERE idnote IN (SELECT idnote FROM table WHERE idtag
= 'Example')) as t1 where t1.idtag = 'SecondExample');
I am hoping for some suggestions on how to improve the efficiency of this query, as well as on turning the SQL statement into something that is easy to generate from code.

Sounds like a data trap, the Cartesian product
https://en.wikipedia.org/wiki/Cartesian_product
Is there anything to bridge the two tables?
Like a common table between the two that we can join to?
Instead of N:N
Table A would be something in common with the notes table (B) and tags table (C)
we could have Table A join to Table B as 1:N
and Table A also join to C as 1:N
Then you could stitch the two separate facts together with a common table

I used your example of 'Running','Form','Times' as the specified set of tags.
select distinct idTag from table
where idNote in (select idNote from table where idTag in ('Running'))
and idNote in (select idNote from table where idTag in ('Form'))
and idNote in (select idNote from table where idTag in ('Times'))
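Because each selected tag adds one identical AND clause, this form is straightforward to generate. The sketch below is only an illustration: it uses Python with the standard sqlite3 module so that it can be run as-is, and the table name note_tag, the function name build_chained_query, and the sample rows are all assumptions. The same string-building idea should carry over to node.js with a parameterised query.

```python
import sqlite3

def build_chained_query(tags):
    """Return (sql, params) with one 'idNote IN (...)' clause per selected tag."""
    if not tags:
        raise ValueError("at least one tag is required")
    # Hypothetical table name; '?' placeholders keep user input out of the SQL text.
    clause = "idNote IN (SELECT idNote FROM note_tag WHERE idTag = ?)"
    sql = "SELECT DISTINCT idTag FROM note_tag WHERE " + " AND ".join([clause] * len(tags))
    return sql, list(tags)

# Made-up sample data: notes 1 and 3 carry all three selected tags, note 2 does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note_tag (idNote INTEGER, idTag TEXT)")
conn.executemany("INSERT INTO note_tag VALUES (?, ?)", [
    (1, "Running"), (1, "Form"), (1, "Times"), (1, "Shoes"),
    (2, "Running"), (2, "Form"),
    (3, "Running"), (3, "Form"), (3, "Times"), (3, "Track"),
])

sql, params = build_chained_query(["Running", "Form", "Times"])
related_tags = sorted(row[0] for row in conn.execute(sql, params))
print(related_tags)
```

Each further refinement by the user just appends one more tag to the list, so the generator does not need to nest anything.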

Try something like this:
; with cteTagList as
(select 'Example' idtag
union select 'SecondExample'
--...
union select 'LastExample'
)
select t.idnote
from table t inner join cteTagList l on l.idtag = t.idtag
group by t.idnote
having count(*) = [NUMBER_OF_SEARCH_TAGS]
Here you generate a CTE (Common Table Expression) that contains all the search tags, join it with the many-to-many relation table, and select only those notes whose count equals the number of search tags entered by the user, written as [NUMBER_OF_SEARCH_TAGS] in the query.
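A minimal sketch of generating this kind of query from code, under some assumptions: it swaps the CTE for an IN list with placeholders (the counting idea is the same), wraps the result in an outer query so that it returns the related tags rather than the note ids, and uses COUNT(DISTINCT ...) in case the linking table holds duplicate rows. The table name note_tag, the function name build_having_query, and the sample rows are made up, and Python with sqlite3 is used only so the example runs on its own.

```python
import sqlite3

def build_having_query(tags):
    """Return (sql, params) selecting the tags of notes that carry every tag in `tags`."""
    unique_tags = list(dict.fromkeys(tags))  # drop duplicates, keep order
    if not unique_tags:
        raise ValueError("at least one tag is required")
    placeholders = ", ".join("?" for _ in unique_tags)
    sql = (
        "SELECT DISTINCT idTag FROM note_tag WHERE idNote IN ("
        " SELECT idNote FROM note_tag"
        f" WHERE idTag IN ({placeholders})"
        " GROUP BY idNote"
        " HAVING COUNT(DISTINCT idTag) = ?)"
    )
    # The last parameter is the number of distinct search tags.
    return sql, unique_tags + [len(unique_tags)]

# Made-up sample data: only notes 1 and 2 carry both 'Running' and 'Times'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note_tag (idNote INTEGER, idTag TEXT)")
conn.executemany("INSERT INTO note_tag VALUES (?, ?)", [
    (1, "Running"), (1, "Form"), (1, "Times"), (1, "Pace"),
    (2, "Running"), (2, "Times"),
    (3, "Form"), (3, "Times"), (3, "Pace"),
])

sql, params = build_having_query(["Running", "Times", "Running"])
related_tags = sorted(row[0] for row in conn.execute(sql, params))
print(related_tags)
```

The query text has the same shape for any number of tags; only the placeholder list and the final count change.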

Why does this sql query error show up? Is there maybe another way to write this?

I am using the Chinook database for a project and I have two difficult queries to execute, but both provide errors.
I am looking for all the orders (invoice) that were sent to 'New York' and contain tracks that belong to more than one genre. [InvoiceId, amount of products, total1, total2]. Total1 should be unitprice*quantity and total2 is total. It should show only 2 rows.
So far I have come up with this. I have also tried swapping in LEFT JOIN, FULL OUTER JOIN, etc.
CREATE TEMPORARY TABLE temp AS
SELECT *
FROM track join invoiceline USING (TrackId)
WHERE (select * from track t1 where EXISTS (select * from track t2 where t1.GenreId <> t2.GenreId));
SELECT invoice.InvoiceId, invoiceline.Quantity, invoiceline.UnitPrice*invoiceline.Quantity, invoice.Total
FROM (SELECT * FROM invoice JOIN invoiceline
WHERE invoice.BillingCity LIKE '%New York%') JOIN temp cc ON invoiceline.TrackId
GROUP BY invoiceline.InvoiceId;
DROP TABLE temp;
It provides the error:
Operand should contain 1 column(s)
I am looking for clients (in couples) that have bought more than two of the same tracks. It should provide 14 rows.
Until now I have come up with this.
SELECT CONCAT(FIRSTNAME,',', LASTNAME) AS name1 FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
UNION
(
SELECT CONCAT(FIRSTNAME,',', LASTNAME) AS name2 FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
);
So A) Does anybody know why it provides that error?
B) Could anyone give any tips or suggest a better way to write these queries?
Here are two helpful schemas: an ER diagram and a relational diagram.
Answer to your first question:
The error comes up because the subquery in your WHERE clause uses SELECT * and therefore returns several columns where MySQL expects a single one. This method is also very redundant.
You should use count of genre Ids and take track Ids with count more than 1 as shown below:
CREATE TEMPORARY TABLE temp AS
SELECT *
FROM track join invoiceline USING (TrackId)
WHERE TrackId in
(select TrackId from (select TrackId, count(distinct GenreId) as genres from track group by 1 having genres>1) as multi_genre);
SELECT invoice.InvoiceId, invoiceline.Quantity, invoiceline.UnitPrice*invoiceline.Quantity, invoice.Total
FROM (SELECT * FROM invoice JOIN invoiceline
WHERE invoice.BillingCity LIKE '%New York%') JOIN temp cc ON invoiceline.TrackId
GROUP BY invoiceline.InvoiceId;
DROP TABLE temp;
I have assumed that track id is the primary key here.
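If TrackId is the primary key of track, each track carries exactly one GenreId, so it may be simpler to count distinct genres per invoice instead. The following is a sketch of that variant, not a tested answer for the real Chinook data: the column names follow the Chinook schema, but the tables are cut down to the columns used, the rows are invented, and it runs against SQLite through Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER);
CREATE TABLE invoice (InvoiceId INTEGER PRIMARY KEY, BillingCity TEXT, Total REAL);
CREATE TABLE invoiceline (InvoiceId INTEGER, TrackId INTEGER, UnitPrice REAL, Quantity INTEGER);
-- Invented rows: invoice 1 (New York) spans two genres, invoice 2 (New York) one genre,
-- invoice 3 spans two genres but was billed to Boston.
INSERT INTO track VALUES (1, 1), (2, 2), (3, 1), (4, 1);
INSERT INTO invoice VALUES (1, 'New York', 3.0), (2, 'New York', 2.0), (3, 'Boston', 2.0);
INSERT INTO invoiceline VALUES
  (1, 1, 1.0, 1), (1, 2, 1.0, 1), (1, 3, 1.0, 1),
  (2, 1, 1.0, 1), (2, 4, 1.0, 1),
  (3, 1, 1.0, 1), (3, 2, 1.0, 1);
""")

# One row per invoice; HAVING keeps invoices whose lines cover more than one genre.
rows = conn.execute("""
SELECT i.InvoiceId,
       SUM(il.Quantity)                AS products,
       SUM(il.UnitPrice * il.Quantity) AS total1,
       i.Total                         AS total2
FROM invoice AS i
JOIN invoiceline AS il ON il.InvoiceId = i.InvoiceId
JOIN track AS t ON t.TrackId = il.TrackId
WHERE i.BillingCity = 'New York'
GROUP BY i.InvoiceId, i.Total
HAVING COUNT(DISTINCT t.GenreId) > 1
""").fetchall()
print(rows)
```

Every join here has an explicit ON condition and no temporary table is needed, which also sidesteps the single-column restriction that triggered the original error.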
For the second question, I assume that you want to find customers buying the same records. You can use a query like the one below:
SELECT invoiceline.TrackId, group_concat(customer.CustomerId) as customers FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
group by 1
This will give you the comma-separated customer ids of everyone who has bought the same track. Also, use the customer id instead of first name and last name, since some customers can have the same name. Using the primary key is best.
Since you mentioned that you want customers buying the same records in couples, I would suggest reading up on market basket analysis or association analysis using the apriori algorithm. You can import your dataset into R or Python, whichever you are comfortable with, and build a visualization. In my experience Python is faster and can handle more data but its visualizations are weaker, while R is a bit slow at handling large amounts of data but has good visualizations for the apriori algorithm.
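If the pairs are needed directly in SQL, one possible approach is a self-join on the set of (customer, track) purchases. The sketch below assumes that "more than two of the same tracks" means a pair of customers sharing at least three distinct tracks. The tables are reduced to the columns used, the rows are invented, and it runs on SQLite via Python's sqlite3 module. The WITH clause needs MySQL 8.0 or later; on older versions the CTE could be written as a derived table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER);
CREATE TABLE invoiceline (InvoiceId INTEGER, TrackId INTEGER);
-- Invented rows: customer 1 bought tracks 10-13, customer 2 tracks 10-12,
-- customer 3 tracks 10 and 11.
INSERT INTO invoice VALUES (1, 1), (2, 2), (3, 3), (4, 1);
INSERT INTO invoiceline VALUES
  (1, 10), (1, 11), (4, 12), (4, 13),
  (2, 10), (2, 11), (2, 12),
  (3, 10), (3, 11);
""")

# 'bought' holds each (customer, track) purchase once; the self-join pairs customers
# on a shared track, and a.CustomerId < b.CustomerId lists each pair only once.
pairs = conn.execute("""
WITH bought AS (
  SELECT DISTINCT i.CustomerId, il.TrackId
  FROM invoice AS i
  JOIN invoiceline AS il ON il.InvoiceId = i.InvoiceId
)
SELECT a.CustomerId, b.CustomerId, COUNT(*) AS shared_tracks
FROM bought AS a
JOIN bought AS b
  ON a.TrackId = b.TrackId AND a.CustomerId < b.CustomerId
GROUP BY a.CustomerId, b.CustomerId
HAVING COUNT(*) > 2
""").fetchall()
print(pairs)
```

Customer names can be joined in afterwards from the customer table by id, which keeps customers with identical names apart.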

MySQL select all records that match the same others records in another table

I have two tables:
Table _models with fields name and model_id
And
Table _tags with fields tag_name, tags_id and model_id
In my web app, I can assign some tags to a model by adding records in table _tags with the model’s model_id related field.
How can I SELECT from table _models just the models which have the same tags assigned in the _tags table?
For example, I need to SELECT all the models that have assigned both the tag #jacket and the tag #trench
For your example, you can use a GROUP BY/HAVING along with a COUNT DISTINCT to find models that have both of the tags assigned.
SELECT m.model_id, m.name
FROM _models m
INNER JOIN _tags t
ON m.model_id = t.model_id
WHERE t.tag_name IN ('#jacket', '#trench')
GROUP BY m.model_id, m.name
HAVING COUNT(DISTINCT t.tag_name) = 2;
Slightly different way to do the same:
SELECT
m.model_id
, m.name
FROM
_models m
WHERE
2 = (
SELECT
COUNT(*)
FROM
_tags t
WHERE
m.model_id = t.model_id
AND t.tag_name IN ('#jacket', '#trench')
)
Note 1: You had better move the tag names to a separate table, giving 3 tables: models (id, name), tags (id, name), tags2models (tag_id, model_id)
Note 2: Do not forget to add an index on (tag_name, model_id) to table _tags
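The three-table layout from Note 1 could look roughly like this. It is a sketch under assumptions: the column types, the composite primary key on tags2models, and the sample rows are not from the question, and it runs on SQLite through Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- The composite primary key doubles as the lookup index and prevents
-- the same tag from being attached to a model twice.
CREATE TABLE tags2models (
  tag_id INTEGER REFERENCES tags (id),
  model_id INTEGER REFERENCES models (id),
  PRIMARY KEY (tag_id, model_id)
);
-- Invented rows: only 'Coat A' has both tags.
INSERT INTO models VALUES (1, 'Coat A'), (2, 'Coat B');
INSERT INTO tags VALUES (1, '#jacket'), (2, '#trench');
INSERT INTO tags2models VALUES (1, 1), (2, 1), (1, 2);
""")

wanted = ["#jacket", "#trench"]
placeholders = ", ".join("?" for _ in wanted)
rows = conn.execute(
    "SELECT m.id, m.name"
    " FROM models AS m"
    " JOIN tags2models AS tm ON tm.model_id = m.id"
    " JOIN tags AS t ON t.id = tm.tag_id"
    f" WHERE t.name IN ({placeholders})"
    " GROUP BY m.id, m.name"
    " HAVING COUNT(*) = ?",
    wanted + [len(wanted)],
).fetchall()
print(rows)
```

Because the primary key rules out duplicate (tag_id, model_id) rows, a plain COUNT(*) is enough here and COUNT(DISTINCT ...) is not needed.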

SQL Comment Grouping

I have two tables in MySQL:
Table 1: List of ID's
--Just a single column list of ID's
Table 2: Groups
--Group Titles
--Members **
Now the member field is basically a comments field where all the ID's that are part of that group are listed. So for instance one whole field of members looks like this:
"ID003|ID004|ID005|ID006|ID007|ID008|... Etc."
There can be 500 or more IDs listed in the field.
What I would like to do is run a query and find out which IDs appear in three or fewer groups.
I've been taking cracks at it, but honestly I'm totally lost. Any ideas?
Edit: I misunderstood the question the first time, so I'm changing my answer.
SELECT l.id
FROM List_of_ids AS l
JOIN Groups AS g ON CONCAT('|', g.members, '|') LIKE CONCAT('%|', l.id, '|%')
GROUP BY l.id
HAVING COUNT(*) <= 3
This is bound to perform very poorly, because it forces a table-scan of both tables. If you have 500 id's and 500 groups, it must run 250000 comparisons.
You should really consider whether storing a symbol-separated list is the right way to do this. See my answer to "Is storing a delimited list in a database column really that bad?"
The proper way to design such a relationship is to create a third table that maps id's to groups:
CREATE TABLE GroupsIds (
memberid INT,
groupid INT,
PRIMARY KEY (memberid, groupid)
);
With this table, the query is much more efficient because the join can use an index:
SELECT l.id
FROM List_of_ids AS l
JOIN GroupsIds AS gi ON gi.memberid = l.id
GROUP BY l.id
HAVING COUNT(*) <= 3
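One way to populate such a mapping table from the existing delimited field is to split each members value once in application code. The sketch below is illustrative only: the sample groups and ids are invented, memberid is stored as text because the ids in the question look like 'ID003', and it runs on SQLite through Python's sqlite3 module. It also uses a LEFT JOIN so that ids belonging to no group at all are reported, since zero groups also counts as three or fewer.

```python
import sqlite3

# Invented sample data in the question's layout: group id -> delimited members field.
groups = {
    1: "ID003|ID004|ID005",
    2: "ID003|ID004",
    3: "ID003|ID005",
    4: "ID003",
}
all_ids = ["ID003", "ID004", "ID005", "ID009"]  # ID009 belongs to no group

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE List_of_ids (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE GroupsIds (memberid TEXT, groupid INTEGER,"
    " PRIMARY KEY (memberid, groupid))"
)
conn.executemany("INSERT INTO List_of_ids VALUES (?)", [(i,) for i in all_ids])

# Split each delimited field once and keep one row per (member, group) pair.
pairs = [
    (member, group_id)
    for group_id, members in groups.items()
    for member in members.split("|")
    if member
]
# INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
conn.executemany("INSERT OR IGNORE INTO GroupsIds VALUES (?, ?)", pairs)

rows = conn.execute(
    "SELECT l.id, COUNT(gi.groupid) AS group_count"
    " FROM List_of_ids AS l"
    " LEFT JOIN GroupsIds AS gi ON gi.memberid = l.id"
    " GROUP BY l.id"
    " HAVING COUNT(gi.groupid) <= 3"
    " ORDER BY l.id"
).fetchall()
print(rows)
```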
select * from
(
select ID,
(
select count(*)
From Groups
where LOCATE(concat('ID', a.id, '|'), concat(Members, '|'))>0
) as groupcount
from ListIDTable as a
) as q
where groupcount <= 3

MySQL: how to determine which rows in tables A and B are referenced by rows in table C in linear time?

I am working with a poorly designed database that I am not at liberty to restructure. In this database, there are three tables (let's call them 'companiesA', 'companiesB', and 'items') that are involved in a query that I need to optimize. 'companiesA' and 'companiesB' describe companies in the same way in that the column values are the same, but they represent two different groups of companies and have different column names. Essentially, the ID and company name columns are 'aID' and 'aName' in 'companiesA', and 'idB' and 'nameB' in 'companiesB'. 'items' contains a column, 'companyID', that contains a foreign key value from one of the two company tables.
The query I need to optimize gets a page's worth of company IDs and names from the union of the two tables, sorted by the names column, with an added column that states whether the row's company has any items associated with it. This query can also filter by the company names if the user requests it in the front-end. In its current state, I think it runs in THETA(companies * items) time, which is prohibitively slow:
select
a.aID as companyID,
a.aName as companyName,
(select
count(companyID)
from
items
where
companyID = a.aID
) as items
from
companiesA as a
where
a.aName like '%<string>%'
union
select
b.idB as companyID,
b.nameB as companyName,
(select
count(companyID)
from
items
where
companyID = b.idB
) as items
from
companiesB as b
where
b.nameB like '%<string>%'
order by
companyName ASC
limit
[optional_starting_index, ] 50;
It is not important that the items column contain the actual counts as this query returns (it was the only way I could figure out to cleanly return a value regarding the entire 'items' table). I suppose that I can count myself fortunate that with 1500 companies and 9000 items, this algorithm only takes seven seconds.
If I were writing this in another language in which I had access to the tables myself, I could easily write this in O(companies + items) time, but I am finding it difficult to figure out how to do so in MySQL. Is it possible to do this, preferably without stored functions or procedures? I CAN add them if necessary, but I have had a hard time adding them through phpMyAdmin now that the server's host only allows that interface to access the database by GUI.
In this solution, I took the daring assumption that the company names in each of the tables are unique by using Union All. If they are not, then you can switch back to Union but you'll get the performance hit of making the list unique. Basically, I'm eliminating your need for correlated subqueries to return the counts by using derived tables.
Select Companies.CompanyID, Companies.CompanyName
, Coalesce(ItemTotals.ItemCount,0) As ItemCount
From (
Select a.aID As CompanyID, a.aName As CompanyName
From companiesA As a
Where a.aName Like '%<string>%'
Union All
Select b.idB, b.nameB
From companiesB As b
Where b.nameB Like '%<string>%'
) As Companies
Left Join (
Select companyID, Count(*) As ItemCount
From items
Group By companyID
) As ItemTotals
On ItemTotals.companyID = Companies.CompanyID
Order By Companies.CompanyName
Here is another variant. This one is similar to your original except that I replaced the correlated subqueries with two Group By queries. As before, if the names and IDs in the two tables are mutually exclusive, you can use Union All; otherwise you will need to use Union.
Select Z.CompanyId, Z.CompanyName, Z.ItemCount
From (
Select A.aID As CompanyId, A.aName As CompanyName
, Count(I.companyID) As ItemCount
From companiesA As A
Left Join items As I
On I.companyID = A.aID
Where A.aName Like '%<string>%'
Group By A.aID, A.aName
Union All
Select B.idB, B.nameB, Count(I.companyID)
From companiesB As B
Left Join items As I
On I.companyID = B.idB
Where B.nameB Like '%<string>%'
Group By B.idB, B.nameB
) As Z
Order By Z.CompanyName
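To check the derived-table idea on a small scale, here is a runnable sketch with paging added. The table and column names follow the question, but the rows, the index name items_company, and the page size handling are assumptions, and it runs on SQLite through Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companiesA (aID INTEGER PRIMARY KEY, aName TEXT);
CREATE TABLE companiesB (idB INTEGER PRIMARY KEY, nameB TEXT);
CREATE TABLE items (companyID INTEGER);
-- An index on items.companyID (hypothetical name) helps the grouped count.
CREATE INDEX items_company ON items (companyID);
-- Invented rows.
INSERT INTO companiesA VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO companiesB VALUES (101, 'Beacon'), (102, 'Acorn');
INSERT INTO items VALUES (1), (1), (102);
""")

# items is aggregated once in a derived table and then joined,
# instead of being re-scanned by a correlated subquery per company row.
sql = """
SELECT c.companyID, c.companyName, COALESCE(t.itemCount, 0) AS items
FROM (
  SELECT aID AS companyID, aName AS companyName FROM companiesA WHERE aName LIKE ?
  UNION ALL
  SELECT idB, nameB FROM companiesB WHERE nameB LIKE ?
) AS c
LEFT JOIN (
  SELECT companyID, COUNT(*) AS itemCount FROM items GROUP BY companyID
) AS t ON t.companyID = c.companyID
ORDER BY c.companyName
LIMIT ? OFFSET ?
"""
pattern = "%c%"
page = conn.execute(sql, (pattern, pattern, 50, 0)).fetchall()
print(page)
```

Companies without items come back with a count of 0, which is enough to answer the "has any items" question from the original query.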

SELECT * WHERE NOT EXISTS

I think I'm going down the right path with this one... Please bear with me as my SQL isn't the greatest
I'm trying to query a database to select everything from one table where certain cells don't exist in another. That much doesn't make a lot of sense but I'm hoping this piece of code will
SELECT * from employees WHERE NOT EXISTS (SELECT name FROM eotm_dyn)
So basically I have one table with a list of employees and their details, and another table with some other details, including their name. Where their name is not in the eotm_dyn table, meaning there is no entry for them, I would like to see exactly who they are, or in other words, see exactly what is missing.
The above query returns nothing, but I know there are 20ish names missing so I've obviously not gotten it right.
Can anyone help?
You didn't join the table in your query.
Your original query will always return nothing unless there are no records at all in eotm_dyn, in which case it will return everything.
Assuming these tables should be joined on employeeID, use the following:
SELECT *
FROM employees e
WHERE NOT EXISTS
(
SELECT null
FROM eotm_dyn d
WHERE d.employeeID = e.id
)
You can also join these tables with a LEFT JOIN and keep only the rows where the joined columns are NULL, but this will likely be less efficient than using NOT EXISTS.
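The difference between the uncorrelated and the correlated subquery can be seen on a tiny example. This is a sketch with invented rows and an assumed employeeID link between the tables, run on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE eotm_dyn (employeeID INTEGER, name TEXT);
-- Invented rows: only Ann has an entry in eotm_dyn.
INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO eotm_dyn VALUES (1, 'Ann');
""")

def names(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Uncorrelated: the subquery returns a row for every employee, so nothing passes.
uncorrelated = names(
    "SELECT name FROM employees WHERE NOT EXISTS (SELECT name FROM eotm_dyn)"
)

# Correlated: the subquery is evaluated against the current employee row.
correlated = names(
    "SELECT e.name FROM employees AS e"
    " WHERE NOT EXISTS (SELECT NULL FROM eotm_dyn AS d WHERE d.employeeID = e.id)"
    " ORDER BY e.id"
)

# LEFT JOIN variant: unmatched employees have NULL in the joined columns.
left_join = names(
    "SELECT e.name FROM employees AS e"
    " LEFT JOIN eotm_dyn AS d ON d.employeeID = e.id"
    " WHERE d.employeeID IS NULL"
    " ORDER BY e.id"
)
print(uncorrelated, correlated, left_join)
```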
SELECT * FROM employees WHERE name NOT IN (SELECT name FROM eotm_dyn)
OR
SELECT * FROM employees WHERE NOT EXISTS (SELECT * FROM eotm_dyn WHERE eotm_dyn.name = employees.name)
OR
SELECT * FROM employees LEFT OUTER JOIN eotm_dyn ON eotm_dyn.name = employees.name WHERE eotm_dyn.name IS NULL
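One caveat with the NOT IN form: if eotm_dyn.name can contain NULL, NOT IN matches no rows at all, whereas NOT EXISTS still works. A small sketch with invented rows, run on SQLite through Python's sqlite3 module, shows the difference. Whether the real eotm_dyn.name column allows NULL is not stated in the question, so this is only an assumption to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT);
CREATE TABLE eotm_dyn (name TEXT);
-- Invented rows: eotm_dyn contains one NULL name.
INSERT INTO employees VALUES ('Ann'), ('Bob');
INSERT INTO eotm_dyn VALUES ('Ann'), (NULL);
""")

# 'Bob' NOT IN ('Ann', NULL) evaluates to NULL rather than true, so Bob is dropped.
not_in = [row[0] for row in conn.execute(
    "SELECT name FROM employees WHERE name NOT IN (SELECT name FROM eotm_dyn)"
)]

# NOT EXISTS only asks whether a matching row exists, so the NULL does no harm.
not_exists = [row[0] for row in conn.execute(
    "SELECT name FROM employees"
    " WHERE NOT EXISTS (SELECT 1 FROM eotm_dyn WHERE eotm_dyn.name = employees.name)"
)]
print(not_in, not_exists)
```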
You can do a LEFT JOIN and assert the joined column is NULL.
Example:
SELECT * FROM employees a LEFT JOIN eotm_dyn b on (a.joinfield=b.joinfield) WHERE b.name IS NULL
SELECT * from employees
WHERE NOT EXISTS (SELECT name FROM eotm_dyn)
This never returns any records unless eotm_dyn is empty. You need some kind of criterion in SELECT name FROM eotm_dyn that ties it to the outer row, like
SELECT * from employees
WHERE NOT EXISTS (
SELECT name FROM eotm_dyn WHERE eotm_dyn.employeeid = employees.employeeid
)
assuming that the two tables are linked by a foreign key relationship. At this point you could use a variety of other options, including a LEFT JOIN. The optimizer will typically handle them the same way, however.
You can also have a look at this related question. That user reported that using a join provided better performance than using a sub query.