mysql query with HAVING COUNT clause - mysql

So, I have a particular query that I'm trying to tweak a bit. The needs for the project changed a bit and I'm not sure how to approach this.
I have 3 tables - a main table, a "tags" table and then a linking table to tie the tags to the main entries. The wrinkle is that there is a weight given to that linkage and that is used to sum the total weight of the tags linked to a particular Name entry in the main table. In short, a main entry might have multiple tags on it, each with a different weight. The current query sums all of the tag weights and orders them by the total sum of all tags.
Main table:
UID | Name
-----------------
123 | Robert
Tags table:
UID | Tag_Name
-----------------
1 | Name_One
2 | Name_Two
Linking table:
Tag_ID | Name_ID | Weight
-----------------------------
2 | Name_One | 2
1 | Name_Two | 3
1 | Name_One | 5
Currently, I've built the following query, which handles this fine. Here 2,1 is the string of tag ids that I'm looking to match:
SELECT person.id,
SUM(linkage.weight) AS total_weight
FROM (person)
INNER JOIN linked_tags AS linkage
ON linkage.track_id = person.id AND linkage.tag_id IN (2,1)
GROUP BY person.id
HAVING COUNT(DISTINCT linkage.tag_id)=1
ORDER BY total_weight DESC
I want to extend this for another use. Right now, the tag id's that are passed in as a string are subtractive. It only finds matches where both tag id's exist for a certain person id. If I wanted to pass another string of id's in, where if ANY person id's match ANY of the tag id's out of that string, followed by the current subtractive string of id's, plus sum the weight of those tags, how might I go about it?

I believe the correct having clause for your query is:
HAVING COUNT(DISTINCT linkage.tag_id)=2
Your version only keeps people who match exactly 1 of the listed tags.
The following version of the query treats tags 3 and 4 as optional:
SELECT person.id, SUM(linkage.weight) AS total_weight
FROM person INNER JOIN
linked_tags AS linkage
ON linkage.track_id = person.id AND
linkage.tag_id IN (2, 1, 3, 4)
GROUP BY person.id
HAVING COUNT(DISTINCT case when linkage.tag_id in (1, 2) then linkage.tag_id end) = 2
ORDER BY total_weight DESC ;
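The big difference is the CASE expression inside COUNT(DISTINCT ...): it returns NULL for the optional tags, and COUNT ignores NULLs, so only tags 1 and 2 count toward the required total of 2.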
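For anyone who wants to try the conditional count without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 as a stand-in. The sample rows are made up for illustration; only the table and column names come from the query above.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL, and the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE linked_tags (track_id INTEGER, tag_id INTEGER, weight INTEGER);
    INSERT INTO person (id) VALUES (10), (20), (30);
    INSERT INTO linked_tags (track_id, tag_id, weight) VALUES
        (10, 1, 5), (10, 2, 2), (10, 3, 4),  -- both required tags plus optional tag 3
        (20, 1, 3), (20, 4, 9),              -- required tag 2 is missing
        (30, 1, 1), (30, 2, 1);              -- both required tags, no optional ones
""")

# Tags 1 and 2 are required; tags 3 and 4 only add to the weight when present.
rows = conn.execute("""
    SELECT person.id, SUM(linkage.weight) AS total_weight
    FROM person
    INNER JOIN linked_tags AS linkage
        ON linkage.track_id = person.id
       AND linkage.tag_id IN (1, 2, 3, 4)
    GROUP BY person.id
    HAVING COUNT(DISTINCT CASE WHEN linkage.tag_id IN (1, 2)
                               THEN linkage.tag_id END) = 2
    ORDER BY total_weight DESC
""").fetchall()

print(rows)  # [(10, 11), (30, 2)]
```

Person 20 drops out because only one of the two required tags is present, while person 10 gets the optional tag's weight added to the total.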

Related

mysql select rows including related

I have table:
id | parent | regno | person
1 | 0 | 12 | 5
2 | 1 | 12 | 15
3 | 0 | 13 | 5
4 | 0 | 14 | 6
I have MySQL query...
SELECT *
FROM table
WHERE person='5';
...that returns rows 1 and 3.
In this table, rows 1 and 2 are related (same regno).
How can I build this query to include related rows?
Basically, when searching for person 5, I need the MySQL query to return the following:
id | parent | regno | person
1 | 0 | 12 | 5
2 | 1 | 12 | 15
3 | 0 | 13 | 5
The parent column holds the id of the row it is related to, but it can be a positive or a negative integer. All related rows always have the same regno.
Thank you.
You want all rows whose regno is the same as the regno of any row belonging to person 5:
-- this main query finds all rows with a regno from the subquery
SELECT *
FROM table
WHERE regno IN
( -- this subquery finds the list of regno values
SELECT regno
FROM table
WHERE person = '5'
)
There are other ways to write this; I'm not a fan of IN, and personally would write it like this:
SELECT t.*
FROM table t
INNER JOIN
(
SELECT DISTINCT regno
FROM table
WHERE person = '5'
) u
ON t.regno = u.regno
This is harder to understand, though, and it's quite likely that the two queries end up being executed identically internally anyway. In this form the DISTINCT is required to make the regno values from the subquery unique; without it, joined rows would end up duplicated. Why do I prefer it over IN? In some database systems the implementation of IN can be very naive and perform poorly. "Never use IN to create a list longer than you would write by hand" is an old mantra I tend to stick to. The join pattern is also more flexible, because it can match on multiple columns. Not every database supports the Oracle-esque WHERE (x, y) IN ((1,3),(3,4)) syntax for matching multiple values.
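To see why the DISTINCT matters in the join form, here is a rough sketch using Python's built-in sqlite3 in place of MySQL. The table name registrations is made up (the question only calls it "table"), and row 5 is a hypothetical extra row added so that person 5 appears twice under the same regno.

```python
import sqlite3

# Assumptions: sqlite3 stands in for MySQL, "registrations" is a made-up
# table name, and row 5 is hypothetical (not in the question's data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (id INTEGER, parent INTEGER, regno INTEGER, person INTEGER);
    INSERT INTO registrations VALUES
        (1, 0, 12, 5),
        (2, 1, 12, 15),
        (3, 0, 13, 5),
        (4, 0, 14, 6),
        (5, 0, 12, 5);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# IN form: each matching row comes back once.
in_ids = ids("""
    SELECT id FROM registrations
    WHERE regno IN (SELECT regno FROM registrations WHERE person = 5)
    ORDER BY id
""")

# JOIN form with DISTINCT: same rows as the IN form.
join_ids = ids("""
    SELECT t.id
    FROM registrations t
    INNER JOIN (SELECT DISTINCT regno FROM registrations WHERE person = 5) u
        ON t.regno = u.regno
    ORDER BY t.id
""")

# JOIN form without DISTINCT: regno 12 is listed twice, so its rows double up.
dup_ids = ids("""
    SELECT t.id
    FROM registrations t
    INNER JOIN (SELECT regno FROM registrations WHERE person = 5) u
        ON t.regno = u.regno
    ORDER BY t.id
""")

print(in_ids)    # [1, 2, 3, 5]
print(join_ids)  # [1, 2, 3, 5]
print(dup_ids)   # [1, 1, 2, 2, 3, 5, 5]
```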
As an aside (and partly in response to the first comment on this answer), it would be more typical and more useful to have the database prepare a set of rows that has the parent and child data on the same line.
It would look more like this:
SELECT *
FROM
table c
LEFT OUTER JOIN
table p
ON c.regno = p.regno AND p.parent = 1
WHERE c.person = '5' AND c.parent=0
This assumes your "parent" column is a 0/1 true/false flag. You seem to have commented that parent is the id of the relative (I'm not sure whether that means parent-of or parent-is).
For a table with an id and a parentid column, where parentid is set when the row is a child of that other id:
id, parentid, name
1, null, Daddy
2, 1, Little Jonny
3, 1, Little Sarah
That looks like:
SELECT *
FROM
table c
INNER JOIN
table p
ON c.parentid = p.id
WHERE p.parentid IS NULL
Rows can have only one parent. A NULL in parentid defines the row as a parent; otherwise it's a child. You could turn this logic on its head if you wanted: call the column isparentof, leave it NULL on all child rows, and for anyone who is a parent of a child, put the child's id in isparentof. This then limits you to one child per set of parents (single-child families). The query to pull them out is broadly the same.
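As a quick sanity check of the parent/child self-join, here is a small sketch with Python's sqlite3 standing in for MySQL. The table name family is made up; the three rows are the ones listed above.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL and "family" is a made-up table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE family (id INTEGER PRIMARY KEY, parentid INTEGER, name TEXT);
    INSERT INTO family VALUES
        (1, NULL, 'Daddy'),
        (2, 1, 'Little Jonny'),
        (3, 1, 'Little Sarah');
""")

# Pair every child (c) with its parent (p); parents have a NULL parentid.
pairs = conn.execute("""
    SELECT c.name, p.name
    FROM family c
    INNER JOIN family p ON c.parentid = p.id
    WHERE p.parentid IS NULL
    ORDER BY c.id
""").fetchall()

# Comparing with "= NULL" never matches, which is why IS NULL is needed.
wrong = conn.execute("""
    SELECT c.name, p.name
    FROM family c
    INNER JOIN family p ON c.parentid = p.id
    WHERE p.parentid = NULL
""").fetchall()

print(pairs)  # [('Little Jonny', 'Daddy'), ('Little Sarah', 'Daddy')]
print(wrong)  # []
```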
You can get all the id values for person = '5' in a derived table.
Then join back to the main table, matching either the absolute value of parent (to get the child rows) or the id (to get the parent row itself).
Based on the discussion in the comments, try:
SELECT t.*
FROM your_table AS t
JOIN
(
SELECT id AS parent_id
FROM your_table
WHERE person = '5'
) AS dt
ON dt.parent_id = ABS(t.parent) OR
dt.parent_id = t.id
It is hard to understand, though, why you would put negative values in parent!
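Here is a rough sketch of that derived-table join, run through Python's sqlite3 rather than MySQL (both provide ABS). Rows 1 to 4 are the question's data; row 5, with a negative parent, is a hypothetical addition to exercise the ABS() part.

```python
import sqlite3

# Assumptions: sqlite3 stands in for MySQL, and row 5 (negative parent)
# is hypothetical; rows 1-4 come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (id INTEGER, parent INTEGER, regno INTEGER, person INTEGER);
    INSERT INTO your_table VALUES
        (1,  0, 12, 5),
        (2,  1, 12, 15),
        (3,  0, 13, 5),
        (4,  0, 14, 6),
        (5, -3, 13, 20);
""")

# dt holds the ids of person 5's rows; a row of t qualifies if it is one of
# those rows itself or if |parent| points at one of them.
related_ids = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM your_table AS t
    JOIN (
        SELECT id AS parent_id
        FROM your_table
        WHERE person = 5
    ) AS dt
        ON dt.parent_id = ABS(t.parent)
        OR dt.parent_id = t.id
    ORDER BY t.id
""")]

print(related_ids)  # [1, 2, 3, 5]
```

Note that a row could come back twice if it belongs to person 5 and its parent does too; adding DISTINCT would guard against that.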

SQL, Find orders where strict criteria is met, Match item, not if other items purchased

I have stumped all the IT people at my work with this one, so I'm wondering if anyone can help.
I need to extract from an order table anyone who has purchased only a specific product type. (If they have ordered that product type and any other product types, I don't want to know who they are.)
For example, the table is roughly:
Order ID | item code | Name
-----------------------------
1 | ADA | item 1
2 | ADA | item 1
2 | GGG | item 2
3 | ADA | item 1
So I want to find all the order IDs of people who purchased only item code ADA, but not if they purchased other items. The output of this query should be order IDs 1 and 3, skipping order 2 because it had a different item.
I would really appreciate it if anyone could help.
Assuming an order can't have multiple records with the same ItemCode, you could use:
SELECT *
FROM Orders
WHERE OrderID IN (
SELECT OrderID
FROM Orders
GROUP BY OrderID HAVING COUNT(*) = 1
)
AND ItemCode = 'ADA'
If an order could have multiple records with the same ItemCode, you'd have to change SELECT * to SELECT DISTINCT * and COUNT(*) to COUNT(DISTINCT ItemCode).
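A minimal sketch of the COUNT(DISTINCT ItemCode) variant, using Python's sqlite3 as a stand-in for the asker's database. The first four rows are the question's sample data; the repeated (3, 'ADA') row is hypothetical and is there to show why the DISTINCT helps.

```python
import sqlite3

# Assumptions: sqlite3 stands in for the real database, and the second
# (3, 'ADA') row is a hypothetical duplicate line on order 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER, ItemCode TEXT, Name TEXT);
    INSERT INTO Orders VALUES
        (1, 'ADA', 'item 1'),
        (2, 'ADA', 'item 1'),
        (2, 'GGG', 'item 2'),
        (3, 'ADA', 'item 1'),
        (3, 'ADA', 'item 1');
""")

# Keep orders that contain exactly one distinct item code, then require
# that code to be ADA.
ada_only = [row[0] for row in conn.execute("""
    SELECT DISTINCT OrderID
    FROM Orders
    WHERE OrderID IN (
        SELECT OrderID
        FROM Orders
        GROUP BY OrderID
        HAVING COUNT(DISTINCT ItemCode) = 1
    )
    AND ItemCode = 'ADA'
    ORDER BY OrderID
""")]

print(ada_only)  # [1, 3]
```

With plain COUNT(*) = 1, order 3 would be dropped here because its duplicate line makes the row count 2.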
Based on your current explanation and example, the below should work. However, there are outstanding questions in the comments which may change the actual correct solution.
SELECT
O.OrderId, MAX(itemCode), MAX(Name)
FROM
Orders O
INNER JOIN
(SELECT
OrderId
FROM
Orders
WHERE
itemCode = 'ADA') ADA
ON
O.OrderId = ADA.OrderId
GROUP BY
O.OrderId
HAVING
COUNT(*) = 1

Selecting a value from multiple rows in MySQL

I have a little problem with my SQL statement. I have a table with a product_id and a flag_id, and I want to get the product_id that matches all the flags specified. I know you have to inner join the table to itself to match more than one, but I don't know the exact SQL for it.
Table for flags:
product_id | flag_id
1 | 1
1 | 51
1 | 23
2 | 1
2 | 51
3 | 1
I would like to get all products which have flag_id 1, 51 and 23.
"get the product_id which matches all the flags specified"
This problem is called relational division. One way to solve it is to do this:
Use the IN predicate to specify which flags to match.
GROUP BY product_id.
Use the HAVING clause to ensure that each product has all of the specified flags,
like this:
SELECT product_id
FROM flags
WHERE flag_id IN(1, 51, 23)
GROUP BY product_id
HAVING COUNT(DISTINCT flag_id) = 3
The HAVING clause ensures that a selected product_id has all three flags; if it has only one or two of them, it is eliminated.
This will give you only:
| PRODUCT_ID |
--------------
| 1 |
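The result above can be reproduced locally with a small sketch; this one uses Python's built-in sqlite3 as a stand-in for MySQL, loaded with the flags rows from the question.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the rows are the question's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flags (product_id INTEGER, flag_id INTEGER);
    INSERT INTO flags VALUES
        (1, 1), (1, 51), (1, 23),
        (2, 1), (2, 51),
        (3, 1);
""")

# Relational division: keep products whose matching flags cover all three.
products = [row[0] for row in conn.execute("""
    SELECT product_id
    FROM flags
    WHERE flag_id IN (1, 51, 23)
    GROUP BY product_id
    HAVING COUNT(DISTINCT flag_id) = 3
""")]

print(products)  # [1]
```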
try this:
SELECT *
FROM your_table
WHERE flag_id IN(1,2,..);
First, it would help if you specified what you have tried so far, but as I understand it you need to get products with certain flags, so you can just use WHERE:
SELECT product_id FROM Product WHERE flag_id IN (1,2,3,4,5)
Try:
SELECT *
FROM TABLE_NAME A INNER JOIN TABLE_NAME B ON A.product_id = B.product_id

Optimize a SQL query for tag matching

Example dataset:
id | tag
---|------
1 | car
1 | bike
2 | boat
2 | bike
3 | plane
3 | car
id and tag are both indexed.
I am trying to get the ids that match all of the tags [car, bike] (the number of tags can vary).
A naive query to do so would be:
SELECT id
FROM test
WHERE tag = 'car'
OR tag = 'bike'
GROUP BY id
HAVING COUNT(*) = 2
However, doing so is quite inefficient because of the GROUP BY and the fact that every row that matches one tag is taken into account for the GROUP BY (and I have a large volume of data).
Is there a more efficient query for this situation?
The only solution I see would be to have another table containing something like:
id | hash
---|------
1 | car,bike
2 | boat,bike
3 | plane,car
But this is not an easy solution to implement and keep up to date.
Additional info:
the name matching must be exact (no fulltext index)
the number of tags is not always 2
try this:
SELECT id
FROM test
WHERE tag in('car','bike')
GROUP BY id
HAVING COUNT(*) = 2
Also create a nonclustered index on the tag column.
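Since the number of tags can vary, here is a sketch of how the same query could be built for any tag list. It uses Python's sqlite3 as a stand-in for the real database; the helper name ids_with_all_tags and the index name are made up, and the composite (tag, id) index is only a guess at what might help, not something measured.

```python
import sqlite3

# Assumptions: sqlite3 stands in for the real database; the helper name and
# the index name are hypothetical. Data is the question's example dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER, tag TEXT);
    CREATE INDEX idx_test_tag_id ON test (tag, id);
    INSERT INTO test VALUES
        (1, 'car'), (1, 'bike'),
        (2, 'boat'), (2, 'bike'),
        (3, 'plane'), (3, 'car');
""")

def ids_with_all_tags(connection, tags):
    """Return the ids that carry every tag in `tags` (exact match)."""
    wanted = sorted(set(tags))          # drop repeated tags from the input
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT id
        FROM test
        WHERE tag IN ({placeholders})
        GROUP BY id
        HAVING COUNT(DISTINCT tag) = ?
        ORDER BY id
    """
    # Tag values are bound as parameters; only the "?" markers are formatted in.
    return [row[0] for row in connection.execute(sql, (*wanted, len(wanted)))]

print(ids_with_all_tags(conn, ["car", "bike"]))  # [1]
print(ids_with_all_tags(conn, ["bike"]))         # [1, 2]
```

COUNT(DISTINCT tag) is used instead of COUNT(*) so that a repeated (id, tag) row cannot inflate the count; if that pair is guaranteed unique, COUNT(*) gives the same answer.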
Here you go:
select id
from TEST
where tag = 'car'
and ID in (select id from TEST where tag = 'bike')
Not sure if I understand you, but try this:
select tag, count(*) as amount
into #temp
from MYTABLE
group by tag
select t1.tag
from #temp t1 join #temp t2 on t1.amount=t2.amount and t1.tag=t2.tag and t1.amount=2
This should return bike and car, since they both have 2 rows.

How to filter duplicates within row using Distinct/group by with JOINS

For simplicity, I will give a quick example of what I am trying to achieve:
Table 1 - Members
ID | Name
--------------------
1 | John
2 | Mike
3 | Sam
Table 2 - Member_Selections
ID | planID
--------------------
1 | 1
1 | 2
1 | 1
2 | 2
2 | 3
3 | 2
3 | 1
Table 3 - Selection_Details
planID | Cost
--------------------
1 | 5
2 | 10
3 | 12
When I run my query, I want to return the sum of all member selections, grouped by member. The issue I face, however (see the Table 2 data), is that some members may have duplicate information in the system by mistake. While we do our best to filter this data up front, sometimes it slips through the cracks, so when I make the necessary calls to the system to pull information, I also want to filter out these duplicates.
The results SHOULD show:
Results Table
ID | Name | Total_Cost
-----------------------------
1 | John | 15
2 | Mike | 22
3 | Sam | 15
but instead I get John as $20, because he has plan ID 1 inserted twice by mistake.
My query is currently:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
Adding DISTINCT s.planID filters the results incorrectly as it will only show a single PlanID 1 sold (even though members 1 and 3 bought it).
Any help is appreciated.
EDIT
There is also another table I forgot to mention which is the agent table (the agent who sold the plans to members).
The final GROUP BY statement groups ALL items sold by the agent ID (which turns the final results into a single row).
Perhaps the simplest solution is to put a unique composite key on the member_selections table:
alter table member_selections add unique key ms_key (ID, planID);
which would prevent any record from being added where the combination of ID/planID already exists elsewhere in the table. That would allow only a single (1,1) row.
Comment follow-up:
I just saw your comment about the 'alter ignore...'. That'd work fine, but you'd still be left with the bad duplicates in the table. I'd suggest adding the unique key, then manually cleaning up the table. The query I put in the comments should find all the duplicates for you, which you can then weed out by hand. Once the table's clean, there'll be no need for the duplicate-handling version of the query.
Use UNIQUE keys to prevent accidental duplicate entries. This will eliminate the problem at the source, instead of when it starts to show symptoms. It also makes later queries easier, because you can count on having a consistent database.
What about:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN
(select distinct ID, PlanID from member_selections) s
USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
By the way, is there a reason you don't have a primary key on member_selections that will prevent these duplicates from happening in the first place?
You can add a group by clause into the inner query, which groups by all three columns, basically returning only unique rows. (I also changed 'premium' to 'cost' to match your example tables, and dropped the agent part)
SELECT
sq.ID,
sq.name,
SUM(sq.Cost) AS total_cost
FROM
(
SELECT
m.id,
m.name,
g.Cost
FROM
members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
GROUP BY
m.ID,
m.NAME,
g.Cost
) sq
group by
sq.ID,
sq.NAME
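For comparison, here is a sketch of the deduplicating query run against the example tables, using Python's sqlite3 in place of MySQL and ignoring the agent table. It deduplicates on (ID, planID) in a derived table, as in the "select distinct" answer, rather than on Cost; the reasoning is that grouping on Cost alone would also collapse two different plans that happen to cost the same.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the rows are the question's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
    CREATE TABLE selection_details (planID INTEGER PRIMARY KEY, Cost INTEGER);
    INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam');
    INSERT INTO member_selections VALUES
        (1, 1), (1, 2), (1, 1),   -- John has plan 1 entered twice by mistake
        (2, 2), (2, 3),
        (3, 2), (3, 1);
    INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12);
""")

# Remove duplicate (ID, planID) pairs first, then sum the cost per member.
totals = conn.execute("""
    SELECT m.ID, m.Name, SUM(g.Cost) AS total_cost
    FROM members m
    INNER JOIN (SELECT DISTINCT ID, planID FROM member_selections) s
        ON s.ID = m.ID
    INNER JOIN selection_details g
        ON g.planID = s.planID
    GROUP BY m.ID, m.Name
    ORDER BY m.ID
""").fetchall()

print(totals)  # [(1, 'John', 15), (2, 'Mike', 22), (3, 'Sam', 15)]
```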