MySQL: if a row is duplicated, prefer the one that matches a field value

I have the following SELECT query. I want to avoid getting the duplicate "EN" row when an "ES" row is present, i.e. prefer ES over EN.
SELECT s.soft_id,s.groupby,s.packageid,s.name,s.area,l.min,GROUP_CONCAT(DISTINCT JSON_ARRAY(s.version,s.detailid,s.filesize,s.updatetime)) versions
FROM software s
INNER JOIN langs l ON s.lang_id=l.lang_id
INNER JOIN devices_type t ON (s.familylock_id=t.familylock_id OR (s.familylock_id=0 AND s.devicelock_id=t.device_type_id))
INNER JOIN devices d ON t.device_type_id=d.device_type_id
INNER JOIN users u ON d.user_id=u.user_id
WHERE s.groupby IN(1,2,3)
AND u.token="abc"
AND d.serialno="123456789"
AND l.min IN("ES","EN")
GROUP BY s.soft_id,s.groupby,s.packageid,s.name,s.area,l.min ORDER BY s.name ASC
This is the example result: (image)
You can test your query here: http://185.27.134.10/login.php?2=epiz_26706010wejghelqwdtg3e54gVGtSRk1VMUVRVE5QVkdzeFRWaDNhRWxUUldoSldIZzRaa2g0T0daSWVEaG1TSGhvVjIxc1JGb3lkRk5rV0U1cFZsRTlQUT09wejghelqwdtg3e54gsql102.epizy.comwejghelqwdtg3e54gepiz_26706010_test&db=epiz_26706010_test

I'd do it in two steps. First, count how many duplicate rows you have; by duplicate, I mean rows that are identical on one column but differ in ES vs EN. Then, with the table sorted, select the last among the duplicates. See:
Get top first record from duplicate records having no unique identity

Outer join the table twice, looking for a single specific value with each join, and then coalesce the fields in the order you prefer:
SELECT s.soft_id, coalesce(les.min, len.min) As min, ...
FROM software s
LEFT JOIN langs les ON s.lang_id=les.lang_id AND les.min = 'ES'
LEFT JOIN langs len ON s.lang_id=len.lang_id AND len.min = 'EN'
...
WHERE s.groupby IN(1,2,3)
AND coalesce(les.lang_id, len.lang_id) IS NOT NULL
...
Window functions can do it more efficiently, but they're still not supported on many MySQL servers in the wild. If you're using 8.0 or later, you should look into that option.

You can use window functions as:
with q as (
<your query here>
)
select q.*
from (select q.*,
row_number() over (partition by soft_id order by field(q.min, 'ES', 'EN')) as seqnum
from q
) q
where seqnum = 1;
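The ranking idea above can be tried out on a tiny dataset. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (window functions need SQLite 3.25 or later), the table q and its lang column are hypothetical substitutes for the joined result and l.min, and a CASE expression replaces FIELD(), which SQLite does not have.

```python
import sqlite3

# Hypothetical miniature of the joined result: one row per (soft_id, language).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q (soft_id INTEGER, name TEXT, lang TEXT);
    INSERT INTO q VALUES
        (1, 'alpha', 'EN'),
        (1, 'alpha', 'ES'),
        (2, 'beta',  'EN');
""")

# Rank the rows of each soft_id so that 'ES' sorts before 'EN', then keep rank 1.
# In MySQL, FIELD(lang, 'ES', 'EN') would give the same ordering as this CASE.
rows = conn.execute("""
    SELECT soft_id, name, lang
    FROM (SELECT q.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY soft_id
                     ORDER BY CASE lang WHEN 'ES' THEN 1 WHEN 'EN' THEN 2 ELSE 3 END
                 ) AS seqnum
          FROM q) ranked
    WHERE seqnum = 1
    ORDER BY soft_id
""").fetchall()

print(rows)  # [(1, 'alpha', 'ES'), (2, 'beta', 'EN')]
```

soft_id 1 keeps only its ES row, while soft_id 2, which has no ES row, falls back to EN.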

Related

How can I eliminate duplicates using MAX function?

I have these tables
recommendation_object_id, exhibitor_name, event_edition_id, timestamp
I want to hide/remove the duplicates in recommendation_object_id to make it a primary key.
I successfully removed most of the duplicates, but a few recommendation IDs have different event edition IDs, so some IDs are still duplicated as a result.
A colleague of mine said I could eliminate those further by using max(timestamp), but I could not pull it off :(
My current query is this:
SELECT DISTINCT r.recommended_object_id, ed.exhibitor_name, sd.event_edition_id, r.object_type, max(r.timestamp)
FROM recommendations r
left join show_details sd on r.event_edition_id = sd.event_edition_id
left join exhibitor_details ed on r.recommended_object_id = ed.exhibitor_id
group by r.recommended_object_id, ed.exhibitor_name, sd.event_edition_id, r.object_type
order by r.recommended_object_id
If you want one row per recommended_object_id, the one with the most recent timestamp, then use window functions:
select r.*
from (select r.recommended_object_id, ed.exhibitor_name, sd.event_edition_id, r.object_type,
row_number() over (partition by recommended_object_id order by r.timestamp desc) as seqnum
from recommendations r left join
show_details sd
on r.event_edition_id = sd.event_edition_id left join
exhibitor_details ed
on r.recommended_object_id = ed.exhibitor_id
) r
where seqnum = 1
order by r.recommended_object_id;
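For comparison, the colleague's max(timestamp) idea from the question can be sketched as a join against a per-id maximum. This is an illustration only, run through Python's built-in sqlite3 module with a hypothetical, cut-down recommendations table. Unlike row_number(), it would return more than one row for an id whose latest timestamp is tied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced version of the recommendations table.
    CREATE TABLE recommendations (
        recommended_object_id INTEGER,
        event_edition_id INTEGER,
        timestamp INTEGER
    );
    INSERT INTO recommendations VALUES
        (10, 1, 100),
        (10, 2, 200),
        (11, 1, 150);
""")

# Compute the newest timestamp per id, then join back to fetch the full row.
latest = conn.execute("""
    SELECT r.recommended_object_id, r.event_edition_id, r.timestamp
    FROM recommendations r
    JOIN (SELECT recommended_object_id, MAX(timestamp) AS max_ts
          FROM recommendations
          GROUP BY recommended_object_id) m
      ON m.recommended_object_id = r.recommended_object_id
     AND m.max_ts = r.timestamp
    ORDER BY r.recommended_object_id
""").fetchall()

print(latest)  # [(10, 2, 200), (11, 1, 150)]
```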

Speeding up mysql query

I have a MySQL query that joins four tables. I thought it was best to simply join the tables, but now that the data is getting bigger, the query seems to cause the application to stop executing.
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM `purchase_order`
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY id ORDER BY `purchase_order`.`po_date` DESC LIMIT 0, 20
My problem is that the query takes a long time to finish. Is there a way to speed this query up, or to change it for faster retrieval of the data?
Here's the EXPLAIN EXTENDED as requested in the comments.
Thanks in advance, I really hope this is the right channel for me to ask. If not, please let me know.
Will this give you the correct list of ids?
SELECT id
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 20
If so, then start with that before launching into the JOIN. You can also (I think) get rid of the GROUP BY that is causing an "explode-implode" of rows.
SELECT ...
FROM ( SELECT id ... (as above) ...) AS ids
JOIN purchase_order po ON po.id = ids.id
JOIN ... (the other tables)
GROUP BY ... -- (this may be problematic, especially with the LIMIT)
ORDER BY po.po_date DESC -- yes, this needs repeating
-- no LIMIT
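The "pick the ids first, then join" pattern can be sketched on toy data as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than MySQL; the two tables are cut down to a few hypothetical columns, the index name idx_po_date is made up, and the limit is 2 instead of 20 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_order (id INTEGER PRIMARY KEY, po_date TEXT);
    CREATE TABLE purchase_order_items (
        id INTEGER PRIMARY KEY, po_id INTEGER, po_item_name TEXT
    );
    CREATE INDEX idx_po_date ON purchase_order (po_date);
    INSERT INTO purchase_order VALUES
        (1, '2020-01-01'), (2, '2020-02-01'), (3, '2020-03-01');
    INSERT INTO purchase_order_items VALUES
        (1, 1, 'a'), (2, 2, 'b'), (3, 2, 'c'), (4, 3, 'd'), (5, 3, 'e');
""")

# The derived table picks the newest orders first; only those ids are joined.
rows = conn.execute("""
    SELECT po.id, po.po_date, poi.po_item_name
    FROM (SELECT id FROM purchase_order ORDER BY po_date DESC LIMIT 2) AS ids
    JOIN purchase_order po ON po.id = ids.id
    JOIN purchase_order_items poi ON poi.po_id = po.id
    ORDER BY po.po_date DESC, poi.id
""").fetchall()

for row in rows:
    print(row)
```

The LIMIT applies to orders rather than to joined rows, so both selected orders come back with all of their items (four rows here).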
Something like this
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM (SELECT id, po_date, po_number, customer_id, status
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 5) as purchase_order
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items
ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY purchase_order.id DESC
LIMIT 0, 5
You need to be sure that purchase_order.po_date and all the id columns are indexed. You can check with the query below.
SHOW INDEX FROM yourtable;
Since you mentioned that the data is getting bigger, I would suggest sharding, so that you can run multiple queries in parallel. Please refer to the following article:
Parallel Query for MySQL with Shard-Query
First, I cleaned up readability a bit. You don't need tick marks around every table.column reference. Also, for short-hand, using aliases works well. Ex: "po" instead of "purchase_order", "poi" instead of "purchase_order_items". The only time I would use tick marks is around reserved words that might cause a problem.
Second, you don't have any aggregations (sum, min, max, count, avg, etc.) in your query so you should be able to strip the GROUP BY clause.
As for indexes, I would have to assume you have an index on your reference tables on their respective "id" key columns.
For your purchase_order table, I would make sure there is an index with "po_date" in the first field position. Since your ORDER BY is on that column, this lets the engine jump directly to the most recent records, with the descending order already resolved.
SELECT
po.id,
po.po_date,
po.po_number,
po.customer_id,
c.`name` AS customer_name,
po.`status` AS po_status,
poi.product_id,
poi.po_item_name,
p.weight as product_weight,
p.pending as product_pending,
p.company_owner,
poi.uom,
poi.po_item_type,
poi.order_sequence,
poi.pending_balance,
poi.quantity,
poi.notes,
poi.`status` AS po_item_status,
poi.id AS po_item_id
FROM
purchase_order po
INNER JOIN customer c
ON po.customer_id = c.id
INNER JOIN purchase_order_items poi
ON po.id = poi.po_id
INNER JOIN product p
ON poi.product_id = p.id
ORDER BY
po.po_date DESC
LIMIT
0, 20

Duplicate column name SQL - need change alias?

I have written an SQL query with an INNER JOIN and a subquery:
SELECT c.*,
ar.ArticleName,
ar.idArticle,
du.DetailToUsersName,
du.DetailToUsersPhoto,
COUNT(c.idCommentToArticle) AS CNT,
CASE WHEN d.Count IS NULL THEN 0 ELSE d.Count END AS CountLikes
from (select *
from commenttoarticle g
inner join (select distinct(s.idCommentToArticle)
from commenttoarticle s
order by s.CommentToArticlePID limit 3) as gh) as c
LEFT JOIN article ar ON c.CommentToArticleIdArticle = ar.idArticle
LEFT JOIN detailtousers du ON du.idDetailToUsers = c.CommentToArticleIdUser
LEFT JOIN `likes` d ON (d.IdNote = c.idCommentToArticle AND d.LikeType = 6)
WHERE c.CommentToArticleIdArticle = 11
GROUP BY c.idCommentToArticle
ORDER BY c.idCommentToArticle DESC
So, I get error:
Duplicate column name 'idCommentToArticle'
I cannot find where the duplication is.
In the derived table aliased as c, select only the columns of g:
select g.* from commenttoarticle g
instead of
select * from commenttoarticle g
Also, you should specify a join condition to limit the rows to 3 as you intended; without the ON clause it behaves like a cross join.
select g.* from commenttoarticle g
inner join (select distinct(s.idCommentToArticle) from commenttoarticle s order by s.CommentToArticlePID limit 3) as gh
on g.idcommenttoarticle = gh.idcommenttoarticle
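The effect of the missing ON clause can be seen by counting rows. The sketch below uses Python's built-in sqlite3 module, which, like MySQL, treats an INNER JOIN without ON as a cross join. The table is a hypothetical five-row stand-in for commenttoarticle, and the subquery is simplified to order by the id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical five-row stand-in for commenttoarticle.
    CREATE TABLE commenttoarticle (idCommentToArticle INTEGER PRIMARY KEY, body TEXT);
    INSERT INTO commenttoarticle VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
""")

top3 = ("(SELECT idCommentToArticle FROM commenttoarticle "
        "ORDER BY idCommentToArticle DESC LIMIT 3)")

# No ON clause: every row of g is paired with each of the 3 subquery rows.
without_on = conn.execute(
    f"SELECT COUNT(*) FROM commenttoarticle g INNER JOIN {top3} AS gh"
).fetchone()[0]

# With ON: only the 3 matching rows survive.
with_on = conn.execute(
    f"SELECT COUNT(*) FROM commenttoarticle g INNER JOIN {top3} AS gh "
    "ON g.idCommentToArticle = gh.idCommentToArticle"
).fetchone()[0]

print(without_on, with_on)  # 15 3
```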
As @RADAR has suggested, the joins in your inner query don't seem to be complete. I also see from the comments that once you add the JOIN condition, you lose all data. I think this is because neither of the subqueries was doing what it was supposed to do.
Here is my attempt at a total solution (note, without dataset and table definition I can't show it working). Update: you have since asked the question again and provided a SQL Fiddle, so I have updated this with a working version, minus the additional JOIN tables, since they are not defined there.
SELECT c.*,
ar.ArticleName,
ar.idArticle,
du.DetailToUsersName,
du.DetailToUsersPhoto,
COUNT(c.idCommentToArticle) AS CNT,
CASE WHEN d.Count IS NULL THEN 0 ELSE d.Count END AS CountLikes
FROM commenttoarticle c -- one layer of subquery not required.
INNER JOIN (select s.idCommentToArticle, s.CommentToArticlePID -- added both the id and the parent id
FROM commenttoarticle s
WHERE s.CommentToArticleIdArticle = 11 -- moved to inner query, instead of outer query
ORDER BY s.idCommentToArticle DESC limit 3) as gh
ON c.idcommenttoarticle = gh.idcommenttoarticle -- add join condition
OR c.idcommenttoarticle = gh.CommentToArticlePID -- which matches id and parent id
LEFT JOIN article ar ON c.CommentToArticleIdArticle = ar.idArticle
LEFT JOIN detailtousers du ON du.idDetailToUsers = c.CommentToArticleIdUser
LEFT JOIN `likes` d ON (d.IdNote = c.idCommentToArticle AND d.LikeType = 6)
GROUP BY c.idCommentToArticle
ORDER BY c.idCommentToArticle DESC
But let me explain a little further, the following code from your original query was selecting the top 3 idCommentToArticlePID,
(select *
from commenttoarticle g
inner join (select distinct(s.idCommentToArticle)
from commenttoarticle s
order by s.CommentToArticlePID limit 3) as gh)
but then because there was no ON specified the 3 records were then joined to every single record from the g reference. This resulted in the full dataset being returned.
When you then specified WHERE c.CommentToArticleIdArticle = 11, this filtered the result set back down to something that looked correct.
When you then added the ON (as per @RADAR's suggestion), the inner query did not contain any values that matched the WHERE c.CommentToArticleIdArticle = 11 filter, and thus you lost all your results. If you move this filter into the inner query as shown above, the two will work together and not conflict.
Within the JOIN condition, you indicate that you want both the matching articles and their parents, so I added both to the return of the inner query, and checked for either in the join condition.
Also I think the whole g table reference is redundant and can be removed. You should be able to access this table directly as c.
I also have some concerns about the GROUP BY and COUNT(c.idCommentToArticle): they seem a little strange, but I have no supporting context (i.e. data examples), so they may be correct. If you still have issues, I would comment out the GROUP BY and COUNT, and test to see what data you are getting, before adding them back in.

MySQL: why does GROUP_CONCAT lose rows in my multiple join query?

I'm trying to fetch all the rows from table_m which also have an index in table_mi, and I'm expecting to get 2 rows as a result (with m.id=3 and m.id=9), but if I add GROUP_CONCAT to my select then I only get one row returned. Am I having a mishap somewhere within those joins of mine?
Query:
SELECT
m.id,
m.name,
m.keyword,
IFNULL(GROUP_CONCAT(r.keyword),'TEST') AS restrictions
FROM
table_m AS m
INNER JOIN
table_mi as mi ON m.id=mi.m_id
LEFT JOIN
table_ri as ri ON m.id=ri.m_id
LEFT JOIN
table_r AS r ON ri.r_id=r.id
WHERE
(
m.id>0
AND m.active=1
AND mi.p_id=0
AND (mi.pa_id="11" OR (mi.pa_id=0 AND mi.id!=0))
AND mi.u_id=IF((SELECT id FROM table_mi WHERE p_id=0 AND pa_id="11" AND u_id="2")>0,"2",0)
) OR mi.id=0
ORDER BY
mi.priority;
This is what I'm getting as a result:
ID NAME KEYWORD RESTRICTIONS
9 test_a key_a r_key_2,r_key_3,r_key_4
This is what I'm expecting:
ID NAME KEYWORD RESTRICTIONS
9 test_a key_a r_key_2,r_key_3,r_key_4
3 test_b key_b TEST
Please see my full example with schema on sql fiddle: http://sqlfiddle.com/#!2/359d9/1
GROUP_CONCAT is an aggregate function. It will bring back a single row UNLESS you specify a GROUP BY clause (with any fields that are not in the GROUP BY being aggregate fields)
Before the ORDER BY add the following:-
GROUP BY m.id, m.name, m.keyword
That said it looks like you might want to use CONCAT to join 2 values together rather than GROUP_CONCAT
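The collapsing effect of an aggregate without GROUP BY can be reproduced on a small dataset. This sketch uses Python's built-in sqlite3 module, which here behaves like the MySQL server in the question (a MySQL server with ONLY_FULL_GROUP_BY enabled would reject the ungrouped query instead). The schema is a hypothetical simplification: table_ri carries the keyword directly, so table_r and table_mi are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplification: the keyword lives directly in table_ri.
    CREATE TABLE table_m (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_ri (m_id INTEGER, keyword TEXT);
    INSERT INTO table_m VALUES (3, 'test_b'), (9, 'test_a');
    INSERT INTO table_ri VALUES (9, 'r_key_2'), (9, 'r_key_3'), (9, 'r_key_4');
""")

base = """
    SELECT m.id, m.name, IFNULL(GROUP_CONCAT(ri.keyword), 'TEST') AS restrictions
    FROM table_m m
    LEFT JOIN table_ri ri ON ri.m_id = m.id
"""

# Without GROUP BY the aggregate folds the whole result into a single row.
collapsed = conn.execute(base).fetchall()

# With GROUP BY there is one row per m.id; the unmatched row falls back to 'TEST'.
grouped = conn.execute(base + " GROUP BY m.id, m.name ORDER BY m.id").fetchall()

print(len(collapsed), len(grouped))  # 1 2
```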
As an aside, your SQL might be easier to read if you eliminate the subselect. Assuming it is bringing back a single record then possibly as follows
SELECT
m.id,
m.name,
m.keyword,
IFNULL(GROUP_CONCAT(r.keyword),'TEST') AS restrictions
FROM
table_m AS m
INNER JOIN
table_mi as mi ON m.id=mi.m_id
LEFT JOIN
table_ri as ri ON m.id=ri.m_id
LEFT JOIN
table_r AS r ON ri.r_id=r.id
LEFT OUTER JOIN
table_mi AS mi2 ON mi2.p_id=0 AND mi2.pa_id="11" AND mi2.u_id="2"
WHERE
(
m.id>0
AND m.active=1
AND mi.p_id=0
AND (mi.pa_id="11" OR (mi.pa_id=0 AND mi.id!=0))
AND mi.u_id=IF(mi2.id >0,"2",0)
) OR mi.id=0
ORDER BY
mi.priority;
You do not need GROUP_CONCAT to achieve what you want.
Instead of :
IFNULL(GROUP_CONCAT(r.keyword),'TEST') AS restrictions
use
IFNULL(r.keyword,'TEST') AS restrictions
OR:
Keep the query as it is and add GROUP BY m.id before ORDER BY

Why Does My MySQL Query Using a Subselect Hang?

The following query hangs (although the subqueries performed separately are fine):
I don't know how to make the explain table look ok. If someone tells me, I'll clean it up.
select
sum(grades.points) as p
from assignments
left join grades using (assignmentID)
where gradeID IN
(select grades.gradeID
from assignments
left join grades using (assignmentID)
where ... grades.date <= '1255503600' AND grades.date >= '984902400'
group by assignmentID order by grades.date DESC);
I think the problem is with the first grades table... the type ALL with that many rows seems to be the cause.. Everything is indexed.
I uploaded the table as an image. Couldn't get the formatting right:
http://imgur.com/AjX34.png
A commenter wanted the full where clause:
explain extended select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t
from assignments left join grades using (assignmentID)
where gradeID IN
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type
where assignments.classID = '7815'
and (assignments.type = 30170 )
and grades.contactID = 7141
and grades.points REGEXP '^[-]?[0-9]+[-]?'
and grades.points != '-'
and grades.points != ''
and (grades.pointsposs IS NULL or grades.pointsposs = '')
and grades.date <= '1255503600'
AND grades.date >= '984902400'
group by assignmentID
order by grades.date DESC);
See "The unbearable slowness of IN":
http://www.artfulsoftware.com/infotree/queries.php#568
Super messy, but: (thanks for everyone's help)
SELECT *
FROM grades
LEFT JOIN assignments ON grades.assignmentID = assignments.assignmentID
RIGHT JOIN (
SELECT g.gradeID
FROM assignments a
LEFT JOIN grades g
USING ( assignmentID )
WHERE a.classID = '7815'
AND (
a.type =30170
)
AND g.contactID =7141
AND g.points
REGEXP '^[-]?[0-9]+[-]?'
AND g.points != '-'
AND g.points != ''
AND (
g.pointsposs IS NULL
OR g.pointsposs = ''
)
AND g.date <= '1255503600'
AND g.date >= '984902400'
GROUP BY assignmentID
ORDER BY g.date DESC
) AS t1 ON t1.gradeID = grades.gradeID
Suppose you use a Real Database (ie, any database except MySQL, but I'll use Postgres as an example) to do this query :
SELECT * FROM ta WHERE aid IN (SELECT subquery)
a Real Database would look at the subquery and estimate its rowcount :
If the rowcount is small (say, less than a few millions)
It would run the subquery, then build an in-memory hash of ids, which also makes them unique, which is a feature of IN().
Then, if the number of rows pulled from ta is a small part of ta, it would use a suitable index to pull the rows. Or, if a major part of the table is selected, it would just scan it entirely, and lookup each id in the hash, which is very fast.
If however the subquery rowcount is quite large
The database would probably rewrite it as a merge JOIN, adding a Sort+Unique to the subquery.
However, you are using MySQL. In this case, it will not do any of this (it is gonna re-execute the subquery for each row of your table) so it will take 1000 years. Sorry.
If your subquery performs fine when it is executed separately, then try using a JOIN rather than IN, like this:
select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t
from assignments left join grades using (assignmentID)
join
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type
where assignments.classID = '7815'
and (assignments.type = 30170 )
and grades.contactID = 7141
and grades.points REGEXP '^[-]?[0-9]+[-]?'
and grades.points != '-'
and grades.points != ''
and (grades.pointsposs IS NULL or grades.pointsposs = '')
and grades.date <= '1255503600'
AND grades.date >= '984902400'
group by assignmentID
order by grades.date DESC) as t1 using (gradeID);
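The IN-to-JOIN rewrite can be checked on a toy table. The sketch below uses Python's built-in sqlite3 module with a hypothetical three-row grades table and a single made-up filter. The two forms agree here because the subquery returns unique gradeID values; if it could return duplicates, the JOIN form would multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced grades table.
    CREATE TABLE grades (gradeID INTEGER PRIMARY KEY, contactID INTEGER, points INTEGER);
    INSERT INTO grades VALUES (1, 7141, 10), (2, 7141, 20), (3, 9999, 30);
""")

# Original shape: filter the outer rows with IN (subquery).
with_in = conn.execute("""
    SELECT SUM(points) FROM grades
    WHERE gradeID IN (SELECT gradeID FROM grades WHERE contactID = 7141)
""").fetchone()[0]

# Rewritten shape: join against the subquery as an aliased derived table.
with_join = conn.execute("""
    SELECT SUM(g.points) FROM grades g
    JOIN (SELECT gradeID FROM grades WHERE contactID = 7141) AS t1
      USING (gradeID)
""").fetchone()[0]

print(with_in, with_join)  # 30 30
```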
There really isn't enough information to answer your question, and you've put a ... in the middle of the where clause which is weird. How big are the tables involved and what are the indexes?
Having said that, if there are too many terms in an in clause, you can see seriously degraded performance. Replace the use of in with a right join.
For starters, the table as_types in the in clause is not used. Left joining it serves no purpose so get rid of it.
That leaves the IN clause using only the assignments and grades tables from the outer query. Clearly the conditions that filter assignments belong in the WHERE clause of the outer query. You should move all of the conditions on grades into the ON clause of the LEFT JOIN to grades.
The query is a little tough to follow, but I suspect that the subquery isn't necessary at all.
It seems like your query is basically thus:
SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE gradeID IN
(
SELECT grades.gradeID
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE your_conditions = TRUE
);
But, you're not doing anything really fancy in the where clause in the subquery.
I suspect something more like
SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE your_conditions_with_some_tweaks = TRUE
GROUP BY groupings;
would work just as well.
If I'm missing some key logic here please comment back and I'll edit/delete this post.