MySQL: Select on GROUP BY only one row with certain criteria


I have a table of documents in which each document has a doc_id, but for the same case_id on the same date there may be two different language versions:
doc_id      case_id   date        lang
001-89259   1012/02   2008-11-04  FRA
001-144945  10122/04  2014-06-19  ENG
001-57558   10126/82  1988-06-21  ENG
001-62116   10126/82  1988-06-21  FRA
001-91708   10129/04  2009-03-10  FRA
001-116955  10131/11  2013-03-07  FRA
001-102676  10143/07  2011-01-11  FRA
001-104520  10145/07  2011-04-12  FRA
001-72756   10162/02  2006-03-09  FRA
001-72757   10162/02  2006-03-09  ENG
001-82198   10163/02  2007-09-06  ENG
001-57555   10208/82  1988-05-26  ENG
001-62113   10208/82  1988-05-26  FRA
What I want is to select the English version, if available, for each (case_id, date) pair, and otherwise keep the French one. My output would then look like:
doc_id      case_id   date        lang
001-89259   1012/02   2008-11-04  FRA
001-144945  10122/04  2014-06-19  ENG
001-57558   10126/82  1988-06-21  ENG  -- keep only the English version
001-91708   10129/04  2009-03-10  FRA
001-116955  10131/11  2013-03-07  FRA
001-102676  10143/07  2011-01-11  FRA
001-104520  10145/07  2011-04-12  FRA
001-72757   10162/02  2006-03-09  ENG  -- keep only the English version
001-82198   10163/02  2007-09-06  ENG
001-57555   10208/82  1988-05-26  ENG  -- keep only the English version
How can I do this in MySQL?
UPDATE:
At first all answers seemed to give the correct result, and I accepted Görkem's because, in my opinion, it was the most elegant and it was the most straightforward to see why it works.
However, it returned one wrong result, which Strawberry pointed out. That leaves Strawberry's answer as the most elegant correct one.

SELECT DISTINCT COALESCE(e.doc_id, f.doc_id) doc_id
     , f.case_id
     , f.date
     , COALESCE(e.lang, f.lang) lang
  FROM my_table f
  LEFT
  JOIN my_table e
    ON e.case_id = f.case_id
   AND e.date = f.date
   AND e.lang = 'ENG';
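To check the self-join above against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption made so the example is self-contained), and the ORDER BY is added just to make the output order deterministic.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("001-89259", "1012/02", "2008-11-04", "FRA"),
    ("001-144945", "10122/04", "2014-06-19", "ENG"),
    ("001-57558", "10126/82", "1988-06-21", "ENG"),
    ("001-62116", "10126/82", "1988-06-21", "FRA"),
    ("001-91708", "10129/04", "2009-03-10", "FRA"),
    ("001-116955", "10131/11", "2013-03-07", "FRA"),
    ("001-102676", "10143/07", "2011-01-11", "FRA"),
    ("001-104520", "10145/07", "2011-04-12", "FRA"),
    ("001-72756", "10162/02", "2006-03-09", "FRA"),
    ("001-72757", "10162/02", "2006-03-09", "ENG"),
    ("001-82198", "10163/02", "2007-09-06", "ENG"),
    ("001-57555", "10208/82", "1988-05-26", "ENG"),
    ("001-62113", "10208/82", "1988-05-26", "FRA"),
]

QUERY = """
SELECT DISTINCT COALESCE(e.doc_id, f.doc_id) AS doc_id,
       f.case_id,
       f.date,
       COALESCE(e.lang, f.lang) AS lang
FROM my_table f
LEFT JOIN my_table e
       ON e.case_id = f.case_id
      AND e.date = f.date
      AND e.lang = 'ENG'
ORDER BY f.case_id, f.date  -- added only for a stable output order
"""


def pick_preferred_language(rows):
    """Return one row per (case_id, date), preferring the ENG version."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (doc_id TEXT, case_id TEXT, date TEXT, lang TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


RESULT = pick_preferred_language(ROWS)
for row in RESULT:
    print(*row)
```

Every row of f is paired with the English row of the same (case_id, date) when one exists, so a French row and its English sibling both map to the same English output row, and DISTINCT collapses the pair.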

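-- Note: this relies on MySQL's non-standard GROUP BY handling (it is rejected
-- under ONLY_FULL_GROUP_BY), and the ORDER BY inside the derived table is not
-- guaranteed to decide which row each group returns. It also groups by
-- case_id only, not by (case_id, date).
SELECT
    sorted.doc_id,
    sorted.case_id,
    sorted.date,
    sorted.lang
FROM (
    SELECT
        doc_id,
        case_id,
        date,
        lang
    FROM tbl
    ORDER BY FIELD(lang, 'ENG', 'FRA')
) sorted
GROUP BY sorted.case_id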

If this SQL is only needed for some research, there is a way to get the expected result set:
SELECT SUBSTRING_INDEX(GROUP_CONCAT(doc_id ORDER BY lang), ',', 1) doc_id,
       case_id,
       date,
       SUBSTRING_INDEX(GROUP_CONCAT(lang ORDER BY lang), ',', 1) lang
FROM my_table
GROUP BY case_id, date

SELECT
    doc_id,
    case_id,
    date,
    lang,
    MAX(CASE lang WHEN 'ENG' THEN 1 ELSE 0 END)
FROM tbl
GROUP BY case_id

Related

does count automatically sum up similar values without a group by statement

THIS IS THE INPUT
team_1 team_2 winner
Aus India India
Eng NZ NZ
India SL India
SA Eng Eng
SL Aus Aus
OUTPUT
team_name matches_played no_of_wins
India 2 2
SL 2 NULL
SA 1 NULL
Eng 2 1
Aus 2 1
NZ 1 1
This is my MySQL solution to the problem:
WITH CTE AS (
    SELECT team_1 team_name, winner FROM icc_world_cup
    UNION ALL
    SELECT team_2 team_name, winner FROM icc_world_cup
)
SELECT DISTINCT team_name,                    # first column
       COUNT(team_name) AS matches_played,    # second column
       (SELECT COUNT(*)
        FROM (SELECT IF(team_1 = winner, team_1, team_2) win_team FROM icc_world_cup) a
        WHERE team_name = win_team
        GROUP BY win_team) no_of_wins         # third column
FROM CTE
GROUP BY team_name
The output above is what I got from this code. The problem is this:
if I remove the GROUP BY clause from the subquery in the third column, that is,
GROUP BY win_team
then the output becomes:
team_name matches_played no_of_wins
India 2 2
SL 2 0
SA 1 0
Eng 2 1
Aus 2 1
NZ 1 1
How is COUNT able to sum up team India's two wins without a GROUP BY clause? Does it have something to do with the WHERE clause condition?
Also NOTICE that the NULL values in the third column were replaced by 0.
How is it possible that, without a GROUP BY clause, my COUNT function still sums up similar values, and why do the NULLs change to 0?
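The difference asked about here can be reproduced in isolation. An aggregate query without GROUP BY always returns exactly one row, even when the WHERE clause matches nothing, so COUNT(*) yields 0. With GROUP BY, an empty input produces no groups and therefore no rows, and a scalar subquery that returns no rows evaluates to NULL. The sketch below uses Python's sqlite3 as a stand-in for MySQL; the table name wins is hypothetical and holds only the winners from the question's input.

```python
import sqlite3

# Winners taken from the question's input table.
WINNERS = ["India", "NZ", "India", "Eng", "Aus"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wins (win_team TEXT)")  # hypothetical helper table
conn.executemany("INSERT INTO wins VALUES (?)", [(w,) for w in WINNERS])


def build_sql(grouped):
    """Build the COUNT query with or without the GROUP BY clause."""
    sql = "SELECT COUNT(*) FROM wins WHERE win_team = ?"
    if grouped:
        sql += " GROUP BY win_team"
    return sql


def count_rows(team, grouped):
    """Return every row the COUNT query produces for one team."""
    return conn.execute(build_sql(grouped), (team,)).fetchall()


def scalar(team, grouped):
    """Use the same query as a scalar subquery, like the third column."""
    outer = "SELECT (" + build_sql(grouped) + ")"
    return conn.execute(outer, (team,)).fetchone()[0]


# The WHERE clause already restricts the rows to a single team, so COUNT(*)
# counts that team's wins whether or not GROUP BY is present.
INDIA_PLAIN = count_rows("India", grouped=False)
INDIA_GROUPED = count_rows("India", grouped=True)

# SL never won: one row containing 0 without GROUP BY, no rows at all with it.
SL_PLAIN = count_rows("SL", grouped=False)
SL_GROUPED = count_rows("SL", grouped=True)

print(INDIA_PLAIN, INDIA_GROUPED, SL_PLAIN, SL_GROUPED)
print(scalar("SL", grouped=False), scalar("SL", grouped=True))
```

So it is the correlated WHERE condition, not GROUP BY, that makes the subquery count per team; GROUP BY only changes what happens for teams with no wins.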

I would use a union approach here:
SELECT team_name, COUNT(*) AS matches_played, SUM(win) AS no_of_wins
FROM
(
    SELECT team_1 AS team_name, IF(team_1 = winner, 1, 0) AS win FROM yourTable
    UNION ALL
    SELECT team_2, IF(team_2 = winner, 1, 0) FROM yourTable
) t
GROUP BY team_name;
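As a quick check of the union approach on the question's data, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. Two things differ from the answer and are assumptions of this sketch: IF() is written as a CASE expression so the query also runs on SQLite, and an ORDER BY is added to make the output order deterministic.

```python
import sqlite3

# Input rows copied from the question: (team_1, team_2, winner).
MATCHES = [
    ("Aus", "India", "India"),
    ("Eng", "NZ", "NZ"),
    ("India", "SL", "India"),
    ("SA", "Eng", "Eng"),
    ("SL", "Aus", "Aus"),
]

QUERY = """
SELECT team_name, COUNT(*) AS matches_played, SUM(win) AS no_of_wins
FROM (
    SELECT team_1 AS team_name,
           CASE WHEN team_1 = winner THEN 1 ELSE 0 END AS win
    FROM icc_world_cup
    UNION ALL
    SELECT team_2,
           CASE WHEN team_2 = winner THEN 1 ELSE 0 END
    FROM icc_world_cup
) t
GROUP BY team_name
ORDER BY team_name  -- added only for a stable output order
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icc_world_cup (team_1 TEXT, team_2 TEXT, winner TEXT)")
conn.executemany("INSERT INTO icc_world_cup VALUES (?, ?, ?)", MATCHES)

# One row per team: (team_name, matches_played, no_of_wins).
STANDINGS = conn.execute(QUERY).fetchall()
for team in STANDINGS:
    print(*team)
```

Each match contributes one row per participating team, so a single GROUP BY gives both counts, and teams without a win get 0 rather than NULL.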

mysql order by multiple columns with if

SELECT s.name, s.mark, g.grade
FROM students s, grades g
WHERE g.grade = (SELECT grade FROM grades WHERE s.mark >= min_mark AND s.mark <= max_mark)
ORDER BY IF(g.grade = 'F' OR g.grade = 'E' OR g.grade = 'D', (g.grade, s.mark), g.grade)
This is the MySQL syntax I am trying, but I can't get it to work.
The SELECT works as intended, but I want to order the grades from A to F and, within the same grade, order the marks descending for A-C and ascending for D-F.
I hope it's clear what I want:
name grade mark
Ewan Black A 100
Ryan Richards B 90
Drake Porter C 78
Jamie Miller C 76
NULL D 67
NULL F 43
NULL F 54

As @Vatev noted, you can use a conditional expression to change the second value being sorted. I would recommend using a CASE expression, as it's more compliant with SQL standards. I would also recommend the standard JOIN syntax rather than the old-style (20-year-plus) joins. And you don't need a sub-query. So something like this:
select
    students.name
    ,grades.grade
    ,students.mark
from students
inner join grades on
    students.mark between grades.min_mark and grades.max_mark
order by
    grades.grade
    ,case
        when grades.grade in ('D', 'E', 'F')
        then students.mark
        else
        100 - students.mark
    end
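The CASE-based ordering above can be tried out with a small sketch in Python's sqlite3 (a stand-in for MySQL). The marks come from the question's sample output; the grade boundaries in GRADES are hypothetical, chosen only so that each mark lands in the grade shown there, and the unnamed students are stored with NULL names as in the sample.

```python
import sqlite3

# (name, mark) pairs from the question's sample, inserted in scrambled order.
STUDENTS = [
    (None, 54),
    ("Jamie Miller", 76),
    ("Ewan Black", 100),
    (None, 67),
    ("Drake Porter", 78),
    (None, 43),
    ("Ryan Richards", 90),
]

# Hypothetical grade boundaries: (grade, min_mark, max_mark).
GRADES = [
    ("A", 95, 100),
    ("B", 85, 94),
    ("C", 75, 84),
    ("D", 65, 74),
    ("E", 60, 64),
    ("F", 0, 59),
]

QUERY = """
SELECT students.name, grades.grade, students.mark
FROM students
INNER JOIN grades
        ON students.mark BETWEEN grades.min_mark AND grades.max_mark
ORDER BY grades.grade,
         CASE WHEN grades.grade IN ('D', 'E', 'F')
              THEN students.mark        -- ascending for D-F
              ELSE 100 - students.mark  -- descending for A-C (marks <= 100)
         END
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, mark INTEGER)")
conn.execute("CREATE TABLE grades (grade TEXT, min_mark INTEGER, max_mark INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", STUDENTS)
conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", GRADES)

ORDERED = conn.execute(QUERY).fetchall()
for row in ORDERED:
    print(*row)
```

Note that 100 - mark only reverses the order correctly while no mark exceeds 100; negating the mark would avoid relying on that upper bound.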

Conditional ordering for Latin and English Names in SQLite

I have a table 'Employee' with the columns DevId, Id, FName and FNamePinYin.
FName holds both Chinese and English contact names. So far, as required, I have managed to get the contacts in the order below:
FName FNamePinYin
爱华 杨 AIHUA YANG
安国华 ANGUOHUA
Anguohua ANGUOHUA
Aihua Yang AIHUA YANG
爸 BA
波 小 BO BEI BI XIAO
毕慧 BIHUI
Bin Guo BIN GUO
Bihui BIHUI
Ba BA
using the query below:
Select FName, SortString
from Employee
where Id in (SELECT Id
             FROM EMP1
             WHERE '1' = DevId
             ORDER BY FnamePinYin
             LIMIT 500 OFFSET 0)
ORDER BY substr(FnamePinYin,1,1) , Lower(FName) DESC
The problem now is that the contact names are not sorted in ascending order.
Note: Lower(FName) DESC is needed here so that Chinese names are displayed first within each letter's category.
My desired output:
FName FNamePinYin
爱华 杨 AIHUA YANG
安国华 ANGUOHUA
Aihua Yang AIHUA YANG
Anguohua ANGUOHUA
爸 BA
波 小 BO BEI BI XIAO
毕慧 BIHUI
Ba BA
Bihui BIHUI
Bin Guo BIN GUO
FNamePinYin is the English equivalent of the Chinese names.
Can anyone help me get the result I want?

Your question is still not entirely clear, but based on your statement that "both Chinese and English Name should be in ascending order" "in each category (A-Z)", the following should do what I think you want:
select *
from Employee
where Id in (SELECT Id
             FROM PBAPL1
             WHERE '1' = DevId
             ORDER BY FnamePinYin
             LIMIT 500 OFFSET 0)
order by substr(FNamePinYin,1,1), (substr(FName,1,1) < 'zz'), FName;
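The three sort keys in that ORDER BY can be tried out with Python's sqlite3 on the names from the question. This is only a sketch: the inner subquery with LIMIT is left out, and the table is reduced to the two columns that matter. The comparison substr(FName,1,1) < 'zz' evaluates to 1 for names starting with a Latin letter and to 0 for names starting with a Chinese character, so Chinese names sort first within each initial.

```python
import sqlite3

# (FName, FNamePinYin) pairs copied from the question.
CONTACTS = [
    ("爱华 杨", "AIHUA YANG"),
    ("安国华", "ANGUOHUA"),
    ("Anguohua", "ANGUOHUA"),
    ("Aihua Yang", "AIHUA YANG"),
    ("爸", "BA"),
    ("波 小", "BO BEI BI XIAO"),
    ("毕慧", "BIHUI"),
    ("Bin Guo", "BIN GUO"),
    ("Bihui", "BIHUI"),
    ("Ba", "BA"),
]

QUERY = """
SELECT FName, FNamePinYin
FROM Employee
ORDER BY substr(FNamePinYin, 1, 1),      -- 1. initial letter (A-Z)
         (substr(FName, 1, 1) < 'zz'),   -- 2. 0 = Chinese first, 1 = Latin
         FName                           -- 3. ascending within each block
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (FName TEXT, FNamePinYin TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", CONTACTS)

ORDERED = conn.execute(QUERY).fetchall()
for fname, pinyin in ORDERED:
    print(fname, pinyin)
```

One caveat: with SQLite's default BINARY collation the Chinese names in the third key are compared by their stored bytes, which is not necessarily pinyin order. If pinyin order is wanted within the Chinese block, using FNamePinYin as the last sort key instead of FName may be closer to the intent, although that is a guess about the requirement.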

Index performance on multiple tables

I'm having trouble making my query fast enough for production.
The query currently takes 12 seconds to return the result set, and it crashes the production server, which is resource-restricted.
I need to get all the enregistrement records that are the latest ones for the given periode (a date stored as YYYYMM).
After getting these records, I want to sum the field named in I.sum_field into a total column.
When I comment out the CASE part, the query takes approximately 5 seconds (+/- 500 ms).
Here is the query:
SELECT
    I.libelle,
    E1.periode,
    E1.created_at,
    CASE WHEN I.sum_field = 'fat' THEN SUM(E1.Fat)
         WHEN I.sum_field = 'etp' THEN SUM(E1.Etp)
         WHEN I.sum_field = 'nb_ident' THEN COUNT(*)
         WHEN I.sum_field = 'cdi_actif' THEN SUM(E1.cdi_actif)
    END AS total
FROM
    indicateur_motif IM
    INNER JOIN indicateur I
        ON IM.indicateur_id = I.id
    INNER JOIN `position` P
        ON IM.motif_id = P.id
    INNER JOIN enregistrement E1
        ON P.id = E1.position_id
    INNER JOIN
        ( SELECT
              MAX(id) AS id,
              MAX(created_at) AS created_at
          FROM
              enregistrement
          WHERE
              (etat_mouvement_id IN (1,3,4))
              AND (periode >= '201410' AND periode <= '201512')
              AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
          GROUP BY
              salarie_id,
              periode ) E2
        ON E1.id = E2.id
        AND E1.created_at = E2.created_at
WHERE
    I.formule_id = 1
GROUP BY
    I.id,
    E1.periode
ORDER BY
    I.position,
    E1.periode
Here is the EXPLAIN result:
id  select_type  table           type    possible_keys                                   key                                             key_len  ref                 rows    Extra
--  -----------  --------------  ------  ----------------------------------------------  ----------------------------------------------  -------  ------------------  ------  ----------------------------------------------------
1   PRIMARY      I               ALL     PRIMARY                                         (NULL)                                          (NULL)   (NULL)              21      Using where; Using temporary; Using filesort
1   PRIMARY      IM              ref     indicateur_motif_indicateur_id_motif_id_unique  indicateur_motif_indicateur_id_motif_id_unique  4        orhase.I.id         2       Using index
1   PRIMARY      P               eq_ref  PRIMARY                                         PRIMARY                                         4        orhase.IM.motif_id  1       Using index
1   PRIMARY      <derived2>      ALL     (NULL)                                          (NULL)                                          (NULL)   (NULL)              165352  Using where; Using join buffer (Block Nested Loop)
1   PRIMARY      e1              eq_ref  PRIMARY                                         PRIMARY                                         4        e2.id               1       Using where
2   DERIVED      enregistrement  index   sp                                              sp                                              771      (NULL)              165352  Using where
Here is a sample of the result set:
libelle periode created_at total
------------------------------------------ ------- ------------------- ---------
CDI actifs fin de période 201410 2014-10-01 00:00:00 4689
CDI actifs fin de période 201411 2015-01-29 08:12:03 4674
CDI actifs fin de période 201412 2015-01-29 08:12:03 4660
CDI actifs fin de période 201501 2015-01-29 08:12:04 4444
CDI actifs fin de période 201502 2015-01-29 08:12:04 4222
CDI actifs fin de période 201503 2015-01-29 08:12:04 4195
CDI actifs fin de période 201504 2015-01-29 08:12:04 4176
CDI actifs fin de période 201505 2015-01-29 08:12:04 4155
CDI actifs fin de période 201506 2015-01-29 08:12:04 4136
CDI actifs fin de période 201507 2015-01-29 08:12:04 4121
CDI actifs fin de période 201508 2015-01-29 08:12:04 4080
CDI actifs fin de période 201509 2015-01-29 08:12:04 4061
CDI actifs fin de période 201510 2015-01-29 08:12:04 4036
CDI actifs fin de période 201511 2015-01-29 08:12:04 4001
CDI actifs fin de période 201512 2015-01-29 08:12:04 3976
ETP fin de période CDI stock 201410 2014-10-01 00:00:00 4259.16
ETP fin de période CDI stock 201411 2015-01-29 08:12:03 4241.91
ETP fin de période CDI stock 201412 2015-01-29 08:12:03 4222.12
ETP fin de période CDI stock 201501 2015-01-29 08:12:04 4028.07
I just have no idea where to add a new index to bring this execution time down. I have already added one on enregistrement, called sp:
ALTER TABLE enregistrement ADD INDEX sp(salarie_id, periode);
This one brought the execution time down from 16 s to 12 s.
Any ideas?
Thanks.

I don't know if this will help, but what is your CASE doing? You are summing totally different fields, and counting another, into a single "total". I suspect you might actually want these as their own columns.
That being said, what do you have for indexes? Your EXPLAIN shows some, but I would try to add the following if they are NOT already available:
table             index
indicateur        ( formule_id, id, position )
indicateur_motif  ( indicateur_id, motif_id )
`position`        ( id )
enregistrement    ( position_id, id, created_at )   <-- for the JOIN portion
enregistrement    ( etat_mouvement_id, periode, created_at, salarie_id, id )   <-- for the sub-select query
Also, looking at your joins, you are not really using anything from the `position` table. Yes, you join from motif to position and from position to enregistrement, but since
IM.motif_id = P.id and P.id = E1.position_id
you could jump directly to
IM.motif_id = E1.position_id
and remove the `position` table from the query. Here is a slightly revised version of the query you started with. I removed the position reference, and I also changed the GROUP BY of the inner query so that it might perform better by matching the available index on the columns periode and salarie_id.
SELECT
    I.libelle,
    E1.periode,
    E1.created_at,
    CASE WHEN I.sum_field = 'fat' THEN SUM(E1.Fat)
         WHEN I.sum_field = 'etp' THEN SUM(E1.Etp)
         WHEN I.sum_field = 'nb_ident' THEN COUNT(*)
         WHEN I.sum_field = 'cdi_actif' THEN SUM(E1.cdi_actif)
    END AS total
FROM
    indicateur I
    JOIN indicateur_motif IM
        ON I.id = IM.indicateur_id
    INNER JOIN enregistrement E1
        ON IM.motif_id = E1.position_id
    INNER JOIN
        ( SELECT
              MAX(id) AS id,
              MAX(created_at) AS created_at
          FROM
              enregistrement
          WHERE
              etat_mouvement_id IN (1,3,4)
              AND periode >= '201410'
              AND periode <= '201512'
              AND created_at <= '2015-02-03'
          GROUP BY
              periode,
              salarie_id ) E2
        ON E1.id = E2.id
        AND E1.created_at = E2.created_at
WHERE
    I.formule_id = 1
GROUP BY
    I.id,
    E1.periode
ORDER BY
    I.position,
    E1.periode

I don't know what your tables look like, but this query:
SELECT MAX(id) AS id, MAX(created_at) AS created_at
FROM enregistrement
WHERE (etat_mouvement_id IN (1,3,4))
  AND (periode >= '201410' AND periode <= '201512')
  AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
GROUP BY salarie_id, periode
is very expensive. If you want to try to fix this solely through indexes, adding indexes on the id and created_at columns might be a good start. The other suggestion I would make is to run this query in a separate transaction and insert the results into a temp table. That should at least free up some of the required resources by turning it into a simple join rather than a very complex search operation in the middle of your query. If that doesn't work, you could also try running all of the selects and joins without the sums, inserting those results into a temp table, and then selecting and summing the results from there.
That said, without seeing your tables, the number of rows in each, the data in each column, what kind of hardware you're running, or having any idea what your prod environment looks like in terms of usage, it's really hard to say exactly where the problem might be. I'm pretty sure there is no built-in function for this in MySQL yet, but profiling the query with something like Jet Profiler might be worthwhile if this is business critical. Seeing exactly where the resource pressure is coming from would be the first thing I would want to do if I were writing a query that is crashing production servers.

Your slowness is coming from your sub-select on enregistrement. Both references to it seem to be table scanning what looks like all the records. The IN is also not helping.
Try creating indexes on the following table fields and let me know:
enregistrement.etat_mouvement_id
enregistrement.periode
enregistrement.created_at

Here it is. I reduced the execution time from 12 s to 6.8 s with this query:
SELECT I.libelle, e1.periode,
    CASE WHEN I.sum_field = 'fat' THEN SUM(E1.Fat)
         WHEN I.sum_field = 'etp' THEN SUM(E1.Etp)
         WHEN I.sum_field = 'nb_ident' THEN COUNT(*)
         WHEN I.sum_field = 'cdi_actif' THEN SUM(E1.cdi_actif) END AS 'total'
FROM indicateur_motif IM
INNER JOIN indicateur I ON IM.indicateur_id = I.id
INNER JOIN enregistrement e1 ON IM.motif_id = e1.position_id
INNER JOIN
(
    SELECT MAX(created_at) AS createdat, salarie_id, periode
    FROM enregistrement
    WHERE (etat_mouvement_id IN (1,3,4))
      AND (periode >= '201410' AND periode <= '201512')
      AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
    GROUP BY salarie_id, periode
) e2 ON (e1.created_at = e2.createdat AND e1.salarie_id = e2.salarie_id AND e1.periode = e2.periode)
WHERE I.formule_id = 1
GROUP BY I.id, e1.periode
ORDER BY I.position, e1.periode
Just for information, this subquery:
SELECT MAX(created_at) AS createdat, salarie_id, periode
FROM enregistrement
WHERE (etat_mouvement_id IN (1,3,4))
  AND (periode >= '201410' AND periode <= '201512')
  AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
GROUP BY salarie_id, periode
takes only 0.003 s to execute, thanks to my sp index:
ALTER TABLE enregistrement ADD INDEX sp(salarie_id, periode);
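The "latest record per (salarie_id, periode)" join used in the revised query can be illustrated on a tiny data set. The sketch below uses Python's sqlite3 as a stand-in for MySQL; the table is reduced to a hypothetical subset of columns, and all rows and values are invented for illustration.

```python
import sqlite3

# Hypothetical rows: (id, salarie_id, periode, created_at, cdi_actif).
RECORDS = [
    (1, 10, "201410", "2014-10-01 00:00:00", 1),
    (2, 10, "201410", "2015-01-29 08:12:03", 0),  # later revision of row 1
    (3, 11, "201410", "2014-10-01 00:00:00", 1),
    (4, 10, "201411", "2015-01-29 08:12:03", 1),
    (5, 11, "201411", "2015-03-01 00:00:00", 0),  # created after the cutoff
    (6, 11, "201411", "2015-01-29 08:12:03", 1),
]

QUERY = """
SELECT e1.periode, SUM(e1.cdi_actif) AS total
FROM enregistrement e1
INNER JOIN (
    -- latest created_at per employee and period, up to the cutoff date
    SELECT salarie_id, periode, MAX(created_at) AS createdat
    FROM enregistrement
    WHERE created_at <= '2015-02-03 00:00:00'
    GROUP BY salarie_id, periode
) e2 ON e1.salarie_id = e2.salarie_id
    AND e1.periode = e2.periode
    AND e1.created_at = e2.createdat
GROUP BY e1.periode
ORDER BY e1.periode
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enregistrement ("
    "id INTEGER PRIMARY KEY, salarie_id INTEGER, periode TEXT, "
    "created_at TEXT, cdi_actif INTEGER)"
)
# Same column order as the sp index from the post (SQLite syntax).
conn.execute("CREATE INDEX sp ON enregistrement (salarie_id, periode)")
conn.executemany("INSERT INTO enregistrement VALUES (?, ?, ?, ?, ?)", RECORDS)

TOTALS = conn.execute(QUERY).fetchall()
print(TOTALS)
```

Joining on (salarie_id, periode, created_at) ties the outer row to one specific group. Selecting MAX(id) and MAX(created_at) independently, as in the original query, can combine values from two different rows of the same group.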
@DRapp: You were right about my joins; I removed position from them and corrected the query. As for the total field, I do want the values in a single column, so that I don't have to add conditions to my code logic.
I tried @DRapp's indexes and proposed query; they either slowed my query down or changed nothing.
id  select_type  table           type    possible_keys                                   key                                             key_len  ref                                rows    Extra
--  -----------  --------------  ------  ----------------------------------------------  ----------------------------------------------  -------  ---------------------------------  ------  ----------------------------------------------------
1   PRIMARY      <derived2>      ALL     (NULL)                                          (NULL)                                          (NULL)   (NULL)                             165352  Using temporary; Using filesort
1   PRIMARY      e1              ref     sp                                              sp                                              771      e2.salarie_id,e2.periode           1       Using where
1   PRIMARY      I               ALL     PRIMARY                                         (NULL)                                          (NULL)   (NULL)                             21      Using where; Using join buffer (Block Nested Loop)
1   PRIMARY      IM              eq_ref  indicateur_motif_indicateur_id_motif_id_unique  indicateur_motif_indicateur_id_motif_id_unique  8        orhase.I.id,orhase.e1.position_id  1       Using index
2   DERIVED      enregistrement  index   sp                                              sp                                              771      (NULL)                             165352  Using where
With this EXPLAIN result, I want to resolve the first line, which shows Using temporary; Using filesort. The solution would be to index the GROUP BY columns, but I don't know whether it's possible to create a composite index on these two fields, because they come from different tables. What would be a better or alternative solution?
Thanks, all, for your answers :)

How to write this SQL statement?

Here is the table:
User
Name: Subject:
Peter Math
Mary Chinese
Mary Computer
Mary Hist
Mary PE
Mary English
Peter Art
Chris English
Chris Computer
Peter Computer
Paul Math
I would like to find the name that appears most often and return the top 4 rows for it, ordered by subject. For example, in this case the most frequent name is Mary, and ordered by subject her rows start with Chinese, Computer, English, so I would like to have this result:
Mary Chinese
Mary Computer
Mary English
Mary Hist
If Mary does not have enough rows to fill the result, the second most frequent person follows. For example, let's say the table looks like this:
Name: Subject:
Peter Math
Mary Chinese
Mary Computer
Mary Hist
Peter Art
Chris English
Chris Computer
Peter Computer
Paul Math
The result will be:
Mary Chinese
Mary Computer
Mary Hist
Peter Art
Mary appears most often, so her rows are returned first; but she does not have enough rows to fill the 4 positions, so the second most frequent name takes the remaining place, which in this case is Peter.

SELECT user.name, user.subject
FROM user
INNER JOIN (
    SELECT name, COUNT(1) AS occurrences
    FROM user
    GROUP BY name
) AS user_occurrences
    ON user.name = user_occurrences.name
ORDER BY user_occurrences.occurrences DESC, user.name ASC, user.subject ASC
LIMIT 4
Edit: This might perform better, depending on the RDBMS you're using and the size of the dataset. Try both and compare.
SELECT user.name, user.subject
FROM user
INNER JOIN user AS user_occurrences
    ON user.name = user_occurrences.name
GROUP BY user.name, user.subject -- grouping by name alone would return only one row per name
ORDER BY COUNT(user_occurrences.subject) DESC, user.name ASC, user.subject ASC
LIMIT 4
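To check the first query above against both tables from the question, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. The helper name top_four is hypothetical; the rows are copied from the question.

```python
import sqlite3

# First table from the question: Mary has five subjects.
FIRST = [
    ("Peter", "Math"), ("Mary", "Chinese"), ("Mary", "Computer"),
    ("Mary", "Hist"), ("Mary", "PE"), ("Mary", "English"),
    ("Peter", "Art"), ("Chris", "English"), ("Chris", "Computer"),
    ("Peter", "Computer"), ("Paul", "Math"),
]

# Second table: Mary has only three subjects, so Peter fills the last slot.
SECOND = [
    ("Peter", "Math"), ("Mary", "Chinese"), ("Mary", "Computer"),
    ("Mary", "Hist"), ("Peter", "Art"), ("Chris", "English"),
    ("Chris", "Computer"), ("Peter", "Computer"), ("Paul", "Math"),
]

QUERY = """
SELECT user.name, user.subject
FROM user
INNER JOIN (
    SELECT name, COUNT(1) AS occurrences
    FROM user
    GROUP BY name
) AS user_occurrences
    ON user.name = user_occurrences.name
ORDER BY user_occurrences.occurrences DESC, user.name ASC, user.subject ASC
LIMIT 4
"""


def top_four(rows):
    """Load the rows into a fresh in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (name TEXT, subject TEXT)")
    conn.executemany("INSERT INTO user VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


TOP_FIRST = top_four(FIRST)
TOP_SECOND = top_four(SECOND)
print(TOP_FIRST)
print(TOP_SECOND)
```

In the second table Mary and Peter both appear three times; the query breaks that tie by name, which happens to put Mary first as the question expects.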

Select the top 4 after grouping by Name and sorting by count.
MSSQL code:
select top 4 q.marketname, cc.countryname from (
    select top 100 m.MarketName, m.MarketId, COUNT(m.marketname) as [count]
    from Common.Country c inner join Common.Market m on c.MarketId = m.MarketId
    group by m.MarketName, m.MarketId order by COUNT(m.marketname) desc)
q inner join Common.Country cc
on cc.MarketId = q.MarketId order by [Count] desc
You can write similar MySQL code.
Here is the equivalent MySQL code:
select q.name, cc.subject from (
    select m.Name, count(*) as Count
    from user m
    group by m.Name
    order by COUNT(*) desc
    LIMIT 100
) q
inner join user cc on cc.Name = q.name
order by Count desc
LIMIT 4
This is weird: do you want a solution with no effort? Can't you implement the logic in your own technology? You should not downvote without understanding the suggested solution.
