Can we make a DISTINCT of a group_concat(distinct somefield)? - mysql

http://sqlfiddle.com/#!2/37dd94/17
If I do SELECT DISTINCT I get the same results as doing just SELECT.
In the query results, you will see two activities that contain the district "Evora".
Only one should appear.
Any clue?

How about the following query?
SELECT GROUP_CONCAT(APA_T.district), t.name
FROM tbl_activity AS t
JOIN tbl_activity_package AS ap ON t.id = ap.id_activity
JOIN
(
    SELECT DISTINCT apa.district AS district,
    (
        SELECT s1.id_activity_package
        FROM tbl_activity_package_address s1
        WHERE apa.district = s1.district
        ORDER BY s1.id DESC
        LIMIT 1
    ) AS idActivityPackage
    FROM
    tbl_activity_package_address apa
    ORDER BY apa.district
) AS APA_T
ON ap.id = APA_T.idActivityPackage
GROUP BY t.name
ORDER BY APA_T.district;
The above query will eliminate the extra Faro and Evora.

Related

Performance issue with mysql query

I am working with MySQL for the first time and I am having issues with the following query:
SELECT
t.id,
t.name,
t.description,
(
SELECT
GROUP_CONCAT( CONCAT( hs.name, '|', s.rate ) )
FROM
occupation_skill_rate s
INNER JOIN hard_skills hs ON s.hard_skill_id = hs.id
WHERE
s.occupation_id = t.id
ORDER BY
s.rate DESC LIMIT 15
) AS skills,
(
SELECT
GROUP_CONCAT( CONCAT( hs.name, '|', s.rate ) )
FROM
occupation_knowledge_rate s
INNER JOIN knowledge hs ON s.knowledge_id = hs.id
WHERE
s.occupation_id = t.id
ORDER BY
s.rate DESC LIMIT 15
) AS knowledge,
(
SELECT
GROUP_CONCAT( CONCAT( hs.name, '|', s.rate ) )
FROM
occupation_abilities_rate s
INNER JOIN ability hs ON s.ability_id = hs.id
WHERE
s.occupation_id = t.id
ORDER BY
s.rate DESC LIMIT 15
) AS abilities
FROM
occupations t
The occupations table contains 1033 rows and occupation_skill_rate contains 34160 rows, and the query takes more than 1 minute to execute. Please let me know if you need further clarification.
Thanks for your help
Ajai
The occupation_%_rate tables seem to be many-to-many, correct? They need these indexes and no id:
PRIMARY KEY(occupation_id, xxx_id)
INDEX(xxx_id, occupation_id)
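But it seems that the ORDER BY and LIMIT have no effect when used with GROUP_CONCAT() this way: the aggregate collapses each subquery to a single row before they are applied. Please describe what the query's intention is; we may be able to help in rewriting it.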
GROUP_CONCAT allows an ORDER BY clause but not a LIMIT. Can you do without the LIMITs?
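As a rough sketch, the index advice above might look like this for the skills table. The column names come from the question; the column types and the storage engine are assumptions, so adjust them to the real schema.

```sql
-- Sketch only: column types and ENGINE are assumptions, not taken from the question.
CREATE TABLE occupation_skill_rate (
    occupation_id INT UNSIGNED NOT NULL,
    hard_skill_id INT UNSIGNED NOT NULL,
    rate          DECIMAL(5,2) NOT NULL,
    -- Composite primary key replaces a surrogate id on the many-to-many table.
    PRIMARY KEY (occupation_id, hard_skill_id),
    -- Reverse index for lookups that start from the skill side.
    INDEX (hard_skill_id, occupation_id)
) ENGINE=InnoDB;
```

The same shape would apply to occupation_knowledge_rate and occupation_abilities_rate, with knowledge_id and ability_id in place of hard_skill_id.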
Example
Instead of
( SELECT GROUP_CONCAT( CONCAT( hs.name, '|', s.rate ) )
FROM occupation_skill_rate s
INNER JOIN hard_skills hs ON s.hard_skill_id = hs.id
WHERE s.occupation_id = t.id
ORDER BY s.rate DESC
LIMIT 15
) AS skills;
Do
( SELECT GROUP_CONCAT( CONCAT( name, '|', rate ) ORDER BY rate DESC )
FROM (
SELECT hs.name, s.rate
FROM occupation_skill_rate s
INNER JOIN hard_skills hs
ON s.hard_skill_id = hs.id
AND s.occupation_id = t.id
ORDER BY s.rate DESC
LIMIT 15
) AS a
) AS skills
But I suspect t is not visible that deeply nested.
If that is the case, rearrange things thus:
SELECT t.id, t.name, t.description, s3.skills, ...
    FROM occupations AS t
    JOIN (
        SELECT s2.occupation_id,
               GROUP_CONCAT( CONCAT( s2.name, '|', s2.rate ) ORDER BY s2.rate DESC )
                   AS skills
            FROM (
                SELECT hs.name, s1.rate, s1.occupation_id
                    FROM occupation_skill_rate s1
                    INNER JOIN hard_skills hs
                        ON s1.hard_skill_id = hs.id
                    ORDER BY s1.rate DESC
                    LIMIT 15   -- caution: this limits the whole derived table, not each occupation
                 ) AS s2
            GROUP BY s2.occupation_id
         ) AS s3 ON s3.occupation_id = t.id
    JOIN ...
    JOIN ... ;
Another
There is also a way to build the long GROUP_CONCAT, then chop to 15 items by using SUBSTRING_INDEX(...).
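A minimal sketch of that SUBSTRING_INDEX idea, using the table and column names from the question. The sample rows, the column types, and the cut-off of 2 items (instead of 15) are assumptions chosen to keep the demo small.

```sql
-- Hypothetical sample data; column types are assumed.
CREATE TABLE hard_skills (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE occupation_skill_rate (
    occupation_id INT,
    hard_skill_id INT,
    rate INT,
    PRIMARY KEY (occupation_id, hard_skill_id)
);
INSERT INTO hard_skills VALUES (1, 'SQL'), (2, 'Python'), (3, 'Excel');
INSERT INTO occupation_skill_rate VALUES (10, 1, 90), (10, 2, 70), (10, 3, 80);

-- Build the full ordered list per occupation, then keep only the first 2 items.
SELECT s.occupation_id,
       SUBSTRING_INDEX(
           GROUP_CONCAT(CONCAT(hs.name, '|', s.rate) ORDER BY s.rate DESC),
           ',', 2
       ) AS skills
FROM occupation_skill_rate AS s
JOIN hard_skills AS hs ON hs.id = s.hard_skill_id
GROUP BY s.occupation_id;
-- occupation 10 yields: SQL|90,Excel|80
```

Two caveats apply: GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), and a skill name that itself contains a comma would confuse SUBSTRING_INDEX unless a different SEPARATOR is used.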

how to combine 2 cte to get grouping

Hi, I need help combining two CTEs to find students who have 100% attendance but failed the exam.
Here is my first CTE:
with main as(
select ca.STUDENT_ID,
ca.SCHEDULE_ID,
s.COURSE_ID,
co.NAME as course_name,
st.NAME,
count(ca.ID) as total_attendance,
((CHAR_LENGTH(s.COURSE_DAYS) - CHAR_LENGTH(REPLACE(s.COURSE_DAYS , ',', '')) + 1) * 13) as attendance_needed
from univ.course_attendance ca
left join univ.schedule s on ca.SCHEDULE_ID = s.ID
left join univ.student st on ca.SCHEDULE_ID = st.ID
left join univ.course co on ca.SCHEDULE_ID = co.ID
group by ca.STUDENT_ID, ca.SCHEDULE_ID
)
select *,total_attendance/attendance_needed as attendance_percentage
from main
order by 1,2;
Second CTE:
;with inputdata as
(
select es.STUDENT_ID,es.EXAM_ID,es.SCORE,e.PASS_THRESHOLD, s.NAME , c.NAME as Course_name, es.EXAM_DT,
case
when SCORE>=PASS_THRESHOLD then 'PASS'
else 'Fail'
end as Flag
from exam_submission es
left join student s on es.STUDENT_ID = s.ID
left join exam e on es.EXAM_ID = e.ID
left join course c on e.COURSE_ID = c.ID
)
select * from inputdata I
join
( select student_id,exam_id from
inputdata
group by student_id, exam_id
)T on I.student_id=T.student_id and I.exam_id=T.exam_id
order by exam_dt asc
What I need: student name, course name, attendance percentage, and a "failed/pass" flag.
Just chain multiple table expressions in a single WITH clause, introducing names like main_ordered for the first query and inputdata_grouped for the second one. I'm sticking with the original naming, but it could be improved.
with
main as (
select ca.STUDENT_ID,
...
group by ca.STUDENT_ID, ca.SCHEDULE_ID),
main_ordered as (
select *,total_attendance/attendance_needed as attendance_percentage
...
order by 1,2),
inputdata as (
select es.STUDENT_ID,es.EXAM_ID,es.S...
...),
inputdata_grouped as (
select * from inputdata I
...
group by student_id, exam_id...
...
order by exam_dt asc)
select *
from main_ordered join inputdata_grouped on ...
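To make the chaining concrete, here is a small self-contained sketch. The two demo tables and their rows are hypothetical simplifications of the attendance and exam data in the question, and the WITH clause requires MySQL 8.0 or later.

```sql
-- Hypothetical, simplified tables standing in for the question's data.
CREATE TABLE attendance_demo (
    student_id INT PRIMARY KEY,
    total_attendance INT,
    attendance_needed INT
);
CREATE TABLE exam_demo (
    student_id INT PRIMARY KEY,
    score INT,
    pass_threshold INT
);
INSERT INTO attendance_demo VALUES (1, 26, 26), (2, 20, 26), (3, 26, 26);
INSERT INTO exam_demo VALUES (1, 40, 60), (2, 30, 60), (3, 75, 60);

-- Two CTEs in one WITH clause, separated by a comma, then joined.
WITH
attendance AS (
    SELECT student_id,
           total_attendance / attendance_needed AS attendance_percentage
    FROM attendance_demo
),
exam_result AS (
    SELECT student_id,
           CASE WHEN score >= pass_threshold THEN 'PASS' ELSE 'Fail' END AS flag
    FROM exam_demo
)
SELECT a.student_id, a.attendance_percentage, e.flag
FROM attendance AS a
JOIN exam_result AS e ON e.student_id = a.student_id
WHERE a.attendance_percentage = 1
  AND e.flag = 'Fail';
-- only student 1 has full attendance and a failing score
```

An ORDER BY inside a CTE does not guarantee the order of the final result, so any sorting is best done in the outermost SELECT.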

MySql Query Order BY datetime and Group By Id issue

I am having an issue with this query.
There are 3 rows in the comments table with different date-times. I want a customer list with the last comment's created_date / updated_date,
but I am not getting the last comment for each customer when grouping by customer.
SELECT * FROM(
SELECT MAX(comments.`date_updated`), customer.id AS vid, comments.`date_updated` AS dts, comments.`id` AS comments_id, comments.* FROM customer
INNER JOIN comments ON comments.`customer_id` = customer.`id`
WHERE customer.`id` IN ('')
) AS v
GROUP BY v.`vid` LIMIT 0,50
You can use a self join on the comments table and keep, for each customer, only the row with the latest date_updated:
SELECT c.id AS vid, co.`date_updated` AS dts, co.`id` AS comments_id, co.*
FROM customer c
INNER JOIN comments co ON co.`customer_id` = c.`id`
LEFT JOIN comments co1 ON co.`customer_id` = co1.`customer_id` AND co.date_updated < co1.date_updated
WHERE co1.customer_id IS NULL AND c.`id` IN ('')
Or with inner join
SELECT c.id AS vid, co.`date_updated` AS dts, co.`id` AS comments_id, co.*
FROM customer c
INNER JOIN comments co ON co.`customer_id` = c.`id`
INNER JOIN (
SELECT customer_id, MAX(date_updated) date_updated
FROM comments
GROUP BY customer_id
) co1 ON co.customer_id = co1.customer_id AND co.date_updated = co1.date_updated
WHERE c.`id` IN ('')
You forgot the ORDER BY clause (it must come before LIMIT):
SELECT * FROM(
SELECT MAX(comments.`date_updated`), customer.id AS vid, comments.`date_updated` AS dts, comments.`id` AS comments_id, comments.* FROM customer
INNER JOIN comments ON comments.`customer_id` = customer.`id`
WHERE customer.`id` IN ('')
) AS v
GROUP BY v.`vid`
ORDER BY created_date DESC
LIMIT 0,50;

Query returning one "extra" record. Any advice on how to remove it from the query results?

We have the following, quite complex (at least for us) query.
Since, as far as we know, there's no such thing as INTERSECT in MySQL, we are wondering how we can fix this:
(
SELECT GROUP_CONCAT(APA_T.district ORDER BY APA_T.district), t.name
FROM tbl_activity AS t
INNER JOIN tbl_activity_package AS ap ON t.id = ap.id_activity
INNER JOIN (
SELECT DISTINCT apa.district AS district, (
SELECT s1.id_activity_package
FROM tbl_activity_package_address s1
WHERE apa.district = s1.district
ORDER BY s1.id DESC
LIMIT 1
) AS idActivityPackage
FROM
tbl_activity_package_address apa
ORDER BY apa.district
) AS APA_T
ON ap.id = APA_T.idActivityPackage
GROUP BY t.name
ORDER BY APA_T.district
)
UNION DISTINCT
(
SELECT GROUP_CONCAT(DISTINCT apa2.district ORDER BY apa2.district), t2.name
FROM tbl_activity AS t2
INNER JOIN tbl_activity_package AS ap2 ON t2.id = ap2.id_activity
INNER JOIN tbl_activity_package_address AS apa2 ON ap2.id = apa2.id_activity_package
GROUP BY t2.name
ORDER BY apa2.district
)
#LIMIT 6, 6
Here are the results:
GROUP_CONCAT(APA_T.DISTRICT ORDER BY APA_T.DISTRICT) | NAME
Beja,Faro,Setubal                                    | activity-1
Evora                                                | activity-2
Sintra                                               | activity-4
Braga,Sines                                          | activity-5
Santarem                                             | activity-6
Guarda,Matosinhos,Sagres                             | activity-7
Lisboa,Montemor,Porto,Rio de Janeiro                 | activity-8
Beja,Evora,Faro,Setubal                              | activity-1
Faro                                                 | activity-3
Here are the results as we wish they were:
GROUP_CONCAT(APA_T.DISTRICT ORDER BY APA_T.DISTRICT) | NAME
Beja,Faro,Setubal                                    | activity-1
Evora                                                | activity-2
Sintra                                               | activity-4
Braga,Sines                                          | activity-5
Santarem                                             | activity-6
Guarda,Matosinhos,Sagres                             | activity-7
Lisboa,Montemor,Porto,Rio de Janeiro                 | activity-8
Faro                                                 | activity-3
ISSUE
This line should NOT appear. No activity should appear twice.
Beja,Evora,Faro,Setubal activity-1
We understand that the UNION DISTINCT doesn't remove it, because indeed:
Beja, Faro, Setubal IS DIFFERENT THAN Beja,Evora,Faro,Setubal
HOWEVER, we do NOT want Evora to appear in the first result. So the first query of the UNION is OK as it is; it does its job as it should.
Still, that second activity-1 that appears, should be removed.
Any advice on how to solve this?
THE BIG PICTURE
As you can see, this is quite a huge select that will, perhaps, get worse and slower over time.
We wish to have an INFINITE SCROLL of Activities, and the first results of that Infinite Scroll should be from Activities happening in different districts. Why can't we just "order by date" or something, you may ask.
Because if a back-end user inserts the last 20 records all from one single district, the first results of the infinite scroll will show only activities from that district, making it APPEAR that we have nothing outside that district.
So, the point is to LIST ALL the results on a certain (complex) ORDER. :)
Any other, perhaps better way, would be great.
http://sqlfiddle.com/#!2/37dd94/51
Does the query below produce the results you are looking for? I wrapped the union so I could then sort on the name field. If you don't want it that way, you can remove the wrapper or sort on the DistCon field instead.
SELECT * FROM
(
SELECT GROUP_CONCAT(APA_T.district) AS DistCon, t.name
FROM tbl_activity AS t
JOIN tbl_activity_package AS ap ON t.id = ap.id_activity
JOIN
(
SELECT DISTINCT apa.district AS district,
(
SELECT s1.id_activity_package
FROM tbl_activity_package_address s1
WHERE apa.district = s1.district
ORDER BY s1.id DESC
LIMIT 1
) AS idActivityPackage
FROM
tbl_activity_package_address apa
ORDER BY apa.district
) AS APA_T
ON ap.id = APA_T.idActivityPackage
GROUP BY t.name
UNION
SELECT GROUP_CONCAT(apa.district), t.name
FROM tbl_activity AS t
JOIN tbl_activity_package AS ap ON t.id = ap.id_activity
JOIN tbl_activity_package_address AS apa ON ap.id = apa.id_activity_package
WHERE t.name NOT IN
(
SELECT DISTINCT t.name
FROM tbl_activity AS t
JOIN tbl_activity_package AS ap ON t.id = ap.id_activity
JOIN
(
SELECT DISTINCT apa.district AS district,
(
SELECT s1.id_activity_package
FROM tbl_activity_package_address s1
WHERE apa.district = s1.district
ORDER BY s1.id DESC
LIMIT 1
) AS idActivityPackage
FROM
tbl_activity_package_address apa
) AS APA_T
ON ap.id = APA_T.idActivityPackage
)
GROUP BY t.name
) AS Mm
ORDER BY Mm.name
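The core pattern in the query above is to keep every row from a preferred query and add rows from a fallback query only for names the first one missed. It can be sketched on its own; the two tables below are hypothetical stand-ins for the two halves of the UNION, with a few rows copied from the results in the question.

```sql
-- Hypothetical stand-ins for the two halves of the UNION.
CREATE TABLE preferred (name VARCHAR(20) NOT NULL, districts VARCHAR(100));
CREATE TABLE fallback  (name VARCHAR(20) NOT NULL, districts VARCHAR(100));
INSERT INTO preferred VALUES ('activity-1', 'Beja,Faro,Setubal'),
                             ('activity-2', 'Evora');
INSERT INTO fallback  VALUES ('activity-1', 'Beja,Evora,Faro,Setubal'),
                             ('activity-3', 'Faro');

SELECT *
FROM (
    SELECT districts, name FROM preferred
    UNION
    -- Only add fallback rows whose name the preferred query did not produce.
    SELECT districts, name FROM fallback
    WHERE name NOT IN (SELECT name FROM preferred)
) AS combined
ORDER BY name;
-- activity-1 appears once, with the districts from `preferred`
```

One thing to watch: if the NOT IN subquery can return NULL, the filter matches no rows at all, which is why name is declared NOT NULL in this sketch. A NOT EXISTS or LEFT JOIN ... IS NULL form avoids that trap.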
This query provides a slightly different result from that specified above because it employs slightly different rules.
Basically, it says "Give me at least one district for every activity. Where multiple districts offer the same activity, exclude any that are sole providers of another activity."
SELECT x.activity
, GROUP_CONCAT(DISTINCT x.district) districts
FROM
( SELECT a.name activity
, apa.district
FROM tbl_activity a
JOIN tbl_activity_package ap
ON ap.id_activity = a.id
JOIN tbl_activity_package_address apa
ON apa.id_activity_package = ap.id
) x
LEFT
JOIN
( SELECT a.name activity
, apa.district
FROM tbl_activity a
JOIN tbl_activity_package ap
ON ap.id_activity = a.id
JOIN tbl_activity_package_address apa
ON apa.id_activity_package = ap.id
GROUP
BY activity
HAVING COUNT(*) = 1
) y
ON y.district = x.district
AND y.activity <> x.activity
WHERE y.activity IS NULL
GROUP
BY activity;
+------------+--------------------------------------+
| activity | districts |
+------------+--------------------------------------+
| activity-1 | Beja,Setubal |
| activity-2 | Evora |
| activity-3 | Faro |
| activity-4 | Sintra |
| activity-5 | Braga,Sines |
| activity-6 | Santarem |
| activity-7 | Guarda,Sagres,Matosinhos |
| activity-8 | Lisboa,Porto,Rio de Janeiro,Montemor |
+------------+--------------------------------------+

Query for multiple count values

SELECT cm.commenter_id,
cm.comment,
m.id,
(
SELECT COUNT(*) AS r_count
FROM comments
GROUP BY comments.commenter_id
) AS count,
m.display_name
FROM comments cm
INNER JOIN members m
ON cm.commenter_id = m.id
From this query I want to get the display_name for the person with the highest count of comments. Any guidance is appreciated.
SELECT m.id, m.display_name, COUNT(*) totalComments
FROM comments cm
INNER JOIN members m
ON cm.commenter_id = m.id
GROUP BY m.id, m.display_name
HAVING COUNT(*) =
(
SELECT COUNT(*) totalCount
FROM comments
GROUP BY commenter_id
ORDER BY totalCount DESC
LIMIT 1
)
I think the simplest way is just to sort your query and take the first row (with the count subquery correlated to the outer row, so that it returns a single value):
SELECT cm.commenter_id,
cm.comment,
m.id,
(
SELECT COUNT(*)
FROM comments c2
WHERE c2.commenter_id = cm.commenter_id
) AS count,
m.display_name
FROM comments cm
INNER JOIN members m
ON cm.commenter_id = m.id
order by count desc
limit 1
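If only the top commenter is needed, rather than one of their comment rows, the same "sort and take the first row" idea can be applied to an aggregated query. This is a minimal sketch; the table definitions and sample rows are assumptions built from the column names in the question.

```sql
-- Hypothetical sample data; column types are assumed.
CREATE TABLE members (id INT PRIMARY KEY, display_name VARCHAR(50));
CREATE TABLE comments (
    id INT PRIMARY KEY,
    commenter_id INT,
    `comment` VARCHAR(200)
);
INSERT INTO members VALUES (1, 'alice'), (2, 'bob');
INSERT INTO comments VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 2, 'third');

-- One row per member, sorted by comment count, keeping the largest.
SELECT m.id, m.display_name, COUNT(*) AS total_comments
FROM comments AS cm
JOIN members AS m ON m.id = cm.commenter_id
GROUP BY m.id, m.display_name
ORDER BY total_comments DESC
LIMIT 1;
-- returns bob, who has 2 comments
```

With LIMIT 1, a tie for first place returns just one of the tied members, chosen arbitrarily. The HAVING COUNT(*) = (...) form shown earlier returns all of them.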