SELECT DISTINCT from one column based on MAX of another

I have a query that returns the relative activity of users in each region. I want the same list, but with each user appearing in only one region: the region where that user has the most applications.
The current query:
SELECT
r.region_id,
ha.user_id,
count(ha.user_id) AS applications
FROM
sit_applications ha
LEFT JOIN
listings_regions r
ON
r.listingID = ha.listingID
AND deleted = 0
WHERE
ha.datetime_applied >= (NOW() - INTERVAL 1 MONTH)
GROUP BY
ha.user_id, r.region_id
HAVING
applications > 0
ORDER BY
r.region_id DESC
I need to filter this query so that I only grab each user_id once, together with its largest applications count for a region. That way I get a list of the top performers for each region, with no duplicate users.

In MySQL, you have three basic ways to do this:
Using variables
Using a complex join
Using a hack with substring_index() and group_concat().
The complex join is really a mess when you have aggregation queries. The hack is fun, but it does have its limitations. So, let's consider the variables method:
SELECT ur.*
FROM (SELECT ur.*,
             (@rn := if(@u = user_id, @rn + 1,
                        if(@u := user_id, 1, 1)
                       )
             ) as rn
      FROM (SELECT r.region_id, ha.user_id, count(ha.user_id) AS applications
            FROM sit_applications ha LEFT JOIN
                 listings_regions r
                 ON r.listingID = ha.listingID AND deleted = 0
            WHERE ha.datetime_applied >= (NOW() - INTERVAL 1 MONTH)
            GROUP BY ha.user_id, r.region_id
            HAVING applications > 0
           ) ur CROSS JOIN
           (SELECT @u := -1, @rn := 0) params
      ORDER BY user_id, applications DESC
     ) ur
WHERE rn = 1;
Note: Aspects of your query do not really make sense, even though I left them in. You are using LEFT JOIN, so r.region_id could be NULL, and that is usually not desirable. You have a HAVING clause that is totally unnecessary, because the COUNT() is always at least 1 for every group, assuming that ha.user_id is never NULL. I suspect that the logic could be replaced with an INNER JOIN, no HAVING clause, and COUNT(*).

You could try wrapping the query and extracting out what you want:
SELECT t2.user_id, t2.region_id, t2.applications
FROM
(
SELECT t.user_id, MAX(t.applications) AS applications
FROM
(
SELECT r.region_id, ha.user_id, COUNT(ha.user_id) AS applications
FROM sit_applications ha LEFT JOIN listings_regions r
ON r.listingID = ha.listingID AND deleted = 0
WHERE ha.datetime_applied >= (NOW() - INTERVAL 1 MONTH)
GROUP BY ha.user_id, r.region_id
HAVING applications > 0
) t
GROUP BY t.user_id
) t1
INNER JOIN
(
SELECT r.region_id, ha.user_id, COUNT(ha.user_id) AS applications
FROM sit_applications ha LEFT JOIN listings_regions r
ON r.listingID = ha.listingID AND deleted = 0
WHERE ha.datetime_applied >= (NOW() - INTERVAL 1 MONTH)
GROUP BY ha.user_id, r.region_id
HAVING applications > 0
) t2
ON t1.user_id = t2.user_id AND t1.applications = t2.applications
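
One thing to watch with the join on MAX(applications): a user whose top count is shared by two regions matches twice, so that user still appears more than once. The sketch below shows one possible tie-break (lowest region_id wins). It is only an illustration: it runs on SQLite through Python's sqlite3 module instead of MySQL, and the user_region table is a hypothetical stand-in for the aggregated subquery.

```python
import sqlite3

# Hypothetical stand-in for the aggregated (user, region, applications) rows;
# in the real query this would be the GROUP BY subquery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_region (user_id INTEGER, region_id INTEGER, applications INTEGER);
    INSERT INTO user_region VALUES
        (1, 10, 5), (1, 20, 2),   -- user 1: clear winner in region 10
        (2, 10, 3), (2, 20, 3),   -- user 2: tie between regions 10 and 20
        (3, 30, 1);
""")

# Keep a row only if no other row of the same user beats it.
# "Beats" means more applications, or the same count with a lower region_id
# (the tie-break rule is an assumption, not something the question specifies).
top_rows = conn.execute("""
    SELECT ur.user_id, ur.region_id, ur.applications
    FROM user_region ur
    WHERE NOT EXISTS (
        SELECT 1
        FROM user_region other
        WHERE other.user_id = ur.user_id
          AND (other.applications > ur.applications
               OR (other.applications = ur.applications
                   AND other.region_id < ur.region_id))
    )
    ORDER BY ur.user_id
""").fetchall()

print(top_rows)  # [(1, 10, 5), (2, 10, 3), (3, 30, 1)]
```

The same NOT EXISTS pattern should carry over to MySQL with the aggregated subquery in place of user_region, although it has not been run against the original tables.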

Related

Selecting from three tables

I am trying to SELECT from one table and count rows from two other tables based on the rows from the first table. I tried the code below, but the result keeps coming back empty.
SELECT list.id, list.title, list.body, list.poster, list.created_at, count(comments.id) as comcount, count(supports.topic_id) as supcount
FROM (
SELECT *
FROM topics
ORDER BY created_at DESC
LIMIT 5
) AS list, comments, supports
WHERE
list.id = comments.id OR
list.id = supports.topic_id
In this scenario the topics table has only two rows and the comments and supports tables have no rows at all, yet I should still get two rows back, with supcount and comcount each having the value 0.
I got a solution to the above, but I am now trying something else based on it, which I explained in the comments under that solution:
SELECT
t.id,
t.title,
t.body,
t.poster,
t.created_at,
s.supporter,
IFNULL((SELECT COUNT(*) FROM comments c WHERE c.id = t.id), 0) AS comcount,
IFNULL((SELECT COUNT(*) FROM supports s WHERE s.topic_id = t.id), 0) AS supcount,
CASE WHEN (s.supporter = "Davies Alex") THEN '1' ELSE '0' END sup,
CASE WHEN (c.commenter = "Davies Alex") THEN '1' ELSE '0' END com
FROM topics t, comments c, supports s
ORDER BY created_at DESC

This should work, give it a try (a subquery is more suitable when you are just counting entries in another table):
SELECT
id,
title,
body,
poster,
created_at,
IFNULL((SELECT COUNT(*) FROM comments c WHERE c.id = t.id), 0) AS comcount,
IFNULL((SELECT COUNT(*) FROM supports s WHERE s.topic_id = t.id), 0) AS supcount
FROM topics t
ORDER BY created_at DESC
LIMIT 5
Update for new requirement:
SELECT
t.id,
t.title,
t.body,
t.poster,
t.created_at,
s.supporter,
IFNULL(COUNT(c.id), 0) AS comcount,
IFNULL(COUNT(s.id), 0) AS supcount,
SUM(IF(s.supporter IS NOT NULL AND s.supporter = "Davies Alex", 1, 0)) > 0 AS sup,
SUM(IF(c.commenter IS NOT NULL AND c.commenter = "Davies Alex", 1, 0)) > 0 AS com
FROM topics t
LEFT JOIN comments c ON c.id = t.id
LEFT JOIN supports s ON s.topic_id = t.id
GROUP BY t.id
ORDER BY created_at DESC

In your query, you require list.id to match either comments.id or supports.topic_id. If you use an outer join, you'll be able to retrieve data from the initial table even when the joined tables don't match or don't contain any data.
SELECT
topics.id, topics.title, topics.body, topics.poster, topics.created_at,
count(comments.id) as comcount,
count(supports.topic_id) as supcount
FROM topics
LEFT JOIN comments ON comments.id = topics.id
LEFT JOIN supports ON supports.topic_id = topics.id
GROUP BY topics.id
ORDER BY topics.created_at DESC
LIMIT 5
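
When two child tables are LEFT JOINed to the same parent, each comment row is paired with each support row, so a plain COUNT can overstate the totals. The sketch below illustrates this on SQLite via Python's sqlite3 module. The comment_id, support_id and topic_id columns are assumptions made for the example and are not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: each child row has its own key plus a topic_id.
    CREATE TABLE topics   (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, topic_id INTEGER);
    CREATE TABLE supports (support_id INTEGER PRIMARY KEY, topic_id INTEGER);
    INSERT INTO topics VALUES (1, 'first'), (2, 'second');
    INSERT INTO comments (topic_id) VALUES (1), (1);        -- 2 comments on topic 1
    INSERT INTO supports (topic_id) VALUES (1), (1), (1);   -- 3 supports on topic 1
""")

rows = conn.execute("""
    SELECT t.id,
           COUNT(c.comment_id)          AS naive_comcount,  -- inflated by the join
           COUNT(DISTINCT c.comment_id) AS comcount,
           COUNT(DISTINCT s.support_id) AS supcount
    FROM topics t
    LEFT JOIN comments c ON c.topic_id = t.id
    LEFT JOIN supports s ON s.topic_id = t.id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

print(rows)  # [(1, 6, 2, 3), (2, 0, 0, 0)]
```

Topic 2 has no children and still comes back with zero counts, which is the behaviour the question asks for. Topic 1 shows why COUNT(DISTINCT ...) or a correlated subquery is the safer choice once more than one child table is joined.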

Way to reduce execution time of this query in mysql

Some time ago I needed a little help here to build a custom query. And this query worked fine till now.
 
When I run the query (in a procedure) I get the error:
Error Code: 2013. Lost connection to MySQL server during query
My access to my.ini via ssh is read only (because my db is in a shared host "godaddy") so I can't increase the execution time (actual is 60)
Is there a way to optimize this query to make it faster?
The query is:
SELECT @curRank := @curRank + 1 as rank, p.nick,(kills + ((p.vpos - p.vneg)*5) + (top * 5) - deaths) as score
FROM (SELECT
        (SELECT uuid FROM players WHERE players.uuid = p.uuid LIMIT 1) as uuid,
        (SELECT nick FROM nicks n WHERE n.pid = p.id ORDER BY id DESC LIMIT 1) as nick,
        (SELECT COUNT(*) FROM kills k WHERE k.pid = p.id ) as kills,
        (SELECT COUNT(*) FROM deaths d WHERE d.pid = p.id ) as deaths,
        (SELECT COUNT(*) FROM headshots h WHERE h.pid = p.id ) as hs,
        (SELECT COUNT(*) FROM votos vp WHERE vp.vid = p.id AND tipo="p") as vpos,
        (SELECT COUNT(*) FROM votos vn WHERE vn.vid = p.id AND tipo="n") as vneg,
        (SELECT COUNT(*) FROM top_rounds t WHERE t.pid = p.id ) as top,
        (SELECT @curRank := 0) as rank
      FROM players p
) p ORDER BY score DESC LIMIT 30;
Note: all pid's and p.id's already are indexes

Untested (due to lack of sample data):
SELECT p.nick,
(IFNULL(k.cnt, 0)
+ ((IFNULL(vpos.cnt, 0) - IFNULL(vneg.cnt, 0))*5)
+ (IFNULL(t.cnt, 0) * 5) - IFNULL(d.cnt, 0)) AS score
FROM players p
LEFT JOIN (
SELECT pid, COUNT(*) AS cnt
FROM kills
GROUP BY pid
) AS k ON p.id = k.pid
⋮
LEFT JOIN (
SELECT pid, COUNT(*) AS cnt
FROM top_rounds
GROUP BY pid
) AS t ON p.id = t.pid
ORDER BY score DESC
LIMIT 30
That is, make sure each inner query runs only once for all the players. Each subquery produces a table that maps a player id to the corresponding count. Since there might be zero matching rows, we have to use LEFT JOIN and translate NULL into 0 using IFNULL(foo.cnt, 0).
If you need to index rows, you can add an extra outer query for that alone, but personally I'd prefer to handle that outside SQL in the application which processes the query result.
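
A minimal sketch of both ideas, pre-aggregating each table once and numbering the rows in the application. It uses SQLite through Python's sqlite3 module, not MySQL, and a cut-down hypothetical schema with only kills and deaths. The score formula here is a simplification and not the one from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down, hypothetical version of the schema: one row per kill/death.
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE kills   (pid INTEGER);
    CREATE TABLE deaths  (pid INTEGER);
    INSERT INTO players VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO kills  VALUES (1), (1), (1), (2);
    INSERT INTO deaths VALUES (1), (2), (2);
""")

# Each derived table is computed once for all players, then joined by player id.
# COALESCE turns the NULL of a player without rows into 0 (IFNULL in MySQL).
ranking = conn.execute("""
    SELECT p.name,
           COALESCE(k.cnt, 0) - COALESCE(d.cnt, 0) AS score
    FROM players p
    LEFT JOIN (SELECT pid, COUNT(*) AS cnt FROM kills  GROUP BY pid) AS k ON k.pid = p.id
    LEFT JOIN (SELECT pid, COUNT(*) AS cnt FROM deaths GROUP BY pid) AS d ON d.pid = p.id
    ORDER BY score DESC, p.name
""").fetchall()

# Row numbering handled outside SQL, as suggested above.
ranked = [(rank, name, score) for rank, (name, score) in enumerate(ranking, start=1)]
for row in ranked:
    print(row)
```

Player 'cy' has no kills or deaths and still gets a score of 0 instead of dropping out of the result.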

SQL Server 2008 performance issue with Count(distinct()) and SUM. How can I avoid this issue?

My query is below. It takes 12 seconds to run. I have created an index on T.DataViewId, but it is still slow because of Count(distinct()) and Sum. Thanks in advance.
;WITH my_cte
AS (SELECT T.name AS name,
T.id AS id,
Count(DISTINCT( DD.dynamictableid )) AS counts,
Round(Sum(D.[employees]), 0) AS measure1
FROM dbo.treehierarchy T
LEFT JOIN dbo.dynamicdatatableid DD
ON T.id = DD.hierarchyid
AND T.dataviewid = DD.dataviewid
LEFT JOIN dbo.demo1 D
ON D.[demo1id] = DD.dynamictableid
WHERE T.dataviewid = 2
AND T.parentid = 0
GROUP BY T.id,
T.name)
SELECT name, id, counts, row_num, measure1
FROM (SELECT name,
id,
counts,
Row_number()
OVER(
ORDER BY counts DESC) AS row_num,
measure1
FROM my_cte) innertable
WHERE ( row_num BETWEEN 1 AND 15 )

It looks as if you only need the top 15 records by descending counts. That can be done simply like this:
SELECT
TOP 15 T.name AS name,
T.id AS id,
Count(DISTINCT( DD.dynamictableid )) AS counts,
Round(Sum(D.[employees]), 0) AS measure1
FROM
dbo.treehierarchy T
LEFT JOIN
dbo.dynamicdatatableid DD
ON
T.id = DD.hierarchyid
AND
T.dataviewid = DD.dataviewid
LEFT JOIN
dbo.demo1 D
ON
D.[demo1id] = DD.dynamictableid
WHERE
T.dataviewid = 2
AND
T.parentid = 0
GROUP BY
T.id,T.name
ORDER BY
3 DESC

MySQL Query Efficiency - Is there a better way?

I have a query that basically combines tables of actions and selects from them in chronological order while preserving pagination.
Is there a more efficient or better way to do this? The query takes 3 seconds. Not terrible, but I think there is room for improvement and I will be using it a lot.
Thanks!
SELECT
`newsletters_subscribers`.`email`,
`newsletters_subscribers`.`first_name`,
`newsletters_subscribers`.`last_name`,
`newsletters_subscribers`.`id` AS subscriber_id,
COUNT(DISTINCT newsletters_opens.id) AS opens,
COUNT(DISTINCT newsletters_clicks.id) AS clicks,
COUNT(DISTINCT newsletters_forwards.id) AS forwards
FROM `thebookrackqccom_newsletters_subscribers` newsletters_subscribers
LEFT JOIN
`thebookrackqccom_newsletters_opens` newsletters_opens
ON `newsletters_opens`.`subscriber_id` = `newsletters_subscribers`.`id`
AND newsletters_opens.newsletter_id = 1
LEFT JOIN
`thebookrackqccom_newsletters_clicks` newsletters_clicks
ON `newsletters_clicks`.`subscriber_id` = `newsletters_subscribers`.`id`
AND newsletters_clicks.newsletter_id = 1
LEFT JOIN
`thebookrackqccom_newsletters_forwards` newsletters_forwards
ON `newsletters_forwards`.`subscriber_id` = `newsletters_subscribers`.`id`
AND newsletters_forwards.newsletter_id = 1
WHERE
( newsletters_opens.id IS NOT NULL
OR newsletters_clicks.id IS NOT NULL
OR newsletters_forwards.id IS NOT NULL )
GROUP BY
`newsletters_subscribers`.`id`
ORDER BY
`newsletters_subscribers`.`email` ASC
LIMIT 25

What you need is indexes that the query can use. A compound index on (newsletter_id, subscriber_id) on each of the three tables would help.
You can also rewrite the query like this:
SELECT
s.email,
s.first_name,
s.last_name,
s.id AS subscriber_id,
COALESCE(o.opens, 0) AS opens,
COALESCE(c.clicks, 0) AS clicks,
COALESCE(f.forwards, 0) AS forwards
FROM thebookrackqccom_newsletters_subscribers AS s
LEFT JOIN
( SELECT subscriber_id,
COUNT(*) AS opens
FROM thebookrackqccom_newsletters_opens
WHERE newsletter_id = 1
GROUP BY subscriber_id
) AS o ON o.subscriber_id = s.id
LEFT JOIN
( SELECT subscriber_id,
COUNT(*) AS clicks
FROM thebookrackqccom_newsletters_clicks
WHERE newsletter_id = 1
GROUP BY subscriber_id
) AS c ON c.subscriber_id = s.id
LEFT JOIN
( SELECT subscriber_id,
COUNT(*) AS forwards
FROM thebookrackqccom_newsletters_forwards
WHERE newsletter_id = 1
GROUP BY subscriber_id
) AS f ON f.subscriber_id = s.id
WHERE ( o.subscriber_id IS NOT NULL
OR c.subscriber_id IS NOT NULL
OR f.subscriber_id IS NOT NULL )
ORDER BY
s.email ASC
LIMIT 25
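
As a rough illustration of the compound index together with one of the per-subscriber counts, here is a sketch on SQLite via Python's sqlite3 module. The opens table and the index name are hypothetical stand-ins for thebookrackqccom_newsletters_opens, and SQLite's planner is not MySQL's, so the printed plan only hints at what to look for in MySQL's own EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for one of the three action tables.
    CREATE TABLE opens (id INTEGER PRIMARY KEY, newsletter_id INTEGER, subscriber_id INTEGER);
    -- Compound index: filter column first, grouping column second.
    CREATE INDEX idx_opens_newsletter_subscriber ON opens (newsletter_id, subscriber_id);
    INSERT INTO opens (newsletter_id, subscriber_id) VALUES
        (1, 7), (1, 7), (1, 8), (2, 7);
""")

query = """
    SELECT subscriber_id, COUNT(*) AS open_count
    FROM opens
    WHERE newsletter_id = 1
    GROUP BY subscriber_id
    ORDER BY subscriber_id
"""
per_subscriber = conn.execute(query).fetchall()
print(per_subscriber)  # [(7, 2), (8, 1)]

# Print the plan so you can check whether the index is picked up.
# The last column of each plan row is the human-readable detail.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(plan_row[-1])
```

The GROUP BY is what yields one count per subscriber. Without it the subquery collapses to a single row.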

Try this query; I hope it gives you a better execution time.
Query:
SELECT
`newsletters_subscribers`.`email`,
`newsletters_subscribers`.`first_name`,
`newsletters_subscribers`.`last_name`,
`newsletters_subscribers`.`id` AS subscriber_id,
@nopen := coalesce( N_OPEN.NOPENIDCOUNT, 000000 ) as opens,
@nclick := coalesce( N_CLICK.NCLICKIDCOUNT, 000000 ) as clicks,
@nfwd := coalesce( N_FWD.NFWDIDCOUNT, 000000 ) as forwards
FROM
(select @nopen := 0,@nclick := 0,@nfwd :=0) sqlvars,
`thebookrackqccom_newsletters_subscribers` AS newsletters_subscribers
LEFT JOIN (SELECT `newsletters_opens`.`subscriber_id`,
COUNT(newsletters_opens.id) AS NOPENIDCOUNT
FROM `thebookrackqccom_newsletters_opens` AS newsletters_opens
WHERE newsletters_opens.newsletter_id = 1
GROUP BY `newsletters_opens`.`subscriber_id`) AS N_OPEN
ON N_OPEN.subscriber_id = `newsletters_subscribers`.`id`
LEFT JOIN (SELECT `newsletters_clicks`.`subscriber_id`,
COUNT(newsletters_clicks.id) AS NCLICKIDCOUNT
FROM `thebookrackqccom_newsletters_clicks` AS newsletters_clicks
WHERE newsletters_clicks.newsletter_id = 1
GROUP BY `newsletters_clicks`.`subscriber_id`) AS N_CLICK
ON N_CLICK.subscriber_id = `newsletters_subscribers`.`id`
LEFT JOIN (SELECT `newsletters_forwards`.`subscriber_id`,
COUNT(newsletters_forwards.id) AS NFWDIDCOUNT
FROM `thebookrackqccom_newsletters_forwards` AS newsletters_forwards
WHERE newsletters_forwards.newsletter_id = 1
GROUP BY `newsletters_forwards`.`subscriber_id`) AS N_FWD
ON N_FWD.subscriber_id = `newsletters_subscribers`.`id`
GROUP BY `newsletters_subscribers`.`id`
ORDER BY `newsletters_subscribers`.`email` ASC
LIMIT 25

Mysql Join with limit?

I have a table with category, product and count. All integers.
I'm looking for the most efficient query that will give me the top 10 products (highest count) for each category.
I've tried several subselects and joins but couldn't figure out how to do it in a single query. Thanks for your help.

select a.* from t a where 10 > (
select count(*) from t b
where b.category=a.category
and b.count>a.count
)
I think this is what you need.
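
A minimal sketch of the counting idea for "top N by highest count": a row is kept when fewer than N rows in its category have a strictly larger count. It runs on SQLite through Python's sqlite3 module instead of MySQL, uses N = 2 to keep the data small, and names the column cnt (an assumption for readability; the question calls it count).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy data; the column is named cnt here to avoid confusion with COUNT().
    CREATE TABLE t (category INTEGER, product INTEGER, cnt INTEGER);
    INSERT INTO t VALUES
        (1, 101, 50), (1, 102, 40), (1, 103, 30),
        (2, 201, 7),  (2, 202, 9);
""")

TOP_N = 2  # the question asks for 10; 2 keeps the example readable

top = conn.execute("""
    SELECT a.category, a.product, a.cnt
    FROM t a
    WHERE ? > (
        SELECT COUNT(*)
        FROM t b
        WHERE b.category = a.category
          AND b.cnt > a.cnt        -- rows that outrank the current one
    )
    ORDER BY a.category, a.cnt DESC
""", (TOP_N,)).fetchall()

print(top)  # [(1, 101, 50), (1, 102, 40), (2, 202, 9), (2, 201, 7)]
```

Rows that tie on the count all have the same number of rows ahead of them, so a category can return more than N rows when there is a tie at the cut-off.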

A slightly modified query from this article in my blog:
Advanced row sampling
SELECT l.*
FROM (
    SELECT category,
        COALESCE(
        (
        SELECT count
        FROM mytable li
        WHERE li.category = dlo.category
        ORDER BY
            li.category DESC, li.count DESC, li.id DESC
        LIMIT 9, 1
        ), CAST(-1 AS DECIMAL)) AS mcount,
        COALESCE(
        (
        SELECT id
        FROM mytable li
        WHERE li.category = dlo.category
        ORDER BY
            li.category DESC, li.count DESC, li.id DESC
        LIMIT 9, 1
        ), CAST(-1 AS DECIMAL)) AS mid
    FROM (
        SELECT DISTINCT category
        FROM mytable dl
        ) dlo
    ) lo, mytable l
WHERE l.category >= lo.category
AND l.category <= lo.category
AND (l.count, l.id) >= (lo.mcount, lo.mid)
You need to create a composite index on (category, count, id) for this to work efficiently.
Note the use of l.category >= lo.category AND l.category <= lo.category instead of a plain l.category = lo.category.
This is a hack to make MySQL use an efficient range check for each record.

This article addresses your problem I think.
Basically, it says that if your table is small, you can do a self inequality join, like this:
SELECT t1.*, COUNT(*) AS countRank
FROM tbl AS t1
JOIN tbl AS t2 ON t1.category=t2.category AND t1.count <= t2.count
GROUP BY t1.category, t1.count
HAVING countRank <= 10
ORDER BY category,count DESC;
It's an expensive operation, but for a small table you should be fine. If you have a large table, you should forget about doing it with one query and implement a different approach to the solution.

SET @row = 0;
SET @category = 0;

SELECT top.*
FROM (
  SELECT IF(@category = p.cId, @row := @row + 1, @row := 1) rowNumber,
    (@category := p.cId) categoryId,
    p.pId
  FROM (
    SELECT c.cId,
      c.pId
    FROM prod pr
      INNER JOIN cat_prod c ON c.pId = pr.id
    GROUP BY c.cId, c.pId
    ) p
  ) top
HAVING top.rowNumber < 4;

select a.* from `table` a where a.product in (
select b.product from `table` b
where b.category=a.category
order by b.count desc
limit 10
)
I think this is a good way, but MySQL returns:
MySQL said: Documentation
#1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
