group by and order by count desc taking time - mysql

The query takes more than 6 seconds on 4 million records. Is there anything else I can do to reduce the query time?
SELECT title_id, count(title_id) as count
FROM `title_keywords`
WHERE keyword_id in (1,2,3,4,5,6,7,8,9)
GROUP BY title_id
ORDER BY count desc
Index and unique columns
Added composite index too

Because the COUNT function needs to potentially touch every record in each group, there may not be much which can speed up the aggregation. However, we might be able to take advantage of an index to speed up the WHERE clause:
CREATE INDEX idx ON title_keywords (keyword_id, title_id);
You could also try reversing the order of the index columns; in either case, check the execution plan using EXPLAIN. The reason this index might work is that it would allow MySQL to quickly locate the matching keyword_id records. The index also covers title_id, so that value would be available in the leaf nodes of the B-tree without reading the table rows.
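As a rough sketch of the idea above, the snippet below builds the composite index and asks the engine for its plan. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, so the plan output will look different from MySQL's EXPLAIN; the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and index names follow the
# question, the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title_keywords (keyword_id INTEGER, title_id INTEGER)")
conn.execute("CREATE INDEX idx ON title_keywords (keyword_id, title_id)")
conn.executemany(
    "INSERT INTO title_keywords VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 100), (1, 200), (9, 200), (42, 300)],
)

query = """
    SELECT title_id, COUNT(title_id) AS cnt
    FROM title_keywords
    WHERE keyword_id IN (1,2,3,4,5,6,7,8,9)
    GROUP BY title_id
    ORDER BY cnt DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(100, 3), (200, 2)]

# Ask the engine how it intends to run the query.
# In MySQL the equivalent is: EXPLAIN SELECT ...
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step)
```

If the plan shows the index being used for the keyword_id lookup, the WHERE clause is no longer scanning the whole table; the grouping and sorting still have to happen afterwards.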

Try avoiding the IN clause by using an INNER JOIN:
SELECT title_id, count(title_id) as count
FROM title_keywords
INNER JOIN (
SELECT 1 col1
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
UNION
SELECT 7
UNION
SELECT 8
UNION
SELECT 9
) t ON t.col1 = title_keywords.keyword_id
group by title_id
order by count desc
Also be sure you have a proper index on table title_keywords, columns (keyword_id, title_id).

Related

I scratch my head with UNION ALL

I am not an SQL query wizard at all, and here is my problem:
I have these 3 separate queries that work very well, and each one gives me a nice-looking frame with results on my website.
SELECT arretsautressb AS Raison, SUM(minutesarrets) AS Minutes
FROM rapport_production_salles_blanches_2_repeat
GROUP BY Raison
ORDER BY Minutes DESC
SELECT redresseuseminutesarrets AS Raison, SUM(minutesarretsredresseuse) AS Minutes
FROM rapport_production_salles_blanches_3_repeat
GROUP BY Raison
ORDER BY Minutes DESC
SELECT raisonarretsconvoyeurair AS Raison, SUM(minutesarretsconvoyeurair) AS Minutes
FROM rapport_production_salles_blanches_4_repeat
GROUP BY Raison
ORDER BY Minutes DESC
So everything is fine with those 3 results: the Raison column returns all the rows, and the Minutes column SUMs all rows grouped by Raison.
But I would like to merge those queries so they give me 1 big table with the results instead of 3 tables.
But no matter how I format my UNION ALL code, I get only 1 result from each Raison query (so it takes only 1 row from the SQL table) instead of all the rows I get when the queries are separate. The Minutes column, however, correctly calculates the SUM of all the rows.
It would be cool if someone could show me how to do it, because I have been reading documentation for a couple of hours and I am still stuck on this one.
This is what I tried so far. There is no error, but only 1 row of Raison is taken from the SQL table instead of all rows:
SELECT *
FROM ( (SELECT arretsautressb AS Raison,
SUM(minutesarrets) AS Minutes
FROM rapport_production_salles_blanches_2_repeat t1)
UNION ALL
(SELECT redresseuseminutesarrets AS Raison,
SUM(minutesarretsredresseuse) AS Minutes
FROM rapport_production_salles_blanches_3_repeat t2)
UNION ALL
(SELECT raisonarretsconvoyeurair AS Raison,
SUM(minutesarretsconvoyeurair) AS Minutes
FROM rapport_production_salles_blanches_4_repeat t3)
) AS t123
GROUP BY Raison
ORDER BY Minutes DESC
This is what I get from my UNION ALL query: [image: UNION ALL]
But this is what I get from the 3 separate queries: [image: 3 querys]
I think your query doesn't return your desired result because of the following things:
It's fine to use a subquery where you specify the three tables and union them. However, an aggregate (in this case SUM) used without GROUP BY collapses each subquery to a single row, so each branch of the union needs its own GROUP BY.
Next, in each GROUP BY I refer to the underlying column instead of the alias. In my query I changed GROUP BY Raison to GROUP BY t1.arretsautressb.
I have used an ORDER BY on the outer query and I order by the second column, which is in this case the SUM(minutesarrets).
The query I would use is the following:
SELECT *
FROM (
SELECT arretsautressb AS Raison
, SUM(minutesarrets) AS sum_minutes
FROM rapport_production_salles_blanches_2_repeat AS t1
GROUP BY t1.arretsautressb
UNION ALL
SELECT redresseuseminutesarrets AS Raison
, SUM(minutesarretsredresseuse) AS sum_minutes
FROM rapport_production_salles_blanches_3_repeat AS t2
GROUP BY t2.redresseuseminutesarrets
UNION ALL
SELECT raisonarretsconvoyeurair AS Raison
, SUM(minutesarretsconvoyeurair) AS sum_minutes
FROM rapport_production_salles_blanches_4_repeat AS t3
GROUP BY t3.raisonarretsconvoyeurair
) AS t123
ORDER BY 2 DESC
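To make the difference concrete, here is a small sketch that runs both shapes of the query side by side. It assumes SQLite (via Python's sqlite3) instead of MySQL, and the tables stops_a and stops_b with their rows are hypothetical, shortened stand-ins for the real rapport_production tables.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; stops_a / stops_b are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stops_a (reason TEXT, minutes INTEGER);
    CREATE TABLE stops_b (reason TEXT, minutes INTEGER);
    INSERT INTO stops_a VALUES ('jam', 10), ('jam', 5), ('cleaning', 20);
    INSERT INTO stops_b VALUES ('power', 30), ('sensor', 7), ('sensor', 3);
""")

# SUM without GROUP BY: each branch collapses to one row for the whole table.
collapsed = conn.execute("""
    SELECT reason, SUM(minutes) AS minutes FROM stops_a
    UNION ALL
    SELECT reason, SUM(minutes) AS minutes FROM stops_b
""").fetchall()

# GROUP BY inside each branch: one row per reason, then a single final sort.
grouped = conn.execute("""
    SELECT reason, SUM(minutes) AS minutes FROM stops_a GROUP BY reason
    UNION ALL
    SELECT reason, SUM(minutes) AS minutes FROM stops_b GROUP BY reason
    ORDER BY minutes DESC
""").fetchall()

print(len(collapsed))  # 2 rows, one per table
print(grouped)         # [('power', 30), ('cleaning', 20), ('jam', 15), ('sensor', 10)]
```

The collapsed variant still reports correct totals per table, which matches the symptom in the question: the sums look right, but only one Raison value per table survives.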
Try this:
SELECT * FROM (
SELECT * FROM (
(SELECT arretsautressb AS Raison, SUM(minutesarrets) AS Minutes FROM rapport_production_salles_blanches_2_repeat t1)
UNION ALL
(SELECT redresseuseminutesarrets AS Raison, SUM(minutesarretsredresseuse) AS Minutes FROM rapport_production_salles_blanches_3_repeat t2)
) t1
UNION All
(SELECT raisonarretsconvoyeurair AS Raison, SUM(minutesarretsconvoyeurair) AS Minutes FROM rapport_production_salles_blanches_4_repeat t3)
) AS t123 GROUP BY t123.Raison ORDER BY t123.Minutes DESC

MySQL grouping with detail

I have a table that looks like this...
user_id, match_id, points_won
1 14 10
1 8 12
1 12 80
2 8 10
3 14 20
3 2 25
I want to write a MySQL query that pulls back the most points a user has won in a single match and includes the match_id in the results. In other words:
user_id, match_id, max_points_won
1 12 80
2 8 10
3 2 25
Of course if I didn't need the match_id I could just do...
select user_id, max(points_won)
from table
group by user_id
But as soon as I add match_id to the "select" and "group by" I have a row for every match, and if I only add the match_id to the "select" (and not the "group by") then it won't correctly relate to the points_won.
Ideally I don't want to do the following either because it doesn't feel particularly safe (e.g. if the user has won the same amount of points on multiple matches)...
SELECT t.user_id, max(t.points_won) max_points_won
, (select t2.match_id
from table t2
where t2.user_id = t.user_id
and t2.points_won = max_points_won) as 'match_of_points_maximum'
FROM table t
GROUP BY t.user_id
Are there any more elegant options for this problem?
This is harder than it needs to be in MySQL. One method is a bit of a hack but it works in most circumstances. That is the group_concat()/substring_index() trick:
select user_id, max(points_won),
substring_index(group_concat(match_id order by points_won desc), ',', 1)
from table
group by user_id;
The group_concat() concatenates together all the match_ids, ordered by the points descending. The substring_index() then takes the first one.
Two important caveats:
The resulting expression has a type of string, regardless of the internal type.
The group_concat() uses an internal buffer, whose length -- by default -- is 1,024 characters. This default length can be changed.
You can use the query:
select user_id, max(points_won)
from table
group by user_id
as a derived table. Joining this to the original table gets you what you want:
select t1.user_id, t1.match_id, t2.max_points_won
from table as t1
join (
select user_id, max(points_won) as max_points_won
from table
group by user_id
) as t2 on t1.user_id = t2.user_id and t1.points_won = t2.max_points_won
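A quick way to see how the derived-table join behaves, including the tie case the question worries about, is sketched below. It assumes SQLite via Python's sqlite3 rather than MySQL, and uses the hypothetical table name results because table is a reserved word.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "results" is a made-up table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (user_id INTEGER, match_id INTEGER, points_won INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, 14, 10), (1, 8, 12), (1, 12, 80), (2, 8, 10), (3, 14, 20), (3, 2, 25)],
)

best_sql = """
    SELECT t1.user_id, t1.match_id, t2.max_points_won
    FROM results AS t1
    JOIN (
        SELECT user_id, MAX(points_won) AS max_points_won
        FROM results
        GROUP BY user_id
    ) AS t2 ON t1.user_id = t2.user_id AND t1.points_won = t2.max_points_won
    ORDER BY t1.user_id, t1.match_id
"""
best = conn.execute(best_sql).fetchall()
print(best)  # [(1, 12, 80), (2, 8, 10), (3, 2, 25)]

# Give user 2 a second match with the same maximum: the join returns both.
conn.execute("INSERT INTO results VALUES (2, 9, 10)")
with_tie = conn.execute(best_sql).fetchall()
print(with_tie)  # [(1, 12, 80), (2, 8, 10), (2, 9, 10), (3, 2, 25)]
```

Whether returning every tied match is acceptable depends on the use case; if exactly one row per user is required, some tie-breaking rule has to be added.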
I think you can optimize your query by adding LIMIT 1 to the inner query.
SELECT t.user_id, max(t.points_won) max_points_won
, (select t2.match_id
from table t2
where t2.user_id = t.user_id
and t2.points_won = max_points_won limit 1) as 'match_of_points_maximum'
FROM table t
GROUP BY t.user_id
EDIT: this works in PostgreSQL, SQL Server and Oracle; in MySQL, window functions are available only from version 8.0.
You could use row_number:
SELECT USER_ID, MATCH_ID, POINTS_WON
FROM
(
SELECT user_id, match_id, points_won, row_number() over (partition by user_id order by points_won desc) rn
from table
) q
where q.rn = 1
For a similar function, have a look at Gordon Linoff's answer or at this article.
In your example, you partition the result set by user, then order by points_won desc so that the highest score comes first within each partition.
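The row_number approach can be tried out with the sketch below. It assumes SQLite 3.25 or newer (via Python's sqlite3) as a stand-in engine and the hypothetical table name results; the extra match_id in the window's ORDER BY is an added assumption that makes the choice deterministic when two matches tie.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for the database; "results" is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (user_id INTEGER, match_id INTEGER, points_won INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, 14, 10), (1, 8, 12), (1, 12, 80),
     (2, 8, 10), (2, 9, 10),            # user 2 has a tie at 10 points
     (3, 14, 20), (3, 2, 25)],
)

top_rows = conn.execute("""
    SELECT user_id, match_id, points_won
    FROM (
        SELECT user_id, match_id, points_won,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   ORDER BY points_won DESC, match_id  -- match_id breaks ties
               ) AS rn
        FROM results
    ) AS q
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()
print(top_rows)  # [(1, 12, 80), (2, 8, 10), (3, 2, 25)]
```

Unlike the join against MAX(points_won), this always yields exactly one row per user, at the cost of silently dropping the other tied matches.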

Adding Row Values when there are no results - MySQL

Problem Statement: I need my result set to include records that would not naturally return because they are NULL.
I'm going to put some simplified code here since my code seems to be too long.
Table Scores has Company_type, Company, Score, Project_ID
Select Score, Count(Project_ID)
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
Results in the following:
Score Projects
5 95
4 94
3 215
2 51
1 155
Everything is working fine until I apply a condition to company_type that does not include results in one of the 5 score categories. When this happens, I don't have 5 rows in my result set any more.
It displays like this:
Score Projects
5 5
3 6
1 3
I'd like it to display like this:
Score Projects
5 5
4 0
3 6
2 0
1 3
I need the results to always display 5 rows. (Scores = 1-5)
I tried one of the approaches below by Spencer7593. My simplified query now looks like this:
SELECT i.score AS Score, IFNULL(count(*), 0) AS Projects
FROM (SELECT 5 AS score
UNION ALL
SELECT 4
UNION ALL
SELECT 3
UNION ALL
SELECT 2
UNION ALL
SELECT 1) i
LEFT JOIN Scores ON Scores.score = i.score
GROUP BY Score
ORDER BY i.score DESC
And gives the following results, which is accurate except that the rows with 1 in Projects should actually be 0 because they are derived by the "i". There are no projects with a score of 5 or 2.
Score Projects
5 1
4 5
3 6
2 1
1 3
Solved! I just needed to adjust my count to specifically look at the project count - count(project) rather than count(*). This returned the expected results.
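The difference between the two counts can be reproduced with a small sketch. It assumes SQLite via Python's sqlite3 in place of MySQL, and the sample rows are invented: no project has a score of 5 or 2.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Scores (company_type TEXT, company TEXT,
                         score INTEGER, project_id INTEGER);
    INSERT INTO Scores VALUES
        ('A', 'x', 4, 1), ('A', 'x', 4, 2), ('A', 'y', 3, 3), ('A', 'y', 1, 4);
""")

counts = conn.execute("""
    SELECT i.score,
           COUNT(*)            AS star_count,  -- counts the padded row too
           COUNT(s.project_id) AS col_count    -- skips NULLs from the outer join
    FROM (SELECT 5 AS score UNION ALL SELECT 4 UNION ALL SELECT 3
          UNION ALL SELECT 2 UNION ALL SELECT 1) AS i
    LEFT JOIN Scores AS s ON s.score = i.score
    GROUP BY i.score
    ORDER BY i.score DESC
""").fetchall()
print(counts)  # [(5, 1, 0), (4, 2, 2), (3, 1, 1), (2, 1, 0), (1, 1, 1)]
```

For scores with no match, the LEFT JOIN still produces one row whose Scores columns are all NULL; COUNT(*) counts that row, while COUNT(s.project_id) ignores it.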
If you always want your query to return 5 rows, with Score values of 5,4,3,2,1... you'll need a rowsource that supplies those Score values.
One approach would be to use a simple query to return those fixed values, e.g.
SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
Then use that query as an inline view, and do an outer join operation to the results from your current query:
SELECT i.score AS `Score`
, IFNULL(q.projects,0) AS `Projects`
FROM ( SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
) i
LEFT
JOIN (
-- the current query with "missing" Score rows goes here
-- for completeness of this example, without the query
-- we emulate that result with a different query
SELECT 5 AS score, 95 AS projects
UNION ALL SELECT 3, 215
UNION ALL SELECT 1, 155
) q
ON q.score = i.score
ORDER BY i.score DESC
It doesn't have to be the inline view used in this example, but there does need to be a rowsource that the rows can be returned from. You could, for example, have a simple table that contains those five rows, with those five score values.
This is just one example of the general approach. It might be possible to modify your existing query to return the rows you want, but without seeing the query, the schema, and example data, we can't tell.
FOLLOWUP
Based on the edit to the question, showing an example of the current query.
If we are guaranteed that the five values of Score will always appear in the Scores table, we could do conditional aggregation, writing a query like this:
SELECT s.score
, COUNT(IF(s.company_type = :company_type,s.project_id,NULL)) AS projects
FROM Scores s
GROUP BY s.score
ORDER BY s.score DESC
Note that this will require a scan of all the rows, so it may not perform as well. The "trick" is the IF function, which returns a NULL value in place of project_id when the row would have been excluded by the WHERE clause.
If we are guaranteed that project_id is non-NULL, we could use a more terse MySQL shorthand expression to achieve an equivalent result...
, IFNULL(SUM(s.company_type = :company_type),0) AS projects
This works because MySQL returns 1 when the comparison is TRUE, and otherwise returns 0 or NULL.
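Both forms of conditional aggregation can be compared with the sketch below. It assumes SQLite via Python's sqlite3 rather than MySQL, so the IF(...) call is written as a portable CASE expression; the sample rows are invented, with company type 'B' having no projects scored 4 or 2.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; CASE replaces MySQL's IF().
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores (company_type TEXT, score INTEGER, project_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [("A", 5, 1), ("A", 4, 2), ("A", 3, 3), ("A", 2, 4), ("A", 1, 5),
     ("B", 5, 6), ("B", 5, 7), ("B", 3, 8), ("B", 1, 9)],
)

per_score = conn.execute("""
    SELECT s.score,
           -- counts project_id only for rows of the requested company type
           COUNT(CASE WHEN s.company_type = :company_type
                      THEN s.project_id END)          AS projects,
           -- shorthand: the comparison evaluates to 1 or 0, so SUM counts matches
           SUM(s.company_type = :company_type)        AS projects_sum
    FROM Scores AS s
    GROUP BY s.score
    ORDER BY s.score DESC
""", {"company_type": "B"}).fetchall()
print(per_score)  # [(5, 2, 2), (4, 0, 0), (3, 1, 1), (2, 0, 0), (1, 1, 1)]
```

Scores 4 and 2 appear with a count of 0 only because other company types supply rows for them; if a score is missing from the table entirely, it will still be missing from the result.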
Try something like this:
select s.score, ifnull(x.cnt, 0) as cnt
from (
select distinct score from scores
) s
left outer join (
Select Score, Count(Project_ID) cnt
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
) x
on s.score = x.score
Your posted query would not work without a group by statement. However, even there, if you don't have those particular scores for that company type, it wouldn't work either.
One option is to use an outer join. That would require a little more work though.
Here's another option using conditional aggregation:
select Score, sum(company_type=:company_type)
from Scores
group by Score

Select NOT IN from a predefined list

For simplicity, suppose I have a table transactions with id as the primary key. Currently there are only 10 rows in the table with id from 1 to 10.
I have a list of ids: {9,10,11,12}. This list is not stored in the database.
I want to query the database for the ids not in the transactions table. In the case above I want to get 11, 12.
What's the best way to write this query?
Currently I just query SELECT * FROM transactions WHERE id IN (9,10,11,12) and compute the difference in code. I'm wondering if I can do it all in one step in SQL.
You can do this with a subquery containing the ids. Here is one way:
select ids.id
from (select 9 as id union all select 10 union all select 11 union all select 12
) ids
where not exists (select 1 from transactions t where t.id = ids.id);
Returning rows from a table called transactions seems inefficient -- way too much data going back and forth for what you need. (Although you only have 10 rows, so this isn't a big deal with your data size.)
An alternative approach would be to use the EXCEPT operator (in MySQL, available only from version 8.0.31). For example:
select 9 as id union all select 10 union all select 11 union all select 12
except select id from transactions
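Both variants can be checked against the scenario from the question with the sketch below. It assumes SQLite via Python's sqlite3 as a stand-in engine; the candidate list is bound through placeholders, which is an illustrative choice rather than something the answers prescribe.

```python
import sqlite3

# Assumption: SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO transactions VALUES (?)", [(i,) for i in range(1, 11)]
)

candidates = [9, 10, 11, 12]
# Build "SELECT ? AS id UNION ALL SELECT ? AS id ..." with one slot per id.
ids_sql = " UNION ALL ".join("SELECT ? AS id" for _ in candidates)

# Variant 1: anti-join with NOT EXISTS.
missing_not_exists = conn.execute(f"""
    SELECT ids.id
    FROM ({ids_sql}) AS ids
    WHERE NOT EXISTS (SELECT 1 FROM transactions t WHERE t.id = ids.id)
    ORDER BY ids.id
""", candidates).fetchall()

# Variant 2: set difference with EXCEPT (operators apply left to right).
missing_except = sorted(conn.execute(
    f"{ids_sql} EXCEPT SELECT id FROM transactions", candidates
).fetchall())

print(missing_not_exists)  # [(11,), (12,)]
print(missing_except)      # [(11,), (12,)]
```

Only the missing ids travel back to the application, instead of every matching row.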

Sorting results of UNION query combining two tables by the table of origin

I'm using the UNION operator to select results from two different tables. I want results from the first table result to come before those from the second table.
For example: I have the tables customer_coupons and segment_coupons. Both tables have a column named coupon_id. When I run a query involving a UNION of these two tables, it returns the correct records, but they are not the order I want: It gives me the coupon_ids of both tables mixed in ascending order, but I want to show ALL coupon_ids of the first table and then ALL coupon_ids of the second table.
Here's the query as it currently exists:
SELECT coupon_id
FROM customer_coupons
UNION
SELECT coupon_id
FROM segment_coupons;
How can I change this so that all results from the first half of the query come before all results of the second half?
Put in a fixed table-identifying field:
(SELECT 1 AS source_table, coupon_id
FROM customer_coupons)
UNION ALL
(SELECT 2 AS source_table, coupon_id
FROM segment_coupons)
ORDER BY source_table, coupon_id
Note the brackets around the individual queries. This forces MySQL to apply the ORDER BY to the result of the union, not to either of the two subqueries.
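The source-tag technique can be tried with the sketch below. It assumes SQLite via Python's sqlite3 instead of MySQL; SQLite does not accept parentheses around the branches of a union, so they are omitted here, and a trailing ORDER BY applies to the whole union in that engine. The coupon rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the coupon ids are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_coupons (coupon_id INTEGER);
    CREATE TABLE segment_coupons (coupon_id INTEGER);
    INSERT INTO customer_coupons VALUES (7), (3);
    INSERT INTO segment_coupons VALUES (5), (1), (3);
""")

ordered = conn.execute("""
    SELECT 1 AS source_table, coupon_id FROM customer_coupons
    UNION ALL
    SELECT 2 AS source_table, coupon_id FROM segment_coupons
    ORDER BY source_table, coupon_id
""").fetchall()
print(ordered)  # [(1, 3), (1, 7), (2, 1), (2, 3), (2, 5)]
```

Coupon 3 appears twice because UNION ALL keeps duplicates; with the source tag in each row, a plain UNION would keep both rows as well, since the tagged rows differ.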
SELECT * FROM (
SELECT coupon_id, 1 as myorder
FROM customer_coupons
UNION
SELECT coupon_id, 2 as myorder
FROM segment_coupons) AS t
ORDER BY myorder