MySQL: Getting average or sum from 200,000 rows

I would like to get the average, or at least the sum, of 200,000 rows from a MySQL database. This is how I am querying the database now, but the result set is too large for me to pull back and process, and I cannot afford to overload the server.
SELECT user_id, total_email FROM email_users
WHERE email_code = 1
LIMIT 200000
SELECT SUM(total_email), AVG(total_email) FROM email_users
WHERE user_id IN
(
01, 02,..., 200000-th user_id
)
My question: is there a way to combine the two queries into one, so that I get just the sum or average of the 200,000 email_users rows that have email_code = 1?
EDIT: Thanks to all who answered. I didn't realise the answer was so easy - a nested select statement.

You can do this with a subquery:
SELECT SUM(total_email), AVG(total_email)
FROM (SELECT eu.*
      FROM email_users eu
      WHERE eu.email_code = 1
      LIMIT 200000
     ) eu
Some notes. First, using LIMIT without an ORDER BY gives indeterminate results: you could (in theory) run this query twice and get different rows. Second, this assumes that there is a column called total_email in email_users.
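If a deterministic sample matters, order on a unique column inside the derived table before limiting; a minimal sketch, assuming user_id is unique in email_users:
SELECT SUM(total_email), AVG(total_email)
FROM (SELECT total_email
      FROM email_users
      WHERE email_code = 1
      ORDER BY user_id
      LIMIT 200000
     ) eu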

SELECT SUM(total_email), AVG(total_email)
FROM (SELECT total_email
FROM email_users
WHERE email_code = 1
LIMIT 200000) x

How about something like this, assuming you just want any 200K records from the DB where email_code = 1:
SELECT SUM(total_email), AVG(total_email) FROM email_users
WHERE user_id IN
(
SELECT user_id FROM
  (SELECT user_id
   FROM email_users
   WHERE email_code = 1 LIMIT 200000) ids
)
(MySQL doesn't allow LIMIT directly inside an IN subquery, hence the extra wrapping.)
or
SELECT SUM(total_email), AVG(total_email) FROM
(SELECT user_id, total_email
 FROM email_users
 WHERE email_code = 1 LIMIT 200000) t
(Note that MySQL requires an alias on the derived table, t here.)

Related

MySQL: GROUP BY clause not using index when used with CASE

I'm using MySQL.
I can't change the DB structure, so that's not an option, sadly.
THE ISSUE:
When I use GROUP BY with CASE (as needed in my situation), MySQL uses
filesort and the delay is humongous (approx. 2-3 minutes):
http://sqlfiddle.com/#!9/f97d8/11/0
But when I don't use CASE, just GROUP BY group_id, MySQL easily uses the
index and the result is fast:
http://sqlfiddle.com/#!9/f97d8/12/0
Scenario (detailed):
Table msgs, containing records of sent messages, with fields:
id,
user_id (the user who sent the message),
type (0 means a group msg; all msgs sent in one batch share a group_id, so if group_id 5 covers 5 msgs, the table has 5 records with group_id = 5 and type = 0; for type > 0 the group_id is NULL, because all other types are individual msgs sent to a single recipient),
group_id (if type = 0, contains the group id, else NULL)
The table contains approx. 10 million records for user_id 50001, with different types (i.e. group as well as individual msgs).
Now the QUERY:
SELECT
msgs.*
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.user_id IN (50111)
AND msgs.type IN (0, 1, 5, 7)
GROUP BY CASE `msgs`.`type` WHEN 0 THEN `msgs`.`group_id` ELSE `msgs`.`id` END
ORDER BY `msgs`.`group_id` DESC
LIMIT 100
I HAVE to get the summary in a single query,
so msgs sent to, let's say, group 5 (5 records in this table) should be shown as 1 summary record (I may show a COUNT later, but that's not the issue).
The individual msgs have NULL as group_id, so I can't just use GROUP BY group_id, because that would group all individual msgs into a single record, which is not acceptable.
Sample output can be something like:
id  owner_id  type  group_id  COUNT
1   50001     0     2         5
1   50001     1     NULL      1
1   50001     4     NULL      1
1   50001     0     7         5
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     0     10        5
Now the problem is that the GROUP BY condition using CASE (which I currently think I need, because I only want to group by group_id when type = 0) causes a lot of delay, because it does not use the index, which it does when I don't use CASE (just GROUP BY group_id). Please view the SQLFiddles above to see the EXPLAIN results.
Can anyone please advise how to get this optimized?
UPDATE
I tried a workaround that does somehow work out (it drops the initial queries to 1 sec). The idea is to use a UNION to shrink the result set whose size forces MySQL to write to disk for the filesort, by limiting the group msgs and the individual msgs separately (view query below):
-- the first part of the union retrieves group msgs (type 0, which need to be grouped by group_id); a LIMIT caps the otherwise out-of-control result set
-- the second part retrieves individual msgs (type != 0, grouped by msgs.id - not strictly necessary, just to be safe from duplicate entries due to the join); a LIMIT caps it as well
-- the outer query combines the two to retrieve the desired result set
Here's the query:
SELECT
*
FROM
(
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (msgs.user_id = accounts.id)
WHERE 1
AND accounts.id IN (50111 ) AND type = 0
GROUP BY msgs.group_id
ORDER BY msgs.id DESC
LIMIT 40
)
UNION ALL
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.type != 0
AND accounts.id IN (50111)
GROUP BY msgs.id
ORDER BY msgs.id
LIMIT 40
)
) AS temp
ORDER BY reference_id
LIMIT 20,20
But it has a lot of caveats:
- I need to handle the LIMIT in the inner queries as well. Let's say 20 recs per page and I'm on page 4: for the inner queries I need LIMIT 0,80, since I can't know which of the two parts contributed how many records on the previous 3 pages. So as the records per page and the number of pages grow, my query grows heavier. With, say, 1K recs per page on page 100 or 1000, the load gets heavier and the time keeps climbing.
- I need to handle ordering in the inner queries and then again on the result set prepared by the UNION, and conditions need to be applied to both inner queries separately (but not much of an issue).
- I can't use SQL_CALC_FOUND_ROWS, so I will need to get the count with separate queries.
The main issue is the first one: the deeper I go with the pagination, the heavier it gets.
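To make the first caveat concrete, here is how the limits grow with the page (the values are illustrative, restating the caveat above, not tested code):
-- page p (1-based), n records per page:
--   each inner query:  LIMIT 0, p*n    (either part alone may have filled every earlier page)
--   outer query:       LIMIT (p-1)*n, n
-- e.g. page 4 at 20 per page: inner LIMIT 0,80 in both parts, outer LIMIT 60,20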
Would this run faster?
SELECT id, user_id, type, group_id
FROM
( SELECT id, user_id, type, group_id, IFNULL(group_id, id) AS foo
  FROM msgs
  WHERE user_id IN (50111)
  AND type IN (0, 1, 5, 7)
) AS t
GROUP BY foo
ORDER BY `group_id` DESC
LIMIT 100
It needs INDEX(user_id, type).
Does this give the 'correct' answer?
SELECT DISTINCT *
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY `group_id` DESC
LIMIT 100
(It needs the same index)
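Both suggestions rely on the same composite index; it could be created like this (the index name is my own placeholder):
ALTER TABLE msgs ADD INDEX idx_user_type (user_id, type);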

Need to improve SQL performance

Table temporary_search_table:
post_id, property_status, property_address, ... 30 more fields
Table search_meta:
meta_id, search_id, status, created_date
OK, I need the total count of rows whose created_date is yesterday. For each temporary_search_table row there may be multiple entries in search_meta, so we need to pick the latest search_meta row and check that its created_date is yesterday and that property_status is 'pending'; if yes, we count the row. If there is no data in search_meta for an entry in temporary_search_table, then we don't count that row in our results.
Here is my SQL. It works, but for 30,000 rows it takes a lot of time.
SELECT COUNT(id) FROM temporary_search_table
WHERE property_status = 'pending' AND (1 = (SELECT DATEDIFF(NOW(), created_date)
FROM search_meta WHERE post_id = search_id ORDER BY created_date DESC LIMIT 0,1 ))
Thanks in advance.
Apart from checking the indexes on your tables, it would probably be better to avoid the correlated subquery and use a straight join instead.
SELECT COUNT(id)
FROM temporary_search_table
INNER JOIN search_meta ON post_id = search_id
WHERE property_status = 'pending' AND DATEDIFF(NOW(), created_date) = 1
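Note that the plain join counts a row whenever any of its search_meta entries was created yesterday, while the question asks about the latest entry only. One way to restrict it is a groupwise-max derived table; a minimal sketch using the question's table and column names (untested):
SELECT COUNT(*)
FROM temporary_search_table t
INNER JOIN (SELECT search_id, MAX(created_date) AS last_created
            FROM search_meta
            GROUP BY search_id) m ON m.search_id = t.post_id
WHERE t.property_status = 'pending'
  AND DATEDIFF(NOW(), m.last_created) = 1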

MySQL select top ten records with no duplicate uid

I have the following table (user_record) with millions of rows like this:
no  uid  s
=================
1   a    999
2   b    899
3   c    1234
4   a    1322
5   b    933
-----------------
The uid column can contain duplicates. What I need is to show the top ten records (including uid and s) with no duplicate uid, ordered by s (descending). I can do this in two steps with the following SQL statements:
SELECT distinct(uid) FROM user_record ORDER BY s DESC LIMIT 10
SELECT uid,s FROM user_record WHERE uid IN(Just Results)
I just want to know: is there a more efficient way to do it in one statement?
Any help is greatly appreciated.
PS: I also tried the following SQL statement:
select * from(select uid,s from user_record order by s desc) as tb group by tb.uid order by tb.s desc limit 10
but it's slow.
The simplest way is to use MAX() to get the highest s for every uid, sorted by that highest s:
SELECT uid, MAX(s) max_s
FROM TableName
GROUP BY uid
ORDER BY max_s DESC
LIMIT 10
The disadvantage of the query above is that it doesn't handle ties: if, for instance, multiple uids share the same s near the top, some of them are arbitrarily cut off at the limit. If you want the highest s values including such duplicates, calculate them in a subquery and join the result back to the original table.
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT DISTINCT s
FROM TableName
ORDER BY s DESC
LIMIT 10
) b ON a.s = b.s
ORDER BY s DESC
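If you are on MySQL 8.0+, a window function states the requirement (one row per uid, then the top ten by s) directly; a minimal sketch against the question's user_record table:
SELECT no, uid, s
FROM (SELECT no, uid, s,
             ROW_NUMBER() OVER (PARTITION BY uid ORDER BY s DESC) AS rn
      FROM user_record) t
WHERE rn = 1
ORDER BY s DESC
LIMIT 10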

Determine total amount of top result returned

I would like to determine two things from a single query:
the most prevalent user_id value in a table
the number of times that user_id appears in the table
Example Table:
user_id some_field
1 data
2 data
1 data
The above would return user_id 1 as the most prevalent in the table, and it would return 2 for the total number of times it appears.
I have done my research and I came across two types of queries.
GROUP BY user_id ORDER BY COUNT(*) DESC
SUM
The problem is that I can't figure out how to use these two in conjunction with one another. For example, consider the following query, which successfully returns the most prevalent user_id.
$top_user = "SELECT user_id FROM table_name GROUP BY user_id ORDER BY COUNT(*) DESC";
The above query returns "1" based on the example table shown above. Now I would like to also return "2" for the total number of times user_id 1 was found in the table.
Is this by any chance possible?
Thanks,
Evan
You can include count(*) in the SELECT list:
SELECT user_id, count(*) as totaltimes from table_name
GROUP BY user_id ORDER BY count(*) DESC;
If you want only the first one:
SELECT user_id, count(*) as totaltimes from table_name
GROUP BY user_id ORDER BY count(*) DESC LIMIT 1;

Why is my SQL so slow?

My table is reasonably small, around 50,000 rows. My schema is as follows:
DAILY
match_id
user_id
result
round
tournament_id
Query:
SELECT user_id
FROM `daily`
WHERE user_id IN (SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND (result = 'Won' OR result = 'Lost'))
Using the IN keyword in the fashion you are is a very dangerous thing to do [from a performance perspective]. It can result in the subquery [(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost'))] being run once per outer row - 50,000 times in this case.
You'll want to convert this into a join, something to the effect of:
select a.user_id from daily a join
(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost')) b on a.user_id = b.user_id
Doing it this way, the subquery runs only once and is then joined to the outer table.
As Cybernate pointed out, in your specific example you can simply use WHERE clauses, but I went ahead and suggested this in case your query is actually more complex than what you posted.
First, verify and add indexes as suggested earlier.
Also, why are you using an IN if you are querying data from the same table?
Change your query to:
SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND ( result = 'Won'
OR result = 'Lost' )
Your query only needs to be:
SELECT d.user_id
FROM DAILY d
WHERE d.round > 25
AND d.tournament_id = 24
AND d.result IN ('Won', 'Lost')
Indexes should be considered on:
DAILY.round
DAILY.tournament_id
DAILY.result
This should return in a millisecond.
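In MySQL a single composite index often serves this filter better than three single-column indexes, since the optimizer will generally pick only one of them per table access; a plausible composite (the index name and column order are my suggestion, equality column first, range column last):
CREATE INDEX idx_tid_result_round ON daily (tournament_id, result, round);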
SELECT user_id FROM daily
where user_id in (select user_id from daily where round > 25 and tournament_id = 24 and (result = 'Won' or result = 'Lost'))
Then make sure there is an index on the filter columns.
CREATE INDEX IX_1 ON daily (round, tournament_id, result)