How to stop or limit a SELECT after selecting the top 100 rows? - MySQL

I have a query like this:
SELECT c.msisdn,SUM(c.dataVolumeDownLink+c.dataVolumeUpLink) AS datasum
FROM cdr c
WHERE c.eveDate>='2013-10-29'
GROUP BY c.msisdn
ORDER BY datasum DESC;
This one takes 4 minutes. I have an index on eveDate.
The cdr table contains 2400000 records for each day from '2013-10-01' to '2013-10-30', but I only want to select the first 100 records. How am I supposed to optimize this query?
I have tried the LIMIT clause, but it gives no benefit.
Please let me know how I can optimize this query.
Thank you.

Just put
LIMIT 100
after ORDER BY datasum DESC, like this:
... ORDER BY datasum DESC LIMIT 100;

If the records are distributed evenly, one day would have 80k rows. A GROUP BY over 80k rows should not take 4 minutes (I guess).
I'm not sure whether you have the following index:
INDEX(eveDate, msisdn)
With this index, records are sorted by eveDate and then msisdn, so the GROUP BY is optimized: rows with the same msisdn sit in the same block. I guess the following query is faster than yours.
Q1
SELECT x.msisdn, SUM(datasum)
FROM
(
SELECT c.msisdn AS msisdn,
SUM(c.dataVolumeDownLink+c.dataVolumeUpLink) AS datasum
FROM cdr c
WHERE c.eveDate>='2013-10-29'
GROUP BY eveDate, c.msisdn
) x
GROUP BY x.msisdn
ORDER BY SUM(datasum) DESC
LIMIT 100;
or something like this.
Q2
SELECT c.msisdn, SUM(c.dataVolumeDownLink+c.dataVolumeUpLink) AS datasum
FROM cdr c
WHERE c.eveDate>='2013-10-29'
GROUP BY c.msisdn
ORDER BY datasum DESC
LIMIT 100;
The above query is simpler, but the same msisdn can appear under another eveDate, so the benefit from INDEX(eveDate, msisdn) is small. If your disk has plenty of free space, the following index lets the query run as an index-only scan: everything the query needs is in the index, so the table data is never read.
INDEX(eveDate, msisdn, dataVolumeDownLink, dataVolumeUpLink)
UPDATED
Hmm, if the data is append-only and appended rows never change, I wonder whether you could build a summary table for every day.
CREATE TABLE summary(eveDate DATE, msisdn VARCHAR(20), datasum BIGINT, INDEX(eveDate, msisdn)); -- column types are assumptions
and run the following query every night via a cron job:
INSERT INTO summary
SELECT CURDATE(), c.msisdn, SUM(c.dataVolumeDownLink+c.dataVolumeUpLink) AS datasum
FROM cdr c
WHERE c.eveDate = CURDATE()
GROUP BY c.msisdn;
Then your query becomes very simple:
SELECT msisdn, SUM(datasum) AS datasum
FROM summary
WHERE eveDate BETWEEN ? AND ?
GROUP BY msisdn
ORDER BY datasum DESC
LIMIT 100;
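To sanity-check the summary-table idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example is self-contained). The sample rows, the column types, and the roll_up helper are hypothetical; only the table and column names come from the queries in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cdr (eveDate TEXT, msisdn TEXT,
                  dataVolumeDownLink INTEGER, dataVolumeUpLink INTEGER);
CREATE TABLE summary (eveDate TEXT, msisdn TEXT, datasum INTEGER);
CREATE INDEX summary_idx ON summary (eveDate, msisdn);
""")

# Hypothetical sample data: (eveDate, msisdn, down, up)
conn.executemany("INSERT INTO cdr VALUES (?, ?, ?, ?)", [
    ("2013-10-29", "A", 10, 5),
    ("2013-10-29", "A", 20, 5),
    ("2013-10-29", "B", 1, 1),
    ("2013-10-30", "A", 3, 2),
    ("2013-10-30", "B", 50, 50),
    ("2013-10-30", "C", 7, 0),
])

def roll_up(day):
    """Nightly job: store one pre-aggregated row per msisdn for the given day."""
    conn.execute(
        """INSERT INTO summary
           SELECT eveDate, msisdn, SUM(dataVolumeDownLink + dataVolumeUpLink)
           FROM cdr WHERE eveDate = ? GROUP BY eveDate, msisdn""",
        (day,),
    )

for day in ("2013-10-29", "2013-10-30"):
    roll_up(day)

# Top-N from the small summary table.
from_summary = conn.execute(
    """SELECT msisdn, SUM(datasum) AS total FROM summary
       WHERE eveDate BETWEEN ? AND ?
       GROUP BY msisdn ORDER BY total DESC LIMIT 2""",
    ("2013-10-29", "2013-10-30"),
).fetchall()

# Same top-N computed directly from the raw rows, for comparison.
from_raw = conn.execute(
    """SELECT msisdn, SUM(dataVolumeDownLink + dataVolumeUpLink) AS total FROM cdr
       WHERE eveDate >= '2013-10-29'
       GROUP BY msisdn ORDER BY total DESC LIMIT 2"""
).fetchall()

print(from_summary)
print(from_raw)
```

Both queries should return the same rows; the difference is that the summary query only has to read one row per msisdn per day.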

SELECT c.msisdn,SUM(c.dataVolumeDownLink+c.dataVolumeUpLink) AS datasum
FROM cdr c
WHERE c.eveDate>='2013-10-29'
GROUP BY c.msisdn
ORDER BY datasum DESC
LIMIT 0, 100;

SELECT c.msisdn,SUM(c.dataVolumeDownLink+c.dataVolumeUpLink) AS datasum
FROM
(select * from cdr where eveDate>='2013-10-29' limit 100) as c
GROUP BY c.msisdn
ORDER BY datasum DESC;
A small change from Larry's answer.
I'm not completely sure I understood the question correctly.
This takes the first 100 records and does the calculation on those only,
so the final result may have fewer than 100 rows, depending on the GROUP BY clause.
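The effect of putting the LIMIT inside the subquery can be seen on toy data. This is only a sketch: it uses Python's sqlite3 module as a stand-in for MySQL, a simplified hypothetical cdr table with a single volume column, and a limit of 2 instead of 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdr (msisdn TEXT, volume INTEGER)")
conn.executemany(
    "INSERT INTO cdr VALUES (?, ?)",
    [("A", 10), ("A", 20), ("B", 5), ("C", 7)],
)

# Aggregate over all rows: one total per msisdn.
full = conn.execute(
    """SELECT msisdn, SUM(volume) AS datasum
       FROM cdr GROUP BY msisdn ORDER BY datasum DESC"""
).fetchall()

# LIMIT applied before grouping: only 2 raw rows ever reach the SUM.
limited = conn.execute(
    """SELECT c.msisdn, SUM(c.volume) AS datasum
       FROM (SELECT * FROM cdr LIMIT 2) AS c
       GROUP BY c.msisdn ORDER BY datasum DESC"""
).fetchall()

print(full)
print(limited)
```

Since the inner query has no ORDER BY, which rows it keeps is up to the engine; either way the totals cover only those rows, not the full date range.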
EDIT:
As per your clarification, you will need to add an index on c.msisdn and put a LIMIT clause at the end. Remove the ORDER BY clause from the inner query and wrap it in an outer query that only orders the records:
SELECT a.* FROM (
SELECT c.msisdn,SUM(c.dataVolumeDownLink+c.dataVolumeUpLink) AS datasum
FROM cdr c
WHERE c.eveDate>='2013-10-29'
GROUP BY c.msisdn limit 100) a
ORDER BY a.datasum DESC;
add an index on c.msisdn

Related

how to select random user

I have the following table with its fields. The table has more than 30K users,
and each user has more than 1000 records.
The user id column is named ANONID. I want to randomly select 100 users with all their records, using MySQL.
Thanks in advance.
The rand() approach mentioned in earlier answers does not work in SQL Server 2012. For SQL Server, use the following query to get 100 random rows with the newid() function
(newid() is a built-in SQL Server function):
select * from table
order by newid()
offset 0 rows
fetch next 100 rows only
For a table of 30,000 and a single sample, you can use:
select t.*
from t
order by rand()
limit 100;
This does exactly what you want. It will take a few seconds to return.
If performance is an issue, there are other more complicated methods for sampling the data. A simple method reduces the number of rows before the order by. So a 5% sample will speed the query and here is one method for doing that:
select t.*
from t
where rand() < 0.05
order by rand()
limit 100;
EDIT:
You want what is called a clustered sample or hierarchical sample. Use a subquery:
select t.*
from t join
(select userid
from (select distinct userid from t) t
order by rand()
limit 100
) tt
on t.userid = tt.userid;
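A scaled-down sketch of this clustered sample, assuming Python's sqlite3 module as a stand-in for MySQL. SQLite spells the function RANDOM() where MySQL uses RAND(); the table contents (10 users with 5 records each) and the sample size of 3 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (userid INTEGER, event TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(user, f"event-{k}") for user in range(10) for k in range(5)],
)

# Pick 3 random users first, then join back to fetch all of their rows.
sample = conn.execute(
    """SELECT t.*
       FROM t JOIN
            (SELECT userid
             FROM (SELECT DISTINCT userid FROM t) AS u
             ORDER BY RANDOM()
             LIMIT 3
            ) AS tt
            ON t.userid = tt.userid"""
).fetchall()

sampled_users = {row[0] for row in sample}
print(sorted(sampled_users), len(sample))
```

Which users are picked changes from run to run, but every picked user comes back with all of their records.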
SELECT * FROM table
ORDER BY RAND()
LIMIT 100
It is slow, but it works.

How can I limit query's results without using LIMIT

I need to show 20 ordered records in my grid, but I can't use LIMIT because my generator (Scriptcase) already uses LIMIT to show lines per page. It's a bug in the generator, but I need to work around it for my project. Is it possible to show 20 ordered records from my table with a query?
As discussed in the comments, if you can't use LIMIT you can rank your results by some order and then, in a parent SELECT, filter the results by rank number:
select * from (
select *
,@r:=@r + 1 as row_num
from your_table_name
cross join (select @r:=0)t
order by some_column asc /* or desc*/
) t1
where row_num <= 20
Another hackish way is to use group_concat() with ORDER BY to build a list of ids sorted asc/desc, use substring_index() to pick the first 20 ids, and then join back to the same table with find_in_set(). This solution is very expensive in terms of performance, and it runs into group_concat limitations if you need more than 20 records.
select t.*
from your_table_name t
join (
select
substring_index(group_concat(id order by some_column asc),',',20) ids_list
from your_table_name
) t1 on (find_in_set(t.id , t1.ids_list) > 0)
What about SELECT in SELECT:
SELECT *
FROM (
-- put your query here
-- with LIMIT 20
) q
So the outer SELECT has no LIMIT and your generator can add its own.
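A rough sketch of how this plays out, assuming the generator simply appends its own LIMIT/OFFSET to whatever query it is given (how Scriptcase really builds its paging SQL is not confirmed here). It uses Python's sqlite3 module as a stand-in for MySQL; the products table and the fetch_page helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (id, name) VALUES (?, ?)",
    [(i, f"product {i}") for i in range(1, 51)],
)

# The query handed to the grid: the inner LIMIT caps the result at 20 rows.
wrapped = "SELECT * FROM (SELECT id, name FROM products ORDER BY id LIMIT 20) q"

def fetch_page(page, per_page=10):
    """Simulate the generator appending its own paging clause."""
    offset = (page - 1) * per_page
    sql = f"{wrapped} LIMIT {per_page} OFFSET {offset}"
    return [row[0] for row in conn.execute(sql)]

page_one = fetch_page(1)
page_two = fetch_page(2)
page_three = fetch_page(3)
print(page_one, page_two, page_three)
```

The table has 50 rows, but the grid only ever sees the 20 from the inner query, so a third page comes back empty.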
In a Scriptcase Grid, you CAN use Limit. This is a valid SQL query that selects only the first 20 records from a table. The grid is set to show only 10 records per page, so it will show 20 results split in a total of 2 pages:
SELECT
ProductID,
ProductName
FROM
Products
LIMIT 20
The nested query also works:
SELECT
ProductID,
ProductName
FROM
(SELECT
ProductID,
ProductName
FROM Products LIMIT 20) tmp

MySQL select last 'n' records by date, but sorted oldest to newest

I have a table that has transactions with a datetime column. I'm trying to select the last 'n' records (i.e. 20 rows) but have it sorted oldest to newest.
SELECT *
FROM table
WHERE 1=1
ORDER BY table.datefield DESC
LIMIT 20;
Gives me the 20 most recent, but in the opposite order.
Is this possible in one query, or will I have to run a query to get the total rows and then adjust the limit based on that, so I can order by table.datefield ASC and then limit (total rows - n), n?
Wrapping your original SELECT in another SELECT, which turns it into a derived table, should do it:
SELECT t.*
FROM (
SELECT *
FROM table
WHERE 1=1
ORDER BY table.datefield DESC
LIMIT 20
) t
ORDER BY t.datefield
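A small runnable sketch of the same pattern, using Python's sqlite3 module as a stand-in for MySQL. The table name tx, the ten sample dates, and n = 3 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER PRIMARY KEY, datefield TEXT)")
conn.executemany(
    "INSERT INTO tx (datefield) VALUES (?)",
    [(f"2020-01-{day:02d}",) for day in range(1, 11)],
)

n = 3
rows = conn.execute(
    """SELECT t.*
       FROM (SELECT * FROM tx
             ORDER BY datefield DESC   -- newest first, so LIMIT keeps the last n
             LIMIT ?) AS t
       ORDER BY t.datefield            -- then flip back to oldest first""",
    (n,),
).fetchall()

dates = [row[1] for row in rows]
print(dates)
```

The inner query decides which rows are kept, and the outer ORDER BY only decides how they are displayed.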

MySQL - Select highest values from one column, then re-order based on a second column?

I think what I need to do can be done using one query, but I'm really not sure - and I'd like to avoid performing a query and then sorting the resultant array if possible.
Basically, I have one table, which includes the following columns:
product_name, price, sold
From these columns, I'd like to do the following:
Select the highest 20 values from the 'sold' column DESC;
Order the 20 results by price ASC.
Sounds so simple, but can't figure out how to accomplish this to save my life, and SQL is not my strong point. If anyone could help out, it would be appreciated!
You can use subqueries for this:
select t.*
from (select t.*
from t
order by sold desc
limit 20
) t
order by price asc
You have a query that does a bunch of stuff. I'll call this <subquery>. Here is what you do:
select t.*
from (select t.*
from (<subquery>
) t
order by sold desc
limit 20
) t
order by price asc
I think this will do what you are looking for:
select * from table
order by sold desc, price asc
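For comparison, here is a sketch that runs the subquery form and the single two-key ORDER BY side by side on toy data. It assumes Python's sqlite3 module as a stand-in for MySQL, a hypothetical four-row products table, and a top 2 instead of a top 20.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, price INTEGER, sold INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("a", 5, 100), ("b", 1, 90), ("c", 9, 80), ("d", 2, 10)],
)

# Subquery form: keep the 2 best sellers, then re-sort those by price.
nested = [row[0] for row in conn.execute(
    """SELECT t.product_name
       FROM (SELECT * FROM products ORDER BY sold DESC LIMIT 2) AS t
       ORDER BY t.price ASC"""
)]

# Single ORDER BY with two keys: price is only used to break ties on sold.
flat = [row[0] for row in conn.execute(
    "SELECT product_name FROM products ORDER BY sold DESC, price ASC"
)]

print(nested)
print(flat)
```

On this data the two forms give different results: the subquery form returns only the best sellers re-sorted by price, while the two-key ORDER BY returns every row in sold order.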

SQL Distinct - Get all values

Thanks for looking, I'm trying to get 20 entries from the database randomly and unique, so the same one doesn't appear twice. But I also have a questionGroup field, which should also not appear twice. I want to make that field distinct, but then get the ID of the field selected.
Below is my NOT WORKING script; it fails because DISTINCT applies to the id as well:
SELECT DISTINCT `questionGroup`,`id`
FROM `questions`
WHERE `area`='1'
ORDER BY rand() LIMIT 20
Any advice is greatly appreciated!
Thanks
Try doing the group by/distinct first in a subquery:
select *
from (select distinct `questionGroup`,`id`
from `questions`
where `area`='1'
) qc
order by rand()
limit 20
I see . . . What you want is to select a random row from each group, and then limit it to 20 groups. This is a harder problem. I'm not sure if you can do this accurately with a single query in mysql, not using variables or outside tables.
Here is an approximation:
select *
from (select q.`questionGroup`,
coalesce(max(case when rand()*num < 1 then id end), min(id)) as id
from `questions` q join
(select questionGroup, count(*) as num
from questions
group by questionGroup
) qg
on qg.questionGroup = q.questionGroup
where `area`='1'
group by q.questionGroup
) qc
order by rand()
limit 20
This uses rand() to select ids, picking on average one per grouping (but it is random, so sometimes 0, 1, 2, etc.). It chooses the max() of those picked. If none are picked, it takes the minimum.
This will be slightly biased away from the maximum id (or minimum, if you switch the min's and max's in the equation). For most applications, I'm not sure that this bias would make a big difference. In other databases that support ranking functions, you can solve the problem directly.
Something like this
SELECT DISTINCT *
FROM (
SELECT `questionGroup`,`id`
FROM `questions`
WHERE `area`='1'
ORDER BY rand()
) As q
LIMIT 20