MySQL select AVG, ORDER BY, GROUP BY & LIMIT - mysql

The statement below does not work, but I can't seem to figure out why:
select AVG(delay_in_seconds) from A_TABLE ORDER by created_at DESC GROUP BY row_type limit 1000;
I want to get the average of the most recent 1000 rows for each row_type. created_at is of type DATETIME and row_type is of type VARCHAR.

If you only want the 1000 most recent rows, regardless of row_type, and then get the average of delay_in_seconds for each row_type, that's a fairly straightforward query. For example:
SELECT t.row_type
, AVG(t.delay_in_seconds)
FROM (
SELECT r.row_type
, r.delay_in_seconds
FROM A_table r
ORDER BY r.created_at DESC
LIMIT 1000
) t
GROUP BY t.row_type
I suspect, however, that this query does not satisfy the requirements that were specified. (I know it doesn't satisfy what I understood as the specification.)
If what we want is the average of the most recent 1000 rows for each row_type, that would also be fairly straightforward... if we were using a database that supported analytic functions.
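For reference, here is a rough sketch of what that could look like with an analytic (window) function such as ROW_NUMBER(); this is standard SQL, and MySQL itself only gained support for window functions in version 8.0:
SELECT v.row_type
     , AVG(v.delay_in_seconds)
  FROM ( SELECT t.row_type
              , t.delay_in_seconds
              , ROW_NUMBER() OVER (PARTITION BY t.row_type ORDER BY t.created_at DESC) AS row_
           FROM A_table t
       ) v
 WHERE v.row_ <= 1000
 GROUP BY v.row_type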
Unfortunately, MySQL (prior to 8.0) doesn't provide support for analytic functions. It is possible to emulate one in MySQL, but the syntax is a bit involved, and it depends on behavior that is not guaranteed.
As an example:
SELECT s.row_type
, AVG(s.delay_in_seconds)
FROM (
SELECT @row_ := IF(@prev_row_type = t.row_type, @row_ + 1, 1) AS row_
, @prev_row_type := t.row_type AS row_type
, t.delay_in_seconds
FROM A_table t
CROSS
JOIN (SELECT @prev_row_type := NULL, @row_ := NULL) i
ORDER BY t.row_type DESC, t.created_at DESC
) s
WHERE s.row_ <= 1000
GROUP
BY s.row_type
NOTES:
The inline view query is going to be expensive for large sets. What that's effectively doing is assigning a row number to each row. The ORDER BY sorts the rows in descending sequence by created_at; what we want is for the most recent row to be assigned a value of 1, the next most recent 2, and so on. This numbering of rows is repeated for each distinct value of row_type.
For performance, we'd want a suitable index with leading columns (row_type, created_at, delay_in_seconds) to avoid an expensive "Using filesort" operation. We need at least the first two of those columns for that; including delay_in_seconds makes it a covering index (the query can be satisfied entirely from the index).
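As a concrete sketch, such an index could be added like this (the index name here is just illustrative):
ALTER TABLE A_table
  ADD INDEX A_table_IX1 (row_type, created_at, delay_in_seconds);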
The outer query then runs against the resultset returned from the inline view query (a "derived table"). The predicate in the WHERE clause filters out all rows that were assigned a row number greater than 1000; the rest is a straightforward GROUP BY and an AVG aggregate.
A LIMIT clause is entirely unnecessary. It may be possible to incorporate some additional predicates for some additional performance enhancement; for example, what if we wanted the most recent 1000 rows, but only those with created_at within the past 30 or 90 days?
(I'm not entirely sure this answers the question that OP was asking. What this answers is: Is there a query that can return the specified resultset, making use of AVG aggregate and GROUP BY, ORDER BY and LIMIT clauses.)
N.B. This query is dependent on a behavior of MySQL user-defined variables which is not guaranteed.
The query above shows one approach, but there is also another. It's possible to use a "join" operation (of A_table with A_table) to get a row number assigned, by getting a COUNT of the number of rows that are "more recent" than each row. With large sets, however, that can produce a humongous intermediate result if we aren't careful to limit it.
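Here is a rough sketch of that second idea, expressed as a correlated COUNT of the "more recent" rows rather than a literal self-join; it assumes created_at values don't tie heavily within a row_type, and it can be very expensive on large sets for exactly the reason given above:
SELECT a.row_type
     , AVG(a.delay_in_seconds)
  FROM A_table a
 WHERE ( SELECT COUNT(*)
           FROM A_table b
          WHERE b.row_type = a.row_type
            AND b.created_at > a.created_at
       ) < 1000
 GROUP BY a.row_type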

Write the ORDER BY at the end of the statement.
SELECT AVG(delay_in_seconds) from A_TABLE GROUP BY row_type ORDER by created_at DESC limit 1000;
Read the MySQL dev site for details.

Related

Optimizing query with LIMIT mysql

I have a table, let's call it mytable, which holds a huge amount of data that I need to query based on some column values of types varchar and datetime (none of these columns is indexed, and I cannot use the primary key for this query).
I need to fetch the data with pagination, for which I am using variables varLimit and varOffset. What I have noticed after much experimentation is that although LIMIT varLimit optimizes the query when the result count is high, it severely reduces performance when varLimit is greater than the result count. If the query returns 0 rows, with LIMIT 20 applied it takes 30 seconds more than it does with the LIMIT removed!
Here's my query
SELECT `data`
FROM mytable
WHERE (conditions...)
ORDER BY `heure` desc LIMIT varLimit OFFSET varOffset;
To optimize this, I have to first recalculate varLimit, setting it to the minimum of the result count and itself (varLimit = 20, but if the query returns 10 rows, it should set varLimit = 10). The final code becomes:
SELECT COUNT(*) INTO varCount
FROM mytable
WHERE (conditions...);
SELECT LEAST(varLimit, varCount - varOffset) INTO varLimit; -- Assume varOffset <= varCount
SELECT `data`
FROM mytable
WHERE (conditions...)
ORDER BY `heure` desc LIMIT varLimit OFFSET varOffset;
Is there any way to do it in a single query, or a better way to achieve the same?
Unfortunately you cannot use variables in LIMIT and OFFSET clauses. They must be constants, so you must do this limit/offset computation either in application code or by creating a MySQL "prepared statement" with string concatenation.
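A minimal sketch of the prepared-statement route, inside the stored routine that already holds varLimit and varOffset (the WHERE conditions are left as the question's placeholder):
SET @sql = CONCAT('SELECT `data` FROM mytable ',
                  'WHERE (conditions...) ',
                  'ORDER BY `heure` DESC ',
                  'LIMIT ', varLimit, ' OFFSET ', varOffset);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;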

How to improve performance getting recent records to display in a list (top 5 most recent)

I'm making a sample "recent" screen that displays a list, with id set as the primary key.
I have the correct query working as expected, but with a big amount of data the table can cause slow performance issues.
This is the sample query below:
SELECT DISTINCT H.id, -- (Primary Key)
H.partnerid as PartnerId,
H.partnername AS partner, H.accountname AS accountName,
H.accountid as AccountNo
FROM myschema.mytransactionstable H
INNER JOIN (
SELECT S.accountid, S.partnerid, S.accountname,
max(S.transdate) AS maxDate
from myschema.mytransactionstable S
group by S.accountid, S.partnerid, S.accountname
) ms ON H.accountid = ms.accountid
AND H.partnerid = ms.partnerid
AND H.accountname =ms.accountname
AND H.transdate = maxDate
WHERE H.accountid = ms.accountid
AND H.partnerid = ms.partnerid
AND H.accountname = ms.accountname
AND H.transdate = maxDate
GROUP BY H.partnerid,H.accountid, H.accountname
ORDER BY H.id DESC
LIMIT 5
In my case, there are values which are similar in the selected columns but differ only in their ids.
Below is a link to an image taken without executing the query above; these are all the records that have not yet been filtered.
Sample result query click here
I only want to get the 5 most recent rows by their id, but the other columns (accountname, accountid, partnerid) can contain similar values.
I already have the correct query, but I want to improve its performance. Any suggestions for improving the query?
You can try using row_number(), a window function available in MySQL 8.0 and later:
select * from
(
  select *, row_number() over (order by transdate desc) as rn
  from myschema.mytransactionstable
) A
where rn <= 5
Don't repeat ON and WHERE clauses. Use ON to say how the tables (or subqueries) are "related"; use WHERE for filtering (that is, which rows to keep). Probably in your case, all the WHERE should be removed.
Please provide SHOW CREATE TABLE
This 'composite' index would probably help, since it deals with both the subquery and the JOIN:
INDEX(partnerid, accountid, accountname, transdate)
That would also avoid a separate sort for the GROUP BY.
But then the ORDER BY is different, so it cannot avoid a sort.
This might avoid the sort without changing the result set ordering: ORDER BY partnerid, accountid, accountname, transdate DESC
Please provide EXPLAIN SELECT ... and EXPLAIN FORMAT=JSON SELECT ... if you have further questions.
If we cannot get an index to handle the WHERE, GROUP BY, and ORDER BY, the query will generate all the rows before seeing the LIMIT 5. If the index does work, then the outer query will stop after 5 -- potentially a big savings.
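As a sketch, the composite index suggested above could be added like this (the index name is just illustrative):
ALTER TABLE myschema.mytransactionstable
  ADD INDEX ix_partner_account_transdate (partnerid, accountid, accountname, transdate);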

MySQL ORDER BY from subquery lost by GROUP BY

I have a table x :
id  lang  externalid
1   nl    10
2   nl    11
3   fr    10
From this table I want all the rows for a certain lang and externalid; if the externalid doesn't exist for this lang, I want the row with any other lang.
The subquery sorts the table correctly, but when I add the GROUP BY, the sort order of the subquery is lost. This works in older MySQL versions but not in 5.7.
SELECT * FROM
(
SELECT
*
FROM
x
ORDER BY FIELD(lang, "fr") DESC, id
)
as y
group by externalid
I want the query to return the records with id 2 & 3. So for each distinct external id, if possible the lang = 'fr', else any other lang.
How can I solve this problem?
You are talking of a given externalid and lang. Hence there is no need to group by externalid; use a mere WHERE clause instead.
Combined with ORDER BY and LIMIT you get the record you want (i.e. the desired language if such a record exists, else another one).
select *
from mytable
where externalid = 10
order by lang = 'fr' desc
limit 1;
UPDATE: Okay, according to your comment you want to get the "best" record per externalid. In standard SQL you'd use ROW_NUMBER for this. Other DBMSs have further solutions, e.g. Oracle's KEEP FIRST or PostgreSQL's DISTINCT ON. MySQL doesn't support any of these. One way would be to emulate ROW_NUMBER with variables. Another would be to use the above query as a subquery per externalid to find the best records:
select *
from mytable
where id in
(
select
(
select m.id
from mytable m
where m.externalid = e.externalid
order by m.lang = 'fr' desc
limit 1
) as best_id
from (select distinct externalid from mytable) e
);
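For completeness, here is a rough sketch of the variable-based ROW_NUMBER emulation mentioned above; like all such queries, it depends on user-variable evaluation behavior that MySQL does not guarantee:
SELECT y.id
     , y.lang
     , y.externalid
  FROM ( SELECT x.id
              , x.lang
              , x.externalid
              , @rn := IF(@prev_ext = x.externalid, @rn + 1, 1) AS rn
              , @prev_ext := x.externalid AS prev_ext
           FROM x
          CROSS JOIN (SELECT @rn := 0, @prev_ext := NULL) init
          ORDER BY x.externalid, FIELD(x.lang, 'fr') DESC, x.id
       ) y
 WHERE y.rn = 1;
For the sample data this returns the rows with id 2 and 3.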
Your subquery generates a result set (a virtual table) that's passed to your outer query.
All SQL queries, without exception, generate their results in unpredictable order unless you specify the order completely in an ORDER BY clause.
Unpredictable is like random, except worse. Random implies you'll get a different order every time you run the query. Unpredictable means you'll get the same order every time, until you don't.
MySQL ordinarily ignores ORDER BY clauses in subqueries (there are a few exceptions, mostly related to subquery LIMIT clauses). Move your ORDER BY to the top level query.
Edit. You are also misusing MySQL's notorious nonstandard extension to GROUP BY.

mysql "group by" fields orders

I have a question about using "group by" in MySQL: does the order of the GROUP BY columns affect query efficiency?
1.SELECT SQL_NO_CACHE `er_ct`, `appve` FROM TBL_547 WHERE UAEWA_ts >= 1417276800 AND UAEWA_ts <= 1417449540 GROUP BY `appve`, `er_ct` ORDER BY `c79fd348-cc8e-41f2-ae93-0b2b2cde8a31` DESC limit 5;
2.SELECT SQL_NO_CACHE `er_ct`, `appve` FROM TBL_547 WHERE UAEWA_ts >= 1417276800 AND UAEWA_ts <= 1417449540 GROUP BY `er_ct`,`appve` ORDER BY `c79fd348-cc8e-41f2-ae93-0b2b2cde8a31` DESC limit 5;
The difference between the two statements is "GROUP BY appve, er_ct" versus "GROUP BY er_ct, appve". There is no index (combined index) on appve or er_ct. The value of "SELECT COUNT(DISTINCT er_ct) FROM TBL_547" is 7000. The value of "SELECT COUNT(DISTINCT appve) FROM TBL_547" is 3.
Here is the screenshot: http://i.stack.imgur.com/AeQy2.png
The structure: http://i.stack.imgur.com/ewgAy.png
Thanks.
Creating an index on a column used in GROUP BY will not by itself boost your results. When you run a query, the SQL statement first gets compiled into a tree of relational algebra operations. These operations each take one or more tables as input and produce another table as output. Then, using that output table, the SQL engine applies any remaining operations:
- aggregation (GROUP BY)
- sorting
So you can speed up your query mostly by:
- writing smart queries, e.g. filtering only on indexed columns.
- ensuring your result set is not huge, and not accessing all joined table columns; SELECT * is total overkill in production.
I would also recommend SQL Tuning as further reading. I hope my answer helps.
The first thing that pops into my mind is the number of distinct values in both columns; you mentioned 3 and 7k, and I assume that is the main factor.
When the query optimizer (they are changing all the time) sees that the first grouping column is small, it will just go with the flow, but if it sees that the first column is large (7k distinct values), it may decide to build up an index on it. That operation on a large column could be slow, and that's why you see two different times for the two queries.

How does MySQL process ORDER BY and LIMIT in a query?

I have a query that looks like this:
SELECT article FROM table1 ORDER BY publish_date LIMIT 20
How does ORDER BY work? Will it order all records, then get the first 20, or will it get 20 records and order them by the publish_date field?
If it's the last one, you're not guaranteed to really get the most recent 20 articles.
It will order first, then get the first 20. A database will also process anything in the WHERE clause before ORDER BY.
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
All details on: http://dev.mysql.com/doc/refman/5.0/en/select.html
Just as @James says, it will order all records, then get the first 20 rows.
Since that is so, you are guaranteed to get the first 20 published articles; the newer ones will not be shown.
In your situation, I recommend that you add DESC to ORDER BY publish_date if you want the newest articles; then the newest article will be first.
If you need to keep the result in ascending order, and still only want the 10 newest articles, you can ask MySQL to sort your result twice.
The query below sorts the result descending and limits it to 10 (that is the query inside the parentheses). That result is still in descending order, and we are not satisfied with that, so we ask MySQL to sort it one more time. Now we have the newest result on the last row.
select t.article
from
(select article, publish_date
from table1
order by publish_date desc limit 10) t
order by t.publish_date asc;
If you need all columns, it is done this way:
select t.*
from
(select *
from table1
order by publish_date desc limit 10) t
order by t.publish_date asc;
I use this technique when I manually write queries to examine the database for various things. I have not used it in a production environment, but when I benchmarked it, the extra sorting did not noticeably impact performance.
You could add [asc] or [desc] at the end of the order by to get the earliest or latest records
For example, this will give you the latest records first
ORDER BY stamp DESC
Append the LIMIT clause after ORDER BY
If there is a suitable index, in this case on the publish_date field, then MySQL need not scan the whole index to get the 20 records requested - the 20 records will be found at the start of the index. But if there is no suitable index, then a full scan of the table will be needed.
There is a MySQL Performance Blog article from 2009 on this.
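A sketch of adding such an index and then checking the plan (the index name is illustrative); with the index in place, EXPLAIN should typically no longer report "Using filesort" for this query:
ALTER TABLE table1 ADD INDEX ix_publish_date (publish_date);
EXPLAIN SELECT article FROM table1 ORDER BY publish_date LIMIT 20;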
You can use this code
SELECT article FROM table1 ORDER BY publish_date LIMIT 0,10
where 0 is the starting offset of the records and 10 is the number of records to return
LIMIT is usually applied as the last operation, so the result will first be sorted and then limited to 20. In fact, sorting will stop as soon as the first 20 sorted results are found.
Could be simplified to this (note that FETCH FIRST ... ROWS ONLY is standard SQL syntax; MySQL itself does not support it and uses LIMIT instead):
SELECT article FROM table1 ORDER BY publish_date DESC FETCH FIRST 20 ROWS ONLY;
You can also put multiple columns in the ORDER BY, comma separated, like: ORDER BY publish_date, tab2, tab3 DESC, etc.