The goal of this query is to get a total of unique records (most recent per IP, by IP) per ref ID.
SELECT COUNT(DISTINCT ip), GROUP_CONCAT(ref.id)
FROM `sess` sess
JOIN `ref` USING(row_id)
WHERE sess.time BETWEEN '2010-04-21 00:00:00' AND '2010-04-21 23:59:59'
GROUP BY ref.id
ORDER BY sess.time DESC
The query works fine, but it's using a temporary table. Any ideas?
row_id is the primary key on both tables; sess.time, sess.ip, and ref.id are all indexed.
I'm having trouble understanding how this query makes sense. Why do you use GROUP_CONCAT(ref.id) if you have GROUP BY ref.id? There can be only one value for ref.id per group by definition.
Also you ORDER BY sess.time even though sess could have multiple values for time per group. Which row in the group do you want to use for sorting?
I agree that a query that invokes a temporary table usually has a performance issue in MySQL. The temporary table often writes to disk, so you get an expensive disk I/O as part of the grouping & sorting.
Could you edit your question and show the table definitions (SHOW CREATE TABLE output would be best)? Also please describe what the query is supposed to represent. Then we will have a better chance of giving you some suggestions about how to rewrite it.
It's probably using a temporary table because of the GROUP_CONCAT. But is that a problem really? Is the query too slow or do you simply dislike temporary tables?
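Putting the comments above together (one ref.id per group, so GROUP_CONCAT is redundant, and the ORDER BY is ambiguous), a minimal sketch of the cleaned-up query. This uses Python's sqlite3 purely for illustration; the thread is about MySQL, and the sample data is made up:

```python
import sqlite3

# Sketch of the simplified query: count distinct IPs per ref.id for one day.
# Table/column names follow the question; data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref  (row_id INTEGER PRIMARY KEY, id INTEGER);
    CREATE TABLE sess (row_id INTEGER, ip TEXT, time TEXT);
    INSERT INTO ref  VALUES (1, 100), (2, 200);
    INSERT INTO sess VALUES
        (1, '1.1.1.1', '2010-04-21 09:00:00'),
        (1, '1.1.1.1', '2010-04-21 10:00:00'),  -- same IP, counted once
        (1, '2.2.2.2', '2010-04-21 11:00:00'),
        (2, '3.3.3.3', '2010-04-21 12:00:00');
""")
rows = conn.execute("""
    SELECT ref.id, COUNT(DISTINCT sess.ip) AS unique_ips
    FROM sess
    JOIN ref USING (row_id)
    WHERE sess.time >= '2010-04-21 00:00:00'
      AND sess.time <  '2010-04-22 00:00:00'
    GROUP BY ref.id
""").fetchall()
print(sorted(rows))  # [(100, 2), (200, 1)]
```

Using a half-open range (`>= start AND < next day`) instead of BETWEEN also sidesteps the question of sub-second timestamps at 23:59:59.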
I have a table with nearly 30M records, 6.6 GB in size. I need to query some data from it using GROUP BY and ORDER BY. The query takes too long, and I've lost the connection to the DB many times...
I have indexes on all the necessary fields, both single-column and composite. What else can I do to make the query faster?
Example query:
select id, max(price), avg(order) from table group by id, date order by id, location.
Use EXPLAIN query, where query is your query. For example: EXPLAIN select * from table group by id, date order by id, location.
You'll see a table where MySQL analyzes your query and shows which indexes it considers. Possibly you don't have sufficient (good enough) indexes.
I don't think you can. With no filter (WHERE clause) and an AVG, the entire table has to be read.
The only thing I can think of is to have a new table with ID, AVG_ORDER, MAX_PRICE (or whatever you need) and update that using a trigger or stored procedure when you insert/update new rows.
An index on (ID, PRICE) might help you if you didn't need that pesky average.
Indexing isn't going to do you any good. You're averaging a column, so you have to read every row in the table. That's going to take time.
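The summary-table idea suggested above can be sketched with a trigger. This uses sqlite3 for illustration only (MySQL trigger syntax differs slightly), and all table and column names here are hypothetical:

```python
import sqlite3

# Hypothetical sketch: a trigger keeps a running MAX(price) per id in a
# summary table, so reads never have to GROUP BY over the big base table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (id INTEGER, price REAL);
    CREATE TABLE summary (id INTEGER PRIMARY KEY, max_price REAL);
    CREATE TRIGGER orders_ai AFTER INSERT ON orders
    BEGIN
        -- first insert for this id seeds the summary row
        INSERT OR IGNORE INTO summary VALUES (NEW.id, NEW.price);
        -- later inserts only bump the max when needed
        UPDATE summary SET max_price = NEW.price
         WHERE id = NEW.id AND NEW.price > max_price;
    END;
    INSERT INTO orders VALUES (1, 10.0), (1, 25.0), (2, 5.0);
""")
rows = conn.execute("SELECT id, max_price FROM summary ORDER BY id").fetchall()
print(rows)  # [(1, 25.0), (2, 5.0)]
```

AVG would need a slightly richer summary (a running sum and a row count) since an average can't be updated from the new value alone.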
I have this query (I didn't write it) that was working fine for a client until the table got more than a few thousand rows in it; now it's taking 40+ seconds on only 4200 rows.
Any suggestions on how to optimize it and get the same result?
I've tried a few other methods, but they didn't return the correct result that this slower query does...
SELECT COUNT(*) AS num
FROM `fl_events`
WHERE id IN(
SELECT DISTINCT (e2.id)
FROM `fl_events` AS e1, fl_events AS e2
WHERE e1.startdate >= now() AND e1.startdate = e2.startdate
)
ORDER BY `startdate`
Any help would be greatly appreciated!
Apart from the obvious indexes needed, I don't really see why you are joining the table with itself to build the IN condition. The ORDER BY is also not needed on a single-row COUNT. Are you sure your query can't be written just like this?:
SELECT COUNT(*) AS num
FROM `fl_events` AS e1
WHERE e1.startdate >= now()
I don't think rewriting the query will help. The key to your question is "until the table got more than a few thousand rows." This implies that important columns aren't indexed. Prior to a certain number of records, all the data fit in a single memory block - beyond that point, it takes more than one block, and an index is the only way to speed up the search.
First, check that the ID in fl_events is actually marked as a primary key. That physically orders the records, and without it you can see data corruption and occasionally super-slow results. The use of DISTINCT in the query makes it look like ID might NOT be unique, which would pose a problem.
Then make sure to add an index on startdate.
The slowness is probably related to the join of the event table with itself, and possibly startdate not having an index.
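For what it's worth, the two forms can be checked against each other on toy data. The inner DISTINCT e2.id picks exactly the events sharing a startdate with some future event, which are the future events themselves. A sqlite3 sketch with a fixed cutoff standing in for now() so the result is deterministic:

```python
import sqlite3

# Compare the original self-join query with the simplified filter.
# A fixed cutoff replaces now() so the outcome is reproducible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fl_events (id INTEGER PRIMARY KEY, startdate TEXT);
    INSERT INTO fl_events VALUES
        (1, '2010-01-01'), (2, '2020-01-01'),
        (3, '2020-01-01'), (4, '2030-01-01');
""")
cutoff = '2015-06-01'  # stand-in for now()
slow = conn.execute("""
    SELECT COUNT(*) FROM fl_events WHERE id IN (
        SELECT DISTINCT e2.id
        FROM fl_events e1, fl_events e2
        WHERE e1.startdate >= ? AND e1.startdate = e2.startdate)
""", (cutoff,)).fetchone()[0]
fast = conn.execute(
    "SELECT COUNT(*) FROM fl_events WHERE startdate >= ?", (cutoff,)
).fetchone()[0]
print(slow, fast)  # 3 3
```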
I basically have two tables, a 'server' table and a 'server_ratings' table. I need to optimize the current query that I have (It works but it takes around 4 seconds). Is there any way I can do this better?
SELECT ROUND(AVG(server_ratings.rating), 0), server.id, server.name
FROM server LEFT JOIN server_ratings ON server.id = server_ratings.server_id
GROUP BY server.id;
Query looks ok, but make sure you have proper indexes:
on id column in server table - probably primary key,
on server_id column in server_ratings table,
If that does not help, then add a rating column to the server table and recalculate it on a regular basis (see this answer about cron jobs). This way you save the time spent on the calculation. It can be done separately, e.g. every minute, but less frequent recalculation is probably enough (depending on how dynamic your data is).
Also make sure you query the proper table - in the question you mentioned a servers table, but the code references server. Probably a typo :)
This should be slightly faster, because the aggregate is computed first, so fewer rows take part in the JOIN.
SELECT s.id, s.name, r.avg_rating
FROM server s
LEFT JOIN (
SELECT server_id, ROUND(AVG(rating), 0) AS avg_rating
FROM server_ratings
GROUP BY server_id
) r ON r.server_id = s.id
But the major point is matching indexes. Primary keys are indexed automatically; make sure you have an index on server_ratings.server_id, too.
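A quick sanity check that the derived-table form returns the same rows as the direct join. This uses sqlite3 for illustration only; the table names come from the question, the data is made up:

```python
import sqlite3

# Direct LEFT JOIN + GROUP BY vs. pre-aggregating in a derived table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE server (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE server_ratings (server_id INTEGER, rating INTEGER);
    CREATE INDEX idx_ratings_server ON server_ratings (server_id);
    INSERT INTO server VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO server_ratings VALUES (1, 4), (1, 5), (2, 2);
""")
direct = conn.execute("""
    SELECT s.id, s.name, ROUND(AVG(r.rating), 0)
    FROM server s LEFT JOIN server_ratings r ON s.id = r.server_id
    GROUP BY s.id ORDER BY s.id
""").fetchall()
derived = conn.execute("""
    SELECT s.id, s.name, r.avg_rating
    FROM server s
    LEFT JOIN (SELECT server_id, ROUND(AVG(rating), 0) AS avg_rating
               FROM server_ratings GROUP BY server_id) r
           ON r.server_id = s.id
    ORDER BY s.id
""").fetchall()
print(direct == derived)  # True
```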
I'd like to know which of the followings would execute faster in MySQL database. The table would have 200 - 1000 entries.
SELECT id
from TABLE
order by id desc
limit 1
or
SELECT count(id)
from TABLE
The story is that the table is cached, so this query is executed before every cache retrieval, to determine whether the cached data is stale by comparing against the previous value.
So if there exists an even less expensive query, please kindly let me know. Thanks.
If you
start from 1
never have any gaps
use the MyISAM engine
id is not nullable
Then the 2nd could run [ever so marginally] faster, since MyISAM keeps the row count in table metadata and doesn't have to visit the table data at all.
Otherwise,
if the table has NO index on ID (causing a SCAN), the 2nd one is faster
Barring both the above
the first one is faster
And if you actually meant to ask about SELECT .. LIMIT 1 vs SELECT MAX(id), then the answer is that they are the same for MySQL and most sane DBMSs, whether or not there is an index.
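That last equivalence is easy to verify on toy data (sqlite3 here purely for illustration; the equivalence holds in MySQL too):

```python
import sqlite3

# ORDER BY id DESC LIMIT 1 and MAX(id) pick the same row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY);
    INSERT INTO t VALUES (1), (5), (3);
""")
top = conn.execute("SELECT id FROM t ORDER BY id DESC LIMIT 1").fetchone()[0]
mx = conn.execute("SELECT MAX(id) FROM t").fetchone()[0]
print(top, mx)  # 5 5
```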
I think the first query will run faster, as it is limited to one row; 200-1000 rows may not matter much in this case.
As already pointed out in the comments, your table is so small that it really doesn't matter what your solution will be. For this reason, select count(id) should be used, as it expresses the intent and doesn't need any further processing.
Now, select count(id) comes with an alternative: select count(*). These two are not synonyms. select count(*) counts the number of rows (and will use a cached value if possible), while select count(id) counts the number of non-null values in the column id. If the id column is declared NOT NULL, the cached row count may be used.
The selection between count(*) and count(id) depends once again on your intent. In the general case, count(*) describes the intent better.
Then there is the possibility of count(1), which is actually a synonym of count(*) in MySQL, but the interpretation may vary if you end up using a different RDBMS.
The performance of each type of count also varies depending on whether you are using MyISAM or InnoDB: row counts are cached in the former but not in the latter, if I've understood correctly.
In the end, you should rely on query plans and on running tests and measuring their performance rather than on these general ramblings.
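The count(*) vs count(id) distinction is easy to demonstrate (sqlite3 here for illustration; the NULL-skipping semantics are the same in MySQL):

```python
import sqlite3

# count(*) counts rows; count(id) skips rows where id is NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER);
    INSERT INTO t VALUES (1), (NULL), (3);
""")
star = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
by_id = conn.execute("SELECT COUNT(id) FROM t").fetchone()[0]
print(star, by_id)  # 3 2
```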
I have a MySQL query:
SELECT DISTINCT
c.id,
c.company_name,
cd.firstname,
cd.surname,
cis.description AS industry_sector
FROM (clients c)
JOIN clients_details cd ON c.id = cd.client_id
LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
WHERE c.record_type='virgin'
ORDER BY date_action, company_name asc, id desc
LIMIT 30
The clients table has about 60-70k rows and has indexes on id, record_type, date_action, and company_name - unfortunately the query still takes 5+ seconds to complete. Removing the ORDER BY reduces this to about 30ms, since no filesort is required. Is there any way I can alter this query to improve on the 5+ second response time?
See: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Especially:
In some cases, MySQL cannot use indexes to resolve the ORDER BY (..). These cases include the following:
(..)
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
You have an index for id, record_type, date_action. But if you want to order by date_action, you really need an index that has date_action as the first field in the index, preferably matching the exact fields in the order by. Otherwise yes, it will be a slow query.
Without seeing all your tables and indexes, it's hard to tell. When asking a question about speeding up a query, the query is just part of the equation.
Does clients have an index on id?
Does clients have an index on record_type?
Does clients_details have an index on client_id?
Does clients_industry_sectors have an index on id?
These are the minimum you need for this query to have any chance of working quickly.
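The checklist above can be sketched end-to-end. This uses sqlite3 purely for illustration (the thread is about MySQL); table and column names follow the question, the data is made up:

```python
import sqlite3

# Minimum indexes from the checklist: clients.id and
# clients_industry_sectors.id as primary keys, plus explicit indexes on
# clients.record_type and clients_details.client_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, company_name TEXT,
                          record_type TEXT, date_action TEXT);
    CREATE TABLE clients_details (client_id INTEGER, firstname TEXT,
                                  surname TEXT, industry_sector_id INTEGER);
    CREATE TABLE clients_industry_sectors (id INTEGER PRIMARY KEY,
                                           description TEXT);
    CREATE INDEX idx_clients_record_type ON clients (record_type);
    CREATE INDEX idx_details_client ON clients_details (client_id);
    INSERT INTO clients VALUES (1, 'Acme', 'virgin', '2010-01-01'),
                               (2, 'Initech', 'active', '2010-01-02');
    INSERT INTO clients_details VALUES (1, 'Jo', 'Bloggs', 10),
                                       (2, 'Sam', 'Smith', NULL);
    INSERT INTO clients_industry_sectors VALUES (10, 'Retail');
""")
rows = conn.execute("""
    SELECT DISTINCT c.id, c.company_name, cd.firstname, cd.surname,
           cis.description AS industry_sector
    FROM clients c
    JOIN clients_details cd ON c.id = cd.client_id
    LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
    WHERE c.record_type = 'virgin'
    ORDER BY c.date_action, c.company_name ASC, c.id DESC
    LIMIT 30
""").fetchall()
print(rows)  # [(1, 'Acme', 'Jo', 'Bloggs', 'Retail')]
```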
Thanks so much for the input and suggestions. In the end I decided to create a new DB table whose sole purpose is to serve results for this query, so no joins are required; I just update it when records are added to or deleted from the master clients table. Not ideal from a data-storage point of view, but it solves the problem and results come back fantastically fast. :)