MySQL select count(distinct id) is very slow - mysql

I have a MyISAM table with about 70 million records. When I run select count(distinct id), it takes about 80 seconds to return the result. The table is de-normalized, which is why I need the unique count of id, and it has to be computed dynamically. If I add a where clause, the query takes between 4 and 90 seconds, depending on the range I give.
I'm wondering if there is any way I can optimize this to improve the query speed.

Try SELECT SQL_CALC_FOUND_ROWS DISTINCT(id) FROM ..., and then run SELECT FOUND_ROWS() AS Total to fetch the result.

Related

Group by with sort on more than million rows

I have a table with more than a million rows. Each of them has a license number.
The query I have right now groups by license_no and sorts by count(Distinct(type)), count(license_no) and date.
All the fields involved (license_no and date) are indexed.
But it's taking 5 seconds to return the results.
How do I speed this up? Ideally the results should not take more than a second.
Query:
SELECT `license_no`,
COUNT(DISTINCT(type)) AS gdid,
COUNT(id) AS cdid,
max(updated_on) as maxdate
FROM `mytable`
WHERE `license_no` >0
GROUP BY `license_no`
ORDER BY `gdid` DESC, `cdid` DESC, maxdate DESC LIMIT 12
Logic I want to implement:
I have a list of cars (million + records).
I want to find all unique cars (unique by license_no)
sorted by :
license_no which has max count of different types
license_no which has max total counts
finally sort individual records by latest date.
The only way you are going to make this run fast is by pre-aggregating. You can do this using triggers on mytable. Your writes will be a little slower, but the query above will only need to scan a much smaller table.
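A minimal sketch of what such pre-aggregation could look like. The summary table mytable_summary, the trigger name and the column types are assumptions for illustration, not part of the original schema; it also assumes type and updated_on are never NULL, and it covers inserts only (updates and deletes would need their own triggers).

```sql
-- Hypothetical summary table: one row per (license_no, type) pair.
-- Column types are guesses; match them to mytable.
CREATE TABLE mytable_summary (
  license_no     INT          NOT NULL,
  type           VARCHAR(64)  NOT NULL,
  cnt            INT UNSIGNED NOT NULL,
  max_updated_on DATETIME     NOT NULL,
  PRIMARY KEY (license_no, type)
);

-- One-time backfill from the existing rows.
INSERT INTO mytable_summary (license_no, type, cnt, max_updated_on)
SELECT license_no, type, COUNT(*), MAX(updated_on)
FROM mytable
GROUP BY license_no, type;

-- Keep the summary current on every insert into mytable.
CREATE TRIGGER mytable_after_insert AFTER INSERT ON mytable
FOR EACH ROW
  INSERT INTO mytable_summary (license_no, type, cnt, max_updated_on)
  VALUES (NEW.license_no, NEW.type, 1, NEW.updated_on)
  ON DUPLICATE KEY UPDATE
    cnt = cnt + 1,
    max_updated_on = GREATEST(max_updated_on, NEW.updated_on);

-- The report then reads only the summary table.
SELECT license_no,
       COUNT(*)            AS gdid,   -- distinct types per license_no
       SUM(cnt)            AS cdid,   -- total rows per license_no
       MAX(max_updated_on) AS maxdate
FROM mytable_summary
WHERE license_no > 0
GROUP BY license_no
ORDER BY gdid DESC, cdid DESC, maxdate DESC
LIMIT 12;
```

Because the summary holds one row per (license_no, type), COUNT(*) per group stands in for COUNT(DISTINCT type), and SUM(cnt) stands in for COUNT(id) as long as id is never NULL.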

How to find the repeat value in a mysql table with 30 million rows

In MySQL, I have a table with two columns (id, uuid), into which I inserted 30 million rows. (PS: the uuid values can repeat.)
Now I want to find the repeated values with a SQL query, but the query takes too long.
I want to search the whole table, but that takes too long, so I tried querying the first million rows, which took 8 seconds.
Then I tried with 10 million rows, which took 5 minutes,
and with 20 million rows the server seemed to die.
select count(uuid) as cnt
from uuid_test
where id between 1
and 1000000
group by uuid having cnt > 1;
Can anyone help me optimize this SQL? Thanks.
Try this query,
SELECT uuid, count(*) cnt FROM uuid_test GROUP BY 1 HAVING cnt>1;
Hope it helps.
Often the fastest way to find duplicates uses a correlated subquery rather than aggregation:
select ut.*
from uuid_test ut
where exists (select 1
              from uuid_test ut2
              where ut2.uuid = ut.uuid and
                    ut2.id <> ut.id
             );
This can take advantage of an index on uuid_test(uuid, id).
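For reference, the composite index mentioned above could be created as follows. The index name idx_uuid_id is a made-up placeholder; only the column order (uuid, id) comes from the answer.

```sql
-- Composite index so the EXISTS lookup can be answered from the index alone.
-- idx_uuid_id is a hypothetical name.
CREATE INDEX idx_uuid_id ON uuid_test (uuid, id);
```

With uuid as the leading column, each probe of the subquery can seek directly to the matching uuid and compare id values without reading the table rows.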

Slow performance with OR clause

I have a MyISAM Table with circa 1 million rows.
SELECT * FROM table WHERE colA='ABC' AND (colB='123' OR colC='123');
The query above takes over 10 seconds to run. All columns in question are indexed.
But when I split it as follows...
SELECT * FROM table WHERE colA='ABC' AND colB='123';
SELECT * FROM table WHERE colA='ABC' AND colC='123';
Each individual query takes 0.002 seconds.
What gives, and how do I optimize the table/query?
( SELECT * FROM table WHERE colA='ABC' AND colB='123' )
UNION DISTINCT
( SELECT * FROM table WHERE colA='ABC' AND colC='123' )
;
And have
INDEX(colA, colB),
INDEX(colA, colC)
You should consider moving to InnoDB, though it may not matter to this particular question.
Here's how the UNION will work:
Perform each SELECT. They will be very efficient due to the indexes suggested.
Collect the results in a tmp table.
De-dup the temp table and deliver the resulting rows.
All the rows are 'touched' in the original query (with OR).
With the UNION:
Only the necessary rows are touched in the SELECTs.
Those rows are written to the tmp table.
Those rows are reread. (The de-dupping may involve touching the rows more than once.)
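As a sketch, the two composite indexes suggested above could be added and the plan checked like this. The index names are placeholders, and the table name is backtick-quoted here because table is a reserved word in MySQL.

```sql
-- Hypothetical index names; the column pairs are the ones suggested above.
ALTER TABLE `table`
  ADD INDEX idx_cola_colb (colA, colB),
  ADD INDEX idx_cola_colc (colA, colC);

-- Inspect the plan: each SELECT should be able to use its own index.
EXPLAIN
SELECT * FROM `table` WHERE colA = 'ABC' AND colB = '123'
UNION DISTINCT
SELECT * FROM `table` WHERE colA = 'ABC' AND colC = '123';
```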

SELECT COUNT(*) with GROUP BY slow for large table

I have a table with around 100 million rows consisting of three columns (all INT):
id | c_id | l_id
Even though I use indices, even a basic
select count(*), c_id
from table
group by c_id;
takes 16 seconds (MyISAM) to 25 seconds (InnoDB) to complete.
Is there any way to speed up this process without tracking the count in a separate table (e.g. by using triggers)?
/edit: all columns have indices
See the execution plans on SqlFiddle for the possible ways to write the same query.
On the test set I have provided, SELECT COUNT(id) will be faster if c_id is not indexed;
otherwise you should use COUNT(*), since the index optimization may not be used in the query.
It also depends on the number of rows in the table and on the storage engine, since MySQL takes both into account when choosing a plan.
You should always check the execution plan of a query before running it, by putting EXPLAIN in front of the SELECT.
I have to say that in most cases on big datasets, COUNT(*) and COUNT(id) should result in the same execution plan.
It's not the COUNT(*) that causes the performance issue, but the grouping over 100 million rows.
You should add an index on the c_id column.
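A short sketch of the two suggestions in these answers, adding the index and checking the plan with EXPLAIN. The index name idx_c_id is an assumption, and the table name is backtick-quoted because table is a reserved word in MySQL.

```sql
-- Hypothetical index name; skip this if c_id is already indexed.
CREATE INDEX idx_c_id ON `table` (c_id);

-- Check how MySQL plans the grouped count.
EXPLAIN
SELECT c_id, COUNT(*)
FROM `table`
GROUP BY c_id;
```

If the Extra column of the EXPLAIN output shows "Using index", the query is being answered from the c_id index without reading the table rows. It still has to walk every index entry, so the grouping cost over 100 million rows remains.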

Speed of two queries versus one query but limit output

I am running a query where I need to know the total number of rows in a table but only need to show the first 6.
So, is it faster to run select count(*) and then select * ... limit 6 and print the data returned? Or to run select * with no limit and put a counter in the while loop that prints the results? With the latter I can obviously use mysql_num_rows to get the total.
The table in question will contain up to 1 million rows; the query includes where row = xxx, and that column will be indexed.
Use FOUND_ROWS(). Here's an example:
SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name WHERE id > 100 LIMIT 10;
SELECT FOUND_ROWS();
Do two queries. Your count query will use an index and will not have to scan the whole table, only the index. The second query will only have to read the 6 rows from the table.
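The two-query approach might look like the sketch below. The column name category_id and the value 42 are hypothetical stand-ins for the indexed column and the xxx value in the question's where clause.

```sql
-- Query 1: total number of matching rows, answered from the index.
SELECT COUNT(*) FROM tbl_name WHERE category_id = 42;

-- Query 2: only the 6 rows that will actually be displayed.
SELECT * FROM tbl_name WHERE category_id = 42 LIMIT 6;
```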