I have a MySQL query:
SELECT date(FROM_UNIXTIME(time)) AS date,
       count(view) AS views
FROM `table_1`
WHERE `address` = 1
GROUP BY date(FROM_UNIXTIME(time))
where
view: auto increment and primary key, int(11)
address: index, int(11)
time: index, int(11)
The table has about 270k rows in total.
This query executes slowly; in mysql-slow.log I got:
Query_time: 1.839096
Lock_time: 0.000042
Rows_sent: 155
Rows_examined: 286435
The EXPLAIN output looks like this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table_1 ref address address 5 const 139138 Using where; Using temporary; Using filesort
How can I improve this query to speed up its execution? Would it be better to convert the date in PHP? I think taking the timestamp in PHP, converting it to a human-readable date, and doing the grouping there would take more time than a single MySQL query. Does anybody know how to make this query faster?
When you apply the functions date() and FROM_UNIXTIME() to time in the GROUP BY, you kill any indexing benefit you may have on that field.
Adding a date column is the only way I can see to speed this up if you need the results grouped by day. Without it, you'll need to decrease the overall set you are trying to group; you could add start/end dates to limit the date range, which would reduce the number of rows being transformed and grouped.
You should consider adding an additional DATE column to your table and indexing it, as in the sketch below.
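A minimal sketch of that suggestion, using the names from the question (the new view_date column and the index name are made up for illustration):

ALTER TABLE `table_1` ADD COLUMN `view_date` DATE;

-- One-time backfill from the existing unix timestamp:
UPDATE `table_1` SET `view_date` = DATE(FROM_UNIXTIME(`time`));

-- A composite index matching the WHERE and GROUP BY:
ALTER TABLE `table_1` ADD INDEX `idx_addr_date` (`address`, `view_date`);

-- Grouping on the plain column lets MySQL read the index in order
-- instead of falling back to a temporary table and filesort:
SELECT `view_date` AS `date`, count(`view`) AS views
FROM `table_1`
WHERE `address` = 1
GROUP BY `view_date`;

New rows then need view_date kept in sync, e.g. from the application or a BEFORE INSERT trigger.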
For a table with two non-null columns: id (primary), and date (indexed), I get the following entry in the mysql-slow log.
# Query_time: 16.316747 Lock_time: 0.000049 Rows_sent: 1 Rows_examined: 616021
SET timestamp=1451837371;
select max(date) from mytable where id<896173;
I ran EXPLAIN on this query, and this is the outcome.
id = 1
select_type = SIMPLE
table = mytable
type = range
possible_keys = PRIMARY
key = PRIMARY
key_len = 4
ref = NULL
rows = 337499
Extra = Using where
I tried editing the date index to add the id column to it. However, the number of rows examined is still high. What can I do to reduce this number?
The engine needs to look at all rows where id < 896173 and select the max(date) from them. Having one index on date and another on id does not really help. Granted, MySQL could use the index on date to identify only a subset of rows; however, that subset is big enough that it is faster to read all the rows with sequential access than to read only the subset with random access.
I suggest using a more selective index, the inverse of yours: a composite index on (id, date). That way id drives the selection and the date field supports it, as in the sketch below.
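A minimal sketch of that composite index (the index name is made up; the table and columns are the question's):

ALTER TABLE mytable ADD INDEX idx_id_date (id, date);

-- The range scan on id and the MAX(date) can now both be answered from
-- the index alone (EXPLAIN should show "Using index"), with no random
-- access to the table rows:
SELECT MAX(date) FROM mytable WHERE id < 896173;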
I wanted to find all hourly records that have a successor in a ~5m row table.
I tried:
SELECT DISTINCT (date_time)
FROM my_table
JOIN (SELECT DISTINCT (DATE_ADD( date_time, INTERVAL 1 HOUR)) date_offset
FROM my_table) offset_dates
ON date_time = date_offset
and
SELECT DISTINCT(date_time)
FROM my_table
WHERE date_time IN (SELECT DISTINCT(DATE_ADD(date_time, INTERVAL 1 HOUR))
FROM my_table)
The first one completes in a few seconds; the second hangs for hours.
I can understand that the former is better, but why such a huge performance gap?
-------- EDIT ---------------
Here are the EXPLAIN for both queries
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1710 Using temporary
1 PRIMARY my_table ref PRIMARY PRIMARY 8 offset_dates.date_offset 555 Using index
2 DERIVED my_table index NULL PRIMARY 13 NULL 5644204 Using index; Using temporary
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY my_table range NULL PRIMARY 8 NULL 9244 Using where; Using index for group-by
2 DEPENDENT SUBQUERY my_table index NULL PRIMARY 13 NULL 5129983 Using where; Using index; Using temporary
In general, a query using a join will perform better than an equivalent query using IN (...), because the former can take advantage of indexes while the latter can't; the entire IN list must be scanned for each row which might be returned.
(Do note that some database engines perform better than others in this case; for example, SQL Server can produce equivalent performance for both types of queries.)
You can see what the MySQL query optimizer intends to do with a given SELECT query by prepending EXPLAIN to the query and running it. This will give you, among other things, a count of rows the engine will have to examine for each step in a query; multiply these counts to get a rough estimate of the overall number of rows the engine will have to visit. Applied to the plans above, that is roughly 1710 × 555 ≈ 950K row visits for the JOIN version versus roughly 9244 × 5129983 ≈ 47 billion for the dependent subquery, which matches the observed gap.
I would prefix both queries by explain, and then compare the difference in the access plans. You will probably find that the first query looks at far fewer rows than the second.
But my hunch is that the JOIN is applied more immediately than the WHERE clause. So, in the WHERE clause you are getting every record from my_table, applying an arithmetic function, and then sorting them because select distinct usually requires a sort and sometimes it creates a temporary table in memory or on disk. The # of rows examined is probably the product of the size of each table.
But in the JOIN clause, a lot of the rows that are being examined and sorted in the WHERE clause are probably eliminated beforehand. You probably end up looking at far fewer rows... and the database probably takes easier measures to accomplish it.
But I think this post answers your question best: SQL fixed-value IN() vs. INNER JOIN performance
The IN clause is usually slow for huge tables. As far as I remember, for the second statement you posted, MySQL will simply loop through all rows of my_table (unless you have an index there), checking each row against the WHERE clause. In general, IN is treated as a set of OR clauses, one per element of the set.
That's why, I think, the temporary table that is created in the background of the JOIN query is faster.
Here are some helpful links about that:
MySQL Query IN() Clause Slow on Indexed Column
inner join and where in() clause performance?
http://explainextended.com/2009/08/18/passing-parameters-in-mysql-in-list-vs-temporary-table/
Another thing to consider is that, with the IN style, very little future optimization is possible compared to the JOIN. With the join you can possibly add an index which, depending on the data set, might speed things up by 2, 5, or 10 times. With the IN, it's simply going to run that subquery.
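A hedged aside beyond the answers above: the IN form can often be made index-friendly by moving the DATE_ADD onto the outer column, for example with EXISTS (the table and column names are the question's):

-- A row qualifies if some row exists exactly one hour earlier, so each
-- EXISTS probe becomes a single index lookup on date_time rather than a
-- scan of a transformed set:
SELECT DISTINCT t.date_time
FROM my_table t
WHERE EXISTS (SELECT 1
              FROM my_table s
              WHERE s.date_time = t.date_time - INTERVAL 1 HOUR);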
I'm using a MySQL database. I have sales data in one table, and I've created an index on the date column, OrderDate.
The data is retrieved quickly when we use a query like
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE OrderDate BETWEEN '2012-01-08' AND '2012-05-08';
But fetching the details for a particular quarter gets slow and scans the whole table.
The query is like
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE Quarter(OrderDate) = Quarter('2012-01-08')
AND Year(OrderDate) = Year('2012-01-08');
Is there any way to index the quarter function, or any other way to speed up data retrieval for a quarter?
Explain statement:
For 1st Query
id Selecttype table type possible_keys key key_len ref rows Extra
1 SIMPLE SalesData range csrv csrv 4 138 Using where
For 2nd Query (scanning 104785 rows)
id Selecttype table type possible_keys key key_len ref rows Extra
1 SIMPLE SalesData ALL 104785 Using where
You found the solution yourself, why don't you use it?
Just calculate the boundary dates for the quarter in question and use them with BETWEEN:
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE OrderDate BETWEEN '2012-01-01' AND '2012-03-31';
You can calculate the boundary dates in your application, or also via MySQL as shown in this article:
http://use-the-index-luke.com/sql/where-clause/obfuscation/dates?db_type=mysql
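For example, a sketch of computing the boundaries directly in MySQL (MAKEDATE and the QUARTER interval unit are stock MySQL; the half-open range keeps the comparison index-friendly):

SELECT CustID
      ,CustName
      ,CustPhone
FROM SalesData
WHERE OrderDate >= MAKEDATE(YEAR('2012-01-08'), 1)
                   + INTERVAL (QUARTER('2012-01-08') - 1) QUARTER
  AND OrderDate <  MAKEDATE(YEAR('2012-01-08'), 1)
                   + INTERVAL QUARTER('2012-01-08') QUARTER;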
There is no way to index anyFunction(OrderDate) in MySQL, unless you store it separately.
I think you can handle this more efficiently in two ways:
1- Migrate from MySQL to MariaDB, which has computed/virtual columns: you can create virtual columns such as OrderDateQuarter and OrderDateYear and index them without much overhead.
(MariaDB is a community-developed fork of MySQL; I think it is much better than stock MySQL.)
2- You can store OrderDateQuarter and OrderDateYear in extra columns and index them; a trigger that fills both columns from OrderDate makes this easy, as in the sketch below.
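A minimal sketch of option 2 (the index and trigger names are made up; an UPDATE trigger would also be needed if OrderDate can change):

ALTER TABLE SalesData
  ADD COLUMN OrderDateYear SMALLINT,
  ADD COLUMN OrderDateQuarter TINYINT,
  ADD INDEX idx_year_quarter (OrderDateYear, OrderDateQuarter);

CREATE TRIGGER salesdata_bi BEFORE INSERT ON SalesData
FOR EACH ROW
  SET NEW.OrderDateYear = YEAR(NEW.OrderDate),
      NEW.OrderDateQuarter = QUARTER(NEW.OrderDate);

-- The quarter query then becomes a plain indexed lookup:
SELECT CustID, CustName, CustPhone
FROM SalesData
WHERE OrderDateYear = Year('2012-01-08')
  AND OrderDateQuarter = Quarter('2012-01-08');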
I have a simple key-value table with two fields, created like so:
CREATE TABLE `mytable` (
`key` varchar(255) NOT NULL,
`value` double NOT NULL,
KEY `MYKEY` (`key`)
);
The keys are not unique. The table contains over one million records. I need a query that sums the values for each key and returns the top 10 keys. Here's my attempt:
SELECT t.key, SUM(t.value) value
FROM mytable t
GROUP BY t.key
ORDER BY value DESC
LIMIT 0, 10;
But this is very slow. The thing is, without the GROUP BY and SUM it's pretty fast, and without the ORDER BY it's very fast; for some reason the combination of the two makes it very, very slow. Can anyone explain why this is so, and how it can be sped up?
There is no index on value. I tried creating one but it didn't help.
EXPLAIN EXTENDED produces the following in Workbench:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t index NULL MYKEY 257 NULL 1340532 100.00 "Using temporary; Using filesort"
There are about 400K unique keys in the table.
The query takes over 3 minutes to run; I don't know exactly how long, because I stopped it at the 3-minute mark. However, if I remove the index on key, it runs in 30 seconds! Does anyone have any idea why?
The only way to really speed this up, as far as I can see, is to create a separate table with unique keys in it and maintain the total values there. Then you will be able to index the totals to retrieve the top ten quickly, and the calculation will already be done. As long as the table is not updated in too many places, this shouldn't be a major problem; see the sketch below.
The major problem with this type of query is that the GROUP BY requires reading in one order while the ORDER BY requires sorting into a different order.
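A minimal sketch of that summary table (the table, index, and trigger names are made up for illustration):

CREATE TABLE mytable_totals (
  `key` varchar(255) NOT NULL PRIMARY KEY,
  total double NOT NULL,
  KEY idx_total (total)
);

-- One-time backfill from the existing data:
INSERT INTO mytable_totals (`key`, total)
SELECT `key`, SUM(`value`) FROM mytable GROUP BY `key`;

-- Keep the totals current as rows are inserted:
CREATE TRIGGER mytable_ai AFTER INSERT ON mytable
FOR EACH ROW
  INSERT INTO mytable_totals (`key`, total)
  VALUES (NEW.`key`, NEW.`value`)
  ON DUPLICATE KEY UPDATE total = total + NEW.`value`;

-- The top ten is then a simple index-backed read:
SELECT `key`, total FROM mytable_totals ORDER BY total DESC LIMIT 10;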
Using MySQL (5.1.66), EXPLAIN says it will scan just 72 rows, while the slow log reports the whole table was scanned (Rows_examined: 5476845).
How is this possible? I can't figure out what's wrong with the query.
name is a string column with a unique index, and
date is just a regular int index.
This is the EXPLAIN
EXPLAIN SELECT *
FROM table
WHERE name LIKE 'The%Query%'
ORDER BY date DESC
LIMIT 3;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table index name date 4 NULL 72 Using where
Output from Slow Log
# Query_time: 5.545731 Lock_time: 0.000083 Rows_sent: 1 Rows_examined: 5476845
SET timestamp=1360007079;
SELECT * FROM table WHERE name LIKE 'The%Query%' ORDER BY date DESC LIMIT 3;
The rows value that is returned from an EXPLAIN is an estimate of the number of rows that have to be examined to find results that match your query.
If you look, you will see that the key being chosen for the query execution is date, which is probably being picked because of your ORDER BY clause. Because the key being used in the query is unrelated to your WHERE clause, that's probably why the estimate is getting messed up. Even though your WHERE clause is doing a LIKE on the name column, the optimizer may decide not to use an index at all:
Sometimes MySQL does not use an index, even if one is available. One
circumstance under which this occurs is when the optimizer estimates
that using the index would require MySQL to access a very large
percentage of the rows in the table. (In this case, a table scan is
likely to be much faster because it requires fewer seeks.) source
In short, the optimizer is choosing not to use the name key, even though it would be the one that limits the number of rows to be returned. You can try forcing that index to see if it improves performance, as in the sketch below.
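A minimal sketch of that, using the index name from possible_keys in the EXPLAIN (FORCE INDEX is stock MySQL syntax; `table` is backquoted here because TABLE is a reserved word):

SELECT *
FROM `table` FORCE INDEX (name)
WHERE name LIKE 'The%Query%'
ORDER BY date DESC
LIMIT 3;

-- The leading literal 'The' allows a range scan on the name index; the
-- trade-off is an explicit sort for ORDER BY date DESC instead of
-- reading the date index in order.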