I have a problem with a MySQL query. Currently it runs in approximately 700-1300 ms, which is definitely too slow. The table has about 1 million rows.
Select *
from `numbers`
where `code` = ?
and `id` not in (?)
and `open` = ?
and `views` >= ?
and `last_visit` >= ?
and `last_visit` <= ?
order by `views` desc
limit 24
Fields:
"id" => mediumint, primary, unique
"code" => smallint, not unique, unsigned, not null
"open" => tinyint, unsigned, not null
"views" => smallint, unsigned, not null
How can I get more performance out of this query? Or should I try something else, like caching the results? Thanks!
I created a test table and filled it with 1 million rows. I used random data.
Here's the EXPLAIN for your query with no indexes defined on the table other than the primary key. You didn't say you had any other indexes, so I assumed you had none.
id: 1
select_type: SIMPLE
table: numbers
partitions: NULL
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: NULL
rows: 523516
filtered: 0.04
Extra: Using where; Using filesort
The estimated rows examined is about 50% of the 1 million rows. Also the result must be sorted, indicated by "Using filesort." Both of these contribute to the query being expensive.
I created the following index:
alter table numbers add key bk1 (code,open,views,last_visit);
The reason for the order of these columns in this index is:
First, the code and open columns, which are referenced in equality conditions.
Next, the views column, because it's the order you want for the result; reading the index in this order eliminates the need for a filesort.
Last, the last_visit column is also helpful: the storage engine may be able to do some pre-filtering on it (this is called index condition pushdown).
There's no need to add id to the index, because InnoDB secondary indexes implicitly have the primary key appended to the end of the list of columns.
Now the EXPLAIN for the same query is as follows:
id: 1
select_type: SIMPLE
table: numbers
partitions: NULL
type: range
possible_keys: PRIMARY,bk1
key: bk1
key_len: 10
ref: NULL
rows: 527
filtered: 5.56
Extra: Using where; Backward index scan; Using index
Notice it's no longer doing a filesort, because it's able to read the table in index order.
The backward index scan note is a new thing in MySQL 8.0. If you use an earlier version of MySQL, you may not see this.
Also the number of rows examined is reduced by a factor of 1000. Your results may vary.
Are you using the right indexes?
Create a composite index with the fields you filter on:
CREATE INDEX index_name
ON numbers(id, code, open, last_visit, views);
And you can execute an EXPLAIN query to check that the query is using the index:
Explain Select * from `numbers` ...
Related
I am having a pretty severe performance problem when comparing or manipulating two datetime columns in a MySQL table. I have a table tbl that's got around 10 million rows. It looks something like this:
CREATE TABLE `tbl` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`last_interaction` datetime NOT NULL DEFAULT '1970-01-01 00:00:00',
`last_maintenence` datetime NOT NULL DEFAULT '1970-01-01 00:00:00',
PRIMARY KEY (`id`),
KEY `last_maintenence_idx` (`last_maintenence`),
KEY `last_interaction_idx` (`last_interaction`)
) ENGINE=InnoDB AUTO_INCREMENT=12389814 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
When I run this query, it takes about a minute to execute:
SELECT id FROM tbl
WHERE last_maintenence < last_interaction
ORDER BY last_maintenence ASC LIMIT 200;
Describing the query renders this:
id: 1
select_type: SIMPLE
table: tbl
type: index
possible_keys: NULL
key: last_maintenence_idx
key_len: 5
ref: NULL
rows: 200
Extra: Using where
It looks like MySQL isn't finding/using the index on last_interaction. Does anybody know why that might be and how to trigger it?
That is because MySQL normally uses only one index per table, and your query, though it looks like a simple one, can't make good use of either single-column index. The solution would be to create a composite index involving both columns.
The query you provide needs a full table scan.
Each and every record has to be checked to match your criterion last_maintenence < last_interaction.
So, no index is used when searching.
The index last_maintenence_idx is used only because you want the results ordered by it.
That's easy: mistake no. 1.
MySQL can (mostly) only use one index per query, and the optimizer picks whichever one it estimates is better.
So you must create a composite index with both fields.
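Just as a sketch of what the answers above suggest (the index name is my own placeholder): MySQL still can't seek on a column-to-column comparison, but with both datetime columns in one composite index it can evaluate the WHERE condition and the ORDER BY from the index entries alone, instead of reading full table rows.
-- Sketch only: composite index covering both datetime columns
-- (InnoDB appends the primary key id implicitly, so SELECT id is covered too).
ALTER TABLE tbl ADD KEY idx_maint_inter (last_maintenence, last_interaction);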
First, sorry if the terms I use are not right. I'm not a MySQL professional.
I have a table like this :
CREATE TABLE `accesses` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`time` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_accesses_on_created_at` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=9278483 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
This table has 10,000,000 rows in it. I use it to generate charts, with queries like this:
SELECT SUM(time) AS value, DATE(created_at) AS date
FROM `accesses`
GROUP BY date;
This query takes very long (more than 1 minute). I'm running lots of other queries (with AVG, MIN or MAX instead of SUM, with a WHERE on a specific day or month, or with GROUP BY HOUR(created_at), etc.).
I want to optimize it.
The best idea I have is to add several columns, with redundancy, like DATE(created_at), HOUR(created_at), MONTH(created_at), and then add indexes on them.
Is this solution good, or is there a better one?
Regards
Yes, it can be an optimization to store data redundantly in permanent columns with an index to optimize certain queries. This is one example of denormalization.
Depending on the amount of data and the frequency of queries, this can be an important speedup (@Marshall Tigerus downplays it too much, IMHO).
I tested this out by running EXPLAIN:
mysql> explain SELECT SUM(time) AS value, DATE(created_at) AS date FROM `accesses` GROUP BY date\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: accesses
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using temporary; Using filesort
Ignore the fact that the table is empty in my test. The important part is Using temporary; Using filesort which are expensive operations, especially if your temp table gets so large that MySQL can't fit it in memory.
I added some columns and indexes on them:
mysql> alter table accesses add column cdate date, add key (cdate),
add column chour tinyint, add key (chour),
add column cmonth tinyint, add key (cmonth);
mysql> explain SELECT SUM(time) AS value, cdate FROM `accesses` GROUP BY cdate\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: accesses
partitions: NULL
type: index
possible_keys: cdate
key: cdate
key_len: 4
ref: NULL
rows: 1
filtered: 100.00
Extra: NULL
The temporary table and filesort went away, because MySQL knows it can do an index scan to process the rows in the correct order.
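As a side note, if you're on MySQL 5.7 or later, stored generated columns are one way to keep such redundant columns in sync automatically instead of maintaining them by hand. A minimal sketch, assuming the timestamp column is named created_at:
-- Sketch only (MySQL 5.7+): stored generated columns derived from created_at.
ALTER TABLE accesses
ADD COLUMN cdate DATE AS (DATE(created_at)) STORED,
ADD COLUMN chour TINYINT AS (HOUR(created_at)) STORED,
ADD COLUMN cmonth TINYINT AS (MONTH(created_at)) STORED;
-- Index them the same way as the manually maintained columns above.
ALTER TABLE accesses ADD KEY (cdate), ADD KEY (chour), ADD KEY (cmonth);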
I have two tables, app and pricehistory.
There is a primary index id on app, which is an int.
On pricehistory I have three fields, id_app (int), price (float) and dateup (date), and a unique index on (id_app, dateup).
I'm trying to get the latest (by date) price of an app:
select app.id,
( select price
from pricehistory
where id_app=app.id
order by dateup desc limit 1)
from app
where id=147
The EXPLAIN is kind of weird, because it returns 1 row but it still does a filesort:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY app const PRIMARY PRIMARY 4 const 1
2 DEPENDENT SUBQUERY pricehistory ref id_app,id_app_2,id_app_3 id_app 4 const 1 Using where; Using filesort
Why does it need to filesort when there is only 1 row? And why is it filesorting when I'm indexing everything it needs (id_app and dateup)?
app has 1 million rows and I'm using InnoDB.
edit: an SQL fiddle demonstrating the problem:
http://sqlfiddle.com/#!2/085027/1
edit3 :
a new fiddle with another query that has the same problem:
http://sqlfiddle.com/#!2/f7682/6
edit4: this fiddle (http://sqlfiddle.com/#!2/2785c/2) shows that the proposed query doesn't work, because it selects all the data from pricehistory just to fetch the rows I want
Here's a quick rule of thumb for which order columns should go in an index:
Columns referenced in the WHERE clause with an equality condition (=).
Choose one of:
a. Columns referenced in the ORDER BY clause.
b. Columns referenced in a GROUP BY clause.
c. Columns referenced in the WHERE clause with a range condition (!=, >, <, IN, BETWEEN, IS [NOT] NULL).
Columns referenced in the SELECT-list.
See How to Design Indexes, Really.
In this case, I was able to remove the filesort with this index:
mysql> alter table pricehistory add key bk1 (id_app, dateup, price_fr);
And here's the EXPLAIN, showing no filesort, and the improvement of "Using index":
mysql> explain select price_fr from pricehistory where id_app=1 order by dateup desc\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: pricehistory
type: ref
possible_keys: bk1
key: bk1
key_len: 4
ref: const
rows: 1
Extra: Using where; Using index
You can make this index UNIQUE if you want to.
I had to drop the other unique keys, to avoid confusing the optimizer.
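For reference, that cleanup might look roughly like the following. The key names come from the possible_keys list in the EXPLAIN above and are assumptions about the actual schema, so adjust them before running anything.
-- Sketch only: drop the redundant unique keys, then make bk1 unique if desired.
ALTER TABLE pricehistory DROP KEY id_app_2;
ALTER TABLE pricehistory DROP KEY id_app_3;
ALTER TABLE pricehistory DROP KEY bk1;
ALTER TABLE pricehistory ADD UNIQUE KEY bk1 (id_app, dateup, price_fr);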
The two UNIQUE KEYs are causing the problem. I changed your fiddle to the following, and it works without a filesort:
CREATE TABLE IF NOT EXISTS `pricehistory` (
`id_app` int(10) unsigned NOT NULL,
`dateup` date NOT NULL,
`price_fr` float NOT NULL DEFAULT '-1',
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_app` (`id_app`,`dateup`,`price_fr`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=345 ;
INSERT INTO pricehistory
(id_app, price_fr,dateup)
VALUES
('1', '4.99', now()),
('2', '0.45', now());
The EXPLAIN gives:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA
1 SIMPLE pricehistory ref id_app id_app 4 const 1 Using where; Using index
There's no reason to use a UNIQUE KEY on both (id_app,dateup) and (id_app,price_fr,dateup), as they are redundant. I'm pretty confident that redundancy is making MySQL somehow uncertain of itself so that it errs on the side of doing a filesort.
The solution is to remove the UNIQUE from one of the indexes. It seems that if it's not useful, it's better not to add the UNIQUE keyword.
Thanks to both of you.
edit:
damn, with a different query involving 2 tables, the filesort is back:
http://sqlfiddle.com/#!2/f7682/6
I'm trying to optimize a MySQL table for faster reads. The ratio of reads to writes is about 100:1, so I'm willing to sacrifice write performance by using multiple indexes.
The relevant fields of my table are the following, and it contains about 200,000 records:
CREATE TABLE `publications` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
-- omitted fields
`publication_date` date NOT NULL,
`active` tinyint(1) NOT NULL DEFAULT '0',
`position` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
-- these are just attempts, they are not production indexes
KEY `publication_date` (`publication_date`),
KEY `publication_date_2` (`publication_date`,`position`,`active`)
) ENGINE=MyISAM;
Since I'm using Ruby on Rails to access data in this table I've defined a default scope for this table which is
default_scope where(:active => true).order('publication_date DESC, position ASC')
i.e. every query on this table will by default be completed automatically with the following SQL fragment, so you can assume that almost all queries will have these conditions:
WHERE `publications`.`active` = 1 ORDER BY publication_date DESC, position
So I'm mainly interested in optimizing this kind of query, plus queries with publication_date in the WHERE condition.
I tried the following indexes in various combinations (also with several of them defined at the same time):
`publication_date`
`publication_date`,`position`
`publication_date`,`position`,`active`
However, a simple query like this one still doesn't use the index properly and uses a filesort:
SELECT `publications`.* FROM `publications`
WHERE `publications`.`active` = 1
AND (id NOT IN (35217,35216,35215,35218))
ORDER BY publication_date DESC, position
LIMIT 8 OFFSET 0
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: publications
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 34903
Extra: Using where; Using filesort
1 row in set (0.00 sec)
Some considerations on my issue:
According to the MySQL documentation, a composite index can't be used for ordering when you mix ASC and DESC in the ORDER BY clause.
active is a boolean flag, so putting it in a standalone index makes no sense (it has just 2 possible values), but it's always used in the WHERE clause, so it should appear somewhere in an index to avoid Using where in Extra.
position is an integer with few possible values, and it's always used scoped to publication_date, so I think it's useless to have it in a standalone index.
Lots of queries use publication_date in the WHERE clause, so it can be useful to have it in a standalone index too, even if that's redundant with it being the first column of the composite index.
One problem is that you are mixing sort orders in the ORDER BY clause. You could invert your position (inverted_position = max_position - position) so that you can also invert the sort order on that column.
You can then create a compound index on (publication_date, inverted_position) and change your ORDER BY clause to publication_date DESC, inverted_position DESC.
The active column should most likely not be part of the index as it has a very low selectivity.
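A rough sketch of that approach is below; the column name inverted_position and the constant 100000 are assumptions of mine, so use any constant at least as large as your largest position value.
-- Sketch only: add and backfill the inverted column.
ALTER TABLE publications ADD COLUMN inverted_position INT DEFAULT NULL;
UPDATE publications SET inverted_position = 100000 - position;
-- Compound index matching the rewritten ORDER BY.
ALTER TABLE publications ADD KEY idx_pubdate_invpos (publication_date, inverted_position);
-- The query can then sort both columns in the same direction:
SELECT * FROM publications
WHERE active = 1
AND (id NOT IN (35217,35216,35215,35218))
ORDER BY publication_date DESC, inverted_position DESC
LIMIT 8;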
I'm having a hard time figuring out how to query/index a database.
The situation is pretty simple. Each time a user visits a category, his/her visit date is stored. My goal is to list the categories in which elements have been added after the user's latest visit.
Here are the two tables:
CREATE TABLE `elements` (
`category_id` int(11) NOT NULL,
`element_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`added_date` datetime NOT NULL,
PRIMARY KEY (`category_id`,`element_id`),
KEY `index_element_id` (`element_id`)
)
CREATE TABLE `categories_views` (
`member_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`view_date` datetime NOT NULL,
PRIMARY KEY (`member_id`,`category_id`),
KEY `index_element_id` (`category_id`)
)
Query:
SELECT
categories_views.*,
elements.category_id
FROM
elements
INNER JOIN categories_views ON (categories_views.category_id = elements.category_id)
WHERE
categories_views.member_id = 1
AND elements.added_date > categories_views.view_date
GROUP BY elements.category_id
Explained:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: elements
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 89057
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories_views
type: eq_ref
possible_keys: PRIMARY,index_element_id
key: PRIMARY
key_len: 8
ref: const,convert.elements.category_id
rows: 1
Extra: Using where
With about 100k rows in each table, the query is taking around 0.3s, which is too long for something that should be executed for every user action in a web context.
If possible, what indexes should I add, or how should I rewrite this query in order to avoid using filesorts and temporary tables?
If each member has a relatively low number of category_views, I suggest testing a different query:
SELECT v.*
FROM categories_views v
WHERE v.member_id = 1
AND EXISTS
( SELECT 1
FROM elements e
WHERE e.category_id = v.category_id
AND e.added_date > v.view_date
)
For optimum performance of that query, you'd want to ensure you had indexes:
... ON elements (category_id, added_date)
... ON categories_views (member_id, category_id)
NOTE: It looks like the primary key on the categories_views table may be (member_id, category_id), which means an appropriate index already exists.
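Spelled out as DDL, the remaining index might look like this (the index name is just a placeholder of mine):
CREATE INDEX idx_elements_cat_added ON elements (category_id, added_date);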
I'm assuming (as best as I can figure out from the original query) that the categories_views table contains only the "latest" view of each category for a user, that is, that (member_id, category_id) is unique. It looks like that has to be the case if the original query is returning a correct result set, i.e. if it only returns categories that have "new" elements added since the user's last view of that category. Otherwise, the existence of any "older" view_date values in the categories_views table would trigger the inclusion of the category, even if there were a newer view_date that was later than the latest (max added_date) element in the category.
If that's not the case, i.e. (member_id,category_id) is not unique, then the query would need to be changed.
The query in the original question is a bit puzzling: it references element_views as a table name or table alias, but that doesn't appear in the EXPLAIN output. I'm going under the assumption that element_views is meant to be a synonym for categories_views.
For the original query, add a covering index on the elements table:
... ON elements (category_id, added_date)
The goal there is to get the explain output to show "Using index"
You might also try adding an index:
... ON categories_views (member_id, category_id, view_date)
To get all the columns from the categories_views table (for the select list), the query is going to have to visit the pages in the table (unless there's an index that contains all of those columns). The goal would be to reduce the number of rows that need to be visited on data pages, by having all (or most) of the predicates satisfied from the index.
Is it necessary to return the category_id column from the elements table? Don't we already know that this is the same value as in the category_id column from the categories_views table, due to the inner join predicate?