Add index to generated column - MySQL

First, sorry if my terms are not right; I'm not a MySQL professional.
I have a table like this:
CREATE TABLE `accesses` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`time` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_accesses_on_created_at` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=9278483 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
This table has 10,000,000 rows in it. I use it to generate charts, with queries like this:
SELECT SUM(time) AS value, DATE(created_at) AS date
FROM `accesses`
GROUP BY date;
This query is very slow (it takes more than 1 minute). I also run lots of other queries (with AVG, MIN, or MAX instead of SUM, with a WHERE on a specific day or month, or with GROUP BY HOUR(created_at), etc.).
I want to optimize them.
The best idea I have is to add several redundant columns, like DATE(created_at), HOUR(created_at), and MONTH(created_at), and then add an index on each of them.
Is this solution good, or is there a better one?
Regards

Yes, it can be an optimization to store data redundantly in permanent columns with an index to optimize certain queries. This is one example of denormalization.
Depending on the amount of data and the frequency of queries, this can be an important speedup (@Marshall Tigerus downplays it too much, IMHO).
I tested this out by running EXPLAIN:
mysql> explain SELECT SUM(time) AS value, DATE(created_at) AS date FROM `accesses` GROUP BY date\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: accesses
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using temporary; Using filesort
Ignore the fact that the table is empty in my test. The important part is Using temporary; Using filesort, which are expensive operations, especially if your temporary table gets so large that MySQL can't fit it in memory.
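If you want to see the thresholds at which MySQL spills an in-memory temporary table to disk, you can inspect the relevant server variables (the in-memory limit is the smaller of the two):
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';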
I added some columns and indexes on them:
mysql> alter table accesses add column cdate date, add key (cdate),
add column chour tinyint, add key (chour),
add column cmonth tinyint, add key (cmonth);
mysql> explain SELECT SUM(time) AS value, cdate FROM `accesses` GROUP BY cdate\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: accesses
partitions: NULL
type: index
possible_keys: cdate
key: cdate
key_len: 4
ref: NULL
rows: 1
filtered: 100.00
Extra: NULL
The temporary table and filesort went away, because MySQL knows it can do an index scan to process the rows in the correct order.
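As an aside, since the question title mentions generated columns: on MySQL 5.7 and later, the cdate column above could instead be declared as a stored generated column, so the server maintains it for you rather than the application populating it. A sketch, using the same column names as above; an index on a stored generated column behaves like any other index, and the optimizer can even use it for queries that repeat the expression:
mysql> alter table accesses
add column cdate date as (date(created_at)) stored,
add key (cdate);
On MySQL 8.0.13+ you could alternatively create a functional index directly: alter table accesses add index ((date(created_at)));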

Related

How to optimize a MySQL query?

I have a problem with a MySQL query. Currently it runs in approximately 700-1300 ms (definitely too slow). The table has about 1 million rows.
select *
from `numbers`
where `code` = ?
  and `id` not in (?)
  and `open` = ?
  and `views` >= ?
  and `last_visit` >= ?
  and `last_visit` <= ?
order by `views` desc
limit 24
Fields:
"id" => mediumint, primary, unique
"code" => smallint, not unique, unsigned, not null
"open" => tinyint, unsigned, not null
"views" => smallint, unsigned, not null
How can I get more performance from this query? Or should I try something else, like caching the results? Thanks!
I created a test table and filled it with 1 million rows. I used random data.
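(For reference, a minimal sketch of how such a test table might be built and filled on MySQL 8.0; the last_visit type and the value ranges are my assumptions, since the question doesn't specify them:)
CREATE TABLE `numbers` (
`id` mediumint unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`code` smallint unsigned NOT NULL,
`open` tinyint unsigned NOT NULL,
`views` smallint unsigned NOT NULL,
`last_visit` date NOT NULL
) ENGINE=InnoDB;
SET SESSION cte_max_recursion_depth = 1000000;
INSERT INTO `numbers` (`code`, `open`, `views`, `last_visit`)
WITH RECURSIVE seq AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM seq WHERE n < 1000000)
SELECT FLOOR(RAND() * 1000), FLOOR(RAND() * 2), FLOOR(RAND() * 10000),
CURDATE() - INTERVAL FLOOR(RAND() * 365) DAY
FROM seq;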
Here's the EXPLAIN for your query with no indexes defined on the table besides the primary key. You didn't say you had any other indexes, so I assumed there were none.
id: 1
select_type: SIMPLE
table: numbers
partitions: NULL
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: NULL
rows: 523516
filtered: 0.04
Extra: Using where; Using filesort
The estimated number of rows examined is about 50% of the 1 million rows. Also, the result must be sorted, as indicated by "Using filesort." Both of these contribute to the query being expensive.
I created the following index:
alter table numbers add key bk1 (code,open,views,last_visit);
The reason for the order of these columns in this index is:
First, the code and open columns, which are referenced in equality conditions.
Next, the views column, because it's the order you want for the result; reading the index in this order eliminates the need for a filesort.
The last_visit column is also helpful: the storage engine may be able to do some pre-filtering with it (this is called index condition pushdown).
There's no need to add id to the index, because InnoDB secondary indexes implicitly have the primary key appended to the end of the list of columns.
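As an aside, index condition pushdown is on by default since MySQL 5.6, and the EXPLAIN Extra column shows "Using index condition" when it is used. A quick check that it's enabled:
select @@optimizer_switch like '%index_condition_pushdown=on%';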
Now the EXPLAIN for the same query is as follows:
id: 1
select_type: SIMPLE
table: numbers
partitions: NULL
type: range
possible_keys: PRIMARY,bk1
key: bk1
key_len: 10
ref: NULL
rows: 527
filtered: 5.56
Extra: Using where; Backward index scan; Using index
Notice it's no longer doing a filesort, because it's able to read the table in index order.
The backward index scan note is a new thing in MySQL 8.0. If you use an earlier version of MySQL, you may not see this.
Also the number of rows examined is reduced by a factor of 1000. Your results may vary.
Are you using the right indexes?
Create a composite index with the fields you filter:
CREATE INDEX index_name
ON numbers(id, code, open, last_visit, views);
And you can execute an EXPLAIN query to check that the query is using the index:
Explain Select * from `numbers` ...

Using a covering index to select records for a certain day

I would like to run these queries:
select url from weixin_kol_status where created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';
and
select url from weixin_kol_status where userid in ('...') and created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';
… using this table definition:
CREATE TABLE `weixin_kol_status` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`url` varchar(512) NOT NULL,
`created_at` datetime NOT NULL,
`title` varchar(512) NOT NULL DEFAULT '',
`text` text,
`attitudes_count` int(11) NOT NULL DEFAULT '0',
`readcount` int(11) NOT NULL DEFAULT '0',
`reposts_count` int(11) NOT NULL DEFAULT '0',
`comments_count` int(11) NOT NULL DEFAULT '0',
`userid` varchar(32) NOT NULL,
`screen_name` varchar(32) NOT NULL,
`type` tinyint(4) NOT NULL DEFAULT '0',
`ext_data` text,
`is_topline` tinyint(4) NOT NULL DEFAULT '0',
`is_business` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_url` (`url`(255)),
KEY `idx_userid` (`userid`),
KEY `idx_name` (`screen_name`),
KEY `idx_created_at` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=328727437 DEFAULT CHARSET=utf8
rows = 328727437;
The queries take several minutes. How can I optimize the queries? How can I use the covering index?
The execution plans are:
explain select id from weixin_kol_status where created_at>='2015-12-11 00:00:00' and created_at<='2015-12-11 23:59:59'\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: weixin_kol_status
type: range
possible_keys: idx_created_at
key: idx_created_at
key_len: 5
ref: NULL
rows: 1433704
Extra: Using where; Using index
1 row in set (0.00 sec)
and
explain select id from weixin_kol_status where created_at='2015-12-11 00:00:00'\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: weixin_kol_status
type: ref
possible_keys: idx_created_at
key: idx_created_at
key_len: 5
ref: const
rows: 1
Extra: Using index
1 row in set (0.00 sec)
But why does the first query show Extra: Using where; Using index, while the second shows only Extra: Using index? Did the first query not use the covering index?
How can I use the covering index?
Do you know what a covering index is? It's an index that contains all the columns that you need for your query. So for
select url from weixin_kol_status where created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';
the minimum covering index would be something like
KEY `idx_created_url` (`created_at`, `url`)
And for
select url from weixin_kol_status where userid in ('...') and created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';
the minimum covering index might be
KEY `idx_created_user_url` (`created_at`, `userid`, `url`)
which would also cover the first query or
KEY `idx_user_created_url` (`userid`, `created_at`, `url`)
which wouldn't work for the first query but may better optimize the second.
You might have to write out a prefix length, like url(255), instead of just url; long VARCHAR columns don't index well. If you get an error about the indexed values being too wide, you may not be able to use a covering index with this query.
A covering index is helpful because it can answer everything from the index in memory without going to the table on disk. Since memory is faster than disk, it has the effect of speeding up the query. Of course, if your index is paged out, you will still have to load it from disk. So if you're memory bound, this may not help.
Note that a query will generally use only one index per table, so the separate single-column indexes won't cover either query. You need a compound index to cover all the needed columns at once.
As a side note, I think that your > and < should be >= and <= respectively. Probably won't make much difference, but you seem to be skipping two seconds a day.
Several issues
UNIQUE(url(255)) constrains the first 255 characters to be unique; this was probably not desired.
If you need to force uniqueness of a long string (url), add another column with MD5(url) and make that column UNIQUE. (Or something like that; see the sketch after this list.)
There is a limit of 767 bytes per column in an index, so if you try to create INDEX(created_at, url), you get INDEX(created_at, url(255)), which is not covering, since not all of url is in the index.
Both EXPLAINs are useless for this discussion since they are not using the SELECTs you are asking about. In the first, it says Using index because you say SELECT id; the actual query is SELECT url. This makes a big difference in performance.
You have a very large table. I see no way for PARTITIONing to help with speed.
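Regarding the MD5(url) suggestion in the list above, here is a minimal sketch; the column and key names are my own, and it assumes MySQL 5.7+ so a stored generated column can keep the hash in sync (on older versions you'd maintain the column from the application or with triggers):
ALTER TABLE weixin_kol_status
ADD COLUMN url_md5 CHAR(32) AS (MD5(url)) STORED,
ADD UNIQUE KEY uk_url_md5 (url_md5),
DROP KEY idx_url;
Note that on a table this size, adding a stored column means a full rebuild, so the same caution about time and disk space applies.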
This is a better way to express a 1-day WHERE:
created_at >= '2015-12-11'
AND created_at < '2015-12-11' + INTERVAL 1 DAY
To speed it up
Here is an off-the-wall technique that should help both queries. Instead of
PRIMARY KEY (`id`),
KEY `idx_created_at` (`created_at`)
Do it this way:
PRIMARY KEY(created_at, id),
INDEX(id)
That will "cluster" on created_at, thereby cutting down on I/O significantly, especially for the first SELECT. Yes, it is OK for an AUTO_INCREMENT to be just INDEXed, not UNIQUE nor PRIMARY KEY.
Caution: making this change will take hours and a lot of disk space.
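For reference, a sketch of that change as a single ALTER; doing it in one statement keeps the AUTO_INCREMENT column id covered by an index throughout:
ALTER TABLE weixin_kol_status
DROP PRIMARY KEY,
ADD PRIMARY KEY (created_at, id),
ADD INDEX (id),
DROP KEY idx_created_at;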

MySQL: the index on a TEXT column doesn't work

I created a table like this:
CREATE TABLE `text_tests` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`text_st_date` text NOT NULL,
`varchar_st_date` varchar(255) NOT NULL DEFAULT '2015-08-25',
`text_id` text NOT NULL,
`varchar_id` varchar(255) NOT NULL DEFAULT '0',
`int_id` int(11) NOT NULL DEFAULT '0',
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_of_text_st_date` (`text_st_date`(50),`id`),
KEY `idx_of_varchar_st_date` (`varchar_st_date`,`id`),
KEY `idx_of_text_id` (`text_id`(20),`id`),
KEY `idx_of_varchar_id` (`varchar_id`,`id`),
KEY `idx_of_int_id` (`int_id`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Then I generate some data using Ruby:
(1..10000).each do |_i|
item = TextTest.new
item.text_st_date = (Time.now + _i.days).to_s
item.varchar_st_date = (Time.now + _i.days).to_s
item.text_id = _i
item.varchar_id = _i
item.int_id = _i
item.save
end
Finally, I try to use the index on the text column, but it doesn't work; it always does a full table scan.
EXPLAIN SELECT id
FROM text_tests
ORDER BY text_st_date DESC
LIMIT 20\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: text_tests
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9797
Extra: Using filesort
1 row in set (0.02 sec)
EXPLAIN SELECT id
FROM text_tests
ORDER BY text_id DESC
LIMIT 20\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: text_tests
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9797
Extra: Using filesort
1 row in set (0.00 sec)
The varchar indexes work fine:
EXPLAIN SELECT id
FROM text_tests
ORDER BY varchar_st_date DESC
LIMIT 20\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: text_tests
type: index
possible_keys: NULL
key: idx_of_varchar_st_date
key_len: 771
ref: NULL
rows: 20
Extra: Using index
1 row in set (0.00 sec)
EXPLAIN SELECT id
FROM text_tests
ORDER BY varchar_id DESC
LIMIT 20\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: text_tests
type: index
possible_keys: NULL
key: idx_of_varchar_id
key_len: 771
ref: NULL
rows: 20
Extra: Using index
1 row in set (0.00 sec)
Why doesn't the index on the text column work, and how can I make use of it?
Indexes aren't much use for queries that return all the rows of the table in the result set. One of their primary purposes is to accelerate WHERE and JOIN ... ON clauses. If your query has no WHERE clause, don't be surprised if the query planner decides to scan the whole table.
Also, your first query does ORDER BY on a text column, but your index only encompasses the first fifty characters of that column. So, to satisfy the query, MySQL has to sort the whole thing. What's more, it has to sort it on disk, because the in-memory temporary table implementation can't handle BLOB or TEXT large objects.
MySQL is very good at handling dates, but you need to tell it that you have dates, not VARCHAR(255) or TEXT.
Use a DATE datatype for date columns! If Ruby won't help you do that, then get rid of Ruby.
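A sketch of one way to migrate, assuming (as in the Ruby script above) that every string starts with 'YYYY-MM-DD'; the new column and index names are made up:
ALTER TABLE text_tests
ADD COLUMN st_date DATE NULL,
ADD KEY idx_of_st_date (st_date, id);
UPDATE text_tests SET st_date = CAST(LEFT(varchar_st_date, 10) AS DATE);
After that, ORDER BY st_date DESC LIMIT 20 should use idx_of_st_date the same way the varchar indexes are used above.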

MySQL using filesort even when using an index and only one row

I have two tables, app and pricehistory.
There is a primary index id on app, which is an int.
On pricehistory I have three fields, id_app (int), price (float), and dateup (date), with a unique index on (id_app, dateup).
I'm trying to get the latest price (by date) of an app:
select app.id,
( select price
from pricehistory
where id_app=app.id
order by dateup desc limit 1)
from app
where id=147
The EXPLAIN is kind of weird, because it returns 1 row but it still does a filesort:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY app const PRIMARY PRIMARY 4 const 1
2 DEPENDENT SUBQUERY pricehistory ref id_app,id_app_2,id_app_3 id_app 4 const 1 Using where; Using filesort
Why does it need a filesort when there is only 1 row? And why is it filesorting when I'm indexing everything it needs (id_app and dateup)?
app has 1 million rows and I'm using InnoDB.
Edit: an SQL fiddle demonstrating the problem:
http://sqlfiddle.com/#!2/085027/1
Edit 3: a new fiddle with another query showing the same problem:
http://sqlfiddle.com/#!2/f7682/6
Edit 4: this fiddle ( http://sqlfiddle.com/#!2/2785c/2 ) shows that the proposed query doesn't work, because it selects all the data from pricehistory just to fetch the rows I want.
Here's a quick rule of thumb for which order columns should go in an index:
Columns referenced in the WHERE clause with an equality condition (=).
Choose one of:
a. Columns referenced in the ORDER BY clause.
b. Columns referenced in a GROUP BY clause.
c. Columns referenced in the WHERE clause with a range condition (!=, >, <, IN, BETWEEN, IS [NOT] NULL).
Columns referenced in the SELECT-list.
See How to Design Indexes, Really.
In this case, I was able to remove the filesort with this index:
mysql> alter table pricehistory add key bk1 (id_app, dateup, price_fr);
And here's the EXPLAIN, showing no filesort, and the improvement of "Using index":
mysql> explain select price_fr from pricehistory where id_app=1 order by dateup desc\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: pricehistory
type: ref
possible_keys: bk1
key: bk1
key_len: 4
ref: const
rows: 1
Extra: Using where; Using index
You can make this index UNIQUE if you want to.
I had to drop the other unique keys, to avoid confusing the optimizer.
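A sketch of that cleanup; the key names are an assumption taken from the possible_keys column of the asker's EXPLAIN, so check SHOW CREATE TABLE for the real ones. Note that dropping the old unique keys removes the uniqueness guarantee on (id_app, dateup), which is another reason you might want to make bk1 UNIQUE instead:
ALTER TABLE pricehistory
DROP KEY id_app,
DROP KEY id_app_2,
DROP KEY id_app_3;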
The two UNIQUE KEYs are causing the problem. I changed your fiddle to the following, and it works without a filesort:
CREATE TABLE IF NOT EXISTS `pricehistory` (
`id_app` int(10) unsigned NOT NULL,
`dateup` date NOT NULL,
`price_fr` float NOT NULL DEFAULT '-1',
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_app` (`id_app`,`dateup`,`price_fr`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=345 ;
INSERT INTO pricehistory
(id_app, price_fr,dateup)
VALUES
('1', '4.99', now()),
('2', '0.45', now());
The EXPLAIN gives:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA
1 SIMPLE pricehistory ref id_app id_app 4 const 1 Using where; Using index
There's no reason to use a UNIQUE KEY on both (id_app,dateup) and (id_app,price_fr,dateup), as they are redundant. I'm pretty confident that redundancy is making MySQL somehow uncertain of itself so that it errs on the side of doing a filesort.
The solution is to remove UNIQUE from one of the indexes. It seems that if the uniqueness isn't useful, it's better not to use the UNIQUE keyword.
Thanks to both of you.
edit:
Damn, with a different query joining 2 tables, the filesort is back:
http://sqlfiddle.com/#!2/f7682/6

MySQL: Comparing dates in where clause with joins

I'm having a hard time figuring out how to query/index a database.
The situation is pretty simple. Each time a user visits a category, his/her visit date is stored. My goal is to list the categories in which elements have been added after the user's latest visit.
Here are the two tables:
CREATE TABLE `elements` (
`category_id` int(11) NOT NULL,
`element_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`added_date` datetime NOT NULL,
PRIMARY KEY (`category_id`,`element_id`),
KEY `index_element_id` (`element_id`)
)
CREATE TABLE `categories_views` (
`member_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`view_date` datetime NOT NULL,
PRIMARY KEY (`member_id`,`category_id`),
KEY `index_element_id` (`category_id`)
)
Query:
SELECT
categories_views.*,
elements.category_id
FROM
elements
INNER JOIN categories_views ON (categories_views.category_id = elements.category_id)
WHERE
categories_views.member_id = 1
AND elements.added_date > categories_views.view_date
GROUP BY elements.category_id
Explained:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: elements
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 89057
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories_views
type: eq_ref
possible_keys: PRIMARY,index_element_id
key: PRIMARY
key_len: 8
ref: const,convert.elements.category_id
rows: 1
Extra: Using where
With about 100k rows in each table, the query is taking around 0.3s, which is too long for something that should be executed for every user action in a web context.
If possible, what indexes should I add, or how should I rewrite this query in order to avoid using filesorts and temporary tables?
If each member has a relatively low number of category_views, I suggest testing a different query:
SELECT v.*
FROM categories_views v
WHERE v.member_id = 1
AND EXISTS
( SELECT 1
FROM elements e
WHERE e.category_id = v.category_id
AND e.added_date > v.view_date
)
For optimum performance of that query, you'd want to ensure you had indexes:
... ON elements (category_id, added_date)
... ON categories_views (member_id, category_id)
NOTE: It looks like the primary key on the categories_views table may be (member_id, category_id), which means an appropriate index already exists.
I'm assuming (as best I can figure out from the original query) that the categories_views table contains only the "latest" view of a category for a user, that is, (member_id, category_id) is unique. It looks like that has to be the case if the original query returns a correct result set, i.e. if it only returns categories that have "new" elements added since the user's last view of that category; otherwise, the existence of any "older" view_date values in the categories_views table would trigger the inclusion of the category, even if there were a newer view_date later than the latest (max added_date) element in the category.
If that's not the case, i.e. (member_id, category_id) is not unique, then the query would need to be changed.
The query in the original question is a bit puzzling: it references element_views as a table name or alias, but that doesn't appear in the EXPLAIN output. I'm going under the assumption that element_views is meant to be a synonym for categories_views.
For the original query, add a covering index on the elements table:
... ON elements (category_id, added_date)
The goal there is to get the EXPLAIN output to show "Using index".
You might also try adding an index:
... ON categories_views (member_id, category_id, view_date)
To get all the columns from the categories_views table (for the select list), the query is going to have to visit the pages in the table (unless there's an index that contains all of those columns). The goal would be to reduce the number of rows that need to be visited on data pages, by having all (or most) of the predicates satisfied from the index.
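For concreteness, those suggestions as full statements (the index names are my own):
ALTER TABLE elements ADD KEY idx_elements_category_added (category_id, added_date);
ALTER TABLE categories_views ADD KEY idx_cv_member_cat_date (member_id, category_id, view_date);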
Is it necessary to return the category_id column from the elements table? Don't we already know that this is the same value as in the category_id column from the categories_views table, due to the inner join predicate?