MySQL table index optimization

I'm working with an application that has a MySQL database on Amazon RDS. The table in question is set up as follows:
CREATE TABLE `log` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`username` varchar(45) NOT NULL,
.. snip some varchar and int fields ..
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This system has been in beta for a while, and the dataset is already quite large and the queries are starting to get rather slow.
SELECT COUNT(*) FROM log --> 16307224 (takes 105 seconds to complete)
This table is pretty much only used to build one report, off a query like this:
SELECT timestamp, username, [a few more] FROM log
WHERE timestamp BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00'
AND username='XX'
This typically returns between 1000 and 6000 rows and takes around 100-180 seconds to complete, meaning the web application will often time out and leave an empty report (I will look into the timeout as well, but this question is about the root cause).
I'm not very good with databases, but my guess is that it's the BETWEEN that's killing me here. What I'm thinking is that I should perhaps somehow use the timestamp as an index. Timestamp together with username should still provide uniqueness (I don't use the id field for anything).
If there's anyone out there with suggestions for optimizations I'm all ears.
UPDATE:
The table is now altered to the following:
CREATE TABLE `log` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`username` varchar(45) NOT NULL,
.. snip ..
`task_id` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_un_ts` (`timestamp`,`username`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
EXPLAIN of the SELECT statement returns the following:
id => 1
select_type => SIMPLE
table => log
type => range
possible_keys => index_un_ts
key => index_un_ts
key_len => 55
ref =>
rows => 52258
Extra => Using where; Using index

Well, an index on the timestamp and username columns would be helpful. You need to be able to read the output of an EXPLAIN statement.
Go to MySQL and do the following:
EXPLAIN SELECT timestamp, username, [a few more] FROM log
WHERE timestamp BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00'
AND username='XX'
This shows you the plan MySQL uses to execute the query. There will be a column called key, which indicates what index MySQL is using for the query. I suspect you will see ALL there, which means MySQL is scanning the table from top to bottom, matching every row against your WHERE clause. Now create an index on the timestamp and username columns, then run the EXPLAIN statement again. You should see the index you created in the key column.
If MySQL uses the index, your query should be considerably faster. Just remember not to over-index: indexes make inserts, updates and deletes slower. When you insert a new row into a table that has three indexes, the insert also has to write an entry into each of those three indexes. So it is a double-edged sword.
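As a sketch of that advice (the index name here is illustrative, not from the post): since username is an equality predicate and timestamp is a range, putting username first usually lets the index narrow the scan the most.
ALTER TABLE log ADD INDEX idx_username_ts (username, `timestamp`);
-- Then re-check the plan:
EXPLAIN SELECT `timestamp`, username FROM log
WHERE `timestamp` BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00'
AND username = 'XX';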

Related

MySQL query is slower after index create [duplicate]

First, some information about my test table.
It is a books table with 665,647 rows of data.
Below you can see how it looks.
I ran the same query 10 times, selecting books with a price equal to 10:
select * from books where price = 10
Execution time for all 10 queries was 9s 663ms.
After that I created an index, which you can see here:
Then I tried to run the same 10 queries one more time.
Execution time for them was 21s 996ms.
show index from books;
showed very weird data to me.
The Cardinality value is just one!
What did I do wrong? I was sure indexes are something that makes queries faster, not slower.
I found this topic: MySQL index slowing down query
but to be honest I don't really understand it, especially the Cardinality column.
In my books table there are currently two possible values for the price field,
10 and 30, yet show index from books; still shows 1.
#Edit1
SHOW CREATE TABLE books
Result:
CREATE TABLE `books` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`description` text COLLATE utf8mb4_unicode_ci NOT NULL,
`isbn` bigint unsigned NOT NULL,
`price` double(8,2) unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`author_id` bigint unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `books_isbn_unique` (`isbn`),
KEY `books_author_id_foreign` (`author_id`),
KEY `books_price_index` (`price`),
CONSTRAINT `books_author_id_foreign` FOREIGN KEY (`author_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=665648 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
#Edit2
I added a new index: create index nameIndex on books (name)
It has a large Cardinality value.
When I try the query select * from books where name = 'Inventore cumque quis.'
I can see the difference in execution time before and after creating the index.
But I still don't understand how indexes work. I was sure about one thing: if I create a new index, the database creates a new data structure holding the data that fits this index.
For example, if I have rows with prices 10 and 30, I get two "tables", each holding the rows with one of those prices.
Is it realistic to have so many rows with the same price? Is it realistic to return 444K rows from a query? I ask these because query optimization is predicated on "normal" data.
An index (e.g., INDEX(price)) is useful when looking for a price that occurs a small number of times. In fact, the Optimizer shuns the index if it sees that the value being searched for occurs more than about 20% of the time. Instead, it simply ignores the index and does what your first test did: scan the entire table, ignoring any rows that don't match.
You should be able to see that by doing
EXPLAIN select * from books where price = 10
with and without the index. Alternatively, you can try:
EXPLAIN select * from books IGNORE INDEX(books_price_index) where price = 10
EXPLAIN select * from books FORCE INDEX(books_price_index) where price = 10
But... it seems that the Optimizer did not ignore the index. I see that the "cardinality" of price is "1", which implies that there is only one distinct value in that column. This 'statistic' is either incorrect or misleading. Please run this and see what changes:
ANALYZE TABLE books;
This will recompute the stats via a few random probes, and may change that "1" to perhaps "2".
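A quick way to verify the effect (a minimal check; the exact numbers come from random sampling and will vary):
ANALYZE TABLE books;
SHOW INDEX FROM books; -- the Cardinality value for books_price_index should now be closer to the real distinct count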
General advice: Beware of benchmarks that run against fabricated data.
Maybe this?
https://stackoverflow.com/questions/755569/why-does-the-cardinality-of-an-index-in-mysql-remain-unchanged-when-i-add-a-new
Cardinality didn't get updated after the index was created. Try running the ANALYZE TABLE command.

Very slow query on mysql table with 35 million rows

I am trying to figure out why a query is so slow on my MySQL database. I've read various material about MySQL performance, including various SO questions, but this remains a riddle to me.
I am using MySQL 5.6.23-log - MySQL Community Server (GPL)
I have a table with roughly 35 million rows.
This table receives about 5 inserts per second.
The table looks like this:
I have indexes on all the columns except for answer_text
The query I'm running is:
SELECT answer_id, COUNT(1)
FROM answers_onsite a
WHERE a.screen_id=384
AND a.timestamp BETWEEN 1462670000000 AND 1463374800000
GROUP BY a.answer_id
This query takes roughly 20-30 seconds, then gives a result set:
Any insights?
EDIT
As asked, my SHOW CREATE TABLE:
CREATE TABLE `answers_onsite` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`device_id` bigint(20) unsigned NOT NULL,
`survey_id` bigint(20) unsigned NOT NULL,
`answer_set_group` varchar(255) NOT NULL,
`timestamp` bigint(20) unsigned NOT NULL,
`screen_id` bigint(20) unsigned NOT NULL,
`answer_id` bigint(20) unsigned NOT NULL DEFAULT '0',
`answer_text` text,
PRIMARY KEY (`id`),
KEY `device_id` (`device_id`),
KEY `survey_id` (`survey_id`),
KEY `answer_set_group` (`answer_set_group`),
KEY `timestamp` (`timestamp`),
KEY `screen_id` (`screen_id`),
KEY `answer_id` (`answer_id`)
) ENGINE=InnoDB AUTO_INCREMENT=35716605 DEFAULT CHARSET=utf8
ALTER TABLE answers_onsite ADD KEY complex_index (screen_id, `timestamp`, answer_id);
You can use MySQL partitioning like this:
ALTER TABLE answers_onsite DROP PRIMARY KEY;
ALTER TABLE answers_onsite ADD PRIMARY KEY (id, `timestamp`) PARTITION BY HASH(id) PARTITIONS 500;
Running the above may take a while depending on the size of your table.
Look at your WHERE clause:
WHERE a.screen_id=384
AND a.timestamp BETWEEN 1462670000000 AND 1463374800000
GROUP BY a.answer_id
I would create a composite index (screen_id, answer_id, timestamp) and run some tests.
You could also try (screen_id, timestamp, answer_id) to see if it performs better.
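For example (a sketch; the index names are illustrative, and you would keep only whichever one EXPLAIN shows performing better):
ALTER TABLE answers_onsite ADD INDEX idx_screen_answer_ts (screen_id, answer_id, `timestamp`);
ALTER TABLE answers_onsite ADD INDEX idx_screen_ts_answer (screen_id, `timestamp`, answer_id);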
The BETWEEN clause is known to be slow though, as is any range query. So is COUNT over millions of rows. I would count once a day and save the result to a 'stats' table, which you can query whenever you need... obviously only if you do not need live data.
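A minimal sketch of such a rollup, assuming a hypothetical answer_stats table and a daily job (e.g. cron); all names here are illustrative:
CREATE TABLE answer_stats (
`stat_date` date NOT NULL,
`screen_id` bigint(20) unsigned NOT NULL,
`answer_id` bigint(20) unsigned NOT NULL,
`answer_count` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`stat_date`, `screen_id`, `answer_id`)
) ENGINE=InnoDB;
-- Run once per day; the millisecond bounds below are just the example window from the question:
INSERT INTO answer_stats (stat_date, screen_id, answer_id, answer_count)
SELECT DATE(FROM_UNIXTIME(`timestamp` / 1000)), screen_id, answer_id, COUNT(*)
FROM answers_onsite
WHERE `timestamp` BETWEEN 1462670000000 AND 1463374800000
GROUP BY DATE(FROM_UNIXTIME(`timestamp` / 1000)), screen_id, answer_id;
The report query then reads from answer_stats instead of scanning 35 million rows.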

Count the number of rows between unix time stamps for each ID

I'm trying to populate some data for a table. The query is being run on a table that contains ~50 million records. The query I'm currently using is below. It counts the number of rows that match the template id and are BETWEEN two unix timestamps:
SELECT COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN '1346904000' AND '1346993271'
AND `template` = '1'
While the query above does work, performance is rather slow when looping through each template, of which there can be hundreds at a time. The timestamps are stored as int and are properly indexed. Just to test things out, I tried running the query below, omitting the time_sent restriction:
SELECT COUNT(*) as count FROM `s_log`
WHERE `template` = '1'
As expected, it runs very fast, but is obviously not restricting count results inside the correct time frame. How can I obtain a count for a specific template AND restrict that count BETWEEN two unix timestamps?
EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | s_log | ref | time_sent,template | template | 4 | const | 71925 | Using where
SHOW CREATE TABLE s_log:
CREATE TABLE `s_log` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
`time_sent` int(25) NOT NULL,
`template` int(55) NOT NULL,
`key` varchar(255) NOT NULL,
`node_id` int(55) NOT NULL,
`status` varchar(55) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `time_sent` (`time_sent`),
KEY `template` (`template`),
KEY `node_id` (`node_id`),
KEY `key` (`key`),
KEY `status` (`status`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=2078966 DEFAULT CHARSET=latin1
The best index you can have in this case is a composite one: template + time_sent
CREATE INDEX template_time_sent ON s_log (template, time_sent)
PS: As long as all the columns in your query are integers, DON'T enclose their values in quotes (in some cases it can lead to issues, at least with older MySQL versions).
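Putting both pieces of advice together, the query from the question would read:
SELECT COUNT(*) AS count FROM `s_log`
WHERE `template` = 1
AND `time_sent` BETWEEN 1346904000 AND 1346993271;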
First, you have to create an index that has both of your columns together (not separately). Also check your table type; I think it would work great if your table is InnoDB.
And lastly, use your WHERE clause in this fashion:
WHERE `template` = '1' AND `time_sent` BETWEEN '1346904000' AND '1346993271'
What this does is first check whether template is 1; if it is, it checks the second condition, otherwise it skips the row. This will definitely give you a performance edge.
If you have to call the query for each template, it might be faster to get all the information with one query by using GROUP BY:
SELECT template, COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN 1346904000 AND 1346993271
GROUP BY template;
It's just a guess that this would be faster, and you would also have to redesign your code a bit.
You could also try to use InnoDB instead of MyISAM. InnoDB uses a clustered index which maybe performs better on large tables. From the MySQL site:
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
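For reference, the conversion itself is a single statement, though rebuilding a table of roughly 2 million rows takes time, so try it on a copy first:
ALTER TABLE s_log ENGINE=InnoDB;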
There are some questions on Stackoverflow which discuss the performance between InnoDB and MyISAM:
Should I use MyISAM or InnoDB Tables for my MySQL Database?
Migrating from MyISAM to InnoDB
MyISAM versus InnoDB

Optimizing MySQL table structure. Advice needed

I have these table structures, and while they work, using EXPLAIN on certain SQL queries gives 'Using temporary; Using filesort' on one of the tables. This might hamper performance once the table is populated with thousands of rows. Below are the table structures and an explanation of the system.
CREATE TABLE IF NOT EXISTS `jobapp` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fullname` varchar(50) NOT NULL,
`icno` varchar(14) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1',
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `icno` (`icno`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `jobapplied` (
`appid` int(11) NOT NULL,
`jid` int(11) NOT NULL,
`jobstatus` tinyint(1) NOT NULL,
`timestamp` int(10) NOT NULL,
KEY `jid` (`jid`),
KEY `appid` (`appid`)
) ENGINE=MyISAM;
Query I tried which gives aforementioned statement:
EXPLAIN SELECT japp.id, japp.fullname, japp.icno, japp.status, japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = '85'
AND japped.jobstatus = '2'
AND japp.status = '2'
ORDER BY japp.`timestamp` DESC
This system is for recruiting new staff. Once registration opens, hundreds of applicants will register at the same time. They are allowed to select 5 different jobs. At the end of the registration session, the admin will go through each job one by one. I have used a single table (jobapplied) to store 2 items (applicant id, job id) to record who applied for what, and this is the table which causes the aforementioned statement. I realize this table is without a PRIMARY KEY, but I just can't figure out any other way for the admin to later search for who has applied to which specific job.
Any advice on how I can optimize the table?
Apart from the missing indexes and primary keys others have mentioned...
"This might hamper performance once the table is populated with thousands of rows."
You seem to be assuming that the query optimizer will use the same execution plan on a table with thousands of rows as it will on a table with just a few rows. Optimizers don't work like that.
The only reliable way to tell how a particular vendor's optimizer will execute a query on a table with thousands of rows--which is still a small table, and will probably easily fit in memory--is to:
- load a scratch version of the database with thousands of rows
- "explain" the query you're interested in
FWIW, the last test I ran like this involved close to a billion rows--about 50 million in each of about 20 tables. The execution plan for that query--which included about 20 left outer joins--was a lot different than it was for the sample data (just a few thousand rows).
You are ordering by jobapp.timestamp, but there is no index on timestamp, so the filesort (and probably the temporary table) will be necessary. Try adding an index on timestamp to jobapp, something like KEY timid (timestamp, id).
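As a sketch:
ALTER TABLE jobapp ADD KEY timid (`timestamp`, id);
With that index in place, MySQL may be able to read the rows in timestamp order instead of sorting them afterwards.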