MySQL query is slower after index create [duplicate]

First, some information about my test table.
It is a books table with 665647 rows of data.
The full schema is shown in Edit1 below.
I ran the same query 10 times, selecting books with a given price:
select * from books where price = 10
The total execution time for all 10 queries was 9s 663ms.
After that I created an index on the price column and tried running the same 10 queries one more time.
Their total execution time was 21s 996ms.
show index from books;
showed very weird data to me.
The Cardinality value is just 1!
What did I do wrong? I was sure indexes are supposed to make queries faster, not slower.
I found this topic: MySQL index slowing down query
but to be honest I don't really understand it, especially the Cardinality column.
In my table books I have two possible values for the price field at this moment,
10 and 30, yet show index from books; shows 1.
#Edit1
SHOW CREATE TABLE books
Result:
CREATE TABLE `books` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`description` text COLLATE utf8mb4_unicode_ci NOT NULL,
`isbn` bigint unsigned NOT NULL,
`price` double(8,2) unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`author_id` bigint unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `books_isbn_unique` (`isbn`),
KEY `books_author_id_foreign` (`author_id`),
KEY `books_price_index` (`price`),
CONSTRAINT `books_author_id_foreign` FOREIGN KEY (`author_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=665648 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
#Edit2
I added a new index: create index nameIndex on books (name)
This one has a big Cardinality value.
When I run the query select * from books where name = 'Inventore cumque quis.'
before and after creating the index, I can see the difference in execution time.
But I still don't understand how indexes work. I was sure about one thing: creating a new index builds a new data structure holding the data covered by that index.
For example, if I have rows with price 10 and 30, I get two "tables", each containing the rows with one of those prices.

Is it realistic to have so many rows with the same price? Is it realistic to return 444K rows from a query? I ask these because query optimization is predicated on "normal" data.
An index (e.g., INDEX(price)) is useful when looking for a price that occurs a small number of times. In fact, the Optimizer shuns the index if it sees that the value being searched for occurs more than about 20% of the time. Instead, it will ignore the index and do what you tested first: scan the entire table, skipping any rows that don't match.
You should be able to see that by doing
EXPLAIN select * from books where price = 10
with and without the index. Alternatively, you can try:
EXPLAIN select * from books IGNORE INDEX(books_price_index) where price = 10
EXPLAIN select * from books FORCE INDEX(books_price_index) where price = 10
But, ... It seems that the Optimizer did not ignore the index. I see that the "cardinality" of price is "1", which implies that there is only one distinct value in that column. This 'statistic' is either incorrect or misleading. Please run this and see what changes:
ANALYZE TABLE books;
This will recompute the stats via a few random probes, and may change that "1" to perhaps "2".
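You can see the effect directly by re-running the statements from the question:
SHOW INDEX FROM books;   -- note the Cardinality column for books_price_index
ANALYZE TABLE books;
SHOW INDEX FROM books;   -- Cardinality should now be closer to 2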
General advice: Beware of benchmarks that run against fabricated data.

Maybe this?
https://stackoverflow.com/questions/755569/why-does-the-cardinality-of-an-index-in-mysql-remain-unchanged-when-i-add-a-new
Cardinality didn't get updated after the index was created. Try running the ANALYZE TABLE command.

Related


MySQL: in my matrix, an index is not used

Given this table:
CREATE TABLE `matrix` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`city1_id` int(10) unsigned NOT NULL DEFAULT '0',
`city2_id` int(10) unsigned NOT NULL DEFAULT '0',
`timeinmin` mediumint(8) unsigned NOT NULL DEFAULT '0',
`distancem` mediumint(8) unsigned NOT NULL DEFAULT '0',
`OWNER` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `city12_index` (`city1_id`,`city2_id`),
UNIQUE KEY `city21_index` (`city2_id`,`city1_id`),
KEY `city1_index` (`city1_id`),
KEY `city2_index` (`city2_id`),
KEY `ownerIndex` (`OWNER`),
CONSTRAINT `PK_city_city1` FOREIGN KEY (`city1_id`) REFERENCES `city` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `PK_city_city2` FOREIGN KEY (`city2_id`) REFERENCES `city` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=5118409 DEFAULT CHARSET=utf8;
There is a very large amount of data in it.
This SQL runs very fast:
select count(*) from city_matrix where owner=1
since there is an index on "owner".
select count(*) from city_matrix where owner=1 order by id
this also runs very fast. But this:
select count(*) from city_matrix where owner=1 order by city1_id
takes several seconds, BUT there is an index on city1_id too!
EXPLAIN shows this:
1, 'SIMPLE', 'city_matrix', '', 'ref', 'ownerIndex', 'ownerIndex', '4', 'const', 169724, 100.00, ''
This is a great question. MySQL determines the right index based on many different factors. Its main goal is to find the most suitable index that can retrieve the data fast.
select count(*) from city_matrix where owner=1 order by id
In this query, MySQL determined that where owner=1 reduced the results to a small enough number that ordering by ID was relatively easy. For example, if ID is also a key (primary/unique/index), which I suspect it is, MySQL could take advantage of ID for sorting.
In case of this:
select count(*) from city_matrix where owner=1 order by city1_id
MySQL can still filter out all the records for that owner, but will take time to shuffle all the city1_id data so that you receive a sorted result. Since it took time, SHOW PROCESSLIST during that period would have shown you that the query was reordering data.
To help MySQL do the job faster, we can use something called a covering index. A covering index has all the fields used in the query, so MySQL just has to read through the index to get you the data, without having to touch the underlying table. A composite index on owner and city1_id will let MySQL use one single index to filter the data, use that same index again to sort the data, and then do a count on it.
So, let's create the covering index:
create index idx_city_matrix_city1_owner on city_matrix(owner, city1_id)
As you noticed, MySQL took some time to make the index and once the index was ready, it could zip through data pretty quickly to give you counts.
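To confirm what the optimizer picked, you can re-run the EXPLAIN (the index name matches the CREATE INDEX statement above):
EXPLAIN select count(*) from city_matrix where owner=1 order by city1_id;
-- the "key" column should now show idx_city_matrix_city1_owner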
EDIT: It is important to note that when you do a count(*) like the statements above do, you don't need ordering. The result set is scalar: just one value. Ordering by any field does not impact your count. For example, counting all the fruits on the table will give you the same result as counting all the fruits on the table ordered by their size.
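For instance, these two queries return the same single-row result:
select count(*) from city_matrix where owner=1;
select count(*) from city_matrix where owner=1 order by city1_id;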
The process for retrieval and index application is as follows:
The intermediate result which is retrieved by MySQL for the key owner is "stored" in a temporary table (either in memory or on disk depending on the size of the result).
Based on histogram data for the intermediate result, an index can be applied. If the data is not unique enough, the index can be discarded as not useful (for example: there are only 5 cities in these 169k results).
Workarounds:
Apply the index with a hint: this is considered poor practice, since it can lead to unwanted index use, speeding up one query and slowing down the next one (yes, an index can make a query slower);
Create a multi-column index which contains both owner and city1_id, as sketched below.
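A minimal sketch of the second workaround (the index name here is illustrative):
create index idx_owner_city1 on city_matrix (owner, city1_id);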
One last remark:
An ORDER BY on a COUNT(*) only slows everything down, since the ORDER BY does not change your result at all.

Performance of simple SELECT operation on big (2 GB) table

I have a really simple query to get MIN and MAX values; it looks like:
SELECT MAX(value_avg)
, MIN(value_avg)
FROM value_data
WHERE value_id = 769
AND time_id BETWEEN 214000 AND 219760;
And here is the schema of the value_data table:
CREATE TABLE `value_data` (
`value_id` int(11) NOT NULL,
`time_id` bigint(20) NOT NULL,
`value_min` float DEFAULT NULL,
`value_avg` float DEFAULT NULL,
`value_max` float DEFAULT NULL,
KEY `idx_vdata_vid` (`value_id`),
KEY `idx_vdata_tid` (`time_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
As you can see, the query and the table are simple and I don't see anything wrong here, but when I execute this query, it takes about ~9 seconds to get the data. I also profiled the query, and 99% of the time is spent in "Sending data".
The table is really big: it weighs about 2 GB. But is that the problem? I don't think the table is too big; it must be something else...
MySQL can easily handle a database of that size. However, you should be able to improve the performance of this query and probably the table in general. By changing the time_id column to an UNSIGNED INT NOT NULL, you can significantly decrease the size of the data and indexes on that column. Also, the query you mention could benefit from a composite index on (value_id, time_id). With that index, it would be able to use the index for both parts of the query instead of just one as it is now.
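A sketch of both suggestions combined, assuming your time_id values fit into an unsigned 32-bit INT (the index name is illustrative):
ALTER TABLE value_data
MODIFY time_id INT UNSIGNED NOT NULL,
ADD INDEX idx_vdata_vid_tid (value_id, time_id);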
Also, please edit your question with an EXPLAIN of the query. It should confirm what I expect about the indexes, but it's always helpful information to have.
Edit:
You don't have a PRIMARY index defined for the table, which definitely isn't helping your situation. If the values of (value_id, time_id) are unique, you should probably make the new composite index I mention above the PRIMARY index for the table.
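If those pairs are indeed unique, the change could look like this (a sketch; it will fail if duplicate (value_id, time_id) pairs exist):
ALTER TABLE value_data ADD PRIMARY KEY (value_id, time_id);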

Count the number of rows between unix time stamps for each ID

I'm trying to populate some data for a table. The query is being run on a table that contains ~50 million records. The query I'm currently using is below. It counts the number of rows that match the template id and are BETWEEN two unix timestamps:
SELECT COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN '1346904000' AND '1346993271'
AND `template` = '1'
While the query above does work, performance is rather slow when looping through each template, of which there can at times be hundreds. The timestamps are stored as int and are properly indexed. Just to test things out, I tried running the query below, omitting the time_sent restriction:
SELECT COUNT(*) as count FROM `s_log`
WHERE `template` = '1'
As expected, it runs very fast, but is obviously not restricting count results inside the correct time frame. How can I obtain a count for a specific template AND restrict that count BETWEEN two unix timestamps?
EXPLAIN:
1 | SIMPLE | s_log | ref | time_sent,template | template | 4 | const | 71925 | Using where
SHOW CREATE TABLE s_log:
CREATE TABLE `s_log` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
`time_sent` int(25) NOT NULL,
`template` int(55) NOT NULL,
`key` varchar(255) NOT NULL,
`node_id` int(55) NOT NULL,
`status` varchar(55) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `time_sent` (`time_sent`),
KEY `template` (`template`),
KEY `node_id` (`node_id`),
KEY `key` (`key`),
KEY `status` (`status`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=2078966 DEFAULT CHARSET=latin1
The best index you can have in this case is a composite one on template + time_sent:
CREATE INDEX template_time_sent ON s_log (template, time_sent)
PS: Also, as long as all the columns in your query are integers, DON'T enclose their values in quotes (in some cases it can lead to issues, at least with older MySQL versions).
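For example, the unquoted form of the original query would be:
SELECT COUNT(*) AS count FROM `s_log`
WHERE `template` = 1
AND `time_sent` BETWEEN 1346904000 AND 1346993271;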
First, you have to create an index that has both of your columns together (not separately). Also check your table type; I think it would work great if your table is InnoDB.
And lastly, use your WHERE clause in this fashion:
WHERE `template` = '1' AND `time_sent` BETWEEN '1346904000' AND '1346993271'
What this does is first check whether template is 1; if it is, it then checks the second condition, otherwise it skips the row. This will definitely give you a performance edge.
If you have to call the query for each template maybe it would be faster to get all the information with one query call by using GROUP BY:
SELECT template, COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN 1346904000 AND 1346993271
GROUP BY template;
It's just a guess that this would be faster, and you would also have to redesign your code a bit.
You could also try using InnoDB instead of MyISAM. InnoDB uses a clustered index, which may perform better on large tables. From the MySQL site:
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
There are some questions on Stack Overflow which discuss the performance of InnoDB versus MyISAM:
Should I use MyISAM or InnoDB Tables for my MySQL Database?
Migrating from MyISAM to InnoDB
MyISAM versus InnoDB

Optimizing MySQL table structure. Advice needed

I have these table structures, and while they work, using EXPLAIN on certain SQL queries gives "Using temporary; Using filesort" on one of the tables. This might hamper performance once the tables are populated with thousands of rows. Below are the table structures and an explanation of the system.
CREATE TABLE IF NOT EXISTS `jobapp` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fullname` varchar(50) NOT NULL,
`icno` varchar(14) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1',
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `icno` (`icno`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `jobapplied` (
`appid` int(11) NOT NULL,
`jid` int(11) NOT NULL,
`jobstatus` tinyint(1) NOT NULL,
`timestamp` int(10) NOT NULL,
KEY `jid` (`jid`),
KEY `appid` (`appid`)
) ENGINE=MyISAM;
The query I tried which gives the aforementioned statement:
EXPLAIN SELECT japp.id, japp.fullname, japp.icno, japp.status, japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = '85'
AND japped.jobstatus = '2'
AND japp.status = '2'
ORDER BY japp.`timestamp` DESC
This system is for recruiting new staff. Once registration is opened, hundreds of applicants will register at the same time. They are allowed to select 5 different jobs. Later on, at the end of the registration session, the admin will go through each job one by one. I have used a single table (jobapplied) to store 2 items (applicant id, job id) to record who applied for what, and this is the table which causes the aforementioned statement. I realize this table is without a PRIMARY KEY, but I just can't figure out any other way for the admin to later search specifically for who applied to which job.
Any advice on how can I optimize the table?
Apart from the missing indexes and primary keys others have mentioned . . .
"This might hamper performance once the table is populated with thousands of data."
You seem to be assuming that the query optimizer will use the same execution plan on a table with thousands of rows as it will on a table with just a few rows. Optimizers don't work like that.
The only reliable way to tell how a particular vendor's optimizer will execute a query on a table with thousands of rows--which is still a small table, and will probably easily fit in memory--is to:
1. load a scratch version of the database with thousands of rows
2. EXPLAIN the query you're interested in
FWIW, the last test I ran like this involved close to a billion rows--about 50 million in each of about 20 tables. The execution plan for that query--which included about 20 left outer joins--was a lot different than it was for the sample data (just a few thousand rows).
You are ordering by jobapp.timestamp, but there is no index on timestamp, so the filesort (and probably the temporary table) will be necessary. Try adding an index on timestamp to jobapp, something like KEY timid (timestamp, id).
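A minimal sketch of that suggestion (timestamp is backticked since it is also a MySQL type name):
ALTER TABLE jobapp ADD KEY timid (`timestamp`, id);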