improving the count() performance in MySQL - mysql

I have a mysql query like below.
SELECT `indexVal`, COUNT(`indexVal`)
FROM `key_word`
WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0#v','0%d','13#' ,'148' ,
'1e1','1sx','1v$','1#c','1?b','1?k','226','2kl','2ue',
'2*l','2?4','36h','3au','3us','4d~')
GROUP BY `indexVal`
This query takes 5 seconds to generate the results, even though I have a compound index created with ALTER TABLE key_word ADD INDEX (hashed_word, indexVal). Please note that my query counts how many times each indexVal appears among the rows matched by the search, not how many times it appears in the whole table.
My table has 3 columns and 28 million records; the future table will have billions of records. I am using InnoDB, which I simply selected. Below is the SHOW CREATE TABLE result for my table:
CREATE TABLE `key_word` (
`primary_key` bigint(20) NOT NULL AUTO_INCREMENT,
`indexVal` int(11) NOT NULL,
`hashed_word` char(3) NOT NULL,
PRIMARY KEY (`primary_key`),
KEY `hashed_word` (`hashed_word`,`indexVal`)
) ENGINE=InnoDB AUTO_INCREMENT=28570982 DEFAULT CHARSET=latin1
I ran the above select query with Explain command. Below is the result
So how can I speed this up? I would prefer to have the result in less than 1 second. I appreciate your advice.
PS: I don't need the result to be in any order.

Try an index with the column order reversed:
create index xx on key_word( `indexVal`,`hashed_word`);
This may keep the optimizer from using a filesort,
but I don't think it can speed the query up fivefold, from 5 seconds to less than 1 second.
You probably need faster hardware.
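Before changing indexes or hardware, it can help to look at the plan the optimizer actually picks. The following is a minimal sketch, not part of the answer above, using a shortened IN list for readability. Two assumptions about typical MySQL behaviour are relied on here: "Using index" in the Extra column means the query is answered from the index alone, and on versions before 8.0 GROUP BY also sorts implicitly, which ORDER BY NULL suppresses.

```sql
-- Shortened IN list for readability; use the full list from the question.
EXPLAIN
SELECT `indexVal`, COUNT(`indexVal`)
FROM `key_word`
WHERE `hashed_word` IN ('001', '01v', '0ji')
GROUP BY `indexVal`;

-- The question states that no ordering is needed. On MySQL versions
-- before 8.0, ORDER BY NULL asks the server to skip the implicit sort.
SELECT `indexVal`, COUNT(`indexVal`)
FROM `key_word`
WHERE `hashed_word` IN ('001', '01v', '0ji')
GROUP BY `indexVal`
ORDER BY NULL;
```

If Extra still shows "Using temporary; Using filesort", the grouping step rather than the index lookup is the likely cost, which is what the reversed index suggested above is trying to address.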

Related

MySQL query is slower after index create [duplicate]

First, some information about my test table.
This is a books table with 665647 rows of data.
Below you can see how it looks.
I ran the same query for books with a given price 10 times:
select * from books where price = 10
Execution time for all 10 queries was 9s 663ms.
After that I created the index which you can see here:
I then ran the same 10 queries one more time.
Execution time for them was 21s 996ms.
show index from books;
showed very weird data to me.
It reports just one possible value!
What did I do wrong? I was sure indexes are the thing that makes our queries faster, not slower.
I found this topic: MySQL index slowing down query
but to be honest I don't really understand it, especially the Cardinality column.
In my books table I have two possible values for the price field at the moment,
10 and 30, yet show index from books; still shows 1.
#Edit1
SHOW CREATE TABLE books
Result:
CREATE TABLE `books` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`description` text COLLATE utf8mb4_unicode_ci NOT NULL,
`isbn` bigint unsigned NOT NULL,
`price` double(8,2) unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`author_id` bigint unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `books_isbn_unique` (`isbn`),
KEY `books_author_id_foreign` (`author_id`),
KEY `books_price_index` (`price`),
CONSTRAINT `books_author_id_foreign` FOREIGN KEY (`author_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=665648 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
#Edit2
I added a new index: create index nameIndex on books (name)
which has a big Cardinality value.
When I ran the query select * from books where name ='Inventore cumque quis.'
before and after creating the index, I could see the difference in execution time.
But I still don't understand how indexes work. I was sure about one thing: if I create a new index, the database creates a new data structure with the data that fits this index.
For example, if I have rows with prices 10 and 30, I get two "tables" holding the rows with these prices.
Is it realistic to have so many rows with the same price? Is it realistic to return 444K rows from a query? I ask these because query optimization is predicated on "normal" data.
An index (eg, INDEX(price)) is useful when looking for a price that occurs a small number of times. In fact, the Optimizer shuns the index if it sees that the value being searched for occurs more than about 20% of the time. Instead, it would simply ignore the index and do what you tested first--simply scan the entire table, ignoring any rows that don't match.
You should be able to see that by doing
EXPLAIN select * from books where price = 10
with and without the index. Alternatively, you can try:
EXPLAIN select * from books IGNORE INDEX(books_price_index) where price = 10
EXPLAIN select * from books FORCE INDEX(books_price_index) where price = 10
But, ... It seems that the Optimizer did not ignore the index. I see that the "cardinality" of price is "1", which implies that there is only one distinct value in that column. This 'statistic' is either incorrect or misleading. Please run this and see what changes:
ANALYZE TABLE books;
This will recompute the stats via a few random probes, and may change that "1" to perhaps "2".
General advice: Beware of benchmarks that run against fabricated data.
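To see both the refreshed statistic and the real distribution behind it, something like the following sketch may help. The column alias share_of_table is made up for illustration; the "about 20%" threshold is the answer's rule of thumb, not a value read from the server.

```sql
-- Recompute index statistics, then re-read the Cardinality column.
ANALYZE TABLE books;
SHOW INDEX FROM books;

-- Actual distribution of the indexed column: row count and share per price.
SELECT price,
       COUNT(*) AS row_count,
       COUNT(*) / (SELECT COUNT(*) FROM books) AS share_of_table
FROM books
GROUP BY price;
```

A price whose share_of_table is well above roughly 0.2 is a value for which a full table scan is likely to be cheaper than going through the index.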
Maybe this?
https://stackoverflow.com/questions/755569/why-does-the-cardinality-of-an-index-in-mysql-remain-unchanged-when-i-add-a-new
The cardinality didn't get updated after the index was created. Try running the ANALYZE TABLE command.

MYSQL server running query very slow on 1 Million records

I am using MySQL to manage book library data. Here is my main table:
CREATE TABLE `book` (
`id` int(11) NOT NULL,
`bookId` int(11) DEFAULT NULL,
`pageNum` smallint(6) DEFAULT NULL,
`pageData` longtext
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Now when I run this query
SELECT c.id, c.bookId, c.pageNum, c.pageData, o.BookName FROM book c left
join kotarbooks o on c.bookId=o.BookId WHERE pageData like '%[Search Word]%'
It takes about 3 minutes
and when I run query
SELECT * FROM book WHERE bookid=[bookid] AND pageNum=[pageNumber]
it takes about 2 minutes
Is there any way to speed up these queries?
Thanks a lot
"I run query SELECT * FROM book WHERE bookid=[bookid] AND pageNum=[pageNumber]
it takes about 2 minutes"
You could create an index to speed up the filtering:
CREATE INDEX idx ON book(bookid, pageNum);
As for WHERE pageData like '%[Search Word]%', that is not so trivial: the pattern starts with '%', which makes the query non-SARGable.
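Whether the new index is actually picked up can be checked with EXPLAIN. This is a small sketch that assumes the index idx above has already been created; 12 and 3 are placeholder values standing in for [bookid] and [pageNumber].

```sql
-- Assumes the index idx on book(bookid, pageNum) already exists.
-- 12 and 3 are placeholders for [bookid] and [pageNumber].
EXPLAIN
SELECT * FROM book WHERE bookId = 12 AND pageNum = 3;
-- If the index is used, the key column of the output should show idx
-- and the rows estimate should be far below the table size.
```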
The LIKE expression:
WHERE book.pageData LIKE '%[Search Word]%'
is not sargable, and therefore any index on the book table likely would not be used by MySQL. So we can instead plan for a full table scan on the book table, along with an index on the kotarbooks table:
CREATE INDEX kotaridx ON kotarbooks (BookId, BookName);
Including the BookId column should help the join, and BookName makes the index cover that column, since it appears in the select clause.
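If whole-word matching is acceptable instead of arbitrary substring matching, a FULLTEXT index is one alternative to the leading-wildcard LIKE. Neither answer above proposes this, so treat it as a sketch under that assumption: the index name ft_pageData and the search word 'library' are made up, and InnoDB FULLTEXT support requires MySQL 5.6 or later.

```sql
-- Hypothetical index name; building it on a large table can take a while.
ALTER TABLE book ADD FULLTEXT INDEX ft_pageData (pageData);

-- 'library' is a placeholder search word.
SELECT c.id, c.bookId, c.pageNum, c.pageData, o.BookName
FROM book c
LEFT JOIN kotarbooks o ON c.bookId = o.BookId
WHERE MATCH(c.pageData) AGAINST ('library' IN NATURAL LANGUAGE MODE);
```

Note the change in semantics: MATCH ... AGAINST finds indexed words, so it will not find a search string that occurs only in the middle of a longer word, and by default very short words and stopwords are not indexed.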

MySQL Count Distinct - Very Slow

I have a very big MySQL InnoDB table with following structure:
TABLE `whois_records` (
`record_id` int(10) unsigned NOT NULL,
`domain_name` varchar(100) NOT NULL,
`tld_id` smallint(5) unsigned DEFAULT NULL,
`create_date` date DEFAULT NULL,
`update_date` date DEFAULT NULL,
`expiry_date` date DEFAULT NULL,
`query_time` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
PRIMARY KEY (`record_id`)
UNIQUE KEY `domain_time` (`domain_name`,`query_time`)
INDEX `tld_id` (`tld_id`)
This table currently has 10 Million rows.
It stores frequently updated details of domain names.
So there can be multiple records for same domain name in the table.
TLD ID is the numeric value of the type of domain extension.
Problem is when I'm trying to count the total number of domain names of a particular TLD.
I have tried the following 3 SQL queries:
SELECT COUNT(DISTINCT(domain_name)) FROM `whois_records` WHERE tld_id=159
SELECT COUNT(*) FROM `whois_records` WHERE tld_id=159 GROUP BY domain_name
SELECT COUNT(*) FROM ( SELECT 1 FROM `whois_records` WHERE tld_id=159 GROUP BY domain_name) q
All 3 are very slow, taking between 5 and 10 minutes, and they use a lot of CPU to complete. There is an INDEX defined on the TLD ID column, so these queries might be doing a FULL INDEX SCAN, but it is still very slow. TLD ID 159 is for ".com", which has the most domains, so a search for 159 is the slowest. For a non-popular TLD with fewer than 100 domains, the same query takes around 0.10 seconds. TLD ID 159 has around 6 million records, which is 60% of the entire table of 10 million rows.
Is there any way to optimize the calculation?
As the table grows, the current queries will take longer. Can anyone help me with a future-proof solution to this problem? Is any alteration of the table required? Please help, thank you :)
Extend the index to contain domain_name as well:
INDEX `tld_id` (`tld_id`, `domain_name`)
This should make MySQL use only the index and not table data to compute the result. If the combination of both values is unique, instead add a new unique index:
UNIQUE INDEX `new_index` (`tld_id`, `domain_name`)
I doubt you can push it a lot further than that. If it is still not fast enough, think about caching the counters.
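Both suggestions can be sketched as follows. The ALTER TABLE and EXPLAIN follow the answer directly; the cache table tld_domain_counts, its columns, and the idea of refreshing it from a scheduled job are assumptions made for illustration.

```sql
-- Replace the single-column index with the composite one from the answer.
ALTER TABLE whois_records
  DROP INDEX tld_id,
  ADD INDEX tld_id (tld_id, domain_name);

-- Extra should include "Using index" if only the index is read.
EXPLAIN
SELECT COUNT(DISTINCT domain_name) FROM whois_records WHERE tld_id = 159;

-- Hypothetical cache table holding one counter per TLD.
CREATE TABLE tld_domain_counts (
  tld_id       SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
  domain_count INT UNSIGNED NOT NULL
) ENGINE=InnoDB;

-- Refresh periodically, for example from a scheduled job.
REPLACE INTO tld_domain_counts (tld_id, domain_count)
SELECT tld_id, COUNT(DISTINCT domain_name)
FROM whois_records
WHERE tld_id IS NOT NULL
GROUP BY tld_id;

-- Reading a counter is then a primary-key lookup.
SELECT domain_count FROM tld_domain_counts WHERE tld_id = 159;
```

The cached value is only as fresh as the last refresh, so this trades accuracy between refreshes for a constant-time read.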

Optimizing MySQL table structure. Advice needed

I have these table structures and, while they work, using EXPLAIN on certain SQL queries gives 'Using temporary; Using filesort' on one of the tables. This might hamper performance once the table is populated with thousands of data. Below are the table structures and an explanation of the system.
CREATE TABLE IF NOT EXISTS `jobapp` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fullname` varchar(50) NOT NULL,
`icno` varchar(14) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1',
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `icno` (`icno`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `jobapplied` (
`appid` int(11) NOT NULL,
`jid` int(11) NOT NULL,
`jobstatus` tinyint(1) NOT NULL,
`timestamp` int(10) NOT NULL,
KEY `jid` (`jid`),
KEY `appid` (`appid`)
) ENGINE=MyISAM;
The query I tried, which produces the aforementioned output:
EXPLAIN SELECT japp.id, japp.fullname, japp.icno, japp.status, japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = '85'
AND japped.jobstatus = '2'
AND japp.status = '2'
ORDER BY japp.`timestamp` DESC
This system is for recruiting new staff. Once registration is opened, hundreds of applicants will register at the same time. They are allowed to select 5 different jobs. Later on, at the end of the registration session, the admin will go through each job one by one. I have used a single table (jobapplied) to store 2 items (applicant id, job id) to record who applied for what. This is the table that causes the aforementioned output. I realize this table has no PRIMARY key, but I just can't figure out any other way for the admin to look up later who has applied for which job.
Any advice on how can I optimize the table?
Apart from the missing indexes and primary keys others have mentioned . . .
"This might hamper performance once the table is populated with thousands of data."
You seem to be assuming that the query optimizer will use the same execution plan on a table with thousands of rows as it will on a table with just a few rows. Optimizers don't work like that.
The only reliable way to tell how a particular vendor's optimizer will execute a query on a table with thousands of rows--which is still a small table, and will probably easily fit in memory--is to
1. load a scratch version of the database with thousands of rows
2. "explain" the query you're interested in
FWIW, the last test I ran like this involved close to a billion rows--about 50 million in each of about 20 tables. The execution plan for that query--which included about 20 left outer joins--was a lot different than it was for the sample data (just a few thousand rows).
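A rough sketch of that procedure for the tables in the question is shown below. It assumes MySQL 8.0 or later (for WITH RECURSIVE); the *_scratch table names and all generated values are made up, and fabricated data will not have the same distribution as real registrations, so a copy of real data is preferable when one is available.

```sql
-- Hypothetical scratch copies with the same columns and indexes.
CREATE TABLE jobapp_scratch LIKE jobapp;
CREATE TABLE jobapplied_scratch LIKE jobapplied;

-- Allow the recursive CTE below to produce 5000 rows.
SET SESSION cte_max_recursion_depth = 10000;

-- 5000 fabricated applicants.
INSERT INTO jobapp_scratch (fullname, icno, status, `timestamp`)
WITH RECURSIVE seq (n) AS (
  SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 5000
)
SELECT CONCAT('Applicant ', n), LPAD(n, 12, '0'), 1 + (n % 3),
       UNIX_TIMESTAMP() - n
FROM seq;

-- One fabricated application per applicant, spread over 100 job ids.
INSERT INTO jobapplied_scratch (appid, jid, jobstatus, `timestamp`)
SELECT id, 1 + (id % 100), 1 + (id % 3), `timestamp`
FROM jobapp_scratch;

-- "Explain" the query of interest against the populated scratch tables.
EXPLAIN
SELECT japp.id, japp.fullname, japp.icno, japp.status,
       japped.jid, japped.jobstatus
FROM jobapp_scratch AS japp
INNER JOIN jobapplied_scratch AS japped ON japp.id = japped.appid
WHERE japped.jid = 85
  AND japped.jobstatus = 2
  AND japp.status = 2
ORDER BY japp.`timestamp` DESC;
```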
You are ordering by jobapp.timestamp, but there is no index on timestamp, so the filesort (and probably the temporary table) will be necessary. Try adding an index on timestamp to jobapp, something like KEY timid (timestamp,id)
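Written out as statements, that suggestion might look like the sketch below. Whether the filesort actually disappears is not guaranteed: it depends on the optimizer choosing to read jobapp first and in index order, which is an assumption to verify with EXPLAIN on realistically sized data.

```sql
-- Index suggested in the answer; timid is the answer's name for it.
ALTER TABLE jobapp ADD KEY timid (`timestamp`, id);

-- Re-check the plan and compare the Extra column with the earlier output.
EXPLAIN
SELECT japp.id, japp.fullname, japp.icno, japp.status,
       japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = 85
  AND japped.jobstatus = 2
  AND japp.status = 2
ORDER BY japp.`timestamp` DESC;
```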