Simple MySQL select taking too much time - mysql

I have the following simple query, but it is taking too much time (0.5 seconds), and I am unable to find the reason. I want to improve the performance of the search.
Title: VARCHAR(50), Collation = latin1_general_ci, NULL allowed, MySQL engine: MyISAM
Number of records in the Product table: 163,129
CatID: foreign key (int) from the Category table
I have already tried this OPTIMIZE statement:
OPTIMIZE table Product
Here is the simple query.
SELECT COUNT(*)
FROM
Product
WHERE
CatID=123
AND Title = 'abc xyz'
Please help. Thanks in advance.

Taking this long suggests the table is not properly indexed and the query is probably doing a full table scan.
For this particular query, a covering index would be useful:
alter table Product add index cat_title_idx(CatID,Title);
Make sure to take a backup of the table before applying the index.
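To verify that the optimizer actually picks up the new index, you can check the plan with EXPLAIN. A sketch, using the table and column names from the question; with the covering index in place, the count can be answered from the index alone:

```sql
EXPLAIN SELECT COUNT(*)
FROM Product
WHERE CatID = 123
  AND Title = 'abc xyz';
-- Expect key = cat_title_idx and "Using index" in the Extra column
```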

Related

Order by fields from different tables create full table scan. Should I combine data to one table?

This is a theoretical question. Sorry, I don't have working table data to show, so I'll try to improvise with a theoretical example.
Using MySQL/MariaDB, with indexes on all relevant fields.
I have a system whose historical design had a ProductType table, something like:
ID=1, Description="Milk"
ID=2, Description="Bread"
ID=3, Description="Salt"
ID=4, Description="Sugar"
and so on.
There are some features in the system that rely on the ProductType ID, and the Description is also used in different places, such as for defining different properties of the product type.
There is also a Product table, with fields such as:
ID, ProductTypeID, Name
The Product:Name doesn't include the product type description, so a "Milk bottle 1l" will have an entry such as:
ID=101, ProductTypeID=1, Name="bottle 1l"
and "Sugar pack 1kg" will be:
ID=102, ProductTypeID=4, Name="pack 1kg"
You get the idea...
The system combines the ProductType:Description and Product:Name to show full product names to the users. This creates a systematic naming for all the products, so there is no way to define a product with a name such as "1l bottle of milk". I know that in English that might be hard to swallow, but that way works great with my local language.
Years passed, and the database grew to millions of products.
Since a full-text index must have all the searched data in one table, I had to store the ProductType:Description inside the Product table, in a string field I added that holds different keywords related to the product, so the full-text search can find anything related to the product (type, name, barcode, SKU, etc.).
Now I'm trying to solve the full table scans, and it makes me think the current design might not be optimal and that I'll have to redesign and store the full product name (type + name) in the same table...
To show the products in the proper order, there's an ORDER BY TypeDescription ASC, ProductName ASC after the ProductType table is joined in the Product SELECT queries.
From my research I see that the database can't use indexes when the ordering is done on fields from different tables, so it does a full table scan to get to the right entries.
During pagination, there's an ORDER BY and LIMIT 50000,100 in the query that takes a lot of time.
There are sections with lots of products, so the ordering and limiting cause very long full table scans.
How would you handle that situation?
Change the design and store all query-related data in the Product table? It feels like duplication and not a natural solution.
Or maybe there's another way to solve it?
Will an index on a VARCHAR column (product name) be efficient for the ORDER BY? Or will the database still do a full table scan?
My first question here. I couldn't find answers for similar cases.
Thanks!
I've tried to play with the queries to see if ordering by an indexed VARCHAR field will work, but EXPLAIN SELECT still shows that the query didn't use the index for ordering and did a WHERE scan instead :(
UPDATE
Trying to add some more data...
The situation is a bit more complicated and after digging a bit more it looks like the initial question was not in the right direction.
I removed the product type from the queries and still have the slow query.
I feel like it's a chicken and egg situation...
I have a table that maps product IDs to section IDs:
CREATE TABLE `Product2Section` (
`SectionId` int(10) unsigned NOT NULL,
`ProductId` int(10) unsigned NOT NULL,
KEY `idx_ProductId` (`ProductId`),
KEY `idx_SectionId` (`SectionId`),
KEY `idx_ProductId_SectionId` (`ProductId`,`SectionId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
The query (after stripping all fields not relevant to the question):
SELECT DISTINCT
DRIVER.ProductId AS ID,
p.*
FROM
Product2Section AS DRIVER
LEFT JOIN Product p ON
(p.ID = DRIVER.ProductId)
WHERE
DRIVER.SectionId IN(
544,545,546,548,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,566,567,568,570,571,572,573,574,575,1337,1343,1353,1358,1369,1385,1956,1957,1964,1973,1979,1980,1987,1988,1994,1999,2016,2020,576,577,578,579,580,582,586,587,589,590,591,593,596,597,598,604,605,606,608,609,612,613,614,615,617,619,620,621,622,624,625,626,627,628,629,630,632,634,635,637,639,640,642,643,644,645,647,648,651,656,659,660,661,662,663,665,667,669,670,672,674,675,677,683,684,689,690,691,695,726,728,729,730,731,734,736,741,742,743,745,746,749,752,758,761,762,763,764,768,769,771,772,773,774,775,776,777
)
ORDER BY
p.ProductName ASC
LIMIT 500900,100;
explain shows:
id  select_type  table   type    possible_keys   key                      key_len  ref                        rows    Extra
1   SIMPLE       DRIVER  index   idx_SectionId   idx_ProductId_SectionId  8        NULL                       589966  Using where; Using index; Using temporary; Using filesort
1   SIMPLE       p       eq_ref  PRIMARY,idx_ID  PRIMARY                  4        4project.DRIVER.ProductId  1       Using where
I've tried to select from the Product table and join Product2Section to filter the results instead, but I get the same plan:
SELECT DISTINCT
p.ID,
p.ProductName
FROM
Product p
LEFT JOIN
Product2Section p2s ON (p.ID=p2s.ProductId)
WHERE
p2s.SectionId IN(
544,545,546,548,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,566,567,568,570,571,572,573,574,575,1337,1343,1353,1358,1369,1385,1956,1957,1964,1973,1979,1980,1987,1988,1994,1999,2016,2020,576,577,578,579,580,582,586,587,589,590,591,593,596,597,598,604,605,606,608,609,612,613,614,615,617,619,620,621,622,624,625,626,627,628,629,630,632,634,635,637,639,640,642,643,644,645,647,648,651,656,659,660,661,662,663,665,667,669,670,672,674,675,677,683,684,689,690,691,695,726,728,729,730,731,734,736,741,742,743,745,746,749,752,758,761,762,763,764,768,769,771,772,773,774,775,776,777
)
ORDER BY
p.ProductName ASC
LIMIT 500900,
100;
explain:
id  select_type  table  type    possible_keys                                         key                      key_len  ref                     rows    Extra
1   SIMPLE       p2s    index   idx_ProductId,idx_SectionId,idx_ProductId_SectionId   idx_ProductId_SectionId  8        NULL                    589966  Using where; Using index; Using temporary; Using filesort
1   SIMPLE       p      eq_ref  PRIMARY,idx_ID                                        PRIMARY                  4        4project.p2s.ProductId  1       Using where
I don't see a way out of this situation.
The two single-column indexes on Product2Section serve no purpose. You should change your junction table to:
CREATE TABLE `Product2Section` (
`SectionId` int unsigned NOT NULL,
`ProductId` int unsigned NOT NULL,
PRIMARY KEY (`SectionId`, `ProductId`),
KEY `idx_ProductId_SectionId` (`ProductId`, `SectionId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
You said:
There are other queries in the system that probably use the single-field indexes
The single-column indexes cannot be used for anything that the two composite indexes cannot be used for. They are just wasting space and causing unnecessary overhead on inserts and for the optimizer. Making one of the composite indexes the PRIMARY KEY stops InnoDB from having to create its own internal row ID, which just wastes space. It also adds the uniqueness constraint that is currently missing from your table.
From the docs:
Accessing a row through the clustered index is fast because the index search leads directly to the page that contains the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.
This is not significant for a "simple" junction table, as both columns are stored in both indexes, therefore no further read is required.
You said:
that didn't really bother me since there was no real performance hit
You may not see the difference when running an individual query with no contention, but the difference in a highly contended production environment can be huge, due to the amount of extra work required.
Do you really need to accommodate 4,294,967,295 (int unsigned) sections? Perhaps the 65,535 provided by smallint unsigned would be enough?
You said:
Might change it in the future. Don't think it will change the performance somehow
Changing SectionId to smallint will reduce each index entry from 8 to 6 bytes. That's a 25% reduction in size. Smaller is faster.
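Assuming no section ID will ever exceed 65,535, the change could look like this (a sketch; note that MODIFY rebuilds the table and its indexes, so run it in a maintenance window):

```sql
ALTER TABLE Product2Section
  MODIFY SectionId SMALLINT UNSIGNED NOT NULL;
```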
Why are you using LEFT JOIN? The fact that you are happy to reverse the order of the tables in the query suggests it should be an INNER JOIN.
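Since the WHERE clause on p2s.SectionId discards any unmatched rows anyway, the LEFT JOIN is effectively an INNER JOIN already. A sketch of the rewrite (section list abbreviated; use the full list from the original query):

```sql
SELECT DISTINCT p.ID, p.ProductName
FROM Product p
INNER JOIN Product2Section p2s ON p2s.ProductId = p.ID
WHERE p2s.SectionId IN (544, 545, 546 /* ... full list as in the original query */)
ORDER BY p.ProductName ASC
LIMIT 500900, 100;
```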
Do you have your buffer pool configured appropriately, or is it set to defaults? Please run ANALYZE TABLE Product2Section; and then provide the output from:
SELECT TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH + INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_NAME = 'Product2Section';
And:
SELECT ROUND(SUM(DATA_LENGTH + INDEX_LENGTH)/POW(1024, 3), 2)
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database_name';
And:
SHOW VARIABLES LIKE 'innodb_buffer%';

How to optimise mysql query as Full ProcessList is showing Sending Data for over 24 hours

I have the following query that runs forever, and I am looking to see if there is any way I can optimise it. It runs on a table with 1,406,480 rows in total; apart from the Filename and Ref_No columns, the ID and End_Date columns have both been indexed.
My Query:
INSERT INTO UniqueIDs
(
SELECT
T1.ID
FROM
master_table T1
LEFT JOIN
master_table T2
ON
(
T1.Ref_No = T2.Ref_No
AND
T1.End_Date = T2.End_Date
AND
T1.Filename = T2.Filename
AND
T1.ID > T2.ID
)
WHERE T2.ID IS NULL
AND
LENGTH(T1.Ref_No) BETWEEN 5 AND 10
)
;
Explain Results:
The reason for not indexing Ref_No is that it is a TEXT column, and therefore I get a BLOB/TEXT error when I try to index it.
I would really appreciate it if somebody could advise how I can speed up this query.
Thanks
Thanks to Bill's advice regarding multi-column indexes, I have managed to make some headway. I first ran this code:
CREATE INDEX I_DELETE_DUPS ON master_table(id, End_Date);
I then added a new column to store the length of the Ref_No, but had to change it from the query Bill mentioned, as my version of MySQL is 5.5. So I ran it in 3 steps:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED;
UPDATE master_table SET Ref_No_length = LENGTH(Ref_No);
ALTER TABLE master_table ADD INDEX (Ref_No_length);
Last step was to change my insert query with the where clause for the length. This was changed to:
AND t1.Ref_No_length between 5 and 10;
I then ran this query, and within 15 minutes I had 280k IDs inserted into my UniqueIDs table. I then changed my insert script to see if I could cover more lengths by doing the following:
AND t1.Ref_No_length IN (5,6,7,8,9,10,13);
This was to bring in the values where the length was also equal to 13. This query took a lot longer, 2 hrs 50 mins to be precise, but the additional work of looking for all rows with a length of 13 gave me an extra 700k unique IDs.
I am still looking at ways to optimise the query with the IN clause, but this is a big improvement over the query that kept running for 24 hours. So thank you so much, Bill.
For the JOIN, you should have a multi-column index on (Ref_No, End_Date, Filename).
You can create a prefix index on a TEXT column like this:
ALTER TABLE master_table ADD INDEX (Ref_No(10));
But that won't help you search based on the LENGTH(). Indexing only helps search by value indexed, not by functions on the column.
In MySQL 5.7 or later, you can create a virtual column like this, with an index on the values calculated for the virtual column:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED AS (LENGTH(Ref_No)),
ADD INDEX (Ref_No_length);
Then MySQL will recognize that your condition in your query is the same as the expression for the virtual column, and it will automatically use the index (exception: in my experience, this doesn't work for expressions using JSON functions).
But this is no guarantee that the index will help. If most of the rows match the condition of the length being between 5 and 10, the optimizer will not bother with the index. It may be more work to use the index than to do a table-scan.
the ID and End_Date have both been indexed.
You have PRIMARY KEY(id) and redundantly INDEX(id)? A PK is a unique key.
"have both been indexed" -- INDEX(a), INDEX(b) is not the same as INDEX(a,b) -- they have different uses. Read about "composite" indexes.
That query smells a lot like "group-wise" max done in a very slow way. (Alas, that may have come from the online docs.)
I have compiled the fastest ways to do that task here: http://mysql.rjweb.org/doc.php/groupwise_max (There are multiple versions, based on MySQL version and what issues your code can/cannot tolerate.)
Please provide SHOW CREATE TABLE. One important question: Is id the PRIMARY KEY?
This composite index may be useful:
(Filename, End_Date, Ref_No, -- first, in any order
ID) -- last
This, as others have noted, is unlikely to be helped by any index, hence T1 will need a full-table-scan:
AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
If Ref_No cannot be bigger than 191 characters, change it to a VARCHAR so that it can be used in an index. Oh, did I ask for SHOW CREATE TABLE? If you can't make it a VARCHAR, then my recommended composite index is:
INDEX(Filename, End_Date, ID)
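If the VARCHAR conversion is viable, the steps might look like this (a sketch; verify the actual maximum length first, since MODIFY will truncate nothing but will fail or warn depending on SQL mode if values exceed the new size):

```sql
-- Confirm that every Ref_No fits in the proposed VARCHAR size
SELECT MAX(LENGTH(Ref_No)) FROM master_table;

-- Convert, then add the composite index recommended above
ALTER TABLE master_table MODIFY Ref_No VARCHAR(191);
ALTER TABLE master_table ADD INDEX (Filename, End_Date, Ref_No, ID);
```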

How can I optimize this SQL query with 100.000 records?

Here is my SQL query. The system_mailer table is for logging sent e-mails. When I want to search some data, the query takes 10 seconds. Is there any way to optimize this query?
SELECT `ID`
FROM `system_mailer`
WHERE `group` = 'selling_center'
AND `group_parameter_1` = '1'
AND `group_parameter_2` = '2138'
The timing is around a couple of seconds; how could it be optimised?
You might find the following index on 4 columns would help performance:
CREATE INDEX idx ON system_mailer (`group`, group_parameter_1, group_parameter_2, ID);
MySQL should be able to use this index for your current query. By the way, if you are using InnoDB and ID is the primary key, then you can drop it from the explicit index definition and just use this:
CREATE INDEX idx ON system_mailer (`group`, group_parameter_1, group_parameter_2);
Please avoid naming your columns and tables with reserved MySQL keywords like group. Because you made this design decision, you will now be forced to forever escape that column name with backticks (ugly).
Just be sure you have a composite index on the system_mailer table for the columns
(`group`, `group_parameter_1`, `group_parameter_2`)
You can also redundantly add the ID to the index to avoid accessing the table data in the query:
(`group`, `group_parameter_1`, `group_parameter_2`, ID)

MySQL Index sometimes not being used

I have a table with 150k rows of data and a column with a UNIQUE INDEX. It has a type of VARCHAR(10) and stores 10-digit account numbers.
Now whenever I query, like a simple one:
SELECT * FROM table WHERE account_number LIKE '0103%'
It returns 30,000+ rows, and when I run EXPLAIN on my query it shows that no index is used.
But when I do:
SELECT * FROM table WHERE account_number LIKE '0104%'
It returns 4,000+ rows, with the index used.
Can anyone explain this?
I'm using MySQL 5.7 Percona XtraDB.
30k+/150k is more than 20%, so I guess it is faster to do a table scan. From 8.2.1.19 Avoiding Full Table Scans:
The output from EXPLAIN shows ALL in the type column when MySQL uses a full table scan to resolve a query. This usually happens under the following conditions:
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
If you don't need all the values, try to use:
SELECT account_number FROM table WHERE account_number LIKE '0103%'
instead of SELECT *. Then your index becomes a covering index, and the optimizer should always use it (as long as the WHERE condition is sargable).
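To see the difference between the two plans, you can EXPLAIN both forms. A sketch, using the table and column names from the question:

```sql
-- SELECT *: the optimizer may prefer a full scan for a low-selectivity prefix
EXPLAIN SELECT * FROM `table` WHERE account_number LIKE '0103%';

-- Covering query: only the indexed column is read, so the index can be used
EXPLAIN SELECT account_number FROM `table` WHERE account_number LIKE '0103%';
-- Expect "Using where; Using index" in the Extra column for the second query
```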
Most databases use a B-tree for indexing. In this case the optimizer doesn't use the index because it's faster to scan without it, as lad2025 explained.
Your column is unique, and I think the cardinality of your index is high. But since your query uses a LIKE filter, the optimizer decides not to use the index.
You can try FORCE INDEX to see the result. You are using a VARCHAR with a unique index; I would choose another data type or change your index type. If your table only contains numbers, change the column to a numeric type. This will help optimize your query a lot.
In some cases where you have to use LIKE, you can use a full-text index instead.
If you need help optimizing your query and table, provide us more info, including which data you want to fetch from your table.
lad2025 is correct. The database is attempting to make an intelligent optimization.
Benchmark with:
SELECT * FROM table FORCE INDEX(table_index) WHERE account_number LIKE '0103%'
and see who is smarter :-) You can always try your hand at questioning the optimizer. That's what index hints are for...
https://dev.mysql.com/doc/refman/5.7/en/index-hints.html

Best way to index a table with a unique multi-column?

I am creating a table which will store around 100 million rows in MySQL 5.6 using the InnoDB storage engine. This table will have a foreign key that links to another table with around 5 million rows.
Current Table Structure:
`pid`: [Foreign key from another table]
`price`: [decimal(9,2)]
`date`: [date field]
and every pid should have only one record per date.
What is the best way to create indexes on this table?
Option #1: Create a primary key on the two fields pid and date.
Option #2: Add another column id with AUTO_INCREMENT as the primary key, and create a unique index on the columns pid and date.
Or any other option?
The only SELECT query I will be using on this table is:
SELECT pid,price,date FROM table WHERE pid = 123
Based on what you said (100M; the only query is...; InnoDB; etc):
PRIMARY KEY(pid, date);
and no other indexes
Some notes:
Since it is InnoDB, all the rest of the fields are "clustered" with the PK, so a lookup by pid acts as if price were part of the PK. Also, WHERE pid=123 ORDER BY date would be very efficient.
No need for INDEX(pid, date, price)
Adding an AUTO_INCREMENT gains nothing (except a hint of ordering). If you needed ordering, then an index starting with date might be best.
Extra indexes slow down inserts. Especially UNIQUE ones.
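The recommendation above could be declared like this (a sketch; the table name price_history and the referenced parent table are made up for illustration):

```sql
CREATE TABLE price_history (
  pid    INT UNSIGNED  NOT NULL,
  price  DECIMAL(9,2)  NOT NULL,
  `date` DATE          NOT NULL,
  PRIMARY KEY (pid, `date`),          -- enforces one record per pid per date
  FOREIGN KEY (pid) REFERENCES parent_table (id)
) ENGINE=InnoDB;
```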
Either method is fine. I prefer having synthetic primary keys (that is, the auto-incremented version with the additional unique index). I find this useful for several reasons:
You can have a foreign key relationship to the table.
You have an indicator of the order of insertion.
You can change requirements, so if some pid allows two values per day, or only one per week, the table can still support that.
That said, there is additional overhead for such a column. This overhead adds space and a small amount of time when you are accessing the data. You have a pretty large table, so you might want to avoid this additional effort.
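For reference, the synthetic-key variant (Option #2) might look like this (again a sketch; the table and constraint names are made up):

```sql
CREATE TABLE price_history (
  id     INT UNSIGNED  NOT NULL AUTO_INCREMENT,
  pid    INT UNSIGNED  NOT NULL,
  price  DECIMAL(9,2)  NOT NULL,
  `date` DATE          NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_pid_date (pid, `date`)  -- still enforces one record per pid per date
) ENGINE=InnoDB;
```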
I would try an index that attempts to cover the query, in the hope that MySQL only has to access the index to produce the result set.
ALTER TABLE `table` ADD INDEX `pid_date_price` (`pid` , `date`, `price`);
or
ALTER TABLE `table` ADD INDEX `pid_price_date` (`pid` , `price`, `date`);
Choose the first one if you think you may need to select applying conditions over pid and date in the future, or the second one if you think the conditions will be most probable over pid and price.
This way, the index has all the data the query needs (pid, price and date) and it is indexed on the right leading column (pid).
By the way, always use EXPLAIN to see if the query planner will really use the whole index (take a look at the key and key_len outputs).