MySQL index for normal column and full text column

I'm trying to speed up a query for the below:
My table has around 4 million records.
EXPLAIN SELECT * FROM chrecords WHERE company_number = 'test' OR MATCH (company_name,registered_office_address_address_line_1,registered_office_address_address_line_2) AGAINST('test') LIMIT 0, 10;
+------+-------------+-----------+------+------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------+------+------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | chrecords | ALL | i_company_number | NULL | NULL | NULL | 2208348 | Using where |
+------+-------------+-----------+------+------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
I've created two indexes using the below:
ALTER TABLE `chapp`.`chrecords` ADD INDEX `i_company_number` (`company_number`);
ALTER TABLE `chapp`.`chrecords` ADD FULLTEXT(
`company_name`,
`registered_office_address_address_line_1`,
`registered_office_address_address_line_2`
);
How can "combine" the two indexes however? As the above query takes 15+ seconds to execute (only using one index).
The entire table definition:
CREATE TABLE `chapp`.`chrecords` (
`id` INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
`company_name` VARCHAR(100) NULL,
`company_number` VARCHAR(100) NULL,
`registered_office_care_of` VARCHAR(100) NULL,
`registered_office_po_box` VARCHAR(100) NULL,
`registered_office_address_address_line_1` VARCHAR(100) NULL,
`registered_office_address_address_line_2` VARCHAR(100) NULL,
`registered_office_locality` VARCHAR(100) NULL,
`registered_office_region` VARCHAR(100) NULL,
`registered_office_country` VARCHAR(100) NULL,
`registered_office_postal_code` VARCHAR(100) NULL
);
ALTER TABLE `chapp`.`chrecords` ADD INDEX `i_company_name` (`company_name`);
ALTER TABLE `chapp`.`chrecords` ADD INDEX `i_company_number` (`company_number`);
ALTER TABLE `chapp`.`chrecords` ADD INDEX `i_registered_office_address_address_line_1` (`registered_office_address_address_line_1`);
ALTER TABLE `chapp`.`chrecords` ADD INDEX `i_registered_office_address_address_line_2` (`registered_office_address_address_line_2`);
ALTER TABLE `chapp`.`chrecords` ADD FULLTEXT(
`company_name`,
`registered_office_address_address_line_1`,
`registered_office_address_address_line_2`
);

(
SELECT *
FROM chrecords
WHERE company_number = 'test'
ORDER BY something
LIMIT 10
)
UNION DISTINCT
(
SELECT *
FROM chrecords
WHERE MATCH (company_name, registered_office_address_address_line_1,
registered_office_address_address_line_2)
AGAINST('test')
ORDER BY something
LIMIT 10
)
ORDER BY something
LIMIT 10
Notes:
No need for an outer SELECT
Explicitly say DISTINCT (the default) or ALL (which is faster) so that you know you have thought about whether deduplication was needed, versus speed.
A LIMIT without an ORDER BY is not very meaningful
However, if you just want some rows to look at, you can remove the ORDER BYs.
Yes the ORDER BY and LIMIT need to be repeated outside so that you can get the ordering correct and limit to 10.
If you need an OFFSET, the inner queries need the full count: say LIMIT 50 for 5 pages, and then the outside skips to the 5th page with LIMIT 40, 10 (see the sketch below).
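For example, a sketch of fetching the 5th page of 10, where company_name is an assumed stand-in for the "something" placeholder above:
-- Each half supplies its first 50 candidate rows; the outer LIMIT then
-- skips 40 and returns rows 41-50 of the combined, re-sorted result.
(
SELECT *
FROM chrecords
WHERE company_number = 'test'
ORDER BY company_name
LIMIT 50
)
UNION DISTINCT
(
SELECT *
FROM chrecords
WHERE MATCH (company_name, registered_office_address_address_line_1,
             registered_office_address_address_line_2)
AGAINST('test')
ORDER BY company_name
LIMIT 50
)
ORDER BY company_name
LIMIT 40, 10;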

Try using a UNION rather than OR.
SELECT *
FROM (
SELECT *
FROM chrecords
WHERE company_number = 'test'
) a
UNION (
SELECT *
FROM chrecords
WHERE MATCH (company_name,
registered_office_address_address_line_1,
registered_office_address_address_line_2)
AGAINST('test')
LIMIT 0, 10
)
If this helps, it's because MySQL struggles to use more than one index in a single query; the UNION gives the query planner two separate queries, each of which can use its own index.
You can run EXPLAIN on each of the subqueries separately to understand their performance. UNION just puts their results together and eliminates duplicates. If you want to keep the duplicates, do UNION ALL.
Please notice that lots of single-column indexes on MySQL tables are generally harmful to performance. You should refrain from creating indexes unless they're constructed to help specific queries.

Mariadb - select count() using index but select * not using proper index

Version - 10.4.25-MariaDB
I have a table where the column (name) is the second part of the primary key (idarchive, name).
When I run count(*) on the table WHERE name LIKE 'done%', it uses the index on the name column properly, but when I run select * it does not use that separate index; it uses the primary key instead, which slows down the query.
Any idea what we can do here?
Any changes to optimizer_switch, or any other alternative that could help?
Note: we can't use FORCE INDEX because the queries are not controllable.
Table structure:
CREATE TABLE `table1` (
`idarchive` int(10) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`idsite` int(10) unsigned DEFAULT NULL,
`date1` date DEFAULT NULL,
`date2` date DEFAULT NULL,
`period` tinyint(3) unsigned DEFAULT NULL,
`ts_archived` datetime DEFAULT NULL,
`value` double DEFAULT NULL,
PRIMARY KEY (`idarchive`,`name`),
KEY `index_idsite_dates_period` (`idsite`,`date1`,`date2`,`period`,`ts_archived`),
KEY `index_period_archived` (`period`,`ts_archived`),
KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Queries:
explain select count(*) from table1 WHERE name like 'done%' ;
+------+-------------+-------------------------------+-------+---------------+------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------------------------+-------+---------------+------+---------+------+---------+--------------------------+
| 1 | SIMPLE | table1 | range | name | name | 767 | NULL | 9131455 | Using where; Using index |
+------+-------------+-------------------------------+-------+---------------+------+---------+------+---------+--------------------------+
1 row in set (0.000 sec)
explain select * from table1 WHERE name like 'done%' ;
+------+-------------+-------------------------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------------------------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | table1 | ALL | name | NULL | NULL | NULL | 18262910 | Using where |
+------+-------------+-------------------------------+------+---------------+------+---------+------+----------+-------------+
1 row in set (0.000 sec)
Your SELECT COUNT(*) ... LIKE 'constant%' query is covered by your index on your name column. That is, the entire query can be satisfied by reading the index. So the query planner decides to range-scan your index to generate the result.
On the other hand, your SELECT * query needs all columns from all rows of the table. That can't be satisfied from any of your indexes. And, it's possible your WHERE name like 'done%' filter reads a significant fraction of the table, enough so the query planner decides the fastest way to satisfy it is to scan the entire table. The query planner figures this out by using statistics on the contents of the table, plus some knowledge of the relative costs of IO and CPU.
If you have just inserted many rows into the table you could try doing ANALYZE TABLE table1 and then rerun the query. Maybe after the table's statistics are updated you'll get a different query plan.
And, if you don't need all the columns, you could stop using SELECT * and instead name the columns you need. SELECT * is a notorious query-performance antipattern, because it often returns column data that never gets used. Once you know exactly what columns you want, you could create a covering index to provide them.
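For example, if the query only needed idsite and ts_archived for the matching rows (columns chosen purely for illustration), a covering index might look like this:
-- Hypothetical covering index: the leading name column serves the
-- LIKE 'done%' range, and the trailing columns let InnoDB answer the
-- query from the index alone (idarchive comes along implicitly as part
-- of the primary key).
ALTER TABLE table1 ADD INDEX name_idsite_archived (name, idsite, ts_archived);
SELECT idarchive, name, idsite, ts_archived
FROM table1
WHERE name LIKE 'done%';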
These days the query planner does a pretty good job of optimizing simple queries such as yours.
In MariaDB you can say ANALYZE FORMAT=JSON SELECT.... It will run the query and show you details of the actual execution plan it used.
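For example, applied to the slow query here:
ANALYZE FORMAT=JSON
SELECT * FROM table1 WHERE name LIKE 'done%';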

Populate MYSQL Column from specific rows in other table

I am new to SQL (using mySQL Community Workbench) and not sure where to begin with this problem.
Here is the overview: I have two tables in my food database: branded_food and food_nutrient
The important columns in branded_food are fdc_id and kcals.
The important columns in food_nutrient are fdc_id, nutrient_id, and value
branded_food's fdc_id column indexes into food_nutrient's fdc_id column. However, this returns every nutrient in the food, when I only want nutrient id 208's value entry.
Here is an example:
branded_food looks like:
fdc_id | kcals
-----------------
123 | (Empty)
456 | (Empty)
... | (Empty)
food_nutrient looks like:
fdc_id | nutrient_id | value
----------------------------
123 | 203 | 23
123 | 204 | 25
123 | ... | ...
123 | 208 | 500
Essentially, I would like to write some sort of loop that goes through each fdc_id in branded_food, finds the row in food_nutrient that has fdc_id equal to the looped value, and then populates kcals in that row of branded_food. Thus the first example row should be populated like:
fdc_id | kcals
-----------------
123 | 500
As an update, I have looked at INNER JOIN and have created this:
SELECT food_nutrient.amount,food_branded_food.description, food_branded_food.fdc_id
FROM food_nutrient
INNER JOIN food_branded_food ON food_nutrient.fdc_id = food_branded_food.fdc_id
WHERE food_nutrient.nutrient_id = 208
LIMIT 1;
This correctly displays the kcals for the food_branded_food.description (the name of the food) that has the given food_branded_food.fdc_id. I limit to 1 because the query takes a very long time. Is there a better way?
Update #2: Here is something I recently tried, but just spins forever:
UPDATE backup_branded_food bf
INNER JOIN (
SELECT food_nutrient.fdc_id,food_nutrient.amount amt FROM food_nutrient WHERE food_nutrient.nutrient_id = 208
) mn ON bf.fdc_id = mn.fdc_id
SET bf.kcals = mn.amt
WHERE bf.kcals IS NULL;
Running explain:
And SHOW CREATE TABLE food_nutrient
| food_nutrient | CREATE TABLE `food_nutrient` (
`id` bigint DEFAULT NULL,
`fdc_id` bigint DEFAULT NULL,
`nutrient_id` bigint DEFAULT NULL,
`amount` bigint DEFAULT NULL,
`data_points` bigint DEFAULT NULL,
`derivation_id` bigint DEFAULT NULL,
`min` double DEFAULT NULL,
`max` double DEFAULT NULL,
`median` double DEFAULT NULL,
`loq` text,
`footnote` text,
`min_year_acquired` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
Running SHOW CREATE TABLE backup_branded_food (I use a backup of branded food instead of the actual table)
| backup_branded_food | CREATE TABLE `backup_branded_food` (
`fdc_id` bigint DEFAULT NULL,
`data_type` text,
`description` text,
`food_category_id` bigint DEFAULT NULL,
`publication_date` text,
`brand_owner` varchar(255) DEFAULT NULL,
`brand_name` varchar(255) DEFAULT NULL,
`serving_size` double DEFAULT NULL,
`serving_size_unit` varchar(50) DEFAULT NULL,
`kcals` double DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
Table Indexes:
The table structure info obtained from SHOW CREATE TABLE table_name shows that both tables don't have any indexes or a primary key. This is probably why your query runs very slowly. To quickly fix this, let's start by adding indexes on the columns that appear in WHERE and in ON (in the JOIN):
ALTER TABLE food_nutrient
ADD INDEX fdc_id(fdc_id),
ADD INDEX nutrient_id(nutrient_id);
ALTER TABLE branded_food
ADD INDEX fdc_id(fdc_id);
With these indexes added, the EXPLAIN shows the following:
+----+-------------+-------+------------+------+--------------------+-------------+---------+-----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys      | key         | key_len | ref                   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+--------------------+-------------+---------+-----------------------+------+----------+-------------+
|  1 | SIMPLE      | fn    | NULL       | ref  | fdc_id,nutrient_id | nutrient_id | 9       | const                 |    1 |   100.00 | Using where |
|  1 | SIMPLE      | bf    | NULL       | ref  | fdc_id             | fdc_id      | 9       | db_40606077.fn.fdc_id |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+------+--------------------+-------------+---------+-----------------------+------+----------+-------------+
Since I don't know the size of the table, I can't really test how quick the query will be after adding these indexes but I assume that this will improve the query speed significantly.
P.S.: Normally you would have at least one column assigned as PRIMARY KEY, which will never contain duplicates. In your table food_nutrient, the id column might be the PRIMARY KEY, and there is also a likely unique combination of fdc_id and nutrient_id. Therefore, you might consider adding a UNIQUE KEY on those two columns in addition to a PRIMARY KEY on id (sketched below). See 24.6.1 Partitioning Keys, Primary Keys, and Unique Keys.
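A minimal sketch of that suggestion, assuming id has no NULLs or duplicates and each (fdc_id, nutrient_id) pair appears only once (the UNIQUE KEY name is illustrative):
-- Both additions fail if id contains NULLs/duplicates or if any
-- (fdc_id, nutrient_id) pair is duplicated; clean the data first then.
ALTER TABLE food_nutrient
ADD PRIMARY KEY (id),
ADD UNIQUE KEY uq_fdc_nutrient (fdc_id, nutrient_id);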
Usage of aliases:
This is to help make your query more readable. You didn't use any in your current query, so you end up appending the full table name to every column that you use in your operations:
....
FROM food_nutrient AS fn
INNER JOIN food_branded_food fbf /*can simply be written without "..AS.."*/
ON fn.fdc_id = fbf.fdc_id /*the operation afterwards didn't require you to append full table name*/
...
Similarly, once you've added the table alias, you can use it in SELECT too:
SELECT fn.amount, fbf.description,
fbf.fdc_id AS 'FBF_id'
/*you can also assign a custom/desired alias to your column - as your output column name*/
...
Couldn't find official documentation on MySQL website but here's a further explanation from a different site.
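Putting the aliases together, the SELECT from the question could be written as:
SELECT fn.amount, fbf.description, fbf.fdc_id AS 'FBF_id'
FROM food_nutrient AS fn
INNER JOIN food_branded_food fbf ON fn.fdc_id = fbf.fdc_id
WHERE fn.nutrient_id = 208
LIMIT 1;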
Alternative UPDATE syntax:
Your current UPDATE query should be able to perform what you need but you probably don't need the subquery at all. This UPDATE query should work as well:
UPDATE branded_food bf
JOIN food_nutrient fn ON bf.fdc_id = fn.fdc_id
SET bf.kcals = fn.amount
WHERE fn.nutrient_id = 208
AND bf.kcals IS NULL;
Here's a demo fiddle for reference
An UPDATE with an INNER JOIN gets you your wanted result:
UPDATE branded_food bf
INNER JOIN (
SELECT fdc_id, SUM(amount) svalue
FROM food_nutrient
GROUP BY fdc_id
) mn ON bf.fdc_id = mn.fdc_id
SET bf.kcals = mn.svalue
WHERE bf.kcals IS NULL;

mysql query optimization: select with counted subquery extremely slow

I have the following tables:
mysql> show create table rsspodcastitems \G
*************************** 1. row ***************************
Table: rsspodcastitems
Create Table: CREATE TABLE `rsspodcastitems` (
`id` char(20) NOT NULL,
`description` mediumtext,
`duration` int(11) default NULL,
`enclosure` mediumtext NOT NULL,
`guid` varchar(300) NOT NULL,
`indexed` datetime NOT NULL,
`published` datetime default NULL,
`subtitle` varchar(255) default NULL,
`summary` mediumtext,
`title` varchar(255) NOT NULL,
`podcast_id` char(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `podcast_id` (`podcast_id`,`guid`),
UNIQUE KEY `UKfb6nlyxvxf3i2ibwd8jx6k025` (`podcast_id`,`guid`),
KEY `IDXkcqf7wi47t3epqxlh34538k7c` (`indexed`),
KEY `IDXt2ofice5w51uun6w80g8ou7hc` (`podcast_id`,`published`),
KEY `IDXfb6nlyxvxf3i2ibwd8jx6k025` (`podcast_id`,`guid`),
KEY `published` (`published`),
FULLTEXT KEY `title` (`title`),
FULLTEXT KEY `summary` (`summary`),
FULLTEXT KEY `subtitle` (`subtitle`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> show create table station_cache \G
*************************** 1. row ***************************
Table: station_cache
Create Table: CREATE TABLE `station_cache` (
`Station_id` char(36) NOT NULL,
`item_id` char(20) NOT NULL,
`item_type` int(11) NOT NULL,
`podcast_id` char(20) NOT NULL,
`published` datetime NOT NULL,
KEY `Station_id` (`Station_id`,`published`),
KEY `IDX12n81jv8irarbtp8h2hl6k4q3` (`Station_id`,`published`),
KEY `item_id` (`item_id`,`item_type`),
KEY `IDXqw9yqpavo9fcduereqqij4c80` (`item_id`,`item_type`),
KEY `podcast_id` (`podcast_id`,`published`),
KEY `IDXkp2ehbpmu41u1vhwt7qdl2fuf` (`podcast_id`,`published`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
The "item_id" column of the second refers to the "id" column of the former (there isn't a foreign key between the two because the relationship is polymorphic, i.e. the second table may have references to entities that aren't in the first but in other tables that are similar but distinct).
I'm trying to get a query that lists the most recent items in the first table that do not have any corresponding items in the second. The highest performing query I've found so far is:
select i.*,
(select count(station_id)
from station_cache
where item_id = i.id) as stations
from rsspodcastitems i
having stations = 0
order by published desc
I've also considered using a where not exists (...) subquery to perform the restriction, but this was actually slower than the one I have above. But this is still taking a substantial length of time to complete. MySQL's query plan doesn't seem to be using the available indices:
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
| 1 | PRIMARY | i | ALL | NULL | NULL | NULL | NULL | 106978 | Using filesort |
| 2 | DEPENDENT SUBQUERY | station_cache | ALL | NULL | NULL | NULL | NULL | 44227 | Using where |
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
Note that neither portion of the query is using a key, whereas it ought to be able to use KEY published (published) from the primary table and KEY item_id (item_id,item_type) for the subquery.
Any suggestions how I can get an appropriate result without waiting for several minutes?
I would expect the fastest query to be:
select i.*
from rsspodcastitems i
where not exists (select 1
from station_cache sc
where sc.item_id = i.id
)
order by published desc;
This would take advantage of an index on station_cache(item_id) and perhaps rsspodcastitems(published, id).
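The existing item_id (item_id, item_type) key on station_cache already serves the first of those; a sketch of the second (the index name is illustrative):
-- published leads so the ORDER BY can walk the index; id is included as
-- suggested above (rsspodcastitems is MyISAM, so the primary key is not
-- automatically appended to secondary indexes).
ALTER TABLE rsspodcastitems ADD INDEX idx_published_id (published, id);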
Your query could be faster, if your query returns a significant number of rows. Your phrasing of the query allows the index on rsspodcastitems(published) to avoid the file sort. If you remove the group by, the exists version should be faster.
I should note that I like your use of the having clause. When faced with this in the past, I have used a subquery:
select i.*,
(select count(station_id)
from station_cache
where item_id = i.id) as stations
from (select i.*
from rsspodcastitems i
order by published desc
) i
where not exists (select 1
from station_cache sc
where sc.item_id = i.id
);
This allows one index for sorting.
I prefer a slight variation on your method:
select i.*,
(exists (select 1
from station_cache sc
where sc.item_id = i.id
)
) as has_station
from rsspodcastitems i
having has_station = 0
order by published desc;
This should be slightly faster than the version with count().
You might want to detect and remove redundant indexes from your tables. Reviewing your CREATE TABLE information for both tables will help you discover several, including the duplicated (podcast_id, guid), (Station_id, published), (item_id, item_type), and (podcast_id, published) indexes; there may be more.
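For instance, each of the indexes below is an exact duplicate of a named index in the CREATE TABLE output above, so dropping them loses nothing (the names look ORM-generated, so check whether your schema tool would simply recreate them):
ALTER TABLE rsspodcastitems
DROP INDEX UKfb6nlyxvxf3i2ibwd8jx6k025,
DROP INDEX IDXfb6nlyxvxf3i2ibwd8jx6k025;
ALTER TABLE station_cache
DROP INDEX IDX12n81jv8irarbtp8h2hl6k4q3,
DROP INDEX IDXqw9yqpavo9fcduereqqij4c80,
DROP INDEX IDXkp2ehbpmu41u1vhwt7qdl2fuf;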
My eventual solution was to delete the full text indices and use an externally generated index table (produced by iterating over the words in the text, filtering stop words, and applying a stemming algorithm) to allow searching. I don't know why the full text indices were causing performance problems, but they seemed to slow down every query that touched the table even if they weren't used.

Simple ordered MySQL query running very slow on a large table

I have a table with ~ 1.500.000 records:
CREATE TABLE `item_locale` (
`item_id` bigint(20) NOT NULL,
`language` int(11) NOT NULL,
`name` varchar(256) COLLATE utf8_czech_ci NOT NULL,
`text` text COLLATE utf8_czech_ci,
PRIMARY KEY (`item_id`,`language`),
KEY `name` (`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;
It has (item_id, language) as the primary key and an index on name with a prefix length of 255.
With following query:
select item_id, name from item_locale order by name limit 50;
The select takes around 3 seconds even though only 50 rows were required.
What can I do to speed up such query?
EDIT: Some of you suggested adding an INDEX. As I mentioned above, the name column is already indexed with a prefix length of 255.
I ran EXPLAIN on the query:
+----+-------------+---------------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+------+---------+------+---------+----------------+
| 1 | SIMPLE | item_locale | ALL | NULL | NULL | NULL | NULL | 1558653 | Using filesort |
+----+-------------+---------------+------+---------------+------+---------+------+---------+----------------+
The strange thing is that it does not seem to use any index...
Retrieving 50 records is heavier too. Limit them to 10, since you are also using ORDER BY.
Try to use query hint:
select item_id, name
from item_locale USE INDEX FOR ORDER BY (name)
order by name limit 50;
also try to use
select item_id, name
from item_locale FORCE INDEX (name)
order by name limit 50;
In the end, there was some kind of problem with indexes - I dropped them all and recreated them again. And it finally works. Thanks.
Apply an index on the name field which might speed it up a bit.

Can anyone help me here? How to improve query performance

My query runs for longer than 30 minutes. It is a simple query, and the table even has indexes, yet we are unable to find out why it takes so much execution time and affects our entire DB performance.
Yesterday it ran for around 122.6 minutes.
Can anyone help me improve the query performance?
This is my query:
SELECT tab1.customer_id, tab1.row_mod, tab1.row_create, tab1.event_id, tab1.event_type,
tab1.new_value, tab1.old_value
FROM tab1 FORCE INDEX (tab1_n2)
WHERE customer_id >= 1 AND customer_id < 5000000 AND (tab1.row_mod >= '2012-10-01')
OR (tab1.row_create >= '2012-10-01' AND tab1.row_create < '2012-10-13');
Explain plan
+----+-------------+------------------+------+---------------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | tab1 | ALL | tab1_n2 | NULL | NULL | NULL | 18490530 | Using where |
+----+-------------+------------------+------+---------------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)
Table structure:
mysql> show create table tab1\G
*************************** 1. row ***************************
Table: tab1
Create Table: CREATE TABLE `tab1` (
`customer_id` int(11) NOT NULL,
`row_mod` datetime DEFAULT NULL,
`row_create` datetime DEFAULT NULL,
`event_id` int(11) DEFAULT NULL,
`event_type` varchar(45) DEFAULT NULL,
`new_value` varchar(255) DEFAULT NULL,
`old_value` varchar(255) DEFAULT NULL,
KEY `customer_id1` (`customer_id`),
KEY `new_value_n1` (`new_value`),
KEY `tab1_n1` (`row_create`),
KEY `tab1_n2` (`row_mod`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Please help me tune it, even though it has indexes.
Probably because you are using an index that does not make sense.
The row_mod condition is only one branch of the OR condition, so that index is not much help here. If you are forcing every lookup through the index without eliminating any rows, that could be a lot slower than a full table scan. Good rule of thumb is that an index should eliminate more than 90% of rows.
Try to do without the "force index" part.
Try using a UNION of the two conditions. That way each condition can use an index.
ALTER TABLE tab1 ADD INDEX idx_row_mod_customer_id (row_mod, customer_id);
ALTER TABLE tab1 ADD INDEX idx_row_create (row_create);
SELECT tab1.customer_id, tab1.row_mod, tab1.row_create, tab1.event_id, tab1.event_type,
tab1.new_value, tab1.old_value
FROM tab1
WHERE customer_id >= 1 and customer_id
< 5000000 AND tab1.row_mod >= '2012-10-01'
UNION
SELECT tab1.customer_id, tab1.row_mod, tab1.row_create, tab1.event_id, tab1.event_type,
tab1.new_value, tab1.old_value
FROM tab1
WHERE tab1.row_create >= '2012-10-01' AND tab1.row_create < '2012-10-13';
To optimise further, you could add all selected columns to both indices, saving MySQL from having to load the rows into memory. This will greatly increase the size of the indices, and therefore their memory requirement.
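A hedged sketch of what that could look like, assuming the column list from the SELECT above (index names are illustrative, and the total key length may run into InnoDB limits depending on version and row format):
-- Covering variants: each half of the UNION can then be answered from
-- its index alone, at the cost of much larger indexes.
ALTER TABLE tab1 ADD INDEX idx_row_mod_covering
(row_mod, customer_id, row_create, event_id, event_type, new_value, old_value);
ALTER TABLE tab1 ADD INDEX idx_row_create_covering
(row_create, customer_id, row_mod, event_id, event_type, new_value, old_value);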