I have a table that contains two bigint columns: beginNumber, endNumber, defined as UNIQUE. The ID is the Primary Key.
ID | beginNumber | endNumber | Name | Criteria
The second table contains a number. I want to retrieve the record from table1 when the Number from table2 falls between its beginNumber and endNumber. This is the query:
select distinct t1.Name, t1.Country
from t1
where t2.Number
BETWEEN t1.beginIpNum AND t1.endNumber
The query takes too long because I have so many records. I don't have much database experience, but I read that indexing the table speeds up searching because MySQL no longer has to pass through every row looking for my Number, and that this can be done by, for example, having UNIQUE values. I made beginNumber and endNumber in table1 UNIQUE. Is this all I can do? Is there any other way to improve the time? Please provide detailed answers.
EDIT:
table1:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`beginNumber` bigint(20) DEFAULT NULL,
`endNumber` bigint(20) DEFAULT NULL,
`Name` varchar(255) DEFAULT NULL,
`Criteria` varchar(455) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `beginNumber_UNIQUE` (`beginNumber`),
UNIQUE KEY `endNumber_UNIQUE` (`endNumber`)
) ENGINE=InnoDB AUTO_INCREMENT=327 DEFAULT CHARSET=utf8
table2:
CREATE TABLE `t2` (
`id2` int(11) NOT NULL AUTO_INCREMENT,
`description` varchar(255) DEFAULT NULL,
`Number` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id2`),
UNIQUE KEY `description_UNIQUE` (`description`)
) ENGINE=InnoDB AUTO_INCREMENT=433 DEFAULT CHARSET=utf8
This is a toy example of the tables but it shows the concerned part.
I'd suggest an index on t2.Number like this:
ALTER TABLE t2 ADD INDEX numindex(Number);
Your query won't work as written because it won't know which t2 to use. Try this:
SELECT DISTINCT t1.Name, t1.Criteria
FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE t2.Number BETWEEN t1.beginNumber AND t1.endNumber);
Without the t2.Number index EXPLAIN gives this query plan:
1 PRIMARY t1 ALL 1 Using where; Using temporary
2 DEPENDENT SUBQUERY t2 ALL 1 Using where
With an index on t2.Number, you get this plan:
PRIMARY t1 ALL 1 Using where; Using temporary
DEPENDENT SUBQUERY t2 index numindex numindex 9 1 Using where; Using index
The important part to understand is the type column: ALL means MySQL reads every row of the table, whereas index means it scans only the numindex index, which is smaller and therefore faster.
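To check the effect on your own data, the plans above can be reproduced by prefixing the query with EXPLAIN. The sketch below assumes the t1 and t2 definitions from the question and that numindex has already been created. It also shows an equivalent JOIN form, which is an alternative phrasing and not the query the plans above were taken from.

```sql
-- Plan for the EXISTS form suggested above.
EXPLAIN
SELECT DISTINCT t1.Name, t1.Criteria
FROM t1
WHERE EXISTS (SELECT *
              FROM t2
              WHERE t2.Number BETWEEN t1.beginNumber AND t1.endNumber);

-- Same result set written as a join; DISTINCT removes the duplicates that
-- appear when several t2 rows fall inside the same t1 range.
EXPLAIN
SELECT DISTINCT t1.Name, t1.Criteria
FROM t1
JOIN t2 ON t2.Number BETWEEN t1.beginNumber AND t1.endNumber;
```

Comparing the type, key and rows columns of the two outputs shows which form the optimizer handles better on a given MySQL version.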
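This is a good place for a B-tree index. Note that B-tree is already the default index type for InnoDB and MyISAM tables (HASH is the default only for the MEMORY engine), so the USING BTREE clause is optional. B-tree indexes are the best choice when you often sort on a column or filter it with BETWEEN.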
CREATE INDEX index_name
ON table_name (column_name)
USING BTREE
Related
I have a MySQL table (TABLE1) with 400 thousand records
CREATE TABLE `TABLE1` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`NAME` varchar(255) NOT NULL,
`VALUE` varchar(255) NOT NULL,
`UID` varchar(255) NOT NULL,
`USER_ID` varchar(255) DEFAULT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `ukey1` (`VALUE`,`NAME`,`UID`),
UNIQUE KEY `ukey2` (`UID`,`NAME`,`VALUE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `TABLE2` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`UID` varchar(255) DEFAULT NULL,
`TABLE3ID` bigint(20) NOT NULL,
PRIMARY KEY (`ID`),
KEY `FKEY` (`TABLE3ID`),
CONSTRAINT `FKEY` FOREIGN KEY (`TABLE3ID`) REFERENCES `TABLE3` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `TABLE3` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`TYPEID` bigint(20) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The following query is very slow and takes hours and finally fails
delete t1 from TABLE1 t1
inner join TABLE2 t2 on t1.UID=t2.UID
inner join TABLE3 t3 on t2.TABLE3ID=t3.ID
where t3.TYPEID in (234,3434) and t1.USER_ID is not null and t1.USER_ID <> '12345';
Visual EXPLAIN shows the following, and adding an index on UID is not helping. How can I optimize the performance of this query?
I tried adding an index on TABLE1.UID
Converting into a subquery
A simple query like SELECT * FROM TABLE3 where UID="SOMEUID" takes 800+ ms to fetch data
Change it to a JOIN.
DELETE t1
FROM TABLE1 AS t1
JOIN (SELECT uid FROM ...) AS t2 ON t1.uid = t2.uid
WHERE USER_ID is not null and USER_ID <> '12345';
I've found that MySQL implements WHERE uid IN (subquery) very poorly sometimes. Instead of getting all the results of the subquery and looking them up in the index of the table, it scans the table and performs the subquery for each row, then checks if the uid is in that result.
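As a sketch of what the JOIN form could look like with the elided subquery spelled out, the statement below reuses the tables and filter values from the question. The two index names are hypothetical, and the indexes themselves are a suggestion based on the posted schema, which shows no index on TABLE3.TYPEID or TABLE2.UID.

```sql
-- Hypothetical index names; they support the filter and the join
-- inside the derived table.
ALTER TABLE TABLE3 ADD INDEX idx_typeid (TYPEID);
ALTER TABLE TABLE2 ADD INDEX idx_table3id_uid (TABLE3ID, UID);

-- Collect the matching UIDs once, then join TABLE1 against that set.
-- TABLE1.ukey2 starts with UID, so the lookup into TABLE1 can use it.
DELETE t1
FROM TABLE1 AS t1
JOIN (
    SELECT DISTINCT t2.UID
    FROM TABLE2 AS t2
    JOIN TABLE3 AS t3 ON t3.ID = t2.TABLE3ID
    WHERE t3.TYPEID IN (234, 3434)
) AS x ON x.UID = t1.UID
WHERE t1.USER_ID IS NOT NULL
  AND t1.USER_ID <> '12345';
```

Running the derived-table SELECT on its own first gives an idea of how many UIDs are involved before anything is deleted.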
First of all, make a backup of that table. This is the first rule before running any DELETE query, because a mistake can ruin the data; also take any other precautions you consider necessary.
( uid1,uid2,...uid45000)
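What do the values between the parentheses mean? Do you need to compare against all the UID values in the table, or only some of them?
If it is all of them, you can avoid listing the UIDs manually. Note that MySQL does not let a DELETE select directly from its own target table in a subquery (error 1093), so the inner SELECT is wrapped in a derived table:
delete from TABLE1 where UID in (select UID from (SELECT DISTINCT T.UID FROM TABLE1 as T where T.UID is not NULL and T.USER_ID <> '12345') as x);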
Before doing this, check exactly which values you want between the parentheses, and run the command in a TEST environment first with dummy values.
Keep in mind that the UID column is a varchar, which is one reason this operation takes much longer than it would with integer values.
Another option is to create a new table, copy into it the rows you want to keep, then truncate the original table and reinsert those rows from the new table.
Before running any of these solutions, check all your constraints with your teammates and test with dummy values.
I would split your UID filter list into chunks (100 per chunk, or another size found by testing) and iterate over them, possibly in multiple threads.
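A minimal sketch of the chunked approach, assuming the application splits the UID list from the question into slices. The literal values are placeholders, and the chunk size of three is only for illustration.

```sql
-- Chunk 1: a short IN list keeps each statement (and its transaction) small.
DELETE FROM TABLE1
WHERE UID IN ('uid1', 'uid2', 'uid3')
  AND USER_ID IS NOT NULL
  AND USER_ID <> '12345';

-- Chunk 2: the application repeats the statement with the next slice.
DELETE FROM TABLE1
WHERE UID IN ('uid4', 'uid5', 'uid6')
  AND USER_ID IS NOT NULL
  AND USER_ID <> '12345';
```

With autocommit enabled, each statement commits on its own, so a failure partway through loses only the current chunk.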
Why does this query show "Using temporary; Using filesort" on the first table?
Explain
SELECT `table`.*, COUNT(table.id) AS `count`
FROM `table`
LEFT JOIN `table2` ON table.id = table2.foreign_id
GROUP BY `table2`.`foreign_id`
ORDER BY table.`title` ASC
1 SIMPLE table ALL NULL NULL NULL NULL 305 Using temporary; Using filesort
1 SIMPLE table2 ref table table 5 table.id 343 Using index
According to the docs, this should be possible without these slow operations.
EDIT:
The tables are as simple as they could be.
CREATE TABLE `table` (
`id` int(11) NOT NULL,
`title` varchar(100) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `table2` (
`id` int(11) NOT NULL,
`foreign_id` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
`table`.* -- This implies that you want all the rows (and all columns)
COUNT(table.id) -- This implies that you just want a single number
Although MySQL lets you run the query, it is not a proper query. (With ONLY_FULL_GROUP_BY enabled, which is the default since MySQL 5.7.5 and therefore in 8.0, it will be rejected with an error.)
If id is the PRIMARY KEY, then GROUP BY table.id further complicates the issue. That says to produce one row for each id. But since id is unique, there is only one row for each id. So the GROUP BY is redundant.
Please describe what the query is supposed to do; we can help you correctly formulate it.
Meanwhile, use InnoDB instead of MyISAM.
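If the intent is to list every row of `table` together with the number of matching rows in `table2` (an assumption, since the question does not say), one well-formed way to write it is sketched below. Every selected non-aggregate column appears in the GROUP BY, so it is also accepted under ONLY_FULL_GROUP_BY.

```sql
-- COUNT(t2.foreign_id) counts only matched rows, so a row of `table`
-- with no match in table2 gets 0 (COUNT(t.id) would report 1 for it).
SELECT t.id,
       t.title,
       COUNT(t2.foreign_id) AS `count`
FROM `table` AS t
LEFT JOIN `table2` AS t2 ON t2.foreign_id = t.id
GROUP BY t.id, t.title
ORDER BY t.title ASC;
```

Sorting by title can still require a filesort unless an index on title exists and the optimizer chooses it.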
I have the following tables:
mysql> show create table rsspodcastitems \G
*************************** 1. row ***************************
Table: rsspodcastitems
Create Table: CREATE TABLE `rsspodcastitems` (
`id` char(20) NOT NULL,
`description` mediumtext,
`duration` int(11) default NULL,
`enclosure` mediumtext NOT NULL,
`guid` varchar(300) NOT NULL,
`indexed` datetime NOT NULL,
`published` datetime default NULL,
`subtitle` varchar(255) default NULL,
`summary` mediumtext,
`title` varchar(255) NOT NULL,
`podcast_id` char(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `podcast_id` (`podcast_id`,`guid`),
UNIQUE KEY `UKfb6nlyxvxf3i2ibwd8jx6k025` (`podcast_id`,`guid`),
KEY `IDXkcqf7wi47t3epqxlh34538k7c` (`indexed`),
KEY `IDXt2ofice5w51uun6w80g8ou7hc` (`podcast_id`,`published`),
KEY `IDXfb6nlyxvxf3i2ibwd8jx6k025` (`podcast_id`,`guid`),
KEY `published` (`published`),
FULLTEXT KEY `title` (`title`),
FULLTEXT KEY `summary` (`summary`),
FULLTEXT KEY `subtitle` (`subtitle`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> show create table station_cache \G
*************************** 1. row ***************************
Table: station_cache
Create Table: CREATE TABLE `station_cache` (
`Station_id` char(36) NOT NULL,
`item_id` char(20) NOT NULL,
`item_type` int(11) NOT NULL,
`podcast_id` char(20) NOT NULL,
`published` datetime NOT NULL,
KEY `Station_id` (`Station_id`,`published`),
KEY `IDX12n81jv8irarbtp8h2hl6k4q3` (`Station_id`,`published`),
KEY `item_id` (`item_id`,`item_type`),
KEY `IDXqw9yqpavo9fcduereqqij4c80` (`item_id`,`item_type`),
KEY `podcast_id` (`podcast_id`,`published`),
KEY `IDXkp2ehbpmu41u1vhwt7qdl2fuf` (`podcast_id`,`published`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
The "item_id" column of the second refers to the "id" column of the former (there isn't a foreign key between the two because the relationship is polymorphic, i.e. the second table may have references to entities that aren't in the first but in other tables that are similar but distinct).
I'm trying to get a query that lists the most recent items in the first table that do not have any corresponding items in the second. The highest performing query I've found so far is:
select i.*,
(select count(station_id)
from station_cache
where item_id = i.id) as stations
from rsspodcastitems i
having stations = 0
order by published desc
I've also considered using a where not exists (...) subquery to perform the restriction, but this was actually slower than the one I have above. But this is still taking a substantial length of time to complete. MySQL's query plan doesn't seem to be using the available indices:
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
| 1 | PRIMARY | i | ALL | NULL | NULL | NULL | NULL | 106978 | Using filesort |
| 2 | DEPENDENT SUBQUERY | station_cache | ALL | NULL | NULL | NULL | NULL | 44227 | Using where |
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
Note that neither portion of the query is using a key, whereas it ought to be able to use KEY published (published) from the primary table and KEY item_id (item_id,item_type) for the subquery.
Any suggestions how I can get an appropriate result without waiting for several minutes?
I would expect the fastest query to be:
select i.*
from rsspodcastitems i
where not exists (select 1
from station_cache sc
where sc.item_id = i.id
)
order by published desc;
This would take advantage of an index on station_cache(item_id) and perhaps rsspodcastitems(published, id).
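One thing worth checking before comparing the variants: the EXPLAIN output in the question shows possible_keys as NULL for station_cache, and the two tables use different character sets (utf8 and latin1). The MySQL manual notes that comparing a utf8 column with a latin1 column precludes use of an index, so this mismatch may be why the item_id index is ignored. This is an inference from the posted schema, not something the question or this answer confirms. A sketch of aligning the column and re-checking the plan:

```sql
-- Make station_cache.item_id use the same character set as
-- rsspodcastitems.id (utf8). Test on a copy first: this rebuilds the table.
ALTER TABLE station_cache
    MODIFY item_id CHAR(20) CHARACTER SET utf8 NOT NULL;

-- Re-check whether the subquery can now use KEY item_id (item_id, item_type).
EXPLAIN
SELECT i.*
FROM rsspodcastitems AS i
WHERE NOT EXISTS (SELECT 1
                  FROM station_cache AS sc
                  WHERE sc.item_id = i.id)
ORDER BY i.published DESC;
```

If the plan for station_cache changes from ALL to ref, the character set mismatch was the limiting factor.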
Your query could be faster if it returns a significant number of rows, because its phrasing allows the index on rsspodcastitems(published) to avoid the file sort. If you remove the order by, the exists version should be faster.
I should note that I like your use of the having clause. When faced with this in the past, I have used a subquery:
select i.*,
(select count(station_id)
from station_cache
where item_id = i.id) as stations
from (select i.*
from rsspodcastitems i
order by published desc
) i
where not exists (select 1
from station_cache sc
where sc.item_id = i.id
);
This allows one index for sorting.
I prefer a slight variation on your method:
select i.*,
(exists (select 1
from station_cache sc
where sc.item_id = i.id
)
) as has_station
from rsspodcastitems i
having has_station = 0
order by published desc;
This should be slightly faster than the version with count().
You might want to detect and remove redundant indexes from your tables. Reviewing the CREATE TABLE output for both tables will help you discover several, including duplicates on (podcast_id, guid), (Station_id, published), (item_id, item_type) and (podcast_id, published); there may be more.
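Based on the SHOW CREATE TABLE output in the question, removing the duplicates could look like the sketch below. Which copy to keep is an assumption here (the short names are kept and the generated-looking ones dropped). If a schema-generation tool created the UK.../IDX... names, it may recreate them or expect them to exist.

```sql
-- rsspodcastitems: three indexes cover (podcast_id, guid); keep the
-- UNIQUE KEY `podcast_id` and drop the other two.
ALTER TABLE rsspodcastitems
    DROP INDEX `UKfb6nlyxvxf3i2ibwd8jx6k025`,
    DROP INDEX `IDXfb6nlyxvxf3i2ibwd8jx6k025`;

-- station_cache: each column pair is indexed twice; keep one of each.
ALTER TABLE station_cache
    DROP INDEX `IDX12n81jv8irarbtp8h2hl6k4q3`,
    DROP INDEX `IDXqw9yqpavo9fcduereqqij4c80`,
    DROP INDEX `IDXkp2ehbpmu41u1vhwt7qdl2fuf`;
```

Fewer indexes mean less work on every INSERT, UPDATE and DELETE, and fewer candidates for the optimizer to weigh.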
My eventual solution was to delete the full text indices and use an externally generated index table (produced by iterating over the words in the text, filtering stop words, and applying a stemming algorithm) to allow searching. I don't know why the full text indices were causing performance problems, but they seemed to slow down every query that touched the table even if they weren't used.
Problem: slow query.
table1 has about 5 000 rows
table2 has about 50 000 rows
timestamp format is int(11)
MySQL - 20 seconds (with indexes)
PostgreSQL - 0,04 seconds (with indexes)
SELECT *
FROM table1
LEFT JOIN table2
ON table2_timestamp BETWEEN table1_timestamp - 500
AND table1_timestamp + 500 ;
Can anybody help me optimize this query for MySQL?
Explain:
1 SIMPLE a index a 9 2 Using index
1 SIMPLE b index b b 9 5 Using index
Tables:
CREATE TABLE `a` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`table1_timestamp` bigint(20) NULL DEFAULT NULL ,
PRIMARY KEY (`id`),
INDEX `a` (`table1_timestamp`) USING BTREE
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci
AUTO_INCREMENT=3
ROW_FORMAT=COMPACT
;
CREATE TABLE `b` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`table2_timestamp` bigint(20) NULL DEFAULT NULL ,
PRIMARY KEY (`id`),
INDEX `a` (`table2_timestamp`) USING BTREE
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci
AUTO_INCREMENT=3
ROW_FORMAT=COMPACT
;
A couple of points spring to mind but both feel like long-shots. Realistically it looks as though there shouldn't be much you can do to your query assuming your example is an accurate representation.
1 : You are using BIGINT, which has a maximum value of about 9x10^18 (SIGNED). INT has a maximum value of about 4x10^9 (UNSIGNED), while today's timestamp is around 1.4x10^9 (all values approximate), so consider changing the data type of that column in both tables from BIGINT to INT UNSIGNED or DATETIME.
2 : The ROW_FORMAT is COMPACT, which may cause issues with BTREE indexes (source). You are dealing with INT data types, so a fixed-length row format would suffice. Note, however, that ROW_FORMAT=FIXED exists only for MyISAM; InnoDB (used here) supports REDUNDANT, COMPACT, DYNAMIC and COMPRESSED, so this suggestion applies only if you switch the tables to MyISAM.
3 : If you always expect rows to be returned from table2 for table1 rows, then INNER JOIN would be more efficient than LEFT JOIN.
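Points 1 and 3 can be sketched as follows, using the table and column names from the posted DDL. This assumes the stored values are Unix timestamps in seconds, so they fit in INT UNSIGNED; larger values would be rejected or clipped by the ALTER, depending on the SQL mode.

```sql
-- Point 1: shrink the timestamp columns. MODIFY rewrites the column
-- definition, so NULL DEFAULT NULL is repeated from the original DDL.
ALTER TABLE a MODIFY table1_timestamp INT UNSIGNED NULL DEFAULT NULL;
ALTER TABLE b MODIFY table2_timestamp INT UNSIGNED NULL DEFAULT NULL;

-- Point 3: inner join, only if every row of a is expected to have a match
-- (unmatched rows of a disappear from the result).
SELECT *
FROM a
INNER JOIN b
    ON b.table2_timestamp BETWEEN a.table1_timestamp - 500
                              AND a.table1_timestamp + 500;
```

One side effect of UNSIGNED: table1_timestamp - 500 raises an out-of-range error for values below 500 unless the NO_UNSIGNED_SUBTRACTION SQL mode is set, which should not matter for realistic Unix timestamps.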
To start out here is a simplified version of the tables involved.
tbl_map has approx 4,000,000 rows, tbl_1 has approx 120 rows, and tbl_2 contains approx 5,000,000 rows. I know this shouldn't be considered that large, given that Google, Yahoo!, etc. use much larger datasets, so I assume that I'm missing something.
CREATE TABLE `tbl_map` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`tbl_1_id` bigint(20) DEFAULT '-1',
`tbl_2_id` bigint(20) DEFAULT '-1',
`rating` decimal(3,3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tbl_1_id` (`tbl_1_id`),
KEY `tbl_2_id` (`tbl_2_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_2` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`data` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The query of interest is below; it uses ORDER BY t.id DESC instead of ORDER BY RAND(). It takes as much as 5~10 seconds and causes a considerable wait when users view this page.
EXPLAIN SELECT t.data, t.id , tm.rating
FROM tbl_2 AS t
JOIN tbl_map AS tm
ON t.id = tm.tbl_2_id
WHERE tm.tbl_1_id =94
AND tm.rating IS NOT NULL
ORDER BY t.id DESC
LIMIT 200
1 SIMPLE tm ref tbl_1_id, tbl_2_id tbl_1_id 9 const 703438 Using where; Using temporary; Using filesort
1 SIMPLE t eq_ref PRIMARY PRIMARY 8 tm.tbl_2_id 1
I would just like to speed up the query, ensure that I have proper indexes, etc.
I appreciate any advice from DB Gurus out there! Thanks.
SUGGESTION : Index the table as follows:
ALTER TABLE tbl_map ADD INDEX (tbl_1_id,rating,tbl_2_id);
As per Rolando, yes, you definitely need an index on the map table, but I would expand it to ALSO include tbl_2_id, which serves your ORDER BY on table 2's ID (the same value is stored in the map table, so just use that index). Since the index then holds all 3 fields, it covers the key search on tbl_1_id, the NULL (or not) test on rating, and, as its third element, the column used by your ORDER BY clause.
INDEX (tbl_1_id,rating, tbl_2_id);
Then, I would just have the query as
SELECT STRAIGHT_JOIN
t.data,
t.id ,
tm.rating
FROM
tbl_map tm
join tbl_2 t
on tm.tbl_2_id = t.id
WHERE
tm.tbl_1_id = 94
AND tm.rating IS NOT NULL
ORDER BY
tm.tbl_2_id DESC
LIMIT 200
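One caveat to verify with EXPLAIN: because rating IS NOT NULL matches a range of rating values, entries in an index on (tbl_1_id, rating, tbl_2_id) are ordered by tbl_2_id only within each rating, so MySQL may still report Using filesort. A possible alternative, offered here as an assumption to test and not as part of the answers above, is to put tbl_2_id directly after the equality column so that the index order matches the ORDER BY. The index name is hypothetical.

```sql
-- tbl_1_id = 94 is an equality match, so within it the index is ordered by
-- tbl_2_id; rating is included so the IS NOT NULL test is answered from
-- the index without touching the row.
ALTER TABLE tbl_map ADD INDEX idx_t1_t2_rating (tbl_1_id, tbl_2_id, rating);

EXPLAIN
SELECT STRAIGHT_JOIN t.data, t.id, tm.rating
FROM tbl_map AS tm
JOIN tbl_2 AS t ON tm.tbl_2_id = t.id
WHERE tm.tbl_1_id = 94
  AND tm.rating IS NOT NULL
ORDER BY tm.tbl_2_id DESC
LIMIT 200;
```

If the Extra column no longer shows Using temporary; Using filesort, MySQL can walk the index backwards and stop after 200 qualifying rows.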