I dont have other ideas on how to optimize this query, runs about 2,5 sec with goodsXML table over 2 million rows. It looks like order by is slowing a lot, but i cant remove it. Also, it depends a lot on how many items selected here gx.categoryID IN(892), because later another tables joins this items set. I cant make joins after this option, because joined tables perform in where clauses.
SELECT MD5(CONCAT(gx.id,598)) citySort,gx.dateCreated lastModifiedSince,IF(DATE(gx.dateModified)>=(IF((EXTRACT(HOUR FROM NOW()) BETWEEN 0 AND 6),DATE(NOW() -INTERVAL 6 HOUR),DATE(NOW()))) OR DATE(gx.dateModified)>=DATE(NOW()),1,0) isActual,
gx.id,p.producerName,gx.categoryID,gx.name,CONCAT('::',gxi.imageName) images,IF(CONCAT('::',gxi.imageName)!='',1,0) imExist,gx.price,gx.oldPrice,gx.oldPricePt,gx.sourceUrl,IF(s.offerPostingType='XML',IF(s.alternateName!='',s.alternateName,s.name),CONCAT(u.lastName,' ',u.name)) shopName,s.logoName,
s.id shopID,s.active shopActive,s.offerPostingType,c.titleAdd,'Москва' cityName,
IF((s.cityID='598' AND s.deliveryByCity=1) OR (sa.cityID='598' AND sa.deliveryByCity=1) OR (s.deliveryByMRCities LIKE '%^598^%' AND s.deliveryByMR=1),1,0) deliveryInYourCity,
IF(s.deliveryByCityAll=1 OR (s.cityID='598' AND s.deliveryByCity=1) OR (sa.cityID='598' AND sa.deliveryByCity=1) OR (s.deliveryByMRCities LIKE '%^598^%' AND s.deliveryByMR=1),1,0) deliveryByCity,
IF(s.deliveryByMail=1,1,0) deliveryByMail,
IF(s.deliveryBySelfAll=1 OR (s.cityID='598' AND s.deliveryBySelf=1) OR (sa.cityID='598' AND sa.deliveryBySelf=1),1,0) deliveryBySelf
FROM goodsXML gx
JOIN category c ON c.id=gx.categoryID
LEFT JOIN producer p ON p.id=gx.producerID
JOIN shop s ON s.id=gx.shopID
LEFT JOIN shopAddress sa ON sa.shopID=s.id
LEFT JOIN users u ON u.id=s.userID
LEFT JOIN goodsXMLImages gxi ON gxi.goodsXMLID=gx.id AND gxi.isMain = 1
WHERE 1=1 AND (s.cityID='598' OR s.deliveryByCityAll=1 OR s.deliveryBySelfAll=1 OR s.deliveryByMail=1 OR sa.cityID='598' OR (s.deliveryByMR=1 AND s.deliveryByMRCities LIKE '%^598^%')) AND (s.isPaying=0 OR u.balance>0) AND gx.categoryID IN(892)
GROUP BY gx.id
ORDER BY isActual DESC,imExist DESC,gx.PPC DESC,gx.payPrior ASC,citySort DESC
LIMIT 0,40
Explain is following:
+----+-------------+-------+--------+--------------------------------+-----------------+----------------+------------------------+--------------------+--------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
+----+-------------+-------+--------+--------------------------------+-----------------+----------------+------------------------+--------------------+--------------------------------+
| | | | | | | | | | |
| 1 | SIMPLE | c | const | PRIMARY | PRIMARY | 4 | const | 1 | Using temporary; Using filesor |
| | | | | | | | | | |
| 1 | SIMPLE | gx | ref | ixGroupNameCategoryIDShopIDPro | ixCategoryID... | ixCategoryIDid | 4 | const | 82005 |
| | | | | ducerID | | | | | |
| | | | | | | | | | |
| 1 | SIMPLE | s | eq_ref | PRIMARY | deliveryByMR | PRIMARY | 4 | vsesrazu.gx.shopID | 1 |
| | | | | | | | | | |
| 1 | SIMPLE | sa | ref | shopKey | shopKey | 5 | vsesrazu.s.id | 2 | Using where |
| | | | | | | | | | |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | vsesrazu.s.userID | 1 | Using where |
| | | | | | | | | | |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 4 | vsesrazu.gx.producerID | 1 | |
| | | | | | | | | | |
| 1 | SIMPLE | gxi | ref | over | over | 4 | vsesrazu.gx.id | 1 | |
+----+-------------+-------+--------+--------------------------------+-----------------+----------------+------------------------+--------------------+--------------------------------+
Show create table for goodsXML:
CREATE TABLE goodsXML (
id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
localID char(255) NOT NULL,
groupID char(255) DEFAULT NULL,
dateCreated datetime NOT NULL,
dateModified datetime NOT NULL,
dateModifiedPrice datetime NOT NULL,
name char(255) DEFAULT NULL,
nameHash char(32) NOT NULL,
groupName char(255) DEFAULT NULL,
newGroupName char(255) NOT NULL,
url char(255) NOT NULL,
sourceUrl char(255) NOT NULL,
categoryID int(6) unsigned NOT NULL,
producerID int(6) DEFAULT NULL,
authorID int(6) DEFAULT NULL,
shopID int(6) NOT NULL,
XMLUrlOrder tinyint(2) NOT NULL,
price float(12,2) NOT NULL,
oldPrice float(12,2) NOT NULL,
oldPricePt smallint(3) NOT NULL,
description text,
descriptionHash char(32) NOT NULL,
descriptionForGroup text NOT NULL,
imExist tinyint(1) NOT NULL DEFAULT '0',
imagesForGroup tinyint(1) NOT NULL DEFAULT '0',
videoHighlight text NOT NULL,
videoSiteUrl char(255) NOT NULL,
videoChannelUrl char(255) NOT NULL,
plusesMinuses text NOT NULL,
toIndex tinyint(1) NOT NULL DEFAULT '0',
isRST tinyint(1) NOT NULL DEFAULT '0',
isReplica tinyint(1) NOT NULL DEFAULT '0',
status tinyint(1) NOT NULL DEFAULT '0',
comment char(255) NOT NULL,
daysLeft tinyint(2) NOT NULL,
PPC float(5,2) DEFAULT '0.00',
payPrior tinyint(1) NOT NULL DEFAULT '4',
PRIMARY KEY (id),
UNIQUE KEY ixGroupNameCategoryIDShopIDProducerID (shopID,localID),
KEY ixGroupNameCategoryID (groupName,categoryID),
KEY ixStatusShopID (status,shopID),
KEY ixCategoryID (categoryID),
KEY authorID (authorID),
KEY ixDateModified (dateModified,imExist),
KEY daysLeft (daysLeft),
KEY sourceUrl (sourceUrl),
KEY ixCategoryIDid (categoryID,id)
) ENGINE=MyISAM AUTO_INCREMENT=4218880 DEFAULT CHARSET=utf8
The EXPLAIN shows that it needs to scan about 82K rows of gx. There are apparently about that many rows with categoryID = 892, correct? Most of the rest is straightforward JOINs.
Don't use MyISAM, use InnoDB.
INT(6) -- the (6) means nothing. Perhaps you meant MEDIUMINT UNSIGNED? INT is 4 bytes; MEDIUMINT is 3.
Are you 'always' searching by category? If so, make use of InnoDB's "clustered" PK by switching to PRIMARY KEY (categoryID, id), INDEX(id) and chuck the two existing indexes starting with categoryID.
Don't use CHAR unless the column is truly fixed length; use VARCHAR.
Don't use FLOAT(m,n), it can lead to subtle rounding errors. For Money, use DECIMAL(m,n); for Scientific values, use FLOAT.
ORs defeat optimizations. See if you can redesign the schema to avoid some of them.
What is LIKE '%^598^%'? Do you have a list of numbers in that column?
After switching to InnoDB, decrease key_buffer_size to only 30M and increase innodb_buffer_pool_size to 70% of available RAM`.
Related
I have three tables that are concerned by this query
CREATE TABLE `tags` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`latName` varchar(191) NOT NULL,
`araName` varchar(191) NOT NULL,
`active` tinyint(1) NOT NULL DEFAULT 0,
`img_name` varchar(191) DEFAULT NULL,
`icon` varchar(191) DEFAULT NULL,
`rgba_color` varchar(191) DEFAULT NULL,
`color` varchar(191) DEFAULT NULL,
`overlay` varchar(191) DEFAULT NULL,
`position` int(11) NOT NULL,
`mdi_icon` varchar(191) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `tags_latname_unique` (`latName`),
UNIQUE KEY `tags_araname_unique` (`araName`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb3
CREATE TABLE `newspapers` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`latName` varchar(191) NOT NULL,
`araName` varchar(191) NOT NULL,
`img_name` varchar(191) DEFAULT NULL,
`active` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
UNIQUE KEY `newspapers_latname_unique` (`latName`),
UNIQUE KEY `newspapers_araname_unique` (`araName`)
) ENGINE=InnoDB AUTO_INCREMENT=21 DEFAULT CHARSET=utf8mb3
CREATE TABLE `articles` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`newspaper_id` bigint(20) unsigned NOT NULL,
`tag_id` bigint(20) unsigned NOT NULL,
`seen` int(10) unsigned NOT NULL,
`link` varchar(1000) NOT NULL,
`title` varchar(191) NOT NULL,
`img_name` varchar(191) NOT NULL,
`date` datetime NOT NULL,
`paragraph` text NOT NULL,
`read_time` int(11) DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `articles_link_unique` (`link`),
UNIQUE KEY `articles_img_name_unique` (`img_name`),
KEY `articles_newspaper_id_foreign` (`newspaper_id`),
KEY `articles_tag_id_foreign` (`tag_id`),
CONSTRAINT `articles_newspaper_id_foreign` FOREIGN KEY (`newspaper_id`) REFERENCES `newspapers` (`id`),
CONSTRAINT `articles_tag_id_foreign` FOREIGN KEY (`tag_id`) REFERENCES `tags` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=47421 DEFAULT CHARSET=utf8mb3
Basically, I want to load the latest 5 articles (ordered by date) that have an active newspaper and active tag.
Right now articles table contains about 40k entries.
This is the query generated by Laravel's query builder
SELECT `articles`.*
FROM `articles`
INNER JOIN `tags` ON `tags`.`id` = `articles`.`tag_id`
AND `tags`.`active` = 1
INNER JOIN `newspapers` ON `newspapers`.`id` = `articles`.`newspaper_id`
AND `newspapers`.`active` = 1
ORDER BY `date` DESC
LIMIT 5;
It takes Mysql about 6sec to run the query, when I remove the ORDER BY clause, the query becomes very fast (0.001sec).
Here is the query explanation:
+------+-------------+------------+--------+-------------------------------------------------------+-------------------------------+---------+------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+--------+-------------------------------------------------------+-------------------------------+---------+------------------------+------+----------------------------------------------+
| 1 | SIMPLE | newspapers | ALL | PRIMARY | NULL | NULL | NULL | 18 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | articles | ref | articles_newspaper_id_foreign,articles_tag_id_foreign | articles_newspaper_id_foreign | 8 | mouhim.newspapers.id | 1127 | |
| 1 | SIMPLE | tags | eq_ref | PRIMARY | PRIMARY | 8 | mouhim.articles.tag_id | 1 | Using where |
+------+-------------+------------+--------+-------------------------------------------------------+-------------------------------+---------+------------------------+------+----------------------------------------------+
I tried creating an index on the date attribute but it didn't help.
for convenience, this is how I am using Query Builder for this query:
Article::select("articles.*")
->join("tags", function ($join) {
$join->on("tags.id", "articles.tag_id")
->where("tags.active", 1);
})
->join("newspapers", function ($join) {
$join->on("newspapers.id", "articles.newspaper_id")
->where("newspapers.active", 1);
})
->orderBy("date", "desc")
->paginate(5)
At first, I was using Eloquent (whereHas) but Eloquent was generating non optimized query using (where exists), so I had to go the joins way.
What can I do to improve execution time of this query?
Result of SHOW INDEXES FROM articles;
+----------+------------+-------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+----------+------------+-------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| articles | 0 | PRIMARY | 1 | id | A | 36072 | NULL | NULL | | BTREE | | | NO |
| articles | 0 | articles_link_unique | 1 | link | A | 36072 | NULL | NULL | | BTREE | | | NO |
| articles | 0 | articles_img_name_unique | 1 | img_name | A | 36072 | NULL | NULL | | BTREE | | | NO |
| articles | 1 | articles_newspaper_id_foreign | 1 | newspaper_id | A | 32 | NULL | NULL | | BTREE | | | NO |
| articles | 1 | articles_tag_id_foreign | 1 | tag_id | A | 12 | NULL | NULL | | BTREE | | | NO |
| articles | 1 | data | 1 | date | A | 36072 | NULL | NULL | | BTREE | | | NO |
+----------+------------+-------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
This query was suggested by Rick James as a solution
SELECT `articles`.*
FROM `articles`
WHERE EXISTS ( SELECT 1 FROM tags WHERE id = `articles`.`tag_id` and active = 1)
AND EXISTS ( SELECT 1 FROM newspapers WHERE id = `articles`.`newspaper_id` and active = 1)
ORDER BY `date` DESC
LIMIT 5;
Running EXPLAIN on this query yields the following result
+------+-------------+------------+--------+-------------------------------------------------------+-------------------------------+---------+------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+--------+-------------------------------------------------------+-------------------------------+---------+------------------------+------+----------------------------------------------+
| 1 | PRIMARY | newspapers | ALL | PRIMARY | NULL | NULL | NULL | 18 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | articles | ref | articles_newspaper_id_foreign,articles_tag_id_foreign | articles_newspaper_id_foreign | 8 | mouhim.newspapers.id | 1127 | |
| 1 | PRIMARY | tags | eq_ref | PRIMARY | PRIMARY | 8 | mouhim.articles.tag_id | 1 | Using where |
+------+-------------+------------+--------+-------------------------------------------------------+-------------------------------+---------+------------------------+------+----------------------------------------------+
Assuming you don't want dups, change to this; it is likely to be much faster:
SELECT `articles`.*
FROM `articles`
WHERE EXISTS ( SELECT 1 FROM tags
WHERE id = `articles`.`tag_id` )
AND EXISTS ( SELECT 1 FROM newspapers
WHERE id = `articles`.`newspaper_id` )
ORDER BY `date` DESC
LIMIT 5;
Also, have this index on articles:
INDEX(date)
(This is a rare use case for starting index with a column that will be used in a 'range'.)
(Sorry, I don't speak 'Laravel'; maybe someone else can help with that part.)
PS. Having 3 UNIQUE keys on a table is highly unusual. It often indicates a problem with the schema design.
each article has one and only one Tag associated with it
Can multiple articles have the same Tag?
when I remove the ORDER BY clause, the query becomes very fast (0.001sec).
That is because you get whatever 5 rows are easy to return to you. Clearly the ORDER BY is part of the requirement. "Using temporary; Using filesort" says there was at least a sort. It will actually be a "file" sort -- because SELECT * includes a TEXT column. (There is a technique to avoid "file", but I don't think it is needed here.)
I am not sure if the two queries are supposed to be same, but they are not.
Anyway for the second query I think this should be better
Article::leftJoin('tags', 'articles.tag_id', '=', 'tags.id)
->where('tags.latName', $tag)
->orderBy("articles.date", "desc")
->select(['articles.*'])
->paginate(5);
The problem is probably, that the subquery you created in whereIn is slowing it down and whereIn itself may as well slow your query. This may be eased by using join and where.
As for the first query, can you show how you did the index for date? :)
I have a single table book_log Mysql 5.7
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| book_id | int(11) | YES | MUL | NULL | |
| type | int(11) | NO | MUL | NULL | |
| value | int(11) | YES | | NULL | |
| created_at | datetime | NO | | NULL | |
+------------+----------+------+-----+---------+----------------+
Book table makes connection with series (One series can have many books)
Create table info :
book_log | CREATE TABLE `book_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`book_id` int(11) DEFAULT NULL,
`type` int(11) NOT NULL,
`value` int(11) DEFAULT NULL,
`created_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_7E42115316A2B381` (`book_id`),
KEY `IDX_TYPE` (`type`),
KEY `IDX_ME` (`book_id`,`type`) USING BTREE,
CONSTRAINT `FK_7E42115316A2B381` FOREIGN KEY (`book_id`) REFERENCES `book` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1158962 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
book | CREATE TABLE `book` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`series_id` int(11) DEFAULT NULL,
`language_id` int(11) DEFAULT NULL,
`position` int(11) NOT NULL,
`dir` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_CBE5A3315278319C` (`series_id`),
KEY `IDX_CBE5A33182F1BAF4` (`language_id`),
CONSTRAINT `FK_CBE5A3315278319C` FOREIGN KEY (`series_id`) REFERENCES `series` (`id`),
CONSTRAINT `FK_CBE5A33182F1BAF4` FOREIGN KEY (`language_id`) REFERENCES `language` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=55022 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
I make the avg value for a given series
select AVG(value)
from book_log
join book b on book_log.book_id = b.id
where type = 20 and b.series_id = ?;
Explain :
+----+-------------+----------+------------+------+------------------------------+----------------------+---------+----------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+------------------------------+----------------------+---------+----------------+------+----------+-------------+
| 1 | SIMPLE | b | NULL | ref | PRIMARY,IDX_CBE5A3315278319C | IDX_CBE5A3315278319C | 5 | const | 212 | 100.00 | Using index |
| 1 | SIMPLE | book_log | NULL | ref | IDX_7E42115316A2B381,IDX_ME | IDX_7E42115316A2B381 | 5 | bdd.b.id | 33 | 100.00 | NULL |
+----+-------------+----------+------------+------+------------------------------+----------------------+---------+----------------+------+----------+-------------+
Or
select AVG(value)
from book_log
where type = 20 AND book_id IN (
select id from book where series_id = ?
);
Explain :
+----+-------------+----------+------------+------+--------------------------------------+----------------------+---------+-------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+--------------------------------------+----------------------+---------+-------------------+------+----------+-------------+
| 1 | SIMPLE | book | NULL | ref | PRIMARY,IDX_CBE5A3315278319C | IDX_CBE5A3315278319C | 5 | const | 212 | 100.00 | Using index |
| 1 | SIMPLE | book_log | NULL | ref | IDX_7E42115316A2B381,IDX_TYPE,IDX_ME | IDX_7E42115316A2B381 | 5 | bdd.book.id | 33 | 3.72 | Using where |
+----+-------------+----------+------------+------+--------------------------------------+----------------------+---------+-------------------+------+----------+-------------+
I have 10 973 results for these query, 42ms for a count(*) but more than 1 sec for the avg query.
I don't understand why is it so long.
Any idea ?
Thx.
You can expect to COUNT(*) be as fast or faster than SUM(somecol) or AVG(othercolumn). Why? The database server is, by the rules of SQL, to apply any optimization that yields the correct anwer. COUNT(*) has some serious optimization to it.
But the aggregate functions that do arithmetic ; they must instead examine every record. So, slower.
You can create a purpose-built index for your query.
It is this:
ALTER TABLE book_log
ADD INDEX type_id_val
(type, book_id, val)
I chose these columns for the index because your query searches for a particuler type in the index. Upon finding the first row of the chosen type, MySQL can range-scan through just the index and not the table. So, faster. It's called a covering index.
I crawed lots of data, and saved it into mysql table, but there're some data duplicated, and I want to delete them in a effective way.
table (ads_info)
+------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ad_id | varchar(64) | YES | MUL | NULL | |
| adset_id | varchar(64) | YES | MUL | NULL | |
| campaign_id | varchar(64) | YES | | NULL | |
| account_id | varchar(64) | YES | MUL | NULL | |
| conversion_specs | text | YES | | NULL | |
| creative | text | YES | | NULL | |
| effective_status | varchar(32) | YES | | NULL | |
| status | varchar(32) | YES | | NULL | |
| name | varchar(255) | YES | | NULL | |
| tracking_specs | text | YES | | NULL | |
| object_store_url | varchar(255) | YES | | NULL | |
| link | varchar(255) | YES | | NULL | |
| object_type | varchar(32) | YES | | NULL | |
| updated_time | timestamp | YES | | NULL | |
| created_time | timestamp | YES | | NULL | |
+------------------+--------------+------+-----+---------+----------------+
show create table ads_info
CREATE TABLE `ads_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ad_id` varchar(64) DEFAULT NULL,
`adset_id` varchar(64) DEFAULT NULL,
`campaign_id` varchar(64) DEFAULT NULL,
`account_id` varchar(64) DEFAULT NULL,
`conversion_specs` text,
`creative` text,
`effective_status` varchar(32) DEFAULT NULL,
`status` varchar(32) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`tracking_specs` text,
`object_store_url` varchar(255) DEFAULT NULL,
`link` varchar(255) DEFAULT NULL,
`object_type` varchar(32) DEFAULT NULL,
`updated_time` timestamp NULL DEFAULT NULL,
`created_time` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ad_id` (`ad_id`),
KEY `adset_id` (`adset_id`),
KEY `account_id` (`account_id`)
) ENGINE=InnoDB AUTO_INCREMENT=18827534 DEFAULT CHARSET=utf8mb4
There're over ten million ad info in the table, and about 40 thusand repeated. And I want delete all those repeated data.
Here's my poor trial
1)select all repeated ad_id
select ad_id from ads_info group by ad_id having count(id) > 1;
#42387 rows in set (12.42 sec)
The query cost 12s, but I don't know how to do an optimization.
2) use subquery to delete all these repeated data.
delete from ads_info where ad_id in ( select ad_id from (select ad_id from ads_info group by ad_id having count(id) > 1) t);
But I failed to get response from mysql with this trial, it seemed to be hanged with the query.
How can I delete these repeated data?
You needed a UNIQUE key in the first place. This will add it and dedup:
ALTER IGNORE TABLE ads_info
ADD UNIQUE KEY(ad_id);
If you want delete all the occurence then
Instead of IN clause you could try using a join
delete ads_info
from ads_info
INNER JOIN (
select ad_id
from ads_info
group by ad_id
having count(*) > 1
) T ON T.ad_id = ads_info.ad_id
be sure you have and index on ads_info.ad_id
if you have index .. but the query optimizer don't use and you are sure is a valid index you could try using USE or FORCE
delete ads_info
from ads_info
INNER JOIN (
select ad_id
from ads_info
group by ad_id
having count(*) > 1
) T FORCE INDEX FOR JOIN (`ad_id`) ON T.ad_id = ads_info.ad_id
I have the following query. the main tables are company_reports and project_con_messages with 770,000 and 1,040,000 records respectively.
When I run this query it takes about 20 seconds and when I remove the last table company_par_user_settings it takes less than 0.5 seconds.
company_par_user_settings (short name: pus) has about 200,000 records and is meant to show user settings for each company_partner. on company_par_user_settings table we have composite unique index key on partner_id and user_id fields. I also removed the index and replaced with simple indexes on partner_id and user_id but at the end, it didn't make any big difference in running time.
SELECT *
FROM company_reports rep
LEFT JOIN system_users usr ON rep.user_id=usr.id
LEFT JOIN company_rep_subjects sbj ON rep.subject_id=sbj.id
INNER JOIN company_partners par ON rep.partner_id=par.id
LEFT JOIN project_con_messages mes ON rep.message_id=mes.id
LEFT JOIN company_par_user_settings pus ON par.id=pus.partner_id AND 1=pus.user_id
WHERE 1=1
ORDER BY rep.id DESC
LIMIT 0,50
Here below I added the explain on the above query:
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | SIMPLE | rep | NULL | ALL | partner_id | NULL | NULL | NULL | 772236 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | par | NULL | eq_ref | PRIMARY | PRIMARY | 3 | portal_ebrahim.rep.partner_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | usr | NULL | eq_ref | PRIMARY | PRIMARY | 2 | portal_ebrahim.rep.user_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | sbj | NULL | eq_ref | PRIMARY | PRIMARY | 2 | portal_ebrahim.rep.subject_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | mes | NULL | eq_ref | PRIMARY | PRIMARY | 4 | portal_ebrahim.rep.message_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | pus | NULL | ALL | NULL | NULL | NULL | NULL | 191643 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
I appreciate if someone help me on the right indexes or any other solutions that make the query run faster.
Edit:
Here is the show table for company_par_user_settings
CREATE TABLE `company_par_user_settings` (
`id` mediumint(9) NOT NULL AUTO_INCREMENT,
`partner_id` mediumint(8) unsigned NOT NULL,
`user_id` smallint(5) unsigned NOT NULL,
`access` tinyint(1) unsigned NOT NULL DEFAULT '0' COMMENT '0-Not specified',
`access_category` tinyint(1) unsigned NOT NULL DEFAULT '0',
`notify` tinyint(1) unsigned NOT NULL DEFAULT '0',
`stars` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `partner_id` (`partner_id`,`user_id`),
KEY `stars` (`stars`)
) ENGINE=MyISAM AUTO_INCREMENT=198729 DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci
One of my tables is losing the primary key every time I make an update on that particular table.
Describe zizi_card_household
gives me this result after the update.
+-------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | | NULL | auto_increment |
| householdnumber | varchar(45) | NO | | | |
| cardnumber | varchar(45) | YES | MUL | NULL | |
| startdate | datetime | YES | | NULL | |
| enddate | datetime | YES | | NULL | |
| assignedby | int(11) | YES | | NULL | |
| assigneddate | datetime | YES | | NULL | |
Where it should normally be
+-------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| householdnumber | varchar(45) | NO | MUL | | |
| cardnumber | varchar(45) | YES | MUL | NULL | |
| startdate | datetime | YES | | NULL | |
| enddate | datetime | YES | | NULL | |
| assignedby | int(11) | YES | | NULL | |
SHOW CREATE TABLE
will give this statement which shows a primary key was defined. It's possible that entry is updated whenever the table is altered so it shows the table as it was last altered or created.
CREATE TABLE `zizi_card_household` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`householdnumber` varchar(45) NOT NULL DEFAULT '',
`cardnumber` varchar(45) DEFAULT NULL,
`startdate` datetime DEFAULT NULL,
`enddate` datetime DEFAULT NULL,
`assignedby` int(11) DEFAULT NULL,
`assigneddate` datetime DEFAULT NULL,
`deassignedby` int(11) DEFAULT NULL,
.
.
.
.
`modifiedby` int(11) DEFAULT NULL,
`reprintstatus` tinyint(4) DEFAULT NULL,
`printeddate` datetime DEFAULT NULL,
`printedby` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_householdnumber` (`householdnumber`),
KEY `idx_cardnumber` (`cardnumber`),
KEY `idx_deassignedby` (`deassignedby`),
.
.
.
.
KEY `idx_reprintstatus` (`reprintstatus`)
) ENGINE=InnoDB AUTO_INCREMENT=860137 DEFAULT CHARSET=latin1
Attempting to add a primary key
ALTER TABLE `labour`.`zizi_card_household`
CHANGE COLUMN `id` `id` INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY
at this point gives me
ERROR 1068 (42000): Multiple primary key defined
Repair table
repair table zizi_card_household;
Gives me this but it does solve the problem. The primary key returns but will get ruined the next time the table is updated.
+----------------------------+--------+----------+---------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+----------------------------+--------+----------+---------------------------------------------------------+
| labour.zizi_card_household | repair | note | The storage engine for the table doesn't support repair |
+----------------------------+--------+----------+---------------------------------------------------------+
The table and other tables have a trigger that updates a summary table. It deletes related rows and reconstructs them. It does something like
FOR EACH ROW
begin
delete from zizi_card_summary where householdnumber=new.householdnumber;
insert into zizi_card_summary(
cardnumber,cardhouseholdid,householdnumber,startdate,enddate,assignedby
.
.
.
)
select ch.cardnumber,ch.id,b.householdnumber,
ch.startdate,ch.enddate,ch.assignedby,
.
.
.
.
where b.householdnumber=new.householdnumber and b.beneficiary_type=0;
end
All other tables work just fine. Thanks. Been playing around with this for a week. The application is a Zend framework 1.12.
The db version is 10.1.11-MariaDB-log. The operating system is Linux 2.6.32-573.8.1.el6.x86_64 x86_64.
I can confirm the bug and the workaround as described by Timothy : removing the trigger keeps the Primary key after a record update.
Bug resolution progress is on https://jira.mariadb.org/browse/MDEV-9558