I have the following table with around 10 million records.
I am using the query below to retrieve data, but it takes 4 to 5 seconds to return a response.
Is there any way to improve the query?
CREATE TABLE `master` (
`organizationName` varchar(200) NOT NULL DEFAULT '',
`organizationNameQuery` varchar(200) DEFAULT NULL,
`organizationLinkedinHandle` varchar(200) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
`organizationDomain` varchar(110) NOT NULL DEFAULT '',
`source` varchar(10) NOT NULL DEFAULT '',
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `master_inx` (`organizationName`(80),`organizationDomain`(80),`organizationLinkedinHandle`(80),`organizationNameQuery`(80),`source`),
KEY `organizationDomain` (`organizationDomain`),
KEY `domainWithModified` (`organizationDomain`,`modified`),
KEY `modifiedInx` (`modified`)
);
Query:
SELECT *
FROM (SELECT *
FROM Organizations.master
where ( ( organizationDomain like 'linkedin.com'
|| organizationNameQuery = 'linkedin.com')
and source like 'MY_SOURCE') ) M
ORDER BY M.modified DESC limit 1;
1 row in set (4.69 sec)
UPDATE
I found that splitting the query on the OR operator returns the result much faster.
For example:
SELECT *
FROM (SELECT *
FROM Organizations.master
where ( ( organizationDomain like 'linkedin.com')
and source like 'MY_SOURCE') ) M
ORDER BY M.modified DESC limit 1;
1 row in set (0.00 sec)
SELECT *
FROM (SELECT *
FROM Organizations.master
where ( (organizationNameQuery = 'linkedin.com')
and source like 'MY_SOURCE') ) M
ORDER BY M.modified DESC limit 1;
1 row in set (0.00 sec)
Use OR, not || in that context.
The performance villain is OR. Turn the OR into UNION:
( SELECT *
FROM Organizations.master
WHERE organizationDomain = 'linkedin.com'
AND source = 'MY_SOURCE'
ORDER BY modified DESC limit 1
) UNION ALL
( SELECT *
FROM Organizations.master
WHERE organizationNameQuery = 'linkedin.com'
AND source = 'MY_SOURCE'
ORDER BY modified DESC limit 1
)
ORDER BY modified DESC LIMIT 1;
Notes:
This formulation is likely to take about 0.00s to run.
The ORDER BY and LIMIT show up 3 times: once in each branch and once for the combined result.
If you need OFFSET, things get a little tricky.
Change back to LIKE if you allow users to enter wildcards.
A leading wildcard would not be efficient.
UNION ALL is faster than UNION (aka UNION DISTINCT).
It needs two new composite indexes; the order of the 2 columns is not critical:
INDEX(organizationDomain, source),
INDEX(organizationNameQuery, source)
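As a rough check that the UNION ALL form returns the same row as the OR form, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and index names are made up for illustration, and timings on SQLite say nothing about MySQL. SQLite does not accept ORDER BY or LIMIT directly inside a UNION branch, so each branch is wrapped in a subquery here; in MySQL the parenthesized form shown above is the one to use.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; only the columns the query touches are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (
    organizationName      TEXT NOT NULL,
    organizationNameQuery TEXT,
    organizationDomain    TEXT NOT NULL,
    source                TEXT NOT NULL,
    modified              INTEGER NOT NULL
);
-- The two composite indexes suggested above (index names are made up).
CREATE INDEX domain_source ON master (organizationDomain, source);
CREATE INDEX namequery_source ON master (organizationNameQuery, source);
INSERT INTO master VALUES
    ('LinkedIn',      'linkedin',     'linkedin.com', 'MY_SOURCE', 100),
    ('LinkedIn Corp', 'linkedin.com', 'lnkd.in',      'MY_SOURCE', 300),
    ('LinkedIn',      'linkedin.com', 'linkedin.com', 'OTHER',     400),
    ('Example',       'example',      'example.com',  'MY_SOURCE', 500);
""")

# Original shape: one WHERE clause with OR.
or_query = """
SELECT organizationName, modified
FROM master
WHERE (organizationDomain = ? OR organizationNameQuery = ?)
  AND source = ?
ORDER BY modified DESC LIMIT 1
"""

# Rewritten shape: each branch finds its own newest row, then the newest of the two wins.
union_query = """
SELECT organizationName, modified FROM (
    SELECT * FROM (SELECT * FROM master
                   WHERE organizationDomain = ? AND source = ?
                   ORDER BY modified DESC LIMIT 1)
    UNION ALL
    SELECT * FROM (SELECT * FROM master
                   WHERE organizationNameQuery = ? AND source = ?
                   ORDER BY modified DESC LIMIT 1)
)
ORDER BY modified DESC LIMIT 1
"""

or_row = conn.execute(
    or_query, ("linkedin.com", "linkedin.com", "MY_SOURCE")
).fetchone()
union_row = conn.execute(
    union_query, ("linkedin.com", "MY_SOURCE", "linkedin.com", "MY_SOURCE")
).fetchone()
print(or_row, union_row)
```

Both queries pick the newest MY_SOURCE row that matches on either column, so the rewrite changes only how the rows are found, not which row comes back.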
As I checked the query I think you can remove the like operator and use =.
SELECT * FROM (
SELECT * FROM Organizations.master
where ( (organizationDomain = 'linkedin.com' ||
organizationNameQuery = 'linkedin.com')
and source = 'MY_SOURCE')
) M
ORDER BY M.modified DESC limit 1
I have the following MySQL query
SELECT `category`
FROM `jeopardy_questions`
WHERE `amount` = "$2,000"
GROUP BY `category`
HAVING COUNT(*) > 4
ORDER BY RAND() LIMIT 1
This will grab me a random category where there are at least 5 questions in that category.
Now I want to grab all the rows for that category. So how can I do a second SELECT WHERE category is equal to the category returned from the previous query?
I tried the following but I believe the RAND() is causing it to crash/timeout.
SELECT *
FROM `jeopardy_questions`
WHERE `category` = (
SELECT `category`
FROM `jeopardy_questions`
WHERE `amount` = "$2,000"
GROUP BY `category`
HAVING COUNT(*) > 4
ORDER BY RAND() LIMIT 1
)
You can use the above query as a subquery. Something like this:
SELECT *
FROM `jeopardy_questions`
WHERE `category` = (
SELECT `category`
FROM `jeopardy_questions`
WHERE `amount` = "$2,000"
GROUP BY `category`
HAVING COUNT(*) > 4
ORDER BY RAND() LIMIT 1
)
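One possible reason for the timeout: in MySQL a subquery that contains RAND() cannot be cached, so it may be re-evaluated for every row of the outer query. A variation worth trying is to join against the subquery as a derived table, so that the random category is picked once. The sketch below shows that shape with Python's built-in sqlite3 module standing in for MySQL (RANDOM() instead of RAND()); the `question` column, the alias `picked`, and the sample rows are assumptions, not taken from the original schema.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the `question` column is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jeopardy_questions (category TEXT, amount TEXT, question TEXT)"
)

rows = (
    [("SCIENCE", "$2,000", f"science question {i}") for i in range(5)]
    + [("SCIENCE", "$400", "easy science question")]
    + [("HISTORY", "$2,000", f"history question {i}") for i in range(2)]
)
conn.executemany("INSERT INTO jeopardy_questions VALUES (?, ?, ?)", rows)

# The random pick lives in a derived table, and the outer query joins to it.
join_query = """
SELECT q.category, q.amount, q.question
FROM jeopardy_questions AS q
JOIN (
    SELECT category
    FROM jeopardy_questions
    WHERE amount = ?
    GROUP BY category
    HAVING COUNT(*) > 4
    ORDER BY RANDOM() LIMIT 1
) AS picked ON picked.category = q.category
"""

result = conn.execute(join_query, ("$2,000",)).fetchall()
categories = {row[0] for row in result}
print(categories, len(result))
```

In this sample only SCIENCE has at least five "$2,000" questions, so it is always the category picked, and all six of its rows come back.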
SELECT max(sum(`orderquantity`)), `medicinename`
FROM `orerdetails`
WHERE `OID`=
(
SELECT `OrderID`
FROM `order`
where `VID` = 5 AND `OrerResponse` = 1
)
GROUP BY `medicinename`
I want to get the maximum of the result (the largest sum of the order quantity), but the query gives an error. Is there any solution?
You don't need MAX() here. Instead, sort your result set by SUM(`orderquantity`) descending and take the first record returned:
SELECT sum(`orderquantity`) as sumoforderqty, `medicinename`
FROM `orerdetails`
WHERE `OID`=
(
SELECT `OrderID`
FROM `order`
where `VID` = 5 AND `OrerResponse` = 1
)
GROUP BY `medicinename`
ORDER BY sumoforderqty DESC
LIMIT 1
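A small runnable sketch of the same idea, with Python's built-in sqlite3 module standing in for MySQL and made-up sample rows. One deliberate change from the query above: it uses IN rather than = for the subquery, on the assumption that several orders can match `VID` = 5, in which case MySQL would reject = with a "subquery returns more than 1 row" error.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `order` (OrderID INTEGER PRIMARY KEY, VID INTEGER, OrerResponse INTEGER);
CREATE TABLE orerdetails (OID INTEGER, medicinename TEXT, orderquantity INTEGER);
INSERT INTO `order` VALUES (1, 5, 1), (2, 5, 1), (3, 6, 1);
INSERT INTO orerdetails VALUES
    (1, 'aspirin', 5), (1, 'ibuprofen', 7),
    (2, 'aspirin', 4), (3, 'ibuprofen', 100);
""")

# Sum per medicine over the matching orders, largest sum first, keep one row.
top = conn.execute("""
SELECT SUM(orderquantity) AS sumoforderqty, medicinename
FROM orerdetails
WHERE OID IN (SELECT OrderID FROM `order` WHERE VID = ? AND OrerResponse = 1)
GROUP BY medicinename
ORDER BY sumoforderqty DESC
LIMIT 1
""", (5,)).fetchone()
print(top)
```

Orders 1 and 2 belong to `VID` 5, so aspirin sums to 9 and ibuprofen to 7; order 3 is ignored.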
I am using the mysql2 module in Node.js v8.9.4.
This is my function to get a message from the message queue that meets these conditions:
status==0
if count of botId with status==1 is less than 10
if retry_after in wait table for botId+chatId and just botId is less than NOW(timestamp)
if there is no same chatId with status==1
static async Find(activeMessageIds, maxActiveMsgPerBot) {
let params = [maxActiveMsgPerBot];
let filterActiveMessageIds = ' ';
let time = Util.GetTimeStamp();
if (activeMessageIds && activeMessageIds.length) {
filterActiveMessageIds = 'q.id NOT IN (?) AND ';
params.push(activeMessageIds);
}
let q =
`select q.*
from bot_message_queue q
left join bot_message_queue_wait w on q.botId=w.botId AND q.chatId=w.chatId
left join bot_message_queue_wait w2 on q.botId=w2.botId AND w2.chatId=0
where
q.status=0 AND
q.botId NOT IN (select q2.botId from bot_message_queue q2 where q2.status=1 group by q2.botId HAVING COUNT(q2.botId)>?) AND
${filterActiveMessageIds}
q.chatId NOT IN (select q3.chatId from bot_message_queue q3 where q3.status=1 group by q3.chatId) AND
(w.retry_after IS NULL OR w.retry_after <= ?) AND
(w2.retry_after IS NULL OR w2.retry_after <= ?)
order by q.priority DESC,q.id ASC
limit 1;`;
params.push(time);
params.push(time);
let con = await DB.connection();
let result = await DB.query(q, params, con);
if (result && result.length) {
result = result[0];
let updateQ = `update bot_message_queue set status=1 where id=?;`;
await DB.query(updateQ, [result.id], con);
} else
result = null;
con.release();
return result;
}
This query runs fine on my local dev system. It also runs fine in the server's phpMyAdmin, in a couple of milliseconds.
BUT when it runs through Node.js + mysql2, the CPU usage goes up to 100%.
There are only 2K rows in this table.
CREATE TABLE IF NOT EXISTS `bot_message_queue` (
`id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`botId` int(10) UNSIGNED NOT NULL,
`chatId` varchar(50) CHARACTER SET utf8 NOT NULL,
`type` varchar(50) DEFAULT NULL,
`message` longtext NOT NULL,
`add_date` int(10) UNSIGNED NOT NULL,
`status` tinyint(2) UNSIGNED NOT NULL DEFAULT '0' COMMENT '0=waiting,1=sendig,2=sent,3=error',
`priority` tinyint(1) UNSIGNED NOT NULL DEFAULT '5' COMMENT '5=normal messages,<5 = bulk messages',
`delay_after` int(10) UNSIGNED NOT NULL DEFAULT '1000',
`send_date` int(10) UNSIGNED DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `botId` (`botId`,`status`),
KEY `botId_2` (`botId`,`chatId`,`status`,`priority`),
KEY `chatId` (`chatId`,`status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE IF NOT EXISTS `bot_message_queue_wait` (
`botId` int(10) UNSIGNED NOT NULL,
`chatId` varchar(50) CHARACTER SET utf8 NOT NULL,
`retry_after` int(10) UNSIGNED NOT NULL,
PRIMARY KEY (`botId`,`chatId`),
KEY `retry_after` (`retry_after`),
KEY `botId` (`botId`,`chatId`,`retry_after`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
UPDATE: Real table data here
UPDATE 2:
FetchMessageTime :
- Min : 1788 ms
- Max : 44285 ms
- Average : 20185.4 ms
The max was about 20 ms until yesterday :( now it's 40 seconds!!!
UPDATE 3: I merged these 2 joins and wheres:
left join bot_message_queue_wait w on q.botId=w.botId AND q.chatId=w.chatId
left join bot_message_queue_wait w2 on q.botId=w2.botId AND w2.chatId=0
(w.retry_after IS NULL OR w.retry_after <= ?) AND
(w2.retry_after IS NULL OR w2.retry_after <= ?)
into a single one, I hope this will work as intended!
left join bot_message_queue_wait w on q.botId=w.botId AND ( q.chatId=w.chatId OR w.chatId=0 )
and for the time being I removed the 2 wheres and the query time went back to normal.
q.botId NOT IN (select ...)
q.chatId NOT IN (select ...)
So these 2 WHERE subqueries are the choke points and need to be fixed.
NOT IN ( SELECT ... ) is difficult to optimize.
OR cannot be optimized.
In ORDER BY, mixing DESC and ASC eliminates use of an index (until 8.0). Consider changing ASC to DESC. After that, INDEX(priority, id) might help.
What is ${filterActiveMessageIds}?
The GROUP BY is not needed in
NOT IN ( SELECT q3.chatId
from bot_message_queue q3
where q3.status=1
group by q3.chatId )
INDEX(status, chatid) in this order would benefit that subquery.
INDEX(status, botid) in this order would benefit the other subquery (the one on q2.botId).
More on index creation: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
I would replace the NOT IN subquery with a NOT EXISTS in this case, as it can perform better.
Switch the ORDER BY to either all DESC or all ASC
So to optimize the query, first, add these indexes:
ALTER TABLE `bot_message_queue` ADD INDEX `bot_message_queue_idx_status_botid_chatid_priori_id` (`status`,`botId`,`chatId`,`priority`,`id`);
ALTER TABLE `bot_message_queue` ADD INDEX `bot_message_queue_idx_priority_id` (`priority`,`id`);
ALTER TABLE `bot_message_queue` ADD INDEX `bot_message_queue_idx_botid_status` (`botId`,`status`);
ALTER TABLE `bot_message_queue` ADD INDEX `bot_message_queue_idx_chatid_status` (`chatId`,`status`);
ALTER TABLE `bot_message_queue_wait` ADD INDEX `bot_message_queue_wa_idx_chatid_botid` (`chatId`,`botId`);
Now, you can try to run this query (please note I changed the order by to all DESC, so you can change it to ASC if that's a requirement):
SELECT
q.*
FROM
bot_message_queue q
LEFT JOIN
bot_message_queue_wait w
ON q.botId = w.botId
AND q.chatId = w.chatId
LEFT JOIN
bot_message_queue_wait w2
ON q.botId = w2.botId
AND w2.chatId = 0
WHERE
q.status = 0
AND NOT EXISTS (
SELECT
1
FROM
bot_message_queue AS q21
WHERE
q21.status = 1
AND q.botId = q21.botId
GROUP BY
q21.botId
HAVING
COUNT(q21.botId) > ?
ORDER BY
NULL
)
AND NOT EXISTS (
SELECT
1
FROM
bot_message_queue AS q32
WHERE
q32.status = 1
AND q.chatId = q32.chatId
GROUP BY
q32.chatId
ORDER BY
NULL
)
AND (
w.retry_after IS NULL
OR w.retry_after <= ?
)
AND (
w2.retry_after IS NULL
OR w2.retry_after <= ?
)
ORDER BY
q.priority DESC,
q.id DESC LIMIT 1
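To sanity-check that a correlated rewrite selects the same message as the original NOT IN form, here is a minimal sketch with Python's built-in sqlite3 module standing in for MySQL. It leaves out the wait-table joins and uses made-up rows. As a simplification of my own, the per-bot limit is written as a correlated COUNT(*) comparison rather than GROUP BY/HAVING inside NOT EXISTS; index names are hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; only the columns the filters use are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bot_message_queue (
    id       INTEGER PRIMARY KEY,
    botId    INTEGER NOT NULL,
    chatId   TEXT NOT NULL,
    status   INTEGER NOT NULL DEFAULT 0,
    priority INTEGER NOT NULL DEFAULT 5
);
-- status-first indexes, as suggested above (names are made up).
CREATE INDEX status_chat ON bot_message_queue (status, chatId);
CREATE INDEX status_bot  ON bot_message_queue (status, botId);
INSERT INTO bot_message_queue (id, botId, chatId, status, priority) VALUES
    (1, 1, 'a', 1, 5),  -- bot 1 is sending two messages
    (2, 1, 'b', 1, 5),
    (3, 1, 'c', 0, 5),  -- waiting, but bot 1 is over the limit
    (4, 2, 'a', 0, 5),  -- waiting, but chat 'a' is busy
    (5, 2, 'd', 0, 5),  -- waiting and eligible
    (6, 2, 'e', 0, 5),  -- waiting and eligible
    (7, 2, 'f', 1, 5);  -- bot 2 is sending one message
""")

not_in_query = """
SELECT q.id FROM bot_message_queue q
WHERE q.status = 0
  AND q.botId NOT IN (SELECT q2.botId FROM bot_message_queue q2
                      WHERE q2.status = 1
                      GROUP BY q2.botId HAVING COUNT(*) > ?)
  AND q.chatId NOT IN (SELECT q3.chatId FROM bot_message_queue q3
                       WHERE q3.status = 1)
ORDER BY q.priority DESC, q.id ASC LIMIT 1
"""

correlated_query = """
SELECT q.id FROM bot_message_queue q
WHERE q.status = 0
  AND (SELECT COUNT(*) FROM bot_message_queue q2
       WHERE q2.status = 1 AND q2.botId = q.botId) <= ?
  AND NOT EXISTS (SELECT 1 FROM bot_message_queue q3
                  WHERE q3.status = 1 AND q3.chatId = q.chatId)
ORDER BY q.priority DESC, q.id ASC LIMIT 1
"""

max_active = 1  # hypothetical per-bot limit for this sample
not_in_row = conn.execute(not_in_query, (max_active,)).fetchone()
correlated_row = conn.execute(correlated_query, (max_active,)).fetchone()
print(not_in_row, correlated_row)
```

Both forms skip message 3 (its bot already has two messages in flight) and message 4 (its chat is busy) and return message 5. Whether the correlated form is actually faster on MySQL depends on the version and the indexes, so compare EXPLAIN output on the real data.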
I'm trying to create a SQL statement to find the matching record based on the provided post code and stored post codes in the database plus the weight aspect.
The post codes in the database are 1 or 2 characters long, e.g. B, BA ...
The value passed to the SQL statement will always be the first 2 characters of the client's post code. How can I find the match for it? Say I have a post code B1: it should match only the single B in the database, plus the weight aspect, which I'm OK with.
Here's my current SQL statement, which also takes the factor of the free shipping above certain weight:
SELECT `s`.*,
IF (
'{$weight}' > (
SELECT MAX(`weight_from`)
FROM `shipping`
WHERE UPPER(SUBSTRING(`post_code`, 1, 2)) = 'B1'
),
(
SELECT `cost`
FROM `shipping`
WHERE UPPER(SUBSTRING(`post_code`, 1, 2)) = 'B1'
ORDER BY `weight_from` DESC
LIMIT 0, 1
),
`s`.`cost`
) AS `cost`
FROM `shipping` `s`
WHERE UPPER(SUBSTRING(`s`.`post_code`, 1, 2)) = 'B1'
AND
(
(
'{$weight}' > (
SELECT MAX(`weight_from`)
FROM `shipping`
WHERE UPPER(SUBSTRING(`post_code`, 1, 2)) = 'B1'
)
)
OR
('{$weight}' BETWEEN `s`.`weight_from` AND `s`.`weight_to`)
)
LIMIT 0, 1
The above however uses the SUBSTRING() function with hard coded number of characters set to 2 - this is where I need some help really to make it match only number of characters that matches the provided post code - in this case B1.
Marcus - thanks for the help - outstanding example - here's what my code looks like, for those who also wonder:
First I've run the following statement to get the right post code:
(
SELECT `post_code`
FROM `shipping`
WHERE `post_code` = 'B1'
)
UNION
(
SELECT `post_code`
FROM `shipping`
WHERE `post_code` = SUBSTRING('B1', 1, 1)
)
ORDER BY `post_code` DESC
LIMIT 0, 1
Then, based on the returned value assigned to the 'post_code' index my second statement followed with:
$post_code = $result['post_code'];
SELECT `s`.*,
IF (
'1000' > (
SELECT MAX(`weight_from`)
FROM `shipping`
WHERE `post_code` = '{$post_code}'
),
(
SELECT `cost`
FROM `shipping`
WHERE `post_code` = '{$post_code}'
ORDER BY `weight_from` DESC
LIMIT 0, 1
),
`s`.`cost`
) AS `cost`
FROM `shipping` `s`
WHERE `s`.`post_code` = '{$post_code}'
AND
(
(
'1000' > (
SELECT MAX(`weight_from`)
FROM `shipping`
WHERE `post_code` = '{$post_code}'
ORDER BY LENGTH(`post_code`) DESC
)
)
OR
('1000' BETWEEN `s`.`weight_from` AND `s`.`weight_to`)
)
LIMIT 0, 1
The following query will get all results where the post_code in the shipping table matches the beginning of the passed in post_code, then it orders it most explicit to least explicit, returning the most explicit one:
SELECT *
FROM shipping
WHERE post_code = SUBSTRING('B1', 1, LENGTH(post_code))
ORDER BY LENGTH(post_code) DESC
LIMIT 1
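The prefix match above can be tried out with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (SUBSTR instead of SUBSTRING); the `cost` values and the helper name `find_rate` are made up for illustration.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; only post_code and cost are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipping (post_code TEXT NOT NULL, cost REAL NOT NULL);
INSERT INTO shipping VALUES ('B', 5.0), ('BA', 7.0), ('L', 6.0);
""")

def find_rate(client_code):
    """Return the row whose stored post_code is the longest prefix of client_code."""
    return conn.execute(
        """
        SELECT post_code, cost
        FROM shipping
        WHERE post_code = SUBSTR(?, 1, LENGTH(post_code))
        ORDER BY LENGTH(post_code) DESC
        LIMIT 1
        """,
        (client_code,),
    ).fetchone()

print(find_rate("B1"))  # only the single-letter 'B' row is a prefix of 'B1'
print(find_rate("BA"))  # both 'B' and 'BA' match; the longer one wins
print(find_rate("C1"))  # no stored prefix matches
```

Passing the client's code as a bound parameter, rather than interpolating it into the SQL string, also avoids the quoting problems of the '{$weight}' style used in the question.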
Update
While this query is flexible, it's not very fast, since it can't utilize an index. If the shipping table is large, and you'll only pass in up to two characters, it might be faster to make two separate calls.
First, try the most explicit call.
SELECT *
FROM shipping
WHERE post_code = 'B1'
If it doesn't return a result then search on a single character:
SELECT *
FROM shipping
WHERE post_code = SUBSTRING('B1', 1, 1)
Of course, you can combine these with a UNION if you must do it in a single call:
SELECT * FROM
((SELECT *
FROM shipping
WHERE post_code = 'B1')
UNION
(SELECT *
FROM shipping
WHERE post_code = SUBSTRING('B1', 1, 1))) a
ORDER BY post_code DESC
LIMIT 1
I have a problem optimizing this query:
SET @SEARCH = "dokumentalne";
SELECT SQL_NO_CACHE
`AA`.`version` AS `Version` ,
`AA`.`contents` AS `Contents` ,
`AA`.`idarticle` AS `AdressInSQL` ,
`AA`.`topic` AS `Topic` ,
MATCH (`AA`.`topic` , `AA`.`contents`) AGAINST (@SEARCH) AS `Relevance` ,
`IA`.`url` AS `URL`
FROM `xv_article` AS `AA`
INNER JOIN `xv_articleindex` AS `IA` ON ( `AA`.`idarticle` = `IA`.`adressinsql` )
INNER JOIN (
SELECT `idarticle` , MAX( `version` ) AS `version`
FROM `xv_article`
WHERE MATCH (`topic` , `contents`) AGAINST (@SEARCH)
GROUP BY `idarticle`
) AS `MG`
ON ( `AA`.`idarticle` = `MG`.`idarticle` )
WHERE `IA`.`accepted` = "yes"
AND `AA`.`version` = `MG`.`version`
ORDER BY `Relevance` DESC
LIMIT 0 , 30
Currently this query takes about 20 seconds. How can I optimize it?
EXPLAIN gives this:
id  select_type  table       type      possible_keys  key    key_len  ref   rows   Extra
1   PRIMARY      AA          ALL       NULL           NULL   NULL     NULL  11169  Using temporary; Using filesort
1   PRIMARY      <derived2>  ALL       NULL           NULL   NULL     NULL  681    Using where
1   PRIMARY      IA          ALL       accepted       NULL   NULL     NULL  11967  Using where
2   DERIVED      xv_article  fulltext  topic          topic  0              1      Using where; Using temporary; Using filesort
This is example server with my data:
user: bordeux_4prog
password: 4prog
phpmyadmin: http://phpmyadmin.bordeux.net/
chive: http://chive.bordeux.net/
Looks like your db is dead. Getting rid of inner query is the key part to optimization. Please try this (not tested) query:
SET @SEARCH = "dokumentalne";
SELECT SQL_NO_CACHE
aa.idarticle AS `AdressInSQL`,
aa.contents AS `Contents`,
aa.topic AS `Topic`,
MATCH(aa.topic , aa.contents) AGAINST (@SEARCH) AS `Relevance`,
ia.url AS `URL`,
MAX(aa.version) AS `Version`
FROM
xv_article AS aa,
xv_articleindex AS ia
WHERE
aa.idarticle = ia.adressinsql
AND ia.accepted = "yes"
AND MATCH(aa.topic , aa.contents) AGAINST (@SEARCH)
GROUP BY
aa.idarticle,
aa.contents,
`Relevance`,
ia.url
ORDER BY
`Relevance` DESC
LIMIT
0, 30
To further optimize your query you may also split getting articles with newest version from full text search as the latter is the most expensive. This can be done by subquerying (also not tested on your db):
SELECT SQL_NO_CACHE
iq.idarticle AS `AdressInSQL`,
iq.topic AS `Topic`,
iq.contents AS `Contents`,
iq.url AS `URL`,
MATCH(iq.topic, iq.contents) AGAINST (@SEARCH) AS `Relevance`
FROM (
SELECT
a.idarticle,
a.topic,
a.contents,
i.url,
MAX(a.version) AS version
FROM
xv_article AS a,
xv_articleindex AS i
WHERE
i.accepted = "yes"
AND a.idarticle = i.adressinsql
GROUP BY
a.idarticle,
a.topic,
a.contents,
i.url
) AS iq
WHERE
MATCH(iq.topic, iq.contents) AGAINST (@SEARCH)
ORDER BY
`Relevance` DESC
LIMIT
0, 30
The first thing I noticed in your DB is that you don't have an index on xv_articleindex.adressinsql. Add it, and it should significantly improve the query performance. Also, one table is MyISAM, whereas the other is InnoDB. Use one engine (in general, I'd recommend InnoDB).