I have a NodeJS app on AWS with an associated MySQL RDS database (server class: db.r3.large - Engine: InnoDB). We are having a performance problem: when we execute simultaneous queries, the database returns all the results after the last query finishes, not as each individual query finishes.
So, as an example: if we execute a process with 10 simultaneous queries of 3 seconds each, we start receiving the results at approximately 30 seconds, but we want to start receiving them when the first query finishes (3 seconds).
It seems that the database is receiving the queries and queuing them.
I'm kind of lost here, since I've changed several things in the code (separate connections, connection pools, etc.) and in the AWS settings, but nothing seems to improve the result.
TableA (13M records) schema:
CREATE TABLE `TableA` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(20) DEFAULT NULL,
`columnC` varchar(15) DEFAULT NULL,
`columnD` varchar(20) DEFAULT NULL,
`columnE` varchar(255) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
`columnH` varchar(10) DEFAULT NULL,
`columnI` bigint(11) DEFAULT NULL,
`columnJ` bigint(11) DEFAULT NULL,
`columnK` varchar(5) DEFAULT NULL,
`columnL` varchar(50) DEFAULT NULL,
`columnM` varchar(20) DEFAULT NULL,
`columnN` int(1) DEFAULT NULL,
`columnO` int(1) DEFAULT '0',
`columnP` datetime NOT NULL,
`columnQ` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnO` (`columnO`),
KEY `columnK` (`columnK`),
KEY `columnN` (`columnN`),
FULLTEXT KEY `columnE` (`columnE`)
) ENGINE=InnoDB AUTO_INCREMENT=13867504 DEFAULT CHARSET=utf8;
TableB (15M records) schema:
CREATE TABLE `TableB` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(50) DEFAULT NULL,
`columnC` varchar(50) DEFAULT NULL,
`columnD` int(1) DEFAULT NULL,
`columnE` datetime NOT NULL,
`columnF` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnC` (`columnC`)
) ENGINE=InnoDB AUTO_INCREMENT=19153275 DEFAULT CHARSET=utf8;
Query:
SELECT COUNT(*) AS total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "3764301"
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
)
)
)
)
)
AND columnM > 2;
One execution returns in 2 seconds.
Ten simultaneous executions return the first result after about 20 seconds, and the remaining results after that.
To verify that the queries are running I use "SHOW FULL PROCESSLIST"; most of the time the queries are in the "Sending data" state.
This is not a performance issue with the query itself; it is a problem with concurrent access to the database. Even a very simple query like "SELECT COUNT(*) FROM TableA WHERE columnM = 5" has the same problem.
UPDATE
For testing purposes only, I reduced the query to a single subquery condition. Both queries return 65k records.
-- USING IN
SELECT COUNT(*) as total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableB.columnC NOT IN (
SELECT field
FROM tableX
WHERE fieldX = 15
)
)
AND columnM > 2;
-- USING EXISTS
SELECT COUNT(*) as total
FROM TableA
WHERE EXISTS (
SELECT *
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableA.columnB = TableB.columnC
AND NOT EXISTS (
SELECT *
FROM tableX
WHERE fieldX = 15
AND fieldY = TableB.columnC
)
)
AND columnM > 2;
-- Result
Query using IN : 1.7 sec
Query using EXISTS : 141 sec (:O)
Whether I use IN or EXISTS, the problem is the same: when I execute this query many times, the database delays and the responses arrive much later.
Example: if one query responds in 1.7 sec, then when I execute the query 10 times, the first result arrives after 20 sec.
Recommendation 1
Change the NOT IN ( SELECT ... ) to NOT EXISTS ( SELECT * ... ). (And you may need to change the WHERE clause a bit.)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
-->
AND NOT EXISTS ( SELECT * FROM table WHERE field = TableB.columnC )
table needs an index on field.
IN ( SELECT ... ) performs very poorly. EXISTS is much better optimized.
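Applied to the full query above, the rewrite would look roughly like this (a sketch only; table and field are the same placeholders the question uses, and the correlation column follows the snippet above):
SELECT COUNT(*) AS total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "3764301"
AND NOT EXISTS ( SELECT * FROM table WHERE field = TableB.columnC )
)
AND columnM > 2;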
Recommendation 2
To deal with the concurrency, consider doing SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before the query. This may keep one connection from interfering with another.
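For example, on the same connection that will run the query (a sketch using the simple test query from the question; READ UNCOMMITTED relaxes the isolation level for this session only):
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM TableA WHERE columnM = 5;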
Recommendation 3
Show us the EXPLAIN, the indexes (SHOW CREATE TABLE) (what you gave is not sufficient), and the WHERE clauses so we can critique the indexes.
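Those diagnostics can be gathered like this (substitute the real query for the simplified one here):
EXPLAIN SELECT COUNT(*) AS total FROM TableA WHERE columnM > 2;
SHOW CREATE TABLE TableA;
SHOW CREATE TABLE TableB;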
Recommendation 4
It might help for TableB to have a composite INDEX(ColumnB, ColumnC) in that order.
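A sketch of that index (the name is illustrative):
ALTER TABLE TableB ADD INDEX idx_columnB_columnC (columnB, columnC);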
What I can see here is that a HUGE temporary table is being built for each query. Consider a different architecture.
Related
I have two tables with the following schema,
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table has 2.5 million records, whereas the sent_log table has 0.25 million records.
I want to filter out the records from the open_log table based on the unique key (email_id and delivery_id).
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is that the query takes too much time to execute; I've waited an hour for it to complete without success.
Kindly suggest what I can do to make it fast, given the size of the data in these tables.
Thanks,
Faisal Nasir
First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM sent_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sentlog_emailid_deliveryid on sent_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The SELECT DISTINCT in the subquery is unnecessary.
EXISTS can be faster than IN.
If this query runs often, you can greatly speed it up by creating a BIGINT hash column, even if it is not unique.
For example, you can add the column like this:
alter table sent_log add column for_get bigint;
After that, create a trigger (and run an update for existing rows) to put a hash into that BIGINT:
for_get = CONV(substr(md5(concat(email_id, delivery_id)),1,10),16,10)
If you have such a column in both tables, with an index on it, the query will look like this:
SELECT *
FROM open_log ol
left join sent_log sl on sl.for_get=ol.for_get
WHERE sl.email_id is not null and sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id;
That query will be fast.
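Putting those pieces together, a minimal sketch of the setup for sent_log might look like this (trigger and index names are illustrative additions, not from the original answer, and open_log needs the same treatment):
-- add and backfill the hash column
ALTER TABLE sent_log ADD COLUMN for_get BIGINT;
UPDATE sent_log
SET for_get = CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10);
CREATE INDEX idx_sentlog_for_get ON sent_log (for_get);
-- keep it filled for new rows
CREATE TRIGGER trg_sentlog_for_get BEFORE INSERT ON sent_log
FOR EACH ROW
SET NEW.for_get = CONV(SUBSTR(MD5(CONCAT(NEW.email_id, NEW.delivery_id)), 1, 10), 16, 10);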
I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not already been read. It has been working nicely and has been quite speedy overall. However, this afternoon the performance degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I use this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cids varies and could be quite a bit longer. In any case, I noted that about 2-3 seconds of the total query time is devoted to the ORDER BY; if I remove it, the query takes only about half a second. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from the suggestions from @Gordon, below
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my DB structure looks like with regard to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, creationdate).
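For reference, the corresponding DDL would be something like this (the index name is illustrative; note that uninet_channelitem_read already has the KEY `ciid` in the schema above):
CREATE INDEX idx_ci_cid_ciid_creationdate ON channelitem (cid, ciid, creationdate);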
The problem could be that you need to create an index on the channelitem table for the creationdate column. Indexes help a database run queries faster.
I am facing an issue with query execution. Here is my case:
I have two tables: log with 200,000 records and logrecords with 600,000 records.
A single record in the log table can have multiple log messages in the logrecords table. My database schema is as below.
log Table
CREATE TABLE `log` (
`logid` varchar(50) NOT NULL DEFAULT '',
`creationtime` bigint(20) DEFAULT NULL,
`serviceInitiatorID` varchar(50) DEFAULT NULL,
PRIMARY KEY (`logid`),
KEY `idx_creationtime_wsc_log` (`creationtime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And logrecords Table
CREATE TABLE `logrecords` (
`logrecordid` varchar(50) NOT NULL DEFAULT '',
`timestamp` bigint(20) DEFAULT NULL,
`message` varchar(8000) DEFAULT NULL,
`loglevel` int(11) DEFAULT NULL,
`logid` varchar(50) DEFAULT NULL,
`indexcolumn` int(11) DEFAULT NULL,
PRIMARY KEY (`logrecordid`),
KEY `indx_logrecordid_message_logid` (`logrecordid`,`message`(767),`logid`),
KEY `logid` (`logid`),
KEY `indx_message` (`message`(767))
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Query created by hibernate is like
select this_.logid as logid4_1_, this_.loglevel as loglevel4_1_, this_.creationtime as creation3_4_1_,
this_.serviceInitiatorID as service17_4_1_, this_.logtype as logtype4_1_,
logrecord1_.logrecordid as logrecor1_3_0_, logrecord1_.timestamp as timestamp3_0_,
logrecord1_.message as message3_0_, logrecord1_.loglevel as loglevel3_0_,
logrecord1_.logid as logid3_0_, logrecord1_.indexcolumn as indexcol6_3_0_
from log this_
inner join logrecords logrecord1_ on this_.logid=logrecord1_.logid
where (1=1) and (1=1) and logrecord1_.message like 'SecondMessage'
order by this_.creationtime desc
limit 25
This takes around 7,313 ms to execute.
The EXPLAIN output for this query was provided as a screenshot.
But when I execute the query below, it takes around 15 minutes:
select count(*) as y0_
from log this_
inner join logrecords logrecord1_ on this_.logid=logrecord1_.logid
where (1=1) and (1=1) and lower(logrecord1_.message) like 'SecondMessage'
order by this_.creationtime desc
limit 25
The EXPLAIN for this query was likewise provided as a screenshot.
I am using a MySQL database. I think there is an indexing issue, or something else that I am not able to identify.
Any solution will be appreciated.
When you use lower(logrecord1_.message) like 'SecondMessage' instead of plain logrecord1_.message like 'SecondMessage' the DB engine will stop using the index on logrecord1_.message.
You can overcome this by creating a function-based index that uses lower(logrecord1_.message) in place of logrecord1_.message.
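MySQL supports functional key parts only from 8.0.13, so on older versions the usual workaround is an indexed generated column (available since 5.7). A minimal sketch under that assumption (column and index names are illustrative):
ALTER TABLE logrecords
ADD COLUMN message_lower VARCHAR(8000)
GENERATED ALWAYS AS (LOWER(message)) VIRTUAL,
ADD INDEX idx_message_lower (message_lower(767));
The query can then filter on logrecord1_.message_lower like 'secondmessage', which is able to use the index.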
There are two tables as follows in my problem.
CREATE TABLE `t_user_relation` (
`User_id` INT(32) UNSIGNED NOT NULL ,
`Follow_id` INT(32) UNSIGNED NOT NULL ,
PRIMARY KEY (`User_id`,`Follow_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `t_user_info`(
`User_id` int(32) unsigned NOT NULL ,
`User_name` varchar(20) NOT NULL ,
`User_avatar` varchar(60) NOT NULL ,
`Msg_count` int(32) unsigned DEFAULT '0' ,
`Fans_count` int(32) unsigned DEFAULT '0' ,
`Follow_count` int(32) unsigned DEFAULT '0' ,
PRIMARY KEY (`User_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
What I am trying to do is to update the Fans_count field of the t_user_info table. My update statement is as follows:
UPDATE t_user_info set t_user_info.Fans_count=(SELECT COUNT(*) FROM t_user_relation
WHERE t_user_relation.Follow_id=t_user_info.User_id);
But it executes really slowly! The t_user_info table consists of 20,445 records and t_user_relation consists of 1,809,915 records. Can anyone help me improve the speed? Thanks for any advice!
I would try this:
UPDATE
t_user_info inner join
(SELECT Follow_id, COUNT(*) as cnt
FROM t_user_relation
GROUP BY Follow_id) s
on t_user_info.User_id=s.Follow_id
SET t_user_info.Fans_count=s.cnt
I'm using a subquery to calculate the count of rows for every Follow_id in table t_user_relation:
SELECT Follow_id, COUNT(*) as cnt
FROM t_user_relation
GROUP BY Follow_id
I am then joining the result of this query with t_user_info, and I am updating Fans_count where the join succeeds, setting it to the count calculated in the subquery.
A query written like this usually runs faster because the resulting rows from the subquery are calculated only once, before the join, while in your solution your subquery is calculated once for every user row.
When dealing with a large number of records on a DB you want to stay away from the wildcard (*) and utilize indexes.
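For instance, the GROUP BY Follow_id subquery in the answer above scans all of t_user_relation; an index on the grouping column would let it be resolved from the index alone. A hedged suggestion (this index is not named in the original answers):
ALTER TABLE t_user_relation ADD INDEX idx_follow_id (Follow_id);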
The following MySQL update statement seems to take an excessive amount of time to execute for the recordset provided (~5000 records): on average, 12 seconds. I currently plan to run this calculation for 5 different periods and about 500 different stock symbols. This translates into 12 secs * 5 calculations * 500 symbols = 30,000 seconds, or 8.33 hrs.
Update Statement:
UPDATE tblStockDataMovingAverages_AAPL JOIN
(SELECT t1.Sequence,
(
SELECT AVG(t2.Close)
FROM tblStockDataMovingAverages_AAPL AS t2
WHERE (t1.Sequence - t2.Sequence) BETWEEN 0 AND 7
) AS "8SMA"
FROM tblStockDataMovingAverages_AAPL AS t1
ORDER BY t1.Sequence) AS ma_query
ON tblStockDataMovingAverages_AAPL.Sequence = ma_query.Sequence
SET tblStockDataMovingAverages_AAPL.8MA_Price = ma_query.8SMA
Table Design:
CREATE TABLE `tblStockDataMovingAverages_AAPL` (
`Symbol` char(6) NOT NULL DEFAULT '',
`TradeDate` date NOT NULL DEFAULT '0000-00-00',
`Sequence` int(11) DEFAULT NULL,
`Close` decimal(18,5) DEFAULT NULL,
`200MA_Price` decimal(18,5) DEFAULT NULL,
`100MA_Price` decimal(18,5) DEFAULT NULL,
`50MA_Price` decimal(18,5) DEFAULT NULL,
`20MA_Price` decimal(18,5) DEFAULT NULL,
`8MA_Price` decimal(18,5) DEFAULT NULL,
`50_200_Cross` int(5) DEFAULT NULL,
PRIMARY KEY (`Symbol`,`Sequence`),
KEY `idxSequnce` (`Sequence`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$
Any help on speeding up the process would be greatly appreciated.
Output of Select Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 index NULL idxSymbol_Sequnce 11 NULL 5205 Using index; Using filesort
2 DEPENDENT SUBQUERY t2 ALL NULL NULL NULL NULL 5271 Using where
This should be a little better:
update tblStockDataMovingAverages_AAPL
join (
select t1.sequence as sequence, avg(t2.close) as av
from tblStockDataMovingAverages_AAPL t1
join tblStockDataMovingAverages_AAPL t2
on t2.sequence BETWEEN t1.sequence-7 AND t1.sequence
group by t1.sequence
) t1 on tblStockDataMovingAverages_AAPL.sequence = t1.sequence
set 8MA_Price = t1.av
With regard to my BETWEEN statement: field1 OPERATOR expression(field2) is easier to optimise than expression(field1, field2) OPERATOR expression in the ON condition. I think this holds for BETWEEN.
It looks like the ORDER BY in your query is unnecessary and removing it might speed your query up a ton.
If any of the other stock symbols appear in the same table, stick them all into a single update query (different periods won't work, though); this would likely be far faster than running it once per symbol.
As already suggested, adding an index to Close may help.
You can also optimize it slightly by adding an index to the Close field, which should make the AVG function more efficient. Please share a dump of your dataset so we can look at it more closely.
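The suggested index could be created along these lines (the name is illustrative; Close is backquoted because it is a MySQL keyword):
ALTER TABLE tblStockDataMovingAverages_AAPL ADD INDEX idx_close (`Close`);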