My problem involves the following two tables:
CREATE TABLE `t_user_relation` (
`User_id` INT(32) UNSIGNED NOT NULL ,
`Follow_id` INT(32) UNSIGNED NOT NULL ,
PRIMARY KEY (`User_id`,Follow_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `t_user_info`(
`User_id` int(32) unsigned NOT NULL ,
`User_name` varchar(20) NOT NULL ,
`User_avatar` varchar(60) NOT NULL ,
`Msg_count` int(32) unsigned DEFAULT '0' ,
`Fans_count` int(32) unsigned DEFAULT '0' ,
`Follow_count` int(32) unsigned DEFAULT '0' ,
PRIMARY KEY (`User_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
What I am trying to do is update the Fans_count field of the t_user_info table. My update statement is as follows:
UPDATE t_user_info set t_user_info.Fans_count=(SELECT COUNT(*) FROM t_user_relation
WHERE t_user_relation.Follow_id=t_user_info.User_id);
But it executes really slowly! The table t_user_info contains 20,445 records and t_user_relation contains 1,809,915 records. Can anyone help me improve the speed? Thanks for any advice!
I would try this:
UPDATE
t_user_info inner join
(SELECT Follow_id, COUNT(*) as cnt
FROM t_user_relation
GROUP BY Follow_id) s
on t_user_info.User_id=s.Follow_id
SET t_user_info.Fans_count=s.cnt
I'm using a subquery to calculate the count of rows for every Follow_id in table t_user_relation:
SELECT Follow_id, COUNT(*) as cnt
FROM t_user_relation
GROUP BY Follow_id
I am then joining the result of this query with t_user_info, and I am updating Fans_count where the join succeeds, setting it to the count calculated in the subquery.
A query written like this usually runs faster because the resulting rows from the subquery are calculated only once, before the join, while in your solution your subquery is calculated once for every user row.
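One difference from the original UPDATE is worth noting: the inner join only touches users who have at least one follower, so a user who has lost all followers keeps a stale Fans_count instead of being set to 0. A minimal sketch of a LEFT JOIN variant that also resets those users, assuming the table and column names shown above:

```sql
-- Recount fans for every user with a single grouped pass over t_user_relation.
-- LEFT JOIN keeps users that have no followers; COALESCE turns their NULL count into 0.
UPDATE t_user_info u
LEFT JOIN (
    SELECT Follow_id, COUNT(*) AS cnt
    FROM t_user_relation
    GROUP BY Follow_id
) s ON u.User_id = s.Follow_id
SET u.Fans_count = COALESCE(s.cnt, 0);
```

If every user is known to have at least one follower, the inner join version above gives the same result.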
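When dealing with a large number of records in a DB, you want to make sure your queries can use indexes, and avoid SELECT * when you don't need every column. Note that the * in COUNT(*) is not the problem: COUNT(*) just counts rows and is not slower than COUNT(column).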
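Applied to this question: the primary key of t_user_relation is (User_id, Follow_id), so it cannot be used to find or group rows by Follow_id alone. A sketch of a secondary index that leads with Follow_id (the index name idx_follow_id is made up for illustration); with it, both the correlated subquery and the GROUP BY version should be able to count from the index instead of scanning all 1.8 million rows. Building it on a table this size may take a while.

```sql
-- Secondary index leading with Follow_id (hypothetical name).
ALTER TABLE t_user_relation ADD INDEX idx_follow_id (Follow_id);

-- Check that the per-user count now uses the index:
-- the "key" column of the EXPLAIN output should show idx_follow_id.
-- 42 is just an example user id.
EXPLAIN SELECT COUNT(*) FROM t_user_relation WHERE Follow_id = 42;
```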
Related
I have a table with millions of entries. Below is the table structure:
CREATE TABLE `useractivity` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`userid` bigint(20) NOT NULL,
`likes` bigint(20) DEFAULT NULL,
`views` bigint(20) DEFAULT NULL,
`shares` bigint(20) DEFAULT NULL,
`totalcount` bigint(20) DEFAULT NULL,
`status` bigint(20) DEFAULT NULL,
`createdat` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `userid` (`userid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And below is the query that performs slowly:
SELECT userid,
(sum(likes)+SUM(views)+SUM(shares)+SUM(totalcount)+SUM(`status`)) as total
from useractivity
GROUP BY userid
ORDER BY total DESC
limit 0, 20;
When I execute the above query without ORDER BY, I get the result set quickly, but with ORDER BY the query becomes slow, even though I use LIMIT for pagination.
What can I do to speed up this query?
You can't speed up the query as it is: MySQL needs to visit every single row and calculate the sums before sorting and finally returning the first rows. That is bound to take time. You can probably cheat, though.
The most obvious approach would be to create a summary table with userid and total. Update it when the base table changes or recompute it regularly, whatever makes sense. In that table you can index total, which makes the query trivial.
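A minimal sketch of such a summary table, assuming it is recomputed periodically (for example from a scheduled job) rather than maintained row by row; the table and index names (user_totals, idx_total) are hypothetical:

```sql
-- Hypothetical summary table: one row per user, with the total precomputed.
CREATE TABLE user_totals (
  userid BIGINT NOT NULL PRIMARY KEY,
  total  BIGINT NULL,
  KEY idx_total (total)
) ENGINE=InnoDB;

-- Recompute every total using the same expression as the original query.
-- VALUES(total) is the newly computed value for a user that already has a row
-- (VALUES() is deprecated in MySQL 8.0.20+ but still accepted).
INSERT INTO user_totals (userid, total)
SELECT userid,
       SUM(likes) + SUM(views) + SUM(shares) + SUM(totalcount) + SUM(`status`)
FROM useractivity
GROUP BY userid
ON DUPLICATE KEY UPDATE total = VALUES(total);

-- The paginated query can now read the first rows straight from idx_total.
SELECT userid, total
FROM user_totals
ORDER BY total DESC
LIMIT 0, 20;
```

The expensive GROUP BY still runs, but only in the refresh job, not on every page view.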
Another option may be to find the top users. Most sites have users that are more active than the others. Keep the 1000 top users in a separate table, then use the same select but only for the top users (i.e. join with that table). Only the useractivity rows for the top users need to be visited, which should be fast. If 1000 users are not enough perhaps 10000 works.
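A sketch of the top-users idea, assuming a cutoff of 1000 users and a periodic refresh; the table name top_users is hypothetical. The ranking can drift between refreshes, which is the trade-off of this cheat.

```sql
-- Hypothetical table holding the ids of the most active users.
CREATE TABLE top_users (
  userid BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

-- Periodic refresh (expensive, but off the request path).
-- Inside a transaction, other InnoDB readers never see the table empty.
START TRANSACTION;
DELETE FROM top_users;
INSERT INTO top_users (userid)
SELECT userid
FROM useractivity
GROUP BY userid
ORDER BY (SUM(likes) + SUM(views) + SUM(shares) + SUM(totalcount) + SUM(`status`)) DESC
LIMIT 1000;
COMMIT;

-- Same select as before, restricted to the top users:
-- only their useractivity rows are visited, via the existing userid index.
SELECT ua.userid,
       (SUM(ua.likes) + SUM(ua.views) + SUM(ua.shares) + SUM(ua.totalcount) + SUM(ua.`status`)) AS total
FROM top_users t
JOIN useractivity ua ON ua.userid = t.userid
GROUP BY ua.userid
ORDER BY total DESC
LIMIT 0, 20;
```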
I have a Node.js app on AWS with an associated MySQL RDS database (server class: db.r3.large, engine: InnoDB). We have a performance problem: when we execute several queries simultaneously, the database returns the results only after the last query has finished, not as each query finishes.
For example, if we run a process that issues 10 simultaneous queries of 3 seconds each, we start receiving results after approximately 30 seconds, but we want to start receiving them when the first query finishes (3 seconds).
It seems that the database receives the queries and queues them.
I'm kind of lost here: I changed several things in the code (separate connections, connection pools, etc.) and in the AWS settings, but nothing seems to improve the result.
TableA (13M records) schema:
CREATE TABLE `TableA` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(20) DEFAULT NULL,
`columnC` varchar(15) DEFAULT NULL,
`columnD` varchar(20) DEFAULT NULL,
`columnE` varchar(255) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
`columnH` varchar(10) DEFAULT NULL,
`columnI` bigint(11) DEFAULT NULL,
`columnJ` bigint(11) DEFAULT NULL,
`columnK` varchar(5) DEFAULT NULL,
`columnL` varchar(50) DEFAULT NULL,
`columnM` varchar(20) DEFAULT NULL,
`columnN` int(1) DEFAULT NULL,
`columnO` int(1) DEFAULT '0',
`columnP` datetime NOT NULL,
`columnQ` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnO` (`columnO`),
KEY `columnK` (`columnK`),
KEY `columnN` (`columnN`),
FULLTEXT KEY `columnE` (`columnE`)
) ENGINE=InnoDB AUTO_INCREMENT=13867504 DEFAULT CHARSET=utf8;
TableB (15M records) schema:
CREATE TABLE `TableB` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(50) DEFAULT NULL,
`columnC` varchar(50) DEFAULT NULL,
`columnD` int(1) DEFAULT NULL,
`columnE` datetime NOT NULL,
`columnF` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnC` (`columnC`)
) ENGINE=InnoDB AUTO_INCREMENT=19153275 DEFAULT CHARSET=utf8;
Query:
SELECT COUNT(*) AS total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "3764301"
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
)
)
AND columnM > 2;
One execution returns in 2 s.
Ten concurrent executions return the first result after 20 s, and the other results after that.
To see which queries are running, I'm using "SHOW FULL PROCESSLIST"; most of the time the queries are in the "Sending data" state.
It is not a performance issue with the query itself; it is a concurrency problem on the database. Even a very simple query like "SELECT COUNT(*) FROM TableA WHERE columnM = 5" has the same problem.
UPDATE
For testing purposes only, I reduced the query to a single subquery condition. Both versions return 65k records.
-- USING IN
SELECT COUNT(*) as total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableB.columnC NOT IN (
SELECT field
FROM tableX
WHERE fieldX = 15
)
)
AND columnM > 2;
-- USING EXISTS
SELECT COUNT(*) as total
FROM TableA
WHERE EXISTS (
SELECT *
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableA.columnB = TableB.columnC
AND NOT EXISTS (
SELECT *
FROM tableX
WHERE fieldX = 15
AND fieldY = TableB.columnC
)
)
AND columnM > 2;
-- Result
Query using IN : 1.7 sec
Query using EXISTS : 141 sec (:O)
Whether I use IN or EXISTS, the problem is the same: when I execute this query many times concurrently, the database is delayed and the responses come back only after a long time.
Example: if one query responds in 1.7 sec and I execute it 10 times, the first result arrives after 20 sec.
Recommendation 1
Change the NOT IN ( SELECT ... ) to NOT EXISTS ( SELECT * ... ). (You may need to change the WHERE clause a bit.)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
-->
AND NOT EXISTS ( SELECT * FROM table WHERE field = TableB.columnC )
table needs an index on field.
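In older MySQL versions, IN ( SELECT ... ) performs very poorly and EXISTS is much better optimized. Since MySQL 5.6 the optimizer can turn IN subqueries into semi-joins, so it is worth timing both forms on your server.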
Recommendation 2
To deal with the concurrency, consider running SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before the query. This may keep one connection from interfering with another.
Recommendation 3
Show us the EXPLAIN, the indexes (SHOW CREATE TABLE) (what you gave is not sufficient), and the WHERE clauses so we can critique the indexes.
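For reference, the requested information could be gathered like this, using the reduced IN query from the question update (tableX, field and fieldX are the question's own placeholder names):

```sql
-- Execution plan for the reduced IN query.
EXPLAIN
SELECT COUNT(*) AS total
FROM TableA
WHERE TableA.columnB IN (
    SELECT TableB.columnC
    FROM TableB
    WHERE TableB.columnB = "103550181"
      AND TableB.columnC NOT IN (SELECT field FROM tableX WHERE fieldX = 15)
)
AND columnM > 2;

-- Full table definitions, including every index.
SHOW CREATE TABLE TableA;
SHOW CREATE TABLE TableB;
SHOW CREATE TABLE tableX;
```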
Recommendation 4
It might help for TableB to have a composite INDEX(columnB, columnC) in that order.
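A sketch of that index (the index name is made up). With columnB first, the WHERE TableB.columnB = ... lookup becomes an index range, and because columnC is in the same index, the subquery can get the columnC values from the index without reading the table rows. The existing single-column index on columnB then becomes redundant, since it is a prefix of the new one.

```sql
-- Equality column first, then the column the subquery returns.
ALTER TABLE TableB ADD INDEX idx_columnB_columnC (columnB, columnC);
```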
What I can see here is that a HUGE temporary table is being built for each query. Consider a different architecture.
I have two tables with the following schemas:
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table has 2.5 million records, whereas the sent_log table has 0.25 million records.
I want to filter the records from the open_log table based on the unique key (email_id and delivery_id) found in sent_log.
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is that the query takes far too long to execute. I waited an hour for it to complete, without success.
Kindly suggest what I can do to make it fast, since the tables hold a lot of data.
Thanks,
Faisal Nasir
First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM sent_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sentlog_emailid_deliveryid on sent_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The select distinct in the subquery is unnecessary.
EXISTS can be faster than IN.
If this query runs often, you can speed it up considerably by creating a BIGINT hash column, even if it is not unique.
For example, you can add a column like this and fill it with a trigger:
ALTER TABLE sent_log ADD COLUMN for_get BIGINT;
After that, create a trigger (or run an UPDATE) that puts a hash into that BIGINT column:
for_get = CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10)
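A sketch of the complete setup, assuming for_get is added to both tables (the join compares ol.for_get with sl.for_get); the trigger and index names are made up for illustration. The hash keeps the first 10 hex digits (40 bits) of the MD5, so different keys can occasionally share a value; that is why the final query still compares email_id and delivery_id. BEFORE UPDATE triggers would also be needed if email_id or delivery_id can change after a row is inserted.

```sql
-- Add the hash column to both tables.
ALTER TABLE sent_log ADD COLUMN for_get BIGINT;
ALTER TABLE open_log ADD COLUMN for_get BIGINT;

-- Backfill existing rows with the hash.
UPDATE sent_log SET for_get = CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10);
UPDATE open_log SET for_get = CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10);

-- Fill the column for new rows (single-statement bodies need no DELIMITER change).
CREATE TRIGGER sent_log_for_get BEFORE INSERT ON sent_log FOR EACH ROW
  SET NEW.for_get = CONV(SUBSTR(MD5(CONCAT(NEW.email_id, NEW.delivery_id)), 1, 10), 16, 10);
CREATE TRIGGER open_log_for_get BEFORE INSERT ON open_log FOR EACH ROW
  SET NEW.for_get = CONV(SUBSTR(MD5(CONCAT(NEW.email_id, NEW.delivery_id)), 1, 10), 16, 10);

-- Index the hash in both tables (built after the backfill, which is faster).
CREATE INDEX idx_sent_log_for_get ON sent_log (for_get);
CREATE INDEX idx_open_log_for_get ON open_log (for_get);
```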
If you have such a column in both tables, with an index on it, the query will look like this:
SELECT ol.*
FROM open_log ol
JOIN sent_log sl ON sl.for_get = ol.for_get
WHERE sl.email_id = ol.email_id AND sl.delivery_id = ol.delivery_id;
That query will be fast.
I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not already been read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cids varies and could be much longer. In any case, I noticed that about 2-3 seconds of the total query time is devoted to "ORDER BY". If I remove it, the query comes back in about half a second. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from the suggestions from @Gordon below:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, creationdate).
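A sketch of those indexes, with made-up index names. One adjustment: uninet_channelitem_read already has indexes that lead with ciid (its primary key and KEY `ciid`), but the derived table filters on uid, so this sketch leads that index with uid. The channelitem index uses the column order suggested above.

```sql
-- Lets the derived table fetch one user's read ciids with an index range scan.
CREATE INDEX idx_read_uid_ciid ON uninet_channelitem_read (uid, ciid);

-- cid for the IN (...) filter, then ciid for the join, then creationdate.
CREATE INDEX idx_channelitem_cid_ciid_creationdate ON channelitem (cid, ciid, creationdate);
```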
The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database run queries faster.
Yesterday I ran into some SQL weirdness. I had a query that melted the server, so, trying to improve it, I wrote this query:
SELECT idEvent, MAX( fechaHora ) , codAgente, evento FROM eventos_centralita GROUP BY codAgente
And it seems to work for this schema:
CREATE TABLE IF NOT EXISTS `eventos_centralita` (
`idEvent` int(11) NOT NULL AUTO_INCREMENT,
`fechaHora` datetime NOT NULL,
`codAgente` varchar(8) DEFAULT NULL,
`extension` varchar(20) DEFAULT NULL,
`evento` varchar(45) DEFAULT NULL,
PRIMARY KEY (`idEvent`),
KEY `codAgente` (`codAgente`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=105847 ;
That is, the time is indeed the MAX one for the agent. However, the event id and the event itself are wrong...
So, is this a bug or is this expected?
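You are mixing an aggregate function and a "normal" column select. This "feature" is a MySQL extension: idEvent and evento are taken from an indeterminate row of each group, not necessarily the row holding MAX(fechaHora). (Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default and rejects this query.)
Normally you should group by a specific column and then use aggregate functions for all other selected columns. To get the rest of the row, join the grouped result back to the table. Example: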
SELECT e1.codAgente, e1.idEvent, e1.fechaHora, e1.evento
FROM eventos_centralita e1
inner join
(
select codAgente, MAX(fechaHora) as fechaHora
from eventos_centralita
group by codAgente
) e2
on e1.codAgente = e2.codAgente and e1.fechaHora = e2.fechaHora
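One caveat about the join above: if two events for the same agent share the maximum fechaHora, that agent appears more than once. Assuming MySQL 8.0 or later (window functions are not available in earlier versions), a sketch that returns exactly one row per agent, breaking ties by the highest idEvent (an arbitrary choice). The index name is made up; an index on (codAgente, fechaHora) should also help the grouped subquery and the join in the version above.

```sql
-- Helps MAX(fechaHora) per agent and the join back on (codAgente, fechaHora).
CREATE INDEX idx_agente_fecha ON eventos_centralita (codAgente, fechaHora);

-- Number each agent's events from newest to oldest and keep the newest one.
SELECT codAgente, idEvent, fechaHora, evento
FROM (
    SELECT codAgente, idEvent, fechaHora, evento,
           ROW_NUMBER() OVER (PARTITION BY codAgente
                              ORDER BY fechaHora DESC, idEvent DESC) AS rn
    FROM eventos_centralita
) ranked
WHERE rn = 1;
```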