I'm having a really weird issue that burns up my MySQL server. From my point of view (which is surely wrong), the query is pretty trivial.
I have a table that stores PBX events, and I try to get the last event for every agent to see his/her situation whenever my application is restarted.
Whenever I launch it, the server goes up to 99% CPU and takes about 5 minutes to settle down on its own.
It seems to be because of the number of records: more than 100,000.
The table is as follows:
CREATE TABLE IF NOT EXISTS `eventos_centralita` (
`idEvent` int(11) NOT NULL AUTO_INCREMENT,
`fechaHora` datetime NOT NULL,
`codAgente` varchar(8) DEFAULT NULL,
`extension` varchar(20) DEFAULT NULL,
`evento` varchar(45) DEFAULT NULL,
PRIMARY KEY (`idEvent`),
KEY `codAgente` (`codAgente`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=105847 ;
And the query is as follows:
SELECT a.* FROM eventos_centralita a
LEFT JOIN eventos_centralita b ON b.codAgente = a.codAgente AND b.fechaHora > a.fechaHora
GROUP BY a.codAgente
I've tried to limit it by date, but no luck, as then the query doesn't return anything. How could I improve the query to avoid this?
Please try the query below. Note that the join must match on codAgente and the maximum fechaHora; grouping by codAgente while selecting a bare idEvent would return an arbitrary event id per group:
SELECT a.* FROM eventos_centralita a
INNER JOIN
(
SELECT codAgente, MAX(fechaHora) AS maxFechaHora
FROM eventos_centralita
GROUP BY codAgente
) AS b
ON a.codAgente = b.codAgente AND a.fechaHora = b.maxFechaHora
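If the table keeps growing, a composite index covering the grouping and the MAX lookup should also help. This is a hedged addition, not part of the original answer:
-- Lets the GROUP BY codAgente / MAX(fechaHora) pass be resolved
-- from the index alone, and speeds up the join back to the base rows
ALTER TABLE eventos_centralita
    ADD INDEX idx_agente_fecha (codAgente, fechaHora);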
Related
I have an app developed in NodeJS on AWS with an associated MySQL RDS database (server class: db.r3.large, engine: InnoDB). We are having a performance problem: when we execute simultaneous queries (at the same time), the database returns the results only after the last query has finished, not as each individual query finishes.
So, as an example: if we execute a process that has 10 simultaneous queries of 3 seconds each, we start receiving results at approximately 30 seconds, whereas we want to start receiving them when the first query finishes (3 seconds).
It seems that the database is receiving the queries and queuing them.
I'm kind of lost here, since I've changed several things in the code (separate connections, connection pools, etc.) and in the AWS settings, but nothing seems to improve the result.
TableA (13M records) schema:
CREATE TABLE `TableA` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(20) DEFAULT NULL,
`columnC` varchar(15) DEFAULT NULL,
`columnD` varchar(20) DEFAULT NULL,
`columnE` varchar(255) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
`columnH` varchar(10) DEFAULT NULL,
`columnI` bigint(11) DEFAULT NULL,
`columnJ` bigint(11) DEFAULT NULL,
`columnK` varchar(5) DEFAULT NULL,
`columnL` varchar(50) DEFAULT NULL,
`columnM` varchar(20) DEFAULT NULL,
`columnN` int(1) DEFAULT NULL,
`columnO` int(1) DEFAULT '0',
`columnP` datetime NOT NULL,
`columnQ` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnO` (`columnO`),
KEY `columnK` (`columnK`),
KEY `columnN` (`columnN`),
FULLTEXT KEY `columnE` (`columnE`)
) ENGINE=InnoDB AUTO_INCREMENT=13867504 DEFAULT CHARSET=utf8;
TableB (15M records) schema:
CREATE TABLE `TableB` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(50) DEFAULT NULL,
`columnC` varchar(50) DEFAULT NULL,
`columnD` int(1) DEFAULT NULL,
`columnE` datetime NOT NULL,
`columnF` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnC` (`columnC`)
) ENGINE=InnoDB AUTO_INCREMENT=19153275 DEFAULT CHARSET=utf8;
Query:
SELECT COUNT(*) AS total
FROM TableA
WHERE TableA.columnB IN (
    SELECT TableB.columnC
    FROM TableB
    WHERE TableB.columnB = "3764301"
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
)
AND columnM > 2;
One execution returns in 2s.
Ten executions return the first result in 20s and the other results after that.
To see that the queries are running I'm using SHOW FULL PROCESSLIST, and the queries sit in the "Sending data" state most of the time.
It is not a performance issue with the query itself; it is a concurrency problem with the database. Even a very simple query like "SELECT COUNT(*) FROM TableA WHERE columnM = 5" has the same problem.
UPDATE
Only for testing purposes, I reduced the query to a single subquery condition. Both queries return 65k records.
-- USING IN
SELECT COUNT(*) as total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableB.columnC NOT IN (
SELECT field
FROM tableX
WHERE fieldX = 15
)
)
AND columnM > 2;
-- USING EXISTS
SELECT COUNT(*) as total
FROM TableA
WHERE EXISTS (
SELECT *
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableA.columnB = TableB.columnC
AND NOT EXISTS (
SELECT *
FROM tableX
WHERE fieldX = 15
AND fieldY = TableB.columnC
)
)
AND columnM > 2;
-- Result
Query using IN : 1.7 sec
Query using EXISTS : 141 sec (:O)
Whether using IN or EXISTS, the problem is the same: when I execute this query many times, the database lags and the responses come back after a long delay.
Example: if one query responds in 1.7 sec and I execute it 10 times, the first result arrives in 20 sec.
Recommendation 1
Change the NOT IN ( SELECT ... ) to NOT EXISTS ( SELECT * ... ). (And you may need to change the WHERE clause a bit.)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
-->
AND NOT EXISTS ( SELECT * FROM table WHERE field = TableB.columnC )
table needs an index on field.
IN ( SELECT ... ) performs very poorly. EXISTS is much better optimized.
Recommendation 2
To deal with the concurrency, consider doing SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before the query. This may keep one connection from interfering with another.
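A minimal sketch of where that statement goes, reusing the simple count query mentioned above:
-- Once per connection, before the expensive statements;
-- dirty reads are acceptable here because only counts are reported
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM TableA WHERE columnM = 5;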
Recommendation 3
Show us the EXPLAIN, the indexes (SHOW CREATE TABLE) (what you gave is not sufficient), and the WHERE clauses so we can critique the indexes.
Recommendation 4
It might help for TableB to have a composite INDEX(ColumnB, ColumnC) in that order.
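A sketch of that index (the index name is illustrative):
-- The subquery can then filter on columnB and emit columnC straight
-- from the index, without touching the base rows
ALTER TABLE TableB ADD INDEX idx_b_c (columnB, columnC);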
What I can see here is that a HUGE temporary table is being built for each query. Consider a different architecture.
We have the following MySQL tables (simplified to get straight to the point):
CREATE TABLE `MONTH_RAW_EVENTS` (
`idEvent` int(11) unsigned NOT NULL,
`city` varchar(45) NOT NULL,
`country` varchar(45) NOT NULL,
`ts` datetime NOT NULL,
`idClient` varchar(45) NOT NULL,
`event_category` varchar(45) NOT NULL,
... bunch of other fields
PRIMARY KEY (`idEvent`),
KEY `idx_city` (`city`),
KEY `idx_country` (`country`),
KEY `idClient` (`idClient`)
) ENGINE=InnoDB;
CREATE TABLE `compilation_table` (
`idClient` int(11) unsigned DEFAULT NULL,
`city` varchar(200) DEFAULT NULL,
`month` int(2) DEFAULT NULL,
`year` int(4) DEFAULT NULL,
`events_profile` int(10) unsigned NOT NULL DEFAULT '0',
`events_others` int(10) unsigned NOT NULL DEFAULT '0',
`events_total` int(10) unsigned NOT NULL DEFAULT '0',
KEY `idx_month` (`month`),
KEY `idx_year` (`year`),
KEY `idx_idClient` (`idClient`),
KEY `idx_city` (`city`)
) ENGINE=InnoDB;
MONTH_RAW_EVENTS contains almost 20M rows of user actions performed on a website; it is almost 4 GB in size.
compilation_table holds a summary of clients/cities for each month; we use it for displaying stats on a website in real time.
We process the statistics (from the first table into the second) once per month, and we're trying to optimize the query that performs that operation (until now we've been processing everything in PHP, which takes a very, very long time).
Here's the query we came up with. It seems to do the job when using small subsets of data;
the problem is that it takes more than 6 hours to process the full data set:
INSERT INTO compilation_table (idClient,city,month,year,events_profile,events_others)
SELECT IFNULL(OTHERS.idClient,CLIPROFILE.idClient) as idClient,
IF(IFNULL(OTHERS.city,CLIPROFILE.city)='','Others',IFNULL(OTHERS.city,CLIPROFILE.city)) as city,
01,2014,
IFNULL(CLIPROFILE.cnt,0) as events_profile,
IFNULL(OTHERS.cnt,0) as events_others
FROM
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>'2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`!='CLIENT PROFILE'
GROUP BY idClient,city
) as OTHERS
LEFT JOIN
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>'2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`='CLIENT PROFILE'
GROUP BY idClient,city
) as CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
UNION
SELECT IFNULL(OTHERS.idClient,CLIPROFILE.idClient) as idClient,
IF(IFNULL(OTHERS.city,CLIPROFILE.city)='','Others',IFNULL(OTHERS.city,CLIPROFILE.city)) as city,
01,2014,
IFNULL(CLIPROFILE.cnt,0) as events_profile,
IFNULL(OTHERS.cnt,0) as events_others
FROM
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>'2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`!='CLIENT PROFILE'
GROUP BY idClient,city
) as OTHERS
RIGHT JOIN
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>'2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`='CLIENT PROFILE'
GROUP BY idClient,city
) as CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
What we're trying to do is a FULL OUTER JOIN in MySQL, so the basic shape of the query is the standard emulation: a LEFT JOIN combined with a RIGHT JOIN via UNION.
How can we optimize the query? We've been trying different indexes and switching things around, but after 8 hours it still hadn't finished running.
The MySQL server is a Percona MySQL 5.5 dedicated machine with 2 CPUs, 2 GB of RAM, and an SSD disk,
and we optimized the configuration of that server using the Percona tools.
Any help would be really appreciated, thanks.
You're doing a UNION, which results in DISTINCT processing.
It's usually better to rewrite a Full Join as a Left Join plus the non-matching rows of a Right Join (if it's a proper 1:n join):
OTHERS LEFT JOIN CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
union all
OTHERS RIGHT JOIN CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
WHERE OTHERS.idClient IS NULL
Additionally, you might materialize the results of the derived tables in temp tables before joining them, so the calculation is only done once (I don't know if MySQL's optimizer is smart enough to do that automatically).
Plus, it might be more efficient to group by and join on city/country as separate columns and do the CONCAT(city,', ',country) as city in the outer step, as in the sketch below.
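Putting those suggestions together, here is a hedged sketch of the whole monthly load. The work-table names (month_others, month_profile) are illustrative, and plain tables are used because MySQL cannot refer to a TEMPORARY table twice in the same statement (which the UNION ALL below would do):
DROP TABLE IF EXISTS month_others, month_profile;

-- Materialize each aggregate exactly once, grouping on city/country
-- as separate columns per the note above
CREATE TABLE month_others AS
SELECT idClient, city, country, COUNT(*) AS cnt
FROM MONTH_RAW_EVENTS
WHERE ts > '2014-01-01 00:00:00' AND ts <= '2014-01-31 23:59:59'
AND event_category != 'CLIENT PROFILE'
GROUP BY idClient, city, country;

CREATE TABLE month_profile AS
SELECT idClient, city, country, COUNT(*) AS cnt
FROM MONTH_RAW_EVENTS
WHERE ts > '2014-01-01 00:00:00' AND ts <= '2014-01-31 23:59:59'
AND event_category = 'CLIENT PROFILE'
GROUP BY idClient, city, country;

-- FULL OUTER JOIN emulation: LEFT JOIN, then UNION ALL with the
-- profile-only rows; UNION ALL skips the DISTINCT pass of plain UNION
INSERT INTO compilation_table (idClient, city, month, year, events_profile, events_others)
SELECT o.idClient,
       IF(o.city = '', 'Others', CONCAT(o.city, ', ', o.country)),
       1, 2014,
       IFNULL(p.cnt, 0), o.cnt
FROM month_others o
LEFT JOIN month_profile p
  ON p.idClient = o.idClient AND p.city = o.city AND p.country = o.country
UNION ALL
SELECT p.idClient,
       IF(p.city = '', 'Others', CONCAT(p.city, ', ', p.country)),
       1, 2014,
       p.cnt, 0
FROM month_profile p
LEFT JOIN month_others o
  ON o.idClient = p.idClient AND o.city = p.city AND o.country = p.country
WHERE o.idClient IS NULL;

DROP TABLE month_others, month_profile;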
This is the second time I've faced this kind of issue when retrieving data.
CREATE TABLE `pm_projects` (
`project_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`project_name` varchar(255) NOT NULL,
`assigned_client` varchar(100) NOT NULL,
`project_detail` longtext NOT NULL,
`creation_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`project_status` tinyint(1) NOT NULL,
PRIMARY KEY (`project_id`),
KEY `assigned_client` (`assigned_client`)
) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=utf8
In the above table I have a field assigned_client that holds multiple client ids, comma-separated (3,4,...).
I am trying to fetch results from this table together with the assigned clients' names (which live in my pm_users table) using a JOIN. I tried the following:
SELECT
p.project_id, u.user_name, p.project_name,
p.creation_date, p.project_status
FROM pm_projects p
LEFT JOIN pm_users u ON u.user_id
IN (
'p.assigned_clients'
)
but that returns NULL for the u.user_name field.
Do I have to change my schema, and if yes, how?
Or am I just using the wrong query?
You can use find_in_set for this:
on find_in_set(u.user_id, p.assigned_clients) > 0;
Note that there are no single quotes around p.assigned_clients. This is another error in your query (but even if you replaced the single quotes with backticks, the query still wouldn't work).
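For completeness, a sketch of the full query with that fix applied. Note that the schema above spells the column assigned_client while the query says assigned_clients; use whichever your table actually has:
SELECT
    p.project_id, u.user_name, p.project_name,
    p.creation_date, p.project_status
FROM pm_projects p
LEFT JOIN pm_users u
    ON FIND_IN_SET(u.user_id, p.assigned_client) > 0;
-- Caveat: FIND_IN_SET does no trimming, so a stored list like '3, 4'
-- (with spaces) will not match; the list must be exactly '3,4'.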
However, the problem is your table schema. You should have a separate association table, with one row per user and assigned client.
Trying to store this all in one field will only lead to problems, overly-complicated queries, and performance problems in the future.
I would go with a many-to-many link table approach.
Something like
CREATE TABLE pm_project_client_link(
project_id INT,
client_id INT
)
That would allow you to write the query something like
SELECT
p.project_id,
u.user_name,
p.project_name,
p.creation_date,
p.project_status
FROM pm_projects p
INNER JOIN pm_project_client_link pcl ON p.project_id = pcl.project_id
INNER JOIN pm_users u ON pcl.client_id = u.user_id
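If you adopt the link table, here is a hedged variant of its definition with keys added (the composite primary key and the secondary index are assumptions, not part of the answer above):
CREATE TABLE pm_project_client_link (
    project_id INT UNSIGNED NOT NULL,
    client_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (project_id, client_id),  -- prevents duplicate assignments
    KEY idx_client (client_id)            -- supports lookups by client
) ENGINE=InnoDB;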
I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not already been read. It has been working nicely and has been quite speedy overall. However, this afternoon the performance degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I issue this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cids varies and could be quite a bit longer. In any case, I noted that about 2-3 seconds of the total query time is devoted to the ORDER BY. If I remove that, it only takes about half a second. If I drop the subquery, performance goes back to normal... but the subquery didn't seem problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions for multiple queries or some other technique that would accomplish what I am trying to do more efficiently, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the ORDER BY before the join so it applies to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from @Gordon, below:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, creationdate).
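A sketch of the suggested composite index (the index name is illustrative; uninet_channelitem_read already has KEY `ciid` in the schema above):
-- Filter on cid, anti-join on ciid, and keep creationdate in the index
ALTER TABLE channelitem ADD INDEX idx_cid_ciid_date (cid, ciid, creationdate);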
The problem could be that you need to create an index on the channelitem table for the creationdate column. Indexes help a database run queries faster; see the MySQL documentation on indexing.
Yesterday I ran into some SQL weirdness. I had a query that melted the server, so, trying to improve it, I came up with this query:
SELECT idEvent, MAX( fechaHora ) , codAgente, evento FROM eventos_centralita GROUP BY codAgente
And it seems to work for this schema:
CREATE TABLE IF NOT EXISTS `eventos_centralita` (
`idEvent` int(11) NOT NULL AUTO_INCREMENT,
`fechaHora` datetime NOT NULL,
`codAgente` varchar(8) DEFAULT NULL,
`extension` varchar(20) DEFAULT NULL,
`evento` varchar(45) DEFAULT NULL,
PRIMARY KEY (`idEvent`),
KEY `codAgente` (`codAgente`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=105847 ;
I mean, the hour is indeed the MAX one for each agent. However, the id of the event and the event itself are wrong...
So, is this a bug or is this expected?
You are mixing an aggregate function and a "normal" column select. This "feature" only works in MySQL and returns an arbitrary id.
Normally you should group by a specific column and then use aggregate functions to select all the other columns not in that group. Example:
SELECT e1.codAgente, e1.idEvent, e1.fechaHora, e1.evento
FROM eventos_centralita e1
inner join
(
select codAgente, MAX(fechaHora) as fechaHora
from eventos_centralita
group by codAgente
) e2
on e1.codAgente = e2.codAgente and e1.fechaHora = e2.fechaHora
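One caveat: if two events share the same maximum fechaHora for an agent, this join returns both rows. On MySQL 8.0 and later, a window function expresses the intent directly and breaks such ties; a sketch, assuming the same schema:
SELECT idEvent, codAgente, fechaHora, evento
FROM (
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY codAgente
                              ORDER BY fechaHora DESC, idEvent DESC) AS rn
    FROM eventos_centralita e
) ranked
WHERE rn = 1;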