I'm new to this platform; this is actually my first question. Sorry for my bad English, I'm using a translator. Let me know if I have used anything inappropriate.
My table is like this:
CREATE TABLE tbl_records (
id int(11) NOT NULL,
data_id int(11) NOT NULL,
value double NOT NULL,
record_time datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE tbl_records
ADD PRIMARY KEY (id),
ADD KEY data_id (data_id),
ADD KEY record_time (record_time);
ALTER TABLE tbl_records
MODIFY id int(11) NOT NULL AUTO_INCREMENT;
COMMIT;
My first query takes 0.0096 seconds:
SELECT b.* FROM tbl_records b
INNER JOIN
(SELECT MAX(id) AS id FROM tbl_records GROUP BY data_id) a
ON a.id=b.id;
My second query takes 2.4957 seconds:
SELECT MAX(id) AS id FROM tbl_records GROUP BY data_id;
When I run these queries over and over, the results are similar. There are 20 million rows in the table.
Why is the one with the subquery faster?
Also, what I really need is MAX(record_time), but
SELECT b.* FROM tbl_records b
INNER JOIN
(SELECT MAX(record_time) AS id FROM tbl_records GROUP BY data_id) a
ON a.id=b.id
It takes minutes when I run it.
I also need hourly, daily, and monthly aggregates. I couldn't see much performance difference between GROUP BY SUBSTR(record_time,1,10) and GROUP BY DATE_FORMAT(record_time,'%Y%m%d'); both take minutes.
What am I doing wrong?
If all you want is the single latest row (more on your GROUP BY below), the first query can be simplified to
SELECT * FROM tbl_records
ORDER BY id DESC
LIMIT 1;
The second:
SELECT id FROM tbl_records
ORDER BY id DESC
LIMIT 1;
I don't know what the third is trying to do. This does not make sense: MAX(record_time) AS id -- it is a DATETIME that will subsequently be compared to an INT in ON a.id=b.id.
Another option for turning a DATETIME into a DATE is simply DATE(record_time). But it will not be significantly faster.
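For example, a daily rollup might look like this (a sketch; the aggregate columns are illustrative, not from the question):
SELECT data_id, DATE(record_time) AS day,
       COUNT(*) AS cnt, SUM(value) AS total
FROM tbl_records
GROUP BY data_id, DATE(record_time);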
If the goal is to build daily counts and subtotals, then there is a much better way: build and incrementally maintain a summary table.
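A minimal sketch of such a summary table (the table name, columns, and nightly schedule are illustrative assumptions, not from the question):
CREATE TABLE tbl_records_daily (
  data_id   INT NOT NULL,
  day       DATE NOT NULL,
  row_count INT NOT NULL,     -- rows seen that day
  value_sum DOUBLE NOT NULL,  -- subtotal of value for that day
  PRIMARY KEY (data_id, day)
);

-- Fold in yesterday's rows, e.g. from a nightly job:
INSERT INTO tbl_records_daily (data_id, day, row_count, value_sum)
SELECT data_id, DATE(record_time), COUNT(*), SUM(value)
FROM tbl_records
WHERE record_time >= CURDATE() - INTERVAL 1 DAY
  AND record_time <  CURDATE()
GROUP BY data_id, DATE(record_time)
ON DUPLICATE KEY UPDATE
  row_count = row_count + VALUES(row_count),
  value_sum = value_sum + VALUES(value_sum);
Hourly and monthly rollups follow the same pattern with a different bucket expression; reports then read the small summary table instead of scanning 20 million rows.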
(responding to Comment)
The GROUP BY that you have is improper and probably incorrect. I took the liberty of changing from id to data_id:
SELECT b.*
FROM ( SELECT data_id, MAX(record_time) AS max_time
       FROM tbl_records
       GROUP BY data_id
     ) AS a
JOIN tbl_records AS b
  ON a.data_id = b.data_id
 AND a.max_time = b.record_time
And have
INDEX(data_id, record_time)
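That is, something like (the index name is arbitrary):
ALTER TABLE tbl_records ADD INDEX data_id_time (data_id, record_time);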
Can there be duplicate times for one data_id? To discuss that and other "groupwise-max" queries, see http://mysql.rjweb.org/doc.php/groupwise_max
Related
How can we optimize this DELETE query?
DELETE FROM student_score
WHERE lesson_id IS NOT NULL
  AND id NOT IN (SELECT MaxID FROM temp)
ORDER BY id
LIMIT 1000
The subquery SELECT MaxID FROM temp returns about 35k rows; temp is a temporary table. And SELECT * FROM student_score WHERE lesson_id IS NOT NULL returns around 500k rows.
I tried using LIMIT and ORDER BY clauses, but they didn't make it faster.
IN (SELECT ...) is, in many situations, really inefficient.
Use a multi-table DELETE. This involves a LEFT JOIN ... IS NULL, which is much more efficient.
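A sketch of that form, assuming (as in the original query) temp.MaxID is matched against student_score.id:
DELETE s
FROM student_score AS s
LEFT JOIN temp AS t ON t.MaxID = s.id   -- rows with a match in temp are kept
WHERE s.lesson_id IS NOT NULL
  AND t.MaxID IS NULL;                  -- no match in temp => delete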
Once you have mastered that, you might be able to get rid of the temp and simply fold it into the query.
Also more efficient is
WHERE NOT EXISTS ( SELECT 1 FROM temp
                   WHERE temp.MaxID = student_score.id )
Also, DELETEing a large number of rows is inherently slow. 1000 is not so bad; 35K is. The reason is the need to save all the potentially-deleted rows until "commit" time.
Other techniques for big deletes: http://mysql.rjweb.org/doc.php/deletebig
Note that one of them explains a more efficient way to walk through the PRIMARY KEY (via id). Note that your query may have to step over lots of ids that have lesson_id IS NULL; that is, the LIMIT 1000 is not doing what you expected.
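A sketch of that chunked walk (the loop that advances @lo lives in your client code or a stored procedure; the chunk size is illustrative):
SET @lo = 0;
-- repeat, adding 1000 to @lo each pass, until @lo passes MAX(id):
DELETE FROM student_score
WHERE id > @lo AND id <= @lo + 1000
  AND lesson_id IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM temp WHERE temp.MaxID = student_score.id);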
You can do it without ORDER BY:
DELETE FROM student_score
WHERE lesson_id IS NOT null
AND id NOT IN (SELECT MaxID FROM temp)
Or like this, using a LEFT JOIN, which is better optimized for speed:
DELETE s
FROM student_score s
LEFT JOIN temp t1 ON s.id = t1.MaxID
WHERE s.lesson_id IS NOT NULL AND t1.MaxID IS NULL;
I wrote this query after optimizing a PHP script that did the same thing using 3 different queries and 2 while loops; that PHP script took over 6 hours to run.
So I compressed it all into a single query that does the same job without any loops...
DELETE table FROM table WHERE id IN (
    SELECT id FROM (
        SELECT MAX(data_elab) AS data_elab_new, COUNT(*) AS volte, t1.*
        FROM ( SELECT * FROM table ORDER BY data_elab DESC ) t1
        GROUP BY cod_dl, issn, variante, add_on
        HAVING volte > 1
    ) t2
);
Note: the server is very old (Windows, 3 GB of RAM, 32-bit); table size 204 MB, 100,000 rows, 20 columns; only id is a primary key, no other indexes.
This query alone took only 20 seconds... the DELETE is the problem:
SELECT id FROM (
    SELECT MAX(data_elab) AS data_elab_new, COUNT(*) AS volte, t1.*
    FROM ( SELECT * FROM table ORDER BY data_elab DESC ) t1
    GROUP BY cod_dl, issn, variante, add_on
    HAVING volte > 1
) t2
The problem is that I expected this to speed things up a lot, but after more than two hours the query still had not completed and kept working...
Any advice to optimize this query, or did I do something wrong in it?
Thank you.
Assuming data_elab is never repeated for any combination of cod_dl, issn, variante, add_on (I am assuming that is what "univocal" means), this is the form the query you need should take:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) IN (
SELECT cod_dl, issn, variante, add_on, MAX(data_elab) as data_elab_max
FROM table
GROUP BY cod_dl, issn, variante, add_on
HAVING COUNT(*) > 1
);
As MySQL doesn't tend to like DELETEing and SELECTing from the same table in a query, you might have to do some tweaking, something like:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) IN (
SELECT extraLayerOfIndirection.*
FROM (
SELECT cod_dl, issn, variante, add_on, MAX(data_elab) as data_elab_max
FROM table
GROUP BY cod_dl, issn, variante, add_on
HAVING COUNT(*) > 1
) AS extraLayerOfIndirection
);
Also, it's not quite the same, but you might want to consider this instead:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) NOT IN (
SELECT extraLayerOfIndirection.*
FROM (
SELECT cod_dl, issn, variante, add_on, MIN(data_elab) as data_elab_min
FROM table
GROUP BY cod_dl, issn, variante, add_on
) AS extraLayerOfIndirection
);
Instead of only deleting the last of each grouping, this deletes all but the first for each grouping. If you have a lot of repeats, and only want to preserve the first for each anyway, this could result in a much smaller result from the subquery.
I have a table with two columns, msisdn and points. I need to display the maximum points in the table and the points for a particular msisdn, all in a single query. The query I am using is based on subqueries, and I don't think it is the most efficient way to do this. Please share an alternative, optimized single query for this.
Table Structure:
CREATE TABLE `tbl_121314_point_base` (
`msisdn` bigint(12) NOT NULL DEFAULT '0',
`points` int(10) NOT NULL DEFAULT 0,
KEY `msisdn` (`msisdn`)
) ENGINE=InnoDB;
Current Query:
select (
select max(points) from tbl_121314_point_base ) as max_points,
(select points from tbl_121314_point_base where msisdn = 9024317476) as ori_points
from tbl_121314_point_base limit 1;
Another way: you can rewrite your query using a CROSS JOIN. Use EXPLAIN to compare the performance of both queries.
SELECT p.points AS ori_points,
       t.max_points
FROM tbl_121314_point_base p
CROSS JOIN ( SELECT MAX(points) AS max_points
             FROM tbl_121314_point_base ) t
WHERE p.msisdn = 9024317476
LIMIT 1;
I have the following tables:
log - stores info about each interaction. Indexes on clickID (unique) and businessID (not unique).
actions - stores info about each specific action taken by a customer. Indexes on clickID, actionID, personID, and businessID.
customers - stores info about each specific customer of a specific business. Indexes on personID and businessID (neither is unique, but the combination of the two is).
people - stores universal stats about each person who is a customer of one or more businesses. Index on personID (unique).
I need to get all of this info in one result set to pull data from, so that I can connect interactions to individual people's data, and their business-specific data.
I am currently using two result sets that I correlate in PHP, but I'd prefer to work from one returned result set, if that makes sense.
Here is my current set of queries:
SELECT * FROM `log`
WHERE `timestamp` >= STARTTIME AND `timestamp` <= ENDTIME AND `pageID`='aPageID' AND `businessID`='aBusinessID'
ORDER BY `timestamp` DESC
SELECT * FROM `actions` AS `t1`
INNER JOIN `people` AS `t2` ON (`t1`.`personID`=`t2`.`personID`)
INNER JOIN `customers` AS `t3` ON (`t1`.`personID`=`t3`.`personID` AND `t1`.`businessID`=`t3`.`businessID`)
WHERE `timestamp` >= STARTTIME AND `timestamp` <= ENDTIME AND `pageID`='aPageID' AND `businessID`='aBusinessID'
ORDER BY `timestamp` DESC
It seems like I'd do better with one query where the actionID (and all following fields) might be NULL, but I don't really know what that would look like, or how it would impact performance. Help?
SELECT * FROM `log` AS t1
INNER JOIN `actions` AS t2 ON (t1.`clickID`=t2.`clickID` AND t1.`businessID`=t2.`businessID`)
INNER JOIN `customers` AS t3 ON (t1.`businessID`=t3.`businessID` AND t2.`personID`=t3.`personID`)
INNER JOIN `people` AS t4 ON (t2.`personID`=t4.`personID`)
WHERE t1.`timestamp` >= STARTTIME AND t1.`timestamp` <= ENDTIME AND t1.`pageID`='aPageID' AND t1.`businessID`='aBusinessID'
ORDER BY t1.`timestamp` DESC
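If you also need log rows that have no matching action (the case where actionID and everything after it would be NULL, as you described), a LEFT JOIN variant of the same query is a reasonable sketch:
SELECT * FROM `log` AS t1
LEFT JOIN `actions` AS t2 ON (t1.`clickID`=t2.`clickID` AND t1.`businessID`=t2.`businessID`)
LEFT JOIN `customers` AS t3 ON (t1.`businessID`=t3.`businessID` AND t2.`personID`=t3.`personID`)
LEFT JOIN `people` AS t4 ON (t2.`personID`=t4.`personID`)
WHERE t1.`timestamp` >= STARTTIME AND t1.`timestamp` <= ENDTIME AND t1.`pageID`='aPageID' AND t1.`businessID`='aBusinessID'
ORDER BY t1.`timestamp` DESC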
I am having trouble debugging a slow query; when I take it apart, the pieces perform relatively fast. Let me break it down for you:
The first query, which is my subquery, groups all rows by lmu_id (currently only 2 unique ones) and returns the MAX(id), in other words the last inserted row.
SELECT max(id) FROM `position` GROUP by lmu_id
-> 15055,15091
2 total, Query took 0.0030 seconds
The outer query retrieves the full row of those two positions, so here I've manually inserted the ids (15055,15091)
SELECT * FROM `position` WHERE id IN (15055,15091)
2 total, Query took 0.1169 sec
Not the fastest query, but still a blink of an eye.
Now my problem: I do not understand why, when I combine these two queries, the whole system crashes:
SELECT * FROM `position` AS p1 WHERE p1.id IN (SELECT max(id) FROM `position` AS p2 GROUP by p2.lmu_id)
It takes forever: 100% CPU, crashing; I lost patience after 2 minutes and had to service mysql restart.
For your reference, I ran an EXPLAIN on the query:
EXPLAIN SELECT * FROM `position` AS p1 WHERE p1.id IN (SELECT max(p2.id) FROM `position` AS p2 GROUP by p2.lmu_id)
id  select_type         table  type   possible_keys  key                    key_len  ref   rows  Extra
1   PRIMARY             p1     ALL    NULL           NULL                   NULL     NULL  7613  Using where
2   DEPENDENT SUBQUERY  p2     index  NULL           position_lmu_id_index  5        NULL  1268  Using index
id is the primary key, and lmu_id is a foreign key and also indexed.
I'm really stumped. Why is the final query taking so long / crashing? What other things should I look into?
Joins can work too.
SELECT *
FROM `position` AS p1
INNER JOIN (SELECT MAX(id) AS id FROM `position` GROUP BY lmu_id) p2 ON (p1.id = p2.id)
Scott's answer is good too, as I find EXISTS tends to run quite fast as well. In general, avoid IN.
Also try
SELECT p1.*
FROM `position` AS p1
WHERE p1.id = (SELECT MAX(id) FROM `position` WHERE lmu_id = p1.lmu_id)
I've found that using EXISTS runs much faster than IN subselects.
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
Subqueries with EXISTS vs IN - MySQL
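For the position query above, the EXISTS-style rewrite would look something like this (a sketch: keep a row only if no later id exists for the same lmu_id):
SELECT *
FROM `position` AS p1
WHERE NOT EXISTS (
    SELECT 1
    FROM `position` AS p2
    WHERE p2.lmu_id = p1.lmu_id
      AND p2.id > p1.id
);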