I have a MySQL table that looks (very simplified) like this:
CREATE TABLE `logging` (
`id` bigint(20) NOT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`level` smallint(3) NOT NULL,
`message` longtext CHARACTER SET utf8 COLLATE utf8_general_mysql500_ci NOT NULL
);
I would like to delete all rows of a specific level, except the last one (time is most recent).
Is there a way to select all rows with level set to a specific value and then delete all rows except the latest one in one single SQL query? How would I start solving this problem?
(As I said, this is a very simplified table, so please don't try to discuss possible design problems of this table. I removed some columns. It is designed per PSR-3 logging standard and I don't think there is an easy way to change that. What I want to solve is how I can select from a table and then delete all but some rows of the same table. I have only intermediate knowledge of MySQL.)
Thank you for pushing me in the right direction :)
Edit:
The Database version is /usr/sbin/mysqld Ver 8.0.18-0ubuntu0.19.10.1 for Linux on x86_64 ((Ubuntu))
You can use ROW_NUMBER() analytic function ( as using DB version 8+ ) :
DELETE lg FROM `logging` AS lg
WHERE lg.`id` IN
( SELECT t.`id`
FROM
(
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY `time` DESC) as rn
FROM `logging` t
-- WHERE `level` = #lvl -- optionally add this line to restrict for a spesific value of `level`
) t
WHERE t.rn > 1
)
to delete all of the rows except the last inserted one(considering id is your primary key column).
You can do this:
SELECT COUNT(time) FROM logging WHERE level=some_level INTO #TIME_COUNT;
SET #TIME_COUNT = #TIME_COUNT-1;
PREPARE STMT FROM 'DELETE FROM logging WHERE level=some_level ORDER BY time ASC LIMIT ?;';
EXECUTE STMT USING #TIME_COUNT;
If you have an AUTO_INCREMENT id column - I would use it to determine the most recent entry. Here is one way doing that:
delete l
from (
select l1.level, max(id) as id
from logging l1
where l1.level = #level
) m
join logging l
on l.level = m.level
and l.id < m.id
An index on (level) should give you good performance and will support the MAX() subquery as well as the JOIN.
View on DB Fiddle
If you really need to use the time column, you can modify the query as follows:
delete l
from (
select l1.level, l1.id
from logging l1
where l1.level = #level
order by l1.time desc, l1.id desc
limit 1
) m
join logging l
on l.level = m.level
and l.id <> m.id
View on DB Fiddle
Here you would want to have an index on (level, time).
Related
I've got 2 mysql 5.7 databases hosted on the same server (we're migrating from 1 structure to another)
I want to delete all the rows from database1.table_x where the there is a corresponding row in database2.table_y
The column which contains the data to match on is called code
I'm able to do a SELECT which returns everything that is expected - this is effectively the set of data I want to delete.
An example select would be:
SELECT *
FROM `database1`.`table_x`
WHERE `code` NOT IN (SELECT `code`
FROM `database2`.`table_y`);
This works and it returns 5 rows within 138ms.
--
However, If I change the SELECT to a DELETE e.g.
DELETE
FROM `database1`.`table_x`
WHERE `code` NOT IN (SELECT `code`
FROM `database2`.`table_y`);
The query seems to hang - there are no errors returned, so I have to manually cancel the query after about 3 minutes.
--
Could anyone advise the most efficient/fastest way to achieve this?
try like below it will work
DELETE FROM table_a WHERE `code` NOT IN (
select * from
(
SELECT `code` FROM `second_database`.`table_b`
) as t
);
Try the following query:
DELETE a
FROM first_database.table_a AS a
LEFT JOIN second_database.table_b AS b ON b.code = a.code
WHERE b.code IS NULL;
I have a table containing user to user messages. A conversation has all messages between two users. I am trying to get a list of all the different conversations and display only the last message sent in the listing.
I am able to do this with a SQL sub-query in FROM.
CREATE TABLE `messages` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`from_user_id` bigint(20) DEFAULT NULL,
`to_user_id` bigint(20) DEFAULT NULL,
`type` smallint(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`is_deleted` tinyint(1) NOT NULL,
`text` longtext COLLATE utf8_unicode_ci NOT NULL,
`heading` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at_utc` datetime DEFAULT NULL,
`read_at_utc` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
SELECT * FROM
(SELECT * FROM `messages` WHERE TYPE = 1 AND
(from_user_id = 22 OR to_user_id = 22)
ORDER BY created_at_utc DESC
) tb
GROUP BY from_user_id, to_user_id;
SQL Fiddle:
http://www.sqlfiddle.com/#!2/845275/2
Is there a way to do this without a sub-query?
(writing a DQL which supports sub-queries only in 'IN')
You seem to be trying to get the last contents of messages to or from user 22 with type = 1. Your method is explicitly not guaranteed to work, because the extra columns (not in the group by) can come from arbitrary rows. As explained in the [documentation][1]:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The query that you want is more along the lines of this (assuming that you have an auto-incrementing id column for messages):
select m.*
from (select m.from_user_id, m.to_user_id, max(m.id) as max_id
from message m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22)
) lm join
messages m
on lm.max_id = m.id;
Or this:
select m.*
from message m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22) and
not exists (select 1
from messages m2
where m2.type = m.type and m2.from_user_id = m.from_user_id and
m2.to_user_id = m.to_user_id and
m2.created_at_utc > m.created_at_utc
);
For this latter query, an index on messages(type, from_user_id, to_user_id, created_at_utc) would help performance.
Since this is a rather specific type of data query which goes outside common ORM use cases, DQL isn't really fit for this - it's optimized for walking well-defined relationships.
For your case however Doctrine fully supports native SQL with result set mapping. Using a NativeQuery with ResultSetMapping like this you can easily use the subquery this problem requires, and still map the results on native Doctrine entities, allowing you to still profit from all caching, usability and performance advantages.
Samples found here.
If you mean to get all conversations and all their last messages, then a subquery is necessary.
SELECT a.* FROM messages a
INNER JOIN (
SELECT
MAX(created_at_utc) as max_created,
from_user_id,
to_user_id
FROM messages
GROUP BY from_user_id, to_user_id
) b ON a.created_at_utc = b.max_created
AND a.from_user_id = b.from_user_id
AND a.to_user_id = b.to_user_id
And you could append the where condition as you like.
THE SQL FIDDLE.
I don't think your original query was even doing this correctly. Not sure what the GROUP BY was being used for other than maybe try to only return a single (unpredictable) result.
Just add a limit clause:
SELECT * FROM `messages`
WHERE `type` = 1 AND
(`from_user_id` = 22 OR `to_user_id` = 22)
ORDER BY `created_at_utc` DESC
LIMIT 1
For optimum query performance you need indexes on the following fields:
type
from_user_id
to_user_id
created_at_utc
I have a query that I'm testing on my database, but for some weird reason, and randomly, it returns a different set of results. Interestingly, there are only two distinct result-sets that it returns, from thousands of rows, and the query will randomly return one or the other, but nothing else.
Is there a reason the query only returns one of two datasets? Query and schema below.
My goal is to select the fastest laps for a given track, in a given time period, but only the fastest lap for each user (so there are always 10 different users in the top 10).
Most of the time the correct results are returned, but randomly, a totally different result set is returned.
SELECT `lap`.`ID`, `lap`.`qualificationTime`, `lap`.`userId`
FROM `lap`
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
Schema:
CREATE TABLE IF NOT EXISTS `lap` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`userId` int(11) DEFAULT NULL,
`trackId` int(11) DEFAULT NULL,
`raceDateTime` datetime NOT NULL,
`qualificationTime` decimal(7,4) DEFAULT '0.0000',
`isTestLap` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
(DB create script trimmed of un-needed columns)
You are using a (mis)feature of MySQL called hidden columns. As others have pointed out, you are allowed to put columns in the select statement that are not in the group by. But, the returned values are arbitrary, and not even guaranteed to be the same from one run to the next.
The solution is to find the max qualification time for each user. Then join this information back to get the other fields. Here is one way:
select l.*
from (SELECT userId, min(qualificationtime) as minqf
FROM lap
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
) lu join
lap l
on lu.minqf = l.qualificationtime
ORDER BY l.`qualificationTime` ASC
LIMIT 10
You are selecting lap.ID, lap.qualificationTime and lap.userId, but you are not GROUPing BY them. You can only select fields you group by, or else aggregate functions on the other fields (MIN, MAX, AVG, etc). Otherwise, results are undefined.
I think you mean that sometimes values for lap.ID, lap.qualificationTime are different. And it's right behaviour for mysql. Because you group by userId and you don't know what values for other fields will be returned. Mysql can select different values depend on first value or last rows reading.
I would check something like this:
SELECT `l1`.`qualificationTime`, `l1`.`userId`,
(SELECT l2.ID FROM `lap` AS l2 WHERE l2.`userId` = l1.userId AND
l2.qualificationTime = min(l1.`qualificationTime`))
FROM `lap` AS `l1`
WHERE (l1.trackID =4)
AND (l1.raceDateTime >= "2013-07-25 10:00:00")
AND (l1.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
It's likely to be your ORDER BY on a decimal entity, and how the DB stores this and then retrieves it.
I found some strange(for me) behavour in MySQL. I have a simple query:
SELECT CONVERT( `text`.`old_text`
USING utf8 ) AS stext
FROM `text`
WHERE `text`.`old_id` IN
(
SELECT `revision`.`rev_text_id`
FROM `revision`
WHERE `revision`.`rev_id`
IN
(
SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108
)
)
when i run it, phpmyadmin show execution time of 77.0446 seconds.
But then i replace
WHERE `text`.`old_id` IN
by
WHERE `text`.`old_id` =
it's execution time falls to about 0.001 sec. Result of this query
SELECT `revision`.`rev_text_id`
FROM `revision`
WHERE `revision`.`rev_id`
IN
(
SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108
)
is
+------------+
|rev_text_id |
+------------+
|6506 |
+------------+
Can somebody please explain this behavour?
try to add INDEX on the following columns,
ALTER TABLE `text` ADD INDEX idx_text (old_id);
ALTER TABLE `revision` ADD INDEX idx_revision (rev_text_id);
and Execute the following query
SELECT DISTINCT CONVERT(a.`old_text` USING utf8 ) AS stext
FROM `text` a
INNER JOIN `revision` b
ON a.`old_id` = b.`rev_text_id`
INNER JOIN `page` c
ON b.`rev_id` = c.`page_latest`
WHERE c.`page_id` = 108
PS: Can you run also the following query and post their respective results?
DESC `text`;
DESC `revision`;
DESC `page`;
There are two primary ways you can increase your query performance here
Add Indexes (such as Kuya mentioned)
Rid yourself of the subqueries where possible
For Indexes, add an index on the columns you are searching for your matches:
text.old_id, revision.rev_text_id & page.page_id
ALTER TABLE `text` ADD INDEX idx_text (old_id);
ALTER TABLE `revision` ADD INDEX idx_revision (rev_text_id);
ALTER TABLE `page` ADD INDEX idx_page (page_id);
Your next issue is that nested-sub-selects are hell on your query execution plan. Here is a good thread discussing JOIN vs Subquery. Here is an article on how to get execution plan info from mySQL.
First looks at an execution plan can be confusing, but it will be your best friend when you have to concern yourself with query optimization.
Here is an example of your same query with just joins ( you could use inner or left and get pretty much the same result). I don't have your tables or data, so forgive synax issues (there is no way I can verify the code works verbatim in your environment, but it should give you a good starting point).
SELECT
CONVERT( `text`.`old_text` USING utf8 ) AS stext
FROM `text`
-- inner join only returns rows when it can find a
-- matching `revision`.`rev_text_id` row to `text`.`old_id`
INNER JOIN `revision`
ON `text`.`old_id` = `revision`.`rev_text_id`
-- Inner Join only returns rows when it can find a
-- matching `page_latest` row to `page_latest`
INNER JOIN `page`
ON `revision`.`rev_id` = `page`.`page_latest`
WHERE `page`.`page_id` = 108
MySQLDB is looping through each result of the inner query and comparing it with each record in the outer query.
in the second inner query;
WHERE `revision`.`rev_id`
IN
( SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108
you should definitely use '=' instead of IN, since you're selecting a distinct record, there would be no point in looping through a result when you know only one record will be returned each time
What I have is a table statistieken with an ip, hash of browser info, url visited and last visited date in timestamp.
What I could compile from different sources led to this query, the only problem is that this query takes forever(9 minutes) to complete on a table with about 15000 rows, so this query is very inefficient.
I think I'm going to this the wrong way around, but I can't find a decent post or tutorial how to use the results of a select as basis for getting the results I want.
What I simply want is an overview of every entry in the table that matches the hash of the results that are returned that have visted more than 25 pages in the last 12 hours.
CREATE TABLE IF NOT EXISTS `statsitieken` (
`hash` varchar(35) NOT NULL,
`ip` varchar(24) NOT NULL,
`visits` int(11) NOT NULL,
`lastvisit` int(11) NOT NULL,
`browserinfo` text NOT NULL,
`urls` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This is the query I have tried to compile so far.
SELECT * FROM `database`.`statsitieken` WHERE hash in (SELECT hash FROM `database`.`statsitieken`
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
)
group by hash
having count(urls) > 25
order by urls)
I need this to compile in a decent time, like < 1 second which should be possible in my opinion...
I suggest trying this modified query. The subquery is now computed only once instead of being run for each record returned:
SELECT s.*
FROM `database`.`statsitieken` s, (SELECT *
FROM `database`.`statsitieken`
WHERE `lastvisit` > UNIX_TIMESTAMP(DATE_SUB(NOW(),INTERVAL 12 HOUR))
GROUP BY hash
HAVING COUNT(urls)>25) tmp
WHERE s.`hash`=tmp.`hash`
ORDER BY s.urls
Be sure you have indexes on the following fields:
hash to speed up the GROUP BY and WHERE
urls to speed up the ORDER BY
Derived table with INNER JOIN is faster than a subquery. try this optimized query:
SELECT *
FROM statsitieken a
INNER JOIN (SELECT hash
FROM statsitieken
WHERE lastvisit > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
) b
ON a.hash = b.hash
GROUP BY a.hash
HAVING COUNT(urls) > 25
ORDER BY urls;
For better performance of this select query you should add indexes as:
ALTER TABLE statsitieken ADD KEY ix_hash(hash);
ALTER TABLE statsitieken ADD KEY ix_lastvisit(lastvisit);
WHERE hash in (SELECT hash FROM `database`.`statsitieken`
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
)
You are "subquerying" (i don't know if exists that word :P, 'doing a subquery') in the same table, why not to:
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
do it directly?