MySQL : selecting the X smallest values


Given a table like this:
CREATE TABLE `amoreAgentTST01` (
`moname` char(64) NOT NULL DEFAULT '',
`updatetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`data` longblob,
PRIMARY KEY (`moname`,`updatetime`)
);
I have a query that finds the oldest record for each distinct 'moname', but only if there are multiple records for that 'moname':
SELECT moname, updatetime FROM amoreAgentTST01 a
WHERE (SELECT count(*) FROM amoreAgentTST01 x WHERE x.moname = a.moname) > 1
AND a.updatetime = (SELECT min(updatetime) FROM amoreAgentTST01 y WHERE y.moname = a.moname) ;
My question is: how can I do the same but select the X oldest values?
At the moment I simply run this, delete the oldest values and rerun it, which is not so nice.
Second question: what do you think of the above query? Can it be improved? Is there any obvious bad practice?
Thank you in advance for your advice and help.
Barth

Would something like this work (untested):
SELECT moname, MIN(updatetime) FROM amoreAgentTST01
GROUP BY moname HAVING COUNT(moname)>1
Edit - the above is meant only as a replacement for your existing code, so it doesn't directly answer your question.
I think something like this should work for your main question:
SELECT moname, updatetime FROM amoreAgentTST01
GROUP BY moname, updatetime
HAVING COUNT(moname)>1
ORDER BY updatetime LIMIT 0, 10
Edit - sorry, the above won't work because it's returning only 10 records for all the monames - rather than the 10 oldest for each. Let me have a think.
One more go at this (admittedly, this one looks a bit convoluted):
SELECT a.moname, a.updatetime FROM amoreAgentTST01 a
WHERE EXISTS
(SELECT * FROM amoreAgentTST01 b
WHERE a.moname = b.moname AND a.updatetime = b.updatetime
ORDER BY b.updatetime LIMIT 0, 10)
AND (SELECT COUNT(*) FROM amoreAgentTST01 x WHERE x.moname = a.moname) > 1
I should add that if there is an ID column (generally the primary key), it should be used for the sub-query joins for improved performance.
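One caveat on the last query: its EXISTS test looks like it is satisfied by every row, because each row matches itself on moname and updatetime, so it may not restrict anything. A different way to get the X oldest rows per moname is to count, for each row, how many older rows share its moname and keep the row only when that count is below X. The sketch below tries the idea in Python, using the standard-library sqlite3 module as a stand-in for MySQL. The table name `t`, the function name and the sample data are made up for illustration, and the correlated-count technique is swapped in here rather than taken from the answer above.

```python
import sqlite3

def oldest_per_moname(rows, x):
    """Return the x oldest (moname, updatetime) rows for every moname
    that has more than one row. Hypothetical helper for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t ("
        " moname TEXT NOT NULL,"
        " updatetime TEXT NOT NULL,"
        " PRIMARY KEY (moname, updatetime))"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT a.moname, a.updatetime
        FROM t a
        -- keep only monames that occur more than once
        WHERE (SELECT COUNT(*) FROM t x WHERE x.moname = a.moname) > 1
        -- keep a row only if fewer than :x rows of the same moname are older
          AND (SELECT COUNT(*) FROM t b
               WHERE b.moname = a.moname
                 AND b.updatetime < a.updatetime) < ?
        ORDER BY a.moname, a.updatetime
    """
    result = conn.execute(query, (x,)).fetchall()
    conn.close()
    return result

sample = [
    ("a", "2011-01-01 00:00:00"),
    ("a", "2011-01-02 00:00:00"),
    ("a", "2011-01-03 00:00:00"),
    ("b", "2011-01-01 00:00:00"),
    ("c", "2011-01-05 00:00:00"),
    ("c", "2011-01-04 00:00:00"),
]
print(oldest_per_moname(sample, 2))
```

The same SELECT should carry over to MySQL with the real table name, although the correlated counts can be slow on large tables, so it is worth checking with EXPLAIN first.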

Related

Conditionally update column from another column in same table

I've looked through SO, and there are similar questions, but I can't seem to figure out how to do what I need.
For the purposes of this question, my table has 3 columns: reconciled (tinyint), datereconciled (timestamp, CAN BE NULL), and dateadded (timestamp).
For my code logic, if reconciled==1, there should be a timestamp in datereconciled, but I recently noticed that wasn't always happening. Fixed the code, but now have a lot of NULL values in datereconciled where there should be a timestamp. So, for all rows where reconciled==1 AND datereconciled==NULL, I would like to "update" the value FROM dateadded INTO datereconciled. If there is already a timestamp in datereconciled, leave it alone. And leave it alone if reconciled==0.

You should be able to use a simple update:
UPDATE YourTable
SET DateReconciled = DateAdded
WHERE DateReconciled IS NULL
AND reconciled = 1;
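To check that this update touches only the intended rows, here is a small sketch in Python with the standard-library sqlite3 module standing in for MySQL. The `id` column, the function name and the sample rows are assumptions added for the demonstration; the table name `transactions` is borrowed from later in this thread.

```python
import sqlite3

def backfill_datereconciled(rows):
    """Apply the simple UPDATE to an in-memory copy of the rows and
    return the table contents afterwards. Hypothetical helper."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        " id INTEGER PRIMARY KEY,"      # assumed key, not in the question
        " reconciled INTEGER NOT NULL,"
        " datereconciled TEXT,"
        " dateadded TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO transactions (id, reconciled, datereconciled, dateadded)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    # Copy dateadded only where the row is reconciled and has no timestamp yet.
    conn.execute(
        "UPDATE transactions SET datereconciled = dateadded"
        " WHERE reconciled = 1 AND datereconciled IS NULL"
    )
    result = conn.execute(
        "SELECT id, reconciled, datereconciled, dateadded"
        " FROM transactions ORDER BY id"
    ).fetchall()
    conn.close()
    return result

sample = [
    (1, 1, None, "2013-01-01 10:00:00"),                   # should be filled in
    (2, 1, "2013-02-02 09:00:00", "2013-02-01 10:00:00"),  # already has a timestamp
    (3, 0, None, "2013-03-01 10:00:00"),                   # not reconciled
]
for row in backfill_datereconciled(sample):
    print(row)
```

Only the first sample row changes; the other two are left alone, which matches the three cases described in the question.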

You basically wrote the query already
UPDATE table SET datereconciled = dateadded
WHERE reconciled = 1
AND datereconciled IS NULL

I figured I'd have to use a select in my update query, so I'm a victim of over-complicating things! However, here is my overly complicated self-discovered answer prior to the answers provided:
UPDATE
`transactions` AS `dest`,
(
SELECT
*
FROM
`transactions`
WHERE
`reconciled` = 1 AND `datereconciled` IS NULL
) AS `src`
SET
`dest`.`datereconciled` = `src`.`dateadded`
;

Explain insert query with select queries inside

I'm refactoring someone's code where queries like this are all over the place and I can't understand what's going on. I have no idea what to search for and "mysql insert with nested select" doesn't yield anything helpful.
INSERT INTO stats_users_hour (
hour,
uid,
lastupdate,
hash1hr
)
SELECT * FROM (SELECT
? as hour,
? as uid,
? as lastupdate,
? as hash1hr) AS tmp
WHERE NOT EXISTS (
SELECT lastupdate FROM stats_users_hour WHERE lastupdate = ? AND uid = ?
) LIMIT 1
ON DUPLICATE KEY UPDATE lastupdate = ?, hash1hr = ?
I'm not even sure how to indent this properly. I'm assuming the first select uses the values yielded from the second select, but where does the second select get its data from? Is this just an "update or create" query? And are the first select's results used as values for the insert query?
I'm sorry if this is duplicate. Any help is appreciated.
edit: Also the where clause, what query does it apply to? I assume the first select query but not really sure.

This creates a derived table with one row, which will be inserted if the conditions below are met:
SELECT
? as hour,
? as uid,
? as lastupdate,
? as hash1hr
This checks whether such a row already exists in the table stats_users_hour (same uid and lastupdate):
WHERE NOT EXISTS (
SELECT lastupdate FROM stats_users_hour WHERE lastupdate = ? AND uid = ?
)
If such a row exists, no new row is inserted.
ON DUPLICATE KEY UPDATE lastupdate = ?, hash1hr = ?
If the insert goes ahead but collides with an existing unique key, this updates that row with the newest data for lastupdate and hash1hr.
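The "insert only if no matching row exists" part can be tried in isolation. The sketch below uses Python's standard-library sqlite3 module as a stand-in for MySQL, so the MySQL-only ON DUPLICATE KEY UPDATE clause is left out. The column types, the function name and the sample values are assumptions, since the real schema is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are guesses; the original schema is not shown.
conn.execute(
    "CREATE TABLE stats_users_hour ("
    " hour INTEGER, uid INTEGER, lastupdate TEXT, hash1hr REAL)"
)

INSERT_IF_ABSENT = """
    INSERT INTO stats_users_hour (hour, uid, lastupdate, hash1hr)
    SELECT * FROM (SELECT ? AS hour, ? AS uid, ? AS lastupdate, ? AS hash1hr) AS tmp
    WHERE NOT EXISTS (
        SELECT lastupdate FROM stats_users_hour WHERE lastupdate = ? AND uid = ?
    )
"""

def record(hour, uid, lastupdate, hash1hr):
    """Insert the row unless one with the same uid and lastupdate exists,
    then return the total number of rows in the table."""
    conn.execute(
        INSERT_IF_ABSENT,
        (hour, uid, lastupdate, hash1hr, lastupdate, uid),
    )
    return conn.execute("SELECT COUNT(*) FROM stats_users_hour").fetchone()[0]

first = record(13, 7, "2013-08-01 13:00:00", 1.5)   # new row
second = record(13, 7, "2013-08-01 13:00:00", 2.5)  # same uid and lastupdate: skipped
third = record(14, 7, "2013-08-01 14:00:00", 3.5)   # different lastupdate: inserted
print(first, second, third)
```

The inner SELECT of literals builds the one-row derived table, and the WHERE NOT EXISTS clause applies to the outer SELECT that reads from it, so the one row is either passed on to the INSERT or filtered out.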

Strange query results from MySQL

I have a query that I'm testing on my database, but for some weird reason, and randomly, it returns a different set of results. Interestingly, there are only two distinct result-sets that it returns, from thousands of rows, and the query will randomly return one or the other, but nothing else.
Is there a reason the query only returns one of two datasets? Query and schema below.
My goal is to select the fastest laps for a given track, in a given time period, but only the fastest lap for each user (so there are always 10 different users in the top 10).
Most of the time the correct results are returned, but randomly, a totally different result set is returned.
SELECT `lap`.`ID`, `lap`.`qualificationTime`, `lap`.`userId`
FROM `lap`
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
Schema:
CREATE TABLE IF NOT EXISTS `lap` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`userId` int(11) DEFAULT NULL,
`trackId` int(11) DEFAULT NULL,
`raceDateTime` datetime NOT NULL,
`qualificationTime` decimal(7,4) DEFAULT '0.0000',
`isTestLap` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
);
(DB create script trimmed of un-needed columns)

You are using a (mis)feature of MySQL called hidden columns. As others have pointed out, you are allowed to put columns in the select statement that are not in the group by. But, the returned values are arbitrary, and not even guaranteed to be the same from one run to the next.
The solution is to find the minimum qualification time for each user. Then join this information back to get the other fields. Here is one way:
select l.*
from (SELECT userId, min(qualificationtime) as minqf
FROM lap
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
) lu join
lap l
on lu.userId = l.userId
and lu.minqf = l.qualificationtime
ORDER BY l.`qualificationTime` ASC
LIMIT 10
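The "aggregate first, then join back" idea can be checked on a few rows. The sketch below uses Python's standard-library sqlite3 module in place of MySQL and joins on both the user and the best time, so that one user's best time cannot match another user's lap. The track, date and test-lap filters are left out to keep it short, and the function name and sample laps are invented for the example.

```python
import sqlite3

def fastest_laps(laps, limit=10):
    """Return (ID, userId, qualificationTime) of each user's fastest lap,
    fastest first. Hypothetical helper for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lap ("
        " ID INTEGER PRIMARY KEY, userId INTEGER, qualificationTime REAL)"
    )
    conn.executemany(
        "INSERT INTO lap (ID, userId, qualificationTime) VALUES (?, ?, ?)", laps
    )
    query = """
        SELECT l.ID, l.userId, l.qualificationTime
        FROM (SELECT userId, MIN(qualificationTime) AS minqf
              FROM lap
              GROUP BY userId) AS lu
        JOIN lap AS l
          ON l.userId = lu.userId              -- same user
         AND l.qualificationTime = lu.minqf    -- and that user's best time
        ORDER BY l.qualificationTime ASC, l.ID ASC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result

sample = [
    (1, 10, 61.5),
    (2, 10, 59.25),
    (3, 20, 60.0),
    (4, 20, 62.0),
    (5, 30, 59.25),
]
print(fastest_laps(sample))
```

If a user has two laps tied for their best time, both rows come back, so an extra tie-break (for example the lowest ID) would be needed to guarantee ten different users.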

You are selecting lap.ID, lap.qualificationTime and lap.userId, but you are not GROUPing BY them. You can only select fields you group by, or else aggregate functions on the other fields (MIN, MAX, AVG, etc). Otherwise, results are undefined.

I think you mean that the values for lap.ID and lap.qualificationTime sometimes differ. That is the expected behaviour for MySQL: because you group by userId, you don't know which values will be returned for the other fields. MySQL may pick different values depending on which row it reads first or last.

I would check something like this:
SELECT `l1`.`qualificationTime`, `l1`.`userId`,
(SELECT l2.ID FROM `lap` AS l2 WHERE l2.`userId` = l1.userId AND
l2.qualificationTime = min(l1.`qualificationTime`))
FROM `lap` AS `l1`
WHERE (l1.trackID =4)
AND (l1.raceDateTime >= "2013-07-25 10:00:00")
AND (l1.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10

It's likely to be your ORDER BY on a decimal entity, and how the DB stores this and then retrieves it.

Generate statistics in MySQL

I have a table with posts and I want to generate a graph that shows how many posts were made in the last 30 minutes, in the 30 minutes before that, and so on. The posts are selected by their post_handler and post_status.
The table structure looks like this.
CREATE TABLE IF NOT EXISTS `posts` (
`post_title` varchar(255) NOT NULL,
`post_content` text NOT NULL,
`post_date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`post_handler` varchar(255) NOT NULL,
`post_status` tinyint(4) NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
KEY `post_status` (`post_status`),
KEY `post_status_2` (`post_status`,`id`),
KEY `post_handler` (`post_handler`),
KEY `post_date_added` (`post_date_added`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2300131 ;
The results I'd like to receive, sorted by post_date_added:
period_start period_end posts
2011-12-06 19:23:44 2011-12-06 19:53:44 10
2011-12-06 19:53:44 2011-12-06 20:23:44 39
2011-12-06 20:23:44 2011-12-06 20:53:44 40
Right now I use a solution where I have to run this query many times over, and then insert the data into another table from the PHP script.
SELECT COUNT(*) FROM posts WHERE post_handler = 'test' AND post_status = 1 AND post_date_added BETWEEN '2011-12-06 19:23:44' AND '2011-12-06 19:53:44'
Do you know any other solution? Is there any way to run a query that also inserts results into the database, all in one query?

It's fairly easy to group by distinct time units, like hour, minute, day or whatever. If you want to group by hour, a possible query might look like this:
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND post_status = 1
GROUP BY _Date;
(run this with a mysql query tool of your choice to see the output).
However, if you want to use 30 minutes as the basis of your grouping, the SQL gets trickier. For this special case, since you only have to divide each hour into two subsets, this approach may work:
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
"00" AS "semihour",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND DATE_FORMAT(post_date_added,"%i") < 30
AND post_status = 1
GROUP BY _Date
UNION
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
"30" AS "semihour",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND DATE_FORMAT(post_date_added,"%i") >= 30
AND post_status = 1
GROUP BY _Date;
Again, run this with a MySQL query tool of your choice to see the output. You could also compute the bucket arithmetically with CASE or IF, but personally I'd group by either hour or minute just to keep the SQL simpler.
To directly add those numbers into your graph database, use this syntax:
INSERT INTO yourtable (yourfields)
SELECT ...
More details about this can be found here in the MySQL documentation.
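As an alternative to the two-part UNION, the half-hour bucket can be computed arithmetically: convert the timestamp to seconds, divide by 1800 with integer division, and multiply back to get the start of the bucket. The sketch below shows this in Python with the standard-library sqlite3 module standing in for MySQL; in MySQL the analogous building blocks would be UNIX_TIMESTAMP() and FROM_UNIXTIME(). The function name and sample timestamps are invented, and the post_handler and post_status filters are omitted for brevity.

```python
import sqlite3

def posts_per_half_hour(timestamps):
    """Count posts per 30-minute bucket and return
    (period_start, period_end, posts) rows. Hypothetical helper."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY, post_date_added TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO posts (post_date_added) VALUES (?)",
        [(t,) for t in timestamps],
    )
    query = """
        SELECT datetime(bucket, 'unixepoch')        AS period_start,
               datetime(bucket + 1800, 'unixepoch') AS period_end,
               COUNT(*)                             AS posts
        FROM (
            -- integer division truncates to the start of the 1800-second bucket
            SELECT CAST(strftime('%s', post_date_added) AS INTEGER) / 1800 * 1800
                   AS bucket
            FROM posts
        )
        GROUP BY bucket
        ORDER BY bucket
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    "2011-12-06 19:23:44",
    "2011-12-06 19:29:59",
    "2011-12-06 19:30:00",
    "2011-12-06 20:05:10",
]
for row in posts_per_half_hour(sample):
    print(row)
```

Note that these buckets are aligned to :00 and :30 on the clock rather than to an arbitrary starting point such as 19:23:44, and that periods with no posts produce no row at all.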

In (very) brief: yes, you can insert the results of a query into another table. Take a look at INSERT ... SELECT here: http://dev.mysql.com/doc/refman/5.1/en/insert-select.html
Essentially, you'd just change what you have to something like
INSERT INTO post_statistics_table (period_start, period_end, posts)
SELECT ?, ?, COUNT(*) FROM posts
WHERE post_handler = 'test'
AND post_status = 1
AND post_date_added BETWEEN ? AND ?
and then fill in the four ?s with the same two DATETIMEs, repeated. ($from, $to, $from, $to)

Update with SELECT and group without GROUP BY

I have a table like this (MySQL 5.0.x, MyISAM):
response{id, title, status, ...} (status: 1 new, 3 multi)
I would like to update the status from new (status=1) to multi (status=3) of all the responses if at least 20 have the same title.
I have this one, but it does not work :
UPDATE response SET status = 3 WHERE status = 1 AND title IN (
SELECT title FROM (
SELECT DISTINCT(r.title) FROM response r WHERE EXISTS (
SELECT 1 FROM response spam WHERE spam.title = r.title LIMIT 20, 1)
)
as u)
Please note:
I do the nested select to avoid the famous You can't specify target table 'response' for update in FROM clause
I cannot use GROUP BY for performance reasons. The query cost with a solution using LIMIT is way better (but it is less readable).
EDIT:
It is possible to do SELECT FROM an UPDATE target in MySQL. See solution here
The issue is with the selected data, which is totally wrong.
The only solution I found which works is with a GROUP BY:
UPDATE response SET status = 3
WHERE status = 1 AND title IN (SELECT title
FROM (SELECT title
FROM response
GROUP BY title
HAVING COUNT(1) >= 20)
as derived_response)
Thanks for your help! :)

MySQL doesn't like it when you try to UPDATE and SELECT from the same table in one query. It has to do with locking priorities, etc.
Here's how I would solve this problem:
SELECT CONCAT('UPDATE response SET status = 3 ',
'WHERE status = 1 AND title = ', QUOTE(title), ';') AS sql_stmt
FROM response
GROUP BY title
HAVING COUNT(*) >= 20;
This query produces a series of UPDATE statements, with the quoted titles that deserve to be updated embedded. Capture the result and run it as an SQL script.
I understand that GROUP BY in MySQL often incurs a temporary table, and this can be costly. But is that a deal-breaker? How frequently do you need to run this query? Besides, any other solutions are likely to require a temporary table too.
I can think of one way to solve this problem without using GROUP BY:
CREATE TEMPORARY TABLE titlecount (c INTEGER, title VARCHAR(100) PRIMARY KEY);
INSERT INTO titlecount (c, title)
SELECT 1, title FROM response
ON DUPLICATE KEY UPDATE c = c+1;
UPDATE response JOIN titlecount USING (title)
SET response.status = 3
WHERE response.status = 1 AND titlecount.c >= 20;
But this also uses a temporary table, which is why you try to avoid using GROUP BY in the first place.

I would write something straightforward like below
UPDATE `response`, (
SELECT title, count(title) as count from `response`
WHERE status = 1
GROUP BY title
) AS tmp
SET response.status = 3
WHERE status = 1 AND response.title = tmp.title AND count >= 20;
Is using GROUP BY really that slow ? The solution you tried to implement looks like requesting again and again on the same table and should be way slower than using GROUP BY if it worked.
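For a quick sanity check of the GROUP BY approach, the sketch below runs the question's "titles with at least N rows" update on a few rows, using Python's standard-library sqlite3 module as a stand-in for MySQL. SQLite accepts a subquery on the table being updated, so the extra derived-table wrapper that MySQL needs is not used here. The function name, the sample data and the lowered threshold are assumptions made for the example.

```python
import sqlite3

def mark_multi(responses, threshold=20):
    """Set status 3 on status-1 rows whose title occurs at least
    `threshold` times; return all rows afterwards. Hypothetical helper."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE response ("
        " id INTEGER PRIMARY KEY, title TEXT, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO response (title, status) VALUES (?, ?)", responses
    )
    # Count every row per title (as in the question's working query),
    # then update only the rows that are still 'new'.
    conn.execute(
        """
        UPDATE response SET status = 3
        WHERE status = 1
          AND title IN (SELECT title
                        FROM response
                        GROUP BY title
                        HAVING COUNT(*) >= ?)
        """,
        (threshold,),
    )
    result = conn.execute(
        "SELECT id, title, status FROM response ORDER BY id"
    ).fetchall()
    conn.close()
    return result

sample = [("spam", 1), ("spam", 1), ("spam", 2), ("ham", 1), ("ham", 1)]
for row in mark_multi(sample, threshold=3):
    print(row)
```

This version counts all rows per title regardless of status, like the query in the question; the answer above counts only status = 1 rows, which gives a different result when a title mixes statuses.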

This is a funny peculiarity with MySQL - I can't think of a way to do it in a single statement (GROUP BY or no GROUP BY).
You could select the appropriate response rows into a temporary table first then do the update by selecting from that temp table.

You'll have to use a temporary table:
create temporary table r_update (title varchar(10));
insert r_update
select title
from response
group
by title
having count(*) < 20;
update response r
left outer
join r_update ru
on ru.title = r.title
set r.status = 3
where ru.title is null
and r.status = 1;
