i have table data like this:
id,time,otherdata
a,1,fsdfas
a,2,fasdfag
a,3,fasdfas
a,7,asfdsaf
b,8,fasdf
a,8,asdfasd
a,9,afsadfa
b,10,fasdf
...
so essentially, i can select all the data in the order i want by saying something like:
select * from mytable order by id, time;
so i get all the records in the order i want, sorted by id first, and then by time. but instead of getting all the records, i need the latest 3 times for each id.
Answer:
Well, I figured out how to do it. I'm surprised at how quick it was, as I'm operating on a couple million rows of data and it took about 11 seconds. I wrote a procedure in a sql script to do it, and here's what it looks like. --Note that instead of getting the last 3, it gets the last "n" number of rows of data.
use my_database;
drop procedure if exists getLastN;
drop table if exists lastN;
-- Create a procedure that gets the last n records for each id
delimiter //
create procedure getLastN(n int)
begin
# Declare cursor for data iterations, and variables for storage
declare idData varchar(32);
declare done int default 0;
declare curs cursor for select distinct id from my_table;
declare continue handler for not found set done = 1;
open curs;
# Create a temporary table to contain our results
create temporary table lastN like my_table;
# Iterate through each id
DATA_LOOP: loop
fetch curs into idData;
# Check 'done' right after the fetch, so the last id is not inserted twice
if done then leave DATA_LOOP; end if;
insert into lastN select * from my_table where id = idData order by time desc limit n;
end loop;
close curs;
end//
delimiter ;
call getLastN(3);
select * from lastN;
sorry if this doesn't exactly work, I've had to change variable names and stuff to obfuscate my work's work, but i ran this exact piece of code and got what i needed!
I think it's as simple as:
SELECT * FROM `mytable`
GROUP BY `id`
ORDER BY `time` DESC
LIMIT 3
Two approaches that I'm aware of are (1) to use a set of unions, each one containing a "limit 3", or (2) to use a temporary variable. These approaches, along with other useful links and discussion can be found here.
Try this:
select *
from mytable as m1
where (
-- count the rows with the same id that are at least as recent as this one
select count(*) from mytable as m2
where m1.id = m2.id and m2.time >= m1.time
) <= 3 ORDER BY id, time
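The counting idea can be checked on the sample data from the question. What follows is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained), and it compares on time inside the subquery so that only the most recent rows per id pass the `<= 3` test.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, time INTEGER, otherdata TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("a", 1, "fsdfas"), ("a", 2, "fasdfag"), ("a", 3, "fasdfas"),
        ("a", 7, "asfdsaf"), ("b", 8, "fasdf"), ("a", 8, "asdfasd"),
        ("a", 9, "afsadfa"), ("b", 10, "fasdf"),
    ],
)

# Keep a row only if at most 3 rows with the same id have a time >= its own,
# i.e. the row is among the 3 most recent rows for that id.
latest_three = conn.execute(
    """
    SELECT m1.id, m1.time
    FROM mytable AS m1
    WHERE (
        SELECT COUNT(*) FROM mytable AS m2
        WHERE m2.id = m1.id AND m2.time >= m1.time
    ) <= 3
    ORDER BY m1.id, m1.time
    """
).fetchall()

print(latest_three)
# [('a', 7), ('a', 8), ('a', 9), ('b', 8), ('b', 10)]
```

If two rows of the same id share a time, the count treats them as tied, so an id may return more or fewer than three rows; a tie-breaking column would be needed in that case.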
Related
So I have 2 tables, communication,and movement.
communication has columns fromID,timestamp that has ID of caller, and time the call was made. Then I have another table movement that has ID,timestamp,x,y, that has the ID of a person, their location (x,y), and the time that they are at that location.
I want to write a query that looks something like this:
For every single row of communication(R)
SELECT * FROM movement m
WHERE m.ID = R.fromID && m.timestamp <= R.timestamp
ORDER BY timestamp
Basically, what this is doing is finding the closest movement timestamp for a given communication timestamp. After that, eventually, I want to find the location (x,y) of a call, based on the movement data.
How would I do this? I know there's a set based approach, but I don't want to do it that way. I looked into cursors, but I get the feeling that the performance is terrible on that.
So is there any way to do this with a loop? I essentially want to loop through every single row of communication and get the result.
I tried something like this:
DELMITER $$
CREATE PROCEDURE findClosestTimestamp()
BEGIN
DECLARE commRowCount DEFAULT 0;
DECLARE i DEFAULT 0;
DECLARE ctimestamp DEFAULT 0;
SELECT COUNT(*) FROM communication INTO commRowCount;
SET i = 0;
WHILE i < commRowCount DO
SELECT timestamp INTO ctimestamp FROM communication c
SELECT * FROM movement m
WHERE m.vID = c.fromID && m.timestamp <= R.timestamp
END$$
DELIMITER ;
But I know that's completely wrong.
Is the only way to do this cursors? I just can't find an example of this anywhere on the internet, and I'm completely new to procedures in SQL.
Any guidance would be greatly appreciated, thank you!!
Let's see if I can point you in the right direction using cursors:
delimiter $$
create procedure findClosestTimeStamp()
begin
-- Variables to hold values from the communications table
declare cFromId int;
declare cTimeStamp datetime;
-- Variables related to cursor:
-- 1. 'done' will be used to check if all the rows in the cursor
-- have been read
-- 2. 'curComm' will be the cursor: it will fetch each row
-- 3. The 'continue' handler will update the 'done' variable
declare done int default false;
declare curComm cursor for
select fromId, timestamp from communication; -- This is the query used by the cursor.
declare continue handler for not found -- This handler will be executed if no row is found in the cursor (for example, if all rows have been read).
set done = true;
-- Open the cursor: This will put the cursor on the first row of its
-- rowset.
open curComm;
-- Begin the loop (that 'loop_comm' is a label for the loop)
loop_comm: loop
-- When you fetch a row from the cursor, the data from the current
-- row is read into the variables, and the cursor advances to the
-- next row. If there's no next row, the 'continue handler for not found'
-- will set the 'done' variable to 'TRUE'
fetch curComm into cFromId, cTimeStamp;
-- Exit the loop if you're done
if done then
leave loop_comm;
end if;
-- Execute your desired query.
-- As an example, I'm putting a SELECT statement, but it may be
-- anything.
select *
from movement as m
where m.vID = cFromId and m.timeStamp <= cTimeStamp
order by timestampdiff(SECOND, m.timeStamp, cTimeStamp) -- seconds from the movement to the call; smallest = closest
limit 1;
end loop;
-- Don't forget to close the cursor when you finish
close curComm;
end $$
delimiter ;
References:
MySQL Reference: Cursors
MySQL Reference: Date and time functions - timestampdiff()
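The row-by-row lookup that the cursor performs can also be sketched outside MySQL. The snippet below is an illustration only: it uses Python's sqlite3 module as a stand-in, integer timestamps, made-up sample rows, and a hypothetical helper name `find_closest_locations`. It picks the closest earlier movement by ordering on the timestamp in descending order and taking the first row.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption); sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE communication (fromID INTEGER, timestamp INTEGER)")
conn.execute("CREATE TABLE movement (ID INTEGER, timestamp INTEGER, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO communication VALUES (?, ?)",
    [(1, 10), (1, 25), (2, 5)],
)
conn.executemany(
    "INSERT INTO movement VALUES (?, ?, ?, ?)",
    [(1, 5, 0.0, 0.0), (1, 9, 1.0, 1.0), (1, 20, 2.0, 2.0), (2, 7, 9.0, 9.0)],
)


def find_closest_locations(conn):
    """For each call, return the (x, y) of the latest movement at or before it."""
    results = []
    # Read all calls first, then run one lookup per call (the cursor loop).
    calls = conn.execute(
        "SELECT fromID, timestamp FROM communication ORDER BY fromID, timestamp"
    ).fetchall()
    for from_id, call_time in calls:
        location = conn.execute(
            """
            SELECT x, y FROM movement
            WHERE ID = ? AND timestamp <= ?
            ORDER BY timestamp DESC
            LIMIT 1
            """,
            (from_id, call_time),
        ).fetchone()  # None when the person has no earlier movement
        results.append((from_id, call_time, location))
    return results


locations = find_closest_locations(conn)
print(locations)
```

A call with no earlier movement row yields None for its location, which is a case the stored procedure would also need to handle.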
I have a table named tbl_tmp_trans.
It contains every user transaction ever made (and it's up to 6 million rows right now!).
We have decided to keep only the last 100 transactions per user in our database so we can keep the db clean.
Here is the query that I have come up with:
delete from tbl_tmp_trans
where trans_id in
(
select trans_id
from
(
select trans_id
from tbl_faucets_transactions
order by date
group by user_id
limit 100
) foo
)
what am i doing wrong?
because after doing this my cpu reach 100% and mysql crashed.
Thanks in advance
P.S: our db is Mysql and table engine is Innodb
P.S2: We have about 120k users, and the transaction table has nearly 6 million records
I have a proposal... Hopefully, it might help you.
Alter your table:
alter table tbl_tmp_trans add column todel tinyint(1);
Implement a stored procedure to iterate through the table with a cursor and mark (set todel to 1) records that should be deleted. Example procedure to do that:
delimiter //
drop procedure if exists mark_old_transactions //
create procedure mark_old_transactions()
begin
declare done int default false;
declare tid int;
declare uid int;
declare last_uid int default 0;
declare count int default 0;
declare cur cursor for select trans_id, user_id from tbl_tmp_trans order by user_id, date desc;
declare continue handler for not found set done = true;
open cur;
repeat
fetch cur into tid, uid;
if (!done) then
if (uid!=last_uid) then
set count = 0;
end if;
set last_uid = uid;
set count = count + 1;
if (count > 100) then
update tbl_tmp_trans set todel=1 where trans_id=tid;
end if;
end if;
until done
end repeat;
close cur;
end //
delimiter ;
Invoke the procedure, maybe do some simple checks (how many transactions you delete from the table, etc.), and delete the marked records.
call mark_old_transactions;
-- select count(*) from tbl_tmp_trans where todel=1;
-- select count(*) from tbl_tmp_trans;
delete from tbl_tmp_trans where todel=1;
Finally, remove the column that we just added.
alter table tbl_tmp_trans drop column todel;
Some notes:
Probably you have to iterate through all the records of the table anyway, so you don't lose performance with the cursor.
If you have ~120K users and ~6M transactions, you have ~50 transactions per user on average. That means you probably don't have many users with more than 100 transactions, so the number of updates (hopefully) won't be large and the procedure should run relatively fast.
Delete should be fast again with the new column.
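The marking pass of the procedure (walk the rows ordered by user and newest first, count per user, flag everything past the limit) can be tried on a small data set first. This is a rough sketch under stated assumptions: Python's sqlite3 module stands in for MySQL, the sample rows are invented, the limit is 2 instead of 100, and the function name mirrors the procedure but collects ids in a list rather than setting a todel column.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption); sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_tmp_trans (trans_id INTEGER PRIMARY KEY, user_id INTEGER, date INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_tmp_trans VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4), (5, 2, 1)],
)


def mark_old_transactions(conn, keep):
    """Return the trans_ids beyond the newest `keep` rows of each user."""
    to_delete = []
    last_uid = None
    count = 0
    rows = conn.execute(
        "SELECT trans_id, user_id FROM tbl_tmp_trans ORDER BY user_id, date DESC"
    )
    for tid, uid in rows:
        if uid != last_uid:  # a new user starts: reset the counter
            count = 0
            last_uid = uid
        count += 1
        if count > keep:  # older than the newest `keep` rows
            to_delete.append(tid)
    return to_delete


marked = mark_old_transactions(conn, keep=2)
conn.executemany(
    "DELETE FROM tbl_tmp_trans WHERE trans_id = ?", [(tid,) for tid in marked]
)
remaining = conn.execute(
    "SELECT trans_id, user_id FROM tbl_tmp_trans ORDER BY trans_id"
).fetchall()
print(marked, remaining)
```

Collecting the ids before deleting keeps the read pass and the delete separate, which is the same reason the procedure marks rows first and deletes them afterwards.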
This stored procedure that I'm working on sometimes errors out. I am getting a "Result consisted of more than one row" error, but only for certain JOB_ID_INPUT values. I understand what causes this error, so I have tried to be really careful to make sure that my return values are scalar when they should be. It's tough to see into the stored procedure, so I'm not sure where the error could be generated. Since the error is thrown conditionally, it has me thinking memory could be an issue, or cursor reuse. I don't work with cursors that often, so I'm not sure. Thank you to anyone who helps.
DROP PROCEDURE IF EXISTS export_job_candidates;
DELIMITER $$
CREATE PROCEDURE export_job_candidates (IN JOB_ID_INPUT INT(11))
BEGIN
DECLARE candidate_count INT(11) DEFAULT 0;
DECLARE candidate_id INT(11) DEFAULT 0;
# these are the ib variables
DECLARE _overall_score DECIMAL(5, 2) DEFAULT 0.0;
# declare the cursor that will be needed for this SP
DECLARE curs CURSOR FOR SELECT user_id FROM job_application WHERE job_id = JOB_ID_INPUT;
# this table stores all of the data that will be returned from the various tables that will be joined together to build the final export
CREATE TEMPORARY TABLE IF NOT EXISTS candidate_stats_temp_table (
overall_score_ib DECIMAL(5, 2) DEFAULT 0.0
) engine = memory;
SELECT COUNT(job_application.id) INTO candidate_count FROM job_application WHERE job_id = JOB_ID_INPUT;
OPEN curs;
# loop controlling the insert of data into the temp table that is retuned by this function
insert_loop: LOOP
# end the loop if there is no more computation that needs to be done
IF candidate_count = 0 THEN
LEAVE insert_loop;
END IF;
FETCH curs INTO candidate_id;
# get the ib data that may exist for this user
SELECT
tests.overall_score
INTO
_overall_score
FROM
tests
WHERE
user_id = candidate_id;
#build the insert for the table that is being constructed via this loop
INSERT INTO candidate_stats_temp_table (
overall_score
) VALUES (
_overall_score
);
SET candidate_count = candidate_count - 1;
END LOOP;
CLOSE curs;
SELECT * FROM candidate_stats_temp_table WHERE 1;
END $$
DELIMITER ;
The WHERE 1 (as pointed out by @cdonner) definitely doesn't look right, but I'm pretty sure this error is happening because one of your SELECT ... INTO commands is returning more than one row.
This one should be OK because it's an aggregate without a GROUP BY, which always returns one row:
SELECT COUNT(job_application.id) INTO candidate_count
FROM job_application WHERE job_id = JOB_ID_INPUT;
So it's probably this one:
# get the ib data that may exist for this user
SELECT
tests.overall_score
INTO
_overall_score
FROM
tests
WHERE
user_id = candidate_id;
Try to figure out if it's possible for this query to return more than one row, and if so, how do you work around it. One way might be to MAX the overall score:
SELECT MAX(tests.overall_score) INTO _overall_score
FROM tests
WHERE user_id = candidate_id;
I think you want to use
LIMIT 1
in your select, not
WHERE 1
Aside from using this safety net, you should understand your data to figure out why you are getting multiple results. Without seeing the data, it is difficult for me to take a guess.
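One way to check whether the per-user lookup can return several rows is to look for user_id values that occur more than once in tests. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for MySQL, invented scores, and hypothetical helper names (`score_rows`, `max_score`). It also shows that the MAX variant always returns a single row.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption); scores are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (user_id INTEGER, overall_score REAL)")
conn.executemany(
    "INSERT INTO tests VALUES (?, ?)",
    [(7, 61.5), (7, 88.25), (8, 70.0)],
)


def score_rows(conn, candidate_id):
    """The plain lookup: may return zero, one, or several rows."""
    return conn.execute(
        "SELECT overall_score FROM tests WHERE user_id = ?", (candidate_id,)
    ).fetchall()


def max_score(conn, candidate_id):
    """The aggregate lookup: always exactly one row (None if no test exists)."""
    return conn.execute(
        "SELECT MAX(overall_score) FROM tests WHERE user_id = ?", (candidate_id,)
    ).fetchone()[0]


# Users whose lookup would make SELECT ... INTO fail with "more than one row".
duplicates = conn.execute(
    "SELECT user_id, COUNT(*) FROM tests GROUP BY user_id HAVING COUNT(*) > 1"
).fetchall()

print(duplicates, len(score_rows(conn, 7)), max_score(conn, 7), max_score(conn, 9))
```

Running the duplicate check against the real data for a failing JOB_ID_INPUT would show which candidates have more than one tests row.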
I am trying to combine these two queries in twisted python:
SELECT * FROM table WHERE group_id = 1013 and time > 100;
and:
UPDATE table SET time = 0 WHERE group_id = 1013 and time > 100
into a single query. Is it possible to do so?
I tried putting the SELECT in a sub query, but I don't think the whole query returns me what I want.
Is there a way to do this? (even better, without a sub query)
Or do I just have to stick with two queries?
Thank You,
Quan
Apparently mysql does have something that might be of use, especially if you are only updating one row.
This example is from: http://lists.mysql.com/mysql/219882
UPDATE mytable SET
mycolumn = @mycolumn := mycolumn + 1
WHERE mykey = 'dante';
SELECT @mycolumn;
I've never tried this though, but do let me know how you get on.
This is really late to the party, but I had this same problem, and the solution I found most helpful was the following:
SET @uids := null;
UPDATE footable
SET foo = 'bar'
WHERE fooid > 5
AND ( SELECT @uids := CONCAT_WS(',', fooid, @uids) );
SELECT @uids;
from https://gist.github.com/PieterScheffers/189cad9510d304118c33135965e9cddb
You can't combine these queries directly. But you can write a stored procedure that executes both queries. Example:
delimiter |
create procedure upd_select(IN p_group_id INT, IN p_time INT)
begin
-- Return the matching rows first: once updated, they no longer satisfy time > p_time
SELECT * FROM `table` WHERE group_id = p_group_id AND time > p_time;
UPDATE `table` SET time = 0 WHERE group_id = p_group_id AND time > p_time;
end|
delimiter ;
So what you're trying to do is reset time to zero whenever you access a row -- sort of like a trigger, but MySQL cannot do triggers after SELECT.
Probably the best way to do it with one server request from the app is to write a stored procedure that updates and then returns the row. If it's very important to have the two occur together, wrap the two statements in a transaction.
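The "select, then update, inside one transaction" idea can be sketched as follows. This is not the MySQL procedure itself: it uses Python's sqlite3 module as a stand-in, a hypothetical table name `items` (the question's table is literally called `table`), invented rows, and a hypothetical helper `select_and_reset`.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the transaction
# below is controlled explicitly with BEGIN / COMMIT / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, group_id INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO items (group_id, time) VALUES (?, ?)",
    [(1013, 150), (1013, 50), (1013, 200), (7, 300)],
)


def select_and_reset(conn, group_id, min_time):
    """Return the matching rows, then reset their time, as one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute(
            "SELECT id, group_id, time FROM items "
            "WHERE group_id = ? AND time > ? ORDER BY id",
            (group_id, min_time),
        ).fetchall()
        conn.execute(
            "UPDATE items SET time = 0 WHERE group_id = ? AND time > ?",
            (group_id, min_time),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rows


selected = select_and_reset(conn, 1013, 100)
after = conn.execute("SELECT time FROM items ORDER BY id").fetchall()
print(selected, after)
```

The rows are read before the update because the update changes the very column the filter uses; reading afterwards would return nothing for a non-negative threshold.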
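There is a faster way to return the updated rows, and one that behaves more correctly on a heavily loaded system where the same query is executed concurrently against the same database server. Note that neither form is MySQL syntax: the first is SQL Server (table hints plus an OUTPUT clause), and the second uses a RETURNING clause, as in PostgreSQL.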
update table_name WITH (UPDLOCK, READPAST)
SET state = 1
OUTPUT inserted.
UPDATE tab SET column=value RETURNING column1,column2...
I'm trying to loop over the selected slugs and execute a slightly complicated INSERT INTO ... SELECT query.
Using slugs[iteration] is not valid MySQL syntax, but I have to access the fetched slugs one by one inside the query. How can I achieve that?
DELIMITER $$
CREATE PROCEDURE create_sitemap_from_slugs()
BEGIN
SELECT `slug` INTO slugs FROM slug_table;
SELECT COUNT(*) INTO count FROM slug_table;
SET iteration = 0;
START TRANSACTION;
WHILE iteration < count DO
INSERT INTO line_combinations
SELECT REPLACE(`line`, '{a}', slugs[iteration]) AS `line`
FROM line_combinations
WHERE `line` LIKE CONCAT('%/', '{a}', '%');
SET iteration = iteration + 1;
END WHILE;
COMMIT;
END
$$
DELIMITER ;
Btw, I don't want to use any external programming language for this; the procedure will be working on billions of rows. I have read that loops in SQL are not a good approach because of performance concerns.
If you suggest another way I would accept this also.
I asked another detailed question but couldn't get an answer. if you would like to check that also : https://stackoverflow.com/questions/35320494/fetch-placeholders-from-table-and-place-into-generated-line-combination-pattern
So for each line with {a} you need to insert COUNT(*) from slug_table times values filled with slug value.
It seems you can do that just in one INSERT from SELECT
INSERT INTO line_combinations
SELECT REPLACE(lc.line, '{a}', st.slug) AS `line`
FROM line_combinations lc, slug_table st
WHERE lc.line LIKE CONCAT('%/', '{a}', '%');
UPDATE:
You can create a temp table line_combinations2 and insert all the records
FROM line_combinations
WHERE line LIKE CONCAT('%/', '{a}', '%')
into the temp table. Then just use the temp table in the INSERT instead of original one
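The temp-table variant can be tried end to end on a tiny data set. The following is a sketch under assumptions: Python's sqlite3 module stands in for MySQL, the two sample lines and two slugs are invented, and the temp table is created with CREATE TEMPORARY TABLE ... AS SELECT for brevity.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption); sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_combinations (line TEXT)")
conn.execute("CREATE TABLE slug_table (slug TEXT)")
conn.executemany(
    "INSERT INTO line_combinations VALUES (?)", [("/{a}/x",), ("/static",)]
)
conn.executemany("INSERT INTO slug_table VALUES (?)", [("foo",), ("bar",)])

# Step 1: copy the lines that contain the placeholder into a temp table.
conn.execute(
    """
    CREATE TEMPORARY TABLE line_combinations2 AS
    SELECT line FROM line_combinations WHERE line LIKE '%/{a}%'
    """
)

# Step 2: one set-based INSERT; the cross join pairs every placeholder line
# with every slug, which replaces the per-slug loop.
conn.execute(
    """
    INSERT INTO line_combinations (line)
    SELECT REPLACE(lc.line, '{a}', st.slug)
    FROM line_combinations2 AS lc CROSS JOIN slug_table AS st
    """
)

lines = sorted(row[0] for row in conn.execute("SELECT line FROM line_combinations"))
print(lines)
# ['/bar/x', '/foo/x', '/static', '/{a}/x']
```

Reading from the temp table means the INSERT never selects from the table it is writing to, which is the point of the suggestion above.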