Update a list of things without hitting every entry - mysql

I have a list in a database that the user should be able to order.
itemname| order value (int)
--------+---------------------
salad | 1
mango | 2
orange | 3
apples | 4
On load from the database, I simply order by order_value.
Using drag 'n drop, the user should be able to move apples so that it appears at the top of the list:
itemname| order value (int)
--------+---------------------
apples | 4
salad | 1
mango | 2
orange | 3
Ok. So now internally I have to update EVERY LIST ITEM! If the list has 20 or 100 items, that's a lot of updates for a simple drag operation.
itemname| order value (int)
--------+---------------------
apples | 1
salad | 2
mango | 3
orange | 4
I'd rather do it with only one update. One way I thought of is if "internal Order" is a double value.
itemname| order value (double)
--------+---------------------
salad | 1.0
mango | 2.0
orange | 3.0
apples | 4.0
So after the drag 'n drop operation, I assign apples a value that is less than that of the item it is to appear in front of:
itemname| order value (double)
--------+---------------------
apples | 0.5
salad | 1.0
mango | 2.0
orange | 3.0
.. and if an item is dragged into the middle somewhere, its order_value is bigger than the one it appears after .. here I moved orange to be between salad and mango:
itemname| order value (double)
--------+---------------------
apples | 0.5
salad | 1.0
orange | 1.5
mango | 2.0
Any thoughts on better ways to do this?
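A minimal sketch of the fractional-order idea described above, in Python for illustration only. The helper name `position_between` and the choice to step by 1.0 at either end of the list are assumptions, not something stated in the question.

```python
def position_between(before, after):
    """Return an order value that sorts between two neighbours.

    `before` and `after` are the order values of the rows the dragged item
    lands between; None means there is no neighbour on that side.
    """
    if before is None and after is None:
        return 1.0                     # first item in an empty list
    if before is None:
        return after - 1.0             # dropped at the top of the list
    if after is None:
        return before + 1.0            # dropped at the bottom of the list
    return (before + after) / 2.0      # dropped between two existing rows


# orange dragged between salad (1.0) and mango (2.0)
new_value = position_between(1.0, 2.0)
```

Only the moved row needs an UPDATE with this scheme. Doubles have finite precision, so repeatedly splitting the same gap eventually yields values that can no longer be told apart; an occasional renumbering pass would still be needed.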

Suppose @old is the value 4 for the old position of apples, and @new is the new position 1.
set @old = 4;
set @new = 1;
UPDATE Items
SET `order value` =
CASE `order value` WHEN @old THEN @new
ELSE `order value` + SIGN(@old-@new) END
WHERE `order value` BETWEEN LEAST(@old, @new) AND GREATEST(@old, @new);
I used MySQL 5.1.52 to test this on your example data and it works. The identical SQL also works if you need to move an early entry to be later, or move one in the middle, etc. Just set the values of @old and @new.
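For readers who want to try the shift-in-one-statement idea without a MySQL server, here is a rough sketch using Python's built-in `sqlite3` module. SQLite stands in for MySQL here, and the table name `items`, the column name `order_value`, and the helper `move_item` are assumptions; the sign and the range bounds are computed in Python rather than with SIGN, LEAST and GREATEST.

```python
import sqlite3


def move_item(conn, old, new):
    """Move the row at position `old` to `new`, shifting the rows in between by one."""
    if old == new:
        return
    step = 1 if old > new else -1          # plays the role of SIGN(old - new)
    lo, hi = min(old, new), max(old, new)  # plays the role of LEAST / GREATEST
    conn.execute(
        "UPDATE items SET order_value = "
        "CASE order_value WHEN ? THEN ? ELSE order_value + ? END "
        "WHERE order_value BETWEEN ? AND ?",
        (old, new, step, lo, hi),
    )


def current_order(conn):
    rows = conn.execute("SELECT itemname FROM items ORDER BY order_value")
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemname TEXT PRIMARY KEY, order_value INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("salad", 1), ("mango", 2), ("orange", 3), ("apples", 4)],
)

move_item(conn, 4, 1)  # drag apples from position 4 to position 1
order = current_order(conn)
```

Rows outside the range between the old and new positions are never touched, so the cost depends on how far the item moves rather than on the length of the list.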

I ended up using an adjacencies table. I didn't know about it at the time.

I'm not sure if this counts as a solution, but you don't literally need to do one update for every row. If you move 'foo' from position 4 to position 1, you just do
UPDATE table SET position = 1 WHERE itemname = 'foo'
UPDATE table SET position = position + 1 WHERE itemname != 'foo' AND position < 4
It's the same number of UPDATE statements even if you're moving from position 1000 to 500, or from 500 to 1000 (although you'll need to flip the direction, naturally). You just need to mass-shift all the affected rows by plus or minus one.

You can do it in a single Update statement like so:
Update Table
Set OrderValue = Case
When Table.ItemName = 'apples' Then 0
Else (
Select Count(*)
From Table As T1
Where T1.ItemName <> 'apples'
And T1.OrderValue < Table.OrderValue
) + 1
End + 1
You would obviously replace apples with the selected value. However, I would think that this type of sorting would best done in the client application rather than in the database.

If you were using SQL Server you could do this using a linked-list representation and CTEs. I don't know whether MySQL supports CTEs though...
SET NOCOUNT ON
GO
DROP TABLE [Item]
GO
CREATE TABLE [Item]
(
[ItemId] int NOT NULL PRIMARY KEY,
[Name] varchar(100) NOT NULL,
[PreviousId] int NULL
)
GO
INSERT [Item] VALUES (6, 'apples', 3)
INSERT [Item] VALUES (3, 'orange', 36)
INSERT [Item] VALUES (9, 'mango', 100)
INSERT [Item] VALUES (100, 'salad', NULL)
INSERT [Item] VALUES (36, 'banana', 9)
GO
;WITH
[LinkedItem] AS
(
SELECT
[Item].*,
1 AS [OrderValue]
FROM [Item]
WHERE [Item].[PreviousId] IS NULL
UNION ALL
SELECT
[Item].*,
[LinkedItem].[OrderValue] + 1
FROM [Item]
INNER JOIN [LinkedItem] ON [LinkedItem].[ItemId] = [Item].[PreviousId]
)
SELECT *
FROM [LinkedItem]
ORDER BY
[LinkedItem].[OrderValue]
-- Drag orange up two spaces
DECLARE @MovingItemId int
DECLARE @NewPreviousId int
SET @MovingItemId = 3
SET @NewPreviousId = 100
DECLARE @OldPreviousId int
SELECT @OldPreviousId = [PreviousId] FROM [Item] WHERE [ItemId] = @MovingItemId
UPDATE [Item] SET [PreviousId] = @OldPreviousId WHERE [PreviousId] = @MovingItemId
UPDATE [Item] SET [PreviousId] = @MovingItemId WHERE [PreviousId] = @NewPreviousId
UPDATE [Item] SET [PreviousId] = @NewPreviousId WHERE [ItemId] = @MovingItemId
This produces the following results, before and after the move:
Before:
100 salad NULL 1
9 mango 100 2
36 banana 9 3
3 orange 36 4
6 apples 3 5
After:
100 salad NULL 1
3 orange 100 2
9 mango 3 3
36 banana 9 4
6 apples 36 5
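The same linked-list move can be tried out with Python's built-in `sqlite3` module, which supports recursive CTEs. This is only a sketch: SQLite stands in for SQL Server, and the snake_case table and column names and the helpers `ordered_names` and `move_after` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL, previous_id INTEGER)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(6, "apples", 3), (3, "orange", 36), (9, "mango", 100),
     (100, "salad", None), (36, "banana", 9)],
)


def ordered_names(conn):
    """Walk the list from the row with no predecessor, numbering rows as we go."""
    rows = conn.execute("""
        WITH RECURSIVE linked(item_id, name, pos) AS (
            SELECT item_id, name, 1 FROM item WHERE previous_id IS NULL
            UNION ALL
            SELECT i.item_id, i.name, l.pos + 1
            FROM item AS i JOIN linked AS l ON l.item_id = i.previous_id
        )
        SELECT name FROM linked ORDER BY pos
    """)
    return [r[0] for r in rows]


def move_after(conn, moving_id, new_previous_id):
    """Unlink `moving_id` and re-insert it directly after `new_previous_id`."""
    (old_previous_id,) = conn.execute(
        "SELECT previous_id FROM item WHERE item_id = ?", (moving_id,)
    ).fetchone()
    # 1. Close the gap the moving item leaves behind.
    conn.execute("UPDATE item SET previous_id = ? WHERE previous_id = ?",
                 (old_previous_id, moving_id))
    # 2. The old follower of the target now follows the moving item.
    conn.execute("UPDATE item SET previous_id = ? WHERE previous_id = ?",
                 (moving_id, new_previous_id))
    # 3. The moving item follows the target.
    conn.execute("UPDATE item SET previous_id = ? WHERE item_id = ?",
                 (new_previous_id, moving_id))


before = ordered_names(conn)
move_after(conn, 3, 100)  # drag orange up so that it follows salad
after = ordered_names(conn)
```

However long the list is, a move touches at most three rows; the price is that reading the list in order needs the recursive query.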

I suppose you have a primary key on your table, an id column.
These two statements should do.
update table set order_value=0 where itemname='apples';
update
(select @num := 0 )vars
straight_join
(select id, @num := @num+1 as ord_value
from table
order by order_value
)big
inner join table t on t.id = big.id
set t.order_value = big.ord_value;
If you don't have an id, use itemname instead.

As has been suggested previously, and unless you have to show to all the users the current order that a given user is affecting, I would suggest that you treat this in the client first (there are many ways of solving this), and then, based on a user action (pressing the "I am done" button, for example) you update the rows in the database with the final order from the structure that you have chosen to store in the client.
You can make the code in the client as complex as you want to try to minimize the number of rows that need to be updated in the database: in some cases, you may only need to insert one row (if the user inserts a new item at the end of the list); in many cases you may need to update two rows (if the user just swaps two consecutive items). The worst scenario, in terms of the number of rows that need to be updated, is all the rows (you can devise an algorithm that will detect only the rows that need to be updated and update just those). The choice is yours whether it is worth doing this or just issuing an update of all the rows.
The bottom line is that you do not need to update all the rows in the database, that situation is just one of the many possible scenarios. Some databases allow update in bulk (the name may vary from one database to another) and this will not be very expensive.
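One possible way to detect only the rows that need updating is sketched below in Python. It is illustrative rather than a recommended implementation; the function name `changed_positions` and the use of 1-based positions are assumptions.

```python
def changed_positions(old_order, new_order):
    """Return {item: new_position} for items whose 1-based position changed.

    Items that are new to the list are reported as well, since they have
    no old position to compare against.
    """
    old_pos = {name: i for i, name in enumerate(old_order, start=1)}
    return {
        name: i
        for i, name in enumerate(new_order, start=1)
        if old_pos.get(name) != i
    }


old = ["salad", "mango", "orange", "apples"]
swapped = ["salad", "orange", "mango", "apples"]      # two neighbours swapped
to_front = ["apples", "salad", "mango", "orange"]     # last item dragged to the top

swap_updates = changed_positions(old, swapped)
front_updates = changed_positions(old, to_front)
```

Swapping two neighbours yields two rows to update, while dragging the last item to the top changes every position, which is the worst case mentioned above.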

If the column simply specifies the order of the rows, I don't see anything wrong with using 0 and negatives.

Related

MySQL: select random individual from available to populate new table

I am trying to automate the production of a roster based on leave dates and working preferences. I have generated some data to work with and I now have two tables - one with a list of individuals and their preferences for working on particular days of the week(e.g. some prefer to work on a Tuesday, others only every other Wednesday, etc), and another with leave dates for individuals. That looks like this, where firstpref and secondpref represent weekdays with Mon = 1, Sun = 7 and firstprefclw represents a marker for which week of a 2 week pattern someone prefers (0 = no pref, 1 = wk 1 preferred, 2 = wk2 preferred)
initials | firstpref | firstprefclw | secondpref | secondprefclw
KP | 3 | 0 | 1 | 0
BD | 2 | 1 | 1 | 0
LW | 3 | 0 | 4 | 1
Then there is a table leave_entries which basically has the initials, a start date, and an end date for each leave request.
Finally, there is a pre-calculated clwdates table which contains a marker (a 1 or 2) for each day in one of its columns as to what week of the roster pattern it is.
I have run this query:
SELECT @tdate, DATE_FORMAT(@tdate,'%W') AS whatDay, GROUP_CONCAT(t1.initials separator ',') AS available
FROM people AS t1
WHERE ((t1.firstpref = (DAYOFWEEK(@tdate))-1
AND (t1.firstprefclw = 0 OR (t1.firstprefclw = (SELECT c_dates.clw from clwdates AS c_dates LIMIT i,1))))
OR (t1.secondpref = (DAYOFWEEK(@tdate))-1
AND (t1.secondprefclw = 0 OR (t1.secondprefclw = (SELECT c_dates.clw from clwdates AS c_dates LIMIT i,1))))
OR ((DAYOFWEEK(@tdate))-1 IN (0,5,6))
AND t1.initials NOT IN (SELECT initials FROM leave_entries WHERE @tdate BETWEEN leave_entries.start_date and leave_entries.end_date)
);
My output from that is a list of dates with initials of the pattern:
2018-01-03;Wednesday;KP,LW,TH
My desired output is
2018-01-03;Wednesday;KP
Where the initials of the person have been randomly selected from the list of available people generated by the first set of SELECTs.
I have seen a SO post where a suggestion of how to do this has been made involving SUBSTRING_INDEX (How to select Random Sub string,which seperated by coma(",") From a string), however I note the comment that CSV is not the way to go, and since I have a table which is not CSV, I am wondering:
How can I randomly select an individual's initials from the available ones and create a table which is basically date ; random_person?
So I figured out how to do it.
The first select (outlined above) forms the heart of a PROCEDURE called ROWPERROW() and generates a table called available_people
This is probably filthy MySQL code, but it works:
SET @tdate = 0;
DROP TABLE IF EXISTS working;
CREATE TABLE working(tdate DATE, whatDay VARCHAR(20), selected VARCHAR(255));
DELIMITER //
DROP PROCEDURE IF EXISTS ROWPERROW2;
CREATE PROCEDURE ROWPERROW2()
BEGIN
DECLARE n INT DEFAULT 0;
DECLARE kk INT DEFAULT 0;
SET n=90; -- or however many days the roster is going to run for
SET kk=0;
WHILE kk<n DO
SET @tdate = (SELECT c_dates.fulldate from clwdates AS c_dates LIMIT kk,1);
INSERT INTO working
SELECT @tdate, DATE_FORMAT(@tdate,'%W') AS whatDay, t1.available
FROM available_people AS t1 -- this is the table created by the first query above
WHERE tdate = @tdate ORDER BY RAND() LIMIT 1;
SET kk = kk + 1;
END WHILE;
end;
//
DELIMITER ;
CALL ROWPERROW2();
SELECT * from working;
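The per-date random pick can also be sketched outside a stored procedure. The Python example below uses the built-in `sqlite3` module in place of MySQL and assumes `available_people` holds one row per date and person (not a comma-separated list). The rows for 2018-01-04 are made-up sample data, and `pick_roster` is a hypothetical helper.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE available_people (tdate TEXT, initials TEXT)")
conn.executemany(
    "INSERT INTO available_people VALUES (?, ?)",
    [("2018-01-03", "KP"), ("2018-01-03", "LW"), ("2018-01-03", "TH"),
     ("2018-01-04", "BD"), ("2018-01-04", "KP")],  # second date is invented
)


def pick_roster(conn, rng):
    """Return {date: initials}, choosing one available person per date at random."""
    roster = {}
    dates = [r[0] for r in conn.execute(
        "SELECT DISTINCT tdate FROM available_people ORDER BY tdate")]
    for tdate in dates:
        candidates = [r[0] for r in conn.execute(
            "SELECT initials FROM available_people WHERE tdate = ? ORDER BY initials",
            (tdate,))]
        roster[tdate] = rng.choice(candidates)
    return roster


roster = pick_roster(conn, random.Random())
```

Passing a seeded `random.Random(...)` instead would make the roster reproducible, which can help when checking the result by hand.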

select one row multiple time when using IN()

I have this query :
select
name
from
provinces
WHERE
province_id IN(1,3,2,1)
ORDER BY FIELD(province_id, 1,3,2,1)
The number of values in IN() is dynamic.
How can I get all rows, including duplicates (in this example, 1), in the order given by ORDER BY?
the result should be like this :
name1
name3
name2
name1
plus I shouldn't use UNION ALL :
select * from provinces WHERE province_id=1
UNION ALL
select * from provinces WHERE province_id=3
UNION ALL
select * from provinces WHERE province_id=2
UNION ALL
select * from provinces WHERE province_id=1
You need a helper table here. On SQL Server that can be something like:
SELECT name
FROM (Values (1),(3),(2),(1)) As list (id) --< List of values to join to as a table
INNER JOIN provinces ON province_id = list.id
Update: In MySQL Split Comma Separated String Into Temp Table can be used to split string parameter into a helper table.
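The helper-table idea can be sketched with Python's built-in `sqlite3` module. SQLite stands in for the real database here, and the temporary table `wanted`, its `pos` column and the function `names_in_order` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provinces (province_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO provinces VALUES (?, ?)",
    [(1, "name1"), (2, "name2"), (3, "name3")],
)


def names_in_order(conn, ids):
    """Return one name per requested id, keeping duplicates and the given order."""
    # One helper row per requested id; pos remembers where it sat in the list.
    conn.execute("CREATE TEMP TABLE wanted (pos INTEGER PRIMARY KEY, id INTEGER)")
    conn.executemany("INSERT INTO wanted VALUES (?, ?)", list(enumerate(ids)))
    rows = conn.execute(
        "SELECT p.name FROM wanted AS w "
        "JOIN provinces AS p ON p.province_id = w.id "
        "ORDER BY w.pos"
    ).fetchall()
    conn.execute("DROP TABLE wanted")
    return [r[0] for r in rows]


names = names_in_order(conn, [1, 3, 2, 1])
```

Because the join runs from the helper rows to `provinces`, an id that appears twice in the list produces two result rows, which a plain IN() cannot do.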
To get the same row more than once you need to join in another table. I suggest to create, only once(!), a helper table. This table will just contain a series of natural numbers (1, 2, 3, 4, ... etc). Such a table can be useful for many other purposes.
Here is the script to create it:
create table seq (num int);
insert into seq values (1),(2),(3),(4),(5),(6),(7),(8);
insert into seq select num+8 from seq;
insert into seq select num+16 from seq;
insert into seq select num+32 from seq;
insert into seq select num+64 from seq;
/* continue doubling the number of records until you feel you have enough */
For the task at hand it is not necessary to add many records, as you only need to make sure you never have more repetitions in your in condition than in the above seq table. I guess 128 will be good enough, but feel free to double the number of records a few times more.
Once you have the above, you can write queries like this:
select province_id,
name,
@pos := instr(@in2 := insert(@in2, @pos+1, 1, '#'),
concat(',',province_id,',')) ord
from (select @in := '0,1,2,3,1,0', @in2 := @in, @pos := 10000) init
inner join provinces
on find_in_set(province_id, @in)
inner join seq
on num <= length(replace(@in, concat(',',province_id,','),
concat(',+',province_id,',')))-length(@in)
order by ord asc
Output for the sample data and sample in list:
| province_id | name | ord |
|-------------|--------|-----|
| 1 | name 1 | 2 |
| 2 | name 2 | 4 |
| 3 | name 3 | 6 |
| 1 | name 1 | 8 |
How it works
You need to put the list of values in the assignment to the variable @in. For it to work, every valid id must be wrapped between commas, so that is why there is a dummy zero at the start and the end.
By joining in the seq table the result set can grow. The number of records joined in from seq for a particular provinces record is equal to the number of occurrences of the corresponding province_id in the list @in.
There is no out-of-the-box function to count the number of such occurrences, so the expression at the right of num <= may look a bit complex. But it just adds a character for every match in @in and checks how much the length grows by that action. That growth is the number of occurrences.
In the select clause the position of the province_id in the @in list is returned and used to order the result set, so it corresponds to the order in the @in list. In fact, the position is taken with reference to @in2, which is a copy of @in, but is allowed to change:
While this @pos is being calculated, the number at the previous found @pos in @in2 is destroyed with a # character, so the same province_id cannot be found again at the same position.
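The occurrence-counting trick (add one character per match, then measure how much the string grew) is easier to see in isolation. A small Python sketch, where the function name `count_occurrences` is an assumption:

```python
def count_occurrences(in_list, province_id):
    """Count how often ',<id>,' occurs in in_list by measuring length growth."""
    needle = f",{province_id},"
    # Every match gains exactly one '+' character...
    grown = in_list.replace(needle, f",+{province_id},")
    # ...so the growth in length equals the number of matches.
    return len(grown) - len(in_list)


in_list = "0,1,2,3,1,0"   # dummy zeros wrap the real ids in commas
ones = count_occurrences(in_list, 1)
twos = count_occurrences(in_list, 2)
```

Note that matches are found without overlap, so two equal ids sitting directly next to each other (as in `0,1,1,0`) share a comma and would be counted once by this sketch.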
It's unclear exactly what you are wanting, but here's why it's not working the way you want. The IN keyword is shorthand for creating a statement like ....Where province_id = 1 OR province_id = 2 OR province_id = 3 OR province_id = 1. Since province_id = 1 is evaluated as true at the beginning of that statement, it doesn't matter that it is included again later; it is already true. Repeating a value in the list therefore has no bearing on whether the result returns a duplicate.

Efficient way to remove successive duplicate rows in MySQL

I have a table with columns like (PROPERTY_ID, GPSTIME, STATION_ID, PROPERTY_TYPE, VALUE) where PROPERTY_ID is primary key and STATION_ID is foreign key.
This table records state changes; each row represents the value of a property of some station at a given time. However, its data was converted from an old table where each property was a column (like (STATION_ID, GPSTIME, PROPERTY1, PROPERTY2, PROPERTY3, ...)). Because usually only one property changed at a time, I have lots of duplicates.
I need to remove all successive rows with same values.
Example. Old table contained values like
time stn prop1 prop2
100 7 red large
101 7 red small
102 7 blue small
103 7 red small
The converted table is
(order by time,type)      (order by type,time)
time stn type value       time stn type value
100  7   1    red         100  7   1    red
100  7   2    large       101  7   1    red
101  7   1    red         102  7   1    blue
101  7   2    small       103  7   1    red
102  7   1    blue        100  7   2    large
102  7   2    small       101  7   2    small
103  7   1    red         102  7   2    small
103  7   2    small       103  7   2    small
should be changed to
time stn type value
100 7 1 red
100 7 2 large
101 7 2 small
102 7 1 blue
103 7 1 red
The table contains about 22 mln rows.
My current approach is to use procedure to iterate over the table and remove duplicates:
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE id INT;
DECLARE psid,nsid INT DEFAULT null;
DECLARE ptype,ntype INT DEFAULT null;
DECLARE pvalue,nvalue VARCHAR(50) DEFAULT null;
DECLARE cur CURSOR FOR
SELECT station_property_id,station_id,property_type,value
FROM station_property
ORDER BY station_id,property_type,gpstime;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur;
read_loop: LOOP
FETCH cur INTO id,nsid,ntype,nvalue;
IF done THEN
LEAVE read_loop;
END IF;
IF (psid = nsid and ptype = ntype and pvalue = nvalue) THEN
delete from station_property where station_property_id=id;
END IF;
SET psid = nsid;
SET ptype = ntype;
SET pvalue = nvalue;
END LOOP;
CLOSE cur;
END
However, it is too slow. On a test table with 20000 rows it takes 6 minutes to remove 10000 duplicates. Is there a way to optimize the procedure?
P.S. I still have my old table intact, so maybe it is better to try and convert it without the duplicates rather than dealing with duplicates after conversion.
UPDATE.
To clarify which duplicates I want to allow and which not.
If a property changes, then changes back, I want all 3 records to be saved, even though first and the last contains same station_id, type, and value.
If there are several successive (by GPSTIME) records with same station_id, type, and value, I want only the first one (which represents the change to that value) to be saved.
In short, a -> b -> b -> a -> a should be optimized to a -> b -> a.
SOLUTION
As @Kickstart suggested, I've created a new table, populated with filtered data. To refer to previous rows, I've used an approach similar to the one used in this question.
rename table station_property to station_property_old;
create table station_property like station_property_old;
set @lastsid=-1;
set @lasttype=-1;
set @lastvalue='';
INSERT INTO station_property(station_id,gpstime,property_type,value)
select newsid as station_id,gpstime,newtype as type,newvalue as value from
-- this subquery adds columns with previous values
(select station_property_id,gpstime,@lastsid as lastsid,@lastsid:=station_id as newsid,
@lasttype as lasttype,@lasttype:=property_type as newtype,
@lastvalue as lastvalue,@lastvalue:=value as newvalue
from station_property_old
order by newsid,newtype,gpstime) sub
-- we filter the data, removing unnecessary duplicates
where lastvalue != newvalue or lastsid != newsid or lasttype != newtype;
drop table station_property_old;
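The filtering rule used above (keep a row only when station, type or value differs from the previous row in station/type/time order) can be checked against the sample data from the question. This is a plain Python sketch, not the SQL that was actually run; the function name `drop_successive_duplicates` is an assumption.

```python
def drop_successive_duplicates(rows):
    """rows are (time, station, type, value) tuples.

    Rows are walked in (station, type, time) order and a row is kept only
    if (station, type, value) differs from the row just before it.
    """
    kept = []
    previous = None
    for time, station, ptype, value in sorted(rows, key=lambda r: (r[1], r[2], r[0])):
        current = (station, ptype, value)
        if current != previous:
            kept.append((time, station, ptype, value))
        previous = current
    return sorted(kept)  # back to (time, station, type) order for display


rows = [
    (100, 7, 1, "red"), (100, 7, 2, "large"),
    (101, 7, 1, "red"), (101, 7, 2, "small"),
    (102, 7, 1, "blue"), (102, 7, 2, "small"),
    (103, 7, 1, "red"), (103, 7, 2, "small"),
]
result = drop_successive_duplicates(rows)
```

A value that changes away and back (red, blue, red) keeps all three rows, because only the immediately preceding row is compared.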
Possibly create a new table, populated with a select from the existing table using a GROUP BY. Something like this (not tested so excuse any typos):-
INSERT INTO station_property_new
SELECT station_property_id, station_id, property_type, value
FROM (SELECT station_property_id, station_id, property_type, value, COUNT(*) FROM station_property GROUP BY station_property_id, station_id, property_type, value) Sub1
Regarding changing properties, can't you put a unique constraint on the table to ensure the combination of station/type/value columns is unique? That way you will not be able to change a property to a value which would result in a duplicate.

Simple MySQL Query - Change table format around

I'm fairly sure this is a fairly easy answer but the answer is completely slipping my mind.
I have a database table that is currently formatted like:
event_id | elem_id | value
1 1 Value 1
1 2 Value 2
2 1 Value 3
2 2 Value 4
Both event_id and elem_id are undetermined numbers and have infinite possibilities.
How would I query it for example based on event_id 1 to get the data to be formatted as such:
event_id | 1 | 2
1 Value 1 Value 2
Knowing that elem_id is a number >= n so potentially there could be 50 elem_id yet I still need the data in that format.
Like I said I can't for the life of me figure out the query to assemble it that way. Any help would be GREATLY appreciated.
Try following:
SELECT
`event_id`,
(SELECT t2.`value` FROM `table` t2 WHERE t2.`event_id` = t1.`event_id` AND t2.`elem_id` = 1),
(SELECT t3.`value` FROM `table` t3 WHERE t3.`event_id` = t1.`event_id` AND t3.`elem_id` = 2)
FROM `table` t1 GROUP BY `event_id`;
Also you can use different way, and get elem_ids and values in comma-separated format in two cells
SELECT `event_id`, GROUP_CONCAT(`elem_id`), GROUP_CONCAT(`value`) FROM `table` GROUP BY `event_id`;
and you can change separator with following syntax: GROUP_CONCAT(field SEPARATOR '::')
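Since the number of distinct elem_id values is not known in advance, another option is to fetch the rows as they are stored and pivot them in client code. A minimal Python sketch, assuming the rows arrive as (event_id, elem_id, value) tuples; the function name `pivot` is hypothetical.

```python
from collections import defaultdict


def pivot(rows):
    """Turn (event_id, elem_id, value) rows into {event_id: {elem_id: value}}."""
    table = defaultdict(dict)
    for event_id, elem_id, value in rows:
        table[event_id][elem_id] = value
    return dict(table)


rows = [
    (1, 1, "Value 1"),
    (1, 2, "Value 2"),
    (2, 1, "Value 3"),
    (2, 2, "Value 4"),
]
by_event = pivot(rows)
event_1 = by_event[1]
```

This works unchanged whether an event has 2 or 50 elements, because the "columns" are dictionary keys rather than fixed parts of the query.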

How to find "holes" in an auto_increment column?

When I DELETE, for example, the row with id 3, I have this:
id | name
1 |
2 |
4 |
5 |
...
Now I want to search for the missing id(s), because I want to fill the id again with:
INSERT INTO xx (id,...) VALUES (3,...)
is there a way to search for "holes" in the auto_increment index?
thanks!
You can find the top value of gaps like this:
select t1.id - 1 as missing_id
from mytable t1
left join mytable t2 on t2.id = t1.id - 1
where t2.id is null
The purpose of AUTO_INCREMENT is to generate simple unique and meaningless identifiers for your rows. As soon as you plan to re-use those IDs, they're no longer unique (not at least in time) so I have the impression that you are not using the right tool for the job. If you decide to get rid of AUTO_INCREMENT, you can do all your inserts with the same algorithm.
As for the SQL code, this query will match existing rows with the rows that have the next ID:
SELECT a.foo_id, b.foo_id
FROM foo a
LEFT JOIN foo b ON a.foo_id=b.foo_id-1
E.g.:
1 NULL
4 NULL
10 NULL
12 NULL
17 NULL
19 20
20 NULL
24 25
25 26
26 27
27 NULL
So it's easy to filter out rows and get the first gap:
SELECT MIN(a.foo_id)+1 AS next_id
FROM foo a
LEFT JOIN foo b ON a.foo_id=b.foo_id-1
WHERE b.foo_id IS NULL
Take this as a starting point because it still needs some tweaking:
You need to consider the case where the lowest available number is the lowest possible one.
You need to lock the table to handle concurrent inserts.
In my computer it's slow as hell with big tables.
I think the only way you can do this is with a loop.
Any other solutions won't show gaps bigger than 1:
insert into XX values (1)
insert into XX values (2)
insert into XX values (4)
insert into XX values (5)
insert into XX values (10)
declare @min int
declare @max int
select @min=MIN(ID) from xx
select @max=MAX(ID) from xx
while @min<@max begin
if not exists(select 1 from XX where id = @min+1) BEGIN
print 'GAP: '+ cast(@min +1 as varchar(10))
END
set @min=@min+1
end
end
result:
GAP: 3
GAP: 6
GAP: 7
GAP: 8
GAP: 9
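The same loop can be expressed in client code once the ids have been fetched. A short Python sketch, using the ids from the example above; the function name `find_gaps` is an assumption.

```python
def find_gaps(ids):
    """Return every missing id between the smallest and largest id present."""
    present = set(ids)
    if not present:
        return []
    return [n for n in range(min(present), max(present)) if n not in present]


gaps = find_gaps([1, 2, 4, 5, 10])
```

Unlike the self-join queries, this reports every id inside a gap wider than one, at the cost of loading all ids into memory.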
First, I agree with the comments that you shouldn't try filling in holes. You won't be able to find all the holes with a single SQL statement; you'll have to loop through the possible numbers, starting with 1, until you find a hole. You could write a SQL function to do this for you, which could then be used in an insert. So if you wrote a function called find_first_hole, you could call it like this:
INSERT INTO xx (id, ...) VALUES (find_first_hole(), ...)
This is a gaps-and-islands problem; see my (and other) replies here and here. In most cases, gaps-and-islands problems are most elegantly solved using recursive CTEs, which are not available in MySQL.