Renumbering items in a list with SQL queries? - mysql

The query:
$consulta = "UPDATE `list`
SET `pos` = $pos
WHERE `id_item` IN (SELECT id_item
FROM lists
WHERE pos = '$item'
ORDER BY pos DESC
LIMIT 1)
AND id_usuario = '$us'
AND id_list = '$id_pl'";
The thing is, this query runs inside a foreach, and it is meant to update the order of the items in a list. Before, I had it like this:
$consulta = "UPDATE `list`
SET `pos` = $pos
WHERE `$pos` = '$item'
AND id_usuario = '$us'
AND id_list = '$id_pl'";
But when I update pos 2 -> 1 and then 1 -> 2, I end up with 2 twice and no 1...
Is there a solution for this query?

Renumbering the items in a list is tricky. When you renumber the items in the list using multiple separate SQL statements, it is even trickier.
Your inner sub-select is also not properly constrained. You need an extra condition such as:
AND id_list = '$id_pl'
There are probably many ways to do this, but the one that may be simplest follows. I'm assuming that:
the unshown foreach loop generates $pos values in the desired sequence (1, 2, ...)
the value of $id_pl is constant for the loop
the foreach loop gives values for $us and $item for each iteration
the combination of $id_pl, $us, and $item uniquely identifies a row in the list table
there aren't more than 100 pos values to worry about
you are able to use an explicit transaction around the statement sequence
The suggested solution has two stages:
Allocate 100 + pos to each row to place it in its new position
Subtract 100 from each pos
This technique avoids any complicated issues about whether rows that have had their position adjusted are reread by the same query.
Inside the loop:
foreach ...
...$pos, $item, $us...
UPDATE list
SET pos = $pos + 100
WHERE id_item = '$item'
AND id_usuario = '$us'
AND id_list = '$id_pl'
AND pos < 100
end foreach
UPDATE list
SET pos = pos - 100
WHERE id_list = '$id_pl';
If you don't know the size of the lists, you could assign negative pos values in the loop and convert to positive after the loop, or any of a number of other equivalent mappings. The key is to update the table so that the new pos numbers in the loop are disjoint from the old numbers, and then adjust the new values after the loop.
Alternative techniques create a temporary table that maps the old numbers to the new and then executes a single UPDATE statement that changes the old pos value to the new for all rows in a single operation. This is probably more efficient, especially if the mapping table can be generated as a query, but that depends on whether the renumbering is algorithmic. The technique shown, albeit somewhat clumsy, can be made to work for arbitrary renumberings.
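Here is a minimal sketch of the two-stage technique in Python with sqlite3 (used purely as an illustrative stand-in; table and column names follow the question, and the +100 offset assumes fewer than 100 positions):

```python
import sqlite3

# Minimal stand-in for the question's `list` table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE list (id_item TEXT, id_usuario TEXT, id_list TEXT, pos INTEGER)")
con.executemany(
    "INSERT INTO list VALUES (?, ?, ?, ?)",
    [("a", "u1", "pl", 1), ("b", "u1", "pl", 2), ("c", "u1", "pl", 3)],
)

# Desired new order: item c first, then a, then b.
new_order = ["c", "a", "b"]

with con:  # explicit transaction around the statement sequence
    # Stage 1: park every row at 100 + new position, disjoint from the old 1..n range,
    # so a row that has already been moved is never matched again (pos < 100).
    for pos, item in enumerate(new_order, start=1):
        con.execute(
            "UPDATE list SET pos = ? + 100 "
            "WHERE id_item = ? AND id_usuario = ? AND id_list = ? AND pos < 100",
            (pos, item, "u1", "pl"),
        )
    # Stage 2: shift everything back into the 1..n range.
    con.execute("UPDATE list SET pos = pos - 100 WHERE id_list = ?", ("pl",))

print(con.execute("SELECT id_item, pos FROM list ORDER BY pos").fetchall())
# [('c', 1), ('a', 2), ('b', 3)]
```

The same two UPDATE statements, with the loop in PHP, map directly onto the MySQL version shown above.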

Related

bit operations in mysql

I am trying to create a simple filtering of records with bit operations by using this manual: https://dev.mysql.com/doc/refman/8.0/en/bit-functions.html
I have four properties that are defined and based on certain content:
Filter 1 (field1) gets the value 1 (binary 1)
Filter 2 (field2) gets the value 2 (binary 10)
Filter 3 (field3) gets the value 4 (binary 100)
Filter 4 (field4) gets the value 8 (binary 1000)
I set the values with an update:
UPDATE table SET filter = 1 where field1 = "a";
UPDATE table SET filter = filter|2 where field2 = "b";
UPDATE table SET filter = filter|4 where field3 = "c";
UPDATE table SET filter = filter|8 where field4 = "d";
Now the column is filled for the different properties. I now have values between 0 (no property applies) and 15 (all properties apply).
How do I use them for filtering now? Say I want to filter on filters 1, 2 and 4. With:
select * from table where filter = 1|2|8;
I only match rows where filter is exactly 11 (the value of 1|2|8). But actually, 15 should also match, since all four properties are applied there.
I had no success with this either:
select * from table where filter & (1|2|8);
Can someone help me? Or am I completely wrong?
Try WHERE (filter & (1|2|8)) = (1|2|8).
But please be aware that this bitmasking approach can't exploit indexes, so it will scale up poorly to megarow tables.
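To see the difference between the two predicates, here is a quick sketch in Python with sqlite3 (the table name t and the sample values are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE t (id INTEGER, "filter" INTEGER)')
# filter = 11 (bits 1|2|8), 15 (all four bits), 3 (bits 1|2 only)
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 11), (2, 15), (3, 3)])

mask = 1 | 2 | 8  # filters 1, 2 and 4 -> 11

# Equality only matches rows where *exactly* these bits are set:
eq = con.execute('SELECT id FROM t WHERE "filter" = ?', (mask,)).fetchall()
print(eq)  # [(1,)]

# Masking then comparing matches every row where *at least* these bits are set:
superset = con.execute(
    'SELECT id FROM t WHERE ("filter" & ?) = ?', (mask, mask)
).fetchall()
print(superset)  # [(1,), (2,)]
```

The second query is the `(filter & (1|2|8)) = (1|2|8)` form suggested above; row 2 (value 15) now matches as expected.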

MySQL workbench procedure returning no results

Hello to the lovely stack overflow community.
I have a set of x values and I am trying to find the nearest larger value, then divide the difference by 2 and set this as a new value in a new column called nearest_x. I have created a procedure with a while loop. The procedure runs but then gives me no results. I think this is because I am using the ele_id in the while loop; I think I need to simply look at each row in turn? Thoughts much appreciated. My first Stack Overflow post!
DELIMITER ;;
CREATE PROCEDURE highX_rowperrow()
BEGIN
DECLARE n INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
SET n = (SELECT COUNT(*) FROM table_csa WHERE condition = "pal0");
SET i=0;
WHILE i<n DO #start while loop
SET @x1pal0 = (SELECT x FROM table_csa WHERE ele_id=(21001+i) AND condition = "pal0"); #count applicable rows (pal0 always first)
SET @Xnearest = (select x from table_csa where x > @x1pal0 and condition = "pal0" order by x asc limit 1); #Select the nearest larger than x value
SET @polypoint = @x1pal0+((abs(@x1pal0)-abs(@Xnearest))/2); #calculate the difference and divide by 2 to set up point
UPDATE table_csa SET nearest_x = @polypoint WHERE ele_id=(21001+i); #put value in table
SET i=i+1;
END WHILE;
END ;;
DELIMITER ;
So grateful for any help. I thought this would have a simple, obvious answer that I had missed as I'm learning. I guess it's not so simple, so I'm adding more information to make it more of a reproducible problem. I have quite a big data set, so it's not immediately obvious how to do that, but here is my effort. I also tried to explain the background so it makes more sense.
I have a table which has the following columns:
"ele_id" which is a set of location ids not necessarily individual in
numbers, but does generally count up by 1.
"x", which is x location,
"y" which is y location,
"condition" which is condition code for the location. So a different
condition codes do exist for the same location later on in the data
set.
"nearest_x" which is the column I want to populate
I also added a snapshot with example data
snapshot of table with example data
I am actually trying to turn these points into zones that I can make into a polygon; I have another data set in my database where I want to find which pieces of data fall within my polygons from the csa table. So I am trying to find the nearest x point value and calculate a point that falls halfway between them; then I can do the same for y and draw a polygon. I know the condition pal0 has each location I need in it as an individual value, which seemed like a good starting place.
As part of debugging I tried the following. This code performs what I want, but only on one element (21003 in this case); I can't seem to get it working in a while loop:
Set @var=2;
SET @x1pal0 = (SELECT x FROM table_csa WHERE ele_id=(21001+@var) AND condition = "pal0");
SET @Xnearest = (select x from table_csa where x > @x1pal0 and condition = "pal0" order by x asc limit 1);
SET @polypoint = @x1pal0+abs(@x1pal0-@Xnearest)/2;
Select @x1pal0, @Xnearest, @polypoint, ele_id from table_csa where ele_id=(21001+@var) and condition = "pal0";
UPDATE table_csa SET nearest_x = @polypoint WHERE ele_id=(21001+@var);
I needed to use a cursor, not a while loop!
[MySQL_tutorial_link]
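SQLite has no stored procedures, so here is the same row-by-row idea sketched in Python with sqlite3 (column names follow the question; the sample x values are made up). A cursor-based MySQL procedure would walk the rows the same way instead of guessing contiguous ele_id values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE table_csa (ele_id INTEGER, x REAL, condition TEXT, nearest_x REAL)"
)
con.executemany(
    "INSERT INTO table_csa (ele_id, x, condition) VALUES (?, ?, ?)",
    [(21001, 10.0, "pal0"), (21002, 40.0, "pal0"), (21003, 20.0, "pal0")],
)

# Visit every pal0 row in turn (the job a MySQL cursor would do),
# rather than assuming ele_id counts up without gaps.
rows = con.execute(
    "SELECT ele_id, x FROM table_csa WHERE condition = 'pal0'"
).fetchall()
for ele_id, x in rows:
    # The nearest larger x value for this row, as in the question's sub-select.
    nearest = con.execute(
        "SELECT x FROM table_csa WHERE x > ? AND condition = 'pal0' "
        "ORDER BY x ASC LIMIT 1",
        (x,),
    ).fetchone()
    if nearest is None:
        continue  # the largest x has no larger neighbour; leave nearest_x NULL
    polypoint = x + abs(x - nearest[0]) / 2  # halfway point, per the debug version
    con.execute(
        "UPDATE table_csa SET nearest_x = ? WHERE ele_id = ?", (polypoint, ele_id)
    )

print(con.execute("SELECT ele_id, nearest_x FROM table_csa ORDER BY ele_id").fetchall())
# [(21001, 15.0), (21002, None), (21003, 30.0)]
```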

Selecting a random row with a WHERE clause is taking too long

I want to select a random row matching a specific WHERE clause, but the query is taking too long (around 2.7 seconds):
SELECT * FROM PIN WHERE available = '1' ORDER BY RAND() LIMIT 1
The database contains around 900k rows
Thanks
SELECT * FROM PIN WHERE available = '1' ORDER BY RAND() LIMIT 1
means that you are going to generate a random number for EVERY row, then sort the whole result set and finally retrieve one row.
That's a lot of work for querying a single row.
Assuming your ids have no gaps - or only a few - you are better off using your programming language to generate ONE random number and fetching that id:
Pseudo-Example:
result = null;
min_id = queryMinId();
max_id = queryMaxId();
while (result == null) {
    random_number = random_between(min_id, max_id);
    result = queryById(random_number);
}
If you have a lot of gaps, you can retrieve the whole id set first and then pick ONE random index into it:
id_set = queryAllIds();
random_number = random_between(0, size(id_set)-1);
result = queryById(id_set[random_number]);
The first example works without additional constraints. In your case, you should use option 2. This ensures that all ids with available=1 are pre-selected into a 0 to count()-1 array, hence ignoring all invalid ids.
Then you can generate a random number between 0 and count()-1 to get an index within that result set, which you can translate to an actual id, which you finally fetch.
id_set = queryAllIdsWithAvailableEqualsOne(); //"Condition"
random_number = random_between(0, size(id_set)-1);
result = queryById(id_set[random_number]);
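The second option, sketched concretely in Python with sqlite3 (the table contents are made up for illustration): the whole id list for available = 1 is fetched once, then a single index is drawn.

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PIN (id INTEGER PRIMARY KEY, available INTEGER)")
# ids 1..1000; only even ids are available, so the available set has gaps
con.executemany(
    "INSERT INTO PIN VALUES (?, ?)", [(i, 1 if i % 2 == 0 else 0) for i in range(1, 1001)]
)

# Pre-select all candidate ids once (cheap with an index on `available`) ...
id_set = [r[0] for r in con.execute("SELECT id FROM PIN WHERE available = 1")]

# ... then draw ONE random index instead of ORDER BY RAND() over every row.
random_id = id_set[random.randrange(len(id_set))]
row = con.execute("SELECT * FROM PIN WHERE id = ?", (random_id,)).fetchone()
print(row)  # always a row with available = 1
```

The id-list fetch can be cached or narrowed further; the point is that the database never has to sort 900k randomly-numbered rows.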

Which of the two approaches for the specified task is better performance-wise in Perl DBI module?

Here is a problem that I'm facing, which I need to solve using Perl DBI module:
Table:
c1 c2 c3 c4 c5 c6
__________________
r1 | a b c d e f
r2 | h i j k x m
r3 | n x p q r x
r4 | k l m n p q
Task: determine the name of the left-most column that has value 'x' in any of the rows. In the example it is c2.
I see two ways to do this:
First
Select column c1 in all the rows;
Loop through the retrieved fields, starting from top-most;
If any of the fields have value 'x', return c1;
Otherwise, repeat 1-4 for next column;
How I approximately imagine it to look in perl:
my @bind_values=\(my $field);
my $var;
for my $i (1..6) {
$statement="select c$i from table"
$dbh->selectcol_arrayref($statement, undef, @bind_values);
if ($field eq 'x') {$var=$i;last;}
}
return $field;
Second
Set variable $var to 4;
Select all columns from r1 to r$var.
Loop through returned fields, starting from left-most;
If a field has value 'x' and current column number is lower than x, assign the current column number to x;
repeat 2-5 for next row
return x
How I approximately imagine it to look in Perl:
my @bind_values;
my $var=6;
my @cols;
for my $i (1..6) {
for (1..$var){push @cols, "c$_"; push @bind_values, my "c$_";}
$statement="select @cols from table"
$dbh->selectrow_array($statement, undef, @bind_values)
for (@bind_values){
if ($$_<$var) $var=$$_;
}
}
return $var;
If I understood the manual correctly, selectcol_array() actually performs a separate SQL call for each row in the table, so both approaches involve a two-level loop.
To people who know more about the inner workings of the Perl DBI module, my question is the following:
Which of the approaches is better performance-wise?
If it's of any significance, I'm working with a MySQL database.
EDIT: Actual table dimensions are potentially c200 x r1000.
EDIT2:
Another idea: using LIMIT statement, to determine if a column contains a field with the statement SQL statement itself, for example:
SELECT c1
FROM table
WHERE c1='x'
LIMIT 0,1
This statement should make it possible to determine whether c1 contains the value 'x'. This would move more of the work onto the DB engine, correct? Would that improve or worsen performance?
Here is a version using SQLite. I expect the same code to work for MySQL with little or no change. It should work fine unless your database table is huge, but you don't mention its size so I presume it's not out of the ordinary.
It simply fetches the contents of the table into memory and checks each column, one by one, to see if any field is x, printing the name of the column once it is found.
use strict;
use warnings;
use DBI;
use List::Util qw/ any /;
my $dbh = DBI->connect('dbi:SQLite:test.sqlite');
my $sth = $dbh->prepare('SELECT * FROM "table"');
$sth->execute;
my $table = $sth->fetchall_arrayref;
my $first_column;
for my $i (0 .. $#{$table->[0]}) {
my @column = map { $_->[$i] } @$table;
if ( any { $_ eq 'x' } @column ) {
$first_column = $sth->{NAME}[$i];
last;
}
}
print $first_column, "\n";
output
c2
Update
This way is likely to be faster, as it uses the database engine to search for columns that contain an x and very little data is loaded into memory
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect('dbi:SQLite:test.sqlite');
my @names = do {
my $sth = $dbh->prepare('SELECT * FROM "table" LIMIT 0');
$sth->execute;
@{ $sth->{NAME_lc} };
};
my $first_column;
for my $col (@names) {
my $sql = qq{SELECT $col from "table" WHERE $col = 'x' LIMIT 1};
my $row = $dbh->selectrow_arrayref($sql);
if ($row) {
$first_column = $col;
last;
}
}
print $first_column, "\n";
Short of redesigning your table so that it can be queried more effectively, I think your optimal solution is likely to be a modified version of your Option 1. Instead of using fetchall_arrayref(), use fetchrow_arrayref() to collect 1 row at a time. Examine each row as you get it. Break the loop if the minimum column ever gets to column 1. This minimizes the memory used in the Perl code; it uses a single SQL statement (but multiple fetch operations — but then fetchall_arrayref() also uses multiple fetch operations).
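The streaming variant described above, sketched in Python with sqlite3 rather than Perl DBI (same idea: one statement, fetch one row at a time, track the left-most matching column, and stop early if it ever reaches column 1). The table contents mirror the example in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "table" (c1, c2, c3, c4, c5, c6)')
con.executemany(
    'INSERT INTO "table" VALUES (?, ?, ?, ?, ?, ?)',
    [
        ("a", "b", "c", "d", "e", "f"),
        ("h", "i", "j", "k", "x", "m"),
        ("n", "x", "p", "q", "r", "x"),
        ("k", "l", "m", "n", "p", "q"),
    ],
)

cur = con.execute('SELECT * FROM "table"')
names = [d[0] for d in cur.description]

best = None  # index of the left-most column seen holding 'x'
for row in cur:  # one row at a time; the table is never buffered whole
    for i, value in enumerate(row):
        if value == "x":
            if best is None or i < best:
                best = i
            break  # nothing further right in this row can be better
    if best == 0:
        break  # cannot do better than column 1, so stop fetching
print(names[best])  # c2
```

In Perl DBI the inner fetch would be `fetchrow_arrayref()` on an executed statement handle, with the column names taken from `$sth->{NAME}`.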
The fact that you need to query your data this way tells me that it's stored in a bizarre and inappropriate way. Relational databases are meant to store relations, and the order of their columns should be irrelevant to how they logically function. Any need to refer to column order is a guaranteed sign that you're doing something wrong.
I understand that sometimes one needs to perform one-time queries to determine unusual things about data sets, but I stand by my assessment: this data is stored inappropriately.
My guess is that there are many columns that define related, sequential attributes, maybe something like "profits_1q2001", "profits_2q2001", etc. You'll want to create a separate table for those, maybe something like:
CREATE TABLE `department_profits` (
`id` int(10) unsigned NOT NULL,
`department_id` same_as_parent_table NOT NULL,
`y` year(4) NOT NULL,
`q` tinyint(3) unsigned NOT NULL,
`profits` decimal(9,2) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_dept_quarter` (`department_id`,`y`,`q`),
KEY `idx_profits_y_q_dept` (`profits`,`y`,`q`,`department_id`)
) ENGINE=InnoDB;
Converting the data from its current format to the proper format is left as an exercise for the reader, but it might involve 200 script-generated queries that look like:
SELECT CONCAT(
"INSERT INTO department_profits (department_id, y, q, profits) VALUES (",
"'", department_id, "',",
2001, ",",
1, ",",
profits_1q2001,
");"
)
FROM old_table;
If your question is then (say) when was the first time profits exceeded $10,000 and in which department, then finding the answer becomes something like:
SELECT department_id, y, q, profits
FROM department_profits
WHERE profits > 10000
ORDER BY y, q LIMIT 1;
For the actual question you asked -- if it really is a one-off -- since there are just 200,000 data points, I would do it manually. Export the whole table as tab-separated, drag it onto Excel, "Find/Replace" to change "x" to "-999" or some small value, then "Data -> Sort" by each column in turn until your answer pops to the top. Heck, plain old "Find" might tell you your answer. With just 200 columns, it won't take long, and you might learn something new about your data by seeing it all on the screen sorted various ways :)
Assuming your columns are c1 .. c6 you can use something like this to get it in sqlite:
select distinct (case when c1 = 'x' then 'c1' when c2 = 'x' then 'c2' when c3 = 'x' then 'c3' when c4 = 'x' then 'c4' when c5 = 'x' then 'c5' when c6 = 'x' then 'c6' else 'x' end) from mje order by 1 limit 1;

Order of execution of SQL UPDATE while updating multiple values?

What is the sequence in which the values (separated by commas) will be updated?
$command = sprintf('UPDATE %s SET rating = ((rating * rating_count + %f) / (rating_count + 1.0)), rating_count=rating_count+1 WHERE id=%d', $table, $ratingGiven, $id);
I want to make sure that
rating = (rating * rating_count + %f) / (rating_count + 1.0)
is executed before
rating_count=rating_count+1
without firing two SQL commands.
I am not sure whether the comma-separated SET assignments are executed in the order in which they appear, in MySQL (or any other DB).
In standard SQL it doesn't matter: UPDATE reads the current row and evaluates every SET expression against the values the row had before the update, not the ones being assigned, so both assignments would see the original rating_count. MySQL is the documented exception: it applies assignments left to right, so a later assignment can see an already-updated value. Your statement is safe either way, because rating is assigned before rating_count is incremented, so the rating formula always uses the original rating_count.
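A quick way to see the standard behaviour is a sketch in sqlite3 (illustrative; SQLite follows the standard and evaluates every SET expression against the pre-update row, whereas MySQL evaluates assignments left to right, which in this particular statement gives the same result):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (rating REAL, rating_count INTEGER)")
con.execute("INSERT INTO t VALUES (4.0, 1)")  # one prior rating of 4.0

# Both SET expressions read the values the row had *before* the UPDATE,
# so a new rating of 2.0 is averaged against the old rating_count of 1:
# (4.0 * 1 + 2.0) / (1 + 1.0) = 3.0
con.execute(
    "UPDATE t SET rating = (rating * rating_count + 2.0) / (rating_count + 1.0), "
    "rating_count = rating_count + 1"
)
print(con.execute("SELECT rating, rating_count FROM t").fetchone())  # (3.0, 2)
```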