I am trying to create a simple record filter with bit operations, using this manual: https://dev.mysql.com/doc/refman/8.0/en/bit-functions.html
I have four properties that are set based on certain field contents:
Filter 1 (field1) gets the value 1 (binary 1)
Filter 2 (field2) gets the value 2 (binary 10)
Filter 3 (field3) gets the value 4 (binary 100)
Filter 4 (field4) gets the value 8 (binary 1000)
I set the values with an update:
UPDATE table SET filter = 1 where field1 = "a";
UPDATE table SET filter = filter|2 where field2 = "b";
UPDATE table SET filter = filter|4 where field3 = "c";
UPDATE table SET filter = filter|8 where field4 = "d";
Now the column is filled for the different properties. I now have values between 0 (no property applies) and 15 (all properties apply).
How do I query them now? If I want to filter on, for example, properties 1, 2 and 4, then with:
select * from table where filter = 1|2|8;
I only match rows where filter is exactly 11 (= 1|2|8). But "15" should also match, since all four properties apply there.
I had no success with this either:
select * from table where filter & (1|2|8);
Can someone help me? Or am I completely wrong?
Try WHERE (filter & (1|2|8)) = (1|2|8).
But please be aware that this bitmasking approach can't exploit indexes, so it will scale up poorly to megarow tables.
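In full, a query for "all rows where properties 1, 2 and 4 apply" would look roughly like this (same placeholder table name as in the question):
select * from table where (filter & (1|2|8)) = (1|2|8);
-- and, for "at least one of those properties applies":
select * from table where (filter & (1|2|8)) <> 0;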
Hello to the lovely Stack Overflow community.
I have a set of x values and I am trying to find the nearest larger value, then divide the difference by 2 and store the result in a new column called nearest_x. I have created a procedure with a while loop. The procedure runs but gives me no results. I think this is because I am using the ele_id in the while loop; I think I need to simply look at each row in turn? Thoughts much appreciated. My first Stack Overflow post!
DELIMITER ;;
CREATE PROCEDURE highX_rowperrow()
BEGIN
DECLARE n INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
SET n = (SELECT COUNT(*) FROM table_csa WHERE condition = "pal0");
SET i=0;
WHILE i<n DO #start while loop
SET @x1pal0 = (SELECT x FROM table_csa WHERE ele_id=(21001+i) AND condition = "pal0"); #count applicable rows (pal10 always first)
SET @Xnearest = (select x from table_csa where x > @x1pal0 and condition = "pal0" order by x asc limit 1); #Select the nearest larger than x value
SET @polypoint = @x1pal0+((abs(@x1pal)-abs(@Xnearest))/2); #calculate the difference and divide by 2 to set up point
UPDATE table_csa SET nearest_x = @polypoint WHERE ele_id=(21001+i); #put value in table
SET i=i+1;
END WHILE;
END ;;
DELIMITER ;
So grateful for any help. I thought this would be a simple, obvious answer and that I had just missed something as I'm learning. I guess it's not so simple, so I'm adding more information to make it more of a reproducible problem. I have quite a big data set, so it's not immediately obvious how to do that, but here is my effort. I have also tried to explain the background so it makes more sense.
I have a table which has the following columns:
"ele_id" which is a set of location ids not necessarily individual in
numbers, but does generally count up by 1.
"x", which is x location,
"y" which is y location,
"condition" which is condition code for the location. So a different
condition codes do exist for the same location later on in the data
set.
"nearest_x" which is the column I want to populate
I also added a snapshot with example data
I am actually trying to turn these points into zones that I can make into a polygon; I have another data set in my database, and I want to find which pieces of that data fall within the polygons built from the csa table. So I am trying to find the nearest x point value and calculate a point that falls halfway between them; then I can do the same for y and draw a polygon. I know the condition pal0 has each location I need in it as an individual value, which seemed like a good starting place.
As part of debugging I tried the following. This set of code does what I want, but only on one element (21003 in this case); I can't seem to get it working in a while loop:
SET @var=2;
SET @x1pal0 = (SELECT x FROM table_csa WHERE ele_id=(21001+@var) AND condition = "pal0");
SET @Xnearest = (select x from table_csa where x > @x1pal0 and condition = "pal0" order by x asc limit 1);
SET @polypoint = @x1pal0+abs(@x1pal0-@Xnearest)/2;
Select @x1pal0, @Xnearest, @polypoint, ele_id from table_csa where ele_id=(21001+@var) and condition = "pal0";
UPDATE table_csa SET nearest_x = @polypoint WHERE ele_id=(21001+@var);
I needed to use a cursor not a while loop!
[MySQL_tutorial_link]
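For reference, here is a minimal sketch of what the cursor version might look like, assuming the table and column names from the question; the halfway-point formula is taken from the debug snippet above, and `condition` is quoted with backticks because CONDITION is a reserved word in MySQL:
DELIMITER ;;
CREATE PROCEDURE highX_cursor()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE cur_id INT;
    DECLARE cur_x DOUBLE;
    DECLARE nearest DOUBLE;
    -- one row per pal0 location
    DECLARE pal0_cur CURSOR FOR
        SELECT ele_id, x FROM table_csa WHERE `condition` = 'pal0';
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN pal0_cur;
    read_loop: LOOP
        FETCH pal0_cur INTO cur_id, cur_x;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- nearest larger x among the pal0 rows (NULL for the largest x)
        SET nearest = (SELECT x FROM table_csa
                       WHERE x > cur_x AND `condition` = 'pal0'
                       ORDER BY x ASC LIMIT 1);
        -- point halfway between this x and the nearest larger x
        UPDATE table_csa
           SET nearest_x = cur_x + ABS(cur_x - nearest) / 2
         WHERE ele_id = cur_id;
    END LOOP;
    CLOSE pal0_cur;
END ;;
DELIMITER ;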
I want to select a random row with a specific WHERE condition, but the query is taking too long (around 2.7 seconds):
SELECT * FROM PIN WHERE available = '1' ORDER BY RAND() LIMIT 1
The database contains around 900k rows
Thanks
SELECT * FROM PIN WHERE available = '1' ORDER BY RAND() LIMIT 1
means that you are going to generate a random number for EVERY row, then sort the whole result set and finally retrieve one row.
That's a lot of work for querying a single row.
Assuming you have IDs without gaps - or only a few - you are better off using your programming language to generate ONE random number and fetch that ID:
Pseudo-Example:
result = null;
min_id = queryMinId();
max_id = queryMaxId();
while (result == null){
random_number = random_between(min_id, max_id);
result = queryById(random_number);
}
If you have a lot of gaps, you could retrieve the whole ID set first, and then pick ONE random entry from that result:
id_set = queryAllIds();
random_number = random_between(0, size(id_set)-1);
result = queryById(id_set[random_number])
The first example works without additional constraints. In your case, you should use option 2. This ensures that all IDs with available=1 are pre-selected into a 0 to count()-1 array, so all invalid IDs are ignored.
Then you generate a random number between 0 and count()-1 to get an index into that result set, translate it to an actual ID, and finally fetch that row.
id_set = queryAllIdsWithAvailableEqualsOne(); //"Condition"
random_number = random_between(0, size(id_set)-1);
result = queryById(id_set[random_number])
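The pseudo-functions above would map to queries roughly like these (assuming PIN has an integer primary key column named id, which is an assumption on my part):
-- id_set = queryAllIdsWithAvailableEqualsOne()
SELECT id FROM PIN WHERE available = '1';
-- result = queryById(...), with the randomly chosen id bound by the application
SELECT * FROM PIN WHERE id = ?;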
Here is a problem that I'm facing which I need to solve using the Perl DBI module:
Table:
c1 c2 c3 c4 c5 c6
__________________
r1 | a b c d e f
r2 | h i j k x m
r3 | n x p q r x
r4 | k l m n p q
Task: determine the name of the left-most column that has value 'x' in any of the rows. In the example it is c2.
I see two ways to do this:
First
1. Select column c1 in all the rows;
2. Loop through the retrieved fields, starting from the top-most;
3. If any of the fields has the value 'x', return c1;
4. Otherwise, repeat 1-4 for the next column.
How I approximately imagine it to look in Perl:
my @bind_values=\(my $field);
my $var;
for my $i (1..6) {
    $statement="select c$i from table";
    $dbh->selectcol_arrayref($statement, undef, @bind_values);
    if ($field eq 'x') {$var=$i;last;}
}
return $field;
Second
1. Set variable $var to 4;
2. Select all columns from r1 to r$var;
3. Loop through the returned fields, starting from the left-most;
4. If a field has the value 'x' and the current column number is lower than x, assign the current column number to x;
5. Repeat 2-5 for the next row;
6. Return x.
How I approximately imagine it to look in Perl:
my @bind_values;
my $var=6;
my @cols;
for my $i (1..6) {
    for (1..$var){push @cols, "c$_"; push @bind_values, "c$_";}
    $statement="select @cols from table";
    $dbh->selectrow_array($statement, undef, @bind_values);
    for (@bind_values){
        if ($$_<$var) {$var=$$_;}
    }
}
return $var;
If I understood the manual correctly, selectcol_arrayref() actually performs a separate SQL call for each row in the table, so both approaches involve a two-level loop.
To people who know more about the inner workings of the Perl DBI module, my question is the following:
Which of the approaches is better performance-wise?
If it's of any significance, I'm working with a MySQL database.
EDIT: Actual table dimensions are potentially c200 x r1000.
EDIT2:
Another idea: using a LIMIT clause to determine whether a column contains the value within the SQL statement itself, for example:
SELECT c1
FROM table
WHERE c1='x'
LIMIT 0,1
This statement should allow me to determine whether c1 contains the value 'x'. This would move some more of the performance load to the DB engine, correct? Would this improve or worsen performance?
Here is a version using SQLite. I expect the same code to work for MySQL with little or no change. It should work fine unless your database table is huge, but you don't mention its size so I presume it's not out of the ordinary.
It simply fetches the contents of the table into memory and checks each column, one by one, to see if any field is x, printing the name of the column once it is found.
use strict;
use warnings;
use DBI;
use List::Util qw/ any /;
my $dbh = DBI->connect('dbi:SQLite:test.sqlite');
my $sth = $dbh->prepare('SELECT * FROM "table"');
$sth->execute;
my $table = $sth->fetchall_arrayref;
my $first_column;
for my $i (0 .. $#{$table->[0]}) {
my @column = map { $_->[$i] } @$table;
if ( any { $_ eq 'x' } @column ) {
$first_column = $sth->{NAME}[$i];
last;
}
}
print $first_column, "\n";
output
c2
Update
This way is likely to be faster, as it uses the database engine to search for columns that contain an x, and very little data is loaded into memory.
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect('dbi:SQLite:test.sqlite');
my #names = do {
my $sth = $dbh->prepare('SELECT * FROM "table" LIMIT 0');
$sth->execute;
#{ $sth->{NAME_lc} };
};
my $first_column;
for my $col (#names) {
my $sql = qq{SELECT $col from "table" WHERE $col = 'x' LIMIT 1};
my $row = $dbh->selectrow_arrayref($sql);
if ($row) {
$first_column = $col;
last;
}
}
print $first_column, "\n";
Short of redesigning your table so that it can be queried more effectively, I think your optimal solution is likely to be a modified version of your Option 1. Instead of using fetchall_arrayref(), use fetchrow_arrayref() to collect 1 row at a time. Examine each row as you get it. Break the loop if the minimum column ever gets to column 1. This minimizes the memory used in the Perl code; it uses a single SQL statement (but multiple fetch operations — but then fetchall_arrayref() also uses multiple fetch operations).
The fact that you need to query your data this way tells me that it's stored in a bizarre and inappropriate way. Relational databases are meant to store relations, and the order of their columns should be irrelevant to how they logically function. Any need to refer to column order is a guaranteed sign that you're doing something wrong.
I understand that sometimes one needs to perform one-time queries to determine unusual things about data sets, but I stand by my assessment: this data is stored inappropriately.
My guess is that there are many columns that define related, sequential attributes, maybe something like "profits_1q2001", "profits_2q2001", etc. You'll want to create a separate table for those, maybe something like:
CREATE TABLE `department_profits` (
`id` int(10) unsigned NOT NULL,
`department_id` same_as_parent_table NOT NULL,
`y` year(4) NOT NULL,
`q` tinyint(3) unsigned NOT NULL,
`profits` decimal(9,2) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_dept_quarter` (`department_id`,`y`,`q`),
KEY `idx_profits_y_q_dept` (`profits`,`y`,`q`,`department_id`)
) ENGINE=InnoDB;
Converting the data from its current format to the proper format is left as an exercise for the reader, but it might involve 200 script-generated queries that look like:
SELECT CONCAT(
"INSERT INTO department_profits (department_id, y, q, profits) VALUES (",
"'", department_id, "',",
2001, ",",
1, ",",
profits_1q2001,
");"
)
FROM old_table;
If your question is then (say) when was the first time profits exceeded $10,000 and in which department, then finding the answer becomes something like:
SELECT department_id, y, q, profits
FROM department_profits
WHERE profits > 10000
ORDER BY y, q LIMIT 1;
For the actual question you asked -- if it really is a one-off -- since there are just 200,000 data points, I would do it manually. Export the whole table as tab-separated, drag it onto Excel, "Find/Replace" to change "x" to "-999" or some small value, then "Data -> Sort" by each column in turn until your answer pops to the top. Heck, plain old "Find" might tell you your answer. With just 200 columns, it won't take long, and you might learn something new about your data by seeing it all on the screen sorted various ways :)
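If you go that route, the export step could be a single statement along these lines (assuming the server is allowed to write to /tmp under its secure_file_priv setting; the file path is just an example):
-- dump the whole table as tab-separated text for inspection in a spreadsheet
SELECT * FROM old_table
INTO OUTFILE '/tmp/old_table.tsv'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';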
Assuming your columns are c1 .. c6, you can use something like this to get it in SQLite:
select distinct (case when c1 = 'x' then 'c1' when c2 = 'x' then 'c2' when c3 = 'x' then 'c3' when c4 = 'x' then 'c4' when c5 = 'x' then 'c5' when c6 = 'x' then 'c6' else 'x' end) from mje order by 1 limit 1;
What is the sequence in which the values (separated by commas) will be updated?
$command = sprintf('UPDATE %s SET rating = ((rating * rating_count + %f) / (rating_count + 1.0)) , rating_count=rating_count+1 WHERE id=%d', $table, $ratingGiven, $id);
I want to make sure that
rating = (rating * rating_count + %f) / (rating_count + 1.0)
is executed before
rating_count=rating_count+1
without firing two SQL commands.
I am not sure whether the SET assignments are executed in the order in which they appear, separated by commas, in MySQL (or any other DB).
I don't think it will matter: UPDATE reads the current row and computes the new values from the existing column values, not from the values being assigned elsewhere in the same statement. (MySQL actually evaluates single-table UPDATE assignments left to right, so a later assignment can see a column already changed by an earlier one, but here rating_count is only changed by the second assignment, so the first one still sees its original value.)
So in both SET expressions, the original value of rating_count will be used, and the statement already does what you want without firing two SQL commands.
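A small worked example with hypothetical values (the table name ratings and id 42 are made up for illustration):
-- suppose the row currently has rating = 4.0 and rating_count = 2, and the new vote is 5.0
UPDATE ratings
   SET rating = ((rating * rating_count + 5.0) / (rating_count + 1.0)),
       rating_count = rating_count + 1
 WHERE id = 42;
-- the rating expression reads the original rating_count (2), so rating becomes (4.0*2 + 5.0)/3 = 4.33,
-- and rating_count is then incremented to 3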