processing data with perl - selecting for update usage with mysql


I have a table that is storing data that needs to be processed. I have id, status, data in the table. I'm currently going through and selecting id, data where status = #. I'm then doing an update immediately after the select, changing the status # so that it won't be selected again.
My program is multithreaded, and sometimes two threads grab the same id because they query the table at nearly the same time. I looked into SELECT ... FOR UPDATE, but either I wrote the query wrong or I don't understand what it is used for.
My goal is to grab the id and data that I need and set the status so that no other thread tries to grab and process the same row. Here is the code I tried. (I wrote it all together here for display purposes. My prepares are done once at the beginning of the program rather than on every run, in case anyone was concerned about that.)
my $select = $db->prepare("SELECT id, data FROM `TestTable` WHERE _status=4 LIMIT ? FOR UPDATE") or die $DBI::errstr;
if ($select->execute($limit))
{
    while ($data = $select->fetchrow_hashref())
    {
        my $update_status = $db->prepare( "UPDATE `TestTable` SET _status = ?, data = ? WHERE _id=?");
        $update_status->execute(10, "", $data->{_id});
        push(@array_hash, $data);
    }
}
When I run this with multiple threads, I get many duplicate inserts when I try to do an insert after processing my transaction data.
I'm not terribly familiar with MySQL, and the research I've done hasn't turned up anything that really cleared this up for me.
Thanks.

As a sanity check, are you using InnoDB? MyISAM has zero transactional support, aside from faking it with full table locking.
I don't see where you're starting a transaction. MySQL's autocommit option is on by default, so starting a transaction and later committing would be necessary unless you turned off autocommit.
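As a rough illustration of the transaction point above, here is a minimal Python sketch of claiming rows inside one explicit transaction. It uses the standard-library sqlite3 module only so that it runs anywhere; SQLite has no FOR UPDATE, so BEGIN IMMEDIATE stands in for the row locks that SELECT ... FOR UPDATE would take under InnoDB. The table and column names (test_table, id, status, data) and the function claim_rows are assumptions made for the example, not the asker's real schema.

```python
import sqlite3

def claim_rows(conn, limit):
    """Select up to `limit` pending rows and mark them claimed in ONE transaction."""
    cur = conn.cursor()
    # Take the write lock before reading, so no other connection can claim
    # the same rows between our SELECT and our UPDATE.
    cur.execute("BEGIN IMMEDIATE")
    try:
        cur.execute(
            "SELECT id, data FROM test_table WHERE status = 4 ORDER BY id LIMIT ?",
            (limit,),
        )
        rows = cur.fetchall()
        cur.executemany(
            "UPDATE test_table SET status = 10 WHERE id = ?",
            [(row_id,) for row_id, _ in rows],
        )
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise
    return rows

# isolation_level=None disables implicit transactions so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE test_table (id INTEGER PRIMARY KEY, status INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO test_table (status, data) VALUES (4, ?)",
    [("job-%d" % n,) for n in range(5)],
)

first = claim_rows(conn, 3)   # claims the first three pending rows
second = claim_rows(conn, 3)  # only the remaining rows are still pending
```

The important part is that the SELECT and the UPDATE sit between the same BEGIN and COMMIT; with autocommit on and no explicit transaction, any lock taken by the SELECT is released as soon as that statement finishes.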

It looks like you simply rely on the database locking mechanisms. I googled perl dbi locking and found this:
$dbh->do("LOCK TABLES foo WRITE, bar READ");
my $sth = $dbh->prepare("SELECT x,y,z FROM bar");
my $sth2 = $dbh->prepare("INSERT INTO foo SET a = ?");
$sth->execute();
while (my @ary = $sth->fetchrow_array()) {
    $sth2->execute($ary[0]);
}
$sth2->finish();
$sth->finish();
$dbh->do("UNLOCK TABLES");
Not really saying GIYF as I am also fairly novice at both MySQL and DBI, but perhaps you can find other answers that way.
Another option might be as follows, and this only works if you control all the code accessing the data. You can create a lock column in the table. When your code accesses the table, it does the following (pseudocode):
if row.lock != 1
    row.lock = 1
    read row
    update row
    row.lock = 0
    next
else
    sleep 1
    redo
Again, though, this trusts that all users and scripts that access this data will agree to follow this policy. If you cannot ensure that, then this won't work.
Anyway, that's all the knowledge I have on the topic. Good luck!
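One caveat with the pseudocode above: checking the lock and setting it are two separate steps, so two workers can both see lock != 1 before either one sets it. A hedged sketch of one way to close that gap is to make the check and the set a single UPDATE and then look at the affected-row count. Python's sqlite3 is used here purely to keep the example runnable; the table jobs, the column locked, and the helpers try_lock and unlock are hypothetical names.

```python
import sqlite3

def try_lock(conn, row_id):
    """Atomically set the lock flag; return True only if this caller got it."""
    cur = conn.execute(
        "UPDATE jobs SET locked = 1 WHERE id = ? AND locked = 0",
        (row_id,),
    )
    conn.commit()
    # rowcount is 1 if we flipped the flag, 0 if someone else already holds it.
    return cur.rowcount == 1

def unlock(conn, row_id):
    """Release the flag so another worker can take the row."""
    conn.execute("UPDATE jobs SET locked = 0 WHERE id = ?", (row_id,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, locked INTEGER NOT NULL DEFAULT 0, data TEXT)"
)
conn.execute("INSERT INTO jobs (id, data) VALUES (1, 'payload')")
conn.commit()

got_first = try_lock(conn, 1)    # flag was 0, so this caller wins
got_second = try_lock(conn, 1)   # flag is already 1, so this caller must wait or skip
unlock(conn, 1)
got_third = try_lock(conn, 1)    # free again after unlock
```

The same caveat from the answer still applies: this only helps if every script that touches the table goes through the same claim step.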

Related

How to run 'SELECT FOR UPDATE' in Laravel 3 / MySQL

I am trying to execute SELECT ... FOR UPDATE query using Laravel 3:
SELECT * from projects where id = 1 FOR UPDATE;
UPDATE projects SET money = money + 10 where id = 1;
I have tried several things for several hours now:
DB::connection()->pdo->exec($query);
and
DB::query($query)
I have also tried adding START TRANSACTION; ... COMMIT; to the query
and I tried to separate the SELECT from the UPDATE in two different parts like this:
DB::query($select);
DB::query($update);
Sometimes I get 0 rows affected, sometimes I get an error like this one:
SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active. Consider using PDOStatement::fetchAll(). Alternatively, if your code is only ever going to run against mysql, you may enable query buffering by setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute.
SQL: UPDATE `sessions` SET `last_activity` = ?, `data` = ? WHERE `id` = ?
I want to lock the row in order to update sensitive data, using Laravel's database connection.
Thanks.

In case all you need to do is increase money by 10, you don't need to lock the row before update. Simply executing the update query will do the job. The SELECT query will only slow down your script and doesn't help in this case.
UPDATE projects SET money = money + 10 where id = 1;
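To illustrate why the single UPDATE is enough: the read of money and the write of money + 10 happen inside one statement, so the old balance is never copied into application code where it could go stale. A small Python sketch, assuming a projects table with id and money columns as in the question (sqlite3 is used only so the example is self-contained, and add_money is a made-up helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, money INTEGER NOT NULL)")
conn.execute("INSERT INTO projects (id, money) VALUES (1, 100)")

def add_money(conn, project_id, amount):
    # The database computes money + amount itself; the application never
    # holds a copy of the old balance.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE projects SET money = money + ? WHERE id = ?",
            (amount, project_id),
        )

for _ in range(3):
    add_money(conn, 1, 10)

balance = conn.execute("SELECT money FROM projects WHERE id = 1").fetchone()[0]
```

A SELECT ... FOR UPDATE becomes necessary only when the new value depends on logic that has to run in the application between the read and the write.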

I would use separate queries for sure, so you have control over what you are doing.
I would use a transaction.
If we read this simple explanation, PDO transactions are quite straightforward. It gives a simple but complete example that illustrates how everything works as we would expect (consider $db to be your DB::connection()->pdo).
try {
    $db->beginTransaction();
    $db->exec("SOME QUERY");
    $stmt = $db->prepare("SOME OTHER QUERY?");
    $stmt->execute(array($value));
    $stmt = $db->prepare("YET ANOTHER QUERY??");
    $stmt->execute(array($value2, $value3));
    $db->commit();
}
catch(PDOException $ex) {
    // Something went wrong, roll back!
    $db->rollBack();
    echo $ex->getMessage();
}
Let's go to your real statements. For the first of them, the SELECT ..., I wouldn't use exec but query, since, as the PHP manual for PDO::exec states:
PDO::exec() does not return results from a SELECT statement. For a
SELECT statement that you only need to issue once during your program,
consider issuing PDO::query(). For a statement that you need to issue
multiple times, prepare a PDOStatement object with PDO::prepare() and
issue the statement with PDOStatement::execute().
And assign its result to some temp variable like
$result= $db->query ($select);
After this execution, I would call $result->fetchAll() or $result->closeCursor(), since, as the PHP manual for PDO::query notes:
If you do not fetch all of the data in a result set before issuing
your next call to PDO::query(), your call may fail. Call
PDOStatement::closeCursor() to release the database resources
associated with the PDOStatement object before issuing your next call
to PDO::query().
Then you can exec the update
$result= $db->exec($update);
Note that exec() returns the number of affected rows, not a PDOStatement, so there is no cursor to fetch from or close after the update.

If the aim is
to lock the row in order to update sensitive data, using Laravel's database connection.
Maybe you can use PDO transactions :
DB::connection()->pdo->beginTransaction();
DB::connection()->pdo->commit();
DB::connection()->pdo->rollBack();

Perl/MySQL Relationship Query

I have the following perl code that will eventually be a webpage:
my($dbh) = DBI->connect("DBI:mysql:host=dbsrv;database=database","my_sqlu","my_sqlp") or die "Canny Connect";
my($sql) = "SELECT * FROM hardware where srv_name = \"$srv_name\"";
my($sth) = $dbh->prepare($sql);
$sth->execute();
$sth->bind_col( 1, \my($db_id));
$sth->bind_col( 2, \my($db_srv_name));
$sth->bind_col( 5, \my($db_site));
$sth->fetchrow();
$sth->finish ();
my($sql) = "SELECT sites.\`site_code\`, sites.\`long_name\` FROM \`hardware\` JOIN \`sites\` ON \`sites\`.id=\`hardware\`.\`site\` where \`hardware\`.\`id\`=\'$db_id\'";
my($sth) = $dbh->prepare($sql);
$sth->execute();
$sth->bind_col( 1, \my($db_site_code));
$sth->bind_col( 2, \my($db_long_name));
$sth->fetchrow();
$sth->finish ();
$dbh->disconnect;
print "$db_site_code<br>$db_long_name";
The query above does work however what I'm trying to find out is there any way I can run one SQL query and get the db_site_code and db_long_name from the sites DB without running the 2nd query? The hardware DB has the foreign key 'id' in the sites Db.
When you read anything about relational DBs they all say it's by far the most efficient method of getting data from your database but I just can't see how this is any quicker than just running 2 select queries. What I've done above would surely take longer than "select from hardware where srv_name = $srv_name" then "select from sites where id = db_site_id"? Any comments are greatly appreciated.

Here's an example of how to do this with placeholders as well as a combined query. If I understand your DB correctly, you can just omit the first query and add the server name instead of the ID in the second query. I might be mistaken there, but my example will still be of value for the Perl suggestions.
use strict;
use warnings;
use DBI;
# Create DB connection
my $dbh = DBI->connect("DBI:mysql:host=dbsrv;database=database","my_sqlu","my_sqlp")
or die "Cannot connect to database";
# Create the statement handle
my $sth = $dbh->prepare(<<'SQLQUERY') or die $dbh->errstr;
SELECT s.site_code, s.long_name
FROM hardware h
JOIN sites s ON s.id=h.site
WHERE h.srv_name=?
SQLQUERY
$sth->execute('Server Name'); # There's the parameter
my $res = $sth->fetchrow_hashref; # $res now has a hash ref with the first row
print "$res->{'site_code'}<br>$res->{'long_name'}";
There were a few issues with your code I'd like to point out to you:
You should always use strict and use warnings. They make your life easier!
You can leave the parens ( and ) out with my. Saves you keystrokes and makes your code more readable.
You can (but do not have to, this is preference!) leave out the parens after method calls that do not have arguments. Decide this for yourself.
As was already pointed out, always use placeholders with DBI. They are very simple. Now you don't have to escape the " with backslashes. Instead, just use ?.
Once you've combined your query, you can put it in a heredoc (<<'SQLQUERY'). It's a string that lasts from the next line to the delimiter (SQLQUERY). That way, your query is easier to read.
You can use one of the ref-fetchrow-methods to get all your result's columns into one hash. I used $sth->fetchrow_hashref because I find it most convenient. You've got the complete row and all the columns are named hash keys.
If called in a small scope (like a short sub), you don't need to finish a statement handle. It will be finished and destroyed by Perl automatically once it goes out of scope.
Another thing about performance: If this is just run occasionally, don't worry about it. You can profile your queries with DBI::Profile to see which way it is faster, but you should only do that if you really need to.
In my experience, especially with very large queries and a very busy database, two or three queries can be a lot better than a single big one because they do not take over the server's resources. But again, that is something you need to profile and benchmark (if the need arises).

Aside from @tadman's recommendation to use placeholders, I'd tag this as a SQL question as well, but your solution is to simply add
srv_name = \"$srv_name\"
to your second where clause, in place of the id comparison, so that your statement is:
"SELECT sites.\`site_code\`, sites.\`long_name\` FROM \`hardware\` JOIN \`sites\` ON \`sites\`.id=\`hardware\`.\`site\` where \`hardware\`.\`srv_name\`=\"$srv_name\"";
I strongly second @tadman's suggestion though: use prepared statements and/or placeholders whenever possible.

CodeIgniter record won't insert

I'm using CI for the first time and I'm smashing my head against this seemingly simple issue. My query won't insert the record.
In an attempt to debug a possible problem, the insert code has been simplified, but I'm still getting no joy.
Essentially, I'm using:
$data = array('post_post' => $this->input->post('ask_question'));
$this->db->insert('posts', $data);
I'm getting no errors (although that is possibly because I disabled them in config/database.php after another CI-related trauma :-$ )
I've used
echo print $this->db->last_query();
to get the generated query, shown as below:
INSERT INTO `posts` (`post_post`) VALUES ('some text')
I have pasted this query into phpMyAdmin, and it inserts with no problem. I've even tried using $this->db->query() to run the generated query above 'manually', but again the record will not insert.
The schema of the DB table 'posts' is simply two columns, post_id and post_post.
Please, any pointers on what's going on here would be greatly appreciated. Thanks.

OK, solved, after much messing with CI.
Got it to work by setting persistent connections to false.
$db['default']['pconnect'] = FALSE;
sigh

Things generally look ok, everything you have said suggests that it should work. My first instinct would be to check that what you're inserting is compatible with your SQL field.
Just a cool CI feature; I'd suggest you take a look at the CI Database Transaction class. Transactions allow you to wrap your query/queries inside a transaction, which can be rolled back on failure, and can also make error handling easier:
$this->db->trans_start();
$this->db->query('INSERT INTO posts ...etc ');
$this->db->trans_complete();
if ($this->db->trans_status() === FALSE)
{
// generate an error... or use the log_message() function to log your error
}
Alternatively, one thing you can do is put your Insert SQL statement into $this->db->query(your_query_here), instead of calling insert. There is a CI Query feature called Query Binding which will also auto-escape your passed data array.
Let me know how it goes, and hope this helps!

MySQL - Fastest way to check if data in InnoDB table has changed

My application is very database intensive. Currently, I'm running MySQL 5.5.19 and using MyISAM, but I'm in the process of migrating to InnoDB. The only problem left is checksum performance.
My application does about 500-1000 "CHECKSUM TABLE" statements per second in peak times, because the clients GUI is polling the database constantly for changes (it is a monitoring system, so must be very responsive and fast).
With MyISAM, there are Live checksums that are precalculated on table modification and are VERY fast. However, there is no such thing in InnoDB. So, CHECKSUM TABLE is very slow...
I hoped to be able to check the last update time of the table. Unfortunately, this is not available in InnoDB either. I'm stuck now, because tests have shown that the performance of the application drops drastically...
There are simply too many lines of code that update the tables, so implementing logic in the application to log table changes is out of the question...
The database ecosystem consists of one master and 3 slaves, so local file checks are not an option.
I thought of a method to mimic a checksum cache: a lookup table with two columns, table_name and checksum, updated by triggers whenever a table changes. But I have around 100 tables to monitor, and this means 3 triggers per table = 300 triggers. That is hard to maintain, and I'm not sure that this won't be a performance hog again.
So is there any FAST method to detect changes in InnoDB tables?
Thanks!

The simplest way is to add a nullable column of type TIMESTAMP with the attribute ON UPDATE CURRENT_TIMESTAMP.
Existing INSERT statements do not need to change because the column accepts NULLs, and you can select only new and changed rows by saying:
SELECT * FROM `table` WHERE `mdate` > '2011-12-21 12:31:22'
Every time you update a row this column will change automatically.
Here is some more information: http://dev.mysql.com/doc/refman/5.0/en/timestamp.html
To see deleted rows simply create a trigger which is going to log every deletion to another table:
DELIMITER $$
CREATE TRIGGER MyTable_Trigger
AFTER DELETE ON MyTable
FOR EACH ROW
BEGIN
    INSERT INTO MyTable_Deleted VALUES(OLD.id, NOW());
END$$
DELIMITER ;
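The lookup-table idea from the question (one row per monitored table, bumped by triggers) does not have to mean hand-maintaining 300 triggers; they can be generated from a list of table names. The sketch below is an assumption-laden illustration in Python with sqlite3, not MySQL syntax: table_versions, install_change_triggers, table_version and the trigger naming scheme are all made up for the example, and MySQL trigger bodies would need the DELIMITER handling shown above.

```python
import sqlite3

def install_change_triggers(conn, table_names):
    """Create a version counter row plus INSERT/UPDATE/DELETE triggers per table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_versions "
        "(table_name TEXT PRIMARY KEY, version INTEGER NOT NULL DEFAULT 0)"
    )
    for name in table_names:
        # Names are interpolated into DDL, so they must come from a trusted list.
        conn.execute(
            "INSERT OR IGNORE INTO table_versions (table_name) VALUES (?)", (name,)
        )
        for event in ("INSERT", "UPDATE", "DELETE"):
            conn.execute(
                f"CREATE TRIGGER IF NOT EXISTS {name}_{event.lower()}_version "
                f"AFTER {event} ON {name} FOR EACH ROW "
                f"BEGIN "
                f"UPDATE table_versions SET version = version + 1 "
                f"WHERE table_name = '{name}'; "
                f"END"
            )
    conn.commit()

def table_version(conn, name):
    """Cheap change check: read one integer instead of checksumming the table."""
    row = conn.execute(
        "SELECT version FROM table_versions WHERE table_name = ?", (name,)
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (id INTEGER PRIMARY KEY, state TEXT)")
install_change_triggers(conn, ["hosts"])

before = table_version(conn, "hosts")
conn.execute("INSERT INTO hosts (state) VALUES ('up')")
conn.execute("UPDATE hosts SET state = 'down' WHERE id = 1")
conn.execute("DELETE FROM hosts WHERE id = 1")
after = table_version(conn, "hosts")
```

A polling client would remember the last version it saw per table and refresh only when the number changes. Whether the extra write per modification is acceptable under the load described in the question would still need to be measured.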

I think I've found the solution. For some time I have been looking at Percona Server as a replacement for my MySQL servers, and now I think there is a good reason to switch.
Percona server introduces many new INFORMATION_SCHEMA tables like INNODB_TABLE_STATS, which isn't available in standard MySQL server.
When you do:
SELECT rows, modified FROM information_schema.innodb_table_stats WHERE table_schema='db' AND table_name='table'
You get actual row count and a counter. The Official documentation says the following about this field:
If the value of modified column exceeds “rows / 16” or 2000000000, the
statistics recalculation is done when innodb_stats_auto_update == 1.
We can estimate the oldness of the statistics by this value.
So this counter wraps every once in a while, but you can make a checksum of the number of rows and the counter, and then with every modification of the table you get a unique checksum. E.g.:
SELECT MD5(CONCAT(rows,'_',modified)) AS checksum FROM information_schema.innodb_table_stats WHERE table_schema='db' AND table_name='table';
I was going to upgrade my servers to Percona Server anyway, so being bound to it is not an issue for me. Managing hundreds of triggers and adding fields to tables would be a major pain for this application, because it's very late in development.
This is the PHP function I've come up with to make sure that tables can be checksummed whatever engine and server is used:
function checksum_table($input_tables){
    if(!$input_tables) return false; // Sanity check
    $tables = (is_array($input_tables)) ? $input_tables : array($input_tables); // Make $tables always an array
    $where = "";
    $checksum = "";
    $found_tables = array();
    $tables_indexed = array();
    // Table names are interpolated into the SQL below, so pass trusted names only
    foreach($tables as $table_name){
        $tables_indexed[$table_name] = true; // Indexed array for faster searching
        if(strstr($table_name,".")){ // If we are passing db.table_name
            $table_name_split = explode(".",$table_name);
            $where .= "(table_schema='".$table_name_split[0]."' AND table_name='".$table_name_split[1]."') OR ";
        }else{
            $where .= "(table_schema=DATABASE() AND table_name='".$table_name."') OR ";
        }
    }
    if($where != ""){ // Sanity check
        $where = substr($where,0,-4); // Remove the last " OR "
        $get_chksum = mysql_query("SELECT table_schema, table_name, rows, modified FROM information_schema.innodb_table_stats WHERE ".$where);
        while($row = mysql_fetch_assoc($get_chksum)){
            if(!empty($tables_indexed[$row['table_name']])){ // Not entirely foolproof, but saves some queries like "SELECT DATABASE()" to find out the current database
                $found_tables[$row['table_name']] = true;
            }elseif(!empty($tables_indexed[$row['table_schema'].".".$row['table_name']])){
                $found_tables[$row['table_schema'].".".$row['table_name']] = true;
            }
            $checksum .= "_".$row['rows']."_".$row['modified']."_";
        }
    }
    foreach($tables as $table_name){
        if(empty($found_tables[$table_name])){ // Table is not found in information_schema.innodb_table_stats (probably not an InnoDB table or not using Percona Server)
            $get_chksum = mysql_query("CHECKSUM TABLE ".$table_name); // Checksumming the old-fashioned way
            $chksum = mysql_fetch_assoc($get_chksum);
            $checksum .= "_".$chksum['Checksum']."_";
        }
    }
    $checksum = sprintf("%s",crc32($checksum)); // Using crc32 because it's faster than md5(). Must be returned as string to prevent PHP's signed integer problems.
    return $checksum;
}
You can use it like this:
// checksum a single table in the current db
$checksum = checksum_table("test_table");
// checksum a single table in a db other than the current one
$checksum = checksum_table("other_db.test_table");
// checksum multiple tables at once. It's faster when using Percona server, because all tables are checksummed via one select.
$checksum = checksum_table(array("test_table", "other_db.test_table"));
I hope this saves some trouble to other people having the same problem.

Updating the db 6000 times will take few minutes?

I am writing a test program with Ruby and ActiveRecord, and it reads a document
which is like 6000 words long. And then I just tally up the words by
recordWord = Word.find_by_s(word);
if (recordWord.nil?)
  recordWord = Word.new
  recordWord.s = word
end
if recordWord.count.nil?
  recordWord.count = 1
else
  recordWord.count += 1
end
recordWord.save
and so this part loops for 6000 times... and it takes a few minutes to
run at least using sqlite3. Is it normal? I was expecting it could run
within a couple seconds... can MySQL speed it up a lot?

With 6000 calls to write to the database, you're going to see speed issues. I would save the various tallies in memory and save to the database once at the end, not 6000 times along the way.

Take a look at AR:Extensions as well to handle the bulk insertions.
http://rubypond.com/articles/2008/06/18/bulk-insertion-of-data-with-activerecord/

I wrote up some quick code in perl that simply does:
1. Create the database
2. Insert a record that only contains a single integer
3. Retrieve the most recent record and verify that it returns what it inserted
And it does steps #2 and #3 6000 times. This is obviously a considerably lighter workload than having an entire object/relational bridge. For this trivial case with SQLite it still took 17 seconds to execute, so your desire to have it take "a couple of seconds" is not realistic on "traditional hardware."
Using the monitor I verified that it was primarily disk activity that was slowing it down. Based on that if for some reason you really do need the database to behave that quickly I suggest one of two options:
1. Do what people have suggested and find a way around the requirement.
2. Try buying some solid state disks.
I think #1 is a good way to start :)
Code:
#!/usr/bin/perl
use warnings;
use strict;
use DBI;
my $dbh = DBI->connect('dbi:SQLite:dbname=/tmp/dbfile', '', '');
create_database($dbh);
insert_data($dbh);
sub insert_data {
    my ($dbh) = @_;
    my $insert_sql = "INSERT INTO test_table (test_data) values (?)";
    my $retrieve_sql = "SELECT test_data FROM test_table WHERE test_data = ?";
    my $insert_sth = $dbh->prepare($insert_sql);
    my $retrieve_sth = $dbh->prepare($retrieve_sql);
    my $i = 0;
    while (++$i < 6000) {
        $insert_sth->execute(($i));
        $retrieve_sth->execute(($i));
        my $hash_ref = $retrieve_sth->fetchrow_hashref;
        die "bad data!" unless $hash_ref->{'test_data'} == $i;
    }
}
sub create_database {
    my ($dbh) = @_;
    my $status = $dbh->do("DROP TABLE test_table");
    # report, but tolerate, a failed DROP (e.g. the table does not exist yet)
    if (!defined $status) {
        print "DROP TABLE failed";
    }
    my $create_statement = "CREATE TABLE test_table (id INTEGER PRIMARY KEY AUTOINCREMENT, \n";
    $create_statement .= "test_data varchar(255)\n";
    $create_statement .= ");";
    $status = $dbh->do($create_statement);
    # abort if CREATE resulted in an error
    if (!defined $status) {
        die "CREATE failed";
    }
}

What kind of database connection are you using? Some databases allow you to connect 'directly' rather than using a TCP connection that goes through the network stack. In other words, if you're making a network connection and sending data through it, that can slow things down.
Another way to boost performance of a database connection is to group SQL statements together in a single command.
For example, making a single 6,000 line SQL statement that looks like this
"update words set count = count + 1 where word = 'the'
update words set count = count + 1 where word = 'in'
...
update words set count = count + 1 where word = 'copacetic'"
and run that as a single command, performance will be a lot better. By default, MySQL has a 'packet size' limit of 1 megabyte, but you can change that in the my.ini file to be larger if you want.
Since you're abstracting away your database calls through ActiveRecord, you don't have much control over how the commands are issued, so it can be difficult to optimize your code.
Another thing you could do would be to keep a count of words in memory and then only write the final totals to the database, rather than doing an update every time you come across a word. That will probably cut down a lot on the number of writes, because doing an update every time you come across the word 'the' is a huge waste. Words have a 'long tail' distribution, and the most common words are hugely more common than more obscure words. Then the underlying SQL would look more like this:
"update words set count = 300 where word = 'the'
update words set count = 250 where word = 'in'
...
update words set count = 1 where word = 'copacetic'"
If you're worried about taking up too much memory, you could count words and periodically 'flush' them. So read a couple of megabytes of text, then spend a few seconds updating the totals, rather than updating each word every time you encounter it. If you want to improve performance even more, you should consider issuing SQL commands in batches directly.

Without knowing about Ruby and Sqlite, some general hints:
create a unique index on Word.s (you did not state whether you have one)
define a default for Word.count in the database ( DEFAULT 1 )
optimize assignment of count:
recordWord = Word.find_by_s(word);
if (recordWord.nil?)
  recordWord = Word.new
  recordWord.s = word
  recordWord.count = 1
else
  recordWord.count += 1
end
recordWord.save

Use BEGIN TRANSACTION before your updates then COMMIT at the end.

OK, I found some general rules:
1) use a hash to keep the count first, not the db
2) at the end, wrap all insert or updates in one transaction, so that it won't hit the db 6000 times.
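The two rules above can be sketched as follows. This is Python with sqlite3 rather than Ruby/ActiveRecord, chosen only so the example runs on its own; the words table mirrors the Word model's s and count attributes from the question, and save_word_counts is a hypothetical helper.

```python
import sqlite3
from collections import Counter

def save_word_counts(conn, text):
    """Count words in memory, then write all totals in a single transaction."""
    counts = Counter(text.lower().split())
    with conn:  # one transaction: commit once at the end, roll back on error
        # Make sure every word has a row, without touching existing counts.
        conn.executemany(
            "INSERT OR IGNORE INTO words (s, count) VALUES (?, 0)",
            [(word,) for word in counts],
        )
        # One UPDATE per distinct word, not one per occurrence.
        conn.executemany(
            "UPDATE words SET count = count + ? WHERE s = ?",
            [(n, word) for word, n in counts.items()],
        )
    return counts

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint on s also gives the lookup an index.
conn.execute(
    "CREATE TABLE words (s TEXT NOT NULL UNIQUE, count INTEGER NOT NULL DEFAULT 0)"
)

save_word_counts(conn, "the cat and the hat")
save_word_counts(conn, "the end")
totals = dict(conn.execute("SELECT s, count FROM words"))
```

For a 6000-word document this issues writes only for the distinct words and commits once, which is where most of the time goes when every save is its own transaction.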
