MySQL - Fastest way to check if data in InnoDB table has changed - mysql

My application is very database intensive. Currently, I'm running MySQL 5.5.19 and using MyISAM, but I'm in the process of migrating to InnoDB. The only problem left is checksum performance.
My application does about 500-1000 "CHECKSUM TABLE" statements per second at peak times, because the clients' GUI polls the database constantly for changes (it is a monitoring system, so it must be very responsive and fast).
With MyISAM, there are Live checksums that are precalculated on table modification and are VERY fast. However, there is no such thing in InnoDB. So, CHECKSUM TABLE is very slow...
I hoped to be able to check the last update time of the table. Unfortunately, this is not available in InnoDB either. I'm stuck now, because tests have shown that the performance of the application drops drastically...
There are simply too many lines of code that update the tables, so implementing logic in the application to log table changes is out of the question...
The database ecosystem consists of one master and 3 slaves, so local file checks are not an option.
I thought of a method to mimic a checksum cache - a lookup table with two columns - table_name, checksum - updated by triggers when changes in a table occur. But I have around 100 tables to monitor, and that means 3 triggers per table = 300 triggers. Hard to maintain, and I'm not sure this won't be a performance hog again.
So is there any FAST method to detect changes in InnoDB tables?
Thanks!

The simplest way is to add a nullable column of type TIMESTAMP with the ON UPDATE CURRENT_TIMESTAMP attribute.
Inserts will not change the column because it accepts NULLs, and you can select only new and changed rows by saying:
SELECT * FROM `table` WHERE `mdate` > '2011-12-21 12:31:22'
Every time you update a row this column will change automatically.
More information here: http://dev.mysql.com/doc/refman/5.0/en/timestamp.html
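The polling scheme above can be sketched end to end. Since a MySQL server isn't available here, this hypothetical Python example uses SQLite, with a trigger standing in for MySQL's ON UPDATE CURRENT_TIMESTAMP; all table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT, mdate TEXT);
    -- stand-in for MySQL's ON UPDATE CURRENT_TIMESTAMP column attribute
    CREATE TRIGGER items_touch AFTER UPDATE ON items
    BEGIN
        UPDATE items SET mdate = DATETIME('now') WHERE id = NEW.id;
    END;
""")

# Inserts leave mdate NULL, matching the nullable-TIMESTAMP idea above
conn.executemany("INSERT INTO items (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])

last_seen = "1970-01-01 00:00:00"   # timestamp from the previous poll
conn.execute("UPDATE items SET payload = 'a2' WHERE id = 1")

# Only rows touched since the last poll come back
changed = [row[0] for row in
           conn.execute("SELECT id FROM items WHERE mdate > ?", (last_seen,))]
```

Each poll then just remembers the newest `mdate` it saw and passes it as `last_seen` next time.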
To see deleted rows simply create a trigger which is going to log every deletion to another table:
DELIMITER $$
CREATE TRIGGER MyTable_Trigger
AFTER DELETE ON MyTable
FOR EACH ROW
BEGIN
    INSERT INTO MyTable_Deleted VALUES (OLD.id, NOW());
END$$
DELIMITER ;
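For illustration, the same deletion-log idea can be exercised with SQLite from Python (SQLite needs no DELIMITER handling); the table names mirror the hypothetical MyTable above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE MyTable_Deleted (id INTEGER, deleted_at TEXT);
    -- every delete on MyTable is logged with a timestamp
    CREATE TRIGGER MyTable_Trigger AFTER DELETE ON MyTable
    BEGIN
        INSERT INTO MyTable_Deleted VALUES (OLD.id, DATETIME('now'));
    END;
""")

conn.executemany("INSERT INTO MyTable (data) VALUES (?)", [("x",), ("y",)])
conn.execute("DELETE FROM MyTable WHERE id = 2")

logged = [row[0] for row in conn.execute("SELECT id FROM MyTable_Deleted")]
```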

I think I've found the solution. For some time I was looking at Percona Server to replace my MySQL servers, and now I think there is a good reason for it.
Percona server introduces many new INFORMATION_SCHEMA tables like INNODB_TABLE_STATS, which isn't available in standard MySQL server.
When you do:
SELECT rows, modified FROM information_schema.innodb_table_stats WHERE table_schema='db' AND table_name='table'
You get the actual row count and a counter. The official documentation says the following about this field:
If the value of modified column exceeds “rows / 16” or 2000000000, the
statistics recalculation is done when innodb_stats_auto_update == 1.
We can estimate the oldness of the statistics by this value.
So this counter wraps every once in a while, but you can make a checksum of the number of rows and the counter, and then with every modification of the table you get a unique checksum. E.g.:
SELECT MD5(CONCAT(rows,'_',modified)) AS checksum FROM information_schema.innodb_table_stats WHERE table_schema='db' AND table_name='table';
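The idea is easy to sanity-check in isolation: hash the (rows, modified) pair, and any table modification yields a new fingerprint. A small Python sketch (the row and counter values are made up):

```python
import hashlib

def table_fingerprint(rows: int, modified: int) -> str:
    """Combine the row count and the 'modified' counter into one checksum,
    mirroring MD5(CONCAT(rows, '_', modified)) from the SQL above."""
    return hashlib.md5(f"{rows}_{modified}".encode()).hexdigest()

before = table_fingerprint(1000, 17)
after  = table_fingerprint(1000, 18)   # one row was updated in place
```

An in-place update bumps only `modified`, while inserts/deletes also change `rows`; either way the fingerprint changes, even when the counter eventually wraps.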
I was going to upgrade my servers to Percona Server anyway, so being bound to it is not an issue for me. Managing hundreds of triggers and adding fields to tables would be a major pain for this application, because it's very late in development.
This is the PHP function I've come up with to make sure that tables can be checksummed whatever engine and server is used:
function checksum_table($input_tables){
    if(!$input_tables) return false; // Sanity check
    $tables = is_array($input_tables) ? $input_tables : array($input_tables); // Make $tables always an array
    $where = "";
    $checksum = "";
    $found_tables = array();
    $tables_indexed = array();
    foreach($tables as $table_name){
        $tables_indexed[$table_name] = true; // Indexed array for faster searching
        if(strstr($table_name, ".")){ // If we are passing db.table_name
            $table_name_split = explode(".", $table_name);
            $where .= "(table_schema='".$table_name_split[0]."' AND table_name='".$table_name_split[1]."') OR ";
        }else{
            $where .= "(table_schema=DATABASE() AND table_name='".$table_name."') OR ";
        }
    }
    if($where != ""){ // Sanity check
        $where = substr($where, 0, -4); // Remove the trailing " OR "
        $get_chksum = mysql_query("SELECT table_schema, table_name, rows, modified FROM information_schema.innodb_table_stats WHERE ".$where);
        while($row = mysql_fetch_assoc($get_chksum)){
            if(isset($tables_indexed[$row['table_name']])){ // Not entirely foolproof, but saves some queries like "SELECT DATABASE()" to find out the current database
                $found_tables[$row['table_name']] = true;
            }elseif(isset($tables_indexed[$row['table_schema'].".".$row['table_name']])){
                $found_tables[$row['table_schema'].".".$row['table_name']] = true;
            }
            $checksum .= "_".$row['rows']."_".$row['modified']."_";
        }
    }
    foreach($tables as $table_name){
        if(!isset($found_tables[$table_name])){ // Table not found in information_schema.innodb_table_stats (probably not an InnoDB table, or not a Percona server)
            $get_chksum = mysql_query("CHECKSUM TABLE ".$table_name); // Checksum the old-fashioned way
            $chksum = mysql_fetch_assoc($get_chksum);
            $checksum .= "_".$chksum['Checksum']."_";
        }
    }
    $checksum = sprintf("%s", crc32($checksum)); // Using crc32 because it's faster than md5(). Returned as a string to avoid PHP's signed-integer problems.
    return $checksum;
}
You can use it like this:
// Checksum a single table in the current db
$checksum = checksum_table("test_table");
// Checksum a single table in a db other than the current one
$checksum = checksum_table("other_db.test_table");
// Checksum multiple tables at once. Faster on a Percona server, because all tables are checksummed via one SELECT.
$checksum = checksum_table(array("test_table", "other_db.test_table"));
I hope this saves some trouble to other people having the same problem.

Related

EF6 & MySql - Avoid selecting Id after inserting record

I have a program that inserts thousands of records in a MySql DB. The operation cannot be done in bulk for a variety of reasons. Overall, the operations are very slow.
After looking at the SQL that is being generated, I can see that EF is calling a select to get the Id of the recently inserted record.
SET SESSION sql_mode = 'ANSI'; INSERT INTO `table`(
'blah') VALUES(
86613784);
SELECT
`id`
FROM `table`
WHERE row_count() > 0 AND `id`= last_insert_id()
Since I don't need that Id, how can I tell EF to avoid the call and save me the time?
FYI - I am already using the following statements to speed things up as well.
Configuration.ValidateOnSaveEnabled = false;
Configuration.AutoDetectChangesEnabled = false;
As requested, here is the code used to create the record. Not much to it, but if it helps...
using (var ctx = new Tc_TrademarkEntities(_entityFrameworkConnectionString))
{
ctx.case_file.Add(request.Trademark);
ctx.SaveChanges();
}

Yii Framework - InnoDB vs MyISAM

I have a question: I have built a big application with Yii and InnoDB and ran into the problem that inserts/updates take a really, really long time. Here is my PHP report:
INNODB:
admin User update 55.247464895248 seconds
ekuskov User update 13.282548904419 seconds
doriwall User update 0.002094030380249 seconds
MYISAM:
admin User update 7.8317859172821 seconds
ekuskov User update 1.6304929256439 seconds
doriwall User update 0.0020859241485596 seconds
Can anyone suggest some solution to speed up the insert/update?
EDIT ----------------------------------------------
Now I used a very simple insert loop:
public function run($args) {
$time = -microtime(true);
$begin = DateTime::createFromFormat('Y-m-d H:i:s', '2010-01-01 00:00:00');
$end = DateTime::createFromFormat('Y-m-d H:i:s', '2013-01-01 00:00:00');
$end->add(new DateInterval('P1D'));
$interval = DateInterval::createFromDateString('1 day');
$days = new DatePeriod($begin, $interval, $end);
foreach ( $days as $day ) {
echo "i";
$track = new TimeTracking();
$track->user_id = 25;
$track->date = $day->format('Y-m-d H:i:s');
$track->active = 4;
$track->save(false);
}
$time += microtime(true);
echo count($days)." items insert - $time seconds\n";
}
and now the INSERT times are following:
InnoDB: items insert - 72.269570827484 seconds
MyISAM: items insert - 0.87537479400635 seconds
[EDIT] And now I timed both the whole controller save function and the Yii model's save() method:
UPDATE: model->save(false) - 0.1096498966217 seconds
UPDATE: controller save function () - 0.1302649974823 seconds
CREATE: model->save(false) - 0.052282094955444 seconds
CREATE: controller save function () - 0.057214975357056 seconds
Why just save() method takes so long?
[EDIT] I have tested save() vs createCommand() and they take the same time:
$track->save(false);
or
$command = Yii::app()->db->createCommand();
$command->insert('timeTracking', array(
'id'=>NULL,
'date'=>$track->date,
'active'=>$track->active,
'user_id'=>$track->user_id,
));
EDIT -----------------------------
And here is a statistic for inserting 1,097 Objects:
save(): 0.86-0.94,
$command->insert(): 0.67-0.72,
$command->execute(): 0.46-0.48,
mysql_query(): 0.33-0.36
FINAL ANSWER: If you need massive INSERTs or UPDATEs, you should consider writing those functions with direct MySQL calls; that saves almost 70% of execution time.
Regards,
Edgar
A table crawling on insert and update may indicate that you've got a bit carried away with your indexes. Remember that the DB has to update every index on each write.
With Yii and InnoDB You should wrap your commands in a transaction like so:
$transaction = Yii::app()->db->beginTransaction();
try {
    // TODO Loop through all of your inserts
    $transaction->commit();
} catch (Exception $ex) {
    // There was some type of error. Roll back the transaction
    $transaction->rollback();
}
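The effect of the wrapping transaction can be sketched outside Yii. This hypothetical Python example uses SQLite as a stand-in and contrasts one commit per row with a single enclosing transaction; against a real on-disk database the batched variant is dramatically faster, because the per-commit durability wait is paid once instead of once per row:

```python
import sqlite3

def insert_rows(db_path, rows, one_transaction):
    """Insert rows either inside one transaction or with autocommit
    (one implicit commit per INSERT); returns the resulting row count."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (v INTEGER)")
    if one_transaction:
        with conn:  # single BEGIN ... COMMIT around the whole loop
            for v in rows:
                conn.execute("INSERT INTO t (v) VALUES (?)", (v,))
    else:
        conn.isolation_level = None  # autocommit: each INSERT commits alone
        for v in rows:
            conn.execute("INSERT INTO t (v) VALUES (?)", (v,))
    n = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return n
```

Both variants insert the same data; only the number of commits differs.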
The solution was: for those tables where you need big inserts and a quick response, convert them to MyISAM. Otherwise the user has to wait a long time, and there is a risk that PHP's max script execution time will stop your script.
InnoDB is a much more complex, feature-rich engine than MyISAM in many respects. That makes it slower and there is not much you can do in your queries, configuration, or otherwise to fix that. However, MySQL is trying to close the gap with recent updates.
Look at version 5.6:
http://dev.mysql.com/doc/refman/5.6/en/innodb-performance.html
Your best bet may be to upgrade your version of MySQL if you're behind.
InnoDB gives you the option to define relations and constraints, which can make the database faster, and you will be able to generate models & CRUD with those relations.
You could also consider breaking the queries into smaller ones and executing them one by one.
As MySQL specialists say, use InnoDB until you can explicitly prove that you need MyISAM.
Performance alone is not a good argument.
If you have big application, you probably will face problems with table-level locks and data inconsistency. So use InnoDB.
Your performance issue may be connected to a lack of indexes, or to hard disk and file system issues.
I worked with tables having hundreds of millions rows which were updated and inserted constantly from several connections. Those tables had InnoDB engine.
Of course, if you have data that should be added in a bunch, add it using one INSERT statement.

Delete duplicate rows, do not preserve one row

I need a query that goes through each entry in a database, checks if a single value is duplicated elsewhere in the database, and if it is - deletes both entries (or all, if more than two).
Problem is the entries are URLs, up to 255 characters, with no way of identifying the row. Some existing answers on Stack Overflow do not work for me due to performance limitations, or they use uniqueid which obviously won't work when dealing with a string.
Long Version:
I have two databases containing URLs (and only URLs). One database has around 3,000 urls and the other around 1,000.
However, a large majority of the 1,000 urls were taken from the 3,000 url database. I need to merge the 1,000 into the 3,000 as new entries only.
For this, I made a third database with combined URLs from both tables, about 4,000 entries. I need to find all duplicate entries in this database and delete them (Both of them, without leaving either).
I have followed the query of a few examples on this site, but whenever I try to delete both entries it ends up deleting all the entries, or giving sql errors.
Alternatively:
I have the two separate databases. I need to check each row from one against the other to find the ones that aren't duplicates, and then add those to a third database.
Since you were looking for an SQL solution, here is one. Let's assume that your table has a single column for simplicity's sake; this will of course work for any number of fields:
CREATE TABLE `allkindsofvalues` (
`value` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The following series of queries will accomplish what you are looking for:
CREATE TABLE allkindsofvalues_temp LIKE allkindsofvalues;
INSERT INTO allkindsofvalues_temp SELECT * FROM allkindsofvalues akv1 WHERE (SELECT COUNT(*) FROM allkindsofvalues akv2 WHERE akv1.value = akv2.value) = 1;
DROP TABLE allkindsofvalues;
RENAME TABLE allkindsofvalues_temp to allkindsofvalues;
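The same keep-only-singletons sequence can be replayed in SQLite from Python to see what survives (the values here are made up; SQLite's ALTER TABLE ... RENAME stands in for MySQL's RENAME TABLE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE allkindsofvalues (value INTEGER NOT NULL)")
# 2 and 3 are duplicated; only 1 and 4 should survive
conn.executemany("INSERT INTO allkindsofvalues VALUES (?)",
                 [(1,), (2,), (2,), (3,), (3,), (3,), (4,)])

conn.executescript("""
    -- keep only values that occur exactly once (correlated subquery,
    -- as in the answer above)
    CREATE TABLE allkindsofvalues_temp AS
        SELECT * FROM allkindsofvalues akv1
        WHERE (SELECT COUNT(*) FROM allkindsofvalues akv2
               WHERE akv2.value = akv1.value) = 1;
    DROP TABLE allkindsofvalues;
    ALTER TABLE allkindsofvalues_temp RENAME TO allkindsofvalues;
""")

survivors = sorted(r[0] for r in
                   conn.execute("SELECT value FROM allkindsofvalues"))
```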
The OP wrote:
I've got my own PHP solution which is pretty hacky, but works.
I went with a PHP script to accomplish this, as I'm more familiar with PHP than MySQL.
This generates a simple list of urls that only exist in the target
database, but not both. If you have more than 7,000 entries to parse
this may take awhile, and you will need to copy/paste the results
into a text file or expand the script to store them back into a
database.
I'm just doing it manually to save time.
Note: Uses MeekroDB
<pre>
<?php
require('meekrodb.2.1.class.php');

DB::$user = 'root';
DB::$password = '';
DB::$dbName = 'testdb';

$all = DB::query('SELECT * FROM old_urls LIMIT 7000');

foreach($all as $row) {
    $test = DB::query('SELECT url FROM new_urls WHERE url=%s', $row['url']);
    if (!is_array($test)) {
        echo $row['url'] . "\n";
    } elseif (count($test) == 0) {
        echo $row['url'] . "\n";
    }
}
?>
</pre>

processing data with perl - selecting for update usage with mysql

I have a table that is storing data that needs to be processed. I have id, status, data in the table. I'm currently going through and selecting id, data where status = #. I'm then doing an update immediately after the select, changing the status # so that it won't be selected again.
My program is multithreaded, and sometimes two threads grab the same id because they query the table at nearly the same time. I looked into SELECT ... FOR UPDATE; however, either I wrote the query wrong, or I'm not understanding what it is used for.
My goal is to find a way of grabbing the id and data that I need and setting the status so that no other thread tries to grab and process the same data. Here is the code I tried. (I've written it all together here for show purposes; in the real program the prepares are set up once at the beginning, so a prepare isn't done every run, in case anyone was concerned.)
my $select = $db->prepare("SELECT id, data FROM `TestTable` WHERE _status=4 LIMIT ? FOR UPDATE") or die $DBI::errstr;
if ($select->execute($limit))
{
    while ($data = $select->fetchrow_hashref())
    {
        my $update_status = $db->prepare("UPDATE `TestTable` SET _status = ?, data = ? WHERE _id=?");
        $update_status->execute(10, "", $data->{_id});
        push(@array_hash, $data);
    }
}
When I run this with multiple threads, I get many duplicate inserts when inserting after processing my transaction data.
I'm not terribly familiar with MySQL, and the research I've done hasn't really cleared this up for me.
Thanks
As a sanity check, are you using InnoDB? MyISAM has zero transactional support, aside from faking it with full table locking.
I don't see where you're starting a transaction. MySQL's autocommit option is on by default, so starting a transaction and later committing would be necessary unless you turned off autocommit.
It looks like you simply rely on the database locking mechanisms. I googled perl dbi locking and found this:
$dbh->do("LOCK TABLES foo WRITE, bar READ");
my $sth  = $dbh->prepare("SELECT x,y,z FROM bar");
my $sth2 = $dbh->prepare("INSERT INTO foo SET a = ?");
$sth->execute();
while (my @ary = $sth->fetchrow_array()) {
    $sth2->execute($ary[0]);
}
$sth2->finish();
$sth->finish();
$dbh->do("UNLOCK TABLES");
Not really saying GIYF as I am also fairly novice at both MySQL and DBI, but perhaps you can find other answers that way.
Another option might be as follows, and this only works if you control all the code accessing the data. You can create a lock column in the table. When your code accesses the table, it does (pseudocode):
if row.lock != 1
    row.lock = 1
    read row
    update row
    row.lock = 0
    next
else
    sleep 1
    redo
Again, though, this trusts that all users/scripts accessing this data will agree to follow the policy. If you cannot ensure that, then this won't work.
Anyway, that's all the knowledge I have on the topic. Good luck!
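One way to tighten the lock-column pseudocode is to make the claim atomic: a single UPDATE both tests and flips the status, so two workers can never win the same row. A hypothetical Python/SQLite sketch of that pattern (column names mirror the question's TestTable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable (_id INTEGER PRIMARY KEY, _status INTEGER, data TEXT)")
conn.execute("INSERT INTO TestTable VALUES (1, 4, 'payload')")

def claim(conn, row_id):
    """Atomically flip _status from 4 (pending) to 10 (claimed).
    The rowcount tells us whether this caller won the row."""
    cur = conn.execute(
        "UPDATE TestTable SET _status = 10 WHERE _id = ? AND _status = 4",
        (row_id,))
    return cur.rowcount == 1

first  = claim(conn, 1)   # wins the row
second = claim(conn, 1)   # sees _status already 10, matches nothing
```

Because the test and the set happen in one statement, there is no window between "read lock" and "write lock" for another worker to slip through.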

Updating the db 6000 times will take a few minutes?

I am writing a test program with Ruby and ActiveRecord, and it reads a document
which is about 6000 words long. Then I just tally up the words with
recordWord = Word.find_by_s(word);
if (recordWord.nil?)
recordWord = Word.new
recordWord.s = word
end
if recordWord.count.nil?
recordWord.count = 1
else
recordWord.count += 1
end
recordWord.save
and so this part loops 6000 times... and it takes at least a few minutes to
run using sqlite3. Is that normal? I was expecting it to run
within a couple of seconds... can MySQL speed it up a lot?
With 6000 calls to write to the database, you're going to see speed issues. I would save the various tallies in memory and save to the database once at the end, not 6000 times along the way.
Take a look at AR:Extensions as well to handle the bulk insertions.
http://rubypond.com/articles/2008/06/18/bulk-insertion-of-data-with-activerecord/
I wrote up some quick code in perl that simply does:
1. Create the database
2. Insert a record that only contains a single integer
3. Retrieve the most recent record and verify that it returns what was inserted
And it does steps #2 and #3 6000 times. This is obviously a considerably lighter workload than an entire object/relational bridge. For this trivial case with SQLite it still took 17 seconds to execute, so your desire to have it take "a couple of seconds" is not realistic on traditional hardware.
Using the monitor I verified that it was primarily disk activity slowing it down. Based on that, if for some reason you really do need the database to behave that quickly, I suggest one of two options:
Do what people have suggested and find a way around the requirement
Try buying some solid state disks.
I think #1 is a good way to start :)
Code:
#!/usr/bin/perl
use warnings;
use strict;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=/tmp/dbfile', '', '');
create_database($dbh);
insert_data($dbh);

sub insert_data {
    my ($dbh) = @_;
    my $insert_sql   = "INSERT INTO test_table (test_data) values (?)";
    my $retrieve_sql = "SELECT test_data FROM test_table WHERE test_data = ?";
    my $insert_sth   = $dbh->prepare($insert_sql);
    my $retrieve_sth = $dbh->prepare($retrieve_sql);
    my $i = 0;
    while (++$i < 6000) {
        $insert_sth->execute(($i));
        $retrieve_sth->execute(($i));
        my $hash_ref = $retrieve_sth->fetchrow_hashref;
        die "bad data!" unless $hash_ref->{'test_data'} == $i;
    }
}

sub create_database {
    my ($dbh) = @_;
    my $status = $dbh->do("DROP TABLE test_table");
    # report if the DROP resulted in an error
    if (!defined $status) {
        print "DROP TABLE failed";
    }
    my $create_statement = "CREATE TABLE test_table (id INTEGER PRIMARY KEY AUTOINCREMENT, \n";
    $create_statement .= "test_data varchar(255)\n";
    $create_statement .= ");";
    $status = $dbh->do($create_statement);
    # die if the CREATE resulted in an error
    if (!defined $status) {
        die "CREATE failed";
    }
}
What kind of database connection are you using? Some databases allow you to connect 'directly' rather than using a TCP network connection that goes through the network stack. In other words, if you're making a network connection and sending data that way, it can slow things down.
Another way to boost performance of a database connection is to group SQL statements together in a single command.
For example, if you make a single 6,000-line SQL statement that looks like this
"update words set count = count + 1 where word = 'the'
update words set count = count + 1 where word = 'in'
...
update words set count = count + 1 where word = 'copacetic'"
and run it as a single command, performance will be a lot better. By default, MySQL has a 'packet size' limit of 1 megabyte, but you can raise that in the my.ini file if you want.
Since you're abstracting away your database calls through ActiveRecord, you don't have much control over how the commands are issued, so it can be difficult to optimize your code.
Another thing you could do would be to keep a count of words in memory, and then only insert the final total into the database, rather than doing an update every time you come across a word. That will probably cut down a lot on the number of updates, because if you update the count every time you come across the word 'the', that's a huge, huge waste. Words have a 'long tail' distribution, and the most common words are hugely more common than the obscure ones. Then the underlying SQL would look more like this:
"update words set count = 300 where word = 'the'
update words set count = 250 where word = 'in'
...
update words set count = 1 where word = 'copacetic'"
If you're worried about taking up too much memory, you can count words and periodically 'flush' them: read a couple of megabytes of text, then spend a few seconds updating the totals, rather than updating each word every time you encounter it. If you want to improve performance even more, you should consider issuing SQL commands in batches directly.
Without knowing about Ruby and Sqlite, some general hints:
create a unique index on Word.s (you did not state whether you have one)
define a default for Word.count in the database ( DEFAULT 1 )
optimize assignment of count:
recordWord = Word.find_by_s(word);
if (recordWord.nil?)
recordWord = Word.new
recordWord.s = word
recordWord.count = 1
else
recordWord.count += 1
end
recordWord.save
Use BEGIN TRANSACTION before your updates then COMMIT at the end.
OK, I found some general rules:
1) use a hash to keep the counts first, not the db
2) at the end, wrap all inserts or updates in one transaction, so that it won't hit the db 6000 times.
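Both rules can be sketched together in a few lines. This hypothetical Python/SQLite example tallies the words in a plain Counter first, then writes every total inside one transaction instead of 6000 round trips (the table and column names are made up to mirror the Word model):

```python
import sqlite3
from collections import Counter

def store_word_counts(conn, text):
    """Rule 1: count in memory; rule 2: one BEGIN ... COMMIT for all writes."""
    counts = Counter(text.split())          # in-memory tally, no db hits
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words (s TEXT PRIMARY KEY, count INTEGER)")
    with conn:                              # single enclosing transaction
        conn.executemany("INSERT INTO words (s, count) VALUES (?, ?)",
                         counts.items())
    return counts

conn = sqlite3.connect(":memory:")
counts = store_word_counts(conn, "the cat sat on the mat the end")
```

One INSERT per distinct word inside one transaction replaces a find/save round trip per word occurrence.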