How to get incremental MySQL data based on the binlog

I use the parameters below to enable the binary log on a single MySQL instance:
--log-bin=XXX
--server-id=XXX
--log-bin-index=XXX
--binlog-checksum=CRC32
--binlog_format=ROW
Everything starts from here:
I read one table at the REPEATABLE READ isolation level and dump all of its data somewhere. Meanwhile I mark this read transaction by writing something unique at its very beginning and very end, so that I can easily locate it in the binlog.
While I read the data, active transactions are modifying it at the same time.
When I finish reading by adding the finish mark and committing, other transactions that modified the data may commit before or after this transaction.
How can I get a precise diff (the incremental changes) of the table data by reading only the binlog next time?
Locking the table for each read is an option, but table locks are a heavy load, so that will be the last resort.
Are there any other options?

Related

Does MySQL guarantee that the binary log (binlog) writes the Table_map, Write_rows, and Xid events atomically?

I am trying to use the binlog to read incremental data, but insert/update/delete events record only a table id, while the table id itself is recorded in a Table_map event that precedes these modification events.
So are the insert/update/delete events and the Table_map event written to the binary log atomically, so that a modification event's table name can be determined from the closest preceding Table_map event?
If not, how do I get the table name for a given insert/update/delete event from its table id?
@BillKarwin
After reading another project, maxwell (https://github.com/zendesk/maxwell), the answer is yes: MySQL writes the insert/update/delete events transactionally, surrounded by a BEGIN (event type = Query) and a COMMIT (event type = Xid).
If I read the master binlog directly, there is one corner case AFAIK: if the server crashes while writing a transaction, the binlog will contain unclosed events. So I just need to cover this case.
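To make the idea concrete, here is a minimal sketch of how a reader could resolve table ids through Table_map events and ignore a transaction that never reached its Xid. It does not parse a real binlog: the events are plain dictionaries, and the event type names, field names, and the committed_transactions function are all hypothetical, chosen only for illustration. It also assumes every transaction is closed by an Xid event, which is a simplification.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical, already-decoded binlog events represented as dictionaries.
Event = Dict[str, Any]
ROW_EVENTS = {"write_rows", "update_rows", "delete_rows"}


def committed_transactions(
    events: Iterable[Event],
) -> Iterator[List[Tuple[str, Event]]]:
    """Yield one list of (table_name, row_event) per committed transaction."""
    table_names: Dict[int, str] = {}          # table id -> "schema.table"
    pending: List[Tuple[str, Event]] = []     # row events of the open transaction
    in_transaction = False

    for event in events:
        kind = event["type"]
        if kind == "begin":
            # A new BEGIN discards leftovers from a transaction that never closed.
            pending = []
            in_transaction = True
        elif kind == "table_map":
            # Remember which table this id refers to for the row events that follow.
            table_names[event["table_id"]] = f"{event['schema']}.{event['table']}"
        elif kind in ROW_EVENTS and in_transaction:
            pending.append((table_names[event["table_id"]], event))
        elif kind == "xid" and in_transaction:
            # The commit marker was seen, so the buffered events are safe to emit.
            yield pending
            pending = []
            in_transaction = False
    # Events still buffered here belong to an unclosed transaction and are dropped.
```

Buffering until the commit marker is what covers the crash case: row events that are not followed by an Xid are never handed to the consumer.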
So basically my solution is as follows.
At the very beginning, I fetch the complete data with the actions below:
FLUSH TABLES tableA, tableB, tableC WITH READ LOCK;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SHOW MASTER STATUS;
-- record the binlog filename and position
UNLOCK TABLES;
-- read the complete data
COMMIT;
With the actions above, I not only get the complete data but also record the corresponding binlog position.
Each time after this, I start reading the binlog from the position recorded last time and extract the data I am interested in, taking care of the corner case mentioned above, since the unclosed events may be rolled back by the MySQL server.
After reading the binlog, I record the position I have reached for the next iteration.
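Recording the position between iterations could look roughly like the sketch below. The file format (JSON), the checkpoint path, and the save_checkpoint / load_checkpoint names are assumptions made for this example, not part of MySQL or any binlog library. The position passed in is assumed to be the end of the last fully committed transaction, so that an unclosed transaction is read again next time.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def save_checkpoint(path: str, binlog_file: str, position: int) -> None:
    """Persist the binlog filename and position atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash never leaves a half-written checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"file": binlog_file, "pos": position}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # Atomically swap the new checkpoint into place.
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> Optional[Tuple[str, int]]:
    """Return (binlog_file, position), or None if no checkpoint exists yet."""
    try:
        with open(path) as handle:
            data = json.load(handle)
    except FileNotFoundError:
        return None
    return data["file"], data["pos"]
```

On the first run load_checkpoint returns None, which is the signal to take the full snapshot and store the position reported by SHOW MASTER STATUS.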
@BillKarwin, thanks for your help.

Getting stale results in multiprocessing environment

I am using 2 separate processes via multiprocessing in my application. Both have access to a MySQL database via sqlalchemy core (not the ORM). One process reads data from various sources and writes them to the database. The other process just reads the data from the database.
I have a query which gets the latest record from a table and displays the id. However, it always displays the first id that was created when I started the program rather than the latest inserted id (new rows are created every few seconds).
If I use a separate MySQL tool and run the query manually I get correct results, but SQLAlchemy is always giving me stale results.
Since you can see the changes your writer process is making with another MySQL tool that means your writer process is indeed committing the data (at least, if you are using InnoDB it does).
InnoDB shows you the state of the database as of when you started your transaction. Whatever other tools you are using probably have an autocommit feature turned on where a new transaction is implicitly started following each query.
To see the changes in SQLAlchemy do as zzzeek suggests and change your monitoring/reader process to begin a new transaction.
One technique I've used to do this myself is to add autocommit=True to the execution_options of my queries (an option available in SQLAlchemy releases before 2.0), e.g.:
result = conn.execute( select( [table] ).where( table.c.id == 123 ).execution_options( autocommit=True ) )
Assuming you're using InnoDB, the data on your connection will appear "stale" for as long as you keep the current transaction running, or until you commit the other transaction. In order for one process to see the data from the other process, two things need to happen: 1. the transaction that created the new data needs to be committed, and 2. the current transaction, assuming it has already read some of that data, needs to be rolled back or committed and started again. See The InnoDB Transaction Model and Locking.
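The effect described above can be reproduced without a MySQL server. The sketch below is only an analogy: it uses Python's built-in sqlite3 module with SQLite in WAL mode as a stand-in, because a SQLite read transaction in that mode also keeps a fixed snapshot until it ends. It is not InnoDB and not SQLAlchemy, and the table name and the demo_snapshot function are made up for the example.

```python
import os
import sqlite3
import tempfile


def demo_snapshot():
    """Show that an open read transaction keeps seeing its original snapshot."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None puts the connection in autocommit mode, so every
    # writer statement is committed immediately (like the separate MySQL tool).
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")  # stand-in for snapshot reads
    writer.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value TEXT)")
    writer.execute("INSERT INTO readings (value) VALUES ('first')")

    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("BEGIN")  # the reader opens a long-lived transaction
    before = reader.execute("SELECT MAX(id) FROM readings").fetchall()[0][0]

    # The writer commits a new row while the reader's transaction is still open.
    writer.execute("INSERT INTO readings (value) VALUES ('second')")

    # Same transaction, same snapshot: the new row is not visible yet.
    stale = reader.execute("SELECT MAX(id) FROM readings").fetchall()[0][0]

    # Ending the transaction lets the next query start from a fresh snapshot.
    reader.execute("COMMIT")
    fresh = reader.execute("SELECT MAX(id) FROM readings").fetchall()[0][0]

    reader.close()
    writer.close()
    return before, stale, fresh
```

The middle value is the "stale" read: it only changes once the reader's own transaction has been committed or rolled back, which matches the two conditions listed above.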

How to get hibernate failed transaction details

I'm writing a script to load records from a file into a MySQL DB using Hibernate. I'm processing records in batches of 1000 using transactions. An insert fails if the record already exists in the DB, which causes the entire transaction to be rolled back. Is there a way to know which records were processed in the rolled-back transaction?
Also, considering this scenario, is there a better way to do it? Note that the script runs daily, it is not a one-time load, and the file typically has about 250 million records per day.
You can use the StatelessSession API and check for a ConstraintViolationException; that way you can discard the failing record without rolling back the transaction.
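The general pattern behind that suggestion (insert record by record, catch the constraint violation, keep going) can be sketched as follows. This is not Hibernate code: it uses Python's built-in sqlite3 module purely as a stand-in, and the items table and the load_records function are hypothetical. It relies on the fact that in SQLite a failed INSERT aborts only that statement, not the surrounding transaction.

```python
import sqlite3
from typing import Iterable, List, Tuple

Record = Tuple[int, str]


def load_records(conn: sqlite3.Connection, records: Iterable[Record]) -> List[Record]:
    """Insert records one by one and return those rejected as duplicates."""
    rejected: List[Record] = []
    for record in records:
        try:
            conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", record)
        except sqlite3.IntegrityError:
            # Only this statement failed; earlier inserts in the batch are kept.
            rejected.append(record)
    conn.commit()  # commit everything that was accepted
    return rejected
```

The returned list answers the "which records failed" part of the question directly, since each failure is recorded at the moment it happens instead of being inferred after a rollback.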

Database dumping in mysql after certain checkpoints

I want to take a mysqldump after a certain checkpoint: for example, if I take a mysqldump now, then the next time I take a dump it should give me only the commands that were executed in the interval between the two. Is there any way to get this using mysqldump?
One more thing: how can I show DELETE and UPDATE commands in the mysqldump files?
Thanks
I don't think this is possible with mysqldump; however, that feature exists as part of the MySQL core. It's called the binary log (binlog).
The binary log contains “events” that describe database changes such as table creation operations or changes to table data. It also contains events for statements that potentially could have made changes (for example, a DELETE which matched no rows). The binary log also contains information about how long each statement took that updated data.
See http://dev.mysql.com/doc/refman/5.0/en/binary-log.html
Word of warning, binlogs can slow down the performance of your server.

What is binlog in mysql?

I've set the MySQL parameter innodb_flush_log_at_trx_commit=0, which means that MySQL flushes transactions to disk once per second. Is it true that if MySQL fails before this flush (for example because of a power-off), I will lose the data from these transactions? Or will MySQL save them in the data file (ibdata1) after each transaction regardless of the binlog flush?
Thanks.
The binary log contains “events” that describe database changes such as table creation operations or changes to table data. It also contains events for statements that potentially could have made changes (for example, a DELETE which matched no rows), unless row-based logging is used. The binary log also contains information about how long each statement took that updated data. The binary log has two important purposes:
For replication, the binary log on a primary replication server provides a record of the data changes to be sent to secondary servers. The primary server sends the events contained in its binary log to its secondaries, which execute those events to make the same data changes that were made on the primary.
Certain data recovery operations require the use of the binary log. After a backup has been restored, the events in the binary log that were recorded after the backup was made are re-executed. These events bring databases up to date from the point of the backup.
The binary log is not used for statements such as SELECT or SHOW that do not modify data.
https://dev.mysql.com/doc/refman/8.0/en/binary-log.html
Here is the entry in the MySQL reference manual for innodb_flush_log_at_trx_commit. You can lose the last second of transactions with the value set to 0.
Note that the binlog is actually something different that is independent of innodb and is used for all storage engines. Here is the chapter on the binary log in the MySQL reference manual.