I want to export updated data from MySQL/PostgreSQL to MongoDB every time a specified table changes or, if that's impossible, dump the whole table to NoSQL every X seconds/minutes. What can I do to achieve this? I've googled and found only paid, enterprise-level solutions, and those are out of reach for my amateur project.
SymmetricDS provides an open source database replication option that supports replicating an RDBMS database (MySQL, Postgres) into MongoDB.
Here is the specific documentation for setting up the Mongo target node in SymmetricDS.
http://www.symmetricds.org/doc/3.11/html/user-guide.html#_mongodb
There is also a blog about setting up Mongo in a bit more detail.
https://www.jumpmind.com/blog/mongodb-synchronization
To get online replication into a target database you can use:
Get the data stream at the same time in both databases
Enterprise solution which reads the transaction log and pushes the data to the next database
Check periodically for change dates > X
Export table periodically
Write changed records to a certain table with trigger and poll this table to select the changes
Push the changed data with triggers in a datastreamservice into the next database
Many additional approaches
Which solution fits your needs depends on how much time you want to invest and how much lag the data can tolerate.
If the amount of data or the number of transactions grows, some solutions that fit an amateur project won't fit anymore.
I am looking into migrating my MySQL DB to Azure Database for MySQL https://azure.microsoft.com/en-us/services/mysql/. It currently resides on a server hosted by another company. The DB is about 100 GB. (It worries me that Azure uses the term "relatively large" for 1GB.)
Is there a way to migrate the DB with little or no downtime (a few hours, max)? I obviously can't do a dump and load, as the downtime could be days. Their documentation seems to be for syncing with a MySQL server that is already on an MS server.
Is there a way to export the data out of MS Azure if I later want to use something else, again without significant downtime?
Another approach: use Azure Data Factory to copy the data from your MySQL source to your Azure DB. Set up a sync procedure that updates your Azure database with new rows. Then sync, take the MySQL DB offline, sync once more, and switch to the Azure DB.
See Microsoft online help
Don't underestimate the complexity of this migration.
With 100GB, it's a good guess that most rows in your tables don't get UPDATEd or DELETEd.
For my suggestion here to work, you will need a way to
SELECT * FROM table WHERE (the rows are new or updated since a certain date)
Some INSERT-only tables will have autoincrementing ID values. In this case you can figure out the ID cutoff value between old and new. Other tables may be UPDATEd. Unless those tables have timestamps saying when they were updated, you'll have a challenge figuring it out. You need to understand your data to do that. It's OK if your WHERE (new or updated) operation picks up some extra rows that are older. It's NOT OK if it misses INSERTed or UPDATEd rows.
Once you know how to do this for each large table, you can start migrating.
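To make the "new or updated" query concrete, here is a small sketch. It assumes (my assumption, not something known about your schema) that the table has an integer id column and an updated_at column, and it uses SQLite only so the example is runnable; the same WHERE clause works in MySQL. The timestamp comparison is deliberately inclusive, because picking up a few extra old rows is fine while missing rows is not.

```python
import sqlite3


def changed_rows(conn, table, id_cutoff=None, updated_since=None):
    """Select rows that are new (id above the cutoff) or updated since a timestamp."""
    clauses, params = [], []
    if id_cutoff is not None:
        clauses.append("id > ?")
        params.append(id_cutoff)
    if updated_since is not None:
        # >= rather than > so a row touched exactly at the cutoff is not missed.
        clauses.append("updated_at >= ?")
        params.append(updated_since)
    if not clauses:
        raise ValueError("need an id cutoff or a timestamp to find changed rows")
    # The table name is interpolated, so it must come from your own script,
    # never from user input.
    sql = f"SELECT * FROM {table} WHERE " + " OR ".join(clauses) + " ORDER BY id"
    return conn.execute(sql, params).fetchall()
```

For an INSERT-only table you would pass only id_cutoff; for a table that gets UPDATEd you would pass the time you started the previous migration pass as updated_since.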
Mass Migration: Keeping your old system online and active, you can use mysqldump to migrate your data to the new server. You can take as long as you require to do it. Read this for some suggestions: "getting Lost connection to mysql when using mysqldump even with max_allowed_packet parameter".
Then, you'll have a stale copy of the data on the new server. Make sure the indexes are correctly built. You may want to use OPTIMIZE TABLE on the newly loaded tables.
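If you script the mass migration, the dump step might look roughly like this. It is a sketch under assumptions: the host, user and database names are placeholders, the tables are assumed to be InnoDB (which is what makes --single-transaction give a consistent snapshot without locking out writers), and the password is assumed to come from an option file such as ~/.my.cnf rather than the command line.

```python
import subprocess


def mysqldump_command(host, user, database, tables=()):
    """Build a mysqldump argument list for dumping a live database."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB, no table locks
        "--quick",               # stream rows instead of buffering whole tables
        "--routines",            # include stored procedures and functions
        "--triggers",            # include triggers
        database,
        *tables,
    ]


def dump_to_file(cmd, path):
    """Run the dump and keep the output on disk so it can be reloaded later."""
    with open(path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

For example, dump_to_file(mysqldump_command("old-host", "migrator", "shop"), "shop.sql") would write the whole (placeholder) shop database to a file that you can load as often as you need.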
Update Migration: You can then use your WHERE (the rows are new or updated) queries to migrate the rows that have changed since you migrated the whole table. Again, you can take as long as you want to do this, keeping your old system online. It should take much less time than your first migration, because it will handle far fewer rows.
Final Migration, offline: Finally, you can take your system offline and migrate the remaining rows, the ones that changed since your last migration. And migrate your small tables in their entirety, again. Then start your new system.
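Because the changed-row queries may pick up rows that were already migrated, the load on the new server should be safe to repeat. A minimal sketch of such an idempotent load follows; the orders table and its columns are invented for the example, and SQLite's INSERT OR REPLACE stands in for MySQL's REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3


def apply_changed_rows(target, rows):
    """Upsert (id, total, updated_at) tuples; running it twice gives the same result."""
    with target:  # one transaction: commits on success, rolls back on error
        target.executemany(
            "INSERT OR REPLACE INTO orders (id, total, updated_at) VALUES (?, ?, ?)",
            rows,
        )
```

With a load like this, the update pass and the final offline pass can use exactly the same script, just with a later cutoff.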
Yeah but, you say, how will I know I did it right?
For best results, you should script your migration steps, and use the scripts. That way your final migration step will go quickly.
You could rehearse this process on a local server on your premises. While 100GiB is big for a database, it's not an outrageous amount of disk space on a desktop or server-room machine.
Save the very large extracted files from your mass migration step so you can re-use them when you flub your first attempts to load them. That way you'll save the repeated extraction load on your old system.
You should stand up a staging copy of your migrated database (at your new cloud provider) and test it with a staging copy of your application. You may be able to do this with a small subset of your rows. But do test your final migration step with this copy to make sure it works.
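One way to check a rehearsal or the staging copy is to compare row counts and a checksum per table between the old and the new server. The sketch below is my own suggestion rather than a standard tool; it assumes each table has an id column to order by and that both connections return values of the same Python types (different drivers may not). It is shown with SQLite connections so it runs as-is.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key="id"):
    """Return (row_count, sha256 hex digest) over all rows, ordered by the key column."""
    digest = hashlib.sha256()
    count = 0
    # Table and key names are interpolated, so they must come from your own script.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def compare_tables(source, target, tables):
    """Return the names of the tables whose fingerprints differ."""
    return [
        t for t in tables
        if table_fingerprint(source, t) != table_fingerprint(target, t)
    ]
```

An empty list from compare_tables means every listed table has the same rows on both sides; on very large tables you would probably fingerprint ranges of ids instead of the whole table at once.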
Be prepared for a fast rollback to the old system if the new one goes wrong.
AND, maybe this is an opportunity to purge out some old data before you migrate. This kind of migration is difficult enough that you could make a business case for extracting and then deleting old rows from your old server, before you start migrating.
Google says NO triggers, NO stored procedures, No views. This means the only thing I can dump (or import) is just a SHOW TABLES and SELECT * FROM XXX? (!!!).
Which means for a database with 10 tables and 100 triggers, stored procedures and views I have to recreate, by hand, almost everything? (either for import or for export).
(My boss thinks I am tricking him. He cannot understand how the employees before me did that replication to a bunch of computers with two clicks, while I need hours (or even days) to do this with an internet giant like Google.)
EDIT:
We have applications which are developed on local computers, where we use our local MySQL. These applications use MySQL DBs which consist of, say, n tables and 10*n triggers. For the moment we cannot even test google-cloud-sql, since that means almost everything (except the n almost empty tables) must be "uploaded" by hand. Nor can we test using a google-cloud-sql DB, since that means almost everything (except the n almost empty tables) must be "downloaded" by hand.
Until now we have done these "up-down"-loads by taking a decent mysqldump from the local or the "cloud" MySQL.
It's unclear what you are asking for. Do you want "replication" or "backups"? These are different concepts in MySQL.
If you want to replicate data to another MySQL instance, you can set up replication. This replication can be from a Cloud SQL instance, or to a Cloud SQL instance using the external master feature.
If you want to back up data to or from the server, check out these pages on importing data and exporting data.
As far as I understand, you want to create Cloud SQL replicas. There are a number of replica options in the docs; use the one that fits you best.
However, if by "replica" you meant cloning a Cloud SQL instance, you can follow the steps to clone your instance into a new and independent instance.
Some of these tutorials are done by using the GCP Console and can be scheduled.
I have an application that runs on a MySQL database, the application is somewhat resource intensive on the DB.
My client wants to connect Qlikview to this DB for reporting. I was wondering if someone could point me to a white paper or URL regarding the best way to do this without causing locks etc on my DB.
I have searched the Google to no avail.
QlikView is an in-memory tool with preloaded data, so your client only has to fetch data during periodic reloads, not all the time.
The best way is for your client to schedule a reload once per night and make it incremental. If your tables only ever get new records, load each night only the records with a primary key greater than the last one loaded.
If your tables have modified records, you need to add a last_modified_time field in MySQL, and probably an index on that field.
last_modified_time TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
If your rows get deleted, the best option is to mark them with deleted=1 in MySQL; otherwise your client will need to reload everything from those tables to find out which rows were deleted.
Additionally, to save resources, your client should load data in a really simple style, per table and without JOINs:
SELECT [fields] FROM TABLE WHERE `id` > $(vLastId);
QlikView is really good and fast at data modelling and joins, so your client can build the whole data model in QlikView.
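To show how the last_modified_time field and the deleted flag work together on the reporting side, here is a language-neutral sketch in Python (QlikView would do this in its own load script). The row dictionaries and the snapshot structure are invented for the example; the rows are assumed to come from a query such as SELECT ... WHERE last_modified_time > (value saved at the previous reload).

```python
def merge_increment(snapshot, changed_rows):
    """Apply rows changed since the last reload to a snapshot keyed by id.

    Returns the newest last_modified_time seen, or None if nothing changed,
    so the caller can save it as the starting point for the next reload.
    """
    last_seen = None
    for row in changed_rows:
        if row["deleted"]:
            # Soft-deleted in MySQL: drop it from the reporting copy.
            snapshot.pop(row["id"], None)
        else:
            # New or modified row: insert or overwrite.
            snapshot[row["id"]] = row
        if last_seen is None or row["last_modified_time"] > last_seen:
            last_seen = row["last_modified_time"]
    return last_seen
```

Each nightly reload then touches only the rows that changed, which keeps the load on the transactional database small.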
Reporting can indeed cause problems on a busy transactional database.
One approach you might want to examine is to have a replica (slave) of your database. MySQL supports this very well and your replica data can be as up to date as you require. You could then attach any reporting system to your replica to run heavy reports that won't affect your main database. This also gives you a backup (2nd copy) and the backup can further be used to create offline backups of your data also without affecting your main database.
There's lots of information on the setup of MySQL replicas so that's not too hard.
I hope that helps.
I'm planning to build a system that will have 30+ tables and 100+ million rows in a few of those. Going to use MySQL - InnoDB (any better alternative for this?)
My scripts are going to add a couple hundred thousand clicks to the database every day. On the other hand, I'd like to run heavy database queries during the day as well.
What I came up with is to have two different servers. Server A would take all the clicks and store them, and Server B would work on retrieving the results.
Question A: Is this the right approach? Question B: Is it possible to set up a script that clones the database from Server A to Server B, so the data is semi-up to date?
Edit: LEMP stack
You should not do this via a batch process that runs a large update every so often. Instead, use MySQL’s built-in replication features.
In particular, use a master-slave configuration. This allows you to keep multiple servers current in (essentially) real-time, while splitting reads (fast) from writes (slow) to get maximum performance.
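On the application side, splitting reads from writes can be as simple as picking a connection per statement. The class below is a hypothetical sketch, not part of MySQL or any driver: it takes two already-open connection objects (anything with an execute method) and sends SELECT and SHOW statements to the replica and everything else to the master. It assumes replication itself is configured on the servers.

```python
class ReadWriteRouter:
    """Route read-only statements to a replica and all other statements to the master."""

    READ_VERBS = {"SELECT", "SHOW"}

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def connection_for(self, sql):
        """Pick the connection based on the first keyword of the statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        # Anything that is not clearly a read goes to the master, to be safe.
        return self.replica if verb in self.READ_VERBS else self.master

    def execute(self, sql, params=()):
        return self.connection_for(sql).execute(sql, params)
```

In your setup the click-recording scripts would write through the master (Server A) and the heavy reports would read through the replica (Server B). Keep in mind that a read issued right after a write may not see it yet if the replica lags.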
I have an app that stores some data locally in SQLite.
I have a server that stores the same data in a MySQL database.
Both tables have a timestamp column that indicates when the row was last edited.
What I want to do is sync the data so it matches,
so if another device changes the central data on the server, the change is pushed down to all devices.
Currently I achieve this by ...
In the app I store the time I last made a server read.
I ask the server for all data that has changed since ...
I make a read about every 30 seconds.
My issue: what happens when the clocks change? (This will potentially cause issues.)
What is the standard way of achieving what I want? The project is very early in development, so I can change it if there is a much better way of achieving what I want.
Thanks