I am using an Oracle database but am open to using another database, so I am tagging all of them.
I am designing a system in which I have to load all the data from an existing database table into a new database, and whatever changes happen in the existing database should be reflected in the new database on a daily basis. My approach is:
I will copy all the data from the existing database to the new database.
Then I will create a trigger that records all the changes to the table (all the DML operations) and stores them in another table (see the sketch after these steps).
Once a day my API will read the data generated by the trigger and copy it into the new system. I don't need live data, so I will schedule the job only once a day to copy data into the new database.
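For illustration, the kind of change-log trigger I have in mind for step 2 is roughly the following Oracle sketch; the sales table and sales_changelog log table are just placeholder names:

CREATE OR REPLACE TRIGGER trg_sales_changelog
AFTER INSERT OR UPDATE OR DELETE ON sales
FOR EACH ROW
BEGIN
  -- Record which DML operation happened and when, keyed by the affected row
  INSERT INTO sales_changelog (change_time, operation, sale_id)
  VALUES (SYSTIMESTAMP,
          CASE WHEN INSERTING THEN 'I'
               WHEN UPDATING  THEN 'U'
               ELSE 'D'
          END,
          COALESCE(:NEW.id, :OLD.id));
END;
/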
Is this the proper approach? Any suggestions?
Common practice would be to back up your primary instance and restore it onto the secondary once a day.
You could schedule the backup and restore in sequence as daily jobs.
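On Oracle the daily job would typically be a Data Pump or RMAN export/import; if both ends happened to be SQL Server, for example, the two job steps could be as small as the following sketch (database names, file paths and logical file names are placeholders):

-- Step 1, on the primary: take a full backup to a shared location
BACKUP DATABASE SalesDb
TO DISK = N'\\backupshare\SalesDb_daily.bak'
WITH INIT;

-- Step 2, on the secondary: restore over yesterday's copy
RESTORE DATABASE SalesDb_Copy
FROM DISK = N'\\backupshare\SalesDb_daily.bak'
WITH REPLACE,
     MOVE 'SalesDb'     TO N'D:\Data\SalesDb_Copy.mdf',
     MOVE 'SalesDb_log' TO N'D:\Data\SalesDb_Copy_log.ldf';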
If your copy database is SQL Server, then I suggest you use a linked server. From the documentation:
Linked servers enable you to implement distributed databases that can fetch and update data in other databases. They are a good solution in the scenarios where you need to implement database sharding without need to create a custom application code or directly load from remote data sources. Linked servers offer the following advantages:
The ability to access data from outside of SQL Server.
The ability to issue distributed queries, updates, commands, and transactions on heterogeneous data sources across the enterprise.
The ability to address diverse data sources similarly.
You can find more information in the documentation: https://learn.microsoft.com/en-us/sql/relational-databases/linked-servers/linked-servers-database-engine?view=sql-server-ver15
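As a rough illustration of the daily copy through a linked server (the ORACLE_SRC linked server name and the table/column names below are placeholders, not something from your system):

-- One-time setup: register the remote source as a linked server
EXEC sp_addlinkedserver
     @server     = N'ORACLE_SRC',
     @srvproduct = N'Oracle',
     @provider   = N'OraOLEDB.Oracle',
     @datasrc    = N'ORCL';

-- Daily job: pull the remote rows and insert them into the local copy table
INSERT INTO dbo.SalesCopy (id, sale_date, amount)
SELECT id, sale_date, amount
FROM OPENQUERY(ORACLE_SRC, 'SELECT id, sale_date, amount FROM sales');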
I have an application that runs on a MySQL database, the application is somewhat resource intensive on the DB.
My client wants to connect Qlikview to this DB for reporting. I was wondering if someone could point me to a white paper or URL regarding the best way to do this without causing locks etc on my DB.
I have searched Google to no avail.
QlikView is an in-memory tool with preloaded data, so your client only has to pull data during periodic reloads, not all the time.
The best approach is for your client to set up a reload once per night and make it incremental. If your tables only ever receive new records, load each night only the records with a primary key greater than the last one loaded.
If your tables have records that get modified, you need to add a last_modified_time field in MySQL, and possibly an index on that field:
last_modified_time TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
If rows get deleted, the best approach is to mark them with deleted=1 in MySQL; otherwise your client will need to reload everything from those tables to find out which rows were deleted.
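A minimal sketch of those two schema changes in MySQL, assuming a placeholder sales table:

-- Adds a change timestamp, a soft-delete flag and an index for the incremental reload
ALTER TABLE sales
  ADD COLUMN last_modified_time TIMESTAMP
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ADD COLUMN deleted TINYINT(1) NOT NULL DEFAULT 0,
  ADD INDEX idx_last_modified_time (last_modified_time);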
Additionally, to save resources, your client should load the data per table in a really simple style, without JOINs:
SELECT [fields] FROM TABLE WHERE `id` > $(vLastId);
QlikView is really good and fast at data modelling/joins, so your client can build the whole data model inside QlikView.
Reporting can indeed cause problems on a busy transactional database.
One approach you might want to examine is to have a replica (slave) of your database. MySQL supports this very well, and your replica can be as up to date as you require. You can then attach any reporting system to the replica to run heavy reports that won't affect your main database. The replica also gives you a second copy of your data, which can in turn be used to create offline backups, again without touching your main database.
There's lots of information on the setup of MySQL replicas so that's not too hard.
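For orientation, once binary logging and a replication user exist on the primary, the replica side comes down to a couple of statements like these (host, credentials and binlog coordinates are placeholders; this is the classic pre-8.0.22 syntax):

-- On the replica: point it at the primary
CHANGE MASTER TO
  MASTER_HOST     = 'primary.example.com',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = '********',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 4;

START SLAVE;

-- Check that the IO and SQL threads are running
SHOW SLAVE STATUS;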
I hope that helps.
I am looking for a tip on how to synchronize data from a local Firebird database into an online DB. A few comments:
On a local machine I use sales software which keeps its data in a Firebird DB. There is an internet connection, but I want to avoid direct DB access (the PC is turned off after 9 pm).
I would like to create an online app (based on Foundation + PHP + a database) in which I will be able to view daily sales and explore past data.
From the local DB I will need to pull data from several different tables, and I would like to keep it in the online/final DB as a single table (with fields: #id, transaction date, transaction value, sales manager).
While I mostly know how to create the frontend of the app, and partially the backend, I still wonder what the best choice would be in terms of the DB. MySQL (my first thought)? Or should I rather focus on NoSQL?
What is your recommendation on data sync? Should I use SymmetricDS (pretty hard to configure) or an equivalent, or should I write a script that pushes the data from Firebird as JSON/XML? I am relying on your knowledge and best practices.
Set up a scheduled job that invokes a simple data pump / replication script.
From the script, connect to the source sales DB, retrieve the joined data added since the last replication, and insert it into the "online" database.
You may also keep Firebird as the online DB, as it works great with PHP.
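A rough sketch of the core query such a script could run against the local Firebird database, with placeholder table and column names matching the fields listed in the question (:last_id is the watermark the script stores between runs):

-- Pull only the transactions added since the last replicated id
SELECT t.id,
       t.transaction_date,
       t.transaction_value,
       m.name AS sales_manager
FROM transactions t
JOIN sales_managers m ON m.id = t.manager_id
WHERE t.id > :last_id
ORDER BY t.id;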
Firebird, as of version 2.5, already has all the technology built in to implement fully functional replication. We have implemented this in the largest installation for a big restaurant company, with about 0.6 billion records, about 1 million new records per day, and 150 locations where replicated servers work online or offline with the back-office software.
If you simply want to upload the data from your local DB to a remote DB, you can rent a virtual server from a provider you like, install Firebird there, and create a secure connection (we use SSH, but any TCP over VPN can be used). Copy your local database to the remote server and, if required, open the Firebird port (3050 or another) in the firewall. If your local database has a low number of writes, simply implement a trigger on each table that performs the same insert/update/delete with the same values on the remote database using the "execute statement on external" feature.
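A minimal sketch of such a trigger, with a placeholder sales table, remote path and credentials (when creating it in isql, wrap it in SET TERM to change the statement terminator):

CREATE TRIGGER trg_sales_push FOR sales
ACTIVE AFTER INSERT POSITION 0
AS
BEGIN
  -- Replay the same insert on the remote copy via execute statement on external
  EXECUTE STATEMENT
    ('insert into sales (id, sale_date, amount) values (?, ?, ?)')
    (NEW.id, NEW.sale_date, NEW.amount)
    ON EXTERNAL 'remotehost:/data/online_copy.fdb'
    AS USER 'SYSDBA' PASSWORD 'masterkey';
END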
When your local database has a higher workload, it is better to have the trigger write the change data (table name and PK values) into a log table and let a second connection upload the records to the target DB, where the same "execute statement on external" feature can be used.
This is just a hint at how to do it; if your budget allows, we can do it for you. However, turning the database PC off in the evening seems to be typical only for smaller companies.
I am working for a client who uses multiple RDS (MySQL) instances on AWS and wants me to consolidate data from there and other sources into a single instance and do reporting off that.
What would be the most efficient way to transfer selective data from other AWS RDS MySQL instances to mine?
I don't want to migrate the entire DB, rather just a few columns and rows based on which have relevant data and what was last created/updated.
One option would be to use a PHP script that reads from one DB and inserts into another, but that would be very inefficient. Unlike SQL Server or Oracle, MySQL also does not have the ability to run queries across servers, otherwise I would have just used that in a stored procedure.
I'd appreciate any inputs regarding this.
If your overall objective is reporting and analytics, the standard practice is to move your transactional data from RDS to Redshift which will become your data warehouse. This blog article by AWS provides an approach to do it.
For the consolidation operation, you can use AWS Database Migration Service (DMS), which allows you to migrate data column-wise with the following options:
Migrate existing data
Migrate existing data & replicate ongoing changes
Replicate data changes only
For more details read this whitepaper.
Note: If you need to process the data while moving, use AWS Data Pipeline.
Did you take a look at the RDS migration tool?
I'm a developer preparing the next version of the current solution at my company.
Currently, the application creates a database per client. For example:
db1,db2,db3.....,db100,db101...
And the server has 8 thousand databases.
This has benefits: CRUD performance, and backup or rollback per client database. The problem is noisy code in my application source.
I cannot define a static database config file because databases are created and removed dynamically. All of my code writes queries using the client's number, such as:
"select * from db"+clientNumber+".table"
or creates a new database connection for every request.
In our new version, I want to manage one database that combines all the current databases. If I do, some tables will have tens of millions of rows. Since I have never managed tables with tens of millions of rows, I cannot judge whether this is good practice. And how should the backup and rollback process be developed?
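To make the plan concrete, what I mean by combining is roughly the following shape (placeholder table and column names), where the client number becomes a column instead of part of a database name:

-- One shared table keyed by client number instead of one database per client
CREATE TABLE orders (
  client_id  INT    NOT NULL,   -- was encoded in the database name (db<clientNumber>)
  order_id   BIGINT NOT NULL,
  order_date DATE,
  amount     DECIMAL(12, 2),
  PRIMARY KEY (client_id, order_id)
);

-- The dynamic query "select * from db"+clientNumber+".table" then becomes:
SELECT * FROM orders WHERE client_id = ?;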
I have a Transaction database with 10,000+ entries inserted on a daily basis.
My client's requirement is that he can download reports from his own server; for this we maintain a copy of the Transaction database on his server.
But now the problem is: how do we move the latest data to his server at a specific time?
There are at least a couple of options in SQL Server.
If you can connect to your customer's database, Change Data Capture (CDC) with SSIS is one option. CDC collects all changes in a queryable store, which SSIS then reads and pushes to your target. You can be as selective as you want about what to move over, since you write the ETL process in SSIS. One downside of CDC is that it is available in Enterprise Edition only. See detailed instructions at https://technet.microsoft.com/en-us/library/bb895315(v=sql.105).aspx
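For orientation, enabling CDC comes down to a couple of system procedure calls like the sketch below (database, schema and table names are placeholders); SSIS then reads the generated change tables and pushes the rows to the target:

-- Enable CDC at the database level (Enterprise Edition)
USE TransactionsDb;
EXEC sys.sp_cdc_enable_db;

-- Track one table; its changes become queryable through the cdc.* change functions
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Transactions',
     @role_name     = NULL;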
Transactional replication is another option, available in both Enterprise and Standard editions. It has been around a long time and is used by a lot of organizations to do exactly what you described: incrementally move data to another database. It is not as flexible as CDC, but you can still apply filters to which rows/columns get moved, and not needing Enterprise Edition is helpful for many customers. There is lots of detail about the technology here: https://msdn.microsoft.com/en-us/library/ms151198(v=sql.105).aspx, but I highly encourage you to check out Kendra Little's excellent article that covers transactional replication and compares it with CDC: http://www.brentozar.com/archive/2013/09/transactional-replication-change-tracking-data-capture/
If you can't connect directly to the customer database, CDC with SSIS still works, but the output target will be a flat file which then gets transferred to the customer and loaded using another SSIS package or some other bulk load job (T-SQL, BCP, etc.). Do be careful with how the flat file gets moved, since anybody can see its contents.
I'd avoid any manual methods like creating triggers or running some (usually expensive) query to find the changed rows. Apart from the maintenance efforts, you're very likely to encounter tough performance issues.