Copy MariaDB database with selected data - mysql

I have a live shop database. I need the ability to make copies of this database, but with order data only for the last 14 days. The database can be really big, but almost 80 percent of the data is in the order, payment and related tables. So we want to copy only the last 14 days of data from those tables, and all of the data from the other tables. How can this be implemented?

To me this sounds like a classic ETL job. You could use any programming language (like Python) or a tool such as KNIME that reads from the source DB (with an SQL query using a WHERE clause like your_date_column >= CURDATE() - INTERVAL 14 DAY) and writes to a sink DB.
You can then run it as a (cron) job on Windows/Linux and create a backup for the last 14 days each day, but make sure you also delete/drop the older backups if the total size of the backups gets too big.
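If both databases are reachable from the same MariaDB/MySQL server, the same filter can even be applied without an external tool, using cross-database INSERT ... SELECT statements. A minimal sketch; the database, table and column names here (live_db, shop_copy, created_at, order_id) are assumptions, not taken from the question:

```sql
-- Small reference tables are copied in full:
INSERT INTO shop_copy.products SELECT * FROM live_db.products;

-- The large tables are filtered to the last 14 days:
INSERT INTO shop_copy.orders
SELECT * FROM live_db.orders
WHERE created_at >= CURDATE() - INTERVAL 14 DAY;

-- Child tables follow their parent via a join on the assumed order_id key:
INSERT INTO shop_copy.payments
SELECT p.*
FROM live_db.payments AS p
JOIN live_db.orders   AS o ON o.id = p.order_id
WHERE o.created_at >= CURDATE() - INTERVAL 14 DAY;
```

If the copy has to live on a different server, mysqldump's --where option can apply the same per-table filter while dumping.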

Related

How to sync records with ETL to a datawarehouse in NRT

I am a newbie with all this ETL stuff.
I wonder what the best solutions are with tools like PDI (Pentaho Data Integration) to sync some records from operational databases to a data warehouse.
I am in a near-real-time context (so I don't want to sync data once a day, but every 5 minutes, for example).
Three ways immediately come to mind:
1. Using an indexed time column on the operational database.
Ex: SELECT * FROM records WHERE date > NOW() - INTERVAL 5 MINUTE
But I can still miss some records or get duplicates, etc.
2. Using a table or a sync column.
Ex: SELECT * FROM records WHERE synced = 0
3. Using a queue service.
Ex: at record creation, publishing an event to RabbitMQ (or any other tool) saying that something is ready to be synced.
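For the second option, the sync-column pattern can be sketched like this (table and column names are assumptions; the upper id bound is captured first because new unsynced rows may arrive between the SELECT and the UPDATE):

```sql
-- Remember the highest id present before reading the batch:
SELECT MAX(id) INTO @max_id FROM records WHERE synced = 0;

-- Pull the unsynced rows up to that bound:
SELECT * FROM records WHERE synced = 0 AND id <= @max_id;

-- ...load those rows into the warehouse, then mark exactly that batch:
UPDATE records SET synced = 1 WHERE synced = 0 AND id <= @max_id;
```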

How to create a linked mysql database

I have software that reads only one database by name. However, every day I have to check for records that are 30+ days old, so my solution is to rename the database every day (appending a timestamp) and create a new one with the old name so my software can continue to run.
I need my software to read all of the databases, but it can only read one. Is there a way to link the main database with the archived ones without copying the data? I don't think I can use MERGE because I won't be able to split the databases by day.
e.g.
Software only reads database MAINDB
Every day, a cronjob renames the database: MAINDB becomes BKDB_2015_12_04. I can still access the database from MySQL because it's not a dumped database.
A new MAINDB is made for the software to read.
However, I need the software to read the data stored in BKDB_2015_12_04 and any other BKDB_* database.
I'd like to have the software, when reading MAINDB, also read BKDB_*
Essentially, I'm having some databases 'read-only' and I'm partitioning the data by day. I'm reading about using PARTITION but I'm dealing with an immense amount of data and I'm not sure if PARTITION is effective in dealing with this amount of data.
Renaming and re-creating a database is a "bad idea". How can you ever be sure that your database is not being accessed when you rename and re-create?
For example, say I'm in the middle of a purchase and my basket is written to a database table as I add items to it (unlikely scenario but possible). I'm browsing around choosing more items. In the time I'm browsing, the existing database is renamed and a new one re-created. Instantly, my basket is empty with no explanation.
With backups, what happens if your database is renamed half-way through a backup? How can you be sure that all your other renamed databases are backed up?
One final thing: what happens in the long run with the renamed databases? Are they left there forever? Are they dropped after a certain amount of time? Etc.
If you're checking for records that are 30+ days old, the only solution you should be considering is to time-stamp each record. If your tables are linked via a single "master" table, put the time-stamp in there. Your queries can stay largely the same (apart from adding a check on the time-stamp) and you don't have to calculate database names for the past 30 days.
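A sketch of that time-stamp approach (the table and column names are assumptions):

```sql
-- Add an automatically maintained creation time-stamp to the master table:
ALTER TABLE master
  ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;

-- "Records that are 30+ days old" then becomes a plain filter,
-- with no database renaming involved:
SELECT * FROM master
WHERE created_at < NOW() - INTERVAL 30 DAY;
```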
You should be able to do queries on all databases with UNION, all you need to know is the names of the databases:
select * from MAINDB.table_name
union all
select * from BKDB_2015_12_04.table_name
union all
select * from database_name.table_name
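If the software can only be pointed at MAINDB, one way to hide the UNION from it is a view inside MAINDB (a sketch; the view name is an assumption):

```sql
-- The application keeps querying a single object in MAINDB:
CREATE OR REPLACE VIEW MAINDB.combined_table_name AS
SELECT * FROM MAINDB.table_name
UNION ALL
SELECT * FROM BKDB_2015_12_04.table_name;
```

The view has to be recreated whenever a new BKDB_* database appears, e.g. from the same cron job that does the renaming.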

SQL Data synchronization between production and reporting server

I have one production server which will store data from day 1 (the latest data) up to day 90.
I will move day-91 data to the reporting server every day as new data enters the production server.
My reporting server will keep 365 days of data.
Production will keep 90 days of data.
There are still some daily data updates in production across the whole 90 days. How should I synchronize the changes in the production data (90 days) with my reporting data (365 days)?
Please advise.
And for the day 91 data import to reporting, is it the best way to use SSIS import wizard?
Thanks in advance.
No, don't use the SSIS wizard. You cannot achieve what you want through the wizard.
You'll need something to move the data. If the two databases are on the same server, you don't need SSIS; you can just use INSERT/SELECT SQL statements to move the data. If the DBs are on different servers (or are expected to be in the future), then you need an ETL tool, of which SSIS may be your best option.
I suggest you store ALL data in your reporting database, i.e. day 1 to 365. Then you do all your reporting from the reporting database instead of trying to stitch the two databases together.
How do you identify day 91? Is there a single field you can use to do this in the source?
The simplest approach is a rolling window approach. You delete day 0 to, say, day 20 in your reporting database. Then you load that same window over the top from production.
The other approach is a full CDC approach but if you have a reliable 'age' field that you can use, this won't be necessary.
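The rolling-window refresh can be sketched as follows (the database, table and column names are assumptions, and it presumes both databases are reachable from one SQL session):

```sql
-- Drop the most recent 20 days from reporting, then reload that same
-- window from production so any late updates are picked up:
DELETE FROM reporting.sales
WHERE sale_date >= CURDATE() - INTERVAL 20 DAY;

INSERT INTO reporting.sales
SELECT * FROM production.sales
WHERE sale_date >= CURDATE() - INTERVAL 20 DAY;
```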

Delete data from mysql innodb tables after one month is passed

Currently I am using cron for this. I thought it might be possible to implement a procedure that removes all data from the database that is older than one month, but I am not sure this is the best way.
The problem is that we have many servers with many cron processes, controlled by a very small amount of staff, and we need to keep it clear and easy to manage; that's why I don't want such a cron process.
The data in the table I want to delete from is statistics; a huge amount of it is inserted every day, and if it is not deleted the database will become unbelievably huge (about ~500M every day, which for us is quite a lot: 500M * 365 days is 182.5G per year).
Is it possible to delete data using some procedure in MySQL (perhaps after a new row is added), and is that a good idea?
If you're intending on moving away from cron jobs, you could always create an event that runs at a scheduled frequency.
Whatever you do, it's a very bad idea to delete data every time a new row is added, as it'll slow down your insert and it's more likely to fragment your tables.
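A sketch of such a scheduled event (the table and column names are assumptions; the event scheduler must be switched on once):

```sql
SET GLOBAL event_scheduler = ON;

CREATE EVENT purge_old_statistics
ON SCHEDULE EVERY 1 DAY
DO
  DELETE FROM statistics
  WHERE created_at < NOW() - INTERVAL 1 MONTH;
```

On a table growing by ~500M a day, deleting in smaller batches (adding a LIMIT clause and repeating until no rows remain) keeps each transaction short and reduces locking.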

Can I use a "last update" timestamp to select MySQL records for update?

I have a MySQL database with about 30,000 rows. I update this database to a remote server nightly, but no more than 50 rows are ever updated at a time. I am the only one who updates the database. I would like to develop a method in which only CHANGED rows are exported to the remote server.
To save space in the database and to save time when I export to the remote server, I have built "archive" tables (on the remote server) with records that will no longer be updated, and which do not reside on the local database. But I know splitting up this data into multiple tables is bad design that could lead to problems if the structure of the tables ever needs to be changed.
So I would like to rebuild the database so that ALL the records with similar table structures are in a single table (like they were when the database was much smaller). The size of the resulting table (with all archived records) would exceed 80,000 rows, much too large to export as a whole-database package.
To do this, I would like to
(1) Update a "last updated" timestamp in each row when the row is added or modified
(2) Select only rows in tables for export when their "last update" timestamp is greater than the timestamp of the last export operation
(3) Write a query that builds the export .sql file with only new and updated rows
(4) Update the timestamp for the export operation to be used for comparison during the next export
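Steps (2) and (4) can be sketched with a small bookkeeping table (export_log is a hypothetical name, as is the mytable/last_updated pair):

```sql
-- One row per completed export run:
CREATE TABLE IF NOT EXISTS export_log (exported_at TIMESTAMP NOT NULL);

-- (2)/(3): rows changed since the last recorded export
SELECT *
FROM mytable
WHERE last_updated >
      (SELECT COALESCE(MAX(exported_at), '1970-01-02') FROM export_log);

-- (4): record this export run for the next comparison
INSERT INTO export_log (exported_at) VALUES (NOW());
```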
Has anyone ever done this? If so, I would be grateful for some guidance on how to accomplish this.
Steve
If you add a TIMESTAMP column declared with ON UPDATE CURRENT_TIMESTAMP, for example last_updated, it will be automatically set to now() every time the row changes.
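For example, such a column could be declared like this (table and column names assumed; spelling out the ON UPDATE clause avoids relying on the legacy implicit behaviour of the first TIMESTAMP column):

```sql
ALTER TABLE mytable
  ADD COLUMN last_updated TIMESTAMP NOT NULL
    DEFAULT CURRENT_TIMESTAMP
    ON UPDATE CURRENT_TIMESTAMP;
```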
Then early every day, simply ship yesterday's changes:
select * from mytable
where last_updated between subdate(CURDATE(), 1) and CURDATE()
Why not just set up the remote server as a replication slave? MySQL will only send the updated rows in that situation, and very quickly/efficiently at that.
Using an official replication strategy is generally advisable rather than rolling your own. You'll have lots of examples to work from and lots of people who understand what's going on if you run into problems.
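Pointing the replica at the source server looks roughly like this (a sketch; the host, credentials and binary-log coordinates are placeholders, and the source server needs server_id and log_bin configured first):

```sql
CHANGE MASTER TO
  MASTER_HOST = 'production.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
```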