How to create a linked mysql database - mysql

I have software that reads only one database, by name. However, every day I have to check for records that are 30+ days old, so my solution is to rename the database every day (appending a timestamp) and create a new one with the old name so my software can continue to run.
I need my software to read all of the databases but it can only read one. Is there a way to link the main database with the archived ones without copying the database? I don't think I can use MERGE because I won't be able to split the databases by day.
e.g.
Software only reads database MAINDB
Every day, a cron job renames the database: MAINDB becomes BKDB_2015_12_04. I can still access the database from MySQL because it's not a dumped database.
A new MAINDB is made for the software to read.
However, I need the software to read the data stored in BKDB_2015_12_04 and any other BKDB_* database.
I'd like to have the software, when reading MAINDB, also read BKDB_*
Essentially, I'm treating some databases as 'read-only' and I'm partitioning the data by day. I've been reading about PARTITION, but I'm dealing with an immense amount of data and I'm not sure PARTITION is effective at that scale.

Renaming and re-creating a database is a "bad idea". How can you ever be sure that your database is not being accessed when you rename and re-create?
For example, say I'm in the middle of a purchase and my basket is written to a database table as I add items to it (unlikely scenario but possible). I'm browsing around choosing more items. In the time I'm browsing, the existing database is renamed and a new one re-created. Instantly, my basket is empty with no explanation.
With backups, what happens if your database is renamed half-way through a backup? How can you be sure that all your other renamed databases are backed up?
One final thing - what happens in the long run with renamed databases? Are they left there forever? Are they dropped after a certain amount of time? etc.
If you're checking for records that are 30+ days old, the only solution you should be considering is to time-stamp each record. If your tables are linked via a single "master" table, put the time-stamp in there. Your queries can stay largely the same (apart from adding a check on the time-stamp) and you don't have to calculate database names for the past 30 days.
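A minimal sketch of that approach, assuming a hypothetical "orders" master table (the table and column names are placeholders, not from the question):
-- add the time-stamp once
ALTER TABLE orders ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;
-- the daily 30+ day check then becomes a simple predicate
SELECT * FROM orders WHERE created_at < NOW() - INTERVAL 30 DAY;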

You should be able to do queries on all databases with UNION, all you need to know is the names of the databases:
select * from MAINDB.table_name
union all
select * from BKDB_2015_12_04.table_name
union all
select * from database_name.table_name
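If the software can only be pointed at MAINDB, one option is to wrap that UNION in a view inside MAINDB, so the application still reads a single named object. This is only a sketch (the view and table names are placeholders), and the view would have to be recreated whenever a new BKDB_* database appears:
create view MAINDB.all_records as
select * from MAINDB.table_name
union all
select * from BKDB_2015_12_04.table_name;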

Related

MySQL backup pieces of the database from a server

I'm writing the back-end for a web app in Spring and it uses a MySQL database on an AWS RDS instance to keep track of user data. Right now the SQL tables are separated by user groups (just a value in a column), so different groups have different access to data. Whenever a person using the app does a certain operation, we want to back up their part of the database, which can be viewed later, or replace their data in the current branch if they want.
The only way I can figure out how to do this is to create separate copies of every table for each backup and keep another table to keep track of what all the names of the tables are. This feels very inelegant and labor intensive.
So far all operations I do on the database are SQL queries from the server, and I would like to stay consistent with that.
Is there a nice way to do what I need?
Why would you want a separate table for each backup? You could have a single table that mirrors the main table but has a few additional fields to record some metadata about the change, for example the person making it, a timestamp, and the type of change (update or delete). Whenever a change is made, simply copy the old value over to this table and you will then have a complete history of the state of the record over time. You can still enforce the group-based access by keeping that column.
As for doing all this with queries, you will need some for viewing or restoring these archived changes, but the simplest way to maintain the archived records is surely to create TRIGGERS on the main tables. If you add BEFORE UPDATE and BEFORE DELETE triggers, these can copy the old version of each record over to the archive (and also add the metadata at the same time) each time a record is updated or deleted.
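A rough sketch of such triggers, assuming a hypothetical user_data(id, group_id, payload) table (all names here are placeholders; the "person making the change" would normally have to be supplied by the application, so it is left out):
CREATE TABLE user_data_archive (
  id INT NOT NULL,                                        -- original key, deliberately not unique here
  group_id INT NOT NULL,                                  -- keeps the group-based access column
  payload TEXT,
  changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  change_type ENUM('update','delete') NOT NULL,
  KEY idx_id (id)
);

CREATE TRIGGER user_data_bu BEFORE UPDATE ON user_data
FOR EACH ROW
  INSERT INTO user_data_archive (id, group_id, payload, change_type)
  VALUES (OLD.id, OLD.group_id, OLD.payload, 'update');

CREATE TRIGGER user_data_bd BEFORE DELETE ON user_data
FOR EACH ROW
  INSERT INTO user_data_archive (id, group_id, payload, change_type)
  VALUES (OLD.id, OLD.group_id, OLD.payload, 'delete');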

Multiple MySQL databases all using the same schema

EDIT: To clarify throughout this post: when I say "schema" I am referring to "data-model," which are synonyms in my head. :)
My question is very similar to this question (Rails: Multiple databases, same schema), but mine is related to MySQL.
To reiterate the problem: I am developing a SAAS. The user will be given an option of which DB to connect to at startup. Most customers will be given two DBs: a production DB and a test DB, which means that every customer of mine will have 1-2 databases. So, if I have 10 clients, I will have about 20 databases to maintain. This is going to be difficult whenever the program (and datamodel) needs to be updated.
My question is: is there a way to have ONE datamodel for MULTIPLE databases? The accepted answer to the question I posted above is to combine everything into one database and use a company_id to separate out the data, but this has several foreseeable problems:
What happens when these transaction-based tables become inundated? My one customer right now has already recorded 16k transactions in the past month.
I'd have to add where company_id = to hundreds of SQL queries/updates/inserts (yes, Jeff Atwood, they're parametrized SQL calls), which I can only assume would have a severe impact on performance.
Some tables store metadata, i.e., drop-down menu items that will be company-specific in some cases and application-universal in others. where company_id = would add an unfortunate layer of complexity.
It seems logical to me to create (a) new database(s) for each new customer and point their software client to their database(s). But, this will be a headache to maintain, so I'm looking to reduce this potential headache.
Create deployment scripts for changes to the DB schema, keep an in-house database of all your customers, keep it updated, and have your scripts pull the connection strings from it.
Way better than trying to maintain a single database for all customers if your software package takes off.
FYI: I am currently with an organization that has ~4000 clients, all running separate instances of the same database (very similar, depending on the patch version they are on, etc) running the same software package. A lot of the customers are running upwards of 20-25k transactions per second.
A "database" in MySQL is called a "schema" by all the other database vendors. There are not separate databases in MySQL, just schemas.
FYI: (real) databases cannot have foreign keys between them, whereas schemas can.
Your test and production databases should most definitely not be on the same machine.
Use Tenant Per Schema; that way you don't have company_ids in every table.
Your database schema should either be generated by your ORM or it should be in source control in sql files, and you should have a script that automatically builds/patches the db. It is trivial to change this script so that it builds a schema per tenant.
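A hedged sketch of what the per-tenant part of such a build script might emit (the schema, table, and tenant names here are made up; the tenant list would come from your in-house customer database):
CREATE DATABASE IF NOT EXISTS tenant_acme;
CREATE TABLE IF NOT EXISTS tenant_acme.orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
  -- ...same columns as every other tenant's orders table
);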

Ruby/Bash script to backup my table and delete records

I save all the transactions in the DB instead of logs, but I don't want the table to get huge and slow, so I was thinking of creating a cron job to do something every few months like:
1- Back up the table to the hard drive
2- Move all the records to a new table, something like table_backup
3- Delete the records from the original table
This way, in case inserts start taking a long time on a huge table, the table will be freed up every few months.
Please note that I'm using Ruby with ActiveRecord models to access the DB tables. What do you think is the best way to do such a thing, and are there any alternatives to what I suggested?
I would suggest the following:
Redundancy - depending on how critical your data is, you may want redundant storage (e.g. a master-slave database setup, or the database on a RAID device)
Backups - have hourly/daily/weekly backups (again, depending on how critical it is to maintain these backups, how much space you can afford for them, how much traffic you're getting, and what the impact is on the database) of the entire database.
Truncation - have a cron task (check out the whenever gem which makes this easy) that deletes all entries older than some threshold (2 weeks?). There's no need to populate a new table just to delete old entries.
I believe these approaches are orthogonal, so you can pick whichever ones suit you, or implement the important one(s) first.
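For the truncation task above, the cron job could run something like the following (the table and column names are assumptions, and the 2-week threshold is just the example figure):
-- delete in modest batches so a huge purge doesn't lock the table for long
DELETE FROM transactions
WHERE created_at < NOW() - INTERVAL 14 DAY
LIMIT 10000;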

How do I combine two MySQL databases from different points in time?

I recently switched to a new hosting provider for my application. My employees used the old site until the new site went live; however, the database backup from the old site was taken two days before the new site went live. So in the midst of transferring, records were being entered into the old site's database while the new site had no record of them (hence my two-day time lag). How do I merge the two databases to reflect the changes?
A couple of things to note are the primary keys might be duplicated for some tables and there are only timestamps on a few tables as well. I would do a 'diff' or something of the sort, but the tables are dumped in different formats.
Any thoughts?
This is something where you'll need to actually understand your database schema. You'll need to create a program that can look at both versions of the database, identify which records are shared, which are not, and which have conflicting primary keys (vs ones which were updated with the same keys). It then needs to copy over changes, possibly replacing the value of primary keys (including the values in other rows that refer to the row being renumbered!) This isn't easy, and it's not an exact science - you'll be writing heuristics, and expect to do some manual repairs as well.
Next time shut that database down when you grab the final backup :)
You don't need to create any additional programs. All you need is to set up replication from the old DB to the new one.
All your data from the old DB will automatically transfer to the new DB. During this period you should use your old DB as the main data source. As soon as all the data has been copied to the new location, you just need to break the replica connection and change the DB address in your code (or DNS pointer) to the new one.
1. oldDB ===> replication ==> newDB
R/W operations
2. oldDB ==/= break ==/= newDB
R/W operations
MySQL Doc: 15.1.1. How to Set Up Replication
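A rough sketch of those replication steps (the hosts, user names, and log coordinates below are placeholders; the real coordinates come from SHOW MASTER STATUS on the old server):
-- on the old server: create a replication user
CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- on the new server: point it at the old one and start replicating
CHANGE MASTER TO
  MASTER_HOST = 'old-db.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;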

Database design for heavy timed data logging

I have an application where I receive about 40,000 rows of data each day. I have 5 million rows to handle (a 500 MB MySQL 5.0 database).
Currently, those rows are all stored in the same table => slow to update, hard to back up, etc.
What kind of scheme is used in such applications to allow long-term access to the data without the problems of overly large tables, with easy backups and fast reads/writes?
Is PostgreSQL better than MySQL for this purpose?
1 - 40000 rows / day is not that big
2 - Partition your data by insert date: you can easily delete old data this way.
3 - Don't hesitate to go through a datamart step (compute frequently-requested metrics in intermediate tables).
FYI, I have used PostgreSQL with tables containing several GB of data without any problem (and without partitioning). INSERT/UPDATE time was constant.
We have log tables of 100-200 million rows now, and it is quite painful:
Backup is impossible; it would require several days of downtime.
Purging old data is becoming too painful - it usually ties up the database for several hours.
So far we've only seen these solutions:
Backups: set up a MySQL slave. Backing up the slave doesn't impact the main DB. (We haven't done this yet - as the logs we load and transform come from flat files, we back up those files and can regenerate the DB in case of failure.)
Purging old data: the only painless way we've found is to introduce a new integer column that identifies the current date, and partition the tables (requires MySQL 5.1) on that key, per day. Dropping old data is a matter of dropping a partition, which is fast.
If, in addition, you need to run transactions continuously on these tables (as opposed to just loading data every now and then and mostly querying that data), you probably need to look into InnoDB rather than the default MyISAM tables.
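A sketch of that per-day partitioning (the table, column, and partition names are invented for illustration; the integer key is e.g. 20151204 for 2015-12-04):
CREATE TABLE log_events (
  day_key INT NOT NULL,            -- e.g. 20151204
  logged_at DATETIME NOT NULL,
  message TEXT
)
PARTITION BY RANGE (day_key) (
  PARTITION p20151204 VALUES LESS THAN (20151205),
  PARTITION p20151205 VALUES LESS THAN (20151206),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- dropping a day's data is then just dropping its partition
ALTER TABLE log_events DROP PARTITION p20151204;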
The general answer is: you probably don't need all that detail around all the time.
For example, instead of keeping every sale in a giant Sales table, you create records in a DailySales table (one record per day), or even a group of tables (DailySalesByLocation = one record per location per day, DailySalesByProduct = one record per product per day, etc.)
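A hedged illustration of that kind of roll-up, assuming a hypothetical sales(sold_at, location_id, amount) source table:
CREATE TABLE daily_sales_by_location (
  sale_date DATE NOT NULL,
  location_id INT NOT NULL,
  total_amount DECIMAL(12,2) NOT NULL,
  PRIMARY KEY (sale_date, location_id)
);

-- run once a day to summarise yesterday's detail rows
INSERT INTO daily_sales_by_location (sale_date, location_id, total_amount)
SELECT DATE(sold_at), location_id, SUM(amount)
FROM sales
WHERE sold_at >= CURDATE() - INTERVAL 1 DAY
  AND sold_at <  CURDATE()
GROUP BY DATE(sold_at), location_id;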
First, huge data volumes are not always handled well in a relational database.
What some folks do is to put huge datasets in files. Plain old files. Fast to update, easy to back up.
The files are formatted so that the database bulk loader will work quickly.
Second, no one analyzes huge data volumes. They rarely summarize 5,000,000 rows. Usually, they want a subset.
So, you write simple file filters to cut out their subset, load that into a "data mart" and let them query that. You can build all the indexes they need. Views, everything.
This is one way to handle "Data Warehousing", which is what your problem sounds like.
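The bulk-load step mentioned above might look roughly like this (the file path, delimiter, and table name are assumptions):
LOAD DATA INFILE '/var/data/sales_subset.tsv'
INTO TABLE mart_sales
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';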
First, make sure that your logging table is not over-indexed. By that I mean that every time you insert/update/delete from a table, any indexes you have also need to be updated, which slows down the process. If you have a lot of indexes specified on your log table, you should take a critical look at them and decide if they are indeed necessary. If not, drop them.
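A quick way to review the indexes on the log table (the table and index names here are placeholders):
SHOW INDEX FROM log_events;
-- if an index turns out to be unnecessary, drop it
ALTER TABLE log_events DROP INDEX idx_rarely_used;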
You should also consider an archiving procedure such that "old" log information is moved to a separate database at some arbitrary interval, say once a month or once a year. It all depends on how your logs are used.
This is the sort of thing that NoSQL DBs might be useful for, if you're not doing the sort of reporting that requires complicated joins.
CouchDB, MongoDB, and Riak are document-oriented databases; they don't have the heavyweight reporting features of SQL, but if you're storing a large log they might be the ticket, as they're simpler and can scale more readily than SQL DBs.
They're a little easier to get started with than Cassandra or HBase (different type of NoSQL), which you might also look into.
From this SO post:
http://carsonified.com/blog/dev/should-you-go-beyond-relational-databases/