Multiple MySQL databases all using the same schema - mysql

EDIT: To clarify throughout this post: when I say "schema" I am referring to "data-model," which are synonyms in my head. :)
My question is very similar to this question (Rails: Multiple databases, same schema), but mine is related to MySQL.
To reiterate the problem: I am developing a SaaS product. The user will be given an option of which DB to connect to at startup. Most customers will be given two DBs: a production DB and a test DB, which means that every customer of mine will have 1-2 databases. So, if I have 10 clients, I will have about 20 databases to maintain. This is going to be difficult whenever the program (and data model) needs to be updated.
My question is: is there a way to have ONE datamodel for MULTIPLE databases? The accepted answer to the question I posted above is to combine everything into one database and use a company_id to separate out the data, but this has several foreseeable problems:
What happens when these transaction-based tables grow very large? My one current customer has already recorded 16k transactions in the past month.
I'd have to add where company_id = to hundreds of SQL queries/updates/inserts (yes, Jeff Atwood, they're parameterized SQL calls), which I can only assume would have a severe impact on performance.
Some tables store metadata, e.g., drop-down menu items that will be company-specific in some cases and application-universal in others. where company_id = would add an unfortunate layer of complexity.
It seems logical to me to create (a) new database(s) for each new customer and point their software client to their database(s). But, this will be a headache to maintain, so I'm looking to reduce this potential headache.

Create scripts for deploying each change to the DB schema. Keep an in-house database of all customers, keep it updated, and have your deployment scripts pull each customer's connection string from it.
Way better than trying to maintain a single database for all customers if your software package takes off.
FYI: I am currently with an organization that has ~4000 clients, all running separate instances of the same database (very similar schemas, varying with the patch version they are on) under the same software package. A lot of the customers are running upwards of 20-25k transactions per second.

A "database" in MySQL is what all the other database vendors call a "schema." There are no separate databases in MySQL, just schemas.
FYI: (real) databases cannot have foreign keys between them, whereas schemas can.
Your test and production databases should most definitely not be on the same machine.
Use tenant-per-schema; that way you don't need a company_id in every table.
Your database schema should either be generated by your ORM or kept in source control as SQL files, and you should have a script that automatically builds/patches the DB. It is trivial to change this script so that it builds a schema per tenant.
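A schema-per-tenant patch script can discover every tenant schema from information_schema and apply the same change to each. A minimal sketch, assuming the hypothetical naming convention tenant_% and a hypothetical orders table:

```sql
-- Hypothetical patch: add a column to every tenant's `orders` table.
DELIMITER //
CREATE PROCEDURE patch_all_tenants()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE tenant VARCHAR(64);
  DECLARE cur CURSOR FOR
    SELECT schema_name FROM information_schema.schemata
    WHERE schema_name LIKE 'tenant\_%';
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO tenant;
    IF done THEN LEAVE read_loop; END IF;
    -- Build and run the DDL against this tenant's schema.
    SET @ddl = CONCAT('ALTER TABLE `', tenant, '`.orders ',
                      'ADD COLUMN notes TEXT NULL');
    PREPARE stmt FROM @ddl;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;
```

The same loop works for any DDL or data fix, which is what keeps 20 (or 4000) near-identical databases maintainable from one script.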

Related

Comparison between MySQL Federated, Trigger, and Event Schedule?

I have a very specific problem that requires multiple MYSQL DB instances, and I need to "sync" all data from each DB/table into one DB/table.
Basically, [tableA.db1, tableB.db2, tableC.db3] into [TableAll.db4].
Some of the DB instances are on the same machine, and some are on a separate machine.
About 80,000 rows are added to each table per day, and there are 3 such tables (one per DB).
So, about 240,000 rows would be "synced" into a single table per day.
I've just been using the Event Scheduler to copy the data from each DB into the "All-For-One" DB every hour.
However, I've been wondering lately if that's the best solution.
I considered using triggers, but I've been told they put a heavy burden on the DB.
Statement-level triggers might be better, but that depends too much on how the statements are formed.
Then I heard about FEDERATED tables (in Oracle terms, a "database link"), and I thought I could use them to link each table and create a VIEW over those tables.
But I don't know much about databases, so I don't really know the implication of each method.
So, my question is: considering the "All-For-One" DB only needs to be read-only, which method would be better, performance- and resource-wise, for copying data from multiple databases into one database regularly?
Thanks!
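For reference, the FEDERATED-plus-view idea described in the question might look like the following sketch (host, credentials, and column names are hypothetical). Note that FEDERATED tables fetch rows over the network on every query, so a view like this trades the hourly copy for per-query latency:

```sql
-- On db4's server: a FEDERATED table is a local "proxy" for a remote table.
CREATE TABLE tableA_remote (
  id INT NOT NULL,
  created_at DATETIME,
  payload VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=FEDERATED
  CONNECTION='mysql://user:pass@db1-host:3306/db1/tableA';

-- ...similar FEDERATED tables for tableB (db2) and tableC (db3)...

-- A view that presents all three as one read-only table.
CREATE VIEW TableAll AS
  SELECT * FROM tableA_remote
  UNION ALL SELECT * FROM tableB_remote
  UNION ALL SELECT * FROM tableC_remote;
```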

How to fill a SQL database with multiple tables for the first time

I have a general question regarding how to fill a database for the first time. Currently, I work on "raw" datasets within R (dataframes that I've built to explore the data and produce insights quickly), but I now need to structure and load everything into a relational database.
For the DB design, everything is OK (conceptual, logical, and 3NF). The result is a fairly "complex" (it's all relative) data model with many junction tables and foreign keys between tables.
My question is: what is now the easiest way for me to populate this DB?
My approach would be to generate a .csv for each table starting from my "raw" dataframes in R and then load them table by table into the DB. Is that a good way to do it, or is there an easier method? Another point: how do I avoid struggling with FK constraints while populating?
Thank you very much for the answers. I realize these are very "methodological" questions, but I can't find any tutorial/thread related to them.
Notes : I work with R (dplyr, etc.) and MySQL
A serious relational database, such as Postgres for example, will offer features for populating a large database.
Bulk loading
Look for commands that read external data into a table with a matching field structure. The data moves directly from a file in the OS's file system into the table. This is vastly faster than loading individual rows with the usual SQL INSERT. Such commands are not standardized, so you must look for the proprietary command in your particular database engine.
In Postgres that would be the COPY command.
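Since the question's notes mention MySQL, the equivalent there is LOAD DATA INFILE. A sketch, assuming a CSV exported from R with a header row (file path and table name are hypothetical):

```sql
LOAD DATA LOCAL INFILE '/tmp/authors.csv'
INTO TABLE authors
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the CSV header row written by R
```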
Temporarily disabling referential-integrity
Look for commands that defer enforcing the foreign key relationship rules until after the data is loaded.
In Postgres, use SET CONSTRAINTS … DEFERRED to not check constraints during each statement, and instead wait until the end of the transaction.
Alternatively, if your database lacks such a feature, as part of your mass import routine, you could delete your constraints before and then re-establish them after. But beware, this may affect all other transactions in all other database connections. If you know the database has no other users, then perhaps this is workable.
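In MySQL there is no SET CONSTRAINTS … DEFERRED, but foreign-key checking can be switched off for the duration of the loading session, which avoids having to load tables in dependency order:

```sql
SET FOREIGN_KEY_CHECKS = 0;  -- session-scoped: affects only this connection
-- ...run the bulk loads here, in any table order...
SET FOREIGN_KEY_CHECKS = 1;  -- re-enable; note existing rows are NOT re-validated
```

Because re-enabling does not re-check rows already loaded, it's worth validating the FK columns afterwards (e.g. with an anti-join looking for orphans).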
Other issues
For other issues to consider, see the Populating a Database in the Postgres documentation (whether you use Postgres or not).
Disable Autocommit
Use COPY (for mass import, mentioned above)
Remove Indexes
Remove Foreign Key Constraints (mentioned above)
Increase maintenance_work_mem (changing the memory allocation of your database engine)
Increase max_wal_size (changing the configuration of your database engine’s write-ahead log)
Disable WAL Archival and Streaming Replication (consider moving a copy of your database to replicant server(s) rather than letting replication move the mass data)
Run ANALYZE Afterwards (remind your database engine to survey the new state of the data, for use by its query planner)
Database migration
By the way, you will likely find a database migration tool helpful in creating the tables and columns, and possibly in loading the data. Consider tools such as Flyway or Liquibase.

How to create a linked mysql database

I have software that reads only one database by name. However, every day I have to check for records that are 30+ days old so my solution is to rename the database everyday (appending a timestamp) and create a new one with the old name so my software can continue to run.
I need my software to read all of the databases but it can only read one. Is there a way to link the main database with the archived ones without copying the database? I don't think I can use MERGE because I won't be able to split the databases by day.
e.g.
Software only reads database MAINDB
Every day, a cronjob renames the database: MAINDB becomes BKDB_2015_12_04. I can still access the database from mysql because it's not a dumped database.
A new MAINDB is made for the software to read.
However, I need the software to read the data stored in BKDB_2015_12_04 and any other database BKDP_*
I'd like to have the software, when reading MAINDB, also read BKDB_*
Essentially, I'm keeping some databases "read-only" and partitioning the data by day. I've been reading about PARTITION, but I'm dealing with an immense amount of data and I'm not sure PARTITION is effective at that scale.
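For reference, daily range partitioning on a single table, the usual alternative to one-database-per-day, might look like this sketch (table and column names hypothetical):

```sql
CREATE TABLE records (
  id      BIGINT NOT NULL AUTO_INCREMENT,
  created DATE NOT NULL,
  data    VARCHAR(255),
  PRIMARY KEY (id, created)   -- the partitioning column must be in the PK
)
PARTITION BY RANGE COLUMNS (created) (
  PARTITION p20151204 VALUES LESS THAN ('2015-12-05'),
  PARTITION p20151205 VALUES LESS THAN ('2015-12-06'),
  PARTITION pmax      VALUES LESS THAN (MAXVALUE)
);

-- A daily job can split off a fresh partition from the catch-all:
ALTER TABLE records REORGANIZE PARTITION pmax INTO (
  PARTITION p20151206 VALUES LESS THAN ('2015-12-07'),
  PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
```

Old days can then be archived or dropped with ALTER TABLE ... DROP PARTITION, which is much cheaper than a bulk DELETE.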
Renaming and re-creating a database is a "bad idea". How can you ever be sure that your database is not being accessed when you rename and re-create?
For example, say I'm in the middle of a purchase and my basket is written to a database table as I add items to it (unlikely scenario but possible). I'm browsing around choosing more items. In the time I'm browsing, the existing database is renamed and a new one re-created. Instantly, my basket is empty with no explanation.
With backups, what happens if your database is renamed halfway through a backup? How can you be sure that all your other renamed databases are backed up?
One final thing: what happens in the long run with the renamed databases? Are they left there forever? Are they dropped after a certain amount of time? Etc.
If you're checking for records that are 30+ days old, the only solution you should be considering is to time-stamp each record. If your tables are linked via a single "master" table, put the time-stamp in there. Your queries can stay largely the same (apart from adding a check on the time-stamp) and you don't have to calculate database names for the past 30 days.
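With a time-stamp column in place, the 30-day check becomes a single query against one database (table and column names hypothetical):

```sql
-- Rows older than 30 days, no per-day databases required.
SELECT *
FROM master_table
WHERE created_at < NOW() - INTERVAL 30 DAY;
```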
You should be able to run queries across all the databases with UNION; all you need to know is the names of the databases:
SELECT * FROM MAINDB.table_name
UNION ALL
SELECT * FROM BKDB_2015_12_04.table_name
UNION ALL
SELECT * FROM database_name.table_name
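If the software can only be pointed at one object name, the UNION can be wrapped in a view inside MAINDB so the application keeps reading a single name (view name hypothetical):

```sql
CREATE VIEW MAINDB.all_history AS
  SELECT * FROM MAINDB.table_name
  UNION ALL
  SELECT * FROM BKDB_2015_12_04.table_name;
-- The view must be re-created (e.g. by the same cronjob)
-- each time a new BKDB_* database appears.
```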

MS Access databases on slow network: Is it faster to separate back ends?

I have an Access database containing information about people (employee profiles and related information). The front end has a single console-like interface that modifies one type of data at a time (such as academic degrees from one form, contact information from another). It is currently linked to multiple back ends (one for each type of data, and one for the basic profile information). All files are located on a network share and many of the back ends are encrypted.
The reason I have done that is that I understand that MS Access has to pull the entire database file to the local computer in order to make any queries or updates, then put any changed data back on the network share. My theory is that if a person is changing a telephone number or address (contact information), they would only have to pull/modify/replace the contact information database, rather than pull a single large database containing contact information, projects, degrees, awards, etc. just to change one telephone number, thus reducing the potential for locked databases and network traffic when multiple users are accessing data.
Is this a sane conclusion? Do I misunderstand a great deal? Am I missing something else?
I realize there is the consideration of overhead with each file, but I don't know how great the impact is. If I were to consolidate the back ends, there is also the potential benefit of being able to let Access handle referential integrity for cascading deletes, etc., rather than coding for that...
I'd appreciate any thoughts or (reasonably valid) criticisms.
This is a common misunderstanding:
MS Access has to pull the entire database file to the local computer in order to make any queries or updates
Consider this query:
SELECT first_name, last_name
FROM Employees
WHERE EmpID = 27;
If EmpID is indexed, the database engine will read just enough of the index to find which table rows match, then read the matching rows. If the index includes a unique constraint (say EmpID is the primary key), the reading will be faster. The database engine doesn't read the entire table, nor even the entire index.
Without an index on EmpID, the engine would do a full table scan of the Employees table --- meaning it would have to read every row of the table to determine which rows have matching EmpID values.
But either way, the engine doesn't need to read the entire database ... Clients, Inventory, Sales, etc. tables ... it has no reason to read all that data.
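If EmpID is not already the primary key, the index this answer assumes can be added like so (index name hypothetical):

```sql
CREATE INDEX idx_empid ON Employees (EmpID);
-- Or, if EmpID uniquely identifies a row:
-- CREATE UNIQUE INDEX idx_empid ON Employees (EmpID);
```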
You're correct that there is overhead for connections to the back-end database files. The engine must manage a lock file for each database. I don't know the magnitude of that impact. If it were me, I would create a new back-end database and import the tables from the others. Then make a copy of the front-end and re-link to the back-end tables. That would give you the opportunity to examine the performance impact directly.
Seems to me relational integrity should be a strong argument for consolidating the tables into a single back-end.
Regarding locking, you shouldn't ever need to lock the entire back-end database for routine DML (INSERT, UPDATE, DELETE) operations. The database engine supports more granular locking, as well as pessimistic vs. optimistic locking --- whether the lock is taken as soon as you begin editing a row or is deferred until you save the changed row.
Actually "slow network" could be the biggest concern if slow means a wireless network. Access is only safe on a hard-wired LAN.
Edit: Access is not appropriate for a WAN network environment. See this page by Albert D. Kallal.
MS Access is not well suited to use over a local area network, let alone a wide area network, which will certainly be slower. The solution is to use a client-server database such as MS SQL Server or MySQL. MS SQL Server is much more capable than MySQL, but it is not free; consider MS SQL Server for large-scale projects. To repeat: MS Access is really only good for a single computer, not a computer network.

Best approach to relating databases or tables?

What I have:
A MySQL database running on Ubuntu that maintains a large table of articles (similar to WordPress).
A need to relate a given article to another set of data. This set of data will be fairly large.
There may be various sets of data that will be related.
The query:
Is it better to keep these various large sets of data within the same database as the articles, which will have a large amount of traffic on it?
or
Is it better to create different databases (on the same server) that relate by a primary key to the main database with the articles?
Put them all in the same DB initially, until you find that there is a performance issue. Much easier than prematurely optimising.
Modern RDBMS are very good at optimising data access.
If you need to connect frequently and read both sets of records, you should put them in the same database. The server then won't have to run permission checks twice, once for each of your databases.
If you have serious traffic, you should consider using persistent connections for that query.
If you don't need to read them together frequently, consider putting them on different machines, so that the high traffic on the bigger database won't cause slowdowns on the other.
Different databases on the same server gives you all the problems of a distributed architecture without any of the benefits of scaling out. One database per server is the way to go.
When you say 'same database' and 'different databases related' don't you mean 'same table' vs 'different tables'?
If that's the question, I'd say:
one table for articles
if these 'other sets of data' all have the same structure, put them all in the same table; if not, one table per kind of data
everything in the same database
if you grow big enough for database size to become a performance issue (after many millions of records and lots of queries per second), consider table partitioning, or maybe replacing the biggest table with a key/value store (CouchDB, MongoDB, Redis, Tokyo Cabinet, etc.), which can be a little faster than MySQL and a lot easier to distribute for performance