Foreign keys across different servers - mysql

I am looking to have one main database with global data such as users & subscriptions. In addition, each subscription will have its own database; I refer to these databases as children.
The databases will be located on different servers, and those servers may change from time to time. Because of this, child databases cannot (as far as I am aware) use foreign keys on data from the global database, e.g. linking a "tool" in the "tools" table, which has column user_id = 12, with the user in the global database.
The question is: is it OK for me to include columns in child databases that store ids referencing data in the global database? Are there facilities that I can put in place to recreate what foreign keys offer?
I am running MySQL 5.7, InnoDB engine. The system runs on Laravel 5.2.

Related

Could a federated table impact database performance?

I have some questions before implementing the following scenario:
I have Database A (it contains multiple tables with lots of data and is queried by multiple clients).
This database contains a users table on which I need to create some triggers, but the database is managed by a partner and we don't have permission to create triggers.
Database B is managed by me and is much lighter; its queries come from only one source. I need access to the users table data from Database A so I can create triggers and take action on every update, create or delete in the users table of Database A.
My main concern is: how will this federated table impact performance on Database A? Database B is not the problem.
Both databases are in the same geographic location, just on different servers.
My goal is to be able to take action on every transaction against the users table in Database A.
Queries that read federated tables definitely have performance issues.
https://dev.mysql.com/doc/refman/8.0/en/federated-usagenotes.html says:
A FEDERATED table does not support indexes in the usual sense; because access to the table data is handled remotely, it is actually the remote table that makes use of indexes. This means that, for a query that cannot use any indexes and so requires a full table scan, the server fetches all rows from the remote table and filters them locally. This occurs regardless of any WHERE or LIMIT used with this SELECT statement; these clauses are applied locally to the returned rows.
Queries that fail to use indexes can thus cause poor performance and network overload. In addition, since returned rows must be stored in memory, such a query can also lead to the local server swapping, or even hanging.
(emphasis mine)
The reason the federated engine was created was to support applications that need to write to tables at a rate greater than a single server can support. If you are inserting to a table and overwhelming the I/O of that server, you can use a federated table so you can write to a table on a different server.
Reading from federated tables is likely to be slower than reading local tables, and cannot be optimized with local indexes.
If you need good performance, you should use replication or a CDC tool, to maintain a real table on server B that you can query as a local table, not a federated table.
Another solution would be to cache the users table in the client application, so you don't have to read it on every query.
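As a rough illustration of the caching idea, here is a minimal Python sketch. The class name TableCache, the loader callable and the ttl_seconds parameter are all hypothetical names chosen for this example; the loader stands in for whatever query your application would run against the users table on server A, and rows are assumed to be dicts with an "id" key.

```python
import time


class TableCache:
    """Keeps a local copy of a small remote table, reloading it after a TTL.

    `loader` is an assumed callable returning an iterable of row dicts that
    each contain an "id" key; it stands in for the real remote query.
    """

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._rows = {}
        self._loaded_at = None       # None means "never loaded"

    def get(self, row_id):
        """Return the cached row for row_id, or None if it is not present."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # One remote read refreshes the whole local copy.
            self._rows = {row["id"]: row for row in self._loader()}
            self._loaded_at = now
        return self._rows.get(row_id)
```

The trade-off is staleness: a change made on server A is not visible locally until the TTL expires, so this only fits reads that can tolerate slightly old data.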

Merge data for a set of database tables in SSIS across databases

Is there a way in SSIS to effectively synchronize data in multiple tables with foreign key constraints across databases on different servers? Is it done with TableDiff, which is a command-line tool, or with some other built-in task?
On top of this, the tables that need to be synchronized have foreign key constraints with other tables outside the set. Does that mean we need to disable all the constraints before running the task?

How to fill for the first time a SQL database with multiple tables

I have a general question about how to fill a database for the first time. At the moment I work on "raw" datasets within R (dataframes that I've built to work and get insights quickly), but I now need to structure and load everything into a relational database.
The DB design is done (conceptual, logical and 3NF). The result is a fairly "complex" (it's all relative) data model with many junction tables and foreign keys between tables.
My question is: what is the easiest way for me to populate this DB?
My approach would be to generate a .csv for each table from my "raw" dataframes in R and then load them table by table into the DB. Is this a good way to do it, or is there an easier method? Also, how do I avoid struggling with FK constraints while populating?
Thank you very much for the answers. I realize these are very "methodological" questions, but I can't find any related tutorial or thread.
Notes: I work with R (dplyr, etc.) and MySQL.
A serious relational database, such as Postgres for example, will offer features for populating a large database.
Bulk loading
Look for commands that read in external data to be loaded into a table with a matching field structure. The data moves from a file in the OS’s file system directly into the table. This is vastly faster than loading individual rows with the usual SQL INSERT. Such commands are not standardized, so you must look for the proprietary commands in your particular database engine.
In Postgres that would be the COPY command.
Temporarily disabling referential-integrity
Look for commands that defer enforcing the foreign key relationship rules until after the data is loaded.
In Postgres, use SET CONSTRAINTS … DEFERRED to not check constraints during each statement, and instead wait until the end of the transaction. This only affects constraints that were declared DEFERRABLE.
Alternatively, if your database lacks such a feature, you could drop your constraints before the mass import and re-create them afterwards. But beware, this may affect all other transactions in all other database connections. If you know the database has no other users, then perhaps this is workable.
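To make the deferred-checking idea concrete, here is a small Python sketch. It uses SQLite (via the standard sqlite3 module) purely as a stand-in so the example is self-contained; the author and book tables and the make_db and load helpers are hypothetical. SQLite, like Postgres, checks a foreign key declared DEFERRABLE INITIALLY DEFERRED only at COMMIT, so child rows can be inserted before their parents within one transaction.

```python
import sqlite3


def make_db():
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN,
    # so the explicit BEGIN/COMMIT in load() define the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            author_id INTEGER NOT NULL
                REFERENCES author(id) DEFERRABLE INITIALLY DEFERRED
        )
        """
    )
    return conn


def load(conn, books, authors):
    """Insert child rows first, then parents; the FK is checked at COMMIT."""
    conn.execute("BEGIN")
    try:
        conn.executemany(
            "INSERT INTO book (id, title, author_id) VALUES (?, ?, ?)", books
        )
        conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)", authors)
        conn.execute("COMMIT")  # raises IntegrityError if a parent is missing
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
```

Calling load(conn, [(1, "Some title", 10)], [(10, "Some author")]) succeeds even though the book row is inserted before its author. If the author row were left out, the COMMIT would fail and the whole batch would be rolled back.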
Other issues
For other issues to consider, see Populating a Database in the Postgres documentation (whether you use Postgres or not). It covers:
Disable Autocommit
Use COPY (for mass import, mentioned above)
Remove Indexes
Remove Foreign Key Constraints (mentioned above)
Increase maintenance_work_mem (changing the memory allocation of your database engine)
Increase max_wal_size (changing the configuration of your database engine’s write-ahead log)
Disable WAL Archival and Streaming Replication (consider moving a copy of your database to replica server(s) rather than letting replication move the mass data)
Run ANALYZE Afterwards (remind your database engine to survey the new state of the data, for use by its query planner)
Database migration
By the way, you will likely find a database migration tool helpful in creating the tables and columns, and possibly in loading the data. Consider tools such as Flyway or Liquibase.

Multiple MySQL databases all using the same schema

EDIT: To clarify throughout this post: when I say "schema" I am referring to "data-model," which are synonyms in my head. :)
My question is very similar to this question (Rails: Multiple databases, same schema), but mine is related to MySQL.
To reiterate the problem: I am developing a SAAS. The user will be given an option of which DB to connect to at startup. Most customers will be given two DBs: a production DB and a test DB, which means that every customer of mine will have 1-2 databases. So, if I have 10 clients, I will have about 20 databases to maintain. This is going to be difficult whenever the program (and datamodel) needs to be updated.
My question is: is there a way to have ONE datamodel for MULTIPLE databases? The accepted answer to the question I posted above is to combine everything into one database and use a company_id to separate out the data, but this has several foreseeable problems:
What happens when these transaction-based tables become inundated? My 1 customer right now has recorded 16k transactions already in the past month.
I'd have to add where company_id = to hundreds of SQL queries/updates/inserts (yes, Jeff Atwood, they're Parametrized SQL calls), which I can only assume would have a severe impact on performance.
Some tables store metadata, i.e., drop-down menu items that will be company-specific in some cases and application-universal in others. where company_id = would add an unfortunate layer of complexity.
It seems logical to me to create (a) new database(s) for each new customer and point their software client to their database(s). But, this will be a headache to maintain, so I'm looking to reduce this potential headache.
Write deployment scripts for changes to the DB schema. Keep an in-house database of all customers, keep it up to date, and have your scripts pull each customer's connection string from it.
Way better than trying to maintain a single database for all customers if your software package takes off.
FYI: I am currently with an organization that has ~4000 clients, all running separate instances of the same database (very similar, depending on the patch version they are on, etc) running the same software package. A lot of the customers are running upwards of 20-25k transactions per second.
A "database" in MySQL is called a "schema" by all the other database vendors. There are not separate databases in MySQL, just schemas.
FYI: (real) databases cannot have foreign keys between them, whereas schemas can.
Your test and production databases should most definitely not be on the same machine.
Use Tenant Per Schema, that way you don't have company_ids in every table.
Your database schema should either be generated by your ORM or it should be in source control in sql files, and you should have a script that automatically builds/patches the db. It is trivial to change this script so that it builds a schema per tenant.
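A build/patch script of this kind could look roughly like the following Python sketch. Everything named here is an assumption made for illustration: the MIGRATIONS list, the schema_version bookkeeping table, the customer and invoice tables, and the migrate and migrate_all helpers. SQLite in-memory databases stand in for the per-tenant MySQL schemas so that the example runs on its own; in a real setup the connections would come from your list of tenants.

```python
import sqlite3

# Ordered, append-only list of schema changes. In practice these would live
# in .sql files under source control. Table names are hypothetical.
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " total_cents INTEGER NOT NULL)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
]


def migrate(conn):
    """Apply every migration this database has not seen yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() over an empty table yields NULL
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
    return len(MIGRATIONS)


def migrate_all(tenants):
    """Run the same migrations against each tenant's database connection."""
    return {name: migrate(conn) for name, conn in tenants.items()}
```

Because each database records the last version it applied, the script can be re-run safely, and tenants that are several versions behind catch up in one pass.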

Will NEWSEQUENTIALID() SQL always generate unique ID irrespective of Host computer?

This is a generic question. My scenario: I have a DB (MS SQL) and create a table with a uniqueidentifier column whose values are assigned using NEWSEQUENTIALID(). I know the id will always be unique. But what if I deploy the same DB on three machines (two are transactional DBs and the third is a replication DB)? In the replication DB, I will change the column so that it does not assign a value by itself. I will replicate the data from the two transactional DBs to the replication DB daily. The question is: will the ids generated on the two transactional DBs be unique when I replicate them to the replication DB? That is, are the generated ids unique across machines, or only on one machine?
Yes, it will still be globally unique.
Have a look at the MSDN page for it:
http://msdn.microsoft.com/en-gb/library/ms189786.aspx
By "Specified Computer" it is referring to the fact that each GUID will be greater than those previously generated. So being greater than the last one generated is only guaranteed on that machine. Its uniqueness is global.