MySQL Multi-Tenant Application - Too Many Tables & Performance Issues

I am developing a multi-tenant application where, for each tenant, I create a separate set of 50 tables in a single MySQL database in a LAMP environment.
In each set the average table size is 10 MB, with the exception of about 10 tables whose size is between 50 and 200 MB.
MySQL InnoDB creates two files (.frm and .ibd) for each table.
For 100 tenants there will be 100 x 50 = 5,000 tables x 2 files = 10,000 files.
That looks too high to me. Am I doing this the wrong way, or is it common in this kind of scenario? What other options should I consider?
I also read this question, but it was closed by moderators, so it did not attract many thoughts.

Have one database per tenant. That would be 100 directories, each with 2*50 = 100 files. 100 is reasonable; 10,000 items in a directory is dangerously high in most operating systems.
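A minimal sketch of that layout (the database and table names here are illustrative, not from the question):
-- One database per tenant; the 50 per-tenant tables live inside it.
CREATE DATABASE tenant_123;

CREATE TABLE tenant_123.table1 (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    data VARCHAR(255)
);
-- ...the remaining 49 tables follow the same pattern,
-- and the whole set is repeated for tenant_124, tenant_125, and so on.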
Addenda
If you have 15 tables that are used by all tenants, put them in one extra database. If you call that db Common, then consider these snippets:
USE Tenant; -- Customer starts in his own db
SELECT ... FROM table1 ...; -- Accesses `table1` for that tenant
SELECT a.this, b.blah
FROM table1 AS a -- tenant's table
JOIN Common.foo AS b ON ... -- common table
Note on grants...
GRANT ALL PRIVILEGES ON Tenant_123.* TO tenant_123@'%' IDENTIFIED BY ...;
GRANT SELECT ON Common.* TO tenant_123@'%';
That is, it is probably OK to 'grant' everything on his own database. But he should have very limited access to the Common data.
If, instead, you manage the logins and all accesses go through, say, a PHP API, then you probably have only one mysql 'user' for all accesses. In this case, my notes above about GRANTs are not relevant.
Do not let the Tenants have access to everything. Your entire system will quickly be hacked and possibly destroyed.

Typically, this is less about which way is technically better and more about what you have sold your customers on, or, in some cases, about having no choice due to the type of data.
For example, does your application have a policy or similar that defines isolation of user generated data? Does your application store HIPAA or PCI type data? If so, you may not even have a choice, and if the customer is expecting that sort of privacy, that normally comes at a premium due to the potential overhead of creating the separation.
If the separation/isolation of data is not required, then adding a field to your tables indicating which tenant owns the data is ideal from a performance perspective; you would just need to update your queries to filter on that field, as sketched below.
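For illustration, a minimal sketch of that shared-table approach (the table and column names are assumptions, not from the answer):
-- A tenant discriminator column on every shared table, plus an index on it.
ALTER TABLE orders
    ADD COLUMN tenant_id INT NOT NULL,
    ADD INDEX idx_orders_tenant (tenant_id);

-- Every query then filters by the owning tenant.
SELECT order_id, total
FROM orders
WHERE tenant_id = 42;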

Using MySQL or MariaDB, I prefer to use a single database for all tenants and restrict access to the data by using a different database user per tenant, each of which only has permission to that tenant's data.
You can accomplish this by using a tenant_id column that stores the database username of the tenant that owns the data. I use a trigger to populate this column automatically when new rows are added. I then use views to filter the tables where tenant_id = current database user. Then I restrict the tenant database users to only have access to the views, not the real tables.
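A minimal sketch of that pattern, assuming a hypothetical customer table (the names, and the use of USER() to read the connecting account, are my assumptions, not taken from the post):
USE appdb;  -- hypothetical shared database holding all tenants' rows

-- Base table with the tenant-ownership column (the default is overridden by the trigger).
CREATE TABLE customer (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(100),
    tenant_id VARCHAR(64) NOT NULL DEFAULT ''
);

-- Fill tenant_id with the connecting database user on every insert.
CREATE TRIGGER customer_set_tenant
BEFORE INSERT ON customer
FOR EACH ROW
SET NEW.tenant_id = SUBSTRING_INDEX(USER(), '@', 1);

-- Each tenant sees only rows whose tenant_id matches their own login.
-- USER() (unlike CURRENT_USER()) reports the connecting client even inside
-- a DEFINER-context view, which is what makes the filter per-tenant.
CREATE VIEW customer_v AS
SELECT id, name
FROM customer
WHERE tenant_id = SUBSTRING_INDEX(USER(), '@', 1);

-- Tenants are granted access to the view only, never the base table.
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.customer_v TO 'tenant_123'@'%';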
I was able to convert a large single-tenant application to a multi-tenant application over a weekend using this technique because I only needed to modify the database and my database connection code.
I've written a blog post fully describing this approach. https://opensource.io/it/mysql-multi-tenant/

Related

MySQL or SQLite for live messages and notifications

I am going to build a website that will give clients the ability to live-chat user to user and will also implement a notification system (for new updates, messages, events, etc.). The primary database is going to be MySQL.
But I was thinking of releasing some "pressure" from the main MySQL database: instead of saving the chat messages and notifications into MySQL, create a personal SQLite database for each client where that client's messages and notifications are stored, which would leave MySQL free for other, more important queries. Each client can see only his/her own messages and notifications, so only he/she will open/read/write/update the corresponding SQLite database.
I think creating individual SQLite databases for each client is like partitioning the tables (messages, notifications) by user_id if they were stored in MySQL. So, for example, instead of querying through all clients' notifications each time in order to find the ones for the user with id 5, we just open his personal SQLite database, which will of course contain far fewer records (only his).
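To make the comparison concrete, here is roughly what the two layouts look like (table and column names are hypothetical):
-- Shared MySQL table: every query must filter (ideally via an index) on user_id.
SELECT id, body, created_at
FROM notifications
WHERE user_id = 5 AND is_read = 0
ORDER BY created_at DESC
LIMIT 20;

-- Per-client SQLite file (e.g. client_5.db): same query, no user_id needed,
-- because the entire file belongs to a single user.
SELECT id, body, created_at
FROM notifications
WHERE is_read = 0
ORDER BY created_at DESC
LIMIT 20;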
Now to the question part:
As I said, the queries will be simple: just some SELECTs with one or two WHERE clauses (mostly on indexes) and some ordering. Since this is the first time I am going to use SQLite, I am wondering:
1) Will this method (individual SQLite databases instead of MySQL) work in practice, performance-wise?
2) Which method will still perform better over time, as the databases grow bigger and bigger: SQLite or MySQL?
3) Does SQLite have any memory limits that will make it run slower over time?
What are your opinions on this method?

Multiple MySQL databases all using the same schema

EDIT: To clarify throughout this post: when I say "schema" I am referring to "data-model," which are synonyms in my head. :)
My question is very similar to this question (Rails: Multiple databases, same schema), but mine is related to MySQL.
To reiterate the problem: I am developing a SaaS product. The user will be given an option of which DB to connect to at startup. Most customers will be given two DBs: a production DB and a test DB, which means that every customer of mine will have 1-2 databases. So, if I have 10 clients, I will have about 20 databases to maintain. This is going to be difficult whenever the program (and data model) needs to be updated.
My question is: is there a way to have ONE datamodel for MULTIPLE databases? The accepted answer to the question I posted above is to combine everything into one database and use a company_id to separate out the data, but this has several foreseeable problems:
What happens when these transaction-based tables become inundated? My one customer right now has already recorded 16k transactions in the past month.
I'd have to add WHERE company_id = ... to hundreds of SQL queries/updates/inserts (yes, Jeff Atwood, they're parametrized SQL calls), which I can only assume would have a severe impact on performance.
Some tables store metadata, i.e., drop-down menu items that will be company-specific in some cases and application-universal in others. WHERE company_id = ... would add an unfortunate layer of complexity.
It seems logical to me to create (a) new database(s) for each new customer and point their software client to their database(s). But, this will be a headache to maintain, so I'm looking to reduce this potential headache.
Create deployment scripts for changes to the DB schema, keep an in-house database of all customers and keep it updated, and have your scripts pull the connection string from it.
Way better than trying to maintain a single database for all customers if your software package takes off.
FYI: I am currently with an organization that has ~4000 clients, all running separate instances of the same database (very similar, depending on the patch version they are on, etc) running the same software package. A lot of the customers are running upwards of 20-25k transactions per second.
A "database" in MySQL is called a "schema" by all the other database vendors. There are not separate databases in MySQL, just schemas.
FYI: (real) databases cannot have foreign keys between them, whereas schemas can.
Your test and production databases should most definitely not be on the same machine.
Use tenant-per-schema; that way you don't have company_ids in every table.
Your database schema should either be generated by your ORM or be kept in source control as SQL files, and you should have a script that automatically builds/patches the DB. It is trivial to change this script so that it builds a schema per tenant, as in the sketch below.
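A minimal sketch of what such a per-tenant build/patch script would execute, assuming hypothetical tenant and table names:
-- Run the same DDL once per tenant schema; the deployment script loops over tenants.
CREATE DATABASE IF NOT EXISTS tenant_acme;

CREATE TABLE IF NOT EXISTS tenant_acme.transactions (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    amount     DECIMAL(10,2) NOT NULL,
    created_at DATETIME NOT NULL
);
-- ...repeat for tenant_globex, tenant_initech, and so on,
-- applying the same patch files to every schema.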

MySQL Federated storage engine vs replication (performance)

Long story short - I am dealing with a largish database where basic user details (userid (index), username, password, parent user, status) are stored in one database and extended user details (same userid (index), full name, address etc. etc.) are stored in another database on another server.
I need to do a query where I select all users owned by a particular user (via the parent user field from basic user details database), sorted by their full name (from the extended user details field) and just 25 at a time (there are thousands, maybe tens of thousands for any one user).
As far as I can work out, there are three possible solutions:
No JOIN - get all the user IDs in one query and run the second query based on those IDs. This would be fine, except the number of user IDs could get so high that it would exceed the maximum query length, or be horribly inefficient.
Replicate the database table with the basic user details onto the server with the extended details so I can do a JOIN
Use a federated storage engine table to achieve the same results as #2
It seems that 3 is the best option, but I have been able to find little information about performance, and I also found one comment warning to be careful about using this on production databases.
I would appreciate any suggestions on what would be the best implementation.
Thanks!
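For reference, a rough sketch of what option 3 would look like (the connection string, table, and column names are assumptions, and the FEDERATED engine must be enabled on the local server):
-- Local FEDERATED table pointing at the remote extended-details table,
-- so the two databases can be joined in one query.
CREATE TABLE extended_user_details_fed (
    userid    INT NOT NULL,
    full_name VARCHAR(255),
    address   VARCHAR(255),
    PRIMARY KEY (userid)
)
ENGINE=FEDERATED
CONNECTION='mysql://app_user:app_pass@remote-host:3306/userdb/extended_user_details';

-- The paged, sorted listing then becomes a single JOIN.
SELECT b.userid, e.full_name
FROM basic_user_details AS b
JOIN extended_user_details_fed AS e ON e.userid = b.userid
WHERE b.parent_user = 123
ORDER BY e.full_name
LIMIT 25 OFFSET 0;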
FEDERATED tables are a nice feature... but they do not support indexes, which would slow down your application dramatically.
If (and only if) you only read from the users database on the remote server, replication would be more effective and also faster.
In terms of performance and limitations, the FEDERATED engine has a lot of limitations: it doesn't support transactions, performance on a FEDERATED table when performing bulk inserts is slower than with other table types, etc.
Replication and FEDERATED engines are not meant to do the same things. First of all, did you try both?

MS Access databases on slow network: Is it faster to separate back ends?

I have an Access database containing information about people (employee profiles and related information). The front end has a single console-like interface that modifies one type of data at a time (such as academic degrees from one form, contact information from another). It is currently linked to multiple back ends (one for each type of data, and one for the basic profile information). All files are located on a network share and many of the back ends are encrypted.
The reason I have done that is that I understand that MS Access has to pull the entire database file to the local computer in order to make any queries or updates, then put any changed data back on the network share. My theory is that if a person is changing a telephone number or address (contact information), they would only have to pull/modify/replace the contact information database, rather than pull a single large database containing contact information, projects, degrees, awards, etc. just to change one telephone number, thus reducing the potential for locked databases and network traffic when multiple users are accessing data.
Is this a sane conclusion? Do I misunderstand a great deal? Am I missing something else?
I realize there is the consideration of overhead with each file, but I don't know how great the impact is. If I were to consolidate the back ends, there is also the potential benefit of being able to let Access handle referential integrity for cascading deletes, etc., rather than coding for that...
I'd appreciate any thoughts or (reasonably valid) criticisms.
This is a common misunderstanding:
MS Access has to pull the entire database file to the local computer in order to make any queries or updates
Consider this query:
SELECT first_name, last_name
FROM Employees
WHERE EmpID = 27;
If EmpID is indexed, the database engine will read just enough of the index to find which table rows match, then read the matching rows. If the index includes a unique constraint (say EmpID is the primary key), the reading will be faster. The database engine doesn't read the entire table, nor even the entire index.
Without an index on EmpID, the engine would do a full table scan of the Employees table, meaning it would have to read every row from the table to determine which ones contain a matching EmpID value.
But either way, the engine doesn't need to read the entire database ... Clients, Inventory, Sales, etc. tables ... it has no reason to read all that data.
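For example, the selective read depends on the lookup column being indexed; if EmpID were not already the primary key, an index like the following (the name is hypothetical) would avoid the full table scan:
-- Without an index on EmpID, the WHERE EmpID = 27 lookup reads every row.
CREATE UNIQUE INDEX idx_employees_empid ON Employees (EmpID);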
You're correct that there is overhead for connections to the back-end database files. The engine must manage a lock file for each database. I don't know the magnitude of that impact. If it were me, I would create a new back-end database and import the tables from the others. Then make a copy of the front-end and re-link to the back-end tables. That would give you the opportunity to examine the performance impact directly.
Seems to me relational integrity should be a strong argument for consolidating the tables into a single back-end.
Regarding locking, you shouldn't ever need to lock the entire back-end database for routine DML (INSERT, UPDATE, DELETE) operations. The database engine supports more granular locking, as well as pessimistic vs. optimistic locking, i.e., whether the lock occurs as soon as you begin editing a row or is deferred until you save the changed row.
Actually "slow network" could be the biggest concern if slow means a wireless network. Access is only safe on a hard-wired LAN.
Edit: Access is not appropriate for a WAN network environment. See this page by Albert D. Kallal.
MS Access is not good to use on a local area network, let alone a wide area network, which will certainly have lower speed. The solution is to use a client-server database such as MS SQL Server or MySQL. MS SQL Server is much better than MySQL, but it is not free. Consider MS SQL Server for large-scale projects. Again, MS Access is only good for one computer, not for a computer network.

Lock down view to be uneditable

I am building a database that contains public, private (limited to internal), and confidential data (limited to very few). It has very specific requirements that the security of the data be managed on the database side, but I am working in an environment where I do not have direct control of the permissions, and requests to change them are time-consuming (2-3 days).
So I created a structure that should meet our needs without requiring a lot of permissioning. I created two databases on the same server: one is the internal one, whose tables can only be edited by certain users within certain subnets of our network. The second is the public database where, using an admin account, I create views limited to the public fields of tables in the internal database in order to expose public data, and it seems to work well. However, the data should only flow one way, and the views should not be able to write to the source tables. And I cannot just lock down the public database to be SELECT-only, since the public database is used for various tasks of our public website.
So I need to create views to limit access of some scripts to certain fields in a table. I need to make sure that those views are not able to insert, update, or delete data in the source table. To create the view I use:
CREATE ALGORITHM = UNDEFINED
VIEW `table_view` AS
SELECT *
FROM `table`
Looking at the documentation, to prevent updates the view needs to have aggregate data, subqueries in the WHERE clause, or ALGORITHM = TEMPTABLE. I would go with TEMPTABLE, but the manual is unclear about whether it would impact performance. In one paragraph the manual states:
It prefers MERGE over TEMPTABLE if possible, because MERGE is usually more efficient
Then immediately states:
A reason to choose TEMPTABLE explicitly is that locks can be released on underlying tables after the temporary table has been created and before it is used to finish processing the statement. This might result in quicker lock release than the MERGE algorithm so that other clients that use the view are not blocked as long.
The views are going to be queried on page load to generate the contents of the page. Would MERGE still be more efficient, or would the lower lock time serve me better? And no, handling this through account permissions is not really an option, due to the inability to GRANT permissions on individual fields to meet the legal confidentiality requirements. Meeting them that way would require fragmenting each table into 2-3 tables containing fields with homogeneous confidentiality.
Should the algorithm be UNDEFINED or TEMPTABLE, or is there another setting in the view definition that will lock down the view? And what performance effects will I experience? Also, if I do something to force it to be uneditable, like including HAVING 1 to make it behave like an aggregate query, does that force it to be TEMPTABLE and make the choice of algorithm moot?
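For reference, a non-updatable variant of the view definition above might look like this (a sketch; per the documentation, ALGORITHM = TEMPTABLE alone is enough to make a view non-updatable):
CREATE ALGORITHM = TEMPTABLE
VIEW `table_view` AS
SELECT *
FROM `table`;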
I'm wondering why you don't just lock down the grants for the account(s) being used so that they don't have DELETE, INSERT, or UPDATE.
MySQL doesn't appear to support roles, where I'd have defined a role without these grants and just associated the account(s) with that role - a pity...
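A minimal sketch of that grant-based alternative, with hypothetical account and database names: the web account can read the view but holds no INSERT, UPDATE, or DELETE privilege on it or on the underlying table.
-- SELECT-only grant on the view; no write privileges anywhere else.
GRANT SELECT ON public_db.table_view TO 'web_public'@'%';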