Basically, I'm working on a REST API for my application. I was researching server performance and couldn't find much information on splitting a database. Would it be better to have a single file system shared by multiple servers that run MySQL (if MySQL can even store its data on a shared file system)? Or, if the database is slowing things down, should I just upgrade my current server, or store different information in different databases (for example, users A-N on database 1 and O-Z on another)? If anyone has more information on increasing database performance, or something I can read through, it would be very much appreciated. Thanks in advance.
"Sharding" is where you put some of the rows of the main table on one physical server, some on other server(s). Only one system in a thousand needs this type of scaling.
"Partitioning" is where you split a single table into multiple "subtables" that act like a single table. There are very few use cases (I count only 4) where this provides any benefit. All the subtables live in the same server.
Having multiple MySQL instances on the same physical server adds complexity and is unlikely to add any performance.
Having multiple MySQL instances on different physical servers, but sharing the same data -- do not attempt; MySQL does not know how to share its own data this way.
Using Replication lets you do arbitrary read scaling. It is slightly complex, but it is a common approach. Note that it only helps apps that are more 'read' than 'write'.
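As a rough sketch of the replica side (host name, user, and password are placeholders, and this assumes GTID-based replication is already enabled on the primary), pointing a replica at a primary looks like this on MySQL 8.0.23+; older versions use CHANGE MASTER TO ... and START SLAVE instead:

    -- Run on the replica:
    CHANGE REPLICATION SOURCE TO
        SOURCE_HOST = 'primary.example.com',
        SOURCE_USER = 'repl',
        SOURCE_PASSWORD = 'change-me',
        SOURCE_AUTO_POSITION = 1;  -- follow the primary's GTID stream
    START REPLICA;

The primary also needs binary logging enabled and a user with the REPLICATION SLAVE privilege; this is only the replica half of the setup.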
Galera Clustering gives you some write scaling.
A single instance can handle lots of incoming requests (from web servers or other types of clients), and those clients can be scattered across multiple servers. That is, scaling the clients, even without scaling MySQL, may be useful to you.
With a load balancer / proxy server / etc, you can also have multiple clients talking to multiple MySQL servers (read Slaves / Galera nodes / sharded servers / etc).
Many of the above techniques can be combined in the same system.
Bottom line: It smells like you don't yet know if you will need any scaling. When you get to that point, please provide more info on what the app is like so we can discuss the various options with less 'hand waving'.
I'm 99.999% certain all those words are just saying "premature optimization". Unless you have over 20GB of data (or 50GB or 100GB.. basically, a lot), use a single database and once it starts slowing down look at different options (sharding, etc).
And don't worry, you'll have plenty of other things to keep you busy without introducing advanced db tactics :)
Related
Are there any tools, or a proper way, to handle more than 2,000 requests (mostly write requests) per second to a MySQL database without hitting the queue limit?
There are a few different ways to handle massive numbers of requests to a MySQL (or any other relational) database. As traffic grows, you can employ replication, which lets additional machines serve read-only traffic (no INSERTs, UPDATEs, DELETEs, etc.) while all writes go to a single "master" machine (the read replicas copy the written data from the master, or write-allowed, instance, but may lag slightly behind the latest writes for a short period of time). Oracle (owner of the MySQL project) has a good article about it (and scaling PHP) here: http://www.oracle.com/technetwork/articles/dsl/white-php-part1-355135.html
Once your app begins taking on requests on a truly massive scale (like Facebook, Google, etc. level) you will want to consider other strategies such as clustering, utilizing NoSQL (for certain functions such as search, analytics, logging, monitoring, etc.), splitting tables and databases based on geographic regions (if it makes sense). There is a starter white paper here: https://www.mysql.com/why-mysql/white-papers/guide-to-scaling-web-databases-with-mysql-cluster/
You can also conduct generic searches for "scaling MySQL" which deliver even more results.
MariaDB 10+ comes with Galera Cluster, which allows you to have multiple MASTER servers; you can load balance either by IP or through a dedicated device.
Also, the number of requests/second depends on how fast a write completes. If you have a simple atomic raw write, you can leave INDEXES off the receiving table, so it's as fast as your server can handle. That raw table can be MyISAM rather than InnoDB, which is usually up to 10x faster for writes. Have another process read the raw data in bulk into another table with proper indexes. We've had success with up to 10K transactions/second this way.
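A minimal sketch of that pattern (table and column names are hypothetical): an index-free MyISAM landing table that a separate job drains in bulk into an indexed InnoDB table:

    CREATE TABLE raw_events (
        payload   VARCHAR(255) NOT NULL,
        logged_at DATETIME NOT NULL
    ) ENGINE = MyISAM;  -- no indexes: appends cost as little as possible

    CREATE TABLE events (
        id        BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        payload   VARCHAR(255) NOT NULL,
        logged_at DATETIME NOT NULL,
        KEY idx_logged_at (logged_at)
    ) ENGINE = InnoDB;

    -- Periodic drain job:
    INSERT INTO events (payload, logged_at)
        SELECT payload, logged_at FROM raw_events;
    TRUNCATE TABLE raw_events;

Note that the drain as written races with new inserts; a production job would first swap the landing table out with RENAME TABLE and drain the swapped-out copy.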
In our software, we share information across installations.
Currently we use staging tables within the same database to facilitate this. We use stored procedures to pull certain data from the live tables into the staging tables, and then dump them. This dump then gets loaded into the staging tables of the target database, and stored procedures merge in the data.
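In rough SQL terms, the flow looks something like this (all names and the date are hypothetical; a separate staging database on the same server would only change the schema qualifier):

    -- 1. On the source: pull live rows into the staging table.
    INSERT INTO staging_customers (id, name, updated_at)
        SELECT id, name, updated_at
        FROM customers
        WHERE updated_at >= '2024-01-01';

    -- 2. Dump staging_customers, load it into the target's staging
    --    table, then merge on the target:
    INSERT INTO customers (id, name, updated_at)
        SELECT s.id, s.name, s.updated_at
        FROM staging_customers AS s
    ON DUPLICATE KEY UPDATE
        name = s.name, updated_at = s.updated_at;

With a separate staging database, the staging table would simply be referenced as staging_db.staging_customers instead.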
This works fine, but for a few reasons*, I'm considering moving this from staging tables to a separate staging database. I'm just concerned about whether or not this will have any performance implications.
Having very quickly tried this (just as a thought exercise) on a couple of differently configured systems, I've come up with differing results. One (with not much data, and running MySQL 5.6) showed no real difference, possibly even slightly faster. The other (with much more data, running MySQL 5.5) showed it to be about 1.5 times slower.
I'm well aware that there will likely be configuration options that affect this; I'm no DBA, so any pointers would be much appreciated.
TL;DR
What performance implications might there be in inserting data into tables in a different database (on the same server), compared to inserting within the same database? Will it depend on the MySQL version, or on configuration settings?
* If you're interested in 'reasons', I can let you know in the comments
Let me start by saying that you cannot conduct tests the way you did.
Databases, and MySQL among them, rely on hardware. There are a number of MySQL variables whose tuning can turn it from a snail into a Formula 1 car.
The problem is, you tested on separate systems, and each either runs on different hardware or contains different data. Technically, what MySQL does by default is use the InnoDB storage engine, and InnoDB, unless file-per-table is enabled (it is on by default from MySQL 5.6.6), stores all the data in a single shared file. So from a "down to the core" perspective, whether you use another table or another database, MySQL won't really care: it will store it in one and the same file. From there we get to whether that file is fragmented (if on a mechanical disk) and to many other interesting details that can't be covered in a single answer.
That brings us to next issue - databases exist to store structured data.
Databases are not glorified text files. They exist so we can ask them to cross-examine data and give us meaningful results to questions.
That means we design databases and tables in ways that correspond to certain logic. If it makes sense to store those staging tables in database A, then store them there. If it makes sense to do some other thing, do the other thing.
From a performance point of view, it hugely depends on how your servers are configured, from the OS through MySQL variables to the hardware (especially the disks) you have. Without knowing what's going on there down to the last detail, no one can tell you "Yes, it is faster", "No, it is slower" or "It is the same". If they do, they're lying. We basically lack all the information needed to tell you with certainty which approach is better.
I am helping a customer migrate a PHP/MySQL application to AWS.
One issue we have encountered is that they have architected this app to use a huge number of databases. They create a new DB (with identical schema) for each user. They expect to have tens of thousands of users.
I don't know MySQL very well, but this setup does not seem at all good to me. My only guess is that the developers did this so they could avoid having tables with huge amounts of data. However, I can only think of drawbacks (maintaining this system will be a nightmare, very difficult to extend, difficult to scale, etc.).
Anyhow, is this type of pattern commonly used within the MySQL community? What are the benefits, if any?
I am trying to convince them that they should re-architect the DB schema.
* [EDIT] *
In the meantime we have found another drawback of this approach. We had originally intended to use Amazon RDS for data storage; however, RDS currently supports up to 30 databases per instance, so unfortunately RDS is now ruled out. The fact that RDS has this limit in place is already very telling; my interpretation is that having such a huge number of databases is not common practice with MySQL.
Thanks!
This is one of the most horrible ideas I've ever read (and I've read many). For one, databases do not scale as well as tables within a database; for another, it would be impossible to connect users to each other, or at least to share common attributes and options. It essentially defeats the purpose of the database itself.
My advice here is rather outside the original scope: your intuition knows more than you think; listen to it more!
This idea seems quite strange to me too! Databases are designed to handle large data sets, after all! If there is genuine concern about the volume of data, it is usually better practice to separate tables onto different databases hosted on different physical servers, as this allows you to spread the database-level processing across hardware to boost performance.
Also I don't know how they plan to host this application but many hosting providers are going to charge you per database instance!
Another problem this will give you is that it will make reporting more difficult - I wouldn't like to try including tables from 10,000 databases in a query!!
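For contrast, the conventional alternative is one schema whose tables are keyed by a user (tenant) id; a minimal sketch with hypothetical names:

    CREATE TABLE documents (
        user_id BIGINT UNSIGNED NOT NULL,
        doc_id  BIGINT UNSIGNED NOT NULL,
        title   VARCHAR(255),
        PRIMARY KEY (user_id, doc_id)
    );

    -- Per-user access stays a cheap indexed lookup:
    SELECT title FROM documents WHERE user_id = 42;

    -- And cross-user reporting is one query, not 10,000:
    SELECT user_id, COUNT(*) AS docs FROM documents GROUP BY user_id;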
I've seen pictures like this where multiple Rails engines write to a single MySQL server.
1) Is this possible? Or does Rails want each application server to write to one database server?
2) If this is possible, how is it accomplished? Are there queues and a scheduler between the application servers and the write database server?
Scaling a MySQL db is a pretty difficult thing to do, but it's certainly been done plenty of times and there are a lot of best practices out there for you to take advantage of. The first thing you should know is that you won't need to worry about scaling writes for a while yet; you'll probably need to scale your reads first.
Scaling reads can be done fairly easily using replication. There are several tools out there that make managing replication a lot easier, such as Amazon RDS. Generally speaking, many web servers can connect to many databases (as suggested by others); however, you quickly run into scale issues once you have a lot of traffic, connections, or whatever other activity generates load on the server.
As replicated servers are read only, you need to manage which server you connect to depending on the action you're performing. I.e., if you had a users table, when creating, updating or deleting users you need to use the "write" database (the primary "source" server), but when reading the users table, you can use one of the read replicas. This reduces the load on the primary write server (allowing it to deal with even more writes), and as you can have multiple read databases behind a load balancer, you can get away with this structure for a very long time and scale reads across tens of database servers before you'll hit any significant issues (though most apps get away with 1-3).
There are situations where you will need to use your write database for read actions (although you should avoid that as much as possible), because the read replicas can be slightly behind the write db due to the latency of replicating the write db's queries. Most of the time, though, you should be able to code knowing that the read db may be delayed (i.e. queue actions for a reasonable period of time, such that updates will have propagated across all the read servers) and simply use one of your read dbs rather than the write db.
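One simple way to sanity-check that delay (syntax for MySQL 8.0.22+; older versions use SHOW SLAVE STATUS and the Seconds_Behind_Master field):

    -- Run on the replica:
    SHOW REPLICA STATUS\G
    -- Inspect Seconds_Behind_Source in the output; if it exceeds your
    -- tolerance, route the read to the primary instead.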
Beyond this, the key items to work on are ensuring you have efficient indexes and applying other best practices around maintaining a sensible data structure. You might also want to consider having three distinct "groups" of database servers. I generally like to have "write", "read" and "stats" db groups: the write group for create, update and delete operations (as well as SELECT ... FOR UPDATE), the read group for general reads that must return their results quickly, and the stats group for anything that will be under high load and where you don't rely on a prompt response (this keeps heavy, non-time-sensitive queries away from the read dbs that you need quick responses from for general reads).
Once you get into a situation where you can no longer buy larger hardware and you're near maxing out your write capacity, you'll need to look into sharding; however, that takes a lot of traffic/data, so don't worry about it unless you've done all of the above already.
I'm building a web app. This app will use MySQL to store all the information associated with each user. However, it will also use MySQL to store sys admin type stuff like error logs, event logs, various temporary tokens, etc. This second set of information will probably be larger than the first set, and it's not as important. If I lost all my error logs, the site would go on without a hiccup.
I am torn on whether to have multiple databases for these different types of information, or stuff it all into a single database, in multiple tables.
The reason to keep it all in one is that I only have to open one connection. I've noticed a measurable time penalty for opening connections, particularly with remote MySQL servers.
What do you guys do?
First, I must say, I think storing all your event logs and error logs in the db is a very bad idea; instead you may want to store them on the filesystem.
You will only need error logs or event logs if something in your web app goes unexpectedly wrong. Then you download the file and examine it; that's all. No need to store it in the db. It will slow down your db and your web app.
As an answer to your question: if you really want to do that, you should separate them, and you should find a way to keep your page running even when your event-log and error-log databases are loaded and responding slowly.
Going with two distinct databases (one for your application's "core" data, and another one for "technical" data) might not be a bad idea, at least if you expect your application to have a lot of users:
it'll allow you to put one DB on one server, and the other DB on a second server
and you can think about scaling a bit more later: more servers for the "core" data, and still only one for the "technical" data -- or the opposite
if the "technical" data is not as important, you can (more easily) have two distinct backup processes / policies
having two distinct databases, and two distinct servers, also means you can have heavy calculations on the technical data, without impacting the DB server that hosts the "core" data -- and those calculations can be useful, on logs, or stuff like that.
as a side note: if you don't need that kind of "reporting" calculation, maybe storing that data in a DB is not useful, and files would do perfectly?
Maybe opening two connections means a bit more time -- but that difference is probably rather negligible, is it not?
I've worked a couple of times on applications that used two databases:
One "master" / "write" database, that would be used only for writes
and one "slave" database (a replication of the first one, to several slave servers), that would be used for reads
This way, yes, we sometimes open two connections -- but one server alone would not have been able to handle the load...
Use connection pooling anyway, so the time to get a connection is not a problem. But if you have 2 connections, transaction handling becomes more complicated. On the other hand, sometimes it's handy to have 2 connections: if something goes wrong in the business transaction, you can roll it back and still log the failure on the admin connection. But I would still stick to one database.
I would only use one database, mostly for the reason you supply: you only need one connection to reach both the logging data and the user data.
Depending on your programming language, some frameworks (J2EE, for example) provide connection pooling; with two databases you would need two pools. In PHP, on the other hand, the connection-setup cost hits you every time you open a connection (or two).
I see no reason for two databases. It'd be perfectly acceptable to have tables devoted to "technical" and "business" data, but the logical separation should be sufficient.
Physical separation doesn't seem necessary to me, unless you mean an application and data warehouse star schema. In that case, it's either real-time updates or, more typically, a nightly batch ETL.
It makes no difference to MySQL in any way whether you use separate "databases"; they are simply catalogues.
It may make setting permissions easier, which is a legitimate reason to do it. Other than that, it is exactly the same as keeping the tables in the same db (except you can have several tables with the same name... but please don't).
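For example, database-level grants make the split pay off in one statement per schema (user and database names here are hypothetical):

    CREATE USER 'reporting'@'%' IDENTIFIED BY 'change-me';
    GRANT SELECT ON logs_db.* TO 'reporting'@'%';

    CREATE USER 'webapp'@'10.0.0.%' IDENTIFIED BY 'change-me';
    GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'webapp'@'10.0.0.%';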
Putting them on separate servers might be a good idea however, as you probably don't want your core critical (user info, for example) data mixed in with your high-volume, unimportant data. This is particularly true for old audit data, debug logs etc.
Also short-lived data, such as search results, sessions etc, could be placed on a different server - it presumably has no high availability[1] requirement.
Having said that, if you don't need to do this, dump it all on one server where it's easier to manage (backup, provide high availability, manage security, etc.).
It is not generally possible to take a consistent snapshot of data across more than one server. This is a good reason to have only one server (or only one whose backups you care about).
[1] Of the data, not the database.
In MySQL, InnoDB has the option of storing all tables in one shared tablespace file, or keeping one file per table.
Having one file per table is somewhat recommended anyway, and if you do that, it makes no difference at the database storage level whether you have one database or several.
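For reference, you can check and enable this per server (it has been the default since MySQL 5.6.6, and the setting only affects tables created afterwards):

    SHOW VARIABLES LIKE 'innodb_file_per_table';
    SET GLOBAL innodb_file_per_table = ON;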
With connection pooling, one database or several is probably not going to matter either.
So, in my opinion, the question is whether you'd ever consider moving the "other half" of the database onto a separate server -- one with perhaps a very different hardware configuration, such as no RAID. If so, consider using separate databases. If not, use a single database.