I am helping a customer migrate a PHP/MySQL application to AWS.
One issue we have encountered is that they have architected this app to use a huge number of databases. They create a new DB (with identical schema) for each user. They expect to have tens of thousands of users.
I don't know MySQL very well, but this setup does not seem good to me at all. My only guess is that the developers did this so they could avoid having tables with huge amounts of data. However, I can only think of drawbacks (maintaining this system will be a nightmare; it will be very difficult to extend and to scale, etc.).
Anyhow, is this type of pattern commonly used within the MySQL community? What are the benefits, if any?
I am trying to convince them that they should re-architect the DB schema.
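To make the alternative concrete, here is a minimal sketch of the shared-schema (multi-tenant) design I would propose instead; the table and column names are hypothetical placeholders:

    -- All users share one schema; every user-owned row carries a user_id.
    CREATE TABLE users (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );

    CREATE TABLE documents (
        id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_id INT UNSIGNED NOT NULL,
        title   VARCHAR(200) NOT NULL,
        INDEX idx_documents_user (user_id),         -- keeps per-user queries fast
        FOREIGN KEY (user_id) REFERENCES users (id)
    );

    -- Every application query is then scoped by user:
    SELECT * FROM documents WHERE user_id = 42;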
[EDIT]
In the meantime we have found another drawback of this approach. We had originally intended to use Amazon RDS for data storage. However, RDS currently supports up to 30 databases per instance, so unfortunately RDS is now ruled out. The fact that RDS has this limit in place is already very telling; my interpretation is that having such a huge number of databases is not a common practice with MySQL.
Thanks!
This is one of the most horrible ideas I've ever read (and I've read many). For one thing, large numbers of databases do not scale as well as tables within a database; for another, it would be impossible to connect users to each other, or at least to share common attributes and options. It essentially defeats the purpose of the database itself.
My advice here is rather outside the original scope: your intuition knows more than you think, so listen to it more!
This idea seems quite strange to me too! Databases are designed to handle large data sets, after all. If there is genuine concern about the volume of data, it is usually better practice to separate tables onto different databases hosted on different physical servers, as this allows you to spread the database-level processes across hardware to boost performance.
Also, I don't know how they plan to host this application, but many hosting providers will charge you per database instance!
Another problem this will give you is that it will make reporting more difficult; I wouldn't like to try including tables from 10,000 databases in a query!
Is MySQL capable of managing the data for a site which holds lots of data (say, with hundreds of millions of users)? Which database would be the most capable/beneficial?
Wikipedia is based on MySQL. I don't think it has 100M users, but it must be close by now.
No database will handle hundreds of millions of users unless you know how to set it up properly. No single server could handle that kind of traffic, so you need to know how to set up replication and load balancing. Once you reach a certain level, there is no out-of-the-box solution, only tools you can use, MySQL being a very capable tool.
There are a couple of answers to this.
Yes, MySQL can store hundreds of millions of records; you need to know what you're doing, have a decent database schema, pretty robust hardware, but you're not pushing the limits.
When you talk about "hundreds of millions of users", you're talking about a site along the lines of Wikipedia/Facebook/Google/Amazon in scale. You need a custom, highly cached, distributed architecture to run a site at that scale - and the traditional database driven application architecture will almost certainly not be enough. You could still store your data in MySQL, but you'd need a whole bunch of additional components to make it all work - and without knowing more about the application, nobody could tell you what that might be. At that scale, none of the commonly used databases would suffice, so MySQL is no better or worse than any of the other options...
Your question is really irrelevant, because creating a product or service that hundreds of millions of customers actually want is a much bigger and more difficult challenge than choosing a database engine.
If you're starting a business from nothing, pick a technical platform you already know and go with it: productivity and quick implementation will be more important than being scalable to a level you may never reach anyway.
If you do eventually become successful enough to have to deal with hundreds of millions of customers, then you'll certainly be able to raise the cash to buy whatever expertise and hardware you need.
Let's say I want to build a gaming website and I have many game sections. They ALL have a lot of data that needs to be stored. Is it better to make one database with a table representing each game or have a database represent each section of the game? I'm pretty much expecting a "depends" kind of answer.
Managing 5 different databases is going to be a headache. I would suggest using one database with 5 different tables. Aside from anything else, I wouldn't be surprised to find you've got some common info between the 5 - e.g. user identity.
Note that your idea of "a lot of data" may well not be the same as the database's... databases are generally written to cope with huge globs of data.
Depends.
Just kidding. If this is one project and the data are in any way related to each other I would always opt for one database absent a specific and convincing reason for doing otherwise. Why? Because I can't ever remember thinking to myself "Boy, I sure wish it were harder to see that information."
While there is not enough information in your question to give a good answer, I would say that unless you foresee needing data from two games at the same time for the same user (or query), there is no reason to combine databases.
You should probably have a single database for anything common, and then create independent databases for anything unique. Databases, like code, tend to end up evolving in different directions for different applications. Keeping them together may lead you to break things or to be more conservative in your changes.
In addition, some databases are optimized, managed, and backed-up at a database level rather than a table level. Since they may have different performance characteristics and usage profiles, a one-size-fit-all solution may not be scalable.
If you use an ORM framework, you get access to multiple databases (almost) for free while still avoiding code replication. So unless you have joint queries, I don't think the risks of a shared database are worth it.
Of course, if you pay someone to host your databases, it may be cheaper to use a single database, but that's really a business question, not software.
If you do choose to use a single database, do yourself a favour and make sure the code for each game only knows about specific tables. It would make it easier for you to maintain things later or separate into multiple databases.
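For instance (a minimal sketch, assuming a shared scores table with a game column; all names are hypothetical), views are one way to enforce that boundary:

    -- Each game's code talks only to its own view, never to other games' rows.
    CREATE VIEW pacman_scores AS
        SELECT id, player_id, points
        FROM scores
        WHERE game = 'pacman';

Splitting a game out into its own database later is then largely a matter of repointing the view, or the code behind it.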
One database.
Most of the stuff you are reasonably going to want to store is going to be text, or primitive data types such as integers. You might fancy throwing your binary content into blobs, but that's a crazy plan on a media-heavy website when the web server will serve files over HTTP for free.
I pulled lead programming duties on a web-site for a major games publisher. We managed to cover a vast portion of their current and previous content, in three European languages.
At no point did we ever consider having multiple databases to store all of this, despite the fact that each title was replete with video and image resources.
I cannot imagine why a multiple database configuration would suit your needs here, either in development or outside of it. The amount of synchronisation you'll have to do, and the capacity for error, is immense. Trying to pull data that pertains to all of them from all of them will be a nightmare.
Every site-wide update you migrate will be n times as hard and error prone, where n is the number of databases you eventually plump for.
Seriously, one database - and that's about as far from your anticipated depends answer as you're going to get.
If the different games don't share any data, it would make sense to use separate databases. On the other hand, it would make sense to use one database if the structure of the games' data is the same; with separate databases you would have to make the same changes in every game database.
Update: When in doubt, you should always use one database, because it's easier to manage in most cases. Only if you're sure that the applications are completely separate and have completely different structures should you use multiple databases; the only real advantage is clearer separation.
Generally speaking, "one database per application" tends to be a good rule of thumb.
If you're building one site that has many sections for talking about different games (or different types of games), then that's a single application, so one database is likely the way to go. I'm not positive, but I think this is probably the situation you're asking about.
If, on the other hand, your "one site" is a battle.net-type matching service for a collection of five distinct games, then the site itself is one application and each of the five games is a separate application, so you'd probably want six databases since you have a total of six largely-independent applications. Again, though, my impression is that this is not the situation you're asking about.
If you are going to be storing the same data for each game, it would make sense to use 1 database to store all the information. There would be no sense in replicating table structures across different databases, likewise there would be no sense in creating 5 tables for 5 games if they are all storing the same information.
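As a minimal sketch (hypothetical names), a single table with a game discriminator column replaces five structurally identical tables:

    CREATE TABLE scores (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        game      VARCHAR(50)  NOT NULL,   -- 'pacman', 'zelda', ...
        player_id INT UNSIGNED NOT NULL,
        points    INT          NOT NULL,
        INDEX idx_scores_game (game)       -- keeps per-game lookups cheap
    );

    -- Per-game queries remain trivial:
    SELECT player_id, points
    FROM scores
    WHERE game = 'pacman'
    ORDER BY points DESC
    LIMIT 10;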
I'm not sure this is correct, but I think you want to go with one database and 5 tables because (among other reasons) of the alternative's impact on connection pooling (if, for example, you're using ADO.Net). In the ADO.Net connection pool, connections are keyed by the connection string, so with five different databases you might end up with 20 connections to each database instead of 100 connections to one database, which would reduce the flexibility of how connections are allocated.
If anybody knows better or has additional info, please add it here, as I'm not sure if what I'm saying is accurate.
What's your idea of "a lot of data"? The only reason that you'd need to split this across multiple databases is if you are trying to save some money with shared hosting (i.e. getting cheap shared hosts and splitting it across servers), or if you feel each database will be in the 500GB+ range and do not have access to appropriate storage.
Note that both of these reasons have nothing to do with architecture, and are entirely based on monetary concerns during scaling.
But since you haven't created the site yet, you're putting the cart before the horse. It is very unlikely that a brand new site would use anywhere near this level of storage, so just create 1 database.
Some companies have single databases in the 1,000+ TB range ... there is basically no upper bound on database size.
The number of databases you want to create depends not on the number of your games, but on the data stored in the databases, or rather on how you exchange that data between them.
If it is export and import, then do separate databases.
If it is normal relationships (with foreign keys and cross-queries), then leave it in one database.
If the databases are not related to each other, then they are separate databases, of course.
In one of my projects, I distinguished between the internal and external data (which were stored in separate databases).
The difference was quite simple:
The external database stored only the facts you cannot change or undo: in our case, phone calls, SMS messages and incoming payments.
The internal database stored the things that are usually stored: users, passwords, etc.
The external database used only natural PRIMARY KEYs: the phone numbers, bank transaction IDs, etc.
The two databases were given completely different rights, and exchanging data between them was a matter of import and export, not relationships.
This made sure that nothing would happen to the actual data: it is easy to relink a payment to a user, but it's very hard to restore a payment if it's lost.
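A minimal sketch of what the external side might have looked like (the names here are illustrative, not the actual schema):

    -- Immutable facts, keyed by their natural identifiers only.
    CREATE TABLE incoming_payments (
        bank_txn_id  VARCHAR(64)   NOT NULL PRIMARY KEY,  -- natural key from the bank
        phone_number VARCHAR(20)   NOT NULL,
        amount       DECIMAL(10,2) NOT NULL,
        received_at  DATETIME      NOT NULL
    );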
I can pass on my experience with a similar situation.
We had 4 "Common" databases and about 30 "Specific" databases, separated for the same space concerns. The downside is that the space concerns were just projecting dBase shortcomings onto SQL Server. We ended up with all these databases on SQL Server Enterprise that were well under the maximum size allowed by the Desktop edition.
From a database perspective with respect to separation of concerns, the 4 Common databases could've been 2. The 30 Specific databases could've been 3 (or even 1 with enough manipulation / generalization). It was inefficient code (both stored procs and data access layer code) and table schema that dictated the multitude of databases; in the end it had nothing at all to do with space.
I would consolidate as much as possible early and keep your design & implementation flexible enough to extract components if necessary. In short, plan for several databases but implement as one.
Remember, especially on web sites. If you have multiple databases, you often lose the performance benefits of query caching and connection pooling. Stick to one.
Definitely, one database.
One place I worked had many databases: a common one for the stuff all clients used, and client-specific ones for per-client customizations. What ended up happening was that, since the clients asked for the changes, the changes would end up in the client database instead of the common one. Thus there were 27 ways of doing essentially the same thing, because nothing was ever refactored from client-specific to "hey, this is something other clients will need as well, so let's put it in common". So one database tends to lead to less reinventing of the wheel.
Security Model
If each game will have a distinct set of permissions/roles specific to that game, split it out.
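With one database per game, those grants stay coarse-grained. A minimal sketch (account and database names are hypothetical):

    CREATE USER 'pacman_app'@'localhost' IDENTIFIED BY 'change_me';
    CREATE USER 'zelda_app'@'localhost'  IDENTIFIED BY 'change_me';

    -- One statement per application account covers that game's whole database.
    GRANT SELECT, INSERT, UPDATE ON game_pacman.* TO 'pacman_app'@'localhost';
    GRANT SELECT, INSERT, UPDATE ON game_zelda.*  TO 'zelda_app'@'localhost';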
Query Performance / Complexity
I'd suggest keeping them in a single database if you need to frequently query across the data between the games.
Scalability
Another consideration is your scalability plans. If the games get extremely popular, you might want to buy separate database hardware for each game. Separating them into different databases from the start would make that easier.
Data Size
The size of the data should not be a factor in this decision.
Just to add a little: when you have millions and millions of players in one game, the game is realtime, you have tens of thousands of simultaneous players online, and you have to keep at least some essential data (say, a player's virtual money) as up to date in the DB as possible, then you will want to separate tables into independent DBs even though they are all "connected".
It really depends, and scaling will be painful whatever you do to try to avoid the pain. But if you really expect A LOT of players and updates and data, I would advise thinking twice, thrice and more before settling on a "one DB for several projects" solution.
Yes, managing several DBs will probably be difficult. But you will have to do this anyway.
Really depends :)
Ask yourself these questions:
Could there be reusability (a shared users table, say) that I may want to think about?
Is it worth separating these entities, or are they pretty much the same?
Do any of these entities share specific events / needs?
Is it worth my time and effort to build 5 different database systems? (Remember, if you are writing the games, that would mean different connection strings, and it also presents more security concerns, etc.)
Or you could create one database, OnlineGames, and have a table that stores each game's name and category:
PacMan - Arcade
Zelda - Role playing
etc.
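A minimal sketch of that layout, using the example names above:

    CREATE DATABASE OnlineGames;
    USE OnlineGames;

    CREATE TABLE games (
        id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name     VARCHAR(100) NOT NULL,
        category VARCHAR(50)  NOT NULL
    );

    INSERT INTO games (name, category) VALUES
        ('PacMan', 'Arcade'),
        ('Zelda',  'Role playing');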
It really depends on what your intentions are...
I'm building a web app. This app will use MySQL to store all the information associated with each user. However, it will also use MySQL to store sys admin type stuff like error logs, event logs, various temporary tokens, etc. This second set of information will probably be larger than the first set, and it's not as important. If I lost all my error logs, the site would go on without a hiccup.
I am torn on whether to have multiple databases for these different types of information, or stuff it all into a single database, in multiple tables.
The reason to keep it all in one, is that I only have to open up one connection. I've noticed a measurable time penalty for connection opening, particularly using remote mysql servers.
What do you guys do?
First, I must say I think storing all your event logs and error logs in the DB is a very bad idea; you may want to store them on the filesystem instead.
You will only need the error logs or event logs if something in your web app goes unexpectedly wrong. Then you download the file and examine it, that's all. There is no need to store them in the DB; it will slow down your DB and your web app.
As an answer to your question: if you really want to do that, you should separate them, and you should find a way to keep your page running even when your event log and error log databases are under load and responding slowly.
Going with two distinct databases (one for your application's "core" data, and another one for "technical" data) might not be a bad idea, at least if you expect your application to have a lot of users:
it'll allow you to put one DB on one server, and the other DB on a second server
and you can think about scaling a bit more later: more servers for the "core" data, and still only one for the "technical" data -- or the opposite
if the "technical" data is not as important, you can (more easily) have two distinct backup processes / policies
having two distinct databases, and two distinct servers, also means you can run heavy calculations on the technical data without impacting the DB server that hosts the "core" data -- and those calculations can be useful, on logs and the like
as a side note: if you don't need that kind of "reporting" calculation, maybe storing that data in a DB is not useful, and files would do perfectly?
Maybe opening two connections means a bit more time -- but that difference is probably rather negligible, is it not?
I've worked a couple of times on applications that used two databases:
one "master" / "write" database, used only for writes
and one "slave" database (a replication of the first one, to several slave servers), used for reads
This way, yes, we sometimes opened two connections -- but one server alone would not have been able to handle the load...
Use connection pooling anyway, so the time to get a connection is not a problem. But if you have 2 connections, transaction handling becomes more complicated. On the other hand, sometimes it's handy to have 2 connections: if something goes wrong in the business transaction, you can roll it back and still log the failure on the admin connection. But I would still stick to one database.
I would only use one database, mostly for the reason you supply: you only need one connection to reach both the logging and the user data.
Depending on your programming language, some frameworks (J2EE, for example) provide connection pooling; with two databases you would need two pools. In PHP, on the other hand, the performance cost shows up every time you set up a connection (or two).
I see no reason for two databases. It'd be perfectly acceptable to have tables that are devoted to "technical" and "business" data, but the logical separation should be sufficient.
Physical separation doesn't seem necessary to me, unless you mean an application and data warehouse star schema. In that case, it's either real-time updates or, more typically, a nightly batch ETL.
It makes no difference to MySQL in any way whether you use separate "databases"; they are simply catalogues.
It may make setting permissions easier, which is a legitimate reason to do it. Other than that, it is exactly the same as keeping the tables in the same DB (except that you can have several tables with the same name ... but please don't).
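To illustrate (a sketch with made-up database and table names), a cross-catalogue join on one server looks just like an ordinary join with qualified names:

    SELECT u.name, l.message
    FROM app_core.users AS u
    JOIN app_logs.error_log AS l ON l.user_id = u.id;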
Putting them on separate servers might be a good idea however, as you probably don't want your core critical (user info, for example) data mixed in with your high-volume, unimportant data. This is particularly true for old audit data, debug logs etc.
Also short-lived data, such as search results, sessions etc, could be placed on a different server - it presumably has no high availability[1] requirement.
Having said that, if you don't need to do this, dump it all on one server, where it's easier to manage (backup, providing high availability, managing security, etc.).
It is not generally possible to take a consistent snapshot of data on >1 server. This is a good reason to only have one (or one that you care about for backup purposes)
[1] Of the data, not the database.
In MySQL, InnoDB has the option of storing all tables in one shared tablespace file, or having one file per table.
Having one file per table is somewhat recommended anyway, and if you do that, it makes a difference at the storage level whether you have one database or several.
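For reference, innodb_file_per_table is the server variable involved; it is typically set in my.cnf, and on recent MySQL versions it can also be changed at runtime (a sketch; it only affects tables created afterwards):

    SHOW VARIABLES LIKE 'innodb_file_per_table';
    SET GLOBAL innodb_file_per_table = ON;  -- new tables get their own .ibd file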
With connection pooling, one database or several is probably not going to matter either.
So, in my opinion, the question is if you'd ever consider separating the "other half" of the database to a separate server - with the separate server having perhaps a very different hardware configuration, such as no RAID. If so, consider using separate databases. If not, use a single database.
We are looking to have about 35-40 people writing to an Access database via script on a shared drive. The metrics break down to each of them needing to write about 3-7 times an hour. Would Access support this without going ape on me?
Yes, I would love to use SQL Server for this, but that means going through massive amounts of red tape, meetings, paperwork, etc. that I would prefer not to bother with.
Could you not make them go with the free edition of SQL Server Express without the red tape?
In answer to your question, though: I've seen Access cause big problems in environments with this many users, although that was pre-2007. I dunno how much it has changed.
If it were me, I'd avoid Access at all cost.
Could it? Yes. If you are very careful and perform locking and ensure that nobody steps on anybody else. Access is really not designed for any form of concurrency. I know of one place that managed to make it work in a very concurrent environment, but that environment basically logged everything and if the DB clobbered itself, it'd restore from the last backup and replay against the Access file automatically, so that the failures were transparent. I would not recommend following that course of action...
Should you do it? No. Is there any reason that you cannot use something like PostgreSQL or MySQL?
Yes, it would work. No, it's not a good idea.
Access would be able to handle the load, as long as those 35-40 people aren't all trying to access the database at once. It'll quickly bog down when you start having more than a couple of concurrent users, particularly if those users are all trying to update something.
The problem is that it isn't safe. You need to have the entire database file accessible on a network share, where all the users will be able to write to it. You'll have multiple instances of Access trying to read and modify the file at the same time, and unless you are very careful with locking, it's quite possible for the database to become damaged or corrupt.
You'll also never be able to add any kind of access control beyond basic file permissions. You might not need it now, but internal databases often end up needing to be exposed to the wider world somehow.
It's not worth it. There are plenty of real RDBMS systems out there, for free, that are designed to handle this kind of thing. Why spend time trying to make Access work in such an environment, when you could just install SQL Server Express and be done with it? It has limitations, but if you're seriously considering Access, you're never going to be anywhere near those. Or use MySQL, PostgreSQL, Firebird...
I would avoid Access too. Have you ever thought about SQL CE? It should handle multiple users better, and it is a file, just like Access.
7 * 40 = 280 writes per hour.
280 / 60 ≈ 4.7 writes per minute.
If your script is light, and if you don't read results too often, maybe...
Of course, I don't recommend you try it. Meeting time! ;)
If the connections are opened only as long as needed to run the scripts, and you use transactions and have some retry logic built in when there's a conflict, there really oughtn't be too much of an issue.
If your script takes 1 second to do its update (that's a pretty long time in computer/database terms, of course), and there are 280 updates per hour, if you were lucky enough that no two users simultaneously ran their scripts, you would still have 3,320 seconds when the database was not open.
I don't see an issue, assuming that you know how to properly manage your connections and manage your Jet transactions.
That volume is not a problem for Access so long as it's on a stable LAN or very high speed WAN. Wireless connections are also a bad idea.
I have several clients which are adding about 200K to 300K transactions per year into the systems. So that's about 1000 per work day. That's using both an Access front end and back end.
That said one of them will be upsizing shortly to SQL Server. I fired the other client when they hired a PHB (Dilbert's pointy haired boss.)
It's iffy. The first time the database crashes you'll wish you went with SQL Server Express. And it will crash, eventually.
In my previous job we had a product with an Access database backend. We had some clients with 25 users. We refused clients who had 40 potential users because we knew from experience that the database would corrupt itself on a regular basis, and performance would be unacceptable.
The day we went to SQL Server Express, the performance of the application doubled, and the problems with crashing and corruption virtually disappeared.