Let's say I want to build a gaming website and I have many game sections. They ALL have a lot of data that needs to be stored. Is it better to make one database with a table representing each game or have a database represent each section of the game? I'm pretty much expecting a "depends" kind of answer.
Managing 5 different databases is going to be a headache. I would suggest using one database with 5 different tables. Aside from anything else, I wouldn't be surprised to find you've got some common info between the 5 - e.g. user identity.
Note that your idea of "a lot of data" may well not be the same as the database's... databases are generally written to cope with huge globs of data.
Depends.
Just kidding. If this is one project and the data are in any way related to each other I would always opt for one database absent a specific and convincing reason for doing otherwise. Why? Because I can't ever remember thinking to myself "Boy, I sure wish it were harder to see that information."
While there is not enough information in your question to give a good answer, I would say that unless you foresee needing data from two games at the same time for the same user (or query), there is no reason to combine databases.
You should probably have a single database for anything common, and then create independent databases for anything unique. Databases, like code, tend to end up evolving in different directions for different applications. Keeping them together may lead you to break things or to be more conservative in your changes.
In addition, some databases are optimized, managed, and backed up at the database level rather than the table level. Since the games may have different performance characteristics and usage profiles, a one-size-fits-all solution may not scale.
If you use an ORM framework, you get access to multiple databases (almost) for free while still avoiding code replication. So unless you need joint queries across games, I don't think it's worth running the risk of a shared database.
Of course, if you pay someone to host your databases, it may be cheaper to use a single database, but that's really a business question, not software.
If you do choose to use a single database, do yourself a favour and make sure the code for each game only knows about specific tables. It would make it easier for you to maintain things later or separate into multiple databases.
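A minimal sketch of that advice, assuming Python's standard-library sqlite3 module; the GameStore class and the "<game>_scores" table naming are hypothetical. Each game's code receives a connection and only ever touches its own table, so moving a game into its own database later means passing in a different connection rather than rewriting queries.

```python
import sqlite3
from typing import Optional


class GameStore:
    """Data access for one game; it only ever touches that game's table.

    The "<game>_scores" naming scheme is a hypothetical convention.
    """

    def __init__(self, conn: sqlite3.Connection, game: str) -> None:
        # Table names can't be bound as parameters, so validate before
        # interpolating the name into SQL.
        if not game.isidentifier():
            raise ValueError(f"invalid game name: {game!r}")
        self.conn = conn
        self.table = f"{game}_scores"
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(player TEXT NOT NULL, score INTEGER NOT NULL)"
        )

    def add_score(self, player: str, score: int) -> None:
        with self.conn:  # commits on success
            self.conn.execute(
                f"INSERT INTO {self.table} (player, score) VALUES (?, ?)",
                (player, score),
            )

    def best(self, player: str) -> Optional[int]:
        (best,) = self.conn.execute(
            f"SELECT MAX(score) FROM {self.table} WHERE player = ?", (player,)
        ).fetchone()
        return best  # None if the player has no scores in this game


# Today: both games share one database.
shared = sqlite3.connect(":memory:")
pacman = GameStore(shared, "pacman")
zelda = GameStore(shared, "zelda")
pacman.add_score("alice", 120)
pacman.add_score("alice", 300)

# Later: giving Zelda its own database only changes the connection passed in.
zelda_alone = GameStore(sqlite3.connect(":memory:"), "zelda")
```

The design choice is simply that no game-specific code opens its own connection or names another game's tables; the connection is injected from outside.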
One database.
Most of the stuff you are reasonably going to want to store is going to be text, or primitive data types such as integers. You might fancy throwing your binary content into blobs, but that's a crazy plan on a media-heavy website when the web server will serve files over HTTP for free.
I pulled lead programming duties on a web-site for a major games publisher. We managed to cover a vast portion of their current and previous content, in three European languages.
At no point did we ever consider having multiple databases to store all of this, despite the fact that each title was replete with video and image resources.
I cannot imagine why a multiple database configuration would suit your needs here, either in development or outside of it. The amount of synchronisation you'll have to do, and the capacity for error, are immense. Trying to pull data that pertains to all of them from all of them will be a nightmare.
Every site-wide update you migrate will be n times as hard and error prone, where n is the number of databases you eventually plump for.
Seriously, one database - and that's about as far from your anticipated depends answer as you're going to get.
If the different games don't share any data, it would make sense to use separate databases. On the other hand, it would make sense to use one database if the structure of the games' data is the same; otherwise you would have to make every change in each game database separately.
Update: When in doubt, you should always use one database, because it's easier to manage in most cases. Only if you're sure that the applications are completely separate and have completely different structures should you use more databases. The only real advantage is more clarity.
Generally speaking, "one database per application" tends to be a good rule of thumb.
If you're building one site that has many sections for talking about different games (or different types of games), then that's a single application, so one database is likely the way to go. I'm not positive, but I think this is probably the situation you're asking about.
If, on the other hand, your "one site" is a battle.net-type matching service for a collection of five distinct games, then the site itself is one application and each of the five games is a separate application, so you'd probably want six databases since you have a total of six largely-independent applications. Again, though, my impression is that this is not the situation you're asking about.
If you are going to be storing the same data for each game, it would make sense to use 1 database to store all the information. There would be no sense in replicating table structures across different databases, likewise there would be no sense in creating 5 tables for 5 games if they are all storing the same information.
I'm not sure this is correct, but I think you want to do one database with 5 tables because (along with other reasons) of the alternative's impact on connection pooling (if, for example, you're using ADO.Net). In the ADO.Net connection pool, connections are keyed by the connection string, so with five different databases you might end up with 20 connections to each database instead of 100 connections to one database, which would potentially affect the flexibility of the allocation of connections.
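To make the pooling argument concrete, here is a toy Python model (not ADO.NET code, and the class name is hypothetical) in which pools are looked up by the exact connection string. It assumes, as the answer does, a fixed budget of 100 connections split evenly across five databases; in real ADO.NET each pool has its own Max Pool Size setting, so whether you actually hit this limit depends on how you configure it.

```python
from collections import defaultdict


class PoolRegistry:
    """Toy model of connection pools keyed by the exact connection string.

    Not the ADO.NET API: it only shows that an idle connection in one pool
    can never serve a request made with a different connection string.
    """

    def __init__(self, max_per_pool: int) -> None:
        self.max_per_pool = max_per_pool
        self.idle = defaultdict(int)    # connection string -> idle connections
        self.in_use = defaultdict(int)  # connection string -> busy connections

    def acquire(self, conn_str: str) -> bool:
        if self.idle[conn_str] > 0:
            self.idle[conn_str] -= 1  # reuse an idle connection from this pool
        elif self.in_use[conn_str] >= self.max_per_pool:
            return False  # this pool is full; other pools cannot lend capacity
        # (otherwise a new physical connection would be opened here)
        self.in_use[conn_str] += 1
        return True

    def release(self, conn_str: str) -> None:
        self.in_use[conn_str] -= 1
        self.idle[conn_str] += 1


# One database: one pool with the whole (assumed) budget of 100 connections.
one_db = PoolRegistry(max_per_pool=100)
served_one = sum(one_db.acquire("Server=db;Database=Games") for _ in range(30))

# Five databases, budget split evenly: a burst on Game1 hits its own cap at 20
# even though the pools for Game2..Game5 sit idle.
five_dbs = PoolRegistry(max_per_pool=20)
served_five = sum(five_dbs.acquire("Server=db;Database=Game1") for _ in range(30))
```

With a burst of 30 requests, the single pool serves all 30 while the split setup serves only 20.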
If anybody knows better or has additional info, please add it here, as I'm not sure if what I'm saying is accurate.
What's your idea of "a lot of data"? The only reason that you'd need to split this across multiple databases is if you are trying to save some money with shared hosting (i.e. getting cheap shared hosts and splitting it across servers), or if you feel each database will be in the 500GB+ range and do not have access to appropriate storage.
Note that neither of these reasons has anything to do with architecture; both are entirely about monetary concerns during scaling.
But since you haven't created the site yet, you're putting the cart before the horse. It is very unlikely that a brand new site would use anywhere near this level of storage, so just create 1 database.
Some companies have single databases in the 1,000+ TB range ... there is basically no upper bound on database size.
The number of databases you want to create depends not on the number of your games, but on the data stored in the databases or, more precisely, on how you exchange that data between the databases.
If it is export and import, then do separate databases.
If it is normal relationships (with foreign keys and cross-queries), then leave it in one database.
If the databases are not related to each other, then they are separate databases, of course.
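For contrast with the import/export case, here is what "normal relationships" buy you inside one database: a minimal Python/sqlite3 sketch (the users and scores tables are hypothetical) in which the engine itself rejects a score for a user that doesn't exist. In SQLite, for instance, a foreign key's parent and child tables must live in the same database, so splitting them apart means enforcing this in application code instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE scores ("
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " points INTEGER NOT NULL)"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO scores (user_id, points) VALUES (1, 50)")  # valid link
conn.commit()

try:
    # No user 99 exists, so the database itself refuses this orphan row.
    conn.execute("INSERT INTO scores (user_id, points) VALUES (99, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
    conn.rollback()
```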
In one of my projects, I distinguished between the internal and external data (which were stored in separate databases).
The difference was quite simple:
The external database stored only the facts you cannot change or undo: in our case, phone calls, SMS messages and incoming payments.
The internal database stored the things that are usually stored: users, passwords, etc.
The external database used only natural PRIMARY KEYs: phone numbers, bank transaction IDs, etc.
The databases were granted completely different access rights, and exchanging data between them was a matter of import and export, not relationships.
This made sure that nothing would happen to the actual data: it is easy to relink a payment to a user, but it's very hard to restore a payment if it's lost.
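A rough Python/sqlite3 sketch of that arrangement, with made-up table names and sample data: the external database holds only immutable facts keyed by a natural key (here a bank transaction ID), while the internal database holds users and the links between users and payments. Data crosses over only through an explicit import step, so losing or redoing the internal links never touches the facts. The separate access rights aren't modelled, since SQLite has no user accounts.

```python
import sqlite3

# External database: immutable facts only, keyed by natural keys.
external = sqlite3.connect(":memory:")
external.execute(
    "CREATE TABLE payments ("
    " bank_txn_id TEXT PRIMARY KEY,"  # natural key assigned by the bank
    " phone TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL)"
)
external.executemany(
    "INSERT INTO payments (bank_txn_id, phone, amount_cents) VALUES (?, ?, ?)",
    [("TX-1001", "+15550001", 500), ("TX-1002", "+15550002", 1200)],
)
external.commit()

# Internal database: ordinary, mutable application data.
internal = sqlite3.connect(":memory:")
internal.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT UNIQUE)")
internal.execute(
    "CREATE TABLE payment_links (bank_txn_id TEXT PRIMARY KEY, user_id INTEGER NOT NULL)"
)
internal.executemany(
    "INSERT INTO users (id, phone) VALUES (?, ?)", [(1, "+15550001"), (2, "+15550002")]
)
internal.commit()


def import_payments() -> int:
    """Read facts from the external DB and (re)link them to internal users."""
    linked = 0
    for txn_id, phone in external.execute("SELECT bank_txn_id, phone FROM payments"):
        user = internal.execute("SELECT id FROM users WHERE phone = ?", (phone,)).fetchone()
        if user is not None:
            internal.execute(
                "INSERT OR REPLACE INTO payment_links (bank_txn_id, user_id) VALUES (?, ?)",
                (txn_id, user[0]),
            )
            linked += 1
    internal.commit()
    return linked


first_import = import_payments()
internal.execute("DELETE FROM payment_links")  # simulate losing the links...
internal.commit()
relinked = import_payments()                   # ...and rebuild them from the facts
```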
I can pass on my experience with a similar situation.
We had 4 "Common" databases and about 30 "Specific" databases, separated for the same space concerns. The downside is that the space concerns were just projecting dBase shortcomings onto SQL Server. We ended up with all these databases on SQL Server Enterprise that were well under the maximum size allowed by the Desktop edition.
From a database perspective with respect to separation of concerns, the 4 Common databases could've been 2. The 30 Specific databases could've been 3 (or even 1 with enough manipulation / generalization). It was inefficient code (both stored procs and data access layer code) and table schema that dictated the multitude of databases; in the end it had nothing at all to do with space.
I would consolidate as much as possible early and keep your design & implementation flexible enough to extract components if necessary. In short, plan for several databases but implement as one.
Remember, especially on web sites: if you have multiple databases, you often lose the performance benefits of query caching and connection pooling. Stick to one.
Definitely, one database.
One place I worked had many databases: a common one for the things all clients used, and client-specific ones for per-client customization. What ended up happening was that, since the clients asked for the changes, the changes ended up in the client database instead of the common one. As a result there were 27 ways of doing essentially the same thing, because nothing was ever refactored from client-specific to "hey, this is something other clients will need to do as well, so let's put it in common". So one database tends to lead to less reinventing of the wheel.
Security Model
If each game will have a distinct set of permissions/roles specific to that game, split it out.
Query Performance / Complexity
I'd suggest keeping them in a single database if you need to frequently query across the data between the games.
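As a minimal illustration of the kind of cross-game query that argues for a single database (Python with sqlite3; the users, pacman_scores and zelda_progress tables and their rows are hypothetical), one statement can report each user's results in both games. With the games in separate databases, the same report would need cross-database syntax, or two queries stitched together in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pacman_scores (user_id INTEGER REFERENCES users(id), points INTEGER);
    CREATE TABLE zelda_progress (user_id INTEGER REFERENCES users(id), hearts INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO pacman_scores VALUES (1, 300), (1, 150), (2, 90);
    INSERT INTO zelda_progress VALUES (1, 12);
    """
)

# One query spans both games because everything lives in one database.
rows = conn.execute(
    """
    SELECT u.name,
           (SELECT MAX(points) FROM pacman_scores p WHERE p.user_id = u.id),
           (SELECT MAX(hearts) FROM zelda_progress z WHERE z.user_id = u.id)
    FROM users u
    ORDER BY u.name
    """
).fetchall()
```

Here rows holds one tuple per user, with None where a user has no data for a game.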
Scalability
Another consideration is your scalability plans. If the games get extremely popular, you might want to buy separate database hardware for each game. Separating them into different databases from the start would make that easier.
Data Size
The size of the data should not be a factor in this decision.
Just to add a little: when you have millions and millions of players in one game, the game is real-time with tens of thousands of simultaneous players online, and you have to keep at least some essential data (say, a player's virtual money) as up to date in the DB as possible, then you will want to separate tables into independent DBs even though they are all "connected".
It really depends. And scaling will be painful whatever you do to avoid it. But if you really expect A LOT of players, updates and data, I would advise thinking twice, thrice and more before settling on a "one DB for several projects" solution.
Yes, it will probably be difficult to manage several DBs. But you will have to do this anyway.
Really depends :)
Ask yourself these questions:
Is there anything reusable (e.g. a users table) that I should think about?
Is it worth separating these entities, or are they pretty much the same?
Do any of these entities share specific events / needs?
Is it worth my time and effort to build 5 different database systems? (Remember, if you are writing the games, that would mean different connection strings and more security to manage, etc.)
Or you could create one database OnlineGames and have a table that stores the game name and a category:
PacMan | Arcade
Zelda | Role playing
etc.
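A minimal sketch of that layout in Python with sqlite3; the games and scores tables, their columns, and the extra Tetris row are assumptions for illustration. Adding a new game becomes a row in games rather than a new table or database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the scores -> games link
conn.execute("CREATE TABLE games (name TEXT PRIMARY KEY, category TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE scores ("
    " game TEXT NOT NULL REFERENCES games(name),"
    " player TEXT NOT NULL,"
    " points INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO games (name, category) VALUES (?, ?)",
    [("PacMan", "Arcade"), ("Zelda", "Role playing")],
)
conn.executemany(
    "INSERT INTO scores (game, player, points) VALUES (?, ?, ?)",
    [("PacMan", "alice", 300), ("Zelda", "alice", 40)],
)

# A new game is just another row, not another table or database.
conn.execute("INSERT INTO games (name, category) VALUES ('Tetris', 'Puzzle')")
conn.commit()

arcade = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM games WHERE category = ? ORDER BY name", ("Arcade",)
    )
]
game_count = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
```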
It really depends on what your intentions are...
Related
I've been asked to develop an application that will be rolled out to a number of business units. The application will be basically the same for each unit, but will have minor procedural differences, which won't change the structure of the underlying database. Should I use one database per business unit, or one big database for all the units? The business units are totally separate.
My preference is for one database per client. The advantages:
if a client gets too big, they're easy to move: backup, restore, change the connection string, boom. Try doing that when their data is mixed in with others' in a massive database. Even if you use schemas and filegroups to segregate, moving them is not a cakewalk.
ditto for deleting a client's data when they move on.
by definition you're keeping each client's data separate. This is often going to be a want, and sometimes a need. Sometimes it will even be legally binding.
all of your code within a database is simpler - it doesn't have to include the client's schema (which can't be parameterized) and your tables don't have to be littered with an extra column indicating the client.
A lot of people will claim that managing 200 or 500 databases is a lot harder than managing 10 databases. It's not really any different, in my experience. You build scripts that automate things, you stagger index maintenance and backup jobs, etc.
The potential disadvantages are when you get up into the realm of 4-digit and higher databases per instance, where you want to start thinking about having multiple servers (the threshold really depends on the workload and the hardware, so I'm just picking a number). If you build the system right, adding a second server and putting new databases there should be quite simple. Again, the app should be aware of each client's connection string, and all you're doing by using different servers is changing the instance the connection string points to.
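A minimal Python sketch of that setup, under stated assumptions: the catalog dict (client name to database location) and the helper names are hypothetical, two temporary directories stand in for two servers, and SQLite's online backup API (sqlite3.Connection.backup, Python 3.7+) stands in for a SQL Server backup and restore. The application always looks the location up in the catalog, so moving a client is: copy, repoint, done.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical catalog: client name -> location of that client's database.
catalog = {}


def connect_for(client):
    """Application code never hard-codes a location; it asks the catalog."""
    return sqlite3.connect(catalog[client])


def move_client(client, new_location):
    """Copy one client's database to a new location, then repoint the catalog."""
    src = sqlite3.connect(catalog[client])
    dst = sqlite3.connect(new_location)
    try:
        src.backup(dst)  # full online copy of the source database
    finally:
        src.close()
        dst.close()
    catalog[client] = new_location


old_server = Path(tempfile.mkdtemp())  # stand-ins for two database servers
new_server = Path(tempfile.mkdtemp())

catalog["acme"] = str(old_server / "acme.db")
conn = connect_for("acme")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO orders (id) VALUES (1)")
conn.commit()
conn.close()

move_client("acme", str(new_server / "acme.db"))

conn = connect_for("acme")  # same call, now served from the "new server"
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```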
Some questions over on dba.SE you should look at. They're not all about SQL Server, but many of the concepts and challenges are universal:
https://dba.stackexchange.com/questions/16745/handling-growing-number-of-tenants-in-multi-tenant-database-architecture
https://dba.stackexchange.com/questions/5071/what-are-the-performance-implications-of-running-multiple-smaller-dbs-instead-of
https://dba.stackexchange.com/questions/7924/one-big-database-vs-several-smaller-ones
Your question is a design question. In order to answer it, you need to understand the requirements of the system that you want to build. From a technical perspective, SQL Server -- or really any database -- can handle either scenario.
Here are some things to think about.
The first question is how separate your clients need the data to be. Mixing data together from different business units may not be legal in some cases (say, the investment side of a bank and the market analysis side). In such situations, separate databases are the solution.
The next question is security. In some situations, clients might be very uncomfortable knowing that their data is intermixed with other clients data. A small slip-up, and confidential information is inadvertently shared. This is probably not an issue for different business units in the same company.
Do you have to deal with different uptime requirements, upload requirements, customizations, and perhaps interaction with other tools? If one business unit will need customizations ASAP that other business units are not interested in, then that suggests different databases.
Another consideration is performance. Does this application use a lot of expensive resources? If so, being able to partition the application on different databases -- and potentially different servers -- may be highly desirable.
On the other hand, if much of the data is shared, and the repository is really a central repository with the same underlying functionality, then one database is a good choice.
In my database I have tables that will grow at different speeds:
fairly static ones that I don't expect to grow much at all,
medium-grade ones that will grow somewhat linearly with number of users and their activity
fast-growing ones that will grow rapidly as they hold logged data points
I have been fretting a little bit about maintaining this database as it is growing. It is a balancing scenario:
A single database is easier to work with but it may have higher maintenance costs in the future
Multiple databases can be easier to maintain once the application is deployed, but will require more R&D time
Can you recommend one or the other solution based on your past experience?
Thanks,
I think that how fast the data grows is irrelevant. I think that it makes more sense to have databases split up based on something mapped to a real-world reason.
For example, we typically have one database per app that we write. We have a database for Nutrition and Ingredients, another one for Job Listings, etc. We do this because it's easier for us to keep track of which database affects which apps (in other words, to avoid confusion).
But we do have one Common database that holds information used by multiple applications (such as corporate info, locations, etc.), so that we can avoid data duplication. (Why maintain a list of locations in each database?)
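A small Python sketch of the "Common database" idea, using SQLite's ATTACH DATABASE as a stand-in for cross-database queries (on SQL Server you would typically use three-part names such as Common.dbo.Locations instead). The file names, tables and rows are made up for illustration: a job-listings database joins against a locations table that lives only in the shared database.

```python
import sqlite3
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
common_path = tmp / "common.db"  # hypothetical shared database
jobs_path = tmp / "jobs.db"      # hypothetical per-app database

common = sqlite3.connect(common_path)
common.execute("CREATE TABLE locations (id INTEGER PRIMARY KEY, city TEXT NOT NULL)")
common.execute("INSERT INTO locations (id, city) VALUES (1, 'Berlin'), (2, 'Toronto')")
common.commit()
common.close()

jobs = sqlite3.connect(jobs_path)
jobs.execute("CREATE TABLE listings (title TEXT NOT NULL, location_id INTEGER NOT NULL)")
jobs.execute("INSERT INTO listings (title, location_id) VALUES ('Chef', 2), ('Baker', 1)")
jobs.commit()  # ATTACH is not allowed inside an open transaction

# Make the shared database visible to this connection under the name "common".
jobs.execute("ATTACH DATABASE ? AS common", (str(common_path),))
rows = jobs.execute(
    """
    SELECT l.title, c.city
    FROM listings AS l
    JOIN common.locations AS c ON c.id = l.location_id
    ORDER BY l.title
    """
).fetchall()
```

The locations list is maintained once, in the shared database, and every application database reads it from there.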
I'm not saying this is how you should structure your data, but I listed this as an example of a good reason to have data split across multiple databases.
Other than having different maintenance plans for databases with varying growth, I can't see any reason to split based on database activity.
Splitting them behind a load balancer is the same either way. Maintaining them once they're deployed should be done in a controlled, pre-tested manner regardless of how fast the tables grow.
Unless I'm missing something, I don't see a good reason for it, but I don't see a good reason not to do it, either. If it makes sense for you, and simplifies your business process without adding confusion or other problems, then it makes sense in your situation, and I see no harm in it at all.
I am working on a project that has the potential to have a large number of users each of which will be managing their own unique data sets. I am thinking the data can be stored in one of two ways.
1) Create a completely different database for each user so that their data is fully separate from everyone elses
2) Share the data in the same database, and segregate it at the query level using a user_id field.
The schema will always be identical for each user.
The main thing is that the system will need to be able to scale, and I am not sure if having potentially several thousand different databases, or storing millions of records in the same tables would scale better.
I am interested in hearing from anyone who has dealt with this kind of situation in the past and what pitfalls might be out there with either option.
In addition to the scaling aspect that you have already identified, there are a few other concerns which may drive your decision. Also, "a large number of users" can cover such a wide range of numbers that it would be worth clarifying.
Other operational concerns:
Security - relying on a user_id field within your code relies on there being no error or flaw that allows a user to see / manipulate other users' data.
Upgrades - this goes both ways: you either upgrade everyone at once (single DB) or, by splitting, allow yourself to upgrade different sets of users at different times.
Backup / Restore - depending on the restore requirements and SLAs, you may find that having everyone in a single database creates too much of a problem when it comes to backup / restore. If a single client wants to restore their data, the operational overhead when it is combined with all the other clients' data is not trivial. Equally, lots of databases means lots of separate backups.
Scalability - having the ability to place different users' databases on separate servers can aid scale, instead of requiring a big-iron DB server. But again, that is a management overhead.
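Two practical points for the shared-database option, sketched in Python/sqlite3 (the documents table, its index and the UserScope helper are hypothetical): index the user_id column so per-user lookups stay cheap as the shared tables grow, and route every read through one helper that always applies the user_id filter, rather than trusting each query to remember it.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " title TEXT NOT NULL)"
)
# Per-user lookups filter on user_id, so give them an index to use.
conn.execute("CREATE INDEX idx_documents_user ON documents (user_id)")
conn.executemany(
    "INSERT INTO documents (user_id, title) VALUES (?, ?)",
    [(1, "alice notes"), (1, "alice budget"), (2, "bob plans")],
)
conn.commit()


class UserScope:
    """Hypothetical helper: the single entry point for reading documents.

    Centralising the user_id filter gives one place to get right, instead of
    relying on every query in the code base to remember it.
    """

    def __init__(self, conn: sqlite3.Connection, user_id: int) -> None:
        self.conn = conn
        self.user_id = user_id

    def titles(self) -> List[str]:
        cursor = self.conn.execute(
            "SELECT title FROM documents WHERE user_id = ? ORDER BY title",
            (self.user_id,),
        )
        return [title for (title,) in cursor]


alice_titles = UserScope(conn, 1).titles()
bob_titles = UserScope(conn, 2).titles()
```

This narrows, but does not remove, the security concern: a bug in the helper, or code that bypasses it, can still leak data across users.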
Multi-tenancy of an application and its data source is not an easy question / answer. Understanding more about how many users counts as "large" in this case, combined with the operational concerns, should provide you with guidance.
Option 2 should be your best bet. Databases are usually designed to work with millions and millions of rows and lots of data. So, as long as you design your schema correctly and have proper indexes, fill factors, etc., option 2 will give you the scaling you are looking for. As DarthVader said, learn more about database design.
Don't create a separate database for each user. That's not good.
What if you have a million users?
Create tables for users and for the entities that belong to the same context. You can't scale an application with one database per user. And before learning about scalability, you need to learn about database design and how databases work.
Is MySQL capable of managing the data for a site that holds lots of data (say, with hundreds of millions of users)? Which database would be the most capable/beneficial?
Wikipedia is based on MySQL. I don't think it has 100M users, but it must be close by now.
No database will handle hundreds of millions of users unless you know how to set it up properly. No single server could handle that kind of traffic, so you need to know how to set up replication and load balancing. Once you reach a certain level, there is no out-of-the-box solution, only tools you can use, and MySQL is a very capable one.
There are a couple of answers to this.
Yes, MySQL can store hundreds of millions of records; you need to know what you're doing, have a decent database schema, pretty robust hardware, but you're not pushing the limits.
When you talk about "hundreds of millions of users", you're talking about a site along the lines of Wikipedia/Facebook/Google/Amazon in scale. You need a custom, highly cached, distributed architecture to run a site at that scale - and the traditional database driven application architecture will almost certainly not be enough. You could still store your data in MySQL, but you'd need a whole bunch of additional components to make it all work - and without knowing more about the application, nobody could tell you what that might be. At that scale, none of the commonly used databases would suffice, so MySQL is no better or worse than any of the other options...
Your question is really irrelevant, because creating a product or service that hundreds of millions of customers actually want is a much bigger and more difficult challenge than choosing a database engine.
If you're starting a business from nothing, pick a technical platform you already know and go with it: productivity and quick implementation will be more important than being scalable to a level you may never reach anyway.
If you do eventually become successful enough to have to deal with hundreds of millions of customers, then you'll certainly be able to raise the cash to buy whatever expertise and hardware you need.
Is it because of size?
By the way, what is the limit of the size?
There are many reasons to use two databases, some that come to mind:
Size (the limit of which is controlled by the operating system, filesystem, and database server)
Separation of types of data. Think of a database like a book: you wouldn't write a book that spans multiple subjects, and you shouldn't (necessarily) have a database with multiple subjects. As long as all of the data is somehow related (i.e. all the tables have something to do with one website or application), you could keep it together.
Import / Export - it might be easier to import data into your application if you can drop and restore a whole database, rather than import individual rows into a database table.
Separate applications, or services. I can't see any reason to use separate databases for a single app/service.
(note: replication, even multimaster, isn't a separate database. Neither is Sharding.)
I believe some on here are confusing Database with a Database Instance.
Example:
A phone book is a prime example of a Database.
Replication:
Having 2 copies of the same phone book does not mean you have 2 databases. It means you have 2 copies of 1 database, and you can hand 1 to someone else so that you can both look up different things at the same time, thereby accomplishing more work at once.
Sharding:
You could tear those phone books apart at the end of the white pages and the beginning of the yellow pages and hand them to 2 more people. You could further tear them at each letter, and when you need Susan Summers, ask the person with that section of the book to look for her.
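The phone-book sharding described here amounts to a routing function. This Python sketch assumes four hypothetical shards split by the first letter of the surname; the boundaries and shard names are made up, and a real system would also need a plan for rebalancing when one range grows faster than the others.

```python
import bisect

# Hypothetical shard map: each shard owns surnames from its starting letter
# up to (but not including) the next shard's starting letter.
SHARD_STARTS = ["A", "H", "N", "T"]
SHARD_NAMES = ["shard-a-g", "shard-h-m", "shard-n-s", "shard-t-z"]


def shard_for(surname: str) -> str:
    """Return the shard that holds this surname's entries."""
    first = surname[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"cannot route surname {surname!r}")
    # bisect_right finds the last shard whose starting letter is <= first.
    index = bisect.bisect_right(SHARD_STARTS, first) - 1
    return SHARD_NAMES[index]
```

For example, shard_for("Summers") returns "shard-n-s": only the person holding the N to S section needs to look.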
Suppose you wanted to publish or reuse some external database and keep it separate from your primary database. This would be a good reason to use 2 databases: you can drop and reimport the external database at any time without affecting your own database, and vice versa.
You can use two databases for the same reason most banks have two ATMs: reliability. You can swap one in if the other fails, but doing it quickly requires setup, such as a CNAME record and control of your own DNS server.
You can also do writes on one database, if the writes have complex triggers on them, and use some syncing between databases to keep the second one, which is used for selects, up to date.
You can use two databases for load sharing, for example, use round-robin to split up the load so one isn't overloaded.
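The last two ideas (writes on one database, reads on synchronised copies, with load spread round-robin) can be sketched as a small router. This is a toy Python model with placeholder server names and a hypothetical Router class; it only decides where a statement should go, and it assumes that replication keeping the read copies up to date is handled elsewhere.

```python
import itertools
from typing import List


class Router:
    """Toy router: writes go to the primary, reads rotate over the replicas."""

    # Naive classification by first keyword; e.g. a WITH ... INSERT statement
    # would wrongly be treated as a read.
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, primary: str, replicas: List[str]) -> None:
        if not replicas:
            raise ValueError("need at least one replica for reads")
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin over replicas

    def target(self, sql: str) -> str:
        words = sql.split(None, 1)
        if not words:
            raise ValueError("empty statement")
        verb = words[0].upper()
        return self.primary if verb in self.WRITE_VERBS else next(self._replicas)


router = Router("db-primary", ["db-replica-1", "db-replica-2"])
targets = [
    router.target(q)
    for q in ["SELECT 1", "UPDATE t SET x = 1", "SELECT 2", "select 3"]
]
```

Reads alternate between the two replicas while the UPDATE goes to the primary.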
I sometimes have separate databases because they handle different concerns, e.g. a Reporting database or an Authentication database.
Replication
Making your system scalable by dividing your database system across different physical locations
Providing redundancy/replication for backups and seamless uptime.
As Ben mentioned, Replication is one reason. Another is load balancing.
For example, Hotmail uses many database servers and customer data is broken up across the databases.
To have all of their customers' data on one server would not only require massive storage requirements, but the response times would be horrible.
In other cases, the data may be separated by function. You may well have two sets of data which are either not connected, or at least very loosely so, and in such cases, it may make sense to separate that data from the rest.
Also consider IO needs: writing to one, reading from the other; one with immediate transactional needs, others where "transactions" can be queued; one instance at high priority, the other at "idle" priority, etc. That said, with the correct hardware and tablespace/filesystem layouts, most of these situations can be achieved in a single DB.
I think SQLite databases on the iPhone are limited to a size of 50 megabytes, but you can open several databases.