Organizing a MySQL Database

I'm developing an application that will require me to create my first large-scale MySQL database. I'm currently having a difficult time wrapping my mind around the best way to organize it. Can anyone recommend any reading materials that show different ways to organize a MySQL database for different purposes?
I don't want to try getting into the details of what I imagine the database's main components will be because I'm not confident that I can express it clearly enough to be helpful at this point. That's why I'm just looking for some general resources on MySQL database organization.

The way I learned to work these things out is to stop and make associations.
In more object-oriented languages that force OO, you learn to think OO very quickly, which is sort of what you're after here (I'm assuming you're using PHP?).
My workflow is like this:
Work out what data you need to store. (Customer name etc.)
Work out the main objects you're working with (e.g. Customer, Order, Salesperson etc), and assign each of these a key (e.g. Customer ID).
Work out which data connects to which objects. (Customer name belongs to a customer)
Work out how the main objects connect to each other (Salesperson sold order to Customer)
Once you have these, you have a good object model of what you're after. The next step is to look at the connections. For example:
Each customer has only one name.
Each product can be sold multiple times to anybody
Each order has only one salesperson and one customer.
Once you've worked that out, you want to try something called normalization, which is the art of turning this collection of data into a set of tables while minimizing redundancy. (The idea is that one-to-one data, such as the customer name, is stored in the same table as the customer ID; one-to-many and many-to-one relationships are stored as a foreign key column in the table on the "many" side, and many-to-many relationships get a separate linking table.)
That's pretty much the gist of it. If you ask for it, I'll scan an example sheet from my workflow for you.
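A minimal sketch of that normalization for the Customer/Order/Salesperson/Product example, assuming Python's built-in sqlite3 module as a stand-in for MySQL (all table and column names are illustrative, and the table is called orders because ORDER is a reserved word). One-to-one data stays in the entity's own table, each order points to exactly one customer and one salesperson through foreign keys, and the many-to-many link between orders and products gets its own table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL              -- one-to-one: each customer has one name
);
CREATE TABLE salesperson (
    salesperson_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL
);
-- Each order has exactly one customer and one salesperson (many-to-one).
CREATE TABLE orders (
    order_id       INTEGER PRIMARY KEY,
    customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
    salesperson_id INTEGER NOT NULL REFERENCES salesperson(salesperson_id)
);
-- Products can be sold many times and an order can hold many products
-- (many-to-many), so the relationship lives in its own linking table.
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, product_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO salesperson VALUES (1, 'Bob')")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.execute("INSERT INTO orders VALUES (10, 1, 1)")
conn.executemany("INSERT INTO order_item VALUES (10, ?, ?)", [(1, 2), (2, 1)])

# Reassemble the "object" view with joins: who bought what, from whom.
rows = conn.execute("""
    SELECT c.name, s.name, p.title, oi.quantity
    FROM orders o
    JOIN customer c    ON c.customer_id = o.customer_id
    JOIN salesperson s ON s.salesperson_id = o.salesperson_id
    JOIN order_item oi ON oi.order_id = o.order_id
    JOIN product p     ON p.product_id = oi.product_id
    ORDER BY p.title
""").fetchall()
for row in rows:
    print(row)
```

In MySQL the same layout would use InnoDB tables so that the foreign keys are actually enforced.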

Maybe I can provide some advice based on my own experience:
unless you have a very specific need, use the InnoDB storage engine (transactions, row-level locking, etc.); FULLTEXT indexes, once a reason to pick MyISAM, are supported by InnoDB since MySQL 5.6
specify the default character set: utf8mb4 is usually a good choice (MySQL's older utf8 is a 3-byte encoding that cannot store characters such as emoji)
fine-tune the server parameters (innodb_buffer_pool_size for InnoDB, key_buffer_size for MyISAM, etc.; there is a lot of material on the Net)
draw your DB schema by hand, and discuss it with colleagues and programmers
define data types based not only on the program's usage, but also on the join queries (joins are faster when the joined columns have the same type)
create indexes based on the queries you expect to need, again in discussion with the programmers
plan a backup solution (based on DB replication, scripts, etc.)
manage users and access: grant only the necessary rights, and create a read-only user for the many queries that do not need write access
define the server scale: disks (RAID?), memory, CPU
Here are also some tips to use and create a database.
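For the "create indexes based on the expected queries" point, the useful habit is to ask the database for its query plan before and after adding an index. A minimal sketch using Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN as a stand-in (in MySQL the equivalent check is EXPLAIN SELECT ...); the orders table, the query and the idx_orders_customer index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        created_at  TEXT NOT NULL
    )
""")

# The query the programmers expect to run most often.
QUERY = "SELECT order_id, created_at FROM orders WHERE customer_id = ?"

def plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's query-plan detail lines joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

before = plan(QUERY, (42,))  # no suitable index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(QUERY, (42,))   # the planner should now search via the index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan differs between SQLite versions, so only the presence of SCAN versus the index name is worth relying on.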

I can recommend the first chapter of An Introduction to Database Systems; it may help you organize your ideas. I would also certainly recommend aiming for 4th normal form rather than 5th; this is very important.

If I could give you only one piece of advice, it would be to generate test data at volumes similar to production and benchmark the main queries.
Just make sure that the data distribution is realistic. (Not all people are named "John", and not all people have unique names. Not all people give their phone number, and most people won't have 10 phone numbers either.)
Also, make sure that the test data doesn't fit into RAM (unless you expect the production data volumes to do too).
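A rough sketch of generating test data with a skewed, realistic-looking distribution, assuming plain Python's random module; the name pool, the Zipf-like weights and the phone-count weights are made-up assumptions to be replaced with figures from your own domain. For a real benchmark the row count should be large enough that the data no longer fits in the database's memory.

```python
import random
from collections import Counter

# Hypothetical name pool; a real one would come from your domain.
FIRST_NAMES = ["John", "Maria", "Wei", "Fatima", "Olga", "Ahmed",
               "Lucia", "Kenji", "Amara", "Sven", "Priya", "Diego"]

def generate_people(n: int, seed: int = 0) -> list:
    rng = random.Random(seed)  # fixed seed so benchmark runs are repeatable
    # Zipf-like weights: a few names are very common, most are rare,
    # so names are neither all "John" nor all unique.
    weights = [1 / rank for rank in range(1, len(FIRST_NAMES) + 1)]
    people = []
    for person_id in range(1, n + 1):
        name = rng.choices(FIRST_NAMES, weights=weights)[0]
        # Some people give no phone number, most give one, a few give more.
        phone_count = rng.choices([0, 1, 2, 3], weights=[30, 55, 12, 3])[0]
        phones = [f"+1555{rng.randrange(10**7):07d}" for _ in range(phone_count)]
        people.append({"id": person_id, "name": name, "phones": phones})
    return people

people = generate_people(10_000)
name_counts = Counter(p["name"] for p in people)
print(name_counts.most_common(3))
print(sum(1 for p in people if not p["phones"]), "people without a phone number")
```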

Related

Ways to structure an application that has two clear parts

I am working on a project that has a seemingly infinite number of tables. We need a solution that brings scalability to the platform, and we can't figure out what a really good one would be.
The platform is a job-search site, so it has two clear parts: candidates and companies.
We've been thinking and have come up with these possible solutions for restructuring the current database, as it is a monster:
2 APIs, 2 databases: this would take a lot of database migration work, but would define the different parts of the platform very clearly.
2 APIs, 1 database: the database work would be reduced to normalizing what we have now, but the two parts of the platform would still be logically separated.
1 API, 1 database: normalize the database and do everything in the same API, trying to separate everything logically, making it scalable while keeping each part accessible from the other.
Right now I lean towards the 1 API, 1 database solution, but we would like to hear from some experienced users before making the final choice.
Thank you!
I was in a situation somewhat like yours some years ago. I will try to express my thoughts on how we handled it. All this might sound opinionated, but every task is different, and therefore so are the implementations.
The two largest problems I notice:
Having an infinite number of tables is the first sign that your current database schema design is a Big Ball of Mud.
Acknowledging that you have a monster database indicates that you had better start refactoring it into smaller pieces. Yes, I know it's never easy.
It would add a lot more value to your question if you would show us some of the architectural details/parts of your codebase, so we could give better suited ideas.
Please forgive me for linking Domain Driven Design related information sources. I know that DDD is not about any technological fluff, however the strategy you need to choose is super important and I think it brings value to this post.
Know your problem domain
Before you start taking your database apart, you should clearly understand how your problem domain works. To put it simply, the problem domain is the set of business problems you are trying to solve with the strategy you are going to apply.
Pick your strategy
The most important thing here is: the business value your strategy brings. The proposed strategy in this case is to make clear distinctions between your database objects.
Be tactical!
We have chosen the strategy; now we need to define the tactics for this refactoring. They should be clearly set out, for example:
Separate the related database objects that belong together, this defines explicit boundaries.
Make sure the connections between the regrouped database objects remain intact and are working. I'm talking about cross table/object references here.
Let's get technical - the database
How to break things
I personally would split your current schema into three separate parts:
Candidates
Companies
Common tables
Reasoning
By strategically splitting up these database objects, you consciously separate these concerns. This separation gives you something new: a tactical boundary.
Each of your newly separated schemas now has its own context and its own boundary. For example, there is the Candidates schema's bounded context: it groups together the candidate-related business concepts, rules, etc. The same applies to the Companies schema.
The only exception is the Common tables schema. It could serve as a shared kernel (a bridge, if you like) between your other databases, containing all the shared tables that every other schema needs to reach.
Outcome
All that has been said could bring you up to a level where you can:
Backup/restore faster and more conveniently
Scale database instances separately
Easily set/monitor the access of database objects defined per schema
The API
How to glue things
This is the point where it gets really greasy; however, implementing an API really depends on your business use case. I personally would design two different public APIs.
Example
For Candidates
For Companies
The same design principles apply here as well. The only difference is that I see no added business value in adding an API for the Common tables. It could be just a simple database schema that both of the main APIs query or send commands to.
In my humble opinion, separating the databases results in some content management difficulties. Both separate parts will contain exactly the same tables, such as job positions, cities, business areas, etc. How will you maintain these tables? Will you insert the country "Zimbabwe" into both of them? What if their primary keys are not equal? At some point you will need to use data from both of these separated databases, and which "Zimbabwe" record will you use then? I'm not talking about performance, but using the same database for these two projects will make life easier for you. Also, we are in the cloud age, and you can scale your single database service/server/droplet as you want. For clarity between modules, you can define naming conventions: for example, if a table is used by both parts, add the prefix "common_"; if a table is only used by candidates, use "candidate_", etc.
For the API, you can use the same methodology: define 3 different API parts, Common, Candidates and Companies. With this approach, though, you need a well-tested authentication and authorization layer for your API.
If I were you, I'd choose the 1 API, 1 Database.
If it fails, splitting 1 API into 2 APIs or 1 database into 2 databases is much easier than merging them (humble opinion...)
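A minimal sketch of the single-database layout with the common_/candidate_/company_ prefixes suggested above, assuming Python's sqlite3 as a stand-in for the real server; the tables and sample rows are invented for illustration. Because "Zimbabwe" exists exactly once in common_country, both parts of the platform reference the same key and can be joined directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Shared by both parts of the platform, stored exactly once.
    CREATE TABLE common_country (
        country_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    -- Used only by the candidates part.
    CREATE TABLE candidate_profile (
        candidate_id INTEGER PRIMARY KEY,
        full_name    TEXT NOT NULL,
        country_id   INTEGER NOT NULL REFERENCES common_country(country_id)
    );
    -- Used only by the companies part.
    CREATE TABLE company_profile (
        company_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES common_country(country_id)
    );
""")

conn.execute("INSERT INTO common_country (name) VALUES ('Zimbabwe')")
zw = conn.execute(
    "SELECT country_id FROM common_country WHERE name = 'Zimbabwe'"
).fetchone()[0]
conn.execute("INSERT INTO candidate_profile (full_name, country_id) VALUES ('Tendai M.', ?)", (zw,))
conn.execute("INSERT INTO company_profile (name, country_id) VALUES ('Harare Tech', ?)", (zw,))

# Matching candidates to companies by country needs no cross-database key mapping.
matches = conn.execute("""
    SELECT cand.full_name, comp.name, c.name
    FROM candidate_profile cand
    JOIN company_profile comp ON comp.country_id = cand.country_id
    JOIN common_country c     ON c.country_id = cand.country_id
""").fetchall()
print(matches)
```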

Postgres shared schema multi-tenant setup for e-commerce SaaS

I've researched a lot for the best multi-tenant setup for an e-commerce project but could not find a fitting answer. I am leaning towards a shared database, separate schema setup with either MySQL or PostgreSQL. The structure of the tables is the same for all tenants. I really like that with this setup the application code doesn't need to add an extra WHERE clause to every query, so it is very developer-friendly!
Now you also have the shared schema approach and that is what I am currently using, but I feel this is bad for security and isolation purposes. I would like to move to a different solution.
The app will be used by about 100 webshops (tenants) this year, and I expect it to grow into the thousands. Webshops range from small to large, so it is important that I can later pick out a specific shop's data and put it on its own database server.
Since I don't have any experience with a separate schema setup, I would like to know whether it would benefit me. What issues might I run into on that path? Changes to the structure of the tables are what bother me most. What is the practical limit on the number of schemas in a separate schema approach with PostgreSQL (I will have 100-1000 schemas) before it becomes a pain to manage?
The following is my take
When using a shared database with the same tables for all tenants, isolating tenants with a tenantid column is easy, because you always add a filter that says tenantid = LoggedInTenantId (LoggedInTenantId is set during login). If your ORM, such as EF (I'm from .NET), has a base method for this, it can append the filter automatically to every query that goes out of the code.
When you opt for a shared database with separate schemas, a use case like sharing data between tenants (webshops) is not feasible. Similarly, if you have an accountant who wants to audit a collection of tenants and wants a dashboard showing account statistics, it becomes practically impossible.
From a scaling point of view, you are better off with a separate database per tenant if a single tenant, or a group of them, needs to scale out. This will be better than managing schemas.
Consider the use cases you may have for your product and share them here so that we can take this discussion forward.
HTH
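A minimal sketch of the "base method that always appends the tenant filter" idea from the shared-table answer, assuming Python with sqlite3 rather than EF; TenantScopedDb and ALLOWED_TABLES are hypothetical names, not part of any ORM. The tenant id is fixed when the wrapper is created (at login), and every SELECT goes through the same WHERE tenant_id = ? clause.

```python
import sqlite3

ALLOWED_TABLES = {"product"}  # hypothetical whitelist of tenant-owned tables

class TenantScopedDb:
    """Hypothetical wrapper that appends the tenant filter to every SELECT,
    in the spirit of an ORM base method or global query filter."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int):
        self.conn = conn
        self.tenant_id = tenant_id  # set once, at login

    def select(self, table: str, where: str = "1=1", params: tuple = ()):
        # Table names cannot be bound as parameters, so only known names pass.
        if table not in ALLOWED_TABLES:
            raise ValueError(f"unknown table: {table}")
        # `where` is developer-written SQL; user-supplied values go in `params`.
        sql = f"SELECT * FROM {table} WHERE tenant_id = ? AND ({where})"
        return self.conn.execute(sql, (self.tenant_id, *params)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, name TEXT)")
conn.execute("CREATE INDEX idx_product_tenant ON product (tenant_id)")
conn.executemany(
    "INSERT INTO product (tenant_id, name) VALUES (?, ?)",
    [(1, "Shoes"), (1, "Socks"), (2, "Laptop")],
)

shop1 = TenantScopedDb(conn, tenant_id=1)
print(shop1.select("product"))                           # only tenant 1's rows
print(shop1.select("product", "name = ?", ("Laptop",)))  # tenant 2's row stays hidden
```

The weak spot is the same one the answers mention: any query that bypasses the wrapper also bypasses the filter.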

Best database model for saas application (1 db per account VS 1 db for everyone)

A quick question: I'm developing SaaS software (an ERP).
I designed it with 1 database per account for these reasons:
I do a lot of personalisation and need to add specific table columns for each account.
Easier to manage DB backups (and reload data!)
Less risky: sometimes I need to run SQL queries on a table, and if I make a mistake with a bad query (UPDATE/DELETE...), only one customer is affected instead of all of them.
Bad point: I'm ending up with hundreds of databases...
I'm hiring a company to manage my servers, and they said it's better to have only one database with a few tables, putting all the data in the same tables with a column such as id_account. I'm very, very surprised by this, so I'm wondering... what are your thoughts?
Thanks !
Frederic
In the environment I currently work in, we handle millions of records from numerous clients. Our solution is to use schemas to segregate each individual client. A schema allows you to partition your clients into separate virtual databases inside a single DB. Each schema has an exact copy of the tables from your application.
The upside:
Segregated client data
data from a single client can be easily backed up, exported or deleted
Programming is still the same, but you have to select the schema before db calls
Moving clients to another db or standalone server is a lot easier
adding specific tables per client is easier (see below)
single instance of the database running
tuning the db affects all tenants
The downside:
Unless you manage your shared schema properly, you may duplicate data
Migrations are repeated for every schema
You have to remember to select the schema before db calls
I'm hard pressed to add many negatives... I guess I may be biased.
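To make "select the schema before DB calls" and "migrations are repeated for every schema" concrete, here is a rough sketch in Python. It assumes SQLite, where each ATTACHed in-memory database plays the role of one tenant schema (in PostgreSQL you would use CREATE SCHEMA and SET search_path instead). The tenant names and migrations are hypothetical, and SQLite caps the number of attached databases (10 by default), so this is an illustration only.

```python
import sqlite3

TENANTS = ["shop_a", "shop_b", "shop_c"]  # hypothetical tenant schemas

# Ordered migrations; every one must be applied to every tenant schema.
MIGRATIONS = [
    "CREATE TABLE {s}.product (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE {s}.product ADD COLUMN price_cents INTEGER NOT NULL DEFAULT 0",
]

conn = sqlite3.connect(":memory:")
for tenant in TENANTS:
    # Stand-in for CREATE SCHEMA: each attached database acts as one tenant schema.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {tenant}")

def migrate_all(conn: sqlite3.Connection) -> None:
    # The work grows with len(TENANTS) * len(MIGRATIONS).
    for tenant in TENANTS:
        for sql in MIGRATIONS:
            conn.execute(sql.format(s=tenant))

migrate_all(conn)

# "Selecting the schema" here means qualifying table names with the tenant.
conn.execute("INSERT INTO shop_a.product (name, price_cents) VALUES ('Mug', 900)")
print(conn.execute("SELECT name, price_cents FROM shop_a.product").fetchall())
print(conn.execute("SELECT COUNT(*) FROM shop_b.product").fetchone()[0])
```

A real setup would also record which migrations each schema has already received, so that a failure halfway through the tenant list can be resumed safely.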
Adding specific tables: why would you add client-specific tables if this is SaaS and not custom software? It's better to use a Postgres DB with an hstore field and store as much searchable data as you like.
Schemas are ideal for multi-tenant databases.
A lot of what I am telling you depends on your software stack, the capabilities of your developers and the backend db you selected (all of which you neglected to mention)
Your hardware guys should not decide your software architecture. If they do, you are likely shooting yourself in the foot before you even get out of the gate. Get a good senior software architect; the grief they save you will likely save your business.
I hope this helps...
Bonne chance (good luck)!

5 separate database or 5 tables in 1 database?

Let's say I want to build a gaming website and I have many game sections. They ALL have a lot of data that needs to be stored. Is it better to make one database with a table representing each game or have a database represent each section of the game? I'm pretty much expecting a "depends" kind of answer.
Managing 5 different databases is going to be a headache. I would suggest using one database with 5 different tables. Aside from anything else, I wouldn't be surprised to find you've got some common info between the 5 - e.g. user identity.
Note that your idea of "a lot of data" may well not be the same as the database's... databases are generally written to cope with huge globs of data.
Depends.
Just kidding. If this is one project and the data are in any way related to each other I would always opt for one database absent a specific and convincing reason for doing otherwise. Why? Because I can't ever remember thinking to myself "Boy, I sure wish it were harder to see that information."
While there is not enough information in your question to give a good answer, I would say that unless you foresee needing data from two games at the same time for the same user (or query), there is no reason to combine databases.
You should probably have a single database for anything common, and then create independent databases for anything unique. Databases, like code, tend to end up evolving in different directions for different applications. Keeping them together may lead you to break things or to be more conservative in your changes.
In addition, some databases are optimized, managed, and backed up at the database level rather than the table level. Since the games may have different performance characteristics and usage profiles, a one-size-fits-all solution may not be scalable.
If you use an ORM framework, you get access to multiple databases (almost) for free while still avoiding code duplication. So unless you have queries that join data across games, I don't think it's worth taking the risk of a shared database.
Of course, if you pay someone to host your databases, it may be cheaper to use a single database, but that's really a business question, not software.
If you do choose to use a single database, do yourself a favour and make sure the code for each game only knows about specific tables. It would make it easier for you to maintain things later or separate into multiple databases.
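A minimal sketch of "each game's code only knows about its own tables", assuming Python's sqlite3; GameStore and the {game}_score naming scheme are hypothetical. Every game currently receives the same connection, but because each store touches only its own prefixed table, moving one game to its own database later would mean handing it a different connection.

```python
import sqlite3

class GameStore:
    """Hypothetical data-access class for one game. It only ever touches
    tables prefixed with the game's name."""

    def __init__(self, conn: sqlite3.Connection, game: str):
        if not game.isidentifier():
            raise ValueError("game name must be a plain identifier")
        self.conn = conn
        self.scores = f"{game}_score"  # the only table this store knows about
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.scores} "
            "(player TEXT NOT NULL, points INTEGER NOT NULL)"
        )

    def add_score(self, player: str, points: int) -> None:
        self.conn.execute(f"INSERT INTO {self.scores} VALUES (?, ?)", (player, points))

    def top(self, n: int = 3) -> list:
        return self.conn.execute(
            f"SELECT player, points FROM {self.scores} ORDER BY points DESC LIMIT ?", (n,)
        ).fetchall()

shared = sqlite3.connect(":memory:")  # one database for now...
chess = GameStore(shared, "chess")
racing = GameStore(shared, "racing")  # ...racing could get its own connection later
chess.add_score("ann", 1200)
racing.add_score("ann", 95)
print(chess.top(), racing.top())
```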
One database.
Most of the stuff you are reasonably going to want to store is going to be text, or primitive data types such as integers. You might fancy throwing your binary content into blobs, but that's a crazy plan on a media-heavy website when the web server will serve files over HTTP for free.
I pulled lead programming duties on a web-site for a major games publisher. We managed to cover a vast portion of their current and previous content, in three European languages.
At no point did we ever consider having multiple databases to store all of this, despite the fact that each title was replete with video and image resources.
I cannot imagine why a multiple-database configuration would suit your needs here, either in development or outside of it. The amount of synchronisation you'll have to do, and the capacity for error, are immense. Trying to pull data that pertains to all of the games from all of the databases will be a nightmare.
Every site-wide update you migrate will be n times as hard and error prone, where n is the number of databases you eventually plump for.
Seriously, one database - and that's about as far from your anticipated depends answer as you're going to get.
If the different games don't share any data, it would make sense to use separate databases. On the other hand, it would make sense to use one database if the structure of the games' data is the same; otherwise you would have to make every change in each game database separately.
Update: when in doubt, use one database, because it's easier to manage in most cases. Only if you're sure that the applications are completely separate and have completely different structures should you use more databases. The only real advantage is more clarity.
Generally speaking, "one database per application" tends to be a good rule of thumb.
If you're building one site that has many sections for talking about different games (or different types of games), then that's a single application, so one database is likely the way to go. I'm not positive, but I think this is probably the situation you're asking about.
If, on the other hand, your "one site" is a battle.net-type matching service for a collection of five distinct games, then the site itself is one application and each of the five games is a separate application, so you'd probably want six databases since you have a total of six largely-independent applications. Again, though, my impression is that this is not the situation you're asking about.
If you are going to be storing the same data for each game, it would make sense to use 1 database to store all the information. There would be no sense in replicating table structures across different databases, likewise there would be no sense in creating 5 tables for 5 games if they are all storing the same information.
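When every game stores the same kind of data, one table keyed by game replaces five identical per-game tables. A minimal sketch, assuming Python's sqlite3; the game/high_score tables are hypothetical, and the PacMan/Zelda rows borrow the game-and-category idea from a later answer in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (
        game_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL UNIQUE,
        category TEXT NOT NULL
    );
    -- One table for all games instead of one identical table per game.
    CREATE TABLE high_score (
        game_id INTEGER NOT NULL REFERENCES game(game_id),
        player  TEXT NOT NULL,
        points  INTEGER NOT NULL
    );
    CREATE INDEX idx_high_score_game ON high_score (game_id, points);
""")
conn.executemany(
    "INSERT INTO game (name, category) VALUES (?, ?)",
    [("PacMan", "Arcade"), ("Zelda", "Role playing")],
)
conn.executemany(
    "INSERT INTO high_score VALUES ((SELECT game_id FROM game WHERE name = ?), ?, ?)",
    [("PacMan", "ann", 3300), ("PacMan", "bob", 2100), ("Zelda", "ann", 87)],
)

# Questions that span games stay a single query.
per_player = conn.execute("""
    SELECT player, COUNT(DISTINCT game_id) AS games_played
    FROM high_score
    GROUP BY player
    ORDER BY player
""").fetchall()
print(per_player)
```

Adding a sixth game is then an INSERT into game rather than a new table or database.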
I'm not sure this is correct, but I think you want to do one database with 5 tables because (along with other reasons) of the alternative's impact on connection pooling (if, for example, you're using ADO.Net). In the ADO.Net connection pool, connections are keyed by the connection string, so with five different databases you might end up with 20 connections to each database instead of 100 connections to one database, which would potentially affect the flexibility of the allocation of connections.
If anybody knows better or has additional info, please add it here, as I'm not sure if what I'm saying is accurate.
What's your idea of "a lot of data"? The only reason that you'd need to split this across multiple databases is if you are trying to save some money with shared hosting (i.e. getting cheap shared hosts and splitting it across servers), or if you feel each database will be in the 500GB+ range and do not have access to appropriate storage.
Note that neither of these reasons has anything to do with architecture; both are entirely based on monetary concerns during scaling.
But since you haven't created the site yet, you're putting the cart before the horse. It is very unlikely that a brand new site would use anywhere near this level of storage, so just create 1 database.
Some companies have single databases in the 1,000+ TB range ... there is basically no upper bound on database size.
The number of databases you want to create depends not on the number of your games, but on the data stored in the databases, or, more precisely, on how you exchange that data between the databases.
If it is export and import, then do separate databases.
If it is normal relationships (with foreign keys and cross-queries), then leave it in one database.
If the databases are not related to each other, then they are separate databases, of course.
In one of my projects, I distinguished between the internal and external data (which were stored in separate databases).
The difference was quite simple:
The external database stored only the facts you cannot change or undo: in our case, phone calls, SMS messages and incoming payments.
The internal database stored the things that are usually stored: users, passwords, etc.
The external database used only natural PRIMARY KEYs, such as phone numbers and bank transaction IDs.
The databases were given completely different access rights, and exchanging data between them was a matter of import and export, not relationships.
This made sure that nothing would happen with actual data: it is easy to relink a payment to a user, but it's very hard to restore a payment if it's lost.
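A rough sketch of the external/internal split described above, assuming Python's sqlite3 with an ATTACHed second database standing in for the separate internal database; the table names, triggers and sample transaction are hypothetical. The external payment table uses the bank's transaction ID as a natural key and refuses UPDATE and DELETE, while the internal ownership link can be changed freely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                      # "external" facts database
conn.execute("ATTACH DATABASE ':memory:' AS internal")  # separate "internal" database
conn.executescript("""
    CREATE TABLE payment (
        bank_txn_id  TEXT PRIMARY KEY,    -- natural key supplied by the bank
        amount_cents INTEGER NOT NULL,
        received_at  TEXT NOT NULL
    );
    -- Facts cannot be changed or undone: refuse UPDATE and DELETE outright.
    CREATE TRIGGER payment_no_update BEFORE UPDATE ON payment
    BEGIN SELECT RAISE(ABORT, 'payments are immutable'); END;
    CREATE TRIGGER payment_no_delete BEFORE DELETE ON payment
    BEGIN SELECT RAISE(ABORT, 'payments are immutable'); END;

    -- Internal data: which user a payment belongs to. There is no foreign key
    -- across the two databases; the link is plain data that can be re-imported.
    CREATE TABLE internal.payment_owner (
        bank_txn_id TEXT PRIMARY KEY,
        user_id     INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO payment VALUES ('TX-1001', 5000, '2024-01-15')")
conn.execute("INSERT INTO internal.payment_owner VALUES ('TX-1001', 7)")

# Relinking a payment to another user is a harmless internal update...
conn.execute("UPDATE internal.payment_owner SET user_id = 8 WHERE bank_txn_id = 'TX-1001'")

# ...but the recorded fact itself cannot be removed.
try:
    conn.execute("DELETE FROM payment WHERE bank_txn_id = 'TX-1001'")
    deleted = True
except sqlite3.DatabaseError:
    deleted = False
print("deleted:", deleted)
```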
I can pass on my experience with a similar situation.
We had 4 "Common" databases and about 30 "Specific" databases, separated for the same space concerns. The downside is that the space concerns were just projecting dBase shortcomings onto SQL Server. We ended up with all these databases on SQL Server Enterprise that were well under the maximum size allowed by the Desktop edition.
From a database perspective with respect to separation of concerns, the 4 Common databases could've been 2. The 30 Specific databases could've been 3 (or even 1 with enough manipulation / generalization). It was inefficient code (both stored procs and data access layer code) and table schema that dictated the multitude of databases; in the end it had nothing at all to do with space.
I would consolidate as much as possible early and keep your design & implementation flexible enough to extract components if necessary. In short, plan for several databases but implement as one.
Remember, especially on websites: if you have multiple databases, you often lose the performance benefits of query caching and connection pooling. Stick to one.
Definitely, one database.
One place I worked had many databases: a common one for the things all clients used and client-specific ones for per-client customization. What ended up happening was that, since the clients asked for the changes, the changes ended up in the client database instead of the common one. As a result there were 27 ways of doing essentially the same thing, because nobody refactored a client-specific feature into the common database once it was clear that other clients would need it too. So one database tends to lead to less reinventing of the wheel.
Security Model
If each game will have a distinct set of permissions/roles specific to that game, split it out.
Query Performance/Complexity
I'd suggest keeping them in a single database if you need to frequently query across the data between the games.
Scalability
Another consideration is your scalability plans. If the games get extremely popular, you might want to buy separate database hardware for each game. Separating them into different databases from the start would make that easier.
Data Size
The size of the data should not be a factor in this decision.
Just to add a little: when you have millions and millions of players in one game, the game is real-time, tens of thousands of players are online simultaneously, and you have to keep at least some essential data (say, a player's virtual money) as up to date in the DB as possible, then you will want to separate tables into independent DBs even though they are all "connected".
It really depends. And scaling will be painful whatever you may try to do to avoid it being painful. But if you really expect A LOT of players and updates and data I would advise on thinking twice, thrice and more before settling on a "one DB for several projects" solution.
Yes it will be difficult to manage several DBs probably. But you will have to do this anyway.
Really depends :)..
Ask yourself these questions:
Is there anything reusable (e.g. a users table) that I should think about?
Is it worth separating these entities, or are they pretty much the same?
Do any of these entities share specific events/needs?
Is it worth my time and effort to build 5 different database systems? (Remember, if you are writing the games, that would mean different connection strings, more security to set up, etc.)
Or you could create one database OnlineGames and have a table that stores the game name and a category:
PacMan Arcade
Zelda Role playing
etc etc..
It really depends on what your intentions are...

Should I split the data between multiple databases or keep them in a single one?

I'm creating a multi-user/company web application in PHP & MySQL. I'm interested to know what the best practice is with regards to structuring my database(s).
There will be hundreds of companies and thousands of users of this web app, so this needs to be robust. Each company won't be able to see other companies' data, just its own. We will be storing mainly text data, probably only a few MB per company.
Currently the database contains 14 tables (for one sample company).
Is it better to put the data for all companies and their users in a single database and create a unique companyID for each one?
or:
Is it better to put each company's data in its own database and create a new database and table set for each new company that I add?
What are the pluses and minuses to each approach?
Thanks,
Stephen
If a single web app is being used by all the different companies, unless you have a very specific need or reason to use separate databases (it doesn't sound like you do), then you should definitely use a single database.
Your application will be responsible for only showing the correct information to the correct authenticated users.
Multiple databases would be a nightmare to maintain. For each new company you'd have to create and administer a new database. If you make a change to one schema, you'll have to repeat it in every company's copy of your 14+ tables.
Thousands of users and thousands of apps shouldn't pose a problem at all as long as you're using something that is a real database and not Access or something silly like that.
Multi-tenant
Pluses
Relatively easy to develop: only change database code in one place.
Lets you easily create queries which use data for multiple tenants.
Straightforward to add new tenants: no code needs to change.
Transforming a multi-tenant to a single-tenant setup is easy, should you need to change your design.
Minuses
Risk of data leak between tenants if coding is sloppy. Tenant view filters can in some cases be employed to reduce this risk. This method is based on using different database user accounts for different tenants.
If you break the code, all tenants will be affected.
Single-tenant
Pluses
If you have very different requirements for different tenants, several different database models can be beneficial. This is the best case for using a single tenant setup.
Even if you code sloppily, there's practically no risk of data leaks between tenants (tenant A will not be able to access tenant B's data). In addition, if you accidentally destroy the schema of one tenant through a botched update, other tenants will remain unaffected.
Less SQL code when you don't need to take tenant ID values into account in your queries
Minuses
Database schemas tend to diverge over time, often resulting in a nightmare. A database compare tool can alleviate this problem, but potentially many schemas need to be compared.
Including data from several databases in one query is typically complex, and often requires prepared statements.
Developing is hard, since you need to make the same changes to multiple schemas.
The same database entity can appear in many databases with different ID keys, resulting in confusion.
Transforming a single-tenant to a multi-tenant setup is very hard, should you need to change your design.
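The schema-drift minus above is what a database compare tool is meant to catch. A rough sketch of the idea, assuming Python's sqlite3 with one in-memory database per tenant; the tenant names, the invoice table and the one-off po_number column are invented for illustration. Each tenant's table and column layout is compared against the first tenant, used as the reference.

```python
import sqlite3

def schema_signature(conn: sqlite3.Connection) -> dict:
    """Map each table name to its list of (column name, declared type)."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        t: [(col[1], col[2]) for col in conn.execute(f'PRAGMA table_info("{t}")')]
        for t in tables
    }

def find_drift(tenants: dict) -> list:
    """Return the tenants whose schema differs from the first (reference) tenant."""
    names = list(tenants)
    reference = schema_signature(tenants[names[0]])
    return [name for name in names[1:] if schema_signature(tenants[name]) != reference]

# Hypothetical tenants, each with its own database.
tenants = {name: sqlite3.connect(":memory:") for name in ("acme", "globex", "initech")}
for conn in tenants.values():
    conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, total_cents INTEGER)")

# A one-off customization applied to a single tenant only.
tenants["initech"].execute("ALTER TABLE invoice ADD COLUMN po_number TEXT")

print(find_drift(tenants))
```

The cost grows with the number of tenants: every deployment has to run this check (or something like it) across all of them.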
A single database is the relational way. One aspect from this perspective is that databases gather statistics about database usage and make heavy use of this. If you split things up you will be shooting yourself in the foot as the statistics will be fragmented.