Is it OK to keep 10000+ tables in a MySQL database?
I'm making a messaging/chat script, so I'm thinking about partitioning data's over several tables as it will be a huge amount of data after some days.
IS IT OK?
Or it has some effect?
Well, as a table can hold millions of rows so I was thinking maybe a database can hold large number of tables too
or, the question could be like, how does Facebook stores their huge amount of daily chat messages?
I'm a newbie in MySQL, please help
MySQL has no limit on the number of tables. The underlying file system may have a limit on the number of files that represent tables. Individual storage engines may impose engine-specific constraints. InnoDB permits up to 4 billion tables.
Even so, the typical DBMS will 'handle' such large databases, but there is more strain on the system catalog than usual in such systems.
I have about huge tables in one database with no ill effects, other than displaying the table list in phpMyAdmin taking a while
It's possible, but I would avoid it unless you have a really good use case for it. It raises all kinds of scalability and maintainability issues. Your table size is mainly limited by available disk space.
If you really need to do it...
You'll need to increase the maximum number of file descriptors that your OS will allow to have open, since MyISAM tables use two file descriptors per table. (If you're using Linux then read the section about ulimit in the man page for bash for how to do this).
Also, there's a MySQL config value called table_cache that limits the number of allowed tables. You'll need to make sure that's large enough to support the number of tables you need.
You won't want to use the standard "flush tables" anymore (unless you're the kind of person that likes to watch paint dry) so you'll need to flush each table individually (e.g. before shutdown).
Again, I would avoid using so many tables. You're probably better off making your schema support what you need in a handful of tables, and consider archiving, warehousing (or deleting!) old data if you're concerned about storing too much data.
Related
I am looking for a free SQL database able to handle my data model. The project is a production database working in a local network not connected to the internet without any replication. The number of application connected at the same times would be less than 10.
The data volume forecast for the next 5 years are:
3 tables of 100 millions rows
2 tables of 500 millions rows
20 tables with less than 10k rows
My first idea was to use MySQL, but I have found around the web several articles saying that MySQL is not designed for big database. But, what is the meaning of big in this case?
Is there someone to tell me if MySQL is able to handle my data model?
I read that Postgres would be a good alternative, but require a lot of hours for tuning to be efficient with big tables.
I don't think so that my project would use NOSQL database.
I would know if someone has some experience to share with regarding MySQL.
UPDATE
The database will be accessed by C# software (max 10 at the same times) and web application (2-3 at the same times),
It is important to mention that only few update will be done on the big tables, only insert query. Delete statements will be only done few times on the 20 small tables.
The big tables are very often used for select statement, but the most often in the way to know if an entry exists, not to return grouped and ordered batch of data.
I work for Percona, a company that provides consulting and other services for MySQL solutions.
For what it's worth, we have worked with many customers who are successful using MySQL with very large databases. Terrabytes of data, tens of thousands of tables, tables with billions of rows, transaction load of tens of thousands of requests per second. You may get some more insight by reading some of our customer case studies.
You describe the number of tables and the number of rows, but nothing about how you will query these tables. Certainly one could query a table of only a few hundred rows in a way that would not scale well. But this can be said of any database, not just MySQL.
Likewise, one could query a table that is terrabytes in size in an efficient way. It all depends on how you need to query it.
You also have to set specific goals for performance. If you want queries to run in milliseconds, that's challenging but doable with high-end hardware. If it's adequate for your queries to run in a couple of seconds, you can be a lot more relaxed about the scalability.
The point is that MySQL is not a constraining factor in these cases, any more than any other choice of database is a constraining factor.
Re your comments.
MySQL has referential integrity checks in its default storage engine, InnoDB. The claim that "MySQL has no integrity checks" is a myth often repeated over the years.
I think you need to stop reading superficial or outdated articles about MySQL, and read some more complete and current documentation.
MySQLPerformanceBlog.com
High Performance MySQL, 3rd edition
MySQL 5.6 manual
MySQL has a two important (and significantly different) database engines - MyISAM and InnoDB. A limits depends on usage - MyISAM is nontransactional - there is relative fast import, but it is too simple (without own memory cache) and JOINs on tables higher than 100MB can be slow (due too simple MySQL planner - hash joins is supported from 5.6). InnoDB is transactional and is very fast on operations based on primary key - but import is slower.
Current versions of MySQL has not good planner as Postgres has (there is progress) - so complex queries are usually much better on PostgreSQL - and really simple queries are better on MySQL.
Complexity of PostgreSQL configuration is myth. It is much more simple than MySQL InnoDB configuration - you have to set only five parameters: max_connection, shared_buffers, work_mem, maintenance_work_mem and effective_cache_size. Almost all is related to available memory for Postgres on server. Usually work for 5 minutes. On my experience a databases to 100GB is usually without any problems on Postgres (probably on MySQL too). There are two important factors - how speed you expect and how much memory and how fast IO you have.
With large databases you have to have a experience and knowledges for any database technology. All is fast when you are in memory, and when ratio database size/memory is higher, then much more work you have to do to get good results.
First of all, MySQLs table size is only limited by the allowed file size limit of your OS which is I. The terra bytes on any modern OS. That would pose no problems. Most important are questions like this:
What kind of queries will you run?
Are the large table records updated frequently or basically archives for history data?
What is your hardware budget?
What is the kind of query speed you need?
Are you familiar with table partitioning, archive tables, config tuning?
How fast do you need to write (expected inserts per second)
What language will you use to connect to the db (Java, .net, Ruby etc)
What platform are you most familiar with?
Will you run queries which might cause table scans such like '%something%' which would have to go through every single row and take forever
MySQL is used by Facebook, google, twitter and others with large tables and 100,000,000 is not much in the age of social media. MySQL has very little drawbacks (even though I prefer postgresql in most cases) like altering large tables by adding a new index for example. That might send your company in a couple days forced vacation if you don't have a replica in the meantime. Is there a reason why NoSQL is not an option? Sometimes hybrid approaches are a good choice like having your relational business logic in MySQL and huge statistical tables in a NoSQL database like MongoDb which can scale by adding new servers in minutes (MySQL can too but it's more complicated). Now MongoDB can have a indexed column which can be searched by in blistering speed.
Bejond the bottom line: you need to answer the above questions first to make a very informed decision. If you have huge tables and only search on indexed keys almost any database will do - if you expect many changes to the structure down the road you want to use a different approach.
Edit:
Based on your update you just posted I doubt you would run into problems.
In a MySQL database with multiple tables, is the runtime of an query on a table affected by the size of all the other tables within the same database?
I can only think of two effects that other tables could have on a query, assuming that these tables are not involved in the query, no other queries are running, and there are no constraints or triggers that link the tables.
The first effect is during the compilation phase. During this phase, the SQL parser is fetching information about tables and columns from metadata tables. More tables and columns in the database could slow down the compilation of the query.
The other effect is page table and disk fragmentation. If you have a clean system and start allocating and filling pages, then the pages are probably going to fill contiguous pages on the disk (no guarantees, but probably). At access time, the operating system might be pre-fetching physical pages adjacent to a requested page. In the environment I described, this pre-fetched data would probably be used in the query.
In a database with multiple tables, you have a much greater chance of the disk being fragmented. In this case, the pre-fetched hardware pages are less likely to be for the same tables in the query. This means that they don't get used, so an additional I/O request is needed to get the next page.
I've described this in terms of disk fragmentation, but a similar thing can happen with the pages themselves. The physical pages where data for a table is stored may not be contiguous, with similar results.
Fragmentation can be an issue with databases. In fact, it can be an issue regardless of the number of tables in the database. But more tables with more insert/delete activity on them tends to increase fragmentation. However, the effects are usually pretty slight, and would only in certain extreme circumstances be responsible for a significant reduction in performance.
Having many tables (and data) only requires an HDD space.
Processing speed depends on your table optimization and size + queries you executing, not an entire database.
Query run-time can be effected by a wide variety of server environment concerns, so while one might say that in and of itself having other large tables in the database wouldn't effect the speed of queries on unrelated tables, the reality is those tables might be getting queried too and consuming server resources. This of course effects your query speed.
It should only be a problem if that table (the large one) is related to the other table with some sort of connection (foreign key constraints would be one example)...otherwise your tables should behave independently. That said, if you have a single table that is so massive as to be causing speed problems, you might want to find other solutions for refactoring some of that data into smaller subsets.
So, one of my tables in MySQL which uses the InnoDB storage engine will contain multi-billion rows(with potentially no limit to how many will be inserted).
Can you tell me what sort of optimizations i can do to help speed up things?
Cause with a few million rows already, it will start getting slow.
Of course if you suggest to use something else. The only options i have are PostgreSQL and Sqlite3. But I've been told that sqlite3 is not a good choice for that.
As for postgresql, i have absolutely no idea how it is, as i've never used it.
I imagine though, at least about 1000-1500 inserts per second in that table.
A simple answer to your question would be yes InnoDB would be the perfect choice for a multi-billion row data set.
There is a host of optimization that is possbile.
The most obvious optimizations would be setting a large buffer pool, as buffer pool is the single most important thing when it comes to InnoDB because InnoDB buffers the data as well as the index in the buffer pool. If you have a dedicated MySQL server with only InnoDB tables, then you should set upto 80% of the available RAM to be used by InnoDB.
Another most important optimization is having proper indexes on the table (keeping in mind the data access/update pattern), both primary and secondary. (Remember that primary indexes are automatically appended to secondary indexes).
With InnoDB there are some extra goodies, such as protection from data corruption, auto-recovery etc.
As for increasing write-performance, you should setup your transaction log files to be upto a total of 4G.
One other thing that you can do is partition the table.
You can eek out more performance, by setting the bin-log-format to "row", and setting the auto_inc_lock_mode to 2 (that will ensure that innodb does not hold table level locks when inserting into auto-increment columns).
If you need any specific advice you can contact me, I would be more than willing to help.
optimizations
Take care not to have too many indexes. They are expensive when inserting
Make your datatypes fit your data, as tight fit you can. (so don't go saving ip-adresses in a text or a blob, if you know what i mean). Look in to varchar vs char. Don't forget that because varchar is more flexible, you are trading in some things. If you know a lot about your data it might help to use char's, or it might be clearly better to use varchars. etc.
Do you read at all from this table? If so, you might want to do all the reading from a replicated slave, although your connection should be good enough for that amount of data.
If you have big inserts (aside from the number of inserts), make sure your IO is actually quick enough to handle the load.
I don't think there is any reason MySQL wouldn't support this. Things that can slow you down from "thousands" to "millions" to "billions" are stuff like aforementioned indexes. There is -as far as i know- no "mysql is full" problem.
Look into Partial indexes. From wikipedia (quickest source I could find, didn't check the references, but I'm sure you can manage:)
MySQL as of version 5.4 does not
support partial indexes.[3] In MySQL,
the term "partial index" is sometimes
used to refer to prefix indexes, where
only a truncated prefix of each value
is stored in the index. This is
another technique for reducing index
size.[4]
No idea on the MySQL/InnoDB part (I'd assume it'll cope). But if you end up looking at alternatives, PostgreSQL can manage a DB of unlimited size on paper. (At least one 32TB database exists according to the FAQ.)
Can you tell me what sort of optimizations i can do to help speed up things?
Your milage will vary depending on your application. But with billions of rows, you're at least looking into partitioning your data, in order to work on smaller tables.
In the case of PostgreSQL, you'd also look into creating partial indexes where appropriate.
You may want to have a look at:
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/
http://forums.whirlpool.net.au/archive/954126
If you have a very large table (Billions of records) and need to data mine the table (queries that read lots of data), mysql can slow to a crawl.
Large databases (200+GB) are fine, but they are bound by IO/ temp table to disk and multiple other issues when attempting to read large groups that don't fit in memory.
Is it bad to have too many tables in a database? I have about 160 tables in one database. Is it better to split it into several database rather than using a single database? Single database is more convenient for me.
There are no server limits on the number of tables in a MySQL database. You will definitely have no problems with 160 tables, and you don't need to split them into multiple databases.
You will not gain performance by splitting your tables into multiple databases. If performance remains an issue, you could consider using per-table tablespaces in order to place some sets of tables on different physical disks.
according to MySQL reference manual:
MySQL has no limit on the number of tables. The underlying file system may have a limit on the number of files that represent tables. Individual storage engines may impose engine-specific constraints. InnoDB permits up to 4 billion tables.
160 tables isn't radically huge.
16,000 might be...probably would be...more unreasonable - such databases exist in ERP or CRM systems (even into the 40-50K tables range, but many of those tables are not actually used, or are only barely used).
Even so, the typical DBMS will 'handle' such large databases, but there is more strain on the system catalog than usual in such systems.
160 is still ok. it makes SQL command more faster than making too many contents under a single table. In my case I have 8,545,214 tables in a single Mysql database. I dont want to store millions of user in a single database that's why I used multiple table to store each post a user done. it makes mysql more faster than searching on a single table with millions of rows.
WordPress Multisite creates dozens of tables for every new subsite in the same database.
So you can be so good at only 160 tables.
It might be an issue to manage them with PhpMyAdmin or other software to see and scroll through the tables. But if you work with the code it should not be a problem.
What I have:
A MySQL database running on Ubuntu that maintains a
large table of articles (similar to
wordpress).
Need to relate a given article to
another set of data. This set of data
will be fairly large.
There maybe various sets of data that
will be related.
The query:
Is it better to contain these various large sets of data within the same database of articles, which will have a large set of traffic on it?
or
Is it better to create different databases (on the same server) that
relate by a primary key to the main database with the articles?
Put them all in the same DB initially, until you find that there is a performance issue. Much easier than prematurely optimising.
Modern RDBMS are very good at optimising data access.
If you need to connect frequently and read both of the records, you should put in a the same database. The server then won't have to run permission checks twice for each of your databases.
If you have serious traffic, you should consider using persistent connection for that query.
If you don't need to read them together frequently, consider to put on different machine. As the high traffic for the bigger database won't cause slow downs on the other.
Different databases on the same server gives you all the problems of a distributed architecture without any of the benefits of scaling out. One database per server is the way to go.
When you say 'same database' and 'different databases related' don't you mean 'same table' vs 'different tables'?
if that's the question, i'd say:
one table for articles
if these 'other sets of data' are all of the same structure, put them all in the same table. if not, one table per kind of data.
everything on the same database
if you grow big enough to make database size a performance issue (after many million records and lots of queries a second), consider table partitioning or maybe replacing the biggest table with a key/value store (couchDB, mongoDB, redis, tokyo cabinet, [etc][6]), which can be a little faster than MySQL but a lot easier to distribute for performance.
[6]:key-value store