Is it all right to have many tables inside a database or should I create another database? If so, what is the limit on how many tables I should have in a database?
Will having many tables in a database affect the speed of the database?
It depends on the size of the tables, as well as the amount of reading/writing you're going to be doing, which in turn depends on the hardware you're running and the types of tables you're using.
Performance is usually reduced by lots of I/O because that tends to be the slowest part of a system.
As for your other question about limits, may I suggest having a look through the MySQL documentation.
Really, the key is to employ a good relational database design, and understand and optimize your queries appropriately. Having many tables in a database won't affect the speed. Building those tables with bad design, and accessing data with inefficient queries absolutely will.
One limitation in MySQL to be aware of is that, by default, a single table cannot be over 4GB in size using the MyISAM storage engine (the limit comes from MyISAM's default row-pointer size and can be raised). InnoDB does not have that limitation that I'm aware of.
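For reference, a minimal sketch (the table name and numbers are hypothetical) of checking and raising the MyISAM limit:

    -- Max_data_length in the output shows the current size limit for the table
    SHOW TABLE STATUS LIKE 'my_big_table';

    -- Raising the limit: tell MyISAM to use larger row pointers
    ALTER TABLE my_big_table MAX_ROWS = 1000000000 AVG_ROW_LENGTH = 200;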
I don't believe it makes much difference to performance whether your multitude of tables lives in one database or several. (If you need to reference multiple tables in one query, the queries are a bit easier to write when the tables are not spread across different databases, though that's not a performance matter.)
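For what it's worth, querying across databases on the same server only requires a database prefix; a minimal sketch with hypothetical names:

    -- Tables in two databases on the same MySQL server can be joined directly
    SELECT a.title, s.view_count
    FROM blog.articles AS a
    JOIN stats.article_stats AS s ON s.article_id = a.id;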
Implied in your question, however, is another question: What are the performance effects of structuring my data into specifically these tables as opposed to some other schema that uses a different number and structure of tables. That's a question many DB designers face. There often are significant performance differences. There are few general rules with any validity. One that's included in the MySQL manual is: Use the schema that minimizes total DB size. It's not guaranteed to be the most performant schema but it's often one worth considering.
Does SHOW COLUMNS / DESCRIBE scan the entire information_schema, or does it simply show data from some header in the table file?
The documentation for MySQL (https://dev.mysql.com/doc/refman/8.0/en/show-columns.html) and MariaDB (https://mariadb.com/kb/en/show-columns/) describes the output, but does not reveal how the data is fetched internally.
This question comes from curiosity about the following:
The effect of an increasing number of tables on database performance, due to the impact on the information schema.
Whether table metadata is used instead of the information schema for DESCRIBE.
Whether the information schema is stored in a different tablespace, or is rendered from table metadata.
I think MySQL 8.0 has all that info in the "Data Dictionary", which is the big change for 8.0. It is in InnoDB table(s), so it should be fast.
Before 8.0, the .frm file was the main source for this information, but I think there was other information buried in unindexed pseudo-tables in RAM and/or ibdata1. The more tables you had, the slower things were.
If you go past, say, 1K tables in your system, you may have a poor schema design.
In a survey of a lot of servers, DESCRIBE or SHOW CREATE TABLE was run less than once an hour on most machines. If you are doing such queries more than once a second, I would again question the architecture.
Note: MariaDB has not implemented the Data Dictionary.
DESCRIBE does not incur the performance cost of querying INFORMATION_SCHEMA.
Years ago, I implemented code in Zend Framework 1.0 to discover the columns of a table. The first implementation, in the beta, used INFORMATION_SCHEMA. But users complained that it ruined their performance.
So I changed it to use DESCRIBE, which returns less detailed information, but enough for my purpose. This was much better for performance.
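To illustrate the two approaches (the schema and table names are hypothetical):

    -- Lighter-weight: reads the table definition
    DESCRIBE mydb.users;

    -- Heavier: goes through the INFORMATION_SCHEMA views
    SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_KEY
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'users';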
I am looking for a free SQL database able to handle my data model. The project is a production database working on a local network not connected to the internet, without any replication. The number of applications connected at the same time would be fewer than 10.
The data volume forecast for the next 5 years is:
3 tables of 100 million rows
2 tables of 500 million rows
20 tables with fewer than 10k rows
My first idea was to use MySQL, but I have found several articles around the web saying that MySQL is not designed for big databases. But what does "big" mean in this case?
Can someone tell me whether MySQL is able to handle my data model?
I read that Postgres would be a good alternative, but that it requires many hours of tuning to be efficient with big tables.
I don't think my project needs a NoSQL database.
I would like to know if someone has experience with MySQL to share in this regard.
UPDATE
The database will be accessed by C# software (max 10 connections at the same time) and a web application (2-3 at the same time).
It is important to mention that very few updates will be done on the big tables, mostly insert queries. Delete statements will only be run a few times on the 20 small tables.
The big tables are very often used in SELECT statements, but most often just to check whether an entry exists, not to return grouped and ordered batches of data.
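If the common case really is "does this entry exist?", a covering index plus an EXISTS (or LIMIT 1) query keeps those checks cheap even on very large tables. A sketch with hypothetical names:

    -- Index matching the existence check
    CREATE INDEX idx_sensor_time ON measurements (sensor_id, measured_at);

    -- Existence check: the server can stop at the first matching index entry
    SELECT EXISTS (
        SELECT 1
        FROM measurements
        WHERE sensor_id = 42
          AND measured_at = '2015-06-01 12:00:00'
    ) AS already_recorded;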
I work for Percona, a company that provides consulting and other services for MySQL solutions.
For what it's worth, we have worked with many customers who use MySQL successfully with very large databases: terabytes of data, tens of thousands of tables, tables with billions of rows, transaction loads of tens of thousands of requests per second. You may get some more insight by reading some of our customer case studies.
You describe the number of tables and the number of rows, but nothing about how you will query these tables. Certainly one could query a table of only a few hundred rows in a way that would not scale well. But this can be said of any database, not just MySQL.
Likewise, one could query a table that is terabytes in size in an efficient way. It all depends on how you need to query it.
You also have to set specific goals for performance. If you want queries to run in milliseconds, that's challenging but doable with high-end hardware. If it's adequate for your queries to run in a couple of seconds, you can be a lot more relaxed about the scalability.
The point is that MySQL is not a constraining factor in these cases, any more than any other choice of database is a constraining factor.
Re your comments.
MySQL has referential integrity checks in its default storage engine, InnoDB. The claim that "MySQL has no integrity checks" is a myth often repeated over the years.
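A minimal sketch of such a check in InnoDB (table names are hypothetical):

    CREATE TABLE customers (
        id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    ) ENGINE=InnoDB;

    -- InnoDB enforces the foreign key: orders referencing a missing customer
    -- are rejected, and deleting a referenced customer fails
    CREATE TABLE orders (
        id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        customer_id INT UNSIGNED NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers (id) ON DELETE RESTRICT
    ) ENGINE=InnoDB;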
I think you need to stop reading superficial or outdated articles about MySQL, and read some more complete and current documentation.
MySQLPerformanceBlog.com
High Performance MySQL, 3rd edition
MySQL 5.6 manual
MySQL has two important (and significantly different) storage engines - MyISAM and InnoDB. The limits depend on usage. MyISAM is non-transactional - imports are relatively fast, but the engine is very simple (it has no data cache of its own), and JOINs on tables larger than 100MB can be slow (due to the overly simple MySQL planner; hash joins are only supported from 5.6). InnoDB is transactional and very fast for operations based on the primary key - but imports are slower.
Current versions of MySQL do not have as good a planner as Postgres (though there is progress), so complex queries usually run much better on PostgreSQL, while really simple queries are better on MySQL.
The complexity of PostgreSQL configuration is a myth. It is much simpler than MySQL InnoDB configuration - you have to set only five parameters: max_connections, shared_buffers, work_mem, maintenance_work_mem and effective_cache_size. Almost all of them relate to the memory available to Postgres on the server. It is usually five minutes of work. In my experience, databases up to 100GB usually run without any problems on Postgres (and probably on MySQL too). There are two important factors: what speed you expect, and how much memory and how fast an I/O subsystem you have.
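As an illustration only (the values are placeholders for a machine with, say, 16GB of RAM dedicated to Postgres; adjust to your own hardware), the whole tuning exercise is roughly:

    # postgresql.conf - illustrative values, not recommendations
    max_connections      = 50
    shared_buffers       = 4GB      # ~25% of RAM is a common starting point
    work_mem             = 32MB     # per sort/hash operation, per connection
    maintenance_work_mem = 512MB    # for VACUUM, CREATE INDEX, etc.
    effective_cache_size = 12GB     # memory the OS can use for file caching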
With large databases you need experience and knowledge of whatever database technology you choose. Everything is fast while you are in memory; the higher the ratio of database size to memory, the more work you have to do to get good results.
First of all, MySQL's table size is only limited by the file size limit of your OS, which is in the terabytes on any modern OS. That would pose no problems. More important are questions like these:
What kind of queries will you run?
Are the large table records updated frequently or basically archives for history data?
What is your hardware budget?
What is the kind of query speed you need?
Are you familiar with table partitioning, archive tables, config tuning?
How fast do you need to write (expected inserts per second)?
What language will you use to connect to the db (Java, .NET, Ruby etc.)?
What platform are you most familiar with?
Will you run queries that might cause table scans, such as LIKE '%something%', which would have to go through every single row and take forever?
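To make the last point concrete, EXPLAIN shows the difference between a leading-wildcard search and an index-friendly prefix search (table and index are hypothetical):

    -- Leading wildcard: cannot use an index on title, scans every row
    EXPLAIN SELECT id FROM articles WHERE title LIKE '%something%';

    -- Prefix search: can use an index on title
    EXPLAIN SELECT id FROM articles WHERE title LIKE 'something%';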
MySQL is used by Facebook, Google, Twitter and others with large tables, and 100,000,000 rows is not much in the age of social media. MySQL has very few drawbacks (even though I prefer PostgreSQL in most cases), such as altering large tables, for example to add a new index. That might send your company on a couple of days' forced vacation if you don't have a replica in the meantime. Is there a reason why NoSQL is not an option? Sometimes hybrid approaches are a good choice, like keeping your relational business logic in MySQL and huge statistical tables in a NoSQL database like MongoDB, which can scale by adding new servers in minutes (MySQL can too, but it's more complicated). MongoDB can also have an indexed field that can be searched at blistering speed.
The bottom line: you need to answer the above questions first to make a well-informed decision. If you have huge tables and only search on indexed keys, almost any database will do; if you expect many changes to the structure down the road, you may want to use a different approach.
Edit:
Based on the update you just posted, I doubt you would run into problems.
I have some very large databases (some up to 150M rows) I'm working with, and after initially inserting the data there aren't many INSERTs going on; just a lot of SELECTs and JOINs.
I've been messing around with Infobright a lot (the community version), and while I believe it is a good engine, I personally have been having some problems getting it to run like it should (fast).
So I was wondering if anyone else could recommend any other fast free storage engine for MySQL?
I'm just now checking out TokuDB; is there anything else out there worth checking out as well?
You should look at InfiniDB too. http://infinidb.org/ (one of the fastest)
There are a lot of considerations you need to make before benchmarking any engine: hardware factors like multi-core processors, memory and configuration; design factors related to your schema; and how all of this impacts engine performance.
Do check this blog out for how they do benchmarking of engines (it names other engine types) - http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
Note that this comparison is for a star schema design. If a columnar DB engine doesn't suit your requirements, you can look into XtraDB, which is an extended version of InnoDB (not the fastest, but ACID compliant).
P.S. Always track the properties of each engine that matter to you - like referential integrity checks, ACID compliance, etc. Sometimes these limitations can be bigger deal-breakers than a 10% increase in query performance.
Have you looked at Sphinx at all? While it is a search engine, it also supports query-less searches, which are similar to standard SELECT queries with indexes. I found it to be a huge help when dealing with large datasets. It's very fast, and is used heavily in high-traffic forums that are up in the millions (or hundreds of millions) of posts.
There is also a plugin for MySQL called SphinxSE which allows it to act as a MySQL storage engine which makes integration very easy to set up. You build your indexes by supplying the indexer program a query, and then once it's all set up, you can query it as if it was a normal table.
http://sphinxsearch.com/docs/2.0.1/sphinxse-overview.html (note, I haven't used it much since pre 1.0)
Besides taking into consideration which DBMS you use, you should also focus on optimizing your tables, indices and queries.
Whenever you have multiple joins, join first on the most selective relation and then on the less selective ones.
Analyze your query execution plans.
Create indices on columns that are hit often in your QEPs.
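For example (table, column and index names are hypothetical):

    -- Inspect the execution plan
    EXPLAIN SELECT o.id, c.name
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.status = 'shipped';

    -- If the plan shows a full scan on orders for the status filter, add an index
    CREATE INDEX idx_orders_status ON orders (status);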
Brett -
When using Infobright, you get the best performance gains by:
1) Utilizing the Knowledge Grid as much as possible
2) Reducing joins
3) Creating 'lookup' columns
Since the Knowledge Grid is in memory, you can kill off a lot of query time just by adding additional filters. Also, consider using a nested select instead of a join. By doing so, you can use an already-created knowledge node (instead of generating a pack-to-pack node on the fly).
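A sketch of that rewrite with hypothetical tables (whether it actually wins depends on your data and the Knowledge Grid):

    -- Join version
    SELECT f.amount
    FROM facts AS f
    JOIN dims AS d ON d.id = f.dim_id
    WHERE d.region = 'EU';

    -- Nested-select version: the dimension filter is resolved on its own,
    -- so the engine can filter facts.dim_id using existing knowledge nodes
    -- instead of building a pack-to-pack node on the fly
    SELECT amount
    FROM facts
    WHERE dim_id IN (SELECT id FROM dims WHERE region = 'EU');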
If you have some queries that you think should be faster, post them, and I can help with potentially modifying the query to make it run faster.
Cheers,
Jeff
I am planning to use MySQL to store my datasets.
I have about 10^8 (hundred million) records:
ID(int), x(float), y(float), z(float), property(float).
Which database engine is suited for this kind of dataset, InnoDB or MyISAM? Or maybe NDB (I have no idea about its scalability or performance)?
I am planning to query the static dataset with questions like the following:
Select getRectangularRegion or getPointsInSphere.
I am assuming you are trying to store points in 3d space and then find all points within a region.
How the underlying database copes with a lot of records is a lot less important to you than having a very good 3D spatial indexing system built into the database. Without spatial indexing you can't do the queries you want.
You should also consider writing your own data storage, as a simple 3D quad tree may give you good indexing depending on how your points are clustered - but using an "off the shelf" database would be less work for you.
So I think you need to investigate support for spatial indexing in databases, rather than ask about support for lots of rows. Storing lots of rows is a given for most databases these days…
Your table seems to be pretty simple and you won't need transactions and foreign keys. So I guess MyISAM would be better suited than InnoDB. But I guess MEMORY might be your fastest choice.
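A minimal sketch of what that could look like (the engine choice and names are illustrative; with a plain composite B-tree index only the leading x range is used efficiently, which is exactly why real spatial indexing matters here):

    CREATE TABLE points (
        id       INT UNSIGNED PRIMARY KEY,
        x        FLOAT NOT NULL,
        y        FLOAT NOT NULL,
        z        FLOAT NOT NULL,
        property FLOAT NOT NULL,
        KEY idx_xyz (x, y, z)
    ) ENGINE=MyISAM;

    -- getRectangularRegion as a simple bounding-box query
    SELECT id, property
    FROM points
    WHERE x BETWEEN 0.0 AND 10.0
      AND y BETWEEN 0.0 AND 10.0
      AND z BETWEEN 0.0 AND 10.0;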
What I have:
A MySQL database running on Ubuntu that maintains a large table of articles (similar to WordPress).
I need to relate a given article to another set of data. This set of data will be fairly large.
There may be various sets of data that will be related.
The query:
Is it better to keep these various large sets of data within the same database as the articles, which will have a lot of traffic on it?
or
Is it better to create different databases (on the same server) that relate by a primary key to the main database with the articles?
Put them all in the same DB initially, until you find that there is a performance issue. Much easier than prematurely optimising.
Modern RDBMS are very good at optimising data access.
If you need to connect frequently and read from both sets of data, you should put them in the same database. The server then won't have to run permission checks separately for each of your databases.
If you have serious traffic, you should consider using persistent connections for those queries.
If you don't need to read them together frequently, consider putting them on different machines, so that the high traffic on the bigger database won't cause slowdowns on the other.
Different databases on the same server give you all the problems of a distributed architecture without any of the benefits of scaling out. One database per server is the way to go.
When you say 'same database' and 'different databases related' don't you mean 'same table' vs 'different tables'?
If that's the question, I'd say:
One table for articles.
If these 'other sets of data' all have the same structure, put them all in the same table. If not, use one table per kind of data.
Everything in the same database.
If you grow big enough for database size to become a performance issue (after many millions of records and lots of queries per second), consider table partitioning, or maybe replacing the biggest table with a key/value store (CouchDB, MongoDB, Redis, Tokyo Cabinet, etc.), which can be a little faster than MySQL but is a lot easier to distribute for performance.
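If you do reach that point, a hedged sketch of MySQL range partitioning (names are hypothetical):

    -- Split a large article-related table by year so old partitions can be
    -- scanned or dropped independently
    CREATE TABLE article_views (
        article_id INT UNSIGNED NOT NULL,
        viewed_at  DATETIME NOT NULL,
        PRIMARY KEY (article_id, viewed_at)
    )
    PARTITION BY RANGE (YEAR(viewed_at)) (
        PARTITION p2013 VALUES LESS THAN (2014),
        PARTITION p2014 VALUES LESS THAN (2015),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );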