MySQL: testing performance after dividing a table

I made a test to see if dividing a large indexed table would improve performance.
Original Table: 20000 rows.
Sub Tables: 4x5000 rows.
The main table is divided into 4 tables, all of them indexed. In the test, each SQL query was executed 10000 times in a loop to get more accurate query times.
When I search an indexed column, I see no difference in performance: query times are the same for the original (20000 rows) table and the new (5000 rows) tables.
I tried the same test without indexing, by dropping the indexes on all tables, and the difference in performance was obvious: searching the sub tables was 6 times faster than searching the large table. But with indexing the performance was the same.
So do you think it is a waste of time to divide my tables into smaller ones?
Note: the 20000-row size is just for testing; my real data will be 100M rows or more.
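A minimal sketch of the kind of comparison described above, with hypothetical table and column names (big_table holding all 20000 rows, sub_table_1 holding one 5000-row slice), run from the mysql client:
-- both tables indexed the same way on the column being searched
CREATE INDEX idx_big_val ON big_table (val);
CREATE INDEX idx_sub_val ON sub_table_1 (val);
-- the query that gets timed in a loop; EXPLAIN confirms whether the index is used
EXPLAIN SELECT * FROM big_table WHERE val = 12345;
EXPLAIN SELECT * FROM sub_table_1 WHERE val = 12345;
With the index in place, both plans show the same index lookup, which is why the timings come out equal; without the index, both degrade to full table scans, and the smaller table wins simply because it scans fewer rows.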

Yes, it is a waste of time. Databases can easily handle millions of rows and 20,000 is relatively small. As you noticed, indexes make finding data fast. The size of the data doesn't affect the speed of lookups noticeably in most cases. Queries might take a few more milliseconds if the difference in size is 100 or 1000 times, but the scale you're working on would make no real difference.

What you have effectively done is reinvent table partitioning. I would not use your own sub-table scheme; focus on using partitioned tables instead. Partitioning means that sub-tables are used internally, and if you formulate your SQL appropriately, sub-tables (partitions) that are not needed are automatically excluded from operations.
Furthermore, all the management of the partitions happens on the server itself, so your client code can be kept simple and you still only have to deal with a single table.
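As a hedged sketch of what the server-side equivalent might look like (the table, column names, and ranges here are made up for illustration):
CREATE TABLE measurements (
  id BIGINT NOT NULL,
  created_at DATE NOT NULL,
  value INT,
  PRIMARY KEY (id, created_at)  -- the partitioning column must be part of every unique key
)
PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
-- a query that filters on created_at lets the server prune partitions automatically
SELECT COUNT(*) FROM measurements WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';
You query the single table name; the server decides which internal partitions to touch.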

Related

MySQL Performance of one vs. many tables

I know that MySQL usually handles tables with many rows well. However, I currently face a setting where one table will be read and written by multiple users (around 10) at the same time and it is quite possible that the table will contain 10 billion rows.
My setting is a MySQL database with an InnoDB storage engine.
I have heard of some projects where tables of that size would become less efficient and slower, also concerning indexes.
I do not like the idea of having multiple tables with exactly the same structure just to split the rows. Main question: however, wouldn't that approach solve the issue of reduced performance due to such a large number of rows?
Additional question: What else could I do to work with such a large table? The number of rows itself cannot be reduced.
I have heard of some projects where tables of that size would become less efficient and slower, also concerning indexes.
This is not typical. So long as your tables are appropriately indexed for the way you're using them, performance should remain reasonable even for extremely large tables.
(There is a very slight drop in index performance as the depth of a BTREE index increases, but this effect is practically negligible: with a fan-out of roughly a thousand entries per node, a tree only three levels deep already covers on the order of a billion rows. The effect can also be mitigated by using smaller keys in your indexes, as this reduces the depth of the tree.)
In some situations, a more appropriate solution may be partitioning your table. This internally divides your data into multiple tables, but exposes them as a single table which can be queried normally. However, partitioning places some specific requirements on how your table is indexed, and does not inherently improve query performance. It's mainly useful to allow large quantities of older data to be deleted from a table at once, by dropping older partitions from a table that's partitioned by date.
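For instance, if a hypothetical log table were partitioned by month, old data could be removed by dropping a partition, which is far cheaper than deleting millions of rows one by one:
-- assumes access_log was created with a partition named p2022_01 for January 2022
ALTER TABLE access_log DROP PARTITION p2022_01;
-- contrast with the row-by-row alternative, which has to find and delete every matching row:
-- DELETE FROM access_log WHERE created_at < '2022-02-01';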

Low cardinality column index VS table overheads

I have a table which holds 70 thousand rows and it is planned to slowly grow to about 140 thousand within several months.
I have 4 columns with low cardinality that contain 0/1 values as in FALSE/TRUE. I have table overheads (after optimization) of 28 MB with table size of 6 MB. I have added 4 separate simple indexes to those 4 columns. My overheads dropped to 20 MB.
I understand that indexing a low-cardinality column (where there are many rows but few distinct values) has almost no effect on query performance, yet my overheads dropped, and they increase without these indexes. Should I keep the lower overheads, or should I rather keep the potentially pointless indexes? Which affects performance more?
P.S. Table is mainly read with variable load ranging from thousands of queries per minute to hundreds of queries per day. Writes are mainly updates of these 4 boolean columns or one timestamp column.
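For reference, the setup described here would look roughly like this (the table and column names are invented):
-- four separate single-column indexes on the low-cardinality flag columns
ALTER TABLE items
  ADD INDEX idx_flag_a (flag_a),
  ADD INDEX idx_flag_b (flag_b),
  ADD INDEX idx_flag_c (flag_c),
  ADD INDEX idx_flag_d (flag_d);
-- an alternative sometimes worth measuring: one composite index covering the combination queries actually filter on
ALTER TABLE items ADD INDEX idx_flags (flag_a, flag_b, flag_c, flag_d);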
Indices aren't pointless once you approach table sizes of tens of millions of rows, and you will only see marginal improvements in query performance at the table size you are dealing with now.
You're better off leaving the indices the way they are and reconsidering your DB schema. A query shouldn't use 20+ MB of memory, and its performance will only snowball into a much bigger problem as the DB grows.
That said, jumping from 70k rows to 150k rows is not a huge leap in your typical MySQL database. If performance is already a concern, there is a much larger problem at play here. If you are storing large blobs in your DB, for example, you may be better off storing the data in files and saving each file's location in a VARCHAR field in your table.
One other thing to consider, if you absolutely have to keep your DB schema exactly the way it is, is partitioning your data. You can typically partition your table by ID or datetime and see a considerable improvement in performance.

Limitations on amount of rows in MySQL

What are the limitations in terms of performance of MySQL when it comes to the number of rows in a table? I currently have a running project with cronjobs that run every hour. They gather data and write it into the database.
In order to boost performance, I'm thinking about saving the data of those cronjobs in a table (not just the results, but everything). The data itself will be something similar to this:
imgId1 (INT, FKEY->images.id) | imgId2 (INT, FKEY->images.id) | myData (INT)
So the actual data per row is quite small. The problem is that the number of rows in this table will grow quadratically: with every imgId I add, I need the myData for every other image. That means with 3000 images I will have 3000^2 = 9 million rows (not counting the diagonal, because I'm too lazy to do that now).
I'm concerned about what MySQL can handle under such preconditions. Every hour will add roughly 100-300 new entries to the origin table, meaning 10,000 to 90,000 new entries in the cross table.
Several questions arise:
Are there limitations to the number of rows in a table?
When (if) will MySQL significantly drop in performance?
What actions can I take to make this cross table as fast as possible (access-wise; writing doesn't have to be fast)?
EDIT
I just finished my polynomial interpolation and it turns out the growth will not be as drastic as I originally thought. As the relation 1-2 holds the same data as 2-1, I only need "half" a table, bringing the growth down to (x^2 - x)/2; for 3000 images that is (3000^2 - 3000)/2 = 4,498,500 rows instead of 9 million.
Still, it will be a lot.
9 million rows is not a huge table. Given the structure you provided, as long as it's indexed properly performance of select / update / insert queries won't be an issue. DDL may be a bit slow.
Since all the rows are already described by a Cartesian join, you don't need to populate the entire table.
If the order of the image pairs is not significant, then you can save some space by storing each pair in a canonical order (smaller imgId first), or by using a two- or three-table schema in which the imgIds are treated as equivalent.
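A hedged sketch of the canonical-order idea, using the column names from the question (everything else is illustrative): store each pair once with the smaller id first, so 1-2 and 2-1 can never both exist.
CREATE TABLE image_pairs (
  imgId1 INT NOT NULL,
  imgId2 INT NOT NULL,
  myData INT,
  PRIMARY KEY (imgId1, imgId2),
  FOREIGN KEY (imgId1) REFERENCES images(id),
  FOREIGN KEY (imgId2) REFERENCES images(id),
  CHECK (imgId1 < imgId2)  -- canonical order; note MySQL only enforces CHECK constraints from 8.0.16
);
-- look up a pair regardless of the order the application has the ids in
SELECT myData FROM image_pairs
WHERE imgId1 = LEAST(42, 7) AND imgId2 = GREATEST(42, 7);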

MySQL performance: many rows and columns (MyISAM)

Since I'm still in the beginning of my site design I figured now's a good time to ask this.
I know that one of the ways to optimize MySQL queries is to split your rows into separate tables; however, that does have a few comfort issues.
What I'm considering is this: would querying a table of around 1,000,000 rows and 150 columns, using excellently designed indexes and fetching only the needed columns from each query, result in a much higher server load than splitting the table into multiple ones with fewer columns each?
Big "blob" tables are an anti-pattern; never use them.
Normalized tables will run much, much faster than a single wide blob table.
InnoDB is optimized for many small tables that need to be joined.
Besides that, using normalized tables will save you many headaches:
Your data will be smaller, so more of it fits in memory.
You only store data in one place, so it cannot end up with inconsistent data.
MySQL generally uses only one index per table per SELECT; multiple tables mean you get to use more indexes and get more speed.
Triggers on tables execute much faster.
Normalized tables are easier to maintain.
You have fewer indexes per table, so inserts are faster.
Indexes are smaller (fewer rows) and narrower (fewer columns) and will run much faster as a result.
If the data is static, you can pack the tables for greater efficiency; see the corresponding page in the reference manual.
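As a rough illustration of the idea (table and column names are invented): instead of one wide table carrying every attribute, keep a narrow core table and join side tables only when their columns are needed.
-- narrow core table: the columns almost every query needs
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(100)
) ENGINE=InnoDB;
-- side table: bulky, rarely-read columns moved out of the hot path
CREATE TABLE user_profiles (
  user_id INT PRIMARY KEY,
  bio TEXT,
  FOREIGN KEY (user_id) REFERENCES users(id)
) ENGINE=InnoDB;
-- join only when the bulky columns are actually wanted
SELECT u.name, p.bio
FROM users u
JOIN user_profiles p ON p.user_id = u.id
WHERE u.id = 42;
Queries that only need the core columns never touch the wide text data at all.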

Maximum table size for a MySQL database

What is the maximum size for a MySQL table? Is it 2 million at 50GB? 5 million at 80GB?
At the higher end of the size scale, do I need to think about compressing the data? Or perhaps splitting the table if it grew too big?
I once worked with a very large (Terabyte+) MySQL database. The largest table we had was literally over a billion rows.
It worked. MySQL processed the data correctly most of the time. It was extremely unwieldy though.
Just backing up and storing the data was a challenge. It would take days to restore the table if we needed to.
We had numerous tables in the 10-100 million row range. Any significant joins to those tables were too time consuming and would take forever. So we wrote stored procedures to 'walk' the tables and process joins against ranges of ids. In this way we'd process the data 10-100,000 rows at a time (join against ids 1-100,000, then 100,001-200,000, etc.). This was significantly faster than joining against the entire table.
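A minimal sketch of that ranged-join approach (table and column names are invented); a stored procedure or application loop would repeat the statement while advancing the id window:
-- process one window of ids at a time instead of joining the whole tables
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.id BETWEEN 1 AND 100000;
-- next iteration: WHERE o.id BETWEEN 100001 AND 200000, and so on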
Using indexes on very large tables that aren't based on the primary key is also much more difficult. With InnoDB, MySQL stores indexes in two pieces: secondary indexes (anything other than the primary index) store the primary key values of the rows they point to. So indexed lookups are done in two parts: first MySQL goes to the secondary index and pulls from it the primary key values it needs, then it does a second lookup on the primary key index to find where those rows are.
The net of this is that for very large tables (1-200 million plus rows) indexing is more restrictive. You need fewer, simpler indexes, and even simple SELECT statements that are not directly on an index may never come back. WHERE clauses must hit indexes or forget about it.
But all that being said, things did actually work. We were able to use MySQL with these very large tables and do calculations and get answers that were correct.
About your first question: the effective maximum size for a database is usually determined by the operating system, specifically the file size MySQL Server is able to create, not by MySQL Server itself. Those limits play a big role in table size limits, and MyISAM works differently from InnoDB, so any table will be subject to those limits.
If you use InnoDB you will have more options for managing table sizes; resizing the tablespace is an option in this case, so if you plan to resize, this is the way to go. Have a look at the 'The table is full' error page.
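As a hedged example of what that can involve (the values are illustrative, not recommendations), a shared InnoDB system tablespace can be allowed to grow automatically via my.cnf, or tables can be kept in their own files instead:
[mysqld]
# let the shared tablespace file grow as needed
innodb_data_file_path = ibdata1:12M:autoextend
# store each table in its own .ibd file
innodb_file_per_table = ON
With innodb_file_per_table enabled, each table's size is then bounded by the filesystem's maximum file size (and InnoDB's own tablespace limits) rather than by the shared tablespace definition.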
I am not sure what the real maximum record count would be for a given table, even given all the necessary information (OS, table type, columns, data type and size of each, etc.), and I am not sure whether that is easy to calculate, but I've seen simple tables with around 1 billion records in a couple of cases and MySQL didn't give up.