Low-cardinality column index vs. table overheads - MySQL

I have a table which holds 70 thousand rows, and it is planned to slowly grow to about 140 thousand within several months.
I have 4 columns with low cardinality that contain 0/1 values (FALSE/TRUE). I have table overheads (after optimization) of 28 MB with a table size of 6 MB. I added 4 separate single-column indexes to those 4 columns, and my overheads dropped to 20 MB.
I understand that indexing a low-cardinality column (many rows, but few distinct values) has almost no effect on query performance, yet my overheads dropped, and they increase again without these indexes. Should I keep the lower overheads, or should I rather keep the potentially pointless indexes? Which affects performance the most?
P.S. The table is mainly read, with a variable load ranging from thousands of queries per minute to hundreds of queries per day. Writes are mainly updates of these 4 boolean columns or of one timestamp column.

Indices aren't pointless until you approach table sizes in the tens of millions of rows; at the size you are dealing with now you will only see marginal improvements in query performance.
You're better off leaving the indices the way they are and reconsidering your DB schema. A query shouldn't use 20+ MB of memory, and its performance will only snowball into a much bigger problem as the DB grows.
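If you want to sanity-check whether those four indexes are ever used, here is a quick sketch (the table and column names are placeholders, not anything from your schema):

    -- Does the optimizer choose the single-column index or a full table scan?
    EXPLAIN SELECT * FROM my_table WHERE is_active = 1;

    -- What indexes exist and how selective MySQL thinks they are:
    SHOW INDEX FROM my_table;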
That said, jumping from 70k rows to 140k rows is not a huge leap for a typical MySQL database. If performance is already a concern, there is a much larger problem at play. If you are storing large BLOBs in your DB, for example, you may be better off storing the data in files and saving each file's location as a VARCHAR field in your table.
One other thing to consider, if you absolutely have to keep your DB schema exactly the way it is, is partitioning your data. You can typically partition your table by IDs or by datetime and see a considerable improvement in performance.
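As a rough sketch of what partitioning by datetime might look like (the table and column names here are assumptions, not anything from the question):

    CREATE TABLE events (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        created_at DATETIME NOT NULL,
        payload    VARCHAR(255),
        PRIMARY KEY (id, created_at)  -- every unique key must include the partitioning column
    )
    PARTITION BY RANGE (YEAR(created_at)) (
        PARTITION p2022 VALUES LESS THAN (2023),
        PARTITION p2023 VALUES LESS THAN (2024),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );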

Related

MySQL Performance of one vs. many tables

I know that MySQL usually handles tables with many rows well. However, I currently face a setting where one table will be read and written by multiple users (around 10) at the same time, and it is quite possible that the table will contain 10 billion rows.
My setting is a MySQL database with the InnoDB storage engine.
I have heard of some projects where tables of that size would become less efficient and slower, also concerning indexes.
I do not like the idea of having multiple tables with exactly the same structure just to split the rows. Main question: would splitting the rows across such tables not solve the issue of reduced performance caused by such a large number of rows, though?
Additional question: what else could I do to work with such a large table? The number of rows itself cannot be reduced.
I have heard of some projects where tables of that size would become less efficient and slower, also concerning indexes.
This is not typical. So long as your tables are appropriately indexed for the way you're using them, performance should remain reasonable even for extremely large tables.
(There is a very slight drop in index performance as the depth of a BTREE index increases, but this effect is practically negligible. Also, it can be mitigated by using smaller keys in your indexes, as this minimizes the depth of the tree.)
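For instance (with hypothetical names), a prefix index keeps the key entries small compared to indexing a full VARCHAR(255):

    -- Index only the first 20 characters of the column rather than all 255:
    ALTER TABLE users ADD INDEX idx_email_prefix (email(20));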
In some situations, a more appropriate solution may be partitioning your table. This internally divides your data into multiple tables, but exposes them as a single table which can be queried normally. However, partitioning places some specific requirements on how your table is indexed, and does not inherently improve query performance. It's mainly useful to allow large quantities of older data to be deleted from a table at once, by dropping older partitions from a table that's partitioned by date.
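For example, assuming a table partitioned by RANGE on a date column with one partition per month (the names here are made up), removing a whole month of old data is a quick metadata operation rather than a long-running DELETE:

    -- Discards every row stored in that partition almost instantly:
    ALTER TABLE log_table DROP PARTITION p201801;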

Do indexes help a MySQL MEMORY table?

I was optimizing a 3 GB table as a MEMORY table in order to do some analysis on it, and I was curious whether adding indexes even helps a MEMORY table. Since the data is all in memory anyway, is this just redundant?
No, they're not redundant.
Yes, continue to use indexes.
On smaller tables, access to a non-indexed column in a MEMORY table may seem almost as fast as access to an indexed one, because full table scans in memory are so quick; but as the table grows, or as you join tables to produce larger result sets, there will be a difference.
Regardless of the storage medium the engine uses (disk or memory), proper indexes will improve performance as long as the storage engine supports them. How the indexes are implemented may vary, but they are implemented in the MEMORY, InnoDB, and MyISAM table types. By the way, the default index type for MEMORY tables is HASH rather than B-Tree.
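As a small sketch (names invented), you can pick the index type explicitly when the default doesn't fit your queries:

    CREATE TABLE lookup_cache (
        id   INT NOT NULL,
        name VARCHAR(50),
        INDEX idx_id (id) USING HASH,      -- MEMORY default: fast for = lookups
        INDEX idx_name (name) USING BTREE  -- better for ranges and ORDER BY
    ) ENGINE=MEMORY;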
Also, I generally don't recommend coding to your storage engine. What's a MEMORY table today may need to be changed to InnoDB tomorrow; the SQL and schema should stand on their own.
No, indexing has little to do with data access speed. An index reorganizes data in order to optimize specific queries.
For example, if you add a balanced binary tree index to a one-million-row column, you will be able to find the item you want in about 20 read operations instead of an average of half a million.
So placing that million rows in memory, which is roughly 100x faster than disk, will speed up a brute-force search by 100x. Adding the index will further improve the speed by a factor of about twenty-five thousand, by allowing the DB to perform a smarter search instead of a merely faster search.
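To spell out the rough arithmetic behind those figures: a balanced tree over 1,000,000 rows needs about log2(1,000,000) ≈ 20 comparisons, while a brute-force scan reads about 500,000 rows on average, which is where the factor of roughly twenty-five thousand comes from (500,000 / 20 = 25,000).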
Things are more complicated than this, because other factors come into play, and you rarely get such a large benefit from an index. Smarter searches are also slower on a per-operation basis: those 20 index seeks cost much more than 20 brute-force seeks. Then there's index maintenance, and so on.
But my suggestion is to keep the data in memory if you can, and index it.

MySQL: testing performance after dividing a table

I ran a test to see whether dividing a large indexed table would increase performance.
Original table: 20000 rows.
Sub-tables: 4 x 5000 rows.
The main table was divided into 4 tables, all of them indexed. In the test, each SQL query was executed 10000 times in a loop to measure query times more accurately.
When I search an indexed column, I see no difference in performance: query times are the same for the original (20000-row) table and the new (5000-row) tables.
I tried the same test without indexing, by deleting the indexes from all tables, and the difference in performance was obvious: searching the sub-tables was 6 times faster than searching the large table. But with indexing the performance was the same.
So do you think it is a waste of time to divide my tables into smaller ones?
Note: the 20000-row size is just for testing; my real data will be on the order of 100M rows or more.
Yes, it is a waste of time. Databases can easily handle millions of rows, and 20,000 is relatively small. As you noticed, indexes make finding data fast. The size of the data doesn't noticeably affect lookup speed in most cases: queries might take a few more milliseconds if the difference in size is 100 or 1000 times, but at the scale you're working on it makes no real difference.
What you have effectively done is reinvent table partitioning. I would not use your own sub-table scheme; focus on partitioned tables instead. Partitioning means that sub-tables are used internally, and if you formulate your SQL appropriately, partitions are automatically excluded from operations when they are not needed.
What's more, all the management of the partitions happens on the server itself, so your client code can be kept simple and you still only have to deal with a single table.
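On recent MySQL versions you can confirm that unneeded partitions are excluded by checking the partitions column in the EXPLAIN output (the table and column names here are placeholders):

    EXPLAIN SELECT COUNT(*)
    FROM measurements
    WHERE recorded_at >= '2018-01-01' AND recorded_at < '2018-02-01';
    -- Assuming the table is partitioned by RANGE on recorded_at, the "partitions"
    -- column should list only the partition(s) covering January 2018.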

Is there any benefit to separating a varchar(2xx) column from a MySQL table into NoSQL storage?

If the number of records is very big, like N*10 M per table, is there any benefit to moving the varchar(2xx) column to NoSQL storage? The text content won't be very long; I think 200 characters is big enough. The MySQL engine will be InnoDB. The column won't be used as an index.
Moving a specific column won't help performance much and will likely reduce performance because you need to get data from two places instead of one.
In general the slow part of any query is finding the right record - once you find that record, reading a few hundred bytes more doesn't really change anything.
Also, 10 million records of 200 characters is on the order of 2-4 GB - not much, even if your dataset needs to fit in RAM.

Maximum table size for a MySQL database

What is the maximum size for a MySQL table? Is it 2 million at 50GB? 5 million at 80GB?
At the higher end of the size scale, do I need to think about compressing the data? Or perhaps splitting the table if it grew too big?
I once worked with a very large (Terabyte+) MySQL database. The largest table we had was literally over a billion rows.
It worked. MySQL processed the data correctly most of the time. It was extremely unwieldy though.
Just backing up and storing the data was a challenge. It would take days to restore the table if we needed to.
We had numerous tables in the 10-100 million row range. Any significant joins to those tables were too time-consuming and would take forever. So we wrote stored procedures to 'walk' the tables and process the joins against ranges of ids. In this way we'd process the data 10,000-100,000 rows at a time (join against ids 1-100,000, then 100,001-200,000, etc.). This was significantly faster than joining against the entire table.
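A hedged sketch of that kind of chunked join (all table, column, and procedure names here are invented, and join_results is assumed to exist as a pre-created results table):

    DELIMITER //
    CREATE PROCEDURE walk_join_in_chunks()
    BEGIN
        DECLARE chunk_size  INT    DEFAULT 100000;
        DECLARE chunk_start BIGINT DEFAULT 1;
        DECLARE max_id      BIGINT;

        SELECT MAX(id) INTO max_id FROM big_table;

        WHILE chunk_start <= max_id DO
            -- Join only the current slice of ids, then move on to the next slice.
            INSERT INTO join_results
            SELECT b.id, o.some_column
            FROM big_table b
            JOIN other_table o ON o.big_id = b.id
            WHERE b.id BETWEEN chunk_start AND chunk_start + chunk_size - 1;

            SET chunk_start = chunk_start + chunk_size;
        END WHILE;
    END //
    DELIMITER ;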
Using indexes on very large tables that aren't based on the primary key is also much more difficult. MySQL stores these indexes in two pieces: a secondary index (anything other than the primary index) stores the primary key values of the rows it points to. So indexed lookups are done in two parts: first MySQL goes to the secondary index and pulls out the primary key values it needs, then it does a second lookup on the primary key index to find where those rows actually are.
The net effect is that for very large tables (1-200 million plus rows), indexing is more restrictive. You need fewer, simpler indexes, and even simple SELECT statements that are not directly on an index may never come back. WHERE clauses must hit an index or forget about it.
But all that being said, things did actually work. We were able to use MySQL with these very large tables and do calculations and get answers that were correct.
Regarding your first question: the effective maximum size for a database is usually determined by the operating system, specifically by the maximum file size MySQL Server is able to create, not by MySQL Server itself. Those limits play a big role in table size limits, and MyISAM works differently from InnoDB in this respect, so any table will be subject to those limits.
If you use InnoDB you will have more options for manipulating table sizes; resizing the tablespace is one of them, so if you plan to resize it, this is the way to go. Take a look at the 'The table is full' error page.
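As a rough sketch of what that involves (the values are placeholders), the shared InnoDB tablespace is sized in my.cnf, and autoextend lets it grow as needed:

    [mysqld]
    # Shared tablespace: starts at 1G and grows automatically as needed
    innodb_data_file_path = ibdata1:1G:autoextend
    # Alternatively, give each table its own .ibd file so it is sized independently
    innodb_file_per_table = 1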
I am not sure of the real maximum record count for each table without all the necessary information (OS, table type, columns, data type and size of each, etc.), and I am not sure this is easy to calculate, but I've seen simple tables with around 1 billion records in a couple of cases and MySQL didn't give up.