Say I have a db table with 5 fields.
id bigint(20)
name varchar(255)
place varchar(255)
DOB date
about TEXT
The maximum row size in MySQL is 65,535 bytes, shared among all columns in the table except TEXT/BLOB columns.
I need a query which will give me the maximum row size in bytes for this table so that I can check if the max size exceeds 65535. The charset used is utf8mb4. I need something like this
(20*8 + 255*4 + 255*4 + 3) = 2203, where the size of the about field is ignored since its type is TEXT.
It is much simpler -- write and run the CREATE TABLE statement. MySQL will spit an error at you if the row is too large.
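For example, a minimal test table matching the question (column names taken from the question; the engine and primary key are assumptions) would be:

CREATE TABLE row_size_test (
    id    BIGINT NOT NULL AUTO_INCREMENT,  -- 8 bytes, regardless of the (20)
    name  VARCHAR(255),                    -- up to 255*4 data bytes in utf8mb4, plus a length field
    place VARCHAR(255),
    DOB   DATE,                            -- 3 bytes
    about TEXT,                            -- counts only a pointer toward the 64K limit
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

If the combined size were over the limit, this statement would fail with "Row size too large"; this particular table succeeds comfortably.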
Meanwhile, here are some of the flaws in your attempt:
BIGINT is 8 bytes; the (20) is useless information.
Each VARCHAR needs 1 or 2 bytes for its length field.
A TEXT column still takes some space (a pointer) toward the 64K limit, even though the value itself is stored elsewhere.
There is a bunch of "overhead" for each column and the row, so it is really impractical to try to compute the length.
Note also -- when the row gets too large, some of the VARCHARs may be treated like TEXT. That is, they won't necessarily count toward the 64K limit; however, they will leave a 20-byte pointer in their place.
If you are tempted to have a table with too many too-big columns, describe the table to us; we can suggest improvements, such as:
BIGINT is rarely needed; use a smaller int type.
Don't blindly use (255); it has several drawbacks. Analyze your data and pick a more realistic limit.
Be aware that there are four ROW_FORMATs. DYNAMIC is probably optimal for you. (Old versions of MySQL did not have DYNAMIC.)
Parallel tables (two tables with the same PK; see the sketch after this list)
Normalization (Replacing a common string with a smaller INT.)
Don't use a bunch of columns as an "array"; make another table. (Example: 3 phone number columns.)
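To illustrate the "parallel tables" idea above, here is a rough, hypothetical sketch (all names are made up): the bulky column lives in a second table that shares the same primary key and is joined only when it is actually needed.

-- Narrow "hot" table that stays far below the row-size limit
CREATE TABLE person (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name  VARCHAR(100),
    place VARCHAR(100),
    DOB   DATE,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Parallel table with the same PK, holding the bulky column
CREATE TABLE person_about (
    id    INT UNSIGNED NOT NULL,
    about TEXT,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Join only when the big column is needed
SELECT p.name, p.place, a.about
FROM person AS p
JOIN person_about AS a USING (id)
WHERE p.id = 42;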
Have you hit any INDEX limits yet? It smells like you might get bitten by that, too. Write up the CREATE TABLE, including tentative indexes. We'll chew it up and advise you.
I'm creating 2 similar tables, atestunion1 and atestunion2, with columns id, customer_id, product_id, comment, date. The only difference between them is the length of the varchar comment column. The "why" of this structure is below.
As comments are entered, the number of characters is counted, and then the entry is saved (via an if or switch PHP statement) to the table with the smallest varchar size that the comment will fit into.
Then, these are accessed like a single table, using UNION, like this:
SELECT * FROM atestunion1 UNION SELECT * from atestunion2 ORDER BY date
This query seems to work without issue - the different comment field sizes don't seem to cause a problem - but I'm wondering if there are issues with this conceptually. The reason for doing this is to save on DB size. I believe (assumption 1) that a 20-character comment in a varchar(30) column takes up less storage than one in a varchar(500) column. However, I would think that this sort of optimization might be built into MySQL and is thus not in need of my lowly hack. Maybe it does this already, such that my assumption 1 is simply incorrect? Or perhaps there is a setting for the varchar column that will cause this?
My waterfall of questions:
Does MySQL already do such an optimization behind the scenes, such that an entry with some number of characters takes up the same memory regardless of the varchar setting and such that I don't need to mess with it?
If not, is there a setting for the varchar that would cause it to do so?
If not, does this concept of similar tables but for the varchar size difference, then accessed like a single table via UNION, seem like a valid and non-problematic way to save on DB size?
The difference in storage size between varchar(30) and varchar(500) (for the same string) is one byte. See String Type Storage Requirements:
L represents the actual length in bytes of a given string value.
[..]
VARCHAR(M), VARBINARY(M) [..]
L + 1 bytes if column values require 0 to 255 bytes, L + 2 bytes if values may require more than 255 bytes
So no - It's not worth splitting the table and overcomplicating your code.
The only case I know of where it might make a significant difference is when you use temporary tables with the MEMORY engine. Then the VARCHAR columns will be expanded to their maximum size (that is 2,000 bytes for VARCHAR(500) with the utf8mb4 character set).
See The MEMORY Storage Engine:
MEMORY tables use a fixed-length row-storage format. Variable-length
types such as VARCHAR are stored using a fixed length.
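If you want to see how many bytes your comments actually occupy (as opposed to the declared maximum), a quick check against your example table looks like this:

-- CHAR_LENGTH counts characters, LENGTH counts bytes; they differ under utf8mb4
SELECT
    CHAR_LENGTH(comment) AS chars,
    LENGTH(comment)      AS bytes
FROM atestunion1
ORDER BY bytes DESC
LIMIT 10;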
We have a table with 150 million rows, and one of the columns is still a varchar (128 symbols); we have optimized every other column to tinyints and the like to reduce size. We're trying to improve the performance further. Would moving the column to another table and using a join when selecting cause any performance issues? There are around 500 unique varchars at the moment, and growth shouldn't exceed around 100-200/year, so in theory it should decrease the size of the table drastically.
It depends on how long those strings are. Just because the column is defined as varchar(128) doesn't mean that it contains that many characters. A varchar stores a length prefix (either one or two bytes) and then the data. In this case, the length prefix is 1 byte.
So, if your strings are very short, then an integer used for mapping to another table might actually be bigger.
If your strings are long -- say 100 characters -- then replacing them with a lookup id will be smaller. And this might actually have a significant impact on the data size (and hence on performance).
The join itself should add little to the cost of a query, particularly if the join key is a primary key. In fact, because the data in the larger table is smaller, such queries might run faster with the join.
What you should do depends on your data.
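If you do go the lookup-table route, a minimal sketch might look like this (names are hypothetical: big_table stands in for your 150-million-row table and label for the varchar(128) column):

-- Lookup table for the ~500 distinct strings; SMALLINT UNSIGNED (2 bytes) leaves plenty of headroom
CREATE TABLE label_lookup (
    label_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    label    VARCHAR(128) NOT NULL,
    PRIMARY KEY (label_id),
    UNIQUE KEY (label)
) ENGINE=InnoDB;

-- Populate it from the existing data
INSERT INTO label_lookup (label)
SELECT DISTINCT label FROM big_table;

-- Add the small id column to the big table and back-fill it
ALTER TABLE big_table ADD COLUMN label_id SMALLINT UNSIGNED;
UPDATE big_table b
JOIN label_lookup l ON l.label = b.label
SET b.label_id = l.label_id;

-- After dropping the old varchar column, reads become a cheap primary-key join
SELECT b.*, l.label
FROM big_table b
JOIN label_lookup l ON l.label_id = b.label_id;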
I usually use the maximum possible length for varchar fields, so in most cases I set 255 but only use around 16 characters in those columns...
Does this decrease performance for my database?
When it comes to storage, a VARCHAR(255) column will take up 1 byte to store the length of the actual value plus the bytes required to store the actual value.
For a latin1 VARCHAR(255) column, that's at most 256 bytes. For a UTF8 column, where each character can take up to 3 bytes (though rarely), the maximum size is 766 bytes. Since the maximum index length for a single column in InnoDB is 767 bytes, that is perhaps why some people treat 255 as the maximum supported column length.
So, again, when storing the value, it only takes up as much room as is actually needed.
However, if the column is indexed, the index automatically allocates the maximum possible size so that each node in the index has enough room to store any possible value. When searching through an index, MySQL loads the nodes in chunks of a specific byte size at a time. Larger nodes mean fewer nodes per read, which means it takes longer to search the index.
MySQL will also use the maximum size when storing the values in a temp table for sorting.
So, even if you aren't using indexes, but are ever performing a query that can't utilize an index for sorting, you will get a performance hit.
Therefore, if performance is your goal, setting any VARCHAR column to 255 characters should not be a rule of thumb. Instead, you should use the minimum required.
There may be edge cases where you'd rather suffer the performance every day so that you never have to lock a table completely to increase the size of a column, but I don't think that's the norm.
One possible exception is if you are joining on a VARCHAR column between two tables. MySQL says:
MySQL can use indexes on columns more efficiently if they are declared
as the same type and size.
In that case, you might use the max size between the two.
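If you decide to shrink an over-wide column, a simple way to pick a realistic limit is to look at what the column actually holds first (the table and column names here are placeholders):

-- How long is the longest stored value, in characters?
SELECT MAX(CHAR_LENGTH(some_column)) AS max_chars
FROM some_table;

-- Then size the column accordingly; re-state NOT NULL/DEFAULT if the original column had them
-- (this rebuilds the table, so run it in a maintenance window)
ALTER TABLE some_table MODIFY some_column VARCHAR(32);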
Whenever you're talking about "performance" you can only find out one way: Benchmarking.
In theoretical terms there's no difference between VARCHAR(20) and VARCHAR(255) if they're both populated with the same data. Keep in mind that if you get your length wrong you will have massive truncation problems: unless strict SQL mode is enabled (which rejects over-length values with an error), MySQL does not warn you before it starts chopping data to fit.
I try to avoid setting limits on VARCHAR columns unless the data would be completely invalid if it were longer. For instance, two-character ISO country codes can be stored in VARCHAR(2) because longer strings are meaningless. For other things, especially names or phone numbers, limiting the length is potentially, even probably, harmful.
Still, you will want to test any schema you create to be sure it meets your performance requirements. I expect you'd have a hard time detecting any difference at all between VARCHAR(25) and VARCHAR(16).
There are two ways in which this will decrease performance.
If you're reading those columns many, many times, joining on the column, or otherwise accessing them a very large number of times. How many times depends on your machine, but think on the order of millions.
If you're always filling the field (using 20 characters in a varchar(20)), then the length checks add a little overhead whenever you perform an insert.
The best way to determine this, though, is to benchmark your database.
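As a rough benchmark sketch (assuming MySQL 8.0 for the recursive CTE; all names are made up), you could build two otherwise identical tables, load them with the same rows, and time a query that has to sort without an index:

-- Two tables that differ only in the declared VARCHAR width
CREATE TABLE bench_wide   (id INT PRIMARY KEY, val VARCHAR(255)) CHARSET=utf8mb4;
CREATE TABLE bench_narrow (id INT PRIMARY KEY, val VARCHAR(16))  CHARSET=utf8mb4;

-- Generate identical test rows
SET SESSION cte_max_recursion_depth = 100000;
INSERT INTO bench_wide
  WITH RECURSIVE seq (n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 100000)
  SELECT n, CONCAT('value-', n) FROM seq;
INSERT INTO bench_narrow SELECT * FROM bench_wide;

-- Compare the timings of a sort that cannot use an index
SELECT * FROM bench_wide   ORDER BY val LIMIT 10;
SELECT * FROM bench_narrow ORDER BY val LIMIT 10;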
At the moment, we have a varchar(255) field. One of our users has requested that the field be increased about 10-fold to around varchar(2048). Given his use-case, it appears to be a reasonable request.
The field in question is not indexed, it is not part of any joins and it is never included in any where clauses.
I realize that smaller increases (say from 50 to 100) have no impact, but does this still hold at much larger increases in size?
Apparently not, with a side note:
"In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255) which would always store 255 characters.
But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: when your query implicitly generates a temporary table, for instance while sorting or GROUP BY, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.
It's best to define the column based on the type of data that you intend to store. It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. And 255 is customary because it may have been the maximum length of a VARCHAR in some databases in the dawn of time (as well as PostgreSQL until more recently)."
by Bill Karwin
So it basically depends on your specific use of that field; if you don't use that field in a GROUP BY (or a sort), then there's no problem.
For your case there is no difference between varchar(255) and varchar(2048).
In MySQL, temporary tables and MEMORY tables store a VARCHAR column as a fixed-length column, padded out to its maximum length. If you design VARCHAR columns much larger than the greatest size you need, you will consume more memory than you have to. This affects cache efficiency, sorting speed, etc.
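If you do decide to widen the column, the change itself is a single statement (the table and column names below are placeholders; re-state NOT NULL/DEFAULT if your column has them):

-- Widen the column; depending on the MySQL version and charset this may rebuild the table
ALTER TABLE comments MODIFY body VARCHAR(2048);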
How does MySQL store a varchar field? Can I assume that the following pattern represents sensible storage sizes :
1,2,4,8,16,32,64,128,255 (max)
A clarification via example: let's say I have a varchar field of 20 characters. When creating this field, does MySQL basically reserve space for 32 bytes (not sure if they are bytes or not) but only allow 20 to be entered?
I guess I am worried about optimising disk space for a massive table.
To answer the question: on disk, MySQL uses 1 byte plus the size of the data actually stored in the field (so if the column was declared varchar(45) and the value was "FooBar", it would use 7 bytes on disk, unless of course you were using a multibyte character set, where it would use 14 bytes). So however you declare your columns, it won't make a difference on the storage end (you stated you are worried about disk optimization for a massive table). However, it does make a difference in queries, as VARCHARs are converted to CHARs when MySQL makes a temporary table (SORT, ORDER, etc.), and the more records you can fit into a single page, the less memory you use and the faster your table scans will be.
MySQL stores a varchar field as a variable length record, with either a one-byte or a two-byte prefix to indicate the record size.
Having a pattern of storage sizes doesn't really make any difference to how MySQL will function when dealing with variable length record storage. The length specified in a varchar(x) declaration will simply determine the maximum length of the data that can be stored. Basically, a varchar(16) is no different disk-wise than a varchar(128).
This manual page has a more detailed explanation.
Edit: With regards to your updated question, the answer is still the same. A varchar field will only use up as much space on disk as the data you store in it (plus a one or two byte overhead). So it doesn't matter if you have a varchar(16) or a varchar(128), if you store a 10-character string in it, you're only going to use 10 bytes (plus 1 or 2) of disk space.
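If you want to convince yourself that the declared width doesn't change on-disk usage, you can load the same data into two tables that differ only in the declared width and compare their reported sizes (table names are made up; InnoDB's figures in information_schema are estimates):

-- Two tables that differ only in the declared VARCHAR width
CREATE TABLE vtest_16  (s VARCHAR(16));
CREATE TABLE vtest_128 (s VARCHAR(128));

-- The same 10-character value costs 10 data bytes plus a 1- or 2-byte length prefix in either table
INSERT INTO vtest_16  VALUES ('0123456789');
INSERT INTO vtest_128 VALUES ('0123456789');

-- After loading a large identical data set into both, their sizes track each other
SELECT table_name, data_length, avg_row_length
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('vtest_16', 'vtest_128');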