mysql mediumtext & max row size - mysql

I am confused about two conflicting statements. I read repeatedly that the max size of a row is 64KB. So, for example, what does it mean to say that a mediumtext column can be 16MB if the whole row maxes out at 64KB? (I am using InnoDB.)
I have two related questions as well.
In practice, is a mediumtext field fixed or variable length? On dump it looks variable, but is it really, or is the db record always allotting 16MB for it?
Is there a consensus that mediumtext is as useful as any smaller relative such as varchar, or is the general wisdom that it is better to use files for text over 60KB or so?
I know these are newbie questions, but I did a fair amount of searching before coming here, and I did not find answers. Your help will be appreciated.

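Rows are not limited to 64KB in any way that constrains TEXT or BLOB columns. The 65,535-byte row size limit that MySQL documents covers only what is stored in the row itself; a TEXT or BLOB column counts for just 9 to 12 bytes toward it, because its contents are stored separately. If you have a link to where it says otherwise, let me know.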
I have collected most of the 'hard' limits in MySQL here.
MEDIUMTEXT is implemented as a 3-byte length plus zero to 16MB bytes of text. The other TEXT and BLOB datatypes are similarly designed. They are all "variable length", not "fixed".
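As a rough illustration of the "length prefix plus data" layout, here is a small Python sketch. The prefix sizes (1, 2, 3 and 4 bytes) are the documented ones for the TEXT family; the function names are made up for this example, and the size it reports ignores any engine-specific overhead.

```python
# Length-prefix sizes in bytes for MySQL's TEXT family.
PREFIX_BYTES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def max_length(type_name: str) -> int:
    """Largest byte length that an n-byte unsigned length prefix can record."""
    return 2 ** (8 * PREFIX_BYTES[type_name]) - 1


def stored_size(type_name: str, value: bytes) -> int:
    """Approximate bytes needed: the prefix plus the actual data, nothing preallocated."""
    if len(value) > max_length(type_name):
        raise ValueError(f"value too long for {type_name}")
    return PREFIX_BYTES[type_name] + len(value)


if __name__ == "__main__":
    for name in PREFIX_BYTES:
        print(name, max_length(name))
    # A 5-byte value in a MEDIUMTEXT column needs about 3 + 5 bytes, not 16MB.
    print(stored_size("MEDIUMTEXT", b"hello"))
```

So the 16MB figure is only the ceiling that a 3-byte length can express, not space reserved per row.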
In general, use the datatype that works best for the data and the application code manipulating it.
As it turns out, LONGTEXT and LONGBLOB are difficult to load directly because of the max_allowed_packet system variable, which has a default of 16MB in the mysql client (the server default varies by version). I'm not sure whether you also need to ensure that (innodb_log_file_size * innodb_log_files_in_group) > 10 * the largest TEXT you will have. (I don't think I have heard of anyone having such an issue.)
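If you want to check that inequality for your own settings, the arithmetic is simple. This is only a sketch: the function name is made up, and the sizes in the example are placeholder values, not a recommendation.

```python
MB = 1024 * 1024


def redo_log_large_enough(log_file_size: int,
                          log_files_in_group: int,
                          largest_text_bytes: int) -> bool:
    """True if the combined InnoDB log size exceeds 10x the largest TEXT/BLOB value."""
    return log_file_size * log_files_in_group > 10 * largest_text_bytes


if __name__ == "__main__":
    # Example values only: two 48MB log files against a full 16MB MEDIUMTEXT.
    print(redo_log_large_enough(48 * MB, 2, 16 * MB))   # 96MB vs 160MB
    # The same value against two 128MB log files.
    print(redo_log_large_enough(128 * MB, 2, 16 * MB))  # 256MB vs 160MB
```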
There is a slight advantage in declaring VARCHARs with a reasonable size, not the 255 that is commonly seen. There is probably no difference between larger varchars and TEXT, except for what the cutoff is.
File versus text/blob... This is a common question on the forum, usually relating to the storage of images in BLOBs. There are advantages/disadvantages of either approach.

Related

Is there any difference between varchar(10) and varchar(1000) when we store string whose length is less than 10?

I understand the difference between char and varchar, but I'm not sure of the exact meaning of the varchar maximum length.
When we store a string whose length is less than 10, is there any difference between varchar(10) and varchar(100)?
I think both cases use the same space and have the same performance. If so, why do we need a varchar max limit?
Is it enough to use just "varchar" instead of "varchar(xxx)"?
(added) I'm using MySQL 5.0.67
This depends entirely on the DBMS engine being used. SQL itself does not mandate how things are stored physically, just how they're seen logically.
For example, your DBMS may allocate space in the row for the maximum size, plus some extra bytes to store the length. In that case, there would be a big difference between varchar(10) and varchar(1000) since you would waste quite a bit of space per row.
Alternatively, it may use a buffer pool for the varchar data and store only the length and the buffer pool "starting address" in the row. In that case, every single row would store identically-sized information for a varchar column regardless of its size, but there would be an added step to extract the actual data in that column (following the link to the buffer pool).
The reason you use a varchar is exactly why it's named varchar. It allows you to store variable-sized data elements. Typically, char(10) gives you ten characters, no matter what, padding it with spaces if you insert something shorter. You can trim trailing spaces off as you extract it but that won't work so well if the data you want to store is actually "hello ", with a trailing space you want preserved.
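To make the padding problem concrete, here is a toy Python model of the typical behavior described above. It is not any particular DBMS's implementation; the function names are invented for the illustration.

```python
def char_store(value: str, n: int) -> str:
    """Model of CHAR(n): pad the value with spaces to exactly n characters."""
    if len(value) > n:
        raise ValueError("value too long")
    return value.ljust(n)


def char_fetch(stored: str) -> str:
    """Model of reading CHAR(n) back with the trailing padding trimmed off."""
    return stored.rstrip(" ")


def varchar_store(value: str, n: int) -> str:
    """Model of VARCHAR(n): keep the value as given, up to n characters."""
    if len(value) > n:
        raise ValueError("value too long")
    return value


if __name__ == "__main__":
    original = "hello "  # trailing space is part of the data
    print(repr(char_fetch(char_store(original, 10))))  # trailing space is lost
    print(repr(varchar_store(original, 10)))           # trailing space survives
```

Once the value is padded, there is no way to tell a meaningful trailing space from padding, which is why trimming on the way out loses it.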
A decent DBMS engine may decide to make a trade-off depending on the maximum size of the varchar column. For short ones, it could just store it inline in the row and consume the extra bytes for the size.
Longer varchar columns could be "outsourced" to a separate buffer pool to ensure row-reading is kept efficient (at least until you need the large varchar column, anyway).
What you need to do is re-ask the question for your specific DBMS so as to get a more targeted answer.
Or, in all honesty, engineer your database so that each column is only as large as the maximum you actually need to store. If you know it's 10, then varchar(1000) is a waste. If, in the future, you need to enlarge the column, that is the time to do it, rather than now (see YAGNI).
For MySQL, you'll want to look at Chapter 14 Storage Engines of the online documentation.
It covers the various storage engines (such as InnoDB and MyISAM) that MySQL uses and, looking deep enough, you can see how the information is physically stored.
For example, in MyISAM, the presence of variable length data in a table (varchar included) usually means dynamic tables. This follows a scheme roughly analogous to the buffer pool concept I mentioned above, with the advantage that less space is wasted for variable sized columns, and the disadvantage that rows may become fragmented.
The other storage format (discounting compressed format since it's only really used for read-only tables) is the static one, where data is stored in a single physical row.
Information on the InnoDB physical structures can be found here. Depending on whether you use the Antelope or Barracuda file format, you end up with the "all information is a physical row" or "buffer pool" situation, similar to the MyISAM distinction between dynamic and static.
In SQL Server, the limit does not affect how the data is stored on disk. What it does provide, though, is one constraint for free. If you, as the database designer, only want up to 10 characters stored, you've prevented someone from storing a novel instead.
Is it enough to use just "varchar"
Again, for SQL Server, almost certainly not what you want. In most circumstances, if you don't specify a limit, you get a varchar(1) (surely the most pointless data type ever conceived). In CAST and CONVERT, it's a varchar(30).
In Oracle, the Varchar stretches in size depending on its use, just up to the point where you set the limit. This means that indeed, a varchar(10) and a varchar(100) containing 2 characters use the same space (which is different for a char, which always uses the full allocated space).

MySQL best way to store long strings

I'm looking for some advice on the best way to store long strings of data from the mySQL experts.
I have a general purpose table which is used to store any kind of data, by which I mean it should be able to hold alphanumeric and numeric data.
Currently, the table structure is simple with an ID and the actual data stored in a single column as follows:
id INT(11)
data VARCHAR(128)
I now have a requirement to store a larger amount of data (up to 500 characters) and am wondering whether the best way would be to simply increase the varchar column size, or whether I should add a new column (a TEXT type column?) for the times I need to store longer strings.
If any experts out there have any advice, I'm all ears!
My preferred method would be to simply increase the varchar column, but that's because I'm lazy.
The mySQL version I'm running is 5.0.77.
I should mention the new 500 character requirement will only be for the odd record; most records in the table will be not longer than 50 characters.
I thought I'd be future-proofing by making the column 128. Shows how much I knew!
Generally speaking, this is not a question that has a "correct" answer. There is no "infinite length" text storage type in MySQL. You could use LONGTEXT, but that still has an (absurdly high) upper limit. Yet if you do, you're kicking your DBMS in the teeth for having to deal with that absurd blob of a column for your 50-character text. Not to mention the fact that you hardly do anything with it.
So, most futureproofness(TM) is probably offered by LONGTEXT. But it's also a very bad method of resolving the issue. Honestly, I'd revisit the application requirements. Storing strings that have no "domain" (as in, being well-defined in their application) and arbitrary length is not one of the strengths of RDBMS.
If I wanted to solve this on the "application design" level, I'd use a NoSQL key-value store for this (and I'm as anti-NoSQL-hype as they get, so you know it's serious), even though I recognize it's a rather expensive change for such a minor requirement. But if this is an indication of what your DBMS is eventually going to hold, it might be more prudent to switch now to avoid this same problem a hundred times in the future. Data domain is very important in RDBMS, whereas it's explicitly sidelined in non-relational solutions, which seems to be what you're trying to solve here.
Stuck with MySQL? Just increase it to VARCHAR(1000). If you have no requirements for your data, it's irrelevant what you do anyway.
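Be careful if using TEXT. In older MySQL versions such as the 5.x series, a query that needs an internal temporary table (for example for GROUP BY or ORDER BY) has to build it on disk when a TEXT column is involved, because the MEMORY engine does not support TEXT or BLOB. That is much slower than the same query over CHAR and VARCHAR columns. TEXT columns can also only be indexed by a prefix, not in full. A NoSQL database may be a better way to store long strings.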
We can use varchar(<maximum_limit>). The largest limit that we can pass is 65,535 bytes.
Note: This is an effective maximum. The 65,535-byte row size is shared among all columns in the row (TEXT/BLOB contents excepted), and the number of characters that fit depends on the character set used.
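A back-of-the-envelope way to see how the character set and the other columns eat into that limit. This is a simplified sketch: it assumes a 2-byte length prefix and one byte of NULL bookkeeping for a nullable column, and it ignores any further engine-specific overhead. The function and parameter names are made up for the example.

```python
ROW_LIMIT = 65535  # MySQL's documented maximum row size in bytes


def max_varchar_chars(bytes_per_char: int,
                      nullable: bool = True,
                      other_columns_bytes: int = 0) -> int:
    """Rough upper bound on N in VARCHAR(N) for a single long column."""
    available = ROW_LIMIT - other_columns_bytes
    available -= 2                      # length prefix for a long VARCHAR
    available -= 1 if nullable else 0   # NULL flag storage (assumed 1 byte)
    return max(available // bytes_per_char, 0)


if __name__ == "__main__":
    print(max_varchar_chars(1))  # single-byte charset such as latin1
    print(max_varchar_chars(3))  # utf8 (up to 3 bytes per character)
    print(max_varchar_chars(4))  # utf8mb4 (up to 4 bytes per character)
    # Other columns in the same row reduce what is left.
    print(max_varchar_chars(3, other_columns_bytes=4))  # e.g. one INT column
```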

MySQL Storage and Optimization

I'm looking at a db schema for a project I'm inheriting. There are many instances of binary answers being stored as INT(11) rather than TINYINT(1), which is the way I've normally handled this type of storage.
I've checked the data and everything is either "1" or "0". Is there any reason to or not to change the datatype to TinyInt(1) Unsigned for all of these instances?
Similarly, for something like "last_name", if the current column allows varchar(255), would switching to varchar(100) create any gains? I'm more interested in performance/efficiency than in just limiting data storage at this point.
Thanks,
D.
I would say definitely go ahead with the changes to the boolean columns. (Note: Actually if you're using MySQL 5+, I would use the bit datatype instead of tinyint).
As far as the varchar columns, it doesn't actually make a difference changing 255 to 100 length.
From The SQL Docs:
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
So as long as it's under 255 bytes, you're really not gaining much in terms of storage.
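Note that the rule quoted from the docs is stated in bytes, not characters, so the character set matters. A quick sketch of the rule (the function name is invented; the per-character byte counts are the usual maximums for latin1 and utf8):

```python
def length_prefix_bytes(declared_chars: int, max_bytes_per_char: int = 1) -> int:
    """Length bytes a VARCHAR needs: 1 if values fit in 255 bytes, else 2."""
    max_value_bytes = declared_chars * max_bytes_per_char
    return 1 if max_value_bytes <= 255 else 2


if __name__ == "__main__":
    # Single-byte charset (e.g. latin1): 255 and 100 both need one length byte.
    print(length_prefix_bytes(255), length_prefix_bytes(100))
    # utf8 (up to 3 bytes per character): both can exceed 255 bytes.
    print(length_prefix_bytes(255, 3), length_prefix_bytes(100, 3))
    # With utf8, the one-byte prefix only applies up to VARCHAR(85).
    print(length_prefix_bytes(85, 3), length_prefix_bytes(86, 3))
```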
That being said, by limiting the size of the names, less data needs to be transferred between your SQL server and your application.
Switching to TINYINT would save you 3 bytes I believe, which doesn't seem like a lot to me, although it's certainly a little more efficient.
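To put a number on it: INT takes 4 bytes and TINYINT takes 1, so the saving is 3 bytes per column per row. A rough estimate follows, with made-up row and column counts, ignoring indexes and page overhead.

```python
INT_BYTES = 4      # storage for an INT column
TINYINT_BYTES = 1  # storage for a TINYINT column


def bytes_saved(rows: int, flag_columns: int) -> int:
    """Raw data bytes saved by converting INT flag columns to TINYINT."""
    return rows * flag_columns * (INT_BYTES - TINYINT_BYTES)


if __name__ == "__main__":
    # Hypothetical table: one million rows with five yes/no columns.
    saved = bytes_saved(1_000_000, 5)
    print(f"{saved} bytes, about {saved / (1024 * 1024):.1f} MB")
```

Whether that matters depends on table size; for small tables it is noise.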
I always try and make VARCHAR columns as small as I can get away with. I would personally focus on any gains you can get from that.
The main reason I can think of to avoid any of these changes is if you have so much data that running an ALTER TABLE would cause significant downtime.
Whether any of this will help your app perform better is open to debate. In theory, with VARCHARs, MySQL will only send the actual data over the wire, so if all your last names are 40 bytes long, it's only sending 40 bytes. If the column isn't being used in lookups, it shouldn't really have any impact on your performance. There are a couple of relevant questions like this one on SO covering this issue already.

Limiting the size of TEXT in MySQL for speed

I have a table with a field that needs to store 512 characters. My question is this: can you limit the length of a TEXT field (VARCHAR stops at 255 - I can't use it) to 512 characters? If I do that, will MySQL run through the table quicker due to the fixed data length, or does it act a bit like VARCHAR in that the length can vary?
I don't have a table (at all) yet, but I'm very conscious about speed and size for the future. I found this article incredibly useful - maybe you will too!
Thanks a lot,
James
Update your mysql server. Since 5.0.3 you can store up to 65535 bytes in VARCHAR.
You don't really say how you are using the table, so I'll just throw this in. One option is to separate out the text field into another table and just have a reference to it in your original table.

MySQL: Why use VARCHAR(20) instead of VARCHAR(255)? [duplicate]

Possible Duplicate:
Are there disadvantages to using a generic varchar(255) for all text-based fields?
In MYSQL you can choose a length for the VARCHAR field type. Possible values are 1-255.
But what are its advantages if you use VARCHAR(255) that is the maximum instead of VARCHAR(20)? As far as I know, the size of the entries depends only on the real length of the inserted string.
size (bytes) = length+1
So if you have the word "Example" in a VARCHAR(255) field, it would have 8 bytes. If you have it in a VARCHAR(20) field, it would have 8 bytes, too. What is the difference?
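The formula from the question can be checked directly. This sketch assumes a single-byte character set and a declared limit of at most 255, which is the case the question describes; the function name is made up.

```python
def varchar_stored_bytes(value: str, declared_limit: int) -> int:
    """size (bytes) = length + 1, for a single-byte charset and limit <= 255."""
    if not 1 <= declared_limit <= 255:
        raise ValueError("this sketch only covers limits from 1 to 255")
    data = value.encode("latin-1")  # assumed single-byte character set
    if len(data) > declared_limit:
        raise ValueError("value exceeds the declared limit")
    return len(data) + 1            # one length byte plus the data itself


if __name__ == "__main__":
    print(varchar_stored_bytes("Example", 20))
    print(varchar_stored_bytes("Example", 255))
```

Both calls give the same number, so the declared limit does not change the stored size here; it only changes what the column will accept.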
I hope you can help me. Thanks in advance!
Check out: Reference for Varchar
In short there isn't much difference unless you go over the size of 255 in your VARCHAR which will require another byte for the length prefix.
The length indicates more of a constraint on the data stored in the column than anything else. This inherently constrains the MAXIMUM storage size for the column as well. IMHO, the length should make sense with respect to the data. If you're storing a Social Security # it makes no sense to set the length to 128 even though it doesn't cost you anything in storage if all you actually store is an SSN.
There are many valid reasons for choosing a value smaller than the maximum that are not related to performance. Setting a size helps indicate the type of data you are storing and can also act as a last-gasp form of validation.
For instance, if you are storing a UK postcode then you only need 8 characters. Setting this limit helps make clear the type of data you are storing. If you chose 255 characters it would just confuse matters.
I don't know about mySQL but in SQL Server it will let you define fields such that the total number of bytes used is greater than the total number of bytes that can actually be stored in a record. This is a bad thing. Sooner or later you will get a row where the limit is reached and you cannot insert the data.
It is far better to design your database structure to consider row size limits.
Additionally yes, you do not want people to put 200 characters in a field where the maximum value should be 10. If they do, it is almost always bad data.
You say, well I can limit that at the application level. But data does not get into the database just from one application. Sometimes multiple applications use it, sometimes data is imported and sometimes it is fixed manually from the query window (update all the records to add 10% to the price for instance). If any of these other sources of data don't know about the rules you put in your application, you will have bad, useless data in your database. Data integrity must be enforced at the database level (which doesn't stop you from also checking before you try to enter data) or you have no integrity. Plus it has been my experience that people who are too lazy to design their database are often also too lazy to actually put the limits into the application and there is no data integrity check at all.
They have a word for databases with no data integrity - useless.
There is a semantic difference (and I believe that's the only difference): if you try to put 30 non-space characters into varchar(20), it will produce an error (in strict SQL mode; otherwise MySQL truncates the value and issues a warning), whereas it will succeed for varchar(255). So it is primarily an additional constraint.
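A toy model of that constraint, assuming MySQL's behavior where strict SQL mode rejects an over-long value and non-strict mode truncates it with a warning. The function name and the messages are invented for the illustration; they are not MySQL's actual API or error text.

```python
from typing import Optional, Tuple


def insert_varchar(value: str, limit: int,
                   strict: bool = True) -> Tuple[str, Optional[str]]:
    """Return (stored_value, warning) for a VARCHAR(limit) column model."""
    if len(value) <= limit:
        return value, None
    excess = value[limit:]
    # Strict mode rejects the value when the excess contains non-space characters.
    if strict and excess.strip(" "):
        raise ValueError("data too long for column")
    # Otherwise the value is cut to the limit and a warning is reported.
    return value[:limit], "data truncated for column"


if __name__ == "__main__":
    text = "x" * 30
    print(insert_varchar(text, 255))               # fits, stored unchanged
    print(insert_varchar(text, 20, strict=False))  # truncated to 20, with warning
    try:
        insert_varchar(text, 20, strict=True)
    except ValueError as err:
        print("rejected:", err)
```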
Well, if you want to allow for a larger entry, or limit the entry size perhaps.
For example, you may have first_name as a VARCHAR 20, but perhaps street_address as a VARCHAR 50 since 20 may not be enough space. At the same time, you may want to control how large that value can get.
In other words, you have set a ceiling of how large a particular value can be, in theory to prevent the table (and potentially the index/index entries) from getting too large.
You could just use CHAR, which is fixed width, but unlike VARCHAR, which can be smaller, CHAR pads the values (although this can make for quicker SQL access).
From a database perspective performance wise I do not believe there is going to be a difference.
However, I think a lot of the decision on the length to use comes down to what you are trying to accomplish and documenting the system to accept just the data that it needs.