MySQL varchar(2000) vs text? - mysql

I need to store on average a paragraph of text, about 800 characters, in the database. In some rare cases it may go up to 2000-2500 characters. I've read the manual and I know there are many of these questions already, but I've read over 10 questions on Stack Overflow and I still find it hard to figure out whether I should simply use text or something like varchar(2000). Half seem to say use varchar, while the other half say text. Some people say always use text if you have more than 255 characters (yes, this was after 5.0.3, which allowed varchar up to 65k). But then I thought to myself: if I were to use text every time the characters were over 255, then why did MySQL bother increasing the size at all if that was always the best option?
They both have a variable size in storage I've read, so would there be no difference in my situation? I was personally leaning towards varchar(2000) then I read that varchar stores the data inline while text doesn't. Does this mean that if I constantly select this column, storing the data as varchar would be better, and conversely if I rarely select this column then using text would be better? If that is true, I think I would now choose the text column as I won't be selecting this column many of the times I run a query on the table. If it matters, this table is also frequently joined to as well (but won't be selecting the column), would that also further the benefit of using text?
Are my assumptions correct that I should go with text in this case?

When a table has TEXT or BLOB columns, the table can't be stored in memory. This means every query (which doesn't hit cache) has to access the file system, which is orders of magnitude slower than memory.
Therefore you should store this TEXT column in a separate table which is only accessed when you actually need it. This way the original table can be stored in memory and will be much faster.
Think of it as separating the data into one "memory table" and one "file table". The reason for doing this is to avoid accessing the file system except when necessary (i.e. only when you need the text).
You don't earn anything by storing the text in multiple tables. You still have to access the file system.
> Sorry what I meant was for example, a forum script, in the posts table they might be storing 20 columns of post data, they also store the actual post as a text field in the same table. So that post column should be separated out?
Yes.
> It seems weird to have a table called post, but the actual post isn't stored there, maybe in another table called "actual_post" not sure lol.
You can try (posts, post_text) or (post_details, posts) or something like that.
> I have a tags table that has just three fields, tag_id, tag, and description. So that description column should also be separated out? So I need a tags table and a tags_description table just to store 3 columns?
If the description is a TEXT column and you run queries against this table that don't need the description, it would certainly be preferable.
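A minimal sketch of the (posts, post_text) split suggested above. It uses Python with an in-memory SQLite database purely so it runs anywhere; SQLite's storage behavior is not MySQL's, so only the table layout and the two query shapes carry over. The column names (author, title, body) are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; this shows the schema shape only,
# not how either engine stores TEXT on disk or in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        author  TEXT NOT NULL,
        title   TEXT NOT NULL
    );
    CREATE TABLE post_text (
        post_id INTEGER PRIMARY KEY REFERENCES posts(post_id),
        body    TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO posts VALUES (1, 'alice', 'varchar or text?')")
conn.execute("INSERT INTO post_text VALUES (1, 'A long paragraph of post text ...')")

# Listing query: reads only the narrow table, never the post body.
listing = conn.execute("SELECT post_id, title FROM posts").fetchall()

# Detail query: joins to the text table only when the body is needed.
detail = conn.execute(
    "SELECT p.title, t.body FROM posts AS p "
    "JOIN post_text AS t ON t.post_id = p.post_id "
    "WHERE p.post_id = ?",
    (1,),
).fetchone()
```

The same two statements (a plain SELECT on posts, and a JOIN on post_id when the text is wanted) are what the application would issue against MySQL.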

I think you summarized it well. Another thing you could consider is moving the "text" to another table and joining back to the master record. That way, whenever you use the master table, the text data isn't taking up space in the master record; when you need it, you can join to that table. You can also store it as a varchar in case you want to do something like " where text like... "

Related

MySQL database design with sparse column

I have a table (millions of rows) where one of the columns is a Text field (stores json blobs). But only about 10-20% of them are actually non-Null.
What is the best practice when it comes to sparse columns?
Should I
a) Just keep the table as is or
b) Create a new table with just that Text column?
If I am not mistaken, option (a) is fine because InnoDB will dynamically allocate only as much space as is needed for that Text column, right? Is there any reason to go with option (b)? It seems like option (b) will just add complexity in querying (joining) these tables and further increase the space complexity as well.
MySQL (InnoDB storage engine) stores nothing for a NULL. Well, each row has a bitfield with 1 bit for each nullable column. The bitfield is followed by data values for non-NULL columns. And variable-length columns like VARCHAR, TEXT, BLOB, or JSON take only the space needed given their length.
So I'd suggest keeping your table as is, keep the TEXT field in the table, and make it NULL when there's no JSON data.
P.S.: Aren't you using the JSON data type?
You mentioned the storage/space consideration. I think what matters most is how you will use the data. If your performance is okay with doing a like "%%" match, then just leave it.
Denormalizing the data allows you to query and index the content better.
In general, it does not matter whether you do (a) or (b). But here are some more considerations:
If you do SELECT * but ignore that column, then (a) is wasteful.
Certain InnoDB ROW_FORMATs will put 'short' strings in the table, not separate; others will store them in a separate block, leaving behind 20 or 767 bytes in the main block. (It gets rather tedious and confusing to see if this will really matter for (a).)
(b) involves a LEFT JOIN in your code when you do want the column. You may consider this a bother.
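For what option (b) looks like in practice, here is a small sketch of the LEFT JOIN mentioned above. It is written in Python against an in-memory SQLite database only so that it is runnable; the table and column names (items, item_json, payload) are hypothetical, and nothing here says anything about InnoDB's storage.

```python
import sqlite3

# SQLite is a stand-in; only the query shape is the point.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        item_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- Side table: one row only for items that actually have JSON.
    CREATE TABLE item_json (
        item_id INTEGER PRIMARY KEY REFERENCES items(item_id),
        payload TEXT NOT NULL
    );
    INSERT INTO items VALUES (1, 'with json'), (2, 'without json');
    INSERT INTO item_json VALUES (1, '{"k": 1}');
""")

# LEFT JOIN keeps items that have no side-table row; payload is NULL for them.
rows = conn.execute(
    "SELECT i.item_id, i.name, j.payload "
    "FROM items AS i "
    "LEFT JOIN item_json AS j ON j.item_id = i.item_id "
    "ORDER BY i.item_id"
).fetchall()
```

With option (a) the same result comes from a single-table SELECT, which is the extra bother the answer refers to.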

Is it correct to have a BLOB field directly in the main table?

Which one is better: having a BLOB field in the same table or having a 1-TO-1 reference to it in another table?
I'm making a MySQL database whose main table is called item(ID, Description). This table is queried by a program I'm developing in VB.NET, which lets the user double-click a specific item returned by a query. Once the item's dedicated form is open, I would like to show an image stored in the BLOB field, a sort of item preview. The problem is I don't know where it is better to create this BLOB field.
Assuming a table like this: Item(ID, Description, BLOB), will the BLOB field affect the database performance on queries like:
SELECT ID, Description FROM Item;
If yes, what do you think about this solution:
Item(ID, Description)
Images(Item, File)
Where Images.Item references Item.ID, and File is the BLOB field.
You can add the BLOB field directly to your main table, as BLOB fields are not stored in-row and require a separate look-up to retrieve their contents. Your dependent table is unnecessary.
BUT another, preferred way is to store in your database table only a pointer (the path to the file on the server) to your image file. This way you can retrieve the path and access the file from your VB.NET application.
To quote the documentation about blobs:
Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened.
In simpler terms, the blob's data isn't stored inside the table's row, only a pointer is - which is pretty similar to what you're trying to achieve with the secondary table. To make a long story short - there's no need for another table, MySQL already does the same thing internally.
Most of what has been said in the other Answers is mostly correct. I'll start from scratch, adding some caveats.
The two-table, 1-1, design is usually better for MyISAM, but not for InnoDB. The rest of my Answer applies only to InnoDB.
"Off-record" storage may happen to BLOB, TEXT, and 'large' VARCHAR and VARBINARY, almost equally.
"Large" columns are usually stored "off-record", thereby providing something very similar to your 1-1 design. However, having InnoDB do the work usually leads to better performance.
The ROW_FORMAT and the size of the column make a difference.
A "small" BLOB may be stored on-record. Pro: no need for the extra fetch when you include the blob in the SELECT list. Con: clutter.
Some ROW_FORMATs cut off at 767 bytes.
Some ROW_FORMATs store 20 bytes on-record; this is just a 'pointer'; the entire blob is off-record.
etc, etc.
Off-record is beneficial when you need to filter out a bunch of rows, then fetch only a few. Also, when you don't need the column.
As a side note, TINYTEXT is possibly useless. There are situations where the 'equivalent' VARCHAR(255) performs better.
Storing an image in the table (on- or off-record) is arguably unwise if that image will be used in an HTML page. HTML is quite happy to request the <img src=...> from your server or even some other server. In this case, a smallish VARCHAR containing a url is the 'correct' design.

Would it be effective to store text of different sizes in different mysql tables?

I am building a database system that will be storing large amounts of text.
The text will be stored in a table with an id column and one varchar/text column.
I was wondering if it would be more effective to use a single table which employed a large varchar, or multiple tables, each employing a different text type.
The multiple table option would contain several different tables, each employing a different kind of text (tinytext, text, etc.), and the system would store text in the most appropriate one based on the length of the text.
I am concerned with both speed and storage space, and would like a solution that balances both.
Edit -
The text table will not be searched on, but it may be joined (usually an id number will be determined, then a single row accessed).
Size will typically be smaller than text, but some will be large enough to require mediumtext. I doubt that longtext will be needed.
Keep it simple! Seriously.
Unless you have an overwhelming majority of text items that are 255 characters or shorter, just use TEXT or LONGTEXT. Spend your time doing interesting things with your text, not fiddling around with complex data structures. Get your project done now; optimize later.
Disk drives and RAM are getting cheaper much faster than your time is.
If your app requirements absolutely need you to use varchar data, for its brevity and searchability, instead of text data, you can do the following.
Create an article table, with one row per text article. It will have all the stuff you need to manage an article, including let's say the title, author, and an article_id.
Create a second table called something like article_text. It will have, possibly, four columns.
article_id: foreign key to the article table.
language: a language code, if you happen to store translations of articles.
ordinal: a sequence number.
textfrag: varchar(255), part of the text.
Store each article's text in a series of article_text rows with ascending ordinal values. Each textfrag will hold up to 255 characters of your text. To retrieve an article's text you'll use a query like this.
SELECT textfrag
FROM article_text
WHERE language = 'en_US' /* or whatever */
AND article_id = 23456 /* or whatever */
ORDER BY ordinal
Then, you'll fetch a bunch of rows, concatenate the contents of the textfrag items, and there's your article with no effective length limit. If you create an index with all the fields in it, your retrieval time will be really fast because all retrievals will come from the index.
(article_id, language, ordinal, textfrag)
If you do your best to split the text into fragments at word boundaries, and you use MyISAM, and you use FULLTEXT indexes, you'll get a very effective fulltext search system.
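The split-and-concatenate step described above could look roughly like this on the application side. This is only a sketch in Python: split_fragments and join_fragments are made-up helper names, the 255 limit matches the textfrag varchar(255) column, and the (ordinal, textfrag) tuples stand in for rows of article_text.

```python
def split_fragments(text, limit=255):
    """Split text into pieces of at most `limit` characters,
    cutting just after a space where one is available."""
    fragments = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Prefer the last space inside the window as the cut point.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1
        fragments.append(text[start:end])
        start = end
    return fragments


def join_fragments(rows):
    """Rebuild the text from (ordinal, textfrag) rows, in ordinal order."""
    return "".join(frag for _, frag in sorted(rows))


article = "word " * 200
# Rows as they would be inserted: ascending ordinal, one fragment each.
rows = list(enumerate(split_fragments(article), start=1))
restored = join_fragments(rows)
```

Because each cut keeps the space at the end of the fragment, plain concatenation in ordinal order gives back the original text; that relies on the fragments being stored in a VARCHAR column, which keeps trailing spaces.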

Make MySQL table FIXED by splitting TEXT field into chunks of type CHAR(255)

A FIXED MySQL table has well-known performance advantages over a DYNAMIC table.
There is a table tags with only one description TEXT field. An idea is to split this field into 4-8 CHAR(255) fields. For INSERT/UPDATE queries just divide description into chunks (PHP function str_split()). That will make the table FIXED.
Has anybody practiced this technique? Is it worth it?
OK, this is done, but where it is done I have only seen it done for historical reasons, such as a particular client-server model that requires it, or for legacy reports where the segments are de facto fields in the layout.
The examples I have seen were for free-form text entries (remarks, notes, contact log) in insurance/collections applications and the like, where formatting on a printed report was important or there was a need to avoid any confusion in post-processing to dress the format where multiple platforms are involved (\r\n vs \n and EBCDIC vertical tabs).
So not generally for space/performance/recovery purposes.
If the row is "mostly" this field, an alternative would be to create a row for each segment and add a low-range sequence number to the key.
In this way you would have only 1 row for short values and up to 8 for long. Consider your likely statistics.
Caveats :
Always be acutely aware of MySQL indexes dropping trailing spaces. Concatenating these should take this into account if used in an index.
This is not a recommendation, but "tags" sounds like a candidate for a single varchar field for full text indexing. If the data is so important that forensic recovery is a requirement, then normalising the model to store the tags in a separate table may be another way to go.
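To make the trailing-space caveat concrete: by default MySQL also strips trailing spaces from CHAR values when they are retrieved (unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled), so a fixed-width split that happens to end a chunk on a space will not concatenate back to the original. The sketch below only simulates that in Python; split_fixed and read_char_column are hypothetical helpers, with rstrip standing in for the server's behavior.

```python
def split_fixed(text, width=255):
    """Fixed-width split, like PHP's str_split($text, 255)."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def read_char_column(value):
    # Simulation only: mimics CHAR retrieval dropping trailing spaces.
    return value.rstrip(" ")


# The 255th character is a space, so the first chunk ends with one.
description = "a" * 254 + " " + "b" * 10
chunks = split_fixed(description)
restored = "".join(read_char_column(chunk) for chunk in chunks)
lost = len(description) - len(restored)
```

Here the space between the two words is lost on the round trip, which is the kind of damage to watch for if the chunks are ever concatenated again.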

Does splitting TEXT fields into multiple tables provide performance optimization in multi-language application?

I'm building a project and I have a question about mysql databases. The application is multi-language. And we are wondering if you will get better performance if we split up different types of text-fields (varchar, text, med-text) to different tables? Or is it just better to create one table with just a text field?
With this question and the multi-language constraint in mind, I am wondering if performance will improve if I split up the different types of text fields into separate tables. When you just have one table with all the texts and the language, you can search it easily ("give me the text with this value (in an item column) and that language"). When you have different tables for different types of text, you will save space in your database, because you don't need a full text area for a varchar(200), but you will need multiple tables to create a connection between the item, the type of text and the languages you have for your text.
What do you think is best? Or are there some possibilities that I haven't considered?
I find it better for performance reasons to keep columns with blob and text data types in a separate table from the other data types even if it breaks normalization.
Consider a person table with columns name varchar, address varchar, dob date and picture blob. A picture can easily be about 1MB while the remaining columns may not take any more than 1KB. Imagine how many blocks of data need to be read even if you only want to list the name and address of people living in a certain city - if you are keeping everything in the same table.
If you are not bound to MySQL, I would suggest using a text search engine such as Apache Lucene if you want to do full-text searches, because as far as I know, MySQL does not provide as much performance as Lucene can for full-text searches.
In case you are bound to MySQL, let me try to provide some information based on current definition of the problem (which is actually not much yet).
MySQL reference documentation states that:
Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types.
So, if you run your queries using SELECT * on a table that contains a text field, you can either separate the queries that really need the text field from the ones that don't, to gain speed, or alternatively you can move the text field out of the table as well. Saving the text field in a secondary table will cost you the extra overhead of the duplicated key storage and also the indexes for that secondary table. However, depending on your database design, you may also be suffering overhead from unnecessary index updates, which can be eliminated by moving the text field to another table; this is just a suggestion, since we don't know your schema and data access patterns.