Any pitfalls of converting MySQL TEXT field to MEDIUMTEXT? - mysql

I understand the size/storage constraints of MySQL TEXT and MEDIUMTEXT fields, but I just wanted to make absolutely sure (before I sign off on a change) that I'm not looking at any adverse effects from converting a field with existing data from TEXT to MEDIUMTEXT.
My concerns are mainly performance, integrity, and disk storage.
Thanks

With regard to the performance, integrity, and disk storage in the database layer, I wouldn't worry about it.
Variable-length data like varchar, text, and blob is stored without padding.
I don't know of any issues with integrity. All data types are treated atomically by the database engine.
Of course, if you have really long text data, it will take more storage, and fetching it will cost more disk I/O and network bandwidth. But if that's the data you need to put in the database, then that's what you have to do.
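For reference, the conversion itself is a single statement; a minimal sketch, assuming a hypothetical table articles with a TEXT column body (MODIFY replaces the whole column definition, so re-specify NOT NULL, DEFAULT, and character set attributes if you had them):
-- on large tables this may rebuild the table, so expect it to take time proportional to the table size
ALTER TABLE articles MODIFY body MEDIUMTEXT;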
I can think of one possible impact:
Some client interface libraries pre-allocate a buffer to hold results, and they allocate enough memory for the largest possible value. The client doesn't know the length of the data before it fetches the data, so it must allocate enough space assuming the data might be as long as the data type supports.
Therefore the library would allocate 16 MB per MEDIUMTEXT column, while it would allocate only 64 KB for a TEXT column. This is something to watch out for if you have a low memory limit in your client layer. For instance, PHP has a memory_limit config parameter for scripts, and the buffer allocated for result sets counts toward it.

Related

Best MySql data types to store variable length strings and binary data

I have a data table which has to be read often. I need to store strings and binary data of variable length in it. I could store the data as BLOB or TEXT, but the way I understand MySQL, those types are stored on the hard drive instead of in memory, and if I use them, reading the table is going to be slow.
Are there any alternative variable length types which I could use? Or, maybe, is there a way to tell MySql to hold the data in columns of those types in memory?
Is this 'data table' the only place that the strings are stored? If so, you need the 'persistence' of storing it on disk. However, MySQL will "cache" the data, so reads will almost always be from RAM.
If each element of data is not 'too' big, you could use ENGINE=MEMORY for the table; that would leave the data only in RAM. A system crash would lose the data.
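A minimal sketch of such a MEMORY table, with hypothetical names; note that the MEMORY engine stores VARCHAR/VARBINARY as fixed-length, so every row reserves the full declared width:
CREATE TABLE hot_strings (
    id      INT NOT NULL PRIMARY KEY,
    payload VARBINARY(8192)   -- stored fixed-length by the MEMORY engine
) ENGINE=MEMORY;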
But if you don't need persistence, there are many flavors of caching outside MySQL. Please describe where the data comes from, what language is using the data, how big the data is, etc.

In memory relational database

I know this question has been asked multiple times on Stack Overflow.
I am posting it to find out what the best choice would be for my design.
I have the following schema for my job details:
_unique_key varchar(256) NULL
_job_handle varchar(256) NULL
_data varchar(1024) NULL
_user_id int(11) NULL
_server_ip varchar(39) NULL
_app_version varchar(256) NULL
_state int(11) NULL
_is_set_stopped bool
Operations we perform on this table:
For each job we will have one update and ten select queries, so we need high read and write throughput.
Many applications manipulate this table, filtering on:
_unique_key
_state
is_set_stopped
_user_id
The _data field size varies from 5 KB to 1 MB based on the type of application and user.
Applications can update individual attributes selectively.
Solutions we have considered:
MySQL InnoDB
I think MySQL will not scale well enough given the high read and write requirements.
MySQL In Memory Table
Problems with this solution:
It doesn't support dynamic field sizes. MEMORY tables use a fixed-length row-storage format; variable-length types such as VARCHAR are stored using a fixed length. Source: http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html
SELECT ... FOR UPDATE will lock the entire table. I don't know whether that will be a problem.
Redis
Redis looks like a good choice, but I think my table is not a good fit for a key-value store:
It supports only a very limited set of data types; I can store only strings in a list, so I would need to store fields as JSON or in some other serialized format.
If a client wants to update a particular attribute, it has to download the full value, parse the object, and push it back to the server. Maybe I am wrong; is there a way to do that?
Filtering based on a value will not be possible. Maybe I am wrong; is there a way to do that?
MySQL InnoDB on a tmpfs file system
This looks promising, but I don't know whether it will scale as well as Redis or a MySQL in-memory table.
In this question, you are confusing raw performance (i.e. efficiency) with scalability. They are different concepts.
Between the InnoDB and MEMORY engines, InnoDB is likely to be the more scalable. InnoDB supports multi-version concurrency control and has plenty of optimizations to deal with contention, so it will handle concurrent access much better than the MEMORY engine, even if it may be slower in some I/O-bound situations.
Redis is a single-threaded server; all operations are serialized, so it has zero scalability. That does not mean it is inefficient. On the contrary, it will likely support more connections than MySQL (due to its epoll-based event loop) and more traffic (due to its very efficient lock-free implementation and in-memory data structures).
To answer your question, I would give MySQL with InnoDB a try. If it is properly configured (no synchronous commit, a large enough buffer pool, etc.), it can sustain good throughput. And instead of running it on top of tmpfs, I would consider SSD hardware.
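As an illustration of what "properly configured" might mean, two common knobs (values are illustrative, not a prescription; innodb_buffer_pool_size usually lives in the server configuration and, before MySQL 5.7, requires a restart to change):
-- relax synchronous commit: flush the redo log roughly once per second instead of at every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- size the buffer pool so the working set fits in memory
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;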
Now, if you prefer to use Redis (which is not a relational store btw), you can certainly do it. There is no need to systematically serialize/deserialize your data. And filtering is indeed possible, provided you can anticipate all access paths and find an adapted data structure.
For instance:
one hash object per job. The key is _unique_key. The fields of the hash should correspond to the columns of your relational table.
one set per state value
2 sets for is_set_stopped
one set per userid value
For each job insertion, you need to pipeline the following commands:
HMSET job:AAA job_handle BBB data CCC user_id DDD server_ip EEE app_version FFF state GGG is_set_stopped HHH
SADD state:GGG AAA
SADD is_set_stopped:HHH AAA
SADD user_id:DDD AAA
You can easily update any field individually provided you maintain the corresponding sets.
You can perform filtering queries by intersecting the sets. For instance:
SINTER is_set_stopped:HHH state:GGG
With Redis, the bottleneck will likely be the network, especially if the data field is large. I hope you will have more jobs of 5 KB than jobs of 1 MB. For instance, 1000 writes/s of 1 MB objects represents 8 Gbit/s, probably more than your network can sustain. This is true for both Redis and MySQL.
I suggest PostgreSQL; it's more capable than MySQL (it has more features and better support for complex queries and data types) and has a lot of tuning options.
If you give PostgreSQL enough memory and tune the parameters right, it will cache everything in memory.
Alternatively, you could run it on tmpfs if that's your preference, and use streaming replication to an on-disk database for a hard copy.
Streaming replication has three operating modes: asynchronous, on receive, and on fsync. If you use the first one, asynchronous, you don't have to wait for a sync to disk on the replication server, so any updates will be very fast with tmpfs.
Since you also seem to have a lot of text fields, another feature might help: PostgreSQL can store a text-search vector on a row; you can add an index on it and update it via a trigger with the concatenated content of all the columns you are searching on. That will give you an incredible performance boost when doing text search across multiple columns versus anything you could write in MySQL.
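A hedged sketch of that approach, assuming the table is called jobs and reusing the _data and _app_version columns from the question (names and the choice of indexed columns are illustrative); tsvector_update_trigger is PostgreSQL's built-in helper for this:
ALTER TABLE jobs ADD COLUMN search_vec tsvector;
CREATE INDEX jobs_search_idx ON jobs USING GIN (search_vec);
CREATE TRIGGER jobs_search_vec_update
    BEFORE INSERT OR UPDATE ON jobs
    FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(search_vec, 'pg_catalog.english', _data, _app_version);
-- queries then use the index:
SELECT _unique_key FROM jobs WHERE search_vec @@ to_tsquery('english', 'foo & bar');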
Regardless of which database you use:
You state that _data is varchar(1024), yet you say it contains 5 KB to 1 MB of data. Is this actually a blob? Even if the length was a mistake, MySQL doesn't support VARCHAR fields longer than 65,535 bytes. Assuming that column isn't updated as often as the others, it might be wise to split this into two tables, one with the static data and one with the dynamic data, to minimize disk access.
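A sketch of that split, reusing the column names from the question (the table names are made up; on older MySQL with utf8, a 256-character key can exceed the 767-byte index limit, so a shorter key or a surrogate integer id may be needed):
CREATE TABLE job_info (
    _unique_key     VARCHAR(256) PRIMARY KEY,
    _job_handle     VARCHAR(256),
    _user_id        INT,
    _server_ip      VARCHAR(39),
    _app_version    VARCHAR(256),
    _state          INT,
    _is_set_stopped BOOL
) ENGINE=InnoDB;
CREATE TABLE job_data (
    _unique_key VARCHAR(256) PRIMARY KEY,
    _data       MEDIUMBLOB,   -- 5 KB to 1 MB fits comfortably in MEDIUMBLOB
    FOREIGN KEY (_unique_key) REFERENCES job_info (_unique_key)
) ENGINE=InnoDB;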

Varbinary vs Blob in MySQL

I have about 2 KB of raw binary data that I need to store in a table, but I don't know whether to choose the VARBINARY or BLOB type. I have read through the descriptions in the MySQL docs but didn't find any compare-and-contrast discussion. I also read that VARBINARY only supports up to 255 characters, but I successfully created a VARBINARY(2048) field, so I'm a bit confused.
The binary data does not need to be indexed, nor will I need to query on it. Is there an advantage to using one type over the other from PHP?
Thanks!
VARBINARY is bound to 255 bytes on MySQL 5.0.2 and below, and to 65 KB on 5.0.3 and above.
BLOB is bound to 65 KB.
Ultimately, VARBINARY is virtually the same as BLOB (from the perspective of what can be stored in it), unless you want to preserve compatibility with "old" versions of MySQL. The MySQL Documentation says:
In most respects, you can regard a BLOB column as a VARBINARY column that can be as large as you like.
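So for roughly 2 KB of data, either declaration works; a minimal sketch with hypothetical names:
CREATE TABLE raw_payload (
    id           INT PRIMARY KEY,
    payload_vb   VARBINARY(2048),  -- length-checked at 2048 bytes, stored inline with the row
    payload_blob BLOB              -- up to 65,535 bytes, no declared length needed
);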
Actually, BLOB can be bigger: there are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB (http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html), with a size limit of up to 2^32 - 1 bytes.
Also, BLOB storage grows "outside" of the row, while the maximum VARBINARY size is limited by the amount of free row size available (so it can actually be less than 64 KB).
There are some minor differences between the two:
1) Indexing: a BLOB column needs a prefix length on indexes, while VARBINARY doesn't (http:/en/column-indexes.html):
CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));
2) As already mentioned, there are trailing-space issues handled differently between VARBINARY and BLOB in MySQL 5.0.x and earlier versions: http:///en/blob.html http:///en/binary-varbinary.html
(truncating the links, since stackoverflow thinks too many links are spam)
One significant difference is that BLOB types are stored in secondary storage, while VARBINARY values are stored inline in the row, in the same way as VARCHAR and other "simple" types.
This can have an impact on performance in a busy system, where the additional lookup to fetch and manipulate the blob data can be expensive.
It is worth pointing out that the MEMORY storage engine does not support BLOB/TEXT, but it does work with VARBINARY.
I was just looking at a test app that stores around 5 KB of binary data in a column. It initially used VARBINARY, but since it was so slow I decided to try BLOB. Looking at disk write speed with atop, I can't see any difference.
The only significant difference I read in the MySQL manual is that BLOBs are unsupported by the MEMORY engine, so any temporary tables your queries create (see when MySQL uses temp tables) will be created on disk, and that is much slower.
So you're better off with VARBINARY/BINARY if the data is short enough to fit into a row (at the moment 64 KB total for all columns).

How can mysql blobs be edited?

Is there a way to edit a blob with MySQL, for example to delete bytes 39 through 48, or to insert some bytes (characters) at some position? Are there any such commands?
If your blob is extremely large, entering it into the database in the first place is very difficult.
MySQL only allows a command up to the maximum packet size (max_allowed_packet) to be sent, which makes working with large blobs difficult.
If your blobs are big enough that you care, you probably need to design a schema where they're stored as several rows of chunks, so they can be created in a sensible fashion.
BUT storing very large blobs is probably not a good idea in MySQL, as it's effectively a massive waste of your innodb_buffer_pool.
NB: By "Very large" I mean > 10M or so.
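For the byte-editing part of the question: MySQL's string functions also operate on binary values, so byte-range edits can be expressed in plain SQL. A hedged sketch with a hypothetical table t, BLOB column b, and row id 1:
-- delete bytes 39 through 48
UPDATE t SET b = CONCAT(SUBSTRING(b, 1, 38), SUBSTRING(b, 49)) WHERE id = 1;
-- insert four bytes at position 39
UPDATE t SET b = CONCAT(SUBSTRING(b, 1, 38), X'DEADBEEF', SUBSTRING(b, 39)) WHERE id = 1;
Note that the whole value still travels through the statement, so for very large blobs this runs into the same packet-size limit mentioned above.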

How to insert a file in MySQL database?

I want to insert a file into a MySQL database residing on a remote web server, using a web service.
My question is: what type of table column (e.g. varchar, etc.) will store a file? And will the INSERT statement be somewhat different in the case of a file?
Maximum file size by MySQL type:
TINYBLOB 255 bytes = 0.000255 MB
BLOB 65535 bytes = 0.0655 MB
MEDIUMBLOB 16777215 bytes = 16.78 MB
LONGBLOB 4294967295 bytes = 4294.97 MB = 4.295 GB
Yet, in most cases, I would NOT recommend storing big blobs of bytes in the database even if it supports it, because it will increase the overall database size and may cause real performance issues. You can read more on the topic here. Many databases that care about consistent performance won't even let you do such a thing. For example, AWS DynamoDB, which is known to perform extremely well at any scale, limits a single item record to 400 KB. MongoDB allows 16 MB, which is also already too much, in my opinion. MySQL allows the full 4 GB if you wish, but again, think twice before doing that. The case where you may be OK storing a big blob of data with these column types is when you have a low-traffic database and you just want to keep everything in one place for faster development, like an internal system in a small company.
The BLOB datatype is best for storing files.
See: How to store .pdf files into MySQL as BLOBs using PHP?
The MySQL BLOB reference manual has some interesting comments
The other answers will give you a good idea how to accomplish what you have asked for....
However
There are not many cases where this is a good idea. It is usually better to store only the filename in the database and the file itself on the file system.
That way your database is much smaller, can be transported around more easily and, more importantly, is quicker to back up and restore.
You need to use BLOB; there's TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. As with other types, choose one according to your size needs.
TINYBLOB 255
BLOB 65535
MEDIUMBLOB 16777215
LONGBLOB 4294967295
(in bytes)
The INSERT statement itself would be fairly normal. You need to read the file (for example with fread in PHP) and then apply addslashes to it before putting it into the statement.
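As a sketch of the server-side alternative (hypothetical table and path; LOAD_FILE needs the FILE privilege, and the file must live on the database server, inside secure_file_priv if that is set):
CREATE TABLE documents (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    name     VARCHAR(255) NOT NULL,
    contents LONGBLOB
);
INSERT INTO documents (name, contents)
VALUES ('report.pdf', LOAD_FILE('/var/lib/mysql-files/report.pdf'));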