What is the best way to store Time Series in MySQL?

I want to store a large number of time series (time vs. value) data points, and I would prefer to use MySQL for this. Right now I am planning to store each time series as a binary blob in MySQL. Is this the best way, or is there a better approach?

You should store your values as whatever type they are (int, boolean, char) and your times as either a DATE type or an INT containing a UNIX timestamp, whichever fits your application better.
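For example, a minimal sketch of such a table (table and column names are illustrative, not from the question):
CREATE TABLE measurements (
  series_id INT UNSIGNED NOT NULL,  -- which time series the point belongs to
  ts DATETIME NOT NULL,             -- or INT UNSIGNED for a UNIX timestamp
  value DOUBLE NOT NULL,
  PRIMARY KEY (series_id, ts)
);
-- range scans over a time period can then use the primary key:
SELECT ts, value FROM measurements
WHERE series_id = 42 AND ts BETWEEN '2012-01-01' AND '2012-02-01';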

If you want to process the information in any way using MySQL, you should store it using a date type and a numeric type.
The only scaling issue I see (if you only intend to store the information) is extra disk space.

As both Tom and aeon said, you should store data in its native format in MySQL if you want to do anything with it (i.e. process the data with SQL).
If, on the other hand, you don't want to work on that data but just store/retrieve it, then you are using MySQL merely as a blob container where every blob is a mess/group of multiple time/data points, and it may not be the best tool for the job: that's what files are designed for.
You could investigate a hybrid approach where you store the data unstructured but keep the timestamps as discrete values. In other words, a key/value store with your timestamps as keys. That would open up the possibility of working with NoSQL solutions, and you might find they fit better (e.g. think about running map/reduce jobs directly on the db in a Riak cluster).

Depending on your application, using the spatial data extensions of MySQL could be an option. You could then use spatial indexing to query the data fast.
To represent a time series, the LineString class might be the most appropriate choice. It represents a sequence of tuples you could use to store time and value.
The MySQL manual says that "On a world map, LineString objects could represent rivers."
Creating line strings is easy:
-- if you see a blob, use Convert( AsText(...) using 'utf8')
SELECT AsText(GeomFromText('LineString(1 1, 2 2, 3.5 3.9)'));
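For instance, a hedged sketch of a table holding one series per row as a LineString (names are illustrative; in MySQL 5.x a SPATIAL index required the MyISAM engine):
CREATE TABLE series (
  id INT NOT NULL PRIMARY KEY,
  pts LINESTRING NOT NULL,
  SPATIAL INDEX (pts)
) ENGINE=MyISAM;
INSERT INTO series VALUES (1, GeomFromText('LineString(1 1, 2 2, 3.5 3.9)'));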
Some links to get started:
https://dev.mysql.com/doc/refman/5.1/en/spatial-extensions.html
https://dev.mysql.com/doc/refman/5.1/en/populating-spatial-columns.html

Related

Storing large JSON data in Postgres is infeasible, so what are the alternatives?

I have large JSON data, greater than 2 kB, in each record of my table, and currently it is stored in a JSONB field.
My tech stack is Django and Postgres.
I don't perform any updates/modifications on this JSON data, but I do need to read it, frequently and fast. However, because the JSON data is larger than 2 kB, Postgres splits it into chunks and puts it into the TOAST table, and the read process has become very slow.
So what are the alternatives? Should I use another database like MongoDB to store these large JSON data fields?
Note: I don't want to pull the keys out from this JSON and turn them into columns. This data comes from an API.
It is hard to answer specifically without knowing the details of your situation, but here are some things you may try:
Use Postgres 12 (stored) generated columns to maintain the fields or smaller JSON blobs that are commonly needed. This adds storage overhead, but frees you from having to maintain this duplication yourself.
Create indexes for any JSON fields you are querying (Postgresql allows you to create indexes for JSON expressions).
Use a composite index, where the first field in the index is the field you are querying on, and the second field (or JSON expression) is the value you wish to retrieve. In this case Postgresql can satisfy the query from the index alone.
Similar to 1, create a materialised view which extracts the fields you need and allows you to query them quickly. You can add indexes to the materialised view too. This may be a good fit: the usual drawback of materialised views is that they can be slow to update, but in your case the data doesn't update anyway.
Investigate why the TOAST tables are slow. I'm not sure what performance you are seeing, but if you really do need to pull back a lot of data then you are going to need fast data access whatever database you choose to go with.
Your mileage may vary with all of the above suggestions, especially as each will depend on your particular use case. (see the questions in my comment)
However, the overall idea is to use the tools that Postgresql provides to make your data quickly accessible. Yes this may involve pulling the data out of its original JSON blob, but this doesn't need to be done manually. Postgresql provides some great tools for this.
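For the generated-column and index suggestions above, a minimal sketch (Postgres 12+; table, column and key names are illustrative, not from the question):
-- a stored generated column that extracts one commonly needed field
ALTER TABLE api_payloads
  ADD COLUMN customer_id text
  GENERATED ALWAYS AS (payload ->> 'customer_id') STORED;
CREATE INDEX ON api_payloads (customer_id);
-- or an expression index directly on the JSONB column
CREATE INDEX ON api_payloads ((payload ->> 'status'));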
If you just need to store and read this JSON object in full, without using the JSON structure in your WHERE clause, what about simply storing the data as binary in a bytea column? https://www.postgresql.org/docs/current/datatype-binary.html

Analyzing multiple Json with Tableau

I'm beginning to use Tableau and I have a project involving multiple website logs stored as JSON. I have one log for each day for about a month, each weighing about 500-600 MB.
Is it possible to open (and join) multiple JSON files in Tableau? If yes, how? I can load them in parallel, but not join them.
EDIT: I can load multiple JSON files and define their relationship, so this is OK. I still have the memory issue:
I am worried that by joining them all, I will not have enough memory to make it work. Are the loaded files stored in RAM or in an internal DB?
What would be the best way to do this? Should I merge all the JSON first, or load them into a database and use a connector to Tableau? If so, what would be a good choice of DB?
I'm aware some of these questions are opinion-based, but I have no clue about this and I really need some guidelines to get started.
For this volume of data, you probably want to preprocess, filter, aggregate and index it ahead of time - either using a database, something like Parquet and Spark and/or Tableau extracts.
If you use extracts, you probably want to filter and aggregate them for specific purposes. Just be aware that if you aggregate the data when you make the extract, you need to be careful that any further aggregations you perform in the visualization are well defined. Additive functions like SUM(), MIN() and MAX() are safe: sums of partial sums are still correct sums. But averages of averages, and count distincts of count distincts, often are not.
Tableau sends a query to the database and then renders a visualization based on the query result set. The volume of data returned depends on the query which depends on what you specify in Tableau. Tableau caches results, and you can also create an extract which serves as a persistent, potentially filtered and aggregated, cache. See this related stack overflow answer
For text files and extracts, Tableau loads them into memory via its Data Engine process today -- to be replaced by a new in-memory database called Hyper in the future. The concept is the same though: Tableau sends the data source a query, which returns a result set. For data of the size you are talking about, you might want to test using some sort of database if the volume exceeds what comfortably fits in memory.
The JSON driver is very convenient for exploring JSON data, and I would definitely start there. You can avoid an entire ETL step if that serves your needs. But at high volumes of data, you might need to move to some sort of external data source to handle production loads. FYI, the UNION feature with Tableau's JSON driver is not (yet) available as of version 10.1.
I think the answer nobody gave is: no, you cannot join two JSON files in Tableau. Please correct me if I'm wrong.
I believe we can join 2 JSON tables in Tableau.
First extract the column names from the JSON data, as below (this example uses Hive's get_json_object, assuming the JSON lives in a column called json_column):
select
get_json_object(json_column, '$.Attribute1') as Attribute1,
get_json_object(json_column, '$.Attribute2') as Attribute2
from table_name;
Perform the above for the required tables and join them.

Native JSON support in MySQL 5.7: what are the pros and cons of the JSON data type in MySQL?

In MySQL 5.7 a new data type for storing JSON data in MySQL tables has been added. It will obviously be a great change in MySQL. They listed some benefits:
Document Validation - Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data.
Efficient Access - More importantly, when you store a JSON document in a JSON column, it is not stored as a plain text value. Instead, it is stored in an optimized binary format that allows for quicker access to object members and array elements.
Performance - Improve your query performance by creating indexes on values within the JSON columns. This can be achieved with "functional indexes" on virtual columns.
Convenience - The additional inline syntax for JSON columns makes it very natural to integrate Document queries within your SQL. For example (features.feature is a JSON column):
SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254;
Wow! They included some great features. It is now easier to manipulate data, and it is possible to store more complex data in a column.
So MySQL is now flavored with NoSQL.
Now I can imagine a query for JSON data something like:
SELECT * FROM t1
WHERE JSON_EXTRACT(data, "$.series") IN
(
  SELECT JSON_EXTRACT(data, "$.inverted")
  FROM t1  -- example row: {"series": 3, "inverted": 8}
  WHERE JSON_EXTRACT(data, "$.inverted") < 4
);
So can I store lots of small relations in a few JSON columns? Is that good practice? Does it break normalization? If this is possible, then I guess it will act like NoSQL in a MySQL column. I really want to know more about this feature. What are the pros and cons of the MySQL JSON data type?
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Using a column inside an expression or function like this spoils any chance of the query using an index to help optimize the query. The query shown above is forced to do a table-scan.
The claim about "efficient access" is misleading. It means that after the query examines a row with a JSON document, it can extract a field without having to parse the text of the JSON syntax. But it still takes a table-scan to search for rows. In other words, the query must examine every row.
By analogy, if I'm searching a telephone book for people with first name "Bill", I still have to read every page in the phone book, even if the first names have been highlighted to make it slightly quicker to spot them.
MySQL 5.7 allows you to define a virtual column in the table, and then create an index on the virtual column.
ALTER TABLE t1
ADD COLUMN series AS (JSON_EXTRACT(data, '$.series')),
ADD INDEX (series);
Then if you query the virtual column, it can use the index and avoid the table-scan.
SELECT * FROM t1
WHERE series IN ...
This is nice, but it kind of misses the point of using JSON. The attractive part of using JSON is that it allows you to add new attributes without having to do ALTER TABLE. But it turns out you have to define an extra (virtual) column anyway, if you want to search JSON fields with the help of an index.
But you don't have to define virtual columns and indexes for every field in the JSON document—only those you want to search or sort on. There could be other attributes in the JSON that you only need to extract in the select-list like the following:
SELECT JSON_EXTRACT(data, '$.series') AS series FROM t1
WHERE <other conditions>
I would generally say that this is the best way to use JSON in MySQL. Only in the select-list.
When you reference columns in other clauses (JOIN, WHERE, GROUP BY, HAVING, ORDER BY), it's more efficient to use conventional columns, not fields within JSON documents.
I presented a talk called How to Use JSON in MySQL Wrong at the Percona Live conference in April 2018. I'll update and repeat the talk at Oracle Code One in the fall.
There are other issues with JSON. For example, in my tests it required 2-3 times as much storage space for JSON documents compared to conventional columns storing the same data.
MySQL is promoting their new JSON capabilities aggressively, largely to dissuade people from migrating to MongoDB. But document-oriented data storage like MongoDB is fundamentally a non-relational way of organizing data. It's different from relational. I'm not saying one is better than the other; it's just a different technique, suited to different types of queries.
You should choose to use JSON when JSON makes your queries more efficient.
Don't choose a technology just because it's new, or for the sake of fashion.
Edit: The virtual column implementation in MySQL is supposed to use the index if your WHERE clause uses exactly the same expression as the definition of the virtual column. That is, the following should use the index on the virtual column, since the virtual column is defined AS (JSON_EXTRACT(data,"$.series"))
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Except I have found by testing this feature that it does NOT work for some reason if the expression is a JSON-extraction function. It works for other types of expressions, just not JSON functions. UPDATE: this reportedly works, finally, in MySQL 5.7.33.
The following, from "MySQL 5.7 brings sexy back with JSON", sounds good to me:
Using the JSON Data Type in MySQL comes with two advantages over storing JSON strings in a text field:
Data validation. JSON documents will be automatically validated and invalid documents will produce an error.
Improved internal storage format. The JSON data is converted to a format that allows quick read access to the data in a structured format. The server is able to look up subobjects or nested values by key or index, allowing added flexibility and performance.
...
Specialised flavours of NoSQL stores (Document DBs, Key-value stores and Graph DBs) are probably better options for their specific use cases, but the addition of this datatype might allow you to reduce complexity of your technology stack. The price is coupling to MySQL (or compatible) databases. But that is a non-issue for many users.
Note the language about document validation, as it is an important factor. I guess a battery of tests needs to be performed to compare the two approaches:
MySQL with JSON datatypes
MySQL without
The net has only shallow slideshares on the topic of MySQL / JSON / performance as of now, from what I am seeing.
Perhaps your post can become a hub for it. Or perhaps performance is an afterthought, not sure, and you are just excited not to have to create a bunch of tables.
From my experience, the JSON implementation, at least in MySQL 5.7, is not very useful due to its poor performance.
Well, it is not so bad for reading data and validation. However, JSON modification is 10-20 times slower with MySQL than with Python or PHP.
Let's imagine a very simple JSON document:
{ "name": "value" }
Let's suppose we have to convert it to something like this:
{ "name": "value", "newName": "value" }
You can create a simple script with Python or PHP that will select all rows and update them one by one. You are not forced to make one huge transaction for it, so other applications will be able to use the table in parallel. Of course, you can also make one huge transaction if you want, so you'll get a guarantee that MySQL will perform "all or nothing", but other applications will most probably not be able to use the database during the transaction execution.
I have a 40 million row table, and the Python script updates it in 3-4 hours.
Now we have MySQL JSON, so we don't need Python or PHP anymore; we can do something like this:
UPDATE `JsonTable` SET `JsonColumn` = JSON_SET(`JsonColumn`, '$.newName', JSON_EXTRACT(`JsonColumn`, '$.name'));
It looks simple and elegant. However, it is 10-20 times slower than the Python version, and it is a single transaction, so other applications cannot modify the table data in parallel.
So, if we want to just duplicate a JSON key in a 40 million row table, we would need to avoid using the table at all for 30-40 hours. It makes no sense.
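A hedged sketch of a middle ground, assuming the table has an integer primary key id (not stated in the original): run the same JSON_SET rewrite in bounded batches, committing between them, so other clients can interleave their work:
UPDATE `JsonTable`
SET `JsonColumn` = JSON_SET(`JsonColumn`, '$.newName',
                            JSON_EXTRACT(`JsonColumn`, '$.name'))
WHERE `id` BETWEEN 1 AND 100000;
-- repeat for the next id range (100001-200000, and so on), one transaction each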
About reading data: from my experience, direct access to a JSON field via JSON_EXTRACT in WHERE is also extremely slow (much slower than TEXT with LIKE on a non-indexed column). Virtual generated columns perform much faster; however, if we know our data structure beforehand, we don't need JSON, we can use traditional columns instead. When we use JSON where it is really useful, i.e. when the data structure is unknown or changes often (for example, custom plugin settings), creating virtual columns on a regular basis for every possible new field doesn't look like a good idea.
Python and PHP handle JSON validation like a charm, so it is questionable whether we need JSON validation on the MySQL side at all. Why not also validate XML, Microsoft Office documents, or check spelling? ;)
I ran into this problem recently, and I can sum up my experiences as follows:
1. There isn't one way to solve every problem.
2. You should use JSON appropriately.
One case:
I have a table named CustomField, and it must have two columns: name and fields.
name is a localized string; its content should look like:
{
"en":"this is English name",
"zh":"this is Chinese name"
...(other languages)
}
And fields should be like this:
[
{
"field1":"value",
"field2":"value"
...
},
{
"field1":"value",
"field2":"value"
...
}
...
]
As you can see, both name and fields can be saved as JSON, and it works!
However, if I use name to search this table very frequently, what should I do? Use JSON_CONTAINS, JSON_EXTRACT...? Obviously, it's not a good idea to keep it as JSON anymore; we should save it to an independent table: CustomFieldName.
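A minimal sketch of what that independent table could look like (column names are illustrative):
CREATE TABLE CustomFieldName (
  custom_field_id INT NOT NULL,  -- references CustomField
  lang CHAR(2) NOT NULL,         -- e.g. 'en', 'zh'
  name VARCHAR(255) NOT NULL,
  PRIMARY KEY (custom_field_id, lang),
  KEY (name)                     -- supports the frequent searches by name
);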
From the above case, I think you should keep these ideas in mind:
Why does MySQL support JSON?
Why do you want to use JSON? Does your business logic really need it? Or is there something else?
Never be lazy.
Thanks
I strongly disagree with some of the things said in the other answers (which, to be fair, were written a few years ago).
We very carefully started to adopt JSON fields with a healthy skepticism. Over time we've come to use them more.
This generally describes the situation we are in:
Like 99% of applications out there, we are not doing things at a massive scale. We work with many different applications and databases, the majority of these are capable of running on modest hardware.
We have processes and know-how in place to make changes if performance does become a problem.
We have a general idea of which tables are going to be large and think carefully about how we optimize queries for them.
We also know in which cases this is not really needed.
We're pretty good at data validation and static typing at the application layer.
Lastly,
When we use JSON for storing complex data, that data is never referenced directly by other tables. We also tend never to need them in WHERE clauses on hot paths.
So with all this in mind, using a small JSON field instead of one or more tables vastly reduces the complexity of the queries and the data model. Removing this complexity makes it easier to write certain queries, makes our code simpler, and just generally saves time.
Complexity and performance is something that needs to be carefully balanced. JSON fields should not be blindly applied, but for the cases where this works it's fantastic.
'JSON fields don't perform well' is a valid reason to not use JSON fields, if you are at a place where that performance difference matters.
One specific example: we have a table where we store settings for video transcoding. The settings table has one 'profile' per row, and the settings themselves have a maximum nesting level of 4 (arrays and objects).
Despite this being a large database overall, there are only a few hundred of these records. Splitting this into 5 tables would yield no benefit and lots of pain.
This is an extreme example, but we have plenty of others (with more rows) where the decision to use JSON fields is a few years in the past and hasn't yet caused an issue.
Last point: it is now possible to directly index on JSON fields.
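As a hedged sketch of that (MySQL 8.0.13+ functional index syntax; the table and JSON path are illustrative):
CREATE TABLE t2 (
  data JSON,
  INDEX idx_series ((CAST(data->>'$.series' AS UNSIGNED)))
);
-- a query using the same expression can then use idx_series:
SELECT * FROM t2 WHERE CAST(data->>'$.series' AS UNSIGNED) = 3;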

Efficient and scalable storage for JSON data with NoSQL databases

We are working on a project which should collect journal and audit data and store it in a datastore for archive purposes and some views. We are not quite sure which datastore would work for us.
we need to store small JSON documents, about 150 bytes each, e.g. audit: {timestamp: '86346512', host: 'foo', username: 'bar', task: 'foo', result: 0} or journal: {timestamp: '86346512', host: 'foo', terminalid: 1, type: 'bar', rc: 0}
we are expecting about one million entries per day, about 150 MB of data
data will be stored and read but never modified
data should be stored in an efficient way, e.g. the binary format used by Apache Avro
after a retention time, data may be deleted
custom queries, such as 'get audit for user and time period' or 'get journal for terminalid and time period'
replicated database for failover
scalable
Currently we are evaluating NoSQL databases like Hadoop/Hbase, CouchDB, MongoDB and Cassandra. Are these databases the right datastore for us? Which of them would fit best?
Are there better options?
One million inserts/day is about 10 inserts/second. Most databases can deal with this, and it's well below the max insertion rate we get from Cassandra on reasonable hardware (50k inserts/sec).
Your requirement "after a retention time data may be deleted" fits Cassandra's column TTLs nicely - when you insert data you can specify how long to keep it for, then background merge processes will drop that data when it reaches that timeout.
"data should stored in an efficient way, e.g. binary format used by Apache Avro" - Cassandra (like many other NOSQL stores) treats values as opaque byte sequences, so you can encode you values how ever you like. You could also consider decomposing the value into a series of columns, which would allow you to do more complicated queries.
custom queries, such as 'get audit for user and time period' - in Cassandra, you would model this by having the row key to be the user id and the column key being the time of the event (most likely a timeuuid). You would then use a get_slice call (or even better CQL) to satisfy this query
or 'get journal for terminalid and time period' - as above, have the row key be the terminalid and the column key be the timestamp. One thing to note is that in Cassandra (like many join-less stores), it is typical to insert the data more than once (in different arrangements) to optimise for different queries.
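A hedged CQL sketch of the audit model just described (CQL3 syntax; column names follow the question's example):
CREATE TABLE audit (
  username text,
  ts timeuuid,
  host text,
  task text,
  result int,
  PRIMARY KEY (username, ts)  -- row key = user, clustering key = event time
);
-- 'get audit for user and time period':
SELECT * FROM audit
WHERE username = 'bar'
  AND ts > maxTimeuuid('2012-06-01') AND ts < minTimeuuid('2012-07-01');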
Cassandra has a very sophisticated replication model, where you can specify different consistency levels per operation. Cassandra is also a very scalable system with no single point of failure or bottleneck. This is really the main difference between Cassandra and things like MongoDB or HBase (not that I want to start a flame war!)
Having said all of this, your requirements could easily be satisfied by a more traditional database and simple master-slave replication; nothing here is too onerous.
Avro supports schema evolution and is a good fit for this kind of problem.
If your system does not require low-latency data loads, consider receiving the data into files in a reliable file system rather than loading it directly into a live database system. Keeping a reliable file system (such as HDFS) running is simpler and less likely to have outages than a live database system. Also, separating the responsibilities ensures that your query traffic never impacts the data-collection system.
If you will only have a handful of queries to run, you could leave the files in their native format and write custom MapReduce jobs to generate the reports you need. If you want a higher-level interface, consider running Hive over the native data files. Hive will let you run arbitrary friendly SQL-like queries over your raw data files. Or, since you only have 150 MB/day, you could just batch load it into MySQL read-only compressed tables.
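A hedged sketch of that batch-load idea (file, table and column names are illustrative; the ARCHIVE engine is one way to get compressed, insert-only storage in MySQL):
CREATE TABLE audit_archive (
  ts INT UNSIGNED NOT NULL,
  host VARCHAR(64) NOT NULL,
  username VARCHAR(64) NOT NULL,
  task VARCHAR(64) NOT NULL,
  result TINYINT NOT NULL
) ENGINE=ARCHIVE;
-- one load per day's file:
LOAD DATA INFILE '/data/audit-2012-06-01.tsv' INTO TABLE audit_archive;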
If for some reason you need the complexity of an interactive system, HBase or Cassandra might be good fits, but beware that you'll spend a significant amount of time playing "DBA", and 150 MB/day is so little data that you probably don't need the complexity.
We're using Hadoop/HBase, and I've looked at Cassandra. Both generally use the row key as the fastest means to retrieve data, although (in HBase at least) you can still apply filters on the column data, or do it client side. For example, in HBase, you can say "give me all rows starting from key1 up to, but not including, key2".
So if you design your keys properly, you could get everything for 1 user, or 1 host, or 1 user on 1 host, or things like that. But, it takes a properly designed key. If most of your queries need to be run with a timestamp, you could include that as part of the key, for example.
How often do you need to query the data or write the data? If you expect to run your reports and it's fine if it takes 10, 15, or more minutes (potentially), but you do a lot of small writes, then HBase with Hadoop doing MapReduce (or using Hive or Pig as higher-level query languages) would work very well.
If your JSON data has variable fields, then a schema-less model like Cassandra could suit your needs very well. I'd expand the data into columns rather than storing it in binary format; that will make it easier to query. At the given rate of data, it would take you 20 years to fill a 1 TB disk, so I wouldn't worry about compression.
For the example you gave, you could create two column families, Audit and Journal. The row keys would be TimeUUIDs (i.e. timestamp + MAC address to turn them into unique keys). Then the audit row you gave would have four columns, host:'foo', username:'bar', task:'foo', and result:0. Other rows could have different columns.
A range scan over the row keys would allow you to query efficiently over time periods (assuming you use ByteOrderedPartitioner). You could then use secondary indexes to query on users and terminals.

What column type should be used to store serialized data in a mysql db?

What column type should be used to store serialized data in a mysql db?
I know you can use varbinary, blob, text. What's considered the best and why?
Edit:
I understand it is not "good" to store serialized data. I need to do it in this one case though. Please just trust me on this and focus on the question if you have an answer. Thanks!
To answer: TEXT seems to be deprecated in a lot of DBMSs, so better to use either a BLOB or a VARCHAR with a high limit (and with BLOB you won't get any encoding issues, which is a major hassle with VARCHAR and TEXT).
Also, as pointed out in this thread at the MySQL forums, hard drives are cheaper than software, so you'd better first design your software and make it work, and only then, if space becomes an issue, optimize that aspect. So don't try to over-optimize the size of your column too early; better to set the size larger at first (plus this will avoid security issues).
About the various comments:
Too much SQL fanaticism here. Despite the fact that I am greatly fond of SQL and relational models, they also have their pitfalls.
Storing serialized data into the database as-is (such as storing JSON or XML formatted data) has a few advantages:
You can have a more flexible format for your data: adding and removing fields on the fly, changing the specification of the fields on the fly, etc...
Less impedance mismatch with the object model: you store and fetch the data just as it is in your program, compared to fetching the data and then having to process and convert it between your program's object structures and your relational database's structures.
And there are a lot of other advantages, so please no fanboyism: relational databases are a great tool, but let's not diss the other tools we can get. The more tools, the better.
As for a concrete example of use, I tend to add a JSON field in my database to store extra parameters of a record where the columns (properties) of the JSON data will never be SELECT'd individually, but only used when the right record is already selected. In this case, I can still discriminate my records with the relational columns, and when the right record is selected, I can just use the extra parameters for whatever purpose I want.
So my advice to retain the best of both worlds (speed, serializability, and structural flexibility): just use a few standard relational columns to serve as unique keys to discriminate between your rows, and then use a blob/varchar column where your serialized data is inserted. Usually only two or three columns are required for a unique key, so this won't be a major overhead.
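A minimal sketch of that layout (names are illustrative):
CREATE TABLE records (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT UNSIGNED NOT NULL,  -- relational keys used to find the row
  created_at DATETIME NOT NULL,
  payload BLOB NOT NULL,          -- the serialized data, opaque to SQL
  KEY (user_id, created_at)
);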
Also, you may be interested in PostgreSQL, which now has a JSON datatype, and the PostSQL project for directly processing JSON fields just like relational columns.
How much do you plan to store? Check out the specs for the string types in the MySQL docs and their sizes. The key here is that you don't care about indexing this column, but you also never want it to overflow and get truncated, since then your JSON is unreadable.
TINYTEXT: L < 2^8
TEXT: L < 2^16
MEDIUMTEXT: L < 2^24
LONGTEXT: L < 2^32
where L is the length in characters.
Plain TEXT should be enough, but go bigger if you are storing more. Though in that case, you might not want to be storing it in the db at all.
The length limits that @Twisted Pear mentions are good reasons.
Also consider that TEXT and its ilk have a charset associated with them, whereas BLOB data types do not. If you're just storing raw bytes of data, you might as well use BLOB instead of TEXT.
Note that you can still store textual data in a BLOB, you just can't do any SQL operations on it that take charset into account; it's just bytes to SQL. But that's probably not an issue in your case, since it's serialized data with structure unknown to SQL anyway. All you need to do is store bytes and fetch bytes. The interpretation of the bytes is up to your app.
I have also had trouble using LONGBLOB or LONGTEXT with certain client libraries (e.g. PHP), because the client tries to allocate a buffer as large as the largest possible data type, not knowing how large the content will be on any given row until it's fetched. This caused PHP to burst into flames as it tried to allocate a 4GB buffer. I don't know what client you're using, or whether it suffers from the same behavior.
The workaround: use MEDIUMBLOB or just BLOB, as long as those types are sufficient to store your serialized data.
On the issue of people telling you not to do this, I'm not going to tell you that (in spite of the fact that I'm an SQL advocate). It's true you can't use SQL expressions to perform operations on individual elements within the serialized data, but that's not your purpose. What you do gain by putting that data into the database includes:
Associate serialized data with other more relational data.
Ability to store and fetch serialized data according to transaction scope, COMMIT, ROLLBACK.
Store all your relational and non-relational data in one place, to make it easier to replicate to slaves, back up and restore, etc.
LONGTEXT
Wordpress stores serialized data in their postmeta table as LONGTEXT. I find the Wordpress database to be a good place to research datatypes for columns.
I might be late to the party, but the php.net documentation about serialized objects states the following:
Note that this is a binary string which may include null bytes, and needs to be stored and handled as such. For example, serialize() output should generally be stored in a BLOB field in a database, rather than a CHAR or TEXT field.
Source: http://php.net/manual/en/function.serialize.php
Hope that helps!
As of MySQL 5.7.8, MySQL supports a native JSON data type: MySQL Manual
Unless the serialized data has no other use than to be saved and restored from the database, you probably don't want to do it that way.
Typically, serialized data has several fields which should be stored in the database as separate columns. It is common for every item of serialized data to become a separate column. Some of those columns would naturally be key fields. Additional columns might plausibly be added besides the data to indicate the date and time of the insertion, the responsible user, etc.
I found varchar(5000) to be the best balance of size/speed for us. Also, it works with Rails 3 serialized data; varbinary was throwing serialization errors intermittently.