Store Miscellaneous Data in DB Table Row - mysql

Let's assume I need to store a varying amount of data in a database table. I don't want to create extra tables, because that would take more time to get the data back. The amount of data can differ from row to row.
My initial thought was to store the data in a key1=value1;key2=value2;key3=value3 format, but the problem is that a value can contain ; in its body. What is the best separator in this case? What other methods can I use to store various data in a single row?
Example row content: data=2012-05-14 20:07:45;text=This is a comment, but what if I contain a semicolon?;last_id=123456, from which PHP should then produce an array with the corresponding keys and values after exploding the row text on the separator.

First of all: you never, ever store more than one piece of information in a single field if you need to access the pieces separately or search by one of them. This has been discussed here quite a few times.
Assuming you always want to access the complete collection of information at once, I recommend using the native serialization format of your development environment: e.g. if it is PHP, use serialize().
If it is cross-platform, JSON might be the way to go: good JSON encoding/decoding libraries exist for just about every environment out there. The same is true for XML, but in this context the textual overhead of XML is going to bite a bit.
On a side note: are you sure that storing the data in additional tables is slower? You might want to benchmark that before deciding.
Edit:
After reading that you use PHP: if you don't want to put it in a table, stick with serialize() / unserialize() and a MEDIUMTEXT field. This works perfectly; I do it all the time.
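A minimal sketch of that in PHP with PDO, assuming a hypothetical user_meta table with an INT primary key id and a MEDIUMTEXT column data (the names are illustrative, not from the question):

<?php
// Hypothetical connection; adjust credentials and table/column names.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

// Arbitrary key/value data of unknown shape.
$meta = [
    'date'    => '2012-05-14 20:07:45',
    'text'    => 'This is a comment; semicolons are no problem here',
    'last_id' => 123456,
];

// Store: the whole array goes into one MEDIUMTEXT field.
$stmt = $pdo->prepare('INSERT INTO user_meta (id, data) VALUES (?, ?)
                       ON DUPLICATE KEY UPDATE data = VALUES(data)');
$stmt->execute([1, serialize($meta)]);

// Read back: unserialize() restores the original array.
$stmt = $pdo->prepare('SELECT data FROM user_meta WHERE id = ?');
$stmt->execute([1]);
$meta = unserialize($stmt->fetchColumn());

json_encode() / json_decode() slot in the same way if other languages ever need to read the data.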

EAV (cringe) is probably the best way to store arbitrary values like you want, but it sounds like you're firmly against additional tables for whatever reason. In light of that, you could just save the result of json_encode in the table. When you read it back, just json_decode to get it back into an array.
Keep in mind that if you ever need to search for anything in this field, you're going to have to use a SQL LIKE. If you never need to search this field or join it to anything, I suppose it's OK, but if you do, you've totally thrown performance out the window.
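If the extra table ever becomes acceptable, a rough PHP/PDO sketch of that EAV layout (table and column names are hypothetical) keeps individual keys searchable without resorting to LIKE:

<?php
// Hypothetical EAV table: one row per (entity, attribute) pair.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

$pdo->exec('
    CREATE TABLE IF NOT EXISTS entity_attributes (
        entity_id  INT          NOT NULL,
        attr_name  VARCHAR(64)  NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (entity_id, attr_name),
        KEY idx_attr (attr_name, attr_value(191))
    )
');

// Store arbitrary keys/values for entity 42.
$stmt = $pdo->prepare('REPLACE INTO entity_attributes (entity_id, attr_name, attr_value)
                       VALUES (?, ?, ?)');
foreach (['last_id' => '123456', 'text' => 'a comment; with a semicolon'] as $k => $v) {
    $stmt->execute([42, $k, $v]);
}

// Unlike a serialized blob, individual attributes stay searchable via the index.
$stmt = $pdo->prepare('SELECT entity_id FROM entity_attributes
                       WHERE attr_name = ? AND attr_value = ?');
$stmt->execute(['last_id', '123456']);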

You could use quotes to delimit the values:
key1='value1';key2='value2';key3='value3'
If that doesn't work for you, post your SQL example and we can see how to do it.

Related

MySQL full text search on extracted json column

I am looking for answers to a particular scenario I am dealing with using MySQL's full text search and json column features.
I am storing unstructured/schema-less data in a JSON column that I want to enable full text search on. I am sure everybody knows MySQL does not allow FTS on a JSON column, but there are ways around it, and I have kind of settled on one. I am just not sure what its implications will be in the long run.
Many people suggest copying the data that needs searching to a generated column. I cannot do that because, as I said, my data does not have any schema, and it is not practical for me to create generated columns for all possibilities; that would defeat the purpose of using JSON in the first place.
So I thought about using just one generated column that takes the whole data from the source JSON column and stores it as a JSON literal using MySQL's json_extract(). This appears to be working fine and I am satisfied with the results. I am, however, not sure about one thing: storing the JSON literal means storing redundant things besides the actual data, e.g. property keys/names. Considering how full text search works (inverted index and all), would redundant information like that have any effect on searching when the data starts to grow? Let's not worry about users trying to search for data using the JSON keys; the question is whether including these keys from every row will affect FTS performance or not.
Last thing: yes, I have done my homework. I know there are other, better ways to do full text search, but I am taking this path because of my situation, and I would need a really solid justification to look into alternative solutions.
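For reference, the setup described above is roughly the following (table and column names are made up for illustration; note that MySQL needs the generated column to be STORED rather than VIRTUAL to build a FULLTEXT index on it):

<?php
// Hypothetical `docs` table with a JSON column `doc`; the generated column
// holds the whole document as text so FULLTEXT can index it.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

$pdo->exec('ALTER TABLE docs
    ADD COLUMN doc_text LONGTEXT
        GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, "$"))) STORED');
$pdo->exec('ALTER TABLE docs ADD FULLTEXT INDEX ft_doc_text (doc_text)');

// Full text search then runs over the whole document text, keys included --
// which is exactly the redundancy the question asks about.
$stmt = $pdo->prepare('SELECT id FROM docs
                       WHERE MATCH(doc_text) AGAINST (? IN NATURAL LANGUAGE MODE)');
$stmt->execute(['search terms']);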

Native JSON support in MYSQL 5.7 : what are the pros and cons of JSON data type in MYSQL?

In MySQL 5.7 a new data type for storing JSON data in MySQL tables has been added. It will obviously be a great change in MySQL. They listed some benefits:
Document Validation - Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data.
Efficient Access - More importantly, when you store a JSON document in a JSON column, it is not stored as a plain text value. Instead, it is stored in an optimized binary format that allows for quicker access to object members and array elements.
Performance - Improve your query performance by creating indexes on values within the JSON columns. This can be achieved with "functional indexes" on virtual columns.
Convenience - The additional inline syntax for JSON columns makes it very natural to integrate Document queries within your SQL. For example (features.feature is a JSON column):
SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254;
Wow! They include some great features. Now it is easier to manipulate data, and it is possible to store more complex data in a column.
So MySQL now comes with a NoSQL flavor.
Now I can imagine a query for JSON data something like this (where each row of t1 holds a document such as {"series": 3, "inverted": 8}):
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN
(
SELECT JSON_EXTRACT(data,"$.inverted")
FROM t1
WHERE JSON_EXTRACT(data,"$.inverted") < 4
);
So can I store lots of small relations in a few JSON columns? Is that a good idea? Does it break normalization? If it is possible, then I guess it will act like NoSQL in a MySQL column. I really want to know more about this feature: the pros and cons of the MySQL JSON data type.
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Using a column inside an expression or function like this spoils any chance of the query using an index to help optimize the query. The query shown above is forced to do a table-scan.
The claim about "efficient access" is misleading. It means that after the query examines a row with a JSON document, it can extract a field without having to parse the text of the JSON syntax. But it still takes a table-scan to search for rows. In other words, the query must examine every row.
By analogy, if I'm searching a telephone book for people with first name "Bill", I still have to read every page in the phone book, even if the first names have been highlighted to make it slightly quicker to spot them.
MySQL 5.7 allows you to define a virtual column in the table, and then create an index on the virtual column.
ALTER TABLE t1
ADD COLUMN series INT AS (JSON_EXTRACT(data, '$.series')),
ADD INDEX (series);
Then if you query the virtual column, it can use the index and avoid the table-scan.
SELECT * FROM t1
WHERE series IN ...
This is nice, but it kind of misses the point of using JSON. The attractive part of using JSON is that it allows you to add new attributes without having to do ALTER TABLE. But it turns out you have to define an extra (virtual) column anyway, if you want to search JSON fields with the help of an index.
But you don't have to define virtual columns and indexes for every field in the JSON document—only those you want to search or sort on. There could be other attributes in the JSON that you only need to extract in the select-list like the following:
SELECT JSON_EXTRACT(data, '$.series') AS series FROM t1
WHERE <other conditions>
I would generally say that this is the best way to use JSON in MySQL. Only in the select-list.
When you reference columns in other clauses (JOIN, WHERE, GROUP BY, HAVING, ORDER BY), it's more efficient to use conventional columns, not fields within JSON documents.
I presented a talk called How to Use JSON in MySQL Wrong at the Percona Live conference in April 2018. I'll update and repeat the talk at Oracle Code One in the fall.
There are other issues with JSON. For example, in my tests it required 2-3 times as much storage space for JSON documents compared to conventional columns storing the same data.
MySQL is promoting their new JSON capabilities aggressively, largely to dissuade people from migrating to MongoDB. But document-oriented data storage like MongoDB is fundamentally a non-relational way of organizing data. It's different from relational. I'm not saying one is better than the other, it's just a different technique, suited to different types of queries.
You should choose to use JSON when JSON makes your queries more efficient.
Don't choose a technology just because it's new, or for the sake of fashion.
Edit: The virtual column implementation in MySQL is supposed to use the index if your WHERE clause uses exactly the same expression as the definition of the virtual column. That is, the following should use the index on the virtual column, since the virtual column is defined AS (JSON_EXTRACT(data,"$.series"))
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Except I have found by testing this feature that it does NOT work for some reason if the expression is a JSON-extraction function. It works for other types of expressions, just not JSON functions. UPDATE: this reportedly works, finally, in MySQL 5.7.33.
The following, from "MySQL 5.7 brings sexy back with JSON", sounds good to me:
Using the JSON Data Type in MySQL comes with two advantages over storing JSON strings in a text field:
Data validation. JSON documents will be automatically validated and invalid documents will produce an error.
Improved internal storage format. The JSON data is converted to a format that allows quick read access to the data in a structured format. The server is able to lookup subobjects or nested values by key or index, allowing added flexibility and performance.
...
Specialised flavours of NoSQL stores (Document DBs, Key-value stores and Graph DBs) are probably better options for their specific use cases, but the addition of this datatype might allow you to reduce complexity of your technology stack. The price is coupling to MySQL (or compatible) databases. But that is a non-issue for many users.
Note the language about document validation, as it is an important factor. I guess a battery of tests needs to be performed to compare the two approaches:
MySQL with the JSON datatype
MySQL without it
From what I am seeing, the net has only shallow slideshares so far on the topic of MySQL / JSON / performance.
Perhaps your post can become a hub for it. Or perhaps performance is an afterthought, not sure, and you are just excited not to have to create a bunch of tables.
From my experience, the JSON implementation, at least in MySQL 5.7, is not very useful due to its poor performance.
Well, it is not so bad for reading data and validation. However, JSON modification is 10-20 times slower in MySQL than with Python or PHP.
Let's imagine a very simple JSON document:
{ "name": "value" }
Let's suppose we have to convert it to something like this:
{ "name": "value", "newName": "value" }
You can create a simple script in Python or PHP that selects all rows and updates them one by one. You are not forced to wrap it in one huge transaction, so other applications can keep using the table in parallel. Of course, you can also use one huge transaction if you want, so you get the guarantee that MySQL performs "all or nothing", but then other applications will most probably not be able to use the database while the transaction runs.
I have a 40-million-row table, and the Python script updates it in 3-4 hours.
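A minimal sketch of such a script in PHP (a Python version would look much the same), assuming a hypothetical JsonTable with an integer primary key id and a JSON column JsonColumn:

<?php
// Row-by-row rewrite of the JSON without one huge transaction, so other
// clients can keep using the table while this runs. Pages by primary key
// to avoid loading the whole table at once.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$select = $pdo->prepare(
    'SELECT id, JsonColumn FROM JsonTable WHERE id > ? ORDER BY id LIMIT 1000');
$update = $pdo->prepare('UPDATE JsonTable SET JsonColumn = ? WHERE id = ?');

$lastId = 0;
do {
    $select->execute([$lastId]);
    $rows = $select->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        $lastId = $row['id'];
        $doc = json_decode($row['JsonColumn'], true);
        if (!is_array($doc) || !array_key_exists('name', $doc)) {
            continue; // nothing to copy in this document
        }
        $doc['newName'] = $doc['name'];   // duplicate the key, as in the example
        $update->execute([json_encode($doc), $row['id']]);
    }
} while (count($rows) === 1000);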
Now we have MySQL JSON, so we don't need Python or PHP anymore; we can do something like this:
UPDATE `JsonTable` SET `JsonColumn` = JSON_SET(`JsonColumn`, '$.newName', JSON_EXTRACT(`JsonColumn`, '$.name'))
It looks simple and excellent. However, it is 10-20 times slower than the Python version, and it runs as a single transaction, so other applications cannot modify the table data in parallel.
So, if we just want to duplicate a JSON key in a 40-million-row table, the table is unusable for 30-40 hours. That makes no sense.
As for reading data: in my experience, direct access to a JSON field via JSON_EXTRACT in WHERE is also extremely slow (much slower than TEXT with LIKE on a non-indexed column). Virtual generated columns perform much faster; however, if we know our data structure beforehand, we don't need JSON and can use traditional columns instead. And when we use JSON where it is really useful, i.e. when the data structure is unknown or changes often (for example, custom plugin settings), creating virtual columns on a regular basis for every possible new key doesn't look like a good idea.
Python and PHP handle JSON validation like a charm, so it is questionable whether we need JSON validation on the MySQL side at all. Why not also validate XML, Microsoft Office documents or check spelling? ;)
I ran into this problem recently, and I can sum up my experience as follows:
1. There isn't one way to solve every problem.
2. You should use JSON appropriately.
One case:
I have a table named CustomField, and it must have two columns: name and fields.
name is a localized string, and its content should look like:
{
    "en": "this is English name",
    "zh": "this is Chinese name"
    ...(other languages)
}
And fields should be like this:
[
    {
        "field1": "value",
        "field2": "value"
        ...
    },
    {
        "field1": "value",
        "field2": "value"
        ...
    }
    ...
]
As you can see, both the name and the fields can be saved as JSON, and it works!
However, if I use the name to search this table very frequently, what should I do? Use JSON_CONTAINS, JSON_EXTRACT...? Obviously, it's not a good idea to keep it as JSON anymore; we should save it to an independent table: CustomFieldName.
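For example, the independent table could be sketched like this (the column definitions are assumptions, not from the original post):

<?php
// Hypothetical sketch of pulling the searched-on name out into its own table,
// so lookups use a plain index instead of JSON_EXTRACT/JSON_CONTAINS.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

$pdo->exec('
    CREATE TABLE IF NOT EXISTS CustomFieldName (
        custom_field_id INT          NOT NULL,
        locale          VARCHAR(8)   NOT NULL,
        name            VARCHAR(255) NOT NULL,
        PRIMARY KEY (custom_field_id, locale),
        KEY idx_name (name)
    )
');

// The frequent search now hits idx_name directly.
$stmt = $pdo->prepare('SELECT custom_field_id FROM CustomFieldName WHERE name = ?');
$stmt->execute(['this is English name']);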
From the above case, I think you should keep these ideas in mind:
Why does MySQL support JSON?
Why do you want to use JSON? Does your business logic really need it, or is there something else?
Never be lazy
Thanks
I strongly disagree with some of the things said in other answers (which, to be fair, were written a few years ago).
We have very carefully started to adopt JSON fields with a healthy skepticism. Over time we've been adding this more.
This generally describes the situation we are in:
Like 99% of applications out there, we are not doing things at a massive scale. We work with many different applications and databases, the majority of these are capable of running on modest hardware.
We have processes and know-how in place to make changes if performance does become a problem.
We have a general idea of which tables are going to be large and think carefully about how we optimize queries for them.
We also know in which cases this is not really needed.
We're pretty good at data validation and static typing at the application layer.
Lastly,
When we use JSON for storing complex data, that data is never referenced directly by other tables. We also tend never to need it in WHERE clauses on hot paths.
So with all this in mind, using a small JSON field instead of one or more tables vastly reduces the complexity of our queries and data model. Removing this complexity makes it easier to write certain queries, makes our code simpler and just generally saves time.
Complexity and performance is something that needs to be carefully balanced. JSON fields should not be blindly applied, but for the cases where this works it's fantastic.
'JSON fields don't perform well' is a valid reason to not use JSON fields, if you are at a place where that performance difference matters.
One specific example is that we have a table where we store settings for video transcoding. The settings table has 1 'profile' per row, and the settings themselves have a maximum nesting level of 4 (arrays and objects).
Despite this being a large database overall, there are only a few hundred of these records in it. Splitting this into 5 tables would yield no benefit and lots of pain.
This is an extreme example, but we have plenty of others (with more rows) where the decision to use JSON fields is a few years in the past, and hasn't yet caused an issue.
Last point: it is now possible to directly index on JSON fields.
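As a rough illustration of that last point (MySQL 8.0.13+ functional key parts; the table name and JSON path here are hypothetical, so treat this as a syntax sketch and check the manual for your version):

<?php
// Index an expression over a JSON column directly, with no separate
// generated column.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

$pdo->exec('CREATE INDEX idx_codec ON transcode_profiles
    ((CAST(settings->>"$.video.codec" AS CHAR(32)) COLLATE utf8mb4_bin))');

// The WHERE clause has to use the same expression for the index to apply.
$stmt = $pdo->prepare('SELECT id FROM transcode_profiles
    WHERE CAST(settings->>"$.video.codec" AS CHAR(32)) COLLATE utf8mb4_bin = ?');
$stmt->execute(['h264']);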

how to insert multiple values in a single column in mysql

I have a table in MySQL where I have fields like name, location, description and picture.
What I want to do is store multiple picture links in the picture column.
Is there a way of doing that without creating a separate table for picture?
Thank you
Well, you need to perform some sort of serialization in order to do that. I used to do this before I moved to document-oriented databases. Quite possibly your best option is to store everything in JSON format, as it is pretty universal; I can't think of any language that cannot handle it and convert it back to an object, array, dictionary or whatever the language requires. Assuming you need to save the name of the file, as in somefile.png, what you could do is store ["image1.png","image2.png","image3.png"] and so on. If you want to store a blob, however, it's a bit more complicated. You either have to create a second table, or read the contents of each image, convert it to base64, load all the base64 strings into an object and then serialize it into JSON. I wouldn't recommend that, as each operation would cost a lot of system resources.
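A minimal PHP sketch of the file-name approach, assuming a hypothetical places table whose picture column is TEXT:

<?php
// Store several file names in the single `picture` column as a JSON array,
// and decode it back to a PHP array when reading.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

$pictures = ['image1.png', 'image2.png', 'image3.png'];

$stmt = $pdo->prepare(
    'INSERT INTO places (name, location, description, picture) VALUES (?, ?, ?, ?)');
$stmt->execute(['Cafe', 'Berlin', 'A nice cafe', json_encode($pictures)]);

$stmt = $pdo->prepare('SELECT picture FROM places WHERE name = ?');
$stmt->execute(['Cafe']);
$pictures = json_decode($stmt->fetchColumn(), true);  // back to ['image1.png', ...]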

Translate data in MySQL database into another with slightly different structure?

I did some research on this and couldn't find many introductory resources for a beginner, so I'm looking for a basic understanding of how the process works. The problem I'm trying to solve is this: I want to move data from an old database to a new one with a slightly different structure, possibly mutating the data a little bit in the process. Without going into the nitty-gritty detail, what are the general steps involved in doing this?
From what I gathered I would either be...
writing a ton of SQL queries manually (eesh)
using some complex tool that may be overkill for what I'm doing
There is a lot of data in the database, so writing INSERT queries from a SQL dump seems like a nightmare. What I was looking for is some way to write a simple program with logic like "for each row in the table "posts", take the value of the "body" attribute and put it in the "post_body" attribute of the new database", or something like that. I'm also looking for functionality like "append a 0 to the data in the user id column, then insert it in the new database" (just an example; the point is to mutate the data slightly).
In my head I can easily construct the logic of how the migration would go (definitely not rocket science here), but I'm not sure how to make this happen on a computer and iterate over the ridiculous amount of data without doing it manually. What is the general process for doing this, and what tools might a beginner want to use? Is this even a good idea for someone who has never done it before?
Edit: by request, here is an example of a mutation I'd like to perform:
Old database: table "posts" with an attribute post_body that is a VARCHAR(255).
New database: table "posts" with an attribute "body" that is a TEXT datatype.
I want to take post_body from the old one and put it in body in the new one. I realize they are different datatypes, but they are both technically strings and should be fine to convert, right? There are a bunch of manipulations like this.
Usually, the most time-consuming step of a database conversion is understanding both the old and the new structure, and establishing the correspondence of fields in each structure.
Compared to that, the time it takes to write the corresponding SQL query is ridiculously short.
for each row in the table "posts", take the value of the "body" attribute and put it in the "post_body" attribute of the new database
INSERT INTO newdb.postattribute (id, attribute, value)
SELECT postid, 'post-body', body FROM olddb.post;
In fact, the tool that allows such data manipulation is... SQL! Really, this is already a very high-level language.
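The mutations mentioned in the question fit the same pattern. A hedged sketch (every table and column name here is assumed), whether you run it from the mysql client or a one-off PHP script:

<?php
// One-off migration run from PHP; the logic still lives entirely in SQL.
$pdo = new PDO('mysql:host=localhost', 'user', 'pass');

// post_body (VARCHAR) in the old table -> body (TEXT) in the new one.
$pdo->exec('INSERT INTO newdb.posts (id, body)
            SELECT id, post_body FROM olddb.posts');

// Example mutation from the question: append a 0 to the user id on the way in.
$pdo->exec('INSERT INTO newdb.users (id, name)
            SELECT CONCAT(user_id, "0"), name FROM olddb.users');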

mysql key/value store problem

I'm trying to implement a key/value store with MySQL.
I have a user table that has 2 columns, one for the global ID and one for the serialized data.
Now the problem is that every time any bit of the user's data changes, I have to retrieve the serialized data from the DB, alter the data, then re-serialize it and throw it back into the DB. I have to repeat these steps even for a very, very small change to any of the user's data (since there's no way to update that cell within the DB itself).
Basically, I'm looking for the solutions people normally use when faced with this problem.
Maybe you should preprocess your JSON data and insert it as proper MySQL rows separated into fields.
Since your input is JSON, you have various alternatives for converting the data:
You mentioned that many small changes happen in your case. Where do they occur? Do they happen in a member of a list? In a top-level attribute?
If updates occur mainly in list members in a part of your JSON data, then perhaps every member should in fact be represented as a separate row in a different table.
If updates occur in an attribute, then represent it as a field.
I think the cost of preprocessing won't hurt in your case.
When this is a problem, people do not use key/value stores, they design a normalized relational database schema to store the data in separate, single-valued columns which can be updated.
To be honest, your solution is using a database as a glorified file system - I would not recommend this approach for application data that is core to your application.
The best way to use a relational database, in my opinion, is to store relational data - tables, columns, primary and foreign keys, data types. There are situations where this doesn't work - for instance, if your data is really a document, or when the data structures aren't known in advance. For those situations, you can either extend the relational model, or migrate to a document or object database.
In your case, I'd first see whether the serialized data could be modeled as relational data, and whether you even need a database. If so, move to a relational model. If you need a database but can't model the data as a relational set, you could go for a key/value model where you extract your serialized data into individual key/value pairs; this at least means that you can update/add individual data fields, rather than modifying the entire document. Key/value is not a natural fit for RDBMSes, but it may be a smaller jump from your current architecture.
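A rough sketch of that key/value shape (all names hypothetical); the point is that a single field can be changed without rewriting the whole serialized document:

<?php
// Hypothetical key/value layout: one row per (user, key).
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

$pdo->exec('
    CREATE TABLE IF NOT EXISTS user_data (
        user_id   INT          NOT NULL,
        data_key  VARCHAR(64)  NOT NULL,
        data_val  TEXT,
        PRIMARY KEY (user_id, data_key)
    )
');

// Change just the address, leaving every other key untouched.
$stmt = $pdo->prepare('INSERT INTO user_data (user_id, data_key, data_val)
                       VALUES (?, ?, ?)
                       ON DUPLICATE KEY UPDATE data_val = VALUES(data_val)');
$stmt->execute([42, 'address', '221B Baker Street']);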
When you have a key/value store (assuming your serialized data is JSON), it is effective only when you have memcached alongside it, because you don't update the database on the fly every time; instead you update memcache and then push that to your database in the background. So you definitely have to update the entire value, not an individual field of your JSON data (like the address alone) in the database. You can update and retrieve data quickly from memcached, and since there are no complex relations in the database, it will be fast to push and pull data between the database and memcache.
I would continue with what you are doing and create separate tables for the indexable data. This allows you to treat your database as a single data store which is managed easily through most operation groups including updates, backups, restores, clustering, etc.
The only thing you may want to consider is adding ElasticSearch to the mix if you need to perform anything like a LIKE query, just for improved search performance.
If space is not an issue for you, I would even make it an insert-only database, so any change adds a new record; that way you keep the history. Of course, you may want to remove older records, but you can have a background job that deletes the superseded records in batches. (Mind you, what I described is basically Kafka.)
There are many alternatives out there now that beat an RDBMS in terms of performance. However, they all add extra operational overhead in that they are yet another piece of middleware to maintain.
The way around that, if you have a microservices architecture, is to keep the middleware as part of your microservice stack. However, you then have to deal with transmitting the data across the microservices, so you'd still end up with a switch to something like Kafka underneath it all.