Why does PostgreSQL keep the json type when jsonb is much more efficient?

The first reason that comes to mind is backward compatibility. But maybe there are more significant reasons?

json has its uses:
it has less processing overhead when you store and retrieve the whole column, because it is stored as plain text and doesn't have to be parsed and converted to the internal binary representation.
it preserves the original formatting and attribute order.
it does not remove duplicate attributes.
In short, json is better if you don't want to process the column inside the database, that is, when the column is just used to store application data.
If you want to process the JSON inside the database, jsonb is better.
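You can see both behaviors by casting the same literal to each type; a quick sketch in psql:
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- {"b": 1, "a": 2, "a": 3}    (input text kept verbatim: spacing, key order, duplicate "a")
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- {"a": 3, "b": 1}            (normalized: whitespace collapsed, keys reordered, last duplicate wins)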

Related

Is it true that JSONB may take more disk space than JSON in PostgreSQL? And why?

A lot of PostgreSQL JSON-related articles mention that JSONB might take slightly more disk space than JSON. I didn't find this in the PostgreSQL docs. If it's true, can someone help me understand why?
In this article:
"It may take more disk space than plain JSON due to a larger table footprint, though not always." I am not sure what "larger table footprint" means.
From the manual:
The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed
Essentially, Postgres parses the JSON at load time and stores it in a way that makes it easier to query and index. This transformation likely adds some storage overhead, since it's not just storing the JSON string as it was passed in.
I would imagine that under the hood the JSON is being parsed into a decomposed binary structure for storage and indexing (the manual doesn't spell out the layout). That structure carries per-element bookkeeping, such as headers and key offsets, that is invisible to you, and that metadata overhead is likely the biggest factor in the increase in size.
Also on that page it reads:
Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
This suggests that if your data has lots of duplicate keys or a lot of superfluous whitespace, the savings will likely offset the increase in storage size when Postgres converts the JSON string to its jsonb format.
I believe the last pertinent section that may suggest differences in storage is:
When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. Therefore, there are some minor additional constraints on what constitutes valid jsonb data that do not apply to the json type, nor to JSON in the abstract, corresponding to limits on what can be represented by the underlying data type.
That hints that Postgres parses the JSON on load and converts the values to its own types. For strings in your JSON, that means a conversion to Postgres's text type, a variable-length string. By definition that takes an extra byte or more to store the length of the string. For example, a string like "hi", which would generally only need two bytes to store, ends up using three bytes: two bytes for the string and an extra byte to record that it is two characters long.
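If you want to measure this yourself, pg_column_size() reports the size of a value in bytes; a quick sketch (the exact numbers vary by PostgreSQL version and input):
SELECT pg_column_size('{"name": "hi", "tags": [1, 2, 3]}'::json)  AS json_bytes,
       pg_column_size('{"name": "hi", "tags": [1, 2, 3]}'::jsonb) AS jsonb_bytes;
-- jsonb_bytes typically comes out somewhat larger for a small document like this one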
In the end, you are trading some upfront compute and some storage in order to save compute when reading the data. If your application needs to read the data stored in this JSON quickly, use jsonb. If the requirement, on the other hand, is backup, or to store the data and query it only every now and again, then store it as json.

Storing large JSON data in Postgres is infeasible, so what are the alternatives?

I have large JSON data, greater than 2 kB, in each record of my table, and currently these are stored in a JSONB field.
My tech stack is Django and Postgres.
I don't perform any updates/modifications on this JSON data, but I do need to read it, frequently and fast. However, because the JSON data is larger than 2 kB, Postgres splits it into chunks and puts it into the TOAST table, and hence the read process has become very slow.
So what are the alternatives? Should I use another database like MongoDB to store these large JSON data fields?
Note: I don't want to pull the keys out from this JSON and turn them into columns. This data comes from an API.
It is hard to answer specifically without knowing the details of your situation, but here are some things you may try:
Use Postgres 12 (stored) generated columns to maintain the fields or smaller JSON blobs that are commonly needed. This adds storage overhead, but frees you from having to maintain this duplication yourself.
Create indexes for any JSON fields you are querying (PostgreSQL allows you to create indexes on JSON expressions); a sketch of these first two suggestions follows below.
Use a composite index, where the first field in the index is the field you are querying on and the second field (or JSON expression) is the value you wish to retrieve. In this case PostgreSQL should be able to serve the value from the index alone (an index-only scan).
Similar to 1, create a materialised view which extracts the fields you need and allows you to query them quickly. You can add indexes to the materialised view too. This may be a good solution as materialised views can be slow to update, but in your case your data doesn't update anyway.
Investigate why the toast tables are being slow. I'm not sure what performance you are seeing, but if you really do need to pull back a lot of data then you are going to need fast data access whatever database you choose to go with.
Your mileage may vary with all of the above suggestions, especially as each will depend on your particular use case. (see the questions in my comment)
However, the overall idea is to use the tools that PostgreSQL provides to make your data quickly accessible. Yes, this may involve pulling the data out of its original JSON blob, but this doesn't need to be done manually; PostgreSQL provides some great tools for this.
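For illustration, suggestions 1 and 2 could look roughly like this; the table name api_payload and the column/key names are only placeholders for whatever your schema actually uses:
-- Postgres 12+: a stored generated column maintained from the jsonb blob
ALTER TABLE api_payload
    ADD COLUMN customer_id text
    GENERATED ALWAYS AS (data ->> 'customer_id') STORED;
CREATE INDEX ON api_payload (customer_id);
-- Or index a JSON expression directly, without a separate column
CREATE INDEX ON api_payload ((data ->> 'status'));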
If you just need to store and read back this JSON object in full, without using the JSON structure in your WHERE clauses, what about simply storing the data as binary in a bytea column? https://www.postgresql.org/docs/current/datatype-binary.html
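If you go that route, the table itself can stay trivial (the names below are made up); your application serializes and parses the JSON, and Postgres just hands back bytes. Note that values over the ~2 kB threshold will still be TOASTed, so this mainly avoids the jsonb conversion overhead rather than out-of-line storage itself.
CREATE TABLE api_response_raw (
    id      bigserial PRIMARY KEY,
    payload bytea NOT NULL   -- raw JSON bytes, opaque to the database
);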

How can I store JSON in a MySQL database table field?

I currently have an application that uses a third party API whose endpoints return JSON. When a component on the front-end is mounted, I execute a function which makes a GET request to my own back-end and in return my back-end makes a GET request to the third party API, then the response from the API is returned as JSON to the front-end.
I have a limited number of allowed requests to that API, so I want to save the response to my database so that when future requests are made, my back-end returns what's in my database instead of making a whole new GET request.
I'm not sure if storing JSON is wise or possible and this is why I decided to ask. Under what data type should the JSON be saved and would there be any drawbacks to what I'm doing?
Yes, you can save JSON to a MySQL database. MySQL added the JSON data type in version 5.7.
Please refer to http://www.mysqltutorial.org/mysql-json/
Yes, this is possible. MySQL has implemented the JSON data type since version 5.7.
If you are asking about the technical details of how to work with this data type, here is an excellent shortcut.
Just quoting few examples:
CREATING:
mysql> CREATE TABLE facts (sentence JSON);
INSERTING:
mysql> INSERT INTO facts VALUES
> ('{"mascot": "Our mascot is a dolphin named \\"Sakila\\"."}');
READING:
mysql> SELECT sentence->"$.mascot" FROM facts;
But I bet the real question is how wise it is to store JSON in a database.
So the general answer is:
if the developers of a particular RDBMS included such an approach in their implementation, it is intended and desirable to use.
So, as long as it is a good idea to format your data as JSON at all, it should also be a good idea to store this data in a JSON column in an RDBMS. I do not have experience with that particular implementation (I prefer PostgreSQL to MySQL), but I started using the JSON data type as soon as I needed it and I still do not consider it a bad decision.
Especially compared to storing JSON-formatted data in files and keeping just the paths in the database, using a JSON type should be a good idea. Storing JSON-formatted data in files will almost always be slower than inserting and querying JSON, especially when you only need access to particular key/value pairs (you can query individual keys in ordinary SELECTs).
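For instance, reusing the facts table from the example above (MySQL 5.7.13+ for the ->> operator), you can pull out and filter on a single key without touching the rest of the document; the WHERE condition here is just an illustration:
SELECT sentence->>"$.mascot" AS mascot
FROM facts
WHERE sentence->>"$.mascot" LIKE '%dolphin%';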
HOWEVER, when your data is not intended to be stored as JSON at all, it will also be a bad idea to use a JSON data type. What kind of data does JSON not like? Basically, all sorts of unstructured streams where the ratio of number-of-keys to overall-size is very small. An example would be a dictionary with one key whose value is a 500 kByte long string:
{"file": "a very very very ... long string, perhaps just encoded file"}
In such a case, yes, a better approach is to store it as a regular file.
So, as always, it all depends on the particular use case :)
Yes, you can save the JSON response inside the table itself.
But it will be stored as a plain string, and parsing of the JSON string will have to be done at the code level.
You can use a TEXT/LONGTEXT column for this (MySQL's equivalent of a CLOB).
It would just be a BLOB, and there's nothing inherently wrong with that; it's a supported data type.
However, I would urge you to save the JSON to a file and keep a list of paths instead, simply because it will be quicker.

Efficient storage of Json in Django with postgres

I need to store JSON files with models in my Django app, and I'm wondering whether JSONField is my best choice or not. All I need is to store it for later; no searching or other querying is needed, other than when I eventually need to retrieve it again. I'm using PostgreSQL, if that matters.
It's important to note that the Django JSONField is a PostgreSQL-specific model field. Internally it uses a jsonb column.
The documentation of PostgreSQL says in regard to jsonb fields:
jsonb data is stored in a decomposed binary format that makes it
slightly slower to input due to added conversion overhead, but
significantly faster to process
It also enforces that you put in valid JSON.
If read/write performance is the only thing that matters to you, I would go with a normal text field. If you want to enforce that valid JSON is put into the field and stay open to future requirements (e.g. if you later want to filter this column on a certain JSON key), then you are better off using JSONField.
Also have a look at this answer.

What column type should be used to store serialized data in a mysql db?

What column type should be used to store serialized data in a mysql db?
I know you can use varbinary, blob, text. What's considered the best and why?
Edit:
I understand it is not "good" to store serialized data. I need to do it in this one case though. Please just trust me on this and focus on the question if you have an answer. Thanks!
To answer: TEXT seems to be deprecated in a lot of DBMSs, so it's better to use either a BLOB or a VARCHAR with a high limit (and with BLOB you won't get any encoding issues, which are a major hassle with VARCHAR and TEXT).
Also, as pointed out in this thread at the MySQL forums, hard drives are cheaper than software, so you'd better first design your software and make it work; only then, if space becomes an issue, should you optimize that aspect. So don't try to over-optimize the size of your column too early on; better to set the size larger at first (plus this will help avoid security issues).
About the various comments:
Too much SQL fanaticism here. Despite the fact that I am greatly fond of SQL and relational models, they also have their pitfalls.
Storing serialized data into the database as-is (such as storing JSON or XML formatted data) has a few advantages:
You can have a more flexible format for your data: adding and removing fields on the fly, changing the specification of the fields on the fly, etc...
Less impedance mismatch with the object model: you store and you fetch the data just as it is in your program, compared to fetching the data and then having to process and convert it between your program objects' structures and your relational database's structures.
And there are plenty of other advantages, so please, no fanboyism: relational databases are a great tool, but let's not dismiss the other tools available. The more tools, the better.
As for a concrete example of use, I tend to add a JSON field in my database to store extra parameters of a record where the columns (properties) of the JSON data will never be SELECT'd individually, but only used when the right record is already selected. In this case, I can still discriminate my records with the relational columns, and when the right record is selected, I can just use the extra parameters for whatever purpose I want.
So my advice, to retain the best of both worlds (speed, serializability, and structural flexibility): just use a few standard relational columns to serve as unique keys to discriminate between your rows, and then use a blob/varchar column where your serialized data will be inserted. Usually only two or three columns are required for a unique key, so this won't be a major overhead.
Also, you may be interested in PostgreSQL, which now has a JSON data type, and the PostSQL project to process JSON fields directly, just like relational columns.
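As a rough sketch of that hybrid layout (every name below is invented): a couple of plain columns act as the unique key, and the serialized payload goes into a blob column that the database never inspects.
CREATE TABLE user_settings (
    user_id    BIGINT UNSIGNED NOT NULL,
    section    VARCHAR(64)     NOT NULL,        -- relational key columns used to find the row
    payload    MEDIUMBLOB      NOT NULL,        -- serialized data, opaque to SQL
    updated_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, section)
);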
How much do you plan to store? Check out the specs for the string types in the MySQL docs and their sizes. The key here is that you don't care about indexing this column, but you also never want it to overflow and get truncated, since then your JSON is unreadable.
TINYTEXT    L < 2^8    (up to 255 bytes)
TEXT        L < 2^16   (up to 64 KB)
MEDIUMTEXT  L < 2^24   (up to 16 MB)
LONGTEXT    L < 2^32   (up to 4 GB)
where L is the length of the stored value in bytes.
Plain TEXT should usually be enough, but go bigger if you are storing more. Though in that case, you might not want to be storing it in the DB at all.
The length limits that @Twisted Pear mentions are good reasons.
Also consider that TEXT and its ilk have a charset associated with them, whereas BLOB data types do not. If you're just storing raw bytes of data, you might as well use BLOB instead of TEXT.
Note that you can still store textual data in a BLOB, you just can't do any SQL operations on it that take charset into account; it's just bytes to SQL. But that's probably not an issue in your case, since it's serialized data with structure unknown to SQL anyway. All you need to do is store bytes and fetch bytes. The interpretation of the bytes is up to your app.
I have also had trouble using LONGBLOB or LONGTEXT with certain client libraries (e.g. PHP), because the client tries to allocate a buffer as large as the largest possible value of the data type, not knowing how large the content will be on any given row until it's fetched. This caused PHP to burst into flames as it tried to allocate a 4 GB buffer. I don't know what client you're using, or whether it suffers from the same behavior.
The workaround: use MEDIUMBLOB or just BLOB, as long as those types are sufficient to store your serialized data.
On the issue of people telling you not to do this, I'm not going to tell you that (in spite of the fact that I'm an SQL advocate). It's true you can't use SQL expressions to perform operations on individual elements within the serialized data, but that's not your purpose. What you do gain by putting that data into the database includes:
Associate serialized data with other more relational data.
Ability to store and fetch serialized data according to transaction scope, COMMIT, ROLLBACK.
Store all your relational and non-relational data in one place, to make it easier to replicate to slaves, back up and restore, etc.
LONGTEXT
WordPress stores serialized data in its postmeta table as LONGTEXT. I find the WordPress database to be a good place to research data types for columns.
I might be late to the party, but the php.net documentation about serialize() states the following:
Note that this is a binary string which may include null bytes, and
needs to be stored and handled as such. For example, serialize()
output should generally be stored in a BLOB field in a database,
rather than a CHAR or TEXT field.
Source: http://php.net/manual/en/function.serialize.php
Hope that helps!
As of MySQL 5.7.8, MySQL supports a native JSON data type: MySQL Manual
Unless the serialized data has no other use than to be saved and restored from the database, you probably don't want to do it that way.
Typically, serialized data has several fields which should be stored in the database as separate columns. It is common for every item of serialized data to be a separate column. Some of those columns would naturally be key fields. Additional columns might plausibly be added alongside the data to indicate the date and time of the insertion, the responsible user, etc.
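As a sketch of what that usually looks like (all names here are invented): each item of the serialized data becomes its own column, with key and audit columns alongside.
CREATE TABLE customer_profile (
    customer_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,           -- key field
    first_name  VARCHAR(100),                                   -- former items of the serialized blob
    last_name   VARCHAR(100),
    newsletter  TINYINT(1),
    created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,   -- when the insertion occurred
    created_by  VARCHAR(64)                                     -- responsible user
);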
I found:
varchar(5000)
to be the best balance of size/speed for us. It also works with Rails 3 serialized data; storing it as varbinary was throwing serialization errors intermittently.