Cassandra and JSON


Alright, so I'm taking in massive amounts of data in what is supposed to be JSON format, and I'm trying to insert it into a Cassandra cluster. The problem is that the data doesn't have a standard key:value format, so I believe it's not actually JSON.
Here's an example of the data:
'{"15151162":"6f0aa7ebc60af9b6dd5992341e155138b3ea369a","15149182":"c141929a6ccc6157f4de7055ea565e7a83f59aea","15144225":"f70a2cdecee0e7e9fe85819e74d0e09d36060909"}'
So, keeping that in mind (and I know this is somewhat opinion-based): do I have to pull the data apart and then mass insert it, or is there a better way where I could just map the pairs to columns using some feature of CQL/Cassandra?
Also, as additional information, we're talking around 28 million records, so ideally I'd like to do it using CQL/Cassandra instead of reorganizing the objects in a programming language.
I am familiar with Java, C++ and SQL, and fairly new to NoSQL/hybrid NoSQL.
Thanks

If you don't have a key, create one as you ingest the data by generating a GUID, and format the record like so:
{
  "key": "3fa55ea6-de8b-4b6f-b11e-5a3701982c65",
  "type": "weird data",
  "data": {
    "15144225": "f70a2cdecee0e7e9fe85819e74d0e09d36060909",
    "15149182": "c141929a6ccc6157f4de7055ea565e7a83f59aea",
    "15151162": "6f0aa7ebc60af9b6dd5992341e155138b3ea369a"
  }
}
Adding a type field is very helpful for when the next programmer actually has to deserialize this data. A version field is probably a good idea too.
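For what it's worth, the sample string from the question does parse as JSON (it is a single object whose keys happen to be numeric strings), so the wrapping step can be sketched with nothing but the standard library. This is only a rough sketch in Python: `wrap_record` and `to_rows` are made-up helper names, the `version` value is an assumption, and the actual write to Cassandra is deliberately left out.

```python
import json
import uuid

# Sample payload copied from the question.
RAW = ('{"15151162":"6f0aa7ebc60af9b6dd5992341e155138b3ea369a",'
       '"15149182":"c141929a6ccc6157f4de7055ea565e7a83f59aea",'
       '"15144225":"f70a2cdecee0e7e9fe85819e74d0e09d36060909"}')


def wrap_record(raw, record_type="weird data", version=1):
    """Parse one raw JSON object and wrap it in an envelope with a generated key."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the input is not valid JSON
    return {
        "key": str(uuid.uuid4()),  # generated GUID, as suggested above
        "type": record_type,
        "version": version,        # hypothetical version field
        "data": data,
    }


def to_rows(record):
    """Flatten an envelope into (record_key, id, hash) tuples, one per inner pair."""
    return [(record["key"], int(k), v) for k, v in record["data"].items()]


record = wrap_record(RAW)
rows = to_rows(record)
print(json.dumps(record, indent=2))
print(len(rows), "rows ready for insertion")
```

Each tuple in `rows` could then be bound to whatever insert statement the target table uses; whether one row per pair or one row per envelope is better depends on how the data will be queried.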

Related

How can I store JSON in a MySQL database table field?

I currently have an application that uses a third party API whose endpoints return JSON. When a component on the front-end is mounted, I execute a function which makes a GET request to my own back-end and in return my back-end makes a GET request to the third party API, then the response from the API is returned as JSON to the front-end.
I have a limited amount of allowed requests to that API so I want to make sure to save the response to my database so that when a future requests are made, my back-end would return what's in my database instead of making a whole new GET request.
I'm not sure if storing JSON is wise or possible and this is why I decided to ask. Under what data type should the JSON be saved and would there be any drawbacks to what I'm doing?

Yes, you can save JSON to a MySQL database. MySQL added the JSON data type in version 5.7.
Please refer to http://www.mysqltutorial.org/mysql-json/

Yes, this is possible. MySQL has implemented a JSON data type since version 5.7.
If you are asking about the technical details of how to work with this data type, here is an excellent shortcut.
Just quoting a few examples:
CREATING:
mysql> CREATE TABLE facts (sentence JSON);
INSERTING:
mysql> INSERT INTO facts VALUES
> ('{"mascot": "Our mascot is a dolphin named \\"Sakila\\"."}');
READING:
mysql> SELECT sentence->"$.mascot" FROM facts;
But I bet the real question is how wise it is to store JSON in a database.
So the general answer is:
if the developers of a particular RDBMS included such an approach in their implementation, it is intended and desired for use.
So, as long as it is a good idea to format your data as JSON at all, it should also be a good idea to store that data in a JSON column in an RDBMS. I do not have experience with that particular implementation (I prefer PostgreSQL to MySQL), but I started using the JSON data type as soon as I needed it, and I still do not consider it a bad decision.
Especially when you are considering storing JSON-formatted data in files and keeping just the paths in the database, using a JSON type instead should be a good idea. Storing JSON-formatted data in files will almost always be slower than inserting and querying JSON, especially when you need access to only particular key-value pairs (you can query just particular keys in ordinary SELECTs).
HOWEVER, when your data is not intended to be stored in JSON format AT ALL, it will also be a bad idea to use a JSON data type. What kind of data does JSON not like? Basically, all sorts of unstructured streams, where the ratio of the number of keys to the overall size is very small. An example would be a dictionary with one key and a value holding a 500 kB string:
{"file": "a very very very ... long string, perhaps just encoded file"}
In such a case, yes, a better approach is to store it as a regular file.
So as always, it all depends on the particular use case :)
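For the caching use case in the question, the overall flow might look roughly like the sketch below. Big caveat: it uses Python's built-in `sqlite3` with a plain TEXT column purely as a stand-in so the example runs on its own; the table name `api_cache`, the `max_age` parameter and the `fake_fetch` function are all invented for illustration, and with MySQL the `body` column would presumably be declared as JSON.

```python
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # stand-in for the real database connection
conn.execute(
    "CREATE TABLE api_cache ("
    " url TEXT PRIMARY KEY,"
    " body TEXT NOT NULL,"        # serialized JSON response
    " fetched_at REAL NOT NULL)"  # UNIX timestamp of the last fetch
)


def get_cached(url, fetch, max_age=3600.0):
    """Return the stored response for url, calling fetch(url) only when needed."""
    row = conn.execute(
        "SELECT body, fetched_at FROM api_cache WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and time.time() - row[1] < max_age:
        return json.loads(row[0])  # cache hit: no third-party request is made
    payload = fetch(url)           # cache miss or stale entry: spend one API request
    conn.execute(
        "INSERT OR REPLACE INTO api_cache (url, body, fetched_at) VALUES (?, ?, ?)",
        (url, json.dumps(payload), time.time()),
    )
    conn.commit()
    return payload


calls = []


def fake_fetch(url):
    """Hypothetical stand-in for the GET request to the third-party API."""
    calls.append(url)
    return {"mascot": "Sakila"}


first = get_cached("https://api.example.com/facts", fake_fetch)
second = get_cached("https://api.example.com/facts", fake_fetch)
print(first == second, len(calls))
```

The second call is served from the table, so only one request counts against the API quota until the entry is older than `max_age`.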

Yes, you can save the JSON response in the table itself.
But it will be stored as a plain string, and the JSON string will have to be parsed at the code level.
You can use a CLOB for this.

It would just be a BLOB, and there's nothing inherently wrong with that; it's a supported data type.
However, I would urge you to save the JSON to a file and keep a list of paths instead, simply because it will be quicker.

Native JSON support in MYSQL 5.7 : what are the pros and cons of JSON data type in MYSQL?

In MySQL 5.7, a new data type for storing JSON data in MySQL tables has been added. It will obviously be a great change in MySQL. They listed some benefits:
Document Validation - Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data.
Efficient Access - More importantly, when you store a JSON document in a JSON column, it is not stored as a plain text value. Instead, it is stored in an optimized binary format that allows for quicker access to object members and array elements.
Performance - Improve your query performance by creating indexes on values within the JSON columns. This can be achieved with “functional indexes” on virtual columns.
Convenience - The additional inline syntax for JSON columns makes it very natural to integrate Document queries within your SQL. For example (features.feature is a JSON column):
SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254;
Wow! They include some great features. Now it is easier to manipulate data, and it is possible to store more complex data in a column.
So MySQL is now flavored with NoSQL.
Now I can imagine a query for JSON data, something like:
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN
(
SELECT JSON_EXTRACT(data,"$.inverted")
FROM t1 -- e.g. a row like {"series": 3, "inverted": 8}
WHERE JSON_EXTRACT(data,"$.inverted")<4 );
So can I store a huge number of small relations in a few JSON columns? Is that good? Does it break normalization? If this is possible, then I guess it will act like NoSQL in a MySQL column. I really want to know more about this feature: the pros and cons of the MySQL JSON data type.

SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Using a column inside an expression or function like this spoils any chance of the query using an index to help optimize the query. The query shown above is forced to do a table-scan.
The claim about "efficient access" is misleading. It means that after the query examines a row with a JSON document, it can extract a field without having to parse the text of the JSON syntax. But it still takes a table-scan to search for rows. In other words, the query must examine every row.
By analogy, if I'm searching a telephone book for people with first name "Bill", I still have to read every page in the phone book, even if the first names have been highlighted to make it slightly quicker to spot them.
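To make the phone book point concrete, here is a toy sketch in plain Python (not MySQL internals, just the idea). The three sample rows and the helper names `scan` and `build_index` are invented for illustration.

```python
import json
from collections import defaultdict

# Toy "table": (row id, JSON document stored as text).
rows = [
    (1, '{"series": 3, "inverted": 8}'),
    (2, '{"series": 5, "inverted": 2}'),
    (3, '{"series": 3, "inverted": 1}'),
]


def scan(rows, wanted):
    """Table-scan: every document is examined, even the ones that do not match."""
    examined = 0
    hits = []
    for row_id, doc in rows:
        examined += 1
        if json.loads(doc)["series"] == wanted:
            hits.append(row_id)
    return hits, examined


def build_index(rows):
    """Extract the field once, up front, into a lookup structure keyed by its value."""
    index = defaultdict(list)
    for row_id, doc in rows:
        index[json.loads(doc)["series"]].append(row_id)
    return index


hits, examined = scan(rows, 3)
index = build_index(rows)
print(hits, examined)  # the scan had to look at every row
print(index[3])        # the index answers the same question with one lookup
```

A faster way to read a field inside each document only speeds up the body of the loop in `scan`; it does nothing about the number of rows the loop has to visit.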
MySQL 5.7 allows you to define a virtual column in the table, and then create an index on the virtual column.
ALTER TABLE t1
ADD COLUMN series INT AS (JSON_EXTRACT(data, '$.series')), -- a generated column needs a data type; INT is assumed here
ADD INDEX (series);
Then if you query the virtual column, it can use the index and avoid the table-scan.
SELECT * FROM t1
WHERE series IN ...
This is nice, but it kind of misses the point of using JSON. The attractive part of using JSON is that it allows you to add new attributes without having to do ALTER TABLE. But it turns out you have to define an extra (virtual) column anyway, if you want to search JSON fields with the help of an index.
But you don't have to define virtual columns and indexes for every field in the JSON document—only those you want to search or sort on. There could be other attributes in the JSON that you only need to extract in the select-list like the following:
SELECT JSON_EXTRACT(data, '$.series') AS series FROM t1
WHERE <other conditions>
I would generally say that this is the best way to use JSON in MySQL. Only in the select-list.
When you reference columns in other clauses (JOIN, WHERE, GROUP BY, HAVING, ORDER BY), it's more efficient to use conventional columns, not fields within JSON documents.
I presented a talk called How to Use JSON in MySQL Wrong at the Percona Live conference in April 2018. I'll update and repeat the talk at Oracle Code One in the fall.
There are other issues with JSON. For example, in my tests it required 2-3 times as much storage space for JSON documents compared to conventional columns storing the same data.
MySQL is promoting their new JSON capabilities aggressively, largely to dissuade people from migrating to MongoDB. But document-oriented data storage like MongoDB is fundamentally a non-relational way of organizing data. It's different from relational. I'm not saying one is better than the other, it's just a different technique, suited to different types of queries.
You should choose to use JSON when JSON makes your queries more efficient.
Don't choose a technology just because it's new, or for the sake of fashion.
Edit: The virtual column implementation in MySQL is supposed to use the index if your WHERE clause uses exactly the same expression as the definition of the virtual column. That is, the following should use the index on the virtual column, since the virtual column is defined AS (JSON_EXTRACT(data,"$.series"))
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Except I have found by testing this feature that it does NOT work for some reason if the expression is a JSON-extraction function. It works for other types of expressions, just not JSON functions. UPDATE: this reportedly works, finally, in MySQL 5.7.33.

The following from "MySQL 5.7 brings sexy back with JSON" sounds good to me:
Using the JSON Data Type in MySQL comes with two advantages over storing JSON strings in a text field:
Data validation. JSON documents will be automatically validated and invalid documents will produce an error.
Improved internal storage format. The JSON data is converted to a format that allows quick read access to the data in a structured format. The server is able to lookup subobjects or nested values by key or index, allowing added flexibility and performance.
...
Specialised flavours of NoSQL stores (Document DBs, Key-value stores and Graph DBs) are probably better options for their specific use cases, but the addition of this datatype might allow you to reduce complexity of your technology stack. The price is coupling to MySQL (or compatible) databases. But that is a non-issue for many users.
Note the language about document validation, as it is an important factor. I guess a battery of tests needs to be performed to compare the two approaches. Those two being:
MySQL with JSON data types
MySQL without
From what I am seeing, the net has only shallow SlideShare decks on the topic of MySQL / JSON / performance as of now.
Perhaps your post can be a hub for it. Or perhaps performance is an afterthought (not sure) and you are just excited not to have to create a bunch of tables.

From my experience, the JSON implementation, at least in MySQL 5.7, is not very useful due to its poor performance.
Well, it is not so bad for reading data and validation. However, JSON modification is 10-20 times slower with MySQL than with Python or PHP.
Let's imagine a very simple JSON document:
{ "name": "value" }
Let's suppose we have to convert it to something like this:
{ "name": "value", "newName": "value" }
You can create a simple script in Python or PHP that selects all rows and updates them one by one. You are not forced to wrap it in one huge transaction, so other applications can use the table in parallel. Of course, you can also use one huge transaction if you want, so you get the guarantee that MySQL performs "all or nothing", but other applications will most probably not be able to use the database while the transaction is executing.
I have a table with 40 million rows, and a Python script updates it in 3-4 hours.
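A script of that kind could look roughly like the sketch below. It is not the script I actually ran: it uses Python's built-in `sqlite3` as a stand-in for the MySQL connection so that it runs on its own, and the batch size, the ten sample rows and the `id` column are assumptions made for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute("CREATE TABLE JsonTable (id INTEGER PRIMARY KEY, JsonColumn TEXT)")
conn.executemany(
    "INSERT INTO JsonTable (JsonColumn) VALUES (?)",
    [(json.dumps({"name": f"value{i}"}),) for i in range(10)],
)
conn.commit()

BATCH = 4    # rows per transaction; would be far larger in practice
last_id = 0  # keyset pagination: continue after the last processed id

while True:
    batch = conn.execute(
        "SELECT id, JsonColumn FROM JsonTable WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH),
    ).fetchall()
    if not batch:
        break
    for row_id, raw in batch:
        doc = json.loads(raw)
        doc["newName"] = doc["name"]  # duplicate the key on the application side
        conn.execute(
            "UPDATE JsonTable SET JsonColumn = ? WHERE id = ?",
            (json.dumps(doc), row_id),
        )
    conn.commit()  # short transactions, so other clients are not blocked for long
    last_id = batch[-1][0]

docs = [
    json.loads(r[0])
    for r in conn.execute("SELECT JsonColumn FROM JsonTable ORDER BY id")
]
print(docs[0])
```

Committing per batch trades the "all or nothing" guarantee for availability, which is exactly the choice described above.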
Now we have MySQL JSON, so we don't need Python or PHP anymore; we can do something like this:
UPDATE `JsonTable` SET `JsonColumn` = JSON_SET(`JsonColumn`, "$.newName", JSON_EXTRACT(`JsonColumn`, "$.name"))
It looks simple and excellent. However, it is 10-20 times slower than the Python version, and it is a single transaction, so other applications cannot modify the table data in parallel.
So, if we just want to duplicate a JSON key in a table with 40 million rows, we cannot use the table at all for 30-40 hours. That makes no sense.
As for reading data, in my experience direct access to a JSON field via JSON_EXTRACT in WHERE is also extremely slow (much slower than TEXT with LIKE on a non-indexed column). Virtual generated columns perform much faster; however, if we know our data structure beforehand, we don't need JSON and can use traditional columns instead. When we use JSON where it is really useful, i.e. when the data structure is unknown or changes often (for example, custom plugin settings), regularly creating virtual columns for every possible new field doesn't look like a good idea.
Python and PHP validate JSON like a charm, so it is questionable whether we need JSON validation on the MySQL side at all. Why not also validate XML, Microsoft Office documents or check spelling? ;)

I ran into this problem recently, and I summed up the following lessons:
1. There isn't one way to solve every problem.
2. You should use JSON properly.
One case:
I have a table named CustomField, and it must have two columns: name and fields.
name is a localized string; its content should look like:
{
"en":"this is English name",
"zh":"this is Chinese name"
...(other languages)
}
And fields should look like this:
[
{
"field1":"value",
"field2":"value"
...
},
{
"field1":"value",
"field2":"value"
...
}
...
]
As you can see, both name and fields can be saved as JSON, and it works!
However, if I search this table by name very frequently, what should I do? Use JSON_CONTAINS, JSON_EXTRACT...? Obviously, it's no longer a good idea to save it as JSON; we should save it in a separate table: CustomFieldName.
From the above case, I think you should keep these ideas in mind:
Why does MySQL support JSON?
Why do you want to use JSON? Does your business logic really need it, or is there some other reason?
Never be lazy.
Thanks

Strongly disagree with some of the things said in other answers (which, to be fair, were written a few years ago).
We started adopting JSON fields very carefully, with a healthy skepticism. Over time we've been using them more.
This generally describes the situation we are in:
Like 99% of applications out there, we are not doing things at a massive scale. We work with many different applications and databases, the majority of these are capable of running on modest hardware.
We have processes and know-how in place to make changes if performance does become a problem.
We have a general idea of which tables are going to be large and think carefully about how we optimize queries for them.
We also know in which cases this is not really needed.
We're pretty good at data validation and static typing at the application layer.
Lastly,
When we use JSON for storing complex data, that data is never referenced directly by other tables. We also tend to never need to use them in where clauses in hot paths.
So with all this in mind, using a little JSON field instead of 1 or more tables vastly reduces the complexity of queries and data model. Removing this complexity makes it easier to write certain queries, makes our code simpler and just generally saves time.
Complexity and performance need to be carefully balanced. JSON fields should not be applied blindly, but for the cases where they work, they're fantastic.
'JSON fields don't perform well' is a valid reason to not use JSON fields, if you are at a place where that performance difference matters.
One specific example is that we have a table where we store settings for video transcoding. The settings table has 1 'profile' per row, and the settings themselves have a maximum nesting level of 4 (arrays and objects).
Despite this being a large database overall, there are only a few hundred of these records in the database. Splitting this into 5 tables would yield no benefit and lots of pain.
This is an extreme example, but we have plenty of others (with more rows) where the decision to use JSON fields is a few years in the past, and hasn't yet caused an issue.
Last point: it is now possible to directly index on JSON fields.

Is JSON a good solution for data transfer between client and server?

I am trying to understand why JSON is so widely used for data transfer between client and server. I understand that it offers a simple design which is easy to understand. However, on the other hand:
A JSON string includes repeated data, e.g. in the case of a table, the column names (keys) are repeated in each object. Would it not be wiser to send the columns as the first object, with the remaining objects carrying only the data (without column/key information) from the table?
Once we have a JSON object, searching by key is expensive (in time) compared to using indexes. Imagine a table with 20-30 columns: looking up each key for each object would cost a lot more time than using indexes directly.
There may be many more drawbacks and advantages; add them here if you know any.

I think if you want data transfer, then you want a table-based format. JSON is not a table-based format like standard databases or Excel. This can complicate analyzing data if there is a problem, because someone will usually use Excel for that (sorting, filtering, formulas). Building test files will also be more difficult, because you can't simply export from Excel to JSON.
But if you wanted to use JSON for data transfer, you could basically build a JSON version of a CSV file. You would only use arrays.
Columns: ["First_Name", "Last_Name"]
Rows: [
["Joe", "Master"],
["Alice", "Gooberg"]
.... etc
]
Seems messy to me though.
If you wanted to use objects, you would have to embed the column names in every bit of data, which in my opinion indicates a wrong approach.
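Converting between the two shapes is mechanical. Below is a small Python sketch of the round trip; the `Columns` and `Rows` keys follow the layout above, while the function names `to_table` and `from_table` are made up, and the sketch assumes every object has the same keys as the first one.

```python
import json


def to_table(objects):
    """Turn a list of same-shaped objects into a Columns/Rows layout."""
    columns = list(objects[0].keys()) if objects else []
    rows = [[obj[c] for c in columns] for obj in objects]
    return {"Columns": columns, "Rows": rows}


def from_table(table):
    """Rebuild the list of objects from a Columns/Rows layout."""
    return [dict(zip(table["Columns"], row)) for row in table["Rows"]]


people = [
    {"First_Name": "Joe", "Last_Name": "Master"},
    {"First_Name": "Alice", "Last_Name": "Gooberg"},
]

table = to_table(people)
print(json.dumps(table))
print(from_table(table) == people)
```

The column names are sent once instead of once per row, so the saving grows with the number of rows; the cost is that each row is meaningless without the header.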

Storing multi-language data in JSON (MySQL)

I'm currently designing a multilanguage website's database.
The website must be able to store its data in an undefined number of languages (3 languages at the moment, possibly more in the future).
My question is:
Is there any downside to storing website's field values in JSON strings in MySQL?
My usual approach to this problem was to have an extra table (var_translations), where I would store each translation in a new row:
languages - vars - var_translations
I'm thinking I could store all translations for a "var" in a single row of the "vars" table, using a TEXT field holding a JSON string that contains the values, which I could later work with in PHP:
{
"name":{
"EN": "Name",
"ES": "Nombre",
"FR": "Nom"
}
}
I'm not sure if there's anything wrong with this way of storing data, but I like how it supports multiple languages and it keeps the database cleaner and clearer.
Is there anything I should worry about this approach before I start implementing it?

It's a perfect use of JSON in a database field.
We rolled this strategy out very successfully.
Otherwise you end up contorting your logical data structure to capture languages, and you end up with a shed load of counter tables. The queries and performance become terrible too, especially if a row is not present and you want to fall back to a default language where the desired one is not available.
We tended to name the multilingual fields with Multilingual as a suffix:
nameMultilingual {"en-gb":"Name", ..., "default":"en-gb"}
hobbiesMultilingual {...}
Use locale codes though.
You can then keep a locale in your application session, and the business layer can take care of pulling the value from the JSON using the session locale, so it's easy to use from the presentation layer.
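The business-layer lookup with a default fallback can be sketched in a few lines. This is Python rather than the PHP from the question, and the function name `localized`, the sample value and the exact locale codes are assumptions made for the example; only the `default` key mirrors the field layout described above.

```python
import json


def localized(multilingual_json, locale):
    """Return the value for locale, falling back to the entry named by "default"."""
    values = json.loads(multilingual_json)
    default_locale = values.get("default")
    if locale != "default" and locale in values:
        return values[locale]          # exact match for the session locale
    if default_locale in values:
        return values[default_locale]  # fall back to the declared default language
    raise KeyError(f"no translation for {locale!r} and no usable default")


# Hypothetical content of a nameMultilingual field.
name_multilingual = (
    '{"en-gb": "Name", "es-es": "Nombre", "fr-fr": "Nom", "default": "en-gb"}'
)

print(localized(name_multilingual, "es-es"))
print(localized(name_multilingual, "de-de"))
```

Because the fallback lives next to the translations, the presentation layer never has to know which languages exist for a given field.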

Of course there's a downside: you can't easily run queries, for example to get the elements which don't have a translation in French.
The point of relational databases is to structure data. Don't depart from that logic if you don't have important reasons.
In that precise case, as you saw, it's easy to have a table i18n_name holding the translations.
If (and only if) it's confirmed that you should store raw JSON, then you might want to look at a DBMS with good support for it, most notably PostgreSQL.

Storing JSON in an msSQL database?

I'm developing a form generator, and wondering if it would be bad mojo to store JSON in an SQL database?
I want to keep my database & tables simple, so I was going to have
`pKey, formTitle, formJSON`
on a table, and then store
{"firstName":{"required":"true","type":"text"},"lastName":{"required":"true","type":"text"}}
in formJSON.
Any input is appreciated.

I use JSON extensively in my CMS (which hosts about 110 sites) and I find data access to be very fast. I was surprised that there wasn't more speed degradation. Every object in the CMS (Page, Layout, List, Topic, etc.) has an NVARCHAR(MAX) column called JSONConfiguration. My ORM tool knows to look for that column and reconstitute it as an object if needed. Or, depending on the situation, I will just pass it to the client for jQuery or Ext JS to process.
As for readability / maintainability of my code, you might say it's improved because I now have classes that represent a lot of the JSON objects stored in the DB.
I used JSON.net for all serialization / deserialization. https://www.newtonsoft.com/json
I also use a single query to return meta-JSON with the actual data. As in the case of Ext JS, I have queries that return both the structure of the Ext JS object as well as the data the object will need. This cuts out one post back / SQL round trip.
I was also surprised at how fast the code was to parse a list of JSON objects and map them into a DataTable object that I then handed to a GridView.
The only downside I've seen to using JSON is indexing. If you have a property of the JSON you need to search, then you have to store it as a separate column.
There are JSON DBs out there that might serve your needs better: CouchDB, MongoDB, and Cassandra.

A brilliant way to make an object database out of SQL Server. I do this for all config objects and everything else that doesn't need any specific querying. Extending your object is easy: just create a new property in your class and initialize it with a default value. Don't need a property any more? Just delete it from the class. Easy rollout, easy upgrade. It's not suitable for all objects, but if you extract any property you need to index on into its own column, you can keep using it. A very modern way of using SQL Server.

It will be slower than having the form defined in code, but one extra query shouldn't cause you much harm. (Just don't let 1 extra query become 10 extra queries!)
Edit: If you are selecting the row by formTitle instead of pKey (I would, because then your code will be more readable), put an index on formTitle

We have used a modified version of XML for exactly the purpose you describe for seven or eight years, and it works great. Our customers' form needs are so diverse that we could never keep up with a table/column approach. We are too far down the XML road to change very easily, but I think JSON would work as well and maybe even better.
Reporting is no problem with a couple of good parsing functions and I would defy anyone to find a significant difference in performance between our reporting/analytics and a table/column solution to this need.

I wouldn't recommend it.
If you ever want to do any reporting or query based on these values in the future it's going to make your life a lot harder than having a few extra tables/columns.
Why are you avoiding making new tables? I say if your application requires them go ahead and add them in... Also if someone has to go through your code/db later it's probably going to be harder for them to figure out what you had going on (depending on what kind of documentation you have).

You should be able to use SisoDb for this. http://sisodb.com

I think it is not an optimal idea to store object data in a string in SQL. You have to do the transformation outside of SQL in order to parse it. That presents a performance issue, and you lose the leverage of SQL's native data parsing capability. A better way would be to store JSON as an XML data type in SQL. This way, you kill two birds with one stone: you don't have to create a shit load of tables and you still get all the native querying benefits of SQL.
XML in SQL Server 2005? Better than JSON in Varchar?
