Cassandra query by a field in JSON - json

I'm using the latest Cassandra version and successfully saved JSON with the statement below:
INSERT INTO mytable JSON '{"username": "myname", "country": "mycountry", "userid": "1"}'
The above query saves the record like this:
"rows": [
{
"[json]": "{\"userid\": \"1\", \"country\": \"india\", \"username\": \"sai\"}"
}
],
"rowLength": 1,
"columns": [
{
"name": "[json]",
"type": {
"code": 13,
"type": null
}
}
]
Now I would like to retrieve the record based on userid:
SELECT JSON * FROM mytable WHERE userid = fromJson("1") // but this query throws error
All this occurs in a node/express app and I'm using dse-driver as the client driver.

The CQL command below worked (note that CQL string literals take single quotes):
SELECT JSON * FROM mytable WHERE userid='1';
However, to execute it via the dse-driver, the snippet below worked:
let query = 'SELECT JSON * FROM mytable WHERE userid = ?';
client.execute(query, ["1"], { prepare: true });
where client is,
const dse = require('dse-driver');
const client = new dse.Client({
  contactPoints: ['h1', 'h2'],
  authProvider: new dse.auth.DsePlainTextAuthProvider('username', 'pass')
});
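For readers consuming the result from Python rather than Node, here is a minimal sketch of decoding the single [json] column that SELECT JSON returns. The row dict is a hand-written stand-in that mirrors the output shown in the question; it is not produced by a driver call.

```python
import json

# Hypothetical stand-in for one row returned by SELECT JSON; the single
# column is named "[json]" and holds the whole row as a JSON string.
row = {"[json]": "{\"userid\": \"1\", \"country\": \"india\", \"username\": \"sai\"}"}

# Decode the JSON text into a regular dict before reading fields.
record = json.loads(row["[json]"])

print(record["userid"])
print(record["username"])
```

The point is only that the column value is text until it is decoded, so field access has to happen after json.loads.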

If your Cassandra version is 2.1.x or below, you can use a Python-based approach: write a Python script using the DataStax Python driver (cassandra-driver).
Fetch the rows first, then use json.loads from Python's json module to convert the JSON text column into a Python dict. You can then work with the dict and extract the nested keys you need. See the code snippet below.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import json

if __name__ == '__main__':
    auth_provider = PlainTextAuthProvider(username='xxxx', password='xxxx')
    cluster = Cluster(['0.0.0.0'], port=9042, auth_provider=auth_provider)
    session = cluster.connect("keyspace_name")
    print("session created successfully")
    rows = session.execute('select * from user limit 10')
    for user_row in rows:
        # fetch the JSON text column and parse it into a dict
        column_dict = json.loads(user_row.json_col)
        print(column_dict.keys())

Assuming userid is the partition key, and assuming you want to retrieve the JSON object for the user with id 1, you should try:
SELECT JSON * FROM mytable WHERE userid=1;
If userid is of type text, wrap the value in single quotes (userid='1').

Related

How can I use the oracle REGEXP_SUBSTR to extract specific json values?

I have some columns in my Oracle database that contain JSON, and to extract their data in a query I use REGEXP_SUBSTR.
In the following example, value is a column in the table DOSSIER that contains JSON. The regex extracts the value of the property client.reference in that JSON:
SELECT REGEXP_SUBSTR(value, '"client"(.*?)"reference":"([^"]+)"', 1, 1, NULL, 2) FROM DOSSIER;
So if the json looks like this :
[...],
"client": {
"someproperty":"123",
"someobject": {
[...]
},
"reference":"ABCD",
"someotherproperty":"456"
},
[...]
The SQL query will return ABCD.
My problem is that some JSON documents have multiple instances of "client", for example:
[...],
"contract": {
"client":"Name of the client",
"supplier": {
"reference":"EFGH"
}
},
[...],
"client": {
"someproperty":"123",
"someobject": {
[...]
},
"reference":"ABCD",
"someotherproperty":"456"
},
[...]
You see the issue: now the SQL query will return EFGH, which is the supplier's reference.
How can I make sure that "reference" is contained in the JSON object "client"?
EDIT: I'm on Oracle 11g, so I can't use the JSON API, and I would like to avoid using a third-party package.
Assuming you are using Oracle 12c or later then you should NOT use regular expressions and should use Oracle's JSON functions.
If you have the table and data:
CREATE TABLE table_name ( value CLOB CHECK ( value IS JSON ) );
INSERT INTO table_name (
value
) VALUES (
'{
"contract": {
"client":"Name of the client",
"supplier": {
"reference":"EFGH"
}
},
"client": {
"someproperty":"123",
"someobject": {},
"reference":"ABCD",
"someotherproperty":"456"
}
}'
);
Then you can use the query:
SELECT JSON_VALUE( value, '$.client.reference' ) AS reference
FROM table_name;
Which outputs:
REFERENCE
ABCD
If you are using Oracle 11 or earlier then you could use the third-party PLJSON package to parse JSON in PL/SQL. For example, this question.
Or enable Java within the database and then use CREATE JAVA (or the loadjava utility) to add a Java class that can parse JSON to the database and then wrap it in an Oracle function and use that.
I faced a similar issue recently. If "reference" is a property that is only present inside the "client" object, this will solve it:
SELECT reference FROM (
SELECT DISTINCT
REGEXP_SUBSTR(
DBMS_LOB.SUBSTR(
value,
4000
),
'"reference":"(.+?)"',
1, 1, 'c', 1) reference
FROM DOSSIER
) WHERE reference IS NOT null;
You can also adapt the regex to your needs.
Edit:
In my case, the column type is CLOB, which is why I use the DBMS_LOB.SUBSTR function there. You can remove this function and pass the column directly to REGEXP_SUBSTR.
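To see why the original pattern picks up the supplier's reference, here is a small sketch that replays it with Python's re module on a compact copy of the second sample document. Python's regex engine stands in for Oracle's here, which is an assumption: the two agree on lazy quantifiers and leftmost matching, but other details may differ. The last lines show that a JSON parser resolves client.reference by structure rather than by text position.

```python
import json
import re

# Compact copy of the second sample document from the question
# (elided parts dropped, "someobject" left empty).
doc = json.dumps({
    "contract": {"client": "Name of the client",
                 "supplier": {"reference": "EFGH"}},
    "client": {"someproperty": "123", "someobject": {},
               "reference": "ABCD", "someotherproperty": "456"},
}, separators=(",", ":"))

# Same pattern as the REGEXP_SUBSTR call: the match starts at the FIRST
# "client" (the one inside "contract") and runs to the nearest "reference".
pattern = r'"client"(.*?)"reference":"([^"]+)"'
regex_result = re.search(pattern, doc).group(2)

# A JSON parser follows the object structure instead.
parsed_result = json.loads(doc)["client"]["reference"]

print(regex_result)   # EFGH
print(parsed_result)  # ABCD
```

This is why the structural approach (JSON_VALUE on 12c, or a parser on 11g) is more robust than widening or narrowing the regex.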

MySQL/Sequelize - how to query for rows whose field is an empty array

My HTTP request returns the data below:
Users.js
{
{
...
friends:[]
},
{
...
friends:[{id:xxx,...},...]
},
{
...
friends:[]
},
}
If I want to query for all rows whose friends array is [],
what should the query below look like?
select * from users where (what should I write here)
If friends is a column in your table that stores a JSON array, you can use JSON_LENGTH to find the length of the array.
SELECT JSON_LENGTH('[1, 2, {"a": 3}]'); -- Output: 3
SELECT JSON_LENGTH('[]'); -- Output: 0
You can use the same concept to get data from your database.
select *
FROM users
WHERE JSON_LENGTH(friends) = 0;
If the column (data) holds nested JSON and friends is one of the keys inside it, combine JSON_CONTAINS with JSON_LENGTH:
SELECT *
FROM users
WHERE JSON_CONTAINS(data, JSON_ARRAY(), '$.friends') -- To check do we have `friends` as key in that json
and JSON_LENGTH(data, '$.friends') = 0; -- To check whether it is empty array.
Now you can convert it to a Sequelize query. One way to do this is:
Model.findAll({
where: {
[Op.and]: [
Sequelize.literal('RAW SQL STATEMENT WHICH WONT BE ESCAPED!!!')
]
}
})
Make sure to update Model with your user model and query.
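As a way to check the logic outside the database, the sketch below mimics in Python what JSON_LENGTH reports for arrays, objects and scalars, then filters for empty friends arrays. json_length and the sample users rows are hypothetical, written only for this illustration; in practice the filtering belongs in the SQL query above.

```python
import json

def json_length(text):
    """Rough Python counterpart of MySQL's JSON_LENGTH for one JSON document."""
    value = json.loads(text)
    if isinstance(value, (list, dict)):
        return len(value)   # array elements or object members
    return 1                # a scalar has length 1

# Hypothetical rows: each "friends" value is stored as JSON text.
users = [
    {"id": 1, "friends": "[]"},
    {"id": 2, "friends": '[{"id": 7}]'},
    {"id": 3, "friends": "[]"},
]

# Same condition as WHERE JSON_LENGTH(friends) = 0.
no_friends = [u["id"] for u in users if json_length(u["friends"]) == 0]

print(json_length('[1, 2, {"a": 3}]'))
print(no_friends)
```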

I cannot parse "country" and "name" with PARSE_JSON function in this very simple JSON object with SnowSQL

I'm ingesting a large, simple JSON dataset from Azure Blob storage through a stage called "cities_stage" with FILE_FORMAT = json, like so.
(The steps are below; the error I end up with is: Error parsing JSON: unknown keyword "Hurzuf", pos 7.)
create or replace stage cities_stage
url='azure://XXXXXXX.blob.core.windows.net/xxxx/landing/cities'
credentials=(azure_sas_token='?st=XXXXX&se=XXX&sp=racwdl&sv=XX&sr=c&sig=XXX')
FILE_FORMAT = (type = json);
I then load the staged data into a table with a single VARIANT column, like so. The file I'm ingesting is larger than 16 MB, so I create an individual row for each object by using type = json strip_outer_array = true.
create or replace table cities_raw_source (
src variant);
copy into cities_raw_source
from @cities_stage
file_format = (type = json strip_outer_array = true)
on_error = continue;
When I select * from cities_raw_source each row looks like the following.
{
"coord": {
"lat": 44.549999,
"lon": 34.283333
},
"country": "UA",
"id": 707860,
"name": "Hurzuf"
}
When I add a reference to "country" or "name" that's where the issues come in. Here is my query (I did not use country in this one but it produces the same result).
select parse_json(src:id),
parse_json(src:coord:lat),
parse_json(src:coord:lon),
parse_json(src:name)
from cities_raw_source;
ERROR:
Error parsing JSON: unknown keyword "Hurzuf", pos 7.
ID, Lat, and Lon all come back as expected if I remove "src:name"
Any help is appreciated!
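It turns out I had everything correct except for the query itself.
When querying a VARIANT column you do not need PARSE_JSON: the values are already parsed, and PARSE_JSON expects JSON text as input. The numeric fields happened to work because a number rendered as text is still valid JSON, whereas the bare string Hurzuf is not, hence the "unknown keyword" error. The correct query looks like this.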
select src:id,
src:coord:lat,
src:coord:lon,
src:name
from cities_raw_source;
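The failure can be reproduced outside Snowflake. In this sketch Python's json.loads stands in for PARSE_JSON (an analogy, not the same parser): the numeric fields happen to be valid JSON text when rendered as strings, while the bare word Hurzuf is not. is_valid_json is a hypothetical helper name.

```python
import json

def is_valid_json(text):
    """Return True if text parses as a JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Values from the sample row, rendered as plain strings the way a parser
# would receive them.
print(is_valid_json("707860"))      # a number is valid JSON
print(is_valid_json("44.549999"))   # so is a decimal
print(is_valid_json("Hurzuf"))      # a bare word is not
print(is_valid_json('"Hurzuf"'))    # a quoted string is
```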

U-SQL - Extract data from complex json object

So I have a lot of json files structured like this:
{
"Id": "2551faee-20e5-41e4-a7e6-57bd20b02a22",
"Timestamp": "2016-12-06T08:09:57.5541438+01:00",
"EventEntry": {
"EventId": 1,
"Payload": [
"1a3e0c9e-ef69-4c6a-ac8c-9b2de2fbc701",
"DHS.PlanCare.Business.BusinessLogic.VisionModels.VisionModelServiceWithoutUnitOfWork.FetchVisionModelsForClientOnReferenceDateAsync(System.Int64 clientId, System.DateTime referenceDate, System.Threading.CancellationToken cancellationToken)",
25,
"DHS.PlanCare.Business.BusinessLogic.VisionModels.VisionModelServiceWithoutUnitOfWork+<FetchVisionModelsForClientOnReferenceDateAsync>d__11.MoveNext\r\nDHS.PlanCare.Core.Extensions.IQueryableExtensions+<ExecuteAndThrowTaskCancelledWhenRequestedAsync>d__16`1.MoveNext\r\n",
false,
"2197, 6-12-2016 0:00:00, System.Threading.CancellationToken"
],
"EventName": "Duration",
"KeyWordsDescription": "Duration",
"PayloadSchema": [
"instanceSessionId",
"member",
"durationInMilliseconds",
"minimalStacktrace",
"hasFailed",
"parameters"
]
},
"Session": {
"SessionId": "0016e54b-6c4a-48bd-9813-39bb040f7736",
"EnvironmentId": "C15E535B8D0BD9EF63E39045F1859C98FEDD47F2",
"OrganisationId": "AC6752D4-883D-42EE-9FEA-F9AE26978E54"
}
}
How can I create a U-SQL query that outputs
Id,
Timestamp,
EventEntry.EventId and
EventEntry.Payload[2] (value 25 in the example above)?
I can't figure out how to extend my query:
@extract =
    EXTRACT
        Timestamp DateTime
    FROM @"wasb://xxx/2016/12/06/0016e54b-6c4a-48bd-9813-39bb040f7736/yyy/{*}/{*}.json"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@res =
    SELECT Timestamp
    FROM @extract;
OUTPUT @res TO "/output/result.csv" USING Outputters.Csv();
I have seen some examples, like:
"U-SQL Unable to extract data from JSON file" => this only queries one level of the document; I need data from multiple levels.
"U-SQL - Extract data from json-array" => this only queries one level of the document; I need data from multiple levels.
JSONTuple supports multiple JSONPaths in one go.
@extract =
    EXTRACT
        Id String,
        Timestamp DateTime,
        EventEntry String
    FROM @"..."
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@res =
    SELECT Id, Timestamp, EventEntry,
           Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(EventEntry,
               "EventId", "Payload[2]") AS Event
    FROM @extract;
@res =
    SELECT Id,
           Timestamp,
           Event["EventId"] AS EventId,
           Event["Payload[2]"] AS Something
    FROM @res;
You may want to look at this GitHub example: https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql
It takes two disparate data elements and combines them, much like your Payload and PayloadSchema. If you create key-value pairs using the "Donut" or "Cake and Batter" examples, you may be able to match the schema up to the payload and use the CROSS APPLY EXPLODE function.
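For reference, here is the same extraction expressed in plain Python, which may help when checking the U-SQL output against a sample file. It reads a trimmed copy of the document from the question (the long Payload strings are shortened and marked as such) and also pairs PayloadSchema with Payload, in the spirit of the schema-matching suggestion above. Nothing here is U-SQL API.

```python
import json

# Trimmed copy of the sample document; the long strings are shortened.
raw = """
{
  "Id": "2551faee-20e5-41e4-a7e6-57bd20b02a22",
  "Timestamp": "2016-12-06T08:09:57.5541438+01:00",
  "EventEntry": {
    "EventId": 1,
    "Payload": ["1a3e0c9e-ef69-4c6a-ac8c-9b2de2fbc701", "member (shortened)", 25,
                "stacktrace (shortened)", false, "parameters (shortened)"],
    "PayloadSchema": ["instanceSessionId", "member", "durationInMilliseconds",
                      "minimalStacktrace", "hasFailed", "parameters"]
  }
}
"""
event = json.loads(raw)
entry = event["EventEntry"]

# The four values the question asks for.
result = (event["Id"], event["Timestamp"], entry["EventId"], entry["Payload"][2])

# Pair each schema name with the payload value at the same position.
payload = dict(zip(entry["PayloadSchema"], entry["Payload"]))

print(result)
print(payload["durationInMilliseconds"])
```

Pairing by position relies on PayloadSchema and Payload having the same length and order, as they do in the sample.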

How to parse JSON value of a text column in cassandra

I have a column of type text that contains a JSON value.
{
"customer": [
{
"details": {
"customer1": {
"name": "john",
"addresses": {
"address1": {
"line1": "xyz",
"line2": "pqr"
},
"address2": {
"line1": "abc",
"line2": "efg"
}
}
},
"customer2": {
"name": "robin",
"addresses": {
"address1": null
}
}
}
}
]
}
How can I extract the 'address1' JSON field of the column with a query?
First I am trying to fetch the JSON value; then I will move on to parsing it.
SELECT JSON customer from text_column;
With my query, I get the following error.
com.datastax.driver.core.exceptions.SyntaxError: line 1:12 no viable
alternative at input 'customer' (SELECT [JSON] customer...)
Cassandra version 2.1.13
You can't use SELECT JSON in Cassandra v2.1.x (CQL v3.2.x).
For Cassandra v2.1.x (CQL v3.2.x), the only selectors supported after SELECT are:
DISTINCT
COUNT (*)
COUNT (1)
column_name AS new_name
WRITETIME (column_name)
TTL (column_name)
dateOf(), now(), minTimeuuid(), maxTimeuuid(), unixTimestampOf(), typeAsBlob() and blobAsType()
Cassandra v2.2.x (CQL v3.3.x) introduced SELECT JSON:
With SELECT statements, the new JSON keyword can be used to return each row as a single JSON-encoded map. The remainder of the SELECT statement behavior is the same.
The result map keys are the same as the column names in a normal result set. For example, a statement like “SELECT JSON a, ttl(b) FROM ...” would result in a map with keys "a" and "ttl(b)". However, there is one notable exception: for symmetry with INSERT JSON behavior, case-sensitive column names with upper-case letters will be surrounded with double quotes. For example, “SELECT JSON myColumn FROM ...” would result in a map key "\"myColumn\"" (note the escaped quotes).
The map values will be JSON-encoded representations (as described below) of the result set values.
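A short Python sketch of what that quoting rule means for a client decoding the result. The JSON string below is hand-written to follow the description above (keys a, ttl(b) and a case-sensitive column myColumn) with made-up values; it is not captured from a live cluster.

```python
import json

# Hypothetical SELECT JSON output for: SELECT JSON a, ttl(b), "myColumn" FROM ...
# The values are invented; only the key shapes follow the description.
row_json = r'{"a": 1, "ttl(b)": 3600, "\"myColumn\"": "x"}'

row = json.loads(row_json)

# The case-sensitive column's key keeps its surrounding double quotes.
print(sorted(row.keys()))
print(row['"myColumn"'])
```

A lookup with the bare name myColumn would miss the key, so client code needs to include the quotes for such columns.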
If your Cassandra version is 2.1.x or below, you can use a Python-based approach.
Write a Python script using the DataStax Python driver (cassandra-driver).
Fetch the rows first, then use json.loads from Python's json module to convert the JSON text column into a Python dict. You can then work with the dict and extract the nested keys you need. See the code snippet below.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import json

if __name__ == '__main__':
    auth_provider = PlainTextAuthProvider(username='xxxx', password='xxxx')
    cluster = Cluster(['0.0.0.0'], port=9042, auth_provider=auth_provider)
    session = cluster.connect("keyspace_name")
    print("session created successfully")
    rows = session.execute('select * from user limit 10')
    for user_row in rows:
        # parse the JSON text in the customer column into a dict
        customer_dict = json.loads(user_row.customer)
        print(customer_dict.keys())
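Building on the snippet above, here is a sketch of pulling address1 out of the parsed value for every customer. It runs on a compact literal copy of the JSON from the question instead of a live session, and extract_address1 is a hypothetical helper name.

```python
import json

# Compact copy of the JSON value from the question.
customer_json = """
{"customer": [{"details": {
    "customer1": {"name": "john", "addresses": {
        "address1": {"line1": "xyz", "line2": "pqr"},
        "address2": {"line1": "abc", "line2": "efg"}}},
    "customer2": {"name": "robin", "addresses": {"address1": null}}
}}]}
"""

def extract_address1(text):
    """Map each customer key to its address1 value (None when it is null)."""
    found = {}
    for entry in json.loads(text)["customer"]:
        for key, customer in entry["details"].items():
            found[key] = customer["addresses"].get("address1")
    return found

addresses = extract_address1(customer_json)
print(addresses["customer1"])
print(addresses["customer2"])
```

In the driver loop, the same helper could be called on user_row.customer in place of the literal string.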