Deep JSON query with partial path in MySQL 5.7? - mysql

Given a number of JSON documents like this:
{
    "id": "some_id",
    "l1": {
        "f1": [
            {
                "c1": "foo",
                "c2": "bar"
            },
            {
                "c1": "foo1",
                "c2": "bar1"
            }
        ],
        "f2": [
            {
                "c3": "baz",
                "c4": "bar"
            }
        ]
    }
}
How can I query MySQL 5.7 for f1....c1: foo1 -- i.e., lX is not given, nor is the list position of the c1/c2 subdocument.
This is not a duplicate of Deep JSON query with partial path in PGSQL JSONB? since that is about PostgreSQL and this one is about MySQL.

This should do it:
SELECT JSON_CONTAINS(JSON_EXTRACT(Doc, '$.*.f1[*].c1'), '"foo1"') FROM table;
If you're using 5.7.9 or later, you can replace the JSON_EXTRACT function with the -> operator:
SELECT JSON_CONTAINS(Doc->'$.*.f1[*].c1', '"foo1"') FROM table;
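If you also need to know where the match lives, not just whether it exists, JSON_SEARCH accepts wildcard paths and returns the matching path. A sketch, assuming a table named docs with a JSON column Doc (names not from the question); note that '$**.c1' matches a c1 key at any depth, which is slightly broader than the f1-specific path above:
-- Returns the path of the first match, e.g. "$.l1.f1[1].c1",
-- or NULL if 'foo1' does not occur under any c1 key.
SELECT JSON_SEARCH(Doc, 'one', 'foo1', NULL, '$**.c1') FROM docs;

-- The same idea as a row filter:
SELECT * FROM docs
WHERE JSON_SEARCH(Doc, 'one', 'foo1', NULL, '$**.c1') IS NOT NULL;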

Table from nested list, struct

I have this json data:
consumption_json = """
{
    "count": 48,
    "next": null,
    "previous": null,
    "results": [
        {
            "consumption": 0.063,
            "interval_start": "2018-05-19T00:30:00+0100",
            "interval_end": "2018-05-19T01:00:00+0100"
        },
        {
            "consumption": 0.071,
            "interval_start": "2018-05-19T00:00:00+0100",
            "interval_end": "2018-05-19T00:30:00+0100"
        },
        {
            "consumption": 0.073,
            "interval_start": "2018-05-18T23:30:00+0100",
            "interval_end": "2018-05-18T00:00:00+0100"
        }
    ]
}
"""
and I would like to convert the results list to an Arrow table.
I have managed this by first converting it to a Python data structure, using Python's json library, and then converting that to an Arrow table.
import json
import pyarrow as pa

consumption_python = json.loads(consumption_json)
results = consumption_python['results']
table = pa.Table.from_pylist(results)
print(table)
pyarrow.Table
consumption: double
interval_start: string
interval_end: string
----
consumption: [[0.063,0.071,0.073]]
interval_start: [["2018-05-19T00:30:00+0100","2018-05-19T00:00:00+0100","2018-05-18T23:30:00+0100"]]
interval_end: [["2018-05-19T01:00:00+0100","2018-05-19T00:30:00+0100","2018-05-18T00:00:00+0100"]]
But, for reasons of performance, I'd rather just use pyarrow exclusively for this.
I can use pyarrow's json reader to make a table.
import pyarrow.json  # read_json lives in a submodule that must be imported explicitly

reader = pa.BufferReader(bytes(consumption_json, encoding='ascii'))
table_from_reader = pa.json.read_json(reader)
And 'results' is a struct nested inside a list. (Actually, everything seems to be nested).
print(table_from_reader['results'].type)
list<item: struct<consumption: double, interval_start: timestamp[s], interval_end: timestamp[s]>>
How do I turn this into a table directly?
Following this answer, https://stackoverflow.com/a/72880717/3617057, I can get closer...
import pyarrow.compute as pc
flat = pc.list_flatten(table_from_reader["results"])
print(flat)
[
  -- is_valid: all not null
  -- child 0 type: double
    [
      0.063,
      0.071,
      0.073
    ]
  -- child 1 type: timestamp[s]
    [
      2018-05-18 23:30:00,
      2018-05-18 23:00:00,
      2018-05-18 22:30:00
    ]
  -- child 2 type: timestamp[s]
    [
      2018-05-19 00:00:00,
      2018-05-18 23:30:00,
      2018-05-17 23:00:00
    ]
]
flat is a ChunkedArray whose underlying arrays are StructArrays. To convert it to a table, you need to convert each chunk to a RecordBatch and concatenate them into a table:
pa.Table.from_batches(
    [
        pa.RecordBatch.from_struct_array(s)
        for s in flat.iterchunks()
    ]
)
If flat is just a StructArray (not a ChunkedArray), you can call:
pa.Table.from_batches(
    [
        pa.RecordBatch.from_struct_array(flat)
    ]
)
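Putting the pieces together, here is a minimal end-to-end sketch, reusing consumption_json from above (the only assumption is that pyarrow is installed):
import pyarrow as pa
import pyarrow.json
import pyarrow.compute as pc

# Parse the JSON document directly into an Arrow table.
reader = pa.BufferReader(bytes(consumption_json, encoding='ascii'))
table_from_reader = pa.json.read_json(reader)

# 'results' is a list<struct<...>> column: flatten away the list level,
# then promote each StructArray chunk to a RecordBatch.
flat = pc.list_flatten(table_from_reader['results'])
table = pa.Table.from_batches(
    [pa.RecordBatch.from_struct_array(chunk) for chunk in flat.iterchunks()]
)
print(table)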

Convert nested CSV data file to JSON

I am trying to create a Perl function which converts CSV with a special format to JSON.
I have a CSV file where every line has the following format
child_number,parent_number,status
The root parent (without a parent) has parent_number = NA.
My goal is to convert it into a JSON format like this:
{
    number:
    status:
    children: [
        {
            number:
            status:
            children: [
                {
                    number:
                    status:
                }
            ]
        },
        {
            number:
            status:
            children: [
                {
                    number:
                    status:
                }
            ]
        }
    ]
}
I inserted the contents of the CSV file into an array and found the line with no parent
foreach my $line ( @csv_arr ) {
    ($child, $parent, $status) = split(/,/, $line);
    last if ( $parent eq "NA" );
}
return 0 unless ( $parent eq "NA" );
I would like to get the children for each $parent and insert them into the data structure. I could use the following:
push @{ $json_struct->{usage} }, {
    number   => $parent,
    children => getChildren(),
};
...
my @json_arr = JSON::PP->new->encode($json_struct);
In order to explain myself better, please consider the following example
> cat ./csv_file
7,5,0
5,NA,0
3,5,1
> cat ./json_file
{
    number: NA
    status: 0
    children: [
        {
            number: 5
            status: 0
            children: [
                {
                    number: 3
                    status: 1
                },
                {
                    number: 7
                    status: 0
                }
            ]
        }
    ]
}
These are the contents of the CSV file after a real run
7696,44641,0
44654,44641,0
44646,44641,0
44644,44641,0
44651,44649,0
7700,44641,0
7691,44641,0
44657,44641,0
7699,7698,0
7692,44641,0
44650,44649,0
44661,44660,0
44656,44654,1
44659,44641,0
44648,44646,0
7695,7694,0
44662,44660,0
44641,NA,0
7701,7700,0
7698,44641,0
44653,44651,1
44645,44644,0
44647,44646,0
44649,44641,0
44660,44641,0
7697,7696,0
7689,7688,0
7694,44641,0
44658,44657,1
7693,44641,0
7688,44641,0
What is the proper way to implement this method in Perl?
Here's one way to do it:
use strict;
use warnings;

use JSON::PP ();

my %nodes;
$nodes{NA} = { number => 'NA', status => 0 };

while (my $line = readline DATA) {
    chomp $line;
    my ($id, $parent_id, $status) = split /,/, $line;
    my $node = $nodes{$id} //= {};
    $node->{number} = 0 + $id;
    $node->{status} = 0 + $status;
    push @{ $nodes{$parent_id}{children} }, $node;
}

my $jsonpp = JSON::PP->new->pretty->canonical;
print $jsonpp->encode($nodes{NA});

__DATA__
7,5,0
5,NA,0
3,5,1
Output:
{
   "children" : [
      {
         "children" : [
            {
               "number" : 7,
               "status" : 0
            },
            {
               "number" : 3,
               "status" : 1
            }
         ],
         "number" : 5,
         "status" : 0
      }
   ],
   "number" : "NA",
   "status" : 0
}
The only reason
$nodes{NA} = { number => 'NA', status => 0 };
is there is to give the root node a number and status field. It doesn't otherwise exist in your CSV input, so without that line you'd get a root object containing only a children field.
For every CSV record we extract its id, the corresponding parent id, and the status.
If we've already created a node for this id (this happens if another node has already specified this node as its parent), we just use the existing node; otherwise we create a new empty node and store it in %nodes.
%nodes is our universe of nodes indexed by their id.
We always set the number and status fields. The 0 + part is to explicitly convert the values to numbers (otherwise they'd be quoted in the JSON output).
We add our current node to the list of children of its parent (which is found in $nodes{$parent_id}{children}).
At the end, we simply pretty-print the tree under $nodes{NA}, including its children, their children's children, and so on.
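To run this against the real file rather than the inline DATA section, only the input loop changes. A sketch, assuming the ./csv_file path from the question:
open my $fh, '<', './csv_file' or die "Can't open ./csv_file: $!";
while (my $line = readline $fh) {
    chomp $line;
    # ... same node-building logic as in the loop above ...
}
close $fh;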

Write/read recursive structure S4 objects

I have a recursive structure of S4 objects that can be represented (this is a simple version) by these 2 classes:
cl2 <- setClass("cl2",
    representation(
        id = "numeric",
        date = "Date"),
    prototype = list(
        date = Sys.Date(),
        id = sample(1:100, 1)
    )
)

cl1 <- setClass("cl1",
    representation(
        date = "Date",
        cl2 = "cl2"
    ),
    prototype = list(
        date = Sys.Date()
    )
)
I would like to save/load objects of type cl1. I opted to use the JSON format (suitable for unstructured objects). The problem is with dates: they are coerced to numeric. Is there an option/solution to get dates in the right format when I serialize the object? Note that the objects can contain other objects (it is a recursive structure), so I would like all dates to be in the right format.
cat(RJSONIO::toJSON(cl1(),pretty=TRUE))
{
    "date" : 16861,
    "cl2" : {
        "id" : 90,
        "date" : 16861
    }
}
A solution could be to replace dates with characters, but then I would lose the validation mechanism of S4 objects and would have to implement date validation for all objects myself. Thanks in advance for any help.
The expected output should look like:
{
    "date" : "2016-03-01",
    "cl2" : {
        "id" : 76,
        "date" : "2016-03-01"
    }
}
Reading the documentation of toJSON, I found an interesting parameter:
force: unclass/skip objects of classes with no defined JSON mapping
So I tried it, and I think this matches your need, as you can simply ignore the class entry:
> s <- jsonlite::toJSON(cl1(), force = TRUE, auto_unbox = TRUE, pretty = TRUE)
> s
{
    "date": "2016-03-01",
    "cl2": {
        "date": "2016-03-01",
        "id": 67,
        "class": "cl2"
    },
    "class": "cl1"
}
Drawback: this is still not loadable "as-is" into S4 objects with fromJSON, as it will give back a named list. Analyzing the list recursively to recreate the S4 objects is doable, but you'll have to create the necessary as implementations to turn a named list into your classes. For your example:
setAs('list', 'cl2',
    function(from, to) {
        new(to, id = from[['id']], date = as.Date(from[['date']]))
    })

setAs('list', 'cl1',
    function(from, to) {
        new(to, date = as.Date(from[['date']]), cl2 = as(from[['cl2']], 'cl2'))
    })
With a dummy input from previous output:
input <- '
{
    "date": "2016-03-05",
    "cl2": {
        "date": "2016-02-01",
        "id": 83,
        "class": "cl2"
    },
    "class": "cl1"
}'
This gives:
> as(fromJSON(input), 'cl1')
An object of class "cl1"
Slot "date":
[1] "2016-03-05"

Slot "cl2":
An object of class "cl2"
Slot "id":
[1] 83

Slot "date":
[1] "2016-02-01"
I'll let you adapt this to your real use case, probably using fromJSON(input, FALSE) to get a 'pure' list to coerce, with lapply for example, if you have multiple instances of your cl1 class in the JSON input.
One option is to use the jsonlite package to serialize. Indeed, jsonlite::toJSON respects dates and serializes them in a well-formatted form. The problem is that jsonlite::toJSON is not defined for S4 objects. My solution is to coerce the object to a list and then serialize it:
## S4 method to coerce any S4 object to a list
setMethod("as.list", signature(x = "ANY"),
    function(x) {
        Map(
            function(y) if (isS4(slot(x, y))) as.list(slot(x, y)) else slot(x, y),
            slotNames(class(x)))
    })

## coercion
jsonlite::toJSON(as.list(cl1()), pretty = TRUE, auto_unbox = TRUE)
{
    "date": "2016-03-01",
    "cl2": {
        "id": 24,
        "date": "2016-03-01"
    }
}
Update: in as.list I replaced lapply with Map to create a named list.
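To illustrate why (this is general R behavior, not specific to this code): lapply over an unnamed character vector returns an unnamed list, while Map reuses the character values as names:
slots <- c("date", "cl2")
lapply(slots, nchar)  # unnamed list: [[1]] 4, [[2]] 3
Map(nchar, slots)     # named list: $date 4, $cl2 3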
For the recursive reading of S4 classes from JSON you can use a similar approach:
library(RJSONIO)

createParser <- function(className) {
    setAs("list", className, function(from, to) {
        to <- new(to)
        for (n in names(from)) {
            if (isS4(slot(to, n))) {
                c <- class(slot(to, n))[[1]]
                o <- as(from[[n]], c)
                slot(to, n) <- o
            } else {
                slot(to, n) <- from[[n]]
            }
        }
        to
    })
}

Name <- setClass("Name", slots = c("first" = "character", "last" = "character"))
createParser("Name")

Customer <- setClass("Customer", slots = c("name" = "Name", "age" = "numeric"))
createParser("Customer")

Case <- setClass("Case", slots = c("customer" = "Customer"))
createParser("Case")

c1 <- Case(customer = Customer(name = Name(first = "Mika", last = "R"), age = 100))
j <- RJSONIO::toJSON(c1)
l <- RJSONIO::fromJSON(j, simplify = FALSE)
as(l, "Case")
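For reference, if the round trip succeeds, the final as(l, "Case") should print the reconstructed object along these lines (exact formatting follows the default S4 show method):
An object of class "Case"
Slot "customer":
An object of class "Customer"
Slot "name":
An object of class "Name"
Slot "first":
[1] "Mika"

Slot "last":
[1] "R"

Slot "age":
[1] 100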

Postgres JSON data type Rails query

I am using Postgres' json data type but want to do a query/ordering with data that is nested within the json.
I want to order or query with .where on the json data type. For example, I want to query for users that have a follower count > 500 or I want to order by follower or following count.
Thanks!
Example:
model User
data: {
    "photos"=>[
        {"type"=>"facebook", "type_id"=>"facebook", "type_name"=>"Facebook", "url"=>"facebook.com"}
    ],
    "social_profiles"=>[
        {"type"=>"vimeo", "type_id"=>"vimeo", "type_name"=>"Vimeo", "url"=>"http://vimeo.com/", "username"=>"v", "id"=>"1"},
        {"bio"=>"I am not a person, but a series of plants", "followers"=>1500, "following"=>240, "type"=>"twitter", "type_id"=>"twitter", "type_name"=>"Twitter", "url"=>"http://www.twitter.com/", "username"=>"123", "id"=>"123"}
    ]
}
For anyone who stumbles upon this: I have come up with a list of queries using ActiveRecord and Postgres' JSON data type. Feel free to edit this to make it clearer.
Documentation to the JSON operators used below: https://www.postgresql.org/docs/current/functions-json.html.
# Sort based on the JSON data:
Post.order("data->'hello' DESC")
=> #<ActiveRecord::Relation [
#<Post id: 4, data: {"hi"=>"23", "hello"=>"22"}>,
#<Post id: 3, data: {"hi"=>"13", "hello"=>"21"}>,
#<Post id: 2, data: {"hi"=>"3", "hello"=>"2"}>,
#<Post id: 1, data: {"hi"=>"2", "hello"=>"1"}>]>
# Where inside a JSON object:
Record.where("data ->> 'likelihood' = '0.89'")
# Example json object:
r.column_data
=> {"data1"=>[1, 2, 3],
    "data2"=>"data2-3",
    "array"=>[{"hello"=>1}, {"hi"=>2}],
    "nest"=>{"nest1"=>"yes"}}
# Nested search:
Record.where("column_data -> 'nest' ->> 'nest1' = 'yes' ")
# Search within array:
Record.where("column_data #>> '{data1,1}' = '2' ")
# Search within a value that's an array:
Record.where("column_data #> '{array,0}' ->> 'hello' = '1' ")
# This only finds a match at one fixed position in the array.
# All elements:
Record.where("column_data ->> 'array' LIKE '%hello%' ") # bad
Record.where("column_data ->> 'array' LIKE ?", "%hello%") # good
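The LIKE approach matches against the text rendering of the whole array, so it can produce false positives (e.g. "hello" appearing in an unrelated value). A more precise sketch, assuming the records table from the example above and Postgres 9.3+, where json_array_elements expands the array to one row per element:
# Find rows where ANY element of the "array" field has "hello" = 1:
Record.where("EXISTS (
    SELECT 1
    FROM json_array_elements(column_data -> 'array') AS elem
    WHERE elem ->> 'hello' = '1'
)")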
According to this http://edgeguides.rubyonrails.org/active_record_postgresql.html#json there's a difference between using -> and ->>:
# db/migrate/20131220144913_create_events.rb
create_table :events do |t|
  t.json 'payload'
end

# app/models/event.rb
class Event < ActiveRecord::Base
end

# Usage
Event.create(payload: { kind: "user_renamed", change: ["jack", "john"] })

event = Event.first
event.payload # => {"kind"=>"user_renamed", "change"=>["jack", "john"]}

## Query based on JSON document
# The -> operator returns the original JSON type (which might be an object),
# whereas ->> returns text.
Event.where("payload->>'kind' = ?", "user_renamed")
So you should try Record.where("data ->> 'status' = '200'") (note that ->> returns text, so the comparison value must be a quoted string), or whichever operator suits your query (http://www.postgresql.org/docs/current/static/functions-json.html).
Your question doesn't seem to correspond to the data you've shown, but if your table is named users and data is a JSON field in that table containing something like {"count": 123}, then the query
SELECT * FROM users WHERE (data->>'count')::int > 500;
will work (->> returns text, so the value must be cast before a numeric comparison). Take a look at your database schema to make sure you understand the layout, and check that the query works before complicating it with Rails conventions.
JSON filtering in Rails
Event.create(payload: [{ "name": 'Jack', "age": 12 },
                       { "name": 'John', "age": 13 },
                       { "name": 'Dohn', "age": 24 }])

Event.where('payload @> ?', '[{"age": 12}]')

# You can also filter by the name key
Event.where('payload @> ?', '[{"name": "John"}]')

# You can also filter by {"name":"Jack", "age":12}
Event.where('payload @> ?', [{ "name": "Jack", "age": 12 }].to_json)
You can find more about this here
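One caveat worth flagging as an assumption: the containment operator @> is defined for jsonb, not for the plain json type, so these queries need the payload column to be jsonb (Postgres 9.4+, Rails 4.2+) or to be cast in the query:
# Migration: jsonb enables @> containment (and GIN indexing).
create_table :events do |t|
  t.jsonb 'payload'
end

# With a plain json column, cast inside the query instead:
Event.where('payload::jsonb @> ?', '[{"age": 12}]')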

What is the difference between #> and ->> operator in PostgreSQL?

We can access any JSON element in PostgreSQL 9.3 using the -> and ->> operators. It seems to me that #> along with #>> only provide a shorter form of writing the JSON path. Or is there a bigger picture behind the #> operator? Does it serve a special purpose or provide any advantage over the arrow notation? Which one is the preferred method of writing paths?
It all comes to the question: why should I use the #> and #>> operator over the -> and ->>?
The docs are a bit enigmatic about this.
Both queries below give the same result:
=> select '{"a":[1,2,3],"b":[4,5,6]}'::json#>'{a,2}';
?column?
----------
3
=> select '{"a":[1,2,3],"b":[4,5,6]}'::json->'a'->>2;
?column?
----------
3
Consider nesting.
{
    "a" : {
        "b" : {
            "c" : 1,
            "d" : 2
        }
    }
}
Imagine that you have a json document, and you don't know in advance how it will be nested. If you knew you needed a three level path, you could write:
SELECT '{
    "a" : {
        "b" : {
            "c" : 1,
            "d" : 2
        }
    }
}'::json -> 'a' -> 'b' -> 'c';
but what if you want to write a query that doesn't know that in advance? That's where the path-based operators are useful; the path can be supplied along with the document, and there's no longer any assumption about the document structure in the query.
SELECT '{
    "a" : {
        "b" : {
            "c" : 1,
            "d" : 2
        }
    }
}'::json #>> ARRAY['a','b','c'];
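Since the right-hand side is an ordinary text[] value rather than syntax, the path can be bound as a query parameter, which the chained arrow form cannot do. A sketch, assuming a table docs with a json column doc (names not from the question):
-- Prepare once; the path is data, bound per execution.
PREPARE get_path(text[]) AS SELECT doc #>> $1 FROM docs;

-- The same prepared query works for any depth:
EXECUTE get_path(ARRAY['a','b']);      -- returns the nested object, as text
EXECUTE get_path(ARRAY['a','b','c']);  -- returns 1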