I'm very new to jq, and this post is a result of not understanding its mechanics.
I could write a bash script that does what I want, but jq and its JSON super-powers have intrigued me, and I'd like to learn it by applying it to real-world scenarios. Here's one...
BTW, I've tried to make use of the existing jq-related SO solutions for merging/joining JSONs, but have failed.
The closest I came to what I needed was to use INDEX and a concatenation of $x + . ; however, I was only getting the LAST item from my second (c2) JSON.
So, my problem is as follows:
There are Two JSON files:
JSON #1 will have unique "id" and "type" keys - among other key/value pairs, which I've removed for clarity.
JSON #2 will contain multiple/non-unique "type" keys, which is what I'd like to match these two JSON files on. JSON #2 will also contain other key/value pairs, which are expected to be carried into the resultant output.
My output requirements are:
I'd like to obtain a list (one object per line, or a single array) of all combinations of matching key/value pairs between the c1 and c2 arrays where the value of the "type" key (a string) matches exactly between c1 and c2.
One more question: how much more difficult would it be to scale the solution to perform similar matching/joining between three JSON files at once - again, on the same value of a particular key?
Any assistance or even just hints on how to solve and understand how to solve this would be greatly appreciated!
1st input file: JSON #1, Array c1 (collection 1)
{ "c1":
[
{ "c1id":1, "type":"alpha" },
{ "c1id":2, "type":"beta" }
]
}
2nd input file: JSON #2, Array c2 (collection 2)
{
  "c2": [
    { "c2id": 1, "type": "alpha", "serial": "DDBB001" },
    { "c2id": 2, "type": "beta", "serial": "DDBB007" },
    { "c2id": 3, "type": "alpha", "serial": "DDTT005" },
    { "c2id": 4, "type": "beta", "serial": "DDAA002" },
    { "c2id": 5, "type": "yotta", "serial": "DDCC017" }
  ]
}
Expected output:
{"c1id":1,"type":"alpha","c2id":1,"serial":"DDBB001"}
{"c1id":1,"type":"alpha","c2id":3,"serial":"DDTT005"}
{"c1id":2,"type":"beta","c2id":2,"serial":"DDBB007"}
{"c1id":2,"type":"beta","c2id":4,"serial":"DDAA002"}
You will notice that type "yotta" from c2 is not included in the output. This is expected: only "type" values which exist in c1 and match in c2 should appear in the results. I guess this is implied by this being a matching/joining exercise - I added it just for clarity - I hope that worked.
Here's an example of using INDEX and JOIN:
jq --compact-output --slurpfile c1 c1.json '
INDEX(
$c1[0].c1[];
.type
) as $index |
JOIN(
$index;
.c2[];
.type;
reverse|add
)
' c2.json
The first argument to INDEX needs to produce a stream of items, which is why we apply [] to get the items from the array individually. The second argument selects our index key.
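You can inspect the index on its own:
jq --compact-output --null-input --slurpfile c1 c1.json '
INDEX(
$c1[0].c1[];
.type
)
'
{"alpha":{"c1id":1,"type":"alpha"},"beta":{"c1id":2,"type":"beta"}}
This also hints at why an INDEX plus $x + . attempt only keeps the LAST item: an index object holds exactly one value per key, so indexing the non-unique c2 side overwrites earlier entries. Indexing the unique c1 side and streaming c2 through JOIN avoids that.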
We use the four argument version of JOIN. The first argument is the index itself, the second is a stream of objects to be joined to the index, the third argument selects the lookup key from the streamed objects, and the fourth argument is an expression to assemble the join object. The input to that expression is a stream of two-item arrays, each looking something like this:
[{"c2id":1,"type":"alpha","serial":"DDBB001"},{"c1id":1,"type":"alpha"}]
Since we just want to combine all the keys and values from the two objects, we simply use add, but we first reverse the array to nicely arrange the c1 fields before the c2 fields. The end result is as you hoped, though rows follow the c2 input order:
{"c1id":1,"type":"alpha","c2id":1,"serial":"DDBB001"}
{"c1id":2,"type":"beta","c2id":2,"serial":"DDBB007"}
{"c1id":1,"type":"alpha","c2id":3,"serial":"DDTT005"}
{"c1id":2,"type":"beta","c2id":4,"serial":"DDAA002"}
Let's take a simple schema with two tables, one that describes a simple entity item (id, name)
id | name
------------
1 | foo
2 | bar
and another, let's call it collection, that references an item, but inside a JSON object, in something like
{
  items: [
    {
      id: 1,
      quantity: 2
    }
  ]
}
I'm looking for a way to enrich this field in the collection with the referenced item element (kind of like populate in Mongo), to retrieve something like
{
  ...
  items: [
    {
      item: {
        id: 1,
        name: foo
      },
      quantity: 2
    }
  ]
  ...
}
If you have a solution with PostgreSQL, I'll take it as well.
If I understood correctly, your requirement is to get your input JSON data into a MySQL table so that you can work with JSON but still leverage the power of SQL.
MySQL 8 recently introduced the JSON_TABLE function. Using it, you can store your JSON in the table directly and then query it like any other SQL data.
It should serve your immediate case, but note that your table schema will have a JSON column instead of traditional MySQL columns, so you will need to check whether that serves your purpose.
This is a good tutorial for the same.
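Here is a minimal sketch of such a query against your example, assuming a collection table with a JSON column named doc shaped like your document, and an item table with id and name columns (all of these names are assumptions):
SELECT c.id AS collection_id,
       jt.item_id,
       i.name,
       jt.quantity
FROM collection AS c
JOIN JSON_TABLE(
       c.doc,
       '$.items[*]'                 -- one row per element of the items array
       COLUMNS (
         item_id  INT PATH '$.id',  -- the referenced item's id
         quantity INT PATH '$.quantity'
       )
     ) AS jt
JOIN item AS i ON i.id = jt.item_id;
From there, MySQL 8's JSON_OBJECT and JSON_ARRAYAGG can re-assemble the enriched nested shape you showed, grouping by collection.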
I have a report in JSON format stored in a field in a PostgreSQL database table.
Say the (simplified) table format is:
Column | Type
-------------------+----------------------------
id | integer
element_id | character varying(256)
report | json
and the structure of the data in the reports is like this
{
  "section1": {
    "test1": {
      "outcome": "nominal",
      "results": {
        "value1": 34.0,
        "value2": 56.0
      }
    },
    "test2": {
      "outcome": "warning",
      "results": {
        "avg": 4.5,
        "std": 21.0
      }
    }
  },
  ...
  "sectionN": {
    ...
  }
}
That is, there are N keys at the first level (the sections), each of them being an object with a set of keys (the tests), each test having an outcome and a variable set of results in the form of (key, value) pairs.
I need to do filtering based on internal JSON keys. More specifically, in this example, I want to know whether it is possible, using SQL alone, to obtain the elements that have, for example, the std value in their results above a certain threshold, say 10. I know that std lives in test2, but I do not know a priori in which section. With this filter (test2.std > 10), the record with the sample data shown above would be returned, since the std value in its test2 results is 21 (> 10).
Another, simpler, filter could be to request all the records for which test2.outcome is not nominal.
One way is jsonb_each, like:
select section.key as section_name
     , test.key as test_name
from t1
cross join
jsonb_each(t1.col1) section
cross join
jsonb_each(section.value) test
where (test.value->'results'->>'std')::numeric > 10
Example at SQL Fiddle.
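The simpler filter from the question (test2.outcome not nominal) works the same way; here is a sketch against the same assumed t1(col1) layout:
select t1.*
from t1
cross join
jsonb_each(t1.col1) section
where section.value->'test2'->>'outcome' <> 'nominal'
Sections without a test2 key drop out automatically: the lookup yields NULL for them, and a NULL comparison is never true.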
I have a PostgreSQL table called datasource with a jsonb column called config. It has the following structure:
{
  "url": "some_url",
  "password": "some_password",
  "username": "some_username",
  "projectNames": [
    "project_name_1",
    ...
    "project_name_N"
  ]
}
I would like to transform the nested JSON array projectNames into a map and add a default value for each element of the array, so it would look like:
{
  "url": "some_url",
  "password": "some_password",
  "username": "some_username",
  "projectNames": {
    "project_name_1": "value",
    ...
    "project_name_N": "value"
  }
}
I have selected projectNames from the table using the PostgreSQL jsonb operator config#>'{projectNames}', but I have no idea how to perform the transformation.
I think I should use something like jsonb_object_agg, but that collapses all the data into a single row.
I'm using PostgreSQL version 9.6.
You need to first unnest the array, then build a new JSON document from that. Then you can put that back into the column.
update datasource
set config = jsonb_set(config, '{projectNames}', t.map)
from (
  select id, jsonb_object_agg(pn.n, 'value') as map
  from datasource, jsonb_array_elements_text(config -> 'projectNames') as pn (n)
  group by id
) t
where t.id = datasource.id;
The above assumes that there is a primary (or at least unique) column named id. The inner select transforms the array into a map.
Online example: http://rextester.com/GPP85654
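If you want to inspect the result before modifying any data, the same aggregation can be run read-only first (still assuming the unique id column):
select d.id, jsonb_set(d.config, '{projectNames}', t.map) as new_config
from datasource d
join (
  select id, jsonb_object_agg(pn.n, 'value') as map
  from datasource, jsonb_array_elements_text(config -> 'projectNames') as pn (n)
  group by id
) t on t.id = d.id;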
Are you looking for something like:
t=# with c(j) as (values('{
"url":"some_url",
"password":"some_password",
"username":"some_username",
"projectNames":[
"project_name_1",
"project_name_N"
]
}
'::jsonb))
, n as (select j,jsonb_array_elements_text(j->'projectNames') a from c)
select jsonb_pretty(jsonb_set(j,'{projectNames}',jsonb_object_agg(a,'value'))) from n group by j
;
jsonb_pretty
------------------------------------
{ +
"url": "some_url", +
"password": "some_password", +
"username": "some_username", +
"projectNames": { +
"project_name_1": "value",+
"project_name_N": "value" +
} +
}
(1 row)
Time: 19.756 ms
If so, look at:
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://www.postgresql.org/docs/current/static/functions-json.html
I am new to Prolog and am considering using it for a small data analysis application. Here is what I am seeking to accomplish:
I have a CSV file with some data of the following form:
a,b,c
d,e,f
g,h,i
...
The data is purely numerical, and I need to do the following. First, I need to group rows according to this scheme:
I start at the 1st row, which has value 'a' in column one. Then I keep going down the rows until I hit a row whose value in column one differs from 'a' by a certain amount, 'z'. That closes the first group; the process is then repeated from there, and many "groups" are formed once it is complete.
For each of these groups, I want to find the mean of columns two and three. As an example, if the first group consisted of the three rows shown above, the mean of column two would be (b+e+h)/3.
I am pretty sure this can be done in Prolog. However, I have 50,000+ rows of data, and since Prolog is declarative, I am not sure how efficient it would be at this task.
Is it feasible to work out a Prolog program for the above so that its efficiency is not significantly lower than that of a procedural analog?
This snippet could be a starting point for your task:
:- use_module(library(dcg/basics)).

% rownum(+Z, -AveList): parse 'numbers.txt' and collect the per-group averages
rownum(Z, AveList) :- phrase_from_file(row_scan(Z, [], [], AveList), 'numbers.txt').

% consume one CSV row per step, threading the open group and the averages so far
row_scan(Z, Group, AveSoFar, AveList) -->
    number(A), ",", number(B), ",", number(C), "\n",
    { row_match(Z, A,B,C, Group,AveSoFar, Group1,AveUpdated) },
    row_scan(Z, Group1, AveUpdated, AveList).
row_scan(_Z, _Group, AveList, AveList) --> "\n" ; [].

% row_match(Z, A,B,C, Group,Ave, Group1,Ave1)
% an empty group: this row opens a new one
row_match(_, A,B,C, [],Ave, [(A,B,C)],Ave).
row_match(Z, A,B,C, [H|T],Ave, Group1,Ave1) :-
    H = (F,_,_),            % F is column one of the row that opened the group
    (   A - F =:= Z         % the difference reached Z: close the group,
    ->  aggregate_all(agg(count,sum(C2),sum(C3)),
            member((_,C2,C3), [(A,B,C), H|T]), agg(Count,T2,T3)),
        A2 is T2/Count, A3 is T3/Count,        % means of columns two and three
        Group1 = [], Ave1 = [(A2,A3)|Ave]
    ;   Group1 = [H,(A,B,C)|T], Ave1 = Ave     % otherwise, extend the open group
    ).
with this input
1,2,3
4,5,6
7,8,9
10,2,3
40,5,6
70,8,9
16,0,0
yields
?- rownum(6,L).
L = [ (3.75, 4.5), (5, 6)]