Snakemake : multi-level json parsing - json

I have a JSON configuration file that looks like this:
{
"projet_name": "Project 1",
"samples": [
{
"sample_name": "Sample_A",
"files":[
{
"a": "file_A_a1.txt",
"b": "file_A_a2.txt",
"x": "x1"
},
{
"a": "file_A_b1.txt",
"b": "file_A_b2.txt",
"x": "x1"
},
{
"a": "file_A_c1.txt",
"b": "file_A_c2.txt",
"x": "x2"
}
]
},
{
"sample_name": "Sample_B",
"files":[
{
"a": "file_B_a1.txt",
"b": "file_B_a2.txt",
"x": "x1"
},
{
"a": "file_B_b1.txt",
"b": "file_B_b2.txt",
"x": "x1"
}
]
}]
}
I'm currently writing a Snakemake file to process this JSON file. For each sample (e.g. Sample_A, Sample_B), the idea is to concatenate the files that have the same "x" entry. For example, in Sample_A I would like to concatenate the "a" files file_A_a1.txt and file_A_b1.txt, as they have the same "x" entry, and likewise the "b" files file_A_a2.txt and file_A_b2.txt. file_A_c1.txt and file_A_c2.txt will not be concatenated with other files, as they have a unique "x". In the end I would like a structure like this:
merged_files/Sample_A_a_x1.txt
merged_files/Sample_A_b_x1.txt
merged_files/Sample_A_a_x2.txt
merged_files/Sample_A_b_x2.txt
merged_files/Sample_B_a_x1.txt
merged_files/Sample_B_b_x1.txt
My issue is grouping the files that share the same "sample_name" and the same "x". Any suggestions?
Thank you
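One possible way to do the grouping, sketched in plain Python (which a Snakefile can contain directly): build a dictionary keyed by (sample_name, column, x) whose values are the lists of files to concatenate. The helper names group_inputs and merged_path are hypothetical, and the config dictionary below is only a condensed copy of the JSON above.

```python
from collections import defaultdict

# Condensed copy of the configuration from the question.
config = {
    "projet_name": "Project 1",
    "samples": [
        {"sample_name": "Sample_A", "files": [
            {"a": "file_A_a1.txt", "b": "file_A_a2.txt", "x": "x1"},
            {"a": "file_A_b1.txt", "b": "file_A_b2.txt", "x": "x1"},
            {"a": "file_A_c1.txt", "b": "file_A_c2.txt", "x": "x2"},
        ]},
        {"sample_name": "Sample_B", "files": [
            {"a": "file_B_a1.txt", "b": "file_B_a2.txt", "x": "x1"},
            {"a": "file_B_b1.txt", "b": "file_B_b2.txt", "x": "x1"},
        ]},
    ],
}


def group_inputs(cfg, columns=("a", "b")):
    """Map (sample_name, column, x) to the list of files to concatenate.

    Hypothetical helper: files are kept in the order they appear in the config.
    """
    groups = defaultdict(list)
    for sample in cfg["samples"]:
        for entry in sample["files"]:
            for column in columns:
                key = (sample["sample_name"], column, entry["x"])
                groups[key].append(entry[column])
    return dict(groups)


def merged_path(sample, column, x):
    """Hypothetical helper: output path following the layout in the question."""
    return f"merged_files/{sample}_{column}_{x}.txt"


groups = group_inputs(config)
for key, inputs in groups.items():
    print(merged_path(*key), "<-", inputs)
```

In a Snakefile, the same dictionary could presumably feed an input function such as lambda wildcards: groups[(wildcards.sample, wildcards.column, wildcards.x)] on a rule whose output is merged_files/{sample}_{column}_{x}.txt, with the list of merged_path(*key) values as the targets of rule all. Because the sample names themselves contain underscores, wildcard_constraints would likely be needed to keep the three wildcards unambiguous.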

Related

How to Handle Multiline record in Hive table

Json File :
{
"buyer": {
"legalBusinessName": "test1 Company","organisationIdentifications": [{ "type": "abcd",
"identification": "test.bb#tesr"
},
{
"type": "TXID","identification": "12345678"
}
]
},
"supplier": {
"legalBusinessName": "test Company",
"organisationIdentifications": [
{
"type":"abcd","identification": "test28#test"
}
]
},
"paymentRecommendationId": "1234-5678-9876-2212-123456",
"excludedRemittanceInformation": [],
"recommendedPaymentInstructions": [{
"executionDate": "2022-06-12",
"paymentMethod": "aaaa",
"remittanceInformation": {
"structured": [{
"referredDocumentInformation": [{
"type": "xxx",
"number": "12341234",
"relatedDate": "2022-06-12",
"paymentDueDate": "2022-06-12",
"referredDocumentAmount": {
"remittedAmount": 2600.5,
"duePayableAmount": 3000
}
}]
}]
}
}]
}
Create Table Statement:
CREATE EXTERNAL TABLE IF NOT EXISTS `test`.`test_rahul`
(`buyer` STRUCT< `legalBusinessName`:STRING, `organisationIdentifications`:STRUCT< `type`:STRING, `identification`:STRING>>,
`supplier` STRUCT< `legalBusinessName`:STRING, `organisationIdentifications`:STRUCT< `type`:STRING, `identification`:STRING>>,
`paymentRecommendationId` STRING, `recommendedPaymentInstructions` ARRAY< STRUCT< `executionDate`:STRING, `paymentMethod`:STRING,
`remittanceInformation`:STRUCT< `structured`:STRUCT< `referredDocumentInformation`:STRUCT< `type`:STRING,
`number`:STRING, `relatedDate`:STRING, `paymentDueDate`:STRING, `referredDocumentAmount`:STRUCT< `remittedAmount`:DOUBLE,
`duePayableAmount`:INT>>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "field.delim"=",","mapping.ts" = "number")
STORED AS textFILE LOCATION '/user/hdfs/Jsontest/';
If I write the JSON file data on a single row for each record, it works fine, but if a record spans multiple lines I get the error below.
Error message:
Error: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException:
Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1] (state=,code=0)
Can someone kindly suggest a fix? It looks like I need to add a line/field separator, but I can't decide what to add so that it also handles multi-line records the way Spark does, i.e. .option("multiline", true).
It seems that the JSON SerDe in Hive does not support multi-line records. You might need to flatten each JSON record onto a single line, in the following format:
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
...
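As a rough illustration of that flattening step, here is a minimal Python sketch that re-serializes a pretty-printed record as one compact line per record, using only the standard json module. The function name to_json_lines and the sample record are made up for the example; reading from and writing back to the actual files under the table location is left out.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact JSON on its own line (hypothetical helper)."""
    lines = (json.dumps(record, separators=(",", ":")) for record in records)
    return "\n".join(lines) + "\n"


# A multi-line (pretty-printed) record, shaped like the example above.
pretty = """
{
  "buyer": {
    "a": "1",
    "b": "2"
  },
  "c": "3"
}
"""

record = json.loads(pretty)
flattened = to_json_lines([record, record])
print(flattened, end="")
```

json.dumps never emits raw newlines unless an indent is requested, so each record ends up on exactly one line, which is the layout the answer above suggests.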

Conditional deletion from array of field A with condition on field B

Let's say I have a json with an array inside. Say that the elements of this array are objects with keys A and B. I would like to remove the B objects on the elements where A objects meet a certain condition.
For example, I would like to remove the B objects where A is greater than 5, transforming
{
"title": "myTitle",
"myArray": [
{
"A": 1,
"B": "foo"
},
{
"A": 4,
"B": "bar"
},
{
"A": 7,
"B": "barfoo"
},
{
"A": 9,
"B": "foobar"
}
]
}
into
{
"title": "myTitle",
"myArray": [
{
"A": 1,
"B": "foo"
},
{
"A": 4,
"B": "bar"
},
{
"A": 7
},
{
"A": 9
}
]
}
The task seems easy enough, and if I didn't have to keep the A's it would be a simple del(select..) thing. Surely there must be an elegant way to do this as well?
Thank you!
You can still use a del(select..) thing.
.myArray[] |= del(select(.A > 5) .B)
demo at jqplay.org
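For readers who want to check the intended semantics outside jq, the same transformation can be sketched in Python: for each element of myArray, drop the B key when A exceeds the threshold and keep everything else. The function name drop_b_where_a_exceeds is hypothetical, and this is only an equivalent of the filter's effect, not of how jq evaluates it.

```python
import copy


def drop_b_where_a_exceeds(doc, threshold=5):
    """Return a copy of doc where "B" is removed from items with "A" > threshold."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for item in result["myArray"]:
        if "A" in item and item["A"] > threshold:
            item.pop("B", None)
    return result


doc = {
    "title": "myTitle",
    "myArray": [
        {"A": 1, "B": "foo"},
        {"A": 4, "B": "bar"},
        {"A": 7, "B": "barfoo"},
        {"A": 9, "B": "foobar"},
    ],
}

cleaned = drop_b_where_a_exceeds(doc)
print(cleaned)
```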

KarateException Missing Property in path - JSON

I was trying to match a particular variable from the response, as shown below, but I'm getting an error saying KarateException Missing Property in path $['Odata']. My question is: how can we modify this so that we don't get this error?
Feature:
And match response.#odata.context.a.b contains '<b>'
Examples:
|b|
|b1 |
|b2 |
Response is
{
"#odata.context": "$metadata#Accounts",
"a": [
{
"c": 145729,
"b": "b1",
"d": "ON",
},
{
"c": 145729,
"b": "b2",
"d": "ON",
}
]
}
I think you are confused about the structure of your JSON. Also note that when a JSON key has special characters, you need to change the way you use it in path expressions. You can try pasting the snippet below into a new Scenario and see it work:
* def response =
"""
{
"#odata.context": "$metadata#Accounts",
"a": [
{
"c": 145729,
"b": "b1",
"d": "ON",
},
{
"c": 145729,
"b": "b2",
"d": "ON",
}
]
}
"""
* match response['#odata.context'] == '$metadata#Accounts'
* match response.a[0].b == 'b1'
* match response.a[1].b == 'b2'

jmespath flatten multiple hash values

Ideally, I want to write a query that returns a flat list output: ["abc", "bcd", "cde", "def"] from the following JSON sample:
{
"l_l": [
[1,2,3],
[4,5,6]
],
"l_h_l": [
{ "n": [10,2,3] },
{ "n": [4,5,60] }
],
"l_h_m": [
{
"n": {
"1234": "abc",
"2345": "bcd"
}
}, {
"n": {
"3456": "cde",
"4567": "def"
}
}
]
}
The closest I can get is l_h_m[].n.* which returns the contents that I want as an unflattened list of lists:
[
[
"abc",
"bcd"
],
[
"cde",
"def"
]
]
JMESPath lets you flatten lists of lists. The queries l_l[] and l_h_l[].n[] both return flattened results when the source JSON is structured that way.
It looks like your solution just needs one more flatten operator:
l_h_m[].n.*[]
returns
[
"abc",
"bcd",
"cde",
"def"
]
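To make the two steps of that query concrete (take the values of each "n" hash, then flatten one level), here is a plain-Python sketch of the same result. It does not use the jmespath library; the function name flatten_hash_values is hypothetical, and the sample is trimmed to the l_h_m key.

```python
def flatten_hash_values(doc):
    """Collect the values of every "n" hash under "l_h_m" into one flat list."""
    return [
        value
        for item in doc["l_h_m"]          # like l_h_m[]
        for value in item["n"].values()   # like .n.* followed by the flatten []
    ]


doc = {
    "l_h_m": [
        {"n": {"1234": "abc", "2345": "bcd"}},
        {"n": {"3456": "cde", "4567": "def"}},
    ]
}

flat = flatten_hash_values(doc)
print(flat)
```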

How to Append JSON Object in already created object in mysql json document

My object is
{
"name":"Testing",
"id": "hcig_3fe7cb00-e936-11e6-af69-a748c8cc89ad",
"belongsTo": {
"id": "69616d26-c3bb-405c-8c84-c51c091524b2",
"name": "test"
},
"locatedAt": {
"id": "49616d26-c3bb-405c-8c84-c51c091524b2",
"name":"Test"
} }
I want to merge one more object like
"obj":[{
"a": 123
}]
With the help of JSON_MERGE in the MySQL document store I am able to add the object,
but the result looks like
{
"name":"Tester",
"id": "hcig_3fe7cb00-e936-11e6-af69-a748c8cc89ad",
"belongsTo": {
"id": "69616d26-c3bb-405c-8c84-c51c091524b2",
"name": "test"
},
"locatedAt": {
"id": "49616d26-c3bb-405c-8c84-c51c091524b2",
"name":"Test"
},{
"obj":[{
"a": 123
}]
}}
I want my object to be as
{
"name": "Tester",
"id": "hcig_3fe7cb00-e936-11e6-af69-a748c8cc89ad",
"belongsTo": {
"id": "69616d26-c3bb-405c-8c84-c51c091524b2",
"name": "test"
},
"locatedAt": {
"id": "49616d26-c3bb-405c-8c84-c51c091524b2",
"name": "Test"
},
"obj": [{
"a": 123
}]}
Any idea how to add the object in the above manner using JSON functions in MySQL?
Use lodash (https://lodash.com/) for a recursive merge:
lodash.merge(targetObj, sourceObj);
Or if you have programmatic access:
targetObj.obj = sourceObj;
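The effect of such a merge can be sketched in Python as well. This is a simplified stand-in, not lodash and not a MySQL JSON function: the hypothetical deep_merge below recurses into nested objects and overwrites everything else, and it assumes the extra data is wrapped in a complete object ({"obj": [...]}) so that its key lands at the top level of the target.

```python
def deep_merge(target, source):
    """Recursively merge source into target (simplified, hypothetical helper).

    Nested dicts are merged key by key; any other value in source
    overwrites the value in target.
    """
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = value
    return target


target_obj = {
    "name": "Tester",
    "id": "hcig_3fe7cb00-e936-11e6-af69-a748c8cc89ad",
    "belongsTo": {"id": "69616d26-c3bb-405c-8c84-c51c091524b2", "name": "test"},
    "locatedAt": {"id": "49616d26-c3bb-405c-8c84-c51c091524b2", "name": "Test"},
}
source_obj = {"obj": [{"a": 123}]}

merged = deep_merge(target_obj, source_obj)
print(merged)
```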