How to handle multiline JSON records in a Hive table

JSON file:
{
"buyer": {
"legalBusinessName": "test1 Company","organisationIdentifications": [{ "type": "abcd",
"identification": "test.bb#tesr"
},
{
"type": "TXID","identification": "12345678"
}
]
},
"supplier": {
"legalBusinessName": "test Company",
"organisationIdentifications": [
{
"type":"abcd","identification": "test28#test"
}
]
},
"paymentRecommendationId": "1234-5678-9876-2212-123456",
"excludedRemittanceInformation": [],
"recommendedPaymentInstructions": [{
"executionDate": "2022-06-12",
"paymentMethod": "aaaa",
"remittanceInformation": {
"structured": [{
"referredDocumentInformation": [{
"type": "xxx",
"number": "12341234",
"relatedDate": "2022-06-12",
"paymentDueDate": "2022-06-12",
"referredDocumentAmount": {
"remittedAmount": 2600.5,
"duePayableAmount": 3000
}
}]
}]
}
}]
}
Create table statement:
CREATE EXTERNAL TABLE IF NOT EXISTS `test`.`test_rahul`
(`buyer` STRUCT< `legalBusinessName`:STRING, `organisationIdentifications`:STRUCT< `type`:STRING, `identification`:STRING>>,
`supplier` STRUCT< `legalBusinessName`:STRING, `organisationIdentifications`:STRUCT< `type`:STRING, `identification`:STRING>>,
`paymentRecommendationId` STRING, `recommendedPaymentInstructions` ARRAY< STRUCT< `executionDate`:STRING, `paymentMethod`:STRING,
`remittanceInformation`:STRUCT< `structured`:STRUCT< `referredDocumentInformation`:STRUCT< `type`:STRING,
`number`:STRING, `relatedDate`:STRING, `paymentDueDate`:STRING, `referredDocumentAmount`:STRUCT< `remittedAmount`:DOUBLE,
`duePayableAmount`:INT>>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "field.delim"=",","mapping.ts" = "number")
STORED AS textFILE LOCATION '/user/hdfs/Jsontest/';
If I write the JSON file with each record on a single line, it works fine, but if a record spans multiple lines I get the error below.
Error message:
Error: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException:
Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1] (state=,code=0)
Can someone kindly suggest a fix? It looks like I need to add a line/field separator, but I can't decide what to add so that Hive handles multiline records the same way Spark does, i.e. .option("multiline", true).

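It seems the JSON SerDe in Hive cannot handle multi-line records: with STORED AS TEXTFILE, each line of the file is read as one row, so the SerDe is handed a fragment such as a lone { instead of a complete object. You might need to flatten each JSON record onto a single line, as in the following format.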
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
...
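The flattening suggested above can be sketched in Python. This is a minimal illustration, not part of the original answer: the function name flatten_json_records and the sample data are made up for the example, and it assumes the input holds one or more pretty-printed JSON objects placed one after another.

```python
import json

def flatten_json_records(text):
    """Return one compact JSON line per top-level value found in text."""
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse exactly one JSON value and get the index where it ended.
        record, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(record, separators=(",", ":")))
    return lines

# Hypothetical sample: two multi-line records in one file.
sample = """
{
  "buyer": {
    "legalBusinessName": "test1 Company"
  },
  "paymentRecommendationId": "1234"
}
{
  "buyer": {"legalBusinessName": "test Company"},
  "paymentRecommendationId": "5678"
}
"""

flat = flatten_json_records(sample)
for line in flat:
    print(line)
```

Writing the printed lines to a file under the table's LOCATION should give Hive one complete JSON object per row.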

Related

Snakemake: multi-level JSON parsing

I've a json configuration file that looks like:
{
"projet_name": "Project 1",
"samples": [
{
"sample_name": "Sample_A",
"files":[
{
"a": "file_A_a1.txt",
"b": "file_A_a2.txt",
"x": "x1"
},
{
"a": "file_A_b1.txt",
"b": "file_A_b2.txt",
"x": "x1"
},
{
"a": "file_A_c1.txt",
"b": "file_A_c2.txt",
"x": "x2"
}
]
},
{
"sample_name": "Sample_B",
"files":[
{
"a": "file_B_a1.txt",
"b": "file_B_a2.txt",
"x": "x1"
},
{
"a": "file_B_b1.txt",
"b": "file_B_b2.txt",
"x": "x1"
}
]
}]
}
I'm currently writing a Snakemake file to process this JSON file. The idea is, for each sample (e.g. Sample_A, Sample_B), to concatenate the files that have the same "x" entry. For example, in Sample_A I would like to concatenate the "a" files file_A_a1.txt and file_A_b1.txt, as they have the same "x" entry, and likewise the "b" files file_A_a2.txt and file_A_b2.txt. file_A_c1.txt and file_A_c2.txt will not be concatenated with other files, as they have a unique "x". In the end I would like a structure like this:
merged_files/Sample_A_a_x1.txt
merged_files/Sample_A_b_x1.txt
merged_files/Sample_A_a_x2.txt
merged_files/Sample_A_b_x2.txt
merged_files/Sample_B_a_x1.txt
merged_files/Sample_B_b_x1.txt
My issue is grouping the files that share the same "sample_name" and the same "x". Any suggestions?
Thank you
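One possible way to do the grouping is in plain Python, which a Snakefile can run directly. This is only a sketch: the helper group_files and the tuple key (sample, kind, x) are names chosen for illustration, and the config dict is a copy of the JSON above as it would look once loaded.

```python
from collections import defaultdict

# The configuration from the question, as a Python dict.
config = {
    "projet_name": "Project 1",
    "samples": [
        {"sample_name": "Sample_A", "files": [
            {"a": "file_A_a1.txt", "b": "file_A_a2.txt", "x": "x1"},
            {"a": "file_A_b1.txt", "b": "file_A_b2.txt", "x": "x1"},
            {"a": "file_A_c1.txt", "b": "file_A_c2.txt", "x": "x2"},
        ]},
        {"sample_name": "Sample_B", "files": [
            {"a": "file_B_a1.txt", "b": "file_B_a2.txt", "x": "x1"},
            {"a": "file_B_b1.txt", "b": "file_B_b2.txt", "x": "x1"},
        ]},
    ],
}

def group_files(config):
    """Map (sample_name, kind, x) to the list of files to concatenate."""
    groups = defaultdict(list)
    for sample in config["samples"]:
        for entry in sample["files"]:
            for kind in ("a", "b"):
                key = (sample["sample_name"], kind, entry["x"])
                groups[key].append(entry[kind])
    return dict(groups)

groups = group_files(config)

# One merged output file per group.
targets = [f"merged_files/{s}_{kind}_{x}.txt" for (s, kind, x) in groups]
for target in targets:
    print(target)
```

In a Snakefile, targets could feed the first rule's input, and an input function could look up groups[(wildcards.sample, wildcards.kind, wildcards.x)] to find the files to concatenate for each output; the wildcard names here are assumptions.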

How to combine columns in Spark as JSON in Scala

I have a variable constructed as follows by extracting data using Spark SQL:
{
"resourceType" : "Test1",
"count" : 10,
"entry": [{
"id": "112",
"gender": "female",
"birthDate": 1213999
}, {
"id": "urn:uuid:002e27cf-3cae-4393-89c5-1b78050d9428",
"resourceType": "Encounter"
}]
}
I want the output in the following format:
{
"resourceType" : "Test1",
"count" : 10,
"entry" :[
"resource" :{
"id": "112",
"gender": "female",
"birthDate": 1213999
},
"resource" :{
"id": "urn:uuid:002e27cf-3cae-4393-89c5-1b78050d9428",
"resourceType": "Encounter"
}]
}
I am new to Scala :) and would appreciate help with this.
EDIT: Adding the Scala code that creates the JSON:
val bundle = endresult.groupBy("id").agg(count("*") as "total",collect_list("resource") as "entry").
withColumn("resourceType", lit("Bundle")).
drop("id").
select(to_json(struct("resourceType","entry"))).
map(row => row.getString(0).
replace("\"entry\":[\"{", "\"entry\":[{").
replace("}\"]}","}]}"). // Should be at the end of the string ONLY (we might switch to regex instead)
replace("}\",\"{","},{").
replace("\\\"", "\"")
)

How to append a JSON object to an existing object in a MySQL JSON document

My object is
{
"name":"Testing",
"id": "hcig_3fe7cb00-e936-11e6-af69-a748c8cc89ad",
"belongsTo": {
"id": "69616d26-c3bb-405c-8c84-c51c091524b2",
"name": "test"
},
"locatedAt": {
"id": "49616d26-c3bb-405c-8c84-c51c091524b2",
"name":"Test"
} }
I want to merge one more object like
"obj":[{
"a": 123
}]
With the help of JSON_MERGE in the MySQL document store I am able to add the object.
But the result looks like
{
"name":"Tester",
"id": "hcig_3fe7cb00-e936-11e6-af69-a748c8cc89ad",
"belongsTo": {
"id": "69616d26-c3bb-405c-8c84-c51c091524b2",
"name": "test"
},
"locatedAt": {
"id": "49616d26-c3bb-405c-8c84-c51c091524b2",
"name":"Test"
},{
"obj":[{
"a": 123
}]
}}
I want my object to be as
{
"name": "Tester",
"id": "hcig_3fe7cb00-e936-11e6-af69-a748c8cc89ad",
"belongsTo": {
"id": "69616d26-c3bb-405c-8c84-c51c091524b2",
"name": "test"
},
"locatedAt": {
"id": "49616d26-c3bb-405c-8c84-c51c091524b2",
"name": "Test"
},
"obj": [{
"a": 123
}]}
Any idea how to add the object in the above manner using JSON functions in MySQL?
Use lodash for a recursive deep copy - https://lodash.com/
lodash.merge(targetObj, sourceObj);
Or if you have programmatic access:
targetObj.obj = sourceObj;

What's the standard way of defining an empty object in JSON

I have an issue with my application. It returns a JSON file containing an array of objects. For a key whose value is an object in one element of the array, the application writes an empty string in another element where the object is empty. Please see the value of the key "b" in the example.
For example:
{
"result": [{
"a": "1",
"b": {
"c1": "31",
"c2": "32"
}
}, {
"a": "5",
"b": ""
}
]
}
I want to know if that is a correct way of defining the key "b" as an empty object.
Thanks in advance!!
An empty object is defined by {}:
"b": {}
I.e. use the usual object delimiters but don't add any key-values.
What you defined is an empty string.
In JSON, an object is defined with { }, which is exactly what you would represent an empty object as.
{
"result": [
{
"a": "1",
"b": {
"c1": "31",
"c2": "32"
}
}, {
"a": "5",
"b": { }
}
]
}
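The difference between the two forms shows up as soon as the document is parsed. A small Python check, added here only as an illustration (the variable names are made up):

```python
import json

# "b" as an empty object versus "b" as an empty string.
empty_object = json.loads('{"a": "5", "b": {}}')["b"]
empty_string = json.loads('{"a": "5", "b": ""}')["b"]

# The parser yields different types: a dict for {} and a str for "".
print(type(empty_object).__name__, type(empty_string).__name__)

# Serializing an empty dict produces {} again.
serialized = json.dumps({"a": "5", "b": {}})
print(serialized)
```

A consumer that expects "b" to always be an object can treat {} uniformly, whereas "" forces a type check on every element.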

Newline (line feed) in JSON data

I have a JSON file like this:
{
VALUES: [
{
"UTAGID": "SYSTEM_CHILLER1",
"tagName": "P1",
"tagValue": "10",
"tagTime": "2015-07-23T14:29:30.731Z",
"tagQuality": "128"
},
{
"UTAGID": "SYSTEM_CHILLER1",
"tagName": "P1",
"tagValue": "10",
"tagTime": "2015-07-23T14:29:30.731Z",
"tagQuality": "128"
},
{
"UTAGID": "SYSTEM_CHILLER1",
"tagName": "P1",
"tagValue": "10",
"tagTime": "2015-07-23T14:29:30.731Z",
"tagQuality": "128"
}
]
};
where each record is not on a single line. I'm able to retrieve values from a Hive table based on this file if each record appears on a single line; otherwise it shows null values.
What could be the reason? How do I insert a line feed after each record?
Thanks
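One way to put each record on its own line is to rewrite the file outside Hive. The sketch below is an assumption-laden illustration: it assumes the file is valid JSON (the VALUES key quoted and no trailing semicolon), and the function name records_to_lines is hypothetical.

```python
import json

def records_to_lines(document, key="VALUES"):
    """Serialize each element of document[key] as one JSON line."""
    data = json.loads(document)
    return "".join(json.dumps(record) + "\n" for record in data[key])

# Shortened sample in the shape of the file above (key quoted, no ';').
sample = """
{
  "VALUES": [
    {
      "UTAGID": "SYSTEM_CHILLER1",
      "tagName": "P1",
      "tagValue": "10"
    },
    {
      "UTAGID": "SYSTEM_CHILLER1",
      "tagName": "P1",
      "tagValue": "10"
    }
  ]
}
"""

ndjson = records_to_lines(sample)
print(ndjson, end="")
```

With this layout the table columns would describe a single record (UTAGID, tagName, and so on) rather than the surrounding VALUES array.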