Unable to parse nested JSON in Logstash

My application generates the mulog entries shown below, which are nested JSON logs. When I try to parse them in Kibana, the JSON parse fails. Here is a log sample:
2022-08-04T12:43:03.977Z {"tags":
{"caller":"sphe",
"job-id":"1",
"mulog/duration":"3180930",
"mulog/namespace":"tool.utilities.db",
"mulog/outcome":":ok",
"user-name":"Pol",
"type":":execute!",
"app-name":"kan",
"mulog/parent-trace":"_YznrMCc",
"user-id":"52-7d4128fb7cb7",
"sql":"SELECT data FROM kan.material_estimate_history WHERE job_id = '167aa1cc' ",
"result":"[]",
"within-tx":"false",
"mulog/root-trace":"S0yn8jclLsmNVyKpH",
"mulog/timestamp":"1659616983977",
"uri":"/api/kan/material-estimates/job/f14b167aa1cc",
"mulog/trace-id":"kI4grnAMe4bGmFc_aX",
"request-method":":get",
"mulog/event-name":":kan=.source.db.material-estimates/find-history-by-job-id"},
"localEndpoint":{"serviceName":"kan"},
"name":"kan.source.db.material-estimates/find-history-by-job-id",
"traceId":"112721d07ecc9be",
"duration":3180,"id":"c90a259a2",
"kind":"SERVER","timestamp":1659616983977000,
"parentId":"dd7368"}

Related

How to add field within nested JSON when reading from/writing to Kafka via a Spark dataframe

I have a Spark (v3.0.1) job written in Java that reads JSON from Kafka, does some transformations, and then writes the result back to Kafka. For now, the incoming message structure in Kafka is something like {"catKey": 1}, and the output from the Spark job that is written back to Kafka is something like {"catKey":1,"catVal":"category-1"}. The code for processing the input data from Kafka is as follows:
DataFrameReader dfr = putSrcProps(spark.read().format("kafka"));
for (String key : srcProps.stringPropertyNames()) {
    dfr = dfr.option(key, srcProps.getProperty(key));
}
Dataset<Row> df = dfr.option("group.id", getConsumerGroupId())
        .load()
        .selectExpr("CAST(value AS STRING) as value")
        .withColumn("jsonData", from_json(col("value"), schemaHandler.getSchema()))
        .select("jsonData.*");
// transform df
df.toJSON().write().format("kafka").option("key", "val").save();
I want to change the message structure in Kafka. Now it should be of the format {"metadata": <whatever>, "payload": {"catKey": 1}}. While reading, we need only the contents of payload, so the dataframe stays the same. While writing back to Kafka, I first need to wrap the message in payload and add the metadata, so the output has to be of the format {"metadata": <whatever>, "payload": {"catKey":1,"catVal":"category-1"}}. I've tried manipulating the contents of the selectExpr and from_json calls, but no luck so far. Any pointer on how to achieve this would be very much appreciated.
To extract the content of payload from your JSON you can use get_json_object, and to create the new output you can use the built-in functions struct and to_json.
Given a DataFrame:
val df = Seq(("""{"metadata": "whatever", "payload": {"catKey": 1}}""")).toDF("value").as[String]
df.show(false)
+--------------------------------------------------+
|value |
+--------------------------------------------------+
|{"metadata": "whatever", "payload": {"catKey": 1}}|
+--------------------------------------------------+
Then create the new payload and metadata columns:
val df2 = df
  .withColumn("catVal", lit("category-1")) // whatever your logic is to fill this column
  .withColumn("payload",
    struct(
      get_json_object(col("value"), "$.payload.catKey").as("catKey"),
      col("catVal").as("catVal")
    )
  )
  .withColumn("metadata", get_json_object(col("value"), "$.metadata"))
  .select("metadata", "payload")
df2.show(false)
+--------+---------------+
|metadata|payload |
+--------+---------------+
|whatever|[1, category-1]|
+--------+---------------+
val df3 = df2.select(to_json(struct(col("metadata"), col("payload"))).as("value"))
df3.show(false)
+----------------------------------------------------------------------+
|value |
+----------------------------------------------------------------------+
|{"metadata":"whatever","payload":{"catKey":"1","catVal":"category-1"}}|
+----------------------------------------------------------------------+
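Note that get_json_object returns string columns, which is why catKey ends up as "1" in the final output even though it was the number 1 in the input; if the original type matters, cast that column before building the struct.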

Dump a list into a JSON file acceptable by Athena

I am creating a JSON file in an S3 bucket using the following code:
import datetime
import json
import boto3

s3 = boto3.client("s3")

def myconverter(o):
    if isinstance(o, datetime.datetime):
        return o.__str__()

s3.put_object(
    Bucket='sample-bucket',
    Key="sample.json",
    Body=json.dumps(whole_file, default=myconverter)
)
Here, the whole_file variable is a list.
A sample of the whole_file variable:
[{"sample_column1": "abcd","sample_column2": "efgh"},{"sample_column1": "ijkl","sample_column2": "mnop"}]
The output "sample.json" file that I get should be in the following format -
{"sample_column1": "abcd","sample_column2": "efgh"}
{"sample_column1": "ijkl","sample_column2": "mnop"}
The output "sample.json" that I am getting is -
[{"sample_column1": "abcd","sample_column2": "efgh"},{"sample_column1": "ijkl","sample_column2": "mnop"}]
What changes should be made to get each JSON object in a single line?
You can write each entry to the file on its own line, then upload the file to S3:
import json
whole_file = [{"sample_column1": "abcd","sample_column2": "efgh"},
              {"sample_column1": "ijkl","sample_column2": "mnop"}
             ]
with open("temp.json", "w") as temp:
    for record in whole_file:
        temp.write(json.dumps(record, default=str))
        temp.write("\n")
The output should look like this:
~ cat temp.json
{"sample_column1": "abcd", "sample_column2": "efgh"}
{"sample_column1": "ijkl", "sample_column2": "mnop"}
Then upload the file:
import boto3
s3 = boto3.client("s3")
s3.upload_file("temp.json", bucket, "whole_file.json")
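If you would rather not go through a temporary file, the same newline-delimited body can be built in memory and sent with put_object directly. A rough sketch under the same assumptions as above (the sample-bucket name and whole_file list are placeholders):
import json
import boto3

s3 = boto3.client("s3")

whole_file = [{"sample_column1": "abcd", "sample_column2": "efgh"},
              {"sample_column1": "ijkl", "sample_column2": "mnop"}]

# One JSON object per line -- the newline-delimited layout Athena expects.
body = "\n".join(json.dumps(record, default=str) for record in whole_file)

s3.put_object(Bucket="sample-bucket", Key="sample.json", Body=body.encode("utf-8"))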

I'm trying to use 'ffprobe' with Java or Groovy

As per my understanding, ffprobe provides file-related data in JSON format. I have installed ffprobe on my Ubuntu machine, but I don't know how to access the ffprobe JSON response from Java/Grails.
Expected response format:
{
  "format": {
    "filename": "/Users/karthick/Documents/videos/TestVideos/sample.ts",
    "nb_streams": 2,
    "nb_programs": 1,
    "format_name": "mpegts",
    "format_long_name": "MPEG-TS (MPEG-2 Transport Stream)",
    "start_time": "1.430800",
    "duration": "170.097489",
    "size": "80425836",
    "bit_rate": "3782576",
    "probe_score": 100
  }
}
This is my Groovy code:
def process = "ffprobe -v quiet -print_format json -show_format -show_streams HelloWorld.mpeg ".execute()
println "Found ${process.text}"
render process as JSON
I am able to get the Process object, but I am not able to get the JSON response.
Do I need to convert the Process object to a JSON object?
OUTPUT:
Found java.lang.UNIXProcess@75566697
org.codehaus.groovy.grails.web.converters.exceptions.ConverterException: Error converting Bean with class java.lang.UNIXProcess
Grails has nothing to do with this. Groovy can execute arbitrary shell commands in a very simplistic way:
"mkdir foo".execute()
Or for more advanced features, you might look into using ProcessBuilder. At the end of the day, you need to execute ffprobe and then capture the output stream of JSON to use in your app.
Groovy provides a simple way to execute command line processes. Simply
write the command line as a string and call the execute() method.
The execute() method returns a java.lang.Process instance.
println "ffprobe <options>".execute().text
[Source]
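Once the JSON text has been captured this way, a JSON library (Groovy ships one in the groovy.json package) can turn it into a map or object, which can then be rendered instead of the Process object itself.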

Json Type Provider: Parsing Valid Json Fails

I have the following code block in my REPL
#r "../packages/FSharp.Data.2.2.1/lib/net40/FSharp.Data.dll"
open FSharp.Data
[<Literal>]
let uri = "http://www.google.com/finance/option_chain?q=AAPL&output=json"
type OptionChain = JsonProvider<uri>
When I run it, FSI returns:
Error 1 The type provider 'ProviderImplementation.JsonProvider'
reported an error: Cannot read sample JSON from
'http://www.google.com/finance/option_chain?q=AAPL&output=json':
Invalid JSON starting at character 1, snippet =
---- {expiry:{y:2
----- json =
------ {expiry:{y:2015,m:5,d:8},expirations: [{y:2015,m:5,d:8},{y:2015,m:5,d:15},
This JSON is valid according to two other sites. Is this a bug in the type provider?
The output isn't valid JSON because some keys are not quoted.
{expiry:{y:2015,m:5,d:8},expirations:[{y:2015,m:5,d:8},{y:2015,m:5,d:15},{y:2015,m:5,d:22},{y:2015,m:5,d:29},{y:2015,m:6,d:5},{y:2015,m:6,d:12},{y:2015,m:6,d:19},{y:2015,m:6,d:26},{y:2015,m:7,d:17},{y:2015,m:8,d:21},{y:2015,m:10,d:16},{y:2016,m:1,d:15},{y:2017,m:1,d:20}],
puts:[{cid:"43623726334021",s:"AAPL150508P00085000",e:"OPRA",p:"-",c:"-",b:"-",a:"-",oi:"-",vol:"-",strike:"85.00",expiry:"May 8, 2015"},
...
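A quick way to confirm this independently of the type provider is to hand the snippet to any strict JSON parser; for example, in Python:
import json

snippet = '{expiry:{y:2015,m:5,d:8}}'   # keys are not quoted, as in the response above
try:
    json.loads(snippet)
except json.JSONDecodeError as err:
    # Rejected: JSON property names must be double-quoted strings
    print(err)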

Json Parsing in Apache Pig

I have the following JSON:
{"Name":"sampling","elementInfo":{"fraction":"3"},"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}
I found that we are able to load JSON into a Pig script:
A = LOAD 'data.json' USING PigJsonLoader();
But how do I parse the JSON in Apache Pig?
--Sampling.pig
--pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig -param delimiter="," -param fraction='0.05'
--Load data
inputdata = LOAD '$input' using PigStorage('$delimiter');
--Group data
groupedByAll = group inputdata all;
--output into hdfs
sampled = SAMPLE inputdata $fraction;
store sampled into '$output' using PigStorage('$delimiter');
Above is my Pig script.
How do I parse each element of the JSON in Apache Pig?
I need to take the above JSON as input, parse its source, delimiter, fraction, and output values, and pass them in as $input, $delimiter, $fraction, and $output respectively.
Please suggest how to do this.
Try this:
--Load data
inputdata = LOAD '/input.txt' using JsonLoader('Name:chararray,elementInfo:(fraction:chararray),destination:chararray,source:chararray');
--Group data
groupedByAll = group inputdata all;
store groupedByAll into '/OUT/pig' using PigStorage(',');
Now your output looks like this:
all,{(sampling1,(4),/user/sree/OUT1,/user/sree/foo1.txt),(sampling,(3),/user/sree/OUT,/user/sree/foo.txt)}
In the input file the fraction value ({"fraction":"3"}) is in double quotes, so I loaded fraction as a chararray; because of that the SAMPLE command can't be run directly, which is why I used the script above to get the result.
If you want to perform the sample operation, cast the fraction data to int and then you will get the result.