Unknown key for a START_OBJECT in [layers] - json

When trying to send my JSON-formatted tcpdump capture to Elasticsearch, I get the following error:
curl -X PUT --data-binary @myjson 'localhost:9200/_bulk?pretty'
{
"error" : {
"root_cause" : [
{
"type" : "parsing_exception",
"reason" : "Unknown key for a START_OBJECT in [layers].",
"line" : 1,
"col" : 95
}
],
"type" : "parsing_exception",
"reason" : "Unknown key for a START_OBJECT in [layers].",
"line" : 1,
"col" : 95
},
"status" : 400
}
The JSON file was obtained using tshark with the "-T json" option.
The JSON file was then modified using jq with the filter "{index: .[]}" and the -c option, since Elasticsearch requires each entry to fit on a single line.
I am using Elasticsearch 5.5.1 with the standard configuration.
jsonformatter marks the JSON object as valid.
A JSON object that produces the error looks as follows:
{"index":{"_index":"packets-2017-08-04","_type":"pcap_file","_score":null,"_source":{"layers":{"frame":{"frame.encap_type":"25","frame.time":"Aug 5, 2001 13:10:06.559762000 CEST","frame.offset_shift":"0.000000000","frame.time_epoch":"1501773006.559765000","frame.time_delta":"0.000000000","frame.time_delta_displayed":"0.000000000","frame.time_relative":"0.000000000","frame.number":"1","frame.len":"200","frame.cap_len":"200","frame.marked":"0","frame.ignored":"0","frame.protocols":"sll:ethertype:ip:tcp:data"},"sll":{"sll.pkttype":"4","sll.hatype":"65135","sll.halen":"0","sll.etype":"0x00000800"},"ip":{"ip.version":"4","ip.hdr_len":"20","ip.dsfield":"0x00000010","ip.dsfield_tree":{"ip.dsfield.dscp":"4","ip.dsfield.ecn":"0"},"ip.len":"184","ip.id":"0x000093f2","ip.flags":"0x00000002","ip.flags_tree":{"ip.flags.rb":"0","ip.flags.df":"1","ip.flags.mf":"0"},"ip.frag_offset":"0","ip.ttl":"64","ip.proto":"6","ip.checksum":"0x0000ef4b","ip.checksum.status":"2","ip.src":"0.0.00","ip.addr":"0.0.0.0","ip.src_host":"0.0.0.0","ip.host":"0.0.0.0","ip.dst":"0.0.0.0","ip.dst_host":"0.0.0.0","Source GeoIP: Germany":{"ip.geoip.src_country":"Germany","ip.geoip.country":"Germany","ip.geoip.src_city":"Frankfurt, 1","ip.geoip.city":"Berlin, 1","ip.geoip.src_asnum":"123","ip.geoip.asnum":"123","ip.geoip.src_lat":"701","ip.geoip.lat":"523,01","ip.geoip.src_lon":"2313,4","ip.geoip.lon":"12,13"},"Destination GeoIP: Germany":{"ip.geoip.dst_country":"Germany","ip.geoip.country":"Germany","ip.geoip.dst_asnum":"123","ip.geoip.asnum":"123","ip.geoip.dst_lat":"3321","ip.geoip.lat":"41","ip.geoip.dst_lon":"1","ip.geoip.lon":"2"}},"tcp":{"tcp.srcport":"41","tcp.dstport":"124","tcp.port":"234","tcp.stream":"3","tcp.len":"134","tcp.seq":"1","tcp.nxtseq":"133","tcp.ack":"4","tcp.hdr_len":"32","tcp.flags":"0x00000018","tcp.flags_tree":{"tcp.flags.res":"0","tcp.flags.ns":"0","tcp.flags.cwr":"0","tcp.flags.ecn":"0","tcp.flags.urg":"0","tcp.flags.ack":"1","tcp.flags.push":"1","tcp.flags.reset":"0","tcp.flags.syn":"0","tcp.flags.fin":"0","tcp.flags.str":"·······AP···"},"tcp.window_size_value":"223","tcp.window_size":"31","tcp.window_size_scalefactor":"-1","tcp.checksum":"0x0000b79c","tcp.checksum.status":"1","tcp.urgent_pointer":"0","tcp.options":"123","tcp.options_tree":{"No-Operation (NOP)":{"tcp.options.type":"1","tcp.options.type_tree":{"tcp.options.type.copy":"0","tcp.options.type.class":"0","tcp.options.type.number":"1"}},"Timestamps: TSval 1875055084, TSecr 5726840":{"tcp.option_kind":"8","tcp.option_len":"10","tcp.options.timestamp.tsval":"185084","tcp.options.timestamp.tsecr":"1116840"}},"tcp.analysis":{"tcp.analysis.bytes_in_flight":"123","tcp.analysis.push_bytes_sent":"133"}},"data":{"data.data":"01:01:02","data.len":"265"}}}}}
My question is: what is wrong with this JSON that makes Elasticsearch reject it?

This is not really a jq problem: "Unknown key for a START_OBJECT" is an Elasticsearch error, and the [layers] in the message is a hint that Elasticsearch choked on the object under that key.
Since the jq filter you specified is just {index: .[]}, jq does nothing to the part of the JSON that Elasticsearch is complaining about: it simply wraps each array element, metadata and _source alike, inside an index action. If your workflow expects jq to reshape that portion, you'll need to look at the data more closely and use a more sophisticated filter, as sketched below.
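For what it's worth, the _bulk endpoint expects each document as two newline-delimited JSON lines: an action line containing only metadata keys such as _index and _type, followed by the document body itself. A minimal sketch of that reshaping with jq, assuming the array layout that tshark -T json produces (the file names here are hypothetical):
jq -c '.[] | {index: {_index: ._index, _type: ._type}}, ._source' capture.json > bulk.ndjson
curl -s -H 'Content-Type: application/x-ndjson' -X POST 'localhost:9200/_bulk?pretty' --data-binary @bulk.ndjson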
For reference, the elasticsearch test suite contains an example of this particular error:
---
"junk in source fails":
  - do:
      catch: /Unknown key for a START_OBJECT in \[junk\]./
      reindex:
        body:
          source:
            junk: {}
Hope this helps.

Related

Invalid JSON while submitting spark submit job via NiFi

I am trying to submit a Spark job in which I set a date argument in the conf property, and I am running it through a script in NiFi. However, when I run the script I get an error.
Spark Submit Code in the script:
aws emr add-steps --cluster-id "$1" --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--jars","s3://tvsc-lumiq-edl/jars/ojdbc7.jar","--executor-memory","10g","--driver-memory","10g","--conf","spark.hadoop.yarn.timeline-service.enabled=false","--conf","currDate='\"$5\"'","--class",'\"$2\"','\"$3\"','\"$4\"'],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]' --region "$6"
After I run it, I get the error below:
ExecuteStreamCommand[id=5b08df5a-1f24-3958-30ca-2e27a6c4becf] Transferring flow file StandardFlowFileRecord[uuid=00f844ee-dbea-42a3-aba3-0edcabfc50a2,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1607082757752-507103, container=default, section=223], offset=29, length=-1],offset=0,name=6414901712887990,size=0] to nonzero status. Executable command /bin/bash ended in an error:
Error parsing parameter '--steps': Invalid JSON:
[{"Args":["spark-submit","--deploy-mode","cluster","--jars","s3://tvsc-lumiq-edl/jars/ojdbc7.jar","--executor-memory","10g","--driver-memory","10g","--conf","spark.hadoop.yarn.timeline-service.enabled=false","--conf","currDate="Fri
Where am I going wrong?
You can use JSONLint to validate your JSON, which makes it easier to see why it's wrong.
In your case, the final three values are wrapped in single quotes (') rather than double quotes (").
Your steps JSON should look like:
[{
  "Args": [
    "spark-submit",
    "--deploy-mode",
    "cluster",
    "--jars",
    "s3://tvsc-lumiq-edl/jars/ojdbc7.jar",
    "--executor-memory",
    "10g",
    "--driver-memory",
    "10g",
    "--conf",
    "spark.hadoop.yarn.timeline-service.enabled=false",
    "--conf",
    "currDate='\"$5\"'",
    "--class",
    "\"$2\"",
    "\"$3\"",
    "\"$4\""
  ],
  "Type": "CUSTOM_JAR",
  "ActionOnFailure": "CONTINUE",
  "Jar": "command-runner.jar",
  "Properties": "",
  "Name": "Spark application"
}]
Specifically, these 3 lines:
"\"$2\"",
"\"$3\"",
"\"$4\""
Instead of the original:
'\"$2\"',
'\"$3\"',
'\"$4\"'

Pull value in jq with escaped vars

I have a JSON document that I am trying to process.
I am using jq and can't for the life of me get the required output.
A simple example is below.
{
"message" :"{ \"foo\": \"42\", \"bar\": \"less interesting data\"}"
}
My build-up:
jq '."message"' applied to
{
"message" :{"foo": "42", "bar": "less interesting data"}
}
gives
{
"foo": "42",
"bar": "less interesting data"
}
."message"."bar"
gives
"less interesting data"
So
{
"message" :"{"foo": "42", "bar": "less interesting data"}"
}
FAILS as JSON invalid
{
"message" :"{\"foo\": \"42\", \"bar\": \"less interesting data\"}"
}
FAILS with 'jq: error (at <stdin>:3): Cannot index string with string "bar"
exit status 5'
I have tried a whole bunch of differing jq queries (i won't waste your time listing them)
So I would like some advice on how I'd get "bar" from the JSON.
It's not a duplicate of "convert string to JSON", as that only leads you to the idea of conversion. Without this question, you'd never know the answer is to use fromjson.
Use the fromjson builtin to turn the embedded strings back into JSON values. So, given the content below
{
"message": "{ \"foo\": \"42\", \"bar\": \"less interesting data\" }"
}
all you need to do to extract bar is
jq '."message"|fromjson|.bar' file
"less interesting data"
To print the output without the quotes, use the -r/--raw-output flag, which emits text in raw format. As noted in the comments, fromjson.bar should also work as expected.
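For example, combining the two:
jq -r '.message | fromjson | .bar' file
less interesting data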

Why doesn't Elasticsearch Ingest accept a grok pattern that Logstash does?

I have the following grok pattern that works in Logstash and in the Grok debugger in Kibana.
\[%{TIMESTAMP_ISO8601:req_time}\] %{IP:client_ip} (?:%{IP:forwarded_for}|\(-\)) (?:%{QS:request}|-) %{NUMBER:response_code:int} %{WORD}:%{NUMBER:request_length:int} %{WORD}:%{NUMBER:body_bytes_sent:int} %{WORD}:(?:%{QS:http_referer}|-) %{WORD}:(?:%{QS:http_user_agent}|-) (%{WORD}:(\")?(%{NUMBER:request_time:float})(\")?)?
I am trying to create a new ingest pipeline via the PUT method, but I get an error that contains:
"type": "parse_exception",
"reason": "Failed to parse content to map",
"caused_by": {
"type": "i_o_exception",
"reason": "Unrecognized character escape '[' (code 91)\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper#61326735; line: 7, column: 25]"
}
Elasticsearch requires that grok patterns in pipeline definitions submitted via the PUT method be properly escaped JSON strings, while Logstash configuration uses its own escaping rules.
That means preceding literal brackets with double backslashes (\\[) and escaped double quotes with triple backslashes (\\\"). The working pattern (after running it through a JSON escaping tool) is:
\\[%{TIMESTAMP_ISO8601:req_time}\\] %{IP:client_ip} (?:%{IP:forwarded_for}|\\(-\\)) (?:%{QS:request}|-) %{NUMBER:response_code:int} %{WORD}:%{NUMBER:request_length:int} %{WORD}:%{NUMBER:body_bytes_sent:int} %{WORD}:(?:%{QS:http_referer}|-) %{WORD}:(?:%{QS:http_user_agent}|-) (%{WORD}:(\\\")?(%{NUMBER:request_time:float})(\\\")?)?
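To show where the escaped pattern ends up, here is a trimmed sketch of the PUT request (the pipeline name is hypothetical and the pattern is shortened; the full escaped pattern above drops into the patterns array in the same way):
curl -X PUT 'localhost:9200/_ingest/pipeline/access-log?pretty' -H 'Content-Type: application/json' -d '
{
  "description": "parse access log lines",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["\\[%{TIMESTAMP_ISO8601:req_time}\\] %{IP:client_ip} %{NUMBER:response_code:int}"]
      }
    }
  ]
}'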

Why does MongoDB produce invalid JSON? The ObjectId is not quoted, breaks jq parser

I've searched the forum and seen many folks with a similar problem, but not this exact one.
I think my question is the simplest form, and there must be something I'm missing because no one is asking it.
I have a shell script that calls a MongoDB script and gets the results in a file. I then want to parse that file with jq.
jq is breaking because the output from the query is not valid JSON. The offender is the ObjectId. I'm at a total loss as to how something that's "ALL JSON ALL THE TIME" produces invalid JSON.
I'm pretty sure there's something fundamental that I'm missing.
I have a file called MyMongoScript.js. Its contents look like this:
db.WorkflowJobs.find().sort({"STATUS":1}).forEach(printjson)
I call MyMongoScript.js with the following command:
mongo -u $MONGO_U -p $MONGO_P $MONGO_DB -quiet --eval "var DATE_TO_RUN=\"$DATE_TO_RUN\"" MyMongoScript.js
Here's the results to STDOUT:
{
"_id" : ObjectId("52816fd50bc9efc3e6d8e33f"),
"WORK_TYPE" : "HIVE",
"Script_Name" : "upload_metrics_LANDING_to_HIST.sql",
"Stop_On_Fail" : true,
"STATUS" : "READY",
"START_TS" : "NULL",
"END_TS" : "NULL",
"DURATION" : "NULL",
"INS_TS" : "Mon Nov 11 2013 16:01:25 GMT-0800 (PST)"
}
Here's what jsonlint.com says about it:
Parse error on line 2:
{ "_id": ObjectId("52816fd50b
------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['
Any help much appreciated.
Try this for your MyMongoScript.js:
db.WorkflowJobs.find().sort({"STATUS":1}).forEach(function(myDoc){myDoc._id=myDoc._id.valueOf();print(tojson(myDoc))});
The key is valueOf(), which converts the ObjectId to a String.
EDITED Left out a paren.
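With _id reduced to a plain string, the script's output is valid JSON and jq can consume it directly, for example (the output file name is hypothetical):
mongo -u "$MONGO_U" -p "$MONGO_P" "$MONGO_DB" --quiet MyMongoScript.js > workflow_jobs.json
# prints the STATUS of each document, one per line
jq -r '.STATUS' workflow_jobs.json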

mongoexport JSON assertion: 10340 Failure parsing JSON string

I'm trying to export a CSV file list from MongoDB and save the output file to my directory, /home/asaj/. The output file should have the following columns: name, file_name, d_start and d_end.
The query should filter data with status equal to "FU" or "FD", and d_end on or after Dec. 10, 2012.
In the MongoDB shell, the query works properly. The query below is limited to one result:
> db.Samples.find({ $or : [ { status : 'FU' }, { status : 'FD'} ], d_end : { $gte : ISODate("2012-12-10T00:00:00.000Z") } }, {_id: 0, name: 1, file_name: 1, d_start: 1, d_end: 1}).limit(1).toArray();
[
{
"name" : "sample"
"file_name" : "sample.jpg",
"d_end" : ISODate("2012-12-10T05:1:57.879Z"),
"d_start" : ISODate("2012-12-10T02:31:34.560Z"),
}
]
>
On the CLI, the mongoexport command looks like this:
mongoexport -d maindb -c Samples -f "name, file_name, d_start, d_end" -q "{'\$or' : [ { 'status' : 'FU' }, { 'status' : 'FD'} ] , 'd_end' : { '\$gte' : ISODate("2012-12-10T00:00:00.000Z") } }" --csv -o "/home/asaj/currentlist.csv"
But i always ended up with this error:
connected to: 127.0.0.1
Wed Dec 19 16:58:17 Assertion: 10340:Failure parsing JSON string near: , 'd_end
0x5858b2 0x528cb4 0x52902e 0xa9a631 0xa93e4d 0xa97de2 0x31b441ecdd 0x4fd289
mongoexport(_ZN5mongo11msgassertedEiPKc+0x112) [0x5858b2]
mongoexport(_ZN5mongo8fromjsonEPKcPi+0x444) [0x528cb4]
mongoexport(_ZN5mongo8fromjsonERKSs+0xe) [0x52902e]
mongoexport(_ZN6Export3runEv+0x7b1) [0xa9a631]
mongoexport(_ZN5mongo4Tool4mainEiPPc+0x169d) [0xa93e4d]
mongoexport(main+0x32) [0xa97de2]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x31b441ecdd]
mongoexport(__gxx_personality_v0+0x3c9) [0x4fd289]
assertion: 10340 Failure parsing JSON string near: , 'd_end
The error points at ", 'd_end'" in the mongoexport CLI. I'm not sure whether it is a JSON syntax error, because the query works in the MongoDB shell.
Please help.
After asking someone who knows MongoDB better than I do, we found out that the problem is the
ISODate("2012-12-10T00:00:00.000Z")
We found the answer on this question: mongoexport JSON parsing error
To resolve this error, we first convert the date with PHP's strtotime():
php > echo strtotime("12/10/2012");
1355126400
Next, multiply the strtotime result by 1000. The date will then look like this:
1355126400000
Lastly, change ISODate("2012-12-10T00:00:00.000Z") to new Date(1355126400000) in the mongoexport command.
Now, the CLI mongoexport looks like this and it works:
mongoexport -d maindb -c Samples -f "id,file_name,d_start,d_end" -q "{'\$or' : [ { 'status' : 'FU' }, { 'status' : 'FD'} ] , 'd_end' : { '\$gte' : new Date(1355126400000) } }" --csv -o "/home/asaj/listupdate.csv"
Note: remove the spaces between field names in the -f or --fields option.
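If PHP isn't handy, the same millisecond value can be computed in the shell; here is a sketch using GNU date (note this gives midnight UTC, whereas the strtotime call above used the machine's local timezone, which is why the numbers differ):
# seconds since the epoch for 2012-12-10 00:00:00 UTC, multiplied by 1000
echo $(( $(date -u -d "2012-12-10" +%s) * 1000 ))
1355097600000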
I know this has little to do with the original question, but the title of this post brought it up in Google, and since I was getting the exact same error I'll add an answer. Hopefully it helps someone.
My issue was adding a MongoId query for _id to a mongoexport console command on Windows. Here's the error:
Assertion: 10340:Failure parsing JSON string near: _id
The problem ended up being that I needed to wrap the JSON query in double quotes, and the ObjectId had to be in double quotes (not single!), so I had to escape those quotes. Here's the final query that worked, for future reference:
mongoexport -u USERNAME -pPASSWORD -d DATABASE -c COLLECTION
--query "{_id : ObjectId(\"5148894d98981be01e000011\")}"