Why doesn't Elasticsearch Ingest accept a grok pattern that Logstash does?

I have the following grok pattern that works in Logstash and in the Grok debugger in Kibana.
\[%{TIMESTAMP_ISO8601:req_time}\] %{IP:client_ip} (?:%{IP:forwarded_for}|\(-\)) (?:%{QS:request}|-) %{NUMBER:response_code:int} %{WORD}:%{NUMBER:request_length:int} %{WORD}:%{NUMBER:body_bytes_sent:int} %{WORD}:(?:%{QS:http_referer}|-) %{WORD}:(?:%{QS:http_user_agent}|-) (%{WORD}:(\")?(%{NUMBER:request_time:float})(\")?)?
I am trying to create a new ingest pipeline via the PUT method, but I get an error that contains:
"type": "parse_exception",
"reason": "Failed to parse content to map",
"caused_by": {
"type": "i_o_exception",
"reason": "Unrecognized character escape '[' (code 91)\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper#61326735; line: 7, column: 25]"
}

Elasticsearch requires that grok patterns in pipelines submitted via the PUT method be properly JSON-escaped, whereas Logstash accepts the pattern as-is.
That means prefixing brackets with double backslashes (\\[) and double quotes with triple backslashes (\\\"). The working pattern (after running it through a JSON escaping tool) is:
\\[%{TIMESTAMP_ISO8601:req_time}\\] %{IP:client_ip} (?:%{IP:forwarded_for}|\\(-\\)) (?:%{QS:request}|-) %{NUMBER:response_code:int} %{WORD}:%{NUMBER:request_length:int} %{WORD}:%{NUMBER:body_bytes_sent:int} %{WORD}:(?:%{QS:http_referer}|-) %{WORD}:(?:%{QS:http_user_agent}|-) (%{WORD}:(\\\")?(%{NUMBER:request_time:float})(\\\")?)?
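For illustration, a minimal sketch of the PUT request with the escaped pattern in place (the pipeline name, description, and source field here are hypothetical; only the escaping is taken from the answer above):

curl -X PUT 'localhost:9200/_ingest/pipeline/access_log' -H 'Content-Type: application/json' -d '
{
  "description": "parse access log lines",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["\\[%{TIMESTAMP_ISO8601:req_time}\\] %{IP:client_ip} (?:%{IP:forwarded_for}|\\(-\\)) (?:%{QS:request}|-) %{NUMBER:response_code:int} %{WORD}:%{NUMBER:request_length:int} %{WORD}:%{NUMBER:body_bytes_sent:int} %{WORD}:(?:%{QS:http_referer}|-) %{WORD}:(?:%{QS:http_user_agent}|-) (%{WORD}:(\\\")?(%{NUMBER:request_time:float})(\\\")?)?"]
      }
    }
  ]
}'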

Related

Invalid JSON while submitting spark submit job via NiFi

I am trying to submit a Spark job that sets a date argument in a conf property, running it through a script in NiFi. However, when I run the script I get an error.
Spark Submit Code in the script:
aws emr add-steps --cluster-id "$1" --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--jars","s3://tvsc-lumiq-edl/jars/ojdbc7.jar","--executor-memory","10g","--driver-memory","10g","--conf","spark.hadoop.yarn.timeline-service.enabled=false","--conf","currDate='\"$5\"'","--class",'\"$2\"','\"$3\"','\"$4\"'],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]' --region "$6"
and after I run it, I get the below error:
ExecuteStreamCommand[id=5b08df5a-1f24-3958-30ca-2e27a6c4becf] Transferring flow file StandardFlowFileRecord[uuid=00f844ee-dbea-42a3-aba3-0edcabfc50a2,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1607082757752-507103, container=default, section=223], offset=29, length=-1],offset=0,name=6414901712887990,size=0] to nonzero status. Executable command /bin/bash ended in an error:
Error parsing parameter '--steps': Invalid JSON:
[{"Args":["spark-submit","--deploy-mode","cluster","--jars","s3://tvsc-lumiq-edl/jars/ojdbc7.jar","--executor-memory","10g","--driver-memory","10g","--conf","spark.hadoop.yarn.timeline-service.enabled=false","--conf","currDate="Fri
Where am I going wrong?
You can use JSONLint to validate your JSON, which makes it easier to see why it's wrong.
In your case, you are wrapping the final three values in single quotes ' rather than double quotes "
Your steps JSON should look like:
[{
  "Args": [
    "spark-submit",
    "--deploy-mode",
    "cluster",
    "--jars",
    "s3://tvsc-lumiq-edl/jars/ojdbc7.jar",
    "--executor-memory",
    "10g",
    "--driver-memory",
    "10g",
    "--conf",
    "spark.hadoop.yarn.timeline-service.enabled=false",
    "--conf",
    "currDate='\"$5\"'",
    "--class",
    "\"$2\"",
    "\"$3\"",
    "\"$4\""
  ],
  "Type": "CUSTOM_JAR",
  "ActionOnFailure": "CONTINUE",
  "Jar": "command-runner.jar",
  "Properties": "",
  "Name": "Spark application"
}]
Specifically, these 3 lines:
"\"$2\"",
"\"$3\"",
"\"$4\""
Instead of the original:
'\"$2\"',
'\"$3\"',
'\"$4\"'

Ruby Read file without escaping the characters

File Content:
file.txt:
{ 'a' : 'b"c\g' }
Need to parse this JSON.
read_file = File.read('file.txt')
This read_file string is of the form: "{ 'a' : 'b\"c\\g'}\n".
While parsing the JSON:
JSON::ParserError: 757: unexpected token at '{ 'a' : 'b"c\g'}
from /usr/share/ruby/json/common.rb:155:in `parse'
from /usr/share/ruby/json/common.rb:155:in `parse'
from (irb):21
from /usr/bin/irb:12:in `<main>`
The file could contain any escape sequences or wildcards, but it will always be in JSON format.
How can I parse such a JSON file into a Ruby Hash?
This is because the JSON in your text file is invalid. Here is a similar question about parsing a JSON string that uses single quotes. For this to be valid JSON it needs:
Double quotes
Escaped special characters
Without this you'll be unable to parse it, as it won't be recognized as valid JSON. Try parsing this text instead:
{ "a": "b\"c\\g" } => {"a"=>"b\"c\\g"}

jq construct with value strings spanning multiple lines

I am trying to form a JSON construct using jq that should ideally look like the below:
{
  "api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
  "app_guid": "XXXXXXXXXXXXXX",
  "time_start": 1508677200,
  "time_end": 1508763600,
  "traffic": [
    "event"
  ],
  "traffic_including": [
    "unattributed_traffic"
  ],
  "time_zone": "Australia/NSW",
  "delivery_format": "csv",
  "columns_order": [
    "attribution_attribution_action",
    "attribution_campaign",
    "attribution_campaign_id",
    "attribution_creative",
    "attribution_date_adjusted",
    "attribution_date_utc",
    "attribution_matched_by",
    "attribution_matched_to",
    "attribution_network",
    "attribution_network_id",
    "attribution_seconds_since",
    "attribution_site_id",
    "attribution_site_id",
    "attribution_tier",
    "attribution_timestamp",
    "attribution_timestamp_adjusted",
    "attribution_tracker",
    "attribution_tracker_id",
    "attribution_tracker_name",
    "count",
    "custom_dimensions",
    "device_id_adid",
    "device_id_android_id",
    "device_id_custom",
    "device_id_idfa",
    "device_id_idfv",
    "device_id_kochava",
    "device_os",
    "device_type",
    "device_version",
    "dimension_count",
    "dimension_data",
    "dimension_sum",
    "event_name",
    "event_time_registered",
    "geo_city",
    "geo_country",
    "geo_lat",
    "geo_lon",
    "geo_region",
    "identity_link",
    "install_date_adjusted",
    "install_date_utc",
    "install_device_version",
    "install_devices_adid",
    "install_devices_android_id",
    "install_devices_custom",
    "install_devices_email_0",
    "install_devices_email_1",
    "install_devices_idfa",
    "install_devices_ids",
    "install_devices_ip",
    "install_devices_waid",
    "install_matched_by",
    "install_matched_on",
    "install_receipt_status",
    "install_san_original",
    "install_status",
    "request_ip",
    "request_ua",
    "timestamp_adjusted",
    "timestamp_utc"
  ]
}
What I have tried unsuccessfully thus far is below:
json_construct=$(cat <<EOF
{
"api_key": "6AEC90B5-4169-59AF-7AC9-D655F83B4825",
"app_guid": "komacca-s-rewards-app-au-ios-production-cv8tx71",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": ["event"],
"traffic_including": ["unattributed_traffic"],
"time_zone": "Australia/NSW",
"delivery_format": "csv"
"columns_order": ["attribution_attribution_action","attribution_campaign","attribution_campaign_id","attribution_creative","attribution_date_adjusted","attribution_date_utc","attribution_matched_by","attribution_matched_to","attributio
network","attribution_network_id","attribution_seconds_since","attribution_site_id","attribution_tier","attribution_timestamp","attribution_timestamp_adjusted","attribution_tracker","attribution_tracker_id","attribution_tracker_name","
unt","custom_dimensions","device_id_adid","device_id_android_id","device_id_custom","device_id_idfa","device_id_idfv","device_id_kochava","device_os","device_type","device_version","dimension_count","dimension_data","dimension_sum","ev
t_name","event_time_registered","geo_city","geo_country","geo_lat","geo_lon","geo_region","identity_link","install_date_adjusted","install_date_utc","install_device_version","install_devices_adid","install_devices_android_id","install_
vices_custom","install_devices_email_0","install_devices_email_1","install_devices_idfa","install_devices_ids","install_devices_ip","install_devices_waid","install_matched_by","install_matched_on","install_receipt_status","install_san_
iginal","install_status","request_ip","request_ua","timestamp_adjusted","timestamp_utc"]
}
EOF)
followed by:
echo "$json_construct" | jq '.'
I get the following error:
parse error: Expected separator between values at line 10, column 15
I am guessing that jq is unable to parse it because of the string literals that span multiple lines.
Use jq itself:
my_formatted_json=$(jq -n '{
  "api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
  "app_guid": "XXXXXXXXXXXXXX",
  "time_start": 1508677200,
  "time_end": 1508763600,
  "traffic": ["event"],
  "traffic_including": ["unattributed_traffic"],
  "time_zone": "Australia/NSW",
  "delivery_format": "csv",
  "columns_order": [
    "attribution_attribution_action",
    "attribution_campaign",
    ...,
    "timestamp_utc"
  ]
}')
Your input "JSON" is not valid JSON, as indicated by the error message.
The first error is that a comma is missing after the key/value pair: "delivery_format": "csv", but there are others -- notably, JSON strings cannot be split across lines. Once you fix the key/value pair problem and the JSON strings that are split incorrectly, jq . will work with your text. (Note that once your input is corrected, the longest JSON string is quite short -- 50 characters or so -- whereas jq has no problems processing strings of length 10^8 quite speedily ...)
Generally, jq is rather permissive when it comes to JSON-like input, but if you're ever in doubt, it would make sense to use a validator such as the online validator at jsonlint.com
By the way, the jq FAQ does suggest various ways for handling input that isn't strictly JSON -- see https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json
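If some of the values come from shell variables, another option (a sketch, not from the answers above) is to let jq quote them for you with --arg instead of hand-editing a heredoc:

my_formatted_json=$(jq -n --arg api_key "XXXXXXXXXX-7AC9-D655F83B4825" --arg app_guid "XXXXXXXXXXXXXX" \
  '{api_key: $api_key, app_guid: $app_guid, time_start: 1508677200, time_end: 1508763600, delivery_format: "csv"}')
echo "$my_formatted_json"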
Along the lines of chepner's suggestion, since jq can read raw text data, you could just use a jq filter to generate a legal JSON object from your script variables. For example:
#!/bin/bash
# whatever logic you have to obtain bash variables goes here
key=XXXXXXXXXX-7AC9-D655F83B4825
guid=XXXXXXXXXXXXXX
# now use jq filter to read raw text and construct legal json object
json_construct=$(jq -MRn '[inputs]|map(split(" ")|{(.[0]):.[1]})|add' <<EOF
api_key $key
app_guid $guid
EOF)
echo $json_construct
Sample Run (assumes executable script is in script.sh)
$ ./script.sh
{ "api_key": "XXXXXXXXXX-7AC9-D655F83B4825", "app_guid": "XXXXXXXXXXXXXX" }

Unknown key for a START_OBJECT in [layers]

When trying to send my JSON-formatted tcpdump output to Elasticsearch, I get the following error:
curl -X PUT --data-binary @myjson 'localhost:9200/_bulk?pretty'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Unknown key for a START_OBJECT in [layers].",
        "line" : 1,
        "col" : 95
      }
    ],
    "type" : "parsing_exception",
    "reason" : "Unknown key for a START_OBJECT in [layers].",
    "line" : 1,
    "col" : 95
  },
  "status" : 400
}
The json file was obtained using tshark with the "-T json" option.
The json file was modified using jq with the filter "{index: .[]}" and the option -c since elasticsearch requires an entry to fit in a single line.
I am using elasticsearch 5.5.1 with the standard configuration.
jsonformatter marks the json object as valid.
A json object that produces the error looks as follows:
{"index":{"_index":"packets-2017-08-04","_type":"pcap_file","_score":null,"_source":{"layers":{"frame":{"frame.encap_type":"25","frame.time":"Aug 5, 2001 13:10:06.559762000 CEST","frame.offset_shift":"0.000000000","frame.time_epoch":"1501773006.559765000","frame.time_delta":"0.000000000","frame.time_delta_displayed":"0.000000000","frame.time_relative":"0.000000000","frame.number":"1","frame.len":"200","frame.cap_len":"200","frame.marked":"0","frame.ignored":"0","frame.protocols":"sll:ethertype:ip:tcp:data"},"sll":{"sll.pkttype":"4","sll.hatype":"65135","sll.halen":"0","sll.etype":"0x00000800"},"ip":{"ip.version":"4","ip.hdr_len":"20","ip.dsfield":"0x00000010","ip.dsfield_tree":{"ip.dsfield.dscp":"4","ip.dsfield.ecn":"0"},"ip.len":"184","ip.id":"0x000093f2","ip.flags":"0x00000002","ip.flags_tree":{"ip.flags.rb":"0","ip.flags.df":"1","ip.flags.mf":"0"},"ip.frag_offset":"0","ip.ttl":"64","ip.proto":"6","ip.checksum":"0x0000ef4b","ip.checksum.status":"2","ip.src":"0.0.00","ip.addr":"0.0.0.0","ip.src_host":"0.0.0.0","ip.host":"0.0.0.0","ip.dst":"0.0.0.0","ip.dst_host":"0.0.0.0","Source GeoIP: Germany":{"ip.geoip.src_country":"Germany","ip.geoip.country":"Germany","ip.geoip.src_city":"Frankfurt, 1","ip.geoip.city":"Berlin, 1","ip.geoip.src_asnum":"123","ip.geoip.asnum":"123","ip.geoip.src_lat":"701","ip.geoip.lat":"523,01","ip.geoip.src_lon":"2313,4","ip.geoip.lon":"12,13"},"Destination GeoIP: Germany":{"ip.geoip.dst_country":"Germany","ip.geoip.country":"Germany","ip.geoip.dst_asnum":"123","ip.geoip.asnum":"123","ip.geoip.dst_lat":"3321","ip.geoip.lat":"41","ip.geoip.dst_lon":"1","ip.geoip.lon":"2"}},"tcp":{"tcp.srcport":"41","tcp.dstport":"124","tcp.port":"234","tcp.stream":"3","tcp.len":"134","tcp.seq":"1","tcp.nxtseq":"133","tcp.ack":"4","tcp.hdr_len":"32","tcp.flags":"0x00000018","tcp.flags_tree":{"tcp.flags.res":"0","tcp.flags.ns":"0","tcp.flags.cwr":"0","tcp.flags.ecn":"0","tcp.flags.urg":"0","tcp.flags.ack":"1","tcp.flags.push":"1","tcp.flags.reset":"0","tcp.flags.syn":"0","tcp.flags.fin":"0","tcp.flags.str":"·······AP···"},"tcp.window_size_value":"223","tcp.window_size":"31","tcp.window_size_scalefactor":"-1","tcp.checksum":"0x0000b79c","tcp.checksum.status":"1","tcp.urgent_pointer":"0","tcp.options":"123","tcp.options_tree":{"No-Operation (NOP)":{"tcp.options.type":"1","tcp.options.type_tree":{"tcp.options.type.copy":"0","tcp.options.type.class":"0","tcp.options.type.number":"1"}},"Timestamps: TSval 1875055084, TSecr 5726840":{"tcp.option_kind":"8","tcp.option_len":"10","tcp.options.timestamp.tsval":"185084","tcp.options.timestamp.tsecr":"1116840"}},"tcp.analysis":{"tcp.analysis.bytes_in_flight":"123","tcp.analysis.push_bytes_sent":"133"}},"data":{"data.data":"01:01:02","data.len":"265"}}}}}
My question is: What is wrong with this json, so that elasticsearch rejects it?
This is not really a jq problem - Unknown key for a START_OBJECT is an elasticsearch error. The [layers] is a hint that the problem is in the object there which unfortunately was elided in the problem description so there's really not much to go on here.
Since the jq filter you specified is just {index:.[]}, jq is doing nothing to the part of the json elasticsearch is complaining about. If your workflow is expecting jq to correct that portion somehow you'll need to investigate the data closer and use a more sophisticated filter.
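As background, the _bulk endpoint expects each entry as two lines: an action line containing only metadata (_index, _type, _id and the like), followed by the document source on its own line. A minimal sketch with the index and type from the question (document body truncated for illustration):

curl -X POST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/x-ndjson' --data-binary \
$'{"index":{"_index":"packets-2017-08-04","_type":"pcap_file"}}\n{"layers":{"frame":{"frame.number":"1"}}}\n'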
For reference, the elasticsearch test suite contains an example of this particular error:
---
"junk in source fails":
  - do:
      catch: /Unknown key for a START_OBJECT in \[junk\]./
      reindex:
        body:
          source:
            junk: {}
Hope this helps.

Error in parsing json Expecting '{', '['

This is the JSON:
"{
'places': [
{
'name': 'New\x20Orleans,
\x20US\x20\x28New\x20Lakefront\x20\x2D\x20NEW\x29',
'code': 'NEW'
}
]
}"
I am getting a JSON parsererror. I am checking it on http://jsonlint.com/ and it shows the following error:
Parse error on line 1:
"{ 'places': [
^
Expecting '{', '['
Please explain what the problems with the JSON are and how I can correct them.
If you literally mean that the string, as a whole, is your JSON text (containing something that isn't JSON), there are three issues:
It's just a JSON fragment, not a full JSON document.
Literal line breaks within strings are not valid in JSON, use \n.
\x is an invalid escape sequence in JSON strings. If you want your contained non-JSON text to have a \x escape (e.g., when you read the value of the overall string and parse it), you have to escape that backslash: \\x.
In a full JSON document, the top level must be an object or array:
{"prop": "value"}
[1, 2, 3]
Most JSON parsers support parsing fragments, such as standalone strings. (For instance, JavaScript's JSON.parse supports this.) http://jsonlint.com is doing full document parsing, however.
Here's your fragment wrapped in an object with the line breaks and \x issue handled:
{
  "stuff": "{\n 'places': [\n {\n 'name': 'New\\x20Orleans,\n \\x20US\\x20\\x28New\\x20Lakefront\\x20\\x2D\\x20NEW\\x29',\n 'code': 'NEW'\n }\n \n ]\n }"
}
The text within the string is also not valid JSON, but perhaps it's not meant to be. For completeness: JSON requires that all keys and strings be in double quotes ("), not single quotes ('). It also doesn't allow literal line breaks within string literals (use \n instead), and doesn't support \x escapes. See http://json.org for details.
Here's a version as valid JSON with the \x converted to the correct JSON \u escape:
{
  "places": [
    {
      "name": "New\u0020Orleans,\n\u0020US\u0020\u0028New\u0020Lakefront\u0020\u002D\u0020NEW\u0029",
      "code": "NEW"
    }
  ]
}
...also those escapes are all actually defining perfectly normal characters, so:
{
  "places": [
    {
      "name": "New Orleans,\n US (New Lakefront - NEW)",
      "code": "NEW"
    }
  ]
}
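As a quick check, that corrected document passes any validator; a sketch using jq:

jq . <<'EOF'
{"places":[{"name":"New Orleans,\n US (New Lakefront - NEW)","code":"NEW"}]}
EOF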
read http://json.org/
{
  "places": [
    {
      "name": "New\\x20Orleans,\\x20US\\x20\\x28New\\x20Lakefront\\x20\\x2D\\x20NEW\\x29",
      "code": "NEW"
    }
  ]
}