How to scrape multi-line Java logs with Promtail?

Here are some Java logs:
2020-08-21 19:55:44-[audit]-INFO[http-nio-8001-exec-10]BeetlsqlDebugInterceptor.println(20) |
┏━━━━━ Debug [conclusionOperator.selectConclusionOperator] ━━━
┣ SQL: select id,create_time,update_time from conclusion_operator where id=?
┣ PARAMETERS: [751]
┣ POSITION: org.aaaa.audit.controller.MailController.getElecMail(MailController.java:584)
┣ TIME: 3ms
┣ RESULT: [2]
┗━━━━━ Debug [conclusionOperator.selectConclusionOperator] ━━━
These 8 lines are actually one log event. How do I configure Promtail to scrape them as a single log event, preferably via pipeline_stages? Thanks a lot!

It is not possible with Promtail <= 2.1. Either use a framework that posts directly to Loki, as @nehaev suggests in another answer, or upgrade to Loki 2.2.
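Since Loki/Promtail 2.2 there is a multiline pipeline stage for exactly this case. A minimal sketch (the file path and the firstline regex are assumptions based on the log format shown above):
scrape_configs:
  - job_name: java-app
    static_configs:
      - targets: [localhost]
        labels:
          job: java-app
          __path__: /var/log/app/*.log   # assumed log location
    pipeline_stages:
      # A new event starts at each line beginning with a timestamp;
      # everything up to the next match is merged into that event.
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
          max_wait_time: 3s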
The workaround for Loki < 2.2 is to encode your multi-line log statement in JSON or logfmt (preferably the latter, as it is more readable without parsing). The newlines are escaped as \n in these encodings, so for Promtail it is still a single line.
You would then have a log line like:
time="2020-08-21 19:55:44" msg="line1\nline2\nline3" level=INFO logger=audit ...
You can then find and parse it with the following query:
{app="someApp"} |= "..." | logfmt
Grafana will then parse all the fields in the log line (like level=INFO); additionally, it will unescape the message to:
line1
line2
line3
The nice thing about this approach is that you can add meta information like a correlation ID, transaction ID, or tenant/customer ID that is easy to find but still parsable by Grafana.
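For example, an enriched line could look like this (the extra field names are invented for illustration):
time="2020-08-21 19:55:44" level=INFO logger=audit correlationId=abc-123 tenantId=acme msg="line1\nline2\nline3"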

If you use Logback as your logging framework, consider switching to an appender that writes directly to Loki: https://github.com/loki4j/loki-logback-appender (disclosure: I'm an author of it).
Unlike Promtail, it works with Logback's original log events and doesn't need to parse anything, so there are no issues with multiline strings, custom labels, etc.
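For reference, a minimal logback.xml fragment for it might look roughly like this (check the project README for the exact configuration options of your version):
<appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
    <http>
        <url>http://localhost:3100/loki/api/v1/push</url>
    </http>
    <format>
        <label>
            <pattern>app=my-app,host=${HOSTNAME}</pattern>
        </label>
        <message>
            <pattern>%level %logger{20} [%thread] %msg %ex</pattern>
        </message>
    </format>
</appender>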

Apache Drill cannot parse CSV files with Windows EOL correctly?

Ok, let's save someone 8 hours of clueless debugging.
TL;DR: Apache Drill cannot correctly parse CSV files generated on Windows machines. That's because their EOL is \r\n by default, unlike unix systems, where it is \n. This leads to horribly undebuggable errors, because the trailing \r stays glued to the last field's value. And what's funny, you won't notice it because it's invisible.
Let's have two files, one created in linux and the second in windows: hello.linux.csv and hello.win.csv. The content is the same (at least it looks like it is ...)
field_a,field_b
Hello,0.5
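(You can make the invisible visible by dumping the bytes; note the \r before each \n in the Windows file:)
$ od -c hello.win.csv
0000000   f   i   e   l   d   _   a   ,   f   i   e   l   d   _   b  \r
0000020  \n   H   e   l   l   o   ,   0   .   5  \r  \n
0000034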
Let's have a query.
SELECT * from (...)/hello.linux.csv;
---
field_a, field_b
Hello, "0.5"
SELECT * from (...)/hello.win.csv;
---
field_a, field_b
Hello, "0.5"
Fine! Let's do something with the data. Casting "0.5" to a number should be fine (and necessary).
SELECT
field_a, CAST (field_b as DECIMAL(10, 2)) as test
from (...)/hello.linux.csv;
---
field_a, test
Hello, 0.5
-- ... aaand, here we go!
SELECT
field_a, CAST (field_b as DECIMAL(10, 2)) as test
from (...)/hello.win.csv;
[30038]Query execution error. Details:[
SYSTEM ERROR: NumberFormatException
Fragment 0:0
Please, refer to logs for more information. -- In the logs, there is only useless java stacktrace, of course.
[Error Id: 3551c939-3f5b-42c1-9b58-d600da5f12a0 on drill-develop-7bdb45c597-52rnz:31010]
]
...
(And now, imagine how much time it would take to reveal this in a complex production setup where the queries, data, and other factors are more complicated.)
The question: Is there a way to force Apache Drill (v1.15) to process CSV files created with Windows EOLs?
You can update the CSV format's line delimiter to \r\n, but this would apply to all CSV files in the scope of your text plugin. To change the delimiter per table, use a table function, as sketched below.
https://drill.apache.org/docs/plugin-configuration-basics/
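A rough sketch of the per-table variant (the dfs workspace and path are placeholders; verify the exact option names for your Drill version in the docs above):
SELECT field_a, CAST(field_b AS DECIMAL(10, 2)) AS test
FROM table(dfs.`/path/to/hello.win.csv`(
    type => 'text',
    fieldDelimiter => ',',
    lineDelimiter => '\r\n',
    extractHeader => true));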

Parsing JSON output from TheHive

I need to automatically move new cases (TheHive-Project) to LimeSurvey every 5 minutes. I have figured out the basis of the API script to add responses to LimeSurvey. However, I can't figure out how to add only new cases, and how to parse the Hive case data for the information I want to add.
So far I've been using curl to get a list of cases from hive. The following is the command and the output.
curl -su user:pass http://myhiveIPaddress:9000/api/case
[{"createdBy":"charlie","owner":"charlie","createdAt":1498749369897,"startDate":1498749300000,"title":"test","caseId":1,"user":"charlie","status":"Open","description":"testtest","tlp":2,"tags":[],"flag":false,"severity":1,"metrics":{"Time for Alert to Handler Pickup":2,"Time from open to close":4,"Time from compromise to discovery":6},"updatedBy":"charlie","updatedAt":1498751817577,"id":"AVz0bH7yqaVU6WeZlx3w","_type":"case"},{"createdBy":"charlie","owner":"charlie","title":"testtest","caseId":3,"description":"ddd","user":"charlie","status":"Open","createdAt":1499446483328,"startDate":1499446440000,"severity":2,"tlp":2,"tags":[],"flag":false,"id":"AV0d-Z0DqHSVxnJ8z_HI","_type":"case"},{"createdBy":"charlie","owner":"charlie","createdAt":1499268177619,"title":"test test","user":"charlie","status":"Open","caseId":2,"startDate":1499268120000,"tlp":2,"tags":[],"flag":false,"description":"s","severity":1,"metrics":{"Time from open to close":2,"Time for Alert to Handler Pickup":3,"Time from compromise to discovery":null},"updatedBy":"charlie","updatedAt":1499268203235,"id":"AV0TWOIinKQtYP_yBYgG","_type":"case"}]
Each field is separated by the delimiter },{.
Regarding parsing out specific information from each case, I previously tried to just use the cut command. This mostly worked until I reached "metrics"; it doesn't always work for metrics because they are not always listed in the same order.
I have asked my boss for help, and he told me this command might get me going in the right direction for adding only new hive cases to the survey, but I'm still very lost and want to avoid asking too much again.
curl -su user:pass http://myhiveIPaddress:9000/api/case | sed 's/},{/\n/g' | sed 's/\[{//g' | sed 's/}]//g' | awk -F '"caseId":' {'print $2'} | cut -f 1 -d , | sort -n | while read line; do echo '"caseId":'$line; done
Basically, I'm in way over my head and feel like I have no idea what I'm doing. If I need to clarify anything, or if it would help for me to post what I have so far in my API script, please let me know.
Update
Here is the potential logic for the script I'd like to write.
get list of hive cases (curl ...)
read each field, delimited by },{
while read each field, check /tmp/addedHiveCases to see if caseId of field already exists
--> if it does not exist in file, add case to limesurvey and add caseId to /tmp/addedHiveCases
--> if it does exist, skip to next field
Why do you think the fields are separated by a "},{" delimiter?
The response of the /api/case API is valid JSON listing the cases.
Can you use a Python script to play with the API? If yes, I can help you write the script you need.
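As a starting point, here is a rough sketch of the logic from your update in Python, using only the standard library (the Hive URL, the user:pass credentials, and the LimeSurvey call are placeholders to fill in):
#!/usr/bin/env python
import base64
import json
import os
import urllib.request

HIVE_URL = "http://myhiveIPaddress:9000/api/case"  # placeholder
SEEN_FILE = "/tmp/addedHiveCases"

def add_to_limesurvey(case):
    # Placeholder: call the LimeSurvey API here.
    print("would add case", case["caseId"], case["title"])

# Fetch the case list and parse it as JSON -- no },{ splitting needed.
req = urllib.request.Request(HIVE_URL)
req.add_header("Authorization",
               "Basic " + base64.b64encode(b"user:pass").decode())
cases = json.load(urllib.request.urlopen(req))

# Load the caseIds already processed on earlier runs.
seen = set()
if os.path.exists(SEEN_FILE):
    with open(SEEN_FILE) as f:
        seen = {line.strip() for line in f if line.strip()}

# Add each new case and remember its caseId.
with open(SEEN_FILE, "a") as f:
    for case in cases:
        case_id = str(case["caseId"])
        if case_id in seen:
            continue  # already added on a previous run
        add_to_limesurvey(case)
        f.write(case_id + "\n")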

How to generate a JSON log from nginx?

I'm trying to generate a JSON log from nginx.
I'm aware of solutions like this one, but some of the fields I want to log include user-generated input (like HTTP headers), which needs to be escaped properly.
I'm aware of the nginx changelog entries from Oct 2011 and May 2008 that say:
*) Change: now the 0x7F-0x1F characters are escaped as \xXX in an
access_log.
*) Change: now the 0x00-0x1F, '"' and '\' characters are escaped as \xXX
in an access_log.
but this still doesn't help since \xXX is invalid in a JSON string.
I've also looked at the HttpSetMiscModule module, which has a set_quote_json_str directive, but this just seems to add \x22 around the strings, which doesn't help.
Any idea for other solutions to log in JSON format from nginx?
Finally, it looks like we have a good way to do this with vanilla nginx, without any modules. Just define:
log_format json_combined escape=json
'{'
'"time_local":"$time_local",'
'"remote_addr":"$remote_addr",'
'"remote_user":"$remote_user",'
'"request":"$request",'
'"status": "$status",'
'"body_bytes_sent":"$body_bytes_sent",'
'"request_time":"$request_time",'
'"http_referrer":"$http_referer",'
'"http_user_agent":"$http_user_agent"'
'}';
Note that escape=json was added in nginx 1.11.8.
http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
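Each request is then logged as one self-contained JSON object per line, along these lines (values invented for illustration):
{"time_local":"21/Aug/2020:19:55:44 +0000","remote_addr":"127.0.0.1","remote_user":"","request":"GET / HTTP/1.1","status":"200","body_bytes_sent":"612","request_time":"0.004","http_referrer":"","http_user_agent":"curl/7.61.1"}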
You can try this one: https://github.com/jiaz/nginx-http-json-log - an additional module for Nginx.
You can try to use:
the additional module for Nginx, nginx-http-json-log
any language, as done in nginx-json-logformat, with the example /etc/nginx/conf.d/json_log.conf
a version of the Nginx HTTP stub status module that outputs in JSON format
PS:
The if parameter (1.7.0) enables conditional logging. A request will not be logged if the condition evaluates to “0” or an empty string:
map $status $loggable {
    ~^[23]  0;
    default 1;
}
access_log /path/to/access.log combined if=$loggable;
It’s a good idea to use a tool such as https://github.com/zaach/jsonlint to check your JSON data. You can test the output of your new logging format and make sure it’s real-and-proper JSON.

Converting Shell Output to JSON

I want to convert the output of octave execution in shell to json format.
For example if I execute
$ octave --silent --eval 'a=[1,3],b=2'
I get
a =
1 3
b = 2
I want the output formatted as a JSON string, as in
"{'a':[1,3], 'b':2}"
How do I achieve this? It would be great if the solution were in Node/JS, but anything is fine. I am looking for existing solutions rather than writing my own parsing logic. Any suggestions?
I doubt any such package exists. It's easy to write your own rather than waiting to find one.
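For instance, a rough Python sketch (assumption: it only handles scalars and single-row vectors as printed by Octave's default display; anything more complex needs extra cases):
import json
import re
import subprocess

# Run octave and capture its plain-text output.
out = subprocess.run(
    ["octave", "--silent", "--eval", "a=[1,3],b=2"],
    capture_output=True, text=True, check=True).stdout

result = {}
name = None
for line in out.splitlines():
    line = line.strip()
    if not line:
        continue
    m = re.match(r"^(\w+)\s*=\s*(.*)$", line)
    if m:
        name, rest = m.group(1), m.group(2)
        if rest:                      # scalar printed inline: "b = 2"
            result[name] = float(rest)
            name = None
        continue
    if name:                          # row vector printed on its own line
        result[name] = [float(x) for x in line.split()]

print(json.dumps(result))             # {"a": [1.0, 3.0], "b": 2.0}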

Ways to parse JSON using KornShell

I have working code for parsing JSON output using KornShell by treating it as a string of characters. The issue I have is that the vendor keeps changing the position of the field that I am interested in. I understand that in JSON we can parse by key-value pairs.
Is there something out there that can do this? I am interested in one specific field, and I would like to use it to run checks on the status of another REST API call.
My sample json output is like this:
JSONDATA value :
{
"status": "success",
"job-execution-id": 396805,
"job-execution-user": "flexapp",
"job-execution-trigger": "RESTAPI"
}
I would need the job-execution-id value to monitor this job through the rest of the script.
I am using the following command to parse it:
RUNJOB=$(print ${DATA} |cut -f3 -d':'|cut -f1 -d','| tr -d [:blank:]) >> ${LOGDIR}/${LOGFILE}
The problem with this is that it is delimited by :, and the field position has been known to change between vendor releases.
So I am trying to see if I can use a utility out there that would always give me the key-value pair of "job-execution-id": 396805, no matter where it is in the json output.
I started looking at jsawk, but it requires the js interpreter to be installed on our machines, which I don't want. Any hint on how to find which RPM I would need to solve this?
I am using RHEL5.5.
Any help is greatly appreciated.
The ast-open project has libdss (and a dss wrapper) which supposedly could be used with ksh. Documentation is sparse and is limited to a few messages on the ast-user mailing list.
The regression tests for libdss contain some json and xml examples.
I'll try to find more info.
Python is included by default with CentOS, so one thing you could do is pass your JSON string to a Python script and use Python's JSON parser. You can then grab the value written out by the script. An example you could modify to meet your needs is below.
Note that by specifying other dictionary keys in the Python script you can get any of the values you need without having to worry about the order changing.
Python script:
# get_job_execution_id.py
# The try/except is because you'll probably have Python 2.4 on CentOS 5.5,
# and a plain "import json" won't work unless you have Python 2.6+.
try:
    import json
except ImportError:
    import simplejson as json
import sys
json_data = sys.argv[1]
data = json.loads(json_data)
job_execution_id = data['job-execution-id']
sys.stdout.write(str(job_execution_id))
KornShell script that executes it:
#!/bin/ksh
# get_job_execution_id.sh
JSON_DATA='{"status":"success","job-execution-id":396805,"job-execution-user":"flexapp","job-execution-trigger":"RESTAPI"}'
EXECUTION_ID=`python get_job_execution_id.py "$JSON_DATA"`
echo $EXECUTION_ID