Using Jsawk to parse JSON access logs

With our new webservers, the access logs are in JSON, and I'm not able to use typical awk commands to pull out traffic info. I've found jsawk, but I keep getting a parse error whenever I try to pull anything out of the access logs. I have a feeling that the logs are not in a format the parser likes.
Here is a sample entry from the logs:
{ "#timestamp": "2014-09-30T21:33:56+00:00", "webserver_remote_addr": "24.4.209.153", "webserver_remote_user": "-", "webserver_body_bytes_sent": 193, "webserver_request_time": 0.000, "webserver_status": "404", "webserver_request": "GET /favicon.ico HTTP/1.1", "webserver_request_method": "GET", "webserver_http_referrer": "-", "webserver_http_user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" }
So for example if I want to pull the IP addresses out of the logs, I would use this:
cat access.log | jsawk 'return this.webserver_remote_addr'
However, this only prints 'jsawk: JSON parse error:' followed by the entire access log.
Am I correct in assuming that the access logs are in a format the parser doesn't recognize? Each entry in the logs is all on one line. How can I get jsawk to parse properly?

I tried this:
$ echo '{ "#timestamp": "2014-09-30T21:33:56+00:00", "webserver_remote_addr": "24.4.209.153", "webserver_remote_user": "-", "webserver_body_bytes_sent": 193, "webserver_request_time": 0.000, "webserver_status": "404", "webserver_request": "GET /favicon.ico HTTP/1.1", "webserver_request_method": "GET", "webserver_http_referrer": "-", "webserver_http_user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" }' | jsawk 'return this.webserver_remote_addr'
and got this:
24.4.209.153
Update:
I think the problem is that each line is a separate JSON object, and access.log contains many lines. There's a good workaround here: How to use jsawk if every line is a json object?
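Following that idea, the one-object-per-line layout (often called JSON Lines) can also be handled without jsawk by decoding each line on its own. A minimal Python sketch, assuming every entry sits on a single line like the sample entry above; the `remote_addrs` name is illustrative only:

```python
import json
import sys


def remote_addrs(lines):
    """Yield webserver_remote_addr from each line that holds one JSON object.

    Blank lines are ignored; lines that fail to parse are reported on stderr
    and skipped instead of aborting the whole run.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            print(f"line {lineno}: skipping unparsable entry ({err})", file=sys.stderr)
            continue
        if not isinstance(record, dict):
            continue  # a valid JSON value, but not an object with fields
        addr = record.get("webserver_remote_addr")
        if addr is not None:
            yield addr
```

For example, `for addr in remote_addrs(open("access.log")): print(addr)` prints one address per line.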

Related

Why can't this text be parsed through fastjson2?

import com.alibaba.fastjson2.JSONArray;

String result = JSONArray.parseArray(str).toString();
I parse this JSON string with fastjson2 and then call toString() on the result, but I get an error:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
at com.alibaba.fastjson2.JSONWriterUTF16JDK8.writeString(JSONWriterUTF16JDK8.java:183)
at com.alibaba.fastjson2.writer.ObjectWriterImplMap.write(ObjectWriterImplMap.java:428)
at com.alibaba.fastjson2.writer.ObjectWriterImplMap.write(ObjectWriterImplMap.java:457)
at com.alibaba.fastjson2.writer.ObjectWriterImplList.write(ObjectWriterImplList.java:278)
at com.alibaba.fastjson2.JSONArray.toString(JSONArray.java:871)
Similar strings work normally. I really can't find which special character causes the problem.
My str is:
[{"response_info":{"header":"Content-Length: 388\r\nContent-Type: application/octet-stream\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36\r\nHost: 180.102.211.212\r\n","body":"\u0000\u0000\u0000\u0003seq\u0000\u0000\u0000\u000241\u0000\u0000\u0000\u0003ver\u0000\u0000\u0000\u00011\u0000\u0000\u0000\tweixinnum\u0000\u0000\u0000\n1429629729\u0000\u0000\u0000\u0007authkey\u0000\u0000\u0000D0B\u0002\u0001\u0001\u0004;09\u0002\u0001\u0002\u0002\u0001\u0001\u0002\u0004U6k!\u0002\u0003\u000fBA\u0002\u0004\u0015zXu\u0002\u0004\ufffd\ufffdf\ufffd\u0002\u0003\u000fU\ufffd\u0002\u0003\u0006\u0000\u0000\u0002\u0004U6k!\u0002\u0004d=\u001eS\u0002\u0004\ufffd\ufffd7\u0019\u0004\u0000\u0000\u0000\u0000\u0006rsaver\u0000\u0000\u0000\u00011\u0000\u0000\u0000\brsavalue\u0000\u0000\u0000\ufffd\ufffd\ufffd\ufffd\ufffd\u0006\u001d\ufffd_;\ufffdi\ufffdT.\ufffd\ufffd\"CK\ufffd/\u00169\u0018\u0015bI\ufffd\ufffd`<n\ufffd\ufffd\ufffdw\ufffd\ufffd\ufffd!\ufffd\u001a\u0003\ufffdHh\ufffdP%i$\ufffd$\ufffd\u0005\ufffd<\ufffd8\ufffd\ufffd\ufffd\ufffd\n\ufffd$\u0016A-O5\ufffd`\r\ufffd\ufffdc\ufffd\ufffd\u001b\ufffd\ufffd\r3\ufffd\ufffd`\ufffd)\ufffd\ufffdV\ufffdf \ufffd`\t\ufffd%\u0010\ufffd\ufffd\ufffdJ\ufffd\u001aCu\u0010\u000b\ufffd\u0001X\ufffd\ufffd\u01b7\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd.\u0000\u0000\u0000\u0007filemd5\u0000\u0000\u0000 0d65f9a4beb26b55874965490344abef\u0000\u0000\u0000\bfiletype\u0000\u0000\u0000\u00015\u0000\u0000\u0000\u0006touser\u0000\u0000\u0000\u00101688854880368629"}}]
The fastjson2 version is 2.0.10.
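To narrow down which characters are involved, one option is to decode the string outside fastjson2 and count the control characters and U+FFFD replacement characters it contains; the `body` value above is full of both, and a JSON writer has to escape control characters again on output, most of them as six-character `\u00XX` sequences. A small Python sketch of that inventory, offered as a diagnostic aid only (it does not reproduce fastjson2's writer or explain the exception); `unusual_chars` is a hypothetical helper:

```python
import json
import unicodedata
from collections import Counter


def unusual_chars(json_text):
    """Count control characters (Unicode category Cc) and U+FFFD
    replacement characters in every key and string value of a JSON document."""
    counts = Counter()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            for ch in node:
                if unicodedata.category(ch) == "Cc" or ch == "\ufffd":
                    counts[f"U+{ord(ch):04X}"] += 1

    walk(json.loads(json_text))
    return counts
```

For example, `print(unusual_chars(json_text))`, where `json_text` holds the string above, lists each such code point with its count.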

How to enable a proper failure description in Couchbase N1QL audit logs when a query fails, or get an exception ID (icode)?

Below is one failure log entry. The audit record marks the status as success, or as errors in the failure case.
In this failure case, the description only says "A N1QL EXPLAIN statement was executed"; it gives no exception ID and no detailed description:
{"clientContextId":"INTERNAL-b8d19563-94a1-442d-9a09-dde36743fb7d","description":"A N1QL EXPLAIN statement was executed","id":28673,"isAdHoc":true,"metrics":{"elapsedTime":"11.921ms","executionTime":"11.764ms","resultCount":1,"resultSize":649},"name":"EXPLAIN statement","node":"127.0.0.1:8091","real_userid":{"domain":"builtin","user":"Administrator"},"remote":{"ip":"127.0.0.1","port":44695},"requestId":"958a7e12-d5a6-4d7b-bd40-ac9bb60cf4a3","statement":"explain INSERT INTO `Guardium` (KEY, VALUE) \nVALUES ( "id::5554\n", { "Emp Name": "Test4", "Emp Company" : "GS Lab", "Emp Country" : "India"} )\nRETURNING *;","status":"errors","timestamp":"2021-01-07T09:37:00.486Z","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 (Couchbase Query Workbench (6.6.1-9213-enterprise))"}
Please provide input on this: I want the audit log to give a proper description of why this N1QL query failed.
Thank you.
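While waiting for a better description from Couchbase itself, failing requests can at least be isolated from the audit log. A minimal Python sketch, assuming audit.log holds one JSON record per line (the entry above appears wrapped only for display) and using the `status`, `requestId`, `description`, and `statement` fields seen in it; it does not add the missing error text, it only picks out records such as the `"status":"errors"` one so their `requestId` values can be followed up elsewhere. `failed_audit_records` is a hypothetical helper:

```python
import json


def failed_audit_records(lines):
    """Yield (requestId, status, description, statement) for every audit
    record whose status field is present and is not "success"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not one complete JSON record
        if not isinstance(record, dict):
            continue
        status = record.get("status")
        if status is not None and status != "success":
            yield (record.get("requestId"), status,
                   record.get("description"), record.get("statement"))
```

For example, `for rec in failed_audit_records(open("audit.log")): print(rec)`.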

Keep getting HTTP code 422, not sure what's wrong with the JSON payload

I am making a Python module that interacts with Carousell using the requests module. I am trying to send a POST request with a JSON payload, but I keep getting HTTP error code 422 (UNPROCESSABLE ENTITY). I don't know whether something is wrong with my JSON payload or the Python dict (before it's converted to JSON), or whether I am missing something in my request headers.
I tried taking the raw JSON string (from the POST request that I captured using Chrome DevTools), converting it to a dict, and copying that printed dict into the program. It didn't work.
import json
import requests

# `cookies` (values captured from an earlier response) and `query_url` are defined elsewhere in the module.
login_session = requests.session()
login_session.headers.update({"DNT":"1", "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36", "Origin":"https://sg.carousell.com"})
login_payload = {'requests': {'g0': {'resource': 'sso', 'operation': 'create', 'params': {'loginToken': cookies["login-token"]}, 'body': {}}}, 'context': {'_csrf': cookies["_csrf"]}}
login_cookies = {"__cfduid": cookies["__cfduid"], "_csrf": cookies["_csrf"], "gtkprId": cookies["gtkprId"], "login-token": cookies["login-token"], "redirect":"redirect"}
login_headers = {'accept':'*/*','accept-encoding':'gzip, deflate, br','accept-language':'en-GB,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,en-US;q=0.6', 'x-requested-with': 'XMLHttpRequest', 'content-type': 'application/json'}
login_data = login_session.post(query_url, cookies=login_cookies, data=json.dumps(login_payload), headers=login_headers)
Here's the output from the debug logger:
DEBUG:urllib3.connectionpool:https://sg.carousell.com:443 "POST /ui/iso?_csrf=TNZTMZpBdQYgRFFouCF4ELVB HTTP/1.1" 422 0
Edit:
Here's the JSON payload that the browser sent to the server; I am trying to replicate it:
{"requests":{"g0":{"resource":"sso","operation":"create","params":{"loginToken":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1NTg1MDQyMjQsImlzcyI6ImxvZ2luLmNhcm91c2VsbC5jb20iLCJzc29pZCI6IkRacG1rd1l1SXAxdDF5U3A2M1RXWExPUTJnWmRFRzBOSHd3d0ZGSm9PSkFvVFFOdGFyNWt0MDMzNm5EVHRudHoiLCJ1c2VyaWQiOiIxNDczMjI3NCJ9.x7YxdLLk1ID6_jWy4trtLzbrPnZZ0eI7g_cQN1BilF8"},"body":{}}},"context":{"_csrf":"hPPhgajp-1GMLSbgjZBNBD7z2EGPVGCuA_mU"}}
Note that the login token and _csrf values come from the cookies.
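One way to rule the payload's shape in or out is to compare the key paths of the payload captured in DevTools with the dict built in code. A minimal Python sketch; `key_paths` and `diff_payloads` are hypothetical helpers, and leaf values are deliberately not compared because tokens change from session to session:

```python
import json


def key_paths(node, prefix=""):
    """Return the set of dotted key paths (e.g. "requests.g0.params.loginToken")
    found in nested dicts; lists are not descended into."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


def diff_payloads(captured_json, built):
    """Compare a captured payload (JSON string) with a dict built in code.

    Only key paths are compared; leaf values such as tokens are ignored.
    """
    captured = json.loads(captured_json)
    cap, own = key_paths(captured), key_paths(built)
    return {"missing": sorted(cap - own), "extra": sorted(own - cap)}
```

If both lists come back empty, the structure matches and the values themselves (for instance, which `_csrf` value belongs in `context`) and the headers are the next things to check. Separately, `requests` can serialize the dict itself via `session.post(url, json=payload)`, which also sets `Content-Type: application/json`.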

Bitcoind JSON RPC auth not working

I have bitcoind running on Ubuntu. bitcoin-cli works fine, but I cannot get the JSON-RPC interface working.
bitcoin.conf file:
testnet=0
rpcuser="bitcoinrpc"
rpcpassword="xxxxx"
rpcport=8332
rpcallowip="*"
server=1
An HTTP POST request to url='http://bitcoinrpc:xxxxx@127.0.0.1:8332/' fails with a 401 error.
request headers:
Accept:*/*
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8,ru;q=0.6,de;q=0.4,sr;q=0.2
Authorization:Basic Yml0Y29pbnJwYzp4eHh4eA==
Cache-Control:no-cache
Connection:keep-alive
Content-Length:53
Content-Type:text/plain
DNT:1
Host:127.0.0.1:8332
Origin:chrome-extension://fhjcajmcbmldlhcimfajhfbgofnpcjmb
Pragma:no-cache
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36
request post payload:
{"jsonrpc": "2.0", "method": "getinfo", "params": []}
What is the correct way to authenticate to bitcoind's JSON-RPC interface?
For future googlers: one possible cause is a password containing the pound sign (#), because bitcoin.conf treats it as the start of a comment!
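The Authorization header in the request above decodes to `bitcoinrpc:xxxxx`, so the client is sending exactly what the URL contains; comparing that string with the exact `rpcuser`/`rpcpassword` values bitcoind reads from bitcoin.conf is a reasonable next step. As a point of reference, here is a minimal Python sketch (standard library only) that builds the same kind of Basic-authenticated JSON-RPC POST without sending it; the `build_rpc_request` helper and the `id` value are illustrative assumptions:

```python
import base64
import json
import urllib.request


def build_rpc_request(url, user, password, method, params=None):
    """Build (but do not send) a JSON-RPC POST carrying an HTTP Basic Authorization header."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": "sketch",  # with an id, JSON-RPC 2.0 treats this as a call, not a notification
        "method": method,
        "params": params or [],
    }).encode("utf-8")
    # Basic auth is base64("user:password"), exactly what the browser put in its header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)`, which raises `urllib.error.HTTPError` if the server answers 401.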

Removing spaces from a line after a certain number of spaces have occurred

So I am trying to do some Tomcat access log analysis by loading the logs into MySQL. I have most of it working, but the last field in the combined access log is kind of a pain: it does not always contain the same number of spaces, and the file is space-delimited. I need the last string on each line to either have its spaces removed or replaced with a comma or some other placeholder.
I process the file through sed to remove all of the " characters, so if I can add more to my sed command to do this, that would be great; if I need to run something else after the sed command, that will work too.
Here is the file before the sed command:
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
Here is the sed command
sed 's/\"//g' filename > newfilename
Here is an example from the file after that command is run against it. Since the file is space-delimited, MySQL tries to make several more columns and cannot, so if I can get all the spaces out of the last section, that would be awesome.
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
Example of a line where Mozilla is not present:
24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] GET /irep/images/integra.png HTTP/1.1 304 - - MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0
Here is my expected output (sorry, I had several distractions from this project this morning):
IPAddress, ClientUsername, AuthUserName, DateTime, Request/File, Protocol, Status, SizeBytes, Reference address, UserAgent/Browser
I would post a screenshot of the table in MySQL Workbench, but I am not allowed to yet.
Basically, from "Mozilla" to the end of the row I want the spaces replaced or removed; I think a comma or : placeholder would be ideal. Any suggestions?
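If a small script is acceptable, another option is to skip the quote-stripping step: in the raw combined-format lines the quotes already mark where the request, referrer, and user-agent fields begin and end, so a regular expression can split each line into the ten columns listed above, and a CSV writer can quote the user-agent field instead of removing its spaces. A Python sketch, assuming every line follows the combined format shown before the sed command; the column names, file paths, and the choice to keep method and path together in `Request/File` are assumptions:

```python
import csv
import re

# Combined format: host ident authuser [date] "method path protocol" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+ \S+) (\S+)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"\s*$'
)

HEADER = ["IPAddress", "ClientUsername", "AuthUserName", "DateTime", "Request/File",
          "Protocol", "Status", "SizeBytes", "ReferenceAddress", "UserAgent/Browser"]


def parse_line(line):
    """Return the ten fields of one combined-format line, or None if it does not match."""
    m = COMBINED.match(line)
    return list(m.groups()) if m else None


def convert(in_path, out_path):
    """Write HEADER plus one CSV row per parsable line; return how many lines were skipped."""
    skipped = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")  # quotes fields that contain commas
        writer.writerow(HEADER)
        for line in src:
            fields = parse_line(line)
            if fields is None:
                skipped += 1
                continue
            writer.writerow(fields)
    return skipped
```

Nothing here keys on "Mozilla", so a user agent like the MobileSafari one above is handled the same way, provided it is still quoted in the raw log. The resulting file could then be loaded with MySQL's `LOAD DATA INFILE` using `FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'`.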
Ed, here is the error I am getting when running it today.
awk: irep-istor_access_log.2015-02-10.txt:4: 166.173.58.240 - - [10/Feb/2015:00:04:07 -0600] "GET /istore/js/cart.js HTTP/1.1" 200 7042 "https://istore.salonservicegroup.com/istore/loginpage.jsp" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
awk: irep-istor_access_log.2015-02-10.txt:4: ^ syntax error
You can do the part you have left like this:
$ awk 'match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
but you should really just use one small, simple awk script for the whole thing, whatever that is.
I see you just added some pre-sed input (but still no expected output) so:
$ cat file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
$
$ awk '{gsub(/"/,"")} match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Cart/Controller/TempController.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
Different approach: here is how to convert your input file into a CSV file:
$ cat tst.awk
BEGIN {
    OFS = ","
    print "ipAddr", "dash1", "dash2", "dateTime", "getCmd", "number", "info", "browser"
}
{
    gsub(OFS, ";")
    ip = $1
    dash1 = $2
    dash2 = $3
    match($0, /\[[^]]+\]/)
    dt = substr($0, RSTART+1, RLENGTH-2)
    match($0, /"[^"]+"/)
    get = substr($0, RSTART+1, RLENGTH-2)
    $0 = substr($0, RSTART+RLENGTH)
    num = $1
    dash3 = $2
    match($0, /"[^"]+"/)
    info = substr($0, RSTART+1, RLENGTH-2)
    $0 = substr($0, RSTART+RLENGTH)
    match($0, /"[^"]+"/)
    browser = substr($0, RSTART+1, RLENGTH-2)
    print ip, dash1, dash2, dt, get, num, info, browser
}
$ awk -f tst.awk file
ipAddr,dash1,dash2,dateTime,getCmd,number,info,browser
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Cart/Controller/TempController.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4