Removing spaces from a line after a certain number of spaces has occurred - mysql

I am trying to analyze Tomcat access logs by loading them into MySQL. Most of it works, but the last entry in the combined access log is a pain: it does not always contain the same number of spaces, and the file is space delimited. I need the last string on each line to have its spaces either removed or replaced with a comma or some other placeholder.
I process the file through sed to remove all of the " characters, so if I can add more to my sed command to do this, that would be great. If I need to run it through something else after the sed command, that will work too.
Here is the file before the sed command:
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
Here is the sed command
sed 's/\"//g' filename > newfilename
Here is an example string from the file after that command is run against it. Because the file is space delimited, MySQL tries to create several more columns and cannot. If I can get all the spaces out of the last section, that would be awesome.
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
Example of a string where Mozilla is not present.
24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] GET /irep/images/integra.png HTTP/1.1 304 - - MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0
Here is my expected output; sorry, I had several distractions from this project this morning.
IPAddress, ClientUsername, AuthUserName, DateTime, Request/File, Protocol, Status, SizeBytes, Reference address, UserAgent/Browser
I would post a screenshot of the table in MySQL Workbench, but I am not allowed to yet.
Basically, I want the spaces in everything from "Mozilla" to the end of the row replaced or removed; I think a comma or : placeholder would be ideal. Any suggestions?
Ed, here is the error I am getting when running it today.
awk: irep-istor_access_log.2015-02-10.txt:4: 166.173.58.240 - - [10/Feb/2015:00:04:07 -0600] "GET /istore/js/cart.js HTTP/1.1" 200 7042 "https://istore.salonservicegroup.com/istore/loginpage.jsp" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
awk: irep-istor_access_log.2015-02-10.txt:4: ^ syntax error

You can do the part you have left like this:
$ awk 'match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
but you should just be using one small, simple, awk script for the whole thing, whatever that is.
I see you just added some pre-sed input (but still no expected output) so:
$ cat file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
$
$ awk '{gsub(/"/,"")} match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Cart/Controller/TempController.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
Different approach: here is how to convert your input file into a CSV file:
$ cat tst.awk
BEGIN{
OFS=","
print "ipAddr", "dash1", "dash2", "dateTime", "getCmd", "number", "info", "browser"
}
{
gsub(OFS,";")
ip = $1
dash1 = $2
dash2 = $3
match($0,/\[[^]]+\]/)
dt = substr($0,RSTART+1,RLENGTH-2)
match($0,/"[^"]+"/)
get = substr($0,RSTART+1,RLENGTH-2)
$0 = substr($0,RSTART+RLENGTH)
num = $1
dash3 = $2
match($0,/"[^"]+"/)
info = substr($0,RSTART+1,RLENGTH-2)
$0 = substr($0,RSTART+RLENGTH)
match($0,/"[^"]+"/)
browser = substr($0,RSTART+1,RLENGTH-2)
print ip, dash1, dash2, dt, get, num, info, browser
}
$ awk -f tst.awk file
ipAddr,dash1,dash2,dateTime,getCmd,number,info,browser
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Cart/Controller/TempController.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
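For comparison, the same split can be sketched in Python with one regular expression and the csv module. This is only a sketch: it assumes the original, still-quoted combined log format shown in the question, and the names LOG_RE, FIELDS and log_to_csv are made up for illustration. Instead of replacing commas inside fields, the csv writer quotes any field that contains one (such as the user agent).

```python
import csv
import io
import re

# Assumed layout of one combined-format access log line:
# ip ident user [time] "request" status size "referer" "user agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical column names, one per named group above.
FIELDS = ["ip", "ident", "user", "time", "request",
          "status", "size", "referer", "agent"]


def log_to_csv(lines):
    """Return CSV text for the lines that match LOG_RE; other lines are skipped."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            # Fields containing commas are quoted by the csv writer.
            writer.writerow(m.group(*FIELDS))
    return out.getvalue()
```

Because the user agent comes out enclosed in double quotes, the load on the MySQL side would need to treat " as an optional field enclosure (for example FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' in LOAD DATA).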

Related

API Documentation on NGINX server

I have a FastAPI app running on localhost:8020. The documentation is available at localhost:8020/docs, and the openapi.json file is at localhost:8020/openapi.json.
I want nginx to expose localhost:8020/docs at localhost:8080/docs. Here is my nginx.conf:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include mime.types;
sendfile on;
upstream docss {
server 172.17.0.1:8020;
}
server {
client_max_body_size 500M;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
client_body_timeout 600;
listen 8080;
resolver 127.0.0.11;
autoindex off;
server_name localhost;
server_tokens off;
location /docs/ {
proxy_pass http://docss/docs;
}
}
}
With the above config, when I open localhost:8080/docs/, I receive this error:
Failed to load API definition. Errors Hide Fetch error Not Found
/openapi.json
and in the nginx docker log, I receive this error:
172.21.0.1 - - [23/Jun/2022:18:10:35 +0000] "GET /docs/ HTTP/1.1" 200 931 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"
2022/06/23 18:10:36 [error] 23#23: *1 open() "/etc/nginx/html/openapi.json" failed (2: No such file or directory), client: 172.21.0.1, server: localhost, request: "GET /openapi.json HTTP/1.1", host: "localhost:8080", referrer: "http://localhost:8080/docs/"
172.21.0.1 - - [23/Jun/2022:18:10:36 +0000] "GET /openapi.json HTTP/1.1" 404 548 "http://localhost:8080/docs/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"
I guess this means that nginx could not find openapi.json at "/etc/nginx/html/openapi.json".
So my question is: how can I make 172.17.0.1:8020/openapi.json, which lives on the upstream server, available at "/etc/nginx/html/openapi.json"?
This is the command I use to create the nginx Docker container:
docker run --link ds-ai-ocr_web_1 --link ds-ai-ocr_api_1 --net ds-ai-ocr_default --name nginx -v c:/Users/ab/Documents/ds-nginx-conf:/etc/nginx -p 8080:8080 -d nginx
I guess I should mount openapi.json through this command, but it doesn't work like this:
docker run --link ds-ai-ocr_web_1 --link ds-ai-ocr_api_1 --net ds-ai-ocr_default --name nginx -v c:/Users/amd/Documents/ds-nginx-conf:/etc/nginx -v http://172.17.0.1:8020/openapi.json:/etc/nginx/html -p 8080:8080 -d nginx

Error in http_statuses - subscript out of bounds

Can someone explain to me why session2 gives me the following error?
library("rvest")
uastring = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
session = html_session("https://www.linkedin.com/job/", user_agent(uastring))
session2 = html_session("https://www.linkedin.com/job/")
Error in http_statuses[[as.character(status)]] : subscript out of bounds
I took this example from https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/
How can I check which uastring value I have to pass to html_session (for different sites)? I am not asking about this specific site (I use it here because it comes from the tutorial).

In R, getURL() returns a page saying too many requests, but that page is viewable in a browser

I am trying to get the page from www.dotabuff.com.
library(RCurl)
url <- "http://www.dotabuff.com/heroes/abaddon/matchups"
webpage <- getURL(url,verbose = TRUE)
The result is a page from dotabuff complaining about too many requests. I was expecting an HTML page with a table, like the one viewable in a web browser. I have tried http, https, getURLContent, etc.
I think this has something to do with the type of request getURL sends, or maybe something tricky about that website.
Add a header to the request...
library(RCurl)
url <- "http://www.dotabuff.com/heroes/abaddon/matchups"
options(RCurlOptions = list(verbose = TRUE, useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13"))
webpage <- getURL(url,verbose = TRUE)
* Trying 23.235.40.64...
* Connected to www.dotabuff.com (23.235.40.64) port 80 (#0)
> GET /heroes/abaddon/matchups HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13
Host: www.dotabuff.com
Accept: */*
< HTTP/1.1 200 OK
...
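The same idea (send a browser-like User-Agent header with the request) can be sketched outside R as well. The Python version below uses only the standard library; the USER_AGENT value and the build_request and fetch names are made up for illustration, and whether a given site accepts a particular string is something you have to find out by trying.

```python
from urllib.request import Request, urlopen

# Hypothetical browser-like User-Agent string, used only for illustration.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Fetch the page body as text. This performs a real network request."""
    with urlopen(build_request(url), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```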

Using Jsawk to parse JSON access logs

With our new webservers, the access logs are in JSON, and I'm not able to use typical awk commands to pull out traffic info. I've found jsawk, but I keep getting a parse error any time I try to pull anything out of the access logs. I have a feeling that the logs are not in a format that the parser likes.
Here is a sample entry from the logs:
{ "#timestamp": "2014-09-30T21:33:56+00:00", "webserver_remote_addr": "24.4.209.153", "webserver_remote_user": "-", "webserver_body_bytes_sent": 193, "webserver_request_time": 0.000, "webserver_status": "404", "webserver_request": "GET /favicon.ico HTTP/1.1", "webserver_request_method": "GET", "webserver_http_referrer": "-", "webserver_http_user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" }
So for example if I want to pull the IP addresses out of the logs, I would use this:
cat access.log | jsawk 'return this.webserver_remote_addr'
However this only results in 'jsawk: JSON parse error:' and the entire access log printed.
Am I correct in assuming that the access logs are in a format the parser doesn't recognize? Each entry in the logs is all on one line. How can I get jsawk to parse properly?
I tried this:
$ echo '{ "#timestamp": "2014-09-30T21:33:56+00:00", "webserver_remote_addr": "24.4.209.153", "webserver_remote_user": "-", "webserver_body_bytes_sent": 193, "webserver_request_time": 0.000, "webserver_status": "404", "webserver_request": "GET /favicon.ico HTTP/1.1", "webserver_request_method": "GET", "webserver_http_referrer": "-", "webserver_http_user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" }' | jsawk 'return this.webserver_remote_addr'
and got this:
24.4.209.153
Updates:
I think the problem is that each line is a separate JSON object, and there are multiple lines in access.log. There's a good workaround here: How to use jsawk if every line is a json object ?
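If jsawk is not a hard requirement, a log with one JSON object per line can also be read line by line with a short script. Here is a minimal Python sketch, assuming every entry sits on its own line as in the sample; extract_field is a made-up name, and lines that fail to parse are silently skipped.

```python
import json


def extract_field(lines, field):
    """Yield `field` from every line that parses as a JSON object.

    Blank lines, lines that are not valid JSON, and objects without
    the field are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and field in record:
            yield record[field]


# Example use against a file (the file name is taken from the question):
#   with open("access.log") as fh:
#       for ip in extract_field(fh, "webserver_remote_addr"):
#           print(ip)
```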

How to serve audio file for streaming from Grails with code 206?

I'm trying to stream out an audio/wav using HTML5 feature in the following way:
<audio type="audio/wav" src="/sound/10/audio/full" sound-id="10" version="full" controls="">
</audio>
This is working quite well on Chrome, except for the fact it's impossible to replay or reposition the currentTime attribute:
var audioElement = $('audio').get(0)
audioElement.currentTime
> 1.2479820251464844
audioElement.currentTime = 0
audioElement.currentTime
> 1.2479820251464844
I'm serving the audio file from a Grails controller, using following code:
def audio() {
File file = soundService.getAudio(...)
response.setContentType('audio/wav')
response.setContentLength (file.bytes.length)
response.setHeader("Content-disposition", "attachment;filename=${file.getName()}")
response.outputStream << file.newInputStream() // Performing a binary stream copy
response.status = 206
return false
}
It seems, though, that Grails is returning an HTTP 200 response instead of 206 (Partial Content), as you can see from the following output from Chrome:
Request URL:http://localhost:8080/sound/10/audio/full
Request Method:GET
Status Code:200 OK
Request Headersview source
Accept:*/*
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:identity;q=1, *;q=0
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Cookie:JSESSIONID=9395D8FFF34B7455F937190F521AA1BC
Host:localhost:8080
Range:bytes=0-3189
Referer:http://localhost:8080/cms/sound/10/edit
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
Response Headersview source
Content-disposition:attachment;filename=full.wav
Content-Length:3190
Content-Type:audio/wav
Date:Wed, 19 Dec 2012 13:58:44 GMT
Server:Apache-Coyote/1.1
Any idea what might be wrong?
Thanks, Amit.
ADDITION:
Changing the controller logic to:
response.status = 206
response.setContentType(version.mime)
response.setContentLength (file.bytes.length)
response.setHeader("Content-disposition", "attachment;filename=${file.getName()}")
response.outputStream << file.newInputStream() // Performing a binary stream copy
This did help with returning HTTP 206 (Partial Content); however, neither Chrome nor Firefox will play the audio file (again, Chrome did play it when it received the file with a 200).
Here is the response info:
Request URL:http://localhost:8080/sound/10/audio/full
Request Headersview source
Accept-Encoding:identity;q=1, *;q=0
Range:bytes=0-
Referer:http://localhost:8080/cms/sound/10/edit
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
You might want to try:
response.reset();
response.setStatus(206);
response.setHeader("Accept-Ranges", "bytes");
response.setHeader("Content-length", Integer.toString(length + 1));
response.setHeader("Content-range", "bytes " + start.toString() + "-" + end.toString() + "/" + Long.toString(f.size()));
response.setContentType(...);
And this type of output should only be done if the client specifically asked for a range. You can check by using:
String range = request.getHeader("range");
If range is not null, you'll have to parse it for the requested start and end bytes. Note that the range can be open-ended, such as "0-". In some cases, you'll see "0-1" as a request to check whether your service knows how to handle range requests.
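The range parsing and header arithmetic are independent of Grails, so here is a rough sketch of them in Python. It handles only a single "bytes=start-end" range; parse_range and partial_headers are made-up names, and a real controller would still have to write only the requested slice of the file to the response.

```python
import re


def parse_range(header, size):
    """Parse a single-range 'bytes=start-end' header value.

    Returns (start, end) as inclusive byte offsets, or None if the header
    is missing, malformed, multi-range, or not satisfiable for `size`.
    """
    if not header:
        return None
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not m:
        return None
    first, last = m.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        n = int(last)
        if n == 0 or size == 0:
            return None
        return max(size - n, 0), size - 1
    start = int(first)
    # Open-ended form "bytes=N-": everything from N to the end of the file.
    end = int(last) if last else size - 1
    end = min(end, size - 1)
    if start > end:
        return None
    return start, end


def partial_headers(start, end, size):
    """Headers for a 206 response covering bytes start..end inclusive."""
    return {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
    }
```

For the 3190-byte file from the question, "bytes=0-" gives (0, 3189) and "bytes=0-1" gives (0, 1) with a Content-Length of 2. When parse_range returns None because no Range header was sent, the sketch assumes the caller falls back to a normal 200 response with the whole file.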