JSON extractor stops messages from showing up in Graylog input (JSON)

I have an nginx access_log Input that receives logs in json format. I have been trying to get the JSON Extractors working but to no avail.
Firstly, I was following this official Graylog tutorial: https://www.graylog.org/videos/json-extractor
This is a sample full message that comes in:
MyHost nginx: { "timestamp": "1658474614.043", "remote_addr": "x.x.x.x.x", "body_bytes_sent": 229221, "request_time": 0.005, "response_status": 200, "request": "GET /foo/bar/1999/09/sth.jpeg HTTP/2.0", "request_method": "GET", "host": "www…somesite.com", "upstream_cache_status": "", "upstream_addr": "x.x.x.x.x:xxx", "http_x_forwarded_for": "", "http_referrer": "https://www.somesite.com/foo/bar/woo/boo/moo", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "http_version": "HTTP/2.0", "nginx_access": true }
It's then extracted into a json field using the following regex: nginx:\s+(.*)
The json field then looks like this:
{ "timestamp": "1658474614.043", "remote_addr": "x.x.x.x.x", "body_bytes_sent": 229221, "request_time": 0.005, "response_status": 200, "request": "GET /foo/bar/1999/09/sth.jpeg HTTP/2.0", "request_method": "GET", "host": "www…somesite.com", "upstream_cache_status": "", "upstream_addr": "x.x.x.x.x:xxx", "http_x_forwarded_for": "", "http_referrer": "https://www.somesite.com/foo/bar/woo/boo/moo", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "http_version": "HTTP/2.0", "nginx_access": true }
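As a sanity check outside Graylog, the same extract-then-parse step can be reproduced in plain Python (the sample message below is shortened, keeping only a few of the fields from the log above):

```python
import json
import re

# Shortened message in the same shape as the nginx log above
message = 'MyHost nginx: {"timestamp": "1658474614.043", "response_status": 200, "nginx_access": true}'

# Same pattern as the Graylog regex extractor: capture everything after "nginx: "
match = re.search(r'nginx:\s+(.*)', message)
payload = json.loads(match.group(1))
print(payload['response_status'])  # 200
```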
However, from here on things only go downhill. I set up a basic default JSON extractor without changing any options, and when I click "Try" it shows the correct output:
Sadly, after I implement this extractor, messages stop showing up in my Input. There has to be some kind of error, but I couldn't find anything in server.log, located at /var/log/graylog-server/server.log.
Hope someone will help me figure this out!

I had the same issue. Graylog has its own timestamp field. Try adding a key prefix such as _ to your extractor, so that your nginx timestamp does not conflict with Graylog's timestamp field.

Since the link to the solution has been removed by a moderator, here's a pipeline that ultimately got the job done:
rule "parse the json log entries"
when
    has_field("json")
then
    let json_tree = parse_json(to_string($message.json));
    let json_fields = select_jsonpath(json_tree, {
        time: "$.timestamp",
        remote_addr: "$.remote_addr",
        body_bytes_sent: "$.body_bytes_sent",
        request_time: "$.request_time",
        response_status: "$.response_status",
        request: "$.request",
        request_method: "$.request_method",
        host: "$.host",
        upstream_cache_status: "$.upstream_cache_status",
        upstream_addr: "$.upstream_addr",
        http_x_forwarded_for: "$.http_x_forwarded_for",
        http_referrer: "$.http_referrer",
        http_user_agent: "$.http_user_agent",
        http_version: "$.http_version",
        nginx_access: "$.nginx_access"
    });
    // Adding additional hours due to timezone differences, adjust it to your needs
    let s_epoch = to_string(json_fields.time);
    let s = substring(s_epoch, 0, 10);
    let ts_millis = (to_long(s) + 7200) * 1000;
    let new_date = parse_unix_milliseconds(ts_millis);
    set_field("date", new_date);
    set_field("remote_addr", to_string(json_fields.remote_addr));
    set_field("body_bytes_sent", to_double(json_fields.body_bytes_sent));
    set_field("request_time", to_double(json_fields.request_time));
    set_field("response_status", to_double(json_fields.response_status));
    set_field("request", to_string(json_fields.request));
    set_field("request_method", to_string(json_fields.request_method));
    set_field("host", to_string(json_fields.host));
    set_field("upstream_cache_status", to_string(json_fields.upstream_cache_status));
    set_field("upstream_addr", to_string(json_fields.upstream_addr));
    set_field("http_x_forwarded_for", to_string(json_fields.http_x_forwarded_for));
    set_field("http_referrer", to_string(json_fields.http_referrer));
    set_field("http_user_agent", to_string(json_fields.http_user_agent));
    set_field("http_version", to_string(json_fields.http_version));
    set_field("nginx_access", to_bool(json_fields.nginx_access));
end
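The timestamp handling in the rule can be double-checked in plain Python: the nginx timestamp is epoch seconds with a fractional part, so the rule keeps only the first 10 characters (the whole seconds) before converting, and the 7200-second offset is the same +2 hour timezone adjustment as in the rule:

```python
from datetime import datetime, timezone

s_epoch = "1658474614.043"            # nginx epoch-seconds timestamp, as in the sample log
s = s_epoch[:10]                      # whole seconds only, like substring(s_epoch, 0, 10)
ts_millis = (int(s) + 7200) * 1000    # +2 hours, mirroring the pipeline rule
new_date = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
print(new_date.year)
```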
Note that you still have to configure an extractor; in this particular example, the original message looks a bit like this: nginx: {json}.
So to reduce it to only the JSON, configure an extractor the following way:
So that's all; you may need to adjust it a bit if it doesn't work, but for most use cases it should.
Still, if anyone would be interested in seeing the entire discussion that resulted in this solution, go to this link: https://community.graylog.org/t/failed-to-index-1-messages-failed-to-parse-field-datetime-of-type-date-in-document/24960/6

Related

R web scraping: I can't pull up the elements I want

I'm a beginner in web scraping using R. I'm trying to scrape the following webpage: https://bkmea.com/bkmea-members/#/company/2523.
I would like to get all text elements under div nodes with class="company_name", as well as text elements under td nodes. For example, I'm trying to fetch the company name ("MOMO APPARELS") as in the following HTML text.
<div class="comapny_header">
<div class="company_name">MOMO APPARELS LTD</div>
<div class="view_all">View All</div>
</div>
So I've written the following code:
library(textreadr)
library(rvest)
companyinfo <- read_html("https://bkmea.com/bkmea-members/#/company/2523")
html_nodes(companyinfo, "div") %>%
  html_text() # it works
html_nodes(companyinfo, "div.company_name") %>%
  html_text() # doesn't work
html_nodes(companyinfo, "td") %>%
  html_text() # doesn't work
If I understand correctly, the first one should pull up text within div nodes.
The second one should pull up text within div nodes whose class attribute equals company_name.
The third one should pull up text within td nodes.
The first one works (which isn't what I'm trying to get), but the second and third ones don't. Am I doing something terribly wrong?
I'd really appreciate it if you could help me out here!!
Many thanks,
Sang
The data you're looking for is retrieved by this API (it is not present in the HTML body):
GET https://bkmea.com/wp-admin/admin-ajax.php?action=bkmea_get_company&id=2523
You just need to extract the id from your original URL, build the URL above, and parse the JSON result as follows:
library(httr)
originalUrl <- "https://bkmea.com/bkmea-members/#/company/2523"
id <- sub("^.+/", "", originalUrl)
userAgent <- "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
output <- content(GET("https://bkmea.com/wp-admin/admin-ajax.php", query = list(
  "action" = "bkmea_get_company",
  "id" = id
), add_headers('User-Agent' = userAgent)), as = "parsed", type = "application/json")
print(output$company$company_info$company_name)
Output:
[1] "MOMO APPARELS LTD"
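For readers who prefer Python, the id extraction and query construction can be sketched like this (only building the request here, not sending it; the endpoint and parameters are taken from the R code above):

```python
import re

original_url = "https://bkmea.com/bkmea-members/#/company/2523"

# Same as R's sub("^.+/", "", originalUrl): keep everything after the last slash
company_id = re.sub(r"^.+/", "", original_url)

# These params match the admin-ajax call made in the R code above
params = {"action": "bkmea_get_company", "id": company_id}
print(company_id)  # 2523
```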

Best way to deal with "key error" when scraping (yahoo finance)?

Hi, I am writing a scraper for Yahoo Finance. I parse the page's JSON to get keys and then scrape those keys, e.g.:
fwd_div_yield = data['context']['dispatcher']['stores']['QuoteSummaryStore']["summaryDetail"]['dividendYield']['raw']
The problem is that if a company doesn't pay a dividend, this produces a KeyError, because there is no 'raw' key; instead of providing raw = 0, the key is simply absent. If a company does pay a dividend, it returns 'raw', 'fmt', etc.
I was wondering what the most efficient way of dealing with this is?
Another question: how would you access ...
[{'raw': 1595894400, 'fmt': '2020-07-28'}, {'raw': 1596412800, 'fmt': '2020-08-03'}]
My current solution is:
earnings_dates = data['context']['dispatcher']['stores']['QuoteSummaryStore']['calendarEvents']['earnings']['earningsDate'][0]['fmt']
earnings_datee = data['context']['dispatcher']['stores']['QuoteSummaryStore']['calendarEvents']['earnings']['earningsDate'][1]['fmt']
earnings_date = earnings_dates+", "+earnings_datee
To extract the dividend yield from the raw key without getting a KeyError when it's not there, do the following:
fwd_div_yield = data['context']['dispatcher']['stores']['QuoteSummaryStore']["summaryDetail"]['dividendYield'].get('raw', 0)
In the event raw is not there, fwd_div_yield will be 0.
Then to retrieve each date from the list of dictionaries, you can use a list comprehension:
earnings_dates = data['context']['dispatcher']['stores']['QuoteSummaryStore']['calendarEvents']['earnings']['earningsDate']
fmt_dates = [date['fmt'] for date in earnings_dates]
Also, this data is available via url: https://query2.finance.yahoo.com/v10/finance/quoteSummary/aapl?modules=summaryDetail. Just replace aapl with the symbol you're scraping.
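If you need the same safety for several levels of nesting, a small helper avoids chaining many .get() calls. The dictionary below is a made-up fragment in the shape of the Yahoo Finance data, not real output:

```python
def get_nested(d, keys, default=0):
    """Walk nested dicts, returning default when any key is missing."""
    for key in keys:
        if not isinstance(d, dict) or key not in d:
            return default
        d = d[key]
    return d

# Made-up fragment: a stock with no 'raw' dividend yield
store = {'summaryDetail': {'dividendYield': {'fmt': '0.65%'}}}

missing = get_nested(store, ['summaryDetail', 'dividendYield', 'raw'])
present = get_nested(store, ['summaryDetail', 'dividendYield', 'fmt'], default=None)
print(missing, present)  # 0 0.65%
```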
I would wrap whatever code is checking if the company pays a dividend in a try/except block.
def paysDividend(data):
    try:
        # the membership test itself handles the missing key;
        # the except only guards against data not being a dict at all
        return 'raw' in data
    except TypeError:
        return False
Without seeing any example code, this is a quick-fix solution.
For the second question...
If you are asking how to create [{'raw': 1234, 'fmt': '2020-07-28'}, ...] based on the compiled list of companies that pay a dividend, create the list:
def dividendList(data):
    dividend_list = []
    for company in data:
        # 'path' and 'to' stand in for the actual nested keys
        dividend_list.append({'raw': company['path']['to']['raw'],
                              'fmt': company['path']['to']['fmt']})
    return dividend_list
If you are trying to access each one after you have already created the list:
def accessDividend(dividend_data):
    for dividend in dividend_data:
        print(f"{dividend['raw']}, {dividend['fmt']}")
I created this method as a workaround.
import requests
import pandas as pd
from datetime import datetime as dt

def yfinanceDataframe(symbol, interval, _range):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    data = requests.get(f'https://query1.finance.yahoo.com/v8/finance/chart/{symbol}?interval={interval}&range={_range}', headers=headers).json()
    timestamp = data['chart']['result'][0]['timestamp']
    data = data['chart']['result'][0]['indicators']['quote'][0]
    df = pd.DataFrame(data)
    df['Datetime'] = timestamp
    df['Datetime'] = df['Datetime'].apply(lambda x: dt.fromtimestamp(x).strftime('%m/%d/%Y %H:%M'))
    df.dropna(inplace=True)
    df.reset_index(inplace=True)
    df.rename(columns={'close': 'Close'}, inplace=True)
    return df

Getting data out of JSON in NodeJS results in undefined

So today I was writing a Node.js app to get data out of a website's API. The API returns data in JSON. This is my code:
var processing = WooCommerce.get('orders?status=' + type, function(err, data, res) {
    var result = res;
    JSON.stringify(result);
    console.log(result);
    result = result[0].meta_data;
    console.log(result);
});
And this is my console log : (Sorry for the mess)
[{"id":2977,"parent_id":0,"number":"2977","order_key":"wc_order_5a8bc4c350d54","created_via":"checkout","version":"3.0.5","status":"on-hold","currency":"INR","date_created":"2018-02-20T12:18:3
5","date_created_gmt":"2018-02-20T06:48:35","date_modified":"2018-02-20T12:18:41","date_modified_gmt":"2018-02-20T06:48:41","discount_total":"0.00","discount_tax":"0.00","shipping_total":"0.00
","shipping_tax":"0.00","cart_tax":"0.00","total":"40.00","total_tax":"0.00","prices_include_tax":false,"customer_id":342,"customer_ip_address":"103.104.77.159","customer_user_agent":"mozilla\
/5.0 (linux; android 6.0.1; le x526 build\/iixosop5801910121s) applewebkit\/537.36 (khtml, like gecko) chrome\/64.0.3282.137 mobile safari\/537.36","customer_note":"","billing":{"first_name":"
Fahad","last_name":"Khan","company":"","address_1":"","address_2":"","city":"Delhi","state":"DL","postcode":"","country":"IN","email":"shimail786#gmail.com","phone":"8745076002"},"shipping":{"
first_name":"","last_name":"","company":"","address_1":"","address_2":"","city":"","state":"","postcode":"","country":""},"payment_method":"paytm-qr","payment_method_title":"Pay with Paytm QR"
,"transaction_id":"","date_paid":null,"date_paid_gmt":null,"date_completed":null,"date_completed_gmt":null,"cart_hash":"0119311d11c4978ecc7bf6f59b53586f","meta_data":[{"id":91320,"key":"_billi
ng_stl","value":"https:\/\/steamcommunity.com\/tradeoffer\/new\/?partner=452464312&token=Gq27CMGc"},{"id":91321,"key":"billing_stl","value":"https:\/\/steamcommunity.com\/tradeoffer\/new\/?par
tner=452464312&token=Gq27CMGc"},{"id":91324,"key":"_woocs_order_rate","value":"1"},{"id":91325,"key":"_woocs_order_base_currency","value":"INR"},{"id":91326,"key":"_woocs_order_currency_change
d_mannualy","value":"0"}],"line_items":[{"id":1641,"name":"MAG-7 | Silver (Factory New)","product_id":2972,"variation_id":0,"quantity":1,"tax_class":"","subtotal":"40.00","subtotal_tax":"0.00"
,"total":"40.00","total_tax":"0.00","taxes":[],"meta_data":[],"sku":"","price":40}],"tax_lines":[],"shipping_lines":[],"fee_lines":[],"coupon_lines":[],"refunds":[],"_links":{"self":[{"href":"
https:\/\/ezpz-skins.com\/wp-json\/wc\/v2\/orders\/2977"}],"collection":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/wc\/v2\/orders"}],"customer":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/w
c\/v2\/customers\/342"}]}},{"id":2976,"parent_id":0,"number":"2976","order_key":"wc_order_5a8bc2fabf6d8","created_via":"checkout","version":"3.0.5","status":"on-hold","currency":"INR","date_cr
eated":"2018-02-20T12:10:58","date_created_gmt":"2018-02-20T06:40:58","date_modified":"2018-02-20T12:11:02","date_modified_gmt":"2018-02-20T06:41:02","discount_total":"0.00","discount_tax":"0.
00","shipping_total":"0.00","shipping_tax":"0.00","cart_tax":"0.00","total":"95.00","total_tax":"0.00","prices_include_tax":false,"customer_id":342,"customer_ip_address":"103.104.77.159","cust
omer_user_agent":"mozilla\/5.0 (linux; android 6.0.1; le x526 build\/iixosop5801910121s) applewebkit\/537.36 (khtml, like gecko) chrome\/64.0.3282.137 mobile safari\/537.36","customer_note":""
,"billing":{"first_name":"Fahad","last_name":"Khan","company":"","address_1":"","address_2":"","city":"Delhi","state":"DL","postcode":"","country":"IN","email":"shimail786#gmail.com","phone":"
8745076002"},"shipping":{"first_name":"","last_name":"","company":"","address_1":"","address_2":"","city":"","state":"","postcode":"","country":""},"payment_method":"paytm-qr","payment_method_
title":"Pay with Paytm QR","transaction_id":"","date_paid":null,"date_paid_gmt":null,"date_completed":null,"date_completed_gmt":null,"cart_hash":"ca6b663ea1f4b4c7ed65b9fd39acc2cb","meta_data":
[{"id":91268,"key":"_billing_stl","value":"https:\/\/steamcommunity.com\/tradeoffer\/new\/?partner=452464312&token=1m7SCUVf"},{"id":91269,"key":"billing_stl","value":"https:\/\/steamcommunity.
com\/tradeoffer\/new\/?partner=452464312&token=1m7SCUVf"},{"id":91272,"key":"_woocs_order_rate","value":"1"},{"id":91273,"key":"_woocs_order_base_currency","value":"INR"},{"id":91274,"key":"_w
oocs_order_currency_changed_mannualy","value":"0"}],"line_items":[{"id":1639,"name":"SG 553 | Tiger Moth (Field Tested)","product_id":911,"variation_id":0,"quantity":1,"tax_class":"","subtotal
":"42.00","subtotal_tax":"0.00","total":"42.00","total_tax":"0.00","taxes":[],"meta_data":[],"sku":"","price":42},{"id":1640,"name":"Glock-18 | Bunsen Burner (Factory New)","product_id":532,"v
ariation_id":0,"quantity":1,"tax_class":"","subtotal":"53.00","subtotal_tax":"0.00","total":"53.00","total_tax":"0.00","taxes":[],"meta_data":[{"id":4861,"key":"_woocs_order_rate","value":"1"}
,{"id":4862,"key":"_woocs_order_base_currency","value":"INR"},{"id":4863,"key":"_woocs_order_currency_changed_mannualy","value":"0"}],"sku":"","price":53}],"tax_lines":[],"shipping_lines":[],"
fee_lines":[],"coupon_lines":[],"refunds":[],"_links":{"self":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/wc\/v2\/orders\/2976"}],"collection":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/wc\
/v2\/orders"}],"customer":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/wc\/v2\/customers\/342"}]}}]
undefined
So I realize (after reading tons of questions on Stack Overflow) that my data is an array. That's why I added result = result[0].meta_data; but that gives me undefined (notice it at the end of the log). Also, if I remove .meta_data, it just returns [, the very first character.
Where am I going wrong? I'm kinda new to all this and am still learning, so please explain :)
The 'res' is in string format, so instead of JSON.stringify() use JSON.parse() so that it is converted back into a JavaScript object; then try logging the result as shown below and access meta_data after that.
var processing = WooCommerce.get('orders?status=' + type, function(err, data, res) {
    var result = JSON.parse(res);
    console.log(result[0]);
    result = result[0].meta_data;
    console.log(result);
});
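The stringify/parse distinction is the same in every language; as a quick illustration, here is the equivalent round trip in Python (json.loads plays the role of JSON.parse, json.dumps of JSON.stringify), using a made-up order fragment shaped like the WooCommerce response above:

```python
import json

# Made-up fragment shaped like the WooCommerce orders response
raw = '[{"id": 2977, "meta_data": [{"key": "_billing_stl", "value": "x"}]}]'

orders = json.loads(raw)    # string -> objects (JSON.parse)
back = json.dumps(orders)   # objects -> string (JSON.stringify)

print(orders[0]['meta_data'][0]['key'])  # _billing_stl
```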

Opensips 2.1 SIP Trunk configuration

I'm Ahmed and I'm working on OpenSIPS.
I saw your questions on the forum, and I think you may have the answer to a problem I'm having.
I set up a simple scenario to route calls between users registered on the OpenSIPS server, but when it comes to real IP phones (each with its own IP address), it doesn't work (trunk).
For example, my OpenSIPS address is 10.42.15.18 and my IP phone address is 10.42.13.82.
It is all about the SIP trunk, I think.
I am stuck on this part and have searched a lot for a solution; maybe there is a detail I have missed.
Which function is responsible for handling requests and responses with an IP phone?
I used this code:
# account only INVITEs
if ($rU == "49894614950666") {
    $rU = $tU;
    rewritehostport("10.42.13.82:5060");
    $du = "sip:49894614950666@10.42.13.82;user=phone";
    t_relay();
    xlog("reference to URI of 'To' header ====> $tu");
    xlog("reference to domain in URI of 'To' header ====> $td");
    # route the call out based on RURI
    route(3);
}

route[3] {
    seturi("sip:49894614950666@10.42.13.82;user=phone");
    $du = "sip:49894614950666@10.42.13.82;user=phone";
    rewriteuri("sip:49894614950666@10.42.13.82;user=phone");
    xlog("route 2 : forwarding to $tU \n $ruri \n");
    xlog("Received $rm from $fu (callid: $ci)\n");
    forward();
    if (is_method("INVITE")) {
        t_on_branch("2");
        t_on_reply("2");
        t_on_failure("1");
    }
    if (!t_relay()) {
        sl_reply_error();
    };
    exit;
}
When calling from a soft phone the requested number, the server sends a request INVITE as follow :
INVITE sip:49894614950666@10.42.15.18;transport=TCP SIP/2.0
Via: SIP/2.0/TCP 10.42.15.12:5060;branch=z9hG4bK-524287-1---dedd27ee7475c0f1
Max-Forwards: 70
Contact: <sip:test11@10.42.15.12:5060;transport=tcp>
To: <sip:49894614950666@10.42.15.18;transport=TCP>
From: <sip:test11@10.42.15.18;transport=TCP>;tag=2f025b44
Call-ID: tdO14DnlADH9Okx6Sr0p4A..
CSeq: 1 INVITE
Content-Type: application/sdp
User-Agent: Z 3.15.40006 rv2.8.20
Allow-Events: presence, kpml, talk
Content-Length: 237
and the target VM resends an INVITE request to the OpenSIPS server, but then the server starts sending messages to itself and not responding to the target machine...
I suspect the "To" field in the INVITE message is wrong!
OpenSIPS only sends an INVITE to the IP phone and ignores messages coming from it; it does not respond afterwards with any ACK.
What should I add or modify?
Thank you a lot.
Why not just use the lookup function? It is intended exactly for cases like yours and will do all the duty work of rewriting URIs automatically.
Something like this:
if (lookup("location", "m")) {
    xlog("[INCOMINGCALL][$rU @ $si:$sp] Forwarding call to <$ru> via <$du>\n");
    if (!t_relay()) {
        send_reply("503", "Internal Error");
    };
    exit;
}
t_reply("404", "Not Found");
exit;
The advantage of this technique is that you will be able to change locations at run time using the 'opensipsctl address' command.

What's the best way to send in multiple coordinates in a JSON to RethinkDB in order to create an r.polygon?

I'm using an Express server with RethinkDB, and I want to send in multiple coordinates into my 'locations' table on RethinkDB and create an r.polygon(). I understand how to do the query via RethinkDB's data explorer , but I'm having trouble figuring out how to send it via JSON from the client to the server and insert it through my query there.
I basically want to do this:
r.db('places').table('locations').insert({
    name: req.body.name,
    bounds: r.polygon(req.body.bounds)
})
where req.body.bounds looks like this:
[long, lat],[long, lat], [long, lat]
I can't send it in as a string because then it gets read as one single input instead of three arrays. I'm sure there's a 'right in front of me' way, but I'm drawing a blank.
What's the best way to do this?
Edit: To clarify, my question is, what should my JSON look like and how should it be received on my server?
This is what RethinkDB wants in order to make a polygon:
r.polygon([lon1, lat1], [lon2, lat2], [lon3, lat3], ...) → polygon
As per the suggestion, I've added in r.args() to my code:
r.db('places').table('locations').insert({
    name: req.body.name,
    bounds: r.polygon(r.args(req.body.bounds))
})
Edit
Ok, I was dumb and had a typo in one of my coordinates!
Sending it as an array of arrays and wrapping it in r.args() on the server side works.
What you need is r.args to unpack the array into arguments for r.polygon. https://www.rethinkdb.com/api/javascript/args/
Assuming that req.body.bounds is:
[[long, lat],[long, lat], [long, lat]]
And you are submitting a raw JSON string from the client, you first need to decode the JSON payload, get the bounds field, and wrap it with args as follows:
var body = JSON.parse(req.body)
r.db('places').table('locations').insert({
    name: body.name,
    bounds: r.polygon(r.args(body.bounds))
})
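The r.args trick is the same idea as argument unpacking in other languages; in Python it would be the * operator. The polygon function below is only a stand-in for r.polygon, used to show the difference between passing the list as one argument and unpacking it into three:

```python
# Stand-in for r.polygon: it just records the point arguments it receives
def polygon(*points):
    return list(points)

bounds = [[10.0, 20.0], [11.0, 21.0], [12.0, 19.0]]

unpacked = polygon(*bounds)   # three point arguments, like r.args(bounds)
wrapped = polygon(bounds)     # one list argument: the wrong shape
print(len(unpacked), len(wrapped))  # 3 1
```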