What do these lines mean? (Logstash Conf & Grok Syntax) - json

Hi, I'm new to Logstash and Grok syntax, and I'm trying to understand what these lines mean:
codec => multiline { pattern => "^{$" negate => "true" what => "previous" }
and
mutate {
  gsub => ["message", "'", '"']
}
Thanks!

It is explained pretty well in the official documentation of the multiline codec plugin:
pattern => "^{$" matches lines that consist of a single { character and nothing else
negate => true means that lines NOT matching the pattern are the ones being acted upon
what => previous means that each matched line is attached to the previous one
In summary, these settings mean that all lines that do NOT consist solely of { belong to the previous line.
Concretely, this multiline codec is for putting back together JSON data that was pretty-printed across several lines, like this:
{
  "bla": {
    "test": 1
  }
}
The above pretty-printed JSON will be handled as if it had been printed as a single line, like this:
{ "bla": { "test": 1 } }
Regarding the second filter (mutate/gsub), it is used to replace all single quotes with double quotes.
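Putting the two pieces together, here is a minimal sketch of how these settings might sit in a pipeline; the file path is hypothetical, and a json filter is added to parse the reassembled message:
input {
  file {
    # hypothetical path; point this at your actual log file
    path => "/var/log/app/pretty.json"
    codec => multiline {
      pattern => "^{$"
      negate => "true"
      what => "previous"
    }
  }
}
filter {
  # replace single quotes with double quotes so the payload is valid JSON
  mutate {
    gsub => ["message", "'", '"']
  }
  # parse the reassembled single-line message into event fields
  json {
    source => "message"
  }
}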

Related

Regex Remove Spaces in JSON String

I have a wider automation which populates a lookup table and then serializes the data into a JSON file, as this is my desired output.
Once the data is in JSON format, I need to remove the spaces from the lookup column headers.
I am looking to see if it is possible to write a regex which will identify the headers and remove the spaces.
JSON string below:
[
  {
    "INVOLVED PARTY ID":" 9445999606",
    "CUSTOMER NUMBER":" 9445999606",
    "PRODUCT":"Current Account",
    "LAST UPDATED":"20/02/2020 10:33:00",
    "APPLICATION STATUS":"Clearing Handbrake",
    "PROGRESS":"Progress",
    "APPLICANT":" ACCEPT FLEX INDICATOR Y",
    "QUESTION 3 - HEART/CANCER CONDITIONS":null
  }
]
Desired output after regex manipulation:
[
  {
    "INVOLVEDPARTYID":" 9445999606",
    "CUSTOMERNUMBER":" 9445999606",
    "PRODUCT":"Current Account",
    "LASTUPDATED":"20/02/2020 10:33:00",
    "APPLICATIONSTATUS":"Clearing Handbrake",
    "PROGRESS":"Progress",
    "APPLICANT":" ACCEPT FLEX INDICATOR Y",
    "QUESTION3-HEART/CANCERCONDITIONS":null
  }
]
Notice only the spaces within the headers have been removed.
Any help on the regex string would be much appreciated, or a pointer in the right direction.
Well, this one works fine:
(?<=\"[A-Z0-9 /-]*) (?=[A-Z0-9 /-]*\":)
It uses two lookaround assertions:
A lookbehind that requires a double quotation mark, followed by capital letters, digits, spaces, hyphens or slashes, before the match.
A lookahead that requires the same character set, followed by a double quotation mark and a colon, after the match.
In between is the space that actually gets matched (and can be replaced with nothing).
Check this out: https://regexr.com/4vogd
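For completeness, here is a sketch of applying it in JavaScript, assuming jsonString is a hypothetical variable holding the serialized text; note that the variable-length lookbehind used here only works in newer regex engines (e.g. modern V8):
// jsonString holds the serialized JSON text (hypothetical variable)
const cleaned = jsonString.replace(/(?<="[A-Z0-9 \/-]*) (?=[A-Z0-9 \/-]*":)/g, "");
Because the values are followed by a comma or brace rather than a colon, only the spaces inside the keys are removed.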
The logic here is to first create a new empty result object, iterate over the previous object's keys, remove the whitespace from each key, then assign it to the result object as the new key and copy over the previous value (intact).
const yourData = [
  {
    "INVOLVED PARTY ID": " 9445999606",
    "CUSTOMER NUMBER": " 9445999606",
    "PRODUCT": "Current Account",
    "LAST UPDATED": "20/02/2020 10:33:00",
    "APPLICATION STATUS": "Clearing Handbrake",
    "PROGRESS": "Progress",
    "APPLICANT": " ACCEPT FLEX INDICATOR Y",
    "QUESTION 3 - HEART/CANCER CONDITIONS": null,
  }
];
let newData = yourData.map(obj => {
  let regexedObj = {};
  Object.keys(obj).forEach(prevKey => {
    // pattern can be /\s/g too, depends on use-case
    const regexedKey = prevKey.replace(/ /g, '');
    regexedObj[regexedKey] = obj[prevKey];
  });
  return regexedObj;
});
console.log(newData);

Using ELK, I'm unsure how to use the right filter in my Logstash config file for pairs

I have a dollarVars field filled with data in this format...
Message 1:
Stuff1,Header1|Value1|Header2|Value2|Header3|Value3|
Message 2:
Stuff1,Header2|Value2|Header4|Value4|Header5|Value5|
Message 3:
Stuff1,Header3|Value3|Header4|Value4|Header5|Value5|
Etc.
My current filter looks like:
filter {
  dissect {
    mapping => {
      "message" => "%{stuff1},%{dollarVars}"
    }
  }
  mutate {
    convert => {
      "stuff1" => "integer"
    }
  }
  date {
    match => ["dateTime", "yyyy/MM/dd HH:mm:ss"]
  }
}
What can I do to take my dissected dollarVars and break it up into each header with its respective value (the one to the right of each header)?
I suspect this would use the csv filter somehow, with "|" instead of ",", and somehow telling it that each odd column is a header and each even column goes with the (n-1) odd column before it.
I think you're going to want to split that dollarVars field using the split function within the mutate filter.
From there, I'd break out into Ruby via the ruby filter to loop through the resulting array and build out the new fields:
ruby {
  code => "
    # assumes a prior mutate/split has already turned dollarVars into an array
    vars = event.get('dollarVars')
    i = 0
    while i < vars.count
      event.set(vars[i], vars[i + 1])  # field name from the odd slot, value from the even slot
      i += 2
    end
  "
}
Here is what fixed what I was trying to do. No mutate needed, just this ruby code.
ruby {
  code => "
    i = 0
    vars = event.get('dollarVars')
    v_array = vars.split('|')
    while i < v_array.count
      event.set(v_array[i], v_array[i+1])
      i += 2
    end
  "
}
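To check the result while testing, a stdout output with the standard rubydebug codec prints every field each event ends up with; for Message 1 above you would expect Header1 => Value1, Header2 => Value2 and Header3 => Value3:
output {
  stdout { codec => rubydebug }  # dump each event's fields for inspection
}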

Pass data from JSON to variable for comparison

I have a request that I make to an API using GET via LWP::UserAgent; the data is returned as JSON, with at most two results, as follows:
{
  "status":1,
  "time":1507891855,
  "response":{
    "prices":{
      "nome1\u2122":{
        "preco1":1111,
        "preco2":1585,
        "preco3":1099
      },
      "nome2":{
        "preco1":519,
        "preco2":731,
        "preco3":491
      }
    }
  }
}
Dump:
$VAR1 = {
    'status' => 1,
    'time' => 1507891855,
    'response' => {
        'prices' => {
            'nome1™' => {
                'preco1' => 1111,
                'preco3' => 1099,
                'preco2' => 1585
            },
            'nome2' => {
                'preco3' => 491,
                'preco1' => 519,
                'preco2' => 731
            }
        }
    }
};
What I would like to do is:
Take this data and save it in a variable so that I can compare it, using if, with another variable that already has the name stored. The comparison would be on nome1/nome2, and when it matches the other variable, I would fetch preco2 and preco3 and print everything.
My biggest problem is that some of these names in the JSON contain characters like ™ (trademark), which arrive as \u2122 (other characters in some cases), so I cannot compare them with the name in the other variable, which already has the correct form:
nome1™
If I could just save the JSON with the characters already "converted", that would help me with the rest.
Basically, after making the request to the API, I want to save the contents in a variable, converting every \u2122 to its respective character (this is the part I do not know how to do in Perl), and then use another variable to compare: if the names are equal, show the price.
Thanks for the help; if anything is unclear, please tell me and I will try to explain it another way.
If I understand correctly, you need to get the JSON that you receive in UTF-8 format into an internal variable that you can process. For that, you may use JSON::XS:
use utf8;
use JSON::XS;

my $name = "nome1™";
my $var1 = decode_json $utf8_encoded_json_text;

# Compare with name in $name
if( defined $var1->{'response'}->{'prices'}->{$name} ) {
    # Do something with the name that matches
    my $match = $var1->{'response'}->{'prices'}->{$name};
    print $match->{'preco1'}, "\n";
}
Make sure you tell the Perl interpreter that your source is in UTF-8 by specifying use utf8; at the beginning of the script, and make sure you edit the script with an editor that saves in that encoding.
The function decode_json will return a reference to the converted value, in this case a hash ref. From there you work your way into the JSON.
If you know $name is going to be in the JSON, you may omit the defined part; otherwise, the defined clause will tell you whether the hash value is there. Once you know, you may do something with it. If the hash keys are single words with no special characters, you may use $var1->{response}->{prices}->{$name}, but it is always safer to quote the keys: $var1->{'response'}->{'prices'}->{$name}. Perl gets a bit ugly handling hash refs...
By the way, JSON::XS also provides the encode_json function to do the opposite, as well as an object-oriented interface.
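As a quick sketch of that object-oriented interface, assuming $utf8_encoded_json_text holds the raw API response as above, one coder object can handle both directions:
use JSON::XS;

my $coder = JSON::XS->new->utf8->pretty;
my $data  = $coder->decode($utf8_encoded_json_text);  # same result as decode_json
print $coder->encode($data);                          # re-encode back to UTF-8 JSON text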

Logstash configuration for CSV with two table structures

I am trying to load a set of CSV files using Logstash.
The CSV files contain two tables, and only the second table is of interest to me. Any suggestions on how to skip the entries in the first table (say, the first 50 lines of each file)?
My current conf file looks as follows:
input {
  file {
    path => "/home/username/pathtoData/*"
    start_position => beginning
  }
}
filter {
  csv {
    columns => ["col_name_a", "col_name_b", ...]
    separator => ","
  }
}
output {
  elasticsearch {
    hosts => ["localhost:portnum"]
  }
}
You didn't specify the structure of the two datasets, but let's assume you have some way to tell them apart. For example, you could use a regular expression that counts the number of commas.
Suppose any row with exactly five commas is one that you don't want. You could conditionally send those rows to the drop filter:
filter {
  if [message] =~ /^([^,]*,){5}[^,]*$/ {
    drop {}
  }
}
Or, you could drop any row that does not have exactly seven commas:
filter {
  if [message] !~ /^([^,]*,){7}[^,]*$/ {
    drop {}
  }
}
If you need more elaborate filtering, you could use the grok filter to examine each row more closely. So long as you have some condition you can filter on, you can use conditionals to drop the rows that you don't want.
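As a sketch of that grok approach, suppose the second table's rows start with an integer ID column; the pattern and field names here are hypothetical:
filter {
  grok {
    # rows that do not look like the second table get tagged instead of parsed
    match => { "message" => "^%{INT:row_id},%{GREEDYDATA:rest}$" }
    tag_on_failure => ["_not_second_table"]
  }
  if "_not_second_table" in [tags] {
    drop {}
  }
}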

apply different functions to each element of a Perl data structure

Given an arbitrarily nested data structure, how can I create a new data structure such that all the elements in it have been standardized by applying a function to each element, depending on the type of the element? For example, I might have
$data = {
    name => 'some one',
    date => '2010-10-10 12:23:45',
    sale => [34, 22, 65],
    cust => {
        name => 'Jimmy',
        addr => '1 Foobar Way',
        amnt => 452.024,
        item => ['books', 'pens', 'post-it notes']
    }
};
and I want to convert all text values to upper case, all dates to UTC date times, find the square of all integers, round down all real numbers and add 1, and so on. So, in effect, I want to apply a different function to each element depending on the type of element.
In reality the data might arrive via a database query, in which case they are already a Perl data structure, or they might start life as a JSON object, in which case I can use JSON::from_json to convert it to a Perl data structure. The idea is to standardize all the values in the data structure based on the value type, and then spit out the Perl data structure back again as a JSON object.
I read the answers to "executing a function on every element of a data structure" and feel that Data::Rmap might do the trick, but I can't figure out how. It seems like Rmap works on all the keys as well, not just the values.
It's crazy straightforward with Data::Rmap, which you mentioned.
use Data::Rmap qw( rmap );
rmap { $_ = transform($_); } $data;
Regarding the question in the comments:
use Data::Rmap qw( rmap );
use Scalar::Util qw( looks_like_number );

# Transforms $_ in place.
sub transform {
    if (looks_like_number($_)) {
        if (...) {
            $_ *= 2;
        }
        $_ = 0+$_;  # Makes it look like a number to JSON::XS
    } else {
        ...
    }
}

&rmap(\&transform, $data);
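As a fuller sketch, here is one way the transform could implement the rules from the question (square the integers, floor the reals and add 1, upper-case all other text; date handling is omitted) before handing the structure back to JSON:
use JSON::XS qw( encode_json );
use Data::Rmap qw( rmap );
use Scalar::Util qw( looks_like_number );
use POSIX qw( floor );

rmap {
    if (looks_like_number($_)) {
        # square integers; floor reals and add 1
        $_ = ($_ == int($_)) ? $_ * $_ : floor($_) + 1;
    } else {
        # upper-case all text values; rmap visits values, not hash keys
        $_ = uc $_;
    }
} $data;

print encode_json($data);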