I am trying to load a set of CSV files using Logstash.
The CSV files contain two tables, and only the second table is of interest to me. Any suggestions on how to skip the entries in the first table (say, the first 50 lines of the CSV file)?
My current conf file looks as follows:
input {
  file {
    path => "/home/username/pathtoData/*"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["col_name_a", "col_name_b", ...]
    separator => ","
  }
}
output {
  elasticsearch {
    hosts => ["localhost:portnum"]
  }
}
You didn't specify the structure of the two datasets, but let's assume you have some way to tell them apart. For example, you could use a regular expression that counts the number of commas.
Suppose any row with 5 commas is one that you don't want. You could conditionally send those rows to the drop filter:
filter {
  if [message] =~ /^([^,]*,){5}[^,]*$/ {
    drop {}
  }
}
Or, you could drop any row that does not have exactly 7 commas:
filter {
  if [message] !~ /^([^,]*,){7}[^,]*$/ {
    drop {}
  }
}
If you need more elaborate filtering, you could use the grok filter to examine each row more closely. So long as you have some condition you can filter on, you can use conditionals to drop the rows that you don't want.
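For example, here is a rough sketch of that idea (the pattern and the tag name are made up for illustration; you would adapt them to whatever actually distinguishes the rows of the second table, such as a leading integer key):
filter {
  # hypothetical: data rows of the wanted table start with an integer key
  grok {
    match => { "message" => "^%{INT:first_col}," }
    tag_on_failure => ["unwanted_row"]
  }
  if "unwanted_row" in [tags] {
    drop {}
  }
}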
Related
Hi, I'm new to Logstash and Grok syntax, and I'm trying to understand what these lines mean:
codec => multiline { pattern => "^{$" negate => "true" what => "previous" }
and
mutate {
  gsub => ["message", "'", '"']
}
Thanks!
It is pretty well explained in the official documentation of the multiline codec plugin:
pattern => ^{$ matches lines that contain only a { character (the line ends right after it)
negate => true means that lines NOT matching the pattern are considered
what => previous means that the current matched line relates to the previous one
In summary, these settings mean that all lines that do NOT consist of only { belong to the previous line.
Concretely, this multiline codec is for putting together JSON data that was pretty-printed on several lines, like this:
{
"bla": {
"test": 1
}
}
The above pretty-printed JSON will be handled as if it had been printed as a single line, like this:
{ "bla": { "test": 1 } }
Regarding the second filter (mutate/gsub), it is used to replace all single quotes with double quotes.
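To see how the two pieces typically fit together, here is a minimal sketch (the file path and the trailing json filter are assumptions of mine, not part of the original configuration):
input {
  file {
    path => "/var/log/myapp/pretty.log"   # hypothetical path
    codec => multiline {
      pattern => "^{$"
      negate => "true"
      what => "previous"
    }
  }
}
filter {
  # turn the single quotes into double quotes so the payload is valid JSON
  mutate {
    gsub => ["message", "'", '"']
  }
  # parse the reassembled single-line JSON into event fields
  json {
    source => "message"
  }
}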
I have a dollarVars field filled with data in this format...
Message 1:
Stuff1,Header1|Value1|Header2|Value2|Header3|Value3|
Message 2:
Stuff1,Header2|Value2|Header4|Value4|Header5|Value5|
Message 3:
Stuff1,Header3|Value3|Header4|Value4|Header5|Value5|
Etc.
My current filter looks like:
filter {
  dissect {
    mapping => {
      "message" => "%{stuff1},%{dollarVars}"
    }
  }
  mutate {
    convert => {
      "stuff1" => "integer"
    }
  }
  date {
    match => ["dateTime", "yyyy/MM/dd HH:mm:ss"]
  }
}
What can I do to take my dissected dollarVars and break it up into the header with its respective value (the one to the right of each header)?
I suspect this would use the csv filter somehow using "|" instead of "," and somehow telling it each dollarVar odd column is a header and each even column goes with its respective (n-1) odd column.
I think you're going to want to split that dollarVars using the split function within mutate.
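That mutate step might look something like this (just a sketch, assuming the field is called dollarVars as in your dissect mapping):
mutate {
  # turn "Header1|Value1|Header2|Value2|..." into an array
  split => { "dollarVars" => "|" }
}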
From there, I'd break out into ruby via the ruby filter to loop through the resultant split array and build out the new fields:
ruby {
  code => "
    # after the mutate split, dollarVars is an array: header, value, header, value, ...
    vars = event.get('dollarVars')
    i = 0
    while i < vars.count
      event.set(vars[i], vars[i + 1])
      i += 2
    end
  "
}
Here is what fixed what I was trying to do. No mutate needed, just this ruby code.
ruby {
  code => "
    i = 0
    vars = event.get('dollarVars')
    v_array = vars.split('|')
    while i < v_array.count
      event.set(v_array[i], v_array[i+1])
      i += 2
    end
  "
}
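For Message 1 above, this loop would leave the event with fields roughly like Header1 = Value1, Header2 = Value2 and Header3 = Value3.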
I have a request that I make to an API using GET with LWP::UserAgent,
and the data is returned as JSON, with at most two results, as follows:
{
  "status":1,
  "time":1507891855,
  "response":{
    "prices":{
      "nome1\u2122":{
        "preco1":1111,
        "preco2":1585,
        "preco3":1099
      },
      "nome2":{
        "preco1":519,
        "preco2":731,
        "preco3":491
      }
    }
  }
}
Dump:
$VAR1 = {
  'status' => 1,
  'time' => 1507891855,
  'response' => {
    'prices' => {
      'nome1' => {
        'preco1' => 1111,
        'preco3' => 1099,
        'preco2' => 1585
      },
      'nome2' => {
        'preco3' => 491,
        'preco1' => 519,
        'preco2' => 731
      }
    }
  }
};
What I would like to do is:
Take this data and save it in a variable so I can compare it, using if, against another variable that already has the name stored. The comparison would be against nome1 / nome2, and if it matches the other variable, it would get preco2 and preco3 and print everything.
My biggest problem is that some of these names in the JSON contain characters like ™ (TradeMark), which arrives as \u2122 (in some cases other characters), so I cannot compare them against the name in the other variable, which already holds the correct name:
nome1™
If I could just save the JSON with the characters already "converted", that would help me with the rest.
Basically, after making the request to the API, I want to save the contents in a variable, converting every \u2122 to its respective character (this is the part I do not know how to do in Perl), and then use another variable to compare whether the names are equal and, if so, show the prices.
Thanks for the help. If anything is unclear, please tell me and I will try to explain it another way.
If I understand correctly, you need to get the JSON that you receive in UTF8 format into an internal variable that you can process. For that, you may use JSON::XS:
use utf8;
use JSON::XS;
my $name = "nome1™";
my $var1 = decode_json $utf8_encoded_json_text;
# Compare with name in $name
if( defined $var1->{'response'}->{'prices'}->{$name} ) {
    # Do something with the name that matches
    my $match = $var1->{'response'}->{'prices'}->{$name};
    print $match->{'preco1'}, "\n";
}
Make sure you tell the Perl interpreter that your source is in UTF8 by specifying use utf8; at the beginning of the script. Then make sure you are editing the script with an editor that supports that format.
The function decode_json will return a ref to the converted value. In this case a hash ref. From there you work your way into the JSON.
If you know $name is going to be in the JSON you may omit the defined part. Otherwise, the defined clause will tell you whether the hash value is there. Once you know, you may do something with it. If the hash keys are a single word with no special characters, you may use $var1->{response}->{prices}->{$name}, but it is always safer to use $var1->{'response'}->{'prices'}->{$name}. Perl gets a bit ugly handling hash refs...
By the way, in JSON::XS you will also find the encode_json function to do the opposite and also an object oriented interface.
I have an imported CSV ($csv) with multiple headers, one of which is "Target Server". In the CSV this column has values device1, device2 etc.
I also have a hashtable ($hash) with name/value pairs of name(device1) - value(fqdn1) etc.
So I would like to replace the "device1" in CSV with the correct value from the hashtable, like:
foreach($row in $csv)
if($hash[$_.Name]){
$row."Target Server"= $hash[$_.Value]
}
Am I getting warm?
Use the ContainsKey() method to see if the hashtable contains an entry with a specific name/key:
foreach($row in $csv) {
    if($hash.ContainsKey($row.'Target Server')) {
        $row.'Target Server' = $hash[$row.'Target Server']
    }
}
I want to import a lot of information from a CSV file into Elasticsearch.
My issue is that I don't know how to use an equivalent of substring to select part of a CSV column.
In my case I have a date field (YYYYMMDD) and I want to have (YYYY-MM-DD).
I use filter, mutate, gsub like this:
filter {
  mutate {
    gsub => ["date", "[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789]", "[0123456789][0123456789][0123456789][0123456789]-[0123456789][0123456789]-[0123456789][0123456789]"]
  }
}
But my result is wrong.
I can identify my string, but I don't know how to extract part of it.
My target is to have something like:
gsub => ["date", "[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789]","%{date}(0..3}-%{date}(4..5)-%{date}"(6..7)]
%{date}(0..3} : select from the first to the fourth character of the CSV column date
You can use the ruby plugin to do the conversion. As you say, you will have a date field, so we can use it directly in ruby:
filter {
  ruby {
    code => "
      # parse the YYYYMMDD string, then re-emit it as YYYY-MM-DD
      date = Time.strptime(event.get('date'), '%Y%m%d')
      event.set('date_new', date.strftime('%Y-%m-%d'))
    "
  }
}
The date_new field is the format you want.
First, you can use a regexp character range, so rather than [0123456789], you can write [0-9]. If you know there will be exactly 4 digits, you can use [0-9]{4}.
Second, you want to "capture" parts of your input string and reorder them in the output. For that, you need capture groups:
([0-9]{4})([0-9]{2})([0-9]{2})
where parens define the groups. Then you can reference those on the right side of your gsub:
\1-\2-\3
\1 is the first capture group, etc.
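Putting this together, the mutate filter for the question's date field might look roughly like this:
filter {
  mutate {
    # e.g. 20170131 becomes 2017-01-31
    gsub => ["date", "([0-9]{4})([0-9]{2})([0-9]{2})", "\1-\2-\3"]
  }
}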
You might also consider getting these three fields when you do the grok{}, and then putting them together again later (perhaps with add_field).
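A rough sketch of that alternative (the yyyy/mm/dd field names and date_new are placeholders I chose, not anything from the question):
filter {
  grok {
    # capture the three parts of the YYYYMMDD value into temporary fields
    match => { "date" => "^(?<yyyy>[0-9]{4})(?<mm>[0-9]{2})(?<dd>[0-9]{2})$" }
    # reassemble them with dashes once the match succeeds
    add_field => { "date_new" => "%{yyyy}-%{mm}-%{dd}" }
  }
}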