Selective parsing of a CSV file using Logstash

I am trying to feed data into Elasticsearch from CSV files through Logstash. These CSV files contain the first row as the column names. Is there a way to skip that row while parsing the file? Are there any conditionals/filters I could use so that, in case of an exception, it skips to the next row?
My config file looks like:
input {
    file {
        path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
        type => "promosms_dec15"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        columns => ["Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"]
        separator => ","
    }
    ruby {
        code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
    }
}
output {
    elasticsearch {
        action => "index"
        host => "localhost"
        index => "promosms-%{+dd.MM.YYYY}"
        workers => 1
    }
}
The first few rows of my CSV file look like:
"Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"
"","No","FMN1191MVHV","31/03/2014"
"","No","FMN1191N64G","31/03/2014"
"","No","FMN1192OPMY","31/03/2014"
Is there any way I could skip the first line? Also, if my CSV file ends with an empty new line I get an error. How do I skip empty lines, whether they come at the end of the file or between two rows?

A simple way to do it would be to add the following to your filter (after csv, before ruby):
if [Comm_Plan] == "Comm_Plan" {
    drop { }
}
Assuming the field would never normally have the same value as its column heading, this should work as expected. However, you could be more specific by using:
if [Comm_Plan] == "Comm_Plan"
   and [Queue_Booking] == "Queue_Booking"
   and [Order_Reference] == "Order_Reference"
   and [Generation_Date] == "Generation_Date" {
    drop { }
}
All this does is check whether the fields still hold their column headings as values and, if they do, drop the event.
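Putting it together, a minimal sketch of the whole filter block from the question with the header check in place (after csv, before ruby):
filter {
    csv {
        columns => ["Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"]
        separator => ","
    }
    # The header row parses with every field equal to its own column name, so drop it
    if [Comm_Plan] == "Comm_Plan" {
        drop { }
    }
    ruby {
        code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
    }
}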

To deal with the empty lines, try this:
mutate {
    gsub => ["message","\r\n",""]
}
mutate {
    gsub => ["message","\r",""]
}
mutate {
    gsub => ["message","\n",""]
}
if ![message] {
    drop { }
}

Related

Check if a field is empty while importing data from an Excel file in Laravel?

I am trying to import data from an Excel file into database tables in Laravel. I have successfully imported the data, but if I mistakenly leave a field empty I get an error that the column cannot be null. So I need to check that all the necessary data is provided.
I use this code:
IndependentContractor::create([
    'staff_id' => $staff->id,
    'address' => $excel_row['address'],
    'longitude' => $excel_row['longitude'],
    'latitude' => $excel_row['latitude'],
    'tax_id' => $excel_row['tax_id'],
    'business_name' => $excel_row['business_name'],
    'person_incharge' => $excel_row['person_incharge'],
    'phone_of_person_incharge' => $excel_row['phone_of_person_incharge'],
    'general_manager' => $excel_row['general_manager'],
    'phone_of_general_manager' => $excel_row['phone_of_general_manager'],
]);
I can use if() to check the data, but I would have to repeat it in almost 7 places, because there are 7 different tables in which the data is stored.
The if statement would also look like this:
if(!empty($excel_row['address']) && !empty($excel_row['longitude']) && !empty($excel_row['latitude']) && !empty($excel_row['business_name']) and so on )
So is there any better way to achieve this?
Thanks.
You can try looping over the array keys, assuming the database column names equal the Excel column names. Example:
$data = [];
foreach ($excel_row as $key => $value) {
    if (!empty($value)) {
        $data[$key] = $value;
    } else {
        dd('empty column found'); // your code here
    }
}
// if everything goes fine
IndependentContractor::create($data);
With Laravel Excel's ToModel concern you can also skip incomplete rows by returning null from model():
class UsersImport implements ToModel, WithUpserts
{
    /**
     * @return string|array
     */
    public function uniqueBy()
    {
        return 'email';
    }

    public function model(array $row)
    {
        // skip rows whose first cell is empty
        if (!isset($row[0])) {
            return null;
        }
        return new User([
            'name' => $row[0],
        ]);
    }
}

Escape hyphen in fields for logstash

I have a problem with Logstash parsing; I need to extract a field whose name contains a hyphen:
if "[payload][text][code][session-id]" {
mutate {
add_field => { "session-id" => "%{[payload][text][code][session-id]}" }
}
}
The conditional is matching because of the quotes, but when I run the filter it returns:
session-id: %{[payload][text][code][session-id]}
and what I expect is:
session-id: HK5wfPQgzkKHmgzVF
How can I escape the hyphens in the payload field?
Your conditional is wrong. Putting the field name between double quotes makes it a string literal, which always evaluates to true, so your mutate filter will always run and add the field session-id with the contents of the field [payload][text][code][session-id]. If that field does not exist, the literal string %{[payload][text][code][session-id]} is added as the value of session-id instead.
Your conditional should be if [payload][text][code][session-id], without double quotes.
You can reproduce this behavior with the following test pipeline:
input {
    generator {
        message => "HK5wfPQgzkKHmgzVF"
        count => 1
    }
}
filter {
    dissect {
        mapping => {
            "message" => "%{[payload][text][code][session-id]}"
        }
    }
    if "[payload][text][code][session-id]" {
        mutate {
            add_field => { "session-id" => "%{[payload][text][code][session-id]}" }
        }
    }
}
output {
    stdout { }
}
Running this pipeline gives you the field session-id with the value HK5wfPQgzkKHmgzVF, because the field [payload][text][code][session-id] exists and the quoted conditional is always true. If you change the field name to [payload][text][code][session-id-test] in the dissect block and run the pipeline again, the value of session-id will instead be the literal %{[payload][text][code][session-id]}.
Remove the double quotes from the conditional and it will only be true if the field [payload][text][code][session-id] really exists.
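Applied to the snippet from the question, the fix is just:
if [payload][text][code][session-id] {
    mutate {
        add_field => { "session-id" => "%{[payload][text][code][session-id]}" }
    }
}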

Conditional header in CSV based on input

I am trying to feed data into elasticsearch from csv files, through logstash.
My CSV file contains two types of input, REQUEST and RESPONSE:
If the line contains the string "|REQUEST|" it has 6 columns, and
if the line contains the string "|RESPONSE|" it has 9 columns.
Sample Input Data:
2016-04-04 01:37:36,724|INFO|RI404013736|REQUEST|PaymentVia3DS|PT160115.02516
2016-04-04 01:38:36,724|INFO|RI888993736|RESPONSE|PaymentVia3DS|PT160115.0251|556656|4498399|XYZ9
Filter:
filter {
    if "REQUEST" in [tags] {
        csv {
            columns => ["#timestamp","LOG_TYPE","REQUEST_ID","REQUEST_TYPE","TANSACTION_TYPE","USER_IDENTIFIER"]
            separator => "|"
        }
    }
    if "RESPONSE" in [tags] {
        csv {
            columns => ["#timestamp","LOG_TYPE","REQUEST_ID","REQUEST_TYPE","TANSACTION_TYPE","USER_IDENTIFIER","CODE","ABC","ID_TYPE"]
            separator => "|"
        }
    }
}
The filter is not working for my data. What is the correct way to do this?
Sounds like you're looking for a conditional across the "message" field:
if [message] =~ /REQUEST/ {
    ...
}
else if [message] =~ /RESPONSE/ {
    ...
}
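A sketch of the complete filter, with the csv blocks from the question filling in the branches:
filter {
    if [message] =~ /REQUEST/ {
        csv {
            columns => ["#timestamp","LOG_TYPE","REQUEST_ID","REQUEST_TYPE","TANSACTION_TYPE","USER_IDENTIFIER"]
            separator => "|"
        }
    }
    else if [message] =~ /RESPONSE/ {
        csv {
            columns => ["#timestamp","LOG_TYPE","REQUEST_ID","REQUEST_TYPE","TANSACTION_TYPE","USER_IDENTIFIER","CODE","ABC","ID_TYPE"]
            separator => "|"
        }
    }
}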

How to generate #timestamp in logstash by combining two fields / columns of input csv

We have data coming from external sources in a CSV file, as below:
orderid,OrderDate,BusinessMinute,Quantity,Price
31874,01-01-2013,00:06,2,17.9
The data has the date in one column and the time in another, and I need to generate a timestamp by combining those two columns.
I am using the csv filter to read the above data from the file with the Logstash configuration below, but it generates its own timestamp:
input {
    file {
        path => "/root/data/import/Order.csv"
        start_position => "beginning"
    }
}
filter {
    csv {
        columns => ["orderid","OrderDate","BusinessMinute","Quantity","Price"]
        separator => ","
    }
}
output {
    elasticsearch {
        action => "index"
        host => "localhost"
        index => "demo"
        workers => 1
    }
}
How can I make the combination of OrderDate + BusinessMinute the #timestamp?
Use a mutate filter to combine the OrderDate and BusinessMinute fields into a single (temporary) field, then use the date filter and have it delete the field if it's successful.
filter {
    mutate {
        add_field => {
            "timestamp" => "%{OrderDate} %{BusinessMinute}"
        }
    }
    date {
        match => ["timestamp", "..."]
        remove_field => ["timestamp"]
    }
}
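For the sample row above, the elided pattern would presumably be dd-MM-yyyy HH:mm; note that 01-01-2013 is ambiguous, so the day-first order is an assumption:
date {
    # assumes day-first dates; use MM-dd-yyyy HH:mm if the source is month-first
    match => ["timestamp", "dd-MM-yyyy HH:mm"]
    remove_field => ["timestamp"]
}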

Logstash dynamically split events

Is there a way to split a Logstash (1.4.2) event into multiple other events?
My input looks like this:
{ "parts" => ["one", "two"],
"timestamp" => "2014-09-27T12:29:17.601Z"
"one.key=> "1", "one.value"=>"foo",
"two.key" => "2", "two.value"=>"bar"
}
And I'd like to create two events with the following content:
{ "key" => "1", "value" => "foo", "timestamp" => "2014-09-27T12:29:17.601Z" }
{ "key" => "2", "value" => "bar", "timestamp" => "2014-09-27T12:29:17.601Z" }
The problem is that I can't know the actual "parts" in advance...
Thanks for your help :)
Updating a very old answer because there is a better way to do this in newer versions of logstash without resorting to a custom filter.
You can do this using a ruby filter and a split filter:
filter {
    ruby {
        code => '
            arrayOfEvents = Array.new()
            parts = event.get("parts")
            timestamp = event.get("timestamp")
            # collect one hash per part, then remove the flattened source fields
            parts.each { |part|
                arrayOfEvents.push({
                    "key" => event.get("#{part}.key"),
                    "value" => event.get("#{part}.value"),
                    "timestamp" => timestamp
                })
                event.remove("#{part}.key")
                event.remove("#{part}.value")
            }
            puts arrayOfEvents # debug output, safe to remove
            event.remove("parts")
            event.set("event", arrayOfEvents)
        '
    }
    # split turns the array into one event per element
    split {
        field => 'event'
    }
    # lift the nested fields back to the top level
    mutate {
        rename => {
            "[event][key]" => "key"
            "[event][value]" => "value"
            "[event][timestamp]" => "timestamp"
        }
        remove_field => ["event"]
    }
}
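A quick way to verify the result is to print the events with the rubydebug codec; each incoming event should come out as two events carrying key, value, and timestamp:
output {
    stdout { codec => rubydebug }
}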
My original answer was:
You need to resort to a custom filter for this (you can't call yield from a ruby code filter which is what's needed to generate new events).
Something like this (dropped into lib/logstash/filters/custom_split.rb):
# encoding: utf-8
require "logstash/filters/base"
require "logstash/namespace"

# custom code to break up an event into multiple
class LogStash::Filters::CustomSplit < LogStash::Filters::Base
  config_name "custom_split"
  milestone 1

  public
  def register
    # Nothing
  end # def register

  public
  def filter(event)
    return unless filter?(event)
    if event["parts"].is_a?(Array)
      event["parts"].each do |key|
        e = LogStash::Event.new("timestamp" => event["timestamp"],
                                "key" => event["#{key}.key"],
                                "value" => event["#{key}.value"])
        yield e
      end
      event.cancel
    end
  end
end
And then just put filter { custom_split {} } into your config file.
For future reference, and based on @alcanzar's answer, it is now possible to do things like this:
ruby {
    code => "
        # somefield is an array
        array = event.get('somefield')
        # drop the current event (this was my use case, I didn't need the feeding event)
        event.cancel
        # iterate over it to construct new events
        array.each { |a|
            # create a new logstash event
            generated = LogStash::Event.new({ 'foo' => 'something' })
            # put the event in the pipeline queue
            new_event_block.call(generated)
        }
    "
}