How can I remove fields which are nil in a CSV file - csv

My CSV file contains fields which are nil, like this:
{ "message" => [
[0] "m_FRA-LIENSs-R2012-1;\r"
],
"#version" => "1",
"#timestamp" => "2015-05-24T13:51:14.735Z",
"host" => "debian",
"SEXTANT_UUID" => "m_FRA-LIENSs-R2012-1",
"SEXTANT_ALTERNATE_TITLE" => nil
}
How can I remove all of them: both the nil fields and the messages that contain them?
Here is my CSV file
SEXTANT_UUID|SEXTANT_ALTERNATE_TITLE
a1afd680-543c | ZONE_ENJEU
4b80d9ad-e59d | ZICO
800d640f-1f82 |
I want to delete the last line. I used the ruby filter, but it doesn't work as expected: it removes just the field, not the entire message.

If you configure your Ruby filter like this, it will work:
filter {
    # let ruby check all fields of the event and remove any empty ones
    ruby {
        code => "event.to_hash.delete_if {|field, value| value.blank? }"
    }
}
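Note that on newer Logstash versions (5.x and later) the event can no longer be modified through the hash returned by to_hash; a roughly equivalent sketch using the current event API (treating nil values and empty strings or arrays as "empty" is an assumption, adjust it to your data):
filter {
    ruby {
        # walk a snapshot of the event's fields and remove any that are
        # nil, or that respond to empty? and are empty
        code => "
            event.to_hash.each do |field, value|
                if value.nil? || (value.respond_to?(:empty?) && value.empty?)
                    event.remove(field)
                end
            end
        "
    }
}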

I used if ([message] =~ "^;") { drop { } } and it works; that was for the CSV file.
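For the CSV case, a more explicit variant is to let the csv filter split the columns first and then drop any event whose second column is missing or blank; a sketch, assuming the two-column layout shown above:
filter {
    csv {
        separator => "|"
        columns => ["SEXTANT_UUID", "SEXTANT_ALTERNATE_TITLE"]
    }
    # drop the whole event when the second column is absent or empty
    if ![SEXTANT_ALTERNATE_TITLE] or [SEXTANT_ALTERNATE_TITLE] == "" {
        drop { }
    }
}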

Related

Logstash - splitting the log into a csv file

I want to use Logstash to pick out the relevant logs by a constant value that appears in them, then split each log on the separator ("|") and write it to a csv file with headers. The logs I'm looking for are recognized by the constant (WID2). I also noticed that the message pulled out by GREEDYDATA gets cut off after about 85 characters.
Example log:
2022-01-02 10:32:30,0000001 | WID2 | 3313141414 | Request | STEP_1 | OK | Message
From these logs I want to create a csv file with the headers TIMESTAMP, VALUE, MESSAGE_TYPE, STEP, STATUS, MESSAGE. I do not want to save the constant value (WID2) in the csv file; it only serves to find my logs among the others.
I wrote it but it doesn't work:
input {
    file {
        path => ["path"]
        start_position => "beginning"
        sincedb_path => "path"
    }
}
filter {
    grok {
        match => {
            "message" => "%{GREEDYDATA:SYSLOGMESSAGE}"
        }
    }
    if ([SYSLOGMESSAGE] !~ "WID2") {
        drop {}
    }
    if ([SYSLOGMESSAGE] =~ "WID2") {
        csv {
            separator => "|"
            columns => ["TIMESTAMP", "VALUE", "MESSAGE_TYPE", "STEP", "STATUS", "MESSAGE"]
        }
    }
}
output {
    file {
        path => ["path.csv"]
    }
}
If your log messages have this format:
2022-01-02 10:32:30,0000001 | WID2 | 3313141414 | Request | STEP_1 | OK | Message
And you want to parse every message that has WID2 in it, the following filter will work:
filter {
    if "WID2" in [message] {
        csv {
            separator => "|"
            columns => ["TIMESTAMP", "[@metadata][wid2]", "VALUE", "MESSAGE_TYPE", "STEP", "STATUS", "MESSAGE"]
        }
    } else {
        drop {}
    }
}
The if conditional tests whether WID2 is present in the message; if it is, the csv filter parses it. Since the second column of your csv is the value WID2 and you do not want to save it, you can store its value in the field [@metadata][wid2]; metadata fields are not written by the output block.
If the string WID2 is not present in the message field, the event is dropped.
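If you want to confirm that [@metadata][wid2] is actually being populated while you test, the rubydebug codec can be told to print metadata fields; this is purely a debugging sketch, not something the final pipeline needs:
output {
    stdout {
        # metadata fields are hidden by default; this option makes them visible
        codec => rubydebug { metadata => true }
    }
}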

Parsing out awkward JSON in Logstash

Afternoon,
I've been trying to sort this for the past few weeks and cannot find a solution. We receive some logs via a 3rd party and so far I've used grok to pull out the value below into the details field. Annoyingly, this would be extremely simple if it weren't for all the slashes.
Is there an easy way to parse this data out as JSON in Logstash?
{\"CreationTime\":\"2021-05-11T06:42:44\",\"Id\":\"xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx\",\"Operation\":\"SearchMtpBatch\",\"OrganizationId\":\"xxxxxxxxx-xxx-xxxx-xxxx-xxxxxxx\",\"RecordType\":52,\"UserKey\":\"eample#example.onmicrosoft.com\",\"UserType\":5,\"Version\":1,\"Workload\":\"SecurityComplianceCenter\",\"UserId\":\"example#example.onmicrosoft.com\",\"AadAppId\":\"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx\",\"DataType\":\"MtpBatch\",\"DatabaseType\":\"DataInsights\",\"RelativeUrl\":\"/DataInsights/DataInsightsService.svc/Find/MtpBatch?tenantid=xxxxxxx-xxxxx-xxxx-xxx-xxxxxxxx&PageSize=200&Filter=ModelType+eq+1+and+ContainerUrn+eq+%xxurn%xAZappedUrlInvestigation%xxxxxxxxxxxxxxxxxxxxxx%xx\",\"ResultCount\":\"1\"}
You can achieve this easily with the json filter:
filter {
    json {
        source => "message"
    }
}
If your source data actually contains those backslashes, then you need to somehow remove them before Logstash can recognise the message as valid JSON.
You could do that before it hits Logstash, in which case the json codec will probably work as expected. Or, if you want Logstash to handle it, you can use the mutate filter's gsub option followed by the json filter to parse it:
filter {
    mutate {
        gsub => ["message", "[\\]", "" ]
    }
    json {
        source => "message"
    }
}
A couple of things to note: this will just blindly strip out all backslashes. If your strings might ever legitimately contain backslashes, you need to do something a little more sophisticated. I've had trouble escaping backslashes in gsub before and found that using a regex character class ([]) is safer.
Here's a docker one-liner to run that config. The stdin input and stdout output are the default when using -e to specify config on the command line, so I've omitted them here for readability:
docker run --rm -it docker.elastic.co/logstash/logstash:7.12.1 -e 'filter { mutate { gsub => ["message", "[\\]", "" ]} json { source => "message" } }'
Pasting your example in and hitting return results in this output:
{
"#timestamp" => 2021-05-13T01:57:40.736Z,
"RelativeUrl" => "/DataInsights/DataInsightsService.svc/Find/MtpBatch?tenantid=xxxxxxx-xxxxx-xxxx-xxx-xxxxxxxx&PageSize=200&Filter=ModelType+eq+1+and+ContainerUrn+eq+%xxurn%xAZappedUrlInvestigation%xxxxxxxxxxxxxxxxxxxxxx%xx",
"OrganizationId" => "xxxxxxxxx-xxx-xxxx-xxxx-xxxxxxx",
"UserKey" => "eample#example.onmicrosoft.com",
"DataType" => "MtpBatch",
"message" => "{\"CreationTime\":\"2021-05-11T06:42:44\",\"Id\":\"xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx\",\"Operation\":\"SearchMtpBatch\",\"OrganizationId\":\"xxxxxxxxx-xxx-xxxx-xxxx-xxxxxxx\",\"RecordType\":52,\"UserKey\":\"eample#example.onmicrosoft.com\",\"UserType\":5,\"Version\":1,\"Workload\":\"SecurityComplianceCenter\",\"UserId\":\"example#example.onmicrosoft.com\",\"AadAppId\":\"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx\",\"DataType\":\"MtpBatch\",\"DatabaseType\":\"DataInsights\",\"RelativeUrl\":\"/DataInsights/DataInsightsService.svc/Find/MtpBatch?tenantid=xxxxxxx-xxxxx-xxxx-xxx-xxxxxxxx&PageSize=200&Filter=ModelType+eq+1+and+ContainerUrn+eq+%xxurn%xAZappedUrlInvestigation%xxxxxxxxxxxxxxxxxxxxxx%xx\",\"ResultCount\":\"1\"}",
"UserType" => 5,
"UserId" => "example#example.onmicrosoft.com",
"type" => "stdin",
"host" => "de2c988c09c7",
"#version" => "1",
"Operation" => "SearchMtpBatch",
"AadAppId" => "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx",
"ResultCount" => "1",
"DatabaseType" => "DataInsights",
"Version" => 1,
"RecordType" => 52,
"CreationTime" => "2021-05-11T06:42:44",
"Id" => "xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",
"Workload" => "SecurityComplianceCenter"
}
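As noted above, that gsub strips every backslash. If your fields might legitimately contain backslashes, a slightly more targeted option is to remove only the backslashes that escape double quotes; the pattern below is an assumption, so verify it against your real data:
filter {
    mutate {
        # remove a backslash only when it immediately precedes a double quote
        gsub => ['message', '[\\]"', '"']
    }
    json {
        source => "message"
    }
}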

Add a header line before each json document

I have a json file with 1000 json objects.
Is there any way to add a header line before each json document? What is the easiest way?
Example: I have 1000 objects like this:
{"id":58,"first_name":"Louis","last_name":"Jordan","email":"ljordan1l#nature.com","gender":"Male","Latitude":"-15.93444","Longitude":"-50.14028"}
I want to add an index header like the one below for every json object so that I can use the Elasticsearch Bulk API:
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "unique_id" } }
{"id":58,"first_name":"Louis","last_name":"Jordan","email":"ljordan1l#nature.com","gender":"Male","Latitude":"-15.93444","Longitude":"-50.14028"}
If you are willing to leverage Logstash, you don't need to modify your file and can simply read it line by line and stream it to ES using the elasticsearch output which leverages the Bulk API.
Store the following Logstash configuration in a file named es.conf (make sure the file path and ES hosts match your settings):
input {
    file {
        path => "/path/to/your/json"
        sincedb_path => "/dev/null"
        start_position => "beginning"
        codec => "json"
    }
}
filter {
    mutate {
        remove_field => ["@version", "@timestamp"]
    }
}
output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "test"
        document_type => "type1"
        document_id => "%{id}"
    }
}
Then you need to install Logstash, and you'll be able to run the following command in order to load your JSON file into your ES server:
bin/logstash -f es.conf
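While testing, it can also help to temporarily add a stdout output next to the elasticsearch one so you can see each event before it is indexed; this is just a debugging sketch, not part of the original config:
output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "test"
        document_type => "type1"
        document_id => "%{id}"
    }
    # print every event to the console while debugging
    stdout {
        codec => rubydebug
    }
}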
I found the best way to add a header line before each json document here:
https://stackoverflow.com/a/30899000/5029432

Elastic Search Bulk Import from JSON without ID

Is there any way to import data from a JSON file into elasticSearch without having to provide ID to each document?
I have some data in a JSON file. It contains around 1000 documents, but no ID has been specified for any document. Here's what the data looks like:
{"business_id": "aasd231as", "full_address": "202 McClure 15034", "hours":{}}
{"business_id": "123123444", "full_address": "1322 lure 34", "hours": {}}
{"business_id": "sd231as", "full_address": "2 McCl 5034", "hours": {}}
It does not have {"index":{"_id":"5"}} before any document.
Now I am trying to import the data into elasticsearch using the following command:
curl -XPOST localhost:9200/newindex/newtype/_bulk?pretty --data-binary @path/file.json
But it throws the following error:
"type" : "illegal_argument_exception",
"reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]"
This is because the action/metadata line is missing before each document.
Is there any way to import the data without providing {"index":{"_id":"5"}} before each document?
Any help will be highly appreciated!!
How about using Logstash, which is perfectly suited for this task? Just use the following config file and you're done.
Save the following config in logstash.conf:
input {
    file {
        path => "/path/to/file.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => "json"
    }
}
filter {
    mutate {
        remove_field => [ "@version", "@timestamp", "path", "host" ]
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "newindex"
        document_type => "newtype"
        workers => 1
    }
}
Then start logstash with
bin/logstash -f logstash.conf
Another option, perhaps the easier one since you are not filtering data, is to use Filebeat. The latest filebeat-5.0.0-alpha3 has a JSON shipper. Here is a sample.

elasticsearch delete documents using logstash and csv

Is there any way to delete documents from ElasticSearch using Logstash and a csv file?
I read the Logstash documentation and found nothing, and I tried a few configs, but nothing happened when using action "delete":
output {
    elasticsearch {
        action => "delete"
        host => "localhost"
        index => "index_name"
        document_id => "%{id}"
    }
}
Has anyone tried this? Is there anything special that I should add to the input and filter sections of the config? I used the file plugin for input and the csv plugin for the filter.
It is definitely possible to do what you suggest, but if you're using Logstash 1.5, you need to use the transport protocol, as there is a bug in Logstash 1.5 when doing deletes over the HTTP protocol (see issue #195).
So if your delete.csv CSV file is formatted like this:
id
12345
12346
12347
And your delete.conf Logstash config looks like this:
input {
    file {
        path => "/path/to/your/delete.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        columns => ["id"]
    }
}
output {
    elasticsearch {
        action => "delete"
        host => "localhost"
        port => 9300                        # <--- make sure you have this
        protocol => "transport"             # <--- make sure you have this
        index => "your_index"               # <--- replace this
        document_type => "your_doc_type"    # <--- replace this
        document_id => "%{id}"
    }
}
Then when running bin/logstash -f delete.conf you'll be able to delete all the documents whose id is specified in your CSV file.
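For what it's worth, on more recent Logstash versions (2.x and later) the elasticsearch output talks HTTP and deletes work fine over it, so the transport-specific settings are no longer needed; a sketch of the equivalent output (the index name still needs to be replaced with your own):
output {
    elasticsearch {
        action => "delete"
        hosts => ["localhost:9200"]
        index => "your_index"
        document_id => "%{id}"
    }
}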
In addition to Val's answer, I would add that if you have a single input that has a mix of deleted and upserted rows, you can do both if you have a flag that identifies the ones to delete. The output > elasticsearch > action parameter can be a "field reference," meaning that you can reference a per-row field. Even better, you can change that field to a metadata field so that it can be used in a field reference without being indexed.
For example, in your filter section:
filter {
    # [deleted] is the name of your field
    if [deleted] {
        mutate {
            add_field => {
                "[@metadata][elasticsearch_action]" => "delete"
            }
        }
        mutate {
            remove_field => [ "deleted" ]
        }
    } else {
        mutate {
            add_field => {
                "[@metadata][elasticsearch_action]" => "index"
            }
        }
        mutate {
            remove_field => [ "deleted" ]
        }
    }
}
Then, in your output section, reference the metadata field:
output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "myindex"
        action => "%{[@metadata][elasticsearch_action]}"
        document_type => "mytype"
    }
}
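For completeness, here is one way the [deleted] flag could be produced from the CSV in the first place; the column layout is an assumption for illustration, not something from the original question:
filter {
    csv {
        separator => ","
        # hypothetical layout: a document id plus a "deleted" flag ("true"/"false")
        columns => ["id", "deleted"]
    }
    # csv values are strings, so turn the flag into a real boolean for the if [deleted] test above
    mutate {
        convert => { "deleted" => "boolean" }
    }
}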