Logstash Produces No Output - csv

I followed the tutorial found here to set up the ELK stack, and eventually I managed to get everything working. When I tried to modify the system to read in CSV files, however, it stopped working entirely. The conf file looks like this:
input {
file {
path => "/home/user/remaining/path/*.csv"
type => "transaction"
start_position => "beginning"
}
}
filter {
if [type] == "transaction" {
csv {
columns => ["@timestamp", "ip address", "domain", "user", "demo", "id", "activity", "detail"]
separator => ","
}
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "logstash-test"
workers => 1
}
}
I've tried following a number of tutorials and guides for setting this up, and as far as I can tell logstash is still connected to elasticsearch. I just don't see any output. My suspicion is that it isn't reading the files at all. Part of the problem is that I'm not sure how to test each component of logstash. Is there something I'm likely to have missed?
Edit:
csv files look generally like this:
2016-02-29T22:26:39.319700,22.111.11.11,place.domain.ca,bob,DEMO,95081299250aa8,TI_START,"{'field': 'data', 'field2': 'moredata', 'anotherfield': 'evenmoredata', 'continuedfield': 'habbo', 'size': '16'}"
2016-02-29T22:27:00.098426,24.111.11.11,otherplace.domain.ca,bob,DEMO,390s8clearlyfake,TI_END,"{'field': 'data', 'field2': 'moredata', 'anotherfield': 'evenmoredata', 'continuedfield': 'habbo', 'size': '16'}"
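One way to test the csv side of the pipeline on its own is to split a sample line with Python's standard csv module. Like the Logstash csv filter with its default double-quote quote character, it keeps the quoted last field together despite its embedded commas. This is only a sanity check of the column count; the helper name is hypothetical.

```python
import csv
import io

# The csv filter in the question declares eight columns.
EXPECTED_COLUMNS = 8

# First sample line from the question.
SAMPLE = (
    "2016-02-29T22:26:39.319700,22.111.11.11,place.domain.ca,bob,DEMO,"
    "95081299250aa8,TI_START,"
    "\"{'field': 'data', 'field2': 'moredata', 'anotherfield': 'evenmoredata', "
    "'continuedfield': 'habbo', 'size': '16'}\"\n"
)


def split_rows(text):
    """Split CSV text into rows; double-quoted fields keep their embedded commas."""
    return list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))


if __name__ == "__main__":
    for row in split_rows(SAMPLE):
        status = "ok" if len(row) == EXPECTED_COLUMNS else "MISMATCH"
        print(status, len(row), row[6])
```

If this reports eight columns, the csv filter settings are unlikely to be the reason nothing shows up on stdout.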
I've also noticed that when I go to localhost:9200/logstash-test/ I'm getting a 404. I'm not sure if this is because there's no data transmitting, or if it's a different connection issue.

As worked out in the discussion here: https://discuss.elastic.co/t/logstash-not-showing-any-output-solved/28636/16
my files were more than 24 hours old, and 86400 seconds (24 hours) was the file input's default ignore_older, so the files were skipped. Setting a larger value fixed it:
input {
file {
ignore_older => 864000
path => "/home/sean/cost-logs/transaction/*.csv"
type => "transaction"
start_position => "beginning"
}
}
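To check whether ignore_older is the culprit, you can list which matching files were last modified longer ago than the threshold, since the file input compares each file's modification time against that value in seconds. A minimal sketch using only the standard library; the helper name and the example glob are placeholders, not anything Logstash provides.

```python
import glob
import os
import time

# Threshold in seconds; 86400 (one day) was the default in this thread's Logstash version.
IGNORE_OLDER = 86400


def files_ignored(pattern, ignore_older=IGNORE_OLDER, now=None):
    """Return paths matching `pattern` whose mtime is more than `ignore_older` seconds ago."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in glob.glob(pattern)
        if now - os.path.getmtime(path) > ignore_older
    )


if __name__ == "__main__":
    # Hypothetical glob; substitute the path from your own file input.
    for path in files_ignored("/home/user/remaining/path/*.csv"):
        print("would be ignored:", path)
```

Any file this prints would need a larger ignore_older (or a fresh modification time) before the file input picks it up.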

Related

Logstash with elasticsearch input and output keep looping results

I would like to reindex my logs and filter them again. From what I found online, Logstash can be used to filter the data a second time.
I tried it, and it does split my data into separate fields; however, the data keeps looping. I have 100,000 logs, but after filtering and outputting them to Elasticsearch I find more than 100,000 documents there, and the logs are duplicated. Does anyone have an idea why?
Moreover, I get the log below when running Logstash. Although it reports an error parsing JSON, the logs still seem to be filtered. Why would that be?
Thank you!
Here is my logstash config:
input {
elasticsearch {
hosts => "10.0.132.56"
index => "logstash-2018.01.04"
}
}
filter{
grok {
match => {"message" => "%{TIMESTAMP_ISO8601:logdate} %{GREEDYDATA:vmname} %{GREEDYDATA:message}"}
overwrite => [ "message" ]
}
}
filter {
json {
source => "scrmsg"
}
}
output {
elasticsearch {
hosts => ["10.0.132.64:9200"]
manage_template => false
index => "logstash-2018.01.04-1"
}
}
Here is the error log:
[2018-01-11T15:15:32,010][WARN ][logstash.filters.json ] Error parsing json {:source=>"scrmsg", :raw=>"Trident/5.0)\",\"geoip_country\":\"US\",\"allowed\":\"1\",\"threat_score\":\"268435456\",\"legacy_unique_id\":\"\",\"cache_status\":\"-\",\"informed_id\":\"\",\"primitive_id\":\"2BC2D8AD-7AD0-3CAD-9453-B0335F409701\",\"valid_ajax\":\"0\",\"orgin_response_time\":\"0.081\",\"request_id\":\"cd2ae0a8-0921-48b6-b03f-15c71a55100b\",\"bytes_returned_origin\":\"83\",\"server_ip\":\"10.0.10.16\",\"origin_status_code\":\"\",\"calculated_pages_per_min\":\"1\",\"calculated_pages_per_session\":\"1\",\"calculated_session_length\":\"0\",\"k_s\":\"\",\"origin_address\":\"10.0.10.16:443\",\"request_protocol\":\"https\",\"server_serial\":\"5c3eb4ad-3799-4bd8-abb2-42edecd54b99\",\"nginx_worker_process\":\"19474\",\"origin_content_type\":\"application/json;charset=UTF-8\",\"lb_request_time\":\"\",\"SID\":\"\",\"geoip_org\":\"Drake Holdings LLC\",\"accept\":\"*/*\",\"accept_encoding\":\"gzip, deflate\",\"accept_language\":\"\",\"connection\":\"Keep-Alive\",\"http_request_length\":\"418\",\"real_ip_header_value\":\"204.79.180.18\",\"http_host\":\"www.honeyworkshop.com\",\"machine_learning_score\":\"\",\"HSIG\":\"ALE_UHCF\",\"ZID\":\"\",\"ZUID\":\"\",\"datacenter_id\":\"363\",\"new_platform_domain_id\":\"3063fc0b-5b48-4413-9bc7-600039caf64c\",\"whitelist_score\":\"0\",\"billable\":\"1\",\"distil_action\":\"#proxy\",\"js_additional_threats\":\"\",\"js_kv_additional_threats\":\"\",\"re_field_1\":\"\",\"re_field_2\":\"\",\"re_field_3\":\"\",\"http_accept_charset\":\"\",\"sdk_token_id\":\"\",\"sdk_application_instance_id\":\"\",\"per_path_calculated_pages_per_minute\":\"1\",\"per_path_calculated_pages_per_session\":\"1\",\"path_security_type\":\"api\",\"identification_provider\":\"web\",\"identifier_record_pointer\":\"\",\"identifier_record_value\":\"\",\"path_rule_scope_id\":\"\",\"experiment_id\":\"0\",\"experiment_score\":\"\",\"experiment_group_id\":\"\",\"experiment_auxiliary_string\":\
"\",\"type\":\"distil\"}\n", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'Trident': was expecting ('true', 'false' or 'null') at [Source: (byte[])"Trident/5.0)","geoip_country":"US","allowed":"1","threat_score":"268435456","legacy_unique_id":"","cache_status":"-","informed_id":"","primitive_id":"2BC2D8AD-7AD0-3CAD-9453-B0335F409701","valid_ajax":"0","orgin_response_time":"0.081","request_id":"cd2ae0a8-0921-48b6-b03f-15c71a55100b","bytes_returned_origin":"83","server_ip":"10.0.10.16","origin_status_code":"","calculated_pages_per_min":"1","calculated_pages_per_session":"1","calculated_session_length":"0","k_s":"","origin_address":"10.0.10"[truncated 1180 bytes]; line: 1, column: 9]>}
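The raw value in that warning starts in the middle of a value (Trident/5.0)",...), so the scrmsg field holds only the tail of a JSON object, which no JSON parser will accept. When the json filter fails it adds the _jsonparsefailure tag and lets the event continue, so the fields grok already extracted are still indexed, which would explain why the logs still look filtered. The Python sketch below only emulates that behaviour for illustration; the function name is hypothetical and this is not the plugin's code.

```python
import json

# Tag the Logstash json filter adds on failure (its default tag_on_failure).
FAILURE_TAG = "_jsonparsefailure"


def apply_json_filter(event, source):
    """Rough emulation: parse event[source] as JSON and merge an object into the event.

    On a parse failure the event is kept as-is apart from the failure tag.
    A missing source field or non-object JSON leaves the event untouched in this sketch.
    """
    if source not in event:
        return event
    try:
        parsed = json.loads(event[source])
    except ValueError:
        event.setdefault("tags", []).append(FAILURE_TAG)
        return event
    if isinstance(parsed, dict):
        event.update(parsed)
    return event


if __name__ == "__main__":
    fragment = 'Trident/5.0)","geoip_country":"US","type":"distil"}'
    print(apply_json_filter({"logdate": "2018-01-04", "scrmsg": fragment}, "scrmsg"))
```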

Need help in identifying TYPE of document ingested by Elasticsearch (through Logstash)

I used Logstash to ingest csv files from https://www.kaggle.com/wcukierski/the-simpsons-by-the-data and saved it to Elasticsearch. For starters, I ingested simpsons_characters.csv using the following conf:
input {
file {
path => "/Users/xyz/Downloads/the-simpsons-by-the-data/simpsons_characters.csv"
start_position => beginning
sincedb_path => "/dev/null"
}
}
filter {
csv {
columns => ["id", "name", "normalized_name", "gender"]
separator => ","
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => "localhost"
action => "index"
index => "simpsons"
}
}
However, when I query http://localhost:9200/simpsons/name/Lou,
where
simpsons is the index
and name is the type (I think, but I'm not sure),
I get the following response back:
{
"_index": "simpsons",
"_type": "name",
"_id": "Lou",
"found": false
}
So the question is: why am I not getting the correct response? Further, when you do bulk ingestion through CSV, what is the type of the document?
Thanks!
The default type in the Logstash Elasticsearch output is logs (unless the event has its own type field). So no matter how you define your IDs (taking them from the CSV with document_id => "%{id}", or letting ES generate its own), you can get those documents at http://localhost:9200/simpsons/logs/THE_ID.
If you don't know the ID and simply want to check whether anything is there: http://localhost:9200/simpsons/logs/_search?pretty.
To see the mapping of your index, for example to find out its _type: http://localhost:9200/simpsons/_mapping?pretty.
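The URLs above follow the pre-7.x document API layout, /{index}/{type}/{id}. That is why the type segment has to match what Logstash actually wrote (logs by default here) rather than a column name such as name. A small helper that builds these URLs; the function names are illustrative, and since mapping types were removed in later Elasticsearch releases, the type segment only applies to versions like the one in this thread.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:9200"


def _seg(value):
    """Percent-encode one path segment (slashes included)."""
    return quote(str(value), safe="")


def doc_url(index, doc_type, doc_id, base=BASE_URL):
    """URL of a single document: /{index}/{type}/{id} (pre-7.x layout)."""
    return f"{base}/{_seg(index)}/{_seg(doc_type)}/{_seg(doc_id)}"


def search_url(index, doc_type, base=BASE_URL):
    """URL listing documents of one type in an index."""
    return f"{base}/{_seg(index)}/{_seg(doc_type)}/_search?pretty"


def mapping_url(index, base=BASE_URL):
    """URL showing the index mapping, including the types that exist."""
    return f"{base}/{_seg(index)}/_mapping?pretty"


if __name__ == "__main__":
    print(doc_url("simpsons", "logs", 10))
    print(search_url("simpsons", "logs"))
    print(mapping_url("simpsons"))
```

Any of these can be opened in a browser or fetched with urllib.request.urlopen.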
To change the default _type:
elasticsearch {
hosts => "localhost"
action => "index"
index => "simpsons"
document_type => "characters"
document_id => "%{id}"
}
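Here you haven't specified an id field in your Logstash output. In that case Elasticsearch assigns a random ID to each document, while you are searching for a document with id=Lou.
Adding document_id => "%{id}" makes each document's _id the value of the CSV id column, so you can fetch a character by that value (note that Lou comes from the name column, not the id column).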
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => "localhost"
action => "index"
index => "simpsons"
document_id => "%{id}"
}
}
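document_id => "%{id}" relies on Logstash's field-reference interpolation: each %{field} is replaced by that field's value on the event, so every row gets its CSV id as its _id. The sketch below emulates only the simple top-level case in Python to show the effect. It is not Logstash's implementation and ignores nested references like %{[a][b]} and date formats like %{+YYYY.MM.dd}. Like Logstash, it leaves the reference text unchanged when the field is missing.

```python
import re

# Matches simple %{field} references (no nested [a][b] paths, no %{+date} formats).
FIELD_REF = re.compile(r"%\{([^}\[\]+]+)\}")


def sprintf(template, event):
    """Replace each %{field} with str(event[field]); unknown fields are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(event[name]) if name in event else match.group(0)

    return FIELD_REF.sub(substitute, template)


if __name__ == "__main__":
    # Rows taken from the episode example in the next question.
    rows = [{"id": "10", "title": "Homer's Night Out"}, {"id": "12", "title": "Krusty Gets Busted"}]
    for row in rows:
        print(sprintf("%{id}", row), row["title"])
```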

import csv into elasticsearch

I'm doing the "Elasticsearch getting started" tutorial. Unfortunately, it doesn't cover the first step, which is importing a CSV database into Elasticsearch.
I googled for a solution, but what I found doesn't work. Here is what I want to achieve and what I have:
I have a file with data which I want to import (simplified):
id,title
10,Homer's Night Out
12,Krusty Gets Busted
I would like to import it using Logstash. After some research online I ended up with the following config:
input {
file {
path => ["simpsons_episodes.csv"]
start_position => "beginning"
}
}
filter {
csv {
columns => [
"id",
"title"
]
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
action => "index"
hosts => ["127.0.0.1:9200"]
index => "simpsons"
document_type => "episode"
workers => 1
}
}
I'm having trouble specifying the document type: once the data is imported, I expect http://localhost:9200/simpsons/episode/10 to show episode 10.
Good job, you're almost there, you're only missing the document ID. You need to modify your elasticsearch output like this:
elasticsearch {
action => "index"
hosts => ["127.0.0.1:9200"]
index => "simpsons"
document_type => "episode"
document_id => "%{id}" # <---- add this line
workers => 1
}
After this you'll be able to query the episode with id 10:
GET http://localhost:9200/simpsons/episode/10
I'm the author of moshe/elasticsearch_loader
I wrote ESL for this exact problem.
You can download it with pip:
pip install elasticsearch-loader
And then you will be able to load csv files into elasticsearch by issuing:
elasticsearch_loader --index incidents --type incident csv file1.csv
Additionally, you can use a custom id field by adding --id-field=document_id to the command line.
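If you'd rather not install anything, the core of what such a loader does can be sketched with the standard library. It reads the CSV rows and turns each one into a pair of lines for the Elasticsearch _bulk API: an action line naming the index, type and _id, followed by the document itself. This is not ESL's actual code. The index, type and id column are assumptions matching the episode example above, and the _type key only applies to the pre-7.x Elasticsearch versions in this thread.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index, doc_type, id_field="id"):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON) from CSV text."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        action = {"index": {"_index": index, "_type": doc_type, "_id": row[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "id,title\n10,Homer's Night Out\n12,Krusty Gets Busted\n"
    print(csv_to_bulk(sample, "simpsons", "episode"), end="")
```

The resulting body can then be POSTed to http://localhost:9200/_bulk, for example with curl's --data-binary so that the newlines survive.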

Grok match json field and value

I'm using koajs with bunyan to save error logs to my server then I use filebeat to have them shipped to my logstash application.
My error logs are being forwarded correctly however I would now like to create a filter which will add a tag to specific logs.
{"name":"myapp","hostname":"sensu-node-dev","pid":227,"level":50,"err":{"message":"Cannot find module 'lol'","name":"Error","stack":"Error: Cannot find module 'lol'\n at Function.Module._resolveFilename (module.js:339:15)\n at Function.Module._load (module.js:290:25)\n at Module.require (module.js:367:17)\n at require (internal/module.js:16:19)\n at Object.<anonymous> (/srv/www/dev.site/app.js:27:6)\n at next (native)\n at Object.<anonymous> (/srv/www/dev.site/node_modules/koa-compose/index.js:29:5)\n at next (native)\n at onFulfilled (/srv/www/dev.site/node_modules/co/index.js:65:19)\n at /srv/www/dev.site/node_modules/co/index.js:54:5","code":"MODULE_NOT_FOUND"},"msg":"Cannot find module 'lol'","time":"2016-02-24T22:04:26.492Z","v":0}
Now the interesting part in that specific log is "err":{...} and the "name":"Error" bits. For simplicity reasons I would just like to create a filter which detects "name":"Error" in the log (if it exists) and then apply a tag add_tag => ["error"] to the log.
Here is my /etc/logstash/conf.d/logstash.conf file:
input {
beats {
port => 5044
type => "logs"
}
}
filter {
grok {
type => "log"
pattern => "???" # <--- have no idea what to do here
add_tag => ["error"]
}
}
output {
elasticsearch {
hosts => "localhost:9200"
sniffing => true
manage_template => false
index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
document_type => "%{[@metadata][type]}"
}
http {
http_method => "post"
url => "<MY_URL>"
format => "message"
message => "{"text":"dis is workinz, you has error"}"
tags => ["error"]
}
}
I tried the following:
pattern => ""name":"Error""
But got the following error:
Error: Expected one of #, {, } at line 9, column 31 (byte 107) after filter {
grok {
match => { "message" => ""
You may be interested in the '--configtest' flag which you can
use to validate logstash's configuration before you choose
to restart a running system.
There is no simple example of this specific type of matching anywhere.
Bonus: also, how does one escape characters in Logstash? I couldn't find anything on the subject.
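If you only want to check whether a string exists in your message, a conditional with a regex match is enough. Because the pattern sits inside a /.../ regex literal, its double quotes need no escaping: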
if [message] =~ /"name":"Error"/ {
mutate {
add_tag => [ "error" ]
}
}
If you really want to grok the input into fields, check out the json codec or filter instead.
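The difference between the two suggestions is easy to see outside Logstash. In the sample bunyan line the top-level name is myapp and only err.name is Error. The regex therefore works only as long as that exact spelling appears (no space after the colon). Decoding the JSON instead lets you test the nested field directly, which is what the json filter or codec would make available as [err][name]. A Python sketch on a shortened copy of the log line; the helper names are hypothetical.

```python
import json
import re

# Shortened copy of the bunyan line from the question (stack trace omitted).
SAMPLE = (
    '{"name":"myapp","hostname":"sensu-node-dev","pid":227,"level":50,'
    '"err":{"message":"Cannot find module \'lol\'","name":"Error",'
    '"code":"MODULE_NOT_FOUND"},'
    '"msg":"Cannot find module \'lol\'","time":"2016-02-24T22:04:26.492Z","v":0}'
)

ERROR_NAME = re.compile(r'"name":"Error"')


def tag_by_regex(message):
    """Mirror of `if [message] =~ /"name":"Error"/`: a plain text search."""
    return ["error"] if ERROR_NAME.search(message) else []


def tag_by_json(message):
    """Decode the line first, then test the nested err.name field."""
    try:
        event = json.loads(message)
    except ValueError:
        return []
    err = event.get("err") if isinstance(event, dict) else None
    return ["error"] if isinstance(err, dict) and err.get("name") == "Error" else []


if __name__ == "__main__":
    print(tag_by_regex(SAMPLE), tag_by_json(SAMPLE))
```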

Logstash Multiline filter

We have some files that are written out to our web servers whenever we have PHP errors. Each error has its own file, but there are always multiple lines in each file. The files always start with the text "Exception:". Is there an easy way to just say "take the whole file as one log event"? See the example below:
Exception: ABC_Exception_Domain -- Message: There is no valid performance dimension for the given nodeRootId.
Error Date and Time:
Date: September 25, 2014
Time: 10:38:15
Timestamp: 1411659495
PersonId: 3947680
ProcessId: 18055
Memory Usage: 18194784
Machine Id:...
and here is my configuration file that I am using to test output:
input {
file {
type => "stack_trace"
path => [ "/root/20[1-9][0-9]*" ]
start_position => "beginning"
}
}
filter {
multiline {
type => "stack_trace"
pattern => "^Exception.*$"
negate => true
what => "previous"
}
}
output {
stdout {
codec => rubydebug
}
}
I have also tried this to use the last line as the log delimiter:
input {
file {
type => "stack_trace"
path => [ "/root/20[1-9][0-9]*" ]
start_position => "beginning"
}
}
filter {
multiline {
type => "stack_trace"
pattern => "^#[0-9]{1,3} \{main\}.*$"
negate => true
what => "previous"
}
}
output {
stdout {
codec => rubydebug
}
}
The only way I ever get any results is if I update the files manually and add another Exception or main line.
Thanks in advance for any help, and let me know if I can provide any more information.
Thanks,
Chris.
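The behaviour described in the question matches what a pattern / negate => true / what => "previous" setup implies. A line that does not match ^Exception is appended to the current event, and that event only counts as complete once a line that does match arrives. Since each file holds exactly one exception, its event stays buffered until another Exception line is written. The Python model below is a simplified illustration of that grouping rule, not the plugin's code; the function name is hypothetical. In later Logstash versions this filter gave way to the multiline codec on the input, which offers an auto_flush_interval setting for emitting a pending event after a quiet period.

```python
import re


def group_multiline(lines, pattern, negate=True):
    """Group lines the way pattern + negate + what => "previous" does.

    Returns (emitted, pending): events already closed by a following start
    line, and the lines still buffered because nothing has closed them yet.
    """
    regex = re.compile(pattern)
    emitted, buffer = [], []
    for line in lines:
        matched = regex.search(line) is not None
        # With negate => true, a line that does NOT match belongs to the previous event.
        belongs_to_previous = (not matched) if negate else matched
        if belongs_to_previous and buffer:
            buffer.append(line)
        else:
            if buffer:
                emitted.append("\n".join(buffer))
            buffer = [line]
    return emitted, buffer


if __name__ == "__main__":
    one_file = [
        "Exception: ABC_Exception_Domain -- Message: There is no valid performance dimension for the given nodeRootId.",
        "Error Date and Time:",
        "Date: September 25, 2014",
    ]
    emitted, pending = group_multiline(one_file, r"^Exception.*$")
    print(len(emitted), "emitted;", len(pending), "lines still buffered")
```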