I would like to reindex and filter my logs again. From what I found on the Internet, the way to do this is to run them back through Logstash and filter the data again.
I tried it and it does split my data into different fields; however, the data keeps looping. That is, I have 100,000 logs, but after filtering and outputting to Elasticsearch, I found that more than 100,000 documents were written to Elasticsearch and the logs are duplicated. Does anyone have an idea why?
Moreover, I get the log below when running Logstash. Although it reports an error parsing JSON, I found that the log still gets filtered. Why would that be?
Thank you!
Here is my logstash config:
input {
  elasticsearch {
    hosts => "10.0.132.56"
    index => "logstash-2018.01.04"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:logdate} %{GREEDYDATA:vmname} %{GREEDYDATA:message}" }
    overwrite => [ "message" ]
  }
}
filter {
  json {
    source => "scrmsg"
  }
}
output {
  elasticsearch {
    hosts => ["10.0.132.64:9200"]
    manage_template => false
    index => "logstash-2018.01.04-1"
  }
}
Here is the error log:
[2018-01-11T15:15:32,010][WARN ][logstash.filters.json ] Error parsing json {:source=>"scrmsg", :raw=>"Trident/5.0)\",\"geoip_country\":\"US\",\"allowed\":\"1\",\"threat_score\":\"268435456\",\"legacy_unique_id\":\"\",\"cache_status\":\"-\",\"informed_id\":\"\",\"primitive_id\":\"2BC2D8AD-7AD0-3CAD-9453-B0335F409701\",\"valid_ajax\":\"0\",\"orgin_response_time\":\"0.081\",\"request_id\":\"cd2ae0a8-0921-48b6-b03f-15c71a55100b\",\"bytes_returned_origin\":\"83\",\"server_ip\":\"10.0.10.16\",\"origin_status_code\":\"\",\"calculated_pages_per_min\":\"1\",\"calculated_pages_per_session\":\"1\",\"calculated_session_length\":\"0\",\"k_s\":\"\",\"origin_address\":\"10.0.10.16:443\",\"request_protocol\":\"https\",\"server_serial\":\"5c3eb4ad-3799-4bd8-abb2-42edecd54b99\",\"nginx_worker_process\":\"19474\",\"origin_content_type\":\"application/json;charset=UTF-8\",\"lb_request_time\":\"\",\"SID\":\"\",\"geoip_org\":\"Drake Holdings LLC\",\"accept\":\"*/*\",\"accept_encoding\":\"gzip, deflate\",\"accept_language\":\"\",\"connection\":\"Keep-Alive\",\"http_request_length\":\"418\",\"real_ip_header_value\":\"204.79.180.18\",\"http_host\":\"www.honeyworkshop.com\",\"machine_learning_score\":\"\",\"HSIG\":\"ALE_UHCF\",\"ZID\":\"\",\"ZUID\":\"\",\"datacenter_id\":\"363\",\"new_platform_domain_id\":\"3063fc0b-5b48-4413-9bc7-600039caf64c\",\"whitelist_score\":\"0\",\"billable\":\"1\",\"distil_action\":\"#proxy\",\"js_additional_threats\":\"\",\"js_kv_additional_threats\":\"\",\"re_field_1\":\"\",\"re_field_2\":\"\",\"re_field_3\":\"\",\"http_accept_charset\":\"\",\"sdk_token_id\":\"\",\"sdk_application_instance_id\":\"\",\"per_path_calculated_pages_per_minute\":\"1\",\"per_path_calculated_pages_per_session\":\"1\",\"path_security_type\":\"api\",\"identification_provider\":\"web\",\"identifier_record_pointer\":\"\",\"identifier_record_value\":\"\",\"path_rule_scope_id\":\"\",\"experiment_id\":\"0\",\"experiment_score\":\"\",\"experiment_group_id\":\"\",\"experiment_auxiliary_string\":\"\",\"type\":\"distil\"}\n", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'Trident': was expecting ('true', 'false' or 'null') at [Source: (byte[])"Trident/5.0)","geoip_country":"US","allowed":"1","threat_score":"268435456","legacy_unique_id":"","cache_status":"-","informed_id":"","primitive_id":"2BC2D8AD-7AD0-3CAD-9453-B0335F409701","valid_ajax":"0","orgin_response_time":"0.081","request_id":"cd2ae0a8-0921-48b6-b03f-15c71a55100b","bytes_returned_origin":"83","server_ip":"10.0.10.16","origin_status_code":"","calculated_pages_per_min":"1","calculated_pages_per_session":"1","calculated_session_length":"0","k_s":"","origin_address":"10.0.10"[truncated 1180 bytes]; line: 1, column: 9]>}
I used Logstash to ingest csv files from https://www.kaggle.com/wcukierski/the-simpsons-by-the-data and saved them to Elasticsearch. For starters, I ingested simpsons_characters.csv using the following conf:
input {
  file {
    path => "/Users/xyz/Downloads/the-simpsons-by-the-data/simpsons_characters.csv"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["id", "name", "normalized_name", "gender"]
    separator => ","
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => "localhost"
    action => "index"
    index => "simpsons"
  }
}
However, when I query like so: http://localhost:9200/simpsons/name/Lou
where
simpsons = index
name = type (I think ... not sure)
I get the following response back:
{
  "_index": "simpsons",
  "_type": "name",
  "_id": "Lou",
  "found": false
}
So, the question is: why am I not getting the correct response? Further, when you do bulk ingestion through csv, what is the type of the document?
Thanks!
The default type in the Logstash elasticsearch output is logs. So, no matter how you define your IDs (either take them from the csv with document_id => "%{id}" or let ES assign its own), you can get those documents at http://localhost:9200/simpsons/logs/THE_ID.
If you don't know the id and simply want to check that something is there: http://localhost:9200/simpsons/logs/_search?pretty.
If you want to see the mapping of your index, for example to find out the _type of its documents: http://localhost:9200/simpsons/_mapping?pretty.
To change the default _type:
elasticsearch {
  hosts => "localhost"
  action => "index"
  index => "simpsons"
  document_type => "characters"
  document_id => "%{id}"
}
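With document_type and document_id set as above, and assuming the id column really comes through from the csv, a character can then be fetched directly by its id (the value 123 below is just a made-up example):
GET http://localhost:9200/simpsons/characters/123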
Here you haven't specified an id field in your Logstash output. In this case Elasticsearch would assign a random id to your documents, while you are searching for a document with id=Lou.
Adding document_id => "%{id}" would solve your problem:
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => "localhost"
    action => "index"
    index => "simpsons"
    document_id => "%{id}"
  }
}
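Note that even with document_id set, looking a character up by name (as in the original /simpsons/name/Lou attempt) is a search rather than a GET by id. A minimal sketch, assuming the csv's name column was indexed as the name field:
GET http://localhost:9200/simpsons/_search?q=name:Lou&pretty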
I'm doing "elastic search getting started" tutorial. Unfortunatelly this tutorial doesn't cover first step which is importing csv database into elasticsearch.
I googled to find solution but it doesn't work unfortunatelly. Here is what I want to achieve and what I have:
I have a file with data which I want to import (simplified)
id,title
10,Homer's Night Out
12,Krusty Gets Busted
I would like to import it using Logstash. After researching on the internet I ended up with the following config:
input {
  file {
    path => ["simpsons_episodes.csv"]
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => [
      "id",
      "title"
    ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => ["127.0.0.1:9200"]
    index => "simpsons"
    document_type => "episode"
    workers => 1
  }
}
I'm having trouble specifying the document type: once the data is imported and I navigate to http://localhost:9200/simpsons/episode/10, I expect to see episode 10.
Good job, you're almost there; you're only missing the document ID. You need to modify your elasticsearch output like this:
elasticsearch {
  action => "index"
  hosts => ["127.0.0.1:9200"]
  index => "simpsons"
  document_type => "episode"
  document_id => "%{id}"    # <---- add this line
  workers => 1
}
After this you'll be able to query the episode with id 10:
GET http://localhost:9200/simpsons/episode/10
I'm the author of moshe/elasticsearch_loader
I wrote ESL for this exact problem.
You can download it with pip:
pip install elasticsearch-loader
And then you will be able to load csv files into elasticsearch by issuing:
elasticsearch_loader --index incidents --type incident csv file1.csv
Additionally, you can use a custom id field by adding --id-field=document_id to the command line.
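For instance, to load the episodes csv from the question above and use its id column as the document id (the index name, type, and file name here are only illustrative), the invocation might look like this:
# hypothetical invocation; only the flags mentioned above are used
elasticsearch_loader --index simpsons --type episode --id-field=id csv simpsons_episodes.csv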
I'm using koajs with bunyan to save error logs to my server then I use filebeat to have them shipped to my logstash application.
My error logs are being forwarded correctly however I would now like to create a filter which will add a tag to specific logs.
{"name":"myapp","hostname":"sensu-node-dev","pid":227,"level":50,"err":{"message":"Cannot find module 'lol'","name":"Error","stack":"Error: Cannot find module 'lol'\n at Function.Module._resolveFilename (module.js:339:15)\n at Function.Module._load (module.js:290:25)\n at Module.require (module.js:367:17)\n at require (internal/module.js:16:19)\n at Object.<anonymous> (/srv/www/dev.site/app.js:27:6)\n at next (native)\n at Object.<anonymous> (/srv/www/dev.site/node_modules/koa-compose/index.js:29:5)\n at next (native)\n at onFulfilled (/srv/www/dev.site/node_modules/co/index.js:65:19)\n at /srv/www/dev.site/node_modules/co/index.js:54:5","code":"MODULE_NOT_FOUND"},"msg":"Cannot find module 'lol'","time":"2016-02-24T22:04:26.492Z","v":0}
Now the interesting parts in that specific log are the "err":{...} and "name":"Error" bits. For simplicity, I would just like to create a filter which detects "name":"Error" in the log (if it exists) and then applies a tag (add_tag => ["error"]) to the log.
Here is my /etc/logstash/conf.d/logstash.conf file:
input {
  beats {
    port => 5044
    type => "logs"
  }
}
filter {
  grok {
    type => "log"
    pattern => "???" // <--- have no idea what to do here
    add_tag => ["error"]
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  http {
    http_method => "post"
    url => "<MY_URL>"
    format => "message"
    message => "{"text":"dis is workinz, you has error"}"
    tags => ["error"]
  }
}
I tried the following:
pattern => ""name":"Error""
But got the following error:
Error: Expected one of #, {, } at line 9, column 31 (byte 107) after filter {
grok {
match => { "message" => ""
You may be interested in the '--configtest' flag which you can
use to validate logstash's configuration before you choose
to restart a running system.
There is no simple example of this specific type of matching anywhere.
Bonus: also, how does one escape quotes in Logstash? I couldn't find anything on the subject.
If you only want to see if a string exists in your message, try this:
if [message] =~ /"name":"Error"/ {
  mutate {
    add_tag => ["error"]
  }
}
If you really want to grok the input into fields, check out the json codec or filter instead.
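For example, a minimal sketch of that second approach, assuming the bunyan JSON arrives in the message field exactly as shown in the question (the [err][name] path follows the sample log; adjust it to your pipeline):
filter {
  # parse the bunyan JSON in the message field into structured fields
  json {
    source => "message"
  }
  # the sample log nests the error name under err.name
  if [err][name] == "Error" {
    mutate {
      add_tag => ["error"]
    }
  }
}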
We have some files that are written out to our web servers whenever we have PHP errors. Each error has its own file, but there are always multiple lines in each file. The files always start with the text "Exception:". Is there a way to easily just say, "take the whole file as a log event"? See the example below:
Exception: ABC_Exception_Domain -- Message: There is no valid performance dimension for the given nodeRootId.
Error Date and Time:
Date: September 25, 2014
Time: 10:38:15
Timestamp: 1411659495
PersonId: 3947680
ProcessId: 18055
Memory Usage: 18194784
Machine Id:...
and here is my configuration file that I am using to test output:
input {
  file {
    type => "stack_trace"
    path => [ "/root/20[1-9][0-9]*" ]
    start_position => "beginning"
  }
}
filter {
  multiline {
    type => "stack_trace"
    pattern => "^Exception.*$"
    negate => true
    what => "previous"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
I have also tried this to use the last line as the log delimiter:
input {
  file {
    type => "stack_trace"
    path => [ "/root/20[1-9][0-9]*" ]
    start_position => "beginning"
  }
}
filter {
  multiline {
    type => "stack_trace"
    pattern => "^#[0-9]{1,3} \{main\}.*$"
    negate => true
    what => "previous"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
The only way I ever get any results is if I update the files manually and add another Exception or main line.
Thanks in advance for any help, and let me know if I can provide any more information.
Thanks,
Chris.