I am trying to configure Logstash to read all JSON files located in a specific directory.
The JSON files are rewritten by a Python script each time it runs, and I need Logstash to re-read every file from the beginning each time (not just check the last lines).
I have tested two config variants.
Config 1:
input {
file {
path => "/tmp/*.json"
start_position => "beginning"
}
}
output{
stdout{codec => rubydebug}
}
Config 2:
input {
file {
path => "/tmp/*.json"
start_position => "beginning"
sincedb_path => "/dev/null"
ignore_older => 0
}
}
output{
stdout{codec => rubydebug}
}
Result in both cases:
Logstash can't detect the changes when the Python script updates the files (the file timestamp changes every time).
Even if I change something inside a file, Logstash detects it only the first time and prints the full file, but after that it can't detect changes anymore.
Final goal:
In the end, Logstash just needs to pick up all files in a specific directory (new files should be detected automatically), read their content, and push it somewhere. Can you please help me do this?
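For reference, a minimal sketch of one way to approach this, assuming a logstash-input-file plugin version that supports mode => "read" (verify the option names against your installed version; the completed-files log path below is just a placeholder):
input {
  file {
    path => "/tmp/*.json"
    mode => "read"                                  # read whole files instead of tailing them
    sincedb_path => "/dev/null"                     # do not remember read positions between runs
    file_completed_action => "log"                  # default is "delete"; log instead of deleting source files
    file_completed_log_path => "/tmp/completed.log" # placeholder path for the completed-files log
  }
}
output {
  stdout { codec => rubydebug }
}
In read mode the input reads each discovered file from the beginning rather than tailing it, which fits the "process whole files" goal; whether a file that is rewritten in place gets picked up again still depends on the plugin version and sincedb settings, so treat this as a starting point rather than a drop-in answer.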
I have JSON files of test results generated by pytest using the pytest-json plugin, so each JSON file contains the test results of one test run. I want to upload such a single JSON file into Elasticsearch through Logstash. But when I try it with the Logstash conf file below, it splits the JSON file and posts it as multiple docs in Elasticsearch, where I expect it to be uploaded as only one doc. Because of this split, my results data ends up distributed across multiple docs and gets corrupted.
logstash conf:
input {
file {
start_position => "beginning"
path => "home/report.json"
sincedb_path => "/dev/null"
}
}
filter {
json {
source => "message"
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "demo_ds"
}
}
Configure your file input with a multiline codec
input {
file {
path => "/home/user/report.json"
sincedb_path => "/dev/null"
start_position => "beginning"
codec => multiline { pattern => "^Spalanzani" negate => true what => "previous" auto_flush_interval => 2 }
}
}
The codec takes every line that does not match the regular expression ^Spalanzani (i.e., it takes every line) and combines them into one event. The auto_flush_interval is required because otherwise it will wait forever for a line that does match ^Spalanzani.
Note that a file input only accepts absolute paths.
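Putting that input together with the json filter and elasticsearch output from the question gives a sketch of the whole pipeline (the host, index name, and path are simply the ones quoted above, not verified settings):
input {
  file {
    path => "/home/user/report.json"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    codec => multiline { pattern => "^Spalanzani" negate => true what => "previous" auto_flush_interval => 2 }
  }
}
filter {
  json {
    source => "message"   # the multiline codec has put the whole report into one "message" field
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "demo_ds"    # one document per test run
  }
}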
I have a JSON file with 1000 JSON objects.
Is there any way to add a header line before each JSON document? What is the easiest way?
Example: I have 1000 objects like this:
{"id":58,"first_name":"Louis","last_name":"Jordan","email":"ljordan1l#nature.com","gender":"Male","Latitude":"-15.93444","Longitude":"-50.14028"}
I want to add an index header like the one below before every JSON object, so that I can use the Elasticsearch Bulk API:
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "unique_id" } }
{"id":58,"first_name":"Louis","last_name":"Jordan","email":"ljordan1l#nature.com","gender":"Male","Latitude":"-15.93444","Longitude":"-50.14028"}
If you are willing to leverage Logstash, you don't need to modify your file and can simply read it line by line and stream it to ES using the elasticsearch output which leverages the Bulk API.
Store the following Logstash configuration in a file named es.conf (make sure the file path and ES hosts match your settings):
input {
file {
path => "/path/to/your/json"
sincedb_path => "/dev/null"
start_position => "beginning"
codec => "json"
}
}
filter {
mutate {
remove_field => ["@version", "@timestamp"]
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "test"
document_type => "type1"
document_id => "%{id}"
}
}
Then, install Logstash and run the following command to load your JSON file into your ES server:
bin/logstash -f es.conf
I found the best way to add a header line before each JSON document:
https://stackoverflow.com/a/30899000/5029432
I'm using Ubuntu 14.04 LTS with Kibana, Logstash and Elasticsearch. I tried the following config to import my CSV file into Logstash, but it doesn't get detected.
input
{
file
{
path => "/home/kibana/Downloads/FL_insurance_sample.csv"
type => "FL_insurance_sample.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter
{
csv
{
columns => ["policyID","statecode","country","eq_site_limit","hu_site_limit",
"fl_sitelimit","fr_site_limit","tiv_2011","tiv_2012","eq_site_deductible",
"hu_site_deductible","fl_site_deductible","fr_site_deductible","point_latitude",
"point_longtitude","line","construction","point_granularity"]
separator => ","
}
}
output
{
elasticsearch {
action => "index"
host => "localhost"
index => "promosms-%{+dd.MM.YYYY}"
workers => 1
}
stdout
{
codec => rubydebug
}
}
I even did
sudo service logstash restart
When I went into the index settings in the Kibana GUI, I chose Logstash-* and couldn't find the data that I wanted.
P.S. my config file is stored in /etc/logstash/conf.d/simple.conf
In your question, you state that you went to Logstash-* in Kibana, but your configuration file says that you are putting data into promosms-%{+dd.MM.YYYY}.
You need to go into Kibana 4's setup section, put [promosms-]DD.MM.YYYY into the "Index name or pattern" box, and check both "Index contains time-based events" and "Use event times to create index names".
Then you might also want to set that as your default index.
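Alternatively (a sketch of the other direction, if you would rather keep the default logstash-* pattern in Kibana), you could change the output index to Logstash's default naming scheme and leave the rest of the output block as it is:
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "logstash-%{+YYYY.MM.dd}"   # default naming scheme, matches the logstash-* pattern in Kibana
    workers => 1
  }
}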
My config file is shown below:
input {
file {
codec => "json"
path => "/home/data/*"
sincedb_path => "/dev/null"
}
}
output {
elasticsearch{
protocol => "http"
host => "localhost"
index => "data"
}
}
When I download a .json file to the data directory, logstash will not receive the data or output to elasticsearch unless I first open the file and save it in gedit.
Running Logstash with the -vvv flag shows no errors; all I get when a file is put in that directory is:
_discover_file: /home/data/*: new: /home/data/prediction1.json (exclude is []) {:level=>:debug, :file=>"filewatch/watch.rb", :line=>"126"}
_open_file: /home/data/prediction1.json: opening {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"98"}
/home/data/prediction1.json: staying at position 0, no sincedb {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"143"}
writing sincedb (delta since last write = 377) {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"177"}
/home/data/prediction1.json: file grew, old size 0, new size 3952 {:level=>:debug, :file=>"filewatch/watch.rb", :line=>"81"}
This led me to believe that there may be a character-encoding problem, as the JSON file is created using json.dumps in Python before reaching Elasticsearch. I tried converting the data to YAML:
outfile.write(yaml.dump(json.dumps(data)))
which Logstash accepts as plain text, since it is now in the form:
'data:{ properties:
{ location:
{ here }}}'
After using sed and tr commands to convert this BACK to valid JSON, Logstash once again no longer accepts it unless the file is saved through gedit!
So, I have a web platform that prints a JSON file per request, containing some log data about that request. I can configure several rules about when it should log stuff, only at certain levels, etc.
Now, I've been toying with the Logstash + Elasticsearch + Kibana 3 stack, and I'd love to find a way to see those logs in Kibana. My question is: is there a way to make Logstash import these kinds of files, or would I have to write a custom input plugin for it? I've searched around, and from what I've seen, plugins are written in Ruby, a language I don't have experience with.
Logstash is a very good tool for processing dynamic files.
Here is a way to import your JSON file into Elasticsearch using Logstash.
Configuration file:
input
{
file
{
path => ["/path/to/json/file"]
start_position => "beginning"
sincedb_path => "/dev/null"
exclude => "*.gz"
}
}
filter
{
mutate
{
replace => [ "message", "%{message}" ]
gsub => [ 'message','\n','']
}
if [message] =~ /^{.*}$/
{
json { source => message }
}
}
output
{
elasticsearch {
protocol => "http"
codec => json
host => "localhost"
index => "json"
embedded => true
}
stdout { codec => rubydebug }
}
Example JSON file:
{"foo":"bar", "bar": "foo"}
{"hello":"world", "goodnight": "moon"}
Note that each JSON document needs to be on a single line. If you want to parse a multiline JSON file, replace the relevant sections in your configuration file (an example of such a file follows the config below):
input
{
file
{
codec => multiline
{
pattern => '^\{'
negate => true
what => previous
}
path => ["/opt/mount/ELK/json/*.json"]
start_position => "beginning"
sincedb_path => "/dev/null"
exclude => "*.gz"
}
}
filter
{
mutate
{
replace => [ "message", "%{message}}" ]
gsub => [ 'message','\n','']
}
if [message] =~ /^{.*}$/
{
json { source => message }
}
}
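For example, a pretty-printed file like the one below (each object opening with a { at the start of a line) is the kind of input this multiline variant is aimed at. Note that, without an auto_flush_interval, the last object in the file is only flushed once another line starting with { arrives:
{
  "foo": "bar",
  "bar": "foo"
}
{
  "hello": "world",
  "goodnight": "moon"
}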
Logstash is just a tool for converting various kinds of syslog files into JSON and loading them into elasticsearch (or graphite, or... ).
Since your files are already in JSON, you don't need logstash. You can upload them directly into elasticsearch using curl.
See Import/Index a JSON file into Elasticsearch
However, in order to work well with Kibana, your JSON files need to, at a minimum:
Be flat - Kibana does not grok nested JSON structs; you need a simple hash of key/value pairs.
Have an identifiable timestamp.
What I would suggest is looking at the JSON files Logstash outputs and seeing if you can massage your JSON files to match that structure. You can do this in any language you like that supports JSON. The program jq is very handy for filtering JSON from one format to another.
Logstash format - https://gist.github.com/jordansissel/2996677
jq - http://stedolan.github.io/jq/
Logstash can import different formats and sources, as it provides a lot of plugins. There are also other log collector and forwarder tools that can send logs to Logstash, such as nxlog, rsyslog, syslog-ng, Flume, Kafka, Fluentd, etc. From what I've heard, most people use nxlog on Windows (though it works on Linux equally well) in combination with the ELK stack because of its low resource footprint. (Disclaimer: I'm affiliated with the project.)