I have a CSV file that I want to import into Elasticsearch.
I am on Windows 10, and I also run Kibana so that I can browse the data once it is imported. I am using Logstash to do the import.
All of my services (Kibana, ES and Logstash) are running on localhost.
I tried the following Logstash configuration file (my CSV file is at the correct path):
input {
  file {
    path => ["E:\Programmes\Logstash\logstash-2.2.0\data\*.csv"]
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["billing_entity","invoice","company","username","reference","line_number","recipient_number","zone","network","date","time","country","duration","cost","currency","call_origin","billing_type"]
    separator => ";"
  }
  #grok {
  #  match => { "call" => "%{WORD:billing_entity} %{WORD:invoice} %{WORD:company} %{WORD:username} %{WORD:reference} %{NUMBER:line_number} %{NUMBER:recipient_number} %{WORD:zone} %{WORD:network} %{DATE:date} %{TIME:time} %{WORD:country} %{WORD:duration} %{NUMBER:cost} %{WORD:currency} %{WORD:call_origin} %{WORD:billing_type}" }
  #}
}
output {
  elasticsearch {
    action => "index"
    index => "call_samples"
    #index => "call-%{+YYYY.MM.dd}"
    hosts => "localhost"
    workers => 1
  }
}
As you can see, I tried both the 'csv' and the 'grok' filter.
Then I launched Logstash in verbose mode with this configuration file:
logstash.bat -f ..\conf\logstash.conf -v > logfile.txt
EDIT: after each try, I delete the generated sincedb files to simulate changes. In any case, I noticed they are empty.
But I see nothing relevant in the logs:
message=>"Using mapping template from"
message=>"Attempting to install template"
message=>"New Elasticsearch output"
message=>"Registering file input"
message=>"No sincedb_path set,generating o....
message=>"Using mapping template from ...
message=>"Attempting to install template"
message=>"New Elasticsearch output"
message=>"Starting pipeline"
message=>"Pipeline started"
It looks as if my file is simply ignored. I also tried several index names, but no data is ever imported.
To check whether data is present, I query localhost:9200 or enter the index "call_samples" in Kibana's "Index name or pattern" box.
Can anyone help me with this, please? I'm stuck at this point. Thanks.
EDIT 2:
OK, I was being dumb on this one. I just wanted to redirect the log stream to a file when launching Logstash:
logstash.bat -f ..\conf\logstash.conf -v > logfile.txt
But the redirection was probably preventing the input file from being imported, so I removed that part:
logstash.bat -f ..\conf\logstash.conf -v
Now my index is created correctly, but no data is imported.
It turned out to be an encoding issue. Even in verbose mode, Logstash never reported a failure or gave the slightest clue.
I tested with a new file encoded in UTF-8 and it worked.
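For anyone hitting the same silent failure, here is a rough way to check a CSV file's encoding before handing it to Logstash. This is only a sketch in Python, not part of Logstash: the function names are made up for the example, and the source encoding `cp1252` is an assumption (a common Windows default), since the original encoding of the file is not stated above.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Rewrite src as UTF-8, ASSUMING it was saved in source_encoding."""
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

If `is_utf8(Path("calls.csv").read_bytes())` returns False (the file name is hypothetical), re-encoding the file first is one option. Another that may work is declaring the charset on the input codec, e.g. `codec => plain { charset => "..." }`, but I have not verified that against this exact setup.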
Related
I have test results that are stored in JSON files. I then have Logstash locate the files and attempt to send all of the lines to Elasticsearch. Only about half of the lines are being sent, and I can't figure out why certain lines are left out. For example, there will be 34 lines, but only 14 are sent.
input {
  file {
    path => "/data/*.json"
    start_position => "beginning"
  }
}
# ----------------------------------------------------------------------
filter {
  # Parse fields out of JSON message, then remove the raw JSON.
  json {
    source => "message"
  }
}
# ----------------------------------------------------------------------
output {
  elasticsearch {
    hosts => ["host:9200", "localhost:9200"]
    index => "ct-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
I'm not sure if there is something within the json itself that causes logstash to just skip over it, or if there's something wrong with my logstash.conf file that I posted above.
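One thing worth ruling out: the file input emits one event per line, and the json filter parses each line on its own, so a record that spans several lines (or is otherwise not valid JSON by itself) will not be parsed as intended. The following Python sketch lists such lines. It is only a diagnostic aid under that assumption, and `find_unparseable_lines` is a name invented for the example.

```python
import json


def find_unparseable_lines(lines):
    """Return (line_number, text) pairs for non-empty lines that are not valid JSON on their own."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines carry no record
        try:
            json.loads(text)
        except json.JSONDecodeError:
            bad.append((number, text))
    return bad
```

Running it over `open("/data/results.json", encoding="utf-8")` (a hypothetical file name) gives the line numbers to inspect. If the list is empty, the cause is more likely elsewhere.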
Logstash processes files of different types and sends them to Elasticsearch in JSON format. In your case, a Filebeat agent with an Elasticsearch output would be enough to send a JSON file to ES and index it.
It would look like this using Filebeat 6.x:
#=========================== Filebeat inputs =============================
filebeat.inputs:
- type: log
  # Paths to the logs
  paths:
    - "/your/path/to/your/logs/file.json"
  # tags to identify the logs source, .gz files excluded from the prospector
  tags: ["beats","yourtag"]
  exclude_files: ['\.gz$']
#================================ Outputs =====================================
#----------------------------- Elasticsearch output --------------------------------
output.elasticsearch:
  # The ES host & index name
  hosts: ["yourEShost:9200"]
  index: "ct-%{+YYYY.MM.dd}"
I use the ELK stack to parse CSV files with Logstash and send them to Elasticsearch.
Unfortunately, I have a problem:
When I drop my files into the directory watched by the "input" of my Logstash pipeline, the records are duplicated, or even tripled, without my having asked for anything of the kind.
This is what my pipeline looks like:
input {
  file {
    path => "/home/XXX/report/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["Name", "Status", "Category", "Type", "EndPoint", "Group", "Policy", "Scanned At", "Reported At", "Affected Application"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "malwarebytes-report"
  }
  stdout {}
}
When I send my first file containing 28 records in "/home/XXX/report/", this is what ElasticSearch says:
[root # lrtstfpe1 confd]#curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open malwarebytes-report PO4g6rKRTb6yuMDb7i-6sg 5 1 28 0 25.3kb 25.3kb
So far so good, but when I send my second file of 150 records:
[root # lrtstfpe1 confd]#curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open malwarebytes-report PO4g6rKRTb6yuMDb7i-6sg 5 1 328 0 263.3kb 263.3kb
The 150 records have been doubled and added to the first 28 (28 + 2 x 150 = 328).
What's going on?
I have been stuck on this problem for several days and would really appreciate your help.
UPDATE:
You need to look in /etc/logstash/conf.d and see if there are any other config files there
The problem is that I only have one pipeline in this folder. So:
I completely uninstalled the ELK stack (rpm -e elasticsearch kibana logstash filebeat) and removed any ELK traces (rm -rf /var/lib/ELK/ var/log/ELK/ etc/default/ELK /usr/share/ELK ...), so nothing was left anywhere.
I then reinstalled everything:
rpm -ivh elasticsearch-6.2.3.rpm
rpm -ivh kibana-6.2.3-x86_64.rpm
rpm -ivh logstash-6.2.3.rpm
And started the services: service ELK restart
Then, in terms of configuration:
/etc/elasticsearch.yml is left entirely at its defaults.
/etc/kibana.yml is left entirely at its defaults.
/etc/logstash.yml is left entirely at its defaults.
Then I put my one and ONLY pipeline, named "pip.conf", in /etc/logstash/conf.d/
Its configuration:
input {
  file {
    path => "/home/report/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["Name","Status","Category","Type","EndPoint","Group","Policy","Scanned At","Reported At","Affected Application"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "malwarebytes-report"
  }
  stdout{}
}
And finally, I launch my pipeline:
I go into /usr/share/logstash and execute:
bin/logstash -f /etc/logstash/conf.d/pip.conf
After a few seconds, my pipeline is listening, and I then put my file1.csv and my file2.csv into /home/report/.
file1.csv contains 28 records and file2.csv contains 150 records.
But now, when I check my index: curl -XGET 'localhost:9200/_cat/indices?v&pretty'
My index "malwarebytes-report" contains 357 records ... (150x2 + 28x2 ...)
I don't understand any of this.
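This does not explain why the events arrive twice, but one common way to make a repeated read harmless is to derive the document ID from the record's content, so that a second copy overwrites the first instead of adding a new document. In Logstash this is usually done with the fingerprint filter plus the `document_id` option of the elasticsearch output. The Python sketch below only illustrates the idea: the in-memory dictionary stands in for the index, and the function names are invented for the example.

```python
import hashlib


def document_id(line: str) -> str:
    """Derive a stable ID from the raw CSV line (SHA-1 of its UTF-8 bytes)."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()


def index_lines(store: dict, lines) -> dict:
    """Index each line under its content-derived ID; a repeated line overwrites itself."""
    for line in lines:
        store[document_id(line)] = line
    return store
```

The trade-off is that two genuinely identical rows collapse into one document, which may or may not be acceptable for a malware report.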
If you are able to use a tool other than Logstash to load files into Elasticsearch, you can use elasticsearch-loader.
I'm the author of moshe/elasticsearch_loader
I wrote ESL for this exact problem.
You can download it with pip:
pip install elasticsearch-loader
And then you will be able to load csv files into elasticsearch by issuing:
elasticsearch_loader --index incidents --type incident csv file1.csv
I'm using Logstash to push log data from CSV files into Elasticsearch. This is my configuration file:
input
{
  file {
    path => "C:\Projects\Tools\ELK\data\CI_Report\*.csv"
    start_position =>"beginning"
  }
}
filter {
  csv {
    columns =>["LOG_LEVEL", "TOOL_ID", "ID_ANALYSIS", "ARCH_COMP", "Files",
               "Component Analysed","Date","Tool Version"]
    separator =>","
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "cilog_all"
  }
  stdout {
    codec => json
  }
}
This is my command to start Logstash: logstash -f logcsv.conf
Note: it works if I use the actual path to a single CSV file. It never works if I use *.csv to load log data from all CSV files in the specified folder.
Are glob patterns supported by the Logstash file input plugin, as mentioned here?
https://www.elastic.co/guide/en/logstash/current/glob-support.html
Everything looks OK except for your path.
Since you are on Windows, you need to write the path with forward slashes, because \ is an escape character:
path => "C:/Projects/Tools/ELK/data/CI_Report/*.csv"
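If the paths are generated by a script rather than typed by hand, the conversion is easy to automate. A minimal Python sketch, where the helper name `to_forward_slashes` is made up for the example:

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Rewrite a Windows path (glob included) with forward slashes."""
    return PureWindowsPath(windows_path).as_posix()
```

For example, `to_forward_slashes(r"C:\Projects\Tools\ELK\data\CI_Report\*.csv")` gives the path shown above.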
I am trying to configure Logstash to read all JSON files located in a specific directory.
The JSON files are rewritten regularly by a Python script, and I need to read every file from scratch each time (not just pick up the last lines).
I have tested two variants of the config:
Config 1:
input {
  file {
    path => "/tmp/*.json"
    start_position => "beginning"
  }
}
output{
  stdout{codec => rubydebug}
}
Config 2:
input {
  file {
    path => "/tmp/*.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}
output{
  stdout{codec => rubydebug}
}
Result in both cases:
Logstash can't detect the changes when Python updates the files (the file timestamp changes every time).
Even if I change something inside a file, Logstash detects it only the first time and prints the full file; after that it no longer detects changes.
Desired result:
In the end, Logstash just needs to pick up all files in a specific directory (new files should be detected automatically), read their content, and push it somewhere. Can you please help me do this?
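To make the requirement concrete: "treat a file as new whenever its timestamp changes, and re-read it in full" can be illustrated outside Logstash. This Python sketch only shows the behaviour being asked for; it is not how the file input works internally, and `changed_files` and the `seen` dictionary are names invented for the example.

```python
import glob
import os


def changed_files(pattern: str, seen: dict) -> list:
    """Return paths matching pattern whose modification time differs from the
    one recorded in seen (new files included), and update seen in place."""
    changed = []
    for path in sorted(glob.glob(pattern)):
        mtime = os.stat(path).st_mtime_ns
        if seen.get(path) != mtime:
            seen[path] = mtime
            changed.append(path)
    return changed
```

Calling `changed_files("/tmp/*.json", seen)` in a loop with a short sleep, and reading each returned file from the start, gives the "whole file on every update" behaviour described above.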
I'm using Ubuntu 14.04 LTS with Kibana, Logstash and Elasticsearch. I tried the following configuration to import my CSV file with Logstash, but the file is not detected.
input
{
  file
  {
    path => "/home/kibana/Downloads/FL_insurance_sample.csv"
    type => "FL_insurance_sample.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter
{
  csv
  {
    columns => ["policyID","statecode","country","eq_site_limit","hu_site_limit",
                "fl_sitelimit","fr_site_limit","tiv_2011","tiv_2012","eq_site_deductible",
                "hu_site_deductible","fl_site_deductible","fr_site_deductible","point_latitude",
                "point_longtitude","line","construction","point_granularity"]
    separator => ","
  }
}
output
{
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "promosms-%{+dd.MM.YYYY}"
    workers => 1
  }
  stdout
  {
    codec => rubydebug
  }
}
I even ran
sudo service logstash restart
When I went into the index mapping in the Kibana GUI, I chose Logstash-* and couldn't find the data that I wanted.
P.S. My config file is stored in /etc/logstash/conf.d/simple.conf
In your question, you state that you went to Logstash-* in Kibana, but your configuration file says that you are putting data into promosms-%{+dd.MM.YYYY}.
You need to go into Kibana 4's setup section, put [promosms-]DD.MM.YYYY into the "Index name or pattern" box, and check both "Index contains time-based events" and "Use event times to create index names".
Then you might also want to set that as your default index.
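To see why the Logstash-* pattern finds nothing, it can help to work out what index names the output actually produces. The Python sketch below mimics `promosms-%{+dd.MM.YYYY}` under the assumption that the date part is the event timestamp formatted as day.month.year in UTC; the function names are invented for the example.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase


def daily_index_name(prefix: str, when: datetime) -> str:
    """Build prefix + dd.MM.YYYY from a timezone-aware event time, converted to UTC."""
    return prefix + when.astimezone(timezone.utc).strftime("%d.%m.%Y")


def pattern_matches(pattern: str, index_name: str) -> bool:
    """Check a wildcard index pattern against an index name."""
    return fnmatchcase(index_name, pattern)
```

An event stamped 7 March 2015 would land in an index named promosms-07.03.2015, which a promosms-* pattern matches and a logstash-* pattern does not.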