CSV & Logstash & Elasticsearch - csv

I need help!
I am new to Elasticsearch ... I imported a CSV dataset from Kaggle and loaded it into Elasticsearch via Logstash with a configuration file, but in Elasticsearch my data is no longer readable (it comes out in this format: \u0000E\u0000l).
Do I need to add some kind of conversion to the config file?
Thank you.

I tried codec => plain { charset => "UTF-8" } and ruby { code => 'event.set("decoded", Base64.decode64(event.get("message")))' }, but it didn't work.
This is what I get:
{
"host" => "HIND",
"id" => "\u0000G\u0000u\u0000e\u0000s\u0000t\u0000s\u0000 \u0000c\u0000a\u0000n\u0000 \u0000r\u0000e\u0000l\u0000a\u0000x\u0000 \u0000i\u0000n\u0000 \u0000t\u0000h\u0000e\u0000 \u0000g\u0000a\u0000r\u0000d\u0000e\u0000n\u0000 \u0000a\u0000t\u0000 \u0000t\u0000h\u0000e\u0000 \u0000p\u0000r\u0000o\u0000p\u0000e\u0000r\u0000t\u0000y\u0000.\u0000",
"message" => "\u0000G\u0000u\u0000e\u0000s\u0000t\u0000s\u0000 \u0000c\u0000a\u0000n\u0000 \u0000r\u0000e\u0000l\u0000a\u0000x\u0000 \u0000i\u0000n\u0000 \u0000t\u0000h\u0000e\u0000 \u0000g\u0000a\u0000r\u0000d\u0000e\u0000n\u0000 \u0000a\u0000t\u0000 \u0000t\u0000h\u0000e\u0000 \u0000p\u0000r\u0000o\u0000p\u0000e\u0000r\u0000t\u0000y\u0000.\u0000",
"decoded" => "\x1A\xE7\xAC\xB6\xC7\x1A\x9E\xB7\xA5k\x18\xA7\xB6\x17\xA0j\xB7^\x9D\xABm\x85\xEAk\xA2\x97\xAB\xB7",
"path" => "C:/elastic_stack/data/hotelsb_df_es.csv",
"#version" => "1",
"#timestamp" => 2021-08-04T18:04:46.170Z
}
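The \u0000 in front of every character is the classic sign of UTF-16 (little-endian) text being read as a single-byte encoding, so Base64-decoding the message won't help. What usually fixes it is telling the file input the real character set, so Logstash transcodes to UTF-8 before any filter runs. A minimal sketch, assuming the Kaggle export really is UTF-16LE (an assumption, not confirmed) and reusing the path from the output above:

input {
  file {
    path => "C:/elastic_stack/data/hotelsb_df_es.csv"
    start_position => "beginning"
    sincedb_path => "NUL"   # Windows equivalent of /dev/null, handy for repeated test runs
    # Assumption: the CSV was exported as UTF-16LE (common for Windows/Excel exports);
    # the plain codec transcodes it to UTF-8 before the csv filter sees it.
    codec => plain { charset => "UTF-16LE" }
  }
}

If the file turns out to be UTF-16BE instead, or a byte-order mark leaves a stray character at the start of the first field, adjusting the charset (or stripping the BOM with a mutate/gsub) would be the next thing to try.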

Related

How to migrate MySQL data to Elasticsearch using Logstash

I need a brief explanation of how I can convert MySQL data to Elasticsearch using Logstash.
Can anyone explain the step-by-step process?
This is a broad question, and I don't know how familiar you are with MySQL and ES. Say you have a table user: you may be able to simply dump it as CSV and load it into ES. But if you have dynamic data, where MySQL acts more like a pipeline, you need to write a script to handle that. Either way, you can check the links below to build your basic knowledge before you ask how.
How to dump MySQL?
How to load data into ES
Also, you probably want to know how to convert your CSV to a JSON file, which is the format best suited for ES to understand:
How to convert CSV to JSON
You can do it using the jdbc input plugin for Logstash.
Here is a config example.
Let me provide you with a high-level instruction set.
Install Logstash and Elasticsearch.
Copy the JDBC driver jar (ojdbc7.jar here) into the Logstash bin folder.
For Logstash, create a config file, e.g. config.yml:
input {
  # Get the data from the database; configure fields to get data incrementally
  jdbc {
    jdbc_driver_library => "./ojdbc7.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@db:1521:instance"
    jdbc_user => "user"
    jdbc_password => "pwd"
    id => "some_id"
    jdbc_validate_connection => true
    jdbc_validation_timeout => 1800
    connection_retry_attempts => 10
    connection_retry_attempts_wait_time => 10
    # Fetch the db logs using logid
    statement => "select * from customer.table where logid > :sql_last_value order by logid asc"
    # Limit how many results are pre-fetched at a time from the cursor into the client's cache
    # before retrieving more results from the result set
    jdbc_fetch_size => 500
    jdbc_default_timezone => "America/New_York"
    use_column_value => true
    tracking_column => "logid"
    tracking_column_type => "numeric"
    record_last_run => true
    schedule => "*/2 * * * *"
    type => "log.customer.table"
    add_field => {"source" => "customer.table"}
    add_field => {"tags" => "customer.table"}
    add_field => {"logLevel" => "ERROR"}
    last_run_metadata_path => "last_run_metadata_path_table.txt"
  }
}
# Massage the data to store in the index
filter {
  if [type] == 'log.customer.table' {
    # Assign values from db columns to custom fields of the index
    ruby {
      code => "event.set( 'errorid', event.get('ssoerrorid') );
               event.set( 'msg', event.get('errormessage') );
               event.set( 'logTimeStamp', event.get('date_created'));
               event.set( '@timestamp', event.get('date_created'));
              "
    }
    # Remove the db columns that were mapped to custom fields of the index
    mutate {
      remove_field => ["ssoerrorid", "errormessage", "date_created"]
    }
  } # end of [type] == 'log.customer.table'
} # end of filter
# Insert into the index
output {
  if [type] == 'log.customer.table' {
    amazon_es {
      hosts => ["vpc-xxx-es-yyyyyyyyyyyy.us-east-1.es.amazonaws.com"]
      region => "us-east-1"
      aws_access_key_id => '<access key>'
      aws_secret_access_key => '<secret password>'
      index => "production-logs-table-%{+YYYY.MM.dd}"
    }
  }
}
Go to the bin folder and run:
logstash -f config.yml
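The example above targets an Oracle database (ojdbc7.jar and the oracle.jdbc driver). Since the question is about MySQL, here is a minimal, hypothetical sketch of the same idea adapted to MySQL Connector/J; the driver class and connection-string format are the standard Connector/J ones, while the jar version, table, columns, and index name are placeholders to adjust:

input {
  jdbc {
    # Assumes the MySQL Connector/J 8.x jar has been downloaded next to this config
    jdbc_driver_library => "./mysql-connector-java-8.0.28.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "pwd"
    # Incremental load keyed on an auto-increment id column
    statement => "SELECT * FROM user WHERE id > :sql_last_value ORDER BY id ASC"
    use_column_value => true
    tracking_column => "id"
    schedule => "*/2 * * * *"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "users-%{+YYYY.MM.dd}"
  }
}

The filter section from the Oracle example can be adapted in the same way if you need to rename or drop columns before indexing.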

Need help in identifying TYPE of document ingested by Elasticsearch (through Logstash)

I used Logstash to ingest CSV files from https://www.kaggle.com/wcukierski/the-simpsons-by-the-data and save them to Elasticsearch. For starters, I ingested simpsons_characters.csv using the following conf:
input {
  file {
    path => "/Users/xyz/Downloads/the-simpsons-by-the-data/simpsons_characters.csv"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["id", "name", "normalized_name", "gender"]
    separator => ","
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => "localhost"
    action => "index"
    index => "simpsons"
  }
}
However, when I query like so: http://localhost:9200/simpsons/name/Lou
where
simpsons = index
name = type (I think ... not sure)
I get the following response back:
{
  "_index": "simpsons",
  "_type": "name",
  "_id": "Lou",
  "found": false
}
So, the question is: why am I not getting the correct response? Further, when you do bulk ingestion through CSV, what is the type of the document?
Thanks!
The default type in the Logstash Elasticsearch output is logs. So, no matter how you define your IDs (either take them from the CSV with document_id => "%{id}" or let ES assign its own), you can get those documents at http://localhost:9200/simpsons/logs/THE_ID.
If you don't know the id and want to simply check if something is there: http://localhost:9200/simpsons/logs/_search?pretty.
If you want to see what is the mapping of your index, for example to find out the _type of the index: http://localhost:9200/simpsons/_mapping?pretty.
To change the default _type:
elasticsearch {
  hosts => "localhost"
  action => "index"
  index => "simpsons"
  document_type => "characters"
  document_id => "%{id}"
}
Here you haven't specified an id field in your Logstash output. In this case Elasticsearch assigns a random id to your documents, while you are searching for a document with id=Lou.
Adding document_id => "%{id}" would solve your problem.
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => "localhost"
    action => "index"
    index => "simpsons"
    document_id => "%{id}"
  }
}

Import CSV into Elasticsearch

I'm doing the "Elasticsearch getting started" tutorial. Unfortunately, this tutorial doesn't cover the first step, which is importing a CSV database into Elasticsearch.
I googled for a solution, but unfortunately it didn't work. Here is what I want to achieve and what I have:
I have a file with data which I want to import (simplified):
id,title
10,Homer's Night Out
12,Krusty Gets Busted
I would like to import it using Logstash. After researching on the internet, I ended up with the following config:
input {
  file {
    path => ["simpsons_episodes.csv"]
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => [
      "id",
      "title"
    ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => ["127.0.0.1:9200"]
    index => "simpsons"
    document_type => "episode"
    workers => 1
  }
}
I'm having trouble specifying the document type: once the data is imported and I navigate to http://localhost:9200/simpsons/episode/10, I expect to see the result for episode 10.
Good job, you're almost there, you're only missing the document ID. You need to modify your elasticsearch output like this:
elasticsearch {
  action => "index"
  hosts => ["127.0.0.1:9200"]
  index => "simpsons"
  document_type => "episode"
  document_id => "%{id}"      <---- add this line
  workers => 1
}
After this you'll be able to query the episode with id 10:
GET http://localhost:9200/simpsons/episode/10
I'm the author of moshe/elasticsearch_loader.
I wrote ESL for exactly this problem.
You can download it with pip:
pip install elasticsearch-loader
And then you will be able to load CSV files into Elasticsearch by issuing:
elasticsearch_loader --index incidents --type incident csv file1.csv
Additionally, you can use a custom id field by adding --id-field=document_id to the command line.
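For example, applied to the Simpsons episodes file from earlier in this thread, a hypothetical invocation that keeps the CSV's own id column as the Elasticsearch _id might look like this (index and type names are just examples):
elasticsearch_loader --index simpsons --type episode --id-field=id csv simpsons_episodes.csv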

Logstash Produces No Output

I followed the tutorial found here to set up the ELK stack, and eventually I managed to get everything working. When I tried to modify the system to read in CSV files, however, it stopped working entirely. The conf file looks like this:
input {
  file {
    path => "/home/user/remaining/path/*.csv"
    type => "transaction"
    start_position => "beginning"
  }
}
filter {
  if [type] == "transaction" {
    csv {
      columns => ["@timestamp", "ip address", "domain", "user", "demo", "id", "activity", "detail"]
      separator => ","
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["localhost:9200"]
    action => "index"
    index => "logstash-test"
    workers => 1
  }
}
I've tried following a number of tutorials and guides for setting this up, and as far as I can tell Logstash is still connected to Elasticsearch; I just don't see any output. My suspicion is that it isn't reading the files at all. Part of the problem is that I'm not sure how to test each component of Logstash. Is there something I'm likely to have missed?
Edit:
The CSV files generally look like this:
2016-02-29T22:26:39.319700,22.111.11.11,place.domain.ca,bob,DEMO,95081299250aa8,TI_START,"{'field': 'data', 'field2': 'moredata', 'anotherfield': 'evenmoredata', 'continuedfield': 'habbo', 'size': '16'}"
2016-02-29T22:27:00.098426,24.111.11.11,otherplace.domain.ca,bob,DEMO,390s8clearlyfake,TI_END,"{'field': 'data', 'field2': 'moredata', 'anotherfield': 'evenmoredata', 'continuedfield': 'habbo', 'size': '16'}"
I've also noticed that when I go to localhost:9200/logstash-test/ I get a 404. I'm not sure if this is because no data is being transmitted, or if it's a different connection issue.
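On the "how to test each component" point: one general approach (not from this thread, just a sketch) is to take the file input and the elasticsearch output out of the picture first, feed a sample line over stdin, and confirm the csv filter parses it. Something like:

# test.conf -- isolate the csv filter: stdin in, stdout out
input { stdin { type => "transaction" } }
filter {
  if [type] == "transaction" {
    # paste the same columns list from the main config into this csv block
    csv { separator => "," }
  }
}
output { stdout { codec => rubydebug } }

Run it with bin/logstash -f test.conf and paste one of the CSV lines above. If the parsed fields come back on stdout, the filter is fine and the problem lies with the file input (path, permissions, or the ignore_older behaviour described in the answer below) or with the elasticsearch output.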
As part of the discussion here: https://discuss.elastic.co/t/logstash-not-showing-any-output-solved/28636/16
My files were more than 24 hours old, which is the default cut-off for the file input (ignore_older defaults to 86400 seconds, i.e. 24 hours). It was fixed by raising ignore_older:
input {
  file {
    ignore_older => 864000
    path => "/home/sean/cost-logs/transaction/*.csv"
    type => "transaction"
    start_position => "beginning"
  }
}

Logstash CSV filter doesn't work

I was trying to use the CSV filter in Logstash, but it can't upload the values from my file.
I'm using Ubuntu Server 14.04, Kibana 4, Logstash 1.4.2 and Elasticsearch 1.4.4.
Below are my CSV file and the filter I wrote. Am I doing something wrong?
CSV File:
Joao,21,555
Miguel,24,1000
Rodrigo,43,443
Maria,54,2343
Antonia,67,213
Logstash CSV filter:
# This is the filter that reads the file and loads the data into an Elasticsearch index
input {
  file {
    path => ["/opt/logstash/bin/testeFile_lite.csv"]
    start_position => "beginning"
    # sincedb_path => "NIL"
  }
}
filter {
  csv {
    columns => ["nome", "idade", "salario"]
    separator => ","
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout {
    codec => rubydebug
  }
}
When I execute the filter, messages appear saying using milestone 2 input plugin 'file'... and using milestone 2 filter plugin 'csv'..., but the OK message doesn't appear.
Can someone help me?
I solved the problem by adding the sincedb_path setting to the file input.
Here's the Logstash config:
input {
  file {
    path => "/opt/logstash/bin/testeFile_lite.csv"
    type => "testeFile_lite"
    start_position => "beginning"
    sincedb_path => "/opt/logstash/bin/dbteste"
  }
}
filter {
  csv {
    columns => ['nome', 'idade', 'salario']
    separator => ","
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "xpto"
    cluster => "SIC_UTAD"
  }
  stdout {
    codec => rubydebug
  }
}
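One last note, as general file-input behaviour rather than something specific to this thread: sincedb records how far each file has already been read, so re-running the same pipeline over an unchanged CSV produces no new events. For repeated test imports, pointing sincedb_path at /dev/null (as the Simpsons example earlier does) makes Logstash forget the position and re-read the file on every run:

file {
  path => "/opt/logstash/bin/testeFile_lite.csv"
  start_position => "beginning"
  # Test runs only: discard the read position so the whole file is re-read each time
  sincedb_path => "/dev/null"
}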