How to integrate ElasticSearch with MySQL? - mysql

In one of my project, I am planning to use ElasticSearch with MySQL.
I have successfully installed ElasticSearch. I am able to manage index in ES separately. but I don't know how to implement the same with MySQL.
I have read a couple of documents but I am a bit confused and not having a clear idea.

As of ES 5.x , they have given this feature out of the box with logstash plugin.
This will periodically import data from database and push to ES server.
One has to create a simple import file given below (which is also described here) and use logstash to run the script. Logstash supports running this script on a schedule.
# file: contacts-index-logstash.conf
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
jdbc_user => "user"
jdbc_password => "pswd"
schedule => "* * * * *"
jdbc_validate_connection => true
jdbc_driver_library => "/path/to/latest/mysql-connector-java-jar"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
statement => "SELECT * from contacts where updatedAt > :sql_last_value"
}
}
output {
elasticsearch {
protocol => http
index => "contacts"
document_type => "contact"
document_id => "%{id}"
host => "ES_NODE_HOST"
}
}
# "* * * * *" -> run every minute
# sql_last_value is a built in parameter whose value is set to Thursday, 1 January 1970,
# or 0 if use_column_value is true and tracking_column is set
You can download the mysql jar from maven here.
In case indexes do not exist in ES when this script is executed, they will be created automatically. Just like a normal post call to elasticsearch

Finally i was able to find the answer. sharing my findings.
To use ElasticSearch with Mysql you will require The Java Database Connection (JDBC) importer. with JDBC drivers you can sync your mysql data into elasticsearch.
I am using ubuntu 14.04 LTS and you will require to install Java8 to run elasticsearch as it is written in Java
following are steps to install ElasticSearch 2.2.0 and ElasticSearch-jdbc 2.2.0 and please note both the versions has to be same
after installing Java8 ..... install elasticsearch 2.2.0 as follows
# cd /opt
# wget https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.2.0/elasticsearch-2.2.0.deb
# sudo dpkg -i elasticsearch-2.2.0.deb
This installation procedure will install Elasticsearch in /usr/share/elasticsearch/ whose configuration files will be placed in /etc/elasticsearch .
Now lets do some basic configuration in config file. here /etc/elasticsearch/elasticsearch.yml is our config file
you can open file to change by
nano /etc/elasticsearch/elasticsearch.yml
and change cluster name and node name
For example :
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: servercluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: vps.server.com
#
# Add custom attributes to the node:
#
# node.rack: r1
Now save the file and start elasticsearch
/etc/init.d/elasticsearch start
to test ES installed or not run following
curl -XGET 'http://localhost:9200/?pretty'
If you get following then your elasticsearch is installed now :)
{
"name" : "vps.server.com",
"cluster_name" : "servercluster",
"version" : {
"number" : "2.2.0",
"build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
"build_timestamp" : "2016-01-27T13:32:39Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"
},
"tagline" : "You Know, for Search"
}
Now let's install elasticsearch-JDBC
download it from http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.3.1/elasticsearch-jdbc-2.3.3.1-dist.zip and extract the same in /etc/elasticsearch/ and create "logs" folder also there ( path of logs should be /etc/elasticsearch/logs)
I have one database created in mysql having name "ElasticSearchDatabase" and inside that table named "test" with fields id,name and email
cd /etc/elasticsearch
and run following
echo '{
"type":"jdbc",
"jdbc":{
"url":"jdbc:mysql://localhost:3306/ElasticSearchDatabase",
"user":"root",
"password":"",
"sql":"SELECT id as _id, id, name,email FROM test",
"index":"users",
"type":"users",
"autocommit":"true",
"metrics": {
"enabled" : true
},
"elasticsearch" : {
"cluster" : "servercluster",
"host" : "localhost",
"port" : 9300
}
}
}' | java -cp "/etc/elasticsearch/elasticsearch-jdbc-2.2.0.0/lib/*" -"Dlog4j.configurationFile=file:////etc/elasticsearch/elasticsearch-jdbc-2.2.0.0/bin/log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"
now check if mysql data imported in ES or not
curl -XGET http://localhost:9200/users/_search/?pretty
If all goes well, you will be able to see all your mysql data in json format
and if any error is there you will be able to see them in /etc/elasticsearch/logs/jdbc.log file
Caution :
In older versions of ES ... plugin Elasticsearch-river-jdbc was used which is completely deprecated in latest version so do not use it.
I hope i could save your time :)
Any further thoughts are appreciated
Reference url : https://github.com/jprante/elasticsearch-jdbc

The logstash JDBC plugin will do the job:
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
jdbc_user => "root"
jdbc_password => "factweavers"
# The path to our downloaded jdbc driver
jdbc_driver_library => "/home/comp/Downloads/mysql-connector-java-5.1.38.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
# our query
schedule => "* * * *"
statement => "SELECT" * FROM testtable where Date > :sql_last_value order by Date"
use_column_value => true
tracking_column => Date
}
output {
stdout { codec => json_lines }
elasticsearch {
"hosts" => "localhost:9200"
"index" => "test-migrate"
"document_type" => "data"
"document_id" => "%{personid}"
}
}

To make it more simple I have created a PHP class to Setup MySQL with Elasticsearch. Using my Class you can sync your MySQL data in elasticsearch and also perform full-text search. You just need to set your SQL query and class will do the rest for you.

Related

Logstash: Unable to connect to external Amazon RDS Database

Am relatively new to logstash & Elasticsearch...
Installed logstash & Elasticsearch using on macOS Mojave (10.14.2):
brew install logstash
brew install elasticsearch
When I check for these versions:
brew list --versions
Receive the following output:
elasticsearch 6.5.4
logstash 6.5.4
When I open up Google Chrome and type this into the URL Address field:
localhost:9200
This is the JSON response that I receive:
{
"name" : "9oJAP16",
"cluster_name" : "elasticsearch_local",
"cluster_uuid" : "PgaDRw8rSJi-NDo80v_6gQ",
"version" : {
"number" : "6.5.4",
"build_flavor" : "oss",
"build_type" : "tar",
"build_hash" : "d2ef93d",
"build_date" : "2018-12-17T21:17:40.758843Z",
"build_snapshot" : false,
"lucene_version" : "7.5.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
Inside:
/usr/local/etc/logstash/logstash.yml
Resides the following variables:
path.data: /usr/local/Cellar/logstash/6.5.4/libexec/data
pipeline.workers: 2
path.config: /usr/local/etc/logstash/conf.d
log.level: info
path.logs: /usr/local/var/log
Inside:
/usr/local/etc/logstash/pipelines.yml
Resides the following variables:
- pipeline.id: main
path.config: "/usr/local/etc/logstash/conf.d/*.conf"
Have setup the following logstash_etl.conf file underneath:
/usr/local/etc/logstash/conf.d
Its contents:
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://myapp-production.crankbftdpmc.us-west-2.rds.amazonaws.com:3306/products"
jdbc_user => "products_admin"
jdbc_password => "products123"
jdbc_driver_library => "/etc/logstash/mysql-connector/mysql-connector-java-5.1.21.jar"
jdbc_driver_class => "com.mysql.jdbc.driver"
schedule => "*/5 * * * *"
statement => "select * from products"
use_column_value => false
clean_run => true
}
}
# sudo /usr/share/logstash/bin/logstash-plugin install logstash-output-exec
output {
if ([purge_task] == "yes") {
exec {
command => "curl -XPOST 'localhost:9200/_all/products/_delete_by_query?conflicts=proceed' -H 'Content-Type: application/json' -d'
{
\"query\": {
\"range\" : {
\"#timestamp\" : {
\"lte\" : \"now-3h\"
}
}
}
}
'"
}
}
else {
stdout { codec => json_lines}
elasticsearch {
"hosts" => "localhost:9200"
"index" => "product_%{product_api_key}"
"document_type" => "%{[#metadata][index_type]}"
"document_id" => "%{[#metadata][index_id]}"
"doc_as_upsert" => true
"action" => "update"
"retry_on_conflict" => 7
}
}
}
When I do this:
brew services start logstash
Receive the following inside my /usr/local/var/log/logstash-plain.log file:
[2019-01-15T14:51:15,319][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x399927c7 run>"}
[2019-01-15T14:51:15,663][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-01-15T14:51:16,514][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-01-15T14:57:31,432][ERROR][logstash.inputs.jdbc ] Unable to connect to database. Tried 1 times {:error_message=>"Java::ComMysqlCjJdbcExceptions::CommunicationsException: Communications link failure\n\nThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server."}
[2019-01-15T14:57:31,435][ERROR][logstash.inputs.jdbc ] Unable to connect to database. Tried 1 times {:error_message=>"Java::ComMysqlCjJdbcExceptions::CommunicationsException: Communications link failure\n\nThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server."}[2019-01-15T14:51:15,319][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x399927c7 run>"}
[2019-01-15T14:51:15,663][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-01-15T14:51:16,514][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-01-15T14:57:31,432][ERROR][logstash.inputs.jdbc ] Unable to connect to database. Tried 1 times
What am I possibly doing wrong?
Is there a way to obtain a dump (e.g. mysqldump) from an Elasticsearch server (Stage or Production) and then reimport into a local instance running Elasticsearch without using logstash?
This is the same configuration file that works inside an Amazon EC-2 Production Instance but don't know why its not working in my local macOS Mojave instance?
You may encounter the SSL issue of RDS, since
If you use either the MySQL Java Connector v5.1.38 or later, or the MySQL Java Connector v8.0.9 or later to connect to your databases, even if you haven't explicitly configured your applications to use SSL/TLS when connecting to your databases, these client drivers default to using SSL/TLS. In addition, when using SSL/TLS, they perform partial certificate verification and fail to connect if the database server certificate is expired.
as described in AWS RDS Doc
To overcome, either set up the trust store for the LogStash, which is described in the above link as well.
Or take the risk to disable the SSL in the connecting string, like
jdbc_connection_string => "jdbc:mysql://myapp-production.crankbftdpmc.us-west-2.rds.amazonaws.com:3306/products?sslMode=DISABLED"

Can't import mysql data into elastic search

I'm trying to follow along with a course example haven't recieved any help in the FAQ's, tried everything I could find on google and here.
I'm not using docker just running this demo on my local machine(Ubunutu 18.04), both elastic search and mysql are running.
When I run "sudo bin/logstash -f /etc/logstash/conf.d/mysql.conf --path.settings /etc/logstash"
I get the following Error: com.mysql.jdbc.Driver not loaded. Are you sure you've included the correct jdbc driver in :jdbc_driver_library?
The driver does exist and path is correct.
when I use sudo bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/mysql.conf
It returns with configuration ok.
I'm using mysql-connector-java-5.1.47
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-0ubuntu0.18.04.1-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
Elasticsearch-6.4.2
Logstash-6.4.2
My mysql.conf is
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/movielens"
jdbc_user => "grunt"
jdbc_password => "password"
jdbc_driver_library => "/home/alarik/mysql-connecter-java-5.1.47/mysql-connector-java-5.1.47-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
statement => "SELECT * FROM movies"
}
}
output {
stdout { codec => json_lines }
elasticsearch {
"hosts" => "localhost:9200"
"index" => "movielens-sql"
"document_type" => "data"
}
}
I solved the problem:
First check your java version:
root#xxxxxx:/# java -version
openjdk version "1.8.0_181"
If you are using 1.8 then you should use the JDBC42 version.
If you are using 1.7 then you should use the JDBC41 version.
If you are using 1.6 then you should use the JDBC43 version.
mysql setup:
mysql-connector-java-5.1.46.jar
jdbc_driver_library => "//path_to_jar/mysql-connector-java-5.1.46.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"

Mnesia DB elasticsearch

I using ejabberd server for chat communication. I'd like be able dynamicly search my archive messages. Now I'm using elasticsearch and logstash, but it working only on mysql db. It's my logstash config
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/ejabberd"
jdbc_user => "ejabber"
jdbc_password => "password"
jdbc_driver_library => "mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
statement => "SELECT * FROM ejabberd.archive"
}
}
output {
# stdout { codec => json_lines }
elasticsearch {
index => "muc_room"
hosts => ["localhost:9200"]
}
}
I need use mnesia DB, its default base for ejabber. How can connect mnesia DB with logstash, or it is possible use another way to include search engione to mnesia DB. Thank you
I would send the data directly to elasticsearch from ejabberd. That way, you don't need to have two separate things that need to be updated if you change storage engines. There's an Erlang package to talk to Elasticsearch. The documentation on it isn't great, but it's a pretty simple interface anyway.

Mysql to Elasticsearch - won't create index and export data

I am attempting to import a MySQL table into Elasticsearch.It is a table containing 10 different columns with a an 8 digits VARCHAR set as a Primary Key. MySQL database is located on a remote host.
To transfer data from MySQL into Elasticsearch I've decided to use Logstash and jdbc MySQL driver.
I am assuming that Logstash will create the index for me if it isn't there.
Here's my logstash.conf script:
input{
jdbc {
jdbc_driver_library => "/home/user/logstash/mysql-connector-java-5.1.17-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://[remotehostipnumber]/databasename"
jdbc_validate_connection => true
jdbc_user => "username"
jdbc_password => "password"
schedule => "* * * * *"
statement => "select * from table"
}
}
output
{
elasticsearch
{
index => "tables"
document_type => "table"
document_id => "%{table_id}"
hosts => "localhost:9200"
}stdout { codec => json_lines }
}
When running logstash config test it outputs 'Configration OK' message:
sudo /opt/logstash/bin/logstash --configtest -f /home/user/logstash/logstash.conf
Also when executing the logstash.conf script, Elasticsearch outputs:
Settings: Default filter workers: 1
Logstash startup completed
But when I go to check whether the index has been created and data has also been added:
curl -XGET 'localhost:9200/tables/table/_search?pretty=true'
I get:
{
"error" : {
"root_cause" : [ {
"type" : "index_not_found_exception",
"reason" : "no such index",
"resource.type" : "index_or_alias",
"resource.id" : "tables",
"index" : "table"
} ],
"type" : "index_not_found_exception",
"reason" : "no such index",
"resource.type" : "index_or_alias",
"resource.id" : "tables",
"index" : "tables"
},
"status" : 404
}
What could be the potential reasons behind the data not being indexed?
PS. I am keeping the Elasticsearch server running in the separate terminal window, to ensure Logstash can connect and interact with it.
For those who end up here looking for the answer to the similar problem.
My database had 4m rows and it must have been too much for logstash/elasticsearch/jdbc driver to handle in one command.
After I divided the initial transfer into 4 separate chunks of work, the script run and added the desired table into the elasticsearch NoSQL db.
use following code to export data from mysql table and create index in elastic search
echo '{
"type":"jdbc",
"jdbc":{
"url":"jdbc:mysql://localhost:3306/your_database_name",
"user":"your_database_username",
"password":"your_database_password",
"useSSL":"false",
"sql":"SELECT * FROM table1",
"index":"Index_name",
"type":"Index_type",
"poll" : "6s",
"autocommit":"true",
"metrics": {
"enabled" : true
},
"elasticsearch" : {
"cluster" : "clustername",
"host" : "localhost",
"port" : 9300
}
}
}' | java -cp "/etc/elasticsearch/elasticsearch-jdbc-2.3.4.0/lib/*" -"Dlog4j.configurationFile=file:////etc/elasticsearch/elasticsearch-jdbc-2.3.4.0/bin/log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"

Filebeat and LogStash -- data in multiple different formats

I have Filebeat, Logstash, ElasticSearch and Kibana. Filebeat is on a separate server and it's supposed to receive data in different formats: syslog, json, from a database, etc and send it to Logstash.
I know how to setup Logstash to make it handle a single format, but since there are multiple data formats, how would I configure Logstash to handle each data format properly?
In fact, how can I setup them both, Logstash and Filebeat, so that all the data in different formats get sent from Filebeat and submitted to Logstash properly? I mean, the config setting which handle sending and receiving data.
To separate different types of inputs within the Logstash pipeline, use the type field and tags for more identification.
In your Filebeat configuration, you should be using a different prospector for each different data format, each prospector can then be set to have a different document_type: field.
Reference
For example:
filebeat:
# List of prospectors to fetch data.
prospectors:
# Each - is a prospector. Below are the prospector specific configurations
-
# Paths that should be crawled and fetched. Glob based paths.
# For each file found under this path, a harvester is started.
paths:
- "/var/log/apache/httpd-*.log"
# Type to be published in the 'type' field. For Elasticsearch output,
# the type defines the document type these entries should be stored
# in. Default: log
document_type: apache
-
paths:
- /var/log/messages
- "/var/log/*.log"
document_type: log_message
In the above example, logs from /var/log/apache/httpd-*.log will have document_type: apache, while the other prospector has document_type: log_message.
This document-type field becomes the type field when Logstash is processing the event. You can then use if statements in Logstash to do different processing on different types.
Reference
For example:
filter {
if [type] == "apache" {
# apache specific processing
}
else if [type] == "log_message" {
# log_message processing
}
}
If the "data formats" in your question are codecs, this has to be configured in the input of logstash. The following is about filebeat 1.x and logstash 2.x, not the elastic 5 stack.
In our setup, we have two beats inputs - the first is default = "plain":
beats {
port => 5043
}
beats {
port => 5044
codec => "json"
}
On the filebeat side, we need two filebeat instances, sending their output to their respective ports. It's not possible to tell filebeat "route this prospector to that output".
Documentation logstash: https://www.elastic.co/guide/en/logstash/2.4/plugins-inputs-beats.html
Remark: If you ship with different protocols, e.g. legacy logstash-forwarder / lumberjack, you need more ports.
Supported with 7.5.1
filebeat-multifile.yml // file beat installed on a machine
filebeat.inputs:
- type: log
tags: ["gunicorn"]
paths:
- "/home/hduser/Data/gunicorn-100.log"
- type: log
tags: ["apache"]
paths:
- "/home/hduser/Data/apache-access-100.log"
output.logstash:
hosts: ["0.0.0.0:5044"] // target logstash IP
gunicorn-apache-log.conf // log stash installed on another machine
input {
beats {
port => "5044"
host => "0.0.0.0"
}
}
filter {
if "gunicorn" in [tags] {
grok {
match => { "message" => "%{USERNAME:u1} %{USERNAME:u2} \[%{HTTPDATE:http_date}\] \"%{DATA:http_verb} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:android_client}\"" }
remove_field => "message"
}
}
else if "apache" in [tags] {
grok {
match => { "message" => "%{IPORHOST:client_ip} %{DATA:u1} %{DATA:u2} \[%{HTTPDATE:http_date}\] \"%{WORD:http_method} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:gd}\" \"%{DATA:u3}\""}
remove_field => "message"
}
}
}
output {
if "gunicorn" in [tags]{
stdout { codec => rubydebug }
elasticsearch {
hosts => [...]
index => "gunicorn-index"
}
}
else if "apache" in [tags]{
stdout { codec => rubydebug }
elasticsearch {
hosts => [...]
index => "apache-index"
}
}
}
Run filebeat from binary
Give proper permission to file
sudo chown root:root filebeat-multifile.yml
sudo chmod go-w filebeat-multifile.yml
sudo ./filebeat -e -c filebeat-multifile-1.yml -d "publish"
Run logstash from binary
./bin/logstash -f gunicorn-apache-log.conf