MySQL to Elasticsearch - won't create index and export data - mysql

I am attempting to import a MySQL table into Elasticsearch. It is a table containing 10 different columns with an 8-digit VARCHAR set as the primary key. The MySQL database is located on a remote host.
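For reference, the shape of such a table might be roughly as follows (only the table_id key used in the config below comes from the question; the other column names and types are invented for illustration):
CREATE TABLE my_table (
  table_id VARCHAR(8) NOT NULL,   -- the 8-digit primary key described above
  col_1 VARCHAR(255),             -- the remaining nine columns are placeholders
  col_2 VARCHAR(255),
  PRIMARY KEY (table_id)
);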
To transfer data from MySQL into Elasticsearch I've decided to use Logstash and the MySQL JDBC driver.
I am assuming that Logstash will create the index for me if it isn't there.
Here's my logstash.conf script:
input {
  jdbc {
    jdbc_driver_library => "/home/user/logstash/mysql-connector-java-5.1.17-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://[remotehostipnumber]/databasename"
    jdbc_validate_connection => true
    jdbc_user => "username"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table"
  }
}
output {
  elasticsearch {
    index => "tables"
    document_type => "table"
    document_id => "%{table_id}"
    hosts => "localhost:9200"
  }
  stdout { codec => json_lines }
}
When running the Logstash config test it outputs a 'Configuration OK' message:
sudo /opt/logstash/bin/logstash --configtest -f /home/user/logstash/logstash.conf
Also, when executing the logstash.conf script, Logstash outputs:
Settings: Default filter workers: 1
Logstash startup completed
But when I go to check whether the index has been created and data has also been added:
curl -XGET 'localhost:9200/tables/table/_search?pretty=true'
I get:
{
  "error" : {
    "root_cause" : [ {
      "type" : "index_not_found_exception",
      "reason" : "no such index",
      "resource.type" : "index_or_alias",
      "resource.id" : "tables",
      "index" : "table"
    } ],
    "type" : "index_not_found_exception",
    "reason" : "no such index",
    "resource.type" : "index_or_alias",
    "resource.id" : "tables",
    "index" : "tables"
  },
  "status" : 404
}
What could be the potential reasons behind the data not being indexed?
PS. I am keeping the Elasticsearch server running in a separate terminal window, to ensure Logstash can connect and interact with it.
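One quick check at this point is to list all indices instead of searching a specific one; if Logstash never created anything, nothing will show up here:
curl -XGET 'localhost:9200/_cat/indices?v'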

For those who end up here looking for the answer to a similar problem:
My database had 4m rows, and it must have been too much for Logstash/Elasticsearch/the JDBC driver to handle in one command.
After I divided the initial transfer into 4 separate chunks of work, the script ran and added the desired table into the Elasticsearch NoSQL db.
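If you hit the same limit, an alternative to splitting the query by hand is to let the JDBC input page through the table; a minimal sketch of the relevant input options (the page size is an arbitrary example):
jdbc {
  # connection settings as in the original config above
  jdbc_paging_enabled => true
  jdbc_page_size => 100000
  statement => "select * from table"
}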

Use the following code to export data from a MySQL table and create the index in Elasticsearch:
echo '{
  "type":"jdbc",
  "jdbc":{
    "url":"jdbc:mysql://localhost:3306/your_database_name",
    "user":"your_database_username",
    "password":"your_database_password",
    "useSSL":"false",
    "sql":"SELECT * FROM table1",
    "index":"Index_name",
    "type":"Index_type",
    "poll" : "6s",
    "autocommit":"true",
    "metrics": {
      "enabled" : true
    },
    "elasticsearch" : {
      "cluster" : "clustername",
      "host" : "localhost",
      "port" : 9300
    }
  }
}' | java -cp "/etc/elasticsearch/elasticsearch-jdbc-2.3.4.0/lib/*" -"Dlog4j.configurationFile=file:////etc/elasticsearch/elasticsearch-jdbc-2.3.4.0/bin/log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"
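Once the importer has run, the result can be checked the same way as with Logstash; note that Elasticsearch generally requires lowercase index names, so substitute whatever lowercase name you actually used for "Index_name":
curl -XGET 'http://localhost:9200/index_name/_search?pretty'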

Related

Wildfly json log formatter dynamic configuration not applied

Wildfly 20 is connected with a Logstash instance listening on tcp port 5300:
logstash.conf:
input {
  tcp {
    codec => json
    port => "5300"
  }
}
output {
  stdout {}
}
Making use of its built-in JSON logging capabilities over a socket connection, as pointed out in "wildfly-logstash does not send logs to logstash", Wildfly is configured on the Wildfly CLI by entering the following sequence of statements (which end up in standalone.xml automatically):
/subsystem=logging/json-formatter=LOG-STASH:add(key-overrides={timestamp=@timestamp,message=@message,logger-name=@source,host-name=@source_host}, exception-output-type=formatted)
/socket-binding-group=standard-sockets/remote-destination-outbound-socket-binding=log-stash:add(host=localhost, port=8000)
/subsystem=logging/socket-handler=LOGSTASH-SOCKET:add(named-formatter=LOG-STASH, outbound-socket-binding-ref=log-stash, level=DEBUG)
/subsystem=logging/async-handler=LOGSTASH-ASYNC:add(queue-length=512, subhandlers=[LOGSTASH-SOCKET])
/subsystem=logging/root-logger=ROOT:add-handler(name=LOGSTASH-ASYNC)
It produces log statements on the standard out of the Logstash node, e.g.:
{
  "level" => "DEBUG",
  "host" => "gateway",
  "processId" => 14972,
  "sequence" => 34696,
  "@version" => "1",
  "@source" => "com.myapplication.TaskService",
  "@source_host" => "device-01",
  "threadName" => "EJB default - 6",
  "threadId" => 215,
  "loggerClassName" => "org.slf4j.impl.Slf4jLogger",
  "mdc" => {},
  "ndc" => "",
  "port" => 64210,
  "processName" => "jboss-modules.jar",
  "@timestamp" => 2021-03-31T14:10:19.869Z,
  "@message" => "task execution successfull: MailDaemon"
}
That is only halfway to the goal; a different set of attribute names (in the individual JSON log messages) is required to fit our enterprise Logstash instances.
In particular, neither "host-name" nor "logger-name" are written, although configured; instead "@source_host" and "@source" are logged.
Further adaptation of the log formatter LOG-STASH only partially succeeds.
1) /subsystem=logging/json-formatter=LOG-STASH:write-attribute(name="meta-data",value={service="myapplication-api", serviceversion="1.1.0", instanceId="myapplication-api-1.1.0"})
2) /subsystem=logging/json-formatter=LOG-STASH:write-attribute(name="key-overrides",value=[severity=level,timestamp=@timestamp,message=msg,logger-name=@source,host-name=@source_host])
Further simplification results in the attribute being stored, but not applied:
3) /subsystem=logging/json-formatter=LOG-STASH:write-attribute(name="key-overrides",value={"level"="severity"})
4) /subsystem=logging/json-formatter=LOG-STASH:read-attribute(name="key-overrides")
Statement 1) works and the metadata are added; 2) and 3) bring no results; 4) prints output like:
INFO [org.jboss.as.cli.CommandContext] {
"outcome" => "success",
"result" => {"level" => "severity"}
}
{
"outcome" => "success",
"result" => {"level" => "severity"}
}
With the above setup the following Wildfly CLI command successfully renames the wanted keys' default values:
/subsystem=logging/json-formatter=LOG-STASH:write-attribute(name="key-overrides",value={"level"="severity","sequence"="trace","thread-id"="pid","logger-class-name"="class","thread-name"="thread"})
These settings end up in standalone.xml and logging.properties in the same folder on disk.
During my work there was a discrepancy between configured keys in both files.
Be aware that camel case key names like threadId produce a configuration error; you have to use thread-id instead. I found this by inspecting the JBoss logging library, i.e. looking at the Java source code.
The produced logging output is e.g.
{
  "pid" => 212,
  "message" => "Synchronizaing finished in 0ms",
  "@version" => "1",
  "loggerName" => "com.myapp.Cache",
  "@timestamp" => 2021-04-08T13:49:00.178Z,
  "port" => 59182,
  "processName" => "jboss-modules.jar",
  "trace" => 4245,
  "host" => "gateway",
  "severity" => "DEBUG",
  "processId" => 10536,
  "mdc" => {},
  "hostName" => "host-alpha",
  "timestamp" => "2021-04-08T15:49:00.176+02:00",
  "class" => "org.slf4j.impl.Slf4jLogger",
  "ndc" => "",
  "thread" => "EJB default - 7"
}
What would still be nice is to have the mdc and ndc fields removed from the output.
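One option is to drop those two fields on the Logstash side rather than in Wildfly, with a mutate filter in the pipeline; a minimal sketch:
filter {
  mutate {
    remove_field => [ "mdc", "ndc" ]
  }
}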

logstash: Unknown setting '"index"' for elasticsearch

I'm new to Elasticsearch and to connecting it with MySQL.
I followed multiple tutorials to install it, but I'm getting these errors:
Unknown setting '"index"' and '"hosts"' for elasticsearch
The output of
sudo -Hu root /usr/share/logstash/bin/logstash --path.settings /etc/logstash/
returns:
> Sending Logstash logs to /usr/share/logstash/logs which is now configured via log4j2.properties
> [2019-04-20T17:48:47,293][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.0.0"}
> [2019-04-20T17:48:53,873][ERROR][logstash.outputs.elasticsearch] Unknown setting '"document_type"' for elasticsearch
> [2019-04-20T17:48:53,878][ERROR][logstash.outputs.elasticsearch] Unknown setting '"hosts"' for elasticsearch
> [2019-04-20T17:48:53,878][ERROR][logstash.outputs.elasticsearch] Unknown setting '"index"' for elasticsearch
> [2019-04-20T17:48:53,891][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Something is wrong with your configuration.", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/config/mixin.rb:86:in `config_init'", "/usr/share/logstash/logstash-core/lib/logstash/outputs/base.rb:60:in `initialize'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:232:in `initialize'", "org/logstash/config/ir/compiler/OutputDelegatorExt.java:48:in `initialize'", "org/logstash/config/ir/compiler/OutputDelegatorExt.java:30:in `initialize'", "org/logstash/plugins/PluginFactoryExt.java:239:in `plugin'", "org/logstash/plugins/PluginFactoryExt.java:137:in `buildOutput'", "org/logstash/execution/JavaBasePipelineExt.java:50:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:23:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:36:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:325:in `block in converge_state'"]}
> [2019-04-20T17:48:54,190][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
> [2019-04-20T17:48:59,066][INFO ][logstash.runner ] Logstash shut down.
Here is the content of the logstash.conf file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/archief"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "pswxxx"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.15.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # our query
    statement => "SELECT * FROM archief"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => ["localhost:9200"]
    "index" => "archief"
  }
}
There should be no double quotes around the option names:
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "archief"
  }
}
I had the same issue, and solved it by changing my logstash version from 7.4.2 to 6.3.2.
Logstash 6.3.2 link

Logstash: Unable to connect to external Amazon RDS Database

I am relatively new to Logstash & Elasticsearch...
Installed Logstash & Elasticsearch on macOS Mojave (10.14.2) using:
brew install logstash
brew install elasticsearch
When I check for these versions:
brew list --versions
Receive the following output:
elasticsearch 6.5.4
logstash 6.5.4
When I open up Google Chrome and type this into the URL Address field:
localhost:9200
This is the JSON response that I receive:
{
  "name" : "9oJAP16",
  "cluster_name" : "elasticsearch_local",
  "cluster_uuid" : "PgaDRw8rSJi-NDo80v_6gQ",
  "version" : {
    "number" : "6.5.4",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "d2ef93d",
    "build_date" : "2018-12-17T21:17:40.758843Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
Inside:
/usr/local/etc/logstash/logstash.yml
Resides the following variables:
path.data: /usr/local/Cellar/logstash/6.5.4/libexec/data
pipeline.workers: 2
path.config: /usr/local/etc/logstash/conf.d
log.level: info
path.logs: /usr/local/var/log
Inside:
/usr/local/etc/logstash/pipelines.yml
Resides the following variables:
- pipeline.id: main
path.config: "/usr/local/etc/logstash/conf.d/*.conf"
I have set up the following logstash_etl.conf file under:
/usr/local/etc/logstash/conf.d
Its contents:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://myapp-production.crankbftdpmc.us-west-2.rds.amazonaws.com:3306/products"
    jdbc_user => "products_admin"
    jdbc_password => "products123"
    jdbc_driver_library => "/etc/logstash/mysql-connector/mysql-connector-java-5.1.21.jar"
    jdbc_driver_class => "com.mysql.jdbc.driver"
    schedule => "*/5 * * * *"
    statement => "select * from products"
    use_column_value => false
    clean_run => true
  }
}
# sudo /usr/share/logstash/bin/logstash-plugin install logstash-output-exec
output {
  if ([purge_task] == "yes") {
    exec {
      command => "curl -XPOST 'localhost:9200/_all/products/_delete_by_query?conflicts=proceed' -H 'Content-Type: application/json' -d'
      {
        \"query\": {
          \"range\" : {
            \"@timestamp\" : {
              \"lte\" : \"now-3h\"
            }
          }
        }
      }
      '"
    }
  }
  else {
    stdout { codec => json_lines}
    elasticsearch {
      "hosts" => "localhost:9200"
      "index" => "product_%{product_api_key}"
      "document_type" => "%{[@metadata][index_type]}"
      "document_id" => "%{[@metadata][index_id]}"
      "doc_as_upsert" => true
      "action" => "update"
      "retry_on_conflict" => 7
    }
  }
}
When I do this:
brew services start logstash
Receive the following inside my /usr/local/var/log/logstash-plain.log file:
[2019-01-15T14:51:15,319][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x399927c7 run>"}
[2019-01-15T14:51:15,663][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-01-15T14:51:16,514][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-01-15T14:57:31,432][ERROR][logstash.inputs.jdbc ] Unable to connect to database. Tried 1 times {:error_message=>"Java::ComMysqlCjJdbcExceptions::CommunicationsException: Communications link failure\n\nThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server."}
[2019-01-15T14:57:31,435][ERROR][logstash.inputs.jdbc ] Unable to connect to database. Tried 1 times {:error_message=>"Java::ComMysqlCjJdbcExceptions::CommunicationsException: Communications link failure\n\nThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server."}
What am I possibly doing wrong?
Is there a way to obtain a dump (e.g. mysqldump) from an Elasticsearch server (Stage or Production) and then reimport into a local instance running Elasticsearch without using logstash?
This is the same configuration file that works inside an Amazon EC2 production instance, but I don't know why it's not working on my local macOS Mojave machine.
You may be running into the RDS SSL issue, since:
If you use either the MySQL Java Connector v5.1.38 or later, or the MySQL Java Connector v8.0.9 or later to connect to your databases, even if you haven't explicitly configured your applications to use SSL/TLS when connecting to your databases, these client drivers default to using SSL/TLS. In addition, when using SSL/TLS, they perform partial certificate verification and fail to connect if the database server certificate is expired.
as described in AWS RDS Doc
To overcome this, either set up the trust store for Logstash, which is also described in the above link,
or take the risk of disabling SSL in the connection string, like:
jdbc_connection_string => "jdbc:mysql://myapp-production.crankbftdpmc.us-west-2.rds.amazonaws.com:3306/products?sslMode=DISABLED"
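If you prefer the trust-store route instead, the rough shape is to import the RDS CA certificate into a Java trust store and point the Logstash JVM at it; the file names, alias and password below are placeholders, not verified values:
# import the RDS CA certificate into a local trust store
keytool -importcert -alias rds-ca -file rds-ca-root.pem -keystore rds-truststore.jks -storepass changeit
# hand the trust store to the Logstash JVM, e.g. via LS_JAVA_OPTS
export LS_JAVA_OPTS="-Djavax.net.ssl.trustStore=/path/to/rds-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit"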

Migration from MySQL to Elasticsearch using Logstash

I am new to the ELK stack. I am working on data migration from MySQL to Elasticsearch. I am following this tutorial:
https://qbox.io/blog/migrating-mysql-data-into-elasticsearch-using-logstash
and I have installed and configured MySQL and ElasticSearch. I could not configure Logstash.
I don't know where to find logstash.conf, so I created a file named logstash.conf in the conf.d directory of the Logstash folder. I wrote the following in logstash.conf:
input {
  jdbc {
    jdbc_driver_library => "usr/share/java/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/books"
    jdbc_user => "root"
    jdbc_password => "root"
    statement => "SELECT * FROM authors"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "localhost:9200"
    "index" => "my-authors"
    "document_type" => "data"
  }
}
But when I run the command bin/logstash -f logstash.conf after going into the /etc/logstash/conf.d folder from the Ubuntu terminal, it gives an error stating that bin/logstash does not exist.
Please help me with the issue.
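Note that with the Debian/Ubuntu package the Logstash binary normally lives under /usr/share/logstash/bin rather than inside /etc/logstash, so a command along these lines should work (the path is assumed from the standard package layout):
sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf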

How to integrate ElasticSearch with MySQL?

In one of my projects, I am planning to use Elasticsearch with MySQL.
I have successfully installed Elasticsearch. I am able to manage indices in ES separately, but I don't know how to implement the same with MySQL.
I have read a couple of documents, but I am a bit confused and don't have a clear idea.
As of ES 5.x, this is supported out of the box with the Logstash JDBC plugin.
It will periodically import data from the database and push it to the ES server.
One has to create a simple import file as given below (which is also described here) and use Logstash to run the script. Logstash supports running this script on a schedule.
# file: contacts-index-logstash.conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "pswd"
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "/path/to/latest/mysql-connector-java-jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * from contacts where updatedAt > :sql_last_value"
  }
}
output {
  elasticsearch {
    protocol => http
    index => "contacts"
    document_type => "contact"
    document_id => "%{id}"
    host => "ES_NODE_HOST"
  }
}
# "* * * * *" -> run every minute
# sql_last_value is a built in parameter whose value is set to Thursday, 1 January 1970,
# or 0 if use_column_value is true and tracking_column is set
You can download the mysql jar from maven here.
In case the indexes do not exist in ES when this script is executed, they will be created automatically, just like a normal POST call to Elasticsearch.
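A typical way to run it, assuming the file above is saved as contacts-index-logstash.conf in the current directory, and then to spot-check the result afterwards:
bin/logstash -f contacts-index-logstash.conf
# once a run has completed, inspect the index directly (replace ES_NODE_HOST as in the config)
curl -XGET 'http://ES_NODE_HOST:9200/contacts/_search?pretty'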
Finally I was able to find the answer. Sharing my findings.
To use Elasticsearch with MySQL you will require the Java Database Connection (JDBC) importer. With JDBC drivers you can sync your MySQL data into Elasticsearch.
I am using Ubuntu 14.04 LTS, and you will need to install Java 8 to run Elasticsearch, as it is written in Java.
Following are the steps to install Elasticsearch 2.2.0 and elasticsearch-jdbc 2.2.0; please note that both versions have to be the same.
After installing Java 8, install Elasticsearch 2.2.0 as follows:
# cd /opt
# wget https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.2.0/elasticsearch-2.2.0.deb
# sudo dpkg -i elasticsearch-2.2.0.deb
This installation procedure will install Elasticsearch in /usr/share/elasticsearch/, with its configuration files placed in /etc/elasticsearch.
Now let's do some basic configuration in the config file; here /etc/elasticsearch/elasticsearch.yml is our config file.
You can open the file for editing with:
nano /etc/elasticsearch/elasticsearch.yml
and change the cluster name and node name.
For example :
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: servercluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: vps.server.com
#
# Add custom attributes to the node:
#
# node.rack: r1
Now save the file and start Elasticsearch:
/etc/init.d/elasticsearch start
To test whether ES is installed, run the following:
curl -XGET 'http://localhost:9200/?pretty'
If you get the following, then your Elasticsearch is installed now :)
{
  "name" : "vps.server.com",
  "cluster_name" : "servercluster",
  "version" : {
    "number" : "2.2.0",
    "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
    "build_timestamp" : "2016-01-27T13:32:39Z",
    "build_snapshot" : false,
    "lucene_version" : "5.4.1"
  },
  "tagline" : "You Know, for Search"
}
Now let's install elasticsearch-JDBC
Download it from http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.3.1/elasticsearch-jdbc-2.3.3.1-dist.zip, extract it in /etc/elasticsearch/, and also create a "logs" folder there (the path of the logs should be /etc/elasticsearch/logs).
I have one database created in MySQL named "ElasticSearchDatabase", and inside it a table named "test" with the fields id, name and email.
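For reference, the test table could look roughly like this (the column types are assumptions; only the database, table and field names come from the description above):
CREATE DATABASE ElasticSearchDatabase;
USE ElasticSearchDatabase;
CREATE TABLE test (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255),
  email VARCHAR(255),
  PRIMARY KEY (id)
);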
cd /etc/elasticsearch
and run the following:
echo '{
  "type":"jdbc",
  "jdbc":{
    "url":"jdbc:mysql://localhost:3306/ElasticSearchDatabase",
    "user":"root",
    "password":"",
    "sql":"SELECT id as _id, id, name,email FROM test",
    "index":"users",
    "type":"users",
    "autocommit":"true",
    "metrics": {
      "enabled" : true
    },
    "elasticsearch" : {
      "cluster" : "servercluster",
      "host" : "localhost",
      "port" : 9300
    }
  }
}' | java -cp "/etc/elasticsearch/elasticsearch-jdbc-2.2.0.0/lib/*" -"Dlog4j.configurationFile=file:////etc/elasticsearch/elasticsearch-jdbc-2.2.0.0/bin/log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"
Now check whether the MySQL data was imported into ES:
curl -XGET http://localhost:9200/users/_search/?pretty
If all goes well, you will be able to see all your MySQL data in JSON format,
and if there is any error, you will be able to see it in the /etc/elasticsearch/logs/jdbc.log file.
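For example, you can follow that log while the importer runs:
tail -f /etc/elasticsearch/logs/jdbc.log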
Caution:
In older versions of ES the plugin elasticsearch-river-jdbc was used; it is completely deprecated in the latest versions, so do not use it.
I hope I could save you some time :)
Any further thoughts are appreciated
Reference url : https://github.com/jprante/elasticsearch-jdbc
The Logstash JDBC plugin will do the job:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => "factweavers"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/home/comp/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # our query
    schedule => "* * * * *"
    statement => "SELECT * FROM testtable where Date > :sql_last_value order by Date"
    use_column_value => true
    tracking_column => "Date"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "test-migrate"
    document_type => "data"
    document_id => "%{personid}"
  }
}
To make it simpler, I have created a PHP class to set up MySQL with Elasticsearch. Using my class you can sync your MySQL data into Elasticsearch and also perform full-text search. You just need to set your SQL query and the class will do the rest for you.