Logstash JDBC input plugin: migrate data from MySQL in batches

I have a 20 GB table with 50 million rows that I need to migrate to Elasticsearch using the Logstash JDBC input plugin. I have tried all the basic setups, but I need help migrating the data in batches, i.e. only 10,000 rows at a time. I am not sure how and where to specify this count, or how to update it the next time I run Logstash. Please help me solve this issue.
This is what I have:
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.12-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost/db"
    jdbc_validate_connection => true
    jdbc_user => "root"
    jdbc_password => "root"
    clean_run => true
    record_last_run => true
    use_column_value => true
    jdbc_paging_enabled => true
    jdbc_page_size => 5
    tracking_column => id
    statement => "select * from employee"
  }
}
Thanks in advance.

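You need to set jdbc_paging_enabled to true for pagination to work; jdbc_page_size then sets the number of rows fetched per page.
You also need to make sure that clean_run is set to false. With clean_run set to true, the state saved by the previous run (the last tracking_column value) is discarded on every start, so the next run cannot pick up where the last one stopped.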
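
A minimal sketch of how the batch size and the restart point could be configured, assuming id is an auto-incrementing numeric key (the question does not say so). The WHERE and ORDER BY clauses are additions to the original statement: :sql_last_value is filled in from the value saved by the previous run, and a stable order keeps the pages from overlapping. It is worth verifying the behaviour on a small table first.

```
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.12-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost/db"
    jdbc_user => "root"
    jdbc_password => "root"
    # Fetch the result set in pages of 10,000 rows
    jdbc_paging_enabled => true
    jdbc_page_size => 10000
    # Keep the state saved by the previous run
    clean_run => false
    record_last_run => true
    # Track the id column (assumed numeric) instead of the last run time
    use_column_value => true
    tracking_column => "id"
    # Only rows newer than the last saved id are selected
    statement => "SELECT * FROM employee WHERE id > :sql_last_value ORDER BY id"
  }
}
```

With this setup the page size only controls how many rows each underlying query returns; a single run still walks through all pages until the statement is exhausted.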

Related

Logstash setup to MySQL hosted on AWS RDS won't connect

I have a MySQL DB hosted on AWS RDS. I am running Elasticsearch locally and using Logstash to retrieve data from the MySQL server on AWS and push it to my Elasticsearch DB.
The problem, I guess, is that my Logstash file isn't set up correctly:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://aws.ffffffffff.us-east-1.rds.amazonaws.com:3306/dbName?user=userName&password=pword"
    jdbc_user => "user"
    jdbc_password => "pword"
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "C:\Program Files (x86)\MySQL\Connector J 8.0\mysql-connector-java-8.0.19.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * from data-5"
    type => "data-5"
    tags => ["data-5"]
  }
  jdbc {
    jdbc_connection_string => "jdbc:mysql://aws.ffffffffff.us-east-1.rds.amazonaws.com:3306/dbName?user=userName&password=pword"
    jdbc_user => "user"
    jdbc_password => "pword"
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "C:\Program Files (x86)\MySQL\Connector J 8.0\mysql-connector-java-8.0.19.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * from data-4"
    type => "data-4"
    tags => ["data-4"]
  }
  jdbc {
    jdbc_connection_string => "jdbc:mysql://aws.ffffffffff.us-east-1.rds.amazonaws.com:3306/dbName?user=userName&password=pword"
    jdbc_user => "user"
    jdbc_password => "pword"
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "C:\Program Files (x86)\MySQL\Connector J 8.0\mysql-connector-java-8.0.19.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * from data-3"
    type => "data-3"
    tags => ["data-3"]
  }
}
output {
  stdout { codec => json_lines }
  if "data-5" in [tags] {
    elasticsearch {
      hosts => ["http://127.0.0.1:9200/"]
      index => "data-5"
      document_type => "data-%{+YYYY.MM.dd}"
    }
  }
  if "data-4" in [tags] {
    elasticsearch {
      hosts => ["http://127.0.0.1:9200/"]
      index => "data-4"
      document_type => "data-%{+YYYY.MM.dd}"
    }
  }
  if "data-3" in [tags] {
    elasticsearch {
      hosts => ["http://127.0.0.1:9200/"]
      index => "data-3"
      document_type => "data-%{+YYYY.MM.dd}"
    }
  }
}
This is the fun part of programming right?
Anyway, locally I am on Windows, as you may be able to tell from the file path to the JDBC driver library. My JDBC connection string for AWS RDS is copied and pasted from the AWS Console, so no typos were involved.
I am told that I only need to prepend jdbc:mysql:// to the URL. But is there anything I'm missing in the AWS console? Do I need to modify my RDS instance?
The error, by the way, is:
Unable to connect to database. Tried 1 times
{:error_message=>"Java::ComMysqlCjJdbcExceptions::CommunicationsException:
Communications link failure\n\n
The last packet sent successfully to the server was 0 milliseconds ago.
The driver has not received any packets from the server."
I had a similar issue. SSL was not enabled for the connection, so AWS RDS would not accept it.
Adding the query parameter useSSL=false to the JDBC connection string solved the problem.
So, in your case, the jdbc_connection_string would be as follows:
"jdbc:mysql://aws.ffffffffff.us-east-1.rds.amazonaws.com:3306/dbName?useSSL=false&user=userName&password=pword"
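
For context, one of the three jdbc inputs with that connection string in place might look like the sketch below. One detail goes beyond the answer and is an assumption about what the question intended: the table name is wrapped in backticks, because MySQL otherwise reads data-5 as the expression "data minus 5" rather than as an identifier.

```
input {
  jdbc {
    # useSSL=false added as suggested in the answer
    jdbc_connection_string => "jdbc:mysql://aws.ffffffffff.us-east-1.rds.amazonaws.com:3306/dbName?useSSL=false&user=userName&password=pword"
    jdbc_user => "user"
    jdbc_password => "pword"
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "C:\Program Files (x86)\MySQL\Connector J 8.0\mysql-connector-java-8.0.19.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    # Backticks quote the hyphenated table name (assumption: data-5 is a table)
    statement => "SELECT * FROM `data-5`"
    type => "data-5"
    tags => ["data-5"]
  }
}
```

If the connection still fails with the same communications error, the RDS security group and the instance's public accessibility setting are the usual things to check next, since that error also appears when the host cannot be reached at all.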

How to correctly push data from Logstash to an Elasticsearch server?

I am new to ELK. I need to visualize data from PostgreSQL in Kibana. I ran into a small problem and need some help.
I use:
Elasticsearch 6.4.1
Kibana 6.4.1
Logstash 6.4.1
When I run the following logstash.conf file, it doesn't send the correct data to the Elasticsearch server. What do I need to change in my configuration file?
logstash.conf:
input
{
  jbdc_connection_string => "path_to_database"
  jdbc_user => "postgres"
  jdbc_password => "postgres"
  jdbc_driver_library => "/path_to/postgresql-42.2.5.jar"
  jdbc_driver_class => "org.postgresql.Driver"
  statement => "SELECT * from documents"
}
output
{
  elasticsearch
  {
    hosts => ["localhost:9200"]
    index => "documents"
  }
}
Only when I use the following configuration in the output do I see the correct data in the terminal:
stdout
{
  codec => json_lines
}

Logstash: not able to connect to MySQL

I am trying to connect to MySQL using Logstash and write into Elasticsearch. Below is the code in my conf file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://192.168.2.24:3306/test"
    # The user we wish to execute our statement as
    jdbc_user => "uname"
    jdbc_password => "pass"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/usr/local/Cellar/logstash/6.2.4/mysql-connector-java-8.0.11.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # our query
    statement => "SELECT * FROM report_table"
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    index => "mysqlsample"
    document_type => "record"
  }
}
Running the above gives the following error:
Error: com.mysql.jdbc.Driver not loaded. Are you sure you've included the correct jdbc driver in :jdbc_driver_library?
Exception: LogStash::ConfigurationError
Stack:
/usr/local/Cellar/logstash/6.2.4/libexec/vendor/bundle/jruby/2.3.0/gems/logstash-input-jdbc-4.3.9/lib/logstash/plugin_mixins/jdbc.rb:162:in `open_jdbc_connection'
/usr/local/Cellar/logstash/6.2.4/libexec/vendor/bundle/jruby/2.3.0/gems/logstash-input-jdbc-4.3.9/lib/logstash/plugin_mixins/jdbc.rb:220:in `execute_statement'
/usr/local/Cellar/logstash/6.2.4/libexec/vendor/bundle/jruby/2.3.0/gems/logstash-input-jdbc-4.3.9/lib/logstash/inputs/jdbc.rb:264:in `execute_query'
/usr/local/Cellar/logstash/6.2.4/libexec/vendor/bundle/jruby/2.3.0/gems/logstash-input-jdbc-4.3.9/lib/logstash/inputs/jdbc.rb:250:in `run'
/usr/local/Cellar/logstash/6.2.4/libexec/logstash-core/lib/logstash/pipeline.rb:514:in `inputworker'
/usr/local/Cellar/logstash/6.2.4/libexec/logstash-core/lib/logstash/pipeline.rb:507:in `block in start_input'
Sounds like it's an issue with jdbc_driver_library => "/usr/local/Cellar/logstash/6.2.4/mysql-connector-java-8.0.11.jar".
Are you sure it's a valid path? Is that the correct connector? Maybe try to use the one that the documentation mentions: mysql-connector-java-5.1.36-bin.jar
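
If you would rather keep the 8.x connector, a sketch of the input under two assumptions: the jar really exists at the path shown and is readable by the Logstash process, and the driver class is switched to the name Connector/J 8.x uses, com.mysql.cj.jdbc.Driver. Whether the class name is the cause here is not confirmed by the error message alone; a wrong path produces the same "not loaded" error.

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://192.168.2.24:3306/test"
    jdbc_user => "uname"
    jdbc_password => "pass"
    # Assumption: the jar is present at exactly this path
    jdbc_driver_library => "/usr/local/Cellar/logstash/6.2.4/mysql-connector-java-8.0.11.jar"
    # Driver class name for Connector/J 8.x
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * FROM report_table"
  }
}
```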

How to automate updating data from MySQL to Logstash

I am currently working on the Elastic Stack with MySQL. Everything works fine: data in the MySQL database is available in Elasticsearch through Logstash. But when new data is entered in the MySQL DB, I need to restart Logstash, or it can be done using schedule in the Logstash config file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "ankit"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/home/ankit/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run logstash at an interval of one minute
    #schedule => "* * * * * *"
    # our query
    statement => "SELECT * FROM ghijkl"
  }
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "ankit"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/home/ankit/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run logstash at an interval of one minute
    #schedule => "* * * * * *"
    # our query
    statement => "SELECT * FROM abcdef"
  }
}
But this isn't a good approach. I am thinking of using webhooks, but I could find no resources on how to do so. I tried the Logstash HTTP input plugin from the documentation page, but it did not help.
Please help.
You can fetch only recent data, say every 15 minutes, using a query like this:
SELECT * FROM ghijkl WHERE EVENT_TIME_OCCURRENCE_FIELD > :sql_last_value
The timestamp of the most recent record is inserted in place of :sql_last_value. When the query runs for the first time, :sql_last_value is set to 01.01.1970.
Required configuration for Logstash:
schedule => "*/15 * * * *"
use_column_value => true
tracking_column => 'EVENT_TIME_OCCURRENCE_FIELD'
For every input you should also specify the last_run_metadata_path parameter. Otherwise, when you have many inputs and some of them use the same table but different schemas, they can overwrite each other's metadata and produce unexpected results.
last_run_metadata_path => "PATH_TO_FILE_FOR_META_DATA"
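
Put together, one of the two inputs from the question might look like the sketch below. The column name event_time and the metadata file path are hypothetical placeholders. Two settings are additions to the answer: tracking_column_type, which defaults to "numeric" and so has to be set to "timestamp" for a date column, and the ORDER BY clause, a common precaution so that the last row processed carries the highest tracked value.

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => "ankit"
    jdbc_driver_library => "/home/ankit/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Run every 15 minutes (five-field cron syntax)
    schedule => "*/15 * * * *"
    # event_time is a hypothetical timestamp column
    statement => "SELECT * FROM ghijkl WHERE event_time > :sql_last_value ORDER BY event_time"
    use_column_value => true
    tracking_column => "event_time"
    # Needed because the tracked column holds timestamps, not numbers
    tracking_column_type => "timestamp"
    # Hypothetical path; use a distinct file for each jdbc input
    last_run_metadata_path => "/home/ankit/.logstash_jdbc_last_run_ghijkl"
  }
}
```

The second input for abcdef would be the same apart from the statement and its own last_run_metadata_path.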

Using Logstash to sync data

I'm trying to use Logstash to sync all the data on my MySQL server to my Elasticsearch server.
I've already learned the basics of logstash.conf. This is my file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost/homestead"
    jdbc_user => "homestead"
    jdbc_password => "secret"
    jdbc_driver_library => "/home/vagrant/Code/mysql-connector-java-5.1.38-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * from volunteer"
  }
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost/homestead"
    jdbc_user => "homestead"
    jdbc_password => "secret"
    jdbc_driver_library => "/home/vagrant/Code/mysql-connector-java-5.1.38-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * from contact"
  }
}
output {
  elasticsearch {
    document_id => "%{uid}"
    hosts => "localhost"
  }
}
My intention is to copy every table into a Type. How do I specify this?
edit: "type" instead of "index"
Thank you!
What you can do is simply add a field (using add_field) in each input, giving the type name you want the data indexed under, and then use that field as the type name in the elasticsearch output.
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost/homestead"
    jdbc_user => "homestead"
    jdbc_password => "secret"
    jdbc_driver_library => "/home/vagrant/Code/mysql-connector-java-5.1.38-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * from volunteer"
    add_field => {"type" => "volunteer"}
  }
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost/homestead"
    jdbc_user => "homestead"
    jdbc_password => "secret"
    jdbc_driver_library => "/home/vagrant/Code/mysql-connector-java-5.1.38-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * from contact"
    add_field => {"type" => "contact"}
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "homestead"
    document_type => "%{type}"   # <--- specify the type here
    document_id => "%{uid}"
  }
}
Be aware though that using the same index to host several different mapping types might lead to type conflicts. The short story is that two different fields with the same name in two different types MUST ALWAYS have the same type definition. Read more about it in this blog article
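
One way to sidestep such conflicts is to give each table its own index instead of its own mapping type. The sketch below assumes each jdbc input sets the type field with add_field as in the configuration above; the homestead- index prefix is an arbitrary choice for illustration. This layout also matters for upgrades: Elasticsearch 6.x allows only one mapping type per newly created index, and later versions drop mapping types altogether.

```
output {
  elasticsearch {
    hosts => ["localhost"]
    # One index per source table, e.g. homestead-volunteer and homestead-contact
    index => "homestead-%{type}"
    document_id => "%{uid}"
  }
}
```

Because document_id is taken from uid, re-running the pipeline overwrites existing documents in each index rather than duplicating them, provided uid is unique within each table.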