logstash-input-jdbc how to use utf-8 chars in statement - mysql

I use logstash-input-jdbc to sync my database to Elasticsearch.
Environment: Logstash 7.5, Elasticsearch 7.5, mysql-connector-java-5.1.48.jar, logstash-input-jdbc 4.3.16
materials.conf:
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/sc_education"
jdbc_driver_library => "connector/mysql-connector-java-5.1.48.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_user => "dauser"
jdbc_password => "daname"
jdbc_paging_enabled => "true"
jdbc_page_size => "50"
statement_filepath => "./materials.sql"
schedule => "* * * * *"
last_run_metadata_path => "./materials.info"
record_last_run => true
use_column_value => true
tracking_column => updated_at
tracking_column_type => "timestamp"
codec => plain { charset => "UTF-8"}
# parameters => { "favorite_artist" => "Beethoven" }
# statement => "SELECT * from songs where artist = :favorite_artist"
}
}
filter {
json {
source => "message"
remove_field => ["message"]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "materials"
document_id => "%{material_id}"
}
stdout {
codec => json_lines
}
}
materials.sql:
SELECT material_name,material_id,
CASE
WHEN grade_id = 1 THEN "一年级"
WHEN grade_id = 2 THEN "二年级"
WHEN grade_id = 3 THEN "三年级"
WHEN grade_id = 4 THEN "四年级"
WHEN grade_id = 5 THEN "五年级"
WHEN grade_id = 6 THEN "六年级"
WHEN grade_id = 7 THEN "初一"
WHEN grade_id = 8 THEN "初二"
WHEN grade_id = 9 THEN "初三"
WHEN grade_id = 10 THEN "高一"
WHEN grade_id = 11 THEN "高二"
WHEN grade_id = 12 THEN "高三"
ELSE "" END as grade,
CASE
WHEN subject_id = 1 THEN "数学"
WHEN subject_id = 2 THEN "物理"
WHEN subject_id = 3 THEN "化学"
WHEN subject_id = 4 THEN "语文"
WHEN subject_id = 5 THEN "英语"
WHEN subject_id = 6 THEN "科学"
WHEN subject_id = 7 THEN "音乐"
WHEN subject_id = 8 THEN "绘画"
WHEN subject_id = 9 THEN "政治"
WHEN subject_id = 10 THEN "历史"
WHEN subject_id = 11 THEN "地理"
WHEN subject_id = 12 THEN "生物"
WHEN subject_id = 13 THEN "奥数"
ELSE "" END as subject,
CASE
WHEN course_term_id = 1 THEN "春"
WHEN course_term_id = 2 THEN "暑"
WHEN course_term_id = 3 THEN "秋"
WHEN course_term_id = 4 THEN "寒"
ELSE "" END as season,
created_at, updated_at from sc_materials where updated_at > :sql_last_value and material_id in (2025,317,2050);
./bin/logstash -f materials.conf
{"@version":"1","updated_at":"2019-08-19T02:04:54.000Z","season":"?","grade":"","created_at":"2019-08-19T02:04:54.000Z","@timestamp":"2019-12-13T01:02:01.907Z","material_name":"test material seri''al","material_id":2025,"subject":"??"}
{"@version":"1","updated_at":"2019-08-26T09:25:35.000Z","season":"","grade":"","created_at":"2019-08-26T09:25:35.000Z","@timestamp":"2019-12-13T01:02:01.908Z","material_name":"人教版高中英语必修三第10讲Unit5 Canada The True North语法篇A学生版2.pdf","material_id":2050,"subject":""}
{"@version":"1","updated_at":"2019-08-10T06:50:48.000Z","season":"?","grade":"","created_at":"2019-05-27T06:26:44.000Z","@timestamp":"2019-12-13T01:02:01.880Z","material_name":"90aca2238832143fb75dcf0fe6dbbfa9.pdf","material_id":317,"subject":""}
The Chinese characters stored in the database come through fine, but the Chinese string literals in the statement come out as "?" characters.
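A related pitfall with CASE expressions like these, separate from the encoding problem: in the simple form CASE grade_id WHEN v THEN ..., each v is compared for equality with grade_id, so writing WHEN grade_id = 2 there compares grade_id with the 0/1 result of grade_id = 2. The searched form CASE WHEN grade_id = 2 THEN ... (or CASE grade_id WHEN 2 THEN ...) avoids that. The plain-Ruby sketch below models MySQL's simple-CASE evaluation to show which ids ever match; the method names and the trimmed label table are illustrative only.

```ruby
# Illustrative model of MySQL's simple CASE:
#   CASE subject WHEN v1 THEN r1 WHEN v2 THEN r2 ... ELSE default END
# Each WHEN value is compared with the subject using equality.
def simple_case(subject, whens, default)
  whens.each { |value, result| return result if subject == value }
  default
end

# MySQL evaluates a comparison such as `grade_id = 2` to 1 (true) or 0 (false).
def mysql_eq(a, b)
  a == b ? 1 : 0
end

LABELS = { 1 => "一年级", 2 => "二年级", 3 => "三年级" } # trimmed for brevity

# Comparison form: CASE grade_id WHEN grade_id = 1 THEN ... WHEN grade_id = 2 THEN ...
def buggy_grade(grade_id)
  whens = LABELS.map { |id, label| [mysql_eq(grade_id, id), label] }
  simple_case(grade_id, whens, "")
end

# Value form: CASE grade_id WHEN 1 THEN ... WHEN 2 THEN ...
def fixed_grade(grade_id)
  simple_case(grade_id, LABELS.to_a, "")
end

(0..3).each do |id|
  puts "#{id}: buggy=#{buggy_grade(id).inspect} fixed=#{fixed_grade(id).inspect}"
end
```

With the comparison form only grade_id 1 (and, oddly, 0) ever produce a label; every other id falls through to the ELSE branch.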

For me, characterEncoding=utf8 alone did not work.
After adding this, it worked:
stdin {
codec => plain { charset => "UTF-8"}
}
This answer comes a bit late, but I hope it helps someone. Here is my working config file:
input {
jdbc {
jdbc_connection_string => "jdbc:postgresql://localhost:5432/atlasdb?useTimezone=true&useLegacyDatetimeCode=false&serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=utf8"
jdbc_user => "atlas"
jdbc_password => "atlas"
jdbc_validate_connection => true
jdbc_driver_library => "/lib/postgres-42-test.jar"
jdbc_driver_class => "org.postgresql.Driver"
schedule => "* * * * *"
statement => "SELECT * from naver_city"
}
stdin {
codec => plain { charset => "UTF-8"}
}
}
output {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "2020-04-23-2"
doc_as_upsert => true
action => "update"
document_id => "%{code}"
}
stdout { codec => rubydebug }
}

I ran into this problem with a query containing Japanese characters.
You can add useUnicode=true&characterEncoding=utf8 to jdbc_connection_string in materials.conf:
jdbc_connection_string => "jdbc:mysql://localhost:3306/sc_education?useSSL=false&useUnicode=true&characterEncoding=utf8"
Then restart Logstash.
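If the statement text passes through a charset that has no code point for these characters, many converters substitute a question mark, which matches the "?" and "??" values in the question's output. The plain-Ruby sketch below reproduces that effect, assuming (this is not confirmed by the posts here) a single-byte target charset such as ISO-8859-1; the helper name lossy_transcode is hypothetical.

```ruby
# Simulate a lossy conversion: characters with no mapping in the target charset
# are replaced. For non-Unicode targets, Ruby's default replacement is "?".
def lossy_transcode(text, target = Encoding::ISO_8859_1)
  text.encode(target, undef: :replace, invalid: :replace).encode(Encoding::UTF_8)
end

puts lossy_transcode("春")            # the season literal
puts lossy_transcode("数学")          # the subject literal
puts lossy_transcode("test material") # plain ASCII survives unchanged
```

Encoding the same strings to UTF-8 loses nothing, which is what the characterEncoding=utf8 connection parameter is meant to ensure on the MySQL Connector/J side.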

Related

Store the output of results.each into array in ruby on rails

Code:
require 'mysql2'

db = Mysql2::Client.new(:host => 'localhost', :username => 'username',
                        :password => 'password', :database => 'database')
results = db.query("select * from users where exported is not TRUE OR
NULL").each(:as => :array)
results.each { |row| puts row[1] }
The results.each line outputs company data, and I want to use each row as input to an API call. Any ideas on how to do this? Each row should populate the attributes like below.
"requested_item_value_attributes" => {
"employee_first_name_6000555821" => 'results.each { | row | puts row[0]}',
"employee_last_name_6000555821" => "results.each { | row | puts row[1]}",
"hiring_manager_6000555821" => "results.each { | row | puts row[2]}",
"job_title" => "results.each { | row | puts row[3]}",
"start_date" => "#results.each { | row | puts row[4]}"
}
You can use
nameArray = Array.new
nameArray.push(nameToSave)
to add the variable nameToSave to the end of the array nameArray.
Just call push for each of your results and you have an array with all your names from your query.
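A minimal sketch of that loop. A plain array of arrays stands in for the mysql2 result here (the sample rows are hypothetical; real rows would come from db.query(...) with :as => :array), and column index 1 is taken from the question's puts row[1].

```ruby
# Stand-in for the mysql2 result rows; hypothetical sample data.
results = [
  [1, "Alice"],
  [2, "Bob"],
  [3, "Carol"]
]

name_array = Array.new
results.each do |row|
  name_to_save = row[1] # second column, as in `puts row[1]`
  name_array.push(name_to_save)
end

p name_array
```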
Use Array#map to map the results to an array:
results.map do |row|
  {
    "requested_item_value_attributes" => {
      "employee_first_name_6000555821" => row[0],
      "employee_last_name_6000555821" => row[1],
      "hiring_manager_6000555821" => row[2],
      "job_title" => row[3],
      "start_date" => row[4]
    }
  }
end
or, even better:
results.map do |row|
  {
    "requested_item_value_attributes" =>
      %w[
        employee_first_name_6000555821
        employee_last_name_6000555821
        hiring_manager_6000555821
        job_title
        start_date
      ].zip(row.take(5)).to_h
  }
end
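A quick way to check the zip-based mapping is to run it on a couple of hypothetical rows. Note that %w[] splits on whitespace only, so commas between the names would become part of the keys. In this sketch the constant name ATTRIBUTE_NAMES and the sample data are assumptions; the extra sixth column stands in for any column beyond the five attributes, which row.take(5) drops.

```ruby
# Attribute names from the question; their order must match the column order.
ATTRIBUTE_NAMES = %w[
  employee_first_name_6000555821
  employee_last_name_6000555821
  hiring_manager_6000555821
  job_title
  start_date
].freeze

# Hypothetical rows as returned with :as => :array; the sixth value is an
# extra column that take(5) discards.
results = [
  ["Ann", "Lee", "Sam Park", "Engineer", "2020-01-06", 0],
  ["Raj", "Das", "Mia Wong", "Analyst", "2020-02-03", 0]
]

payloads = results.map do |row|
  { "requested_item_value_attributes" => ATTRIBUTE_NAMES.zip(row.take(5)).to_h }
end

payloads.each { |payload| p payload }
```

Each element of payloads is one hash that could be passed to the API call; the API client itself is not shown.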
Use the query method's second argument:
results = []
db.query('SELECT * FROM table', results)

Kartik-Widget GridView Filter TYPEAHEAD inactive

The following code throws no error, but it is completely inactive, or rather redundant: jQuery filters nothing! Any ideas on how to fix this?
Here is the code:
[
'attribute' => 'name',
'label' => 'Land',
'value' => function($model) {
if ($model->name) {
return $model->name;
} else {
return NULL;
}
},
'filterType' => GridView::FILTER_TYPEAHEAD,
'filterWidgetOptions' => [
'pluginOptions' => ['highlight' => true],
//'dataset' => [['local' => array_values(\app\models\Country::find()->orderBy('name')->asArray()->one())]
'dataset' => [['local' => array_values(ArrayHelper::map(\app\models\Country::find()->all(), 'id', 'name'))]
]],
'filterInputOptions' => ['placeholder' => 'JQuery will filter...'],
'format' => 'raw'
],
Here is the var_dump of
$ausgabe_ = array(array_values(ArrayHelper::map(\app\models\Country::find()->all(), 'id', 'name')));
E:\xampp\htdocs\Yii-WSL\views\country\index.php:145:
array (size=1)
0 =>
array (size=25)
0 => string 'Arabische Emirate' (length=17)
1 => string 'Algerien' (length=8)
2 => string 'Australia' (length=9)
3 => string 'Belgien' (length=7)
4 => string 'Brasilien' (length=9)
5 => string 'Canada' (length=6)
6 => string 'Schweiz' (length=7)
7 => string 'China' (length=5)
8 => string 'Zypern' (length=6)
9 => string 'Germany' (length=7)
10 => string 'Westsahara' (length=10)
11 => string 'France' (length=6)
12 => string 'United Kingdom' (length=14)
13 => string 'Ungarn' (length=6)
14 => string 'India' (length=5)
15 => string 'Laos' (length=4)
16 => string 'Russia' (length=6)
17 => string 'Sudan' (length=5)
18 => string 'Turkmenistan' (length=12)
19 => string 'Ukraine' (length=7)
20 => string 'Uganda' (length=6)
21 => string 'United States' (length=13)
22 => string 'Vatikanstadt' (length=12)
23 => string 'Vietnam' (length=7)
24 => string 'Südafrika' (length=10)
Any further ideas on how to fix this?
P.S.: If I try it like this:
'dataset' => [
['local' => array_values([ArrayHelper::map(\app\models\Country::find()->orderBy('name')->asArray()->all(), 'id', 'name ')])],
]
then the result of
$ausgabe_ = array(array_values(ArrayHelper::map(Country::find()->orderBy('name')->asArray()->all(), 'id', 'name ')));
var_dump($ausgabe_);
is:
E:\xampp\htdocs\Yii-WSL\views\country\index.php:146:
array (size=1)
0 =>
array (size=25)
0 => null
1 => null
2 => null
3 => null
4 => null
5 => null
6 => null
7 => null
8 => null
9 => null
10 => null
11 => null
12 => null
13 => null
14 => null
15 => null
16 => null
17 => null
18 => null
19 => null
20 => null
21 => null
22 => null
23 => null
24 => null
Nothing I try fixes this problem. Any further ideas?
It is worth noting that the array must have integer indexes with no missing numbers (for example, 0, 1, 2, 3 is fine, but 0, 1, 3, 4 is not, because index 2 is missing).
The only valid structure, as an example:
array(4) {
[0]=>
string(5) "Alpha"
[1]=>
string(4) "Beta"
[2]=>
string(5) "Gamma"
[3]=>
string(5) "Delta"
}
Your array is not valid:
The first (and only) element, at index 0, contains another array. It cannot be like that;
That larger array has mixed indexes (see the requirement in the first paragraph);
Null values are not accepted.
What might solve your problem is to call array_values on the ArrayHelper::map result directly (not wrapped in another array) and to use 'name' without the trailing space:
'dataset' => [
['local' => array_values(ArrayHelper::map(\app\models\Country::find()->orderBy('name')->asArray()->all(), 'id', 'name'))],
],
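The validity rules listed in the answer can be written as a small check. This sketch models a PHP array as a Ruby Hash from key to value (an assumption made only for illustration; the method name valid_local_dataset? is hypothetical) and accepts only keys 0, 1, 2, ... in order with string values as in the example, so gaps, nested arrays and nulls are all rejected.

```ruby
# True when `arr` (a Hash standing in for a PHP array) is a flat list:
# keys 0, 1, 2, ... in order with no gaps, and every value a String
# (no nested arrays, no nils).
def valid_local_dataset?(arr)
  keys_ok = arr.keys == (0...arr.size).to_a
  values_ok = arr.values.all? { |v| v.is_a?(String) }
  keys_ok && values_ok
end

valid  = { 0 => "Alpha", 1 => "Beta", 2 => "Gamma", 3 => "Delta" }
gap    = { 0 => "Alpha", 1 => "Beta", 3 => "Gamma", 4 => "Delta" }
nested = { 0 => { 0 => "Arabische Emirate", 1 => "Algerien" } }
nils   = { 0 => nil, 1 => nil }

[valid, gap, nested, nils].each { |a| puts valid_local_dataset?(a) }
```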

How sync data from mysql database to elasticsearch with logstash: only indexing new data and index by country?

Can I specify the ID from which to start synchronizing, so that not all data is indexed again? And can I specify different indexes in different scenarios, for example, one index per country?
This is my Logstash config:
# file: contacts-index-logstash.conf
input {
jdbc {
jdbc_driver_library => "/home/peter/Downloads/mysql-connector-java-5.1.40-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost/MYJOBS"
jdbc_user => "readuser"
jdbc_password => "xxxx"
# schedule => "* * * * *"
statement => "SELECT af.IdAnuncio as idanuncio, af.Titulo, af.Descripcion, af.Empresa, p.Abreviatura,
pr.Nombre as Provincia, cd.Nombre as Ciudad, af.Localidad
FROM `ANUNCIO_FORM` af
INNER JOIN PAIS p ON p.Id = IdPais
INNER JOIN PROVINCIA pr ON pr.Id = af.IdProvincia
INNER JOIN CIUDAD cd ON cd.Id = af.IdCiudad
WHERE af.IdAnuncio >0
AND af.Fecha_de_publicacion > '2016-12-01'
AND af.Estado =1"
}
}
output {
stdout { codec => json_lines }
elasticsearch {
index => "anuncios"
document_type => "internos"
document_id => "%{idanuncio}"
hosts => "localhost:9200"
}
}
Thanks in advance.
P.S. English is not my first language, so please excuse any mistakes.

How to index nested mysql object into elasticsearch using logstash?

I'm trying to index a MySQL database into Elasticsearch. Consider this example mapping:
{"blog":
{"properties":
{"id": "string"}
{"author": "string"}
{"time_created": }
{"author_info":
{"author_name":}
{"author_sex":}
}
{"posts":
{"post_author":}
{"post_time":}
}
}
}
I have three tables: author_info, blog and post. How can I index these records into Elasticsearch with a nested structure? I cannot find any documentation about it. Thanks.
input {
jdbc{
jdbc_validate_connection => true
jdbc_connection_string => "jdbc:mysql://172.17.0.2:3306/_db"
jdbc_user => "root"
jdbc_password => "admin"
jdbc_driver_library => "/home/ilsa/mysql-connector-java-5.1.36-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
clean_run => true
statement => "SELECT
u.id as employee_number, u.email as email, u.username as username,
up.id as post_id, up.text_content as content,
pc.id as comment_id , pc.user_post_id as comment_post_id, pc.comment as comment_text
FROM users u join user_posts up on up.user_id = u.id
LEFT JOIN post_comments pc ON pc.user_post_id = up.id
ORDER BY u.id ASC, up.id ASC"
}
}
filter {
aggregate {
task_id => "%{employee_number}"
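code => "
  map['employee_number'] = event.get('employee_number')
  map['email'] = event.get('email')
  map['username'] = event.get('username')
  map['posts'] ||= []
  # one SQL row per comment: reuse the post entry if an earlier row created it
  post = map['posts'].find { |p| p['post_id'] == event.get('post_id') }
  if post.nil?
    post = {
      'post_id' => event.get('post_id'),
      'content' => event.get('content'),
      'comments' => []
    }
    map['posts'] << post
  end
  # the LEFT JOIN yields NULL comment columns for posts without comments
  unless event.get('comment_id').nil?
    post['comments'] << {
      'comment_id' => event.get('comment_id'),
      'comment_post_id' => event.get('comment_post_id'),
      'comment_text' => event.get('comment_text')
    }
  end
  event.cancel()"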
push_previous_map_as_event => true
timeout => 30
}
}
output {
stdout{ codec => rubydebug }
elasticsearch{
action => "index"
index => "_dev"
document_type => "_doc"
document_id => "%{employee_number}"
hosts => "localhost:9200"
}
}
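To see how flat JOIN rows fold into one nested document per user, here is a plain-Ruby simulation of the idea outside Logstash (the rows and the nest_rows helper are hypothetical; in Logstash the equivalent logic lives in the aggregate filter's code string). This version reuses a post entry when several rows, one per comment, share a post_id, and skips the NULL comment columns that the LEFT JOIN produces for posts without comments.

```ruby
require "pp"

# Hypothetical rows from the SQL statement above: one row per post/comment pair.
rows = [
  { "employee_number" => 1, "email" => "a@example.com", "username" => "ann",
    "post_id" => 10, "content" => "first post",
    "comment_id" => 100, "comment_post_id" => 10, "comment_text" => "nice" },
  { "employee_number" => 1, "email" => "a@example.com", "username" => "ann",
    "post_id" => 10, "content" => "first post",
    "comment_id" => 101, "comment_post_id" => 10, "comment_text" => "agreed" },
  { "employee_number" => 1, "email" => "a@example.com", "username" => "ann",
    "post_id" => 11, "content" => "second post",
    "comment_id" => nil, "comment_post_id" => nil, "comment_text" => nil }
]

# Fold the flat rows into one nested document per employee_number.
def nest_rows(rows)
  docs = {}
  rows.each do |row|
    doc = docs[row["employee_number"]] ||= {
      "employee_number" => row["employee_number"],
      "email" => row["email"],
      "username" => row["username"],
      "posts" => []
    }
    # Several rows (one per comment) can share a post_id: reuse that post entry.
    post = doc["posts"].find { |existing| existing["post_id"] == row["post_id"] }
    unless post
      post = { "post_id" => row["post_id"], "content" => row["content"], "comments" => [] }
      doc["posts"] << post
    end
    # The LEFT JOIN yields NULL comment columns for a post without comments.
    next if row["comment_id"].nil?
    post["comments"] << {
      "comment_id" => row["comment_id"],
      "comment_post_id" => row["comment_post_id"],
      "comment_text" => row["comment_text"]
    }
  end
  docs.values
end

pp nest_rows(rows)
```

This simulation groups through a Hash, so row order does not matter here. In Logstash, the aggregate filter with push_previous_map_as_event => true expects all events of one task_id to arrive consecutively (so the SQL should be ordered by the user id first), and it needs a single pipeline worker.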
In the SQL part of the Logstash input, you might try selecting the fields with the nested names you want in Elasticsearch. Below is a small sample of how it might look (dotted aliases must be quoted with backticks in MySQL):
input {
jdbc {
statement => "SELECT id AS `blog.properties.id`, author AS `blog.properties.author`,..... from blog inner join properties inner join posts"
}
}
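Logstash keeps a dotted alias such as blog.properties.id as a single field name; Elasticsearch then maps dots in field names as a path of object fields, which is what gives the nested shape. A plain-Ruby sketch of that expansion (the expand_dotted helper is hypothetical and only illustrates the idea):

```ruby
# Turn {"blog.properties.id" => 1} into {"blog" => {"properties" => {"id" => 1}}}.
def expand_dotted(flat)
  flat.each_with_object({}) do |(key, value), nested|
    *parents, leaf = key.split(".")
    # Walk (and create) one Hash level per dotted segment before the leaf.
    target = parents.reduce(nested) { |node, part| node[part] ||= {} }
    target[leaf] = value
  end
end

row = { "blog.properties.id" => 7, "blog.properties.author" => "ann" }
p expand_dotted(row)
```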

Select records as column (sub array) from joins or sub query

I have these tables:
companies (id)
purchase_invoice (id,company_id,date)
items(id,company_id,purchase_invoice_id)
gifted_items(id,company_id,purchase_invoice_id)
rebated_items(id,company_id,purchase_invoice_id)
return_items(id,company_id,purchase_invoice_id)
I need to query all purchase invoices of a company, together with their associated records in the other tables, where company_id equals the company the user selected.
I need the results of all these sub-tables as sub-columns (I don't know if I am phrasing this correctly).
Note: company_id and purchase_invoice_id are foreign keys in all the other tables.
This is what I am trying:
$ledgers = array(
'com' => $data['company_id'],
'sd' => $data['start_date'],
'ed' => $data['end_date']
);
$purchaseObj = new Application_Model_DbTable_Ledgers();
$purchaseRes = $purchaseObj->getPurchaseInvoices($ledgers);
public function getPurchaseInvoices($ledgers) {
$sql = $this->select()
->setIntegrityCheck(false)
->from(array('pi' => $this->_name))
->join(array('c' => 'companies'), 'pi.company_id = c.id', array('c.name as companyName'))
->join(array('it' => 'items'), 'it.purchase_invoice_id = pi.id');
// ->join(array('prm' => 'rebated_purchase_mobiles'), 'prm.company_id = pi.company_id')
// ->join(array('rpm' => 'returned_purchase_mobiles'), 'rpm.company_id = pi.company_id')
// ->join(array('pgm' => 'purchase_gifted_mobiles'), 'pgm.company_id = pi.company_id');
$sql = $sql->where('pi.company_id = ?', $ledgers['com']);
if (count($ledgers)) {
if (Performance_Engine::notEmpty($ledgers['sd']) && $ledgers['sd'] != 'sd') {
$dt = new DateTime($ledgers['sd']);
$sdate = $dt->format('Y-m-d');
$sql = $sql->where('CAST(pi.date AS DATE) >= ?', $sdate);
}
if (Performance_Engine::notEmpty($ledgers['ed']) && $ledgers['ed'] != 'ed') {
$dt = new DateTime($ledgers['ed']);
$edate = $dt->format('Y-m-d');
$sql = $sql->where('CAST(pi.date AS DATE) <= ?', $edate);
}
$sql->group('pi.id');
$sql->order('pi.id ASC');
return $sql->query()->fetchAll();
}
}
I need a result like the array below:
Array (
[0] => Array
(
[id] => 102
[company_id] => 8
[grand_total_price] => 45000
[disc_by_rupees] => 111
[disc_by_percent] => 0
[total_price] => 44889
[pay_amount] => 15000
[total_paid_amount] => 15000
[remaining_amount] => 29889
[bill_no] => Q13
[date] => 2015-01-15
[companyName] => QMobile
[items]=> array(
[0] => array(
'id' => 1
'company_id'=> 8
'purchase_invoice_id '=> 102
'model_id' => 1
'date' => 2015-12-1
'IMEI'=>7654367876
'color' => black
'purchase_price' => 500
'is_sold' => 1
'is_returned'=>1
)
[1] => array(
'id' => 2
'company_id'=> 8
'purchase_invoice_id '=> 102
'model_id' => 3
'date' => 2015-12-1
'IMEI'=>34567890
'color' => white
'purchase_price' => 6500
'is_sold' => 1
'is_returned'=>1
)
)
[gifted_items]=> array(
[0] => array(
'id' => 1
'company_id'=> 8
'purchase_invoice_id '=> 102
)
[1] => array(
'id' => 2
'company_id'=> 8
'purchase_invoice_id '=> 102
)
)
[rebated_items]=> array(
[0] => array(
'id' => 1
'company_id'=> 8
'purchase_invoice_id '=> 102
)
[1] => array(
'id' => 2
'company_id'=> 8
'purchase_invoice_id '=> 102
)
)
[return_items]=> array(
[0] => array(
'id' => 1
'company_id'=> 8
'purchase_invoice_id '=> 102
)
[1] => array(
'id' => 2
'company_id'=> 8
'purchase_invoice_id '=> 102
)
)
)
I hope this is understandable. Please correct me on what I need to do.
All I need is each purchase invoice, with every associated table as a column that contains an array of the selected records.
That is not possible in a query.
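One way around this (not spelled out in the answer) is to fetch the invoices and each child table separately and nest the child rows in application code. The sketch below does that nesting in plain Ruby over hypothetical, already-fetched rows; the table names come from the question, while the attach_children helper and the data are made up for illustration.

```ruby
require "pp"

# Hypothetical rows, as if fetched with one query per table for company_id = 8.
invoices = [
  { "id" => 102, "company_id" => 8, "bill_no" => "Q13" },
  { "id" => 103, "company_id" => 8, "bill_no" => "Q14" }
]
children = {
  "items" => [
    { "id" => 1, "company_id" => 8, "purchase_invoice_id" => 102 },
    { "id" => 2, "company_id" => 8, "purchase_invoice_id" => 102 }
  ],
  "gifted_items" => [
    { "id" => 1, "company_id" => 8, "purchase_invoice_id" => 103 }
  ],
  "rebated_items" => [],
  "return_items" => []
}

# Attach each child table to its invoice as an array-valued "column".
def attach_children(invoices, children)
  # table name => { purchase_invoice_id => [rows] }
  grouped = children.transform_values { |rows| rows.group_by { |r| r["purchase_invoice_id"] } }
  invoices.map do |invoice|
    nested = grouped.transform_values { |by_invoice| by_invoice.fetch(invoice["id"], []) }
    invoice.merge(nested)
  end
end

pp attach_children(invoices, children)
```

Grouping in code keeps every child row, whereas joining all tables and grouping by pi.id returns a single row per invoice.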