I am working with Fluentd, Elasticsearch and Kibana to push my MySQL database data into Elasticsearch, using the fluent-plugin-mysql-replicator plugin. But Fluentd throws the following error:
2016-06-08 15:43:56 +0530 [warn]: super was not called in #start: called it forcedly plugin=Fluent::MysqlReplicatorInput
2016-06-08 15:43:56 +0530 [info]: listening dRuby uri="druby://127.0.0.1:24230" object="Engine"
2016-06-08 15:43:56 +0530 [info]: listening fluent socket on 0.0.0.0:24224
2016-06-08 15:43:57 +0530 [error]: mysql_replicator: missing primary_key. :tag=>replicator.livechat.chat_chennaibox.${event}.${primary_key} :primary_key=>chat_id
2016-06-08 15:43:57 +0530 [error]: mysql_replicator: failed to execute query.
2016-06-08 15:43:57 +0530 [error]: error: bad value for range
2016-06-08 15:43:57 +0530 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mysql-replicator-0.5.2/lib/fluent/plugin/in_mysql_replicator.rb:105:in `block in poll'
This is my configuration file for Fluentd:
####
## Output descriptions:
##
# Treasure Data (http://www.treasure-data.com/) provides cloud based data
# analytics platform, which easily stores and processes data from td-agent.
# FREE plan is also provided.
# @see http://docs.fluentd.org/articles/http-to-td
#
# This section matches events whose tag is td.DATABASE.TABLE
#<match td.*.*>
# type tdlog
# apikey YOUR_API_KEY
# auto_create_table
# buffer_type file
# buffer_path /var/log/td-agent/buffer/td
# <secondary>
# type file
# path /var/log/td-agent/failed_records
# </secondary>
#</match>
## match tag=debug.** and dump to console
#<match debug.**>
# type stdout
#</match>
####
## Source descriptions:
##
## built-in TCP input
## @see http://docs.fluentd.org/articles/in_forward
<source>
type forward
</source>
## built-in UNIX socket input
#<source>
# type unix
#</source>
# HTTP input
# POST http://localhost:8888/<tag>?json=<json>
# POST http://localhost:8888/td.myapp.login?json={"user"%3A"me"}
# @see http://docs.fluentd.org/articles/in_http
<source>
type http
port 8888
</source>
## live debugging agent
<source>
type debug_agent
bind 127.0.0.1
port 24230
</source>
####
## Examples:
##
## File input
## read apache logs continuously and tags td.apache.access
#<source>
# type tail
# format apache
# path /var/log/httpdaccess.log
# tag td.apache.access
#</source>
## File output
## match tag=local.** and write to file
#<match local.**>
# type file
# path /var/log/td-agent/apache.log
#</match>
## Forwarding
## match tag=system.** and forward to another td-agent server
#<match system.**>
# type forward
# host 192.168.0.11
# # secondary host is optional
# <secondary>
# host 192.168.0.12
# </secondary>
#</match>
## Multiple output
## match tag=td.*.* and output to Treasure Data AND file
#<match td.*.*>
# type copy
# <store>
# type tdlog
# apikey API_KEY
# auto_create_table
# buffer_type file
# buffer_path /var/log/td-agent/buffer/td
# </store>
# <store>
# type file
# path /var/log/td-agent/td-%Y-%m-%d/%H.log
# </store>
#</match>
#<source>
# @type tail
# format apache
# tag apache.access
# path /var/log/td-agent/apache_log/ssl_access_log.1
# read_from_head true
# pos_file /var/log/httpd/access_log.pos
#</source>
#<match apache.access*>
# type stdout
#</match>
#<source>
# @type tail
# format magento_system
# tag magento.access
# path /var/log/td-agent/Magento_log/system.log
# pos_file /tmp/fluentd_magento_system.pos
# read_from_head true
#</source>
#<match apache.access*>
# type stdout
#</match>
#<source>
# @type http
# port 8080
# bind localhost
# body_size_limit 32m
# keepalive_timeout 10s
#</source>
#<match magento.access*>
# type stdout
#</match>
#<match magento.access*>
# @type elasticsearch
# logstash_format true
# host localhost
# port 9200
#</match>
<source>
type mysql_replicator
host 127.0.0.1
username root
password gworks.mobi2
database livechat
query select chat_name from chat_chennaibox;
#query SELECT t2.firstname,t2.lastname, t1.* FROM status t1 INNER JOIN student_detail t2 ON t1.number = t2.number;
primary_key chat_id # specify unique key (default: id)
interval 10s # execute query interval (default: 1m)
enable_delete yes
tag replicator.livechat.chat_chennaibox.${event}.${primary_key}
</source>
#<match replicator.**>
#type stdout
#</match>
<match replicator.**>
type mysql_replicator_elasticsearch
host localhost
port 9200
tag_format (?<index_name>[^\.]+)\.(?<type_name>[^\.]+)\.(?<event>[^\.]+)\.(?<primary_key>[^\.]+)$
flush_interval 5s
max_retry_wait 1800
flush_at_shutdown yes
buffer_type file
buffer_path /var/log/td-agent/buffer/mysql_replicator_elasticsearch.*
</match>
If anyone has faced the same issue, please suggest how to solve this problem.
I solved this problem. The issue was a missing primary_key column in my query.
Wrong query:
<source>
type mysql_replicator
host 127.0.0.1
username root
password xxxxxxx
database livechat
query select chat_name from chat_chennaibox;
primary_key chat_id
interval 10s # execute query interval (default: 1m)
enable_delete yes
tag replicator.livechat.chat_chennaibox.${event}.${primary_key}
</source>
In this query I specified primary_key chat_id, but the chat_id column is not selected in the query itself, so I got the error mysql_replicator: missing primary_key. :tag=>replicator.livechat.chat_chennaibox.${event}.${primary_key} :primary_key=>chat_id
Correct query:
<source>
type mysql_replicator
host localhost
username root
password xxxxxxx
database livechat
query SELECT chat_id, chat_name FROM chat_chennaibox
primary_key chat_id
interval 10s
enable_delete yes
tag replicator.livechat.chat_chennaibox.${event}.${primary_key}
</source>
You need to include the primary_key column in the query field:
query SELECT chat_id, chat_name FROM chat_chennaibox
It's working for me.
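To double-check that rows are actually reaching Elasticsearch, a quick query against the cluster can help. This is a minimal sketch, assuming the sink creates an index named livechat (as implied by the tag_format in the match section above) and that Elasticsearch listens on localhost:9200 as configured:
# list the indices and check that a "livechat" index has appeared
curl -XGET 'http://localhost:9200/_cat/indices?v'
# look at a few of the replicated documents
curl -XGET 'http://localhost:9200/livechat/_search?pretty&size=5'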
Related
I used the Fluentd filter parser after reading the official documentation, but I keep getting a 'pattern not matched' error and I don't know why.
Can someone please help me? Thanks.
log
{"audit_record":{"name":"Query","record":"2406760_2022-05-16T05:15:00","timestamp":"2022-05-19T03:52:25Z","command_class":"select","connection_id":"77","status":0,"sqltext":"select @@version_comment limit 1","user":"root[root] @ localhost []","host":"localhost","os_user":"","ip":"","db":"Enchante"}}
td-agent.conf
<source>
@type tail
path /var/log/td-agent/audit.log-20220521
<parse>
@type json
</parse>
pos_file /var/log/td-agent/audit_temp.pos
tag audit.test
</source>
<filter audit.test>
@type parser
key_name audit_record
reserve_data false
<parse>
@type json
</parse>
</filter>
<match audit.test>
@type copy
@type mysql_bulk
host localhost
port 3306
database test
username root
password 1234
column_names log_type,log_time,command,sql_text,log_user,db_name
key_names name,timestamp,command_class,sqltext,user,db
table audit_log_temp
flush_interval 10s
</match>
error text
[warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '{\"name\"=>\"Query\", \"record\"=>\"2406760_2022-05-16T05:15:00\", \"timestamp\"=>\"2022-05-19T03:52:25Z\", \"command_class\"=>\"select\", \"connection_id\"=>\"77\", \"status\"=>0, \"sqltext\"=>\"select @@version_comment limit 1\", \"user\"=>\"root[root] @ localhost []\", \"host\"=>\"localhost\", \"os_user\"=>\"\", \"ip\"=>\"\", \"db\"=>\"Enchante\"}'" location=nil tag="audit.test" time=2022-08-08 13:14:58.465969695 +0900 record={"audit_record"=>{"name"=>"Query", "record"=>"2406760_2022-05-16T05:15:00", "timestamp"=>"2022-05-19T03:52:25Z", "command_class"=>"select", "connection_id"=>"77", "status"=>0, "sqltext"=>"select @@version_comment limit 1", "user"=>"root[root] @ localhost []", "host"=>"localhost", "os_user"=>"", "ip"=>"", "db"=>"Enchante"}}
I have a Cuba application which I want to use Sidekiq with.
This is how I set up the config.ru:
require './app'
require 'sidekiq'
require 'sidekiq/web'
environment = ENV['RACK_ENV'] || "development"
config_vars = YAML.load_file("./config.yml")[environment]
Sidekiq.configure_client do |config|
  config.redis = { :url => config_vars["redis_uri"] }
end
Sidekiq.configure_server do |config|
  config.redis = { url: config_vars["redis_uri"] }
  config.average_scheduled_poll_interval = 5
end
# run Cuba
run Rack::URLMap.new('/' => Cuba, '/sidekiq' => Sidekiq::Web)
I started Sidekiq using systemd. This is the systemd unit file, which I adapted from the sidekiq.service example on the Sidekiq site:
#
# systemd unit file for CentOS 7, Ubuntu 15.04
#
# Customize this file based on your bundler location, app directory, etc.
# Put this in /usr/lib/systemd/system (CentOS) or /lib/systemd/system (Ubuntu).
# Run:
# - systemctl enable sidekiq
# - systemctl {start,stop,restart} sidekiq
#
# This file corresponds to a single Sidekiq process. Add multiple copies
# to run multiple processes (sidekiq-1, sidekiq-2, etc).
#
# See Inspeqtor's Systemd wiki page for more detail about Systemd:
# https://github.com/mperham/inspeqtor/wiki/Systemd
#
[Unit]
Description=sidekiq
# start us only once the network and logging subsystems are available,
# consider adding redis-server.service if Redis is local and systemd-managed.
After=syslog.target network.target
# See these pages for lots of options:
# http://0pointer.de/public/systemd-man/systemd.service.html
# http://0pointer.de/public/systemd-man/systemd.exec.html
[Service]
Type=simple
Environment=RACK_ENV=development
WorkingDirectory=/media/temp/bandmanage/repos/fall_prediction_verification
# If you use rbenv:
#ExecStart=/bin/bash -lc 'pwd && bundle exec sidekiq -e production'
ExecStart=/home/froy001/.rvm/wrappers/fall_prediction/bundle exec "sidekiq -r app.rb -L log/sidekiq.log -e development"
# If you use the system's ruby:
#ExecStart=/usr/local/bin/bundle exec sidekiq -e production
User=root
Group=root
UMask=0002
# if we crash, restart
RestartSec=1
Restart=on-failure
# output goes to /var/log/syslog
StandardOutput=syslog
StandardError=syslog
# This will default to "bundler" if we don't specify it
SyslogIdentifier=sidekiq
[Install]
WantedBy=multi-user.target
The code calling the worker is:
raw_msg = JSON.parse(req.body.read, {:symbolize_names => true})
if raw_msg
  ts = raw_msg[:ts]
  waiting_period = (1000*60*3) # wait 3 min before checking
  perform_at_time = Time.at((ts + waiting_period)/1000).utc
  FallVerificationWorker.perform_at((0.5).minute.from_now, raw_msg)
  my_res = { result: "success", status: 200}.to_json
  res.status = 200
  res.write my_res
else
  my_res = { result: "not found", status: 404}.to_json
  res.status = 404
  res.write my_res
end
I am only using the default queue.
My problem is that the job is not being processed at all.
After you run systemctl enable sidekiq (so that it starts at boot) and systemctl start sidekiq (so that it starts immediately), you should have some logs to review, which will provide some detail about any failure to start:
sudo journalctl -u sidekiq
Review the logs, review the systemd docs, and adjust your unit file as needed. You can find all the installed systemd documentation with apropos systemd. Some of the most useful man pages to review are systemd.service, systemd.exec and systemd.unit.
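A few commands that usually help here, assuming the unit is installed as sidekiq.service as in the file above:
# show whether the unit is running and its most recent log lines
systemctl status sidekiq
# follow the unit's journal live while restarting it in another terminal
sudo journalctl -u sidekiq -f
# list the installed systemd man pages
apropos systemd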
I have configured Filebeat as described at elastic.co.
The problem is that when I add a new log file, the data is not uploaded to Logstash. What can be the problem?
I have already tried different config variations, but it didn't work at all.
################### Filebeat Configuration Example #########################
############################# Filebeat ######################################
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    -
      paths:
        - /Users/apps/*.log
      input_type: log
###############################################################################
############################# Libbeat Config ##################################
# Base config file used by all other beats for using libbeat features
############################# Output ##########################################
output:
  elasticsearch:
    hosts: ["localhost:9200"]
    worker: 1
    index: "filebeat"
    template:
      path: "filebeat.template.json"
  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["localhost:5044"]
    index: filebeat
############################# Shipper #########################################
############################# Logging #########################################
# There are three options for the log output: syslog, file, stderr.
# Under Windows systems, the log files are per default sent to the file output,
# under all other systems per default to syslog.
logging:
  files:
    rotateeverybytes: 10485760 # = 10MB
Config in logstash.conf:
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => true
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    document_id => "%{fingerprint}"
  }
}
You are sending to both Elasticsearch and Logstash. You need to remove the elasticsearch part if you want to send the data to Logstash. Taken from https://www.elastic.co/guide/en/beats/filebeat/current/config-filebeat-logstash.html:
If you want to use Logstash to perform additional processing on the
data collected by Filebeat, you need to configure Filebeat to use
Logstash.
To do this, you edit the Filebeat configuration file to disable the
Elasticsearch output by commenting it out and enable the Logstash
output by uncommenting the logstash section.
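Applied to the config in the question, the output section would look roughly like this. This is only a sketch of the relevant part, with the elasticsearch block commented out and the logstash block left active:
output:
  ### Elasticsearch as output (disabled so Filebeat ships to Logstash only)
  #elasticsearch:
  #  hosts: ["localhost:9200"]
  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["localhost:5044"]
    index: filebeat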
Can you check the worker count in filebeat.yml as below?
### Logstash as output
logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
  # Number of workers per Logstash host.
  worker: 1
You should add the worker count in the logstash part.
I installed Cygnus version 0.8.2 on a FIWARE instance based on the CentOS-7-x64 image, using:
sudo yum install cygnus
I configured my agent as follows:
cygnusagent.sources = http-source
cygnusagent.sinks = mongo-sink
cygnusagent.channels = mongo-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = mongo-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = def_serv
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts gi
# TimestampInterceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# GroupingInterceptor, do not change
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
# Grouping rules for the GroupingInterceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
# ============================================
# OrionMongoSink configuration
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel
# FQDN/IP:port where the MongoDB server runs (standalone case) or comma-separated list of FQDN/IP:port pairs where the MongoDB replica set members run
cygnusagent.sinks.mongo-sink.mongo_hosts = 127.0.0.1:27017
# a valid user in the MongoDB server (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_username =
# password for the user above (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_password =
# prefix for the MongoDB databases
cygnusagent.sinks.mongo-sink.db_prefix = kura_
# prefix for the MongoDB collections
cygnusagent.sinks.mongo-sink.collection_prefix = kura_
# true if collection names are based on a hash, false for human-readable collections
cygnusagent.sinks.mongo-sink.should_hash = false
#=============================================
# mongo-channel configuration
# channel type (must not be changed)
cygnusagent.channels.mongo-channel.type = memory
# capacity of the channel
cygnusagent.channels.mongo-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.mongo-channel.transactionCapacity = 100
I tried to test it locally using the following curl command:
URL=$1
curl $URL -v -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' --header "Fiware-Service: qsg" --header "Fiware-ServicePath: testsink" -d @- <<EOF
{
"subscriptionId" : "51c0ac9ed714fb3b37d7d5a8",
"originator" : "localhost",
"contextResponses" : [
{
"contextElement" : {
"attributes" : [
{
"name" : "temperature",
"type" : "float",
"value" : "26.5"
}
],
"type" : "Room",
"isPattern" : "false",
"id" : "Room1"
},
"statusCode" : {
"code" : "200",
"reasonPhrase" : "OK"
}
}
]
}
EOF
but I got this exception:
2015-10-06 14:38:50,138 (1251445230@qtp-1186065012-0) [INFO - com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents(OrionRestHandler.java:150)] Starting transaction (1444142307-244-0000000000)
2015-10-06 14:38:50,140 (1251445230@qtp-1186065012-0) [WARN - com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents(OrionRestHandler.java:180)] Bad HTTP notification (curl/7.29.0 user agent not supported)
2015-10-06 14:38:50,140 (1251445230@qtp-1186065012-0) [WARN - org.apache.flume.source.http.HTTPSource$FlumeHTTPServlet.doPost(HTTPSource.java:186)] Received bad request from client.
org.apache.flume.source.http.HTTPBadRequestException: curl/7.29.0 user agent not supported
at com.telefonica.iot.cygnus.handlers.OrionRestHandler.getEvents(OrionRestHandler.java:181)
at org.apache.flume.source.http.HTTPSource$FlumeHTTPServlet.doPost(HTTPSource.java:184)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:814)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Any idea of what can be the cause of this exception?
Cygnus versions <= 0.8.2 check the HTTP headers and only accept user agents starting with orion. This has been fixed in 0.9.0 (this is the particular issue). Thus, you have two options:
Avoid sending such a user-agent header. According to the curl documentation, you can use the -A, --user-agent <agent string> option in order to modify the user agent and send something starting with orion (e.g. orion/0.24.0), as shown in the sketch below.
Move to Cygnus 0.9.0 (so that you don't have to install it from the sources, I'll upload an RPM to the FIWARE repo along the day).
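For the first option, the test script from the question only needs the extra -A flag; everything else stays the same (orion/0.24.0 is just an example value):
URL=$1
# identical to the script above, plus -A so Cygnus 0.8.2 sees an Orion-like user agent
curl $URL -v -s -S -A 'orion/0.24.0' --header 'Content-Type: application/json' --header 'Accept: application/json' --header "Fiware-Service: qsg" --header "Fiware-ServicePath: testsink" -d @- <<EOF
{"subscriptionId":"51c0ac9ed714fb3b37d7d5a8","originator":"localhost","contextResponses":[{"contextElement":{"attributes":[{"name":"temperature","type":"float","value":"26.5"}],"type":"Room","isPattern":"false","id":"Room1"},"statusCode":{"code":"200","reasonPhrase":"OK"}}]}
EOF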
I have an issue related to ngsi2cosmos data flow. Everything works fine when persisting the information received in Orion into the public instance of Cosmos, but the destination folder and file name are both "null".
Simple test as follows:
I create a brand new NGSIEntity with these headers added: Fiware-Service: myservice & Fiware-ServicePath: /my
I add a new subscription with Cygnus as the reference endpoint.
I send an update to previously created NGSIEntity
When I check my user space in Cosmos, I see that the following path has been created: /user/myuser/myservice/null/null.txt
The file content is OK; every update in Orion has been correctly sinked into it. The problem is with the folder and file names.
I can't make it work properly. Isn't it supposed to use entityId and entityType for the folder and file naming?
Component versions:
Orion version: contextBroker-0.19.0-1.x86_64
Cygnus version: cygnus-0.5-91.g3eb100e.x86_64
Cosmos: global instance
Cygnus conf file:
cygnusagent.sources = http-source
cygnusagent.sinks = hdfs-sink
cygnusagent.channels = hdfs-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = hdfs-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = es.tid.fiware.fiwareconnectors.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default organization (organization semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_organization = org42
# Number of channel re-injection retries before a Flume event is definitely discarded
cygnusagent.sources.http-source.handler.events_ttl = 10
# Management interface port (FIXME: temporal location for this parameter)
cygnusagent.sources.http-source.handler.management_port = 8081
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts
# Timestamp interceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# Destination extractor interceptor, do not change
cygnusagent.sources.http-source.interceptors.de.type = es.tid.fiware.fiwreconnectors.cygnus.interceptors.DestinationExtractor$Builder
# Matching table for the destination extractor interceptor, do not change
cygnusagent.sources.http-source.interceptors.de.matching_table = matching_table.conf
# ============================================
# OrionHDFSSink configuration
# channel name from where to read notification events
cygnusagent.sinks.hdfs-sink.channel = hdfs-channel
# sink class, must not be changed
cygnusagent.sinks.hdfs-sink.type = es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionHDFSSink
# Comma-separated list of FQDN/IP address regarding the Cosmos Namenode endpoints
cygnusagent.sinks.hdfs-sink.cosmos_host = 130.206.80.46
# port of the Cosmos service listening for persistence operations; 14000 for httpfs, 50070 for webhdfs and free choice for infinity
cygnusagent.sinks.hdfs-sink.cosmos_port = 14000
# default username allowed to write in HDFS
cygnusagent.sinks.hdfs-sink.cosmos_default_username = myuser
# default password for the default username
cygnusagent.sinks.hdfs-sink.cosmos_default_password = mypassword
# HDFS backend type (webhdfs, httpfs or infinity)
cygnusagent.sinks.hdfs-sink.hdfs_api = httpfs
# how the attributes are stored, either per row either per column (row, column)
cygnusagent.sinks.hdfs-sink.attr_persistence = column
# prefix for the database and table names, empty if no prefix is desired
cygnusagent.sinks.hdfs-sink.naming_prefix =
# Hive FQDN/IP address of the Hive server
cygnusagent.sinks.hdfs-sink.hive_host = 130.206.80.46
# Hive port for Hive external table provisioning
cygnusagent.sinks.hdfs-sink.hive_port = 10000
#=============================================
# hdfs-channel configuration
# channel type (must not be changed)
cygnusagent.channels.hdfs-channel.type = memory
# capacity of the channel
cygnusagent.channels.hdfs-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.hdfs-channel.transactionCapacity = 100
I think you should configure the matching_table of Cygnus to define the path and file name.
That file lives in the same path as the Cygnus agent conf file.
You can follow the next example:
# integer id|comma-separated fields|regex to be applied to the fields concatenation|destination|dataset
#
# The available "dictionary" of fields is:
# - entityId
# - entityType
# - servicePath
1|entityId,entityType|Room\.(\d*)Room|numeric_rooms|rooms
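For instance, a hypothetical rule keyed on the service path used in the test above (/my) could route everything under it to a fixed folder and file; my_dataset and my_entities are placeholder names, and it is assumed that the rule format shown in the example applies to this Cygnus version:
# route every notification whose servicePath is /my to my_dataset/my_entities.txt
2|servicePath|^/my$|my_entities|my_dataset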