Fluentd does not automatically add the current system time in JSON parser - json

Fluentd Experts and Users!
We have run into an issue using Fluentd to parse JSON-format logs: Fluentd does not automatically add the current system time to the parsing result, although I have configured time_key and keep_time_key according to the documentation.
An example of our log is:
{"host": "204.48.112.175", "user-identifier": "-", "method": "POST", "request": "/synthesize/initiatives/integrated", "protocol": "HTTP/2.0", "status": 502, "bytes": 10272}
As you can see, there is no time field in it.
But there is also no current system time in the parsed log output (the output goes to stdout in debug mode):
loghub_s3: {"host":"204.48.112.175","user-identifier":"-","method":"POST","request":"/synthesize/initiatives/integrated","protocol":"HTTP/2.0","status":502,"bytes":10272,"referer":"http://www.centralenable.name/user-centric/reintermediate/synergistic/e-business","s3_bucket":"loghub-logs-691546483958","s3_key":"json/json-notime.json"}
And my config file is:
<system>
log_level debug
</system>
<match loghub_s3>
@type stdout
@id output_stdout
</match>
<source>
@type s3
tag loghub_s3
s3_bucket loghub-logs-691546483958
s3_region us-east-1
store_as json
add_object_metadata true
<instance_profile_credentials>
ip_address 169.254.169.254
port 80
</instance_profile_credentials>
<sqs>
queue_name loghub-fluentd-dev
</sqs>
<parse>
@type json
time_type string
time_format %d/%b/%Y:%H:%M:%S %z
time_key time
keep_time_key true
</parse>
</source>
Other information:
Fluentd version: 1.14.3
TD Agent version: 4.3.0
fluent-plugin-s3 version: 1.6.1
Operating system: Amazon Linux 2
Kernel version: 5.10.102-99.473.amzn2.x86_64
We are using the S3 input plugin: https://github.com/fluent/fluent-plugin-s3
Can anyone help us check whether our configuration is wrong? I'm not sure whether this is a Fluentd issue or a plugin issue.
Thanks a lot in advance!

As mentioned in the comments, Fluentd does not create a time/timestamp field in the record unless configured to do so. You can inject this field with an <inject> section under a filter or match section.
Here's an example with the sample input and stdout output plugins:
fluentd: 1.12.3
fluent.conf
<source>
@type sample
@id in_sample
sample {"k":"v"}
tag sample
</source>
<match sample>
@type stdout
@id out_stdout
<inject>
time_key timestamp
time_type string
time_format %Y-%m-%dT%H:%M:%S.%NZ
</inject>
</match>
Run fluentd:
fluentd -c ./fluent.conf
fluentd logs
2022-04-10 08:46:26.053278947 +0500 sample: {"k":"v","timestamp":"2022-04-10T08:46:26.053278947Z"}
2022-04-10 08:46:27.056770340 +0500 sample: {"k":"v","timestamp":"2022-04-10T08:46:27.056770340Z"}
2022-04-10 08:46:28.059998159 +0500 sample: {"k":"v","timestamp":"2022-04-10T08:46:28.059998159Z"}
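Applied to the original question, here is a minimal sketch (an assumption, not a tested configuration) that reuses the stdout match from the posted config and the same time format the parse section expects, adding an <inject> block so the event time lands in the record:
<match loghub_s3>
  @type stdout
  @id output_stdout
  <inject>
    # add the event time to the record under the key "time"
    time_key time
    time_type string
    time_format %d/%b/%Y:%H:%M:%S %z
  </inject>
</match>
As the answer above notes, the same <inject> section can also be placed under a filter section if the field should be added before the output stage.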

Related

How to treat nested json using Fluentd

I used the Fluentd filter parser after reading the official documentation, but I keep getting a 'pattern not matched' error and I don't know why.
Can someone please help me? Thanks.
log
{"audit_record":{"name":"Query","record":"2406760_2022-05-16T05:15:00","timestamp":"2022-05-19T03:52:25Z","command_class":"select","connection_id":"77","status":0,"sqltext":"select ##version_comment limit 1","user":"root[root] # localhost []","host":"localhost","os_user":"","ip":"","db":"Enchante"}}
td-agent.conf
<source>
@type tail
path /var/log/td-agent/audit.log-20220521
<parse>
@type json
</parse>
pos_file /var/log/td-agent/audit_temp.pos
tag audit.test
</source>
<filter audit.test>
@type parser
key_name audit_record
reserve_data false
<parse>
@type json
</parse>
</filter>
<match audit.test>
@type copy
@type mysql_bulk
host localhost
port 3306
database test
username root
password 1234
column_names log_type,log_time,command,sql_text,log_user,db_name
key_names name,timestamp,command_class,sqltext,user,db
table audit_log_temp
flush_interval 10s
</match>
error text
[warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '{\"name\"=>\"Query\", \"record\"=>\"2406760_2022-05-16T05:15:00\", \"timestamp\"=>\"2022-05-19T03:52:25Z\", \"command_class\"=>\"select\", \"connection_id\"=>\"77\", \"status\"=>0, \"sqltext\"=>\"select @@version_comment limit 1\", \"user\"=>\"root[root] @ localhost []\", \"host\"=>\"localhost\", \"os_user\"=>\"\", \"ip\"=>\"\", \"db\"=>\"Enchante\"}'" location=nil tag="audit.test" time=2022-08-08 13:14:58.465969695 +0900 record={"audit_record"=>{"name"=>"Query", "record"=>"2406760_2022-05-16T05:15:00", "timestamp"=>"2022-05-19T03:52:25Z", "command_class"=>"select", "connection_id"=>"77", "status"=>0, "sqltext"=>"select @@version_comment limit 1", "user"=>"root[root] @ localhost []", "host"=>"localhost", "os_user"=>"", "ip"=>"", "db"=>"Enchante"}}
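No resolution is shown above; one likely cause (an assumption, not confirmed in the thread) is that in_tail has already parsed the line as JSON, so audit_record reaches the parser filter as a hash rather than a JSON string, and the json parser therefore reports 'pattern not matched'. A hedged sketch of a workaround is to lift the nested fields with record_transformer instead of re-parsing them; the field names are taken from the sample record above:
<filter audit.test>
  @type record_transformer
  enable_ruby true
  # keep only the keys defined below, mirroring reserve_data false
  renew_record true
  <record>
    # copy the nested audit_record fields to the top level
    name ${record.dig("audit_record", "name")}
    timestamp ${record.dig("audit_record", "timestamp")}
    command_class ${record.dig("audit_record", "command_class")}
    sqltext ${record.dig("audit_record", "sqltext")}
    user ${record.dig("audit_record", "user")}
    db ${record.dig("audit_record", "db")}
  </record>
</filter>
The lifted keys then line up with the key_names list in the mysql_bulk match section.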

Fluentd on kubernetes : Segregating container logs based on container name and retag them to send to ElasticSearch

I have deployed Fluentd in an OpenShift cluster and set up ES and Kibana on-premise. I need to capture logs from the nodes and transmit them to ES running on-prem. Specifically, I need to separate /var/log/containers/*.log into two sections based on a container name (kong), so that all kong logs are tagged as kong and the remaining ones are tagged as application. Additionally, I also require the Kubernetes metadata for the pod logs (namespace, container name, etc.).
Is there a way to achieve that?
Fluentd docker image version: fluent/fluentd-kubernetes-daemonset:v1.11.1-debian-elasticsearch7-1.3
ElasticSearch, Kibana - 7.8.0
Below are my configuration files.
fluentd.conf
# AUTOMATICALLY GENERATED
# DO NOT EDIT THIS FILE DIRECTLY, USE /templates/conf/fluent.conf.erb
@include kubernetes.conf
<match **>
@type elasticsearch
@log_level info
include_tag_key true
host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"
logstash_format true
logstash_prefix openshift-${tag}
user "#{ENV['FLUENT_ELASTICSEARCH_USER'] || use_default}"
password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD'] || use_default}"
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.system.buffer
retry_max_interval 30
flush_interval 1s
flush_thread_count 8
chunk_limit_size 2M
queue_limit_length 32
overflow_action block
retry_forever true
</buffer>
</match>
kubernetes.conf
<label @FLUENT_LOG>
<match fluent.**>
@type null
</match>
</label>
<source>
@type tail
path /var/log/containers/*kong*.log
pos_file /var/log/fluentd-containers.log.pos
tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
exclude_path ["/var/log/containers/fluentd*"]
read_from_head true
<parse>
@type multi_format
<pattern>
format json
time_format '%Y-%m-%dT%H:%M:%S.%N%Z'
keep_time_key true
</pattern>
<pattern>
format regexp
expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
keep_time_key true
</pattern>
</parse>
</source>
<filter kubernetes.**>
@type kubernetes_metadata
kubernetes_url "#{ENV['FLUENT_FILTER_KUBERNETES_URL'] || 'https://' + ENV.fetch('KUBERNETES_SERVICE_HOST') + ':' + ENV.fetch('KUBERNETES_SERVICE_PORT') + '/api'}"
verify_ssl "#{ENV['KUBERNETES_VERIFY_SSL'] || true}"
ca_file "#{ENV['KUBERNETES_CA_FILE']}"
skip_labels "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_LABELS'] || 'false'}"
skip_container_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_CONTAINER_METADATA'] || 'false'}"
skip_master_url "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_MASTER_URL'] || 'false'}"
skip_namespace_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_NAMESPACE_METADATA'] || 'false'}"
</filter>
I tried changing the tag name from kubernetes.* to kong. I was able to get the logs into ES, but the Kubernetes metadata was missing.
Please help in this regard.
My solution was to set a label. See this answer, I guarantee it will help you:
https://stackoverflow.com/a/63739163/3740244
Good luck! \o/ (boyBR)
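As an alternative to the label approach in the linked answer, here is a hedged sketch using fluent-plugin-rewrite-tag-filter (assuming that plugin is installed in the daemonset image and supports the $.kubernetes.container_name record-accessor key, as 2.x versions do). Placed after the kubernetes_metadata filter and before the catch-all Elasticsearch match, it retags events by container name while keeping the metadata already attached:
<match kubernetes.**>
  @type rewrite_tag_filter
  <rule>
    # events whose container_name is exactly "kong" are retagged as "kong"
    key $.kubernetes.container_name
    pattern /^kong$/
    tag kong
  </rule>
  <rule>
    # everything else is retagged as "application"
    key $.kubernetes.container_name
    pattern /.+/
    tag application
  </rule>
</match>
The retagged events are then caught by the existing <match **> output, and logstash_prefix openshift-${tag} would index them as openshift-kong and openshift-application. Note that the tail source would also need path /var/log/containers/*.log (not only *kong*.log) so that non-kong containers are collected at all.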

How to access json elements in fluentd config match directive

I have set up Fluentd in my Kubernetes cluster (AKS) to send the logs to Azure blob storage using the Microsoft plugin azure-storage-append-blob. Currently my logs are stored as containername/logs/file.log, but I want them stored as containername/logs/podname/file.log. I've used the fluent-plugin-kubernetes_metadata_filter plugin to pull in the Kubernetes metadata. Below is my current configuration, which did not work out for me, along with a sample JSON output from the logs. I know this is possible; I just need a little help or guidance here to finish this off.
Current configuration:
<match fluent.**>
@type null
</match>
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/td-agent/tmp/access.log.pos
tag container.*
#format json
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
read_from_head true
</source>
<match container.var.log.containers.**fluentd**.log>
@type null
</match>
<filter container.**>
@type kubernetes_metadata
</filter>
<match **>
@type azure-storage-append-blob
azure_storage_account mysaname
azure_storage_access_key mysaaccesskey
azure_container fluentdtest
auto_create_container true
path logs/
append false
azure_object_key_format %{path}%{tag}%{time_slice}_%{index}.log
time_slice_format %Y%m%d-%H-%M
# if you want to use %{tag} or %Y/%m/%d/ like syntax in path / azure_blob_name_format,
# need to specify tag for %{tag} and time for %Y/%m/%d in <buffer> argument.
<buffer tag,time,timekey>
@type file
path /var/log/fluent/azurestorageappendblob
timekey 300s
timekey_wait 10s
timekey_use_utc true # use utc
chunk_limit_size 5MB
queued_chunks_limit_size 1
</buffer>
</match>
Sample Json from the logs
container.var.log.containers.nginx-connector-deployment-5bbfdf4f86-p86dq_mynamespace_nginx-ee437ca90cb3924e1def9bdaa7f682577fc16fb023c00975963a105b26591bfb.log:
{
"log": "2020-07-16 17:12:56,761 INFO spawned: 'consumer' with pid 87068\n",
"stream": "stdout",
"docker": {
"container_id": "ee437ca90cb3924e1def9bdaa7f682577fc16fb023c00975963a105b26591bfb"
},
"kubernetes": {
"container_name": "nginx",
"namespace_name": "mynamespace",
"pod_name": "nginx-connector-deployment-5bbfdf4f86-p86dq",
"container_image": "docker.io/nginx",
"container_image_id": "docker-pullable://docker.io/nginx:f908584cf96053e50862e27ac40534bbd57ca3241d4175c9576dd89741b4926",
"pod_id": "93a630f9-0442-44ed-a8d2-9a7173880a3b",
"host": "aks-nodepoolkube-15824989-vmss00000j",
"labels": {
"app": "nginx",
"pod-template-hash": "5bbfdf4f86"
},
"master_url": "https://docker.io:443/api",
"namespace_id": "87092784-26b4-4dd5-a9d2-4833b72a1366"
}
}
Below is the official github link for the append-blob plugin https://github.com/microsoft/fluent-plugin-azure-storage-append-blob
Please refer to the link below for a Fluentd configuration that reads JSON/non-JSON multiline logs. Try this configuration; it should work:
How to get ${kubernetes.namespace_name} for index_name in fluentd?
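For the original question of putting the pod name into the blob path, a minimal sketch of the standard Fluentd mechanism is to add the record-accessor key to the buffer chunk keys and reference it as a ${...} placeholder in the path. Whether fluent-plugin-azure-storage-append-blob expands such placeholders in path and azure_object_key_format is an assumption here and should be checked against the plugin's documentation:
<match **>
  @type azure-storage-append-blob
  azure_storage_account mysaname
  azure_storage_access_key mysaaccesskey
  azure_container fluentdtest
  auto_create_container true
  # pod_name is added to the record by the kubernetes_metadata filter
  path logs/${$.kubernetes.pod_name}/
  azure_object_key_format %{path}%{tag}%{time_slice}_%{index}.log
  time_slice_format %Y%m%d-%H-%M
  # every placeholder used above must appear in the buffer chunk keys
  <buffer tag,time,$.kubernetes.pod_name>
    @type file
    path /var/log/fluent/azurestorageappendblob
    timekey 300s
    timekey_wait 10s
    timekey_use_utc true
    chunk_limit_size 5MB
    queued_chunks_limit_size 1
  </buffer>
</match>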

Push Mysql data to Elasticsearch using fluentd mysql-replicator Plugin

I am working with Fluentd, Elasticsearch, and Kibana to push my MySQL database data into Elasticsearch. I'm using the fluentd mysql-replicator plugin, but Fluentd throws the following error:
2016-06-08 15:43:56 +0530 [warn]: super was not called in #start: called it forcedly plugin=Fluent::MysqlReplicatorInput
2016-06-08 15:43:56 +0530 [info]: listening dRuby uri="druby://127.0.0.1:24230" object="Engine"
2016-06-08 15:43:56 +0530 [info]: listening fluent socket on 0.0.0.0:24224
2016-06-08 15:43:57 +0530 [error]: mysql_replicator: missing primary_key. :tag=>replicator.livechat.chat_chennaibox.${event}.${primary_key} :primary_key=>chat_id
2016-06-08 15:43:57 +0530 [error]: mysql_replicator: failed to execute query.
2016-06-08 15:43:57 +0530 [error]: error: bad value for range
2016-06-08 15:43:57 +0530 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mysql-replicator-0.5.2/lib/fluent/plugin/in_mysql_replicator.rb:105:in `block in poll'
This is my configuration File for fluentd:
####
## Output descriptions:
##
# Treasure Data (http://www.treasure-data.com/) provides cloud based data
# analytics platform, which easily stores and processes data from td-agent.
# FREE plan is also provided.
# @see http://docs.fluentd.org/articles/http-to-td
#
# This section matches events whose tag is td.DATABASE.TABLE
#<match td.*.*>
# type tdlog
# apikey YOUR_API_KEY
# auto_create_table
# buffer_type file
# buffer_path /var/log/td-agent/buffer/td
# <secondary>
# type file
# path /var/log/td-agent/failed_records
# </secondary>
#</match>
## match tag=debug.** and dump to console
#<match debug.**>
# type stdout
#</match>
####
## Source descriptions:
##
## built-in TCP input
## @see http://docs.fluentd.org/articles/in_forward
<source>
type forward
</source>
## built-in UNIX socket input
#<source>
# type unix
#</source>
# HTTP input
# POST http://localhost:8888/<tag>?json=<json>
# POST http://localhost:8888/td.myapp.login?json={"user"%3A"me"}
# @see http://docs.fluentd.org/articles/in_http
<source>
type http
port 8888
</source>
## live debugging agent
<source>
type debug_agent
bind 127.0.0.1
port 24230
</source>
####
## Examples:
##
## File input
## read apache logs continuously and tags td.apache.access
#<source>
# type tail
# format apache
# path /var/log/httpdaccess.log
# tag td.apache.access
#</source>
## File output
## match tag=local.** and write to file
#<match local.**>
# type file
# path /var/log/td-agent/apache.log
#</match>
## Forwarding
## match tag=system.** and forward to another td-agent server
#<match system.**>
# type forward
# host 192.168.0.11
# # secondary host is optional
# <secondary>
# host 192.168.0.12
# </secondary>
#</match>
## Multiple output
## match tag=td.*.* and output to Treasure Data AND file
#<match td.*.*>
# type copy
# <store>
# type tdlog
# apikey API_KEY
# auto_create_table
# buffer_type file
# buffer_path /var/log/td-agent/buffer/td
# </store>
# <store>
# type file
# path /var/log/td-agent/td-%Y-%m-%d/%H.log
# </store>
#</match>
#<source>
# @type tail
# format apache
# tag apache.access
# path /var/log/td-agent/apache_log/ssl_access_log.1
# read_from_head true
# pos_file /var/log/httpd/access_log.pos
#</source>
#<match apache.access*>
# type stdout
#</match>
#<source>
# @type tail
# format magento_system
# tag magento.access
# path /var/log/td-agent/Magento_log/system.log
# pos_file /tmp/fluentd_magento_system.pos
# read_from_head true
#</source>
#<match apache.access
# type stdout
#</match>
#<source>
# @type http
# port 8080
# bind localhost
# body_size_limit 32m
# keepalive_timeout 10s
#</source>
#<match magento.access*>
# type stdout
#</match>
#<match magento.access*>
# @type elasticsearch
# logstash_format true
# host localhost
# port 9200
#</match>
<source>
type mysql_replicator
host 127.0.0.1
username root
password gworks.mobi2
database livechat
query select chat_name from chat_chennaibox;
#query SELECT t2.firstname,t2.lastname, t1.* FROM status t1 INNER JOIN student_detail t2 ON t1.number = t2.number;
primary_key chat_id # specify unique key (default: id)
interval 10s # execute query interval (default: 1m)
enable_delete yes
tag replicator.livechat.chat_chennaibox.${event}.${primary_key}
</source>
#<match replicator.**>
#type stdout
#</match>
<match replicator.**>
type mysql_replicator_elasticsearch
host localhost
port 9200
tag_format (?<index_name>[^\.]+)\.(?<type_name>[^\.]+)\.(?<event>[^\.]+)\.(?<primary_key>[^\.]+)$
flush_interval 5s
max_retry_wait 1800
flush_at_shutdown yes
buffer_type file
buffer_path /var/log/td-agent/buffer/mysql_replicator_elasticsearch.*
</match>
If anyone has faced the same issue, please suggest how to solve this problem.
I solved this problem. My problem was a missing primary_key in the query.
Wrong query:
<source>
type mysql_replicator
host 127.0.0.1
username root
password xxxxxxx
database livechat
query select chat_name from chat_chennaibox;
primary_key chat_id
interval 10s # execute query interval (default: 1m)
enable_delete yes
tag replicator.livechat.chat_chennaibox.${event}.${primary_key}
</source>
In this query I specified primary_key as chat_id but did not select it in the query itself, so I got the error mysql_replicator: missing primary_key. :tag=>replicator.livechat.chat_chennaibox.${event}.${primary_key} :primary_key=>chat_id
Correct query:
<source>
@type mysql_replicator
host localhost
username root
password xxxxxxx
database livechat
query SELECT chat_id, chat_name FROM chat_chennaibox
primary_key chat_id
interval 10s
enable_delete yes
tag replicator.livechat.chat_chennaibox.${event}.${primary_key}
</source>
You need to include the primary_key column in the query field:
query SELECT chat_id, chat_name FROM chat_chennaibox
It's working for me.
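To double-check a fix like this, a small debugging sketch is to route the replicator events to stdout first and confirm that ${event} and ${primary_key} resolve in the tag (for example replicator.livechat.chat_chennaibox.insert.chat_id) before enabling the Elasticsearch output:
# temporary debug output; swap back to mysql_replicator_elasticsearch once the tags look right
<match replicator.**>
  type stdout
</match>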

How to remove label of the json data using fluentd out webhdfs

I am using the Fluentd out_webhdfs plugin to post data into Hadoop HDFS.
td-agent.conf:
<match access.**>
type webhdfs
host hadoop1
port 50070
path /data/access.log.%Y%m%d_%H.log
output_data_type ltsv
field_separator TAB
output_include_tag false
</match>
using curl command to post data:
curl -X POST -d 'json={"name":"abc","id":"8754738"}' http://localhost:8888/.access
the result in hdfs file is below:
2014-08-20T08:25:28Z name:abc id:8754738
Is there any way to strip out the "name:" and "id:" labels? The result I want is:
2014-08-20T08:25:28Z abc 8754738
Can anybody help me, please?
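No answer is recorded in the thread. One possible approach, offered here as a hedged sketch: older fluent-plugin-webhdfs versions format records via the plaintext formatter mixin, whose output_data_type accepts an attr:<keys> form that writes only the listed values separated by field_separator, without the LTSV labels. Newer versions use a <format> section instead, so this depends on the plugin version in use:
<match access.**>
  type webhdfs
  host hadoop1
  port 50070
  path /data/access.log.%Y%m%d_%H.log
  # write only the values of "name" and "id", tab-separated, after the timestamp
  output_data_type attr:name,id
  field_separator TAB
  output_include_time true
  output_include_tag false
</match>
With the sample record above, this should yield lines like 2014-08-20T08:25:28Z<TAB>abc<TAB>8754738.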