How to tail multiple files in fluentd - json

I have set up the Fluentd logger and I am able to monitor a file using the Fluentd tail input plugin. All the data received by Fluentd is later published to an Elasticsearch cluster. Below is the configuration file for Fluentd:
<source>
  @type tail
  path /home/user/Documents/log_data.json
  format json
  tag myfile
</source>
<match *myfile*>
  @type elasticsearch
  hosts 192.168.48.118:9200
  user <username>
  password <password>
  index_name fluentd
  type_name fluentd
</match>
As you can see, I am monitoring the log_data.json file using tail. I also have another file in the same directory, log_user.json, which I want to monitor as well and publish its logs to Elasticsearch. To do this, I thought of creating another <source> and <match> with a different tag, but it started showing errors.
How can I monitor multiple files in Fluentd and publish them to Elasticsearch? I see that when we start Fluentd, a worker is started. Is it possible to start multiple workers so that each one monitors a different file, or is there any other way of doing it? Can anyone point me to some good links/tutorials?
Thanks.

You can use multiple source+match sections.
A label can help you bind them together.
Here is an example:
<source>
  @type tail
  @label @mainstream
  path /home/user/Documents/log_data.json
  format json
  tag myfile
</source>
<label @mainstream>
  <match **>
    @type copy
    <store>
      @type elasticsearch
      host elasticsearch
      port 9200
      logstash_format true
      logstash_prefix fluentd
      logstash_dateformat %Y%m%d
      include_tag_key true
      type_name access_log
      tag_key @log_name
      <buffer>
        flush_mode interval
        flush_interval 1s
        retry_type exponential_backoff
        flush_thread_count 2
        retry_forever true
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
    </store>
  </match>
</label>
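To also pick up the second file from the question (log_user.json), you can add another source with its own tag and route it into the same label. A minimal sketch, assuming the file path from the question (the tag name and pos_file path are illustrative):

<source>
  @type tail
  @label @mainstream
  path /home/user/Documents/log_user.json
  # illustrative position file so tailing resumes after a restart
  pos_file /var/log/td-agent/log_user.json.pos
  format json
  tag myuserfile
</source>

Because both sources carry the @mainstream label, the single <match **> inside <label @mainstream> sends records from both files to Elasticsearch, and include_tag_key/tag_key keep the originating tag (myfile or myuserfile) in the @log_name field so the two streams can still be told apart.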

Related

Kibana cannot capture the JSON logs that start with "code"="0000000"

Hello, I am very new to the Elastic Stack and I am trying to extract useful information from system logs (in JSON format) in Kibana. I am using Filebeat to send them to Kibana in Elastic Cloud.
Problem: basically, I cannot view the data I send to Kibana when it is in the following format:
{"code":"28000","file":"auth.c","length":164,"level":"error","line":"496","message":"no pg_hba.conf entry",everity":"FATAL","timestamp":"2022-10-05 08:40:14.308"}
Kibana is ignoring all the lines that start with "code":"000000".
However, the following formatted logs have no problem:
{"code":"SELF_SIGNED_CERT_IN_CHAIN","level":"error","message":"self signed certificate in certificate chain,"requestId":"166666666623piq9lzd2ngllllll","timestamp":"2022-10-05 08:40:39.908"}
and
{"level":"error","message":"Login was refused using authentication mechanism PLAIN.","requestId":"199999999udchsuz8d2s5aoszqw6w7","stack":"Error: Handshake terminated by server","timestamp":"2022-10-05 08:57:37.330"}
Please see the filebeat.yml below.
###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
# ============================== Filebeat inputs ===============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
# filestream is an input for collecting log messages from files.
- type: log
  # Unique ID among all inputs, an ID is required.
  # id: my-filestream-id
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - C:\bi\hd-be-logs\hd-be-logs\combined\*.log
    #- c:\programdata\elasticsearch\logs\*
  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_key: true
  json.expand_keys: true
  json.close_inactive: true
  json.clean_removed: true
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
# ============================== Filebeat modules ==============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s
# ======================= Elasticsearch template setting =======================
setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false
# ================================== General ===================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
# =================================== Kibana ===================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"
  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:
# =============================== Elastic Cloud ================================
# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
cloud.id: My_deployment:cloudid
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
cloud.auth: elastic:4----------F
# ================================== Outputs ===================================
# Configure what output to use when sending the data collected by the beat.
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  # hosts: ["localhost:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  # api_key: "changeme"
  #username: "elastic"
  #password: "changeme"
# ------------------------------ Logstash Output -------------------------------
#output.logstash:
# The Logstash hosts
#hosts: ["localhost:5044"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
# ================================== Logging ===================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]
# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
#monitoring.enabled: false
# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:
# ============================== Instrumentation ===============================
# Instrumentation support for the filebeat.
#instrumentation:
# Set to true to enable instrumentation of filebeat.
#enabled: false
# Environment in which filebeat is running on (eg: staging, production, etc.)
#environment: ""
# APM Server hosts to report instrumentation results to.
#hosts:
# - http://localhost:8200
# API Key for the APM Server(s).
# If api_key is set then secret_token will be ignored.
#api_key:
# Secret token for the APM Server(s).
#secret_token:
# ================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

Setting up a CSV Data Adapter locally

I am trying to set up the Data Visualization extension to use data from a CSV file for the sensors, based on this example:
https://forge.autodesk.com/en/docs/dataviz/v1/developers_guide/advanced_topics/csv_adapter/
The CSV data I am trying to use is the default Hyperion-1.csv in the folder server\gateways\csv. Do I need to add/change some other settings as well?
It is showing the following error in the Chrome console:
I have these settings for the CSV in the .env file.
And these in devices.json in the server\gateways\synthetic-data folder.
I've just taken the following steps to enable the CSV data adapter, which seemed to work fine:
Clone the repo: git clone https://github.com/Autodesk-Forge/forge-dataviz-iot-reference-app
Install dependencies: npm install
Create a copy of server/env_template and rename it to server/.env
Modify the contents of server/.env, commenting out all the initial env. variables, uncommenting the CSV-related env. vars, and setting their corresponding values:
# FORGE_CLIENT_ID=
# FORGE_CLIENT_SECRET=
# FORGE_ENV=AutodeskProduction
# FORGE_API_URL=https://developer.api.autodesk.com
# FORGE_CALLBACK_URL=http://localhost:9000/oauth/callback
#
# FORGE_BUCKET=
# ENV=local
# ADAPTER_TYPE=local
## Please uncomment the following part if you want to connect to Azure IoTHub and Time Series Insights
## Connect to Azure IoTHub and Time Series Insights
# ADAPTER_TYPE=azure
# AZURE_IOT_HUB_CONNECTION_STRING=
# AZURE_TSI_ENV=
#
## Azure Service Principle
# AZURE_CLIENT_ID=
# AZURE_APPLICATION_SECRET=
# AZURE_TENANT_ID=
# AZURE_SUBSCRIPTION_ID=
#
## Path to Device Model configuration File
# DEVICE_MODEL_JSON=
## End - Connect to Azure IoTHub and Time Series Insights
## Please uncomment the following part if you want to use a CSV file as the time series provider
ADAPTER_TYPE=csv
CSV_MODEL_JSON=server/gateways/synthetic-data/device-models.json
CSV_DEVICE_JSON=server/gateways/synthetic-data/devices.json
CSV_FOLDER=server/gateways/csv/
CSV_DATA_START=2011-02-01T08:00:00.000Z
CSV_DATA_END=2011-02-20T13:51:10.511Z
CSV_DELIMITER="\t"
CSV_LINE_BREAK="\n"
CSV_TIMESTAMP_COLUMN="time"
CSV_FILE_EXTENSION=".csv"
## End - Please uncomment the following part if you want to use a CSV file as the time series provider
Run the app with ENV set to "local": ENV=local npm run dev
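For reference, the full sequence as a shell session might look roughly like this (a sketch for a Unix-like shell; the server/.env edits described above still have to be made by hand):

git clone https://github.com/Autodesk-Forge/forge-dataviz-iot-reference-app
cd forge-dataviz-iot-reference-app
npm install
cp server/env_template server/.env
# edit server/.env: comment out the Forge/Azure variables, uncomment and fill in the CSV block
ENV=local npm run dev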
After these steps the app runs successfully; however, you'll get some other errors because the server/gateways/csv folder only contains data for a single sensor (Hyperion-1).
By the way, I've been working on an alternative DataViz sample app that aims to be simpler and easier to reuse: https://github.com/petrbroz/forge-iot-extensions-demo (which uses https://github.com/petrbroz/forge-iot-extensions under the hood).

Keycloak on kubernetes and logging json layout format with log4j2

I have Keycloak deployed in Kubernetes using the official codecentric chart. Now I want Keycloak to log in JSON format so the logs can be exported to Kibana.
A comment on the original reply pointed to a CLI command to do this:
cli:
# Custom CLI script
custom: |
/subsystem=logging/json-formatter=json:add(exception-output-type=formatted, pretty-print=false, meta-data={label=value})
/subsystem=logging/console-handler=CONSOLE:write-attribute(name=named-formatter, value=json)
Keycloak is a Java application running on WildFly. If you check the main process running inside the pod, you will see something like:
/usr/lib/jvm/java/bin/java -D[Standalone] -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Dorg.jboss.boot.log.file=/opt/jboss/keycloak/standalone/log/server.log -Dlogging.configuration=file:/opt/jboss/keycloak/standalone/configuration/logging.properties -jar /opt/jboss/keycloak/jboss-modules.jar -mp /opt/jboss/keycloak/modules org.jboss.as.standalone -Djboss.home.dir=/opt/jboss/keycloak -Djboss.server.base.dir=/opt/jboss/keycloak/standalone -Djboss.bind.address=10.217.0.231 -Djboss.bind.address.private=10.217.0.231 -b 0.0.0.0 -c standalone.xml
The important part here is the following:
-Dlogging.configuration=file:/opt/jboss/keycloak/standalone/configuration/logging.properties
So the logging configuration is passed to the Java process as a JVM option and read from the file at /opt/jboss/keycloak/standalone/configuration/logging.properties.
If you check the content of the file, it has a section like the following:
...
handler.CONSOLE=org.jboss.logmanager.handlers.ConsoleHandler
handler.CONSOLE.level=INFO
handler.CONSOLE.formatter=COLOR-PATTERN
handler.CONSOLE.properties=autoFlush,target,enabled
handler.CONSOLE.autoFlush=true
handler.CONSOLE.target=SYSTEM_OUT
handler.CONSOLE.enabled=true
...
You need to figure out what to change in this logging configuration to meet your JSON requirements. An example would be:
formatter.json=org.jboss.logmanager.formatters.JsonFormatter
formatter.json.properties=keyOverrides,exceptionOutputType,metaData,prettyPrint,printDetails,recordDelimiter
formatter.json.constructorProperties=keyOverrides
formatter.json.keyOverrides=timestamp\=#timestamp
formatter.json.exceptionOutputType=FORMATTED
formatter.json.metaData=#version\=1
formatter.json.prettyPrint=false
formatter.json.printDetails=false
formatter.json.recordDelimiter=\n
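Note that defining the formatter on its own is not enough; the console handler shown above also has to point at it instead of COLOR-PATTERN, which (based on the handler section above) should amount to a line like:

handler.CONSOLE.formatter=json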
Then, in Kubernetes, you can create a ConfigMap with the logging configuration you want, define it as a volume in your pod/deployment, and mount it as a file at that exact path in the pod/deployment definition. If you do all the steps correctly, you should be able to customize the logging format as you need; a sketch of the Kubernetes side follows below.
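A minimal sketch of that wiring, where the ConfigMap name (keycloak-logging), the Deployment name, and the image are placeholders rather than what the codecentric chart actually generates:

apiVersion: v1
kind: ConfigMap
metadata:
  name: keycloak-logging
data:
  logging.properties: |
    # full logging.properties content goes here,
    # including the formatter.json.* lines and handler.CONSOLE.formatter=json
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
spec:
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
        - name: keycloak
          image: jboss/keycloak   # placeholder; the chart defines the real image, env and ports
          volumeMounts:
            - name: logging-config
              # mount only this one file over the path read via -Dlogging.configuration
              mountPath: /opt/jboss/keycloak/standalone/configuration/logging.properties
              subPath: logging.properties
      volumes:
        - name: logging-config
          configMap:
            name: keycloak-logging

With the codecentric chart you would typically express the same volume/volume-mount through the chart's extra volume values (if your chart version provides them) rather than editing the generated workload by hand.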

How to store apache logs into mongodb using fluentd?

I am using the "FLUENTD" data collector to store Apache logs in MongoDB. I made the necessary changes in the td-agent configuration file, like:
<source>
  @type tail
  format apache2
  path C:\Program Files (x86)\Apache Group\Apache2\logs\access.log
  tag mongo.apache
</source>
and
<match mongo.**>
  # plugin type
  @type mongo
  # mongodb db + collection
  database apache
  collection access
  # mongodb host + port
  host localhost
  port 27017
  # interval
  flush_interval 10s
  # make sure to include the time key
  include_time_key true
</match>
I made all the changes in the td-agent.conf file. The path of the log file is C:\Program Files (x86)\Apache Group\Apache2\logs\access and the path of the position file is C:\var\log\td-agent\apache2.access_log.pos.
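Note that the <source> block above does not actually reference that position file; for it to be used, the tail source has to declare it explicitly, roughly like this (a sketch based on the paths given in the question):

<source>
  @type tail
  format apache2
  path C:\Program Files (x86)\Apache Group\Apache2\logs\access.log
  # position file from the question, so tailing resumes after a restart
  pos_file C:\var\log\td-agent\apache2.access_log.pos
  tag mongo.apache
</source>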
To test the configuration, I hit the Apache server with the command:
ab -n 100 -c 10 http://localhost/
This command is suggested by http://docs.fluentd.org/v0.12/articles/apache-to-mongodb, the tutorial I followed to send logs to MongoDB.
Everything appears to run fine, and the database and collection were created, but the log records were not stored in MongoDB.
I also installed "Apache Group" to be able to use ApacheBench.

fluent and webhdfs filename with 197001011

I run td-agent on Ubuntu 14.04 with the following configuration:
<source>
  type tail
  format json
  path /path/tomcat/logs/file-input.log
  tag bhc.hdfs
  pos_file /var/td-agent/file.pos
</source>
<match bhc.hdfs>
  type webhdfs
  port 50070
  host my.host.name
  path /hdfs/path/file.${hostname}.%Y%m%d.log
  username user
  flush_interval 10s
  output_include_time false
  output_include_tag false
  output_data_type json
</match>
The log source file /path/tomcat/logs/file-input.log contains only structured JSON data.
The NTP daemon is installed and running, but when td-agent creates the file in HDFS, the date in the filename is 19700101.
What's wrong?
Fluentd records carry a time, and the webhdfs plugin creates files using those records' timestamps, not the current time.
By default, the tail plugin uses the field named time as the record's time. If your log data keeps its time information in another field, you can specify it with time_key and time_format.
See also: http://docs.fluentd.org/articles/in_tail
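For example, if each JSON line stores its timestamp in a field called logtime (a hypothetical field name) formatted like 2015-06-01 12:34:56, the source could look roughly like this:

<source>
  type tail
  format json
  path /path/tomcat/logs/file-input.log
  tag bhc.hdfs
  pos_file /var/td-agent/file.pos
  # tell the json parser which field holds the event time and how to parse it
  time_key logtime
  time_format %Y-%m-%d %H:%M:%S
</source>

A 19700101 date in the filename means the record time came out as 0 (the Unix epoch), which usually points at the time field not being parsed as intended.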