How to store Apache logs in MongoDB using Fluentd?

I am using the Fluentd data collector to store Apache logs in MongoDB. I made the necessary changes in the td-agent configuration file:
<source>
@type tail
format apache2
path C:\Program Files (x86)\Apache Group\Apache2\logs\access.log
tag mongo.apache
</source>
and
<match mongo.**>
# plugin type
@type mongo
# mongodb db + collection
database apache
collection access
# mongodb host + port
host localhost
port 27017
# interval
flush_interval 10s
# make sure to include the time key
include_time_key true
</match>
All of these changes are in td-agent.conf.
The path of the log file is C:\Program Files (x86)\Apache Group\Apache2\logs\access
The path of the position file is C:\var\log\td-agent\apache2.access_log.pos
To test the configuration, I generated requests against the Apache server with:
ab -n 100 -c 10 http://localhost/
This command comes from http://docs.fluentd.org/v0.12/articles/apache-to-mongodb, the tutorial I followed to send logs to MongoDB.
Everything appears to work: the database and collection were created,
but the log entries were not stored in MongoDB.
I also installed "Apache Group" in order to use Apache Bench.

Setting up a CSV Data Adapter locally

I am trying to set up the Data Visualization extension to use data from csv file for the sensors based on this example:
https://forge.autodesk.com/en/docs/dataviz/v1/developers_guide/advanced_topics/csv_adapter/
So the csv data I am trying to use is the default Hyperion-1.csv in folder server\gateways\csv. Do I need to add/change some other settings as well?
It is showing the following error in Chrome console:
I have these settings for the csv in .env file.
And these in devices.json in server\gateways\synthetic-data folder.
I've just taken the following steps to enable the CSV data adapter which seemed to work fine:
Clone the repo: git clone https://github.com/Autodesk-Forge/forge-dataviz-iot-reference-app
Install dependencies: npm install
Create a copy of server/env_template and rename it to server/.env
Modify the contents of server/.env, commenting out all the initial env. variables, uncommenting the CSV-related env. vars, and setting their corresponding values:
# FORGE_CLIENT_ID=
# FORGE_CLIENT_SECRET=
# FORGE_ENV=AutodeskProduction
# FORGE_API_URL=https://developer.api.autodesk.com
# FORGE_CALLBACK_URL=http://localhost:9000/oauth/callback
#
# FORGE_BUCKET=
# ENV=local
# ADAPTER_TYPE=local
## Please uncomment the following part if you want to connect to Azure IoTHub and Time Series Insights
## Connect to Azure IoTHub and Time Series Insights
# ADAPTER_TYPE=azure
# AZURE_IOT_HUB_CONNECTION_STRING=
# AZURE_TSI_ENV=
#
## Azure Service Principle
# AZURE_CLIENT_ID=
# AZURE_APPLICATION_SECRET=
# AZURE_TENANT_ID=
# AZURE_SUBSCRIPTION_ID=
#
## Path to Device Model configuration File
# DEVICE_MODEL_JSON=
## End - Connect to Azure IoTHub and Time Series Insights
## Please uncomment the following part if you want to use a CSV file as the time series provider
ADAPTER_TYPE=csv
CSV_MODEL_JSON=server/gateways/synthetic-data/device-models.json
CSV_DEVICE_JSON=server/gateways/synthetic-data/devices.json
CSV_FOLDER=server/gateways/csv/
CSV_DATA_START=2011-02-01T08:00:00.000Z
CSV_DATA_END=2011-02-20T13:51:10.511Z
CSV_DELIMITER="\t"
CSV_LINE_BREAK="\n"
CSV_TIMESTAMP_COLUMN="time"
CSV_FILE_EXTENSION=".csv"
## End - Please uncomment the following part if you want to use a CSV file as the time series provider
Run the app with ENV set to "local": ENV=local npm run dev
After these steps the app is running successfully, however you'll get some other errors because the server/gateways/csv folder only contains data for a single sensor (Hyperion-1).
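To double-check which variables are actually active after editing server/.env, a small script can list the uncommented entries. This is only a minimal sketch and not part of the reference app: the function name parse_env is hypothetical, and it handles only simple KEY=value lines.

```python
def parse_env(text):
    """Return the uncommented KEY=value pairs of a dotenv-style file."""
    active = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (both "#" and "##" prefixes).
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Drop one pair of surrounding double quotes, if present.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        active[key.strip()] = value
    return active


# Shortened sample based on the settings listed above.
sample = """\
# ADAPTER_TYPE=local
ADAPTER_TYPE=csv
CSV_FOLDER=server/gateways/csv/
CSV_TIMESTAMP_COLUMN="time"
"""
settings = parse_env(sample)
print(settings["ADAPTER_TYPE"])
```

If ADAPTER_TYPE does not come back as csv, one of the earlier ADAPTER_TYPE lines is probably still uncommented.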
Btw. I've been working on an alternative DataViz sample app that aims to be simpler and easier to reuse: https://github.com/petrbroz/forge-iot-extensions-demo (which uses https://github.com/petrbroz/forge-iot-extensions under the hood).

Keycloak on kubernetes and logging json layout format with log4j2

I have Keycloak deployed in Kubernetes using the official codecentric chart. Now I want to make Keycloak logs into json format in order to export them to Kibana.
A comment to the original reply pointed to a cli command to do this.
cli:
  # Custom CLI script
  custom: |
    /subsystem=logging/json-formatter=json:add(exception-output-type=formatted, pretty-print=false, meta-data={label=value})
    /subsystem=logging/console-handler=CONSOLE:write-attribute(name=named-formatter, value=json)
Keycloak is a Java application running on WildFly. If you check the main process running inside the pod, you will see something like:
/usr/lib/jvm/java/bin/java -D[Standalone] -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Dorg.jboss.boot.log.file=/opt/jboss/keycloak/standalone/log/server.log -Dlogging.configuration=file:/opt/jboss/keycloak/standalone/configuration/logging.properties -jar /opt/jboss/keycloak/jboss-modules.jar -mp /opt/jboss/keycloak/modules org.jboss.as.standalone -Djboss.home.dir=/opt/jboss/keycloak -Djboss.server.base.dir=/opt/jboss/keycloak/standalone -Djboss.bind.address=10.217.0.231 -Djboss.bind.address.private=10.217.0.231 -b 0.0.0.0 -c standalone.xml
Important part here is the following:
-Dlogging.configuration=file:/opt/jboss/keycloak/standalone/configuration/logging.properties
So, the logging configuration is passed to the Java process as a JVM option, and read from the file on the path /opt/jboss/keycloak/standalone/configuration/logging.properties.
If you check the content of the file, it has a section like the following:
...
handler.CONSOLE=org.jboss.logmanager.handlers.ConsoleHandler
handler.CONSOLE.level=INFO
handler.CONSOLE.formatter=COLOR-PATTERN
handler.CONSOLE.properties=autoFlush,target,enabled
handler.CONSOLE.autoFlush=true
handler.CONSOLE.target=SYSTEM_OUT
handler.CONSOLE.enabled=true
...
You need to figure out what to change in this logging configuration to meet your JSON requirements. An example would be:
formatter.json=org.jboss.logmanager.formatters.JsonFormatter
formatter.json.properties=keyOverrides,exceptionOutputType,metaData,prettyPrint,printDetails,recordDelimiter
formatter.json.constructorProperties=keyOverrides
formatter.json.keyOverrides=timestamp\=@timestamp
formatter.json.exceptionOutputType=FORMATTED
formatter.json.metaData=@version\=1
formatter.json.prettyPrint=false
formatter.json.printDetails=false
formatter.json.recordDelimiter=\n
Then, in Kubernetes you can create a ConfigMap with the logging config that you want, define it as a volume in your pod/deployment, and mount it as a file to that exact path in the pod/deployment definition. If you do all steps correctly, you should be able to customize the logging format as you need.
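The ConfigMap and mount described above can be sketched as plain data structures. This is a rough illustration under assumptions, not the chart's own mechanism: the ConfigMap name keycloak-logging, the volume name logging-config and both helper functions are hypothetical, and the placeholder text must be replaced with your full edited logging.properties.

```python
import json

# Path taken from the -Dlogging.configuration JVM option shown above.
LOGGING_PATH = "/opt/jboss/keycloak/standalone/configuration/logging.properties"


def build_configmap(name, properties_text):
    """Build a ConfigMap manifest holding logging.properties as one key."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {"logging.properties": properties_text},
    }


def build_mount(configmap_name, volume_name="logging-config"):
    """Build the volume and volumeMount fragments for a pod spec."""
    volume = {"name": volume_name, "configMap": {"name": configmap_name}}
    # subPath mounts only the single file instead of shadowing the directory.
    volume_mount = {
        "name": volume_name,
        "mountPath": LOGGING_PATH,
        "subPath": "logging.properties",
    }
    return volume, volume_mount


properties = "# edited logging.properties goes here\n"  # placeholder only
configmap = build_configmap("keycloak-logging", properties)
volume, volume_mount = build_mount("keycloak-logging")
print(json.dumps(configmap, indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f, since kubectl accepts JSON manifests as well as YAML. The volume and volumeMount fragments still have to be added to the pod or deployment definition by whatever means your deployment provides.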

How to tail multiple files in fluentd

I have set up a Fluentd logger and can monitor a file using the Fluentd tail input plugin. All the data received by Fluentd is then published to an Elasticsearch cluster. Below is the Fluentd configuration file:
<source>
@type tail
path /home/user/Documents/log_data.json
format json
tag myfile
</source>
<match *myfile*>
@type elasticsearch
hosts 192.168.48.118:9200
user <username>
password <password>
index_name fluentd
type_name fluentd
</match>
As you can see, I am monitoring the log_data.json file using tail. There is also a log_user.json file in the same directory that I want to monitor and publish to Elasticsearch as well. To do this, I tried creating another <source> and <match> with a different tag, but it started showing errors.
How can I monitor multiple files in Fluentd and publish them to Elasticsearch? I see that a worker is started when Fluentd starts. Is it possible to start multiple workers so that each one monitors a different file, or is there another way of doing it? Can anyone point me to some good links or tutorials?
Thanks.
You can use multiple source and match sections.
A label can help you bind them together.
Here is an example:
<source>
@label @mainstream
@type tail
path /home/user/Documents/log_data.json
format json
tag myfile
</source>
<label @mainstream>
<match **>
@type copy
<store>
@type elasticsearch
host elasticsearch
port 9200
logstash_format true
logstash_prefix fluentd
logstash_dateformat %Y%m%d
include_tag_key true
type_name access_log
tag_key @log_name
<buffer>
flush_mode interval
flush_interval 1s
retry_type exponential_backoff
flush_thread_count 2
retry_forever true
retry_max_interval 30
chunk_limit_size 2M
queue_limit_length 8
overflow_action block
</buffer>
</store>
</match>
</label>
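The example above shows a single source. To cover both files from the question, one source section per file can point at the same label. The following Python sketch renders two such sections; it is an illustration under assumptions, not a tested setup: the tags, the pos_file locations under /var/log/td-agent and the function render_sources are hypothetical.

```python
SOURCE_TEMPLATE = """<source>
  @type tail
  @label @mainstream
  path {path}
  pos_file {pos_file}
  format json
  tag {tag}
</source>
"""


def render_sources(files):
    """Render one tail <source> section per (tag, path) pair."""
    blocks = []
    for tag, path in files.items():
        # Each source gets its own position file (hypothetical location).
        pos_file = f"/var/log/td-agent/{tag}.pos"
        blocks.append(SOURCE_TEMPLATE.format(path=path, pos_file=pos_file, tag=tag))
    return "\n".join(blocks)


files = {
    "myfile.data": "/home/user/Documents/log_data.json",
    "myfile.user": "/home/user/Documents/log_user.json",
}
config = render_sources(files)
print(config)
```

Both sections route to the same label, so the single match inside it receives events from both files, and the distinct tags keep them distinguishable. Alternatively, the tail plugin's path parameter accepts a comma-separated list or a * wildcard, so one source can cover both files if they can share a tag.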

Specify JFROG_ACCESS home instead of ~/.jfrog_access (Artifactory 5.5.2)

I managed to set up Artifactory using our existing Tomcat. I have set ARTIFACTORY_HOME=/opt/artifactory, and that part works well. There is, however, also the JFrog access.war file, which needs to be running as well. I couldn't figure out which variable to use to specify its home, so it defaults to ~/.jfrog_access, which is not at all what I want.
I moved the content over to my $ARTIFACTORY_HOME/access and symlinked it, but that's not the way to go for sure. Any help appreciated.
In case someone is stumbling over this thread and struggles with the same problem:
The solution for me was to also extract the Context files (access.xml and artifactory.xml, which are available in the zip file under <zip extract>/misc/tomcat) to the Tomcat configuration folder, e.g. $CATALINA_HOME/conf/Catalina/localhost/. After that, the $ARTIFACTORY_HOME env will be recognized on Access startup.
A previous answer finally put me on the right track for solving this problem on Amazon Linux.
In addition to copying access.xml and artifactory.xml to ${catalina.home}/host/MY_HOSTNAME, I found that some other changes were needed.
I modified the docBase attributes in the XML context files because my server has multiple hostnames:
/usr/share/tomcat8/conf/Catalina/repo.mydomain.org/access.xml
<Context path="/access" docBase="${catalina.home}/host/repo.mydomain.org/access.war">
<Parameter name="jfrog.access.bundled" value="true" override="true"/>
<!-- enable annotations scanning of access jar files -->
<JarScanner scanClassPath="false">
<JarScanFilter defaultPluggabilityScan="false" pluggabilityScan="access*" defaultTldScan="false"/>
</JarScanner>
</Context>
/usr/share/tomcat8/conf/Catalina/repo.mydomain.org/artifactory.xml
<Context crossContext="true" path="/artifactory" docBase="${catalina.home}/host/repo.mydomain.org/artifactory.war">
</Context>
Important Note: In order to prevent the above two XML files from being deleted by Tomcat Manager during upgrades via Undeploy/Deploy WAR, make sure they are owned by root and not writable by the tomcat user:
chown root:root access.xml artifactory.xml
chmod 644 access.xml artifactory.xml
If you forget to do the above, you will likely end up missing these files, which will break the communication between the access and artifactory web applications, resulting in login failures ("Username or Password Are Incorrect"). In this case, these errors result from the lack of communication between the web applications, not a problem with the credentials themselves.
/usr/share/tomcat8/conf/Catalina/repo.mydomain.org/manager.xml
This gives me the ability to upload new versions of access.war and artifactory.war via https://repo.mydomain.org:8443/manager/html:
<Context docBase="${catalina.home}/webapps/manager" privileged="true" antiResourceLocking="false">
</Context>
Additionally, I created the following folder to serve as the artifactory.home:
sudo mkdir /usr/share/artifactory
sudo chown tomcat:tomcat /usr/share/artifactory
tomcat8.conf
Add (or modify) the following line:
JAVA_OPTS="-Dartifactory.home=/usr/share/artifactory -Djfrog.access.home=/usr/share/artifactory/access -Dartifactory.access.client.serverUrl.override=http://localhost:8080/access"
Note: The Access Client URL specified above must use localhost to prevent the Server HTTP header from being overwritten by Apache and its modules. For instance, if I use:
https://repo.mydomain.org/access/api/v1/system/ping
The Server HTTP header value in the response is:
Server: Apache/2.4.33 (Amazon) OpenSSL/1.0.2k-fips mod_jk/1.2.43
And the Access Client produces the following exception:
[ERROR] (o.j.a.c.AccessClientImpl:154) - Access client/server version mismatch. Client version: 4.1.5, Server version: 2.4.33 (Amazon) OpenSSL
This means the Access Client appears to depend on the first string matching #.#.# in the Server header. This seems like a really fragile part of the Access Client. They should have used X-JFrog-Access-Server or something similar instead of relying on a value that is set by the web server. So, to reiterate, use http://localhost:8080/access to connect directly to the Tomcat server.
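To illustrate why a proxy-set Server header breaks this kind of check, here is a minimal Python sketch. The regular expression is an assumption that mirrors the "#.#.#" guess above; it is not the Access Client's actual parsing code, and the error message suggests the real client keeps more of the string than this pattern does.

```python
import re

# Assumed pattern: the first run of digits.digits.digits in the header value.
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def first_version(server_header):
    """Return the first x.y.z token found in a Server header, or None."""
    match = VERSION_PATTERN.search(server_header)
    return match.group(0) if match else None


# Header value seen when the request goes through Apache (from the text above).
apache_header = "Apache/2.4.33 (Amazon) OpenSSL/1.0.2k-fips mod_jk/1.2.43"
print(first_version(apache_header))  # prints 2.4.33
```

The extracted value is Apache's version, not the Access server's, so a comparison against the client version 4.1.5 cannot succeed when the request is proxied.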
Artifactory 6.2.0 depends on Apache Derby (the specific version can be found in jfrog-artifactory-oss-6.2.0.zip\artifactory-oss-6.2.0\tomcat\lib). This should be added as a shared library to Tomcat:
mkdir /usr/share/tomcat8/shared
cd /usr/share/tomcat8/shared
wget http://central.maven.org/maven2/org/apache/derby/derby/10.11.1.1/derby-10.11.1.1.jar
Add or modify the following line in catalina.properties:
shared.loader=${catalina.home}/shared/*.jar
Since we want https://repo.mydomain.org to go to the Artifactory webapp:
mkdir /usr/share/tomcat8/host/repo.mydomain.org/ROOT
echo '<html><head><meta http-equiv="refresh" content="0;URL=/artifactory"></meta></head><body></body></html>' > /usr/share/tomcat8/host/repo.mydomain.org/ROOT/index.html
And make sure the services automatically start on reboot:
sudo chkconfig httpd on
sudo chkconfig tomcat8 on
Artifactory will then be available at the url:
https://repo.mydomain.org/artifactory/webapp/

Fluentd and webhdfs: filename with 19700101

I run td-agent on Ubuntu 14.04 with the following configuration:
<source>
type tail
format json
path /path/tomcat/logs/file-input.log
tag bhc.hdfs
pos_file /var/td-agent/file.pos
</source>
<match bhc.hdfs>
type webhdfs
port 50070
host my.host.name
path /hdfs/path/file.${hostname}.%Y%m%d.log
username user
flush_interval 10s
output_include_time false
output_include_tag false
output_data_type json
</match>
The log source file /path/tomcat/logs/file-input.log contains only structured JSON data.
The NTP daemon is installed and running, but when td-agent creates a file in HDFS, the date in the filename is 19700101.
What's wrong?
Fluentd records carry a time, and the webhdfs plugin names files using each record's timestamp, not the current time.
By default, the tail plugin uses the field named time as the record's time. If your log data keeps its time information in another field, you can specify it with time_key and time_format.
See also: http://docs.fluentd.org/articles/in_tail
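The 19700101 in the filename is what the %Y%m%d placeholders produce when a record's timestamp is 0, the Unix epoch. The Python sketch below reproduces only that formatting step. It assumes the record time ended up at (or near) 0 because no usable time value was parsed; the helper dated_path is hypothetical, the ${hostname} placeholder is left out, and UTC is used to keep the output deterministic.

```python
import time


def dated_path(template, record_time):
    """Expand strftime placeholders using the record's time (UTC)."""
    return time.strftime(template, time.gmtime(record_time))


template = "/hdfs/path/file.%Y%m%d.log"
print(dated_path(template, 0))           # record time of 0 (Unix epoch)
print(dated_path(template, 1433160000))  # a record time in June 2015
```

The first call yields a path containing 19700101 and the second one 20150601, which shows that the date in the filename follows the record's time rather than the system clock. This is why a running NTP daemon does not change the result.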