I'm looking into log analysis tools such as Splunk and Elasticsearch/Logstash. I modified my logback configuration so that it outputs all logs as JSON, which can be routed to Splunk/Logstash.
I have lots (GBs) of existing log files that I'd like to analyze. These files are in plain text. Does anyone know of a tool that can take a log file and the log pattern it was created with and use that to convert the log file to JSON?
I think Logstash & Elasticsearch are the most suitable for you. Logstash can parse your existing plain-text files (for example with a grok filter that matches your log pattern), and Elasticsearch stores the resulting documents in JSON format.
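If you would rather pre-convert the existing files yourself before shipping them, a one-off converter is not much work either. Below is a minimal Java sketch (the file name, logback pattern and regex are assumptions; adjust them to your actual layout) that parses each line with a regex mirroring the pattern and prints one JSON object per line, using Jackson for the serialization:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One-off converter sketch: the regex mirrors an assumed logback pattern
// "%d{yyyy-MM-dd HH:mm:ss} %-5level [%thread] %logger - %msg" -- adjust it to your layout.
public class LogToJson {
    public static void main(String[] args) throws Exception {
        Pattern p = Pattern.compile(
            "(?<timestamp>\\S+ \\S+)\\s+(?<level>\\w+)\\s+\\[(?<thread>[^\\]]+)\\] (?<logger>\\S+) - (?<message>.*)");
        ObjectMapper mapper = new ObjectMapper();
        for (String line : Files.readAllLines(Paths.get("app.log"))) {
            Matcher m = p.matcher(line);
            if (!m.matches()) continue; // e.g. stack-trace continuation lines
            Map<String, String> event = new LinkedHashMap<>();
            event.put("@timestamp", m.group("timestamp"));
            event.put("level", m.group("level"));
            event.put("thread", m.group("thread"));
            event.put("logger", m.group("logger"));
            event.put("message", m.group("message"));
            System.out.println(mapper.writeValueAsString(event)); // one JSON document per line
        }
    }
}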
I'm looking for ideas for open-source ETL or data-processing software that can monitor a folder for CSV files, then open and parse each CSV.
For each CSV row, the software should transform the row into JSON and make an API call to start a Camunda BPM process, passing the cell data as variables into the process.
Looking for ideas,
Thanks
You can use a Java WatchService or Spring FileSystemWatcher as discussed here with examples:
How to monitor folder/directory in spring?
which also references:
https://www.baeldung.com/java-nio2-watchservice
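As a rough illustration, a minimal WatchService sketch (the folder path is a placeholder) that blocks until new files show up and reacts to CSVs could look like this:

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class CsvFolderWatcher {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/tmp/csv-inbox"); // placeholder folder to monitor
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        while (true) {
            WatchKey key = watcher.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                Path created = dir.resolve((Path) event.context());
                if (created.toString().endsWith(".csv")) {
                    System.out.println("New CSV file: " + created);
                    // parse the CSV and start the Camunda process here
                }
            }
            key.reset(); // re-arm the key so further events are delivered
        }
    }
}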
Once you have picked up the CSV, you can use my example here as inspiration or extend it: https://github.com/rob2universe/csv-process-starter, specifically
https://github.com/rob2universe/csv-process-starter/blob/main/src/main/java/com/camunda/example/service/CsvConverter.java#L48
The example starts a configurable process for every row in the CSV and includes the content of the row as JSON process data.
I wanted to limit the dependencies of this example, so the CSV parsing logic applied is very simple. Commas inside field values may break the example, and special characters may not be handled correctly. A more robust implementation could replace the simple Java String.split(",") with an existing CSV parser library such as OpenCSV, as sketched below.
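For illustration, here is a minimal sketch of what the OpenCSV-based variant could look like (the file name and the assumption that the first line holds the column names are mine):

import com.opencsv.CSVReader;
import java.io.FileReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvRowReader {
    public static void main(String[] args) throws Exception {
        try (CSVReader reader = new CSVReader(new FileReader("input.csv"))) {
            String[] header = reader.readNext(); // first line = column names
            String[] row;
            while ((row = reader.readNext()) != null) {
                Map<String, String> variables = new LinkedHashMap<>();
                for (int i = 0; i < header.length && i < row.length; i++) {
                    variables.put(header[i], row[i]); // quoted commas are handled by OpenCSV
                }
                // hand 'variables' to the process start call, e.g. serialized as JSON
                System.out.println(variables);
            }
        }
    }
}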
The file watcher would actually be a nice extension to the example. I may add it when I get around to it, but would also accept a pull request in case you fork my project.
I have noticed there is a feature in the web interface of ArangoDB which allows users to download or upload data as a JSON file. However, I find nothing similar for CSV export. How can an existing ArangoDB collection be exported to a .csv file?
If you want to export data from ArangoDB to CSV, then you should use arangoexport. It is included in the full packages as well as the client-only packages; you will find it next to the arangod server executable.
Basic usage:
https://docs.arangodb.com/3.4/Manual/Programs/Arangoexport/Examples.html#export-csv
Also see the CSV example with AQL query:
https://docs.arangodb.com/3.4/Manual/Programs/Arangoexport/Examples.html#export-via-aql-query
Using an AQL query for a CSV export allows you to transform the data if desired, e.g. to concatenate an array to a string or unpack nested objects. If you don't do that, then the JSON serialization of arrays/objects will be exported (which may or may not be what you want).
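For reference, the two invocations look roughly like this (collection, field and file names are placeholders; check the flags against your arangoexport version):

arangoexport --type csv --collection mycollection --fields "_key,name,status" --output-directory "export"

arangoexport --type csv --query "FOR doc IN mycollection RETURN { name: doc.name, tags: CONCAT_SEPARATOR(';', doc.tags) }" --fields "name,tags" --output-directory "export"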
The default Arango install includes the following file:
/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js
It includes this comment:
// This is a generic CSV exporter for collections.
//
// Usage: Run with arangosh like this:
// arangosh --javascript.execute <CollName> [ <Field1> <Field2> ... ]
Unfortunately, at least in my experience, that usage tip is incorrect. Arango team, if you are reading this, please correct the file or correct my understanding.
Here's how I got it to work:
arangosh --javascript.execute "/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js" "<CollectionName>"
Please specify a password:
Then it sends the CSV data to stdout. (If you wish to send it to a file, you have to deal with the password prompt in some way, e.g. as shown below.)
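For example, passing the password on the command line (an empty root password is assumed here) avoids the prompt and lets you redirect stdout to a file:

arangosh --server.password "" --javascript.execute "/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js" "<CollectionName>" > MyCollection.csv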
Does LogParser support JSON log files? I am working with an app that outputs simple JSON log files into a folder and I'm trying to run aggregate SQL style queries against the files in the folder.
The format of the files is simple:
{"f1":"value", "f2":NumericValue, "f3":"DateValue", etc...}
You can try out Musoq, which already has a plugin that lets you treat JSON as a queryable source.
Unfortunately not. LogParser was written when XML was the cool thing :-)
But you can always write your own COM input-format extension, and LogParser will then be able to query JSON files as well.
Is there any way to generate the log file in XML or JSON format using Tcl?
Using log_file, the logs are stored in plain-text format.
Please suggest.
You can of course use one of the JSON or XML libraries available for Tcl to write log messages in any format you want. Presumably you want to put the data into one of those ELK stacks?
The Expect log_file command takes the -open or -leaveopen argument with a Tcl file identifier. Combine this with Tcl reflected channels and you can divert the log to some other logging system that writes the JSON you want.
See the documentation for chan create and the API description at refchan. For writing JSON, you can use json::write from tcllib.
You could probably adapt the code of tcl::chan::textwidget to dump out JSON instead of writing to a text widget.
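As a small illustration of the json::write part, a log record could be serialized like this (the field names are just an example):

package require json::write

proc log_json {chan level msg} {
    # json::write string takes care of quoting and escaping the values
    set record [json::write object \
        timestamp [json::write string [clock format [clock seconds] -format "%Y-%m-%dT%H:%M:%S"]] \
        level     [json::write string $level] \
        message   [json::write string $msg]]
    puts $chan $record
}

log_json stdout INFO "user logged in"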
How can I convert .jhist files to JSON format on OS X?
I wonder if there are validated software packages or commands for doing so.
About .jhist files: another important log for MapReduce jobs is the job history file (.jhist). These files contain a wealth of performance data on the execution of Mappers and Reducers, including HDFS statistics, data volume processed, memory allocated, etc. We configure our History Server to write the .jhist files to HDFS periodically using the mapreduce.jobhistory.done-dir parameter in yarn-site.xml.
If you are interested in the full log history, you could parse the .jhist file as an Avro file. If you are interested in one big JSON file with aggregated counters, you could check out Rumen, a parsing tool from the Apache Hadoop ecosystem.
An example run of Rumen:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.5.0-cdh5.2.6.jar \
org.apache.hadoop.tools.rumen.TraceBuilder \
file:///tmp/job-trace.json \
file:///tmp/job-topology.json \
file:///sample-job-histories/job_201211091010_0001_1352484738664_word+count
You get the aggregated counters in job-trace.json and the topology in job-topology.json.