Converting .jhist files to JSON format - json

How can I convert .jhist files to json format in OSX?
I wonder if there are validated software packages or commands for doing so?
About .jhist files: Another important log for MapReduce jobs is the job history file (.jhist). These files contain a wealth of performance data on the execution of Mappers and Reducers, including HDFS statistics, data volume processed, memory allocated, etc. We configure our History Server to write the .jhist files to HDFS periodically using the mapreduce.jobhistory.done-dir parameter in yarn-site.xml.

If you are interested in the full log history, you can parse the .jhist file as an Avro file. If you are interested in one big JSON file with aggregated counters, check out Rumen, a parsing tool from the Apache Hadoop ecosystem.
An example run of Rumen:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.5.0-cdh5.2.6.jar \
org.apache.hadoop.tools.rumen.TraceBuilder \
file:///tmp/job-trace.json \
file:///tmp/job-topology.json \
file:///sample-job-histories/job_201211091010_0001_1352484738664_word+count
This writes the aggregated counters to job-trace.json and the cluster topology to job-topology.json.
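On OS X, a quick way to see what TraceBuilder actually produced is to pretty-print the trace with jq (installable via Homebrew); the field names inside job-trace.json vary between Rumen/Hadoop versions, so it is worth inspecting the output before building anything on top of it:
# pretty-print the trace and skim the first records; field names differ across Hadoop versions
jq '.' /tmp/job-trace.json | head -n 40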

Related

Convert file format while copying data using copy activity in Azure data factory

I am performing a copy activity to bring data into Azure Data Lake using Azure Data Factory. The files are in compressed (.gz) format.
I want to copy those files but change the format to .json instead of copying them in the original format (each .gz file contains a .json file inside).
Is there a mechanism to get this done in Azure Data Factory? I want to do this because the .gz format will cause issues in the further ETL process.
Any help would be great. Thank you.
Step 1: Create a Copy activity.
Step 2: Select the .gz file as the Source.
Step 3: In the source dataset, select gzip (.gz) as the Compression type and Optimal as the Compression level.
Step 4: Select Blob storage as the Sink, leave the sink compression unset, and run the pipeline.
This will unzip your .gz file.
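The same decompression can be expressed in the source dataset JSON rather than through the portal UI; a rough sketch, in which the dataset, linked-service, container and file names are illustrative and the exact property names depend on your connector and ADF version:
{
  "name": "GzippedJsonSource",
  "properties": {
    "type": "Json",
    "linkedServiceName": { "referenceName": "AzureBlobStorageLS", "type": "LinkedServiceReference" },
    "typeProperties": {
      "location": { "type": "AzureBlobStorageLocation", "container": "input", "fileName": "data.json.gz" },
      "compression": { "type": "gzip", "level": "Optimal" }
    }
  }
}
With no compression block on the sink dataset, the copy activity writes the inner .json out uncompressed.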

Get only the content of each commit in a JSON? (git diff json)

So I'm storing on my GitHub all the differences I get on a JSON file over time (I call an API that updates the JSON each time, and I only store the differences). To give you an idea: it's changes of availability, i.e. whether an id is available or not.
What I'm trying to do now: I want to get the content of each commit as a JSON file on my local machine, so that later I can loop through all the JSON files in sequence using Node.js or Python and generate a CSV with the data that interests me.
Thank you for your help,
There are a couple of Gists that add a log2json command to git:
https://gist.github.com/textarcana/1306223#file-git-log2json-sh - the original log2json command implementation
https://gist.github.com/dmegorov/b64dcea2eed31e02c916fc6ed9111f4f#file-git-log2json-sh - my version of the above, with support for the --name-only parameter
References:
Git log output to XML, JSON, or YAML?
Git log JSON *with changed files*
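Alternatively, if all you really need is the committed version of the tracked JSON file at each point in time (rather than full log metadata), plain git commands may already be enough; a minimal sketch, assuming the tracked file is called data.json and an output directory snapshots/ (both names are illustrative):
mkdir -p snapshots
# walk the commits that touched data.json, oldest first, and dump the file as it looked at each one
for sha in $(git rev-list --reverse HEAD -- data.json); do
  git show "$sha:data.json" > "snapshots/$sha.json"
done
Each snapshots/<sha>.json can then be looped over from Node.js or Python to build the CSV.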

TCL/expect for generating log file in xml or json format

Is there any way to generate the log file in XML or JSON format using Tcl?
Using log_file, the logs are stored in plain-text format.
Please suggest.
You can of course use one of the JSON or XML libraries available for Tcl to write log messages in any format you want. Presumably the goal is to feed the data into one of those ELK stacks?
The expect log_file command takes the -open or -leaveopen argument together with a Tcl file identifier. Combine this with Tcl reflected channels and you can divert the log to some other logging system that writes the JSON you want.
See the documentation for chan create and the API description at refchan. For writing JSON, you can use json::write from tcllib.
You could probably adapt the code of tcl::chan::textwidget to dump out JSON instead of writing to a text widget.
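A rough sketch of that reflected-channel idea, assuming Tcl 8.5+ with tcllib available; the output file name json.log and the record fields are illustrative, and whether your expect build accepts a reflected channel via log_file -open is worth verifying:
package require json::write   ;# from tcllib

set ::jsonlog_fd [open json.log a]
chan configure $::jsonlog_fd -buffering line

# reflected-channel handler: every chunk written to the channel is re-emitted as JSON objects
proc jsonlog {cmd args} {
    switch -- $cmd {
        initialize {return {initialize finalize watch write}}
        finalize   {close $::jsonlog_fd; return}
        watch      {return}
        write      {
            lassign $args chanId data
            foreach line [split [string trimright $data \n] \n] {
                puts $::jsonlog_fd [json::write object \
                    time    [json::write string [clock format [clock seconds] -format %Y-%m-%dT%H:%M:%S]] \
                    message [json::write string $line]]
            }
            return [string length $data]
        }
    }
}

set ch [chan create write jsonlog]
chan configure $ch -buffering line
log_file -open $ch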

Yarn parsing job logs stored in hdfs

Is there any parser I can use to parse the JSON present in YARN job logs (.jhist files) stored in HDFS, to extract information from it?
The second line of a .jhist file is the Avro schema for the other JSON records in the file, which means you can create Avro data out of the .jhist file.
For this you could use avro-tools-1.7.7.jar
# schema is the second line
sed -n '2p;3q' file.jhist > schema.avsc
# removing the first two lines
sed '1,2d' file.jhist > pfile.jhist
# finally converting to avro data
java -jar avro-tools-1.7.7.jar fromjson pfile.jhist --schema-file schema.avsc > file.avro
Now you've got Avro data, which you can, for example, import into a Hive table and run queries on.
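And if the end goal is JSON rather than Avro (as in the question at the top), the same jar can convert back the other way:
# turn the Avro container back into one JSON record per line
java -jar avro-tools-1.7.7.jar tojson file.avro > file.json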
You can check out Rumen, a parsing tool from the Apache ecosystem.
Alternatively, when you visit the web UI, go to Job History and look for the job whose .jhist file you want to read. Hit the Counters link on the left; this page is backed by an API that gives you all the parameters and values (CPU time in milliseconds, etc.), which are read from the .jhist file itself.
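That API is the History Server's REST interface, which returns the counters as JSON, so you can also fetch them without the UI; a sketch in which the host name, the default port 19888 and the job id are all illustrative:
curl "http://historyserver.example.com:19888/ws/v1/history/mapreduce/jobs/job_201211091010_0001/counters"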

Converting existing log files to JSON format for analysis purposes

I'm looking into log analysis tools such as Splunk and Elasticsearch/Logstash. I modified my logback configuration so that it outputs all logs as JSON, which can be routed to Splunk/Logstash.
I have lots (GBs) of existing log files that I'd like to analyze. These files are in plain text. Does anyone know of tools that can take a log file and the log pattern it was created with, and use that to convert the log file to JSON?
I think Logstash & Elasticsearch are the most suitable for you: Logstash can parse the existing plain-text files and ship them to Elasticsearch, which stores the events as JSON.
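For the existing plain-text files specifically, a minimal Logstash pipeline sketch; the input path is illustrative and the grok pattern assumes a hypothetical "%d{ISO8601} %-5level [%thread] %logger - %msg" logback layout, so adapt it to your actual pattern:
input {
  file {
    path => "/var/log/myapp/*.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:thread}\] %{DATA:logger} - %{GREEDYDATA:msg}" }
  }
}
output {
  # one JSON object per line, ready for Splunk or Elasticsearch ingestion
  file {
    path => "/tmp/converted.json"
    codec => json_lines
  }
}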