YARN: parsing job logs (.jhist JSON) stored in HDFS

Is there any parser I can use to parse the JSON in YARN job logs (.jhist files) stored in HDFS, to extract information from them?

The second line of a .jhist file is the Avro schema for the JSON records that follow, which means you can turn the jhist file into Avro data.
For this you can use avro-tools-1.7.7.jar:
# schema is the second line
sed -n '2p;3q' file.jhist > schema.avsc
# removing the first two lines
sed '1,2d' file.jhist > pfile.jhist
# finally converting to avro data
java -jar avro-tools-1.7.7.jar fromjson --schema-file schema.avsc pfile.jhist > file.avro
You now have Avro data, which you can, for example, import into a Hive table and query.
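If you'd rather inspect the Avro file directly from Java, here is a minimal sketch using Avro's GenericDatumReader. It assumes the usual two top-level fields of the JobHistory event schema, type and event; file.avro is the output of the avro-tools step above.

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class JhistAvroReader {
    public static void main(String[] args) throws Exception {
        // file.avro is the output of the avro-tools "fromjson" step above
        try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
                new File("file.avro"), new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord record : reader) {
                // each record is one job history event; "type" and "event"
                // are the two top-level fields of the JobHistory schema
                System.out.println(record.get("type") + " -> " + record.get("event"));
            }
        }
    }
}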

You can check out Rumen, a parsing tool from the Apache ecosystem.
Alternatively, when you visit the web UI, go to the job history and look for the job whose .jhist file you want to read. Hit the Counters link on the left, and you will see an API that gives you all the parameters and values (CPU time in milliseconds, etc.), which it reads from the .jhist file itself.
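For reference, the History Server exposes those counters over its REST API as JSON. A minimal sketch follows; history-host and the job id are placeholders for your cluster, and 19888 is the default History Server web port.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HistoryCounters {
    public static void main(String[] args) throws Exception {
        // history-host and the job id below are placeholders for your cluster
        URL url = new URL("http://history-host:19888/ws/v1/history/mapreduce/jobs/job_1234567890123_0001/counters");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");  // ask for JSON rather than XML
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);  // one JSON document with every counter group and value
            }
        }
    }
}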

Related

CSV data streaming using Kafka

I am trying to send CSV file data through a producer into a Kafka topic, and on the consumer side I am listening for the events.
The producer is the command-line tool; I am sending the CSV file using the command below:
kafka-console-producer.bat --broker-list localhost:9092 --topic freshTopic < E:\csv\sample.csv
I successfully see the events on the consumer side as well.
Now I have to save that data in some database like Elasticsearch. For this I have to convert the CSV records into a data model. I read the tutorial below, but I am not able to understand how to write this in Java. So can anyone help me with how to convert the CSV file data into a data model? Thanks in advance.
CSV data streaming using Kafka
What you wrote will work fine for getting data into Kafka. From there, there are lots of ways to get data into Elasticsearch (which is not a database)...
You don't need Avro, as JSON will work too, but the Confluent Schema Registry doesn't handle conversion from CSV, and Kafka has no "DataModel" class; you define your own (see the sketch after this list).
Assuming you want Avro so the data lands in Elasticsearch as individual fields, then:
You could use the Kafka Connect spooldir source instead of the console producer, which would get you further along, and then you can run the Elasticsearch sink connector from there.
Use something to parse the CSV into Avro, as the link you posted shows (it doesn't have to be Python; KSQL could work too).
If you are fine with JSON, then Logstash would work as well
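Since the question asks for Java specifically: below is a minimal, hedged sketch of a plain consumer that parses each CSV line into a data-model class and serializes it to JSON with Jackson. The SampleRecord class and its fields are hypothetical; adapt them to the columns of your sample.csv.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CsvToModelConsumer {
    // hypothetical data model; replace the fields with your CSV columns
    static class SampleRecord {
        public String id;
        public String name;
        public double value;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "csv-loader");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        ObjectMapper mapper = new ObjectMapper();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("freshTopic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    String[] cols = record.value().split(",");  // naive split; use a CSV library for quoted fields
                    SampleRecord model = new SampleRecord();
                    model.id = cols[0];
                    model.name = cols[1];
                    model.value = Double.parseDouble(cols[2]);
                    // the JSON string can now be indexed into Elasticsearch
                    System.out.println(mapper.writeValueAsString(model));
                }
            }
        }
    }
}

In practice you would hand the JSON to an Elasticsearch client, or use one of the connector options above, instead of printing it.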

How to save response data to the CSV file I am generating with the Simple Data Writer in JMeter

I am executing one Thread Group which has multiple HTTP Requests, and I am capturing the error results into a CSV file using the Simple Data Writer, but I am unable to add the response data to the same file. Can you please let me know how I can add the response data to this CSV file, or whether there is another way to achieve this?
Use the Simple Data Writer but change the option to produce XML: remove CSV and select "Save as XML". There is an option "Save Response Data (XML)", which is what you require.
Set the result file extension to ".jtl" and open it in Excel to see the results.
You don't even need the Simple Data Writer for this; it can be done by amending the JMeter results file configuration:
Add the following lines to the user.properties file (it lives in the "bin" folder of your JMeter installation):
jmeter.save.saveservice.output_format=xml
jmeter.save.saveservice.response_data.on_error=true
Restart JMeter to pick up the properties.
The next time you run JMeter in command-line non-GUI mode, like:
jmeter -n -t test.jmx -l result.jtl
response data for failed samplers will be added to the result.jtl file, and you will be able to inspect it using either the View Results Tree listener or your favourite text/XML viewer/editor.

Tcl/Expect: generating a log file in XML or JSON format

Is there any way to generate the log file in XML or JSON format using Tcl?
Using log_file, the logs are stored in plain-text format.
Please suggest.
You can of course use one of the JSON or XML libraries available for Tcl to write log messages in any format you want. Presumably this is to feed the data into one of those ELK stacks?
The Expect log_file command takes the -open or -leaveopen argument with a Tcl file identifier. Combine this with Tcl's reflected channels and you can divert the log to some other logging system that writes the JSON you want.
See the documentation for chan create and the API description at refchan. For writing JSON, you can use json::write from Tcllib.
You could probably adapt the code of tcl::chan::textwidget to dump out JSON instead of writing to a text widget.

Converting .jhist files to JSON format

How can I convert .jhist files to JSON format on OS X?
Are there established software packages or commands for doing so?
About .jhist files: another important log for MapReduce jobs is the job history file (.jhist). These files contain a wealth of performance data on the execution of mappers and reducers, including HDFS statistics, data volume processed, memory allocated, etc. We configure our History Server to write the jhist files to HDFS periodically using the mapreduce.jobhistory.done-dir parameter in yarn-site.xml.
If you are interested in the full log history, you could parse the file as Avro (as described in the first question above). If you are interested in one big JSON file with aggregated counters, you could check out Rumen, a parsing tool from the Apache ecosystem.
An example run of Rumen:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.5.0-cdh5.2.6.jar \
org.apache.hadoop.tools.rumen.TraceBuilder \
file:///tmp/job-trace.json \
file:///tmp/job-topology.json \
file:///sample-job-histories/job_201211091010_0001_1352484738664_word+count
and you get the aggregated counters in job-trace.json and the topology in job-topology.json.
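If you then want to pull values out of the trace from Java, here is a small, hedged sketch using Jackson. It assumes the trace holds a sequence of JSON job objects; the exact field names depend on your Rumen version, so it just lists what is available.

import java.io.File;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TraceInspector {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // read the root-level JSON values one by one, in case the trace
        // contains a sequence of job objects rather than a single array
        MappingIterator<JsonNode> jobs = mapper.readerFor(JsonNode.class)
                .readValues(new File("/tmp/job-trace.json"));
        while (jobs.hasNext()) {
            JsonNode job = jobs.next();
            // list the available top-level fields for each job record
            job.fieldNames().forEachRemaining(System.out::println);
        }
    }
}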

How to export JMeter results to JSON?

We run load tests with JMeter and would like to export the result data (throughput, latency, requests per second, etc.) to JSON, either to a file or to STDOUT. How can we do that?
JMeter can save the results in CSV format with a header.
(Do not forget to tick Save Field Names; it is OFF by default.)
Then you can use this tool to convert the CSV to JSON (or see the Java sketch at the end of this answer):
http://www.convertcsv.com/csv-to-json.htm
EDIT
JMeter stores the result in XML or CSV format. XML is the default (with the .jtl extension), but it is generally recommended to save the result in CSV format.
If you want to convert XML to JSON:
http://www.utilities-online.info/xmltojson/#.U9O2ifldVBk
If you are planning to use CSV, you can save the result in CSV format automatically.
When you run your test via the command line, to save the result as CSV for a specific test:
"%JMETER_HOME%\bin\jmeter.bat" -n -t %TESTNAME% -p %PROPERTY_FILE_PATH% -l %RESULT_FILE_PATH% -j %LOG_FILE_PATH% -Djmeter.save.saveservice.output_format=csv
Or
You can update jmeter.properties in the bin folder to enable the property below (for any test you run):
jmeter.save.saveservice.output_format=csv
Hope it is clear!
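If you prefer an offline conversion instead of the online tool above, here is a minimal sketch using Jackson's CSV module (jackson-dataformat-csv); the file names result.jtl and result.json are placeholders.

import java.io.File;
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class JtlCsvToJson {
    public static void main(String[] args) throws Exception {
        CsvMapper csvMapper = new CsvMapper();
        // take the field names from the header row that "Save Field Names" produces
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        MappingIterator<Map<String, String>> rows = csvMapper
                .readerFor(Map.class).with(schema)
                .readValues(new File("result.jtl"));
        List<Map<String, String>> samples = rows.readAll();
        // write one JSON array with an object per sample
        new ObjectMapper().writerWithDefaultPrettyPrinter()
                .writeValue(new File("result.json"), samples);
    }
}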
There is no out-of-the-box solution for this, but you could take inspiration from this patch:
https://issues.apache.org/bugzilla/show_bug.cgi?id=53668