JSON file to CSV file conversion when my JSON columns are dynamic - json

I found a solution for converting JSON to CSV. The sample JSON and the solution are below.
{
"took" : 111,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "alerts",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"alertID" : "639387c3-0fbe-4c2b-9387-c30fbe7c2bc6",
"alertCategory" : "Server Alert",
"description" : "Successfully started.",
"logId" : null
}
},
{
"_index" : "alerts",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"alertID" : "2",
"alertCategory" : "Server Alert",
"description" : "Successfully stoped.",
"logId" : null
}
}
]
}
}
The solution:
jq -r '.hits.hits[]._source | [ "alertID" , "alertCategory" , "description", "logId" ], ([."alertID",."alertCategory",."description",."logId" // "null"]) | @csv' < /root/events.json
The problem with this solution is that I have to hard-code the column names. What if my JSON gets a few additions under the _source tag later? I need a solution that can handle dynamic data under _source. I am open to any other tool or shell command.

Simply use keys_unsorted (or keys if you want them sorted). See e.g. Convert JSON array into CSV using jq or How to convert arbitrary simple JSON to CSV using jq? for two SO examples. There are many others too.
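Since the question is open to other tools, here is a minimal Python sketch of the same idea: discover the column names from the data instead of hard-coding them. The function name hits_to_csv and the extra severity field are made up for illustration, and the sketch assumes every _source object is flat (no nested objects or arrays).

```python
import csv
import io


def hits_to_csv(response: dict) -> str:
    """Render every _source object as a CSV row, discovering columns dynamically."""
    sources = [hit["_source"] for hit in response["hits"]["hits"]]
    # Union of keys across all rows, in first-seen order (similar to keys_unsorted).
    columns = list(dict.fromkeys(key for src in sources for key in src))
    buffer = io.StringIO()
    # restval fills in columns that a given row does not have.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    for src in sources:
        # Write JSON null as the literal string "null", as the jq command above does.
        writer.writerow({k: ("null" if v is None else v) for k, v in src.items()})
    return buffer.getvalue()


# Hypothetical sample: the second hit carries an extra "severity" field.
sample = {
    "hits": {"hits": [
        {"_source": {"alertID": "1", "alertCategory": "Server Alert",
                     "description": "Started", "logId": None}},
        {"_source": {"alertID": "2", "alertCategory": "Server Alert",
                     "description": "Stopped", "logId": None, "severity": "high"}},
    ]}
}

print(hits_to_csv(sample))
```

A new key under _source then shows up as a new column without any change to the script; rows that lack the key get an empty field.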

Related

Jmeter: Extracting JSON response with special/spaces characters

Hello, can someone help me extract the value of the user parameter, which is "testuser1"?
I tried the JSON Path expression $..data and was able to extract the entire response, but I am unable to extract the user parameter. Thanks in advance.
{
"data": "{ "took" : 13, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.0, "hits" : [ { "_index" : "bushidodb_history_network_eval_ea9656ef-0a9b-474b-8026-2f83e2eb9df1_2021-april-10", "_type" : "network", "_id" : "6e2e58be-0ccf-3fb4-8239-1d4f2af322e21618059082000", "_score" : 1.0, "_source" : { "misMatches" : [ "protocol", "state", "command" ], "instance" : "e3032804-4b6d-3735-ac22-c827950395b4|0.0.0.0|10.179.155.155|53|UDP", "protocol" : "UDP", "localAddress" : "0.0.0.0", "localPort" : "12345", "foreignAddress" : "10.179.155.155", "foreignPort" : "53", "command" : "ping yahoo.com ", "user" : "testuser1", "pid" : "10060", "state" : "OUTGOINGFQ", "rate" : 216.0, "originalLocalAddress" : "192.168.100.229", "exe" : "/bin/ping", "md5" : "f9ad63ce8592af407a7be43b7d5de075", "dir" : "", "agentId" : "abcd-dcd123", "year" : "2021", "month" : "APRIL", "day" : "10", "hour" : "12", "time" : "1618059082000", "isMerged" : false, "timestamp" : "Apr 10, 2021 12:51:22 PM", "metricKey" : "6e2e58be-0ccf-3fb4-8239-1d4f2af322e2", "isCompliant" : false }, "sort" : [ 1618059082000 ] } ] }, "aggregations" : { "count_over_time" : { "buckets" : [ { "key_as_string" : "2021-04-10T08:00:00.000-0400", "key" : 1618056000000, "doc_count" : 1 } ] } }}",
"success": true,
"message": {
"code": "S",
"message": "Get Eval results Count Success"
}
}
Actual response: [image]
What you posted doesn't look like valid JSON to me.
If in reality you're getting what's in your image, to wit:
{
"data": "{ \"took\" : 13, \"timed_out\" : false, \"_shards\" : { \"total\" : 5, \"successful\" : 5, \"skipped\" : 0, \"failed\" : 0 }, \"hits\" : { \"total\" : 1, \"max_score\" : 1.0, \"hits\" : [ { \"_index\" : \"bushidodb_history_network_eval_ea9656ef-0a9b-474b-8026-2f83e2eb9df1_2021-april-10\", \"_type\" : \"network\", \"_id\" : \"6e2e58be-0ccf-3fb4-8239-1d4f2af322e21618059082000\", \"_score\" : 1.0, \"_source\" : { \"misMatches\" : [ \"protocol\", \"state\", \"command\" ], \"instance\" : \"e3032804-4b6d-3735-ac22-c827950395b4|0.0.0.0|10.179.155.155|53|UDP\", \"protocol\" : \"UDP\", \"localAddress\" : \"0.0.0.0\", \"localPort\" : \"12345\", \"foreignAddress\" : \"10.179.155.155\", \"foreignPort\" : \"53\", \"command\" : \"pingyahoo.com\", \"user\" : \"testuser1\", \"pid\" : \"10060\", \"state\" : \"OUTGOINGFQ\", \"rate\" : 216.0, \"originalLocalAddress\" : \"192.168.100.229\", \"exe\" : \"/bin/ping\", \"md5\" : \"f9ad63ce8592af407a7be43b7d5de075\", \"dir\" : \"\", \"agentId\" : \"abcd-dcd123\", \"year\" : \"2021\", \"month\" : \"APRIL\", \"day\" : \"10\", \"hour\" : \"12\", \"time\" : \"1618059082000\", \"isMerged\" : false, \"timestamp\" : \"Apr10, 202112: 51: 22PM\", \"metricKey\" : \"6e2e58be-0ccf-3fb4-8239-1d4f2af322e2\", \"isCompliant\" : false }, \"sort\" : [ 1618059082000 ] } ] }, \"aggregations\" : { \"count_over_time\" : { \"buckets\" : [ { \"key_as_string\" : \"2021-04-10T08: 00: 00.000-0400\", \"key\" : 1618056000000, \"doc_count\" : 1 } ] } }}",
"success": true,
"message": {
"code": "S",
"message": "Get Eval results Count Success"
}
}
the easiest way is to use two JSON Extractors:
1. Extract the data attribute value from the response into a JMeter Variable.
2. Extract the user attribute value from the ${data} JMeter Variable into another JMeter Variable.
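The reason two extraction steps are needed is that data holds a JSON document encoded as a string, so it has to be decoded twice. Outside JMeter, the same two steps can be sketched in Python like this (extract_user and the shortened sample response are illustrative assumptions, not part of the original post):

```python
import json


def extract_user(response_text: str) -> str:
    """Decode the outer response, then decode the JSON string stored in "data"."""
    outer = json.loads(response_text)   # step 1: get the "data" attribute
    inner = json.loads(outer["data"])   # step 2: parse the string it contains
    return inner["hits"]["hits"][0]["_source"]["user"]


# Shortened, hypothetical response with the same shape as the escaped one above.
inner_doc = {"hits": {"hits": [{"_source": {"user": "testuser1", "pid": "10060"}}]}}
response_text = json.dumps({"data": json.dumps(inner_doc), "success": True})

print(extract_user(response_text))
```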
If the response looks exactly like what you posted, you won't be able to use JSON Extractors and will have to treat it as plain text, so your choice is limited to the Regular Expression Extractor. Example regular expression:
"user"\s*:\s*"(\w+)"
Add a Regular Expression Extractor to the corresponding request and use the expression below to extract the value.
Expression: "user" : "(.*?)"
Ref: https://jmeter.apache.org/usermanual/regular_expressions.html
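To check such an expression outside JMeter, a quick Python sketch can help. It applies the whitespace-tolerant pattern from the first answer to a shortened, hypothetical copy of the unescaped response, treated as plain text because it is not valid JSON:

```python
import re

# Same pattern as suggested above; group 1 captures the user name.
USER_PATTERN = re.compile(r'"user"\s*:\s*"(\w+)"')

# Shortened, hypothetical response text with unescaped inner quotes (invalid JSON).
raw = ('{ "data": "{ "took" : 13, "hits" : { "hits" : [ { "_source" : '
       '{ "command" : "ping yahoo.com ", "user" : "testuser1", "pid" : "10060" } } ] } }", '
       '"success": true }')

match = USER_PATTERN.search(raw)
user = match.group(1) if match else None
print(user)
```

Note that \w+ only matches letters, digits, and underscores, so a user name containing other characters (a dot or a hyphen, say) would need the lazier (.*?) form instead.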

JSON file to CSV file conversion using jq

I am trying to convert my json file to a csv file using jq. Below is the sample input events.json file.
{
"took" : 111,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "alerts",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"alertID" : "639387c3-0fbe-4c2b-9387-c30fbe7c2bc6",
"alertCategory" : "Server Alert",
"description" : "Successfully started.",
"logId" : null
}
},
{
"_index" : "alerts",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"alertID" : "2",
"alertCategory" : "Server Alert",
"description" : "Successfully stoped.",
"logId" : null
}
}
]
}
}
My rows in csv should have the data inside each _source tag. So my columns would be alertId , alertCategory , description and logId with its respective data.
I tried the below command :
jq --raw-output '.hits[] | [."alertId",."alertCategory",."description",."logId"] | @csv' < /root/events.json
and it's not working.
Can anyone help me with this?
Your path expression is not right: you have a hits array inside an object named hits, and the fields you are trying to put in the CSV are under the _source object.
So your expression should have been the one below. Use it along with the -r flag to get raw output
.hits.hits[]._source | [ .alertID, .alertCategory, .description, .logId ] | @csv
If a field is null, @csv writes it as an empty field. If you want an explicit "null" string instead, use the alternative operator (//) on the field you expect to be null, e.g. instead of .logId, you can do (.logId // "null"). Note that // also replaces false.
To add the column names as a header in the output CSV, put a header array before the rows and pass both through @csv or join(",") with the -r flag. join(",") does no quoting, so it is only safe when the values contain no commas or quotes.
[ "alertId" , "alertCategory" , "description", "logId" ],
( .hits.hits[]._source | [ .alertID, .alertCategory, .description, .logId // "null" ]) | @csv
or
[ "alertId" , "alertCategory" , "description", "logId" ],
( .hits.hits[]._source | [ .alertID, .alertCategory, .description, .logId // "null" ]) | join(",")

Is there any another way to optimize this elasticsearch query for multiple nested fields in JSON

I am new to Elasticsearch. Below is the sample data the query needs to run on. I am trying to get the documents in which account_type_name is "Credit Card" and source_name is 'SOMEVALUE'
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bureau_data",
"_type" : "_doc",
"_id" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
"_score" : 1.0,
"_source" : {
"userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
"raw_derived" : {
"gender" : "MALE",
"firstname" : "trsqlsz",
"middlename" : "rgj",
"lastname" : "ggksb",
"mobilephone" : "2125954664",
"dob" : "1988-06-28 00:00:00",
"applications" : [
{
"applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
"createdat" : "2019-06-07 19:28:54",
"updatedat" : "2019-06-07 19:28:55",
"source" : "4",
"source_name" : "EXPERIAN",
"applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
"accounts" : [
{
"applicationcreditreportaccountid" : "c5de28c4-cac9-4390-852a-96f143cb0b62",
"currentbalance" : 418288,
"institutionid" : "021d58b4-aba5-42c9-8d39-304a78d34aea",
"accounttypeid" : "5",
"institution_name" : "HDFC BANK",
"account_type_name" : "Personal Loan"
}
]
}
]
}
}
}
]
}
}
I have tried the query below and it works fine. I would like to know whether there is a more optimized way to query multiple nested fields
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "raw_derived.applications.accounts",
"query": {
"bool": {
"must": [
{"match": {
"raw_derived.applications.accounts.account_type_name": "Credit Card"
}}
]
}
}
}
},
{
"nested": {
"path": "raw_derived.applications",
"query": {
"bool": {
"must": [
{"match": {
"raw_derived.applications.source_name": "CIBIL"
}}
]
}
}
}
}
]
}
}
}
If I query more nested fields, the query becomes very long. Please suggest another way to query nested fields or to combine multiple AND conditions.
Well, your optimizations should always start with your data model / mapping, since that is usually the cause of performance issues rather than your queries.
That being said, you can avoid the nested query by flattening your data. A flattened data model leads to one document per application and account element.
Since Elasticsearch is a non-relational data store, it is completely fine to index "redundant" data. This is not a lazy approach but a common way to handle this type of data structure.
Sample document #1:
{
"_index" : "bureau_data",
"_type" : "_doc",
"_id" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
"_score" : 1.0,
"_source" : {
"userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
"gender" : "MALE",
"firstname" : "trsqlsz",
"middlename" : "rgj",
"lastname" : "ggksb",
"mobilephone" : "2125954664",
"dob" : "1988-06-28 00:00:00",
"applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
"createdat" : "2019-06-07 19:28:54",
"updatedat" : "2019-06-07 19:28:55",
"source" : "4",
"source_name" : "EXPERIAN",
"applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
"applicationcreditreportaccountid" : "c5de28c4-cac9-4390-852a-96f143cb0b62",
"currentbalance" : 418288,
"institutionid" : "021d58b4-aba5-42c9-8d39-304a78d34aea",
"accounttypeid" : "5",
"institution_name" : "HDFC BANK",
"account_type_name" : "Personal Loan"
}
}
If the same user creates another account, you would send the very same ("redundant") data, except for the other account's fields, like so:
{
"_index" : "bureau_data",
"_type" : "_doc",
"_id" : "another, from es generated id",
"_score" : 1.0,
"_source" : {
"userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
"gender" : "MALE",
"firstname" : "trsqlsz",
"middlename" : "rgj",
"lastname" : "ggksb",
"mobilephone" : "2125954664",
"dob" : "1988-06-28 00:00:00",
"applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
"createdat" : "2019-06-07 19:28:54",
"updatedat" : "2019-06-07 19:28:55",
"source" : "4",
"source_name" : "EXPERIAN",
"applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
"applicationcreditreportaccountid" : "the new id",
"currentbalance" : 4711,
"institutionid" : "foo",
"accounttypeid" : "bar",
"institution_name" : "foo bar",
"account_type_name" : "foo baz"
}
}
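Producing such flat documents from the nested structure could be done before indexing. The following is a minimal Python sketch under the assumption that the nested source has the shape shown in the question; flatten_user is a hypothetical helper, not an Elasticsearch API.

```python
from typing import Iterator


def flatten_user(source: dict) -> Iterator[dict]:
    """Yield one flat document per (application, account) pair."""
    derived = source["raw_derived"]
    user_fields = {k: v for k, v in derived.items() if k != "applications"}
    for application in derived.get("applications", []):
        app_fields = {k: v for k, v in application.items() if k != "accounts"}
        for account in application.get("accounts", []):
            # On a key clash, the more deeply nested value overrides the outer one.
            yield {"userid": source["userid"], **user_fields, **app_fields, **account}


# Shortened, hypothetical source document: one application with two accounts.
nested = {
    "userid": "bda57e01",
    "raw_derived": {
        "gender": "MALE",
        "applications": [{
            "applicationid": "c7fb0147",
            "source_name": "EXPERIAN",
            "accounts": [
                {"currentbalance": 418288, "account_type_name": "Personal Loan"},
                {"currentbalance": 4711, "account_type_name": "Credit Card"},
            ],
        }],
    },
}

flat_docs = list(flatten_user(nested))
for doc in flat_docs:
    print(doc)
```

Each yielded dictionary repeats the user and application fields, which is exactly the "redundant" layout described above.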
With that kind of data model, you can run simple queries to get your results:
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{
"match":{
"account_type_name": "Credit Card"
}
},
{
"match":{
"source_name": "CIBIL"
}
}
]
}
}
}
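With flat fields, adding another AND condition is just one more match clause, so the request body can be generated instead of written by hand. A small Python sketch follows (build_and_query is a made-up helper; it only builds the JSON body and does not talk to Elasticsearch):

```python
import json


def build_and_query(criteria: dict) -> dict:
    """Build a bool/must query with one match clause per field/value pair."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: value}} for field, value in criteria.items()]
            }
        }
    }


body = build_and_query({"account_type_name": "Credit Card", "source_name": "CIBIL"})
print(json.dumps(body, indent=2))
```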

How to get nested json object/value in scala

I have an Elasticsearch search response that is a deeply nested JSON document, and I am stuck on how to get a particular value out of it. I am new to Scala and to programming in general; I have searched online and could not find an answer that explains it well.
This is the JSON, and the value I want to get out is "getSum":"value"
Search_response: org.elasticsearch.action.search.SearchResponse = {
"took" : 32,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 12,
"max_score" : 1.0,
"hits" : [ {
"_index" : "myIndex",
"_type" : "myType",
"_id" : "4151202002020",
"_score" : 1.0,
"_source":{"pint":[{"printer":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Lam":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Kam":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Jas":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"tiv":[{ourc""s:"wrer","sourceType":"rsd","Vag":"agaatttt363336"}],"timeLineSource:[{"LA":"DGAT","GATA":"JAS","timeline":9.111694,"GA":"SFWF2525252552552525"}
}, {
"_index" : "myIndex",
"_type" : "myType",
"_id" : "4151202002020",
"_score" : 1.0,
"_source":{"pint":[{"printer":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Lam":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Kam":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Jas":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"tiv":[{ourc""s:"wrer","sourceType":"rsd","Vag":"agaatttt363336"}],"timeLineSource:[{"LA":"DGAT","GATA":"JAS","timeline":9.111694,"GA":"SFWF2525252552552525"}
}, {
"_index" : "myIndex",
"_type" : "myType",
"_id" : "4151202002020",
"_score" : 1.0,
"_source":{"pint":[{"printer":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Lam":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Kam":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"},{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"Jas":[{"sourceName":"3636636","sourceType":"Bin","Star":0.0,"Fun":"gatayay"}],"tiv":[{ourc""s:"wrer","sourceType":"rsd","Vag":"agaatttt363336"}],"timeLineSource:[{"LA":"DGAT","GATA":"JAS","timeline":9.111694,"GA":"SFWF2525252552552525"}
}, {
},
"aggregations" : {
"DAEY" : {
"doc_count" : 59,
"histogram" : {
"buckets" : [ {
"key_as_string" : "1978-02-22T00:00:00.000Z",
"key" : 1503360000000,
"doc_count" : 59,
"nestedValue" : {
"doc_count" : 177,
"getSum" : {
"value" : 768.0690221786499
}
},
}
}
}
}
This is what I tried
val getResult: String = searchResult.toString.stripMargin
val getValue = JsonParser.parse(getResult).asInstanceOf[JObject].values("aggregations").toString
You can solve this by using Typesafe Config. Please find the required Maven and sbt dependencies below -
Maven Dependency -
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.3.1</version>
</dependency>
Sbt Dependency -
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
Afterwards, you can get the value of the sum with the code below -
import com.typesafe.config.ConfigFactory
val config = ConfigFactory.parseString(getResult)
// In the response above, buckets sits under "histogram"; getDouble returns the numeric value
config.getConfigList("aggregations.DAEY.histogram.buckets").get(0).getDouble("nestedValue.getSum.value")
Check out the library's API documentation for details.
I finally used
val getResult: String = searchResult.toString.stripMargin
val getValue = JsonParser.parse(getResult).asInstanceOf[JObject].values("aggregations").toString
val valueToDouble = getValue.split(" ").last.dropRight(13).toDouble
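Splitting the string and trimming a fixed number of characters depends on the exact formatting of the response. Walking the parsed structure key by key is more robust. The sketch below shows that walk in Python purely as an illustration of the path (the same sequence of keys applies with any Scala JSON library); dig is a hypothetical helper and the sample is a cleaned-up, shortened version of the aggregations part of the response.

```python
import json
from functools import reduce


def dig(document, path):
    """Follow a sequence of dict keys and list indexes into a decoded JSON value."""
    return reduce(lambda node, step: node[step], path, document)


# Cleaned-up, shortened copy of the "aggregations" section shown in the question.
response = json.loads("""
{"aggregations": {"DAEY": {"doc_count": 59, "histogram": {"buckets": [
  {"key": 1503360000000, "doc_count": 59,
   "nestedValue": {"doc_count": 177, "getSum": {"value": 768.0690221786499}}}
]}}}}
""")

value = dig(response, ["aggregations", "DAEY", "histogram", "buckets", 0,
                       "nestedValue", "getSum", "value"])
print(value)
```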

Extract JSON value using Jmeter

I have this JSON:
{
"totalMemory" : 12206567424,
"totalProcessors" : 4,
"version" : "0.4.1",
"agent" : {
"reconnectRetrySec" : 5,
"agentName" : "1001",
"checkRecovery" : false,
"backPressure" : 10000,
"throttler" : 100
},
"logPath" : "/eq/equalum/eqagent-0.4.1.0-SNAPSHOT/logs",
"startTime" : 1494837249902,
"status" : {
"current" : "active",
"currentMessage" : null,
"previous" : "pending",
"previousMessage" : "Recovery:Starting pipelines"
},
"autoStart" : false,
"recovery" : {
"agentName" : "1001",
"partitionInfo" : { },
"topicToInitialCapturePosition" : { }
},
"sources" : [ {
"dataSource" : "oracle",
"name" : "oracle_source",
"captureType" : "directOverApi",
"streams" : [ ],
"idlePollingFreqMs" : 100,
"status" : {
"current" : "active",
"currentMessage" : null,
"previous" : "pending",
"previousMessage" : "Trying to init storage"
},
"host" : "192.168.191.5",
"metricsType" : { },
"bulkSize" : 10000,
"user" : "STACK",
"password" : "********",
"port" : 1521,
"service" : "equalum",
"heartbeatPeriodInMillis" : 1000,
"lagObjective" : 1,
"dataSource" : "oracle"
} ],
"upTime" : "157 min, 0 sec",
"build" : "0-SNAPSHOT",
"target" : {
"targetType" : "equalum",
"agentID" : 1001,
"engineServers" : "192.168.56.100:9000",
"kafkaOptions" : null,
"eventsServers" : "192.168.56.100:9999",
"jaasConfigurationPath" : null,
"securityProtocol" : "PLAINTEXT",
"stateMonitorTopic" : "_state_change",
"targetType" : "equalum",
"status" : {
"current" : "active",
"currentMessage" : null,
"previous" : "pending",
"previousMessage" : "Recovery:Starting pipelines"
},
"serializationFormat" : "avroBinary"
}
}
I am trying to use JMeter to extract the value of agentID. How can I do that, and which is better: the Regular Expression Extractor or the JSON Extractor?
I want to use the extracted agentID value in another HTTP Request sampler, but first I have to extract it from this response.
Thanks!
I believe using the JSON Extractor is the best way to get this agentID value; the relevant JsonPath query is as simple as $..agentID
See the following reference material:
JsonPath - Getting Started - for initial information regarding JsonPath language, functions, operators, etc.
JMeter's JSON Path Extractor Plugin - Advanced Usage Scenarios - for more complex scenarios.
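For intuition about what the $..agentID query does, here is a rough Python sketch of a recursive descent that collects every value stored under a given key at any depth. It is an approximation for illustration, not JMeter's JsonPath implementation; find_all and the shortened sample are assumptions.

```python
def find_all(node, key):
    """Recursively collect every value stored under `key`, roughly like `$..key`."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Shortened, hypothetical version of the JSON from the question.
document = {
    "agent": {"agentName": "1001", "throttler": 100},
    "sources": [{"name": "oracle_source", "port": 1521}],
    "target": {"targetType": "equalum", "agentID": 1001},
}

print(find_all(document, "agentID"))
```

Because the search ignores where the key sits, it keeps working if agentID moves to another part of the response, but it also returns every match when the key occurs more than once.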