How to replace \" with " in string columns scala - json

I have a column containing badly formatted nested JSON strings that I'm trying to clean up with regexp_replace so that Scala can read the column as a struct. There are stray \" sequences around each item in the JSON that I want to replace with a plain ".
{
{"id": "1", "json": [
{ \"details\": {\n \"name\" : \"john\", \n \"lastname\" : \"doe"\ \n},
\"location\": {\n \"city\" : \"new york\", \n \"country\" : \"usa\" \n} },
{ \"details\": {\n \"name\" : \"jane\", \n \"lastname\" : \"random"\ \n},
\"location\": {\n \"city\" : \"new york\", \n \"country\" : \"usa\" \n} },
] },
{"id": "2", "json": [
{ \"details\": {\n \"name\" : \"jack\", \n \"lastname\" : \"ryan"\ \n},
\"location\": {\n \"city\" : \"york\", \n \"country\" : \"uk\" \n} },
{ \"details\": {\n \"name\" : \"jill\", \n \"lastname\" : \"test"\ \n},
\"location\": {\n \"city\" : \"LA\", \n \"country\" : \"usa\" \n} },
] }
}
I was able to remove the \n with regexp_replace, but I'm struggling with the \" wrapping each item.
var newdf = df.withColumn( "clean_json",
regexp_replace(regexp_replace(col("json"), "\n", ""), """\"""", "\\\""))
I've tried using both \\\ and """ as escape characters. Nothing seems to work.

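A backslash has to be escaped twice, once for the string literal and once for the regex, so to match a literal \ the pattern needs \\\\.
from pyspark.sql import functions as f
from pyspark.sql.types import StringType
df = spark.createDataFrame(['\\"'], StringType()).toDF('value')
df.withColumn('new_value', f.regexp_replace('value', '\\\\"', '')).show(truncate=False)
+-----+---------+
|value|new_value|
+-----+---------+
|\"   |         |
+-----+---------+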
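The same escaping idea can be tried outside Spark. The sketch below uses Python's re module rather than Spark's regexp_replace, and the helper name unescape_json_text and the sample value are made up for illustration. It assumes the column holds the two-character sequences \n and \" literally, and it replaces \" with " (instead of deleting it) so that the result parses as JSON.

```python
import json
import re


def unescape_json_text(raw: str) -> str:
    """Remove literal backslash-n pairs and turn backslash-quote into a plain quote."""
    # r'\\n' is the regex for a backslash followed by the letter n.
    without_newlines = re.sub(r'\\n', '', raw)
    # r'\\"' is the regex for a backslash followed by a double quote.
    return re.sub(r'\\"', '"', without_newlines)


# Hypothetical sample value, shaped like one item from the question.
raw = r'{ \"details\": {\n \"name\" : \"john\", \n \"lastname\" : \"doe\" \n} }'

cleaned = unescape_json_text(raw)
record = json.loads(cleaned)
print(record["details"]["name"])
```

A raw string (r'...') keeps the pattern readable because only the regex layer of escaping remains. In a normal string literal each backslash would have to be doubled again, which is where the four backslashes come from.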

Related

Access a JSON string value, undefined

I have this example where I am trying to access a JSON value, but I get undefined value for id.
My JSON:
{
"success": true,
"msg": "",
"obj": [
{
"remark": "test-1",
"settings": "{\n \"clients\": [\n {\n \"id\": \"430c867306d8\",\n \"alterId\": 0\n }\n ],\n \"disableInsecureEncryption\": false\n}",
},
{
"remark": "test-2",
"settings": "{\n \"clients\": [\n {\n \"id\": \"9831d43186de\",\n \"alterId\": 0\n }\n ],\n \"disableInsecureEncryption\": false\n}",
}
]
}
I want to fetch remark and id, so I wrote:
const remark = data.obj[i].remark;
const settings = data.obj[i].settings.clients[0].id;
Note: data is where my data is in actual code.
settings is a JSON string, not an object, so it has to be parsed before you can access its properties:
const settings = JSON.parse(data.obj[i].settings).clients[0].id;
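For comparison, here is a minimal Python sketch of the same idea: the outer structure is already parsed, but each settings value is still a string and needs its own json.loads call. The function name client_ids is hypothetical and the data is a trimmed copy of the sample from the question.

```python
import json

# Trimmed copy of the sample: "settings" holds JSON text, not an object.
data = {
    "obj": [
        {"remark": "test-1",
         "settings": "{\n \"clients\": [ {\"id\": \"430c867306d8\", \"alterId\": 0} ]\n}"},
        {"remark": "test-2",
         "settings": "{\n \"clients\": [ {\"id\": \"9831d43186de\", \"alterId\": 0} ]\n}"},
    ]
}


def client_ids(payload):
    """Return (remark, first client id) pairs, parsing each settings string."""
    pairs = []
    for entry in payload["obj"]:
        settings = json.loads(entry["settings"])  # second parse step
        pairs.append((entry["remark"], settings["clients"][0]["id"]))
    return pairs


print(client_ids(data))
```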

Getting the first element of json data with jq

I'm working with the Poloniex API. The returnTicker command returns data like this:
curl "https://poloniex.com/public?command=returnTicker"
{
"BTC_BTS": {
"id": 14,
"last": "0.00000111",
"lowestAsk": "0.00000112",
"highestBid": "0.00000110",
"percentChange": "0.09900990",
"baseVolume": "3.12079869",
"quoteVolume": "2318738.79293715",
"isFrozen": "0",
"high24hr": "0.00000152",
"low24hr": "0.00000098"
},
"BTC_DASH": {
"id": 24,
"last": "0.00466173",
"lowestAsk": "0.00466008",
"highestBid": "0.00464358",
"percentChange": "0.02318430",
"baseVolume": "1.98111396",
"quoteVolume": "425.22973220",
"isFrozen": "0",
"high24hr": "0.00482962",
"low24hr": "0.00450482"
....
},
"USDT_GRT": {
"id": 497,
"last": "0.72811272",
"lowestAsk": "0.75999916",
"highestBid": "0.72740000",
"percentChange": "0.48594450",
"baseVolume": "133995.43411815",
"quoteVolume": "194721.36672887",
"isFrozen": "0",
"high24hr": "0.79000000",
"low24hr": "0.45000020"
},
"TRX_SUN": {
"id": 498,
"last": "500.00000000",
"lowestAsk": "449.99999999",
"highestBid": "100.00000000",
"percentChange": "0.00000000",
"baseVolume": "0.00000000",
"quoteVolume": "0.00000000",
"isFrozen": "0",
"high24hr": "0.00000000",
"low24hr": "0.00000000"
}
}
I want the output like this
BTC_BTS : 14 : 0.00000111 : 0.00000112 : 0.00000110 : 0.09900990 : 3.12079869 : 2318738.79293715 : 0 : 0.00000152 : 0.00000098
...
USDT_GRT : 497 : 0.72428700 : 0.75999958 : 0.72630001 : 0.47813685 : 133968.74968533 : 194695.96886712 : 0 : 0.79000000 : 0.45000020
TRX_SUN : 498 : 500.00000000 : 449.99999999 : 100.00000000 : 0.00000000 : 0.00000000 : 0.00000000 : 0 : 0.00000000 : 0.00000000
I am using jq, and my problem is accessing the currency pair name.
So far I can produce this:
14 : 0.00000111 : 0.00000112 : 0.00000110 : 0.09900990 : 3.12079869 : 2318738.79293715 : 0 : 0.00000152 : 0.00000098
...
497 : 0.72428700 : 0.75999958 : 0.72630001 : 0.47813685 : 133968.74968533 : 194695.96886712 : 0 : 0.79000000 : 0.45000020
498 : 500.00000000 : 449.99999999 : 100.00000000 : 0.00000000 : 0.00000000 : 0.00000000 : 0 : 0.00000000 : 0.00000000
using this command:
curl "https://poloniex.com/public?command=returnTicker" |jq -r | jq '.[] | (.id|tostring) + " : " + (.last|tostring) + " : " + (.lowestAsk|tostring) + " : " + (.highestBid|tostring) + " : " + (.percentChange|tostring) + " : " + (.baseVolume|tostring) + " : " + (.quoteVolume|tostring) + " : " + (.isFrozen|tostring) + " : " + (.high24hr|tostring) + " : " + (.low24hr|tostring)'|jq -r
This is not limited to this command: in every jq pipeline I try, I can't access the first level of the JSON, that is, the key names themselves.
I don't mean a pipeline such as |jq .BTC_BTS or |jq .USDT_GRT, where the key is given explicitly.
|jq . gives the whole JSON, and |jq .[] gives only the inner objects, without their keys.
How can I access that first level of the path?
By the way, my jq pipeline may well be needlessly long and clumsy. If you have a better way to convert the whole JSON to row-and-column data, I am open to suggestions.
Thank you all for your answers.
To be safe, it might be better not to assume that the ordering of the keys is the same in all the inner objects. Ergo:
keys_unsorted as $outer
| (.[$outer[0]] | keys_unsorted) as $keys
| $outer[] as $k
| [ $k, .[$k][$keys[]] ]
| join(" : ")
I think this does what you want.
curl -s "https://poloniex.com/public?command=returnTicker" | \
jq -r 'to_entries
| .[]
| [ .key, (.value | to_entries | .[] | .value) ]
| join(" : ")'
In a nutshell, put everything in an array and use join to produce the desired output.
Update
As luciole75w notes, my solution has too many steps. This is better.
jq -r 'to_entries[] | [ .key, .value[] ] | join(" : ")'
That said, I would use peak's solution. Mine does not guarantee that the columns are the same for each line.
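To make the difference between the two approaches concrete, here is a rough Python sketch of the first one: the column order is taken from the first inner object and then applied to every row, so rows line up even if a later object lists its keys in a different order. The function name ticker_rows and the shortened sample are made up for illustration.

```python
def ticker_rows(ticker):
    """Build 'PAIR : v1 : v2 ...' lines using the first object's key order."""
    pairs = list(ticker)  # outer keys, in document order
    if not pairs:
        return []
    columns = list(ticker[pairs[0]])  # column order fixed by the first pair
    return [
        " : ".join([pair] + [str(ticker[pair][col]) for col in columns])
        for pair in pairs
    ]


# Shortened sample; the second object deliberately uses a different key order.
ticker = {
    "BTC_BTS": {"id": 14, "last": "0.00000111", "isFrozen": "0"},
    "TRX_SUN": {"last": "500.00000000", "isFrozen": "0", "id": 498},
}

for row in ticker_rows(ticker):
    print(row)
```

Simply joining each object's values in their own order, as the shorter to_entries version does, would put 500.00000000 in the id column for TRX_SUN in this sample.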

JSON file to CSV file conversion using jq

I am trying to convert my json file to a csv file using jq. Below is the sample input events.json file.
{
"took" : 111,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "alerts",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"alertID" : "639387c3-0fbe-4c2b-9387-c30fbe7c2bc6",
"alertCategory" : "Server Alert",
"description" : "Successfully started.",
"logId" : null
}
},
{
"_index" : "alerts",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"alertID" : "2",
"alertCategory" : "Server Alert",
"description" : "Successfully stoped.",
"logId" : null
}
}
]
}
}
Each row in the CSV should hold the data inside each _source object, so my columns would be alertId, alertCategory, description and logId with their respective values.
I tried the command below:
jq --raw-output '.hits[] | [."alertId",."alertCategory",."description",."logId"] | @csv' < /root/events.json
and it's not working.
Can anyone help me with this?
Your path expression is not right: you have a hits array inside an object that is also named hits, and the fields you are trying to put in the CSV live under the _source object. Note also that the field is spelled alertID in your data, not alertId.
So your expression should be the one below. Use it with the -r flag to get raw output:
.hits.hits[]._source | [ .alertID, .alertCategory, .description, .logId ] | @csv
If a field is null, @csv renders it as an empty field. If you want an explicit "null" string instead, use the alternative operator // on the field you expect to be null, e.g. instead of .logId, write (.logId // "null").
To add the column names as a header row, emit them as an extra array before the data rows and format everything with @csv or join(",") in raw output mode (-r):
[ "alertId" , "alertCategory" , "description", "logId" ],
( .hits.hits[]._source | [ .alertID, .alertCategory, .description, .logId // "null" ]) | @csv
or
[ "alertId" , "alertCategory" , "description", "logId" ],
( .hits.hits[]._source | [ .alertID, .alertCategory, .description, .logId // "null" ]) | join(",")
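If jq is not available, the same conversion can be sketched with Python's standard csv module. This is only an illustration under a few assumptions: the function name hits_to_csv is made up, the sample is cut down to a single hit, and null values are written as the string "null" to mirror the // "null" fallback above.

```python
import csv
import io


def hits_to_csv(response, columns):
    """Write a header row plus one row per hit's _source, returning CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for hit in response["hits"]["hits"]:  # hits array inside the hits object
        source = hit["_source"]
        # Missing or null fields become the literal string "null".
        writer.writerow(["null" if source.get(col) is None else source[col]
                         for col in columns])
    return buffer.getvalue()


# Cut-down sample with a single hit.
response = {
    "hits": {
        "hits": [
            {"_id": "2",
             "_source": {"alertID": "2", "alertCategory": "Server Alert",
                         "description": "Successfully stopped.", "logId": None}},
        ]
    }
}

print(hits_to_csv(response, ["alertID", "alertCategory", "description", "logId"]))
```

Unlike jq's @csv, csv.writer only quotes fields that need it (for example fields containing a comma), so the output is not byte-for-byte identical to the jq result.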

Nativescript/Google Geocoding JSON Parse

I am reading a geocoding result from the Google API in my NativeScript code. The results come back like this:
{"_bodyInit":"{\n \"results\" : [\n {\n \"address_components\" : [\n {\n \"long_name\" : \"Cooper City\",\n \"short_name\" : \"Cooper City\",\n \"types\" : [ \"locality\", \"political\" ]\n },\n {\n \"long_name\" : \"Broward County\",\n \"short_name\" : \"Broward County\",\n \"types\" : [ \"administrative_area_level_2\", \"political\" ]\n },\n {\n \"long_name\" : \"Florida\",\n \"short_name\" : \"FL\",\n \"types\" : [ \"administrative_area_level_1\", \"political\" ]\n },\n {\n \"long_name\" : \"United States\",\n \"short_name\" : \"US\",\n \"types\" : [ \"country\", \"political\" ]\n }\n ],\n \"formatted_address\" : \"Cooper City, FL, USA\",\n \"geometry\" : {\n
when reading from this code:
fetchModule.fetch(geoPlace, {
method: "GET"
})
.then(function(response) {
alert({title: "GET Response", message: JSON.stringify(response), okButtonText: "Close"});
console.log(JSON.stringify(response))
}, function(error) {
console.log(JSON.stringify(error));
})
My question is, how to get access, for example, to "Cooper City"?
I have tried (without success):
console.log(response.value["results"])
console.log(response[0])
console.log(response.results[0])
Reviewing the response, I can see that it is not valid JSON as it stands. You need to find a way to convert it into JSON so that you can work with it and reach the field you need.
https://jsonformatter.curiousconcept.com/
Here's how I found the solution:
var m = JSON.parse(response._bodyInit)
console.log(m.results[0].address_components[1].short_name)
For some reason I don't know, the original results were not returned as valid JSON.

jsonschema.core.exceptions.InvalidSchemaException: fatal: core.invalidSchema level: "fatal"

My schema is :
{
"title": "Order",
"description": "An order from oms",
"type": "object",
"properties": {
"order_id": {
"description": "The unique identifier for an order",
"type": "number"
},
"order_bill_from_party_id": {
"description": "The unique identifier for a party",
"type": "string"
},
"test":{
"type":"integer"
}
},
"required": ["order_id"]
}
My input is :
{"order_bill_from_party_id":"abc",
"order_id":1234}
My validator code :
val factory : JsonSchemaFactory= JsonSchemaFactory.byDefault()
val validator: JsonValidator = factory.getValidator
val schemaJson: JsonNode = JsonNodeFactory.instance.textNode(schema)
val inputJson = JsonNodeFactory.instance.textNode(input)
println(schemaJson)
val report: ProcessingReport = validator.validate(schemaJson, inputJson)
EDIT :
The schemaJson takes form :
{\n
\"title\": \"Order\",\n
\"description\": \"An order from oms\",\n
\"type\": \"object\",\n
\"properties\": {\n
\"order_id\": {\n
\"description\": \"The unique identifier for an order\",\n
\"type\": \"number\"\n
},\n
\"order_bill_from_party_id\": {\n
\"description\": \"The unique identifier for a party\",\n
\"type\": \"string\"\n
},\n
\"test\":{\n
\"type\":\"integer\"\n
}\n
},\n
\"required\": [\"order_id\"]\n
}
However I am getting an exception :
org.specs.runner.SpecError: com.github.fge.jsonschema.core.exceptions.InvalidSchemaException: fatal: core.invalidSchema
level: "fatal"
org.specs.runner.UserError: com.github.fge.jsonschema.core.exceptions.InvalidSchemaException: fatal: core.invalidSchema
level: "fatal"
at com.github.fge.jsonschema.processors.validation.ValidationProcessor.process(ValidationProcessor.java:86)
at com.github.fge.jsonschema.processors.validation.ValidationProcessor.process(ValidationProcessor.java:48)
at com.github.fge.jsonschema.core.processing.ProcessingResult.of(ProcessingResult.java:78)
at com.github.fge.jsonschema.main.JsonValidator.validate(JsonValidator.java:103)
at com.github.fge.jsonschema.main.JsonValidator.validate(JsonValidator.java:123)
at com.flipkart.marketing.bro.core.ValidationSchema.validate(ValidationSchema.scala:25)
at com.flipkart.marketing.bro.core.ValidationSchemaTest$$anonfun$3$$anonfun$apply$1.apply$mcV$sp(ValidationSchemaTest.scala:29)
at com.flipkart.marketing.bro.core.ValidationSchemaTest$$anonfun$3$$anonfun$apply$1.apply(ValidationSchemaTest.scala:27)
at com.flipkart.marketing.bro.core.ValidationSchemaTest$$anonfun$3$$anonfun$apply$1.apply(ValidationSchemaTest.scala:27)
at org.specs.specification.LifeCycle$class.withCurrent(ExampleLifeCycle.scala:66)
at org.specs.specification.Examples.withCurrent(Examples.scala:52)
However on testing on :
http://json-schema-validator.herokuapp.com/
I am getting a
`success
[]`.
Can someone help me out? The testing site implies that my schema is correct, but my code throws exceptions. I think it is because of the escape characters. How can I fix this and remove the escape characters from the JsonNode?
Thanks in advance!
Use JsonLoader.fromString instead of JsonNodeFactory.instance.textNode to load your schema and your instance:
import com.github.fge.jackson.JsonLoader
val schemaJson = JsonLoader.fromString(schema)
val inputJson = JsonLoader.fromString(input)
textNode constructs a single node containing a text value, which is not what you are after.
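The distinction can be illustrated by analogy with Python's json module. This is not the json-schema-validator API, and the shortened schema_text is made up: wrapping the schema text in a single string value gives the escaped form seen in the question's println output, while parsing it gives a tree whose members can be inspected.

```python
import json

# Shortened, hypothetical schema text standing in for the one in the question.
schema_text = '{\n "title": "Order",\n "required": ["order_id"]\n}'

# Analogous to textNode: the whole document becomes one JSON string value,
# so quotes and newlines show up as \" and \n when it is printed.
as_text_node = json.dumps(schema_text)

# Analogous to JsonLoader.fromString: the text is parsed into a tree.
as_tree = json.loads(schema_text)

print(as_text_node)
print(type(as_tree).__name__, as_tree["required"])
```

A validator handed the first form sees a string where it expects an object, which is consistent with the invalid-schema error in the question.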