Is there a way to change values or add new variables in a JSON file and then write it back out in the same format?
I can use the rjson package to read the JSON file into R as a data frame, but how do I convert that data frame back to JSON after my changes?
EDIT:
Sample code:
JSON file:
{"__v":1,"_id":{"$oid":"559390f6fa76bc94285fa68a"},"accountId":6,"api":false,"countryCode":"no","countryName":"Norway","date":{"$date":"2015-07-01T07:04:22.265Z"},"partnerId":1,"query":{"search":[{"label":"skill","operator":"and","terms":["java"],"type":"required"}]},"terms":[{"$oid":"559390f6fa76bc94285fa68b"}],"time":19,"url":"eyJzZWFyY2giOlt7InRlcm1zIjpbImphdmEiXSwibGFiZWwiOiJza2lsbCIsInR5cGUiOiJyZXF1aXJlZCIsIm9wZXJhdG9yIjoiYW5kIn1dfQ","user":11}
{"__v":1,"_id":{"$oid":"5593910cfa76bc94285fa68d"},"accountId":6,"api":false,"countryCode":"se","countryName":"Sweden","date":{"$date":"2015-07-01T07:04:44.565Z"},"partnerId":1,"query":{"search":[{"label":"company","operator":"or","terms":["microsoft"],"type":"required"},{"label":"country","operator":"or","terms":["se"],"type":"required"}]},"terms":[{"$oid":"5593910cfa76bc94285fa68e"},{"$oid":"5593910cfa76bc94285fa68f"}],"time":98,"url":"eyJzZWFyY2giOlt7InRlcm1zIjpbIm1pY3Jvc29mdCJdLCJsYWJlbCI6ImNvbXBhbnkiLCJ0eXBlIjoicmVxdWlyZWQiLCJvcGVyYXRvciI6Im9yIn0seyJ0ZXJtcyI6WyJzZSJdLCJsYWJlbCI6ImNvdW50cnkiLCJ0eXBlIjoicmVxdWlyZWQiLCJvcGVyYXRvciI6Im9yIn1dfQ","user":13}
Code:
library(rjson)

con <- file(Usersfile, 'r')           # open a connection to the file
lines <- readLines(con, -1L)          # read every line (one JSON object per line)
close(con)
json <- lapply(X = lines, fromJSON)   # parse each line into an R list
json[[1]]$countryName <- 'Jamaica'    # change a value
cat(toJSON(json))                     # serialise the list back to JSON; cat() returns NULL, so assigning its result is pointless
Output (a single line, starting with [):
[{"__v":1,"_id":{"$oid":"559390f6fa76bc94285fa68a"},"accountId":6,"api":false,"countryCode":"no","countryName":"Jamaica","date":{"$date":"2015-07-01T07:04:22.265Z"},"partnerId":1,"query":{"search":[{"label":"skill","operator":"and","terms":"java","type":"required"}]},"terms":[{"$oid":"559390f6fa76bc94285fa68b"}],"time":19,"url":"eyJzZWFyY2giOlt7InRlcm1zIjpbImphdmEiXSwibGFiZWwiOiJza2lsbCIsInR5cGUiOiJyZXF1aXJlZCIsIm9wZXJhdG9yIjoiYW5kIn1dfQ","user":11},{"__v":1,"_id":{"$oid":"5593910cfa76bc94285fa68d"},"accountId":6,"api":false,"countryCode":"se","countryName":"Sweden","date":{"$date":"2015-07-01T07:04:44.565Z"},"partnerId":1,"query":{"search":[{"label":"company","operator":"or","terms":"microsoft","type":"required"},{"label":"country","operator":"or","terms":"se","type":"required"}]},"terms":[{"$oid":"5593910cfa76bc94285fa68e"},{"$oid":"5593910cfa76bc94285fa68f"}],"time":98,"url":"eyJzZWFyY2giOlt7InRlcm1zIjpbIm1pY3Jvc29mdCJdLCJsYWJlbCI6ImNvbXBhbnkiLCJ0eXBlIjoicmVxdWlyZWQiLCJvcGVyYXRvciI6Im9yIn0seyJ0ZXJtcyI6WyJzZSJdLCJsYWJlbCI6ImNvdW50cnkiLCJ0eXBlIjoicmVxdWlyZWQiLCJvcGVyYXRvciI6Im9yIn1dfQ","user":13}]
convert data frame to json
So this question has already been answered in full here ^^^
Quick summary:
There are two options presented.
(A) rjson library
import the library
use the toJSON() method to create a JSON string. (The unname() step in that answer strips the names R attaches to the list, so toJSON() emits a JSON array rather than an object keyed by those names.)
(B) jsonlite library
import the jsonlite library
use the toJSON() method directly (same as above, but with no extra modification).
cat() the resulting object.
Code examples are at that link. Hope this helps!
Related
I am using Spark/Scala to make an API request and parse the response into a dataframe. The following is the sample JSON response I am using for testing purposes:
API Request/Response
However, when I tried to use an answer from Stack Overflow to convert the response, the nested fields were not processed. Is there any way to convert the JSON string to a dataframe with proper columns?
I think the problem is that the JSON you attached, when read as a dataframe, yields a single (very wide) row, so Spark might be truncating the displayed result.
If this is all you need, you can try raising the Spark property spark.debug.maxToStringFields to a higher value (the default is 25):
spark.conf.set("spark.debug.maxToStringFields", 100)
However, if you want to process the Results field of the JSON, it is better to extract it into a dataframe and do the processing there. Here is how you can do it:
import com.google.gson.JsonParser  // Gson; JsonParser.parseString is available from Gson 2.8.6 onwards

val results = JsonParser.parseString(<json content>).getAsJsonObject().get("Results").getAsJsonArray.toString
import spark.implicits._
val df = spark.read.json(Seq(results).toDS)
df.show(false)
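If the nested fields still come back as struct columns, one way to get flat columns is dot notation inside select. This is only a sketch: the field names below (id, info.name, info.score) are placeholders, since the question does not show the actual schema of the Results array.
// Inspect the schema Spark inferred for the Results array
df.printSchema()

// Promote nested struct fields to top-level columns; the $-syntax
// comes from the spark.implicits._ import above. The field names
// (id, info.name, info.score) are illustrative placeholders.
val flat = df.select(
  $"id",
  $"info.name".alias("name"),
  $"info.score".alias("score")
)
flat.show(false)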
I am using MongoDB 3.4 and Python 2.7. I have retrieved a document from the database and I can print it and the structure indicates it is a Python dictionary. I would like to write out the content of this document as a JSON file. When I create a simple dictionary like d = {"one": 1, "two": 2} I can then write it to a file using json.dump(d, open("text.txt", 'w'))
However, if I replace d in the above code with the document I retrieve from MongoDB, I get the error:
ObjectId is not JSON serializable
Suggestions?
As you have found out, the issue is that the value of _id is an ObjectId.
The ObjectId class is not something the default JSON encoder knows how to serialise; you would get a similar error for ANY Python object the default JSONEncoder does not understand.
One alternative is to write your own custom encoder for ObjectId. However, rather than reinventing the wheel, use the utility module PyMongo/bson already provides: bson.json_util.
For example:
from bson import json_util

# json_util.dumps() already returns a JSON string that can serialise ObjectId,
# so write it to the file directly instead of re-encoding it with json.dump()
with open("text.json", "w") as f:
    f.write(json_util.dumps(d))
The issue is that "_id" is actually an object and not natively serialisable. Replacing the _id with a string, as in mydocument['_id'] = '123', fixed the issue.
I want to convert my nested JSON into CSV. I used
df.write.format("com.databricks.spark.csv").option("header", "true").save("mydata.csv")
This works for flat JSON but not for nested JSON. Is there any way I can convert my nested JSON to CSV? Help will be appreciated, thanks!
When you ask Spark to convert a JSON structure to a CSV, Spark can only map the first level of the JSON.
This happens because of the simplicity of the CSV format: it just assigns a value to a name. That is why {"name1":"value1", "name2":"value2", ...} can be represented as a CSV with this structure:
name1,name2, ...
value1,value2,...
In your case, you are converting JSON with several levels, so the Spark exception is saying that it cannot figure out how to convert such a complex structure into a CSV.
If you add only a second level to your JSON, it will work, but be careful: it will drop the names of the second level and keep only the values in an array.
You can have a look at this link to see an example for JSON datasets.
As I have no information about the nature of the data, I can't say much more about it. But if you need to write the information as a CSV, you will need to simplify the structure of your data first, for instance by flattening it as in the sketch below.
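Here is a minimal flattening sketch under assumed names: the input file people_nested.json and the fields name, address.city and address.zip are invented for illustration, so substitute your own schema.
import org.apache.spark.sql.functions.col

// Read the nested JSON (one object per line)
val nested = sqlContext.read.json("people_nested.json")

// Promote second-level fields to top-level columns so every
// CSV column holds a plain value instead of a struct
val flat = nested.select(
  col("name"),
  col("address.city").alias("city"),
  col("address.zip").alias("zip")
)

// Now the dataframe is flat, so spark-csv can write it
flat.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("people_flat.csv")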
Read the JSON file in Spark and create a dataframe:
val path = "examples/src/main/resources/people.json"
val people = sqlContext.read.json(path)
Save the dataframe using spark-csv:
people.write
.format("com.databricks.spark.csv")
.option("header", "true")
.save("newcars.csv")
Source:
read json
save to csv
I was trying to use jsonlite to work with my JSON requests. I was expecting that applying toJSON() to the result of fromJSON() and writing it to a file would produce the same JSON as the original. Apparently, fromJSON does a lot of type conversion from numeric to character and wraps single values in [].
Are there any parameters I can use to make sure toJSON(fromJSON(x)) reproduces the same JSON file, or do I have to handle the types of all the elements myself?
Maybe this could be achieved by some other R JSON library.
Here is a sample of the original JSON and the transformed output.
Original:
"target": "LENGTH",
"solvers_list": "TMtmil", "passes_num": 45
Modified:
"target":["LENGTH"],"solvers_list":["TMtmil"],"passes_num":[45]
You may need to post the actual JSON (your "Original" is not valid JSON) if this doesn't help:
orig <- '{"target":"LENGTH","solvers_list":"TMtmil","passes_num":45}'
orig == jsonlite::toJSON(jsonlite::fromJSON(orig), auto_unbox=TRUE)
## [1] TRUE
I am using Gatling to stress test a RESTful API. I will be posting JSON data to a particular URI. I want to use a feed file, a .tsv where each line is a JSON element. However, I get errors, and I can't seem to find a pattern or system for adding quotation marks ("") to the JSON in my .tsv so the feed will work. Attached are my code and tsv file.
package philSim
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._
class eventAPISimulation extends Simulation {

  object Query {
    val feeder = tsv("inputJSON.tsv").circular
    val query = forever {
      feed(feeder)
        .exec(
          http("event")
            .post("my/URI/here")
            .body(StringBody("${json}")).asJSON
        )
    }
  }

  val httpConf = http.baseURL("my.url.here:portnumber")
  val scn = scenario("event").exec(Query.query)

  setUp(scn.inject(rampUsers(100) over (30 seconds)))
    .throttle(reachRps(2000) in (30 seconds), holdFor(3 minutes))
    .protocols(httpConf)
}
Here is an example of my unedited .tsv with JSON:
json
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:20:50.123Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"sent","eventData":{}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"INVALID","eventTime":"2015-01-23T23:25:20.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"received","eventData":{"osVersion":"7.1.2","productVersion":"5.9.2"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:27:30.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"followedLink","eventData":{"deepLinkUrl":"URL.IS.HERE","osVersion":"7.1.2","productVersion":"5.9.2"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:27:30.342Z","platform":"AndroidApp","notificationId":"123456","pushType":1,"action":"followedLink","eventData":{"deepLinkUrl":"URL.IS.HERE"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:25:20.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"error","eventData":{}}
I have seen this blog post, which talks about manipulating quotation marks (") to get JSON in a .tsv to work, but the author doesn't explain a systematic way to do it. I have tried various things and nothing really works: some JSON entries work when wrapped in quotes the way the author does it, but this doesn't work for everything. What are the best practices for dealing with JSON and Gatling? Thanks for your help!
Straight from Gatling's documentation: use rawSplit so that Gatling's TSV parser can handle your JSON entries:
tsv("inputJSON.tsv", rawSplit = true).circular