Create JSON file from MongoDB document using Python

I am using MongoDB 3.4 and Python 2.7. I have retrieved a document from the database, and printing it shows it is a Python dictionary. I would like to write out the content of this document as a JSON file. When I create a simple dictionary like d = {"one": 1, "two": 2} I can write it to a file with json.dump(d, open("text.txt", 'w')).
However, if I replace d in the above code with the document I retrieve from MongoDB, I get the error
ObjectId is not JSON serializable
Suggestions?

As you have found out, the issue is that the value of _id is an ObjectId.
ObjectId is not understood by the default JSON encoder, so it cannot be serialised; you would get a similar error for ANY Python object that the default JSONEncoder does not understand.
One alternative is to write your own custom encoder to serialise ObjectId. However, rather than reinventing the wheel, use the bson.json_util utilities that ship with PyMongo.
For example:
from bson import json_util
import json

# json_util.default knows how to serialise ObjectId (and other BSON types)
with open("text.json", "w") as f:
    json.dump(d, f, default=json_util.default)
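If you do want to roll your own encoder instead, here is a minimal sketch that falls back to str() for ObjectId (the MongoEncoder name is mine, not part of any library):

import json
from bson import ObjectId

class MongoEncoder(json.JSONEncoder):
    def default(self, obj):
        # stringify ObjectId; defer to the base class for anything else
        if isinstance(obj, ObjectId):
            return str(obj)
        return super(MongoEncoder, self).default(obj)

with open("text.json", "w") as f:
    json.dump(d, f, cls=MongoEncoder)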

The issue is that _id is actually an ObjectId object, not a natively serialisable type. Replacing the _id with a string, as in mydocument['_id'] = '123', fixed the issue.
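A minimal sketch of that workaround, stringifying the real id rather than hard-coding a value:

mydocument['_id'] = str(mydocument['_id'])  # e.g. '507f1f77bcf86cd799439011'
json.dump(mydocument, open("text.json", "w"))

Note that this discards the type information; the json_util approach above round-trips it.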

Related

sqlalchemy/psycopg2 deserializes my JSON as dicts but doesn't preserve order

I have a psql query that returns a JSON object. I fetch this query with results.fetchall() and I get the JSON properly as a dict.
However, as I'm on Python 3.4, not yet 3.6, key order is not preserved in the dict. I saw there's a way to use OrderedDict to keep the order of the JSON, but I'm not sure how to tell sqlalchemy/psycopg2 to use it.
Can anybody help please?
As indicated in the documentation, you must provide a custom deserializer when creating your engine:
from functools import partial
import json, collections
from sqlalchemy import create_engine

engine = create_engine(
    ...,
    json_deserializer=partial(
        json.loads,
        object_pairs_hook=collections.OrderedDict),
)
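For illustration, object_pairs_hook is what makes the standard library build an order-preserving mapping from the decoded key/value pairs:

import json, collections

s = '{"b": 1, "a": 2}'
print(json.loads(s))  # plain dict: key order not guaranteed before Python 3.7
print(json.loads(s, object_pairs_hook=collections.OrderedDict))
# OrderedDict([('b', 1), ('a', 2)]) -- source order preserved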

Scala code to insert a JSON string into MongoDB using the Scala driver/Casbah

I found some code which can parse a JSON document, convert it to BSON and then insert it. But this code is implemented using the Java classes in Casbah. I could not find a corresponding implementation in Scala.
Also casbah documentation says "In general, you should only need the org.mongodb.scala and org.bson namespaces in your code. You should not need to import from the com.mongodb namespace as there are equivalent type aliases and companion objects in the Scala driver. The main exception is for advanced configuration via the builders in MongoClientSettings which is considered to be for advanced users."
If you look at the code below and note the imports, they use com.mongodb classes. I can use this code in Scala and make it work, but I want to know whether there is a Scala implementation out there for inserting JSON into MongoDB.
import com.mongodb.DBObject
import com.mongodb.casbah.MongoClient
import com.mongodb.casbah.MongoClientURI
import com.mongodb.util.JSON
val jsonString = """{"card_id" : 75893645814809,"cust_id": 1008,"card_info": {"card_type" : "Travel Card","credit_limit": 126839},"card_dates" : [{"date":"1997-09-09" },{"date":"2007-09-07" }]}"""
val dbObject: DBObject = JSON.parse(jsonString).asInstanceOf[DBObject]
val mongo = MongoClient(MongoClientURI("mongodb://127.0.0.1:27017"))
val buffer = new java.util.ArrayList[DBObject]()
buffer.add(dbObject)
mongo.getDB("yourDBName").getCollection("yourCollectionName").insert(buffer)
buffer.clear()
Reference : Scala code to insert JSON string to mongo DB
I found a few links online which suggest using different JSON parser libraries, but none of them seems straightforward, even though the ~5 lines of code above can insert a JSON document via the Java classes. I would like to achieve the same thing in Scala.
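For reference, here is a sketch of an equivalent insert against the official org.mongodb.scala API. The database/collection names come from the snippet above, and the constructor and subscribe signatures are from the 2.x-era driver as I remember them, so verify them against your driver version:

import org.mongodb.scala._
import org.mongodb.scala.bson.BsonDocument

val jsonString = """{"card_id": 75893645814809, "cust_id": 1008}"""

val client = MongoClient("mongodb://127.0.0.1:27017")
// typing the collection as BsonDocument lets us insert parsed JSON directly
val collection = client.getDatabase("yourDBName").getCollection[BsonDocument]("yourCollectionName")

// BsonDocument(...) parses the JSON string via the underlying org.bson parser
val doc = BsonDocument(jsonString)

collection.insertOne(doc).subscribe(new Observer[Completed] {
  override def onNext(result: Completed): Unit = println("inserted")
  override def onError(e: Throwable): Unit = println(s"insert failed: $e")
  override def onComplete(): Unit = client.close()
})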

Parsing large JSON file with Scala and JSON4S

I'm working with Scala in IntelliJ IDEA 15, trying to parse a large JSON file of Twitter records and count the total number of hashtags. I am very new to Scala and the idea of functional programming. Each line in the JSON file is a JSON object representing a tweet. Each line starts like so:
{"in_reply_to_status_id":null,"text":"To my followers sorry..
{"in_reply_to_status_id":null,"text":"#victory","in_reply_to_screen_name"..
{"in_reply_to_status_id":null,"text":"I'm so full I can't move"..
I am most interested in a property called "entities", which contains a property called "hashtags" with a list of hashtags. Here is an example:
"entities":{"hashtags":[{"text":"thewayiseeit","indices":[0,13]}],"user_mentions":[],"urls":[]},
I've browsed the various Scala libraries for parsing JSON and have decided to use json4s. I have the following code in my Scala script:
import org.json4s.native.JsonMethods._
var json: String = ""
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line
val data = parse(json)
My logic here is that I am trying to read each line from twitter38.json into a string and then parse the entire string with parse(). The parse function is throwing an error claiming:
"Type mismatch, expected: Nothing, found:String."
I have seen examples that use parse() on strings that hold json objects such as
val jsontest =
  """{
    |"name" : "bob",
    |"age" : "50",
    |"gender" : "male"
    |}
  """.stripMargin

val data = parse(jsontest)
but I have received the same error. I am coming from an object-oriented programming background; is there something fundamentally wrong with the way I am approaching this problem?
You have most likely imported the dependencies incorrectly into your IntelliJ project, or the module into your file. Make sure you have the following import:
import org.json4s.native.JsonMethods._
Even with this module correctly imported, parse(json: String) will not work for you, because your json is incorrectly formed. Your json String will look like this:
"""{"in_reply_...":"someValue1"}{"in_reply_...":"someValues2"}"""
but should look as follows to be valid JSON that can be parsed:
"""[{"in_reply_...":"someValue1"},{"in_reply_...":"someValues2"}]"""
i.e. you need an opening and closing square bracket around the whole file's contents and a comma between each tweet. Please read the json4s documentation for more information.
Although this question is almost 6 years old, I think it deserves another try.
The JSON format is subject to a few misunderstandings, especially around how documents are stored and how they are read back.
JSON documents are stored either as a single object containing all the other fields, or as an array of multiple objects, possibly all of the same shape. The second form is important because arrays in JSON, as in almost every programming language, are delimited by square brackets with values separated by commas (note that here I used a person object as my single value):
[
  {"name":"John","surname":"Doe"},
  {"name":"Jane","surname":"Doe"}
]
Also note that everything except brackets, numbers, booleans and null is enclosed in double quotes when written to a file.
However, there is another layout, unofficial but preferred for transferring datasets easily (often called newline-delimited JSON, or JSON Lines), where every object, or document in NoSQL/Mongo terms, is stored on its own line, like this:
{"name":"John","surname":"Doe"}
{"name":"Jane","surname":"Doe"}
So, for the question: the OP has a file written in this second form but tries an algorithm written to read the first form. The following code makes a few simple changes to bridge the gap, and must be run knowing the file has that layout:
var json: String = "["
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line + ","
json = json.splitAt(json.length() - 1)._1  // drop the trailing comma
json += "]"
val data = parse(json)
PS: @sbrannon has the correct idea above: the newline-delimited objects must be wrapped in square brackets (a JSON array), not curly braces, with commas between them.
EDIT: I added json = json.splitAt(json.length() - 1)._1 because the loop otherwise leaves a trailing comma, which causes a parse error under the JSON format definition.
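For what it's worth, here is a sketch of the OP's end goal that avoids building one giant string, parsing line by line instead. It assumes the newline-delimited layout described above; the field names come from the sample in the question:

import org.json4s._
import org.json4s.native.JsonMethods._

// parse each tweet on its own line and sum the sizes of the "hashtags" arrays
val totalHashtags = io.Source.fromFile("twitter38.json").getLines()
  .map(line => parse(line))
  .map(tweet => tweet \ "entities" \ "hashtags" match {
    case JArray(tags) => tags.length
    case _            => 0
  })
  .sum

println(totalHashtags)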

Spark exception handling for json

I am trying to catch/ignore parsing errors when reading a JSON file:
val DF = sqlContext.jsonFile("file")
There are a couple of lines that aren't valid JSON objects, but the data is too large to go through individually (~1TB).
I've come across exception handling for map operations using import scala.util.Try and in.map(a => Try(a.toInt)), referencing:
how to handle the Exception in spark map() function?
How would I catch an exception when reading a JSON file with the function sqlContext.jsonFile?
Thanks!
Unfortunately, you are out of luck here. DataFrameReader.json, which is used under the hood, is pretty much all-or-nothing. If your input contains malformed lines, you have to filter them out manually. A basic solution could look like this:
import scala.util.parsing.json._

val df = sqlContext.read.json(
  sc.textFile("file").filter(JSON.parseFull(_).isDefined)
)
Since the validation above is rather expensive, you may prefer to drop jsonFile / read.json completely and use the parsed JSON lines directly, as sketched below.
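A sketch of that alternative under the same assumptions (the file name is the placeholder from the question): parse every line once with JSON.parseFull, let flatMap drop the malformed ones, and work with the resulting values directly instead of going through read.json:

import scala.util.parsing.json._

// malformed lines yield None and are silently dropped by flatMap,
// leaving an RDD of parsed values (maps, lists, strings, numbers)
val parsed = sc.textFile("file").flatMap(line => JSON.parseFull(line))
parsed.take(5).foreach(println)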

Problems with using a .tsv file that contains JSON as a data feed file in Gatling

I am using Gatling to stress-test a RESTful API. I will be POSTing JSON data to a particular URI. I want to use a feed file, a .tsv where each line is a particular JSON element. However, I get errors, and I can't seem to find a pattern or system for adding quotes ("") to my .tsv JSON so the feeder will work. Attached are my code and .tsv file.
package philSim
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._
class eventAPISimulation extends Simulation {

  object Query {
    val feeder = tsv("inputJSON.tsv").circular
    val query = forever {
      feed(feeder)
        .exec(
          http("event")
            .post("my/URI/here")
            .body(StringBody("${json}")).asJSON
        )
    }
  }

  val httpConf = http.baseURL("my.url.here:portnumber")
  val scn = scenario("event").exec(Query.query)

  setUp(scn.inject(rampUsers(100) over (30 seconds)))
    .throttle(reachRps(2000) in (30 seconds), holdFor(3 minutes))
    .protocols(httpConf)
}
Here is an example of my unedited .tsv with JSON:
json
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:20:50.123Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"sent","eventData":{}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"INVALID","eventTime":"2015-01-23T23:25:20.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"received","eventData":{"osVersion":"7.1.2","productVersion":"5.9.2"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:27:30.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"followedLink","eventData":{"deepLinkUrl":"URL.IS.HERE","osVersion":"7.1.2","productVersion":"5.9.2"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:27:30.342Z","platform":"AndroidApp","notificationId":"123456","pushType":1,"action":"followedLink","eventData":{"deepLinkUrl":"URL.IS.HERE"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:25:20.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"error","eventData":{}}
I have seen this blog post, which talks about manipulating quotation marks (") to get the author's JSON to work with .tsv, but the author doesn't explain a systematic way to do it. I have tried various things and nothing really works. Some JSON will work when wrapped in quotes similar to what that author does, but this doesn't work for everything. What are the best practices for dealing with JSON and Gatling? Thanks for your help!
Straight from Gatling's documentation: use rawSplit so that Gatling's TSV parser will be able to handle your JSON entries:
tsv("inputJSON.tsv", rawSplit = true).circular