I have the following type of JSON document which I need to insert into a MongoDB collection with pymongo:
json = {
    "resource": "/items/6791111",
    "user_id": 123456789,
    "topic": "items",
    "application_id": 1,
    "attempts": 1,
    "sent": "2020-07-22T15:53:06.000-04:00",
    "received": "2020-07-22T15:53:06.000-04:00"
}
The sent and received fields are strings, so if I run:
collection.insert_one(json)
they will be saved as strings in the database. How can I store them directly as dates?
I tried something like this:
from dateutil.parser import parse
json['sent'] = parse(json['sent'])
collection.insert_one(json)
but that doesn't seem like a good solution, because some of my documents have several date fields, and sometimes a date field is null (for example, in an order the delivered field is null until the order is delivered). Something like this:
json2 = {
    "resource": "/items/6791111",
    "user_id": 123456789,
    "topic": "items",
    "application_id": 1,
    "attempts": 1,
    "sent": "2020-07-22T15:53:06.000-04:00",
    "received": None
}
Right now I'm parsing the dates by hand with a helper function, but it's really not practical. I need the date fields stored as dates so I can filter by time.
You can attempt isoparse on each field: any valid ISO date strings are converted to datetime objects and will therefore be stored in MongoDB as BSON date types. Nulls will be unaffected.
from dateutil.parser import isoparse
for k, v in json.items():
    try:
        json[k] = isoparse(v)
    except (TypeError, ValueError):  # non-strings and non-dates are left as-is
        pass
Full worked example:
from pymongo import MongoClient
from dateutil.parser import isoparse
import pprint
collection = MongoClient()['mydatabase'].collection
json = {
    "resource": "/items/6791111",
    "user_id": 123456789,
    "topic": "items",
    "application_id": 1,
    "attempts": 1,
    "sent": "2020-07-22T15:53:06.000-04:00",
    "received": "2020-07-22T15:53:06.000-04:00",
    "nulldate": None
}
for k, v in json.items():
    try:
        json[k] = isoparse(v)
    except (TypeError, ValueError):
        pass
collection.insert_one(json)
pprint.pprint(collection.find_one(), indent=4)
gives:
{ '_id': ObjectId('5fde015e794ced49eeaa7a65'),
'application_id': 1,
'attempts': 1,
'nulldate': None,
'received': datetime.datetime(2020, 7, 22, 19, 53, 6),
'resource': '/items/6791111',
'sent': datetime.datetime(2020, 7, 22, 19, 53, 6),
'topic': 'items',
'user_id': 123456789}
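As a side note, if you'd rather not add the dateutil dependency, the same pattern works with the standard library's datetime.fromisoformat, which on Python 3.7+ accepts this exact timestamp shape. A minimal sketch (the parse_dates helper and the sample doc are illustrative, not from the question):

```python
from datetime import datetime

def parse_dates(doc):
    # Convert ISO-8601 string values to datetime in place;
    # other values (including None) are left untouched.
    for k, v in doc.items():
        if isinstance(v, str):
            try:
                doc[k] = datetime.fromisoformat(v)
            except ValueError:
                pass
    return doc

doc = {
    "sent": "2020-07-22T15:53:06.000-04:00",
    "received": None,
    "topic": "items",
}
parse_dates(doc)
print(type(doc["sent"]).__name__)  # datetime
print(doc["received"])             # None

# Once stored as BSON dates, you can filter by time, e.g.:
# collection.find({"sent": {"$gte": datetime(2020, 7, 1),
#                           "$lt": datetime(2020, 8, 1)}})
```

The explicit isinstance check means non-string values such as nulls and integers are skipped without relying on the parser to raise.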
I have a JSON object served from an API as follows:
{
"workouts": [
{
"id": 92527291,
"starts": "2021-06-28T15:42:44.000Z",
"minutes": 30,
"name": "Indoor Cycling",
"created_at": "2021-06-28T16:12:57.000Z",
"updated_at": "2021-06-28T16:12:57.000Z",
"plan_id": null,
"workout_token": "ELEMNT BOLT A1B3:59",
"workout_type_id": 12,
"workout_summary": {
"id": 87540207,
"heart_rate_avg": "152.0",
"calories_accum": "332.0",
"created_at": "2021-06-28T16:12:58.000Z",
"updated_at": "2021-06-28T16:12:58.000Z",
"power_avg": "185.0",
"distance_accum": "17520.21",
"cadence_avg": "87.0",
"ascent_accum": "0.0",
"duration_active_accum": "1801.0",
"duration_paused_accum": "0.0",
"duration_total_accum": "1801.0",
"power_bike_np_last": "186.0",
"power_bike_tss_last": "27.6",
"speed_avg": "9.73",
"work_accum": "332109.0",
"file": {
"url": "https://cdn.wahooligan.com/wahoo-cloud/production/uploads/workout_file/file/FPoJBPZo17BvTmSomq5Y_Q/2021-06-28-154244-ELEMNT_BOLT_A1B3-59-0.fit"
}
}
}
],
"total": 55,
"page": 1,
"per_page": 1,
"order": "descending",
"sort": "starts"
}
I want to get the data into a dataframe. However, lots of the columns come through with a dtype of object. I assume this is because some of the numeric values in the JSON are double quoted. What is the best and most efficient way to avoid this (the JSON potentially has many workouts elements)?
Is it better to fix the returned JSON, or to iterate through the dataframe columns and convert the objects to floats?
Thank you
Martyn
IIUC, you can try:
import pandas as pd

df = pd.json_normalize(
    json_data,
    record_path='workouts',
    meta=['total', 'page', 'per_page', 'order', 'sort'],
).convert_dtypes()
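For instance, on a trimmed version of the payload (assuming it's already loaded into a dict named json_data), the nested workout_summary keys come out dot-flattened; note that convert_dtypes() keeps the quoted numerics as strings, so those columns still need an explicit pd.to_numeric pass:

```python
import pandas as pd

# Trimmed version of the payload from the question
json_data = {
    "workouts": [
        {
            "id": 92527291,
            "minutes": 30,
            "name": "Indoor Cycling",
            "workout_summary": {"heart_rate_avg": "152.0", "power_avg": "185.0"},
        }
    ],
    "total": 55,
    "page": 1,
}

df = pd.json_normalize(json_data, record_path="workouts",
                       meta=["total", "page"]).convert_dtypes()

# convert_dtypes() cannot guess that "152.0" is numeric,
# so coerce the quoted columns explicitly:
quoted = ["workout_summary.heart_rate_avg", "workout_summary.power_avg"]
df[quoted] = df[quoted].apply(pd.to_numeric)

print(df.dtypes)
```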
Try using pandas.to_numeric. Here are the docs.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html
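A sketch of that approach, using a hypothetical frame that mimics the quoted-number columns from the question: pick out the affected columns and coerce them in one pass.

```python
import pandas as pd

# Hypothetical frame mimicking the quoted numerics from the question
df = pd.DataFrame({
    "name": ["Indoor Cycling"],
    "heart_rate_avg": ["152.0"],
    "calories_accum": ["332.0"],
})

num_cols = ["heart_rate_avg", "calories_accum"]
# errors="coerce" turns anything unparseable into NaN instead of raising
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```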
I am coding in Python with Flask and pandas, reading data from a REST API.
Until this weekend, the dispenserId was an integer, meaning each value started with a non-zero digit. Now I am receiving dispenserIds that start with a 0 (zero) character, so calling json.load(path_to_filenamen) no longer parses the JSON file due to errors.
See the sample
{
"result": {
"dispensers": [
{
"dispenserId": 00000,
"dispenserName": "1st Floor",
"dispenserType": "H2",
"status": "Green",
"locationId": 12345
},
{
"dispenserId": 98765,
"dispenserName": "2nd Floor",
"dispenserType": "S4",
"status": "Green",
"locationId": 23456
},
{
"dispenserId": 00001,
"dispenserName": "3rd Floor",
"dispenserType": "H2",
"status": "Green",
"locationId": 34567
}
]
}
}
I receive Exception has occurred: TypeError string indices must be integers when I call data["result"]["dispensers"].
How can I indicate to the JSON parser that the dispenserId is a string instead of an Integer?
A few things:
1.
JSON does not allow leading zeros on integers, so since these are integers you should not have that many zeros. The values should be:
"dispenserId": 0,
or
"dispenserId": 1,
Once this is corrected, data["result"]["dispensers"] will work just fine, regardless of the values of "dispenserId".
2.
Alternatively, the values could be given as strings:
"dispenserId": "00000",
and then converted into integers on your side:
int(data["result"]["dispensers"][0]["dispenserId"])
Either way, the file as received does not respect the JSON format.
This piece of code should "clean" your file by deleting the unwanted leading zeros, so that it parses as JSON:
import re
import codecs
import json

pattern = r"(:\s*)0*(\d)"

with codecs.open(path_to_filenamen, "r", "utf-8") as f:
    myJson = json.loads(re.sub(pattern, r"\1\2", f.read()))
The myJson variable now holds the parsed data, so you can use myJson["result"]["dispensers"].
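A quick check of the substitution on sample lines shows it strips leading zeros from unquoted numbers while leaving normal ids (and quoted values, which start with a `"` rather than a digit) untouched:

```python
import re

pattern = r"(:\s*)0*(\d)"

# Leading zeros on unquoted numbers are stripped...
print(re.sub(pattern, r"\1\2", '"dispenserId": 00001,'))   # "dispenserId": 1,
# ...while ids without leading zeros pass through unchanged
print(re.sub(pattern, r"\1\2", '"locationId": 12345,'))    # "locationId": 12345,
```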
I have Twitter account timeline data saved per tweet in .json format, and I am unable to save the data into MongoDB.
Example: fetched data of one tweet.
{
"created_at": "Fri Apr 12 05:13:35 +0000 2019",
"id": 1116570031511359489,
"id_str": "1116570031511359489",
"full_text": "#jurafsky How can i get your video lectures related to Sentiment Analysis",
"truncated": false,
"display_text_range": [0, 73],
"entities": {
"hashtags": [],
"symbols": [],
"user_mentions": [
{
"screen_name": "jurafsky",
"name": "Dan Jurafsky",
"id": 14968475,
"id_str": "14968475",
"indices": [0, 9]
}
],
"urls": []
}
It also contains URLs and lots of other information.
I have tried the following code.
from pymongo import MongoClient
import json

client = MongoClient('localhost', 27017)
db = client.test
coll = db.dataset

with open('tweets.json') as f:
    file_data = json.loads(f.read())

coll.insert(file_data)
client.close()
Try this:
from pymongo import MongoClient
import json

client = MongoClient('localhost', 27017)
db = client.test
coll = db.dataset

with open('tweets.json') as f:
    file_data = json.load(f)

# insert() is deprecated in PyMongo; use insert_one() for a single
# document or insert_many() for a list of documents
coll.insert_many(file_data)
client.close()
My JSON dataset was not valid; I had to merge it into one array object.
Thanks to: Can't parse json file: json.decoder.JSONDecodeError: Extra data.
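If the file holds one JSON object per tweet rather than a single array (the "Extra data" situation mentioned above), an alternative to merging by hand is to parse it line by line and pass the resulting list to insert_many. A sketch, assuming one JSON document per line; the raw string is a hypothetical stand-in for the file contents, and the pymongo call is commented out since it needs a running server:

```python
import json

# Hypothetical stand-in: one JSON object per line, as raw timeline dumps often are
raw = '{"id": 1, "full_text": "first tweet"}\n{"id": 2, "full_text": "second tweet"}\n'

tweets = [json.loads(line) for line in raw.splitlines() if line.strip()]
print(len(tweets))  # one dict per non-empty line

# With a running MongoDB server:
# from pymongo import MongoClient
# coll = MongoClient('localhost', 27017).test.dataset
# coll.insert_many(tweets)
```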
I got a JSON text which I need to parse, but I can't, because it has an array nested inside. My JSON looks like this:
{
"statementId": "1",
"movements": [
{
"id": 65,
"date": "2019-02-05",
"number": 32,
"balance": -4.62,
"purpose": "1"
},
{
"id": 1,
"date": "2019-02-05",
"number": 22,
"balance": -3,
"purpose": "23"
},
{
"id": 32,
"date": "2019-02-05",
"number": 12,
"balance": -11,
"purpose": "2"
}
],
"startPointer": "1122",
"endPointer": "3333"
}
I am using JsonSlurper. I want to get all the data inside "movements". I have tried this script:
JsonSlurper slurper = new JsonSlurper()
Map parsedJson = slurper.parseText(bodyContent)
String parsed_movements = parsedJson["movements"]
I have no problem parsing single strings, like statementId or startPointer, but when I try to parse movements with my script it returns null. I have also tried parsedJson["movements"][0] to fetch the first movement, but it also gives me an error.
I have found a lot about JSON parsers on the internet and on Stack Overflow, but nothing that matches what I'm looking for. I really don't think this is a duplicate question.
EDIT: I also tried a for statement to collect each object's id into an array, like this:
def movements_array = []
for(def i = 0; i < parsedJson.movements.size(); i++) {
movements_array << parsedJson.movements[i].id
println(movements_array)
}
But it gives me an error: Cannot invoke method size() on null object, because parsedJson.movements is null.
When you do:
String parsed_movements = parsedJson["movements"]
you're sticking a list of maps into a String, which isn't what you want.
Given the JSON in your question, you can just do:
def movementIds = new JsonSlurper().parseText(bodyContents).movements.id
to get the list [65, 1, 32].
If you're getting NPEs, I assume the actual JSON isn't what you show in the question.
I have a web-service call (HTTP GET) that my Python script makes, which returns a JSON response. The response looks to be a list of dictionaries. The script's purpose is to iterate through each dictionary, extract each piece of metadata (i.e. "ClosePrice": "57.74",) and write each dictionary to its own row in MSSQL.
The issue is, I don't think Python is recognizing the JSON output from the API call as a list of dictionaries, and when I try a for loop, I get the error must be int not str. I have tried converting the output to a list, dictionary, and tuple. I've also tried to make it work with a list comprehension, with no luck. Furthermore, if I copy/paste the data from the API call and assign it to a variable, it is recognized as a list of dictionaries without issue. Any help would be appreciated. I'm using Python 2.7.
Here is the actual http call being made: http://test.kingegi.com/Api/QuerySystem/GetvalidatedForecasts?user=kingegi&market=us&startdate=08/19/13&enddate=09/12/13
Here is an abbreviated JSON output from the API call:
[
{
"Id": "521d992cb031e30afcb45c6c",
"User": "kingegi",
"Symbol": "psx",
"Company": "phillips 66",
"MarketCap": "34.89B",
"MCapCategory": "large",
"Sector": "basic materials",
"Movement": "up",
"TimeOfDay": "close",
"PredictionDate": "2013-08-29T00:00:00Z",
"Percentage": ".2-.9%",
"Latency": 37.48089483333333,
"PickPosition": 2,
"CurrentPrice": "57.10",
"ClosePrice": "57.74",
"HighPrice": null,
"LowPrice": null,
"Correct": "FALSE",
"GainedPercentage": 0,
"TimeStamp": "2013-08-28T02:31:08 778",
"ResponseMsg": "",
"Exchange": "NYSE "
},
{
"Id": "521d992db031e30afcb45c71",
"User": "kingegi",
"Symbol": "psx",
"Company": "phillips 66",
"MarketCap": "34.89B",
"MCapCategory": "large",
"Sector": "basic materials",
"Movement": "down",
"TimeOfDay": "close",
"PredictionDate": "2013-08-29T00:00:00Z",
"Percentage": "16-30%",
"Latency": 37.4807215,
"PickPosition": 1,
"CurrentPrice": "57.10",
"ClosePrice": "57.74",
"HighPrice": null,
"LowPrice": null,
"Correct": "FALSE",
"GainedPercentage": 0,
"TimeStamp": "2013-08-28T02:31:09 402",
"ResponseMsg": "",
"Exchange": "NYSE "
}
]
Small part of the code being used:
import os,sys
import subprocess
import glob
from os import path
import urllib2
import json
import time
try:
    data = urllib2.urlopen('http://api.kingegi.com/Api/QuerySystem/GetvalidatedForecasts?user=kingegi&market=us&startdate=08/10/13&enddate=09/12/13').read()
except urllib2.HTTPError, e:
    print "HTTP error: %d" % e.code
except urllib2.URLError, e:
    print "Network error: %s" % e.reason.args[1]

list_id = [x['Id'] for x in data]  # test to see if it extracts the Id from each dict
print(data)       # JSON output
print(len(data))  # should retrieve the number of dicts in the list
UPDATE
Answered my own question, here is the method below:
url = 'some url that is a list of dictionaries'  # GET call
u = urllib.urlopen(url)     # u is a file-like object
data = u.read()
newdata = json.loads(data)
print(type(newdata))        # printed data type will show as a list
print(len(newdata))         # the length of the list
newdict = newdata[1]        # each element in the list is a dict
print(type(newdict))        # this element is a dict
length = len(newdata)       # how many elements in the list
for a in range(length):     # index every element (range(1, length) would skip the first)
    var = newdata[a]
    print(var['Correct'], var['User'])
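For readers on Python 3, the same flow looks like this (urllib2 became urllib.request); the HTTP call is replaced by an inline stand-in string so the sketch runs on its own, and the record values are hypothetical:

```python
import json

# In Python 3 the request would be:
# from urllib.request import urlopen
# data = urlopen(url).read()

# Hypothetical stand-in for the API response body
data = '[{"Id": "abc123", "User": "kingegi", "Correct": "FALSE"}]'

newdata = json.loads(data)   # parse the JSON text into Python objects
print(type(newdata))         # a list of dicts

for var in newdata:          # iterate every element, including index 0
    print(var["Correct"], var["User"])
```

The key step either way is json.loads: until the response body is parsed, data is just a string, which is why indexing it with 'Id' raises the "must be int not str" error from the question.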