json.loads or json.load() on tweet result - json

I use this code to get tweets from the feed and write them to a file.
When I read the file back and try to parse the lines as JSON, I always get an error.
def SearchTwt(api):
    os.chdir('/Users/me/Desktop')
    SearchResult = api.search(q='market', lang='en', rpp=20)
    text_file = open("TweetOut.txt", "w")
    for tw in SearchResult:
        text_file.write(str(tw))
        print(str(tw))
    text_file.close()
I read the file with:
def readfile():
    tweets_data = []
    os.chdir('/Users/me/Desktop')
    file = open("TweetOut.txt", "r")
    for line in file:
        parts = line.split("Status(")
        print(len(parts))
        for part in parts:
            tweet = 'Status(' + part
            if len(tweet) > 10:
                tweetj = json.loads(tweet)
                #tweets_data.append(tweet)
                print(tweet)
    file.close()
Maybe it is wrong to fill the file with str(tw)? And yes, I rebuild the string while reading because I thought each tweet started like that, so that may be another mistake.
I have tried a lot of other options.
The error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The file starts like this (URLs removed, as Stack Overflow asked):
Status(source='SocialFlow', id=757991135465857024, in_reply_to_status_id=None, is_quote_status=False, entities={'hashtags': [], 'user_mentions': [], 'symbols': [], 'urls': [{'url': '', 'expanded_url': '', 'display_url':

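The file is not valid JSON: str(tw) writes the Python representation of the tweepy Status object (Status(source='SocialFlow', ...)), not JSON. It should be something like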
{
"source": "SocialFlow",
"id":"757991135465857024",
...
"entities": {
"hashtags": [],
"user_mentions": [],
...
}
}
Because it is not valid JSON, you either have to parse it some other way or make sure you write it as JSON when you save the file.
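A minimal sketch of the "write it as JSON" option, storing one JSON object per line (the JSON Lines convention) so that each line can be passed to json.loads on its own. It assumes that tweepy's Status objects keep the raw API payload as a dict in the underscore-prefixed `_json` attribute; check that against your tweepy version. `save_tweets` and `load_tweets` are hypothetical helper names.

```python
import json

def save_tweets(tweets, path):
    """Write each tweet as one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for tw in tweets:
            # Assumption: tweepy Status objects expose the raw payload as a
            # dict in `_json`; anything without it (e.g. a plain dict) is
            # written as-is.
            data = getattr(tw, "_json", tw)
            # json.dumps escapes newlines inside tweet text, so each record
            # stays on exactly one line.
            f.write(json.dumps(data) + "\n")

def load_tweets(path):
    """Read the file back, parsing each non-empty line on its own."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In SearchTwt you would call save_tweets(SearchResult, "TweetOut.txt") in place of the str(tw) loop, and readfile no longer needs to split on "Status(".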

Related

Seeding rails project with Json file

I'm at a loss, and my searches have gotten me nowhere.
In my seeds.rb file I have the following code:
require 'json'
jsonfile = File.open 'db/search_result2.json'
jsondata = JSON.load jsonfile
#jsondata = JSON.parse(jsonfile)
jsondata[].each do |data|
  Jobpost.create!(post: data['title'],
                  link: data['link'],
                  image: data['pagemap']['cse_image']['src'])
end
Snippet of the json file looks like this:
{
"kind": "customsearch#result",
"title": "Careers Open Positions - Databricks",
"link": "https://databricks.com/company/careers/open-positions",
"pagemap": {
"cse_image": [
{
"src": "https://databricks.com/wp-content/uploads/2020/08/careeers-new-og-image-sept20.jpg"
}
]
}
},
I fixed jsondata[].each to jsondata.each. Now I'm getting the following error:
TypeError: no implicit conversion of String into Integer
jsondata[] says to call the [] method with no arguments on the object in the jsondata variable. Normally [] would take an index like jsondata[0] to get the first element or a start and length like jsondata[0, 5] to get the first five elements.
You want to call the each method on jsondata, so jsondata.each.
So, for exactly what you have posted (a single JSON object):
require 'json'
file = File.read('path_to_file.json')
json_data = JSON.parse file
p json_data['kind'] #=> "customsearch#result"
# etc for all the other keys
Now, maybe the JSON you posted is just the first element of an array:
[
{}, // where each {} is the json you posted
{},
{},
// etc
]
In that case you do have to iterate:
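require 'json'
file = File.read('path_to_file.json')
json_data = JSON.parse file
json_data.each do |data|
  p data['kind'] #=> "customsearch#result"
  # 'cse_image' is itself an array, so index it before reading 'src';
  # data['pagemap']['cse_image']['src'] raises the same TypeError
  p data['pagemap']['cse_image'][0]['src']
end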

creating a nested json document

I have the document below in MongoDB.
I am using the Python code below to save it to a .json file:
file = 'employee'
json_cur = find_document(file)
count_document = emp_collection.count_documents({})
with open(file_path, 'w') as f:
    f.write('[')
    for i, document in enumerate(json_cur, 1):
        print("document : ", document)
        f.write(dumps(document))
        if i != count_document:
            f.write(',')
    f.write(']')
The output is:
{
"_id":{
"$oid":"611288c262c5c14df84f649b"
},
"Lname":"Borg",
"Fname":"James",
"Dname":"Headquarters",
"Projects":"[{\"HOURS\": 5.0, \"PNAME\": \"Reorganization\", \"PNUMBER\": 20}]"
}
But I need it like this (the Projects value without quotes):
{
"_id":{
"$oid":"611288c262c5c14df84f649b"
},
"Lname":"Borg",
"Fname":"James",
"Dname":"Headquarters",
"Projects":[{"HOURS": 5.0, "PNAME": "Reorganization", "PNUMBER": 20}]
}
Could anyone please help me resolve this?
Thanks,
Jay
You should parse the JSON string stored in the Projects field back into a Python list before dumping the document, like this:
from json import loads
document['Projects'] = loads(document['Projects'])
So the loop becomes:
file = 'employee'
json_cur = find_document(file)
count_document = emp_collection.count_documents({})
with open(file_path, 'w') as f:
    f.write('[')
    for i, document in enumerate(json_cur, 1):
        document['Projects'] = loads(document['Projects'])
        print("document : ", document)
        f.write(dumps(document))
        if i != count_document:
            f.write(',')
    f.write(']')
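A variation on the same fix, sketched under assumptions: instead of writing the brackets and commas by hand (and counting documents to know where the last comma goes), collect the cleaned documents in a list and let json.dump write the whole array. `export_documents` is a hypothetical helper, and `documents` is any iterable of plain dicts standing in for the cursor. Real MongoDB documents carry an ObjectId under _id, which the standard json module cannot serialize; with a live cursor you would keep bson.json_util.dumps (which appears to be what produced the $oid output) and write f.write(dumps(cleaned)) instead.

```python
import json

def export_documents(documents, path):
    """Write every document into one JSON array, decoding 'Projects' first.

    `documents` is any iterable of plain dicts (a hypothetical stand-in for
    the MongoDB cursor returned by find_document).
    """
    cleaned = []
    for document in documents:
        doc = dict(document)  # shallow copy so the caller's dicts stay untouched
        if isinstance(doc.get("Projects"), str):
            # The field holds a JSON-encoded string; turn it into a real list
            doc["Projects"] = json.loads(doc["Projects"])
        cleaned.append(doc)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cleaned, f, indent=2)  # writes the [ ... , ... ] itself
    return cleaned
```

The isinstance check makes the function safe to run on documents whose Projects field is already a list.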

Saving json file by dumping dictionary in a for loop, leading to malformed json

So I have the following keys and values, which I get by parsing a text file:
keys = ["scientific name", "common names", "colors"]
values = ["somename1", ["name11", "name12"], ["color11", "color12"]]
keys = ["scientific name", "common names", "colors"]
values = ["somename2", ["name21", "name22"], ["color21", "color22"]]
and so on. I dump each set of key/value pairs to a JSON file as a dictionary, inside a for loop that goes through them one by one:
# (inside the for loop)
d = dict(zip(keys, values))
with open("file.json", 'a') as j:
    json.dump(d, j)
If I open the saved JSON file, I see the contents as:
{"scientific name": "somename1", "common names": ["name11", "name12"], "colors": ["color11", "color12"]}{"scientific name": "somename2", "common names": ["name21", "name22"], "colors": ["color21", "color22"]}
Is this the right way to do it?
The purpose is to query the common names or colors for a given scientific name. So then I do:
with open("file.json", "r") as j:
    data = json.load(j)
and get the error json.decoder.JSONDecodeError: Extra data:
I think this is because I am not dumping the dictionaries to JSON correctly in the for loop. I have to insert some square brackets programmatically; just doing json.dump(d, j) won't suffice.
A JSON document may have only one root value. That root can be an array [], an object {}, or any other JSON value (a string, number, true, false or null).
In your file, however, you get multiple root elements:
{...}{...}
This isn't valid JSON: the Extra data error points at the second {, exactly where a valid document would already have ended.
You can write multiple dicts to a JSON string, but you need to wrap them in an array:
[{...},{...}]
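If a file in the {...}{...} shape already exists and you would rather read it than regenerate it, the standard library's json.JSONDecoder.raw_decode can pull values off one at a time: it parses a single value starting at a given index and returns that value together with the index where it ended. A minimal sketch; `iter_concatenated` is a hypothetical name, and whitespace between values is skipped by hand because raw_decode does not do it.

```python
import json

def iter_concatenated(text):
    """Yield each top-level JSON value from text shaped like '{...}{...}'."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        obj, pos = decoder.raw_decode(text, pos)  # (value, index just after it)
        yield obj
```

Usage: with open("file.json") as j: records = list(iter_concatenated(j.read())). Writing a proper array is still the cleaner fix; this only rescues files that were already written.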
Now, here is how I would fix your code. First, I rewrote what you posted as runnable code, because yours was rather pseudocode and didn't run directly:
import json
inputs = [(["scientific name", "common names", "colors"],
           ["somename1", ["name11", "name12"], ["color11", "color12"]]),
          (["scientific name", "common names", "colors"],
           ["somename2", ["name21", "name22"], ["color21", "color22"]])]
for keys, values in inputs:
    d = dict(zip(keys, values))
    with open("file.json", 'a') as j:
        json.dump(d, j)
with open("file.json", 'r') as j:
    print(json.load(j))
As you correctly realized, this code fails with
json.decoder.JSONDecodeError: Extra data: line 1 column 105 (char 104)
The way I would write it is:
import json
inputs = [(["scientific name", "common names", "colors"],
           ["somename1", ["name11", "name12"], ["color11", "color12"]]),
          (["scientific name", "common names", "colors"],
           ["somename2", ["name21", "name22"], ["color21", "color22"]])]
jsonData = list()
for keys, values in inputs:
    d = dict(zip(keys, values))
    jsonData.append(d)
with open("file.json", 'w') as j:
    json.dump(jsonData, j)
with open("file.json", 'r') as j:
    print(json.load(j))
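Also, with Python's json module you should write the entire JSON file in one go, opening it with 'w' instead of 'a': every extra json.dump appended to an existing file adds another root value and brings back the Extra data error.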
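If you do need to append record by record (for example while still parsing the text file), a different convention, JSON Lines, keeps 'a' mode safe: each record is a complete JSON document on its own line, and the reader parses line by line instead of calling json.load on the whole file. A sketch with hypothetical helper names; it also builds the lookup by scientific name that the question is after.

```python
import json

def append_record(path, record):
    """Append one record as its own line, so opening with 'a' stays safe."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_by_scientific_name(path):
    """Read every line back and index the records by scientific name."""
    index = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                index[record["scientific name"]] = record
    return index
```

After appending both records, load_by_scientific_name(path)["somename2"]["colors"] gives ["color21", "color22"]. The trade-off is that the file as a whole is no longer a single JSON document, so tools expecting one will reject it.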

Reading a json file into a RDD (not dataFrame) using pyspark

I have the following file, test.json:
{
"id": 1,
"name": "A green door",
"price": 12.50,
"tags": ["home", "green"]
}
I want to load this file into an RDD. This is what I tried:
rddj = sc.textFile('test.json')
rdd_res = rddj.map(lambda x: json.loads(x))
I got an error:
Expecting object: line 1 column 1 (char 0)
I don't completely understand what json.loads does.
How can I resolve this problem?
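textFile reads data line by line, so json.loads (which parses one complete JSON document from a string) is applied to each line separately: first to {, then to "id": 1, and so on. None of those individual lines is syntactically valid JSON on its own.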
Just use the JSON reader:
spark.read.json("test.json", multiLine=True)
or (not recommended) read whole text files:
sc.wholeTextFiles("test.json").values().map(json.loads)
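To see the difference without Spark, here is a plain-Python sketch that mimics what textFile followed by map(json.loads) does (parse every line on its own) and compares it with parsing the whole text at once, which is effectively what wholeTextFiles or multiLine=True give you. `parse_each_line` is a hypothetical helper for illustration only.

```python
import json

text = """{
  "id": 1,
  "name": "A green door",
  "price": 12.50,
  "tags": ["home", "green"]
}"""

def parse_each_line(s):
    """Mimic textFile + map(json.loads): try to parse every line by itself."""
    results = []
    for line in s.splitlines():
        try:
            results.append(json.loads(line))
        except json.JSONDecodeError as exc:
            results.append("error: " + exc.msg)
    return results

# Every single line fails: none is a complete JSON document
for result in parse_each_line(text):
    print(result)

# The whole text is one valid document
print(json.loads(text)["name"])  # A green door
```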

create fixtures with custom manager methods, json dumps and ways to avoid TypeError: xxx is not JSON serializable

I'm trying to create a test fixture using custom manager methods, because my app uses a subset of the database tables and fewer records, so I dropped the idea of using initial_data. In my manager (Managers.py) I'm doing something like this:
sitedict = Site.objects.filter(pk=1234).values()[0]
custdict = Customer.objects.filter(custid=123456).values()[0]
customer = {"pk":123456,"model":"myapp.customer","fields":custdict}
site = {"pk":0001,"model":"myapp.site","fields":sitedict}
csvfile = open('shoppingcart/bsofttestdata.csv','wb')
csv_writer = csv.writer(csvfile)
csv_writer.writerow([customer,site])
Then I modified my CSV file to replace single quotes with double quotes, etc., and saved that file as JSON. Sorry if this is a dumb way to do it, but this is the first time I'm creating test data, and I'd love to learn a better way. Sample data from the file, myapp/fixtures/testdata.json:
[{"pk": 123456, "model": "myapp.customer", "fields": {"city": "abc", "maritalstatus": None, "zipcode": "12345", "lname": "fdfdf", "state": "AZ", "agentid": 1111, "fname": "sdadsad", "email": "abcd#xxx.com", "phone": "0000000000", "custid":123456,"datecreate": datetime.datetime(2011, 3, 29, 11, 40, 18, 157612)}},{"pk":0001, "model": "myapp.site", "fields": {"url": "http://google.com", "websitecode": "", "notify": True, "fee": 210.0, "id":0001}}]
I used this file to run my tests, but I got the following error:
EProblem installing fixture '/var/lib/django/myproject/myapp/fixtures/testdata.json':
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.6/django/core/management/commands/loaddata.py", line 150, in handle
for obj in objects:
File "/usr/lib/pymodules/python2.6/django/core/serializers/json.py", line 41, in Deserializer
for obj in PythonDeserializer(simplejson.load(stream)):
File "/usr/lib/pymodules/python2.6/simplejson/__init__.py", line 267, in load
parse_constant=parse_constant, **kw)
File "/usr/lib/pymodules/python2.6/simplejson/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib/pymodules/python2.6/simplejson/decoder.py", line 335, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/pymodules/python2.6/simplejson/decoder.py", line 353, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
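Instead of doing a raw find-and-replace, it's better to serialize with the json module, as shown here, and to handle the datatypes JSON doesn't support explicitly. The file above is rejected because it still contains Python literals (None, True, datetime.datetime(...)) and numbers with leading zeros (0001), none of which are valid JSON. Handling those types also gets rid of TypeError: xxxxxxx is not JSON serializable; for datetimes specifically, this Stack Overflow post on the datetime problem will be helpful.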
EDIT:
Instead of writing to CSV and then modifying it manually, I did the following:
with open('myapp/fixtures/customer_testdata.json', mode='w') as f:
    json.dump(customer, f, indent=2)
Here is the small piece of code I used to get around the TypeError: xxxx is not JSON serializable problem:
import datetime
import time
def encode_datetimes(cust):  # hypothetical name: the original snippet omits its def line
    for key in cust.keys():
        value = cust[key]
        if isinstance(value, datetime.datetime):
            temp = value.timetuple()  # converts datetime.datetime to time.struct_time
            cust.update({key: {'__class__': 'time.asctime', '__value__': time.asctime(temp)}})
    return cust
If we convert datetime.datetime to some other type, we have to change the __class__ value accordingly (e.g. timestamp --> float). Here is a fantastic reference for datetime conversions.
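Another way to get past the TypeError, sketched here as an alternative rather than the approach above: json.dump and json.dumps accept a default= callable, which the json module calls only for objects it cannot serialize itself. The sketch below turns datetimes into ISO 8601 strings and leaves None, True and numbers to the encoder (they become null, true and plain numbers). `encode_extra` is a hypothetical name and the customer dict is trimmed from the question's sample; that Django's loaddata accepts ISO strings for a DateTimeField is an assumption worth checking for your Django version.

```python
import datetime
import json

def encode_extra(obj):
    """`default` hook: called only for objects json cannot serialize itself."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(type(obj).__name__ + " is not JSON serializable")

customer = {
    "pk": 123456,
    "model": "myapp.customer",
    "fields": {
        "city": "abc",
        "maritalstatus": None,
        "datecreate": datetime.datetime(2011, 3, 29, 11, 40, 18, 157612),
    },
}

# A fixture file is a JSON array of objects, so wrap the dict in a list
text = json.dumps([customer], default=encode_extra, indent=2)
print(text)
```

Raising TypeError for anything unrecognized keeps the usual "is not JSON serializable" error for types you have not decided how to encode.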
Hope this is helpful.