I have data as below, which looks like multiple JSON dictionaries, but it is of type string. Can someone please help me convert it into JSON dictionaries?
{"id": "1305857561179152385", "tweet": "If you like vintage coke machines and guys who look like Fred Flintstone you'll love the short we've riffed: Coke R\u2026 ", "ts": "Tue Sep 15 13:14:38 +0000 2020"}{"id": "1305858267067883521", "tweet": "Chinese unicorn Genki Forest plots own beverage hits #China #Chinese #Brands #GoingGlobal\u2026 ", "ts": "Tue Sep 15 13:17:27 +0000 2020"}{"id": "1305858731293507585", "tweet": "RT #CinemaCheezy: If you like vintage coke machines and guys who look like Fred Flintstone you'll love the short we've riffed: Coke Refresh\u2026", "ts": "Tue Sep 15 13:19:17 +0000 2020"}
Try this:
let = "{'a': 'b', 'c': 'd'}{'e':'f', 'g':'h'}"
let_list = let.split('}')
d = []
for i in let_list[:-1]:
    val = eval(i + '}')  # re-attach the brace removed by split()
    d.append(val)
The output will be a list of two dictionaries:
print(d)
# [{'a': 'b', 'c': 'd'}, {'e': 'f', 'g': 'h'}]
Since the real data in the question uses double quotes, each piece is itself valid JSON, so json.loads can be used in place of eval inside the loop:
import json
val = json.loads(i + '}')
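Not shown in the answer above, but for reference, json.JSONDecoder.raw_decode can also walk the concatenated string directly without any splitting (a minimal sketch using shortened, hypothetical example data):
import json

# Concatenated JSON objects in one string, as in the question (no whitespace between objects).
raw = '{"id": "1", "tweet": "a"}{"id": "2", "tweet": "b"}'

decoder = json.JSONDecoder()
records, pos = [], 0
while pos < len(raw):
    obj, pos = decoder.raw_decode(raw, pos)  # parse one object, get the offset of the next
    records.append(obj)

print(records)  # [{'id': '1', 'tweet': 'a'}, {'id': '2', 'tweet': 'b'}]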
I am trying to read the JSON structure below into a pandas DataFrame, but it throws the error message:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
JSON data:
'''
{
    "Name": "Bob",
    "Mobile": 12345678,
    "Boolean": true,
    "Pets": ["Dog", "cat"],
    "Address": {
        "Permanent Address": "USA",
        "Current Address": "UK"
    },
    "Favorite Books": {
        "Non-fiction": "Outliers",
        "Fiction": {"Classic Literature": "The Old Man and the Sea"}
    }
}
'''
How do I get this right? I have tried the script below...
'''
j_df = pd.read_json('json_file.json')
j_df
with open(j_file) as jsonfile:
    data = json.load(jsonfile)
'''
Read the JSON from the file first, then pass it to json_normalize and use DataFrame.explode:
import json
import pandas as pd

with open('json_file.json') as data_file:
    data = json.load(data_file)

df = pd.json_normalize(data).explode('Pets').reset_index(drop=True)
print(df)
  Name    Mobile  Boolean Pets Address.Permanent Address  \
0  Bob  12345678     True  Dog                       USA
1  Bob  12345678     True  cat                       USA
  Address.Current Address Favorite Books.Non-fiction  \
0                      UK                   Outliers
1                      UK                   Outliers
  Favorite Books.Fiction.Classic Literature
0                   The Old Man and the Sea
1                   The Old Man and the Sea
EDIT: To write the values into a sentence, you can select the necessary columns, remove duplicates, create a NumPy array and loop:
for x, y in df[['Name','Favorite Books.Fiction.Classic Literature']].drop_duplicates().to_numpy():
    print(f"{x}'s favorite classic literature book is {y}.")
Bob's favorite classic literature book is The Old Man and the Sea.
I want to practice building models and I figured that I'd do it with something that I am familiar with: League of Legends. I'm having trouble replacing an integer in a dataframe with a value in a json.
The datasets I'm using come from Kaggle. You can grab them and run this yourself.
https://www.kaggle.com/datasnaek/league-of-legends
I have a JSON file of the form (it's actually much bigger, but I shortened it):
{
    "type": "champion",
    "version": "7.17.2",
    "data": {
        "1": {
            "title": "the Dark Child",
            "id": 1,
            "key": "Annie",
            "name": "Annie"
        },
        "2": {
            "title": "the Berserker",
            "id": 2,
            "key": "Olaf",
            "name": "Olaf"
        }
    }
}
and a dataframe of the form
print(df)
   gameDuration  t1_champ1id
0          1949            1
1          1851            2
2          1493            1
3          1758            1
4          2094            2
I want to replace the ID in t1_champ1id with the lookup value in the json.
If both of these were dataframes, then I could use the merge option.
This is what I've tried. I don't know if this is the best way to read in the json file.
import pandas
df = pandas.read_csv("lol_file.csv",header=0)
champ = pandas.read_json("champion_info.json", typ='series')
for i in champ.data[0]:
    for j in df:
        if df.loc[j, ('t1_champ1id')] == i:
            df.loc[j, ('t1_champ1id')] = champ[0][i]['name']
I get the below error:
the label [gameDuration] is not in the [index]'
I'm not sure that this is the most efficient way to do this, but I'm not sure how to do it at all either.
What do y'all think?
Thanks!
for j in df: iterates over the column names in df, which is unnecessary, since you're only looking to match against the column 't1_champ1id'. A better use of pandas functionality is to condense the id:name pairs from your JSON file into a dictionary, and then map it to df['t1_champ1id'].
# json_file is the parsed champion JSON (e.g. loaded with json.load)
player_names = {v['id']: v['name'] for v in json_file['data'].values()}
df.loc[:, 't1_champ1id'] = df['t1_champ1id'].map(player_names)
# gameDuration t1_champ1id
# 0 1949 Annie
# 1 1851 Olaf
# 2 1493 Annie
# 3 1758 Annie
# 4 2094 Olaf
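For completeness, a minimal sketch (not part of the original answer) of how json_file and df might be loaded before the two lines above, reusing the filenames from the question:
import json
import pandas as pd

df = pd.read_csv("lol_file.csv", header=0)

with open("champion_info.json") as f:
    json_file = json.load(f)  # dict whose "data" key holds the id/name records mapped above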
I created a dataframe from the 'data' in the JSON file (transposed it and set the index to the field you want to map on, the id), then mapped that to the original df.
import json
import pandas as pd

with open('champion_info.json') as data_file:
    champ_json = json.load(data_file)

champs = pd.DataFrame(champ_json['data']).T
champs.set_index('id', inplace=True)
df['champ_name'] = df.t1_champ1id.map(champs['name'])
We are scraping the website www.theft-alerts.com. Right now we get all the text.
connection = urllib2.urlopen('http://www.theft-alerts.com')
soup = BeautifulSoup(connection.read().replace("<br>","\n"), "html.parser")
theftalerts = []
for sp in soup.select("table div.itemspacingmodified"):
    for wd in sp.select("div.itemindentmodified"):
        text = wd.text
        if not text.startswith("Images :"):
            print(text)

with open("theft-alerts.json", 'w') as outFile:
    json.dump(theftalerts, outFile, indent=2)
Output:
STOLEN : A LARGE TAYLORS OF LOUGHBOROUGH BELL
Stolen from Bromyard on 7 August 2014
Item : The bell has a diameter of 37 1/2" is approx 3' tall weighs just shy of half a ton and was made by Taylor's of Loughborough in 1902. It is stamped with the numbers 232 and 11.
The bell had come from Co-operative Wholesale Society's Crumpsall Biscuit Works in Manchester.
Any info to : PC 2361. Tel 0300 333 3000
Messages : Send a message
Crime Ref : 22EJ / 50213D-14
No of items stolen : 1
Location : UK > Hereford & Worcs
Category : Shop, Pub, Church, Telephone Boxes & Bygones
ID : 84377
User : 1 ; Antique/Reclamation/Salvage Trade ; (Administrator)
Date Created : 11 Aug 2014 15:27:57
Date Modified : 11 Aug 2014 15:37:21;
How can we categorise the text for the JSON file? The JSON file is currently empty.
Output JSON:
[]
You can define a list and append all dictionary objects that you create to the list, e.g.:
import json

theftalerts = []

atheftobject = {}
atheftobject['location'] = 'UK > Hereford & Worcs'
atheftobject['category'] = 'Shop, Pub, Church, Telephone Boxes & Bygones'
theftalerts.append(atheftobject)

# create a new dict for the second entry, otherwise the first one is overwritten
atheftobject = {}
atheftobject['location'] = 'UK'
atheftobject['category'] = 'Shop'
theftalerts.append(atheftobject)

with open("theft-alerts.json", 'w') as outFile:
    json.dump(theftalerts, outFile, indent=2)
After this run, theft-alerts.json will contain this JSON array:
[
  {
    "location": "UK > Hereford & Worcs",
    "category": "Shop, Pub, Church, Telephone Boxes & Bygones"
  },
  {
    "location": "UK",
    "category": "Shop"
  }
]
You can play with this to generate your own JSON object.
Check out the json module.
Your JSON output remains empty because your loop doesn't append to the list.
Here's how I would extract the category name:
theftalerts = []
for sp in soup.select("table div.itemspacingmodified"):
    item_text = "\n".join(
        [wd.text for wd in sp.select("div.itemindentmodified")
         if not wd.text.startswith("Images :")])
    category = sp.find(
        'span', {'class': 'itemsmall'}).text.split('\n')[1][11:]
    theftalerts.append({'text': item_text, 'category': category})
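With the list actually populated, the json.dump call from your original code then writes real content (a short sketch reusing the same filename):
with open("theft-alerts.json", 'w') as outFile:
    json.dump(theftalerts, outFile, indent=2)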
I am trying to read a JSON file into Pandas. It's a relatively large file (41k records) mostly text.
{"sanders": [{"date": "February 8, 2016 Monday", "source": "Federal News
Service", "subsource": "MSNBC \"MSNBC Live\" Interview with Sen. Bernie
Sanders (I-VT), Democratic", "quotes": ["Well, it's not very progressive to
take millions of dollars from Wall Street as well.", "That's a very good
question, and I wish I could give her a definitive answer. QUOTE SHORTENED FOR
SPACE"]}, {"date": "February 7, 2016 Sunday", "source": "CBS News Transcripts", "subsource": "SHOW: CBS FACE THE NATION 10:30 AM EST", "quotes":
["Well, John -- John, I think that`s a media narrative that goes around and
around and around. I don`t accept that media narrative.", "Well, that`s what
she said about Barack Obama in 2008. "]},
I tried:
quotes = pd.read_json("/quotes.json")
I expected it to read in cleanly because it was a file created in Python. However, I got this error:
ValueError                                Traceback (most recent call last)
<ipython-input-19-c1acfdf0dbc6> in <module>()
----> 1 quotes = pd.read_json("/Users/kate/Documents/99Antennas/Client\ Files/Fusion/data/quotes.json")

/Users/kate/venv/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    208         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    209                           keep_default_dates, numpy, precise_float,
--> 210                           date_unit).parse()
    211
    212     if typ == 'series' or obj is None:

/Users/kate/venv/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
    276
    277         else:
--> 278             self._parse_no_numpy()
    279
    280         if self.obj is None:

/Users/kate/venv/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
    493         if orient == "columns":
    494             self.obj = DataFrame(
--> 495                 loads(json, precise_float=self.precise_float), dtype=None)
    496         elif orient == "split":
    497             decoded = dict((str(k), v)

ValueError: Expected object or value
After reading the documentation and stackoverflow, I also tried adding convert_dates=False to the parameters, but that did not correct the problem. I would welcome suggestions as to how to handle this error.
Try removing the forward slash in the filename. If you run this python code from the same directory where the file is sitting, it should work.
quotes = pd.read_json("quotes.json")
SPKoder mentioned the forward slash. I was looking for an answer when I realized I hadn't added a / when combining the filename and path (i.e. c:/path/herefile.json instead of c:/path/here/file.json). Anyway, the error I received was ...
ValueError: Expected object or value
Not a very intuitive error message, but that is what causes it.
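A small illustration of the path mistake described above (the paths are hypothetical; os.path.join is one way to avoid it):
import os
import pandas as pd

path = "c:/path/here"
filename = "file.json"

bad = path + filename                 # "c:/path/herefile.json" - missing separator
good = os.path.join(path, filename)   # "c:/path/here/file.json"

quotes = pd.read_json(good)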
How do I convert this into a pandas dataframe?
df_components = { "result1":
{"data" : [["43", "48", "27", "12"], ["67", "44", "24", "11"], ["11.85", "6.31", "5.18", "11.70"]],
"index" : [["Device_use_totala11. PS4", "Unweighted base"], ["Device_use_totala11. PS4", "Base"], ["Device_use_totala11. PS4", "Mean"]],
"columns" : [["Age", "Under 30"], ["Age", "30-44"], ["Age", "45-54"], ["Age", "55+"]]}
}
It's a dict of lists of lists.
I thought this would work, but it returns something funky which doesn't look like a dataframe:
pd.DataFrame(df_components['result1'])
Output looks like:
columns [[Age, Under 30], [Age, 30-44], [Age, 45-54], ...
data [[43, 48, 27, 12], [67, 44, 24, 11], [11.85, 6...
index [[Device_use_totala11. PS4, Unweighted base], ...
Expected output:
a multi-index df, something similar to the table below?
Your dict is not formatted properly to transform it directly into a DataFrame; you need to do:
import pandas as pd

d = df_components["result1"]
df = pd.DataFrame(d["data"],
                  columns=pd.MultiIndex.from_tuples(d["columns"]),
                  index=pd.MultiIndex.from_tuples(d["index"]))
df
                                               Age
                                          Under 30  30-44  45-54    55+
Device_use_totala11. PS4 Unweighted base        43     48     27     12
                         Base                   67     44     24     11
                         Mean                11.85   6.31   5.18  11.70
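Once the MultiIndex is in place, rows and columns can be selected by tuples on either axis (a brief usage sketch):
# column lookup by (level 0, level 1) tuple
under_30 = df[("Age", "Under 30")]

# row lookup on the row MultiIndex
mean_row = df.loc[("Device_use_totala11. PS4", "Mean")]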