How to use loops to extract elements from JSON in Python? - json

json_data = [{'User_Info':[{'Name':'John'},{'Name':'Ashly'},
{'Name':'Herbert'}]},
{'User_Info':[{'Name':''}]},
{'User_Info':[{'Name':'Lee'},{'Name':'Patrick'},{'Name':'Herbert'}]},
{'User_Info':[{'Name':'Benjamine'}]}]
I have JSON data and the length of the data is 5. I'd like to use loops to find names from that data. I've tried the code below but didn't get the expected outputs:
names_outputs = []
for ppl in json_data:
for i in ppl['User_Info']:
names_outputs.append(i['Name'])
print(names_outputs)
>>['John','Ashly','Herbert','Lee','Patrick','Walter','Steve','Benjamine']
However, my expected outputs should be like this:
[['John','Ashly','Herbert'],[],['Lee','Patrick','Herbert'],['Walter','Steve'],['Benjamine']]

You can use a nested list comprehension for that:
>>> [[name["Name"] for name in people] for people in [d["User_Info"] for d in json_data]]
[['John', 'Ashly', 'Herbert'], [''], ['Lee', 'Patrick', 'Herbert'], ['Benjamine']]
If you want to eliminate empty strings, use filter:
>>> [filter(None, [name["Name"] for name in people]) for people in [d["User_Info"] for d in json_data]]
[['John', 'Ashly', 'Herbert'], [], ['Lee', 'Patrick', 'Herbert'], ['Benjamine']]

Related

How to convert multi dimensional array in JSON as separate columns in pandas

I have a DB collection consisting of nested strings . I am trying to convert the contents under "status" column as separate columns against each order ID in order to track the time taken from "order confirmed" to "pick up confirmed". The string looks as follows:
I have tried the same using
xyz_db= db.logisticsOrders -------------------------(DB collection)
df =pd.DataFrame(list(xyz_db.find())) ------------(JSON to dataframe)
Using normalize :
parse1=pd.json_normalize(df['status'])
It works fine in case of non nested arrays. But status being a nested array the output is as follows:
Using for :
data = df[['orderid','status']]
data = list(data['status'])
dfy = pd.DataFrame(columns = ['statuscode','statusname','laststatusupdatedon'])
for i in range(0, len(data)):
result = data[i]
dfy.loc[i] = [data[i][0],data[i][0],data[i][0],data[i][0]]
It gives the result in form of appended rows which is not the format i am trying to achieve
The output I am trying to get is :
Please help out!!
i share you which i used json read, maybe help you:
you can use two and more list
def jsonify(z):
genr = []
if z==z and z is not None:
z = eval(z)
if type(z) in (dict, list, tuple):
for dic in z:
for key, val in dic.items():
if key == "name":
genr.append(val)
else:
return None
else:
return None
return genr
top_genr['genres_N']=top_genr['genres'].apply(jsonify)

How to check for specific field values based on some condition while converting csv file to json format

Below is the code to convert csv file to json format in python.
I have two fields 'recommendation' and 'rating'. Based on the recommendation value I need to set the value for rating field like if recommendation is 1 then rating =1 and vice versa. With the answer I got I'm getting output for only one record entry instead of getting all the records. I think it's overriding. Do I need to create separate list for that and append each record entry to the list to get the output for all records.
here's the updated code:
def main(input_file):
csv_rows = []
with open(input_file, 'r') as csvfile:
reader = csv.DictReader(csvfile, delimiter='|')
title = reader.fieldnames
for row in reader:
entry = OrderedDict()
for field in title:
entry[field] = row[field]
[c.update({'RATING': c['RECOMMENDATIONS']}) for c in reader]
csv_rows.append(entry)
with open(json_file, 'w') as f:
json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
f.write('\n')
I want to create the nested format like the below:
"rating": {
"user_rating": {
"rating": 1
},
"recommended": {
"rating": 1
}
After you've read the file in, using the csv.DictReader, you'll have a list of dicts. Since you want to set the values now, it's a simple dict manipulation. There are several ways, of which one is:
[c.update({'rating': c['recommendation']}) for c in read_csvDictReader]
Hope that helps.

Removing characters from column in pandas data frame

My goal is to (1) import Twitter JSON, (2) extract data of interest, (3) create pandas data frame for the variables of interest. Here is my code:
import json
import pandas as pd
tweets = []
for line in open('00.json'):
try:
tweet = json.loads(line)
tweets.append(tweet)
except:
continue
# Tweets often have missing data, therefore use -if- when extracting "keys"
tweet = tweets[0]
ids = [tweet['id_str'] for tweet in tweets if 'id_str' in tweet]
text = [tweet['text'] for tweet in tweets if 'text' in tweet]
lang = [tweet['lang'] for tweet in tweets if 'lang' in tweet]
geo = [tweet['geo'] for tweet in tweets if 'geo' in tweet]
place = [tweet['place'] for tweet in tweets if 'place' in tweet]
# Create a data frame (using pd.Index may be "incorrect", but I am a noob)
df=pd.DataFrame({'Ids':pd.Index(ids),
'Text':pd.Index(text),
'Lang':pd.Index(lang),
'Geo':pd.Index(geo),
'Place':pd.Index(place)})
# Create a data frame satisfying conditions:
df2 = df[(df['Lang']==('en')) & (df['Geo'].dropna())]
So far, everything seems to be working fine.
Now, the extracted values for Geo result in the following example:
df2.loc[1921,'Geo']
{'coordinates': [39.11890951, -84.48903638], 'type': 'Point'}
To get rid of everything except the coordinates inside the squared brackets I tried using:
df2.Geo.str.replace("[({':]", "") ### results in NaN
# and also this:
df2['Geo'] = df2['Geo'].map(lambda x: x.lstrip('{'coordinates': [').rstrip('], 'type': 'Point'')) ### results in syntax error
Please advise on the correct way to obtain coordinates values only.
The following line from your question indicates that this is an issue with understanding the underlying data type of the returned object.
df2.loc[1921,'Geo']
{'coordinates': [39.11890951, -84.48903638], 'type': 'Point'}
You are returning a Python dictionary here -- not a string! If you want to return just the values of the coordinates, you should just use the 'coordinates' key to return those values, e.g.
df2.loc[1921,'Geo']['coordinates']
[39.11890951, -84.48903638]
The returned object in this case will be a Python list object containing the two coordinate values. If you want just one of the values, you can slice the list, e.g.
df2.loc[1921,'Geo']['coordinates'][0]
39.11890951
This workflow is much easier to deal with than casting the dictionary to a string, parsing the string, and recapturing the coordinate values as you are trying to do.
So let's say you want to create a new column called "geo_coord0" which contains all of the coordinates in the first position (as shown above). You could use a something like the following:
df2["geo_coord0"] = [x['coordinates'][0] for x in df2['Geo']]
This uses a Python list comprehension to iterate over all entries in the df2['Geo'] column and for each entry it uses the same syntax we used above to return the first coordinate value. It then assigns these values to a new column in df2.
See the Python documentation on data structures for more details on the data structures discussed above.

How to convert Mnesia query results to a JSON'able list?

I am trying to use JSX to convert a list of tuples to a JSON object.
The list items are based on a record definition:
-record(player, {index, name, description}).
and looks like this:
[
{player,1,"John Doe","Hey there"},
{player,2,"Max Payne","I am here"}
]
The query function looks like this:
select_all() ->
SelectAllFunction =
fun() ->
qlc:eval(qlc:q(
[Player ||
Player <- mnesia:table(player)
]
))
end,
mnesia:transaction(SelectAllFunction).
What's the proper way to make it convertable to a JSON knowing that I have a schema of the record used and knowing the structure of tuples?
You'll have to convert the record into a term that jsx can encode to JSON correctly. Assuming you want an array of objects in the JSON for the list of player records, you'll have to either convert each player to a map or list of tuples. You'll also have to convert the strings to binaries or else jsx will encode it to a list of integers. Here's some sample code:
-record(player, {index, name, description}).
player_to_json_encodable(#player{index = Index, name = Name, description = Description}) ->
[{index, Index}, {name, list_to_binary(Name)}, {description, list_to_binary(Description)}].
go() ->
Players = [
{player, 1, "John Doe", "Hey there"},
% the following is just some sugar for a tuple like above
#player{index = 2, name = "Max Payne", description = "I am here"}
],
JSON = jsx:encode(lists:map(fun player_to_json_encodable/1, Players)),
io:format("~s~n", [JSON]).
Test:
1> r:go().
[{"index":1,"name":"John Doe","description":"Hey there"},{"index":2,"name":"Max Payne","description":"I am here"}]

Expanding a JSON column in R

I am reading in a data table from a CSV file. Some elements in the CSV are in JSON format, so one of the columns has JSON formatted data, for example:
user_id tv_sec action_info
1: 47074 1426791420 {"foo": {"bar":12345,"baz":309}, "type": "type1"}
2: 47074 1426791658 {"foo": '{"bar":23409,"baz":903}, "type": "type2"}
3: 47074 1426791923 {"foo": {"bar":97241,"baz":218}, "type": "type3"}
I would like to flatten out the action_info column and add the data as columns, as follows:
user_id tv_sec bar baz type
1: 47074 1426791420 12345 309 type1
2: 47074 1426791658 23409 903 type2
3: 47074 1426791923 97241 218 type3
I am not sure how to achieve this. I found a library to convert strings to JSON in R (RJSONIO) but I'm having a hard time figuring out what to do next. When I experiment with just trying to convert all rows in the action_info column to JSON with the command userActions[,.(fromJSON(action_info))] I basically get a data table with what seems like all the values accumulated in some way that's not entirely clear to me. For example, running that with my (non-example) data I get:
V1
1: 2.188603e+12,2.187628e+12,2.186202e+12,1.164000e+03
2: type1
Warning messages:
1: In if (is.na(encoding)) return(0L) :
the condition has length > 1 and only the first element will be used
2: In if (is.na(i)) { :
the condition has length > 1 and only the first element will be used
So, I'm trying to figure out:
how to operate on the column to convert it from JSON to values (I think I am doing this correctly though, but I'm not certain)
how to get the values and create columns out of them in either the current or new data table.
Rather ugly but should work:
library(dplyr)
library(data.table)
lapply(as.character(df$action_info), RJSONIO::fromJSON) %>%
lapply(function(e) list(bar=e$foo[1], baz=e$foo[2], type=e$type)) %>%
rbindlist() %>%
cbind(df) %>%
select(-action_info)
Data:
library(data.table)
df <- data.table(structure(list(user_id = c(47074L, 47074L, 47074L), tv_sec = c(1426791420L,
1426791658L, 1426791923L), action_info = c("{\"foo\": {\"bar\":12345,\"baz\":309}, \"type\": \"type1\"}",
"{\"foo\": {\"bar\":23409,\"baz\":903}, \"type\": \"type2\"}",
"{\"foo\": {\"bar\":97241,\"baz\":218}, \"type\": \"type3\"}"
)), .Names = c("user_id", "tv_sec", "action_info"), row.names = c(NA,
-3L), class = "data.frame"))
Here's one way to do it with data_table:
df[, c('bar', 'baz', 'type'):=as.list(unlist(fromJSON(action_info[1]))),
by=action_info]
How it works:
The by=action_info essentially makes sure we just call fromJSON once per unique action_info (once per row in your case); this is because fromJSON doesn't work on vectorised input.
The fromJSON(action_info[1]) converts the action_info to JSON (the [1] is on the off chance that you have multiple rows with the same action_info since fromJSON doesn't work on vector input).
The unlist flattens the nested "foo: {bar...}" (do fromJSON(df$action_info[1]) and unlist(fromJSON(df$action_info[1])) to see what I mean).
The as.list converts the result back into a list, with one element per "column" (data.table needs this to do the multiple assignment)
Then the c('bar', 'baz', 'type'):= assigns the output back out to the columns.
Note we don't match by name, so 'bar' is always the first part of the JSON, 'baz' is always the second, etc. If your action_info could have a {bar: ..., baz: ...} as well as a {baz: ..., bar: ...} the baz of the second will be assigned to the bar column. If you want to be cleverer and assign by name, you will have to think of something cleverer (for you could do as.list(...)[c('foo.bar', 'foo.baz', 'type')] to ensure the elements are in the right order before assigning).