Merge multiple values for same key to one dict/json (Pandas, Python, Dataframe)?

I have the following dataframe:
pd.DataFrame({'id':[1,1,1,2,2], 'key': ['a', 'a', 'b', 'a', 'b'], 'value': ['kkk', 'aaa', '5', 'kkk','8']})
I want to convert it to the following data frame:
id value
1 {'a':['kkk', 'aaa'], 'b': 5}
2 {'a':['kkk'], 'b': 8}
I am trying to do this with the .to_dict method, but the output is:
df.groupby(['id','key']).aggregate(list).groupby('id').aggregate(list).to_dict()
{'value': {1: [['kkk', 'aaa'], ['5']], 2: [['kkk'], ['8']]}}
Should I use a dict comprehension, or is there a more efficient way to build such a generic json/dict?

After you groupby(['id', 'key']) and agg(list), you can group by the first level of the index and for each group thereof, use droplevel + to_dict:
new_df = (df.groupby(['id', 'key']).agg(list)
            .groupby(level=0)
            .apply(lambda x: x['value'].droplevel(0).to_dict())
            .reset_index(name='value'))
Output:
>>> new_df
id value
0 1 {'a': ['kkk', 'aaa'], 'b': ['5']}
1 2 {'a': ['kkk'], 'b': ['8']}
Or, simpler:
new_df = df.groupby('id').apply(lambda x: x.groupby('key')['value'].agg(list).to_dict())
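Putting it together, a minimal runnable sketch of the simpler approach (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'key': ['a', 'a', 'b', 'a', 'b'],
                   'value': ['kkk', 'aaa', '5', 'kkk', '8']})

# For each id, build a {key: [values]} mapping
new_df = (df.groupby('id')
            .apply(lambda x: x.groupby('key')['value'].agg(list).to_dict())
            .reset_index(name='value'))
print(new_df)
```

Note that every key keeps a list ('b': ['5']), because all keys are aggregated the same way; unwrapping single-element lists to scalars, as in the question's desired output, would need a small post-processing step.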

Related

handle multiple double quotes & apostrophe in json key value in sql

metadata
-----------
{'a': 'jay', 'b': '100', 'c': ""ANAND'S STORE"", 'd': '200'}
Having difficulty parsing this in sql
Need to get like this:
a b c d
----------------------------------------
jay | 100 | ANAND'S STORE | 200
I have tried json_extract_path_text(metadata, 'c') as store_name, but it throws the error: Error parsing JSON: more than one document in the input
I am open to solution in mysql or postgresql also.
For Snowflake:
The problem is the doubled double quotes. Doubling a quote is how you escape it inside CSV, but this is JSON, where doubled double quotes mean nothing. If we replace each doubled double quote with a single one, this data parses fine.
select column1 as a
,replace(a, '""','"') as b
,parse_json(b) as c
,c:a::text as c_a
from values
('{\'a\': \'jay\', \'b\': \'100\', \'c\': ""ANAND\'S STORE"", \'d\': \'200\'}');
gives:
A   | {'a': 'jay', 'b': '100', 'c': ""ANAND'S STORE"", 'd': '200'}
B   | {'a': 'jay', 'b': '100', 'c': "ANAND'S STORE", 'd': '200'}
C   | { "a": "jay", "b": "100", "c": "ANAND'S STORE", "d": "200" }
C_A | jay
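Outside Snowflake, the same replace-then-parse idea can be sketched in Python. Note the value uses single-quoted keys, so it is a Python-style dict literal rather than strict JSON; hence ast.literal_eval instead of json.loads. The raw string is the example value from the question.

```python
import ast

raw = """{'a': 'jay', 'b': '100', 'c': ""ANAND'S STORE"", 'd': '200'}"""

# Doubled double quotes are CSV-style escaping; collapse them first
fixed = raw.replace('""', '"')

# After the fix, the text is a valid Python dict literal with mixed quoting
parsed = ast.literal_eval(fixed)
print(parsed['c'])
```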

Insert list json objects into row based on other column values in dataframe

I have dataframe with the following columns:
ID  A1  B1  C1  A2  B2  C2  A3  B3  C3
AA   1   3   6               4   0   6
BB   5   5   4   6   7   9
CC   5   5   5
I want to create a new column called Z that gathers each row's values into a JSON list of records, using the column letter as the key. After the JSON column is constructed, I want to drop all the other columns and keep only Z and ID.
Here is the output desired:
ID Z
AA [{"A":1, "B":3,"C":6},{"A":4, "B":0,"C":6}]
BB [{"A":5, "B":5,"C":4},{"A":6, "B":7,"C":9}]
CC [{"A":5, "B":5,"C":5}]
Here is my current attempt:
df2 = (df.groupby(['ID'])
         .apply(lambda x: x[['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3', 'B3', 'C3']]
                .to_dict('records'))
         .to_frame('Z')
         .reset_index())
The problem is that I cannot rename the columns so that only the letter remains and the number is removed, as in the example above. The code above also does not split each group of three columns into its own record, so I get one nine-key object per row instead of several three-key objects in the list. I would like to accomplish this in Pandas if possible. Any guidance is greatly appreciated.
Pandas solution
Convert the columns to a MultiIndex by splitting around a regex delimiter with expand, then stack the dataframe so the numeric suffix moves into the row index. Finally, group on level=0 (the ID) and apply to_dict to create the records per ID:
s = df.set_index('ID')
s.columns = s.columns.str.split(r'(?=\d+$)', expand=True)
s.stack().groupby(level=0).apply(pd.DataFrame.to_dict, 'records').reset_index(name='Z')
Result
ID Z
0 AA [{'A': 1.0, 'B': 3.0, 'C': 6.0}, {'A': 4.0, 'B': 0.0, 'C': 6.0}]
1 BB [{'A': 5.0, 'B': 5.0, 'C': 4.0}, {'A': 6.0, 'B': 7.0, 'C': 9.0}]
2 CC [{'A': 5.0, 'B': 5.0, 'C': 5.0}]
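A self-contained version of the above, with the data reconstructed from the question; the explicit dropna removes all-NaN groups regardless of pandas version (newer pandas no longer drops them in stack by default):

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame({'ID': ['AA', 'BB', 'CC'],
                   'A1': [1, 5, 5], 'B1': [3, 5, 5], 'C1': [6, 4, 5],
                   'A2': [nan, 6, nan], 'B2': [nan, 7, nan], 'C2': [nan, 9, nan],
                   'A3': [4, nan, nan], 'B3': [0, nan, nan], 'C3': [6, nan, nan]})

s = df.set_index('ID')
# Split 'A1' into ('A', '1') etc. to get a two-level column index
s.columns = s.columns.str.split(r'(?=\d+$)', expand=True)

out = (s.stack()                      # rows become (ID, group number)
        .dropna(how='all')            # discard the empty groups explicitly
        .groupby(level=0)
        .apply(pd.DataFrame.to_dict, 'records')
        .reset_index(name='Z'))
print(out)
```

The values come out as floats because the NaN padding forces a float dtype; cast afterwards if integers are required.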
Have you tried going row by row? I am not very good with pandas and Python, but I came up with this code. Hope it works for you.
toAdd = []
for row in dataset.values:
    toAddLine = {}
    i = 0
    for data in row:
        if data is not None:
            toAddLine["New Column Name " + dataset.columns[i]] = data
        i = i + 1
    toAdd.append(toAddLine)
dataset['Z'] = toAdd
dataset['Z']
import json

# create a column-name map for renaming the related columns
columns = dataset.columns
columns_map = {}
for i in columns:
    columns_map[i] = f"new {i}"

def change_row_to_json(row):
    new_dict = {}
    for index, value in enumerate(row):
        new_dict[columns_map[columns[index]]] = value
    return json.dumps(new_dict, indent=4)

dataset.loc[:, 'Z'] = dataset.apply(change_row_to_json, axis=1)
dataset = dataset[["ID", "Z"]]
I just added a few lines to Shubham's code and it worked for me:
import pandas as pd
from numpy import nan
data = pd.DataFrame({'ID': {0: 'AA', 1: 'BB', 2: 'CC'}, 'A1': {0: 1, 1: 5, 2: 5}, 'B1': {0: 3, 1: 5, 2: 5}, 'C1': {0: 6, 1: 4, 2: 5}, 'A2': {0: nan, 1: 6.0, 2: nan}, 'B2': {0: nan, 1: 7.0, 2: nan}, 'C2': {0: nan, 1: 9.0, 2: nan}, 'A3': {0: 4.0, 1: nan, 2: nan}, 'B3': {0: 0.0, 1: nan, 2: nan}, 'C3': {0: 6.0, 1: nan, 2: nan}} )
data
data.index = data.ID
data.drop(columns=['ID'],inplace=True)
data
data.columns = data.columns.str.split(r'(?=\d+$)', expand=True)
d = data.stack().groupby(level=0).apply(pd.DataFrame.to_dict, 'records').reset_index(name='Z')
d.index = d.ID
d.drop(columns=['ID'],inplace=True)
d.to_dict()['Z']
Now we can see we get the desired output. Thanks, @Shubham Sharma, for the answer; I think this might help.

data transformation from pandas to json

I have a dataframe df:
d = {'col1': [1, 2,0,55,12,3], 'col3': ['A','A','A','B','B','B'] }
df = pd.DataFrame(data=d)
df
col1 col3
0 1 A
1 2 A
2 0 A
3 55 B
4 12 B
5 3 B
and want to build a Json from it, as the results looks like this :
json_result = { 'A' : [1,2,0], 'B': [55,12,3] }
Basically, for each group in col3, I would like to get an array of its corresponding col1 values from the dataframe.
Aggregate list and then use Series.to_json:
print (df.groupby('col3')['col1'].agg(list).to_json())
{"A":[1,2,0],"B":[55,12,3]}
or if need dictionary use Series.to_dict:
print (df.groupby('col3')['col1'].agg(list).to_dict())
{'A': [1, 2, 0], 'B': [55, 12, 3]}
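For completeness, a self-contained sketch combining the pieces above:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 0, 55, 12, 3],
                   'col3': ['A', 'A', 'A', 'B', 'B', 'B']})

# One list of col1 values per col3 group
grouped = df.groupby('col3')['col1'].agg(list)

print(grouped.to_json())   # JSON string: {"A":[1,2,0],"B":[55,12,3]}
print(grouped.to_dict())   # plain dict: {'A': [1, 2, 0], 'B': [55, 12, 3]}
```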

how to make lua table key in order

my test code:
local jsonc = require "jsonc"
local x = {
    a = 1,
    b = 2,
    c = 3,
    d = 4,
    e = 5,
}
for k, v in pairs(x) do
    print(k, v)
end
print(jsonc.stringify(x))
output:
a 1
c 3
b 2
e 5
d 4
{"a":1,"c":3,"b":2,"e":5,"d":4}
Can someone help? As the pairs output shows, Lua iterates table keys in hash order. How can I change that?
I need this output: {"a":1,"b":2,"c":3,"d":4,"e":5}
Thanks
Lua tables can't preserve the order of their keys. There are two possible solutions.
You can store the keys in a separate array and iterate through that whenever you need to traverse the table in order:
local keys = {'a', 'b', 'c', 'd', 'e'}
for _, k in ipairs(keys) do
    print(k, x[k])
end
Or, instead of a hash table, you can use an array of pairs, which ipairs traverses in order:
local x = {
    {'a', 1},
    {'b', 2},
    {'c', 3},
    {'d', 4},
    {'e', 5},
}

pandas json_normalize flatten nested dictionaries

I am trying to flatten nested dictionaries by using json_normalize.
My data is like this:
data = [
    {'gra': [{'A': 1, 'B': 9, 'C': {'D': '1', 'E': '1'}, 'date': '2019-06-27'}]},
    {'gra': [{'A': 2, 'B': 1, 'C': {'D': '1', 'E': '2'}, 'date': '2019-06-27'}]},
    {'gra': [{'A': 6, 'B': 1, 'C': {'D': '1', 'E': '3'}, 'date': '2019-06-27'}]},
]
I want to get a dataframe like this:
A B C.D C.E date
1 9 1 1 2019-06-27
2 1 1 2 2019-06-27
6 1 1 3 2019-06-27
I tried record_path and meta in the json_normalize, but it keeps giving me an error.
How do you achieve this?
json_normalize does a pretty good job of flattening the object into a pandas dataframe:
from pandas.io.json import json_normalize
json_normalize(sample_object)
(In recent pandas versions, the pandas.io.json import is deprecated; use pd.json_normalize instead.)
from pandas.io.json import json_normalize
data_ = [item['gra'][0] for item in data]  # unwrap the single record inside each 'gra' list
print (json_normalize(data_))
output:
A B C.D C.E date
0 1 9 1 1 2019-06-27
1 2 1 1 2 2019-06-27
2 6 1 1 3 2019-06-27
Iterating the list is the easiest way, though I can't say it is the best. I hope it solves your problem.
data = [
    {'gra': [{'A': 1, 'B': 9, 'C': {'D': '1', 'E': '1'}, 'date': '2019-06-27'}]},
    {'gra': [{'A': 2, 'B': 1, 'C': {'D': '1', 'E': '2'}, 'date': '2019-06-27'}]},
    {'gra': [{'A': 6, 'B': 1, 'C': {'D': '1', 'E': '3'}, 'date': '2019-06-27'}]},
]
import pandas as pd

final_list = []
for i in data:
    temp = dict()
    temp['A'] = i['gra'][0]['A']
    temp['B'] = i['gra'][0]['B']
    temp['C.D'] = i['gra'][0]['C']['D']
    temp['C.E'] = i['gra'][0]['C']['E']
    temp['date'] = i['gra'][0]['date']
    final_list.append(temp)
df = pd.DataFrame.from_dict(final_list)
print(df)
A B C.D C.E date
0 1 9 1 1 2019-06-27
1 2 1 1 2 2019-06-27
2 6 1 1 3 2019-06-27
First we normalize, then reorder the columns to produce the required output:
import pandas as pd
data = [
    {'gra': [{'A': 1, 'B': 9, 'C': {'D': '1', 'E': '1'}, 'date': '2019-06-27'}]},
    {'gra': [{'A': 2, 'B': 1, 'C': {'D': '1', 'E': '2'}, 'date': '2019-06-27'}]},
    {'gra': [{'A': 6, 'B': 1, 'C': {'D': '1', 'E': '3'}, 'date': '2019-06-27'}]},
]
df = pd.json_normalize(data, 'gra')
cols = ['A','B','C.D','C.E','date']
df = df[cols]
print(df)
A B C.D C.E date
0 1 9 1 1 2019-06-27
1 2 1 1 2 2019-06-27
2 6 1 1 3 2019-06-27