Merging arrays of dictionaries - json

Given the following two arrays of dictionaries, how can I merge them such that the resulting array of dictionaries contains only those dictionaries whose version is greatest?
data1 = [{'id': 1, 'name': 'Oneeee', 'version': 2},
{'id': 2, 'name': 'Two', 'version': 1},
{'id': 3, 'name': 'Three', 'version': 2},
{'id': 4, 'name': 'Four', 'version': 1},
{'id': 5, 'name': 'Five', 'version': 1}]
data2 = [{'id': 1, 'name': 'One', 'version': 1},
{'id': 2, 'name': 'Two', 'version': 1},
{'id': 3, 'name': 'Threeee', 'version': 3},
{'id': 6, 'name': 'Six', 'version': 2}]
The merged result should look like this:
data3 = [{'id': 1, 'name': 'Oneeee', 'version': 2},
{'id': 2, 'name': 'Two', 'version': 1},
{'id': 3, 'name': 'Threeee', 'version': 3},
{'id': 4, 'name': 'Four', 'version': 1},
{'id': 5, 'name': 'Five', 'version': 1},
{'id': 6, 'name': 'Six', 'version': 2}]

How about the following:
//define data1 and data2
var data1 = new[]{
new {id = 1, name = "Oneeee", version = 2},
new {id = 2, name = "Two", version = 1},
new {id = 3, name = "Three", version = 2},
new {id = 4, name = "Four", version = 1},
new {id = 5, name ="Five", version = 1}
};
var data2 = new[] {
new {id = 1, name = "One", version = 1},
new {id = 2, name = "Two", version = 1},
new {id = 3, name = "Threeee", version = 3},
new {id = 6, name = "Six", version = 2}
};
//create a dictionary to handle lookups
var dict1 = data1.ToDictionary (k => k.id);
var dict2 = data2.ToDictionary (k => k.id);
// now query the data
var q = from k in dict1.Keys.Union(dict2.Keys)
select
dict1.ContainsKey(k) ?
(
dict2.ContainsKey(k) ?
(
dict1[k].version > dict2[k].version ? dict1[k] : dict2[k]
) :
dict1[k]
) :
dict2[k];
// convert enumerable back to array
var result = q.ToArray();
An alternative solution that is database-friendly if data1 and data2 are tables:
var q = (
from d1 in data1
join d2 in data2 on d1.id equals d2.id into data2j
from d2j in data2j.DefaultIfEmpty()
where d2j == null || d1.version >= d2j.version
select d1
).Union(
from d2 in data2
join d1 in data1 on d2.id equals d1.id into data1j
from d1j in data1j.DefaultIfEmpty()
where d1j == null || d2.version > d1j.version
select d2
);
var result = q.ToArray();

Assume you have a class like this (JSON.NET attributes used here):
public class Data
{
[JsonProperty("id")]
public int Id { get; set; }
[JsonProperty("name")]
public string Name { get; set; }
[JsonProperty("version")]
public int Version { get; set; }
}
You can parse your JSON into arrays of Data objects:
var str1 = @"[{'id': 1, 'name': 'Oneeee', 'version': 2},
{'id': 2, 'name': 'Two', 'version': 1},
{'id': 3, 'name': 'Three', 'version': 2},
{'id': 4, 'name': 'Four', 'version': 1},
{'id': 5, 'name': 'Five', 'version': 1}]";
var str2 = @"[{'id': 1, 'name': 'One', 'version': 1},
{'id': 2, 'name': 'Two', 'version': 1},
{'id': 3, 'name': 'Threeee', 'version': 3},
{'id': 6, 'name': 'Six', 'version': 2}]";
var data1 = JsonConvert.DeserializeObject<Data[]>(str1);
var data2 = JsonConvert.DeserializeObject<Data[]>(str2);
So you can concat these two arrays, group the items by id, and select from each group the item with the highest version:
var data3 = data1.Concat(data2)
.GroupBy(d => d.Id)
.Select(g => g.OrderByDescending(d => d.Version).First())
.ToArray(); // or ToDictionary(d => d.Id)
Result (serialized back to JSON):
[
{ "id": 1, "name": "Oneeee", "version": 2 },
{ "id": 2, "name": "Two", "version": 1 },
{ "id": 3, "name": "Threeee", "version": 3 },
{ "id": 4, "name": "Four", "version": 1 },
{ "id": 5, "name": "Five", "version": 1 },
{ "id": 6, "name": "Six", "version": 2 }
]
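For comparison, the same concat, group-by-id, keep-highest-version idea can be sketched directly in Python (a minimal sketch, assuming data1 and data2 are the two lists from the question):
from itertools import chain

# keep, for each id, the dict with the highest version; on a tie the data1 entry wins
merged = {}
for d in chain(data1, data2):
    current = merged.get(d['id'])
    if current is None or d['version'] > current['version']:
        merged[d['id']] = d

data3 = sorted(merged.values(), key=lambda d: d['id'])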

Related

Postgresql -> JSON Array to rows

One of the columns in my table has JSON array data. I have used jsonb_agg() on a jsonb column in a view. The data from the view now looks like this:
[
{
"Android": 2,
"Windows": 1,
"Macintosh": 1
},
{
"iOS": 1,
"Android": 2,
"Windows": 2,
"Macintosh": 2
},
{},
{
"Android": 1,
"Windows": 1
},
{
"Android": 1
},
{
"iOS": 1,
"Android": 2
},
{
"iOS": 2,
"Android": 1
},
{
"iOS": 2
},
{
"Android": 1
},
{
"iOS": 2,
"Windows": 1
},
{
"Android": 5
},
{},
{},
{
"iOS": 1,
"Android": 1
},
{},
{},
{
"Windows": 3
}
]
However, I need to produce the result below:
{
"Android": 16,
"Windows": 8,
"Macintosh": 3,
"iOS": 9
}
Is there a way within PostgreSQL to achieve this?
Yes, there is a way. Here it is:
select to_jsonb(t.*) from
(
select
sum((j->>'Android')::numeric) "Android",
sum((j->>'Windows')::numeric) "Windows",
sum((j->>'Macintosh')::numeric) "Macintosh",
sum((j->>'iOS')::numeric) "iOS"
from jsonb_array_elements('[{"Android": 2, "Windows": 1, "Macintosh": 1}, {"iOS": 1, "Android": 2, "Windows": 2, "Macintosh": 2}, {}, {"Android": 1, "Windows": 1}, {"Android": 1}, {"iOS": 1, "Android": 2}, {"iOS": 2, "Android": 1}, {"iOS": 2}, {"Android": 1}, {"iOS": 2, "Windows": 1}, {"Android": 5}, {}, {}, {"iOS": 1, "Android": 1}, {}, {}, "Windows": 3}]'::jsonb) as j
) as t;
As a parameterized query:
select to_jsonb(t.*) from
(
select
sum((j->>'Android')::numeric) "Android",
sum((j->>'Windows')::numeric) "Windows",
sum((j->>'Macintosh')::numeric) "Macintosh",
sum((j->>'iOS')::numeric) "iOS"
from jsonb_array_elements(?::jsonb) as j
) as t;
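For example, the parameterized form could be called from Python roughly like this (a sketch assuming the psycopg2 driver, which uses %s placeholders instead of ?; the connection string and sample payload are placeholders):
import json
import psycopg2  # assumed driver; any DB-API client works similarly

SQL = """
select to_jsonb(t.*) from
(
    select
        sum((j->>'Android')::numeric)   "Android",
        sum((j->>'Windows')::numeric)   "Windows",
        sum((j->>'Macintosh')::numeric) "Macintosh",
        sum((j->>'iOS')::numeric)       "iOS"
    from jsonb_array_elements(%s::jsonb) as j
) as t
"""

counts = [{"Android": 2, "Windows": 1, "Macintosh": 1}, {"iOS": 1, "Android": 2}]

conn = psycopg2.connect("dbname=test")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute(SQL, (json.dumps(counts),))
    print(cur.fetchone()[0])  # psycopg2 returns the jsonb result as a Python dict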
You can use dynamic keys and values like this:
with data as (
select
je.key,
sum(je.value::int)
from
json_array_elements('[{"Android": 2, "Windows": 1, "Macintosh": 1}, {"iOS": 1, "Android": 2, "Windows": 2, "Macintosh": 2}, {}, {"Android": 1, "Windows": 1}, {"Android": 1}, {"iOS": 1, "Android": 2}, {"iOS": 2, "Android": 1}, {"iOS": 2}, {"Android": 1}, {"iOS": 2, "Windows": 1}, {"Android": 5}, {}, {}, {"iOS": 1, "Android": 1}, {}, {}, {"Windows": 3}]') d
cross join json_each_text(d) as je
group by 1
)
select json_object_agg(d.key, d.sum)
from data d
Or, if you have a table with a JSON column, you can use the same approach like this:
with data as (
select
je.key,
sum(je.value::int)
from
your_table t join
json_array_elements(t.your_json_column) d on true
cross join json_each_text(d) as je
group by 1
)
select json_object_agg(d.key, d.sum)
from data d
You can combine several json/jsonb functions to get the result:
SELECT
jsonb_agg(json_build_object(key, sum))
FROM (
SELECT
key,
sum(value)
FROM (
SELECT
(arr).key,
(arr).value::int
FROM (
SELECT
jsonb_each(j) arr
FROM (
SELECT
jsonb_array_elements('[{"Android": 2, "Windows": 1, "Macintosh": 1}, {"iOS": 1, "Android": 2, "Windows": 2, "Macintosh": 2}, {}, {"Android": 1, "Windows": 1}, {"Android": 1}, {"iOS": 1, "Android": 2}, {"iOS": 2, "Android": 1}, {"iOS": 2}, {"Android": 1}, {"iOS": 2, "Windows": 1}, {"Android": 5}, {}, {}, {"iOS": 1, "Android": 1}, {}, {}, {"Windows": 3}]
'::jsonb) AS j) AS jpart) AS x) sub
GROUP BY
1) AS sub2
output: [{"Windows": 8}, {"Android": 16}, {"iOS": 9}, {"Macintosh": 3}]

read large Excel file with multiple worksheets to JSON with Python

I have a large Excel file (100 MB) with multiple worksheets.
sheet A
id | name | address
1 | joe | A
2 | gis | B
3 | leo | C
work_1
id| call
1 | 10
1 | 8
2 | 1
3 | 3
work_2
id| call
2 | 4
3 | 8
3 | 7
Desired JSON for each id:
data = { id: 1,
address: A,
name: Joe,
log : [{call:10}, {call:8 }]
}
data= { id: 2,
address: B,
name: Gis,
log : [{call:1}, {call:4}]
}
data= { id: 3,
address: C,
name: Leo,
log : [{call:3}, {call:8}, {call:7}]
}
I've tried pandas, but it takes 5 minutes just for read_excel, without any processing. Is there a solution to make it faster, and how do I get the desired JSON?
Maybe divide the process into chunks (but pandas removed chunksize for read_excel) and add some threading, so progress could be printed for each batch.
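For context, the answer below assumes the three worksheets have already been read into the DataFrames dfa, work1 and work2; a minimal loading sketch (the file name data.xlsx is an assumption, the sheet names follow the layout above) might be:
import pandas as pd

# sheet_name=None reads every worksheet in one pass and returns a dict of DataFrames
sheets = pd.read_excel('data.xlsx', sheet_name=None)
dfa = sheets['sheet A']
work1 = sheets['work_1']
work2 = sheets['work_2']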
You can do:
works = pd.concat([work1, work2], ignore_index=True)
mapper_works = works.groupby('id')[['call']].apply(lambda x: x.to_dict('records'))
dfa['log'] = dfa['id'].map(mapper_works)
data = dfa.reindex(columns=['id', 'address', 'name', 'log']).to_dict('records')
print(data)
The output is a list of dicts, one per id:
[{'id': 1, 'address': 'A', 'name': 'joe', 'log': [{'call': 10}, {'call': 8}]},
{'id': 2, 'address': 'B', 'name': 'gis', 'log': [{'call': 1}, {'call': 4}]},
{'id': 3, 'address': 'C', 'name': 'leo', 'log': [{'call': 3}, {'call': 8}, {'call': 7}]}
]
If you want, you can assign it to a column:
dfa['dicts']=data
print(dfa)
id name address log \
0 1 joe A [{'call': 10}, {'call': 8}]
1 2 gis B [{'call': 1}, {'call': 4}]
2 3 leo C [{'call': 3}, {'call': 8}, {'call': 7}]
dicts
0 {'id': 1, 'address': 'A', 'name': 'joe', 'log'...
1 {'id': 2, 'address': 'B', 'name': 'gis', 'log'...
2 {'id': 3, 'address': 'C', 'name': 'leo', 'log'...
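Since the goal is JSON per id, the resulting list of dicts can then be written out with the standard json module (the output file name is an assumption):
import json

# data is the list of per-id dicts built above
with open('output.json', 'w') as f:
    json.dump(data, f, indent=2)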

pandas json_normalize flatten nested dictionaries

I am trying to flatten nested dictionaries by using json_normalize.
My data is like this:
data = [
{'gra': [
{
'A': 1,
'B': 9,
'C': {'D': '1', 'E': '1'},
'date': '2019-06-27'
}
]},
{'gra': [
{
'A': 2,
'B': 1,
'C': {'D': '1', 'E': '2'},
'date': '2019-06-27'
}
]},
{'gra': [
{
'A': 6,
'B': 1,
'C': {'D': '1', 'E': '3'},
'date': '2019-06-27'
}
]}
]
I want to get a dataframe like this:
A B C.D C.E date
1 9 1 1 2019-06-27
2 1 1 2 2019-06-27
6 1 1 3 2019-06-27
I tried record_path and meta in the json_normalize, but it keeps giving me an error.
How do you achieve this?
json_normalize does a pretty good job of flattening the object into a pandas DataFrame:
from pandas.io.json import json_normalize
json_normalize(sample_object)
from pandas.io.json import json_normalize
data_ = [item['gra'][0] for item in data] # [{'A': 1, 'B': 9, 'C': {'D': '1', 'E': '1'}, 'date': '2019-06-27'}, {'A': 2, 'B': 1, 'C': {'D': '1', 'E': '2'}, 'date': '2019-06-27'}, {'A': 6, 'B': 1, 'C': {'D': '1', 'E': '3'}, 'date': '2019-06-27'}]
print (json_normalize(data_))
output:
A B C.D C.E date
0 1 9 1 1 2019-06-27
1 2 1 1 2 2019-06-27
2 6 1 1 3 2019-06-27
This is the easiest way, iterating over the list, though I can't say it is the best way. I hope it solves your problem.
data = [{'gra':[{'A': 1,
'B': 9,
'C': {'D': '1', 'E': '1'},
'date': '2019-06-27'}]},
{'gra':[{'A': 2,
'B': 1,
'C': {'D': '1', 'E': '2'},
'date': '2019-06-27'}]},
{'gra':[{'A': 6,
'B': 1,
'C': {'D': '1', 'E': '3'},
'date': '2019-06-27'}]}
]
import pandas as pd

final_list = []
for i in data:
    temp = dict()
    temp['A'] = i['gra'][0]['A']
    temp['B'] = i['gra'][0]['B']
    temp['C.D'] = i['gra'][0]['C']['D']
    temp['C.E'] = i['gra'][0]['C']['E']
    temp['date'] = i['gra'][0]['date']
    final_list.append(temp)
df = pd.DataFrame.from_dict(final_list)
print(df)
A B C.D C.E date
0 1 9 1 1 2019-06-27
1 2 1 1 2 2019-06-27
2 6 1 1 3 2019-06-27
First we normalize, and then reorder the columns to produce the required output:
import pandas as pd
data = [
{'gra': [
{
'A': 1,
'B': 9,
'C': {'D': '1', 'E': '1'},
'date': '2019-06-27'
}
]},
{'gra': [
{
'A': 2,
'B': 1,
'C': {'D': '1', 'E': '2'},
'date': '2019-06-27'
}
]},
{'gra': [
{
'A': 6,
'B': 1,
'C': {'D': '1', 'E': '3'},
'date': '2019-06-27'
}
]}
]
df = pd.json_normalize(data, 'gra')
cols = ['A','B','C.D','C.E','date']
df = df[cols]
print(df)
A B C.D C.E date
0 1 9 1 1 2019-06-27
1 2 1 1 2 2019-06-27
2 6 1 1 3 2019-06-27

datetime keyerror in json API data

Using Python 3.5, I'm trying to return data from the Todoist REST API, which is in JSON format.
[{'id': 2577166691, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577166691', 'completed': False, 'order': 2, 'content': 'soon', 'priority': 1, 'comment_count': 0, 'due': {'recurring': False, 'date': '2018-04-01', 'timezone': 'UTC+10:00', 'datetime': '2018-04-01T10:00:00Z', 'string': 'Mar 31 2019'}, 'indent': 1}, {'id': 2577166849, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577166849', 'completed': False, 'order': 3, 'content': 'To City +1', 'priority': 1, 'comment_count': 0, 'due': {'recurring': False, 'date': '2018-03-31', 'string': 'Mar 31'}, 'indent': 1}, {'id': 2577225965, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577225965', 'completed': False, 'order': 4, 'content': 'To City +2', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577974095, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577974095', 'completed': False, 'order': 5, 'content': 'To City +3', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577974970, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577974970', 'completed': False, 'order': 6, 'content': 'Next train from City', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577975012, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577975012', 'completed': False, 'order': 7, 'content': 'From City +1', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577975101, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577975101', 'completed': False, 'order': 8, 'content': 'From City +2', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577975145, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577975145', 'completed': False, 'order': 9, 'content': 'From City +3', 'priority': 1, 'comment_count': 0, 'indent': 1}]
I can correctly obtain data for all items, e.g.
print(json_tasks[0]['id'])
2577166691
And it also works for
print(json_tasks[0]['due']['recurring'])
False
print(json_tasks[0]['due']['date'])
2018-04-01
But:
print(json_tasks[0]['due']['datetime'])
KeyError: 'datetime'
I have tried a number of things but I'm stumped. What am I doing wrong? How can I get it to recognise 'datetime' as a key?
The code below, when I ran it, printed out 2018-04-01T10:00:00Z.
json_tasks = [{'id': 2577166691, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577166691', 'completed': False, 'order': 2, 'content': 'soon', 'priority': 1, 'comment_count': 0, 'due': {'recurring': False, 'date': '2018-04-01', 'timezone': 'UTC+10:00', 'datetime': '2018-04-01T10:00:00Z', 'string': 'Mar 31 2019'}, 'indent': 1}, {'id': 2577166849, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577166849', 'completed': False, 'order': 3, 'content': 'To City +1', 'priority': 1, 'comment_count': 0, 'due': {'recurring': False, 'date': '2018-03-31', 'string': 'Mar 31'}, 'indent': 1}, {'id': 2577225965, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577225965', 'completed': False, 'order': 4, 'content': 'To City +2', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577974095, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577974095', 'completed': False, 'order': 5, 'content': 'To City +3', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577974970, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577974970', 'completed': False, 'order': 6, 'content': 'Next train from City', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577975012, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577975012', 'completed': False, 'order': 7, 'content': 'From City +1', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577975101, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577975101', 'completed': False, 'order': 8, 'content': 'From City +2', 'priority': 1, 'comment_count': 0, 'indent': 1}, {'id': 2577975145, 'project_id': 2181643136, 'url': 'https://todoist.com/showTask?id=2577975145', 'completed': False, 'order': 9, 'content': 'From City +3', 'priority': 1, 'comment_count': 0, 'indent': 1}]
print(json_tasks[0]['due']['datetime'])
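Note that in the data shown, only the first task's 'due' object actually contains a 'datetime' key: the second task's 'due' has only 'date', and later tasks have no 'due' at all, so indexing a different element the same way raises exactly this KeyError. A defensive sketch (an addition, not part of the original answer) using dict.get avoids it:
for task in json_tasks:
    due = task.get('due') or {}         # {} when the task has no 'due' object at all
    due_datetime = due.get('datetime')  # None when 'due' carries only a 'date'
    print(task['id'], due_datetime)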

import csv to json/dict with additional keywords/organization

I'm trying to convert a .csv to JSON/dict. The data is currently in this form:
cat1,cat2,cat3,name
1,2,3,a
4,5,6,b
7,8,9,c
I'm currently using something like this (as well as importing with a pandas DataFrame, because it will be used for graphing from the JSON file):
import csv

with open('Data.csv') as f:
    reader = csv.DictReader(f)
    rows = list(reader)
print(rows)
[{'cat1': '1', 'name': 'a', 'cat3': '3', 'cat2': '2'},
{'cat1': '4', 'name': 'b', 'cat3': '6', 'cat2': '5'},
{'cat1': '7', 'name': 'c', 'cat3': '9', 'cat2': '8'}]
and I want it to look like this in json/dict format:
{"data: [{"all_cats": {"cat1": 1}, {"cat2": 2}, {"cat3": 3}}, "name": a},
{"all_cats": {"cat1": 4}, {"cat2": 5}, {"cat3": 6}}, "name": b},
{"all_cats": {"cat1": 7}, {"cat2": 8}, {"cat3": 8}}, "name": c}]}
Importing directly doesn't allow me to include: 'cat1', 'cat2', 'cat3' under 'all_cats' and keep 'name' separate.
Any help would be appreciated.
If your file is actually space-separated rather than comma-separated, you have to add delimiter=" "; and since some of your rows have leading whitespace, you also have to add skipinitialspace=True.
reader = csv.DictReader(f, delimiter=" ", skipinitialspace=True)
rows = list(dict(row) for row in reader)
Thus if you now do:
for row in rows:
    print(row)
The output will be:
{'cat1': '1', 'cat2': '2', 'cat3': '3', 'name': 'a'}
{'cat1': '4', 'cat2': '5', 'cat3': '6', 'name': 'b'}
{'cat1': '7', 'cat2': '8', 'cat3': '9', 'name': 'c'}
As already mentioned in the other answer, what you specified is not valid JSON. You can check whether a string contains valid JSON using the json.loads(jsonDATAstring) function:
import json
jsonDATAstring_1 = """
{"data: [{"all_cats": {"cat1": 1}, {"cat2": 2}, {"cat3": 3}}, "name": a},
{"all_cats": {"cat1": 4}, {"cat2": 5}, {"cat3": 6}}, "name": b},
{"all_cats": {"cat1": 7}, {"cat2": 8}, {"cat3": 8}}, "name": c}]}
"""
json.loads(jsonDATAstring_1)
which, for the JSON format you specified, results in:
json.decoder.JSONDecodeError: Expecting ':' delimiter: line 2 column 12 (char 12)
From your question, I assume that the JSON string you want to get is the following:
jsonDATAstring_2 = """
{"data": [{"all_cats": {"cat1": 1, "cat2": 2, "cat3": 3}, "name": "a"},
{"all_cats": {"cat1": 4, "cat2": 5, "cat3": 6}, "name": "b"},
{"all_cats": {"cat1": 7, "cat2": 8, "cat3": 8}, "name": "c"}]}
"""
json.loads(jsonDATAstring_2)
This second string loads OK, so assuming:
rows = [{'cat1': '1', 'name': 'a', 'cat3': '3', 'cat2': '2'},
{'cat1': '4', 'name': 'b', 'cat3': '6', 'cat2': '5'},
{'cat1': '7', 'name': 'c', 'cat3': '9', 'cat2': '8'}]
you can get what you want as follows:
dctData = {"data": []}
lstCats = ['cat1', 'cat2', 'cat3']
for row in rows:
dctAllCats = {"all_cats":{}, "name":"?"}
for cat in lstCats:
dctAllCats["all_cats"][cat] = row[cat]
dctAllCats["name"] = row["name"]
dctData["data"].append(dctAllCats)
import pprint
pp = pprint.PrettyPrinter()
pp.pprint(dctData)
which gives:
{'data': [{'all_cats': {'cat1': '1', 'cat2': '2', 'cat3': '3'}, 'name': 'a'},
{'all_cats': {'cat1': '4', 'cat2': '5', 'cat3': '6'}, 'name': 'b'},
{'all_cats': {'cat1': '7', 'cat2': '8', 'cat3': '9'}, 'name': 'c'}]}
Now it is possible to serialize the Python dictionary object to a JSON string (or file):
jsonString = json.dumps(dctData)
print(jsonString)
which gives:
{"data": [{"all_cats": {"cat1": "1", "cat2": "2", "cat3": "3"}, "name": "a"}, {"all_cats": {"cat1": "4", "cat2": "5", "cat3": "6"}, "name": "b"}, {"all_cats": {"cat1": "7", "cat2": "8", "cat3": "9"}, "name": "c"}]}