Python csv conversion to specific nested json

I have a DataFrame (a CSV file loaded into pandas) as below:
col1  col2  col3  col4  col5  name        amount
1     USA   4000  Air   60    Education   200
1     USA   4000  Air   60    Car         100
1     USA   4000  Air   60    Restaurant  100
2     UK    5000  Cash  50    Government  125
2     UK    5000  Cash  50    Restaurant  135
Now I need to convert it into a nested JSON format, with one record per group (col1, col2, col3 and col4 are the grouping columns).
The expected output for one record is the following JSON:
{
    "col5": 60,
    "col4": [
        {
            "name": "Air"
        }
    ],
    "expenses": [
        {
            "amount": 200,
            "name": "Education"
        },
        {
            "amount": 100,
            "name": "Car"
        },
        {
            "amount": 100,
            "name": "Restaurant"
        }
    ],
    "col1": 1,
    "col2": "USA",
    "col3": "4000"
}
I understand this is going to be fairly complex code, but can someone help?
Thanks in advance!

I believe you need the following.
For a list of dictionaries:
d = (df.groupby(['col1','col2','col3','col4','col5'])
       .apply(lambda x: dict(zip(x['name'], x['amount'])))
       .reset_index(name='expenses')
       .to_dict(orient='records')
    )
print(d)
For a JSON string:
j = (df.groupby(['col1','col2','col3','col4','col5'])
       .apply(lambda x: dict(zip(x['name'], x['amount'])))
       .reset_index(name='expenses')
       .to_json(orient='records')
    )
print(j)
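The snippets above return `expenses` as a name-to-amount mapping, while the question asks for a list of `{amount, name}` objects and for `col4` wrapped in a list. The following is a minimal sketch of one way to build that shape, assuming the column names from the question. The sample frame is rebuilt inline, `build_records` is a hypothetical helper name, and `col3` is left numeric even though the question shows it quoted.

```python
import json
import pandas as pd

# Sample data from the question, rebuilt inline so the sketch is self-contained.
df = pd.DataFrame({
    "col1": [1, 1, 1, 2, 2],
    "col2": ["USA", "USA", "USA", "UK", "UK"],
    "col3": [4000, 4000, 4000, 5000, 5000],
    "col4": ["Air", "Air", "Air", "Cash", "Cash"],
    "col5": [60, 60, 60, 50, 50],
    "name": ["Education", "Car", "Restaurant", "Government", "Restaurant"],
    "amount": [200, 100, 100, 125, 135],
})

GROUP_COLS = ["col1", "col2", "col3", "col4", "col5"]


def build_records(frame):
    """Hypothetical helper: one dict per group, expenses as a list of objects."""
    records = []
    for keys, group in frame.groupby(GROUP_COLS, sort=False):
        record = dict(zip(GROUP_COLS, keys))
        # The question wraps col4 in a list of {"name": ...} objects.
        record["col4"] = [{"name": record["col4"]}]
        record["expenses"] = group[["amount", "name"]].to_dict(orient="records")
        records.append(record)
    return records


records = build_records(df)
# default=int converts any NumPy integers the json module cannot serialize.
print(json.dumps(records, indent=2, default=int))
```

Looping over the groups keeps the nesting explicit, which may be easier to adapt than a chained `apply` when the target structure changes.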

Related

Postgres Json List of Map

I have a sample JSON document as below:
{
    "jsonObject":
    [
        { "Name" : "XPerson",
          "Age" : 18},
        { "Name" : "YPerson",
          "Age" : 18}
    ]
}
The list can contain any number of entries. I want to split the names into separate columns based on Age: under 18 in column1, 18 to 25 in column2, and everything else in column3.
How can this be achieved in Postgres?

It sounds like you are trying to do the front end's job at the database level.
with data(Age, Name) as
(select *
 from jsonb_to_recordset('{
    "jsonObject": [
        {
            "Name": "XPerson",
            "Age": 18
        },
        {
            "Name": "YPerson",
            "Age": 18
        }
    ]
 }'::jsonb -> 'jsonObject') as t("Age" int, "Name" text))
select
    string_agg(case when age < 18 then name end, ',') as Column1,
    string_agg(case when age >= 18 and age <= 25 then name end, ',') as Column2,
    string_agg(case when age > 25 then name end, ',') as Column3
from data;

In Groovy, how can I iterate over a CSV file and call a function for every "group" that has a different value in some of the columns?

In Groovy and running on a jenkins pipeline, I am using the readFile function from jenkins to read the csv file.
Example csv:
name    val1  val2
John    2     122
John    2     012
Bertha  2     0021
John    3     20
Philip  3     12022
Bertha  3     162021
John    3     2022
What I am trying to achieve is to call another function once for each distinct value in the "name" column.
The Groovy script flow would be something like:
call functionX (name, rest of values) with:
name    val1  val2
John    2     122
John    2     012
John    3     20
John    3     2022
then call functionX (name, rest of values) with:
name    val1  val2
Philip  3     12022
then call functionX (name, rest of values) with:
name    val1  val2
Bertha  2     0021
Bertha  3     162021
Note:
The order (John, Philip, Bertha) is not important!
I think I can achieve this with closures, but I'm not quite sure since I'm pretty new to the topic.

Is this something like what you are looking for?
def functionX(name, val1, val2) {
    if (name == 'name') return  // skip the header row
    println("Name: $name, V1: $val1, V2: $val2")
}
// Sorting the lines places rows with the same name next to each other.
new File('names.csv').readLines().sort { it }.each {
    println it
    functionX(*(it.split(',')))
}
Output:
Bertha,2,21
Name: Bertha, V1: 2, V2: 21
Bertha,3,162021
Name: Bertha, V1: 3, V2: 162021
John,2,12
Name: John, V1: 2, V2: 12
John,2,122
Name: John, V1: 2, V2: 122
John,3,20
Name: John, V1: 3, V2: 20
John,3,2022
Name: John, V1: 3, V2: 2022
Philip,3,12022
Name: Philip, V1: 3, V2: 12022
name,val1,val2

Pandas Dataframe row as a formatted JSON output

I have a DataFrame that I am trying to group by customer and print, but to_json does not give me the format I need. I also need to create a separate JSON file for each customer. I don't think custom JSON formatting is possible with the generic pandas methods, so which direction should I be looking in?
I tried grouping by customer_id, first_name and last_name, setting them as the index, and using the index orientation, but that didn't really work out.
import pandas as pd
data = [{'customer_id': 1, 'first_name': 'John', 'last_name': 'Doe', 'amount': 100, 'sub_amount': 50, 'total': 150, 'product': 'tool box'},
        {'customer_id': 1, 'first_name': 'John', 'last_name': 'Doe', 'amount': 50, 'sub_amount': 50, 'total': 100, 'product': 'light'},
        {'customer_id': 2, 'first_name': 'Jane', 'last_name': 'Doe', 'amount': 200, 'sub_amount': 50, 'total': 250, 'product': 'iron box'},
        {'customer_id': 2, 'first_name': 'Jane', 'last_name': 'Doe', 'amount': 50, 'sub_amount': 50, 'total': 100, 'product': 'led'}
        ]
df = pd.DataFrame(data)
df
   customer_id  first_name  last_name  amount  sub_amount  total  product
0  1            John        Doe        100     50          150    tool box
1  1            John        Doe        50      50          100    light
2  2            Jane        Doe        200     50          250    iron box
3  2            Jane        Doe        50      50          100    led
Expected output:
{
    "first_name": "John",
    "last_name": "Doe",
    "Product_Details": {
        "tool box": {
            "total": 150,
            "amount": 100
        },
        "light": {
            "total": 100,
            "amount": 50
        }
    }
}

import json

clients = {}
for index, row in df.iterrows():
    # Create the customer entry the first time this customer_id is seen.
    clients.setdefault(row['customer_id'], {'first_name': row['first_name'],
                                            'last_name': row['last_name']})
    # Add this row's product under the customer's Product_Details.
    clients[row['customer_id']].setdefault('Product_Details', {})[row['product']] = \
        {'total': row['total'], 'amount': row['amount']}
print(json.dumps(clients[1], indent=4))
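The loop above prints a single customer, while the question also asks for a separate JSON file per customer. Here is a rough sketch of that step using `groupby`, assuming product names are unique within a customer. The helper names and the `customer_<id>.json` file name pattern are made up for illustration, and the files go to a temporary directory so the example has no side effects.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Sample data from the question (sub_amount omitted because the output does not use it).
df = pd.DataFrame([
    {"customer_id": 1, "first_name": "John", "last_name": "Doe", "amount": 100, "total": 150, "product": "tool box"},
    {"customer_id": 1, "first_name": "John", "last_name": "Doe", "amount": 50, "total": 100, "product": "light"},
    {"customer_id": 2, "first_name": "Jane", "last_name": "Doe", "amount": 200, "total": 250, "product": "iron box"},
    {"customer_id": 2, "first_name": "Jane", "last_name": "Doe", "amount": 50, "total": 100, "product": "led"},
])


def customer_documents(frame):
    """Return {customer_id: nested dict} in the shape shown in the question."""
    docs = {}
    group_cols = ["customer_id", "first_name", "last_name"]
    for (cid, first, last), group in frame.groupby(group_cols):
        # orient="index" yields {product: {"total": ..., "amount": ...}}.
        details = group.set_index("product")[["total", "amount"]].to_dict(orient="index")
        docs[int(cid)] = {"first_name": first, "last_name": last, "Product_Details": details}
    return docs


def write_customer_files(frame, out_dir):
    """Write one customer_<id>.json per customer (file name pattern is an assumption)."""
    paths = []
    for cid, doc in customer_documents(frame).items():
        path = Path(out_dir) / f"customer_{cid}.json"
        # default=int converts any NumPy integers the json module cannot serialize.
        path.write_text(json.dumps(doc, indent=4, default=int))
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
paths = write_customer_files(df, out_dir)
print(paths[0].read_text())
```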

Sort and Select Top 5 JSON values

I have a two-fold issue and looking for clues as to how to approach it.
I have a json file that is formatted as such:
{
    "code": 2000,
    "data": {
        "1": {
            "attribute1": 40,
            "attribute2": 1.4,
            "attribute3": 5.2,
            "attribute4": 124,
            "attribute5": "65.53%"
        },
        "94": {
            "attribute1": 10,
            "attribute2": 4.4,
            "attribute3": 2.2,
            "attribute4": 12,
            "attribute5": "45.53%"
        },
        "96": {
            "attribute1": 17,
            "attribute2": 9.64,
            "attribute3": 5.2,
            "attribute4": 62,
            "attribute5": "51.53%"
        }
    },
    "message": "SUCCESS"
}
My goals are to:
Sort the data by any of the attributes.
Grab the top 5 entries (there are around 100 in total), depending on how they are sorted.
Output the data in a table, e.g.:
These are sorted by: attribute5
---
attribute1 | attribute2 | attribute3 | attribute4 | attribute5
40         | 1.4        | 5.2        | 124        | 65.53%
17         | 9.64       | 5.2        | 62         | 51.53%
10         | 4.4        | 2.2        | 12         | 45.53%
Note that attribute5 above is a string value.
Admittedly, my knowledge here is very limited.
I attempted to mimic the method used here:
python sort list of json by value
I managed to open the file, and I can extract the keys and values from a sample row:
import json
jsonfile = "path-to-my-file.json"
with open(jsonfile) as j:
    data = json.load(j)
k = data["data"]["1"].keys()
print(k)
total = data["data"]
for row in total:
    v = data["data"][str(row)].values()
    print(v)
This outputs:
dict_keys(['attribute1', 'attribute2', 'attribute3', 'attribute4', 'attribute5'])
dict_values([1, 40, 1.4, 5.2, 124, '65.53%'])
dict_values([94, 10, 4.4, 2.2, 12, '45.53%'])
dict_values([96, 17, 9.64, 5.2, 62, '51.53%'])
Any point in the right direction would be GREATLY appreciated.
Thanks!

If you don't mind using pandas, you could do it like this:
import pandas as pd
rows = list(data["data"].values())
df = pd.DataFrame(rows)
# Sort by any attribute; the ascending keyword picks the direction.
# head() returns the first 5 rows of the sorted frame.
df.sort_values('attribute1', ascending=True).head()
This lets you sort by any attribute at any time and print the result as a table.
Depending on what you sort by, the output looks like this:
attribute1 attribute2 attribute3 attribute4 attribute5
0 40 1.40 5.2 124 65.53%
1 10 4.40 2.2 12 45.53%
2 17 9.64 5.2 62 51.53%

I'll leave this answer here in case you don't want to use pandas, but the answer from @MatthewBarlowe is much simpler and I recommend it.
For sorting by a specific attribute, this should work:
import json
SORT_BY = "attribute4"
with open("test.json") as j:
    data = json.load(j)
items = data["data"]
sorted_keys = list(sorted(items, key=lambda key: items[key][SORT_BY], reverse=True))
Now, sorted_keys is a list of the keys in order of the attribute they were sorted by.
Then, to print this as a table, I used the tabulate library. The final code for me looked like this:
from tabulate import tabulate
import json
SORT_BY = "attribute4"
with open("test.json") as j:
    data = json.load(j)
items = data["data"]
sorted_keys = list(sorted(items, key=lambda key: items[key][SORT_BY], reverse=True))
print(f"\nSorted by: {SORT_BY}")
print(
    tabulate(
        [
            [sorted_keys[i], *items[sorted_keys[i]].values()]
            for i, _ in enumerate(items)
        ],
        headers=["Column", *items["1"].keys()],
    )
)
When sorting by 'attribute5', this outputs:
Sorted by: attribute5
Column attribute1 attribute2 attribute3 attribute4 attribute5
-------- ------------ ------------ ------------ ------------ ------------
1 40 1.4 5.2 124 65.53%
96 17 9.64 5.2 62 51.53%
94 10 4.4 2.2 12 45.53%
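Two details are worth a sketch: `attribute5` is a string, so sorting on it compares text (for example `"9.5%"` would rank above `"65.53%"`), and neither snippet cuts the result down to five rows. The following is one possible way to handle both with the standard library only. The helper names `sort_value` and `top_rows` are made up for illustration, and the sample data is inlined with the missing commas added.

```python
import json

# Inline copy of the sample from the question, as valid JSON.
raw = """
{
  "code": 2000,
  "data": {
    "1":  {"attribute1": 40, "attribute2": 1.4,  "attribute3": 5.2, "attribute4": 124, "attribute5": "65.53%"},
    "94": {"attribute1": 10, "attribute2": 4.4,  "attribute3": 2.2, "attribute4": 12,  "attribute5": "45.53%"},
    "96": {"attribute1": 17, "attribute2": 9.64, "attribute3": 5.2, "attribute4": 62,  "attribute5": "51.53%"}
  },
  "message": "SUCCESS"
}
"""
data = json.loads(raw)


def sort_value(value):
    """Turn '65.53%' into 65.53 so percentages compare numerically."""
    if isinstance(value, str) and value.endswith("%"):
        return float(value.rstrip("%"))
    return value


def top_rows(items, sort_by, n=5):
    """Return the n (key, row) pairs with the largest sort_by value."""
    ranked = sorted(
        items.items(),
        key=lambda pair: sort_value(pair[1][sort_by]),
        reverse=True,
    )
    return ranked[:n]


for key, row in top_rows(data["data"], "attribute5"):
    print(key, *row.values(), sep=" | ")
```

The list returned by `top_rows` can be passed to `tabulate` in place of the full key list if a formatted table is wanted.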

How to format the structure of data returned from SQL query

I got this data from my SQL query:
addon_id | addon_name | addon_category_id
---------+------------+------------------
1 | abc | 10
2 | def | 20
3 | ghi | 10
Now I have to send this in the following JSON format and group the addons based on addon_category_id:
[
    {
        "addon_category_id": 10,
        "addons":
        [
            {
                "addon_id": 1,
                "addon_name": "abc"
            },
            {
                "addon_id": 3,
                "addon_name": "ghi"
            }
        ]
    },
    {
        "addon_category_id": 20,
        "addons":
        [
            {
                "addon_id": 2,
                "addon_name": "def"
            }
        ]
    }
]
How can I do this? What is the logic behind that? Do I have to do it programmatically using a for loop or is there any other way?

As mentioned in the comments, it depends on the programming language you use. In SQL Server 2016 and later you can use FOR JSON AUTO:
SELECT b.addon_category_id, addons.addon_id, addons.addon_name
FROM addon b
JOIN addon addons
    ON addons.addon_category_id = b.addon_category_id
FOR JSON AUTO;
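If the grouping is done in application code instead, a single loop over the rows is enough. Below is a minimal sketch in Python, assuming each row arrives as a dict keyed by column name. `group_addons` is a hypothetical helper, not part of any database driver.

```python
import json
from collections import defaultdict

# Rows as they might come back from the query in the question.
rows = [
    {"addon_id": 1, "addon_name": "abc", "addon_category_id": 10},
    {"addon_id": 2, "addon_name": "def", "addon_category_id": 20},
    {"addon_id": 3, "addon_name": "ghi", "addon_category_id": 10},
]


def group_addons(rows):
    """Group rows by addon_category_id, keeping first-seen category order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["addon_category_id"]].append(
            {"addon_id": row["addon_id"], "addon_name": row["addon_name"]}
        )
    return [
        {"addon_category_id": category_id, "addons": addons}
        for category_id, addons in grouped.items()
    ]


result = group_addons(rows)
print(json.dumps(result, indent=2))
```

The dictionary keyed by category makes this one pass over the data, so the rows do not need to be sorted by category first.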
