Converting a DataFrame into JSON-list using Python3 - json

This is how far I've gotten in converting a pandas DataFrame into the format I want.
print(data_frame)
Output:
>>{'Country': ['Croatia', 'Serbia', 'United States', 'Algeria', 'Chile'],
'Value': [5, 5, 4, 2, 2]}
dictlist = []
for key, value in data_frame.items():
    temp = [key, value]
    dictlist.append(temp)
print(dictlist)
Output:
>>[['Country', ['Croatia', 'Serbia', 'United States', 'Algeria', 'Chile']], ['Value', [5, 5, 4, 2, 2]]]
Now here is where I'm lost. I've tried so many combinations of videos and tutorials that even my brute force ideas are empty.
for key, val in dictlist:
    print(key, "->", val)
    for item in val:
        print(item)
output:
>>Country -> ['Croatia', 'Serbia', 'United States', 'Algeria', 'Chile']
>>Croatia
>>Serbia
>>United States
>>Algeria
>>Chile
>>Value -> [5, 5, 4, 2, 2]
>>5
>>5
>>4
>>2
>>2
I just don't know how to work inside that for loop. I can't iterate any further than that.
This is the format I'm trying to get it converted to:
[
  {
    "name": "Croatia",
    "value": 5
  },
  {
    "name": "Serbia",
    "value": 5
  },
  {
    "name": "United States",
    "value": 4
  },
  {
    "name": "Algeria",
    "value": 2
  },
  {
    "name": "Chile",
    "value": 2
  }
]
Update
Thanks to a friendly fellow in the chat, I've now gotten closer to the goal. This line helped me a lot.
data_frame = frame.set_index('Country').T.to_dict('list')
print(data_frame)
output:
{'Croatia': [5],
'Serbia': [5],
'United States': [4],
'Algeria': [2],
'Chile': [2]}
Not sure why the values got converted into arrays, but at least this should be solvable! Cheers!

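This returns a dictionary, as desired in the question.
import pandas as pd
# this is the data frame (values taken from the output printed in the question)
frame = pd.DataFrame({'Country': ['Croatia', 'Serbia', 'United States', 'Algeria', 'Chile'],
                      'Value': [5, 5, 4, 2, 2]})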
Check the pandas documentation for to_dict to see which orient options are available besides 'list', which was used in the posted code sample.
Here you can use 'records', which returns a list containing a dictionary:
data_frame = frame.set_index('Country').T.to_dict('records')[0]
# this is the output without [0]
# [{'Croatia': 5, 'Serbia': 5, 'United States': 4, 'Algeria': 2, 'Chile': 2}]
# so you need the [0] to pick the dictionary out of the list and get the desired dictionary output.
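The answer above yields a single country-to-value dictionary. For the list of "name"/"value" objects shown in the question, a minimal sketch in plain Python follows. It assumes the data is available as the column-oriented dictionary printed in the question; the variable names `data` and `records` are illustrative only.

```python
import json

# Column-oriented data, as printed in the question.
data = {
    "Country": ["Croatia", "Serbia", "United States", "Algeria", "Chile"],
    "Value": [5, 5, 4, 2, 2],
}

# Pair the two columns element by element and build one dict per row.
records = [
    {"name": country, "value": value}
    for country, value in zip(data["Country"], data["Value"])
]

# Serialize the list of dicts as a JSON array.
print(json.dumps(records, indent=2))
```

If the data is still in the DataFrame `frame`, something like frame.rename(columns={'Country': 'name', 'Value': 'value'}).to_dict('records') should give the same list without an explicit loop.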

Related

Why generating json data from a list (array of arrays) results in a quotation mark problem?

Considering the dataframe below:
timestamp coordinates
0 [402, 404] [[2.5719,49.0044], [2.5669,49.0043]]
1 [345, 945] [[2.5719,49.0044], [2.5669,49.0043]]
I'd like to generate a json file like below:
[
{
"vendor": 1,
"path": [
[2.5719,49.0044],
[2.5669,49.0043]
],
"timestamps": [402, 404]
},
{
"vendor": 1,
"path": [
[2.5719,49.0044],
[2.5669,49.0043]
],
"timestamps": [345, 945]
}]
To do so, my idea is:
For each row of my df, generate a new column geometry containing the row's JSON data.
Then append all geometries into a single JSON.
However, my function below doesn't work.
df["geometry"] = df.apply(lambda row: {
    "vendor": 1,
    "path": row["coordinates"],
    "timestamps": row["timestamp"]
    },
    axis=1)
Indeed, the result is (for example):
Note the quote marks (') around arrays in path
{
'vendor': 1,
'path': ['[2.5719,49.0044]', '[2.5669,49.0043]'],
'timestamps': [402, 404]
}
Any idea?
Thanks
Presumably the values in the coordinates column are strings. You can use ast.literal_eval to convert them to lists:
from ast import literal_eval
df["geometry"] = df.apply(
    lambda row: {
        "vendor": 1,
        "path": literal_eval(row["coordinates"]),
        "timestamps": row["timestamp"],
    },
    axis=1,
)
print(df)
Prints:
timestamp coordinates geometry
0 [402, 404] [[2.5719,49.0044], [2.5669,49.0043]] {'vendor': 1, 'path': [[2.5719, 49.0044], [2.5669, 49.0043]], 'timestamps': [402, 404]}
1 [345, 945] [[2.5719,49.0044], [2.5669,49.0043]] {'vendor': 1, 'path': [[2.5719, 49.0044], [2.5669, 49.0043]], 'timestamps': [345, 945]}
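The output in the question suggests the cells might instead be lists whose elements are strings, in which case calling literal_eval on the whole cell would fail. A hedged sketch that tolerates both shapes is below. The helper name `to_nested_list` and the sample `rows` are made up for illustration and stand in for the real DataFrame.

```python
import json
from ast import literal_eval


def to_nested_list(value):
    """Return `value` with any string-encoded lists parsed into real lists."""
    if isinstance(value, str):
        # e.g. "[[2.5719,49.0044], [2.5669,49.0043]]" or "[2.5719,49.0044]"
        return literal_eval(value)
    if isinstance(value, list):
        # e.g. ['[2.5719,49.0044]', '[2.5669,49.0043]']: parse each element
        return [to_nested_list(item) for item in value]
    return value  # numbers and other values pass through unchanged


# Hypothetical rows standing in for the DataFrame from the question.
rows = [
    {"timestamp": [402, 404], "coordinates": ["[2.5719,49.0044]", "[2.5669,49.0043]"]},
    {"timestamp": [345, 945], "coordinates": "[[2.5719,49.0044], [2.5669,49.0043]]"},
]

geometries = [
    {"vendor": 1, "path": to_nested_list(row["coordinates"]), "timestamps": row["timestamp"]}
    for row in rows
]

# json.dumps emits real nested arrays, with no quotes around the coordinate pairs.
print(json.dumps(geometries))
```

With a DataFrame, the same helper could be applied to the column first, for example df["coordinates"].apply(to_nested_list), before building the geometry dictionaries.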

Converting JSON to CSV with missing fieldnames

I am having a hard time converting JSON to csv because the names on some of the records don't show up. For example:
[{device: 1,
name: 'John',
age: 25,
city: 'Denver'
},
{device: 2,
name: 'Jake',
age: 24,
city: 'New York'
},
{device: 3,
name: 'Phil',
age: 23}]
It is further made difficult because it's several thousand rows where sometimes the city is known, other times it's not.
I would like to put these together into a csv and just leave Phil's city blank.
You can use this:
import json
import csv
js = """[{"device": 1,
"name": "John",
"age": 25,
"city": "Denver"
},
{"device": 2,
"name": "Jake",
"age": 24,
"city": "New York"
},
{"device": 3,
"name": "Phil",
"age": 23}]
"""
js = json.loads(js)
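with open('result.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    # collect every column name that appears in any record (a set, so the order is arbitrary)
    columns = list({column for row in js for column in row.keys()})
    writer.writerow(columns)
    for row in js:
        # a missing key becomes None, which csv writes as an empty cell
        writer.writerow([None if column not in row else row[column] for column in columns])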
This works even with different column names and larger numbers of columns!
There is probably a built-in way to do this in the json or csv modules, but you could also iterate over all your dicts and add the missing keys:
my_list = [{'device':1,
'name':'John',
'age':25,
'city': 'Denver' },
{'device':2,
'name':'Jake',
'age':24,
'city': 'New York'
},
{'device':3,
'name':'Phil',
'age':23}]
keys_and_default = {
'device': -1,
'name': 'N/A',
'age': -1,
'city': 'N/A'
}
for dic in my_list:
    for key, val in keys_and_default.items():
        if key not in dic:
            dic[key] = val
print(my_list)
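On the built-in route mentioned above: csv.DictWriter accepts a restval argument that is written for any key missing from a row, so the dictionaries do not need to be patched first. The following is a minimal sketch using the sample records from the question; it writes to an in-memory buffer rather than a file so that it is self-contained.

```python
import csv
import io
import json

records = json.loads("""[
  {"device": 1, "name": "John", "age": 25, "city": "Denver"},
  {"device": 2, "name": "Jake", "age": 24, "city": "New York"},
  {"device": 3, "name": "Phil", "age": 23}
]""")

# Collect field names in first-seen order so the column order is stable.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()  # stand-in for open("result.csv", "w", newline="")
# restval is the value written for keys a record does not have.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

Phil's row ends with an empty city cell, and the header keeps the order in which the keys first appear.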

Retrieve only values from Array

I need to retrieve only the values from an array of a payload, using dataweave in mule.
I have tried using the ++ technique, but it returns errors, or, when I set the values variable as an array, I get " in the results.
Input:
{
"Shops":
[{
"StoreName": "Store1",
"Sales":
[{"dayDate": "01/01/2019",
"product": "A",
"quantity": 2
},
{"dayDate": "02/01/2019",
"product": "B",
"quantity": 1
}
]
}]
}
I expect the Output:
[Store1, [01/01/2019, A, 2], [02/01/2019, B, 1]]
but actual is
["Store1, [01/01/2019, A, 2], [02/01/2019, B, 1]"]
How do I remove the " or if there is a better way of obtaining my expected output?
You need the pluck function, which transforms an object into an array.
%dw 2.0
output application/json
var shops = {
"Shops": [
{
"StoreName": "Store1",
"Sales": [
{
"dayDate": "01/01/2019",
"product": "A",
"quantity": 2
},
{
"dayDate": "02/01/2019",
"product": "B",
"quantity": 1 }]}]}
var res = shops.Shops map (shop) ->
using (sales = shop.Sales map (sale) -> sale pluck $)
[shop.StoreName] ++ sales
---
flatten(res)
This goes a bit beyond what you've described so that it can also handle multiple shops. If you need to be able to handle multiple shops, you probably won't want to use flatten, but it is there so the output matches what you asked for.
Output:
[
"Store1",
[
"01/01/2019",
"A",
2
],
[
"02/01/2019",
"B",
1
]
]

Matching arrays in soapUI JSONPath RegEx Match assertions

I'm trying to make a JSONPath RegEx Match on SoapUI for the following Json Response:
{
"quantidadeItens": 5,
"registros": [
{
"identificador": 1,
"descricao": "Viagem à Disney"
},
{
"identificador": 2,
"descricao": "Carro"
},
{
"identificador": 3,
"descricao": "Smartphone novo"
},
{
"identificador": 4,
"descricao": "Casa nova"
},
{
"identificador": 5,
"descricao": "Apartamento Novo"
}
]
}
On the attached Image we can see that the JsonPath is correct, but the SoapUI is not finding the match.
I guess that the [*] is not supported on SoapUI, but I didn't find anything about it on documentation.
The expected output of your JSONPath expression would be something like:
[
1,
2,
3,
4,
5
]
This doesn't match your regex, but in any case soapUI produces [1, 2, 3, 4, 5], which the soapUI documentation says is not a JSON array but just a list of values enclosed in square brackets.
So, a regex like \[(\s?[0-9]+,?)*\s?\] will match this output.
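As a quick sanity check of that pattern outside soapUI, here is a small Python sketch. Python's re module is not the regex engine soapUI uses, so treat this only as a check of the pattern's logic, on the assumption that the two behave the same for this simple expression.

```python
import re

# The regex from the answer, written as a raw string.
pattern = re.compile(r"\[(\s?[0-9]+,?)*\s?\]")

soapui_output = "[1, 2, 3, 4, 5]"  # the bracketed value list described above

# fullmatch requires the whole string to match, not just a prefix.
print(bool(pattern.fullmatch(soapui_output)))
```

Note that the pattern is permissive: it also accepts an empty list such as [] and does not check how many identifiers are present.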

jq: unnesting records and mixing fields from both record levels

I have the following file:
[
{
'id': 1,
'arr': [{'x': 1},
{'x': 2}]
},
{
'id': 2,
'arr': [{'x': 3},
{'x': 4}]
}
]
How can I transform it into the following form using jq?
[
{'id': 1, 'x': 1},
{'id': 1, 'x': 2},
{'id': 2, 'x': 3},
{'id': 2, 'x': 4},
]
Assuming it doesn't get any more complex than that, you could simply do this:
map(del(.arr) + .arr[])
This is under the assumption that you're replacing the arr property of each object with the contents of the items in arr. It's unclear what you're trying to do exactly.
The input shown in the question is not valid JSON. After making some minor changes to make it valid JSON, the following filter produces the output as shown below:
map( (.arr[]|.x) as $x | {id, "x": $x} )
Output:
[
{
"id": 1,
"x": 1
},
{
"id": 1,
"x": 2
},
{
"id": 2,
"x": 3
},
{
"id": 2,
"x": 4
}
]
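For comparison, the same unnesting can be sketched in Python. This mirrors the first filter, map(del(.arr) + .arr[]): each record's fields other than arr are merged with each item of arr. The input is the question's data rewritten as valid literals.

```python
import json

data = [
    {"id": 1, "arr": [{"x": 1}, {"x": 2}]},
    {"id": 2, "arr": [{"x": 3}, {"x": 4}]},
]

# For each record, copy every field except "arr", then merge in each item of "arr".
flattened = [
    {**{key: value for key, value in record.items() if key != "arr"}, **item}
    for record in data
    for item in record["arr"]
]

print(json.dumps(flattened))
```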