Dataframe to nested JSON (grouped by row entry) - json

I have a python pandas dataframe, which should be converted to a nested JSON (based on the entry found in the column 'laterality').
My dataframe looks like this:
import pandas as pd
df1 = pd.DataFrame({'id': [1, 1, 1, 2],
'laterality': ['L', 'L', 'R', 'R'],
'image': ['im1', 'im2', 'im3', 'im4'],
'number': [5, 5, 5, 6] })
The desired JSON output then is:
[
{
"id": 1,
"number": 5,
"images L": ["im1", "im2"],
"images R": ["im3"]
},
{
"id": 2,
"number": 6,
"images R": ["im4"]
},
...
]
What I tried:
I tried to group records based on the .to_dict() method, as shown below. Unfortunately, this hasn't solved my problem yet. Since I'm currently stuck, I am eager to learn from your advice. I'm looking for a push in the right direction.
json1 = (df1.groupby(['id','number'])
.apply(lambda x: x[['laterality','image']].to_dict('records'))
.reset_index()
.rename(columns={0:'image'})
.to_json(orient='records'))
Some related topics I looked at:
more pythonic way to format a JSON string from a list of tuples
Convert Pandas Dataframe to nested JSON
Convert Pandas DataFrame to JSON format

You can group by ['id', 'number', 'laterality'] and aggregate with list. After that pivot to get images as columns:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 2],
'laterality': ['L', 'L', 'R', 'R'],
'image': ['im1', 'im2', 'im3', 'im4'],
'number': [5, 5, 5, 6] })
df_out = df.pivot_table(columns='laterality', index=['id','number'], values='image', aggfunc=list)
df_out.columns = "images " + df_out.columns
df_out.reset_index().to_json(orient='records')
Edit: replacing groupby + pivot by pivot_table with aggfunc
Output:
[{"id":1,"number":5,"images L":["im1","im2"],"images R":["im3"]},{"id":2,"number":6,"images L":null,"images R":["im4"]}]

It is simpler and faster to just iterate over the grouped dataframe:
out = []
for (index, num), group in df1.groupby(['id', 'number']):
    out.append({
        "id": index,
        "number": num,
        "images L": group.loc[group.laterality == 'L', 'image'].tolist(),
        "images R": group.loc[group.laterality == 'R', 'image'].tolist()
    })
Output
[{'id': 1, 'number': 5, 'images L': ['im1', 'im2'], 'images R': ['im3']},
{'id': 2, 'number': 6, 'images L': [], 'images R': ['im4']}]
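Neither output above matches the desired JSON exactly for id 2: pivot_table yields "images L": null and the loop yields an empty list, whereas the question leaves the key out. If omitting the key matters, a minimal plain-Python sketch follows. It assumes the rows are available as dictionaries (for example via df1.to_dict('records')), and the helper name nest_by_laterality is made up for illustration.

```python
import json
from collections import defaultdict

# Same rows as df1 in the question, written as plain dicts
# (roughly what df1.to_dict('records') would give).
rows = [
    {"id": 1, "laterality": "L", "image": "im1", "number": 5},
    {"id": 1, "laterality": "L", "image": "im2", "number": 5},
    {"id": 1, "laterality": "R", "image": "im3", "number": 5},
    {"id": 2, "laterality": "R", "image": "im4", "number": 6},
]

def nest_by_laterality(rows):
    """Group rows by (id, number) and collect images per laterality."""
    grouped = defaultdict(dict)
    for row in rows:
        entry = grouped[(row["id"], row["number"])]
        # Only lateralities that actually occur get an "images X" key.
        entry.setdefault("images " + row["laterality"], []).append(row["image"])
    return [{"id": i, "number": n, **images} for (i, n), images in grouped.items()]

nested = nest_by_laterality(rows)
print(json.dumps(nested))
```

Because the keys are created on demand, this also works for laterality values other than 'L' and 'R' without changing the code.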

Related

corrupted record from json file in pyspark due to False as entry

I have a json file that looks like this:
test= {'kpiData': [{'date': '2020-06-03 10:05',
'a': 'MINIMUMINTERVAL',
'b': 0.0,
'c': True},
{'date': '2020-06-03 10:10',
'a': 'MINIMUMINTERVAL',
'b': 0.0,
'c': True},
{'date': '2020-06-03 10:15',
'a': 'MINIMUMINTERVAL',
'b': 0.0,
'c': True},
{'date': '2020-06-03 10:20',
'a': 'MINIMUMINTERVAL',
'b': 0.0,}
]}
I want to transfer it to a dataframe object, like this:
rdd = sc.parallelize([test])
jsonDF = spark.read.json(rdd)
This results in a corrupted record. From my understanding, the reason is that True and False can't be entries in JSON. So I need to transform these entries prior to the spark.read.json() call (to TRUE, true or "True"). test is a dict and rdd is a pyspark.rdd.RDD object. For a dataframe object the transformation is pretty straightforward, but I didn't find a solution for these objects.
spark.read.json expects an RDD of JSON strings, not an RDD of Python dictionaries. If you convert the dictionary to a JSON string, you should be able to read that into a dataframe:
import json
df = spark.read.json(sc.parallelize([json.dumps(test)]))
Another possible way is to read in the dictionary using spark.createDataFrame:
df = spark.createDataFrame([test])
which will give a different schema with maps instead of structs.
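To see why the dictionary has to be serialised first, it may help to compare its Python text form with its JSON form. The sketch below uses only the standard library and a shortened copy of the dictionary from the question. That Spark sees something like the str() form when handed a dict is my assumption, not something verified here.

```python
import json

# Shortened version of the dictionary from the question.
test = {"kpiData": [{"date": "2020-06-03 10:05", "b": 0.0, "c": True}]}

as_repr = str(test)         # Python literal syntax: single quotes, True
as_json = json.dumps(test)  # JSON syntax: double quotes, true

print(as_repr)
print(as_json)

# Only the json.dumps output parses as JSON.
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False
print(repr_is_json)
```

So there is no need to rewrite True and False by hand: json.dumps emits lowercase true and false and double-quoted keys.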

Convert volley string response (List<List<Int>>) to Kotlin list

I'm stuck on what probably has a simple solution.
I have a string representation of a list, like this:
"[[1, 2, 3], [4, 5, 6]]"
In other words, a list containing 2 lists of 3 integers.
How do I convert the string to a List<List<Int>> object in Kotlin?
You can use kotlinx.serialization to deserialize JSON!
As a standalone Kotlin script:
@file:DependsOn("org.jetbrains.kotlinx:kotlinx-serialization-json:1.2.0")
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json
val s = "[[1, 2, 3], [4, 5, 6]]"
val j = Json.decodeFromString<List<List<Int>>>(s)
println(j) // [[1, 2, 3], [4, 5, 6]]
println(j[0][0]) // 1
In an Android app's build.gradle you would need these lines instead of @file:DependsOn:
dependencies {
implementation 'org.jetbrains.kotlinx:kotlinx-serialization-json:1.2.0'
}
apply plugin: 'kotlinx-serialization'

How to display csv data in html as URL using flask?

With the Flask app, I managed to display the my.csv file in an HTML table. Now I'm trying to add a URL to each userId that is displayed in the HTML table (see Output), for example:
https://myURL.com/1
https://myURL.com/2 etc...
What would be the best way to achieve this, assuming that clicking a userId URL in the userID column should take me to an HTML page with more details specific to that ID?
app.py
from flask import Flask, render_template, request
import pandas as pd
import numpy as np
app = Flask(__name__)
@app.route('/example')
def dataframe():
    df = pd.read_csv("my.csv")
    return render_template("example.html", data=df.head(5).to_html())
if __name__ == "__main__":
    app.run()
example.html
<!DOCTYPE html>
<html>
<head>
<title>CSV Data</title>
</head>
<body>
<h1>My stats</h1>
{{data | safe}}
</body>
</html>
output: http://127.0.0.1:5000/example
Thank you in advance!
Although you can render HTML from Pandas and then render that HTML in Flask, that approach makes it hard to edit the generated HTML afterwards. To get started with the task, I created some synthetic data in Pandas:
import pandas
df = pandas.DataFrame([{"userID": 4, "customers": 23},
{"userID": 3, "customers": 33},
{"userID": 1, "customers": 42},
{"userID": 5, "customers": 13},])
print(df.to_html())
To include URLs, a naive method would be to embed the links as strings in the Pandas dataframe prior to generating the HTML. The problem is that Pandas escapes HTML special characters in strings by default:
import pandas
df = pandas.DataFrame([{"userID": "<a href='helo'>4</a>", "customers": 23},
{"userID": 3, "customers": 33},
{"userID": 1, "customers": 42},
{"userID": 5, "customers": 13},])
print(df.to_html())
produces a cell that displays the literal text <a href='helo'>4</a> instead of a clickable link.
To keep the hyperlink as HTML, tell Pandas not to escape special characters:
import pandas
df = pandas.DataFrame([{"userID": "<a href='helo'>4</a>", "customers": 23},
{"userID": 3, "customers": 33},
{"userID": 1, "customers": 42},
{"userID": 5, "customers": 13},])
print(df.to_html(escape=False))
Next task is to inject links into the dataframe:
list_of_rows = []
for index, row in df.iterrows():
    new_row = {'customers': dict(row)['customers']} # keep the old data
    uid = str(dict(row)['userID'])
    new_row['userID'] = "<a href='http://url.com/"+uid+"'>"+uid+"</a>"
    list_of_rows.append(new_row)
df_with_links = pandas.DataFrame(list_of_rows)
Incorporating those techniques into your code:
@app.route('/example')
def dataframe():
    df = pd.read_csv("my.csv")
    list_of_rows = []
    for index, row in df.iterrows():
        new_row = {'customers': dict(row)['customers']} # keep the old data
        uid = str(dict(row)['userID'])
        new_row['userID'] = "<a href='http://url.com/"+uid+"'>"+uid+"</a>"
        list_of_rows.append(new_row)
    # app.py imports pandas as pd, so use that alias here
    df_with_links = pd.DataFrame(list_of_rows)
    return render_template("example.html", data=df_with_links.to_html(escape=False))
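One caveat with escape=False is that it switches off escaping for every cell, so whatever is in the CSV ends up in the page as raw HTML. A minimal sketch of a link builder that escapes the ID itself is shown here. The helper name user_link is hypothetical, and BASE_URL is just the placeholder address from the question.

```python
from html import escape
from urllib.parse import quote

BASE_URL = "https://myURL.com/"  # placeholder taken from the question

def user_link(user_id):
    """Return an <a> element linking to the detail page for user_id."""
    uid = str(user_id)
    # Percent-encode the ID for the URL, HTML-escape it for the link text.
    href = BASE_URL + quote(uid, safe="")
    return '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(uid))

print(user_link(4))
```

It could be applied with df['userID'] = df['userID'].map(user_link) before calling to_html(escape=False). The other columns would still be emitted unescaped, so this only helps if their contents are trusted.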

scipy dendrogram to json for d3.js tree visualisation

I am trying to convert the results of SciPy hierarchical clustering into JSON for display in d3.js (here is an example).
The following code produces a dendrogram with 6 branches.
import numpy as np
import pandas as pd
import scipy.spatial
import scipy.cluster
d = {'employee' : ['A', 'B', 'C', 'D', 'E', 'F'],
'skillX': [2,8,3,6,8,10],
'skillY': [8,15,6,9,7,10]}
d1 = pd.DataFrame(d)
distMat = xPairWiseDist = scipy.spatial.distance.pdist(np.array(d1[['skillX', 'skillY']]), 'euclidean')
clusters = scipy.cluster.hierarchy.linkage(distMat, method='single')
dendo = scipy.cluster.hierarchy.dendrogram(clusters, labels = list(d1.employee), orientation = 'right')
dendo
my question
How can I represent the data in a JSON file in a format that d3.js understands?
{'name': 'Root1',
 'children': [{'name': 'B'},
              {'name': 'E-D-F-C-A',
               'children': [{'name': 'C-A',
                             'children': [{'name': 'A'},
                                          {'name': 'C'}]
                            }
                           ]
              }
             ]
}
The embarrassing truth is that I do not know whether I can extract this information from the dendrogram or from the linkage matrix, and if so, how.
I am thankful for any help I can get.
EDIT TO CLARIFY
So far, I have tried to use the to_tree method but have difficulty understanding its structure (yes, I read the documentation).
a = scipy.cluster.hierarchy.to_tree(clusters, rd=True)
for x in a[1]:
    # print(x.get_id())
    if not x.is_leaf():
        print(x.get_left().get_id(), x.get_right().get_id(), x.get_count())
You can do this in three steps:
Recursively construct a nested dictionary that represents the tree returned by Scipy's to_tree method.
Iterate through the nested dictionary to label each internal node with the leaves in its subtree.
Dump the resulting nested dictionary to JSON and load it into D3.
Construct a nested dictionary representing the dendrogram
For the first step, it is important to call to_tree with rd=False so that the root of the dendrogram is returned. From that root, you can construct the nested dictionary as follows:
# Create a nested dictionary from the ClusterNodes returned by SciPy
def add_node(node, parent):
    # First create the new node and append it to its parent's children
    newNode = dict(node_id=node.id, children=[])
    parent["children"].append(newNode)
    # Recursively add the current node's children
    if node.left: add_node(node.left, newNode)
    if node.right: add_node(node.right, newNode)
T = scipy.cluster.hierarchy.to_tree(clusters, rd=False)
d3Dendro = dict(children=[], name="Root1")
add_node(T, d3Dendro)
# Output: => {'name': 'Root1', 'children': [{'node_id': 10, 'children': [{'node_id': 1, 'children': []}, {'node_id': 9, 'children': [{'node_id': 6, 'children': [{'node_id': 0, 'children': []}, {'node_id': 2, 'children': []}]}, {'node_id': 8, 'children': [{'node_id': 5, 'children': []}, {'node_id': 7, 'children': [{'node_id': 3, 'children': []}, {'node_id': 4, 'children': []}]}]}]}]}]}
The basic idea is to start with a node not in the dendrogram that will serve as the root of the whole dendrogram. Then we recursively add left- and right-children to this dictionary until we reach the leaves. At this point, we do not have labels for the nodes, so I'm just labeling nodes by their clusterNode ID.
Label the dendrogram
Next, we need to use the node_ids to label the dendrogram. The comments should be enough explanation for how this works.
from functools import reduce  # reduce is not a builtin in Python 3
# Map leaf ids to labels (assumption: leaf ids follow the row order of d1)
id2name = dict(enumerate(d1.employee))
# Label each node with the names of each leaf in its subtree
def label_tree(n):
    # If the node is a leaf, then we have its name
    if len(n["children"]) == 0:
        leafNames = [id2name[n["node_id"]]]
    # If not, flatten all the leaves in the node's subtree
    else:
        leafNames = reduce(lambda ls, c: ls + label_tree(c), n["children"], [])
    # Delete the node id since we don't need it anymore and
    # it makes for cleaner JSON
    del n["node_id"]
    # Labeling convention: "-"-separated leaf names
    n["name"] = "-".join(sorted(map(str, leafNames)))
    return leafNames
label_tree(d3Dendro["children"][0])
Dump to JSON and load into D3
Finally, after the dendrogram has been labeled, we just need to output it to JSON and load into D3. I'm just pasting the Python code to dump it to JSON here for completeness.
# Output to JSON
import json
with open("d3-dendrogram.json", "w") as f:
    json.dump(d3Dendro, f, sort_keys=True, indent=4)
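To check the labelling step without SciPy installed, the nested dictionary printed by add_node above can be pasted in directly. The sketch below is a non-mutating variant of the labelling idea, not the code from this answer: to_d3 is a hypothetical helper that builds a fresh dictionary, and it omits the children key for leaves to match the shape asked for in the question. The mapping of leaf ids 0 to 5 onto employees A to F is assumed from the row order of the question's dataframe.

```python
import json

# Nested dictionary printed by add_node above (child of the artificial root).
tree = {"node_id": 10, "children": [
    {"node_id": 1, "children": []},
    {"node_id": 9, "children": [
        {"node_id": 6, "children": [
            {"node_id": 0, "children": []},
            {"node_id": 2, "children": []}]},
        {"node_id": 8, "children": [
            {"node_id": 5, "children": []},
            {"node_id": 7, "children": [
                {"node_id": 3, "children": []},
                {"node_id": 4, "children": []}]}]}]}]}

# Assumed mapping: leaf ids 0..5 correspond to employees A..F.
id2name = dict(enumerate("ABCDEF"))

def to_d3(node):
    """Return (d3_node, leaf_names) for one node of the nested dictionary."""
    if not node["children"]:
        name = id2name[node["node_id"]]
        return {"name": name}, [name]
    children, leaves = [], []
    for child in node["children"]:
        d3_child, child_leaves = to_d3(child)
        children.append(d3_child)
        leaves.extend(child_leaves)
    # Same labelling convention as above: "-"-separated sorted leaf names.
    return {"name": "-".join(sorted(leaves)), "children": children}, leaves

d3_tree = {"name": "Root1", "children": [to_d3(tree)[0]]}
print(json.dumps(d3_tree, indent=2))
```

Returning the leaf names alongside each node avoids a second pass over the tree, which is the same trick the reduce call in label_tree relies on.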
Output
I created Scipy and D3 versions of the dendrogram below. For the D3 version, I simply plugged the JSON file I output ('d3-dendrogram.json') into this Gist.
SciPy dendrogram
D3 dendrogram

How do I `jsonify` a list in Flask? [duplicate]

Currently Flask would raise an error when jsonifying a list.
I know there could be security reasons https://github.com/mitsuhiko/flask/issues/170, but I still would like to have a way to return a JSON list like the following:
[
{'a': 1, 'b': 2},
{'a': 5, 'b': 10}
]
instead of
{ 'results': [
{'a': 1, 'b': 2},
{'a': 5, 'b': 10}
]}
when responding to an application/json request. How can I return a JSON list in Flask using jsonify?
You can't, but you can do it anyway like this. I needed this for jQuery-File-Upload:
import json
# get this object
from flask import Response
#example data:
js = [ { "name" : filename, "size" : st.st_size ,
"url" : url_for('show', filename=filename)} ]
#then do this
return Response(json.dumps(js), mimetype='application/json')
jsonify prevents you from doing this in Flask 0.10 and lower for security reasons.
To do it anyway, just use json.dumps in the Python standard library.
http://docs.python.org/library/json.html#json.dumps
This is working for me. Which version of Flask are you using?
from flask import jsonify
...
@app.route('/test/json')
def test_json():
    list = [
        {'a': 1, 'b': 2},
        {'a': 5, 'b': 10}
    ]
    return jsonify(results = list)
Flask's jsonify() method now serializes top-level arrays as of this commit, available in Flask 0.11 onwards.
For convenience, you can either pass in a Python list: jsonify([1,2,3])
Or pass in a series of args: jsonify(1,2,3)
Both will be serialized to a JSON top-level array: [1,2,3]
Details here: https://flask.palletsprojects.com/en/2.2.x/api/?highlight=jsonify#flask.json.jsonify
Solved, no fuss. You can be lazy and use jsonify, all you need to do is pass in items=[your list].
Take a look here for the solution
https://github.com/mitsuhiko/flask/issues/510
A list can easily be returned from Flask using jsonify, like this:
from flask import Flask,jsonify
app = Flask(__name__)
tasks = [
{
'id':1,
'task':'this is first task'
},
{
'id':2,
'task':'this is another task'
}
]
@app.route('/app-name/api/v0.1/tasks',methods=['GET'])
def get_tasks():
    return jsonify({'tasks':tasks}) #will return the json
if(__name__ == '__main__'):
    app.run(debug = True)
If you are literally looking for a way to return a JSON list in Flask, and you are completely sure that your variable is a list, then the easy way is (where bin is a list of 1's and 0's):
return jsonify({'ans':bin}), 201
Finally, in your client you will obtain something like
{ "ans": [ 0.0, 0.0, 1.0, 1.0, 0.0 ] }
jsonify works... But if you intend to pass just an array without the 'results' key, you can use the json library from Python's standard library. The following conversion works for me.
import json
@app.route('/test/json')
def test_json():
    mylist = [
        {'a': 1, 'b': 2},
        {'a': 5, 'b': 10}
    ]
    return json.dumps(mylist)