I have created a function to output a JSON file into which I input a pandas data frame and a variable.
The DataFrame (df) has 3 columns: 'a', 'b', and 'c'.
df = pd.DataFrame([[1,2,3], [0.1, 0.2, 0.3]], columns=('a','b','c'))
Error is a variable with a numeric value.
Error = 45
The output format of the JSON file should look like this:
{
"Error": 45,
"Data": [
{
"a": [1, 0.1]
},
{
"b": [2, 0.2]
},
{
"c": [3, 0.3]
}
]
}
I can convert the dataframe into a JSON using the below code. But how can I obtain the desired format of the JSON file?
def OutputFunction(df, Error):
    # json_output = df_ViolationSummary.to_json(orient='records')
    df.to_json(r'C:\Users\Administrator\Downloads\export_dataframe.json', orient='records')

# Calling the function
OutputFunction(df, Error)
Calling df.to_dict(orient='list') returns a dictionary in which each key is a column name and each value is that column's values as a list. You can then build the JSON object like this: output = {"Error":Error, "Data": df.to_dict(orient='list')}.
Running this line will return:
{'Error': 45, 'Data': {'a': [1.0, 0.1], 'b': [2.0, 0.2], 'c': [3.0, 0.3]}}
Note that the integer values appear as floats: because some values in the dataframe are floats, the columns' data types become float. If you really want mixed types, you could use a dictionary comprehension like the following, although it should not be necessary in most cases:
output = {
    "Error": Error,
    "Data": {
        col: [
            int(v) if v % 1 == 0 else v
            for v in vals
        ]
        for col, vals in df.to_dict(orient='list').items()
    }
}
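The layout requested in the question has "Data" as a list of single-key objects rather than one dictionary. A minimal sketch of that reshaping follows, assuming the column dictionary is the one shown above as the result of to_dict(orient='list'); the names columns and json_text are only illustrative.

```python
import json

# Column-oriented data, in the shape df.to_dict(orient='list') returned above.
columns = {"a": [1.0, 0.1], "b": [2.0, 0.2], "c": [3.0, 0.3]}
Error = 45

# Wrap each column in its own single-key dictionary so "Data" becomes a list.
output = {
    "Error": Error,
    "Data": [{col: vals} for col, vals in columns.items()],
}

# Serialize to a JSON string; indent only affects readability.
json_text = json.dumps(output, indent=4)
print(json_text)
```

To write a file instead of a string, json.dump(output, fp, indent=4) with an open file object fp should produce the same content.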
{
"ABC": {
"A": {
"Date": "01/01/2021",
"Value": "0.09"
},
"B": {
"Date": "01/01/2021",
"Value": "0.001"
}
},
"XYZ": {
"A": {
"Date": "01/01/2021",
"Value": "0.006"
},
"B": {
"Date": "01/01/2021",
"Value": "0.000"
}
}
}
I tried pd.json_normalize(x, max_level=1), but the result is not in the shape I expect.
I need to convert this to a pandas DataFrame.
If anyone can help or give some advice on working with this data, that would be great!
Use the following, where js is your input dict:
s = pd.DataFrame(js)
ss = s.apply(lambda x: [pd.Series(y)['Value'] for y in x])
ss['Date'] = s['ABC'].apply(pd.Series)['Date']
One possible option is custom processing of your x object,
creating a list of rows:
lst = []
for k1, v1 in x.items():
    row = {}
    row['key'] = k1
    for k2, v2 in v1.items():
        dd = v2["Date"]
        vv = float(v2["Value"])
        row['Date'] = dd
        row[k2] = vv
    lst.append(row)
Note that the above code also converts Value to float type.
I assumed that all dates in each first-level object are the same,
so in the second level loop Date is overwritten, but I assume
that this does no harm.
Then you can create the output DataFrame as follows:
df = pd.DataFrame(lst)
df.set_index('key', inplace=True)
df.index.name = None
The result is:
           Date      A      B
ABC  01/01/2021  0.090  0.001
XYZ  01/01/2021  0.006  0.000
Although it is possible to read x using json_normalize into a temporary
DataFrame, the sequence of operations to convert it to your desired shape
would be complicated.
This is why I came up with the above solution, which is, in my opinion, conceptually simpler.
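The row-building loop relies on the assumption that every entry under a first-level key carries the same Date. If that is in doubt, it can be checked before building the DataFrame. The following is only a sketch; the helper names dates_by_key and keys_with_mixed_dates are hypothetical, and x reproduces the sample data from the question.

```python
# Sample input, copied from the question.
x = {
    "ABC": {
        "A": {"Date": "01/01/2021", "Value": "0.09"},
        "B": {"Date": "01/01/2021", "Value": "0.001"},
    },
    "XYZ": {
        "A": {"Date": "01/01/2021", "Value": "0.006"},
        "B": {"Date": "01/01/2021", "Value": "0.000"},
    },
}


def dates_by_key(data):
    """Map each first-level key to the set of distinct Date values under it."""
    return {k1: {v2["Date"] for v2 in v1.values()} for k1, v1 in data.items()}


def keys_with_mixed_dates(data):
    """List the first-level keys whose entries do not all share one Date."""
    return [k1 for k1, dates in dates_by_key(data).items() if len(dates) > 1]


print(keys_with_mixed_dates(x))  # [] for the sample data
```

An empty list means overwriting Date in the inner loop loses nothing; any key that is listed would need a different output layout.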
I'm working with a REST API that returns data in the following format:
{
"id": "2902cbad6da44459ad05abd1305eed14",
"displayName": "",
"sourceHost": "dev01.test.lan",
"sourceIP": "192.168.145.1",
"messagesPerSecond": 0,
"messages": 2733,
"size": 292062,
"archiveSize": 0,
"dates": [
{
"date": 1624921200000,
"messages": 279,
"size": 29753,
"archiveSize": 0
},
{
"date": 1625007600000,
"messages": 401,
"size": 42902,
"archiveSize": 0
}
]
}
I'm using json.loads to successfully parse the data from the API, and I now need to search for a particular "date" value and read the corresponding "messages", "size" and "archiveSize" values.
I'm trying to use the "if-in" method to find the value I'm interested in, for example:
response = requests.request("GET", apiQuery, headers=headers, data=payload)
json_response = json.loads(response.text)
test = 2733
if test in json_response.values():
    print(f"Yes, value: '{test}' exist in dictionary")
else:
    print(f"No, value: '{test}' does not exist in dictionary")
This works fine for any value in the top section of the JSON return, but it never finds any values in the "dates" sub-branches.
I have two questions. First, how do I find the target "date" value? Second, once I find that "sub-branch", what would be the best way to extract the three values I need?
Thanks.
from json import load


def list_dates_whose_message_count_equals(dates=None, message_count=0):
    # Keep only the entries whose "messages" field equals message_count;
    # treat a missing list as empty so the default argument does not raise
    return list(filter(
        lambda date: date.get("messages") == message_count, dates or []
    ))


def main():
    with open("values.json", "r") as fp:
        json_ = load(fp)
    print(list_dates_whose_message_count_equals(json_["dates"], message_count=279))
    print(list_dates_whose_message_count_equals(json_["dates"], message_count=401))


if __name__ == "__main__":
    main()
This returns:
[{'date': 1624921200000, 'messages': 279, 'size': 29753, 'archiveSize': 0}]
[{'date': 1625007600000, 'messages': 401, 'size': 42902, 'archiveSize': 0}]
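The question asks for a lookup by "date" rather than by message count. The same idea carries over; a minimal sketch is below. The function name find_by_date is hypothetical, and json_response is a trimmed copy of the sample response from the question.

```python
# Trimmed copy of the sample API response from the question.
json_response = {
    "id": "2902cbad6da44459ad05abd1305eed14",
    "messages": 2733,
    "dates": [
        {"date": 1624921200000, "messages": 279, "size": 29753, "archiveSize": 0},
        {"date": 1625007600000, "messages": 401, "size": 42902, "archiveSize": 0},
    ],
}


def find_by_date(response, target_date):
    """Return the first entry of response["dates"] matching target_date, or None."""
    return next(
        (entry for entry in response.get("dates", [])
         if entry.get("date") == target_date),
        None,
    )


entry = find_by_date(json_response, 1625007600000)
if entry is not None:
    # Pull out the three values the question asks for.
    messages = entry["messages"]
    size = entry["size"]
    archive_size = entry["archiveSize"]
    print(messages, size, archive_size)  # 401 42902 0
```

The top-level values() check fails for these entries because "dates" is a list of dictionaries, so the search has to iterate over that list explicitly.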
{
"key1": {
"subfield1": 4,
"subfield2": "hello"
},
"key2": 325,
...
}
I have no idea how deeply nested it will be (at most 4 to 5 levels). The only guarantee is that each key will be a string. What I want to do is convert the above JSON into the following format:
{
"key1.subfield1": 4,
"key1.subfield2": "hello",
"key2": 325,
...,
"keyN.subfieldM.subsubfieldK. ...": "blah blah"
}
How can I do this?
I am currently assuming that your dictionary contains only other nested dictionaries or final values.
My input data:
data = {
    "key1": {
        "subfield1": 4,
        "subfield2": {
            "subfield1": 4,
            "subfield2": "hello"
        },
    },
    "key2": 325,
}
Then by using a python generator, I can generate the nested key combinations and value using:
def flatten_dict(data, prefix=''):
    for key, value in data.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the dotted prefix
            yield from flatten_dict(value, prefix=f"{prefix}{key}.")
        else:
            yield f"{prefix}{key}", value
Then using that function on the data:
from pprint import pprint
pprint(dict(flatten_dict(data)))
results in the output:
{'key1.subfield1': 4,
 'key1.subfield2.subfield1': 4,
 'key1.subfield2.subfield2': 'hello',
 'key2': 325}
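The generator above treats lists as final values. If the JSON may also contain arrays, one possible extension is to descend into lists and use the element index as a key segment. This is a sketch under that assumption, not part of the original answer; the name flatten and the sample data with a "tags" list are made up for illustration.

```python
def flatten(data, prefix=""):
    """Yield (dotted_key, value) pairs, descending into dicts and lists."""
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        # List positions become key segments: "tags.0", "tags.1", ...
        items = enumerate(data)
    else:
        # A final value: emit it under the key built so far.
        yield prefix, data
        return
    for key, value in items:
        new_prefix = f"{prefix}.{key}" if prefix else str(key)
        yield from flatten(value, new_prefix)


# Hypothetical sample data containing a list.
sample = {"key1": {"subfield1": 4, "tags": ["x", "y"]}, "key2": 325}
flat = dict(flatten(sample))
print(flat)
```

Note that empty dictionaries and empty lists produce no entries with this approach, which may or may not be what is wanted.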
Here is my sample json data:
{
"id": "67362003",
"a": {
"b": {
"3": 2,
"43": 30,
"c": "jack",
"d": "trail",
"e": [
{
"f": {
"g": "Father",
"h": "Mother"
}
}
],
"p": [
{
"q": {
"r": "Aunt",
"s": "Uncle"
}
}
]
}
}
}
I wrote a method, objpath, which takes each input from the user config, such as [root:id, root:a:b:e[0]], and returns the corresponding path as a string: ['id'] on the first call and ['a']['b']['e'][0] on the second. I can read and print these strings in my wrapper code.
Printing the value directly with print(obj['a']['b']['e'][0]) works. However, objpath returns the "['a']['b']['e'][0]" part as a string, so how can I turn it into the obj['a']['b']['e'][0] form? It may be simple, and I tried multiple ways, but it keeps breaking on string/list/dict type errors. Below is my Python wrapper:
import json

file_json = 'config_lib/sample.json'
with open(file_json, 'r') as lf:
    obj = json.load(lf)

skeys = ["root:id", "root:a:b:e[0]"]
for skey in skeys:
    returns = JsonParser(obj, skey)  # returns a string
    print(type(returns))  # <type 'str'>
    print(returns)  # ['id'] on the first iteration, ['a']['b']['e'][0] on the second
    print(obj[returns])  # how do I print the obj value here? This does not work
I need to print obj['a']['b']['e'][0] and obj['id']
Can anyone please suggest a solution.
Evaluating the returned string as a Python expression, with 'obj' prepended, resolved the issue:
print ("Returned value is:", eval('obj' + returns))
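Because eval executes whatever text it is given, it is risky when the keys come from a user-editable config. An alternative is to walk the object step by step from the original key string. The sketch below assumes the "root:a:b:e[0]" key syntax shown in the question; the function name resolve and the token pattern are hypothetical, and obj is a trimmed copy of the sample data.

```python
import re

# One path step is either a plain name or a bracketed list index.
_TOKEN = re.compile(r"([^:\[\]]+)|\[(\d+)\]")


def resolve(obj, skey):
    """Follow a path such as 'root:a:b:e[0]' through nested dicts and lists."""
    parts = skey.split(":")
    if parts and parts[0] == "root":
        parts = parts[1:]  # "root" stands for the object itself
    current = obj
    for part in parts:
        for name, index in _TOKEN.findall(part):
            # A bracketed number indexes a list; anything else is a dict key.
            current = current[int(index)] if index else current[name]
    return current


# Trimmed copy of the sample JSON from the question.
obj = {
    "id": "67362003",
    "a": {"b": {"3": 2, "e": [{"f": {"g": "Father", "h": "Mother"}}]}},
}

print(resolve(obj, "root:id"))        # 67362003
print(resolve(obj, "root:a:b:e[0]"))  # {'f': {'g': 'Father', 'h': 'Mother'}}
```

A missing key or index raises KeyError or IndexError here, which makes a bad config entry easy to spot.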
I have an RDD of type RDD[(String, List[String])].
Example:
(FRUIT, List(Apple,Banana,Mango))
(VEGETABLE, List(Potato,Tomato))
I want to convert the above output to json object like below.
{
"categories": [
{
"name": "FRUIT",
"nodes": [
{
"name": "Apple",
"isInTopList": false
},
{
"name": "Banana",
"isInTopList": false
},
{
"name": "Mango",
"isInTopList": false
}
]
},
{
"name": "VEGETABLE",
"nodes": [
{
"name": "Potato",
"isInTopList": false
},
{
"name": "Tomato",
"isInTopList": false
}
]
}
]
}
Please suggest the best possible way to do it.
NOTE: "isInTopList": false is always constant and has to be there with every item in the jsonobject.
First I used the following code to reproduce the scenario that you mentioned:
val sampleArray = Array(
  ("FRUIT", List("Apple", "Banana", "Mango")),
  ("VEGETABLE", List("Potato", "Tomato")))

val sampleRdd = sc.parallelize(sampleArray)
sampleRdd.foreach(println) // Printing the result
Now, I am using json4s Scala library to convert this RDD into the JSON structure that you requested:
import org.json4s.native.JsonMethods._
import org.json4s.JsonDSL.WithDouble._
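val json = "categories" -> sampleRdd.collect().toList.map {
  case (name, nodes) =>
    ("name", name) ~
    ("nodes", nodes.map { node =>
      // Every node carries the constant flag requested in the question
      ("name", node) ~ ("isInTopList", false)
    })
}

println(compact(render(json))) // Printing the rendered JSON
The result is:
{"categories":[{"name":"FRUIT","nodes":[{"name":"Apple","isInTopList":false},{"name":"Banana","isInTopList":false},{"name":"Mango","isInTopList":false}]},{"name":"VEGETABLE","nodes":[{"name":"Potato","isInTopList":false},{"name":"Tomato","isInTopList":false}]}]}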
Since you want a single JSON for your entire RDD, I would start by calling collect on the RDD. Be careful that your data set fits in memory, as this will move the data back to the driver.
To get the json, just use a library to traverse your objects. I like Json4s due to its simple internal structure and practical, clean operators. Here is a sample from their website that shows how to traverse nested structures (in particular, lists):
object JsonExample extends App {
  import org.json4s._
  import org.json4s.JsonDSL._
  import org.json4s.jackson.JsonMethods._

  case class Winner(id: Long, numbers: List[Int])
  case class Lotto(id: Long, winningNumbers: List[Int], winners: List[Winner], drawDate: Option[java.util.Date])

  val winners = List(Winner(23, List(2, 45, 34, 23, 3, 5)), Winner(54, List(52, 3, 12, 11, 18, 22)))
  val lotto = Lotto(5, List(2, 45, 34, 23, 7, 5, 3), winners, None)

  val json =
    ("lotto" ->
      ("lotto-id" -> lotto.id) ~
      ("winning-numbers" -> lotto.winningNumbers) ~
      ("draw-date" -> lotto.drawDate.map(_.toString)) ~
      ("winners" ->
        lotto.winners.map { w =>
          (("winner-id" -> w.id) ~
           ("numbers" -> w.numbers))}))

  println(compact(render(json)))
}
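The same collect-then-build idea can be sketched in Python, which may help in seeing the target structure independently of json4s. This is an illustration only: collected stands in for the result of collecting the RDD on the driver, and to_categories is a hypothetical helper name.

```python
import json

# Stand-in for the collected RDD contents: (category, items) pairs.
collected = [
    ("FRUIT", ["Apple", "Banana", "Mango"]),
    ("VEGETABLE", ["Potato", "Tomato"]),
]


def to_categories(pairs):
    """Build the nested structure, adding the constant isInTopList flag to each node."""
    return {
        "categories": [
            {
                "name": name,
                "nodes": [{"name": node, "isInTopList": False} for node in nodes],
            }
            for name, nodes in pairs
        ]
    }


result = to_categories(collected)
print(json.dumps(result, indent=2))
```

As with the Scala version, this only makes sense when the collected data fits in the driver's memory.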