I need to parse through a dictionary and update its value - json

I have a problem statement in which I have to read a JSON file. Converting the file contents with json.loads() gives a dictionary with 12 keys.
One of the keys ('body') has a value of type string. Converting this string again with json.loads() yields a list of dictionaries. The list has a length of 1000, and each dictionary within it has 24 keys.
I need to increase the number of dictionaries so that the list has a new length of 2000. Each dictionary has a key ('id') whose value needs to be unique.
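For reference, a minimal sketch of the parsing described above (the filename is a placeholder):

import json

# Placeholder filename; the real file comes from the problem statement.
with open('data.json') as f:
    outer = json.loads(f.read())  # dict with 12 keys

list_of_dictionary = json.loads(outer['body'])  # list of 1000 dicts, 24 keys each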
Now, this is my code snippet where I'm trying to update the value of the dictionary if my key value is 'id':
val = 1
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            v = val
            print("value is", v)
            val = val + 1
Output:
value is 1
value is 2
and so on...
Now, when I try to view the updated values again, I can see only the previous values.
This is the code snippet:
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            print("value is", v)
Output:
value is 11123
value is 11128
and so on...
Whereas I want the output shown above, since I have already updated the values.

Got the answer. In the first for-in loop, I realized that I forgot to actually update the dictionary, and that's why the second loop couldn't see the updated data. The corrected first loop is:
val = 1
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            temp = {k: val}  # a dict literal; {k, val} would create a set
            each_dict.update(temp)
            val = val + 1
Now I'm able to see the updated data in the second loop.
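With the update working, a plain item assignment does the same job without building a temporary dict, and the list can first be doubled to reach the 2000 entries mentioned above. A minimal sketch, assuming list_of_dictionary is the parsed 'body' value:

import copy

# Double the list; deep copies so the new dicts are independent objects.
list_of_dictionary += copy.deepcopy(list_of_dictionary)

# Reassign unique ids in one pass; item assignment mutates the dicts in place.
for val, each_dict in enumerate(list_of_dictionary, start=1):
    each_dict['id'] = val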

Related

How to convert a multi-dimensional array in JSON into separate columns in pandas

I have a DB collection consisting of nested strings. I am trying to convert the contents under the "status" column into separate columns against each order ID, in order to track the time taken from "order confirmed" to "pick up confirmed". The string looks as follows:
I have tried the following:
xyz_db = db.logisticsOrders                  # DB collection
df = pd.DataFrame(list(xyz_db.find()))       # JSON to dataframe
Using normalize:
parse1 = pd.json_normalize(df['status'])
It works fine for non-nested arrays, but since "status" is a nested array, the output is as follows:
Using a for loop:
data = df[['orderid', 'status']]
data = list(data['status'])
dfy = pd.DataFrame(columns=['statuscode', 'statusname', 'laststatusupdatedon'])
for i in range(0, len(data)):
    result = data[i]
    dfy.loc[i] = [data[i][0], data[i][0], data[i][0]]  # one value per column
It gives the result as appended rows, which is not the format I am trying to achieve.
The output I am trying to get is:
Please help out!
Sharing the approach I used for reading JSON; maybe it helps you. It works with two or more nested lists:
def jsonify(z):
    genr = []
    if z == z and z is not None:  # z == z is False for NaN, so this skips missing values
        z = eval(z)  # parse the string into Python objects (ast.literal_eval is safer)
        if type(z) in (dict, list, tuple):
            for dic in z:
                for key, val in dic.items():
                    if key == "name":
                        genr.append(val)
        else:
            return None
    else:
        return None
    return genr

top_genr['genres_N'] = top_genr['genres'].apply(jsonify)
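Back to the order-status question above: a sketch using pd.json_normalize's record_path argument, which flattens nested lists directly. The field names 'statusname' and 'laststatusupdatedon' are taken from the column list above; the exact shape of each status entry is an assumption:

import pandas as pd

# One row per status entry, keeping 'orderid' from the parent record.
flat = pd.json_normalize(
    df.to_dict('records'),
    record_path='status',
    meta=['orderid'],
)

# Pivot: one column per status name, holding that status's timestamp,
# so "order confirmed" and "pick up confirmed" can be compared per order.
wide = flat.pivot_table(
    index='orderid',
    columns='statusname',
    values='laststatusupdatedon',
    aggfunc='first',
)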

Extract rows and columns from a dictionary of JSON responses consisting of lists of dictionaries in Python

Sorry for the confusing title.
I'm trying to read a boatload of JSON responses using grequests with this loop:
def GetData():
    urlRespDict = {}
    for OrderNo in LookupNumbers['id']:
        urls1 = []
        for idno in ParameterList:
            urlTemp = url0_const + OrderNo + url1_const + idno + param1_const
            urls1.append(urlTemp)
        # Fire all requests for this order concurrently
        urlRespDict[OrderNo] = grequests.map(grequests.get(u) for u in urls1)
    return urlRespDict
Which is all fine and dandy; my response is a dictionary of 4 keys, each containing a list of 136 responses.
When I read one of the responses with (key and index are random):
d1 = dict_responses['180378'][0].json()
I get a list of dictionaries that has a dictionary inside; see the picture below.
Basically, all I want to get out is the value from the 'values' key, which in this case is '137' and '13,80137'. Ideally I want to create a df that has columns named by the 'key' (in this case '137') and rows with the values extracted from d1.
I've tried using apply(pd.Series) on the values dict, but it is very time consuming, like:
df2 = [(pd.DataFrame.from_records(n))['values'].apply(pd.Series,dtype="string") for n in df1]
just to see the data.
I hope there's another alternative; I am not an experienced coder.
I hope I explained it well enough and I hope you can help. Thank you so much in advance.
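One possible alternative, as a sketch: build plain Python containers first and construct the DataFrame once, which tends to be much cheaper than a per-row apply(pd.Series). The 'key'/'values' field names follow the description above, but the exact response shape is an assumption:

import pandas as pd

# Hypothetical shape: d1 is a list of dicts like {'key': '137', 'values': [...]}.
columns = {item['key']: item['values'] for item in d1}

# from_dict with orient='index' tolerates value lists of unequal length;
# transposing then gives one column per 'key'.
df2 = pd.DataFrame.from_dict(columns, orient='index').T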

How to convert pyarrow.Table columnar data to tabular, row-like, data

I have some columnar data in a pyarrow Table. How do I convert it back from a dict of lists to a list of dicts?
The following code snippet allows you to iterate the table efficiently, using pyarrow.RecordBatch.to_pydict() as a working buffer.
See the full example.
"""Columnar data manipulation utilities."""
from typing import Iterable, Dict
def iterate_columnar_dicts(inp: Dict[str, list]) -> Iterable[Dict[str, object]]:
"""Iterates columnar dict data as rows.
Useful for constructing rows/objects out from :py:class:`pyarrow.Table` or :py:class:`pyarrow.RecordBatch`.
Example:
.. code-block:: python
#classmethod
def create_from_pyarrow_table(cls, table: pa.Table) -> "PairUniverse":
pairs = {}
for batch in table.to_batches(max_chunksize=5000):
d = batch.to_pydict()
for row in iterate_columnar_dicts(d):
pairs[row["pair_id"]] = DEXPair.from_dict(row)
return PairUniverse(pairs=pairs)
:param inp: Input dictionary of lists e.g. one from :py:method:`pyarrow.RecordBatch.to_pydict`. All lists in the input must be equal length.
:return: Iterable that gives one dictionary per row after transpose
"""
keys = inp.keys()
for i in range(len(inp)):
item = {key: inp[key][i] for key in keys}
yield item
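A quick usage sketch; the table contents here are made up for illustration:

import pyarrow as pa

table = pa.table({"pair_id": [1, 2], "symbol": ["WETH-USDC", "WBTC-USDT"]})
for batch in table.to_batches(max_chunksize=1000):
    for row in iterate_columnar_dicts(batch.to_pydict()):
        print(row)  # e.g. {'pair_id': 1, 'symbol': 'WETH-USDC'}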

Get JuliaDB.loadtable() to parse all columns as String

I want JuliaDB.loadtable() to read a CSV (really a bunch of CSVs, but for simplicity let's try just one), where all columns are parsed as String.
Here's what I've tried:
using CSV
using DataFrames
using JuliaDB
df1 = DataFrame(
    [['a', 'b', 'c'], [1, 2, 3]],
    ["name", "id"]
)
CSV.write("df1.csv", df1)
# This works, but if I have 10+ columns it would get unwieldy
df1 = loadtable("df1.csv"; colparsers=Dict(:name=>String, :id=>String),)
# This doesn't work
df1 = loadtable("df1.csv"; colparsers=String,)
# MethodError: no method matching iterate(::Type{String})
Here's how it's done in R:
df1 = read.csv("df1.csv", colClasses = "character")
If you know the number of columns (or just an upper bound on it), you can use types, I should think (from CSV.jl documentation):
types: a Vector or Dict of types to be used for column types; a Dict can map a column index (Int) or a column name (Symbol or String) to the type for that column, i.e. Dict(1=>Float64) will set the first column to Float64, Dict(:column1=>Float64) will set the column named column1 to Float64, and Dict("column1"=>Float64) will set column1 to Float64; if a Vector is provided, it must match the number of columns provided or detected in the header.

How can I flatten HBase cells so I can process the resulting JSON using a Spark RDD or DataFrame in Scala?

A relative newbie to Spark, HBase, and Scala here.
I have JSON (stored as byte arrays) in HBase cells, in the same column family but across several thousand column qualifiers. Example (simplified):
Table name: 'Events'
rowkey: rk1
column family: cf1
column qualifier: cq1, cell data (in bytes): {"id":1, "event":"standing"}
column qualifier: cq2, cell data (in bytes): {"id":2, "event":"sitting"}
etc.
Using Scala, I can read rows by specifying a time range:
val scan = new Scan()
val start = 1460542400
val end = 1462801600
scan.setTimeRange(start, end)  // apply the time range to the scan
val hbaseContext = new HBaseContext(sc, conf)
val getRdd = hbaseContext.hbaseRDD(TableName.valueOf("Events"), scan)
If I try to load my HBase RDD (getRdd) into a DataFrame (after converting the byte arrays into strings etc.), it only reads the first cell in every row; in the example above, I would only get "standing".
This code loads only a single cell for every row returned:
val resultsString = getRdd.map(s=>Bytes.toString(s._2.value()))
val resultsDf = sqlContext.read.json(resultsString)
In order to get every cell I have to iterate as below.
val jsonRDD = getRdd.map(
  row => {
    val str = new StringBuilder
    str.append("[")
    val it = row._2.listCells().iterator()
    while (it.hasNext) {
      val cell = it.next()
      val cellstring = Bytes.toString(CellUtil.cloneValue(cell))
      str.append(cellstring)
      if (it.hasNext()) {
        str.append(",")
      }
    }
    str.append("]")
    str.toString()
  }
)
val hbaseDataSet = sqlContext.read.json(jsonRDD)
I need to add the square brackets and the commas so it's properly formatted JSON for the DataFrame reader.
Questions:
Is there a more elegant way to construct the JSON, i.e. some parser that takes in the individual JSON strings and concatenates them together so the result is properly formed JSON?
Is there a better capability to flatten HBase cells so I don't need to iterate?
For the jsonRDD, the closure that is computed should include the str local variable, so the task executing this code on a node should not be missing the "[", "]", or ","; i.e. I won't get parser errors once I run this on the cluster instead of local[*].
Finally, is it better to just create a pair RDD from the JSON or to use DataFrames for simple things like counts? Is there some way to measure the efficiency and performance of one vs. the other?
Thank you.