Create a new dictionary that pairs each repeating key from list [a] with the summed values from list [b] - duplicates

I am working on a function to create a new dictionary that pairs each repeating value from dictionary item [a] with the sum of the corresponding values from dictionary item [b]. In addition, for a given length n, assign 0 to any unspecified key.
For example, d = {'a': [1, 2, 2, 5], 'b': [3, 2, 2, 1]}
should return {1: 3, 2: 4, 3: 0, 4: 0, 5: 1}
Does anyone have an idea how to solve this using either set() or defaultdict()?
Thanks.
My current progress:
from collections import defaultdict

d = {}
d['a'] = [1, 2, 2, 5]
d['b'] = [3, 2, 2, 1]

x = list(zip(d['a'], d['b']))
output = defaultdict(int)
for a, b in x:
    output[a] += b

x = dict(output)
sorted_x = sorted(x.items(), key=lambda x: x[0])
print(sorted_x)

n = 8
y = dict(sorted_x)
for i, j in dict(sorted_x).items():
    for a in range(n): .........
However, I have no clue how to assign the new pairs in y.

Python doesn't support duplicate keys in a dictionary.
If you define a dictionary like d = {1: 1, 4: 2, 1: 3},
then Python will keep only one entry per unique key, e.g. d = {1: 3, 4: 2}.
I think you can use the approach below:
from collections import defaultdict

d = [(1, 1), (4, 2), (1, 3)]  # use a list of tuples to store all the duplicate keys
output = defaultdict(int)
for k, v in d:
    output[k] += v
print(output)
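To also cover the second part of the question (filling the keys 1 through n with 0), here is a minimal sketch building on the same defaultdict idea; the helper name pair_and_fill and the assumption that keys should run from 1 to n are illustrative rather than taken from the original post:

from collections import defaultdict

def pair_and_fill(d, n):
    # sum d['b'] values per key taken from d['a'] (hypothetical helper)
    sums = defaultdict(int)
    for key, value in zip(d['a'], d['b']):
        sums[key] += value
    # fill every key from 1 to n, defaulting missing ones to 0
    return {k: sums.get(k, 0) for k in range(1, n + 1)}

d = {'a': [1, 2, 2, 5], 'b': [3, 2, 2, 1]}
print(pair_and_fill(d, 5))  # {1: 3, 2: 4, 3: 0, 4: 0, 5: 1}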

Related

Unable to add new value in python array

I am trying to reverse my existing array. The reversed elements must be inserted into an array called "new_ar".
My code is not throwing any error, but it is also not inserting values into the new array "new_ar".
The existing array is as below:
ar = array([1, 2, 3, 4])
I want the output to be a new array which is exactly the reverse of the above array,
i.e. new_ar = (4, 3, 2, 1)  -> expected output
I am using the code below:
import numpy as np

ar = np.array([1, 2, 3, 4])
len_ar = len(ar)
new_ar = []  # declaring an empty array
x = 0
# print(ar)
# print(len_ar)
while len_ar > 0:
    # print(ar[len_ar - 1])
    new_ar = np.insert(new_ar, x, ar[len_ar - 1])
    len_ar = len_ar - 1
    x = x + 1
    print(x)
print(new_ar)
I have tried the insert method as well, but no luck.
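For reference, the reversal itself does not need an insert loop: slicing with [::-1] (or np.flip) returns the reversed array directly. A minimal sketch, assuming ar holds the values shown above:

import numpy as np

ar = np.array([1, 2, 3, 4])
new_ar = ar[::-1]         # reversed view via slicing
# new_ar = np.flip(ar)    # equivalent alternative
print(new_ar)             # [4 3 2 1]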

define a function in which min() is used

I am trying to define a function in which I want part of the function limited. I tried to do this using min(), but it returns
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
My code:
import numpy as np
import matplotlib.pyplot as plt

# D is a constant defined elsewhere in the asker's code
def f(x, beta):
    K_w = (1 + ((0.5 * D) / (0.5 * D + x))**2)**2
    K_c = min(11, (3.5 * (x / D)**(-0.5)))  # <-- this is what gives me the problem. It should limit K_c to 11, but that does not work.
    K_tot = (K_c**2 + K_w**2 + 2 * K_c * K_w * np.cos(beta))**0.5
    return K_tot

x = np.linspace(0, 50, 100)
beta = np.linspace(0, 3.14, 180)
X, Y = np.meshgrid(x, beta)
Z = f(X, Y)

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 100, cmap='viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
I expected K_c to be limited to 11, but it gave a
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I might be making a rookie mistake, but help is much appreciated!
Consider using np.clip (see its entry in the NumPy reference documentation), e.g.
np.clip(3.5*(x/D)**(-0.5), None, 11)
for your case.
For example,
>>> import numpy as np
>>> np.clip([1, 2, 3, 15], None, 11)
array([ 1, 2, 3, 11])
The problem with your code is that min is comparing a number with an array, which is not what min expects: the comparison produces a boolean array whose truth value is ambiguous, hence the error.
Alternatively, here is a list comprehension approach:
A = [1, 2, 3, 15]
B = [min(11, a) for a in A]
print(B)
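For NumPy arrays specifically, np.minimum is another option: it broadcasts the scalar limit against the array and takes the elementwise minimum, so it avoids the ambiguous truth value entirely. A small sketch with made-up values:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 15.0])
print(np.minimum(11, x))  # [ 1.  2.  3. 11.]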

Inverse of Pandas json_normalize

I just discovered the json_normalize function, which works great in taking a JSON object and giving me a pandas DataFrame. Now I want the reverse operation, which takes that same DataFrame and gives me a JSON (or a JSON-like dictionary which I can easily turn into JSON) with the same structure as the original JSON.
Here's an example: https://hackersandslackers.com/json-into-pandas-dataframes/.
They take a JSON object (or JSON-like Python dictionary) and turn it into a DataFrame, but I now want to take that DataFrame and turn it back into a JSON-like dictionary (to later dump to a JSON file).
I implemented it with a couple of functions:
def set_for_keys(my_dict, key_arr, val):
    """
    Set val at path in my_dict defined by the string (or serializable object) array key_arr
    """
    current = my_dict
    for i in range(len(key_arr)):
        key = key_arr[i]
        if key not in current:
            if i == len(key_arr) - 1:
                current[key] = val
            else:
                current[key] = {}
        else:
            if type(current[key]) is not dict:
                print("Given dictionary is not compatible with key structure requested")
                raise ValueError("Dictionary key already occupied")
        current = current[key]
    return my_dict

def to_formatted_json(df, sep="."):
    result = []
    for _, row in df.iterrows():
        parsed_row = {}
        for idx, val in row.iteritems():
            keys = idx.split(sep)
            parsed_row = set_for_keys(parsed_row, keys, val)
        result.append(parsed_row)
    return result

# Where df was parsed from a json dict using json_normalize
to_formatted_json(df, sep=".")
A simpler approach that uses only one function:
def df_to_formatted_json(df, sep="."):
    """
    The opposite of json_normalize
    """
    result = []
    for idx, row in df.iterrows():
        parsed_row = {}
        for col_label, v in row.items():
            keys = col_label.split(sep)
            current = parsed_row
            for i, k in enumerate(keys):
                if i == len(keys) - 1:
                    current[k] = v
                else:
                    if k not in current.keys():
                        current[k] = {}
                    current = current[k]
        # save
        result.append(parsed_row)
    return result
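A quick round-trip sketch of how the function above can be used; the sample record is made up:

import pandas as pd

data = [{"name": "Alice", "address": {"city": "Oslo", "zip": "0150"}}]
df = pd.json_normalize(data)         # columns: name, address.city, address.zip
restored = df_to_formatted_json(df)
print(restored)                      # [{'name': 'Alice', 'address': {'city': 'Oslo', 'zip': '0150'}}]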
df.to_json(path)
or
df.to_dict()
I just implemented this using 2 functions.
1. Get a full list of fields from the DataFrame that are part of a nested field. Only the parent, i.e. if location.city.code fits the criteria, we only care about location.city. Sort it by the deepest level of nesting, i.e. location.city is nested further than location.
2. Starting with the deepest nested parent field, find all child fields by searching in the column names. Create a field in the DataFrame for the parent field, which is a combination of all child fields (renamed so that they lose the nesting structure, e.g. location.city.code becomes code) converted to JSON and then loaded into a dictionary value. Finally, drop all of the child fields.
import json
from typing import List

import pandas as pd

def _get_nested_fields(df: pd.DataFrame) -> List[str]:
    """Return a list of nested fields, sorted by the deepest level of nesting first."""
    nested_fields = [*{field.rsplit(".", 1)[0] for field in df.columns if "." in field}]
    nested_fields.sort(key=lambda record: len(record.split(".")), reverse=True)
    return nested_fields

def df_denormalize(df: pd.DataFrame) -> pd.DataFrame:
    """
    Convert a normalised DataFrame into a nested structure.

    Fields separated by '.' are considered part of a nested structure.
    """
    nested_fields = _get_nested_fields(df)
    for field in nested_fields:
        list_of_children = [column for column in df.columns if field in column]
        rename = {
            field_name: field_name.rsplit(".", 1)[1] for field_name in list_of_children
        }
        renamed_fields = df[list_of_children].rename(columns=rename)
        df[field] = json.loads(renamed_fields.to_json(orient="records"))
        df.drop(list_of_children, axis=1, inplace=True)
    return df
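A usage sketch with made-up data, showing the nested columns being rebuilt level by level (deepest parent first):

df = pd.json_normalize([{"id": 1, "location": {"name": "Oslo", "city": {"code": "OSL"}}}])
print(df_denormalize(df).to_dict(orient="records"))
# roughly: [{'id': 1, 'location': {'name': 'Oslo', 'city': {'code': 'OSL'}}}]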
Let me throw in my two cents: after converting back, you might need to drop empty (NaN) values from your generated JSONs.
Therefore, I checked that val is not NaN. You can't do that with val != np.nan directly; instead you need to check whether val == val, because np.nan is not equal to itself.
My version:
def to_formatted_json(df, sep="."):
    result = []
    for _, row in df.iterrows():
        parsed_row = {}
        for idx, val in row.iteritems():
            if val == val:
                keys = idx.split(sep)
                parsed_row = set_for_keys(parsed_row, keys, val)
        result.append(parsed_row)
    return result
This is a solution which seems to work for me. It is designed to work on a dataframe with one line, but it can easily be looped over large dataframes.
import numpy as np
import pandas as pd

class JsonRecreate():
    def __init__(self, df):
        self.df = df

    def pandas_to_json(self):
        df = self.df
        # determine the number of nesting levels
        number_levels = np.max([len(i.split('.')) for i in df.columns])
        # put all the nesting levels in a list
        levels = []
        for level_idx in np.arange(number_levels):
            levels.append(np.array([i.split('.')[level_idx] if len(i.split('.')) > level_idx else ''
                                    for i in df.columns.tolist()]))
        self.levels = levels
        return self.create_dict(upper_bound=self.levels[0].shape[0])

    def create_dict(self, level_idx=0, lower_bound=0, upper_bound=100):
        '''Create the dictionary starting from a pandas dataframe generated by json_normalize'''
        levels = self.levels
        dict_ = {}
        # current nesting level
        level = levels[level_idx]
        # loop over all the relevant elements of the level (relevant w.r.t. its parent)
        for key in [i for i in np.unique(level[lower_bound: upper_bound]) if i != '']:
            # find where a particular key occurs in the level
            correspondence = np.where(level[lower_bound: upper_bound] == key)[0] + lower_bound
            # check if the value(s) corresponding to the key appear once or multiple times
            if correspondence.shape[0] == 1:
                # if the occurrence is unique, append the value to the dictionary
                dict_[key] = self.df.values[0][correspondence[0]]
            else:
                # otherwise, redefine the relevant bounds and call the function recursively
                lower_bound_, upper_bound_ = correspondence.min(), correspondence.max() + 1
                dict_[key] = self.create_dict(level_idx + 1, lower_bound_, upper_bound_)
        return dict_
I tested it with a simple dataframe such as:
df = pd.DataFrame({'a.b': [1], 'a.c.d': [2], 'a.c.e': [3], 'a.z.h1': [-1], 'a.z.h2': [-2], 'f': [4], 'g.h': [5], 'g.i.l': [6], 'g.i.m': [7], 'g.z.h1': [-3], 'g.z.h2': [-4]})
The key order of the original json is not exactly preserved in the resulting json, but that can easily be handled if needed.
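For reference, a usage sketch with the dataframe above; the resulting dictionary should look roughly like this (key order may differ, since np.unique sorts the keys):

nested = JsonRecreate(df).pandas_to_json()
print(nested)
# {'a': {'b': 1, 'c': {'d': 2, 'e': 3}, 'z': {'h1': -1, 'h2': -2}},
#  'f': 4,
#  'g': {'h': 5, 'i': {'l': 6, 'm': 7}, 'z': {'h1': -3, 'h2': -4}}}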

Counting in sequences

Say I want to count, for each value in sequence xs, how many times it appears in v, and return the counts in a list in the same order, including duplicates. This is the code I have so far, and I'm kind of stuck on what to do. I'm trying to keep it simple, without .count() functions and the like.
def count_each(xs, v):
    count = []
    for i in range(len(xs)):
        if xs(i) == v:
            return count.append(i)
    return count
You could use the list method count().
>>> keys = [10, 20, 30]
>>> search = [10, 20, 50, 20, 40, 20]
>>> print([search.count(key) for key in keys])
[1, 3, 0]
Alternatively, in O(n):
>>> from collections import Counter
>>> c = Counter(search)
>>> print([c[key] for key in keys])
[1, 3, 0]
Below is the sample function to achieve this using collections.defaultdict:
from collections import defaultdict

def count_each(xs, v):
    count = defaultdict(int)
    for item in v:
        if item in xs:
            count[item] += 1
    return [count[item] for item in xs]
Or, using a plain dict:
def count_each(xs, v):
    count = {}
    for item in v:
        if item in xs:
            if item not in count:
                count[item] = 0
            count[item] += 1
    return [count.get(item, 0) for item in xs]
Sample call:
>>> count_each([10,20,30],[10,20,50,20,40,20])
[1, 3, 0]
You can use list.count(element) to count occurrences of element in a list,
so your code can look like this:
def count_each(xs, v):
    result = []
    for element in xs:
        result.append(v.count(element))
    return result

count_each([10, 20, 30], [10, 20, 50, 20, 40, 20])
or you can write it more concisely using a list comprehension - see mrdomoboto's answer.
EDIT: the same without count()
def count_each(xs, v):
    result = []
    for element in xs:
        count = 0
        for x in v:
            if x == element:
                count += 1
        result.append(count)
    return result

count_each([10, 20, 30], [10, 20, 50, 20, 40, 20])
BTW: you can't rely on a plain dict for the ordering here, because (before Python 3.7) dicts don't have to keep insertion order. One time you could get the result [1, 3, 0], another time [3, 1, 0] or [0, 1, 3], etc.

Is there a built-in method in the python csv module to enumerate all possible values for a specific column?

I have a csv file which has many columns. My requirement is to find all possible values that are present for a specific column.
Is there any built-in function in python that helps me get these values?
You can use pandas.
Example file many_cols.csv:
col1,col2,col3
1,10,100
1,20,100
2,10,100
3,30,100
Find unique values per column:
>>> import pandas as pd
>>> df = pd.read_csv('many_cols.csv')
>>> df.col1.drop_duplicates().tolist()
[1, 2, 3]
>>> df['col2'].drop_duplicates().tolist()
[10, 20, 30]
>>> df['col3'].drop_duplicates().tolist()
[100]
For all columns:
import pandas as pd

df = pd.read_csv('many_cols.csv')
for col in df.columns:
    print(col, df[col].drop_duplicates().tolist())
Output:
col1 [1, 2, 3]
col2 [10, 20, 30]
col3 [100]
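If you prefer, a pandas Series also exposes unique(), which returns the distinct values (in order of appearance) as an array, so the same result can be obtained with, for example:

import pandas as pd

df = pd.read_csv('many_cols.csv')
print(df['col1'].unique().tolist())  # [1, 2, 3]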
I would use a set() for this.
Let's say the csv file is this and we want only the unique values from the second column.
foo,1,bar
baz,2,foo
red,3,blue
git,3,foo
Here is the code that would accomplish this. I am simply printing out the unique values to test that it worked.
import csv

def parse_csv_file(rawCSVFile):
    fileLineList = []
    with open(rawCSVFile, newline='') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            fileLineList.append(row)
    return fileLineList

def main():
    uniqueColumnValues = set()
    fileLineList = parse_csv_file('sample.csv')
    for row in fileLineList:
        uniqueColumnValues.add(row[1])  # Selecting 2nd column here.
    print(uniqueColumnValues)

if __name__ == '__main__':
    main()
Overly "clever" approach to figuring out the unique values for every column at once (it assumes all columns are the same size, though it ignores empty lines seamlessly):
# Assumes somefile was opened properly earlier
csvin = filter(None, csv.reader(somefile))
for i, vals in enumerate(map(sorted, map(set, zip(*csvin)))):
    print("Unique values for column", i)
    print(vals)
It uses zip(*csvin) to do a table rotation (converting the normal one row at a time output to one column at a time), then uniquifies each column with set, and (for nice output) sorts it.
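To make the zip(*csvin) rotation concrete, here is a small self-contained sketch that feeds the sample rows from the set() answer above through it via an in-memory file:

import csv
import io

somefile = io.StringIO("foo,1,bar\nbaz,2,foo\nred,3,blue\ngit,3,foo\n")
csvin = filter(None, csv.reader(somefile))
for i, vals in enumerate(map(sorted, map(set, zip(*csvin)))):
    print("Unique values for column", i)
    print(vals)
# column 0: ['baz', 'foo', 'git', 'red']
# column 1: ['1', '2', '3']
# column 2: ['bar', 'blue', 'foo']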