
I have an iterable list of over 100 elements. I want to do something after every 10th element, but I don't want to use a counter variable. I'm looking for a solution that does not need one.
Currently I do it like this:
count = 0
for i in range(0, len(mylist)):
    if count == 10:
        count = 0
        # do something
        print(i)
    count += 1
Is there some way in which I can omit the counter variable?

for count, element in enumerate(mylist, 1):  # Start counting from 1
    if count % 10 == 0:
        # do something
        pass
Use enumerate; it's built for this.
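A minimal runnable sketch of the enumerate approach, assuming the "do something" step just collects the element. The function name every_nth is hypothetical:

```python
def every_nth(items, n=10):
    """Return every n-th element of items (the n-th, 2n-th, ...) without a manual counter."""
    picked = []
    # enumerate(..., 1) numbers the elements 1, 2, 3, ..., so count % n == 0
    # is true exactly on the n-th, 2n-th, ... element.
    for count, element in enumerate(items, 1):
        if count % n == 0:
            picked.append(element)  # "do something" goes here
    return picked

print(every_nth(range(1, 101)))  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```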

Just to show another option (hopefully I understood your question correctly): slicing gives you exactly the elements of the list that you want, without having to loop through every element or keep any enumerations or counters. See Explain Python's slice notation.
If you want to start on the 1st element and get every 10th element from that point:
# 1st element, 11th element, 21st element, etc. (index 0, index 10, index 20, etc.)
for e in myList[::10]:
    pass  # <do something>
If you want to start on the 10th element and get every 10th element from that point:
# 10th element, 20th element, 30th element, etc. (index 9, index 19, index 29, etc.)
for e in myList[9::10]:
    pass  # <do something>
Example of the 2nd option (Python 2):
myList = range(1, 101) # list(range(1, 101)) for Python 3 if you need a list
for e in myList[9::10]:
    print e  # print(e) for Python 3
Prints:
10
20
30
...etc...
100
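Slicing only works on sequences such as lists. If mylist is really an arbitrary iterable (a generator, a file, ...), itertools.islice accepts the same start/stop/step pattern and consumes it lazily. A sketch, with a hypothetical numbers() generator standing in for the data:

```python
from itertools import islice

def numbers():
    # A generator: it cannot be sliced with [9::10], but islice works on it.
    for x in range(1, 101):
        yield x

# start=9 (the 10th element), stop=None (run to the end), step=10
for e in islice(numbers(), 9, None, 10):
    print(e)  # prints 10, 20, ..., 100, one per line
```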

for i in range(0, len(mylist)):
    if (i + 1) % 10 == 0:
        # do something
        print(i)

A different way to approach the problem is to split the iterable into your chunks before you start processing them.
The grouper recipe does exactly this:
from itertools import izip_longest  # needed for grouper
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
You would use it like this:
>>> i = [1,2,3,4,5,6,7,8]
>>> by_twos = list(grouper(i, 2))
>>> by_twos
[(1, 2), (3, 4), (5, 6), (7, 8)]
Now, simply loop over the by_twos list.
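In Python 3, izip_longest was renamed itertools.zip_longest. A sketch of the same grouper recipe under that name, applied to the original question: a hypothetical 100-element list processed in chunks of 10, doing something after each chunk.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks: grouper('ABCDEFG', 3, 'x') -> ABC DEF Gxx."""
    # n references to the *same* iterator, so zip_longest pulls n consecutive items per tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

mylist = list(range(1, 101))  # hypothetical data standing in for the question's list
for chunk in grouper(mylist, 10):
    # "do something" after every 10 elements; here, print the last one in the chunk
    print(chunk[-1])  # prints 10, 20, ..., 100
```

If the length is not a multiple of 10, the last chunk is padded with fillvalue (None by default), so check for it before using chunk[-1].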

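You can use range with a step of 10 to iterate over the indices of mylist in multiples of 10:
for i in range(0, len(mylist), 10):
    # do something with mylist[i]
    pass
This visits indices 0, 10, 20, ...; start the range at 9 instead to land on the 10th, 20th, ... element.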

Related

Elixir: How to get bit_size of an Integer variable?

I need to get the number of bits used by an Integer variable, like this:
bit_number = 1
bit_number = bit_number <<< 2
bit_size(bit_number) # must return 3 here
The bit_size/1 function is for bitstrings, not for integers, but in the exercise we need the bit size of an integer.
I'm doing a compression exercise from a book (Classic Computer Science Problems in Python, by David Kopec) and I'm trying to do it in Elixir for study.
This works:
(iex) import Bitwise
(iex) Integer.digits(1 <<< 1, 2) |> length
2
but I'm sure there are better solutions.
(as @Hauleth mentions, the answer here should be 2, not 3)
You can count how many times you can divide it by two:
defmodule Example do
  def bits_required(0), do: 1
  def bits_required(int), do: bits_required(int, 1)
  defp bits_required(1, acc), do: acc
  defp bits_required(int, acc), do: bits_required(div(int, 2), acc + 1)
end
Output:
iex> Example.bits_required(4)
3
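Since the exercise comes from a Python book, here is the same divide-by-two count written in Python for comparison. It is a sketch that reuses the name bits_required from the Elixir module. Python's built-in int.bit_length() gives the same result for positive integers but returns 0 for zero, which is why the function special-cases 0 as needing 1 bit:

```python
def bits_required(n: int) -> int:
    """Number of bits needed to write the non-negative integer n in binary (0 needs 1 bit)."""
    if n == 0:
        return 1
    count = 1
    # Halve n until only the leading 1 bit is left, counting each halving.
    while n > 1:
        n //= 2
        count += 1
    return count

print(bits_required(1 << 2))  # 4 == 0b100, prints 3
print((1 << 2).bit_length())  # built-in equivalent, also prints 3
```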

Calculating the average of a column in csv per hour

I have a csv file that contains data in the following format.
Layer relative_time Ht BSs Vge Temp Message
57986 2:52:46 0.00m 87 15.4 None CMSG
20729 0:23:02 45.06m 82 11.6 None BMSG
20729 0:44:17 45.06m 81 11.6 None AMSG
I want to read this CSV file and calculate the average BSs for every hour. The file is fairly large, about 2000 rows. However, the values are not evenly distributed across the hours. For example,
I have 237 samples from hour 3 and only 4 samples from hour 6. I should also mention that the BSs values can be collected from multiple sources. The value always ranges from 20 to 100. Because of this, the result is skewed. For each hour, I calculate the sum of BSs for that hour divided by the number of samples in that hour.
The primary purpose is to understand how BSs evolves over time.
But what is the common approach to this problem? Is this where people apply normalization? It would be great if someone could explain how to apply normalization in such a situation.
The code I am using for my processing is shown below. I believe the code below is correct.
import csv
import datetime
import pprint
# 2x24 matrix: row 0 holds the hour, row 1 the number of values recorded in that hour
hours_no_values = [[0 for i in range(24)] for j in range(2)]
# 2x24 matrix: row 0 holds the hour, row 1 the BSs sum (later the mean) for that hour
mean_bss_stats = [[0 for i in range(24)] for j in range(2)]
with open(PREFINAL_OUTPUT_FILE) as fin, open(FINAL_OUTPUT_FILE, "w", newline='') as f:
    reader = csv.reader(fin, delimiter=",")
    writer = csv.writer(f)
    header = next(reader)  # <--- Pop header out
    writer.writerow([header[0], header[1], header[2], header[3], header[4], header[5], header[6]])  # <--- Write header
    sortedlist = sorted(reader, key=lambda row: datetime.datetime.strptime(row[1], "%H:%M:%S"), reverse=True)
    print(sortedlist)
    for item in sortedlist:
        rel_time = datetime.datetime.strptime(item[1], "%H:%M:%S")
        if rel_time.hour not in hours_no_values[0]:
            print('item[6] {}'.format(item[6]))
            if 'MAN' in item[6]:
                print('Hour found {}'.format(rel_time.hour))
                hours_no_values[0][rel_time.hour] = rel_time.hour
                mean_bss_stats[0][rel_time.hour] = rel_time.hour
                mean_bss_stats[1][rel_time.hour] += int(item[3])
                hours_no_values[1][rel_time.hour] += 1
            else:
                pass
        else:
            if 'MAN' in item[6]:
                print('Hour Previous {}'.format(rel_time.hour))
                mean_bss_stats[1][rel_time.hour] += int(item[3])
                hours_no_values[1][rel_time.hour] += 1
            else:
                pass
# Turn each hour's BSs sum into a mean
for i in range(0, 24):
    if hours_no_values[1][i] != 0:
        mean_bss_stats[1][i] = mean_bss_stats[1][i] / hours_no_values[1][i]
    else:
        mean_bss_stats[1][i] = 0
pprint.pprint('mean bss stats {} \n hour_no_values {} \n'.format(mean_bss_stats, hours_no_values))
The number of values per hour, for hours 0 through 23, is:
[31, 117, 85, 237, 3, 67, 11, 4, 57, 0, 5, 21, 2, 5, 10, 8, 29, 7, 14, 3, 1, 1, 0, 0]
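For comparison, a shorter sketch of the same per-hour mean using only the standard library. It assumes a comma-separated file with the header shown above (Layer, relative_time, Ht, BSs, ...); the three sample rows are the ones from the question. The function also keeps the sample count per hour, which makes the uneven sampling visible next to each mean. The question's extra filter on 'MAN' in the Message column is left out here and could be added as an if inside the loop. hourly_bss_means is a hypothetical name:

```python
import csv
import io
from collections import defaultdict

def hourly_bss_means(f):
    """Return {hour: (mean BSs, number of samples)} from an open CSV file object."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(f):
        # relative_time looks like "2:52:46"; the hour is the part before the first colon.
        hour = int(row["relative_time"].split(":")[0])
        totals[hour] += int(row["BSs"])
        counts[hour] += 1
    return {h: (totals[h] / counts[h], counts[h]) for h in sorted(counts)}

sample = io.StringIO(
    "Layer,relative_time,Ht,BSs,Vge,Temp,Message\n"
    "57986,2:52:46,0.00m,87,15.4,None,CMSG\n"
    "20729,0:23:02,45.06m,82,11.6,None,BMSG\n"
    "20729,0:44:17,45.06m,81,11.6,None,AMSG\n"
)
print(hourly_bss_means(sample))  # {0: (81.5, 2), 2: (87.0, 1)}
```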
You could do it with pandas, using groupby and aggregating the appropriate column:
import pandas as pd
import numpy as np
df = pd.read_csv("your_file")
df.groupby('hour')['BSs'].aggregate(np.mean)
If the initial DataFrame doesn't have an hour column, you can add it:
df['hour'] = your_hour_data
numpy.mean - calculates the mean of the array.
Compute the arithmetic mean along the specified axis.
pandas.groupby
Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
From pandas docs:
By “group by” we are referring to a process involving one or more of the following steps
Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure
Aggregation: computing a summary statistic (or statistics) about each group.
Some examples:
Compute group sums or means
Compute group sizes / counts
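The split-apply-combine steps listed above can also be mirrored without pandas. A small sketch with itertools.groupby, assuming the data has already been reduced to (hour, BSs) pairs; the pairs here are the three sample rows from the question:

```python
from itertools import groupby
from operator import itemgetter

pairs = [(2, 87), (0, 82), (0, 81)]  # (hour, BSs) from the question's sample rows

# Split: groupby only groups *adjacent* items, so sort by hour first.
pairs.sort(key=itemgetter(0))
summary = {}
for hour, group in groupby(pairs, key=itemgetter(0)):
    values = [bss for _, bss in group]
    # Apply: compute the per-group statistics (mean and size).
    # Combine: collect the results into one dict keyed by hour.
    summary[hour] = {"mean": sum(values) / len(values), "count": len(values)}

print(summary)
```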

Egg dropping in worst case

I have been trying to write an algorithm to compute the number of trials required in the worst case in the egg dropping problem. Here is my Python code:
def eggDrop(n,k):
    eggFloor=[ [0 for i in range(k+1) ] ]* (n+1)
    for i in range(1, n+1):
        eggFloor[i][1] = 1
        eggFloor[i][0] = 0
    for j in range(1, k+1):
        eggFloor[1][j] = j
    for i in range (2, n+1):
        for j in range (2, k+1):
            eggFloor[i][j] = 'infinity'
            for x in range (1, j + 1):
                res = 1 + max(eggFloor[i-1][x-1], eggFloor[i][j-x])
                if res < eggFloor[i][j]:
                    eggFloor[i][j] = res
    return eggFloor[n][k]
print eggDrop(2, 100)
The code outputs 7 for 2 eggs and 100 floors, but the answer should be 14. I don't know what mistake I have made in the code. What is the problem?
The problem is in this line:
eggFloor=[ [0 for i in range(k+1) ] ]* (n+1)
You want this to create a list containing (n+1) lists of (k+1) zeroes. What the * (n+1) does is slightly different - it creates a list containing (n+1) references to the same list.
This is an important distinction - because when you start modifying entries in the list - say,
eggFloor[i][1] = 1
this actually changes element [1] of all of the lists, not just the ith one.
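A quick way to see this, using a small 3x3 table instead of the egg-drop table:

```python
table = [[0] * 3] * 3        # three references to ONE inner list
table[1][1] = 1              # meant to change only row 1
print(table)                 # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(table[0] is table[2])  # True: every row is the same object
```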
To instead create separate lists that can be modified independently, you want something like:
eggFloor=[ [0 for i in range(k+1) ] for j in range(n+1) ]
With this modification, the program returns 14 as expected.
(To debug this, it might have been a good idea to write a function to print out the eggFloor array and display it at various points in your program, so you can compare it with what you were expecting. It would soon become pretty clear what was going on!)
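Putting the one-line fix together, here is a Python 3 sketch of the whole function. It is renamed egg_drop here, and float('inf') stands in for the string 'infinity', since Python 3 cannot compare an int with a str:

```python
def egg_drop(n, k):
    """Minimum worst-case number of trials with n eggs and k floors."""
    # One independent row per egg count (no shared-list aliasing).
    egg_floor = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        egg_floor[i][1] = 1  # one floor: one trial
        egg_floor[i][0] = 0  # zero floors: zero trials
    for j in range(1, k + 1):
        egg_floor[1][j] = j  # one egg: try every floor from the bottom up
    for i in range(2, n + 1):
        for j in range(2, k + 1):
            egg_floor[i][j] = float("inf")
            for x in range(1, j + 1):
                # Drop from floor x: if the egg breaks, i-1 eggs remain for the x-1 floors below;
                # if it survives, i eggs remain for the j-x floors above. Take the worse case.
                res = 1 + max(egg_floor[i - 1][x - 1], egg_floor[i][j - x])
                egg_floor[i][j] = min(egg_floor[i][j], res)
    return egg_floor[n][k]

print(egg_drop(2, 100))  # 14
```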

counting non-empty lines and sum of lengths of those lines in python

I am trying to create a function that takes a filename and returns a 2-tuple with the number of non-empty lines in that file and the sum of the lengths of all those lines. Here is my current program:
import re
def code_metric(file):
    with open(file, 'r') as f:
        lines = len(list(filter(lambda x: x.strip(), f)))
        num_chars = sum(map(lambda l: len(re.sub(r'\s', '', l)), f))
    return(lines, num_chars)
The result I get if I run:
if __name__ == "__main__":
    print(code_metric('cmtest.py'))
is
(3, 0)
when it should be:
(3,85)
Also, is there a better way of finding the sum of the lengths of the lines using the functionals map, filter, and reduce? I did it for the first part but couldn't figure out the second half. I'm fairly new to Python, so any help would be great.
Here is the test file called cmtest.py:
import prompt,math
x = prompt.for_int('Enter x')
print(x,'!=',math.factorial(x),sep='')
First line has 18 characters (including white space)
Second line has 29 characters
Third line has 38 characters
[(1, 18), (1, 29), (1, 38)]
The total is 85 characters, including whitespace. I apologize, I misread the problem: the length total for each line should include the whitespace as well.
A fairly simple approach is to build a generator to strip trailing whitespace, then enumerate over that (with a start value of 1) filtering out blank lines, and summing the length of each line in turn, eg:
def code_metric(filename):
    line_count = char_count = 0
    with open(filename) as fin:
        stripped = (line.rstrip() for line in fin)
        for line_count, line in enumerate(filter(None, stripped), 1):
            char_count += len(line)
    return line_count, char_count
print(code_metric('cmtest.py'))
# (3, 85)
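On the map/filter/reduce part of the question: the original version returns 0 for the second value because the first filter(...) already consumed the file iterator f, so the second map(...) sees no lines. Here is a sketch that reads the lines once and uses only map, filter and functools.reduce. The helper name metric_from_lines is hypothetical, split out so it can be tried on a plain list:

```python
from functools import reduce

def metric_from_lines(lines):
    """Return (number of non-empty lines, total length of those lines without trailing whitespace)."""
    # Strip trailing whitespace (including the newline) from every line.
    stripped = map(str.rstrip, lines)
    # filter(None, ...) drops the empty strings left by blank lines.
    non_empty = list(filter(None, stripped))
    # Fold the line lengths into one total with reduce, starting from 0.
    total = reduce(lambda acc, line: acc + len(line), non_empty, 0)
    return len(non_empty), total

def code_metric(filename):
    with open(filename) as f:
        return metric_from_lines(f)
```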
In order to count lines, maybe this code is cleaner (note that it counts every line, including blank ones):
with open(file) as f:
    lines = len(f.readlines())
For the second part of your program, if you intend to count only non-whitespace characters, you can strip all whitespace (spaces, '\t' and '\n') from the whole file at once:
import re
with open(file) as f:
    num_chars = len(re.sub(r'\s', '', f.read()))
Some people have advised you to do both things in one loop. That is fine, but if you keep them separate, you can turn them into different functions and reuse them more easily. Unless you are handling huge files (or executing this code millions of times), it shouldn't matter in terms of performance.

find function matlab in numpy/scipy

Is there an equivalent of MATLAB's find(A>9,1) in NumPy/SciPy? I know that NumPy has the nonzero function, but what I need is the first index, so that I can use it in another extracted column.
Ex: A = [ 1 2 3 9 6 4 3 10 ]
find(A>=9,1) would return index 4 in MATLAB (with the strict A>9, the first match is the 10, at index 8).
The equivalent of find in NumPy is nonzero, but it does not support a second parameter.
You can still get the behavior you are looking for like this:
import numpy as np
A = np.array([1, 2, 3, 9, 6, 4, 3, 10])
B = np.nonzero(A >= 9)[0][0]  # first matching index (0-based), here 3
But if all you are looking for is the first element that satisfies a condition, you are better off using max.
For example, in MATLAB, find(A >= 9, 1) gives the same index as [~, idx] = max(A >= 9), as long as some element matches. The equivalent in NumPy is the following.
idx = (A >= 9).argmax()
MATLAB's find(X, K) is roughly equivalent to numpy.nonzero(X)[0][:K] in Python. @Pavan's argmax method is probably a good option if K == 1, but unless you know a priori that there will be a value in A >= 9, you will probably need to do something like:
idx = (A >= 9).argmax()
if (idx == 0) and (A[0] < 9):
    # No value in A is >= 9
    ...
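Here is a sketch of a small helper that follows the find(X, K) ≈ nonzero(X)[0][:K] equivalence above and returns an empty array instead of a misleading 0 when nothing matches. The name find_k is hypothetical. np.flatnonzero also accepts multi-dimensional masks, but its flattened indices are 0-based and row-major, unlike MATLAB's 1-based, column-major linear indices:

```python
import numpy as np

def find_k(mask, k=None):
    """Return up to k 0-based (flattened) indices where mask is True; all of them if k is None."""
    idx = np.flatnonzero(mask)
    return idx if k is None else idx[:k]

A = np.array([1, 2, 3, 9, 6, 4, 3, 10])
first = find_k(A >= 9, 1)
print(first)                   # [3]
print(find_k(A > 20, 1).size)  # 0: no match, where argmax would return 0
```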
I'm sure these are all great answers but I wasn't able to make use of them. However, I found another thread that partially answers this:
MATLAB-style find() function in Python
John posted the following code, which accounts for the first argument of find (in your case A>9, as in find(A>9,1)) but not the second argument.
I altered John's code which I believe accounts for the second argument ",1"
def indices(a, func):
    return [i for (i, val) in enumerate(a) if func(val)]
a = [1,2,3,9,6,4,3,10]
threshold = indices(a, lambda y: y >= 9)[0]
This returns threshold=3. Python indices start at 0, so this is the equivalent of MATLAB returning 4. You can pick a later match by changing the number in the brackets, i.e. [1], [2], etc. instead of [0].
John's original code:
def indices(a, func):
    return [i for (i, val) in enumerate(a) if func(val)]
a = [1, 2, 3, 1, 2, 3, 1, 2, 3]
inds = indices(a, lambda x: x > 2)
which returns:
>>> inds
[2, 5, 8]
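If only the first match is needed, a generator passed to next() stops at the first match instead of building the whole list, and next()'s default argument covers the no-match case. A sketch; first_index is a hypothetical name:

```python
def first_index(a, func, default=None):
    """0-based index of the first element of a for which func is true, else default."""
    return next((i for i, val in enumerate(a) if func(val)), default)

a = [1, 2, 3, 9, 6, 4, 3, 10]
print(first_index(a, lambda y: y >= 9))  # 3, i.e. MATLAB's 4
print(first_index(a, lambda y: y > 20))  # None
```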
Consider using argwhere in Python to replace MATLAB's find function. For example,
import numpy as np
A = [1, 2, 3, 9, 6, 4, 3, 10]
np.argwhere(np.asarray(A)>=9)[0][0] # Return first index
returns 3.
import numpy
A = numpy.array([1, 2, 3, 9, 6, 4, 3, 10])
index = numpy.where(A >= 9)[0][0]  # where returns a tuple of index arrays; take the first index
You can do this by first converting the list to an ndarray, then using numpy.where() to get the desired index.