How do I calculate the average value of a particular column in a text file with a Tcl script?
For example, I have a text file containing 3 columns like this:
1 2 3
4 5 6
5 9 7
3 2 8
I want to calculate the average value of column 1 only. How can I do this with a Tcl script?
Create an empty list to store the values
Split each line on whitespace to get the first column value
Divide the sum of the values by the number of values
someFile:
1 2 3
4 5 6
5 9 7
3 2 8
In Python (rather than Tcl), this looks like:
values = []  # an empty list
with open("someFile", "r") as f:
    content = f.readlines()
content = [l.strip() for l in content if l.strip()]  # remove empty lines
for line in content:
    values.append(int(line.split()[0]))  # first column: convert str to int and append
print(sum(values) / float(len(values)))
OUTPUT:
3.25
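The same steps can be wrapped in a reusable function. This is a minimal sketch, not part of the original answer: the name column_average is hypothetical, columns are assumed to be separated by whitespace, column indices are 0-based, and the function accepts any iterable of lines (an open file or a list of strings).

```python
def column_average(lines, column=0):
    """Average of one whitespace-separated column (0-based index)."""
    values = []
    for line in lines:
        fields = line.split()  # no argument: ignores leading, trailing and repeated whitespace
        if len(fields) > column:  # skip blank lines and lines that are too short
            values.append(float(fields[column]))
    if not values:
        raise ValueError("no values found in column %d" % column)
    return sum(values) / len(values)
```

Usage: `with open("someFile") as f: print(column_average(f, 0))` prints 3.25 for the sample file.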
Is there a function to reduce the amount of redundant data from one column to match the number of cells in a second column?
I have logged data from two sensors that sent values at different rates. In 8 hours, I collected 11857 values for the first sensor and 8130 for the second one.
I need to compress the first column by deleting data to match the number of cells on the second column, so I can display synchronized values on a chart.
It is not a matter of cutting 3727 cells from the head or tail of the first column, but of deleting cells proportionally.
I've tried using the MOD function, but it does not give me the right amount of compression; e.g., running =MOD(A1,3), filtering the cells that contain 0 and deleting those rows leaves 7905 rows, which is close to 8130, but the data is still out of sync.
Edit:
I found a method that requires several steps:
Copy the sensors' data into two columns
Get the number of cells for both columns using COUNTA
Compute the ratio of the smaller count to the bigger count
In a new column, create an index for the rows using =INT(ROW()*ratio)
Remove duplicate rows using the index column as the reference with Data > Remove Duplicates
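The steps above can be sketched in Python to check the logic outside the spreadsheet. This is an illustrative sketch, not a spreadsheet function: the name downsample_by_index is hypothetical, and it uses integer arithmetic (row * target_count // n) in place of INT(ROW()*ratio) to avoid floating-point rounding. Because the index runs from 0 up to the smaller count, this procedure can keep one row more than the smaller column.

```python
def downsample_by_index(values, target_count):
    """Mimic INT(ROW()*ratio) followed by Data > Remove Duplicates on that index."""
    n = len(values)
    kept = []
    last_index = None
    for row, value in enumerate(values, start=1):  # spreadsheet rows are 1-based
        index = row * target_count // n  # INT(ROW()*ratio) with ratio = target_count / n
        if index != last_index:  # the index never decreases, so a change means a new value
            kept.append(value)  # keep the first row for each distinct index
            last_index = index
    return kept
```

For example, `downsample_by_index(list(range(1, 12)), 7)` returns `[1, 2, 4, 5, 7, 8, 10, 11]`, i.e. 8 rows for a target of 7.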
It works, but it would be much faster if there were a ready-made function that ran over the provided data columns and copied the values into two new columns.
I tested this solution in LibreOffice Calc. The functions used are basic enough to be found in Excel as well.
Here's a sample with data from 2 sensors, s1 and s2, similar to yours:
Row s1 s2
1 2 3
2 4 6
3 6 9
4 8 12
5 10 15
6 12 18
7 14 21
8 16
9 18
10 20
11 22
What I did was match each s1 sample with the s2 sample at the relatively matching position. Instead of ending up with a number of rows with no s2 values, I padded the non-existent s2 values by repeating the s2 sample taken for that period of time (column s2a):
Row s1 s2 s2a
1 2 3 3
2 4 6 6
3 6 9 6
4 8 12 9
5 10 15 12
6 12 18 12
7 14 21 15
8 16 18
9 18 18
10 20 21
11 22 21
Assuming that s1 is column A and s2 is column B in the spreadsheet, with the data starting in row 1, the formula you want in each cell of the new column is:
=INDIRECT( ADDRESS( CEILING( ROW()* COUNT(B:B)/COUNT(A:A)),2))
Working from the innermost function outward:
COUNT(B:B)/COUNT(A:A) - this is the ratio (7/11, about 0.63, for the sample above). It means that the sample in row n of s1 corresponds to row n x 0.63 of column s2.
CEILING - Spreadsheet rows start at 1, not 0, so the first row referenced HAS to be 1. I experimented with INT(), but with any ratio below 1, row 1 would map to row 0, which we don't want.
ADDRESS - Returns a string with the address of a cell given its row and column numbers (e.g. ADDRESS(3, 2) returns "$B$3"). Here the column argument is 2, i.e. column B.
INDIRECT - Returns the contents of a cell whose address is passed as a string (e.g. INDIRECT("X5") returns whatever value is stored in cell X5).
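To check the formula outside a spreadsheet, the same CEILING-based indexing can be sketched in Python. The function name align_to_longer is hypothetical; like the formula, it assumes both columns start in row 1. Applied to the sample s2 column with 11 s1 rows, it reproduces column s2a.

```python
def align_to_longer(short, long_len):
    """For each row k = 1..long_len, take the short column's value at row
    CEILING(k * len(short) / long_len), as the INDIRECT/ADDRESS formula does."""
    m = len(short)
    aligned = []
    for k in range(1, long_len + 1):
        row = -(-k * m // long_len)  # integer ceiling of k*m/long_len (1-based row)
        aligned.append(short[row - 1])  # convert 1-based spreadsheet row to 0-based index
    return aligned
```

With s2 = [3, 6, 9, 12, 15, 18, 21], `align_to_longer(s2, 11)` gives [3, 6, 6, 9, 12, 12, 15, 18, 18, 21, 21], matching s2a.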
Alex
I'm trying to read different CSV files with different delimiters. I have it working for space delimiters, but it doesn't read the data properly if there is an extra space before or after the values.
The data is as follows:
Space Delimited Data
1 2 3
4 2 4
3 4 5
3 5 6
3 4 5
5 6 8
The last 3 rows (3 5 6, 3 4 5, 5 6 8) are not read properly because of the extra spaces before or after them. How can this be resolved?
My code is as follows:
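import csv

def read(self, csv_file):
    if self.delimiter == '':
        with open(csv_file, newline='') as csvfile:
            try:
                # delimiters is a string of candidate characters: ' ' for space
                # (the literal 'space' would add the letters s, p, a, c and e)
                dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=' ,;-|\t\\')
                csvfile.seek(0)
                f = csv.reader(csvfile, dialect)
                for row in f:
                    self.raw_data.append(row)
            except csv.Error:
                # the sniffer could not determine the delimiter
                raise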
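One way to tolerate extra spaces is to avoid treating the space as a one-character delimiter at all. The sketch below is an assumption-laden alternative, not the asker's class method: the name read_rows and the convention that delimiter=None means "any run of whitespace" are hypothetical. Whitespace-separated text is split with str.split(), which ignores leading, trailing and repeated spaces; for other delimiters, csv.reader is used and each field is stripped.

```python
import csv
import io

def read_rows(text, delimiter=None):
    """Parse delimited text, tolerating extra spaces around values."""
    rows = []
    if delimiter is None:
        for line in text.splitlines():
            fields = line.split()  # no argument: collapses and trims all whitespace
            if fields:  # skip blank lines
                rows.append(fields)
    else:
        for row in csv.reader(io.StringIO(text), delimiter=delimiter):
            fields = [field.strip() for field in row]  # drop spaces around each value
            if any(fields):  # skip blank lines
                rows.append(fields)
    return rows
```

For example, `read_rows(" 3 5 6\n3 4 5 \n")` returns `[['3', '5', '6'], ['3', '4', '5']]`.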
It often happens that data will be given to you with wrapped columns. Consider, for example:
CCY Decimals CCY Decimals CCY Decimals
AUD/CAD 5 EUR/CZK 4 GBP/NOK 5
AUD/CHF 5 EUR/DKK 5 GBP/NZD 5
AUD/DKK 5 EUR/GBP 5 GBP/PLN 5
AUD/JPY 3 EUR/HKD 5 GBP/SEK 5
AUD/NOK 5 EUR/HUF 3 GBP/SGD 5
...
Which should be parsed as a dataframe of two columns (CCY and Decimals), not six. My question is, what is the most idiomatic way of achieving this?
I would like something like the following:
data = pd.read_csv("file.csv")
data.groupby(axis=1,by=data.columns.map(lambda s: s.replace("\..",""))).\
apply(lambda df : df.values.flatten())
When reading the CSV file, we end up with columns CCY, Decimals, CCY.1, Decimals.1, etc. The groupby operation returns a collection of data frames:
<pandas.core.groupby.DataFrameGroupBy object at 0x3a52b10>
We would then flatten these using NumPy functionality. So we would be converting DataFrames with repeating columns into Series, and then merging these into a resulting DataFrame.
However, this doesn't work. I've tried passing different key arguments to groupby, but it always complains about being unable to reindex non-unique columns.
There are a number of existing questions that deal with flattening groups of columns (e.g. "Flattening" output of group.nth in Pandas), but I can't find any that do this for repeating columns.
To use groupby, I'd do:
>>> groups = df.groupby(axis=1,by=lambda x: x.rsplit(".",1)[0])
>>> pd.DataFrame({k: v.values.flat for k,v in groups})
CCY Decimals
0 AUD/CAD 5
1 EUR/CZK 4
2 GBP/NOK 5
3 AUD/CHF 5
4 EUR/DKK 5
5 GBP/NZD 5
6 AUD/DKK 5
7 EUR/GBP 5
8 GBP/PLN 5
9 AUD/JPY 3
10 EUR/HKD 5
11 GBP/SEK 5
12 AUD/NOK 5
13 EUR/HUF 3
14 GBP/SGD 5
[15 rows x 2 columns]
and then sort.
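pandas 2.1 and later deprecate groupby(axis=1), so here is a sketch of the same flattening that avoids it. The name unwrap_columns is hypothetical, and, like the answer's lambda, it assumes the only dots in column names are the ".N" suffixes added by read_csv. Each group of repeated columns is flattened row by row, giving the same order as the output above.

```python
import pandas as pd

def unwrap_columns(df):
    """Stack repeated columns (CCY, CCY.1, ...) into one column per base name."""
    base = [c.rsplit(".", 1)[0] for c in df.columns]  # strip the ".N" suffix
    names = list(dict.fromkeys(base))  # unique base names, in original order
    out = {}
    for name in names:
        cols = [c for c, b in zip(df.columns, base) if b == name]
        # row-major flatten: row 0 of every copy, then row 1, and so on
        out[name] = df[cols].to_numpy().ravel()
    return pd.DataFrame(out)
```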
I am working on NGSim traffic data, which has 18 columns and 1,180,598 rows in a text file. I want to smooth the position data in the 'Local Y' column. I know R has built-in functions for data smoothing, but none of them seems to match the formula I am required to apply. The data in the text file looks something like this:
Index VehicleID Total_Frames Local Y
1 2 5 35.381
2 2 5 39.381
3 2 5 43.381
4 2 5 47.38
5 2 5 51.381
6 4 8 504.828
7 4 8 508.325
8 4 8 512.841
9 4 8 516.338
10 4 8 520.854
11 4 8 524.592
12 4 8 528.682
13 4 8 532.901
14 5 7 39.154
15 5 7 43.153
16 5 7 47.154
17 5 7 51.154
18 5 7 55.153
19 5 7 59.154
20 5 7 63.154
The columns above are just an excerpt from the original file. It shows 3 vehicles, with vehicle IDs 2, 4 and 5, but the full file has 2169 vehicles with different IDs. The Total_Frames column gives the number of rows for each vehicle ID; for example, vehicle ID 2 appears 5 times in the table above, hence 5 in Total_Frames. The following is the formula I am required to apply to remove noise (smoothing) from the 'Local Y' column:
Smoothed Position Value:
$$\hat{Y}_i = \frac{\sum_{k=i-D}^{i+D} Y_k \, e^{-|i-k|/\delta}}{\sum_{k=i-D}^{i+D} e^{-|i-k|/\delta}}$$
where Y_k is the Local Y value in row k, and
i = index
delta = 5
D = 15
I have tried the built-in functions I know of, but they don't smooth the data as required. My question is: is there a built-in function in R that smooths data according to this formula, or that can take the formula as an argument? I need to apply the formula to every Local Y value that has 15 values before and 15 values after it (i-D to i+D) for the same vehicle ID. Can anyone give me an idea of how to approach this? Thanks in advance.
You can place your formula in a function and then use one of R's apply-family functions (e.g. sapply) to apply it to the elements of the "Local Y" column, restricting each window to rows with the same vehicle ID.
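A function implementing the formula might look like the sketch below, written in Python rather than R to show the logic. Assumptions: the name smooth_positions is hypothetical, rows are sorted by vehicle and by time within each vehicle, and values without D neighbours on both sides within the same vehicle are returned unchanged, since the question only asks to smooth those that have them.

```python
import math
from itertools import groupby

def smooth_positions(vehicle_ids, local_y, delta=5.0, D=15):
    """Exponential-kernel smoothing of local_y, computed separately per vehicle."""
    y = list(local_y)
    weights = [math.exp(-abs(j) / delta) for j in range(-D, D + 1)]  # weight for offset j = k - i
    total_w = sum(weights)  # denominator of the formula (same for every full window)
    out = list(y)  # values without a full window stay unchanged
    start = 0
    for _, group in groupby(vehicle_ids):  # consecutive rows with the same ID
        n = len(list(group))
        # i needs D rows before and after it inside this vehicle's rows
        for i in range(start + D, start + n - D):
            window = y[i - D:i + D + 1]
            out[i] = sum(w * v for w, v in zip(weights, window)) / total_w
        start += n
    return out
```

Because the weights are symmetric, a perfectly linear trajectory passes through unchanged, while noise is averaged out.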
Is there a way for an Octave matrix to hold strings and numbers together?
I want to have a matrix of the following type:
A=["A","B","C","D";1,2,3,4;2,3,4,5;3,4,5,6;4,5,6,7];
So that the matrix will look like:
A B C D
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
But when I try this I get:
ABCD
(followed by four lines that appear empty)
And if I try to use strings longer than one character, I get a dimension mismatch error.
Is there a way to create a "mixed" octave matrix?
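It sounds like you may be looking for a cell array. A regular Octave matrix is either numeric or character: when you concatenate strings and numbers, the numbers are converted to the characters with those codes (1 to 7 are non-printing control characters, which is why the lines look empty). A cell array can hold both, e.g. A = {"A", "B", "C", "D"; 1, 2, 3, 4; 2, 3, 4, 5};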