dlmread range with unlimited rows - octave

I'm reading a range of values from a CSV file:
M = dlmread(filename,delimiter,[R1 C1 R2 C2])
Is there any way to specify the R2 parameter as default or unknown? I want to read all the rows, but only a limited number of columns.
R1=0,
C1=0,
R2=Unknown,
C2=15
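A possible workaround (a sketch, not a documented "unlimited rows" option of dlmread): call dlmread without a range so it reads every row, then keep only the columns you need:
M = dlmread(filename, delimiter);  % no range given: reads all rows and all columns
M = M(:, 1:16);                    % keep columns C1=0 .. C2=15 (1-based columns 1..16)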


Count or read only actual entries

How can I count or read only the actual entries of a column, as distinct from non-zero entries?
In other words, if I have the file:
4000,1,5221,0
4001,0,5222,1
4002,3,,,
column 4 has 2 actual entries, of which one vanishes (is zero). I can count entries like so:
R = csvread("bugtest.csv");
for i = 1:4
  VanishingColEntries(i) = numel (find (R(:,i) == 0));
  NonVanishingColEntries(i) = nnz (R(:,i));
endfor
VanishingColEntries
NonVanishingColEntries
yielding:
octave:2> nument
VanishingColEntries =
0 1 1 2
NonVanishingColEntries =
3 2 2 1
But I don't know how to extract the number of "actual" entries, that is, the sum of non-zero and explicitly zero entries!
csvread is only for numeric data. If csvread encounters an entry which is not strictly numeric, it checks if the string starts with a number and uses that as the result (e.g. 1direction, 2pac, 7up will result in 1, 2, 7). 'Empty' entries here are effectively treated as an empty string, which is parsed as the number 0. However, there are some special strings, like nan and inf, which are parsed specially.
If you can / are happy to preprocess your csv file, then you can replace all empty entries with the string nan (without quotes). csvread will then treat this string specially and replace it with an actual nan value in the resulting numerical matrix. You can then use this with isnan to count the number of nan / non-nan entries as follows:
R = csvread( 'bugtest.csv' );
% Count nan / non-nan entries along rows
VanishingColEntries = sum( isnan( R ), 1 )
NonVanishingColEntries = sum( ~isnan( R ), 1 )
If you do not have the luxury of preprocessing your csv file (or you simply want to process it programmatically throughout, without the need for human intervention), then you can use the csv2cell function from the io package instead, and process the resulting cell to get what you want, e.g.
pkg load io
C = csv2cell( 'bugtest.csv' )
% Convert cells with empty strings to nan
for i = 1 : numel(C), if ischar(C{i}), C{i} = nan; endif, endfor
% Convert numeric cell array (nan is a valid number) to a matrix
R = cell2mat( C );
You can then use isnan as before to get your result.

How to Find First Valid Row in SQL Based on Difference of Column Values

I am trying to find a reliable query which returns the first instance of an acceptable insert range.
Research:
Some of the links below address similar questions, but I could get none of them to work for me.
Find first available date, given a date range in SQL
Find closest date in SQL Server
MySQL difference between two rows of a SELECT Statement
How to find a gap in range in SQL
and more...
Objective Query Function:
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
Where InsertRange(1) is the value the query should return. In other words, this would be the first instance where the above condition is satisfied.
Table Structure:
Primary Key: StartRange
StartRange(i-1) < StartRange(i)
StartRange(i-1) + EndRange(i-1) < StartRange(i)
Example Dataset
Below is an example User table (3 columns) with a set range distribution. StartRanges are always ordered in a strictly ascending way, UserID values are arbitrary strings, and only the sequences of StartRange and EndRange matter:
StartRange EndRange UserID
312 6896 user0
7134 16268 user1
16877 22451 user2
23137 25142 user3
25955 28272 user4
28313 35172 user5
35593 38007 user6
38319 38495 user7
38565 45200 user8
46136 48007 user9
My current Query
I am trying to use this query at the moment:
SELECT t2.StartRange, t2.EndRange
FROM user AS t1, user AS t2
WHERE (t1.StartRange - t2.StartRange+1) > NewValue
ORDER BY t1.EndRange
LIMIT 1
Example Case
Given the table, if NewValue = 800, then the returned answer should be 23137. This means the first available slot would be between user3 and user4 (with an actual slot size = 813):
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
InsertRange = (StartRange(6) - EndRange(5)) > NewValue
25955 - 25142 = 813 > 800, so the returned value is 23137
More Comments
My query above seemed to be working for the special case where StartRanges were tightly packed (i.e. StartRange(i) = StartRange(i-1) + EndRange(i-1) + 1). It no longer works with a less tightly packed set of StartRanges.
Keep in mind that SQL tables have no implicit row order. It seems fair to order your table by StartRange value, though.
We can start to solve this by writing a query to obtain each row paired with the row preceding it. In MySQL, it's hard to do this beautifully because it lacks the row numbering function.
This works (http://sqlfiddle.com/#!9/4437c0/7/0). It may have nasty performance because it generates O(n^2) intermediate rows. There's no row for user0; it can't be paired with any preceding row because there is none.
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
Then, you can use that as a subquery, and apply your conditions, which are
gap >= 800
first matching row (lowest StartRange value) ORDER BY SB
just one LIMIT 1
Here's the query (http://sqlfiddle.com/#!9/4437c0/11/0)
SELECT SB-EA Gap,
EA+1 Beginning_of_gap, SB-1 Ending_of_gap,
UserId UserID_after_gap
FROM (
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
) pairs
WHERE SB-EA >= 800
ORDER BY SB
LIMIT 1
Notice that you may actually want the smallest matching gap instead of the first matching gap. That's called best fit, rather than first fit. To get that you use ORDER BY SB-EA instead.
Edit: There is another way to use MySQL to join adjacent rows that doesn't have the O(n^2) performance issue. It involves employing user variables to simulate a row_number() function. The query involved is a hairball (that's a technical term). It's described in the third alternative of the answer to this question: How do I pair rows together in MySQL?
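A hedged sketch of that user-variable approach against the same user table (untested; @a and @b are just illustrative variable names): each derived table numbers the rows by StartRange, and the join pairs row n with row n-1, avoiding the O(n^2) blow-up.
select cur.StartRange - prev.EndRange AS Gap,
       prev.EndRange + 1 AS Beginning_of_gap,
       cur.StartRange - 1 AS Ending_of_gap,
       cur.UserID AS UserID_after_gap
from (select u.*, (@a := @a + 1) AS rn
      from user u, (select @a := 0) init1
      order by u.StartRange) cur
join (select u.*, (@b := @b + 1) AS rn
      from user u, (select @b := 0) init2
      order by u.StartRange) prev
  on cur.rn = prev.rn + 1
where cur.StartRange - prev.EndRange >= 800
order by cur.StartRange
limit 1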

Replace custom function in Google Sheets with standard functions

I have a range of cells, and I want to count, for each column, the number of times it holds the maximum value of its row.
Sample:
headers -> a b c d e f g h
0 0 12 18* 1 0 0 0
30* 0 15 25 0 0 0 0
35 0 19 31 0 0 31 50*
40 10 19 31 0 2 5 55*
expected:
#max val per row-> 1 0 0 1 0 0 0 2
The maximum values are marked with an asterisk. The column a scores 1 because it has the maximum value in the second data row, the column d scores 1 as well because it has the maximum value in the first data row and the column h scores 2 because it has the maximum value in the third and fourth data rows. The rest of columns don't have the maximum value in any row, so they get a 0.
For just one row, I can copy this formula for each column and it would do it, but I need something that applies the max row-wise: COUNTIF(B2:B10, MAX($B2:$B10)).
I have written this google apps script, but I don't like its responsiveness (seeing the "Loading..." in the cell for almost a second is kind of exasperating compared with the snappiness you get with native functions):
function countMaxInRange(input) {
  return [input.map(function(row) {
    var m = Math.max.apply(null, row);
    return row.map(function(x) { return x === m && 1 || 0; });
  }).reduce(function(a, b) {
    var s = Array(a.length);
    for (var i = 0; i < a.length; i++) {
      s[i] = (a[i] + b[i]) || 0;
    }
    return s;
  })];
}
Any ideas on how I could replace that code with built-in functions? I don't mind adding auxiliary rows or columns, as long as it is a constant number of them (that is, if I extend my dataset I don't want to manually add more helper rows or columns for each new data row or column).
I think I could add an extra column that collects the header of the column with the max value for each row, and then for each data column count how many times its header appears in that auxiliary column, but that does not seem very clean.
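A hedged sketch of that helper-column idea, with illustrative cell references (data in A2:H5, headers in A1:H1, helper column I):
In I2, filled down:               =INDEX($A$1:$H$1, 1, MATCH(MAX(A2:H2), A2:H2, 0))
Under each header, copied across: =COUNTIF($I$2:$I$5, A$1)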
FORMULA
=ArrayFormula(TRANSPOSE(
MMULT(
N(TRANSPOSE(NamedRange1)
=
INDEX(
QUERY(TRANSPOSE(NamedRange1),
"SELECT "&JOIN(",",("MAX(Col"&TRANSPOSE(ROW(NamedRange1))-INDEX(ROW(NamedRange1),1)+1)&")"
)),
2)
),
SIGN(ROW(NamedRange1))
)
))
where NamedRange1 is a named range that refers to the data range.
Conditional formatting:
Apply to range: A1:H4
Custom formula: =A1=MAX($A1:$H1)
Explanation
Summary
The above formula requires no extra columns, only that the data range be set as a named range. The formula uses NamedRange1, but the name can be customized according to your preferences.
The result is a 1 x n array, where n is the number of columns of NamedRange1. Each column holds the count of rows whose maximum value falls in the corresponding column.
Featured "hacks"
ARRAYFORMULA returns an array of values.
Ranges greater than 1 x 1 are handled as arrays.
Using an array as an argument to some functions and operators works in a similar way to a loop. In this case, this feature is used to build a SQL statement that gets the maximum value of each column of the input data. Note that the input data for QUERY is the transpose of NamedRange1.
N coerces TRUE/FALSE values to 1/0 respectively.
MMULT is used to sum across rows (a small illustration follows this list).
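A minimal illustration of the MMULT trick (the array-literal syntax assumes a locale using , and ; as separators): multiplying a 0/1 matrix by a column vector of ones adds up each row.
=MMULT({1,0,0; 0,1,1}, {1;1;1})
returns the column {1; 2}, i.e. the row sums; in the formula above that column of ones is built with SIGN(ROW(NamedRange1)).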
Note: the +0 shown in the image was inserted to force Google Sheets to keep the line breaks introduced when editing a formula that previously had none; if there are no significant changes to a formula, the line breaks are removed automatically due to the formula/result caching feature of Google Sheets.
Reference
MMULT Usage by Adam Lusk.

MySQL: compare a mixed field containing letters and numbers

I have a field in the mysql database that contains data like the following:
Q16
Q32
L16
Q4
L32
L64
Q64
Q8
L1
L4
Q1
And so forth. What I'm trying to do is pull out, let's say, all the values that start with Q, which is easy:
field_name LIKE 'Q%'
But then I want to filter, let's say, all the values whose number is higher than 32. I'm supposed to get only 'Q64'; however, I also get Q4, Q8 and so forth, because I'm comparing them as strings: only the leading digits are compared ('4' > '3'), so the values are not treated as integers.
While this makes perfect sense, I'm struggling to find a solution that performs this operation without pulling all the data out of the database, stripping out the Qs and parsing everything to integers.
I did play around with the CAST operator; however, it only works if the value is stored as a string AND contains only digits. The parsing fails if there's another character in there.
Extract the number from the string and cast it to a number with *1 or CAST:
select * from your_table
where substring(field_name, 1, 1) = 'Q'
and substring(field_name, 2) * 1 > 32
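For instance, an equivalent filter using an explicit CAST of the numeric part only (a sketch, assuming the values are always one letter followed by digits):
select * from your_table
where field_name like 'Q%'
and cast(substring(field_name, 2) as unsigned) > 32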

Report and Graphic in access 2007 - calculating values on queries

Picture a table with fields (Id, Valid, Value)
Valid = boolean
Value = number from 0 to 100
What I want is a report that counts the number of records where (valid = 0), and then gives me the total number of cases where (value < 70) and the number of cases where (value >= 70).
The problem is that the "value" field could be empty on some of the records and I only want the records where the value field is not empty.
I know that the second value (value>=70) is going to be calculated, but the problem is that I can't simply do (total number of records - number of records where value < 70), because there's the problem with the records where "value" is null...
And then I want to create graphic with these values, to see the percentage of records below and above 70.
"The problem is that the "value" field could be empty on some of the records and I only want the records where the value field is not empty."
Use a WHERE clause to exclude rows whose "value" field is Null.
Here is sample data for tblMetraton.
Id Valid Valu
1 -1 2
2 0 4
3 -1 6
4 0
5 0 90
I used Valu for the field name because Value is a reserved word.
You can use the Count() function in a query and take advantage of the fact it only counts non-Null values. So use Count() with an IIf() expression which returns a non-Null value (I used 1) for the condition you want to match and Null otherwise.
SELECT
  Count(IIf(Valid=0, 1, Null)) AS valid_false,
  Count(IIf(Valu<70, 1, Null)) AS below_70,
  Count(IIf(Valu>=70, 1, Null)) AS at_least70
FROM tblMetraton AS m
WHERE m.Valu Is Not Null;
Based on my sample data in tblMetraton, that query gives me this result set.
valid_false below_70 at_least70
2 3 1
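For the chart percentages mentioned in the question, a sketch along the same lines gives the fractions below and at/above 70 (multiply by 100 for percentages; the alias names are illustrative):
SELECT
  Count(IIf(Valu<70, 1, Null)) / Count(*) AS pct_below_70,
  Count(IIf(Valu>=70, 1, Null)) / Count(*) AS pct_at_least_70
FROM tblMetraton
WHERE Valu Is Not Null;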
If my sample data does not address the variability you're dealing with, and/or if my query results don't match your requirements, please show us your own sample data and the results you want based on that sample.