Count or read only actual entries - octave

How can I count or read only the actual entries of a column, as distinct from non-zero entries?
In other words, if I have the file:
4000,1,5221,0
4001,0,5222,1
4002,3,,,
column 4 has 2 actual entries, one of which vanishes (is zero). I can count entries like so:
R = csvread("bugtest.csv");
for i = 1:4
VanishingColEntries(i) = numel (find (R(:,i) ==0));
NonVanishingColEntries(i) = nnz(R(:,i));
endfor
VanishingColEntries
NonVanishingColEntries
yielding:
octave:2> nument
VanishingColEntries =
0 1 1 2
NonVanishingColEntries =
3 2 2 1
But I don't know how to extract the number of "actual" entries, that is, the sum of the non-zero and the explicitly zero entries!

csvread is only for numeric data. If csvread encounters an entry which is not strictly numeric, it checks if the string starts with a number, and uses that as the result (e.g. 1direction, 2pac, 7up will result in 1,2,7 ). 'Empty' entries here are effectively considered to be an empty string, which is parsed as the number 0. However, there are some special strings, like nan and inf which are parsed specially.
If you can / are happy to preprocess your csv file, then you can replace all empty entries with the string nan (without quotes). csvread will then treat this string specially and replace it with an actual nan value in the resulting numerical matrix. You can then use this with isnan to count the number of nan / non-nan entries as follows:
R = csvread( 'bugtest.csv' );
% Count nan (missing) / non-nan (actual) entries per column, i.e. summing along dimension 1
MissingColEntries = sum( isnan( R ), 1 )
ActualColEntries = sum( ~isnan( R ), 1 )
If you do not have the luxury of preprocessing your csv file (or you simply want to process it programmatically throughout, without the need for human intervention), then you can use the csv2cell function from the io package instead, and process the resulting cell to get what you want, e.g.
pkg load io
C = csv2cell( 'bugtest.csv' )
% Convert cells with empty strings to nan
for i = 1 : numel(C), if ischar(C{i}), C{i} = nan; endif, endfor
% Convert numeric cell array (nan is a valid number) to a matrix
R = cell2mat( C );
You can then use isnan as before to get your result.
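For comparison, the same distinction between "missing" and "explicitly zero" can be sketched outside Octave. This is only an illustration in Python using the standard csv module; the function name count_entries and the inline sample (a four-column variant of the file in the question) are assumptions of mine, not part of the original answer.

```python
import csv
import io

def count_entries(text, ncols):
    """Return (actual, zeros): per-column counts of non-empty fields
    and of fields that are explicitly zero."""
    actual = [0] * ncols
    zeros = [0] * ncols
    for row in csv.reader(io.StringIO(text)):
        for i in range(ncols):
            field = row[i].strip() if i < len(row) else ""
            if field == "":
                continue  # empty field: missing, not zero
            actual[i] += 1
            if float(field) == 0:
                zeros[i] += 1
    return actual, zeros

# Assumed sample: the last row has two empty trailing fields
sample = "4000,1,5221,0\n4001,0,5222,1\n4002,3,,\n"
actual, zeros = count_entries(sample, 4)
print(actual)  # [3, 3, 2, 2]
print(zeros)   # [0, 1, 0, 1]
```

The key point is the same as with nan in Octave: the empty field has to be recognised before it is converted to a number, because after conversion it is indistinguishable from 0.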

Related

SUM on VARCHAR field converted to DECIMAL not adding numbers after decimal

I have a VARCHAR field that stores a value like 0.00000000.
I want to run a report query to SUM all those VARCHAR fields, which means I have to convert them to a number to add them.
Here's my query, which works as far as giving no errors, but it gives the wrong number back:
SELECT SUM(CAST(IFNULL(tx.received_amount, '0.00000000') AS DECIMAL(16, 16)))
FROM account
JOIN account_invoice
ON account_invoice.account_id = account.id
JOIN withdrawal
ON withdrawal.invoice_id = account_invoice.invoice_id
JOIN tx
ON tx.id = withdrawal.tx_id
AND tx.currency = 'BTC'
AND tx.created_at > DATE_SUB(NOW(), INTERVAL 7 DAY)
WHERE account.id = 1
This is what I get: 100 x 1.12345678 = 100.00000000
This is what I should get: 100 x 1.12345678 = 112.34567800
Why is the SUM not adding the numbers after the decimal?
You are not using the DECIMAL datatype appropriately for your use case. DECIMAL(16, 16) declares a decimal number with a total of 16 digits, all 16 of which are decimal digits. This leaves no digits before the decimal point, so it cannot hold a value of 1 or greater.
Consider:
SELECT CAST('1.12345678' AS DECIMAL(16, 16))
Returns: 0.9999999999999999.
You probably want something like DECIMAL(16, 8) instead, since your strings seem to have 8 decimals.
From the MySQL documentation:
The declaration syntax for a DECIMAL column is DECIMAL(M,D). The ranges of values for the arguments are as follows:
M is the maximum number of digits (the precision). It has a range of 1 to 65.
D is the number of digits to the right of the decimal point (the scale). It has a range of 0 to 30 and must be no larger than M.
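To make the M/D arithmetic concrete, here is a rough sketch in Python's decimal module. It only imitates the range limit described above (out-of-range values clamped to the largest representable value, as in the CAST result shown earlier); the helper names decimal_max and cast_clamped are hypothetical, and negative values and MySQL's strict mode are deliberately ignored.

```python
from decimal import Decimal

def decimal_max(m, d):
    """Largest value a DECIMAL(m, d) can hold: m digits in total, d of them after the point."""
    return Decimal(10 ** m - 1).scaleb(-d)

def cast_clamped(s, m, d):
    """Round s to d decimals, then clamp to the DECIMAL(m, d) maximum (positive values only)."""
    value = Decimal(s).quantize(Decimal(1).scaleb(-d))
    return min(value, decimal_max(m, d))

print(decimal_max(16, 16))                       # 0.9999999999999999
print(decimal_max(16, 8))                        # 99999999.99999999
print(cast_clamped("1.12345678", 16, 16) * 100)  # 99.9999999999999900
print(cast_clamped("1.12345678", 16, 8) * 100)   # 112.34567800
```

With 16 of 16 digits reserved for the fraction, every value above 1 collapses to the same maximum, which is why the fractional part seems to disappear from the sum.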
GMB's answer is usually the best choice, but if you truly need to output (a_really_precise_number)*100, you can do it application-side by passing the value as a string into a language that supports arbitrarily large numbers and doing the conversion and arithmetic there. If you have numbers more precise than 16 digits in your database, you are likely already using a language that supports this in your application.
In some cases, you are looking at data from another source and you have more precise numbers than your language of choice is designed for. Many languages that don't support these larger numbers natively may have libraries available that do fancy parsing to perform math on strings as strings but they tend to be a bit slow if you need to work with really large numbers or data sets.
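As one possible illustration of the "pass it as a string" approach, here is a small Python sketch using the standard decimal module. The function name times_100 is made up for this example, and the precision is simply set large enough for the input; a real application would pick whatever arbitrary-precision facility its own language offers.

```python
from decimal import Decimal, localcontext

def times_100(s):
    """Multiply a decimal string by 100 without losing digits."""
    with localcontext() as ctx:
        # Enough significant digits for every input digit plus the two gained by * 100
        ctx.prec = len(s) + 3
        return str(Decimal(s) * 100)

print(times_100("1234567891234567.8912345678912345"))
# 123456789123456789.1234567891234500
```

The value never passes through a binary floating-point type, so no digits are rounded away.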
A third option if you are just multiplying it by a power of 10 such as N*100 and outputting the result is to pass it to the application as a string, then just parse it to move that decimal over 2 places like this:
function shiftDec(str, shift) {
  // Split on the decimal point (assumes str contains one and has enough digits to shift)
  var decPoint = str.indexOf(".");
  var decInt = str.slice(0, decPoint);
  var decMod = str.slice(decPoint + 1);
  // Move the decimal point 'shift' places to simulate N*100:
  // to the right if shift is positive, to the left if it is negative.
  if (shift > 0) {
    // The first 'shift' fractional digits move into the integer part
    decInt = decInt + decMod.slice(0, shift);
    decMod = decMod.slice(shift);
  } else if (shift < 0) {
    // The last '-shift' integer digits move into the fractional part
    var shiftCopy = decInt.slice(decInt.length + shift);
    decInt = decInt.slice(0, decInt.length + shift);
    decMod = shiftCopy + decMod;
  }
  return decInt + '.' + decMod;
}
var result = shiftDec("1234567891234567.8912345678912345", 2);
document.write(result); // 123456789123456789.12345678912345
You should not use DECIMAL(16,16)
SELECT 100 * CAST('1.123' AS DECIMAL(16,16))
99.999...
SELECT 100 * CAST('1.123' AS DECIMAL(16, 10))
112.300...

insert and fetch strings and matrices to/from MySQL with Matlab

I need to store data in a database. I have installed and configured a MySQL database (and an SQLite database) in Matlab. However I cannot store and retrieve anything other than scalar numeric values.
% create an empty database called test_database with MySQL workbench.
% connect to it in Matlab
conn=database('test_database','root','XXXXXX','Vendor','MySQL');
% create a table to store values
create_test_table=['CREATE TABLE test_table (testID NUMERIC PRIMARY KEY, test_string VARCHAR(255), test_vector BLOB, test_scalar NUMERIC)'];
curs=exec(conn,create_test_table)
Result is good so far (curs.Message is an empty string)
% create a new record
datainsert(conn,'test_table',{'testID','test_string','test_vector','test_scalar'},{1,'string1',[1,2],1})
% try to read out the new record
sqlquery='SELECT * FROM test_table';
data_to_view=fetch(conn,sqlquery)
Result is bad:
data_to_view =
1 NaN NaN 1
From the documentation for "fetch" I would expect:
data_to_view =
1×4 table
testID test_string test_vector test_scalar
_____________ ___________ ______________ ________
1 'string1' 1x2 double 1
Until I learn how to read blobs I'd even be willing to accept:
data_to_view =
1×4 table
testID test_string test_vector test_scalar
_____________ ___________ ______________ ________
1 'string1' NaN 1
I get the same thing with an sqlite database. How can I store and then read out strings and blobs and why isn't the data returned in table format?
Matlab does not document that the default options for SQLite and MySQL database retrieval are to attempt to return everything as a numeric array. One only needs this line:
setdbprefs('DataReturnFormat','cellarray')
or
setdbprefs('DataReturnFormat','table')
in order to get results with differing datatypes. However, now my result is:
data_to_view =
1×4 cell array
{[2]} {'string1'} {11×1 int8} {[1]}
If instead I input:
datainsert(conn,'test_table',{'testID','test_string','test_vector','test_scalar'},{1,'string1',typecast([1,2],'int8'),1})
Then I get:
data_to_view =
1×4 cell array
{[2]} {'string1'} {16×1 int8} {[1]}
which I can convert like so:
typecast(data_to_view{3},'double')
ans =
1 2
Unfortunately this does not work for SQLite. I get:
data_to_view =
1×4 cell array
{[2]} {'string1'} {' �? #'} {[1]}
and I can't convert the third part correctly:
typecast(unicode2native(data_to_view{1,3}),'double')
ans =
0.0001 2.0000
So I still need to learn how to read an SQLite blob in Matlab but that is a different question.
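The typecast round trip above is just "doubles to raw bytes and back". As a language-neutral illustration of that idea (not of the Matlab Database Toolbox API), here is a sketch in Python with the standard sqlite3 and struct modules; the helper names and the in-memory database are assumptions made for the example.

```python
import sqlite3
import struct

def pack_doubles(values):
    """Serialise a list of floats as little-endian 8-byte doubles."""
    return struct.pack("<%dd" % len(values), *values)

def unpack_doubles(blob):
    """Inverse of pack_doubles: 8 bytes per value."""
    return list(struct.unpack("<%dd" % (len(blob) // 8), blob))

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.execute(
    "CREATE TABLE test_table (testID INTEGER PRIMARY KEY, "
    "test_string TEXT, test_vector BLOB, test_scalar REAL)"
)
conn.execute(
    "INSERT INTO test_table VALUES (?, ?, ?, ?)",
    (1, "string1", pack_doubles([1.0, 2.0]), 1.0),
)
row = conn.execute("SELECT * FROM test_table").fetchone()
vector = unpack_doubles(row[2])
print(row[0], row[1], vector, row[3])  # 1 string1 [1.0, 2.0] 1.0
conn.close()
```

The blob comes back as 16 raw bytes, matching the 16×1 int8 seen with MySQL. The round trip only works if those bytes are never reinterpreted as text on the way in or out.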

Replace custom function in Google Sheets with standard functions

I have a range of cells, and I want to accrue the number of times a column has the max value for its given row.
Sample:
headers ->          a    b    c    d    e    f    g    h
                    0    0    12   18*  1    0    0    0
                    30*  0    15   25   0    0    0    0
                    35   0    19   31   0    0    31   50*
                    40   10   19   31   0    2    5    55*
expected:
#max val per row -> 1    0    0    1    0    0    0    2
The maximum values are marked with an asterisk. The column a scores 1 because it has the maximum value in the second data row, the column d scores 1 as well because it has the maximum value in the first data row and the column h scores 2 because it has the maximum value in the third and fourth data rows. The rest of columns don't have the maximum value in any row, so they get a 0.
For just one row, I can copy this formula to each column and it does the job: COUNTIF(B2:B10, MAX($B2:$B10)). But I need something that applies the max row-wise.
I have written this google apps script, but I don't like its responsiveness (seeing the "Loading..." in the cell for almost a second is kind of exasperating compared with the snappiness you get with native functions):
function countMaxInRange(input) {
return [input.map(function(row) {
var m = Math.max.apply(null, row);
return row.map(function(x){return x === m && 1 || 0});
}).reduce(function(a, b){
var s = Array(a.length);
for (var i = 0; i < a.length; i++) {
s[i] = (a[i] + b[i]) || 0;
}
return s;
})];
}
Any ideas on how I could replace that code with built-in functions? I don't mind adding auxiliary rows or columns, as long as it is a constant number of them (that is, if I extend my dataset I don't want to manually add more helper rows or columns for each new data row or column).
I think I could add an extra column that collects the header of the column with the max value for each row, and then for each data column count how many times its header appears in that auxiliary column, but that does not seem very clean.
FORMULA
=ArrayFormula(TRANSPOSE(
MMULT(
N(TRANSPOSE(NamedRange1)
=
INDEX(
QUERY(TRANSPOSE(NamedRange1),
"SELECT "&JOIN(",",("MAX(Col"&TRANSPOSE(ROW(NamedRange1))-INDEX(ROW(NamedRange1),1)+1)&")"
)),
2)
),
SIGN(ROW(NamedRange1))
)
))
where NamedRange1 is a named range referring to the data range.
Conditional formatting:
Apply to range: A1:H4
Custom formula: =A1=MAX($A1:$H1)
Explanation
Summary
The above formula requires no extra columns; you only need to set the data range as a named range. NamedRange1 is used in the formula, but the name can be changed to suit your preferences.
The result is a 1 x n array, where n is the number of columns of NamedRange1. Each column holds the number of rows in which the corresponding column of the data has the maximum value.
Featured "hacks"
ARRAYFORMULA returns an array of values.
Ranges greater than 1 x 1 are handled as arrays.
Using an array as an argument with some functions and operators works much like a loop. Here, this feature is used to build a SQL statement that gets the maximum value of each column of the input data. Note that the input data for QUERY is the transpose of NamedRange1.
N coerces TRUE/FALSE values to 1/0 respectively.
MMULT is used to compute the sums by rows.
Note: the +0 shown in the image was inserted to force Google Sheets to keep the line breaks added when editing a formula that originally had none. If there are no significant changes to the formula, the line breaks are automatically removed by the formula/result caching feature of Google Sheets.
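The logic of the formula can be followed more easily outside the spreadsheet. Below is a small Python sketch of the same two steps: an indicator matrix (the N(... = ...) part) followed by per-column sums (the MMULT part). It is only meant to illustrate the idea; the function name and the hard-coded sample data are mine.

```python
def count_row_maxima(rows):
    """For each column, count the rows in which that column holds the row maximum."""
    # Step 1: indicator matrix, 1 where the cell equals its row's maximum
    indicator = [[1 if x == max(row) else 0 for x in row] for row in rows]
    # Step 2: sum each column of the indicator matrix
    return [sum(col) for col in zip(*indicator)]

data = [
    [0, 0, 12, 18, 1, 0, 0, 0],
    [30, 0, 15, 25, 0, 0, 0, 0],
    [35, 0, 19, 31, 0, 0, 31, 50],
    [40, 10, 19, 31, 0, 2, 5, 55],
]
print(count_row_maxima(data))  # [1, 0, 0, 1, 0, 0, 0, 2]
```

As in the formula, a tie within a row counts once for every column that reaches the maximum.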
Reference
MMULT Usage by Adam Lusk.

select int column and compare it with Json array column

This is a row in the option column of the table oc_cart:
20,228,27,229
Why is no result found when the value is 228, while a result is found when the value is 20? The query is:
select 1 from dual
where 228 in (select option as option from oc_cart)
and a result is found when I change the value to 20, like
select 1 from dual
where 20 in (select option as option from oc_cart)
The option column data type is TEXT
In SQL, these two expressions are different:
WHERE 228 in ('20,228,27,229')
WHERE 228 in ('20','228','27','229')
The first example compares the integer 228 to a single string value, whose leading numeric characters can be converted to the integer 20. That's what happens. 228 is compared to 20, and fails.
The second example compares the integer 228 to a list of four values, each can be converted to different integers, and 228 matches the second integer 228.
Your subquery is returning a single string, not a list of values. If your oc_cart.option holds a single string, you can't use the IN( ) predicate in the way you're doing.
A workaround is this:
WHERE FIND_IN_SET(228, (SELECT option FROM oc_cart WHERE...))
But this is awkward. You really should not be storing strings of comma-separated numbers if you want to search for an individual number in the string. See my answer to Is storing a delimited list in a database column really that bad?
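To see why 20 matches and 228 does not, the two comparisons can be imitated in a few lines of Python. This is a simplified emulation written for illustration: mysql_leading_number only mimics the "use the leading digits" conversion described above, and find_in_set mimics the 1-based position returned by FIND_IN_SET. Neither is MySQL code.

```python
import re

def mysql_leading_number(s):
    """Simplified emulation: convert a string to the integer formed by its leading digits."""
    m = re.match(r"\s*[+-]?\d+", s)
    return int(m.group()) if m else 0

def find_in_set(needle, csv_list):
    """Return the 1-based position of needle in a comma-separated string, or 0 if absent."""
    items = csv_list.split(",")
    return items.index(str(needle)) + 1 if str(needle) in items else 0

option = "20,228,27,229"
print(mysql_leading_number(option))         # 20
print(228 == mysql_leading_number(option))  # False: 228 is compared to 20
print(20 == mysql_leading_number(option))   # True
print(find_in_set(228, option))             # 2
```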

MySQL: compare a mixed field containing letters and numbers

I have a field in the mysql database that contains data like the following:
Q16
Q32
L16
Q4
L32
L64
Q64
Q8
L1
L4
Q1
And so forth. What I'm trying to do is pull out, let's say, all the values that start with Q which is easy:
field_name LIKE 'Q%'
But then I want to filter, let's say, all the values that have a number higher than 32. As a result I'm supposed to get only 'Q64'; however, I also get Q4, Q8 and so forth, because the values are compared as strings: the comparison goes character by character, starting with the 3, so the numbers are effectively treated as sequences of single digits rather than as integers.
While this makes perfect sense, I'm struggling to find a way to perform this operation without pulling all the data out of the database, stripping out the Qs and parsing it all to integers.
I did play around with the CAST operator; however, it only works if the value is stored as a string AND contains only digits. The parsing fails if there's another character in there.
Extract the number from the string and cast it to a number with *1 or cast
select * from your_table
where substring(field_name, 1, 1) = 'Q'
and substring(field_name, 2) * 1 > 32
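The difference between the string comparison from the question and the numeric comparison in this query can be reproduced in a short Python sketch (the list simply repeats the sample values from the question; the variable names are mine):

```python
values = ["Q16", "Q32", "L16", "Q4", "L32", "L64", "Q64", "Q8", "L1", "L4", "Q1"]

# Comparing the remainder as a string: character by character, so "4" > "32"
as_strings = [v for v in values if v[:1] == "Q" and v[1:] > "32"]

# Converting the remainder to a number first, as substring(...) * 1 does
as_numbers = [v for v in values if v[:1] == "Q" and int(v[1:]) > 32]

print(as_strings)  # ['Q4', 'Q64', 'Q8']
print(as_numbers)  # ['Q64']
```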