GNU Octave: reading data with NA values using textscan(...)

I need to read data from an ASCII file where missing values are given as NA. Using textscan(...) does not seem to work, because textscan(...) seems to stop reading/parsing at the first occurrence of NA.
Here's a simple demonstration of the issue:
x = textscan ( "1 ; 2 ; 3\n4 ; NA ; 6" , '%d %d %d' , 'Delimiter' , ';' , 'ReturnOnError' , false )
error: textscan: Read error in field 2 of row 2
I have also tried to tell textscan(...) to interpret NA as "empty value", but no luck:
x = textscan ( "1 ; 2 ; 3\n4 ; NA ; 6" , '%d %d %d' , 'Delimiter' , ';' , 'TreatAsEmpty' , 'NA' , 'ReturnOnError' , false )
error: textscan: Read error in field 2 of row 2
Can someone explain what's going on, or how to make this work?
Note that this is just a simplified example to illustrate the problem. The format of the data in my files is a bit more complex, and I really depend on textscan(...) to parse it; I don't think I can easily do it without textscan(...).
(I am running Octave 4.2.1.)

NA is defined only for floating-point numbers, so you should use the '%f' conversion specifier instead of '%d':
x = textscan ( "1 ; 2 ; 3\n4 ; NA ; 6" , '%f %f %f' ,
'Delimiter' , ';' , 'ReturnOnError' , false )
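The reason behind this advice can be illustrated outside Octave. The following is a minimal Python sketch, not Octave's implementation: it assumes a hypothetical helper named parse_rows and uses NaN as a stand-in for Octave's NA (which is itself a special floating-point value). It shows that a floating-point column has a value available to mark "missing", whereas an integer column does not.

```python
import math

def parse_rows(text, delimiter=";", missing="NA"):
    """Parse delimited numeric text, mapping the missing marker to NaN.

    parse_rows is a hypothetical helper for illustration only.
    """
    rows = []
    for line in text.splitlines():
        row = []
        for field in line.split(delimiter):
            field = field.strip()
            # A float can carry a "missing" marker (NaN); an int cannot.
            row.append(math.nan if field == missing else float(field))
        rows.append(row)
    return rows

rows = parse_rows("1 ; 2 ; 3\n4 ; NA ; 6")
print(rows)
```

If integer columns are needed afterwards, the missing entries have to be handled (dropped or replaced) before converting.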


Convert integer to date DDMMYYYY

I am uploading an Excel data sheet. In the sheet I have a numeric column which I want to convert to a date. So 40955 should look like 04.09.1955 (DDMMYYYY).
Can someone help me out here? I tried using the Data Conversion transformation component and it's showing me an error.
The main obstacle here is that your values are not in an easy-to-use format.
To do what you specify, you need to break the value up into its parts, reassemble them in a different order, and then convert the result. All of this can be done in a single statement. For explanation I show the individual steps below.
DECLARE
@someval int = 40955,
@dateval int,
@dated date
;
SELECT
-- single extraction steps
@someval % 100 AS yearval,
( @someval / 100 ) % 100 AS monthval,
( @someval / 10000 ) AS dayval
;
SELECT
--@dateval =
-- extract year and push it to front
( @someval % 100 ) * 10000
-- extract month and push into middle
+ ( @someval / 100 ) % 100 * 100
-- extract day and keep at end
+ ( @someval / 10000 )
;
SELECT
-- clip all elements into single integer
@dateval =
( @someval % 100 ) * 10000
+ ( @someval / 100 ) % 100 * 100
+ ( @someval / 10000 )
;
SELECT
-- 112 = yyyymmdd format
@dated = CONVERT( date, CAST( @dateval AS varchar(8) ), 112 )
;
SELECT
-- show as standard (format 120) date aka ISO 8601 readable
@dated AS Dated
;
However, I suspect that the value you receive from Excel is actually an Excel date serial number (a day count). In that case the following answer provides a solution:
convert Excel Date Serial Number to Regular Date
Keep in mind that in SSIS you need to wrap this code in either a column or a transformation.
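The two readings of 40955 discussed above give very different dates. As a rough Python sketch (the function names are hypothetical, and the century for the two-digit year is an assumption fixed at 1900), the digit-splitting arithmetic and the Excel serial interpretation look like this:

```python
from datetime import date, timedelta

def ddmmyy_to_date(value, century=1900):
    """Read an integer such as 40955 as D(D)MMYY digits.

    The century is an assumption; the source value only has a 2-digit year.
    """
    year = century + value % 100      # last two digits
    month = (value // 100) % 100      # middle two digits
    day = value // 10000              # remaining leading digits
    return date(year, month, day)

def excel_serial_to_date(serial):
    """Read an integer as an Excel serial (1900 date system, serial >= 61)."""
    # The 1899-12-30 epoch absorbs Excel's fictitious 1900-02-29.
    return date(1899, 12, 30) + timedelta(days=serial)

print(ddmmyy_to_date(40955))        # 1955-09-04
print(excel_serial_to_date(40955))  # 2012-02-16
```

Checking a few known rows against both functions is a quick way to find out which interpretation the sheet actually uses.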

Assigned Octave variable not being saved to file

In the Octave script below I am looping through files in a directory, loading them into Octave to do some manipulation on the data, and then attempting to write the manipulated data (a matrix) to a new file whose name is derived from the name of the input file. The manipulated data is assigned to a variable that has the same name as the file it is to be saved in. All unwanted variables are cleared, and the save command should save/write the single, assigned variable matrix to the file "new_filename."
However, this last save/write command is not happening, and I don't understand why not. Without specific variable commands, the save function should save all variables in scope, in this case there only being the one matrix to save. Why is this not working?
clear all ;
all_raw_OHLC_files = glob( "*_raw_OHLC_daily" ) ; % cell with filenames matching *_raw_OHLC_daily
for ii = 1 : length( all_raw_OHLC_files ) % loop for length of above cell
filename = all_raw_OHLC_files{ii} ; % get files' names
% create a new filename for the output file
split_filename = strsplit( filename , "_" ) ;
new_filename = tolower( [ split_filename{1} "_" split_filename{2} "_ohlc_daily" ] ) ;
% open and read file
fid = fopen( filename , 'rt' ) ;
data = textscan( fid , '%s %f %f %f %f %f %s' , 'Delimiter' , ',' , 'CollectOutput', 1 ) ;
fclose( fid ) ;
ex_data = [ datenum( data{1} , 'yyyy-mm-dd HH:MM:SS' ) data{2} ] ; % extract the file's data
% process the raw data in to OHLC bars
weekday_ix = weekday( ex_data( : , 1 ) ) ;
% find Mondays immediately preceded by Sundays in the data
monday_ix = find( ( weekday_ix == 2 ) .* ( shift( weekday_ix , 1 ) == 1 ) ) ;
sunday_ix = monday_ix - 1 ;
% replace Monday open with the Sunday open
ex_data( monday_ix , 2 ) = ex_data( sunday_ix , 2 ) ;
% replace Monday high with max of Sunday high and Monday high
ex_data( monday_ix , 3 ) = max( ex_data( sunday_ix , 3 ) , ex_data( monday_ix , 3 ) ) ;
% repeat for min of lows
ex_data( monday_ix , 4 ) = min( ex_data( sunday_ix , 4 ) , ex_data( monday_ix , 4 ) ) ;
% combine volume figures
ex_data( monday_ix , 6 ) = ex_data( sunday_ix , 6 ) + ex_data( monday_ix , 6 ) ;
% now delete the sunday data
ex_data( sunday_ix , : ) = [] ;
assignin( "base" , tolower( [ split_filename{1} "_" split_filename{2} "_ohlc_daily" ] ) , ex_data )
clear ans weekday_ix sunday_ix monday_ix ii filename split_filename fid ex_data data all_raw_OHLC_files
% print to file
save new_filename
endfor
save new_filename saves the current workspace to a file literally named "new_filename", because command syntax passes the argument as a string. I guess what you want is to create a file whose name is the value stored in new_filename:
save (new_filename);
Your current approach of "clearing everything I don't need and then storing the whole workspace" is IMHO very ugly. Instead, store ex_data explicitly if this is the only part you want (which also means ex_data must not be cleared before the call to save):
save (new_filename, "ex_data");

ANTLR: problem differentiating unary and binary operators (e.g. minus sign)

I'm using ANTLR (3.2) to parse a rather simple grammar. Unfortunately, I came across a little problem. Take the following rule:
exp
: NUM
| '(' expression OPERATOR expression ')' -> expression+
| '(' (MINUS | '!') expression ')' -> expression
;
OPERATOR contains the same minus sign ('-') as is defined with MINUS. Now ANTLR seems to be unable to deal with these two rules. If I remove either one, everything works fine.
Any ideas?
Make the unary expression the one with the highest precedence. I'd also use a different token for the unary - to distinguish it clearly from the binary minus. A demo:
grammar Exp;
options {
output=AST;
}
tokens {
UNARY;
}
parse
: exp EOF
;
exp
: additionExp
;
additionExp
: multiplyExp ('+'^ multiplyExp | '-'^ multiplyExp)*
;
multiplyExp
: unaryExp ('*'^ unaryExp | '/'^ unaryExp)*
;
unaryExp
: '-' atom -> ^(UNARY atom)
| '!' atom -> ^('!' atom)
| atom
;
atom
: '(' exp ')' -> exp
| Number -> Number
;
Number : ('0'..'9')+ ('.' ('0'..'9')+)? ;
Spaces : (' ' | '\t' | '\r'| '\n') {$channel=HIDDEN;} ;
A quick test with the source:
3 * -4 + 7 / 6 * -(3 + -7 * (4 + !2))
produced the following AST:
[Image of the AST not included; it was created using http://graph.gafol.net/]
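The same precedence layering can be written as a hand-rolled recursive-descent parser. The Python sketch below is not generated by ANTLR and only mirrors the rule structure of the demo grammar (additionExp, multiplyExp, unaryExp, atom); the class and function names are assumptions made for illustration. It shows why the ambiguity disappears: a '-' seen at the start of unaryExp can only be unary, while a '-' seen after a complete operand in additionExp can only be binary.

```python
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/!()])")

def tokenize(source):
    """Split the source into number and operator tokens, skipping spaces."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    """Builds nested tuples such as ('+', left, right) or ('UNARY', operand)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def parse(self):
        tree = self.addition()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def addition(self):
        left = self.multiply()
        while self.peek() in ("+", "-"):   # here '-' can only be binary
            op = self.take()
            left = (op, left, self.multiply())
        return left

    def multiply(self):
        left = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            left = (op, left, self.unary())
        return left

    def unary(self):
        if self.peek() == "-":             # here '-' can only be unary
            self.take()
            return ("UNARY", self.atom())
        if self.peek() == "!":
            self.take()
            return ("!", self.atom())
        return self.atom()

    def atom(self):
        if self.peek() == "(":
            self.take("(")
            tree = self.addition()
            self.take(")")
            return tree
        token = self.take()
        if not token[0].isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return float(token)

tree = Parser(tokenize("3 * -4 + 7")).parse()
print(tree)  # ('+', ('*', 3.0, ('UNARY', 4.0)), 7.0)
```

As in the grammar, a unary operator applies to an atom only, so an input like "--4" is rejected unless the inner expression is parenthesized.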

How to convert float to varchar in SQL Server

I have a float column with numbers of different length and I'm trying to convert them to varchar.
Some values exceed bigint max size, so I can't do something like this
cast(cast(float_field as bigint) as varchar(100))
I've tried using decimal, but the numbers aren't all the same size, so this doesn't help either:
CONVERT(varchar(100), Cast(float_field as decimal(38, 0)))
Any help is appreciated.
UPDATE:
Sample value is 2.2000012095022E+26.
Try using the STR() function.
SELECT STR(float_field, 25, 5)
STR() Function
Another note: this pads on the left with spaces. If this is a problem combine with LTRIM:
SELECT LTRIM(STR(float_field, 25, 5))
The only query bit I found that returns the EXACT same original number is
CONVERT (VARCHAR(50), float_field,128)
See http://www.connectsql.com/2011/04/normal-0-microsoftinternetexplorer4.html
The other solutions above will sometimes round or add digits at the end
UPDATE: As per comments below and what I can see in https://msdn.microsoft.com/en-us/library/ms187928.aspx:
CONVERT (VARCHAR(50), float_field,3)
Should be used in new SQL Server versions (Azure SQL Database, and starting in SQL Server 2016 RC3)
This is the solution I ended up using in SQL Server 2012 (all the other suggestions had the drawback of truncating the fractional part, or some other drawback).
declare @float float = 1000000000.1234;
select format(@float, N'#.##############################');
output:
1000000000.1234
This has the further advantage (in my case) of making thousands separators and localization easy:
select format(@float, N'#,##0.##########', 'de-DE');
output:
1.000.000.000,1234
SELECT LTRIM(STR(float_field, 25, 0))
is the best way if you do not want .0000 or any extra digits added at the end of the value.
Convert into an integer first and then into a string:
cast((convert(int,b.tax_id)) as varchar(20))
Useful topic, thanks.
If, like me, you want to remove the surplus zeros, you can use this:
DECLARE @MyFloat [float];
SET @MyFloat = 1000109360.050;
SELECT REPLACE(RTRIM(REPLACE(REPLACE(RTRIM(LTRIM(REPLACE(STR(@MyFloat, 38, 16), '0', ' '))), ' ', '0'),'.',' ')),' ',',')
float has a maximum precision of only about 15 significant digits. Digits after the 15th position are therefore not meaningful, and conversion to bigint (max. 19 digits) or decimal does not help you.
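The 15-digit limit is a property of 8-byte IEEE 754 doubles in general, not of SQL Server specifically, so it can be checked in any language that uses them. A small Python sketch using the sample value from the question (Python's float is assumed here to be the same 8-byte double as SQL Server's float(53)):

```python
import sys

value = 2.2000012095022e26   # sample value from the question

# A double reliably preserves this many significant decimal digits.
print(sys.float_info.dig)            # 15

# Expanding to a plain integer string yields 27 digits, but only the
# leading 15 or so reflect the number that was originally entered.
digits = format(value, ".0f")
print(len(digits), digits)

# Rounding to 15 significant digits recovers the original literal.
print(format(value, ".14e"))         # 2.20000120950220e+26
```

Any conversion that prints all 27 digits is therefore showing the exact binary value that was stored, not extra information about the original input.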
This can help without rounding
declare @test float(25)
declare @test1 decimal(10,5)
select @test = 34.0387597207
select @test
set @test1 = convert (decimal(10,5), @test)
select cast((@test1) as varchar(12))
Select LEFT(cast((@test1) as varchar(12)),LEN(cast((@test1) as varchar(12)))-1)
Try this one, should work:
cast((convert(bigint,b.tax_id)) as varchar(20))
select replace(myFloat, '', '')
from REPLACE() documentation:
Returns nvarchar if one of the input arguments is of the nvarchar data type; otherwise, REPLACE returns varchar.
Returns NULL if any one of the arguments is NULL.
tests:
null ==> [NULL]
1.11 ==> 1.11
1.10 ==> 1.1
1.00 ==> 1
0.00 ==> 0
-1.10 ==> -1.1
0.00001 ==> 1e-005
0.000011 ==> 1.1e-005
If you use a CLR function, you can convert the float to a string that looks just like the float, without all the extra 0's at the end.
CLR Function
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read)]
[return: SqlFacet(MaxSize = 50)]
public static SqlString float_to_str(double Value, int TruncAfter)
{
string rtn1 = Value.ToString("R");
string rtn2 = Value.ToString("0." + new string('0', TruncAfter));
if (rtn1.Length < rtn2.Length) { return rtn1; } else { return rtn2; }
}
.
Example
create table #temp (value float)
insert into #temp values (0.73), (0), (0.63921), (-0.70945), (0.28), (0.72000002861023), (3.7), (-0.01), (0.86), (0.55489), (0.439999997615814)
select value,
dbo.float_to_str(value, 18) as converted,
case when value = cast(dbo.float_to_str(value, 18) as float) then 1 else 0 end as same
from #temp
drop table #temp
.
Output
value                  converted                  same
---------------------- -------------------------- -----------
0.73                   0.73                       1
0                      0                          1
0.63921                0.63921                    1
-0.70945               -0.70945                   1
0.28                   0.28                       1
0.72000002861023       0.72000002861023           1
3.7                    3.7                        1
-0.01                  -0.01                      1
0.86                   0.86                       1
0.55489                0.55489                    1
0.439999997615814      0.439999997615814          1
.
Caveat
All converted strings are truncated at 18 decimal places, and there are no trailing zeros. 18 digits of precision is not a problem for us. And, 100% of our FP numbers (close to 100,000 values) look identical as string values as they do in the database as FP numbers.
I modified Axel's response a bit, as in certain cases it produces undesirable results.
DECLARE @MyFloat [float];
SET @MyFloat = 1000109360.050;
SELECT REPLACE(RTRIM(REPLACE(REPLACE(RTRIM((REPLACE(CAST(CAST(@MyFloat AS DECIMAL(38,18)) AS VARCHAR(max)), '0', ' '))), ' ', '0'),'.',' ')),' ','.')
Select
cast(replace(convert(decimal(15,2),acs_daily_debit), '.', ',') as varchar(20))
from acs_balance_details
Based on molecular's answer:
DECLARE @F FLOAT = 1000000000.1234;
SELECT @F AS Original, CAST(FORMAT(@F, N'#.##############################') AS VARCHAR) AS Formatted;
SET @F = 823399066925.049
SELECT @F AS Original, CAST(@F AS VARCHAR) AS Formatted
UNION ALL SELECT @F AS Original, CONVERT(VARCHAR(128), @F, 128) AS Formatted
UNION ALL SELECT @F AS Original, CAST(FORMAT(@F, N'G') AS VARCHAR) AS Formatted;
SET @F = 0.502184537571209
SELECT @F AS Original, CAST(@F AS VARCHAR) AS Formatted
UNION ALL SELECT @F AS Original, CONVERT(VARCHAR(128), @F, 128) AS Formatted
UNION ALL SELECT @F AS Original, CAST(FORMAT(@F, N'G') AS VARCHAR) AS Formatted;
I just came across a similar situation and was surprised at the rounding issues with very large numbers as presented in SSMS v17.9.1 / SQL Server 2017.
I am not suggesting I have a solution; however, I have observed that FORMAT presents a number that appears correct. I cannot claim that this reduces further rounding issues or is useful within a complicated mathematical function.
The T-SQL code below should demonstrate my observations and let others test their own code and ideas should the need arise.
WITH Units AS
(
SELECT 1.0 AS [RaisedPower] , 'Ten' As UnitDescription
UNION ALL
SELECT 2.0 AS [RaisedPower] , 'Hundred' As UnitDescription
UNION ALL
SELECT 3.0 AS [RaisedPower] , 'Thousand' As UnitDescription
UNION ALL
SELECT 6.0 AS [RaisedPower] , 'Million' As UnitDescription
UNION ALL
SELECT 9.0 AS [RaisedPower] , 'Billion' As UnitDescription
UNION ALL
SELECT 12.0 AS [RaisedPower] , 'Trillion' As UnitDescription
UNION ALL
SELECT 15.0 AS [RaisedPower] , 'Quadrillion' As UnitDescription
UNION ALL
SELECT 18.0 AS [RaisedPower] , 'Quintillion' As UnitDescription
UNION ALL
SELECT 21.0 AS [RaisedPower] , 'Sextillion' As UnitDescription
UNION ALL
SELECT 24.0 AS [RaisedPower] , 'Septillion' As UnitDescription
UNION ALL
SELECT 27.0 AS [RaisedPower] , 'Octillion' As UnitDescription
UNION ALL
SELECT 30.0 AS [RaisedPower] , 'Nonillion' As UnitDescription
UNION ALL
SELECT 33.0 AS [RaisedPower] , 'Decillion' As UnitDescription
)
SELECT UnitDescription
, POWER( CAST(10.0 AS FLOAT(53)) , [RaisedPower] ) AS ReturnsFloat
, CAST( POWER( CAST(10.0 AS FLOAT(53)) , [RaisedPower] ) AS NUMERIC (38,0) ) AS RoundingIssues
, STR( CAST( POWER( CAST(10.0 AS FLOAT(53)) , [RaisedPower] ) AS NUMERIC (38,0) ) , CAST([RaisedPower] AS INT) + 2, 0) AS LessRoundingIssues
, FORMAT( POWER( CAST(10.0 AS FLOAT(53)) , [RaisedPower] ) , '0') AS NicelyFormatted
FROM Units
ORDER BY [RaisedPower]

Implementing parts of rfc4226 (HOTP) in mysql

Like the title says, I'm trying to implement the programmatic parts of RFC4226 "HOTP: An HMAC-Based One-Time Password Algorithm" in SQL. I think I've got a version that works (in that for a small test sample, it produces the same result as the Java version in the code), but it contains a nested pair of hex(unhex()) calls, which I feel can be done better. I am constrained by a) needing to do this algorithm, and b) needing to do it in mysql, otherwise I'm happy to look at other ways of doing this.
What I've got so far:
-- From the inside out...
-- Concatenate the user's secret and the number of times it has been used
-- find the SHA1 hash of that string
-- Turn a 40 byte hex encoding into a 20 byte binary string
-- keep the first 4 bytes
-- turn those back into a hex representation
-- convert that into an integer
-- Throw away the most-significant bit (solves signed/unsigned problems)
-- Truncate to 6 digits
-- store into otp
-- from the otpsecrets table
select (conv(hex(substr(unhex(sha1(concat(secret, uses))), 1, 4)), 16, 10) & 0x7fffffff) % 1000000
into otp
from otpsecrets;
Is there a better (more efficient) way of doing this?
I haven't read the spec, but I think you don't need to convert back and forth between hex and binary, so this might be a little more efficient:
SELECT (conv(substr(sha1(concat(secret, uses)), 1, 8), 16, 10) & 0x7fffffff) % 1000000
INTO otp
FROM otpsecrets;
This seems to give the same result as your query for a few examples I tested.
This is absolutely horrific, but it works with my 6-digit OTP tokens. Call as:
select HOTP( floor( unix_timestamp()/60), secret ) 'OTP' from SecretKeyTable;
drop function if exists HOTP;
delimiter //
CREATE FUNCTION HOTP(C integer, K BINARY(64)) RETURNS char(6)
BEGIN
declare i INTEGER;
declare ipad BINARY(64);
declare opad BINARY(64);
declare hmac BINARY(20);
declare cbin BINARY(8);
set i = 1;
set ipad = repeat( 0x36, 64 );
set opad = repeat( 0x5c, 64 );
repeat
set ipad = insert( ipad, i, 1, char( ascii( substr( K, i, 1 ) ) ^ 0x36 ) );
set opad = insert( opad, i, 1, char( ascii( substr( K, i, 1 ) ) ^ 0x5C ) );
set i = i + 1;
until (i > 64) end repeat;
set cbin = unhex( lpad( hex( C ), 16, '0' ) );
set hmac = unhex( sha1( concat( opad, unhex( sha1( concat( ipad, cbin ) ) ) ) ) );
return lpad( (conv(hex(substr( hmac, (ascii( right( hmac, 1 ) ) & 0x0f) + 1, 4 )),16,10) & 0x7fffffff) % 1000000, 6, '0' );
END
//
delimiter ;
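For cross-checking any SQL implementation, it helps to have a compact reference to compare against. The following Python sketch follows the three steps of RFC 4226, section 5.3 (HMAC-SHA-1 over the 8-byte counter, dynamic truncation, reduction to the requested number of digits). The function name hotp and its signature are assumptions made for this example. Note that the first two queries in this thread hash the concatenated secret and counter with plain SHA1 and always take the first four bytes, so their results should not be expected to match this computation.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an HOTP value following the steps of RFC 4226, section 5.3."""
    # Step 1: HMAC-SHA-1 over the counter as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Step 2: dynamic truncation; the low nibble of the last byte is the offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Step 3: keep the requested number of decimal digits, zero-padded.
    return str(code % 10 ** digits).zfill(digits)

# Test vectors from RFC 4226, Appendix D (secret "12345678901234567890").
print(hotp(b"12345678901234567890", 0))  # 755224
print(hotp(b"12345678901234567890", 1))  # 287082
```

Running the stored function with the same secret and counter values and comparing against these test vectors is a simple way to validate it.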