GAMS csv read issue - csv

I'm trying to read a .csv file with the following format using MAC:
;lon;lat
0;55,245594;25,066697
1;55,135613;25,070419
2;55,275683;25,203425
What I am doing so far is:
$call csv2gdx coords.csv id=d index=1 values=2..lastCol useHeader=y
sets
i
c /x,y/
;
parameters
dloc(i,c) 'locations'
;
$gdxin clients_csv.gdx
$load ___ ?
What I want to do is read the lat,lon coordinates in the parameter dloc so as for each i to have a pair of coords c, i.e. lat, lon.
Example output:
x y
i1 17.175 84.327

Running your code produces an error from csv2gdx:
*** ErrNr = 15 Msg = Values(s) column number exceeds column count; Index = 2, ColCnt = 1
By default, csv2gdx expects entries separated by commas, which your data does not use. You could select a semicolon or tab as the separator by means of an option, but if the data really has the format you posted, you do not need to call csv2gdx at all. You can include the data directly like this:
Sets
i
c
;
Table dloc(i<,c<) 'locations'
$include coords.csv
;
Display dloc;
EDIT after change of input data format:
The error message is still the same, and so is the reason: you use a field separator other than the default one. Once you switch it with the option fieldSep=semiColon, you will find that your decimal separator is also non-default for csv2gdx, but this can be changed as well. Here is the whole code (with an adjusted csv2gdx call and adjustments for data loading). Note that sets i and c are defined implicitly when dloc is loaded, because of the < syntax in the declaration of dloc.
$call csv2gdx coords.csv id=d index=1 values=2..lastCol useHeader=y fieldSep=semiColon decimalSep=comma
Sets
i
c
;
parameters
dloc(i<,c<) 'locations'
;
$gdxin coords.gdx
$load dloc=d
Display dloc;
$exit
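To see why csv2gdx reported ColCnt = 1, here is a small sketch outside GAMS, using Python's standard csv module on the data from the question. It is only an illustration of the two separator settings, not a description of how csv2gdx works internally; the names raw, header_default, and dloc are chosen for this example.

```python
import csv
import io

# The file content from the question: ';' separates fields, ',' is the decimal mark.
raw = """;lon;lat
0;55,245594;25,066697
1;55,135613;25,070419
2;55,275683;25,203425
"""

# With the default comma separator the header line contains no comma,
# so the whole line is read as a single column.
header_default = next(csv.reader(io.StringIO(raw)))

# With ';' as the field separator the header has three columns.
rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
header, data = rows[0], rows[1:]

# Convert the decimal comma to a decimal point before parsing the numbers.
dloc = {
    (row[0], col): float(value.replace(",", "."))
    for row in data
    for col, value in zip(header[1:], row[1:])
}

print(len(header_default), len(header))
print(dloc[("0", "lon")], dloc[("0", "lat")])
```

The first print shows one column under the default separator versus three with ';', which mirrors the "Index = 2, ColCnt = 1" complaint.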


matrix operation not returning correctly

R = [cos(pi/3) sin(pi/3); -sin(pi/3) cos(pi/3)]
[i,j]=round([1 1] * R)
returns
i =
-0 1
error: element number 2 undefined in return list
While I want i=0 and j=1
Is there a way to work around that? Or just Octave being stupid?
Octave is not being stupid; you expect the syntax [a,b] = [c,d] to perform 'destructuring', but that is not how Octave/MATLAB works. Instead, you are assigning a single output (a matrix) to two variables. Since the expression does not generate multiple outputs, there is nothing to assign to the second variable you specify (i.e. j), which is why you get the "element number 2 undefined" error.
Long story short, if you're after a 'destructuring' effect, you can convert your matrix to a cell, and then perform cell expansion to generate two outputs:
[i,j] = num2cell( round( [1 1] * R ) ){:}
Or, obviously, you can collect the output into a single object, and then assign to i, and j separately via that object:
IJ = round( [1 1] * R )
i = IJ(1)
j = IJ(2)
but presumably that's what you're trying to avoid.
Explanation:
The reason [a,b] = bla bla doesn't work, is because syntactically speaking, the [a,b] here isn't a normal matrix; it represents a list of variables you expect to assign return values to. If you have a function or operation that returns multiple outputs, then each output will be assigned to each of those variables in turn.
However, if you only produce a single output but specify multiple return variables, Octave assigns that single output to the first return variable and has nothing left for the rest. Since a matrix is a single object, it is assigned to i, and j remains undefined.
Converting the whole thing to a cell allows you to then index it via {:}, which returns all cells as a comma separated list (this can be used to pass multiple arguments into functions, for instance). You can see this if you just index without capturing - this results in 'two' answers, printed one after another:
num2cell( round( [1 1] * R ) ){:}
% ans = 0
% ans = 1
Note that many functions in matlab/octave behave differently, based on whether you call them with 1 or 2 output arguments. In other words, think of the number of output arguments with which you call a function to be part of its signature! E.g., have a look at the ind2sub function:
[r] = ind2sub([3, 3], [2,8]) % returns 1D indices
% r = 2 8
[r, ~] = ind2sub([3, 3], [2,8]) % returns 2D indices
% r = 2 2
If destructuring worked the way you assumed on normal matrices, it would be impossible to know if one is attempting to call a function in "two-outputs" mode, or simply trying to call it in "one-output" mode and then destructure the output.

When creating a namedtuple, how do I substitute a value?

I'm pulling data from the NHTSA API, using a JSON format. I'm then creating a named tuple from this data and a few other sources and using this as a record to insert into a MySQL database.
The NHTSA API uses '' to designate a null value, which is not an accepted value for this particular column in the database. The column only allows a float datatype.
When creating my named tuple, is there a way to substitute None if a specific value is returned? I.e. if API call returns '', use None instead?
Error returned is
Failed inserting object into MySQL table Error while executing statement: Data truncated for column 'weight' at row 1
Tuples are immutable, hence you need to create a new tuple.
Here's an example:
old = (1,2,'ABC','','','','text')
new = tuple(None if x == '' else x for x in old)
Now new contains:
(1, 2, 'ABC', None, None, None, 'text')
To replace one specific field value in a namedtuple / NamedTuple more easily, you can use the _replace() method.
from collections import namedtuple
Point = namedtuple('Point', 'x,y')
p = Point(x=11, y=22)
p = p._replace(x=33)
print(p)
It will print:
Point(x=33, y=22)
_replace() takes the fields to change as keyword arguments and returns a new namedtuple with those values substituted and all other values copied from the original namedtuple.
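Putting the two answers together for the case in the question, here is a minimal sketch. The record type VehicleRecord, its field names, the helper empty_to_none, and the sample api_row are all made up for illustration; the real NHTSA response will have different keys.

```python
from collections import namedtuple

# Hypothetical record layout; only 'weight' comes from the error message.
VehicleRecord = namedtuple("VehicleRecord", ["make", "model", "weight"])


def empty_to_none(value):
    """Map the API's '' placeholder to None; pass everything else through."""
    return None if value == "" else value


# Hypothetical API result in which the weight is missing.
api_row = {"make": "HONDA", "model": "Civic", "weight": ""}

# Option 1: substitute while building the namedtuple.
record = VehicleRecord(
    **{field: empty_to_none(api_row[field]) for field in VehicleRecord._fields}
)

# Option 2: fix an existing namedtuple afterwards with _replace().
raw = VehicleRecord("HONDA", "", "")
cleaned = raw._replace(**{f: None for f, v in raw._asdict().items() if v == ""})

print(record)
print(cleaned)
```

Substituting at construction time keeps the '' values from ever reaching the insert statement; the _replace() variant is useful when the tuple is created elsewhere.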

Kaggle competition submission error : The value '' in the key column '' has already been defined

This is my first time participating in a Kaggle competition and I'm having trouble submitting my result table. I built my model using gbm and made a prediction table as shown below. The submission file has 2 columns (named 'fullVisitorId' and 'PredictedLogRevenue'), as in any other Kaggle competition.
pred_oob = predict(object = model_gbm, newdata = te_df, type = 'response')
mysub = data.frame(fullVisitorId = test$fullVisitorId, Pred = pred_oob)
mysub = mysub %>%
group_by(fullVisitorId) %>%
summarise(Predicted = sum(Pred))
submission = read.csv('sample_submission.csv')
mysub = submission %>%
left_join(mysub, by = 'fullVisitorId')
mysub$PredictedLogRevenue = NULL
names(mysub) = names(submission)
But when I try to submit the file, I got the 'fail' message saying ...
ERROR: The value '8.893887e+17' in the key column 'fullVisitorId' has already been defined (Line 549026, Column 1)
ERROR: The value '8.895317e+18' in the key column 'fullVisitorId' has already been defined (Line 549126, Column 1)
ERROR: The value '8.895317e+18' in the key column 'fullVisitorId' has already been defined (Line 549127, Column 1)
Not just 3 lines, but 8 more lines like this.
I have no idea what I did wrong. I also checked other kernels but couldn't find the answer. Please...help!!
This issue occurred because fullVisitorId was read as numeric instead of character, so it dropped all the leading zeros. Using read.csv() with the colClasses argument, or fread(), makes it work.
I am leaving this here in case someone else runs into similar trouble.
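The effect can be reproduced in a few lines of Python. This is only an illustration with made-up IDs (they are not taken from the competition data): long numeric IDs lose digits when stored as floating point, so distinct IDs collapse into one value, and leading zeros disappear as soon as an ID is parsed as a number.

```python
# Made-up 19-digit IDs that differ only in the last digit.
ids = ["8895317000000000001", "8895317000000000002"]

# Parsed as floating point, both map to the same value (a double carries
# only about 15 to 17 significant decimal digits).
as_float = [float(s) for s in ids]

# Kept as strings, they stay distinct.
as_str = list(ids)

# A made-up ID with leading zeros loses them when parsed as a number.
padded = "0012345"
round_trip = str(int(padded))

print(len(set(as_float)), len(set(as_str)))
print(padded, round_trip)
```

Two different visitors ending up with the same key is the kind of collision that a "has already been defined" error points to, which is why the key column should be read as text.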
For creating submission dataframe, the easiest way is this
subm_df = pd.read_csv('../input/sample_submission.csv')
subm_df['PredictedLogRevenue'] = <your prediction array>
subm_df.to_csv('Subm_1.csv', index=False)
Note that this assumes your sample_submission.csv contains every fullVisitorId, which it usually does on Kaggle. Following this approach, I have never faced any issues.

how to iterate over xlsx data in octave with mixed types

I am trying to read a simple xlsx file with xlsread in octave. Its csv version is shown below:
2,4,abc,6
8,10,pqr,12
14,16,xyz,18
I am trying to read and write the contents with this code:
[~, ~, RAW] = xlsread('file.xlsx');
allData = cell2mat(RAW); # error with cell2mat()
printf('data nrows=%d, ncolms=%d\n', rows(allData), columns(allData));
for i=1:rows(allData)
for j=1:columns(allData)
printf('data(%d,%d) = %d\n', i,j, allData(i,j));
endfor
endfor
and I am getting the following error:
error: cell2mat: wrong type elements or mixed cells, structs, and matrices
I have experimented with several variations of this problem:
(A) If I delete the column with the text data, ie the xlsx file contains only numbers, then this code works fine.
(B) On the other hand, if I delete the cell2mat() call even for the purely number xlsx, I get an error during the cell access:
error: printf: wrong type argument 'cell'
(C) If I use cell2mat() during printf, like this:
printf('data(%d,%d) = %d\n', i,j, cell2mat(allData(i,j)));
I get correct data for the integers, and garbage for the text items.
So, how can I access and print each cell of the xlsx data, when the xlsx contains mixed-type data?
In other words, given a column index, and given that I know what type of data I am expecting there (integer or string), so how can I re-format the cell type before using it?
A numeric array cannot hold data of multiple classes, hence cell2mat fails. Cell arrays are used to hold such data, and you already have it in a cell array, so there is no need for a conversion: just skip that line (allData = cell2mat(RAW);).
Within the loop, you have this line:
printf('data(%d,%d) = %d\n', i, j, allData(i,j) );
% ↑ ↑ ↑
% 1 2a 2b
The problems are marked by the up-arrows.
1. You have mixed data in your cell array, but you are using %d as the format specifier. You can fix this by converting all of your data to strings and then using %s as the specifier.
2. If you use parentheses ( ) for indexing a cell array, you get a cell. What you need here is the content of that cell, and braces { } are used for that.
So it will be:
printf('data(%d,%d) = %s\n', i,j, num2str(RAW{i,j}));
Note that instead of all that, you can simply just enter RAW to get this:
octave:1> RAW
RAW =
{
[1,1] = 2
[2,1] = 8
[3,1] = 14
[1,2] = 4
[2,2] = 10
[3,2] = 16
[1,3] = abc
[2,3] = pqr
[3,3] = xyz
[1,4] = 6
[2,4] = 12
[3,4] = 18
}

How to import comma delimited text file into datawindow (powerbuilder 11.5)

Hi good day I'm very new to powerbuilder and I'm using PB 11.5
Does anyone know how to import a comma-delimited text file into a datawindow?
Example Text file
"1234","20141011","Juan, Delacruz","Usa","001992345456"...
"12345","20141011","Arc, Ino","Newyork","005765753256"...
How can I import the third column, which is the full name, and the last column, which is the account number? I want to transfer the name and account number into my external data window. I've tried ImportString, but each whole row is transferred into a single column. My external data window has three fields, among them the name and the account number.
Here's the code
ls_File = dw_2.Object.file_name[1]
li_FileHandle = FileOpen(ls_File)
li_FileRead = FileRead(li_FileHandle, ls_Text)
DO WHILE li_FileRead > 0
li_Count ++
li_FileRead = FileRead(li_FileHandle, ls_Text)
ll_row = dw_1.ImportString(ls_Text,1)
Loop.
Please help me with the code! Thank You
It seems that PB expects a tab-separated file by default (even though the 'c' in 'csv' stands for 'comma').
Add the csv! enumerated value to the arguments of ImportString() and it should fix the problem (it does on my test box).
Also, the columns defined in your dataobject must match the columns in the csv file (at least the first columns you are interested in). If there are more columns in the csv file, they will be ignored. But if you want to get the 1st (or 2nd) and 3rd columns, you need to define the first 3 columns. You can always hide #1 or #2 if you do not need it.
BTW, your code has some issues:
You should always test the return values of function calls like FileOpen() so that processing stops if the file does not exist or cannot be read.
You read the text file twice for the first row: once before the loop and once inside it, before the import. Or maybe that is intended, to skip a first line with column headers?
FWIW, here is working code based on yours:
string ls_file = "c:\dev\powerbuilder\experiment\data.csv"
string ls_text
int li_FileHandle, li_fileread, li_count
long ll_row
li_FileHandle = FileOpen(ls_File)
if li_FileHandle < 1 then
return
end if
li_FileRead = FileRead(li_FileHandle, ls_Text)
DO WHILE li_FileRead > 0
li_Count ++
ll_row = dw_1.ImportString(csv!,ls_Text,1)
li_FileRead = FileRead(li_FileHandle, ls_Text)//read next line
Loop
fileclose(li_fileHandle)
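To show what CSV-aware parsing does differently from a plain split on commas, here is a short sketch in Python rather than PowerScript. It uses only the five fields visible in the sample rows (the rows in the question are truncated after that) and assumes the account number is the fifth field; the variable names are invented for this example.

```python
import csv
import io

# The visible part of the sample rows from the question.
sample = (
    '"1234","20141011","Juan, Delacruz","Usa","001992345456"\n'
    '"12345","20141011","Arc, Ino","Newyork","005765753256"\n'
)

# A naive split breaks the quoted name in two because it contains a comma.
naive_fields = sample.splitlines()[0].split(",")

# A CSV parser honors the quotes and keeps the name as one field.
records = list(csv.reader(io.StringIO(sample)))

# Third column is the full name; the fifth is assumed to be the account number.
names_and_accounts = [(row[2], row[4]) for row in records]

print(len(naive_fields), len(records[0]))
print(names_and_accounts)
```

The quoted names are the reason the import has to be told the data is CSV: a parser that does not honor the quotes sees six fields in the first row instead of five.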
Use the datawindow_name.importfile(CSV!,file_path) method.