COBOL .csv File IO into Table Not Working

I am trying to learn COBOL, as I have heard of it and thought it would be fun to take a look at. I came across Micro Focus COBOL (not really sure if that is pertinent to this post), and since I like to write in Visual Studio it was enough incentive to try to learn it.
I've been reading a lot about it and trying to follow documentation and examples. So far I've gotten user input and output to the console working, so I decided to try file IO. That went OK when I was just reading in a 'record' at a time (I realize 'record' may be incorrect jargon). Although I've been programming for a while, I am an extreme noob with COBOL.
I have a C++ program that I wrote before which simply takes a .csv file, parses it, and then sorts the data by whatever column the user wants. I figured it wouldn't be too hard to do the same in COBOL. Well, apparently I have misjudged in this regard.
I have a file, edited in Windows using Notepad++, called test.csv which contains:
4001942600,140,4
4001942700,141,3
4001944000,142,2
This data is from the US census, which has column headers titled GEOID, SUMLEV, and STATE. I removed the header row, since I couldn't figure out how to read it in at the time, and then read in the other data. Anywho...
In Visual Studio 2015, on Windows 7 Pro 64-bit, using Micro Focus, and step debugging, I can see in-record containing the first row of data. The UNSTRING works fine for that run, but the next time the program loops I can step debug, view in-record, and see that it contains the new data; however, when I expand the watch elements, the watch display looks like the following:
REC-COUNTER 002 PIC 9(3)
+ IN-RECORD {Length = 42} : "40019427004001942700 000 " GROUP
- GEOID {Length = 3} PIC 9(10)
GEOID(1) 4001942700 PIC 9(10)
GEOID(2) 4001942700 PIC 9(10)
GEOID(3) <Illegal data in numeric field> PIC 9(10)
- SUMLEV {Length = 3} PIC 9(3)
SUMLEV(1) <Illegal data in numeric field> PIC 9(3)
SUMLEV(2) 000 PIC 9(3)
SUMLEV(3) <Illegal data in numeric field> PIC 9(3)
- STATE {Length = 3} PIC X
STATE(1) PIC X
STATE(2) PIC X
STATE(3) PIC X
So I'm not sure why, just before the UNSTRING operation the second time around, I can see the proper data, but after the UNSTRING happens, incorrect data is stored in the 'table'. What is also interesting is that if I continue, the third time around the correct data is stored in the 'table'.
identification division.
program-id. endat.
environment division.
input-output section.
file-control.
    select in-file assign to "C:/Users/Shittin Kitten/Google Drive/Embry-Riddle/Spring 2017/CS332/group_project/cobol1/cobol1/test.csv"
        organization is line sequential.
data division.
file section.
fd in-file.
01 in-record.
    05 record-table.
        10 geoid occurs 3 times pic 9(10).
        10 sumlev occurs 3 times pic 9(3).
        10 state occurs 3 times pic X(1).
working-storage section.
01 switches.
    05 eof-switch pic X value "N".
* declaring a local variable for counting
01 rec-counter pic 9(3).
* Defining constants for new line and carriage return. \n \r DNE in cobol!
78 NL value X"0A".
78 CR value X"0D".
78 TAB value X"09".
******** Start of Program ******
000-main.
    open input in-file.
    perform
        perform 200-process-records
            until eof-switch = "Y".
    close in-file;
    stop run.
*********** End of Program ************
******** Start of Paragraph 2 *********
200-process-records.
    read in-file into in-record
        at end move "Y" to eof-switch
        not at end compute rec-counter = rec-counter + 1;
    end-read.
    Unstring in-record delimited by "," into
        geoid in record-table(rec-counter),
        sumlev in record-table(rec-counter),
        state in record-table(rec-counter).
    display "GEOID " & TAB & ">> " & TAB & geoid of record-table(rec-counter).
    display "SUMLEV >> " & TAB & sumlev of record-table(rec-counter).
    display "STATE " & TAB & ">> " & TAB & state of record-table(rec-counter) & NL.
************* End of Paragraph 2 **************
I'm very confused about why I can actually see the data after the read operation, but it isn't stored in the table. I have tried changing the declarations of the table to pic 9(some length) as well, and the result changes, but I can't seem to pinpoint what I'm not getting about this.

I think there are a few things you've not grasped yet, and which you need to.
In the DATA DIVISION, there are a number of SECTIONs, each of which has a specific purpose.
The FILE SECTION is where you define data structures which represent data on files (input, output or input-output). Each file has an FD, and subordinate to an FD will be one or more 01-level structures, which can be extremely simple, or complex.
Some of the exact behaviour is down to the particular implementation of a compiler, but you should treat things this way, for your own "minimal surprise" and for the sake of anyone who has to later amend your programs: for an input file, don't change the data after a READ, unless you are going to update the record (or if you are using a keyed READ, perhaps). You can regard the "input area" as a "window" onto your data file: the next READ, and the window is pointed at a different position. Alternatively, you can regard it as "the next record arrives, obliterating what was there previously". You have put the "result" of your UNSTRING into the record area. That result will for sure disappear on the next READ. You also have the possibility (if the window view is true for your compiler, and depending on the mechanism it uses for IO) of squishing the "following" data as well.
Your result should be in the WORKING-STORAGE, where it will remain undisturbed by new records being read.
READ filename INTO data-description is an implicit MOVE of the data from the record area to data-description. If, as you have specified, data-description is the record area itself, the result is "undefined". If you only want the data in the record area, a plain READ filename is all that is needed.
You have a similar issue with your original UNSTRING. You have the source and target fields referencing the same storage. "Undefined" and not the result you want. This is why the unnecessary UNSTRING "worked".
You have a redundant inline PERFORM. You process "something" after end-of-file. You make things more convoluted by using unnecessary "punctuation" in the PROCEDURE DIVISION (whose header you've apparently omitted from your paste). Try using ADD instead of COMPUTE there. Look at the use of FILE STATUS, and of 88-level condition-names.
You don't need a "new line" for DISPLAY, because you get one for free unless you use NO ADVANCING.
You don't need to "concatenate" in the DISPLAY, because you get that for free as well.
DISPLAY and its cousin, ACCEPT, are the verbs (only intrinsic functions are functions in COBOL, except where your compiler supports user-defined functions) which vary the most from compiler to compiler. If your compiler supports SCREEN SECTION in the DATA DIVISION, you can format and process user input in "screens". If you were to use IBM's Enterprise COBOL, you'd have only very basic DISPLAY/ACCEPT.
You "declare a local variable". Do you? In what sense? Local to the program.
You can pick up quite a lot of tips by looking at COBOL questions here from the last few years.
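Putting those points together, a sketch of the shape this could take. Treat it as an untested outline, not a definitive implementation: the ws- field names, the table size, and the "10" end-of-file status value are my assumptions, and your SELECT would need a file status is ws-file-status clause added for the 88-level to work.
* results live in WORKING-STORAGE, so the next READ cannot disturb them
01 ws-file-status pic XX.
    88 end-of-input value "10".
01 rec-counter pic 9(3) value zero.
01 ws-census-table.
    05 ws-census-entry occurs 200 times.
        10 ws-geoid pic 9(10).
        10 ws-sumlev pic 9(3).
        10 ws-state pic X.

procedure division.
000-main.
    open input in-file
* plain READ: the data is only needed in the record area
    read in-file
    perform until end-of-input
        add 1 to rec-counter
* source is the record area, targets are working-storage: no overlap
        unstring in-record delimited by ","
            into ws-geoid (rec-counter)
                 ws-sumlev (rec-counter)
                 ws-state (rec-counter)
* no concatenation and no explicit new line needed
        display "GEOID  >> " ws-geoid (rec-counter)
        display "SUMLEV >> " ws-sumlev (rec-counter)
        display "STATE  >> " ws-state (rec-counter)
        read in-file
    end-perform
    close in-file
    stop run.
Note the OCCURS is on the group item, so each row's three fields stay together in storage. With three separate OCCURS fields, as in your layout, the entries interleave in storage, which is part of what the debugger watch was showing you.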

Well, I figured it out. While step debugging again and hovering the mouse over record-table, I noticed 26 white spaces present after the last data field. Earlier tonight I had attempted to change this data on the 'fly', as it were, because normally Visual Studio allows this. I attempted to make the change but did not verify that it took; normally I don't have to, but apparently it did not. I should have known better, since the icon displayed to the left of record-table shows a little closed padlock.
I normally program in C, C++, and C#, so when I see the little padlock it usually has something to do with scoping and visibility. Not knowing COBOL well enough, I overlooked this little detail.
Now I decided to do unstring in-record delimited by spaces into temp-string. just prior to the existing
Unstring temp-string delimited by "," into
geoid in record-table(rec-counter),
sumlev in record-table(rec-counter),
state in record-table(rec-counter).
The result of this was the properly formatted data, at least as I understand it, stored into the table and printed to the console screen.
Now I have read that the UNSTRING verb can utilize multiple delimiters, so I may try to combine these two unstring operations into one.
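If that works the way I read it, the combined statement would look something like this (untested on my part, so take it with a grain of salt; the ws- names are hypothetical working-storage copies of the table fields, since targets inside in-record itself would overlap the source again):
Unstring in-record delimited by "," or all " " into
    ws-geoid (rec-counter),
    ws-sumlev (rec-counter),
    ws-state (rec-counter).
The all " " should swallow the run of trailing spaces that the first unstring was there to strip off.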
Cheers!
**** Update ****
I have read Mr. Woodger's reply above. I would like to ask for a bit more assistance with this. I have also read this post, which is similar but above my level at this time: COBOL read/store in table.
That is pretty much what I'm trying to do, but I don't understand some of the things Mr. Woodger is trying to explain. Below is the code, a bit more refined, with some questions I have as comments. I would very much like some assistance with this, or maybe an offline conversation would be fine too.
identification division.
* I do not know what 'endat' is
program-id. endat.
environment division.
input-output section.
file-control.
* assign a file path to in-file
    select in-file assign to "C:/Users/Shittin Kitten/Google Drive/Embry-Riddle/Spring 2017/CS332/group_project/cobol1/cobol1/test.csv"
* Is line sequential what I need here? I think it is
        organization is line sequential.
* Is the data division similar to typedef in C?
data division.
* Does the file section belong to data division?
file section.
* Am I doing this correctly? Should this be below?
fd in-file.
* I believe I am defining a structure at this point
01 in-record.
    05 record-table.
        10 geoid occurs 3 times pic A(10).
        10 sumlev occurs 3 times pic A(3).
        10 state occurs 3 times pic A(1).
* To me the working-storage section is similar to an Ada declarative section;
* is this a correct analogy?
working-storage section.
* Is this where in-record should go? Is in-record a representative name?
01 eof-switch pic X value "N".
01 rec-counter pic 9(1).
* I don't know if I need these
78 NL value X"0A".
78 TAB value X"09".
01 sort-col pic 9(1).
********************************* Start of Program ****************************
* Now the procedure division; this is a lot like Ada to me
procedure division.
* Open the file
    perform 100-initialize.
* Read data
    perform 200-process-records
* loop until eof
        until eof-switch = "Y".
* ask the user to sort by a column
    display "Which column would you like to bubble sort? " & TAB.
* get user input
    accept sort-col.
* close file
    perform 300-terminate.
* End program
    stop run.
********************************* End of Program ****************************
******************************** Start of Paragraph 1 ************************
100-initialize.
    open input in-file.
* Performing a read, but what is the difference between this read and the one in
* paragraph 200? Why do I do this here instead of just opening the file?
    read in-file
        at end
            move "Y" to eof-switch
        not at end
* Should I do this addition here? Also, why a semicolon?
            add 1 to rec-counter;
    end-read.
* Should I not be unstringing here?
    Unstring in-record delimited by "," into geoid of record-table,
        sumlev of record-table, state of record-table.
******************************** End of Paragraph 1 ************************
********************************* Start of Paragraph 2 **********************
200-process-records.
    read in-file into in-record
        at end move "Y" to eof-switch
        not at end add 1 to rec-counter;
    end-read.
* Should in-record be something else? I think so but don't know how to
* declare and use it
    Unstring in-record delimited by "," into
        geoid in record-table(rec-counter),
        sumlev in record-table(rec-counter),
        state in record-table(rec-counter).
* These lines seem to give the printed format that I want
    display "GEOID " & TAB & ">> " & TAB & geoid of record-table(rec-counter).
    display "SUMLEV >> " & TAB & sumlev of record-table(rec-counter).
    display "STATE " & TAB & ">> " & TAB & state of record-table(rec-counter) & NL.
********************************* End of Paragraph 2 ************************
********************************* Start of Paragraph 3 ************************
300-terminate.
    display "number of records >>>> " rec-counter;
    close in-file;
**************************** End of Paragraph 3 *****************************
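For what it's worth, below is the loop skeleton I think is being suggested, with a priming read in 100-initialize so that 200-process-records always works on a record that has already been read. This is my own reconstruction and is untested; the unstring is elided to a comment:
procedure division.
000-main.
    perform 100-initialize
    perform 200-process-records
        until eof-switch = "Y"
    perform 300-terminate
    stop run.

100-initialize.
    open input in-file
* the priming read: fetch the first record (or set eof on an empty file)
    read in-file
        at end move "Y" to eof-switch
    end-read.

200-process-records.
* handle the record fetched by the previous read
    add 1 to rec-counter
* unstring in-record into working-storage fields here,
* then fetch the next record for the next pass
    read in-file
        at end move "Y" to eof-switch
    end-read.

300-terminate.
    display "number of records >>>> " rec-counter
    close in-file.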

Related

Q/KDB+ / CSV upload and WSFULL

Pardon me, but I'm a Q novice and couldn't find a solution. The code below appends a four-column CSV file to a KDB+ database. This code worked well, but now that my database is large it throws a WSFULL error. Perhaps there is a more memory-efficient way to write it. Please help:
// FILE_LOADER.q
\c 520 500
if [(count .z.x) < 1;
show `$"usage: q loadcsv.q inputfile destfile
where inputfile and destfile are absolute or relative paths to
the files. Inputfile has the following fields:
DATE, TICKER, FIELD, VALUE. DATE is of type date,
TICKER and FIELD are strings, and VALUE is converted to a float.
Any string VALUEs will show up as nulls.";
exit 1
]
f1: hsym `$.z.x[0]
f2: hsym `$.z.x[1]
columns: `DATE`TICKER`FIELD`VALUE
if [() ~ key f1; show ("Input file '",.z.x[0],"' not found");exit 1]
x: .Q.fsn[{f2 upsert flip columns!("DSSF";",")0:x};f1;4194000000]
show ("loaded ",(string x)," characters into the kdb database")
exit 0
First, just from trying this out, I assume your input CSV file never has a header? If it does, you'll need a slight code change so kdb is aware.
You are correct that it's a memory issue, so what you can do is just decrease the chunk size. You are reading in 4194000000 bytes at a time right now; try lowering this in accordance with available memory.
If you are still seeing issues, it may be your garbage collection setting. You could force a gc after each read/upsert:
.Q.fsn[{f2 upsert flip columns!("DSSF";",")0:x;.Q.gc[]};f1;4194000000]

Opening a file of varying row and column structure in Scilab

I habitually use csvRead in Scilab to read my data files; however, I am now faced with one which contains blocks of 200 rows, preceded by 3 lines of headers, all of which I would like to take into account.
I've tried specifying a range of data following the example on the Scilab help website for csvRead (the example is right at the bottom of the page: https://help.scilab.org/doc/6.0.0/en_US/csvRead.html), but I always come out with the same error messages:
The line and/or colmun indices are outside of the limits
or
Error in the column structure.
My first three lines are headers, which I know can cause a problem, but even if I omit them from my block range, I still have the same problem.
Otherwise, my data is ordered such that I have my three lines of headers (two lines containing a header over just one or two columns, one line containing a header over all columns), 200 lines of data, and a blank line. This represents data from one image, and I have about 500 images in the file. I would like to be able to read and process all of them and keep track of the headers, because they state the image number, which I need to reference later. Example:
DTN-dist_Devissage-1_0006_0,,,,,,
L0,,,,,,
X [mm],Y [mm],W [mm],exx [1] - Lagrange,eyy [1] - Lagrange,exy [1] - Lagrange,Von Mises Strain [1] - Lagrange
-1.13307,-15.0362,-0.00137507,7.74679e-05,8.30045e-05,5.68249e-05,0.00012711
-1.10417,-14.9504,-0.00193334,7.66086e-05,8.02914e-05,5.43132e-05,0.000122655
-1.07528,-14.8647,-0.00249155,7.57493e-05,7.75786e-05,5.18017e-05,0.0001182
Does anyone have a solution to this?
My current code, following an adapted version of the Scilab help example, looks like this (I have tried varying the blocksize and iblock values to include/omit headers):
blocksize=200;
C1=1;
C2=14;
iblock=1
while (%t)
    R1=(iblock-1)*blocksize+4;
    R2=blocksize+R1-1;
    irange=[R1 C1 R2 C2];
    V=csvRead(filepath+filename,",",".","",[],"",irange);
    iblock=iblock+1
end
The CSV
A lot of your problem comes from the inconsistent number of commas in your CSV file. Opening it in LibreOffice Calc and saving it puts the right number of commas, even on empty lines.
R1
Your current code doesn't position R1 at the beginning of the values. The right formula is
R1=(iblock-1)*(blocksize+blanksize+headersize)+1+headersize;
(with headersize=3, blocksize=200, and blanksize=1 for the blank line between blocks, block 1 starts reading values at line 4, as expected).
End of file
Currently your code raises an error at the end of the file, because R1 becomes greater than the number of lines. To solve this, you can specify the maximum number of blocks or test the value of R1 against the number of lines.
Improved solution for a much bigger file
When solving your problem with a big file, two issues were raised:
We need to know the number of blocks or the number of lines.
Each call of csvRead is really slow, because it processes the whole file at each call (1 s per block!).
My idea was to read the whole file and store it in a string matrix (since mgetl has been improved in 6.0.0), then use csvTextScan on a submatrix. Doing so also removes the manual writing of the number of blocks/lines.
The code follows:
clear all
clc
s = filesep()
filepath='.'+s;
filename='DTN_full.csv';
// the header is important, as it has the image name
headersize=3;
blocksize=200;
C1=1;
C2=14;
iblock=1
// let's save everything. Good for the example.
bigstruct = struct();
// Read all the values in one pass;
// then using csvTextScan is much more efficient
text = mgetl(filepath+filename);
nlines = size(text,'r');
while ( %t )
    mprintf("Block #%d",iblock);
    // Let's read the header
    R1=(iblock-1)*(headersize+blocksize+1)+1;
    R2=R1 + headersize-1;
    // if R1 or R2 is bigger than the number of lines, stop
    if sum([R1,R2] > nlines )
        mprintf('; End of file\n')
        break
    end
    // We use csvTextScan only on the lines that matter.
    // This speeds up the program, since csvRead reads the whole file
    // every time it is used.
    H=csvTextScan(text(R1:R2),",",".","string");
    mprintf("; %s",H(1,1))
    R1 = R1 + headersize;
    R2 = R1 + blocksize-1;
    if sum([R1,R2]> nlines )
        mprintf('; End of file\n')
        break
    end
    mprintf("; rows %d to %d\n",R1,R2)
    // Let's read the values
    V=csvTextScan(text(R1:R2),",",".","double");
    iblock=iblock+1
    // Let's save these data
    bigstruct(H(1,1)) = V;
end
and returns
Block #1; DTN-dist_0005_0; rows 4 to 203
....
Block #178; DTN-dist_0710_0; rows 36112 to 36311
Block #179; End of file
Time elapsed 1.827092s

What does 'multiline strings are different' mean in RIDE (Robot Framework) output?

I am trying to compare the data of two CSV files, and followed the below process in RIDE:
${csvA} = Get File ${filePathA}
${csvB} = Get File ${filePathB}
Should Be Equal As Strings ${csvA} ${csvB}
Here are my two CSV files' contents:
csvA data
Harshil,45,8.03,DMJ
Divy,55,8,VVN
Parth,1,9,vvn
kjhjmb,44,0.5,bugg
csvB data
Harshil,45,8.03,DMJ
Divy,55,78,VVN
Parth,1,9,vvnbcb
acc,5,6,afafa
As some of the data does not match, when I run the code in RIDE the result is FAIL. But the log shows the data below:
Multiline strings are different:
--- first
+++ second
@@ -1,4 +1,4 @@
Harshil,45,8.03,DMJ
-Divy,55,8,VVN
-Parth,1,9,vvn
-kjhjmb,44,0.5,bugg
+Divy,55,78,VVN
+Parth,1,9,vvnbcb
+acc,5,6,afafa
I would like to know the meaning of the --- first, +++ second, and @@ -1,4 +1,4 @@ content.
Thanks in advance!
When Robot compares multiline strings (data that has newlines in it), it shows the differences as a unified diff, the format produced by the standard Unix tool diff. Even though you pass in raw data, it's treating the data as two files and showing the differences between the two in a format familiar to most programmers.
Here are two references to read more about the format:
What does "## -1 +1 ##" mean in Git's diff output?. (stackoverflow)
the diff man page (gnu.org)
In short, the @@ line gives you a reference for which line numbers are different, and the + and - show you which lines are different.
In your specific example it's telling you that three lines were different between the two strings: the line beginning with Divy, the line beginning with Parth, and the line beginning with acc. Since the line beginning with Harshil does not show a + or -, that means it was identical between the two strings.

Function to open a file and navigate to a specified line number

I have the output of recursive grep (actually ag) in a buffer, which is of the form filename:linenumber: ... [match] ..., and I want to be able to go to the occurrence (file and line number) currently under the cursor. This told me that I could execute normal-mode movements, so after extracting the file:line portion, I wrote this function:
function OpenFileNewTab(name)
let l:pair=split(a:name, ":")
execute "tabnew" get(l:pair, 0)
execute "normal!" get(l:pair, 1) . "G"
endfunction
It is supposed to open the specified file in a tab and then do <lineno>G, like I am able to do manually, to go to the specified line number. However, the cursor just stays on line 1. What am I doing wrong?
This question, by title alone, would be an exact duplicate, but it talks about locating symbols in other files, while I already have the locations at hand.
Edit: My mappings for grep / ag are as follows:
nnoremap <Leader>ag :execute "new \| read !ag --literal -w" "<C-r><C-w>" g:repo \| :set filetype=c<CR>
nnoremap <Leader>gf ^v2t:"zy :execute OpenFileNewTab("<C-r>z")<CR>
To get my grep/ag results, I put the cursor on the word I want to search and enter <leader>ag; then, in the new buffer, I put the cursor on a line and enter <leader>gf, which selects from the start of the line up to the second colon and calls OpenFileNewTab.
Edit 2: I'm on Cygwin, if it is of any importance - I doubt it.
Why don't you set &grepprg to call ag?
" according to man ag
set grepprg=ag\ --vimgrep\ $*
set grepformat=%f:%l:%c:%m
" And then (not tested)
nnoremap <Leader>ag :grep -w <c-r><c-w><cr>
As others have said in the comments, you are just trying to emulate what the quickfix window already provides. And we are lucky: vim can call grep, and it has a variation point that lets us specify which grep program we wish to use: 'grepprg'.
Use the file-line plugin. Pressing Enter on a line in the quicklist will normally open that file; file-line makes any filename of the form file:line:column (and several other formats) open the file and position the cursor at that line and column.
I only found this (old) thread after I posted the exact same question on vi.stackexchange: https://vi.stackexchange.com/q/39557/44764. To help anyone who comes looking, I post the best answer to my question below as an alternative to the answers already given.
The gF command, like gf, opens the file whose name is under the cursor, but additionally positions the cursor on the line number after the colon. (I note the OP defines <leader>gf, so maybe vim/neovim didn't auto-define gf or gF at the time this thread was originally created.)

liblinear's train.exe: "Wrong input format at line 1"

I'm trying to run liblinear's train.exe on Windows:
>train ex1_train.txt
Wrong input format at line 1
Here's the beginning of the file. What's wrong?
17.592 1:6.1101
9.1302 1:5.5277
13.662 1:8.5186
11.854 1:7.0032
6.8233 1:5.8598
11.886 1:8.3829
4.3483 1:7.4764
12 1:8.5781
6.5987 1:6.4862
3.8166 1:5.0546
3.2522 1:5.7107
15.505 1:14.164
3.1551 1:5.734
7.2258 1:8.4084
0.71618 1:5.6407
3.5129 1:5.3794
5.3048 1:6.3654
0.56077 1:5.1301
3.6518 1:6.4296
5.3893 1:7.0708
Liblinear requires the same input format as LibSVM. And, from their README file,
The format of training and testing data file is:
<label> <index1>:<value1> <index2>:<value2> ...
Each line contains an instance and is ended by a '\n' character. For
classification, <label> is an integer indicating the class label
(multi-class is supported). For regression, <label> is the target
value which can be any real number. For one-class SVM, it's not used
so can be any number. The pair <index>:<value> gives a feature
(attribute) value: <index> is an integer starting from 1 and <value>
is a real number. The only exception is the precomputed kernel, where
<index> starts from 0; see the section of precomputed kernels. Indices
must be in ASCENDING order.
Since we don't have the entire file, the best answer we can provide is: make sure all these instructions are followed. E.g., there is no TAB instead of a space, there is no '\r\n' instead of '\n', etc. A good way to debug would be to take a few lines and keep adding until you get the error:
head -10 <yourfile> > tmp10
head -20 <yourfile> > tmp20
etc. And see where the error pops up.
My problems were that you can't use zero as a feature id, and your features need to be sorted in ascending order.