Reading CSV files into a table in GAMS

I am having trouble reading CSV files into a table. Suppose I have a CSV file that looks like this:
;c1;c2;c3;c4
r1;(some numeric values separated by ";")
r2;(some numeric values separated by ";")
I have tried rewriting the CSV into a .inc file, replacing ";" with spaces, and then doing something like this:
set col /c1 * c4/;
set row /r1 * r2/;
table data(row, col)
$include myfile.inc
;
But this doesn't work because my columns are not aligned, and I can't align them manually because I have more than 500 columns.
My problem could be solved by either:
finding a way to define table entries without the text being aligned, or
finding a way to read the CSV directly into GAMS.
What do you suggest I do?

There are two things that could help you:
Setting $onDelim (https://www.gams.com/latest/docs/UG_DollarControlOptions.html#DOLLARonoffdelim), which allows data statements such as tables to be given in delimited (CSV-style) form, so the values do not have to be aligned.
Using csv2gdx (https://www.gams.com/latest/docs/T_CSV2GDX.html), a tool shipped with GAMS that converts a CSV file into a GDX file, which you can then load into your model.
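If you would rather stay with your .inc approach, the alignment itself can be automated. Below is a minimal Python sketch of that idea (the file names and the exact layout of your semicolon-separated file are assumptions); it rewrites the CSV into a fixed-width include file so every header and value lines up in its column:

import csv

# Assumed input: first row like ";c1;c2;c3;c4", data rows like "r1;1.0;2.0;..."
with open('myfile.csv', newline='') as src, open('myfile.inc', 'w') as dst:
    rows = list(csv.reader(src, delimiter=';'))
    # One common field width, wide enough for the longest label or value
    width = max(len(cell) for row in rows for cell in row) + 2
    for row in rows:
        dst.write(''.join(cell.rjust(width) for cell in row) + '\n')

With right-justified fixed-width fields, each value ends in the same character position as its column header, which should be enough for the table statement to associate each value with its column.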

Related

How to remove duplicates from a csv file without sorting

I have a file with three columns: first-name, last-name, email.
I want to remove the duplicate rows without sorting the file. The file is in CSV format.
How can I do that? Please help.
You can use Python to remove the duplicates from a CSV file. There are multiple ways to do this in Python; e.g. you can use a set, as shown below:
import fileinput

withoutDuplicate = set()
for line in fileinput.input('<name of your csv file>', inplace=True):
    # With inplace=True, anything printed replaces the line in the original file
    if line not in withoutDuplicate:
        withoutDuplicate.add(line)
        print(line, end='')
This will remove the duplicates from your original file in place, keeping the first occurrence of each line and preserving the original order. Let me know if you are looking for something else.
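If by "duplicates" you mean rows that repeat the same e-mail address rather than rows that are identical character for character, a variant using the csv module could look like the sketch below (the file names and the choice of e-mail as the dedup key are assumptions):

import csv

seen_emails = set()
with open('input.csv', newline='') as src, open('deduped.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    # Assumes exactly three columns per row, as described in the question
    for first_name, last_name, email in reader:
        # Keep only the first row seen for each e-mail address, preserving order
        if email not in seen_emails:
            seen_emails.add(email)
            writer.writerow([first_name, last_name, email])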

Python 3: write string list to csv file

I have found several answers (encoding, decoding...) online, but I still don't get what to do.
I have a list called abc.
abc = ['sentence1','-1','sentence2','1','sentence3','0'...]
Now I would like to store this list in a CSV file, the following way:
sentence1, -1
sentence2, 1
sentence3, 0
I know that the format of my abc list probably isn't how it should be to achieve this. I guess it should be a list of lists? But the major problem is actually that I have no clue how to write this to a CSV file using Python 3. The only time it kind of worked was when every character ended up separated by a comma.
Does anybody know how to solve this? Thank you!
You can use zip to pair each sentence with the number that follows it, and then write the pairs to a CSV file:
import csv

abc = ['sentence1', '-1', 'sentence2', '1', 'sentence3', '0']
new = list(zip(abc[0::2], abc[1::2]))  # [('sentence1', '-1'), ('sentence2', '1'), ...]

with open('test.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(new)
result (contents of test.csv):
sentence1,-1
sentence2,1
sentence3,0
See the Python documentation on reading and writing files. A CSV is basically the same thing as a txt file; the difference is that you use commas to separate the columns and new lines to separate the rows.
In your example you could do this (or iterate over a loop):
formated_to_csv = abc[0]+','+abc[1]+','+abc[2]+','+abc[3]...
The value of formated_to_csv would be 'sentence1,-1,sentence2,1,sentence3,0'. Note that this is a single string, so it will generate a single row. Then write formated_to_csv as text into the CSV file (assuming f is an open file object):
f.write(formated_to_csv)
To put all the sentences in the first column and all the numbers in the second column, it would be better to have a list of lists:
abc = [['sentence1','-1'],['sentence2','1'],['sentence3','0']...]
for row in abc:
    f.write(row[0] + ',' + row[1] + '\n')
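Putting those pieces together, a minimal complete version of this manual approach could look like the following (the output name test.csv is just an example):

abc = [['sentence1', '-1'], ['sentence2', '1'], ['sentence3', '0']]

with open('test.csv', 'w') as f:
    for row in abc:
        # One comma-separated line per inner list
        f.write(row[0] + ',' + row[1] + '\n')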
The "conversion" to table will be done by excel, calc or whatever program that you use to read spreadsheets.

Unable to import 3.4GB csv into redshift because values contains free-text with commas

And so, we found a 3.4 GB CSV that we have uploaded to S3 and now want to import into Redshift, then do the querying and analysis from IPython.
Problem 1:
This comma-delimited file contains free-text values that themselves contain commas, and this interferes with the delimiting, so we can't upload it to Redshift.
When we tried opening the sample dataset in Excel, Excel surprisingly put everything into columns correctly.
Problem 2:
A column that is supposed to contain integers has some records containing letters to indicate some other scenario.
So the only way to get the import through is to declare this column as varchar. But then we can't do calculations on it later on.
Problem 3:
The datetime data type requires the value to be in the format YYYY-MM-DD HH:MM:SS, but the CSV doesn't contain the SS part, so the database is rejecting the import.
We can't manipulate the data on a local machine because it is too big, and we can't upload it to the cloud for computing because it is not in the correct format.
The last resort would be to scale the instance running IPython all the way up so that we can read the big CSV directly from S3, but this approach doesn't make sense as a long-term solution.
Your suggestions?
Train: https://s3-ap-southeast-1.amazonaws.com/bucketbigdataclass/stack_overflow_train.csv (3.4GB)
Train Sample: https://s3-ap-southeast-1.amazonaws.com/bucketbigdataclass/stack_overflow_train-sample.csv (133MB)
Try using a different delimiter or use escape characters.
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_preparing_data.html
For the second issue, if you want to extract only the numbers from the column after loading it as char, use regexp_replace or other functions.
For the third issue, you can also load it into a VARCHAR field in a staging table and then use an expression like
cast(left(column_name, 10)||' '||right(column_name, 6)||':00' as timestamp)
to load it into the final table.
For the first issue, you need to find out a way to differentiate between the two types of commas - the delimiter and the text commas. Once you have done that, replace the delimiters with a different delimiter and use the same as delimiter in the copy command for Redshift.
For the second issue, you need to first figure out whether this column needs to be present for numerical aggregations once loaded. If yes, you need to get this data cleaned up before loading. If no, you can directly load it as a char/varchar field. All your queries will still work, but you will not be able to do any aggregations (sum/avg and the like) on this field.
For problem 3, you can use the TEXT(date, "yyyy-mm-dd hh:mm:ss") function in Excel to do a mass replace for this field.
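If you do end up preprocessing the file before the COPY, both the quoted free-text commas and the missing seconds can be handled in one streaming pass, so the file never has to fit in memory. A rough Python sketch, purely as an illustration (the output file name, the pipe delimiter, the position of the datetime column, and the assumption that the free-text fields are properly quoted are all mine):

import csv

DATETIME_COL = 5  # hypothetical position of the datetime column

with open('stack_overflow_train.csv', newline='') as src, \
        open('stack_overflow_train_clean.csv', 'w', newline='') as dst:
    reader = csv.reader(src)                 # honours quotes around embedded commas
    writer = csv.writer(dst, delimiter='|')  # re-emit with a pipe delimiter
    for row in reader:
        if len(row) > DATETIME_COL and len(row[DATETIME_COL]) == 16:
            # 'YYYY-MM-DD HH:MM' -> 'YYYY-MM-DD HH:MM:00'
            row[DATETIME_COL] += ':00'
        writer.writerow(row)

The rewritten file can then be loaded with COPY using the DELIMITER '|' option.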
Let me know if this works out.

Updating a local SAS table from a CSV file

I have a table which I need to update. From my understanding, you can do something like the following:
data new_table;
update old_table update_table;
by some_key;
run;
My issue (well I have a few...) is that I'm importing the "update_table" from a CSV file and the formats aren't matching the "old_table", so the update fails.
I've tried creating the "update_table" from the "old_table" using proc sql create table with zero observations, which created the correct types/formats, but then I was unable to insert data into it without replacing it.
The other major issue I have is that there are a large number of columns (480), and custom formats, and I've run up against a 6000 character limit for the script.
I'm very new to SAS and any help would be greatly appreciated :)
It sounds like you need to use a data step to read in your CSV. There are lots of papers out there explaining how to do this, so I won't cover it here. This will allow you to specify the format (numeric/character) for each field. The nice thing here is you already know what formats they need to be in (from your existing dataset), so you can create this read in fairly easily.
Let's say your data is so:
data have;
informat x date9.;
input x y z $;
datalines;
10JAN2010 1 Base
11JAN2010 4 City
12JAN2010 8 State
;;;;
run;
Now, if you have a CSV of the same format, you can read it in by generating the input code from the above dataset. You can use PROC CONTENTS to do this, or you can generate it by using dictionary.tables which has the same information as PROC CONTENTS.
proc sql;
select catx(' ', name, ifc(type='char', '$', ' ')) into :inputlist
separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE';
select catx(' ', name, informat) into :informatlist separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not missing(informat);
quit;
The above are two examples; they may or may not be sufficient for your particular needs.
Then you use them like so:
data want;
infile datalines dlm=',';
informat &informatlist.;
input &inputlist.;
datalines;
13JAN2010,9,REGION
;;;;
run;
(Obviously you would use your CSV file instead of datalines, which is just used here as an example.)
The point is you can write the data step code using the metadata from your original dataset.
I needed this today, so I made a macro out of it: https://core.sasjs.io/mp__csv2ds_8sas.html
It doesn't wrap the input statement so it may break with a large number of columns if you have batch line length limits. If anyone would like me to fix that, just raise an issue: https://github.com/sasjs/core/issues/new

How to deal with a string with comma in it from a csv, when we have to read the data by using loadrunner?

When I use LoadRunner, it can read data from a CSV file. As we know, a CSV file is separated by commas.
The question is: if a parameter value in the CSV itself contains a comma, the string will be split into several segments. That is not what I want.
How can we get the original data with the comma in it?
When the data has a comma, use an escape character to store the data in the parameter.
For example, if the name is 'Smith, John', it can be stored as Smith\, John in the LoadRunner data file.
When you save a file in Excel that has commas in the actual cell data, the whole cell will be enclosed in two " characters. It also seems that cells with a space in them are enclosed in " characters.
Example
ColA,ColB,"ColC with a , inside",ColD,ColE
More info on CSV file format: http://www.parse-o-matic.com/parse/pskb/CSV-File-Format.htm
The answer to the question is that perhaps the easiest way to deal with the "," separators is to change the separator to a ";" character. This is also a valid separator in CSV files.
Then the example would be:
ColA;ColB;"ColC with a , inside";ColD;ColE
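For what it's worth, any CSV parser that understands this quoting convention will hand the embedded comma back intact. Here is a quick illustration of the rule using Python's csv module (purely to demonstrate the format, not a LoadRunner solution):

import csv
from io import StringIO

line = 'ColA,ColB,"ColC with a , inside",ColD,ColE'
row = next(csv.reader(StringIO(line)))
print(row[2])  # ColC with a , inside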
Maybe the right way is to use C functions to read the data from the file (for example fopen/fread)? Once you have read it, you can use strchr to find the first quote character and the second quote character. Everything in that interval is the value, and it doesn't matter if there is a comma inside.
For documentation on fopen, fread and strchr, you can refer to the HP or C function references.
Hope this helps.
Assuming you are reading the parameters from a data file, just use a custom separator. Comma is the default, but you can define it to be whatever you want. Whenever I have a comma in the variable data I tend to use a pipe symbol, '|', as a way to distinguish the columns of data in the data file.
Examine your parameter dialog carefully and you will see where to make the change.