How can I import a .TSV file using the "Get Data" command using SPSS syntax? - csv

I'm importing a .TSV file (the first row holds the variable names and the first column holds IDs) into SPSS using syntax, but I keep getting a Failure opening file error in my output. This is my code so far:
GET DATA
/TYPE=TXT
/FILE=\filelocation\filename.tsv
/DELCASE=LINE
/DELIMTERS="/t"
/QUALIFIER=''
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
/MAP
RESTORE.
CACHE.
EXECUTE.
SAVE OUTFILE = "newfile.sav"
I think I'm having an issue in the delimiters or qualifier subcommand. I'm also wondering if I should list the variables under the VARIABLES subcommand. Any advice would be helpful. Thanks!

The GET DATA command you cite above has an empty /VARIABLES= subcommand.
If you used the "File -> Import Data -> Text Data" wizard, it would have populated this subcommand for you. If you are writing the GET DATA command syntax yourself, then you have to supply that list of field names yourself.
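For reference, here is a rough sketch of what the complete command could look like for a tab-separated file. The variable names and formats after /VARIABLES= are placeholders for your actual columns; also note the spelling of DELIMITERS, the backslash in "\t", the quoted file path, and the period that ends the command:
GET DATA
  /TYPE=TXT
  /FILE="\filelocation\filename.tsv"
  /DELCASE=LINE
  /DELIMITERS="\t"
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=
    ID A10
    var1 F8.2
    var2 F8.2.
CACHE.
EXECUTE.
SAVE OUTFILE="newfile.sav".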

Related

How to convert dbase III files to mysql?

Is it possible to convert .DBF files to any other format?
Does anybody know of a script that can be used to convert .DBF files to MySQL queries?
It would also be fine to convert the DBF files to CSV files.
I always have problems with the codec of the DBF files.
Konstantin
https://www.dbase.com/Knowledgebase/faq/import_export_data.asp
Q: How do I export data from a dBASE table to a text file?
A: Exporting data from dBASE to a text file is handled through the COPY TO command.
Like the APPEND FROM command, there are a number of ways to use this command. Here we are only interested in its most basic use. Once you understand how to use this command, you can go to your on-line help for further details on what can be accomplished with the COPY TO command.
In order to export data you must first be using the table from which the data will be exported. As before, you will be employing the USE command in the command window.
USE <tablename>
For example:
USE Mytest.dbf
Once the table is in use, all you need to do is type the following command in the command window:
COPY TO <filename> TYPE DELIMITED
For example:
COPY TO Myexport.txt TYPE DELIMITED
This would result in a file being created in the current directory called Myexport.txt which would be in the DELIMITED or *.CSV format.
If we had wanted to export the data in the *.SDF format, we would have typed:
COPY TO Myexport.txt TYPE SDF
This would result in a file being created in the current directory called Myexport.txt which would be in the System Delimited or *.SDF format.
Those are the basics on how to import and export text data into a dBASE table. For further information consult the on-line help for the APPEND FROM and COPY TO commands.
I converted old (circa 1997) DBF files to CSV using Python and the dbfread module.
After installing Python, install the dbfread module from a command prompt (not the Python interpreter):
pip install dbfread
The module has many methods to read DBF files and excellent documentation.
Then a Python script does the job, or you can type it directly into the interpreter:
import csv
from dbfread import DBF

# Read the DBF file
table = DBF('C:/my_dbf_file.dbf', encoding='1252')

# Write it out as CSV, one record at a time
outFileName = 'C:/my_export.csv'
with open(outFileName, 'w', newline='', encoding='1252') as file:
    writer = csv.writer(file)
    writer.writerow(table.field_names)  # first line: the column names
    for record in table:
        writer.writerow(list(record.values()))
Note that each record in the database is read and saved one at a time, and that the first line of the CSV file contains the column names.
Encoding can be problematic (there are several codecs worth trying with old DBF files). The dbfread.DBF() method tries to guess the encoding but is not perfect, which is why the code above specifies the encoding parameter in both DBF() and open().
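If you are not sure which codec a given DBF uses, a small trial loop can help narrow it down. This is only a sketch: the candidate encodings below are common guesses rather than an exhaustive list, and permissive codecs such as latin-1 rarely raise an error, so inspect the output as well:
from dbfread import DBF

candidates = ['cp1252', 'cp850', 'cp437', 'latin-1']  # common guesses for old DBF files
for enc in candidates:
    try:
        # dbfread decodes strictly by default, so a wrong codec usually raises UnicodeDecodeError
        records = list(DBF('C:/my_dbf_file.dbf', encoding=enc))
        print(enc, 'decoded', len(records), 'records without errors')
    except UnicodeDecodeError as err:
        print(enc, 'failed:', err)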

SAS - Change the format of a column in a proc imported CSV

I cannot quite figure out how to change the format of a column in my data file. I have the data set imported with PROC IMPORT, and it guessed the format of a specific column as numeric; I would like it to be character-based.
This is where I'm currently at, and it does not change the format of my NUMBER column:
proc import
datafile = 'datapath'
out = dataname
dbms = CSV
replace
;
format NUMBER $8.
;
guessingrows = 20000
;
run;
You could import the data and then convert the column afterwards - I believe the following would work.
proc sql;
create table want as
select *,
put(Number, 4.) as CharacterVersion
from data;
quit;
You cannot change the type/format via PROC IMPORT. However, you can write a DATA step to read in the file and then customize everything. If you're not sure how to start with that, check the log after you run a PROC IMPORT: it will contain the 'skeleton' code. You can copy that code, edit it, and run it to get what you need. Writing it from scratch also works, using an INFILE and INPUT statement.
From the help file (search for "Processing Delimited Files in SAS")
If you need to revise your code after the procedure runs, issue the RECALL command (or press F4) to the generated DATA step. At this point, you can add or remove options from the INFILE statement and customize the INFORMAT, FORMAT, and INPUT statements to your data.
Granted, the grammar in this section is horrific! The idea is that the IMPORT Procedure generates source code that can be recalled and modified for subsequent submission.
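For illustration, a minimal DATA step sketch along those lines; the infile options and the second column are hypothetical, so take the real INFORMAT/FORMAT/INPUT statements from your PROC IMPORT log and only change NUMBER to a character informat:
data dataname;
    infile 'datapath' dsd dlm=',' firstobs=2 truncover;
    /* read NUMBER as character instead of letting PROC IMPORT guess numeric */
    informat NUMBER $8. other_var best32.;
    format NUMBER $8. other_var best12.;
    input NUMBER $ other_var;
run;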

Difficulties creating CSV table in Google BigQuery

I'm having some difficulties creating a table in Google BigQuery using CSV data that we download from another system.
The goal is to have a bucket in the Google Cloud Platform to which we will upload one CSV file per month. These CSV files have around 3,000 - 10,000 rows of data, depending on the month.
The error I am getting from the job history in the Big Query API is:
Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2949; errors: 1. Please look into the
errors[] collection for more details.
When I am uploading the CSV files, I am selecting the following:
file format: csv
table type: native table
auto detect: tried automatic and manual
partitioning: no partitioning
write preference: WRITE_EMPTY (cannot change this)
number of errors allowed: 0
ignore unknown values: unchecked
field delimiter: comma
header rows to skip: 1 (also tried 0 and manually deleting the header rows from the csv files).
Any help would be greatly appreciated.
This usually points to an error in the structure of the data source (in this case your CSV file). Since your CSV file is small, you can run a little validation script to check that the number of columns is exactly the same across all rows of the CSV before running the load.
Maybe something like:
cat myfile.csv | awk -F, '{ a[NF]++ } END { for (n in a) print n, "rows have",a[n],"columns" }'
Or, you can tie it to a condition (let's say every row should have exactly 5 columns):
badrows=$(awk -F, 'NF != 5' myfile.csv | wc -l); if [ "$badrows" -eq 0 ]; then python myexportscript.py; else echo "rows with an invalid column count: $badrows"; fi
It's impossible to point out the error without seeing an example CSV file, but it's very likely that your file is incorrectly formatted, and a single typo can confuse BQ into reporting thousands of errors. Let's say you have the following csv file:
Sally Whittaker,2018,McCarren House,312,3.75
Belinda Jameson 2017,Cushing House,148,3.52 //Missing a comma after the name
Jeff Smith,2018,Prescott House,17-D,3.20
Sandy Allen,2019,Oliver House,108,3.48
With the following schema:
Name(String) Class(Int64) Dorm(String) Room(String) GPA(Float64)
Since that row is missing a comma, everything after the name is shifted one column over. If you have a large file, this results in thousands of errors as BigQuery attempts to insert Strings into Ints/Floats.
I suggest you run your csv file through a csv validator before uploading it to BQ. It might find something that breaks it. It's even possible that one of your fields has a comma inside the value which breaks everything.
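For example, a minimal validator sketch in Python, assuming a comma-delimited file with a header row (the file name is a placeholder). Because csv.reader honours quoting, it flags genuinely malformed rows rather than values that merely contain quoted commas:
import csv

with open('myfile.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    # every data row should have exactly as many fields as the header
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f'line {lineno}: expected {len(header)} fields, got {len(row)}')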
Another theory to investigate is to make sure that all required columns receive an appropriate (non-null) value. A common cause of this error is if you cast data incorrectly which returns a null value for a specific field in every row.
As mentioned by Scicrazed, this issue is usually generated when some file rows have an incorrect format, in which case you need to validate the content in order to figure out the specific error that is causing the problem.
I recommend you check the errors[] collection, which may contain additional information about what is making the process fail. You can do this by using the Jobs: get method, which returns detailed information about your BigQuery job, or by referring to the additionalErrors field of the JobStatus Stackdriver logs, which contains the same complete error data reported by the service.
I'm probably too late for this, but it seems the file has some errors (it can be a character that cannot be parsed or just a string in an int column) and BigQuery cannot upload it automatically.
You need to understand what the error is and fix it somehow. An easy way to do it is by running this command on the terminal:
bq --format=prettyjson show -j <JobID>
and you will be able to see additional logs for the error to help you understand the problem.
If the error happens only a few times you can just increase the number of errors allowed.
If it happens many times you will need to manipulate your CSV file before you upload it.
Hope it helps

How to copy csv data file to Amazon RedShift?

I'm trying to migrate some MySQL tables to Amazon Redshift, but I've run into some problems.
The steps are simple:
1. Dump the MySQL table to a csv file
2. Upload the csv file to S3
3. Copy the data file to RedShift
Error occurs in step 3:
The SQL command is:
copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS
'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ',' csv;
The error info:
An error occurred when executing the SQL command: copy TABLE_A from
's3://ciphor/TABLE_A.csv' CREDENTIALS
'aws_access_key_id=xxxx;aws_secret_access_key=xxxx ERROR: COPY CSV is
not supported [SQL State=0A000] Execution time: 0.53s 1 statement(s)
failed.
I don't know if there are any limitations on the format of the csv file, say the delimiters and quotes; I cannot find it in the documentation.
Can anyone help?
The problem is finally resolved by using:
copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS
'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ','
removequotes;
More information can be found here http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
Now Amazon Redshift supports the CSV option for the COPY command. It's better to use this option to import CSV-formatted data correctly. The format is shown below.
COPY [table-name] FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' CSV;
The default delimiter is ( , ) and the default quote character is ( " ). You can also import TSV-formatted data with the CSV and DELIMITER options like this.
COPY [table-name] FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' CSV DELIMITER '\t';
There is a disadvantage to the old way (DELIMITER and REMOVEQUOTES): REMOVEQUOTES does not support having a newline or a delimiter character within an enclosed field. If the data can include this kind of character, you should use the CSV option.
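For example (a made-up row, not from the question's data), a line such as
1,"Doe, Jane","first line
second line",42
loads correctly with the CSV option because the embedded comma and newline sit inside the quotes, while the DELIMITER/REMOVEQUOTES combination would split it in the wrong places.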
See the following link for the details.
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
If you want to save yourself some code and you have a very basic use case, you can use Amazon Data Pipeline.
It starts a spot instance and performs the transformation within the Amazon network, and it's a really intuitive tool (but very simple, so you can't do complex things with it).
You can try with this
copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' csv;
CSV itself means comma-separated values, so there is no need to provide a delimiter with it. Please refer to this link:
http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-format
I always use this code:
COPY clinical_survey
FROM 's3://milad-test/clinical_survey.csv'
iam_role 'arn:aws:iam::123456789123:role/miladS3xxx'
CSV
IGNOREHEADER 1
;
Description:
1- COPY is followed by the name of your target table
2- FROM gives the address of the file stored in S3
3- iam_role is a substitute for CREDENTIALS. Note that the iam_role should be defined in the IAM management menu of your console, and then in the trust menu it should be assigned to the user as well (that is the hardest part!)
4- CSV uses the comma delimiter
5- IGNOREHEADER 1 is a must! Otherwise it will throw an error (it skips one row of my CSV and treats it as a header)
Since the resolution has already been provided, I'll not repeat the obvious.
However, in case you receive some more errors which you're not able to figure out, simply execute the following on your workbench while connected to any of the Redshift accounts:
select * from stl_load_errors [where ...];
stl_load_errors contains all the Amazon RS load errors in historical fashion, where a normal user can view details corresponding to his/her own account, but a superuser has access to all of them.
The details are captured elaborately at :
Amazon STL Load Errors Documentation
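For instance, a query sketch that pulls the most useful columns (these column names come from the stl_load_errors table; adjust the filter and limit as needed):
select starttime, filename, line_number, colname, err_code, err_reason
from stl_load_errors
order by starttime desc
limit 10;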
A little late to comment, but it can be useful:
You can use an open source project to copy tables directly from MySQL to Redshift - sqlshift.
It only requires Spark, and if you have YARN then it can also be used.
Benefits: it automatically decides the distkey and interleaved sortkey using the primary key.
It looks like you are trying to load a local file into a REDSHIFT table.
The CSV file has to be on S3 for the COPY command to work.
If you can extract data from the table to a CSV file, you have one more scripting option. You can use a Python/boto/psycopg2 combo to script your CSV load to Amazon Redshift.
In my MySQL_To_Redshift_Loader I do the following:
Extract data from MySQL into temp file.
loadConf=[ db_client_dbshell ,'-u', opt.mysql_user,'-p%s' % opt.mysql_pwd,'-D',opt.mysql_db_name, '-h', opt.mysql_db_server]
...
q="""
%s %s
INTO OUTFILE '%s'
FIELDS TERMINATED BY '%s'
ENCLOSED BY '%s'
LINES TERMINATED BY '\r\n';
""" % (in_qry, limit, out_file, opt.mysql_col_delim,opt.mysql_quote)
p1 = Popen(['echo', q], stdout=PIPE,stderr=PIPE,env=env)
p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE,stderr=PIPE)
...
Compress and load data to S3 using boto Python module and multipart upload.
conn = boto.connect_s3(AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)
k = Key(bucket)
k.key = s3_key_name
k.set_contents_from_file(file_handle, cb=progress, num_cb=20,
reduced_redundancy=use_rr )
Use psycopg2 COPY command to append data to Redshift table.
sql="""
copy %s from '%s'
CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
DELIMITER '%s'
FORMAT CSV %s
%s
%s
%s;""" % (opt.to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,opt.delim,quote,gzip, timeformat, ignoreheader)

Filemaker to SQL Server via SSIS

I'm using SSIS and trying to import data from FileMaker into SQL Server. In the Solution Explorer, I right click on "SSIS Packages" and select "SQL Server Import and Export Wizard". During the process, I use my DSN as the source, SQL Server as the destination, use a valid query to pull data from FileMaker, and set the mappings.
Each time I try to run the package, I receive the following message:
The "output column "LastNameFirst" (12)" has a length that is not valide. The length must be between 0 and 4000.
I do not understand this error exactly, but in the documentation for ODBC:
http://www.filemaker.com/downloads/pdf/fm9_odbc_jdbc_guide_en.pdf (page 47) it states:
"The maximum column length of text is 1 million characters, unless you specify a smaller Maximum number of characters for the text field in FileMaker. FileMaker returns empty strings as NULL."
I'm thinking that the data type is too large when trying to convert it to varchar. But even after using a query of SUBSTR(LastNameFirst, 1, 2000), I get the same error.
Any suggestions?
I had this problem, and don't know the cause, but these are the steps I used to find the offending row:
- in FileMaker, export the data to CSV
- open the CSV in Excel
- double click on the LastNameFirst column to maximize its width
- scroll down until you see a cell showing '#########' - the way Excel indicates data that is too wide to be displayed
I'm sure there's a better way, and I'd love to hear it!
You should use this:
nvarchar (max)
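For instance, a minimal T-SQL sketch for widening the destination column before re-running the package; dbo.MyDestinationTable is a placeholder for your actual destination table:
ALTER TABLE dbo.MyDestinationTable
    ALTER COLUMN LastNameFirst nvarchar(max) NULL;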