Import CSV into sqlite3, but change the column types before import

I am working with a CSV file that has many columns. I decided to import the data via the sqlite3 shell, and I found this very useful:
.mode CSV
.import my_table.csv my_sqlite_table
This saves me a lot of work; on the other hand, it gives me no control over the column types or values, since all the data is imported as TEXT.
Is there any elegant way within the shell to first declare what type each column should be, or to replace particular blank values with NULL?

Values in CSV files are always strings.
To change that, import into a temporary table, and then modify the values appropriately:
INSERT INTO the_actual_table(a, b, c)
SELECT a, CAST(b AS INTEGER), nullif(c, '') FROM temp_table;
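For example, a minimal end-to-end sketch in the sqlite3 shell might look like this (table and column names are placeholders; it assumes my_table.csv has a header row naming columns a, b, c, that b should become an INTEGER, and that blanks in c should become NULL):
-- target table with the types you actually want
CREATE TABLE the_actual_table(a TEXT, b INTEGER, c TEXT);
-- import everything as TEXT; .import creates temp_table and
-- treats the first CSV row as its column names
.mode csv
.import my_table.csv temp_table
-- copy the rows over, converting as needed
INSERT INTO the_actual_table(a, b, c)
SELECT a, CAST(b AS INTEGER), nullif(c, '') FROM temp_table;
-- throw away the scratch table
DROP TABLE temp_table;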

Related

Export non-varchar data to CSV table using Trino (formerly PrestoDB)

I am working on some benchmarks and need to compare the ORC, Parquet, and CSV formats. I have exported TPC-H (SF1000) to ORC-based tables. When I want to export it to Parquet, I can run:
CREATE TABLE hive.tpch_sf1_parquet.region
WITH (format = 'parquet')
AS SELECT * FROM hive.tpch_sf1_orc.region
When I try the same approach with CSV, I get the error Hive CSV storage format only supports VARCHAR (unbounded). I would have assumed that it would convert the other data types (e.g. bigint) to text and store the column types in the Hive metadata.
I can export the data to CSV using trino --server trino:8080 --catalog hive --schema tpch_sf1_orc --output-format=CSV --execute 'SELECT * FROM nation', but then it gets emitted to a file. Although this works for SF1, it quickly becomes unusable at the SF1000 scale factor. Another disadvantage is that my Hive metastore wouldn't have the appropriate metadata (although I could patch it manually if nothing else works).
Does anyone have an idea how to convert my ORC/Parquet data to CSV using Hive?
In the Trino Hive connector, a CSV table can contain varchar columns only.
You need to cast the exported columns to varchar when creating the table:
CREATE TABLE region_csv
WITH (format='CSV')
AS SELECT CAST(regionkey AS varchar), CAST(name AS varchar), CAST(comment AS varchar)
FROM region_orc
Note that you will need to update your benchmark queries accordingly, e.g. by applying reverse casts.
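For example, a query that filters on regionkey might need a reverse cast along these lines (a sketch, using the TPC-H region columns):
SELECT name, comment
FROM region_csv
WHERE CAST(regionkey AS bigint) = 1;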
DISCLAIMER: Read the full post before using anything discussed here. It's not real CSV and you might screw up!
It is possible to create typed CSV-ish tables by using the TEXTFILE format with ',' as the field separator:
CREATE TABLE hive.test.region (
regionkey bigint,
name varchar(25),
comment varchar(152)
)
WITH (
format = 'TEXTFILE',
textfile_field_separator = ','
);
This will create a typed version of the table in the Hive catalog using the TEXTFILE format. TEXTFILE normally uses the ^A character (ASCII 1) as its field separator, but when it is set to ',' the resulting files resemble the structure of CSV.
IMPORTANT: Although it looks like CSV, it is not real CSV. It doesn't follow RFC 4180, because it doesn't properly quote and escape. The following INSERT will not be written correctly:
INSERT INTO hive.test.region VALUES (
1,
'A "quote", with comma',
'The comment contains a newline
in it');
The text will be copied unmodified to the file without escaping quotes or commas. This should have been written like this to be proper CSV:
1,"A ""quote"", with comma","The comment contains a newline
in it"
Unfortunately, it is written as:
1,A "quote", with comma,The comment contains a newline
in it
This results in invalid data that will be represented by NULL columns. For this reason, this method can only be used when you have full control over the text-based data and are sure that it doesn't contain newlines, quotes, commas, ...

Importing a Text File into a MySQL through Navicat DB software

I am trying to import a text file into MySQL through the Navicat DB software.
I am struggling to import (append) a text file into a MySQL table.
The text file's fields are separated by |;
example: |Name|Email|Address|
When I import this through the Navicat import wizard, it asks which delimiter separates the fields. So instead of selecting Tab, ;, or any other, I select | as the field separator.
But the fields in the file still do not match (sync) with the fields of the table...
Can anyone offer any advice here?
I actually exported the text file from another MySQL DB through the export functionality in phpMyAdmin.
I assume your name column is null and the values appear instead in the email column?
I suspect the problem lies in the fact that your fields are not only separated by a pipe, your rows also begin and end with a pipe.
Think of a CSV: name,email,address, not ,name,email,address,, because the latter would be interpreted as 5 columns, with the first and last fields being null.
You'll have to choose a different delimiter for your rows and fields.
Beyond that, you can try importing the data into a new table and then write an insert query to map the temp fields to the ones in your database. The screen after the one where you choose the target table has a table where you can map your import fields to the target ones.
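A sketch of such a mapping query, assuming the file was loaded into a scratch table called temp_import (table and column names here are placeholders); because of the leading pipe, the temp table's first column is empty and the real values start in the second column:
INSERT INTO contacts (name, email, address)
SELECT col2, col3, col4
FROM temp_import;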
Let me know how that works out.

Invalid field count in CSV input on line 1

I am trying to export an ODS file to CSV, but when I import it into phpMyAdmin I get "Invalid field count in CSV input on line 1."
File (it has more than two lines but the structure is the same):
"Administração da Guarda Nacional Republicana"
"Administração de Publicidade e Marketing"
table:
CREATE TABLE IF NOT EXISTS `profession` (
`id_profession` int(11) NOT NULL,
`profession` varchar(45) DEFAULT NULL,
`formation_area_id_formation_area` int(11) NOT NULL,
PRIMARY KEY (`id_profession`),
UNIQUE KEY `profession_UNIQUE` (`profession`),
KEY `fk_profession_formation_area1` (`formation_area_id_formation_area`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I have never done anything like this before; I probably need to specify the columns. The CSV only has one column and the table has three. In this case the file's values belong in the profession column.
If you use phpMyAdmin, then you are allowed to specify column names. When logged into the desired database:
Select the table you want to import to.
Click the Import tab.
Under Format of imported file, select CSV.
In Column names, write out a comma separated list of the columns you want the data imported to.
You can also use mysqlimport if you prefer the shell.
For Example:
shell>mysqlimport --columns=column1,column2 dbname imptest.txt
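For the table in the question, a sketch of that mysqlimport call might look like this (mysqlimport derives the table name from the file name, so the file would have to be named profession.csv; the other NOT NULL columns would still need values from defaults or a later UPDATE):
shell> mysqlimport --local --columns=profession \
       --fields-terminated-by=',' --fields-enclosed-by='"' \
       dbname profession.csv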
If you want this to import into that table you have 2 choices:
1) Add a comma before and after the data in every row of your file, or
2) Delete the first and third columns in your table, import the data and then add the 2 columns you deleted back.
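A sketch of the second option for the table in the question (note that the primary key and the other index disappear with the dropped columns and would have to be recreated, so this is really only practical on an empty table):
ALTER TABLE profession DROP COLUMN id_profession;
ALTER TABLE profession DROP COLUMN formation_area_id_formation_area;
-- import the one-column CSV here (Import tab or LOAD DATA)
ALTER TABLE profession ADD COLUMN id_profession int(11) NOT NULL;
ALTER TABLE profession ADD COLUMN formation_area_id_formation_area int(11) NOT NULL;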
In Excel I saved the file as "Microsoft Office Excel Comma Separated Values File (.csv)"
In phpMyAdmin:
Select the database you want to import the table into.
Click import tab.
Select your file. Set FORMAT to CSV
Leave Format-Specific Options alone except for ticking the "The first line of the file contains the table column names" box if you need to
Click GO
You then need to rename the table (which will be called something like "TABLE 5" if it's the 5th table in the DB). So select the table and use the Operations tab -> "Rename table to:".
Make sure you are uploading a CSV file only (not an Excel file), and then follow the steps below:
1 Go to the Import tab
2 Browse for your csv file.
3 Select CSV using LOAD DATA (rather than just CSV)
4 Change “Fields terminated by” from “;” to “,”
5 Make sure “Use LOCAL keyword” is selected.
6 Click “Go”
Go to Import tab
Browse for the csv file.
Select "CSV" on Format on imported file tab
Checklist the "Ignore duplicate rows"
Change “Fields terminated by” from “;” to “,”
Note: you should check the data that was imported. The first row usually contains the header of each column from the table you exported to the CSV file; delete that first row after you import the CSV file.
You need to check "The first line of the file contains the table column names (if this is unchecked, the first line will become part of the data)" and then click "Go".

Import specific columns from text-file into mysql.. is this possible?

I've just downloaded a bunch of text files from data.gov, and there are fields in the text file that I really don't need.
Is there a way to import columns [1,3] and leave the rest?
I figure I'll import using LOAD DATA INFILE, but I didn't see anything on the MySQL manual page about how to import only certain columns.
http://dev.mysql.com/doc/refman/5.0/en/load-data.html
The fields are delimited by ^.
Just so I'm clear, if a line in the txt file is
00111^first column entry^second column entry^this would be the 3rd column
I am trying to get my mysql table to contain
first column entry | this would be the 3rd column
You can import the specific columns with:
LOAD DATA LOCAL INFILE 'yourFile' INTO TABLE table_name
FIELDS TERMINATED BY '^' (column1, @dummy, column3, @dummy);
Assign all the fields you don't need to the @dummy user variable.
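Applied to the sample line in the question, where the wanted values are the 2nd and 4th ^-separated fields, the mapping could look like this (column names are placeholders):
LOAD DATA LOCAL INFILE 'yourFile' INTO TABLE table_name
FIELDS TERMINATED BY '^'
(@dummy, first_entry, @dummy, third_entry);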
You could always create the table with dummy column(s) that you drop after loading the file (assuming you don't have to load the file very often).
Something like this:
LOAD DATA LOCAL INFILE '/path/to/file' INTO TABLE table_name
FIELDS TERMINATED BY '^' (dummy_column1, column1, dummy_column2, column2);
ALTER TABLE table_name DROP dummy_column1;
ALTER TABLE table_name DROP dummy_column2;
Assuming a Unix platform, you could filter the fields upstream.
cut -d^ -f2,4 mygovfile.dat > mytable.txt
This keeps what the question calls the first and third columns (the 2nd and 4th ^-separated fields); then import the result using your preferred method.
For instance
mysqlimport --local -uxxx -pyyy mydb --fields-terminated-by="^" mytable.txt ....
The two most common ways of dealing with this:
Import the data just as it is into a staging table, move what you need into your "real" tables, then truncate the staging table.
Use a text utility to snip out just what you need.
My text utility of choice is awk. A minimal awk script--which probably won't work for you without some tweaking--would look like this.
$ awk 'BEGIN { FS="^";OFS=",";}{print $2, $4}' test.dat
first column entry,this would be the 3rd column
What kind of tweaking? It usually involves taking care of embedded commas, single quotes, and double quotes.
This part
BEGIN { FS="^";OFS=",";}{print $2, $4}
is the whole awk program.
awk rocks.

SQL loader, insert filename in import

I'm trying to import a data file that looks like
FileName: BND20160114.dat
The rows look something like this:
SPSTART;BN;20160114;083422;000026
SPINFO;15165446456;A1;20160114
SPINFO;54645465456;A1;20160114
SPSLUT;BN;20160114;083422;4
I've created a simple control file that imports the rows into a table.
load data characterset WE8MSWIN1252
APPEND INTO TABLE CSSE_IMP.STAGED_PERSON_LKD
fields terminated by ';'
TRAILING NULLCOLS
(
FILE_ROWNUM RECNUM,
FILE_DATA POSITION(1:400)
)
My issue is that I'm going to import more files that are similar to this one, and I need to know the origin of every row I import.
Either I import the filename for every row.
Or I take the letters after the SPSTART; in the first row, and add them to every row.
I'm stuck. Could someone shed some light on this?
Thanks!
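One sketch of the first option: add a column for the source file to the target table and fill it with a CONSTANT in the control file; the control file would then have to be regenerated (or edited) for each input file. The FILE_NAME column here is an assumption, not part of the original table:
load data characterset WE8MSWIN1252
APPEND INTO TABLE CSSE_IMP.STAGED_PERSON_LKD
fields terminated by ';'
TRAILING NULLCOLS
(
FILE_ROWNUM RECNUM,
FILE_DATA POSITION(1:400),
FILE_NAME CONSTANT 'BND20160114.dat'
)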