MySQL: importing from CSV

I have a table with three columns: NODEID, X, and Y. NODEID is the primary key, defined as INT(4) with AUTO_INCREMENT. I want to add more data to this table by importing a CSV through the phpMyAdmin import. Questions:
What should the format of the CSV look like?
Is this possible, or does importing simply replace the whole table with the CSV?
At the moment the CSV looks like:
1,-105.057578,39.785603
2,-105.038646,39.771132
3,-105.013045,39.771727
5,-105.045721,39.762055
6,-105.031777,39.76206
7,-105.046015,39.72835
8,-105.029796,39.728304
10,-104.930863,39.754579
11,-104.910624,39.754644
13,-104.930959,39.74367
16,-105.045802,39.685253
17,-105.032149,39.688557
18,-105.060891,39.657622
20,-105.042257,39.644086
etc...

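Change the SQL that phpMyAdmin will run to this:
LOAD DATA INFILE '*FILEPATH*'
INTO TABLE *table*
FIELDS TERMINATED BY ','
(X, Y);
(You should only have to change the last line. The FIELDS TERMINATED BY ',' clause must be present for comma-separated input, because LOAD DATA otherwise expects tab-separated fields.)
And your CSV should look like: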
-105.057578,39.785603
-105.038646,39.771132
-105.013045,39.771727
-105.045721,39.762055
-105.031777,39.76206
-105.046015,39.72835
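The last line tells MySQL to read only those two columns from the file. Columns that are not listed are set to their default values, so NODEID, being AUTO_INCREMENT, receives the next sequence value for each new row. Existing rows are left untouched; the import appends rather than replaces.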
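If the existing CSV already contains the NODEID column, it has to be stripped before the import described above. A minimal sketch in Python, assuming the ID is always the first field; the function names and file paths are made up for illustration:

```python
import csv

def strip_ids(lines):
    """Parse CSV lines and return each row without its first field (NODEID)."""
    return [row[1:] for row in csv.reader(lines) if row]

def convert(src_path, dst_path):
    """Write a copy of src_path that contains only the X and Y columns."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        csv.writer(dst, lineterminator="\n").writerows(strip_ids(src))
```

Calling convert("nodes.csv", "nodes_xy.csv") would produce the two-column file that the LOAD DATA statement expects.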

Related

Unable to load .csv data from hdfs into Hive table in Hadoop

I am trying to load csv files into a Hive table. I need to have it done through HDFS.
My end goal is to have the hive table also connected to Impala tables, which I can then load into Power BI, but I am having trouble getting the Hive tables to populate.
I create a table in the Hive query editor using the following code:
CREATE TABLE IF NOT EXISTS dbname.table_name (
time_stamp TIMESTAMP COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value DOUBLE COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
Then I check and see the LOCATION using the following code:
SHOW CREATE TABLE dbname.table_name;
and find that it has gone to the default location:
hdfs://our_company/user/hive/warehouse/dbname.db/table_name
So I go to the above location in HDFS, and I upload a few csv files manually, which are in the same five-column format as the table I created. Here is where I expect this data to be loaded into the Hive table, but when I go back to dbname in Hive, and open up the table I made, all values are still null, and when I try to open in browser I get:
DB Error
AnalysisException: Could not resolve path: 'dbname.table_name'
Then I try the following code:
LOAD DATA INPATH 'hdfs://our_company/user/hive/warehouse/dbname.db/table_name' INTO TABLE dbname.table_name;
It runs fine, but the table in Hive still does not populate.
I also tried all of the above using CREATE EXTERNAL TABLE instead, and specifying the HDFS in the LOCATION argument. I also tried making an HDFS location first, uploading the csv files, then CREATE EXTERNAL TABLE with the LOCATION argument pointed at the pre-made HDFS location.
I already made sure I have authorization privileges.
My table will not populate with the csv files, no matter which method I try.
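What am I doing wrong here?
I was able to solve the problem by declaring the field delimiter explicitly (Hive text tables default to Ctrl-A as the field separator, not a comma) and typing every column as STRING: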
CREATE TABLE IF NOT EXISTS dbname.table_name (
time_stamp STRING COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value STRING COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
and
LOAD DATA INPATH 'hdfs://our_company/user/hive/warehouse/dbname.db/table_name' OVERWRITE INTO TABLE dbname.table_name;
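Why the first table showed only NULLs can be illustrated with a toy model in Python. This is not Hive's actual parser, only a sketch of the splitting step; the sample row and the function name are hypothetical:

```python
HIVE_DEFAULT_FIELD_DELIMITER = "\x01"  # Ctrl-A, used when no ROW FORMAT is given

def split_fields(line, delimiter, n_columns):
    """Split a text row into exactly n_columns fields, padding with None."""
    fields = line.rstrip("\n").split(delimiter)[:n_columns]
    return fields + [None] * (n_columns - len(fields))

row = "2020-01-01 00:00:00,speed,12.5,truck1,a.csv"

# With the default delimiter the whole line lands in the first column and the
# other four are empty. The first column was declared TIMESTAMP, and the full
# line is not a valid timestamp, so it would read as NULL too.
default_split = split_fields(row, HIVE_DEFAULT_FIELD_DELIMITER, 5)

# With ',' as the delimiter every column receives its own value.
comma_split = split_fields(row, ",", 5)
```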

Is it possible to create a hive table from a specific subset of CSV columns?

I have approximately 400 CSV files. I want to create a Hive table over these CSV files but include only a certain subset of the columns (see below). I know I could create a table with all of them, then use a SELECT statement to grab only the ones I want and make a second Hive table, but I was wondering whether there is a way to avoid doing that.
Here are my columns:
columns = ['time', 'Var2', 'Var3', 'Var4', 'Var5', 'Var6', 'Var7', 'I0', 'I1',
'I2', 'V0', 'V1', 'V2', 'fpa', 'fpb', 'fpc', 'fpg', 'filename',
'record_time_stamp', 'fault', 'unix_time', 'Var2_real', 'Var2_imag',
'Var3_real', 'Var3_imag', 'Var4_real', 'Var4_imag', 'Var5_real',
'Var5_imag', 'Var6_real', 'Var6_imag', 'Var7_real', 'Var7_imag',
'I0_real', 'I0_imag', 'I1_real', 'I1_imag', 'I2_real', 'I2_imag',
'V0_real', 'V0_imag', 'V1_real', 'V1_imag', 'V2_real', 'V2_imag']
I don't want these in the Hive table:
['Var2', 'Var3', 'Var4', 'Var5', 'Var6', 'Var7', 'I0', 'I1','I2', 'V0', 'V1', 'V2']
I understand I can just alter my data in the CSVs or use 2 Hive tables but I don't want to alter my data (because another team will use those columns for their work) and I don't want to make another table for the sake of keeping things neat. Is this possible?
If you can use Spark, I'd suggest reading the data from the CSV files, defining a data model with only the columns you need, and applying it to the RDD you have ingested to create a DataFrame.
Then save the DataFrame with .saveAsTable() and it should appear in your Hive database.
Reshaping data to this extent is a task for Spark, not Hive.
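A rough PySpark sketch of this approach follows. It uses the DataFrame CSV reader directly instead of going through an RDD, and it assumes that the CSV files have a header row and that `spark` is an existing SparkSession created with Hive support; the path and table name passed in are placeholders:

```python
UNWANTED = ["Var2", "Var3", "Var4", "Var5", "Var6", "Var7",
            "I0", "I1", "I2", "V0", "V1", "V2"]

def wanted_columns(all_columns, unwanted=UNWANTED):
    """Return all_columns minus the unwanted names, preserving order."""
    drop = set(unwanted)
    return [c for c in all_columns if c not in drop]

def save_subset(spark, csv_path, table_name):
    """Read the CSVs, keep only the wanted columns, and save them as a table.

    Assumes `spark` is a SparkSession built with enableHiveSupport() and that
    every file under csv_path starts with a header line.
    """
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    subset = df.select(*wanted_columns(df.columns))
    subset.write.mode("overwrite").saveAsTable(table_name)
```

The original CSV files are never modified, so the other team can keep using the full set of columns.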

Importing a CSV with a timestamp field into MonetDB

I'm importing a CSV into MonetDB. I create a table called fx:
CREATE TABLE fx(ticktime timestamp,broker varchar(6),pair varchar(10),side varchar(1),price float,size tinyint,level tinyint)
and now I am trying to upload a large CSV file that does not have a header.
My sample.csv:
20150828 00:00:00.023,BRK1,EUR/USD,A,1.12437,1,1
20150828 00:00:00.023,BRK1,EUR/USD,A,1.12439,5,2
20150828 00:00:00.023,BRK1,EUR/USD,A,1.12441,9,3
My command:
sql>copy into fx from 'c:\fx\sample.csv' using delimiters ',','\n';
Failed to import table line 1 field 1 'timestamp(7)' expected in '20150828 00:00:00.023'
How do I upload this csv?
The timestamp format in your file is not one that MonetDB accepts, so you have two options:
1) Change the type of ticktime to string:
CREATE TABLE fx(ticktime string, broker varchar(6),pair varchar(10),side varchar(1),price float,size tinyint,level tinyint);
COPY INTO ...
However, you would then need to convert the string column ticktime to a new column ticktimet of type timestamp using string manipulation, for example:
ALTER TABLE fx add column ticktimet timestamp;
UPDATE fx SET ticktimet=str_to_timestamp(ticktime , '%Y%m%d %H:%M:%S');
Note that this solution will discard the subsecond part (e.g. .023) from the timestamp, as this is currently not supported in str_to_timestamp.
2) Change the CSV to use a timestamp format MonetDB accepts, e.g.
2015-08-28 00:00:00.023,BRK1,EUR/USD,A,1.12437,1,1
2015-08-28 00:00:00.023,BRK1,EUR/USD,A,1.12439,5,2
2015-08-28 00:00:00.023,BRK1,EUR/USD,A,1.12441,9,3
Then, COPY INTO should work directly.
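For a large file, option 2 is easier to script than to do by hand. A minimal Python sketch, assuming the timestamp is always the first field and always has the form shown in sample.csv; the function names and file paths are illustrative:

```python
import csv
from datetime import datetime

def fix_timestamp(value):
    """Convert '20150828 00:00:00.023' to '2015-08-28 00:00:00.023'."""
    parsed = datetime.strptime(value, "%Y%m%d %H:%M:%S.%f")
    # %f formats six digits (microseconds); drop the last three to keep
    # the millisecond precision of the input.
    return parsed.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]

def convert(src_path, dst_path):
    """Rewrite the first field of every row and copy the rest unchanged."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src):
            if row:
                row[0] = fix_timestamp(row[0])
                writer.writerow(row)
```

Unlike option 1, this keeps the subsecond part of each timestamp.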

Importing an excel .csv file and adding it to a column in phpMyAdmin

I've read through some other posts and nothing quite answers my question specifically.
I have an existing database in phpMyAdmin - a set of pin codes we use to collect contest entries.
The DB has about 10,000 pin codes in it.
I need to add 250 "New" codes to it. I have an excel file that is stripped down to a single column .csv, no header - just codes.
What I need to do is import this into the table named "pin2" and add the codes to the column called "pin".
The other columns are where entrants would add names and phone numbers, so they are all NULL.
I've uploaded a screen grab of the structure.
DB Structure http://www.redpointdesign.ca/sql.png
Any help would be appreciated!
You need to use a LOAD DATA query similar to this:
LOAD DATA INFILE 'pincodes.csv'
INTO TABLE pin2 (pin)
If the pin codes in the csv file are enclosed in quotes you may also need to include an ENCLOSED BY clause.
LOAD DATA INFILE 'pincodes.csv'
INTO TABLE pin2
FIELDS ENCLOSED BY '"'
(pin)
If you want to do this with a plain CSV import, follow these steps:
Manually put the auto-incremented value in the first column.
In every other column that has no value, explicitly write NULL;
otherwise you will get "Invalid column count in CSV input on line 1",
because phpMyAdmin does not count a column that has no value.
Then click Import in phpMyAdmin and you are done.
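The padded file described in these steps can be generated instead of edited by hand. A small Python sketch, assuming the table layout is an ID column, then the pin column, then some number of nullable columns (the real layout is only visible in the screenshot, so the column count and starting ID are parameters); all names are hypothetical:

```python
import csv

def pad_rows(codes, start_id, n_other_columns):
    """Yield [id, code, 'NULL', ...] rows, one per pin code."""
    for offset, code in enumerate(codes):
        yield [start_id + offset, code] + ["NULL"] * n_other_columns

def write_padded(codes, dst_path, start_id, n_other_columns):
    """Write the padded rows to dst_path as comma-separated values."""
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        writer.writerows(pad_rows(codes, start_id, n_other_columns))
```

Here start_id should be one more than the highest ID already in the table, so that the new rows do not collide with existing ones.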

Invalid field count in CSV input on line 1

I exported an ODS file to CSV, but when I import it into phpMyAdmin I get "Invalid field count in CSV input on line 1."
File (it has more than two lines but the scheme is the same):
"Administração da Guarda Nacional Republicana"
"Administração de Publicidade e Marketing"
table:
CREATE TABLE IF NOT EXISTS `profession` (
`id_profession` int(11) NOT NULL,
`profession` varchar(45) DEFAULT NULL,
`formation_area_id_formation_area` int(11) NOT NULL,
PRIMARY KEY (`id_profession`),
UNIQUE KEY `profession_UNIQUE` (`profession`),
KEY `fk_profession_formation_area1` (`formation_area_id_formation_area`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I have never done anything like this before; I probably need to specify the columns. The CSV has only one column and the table has three. In this case the file's data belongs in the profession column.
phpMyAdmin lets you specify column names. When logged into the desired database:
Select the table you want to import to.
Click the Import tab.
Under Format of imported file, select CSV.
In Column names, write out a comma separated list of the columns you want the data imported to.
You can also use mysqlimport if you prefer the shell.
For example:
shell> mysqlimport --columns=column1,column2 dbname imptest.txt
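When the import has to be repeated, the mysqlimport call can be wrapped in a small script. The sketch below is only an illustration: the helper names are made up, credentials are assumed to come from an option file such as ~/.my.cnf, and the delimiter flags assume a comma-separated file with optional double quotes. Note that mysqlimport takes the table name from the file name, so profession.csv would load into a table called profession.

```python
import subprocess

def mysqlimport_command(database, csv_path, columns):
    """Build the argument list for a comma-separated import into given columns."""
    return [
        "mysqlimport",
        "--local",                            # read the file from the client host
        "--fields-terminated-by=,",
        '--fields-optionally-enclosed-by="',
        "--columns=" + ",".join(columns),
        database,
        csv_path,
    ]

def run_import(database, csv_path, columns):
    """Run mysqlimport and raise CalledProcessError if it fails."""
    subprocess.run(mysqlimport_command(database, csv_path, columns), check=True)
```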
If you want this to import into that table you have 2 choices:
1) Add a comma before and after the data in every row of your file, or
2) Delete the first and third columns in your table, import the data and then add the 2 columns you deleted back.
In Excel I saved the file as "Microsoft Office Excel Comma Separated Values File (.csv)"
In phpMyAdmin:
Select database you want to import table into.
Click import tab.
Select your file. Set FORMAT to CSV
Leave Format-Specific Options alone except for ticking the "The first line of the file contains the table column names" box if you need to
Click GO
You then need to rename the table (which will be called something like "TABLE 5" if it's the 5th table in the DB). So select the table and use the Operations tab -> "Rename table to:"
Make sure you are uploading a CSV file, not an Excel file, and then follow the steps below:
1 Import
2 Browse for your csv file.
3 Select CSV using LOAD DATA (rather than just CSV)
4 Change “Fields terminated by” from “;” to “,”
5 Make sure “Use LOCAL keyword” is selected.
6 Click “Go”
Go to Import tab
Browse for the csv file.
Select "CSV" under "Format of imported file"
Check "Ignore duplicate rows"
Change “Fields terminated by” from “;” to “,”
Note: check the data after it has been imported. The first row of a CSV file usually contains the column headers, so it gets imported as data; delete that row from the table after the import.
You need to check "The first line of the file contains the table column names (if this is unchecked, the first line will become part of the data)" and then click on "GO".