So basically I would like to create a table containing csv files
I tried something like this where the filenames only differ from eachother by the two last digits :
CREATE EXTERNAL TABLE pageviews
(page_date string,site string)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hue/201401/pageviews/supersite_1046_201401**.csv';
To me this syntax looks ok but when I execute it I get the following:
Error occurred executing hive query: Unknown exception.
Any help would be appreciated.
The LOCATION parameter of a hive's create table statement takes as argument a *hdfs_path* (See here). Such a path cannot be file path, but must be a directory path, hence the error you get.
In your case, you could put the required files under a specific directory und specify this very directory in the LOCATION clause of the create table statement.
Related
MySql query gives me data from the 2020-09-21 to 2022-11-02. I want to save the file as FieldData_20200921_20221102.csv.
Mysql query
SELECT 'datetime','sensor_1','sensor_2'
UNION ALL
SELECT datetime,sensor_1,sensor_2
FROM `field_schema`.`sensor_table`
INTO OUTFILE "FieldData.csv"
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
;
Present output file:
Presently I named the file as FieldData.csv and it is accordingly giving me the same. But I want the query to automatically append the first and last dates to this file, so, it helps me know the duration of data without having to open it.
Expected output file
FieldData_20200921_20221102.csv.
MySQL's SELECT ... INTO OUTFILE syntax accepts only a fixed string literal for the filename, not a variable or an expression.
To make a custom filename, you would have to format the filename yourself and then write dynamic SQL so the filename could be a string literal. But to do that, you first would have to know the minimum and maximum date values in the data set you are dumping.
I hardly ever use SELECT ... INTO OUTFILE, because it can only create the outfile on the database server. I usually want the file to be saved on the server where my client application is running, and the database server's filesystem is not accessible to the application.
Both the file naming problem and the filesystem access problem are better solved by avoiding the SELECT ... INTO OUTFILE feature, and instead writing to a CSV file using application code. Then you can name the file whatever you want.
I am trying to load csv files into a Hive table. I need to have it done through HDFS.
My end goal is to have the hive table also connected to Impala tables, which I can then load into Power BI, but I am having trouble getting the Hive tables to populate.
I create a table in the Hive query editor using the following code:
CREATE TABLE IF NOT EXISTS dbname.table_name (
time_stamp TIMESTAMP COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value DOUBLE COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
Then I check and see the LOCATION using the following code:
SHOW CREATE TABLE dbname.table_name;
and find that is has gone to the default location:
hdfs://our_company/user/hive/warehouse/dbname.db/table_name
So I go to the above location in HDFS, and I upload a few csv files manually, which are in the same five-column format as the table I created. Here is where I expect this data to be loaded into the Hive table, but when I go back to dbname in Hive, and open up the table I made, all values are still null, and when I try to open in browser I get:
DB Error
AnalysisException: Could not resolve path: 'dbname.table_name'
Then I try the following code:
LOAD DATA INPATH 'hdfs://our_company/user/hive/warehouse/dbname.db/table_name' INTO TABLE dbname.table_name;
It runs fine, but the table in Hive still does not populate.
I also tried all of the above using CREATE EXTERNAL TABLE instead, and specifying the HDFS in the LOCATION argument. I also tried making an HDFS location first, uploading the csv files, then CREATE EXTERNAL TABLE with the LOCATION argument pointed at the pre-made HDFS location.
I already made sure I have authorization privileges.
My table will not populate with the csv files, no matter which method I try.
What I am doing wrong here?
I was able to solve the problem using:
CREATE TABLE IF NOT EXISTS dbname.table_name (
time_stamp STRING COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value STRING COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
and
LOAD DATA INPATH 'hdfs://our_company/user/hive/warehouse/dbname.db/table_name' OVERWRITE INTO TABLE dbname.table_name;
I'm attempting to COPY a CSV file to Redshift from an S3 bucket. When I execute the command, I don't get any error messages, however the load doesn't work.
Command:
COPY temp FROM 's3://<bucket-redacted>/<object-redacted>.csv'
CREDENTIALS 'aws_access_key_id=<redacted>;aws_secret_access_key=<redacted>'
DELIMITER ',' IGNOREHEADER 1;
Response:
Load into table 'temp' completed, 0 record(s) loaded successfully.
I attempted to isolate the issue via the system tables, but there is no indication there are issues.
Table Definition:
CREATE TABLE temp ("id" BIGINT);
CSV Data:
id
123,
The line endings in your csv file probably don't have a unix new line character at the end, so the COPY command probably sees your file as:
id123,
Given you have the IGNOREHEADER option enabled, and the line endings in the file aren't what COPY is expecting (my assumption based on past experience), the file contents get treated as one line, and then skipped.
I had this occur for some files created from a Windows environment.
I guess one thing to remember is that CSV is not a standard, more a convention, and different products/vendors have different implementations for csv file creation.
I repeated your instructions, and it worked just fine:
First, the CREATE TABLE
Then, the LOAD (from my own text file containing just the two lines you show)
This resulted in:
Code: 0 SQL State: 00000 --- Load into table 'temp' completed, 1 record(s) loaded successfully.
So, there's nothing obviously wrong with your commands.
At first, I thought that the comma at the end of your data line could cause Amazon Redshift to think that there is an additional column of data that it can't map to your table, but it worked fine for me. Nonetheless, you might try removing the comma, or create an additional column to store this 'empty' value.
How can I load 10,000 rows of test.xls file into mysql db table?
When I use below query it shows this error.
LOAD DATA INFILE 'd:/test.xls' INTO TABLE karmaasolutions.tbl_candidatedetail (candidate_firstname,candidate_lastname);
My primary key is candidateid and has below properties.
The test.xls contains data like below.
I have added rows starting from candidateid 61 because upto 60 there are already candidates in table.
please suggest the solutions.
Export your Excel spreadsheet to CSV format.
Import the CSV file into mysql using a similar command to the one you are currently trying:
LOAD DATA INFILE 'd:/test.csv'
INTO TABLE karmaasolutions.tbl_candidatedetail
(candidate_firstname,candidate_lastname);
To import data from Excel (or any other program that can produce a text file) is very simple using the LOAD DATA command from the MySQL Command prompt.
Save your Excel data as a csv file (In Excel 2007 using Save As) Check
the saved file using a text editor such as Notepad to see what it
actually looks like, i.e. what delimiter was used etc. Start the MySQL
Command Prompt (I’m lazy so I usually do this from the MySQL Query
Browser – Tools – MySQL Command Line Client to avoid having to enter
username and password etc.) Enter this command: LOAD DATA LOCAL INFILE
‘C:\temp\yourfile.csv’ INTO TABLE database.table FIELDS TERMINATED
BY ‘;’ ENCLOSED BY ‘”‘ LINES TERMINATED BY ‘\r\n’ (field1, field2);
[Edit: Make sure to check your single quotes (') and double quotes (")
if you copy and paste this code - it seems WordPress is changing them
into some similar but different characters] Done! Very quick and
simple once you know it :)
Some notes from my own import – may not apply to you if you run a different language version, MySQL version, Excel version etc…
TERMINATED BY – this is why I included step 2. I thought a csv would default to comma separated but at least in my case semicolon was the deafult
ENCLOSED BY – my data was not enclosed by anything so I left this as empty string ”
LINES TERMINATED BY – at first I tried with only ‘\n’ but had to add the ‘\r’ to get rid of a carriage return character being imported into the database
Also make sure that if you do not import into the primary key field/column that it has auto increment on, otherwhise only the first row will be imported
Original Author reference
I've read through some other posts and nothing quite answers my question specifically.
I have an existing database in phpMyAdmin - a set of pin codes we use to collect contest entries.
The DB has about 10,000 pin codes in it.
I need to add 250 "New" codes to it. I have an excel file that is stripped down to a single column .csv, no header - just codes.
What I need to do is import this into the table named "pin2" and add these to the row called "pin"
The other rows are where entrants would add names and phone numbers, so are all "null"
I've uploaded a screen grab of the structure.
DB Structure http://www.redpointdesign.ca/sql.png
any help would be appreciated!
You need to use a LOAD DATA query similar to this:
LOAD DATA INFILE 'pincodes.csv'
INTO TABLE pin2 (pin)
If the pin codes in the csv file are enclosed in quotes you may also need to include an ENCLOSED BY clause.
LOAD DATA INFILE 'pincodes.csv'
INTO TABLE pin2
FIELDS ENCLOSED BY '"'
(pin)
If you wants to do using csv
Then you need to need to follow these steps
Manually define autoincremented value in first comlumn.
In other column you have to externally define it as a NULL,
otherwise you will get Invalid column count in CSV input on line 1.
because column with no value is not consider by phpmyadmin
Them click on import in phpmyadmin and you are done ..