Table Name : Product
uid | productcount | term | timestamp
304ad5ac-4b6d-4025-b4ea-8b7991a3fe72 | 26 | dress | 1433110980000
6097e226-35b5-4f71-b158-a1fe39a430c1 | 0 | #751104 | 1433861040000
Command :
COPY product (uid, productcount, term, timestamp) TO 'temp.csv';
Error:
Improper COPY command.
Am I missing something?
The syntax of your COPY command is fine. The problem is the column named timestamp: timestamp is a data type and a reserved word in this context, so you need to quote the column name as follows:
COPY product (uid, productcount, term, "timestamp") TO 'temp.csv';
Better still, use a different column name altogether, because a reserved word as a column name can cause other problems as well.
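The same quoting applies wherever the column is named, so a later re-import of that file would look like this (a sketch using your table and file name):
COPY product (uid, productcount, term, "timestamp") FROM 'temp.csv';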
I was able to export the data to a CSV file using the command below. Leaving out the column names did the trick.
COPY product TO 'temp.csv';
Use the following commands to export data from Cassandra tables to CSV.
This command copies the first 100 rows to a CSV file:
cqlsh -e "SELECT * FROM employee.employee_details" > /home/hadoop/final_Employee.csv
This command copies all of the rows to a CSV file:
cqlsh -e "PAGING OFF; SELECT * FROM employee.employee_details" > /home/hadoop/final_Employee.csv
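Alternatively, cqlsh's built-in COPY TO exports every row regardless of paging; here is a minimal sketch using the same table and output path as above (the HEADER option, which writes the column names as the first row, is optional):
COPY employee.employee_details TO '/home/hadoop/final_Employee.csv' WITH HEADER = TRUE;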
I know AWS mentions in their documentation that a CSV file is treated much like a TXT file, but why is there no stl_load_commits entry for a CSV load?
For example:
If I run a query like:
COPY "systemtable" FROM 's3://test/example.txt' <credentials> IGNOREHEADER 1 delimiter as ','
then it creates an entry in stl_load_commits, which I can query with:
select query, curtime as updated from stl_load_commits where query = pg_last_copy_id();
But when I try the same thing with:
COPY "systemtable" FROM 's3://test/example.csv'
<credentials> IGNOREHEADER 1 delimiter as ',' format csv;
then the result of
select query, curtime as updated from stl_load_commits where query = pg_last_copy_id();
is blank. Why does AWS not create an entry for the CSV load?
That is the first part of the question. Secondly, there must be some way to check the status of a loaded file.
How can we check whether the file has loaded successfully into the DB when the file is of type CSV?
The format of the file does not affect the visibility of success or error information in system tables.
When you run COPY it returns confirmation of success and a count of rows loaded. Some SQL clients may not show this information, but here's what it looks like using psql:
COPY public.web_sales from 's3://my-files/csv/web_sales/'
FORMAT CSV
GZIP
CREDENTIALS 'aws_iam_role=arn:aws:iam::01234567890:role/redshift-cluster'
;
-- INFO: Load into table 'web_sales' completed, 72001237 record(s) loaded successfully.
-- COPY
If the load succeeded you can see the files in stl_load_commits:
SELECT query, TRIM(file_format) format, TRIM(filename) file_name, lines, errors FROM stl_load_commits WHERE query = pg_last_copy_id();
query | format | file_name | lines | errors
---------+--------+---------------------------------------------+---------+--------
1928751 | Text | s3://my-files/csv/web_sales/0000_part_03.gz | 3053206 | -1
1928751 | Text | s3://my-files/csv/web_sales/0000_part_01.gz | 3053285 | -1
If the load fails you should get an error. Here's an example error (note the table I try to load):
COPY public.store_sales from 's3://my-files/csv/web_sales/'
FORMAT CSV
GZIP
CREDENTIALS 'aws_iam_role=arn:aws:iam::01234567890:role/redshift-cluster'
;
--ERROR: Load into table 'store_sales' failed. Check 'stl_load_errors' system table for details.
You can see the error details in stl_load_errors.
SELECT query, TRIM(filename) file_name, TRIM(colname) "column", line_number line, TRIM(err_reason) err_reason FROM stl_load_errors where query = pg_last_copy_id();
query | file_name | column | line | err_reason
---------+------------------------+-------------------+------+---------------------------
1928961 | s3://…/0000_part_01.gz | ss_wholesale_cost | 1 | Overflow for NUMERIC(7,2)
1928961 | s3://…/0000_part_02.gz | ss_wholesale_cost | 1 | Overflow for NUMERIC(7,2)
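Beyond the system tables, you can also confirm the row count of the most recent COPY in the current session with pg_last_copy_count(); a minimal check (it returns the number of rows loaded by the last COPY command run in this session, whatever the file format):
SELECT pg_last_copy_count();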
I'm relatively new to Hive, so I'm not even sure of the proper terminology to use; this may have already been addressed. Apologies if it has.
Here's my scenario: we have a large table of data for thousands of devices, keyed by serial number. I need to look up specific variables for devices, often several hundred at a time. I know I can do a search that contains "SN=001 OR SN=002 OR SN=003.." for hundreds of entries, but that's awkward and time-consuming. What I'd like to be able to do is have a CSV file that contains a list of serial numbers, and perform a search that says "give me the variables I want for all the devices in this CSV file". Is that possible, and if so, how do I do it? Thanks!
Use the in_file function; it returns true when its first argument appears as a complete line in the given file.
Demo
bash
cat>/tmp/myfile.txt
111
123
222
333
789
hive
create table mytable (mycol string);
insert into mytable values (123),(456),(789);
select *
from mytable
where in_file (mycol,'/tmp/myfile.txt')
;
+-------+
| mycol |
+-------+
| 123 |
| 789 |
+-------+
If you have your CSV file in HDFS, you can just make a table over it (we'll call it csv_table) with the serial number as its only column.
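For example, an external table over the HDFS directory holding the CSV might look like this (a sketch; the column name serial_number and the location path are placeholders):
create external table csv_table (serial_number string)
row format delimited fields terminated by ','
location '/path/to/serial_numbers/';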
Then you can write your query as follows:
select *
from my_table
where specific_column in (
select *
from csv_table)
;
Say I have a CSV file with 3 columns and a destination table with 5 columns (3 identical to the CSV columns, and 2 more). All rows have data for the same number of columns.
CSV:
id | name | rating
---|-----------|-------
1 | radiohead | 10
2 | bjork | 9
3 | aqua | 2
SQL table:
id | name | rating | biggest_fan | next_concert
Right now, in order to import the CSV file, I create a temporary table with 3 columns, then copy the imported data into the real table. But this seems silly, and I can't seem to find any more efficient solution.
Isn't there a way to import the file directly into the destination table, while generating NULL / default values in the columns that appear in the table but not in the file?
I'm looking for a SQL / phpMyAdmin solution
No, I don't think there's a better way. A different approach would be to use a text-manipulation program (sed, awk, perl, python, ...) to add two commas to the end of each line. Even if your column order didn't match, phpMyAdmin has a field for changing the order when importing a CSV; however, it still seems to require the proper number of columns. Whether that's more or less work than what you're already doing is up to you, though.
Suppose I would like to import a csv file into the following table:
CREATE TABLE example_table (
id int PRIMARY KEY,
comma_delimited_str_list list<ascii>,
space_delimited_str_list list<ascii>
);
where comma_delimited_str_list and space_delimited_str_list are two list attributes that use a comma and a space as their delimiters, respectively.
An example csv record would be:
12345,"hello,world","stack overflow"
where I would like to treat "hello,world" and "stack overflow" as two multi-valued attributes.
Can someone tell me how to import such a CSV file into its corresponding table in Cassandra, preferably using CQL COPY?
CQL 1.2 is able to import a CSV file with multi-valued fields directly into a table. However, the format of those multi-valued fields must match the CQL format.
For example, lists must be in the form ['abc','def','ghi'], and sets must be in the form {'123','456','789'}.
Below is an example of importing CSV-formatted data from STDIN into the example_table mentioned in the OP:
cqlsh:demo> copy example_table from STDIN;
[Use \. on a line by itself to end input]
[copy] 12345,"['hello','world']","['stack','overflow']"
[copy] 56780,"['this','is','a','test','list']","['here','is','another','one']"
[copy] \.
2 rows imported in 11.304 seconds.
cqlsh:demo> select * from example_table;
id | comma_delimited_str_list | space_delimited_str_list
-------+---------------------------+--------------------------
12345 | [hello, world] | [stack, overflow]
56780 | [this, is, a, test, list] | [here, is, another, one]
Importing incorrectly formatted list or set values from a CSV file will raise an error:
cqlsh:demo> copy example_table from STDIN;
[Use \. on a line by itself to end input]
[copy] 9999,"hello","world"
Bad Request: line 1:108 no viable alternative at input ','
Aborting import at record #0 (line 1). Previously-inserted values still present.
The above input should be replaced by 9999,"['hello']","['world']":
cqlsh:demo> copy example_table from STDIN;
[Use \. on a line by itself to end input]
[copy] 9999,"['hello']","['world']"
[copy] \.
1 rows imported in 16.859 seconds.
cqlsh:demo> select * from example_table;
id | comma_delimited_str_list | space_delimited_str_list
-------+---------------------------+--------------------------
9999 | [hello] | [world]
12345 | [hello, world] | [stack, overflow]
56780 | [this, is, a, test, list] | [here, is, another, one]
I'm importing data from a .csv file into a MySQL DB using a "LOAD DATA LOCAL INFILE" query.
The .csv contains the following:
ID | Name | Date       | Price
01 | abc  | 13-02-2013 | 1500
02 | nbd  | blahblahbl | 1000
03 | kgj  | 11-02-2012 | jghj
My MySQL table contains the following columns:
Id INTEGER
Name VARCHAR(100)
InsertionTimeStamp DATE
Price INTEGER
MySQL query to load the .csv data into the table above:
LOAD DATA LOCAL INFILE 'UserDetails.csv' INTO TABLE userdetails
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(Id,Name,InsertionTimeStamp,Price)
set InsertionTimeStamp = str_to_date(@d,'%d-%m-%Y');
When I executed the query, 3 records were inserted into the table, with 2 warnings:
a) Incorrect datetime value: 'blahblahbl' for function str_to_date
b) Data truncated at row 3 (because of the invalid INTEGER for the Price column)
Questions:
1. Is there any way to prevent rows that produce warnings/errors (i.e. rows with invalid data) from being inserted? I don't want Row 2 & Row 3 to be inserted, as they contain invalid data.
2. For the 'Incorrect datetime value' warning above, can I also get the row number? Basically I want the exact warning/error together with the row number.
I think it would be much easier if you validated the input with some other language (for example PHP).
You'd just have to iterate through the lines of the CSV and call something like this, this and this!
If you just have to fiddle with SQL, this might help!
You can try the Data Import tool (CSV or TXT import format) in dbForge Studio for MySQL.
In the Data Import wizard, uncheck the Use bulk insert option on the Modes page and check the Ignore all errors option on the Errors handling page; this will let you skip the import errors.
I know you are trying to skip problematic rows, but don't you think you have a mistake in your LOAD DATA command? Shouldn't it be:
LOAD DATA LOCAL INFILE 'UserDetails.csv' INTO TABLE userdetails
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(Id,Name,@d,Price)
set InsertionTimeStamp = str_to_date(@d,'%d-%m-%Y');
Shouldn't you be using the variable name (@d) in the list of columns instead of the actual name of the column (InsertionTimeStamp)? That could be the reason you are getting the error message about the datetime value.
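On the second part of the question (getting the row number): running SHOW WARNINGS in the same session, immediately after the LOAD DATA statement, prints each warning's level, code and full message; truncation warnings such as (b) above carry the row number, though the str_to_date warning may not (a minimal sketch):
SHOW WARNINGS;
-- e.g. Warning | 1265 | Data truncated for column 'Price' at row 3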