Hive search using csv file

I'm relatively new to Hive, so I'm not even sure of the proper terminology to use, so this may have already been addressed. Apologies if it has.
Here's my scenario: we have a large table of data for thousands of devices, keyed by serial number. I need to look up specific variables for devices, often several hundred at a time. I know I can do a search that contains "SN=001 OR SN=002 OR SN=003..." for hundreds of entries, but that's awkward and time-consuming. What I'd like is to have a CSV file containing a list of serial numbers, and to run a query that says "give me the variables I want for all the devices in this CSV file". Is that possible, and if so, how do I do it? Thanks!

You can use the in_file UDF, which checks whether a string appears as a line in a given file.
Demo:
bash:
cat > /tmp/myfile.txt
111
123
222
333
789
hive:
create table mytable (mycol string);
insert into mytable values ('123'),('456'),('789');
select *
from mytable
where in_file(mycol, '/tmp/myfile.txt')
;
+-------+
| mycol |
+-------+
| 123   |
| 789   |
+-------+

If you have your CSV file in HDFS, you can simply create a table over it (we'll call it csv_table).
Then you can write your query as follows:
select *
from my_table
where specific_column in (
    select *
    from csv_table
);
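For completeness, a minimal sketch of creating such a table, assuming the file holds one serial number per line and lives in a hypothetical HDFS directory /user/me/serials/:
create external table csv_table (serial_number string)
row format delimited
fields terminated by ','
location '/user/me/serials/';
Hive reads the files in that directory in place, so no separate load step is needed.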

Related

HP Vertica - How do I specify date formats for the CSV parsers

In Vertica 7.2, I'm using COPY with fdelimitedparser. I would like to be able to specify a date or datetime format for some but not all of the columns. Different date columns can have different formats.
I can't list all columns like when using COPY without a parser, since I have many files with different column combinations, and I would rather avoid writing a script to generate my copy command for each file.
Is there any way to do this?
Additionally, how do I know which parser natively accepts which date format?
Thanks!
You can use Vertica's FILLER option when loading data.
See the example here:
Transform data during load in Vertica
A small example:
dbadmin=> \! cat /tmp/file.csv
2016-19-11
dbadmin=> copy tbl1 (v_col_1 FILLER date FORMAT 'YYYY-DD-MM',col1 as v_col_1) from '/tmp/file.csv';
Rows Loaded
-------------
1
(1 row)
dbadmin=> select * from tbl1;
col1
------------
2016-11-19
(1 row)
dbadmin=> copy tbl1 (v_col_1 FILLER date FORMAT 'YYYY-MM-DD',col1 as v_col_1) from '/tmp/file.csv';
Rows Loaded
-------------
1
(1 row)
dbadmin=> select * from tbl1;
col1
------------
2016-11-19
2017-07-14
(2 rows)
Hope this helped.
You can use the FORMAT keyword as part of the COPY command.
See the example below, from the Vertica forum:
create table test3 (id int, Name varchar(16), dt date, f2 numeric);
CREATE TABLE
vsql=> \!cat /tmp/mydata.data
1|foo|29-Jan-2013|100.0
2|bar|30-Jan-2013|200.0
3|egg|31-Jan-2013|300.0
4|tux|01-Feb-2013|59.9
vsql=> copy test3
vsql-> ( id, Name, dt format 'DD-MON-YYYY', f2)
vsql-> from '/tmp/mydata.data' direct delimiter '|' abort on error;
Rows Loaded
-------------
4
(1 row)
vsql=> select * from test3;
 id | Name |     dt     |    f2
----+------+------------+----------
  1 | foo  | 2013-01-29 | 100.0000
  2 | bar  | 2013-01-30 | 200.0000
  3 | egg  | 2013-01-31 | 300.0000
  4 | tux  | 2013-02-01 |  59.9000
As I understand it, you need to choose between "simple to load" and "fast to consume"; a flex table shifts some of the cost onto the consumers. Some background: a flex table uses row-based storage, so it consumes more disk space and has no ability to encode (compress) the data. You do have the option of materializing the relevant columns as columnar storage, but then the data is persisted twice, in both row and columnar form, so load time will be slower. At query time, if you plan to query only the materialized columns you should be fine; if not, expect performance issues.
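If you do go the flex-table route, Vertica ships helper functions for materializing columns; a minimal sketch, assuming a flex table named my_flex (the table name is illustrative):
-- scan the raw data and record the virtual columns it contains
SELECT COMPUTE_FLEXTABLE_KEYS('my_flex');
-- promote the discovered virtual columns to real, columnar columns
SELECT MATERIALIZE_FLEXTABLE_COLUMNS('my_flex');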

Exporting Data from Cassandra to CSV file

Table Name : Product
uid | productcount | term | timestamp
304ad5ac-4b6d-4025-b4ea-8b7991a3fe72 | 26 | dress | 1433110980000
6097e226-35b5-4f71-b158-a1fe39a430c1 | 0 | #751104 | 1433861040000
Command:
COPY product (uid, productcount, term, timestamp) TO 'temp.csv';
Error:
Improper COPY command.
Am I missing something?
The syntax of your original COPY command is fine. The problem is with your column named timestamp, which is a data type name and a reserved word in this context. For this reason you need to escape the column name with double quotes, as follows:
COPY product (uid, productcount, term, "timestamp") TO 'temp.csv';
Even better, try to use a different field name, because this can cause other problems as well.
I was able to export the data to a CSV file by using the command below. Omitting the column names did the trick.
copy product to 'temp.csv';
Use the following commands to get data from Cassandra tables into CSV.
This command copies only the first page of rows (100 at cqlsh's default page size) to the CSV file:
cqlsh -e "SELECT * FROM employee.employee_details" > /home/hadoop/final_Employee.csv
This command copies all the rows to the CSV file:
cqlsh -e "PAGING OFF; SELECT * FROM employee.employee_details" > /home/hadoop/final_Employee.csv
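If you also want the column names in the exported file, cqlsh's COPY TO accepts a HEADER option; reusing the quoted-identifier command from above:
COPY product (uid, productcount, term, "timestamp") TO 'temp.csv' WITH HEADER = TRUE;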

MySQL varchar column filled but not visible

I'm having a problem with a column (VARCHAR(513) NOT NULL) on a MySQL table. During an import from a CSV file, a bunch of rows got filled with some weird stuff coming from I don't know where.
This stuff is not visible from Workbench, but if I query the DBMS with:
SELECT * FROM MyTable;
I got:
ID | Drive | Directory        | URI          | Type
1  | Z:    | \Users\Data\     | \server\dati | 1     <- correct row
...
32 | NULL  | \Users\OtherDir\ |              | 0
While row 1 is correct, row 32 shows a URI filled with something. Now, if I query the DBMS with:
SELECT length(URI) FROM MyTable WHERE ID = 32;
I get 32. But doing:
SELECT URI FROM MyTable WHERE ID = 32;
inside an MFC application gets a string with length 0. Inside this program I have a tool for handling this table, but it cannot work because I cannot build queries for rows with bugged URIs. How can I fix this? Where does this problem come from? If you need more information, please ask.
Thanks.
It looks like you have whitespace in the data, which is causing the issue; this happens most often when importing data from a CSV file.
To fix it, you can run the following update statement:
update MyTable set URI = trim(URI);
The above will remove the whitespace from the column.
Also, when importing data from CSV, it's better to apply TRIM() to the values before inserting them into the database; this avoids these kinds of issues, as in the sketch below.
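A minimal sketch of doing that trim at load time with LOAD DATA, assuming a hypothetical comma-separated file whose columns line up with MyTable:
LOAD DATA INFILE '/tmp/mytable.csv'
INTO TABLE MyTable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(ID, Drive, Directory, @uri, Type)
SET URI = TRIM(@uri);  -- strip surrounding whitespace before the value is stored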

Importing a CSV file into a table with a different number of columns without a bridge / temp table

Say I have a CSV file with 3 columns and a destination table with 5 columns (3 identical to the CSV columns, and 2 more). All rows have data for the same number of columns.
CSV:
id | name | rating
---|-----------|-------
1 | radiohead | 10
2 | bjork | 9
3 | aqua | 2
SQL table:
id | name | rating | biggest_fan | next_concert
Right now, in order to import the CSV file, I create a temporary table with 3 columns, then copy the imported data into the real table. But this seems silly, and I can't seem to find any more efficient solution.
Isn't there a way to import the file directly into the destination table, while generating NULL / default values in the columns that appear in the table but not in the file?
I'm looking for a SQL / phpMyAdmin solution
No, I don't think there's a better way. A different approach would be to use a text-manipulating program (sed, awk, perl, python, ...) to add two commas to the end of each line, as in the sketch below; even if your column order didn't match, phpMyAdmin has a field for changing the order when importing a CSV. However, it still seems to require the proper number of columns. Whether that's more or less work than what you're already doing is up to you, though.
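For instance, a one-line sketch of that padding step, assuming a hypothetical input file artists.csv and assuming the two extra table columns come last:
sed 's/$/,,/' artists.csv > artists_padded.csv
Each line gains two empty trailing fields, so the padded file matches the table's five columns and the extra columns receive NULL / default values on import.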

mysql/oracle stored math formula

Is there any way to apply a math formula stored as a string in Oracle and/or MySQL?
col1 | col2 | formula
---------------------
2 | 2 | col1*col2
2 | 3 | col1+col2
SELECT * from tbl
result:
col1 | col2 | formula
---------------------
2 | 2 | 4
2 | 3 | 5
Edit: each row has a different formula.
I think what you're saying is that you want the database to parse the formula string. For example, for Oracle you could:
Add a column to the table to contain the result
Run an update statement which calls a PL/SQL function with the values of the columns in the table and the text of the formula:
update {table} set formula_result = fn_calc_result (col1, col2, formula_column);
The PL/SQL function would build a string by replacing "col1", "col2", and so forth with the actual values of those columns. You can do that with regular expressions, as long as the formulas are written consistently.
Then use
execute immediate 'select '||{formula}||' from dual' into v_return;
return v_return;
to calculate the result and return it.
Of course, you could also write your own parser. If you decide to go that way, don't forget to handle operation precedence, parentheses, and so forth.
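To make that concrete, here is a minimal sketch of such a function, assuming formulas reference only col1 and col2 and that a plain replace() is enough (the name fn_calc_result comes from the update above; the rest is illustrative):
create or replace function fn_calc_result (
  p_col1    in number,
  p_col2    in number,
  p_formula in varchar2
) return number
as
  v_sql    varchar2(4000);
  v_return number;
begin
  -- substitute the column references in the stored formula with actual values
  v_sql := replace(replace(p_formula, 'col1', to_char(p_col1)),
                   'col2', to_char(p_col2));
  -- evaluate the resulting arithmetic expression
  execute immediate 'select ' || v_sql || ' from dual' into v_return;
  return v_return;
end fn_calc_result;
/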
I think you want a virtual column. See here for an excellent article on its setup and use.
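As a quick illustration of the idea (note the expression is fixed at table-definition time, so this fits only when all rows share one formula):
create table tbl (
  col1   number,
  col2   number,
  result number generated always as (col1 * col2) virtual
);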
You may also do it via a PL/SQL script that you can trigger automatically when inserting the data.
See http://en.wikipedia.org/wiki/PL/SQL
PL/SQL is a procedural language that executes in the database itself. It's quite easy to do.