Loading CSV with NULL columns using bq load - mysql

I am trying to upload a CSV file (a TSV, actually), generated in MySQL using SELECT ... INTO OUTFILE, into BigQuery using the bq tool. The table has the following schema:
Here is the sample data file:
"6.02" "0000" "101" \N "Md Fiesta Chicken|1|6.69|M|300212|100|100^M Sourdough|1|0|M|51301|112|112" "6.5" \N "V03" "24270310376" "10/17/2014 3:34 PM" "6.02" "30103" "452" "302998" "2014-12-08 10:57:15" \N
And this is how I try to upload it using bq CLI tool:
$ bq load -F '\t' --quote '"' --allow_jagged_rows receipt_archive.receipts /tmp/rec.csv
BigQuery error in load operation: Error processing job
'circular-gist-812:bqjob_r8d0bbc3192b065_0000014ab097c63c_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 0 / Line:1 / Field:16: Could not parse '\N' as a timestamp.
Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]]
I think the issue is that the updated_at column is NULL and hence gets skipped, so is there any way to tell bq to treat such null/empty columns as NULL?

@CuriousMind - This isn't an answer, just an example of the problem with using floats instead of decimals...
CREATE TABLE fd (f FLOAT(5,2),d DECIMAL(5,2));
INSERT INTO fd VALUES (100.30,100.30),(100.70,100.70);
SELECT * FROM fd;
+--------+--------+
| f      | d      |
+--------+--------+
| 100.30 | 100.30 |
| 100.70 | 100.70 |
+--------+--------+
SELECT f/3+f/3+f/3,d/3+d/3+d/3 FROM fd;
+-------------+-------------+
| f/3+f/3+f/3 | d/3+d/3+d/3 |
+-------------+-------------+
| 100.300003 | 100.300000 |
| 100.699997 | 100.700000 |
+-------------+-------------+
SELECT (f/3)*3,(d/3)*3 FROM fd;
+------------+------------+
| (f/3)*3    | (d/3)*3    |
+------------+------------+
| 100.300003 | 100.300000 |
| 100.699997 | 100.700000 |
+------------+------------+
But why is this a problem, I hear you ask?
Well, consider the following...
SELECT * FROM fd WHERE f <= 100.699997;
+--------+--------+
| f      | d      |
+--------+--------+
| 100.30 | 100.30 |
| 100.70 | 100.70 |
+--------+--------+
...now surely that's not what would be expected when dealing with money?
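For contrast, the same predicate against the DECIMAL column behaves as you would expect with money: 100.70 is stored exactly, so it is not <= 100.699997 and only one row matches (a quick sketch reusing the fd table above, with the expected output):
SELECT * FROM fd WHERE d <= 100.699997;
+--------+--------+
| f      | d      |
+--------+--------+
| 100.30 | 100.30 |
+--------+--------+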

To specify "null" in a CSV file, elide all data for the field. (It looks like you are using an unspecified escape syntax "\N".)
For example:
$ echo 2, > rows.csv
$ bq load tmp.test rows.csv a:integer,b:integer
$ bq head tmp.test
+---+------+
| a | b    |
+---+------+
| 2 | NULL |
+---+------+
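If you cannot change how the file is generated, one possible workaround (a hedged sketch, not part of the answer above; the rec_nonull.csv name is made up) is to strip MySQL's \N escape before loading, so BigQuery sees empty, i.e. null, fields:
$ # assumes \N only ever appears as a whole field, never inside quoted text
$ sed 's/\\N//g' /tmp/rec.csv > /tmp/rec_nonull.csv
$ bq load -F '\t' --quote '"' --allow_jagged_rows receipt_archive.receipts /tmp/rec_nonull.csv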

Related

How to parse JSON array that has no element/property names using Oracle

I am new to JSON and am trying to parse the data returned by the following URL:
https://api.binance.com/api/v3/klines?symbol=LTCBTC&interval=5m
The data is public if you want to see the exact output.
I am working in an Oracle 18c database and trying to use json_table, but I am not sure how to format the query or reference the columns, as the JSON has no names, just values.
If I paste in one record from the array as follows, I can get a column with all the values, but I need to parse the entire array and get the output into a table:
SELECT *
FROM json_table( '[1617210000000,"0.00325500","0.00326600","0.00325400","0.00326600","780.81000000",1617210299999,"2.54374363",210,"569.58000000","1.85545803","0"]' , '$[*]'
COLUMNS (value PATH '$' ))
I have been searching Google for days and have not found an example of what I am trying to do; all the examples use JSON with name:value pairs.
Thank you in advance.
The raw data is an array of arrays, so you can use $[*] to get the individual arrays, and then numbered positions to get the values from each of those arrays:
SELECT *
FROM   json_table(
         '[[...], [...], ...]',  -- use actual data, as CLOB?
         '$[*]'
         COLUMNS (
           open_time          PATH '$[0]',
           open               PATH '$[1]',
           high               PATH '$[2]',
           low                PATH '$[3]',
           close              PATH '$[4]',
           volume             PATH '$[5]',
           close_time         PATH '$[6]',
           quote_av           PATH '$[7]',
           number_of_trades   PATH '$[8]',
           taker_buy_base_av  PATH '$[9]',
           taker_buy_quote_av PATH '$[10]',
           ignore             PATH '$[11]'
         )
       )
I've taken the column names from the API documentation. Not sure why some are strings, presumably a precision thing; but you can obviously specify the data types. (And there are lots of examples of converting epoch timestamps to Oracle dates/timestamps if you want to do that.)
db<>fiddle with four entries, and an additional column for ordinality, which you might not want/need.
IDX | OPEN_TIME | OPEN | HIGH | LOW | CLOSE | VOLUME | CLOSE_TIME | QUOTE_AV | NUMBER_OF_TRADES | TAKER_BUY_BASE_AV | TAKER_BUY_QUOTE_AV | IGNORE
--: | :------------ | :--------- | :--------- | :--------- | :--------- | :----------- | :------------ | :--------- | :--------------- | :---------------- | :----------------- | :-----
1 | 1617423900000 | 0.00356800 | 0.00357100 | 0.00356400 | 0.00356800 | 358.71000000 | 1617424199999 | 1.27964866 | 90 | 313.96000000 | 1.12008826 | 0
2 | 1617424200000 | 0.00356800 | 0.00357000 | 0.00356600 | 0.00356800 | 349.47000000 | 1617424499999 | 1.24704741 | 105 | 283.05000000 | 1.01005077 | 0
3 | 1617424500000 | 0.00357000 | 0.00357900 | 0.00357000 | 0.00357400 | 412.32000000 | 1617424799999 | 1.47359944 | 127 | 53.73000000 | 0.19203676 | 0
4 | 1617424800000 | 0.00357500 | 0.00357500 | 0.00356500 | 0.00356600 | 910.58000000 | 1617425099999 | 3.25045272 | 198 | 463.30000000 | 1.65400945 | 0
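If you want typed columns and the ordinality column mentioned above, the COLUMNS clause could look something like this (a sketch, not from the fiddle; the NUMBER types and the idx name are assumptions, only a few of the twelve fields are shown, and the data is the single record from the question wrapped in an outer array):
SELECT *
FROM   json_table(
         '[[1617210000000,"0.00325500","0.00326600","0.00325400","0.00326600","780.81000000",1617210299999,"2.54374363",210,"569.58000000","1.85545803","0"]]',
         '$[*]'
         COLUMNS (
           idx        FOR ORDINALITY,     -- row number within the outer array
           open_time  NUMBER PATH '$[0]', -- epoch milliseconds
           open       NUMBER PATH '$[1]',
           close      NUMBER PATH '$[4]',
           close_time NUMBER PATH '$[6]'
         )
       );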

Convert Json to separate columns in HIVE

I have 4 columns in a Hive table. The first two columns are of type STRING; the 3rd and 4th hold JSON. How do I extract the JSON data into separate columns?
The SerDes available in Hive seem to handle only pure JSON rows, but I have both normal (STRING) and JSON data. How can I extract the data into separate columns here?
Example:
abc 2341 {max:2500e0,value:"20",Type:"1",ProviderType:"ABC"} {Name:"ABC",minA:1200e0,StartDate:1483900200000,EndDate:1483986600000,Flags:["flag4","flag3","flag2","flag1"]}
xyz 6789 {max:1300e0,value:"10",Type:"0",ProviderType:"foo"} {Name:"foo",minA:3.14159e0,StartDate:1225864800000,EndDate:1225864800000,Flags:["foo","foo"]}
Given the JSON fixed up to be valid (keys quoted):
create table mytable (str string,i int,jsn1 string, jsn2 string);
insert into mytable values
('abc',2341,'{"max":2500e0,"value":"20","Type":"1","ProviderType":"ABC"}','{"Name":"ABC","minA":1200e0,"StartDate":1483900200000,"EndDate":1483986600000,"Flags":["flag4","flag3","flag2","flag1"]}')
,('xyz',6789,'{"max":1300e0,"value":"10","Type":"0","ProviderType":"foo"}','{"Name":"foo","minA":3.14159e0,"StartDate":1225864800000,"EndDate":1225864800000,"Flags":["foo","foo"]}')
;
select str,i
,jsn1_max,jsn1_value,jsn1_type,jsn1_ProviderType
,jsn2_Name,jsn2_minA,jsn2_StartDate,jsn2_EndDate
,jsn2_Flags
from mytable
lateral view json_tuple (jsn1,'max','value','Type','ProviderType')
j1 as jsn1_max,jsn1_value,jsn1_type,jsn1_ProviderType
lateral view json_tuple (jsn2,'Name','minA','StartDate','EndDate','Flags')
j2 as jsn2_Name,jsn2_minA,jsn2_StartDate,jsn2_EndDate,jsn2_Flags
;
+-----+------+----------+------------+-----------+-------------------+-----------+-----------+----------------+---------------+-----------------------------------+
| str | i    | jsn1_max | jsn1_value | jsn1_type | jsn1_providertype | jsn2_name | jsn2_mina | jsn2_startdate | jsn2_enddate  | jsn2_flags                        |
+-----+------+----------+------------+-----------+-------------------+-----------+-----------+----------------+---------------+-----------------------------------+
| abc | 2341 | 2500.0   | 20         | 1         | ABC               | ABC       | 1200.0    | 1483900200000  | 1483986600000 | ["flag4","flag3","flag2","flag1"] |
| xyz | 6789 | 1300.0   | 10         | 0         | foo               | foo       | 3.14159   | 1225864800000  | 1225864800000 | ["foo","foo"]                     |
+-----+------+----------+------------+-----------+-------------------+-----------+-----------+----------------+---------------+-----------------------------------+
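If you only need one or two fields rather than all of them, get_json_object is an alternative to json_tuple (a small sketch, not part of the answer above, reusing the mytable columns):
SELECT str,
       get_json_object(jsn2, '$.Name')     AS jsn2_name,
       get_json_object(jsn2, '$.Flags[0]') AS first_flag
FROM   mytable;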

Time format in MySql database

In the table theGlobal in a MySQL database I have the field theTime defined as CHAR(50) and the field theType as CHAR(1).
I import two different .csv files into theGlobal with the LOAD DATA INFILE syntax.
In the first .csv file I have these rows:
"T","0:01"
"B","1:05"
The format of 0:01 and 1:05 is mm:ss
In the second .csv file I have these rows:
"L","00:07:10"
"L","01:21:39"
The format of 00:07:10 and 01:21:39 is hh:mm:ss
The import into theGlobal turns the mm of the first .csv into hh and the ss of the first .csv into mm.
E.g:
+---------+----------+
| theType | theTime  |
+---------+----------+
| B       | 1:05:00  |
| T       | 0:01:00  |
| L       | 00:07:10 |
| L       | 01:21:39 |
+---------+----------+
I need for all rows in the field theTime the format hh:mm:ss.
+---------+----------+
| theType | theTime  |
+---------+----------+
| B       | 00:01:05 |
| T       | 00:00:01 |
| L       | 00:07:10 |
| L       | 01:21:39 |
+---------+----------+
How to resolve this?
Please help me, thank you so much in advance.
While loading the data, you can assign it to a variable first, then do whatever with the variable and load it in the actual column. In your case this would look something like this:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @var1)
SET column2 = TIME(STR_TO_DATE(@var1, '%i:%S'));
Adjust the STR_TO_DATE() parameter as needed. Here's a table explaining it (it's for date_format() but it's the same for str_to_date()).
Oh, and store the data in a time column, not varchar.
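Since the two files use different formats (mm:ss and hh:mm:ss), the conversion could branch on the number of ':' separators, along these lines (a hedged sketch, not from the answer above; the file name and the FIELDS options are assumptions based on the quoted, comma-separated rows in the question):
LOAD DATA INFILE 'file.csv'
INTO TABLE theGlobal
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
(theType, @raw)
SET theTime = IF(LENGTH(@raw) - LENGTH(REPLACE(@raw, ':', '')) = 1,
                 TIME(STR_TO_DATE(@raw, '%i:%S')),     -- one ':' means mm:ss
                 TIME(STR_TO_DATE(@raw, '%H:%i:%S'))); -- two mean hh:mm:ss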

How should I format my .txt enum values?

The table test from my database has a unique ENUM column. How should I format my .txt file in order to load data from it into the column?
This is how I'm doing it right now:
text.txt:
0
1
2
2
1
MySQL Script:
LOAD DATA LOCAL INFILE 'Data/test.txt' INTO TABLE test
DESCRIBE test
+-------+-------------------+------+-----+---------+-------+
| Field | Type              | Null | Key | Default | Extra |
+-------+-------------------+------+-----+---------+-------+
| enum  | enum('0','1','2') | YES  |     | NULL    |       |
+-------+-------------------+------+-----+---------+-------+
The output:
+------+
| enum |
+------+
|      |
|      |
|      |
|      |
| 1    |
+------+
The first (possible) bug is the line-break character, which is '\n' by default on Unix systems. Check your file; there is a high probability it uses '\r\n'. In that case, add a LINES TERMINATED BY clause:
LINES TERMINATED BY '\r\n'
The second bug is the file name: you wrote 'text.txt', but the LOAD DATA command uses 'test.txt'.
LOAD DATA INFILE Syntax
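Putting both fixes together, the load could look like this (a sketch; the path comes from the question and '\r\n' assumes the file really does have Windows line endings):
LOAD DATA LOCAL INFILE 'Data/test.txt' INTO TABLE test
LINES TERMINATED BY '\r\n';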

Hive Table returning empty result set on all queries

I created a Hive table which loads data from a text file, but it returns an empty result set for every query.
I tried the following command:
CREATE TABLE table2(
id1 INT,
id2 INT,
id3 INT,
id4 STRING,
id5 INT,
id6 STRING,
id7 STRING,
id8 STRING,
id9 STRING,
id10 STRING,
id11 STRING,
id12 STRING,
id13 STRING,
id14 STRING,
id15 STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/biadmin/lineitem';
The command executes and the table gets created, but every query returns 0 rows, including SELECT * FROM table2;
Sample data:
Single line of the input data:
1|155190|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|egular courts above the|
I have attached the screen shot of the data file.
Output for command: DESCRIBE FORMATTED table2;
| Wed Apr 16 20:18:58 IST 2014 : Connection obtained for host: big-instght-15.persistent.co.in, port number 1528. |
| # col_name data_type comment |
| |
| id1 int None |
| id2 int None |
| id3 int None |
| id4 string None |
| id5 int None |
| id6 string None |
| id7 string None |
| id8 string None |
| id9 string None |
| id10 string None |
| id11 string None |
| id12 string None |
| id13 string None |
| id14 string None |
| id15 string None |
| |
| # Detailed Table Information |
| Database: default |
| Owner: biadmin |
| CreateTime: Mon Apr 14 20:17:31 IST 2014 |
| LastAccessTime: UNKNOWN |
| Protect Mode: None |
| Retention: 0 |
| Location: hdfs://big-instght-11.persistent.co.in:9000/user/biadmin/lineitem |
| Table Type: MANAGED_TABLE |
| Table Parameters: |
| serialization.null.format |
| transient_lastDdlTime 1397486851 |
| |
| # Storage Information |
| SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| InputFormat: org.apache.hadoop.mapred.TextInputFormat |
| OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| Compressed: No |
| Num Buckets: -1 |
| Bucket Columns: [] |
| Sort Columns: [] |
| Storage Desc Params: |
| field.delim | |
+-----------------------------------------------------------------------------------------------------------------+
Thanks!
Please make sure that the location /user/biadmin/lineitem.txt actually exists and that you have data present there. Since you are using the LOCATION clause, your data must be present there rather than in the default warehouse location, /user/hive/warehouse.
Do a quick ls to verify that:
bin/hadoop fs -ls /user/biadmin/lineitem.txt
Also, make sure that you are using the proper delimiter.
Did you try with LOAD DATA LOCAL INFILE?
LOAD DATA LOCAL INFILE '/user/biadmin/lineitem.txt' INTO TABLE table2
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(id1,id2,id3........);
Documentation: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Are you using a managed table or an external table? If it is an external table, you should use the EXTERNAL keyword while creating the table. After creating the table, load data into it with the LOAD command. If it is a managed table, after loading data into the table you can see the data in your Hive warehouse directory in Hadoop; the default path is "/user/hive/warehouse/yourtablename".
You should run the load command in the Hive shell.
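For reference, a minimal sketch of what that Hive load command could look like (the HDFS file name is an assumption; note that LOAD DATA INPATH moves the file into the table's location):
LOAD DATA INPATH '/user/biadmin/lineitem/lineitem.txt' INTO TABLE table2;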
I was able to load the data into the table. The problem was that
LOCATION '/user/biadmin/lineitem';
wasn't loading any data. But when I gave the directory containing the file as the path, like
LOCATION '/user/biadmin/tpc-h';
where I put the lineitem.txt file in the tpc-h directory,
it worked!
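An alternative to recreating the table would have been to repoint the existing one at the directory that actually holds the file (a sketch based on this fix; the URI is taken from the DESCRIBE FORMATTED output above):
ALTER TABLE table2 SET LOCATION 'hdfs://big-instght-11.persistent.co.in:9000/user/biadmin/tpc-h';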