Error while creating Parquet table from CSV using Apache Drill

I'm trying to create a Parquet table from a CSV extract (generated from an Oracle database table) that has over a million rows. About 25 of those rows have null values for START_DATE, and CTAS fails to interpret "" as null. Any suggestions would be greatly appreciated.
CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;
Error: SYSTEM ERROR: IllegalArgumentException: Invalid format ""

You can always include a CASE statement to handle the empty entries:
CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
CASE WHEN columns[3] = '' THEN null
ELSE to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a')
END as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;

You can also use the NULLIF() function, as below:
CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
to_timestamp(NULLIF(columns[3],''), 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;
NULLIF converts the empty string to null, so the conversion won't fail.
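As a quick sanity check (a sketch, assuming the same file path and column layout), you can run the SELECT on just the affected rows before the CTAS and confirm that the empty strings come back as NULL:
select to_timestamp(NULLIF(columns[3], ''), 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`
where columns[3] = ''
limit 25;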

Related

Is there a way to store the value of a variable in a table using INSERT in MySQL

I am using MySQL Workbench 8.0 and would like to store the value of a variable in a table using INSERT.
Let's say:
I have a function defined as func which returns a decimal.
I have a stored procedure defined as storedProcedure which takes two parameters.
Inside the stored procedure, I would like to store the returned value of the function in a variable, then insert all three values (the two parameters plus the returned result) into an existing table.
Here is what I unsuccessfully tried so far:
CREATE DEFINER=`user`@`localhost` PROCEDURE `storedProcedure`(
IN param1 BOOL,
IN param2 INTEGER(10)
)
BEGIN
DECLARE returnedValue DECIMAL(7,2);
SET @returnedValue = (SELECT func());
INSERT INTO existing_db.existing_table (
column1,
column2,
column3
)
VALUES(
@param1,
@param2,
@returnedValue
);
END
When executing call storedProcedure(param1Value, param2Value) in a query tab, the console returns the
error code 1048: column cannot be null.
PS:
During debugging, I noticed that the function works correctly and returns a value which is effectively stored in the variable.
I also tried to access the value in variables using (select @resultAG) in the VALUES section of the INSERT statement.
Thanks in advance,
CREATE DEFINER=`user`@`localhost`
PROCEDURE `storedProcedure`(
IN param1 BOOL,
IN param2 INTEGER(10)
)
INSERT INTO existing_db.existing_table (
column1,
column2,
column3
)
VALUES(
param1, -- use the IN parameters directly; @param1/@param2 are separate user variables that were never set, hence error 1048
param2,
func() -- call the function inline instead of routing it through a user variable
);
If you want not only to save the value returned by the function into the table, but also to keep that value in a user variable for later use in the connection code, then use:
CREATE DEFINER=`user`@`localhost`
PROCEDURE `storedProcedure`(
IN param1 BOOL,
IN param2 INTEGER(10)
)
INSERT INTO existing_db.existing_table (
column1,
column2,
column3
)
SELECT
param1,
param2,
@returnedValue := func()
;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=daaa2e7409dfc4e5e00675153134da9d
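A usage sketch with hypothetical argument values; after the call, the same connection can read the variable back:
CALL `storedProcedure`(TRUE, 10);
SELECT @returnedValue; -- the value func() returned, still visible to this connection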

MySQL: Using a variable within a case statement

Trying to do something very simple.
I want to run a SP that checks if one of the arguments is null. If so, store an empty string in a log table, otherwise store the value.
CREATE DEFINER=`root`@`localhost` PROCEDURE `SP_TestVariable`(
Code1 varchar(255) ,
CodeToTest varchar(255)
)
BEGIN
CASE
WHEN CodeToTest IS NULL THEN
set @FinalCode = ' '
ELSE
set @FinalCode = CodeToTest
end
-- Now do the insert into the log table
INSERT INTO `TempLogTable` ( strField1, strField2)
VALUES (Code1 , @FinalCode );
You can reduce all that to a simple query:
INSERT INTO `TempLogTable` (strField1, strField2)
SELECT Code1, coalesce(CodeToTest, '')
and for that I would even drop the procedure and just use the query instead.
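If you do keep a procedure, a minimal sketch of it reduced to that single statement (same parameter and table names as above):
DELIMITER $$
CREATE PROCEDURE `SP_TestVariable`(
Code1 varchar(255),
CodeToTest varchar(255)
)
INSERT INTO `TempLogTable` (strField1, strField2)
VALUES (Code1, COALESCE(CodeToTest, ''))$$
DELIMITER ;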

Truncated incorrect date value while calling nested function

I have two functions.
fn_validate_date
fn_validation
fn_validate_date code:
CREATE DEFINER=`root`@`localhost` FUNCTION `fn_validate_date`(
`dt_date` DATE
)
RETURNS date
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT 'Returns the associated value of given attribute for given employee for a particular date.'
BEGIN
SET dt_date = IF(dt_date IS NULL OR dt_date ='', CURRENT_DATE, dt_date);
RETURN dt_date;
END
fn_validation code:
CREATE DEFINER=`root`@`localhost` FUNCTION `fn_validation`(
`dt_date` DATE
)
RETURNS date
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT ''
BEGIN
RETURN fn_validate_date(dt_date);
END
Now when I call fn_validate_date as below:
SELECT `fn_validate_date`(null);
It works fine, but when I call fn_validation it gives me an error:
SELECT `fn_validation`(null);
My question is: why didn't I get an error while calling fn_validate_date?
In fn_validate_date, the dt_date parameter is of type DATE and you are comparing it to a string. There is no need for that: a DATE value can never contain ''; it is either NULL or holds a date.
So instead of:
SET dt_date = IF(dt_date IS NULL OR dt_date ='', CURRENT_DATE, dt_date);
You can simply use:
return ifnull( dt_date, current_date() );
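A quick check after redefining the function this way (both calls should now return the current date with no truncation error):
SELECT fn_validate_date(NULL), fn_validation(NULL);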
I disabled strict mode and it works now:
SET sql_mode = '';
This disables strict mode in MySQL. I am not sure why MySQL short-circuited the IF condition, as I am passing NULL in the input parameter of fn_validation.
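If you would rather not change it globally, a sketch that relaxes it only for the current session, after noting the previous setting:
SELECT @@SESSION.sql_mode; -- record the current setting before changing it
SET SESSION sql_mode = '';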

PostgreSQL: preventing SQL injection on multi-insertion

I'm looking for the fastest way to parse, validate and insert data into a table (PostgreSQL 9.3).
The data is a JSON array which contains 1..N items.
[{"name":"a","value":"1"},{"name":"b","value":"2"}]
The table looks like:
CREATE TABLE logs
(
id serial NOT NULL,
name text ,
value text,
CONSTRAINT "log_Pkey" PRIMARY KEY (id)
);
For that I have a stored procedure:
CREATE OR REPLACE FUNCTION insert_logs(v json)
RETURNS integer AS
$BODY$
DECLARE
sql text;
i json;
logs_part_id int;
BEGIN
SELECT INTO logs_part_id id from another_table_with_that_id where some_condition;
sql = '';
FOR i IN SELECT * FROM json_array_elements(v)
LOOP
sql = sql||'insert into logs_'||logs_part_id ||'
(name, value)
values( ' ||quote_literal(i->>'name')||' , ' ||quote_literal(i->>'value')||' );';
END LOOP;
raise notice '%',sql;
EXECUTE sql;
return 1;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
(function returns integer as a response status)
Function call:
select * from insert_logs('[{"name":"a","value":"1"},{"name":"b","value":"2"}]'::json);
Actually the "insert.." statement is quite bigger - 15 columns to insert and aparently some of them should be checked in order to prevent sql injection.
Question:
Is there any way to rewrite this stored procedure in order to improve performance?
Should I use prepared statements?
EDIT.
The reason I build the SQL string is that the table name is not known in advance because of table partitioning. The table name format is logs_id, where id is an int which is obtained just before the insert.
If you need to speed up your query, json_populate_recordset() does exactly what you need:
insert into logs
select * from json_populate_recordset(null::logs, '[...]')
As for SQL injection: you should always use prepared statements, or at least execute your SQL with parameters sent separately (e.g. with PQexecParams() if you use libpq directly).
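A minimal sketch of that idea in plain SQL, assuming the insert_logs(json) function above (the statement name ins_logs is just for illustration); client libraries such as libpq send the parameter out of band the same way:
PREPARE ins_logs(json) AS
select insert_logs($1);
EXECUTE ins_logs('[{"name":"a","value":"1"},{"name":"b","value":"2"}]');
DEALLOCATE ins_logs;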
Why are you building a multi-statement SQL string and then EXECUTEing it at all?
Just:
insert into logs (name, value)
values( i->>'name' , i->>'value' );
There's no need for explicit quoting because i->>'name' is a text value that's inserted as a bound parameter into the INSERT by PL/pgSQL. It's never parsed as SQL.
If you must build the statement dynamically (e.g. a varying table name, per the comment), use EXECUTE ... USING with format(); in your case:
EXECUTE format('insert into %I (name, value) values( $1, $2 );', 'logs_'||logs_part_id)
USING i->>'name' , i->>'value';
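For context, a sketch of how that sits inside the original loop, assuming the same variable names from the question; it replaces both the string concatenation and the final EXECUTE:
FOR i IN SELECT * FROM json_array_elements(v)
LOOP
-- each row is inserted with bound parameters; nothing from the JSON is ever parsed as SQL
EXECUTE format('insert into %I (name, value) values ($1, $2)', 'logs_' || logs_part_id)
USING i->>'name', i->>'value';
END LOOP;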

MySQL stored procedure with a list-of-strings parameter and the IN clause

I have this stored procedure query. I'm using this code in my VB.NET dataset, so what I need is to pass a parameter into every WHERE clause - or can I pass my whole WHERE clause into this stored procedure from VB.NET? If not, how can I do the WHERE ... IN clause? I'm getting an error when I call my stored procedure.
Maybe someone can give me an idea of how to handle this problem.
DELIMITER $$
DROP PROCEDURE IF EXISTS `lcs_rdb`.`sp_MissedCallsReport`$$
CREATE DEFINER=`root`@`localhost` PROCEDURE `sp_MissedCallsReport`()
BEGIN
select
cdr_extension_no, cdr_charge_to, COUNT(cdr_call_type_code) as answered,
SUM(cdr_call_type_code = 'BSY') as Busy,
sum(cdr_call_type_code = 'ABN') as abandon,
sum(cdr_call_type_code in ('BSY','ABN')) as total,
coalesce((sum(case cdr_call_type_code when 'ABN' then cdr_duration_number/60000 else 0 end) / sum(cdr_call_type_code = 'ABN')),0) as avg_abandon,
coalesce((sum(cdr_call_type_code in ('BSY','ABN')) /
(sum(cdr_call_type_code in ('BSY','ABN')) + COUNT(cdr_call_type_code))) *100,0) as missed_calls_rate
from cdr_departments
where cdr_site_id = '{0}' AND
cdr_datetime BETWEEN '{1}' AND '{2}'
AND cdr_call_class_id IN({3}) AND cdr_call_type_id IN({4})
AND cdr_extension_id IN({5}) or cdr_route_member_id IN ({6})
GROUP BY cdr_extension_no;
END$$
DELIMITER ;
I suggest you use IN parameters for your stored procedure and use them in the WHERE clause.
Example:
PROCEDURE `sp_MissedCallsReport`(
IN param_cdr_site_id INT
, IN param_cdr_datetime_min DATETIME
, IN param_cdr_datetime_max DATETIME
, IN param_cdr_call_class_id_csv VARCHAR(1024) -- csv integers
, IN param_cdr_call_type_id_csv VARCHAR(1024) -- csv integers
, IN param_cdr_extension_id_csv VARCHAR(1024) -- csv integers
, IN param_cdr_route_member_id_csv VARCHAR(1024) -- csv integers
)
I suggest passing the int values in CSV form as VARCHAR parameters for placeholders 3, 4, 5 and 6, because you want to use them with IN as a set to search in. Since we can't pass an array of values as procedure parameters, we can make use of the CSV form as an alternative.
And as the input is in CSV format, IN is not suitable for the search.
We can use FIND_IN_SET with CSV values.
Example (for your WHERE clause):
where cdr_site_id = param_cdr_site_id
AND cdr_datetime BETWEEN param_cdr_datetime_min AND param_cdr_datetime_max
AND FIND_IN_SET( cdr_call_class_id, param_cdr_call_class_id_csv )
AND FIND_IN_SET( cdr_call_type_id, param_cdr_call_type_id_csv )
AND ( FIND_IN_SET( cdr_extension_id, param_cdr_extension_id_csv )
or FIND_IN_SET ( cdr_route_member_id, param_cdr_route_member_id_csv ) )
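A usage sketch with purely hypothetical argument values, just to show the CSV parameters in action:
CALL `sp_MissedCallsReport`(
1, -- param_cdr_site_id
'2015-01-01 00:00:00', -- param_cdr_datetime_min
'2015-01-31 23:59:59', -- param_cdr_datetime_max
'1,2,3', -- param_cdr_call_class_id_csv
'4,5', -- param_cdr_call_type_id_csv
'10,11,12', -- param_cdr_extension_id_csv
'7,8' -- param_cdr_route_member_id_csv
);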
Refer to:
CREATE PROCEDURE Syntax
FIND_IN_SET( str, strlist )