I have a problem loading CSV data into a Snowflake table. The fields are wrapped in double quote marks, and that causes a problem when importing them into the table.
I know that COPY INTO has the CSV-specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"', but it's not working at all.
Here are some pieces of the table definition and the copy command:
CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);
COPY INTO ...
FROM '...csv.gz'
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT"
;
The CSV file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"
I get an error:
Numeric value '"3922000"' is not recognized
I'm pretty sure it's because the NUMBER value is interpreted as a string when Snowflake reads the "" marks, but since I use
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
they shouldn't even be there... Does anyone have a solution to this?
Maybe something is incorrect with your file? I was just able to run the following without issue.
1. create the test table:
CREATE OR REPLACE TABLE
dbNameHere.schemaNameHere.stacko_58322339 (
num1 NUMBER,
num2 NUMBER,
num3 NUMBER);
2. create test file, contents as follows
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"
3. create stage and put file in stage
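For reference, a minimal sketch of this step (the local file path is an assumption, and PUT is run from a client such as SnowSQL; AUTO_COMPRESS gzips the file, which is why the COPY below reads a .csv.gz):
CREATE OR REPLACE STAGE stageNameHere;
PUT file:///tmp/stacko_58322339.csv @stageNameHere AUTO_COMPRESS = TRUE;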
4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
FROM @stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 0
ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "CONTINUE";
5. results
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED | 4 | 4 | 4 | 0 | NULL | NULL | NULL | NULL |
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s
6. view the records
SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+
| NUM1 | NUM2 | NUM3 |
|---------+----------+-------|
| 1 | 2 | 3 |
| 3922000 | 14733370 | 57256 |
| 3 | 2 | 1 |
| 4 | 5 | 6 |
+---------+----------+-------+
Can you try a similar test like this?
EDIT: A quick look at your data shows that many of your numeric fields appear to start with commas, so something is definitely amiss with the data.
Assuming your numbers are European-formatted (, as the decimal separator and . as the thousands separator), reading the numeric formatting help, it seems Snowflake does not support this as input. I'd open a feature request.
But if you read the column in as text then use REPLACE like
SELECT '100,1234'::text as A
,REPLACE(A,',','.') as B
,TRY_TO_DECIMAL(b, 20,10 ) as C;
gives:
A          B          C
100,1234   100.1234   100.1234000000
Safer would be to strip the thousands separators first, like:
SELECT '1.100,1234'::text as A
,REPLACE(A, '.', '') as B
,REPLACE(B, ',', '.') as C
,TRY_TO_DECIMAL(C, 20, 10) as D;
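If you'd rather do the conversion during the load itself, the same REPLACE logic can be applied in a COPY transformation. A minimal sketch, where the stage name, file name, target columns, and column positions are assumptions:
COPY INTO myTable (GamePlayId, Amount)
FROM (
  SELECT $1,
         -- strip '.' thousands separators, then turn the ',' decimal into '.'
         TRY_TO_DECIMAL(REPLACE(REPLACE($2, '.', ''), ',', '.'), 20, 10)
  FROM @myStage/myfile.csv.gz
)
FILE_FORMAT = (TYPE = CSV
               FIELD_OPTIONALLY_ENCLOSED_BY = '"'
               SKIP_HEADER = 1);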
I'm using MySQL 5.7.35. If I use the LOAD DATA INFILE command on a CSV file with NULL as an unquoted string value in the CSV file, the value is imported as NULL in MySQL.
For example, if I import a CSV file with the following content:
record_number,a,b,c,d,e,f
1,1,2,3,4,5,6
2,NULL,null,Null,nUlL,,"NULL"
The imported table will have the following values:
+---------------+------+--------+--------+--------+--------+--------+
| record_number | a | b | c | d | e | f |
+---------------+------+--------+--------+--------+--------+--------+
| 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | NULL | "null" | "Null" | "nUlL" | "" | "NULL" |
+---------------+------+--------+--------+--------+--------+--------+
Is there any way to force column a, record 2, to be imported as a string without modifying the CSV file?
Update
@Barmar pointed out that there's a paragraph in the MySQL documentation on this behavior:
If FIELDS ENCLOSED BY is not empty, a field containing the literal
word NULL as its value is read as a NULL value. This differs from the
word NULL enclosed within FIELDS ENCLOSED BY characters, which is read
as the string 'NULL'.
So you need to specify the quoting character with something like FIELDS ENCLOSED BY '"' and then write "NULL" in the CSV file.
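For example, a minimal sketch (the file name and table are assumptions):
LOAD DATA INFILE 'file.csv'
INTO TABLE t1
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(record_number, a, b, c, d, e, f);
-- an unquoted NULL in the file now loads as SQL NULL,
-- while a quoted "NULL" loads as the string 'NULL'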
You could check for a NULL value in your code and convert it to a string.
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(record_number, @a, @b, @c, @d, @e, @f)
SET a = IFNULL(@a, 'NULL'),
b = IFNULL(@b, 'NULL'),
c = IFNULL(@c, 'NULL'),
d = IFNULL(@d, 'NULL'),
e = IFNULL(@e, 'NULL'),
f = IFNULL(@f, 'NULL')
However, this can't distinguish between an intentional NULL written as \N and MySQL reading the unquoted word NULL as NULL.
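To make that concrete, a minimal sketch (the data.csv contents, shown as comments, are assumptions):
-- data.csv:
--   1,\N       -> @a arrives as SQL NULL
--   2,NULL     -> @a also arrives as SQL NULL (unquoted word NULL)
--   3,"NULL"   -> @a arrives as the string 'NULL'
LOAD DATA INFILE 'data.csv'
INTO TABLE t1
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
(record_number, @a)
SET a = IFNULL(@a, 'NULL');
-- after IFNULL, rows 1 and 2 are indistinguishable from row 3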
I have a problem concerning the IF() function in MySQL.
I would like to return a string and change the value of a variable. Somewhat like:
IF(@firstRow=1, "Dear" AND @firstRow:=0, "dear")
This outputs only '0' instead of 'Dear'...
I would be very thankful for some input on ways I could solve this problem!
Louis :)
AND is a boolean operator, not a "also do this other thing" operator.
"Dear" AND 0 returns 0 because 0 is treated as false in MySQL and <anything> AND false will return false.
Also because the integer/boolean value of "Dear" is 0 as well. Using a string in a numeric context just reads initial digits in the string, if any, and ignores the rest.
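You can see both effects directly:
SELECT 'Dear' AND 0;   -- 0: 'Dear' is 0 in a numeric context, and 0 AND 0 is false
SELECT '12abc' + 0;    -- 12: the initial digits are read, the rest is ignored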
It's not clear what your problem is, but I guess you want to capitalize the word "dear" if the row is the first one in the result set.
Instead of being too clever by half trying to fit the side-effect into your expression, do yourself a favor and break it out into a separate column:
mysql> SELECT IF(@firstRow=1, 'Dear', 'dear'), @firstRow:=0 AS _ignoreThis
    -> FROM (SELECT @firstRow:=1) AS _init
    -> CROSS JOIN
    -> mytable;
+---------------------------------+-------------+
| IF(@firstRow=1, 'Dear', 'dear') | _ignoreThis |
+---------------------------------+-------------+
| Dear | 0 |
| dear | 0 |
| dear | 0 |
+---------------------------------+-------------+
But if you really want to make your code as confusing and unreadable as possible, you can do something like this:
SELECT IF(@firstRow=1, CONCAT('Dear', IF(@firstRow:=0, '', '')), 'dear')
FROM (SELECT @firstRow:=1) AS _init
CROSS JOIN
...
But remember this important metric of code quality: WTFs per minute.
Use a case expression instead of IF(), as the syntax is far easier to follow, e.g.
select
case when #firstRow = 1 then 'Dear' else 'dear' end AS Salutation
, @firstRow := 0
from (
select 1 n union all
select 2 n union all
select 3
) d
cross join (SELECT @firstRow:=1) var
+---+------------+----------------+
| | Salutation | @firstRow := 0 |
+---+------------+----------------+
| 1 | Dear | 0 |
| 2 | dear | 0 |
| 3 | dear | 0 |
+---+------------+----------------+
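For what it's worth, on MySQL 8.0 or later you can avoid the user variable entirely with a window function, e.g.
select
case when row_number() over (order by n) = 1 then 'Dear' else 'dear' end AS Salutation
from (
select 1 n union all
select 2 n union all
select 3
) d;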
I am trying to create a CSV from values stored in a table:
| col1 | col2 | col3 |
| "one" | null | "one" |
| "two" | "two" | "two" |
hive > select * from table where col2 is null;
one null one
I am generating the CSV using the code below:
df.repartition(1)
.write.option("header",true)
.option("delimiter", ",")
.option("quoteAll", true)
.option("nullValue", "")
.csv(S3Destination)
The CSV I get:
"col1","col2","col3"
"one","","one"
"two","two","two"
Expected CSV (with no double quotes for the null value):
"col1","col2","col3"
"one",,"one"
"two","two","two"
Any help is appreciated to know if the dataframe writer has options to do this.
You can take a UDF approach and apply it to the column (using withColumn on the repartitioned dataframe above) wherever a double-quoted empty string is possible; see the sample code below (the lambda needs a UDF1 cast to compile in Java):
sqlContext.udf().register("convertToEmptyWithOutQuotes",
    (UDF1<String, String>) abc -> (abc.trim().length() > 0 ? abc : abc.replace("\"", " ")),
    DataTypes.StringType);
String's replace method does the job:
val a = Array("'x'","","z")
println(a.mkString(",").replace("\"", " "))
will produce 'x',,z
I am trying to create a function that outputs a matrix containing each item of a list on a separate line, with lines in between. The only output I'm getting is an empty string (''). I do not understand why. I think I set it all up correctly to output what is needed, but there has to be something missing?
I included examples below my code.
def show_table(table):
    table=[]
    s=[[str(e) for e in row] for row in table]
    lens= [max(map(len, col)) for col in zip(*s)]
    fmt= '\t'.join('{{:{}}}'.format(x) for x in lens)
    table= [fmt.format(*row) for row in s]
    return '\n'.join(table)
show_table([['A','BB'],['C','DD']])
output:
'| A | BB |\n| C | DD |\n'
print(show_table([['A','BB'],['C','DD']]))
output:
| A | BB |
| C | DD |
The issue is on the second line, where you are initialising your list to an empty list (throwing away the argument). Instead try:
if table is None:
    table = []
Perhaps a better way to accomplish this could be:
def show_table(table):
    if table is None:
        table = []
    data = ""
    for row in table:
        for val in row:
            data += "| " + val + " "
        data += "|\n"
    return data.strip("\n")

print(show_table([['a','bb'],['c','dd']]))
Output:
| a | bb |
| c | dd |
I have an issue and I'm looping on it! :| I hope someone can help me.
So I have an input file (.xls) that is simple, but there is a column (let's say it's "ROW1") that looks like this:
ROW1 | ROW2 | ROW3 | ROW_N
765 | 1 | AAAA-MM-DD | ...
null | 1 | AAAA-MM-DD | ...
null | 1 | AAAA-MM-DD | ...
944 | 2 | AAAA-MM-DD | ...
null | 2 | AAAA-MM-DD | ...
088 | 7 | AAAA-MM-DD | ...
555 | 2 | AAAA-MM-DD | ...
null | 2 | AAAA-MM-DD | ...
There is no standard here, as you can see. There are some null lines in ROW1, and in ROW2 there are equal numbers with different associations to ROW1 (like in lines 5 and 6, then in lines 8 and 9).
My objective is to copy the value of ROW1 down into the rows after it while ROW1 is null, until it isn't null anymore. Basically, copy from the previous row when the current one is null...
I'm trying to use the "Formula" step, with something like:
=IF(AND(ISBLANK([ROW1]);NOT(ISBLANK([ROW2]));ROW_n=ROW1;IF(AND(NOT(ISBLANK([ROW1]));NOT(ISBLANK([ROW2]));ROW_n=ROW1;ROW_n=""));
But nothing yet..
I've tried the "Analytic Query" step too, but no luck..
I'm just streaming an XLS file input..
Thanks very much, any help is very much appreciated!!
Best Regards!
Well, I discovered a solution: adding a "User Defined Java Class" step with the code below:
private FieldHelper output_field, card_field;
private RowSet out;
private String previous_card = null;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    if (first)
    {
        first = false;
        out = findTargetRowSet("out");
        output_field = get(Fields.Out, "previous_card");
    } else {
        Object[] r = getRow();
        if (r == null) {
            setOutputDone();
            return false;
        }
        r = createOutputRow(r, data.outputRowMeta.size());
        // Fill the output field with the last non-empty card value seen so far.
        if (previous_card != null) {
            output_field.setValue(r, previous_card);
        }
        if (card_field == null) {
            card_field = get(Fields.In, "Grupo de Cartões");
        }
        // Remember the current card value whenever it is not null/empty.
        String card = card_field.getString(r);
        if (card != null && !card.isEmpty()) {
            previous_card = card;
        }
        // Send the row on to the next step.
        putRowTo(data.outputRowMeta, r, out);
    }
    return true;
}
After this I had to add a few more steps, but this helped very much.
Thank you, mates!!
Finally I got the result. Please follow the steps below.
The full transformation is: a Data Grid feeding a JavaScript step, then a Select values step.
The Data Grid holds the sample data. Sorry, I don't have Microsoft Excel on my local machine, which is why I used a Data Grid; instead of the Data Grid you can drag and drop a Microsoft Excel Input step.
Drag and drop one JavaScript step and write the code below.
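A fill-down script along these lines should work (a hypothetical sketch for a Modified Java Script Value step: ROW1 is the input field, and prev_row1 must be declared in the step's start script, e.g. var prev_row1 = null;, so that it persists across rows):
var out_row1 = ROW1;
if (out_row1 == null) {
    out_row1 = prev_row1;   // fill down from the last non-null value
} else {
    prev_row1 = out_row1;   // remember the latest non-null value
}
Then add out_row1 as an output field of the step.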
As the last step of the transformation, drag and drop a Select values step and select the columns. (This step is not necessary.)
The final result will have ROW1 filled down as intended.
Hope this helps.