Hi everyone, how can GP (Greenplum) turn a dictionary-like string such as the one below into a table with field names and values?
I mean, how can this be done in SQL?
{"useCoupon":true,"useLevel":false,"usePoints":false,"useActivity" :false}
create table test(a json);
insert into test(a) values('{"useCoupon":true,"useLevel":false,"usePoints":false,"useActivity" :false}');
insert into test(a) values('{"useCoupon":false,"useLevel":false,"usePoints":true,"useActivity" :true}');
After loading the data into a staging table using copy, gpfdist, or any other utility, select the required fields as shown below and load the data into the final table.
gpadmin=# select a -> 'useCoupon' as useCoupon,a -> 'useLevel' as useLevel,a -> 'usePoints' as usePoints,a -> 'useActivity' as useActivity from test;
usecoupon | uselevel | usepoints | useactivity
-----------+----------+-----------+-------------
true | false | false | false
false | false | true | true
(2 rows)
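To finish the load described above, the selected values can be cast and inserted into the final table. This is only a minimal sketch: final_table and its boolean columns are hypothetical names, not something from the question. It uses ->> rather than -> so that each value comes back as text and can be cast to boolean.

```sql
-- final_table is a hypothetical target table used for illustration only
create table final_table (
    usecoupon   boolean,
    uselevel    boolean,
    usepoints   boolean,
    useactivity boolean
);

-- ->> returns text, which can then be cast to the target column type
insert into final_table (usecoupon, uselevel, usepoints, useactivity)
select (a ->> 'useCoupon')::boolean,
       (a ->> 'useLevel')::boolean,
       (a ->> 'usePoints')::boolean,
       (a ->> 'useActivity')::boolean
from test;
```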
I have this jsonb column in a PostgreSQL table.
{
"{\"start\":\"14:00\",\"end\":\"14:50\"}",
"{\"start\":\"14:51\",\"end\":\"15:40\"}",
"{\"start\":\"15:41\",\"end\":\"16:30\"}",
"{\"start\":\"16:31\",\"end\":\"17:20\"}"
}
I need to extract all values of start and end.
I want the result to be like this
id | start1 | end1 | start2 | end2 | start3 | end3 | start4 | end4
or
id | start1 | end1
id | start2 | end2
id | start3 | end3
id | start4 | end4
The usual ->> doesn't work for this, and I have no clue how to do it.
You don't say what version of Postgres you are using, but it probably has:
https://www.postgresql.org/docs/12/runtime-config-compatible.html#RUNTIME-CONFIG-COMPATIBLE-VERSION
standard_conforming_strings (boolean)
This controls whether ordinary string literals ('...') treat backslashes literally, as specified in the SQL standard. [...] The presence of this parameter can also be taken as an indication that the escape string syntax (E'...') is supported. Escape string syntax (Section 4.1.2.2) should be used if an application desires backslashes to be treated as escape characters.
In that case, to deal with the escapes in your JSON, you need to do:
select E'{\"start\":\"14:00\",\"end\":\"14:50\"}'::jsonb;
jsonb
------------------------------------
{"end": "14:50", "start": "14:00"}
(1 row)
select E'{\"start\":\"14:00\",\"end\":\"14:50\"}'::jsonb ->> 'start';
?column?
----------
14:00
select E'[
{\"start\":\"14:00\",\"end\":\"14:50\"},
{\"start\":\"14:51\",\"end\":\"15:40\"},
{\"start\":\"15:41\",\"end\":\"16:30\"},
{\"start\":\"16:31\",\"end\":\"17:20\"}
]'::jsonb;
--------------------------------------------------------------------------------------------------------------------------------------------------
[{"end": "14:50", "start": "14:00"}, {"end": "15:40", "start": "14:51"}, {"end": "16:30", "start": "15:41"}, {"end": "17:20", "start": "16:31"}]
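Once the value is a proper jsonb array of objects, the start and end values can be pulled out one row per slot, which matches the second layout asked for. This is a sketch under assumptions: the literals stand in for the real column, and the aliases e, elem, ord, start_time and end_time are made up for the example. The second query covers the case where the column is really a text[] whose elements are JSON strings, which is what the outer braces in the posted value suggest.

```sql
-- Case 1: a jsonb array of objects, expanded to one row per element
select e.ord              as slot,
       e.elem ->> 'start' as start_time,
       e.elem ->> 'end'   as end_time
from jsonb_array_elements(
         '[{"start":"14:00","end":"14:50"},
           {"start":"14:51","end":"15:40"}]'::jsonb
     ) with ordinality as e(elem, ord);

-- Case 2: a text[] whose elements are JSON strings; cast each element to jsonb
select e.ord                     as slot,
       e.elem::jsonb ->> 'start' as start_time,
       e.elem::jsonb ->> 'end'   as end_time
from unnest(
         array['{"start":"14:00","end":"14:50"}',
               '{"start":"14:51","end":"15:40"}']::text[]
     ) with ordinality as e(elem, ord);
```

Against a real table, the literal would be replaced by the column and the table joined in with a lateral join, so that the id can be selected alongside each slot.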
I have a problem loading CSV data into a Snowflake table. The fields are wrapped in double quotation marks, which causes problems when importing them into the table.
I know that COPY INTO has the CSV-specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"', but it's not working at all.
Here are some pieces of the table definition and the copy command:
CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);
COPY INTO ...
FROM ...csv.gz'
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT"
;
The CSV file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"
I get an error
'''Numeric value '"3922000"' is not recognized '''
I'm pretty sure it's because the NUMBER value is interpreted as a string when Snowflake reads the "" marks, but since I use
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
they shouldn't even be there... Does anyone have a solution to this?
Maybe something is incorrect with your file? I was just able to run the following without issue.
1. create the test table:
CREATE OR REPLACE TABLE
dbNameHere.schemaNameHere.stacko_58322339 (
num1 NUMBER,
num2 NUMBER,
num3 NUMBER);
2. create test file, contents as follows
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"
3. create stage and put file in stage
4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
FROM @stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 0
ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "CONTINUE";
5. results
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED | 4 | 4 | 4 | 0 | NULL | NULL | NULL | NULL |
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s
6. view the records
>SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+
| NUM1 | NUM2 | NUM3 |
|---------+----------+-------|
| 1 | 2 | 3 |
| 3922000 | 14733370 | 57256 |
| 3 | 2 | 1 |
| 4 | 5 | 6 |
+---------+----------+-------+
Can you try with a similar test as this?
EDIT: A quick look at your data shows that many of your numeric fields appear to start with commas, so something is definitely amiss with the data.
Assuming your numbers use European formatting (, as the decimal separator and . as the thousands separator), and going by the numeric formatting help, it seems Snowflake does not support this as input. I'd open a feature request.
But if you read the column in as text, you can then use REPLACE, like this:
SELECT '100,1234'::text as A
,REPLACE(A,',','.') as B
,TRY_TO_DECIMAL(b, 20,10 ) as C;
gives:
A B C
100,1234 100.1234 100.1234000000
It would be safer to strip the thousands separators first, like this:
SELECT '1.100,1234'::text as A
,REPLACE(A,'.') as B
,REPLACE(B,',','.') as C
,TRY_TO_DECIMAL(C, 20,10 ) as D;
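Put together with the load itself, one way to apply this is to land the file in an all-text staging table and convert while copying into the typed table. This is a rough sketch only: staging_raw, final_tbl and their column names are hypothetical, and the COPY INTO step for the staging table is left out because the stage and file names are not known.

```sql
-- staging_raw and final_tbl are hypothetical names used for illustration
CREATE OR REPLACE TABLE staging_raw (
    game_play_id VARCHAR,
    amount_raw   VARCHAR
);

CREATE OR REPLACE TABLE final_tbl (
    game_play_id NUMBER,
    amount       NUMBER(20,8)
);

-- After loading the CSV into staging_raw as text:
-- drop the thousands separators, turn the decimal comma into a dot, then convert
INSERT INTO final_tbl (game_play_id, amount)
SELECT TRY_TO_NUMBER(game_play_id),
       TRY_TO_DECIMAL(REPLACE(REPLACE(amount_raw, '.', ''), ',', '.'), 20, 8)
FROM staging_raw;
```

The TRY_ variants return NULL instead of raising an error for values that still cannot be parsed, so it is worth checking afterwards for NULLs that were not NULL in the staging table.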
I am trying to create a CSV from values stored in a table:
| col1 | col2 | col3 |
| "one" | null | "one" |
| "two" | "two" | "two" |
hive > select * from table where col2 is null;
one null one
I am getting the csv using the below code:
df.repartition(1)
.write.option("header",true)
.option("delimiter", ",")
.option("quoteAll", true)
.option("nullValue", "")
.csv(S3Destination)
The CSV I get:
"col1","col2","col3"
"one","","one"
"two","two","two"
Expected CSV (with no double quotes for null values):
"col1","col2","col3"
"one",,"one"
"two","two","two"
Any help is appreciated; I would like to know whether the DataFrame writer has an option to do this.
You can take a UDF approach and apply it to the column (using withColumn on the repartitioned DataFrame above) where a double-quoted empty string is possible. See the sample code below:
sqlContext.udf().register("convertToEmptyWithOutQuotes",(String abc) -> (abc.trim().length() > 0 ? abc : abc.replace("\"", " ")),DataTypes.StringType);
String has a replace method that does the job.
val a = Array("'x'","","z")
println(a.mkString(",").replace("\"", " "))
will produce 'x',,z
I have the following function in PostgreSQL
CREATE OR REPLACE FUNCTION public.translatejson(JSONB, TEXT)
RETURNS TEXT
AS
$BODY$
SELECT ($1->$2)::TEXT
$BODY$
LANGUAGE sql STABLE;
When I execute it I receive the values surrounded by double quotes. For example:
SELECT id, translatejson("title", 'en-US') AS "tname" FROM types."FuelTypes";
In return I get a table like this:
-------------------
| id | tname |
-------------------
| 1 | "gasoline" |
| 2 | "diesel" |
-------------------
The values in the 'title' column are in JSON format:
{ "en-US":"gasoline", "fr-FR":"essence" }.
How can I omit the double quotes to return just the string of the result?
The -> operator returns a json result. Casting it to text leaves it in a json representation.
The ->> operator returns a text result. Use that instead.
test=> SELECT '{"car": "going"}'::jsonb -> 'car';
?column?
----------
"going"
(1 row)
test=> SELECT '{"car": "going"}'::jsonb ->> 'car';
?column?
----------
going
(1 row)
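Applied to the function from the question, the change could look like the sketch below. It keeps the original signature and STABLE marking and only swaps the operator, so no cast is needed any more.

```sql
CREATE OR REPLACE FUNCTION public.translatejson(JSONB, TEXT)
RETURNS TEXT
AS
$BODY$
    -- ->> already returns text, without the surrounding double quotes
    SELECT $1 ->> $2
$BODY$
LANGUAGE sql STABLE;
```

With this version, the query from the question should return gasoline and diesel without the quotes.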