I have this jsonb column in a PostgresSQL table.
{
"{\"start\":\"14:00\",\"end\":\"14:50\"}",
"{\"start\":\"14:51\",\"end\":\"15:40\"}",
"{\"start\":\"15:41\",\"end\":\"16:30\"}",
"{\"start\":\"16:31\",\"end\":\"17:20\"}"
}
I need to extract all values of start and end.
I want the result to be like this
id | start1 | end1 | start2 | end2 | start3 | end3 | start4 | end4
or
id | start1 | end1
id | start2 | end2
id | start3 | end3
id | start4 | end4
The usual ->> doesn't work for this and I have no clue how can I do that.
You don't say what version of Postgres you are using, but if it probably has:
https://www.postgresql.org/docs/12/runtime-config-compatible.html#RUNTIME-CONFIG-COMPATIBLE-VERSION
standard_conforming_strings (boolean)
This controls whether ordinary string literals ('...') treat backslashes The presence of this parameter can also be taken as an indication that the escape string syntax (E'...') is supported. Escape string syntax (Section 4.1.2.2) should be used if an application desires backslashes to be treated as escape characters.
In that case to deal with the escapes in your JSON you need to do:
select E'{\"start\":\"14:00\",\"end\":\"14:50\"}'::jsonb;
jsonb
------------------------------------
{"end": "14:50", "start": "14:00"}
(1 row)
select E'{\"start\":\"14:00\",\"end\":\"14:50\"}'::jsonb ->> 'start';
?column?
----------
14:00
select E'[
{\"start\":\"14:00\",\"end\":\"14:50\"},
{\"start\":\"14:51\",\"end\":\"15:40\"},
{\"start\":\"15:41\",\"end\":\"16:30\"},
{\"start\":\"16:31\",\"end\":\"17:20\"}
]'::jsonb;
--------------------------------------------------------------------------------------------------------------------------------------------------
[{"end": "14:50", "start": "14:00"}, {"end": "15:40", "start": "14:51"}, {"end": "16:30", "start": "15:41"}, {"end": "17:20", "start": "16:31"}]
Related
I'm using MySQL 5.7.35. If I use the LOAD DATA INFILE command on a CSV file with NULL as an unquoted string value in the CSV file, the value is imported as NULL in MySQL.
For example, if I import a CSV file with the following content:
record_number,a,b,c,d,e,f
1,1,2,3,4,5,6
2,NULL,null,Null,nUlL,,"NULL"
The imported table will have the following values:
+---------------+------+--------+--------+--------+--------+--------+
| record_number | a | b | c | d | e | f |
+---------------+------+--------+--------+--------+--------+--------+
| 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | NULL | "null" | "Null" | "nUlL" | "" | "NULL" |
+---------------+------+--------+--------+--------+--------+--------+
Is there any way to force column a, record 2, to be imported as a string without modifying the CSV file?
Update
#Barmar Pointed out that there's a paragraph in the MySQL documentation on this behavior here:
If FIELDS ENCLOSED BY is not empty, a field containing the literal
word NULL as its value is read as a NULL value. This differs from the
word NULL enclosed within FIELDS ENCLOSED BY characters, which is read
as the string 'NULL'.
This is documented here:
If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'.
So you need to specify the quoting character with something like FIELDS ENCLOSED BY '"' and then write "NULL" in the CSV file.
You could check for a NULL value in your code and convert it to a string.
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(record_number, #a, #b, #c, #d, #e, #f)
SET a = IFNULL(#a, 'NULL'),
b = IFNULL(#b, 'NULL'),
c = IFNULL(#c, 'NULL'),
d = IFNULL(#d, 'NULL'),
e = IFNULL(#e, 'NULL'),
f = IFNULL(#f, 'NULL')
However, this can't distinguish between an intentional NULL written as \N and MySQL treating NULL as NULL.
I have a problem with loading CSV data into snowflake table. Fields are wrapped in double quote marks and hence there is problem with importing them into table.
I know that COPY TO has CSV specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"'but it's not working at all.
Here are some pices of table definition and copy command:
CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);
COPY INTO ...
FROM ...csv.gz'
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT"
;
Csv file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"
I get an error
'''Numeric value '"3922000"' is not recognized '''
I'm pretty sure it's because NUMBER value is interpreted as string when snowflake is reading "" marks, but since I use
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
it shouldn't even be there... Does anyone have some solution to this?
Maybe something is incorrect with your file? I was just able to run the following without issue.
1. create the test table:
CREATE OR REPLACE TABLE
dbNameHere.schemaNameHere.stacko_58322339 (
num1 NUMBER,
num2 NUMBER,
num3 NUMBER);
2. create test file, contents as follows
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"
3. create stage and put file in stage
4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
FROM #stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 0
ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "CONTINUE";
4. results
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED | 4 | 4 | 4 | 0 | NULL | NULL | NULL | NULL |
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s
5. view the records
>SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+
| NUM1 | NUM2 | NUM3 |
|---------+----------+-------|
| 1 | 2 | 3 |
| 3922000 | 14733370 | 57256 |
| 3 | 2 | 1 |
| 4 | 5 | 6 |
+---------+----------+-------+
Can you try with a similar test as this?
EDIT: A quick look at your data shows many of your numeric fields appear to start with commas, so something definitely amiss with the data.
Assuming your numbers are European formatted , decimal place, and . thousands, reading the numeric formating help, it seems Snowflake does not support this as input. I'd open a feature request.
But if you read the column in as text then use REPLACE like
SELECT '100,1234'::text as A
,REPLACE(A,',','.') as B
,TRY_TO_DECIMAL(b, 20,10 ) as C;
gives:
A B C
100,1234 100.1234 100.1234000000
safer would be to strip placeholders first like
SELECT '1.100,1234'::text as A
,REPLACE(A,'.') as B
,REPLACE(B,',','.') as C
,TRY_TO_DECIMAL(C, 20,10 ) as D;
I am trying to create a csv from values stored in the table:
| col1 | col2 | col3 |
| "one" | null | "one" |
| "two" | "two" | "two" |
hive > select * from table where col2 is null;
one null one
I am getting the csv using the below code:
df.repartition(1)
.write.option("header",true)
.option("delimiter", ",")
.option("quoteAll", true)
.option("nullValue", "")
.csv(S3Destination)
Csv I get:
"col1","col2","col3"
"one","","one"
"two","two","two"
Expected Csv:WITH NO DOUBLE QUOTES FOR NULL VALUE
"col1","col2","col3"
"one",,"one"
"two","two","two"
Any help is appreciated to know if the dataframe writer has options to do this.
You can go in a udf approach and apply on the column (using withColumn on the repartitioned datafrmae above) where possiblity of double quote empty string is there see below sample code
sqlContext.udf().register("convertToEmptyWithOutQuotes",(String abc) -> (abc.trim().length() > 0 ? abc : abc.replace("\"", " ")),DataTypes.StringType);
String has replace method which does the job.
val a = Array("'x'","","z")
println(a.mkString(",").replace("\"", " "))
will produce 'x',,z
I have the following function in PostgreSQL
CREATE OR REPLACE FUNCTION public.translatejson(JSONB, TEXT)
RETURNS TEXT
AS
$BODY$
SELECT ($1->$2)::TEXT
$BODY$
LANGUAGE sql STABLE;
When I execute it I receive the values surrounded by double quotes. For example:
SELECT id, translatejson("title", 'en-US') AS "tname" FROM types."FuelTypes";
in return I get a table like this
-------------------
| id | tname |
-------------------
| 1 | "gasoline" |
| 2 | "diesel" |
-------------------
The values in the 'title' column are in JSON format:
{ "en-US":"gasoline", "fr-FR":"essence" }.
How I can omit the double quotes to return just the string of the result?
The -> operator returns a json result. Casting it to text leaves it in a json reprsentation.
The ->> operator returns a text result. Use that instead.
test=> SELECT '{"car": "going"}'::jsonb -> 'car';
?column?
----------
"going"
(1 row)
test=> SELECT '{"car": "going"}'::jsonb ->> 'car';
?column?
----------
going
(1 row)
I see that within MySQL there are Cast() and Convert() functions to create integers from values, but is there any way to check to see if a value is an integer? Something like is_int() in PHP is what I am looking for.
I'll assume you want to check a string value. One nice way is the REGEXP operator, matching the string to a regular expression. Simply do
select field from table where field REGEXP '^-?[0-9]+$';
this is reasonably fast. If your field is numeric, just test for
ceil(field) = field
instead.
Match it against a regular expression.
c.f. http://forums.mysql.com/read.php?60,1907,38488#msg-38488 as quoted below:
Re: IsNumeric() clause in MySQL??
Posted by: kevinclark ()
Date: August 08, 2005 01:01PM
I agree. Here is a function I created for MySQL 5:
CREATE FUNCTION IsNumeric (sIn varchar(1024)) RETURNS tinyint
RETURN sIn REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$';
This allows for an optional plus/minus sign at the beginning, one optional decimal point, and the rest numeric digits.
Suppose we have column with alphanumeric field having entries like
a41q
1458
xwe8
1475
asde
9582
.
.
.
.
.
qe84
and you want highest numeric value from this db column (in this case it is 9582) then this query will help you
SELECT Max(column_name) from table_name where column_name REGEXP '^[0-9]+$'
Here is the simple solution for it
assuming the data type is varchar
select * from calender where year > 0
It will return true if the year is numeric else false
This also works:
CAST( coulmn_value AS UNSIGNED ) // will return 0 if not numeric string.
for example
SELECT CAST('a123' AS UNSIGNED) // returns 0
SELECT CAST('123' AS UNSIGNED) // returns 123 i.e. > 0
To check if a value is Int in Mysql, we can use the following query.
This query will give the rows with Int values
SELECT col1 FROM table WHERE concat('',col * 1) = col;
The best i could think of a variable is a int Is a combination with MySQL's functions CAST() and LENGTH().
This method will work on strings, integers, doubles/floats datatypes.
SELECT (LENGTH(CAST(<data> AS UNSIGNED))) = (LENGTH(<data>)) AS is_int
see demo http://sqlfiddle.com/#!9/ff40cd/44
it will fail if the column has a single character value. if column has
a value 'A' then Cast('A' as UNSIGNED) will evaluate to 0 and
LENGTH(0) will be 1. so LENGTH(Cast('A' as UNSIGNED))=LENGTH(0) will
evaluate to 1=1 => 1
True Waqas Malik totally fogotten to test that case. the patch is.
SELECT <data>, (LENGTH(CAST(<data> AS UNSIGNED))) = CASE WHEN CAST(<data> AS UNSIGNED) = 0 THEN CAST(<data> AS UNSIGNED) ELSE (LENGTH(<data>)) END AS is_int;
Results
**Query #1**
SELECT 1, (LENGTH(CAST(1 AS UNSIGNED))) = CASE WHEN CAST(1 AS UNSIGNED) = 0 THEN CAST(1 AS UNSIGNED) ELSE (LENGTH(1)) END AS is_int;
| 1 | is_int |
| --- | ------ |
| 1 | 1 |
---
**Query #2**
SELECT 1.1, (LENGTH(CAST(1 AS UNSIGNED))) = CASE WHEN CAST(1.1 AS UNSIGNED) = 0 THEN CAST(1.1 AS UNSIGNED) ELSE (LENGTH(1.1)) END AS is_int;
| 1.1 | is_int |
| --- | ------ |
| 1.1 | 0 |
---
**Query #3**
SELECT "1", (LENGTH(CAST("1" AS UNSIGNED))) = CASE WHEN CAST("1" AS UNSIGNED) = 0 THEN CAST("1" AS UNSIGNED) ELSE (LENGTH("1")) END AS is_int;
| 1 | is_int |
| --- | ------ |
| 1 | 1 |
---
**Query #4**
SELECT "1.1", (LENGTH(CAST("1.1" AS UNSIGNED))) = CASE WHEN CAST("1.1" AS UNSIGNED) = 0 THEN CAST("1.1" AS UNSIGNED) ELSE (LENGTH("1.1")) END AS is_int;
| 1.1 | is_int |
| --- | ------ |
| 1.1 | 0 |
---
**Query #5**
SELECT "1a", (LENGTH(CAST("1.1" AS UNSIGNED))) = CASE WHEN CAST("1a" AS UNSIGNED) = 0 THEN CAST("1a" AS UNSIGNED) ELSE (LENGTH("1a")) END AS is_int;
| 1a | is_int |
| --- | ------ |
| 1a | 0 |
---
**Query #6**
SELECT "1.1a", (LENGTH(CAST("1.1a" AS UNSIGNED))) = CASE WHEN CAST("1.1a" AS UNSIGNED) = 0 THEN CAST("1.1a" AS UNSIGNED) ELSE (LENGTH("1.1a")) END AS is_int;
| 1.1a | is_int |
| ---- | ------ |
| 1.1a | 0 |
---
**Query #7**
SELECT "a1", (LENGTH(CAST("1.1a" AS UNSIGNED))) = CASE WHEN CAST("a1" AS UNSIGNED) = 0 THEN CAST("a1" AS UNSIGNED) ELSE (LENGTH("a1")) END AS is_int;
| a1 | is_int |
| --- | ------ |
| a1 | 0 |
---
**Query #8**
SELECT "a1.1", (LENGTH(CAST("a1.1" AS UNSIGNED))) = CASE WHEN CAST("a1.1" AS UNSIGNED) = 0 THEN CAST("a1.1" AS UNSIGNED) ELSE (LENGTH("a1.1")) END AS is_int;
| a1.1 | is_int |
| ---- | ------ |
| a1.1 | 0 |
---
**Query #9**
SELECT "a", (LENGTH(CAST("a" AS UNSIGNED))) = CASE WHEN CAST("a" AS UNSIGNED) = 0 THEN CAST("a" AS UNSIGNED) ELSE (LENGTH("a")) END AS is_int;
| a | is_int |
| --- | ------ |
| a | 0 |
see demo
What about:
WHERE table.field = "0" or CAST(table.field as SIGNED) != 0
to test for numeric and the corrolary:
WHERE table.field != "0" and CAST(table.field as SIGNED) = 0
I have tried using the regular expressions listed above, but they do not work for the following:
SELECT '12 INCHES' REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$' FROM ...
The above will return 1 (TRUE), meaning the test of the string '12 INCHES' against the regular expression above, returns TRUE. It looks like a number based on the regular expression used above. In this case, because the 12 is at the beginning of the string, the regular expression interprets it as a number.
The following will return the right value (i.e. 0) because the string starts with characters instead of digits
SELECT 'TOP 10' REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$' FROM ...
The above will return 0 (FALSE) because the beginning of the string is text and not numeric.
However, if you are dealing with strings that have a mix of numbers and letters that begin with a number, you will not get the results you want. REGEXP will interpret the string as a valid number when in fact it is not.
This works well for VARCHAR where it begins with a number or not..
WHERE concat('',fieldname * 1) != fieldname
may have restrictions when you get to the larger NNNNE+- numbers
for me the only thing that works is:
CREATE FUNCTION IsNumeric (SIN VARCHAR(1024)) RETURNS TINYINT
RETURN SIN REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$';
from kevinclark all other return useless stuff for me in case of 234jk456 or 12 inches