Athena CTAS saves JSON as string with special chars escaped

I'm creating a new table using CTAS in Athena. Everything works fine except for a JSON string column in the raw table (not defined as a struct).
It was:
"screen_orientation":"{"angle":"0"}",
Now it becomes:
"screen_orientation":"{\"angle\":\"0\"}",
The CTAS statement is straightforward:
CREATE TABLE destination_table
WITH (
format='JSON',
partitioned_by=ARRAY['partition_date'],
write_compression='GZIP'
)
AS
SELECT * FROM src_table
The source column is of type string.
Is there any way I can prevent this from happening? I can't redefine the source table's column definition due to a permissions issue.

This behavior is expected in Athena. For example, if I run the query below, where a string is cast to JSON, the double quotes are escaped with a backslash (\).
SQL:
WITH dataset AS (
SELECT
CAST('{"test": "value"}' AS JSON) AS hello_msg
)
SELECT * FROM dataset
Output:
"{\"test\": \"value\"}"
But you can always work around this by using the json_format function, as shown below:
SQL:
WITH dataset AS (
SELECT
json_format(JSON '{"test": "value"}' ) as hello_msg
)
SELECT * FROM dataset
Output:
{"test":"value"}
So you can add json_format to the SELECT in your CTAS statement, and the backslashes will not be embedded.

If your JSON comes as a string, you can also use json_parse:
WITH dataset AS (
SELECT
json_parse('{"test": "value"}') as hello_msg
)
SELECT * FROM dataset
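Applied to the original CTAS, the two approaches can be combined: parse the string column with json_parse and render it with json_format. A sketch (untested; since a transformed column can't be combined with SELECT *, the other columns have to be listed explicitly, and the partition column must come last):
CREATE TABLE destination_table
WITH (
format='JSON',
partitioned_by=ARRAY['partition_date'],
write_compression='GZIP'
)
AS
SELECT
json_format(json_parse(screen_orientation)) AS screen_orientation,
-- list the remaining columns of src_table here
partition_date
FROM src_table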

Extracting from Json in AWS Athena or Presto

My query below does not give me any result
WITH dataset AS (
SELECT responseelements FROM cloudtrail_logs
WHERE useridentity.type = 'Root'
AND eventname='CreateVpc'
ORDER BY eventsource, eventname;
AS blob
)
SELECT
json_extract(blob, '$.vpc.vpcId') AS name,
json_extract(blob, '$.ownerId') AS projects
FROM dataset
But if I run only the inner query
SELECT responseelements FROM cloudtrail_logs
WHERE useridentity.type = 'Root'
AND eventname='CreateVpc'
ORDER BY eventsource, eventname;
it gives me the correct response as JSON:
{"requestId":"40aaffac-2c53-419e-a678-926decc48557","vpc":{"vpcId":"vpc-01eff2919c7c1da07","state":"pending","ownerId":"347612567792","cidrBlock":"10.0.0.0/26","cidrBlockAssociationSet":{"items":[{"cidrBlock":"10.0.0.0/26","associationId":"vpc-cidr-assoc-04136293a8ac73600","cidrBlockState":{"state":"associated"}}]},"ipv6CidrBlockAssociationSet":{},"dhcpOptionsId":"dopt-92df95e9","instanceTenancy":"default","tagSet":{},"isDefault":false}}
and if I pass this value as inline data, as below,
WITH dataset AS (
SELECT '{"requestId":"40aaffac-2c53-419e-a678-926decc48557","vpc":{"vpcId":"vpc-01eff2919c7c1da07","state":"pending","ownerId":"347612567792","cidrBlock":"10.0.0.0/26","cidrBlockAssociationSet":{"items":[{"cidrBlock":"10.0.0.0/26","associationId":"vpc-cidr-assoc-04136293a8ac73600","cidrBlockState":{"state":"associated"}}]},"ipv6CidrBlockAssociationSet":{},"dhcpOptionsId":"dopt-92df95e9","instanceTenancy":"default","tagSet":{},"isDefault":false}}'
AS blob
)
SELECT
json_extract(blob, '$.vpc.vpcId') AS name,
json_extract(blob, '$.ownerId') AS projects
FROM dataset
it gives me a result. What am I missing here, so that I can make it run in one shot?
Is it at all possible?
You're referencing the wrong column name in your query: it should be json_extract(responseelements, '$.vpc.vpcId') AS name instead of json_extract(blob, '$.vpc.vpcId') AS name. The AS blob part of the query does nothing, since you can't alias an entire query, so take it out.
The AS blob works in your last example because there you're selecting a value (the JSON string) into a column, and AS blob gives that column the name, or alias, "blob". In your original query you're selecting an existing column named responseelements, so that's what you need to refer to in the json_extract function.
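Putting that together, a corrected version of the query could look like this (the stray semicolon and the ORDER BY inside the WITH clause also have to go, since a CTE cannot end that way; ordering can be re-added in the outer query if needed):
WITH dataset AS (
SELECT responseelements FROM cloudtrail_logs
WHERE useridentity.type = 'Root'
AND eventname='CreateVpc'
)
SELECT
json_extract(responseelements, '$.vpc.vpcId') AS name,
json_extract(responseelements, '$.ownerId') AS projects
FROM dataset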

Query JSON column based on an integer value in Postgres

I am using Postgres v9.3.
I have a table called temp which has a column all_data. The value looks something like this:
{"Accountid" : "1364", "Personalid" : "4629-87c3-04e6a7a60208", "quote_number" : "QWQA62364384"}
Now I want to query the all_data column for Accountid = 1364.
Could you please tell me what the query would be?
Use the ->> operator
select *
from temp
where all_data ->> 'Accountid' = '1364';
Online example for 9.3: http://sqlfiddle.com/#!15/55981/1
However, the above will not work if the JSON contains an array instead of a JSON object; e.g. a value like '[1,2]'::json will cause that query to fail.
With 9.4 and above you could check for that using json_typeof(), but with your soon-to-be-unsupported version the only workaround I can think of is to convert the column to text and exclude the values that start with a [:
with valid_rows as (
select *
from temp
where all_data::text not like '[%'
)
select *
from valid_rows
where all_data ->> 'Accountid' = '1364';
Online example for 9.3: http://sqlfiddle.com/#!15/d01f43/3
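For reference, on 9.4 and above the json_typeof() check mentioned earlier can replace the text comparison; a sketch with the same structure as the workaround above:
with valid_rows as (
select *
from temp
where json_typeof(all_data) = 'object'
)
select *
from valid_rows
where all_data ->> 'Accountid' = '1364';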

Export DB2 select to CSV with headers

I am trying to export a DB2 select with headers, but without any success. My actual code is:
db2 "EXPORT TO /tmp/result5.csv OF DEL MODIFIED BY NOCHARDEL
SELECT 1 as id, 'DEVICE_ID', 'USER_ID' from sysibm.sysdummy1
UNION ALL (SELECT 2 as id, DEVICE_ID, USER_ID FROM MOB_DEVICES) ORDER BY id"
which is not working (I suspect because USER_ID is an INTEGER). When I change it to:
db2 "EXPORT TO /tmp/result5.csv OF DEL MODIFIED BY NOCHARDEL
SELECT 1 as id, 'DEVICE_ID', 'PUSH_ID' from sysibm.sysdummy1
UNION ALL (SELECT 2 as id, DEVICE_ID, PUSH_ID FROM MOB_DEVICES) ORDER BY id"
It works; DEVICE_ID and PUSH_ID are both VARCHAR.
Any suggestions on how to solve this?
Thanks for the advice.
DB2 will not export a CSV file with the headers, because the headers would just be included as data. Normally, a CSV file is for storage, not viewing. If you want to view a file with its headers, you have the following options:
Export to an IXF file; however, this is not a flat file, and you will need a spreadsheet to view it.
Export to a CSV file and include the headers by either:
Selecting the column names from the catalog (via the describe command or a select on syscat.columns) and then performing an extra step to prepend them to the file; this process is manual.
Performing a select union: one part for the headers, the other for the data.
Perform a select and redirect the output to a file, without using export:
db2 "select * from myTable" > myTable.txt
Ignoring the EXPORT and looking exclusively at the problematic UNION ALL query:
DB2 SQL wants to conform the data of the mismatched data types to the numeric data type; in this scenario, the INTEGER data type. Because the literal string value 'USER_ID' is obviously not a valid representation of a numeric value, it cannot be cast to an INTEGER.
However, one can explicitly reverse that casting [whereby SQL wants to convert from string to numeric] to ensure the SQL has the desired effect: convert the INTEGER values from the column into VARCHAR values. That is, explicit casting can ensure the data types of the corresponding columns of the UNION are compatible, by forcing the values from the INTEGER column to match the data type of the literal character-string value 'USER_ID':
with
mob_devices (DEVICE_ID, USER_ID, PUSH_ID) as
( values( varchar('dev', 1000 ), int( 1 ), varchar('pull', 1000) ) )
( SELECT 1 as id, 'DEVICE_ID', 'USER_ID'
from sysibm.sysdummy1
)
UNION ALL
( SELECT 2 as id, DEVICE_ID , cast( USER_ID as varchar(1000) )
FROM MOB_DEVICES
)
ORDER BY id
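Applied to the original EXPORT command, the explicit cast looks like this (assuming VARCHAR(1000) is wide enough for the USER_ID values):
db2 "EXPORT TO /tmp/result5.csv OF DEL MODIFIED BY NOCHARDEL
SELECT 1 as id, 'DEVICE_ID', 'USER_ID' from sysibm.sysdummy1
UNION ALL (SELECT 2 as id, DEVICE_ID, CAST(USER_ID AS VARCHAR(1000)) FROM MOB_DEVICES) ORDER BY id"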

Separate the date from the filename with a SQL expression

I'm just wondering how to separate the date from the filename using a SQL expression. For example, the filename is IED_2015Nov020914.AF and I want to separate or split out the "date", which is "2015Nov02", in the WHERE clause of the SQL expression. I tried to use CROSS APPLY but no luck, it's not working. Here is my work so far:
where (patindex('%[0-9]%', File_name) > "2018-Nov-12"));
Thank you
The easiest way is to locate the _ character and take the following 9 characters with SUBSTRING.
CREATE TABLE #tab(filename VARCHAR(100));
INSERT INTO #tab VALUES ('IED_2015Nov020914.AF');
SELECT SUBSTRING(filename, CHARINDEX('_',filename)+1, 9) AS result
FROM #tab;
Warning: This will work only if your data has constant format:
...._yyyyMMMdd....
If you need to, you can CAST the result to a date:
SELECT CAST(SUBSTRING(filename, CHARINDEX('_',filename)+1, 9) AS DATE) AS result
FROM #tab;
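Since the original goal was to filter in the WHERE clause, the same expression can be used there directly; a sketch against the #tab example above, comparing to a plain ISO date literal:
SELECT filename
FROM #tab
WHERE CAST(SUBSTRING(filename, CHARINDEX('_', filename) + 1, 9) AS DATE) > '2018-11-12';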

How to use a MySQL variable from one table as a subquery in another, but casting it as a string (from int)

This is the query I'm trying to run:
UPDATE files set refcount=
(
SELECT count(*)
FROM comments WHERE data=files.id
)
WHERE id=?;
The problem is, comments.data is a text column (for other reasons). So I need to cast files.id as a STRING instead of what it is (an INT), because otherwise the comments.data index won't be used.
For example, this query runs fine:
SELECT count(*) FROM comments WHERE data='1234';
But this one takes forever (because it cannot use the index, comments has 10M rows):
SELECT count(*) FROM comments WHERE data=1234;
Perhaps I need to use @vars or something? I tried putting the thing in quotes, but that uses the literal string "files.id", I think.
UPDATE files set refcount=
(
SELECT count(*)
FROM comments WHERE data='files.id'
)
WHERE id=?;
All you have to do is cast files.id to a string before comparing it to data, something like this:
UPDATE files set refcount=
(
SELECT count(*)
FROM comments WHERE data=CAST(files.id AS varchar)
)
WHERE id=?;
See the MySQL documentation on cast functions and operators for the supported target types.
UPDATED: It seems that for some reason CAST does not work with VARCHAR in MySQL, though CHAR might do the trick. In case it doesn't (as Timh said in the comments below), CONCAT can be used to convert other types to a string: when you concatenate other types with a string it returns a string, so concatenating with an empty string acts as a conversion. :)
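A sketch of both variants against the original query (untested against the asker's schema):
-- CAST to CHAR (VARCHAR is not a valid CAST target type in MySQL):
UPDATE files SET refcount=
(
SELECT count(*)
FROM comments WHERE data=CAST(files.id AS CHAR)
)
WHERE id=?;

-- Or CONCAT with an empty string as an implicit conversion to string:
UPDATE files SET refcount=
(
SELECT count(*)
FROM comments WHERE data=CONCAT(files.id, '')
)
WHERE id=?;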