Hive: Ignore field in create table - json

I have to load JSON data into Hive. The problem is that one field uses a date as its key, and the date differs per record, which causes all kinds of problems. The DDL for one record looks like:
CREATE EXTERNAL TABLE `not_really_awesome_table` (
`super_wtf` struct<
`10-02-2019`: string
>,
`super_blah` struct <
`bleh`: string,
`blah`: string,
`sub_blah`: struct <
`blah_field`: string,
`bleh_field`: string
>
>
)
ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ( 'ignore.malformed.json' = 'true' )
LOCATION
's3://wtf/is/this/lol'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1539066055')
;
Is there a way to ignore the super_wtf field, or to cast it to some type that avoids parsing it further?

You can omit the super_wtf column from the DDL and define everything else:
CREATE EXTERNAL TABLE `not_really_awesome_table` (
`super_blah` struct <
`bleh`: string,
`blah`: string,
`sub_blah`: struct <
`blah_field`: string,
`bleh_field`: string
>
>
)
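In this case super_wtf will not be parsed from the JSON at all.
Alternatively, define the super_wtf column as map<string, string> in the DDL, so that the varying date becomes a map key rather than a fixed struct field name.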

Count rows in json using /UI2/CL_JSON

I have a JSON value stored in a variable and need to count its rows, but I am running into an error.
When I store the deserialized data in ls_response, I get the error:
ls_Response is not an internal table
When I store the deserialized data in lt_response, the output is:
total count is 0
Here is the code:
CLASS z_http1_code DEFINITION
PUBLIC
FINAL
CREATE PUBLIC .
PUBLIC SECTION.
INTERFACES if_oo_adt_classrun .
TYPES: BEGIN OF ty_field,
customer_id TYPE string,
address TYPE string,
created_time TYPE string,
customer TYPE string,
date_created TYPE string,
END OF ty_field,
tt_field type standard table of ty_field.
TYPES: BEGIN OF ty_record,
id TYPE string,
createdtime TYPE string,
fields TYPE ty_field,
END OF ty_record.
TYPES tt_record TYPE STANDARD TABLE OF ty_record WITH EMPTY KEY.
TYPES: BEGIN OF ty_response,
records TYPE tt_record,
END OF ty_response.
DATA:ls_Response TYPE ty_response,
lt_response type tt_record.
TYPES: BEGIN OF ty_serialize,
customer_id TYPE string,
customer TYPE string,
END OF ty_serialize.
DATA(lv_response) = `{"records":[{"id":"rec5Qk24OQpKDyykq","createdTime":"2022-08-03T10:14:43.000Z","fields":{"customer_id":"0000010001","address":"Chennai","time_created":"06:00:14","customer":"IDADMIN","date_created":"16.04.2004"}},{"id":"rec7bSe8` &&
`Zb18z6b5a","createdTime":"2022-08-08T13:07:16.000Z","fields":{"customer_id":"0000010007","address":"Kakinada","time_created":"04:01:18","customer":"Ramya","date_created":"15.04.2000"}},{"id":"recD9Hh4YLgNXOhUE","createdTime":"2022-08-08T11:48:06.00` &&
`0Z","fields":{"customer_id":"0000010002","address":"Bangalore","time_created":"04:03:35","customer":"MAASSBERG","date_created":"20.04.2004"}},{"id":"recK7Tfw4PFAedDiB","createdTime":"2022-08-03T10:14:43.000Z","fields":{"customer_id":"0000010005","a` &&
`ddress":"Kakinada","time_created":"12:55","customer":"Lakshmi","date_created":"13-10-2022"}},{"id":"recKOq0DhEtAma7BV","createdTime":"2022-08-03T10:14:43.000Z","fields":{"customer_id":"0000010006","address":"Hyderabad","time_created":"18:42:28","cu` &&
`stomer":"GLAESS","date_created":"21.04.2004"}},{"id":"recS8pg10dFBGj8o7","createdTime":"2022-08-03T10:14:43.000Z","fields":{"customer_id":"0000010003","address":"Gurugram","time_created":"04:10:02","customer":"MAASSBERG","date_created":"20.04.2004"` &&
`}},{"id":"recf4QbOmKMrBeLQZ","createdTime":"2022-08-03T10:14:43.000Z","fields":{"customer_id":"0000010004","address":"Bangalore","time_created":"06:00:12","customer":"IDADMIN","date_created":"21.04.2004"}},{"id":"recs7oHEqfkN87tWm","createdTime":"2` &&
`022-08-03T10:14:43.000Z","fields":{"customer_id":"0000010000","address":"Hyderabad","time_created":"04:01:18","customer":"MAASSBERG","date_created":"15.04.2004"}}]}`.
PROTECTED SECTION.
PRIVATE SECTION.
ENDCLASS.
/ui2/cl_json=>deserialize(
EXPORTING
json = lv_response
* jsonx =
pretty_name = /ui2/cl_json=>pretty_mode-user
* assoc_arrays =
* assoc_arrays_opt =
* name_mappings =
* conversion_exits =
* hex_as_base64 =
CHANGING
data = lt_response
).
Data LV_RowCnt type i.
*LV_RowCnt = LINES( Lt_response ).
*out->write( |'Total items count' { LV_RowCnt }| ).
You need to store the deserialized data in ls_response, because the root of the JSON is an object with a records array, not an array itself.
After that, you can get the row count from the records component of the structure ls_response:
DATA lv_row_count type i.
lv_row_count = LINES( ls_response-records ).
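To illustrate why the target has to mirror the root object, here is a minimal Python sketch of the same idea (not ABAP). The shortened payload and the helper name count_records are made up for illustration; only the shape of the JSON is taken from the question.

```python
import json

# Shortened, hypothetical payload with the same shape as lv_response:
# the root is an object whose "records" member holds the array.
payload = '{"records": [{"id": "rec1", "fields": {}}, {"id": "rec2", "fields": {}}]}'


def count_records(raw: str) -> int:
    """Parse the JSON text and count the entries of its "records" array."""
    doc = json.loads(raw)       # the root is a dict, not a list
    return len(doc["records"])  # count the nested array, not the root


row_count = count_records(payload)
print(f"Total items count: {row_count}")
```

Counting the root instead of the nested array is the equivalent of deserializing into lt_response: the root object has a single member, so the record count is never seen.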

How to extract all the keys in a JSON with BigQuery

I have JSON data in BigQuery like this:
document_id  time_in_pending_per_user
1            {"quqtC1DfyAk0d5bMi7GIE7":1735,"XmrBJS4hnqLLyDH1W5X7z2":6150,"system":0}
and I want to transform it into this:
user_id                 time_in_pending_per_user  document_id
quqtC1DfyAk0d5bMi7GIE7  1735                      1
XmrBJS4hnqLLyDH1W5X7z2  6150                      1
system                  0                         1
Can you help me? Thanks!
Consider the approach below:
create temp function extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
select user_id, val as time_in_pending_per_user, document_id
from your_table,
unnest(extract_keys(time_in_pending_per_user)) user_id with offset
join unnest(extract_values(time_in_pending_per_user)) val with offset
using(offset)
Applied to the sample data in your question, this returns one row per key, in the layout you asked for.
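As a sanity check outside BigQuery, the same key/value pairing can be sketched in Python. This is only an illustration of the logic, not a replacement for the query; the helper name explode_keys is hypothetical, and the sample string is the one from the question.

```python
import json


def explode_keys(document_id, raw):
    """Return one (user_id, value, document_id) tuple per key of the JSON object."""
    pairs = json.loads(raw)  # key order of the JSON object is preserved
    return [(user_id, value, document_id) for user_id, value in pairs.items()]


sample = '{"quqtC1DfyAk0d5bMi7GIE7":1735,"XmrBJS4hnqLLyDH1W5X7z2":6150,"system":0}'
rows = explode_keys(1, sample)
for row in rows:
    print(*row)
```

Iterating over items() keeps each key next to its own value, which is what the join on offset achieves in the query.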

JSONExtractInt64 null extract from string into clickhouse

I have a source table:
CREATE TABLE test.source
(
text_json String
)
engine = Memory;
I have a destination table:
CREATE TABLE test.destination
(
column Nullable(Int64)
)
engine = Memory;
I insert to the source:
INSERT INTO test.source (text_json) FORMAT JSONAsString {"column": null};
Then I try to parse the JSON integer value and insert it into the destination:
INSERT INTO test.destination (column)
SELECT JSONExtractInt(text_json, 'column') FROM test.source;
However, this inserts 0. That is ambiguous: if a record contains a real zero, it becomes impossible to distinguish NULL from 0.
How can I parse a JSON null value into a Nullable(Int64) column as NULL?
You can use JSON_VALUE to extract the string value and pass that to INSERT instead (with implicit or explicit casting):
INSERT INTO test.destination (column)
SELECT JSON_VALUE(text_json, '$.column')
FROM test.source
The result will be NULL:
SELECT * FROM test.destination
┌─column─┐
│ ᴺᵁᴸᴸ │
└────────┘
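The distinction the question asks for can be stated compactly in Python. This is a sketch of the intended semantics only, not ClickHouse code; the helper name extract_nullable_int is hypothetical.

```python
import json
from typing import Optional


def extract_nullable_int(raw: str, key: str) -> Optional[int]:
    """Return the integer stored under key, or None for a JSON null.

    A missing key also yields None in this sketch.
    """
    value = json.loads(raw).get(key)
    return None if value is None else int(value)


null_case = extract_nullable_int('{"column": null}', "column")
zero_case = extract_nullable_int('{"column": 0}', "column")
print(null_case, zero_case)
```

Here a JSON null and a real zero stay distinguishable, which is what the Nullable(Int64) destination column is meant to preserve.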

How to retrieve a value in a json object in sqlite when the key is an empty string?

I am processing an sqlite table which contains json objects. These json objects have keys that are empty strings. How can I retrieve the value? For example:
select json_extract('{"foo": "bar", "":"empty"}', '$.foo') as data;
-result: "bar"
How can I retrieve "empty"?
Using your example:
sqlite> SELECT value FROM json_each('{"foo":"bar","":"empty"}') WHERE key = '';
value
----------
empty
As part of a larger query from a table:
SELECT (SELECT j.value FROM json_each(t.your_json_column) AS j WHERE j.key = '') AS data
FROM your_table AS t;
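The same json_each lookup can be run from Python's built-in sqlite3 module. This is a minimal sketch, assuming the SQLite library linked into your Python build includes the JSON functions; the helper name value_for_empty_key is made up for illustration.

```python
import sqlite3


def value_for_empty_key(conn, doc):
    """Return the value stored under the empty-string key, or None if absent."""
    row = conn.execute(
        "SELECT value FROM json_each(?) WHERE key = ''", (doc,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
result = value_for_empty_key(conn, '{"foo": "bar", "": "empty"}')
print(result)
```

Passing the JSON as a bound parameter avoids quoting problems when the document itself contains single quotes.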

Unable to work with unixtimestamp column type in string data type

I have a Hive table for loading JSON data. Two values in my JSON are strings. If I declare them as bigint, a select on this table fails with the error below:
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Current token (VALUE_STRING) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream#3b6c740b; line: 1, column: 21]
If I change them to string, it works.
But because these columns are strings, I cannot use the from_unixtime function on them.
If I try to alter the column types from string to bigint, I get the error below:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions : uploadtimestamp
Below is my create table statement:
create table ABC
(
uploadTimeStamp bigint
,PDID string
,data array
<
struct
<
Data:struct
<
unit:string
,value:string
,heading:string
,loc:string
,loc1:string
,loc2:string
,loc3:string
,speed:string
,xvalue:string
,yvalue:string
,zvalue:string
>
,Event:string
,PDID:string
,`Timestamp`:string
,Timezone:string
,Version:string
,pii:struct<dummy:string>
>
>
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile;
My JSON:
{"uploadTimeStamp":"1488793268598","PDID":"123","data":[{"Data":{"unit":"rpm","value":"100"},"EventID":"E1","PDID":"123","Timestamp":1488793268598,"Timezone":330,"Version":"1.0","pii":{}},{"Data":{"heading":"N","loc":"false","loc1":"16.032425","loc2":"80.770587","loc3":"false","speed":"10"},"EventID":"Location","PDID":"skga06031430gedvcl1pdid2367","Timestamp":1488793268598,"Timezone":330,"Version":"1.1","pii":{}},{"Data":{"xvalue":"1.1","yvalue":"1.2","zvalue":"2.2"},"EventID":"AccelerometerInfo","PDID":"skga06031430gedvcl1pdid2367","Timestamp":1488793268598,"Timezone":330,"Version":"1.0","pii":{}},{"EventID":"FuelLevel","Data":{"value":"50","unit":"percentage"},"Version":"1.0","Timestamp":1488793268598,"PDID":"skga06031430gedvcl1pdid2367","Timezone":330},{"Data":{"unit":"kmph","value":"70"},"EventID":"VehicleSpeed","PDID":"skga06031430gedvcl1pdid2367","Timestamp":1488793268598,"Timezone":330,"Version":"1.0","pii":{}}]}
Is there any way to convert this string Unix timestamp to a standard time, or to work with bigint for these columns?
If you are talking about Timestamp and Timezone, you can define them as int/bigint types.
If you look at their values, you will see that there are no quotes (") around them, so they are numeric within the JSON document:
"Timestamp":1488793268598,"Timezone":330
create external table myjson
(
uploadTimeStamp string
,PDID string
,data array
<
struct
<
Data:struct
<
unit:string
,value:string
,heading:string
,loc3:string
,loc:string
,loc1:string
,loc4:string
,speed:string
,x:string
,y:string
,z:string
>
,EventID:string
,PDID:string
,`Timestamp`:bigint
,Timezone:smallint
,Version:string
,pii:struct<dummy:string>
>
>
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
location '/tmp/myjson'
;
+------------------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| myjson.uploadtimestamp | myjson.pdid | myjson.data |
+------------------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1486631318873 | 123 | [{"data":{"unit":"rpm","value":"0","heading":null,"loc3":null,"loc":null,"loc1":null,"loc4":null,"speed":null,"x":null,"y":null,"z":null},"eventid":"E1","pdid":"123","timestamp":1486631318873,"timezone":330,"version":"1.0","pii":{"dummy":null}},{"data":{"unit":null,"value":null,"heading":"N","loc3":"false","loc":"14.022425","loc1":"78.760587","loc4":"false","speed":"10","x":null,"y":null,"z":null},"eventid":"E2","pdid":"123","timestamp":1486631318873,"timezone":330,"version":"1.1","pii":{"dummy":null}},{"data":{"unit":null,"value":null,"heading":null,"loc3":null,"loc":null,"loc1":null,"loc4":null,"speed":null,"x":"1.1","y":"1.2","z":"2.2"},"eventid":"E3","pdid":"123","timestamp":1486631318873,"timezone":330,"version":"1.0","pii":{"dummy":null}},{"data":{"unit":"percentage","value":"50","heading":null,"loc3":null,"loc":null,"loc1":null,"loc4":null,"speed":null,"x":null,"y":null,"z":null},"eventid":"E4","pdid":"123","timestamp":1486631318873,"timezone":330,"version":"1.0","pii":null},{"data":{"unit":"kmph","value":"70","heading":null,"loc3":null,"loc":null,"loc1":null,"loc4":null,"speed":null,"x":null,"y":null,"z":null},"eventid":"E5","pdid":"123","timestamp":1486631318873,"timezone":330,"version":"1.0","pii":{"dummy":null}}] |
+------------------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Even if you have defined Timestamp as a string, you can still cast it to bigint before passing it to a function that requires a bigint:
cast (`Timestamp` as bigint)
hive> with t as (select '0' as `timestamp`) select from_unixtime(`timestamp`) from t;
FAILED: SemanticException [Error 10014]: Line 1:45 Wrong arguments
'timestamp': No matching method for class
org.apache.hadoop.hive.ql.udf.UDFFromUnixTime with (string). Possible
choices: FUNC(bigint) FUNC(bigint, string) FUNC(int)
FUNC(int, string)
hive> with t as (select '0' as `timestamp`) select from_unixtime(cast(`timestamp` as bigint)) from t;
OK
1970-01-01 00:00:00
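One thing to check after the cast: values such as 1488793268598 have 13 digits, which suggests milliseconds, whereas from_unixtime expects seconds. That the values are milliseconds is an assumption based on their length. The Python sketch below shows the conversion under that assumption, using a hypothetical helper parse_epoch_millis; in Hive the equivalent step would be dividing the cast value by 1000.

```python
from datetime import datetime, timedelta, timezone


def parse_epoch_millis(raw: str) -> datetime:
    """Convert a string holding epoch milliseconds to a UTC datetime."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(int(raw), 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


converted = parse_epoch_millis("1488793268598")
print(converted.isoformat())
```

Treating the same value as seconds would give a date tens of thousands of years in the future, which is a quick way to spot the unit mismatch.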