I have a nested JSON document to upload to BigQuery.
{
"status":{
"sleep":"12333",
"wake":"3837"
}
}
After inserting it into BigQuery, I am getting the field names as:
status_sleep and status_wake
I require the field names to be separated by a delimiter like '.' or any other delimiter:
status.sleep and status.wake
Please suggest how to add the field delimiter. I checked that there is a field delimiter key for uploading the data in CSV format.
After you insert data with the above schema, you have a record named status with two fields in it: status.sleep and status.wake.
When you query as
SELECT * FROM yourtable
without providing aliases, you will get output columns named status_sleep and status_wake, because dot notation is reserved for referencing nested data.
But you can still reference your data with dots, as below:
SELECT status.sleep as sleep, status.wake as wake FROM yourtable
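If you are on Standard SQL, dot references should work the same way; a minimal sketch with a table alias for clarity (the project.dataset prefix is just a placeholder):
#standardSQL
SELECT t.status.sleep AS sleep, t.status.wake AS wake
FROM `myproject.mydataset.yourtable` t  -- project/dataset names are placeholders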
In my SSIS project I have to retrieve my data from a flat csv file. The data itself looks something like this:
AccountType,SID,PersonID,FirstName,LastName,Email,Enabled
NOR,0001,0001,Test,Test0001,Test1#email.com,TRUE
NOR,1001,NULL,Test,Test1002,Test2#email.com,FALSE
TST,1002,NULL,Test,Test1003,Test3#email.com,TRUE
I need to read this data and make sure it has the correct data types for future checks, meaning SID and PersonID should have a numeric data type and Enabled should be a Boolean. But I would like to keep the same columns and names as in my source file.
It seems like the only correct way to read this data through the Flat File Source task is as a string; otherwise I keep getting errors because "NULL" is literally a string and not a NULL value.
Next I perform a Derived Column transformation to get rid of all "NULL" values. For example, I use the following expression for PersonID:
(TRIM(PersonID) == "" || UPPER(PersonID) == "NULL") ? (DT_WSTR,50)NULL(DT_WSTR,50) : PersonID
I would like to immediately convert it to the correct data type within the expression above, but it seems impossible to select another data type for the same column when I choose "Replace 'PersonID'" in the Derived Column dropdown box.
So next I thought of using the Data Conversion transformation to change the data types of these columns, but when I use it, it only creates new columns, even when I set the output alias to the same name.
How could I alter my solution to efficiently and correctly read this data and convert its values to the correct datatypes?
I have a table in AWS Glue, and the crawler has defined one field as array.
The content is in S3 files in JSON format.
The table is TableA, and the field is members.
There are a lot of other fields such as strings, booleans, doubles, and even structs.
I am able to query them all using a simple query such as:
SELECT
content.my_boolean,
content.my_string,
content.my_struct.value
FROM schema.tableA;
The issue is when I add content.members into the query.
The error I get is: [Amazon](500310) Invalid operation: schema "content" does not exist.
content exists, because I am able to select other fields from the main key in the JSON (content).
It is probably something related to how to query an array field in Spectrum.
Any idea?
You have to alias the table to extract the fields from the external schema:
SELECT
a.content.my_boolean,
a.content.my_string,
a.content.my_struct.value
FROM schema.tableA a;
I had the same issue with my data; I really don't know why it needs this alias, but it works. If you need to access elements of an array, you have to explode it like this:
SELECT member.<your-field>
FROM schema.tableA a, a.content.members as member;
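Putting the two together, something along these lines should select scalar fields and unnest the array in one pass (the id field inside members is only an assumption):
SELECT
    a.content.my_boolean,
    a.content.my_string,
    member.id            -- "id" is an assumed field inside the members array
FROM schema.tableA a, a.content.members AS member;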
You need to create a Glue Classifier.
Select JSON as Classifier type
and for the JSON Path input the following:
$[*]
Then run your crawler. It will infer your schema and populate your table with the correct fields instead of just one big array. Not sure if this is what you were looking for, but I figured I'd drop it here in case others have the same problem I had.
I'm inserting into a Postgres table with a JSON document and I want to generate a unique ID for the document. I can do that on my own, of course, but I was wondering if there was a way to have PG do it.
INSERT INTO test3 (data) VALUES ('{"key": "value", "unique": ????}')
The docs seem to indicate that JSON records fit into various SQL data types, but I don't see how that actually works.
How about just concatenating? Assuming your column is of type json/jsonb, something like the following should work:
INSERT INTO test3 (data) VALUES (('{"key": "value", "unique": "' || uuid_generate_v4() || '"}')::jsonb)
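If you would rather not build the JSON string by hand, jsonb_build_object should also work under the same assumption that data is jsonb (the function does the quoting for you):
INSERT INTO test3 (data)
VALUES (jsonb_build_object('key', 'value', 'unique', uuid_generate_v4()));  -- uuid-ossp extension assumed, as above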
If you're looking to generate a UUID and store it at the same time as a value within a JSON data field, here is something some may find to be a little more sane:
WITH
-- Create a temporary view named "new_entry" containing your data
new_entry
-- This is how you name the view's columns
("key", "unique")
AS (
VALUES
-- This is the actual row returned by the view
(
'value',
uuid_generate_v4()
)
)
INSERT INTO
test3(
data
)
SELECT
-- Convert row to JSON. Column name = key, column value = value.
ROW_TO_JSON(new_entry.*)
FROM
new_entry
First, we're creating a temporary view named new_entry, which contains all of the data we want to store in the JSON data field.
Second, we're grabbing that entry and passing it to the ROW_TO_JSON function, which converts it to a valid JSON value. Once converted, the row is inserted into the test3 table.
My reasoning for the "sanity" is that more than likely, your JSON object will end up containing more than just two key/value pairs. Rather, you'll end up with a handful of keys and values, and it'll be up to you to ensure you don't miss any quotes and that you escape user input appropriately. Why glue all of this together manually when you can have Postgres do it for you (with the help of ROW_TO_JSON()), while at the same time making it easier to read and debug?
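To read the generated value back for a quick check, something like this should do (same column and key names as above):
SELECT data->>'unique' AS generated_id FROM test3;  -- ->> returns the value as text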
Hey, I'm creating a Hive external table over my flat file data.
The data in my flat file is something like this :
'abc',3,'xyz'
When I load it into the Hive table it shows me the result with the single quotes.
But I want it to be something like this :
abc,3,xyz
Is there any way to do this?
I can think of two ways to get the desired result.
Use the existing string functions available in Hive: SUBSTR and LENGTH.
select SUBSTR("\'abc\'",2,length("\'abc\'")-2) , SUBSTR("\'3\'",2,length("\'3\'")-2) , SUBSTR("\'xyz\'",2,length("\'xyz\'")-2)
Generalized query
select SUBSTR(col1,2,length(col1)-2) , SUBSTR(col2,2,length(col2)-2) , SUBSTR(col3,2,length(col3)-2)
NOTE: Hive's SUBSTR function expects the string index to start from 1, not 0.
Or write your own UDF to chop the first and last character off every string.
How to convert a million rows?
Let's assume you have a table (named "staging") with 3 columns and 1 million records.
If you run the query below, you will get a new table "final" whose values will not have any single quotes at the start or end.
INSERT INTO final SELECT SUBSTR(col1,2,length(col1)-2) , SUBSTR(col2,2,length(col2)-2) , SUBSTR(col3,2,length(col3)-2) from staging
Once the above query finishes, you will have your desired result in the "final" table.
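As a side note, if you would rather not depend on fixed positions, Hive's built-in regexp_replace should give the same result against the same staging/final tables; a sketch:
INSERT INTO final SELECT regexp_replace(col1, "^'|'$", ""), regexp_replace(col2, "^'|'$", ""), regexp_replace(col3, "^'|'$", "") from staging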
Description is basically the question.
I have a CSV file that I load into a table in my database. I then run (SELECT * FROM tablename INTO OUTFILE '...') and check that the two CSVs are equal. I do this to make sure that the database was loaded properly.
My problem is that one of the fields is an ENUM type. The CSV I use to load the table contains the integer representation (for reasons I can't control) and the outfile CSV contains the string representation. Is there any way for me to cast the ENUM into its integer value when creating the outfile? I feel as though this should be an easy thing to do but I wasn't able to find an answer on Google.
Note: I have to do this for many different tables so it's difficult for me to treat each table on an individual basis.
You need to use a numeric context to get the enum's integer value. Try this:
SELECT ..., enum_field+0, ...
This will evaluate the enum_field in a numeric context, which will return the index value.
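Applied to the OUTFILE step from the question, that would look something like this (table name, column names, and path are placeholders):
SELECT col1, col2, enum_field+0        -- +0 forces the numeric (index) value
FROM tablename
INTO OUTFILE '/tmp/tablename.csv'      -- path is a placeholder
FIELDS TERMINATED BY ',';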