Can we COPY an external JSON file into Snowflake?

I am trying to load an external JSON file from Azure Blob Storage into Snowflake. I created the table LOCATION_DETAILS with all columns as VARIANT. When I try to load into the table, I get the below error:
Can anyone help me with this?

You need to create a file format that specifies the file type and other options, like below:
create or replace file format myjsonformat
type = 'JSON'
strip_outer_array = true;
Then load the file again and it will work.
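For example, a minimal COPY using that file format might look like this (my_azure_stage and location_details.json are placeholder names, not from the original question):
copy into LOCATION_DETAILS
from @my_azure_stage/location_details.json  -- placeholder stage and file names
file_format = (format_name = 'myjsonformat');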

When I use external data with Snowflake, I like to create stages linked to the blob storage (in this case). It makes everything easy and transparent, just as if the data were local.
Create the stage linked to the blob storage like this:
CREATE OR REPLACE STAGE "<DATABASE>"."<SCHEMA>"."<STAGE_NAME>"
URL='azure://demostorage178.blob.core.windows.net/democontainer'
CREDENTIALS=(AZURE_SAS_TOKEN='***********************************************')
FILE_FORMAT = (TYPE = JSON);
After that, you can list what is in the blob storage from Snowflake like this:
list @"<DATABASE>"."<SCHEMA>"."<STAGE_NAME>";
Or like this:
use database "<DATABASE>";
use schema "<SCHEMA>";
SELECT $1 FROM @"<STAGE_NAME>"/sales.json;
If you need to create the table, use this:
create or replace table "<DATABASE>"."<SCHEMA>"."<TABLE>" (src VARIANT);
And you can COPY your data like this (for a single file):
copy into "<DATABASE>"."<SCHEMA>"."<TABLE>" from @"<STAGE_NAME>"/sales.json;
Finally, use this to load all new data that arrives in your stage. Note: you don't need to erase previously loaded files; COPY ignores them and loads only the new ones.
copy into "<DATABASE>"."<SCHEMA>"."<TABLE>" from @"<STAGE_NAME>";
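Conversely, if you ever do need to reload files that COPY has already seen (for example after truncating the table), you can override the load-history check with the FORCE copy option:
copy into "<DATABASE>"."<SCHEMA>"."<TABLE>" from @"<STAGE_NAME>" force = true;  -- reloads files even if already loaded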

Related

Loading and unloading JSON files using AWS Athena

I'm trying to load, filter, and unload some JSON files using AWS Athena:
CREATE EXTERNAL TABLE IF NOT EXISTS
json_based_table(file_line string)
LOCATION 's3://<my_bucket>/<some_path>/';
UNLOAD
(SELECT file_line from json_based_table limit 10)
TO 's3://<results_bucket>/samples/'
WITH (format = 'JSON');
The problem is that the output is a set of files containing one JSON object per line, each with a single key "file_line" whose value is a line from the original file as a string.
How do I UNLOAD only the values of such a table (ignoring the column name I had to create in order to load the files)?
It seems that by choosing
WITH (format = 'TEXTFILE');
I can get what I want.
Choosing JSON as the format preserves the tabular structure of the table in the output file, so the name was misleading in this case.
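Putting it together, the full statement is the same UNLOAD as above with only the format changed:
UNLOAD
(SELECT file_line from json_based_table limit 10)
TO 's3://<results_bucket>/samples/'
WITH (format = 'TEXTFILE');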

Alternative to JSON Flattening via Target Table in Snowflake

Per Snowflake (https://docs.snowflake.net/manuals/user-guide/json-basics-tutorial-copy-into.html), I created a target table (Testing_JSON) that is a single VARIANT column containing an uploaded JSON file.
My question is: how can I cut out creating this "target table" (i.e. Testing_JSON) with its single VARIANT column, which I have to reference to create the actual and only table I want (TABLE1), containing the flattened JSON? I have found no way to read in a JSON file from my desktop and parse it on the fly into a flattened table VIA THE UI. Not the CLI, as I know this can be done using PUT/COPY INTO.
create or replace temporary table TABLE1 AS
SELECT
    VALUE:col1::string AS COL_1,
    VALUE:col2::string AS COL_2,
    VALUE:col3::string AS COL_3
from TESTING_JSON,
    lateral flatten( input => json:value );
You can't do this through the UI. If you want to do this, you need to use an external tool on your desktop or, as Mike mentioned, do it in the COPY statement.
You're going to need to do this in a few steps from your desktop:
1. Use SnowSQL or some other tool to get your JSON file up to blob storage: https://docs.snowflake.net/manuals/sql-reference/sql/put.html
2. Use a COPY INTO statement to load the data directly into the flattened table that you want. This will require a SELECT statement in your COPY INTO: https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html
There is a good example of this here:
https://docs.snowflake.net/manuals/user-guide/querying-stage.html#example-3-querying-elements-in-a-json-file
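A minimal sketch of that approach, assuming the file has been PUT to a stage named my_stage and that the col1/col2/col3 keys sit at the top level of each record once the outer array is stripped (stage and file names are placeholders):
copy into TABLE1 (COL_1, COL_2, COL_3)
from (
    select $1:col1::string,
           $1:col2::string,
           $1:col3::string
    from @my_stage/myfile.json  -- placeholder stage and file name
)
file_format = (type = 'JSON', strip_outer_array = true);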

How to use a value in a CSV file as a relationship label in neo4j?

I am trying to create a graph in neo4j, and my data, which is in a CSV file, looks like this:
node1,connection,node2
PPARA,0.5,PPARGC1A
PPARA,0.5,ENSG00000236349
PPARA,0.5,TSPA
I want to use the connection values as the labels of the relationships in the graph, which I am not able to do. The following is the exact code I am using to create the graph:
LOAD CSV WITH HEADERS FROM "file:///C:/Users/username/Desktop/Cytoscape-friend.csv" AS network
CREATE (:GeneRN2{sourceNode:network.node1, destNode:network.node2})
CREATE (sourceNode) -[:TO {exp:network.connection}] ->(destNode)
My second question: since there are multiple repeating values in my file, neo4j by default creates multiple nodes for the repeated values. How do I create a single node for each repeated value and connect all other related nodes to that single node?
Relationships do not have labels. They have a type.
If you need to specify the type of relationship from a variable, then you need to use the procedure apoc.create.relationship from the APOC library.
To avoid creating duplicate nodes, use MERGE instead of CREATE.
So your query might look like this:
LOAD CSV WITH HEADERS
FROM "file:///C:/Users/username/Desktop/Cytoscape-friend.csv" AS network
MERGE (sourceNode {id: network.node1})
MERGE (destNode {id: network.node2})
WITH sourceNode, destNode, network
CALL apoc.create.relationship(sourceNode, network.connection, {}, destNode) YIELD rel
RETURN sourceNode, rel, destNode

How to write a JSON-extracted value to a CSV file in JMeter for a specific variable

I have a CSV file that looks like this:
varCust_id,varCust_name,varCity,varStateProv,varCountry,varUserId,varUsername
When I run the HTTP POST request to create a new customer, I get a JSON response. I am extracting the cust_id and cust_name using the JSON Extractor. How can I enter these new values into the CSV for the correct variables? For example, after creating the customer, the CSV would look like this:
varCust_id,varCust_name,varCity,varStateProv,varCountry,varUserId,varUsername
1234,My Customer Name
Or once I create a user, the file might look like this:
varCust_id,varCust_name,varCity,varStateProv,varCountry,varUserId,varUsername
1234,My Customer Name,,,,9876,myusername
In my searching through the net, I have found ways to append these extracted variables as a new line, but in my case I need to replace the value in the correct location so it is associated with the correct variable I have set up in the CSV file.
I believe what you're looking to do can be done via a BeanShell PostProcessor and is answered here.
Thank you for the reply. I ended up using User Defined Variables for some things and BeanShell PreProcessors for other bits vs. using the CSV.
Well, I've never tried this, but what you can do is create all these variables and initialize them to null / 0.
Then update them during your execution. At the end, you can concatenate them with any delimiter (say ; or tab) and push them into the CSV as a single string.
Once the data is in the CSV, you can easily split it in MS Excel.

CSV LOAD and updating existing nodes / creating new ones

I might be on the wrong track, so I could use some helpful input. I receive data from other systems as CSV files, which I can import into my DB with LOAD CSV. So far so good.
I got stuck when I needed to reload the CSV to pick up updates. I cannot delete the former data, as it might already have additional user input attached, so I need a query that imports the CSV data, attempts a match, and when it finds the node just uses SET to override the existing properties. That said, I am unsure how to catch the cases where there is no node in the DB (a new record) and we need to create one.
LOAD CSV FROM "file:xxx.csv" AS csvLine
MATCH (c:Customer {code:"ABC"})
SET c.name = name: csvLine[0]
***OPTIONAL MATCH // Here I am unsure how to express when the node is not found***
MERGE (c:Customer { name: csvLine[0], code: csvLine[1]})
So ideally Cypher would check whether the node is there and, if so, make an UPDATE by SETting the new properties coming with the CSV, or, if the node cannot be found, create a new one from the CSV data.
And, as a side note: how would I find nodes that are in the DB but not in the CSV file, in order to mark them as obsolete? (This might not be possible during the import, but maybe someone has an idea how to keep the DB clean of deleted records, which can only be detected by comparison with the latest CSV import. Happy for every idea.)
Any idea or hint how to write the query for updating the graph while importing?
You need to use MERGE's ON MATCH and/or ON CREATE handlers; see http://neo4j.com/docs/stable/query-merge.html#_use_on_create_and_on_match. I assume the customer code in the second column is the identifier, so the name in column one might change on updates:
LOAD CSV FROM "file:xxx.csv" AS csvLine
MERGE (c:Customer {code:csvLine[1]})
ON CREATE SET c.name = csvLine[0]
ON MATCH SET c.name = csvLine[0]