Loop through csv file and build json file in python

I have a csv file with 3 columns:
Col1: SCHEMA
Col2: TABLENAME
Col3: COLUMNNAME
There are going to be around 10k rows in it. I need a script to loop through the csv, build a json block for each row, and keep appending blocks until the last row.
The json block I want to build looks like this:
{
    "name": "SCHEMA.TABLENAME",
    "table_manipulation": {
        "owner": "SCHEMA",
        "name": "TABLENAME",
        "transform_columns": [{
            "column_name": "COLUMNNAME",
            "action": "KEEP",
            "computation_expression": "replace($COLUMNNAME,\"'\",\"\")"
        }],
        "source_table_settings": {
            "unload_segments": {
                "ranges": {},
                "entry_names": {}
            }
        }
    }
},
As for the script - Python would be best, but PowerShell would work too.
Any help will be greatly appreciated - thanks in advance
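For what it's worth, here is a minimal Python sketch of that loop. It assumes the csv has a header row named SCHEMA, TABLENAME, COLUMNNAME; the input/output file names are placeholders, and wrapping all blocks in one json array is just one way to handle the "keep appending" part - adjust as needed.
import csv
import json

# Assumptions: the csv has a header row (SCHEMA,TABLENAME,COLUMNNAME) and the
# file names "tables.csv" / "tables.json" are placeholders.
blocks = []
with open("tables.csv", newline="") as f:
    for row in csv.DictReader(f):
        schema, table, column = row["SCHEMA"], row["TABLENAME"], row["COLUMNNAME"]
        blocks.append({
            "name": f"{schema}.{table}",
            "table_manipulation": {
                "owner": schema,
                "name": table,
                "transform_columns": [{
                    "column_name": column,
                    "action": "KEEP",
                    "computation_expression": f"replace(${column},\"'\",\"\")",
                }],
                "source_table_settings": {
                    "unload_segments": {"ranges": {}, "entry_names": {}}
                },
            },
        })

# Write all blocks out as one json array; 10k rows is small enough to hold in memory.
with open("tables.json", "w") as out:
    json.dump(blocks, out, indent=2)
If several csv rows describe columns of the same table, you would group them into a single block with multiple transform_columns entries instead of emitting one block per row.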

Related

Parse JSON object dynamically in Bigquery + dbt

I have a json message like below. I am using dbt with the BigQuery plugin. I need to create a table dynamically in BigQuery.
{
    "data": {
        "schema": "dev",
        "payload": {
            "lastmodifieddate": "2022-11-122 00:01:28",
            "changeeventheader": {
                "changetype": "UPDATE",
                "changefields": [
                    "lastmodifieddate",
                    "product_value"
                ],
                "committimestamp": 18478596845860,
                "recordIds": [
                    "568069"
                ]
            },
            "product_value": 20000
        }
    }
}
I need to create the table dynamically with recordIds and the changed fields. The field list changes whenever the source sends an update.
Expected output:
recordIds | product_value | lastmodifieddate     | changetype
568069    | 20000         | 2022-11-122 00:01:28 | UPDATE
Thanks for your suggestions and help!
JSON objects can be saved in a BigQuery table. There is no need to use dbt here.
with tbl as (
  select 5 as row, JSON '''{
    "data": {
      "schema": "dev",
      "payload": {
        "lastmodifieddate": "2022-11-122 00:01:28",
        "changeeventheader": {
          "changetype": "UPDATE",
          "changefields": [
            "lastmodifieddate",
            "product_value"
          ],
          "committimestamp": 18478596845860,
          "recordIds": [
            "568069"
          ]
        },
        "product_value": 20000
      }
    }
  }''' as JS
)
select *,
  JSON_EXTRACT_STRING_ARRAY(JS.data.payload.changeeventheader.recordIds) as recordIds,
  JSON_EXTRACT_SCALAR(JS.data.payload.product_value) as product_value,
  JSON_VALUE(JS.data.payload.lastmodifieddate) as lastmodifieddate,
  JSON_VALUE(JS.data.payload.changeeventheader.changetype) as changetype
from tbl
If the JSON is saved as string in a BigQuery table, please use PARSE_JSON(column_name) to convert the string to JSON first.

Azure data factory ingest csv with full stop in header

I have a copy data activity in Azure data factory which reads in a csv file. This csv file is produced by a 3rd party so I cannot change it. One of the headings has a full stop (or period) in it: 'foo.bar'. When I run the activity I get the error message:
Failure happened on 'Source' side. ErrorCode=JsonInvalidDataFormat,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Error occurred when deserializing source JSON file 'foo;bar'.
Check if the data is in valid JSON object format.,Source=Microsoft.DataTransfer.ClientLibrary,'
The csv looks like this:
state,sys_updated_on,foo.bar,sys_id
New,03/06/2021 12:42:18,S Services,xxx
Resolved,03/06/2021 12:35:06,MS Services,yyy
New,03/06/2021 12:46:18,S Services,zzz
The source dataset looks like this:
{
    "name": "my_dataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "my_linked_service",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": "i.csv",
                "folderPath": "Temp/exports/widgets",
                "container": "blah"
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    }
}
You can skip the header like this: don't set "First row as header" on the dataset, and set "Skip line count" to 1 in the source settings.
Alternatively, you could use a Data Flow with a Derived Column to create a new schema that replaces the column foo.bar.

Parsing JSON in SAS

Does anyone know how to convert the following JSON to table format in SAS? I appreciate any help in advance!
JSON
{
    "totalCount": 2,
    "facets": {},
    "content": [
        [
            {
                "name": "customer_ID",
                "value": "1"
            },
            {
                "name": "customer_name",
                "value": "John"
            }
        ],
        [
            {
                "name": "customer_ID",
                "value": "2"
            },
            {
                "name": "customer_name",
                "value": "Jennifer"
            }
        ]
    ]
}
Desired Output
customer_ID | customer_name
1           | John
2           | Jennifer
Steps I've Taken
1- Call API
filename request "C:\path.request.txt";
filename response "C:\path.response.json";
filename status "C:\path.status.json";
proc http
    url="http://httpbin.org/get"
    method="POST"
    in=request
    out=response
    headerout=status;
run;
2- I have the following JSON map file saved:
{
    "DATASETS": [
        {
            "DSNAME": "customers",
            "TABLEPATH": "/root/content",
            "VARIABLES": [
                {
                    "NAME": "name",
                    "TYPE": "CHARACTER",
                    "PATH": "/root/content/name"
                },
                {
                    "NAME": "value",
                    "TYPE": "CHARACTER",
                    "PATH": "/root/content/value"
                }
            ]
        }
    ]
}
3- I use the above JSON map file as follows:
filename jmap "C:\path.jmap.map";
libname cust json map=jmap access=readonly;
proc copy inlib=cust outlib=work;
run;
4- This generates a table like this, which is not what I need:
name          | value
customer_ID   | 1
customer_name | John
customer_ID   | 2
customer_name | Jennifer
From where you are, you have a very trivial step to convert to what you want - PROC TRANSPOSE.
filename test "h:\temp\test.json";
libname test json;

data pre_trans;
    set test.content;
    if name='customer_ID' then row+1;
run;

proc transpose data=pre_trans out=want;
    by row;
    id name;
    var value;
run;
You could also do this directly in the data step; there are advantages to going either way.
data want;
    set test.content;
    retain customer_ID customer_name;
    if name='customer_ID' then customer_ID=input(value,best.);
    else if name='customer_name' then do;
        customer_name = value;
        output;
    end;
run;
This data step works okay for the example above - the proc transpose works better for more complex examples, as you only have to hardcode the one value.
I suspect you could do this more directly with a proper JSON map, but I don't usually do this sort of thing that way - it's easier for me to just get it into a dataset and then work with it from there.
In this case, SAS is getting tripped up by the double arrays with no content before the second array - if there was some (any) content there, it would parse more naturally. Since there's nothing for SAS to really judge what you want to do with that Content array, it just lets you do whatever you want with it - which is easy enough.

Pentaho Kettle: How to dynamically fetch JSON file columns

Background: I work for a company that basically sells passes. Every order that is placed by the customer will contain N number of passes.
Issue: I have these JSON event-transaction files coming into an S3 bucket on a daily basis from DocumentDB (MongoDB). Each JSON file is associated with the relevant type of event (insert, modify or delete) for every document key (which is an order in my case). The example below illustrates an "Insert" type of event that came through to the S3 bucket:
{
    "_id": {
        "_data": "11111111111111"
    },
    "operationType": "insert",
    "clusterTime": {
        "$timestamp": {
            "t": 11111111,
            "i": 1
        }
    },
    "ns": {
        "db": "abc",
        "coll": "abc"
    },
    "documentKey": {
        "_id": {
            "$uuid": "abcabcabcabcabcabc"
        }
    },
    "fullDocument": {
        "_id": {
            "$uuid": "abcabcabcabcabcabc"
        },
        "orderNumber": "1234567",
        "externalOrderId": "12345678",
        "orderDateTime": "2020-09-11T08:06:26Z[UTC]",
        "attraction": "abc",
        "entryDate": {
            "$date": 2020-09-13
        },
        "entryTime": {
            "$date": 04000000
        },
        "requestId": "abc",
        "ticketUrl": "abc",
        "tickets": [
            {
                "passId": "1111111",
                "externalTicketId": "1234567"
            },
            {
                "passId": "222222222",
                "externalTicketId": "122442492"
            }
        ],
        "_class": "abc"
    }
}
As seen above, every JSON file might contain N passes, and every pass is in turn associated with an external ticket id, which is a separate column. I want to use Pentaho Kettle to read these JSON files and load the data into the DW. I am aware of the Json input step and the Row Normalizer, which could transpose the "PassID 1", "PassID 2", "PassID 3"..."PassID N" columns into one "Pass" column, and I would have to apply similar logic to the "External ticket id" column. The problem with that approach is that it is quite static: I need to "tell" Pentaho in advance, in the Json input step, how many passes are coming. But what if tomorrow I have an order with 10 different passes? How can I do this dynamically to ensure the job will not break?
If you want a tabular output like
TicketUrl  Pass           ExternalTicketID
---------  -------------  ----------------
abc        PassID1Value1  ExTicketIDvalue1
abc        PassID1Value2  ExTicketIDvalue2
abc        PassID1Value3  ExTicketIDvalue3
and want the incoming values to be dynamic based on the JSON input file, then you can download this transformation Updated Link
I found everything works dynamically in the JSON input step.
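For reference only (this is not part of the Kettle transformation), here is a small Python sketch of the flattening the job has to perform: one output row per entry in the tickets array, however many there are. The file name and the selection of columns are placeholders taken from the sample event above.
import csv

# Illustration only: the relevant slice of the sample event above, as a Python dict.
full_document = {
    "ticketUrl": "abc",
    "tickets": [
        {"passId": "1111111", "externalTicketId": "1234567"},
        {"passId": "222222222", "externalTicketId": "122442492"},
    ],
}

# One output row per ticket, regardless of how many tickets the order contains
# ("passes.csv" is a placeholder name).
with open("passes.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["ticketUrl", "passId", "externalTicketId"])
    for ticket in full_document["tickets"]:
        writer.writerow([full_document["ticketUrl"], ticket["passId"], ticket["externalTicketId"]])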

Copy JSON Array data from REST data factory to Azure Blob as is

I have used the REST connector to get data from an API, and the JSON output contains arrays. When I try to copy the JSON as-is to blob storage using a copy activity, I only get the first object's data and the rest is ignored.
The documentation says we can copy JSON as-is by skipping the schema section on both the dataset and the copy activity. I followed that and I am getting the output below.
https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#export-json-response-as-is
I tried the copy activity without a schema, using the header as the first row, and output files to blob storage as .json and .txt.
Sample REST output:
{
    "totalPages": 500,
    "firstPage": true,
    "lastPage": false,
    "numberOfElements": 50,
    "number": 0,
    "totalElements": 636,
    "columns": {
        "dimension": {
            "id": "variables/page",
            "type": "string"
        },
        "columnIds": [
            "0"
        ]
    },
    "rows": [
        {
            "itemId": "1234",
            "value": "home",
            "data": [
                65
            ]
        },
        {
            "itemId": "1235",
            "value": "category",
            "data": [
                92
            ]
        },
    ],
    "summaryData": {
        "totals": [
            157
        ],
        "col-max": [
            123
        ],
        "col-min": [
            1
        ]
    }
}
The blob output as text is below; it contains only the first object's data:
totalPages,firstPage,lastPage,numberOfElements,number,totalElements
500,True,False,50,0,636
If you want to write the JSON response as-is, you can use an HTTP connector. However, please note that the HTTP connector doesn't support pagination.
If you want to keep using the REST connector and write a csv file as output, can you please specify how you want the nested objects and arrays to be written?
In csv files, we cannot write arrays. You could always use a custom activity or an Azure Function activity to call the REST API, parse it the way you want, and write a csv file - for example, along the lines of the sketch below.
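A minimal Python sketch of that parsing step (the kind of thing an Azure Function could do), assuming the response has the shape of the sample above and that you want one csv line per entry in rows. The file names and the choice to join the "data" array with ";" are placeholders, not requirements.
import csv
import json

# Placeholder: in an Azure Function you would receive the REST response body directly;
# here it is read from a local file for illustration.
with open("response.json") as f:
    payload = json.load(f)

# One csv line per element of the "rows" array; the "data" array is joined with ";"
# so it fits in a single csv column (an assumption - adjust to your needs).
with open("rows.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["itemId", "value", "data"])
    for row in payload["rows"]:
        writer.writerow([row["itemId"], row["value"], ";".join(str(d) for d in row["data"])])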
Hope this helps.