Import JSON files from S3 into Postgres RDS

I want to make a script (maybe a Lambda?) so that every new JSON file uploaded to this S3 bucket is also loaded directly into a table in a PostgreSQL RDS instance.
The JSON is nested and contains lists of JSON objects inside, so it is not that simple to just parse it in Postgres. In addition, it has a changing set of columns, so a new file may add a new column to the table. (If a file has a column that hasn't appeared yet, I want to add it and fill it with NULL for the existing rows.)
How can I do this efficiently?

As suggested, you can write a Lambda function that listens for S3 events and is triggered whenever a new file is uploaded:
https://n2ws.com/blog/aws-automation/lambda-function-s3-event-triggers
Once the event fires, you need to read and parse the file.
Then connect to the database and run the SQL statements you generate from the parsed object.
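For illustration, a minimal sketch of such a Lambda handler could look like the following (Python). It assumes psycopg2 is packaged with the function, the target table is called events, nested objects are flattened into underscore-joined column names, lists are stored as JSON text, and everything is kept as text for simplicity; the connection details and names are placeholders, not a definitive implementation.

import json
import boto3
import psycopg2

s3 = boto3.client("s3")

def flatten(obj, prefix=""):
    # Flatten nested dicts into underscore-joined column names;
    # lists of JSON objects are kept as JSON text.
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, f"{name}_"))
        elif isinstance(value, list):
            out[name] = json.dumps(value)
        else:
            out[name] = value
    return out

def lambda_handler(event, context):
    # Placeholder connection details for the RDS instance.
    conn = psycopg2.connect(host="my-rds-host", dbname="mydb",
                            user="myuser", password="...")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS events "
                    "(loaded_at timestamptz DEFAULT now())")
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            row = flatten(json.loads(body))

            # Add any column the table has not seen yet;
            # existing rows keep NULL in the new column.
            cur.execute("SELECT column_name FROM information_schema.columns "
                        "WHERE table_name = 'events'")
            existing = {c for (c,) in cur.fetchall()}
            for col in row.keys() - existing:
                cur.execute(f'ALTER TABLE events ADD COLUMN "{col}" text')

            # Everything is stored as text to keep the sketch simple.
            cols = ", ".join(f'"{c}"' for c in row)
            placeholders = ", ".join(["%s"] * len(row))
            values = [None if v is None else str(v) for v in row.values()]
            cur.execute(f"INSERT INTO events ({cols}) VALUES ({placeholders})",
                        values)
    conn.close()

For higher upload rates you would probably batch the inserts or stage the raw JSON into a single jsonb column instead, but the shape of the logic stays the same.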

Related

Map nested JSON (Mongo ATLAS) to SQL [Azure Data Factory]

I want to map nested JSON to a SQL table (Microsoft SSMS).
The source is a dataset of Mongo Atlas and
the sink is a dataset of Azure SQL Database Managed Instance.
I am able to map parentArray using the collection reference,
but I am not able to select a child under it.
Also, the child arrays are essentially scalar arrays (they don't have any keys).
Note: I tried the option "Map complex values to string",
but it puts the values in the column cell like ["ABC", "PQR"], which I don't want.
Is there any way to map it?
Expected output for table: childarray2
Currently in ADF, the Copy activity supports mapping of arrays only one level deep.
There is no way to map nested arrays.
For this I had to use Data Flows.
The limitation is that MongoDB/Mongo Atlas cannot be used as an input source in a Data Flow, so the workaround was:
1. Convert Mongo to Azure Blob JSON (Copy activity).
2. Use the Azure Blob JSON files as the input source and the SQL tables as the sink.
Note: you can select the option to delete your blob files afterwards, to save storage space.
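Outside of ADF, the flattening that the Data Flow ends up doing is conceptually just a nested loop. A small Python illustration only (the field names parentArray and childarray2 follow the question; everything else is made up):

# Illustration only, not ADF: flatten a parent array whose children are
# scalar arrays into one SQL row per child value.
doc = {
    "id": 1,
    "parentArray": [
        {"name": "p1", "childarray2": ["ABC", "PQR"]},
        {"name": "p2", "childarray2": ["XYZ"]},
    ],
}

rows = []
for parent in doc["parentArray"]:
    for value in parent["childarray2"]:
        # each scalar child value becomes its own row, carrying the parent keys along
        rows.append({"doc_id": doc["id"],
                     "parent_name": parent["name"],
                     "childvalue": value})

print(rows)
# [{'doc_id': 1, 'parent_name': 'p1', 'childvalue': 'ABC'}, ...]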

Is there a way to insert the filename into the record when importing a JSON file into Power BI?

Not sure how to ask this, but here goes. I have a collection of 500+ JSON files that I need to import into Power BI. Each JSON has four different levels of information that I need to parse out. I converted the JSON top-level info into a table and transposed it so I had one row, as in the attached screenshot.
My first question is: can I easily add the filename to the JSON record? I would like to use the filename as a unique key in later queries.
Thanks!
It looks like you may be connecting to each JSON file individually? If so, and assuming all the JSON files can live in a single folder, you can use the "Folder" connector. That then allows you to right-click the original folder query and choose "Reference" to create the various transformations for each JSON file, and it includes the file name.
Related details:
https://powerbi.tips/2016/06/loading-data-from-folder/
https://learn.microsoft.com/en-us/power-bi/guidance/power-query-referenced-queries
Hoping that helps!
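For comparison only, if you ever preprocess the files outside Power BI, the same filename-as-key idea is a few lines of Python (the folder path and SourceFileName field are placeholders, not anything Power BI requires):

import json
from pathlib import Path

# Illustration only, not Power Query: attach the source filename to each
# record so it can serve as a unique key later.
records = []
for path in Path("C:/data/json_files").glob("*.json"):
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    record["SourceFileName"] = path.name   # the filename becomes the key
    records.append(record)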

Dynamically refer to Json value in Data Factory copy

I have an ADF CopyRestToADLS activity which correctly saves a complex JSON object to Data Lake Storage. But I additionally need to pass one of the JSON values (myextravalue) to a stored procedure. I tried referencing it in the stored procedure parameter as @{activity('CopyRestToADLS').output.myextravalue} but I am getting the error
The actions CopyRestToADLS referenced by 'inputs' in the action 'Execute Stored procedure1' are not defined in the template
{
    "items": [1000 items],
    "count": 1000,
    "myextravalue": 15983444
}
I would like to reference this value dynamically because the CopyRestToADLS source REST dataset dynamically calls different REST endpoints, so the structure of the JSON object is different each time. But myextravalue is always present in each JSON response.
How is it possible to reference myextravalue and use it as a parameter?
You could create another Lookup activity on the REST data source to get the JSON value, then pass it to the Stored Procedure activity.
Yes, it will create a new REST request, but it seems to be an easy way to achieve your purpose. The Lookup activity just gets the content of the source and doesn't save it anywhere.
Another solution may be to get the value from the copy activity's output file after the copy activity has completed.
I'm glad you solved it this way:
"I created a Data Flow to read from the folder where the Copy activity saves the dynamically named output JSON files. After importing the schema from a sample file, I selected myextravalue as the only mapping in the Sink Mapping section."

Azure Pipeline - Importing a Task Group via JSON always creates a new one instead of changing the existing one

I created a Task Group in Azure Pipelines via the GUI.
Then I exported the JSON.
Next, I changed the inputs in the JSON.
Afterwards, I wanted to import this new JSON to change the existing Task Group.
Result:
It didn't update the existing Task Group; instead, it created a new task group with the same name but with the postfix " - Copy".
Analysis:
When I downloaded the newly imported Task Group, I saw that the value of its Id had changed.
Anyway, I could not find a way to update the existing Task Group. What do I have to change in my JSON in order to alter it rather than create a new one?
Thanks!
Try using the Taskgroups Update API.
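As a rough sketch, calling the Task Groups Update endpoint with your exported JSON could look like this in Python. The organization, project, task group id, PAT and api-version are placeholders, and the exact payload requirements (e.g. keeping the revision field) may differ, so treat this as a starting point rather than a definitive call:

import json
import requests

ORG = "myorg"                 # placeholder organization
PROJECT = "myproject"         # placeholder project
TASK_GROUP_ID = "00000000-0000-0000-0000-000000000000"   # id of the EXISTING task group
PAT = "..."                   # personal access token with Task Groups (Read & Manage) scope

# Load the exported/edited JSON and keep the original id, so the service
# updates the existing task group instead of creating a " - Copy".
with open("taskgroup.json", encoding="utf-8") as f:
    body = json.load(f)
body["id"] = TASK_GROUP_ID

url = (f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/distributedtask/"
       f"taskgroups/{TASK_GROUP_ID}?api-version=6.0-preview.1")
resp = requests.put(url, json=body, auth=("", PAT))
resp.raise_for_status()
print(resp.json()["revision"])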

Data Factory v2 - Generate a json file per row

I'm using Data Factory v2. I have a Copy activity that has an Azure SQL dataset as input and an Azure Storage blob as output. I want to write each row in my SQL dataset as a separate blob, but I don't see how I can do this.
I see a copyBehavior setting in the Copy activity, but that only works for file-based sources.
Another possible setting is the filePattern in my dataset:
Indicate the pattern of data stored in each JSON file. Allowed values
are: setOfObjects and arrayOfObjects.
setOfObjects - Each file contains single object, or line-delimited/concatenated multiple objects. When this option is chosen in an output dataset, copy activity produces a single JSON file with each object per line (line-delimited).
arrayOfObjects - Each file contains an array of objects.
The description talks about "each file", so initially I thought it would be possible, but now that I've tested them: setOfObjects creates a line-delimited file, where each row is written to a new line, while arrayOfObjects creates a file containing a JSON array and adds each row as a new element of that array.
I'm wondering if I'm missing a configuration somewhere, or is it just not possible?
What I did for now is to load the rows into a SQL table and run a ForEach over each record in that table. I use a Lookup activity to get an array to loop over in a ForEach activity. The ForEach activity then writes each row to blob storage.
For Olga's DocumentDB question, it would look like this:
In the Lookup, you get a list of the document ids you want to copy.
You use that set in your ForEach activity.
Then you copy the files using a Copy activity within the ForEach activity, querying a single document in your source.
And you can use the id to dynamically name your file in the sink (you'll have to define the parameter in your dataset too).
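For comparison, the one-blob-per-row output that this Lookup + ForEach pattern produces is conceptually just the following (a Python illustration only, not ADF; the connection strings, table and container names are placeholders):

import json
import pyodbc
from azure.storage.blob import ContainerClient

# Illustration only, not ADF: write each SQL row as its own JSON blob,
# named by the row's id. Connection strings, table and container are placeholders.
container = ContainerClient.from_connection_string(
    "DefaultEndpointsProtocol=...", container_name="output")
conn = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};"
                      "Server=myserver;Database=mydb;UID=...;PWD=...")

cursor = conn.cursor()
cursor.execute("SELECT id, name, payload FROM dbo.MyTable")   # placeholder query
columns = [c[0] for c in cursor.description]

for row in cursor.fetchall():
    record = dict(zip(columns, row))
    # one blob per row, dynamically named by the row's id
    container.upload_blob(name=f"{record['id']}.json",
                          data=json.dumps(record, default=str),
                          overwrite=True)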