Data Factory v2 - Generate a JSON file per row

I'm using Data Factory v2. I have a copy activity that has an Azure SQL dataset as input and an Azure Blob Storage dataset as output. I want to write each row in my SQL dataset as a separate JSON blob, but I don't see how I can do this.
I see a copyBehavior setting in the copy activity, but that only works for file-based sources.
Another possible setting is the filePattern in my dataset:
Indicate the pattern of data stored in each JSON file. Allowed values
are: setOfObjects and arrayOfObjects.
setOfObjects - Each file contains single object, or line-delimited/concatenated multiple objects. When this option is chosen in an output dataset, copy activity produces a single JSON file with each object per line (line-delimited).
arrayOfObjects - Each file contains an array of objects.
The description talks about "each file", so initially I thought it would be possible, but now that I've tested them it seems that setOfObjects creates a line-delimited file, where each row is written to a new line. The arrayOfObjects setting creates a file containing a JSON array and adds each row as a new element of the array.
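For reference, the filePattern setting sits under the JSON format section of the blob dataset. A minimal sketch of such an output dataset (the linked service name and folder path here are placeholders, not from the original post):

{
    "name": "OutputJsonBlob",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "folderPath": "output/",
            "format": {
                "type": "JsonFormat",
                "filePattern": "setOfObjects"
            }
        }
    }
}

Either value still produces one file per copy run, which is why this setting alone doesn't give a file per row.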
I'm wondering if I'm missing a configuration somewhere, or is it just not possible?

What I did for now is load the rows into a SQL table and then use a Lookup activity to get an array to loop over in a ForEach activity. The ForEach activity writes each row to the blob store.
For Olga's DocumentDB question, it would look like this:
In the Lookup, you get a list of the document ids you want to copy:
You use that set in your ForEach activity.
Then you copy the files using a copy activity within the ForEach activity. You query a single document in your source:
And you can use the id to dynamically name your file in the sink (you'll have to define the parameter in your dataset too):
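A rough sketch of what that ForEach body could look like, assuming the Lookup is named LookupDocumentIds, the source rows expose a documentid column, and the sink dataset defines a fileName parameter (the activity and dataset names are illustrative, not taken from the original screenshots):

{
    "name": "ForEachDocument",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('LookupDocumentIds').output.value",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "CopySingleDocument",
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "DocumentDbCollectionSource",
                        "query": {
                            "value": "select * from c where c.id = '@{item().documentid}'",
                            "type": "Expression"
                        }
                    },
                    "sink": { "type": "BlobSink" }
                },
                "inputs": [
                    { "referenceName": "DocumentDbSource", "type": "DatasetReference" }
                ],
                "outputs": [
                    {
                        "referenceName": "BlobJsonSink",
                        "type": "DatasetReference",
                        "parameters": {
                            "fileName": {
                                "value": "@concat(item().documentid, '.json')",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ]
    }
}

The sink dataset would then reference that parameter in its file name property, e.g. @dataset().fileName.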

Related

Azure Data Factory - Passing lookup value/id into ForEach copy task not working

I'm attempting to pass the ID from one storage routine into another copy task, which requires a ForEach to process each ID. I've set up the Lookup ID task, which is working. It's passing these objects into my ForEach, in which the settings are "Sequential" with Items set to the following: @activity('LookupUID').output.value
In my ForEach, I have one activity that copies data from another API call to an Azure SQL Database. I have a linked service with a parameter that is being passed. I'm attempting to use dynamic content to pass the current item from the ForEach into this parameter, which is then sent to the API call as the ID parameter. When I manually plug in a value here, it works fine. However, passing the value from the ForEach into this copy task parameter doesn't produce a data row when running the task.
You must reference the column name along with the current item in the copy activity, like @item().ID
Example:
I have a lookup activity to get the IDs from a source. Below is the output of the Lookup activity with a list of IDs.
Lookup Output:
I am looping these IDs in the ForEach activity and passing the current item to a variable.
ForEach activity setting: Items - @activity('Lookup1').output.value
I have a string variable to which I am passing the current item, as below, using a Set Variable activity.
@string(item().ID)
Output:
Use this expression and replace columnname with your column name:
@activity('Lookup1').output.firstRow.columnname
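Putting the main suggestion together, here is a hedged sketch of the ForEach passing the current ID into a dataset parameter; the dataset names and the parameter name Id are assumptions for illustration:

{
    "name": "ForEach1",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": true,
        "items": {
            "value": "@activity('LookupUID').output.value",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "CopyById",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "ApiSourceDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "Id": {
                                "value": "@item().ID",
                                "type": "Expression"
                            }
                        }
                    }
                ],
                "outputs": [
                    { "referenceName": "AzureSqlSink", "type": "DatasetReference" }
                ]
            }
        ]
    }
}

The key point is the @item().ID reference: passing @item() on its own hands the whole row object to the parameter, which is why no data row comes back.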

Map nested JSON (Mongo ATLAS) to SQL [Azure Data Factory]

I want to map nested JSON to a SQL table (Microsoft SSMS).
The source is a MongoDB Atlas dataset and
the sink is an Azure SQL Database Managed Instance dataset.
I am able to map parentArray using a collection reference,
but I am not able to select the children under it.
Also, the child arrays are scalar arrays (they don't have any keys).
Note: I tried the "Map complex values to string" option,
but it puts the values in the column cell like ["ABC", "PQR"], which I don't want.
Is there any way to map it?
Expected output for table childarray2:
Currently in ADF, the Copy activity supports mapping of arrays for only one level.
There is no way to map nested arrays.
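For reference, this is roughly what a supported single-level mapping looks like in the copy activity, with the collection reference pointing at parentArray (the field names other than parentArray are hypothetical):

"translator": {
    "type": "TabularTranslator",
    "collectionReference": "$['parentArray']",
    "mappings": [
        {
            "source": { "path": "$['name']" },
            "sink": { "name": "name" }
        },
        {
            "source": { "path": "['childfield']" },
            "sink": { "name": "childfield" }
        }
    ]
}

Paths inside the collection reference are relative to parentArray, so the nested childarray2 cannot be reached this way, which is the limitation described above.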
For this I had to use Data Flows.
The limitation was that we cannot use MongoDB/Mongo Atlas as an input source in a Data Flow, so the workaround was:
Convert Mongo to Azure Blob JSON (Copy activity)
Use the Azure Blob JSON files as the input source and the SQL tables as the sink
Note: you can select the option to delete your blob files afterwards, to save storage space
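Chained together in one pipeline, the workaround might look roughly like this (the dataset, data flow, and source/sink type names are illustrative and should be checked against your environment):

"activities": [
    {
        "name": "MongoToBlobJson",
        "type": "Copy",
        "typeProperties": {
            "source": { "type": "MongoDbAtlasSource" },
            "sink": { "type": "JsonSink" }
        },
        "inputs": [
            { "referenceName": "MongoAtlasCollection", "type": "DatasetReference" }
        ],
        "outputs": [
            { "referenceName": "StagingBlobJson", "type": "DatasetReference" }
        ]
    },
    {
        "name": "FlattenJsonToSql",
        "type": "ExecuteDataFlow",
        "dependsOn": [
            { "activity": "MongoToBlobJson", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "dataflow": {
                "referenceName": "FlattenChildArrays",
                "type": "DataFlowReference"
            }
        }
    }
]

The data flow itself would then read the staged blob JSON, flatten the child arrays, and write to the SQL sink.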

How to read csv file data line by line in Azure Data Factory and store it in a variable

I want to read a CSV file line by line and store each value in a variable so that I can pass it to a ForEach activity in Azure Data Factory.
So I want to read the records line by line, store each record in a variable, pass them into the ForEach activity one by one, and generate new data based on these records.
How can we achieve this?
You can follow these steps:
Use a Lookup activity to get the data from the CSV file.
Use a ForEach activity to iterate over the CSV rows.
In the ForEach activity, set the row value to a variable.
Build your activities after the variable, for example:
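A minimal pipeline sketch of those steps (the dataset name, the variable name currentRow, and the activity names are placeholders; the variable must also be declared on the pipeline):

"activities": [
    {
        "name": "LookupCsvRows",
        "type": "Lookup",
        "typeProperties": {
            "source": { "type": "DelimitedTextSource" },
            "dataset": { "referenceName": "CsvDataset", "type": "DatasetReference" },
            "firstRowOnly": false
        }
    },
    {
        "name": "ForEachRow",
        "type": "ForEach",
        "dependsOn": [
            { "activity": "LookupCsvRows", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('LookupCsvRows').output.value",
                "type": "Expression"
            },
            "activities": [
                {
                    "name": "SetRowValue",
                    "type": "SetVariable",
                    "typeProperties": {
                        "variableName": "currentRow",
                        "value": {
                            "value": "@string(item())",
                            "type": "Expression"
                        }
                    }
                }
            ]
        }
    }
]

Each iteration then has the current CSV row available as @variables('currentRow') for the activities that follow.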
If you have any other concerns, please feel free to let me know.
HTH.

Dynamically refer to Json value in Data Factory copy

I have an ADF CopyRestToADLS activity which correctly saves a complex JSON object to Data Lake storage. But I additionally need to pass one of the JSON values (myextravalue) to a stored procedure. I tried referencing it in the stored procedure parameter as @{activity('CopyRESTtoADLS').output.myextravalue} but I am getting the error
The actions CopyRestToADLS referenced by 'inputs' in the action Execute Stored procedure1 are not defined in the template
{
"items": [1000 items],
"count": 1000,
"myextravalue": 15983444
}
I would like to try to dynamically reference this value because the CopyRestToADLS source REST dataset dynamically calls different REST endpoints, so the structure of the JSON object is different each time. But myextravalue is always present in each JSON response.
How is it possible to reference myextravalue and use it as a parameter?
Rich750
You could create another Lookup activity on the REST data source to get the JSON value, then pass it to the Stored Procedure activity.
Yes, it will create a new REST request, but it seems to be an easy way to achieve your purpose. The Lookup activity just reads the content of the source and doesn't save it anywhere.
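A sketch of that first suggestion, assuming a Lookup named LookupExtraValue against the same REST dataset (the activity, procedure, and linked service names are made up for illustration):

{
    "name": "StoreMyExtraValue",
    "type": "SqlServerStoredProcedure",
    "dependsOn": [
        { "activity": "LookupExtraValue", "dependencyConditions": [ "Succeeded" ] }
    ],
    "linkedServiceName": {
        "referenceName": "AzureSqlLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "storedProcedureName": "[dbo].[SaveExtraValue]",
        "storedProcedureParameters": {
            "myextravalue": {
                "value": {
                    "value": "@activity('LookupExtraValue').output.firstRow.myextravalue",
                    "type": "Expression"
                },
                "type": "Int64"
            }
        }
    }
}

If the lookup on the REST source returns the whole JSON object as its first row, firstRow.myextravalue picks out just that field.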
Another solution may be to get the value from the Copy activity's output file after the Copy activity has completed.
I'm glad you solved it this way:
"I created a Data Flow to read from the folder where Copy Activity saves dynamically named output json filenames. After importing schema from sample file, I selected the myextravalue as the only mapping in the Sink Mapping section."

Beanshell script to use data from CSV

I have dynamically created parameters using a Regular Expression Extractor and a BeanShell script (given below). I am creating parameters with Name = "pass_" + i.
Now I need to populate the values of these parameter fields from a CSV file. I have loaded a CSV file and the login variable contains the value of the first row. The code below populates only the first value from the CSV file. I need the code to iterate through the CSV file and populate the parameter fields with the subsequent values in the first column.
// Number of matches produced by the regular expression extractor
int count = Integer.parseInt(vars.get("pass_matchNr"));
for (int i = 1; i <= count; i++) { // regex counts are 1-based
    // Every parameter gets the same value: "login" only ever holds the first CSV row
    sampler.addArgument(vars.get("pass_" + i), vars.get("login"));
}
Try using a CSV Data Set Config element. You point it to the path of your CSV and can then reference each CSV column in a JMeter variable with ease. With each iteration, your JMeter variable will hold the value of the next row in your CSV. From there you can use vars.get("yourVar"); to feed this JMeter variable into your BeanShell script.
Alternatively, if you need the population from the CSV to be done in one pass, an option could be to use the CSV Data Set Config element and set up your first column and row to be a concatenation of all the values found in the CSV, for example 'ValueA,ValueB,ValueC'. You can then feed this variable into your JMeter script and parse it in BeanShell by splitting on ','. That will leave you with all the values found in your CSV, as in the sketch below.
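A small BeanShell sketch of that second option, assuming the CSV Data Set Config exposes the concatenated row as a JMeter variable named "logins" (that variable name is made up for illustration):

// Concatenated first row from the CSV Data Set Config, e.g. "userA,userB,userC"
String[] logins = vars.get("logins").split(",");

int count = Integer.parseInt(vars.get("pass_matchNr"));
for (int i = 1; i <= count && i <= logins.length; i++) {
    // Pair each extracted parameter name with the next value from the CSV row
    sampler.addArgument(vars.get("pass_" + i), logins[i - 1]);
}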
If these two options are unsuitable, a final option would be to create your own custom Java method which you can then feed into your BeanShell script. For example, you could create a class which reads your CSV file and returns a string in the format you desire. For a detailed step-by-step guide on setting up custom functions in JMeter, refer to this article.