Azure Data Factory v2 Data Transformation - json

I am new to Azure Data Factory. I have a requirement to move data from an on-premises Oracle database and an on-premises SQL Server to Blob storage. The data needs to be transformed into JSON format, with each row as one JSON file, and then moved to an Event Hub. How can I achieve this? Any suggestions?

You could use a Lookup activity plus a ForEach activity, with a Copy activity inside the ForEach. Please reference this post: How to copy CosmosDb docs to Blob storage (each doc in single json file) with Azure Data Factory.
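As a rough sketch of that pattern, assuming a hypothetical dbo.Orders source table: the Lookup activity runs a query that returns the keys to iterate over, and the ForEach then runs one Copy activity per key.

-- Hypothetical table; the Lookup activity returns the list of row keys
-- that the ForEach iterates over (one Copy activity per key).
SELECT Id
FROM dbo.Orders;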

The Copy Data tool that comes with Azure Data Factory is an option for copying on-premises data to Azure.
It provides a configuration wizard where you do all the required steps, such as configuring the source, the sink, the integration runtime, the pipeline, and so on.
In the source, you need to write a custom query that fetches the data from the required tables in JSON format.
For SQL Server, you can use FOR JSON AUTO to convert the rows to JSON (and OPENJSON to go the other way and parse JSON back into rows); this is supported from SQL Server 2016 onwards. For older versions you need to explore the options available; worst case, you can write a simple console app in C#/Java that fetches the rows, converts them to JSON files, and uploads the files to Azure Blob storage. If this is a one-time activity, that option should work and you may not need a data factory at all.
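For illustration, a source query could look like the following minimal sketch; the dbo.Orders table and its columns are hypothetical:

-- Hypothetical table and columns, for illustration only.
-- FOR JSON AUTO turns the whole result set into a single JSON array:
SELECT Id, CustomerName, OrderDate
FROM dbo.Orders
FOR JSON AUTO;

-- To get one JSON document per row (e.g. per iteration of a ForEach),
-- filter to a single row and drop the surrounding array brackets:
SELECT Id, CustomerName, OrderDate
FROM dbo.Orders
WHERE Id = 42   -- in practice, the key supplied for each ForEach iteration
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;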
In the case of Oracle, you can use the JSON_OBJECT function.
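A comparable sketch for Oracle, again with a hypothetical orders table:

-- Hypothetical table and columns, for illustration only.
-- JSON_OBJECT builds one JSON document per row.
SELECT JSON_OBJECT(
         'id'           VALUE o.id,
         'customerName' VALUE o.customer_name,
         'orderDate'    VALUE o.order_date
       ) AS row_as_json
FROM orders o;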

Related

Azure Logic Apps: how to combine multiple JSON files in Blob storage

I have a logic app that calls an API daily and saves the output to a .json file within a blob storage container.
These JSON files are then picked up by Power BI for reporting purposes.
The number of files is growing quickly, and I want to see if it's possible to have just one JSON file that gets appended with the new data each day.
Power BI can then just connect to one file.
Use the Update blob action to append the data to the blob.
The workflow uses a Get blob content (V2) action to read the existing file and a Compose action that holds the new API response; the Update blob action then writes the combined content.
In the Update blob action, set Blob Content to this concat expression, which appends your API response to the existing blob content:
concat(body('Get_blob_content_(V2)'),outputs('Compose'))

Can you get the full content of a JSON file inside blob storage into an ADF variable? If so, how?

Inside ADF I'm trying to get the ready-made contents of a query for a GraphQL API (Web activity block) that is stored in a JSON file somewhere in the blob storage. Because of speed requirements, we can't afford to just spin up Databricks every single time.
What can be done to get the content (not the metadata) of a JSON file and store it inside an ADF variable that would parameterize further pipeline blocks? The path to the file is known and fixed, and the file is accessible via a linked service.
I would go with creating a metadata Azure SQL database (the Basic tier costs only about 5 USD per month). It can be connected to Azure Data Factory via a private link. This is the simplest and fastest way: you just save the data there and later fill data flow and other parameters with results from that database.
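As a rough sketch of that idea (all table and column names here are hypothetical), the database holds the JSON/query content as text, and a Lookup activity in the pipeline reads it back to set a variable or parameter:

-- Hypothetical metadata table; names are illustrative only.
CREATE TABLE dbo.PipelineMetadata (
    MetadataKey  NVARCHAR(100) NOT NULL PRIMARY KEY,
    ContentJson  NVARCHAR(MAX) NOT NULL,  -- the GraphQL query / JSON body
    LastUpdated  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Query a Lookup activity could run; its output is then assigned to a
-- pipeline variable and passed on to the Web activity.
SELECT ContentJson
FROM dbo.PipelineMetadata
WHERE MetadataKey = 'graphql-query';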

How to set the path of a CSV file that is in a storage account in an Azure Data Factory pipeline

I have created an SSIS package that reads from a CSV file (using the Flat File connection manager) and loads the records into a database. I have deployed it in an Azure Data Factory pipeline and I need to give the path of the CSV file as a parameter. I have created an Azure storage account and uploaded the source file there.
Can I just give the URL of the source file as the import file in the SSIS package settings? I tried it, but it currently throws error 2906. I am new to Azure, so I'd appreciate any help here.
First, you said Excel and then you said CSV. Those are two different formats. But since you mention the flat file connection manager, I'm going to assume you meant CSV. If not, let me know and I'll update my answer.
I think you will need to install the SSIS Feature Pack for Azure and use the Azure Storage Connection Manager. You can then use the Azure Blob Source in your data flow task (it supports CSV files). When you add the blob source, the GUI should help you create the new connection manager. There is a tutorial on MS SQL Tips that shows each step. It's a couple of years old, but I don't think much has changed.
As a side thought, is there a reason you chose to use SSIS over native ADF V2? It does a nice job of copying data from blob storage to a database.

Convert JSON to CSV in Azure Cosmos DB or Azure Blob

I need to move JSON data in Azure Cosmos DB to Azure Blob and eventually need data to be in CSV format.
I found out there's a feature that converts CSV data to JSON, but I can't find one for the other way around.
It really doesn't matter where I convert the data from JSON to CSV, whether in Azure Cosmos DB or in Azure Blob. How could I do this? Thanks!
Based on your requirements, I think Azure Data Factory is your best option.
You could follow this tutorial to configure a Cosmos DB input (source) dataset and an Azure Blob Storage output (sink) dataset.
Then use a copy pipeline to process the data.
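For illustration, the copy activity's Cosmos DB source can use a query that projects the document properties into flat columns so they map cleanly onto the CSV columns; the property names here are hypothetical:

-- Hypothetical property names; projecting the JSON documents into flat
-- columns so the copy activity can write them out as CSV columns.
SELECT c.id, c.customerName, c.orderDate
FROM c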
Hope it helps you.

Using blob as a JSON data feed

As Azure Blob storage is a very cheap data storage solution, would it make sense to store JSON data in blobs so clients can access it like a data API? This way we don't need to spin up Web Apps/APIs to serve JSON data.
That could work, depending on the scenario. The blobs will be updated on the fly when you push a new version of the JSON files.
I demonstrated this a while ago with a simple app that uploads and updates a file. Clients could target the URL, and to them it seemed like they were accessing a JSON data feed that kept being updated.