Convert JSON to CSV in/for Azure

I have JSON files from Google Analytics and would like to have them in CSV so they can be used by Excel or Power BI.
I've stored them in Azure Blob Storage and DocumentDB. Is there a way to do this transformation in DocumentDB with the Query Explorer or the DocumentDB Data Migration Tool and a stored procedure? Or are there any other suggestions?

The DocumentDB database migration tool supports loading CSV data. Please see https://learn.microsoft.com/en-us/azure/documentdb/documentdb-import-data#a-idcsvaimport-csv-files---convert-csv-to-json

I've stored them in Azure Blob Storage and DocumentDB. Is there a way to do this transformation in DocumentDB with the Query Explorer or the DocumentDB Data Migration Tool and a stored procedure?
As far as I know, the DocumentDB Query Explorer, stored procedures, and user-defined functions cannot transform JSON to CSV. And the DocumentDB Data Migration Tool can only export your source to a JSON file; you could refer to this tutorial.
I have JSON files from Google Analytics and would like to have them in CSV so they can be used by Excel or Power BI.
Based on my understanding, you could follow this tutorial to get data from Azure DocumentDB into Power BI. For transforming JSON to CSV, I assume you could write your own transformation logic in your development language of choice. Or you could leverage Azure Data Factory to move your JSON files from Blob Storage or DocumentDB to Azure Table Storage, and then use Microsoft Azure Storage Explorer to manage your table and export your records to CSV.
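If you do write your own transformation logic, a minimal Python sketch could look like the one below. It assumes the Google Analytics export is a JSON array of flat records; the file names are placeholders, and nested fields would need to be flattened first.

    import csv
    import json

    # Load the exported records (assumes a JSON array of flat objects).
    with open("ga_export.json", encoding="utf-8") as f:
        records = json.load(f)

    # Collect every key that appears in any record to build the CSV header.
    fieldnames = sorted({key for record in records for key in record})

    # Write the records out as CSV for Excel / Power BI.
    with open("ga_export.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)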

Related

Insert JSON data into SQL DB using Airflow/python

I extracted data from an API using Airflow.
The data is extracted from the API and saved on cloud storage in JSON format.
The next step is to insert the data into an SQL DB.
I have a few questions:
Should I do it in Airflow or using another ETL tool like AWS Glue/Azure Data Factory?
How do I insert the data into the SQL DB? I googled "how to insert data into SQL DB using python" and found a solution that loops over all the JSON records and inserts the data one record at a time.
That is not very efficient. Is there another way I can do it?
Any other recommendations and best practices on how to insert the JSON data into the SQL server?
I haven't decided on a specific DB so far, so feel free to pick the one you think fits best.
thank you!
You can use Airflow just as a scheduler to run some Python/bash scripts at defined times with some dependency rules, but you can also take advantage of the operators and hooks provided by the Airflow community.
For the ETL part, Airflow isn't an ETL tool. If you need some ETL pipelines, you can run and manage them using Airflow, but you need an ETL service/tool to create them (Spark, Athena, Glue, ...).
To insert data into the DB, you can write your own Python/bash script and run it, or use the existing operators. There are generic operators and hooks for the different databases (MySQL, Postgres, Oracle, MSSQL), and there are optimized operators and hooks for each cloud service (AWS RDS, GCP Cloud SQL, GCP Spanner, ...). If you want to use one of the managed/serverless services, I recommend using its operators; if you deploy your database on a VM or a K8s cluster, you need to use the generic ones.
Airflow supports almost all the popular cloud services, so try to choose your cloud provider based on cost, performance, team knowledge, and the other needs of your project, and you will surely find a good way to achieve your goal with Airflow.
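As a rough illustration of the generic-hook route, here is a minimal sketch of a task that loads a JSON file and inserts the records in batches with Airflow's PostgresHook. It assumes Airflow 2.4+ with the Postgres provider installed, a connection named my_db, and a table api_data with the columns shown; all of those names are placeholders. For very large volumes, a bulk-load command (such as Postgres COPY) is usually faster than row inserts.

    import json

    import pendulum
    from airflow.decorators import dag, task
    from airflow.providers.postgres.hooks.postgres import PostgresHook


    @dag(schedule=None, start_date=pendulum.datetime(2023, 1, 1), catchup=False)
    def load_json_to_db():
        @task
        def insert_records(path: str = "/tmp/api_export.json"):
            # The extracted file is assumed to be a JSON array of flat records.
            with open(path, encoding="utf-8") as f:
                records = json.load(f)

            hook = PostgresHook(postgres_conn_id="my_db")
            # insert_rows batches the commits for you instead of the
            # one-INSERT-plus-commit-per-record pattern from the question.
            hook.insert_rows(
                table="api_data",
                rows=[(r["id"], r["name"], r["value"]) for r in records],
                target_fields=["id", "name", "value"],
                commit_every=1000,
            )

        insert_records()


    load_json_to_db()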
You can use Azure Data Factory or Azure Synapse Analytics to move the data in the JSON file to SQL Server. Azure Data Factory supports 90+ connectors as of now. (Refer to the MS doc Connector overview - Azure Data Factory & Azure Synapse for more details about the connectors supported by Data Factory.)
Image 1: some of the connectors supported by ADF.
Refer to the MS docs on the prerequisites and required permissions to connect Google Cloud Storage with ADF.
Use Google Cloud Storage as the source connector in the Copy activity. Reference: Copy data from Google Cloud Storage - Azure Data Factory & Azure Synapse | Microsoft Learn.
Use the SQL DB connector for the sink.
ADF supports an Auto create table option when no table exists yet in the Azure SQL database. You can also map the source and sink columns in the mapping settings.

Querying Azure Synapse Analytics external table based on CSV

I created external table in Azure Synapse Analytics Serverless.
The File Format is CSV and it points to a Data Lake Gen 2 folder with multiple CSV files which hold the actual data. The CSV files are being updated from time to time.
I would like to anticipate the potential problems that may arise when a user executes a long-running query against the external table at the moment the underlying CSV files are being updated.
Will the query fail or maybe the result set will contain dirty data / inconsistent results?
As such, there is no issue when connecting a Synapse serverless pool with Azure Data Lake. Synapse is very much able to query, transform, and analyze data stored in the data lake.
Microsoft provides a well-explained troubleshooting document in case of any error. Please refer to Troubleshoot Azure Synapse Analytics.
Synapse SQL serverless allows you to control what the behavior will be. If you want to avoid query failures due to constantly appended files, you can use the ALLOW_INCONSISTENT_READS option.
You can see the details here:
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-single-csv-file#querying-appendable-files
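For reference, this is roughly what the option looks like when querying the files ad hoc with OPENROWSET. The sketch below runs the query from Python via pyodbc against the serverless SQL endpoint; the server, credentials, and storage URL are placeholders, and for external tables the linked article describes the equivalent table-level option. The trade-off is that the query keeps running while files are appended, but it may read partially written rows.

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<workspace>-ondemand.sql.azuresynapse.net;"
        "DATABASE=<database>;UID=<user>;PWD=<password>;Encrypt=yes;"
    )

    # ALLOW_INCONSISTENT_READS tells the serverless pool not to fail queries on
    # files that are being appended to while the query runs.
    sql = """
    SELECT COUNT_BIG(*) AS row_count
    FROM OPENROWSET(
        BULK 'https://<account>.dfs.core.windows.net/<container>/data/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        ROWSET_OPTIONS = '{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
    ) AS appended_csv;
    """

    print(conn.cursor().execute(sql).fetchone().row_count)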

How to import a JSON file to Cloud Firestore

I developed a web app on a MySQL database and now I am switching to Android mobile development, but I have a large amount of data to be exported into Firebase's Cloud Firestore. I could not find a way to do so; I have the MySQL data stored in JSON and CSV.
Do I have to write a script? If yes then can you share the script or is there some sort of tool?
I have a large amount of data to be exported into Firebase's Cloud Firestore; I could not find a way to do so
If you're looking for a "magic" button that can convert your data from a MySQL database to a Cloud Firestore database, please note that there isn't one.
Do I have to write a script?
Yes, you have to write code in order to convert your MySQL database into a Cloud Firestore database. Please note that the two types of databases are built on different concepts. For instance, a Cloud Firestore database is composed of collections and documents. There are no tables in the NoSQL world.
So, I suggest you read the official documentation regarding Get started with Cloud Firestore.
If yes then can you share the script or is there some sort of tool.
There is no script and no tool for that; you'll have to create your own mechanism.
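For the JSON export you already have, a minimal sketch of such a mechanism in Python could use the firebase-admin SDK with batched writes. It assumes a flat JSON array and a service-account key file; the collection name, document ID field, and file names are placeholders.

    import json

    import firebase_admin
    from firebase_admin import credentials, firestore

    # Authenticate with a service-account key downloaded from the Firebase console.
    cred = credentials.Certificate("serviceAccountKey.json")
    firebase_admin.initialize_app(cred)
    db = firestore.client()

    with open("mysql_export.json", encoding="utf-8") as f:
        rows = json.load(f)

    # Firestore batches are limited to 500 writes, so chunk the rows.
    for start in range(0, len(rows), 500):
        batch = db.batch()
        for row in rows[start:start + 500]:
            doc_ref = db.collection("users").document(str(row["id"]))
            batch.set(doc_ref, row)
        batch.commit()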

Connect Azure Blob Storage JSON data using SSMS

I need to store Azure Blob Storage JSON data inside a SQL Server database. I am searching for a step-by-step guide for that.
Your requirement is very common; here are two main ways you can implement it.
1. Azure Data Factory.
You could use the Copy activity in ADF to transfer data from Azure Blob Storage into SQL DB. There are very detailed guides in this link.
2. Bulk insert with T-SQL.
You could use T-SQL to load files stored in Azure Blob Storage into SQL DB. Please refer to this link and this guide.
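To give a feel for option 2, here is a minimal sketch of the T-SQL pattern (an external data source over the container, OPENROWSET to read the file, OPENJSON to shred it), wrapped in Python/pyodbc so it can be scripted; the same statements can also be run directly in SSMS. The SAS token, URLs, table name, and JSON fields are placeholders, and a database master key must already exist before creating the credential.

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<server>.database.windows.net;DATABASE=<db>;"
        "UID=<user>;PWD=<password>;Encrypt=yes;",
        autocommit=True,
    )

    # One-time setup: a SAS credential (without the leading '?') and an external
    # data source pointing at the blob container.
    conn.execute("""
    CREATE DATABASE SCOPED CREDENTIAL BlobCred
        WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '<sas-token>';
    """)
    conn.execute("""
    CREATE EXTERNAL DATA SOURCE MyBlobStorage
        WITH (TYPE = BLOB_STORAGE,
              LOCATION = 'https://<account>.blob.core.windows.net/<container>',
              CREDENTIAL = BlobCred);
    """)

    # Read the JSON file as a single CLOB, shred it with OPENJSON, and insert the rows.
    conn.execute("""
    INSERT INTO dbo.Sessions (SessionDate, Sessions)
    SELECT SessionDate, Sessions
    FROM OPENROWSET(BULK 'exports/data.json',
                    DATA_SOURCE = 'MyBlobStorage', SINGLE_CLOB) AS blob
    CROSS APPLY OPENJSON(blob.BulkColumn)
    WITH (SessionDate date '$.date', Sessions int '$.sessions');
    """)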

What is the most efficient way to export data from Azure Mysql?

I have searched high and low, but it seems like mysqldump and "SELECT ... INTO OUTFILE" are both intentionally blocked by not allowing file permissions to the DB admin. Wouldn't it save a lot more server resources to allow file permissions than to disallow them? Every other import/export method I can find executes much slower, especially with tables that have millions of rows. Does anyone know a better way? I find it hard to believe Azure left no good way to do this common task.
You did not list the other options you found to be slow, but have you thought about using Azure Data Factory:
Use Data Factory, a cloud data integration service, to compose data storage, movement, and processing services into automated data pipelines.
It supports exporting data from Azure MySQL and MySQL:
You can copy data from MySQL database to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see Supported data stores and formats
Azure Data Factory allows you to define mappings (optional!), and / or transform the data as needed. It has a pay per use pricing model.
You can start an export manually or on a schedule using the .NET or Python SDK, the REST API, or PowerShell.
It seems you are looking to export the data to a file, so Azure Blob Storage or Azure Files are likely to be a good destination. FTP or the local file system are also possible.
"SELECT INTO ... OUTFILE" we can achieve this using mysqlworkbench
1. Select the table.
2. Open the Table Data Export Wizard.
3. Export the data as CSV or JSON.
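If you prefer a script over the wizard, a minimal sketch with mysql-connector-python that streams a table out to CSV could look like this; the host, credentials, and table name are placeholders (Azure Database for MySQL expects an SSL connection by default).

    import csv

    import mysql.connector

    conn = mysql.connector.connect(
        host="<server>.mysql.database.azure.com",
        user="<user>",
        password="<password>",
        database="<database>",
    )

    cursor = conn.cursor()
    cursor.execute("SELECT * FROM my_table")

    # Write rows to CSV as they are fetched.
    with open("my_table.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(cursor.column_names)  # header row
        for row in cursor:
            writer.writerow(row)

    cursor.close()
    conn.close()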