Copy Data from MySQL (on-premises) to Cosmos DB

I have several questions, as follows:
I was wondering how I could transfer data from MySQL to Cosmos DB using either Python or Azure Data Factory, or anything else.
If I understand correctly, each row from the table will be transformed into a document. Is that correct?
Is there any way to combine more than one row into a single document during the copy activity?
If the data in MySQL changes, will the copied data in Cosmos DB be updated automatically? If not, how can I set up such a trigger?
I realize some of these may be simple questions, but I'm new to this, so please bear with me.

1. I was wondering how I could transfer data from MySQL to Cosmos DB using either Python or Azure Data Factory, or anything else.
Yes, you can transfer data from MySQL to Cosmos DB by using the Azure Data Factory Copy Activity.
2. If I understand correctly, each row from the table will be transformed into a document. Is that correct?
Yes.
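For illustration only, here is a minimal Python sketch (outside Data Factory) of that one-row-to-one-document mapping; the connection details, table, and column names are all placeholders:

    # Minimal sketch of the row -> document mapping (placeholders throughout).
    import mysql.connector                     # pip install mysql-connector-python
    from azure.cosmos import CosmosClient      # pip install azure-cosmos

    mysql_conn = mysql.connector.connect(
        host="onprem-mysql.example.com", user="reader",
        password="...", database="sales")
    cursor = mysql_conn.cursor(dictionary=True)   # rows come back as dicts

    cosmos = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
    container = cosmos.get_database_client("salesdb").get_container_client("orders")

    cursor.execute("SELECT order_id, customer, amount FROM orders")
    for row in cursor:
        row["id"] = str(row["order_id"])   # Cosmos DB documents need a string 'id'
        container.upsert_item(row)         # one MySQL row -> one Cosmos DB document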
3. Is there any way to combine more than one row into a single document during the copy activity?
If you want to merge multiple rows into one document, the Copy Activity probably can't be used directly. You could implement your own logic (e.g. Python code) behind an Azure Functions HTTP trigger.
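For example, the merging logic inside such a function could look roughly like this (continuing the hypothetical cursor and container from the earlier sketch; the table and column names are made up):

    # Hypothetical example: merge all order-line rows with the same order_id
    # into one Cosmos DB document with a nested "lines" array.
    # Assumes `cursor` (MySQL) and `container` (Cosmos DB) are set up as in the
    # earlier sketch.
    from collections import defaultdict

    cursor.execute("SELECT order_id, product, qty FROM order_lines ORDER BY order_id")
    docs = defaultdict(lambda: {"lines": []})
    for row in cursor:
        doc = docs[row["order_id"]]
        doc["id"] = str(row["order_id"])
        doc["lines"].append({"product": row["product"], "qty": row["qty"]})

    for doc in docs.values():
        container.upsert_item(doc)   # many MySQL rows -> one Cosmos DB document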
4. If the data in MySQL changes, will the copied data in Cosmos DB be updated automatically? If not, how can I set up such a trigger?
If you can tolerate a delayed sync, you can sync the data between MySQL and Cosmos DB on a schedule using the Copy Activity. If you need a more timely sync: as far as I know, Azure Functions does not support a SQL Server trigger binding out of the box, but you may find some solutions in this document:
Defining Custom Binding in Azure Functions
If a binding on the Azure Functions side is not possible, a SQL trigger can invoke an Azure Functions HTTP trigger instead.
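A rough sketch of what that HTTP-triggered function might look like in Python; the Cosmos DB account, database, container, environment variable names, and payload shape are all assumptions:

    # __init__.py of an HTTP-triggered Azure Function (Python).
    # A database-side trigger (or any process that detects changes) would POST
    # the changed row as JSON to this endpoint; names below are hypothetical.
    import os
    import azure.functions as func
    from azure.cosmos import CosmosClient

    cosmos = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    container = cosmos.get_database_client("salesdb").get_container_client("orders")

    def main(req: func.HttpRequest) -> func.HttpResponse:
        row = req.get_json()                 # e.g. {"order_id": 42, "customer": "...", ...}
        row["id"] = str(row["order_id"])     # Cosmos DB documents need a string id
        container.upsert_item(row)           # keeps the document in sync with MySQL
        return func.HttpResponse("synced", status_code=200)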

Related

Querying Azure Synapse Analytics external table based on CSV

I created an external table in Azure Synapse Analytics Serverless.
The file format is CSV, and it points to a Data Lake Gen 2 folder with multiple CSV files that hold the actual data. The CSV files are updated from time to time.
I would like to foresee the potential problems that may arise when a user executes a long-running query against the external table at the moment the underlying CSV files are being updated.
Will the query fail or maybe the result set will contain dirty data / inconsistent results?
In general there is no issue when connecting a Synapse serverless pool to Azure Data Lake; Synapse is well suited to querying, transforming, and analyzing data stored in the data lake.
Microsoft provides a well-explained troubleshooting document in case of any errors; please refer to Troubleshoot Azure Synapse Analytics.
Synapse SQL serverless allows you to control what the behavior will be. If you want to avoid query failures caused by files that are constantly being appended to, you can use the ALLOW_INCONSISTENT_READS option.
You can see the details here:
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-single-csv-file#querying-appendable-files
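Based on that page, the option is passed as a read option on the rowset. A rough sketch of querying the appendable CSVs from Python with pyodbc follows; the server name, storage path, and column layout are placeholders, and the exact OPENROWSET arguments should be checked against the linked doc:

    # Rough sketch: query the appendable CSVs with ALLOW_INCONSISTENT_READS.
    # Server, credentials and storage path are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"
        "DATABASE=mydb;UID=sqladminuser;PWD=...;Encrypt=yes;"
    )

    sql = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mylake.dfs.core.windows.net/data/csvfolder/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE,
        ROWSET_OPTIONS = '{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
    ) AS rows
    """

    for record in conn.cursor().execute(sql):
        print(record)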

BigQuery to GCS JSON

I want to be able to store BigQuery results as JSON files in Google Cloud Storage. I could not find an out-of-the-box way of doing this, so what I had to do was:
1. Run the query against BigQuery and store the results in a permanent table. I use a random GUID to name the permanent table.
2. Read the data from BigQuery, convert it to JSON in my server-side code, and upload the JSON data to GCS.
3. Delete the permanent table.
4. Return the URL of the JSON file in GCS to the front-end application.
While this works there are some issues with this.
A. I do not believe I am making use of BigQuery's caching, since I am using my own permanent tables. Can someone confirm this?
B. Step 2 will be a performance bottleneck. Pulling data out of GCP just to convert it to JSON and re-upload it into GCP feels wrong. A better approach would be to use some cloud-native serverless function or some other GCP data workflow service to do this step, triggered upon creation of a new table in the dataset. What do you think is the best way to achieve this step?
C. Is there really no way to do this without using permanent tables?
Any help appreciated. Thanks.
With a permanent table, you are able to leverage BigQuery data export to export the table in JSON format to GCS. This has no cost, compared with reading the table from your server-side code.
There is actually a way to avoid creating a permanent table, because every query result is already stored in a temporary table. If you go to "Job Information" you can find the full name of that temp table, which can be used in a data export job to export it as JSON to GCS. However, this is more complicated than just creating a permanent table and deleting it afterwards.
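A rough sketch of that flow with the google-cloud-bigquery Python client, using the job's temporary destination table; the project, dataset, bucket, and query are placeholders, and exporting the anonymous results table can hit limitations that a named permanent table avoids:

    # Sketch: run the query, grab the job's (temporary) destination table,
    # and export it to GCS as newline-delimited JSON. Names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()

    query_job = client.query("SELECT name, total FROM `my_project.my_dataset.my_table`")
    query_job.result()                      # wait for the query to finish
    temp_table = query_job.destination      # the anonymous table holding the results

    extract_job = client.extract_table(
        temp_table,
        "gs://my-bucket/results/output-*.json",
        job_config=bigquery.ExtractJobConfig(
            destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
        ),
    )
    extract_job.result()                    # wait for the export to finish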

What is the most efficient way to export data from Azure Mysql?

I have searched high and low, but it seems like mysqldump and "SELECT ... INTO OUTFILE" are both intentionally blocked by not granting file permissions to the DB admin. Wouldn't it save a lot more server resources to allow file permissions than to disallow them? Every other import/export method I can find executes much slower, especially with tables that have millions of rows. Does anyone know a better way? I find it hard to believe Azure left no good way to do this common task.
You did not list the other options you found to be slow, but have you thought about using Azure Data Factory:
Use Data Factory, a cloud data integration service, to compose data storage, movement, and processing services into automated data pipelines.
It supports exporting data from Azure MySQL and MySQL:
You can copy data from MySQL database to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see Supported data stores and formats
Azure Data Factory allows you to define mappings (optional!), and / or transform the data as needed. It has a pay per use pricing model.
You can start an export manually or on a schedule using the .NET or Python SDK, the REST API, or PowerShell.
It seems you are looking to export the data to a file, so Azure Blob Storage or Azure Files are likely to be a good destination. FTP or the local file system are also possible.
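As mentioned above, a run can be kicked off from the Python SDK; here is a hedged sketch of starting an existing pipeline (the subscription, resource group, factory, and pipeline names are placeholders):

    # Sketch: trigger an existing Data Factory pipeline (e.g. one whose copy
    # activity exports Azure MySQL to Blob Storage) from Python.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    run = adf.pipelines.create_run(
        resource_group_name="my-rg",
        factory_name="my-data-factory",
        pipeline_name="ExportMySqlToBlob",
    )
    print("Started pipeline run:", run.run_id)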
"SELECT INTO ... OUTFILE" we can achieve this using mysqlworkbench
1.Select the table
2.Table Data export wizard
3.export the data in the form of csv or Json
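And the scripted alternative: since plain SELECTs are not blocked, a small Python sketch can stream the rows to the client and write the file there (connection details and table name are placeholders):

    # Client-side export without OUTFILE: fetch in chunks and write NDJSON.
    import json
    import mysql.connector

    conn = mysql.connector.connect(
        host="myserver.mysql.database.azure.com",
        user="admin@myserver", password="...", database="mydb")
    cursor = conn.cursor(dictionary=True)
    cursor.execute("SELECT * FROM big_table")

    with open("big_table.json", "w", encoding="utf-8") as out:
        while True:
            rows = cursor.fetchmany(10_000)          # chunked fetch to limit memory
            if not rows:
                break
            for row in rows:
                out.write(json.dumps(row, default=str) + "\n")  # newline-delimited JSON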

Talend Data warehousing tool

Ques: I have two databases: one is the client's database (the live database) and the other is mine. I am using MySQL. I should not access the client's database directly, so I created my own database. Using the Talend data warehousing tool I created a job for each table, and by executing all the jobs I can get all the updated data from the client's (live) database into my database. Currently I have to execute these jobs manually to get the updated data into my database. My question is: is there any process that will automatically notify me when the client inserts or updates data in their database, so that I can run those jobs manually to pull the updated data into mine? Or, when the client updates a database table, can the associated job run automatically? Please help me with this.
You would need to set up a database trigger that somehow notifies the Talend job and runs it. To do this you'd typically call the job as a web service using a stored procedure or user defined function. This link shows a typical way that a web service may be called on an update trigger for example.
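Stripped down, the "call the job as a web service" step is just an HTTP request to the job's endpoint. Purely for illustration, here is what that call might look like from Python; the URL, credentials, and query string are entirely made up, and in practice the database trigger or UDF would issue the equivalent request:

    # Illustration only: invoking a Talend job that has been exposed as a web
    # service. Endpoint and credentials are hypothetical.
    import requests

    resp = requests.get(
        "http://talend-server:8080/MyTalendJob/services/MyTalendJob?method=runJob",
        auth=("talend_user", "secret"),
        timeout=30,
    )
    resp.raise_for_status()
    print("Job triggered, response:", resp.text)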
If your source data tables are large, rather than extracting all of the data from the table and then, presumably, dropping and recreating your table, you could use a tMysqlCDC component to process only the changed records. The built-in tutorial for the component pretty much covers a useful example of this in practice. If you are seeing regular changes in the source database, this could make your job much more performant.
If you have absolutely no access to your client's database then you could alternatively just run the job with a scheduler. The Enterprise versions of Talend come with the Talend Administration Console, which allows you to set CRON triggers for a job; these could easily be set to run every minute or at any other interval (but not in seconds). Alternatively you could use your operating system's scheduler to run the job at your desired intervals.
If you can't modify your client's database (i.e. add triggers), and there is no other way to identify changed records (i.e. some kind of audit table), then you're out of luck.

Scheduler program fetching data from MS Access database

I need some clarification. I'm planning to make a scheduler program that will fetch data from an MS Access database, and I also want it to upload the data to a web server (MySQL database) in JSON format.
For the first step, the fetching, I'll use the System.Data.OleDb namespace, which works well with MS Access.
For the second step, the uploading, I am planning to use the FTP protocol, and the data should be in JSON format.
I'm just confused about the second step: is the FTP protocol suitable for this? I need some tips and suggestions.
I will make an Android app to view all the data that will be saved on the web server.
As I understand it, you want to copy data from your local MS Access database to a remote MySQL database on a scheduled interval. Next, you want to write an Android app that consumes data from MySQL in JSON format.
It's been a while since I last used MS Access, but I would simply set up the MySQL destination table as a "linked table" in the MS Access database, then create a macro in the MS Access database to INSERT rows into that linked table. I think such macros can be scheduled.
Alternatively, create a simple VB.NET console application, that reads rows from your MS Access database (as you said, using OleDbConnection), and inserts the rows into the MySql database (MySqlConnection - download from mysql.com). Schedule that with Task Scheduler.
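The same idea sketched in Python rather than VB.NET, in case that is easier to schedule; the driver name, file path, tables, and credentials are assumptions:

    # Copy rows from MS Access to MySQL (placeholders throughout). Requires the
    # Microsoft Access ODBC driver and mysql-connector-python; schedule with
    # Task Scheduler or cron.
    import pyodbc
    import mysql.connector

    access = pyodbc.connect(
        r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\data\local.accdb")
    rows = access.cursor().execute("SELECT id, name, amount FROM Orders").fetchall()

    mysql_conn = mysql.connector.connect(
        host="webserver.example.com", user="app", password="...", database="appdb")
    cur = mysql_conn.cursor()
    cur.executemany(
        "REPLACE INTO orders (id, name, amount) VALUES (%s, %s, %s)",
        [tuple(r) for r in rows])            # REPLACE keeps re-runs idempotent
    mysql_conn.commit()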
Next, create a simple webpage (I'm guessing this is hosted as an ASP page somewhere) that reads data from MySQL and outputs it as JSON.
Hope this helps!
I suspect that FTP is the wrong tool for uploading data into a SQL database.
Why don't you simply connect to both the databases using regular connection strings and pull data from one and push to another?
I don't know why you brought JSON into the equation, but if you want to store your data in SQL Server in JSON format, you can use JSON.NET or the built-in .NET MVC JavaScriptSerializer to do the job and store the resulting JSON in a regular SQL Server table.
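If the serialization ends up happening in Python rather than .NET, the standard library json module is enough; a tiny sketch with made-up column names:

    # Serialize a row dict to a JSON string before storing or sending it.
    import json

    row = {"id": 42, "name": "Widget", "amount": 19.95}
    payload = json.dumps(row)   # '{"id": 42, "name": "Widget", "amount": 19.95}'

    # 'payload' can then be inserted into a regular text/varchar column, or
    # returned by the web endpoint that the Android app reads from.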