We are planning to migrate our database to the Azure Cosmos DB graph API and are using this bulk import tool, but the JSON input format is not documented anywhere.
What is the JSON format for bulk import into the Azure Cosmos DB graph API?
https://github.com/Azure-Samples/azure-cosmosdb-graph-bulkexecutor-dotnet-getting-started
Appreciate any help.
You actually don't need to build Gremlin queries to insert your edges. In Cosmos DB, everything is regarded as a JSON document (even the vertices and edges in a graph collection).
The format of the required JSON isn't officially published and can change at any time, but it can be discovered through inspection of the SDKs.
I wrote about it here a while ago, and it is still valid today.
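For illustration only, here is a minimal Python sketch of the document shape that inspecting a graph collection and the bulk executor SDK suggests; the internal field names (such as _isEdge, _sink and _vertexId) and the property layout are unofficial assumptions and may change:

```python
import json

# Hedged sketch of the internal document shape (unofficial; may change).
vertex = {
    "id": "thomas",            # unique vertex id
    "label": "person",         # vertex label
    "pk": "partition-1",       # your partition key property, if the collection has one
    # ordinary vertex properties are arrays of value objects
    "firstName": [{"id": "prop-guid-1", "_value": "Thomas"}],
}

edge = {
    "id": "thomas-knows-mary",
    "label": "knows",
    "_isEdge": True,           # marks the document as an edge rather than a vertex
    "_vertexId": "thomas",     # id of the source (out) vertex
    "_vertexLabel": "person",
    "_sink": "mary",           # id of the destination (in) vertex
    "_sinkLabel": "person",
    "pk": "partition-1",
}

print(json.dumps(vertex, indent=2))
print(json.dumps(edge, indent=2))
```

In practice, the bulk executor sample linked in the question builds documents like these for you from its vertex and edge element classes, so you rarely need to hand-craft them.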
My application requires some form of data backup and data exchange between users, so what I want to achieve is the ability to export a single entity rather than the entire database.
I have found some help but for the full database, like this post:
Backup core data locally, and restore from backup - Swift
This applies to the entire database.
I tried exporting to a JSON file; this might work, except that the entity I'm trying to export contains images as binary data.
So I'm stuck.
Any help with exporting just one entity rather than the full database, or with writing JSON that includes binary data, would be appreciated.
Take a look at Protocol Buffers (protobuf). Apple has an official Swift library for it:
https://github.com/apple/swift-protobuf
Protobuf is an alternate encoding to JSON that has direct support for serializing binary data. There are client libraries for any language you might need to read the data in, or command-line tools if you want to examine the files manually.
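As a hedged illustration of that idea (Python is used here only for brevity, and the item.proto schema and file names are hypothetical; swift-protobuf follows the same pattern), a bytes field lets you carry the image data directly, with no base64 step:

```python
# Hypothetical schema, compiled with `protoc --python_out=. item.proto`:
#
#   syntax = "proto3";
#   message Item {
#     string name  = 1;
#     bytes  image = 2;  // raw image bytes, stored directly
#   }

import item_pb2  # module generated by protoc from the hypothetical item.proto

item = item_pb2.Item()
item.name = "example"
with open("photo.jpg", "rb") as f:
    item.image = f.read()  # binary data goes in as-is

blob = item.SerializeToString()  # compact binary blob for backup/exchange

# Round-trip check: parse the blob back into a message
restored = item_pb2.Item()
restored.ParseFromString(blob)
assert restored.name == "example"
```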
I am creating an ETL automation pipeline using resources provided by Amazon, such as AWS Glue for data transformation.
When I pass a 1 MB JSON file, it transforms the data successfully and produces output in the required JSON format.
I researched how AWS Glue processes larger files (around 2 GB) but didn't find the expected results. Could you please let me know if you have any ideas or references regarding this issue?
I am using a custom PySpark script for the transformations.
I have searched high and low, but it seems that mysqldump and "SELECT ... INTO OUTFILE" are both intentionally blocked by not granting file permissions to the database admin. Wouldn't it save far more server resources to allow file permissions than to disallow them? Every other import/export method I can find executes much more slowly, especially with tables that have millions of rows. Does anyone know a better way? I find it hard to believe Azure left no good way to do this common task.
You did not list the other options you found to be slow, but have you thought about using Azure Data Factory:
Use Data Factory, a cloud data integration service, to compose data storage, movement, and processing services into automated data pipelines.
It supports exporting data from Azure MySQL and MySQL:
You can copy data from MySQL database to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see Supported data stores and formats
Azure Data Factory allows you to define mappings (optional!) and/or transform the data as needed. It has a pay-per-use pricing model.
You can start an export manually or on a schedule using the .NET or Python SDK, the REST API, or PowerShell.
It seems you are looking to export the data to a file, so Azure Blob Storage or Azure Files are likely to be good destinations. FTP or the local file system are also possible.
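As a rough sketch of the Python SDK route (assuming a copy pipeline that moves the MySQL data to Blob Storage has already been defined in the factory; all names below are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder names for illustration
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"
pipeline_name = "<copy-mysql-to-blob>"  # pipeline with a Copy activity: MySQL source -> Blob sink

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Kick off the export by running the pipeline
run = client.pipelines.create_run(resource_group, factory_name, pipeline_name, parameters={})

# Check on the run later
status = client.pipeline_runs.get(resource_group, factory_name, run.run_id)
print(status.status)  # e.g. "InProgress" or "Succeeded"
```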
"SELECT INTO ... OUTFILE" we can achieve this using mysqlworkbench
1.Select the table
2.Table Data export wizard
3.export the data in the form of csv or Json
I have JSON files from Google Analytics and would like to have them in CSV so they can be used by Excel or Power BI.
I've stored them in Azure Blob Storage and DocumentDB. Is there a way to do this transformation in DocumentDB with the Query Explorer, or with the DocumentDB Data Migration Tool and a stored procedure? Or are there any other suggestions?
The DocumentDB database migration tool supports loading CSV data. Please see https://learn.microsoft.com/en-us/azure/documentdb/documentdb-import-data#a-idcsvaimport-csv-files---convert-csv-to-json
I've stored them in Azure Blob Storage and DocumentDB. Is there a way to do this transformation in DocumentDB with the Query Explorer, or with the DocumentDB Data Migration Tool and a stored procedure?
As far as I know, the DocumentDB Query Explorer, stored procedures, and user-defined functions cannot convert JSON to CSV. And the DocumentDB Data Migration Tool can only export your source to a JSON file; you could refer to this tutorial.
I have JSON files from Google Analytics and would like to have them in CSV so they can be used by Excel or Power BI.
Based on my understanding, you could follow this tutorial to get data from Azure DocumentDB into Power BI. To transform JSON to CSV, I assume you could write your own transformation logic in your development language. Or you could leverage Azure Data Factory to move your JSON files from Blob Storage or DocumentDB to Azure Table Storage, then use Microsoft Azure Storage Explorer to manage your table and export your records to CSV.
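If you go the "write your own transformation logic" route, a minimal Python sketch could look like this (the file names are hypothetical, and it assumes the export is a JSON array of reasonably flat records):

```python
import csv
import json

# Hypothetical input/output file names
with open("ga_export.json", encoding="utf-8") as f:
    records = json.load(f)  # expects a JSON array of objects

# Collect every key that appears in any record so no column is dropped
fieldnames = sorted({key for record in records for key in record})

with open("ga_export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
```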
I am new to GeoMesa; I have only just run the geomesa command. After following the command-line tools tutorial on the GeoMesa website, I found some information on ingesting data into GeoMesa through a .csv file.
So, for my research:
I have a MySQL database storing all the information sent from an Android application.
And I want to perform some geospatial analytics on it.
Right now I am converting my MySQL table to a .csv file and then ingesting it into GeoMesa, as advised on the GeoMesa website.
But my questions are:
Is there a better option? The data is in the GB range and is streaming in, so I have to generate .csv files regularly.
Is there any API through which I can connect my MySQL database to GeoMesa?
Is there any way to ingest a .sql dump file, since that would be easier than a .csv file?
Since you are dealing with streaming data, I'd point to a few GeoMesa options:
First, you might want to check out NiFi for managing data flows. If that fits into your architecture, then you can use GeoMesa with NiFi.
Second, Storm is quite popular for working with streaming data. GeoMesa has a brief tutorial for Storm here.
Third, to ingest SQL dumps directly, one option would be to extend the GeoMesa converter library to support them. So far, we haven't had that as a feature request from a customer or a contribution to the project. It'd definitely be a sensible and welcome extension!
I'd also point out the GeoMesa gitter channel. It can be useful for quicker responses.