Azure Databricks high concurrency + table access control + external Hive metastore + ADLS passthrough

A Databricks high concurrency cluster with an external Hive metastore + ADLS passthrough + table access control is no longer supported 🤷‍♂️
Any thoughts on how to achieve the functionality below?
An external Hive metastore is needed since we migrated from HDInsight to Databricks.
With an external Hive metastore there are many advantages (one of which is that we can migrate to any Hadoop cluster without worrying about metadata; that's how we migrated from HDInsight to Databricks).
Table access control is needed to grant fine-grained access on the Hive databases. My users need read access on some Hive databases and read/write on others.
ADLS passthrough is needed for users to perform read/write operations on ADLS (that's where the Hive databases point to).

I guess you are looking for answers on how to do this, but you cannot do it, at least not at this time.
Based on this:
https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-passthrough#known-limitations
Passthrough and table ACLs are not supported together:
"The powers granted by Azure Data Lake Storage credential passthrough could be used to bypass the fine-grained permissions of Table ACLs, while the extra restrictions of Table ACLs will constrain some of the power you get from Azure Data Lake Storage credential passthrough."
You need to drop at least one of these two.
There is also an issue combining an external Hive metastore with high concurrency clusters, but that one may be possible to work around.
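For reference, this is roughly what the table ACL side looks like if you keep table access control and give up passthrough. It is only a sketch, run from a Databricks notebook (where spark and display are predefined) on a cluster with table access control enabled; the database names and the analysts group are made-up placeholders.

```python
# Minimal sketch of fine-grained grants with table access control enabled.
# "reporting", "staging" and the `analysts` group are hypothetical placeholders.

# Read-only access on one Hive database
spark.sql("GRANT USAGE, READ_METADATA, SELECT ON DATABASE reporting TO `analysts`")

# Read/write access on another
spark.sql("GRANT USAGE, SELECT, MODIFY ON DATABASE staging TO `analysts`")

# Inspect what has been granted
display(spark.sql("SHOW GRANT `analysts` ON DATABASE reporting"))
```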

Related

AWS MySQL to GCP BigQuery data migration

I'm planning a data migration from AWS MySQL instances to GCP BigQuery. I don't want to migrate every MySQL database, because ultimately I want to create a data warehouse using BigQuery.
Would exporting the AWS MySQL DBs to S3 buckets as CSV/JSON/Avro and then transferring to GCP buckets be a good option? What would be the best practices for this data pipeline?
If this were a MySQL-to-MySQL migration there would be other possible options, but in this case the option you mentioned is a good one. Also, remember that your source MySQL database will keep getting updated, so your destination DB might miss some records, because this is not a real-time transfer.
Your proposal of exporting to S3 files should work fine, and to export the files you can take advantage of the AWS Database Migration Service (DMS).
With that service you can do either a one-off export to S3 or an incremental export with Change Data Capture (CDC). Unfortunately, since BigQuery is not really designed for working with changes to its tables, implementing CDC can be a bit cumbersome (although totally doable). You also need to take into account the cost of transferring data across providers.
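For the BigQuery side, once the exported files have been copied from S3 into a GCS bucket, the load itself is a simple batch job. Here is a sketch using the google-cloud-bigquery client; the project, bucket, dataset, and table names are hypothetical placeholders.

```python
# Sketch: load CSV exports (already copied from S3 into a GCS bucket) into BigQuery.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,        # skip the header row of each export file
    autodetect=True,            # or pass an explicit schema for production loads
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://my-migration-bucket/mysql-export/orders_*.csv",
    "my_dataset.orders",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
print(client.get_table("my_dataset.orders").num_rows, "rows loaded")
```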
Another option, which would be much easier for you, is to use the same AWS Database Migration Service to move the data directly into Amazon Redshift.
In this case you get change data capture automatically, so you don't need to worry about anything, and Redshift is an excellent tool for building your data warehouse.
If you don't want to use Redshift for any reason and you prefer a fully serverless solution, you can use the AWS Glue Data Catalog to catalog your databases and query them with Amazon Athena.
The cool thing about the AWS-based solutions is that everything is tightly integrated: you can use the same account/users for billing, IAM, and monitoring, and since you are moving data within a single provider there are no cross-provider networking charges, lower latency, and potentially fewer security issues.
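To illustrate that fully serverless route, here is a hedged boto3 sketch: once a Glue crawler has catalogued the exported data, you can query it with Athena. The database, table, query, and S3 output location below are hypothetical placeholders.

```python
# Sketch of the all-AWS route: query Glue-catalogued data serverlessly with Athena.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

started = athena.start_query_execution(
    QueryString="SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id",
    QueryExecutionContext={"Database": "mysql_exports"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = started["QueryExecutionId"]

# Poll until the query reaches a terminal state
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"{len(rows) - 1} result rows")  # the first row is the header
```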

SQL vs NoSQL for a real-time messaging app?

I am creating a messaging app. I have my users stored in a MySQL database and my messages stored in Google Datastore, a NoSQL database. However, I was wondering what the drawbacks would be of having my messages in a MySQL database, since I am fetching the message and the user simultaneously.
Are there performance drawbacks?
Generally, using different databases is not a problem in itself if your backend architecture is well defined; a database only stores the data you manipulate. It sounds like you use MySQL for authentication and store the message data in Google Datastore. Performance drawbacks are more likely to come from your server's bandwidth than from the choice of database.
That said, I would suggest using the same database to store all the data; it will be more stable and easier to manage.

How to convert MS SQL tables to DynamoDB tables?

I am new to Amazon DynamoDB and I have eight (8) MS SQL tables that I want to migrate to DynamoDB.
What process should I use for converting and migrating the database schema and data?
I was facing the same problem a year back when I started migrating an app from SQL to DynamoDB. I am not sure whether there are automated tools, but I can share what we did for the migration:
Check whether your existing data types can be mapped directly or need to change in DynamoDB. You can merge some tables that receive few updates into a single item using the List and Map types, or use a Set if required.
The most important thing is to check all your existing queries. This will be the core information you need when you design your DynamoDB tables.
Make sure your hash (partition) keys are well distributed.
Use GSIs and LSIs for searching and sorting purposes (project only those attributes that will be needed; this will save money).
Some points that will save some money:
If your tables are read-heavy, try using some caching mechanism; otherwise, be ready to increase the throughput of the tables.
If your table is write-heavy, then implement a queuing mechanism, such as SQS.
Keep checking the status of all of your important tables in the Management Console. AWS provides different metrics that will help you manage the throughput of the tables.
I have written a blog post that covers all the challenges we faced while moving from a relational database to a NoSQL database.
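To make a couple of those points concrete (key distribution and GSI projections), here is a hedged boto3 sketch of what one migrated table might look like. The table, attribute, and index names are hypothetical, and on-demand billing is just one of the options.

```python
# Sketch of one migrated table: a hypothetical "Orders" table keyed for the
# access patterns found earlier, plus a GSI that projects only what queries need.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

table = dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},   # sort key
    ],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "status-index",
            "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
            # Project only the attributes the queries need, to keep costs down
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["order_date", "total"]},
        }
    ],
    BillingMode="PAY_PER_REQUEST",  # or provisioned throughput sized to your workload
)
table.wait_until_exists()

# Bulk-load rows exported from MS SQL (here a plain list of dicts) in batches
with table.batch_writer() as batch:
    for row in [{"customer_id": "C1", "order_date": "2023-01-05", "status": "SHIPPED", "total": 42}]:
        batch.put_item(Item=row)
```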

Access a SQLite db file from inside MySQL?

Given the number of different storage engines MySQL supports, I'm just a bit surprised I can't find one that uses SQLite files.
Does such a thing exist?
The use case is that I have data in multiple MySQL databases that I want to process and export as a SQLite database that other tools can then process.
The current proposed solution is to use a scratch MySQL DB server to access the other instance using the FEDERATED Storage Engine and to access and populate a local SQLite db file using another Storage Engine.
The constraint is that the cost/benefit trade-off can barely justify the proposed workflow (and can't justify writing any code beyond the SQL that reads and processes the federated tables), so I'm strictly limited to "works robustly out of the box" solutions.

From Oracle to MS Access to MySQL

I have a client with close to 120,000,000 records in an Oracle database. Their engineer claims they can only give us an MS Access dump of their database. The data will actually be going into a MySQL relational database instance.
What potential issues and problems can we expect moving from Oracle > Access > MySQL?
We have located tools that can convert an Oracle DB to MySQL, but due to the large size of the database (100 GB+) I am not sure these software-based solutions can handle the conversion reliably. This is a time-sensitive project, and I am worried that if we make any mistakes at the onset we may not be able to complete it in a timely manner.
Exporting the Oracle data to a comma-separated, tab-separated, or pipe-separated set of files would not be very challenging. It's done all the time.
I have no idea why someone would claim to only be able to produce an MS Access dump from an Oracle database -- if that's not being done directly by selecting from Oracle into Access through ODBC, then it's done via an intermediate flat file anyway. I'm inclined to call "BS" or "incompetence" on this claim.
The maximum size of an Access database is 2 GB, so I don't see how the proposed migration could be achieved without partitioning the data.
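For what it's worth, here is a rough sketch of the flat-file export route, using the python-oracledb driver. The connection details, table, and column names are hypothetical, and at 120 million rows you would want to chunk the extract (by key range or ROWID) rather than run it as a single query.

```python
# Sketch: dump one Oracle table to a pipe-delimited flat file that MySQL can
# bulk-load with LOAD DATA INFILE. Connection details and table/column names
# are hypothetical placeholders.
import csv
import oracledb  # the python-oracledb driver (successor to cx_Oracle)

conn = oracledb.connect(user="reporting", password="***", dsn="dbhost:1521/ORCLPDB1")
cur = conn.cursor()
cur.arraysize = 10_000  # fetch in large batches to keep the export fast

with open("customers.psv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="|")
    cur.execute("SELECT customer_id, name, created_at FROM customers")
    writer.writerow(col[0] for col in cur.description)  # header row
    while True:
        rows = cur.fetchmany()
        if not rows:
            break
        writer.writerows(rows)

cur.close()
conn.close()
```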