How do I convert RDBMS DDL to Hive DDL script - mysql

We have large and disparate data sources, including Oracle, DB2 and MySQL. We also need to append a few audit columns at the end of each table.
I came across the Java class org.apache.sqoop.hive.HiveTypes. I am planning to create a simple interpreter that accepts RDBMS DDL and emits a Hive DDL script. Any pointers on how I can achieve this?

HiveQL is broadly similar to normal RDBMS DDL, but it lacks certain features, which is why it does not fully follow ANSI SQL. There is no automated process to convert it.
Instead, you have to try running the SQL statements on Hive and, wherever one fails, change it to suit Hive.
For instance, Hive (at least before version 2.2.0) accepts only equality conditions in join clauses, which is not the case in an RDBMS.
To create an interpreter yourself, you first have to list the common differences between RDBMS constructs and HiveQL constructs. Whenever you encounter an RDBMS construct that, according to your list, is not valid in Hive, rebuild that part of the statement for Hive. This replacement logic has to be coded by hand.
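A minimal sketch of such replacement logic for the simplest case, a single MySQL CREATE TABLE statement, written in Python. The type map, the audit column names (load_ts, source_system) and the function names are hypothetical choices for illustration; Sqoop's HiveTypes class plays a similar type-mapping role in Java, but this code does not use it. The regexes only understand plain column definitions, so treat this as a starting point rather than a real DDL parser.

```python
import re

# Hypothetical, deliberately partial MySQL -> Hive type map; extend it per source system.
MYSQL_TO_HIVE = {
    "tinyint": "TINYINT", "smallint": "SMALLINT", "int": "INT", "integer": "INT",
    "bigint": "BIGINT", "float": "FLOAT", "double": "DOUBLE", "decimal": "DECIMAL",
    "char": "STRING", "varchar": "STRING", "text": "STRING",
    "date": "DATE", "datetime": "TIMESTAMP", "timestamp": "TIMESTAMP",
}

# Hypothetical audit columns appended to every generated table.
AUDIT_COLUMNS = [("load_ts", "TIMESTAMP"), ("source_system", "STRING")]

# First words of table-level clauses (keys, constraints) that are dropped.
SKIP_WORDS = {"PRIMARY", "KEY", "UNIQUE", "INDEX", "CONSTRAINT", "FOREIGN", "FULLTEXT"}

TABLE_RE = re.compile(r"CREATE\s+TABLE\s+`?(\w+)`?\s*\((.*)\)", re.IGNORECASE | re.DOTALL)
COLUMN_RE = re.compile(r"`?(\w+)`?\s+(\w+)(\([^)]*\))?")


def split_top_level(body):
    """Split a column list on commas that are not inside parentheses, e.g. DECIMAL(10,2)."""
    parts, depth, current = [], 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def mysql_to_hive(ddl):
    """Translate a simple MySQL CREATE TABLE into Hive DDL plus audit columns."""
    match = TABLE_RE.search(ddl)
    if match is None:
        raise ValueError("not a CREATE TABLE statement")
    table, body = match.group(1), match.group(2)
    columns = []
    for item in split_top_level(body):
        if item.split()[0].upper() in SKIP_WORDS:
            continue  # keys and constraints are not carried over
        col = COLUMN_RE.match(item)
        if col is None:
            continue
        name, base, args = col.group(1), col.group(2).lower(), col.group(3) or ""
        hive_type = MYSQL_TO_HIVE.get(base, "STRING")  # unknown types fall back to STRING
        if hive_type == "DECIMAL":
            hive_type += args  # keep precision and scale
        columns.append((name, hive_type))
    columns.extend(AUDIT_COLUMNS)
    lines = ",\n".join(f"  {n} {t}" for n, t in columns)
    return f"CREATE TABLE {table} (\n{lines}\n)"


SAMPLE_DDL = """CREATE TABLE `orders` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `amount` decimal(10,2) DEFAULT NULL,
  `note` varchar(255),
  `created_at` datetime,
  PRIMARY KEY (`id`),
  KEY `idx_created` (`created_at`)
) ENGINE=InnoDB;"""

if __name__ == "__main__":
    print(mysql_to_hive(SAMPLE_DDL))
```

On the sample, the keys are dropped, the types are mapped and the two audit columns are appended at the end. A real converter would also need separate type maps for Oracle and DB2 and whatever ROW FORMAT / STORED AS clauses your Hive tables use.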

Related

MySQL: How to list all tables that are used in a procedure?

I'm looking for a method or a query to retrieve all tables that are used in a procedure.
I tried information_schema.routines, but it only contains the full definition of each procedure.
Is there a system table that contains this dependency relationship?
Or how can I extract table names from the definitions using another language, such as Python?
Thanks a lot!!
As of this writing, MySQL does not implement such a view in INFORMATION_SCHEMA.
MySQL 8.0.13 added I_S.VIEW_TABLE_USAGE, which allows you to look up the tables used by a view. This was done for WorkLog #11864. That WorkLog notes compatibility with PostgreSQL and Microsoft SQL Server.
However, I can't find any WorkLog for a hypothetical I_S.ROUTINE_TABLE_USAGE table. I checked PostgreSQL, and it has this view: https://www.postgresql.org/docs/current/infoschema-routine-table-usage.html but MySQL does not.
So to get this information automatically, you would have to query the procedure body (for example, the ROUTINE_DEFINITION column of INFORMATION_SCHEMA.ROUTINES) and parse it for table references. That is not an easy task.
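For the Python part of the question, here is a rough sketch that scans a routine body for names following FROM, JOIN, INTO or UPDATE. It assumes you have already fetched the body (for example ROUTINE_DEFINITION from INFORMATION_SCHEMA.ROUTINES, with any MySQL driver; that step is not shown). It is a regex heuristic, not a SQL parser: it misses tables used only through dynamic SQL (PREPARE/EXECUTE) and will also report local variables from SELECT ... INTO or column names after ON DUPLICATE KEY UPDATE. The function names are hypothetical.

```python
import re

# Heuristic: in DML a table name usually follows one of these keywords.
TABLE_REF_RE = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+((?:`[^`]+`|\w+)(?:\.(?:`[^`]+`|\w+))?)",
    re.IGNORECASE,
)


def strip_comments_and_strings(sql):
    """Remove comments and string literals so they cannot produce false matches."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)  # /* block comments */
    sql = re.sub(r"(?:--|#)[^\n]*", " ", sql)  # -- and # line comments
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "''", sql)  # 'single-quoted' literals
    return sql


def tables_used(routine_body):
    """Return the sorted, de-duplicated table names referenced in a routine body."""
    body = strip_comments_and_strings(routine_body)
    return sorted({m.group(1).replace("`", "") for m in TABLE_REF_RE.finditer(body)})


if __name__ == "__main__":
    body = """BEGIN
      -- copy recent orders
      INSERT INTO archive.orders_hist SELECT * FROM `orders` o
        JOIN customers c ON c.id = o.customer_id;
      UPDATE stats SET n = n + 1;
      /* FROM ignored_table */
    END"""
    print(tables_used(body))
```

Names are returned as written (schema-qualified where the body qualifies them), so you may still need to resolve unqualified names against the routine's ROUTINE_SCHEMA.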

Index creation in Data Generator

I'm generating a script from an existing MySQL schema using DataGrip's SQL Generator feature. I get a working script containing CREATE INDEX statements, but I would prefer the indexes to be created by KEY clauses in the CREATE TABLE statements. I can't see an option in SQL Generator to do that. Am I missing something? I have dozens of tables, so I can't just do it by hand.
The server is MySQL 5.7.
You can use SQL Generator | Generate: Definitions provided by RDBMS server to get the same result.
I found a solution that uses not the SQL Generator, which doesn't seem to be able to do what I want, but a raw export of the database structure. Select the schema (you can select various and multiple objects: schemas, tables, triggers, procedures, functions), right-click and choose SQL Scripts -> Request and Copy original DDL, which copies the script extracted from the database. You can then paste it wherever you want, for example into a SQL console or a text editor.

Mysql Query match to check if query has been updated

I am trying to match two MySQL queries (for now, the target is CREATE VIEW) to determine whether executing them would have the same effect on the database.
The queries come from different sources, so their syntax is inconsistent.
To further simplify the question, let me add more details:
Let's say there is an already existing View in the database.
This View was created using a Create VIEW ... SQL statement.
There is a possibility that the CREATE VIEW ... statement gets updated, so, to reflect any changes in the database, the statement is currently executed at every migration.
I want to avoid that: if the CREATE VIEW ... statement would result in the same structure as the existing view in the database, I want to skip executing it.
To get the CREATE VIEW statement from the database, I use SHOW CREATE VIEW ... and compare its output with the query originally used to create the view.
The primary restriction is that I need to make this decision only at the time of migration and cannot rely on any earlier information (say, git diff or commit history).
I have already searched for a solution:
Found no direct solution for this problem (like a SQL engine to which I can feed both queries and know if the result would be the same).
I decided to parse the queries, and to do that I ended up looking into ANTLR (also used by MySQL Workbench).
ANTLR's approach looks promising, but it will require extensive rule-based parsing and writing a query-matching program from scratch.
I realized that just parsing the queries is not enough: I have to create my own POJOs to store the atomic tokens from the queries and then compare the queries based on some rules.
Even predefined POJOs, if I could find any, would let me put together a solution quickly.
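One approach that may avoid a full ANTLR grammar, assuming you are allowed to create a scratch view in the same schema during migration: create the candidate statement under a temporary name (hypothetical tmp_candidate_view), run SHOW CREATE VIEW on both views, and compare the two server-generated texts. Since MySQL rewrites both definitions into its own canonical form, a small normalization step may be enough. A minimal sketch in Python; it only strips the ALGORITHM, DEFINER and SQL SECURITY attributes and the view name (keep those attributes if differences in them matter to you), and the function names are hypothetical.

```python
import re

# Attributes SHOW CREATE VIEW prints before the view name. They control privileges
# and processing strategy, not which columns and rows the view returns.
ATTRIBUTES_RE = re.compile(
    r"^CREATE\s+(?:ALGORITHM=\w+\s+)?(?:DEFINER=\S+\s+)?(?:SQL\s+SECURITY\s+\w+\s+)?VIEW\s+",
    re.IGNORECASE,
)


def normalize_view_sql(create_sql, view_name):
    """Reduce SHOW CREATE VIEW output to its SELECT part, with whitespace collapsed."""
    sql = ATTRIBUTES_RE.sub("", create_sql.strip())
    # Remove the (optionally backtick-quoted) view name and the AS keyword.
    name_re = r"^`?" + re.escape(view_name) + r"`?\s+AS\s+"
    sql = re.sub(name_re, "", sql, flags=re.IGNORECASE)
    sql = re.sub(r"\s+", " ", sql)
    return sql.rstrip(";").strip()


def same_view_definition(existing_sql, existing_name, candidate_sql, candidate_name):
    """True if both SHOW CREATE VIEW outputs define the same SELECT text."""
    return normalize_view_sql(existing_sql, existing_name) == normalize_view_sql(
        candidate_sql, candidate_name
    )


if __name__ == "__main__":
    existing = ("CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER "
                "VIEW `v_orders` AS select `t`.`id` AS `id` from `t`")
    candidate = ("CREATE ALGORITHM=UNDEFINED DEFINER=`deploy`@`%` SQL SECURITY DEFINER "
                 "VIEW `tmp_candidate_view` AS select `t`.`id` AS `id` from `t`")
    print(same_view_definition(existing, "v_orders", candidate, "tmp_candidate_view"))  # True
```

If the result is True, drop the scratch view and skip the real CREATE VIEW; otherwise drop it and run the migration statement. This detects identical canonical text, not general semantic equivalence, so two differently written but equivalent SELECTs may still compare as different.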

What are the SQL Server query syntax not supported by MySQL?

I am working on a project that currently uses a SQL Server database, but a decision has recently been made to switch to MySQL.
I am not using any stored procedures, views, triggers, user-defined functions, etc. Even so, I think some queries written for SQL Server will not be supported by MySQL.
Can anyone help: what do I have to check (and change) so that all the queries also work properly on MySQL?
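Queries that I know, without consulting the documentation, will not work:
(recursive) common table expressions (MySQL supports them only from version 8.0)
windowing functions (also only from MySQL 8.0)
queries using the standard SQL string concatenation operator || (MySQL treats || as logical OR unless the PIPES_AS_CONCAT SQL mode is enabled)
UPDATEs with JOIN are written differently in the two systems
Date arithmetic: date_column + 1 behaves differently (SQL Server adds one day to a datetime value; in MySQL use date_column + INTERVAL 1 DAY)
Division by zero produces an error in SQL Server (MySQL returns NULL in a SELECT)
SQL Server rejects values that do not fit into a column, instead of silently truncating them as MySQL does by default in older versions (strict mode is the default from MySQL 5.7)
DDL that will not work and might have an impact on performance and/or data quality:
datetime columns where you need precision up to milliseconds (MySQL supports fractional seconds from 5.6.4)
tables with check constraints (MySQL parses but ignores them before 8.0.16)
indexed views
triggers on views
table functions (select * from my_function(42);)
filtered indexes ("partial index")
function-based indexes (available from MySQL 8.0.13 as functional key parts)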
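As a first pass over a large code base, a crude Python pre-screen can flag queries that contain some of the constructs listed above (the CTE and window-function checks only matter if the target is older than MySQL 8.0). The labels and patterns are assumptions of this sketch. It produces false positives (a MySQL-style UPDATE t SET a = (SELECT ... FROM ...) also matches the UPDATE ... FROM check) and cannot detect semantic differences such as date arithmetic or division by zero, so flagged queries still need manual review.

```python
import re

# Heuristic patterns for some of the constructs listed above; not a SQL parser.
CHECKS = {
    "common table expression": re.compile(r"^\s*WITH\b", re.IGNORECASE),
    "window function": re.compile(r"\bOVER\s*\(", re.IGNORECASE),
    "|| concatenation": re.compile(r"\|\|"),
    "UPDATE ... FROM (join) syntax": re.compile(r"\bUPDATE\b[^;]*\bSET\b[^;]*\bFROM\b", re.IGNORECASE),
}


def flag_query(sql):
    """Return the labels of all checks whose pattern occurs in `sql`."""
    return [label for label, pattern in CHECKS.items() if pattern.search(sql)]


if __name__ == "__main__":
    queries = [
        "WITH x AS (SELECT 1 AS n) SELECT n FROM x",
        "SELECT first_name || ' ' || last_name FROM people",
        "UPDATE o SET o.total = c.total FROM orders o JOIN calc c ON c.id = o.id",
        "SELECT id FROM orders WHERE id = 1",
    ]
    for q in queries:
        print(flag_query(q), "<-", q)
```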
There's always the option of buying commercial support from the MySQL vendor (MySQL AB at the time of writing; MySQL is now owned by Oracle). I'm pretty sure they have done enough MSSQL->MySQL migrations to know a lot about it, if a price tag on the migration is not a problem.
Alternatively, you could run the MySQL Migration Toolkit over the data and look at the error messages for anything it cannot migrate. The MySQL Migration Toolkit is part of the MySQL GUI Tools (since discontinued; MySQL Workbench now includes a Migration Wizard).

Dynamic Linq - query a schema that is only known at run time?

I know that with Dynamic LINQ you can construct expressions dynamically, in the same way that you might build and execute a dynamic SQL statement (e.g. a dynamic WHERE clause or a dynamic select list). Is it possible to do this when the schema is not known at compile time?
In a database I'm working with, users can define their own entities, which causes new tables/columns to be created in the back-end database. At run time I'll know the table and column names I need to work with, but I won't know the schema at compile time, so I can't build a DBML to work with up front.
Is there any facility for the dynamic discovery of the schema at run time or is this a case where I need to stick with building dynamic SQL statements?
As far as we understand, you know neither the schema name nor the full structure of your schema in advance.
In this case, the strongly typed ExecuteQuery method overload may be an option.
Just write the SQL queries yourself: insert table and column names using string concatenation (after checking them against the schema, because identifiers cannot be passed as parameters) and pass values as parameters.
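The question is about .NET, but the pattern itself is language-neutral. A minimal sketch in Python, using the standard-library sqlite3 module as a stand-in database and a hypothetical user-defined table user_entity_42: discover the columns at run time, accept only identifiers that actually exist, quote and concatenate them, and bind values as parameters. Against SQL Server or MySQL the schema lookup would go through INFORMATION_SCHEMA.COLUMNS instead of sqlite_master / PRAGMA table_info; the helper names are hypothetical.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def discover_columns(conn, table):
    """Read the column names of `table` from the database at run time."""
    tables = {row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table!r}")
    # PRAGMA arguments cannot be bound parameters, hence the existence check above.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})")]


def select_where(conn, table, columns, filter_column, value):
    """Run a SELECT against a table whose schema is only known at run time."""
    known = set(discover_columns(conn, table))
    for name in [*columns, filter_column]:
        if name not in known:
            raise ValueError(f"unknown column: {name!r}")
    # Identifiers cannot be bound parameters: validate, quote and concatenate them.
    col_list = ", ".join(quote_ident(c) for c in columns)
    sql = f"SELECT {col_list} FROM {quote_ident(table)} WHERE {quote_ident(filter_column)} = ?"
    # The filter value is always passed as a bound parameter.
    return conn.execute(sql, (value,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # A hypothetical user-defined entity table, created at run time.
    conn.execute('CREATE TABLE "user_entity_42" (id INTEGER, label TEXT)')
    conn.executemany('INSERT INTO "user_entity_42" VALUES (?, ?)', [(1, "alpha"), (2, "beta")])
    print(select_where(conn, "user_entity_42", ["label"], "id", 2))  # [('beta',)]
```

Validating identifiers against the discovered schema is what keeps the concatenated part of the statement safe; everything that is data still goes through parameters.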