I have several SQL Server and MySQL databases, and it's impossible to join two or more tables across them.
One idea is to use HBase on Hadoop to achieve this by storing all the columns I need to join, since I don't need ad-hoc queries and only need to sync the data to HDFS once per day.
But I'm not sure whether HBase is well suited for that, considering I have to filter rows by many conditions.
Does anyone have a suggestion about this?
You could use Sqoop to import the databases from SQL Server and MySQL into HDFS, and then use Hive to query the imported data. Hive supports a SQL-like language (HiveQL), so you'd be able to execute JOINs with Hive.
I don't think you can do JOINs with HBase.
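Here's a minimal sketch of that approach, assuming hypothetical hostnames, credentials, and table names (orders in SQL Server, customers in MySQL):

    # import one table from each source into Hive
    sqoop import --connect 'jdbc:sqlserver://mssql-host:1433;databaseName=sales' \
        --username etl --password secret \
        --table orders --hive-import --hive-table orders_mssql

    sqoop import --connect jdbc:mysql://mysql-host:3306/crm \
        --username etl --password secret \
        --table customers --hive-import --hive-table customers_mysql

    # then join the two imported tables in Hive
    hive -e 'SELECT o.id, c.name FROM orders_mssql o JOIN customers_mysql c ON o.customer_id = c.id'

Scheduling the two imports once per day matches your sync requirement, and your filtering conditions just become WHERE clauses in HiveQL.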
I have an MSSQL database and am trying to migrate it to a MySQL database. The problem is that when I use MySQL Workbench, some table records in my MSSQL database are not migrated (there is an error and MySQL Workbench stops responding). Is there any tool to export MSSQL table records into a SQL file that can be executed against MySQL?
Both T-SQL and MySQL support the VALUES clause. So, right-click on the database and select Generate Scripts from Tasks, then choose the objects you want to script, and make sure you have selected to script the data, too.
You can even get the schema and change it a little bit to match the MySQL syntax.
For small amounts of data this is pretty cool. If you are exporting large tables, it will be better to use another tool. For example, using bcp you can export your data in CSV format and then import it into the MySQL database.
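A rough sketch of the bcp route, assuming hypothetical server, database, and table names:

    # export a table from SQL Server as comma-separated character data
    bcp sales.dbo.orders out orders.csv -S mssql-host -U etl -P secret -c -t,

    # load the file into the matching MySQL table
    mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE 'orders.csv'
        INTO TABLE orders FIELDS TERMINATED BY ',';" sales_mysql

You may need to adjust the terminators and quoting if your data contains commas or newlines.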
We have large and disparate data sources, including Oracle, DB2, and MySQL. We also need to append a few audit columns at the end.
I came across the following Java class org.apache.sqoop.hive.HiveTypes. I am planning to create a simple interpreter that accepts RDBMS DDL and spits out Hive DDL script. Any pointers on how I can achieve this?
HiveQL is more or less similar to normal RDBMS DDL, but it lacks certain features, which is why it does not fully follow ANSI SQL. There is no automated process to convert between them.
You have to try running your SQL queries on Hive and, wherever a query fails, rewrite it according to Hive's rules.
For instance, Hive accepts only equality conditions in joins, which is not the case in an RDBMS.
To create an interpreter yourself, first list the common differences between RDBMS constructs and HiveQL constructs. Whenever you encounter an RDBMS construct that, according to your list, is invalid in Hive, the query gets rebuilt to fit Hive. This replacement logic has to be coded.
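As a starting point, much of the DDL conversion is mechanical type renaming, so a minimal sketch could be a sed script; the mappings below are assumptions for Oracle-style input and should be extended as you find more differences:

    # rewrite common RDBMS types to Hive types (GNU sed)
    sed -E \
        -e 's/\bVARCHAR2?\([0-9]+\)/STRING/gI' \
        -e 's/\bNUMBER\([0-9]+,[0-9]+\)/DOUBLE/gI' \
        -e 's/\bNUMBER\([0-9]+\)/INT/gI' \
        -e 's/\bDATE\b/TIMESTAMP/gI' \
        -e 's/,?[[:space:]]*PRIMARY KEY[[:space:]]*\([^)]*\)//gI' \
        table.ddl > table_hive.ddl

For anything beyond type names (constraints, storage clauses, defaults), a real parser will be more robust than regex replacement; org.apache.sqoop.hive.HiveTypes is a useful reference for how JDBC types map to Hive types.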
We are handling a data aggregation project in which several Microsoft SQL Server databases are combined into one MySQL database. All the MSSQL databases have the same schema.
The requirements are:
Each MSSQL database can be imported into MySQL independently.
Before importing each record into MySQL, we need to validate it against specific criteria via PHP.
Each imported MSSQL database can be rolled back; that is, even after it has been imported, all of its data can be removed from MySQL.
We still need to know which MSSQL database each record imported into MySQL came from.
The entire import process will be done with PHP.
We are having difficulty in many respects and don't know the best approach to solve our problem.
Your help will be highly appreciated.
PS: each MSSQL database has around 60 tables, and each table can have a few hundred thousand records.
Don't use PHP as a database administration utility. Any time you build a quick PHP script to transfer records directly from one database to another, you're going to cause yourself a world of hurt when that script becomes required for production operation.
You have a number of problems that you need solved:
You have multiple MSSQL databases with similar if not identical tables.
You have a single MySQL database that you want to merge the data into.
The imported data must be altered in a specific way before being merged.
You want to prevent all duplicate records in your import.
You want to know what database each record originally came from.
The solution?
Analyze the source MSSQL databases and create a merge strategy for them.
Create a database structure on the MySQL database that fits the merge strategy in #1, including all the new key constraints (like unique and foreign keys) required for the consolidation.
At this point you have two options left:
Dump the data from each of the source databases to flat files using your RDBMS administration utility of choice. Alter that data to fit your merge strategy and constraints. Document this, and then merge all of the data into your new database structure (a sketch follows below).
Use a tool like opendbcopy to map columns from one database to another and run a mass import.
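For the first option, here's a minimal sketch of how the source-tracking and rollback requirements can be carried through the load; the table, column, and file names are hypothetical:

    # one extra column records which MSSQL database each row came from
    mysql -e "ALTER TABLE merged.orders ADD COLUMN source_db VARCHAR(64) NOT NULL;"

    # tag every row with its source during the import
    mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE 'orders_siteA.csv'
        INTO TABLE merged.orders FIELDS TERMINATED BY ','
        (order_id, customer_id, total)
        SET source_db = 'siteA';"

    # rolling back one source database is then a single statement
    mysql -e "DELETE FROM merged.orders WHERE source_db = 'siteA';"

Your per-record PHP validation can run against the flat files before the LOAD DATA step, which keeps PHP out of the critical data-transfer path.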
Hope this helps.
I was wondering if anyone had any insight or recommended tools for exporting the records from a PostgreSQL database and importing them into a MySQL database. I believe the table structure is 100% identical.
Thoughts? Thanks!
The command
pg_dump --data-only --column-inserts <database_name>
will generate SQL-standard-compliant INSERT statements with all column names listed and one VALUES clause per INSERT. This is the most portable way of moving data from PostgreSQL to any other SQL database.
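For example, assuming hypothetical database names, you can pipe the dump straight into MySQL; the grep keeps only the INSERT lines, since the dump's preamble contains Postgres-specific SET statements that MySQL would reject (this assumes no data values contain embedded newlines):

    pg_dump --data-only --column-inserts sourcedb | grep '^INSERT INTO' | mysql targetdb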
Check out SquirrelSQL, it can pump data from one database brand into another via the DBCopy plugin. When the table structures are really identical it works quite well.
There is a ruby app called Taps that will do it. I've used it before with great success:
http://adam.heroku.com/past/2009/2/11/taps_for_easy_database_transfers/
I have a table in a PostgreSQL database that I would like to convert into a MySQL table. What would be the easiest way to do this? Are there any tools to do it? Remember, I am not converting the whole database to MySQL; I am just taking one table from PostgreSQL and converting it into a MySQL table.
You can use pg_dump to create a SQL dump of a single table and use this dump as input for mysql. Take a look at the option --data-only when you already have the table structure in your MySQL database, and --column-inserts to create straightforward INSERT statements.
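A short sketch with a hypothetical table name; -t restricts the dump to a single table (as noted above, you may need to strip the Postgres-specific preamble before MySQL will accept the file):

    pg_dump --data-only --column-inserts -t users sourcedb > users.sql
    mysql targetdb < users.sql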
An ETL tool like SSIS, Pentaho, Talend or several others will do the trick. Most support a wide variety of data sources and destinations.