Currently, I’m executing the following steps (Hadoop 1.1.2, Hive 0.11, and Sqoop-1.4.3.bin__hadoop-1.0.0):
1. Import data from MySQL to Hive using Sqoop
2. Execute a query in Hive and store its output in a Hive table
3. Export the output to MySQL using Sqoop
I was wondering if it would be possible to combine steps 2 & 3 – the output of the Hive query written directly to the MySQL database.
I read about external tables but couldn’t find an example where the LOCATION clause points to something like jdbc:mysql://localhost:3306//. Is it really possible?
This thread talks about the JDBC Storage Handler, but I couldn’t find a Hive example of it (I guess it's unimplemented!).
From the link you provided, it seems the bug is still unresolved.
But from your problem, what I understand is that you want to run a SELECT query in Hive and have its output written to MySQL. Correct me if I am wrong?
If that is the case, you can use Sqoop export for this.
Please check this answer of mine: https://stackoverflow.com/a/17753176/1970125
Hope this will help.
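For reference, a rough sketch of the export step (the database name, table name, credentials, and warehouse path below are placeholders, not values from the question). After the query's output has been written into a Hive table in the default text format, the table's HDFS directory can be exported with Sqoop:

# export the Hive table's HDFS directory into an existing MySQL table
sqoop export \
  --connect jdbc:mysql://localhost:3306/your_db \
  --username your_user -P \
  --table query_output \
  --export-dir /user/hive/warehouse/query_output \
  --input-fields-terminated-by '\001'

The '\001' delimiter is Hive's default field separator for text tables; adjust it if your table was created with a different FIELDS TERMINATED BY clause.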
I have been following this article on how to analyze Twitter data with Hive: http://blog.cloudera.com/blog/2012/11/analyzing-twitter-data-with-hadoop-part-3-querying-semi-structured-data-with-hive/
I have set up Flume to collect Twitter data and write it to HDFS, and I have set up a Hive table that refers to the same HDFS location.
When I run a command like this from Hive:
SELECT entities.user_mentions[0].screen_name FROM tweets;
I get the following response:
OK
Time taken: 0.16 seconds.
No matter what query I run, I don't get any results.
As I am new to Hive: should I expect to see the results in the Hive command line, or do I have to mine the results from MySQL? (MySQL is the metastore DB.)
When Hive data is partitioned by directory, the table needs to be repaired before the partition(s) become visible. Thus, running MSCK REPAIR TABLE your_table_name should solve your problem.
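A minimal sequence to try from the Hive CLI, assuming the partitioned table from the question is named tweets:

-- register the partition directories Flume has created in the metastore
MSCK REPAIR TABLE tweets;
-- confirm the partitions are now visible
SHOW PARTITIONS tweets;
-- re-run the original query
SELECT entities.user_mentions[0].screen_name FROM tweets LIMIT 10;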
I am using Sqoop to import data from MySQL into HBase.
It works fine, but there is one issue.
As I read in the Sqoop documentation, Sqoop converts MySQL data into strings and then stores it in HBase.
However, this will be a problem for me, as I will have to export the data back from HBase to MySQL, and at that point, how will Sqoop deduce the data type information for the HBase data?
Can someone please suggest a solution to this problem?
What you can do is: during export, just export to a temporary table on the MySQL side. At that point the datatypes will be different. Then write a query that inserts the rows from the temp table into the original MySQL table, filtering out unexpected data or converting datatypes along the way.
I faced a pretty similar issue with a timestamp datatype, since in Hive I was storing it as a bigint. During export I first inserted the values as-is so that the Sqoop export would work. Once that succeeded, I ran a query that loaded the data from the temp table into the original table, converting it at the same time. Hope it helps.
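A small SQL sketch of that pattern for the bigint-timestamp case described above (the table and column names events_staging, events, and event_ts are made up for illustration):

-- staging table receives the raw Sqoop export (epoch seconds stored as BIGINT in Hive)
CREATE TABLE events_staging (
  id       BIGINT,
  event_ts BIGINT
);

-- after `sqoop export --table events_staging ...` has populated it,
-- copy into the real table, converting and filtering along the way
INSERT INTO events (id, event_ts)
SELECT id, FROM_UNIXTIME(event_ts)
FROM events_staging
WHERE event_ts IS NOT NULL;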
I would like to know how I can move data from Hive to MySQL.
I have seen example on how to move hive data to Amazon DynamoDB but not for a RDBMS like MySQL. Here is the example that I saw with DynamoDB:
CREATE EXTERNAL TABLE tbl1 ( name string, location string )
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "table",
"dynamodb.column.mapping" = "name:name,location:location") ;
I would like to do the same, but with MySQL instead. I wonder if I need to code my own StorageHandler? I also do not want to use Sqoop; I want to be able to run my query directly in my HiveQL script.
You'd currently need a JDBC StorageHandler, which has not been created just yet, but you could certainly build your own.
There is currently an issue report for this which you can follow here:
https://issues.apache.org/jira/browse/HIVE-1555
Have you tried using Sqoop? It's a good tool for this kind of task.
There are many options. You can dump the Hive output to a delimited file and then bulk insert it into MySQL tables. You can use Sqoop. Or you can use one of the popular ETL tools, such as Pentaho, among many others.
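A rough sketch of the file-based route, using Hive's default tab-separated output (the table tbl1 and its columns are borrowed from the question's example; paths and credentials are placeholders, and the MySQL server must allow LOCAL INFILE):

# dump the Hive query result as tab-separated text
hive -e "SELECT name, location FROM tbl1" > /tmp/tbl1.tsv
# bulk load it into a matching MySQL table
mysql --local-infile=1 -u your_user -p your_db \
  -e "LOAD DATA LOCAL INFILE '/tmp/tbl1.tsv' INTO TABLE tbl1 FIELDS TERMINATED BY '\t'"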
I was wondering if anyone has any insight into, or recommended tools for, exporting the records from a PostgreSQL database and importing them into a MySQL database. I believe the table structures are 100% identical.
Thoughts? Thanks!
The command
pg_dump --data-only --column-inserts <database_name>
will generate SQL-standard-compliant INSERT statements with all column names listed and one VALUES clause per INSERT. This is the most portable way of moving data from PostgreSQL to any other SQL database.
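A rough sketch of replaying such a dump into MySQL ("mydb" and the credentials are placeholders; depending on your pg_dump version you may need extra cleanup, e.g. removing schema-qualified table names or fixing escaping differences):

# write the data-only dump to a file
pg_dump --data-only --column-inserts mydb > mydb_data.sql
# the dump can still contain PostgreSQL-specific lines (SET ..., etc.),
# so keep only the INSERT statements before feeding it to MySQL
# (this simple filter assumes values do not contain embedded newlines)
grep '^INSERT' mydb_data.sql | mysql -u your_user -p mydb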
Check out SquirrelSQL; it can pump data from one database brand into another via the DBCopy plugin. When the table structures are really identical, it works quite well.
There is a Ruby app called Taps that will do it. I've used it before with great success:
http://adam.heroku.com/past/2009/2/11/taps_for_easy_database_transfers/
I'm wondering if there is an existing utility to create a data dictionary for a MySQL database.
I'm considering just writing a PHP script that fetches the metadata about the database and displays it in a logical format for users to understand, but I'd rather avoid that if there is some pre-built utility out there that can simply do this for me.
Have you looked into HeidiSQL or phpMyAdmin?
Also, MySQL Admin.
Take a look at https://stackoverflow.com/a/26703098/4208132
There is a db_doc.lua plugin for MySQL Workbench CE
[EDITED]
It seems that Lua plugin support was discontinued.
So I wrote a plugin in Python to generate data dictionaries.
It is available at: https://github.com/rsn86/MWB-DBDocPy
It looks like MySQL Admin is now MySQL Workbench, and you need the Enterprise edition to get its reporting tool, DBDoc. Customizing DBDoc reporting templates is explained a little at http://dev.mysql.com/doc/workbench/en/dbdoc-templates.html
The easiest thing to do is to download Toad for MySQL, which is free, and create your own query against MySQL's internal information_schema database. You can add the columns you want to the query below. Then select all results and export them as CSV using Toad.
use information_schema;
desc columns;
select c.table_name, c.column_name, c.data_type from columns c
where c.table_schema = "mydatabaseinstance";
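If you would rather skip Toad, roughly the same dictionary can be produced from the mysql command line; --batch writes tab-separated output that opens directly in a spreadsheet (the user name, schema name, and output path are placeholders):

mysql -u your_user -p --batch \
  -e "SELECT c.table_name, c.column_name, c.data_type, c.is_nullable, c.column_comment
      FROM information_schema.columns c
      WHERE c.table_schema = 'mydatabaseinstance'" > data_dictionary.tsv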