Connect to MySQL from Hive

I want to connect my MySQL database to Hive so that I can access tables on the MySQL server through Hive. I have searched the net and only found solutions for setting up MySQL as the metastore database for Hive, but did not find any method for my actual problem. Can anyone please help me set this up? I am expecting something like this, except for MySQL instead of MongoDB.

You can achieve this in two ways.
One is by importing the MySQL table into HDFS and Hive using Sqoop; a direct Hive import is possible through Sqoop. This will create a Hive table in Hadoop corresponding to the MySQL one. Once you import the table into Hive, the new table works as a Hive table on its own and is no longer linked to MySQL. A minimal command is sketched below.
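A minimal sketch of such a Sqoop import (the JDBC URL, credentials, and table names below are placeholder assumptions, not values taken from the question):
# copy one MySQL table into a new Hive table; all connection details are placeholders
sqoop import \
  --connect jdbc:mysql://localhost/sample \
  --username hive --password hive \
  --table STUDENT \
  --hive-import \
  --create-hive-table \
  --hive-table student \
  -m 1
After the import completes, later changes in MySQL are not reflected in the Hive copy.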
Another way is by using a SerDe to access MySQL tables. I found one Hive-MySQL SerDe on GitHub, but I haven't tested it. If you are good at Java, you can write your own SerDe.
The example that you mentioned above uses a Hive-MongoDB SerDe.

Hive 2.3.0+ provides the ability to define external tables over your MySQL/PostgreSQL/etc. tables using JdbcStorageHandler:
CREATE EXTERNAL TABLE student_jdbc
(
  name string,
  age  int,
  gpa  double
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type"  = "MYSQL",
  "hive.sql.jdbc.driver"    = "com.mysql.jdbc.Driver",
  "hive.sql.jdbc.url"       = "jdbc:mysql://localhost/sample",
  "hive.sql.dbcp.username"  = "hive",
  "hive.sql.dbcp.password"  = "hive",
  "hive.sql.table"          = "STUDENT",
  "hive.sql.dbcp.maxActive" = "1"
);
You can also use the hive.sql.query property instead of hive.sql.table to define a more specific query, for example:
"hive.sql.query" = "SELECT name, age, gpa FROM STUDENT"
See also the Cloudera docs.

Related

Problem dropping Hive table from pyspark script

I have a table in Hive created from many JSON files using the hive-json-serde method, WITH SERDEPROPERTIES ('dots.in.keys' = 'true'), because some keys there contain a dot, like `aaa.bbb`. I created it as an external table and use backticks for these keys. Now I have a problem dropping this table from a PySpark script: when I run sqlContext.sql("DROP TABLE IF EXISTS " + table_name), I get this error message:
An error occurred while calling o63.sql.
: org.apache.spark.SparkException: Cannot recognize hive type string: struct<associations:struct<aaa.bbb:array<string> ...
Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '.' expecting ':'(line 1, pos 33)
== SQL ==
struct<associations:struct<aaa.bbb:array<string>,...
---------------------------------^^^
In Hue I can drop this table without any problem. Am I doing it wrong, or maybe there is a better way to do it?
It looks like it is not possible to work with Hive tables created with the hive-json-serde method and with dots in keys using sqlContext.sql("...") from a PySpark script the way I want. I always get the same error, whether I try to drop such a Hive table or create it (I haven't tried other operations yet). So my workaround is to use Python's os.system() and execute the required query through Hive itself:
import os  # needed for os.system()
# run the DROP statement through the Hive CLI instead of sqlContext.sql()
q = 'hive -e "DROP TABLE IF EXISTS ' + table_name + ';"'
os.system(q)
It's more complicated with a CREATE TABLE query, as we need to escape the backticks with '\':
statement = """CREATE TABLE test111 (testA struct<\`aa.bb\`:string>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://bucket/test111';"""
q = 'hive -e "' + statement + '"'
os.system(q)
It outputs some additional Hive-related info, but it works!

How to integrate MySQL table data into a KSQL stream or table?

I am trying to build a data pipeline from MySQL to KSQL.
Use case: the data source is MySQL, and I have created a table in MySQL.
I am using
./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties ./etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
to start a standalone connector, and it is working fine.
I am starting the consumer with the topic name, i.e.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test1Category --from-beginning
When I insert data into the MySQL table, I get the result in the consumer as well. I have created a KSQL stream as well with the same topic name. I am expecting the same result in my KSQL stream, but I am not getting any result when I run
select * from <streamName>
Connector configuration: source-quickstart-mysql.properties
name=jdbc_source_mysql
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
connection.url=jdbc:mysql://localhost:3306/testDB?user=root&password=cloudera
#comment=Which table(s) to include
table.whitelist=ftest
mode=incrementing
incrementing.column.name=id
topic.prefix=ftopic
Sample Data
MySQL
1.) Create Database:
CREATE DATABASE testDB;
2.) Use Database:
USE testDB;
3.) create the table:
CREATE TABLE products (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description VARCHAR(512),
weight FLOAT
);
4.) Insert data into the table:
INSERT INTO products(id,name,description,weight)
VALUES (103,'car','Small car',20);
KSQL
1.) Create Stream:
CREATE STREAM pro_original (id int, name varchar, description varchar,weight bigint) WITH \
(kafka_topic='proproducts', value_format='DELIMITED');
2.) Select Query:
Select * from pro_original;
Expected Output
Consumer
The consumer should show the data inserted into the MySQL table, and here I am indeed getting that data.
KSQL
The stream should be populated with the data that is inserted into the MySQL table and reflected in the Kafka topic, but I am not getting the expected result in KSQL.
Please help me with this data pipeline.
Your data is in Avro format, but in VALUE_FORMAT you have defined DELIMITED instead of AVRO. It is important to tell KSQL the format of the values stored in the topic. The following should do the trick for you.
CREATE STREAM pro_original_v2 \
WITH (KAFKA_TOPIC='products', VALUE_FORMAT='AVRO');
Data inserted into the Kafka topic should now be visible in your KSQL console window after executing
SELECT * FROM pro_original_v2;
You can have a look at some Avro examples in KSQL here.

'Relation does not exist' error after transferring to PostgreSQL

I have transferred my project from MySQL to PostgreSQL and, as a result of a previous issue (Integer error transferring from MySQL to PostgreSQL), tried to drop the column, because even after I removed the problematic column from models.py and saved, the error didn't disappear.
I tried both with and without quotes.
ALTER TABLE "UserProfile" DROP COLUMN how_many_new_notifications;
Or:
ALTER TABLE UserProfile DROP COLUMN how_many_new_notifications;
Getting the following:
ERROR: relation "UserProfile" does not exist
Here's a model, if helps:
class UserProfile(models.Model):
user = models.OneToOneField(User)
how_many_new_notifications = models.IntegerField(null=True,default=0)
User.profile = property(lambda u: UserProfile.objects.get_or_create(user=u)[0])
I suppose it might have something to do with the mixed case, but I have found no solution in all the similar questions.
Yes, PostgreSQL is a case-aware database, but Django is smart enough to know that. It converts all field names, and generally the model name, to a lower-case table name. However, the real problem here is that your table name will be prefixed by the app name. Django table names generally look like:
<appname>_<modelname>
You can find out what exactly it is by:
from myapp.models import UserProfile
print (UserProfile._meta.db_table)
Obviously this needs to be typed into the Django shell, which is invoked by ./manage.py shell. The result of this print statement is what you should use in your query, as sketched below.
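For example, if the app were named myapp (a hypothetical name used only for illustration), the print statement would output myapp_userprofile and the working statement would be:
-- "myapp" is an assumed app name; substitute the real output of UserProfile._meta.db_table
ALTER TABLE myapp_userprofile DROP COLUMN how_many_new_notifications;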
Client: DataGrip
Database engine: PostgreSQL
For me, opening a new console is what worked, because apparently, due to the IDE cache, it was not recognizing the table I had created.
Steps to operate with the tables of a database:
Database (Left side panel of the IDE) >
Double Click on PostgreSQL - #localhost >
Double Click on the name of the database >
Right click on public schema >
New > Console
GL

How can I connect to a MySQL database from Apache Spark using SparkR?

I am working with Spark 2.0 and the SparkR libraries. I would like sample code showing how to do the following things in SparkR:
Connect to a MySQL or any other SQL database using SparkR.
Write SQL queries like SELECT, UPDATE, etc. to modify a table in that database.
I know how to do it using plain R. However, I would need some help using Spark sessions or the SparkSQL context. I am using RStudio for development.
Moreover, how do we submit this R code as a Spark batch job to run continuously at regular intervals?
# JDBC connection string for the MySQL server (host, port and database are placeholders)
jdbcurl <- "jdbc:mysql://xxx.xxx.x.x:xxxx/database"
# read the MySQL table into a SparkDataFrame
data <- read.jdbc(jdbcurl, "tablename", user = "user", password = "password")
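The snippet above only reads the table. For the "write SQL queries" part of the question, one possibility (a sketch assuming SparkR 2.0 and the jdbcurl and data objects defined above) is to register the SparkDataFrame as a temporary view and query it with Spark SQL; note that this queries Spark's copy of the data, not the MySQL table itself:
# register the SparkDataFrame under a view name so it can be queried with SQL
createOrReplaceTempView(data, "tablename")
# run a Spark SQL query; the result is another SparkDataFrame
result <- sql("SELECT * FROM tablename")
head(result)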

How to convert H2Database database file to MySQL database .sql file?

I have some data in H2Database file and I want to convert it to MySQL .sql database file. What are the methods I can follow?
Following up on Thomas Mueller's answer, SQuirreL SQL worked fine for me.
Here is the procedure on Windows to convert an H2 database:
Go to "drivers list", where everything is red by default.
Select the "H2" driver, and specify the full path to "h2-1.3.173.jar" (for example) in "Extra Class Path". The H2 driver should now display a blue check in the list.
Select your target driver (PostgreSQL, MySQL) and do the same; for example for PostgreSQL, specify the full path to "postgresql-9.4-1201.jdbc41.jar" in "Extra Class Path".
Go to "Aliases", then click on "+" for H2 : configure your JDBC chain, for example copy/paste the jdbc chain you obtain when you launch H2, and do the same for your target database: click on "+", configure and "test".
When you double click on your alias, you should see everything inside your database in a new Tab. Go to the tables in source database, do a multi-select on all your tables and do a right-click : "Copy Table".
Go to your target database from Alias, and do a "Paste Table". When all tables are copied altogether, the foreign key references are also generated.
Check your primary keys: from H2 to PostgreSQL, I lost the primary key constraints and the auto-increment capability.
You could also rename columns and tables via a right click: "refactor". I used it to rename reserved-word columns after the full copy, by disabling name checking in the options.
This worked well for me.
The SQL script generated by the H2 database is not fully compatible with the SQL supported by MySQL. You would have to change the SQL script manually. This requires that you know both H2 and MySQL quite well.
To avoid this problem, an alternative and probably simpler way to copy the data from H2 to MySQL is to use a third-party tool such as SQuirreL SQL together with the SQuirreL DB Copy Plugin. (First you need to install SQuirreL SQL and, on top of that, the DB Copy Plugin.)
I created a Groovy script that does the migration from H2 to MySQL. From there you could do a mysqldump. It requires that the tables already exist in the MySQL database. It should work for other DBMSs with minor changes.
@Grapes([
    @Grab(group='mysql', module='mysql-connector-java', version='5.1.26'),
    @Grab(group='com.h2database', module='h2', version='1.3.166'),
    @GrabConfig(systemClassLoader = true)
])
import groovy.sql.Sql

// connection settings for the source H2 database and the target MySQL database
def h2Url = 'jdbc:h2:C:\\Users\\xxx\\Desktop\\h2\\sonardata\\sonar'
def h2User = 'sonar'
def h2Passwd = 'sonar'
def mysqlUrl = 'jdbc:mysql://10.56.xxx.xxx:3306/sonar?useunicode=true&characterencoding=utf8&rewritebatchedstatements=true'
def mysqlUser = 'sonar'
def mysqlPasswd = 'xxxxxx'
def mysqlDatabase = 'sonar'

sql = Sql.newInstance(h2Url, h2User, h2Passwd, 'org.h2.Driver')

// collect every table in the PUBLIC schema together with its column names
def tables = [:]
sql.eachRow("select * from information_schema.columns where table_schema='PUBLIC'") {
    if (!it.TABLE_NAME.endsWith("_MY")) {
        if (tables[it.TABLE_NAME] == null) {
            tables[it.TABLE_NAME] = []
        }
        tables[it.TABLE_NAME] += it.COLUMN_NAME
    }
}

tables.each { tab, cols ->
    println("processing $tab")
    println("dropping ${tab}_my")
    // recreate the H2 linked table <table>_my pointing at the MySQL table of the same name
    sql.execute("DROP TABLE IF EXISTS " + tab + "_my;")
    sql.execute("create linked table " + tab + "_my ('com.mysql.jdbc.Driver', '" + mysqlUrl + "', '" + mysqlUser + "', '" + mysqlPasswd + "', '" + mysqlDatabase + "." + tab.toLowerCase() + "');")
    sql.eachRow("select count(*) as c from " + tab + "_my") { println("deleting $it.c entries from mysql table") }
    // empty the MySQL table, then copy all rows from H2 through the linked table
    result = sql.execute("delete from " + tab + "_my")
    colString = cols.join(", ")
    sql.eachRow("select count(*) as c from " + tab) { println("starting to copy $it.c entries") }
    sql.execute("insert into " + tab + "_my (" + colString + ") select " + colString + " from " + tab)
}
The H2 database allows you to create a SQL script using the SCRIPT SQL statement or the Script command line tool. Possibly you will need to tweak the script before you can run it against the MySQL database.
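A minimal sketch of the SCRIPT statement (the output file name is just an example):
-- export the whole H2 database, schema and data, to a SQL script file
SCRIPT TO 'dump.sql';
You would then adjust dump.sql by hand where needed and run it against MySQL.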
You can use Fullconvert to convert the database; it's easy to use.
Follow the steps shown here:
https://www.fullconvert.com/howto/h2-to-mysql