Flink application - interfacing with MySQL and MongoDB - mysql

This post is marked for deletion: the issue was with the IDE not creating a proper jar, which caused the problems with the code interaction.
I have a small Flink application that reads from a Kafka topic and needs to check whether the input from the topic (x) exists in a column of a MySQL database before processing it (not ideal, but it's the current requirement).
When I run the application through the IDE (IntelliJ), it works.
However, when I submit the job to the Flink server, it fails to open the connection because the driver class cannot be loaded.
Error from Flink Server
// ERROR
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
// ---------------------
// small summary of MAIN
// ---------------------
// Get data from source (x)
source.map(x => {
  // open connection (fails to open)
  // check if the data exists in the DB
})
// -------------------------------------
// open connection function (Scala Code)
// -------------------------------------
def openConnection(): Boolean = {
  try {
    // - register the JDBC driver
    Class.forName("com.mysql.jdbc.Driver")
    // - make the connection
    connection = DriverManager.getConnection(url, user, pswd)
    // - set status flag
    connection_open = true
  } catch {
    // - log the error and flag the failure
    case e: Throwable =>
      e.printStackTrace()
      connection_open = false
  }
  // return result
  connection_open
}
Question
1) What's the correct way to interface with a MySQL database from a Flink application?
2) At a later stage I will also have to interact with MongoDB in a similar way; what's the correct way to do that from Flink?
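For question 1, the usual Flink pattern is to avoid opening a connection inside a plain map function and instead use a rich function, which provides open()/close() lifecycle hooks that run once per task rather than once per record. Below is a minimal sketch of that idea; the table and column names are hypothetical, and it still assumes the MySQL driver jar actually ends up on the job's classpath (which was the real problem here).

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

class ExistsInDbMapper(url: String, user: String, pswd: String)
    extends RichMapFunction[String, (String, Boolean)] {

  @transient private var connection: Connection = _

  override def open(parameters: Configuration): Unit = {
    // Runs once per task instance, not once per record.
    connection = DriverManager.getConnection(url, user, pswd)
  }

  override def map(x: String): (String, Boolean) = {
    // Hypothetical table/column names, for illustration only.
    val stmt: PreparedStatement = connection.prepareStatement(
      "SELECT 1 FROM my_table WHERE my_column = ? LIMIT 1")
    try {
      stmt.setString(1, x)
      val rs = stmt.executeQuery()
      (x, rs.next()) // true if the value exists in the DB
    } finally {
      stmt.close()
    }
  }

  override def close(): Unit = {
    if (connection != null) connection.close()
  }
}

// usage: source.map(new ExistsInDbMapper(url, user, pswd))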

Unbelievably, IntelliJ does not update dependencies on the Rebuild command.
In IntelliJ, you have to delete and re-create your artifact configuration for all dependencies to be added; Build, Clean, Rebuild, and Delete do not update its settings.
I deleted and recreated the artifact file, and it works.
Apologies for the unnecessary inconvenience (as you can imagine, my frustration), but it's a word of caution for those developing in IntelliJ: manually delete and recreate your artifacts.
Solution:
(File -> Project Structure -> Artifacts -> (-) delete previous one -> (+) create new one -> Select Main Class)

Related

Apache Camel equivalent for Spring @Transactional(readOnly=true)

I am trying to use Java Streams with JPA in an Apache Camel route with MySQL. I have seen a few resources, and they mention that the following three properties are needed for streams to work properly with MySQL:
1. Forward-only result set
2. Read-only statement
3. Fetch size set to Integer.MIN_VALUE
I have tried setting the 2nd and 3rd properties in the JPA repository using query hints:
@QueryHints(value = {
    @QueryHint(name = HINT_FETCH_SIZE, value = "" + Integer.MIN_VALUE),
    @QueryHint(name = HINT_READONLY, value = "true")
})
It looks like the fetch size is working, but the 2nd property is not, as I am still facing the error below:
Caused by: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@54b12700 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
I have made the route transactional using transacted(), which is the equivalent of Spring's @Transactional. But I am not sure what the equivalent of @Transactional(readOnly=true) is. Any leads?
from("direct:" + ROUTE_ID)
.routeId(ROUTE_ID)
.log("Start the processing")
.transacted()
.bean(Processor.class)
.to("log:ProcessName?showAll=true");
I used this blog as a reference.
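For reference, here is how the three streaming properties map to plain JDBC, independent of Camel and JPA. This is only a sketch of the underlying MySQL Connector/J streaming recipe; the connection parameters and table name are placeholders.

import java.sql.{DriverManager, ResultSet}

val conn = DriverManager.getConnection(url, user, pswd) // placeholders
val stmt = conn.createStatement(
  ResultSet.TYPE_FORWARD_ONLY,  // 1. forward-only result set
  ResultSet.CONCUR_READ_ONLY)   // 2. read-only statement
stmt.setFetchSize(Integer.MIN_VALUE) // 3. tells MySQL to stream rows
val rs = stmt.executeQuery("SELECT * FROM big_table") // placeholder table
try {
  while (rs.next()) {
    // process one row at a time; the driver does not buffer the full result
  }
} finally {
  rs.close()
  stmt.close()
  conn.close() // no other statements may run on this connection while streaming
}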

How to use UCanAccess (MS Access JDBC driver) in Azure Databricks?

I am using Azure Databricks and am trying to read .mdb files as part of an ETL program. After doing some research, the only JDBC connector I've found for the MS Access (.mdb) format is UCanAccess. I've followed some Azure tutorials on connecting to a JDBC data source, and the connection at first appears successful, but there are some strange behaviors that don't make any sense.
For one, I cannot actually query the data frame because of data type errors. This happens for every table in the .mdb file.
connectionProperties = {
    "driver": "net.ucanaccess.jdbc.UcanaccessDriver"
}
url = "jdbc:ucanaccess:///dbfs/mnt/pre-processed/aeaton@legacydirectional.com/DD/DAILIES/5-1-19/MD190062.MDB"
df = spark.read.jdbc(url=url, table="tblbhaitems", properties=connectionProperties)
The result here is a data frame being returned
Now, trying to actually get data from the data frame, I get the following error:
df.select("*").show()
error: "org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.139.64.6, executor 0): net.ucanaccess.jdbc.UcanaccessSQLException: UCAExc:::4.0.4 incompatible data type in conversion: from SQL type CHARACTER to java.lang.Integer, value: Item No"
Looking at this error, I decided to try querying a specific string column, to at least test other data types. When I run
df.select("`Job no ID`").show()
I get the column name repeated for every row of that table:
+---------+
|Job no ID|
+---------+
|Job no ID|
|Job no ID|
|Job no ID|
|Job no ID|
|Job no ID|
+---------+
I'm totally at a loss as to why it connects and sees the columns but doesn't actually fetch any data. Unfortunately, .mdb files aren't very common, so I feel my options for parsing the data are limited here.
I was facing a similar issue to the one you mention when using Spark with the UCanAccess JDBC driver. The cause is that Spark's default JDBC dialect quotes identifiers with double quotes, which Access SQL treats as string literals, so a query like SELECT "Job no ID" returns the literal column name for every row. In Spark we can create and register a custom JDBC dialect for the UCanAccess JDBC driver, like the following:
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

case object MSAccessJdbcDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:ucanaccess")
  override def quoteIdentifier(colName: String): String = s"[$colName]"
}

JdbcDialects.registerDialect(MSAccessJdbcDialect)
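Register the dialect before the first read; with bracket quoting in place, column selections return actual row values instead of the column name. A hypothetical usage, reusing the names from the question:

JdbcDialects.registerDialect(MSAccessJdbcDialect)
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:ucanaccess:///dbfs/mnt/.../MD190062.MDB")
  .option("dbtable", "tblbhaitems")
  .option("driver", "net.ucanaccess.jdbc.UcanaccessDriver")
  .load()
df.select("`Job no ID`").show() // now shows the data, not the column name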

Handling database open and close in subroutines

I'm implementing a program where I need to perform a few actions at fixed intervals.
A few of them involve fetching data from a MySQL database.
To schedule these actions at fixed intervals I'm using gocron. It is working quite well.
For the database, as of now, I'm creating an instance at the start of the main program and passing it to the subroutines. I'm using https://github.com/jmoiron/sqlx to work with the DB.
The flow of the code is:
i - initialise resources, e.g. db = sql.Open; put the DB in a common struct to pass to all subroutines
ii - schedule actions using gocron (passing resources as needed)
iii - actions are specific subroutines that perform tasks as needed using the given resources (e.g. the DB)
I have a few cases where the MySQL service needs to be restarted.
Then, as expected, I get errors stating the connection is invalid, something like:
[mysql] packets.go:33: unexpected EOF
[mysql] packets.go:130: write tcp 127.0.0.1:36191->127.0.0.1:3306: write: broken pipe
[mysql] connection.go:312: invalid connection
To get around this, I changed the implementation to acquire the DB connection within the subroutine and close it with defer db.Close(). With this, I'm getting errors about too many open connections. I have checked for proper closing of rows, as well as the usage of Scan, and the recommendations are being followed.
I would like to understand how to go about DB open and close handling in my case.
You can use sync.Once to prevent this; *sql.DB is a connection pool that is meant to be opened once and shared, and it replaces broken connections on its own, so you should neither reopen it in every subroutine nor close it after each use:
var conn *sql.DB // set package-wide, but not exported
var once sync.Once

func GetConnection() *sql.DB {
    once.Do(func() {
        var err error
        if conn, err = sql.Open("postgres", "<credentials>"); err != nil {
            log.Panic(err)
        }
        conn.SetMaxOpenConns(20) // sane default
        conn.SetMaxIdleConns(0)
        conn.SetConnMaxLifetime(time.Nanosecond)
    })
    return conn
}
Read this: https://aaronoellis.com/articles/preventing-max-connection-errors-in-go

Grails transaction setSavePoint method on MySQL causes exception

I am using Grails with MySQL 5, and .withTransaction for transaction management in a service. Within the withTransaction block I am using the savePoint() method, which causes the following exception. Note: I am using the setRollbackOnly() method without any issue.
2011-06-26 23:02:37,818 [quartzScheduler_Worker-7] ERROR listeners.ExceptionPrinterJobListener - Exception occured in job: GRAILS_JOBS.com.exmp.bdg.PowerRollupJob
org.quartz.JobExecutionException: Transaction manager does not allow nested transactions [See nested exception: org.springframework.transaction.NestedTransactionNotSupportedException: Transaction manager does not allow nested transactions]
at org.codehaus.groovy.grails.plugins.quartz.GrailsJobFactory$GrailsTaskClassJob.execute(GrailsJobFactory.java:81)
at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:546)
Caused by: org.springframework.transaction.NestedTransactionNotSupportedException: Transaction manager does not allow nested transactions
at org.springframework.jdbc.datasource.JdbcTransactionObjectSupport.getConnectionHolderForSavepoint(JdbcTransactionObjectSupport.java:151)
at org.springframework.jdbc.datasource.JdbcTransactionObjectSupport.createSavepoint(JdbcTransactionObjectSupport.java:104)
at org.springframework.transaction.support.AbstractTransactionStatus.createSavepoint(AbstractTransactionStatus.java:176)
at org.springframework.transaction.SavepointManager$createSavepoint.call(Unknown Source)
at com.exmp.bdg.service.PowerRollupService$_doRollup_closure2.doCall(PowerRollupService.groovy:85)
By default, the transaction manager for Hibernate and MySQL does not have savepoints enabled.
In BootStrap.groovy add the following:
transactionManager.setNestedTransactionAllowed(true)
Then in a transaction you can do the following:
Thing.withTransaction { status ->
    // Do some work and a save
    def savePoint = status.createSavepoint()
    // do other work
    if (checkOk) {
        // Everything worked, so we don't need the save point anymore
        status.releaseSavepoint(savePoint)
    } else {
        // The other work did not succeed, so roll back to the save point
        status.rollbackToSavepoint(savePoint)
    }
}

org.hibernate.StaleObjectStateException when using Grails with PostgreSQL

I've written a grails service with the following code:
EPCGenerationMetadata requestEPCs(String indicatorDigit, FilterValue filterValue,
PartitionValue partitionValue, String companyPrefix, String itemReference,
Long quantity) throws IllegalArgumentException, IllegalStateException {
//... code
//problematic snippet bellow
def serialGenerator
synchronized(this) {
log.debug "Generating epcs..."
serialGenerator = SerialGenerator.findByItemReference(itemReference)
if(!serialGenerator) {
serialGenerator = new SerialGenerator(itemReference: itemReference, serialNumber: 0l)
}
startingPoint = serialGenerator.serialNumber + 1
serialGenerator.serialNumber += quantity
serialGenerator.save(flush: true)
}
//code continues...
}
Since a Grails service is a singleton by default, I thought I'd be safe from concurrent inconsistency by adding the synchronized block above. I created a simple client for testing concurrency, as the service is exposed via HTTP invoker. I ran multiple clients at the same time, passing the same itemReference as an argument, and had no problems at all.
However, when I changed the database from MySQL to PostgreSQL 8.4, I couldn't handle concurrent access anymore. When running a single client, everything is fine. However, if I add one more client asking for the same itemReference, I instantly get a StaleObjectStateException:
Exception in thread "main" org.springframework.orm.hibernate3.HibernateOptimisticLockingFailureException: Object of class [br.com.app.epcserver.SerialGenerator] with identifier [10]: optimistic locking failed; nested exception is org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [br.com.app.epcserver.SerialGenerator#10]
at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:672)
at org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:412)
at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:411)
at org.springframework.orm.hibernate3.HibernateTemplate.executeWithNativeSession(HibernateTemplate.java:374)
at org.springframework.orm.hibernate3.HibernateTemplate.flush(HibernateTemplate.java:881)
at org.codehaus.groovy.grails.orm.hibernate.metaclass.SavePersistentMethod$1.doInHibernate(SavePersistentMethod.java:58)
(...)
at br.com.app.EPCGeneratorService.requestEPCs(EPCGeneratorService.groovy:63)
at br.com.app.epcclient.IEPCGenerator$requestEPCs.callCurrent(Unknown Source)
at br.com.app.epcserver.EPCGeneratorService.requestEPCs(EPCGeneratorService.groovy:29)
at br.com.app.epcserver.EPCGeneratorService$$FastClassByCGLIB$$15a2adc2.invoke()
(...)
Caused by: org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [br.com.app.epcserver.SerialGenerator#10]
Note: EPCGeneratorService.groovy:63 refers to serialGenerator.save(flush: true).
I don't know what to think, as the only thing I changed was the database. I'd appreciate any advice on the matter.
I'm using:
Grails 1.3.3
Postgres 8.4 (postgresql-8.4-702.jdbc4 driver)
JBoss 6.0.0-M4
MySQL:
mysqld Ver 5.1.41 (mysql-connector-java-5.1.13-bin driver)
Thanks in advance!
That's weird; try disabling transactions.
This is indeed strange behavior, but you could try to work around it by using a "select ... for update" (a pessimistic lock) via Hibernate's lock mechanism. Something like this:
def c = SerialGenerator.createCriteria()
serialGenerator = c.get {
    eq "itemReference", itemReference
    lock true
}