I am trying to use Java Streams with JPA in an Apache Camel route with MySQL. I have seen a few resources that mention we need the following three properties for streams to work properly with MySQL:
1. Forward-only resultset
2. Read-only statement
3. Fetch-size set to Integer.MIN_VALUE
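For reference, my understanding is that at the plain-JDBC level these three properties map to something like the following sketch (my own illustration; conn is an assumed open java.sql.Connection and the query is a placeholder):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch only: "conn" is an assumed open Connection.
Statement stmt = conn.createStatement(
        ResultSet.TYPE_FORWARD_ONLY,    // 1. forward-only result set
        ResultSet.CONCUR_READ_ONLY);    // 2. read-only statement
stmt.setFetchSize(Integer.MIN_VALUE);   // 3. signals Connector/J to stream rows one at a time
ResultSet rs = stmt.executeQuery("SELECT ..."); // illustrative query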
I have tried setting the second and third properties in the JPA repository using query hints:
@QueryHints(value = {
    @QueryHint(name = HINT_FETCH_SIZE, value = "" + Integer.MIN_VALUE),
    @QueryHint(name = HINT_READONLY, value = "true")
})
The fetch size seems to be working, but the read-only hint is not, as I am still facing the error below:
Caused by: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@54b12700 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
I have made the route transactional using transacted(), which is equivalent to Spring's @Transactional. But I am not sure what the equivalent of @Transactional(readOnly = true) is. Any leads?
from("direct:" + ROUTE_ID)
.routeId(ROUTE_ID)
.log("Start the processing")
.transacted()
.bean(Processor.class)
.to("log:ProcessName?showAll=true");
I referred to this blog as a reference.
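For what it's worth, one approach I am experimenting with (a hedged sketch of my own, assuming Spring Java config, an available PlatformTransactionManager bean, and camel-spring; the bean name readOnlyPolicy is my invention) is to wrap a read-only TransactionTemplate in a SpringTransactionPolicy:

import org.apache.camel.spring.spi.SpringTransactionPolicy;
import org.springframework.context.annotation.Bean;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

// Sketch: a Camel transaction policy playing the role of @Transactional(readOnly = true).
@Bean
public SpringTransactionPolicy readOnlyPolicy(PlatformTransactionManager txManager) {
    TransactionTemplate template = new TransactionTemplate(txManager);
    template.setReadOnly(true); // the read-only flag, as in @Transactional(readOnly = true)
    SpringTransactionPolicy policy = new SpringTransactionPolicy();
    policy.setTransactionTemplate(template);
    return policy;
}

The route would then use .transacted("readOnlyPolicy") instead of the plain .transacted(), though I have not confirmed that this alone resolves the streaming error.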
I encountered an error while doing a full-import in Solr 6.6.0.
I am getting the exception below.
This happens when I set
batchSize="-1" in my db-config.xml
If I change this value to, say, batchSize="100", then the import runs without any error.
But the recommended value for this is "-1".
Any suggestions as to why Solr throws this exception?
By the way, the data I am trying to import is not huge: just 250 documents.
Stack trace:
org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLException: Operation not allowed after ResultSet closed
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:464)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:377)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:133)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:745)
By the way, I am getting one more warning:
Could not read DIH properties from /configs/state/dataimport.properties :class org.apache.zookeeper.KeeperException$NoNodeException
This happens when the config directory is not writable.
How can we make the config directory writable in SolrCloud mode?
I am using ZooKeeper as a watchdog. Can we go ahead and change the permissions of the config files stored in ZooKeeper?
Your help is greatly appreciated.
Using fetchSize="-1" is only recommended if you have problems running without it. Its behaviour is up to the JDBC driver, but the cause of people assuming its recommended is this sentence from the old wiki:
DataImportHandler is designed to stream row one-by-one. It passes a fetch size value (default: 500) to Statement#setFetchSize which some drivers do not honor. For MySQL, add batchSize property to dataSource configuration with value -1. This will pass Integer.MIN_VALUE to the driver as the fetch size and keep it from going out of memory for large tables.
Unless you're actually seeing issues with the default values, leave the setting alone and assume your JDBC driver does the correct thing (which it might not do with -1 as the value).
The reason for dataimport.properties having to be writable is that Solr writes a property recording the last time the import ran to that file, so that you can perform delta updates by referencing the time of the last update in your SQL statement.
You'll have to make the directory writable for the client (solr) if you want to use this feature. My guess would be that you can ignore the warning if you're not using delta imports.
Consider a running Hadoop job, in which a custom InputFormat needs to communicate ("return", similarly to a callback) a few simple values to the driver class (i.e., to the class that has launched the job) from within its overridden getSplits() method, using the new mapreduce API (as opposed to mapred).
These values should ideally be returned in-memory (as opposed to saving them to HDFS or to the DistributedCache).
If these values were only numbers, one could be tempted to use Hadoop counters. However, in numerous tests counters do not seem to be available at the getSplits() phase and anyway they are restricted to numbers.
An alternative could be to use the Configuration object of the job, which, as the source code reveals, should be the same object in memory for both getSplits() and the driver class.
In such a scenario, if the InputFormat wants to "return" a (say) positive long value to the driver class, the code would look something like:
// In the custom InputFormat.
public List<InputSplit> getSplits(JobContext job) throws IOException
{
    ...
    long value = ... // A value >= 0
    job.getConfiguration().setLong("value", value);
    ...
}
// In the Hadoop driver class.
Job job = ... // Get the job to be launched
...
job.submit(); // Start running the job
...
while (!job.isComplete())
{
    ...
    long value = job.getConfiguration().getLong("value", -1);
    if (value >= 0) // getSplits() has published the value
    {
        ...
    }
    else
    {
        continue; // Wait for the value to be set by getSplits()
    }
    ...
}
The above works in tests, but is it a "safe" way of communicating values?
Or is there a better approach for such in-memory "callbacks"?
UPDATE
The "in-memory callback" technique may not work in all Hadoop distributions, so, as mentioned above, a safer way is, instead of saving the values to be passed back in the Configuration object, create a custom object, serialize it (e.g., as JSON), saved it (in HDFS or in the distributed cache) and have it read in the driver class. I have also tested this approach and it works as expected.
Using the configuration is a perfectly suitable solution (admittedly for a problem I'm not sure I understand), but once the job has actually been submitted to the job tracker, you will not be able to amend this value (client side or task side) and expect to see the change on the opposite side of the comms (setting configuration values in a map task, for example, will not be persisted to the other mappers, nor to the reducers, nor be visible to the job tracker).
So communicating information from within getSplits() back to your client polling loop (to see when the job has actually finished defining the input splits) is fine in your example.
What's your greater aim or use case for using this?
I have unified JDBC code for reading/writing large texts. The column is CLOB on Oracle and TEXT on MySQL. The following code
java.sql.Clob aClob = resultSet.getClob(COLUMN_NAME);
java.io.InputStream aStream = aClob.getAsciiStream();
int av = aStream.available();
gives a relevant value on MySQL (Connector/J 5.0.4) but zero on Oracle (Oracle JDBC driver 11.2.0.2). Fortunately, Clob.length() gives the correct value on both, and reading the InputStream until read() returns -1 works too, so there are other ways of obtaining the data in a unified way.
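For completeness, the driver-neutral pattern I fall back to (a sketch of my own; the buffer size is an arbitrary choice, and aStream is the stream from the snippet above):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Read the CLOB/TEXT stream to EOF instead of relying on available().
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int n;
while ((n = aStream.read(chunk)) != -1) {
    buffer.write(chunk, 0, n);
}
String text = new String(buffer.toByteArray(), StandardCharsets.US_ASCII); // matches getAsciiStream()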
Javadoc gives this weird note:
The available method for class InputStream always returns 0.
So which driver is right? And no, I don't want to drag vendor-specific packages into the code :-) This question is JDBC-neutral.
I would be tempted to say that both drivers were right.
The Javadoc for the available() method appears to suggest that the value returned is an estimate of how many bytes the InputStream currently has cached and can return to you without an I/O operation. How many bytes it has cached, and how it does any caching, would seem to me to be an implementation detail. The fact that these values are different merely suggests that the two drivers are implemented differently. Nothing in the Javadoc for the available() method suggests to me that either driver is doing anything wrong.
I'd guess that the Oracle driver doesn't cache any data from the CLOB immediately after executing the query, which might be why the available() method returns 0. However, once data has been read from the stream, the available() method for the Oracle driver no longer returns 0, as the Oracle JDBC driver has by then been to the database and fetched some data out of the CLOB column. On the other hand, MySQL seems to be a bit more proactive in fetching data out of the TEXT column as soon as the query has finished executing.
Having read the Javadoc for the available() method I'm not sure why I'd use it. What are you using it for?
I've written a grails service with the following code:
EPCGenerationMetadata requestEPCs(String indicatorDigit, FilterValue filterValue,
        PartitionValue partitionValue, String companyPrefix, String itemReference,
        Long quantity) throws IllegalArgumentException, IllegalStateException {
    //... code
    //problematic snippet below
    def serialGenerator
    synchronized(this) {
        log.debug "Generating epcs..."
        serialGenerator = SerialGenerator.findByItemReference(itemReference)
        if(!serialGenerator) {
            serialGenerator = new SerialGenerator(itemReference: itemReference, serialNumber: 0L)
        }
        startingPoint = serialGenerator.serialNumber + 1
        serialGenerator.serialNumber += quantity
        serialGenerator.save(flush: true)
    }
    //code continues...
}
Since a Grails service is a singleton by default, I thought I'd be safe from concurrent inconsistency by adding the synchronized block above. I created a simple client for testing concurrency, as the service is exposed via HTTP invoker. I ran multiple clients at the same time, passing the same itemReference as an argument, and had no problems at all.
However, when I changed the database from MySQL to PostgreSQL 8.4, I couldn't handle concurrent access anymore. When running a single client, everything is fine. However, if I add one more client asking for the same itemReference, I instantly get a StaleObjectStateException:
Exception in thread "main" org.springframework.orm.hibernate3.HibernateOptimisticLockingFailureException: Object of class [br.com.app.epcserver.SerialGenerator] with identifier [10]: optimistic locking failed; nested exception is org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [br.com.app.epcserver.SerialGenerator#10]
at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:672)
at org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:412)
at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:411)
at org.springframework.orm.hibernate3.HibernateTemplate.executeWithNativeSession(HibernateTemplate.java:374)
at org.springframework.orm.hibernate3.HibernateTemplate.flush(HibernateTemplate.java:881)
at org.codehaus.groovy.grails.orm.hibernate.metaclass.SavePersistentMethod$1.doInHibernate(SavePersistentMethod.java:58)
(...)
at br.com.app.EPCGeneratorService.requestEPCs(EPCGeneratorService.groovy:63)
at br.com.app.epcclient.IEPCGenerator$requestEPCs.callCurrent(Unknown Source)
at br.com.app.epcserver.EPCGeneratorService.requestEPCs(EPCGeneratorService.groovy:29)
at br.com.app.epcserver.EPCGeneratorService$$FastClassByCGLIB$$15a2adc2.invoke()
(...)
Caused by: org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [br.com.app.epcserver.SerialGenerator#10]
Note: EPCGeneratorService.groovy:63 refers to serialGenerator.save(flush: true).
I don't know what to think, as the only thing that I've changed was the database. I'd appreciate any advice on the matter.
I'm using:
Grails 1.3.3
Postgres 8.4 (postgresql-8.4-702.jdbc4 driver)
JBoss 6.0.0-M4
MySQL:
mysqld Ver 5.1.41 (mysql-connector-java-5.1.13-bin driver)
Thanks in advance!
That's weird; try disabling transactions.
This is indeed strange behavior, but you could try to work around it by using a "select ... for update", via Hibernate's lock method.
Something like this:
def c = SerialGenerator.createCriteria()
serialGenerator = c.get {
    eq "itemReference", itemReference
    lock true
}
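For reference, the criteria's lock true corresponds to Hibernate's pessimistic lock. In plain Hibernate 3 (which Grails 1.3.3 sits on), the equivalent would be something like this sketch (the session and the identifier are assumed, not taken from your code):

import org.hibernate.LockMode;
import org.hibernate.Session;

// Sketch: issues "select ... for update" on the row, so a concurrent
// transaction requesting the same lock blocks until this one ends.
SerialGenerator generator = (SerialGenerator) session.get(
        SerialGenerator.class, id, LockMode.UPGRADE);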
I am currently trying to move my DB tables over to InnoDB from MyISAM. I am having timing issues with requests and cron jobs running on the server, which is leading to some errors. I am quite sure that transaction support will help me with the problem, so I am transitioning to InnoDB.
I have a suite of tests which make calls to our web service's REST API and receive XML responses. The test suite is fairly thorough; it's written in Python and uses SQLAlchemy to query information from the database. However, when I change the tables in the system from MyISAM to InnoDB, the tests start failing. The tests aren't failing because the system isn't working; they are failing because the ORM is not correctly querying the rows from the database I am testing on. When I step through the code I see the correct results, but the ORM is not returning the correct results at all.
Basic flow is:
class UnitTest(unittest.TestCase):
    def setUp(self):
        # Create a test object in DB that gets affected by the web server
        testObject = Obj(foo='one')
        self.testId = testObject.id
        session.add(testObject)
        session.commit()

    def tearDown(self):
        # Clean up after the test
        testObject = session.query(Obj).get(self.testId)
        session.delete(testObject)
        session.commit()

    def test_web_server(self):
        # Ensure the initial state of the object.
        objects = session.query(Obj).get(self.testId)
        assert objects.foo == 'one'

        # This will make a simple HTTP get call on an url that will modify the DB
        response = server.request.increment_foo(self.testId)

        # This one fails, the object still has a foo of 'one'
        # When I stop here in a debugger though, and look at the database,
        # the row in question actually has the correct value in the database.
        # ????
        objects = session.query(Obj).get(self.testId)
        assert objects.foo == 'two'
With MyISAM tables storing the object, this test passes. However, when I change to InnoDB tables, this test does not pass. What is more interesting is that when I step through the code in the debugger, I can see that the database has what I expect, so it's not a problem in the web server code. I have tried nearly every combination of expire_all, autoflush, autocommit, etc., and still can't get this test to pass.
I can provide more info if necessary.
Thanks,
Conrad
The problem is that you put the line self.testId = testObject.id before the new object is added to the session, flushed, and assigned an ID by SQLAlchemy. Thus self.testId is always None. Move this line below session.commit().