I have a Kafka Connect configuration that reads data from a MySQL database perfectly fine:
name=local-jbdc
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://localhost:3306/book
connection.user=root
connection.password=newpass
topic.prefix=quickstart-events
mode=incrementing
incrementing.column.name=__id
query=select * from book_table
offset.flush.timeout.ms=5000
buffer.memory=200
poll.interval.ms=10000
tasks.max=1
Now when I take out the query and provide table.whitelist instead, it doesn't read anything. Not even an error.
The configuration is shown below:
name=local-jbdc
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://localhost:3306/book
connection.user=root
connection.password=newpass
topic.prefix=quickstart-events
mode=incrementing
incrementing.column.name=__id
table.whitelist=book_table
offset.flush.timeout.ms=5000
buffer.memory=200
poll.interval.ms=10000
tasks.max=1
Can someone help me understand the root cause of this problem? Also, how will I be able to use incrementing mode for multiple tables?
Edit:
When I stop Kafka with Ctrl+C on the keyboard, a log like this comes up:
[2020-11-30 12:35:38,057] INFO [ReplicaManager broker=0] Shut down completely (kafka.server.ReplicaManager)
[2020-11-30 12:35:38,058] INFO Shutting down. (kafka.log.LogManager)
[2020-11-30 12:35:38,106] INFO [ProducerStateManager partition=connect-status-4] Writing producer snapshot at offset 394 (kafka.log.ProducerStateManager)
[2020-11-30 12:35:38,158] INFO [ProducerStateManager partition=__consumer_offsets-18] Writing producer snapshot at offset 1 (kafka.log.ProducerStateManager)
[2020-11-30 12:35:38,219] INFO [ProducerStateManager partition=quickstart-eventsbook_table-0] Writing producer snapshot at offset 19645 (kafka.log.ProducerStateManager)
[2020-11-30 12:35:38,239] INFO [ProducerStateManager partition=quickstart-book_table-0] Writing producer snapshot at offset 2652 (kafka.log.ProducerStateManager)
The problem was pretty simple. When table.whitelist is provided, the connector creates a topic per table, with topic.prefix prepended to the table name. In my case it created a new topic named quickstart-eventsbook_table. When query is provided, topic.prefix is instead treated as the name of the single topic to send data to.
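To answer the multi-table part of the question: a hedged sketch of what a whitelist-based configuration might look like (table names other than book_table and the topic prefix here are illustrative; with mode=incrementing, every whitelisted table must contain the incrementing column):

```properties
name=local-jdbc
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://localhost:3306/book
connection.user=root
connection.password=newpass
# With a whitelist, the prefix is prepended to each table name,
# producing topics like mysql-book_table, mysql-author_table, ...
topic.prefix=mysql-
mode=incrementing
# Every whitelisted table must have this column
incrementing.column.name=__id
table.whitelist=book_table,author_table
tasks.max=1
```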
This post is marked for deletion, as the issue was with the IDE not creating the proper jar, hence the issues with the code interaction.
I have a small Flink application that reads from a Kafka topic and needs to check whether the input from the topic (x) exists in a column of a MySQL database before processing it (not ideal, but it's the current requirement).
When I run the application through the IDE (IntelliJ), it works.
However, when I submit the job to the Flink server, it fails to open a connection based on the driver.
Error from Flink Server
// ERROR
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
// ---------------------
// small summary of MAIN
// ---------------------
// Get data from source (x)
source.map(x => {
  // open connection (fails to open)
  // check if data exists in db
})
// -------------------------------------
// open connection function (Scala Code)
// -------------------------------------
def openConnection(): Boolean = {
  try {
    // - load the JDBC driver class
    Class.forName("com.mysql.jdbc.Driver")
    // - make the connection
    connection = DriverManager.getConnection(url, user, pswd)
    // - set status flag
    connection_open = true
  } catch {
    // - log the error and set status flag
    case e: Throwable =>
      e.printStackTrace()
      connection_open = false
  }
  // return result
  connection_open
}
Question
1) What's the correct way to interface with a MySQL database from a Flink application?
2) I will also, at a later stage, have to do similar interaction with MongoDB. What's the correct way of interacting with MongoDB from Flink?
Unbelievably, IntelliJ does not update dependencies on the rebuild command.
In IntelliJ, you have to delete and re-create your artifact configuration for all dependencies to be added. (Build, Clean, Rebuild, Delete) does not update its settings.
I deleted and recreated the artifact file, and it works.
Apologies for the unnecessary inconvenience (as you can imagine, my frustration). But it's a word of caution for those developing in IntelliJ: manually delete and recreate artifacts.
Solution:
(File -> Project Structure -> Artifacts -> (-) delete previous one -> (+) create new one -> Select Main Class)
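If you build the fat jar with Maven rather than IntelliJ artifacts, the equivalent pitfall is the driver not being bundled into the jar you submit. A hedged sketch of the dependency declaration (the version number is illustrative):

```xml
<!-- Ensure the MySQL driver ends up in the submitted fat jar -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.49</version>
    <!-- must NOT be scope "provided": the Flink cluster classpath does not include it -->
</dependency>
```

Either way, you can confirm the driver actually made it into the jar with `jar tf yourapp.jar | grep -i mysql` before submitting to the cluster.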
I would like to pre-process the following log structure with nxlog and then send it to Graylog.
My custom app log structure:
timestamp;field1;field2; ---- Start of good event ----
timestamp;field3;field4;field5;field6
timestamp;field7;field8;field9;field10
timestamp;field11;field12; --- End of good event ---
timestamp;FAIL;field13;field14
timestamp;FAIL;field15;field16
The GELF output from nxlog should contain full_message with "good event" or "bad event".
"good event" = 1 line as follows:
timestamp;field1;field2; ---- Start of good event ----;timestamp;field3;field4;field5;field6;timestamp;field7;field8;field9;field10;timestamp;field11;field12; --- End of good event ---
"bad event" should contain 1 line as follows:
timestamp;FAIL;field13;field14; timestamp;FAIL;field15;field16
I have no problem parsing a "good event" with xm_multiline and defining its HeaderLine and EndLine.
But I have absolutely no idea how to parse two different multiline formats. Could you give me a hint, please?
Is it possible to use an if-else statement with "InputType"? I mean "if condition1 then InputType good-event and some-actions else InputType bad-event and some-actions". Or does it need a totally different approach - e.g., no xm_multiline usage but some kind of regex magic?
Thanks in advance.
You can still use xm_multiline. You just need to define the two different patterns with regex.
Since you didn't provide your configuration, I'll use my configuration for a different log format as an example.
I have a Java application whose logs I need to monitor that doesn't use consistent time formatting, so messages might look like this:
2019-04-24 00:00:13,952 WARN [SemaphoreArrayListManagedConnectionPool] (QuartzScheduler_quartzScheduler-wildflyapp0201401_ClusterManager) IJ000604: Throwable while attempting to get a new connection: null: javax.resource.ResourceException: IJ031084: Unable to create connection
new connection: null: javax.resource.ResourceException: IJ031084: Unable to create connection
at org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.createLocalManagedConnection(LocalManagedConnectionFactory.java:336)
at org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.getLocalManagedConnection(LocalManagedConnectionFactory.java:343)
Or like this:
14:00:34,426 INFO [stdout] (default task-73) com.xyz.england.idserver.comp.impl.Service DEBUG [Get][db113034-ecc6-4c0d-86f2-moo3e33942f2] Job Package id.
14:00:34,426 INFO [stdout] (default task-73) [DEBUG 2019-04-24 14:00:34,426] [Get][db113034-ecc6-4c0d-86f2-moo3e33942f2] Job Package id.
14:00:34,427 INFO [stdout] (default task-39) com.xyz.england.idserver.comp.impl.Service DEBUG [Get][0c4d63c0-74d7-4599-bc40-mooa84cf62ea] Job Package id.
14:00:34,427 INFO [stdout] (default task-39) [DEBUG 2019-04-24 14:00:34,425] [Get][0c4d63c0-74d7-4599-bc40-mooa84cf62ea] Job Package id.
If the log used one or the other time format, I could have used one of these two configurations:
<Extension java_multiline>
Module xm_multiline
HeaderLine /^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d,\d\d\d /
</Extension>
OR
<Extension java_multiline>
Module xm_multiline
HeaderLine /^\d\d:\d\d:\d\d,\d\d\d/
</Extension>
Since that wasn't the case, I had to include them in a single statement using alternation, specifically the pipe symbol, aka the OR operator:
<Extension java_multiline>
Module xm_multiline
HeaderLine /^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d,\d\d\d |^\d\d:\d\d:\d\d,\d\d\d /
</Extension>
Using this regex, either time format will match as my header line.
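Applied to the log format in the question, a hedged sketch might look like this (the extension name and the exact regexes are assumptions to adapt to the real dash counts and field contents):

```
<Extension event_multiline>
    Module      xm_multiline
    # A new event starts either at a "Start of good event" marker line
    # or at a line whose second field is FAIL (a bad event)
    HeaderLine  /Start of good event|^[^;]+;FAIL;/
</Extension>
```

Note that alternation alone makes each FAIL line start its own event; if consecutive FAIL lines must be merged into a single "bad event", you would still need additional handling (e.g. an EndLine or post-processing in the route).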
We have a Neo4j graph database with around 60 million nodes and an equivalent number of relationships.
We have been facing consistent packet drops, delays in processing, and a completely hung server after 2 hours. We have to shut down and restart our servers every time this happens, and we are having trouble understanding where we went wrong with our configuration.
We are seeing the following kinds of exceptions in the console.log file:
java.lang.IllegalStateException: s=DISPATCHED i=true a=null o.e.jetty.server.HttpConnection - HttpConnection#609c1158{FILLING}
java.lang.IllegalStateException: s=DISPATCHED i=true a=null o.e.j.util.thread.QueuedThreadPool
java.lang.IllegalStateException: org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException
o.e.j.util.thread.QueuedThreadPool - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$3#59d5a975 in
qtp1667455214{STARTED,14<=21<=21,i=0,q=58}
org.eclipse.jetty.server.Response - Committed before 500 org.neo4j.server.rest.repr.OutputFormat$1#39beaadf
o.e.jetty.servlet.ServletHandler - /db/data/cypher java.lang.IllegalStateException: Committed at
org.eclipse.jetty.server.Response.resetBuffer(Response.java:1253)
~[jetty-server-9.2.
org.eclipse.jetty.server.HttpChannel - /db/data/cypher java.lang.IllegalStateException: Committed at
org.eclipse.jetty.server.Response.resetBuffer(Response.java:1253)
~[jetty-server-9.2.
org.eclipse.jetty.server.HttpChannel - Could not send response error 500: java.lang.IllegalStateException: Committed
o.e.jetty.server.ServerConnector - Stopped
o.e.jetty.servlet.ServletHandler - /db/data/cypher org.neo4j.graphdb.TransactionFailureException: Transaction was marked
as successful, but unable to commit transaction so rolled back.
We are using Neo4j Enterprise Edition 2.2.5 server in SINGLE/NON-CLUSTER mode on an Azure D-series machine (8-core CPU, 56 GB RAM, Ubuntu 14.04 LTS) with an attached 500 GB data disk.
Here is a snapshot of the sizes of the neostore files:
8.5G Oct 2 15:48 neostore.propertystore.db
15G Oct 2 15:48 neostore.relationshipstore.db
2.5G Oct 2 15:48 neostore.nodestore.db
6.9M Oct 2 15:48 neostore.relationshipgroupstore.db
3.7K Oct 2 15:07 neostore.schemastore.db
145 Oct 2 15:07 neostore.labeltokenstore.db
170 Oct 2 15:07 neostore.relationshiptypestore.db
The Neo4j configuration is as follows -
Allocated 30GB to file buffer cache (dbms.pagecache.memory=30G)
Allocated 20GB to JVM heap memory (wrapper.java.initmemory=20480, wrapper.java.maxmemory=20480)
Using the default hpc (high performance) cache type.
Forcing the RULE planner by default (dbms.cypher.planner=RULE)
Maximum threads processing queries is 16 (twice the number of cores) - org.neo4j.server.webserver.maxthreads=16
Transaction timeout of 60 seconds - org.neo4j.server.transaction.timeout=60
Guard timeout if query execution time is greater than 10 seconds - org.neo4j.server.webserver.limit.executiontime=10000
Rest of the settings are default
We actually want to set up a cluster of 3 nodes, but before that we want to be sure our basic configuration is correct. Please help us.
--------------------------------------------------------------------------
EDITED to ADD Query Sample
Typically our Cypher query frequency is 18K queries an hour, averaging roughly 5-6 queries a second. There are also times when there are about 80 queries per second.
Our typical queries look like the ones below:
match (a:TypeA {param:{param}})-[:RELA]->(d:TypeD)
with distinct d,a skip {skip} limit 100
optional match (d)-[:RELF]->(c:TypeC)<-[:RELF]-(b:TypeB)<-[:RELB]-(a)
with distinct d,a,collect(distinct b.bid) as bids,collect(distinct c.param3) as param3Coll
optional match (d)-[:RELE]->(p:TypeE)<-[:RELE]-(b1:TypeB)<-[:RELB]-(a)
with distinct d as distD,bids+collect(distinct b1.bid) as tbids,param3Coll,collect(distinct p.param4) as param4Coll
optional match (distD)-[:RELC]->(f:TypeF)
return id(distD),distD.param5,exists((distD)<-[:RELG]-()) as param6, tbids,param3Coll,param4Coll,collect(distinct id(f)) as fids
match (a:TypeA {param:{param}})-[:RELB]->(b) return count(distinct b)
MATCH (a:TypeA{param:{param}})-[r:RELD]->(a1)-[:RELH]->(h) where r.param1=true
with a,a1,h
match (h)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1
optional match (a)-[:RELB]-(b)-[:RELM {param3:true}]->(c)
return a1.param,id(a1),collect(b.bid),c.param5
match (a:TypeA {param:{param}})
match (a)-[:RELB]->(b)
with distinct b,a skip {skip} limit 100
match (a)-[:RELH]->(h1:TypeH)
match (b)-[:RELF|RELE]->(x)<-[:RELF|RELE]-(h2:TypeH)<-[:RELH]-(a1)
optional match (a1)<-[rd:RELD]-(a)
with distinct a1,a,h1,b,h2,rd.param1 as param2,collect(distinct x.param3) as param3s,collect(distinct x.param4) as param4s
optional match (a1)-[:RELB]->(b1) where b1.param7 in [0,1] and exists((b1)-[:RELF|RELE]->()<-[:RELF|RELE]-(h1))
with distinct a1,a,b,h2,param2,param3s,param4s,b1,case when param2 then false else case when ((a1.param5 in [2,3] or length(param3s)>0) or (a1.param5 in [1,3] or length(param4s)>0)) then case when b1.param7=0 then false else true end else false end end as param8
MERGE (a)-[r2:RELD]->(a1) on create set r2.param6=true on match set r2.param6=case when param8=true and r2.param9=false then true else false end
MERGE (b)-[r3:RELM]->(h2)
SET r2.param9=param8, r3.param9=param8
MATCH (a:TypeA {param:{param}})-[:RELI]->(g:TypeG {type:'type1'}) match (g)<-[r:RELI]-(a1:TypeA)-[:RELJ]->(j)-[:RELK]->(g) return distinct g, collect(j.displayName), collect(r.param1), g.gid, collect(a1.param),collect(id(a1))
match (a:TypeA {param:{param}})-[r:RELD {param2:true}]->(a1:TypeA)-[:RELH]->(b:TypeE) remove r.param2 return id(a1),b.displayName, b.firstName,b.lastName
match (a:TypeA {param:{param}})-[:RELA]->(b:TypeB) return a.param1,count(distinct id(b))
MATCH (a:TypeA {param:{param}}) set a.param1=true;
match (a:TypeE)<-[r:RELE]-(b:TypeB) where a.param4 in {param4s} delete r return count(b);
MATCH (a:TypeA {param:{param}}) return id(a);
Adding a few more strange things I have been noticing...
I have stopped all my web servers, so currently there are no incoming requests to Neo4j. However, I see that there are about 40K open file handles in TCP CLOSE_WAIT state, implying the clients have closed their connections because of timeouts while Neo4j has not yet processed and responded to those requests. I also see (from messages.log) that the Neo4j server is still processing queries, and as it does this, the 40K open file handles are slowly reducing. By the time I write this post there are about 27K open file handles in TCP CLOSE_WAIT state.
Also, I see that the queries are not processed continuously. Every once in a while I see a pause in messages.log, along with these messages about log rotation because of some out-of-order sequence, as below:
Rotating log version:5630
2015-10-04 05:10:42.712+0000 INFO [o.n.k.LogRotationImpl]: Log Rotation [5630]: Awaiting all transactions closed...
2015-10-04 05:10:42.712+0000 INFO [o.n.k.i.s.StoreFactory]: Waiting for all transactions to close...
 committed: out-of-order-sequence:95494483 [95494476]
 committing: 95494483
 closed: out-of-order-sequence:95494480 [95494246]
2015-10-04 05:10:43.293+0000 INFO [o.n.k.LogRotationImpl]: Log Rotation [5630]: Starting store flush...
2015-10-04 05:10:44.941+0000 INFO [o.n.k.i.s.StoreFactory]: About to rotate counts store at transaction 95494483 to [/datadrive/graph.db/neostore.counts.db.b], from [/datadrive/graph.db/neostore.counts.db.a].
2015-10-04 05:10:44.944+0000 INFO [o.n.k.i.s.StoreFactory]: Successfully rotated counts store at transaction 95494483 to [/datadrive/graph.db/neostore.counts.db.b], from [/datadrive/graph.db/neostore.counts.db.a].
I also see these messages once in a while
2015-10-04 04:59:59.731+0000 DEBUG [o.n.k.EmbeddedGraphDatabase]: NodeCache array:66890956 purge:93 size:1.3485746GiB misses:0.80978173% collisions:1.9829895% (345785) av.purge waits:13 purge waits:0 avg. purge time:110ms
or
2015-10-04 05:10:20.768+0000 DEBUG [o.n.k.EmbeddedGraphDatabase]: RelationshipCache array:66890956 purge:0 size:257.883MiB misses:10.522135% collisions:11.121769% (5442101) av.purge waits:0 purge waits:0 avg. purge time:N/A
All of this is happening when there are no incoming requests and Neo4j is processing the old pending 40K requests, as mentioned above.
Since it is a dedicated server, shouldn't it be processing the queries continuously without such a large pending queue? Am I missing something here? Please help me.
I didn't go over your queries completely. You should examine each query you send often by prefixing it with PROFILE or EXPLAIN to see the query plan and get an idea of how many database accesses it causes.
E.g. the second match in the following query looks expensive, since the two patterns are not connected with each other:
MATCH (a:TypeA{param:{param}})-[r:RELD]->(a1)-[:RELH]->(h) where r.param1=true with a,a1,h match (m)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1 optional match (a)-[:RELB]-(b)-[:RELM {param3:true}]->(c) return a1.param,id(a1),collect(b.bid),c.bPhoto
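For instance, taking one of the simpler queries from the question, profiling is just a matter of prefixing it (the query is copied from the question; the plan output will of course vary with your data):

```cypher
PROFILE
match (a:TypeA {param:{param}})-[:RELB]->(b)
return count(distinct b)
```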
Also enable garbage collection logging in neo4j-wrapper.conf and check whether you're suffering from long pauses. If so, consider reducing the heap size.
It looks like this issue requires more research on your side, but here are some things from my experience.
TL;DR - I had a similar issue with my own unmanaged extension, where transactions were not properly handled.
Language/connector
What language/connector is used in your application?
You should verify that:
If some popular open-source library is used - make sure your application is using its latest version. There may be a bug in your connector.
If you have your own hand-written solution that works with the REST API - verify that ALL HTTP requests are closed on the client side.
Extension/plugins
It's quite easy to mess things up if custom-written extensions/plugins are used.
What should be checked:
All transactions are always closed (use try-with-resources)
Neo4j settings
Verify your server configuration. For example, if you have a large value for org.neo4j.server.transaction.timeout and you don't handle transactions properly on the client side, you can end up with a lot of running transactions.
Monitoring
You are using the Enterprise version. That means you have access to JMX. It's a good idea to check the information about active Locks & Transactions.
Another Neo4j version
Maybe you can try another Neo4j version, for example 2.3.0-M03.
This will give answers to questions like:
Is this a Neo4j 2.2.5 bug?
Is this a misconfiguration of the existing Neo4j installation?
Linux configuration
Check your Linux configuration.
What is in your /etc/sysctl.conf? Are there any invalid/unrelated settings?
Another server
You can try to spin up another server (e.g. a VM at DigitalOcean), deploy the database there, and load it with Gatling.
Maybe your server has some invalid configuration?
Try to get rid of everything that could be the cause of the problem, to make it easier to find.
Using the following code:
EntityManager manager = factory.createEntityManager();
manager.setFlushMode(FlushModeType.AUTO);
PhysicalCard card = new PhysicalCard();
card.setIdentifier("012345ABCDEF");
card.setStatus(CardStatusEnum.Assigned);
manager.persist(card);
manager.close();
when the code runs past this line, the "card" record does not appear in the database. However, when using FlushModeType.COMMIT and a transaction like this:
EntityManager manager = factory.createEntityManager();
manager.setFlushMode(FlushModeType.COMMIT);
manager.getTransaction().begin();
PhysicalCard card = new PhysicalCard();
card.setIdentifier("012345ABCDEF");
card.setStatus(CardStatusEnum.Assigned);
manager.persist(card);
manager.getTransaction().commit();
manager.close();
it works fine. From EclipseLink's log I can see that the first snippet doesn't issue an INSERT statement while the second one does.
Am I missing something here? I'm using EclipseLink 2.3 and MySQL Connector/J 5.1.
I am assuming that you are using EclipseLink in a Java SE application, or in a Java EE application but with an application managed EntityManager instead of a container managed EntityManager.
In both scenarios, all updates made to the persistence context are flushed only when the transaction associated with the EntityManager commits (using EntityTransaction.commit), or when the EntityManager's persistence context is flushed (using EntityManager.flush). This is the reason why the second code snippet issues the INSERT as it invokes the EntityTransaction's begin and commit methods, while the first doesn't; an invocation of em.persist does not issue an INSERT.
As far as FlushModeType values are concerned, the API documentation states the following:
COMMIT
public static final FlushModeType COMMIT
Flushing to occur at transaction commit. The provider may flush at
other times, but is not required to.
AUTO
public static final FlushModeType AUTO
(Default) Flushing to occur at query execution.
Since no queries have been executed in the first case, no flushing occurs, i.e., no INSERT statements corresponding to the persistence of the PhysicalCard entity are issued. It is the explicit commit of the EntityTransaction in the second snippet that results in the INSERT statement being issued.
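To illustrate, here is a hedged sketch of the three ways a flush can be triggered with an application-managed EntityManager. It assumes a configured persistence unit named "cards-pu" and the PhysicalCard entity from the question, so it is not runnable without that setup:

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class FlushDemo {
    public static void main(String[] args) {
        EntityManagerFactory factory =
                Persistence.createEntityManagerFactory("cards-pu"); // assumed unit name
        EntityManager manager = factory.createEntityManager();

        manager.getTransaction().begin();
        PhysicalCard card = new PhysicalCard();
        card.setIdentifier("012345ABCDEF");
        card.setStatus(CardStatusEnum.Assigned);
        manager.persist(card); // only queues the INSERT in the persistence context

        // Any of the following actually pushes the INSERT to the database:
        manager.flush();                                   // 1. explicit flush
        // 2. with FlushModeType.AUTO, executing a query flushes pending changes first:
        manager.createQuery("select c from PhysicalCard c").getResultList();
        manager.getTransaction().commit();                 // 3. commit always flushes

        manager.close();
        factory.close();
    }
}
```

The key point is that without an active transaction (as in the first snippet), none of these flush points is ever reached, so nothing is written.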