Update ttl for all records in aerospike - configuration

I was stuck in a situation that I have initialised a namesapce with
default-ttl to 30 days. There was about 5 million data with that (30-day calculated) ttl-value. Actually, my requirement is that ttl should be zero(0), but It(ttl-30d) was kept with unaware or un-recognise.
So, Now I want to update prev(old) 5 million data with new ttl-value (Zero).
I've checked/tried "set-disable-eviction true", but it is not working, it is removing data according to (old)ttl-value.
How do I overcome out this? (and I want to retrieve the removed data, How can I?).
Someone help me.

First, eviction and expiration are two different mechanisms. You can disable evictions in various ways, such as the set-disable-eviction config parameter you've used. You cannot disable the cleanup of expired records. There's a good knowledge base FAQ What are Expiration, Eviction and Stop-Writes?. Unfortunately, the expired records that have been cleaned up are gone if their void time is in the past. If those records were merely evicted (i.e. removed before their void time due to crossing the namespace high-water mark for memory or disk) you can cold restart your node, and those records with a future TTL will come back. They won't return if either they were durably deleted or if their TTL is in the past (such records gets skipped).
As for resetting TTLs, the easiest way would be to do this through a record UDF that is applied to all the records in your namespace using a scan.
The UDF for your situation would be very simple:
ttl.lua
function to_zero_ttl(rec)
local rec_ttl = record.ttl(rec)
if rec_ttl > 0 then
record.set_ttl(rec, -1)
aerospike:update(rec)
end
end
In AQL:
$ aql
Aerospike Query Client
Version 3.12.0
C Client Version 4.1.4
Copyright 2012-2017 Aerospike. All rights reserved.
aql> register module './ttl.lua'
OK, 1 module added.
aql> execute ttl.to_zero_ttl() on test.foo

Using a Python script would be easier if you have more complex logic, with filters etc.
zero_ttl_operation = [operations.touch(-1)]
query = client.query(namespace, set_name)
query.add_ops(zero_ttl_operation)
policy = {}
job = query.execute_background(policy)
print(f'executing job {job}')
while True:
response = client.job_info(job, aerospike.JOB_SCAN, policy={'timeout': 60000})
print(f'job status: {response}')
if response['status'] != aerospike.JOB_STATUS_INPROGRESS:
break
time.sleep(0.5)
Aerospike v6 and Python SDK v7.

Related

Neo4j server hangs every 2 hours consistently. Please help me understand if something is wrong with the configuration

We have a neo4j graph database with around 60 million nodes and an equivalent relationships.
We have been facing consistent packet drops and delays in processing and a complete hung server after 2 hours. We had to shutdown and restart our servers every time this happens and we are having trouble understanding where we went wrong with our configuration.
We are seeing the following kind of exceptions in the console.log file -
java.lang.IllegalStateException: s=DISPATCHED i=true a=null o.e.jetty.server.HttpConnection - HttpConnection#609c1158{FILLING}
java.lang.IllegalStateException: s=DISPATCHED i=true a=null o.e.j.util.thread.QueuedThreadPool
java.lang.IllegalStateException: org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException
o.e.j.util.thread.QueuedThreadPool - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$3#59d5a975 in
qtp1667455214{STARTED,14<=21<=21,i=0,q=58}
org.eclipse.jetty.server.Response - Committed before 500 org.neo4j.server.rest.repr.OutputFormat$1#39beaadf
o.e.jetty.servlet.ServletHandler - /db/data/cypher java.lang.IllegalStateException: Committed at
org.eclipse.jetty.server.Response.resetBuffer(Response.java:1253)
~[jetty-server-9.2.
org.eclipse.jetty.server.HttpChannel - /db/data/cypher java.lang.IllegalStateException: Committed at
org.eclipse.jetty.server.Response.resetBuffer(Response.java:1253)
~[jetty-server-9.2.
org.eclipse.jetty.server.HttpChannel - Could not send response error 500: java.lang.IllegalStateException: Committed
o.e.jetty.server.ServerConnector - Stopped
o.e.jetty.servlet.ServletHandler - /db/data/cypher org.neo4j.graphdb.TransactionFailureException: Transaction was marked
as successful, but unable to commit transaction so rolled back.
We are using neo4j enterprise edition 2.2.5 server in SINGLE/NON CLUSTER mode on Azure D series 8 core CPU,56 GB RAM UBUNTU 14.04 LTS machine with an attached 500GB data disk.
Here is a snapshot of the sizes of neostore files
8.5G Oct 2 15:48 neostore.propertystore.db
15G Oct 2 15:48 neostore.relationshipstore.db
2.5G Oct 2 15:48 neostore.nodestore.db
6.9M Oct 2 15:48 neostore.relationshipgroupstore.db
3.7K Oct 2 15:07 neostore.schemastore.db
145 Oct 2 15:07 neostore.labeltokenstore.db
170 Oct 2 15:07 neostore.relationshiptypestore.db
The Neo4j configuration is as follows -
Allocated 30GB to file buffer cache (dbms.pagecache.memory=30G)
Allocated 20GB to JVM heap memory (wrapper.java.initmemory=20480, wrapper.java.maxmemory=20480)
Using the default hpc(High performance) type cache.
Forcing the RULE planner by default (dbms.cypher.planner=RULE)
Maximum threads processing queries is 16(twice the number of cores) - org.neo4j.server.webserver.maxthreads=16
Transaction timeout of 60 seconds - org.neo4j.server.transaction.timeout=60
Guard Timeout if query execution time is greater than 10 seconds - org.neo4j.server.webserver.limit.executiontime=10000
Rest of the settings are default
We actually want to setup a cluster of 3 nodes but before that we want to be sure if our basic configuration is correct. Please help us
--------------------------------------------------------------------------
EDITED to ADD Query Sample
Typically our cypher query frequency is 18K queries in an hour with an average of roughly 5-6 queries a second. There are also times when there are about 80 queries per second.
Our Typical Queries look like the ones below
match (a:TypeA {param:{param}})-[:RELA]->(d:TypeD) with distinct d,a skip {skip} limit 100 optional match (d)-[:RELF]->(c:TypeC)<-[:RELF]-(b:TypeB)<-[:RELB]-(a) with distinct d,a,collect(distinct b.bid) as bids,collect(distinct c.param3) as param3Coll optional match (d)-[:RELE]->(p:TypeE)<-[:RELE]-(b1:TypeB)<-[:RELB]-(a) with distinct d as distD,bids+collect(distinct b1.bid) as tbids,param3Coll,collect(distinct p.param4) as param4Coll optional match (distD)-[:RELC]->(f:TypeF) return id(distD),distD.param5,exists((distD)<-[:RELG]-()) as param6, tbids,param3Coll,param4Coll,collect(distinct id(f)) as fids
match (a:TypeA {param:{param}})-[:RELB]->(b) return count(distinct b)
MATCH (a:TypeA{param:{param}})-[r:RELD]->(a1)-[:RELH]->(h) where r.param1=true with a,a1,h match (h)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1 optional match (a)-[:RELB]-(b)-[:RELM {param3:true}]->(c) return a1.param,id(a1),collect(b.bid),c.param5
match (a:TypeA {param:{param}}) match (a)-[:RELB]->(b) with distinct b,a skip {skip} limit 100 match (a)-[:RELH]->(h1:TypeH) match (b)-[:RELF|RELE]->(x)<-[:RELF|RELE]-(h2:TypeH)<-[:RELH]-(a1) optional match (a1)<-[rd:RELD]-(a) with distinct a1,a,h1,b,h2,rd.param1 as param2,collect(distinct x.param3) as param3s,collect(distinct x.param4) as param4s optional match (a1)-[:RELB]->(b1) where b1.param7 in [0,1] and exists((b1)-[:RELF|RELE]->()<-[:RELF|RELE]-(h1)) with distinct a1,a,b,h2,param2,param3s,param4s,b1,case when param2 then false else case when ((a1.param5 in [2,3] or length(param3s)>0) or (a1.param5 in [1,3] or length(param4s)>0)) then case when b1.param7=0 then false else true end else false end end as param8 MERGE (a)-[r2:RELD]->(a1) on create set r2.param6=true on match set r2.param6=case when param8=true and r2.param9=false then true else false end MERGE (b)-[r3:RELM]->(h2) SET r2.param9=param8, r3.param9=param8
MATCH (a:TypeA {param:{param}})-[:RELI]->(g:TypeG {type:'type1'}) match (g)<-[r:RELI]-(a1:TypeA)-[:RELJ]->(j)-[:RELK]->(g) return distinct g, collect(j.displayName), collect(r.param1), g.gid, collect(a1.param),collect(id(a1))
match (a:TypeA {param:{param}})-[r:RELD {param2:true}]->(a1:TypeA)-[:RELH]->(b:TypeE) remove r.param2 return id(a1),b.displayName, b.firstName,b.lastName
match (a:TypeA {param:{param}})-[:RELA]->(b:TypeB) return a.param1,count(distinct id(b))
MATCH (a:TypeA {param:{param}}) set a.param1=true;
match (a:TypeE)<-[r:RELE]-(b:TypeB) where a.param4 in {param4s} delete r return count(b);
MATCH (a:TypeA {param:{param}}) return id(a);
Adding a few more strange things I have been noticing....
I am have stopped all my webservers. So, currently there are no incoming requests to neo4j. However I see that there are about 40K open file handles in TCP close/wait state implying the client has closed its connection because of time out and Neo4j has not processed it and responded to that request. I also see (from messages.log) that Neo4j server is
still processing queries and as it does this, the 40K open file handles is slowly reducing. By the time I write this post there are about 27K open file handles in TCP close/wait state.
Also I see that the queries are not continuously processed. Every once in a while I am seeing a pause in messages.log and I see these messages about log rotation because of some out of order sequence as below
Rotating log version:5630
2015-10-04 05:10:42.712+0000 INFO
[o.n.k.LogRotationImpl]: Log Rotation [5630]: Awaiting all
transactions closed...
2015-10-04 05:10:42.712+0000 INFO
[o.n.k.i.s.StoreFactory]: Waiting for all transactions to close...
committed: out-of-order-sequence:95494483 [95494476]
committing:
95494483
closed: out-of-order-sequence:95494480 [95494246]
2015-10-04 05:10:43.293+0000 INFO [o.n.k.LogRotationImpl]: Log
Rotation [5630]: Starting store flush...
2015-10-04 05:10:44.941+0000
INFO [o.n.k.i.s.StoreFactory]: About to rotate counts store at
transaction 95494483 to [/datadrive/graph.db/neostore.counts.db.b],
from [/datadrive/graph.db/neostore.counts.db.a].
2015-10-04
05:10:44.944+0000 INFO [o.n.k.i.s.StoreFactory]: Successfully rotated
counts store at transaction 95494483 to
[/datadrive/graph.db/neostore.counts.db.b], from
[/datadrive/graph.db/neostore.counts.db.a].
I also see these messages once in a while
2015-10-04 04:59:59.731+0000 DEBUG [o.n.k.EmbeddedGraphDatabase]:
NodeCache array:66890956 purge:93 size:1.3485746GiB misses:0.80978173%
collisions:1.9829895% (345785) av.purge waits:13 purge waits:0 avg.
purge time:110ms
or
2015-10-04 05:10:20.768+0000 DEBUG [o.n.k.EmbeddedGraphDatabase]:
RelationshipCache array:66890956 purge:0 size:257.883MiB
misses:10.522135% collisions:11.121769% (5442101) av.purge waits:0
purge waits:0 avg. purge time:N/A
All of this is happening when there are no incoming requests and neo4j is processing old pending 40K requests as I mentioned above.
Since, it is a dedicated server, should not the server be processing the queries continuously without such a large pending queue? Am I missing something here? Please help me
Didn't go completely over your queries. You should examine each of the queries you send often by prefixing with PROFILE or EXPLAIN to see the query plan and get an idea how many accesses they cause.
E.g. the second match in the following query looks like being expensive since the two patterns are not connected with each other:
MATCH (a:TypeA{param:{param}})-[r:RELD]->(a1)-[:RELH]->(h) where r.param1=true with a,a1,h match (m)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1 optional match (a)-[:RELB]-(b)-[:RELM {param3:true}]->(c) return a1.param,id(a1),collect(b.bid),c.bPhoto
Also enable garbage collection logging in neo4j-wrapper.conf and check if you're suffering from long pauses. If so, consider to reduce heap size.
Looks like that this issue requires more research from your side, but there is some things from my experience.
TL;DR; - I had similar issue with my own unmanaged extension, where transactions were not properly handled.
Language/connector
What language/connector is used in your application?
You should verify that:
If some popular open-source library is used - your application is using latest version. Probably there is bug in your connector.
If you have your own, hand-written solution that works with REST API - verify that ALL http request are closed at client side.
Extension/plugins
It's quite easy to mess things up, if custom-written extensions/plugins are used.
What should be checked:
All transaction are always closed (try-with-resource is used)
Neo4j settings
Verify your server configuration. For example, if you have large value for org.neo4j.server.transaction.timeout and you don't handle properly transaction at client side - you can end up with a lot of running transactions.
Monitoring
You are using Enterprise version. That means that you have access to JMX. It's good idea to check information about active Locks & Transactions.
Another Neo4j version
Maybe you can try another Neo4j version. For example 2.3.0-M03.
This will give answers for questions like:
Is this Neo4j 2.2.5 bug?
Is this existing Neo4j installation misconfiguration?
Linux configuration
Check your Linux configuration.
What is in your /etc/sysctl.conf? Are there any invalid/unrelated settings?
Another server
You can try to spin-up another server (i.e. VM at DigitalOcean), deploy database there and load it with Gatling.
Maybe your server have some invalid configuration?
Try to get rid of everything, that can be cause of the problem, to make it easier to find a problem.

Windows Phone Background Agents

Question on agents: I specifically want to create a Periodic Task, but only want to run it once every day, say 1am, not every 30 minutes which is the default. In the OnInvoke, do I simply check for the hour, and run it only if current hour matches that desired hour.
But on the next OnInvoke call, it will try to run again in 30 minute, maybe when it's 1:31am.
So I guess I'd use a stored boolean in the app settings to mark as "already run for today" or similar, and then check against that value?
If you specifically want to run a custom action at 1 am, i'm not sure that a single boolean would be enough to make it work.
I guess that you plan to reset your boolean at 1:31 to prepare the execution of the next day, but what if your periodic task is also called at 1h51 (so called more than 2 times between 1am and 2am).
How could this happen? Well maybe this could happen if the device is reboot but i'm not quiet sure about it. In any case, storing the last execution datetime somewhere and comparing it to the current one can be a safer way to ensure that your action is only invoked once per day.
One question remains : Where to store your boolean or datetime (depending which one you'll pick)?
AppSetting does not seem to be a recommanded place according msdn :
Passing information between the foreground app and background agents
can be challenging because it is not possible to predict if the agent
and the app will run simultaneously. The following are recommended
patterns for this.
For Periodic and Resource-intensive Agents: Use LINQ 2 SQL or a file in isolated storage that is guarded with a Mutex. For
one-direction communication where the foreground app writes and the
agent only reads, we recommend using an isolated storage file with a
Mutex. We recommend that you do not use IsolatedStorageSettings to
communicate between processes because it is possible for the data to
become corrupt.
A simple file in isolated storage should get the job done.
If you're going by date (once per day) and it's valid that the task can run at 11pm on a day and 1am the next, then after the agent has run you could store the current date (forgetting about time). Then whenever the agent runs again in 30 minutes, check if the date the task last ran is the same as the current date.
protected override void OnInvoke(ScheduledTask task)
{
var lastRunDate = (DateTime)IsolatedStorageSettings.ApplicationSettings["LastRunDate"];
if(DateTime.Today.Subtract(lastRunDate).Days > 0)
{
// it's a greater date than when the task last ran
// DO STUFF!
// save the date - we only care about the date part
IsolatedStorageSettings.ApplicationSettings["LastRunDate"] = DateTime.Today;
IsolatedStorageSettings.ApplicationSettings.Save();
}
NotifyComplete();
}

Does last_insert_id return the correct auto_increment id in a multiprocessing environment?

Here's some simplified code in my web application:
sub insert {
my $pid = fork();
if ($pid > 0) {
return;
}
else {
&insert_to_mysql();
my $last_id = &get_last_inserted(); # call mysql last_inserted_id
exit(0);
}
}
for my $i (1..10) {
&insert();
}
Since insert is called in a multiprocessing environment, the order of get_last_inserted might be uncertain. Will it always return a correct last id corresponding to insert_to_mysql subroutine? I read some documents saying that as long as the processes don't share the same mysql connection, the returned id will be always the right one. However, these processes are spawned from the same session, so I'm not sure if they share the mysql connection or not. Thanks in advance.
these processes are spawned from the same session
Are you saying you're forking and using the same connection in more than one process? That doesn't work at all, never mind LAST_INSERT_ID(). You can't have two processes reading and writing from the same connection! The response for one could end up in the other, assuming the two clients didn't clobber each other's request.
Does last_insert_id return the correct auto_increment id in a multiprocessing environment?
According to MySQL's documentation for LAST_INSERT_ID(),
The ID that was generated is maintained in the server on a per-connection basis.
It would be useless otherwise. Since connections can't shared across processes, yes, it's safe.
I don't know about MySql and perl, but in PHP that's quite the same issue, since it depends on the environment and not on the language. In PHP, last_insert_id expects one parameter: current connection! As long as multiple instances do not share the same connection ressource, passing the connection resource to the current mysql session should do the trick.
That's what I've found googling around: http://www.xinotes.org/notes/note/179/

Uncommitted transactions in Plone + SqlAlchemy + MySql

We have a hybrid web application integrating a MySql db with Plone (last upgrade was to Plone 4.0), using collective.tin, collective.lead and SqlAlchemy.
Ok, I know that collective.tin never was released and collective.lead has been superseded; however all things work (almost) perfectly since a few years.
Recently we experienced a very strange behaviour and are looking for help in order to understand it.
Among others, we have 2 Plone content types, say A and B, defined by subclassing collective.tin, and the corresponding innodb MySql tables; rows of B have a foreign key towards A.
In the time span of 15-20 minutes, 2 different users created 3 A objects and some 10-20 B objects that weren't committed to MySql but were indexed by Plone; queries I executed with a MySql client from the linux shell weren't able to find those A rows (didn't look for B rows); however, queries executed through the web application (the aforementioned components stack) by those 2 users, and also by other users, occasionally were still finding and correctly visualizing some of those 3 A objects.
Only after I restarted the Zope instance, it was possible to resume normal activity from the Plone web interface; 3 A rows and many B rows were still missing from the MySql db, but the autoincrement counter showed the expected increment; I had to remove 3 invalid brains for A objects from the Plone index (didn't worry for B objects).
Any suggestion on possible causes and on how to investigate the problem?
We had the exact same problem with sqlalchemy 0.4; the session would get out of sync with the actual database contents. The problem was somewhat masked in our case because users were sent to specific backends in the cluster through session affinity. If the affinity was lost suddenly messages had disappeared. The exact details are a little hazy, because I cannot locate the correct (ancient) revision history of the fix I put in place.
From what I can glean from context is that the session identity map prevents the session from requiring the database for objects it retrieved before. It thus won't see changes made to these objects in different sessions.
The fix is to call .expire_all() on the session after each and every commit or rollback; SQLAlchemy 0.5 and up does this automatically (autoexpire=True on the session, now called expire_on_commit I believe), but for 0.4 you'll need to register a SessionExtension to do this for you.
Lucky for you, we also use collective.lead for this project, so my fix is your fix:
# The identity map should be flushed on commit.
# SQLAlchemy 0.5 does this properly, but in 0.4 we need to do this via
# a SesssionExtension.
from sqlalchemy import __version__
if __version__[:3] == '0.4':
from sqlalchemy.orm.session import SessionExtension
class ExpireAllSessionExtension(SessionExtension):
def after_commit(self, session):
"""Expire the identity-map on commit"""
session.expire_all()
def after_rollback(self, session):
"""Expire the identity-map on rollback"""
session.expire_all()
def installExtension():
# Patch collective.lead.database to let us install the extension
# on the session created there.
from collective.lead.database import Database
old_session = Database.session.fget
def session(self):
session = old_session(self)
if session.extension is None:
session.extension = ExpireAllSessionExtension()
return session
Database.session = property(session)
else:
def installExtension():
pass
When defining the mapper, you install this extension with:
from .sessionexpiration import installExtension
# Ensure that sessions get properly expired on commit and rollback.
installExtension()

Reading large files using SQLalchemy

I am trying to read a 200 MB csv file using SQLalchemy. Each line has about 30 columns, of which, I use only 8 columns using the code below. However, the code runs really slow! Is there a way to improve this? I would like to use map/list comprehension or other techniques. As you an tell, I am a newbie. Thanks for your help.
for ddata in dread:
record = DailyData()
record.set_campaign_params(pdata) #Pdata is assigned in the previous step
record.set_daily_data(ddata) #data is sent to a class method where only 8 of 30 items in the list are used
session.add(record)
session.commit() #writing to the SQL database.
don't commit on every record. commit or just flush every 1000 or so:
for i, data in enumerate(csv_stuff):
rec = MyORMObject()
rec.set_stuff(data)
session.add(rec)
if i % 1000 == 0:
session.flush()
session.commit() # flushes everything remaining + commits
if that's still giving you problems then do some basic profiling, see my post at How can I profile a SQLAlchemy powered application?