SQL: insert row with expiration - mysql

Is there a way to insert a row into SQL with an expiration (cf. Memcached, where you can insert a new key that expires in a minute)?
The context is that I want an integration test to insert rows into a database, but I'd prefer not deleting them myself, as it's shared by many. Those delete queries must be manual, or they may not be run, or they may have disastrous typos, etc. I'd prefer the system to do it for me if it can (i.e. automatically and efficiently and well-tested).
(I assume this is not part of the SQL standard and the answer is no.)
related: SQL entries that expire after 24 hours
related: What is the best way to delete old rows from MySQL on a rolling basis?
CONTEXT: I can't make any changes to the database schema, or any of the associated infrastructure.

If you were doing unit testing, I would suggest wrapping each unit test in a BEGIN TRAN / ROLLBACK.
Since you are doing integration testing, you probably need the data to live outside the scope of a single transaction. SQL Agent would work fine here, except that it would not distinguish between test data and real data. However, you could get around this by tagging the test records with an identifier on INSERT, so that only those records are deleted upon expiration. That could be done in a single stored proc.
You might be able to accomplish this by using SQL Server Service Broker. I have not worked with the service broker, but maybe there is a way to delay message processing until a specific time has passed.
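The BEGIN TRAN / ROLLBACK suggestion for unit tests can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the `accounts` table and the helper name `run_in_rolled_back_transaction` are made up for the example.

```python
import sqlite3

def run_in_rolled_back_transaction(conn, test_body):
    """Run test_body(conn) inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        return test_body(conn)
    finally:
        # Undo everything the test wrote, whether it passed or raised.
        conn.execute("ROLLBACK")

# Hypothetical table, created here only so the sketch is self-contained.
# isolation_level=None puts sqlite3 in autocommit mode; we issue BEGIN ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")

def sample_test(c):
    c.execute("INSERT INTO accounts (name) VALUES ('test-user')")
    return c.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]

rows_seen_inside = run_in_rolled_back_transaction(conn, sample_test)
rows_left_behind = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

The test sees its own row while it runs, but nothing is left in the shared table afterwards. As the answer notes, this only helps when the test data does not need to outlive one transaction.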

Add an expiration date column to your table(s). Create a job that will delete data that is past expiration on some schedule (say nightly).
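A minimal sketch of that idea, assuming you are allowed to add a column. It uses Python's sqlite3 as a stand-in for the real server; the table `test_rows`, the column `expires_at` and both function names are hypothetical, and the current time is passed in explicitly so the behaviour is easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_rows ("
    " id INTEGER PRIMARY KEY,"
    " payload TEXT,"
    " expires_at INTEGER NOT NULL)"  # expiry as seconds since some epoch
)

def insert_with_ttl(conn, payload, ttl_seconds, now):
    """Insert a row that should be considered expired after ttl_seconds."""
    conn.execute(
        "INSERT INTO test_rows (payload, expires_at) VALUES (?, ?)",
        (payload, now + ttl_seconds),
    )
    conn.commit()

def purge_expired(conn, now):
    """The scheduled job: delete rows past expiry, return how many were removed."""
    cur = conn.execute("DELETE FROM test_rows WHERE expires_at <= ?", (now,))
    conn.commit()
    return cur.rowcount

insert_with_ttl(conn, "short-lived", 60, now=1000)
insert_with_ttl(conn, "long-lived", 3600, now=1000)
purged = purge_expired(conn, now=1100)
remaining = [r[0] for r in conn.execute("SELECT payload FROM test_rows")]
```

On a real server the purge would be run by whatever scheduler is available (cron, an agent job, a database event) rather than called by hand.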

Related

Relational DB racing conditions

I'm working with Ruby On Rails (but it doesn't really matter) with a SQL backend, either MySQL or Postgres.
The web application will be multi-process, with a cluster of app-server processes running and working on the same DB.
I was wondering: is there any good and common strategy to handle race conditions?
Since it's going to be a DB-intense application, I can easily see how two clients can try to modify the same data at the same time.
Let's simplify the situation:
Two clients/users GET the same data, it doesn't matter if this happens at the same time.
They are served with two web pages representing the same data.
Later both of them try to write some incompatible modifications to the same record.
Is there a simple way to handle this kind of situation?
I was thinking of using id-tokens associated with each record. These tokens would be changed upon updates of the records, thus invalidating any subsequent update attempt based on stale data (old expired token).
Is there a better way? Maybe something already built in MySQL?
I'm also interested in coding patterns used in these cases.
thanks
Optimistic locking
The standard way to handle this in webapps is to use what's referred to as "optimistic locking".
Each record has a unique ID and an integer (or timestamp, but integer is better) optimistic lock field. This oplock field is initialized to 0 on record creation.
When you get the record you get the oplock field with it.
When you set the record you set the oplock value to the oplock you retrieved with the SELECT plus one and you make the UPDATE conditional on the oplock value still being what it was when you last looked:
UPDATE thetable
SET field1 = ...,
    field2 = ...,
    oplock = 1
WHERE record_id = ...
  AND oplock = 0;
If you lost a race with another session this statement will still succeed but it will report zero rows affected. That allows you to tell the user their change collided with changes by another user or to merge their changes and re-send, depending on what makes sense in that part of the app.
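Here is a small runnable sketch of that read / conditional-update cycle. It uses Python's sqlite3 as a stand-in for MySQL or PostgreSQL; the table and column names follow the UPDATE above, while `read_record` and `update_record` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thetable ("
    " record_id INTEGER PRIMARY KEY,"
    " field1 TEXT,"
    " oplock INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO thetable (record_id, field1) VALUES (1, 'original')")
conn.commit()

def read_record(conn, record_id):
    """Fetch the data together with the oplock value seen at read time."""
    return conn.execute(
        "SELECT field1, oplock FROM thetable WHERE record_id = ?", (record_id,)
    ).fetchone()

def update_record(conn, record_id, new_value, seen_oplock):
    """Write only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE thetable SET field1 = ?, oplock = ?"
        " WHERE record_id = ? AND oplock = ?",
        (new_value, seen_oplock + 1, record_id, seen_oplock),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows affected means we lost the race

# Two users load the same page, then both try to save.
_, oplock_a = read_record(conn, 1)
_, oplock_b = read_record(conn, 1)
first_won = update_record(conn, 1, "edit from A", oplock_a)
second_won = update_record(conn, 1, "edit from B", oplock_b)
```

The second save reports failure without raising an error, so the application decides whether to show a conflict message or to re-read and merge.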
Many frameworks provide tooling to help automate this, and most ORMs can do it out of the box. Ruby on Rails supports optimistic locking.
Be careful when combining optimistic locking with pessimistic locking (as described below) for traditional applications. It can work, you just need to add a trigger on all optimistically lockable tables that increments the oplock column on an UPDATE if the UPDATE statement didn't do so itself. I wrote a PostgreSQL trigger for Hibernate oplock support that should be readily adaptable to Rails. You only need this if you're going to update the DB from outside Rails, but in my view it's always a good idea to be safe.
Pessimistic locking
The more traditional approach to this is to begin a transaction and do a SELECT ... FOR UPDATE when fetching a record you intend to modify. You then hold the transaction open and idle while the user ponders what they're going to do and issue the UPDATE on the already-locked record before COMMITting.
This doesn't work well and I don't recommend it. It requires an open, often idle transaction for each user. This can cause problems with MVCC row cleanup in PostgreSQL and can cause locking problems in applications. It's also very inefficient for large applications with high user counts.
Insert races
Dealing with races on INSERT requires you to have a suitable application level unique key on the table, so inserts fail when they conflict.
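As a rough illustration, the sketch below lets the database reject the losing insert and turns the constraint violation into a normal return value. It uses Python's sqlite3 as a stand-in; the `users` table, its unique `email` column and `try_register` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE)"  # the application-level unique key
)

def try_register(conn, email):
    """Return True if we inserted the row, False if someone else got there first."""
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The unique key rejected the duplicate; undo and report the conflict.
        conn.rollback()
        return False

first = try_register(conn, "a@example.com")
second = try_register(conn, "a@example.com")
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Checking with a SELECT before inserting would not be enough on its own, because another session can insert between the check and the write; the constraint is what closes that gap.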

SQL Server 2008 - How to implement a "Watch Dog Service" which woofs when too many insert statements on a table

Like my title describes: how can I implement something like a watchdog service in SQL Server 2008 with following tasks: Alerting or making an action when too many inserts are committed on that table.
For instance: Error table gets in normal situation 10 error messages in one second. If more than 100 error messages (100 inserts) in one second then: ALERT!
Would appreciate it if you could help me.
P.S.: No. SQL Jobs are not an option because the watchdog should be live and woof on the fly :-)
Integration Services? Are there easier ways to implement such a service?
Kind regards,
Sani
I don't understand your problem exactly, so I'm not entirely sure whether my answer actually solves anything or just makes an underlying problem worse. Especially if you are facing performance or concurrency problems, this may not work.
If you can update the original table, just add a datetime2 field like
InsertDate datetime2 NOT NULL DEFAULT GETDATE()
Preferably, index that column, and then at whatever interval fits, poll the table by counting how many rows have an InsertDate > GETDATE() - X.
For this particular case, you might benefit from making the polling process read uncommitted (or use WITH (NOLOCK)), although one has to be careful when doing so.
If you can't modify the table itself and you can't or won't make another process or job monitor the relevant variables, I'd suggest the following:
Make a 'counter' table that just has one Datetime2 column.
On the original table, create an AFTER INSERT trigger that:
Deletes all rows where the datetime-field is older than X seconds.
Inserts one row with current time.
Counts to see if too many rows are now present in the counter-table.
Acts if necessary, e.g. by executing a procedure that will signal the sender, throw an exception, send mail, or whatever.
If you can modify the original table, add the datetime column to that table instead and make the trigger count all rows that aren't yet X seconds old, and act if necessary.
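The counter-table steps above can be sketched like this. It is not a SQL Server trigger: Python's sqlite3 plays the part of the database, `record_insert` stands in for the trigger body, and the names `insert_counter`, `WINDOW_SECONDS` and `THRESHOLD` are invented for the example. The timestamp is passed in so the logic can be exercised deterministically.

```python
import sqlite3

WINDOW_SECONDS = 1.0  # the "X seconds" from the steps above
THRESHOLD = 100       # alert when more than this many inserts fall in the window

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE insert_counter (inserted_at REAL NOT NULL)")

def record_insert(conn, now):
    """Mimic the AFTER INSERT trigger; return True if the watchdog should alert."""
    # 1. Drop rows that have fallen out of the window.
    conn.execute(
        "DELETE FROM insert_counter WHERE inserted_at < ?",
        (now - WINDOW_SECONDS,),
    )
    # 2. Record the current insert.
    conn.execute("INSERT INTO insert_counter (inserted_at) VALUES (?)", (now,))
    # 3. Count what is left and compare with the threshold.
    count = conn.execute("SELECT COUNT(*) FROM insert_counter").fetchone()[0]
    conn.commit()
    return count > THRESHOLD

# 101 inserts within about a tenth of a second: only the last one trips the alarm.
alerts = [record_insert(conn, 10.0 + i * 0.001) for i in range(101)]
quiet_later = record_insert(conn, 20.0)
```

In a real deployment the "act" step would call whatever notification mechanism is available instead of returning a flag.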
I would also look into getting another process (e.g. a SQL job, a homemade service or similar) to do all the housekeeping, i.e. deleting old rows, counting rows and acting on the count. Keeping this as the work of the trigger is not a good design and will probably cause problems in the long run.
Update: A better solution will probably be to make the trigger insert notifications (i.e. datetimes) into a queue - if you then have something listening against that queue, you can write logic to determine whether your threshold has been exceeded. However, that will require you to move some of your logic to another process, which I initially understood was not an option.

Set eventual consistency (late commit) in MySQL

Consider the following situation: You want to update the number of page views of each profile in your system. This action is very frequent, as almost all visits to your website result in a page view increment.
The basic way is update Users set page_views=page_views+1. But this is totally not optimal because we don't really need instant update (1 hour late is ok). Is there any other way in MySQL to postpone a sequence of updates, and make cumulative updates at a later time?
I myself tried another method: storing a counter (# of increments) for each profile. But this also results in handling a few thousands of small files, and I think that the disk IO cost (even if a deep tree-structure for files is applied) would probably exceed the database.
What is your suggestion for this problem (other than MySQL)?
To improve performance you could store your page view data in a MEMORY table - this is super fast but temporary, the table only persists while the server is running - on restart it will be empty...
You could then create an EVENT to update a table that will persist the data on a timed basis. This would help improve performance a little with the risk that, should the server go down, only the number of visits since the last run of the event would be lost.
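The same accumulate-then-flush pattern can be sketched outside MySQL. In this illustration a Python `Counter` plays the role of the MEMORY table and `flush_views` plays the role of the scheduled EVENT; sqlite3 stands in for the persistent table, and the function names are hypothetical.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (id INTEGER PRIMARY KEY,"
    " page_views INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO Users (id) VALUES (?)", [(1,), (2,)])
conn.commit()

# In-memory tally; like the MEMORY table, it is lost if the process dies.
pending_views = Counter()

def record_view(profile_id):
    """Cheap per-request work: bump a counter, touch no disk."""
    pending_views[profile_id] += 1

def flush_views(conn):
    """Periodic job: apply all pending increments in one transaction."""
    batch = list(pending_views.items())
    pending_views.clear()
    conn.executemany(
        "UPDATE Users SET page_views = page_views + ? WHERE id = ?",
        [(count, profile_id) for profile_id, count in batch],
    )
    conn.commit()
    return len(batch)

for pid in (1, 1, 2, 1):
    record_view(pid)
profiles_updated = flush_views(conn)
totals = conn.execute("SELECT id, page_views FROM Users ORDER BY id").fetchall()
```

Four page views turn into two UPDATE statements here; with real traffic the ratio is what makes the delayed write worthwhile.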
The link posted by James via the comment to your question, wherein lies an accepted answer with another comment about memcached was my first thought also. Just store the profileIds in memcached then you could set up a cron to run every 15 minutes and grab all the entries then issue the updates to MySQL in a batch, but there are a few things to consider.
1. When you run the batch script to grab the ids out of memcached, you will have to ensure you remove all entries which have been parsed, otherwise you run the risk of counting the same profile views multiple times.
2. Since memcached doesn't support wildcard searching via keys, and you will have to purge existing keys for the reason stated in #1, you will probably have to set up a separate memcached server pool dedicated to the sole purpose of tracking profile ids, so you don't end up purging cached values which have no relation to profile view tracking. However, you could avoid this by storing the profileId and a timestamp within the value payload, then having your batch script step through each entry and check the timestamp: if it's within the time range you specified, add it to the queue to be updated, and once you hit the upper limit of your time range, the script stops.
Another option may be to parse your access logs. If user profiles are in a known location like /myapp/profile/1234, you could parse for this pattern and add profile views this way. I ended up having to go this route for advertiser tracking, as it ended up being the only repeatable way to generate billing numbers. If they had any billing disputes, we would offer to send them the access logs so they could parse them for themselves.

Alternative to timestamp column for syncing purpose

I have a local mysql db and a production mysql db. I make changes locally and use a third party tool to sync the changes to the live server. The tool uses the checksum feature to identify the rows changed. My db structure is simple, one varchar200 field (acts as primary key), and a text field.
The problem is the sync is taking ages since there are thousands of rows. I believe adding a timestamp field will help in getting the checksum quickly for the tool to identify the rows to be synced. This created further problems, as the timestamp field is different on the local and prod servers due to the timezone differences.
I am looking for a useful idea or an alternative to timestamp that gets changed when a row is modified.
PS: I posted a similar question but didn't get any useful answers. I don't want to rely on additional tables.
My tip: Don't use the TIMESTAMP datatype, use DATETIME. They hold the same kind of data, but the difference is that an auto-updating TIMESTAMP column (historically the default behavior for the first TIMESTAMP column in a MySQL table) is set to "now" every time you touch the row, including on insertion, even if you don't set that column.
This means when you use TIMESTAMP, you can never truly synch the two databases - that column will always be different. If you use DATETIME, you can preserve that column's data.
If you can't code your applications to update the DATETIME column with "now", simply create a trigger that will do it for you.
You could do several things:
Add a "dirty" column to the source table. Make it a single BIT that you flip when the row gets changed and flip back when it gets synced. If the row id is a primary key, this is a simple insert ... on duplicate key update.
Store all your times as GMT, so there is no more fighting over timezones. This is standard practice anywhere time is being stored anyway.
Set up replication between the two servers so MySQL will do the copying/updating for you. This is precisely what it's designed for, and it works well.
Timestamping the rows is in fact a bad idea: by the time the client updates its own latest syncing time, many rows may have been updated on the server, and you may miss these. Instead, use a counter that increases by one every time you add or modify rows on the server; the client updates itself and gets the latest value of the counter. The client may not get the very last value of the counter (e.g. some row gets updated while the client is requesting the update), but it's guaranteed to catch up at the next update.
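A rough sketch of the counter approach, using Python's sqlite3 in place of MySQL. The `items` table, its `version` column and the two helpers are hypothetical; the next version is derived from the current maximum so that no extra table is needed, which is an assumption of this example and would need proper locking under concurrent writers.

```python
import sqlite3

server = sqlite3.connect(":memory:")
server.execute(
    "CREATE TABLE items ("
    " k TEXT PRIMARY KEY,"
    " v TEXT,"
    " version INTEGER NOT NULL)"  # value of the change counter at last write
)

def upsert(conn, key, value):
    """Add or modify a row and stamp it with the next counter value."""
    next_version = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM items"
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO items (k, v, version) VALUES (?, ?, ?)",
        (key, value, next_version),
    )
    conn.commit()

def changes_since(conn, last_seen):
    """Return rows changed after last_seen and the counter value to remember."""
    rows = conn.execute(
        "SELECT k, v, version FROM items WHERE version > ? ORDER BY version",
        (last_seen,),
    ).fetchall()
    new_last_seen = rows[-1][2] if rows else last_seen
    return rows, new_last_seen

upsert(server, "a", "a1")
upsert(server, "b", "b1")
first_batch, seen = changes_since(server, 0)
upsert(server, "a", "a2")
second_batch, seen_after = changes_since(server, seen)
```

Because the client remembers a counter value rather than a wall-clock time, timezone differences between the two servers no longer matter.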

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying the database while I'm rebuilding it. The amount of data isn't small (couple hundred megabytes), so it won't load that instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem, and was wondering what the best method would be. Here are my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
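For illustration, here is the load-then-rename sequence as a small runnable sketch. It uses Python's sqlite3, which (like PostgreSQL) allows ALTER TABLE inside a transaction; the column layout and the helper name `swap_in_new_table` are assumptions, and the intermediate name `foo_old` is made up.

```python
import sqlite3

# Autocommit mode; the transaction around the rename is opened explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO foo (val) VALUES ('old')")

def swap_in_new_table(conn, new_rows):
    """Load new_rows into foo_new, then replace foo with it atomically."""
    # Slow part: readers keep using foo while this runs.
    conn.execute("DROP TABLE IF EXISTS foo_new")
    conn.execute("CREATE TABLE foo_new (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO foo_new (val) VALUES (?)", new_rows)
    # Fast part: the rename happens in its own transaction.
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE foo RENAME TO foo_old")
        conn.execute("ALTER TABLE foo_new RENAME TO foo")
        conn.execute("DROP TABLE foo_old")
        conn.execute("COMMIT")
    except Exception:
        # If any step fails, foo and foo_new are left as they were.
        conn.execute("ROLLBACK")
        raise

swap_in_new_table(conn, [("new1",), ("new2",)])
current = [r[0] for r in conn.execute("SELECT val FROM foo ORDER BY id")]
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
```

Foreign keys pointing at foo are not handled here, which is the messy part the answer warns about.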
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
DELETE FROM the_table;         -- remove the previous week's rows
INSERT INTO the_table ...;     -- load the new rows (columns and values elided)
COMMIT;
Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data, anything afterwards will run on the new data. The database will actually clear the old table once the last user is done with it. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about any lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it, and SQL Server calls it "snapshotting," and I can't remember the details off the top of my head since I rarely use the thing.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
Which database server are you using? SQL Server 2005 and above provides an isolation level called "Snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot isolation would be perfect in your case.
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
Several database systems (since you didn't specify yours, I'll keep this general) do offer the SQL:2003 Standard statement called MERGE which will basically allow you to
insert new rows into a target table from a source which don't exist there yet
update existing rows in the target table based on new values from the source
optionally even delete rows from the target that don't show up in the import table anymore
SQL Server 2008 is the first Microsoft offering to have this statement - check out more here, here or here.
Other database systems will probably have similar implementations - it's a SQL:2003 Standard statement after all.
Marc
Use different table names (mytable_[yyyy]_[wk]) and a view to provide a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table.
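A minimal sketch of the dated-table-plus-view idea, using Python's sqlite3 as a stand-in for the real server. The column layout and the helper name `publish_week` are assumptions; only the naming scheme mytable_[yyyy]_[wk] and the view name mytable come from the suggestion above.

```python
import sqlite3

# Autocommit mode; the view switch is wrapped in an explicit transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

def publish_week(conn, year, week, rows):
    """Import rows into a new dated table, then repoint the view at it."""
    # int() keeps the generated name safe to interpolate into SQL.
    table = f"mytable_{int(year):04d}_{int(week):02d}"
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany(f"INSERT INTO {table} (val) VALUES (?)", rows)
    # Readers only ever query the view, so this is the moment of changeover.
    conn.execute("BEGIN")
    try:
        conn.execute("DROP VIEW IF EXISTS mytable")
        conn.execute(f"CREATE VIEW mytable AS SELECT * FROM {table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return table

publish_week(conn, 2024, 1, [("a",)])
latest = publish_week(conn, 2024, 2, [("b",), ("c",)])
visible = [r[0] for r in conn.execute("SELECT val FROM mytable ORDER BY id")]
old_rows = conn.execute("SELECT COUNT(*) FROM mytable_2024_01").fetchone()[0]
```

The previous week's table stays around after the switch, so rolling back to it is just a matter of repointing the view again; dropping old tables is left to a separate cleanup step.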