SQL Server 2008 - How to implement a "Watch Dog Service" which woofs when too many insert statements on a table

As my title describes: how can I implement something like a watchdog service in SQL Server 2008 that performs the following task: alerting or taking some action when too many inserts are committed on a table.
For instance: in a normal situation the error table gets 10 error messages (10 inserts) per second. If there are more than 100 error messages (100 inserts) in one second: ALERT!
Would appreciate it if you could help me.
P.S.: No, SQL Agent jobs are not an option because the watchdog should be live and woof on the fly :-)
Integration Services? Are there easier ways to implement such a service?
Kind regards,
Sani

I don't understand your problem exactly, so I'm not entirely sure whether my answer actually solves anything or just makes an underlying problem worse. Especially if you are facing performance or concurrency problems, this may not work.
If you can update the original table, just add a datetime2 field like
InsertDate datetime2 NOT NULL DEFAULT GETDATE()
Preferably, index that column, and then, at whatever interval fits, poll the table to see how many rows have an InsertDate within the last X seconds.
For this particular case, you might benefit from making the polling process read uncommitted (or use WITH (NOLOCK)), although one has to be careful when doing so.
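For instance, a minimal polling sketch (the ErrorTable name and the 10-second window are illustrative, not from the question):
-- Hypothetical polling query; the table name and window size are assumptions.
SELECT COUNT(*) AS recent_inserts
FROM ErrorTable WITH (NOLOCK)  -- dirty read, per the caveat above
WHERE InsertDate > DATEADD(SECOND, -10, GETDATE());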
If you can't modify the table itself and you can't or won't make another process or job monitor the relevant variables, I'd suggest the following:
Make a 'counter' table that just has one Datetime2 column.
On the original table, create an AFTER INSERT trigger that:
Deletes all rows where the datetime-field is older than X seconds.
Inserts one row with current time.
Counts to see if too many rows are now present in the counter-table.
Acts if necessary, e.g. by executing a procedure that will signal a sender / throw an exception / send mail / whatever; see the sketch below.
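A hedged sketch of that trigger, assuming the monitored table is called ErrorTable and the threshold is 100 rows per second (the names and the alerting procedure are illustrative, not from the question):
-- Counter table holding one timestamp per insert (names are assumptions).
CREATE TABLE ErrorRateCounter (InsertedAt datetime2 NOT NULL);
GO
CREATE TRIGGER trg_ErrorTable_Watchdog
ON ErrorTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- 1. Delete counter rows older than the one-second window.
    DELETE FROM ErrorRateCounter
    WHERE InsertedAt < DATEADD(SECOND, -1, SYSDATETIME());
    -- 2. Insert one counter row per newly inserted error message.
    INSERT INTO ErrorRateCounter (InsertedAt)
    SELECT SYSDATETIME() FROM inserted;
    -- 3. Count, and act if the threshold is exceeded.
    IF (SELECT COUNT(*) FROM ErrorRateCounter) > 100
        EXEC dbo.RaiseWatchdogAlert;  -- hypothetical alerting procedure
END;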
If you can modify the original table, add the datetime column to that table instead and make the trigger count all rows that aren't yet X seconds old, and act if necessary.
I would also look into getting another process (e.g. a SQL Agent job or a homemade service) to do all the housekeeping, i.e. deleting old rows, counting them and acting on the result. Keeping all of this as the work of the trigger is not a good design and will probably cause problems in the long run, so if at all possible, have some other process do the housekeeping.
Update: A better solution would probably be to make the trigger insert notifications (i.e. datetimes) into a queue; if you then have something listening on that queue, you can write logic there to determine whether your threshold has been exceeded. However, that requires moving some of your logic to another process, which I initially understood was not an option.

Related

Optimising "NOT IN(...)" query for millions of rows

Note: I do not have access to the source code/database to which this question pertains. The two tables in question are located on different servers.
I'm working with a 3rd-party company that has systems integrated with our own. They have a query that runs something like this:
DELETE FROM table WHERE column NOT IN(1,2,3,4,5,.....3 000 000)
It's pretty much referencing around 3 million values in the NOT IN.
I'm trying to point out that this seems like an inefficient method for deleting multiple rows while keeping all the ones listed in the query. The problem is that, as I don't have access to the source code/database myself, I'm not totally sure what to suggest as a solution.
I know the idea of this query is to keep a target server synced up with a source server. So if a row is deleted on the source server, the target server will reflect that change when this (and other) query is run.
With this limited knowledge, what possible suggestions could I present to them?
The first thing that comes to mind is having some kind of flag column that indicates whether a row has been deleted or not. When the sync script runs, it would first perform an update on the target server for all rows marked as deleted (or an insert for new rows), then a second query would delete all rows marked for deletion. A rough sketch of the idea is below.
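A rough sketch of that flag approach, with a hypothetical is_deleted column and illustrative id values (none of these names come from the actual system):
-- Step 1: the sync script flags rows that were deleted on the source.
UPDATE target_table SET is_deleted = 1
WHERE id IN (101, 102, 103);  -- illustrative ids reported as deleted by the source
-- Step 2: a separate query purges the flagged rows.
DELETE FROM target_table WHERE is_deleted = 1;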
Is there a more logical way to do something like this, bearing in mind that complete overhauls of the functionality are out of the question? Only small tweaks to the current process will be possible, for a number of reasons.
Instead of
DELETE FROM your_table
WHERE column NOT IN(1,2,3,4,5,.....3 000 000)
you could do
DELETE t1
FROM your_table t1
LEFT JOIN table_where_the_ids_come_from t2 ON t1.column = t2.id
WHERE t2.id IS NULL;
I know the idea of this query is to keep a target server synced up with a source server. So if a row is deleted on the source server, the target server will reflect that change when this (and other) query is run.
I know this is obvious, but why don't these two servers stay in sync using replication? I'm guessing it's because aside from this one table, they don't have identical data.
If out-of-the-box replication isn't flexible enough, you could use a change-data capture tool.
The idea is that the tool monitors changes in a MySQL binary log stream, and reacts to them. The reaction is user-defined, and it can include applying the same change to another MySQL instance, which would keep them in sync.
Here's a blog post that shows how to use Maxwell, one of the open-source CDC tools, released by Zendesk:
https://www.percona.com/blog/2016/09/13/mysql-cdc-streaming-binary-logs-and-asynchronous-triggers/
A couple of advantages of this approach:
No need to re-sync the whole table. You'd only apply incremental changes as they occur.
No need to schedule re-syncs daily or whatever. Since incremental changes are likely to be small, you could apply the changes nearly immediately.
Deleting a large number of rows will take a huge amount of time. It is likely to require a full table scan. As it finds rows to delete, it will stress the undo/redo log. It will clog replication (if you are using it). Etc.
How many rows do you expect to delete?
Better would be to break the list up into chunks of 1000. (This applies whether you are using IN (list of constants) or a JOIN.) But since you are doing NOT IN, it gets stickier. Possibly the best way is to copy over what you want to keep:
CREATE TABLE new LIKE real;
INSERT INTO new
    SELECT * FROM real WHERE id IN (...);  -- without NOT
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
I go into the details of chunking, partitioning, and other techniques in Big Deletes.

MySQL/MariaDB Trigger for Taking Each Executed Query and Inserting It into a Row

So one of the projects I'm working on requires us to take every query that is run on the server and automatically insert that query into a table inside the database. The reason for this is so that the DBA is able to view all prior SQL queries that have been run on the box. Unfortunately I don't have any leeway to do this differently, as the client is requiring this implementation.
Has anybody done this before or has any code that I could use that will automatically do this? Thanks.
Be careful! If you do an INSERT for every action taken, you will need to do an INSERT for that INSERT, at which point, you will ...
That is, the first logged query will hang the server and fill up the disk!
Instead of doing the task the way it is asked, turn on the "general log" and periodically scrape what is in it onto another machine, which does not have this logging turned on.
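A minimal sketch of that, assuming you have privileges to set global variables (writing the log to a table is one option; FILE is the default target):
-- Send the general query log to the mysql.general_log table and enable it.
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';
-- A scraper could then periodically read and archive recent entries:
SELECT event_time, user_host, argument
FROM mysql.general_log
WHERE event_time > NOW() - INTERVAL 5 MINUTE;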
Other arguments against the task as stated...
Before MySQL 5.7, a table could have only one trigger per action and timing, so if the table already has an AFTER INSERT trigger, you will not be able to add another.
If "every query" really means every, it is impossible with a TRIGGER, since you can't write a trigger that fires on SELECT or SHOW.
"As the client is requiring this implementation": I would approach this unreasonable constraint by politely finding out what the real goal is. The client has only described an implementation.
If the goal is some kind of audit log, then my suggestion about the general log should suffice.

Two MySQL requests at the same time - Performance issue

I have a MySQL server with many InnoDB tables.
I have a background script that does A LOT of deletes/inserts with one request: it deletes many millions of rows from table2, then inserts many millions of rows into table2 using data from table1:
INSERT INTO table2 (date)
SELECT date FROM table1 GROUP BY date;
(The actual query is more complex, but this shows the kind of request I am running.)
At the same time, I am going to run a second background script that does about a million INSERT or UPDATE requests, but separately (I mean, I execute one update query, then one insert query, and so on) on table3.
My issue is that when either script runs alone, it is fast: let's say each takes 30 minutes, so 1 hour total. But when the two scripts run at the same time, it is VERY slow, taking around 5 hours instead of 1.
So first, I would like to know what can cause this. Is it because of I/O performance? (Like MySQL is writing to two different tables, so it is slow to switch between the two?)
And how could I fix this? It would be great if I could, for example, pause the big INSERT query while my second background script is running... But I can't find a way to do something like that.
I am not an expert at MySQL administration. If you need more information, please let me know!
Thank you !!
30 minutes for a million INSERTs is not fast. Do you have an index on the date column (or whatever column you are pivoting on)?
Regarding your original question: it's difficult to say much without knowing the details of both your scripts and the table structures, but one possible reason why the scripts run reasonably quickly on their own is that you are doing similar kinds of SELECT queries, which might be cached by MySQL's query cache and reused for subsequent queries. If you run the two scripts in parallel, the SELECTs for one script might not stay in the cache (because two concurrent processes keep sending new queries).
You might want to explicitly disable the cache for queries that you are sure run only once (using the SQL_NO_CACHE modifier) and see if it changes anything. But I'd look into indexing and into your table structure first, because 30 minutes seems extremely slow :) E.g. you might also want to introduce partitioning by date for your tables, if you know that you always select entries from a given period (say, by month). The exact tricks depend on your data.
UPDATE: Another issue might be that both of your scripts work with the same table (table1), and the default transaction isolation level in MySQL is REPEATABLE READ, as far as I recall. So one query may be waiting until the other is done with the table in order to satisfy the isolation level. You might want to lower the transaction isolation level if you are sure that table1 is not changed while the scripts are working on it.
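If you go that route, a one-line sketch (assuming you accept the weaker guarantees of READ COMMITTED for the session running the script):
-- Lower the isolation level for this session only, not server-wide.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;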
You can use the event scheduler to make MySQL launch these queries at different hours of the day; another related Stack Overflow question has an example of how to do it: MySQL Event Scheduler on a specific time everyday. A sketch is below.
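For illustration, a hedged sketch of such an event (the event name and start time are made up; the scheduler itself must also be enabled):
SET GLOBAL event_scheduler = ON;
CREATE EVENT nightly_table2_rebuild    -- hypothetical name
ON SCHEDULE EVERY 1 DAY
STARTS '2020-01-01 03:00:00'           -- illustrative off-peak start time
DO
    INSERT INTO table2 (date)
    SELECT date FROM table1 GROUP BY date;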
Another thing to keep in mind is to use the EXPLAIN plan to see what could be making the query that slow.

How to continuously remove anything older than the newest 10 entries of a MySQL database (possibly in JPQL/JPA)

I'm looking for a way to continuously monitor and delete the oldest entries so that the database never grows beyond a certain size. I'm only interested in, for example, the latest 10 entries, and everything past that number should be deleted. The database is updated through various programs, but the program that does the monitoring and deleting will probably be a Java EE application with JPA. I don't know at which layer of the implementation this will be done: whether MySQL has built-in management that does this, whether I'll have to write a query that does this, or whether there is a feature of Java that can do this.
Edit: I'm using an auto-incremented id that could be used to determine the deletion threshold.
This is a complex problem, because if your table is linked to any other table, you might very well have the latest row in table A referencing a very old row in table B. In this case, although table B's row is very old, you can't delete it without breaking the coherence of your database.
Doing it "continuously" is even harder (read: impossible). I would first:
examine if it's really needed. Disks are cheap, and 10 entries in an enterprise database is really nothing.
implement some purge mechanism and execute it every now and then, when the database is not used by anyone else.
I'll have a stab without knowing anything about your table schema:
-- MySQL has no TOP, and it doesn't allow LIMIT directly inside a NOT IN
-- subquery, hence the derived-table wrapper.
DELETE FROM MyTable
WHERE Id NOT IN (
    SELECT Id FROM (
        SELECT Id FROM MyTable ORDER BY Id DESC LIMIT 10
    ) AS newest
);
This is pretty inefficient to run all the time, and there may be a MySQL-specific trick that does the job more nicely. You'd probably get better performance from limiting your reads to the 10 rows you need, and actually archiving/deleting the extraneous data only periodically.

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now, rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying it while I'm rebuilding it. The amount of data isn't small (a couple hundred megabytes), so it won't load instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem and was wondering what the best method would be. Here are my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
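A minimal sketch of that swap, assuming PostgreSQL and the foo / foo_new names from above:
BEGIN;
ALTER TABLE foo RENAME TO foo_old;   -- keep the old data around for a moment
ALTER TABLE foo_new RENAME TO foo;   -- the freshly loaded table takes over
COMMIT;                              -- both renames become visible atomically
DROP TABLE foo_old;                  -- clean up once you're confident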
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
DELETE FROM my_table;
INSERT INTO my_table /* ...load the new data... */;
COMMIT;
Users will see the changeover instantly when you commit. Any queries started before the commit will run on the old data; anything started afterwards will run on the new data. The database will actually clean up the old row versions once the last user is done with them. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it, and SQL Server calls it snapshot isolation; I can't remember the details off the top of my head since I rarely use it.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
Which database server are you using? SQL Server 2005 and above provide an isolation level called "snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to see the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot isolation would be perfect in your case.
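For reference, a hedged sketch of turning it on (the database and table names are illustrative):
-- Enable snapshot isolation for the database (SQL Server 2005+).
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
-- A reading session that opts in keeps seeing the pre-update data
-- while the weekly update transaction is still in flight:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM MyTable;  -- consistent snapshot, not blocked by the writer
COMMIT;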
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
Several database systems (since you didn't specify yours, I'll keep this general) do offer the SQL:2003 standard statement called MERGE, which will basically allow you to:
insert new rows into a target table from a source which don't exist there yet
update existing rows in the target table based on new values from the source
optionally even delete rows from the target that don't show up in the import table anymore
SQL Server 2008 is the first Microsoft offering to have this statement - check out more here, here or here.
Other database systems will probably have similar implementations - it's a SQL:2003 standard statement, after all. A sketch is below.
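To illustrate, a hedged sketch of a weekly sync with MERGE (SQL Server 2008 syntax; the table and column names are made up):
MERGE INTO live_table AS t            -- hypothetical target table
USING weekly_staging AS s             -- hypothetical freshly loaded source
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET payload = s.payload    -- refresh rows that changed
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, payload) VALUES (s.id, s.payload)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                           -- drop rows missing from this week's feed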
Marc
Use different table names (mytable_[yyyy]_[wk]) and a view to provide a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table; a sketch follows.
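A hedged sketch of that pattern (CREATE OR REPLACE VIEW works in MySQL and PostgreSQL, while SQL Server uses ALTER VIEW instead; all names and columns are illustrative):
-- Import this week's data into a week-stamped table.
CREATE TABLE mytable_2020_w07 (id INT PRIMARY KEY, payload TEXT);
-- ...bulk load mytable_2020_w07 here...
-- Repoint the stable name once the import is complete.
CREATE OR REPLACE VIEW mytable AS
SELECT * FROM mytable_2020_w07;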