A really weird (for me) problem has been occurring lately. In an application that accepts user-submitted data, the following happens at random:
Rows are disappearing from the database table where the user-submitted data is stored.
Please note that there is NO DELETE, DROP, TRUNCATE or other such SQL statement issued against the table; the only statement the application runs is the INSERT.
Could this be a MySQL bug? I did some research on mysql.com (forums, bugs, etc.) and found 2 similar cases, but without a solid answer (just suggestions).
Some info you might find useful:
Storage Engine: InnoDB
User Submitted Data sanitized and checked for SQL Injection attempts
Appreciate any suggestions, info.
regards,
Here are 3 possibilities:
The data never got to the database in the first place. Something happened elsewhere so the data disappeared. Maybe intermittent network issues, an overloaded server, or an application bug.
A database transaction was not committed and got rolled back. Maybe a bug in your application code, maybe some invalid data screwed things up, maybe a concurrency exception occurred, etc.
A bug in mysql.
I'd look at 1. and 2. first.
A table on which you only ever insert (and presumably select) and never update or delete should be really stable. Are you absolutely certain you're protecting thoroughly against SQL injection attacks? Because those could (of course) delete rows and such if successful.
You haven't mentioned which table engine you're using (there are several), but it's well worth running whatever diagnostic tools there are for it on the table in question. For instance, on a MyISAM table, run myisamchk. Or more generically (this works for several table types), use the CHECK TABLE statement.
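For example, a quick check could look like this (the table name is hypothetical):
CHECK TABLE user_submissions EXTENDED;
-- for a MyISAM table you could instead run myisamchk against its .MYI file from the shell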
Have you had issues with the underlying storage? It may be worth checking for those.
Activating the binary log and periodically monitoring it for DELETE queries can help identify the culprit.
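If the binary log is already enabled (log_bin in my.cnf), a rough way to look for stray DELETEs from within MySQL itself; the file name below is hypothetical:
SHOW BINARY LOGS;                          -- list the available binlog files
SHOW BINLOG EVENTS IN 'mysql-bin.000001';  -- scan the events for unexpected DELETE statements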
One more case to add to the above: an application can have separate client-side and server-side parts, and changes initiated on the client side can be processed on the server side by additional logic.
For example, in our case a local admin panel updated order information with pay_date = NULL, and the PHP website processed the same table to clean up overdue orders. Since the PHP logic was developed by another programmer, it looked strange that updating orders caused records to disappear after some time.
The same applies to cron jobs that work on the MySQL database on a schedule.
Related
I'm having another issue with a Microsoft Access database. Every so often, some records will get corrupted. Something happens and different shapes, Chinese characters, and wrong data will be in the records. I did find a way not to lose the corrupted records by keeping a backup of that table that I update every day. Still, it's a bit of an annoyance, especially when an update is run.
I've tried to look for different solutions for this problem but none have really worked. It's a database that can be used by multiple users at the same time. It's an older one that I've had to update a bit. I don't have any memo fields present in the table either.
If you are using an autonumber field as a primary key, that could cause an increased corruption risk if the autonumber seed is reset and begins duplicating existing values. This has since been fixed, but you may need to update your Jet Engine service pack.
If you are in a multi-user environment and have not split your database, you should try that. You can split the database using the database tools tab on the ribbon in the "Move Data" section. That can reduce corruption risk by better managing concurrent updates to the same record. See further discussion here.
Unfortunately I can't tell you the problem without more information regarding your tables and relationships. If the corruption is a common result of your update query, I would start by looking through your update routine for errors.
I have a big table, which saved data with an ID based on input from an external API. The ID is stored in an int field. When I developed the system, I encountered no problems, because the ID of records in the external API were always below 2147483647.
The system has been fetching data from the API for the last few months, and apparently the ID crossed the 2147483647 mark. I now have a database with thousands of unusable records with ID 2147483647.
It is not possible to fetch this information from the database again (basically, the API allows us to look up data from max x days ago).
I am pretty sure that I am doomed. But might there be any backlog, or any other way, to retrieve the original input queries, or numbers that were truncated by MySQL to fit in the int field?
As already discussed in the comments, there is no way to retrieve the information from the table. It was silently(?!!!) truncated to 32 bits.
First, call the API provider, explain your situation, and see if you can redo the queries. Best that happens is they say yes and you don't have to try to reconstruct things from logs. Worst that happens is they say no and you're back where you are now.
Then there are some logs I would check.
First is the MySQL general query log. If you had this turned on, it may contain the queries that were run. Another possibility is the slow query log, which is more often enabled, if your queries happened to be slow.
In MySQL, data truncation is a warning by default. It's possible those warnings went into a log and included the original data. The MySQL Error Log is one possibility. On Windows it may have gone into the Windows Event Log. On a Mac, it might be in a log visible to the Console. In Unix, it might have gone to syslog.
Then it's possible the API queries themselves are logged somewhere. If you used a proxy it might contain them in its log. The program fetching from the API and adding to the database may also have its own logs. It's a long shot.
As a last resort, try grepping all of /var/log and /var/local/log and anywhere else you might think could contain a log.
In the future there are some things you can do to prevent this sort of thing from happening again. The most important is to turn on strict SQL mode. This will turn warnings, like that data has been truncated, into errors.
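For example, a minimal sketch of switching it on at runtime (to make it permanent you would also set sql_mode in my.cnf):
SET GLOBAL sql_mode = 'STRICT_ALL_TABLES';
SHOW VARIABLES LIKE 'sql_mode';   -- verify; note that existing connections keep their old mode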
Set UNIQUE constraints on unique columns. Had your API ID column been declared UNIQUE the error would have been detected.
Use UNSIGNED BIGINT for numeric IDs. 2 billion is a number easily exceeded these days. It will mean 4 extra bytes per row, or about 8 gigabytes extra to store 2 billion rows. Disk is cheap. (A sketch of this and the UNIQUE constraint follows the Postgres example below.)
Consider turning on ANSI SQL mode. This will disable a lot of MySQL extensions and make your SQL more portable.
Finally, consider switching to PostgreSQL. Over the years MySQL has accumulated a lot of bad ideas, mish-mashes of functions, and bad default behaviors. You just got bit by one. PostgreSQL is far better designed, more powerful and flexible, and usually as fast or faster.
In Postgres, you would have gotten an error.
test=# CREATE TABLE foo ( id INTEGER );
CREATE TABLE
test=# INSERT INTO foo (id) VALUES (2147483648);
ERROR: integer out of range
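Back on the MySQL side, here is a hedged sketch of the UNIQUE and BIGINT UNSIGNED recommendations above; the table and column names are hypothetical:
ALTER TABLE api_data
    MODIFY api_id BIGINT UNSIGNED NOT NULL,
    ADD UNIQUE KEY uq_api_id (api_id);
-- note: adding the UNIQUE key will fail while the duplicated 2147483647 rows are still present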
If you have binary logging enabled, and you still have backups of the binlogs, and your binlog_format is not set to ROW then your original insert and/or update statements should be preserved there, where you could extract them and replay them into another server with a more appropriate table definition.
If you don't have the binlog enabled and/or you aren't archiving the binlogs in perpetuity... this is one of the reasons why you should consider doing it.
So one of the projects I'm working on requires us to take every query that is run on the server and automatically insert that query into a table inside the database. The reason for this is so that the DBA is able to view all prior SQL queries that have been run on the box. Unfortunately I don't have any leeway to do this differently as the client is requiring this implementation.
Has anybody done this before or has any code that I could use that will automatically do this? Thanks.
Be careful! If you do an INSERT for every action taken, you will need to do an INSERT for that INSERT, at which point, you will ...
That is, the first logged query will hang the server and fill up the disk!
Instead of doing the task the way it is asked, turn on the "general log" and periodically scrape what is in it into another machine, which does not have this logging turned on.
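For instance, a sketch of turning it on at runtime (the file path is hypothetical):
SET GLOBAL general_log_file = '/var/log/mysql/general.log';
SET GLOBAL general_log = 'ON';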
Other arguments against the task as stated...
If a table has TRIGGERs, you will not be able to add another TRIGGER.
If "every query" really means "every", it is impossible (with a TRIGGER) since you can't write a SELECT or SHOW trigger.
"as the client is requiring this implementation". I would approach this unreasonable constraint by politely finding out what the real goal is. He has only described is an implementation.
If his goal is some kind of audit log, then my suggestion about the general log should suffice.
As my title describes: how can I implement something like a watchdog service in SQL Server 2008 with the following task: alerting or taking some action when too many inserts are committed on a table.
For instance: in a normal situation the error table gets 10 error messages in one second. If there are more than 100 error messages (100 inserts) in one second, then: ALERT!
Would appreciate it if you could help me.
P.S.: No. SQL Jobs are not an option because the watchdog should be live and woof on the fly :-)
Integration Services? Are there easier ways to implement such a service?
Kind regards,
Sani
I don't understand your problem exactly, so I'm not entirely sure whether my answer actually solves anything or just makes an underlying problem worse. Especially if you are facing performance or concurrency problems, this may not work.
If you can update the original table, just add a datetime2 field like
InsertDate datetime2 NOT NULL DEFAULT GETDATE()
Preferably, make an index on the column, and then at whatever interval fits, poll the table by seeing how many rows have an InsertDate newer than GETDATE() minus X.
For this particular case, you might benefit from making the polling process read uncommitted (or use WITH NOLOCK), although one has to be careful when doing so.
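A minimal sketch of such a polling query, assuming the monitored table is dbo.ErrorLog (hypothetical name) and the window is the last second:
SELECT COUNT(*)
FROM dbo.ErrorLog WITH (NOLOCK)   -- a dirty read is acceptable for a rough count
WHERE InsertDate > DATEADD(SECOND, -1, SYSDATETIME());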
If you can't modify the table itself and you can't or won't make another process or job monitor the relevant variables, I'd suggest the following:
Make a 'counter' table that just has one Datetime2 column.
On the original table, create an AFTER INSERT trigger (sketched in code after this list) that:
Deletes all rows where the datetime-field is older than X seconds.
Inserts one row with current time.
Counts to see if too many rows are now present in the counter-table.
Acts if necessary - ie. by executing a procedure that will signal sender/throw exception/send mail/whatever.
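Here is a hedged sketch of that trigger; dbo.ErrorLog and dbo.usp_RaiseInsertAlert are hypothetical names, and the 1 second / 100 rows figures are just the ones from the question. It records one counter row per inserted row (via the inserted pseudo-table) so multi-row INSERTs are handled:
CREATE TABLE dbo.InsertCounter (InsertedAt datetime2 NOT NULL);
GO
CREATE TRIGGER dbo.trg_ErrorLog_RateWatch
ON dbo.ErrorLog
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- 1. Delete counter rows older than the watch window.
    DELETE FROM dbo.InsertCounter
    WHERE InsertedAt < DATEADD(SECOND, -1, SYSDATETIME());

    -- 2. Record one counter row per inserted row.
    INSERT INTO dbo.InsertCounter (InsertedAt)
    SELECT SYSDATETIME() FROM inserted;

    -- 3. Count and act if the threshold has been exceeded.
    IF (SELECT COUNT(*) FROM dbo.InsertCounter) > 100
        EXEC dbo.usp_RaiseInsertAlert;   -- hypothetical alerting procedure
END;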
If you can modify the original table, add the datetime column to that table instead and make the trigger count all rows that aren't yet X seconds old, and act if necessary.
I would also look into getting another process (i.e. a SQL Job or a homemade service or similar) to do all the housekeeping, i.e. deleting old rows, counting rows and acting on it. Keeping this as the work of the trigger is not a good design and will probably cause problems in the long run.
If possible, you should consider having some other process doing the housekeeping.
Update: A better solution will probably be to make the trigger insert notifications (ie. datetimes) into a queue - if you then have something listening against that queue, you can write logic to determine whether your threshold has been exceeded. However, that will require you to move some of your logic to another process, which I initially understood was not an option.
I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying the database while I'm rebuilding it. The amount of data isn't small (couple hundred megabytes), so it won't load that instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem, and was wondering what the best method would be. Here are my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
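A minimal sketch of that swap script, assuming foo_new has already been loaded:
BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_new RENAME TO foo;
COMMIT;
-- DROP TABLE foo_old;  -- once you're happy with the new data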
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
DELETE FROM mytable;
INSERT INTO mytable SELECT * FROM mytable_new;  -- load the fresh rows from a staging table (hypothetical name)
COMMIT;
Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data; anything started afterwards will run on the new data. The database will actually clean up the old row versions once the last reader is done with them. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it, and SQL Server calls it "snapshot isolation," though I can't remember the details off the top of my head since I rarely use the thing.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
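For reference, a hedged sketch of the pattern (table, column and function names are hypothetical, and a production trigger would create missing child tables on the fly):
CREATE TABLE measurements (
    logdate  date NOT NULL,
    payload  text
);

CREATE TABLE measurements_2024_w01 (
    CHECK (logdate >= DATE '2024-01-01' AND logdate < DATE '2024-01-08')
) INHERITS (measurements);

CREATE OR REPLACE FUNCTION measurements_insert_router() RETURNS trigger AS $$
BEGIN
    -- route the row into the child table for its week
    INSERT INTO measurements_2024_w01 VALUES (NEW.*);
    RETURN NULL;  -- prevent the row from also landing in the parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_partition_trg
BEFORE INSERT ON measurements
FOR EACH ROW EXECUTE PROCEDURE measurements_insert_router();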
Which database server are you using? SQL Server 2005 and above provide an isolation level called "snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot isolation would be perfect in your case.
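A minimal sketch of enabling it, assuming a database named MyDb (hypothetical):
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- then, in the session doing the weekly load:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
-- ... do all of your updates ...
COMMIT;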
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
Several database systems (since you didn't specify yours, I'll keep this general) do offer the SQL:2003 Standard statement called MERGE which will basically allow you to
insert new rows into a target table from a source which don't exist there yet
update existing rows in the target table based on new values from the source
optionally even delete rows from the target that don't show up in the import table anymore
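As a hedged illustration (SQL Server flavor; the target/staging table and id/val column names are hypothetical):
MERGE INTO target AS t
USING staging AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.val = s.val
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, val) VALUES (s.id, s.val)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;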
SQL Server 2008 is the first Microsoft offering to have this statement - check out more here, here or here.
Other database systems will probably have similar implementations; it's a SQL:2003 standard statement after all.
Marc
Use different table names (mytable_[yyyy]_[wk]) and a view to provide a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table.
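For instance, a hedged sketch in MySQL syntax (the weekly table names just follow the mytable_[yyyy]_[wk] pattern from above):
CREATE TABLE mytable_2024_05 LIKE mytable_2024_04;   -- same structure as last week's table
-- ... bulk-load the new week's data into mytable_2024_05 ...
CREATE OR REPLACE VIEW mytable AS SELECT * FROM mytable_2024_05;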