MySQL & File Management: reuse deleted IDs in future auto increments

I'm using the database to name files when they are uploaded, and my application uses the database to identify the file (for video streaming), so they need to be the same. But if an upload fails, or someone cancels the upload, I'm left with a wasted ID.
Over time I have more wasted ID numbers than ones being used. I created a script that labels them unused 5 days after the row is inserted, and I was thinking of making another script that reuses those "unused" IDs and only inserts a new row if none exists. But I can see problems when I have multiple servers doing that task.
Can I delete a row and have the ID from that row be re-used in future auto increments?
I'm open to other ways of accomplishing this task, but ideally I'd like to re-use deleted IDs in future auto increments.

If you delete the row you can run
ALTER TABLE yourtable AUTO_INCREMENT = 1;
to reset the auto-increment counter. MySQL won't set it below the current maximum ID, so the next insert gets the first value after the highest remaining row. But I don't see the point in doing this. If you're doing this for the sake of the database, don't. The database doesn't care.

I'd +1 the "DON'T" if it's for the sake of the database, because really, it doesn't care at all. With that said, I do it anyway because I'm a bit of a neat freak, and I'll give you an example of why I decided to do it.
I created a service for musicians that uses an auto_increment as an index reference for songs, purchases and a few other things. As with most apps that have registration forms and mail functions, it was bombed with bots and spammers who created tons and tons of free accounts. They get deleted often (a cron job automatically cleans house about every 3 days), and what I end up with are very inconsistent counts: 1 to 41, then 129 to 240, then 491 to 800, and so on.
It was more of a personal preference to 'close the gaps' because, as I said, I'm a bit of a neat freak and I like consistency. I also have to print out reports for managers and others who want to track data, and they like consistency, so it helps a lot.
And as for what happens when the auto-increment counter hits a number that already exists: it skips to the next available one and continues. That's why you can reset it to 1 and everything's cool; no harm done.
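To make that behaviour concrete, here is a minimal sketch (the songs table and its columns are hypothetical). MySQL clamps the counter to one more than the highest existing ID rather than filling gaps in the middle:

DELETE FROM songs WHERE id > 100;          -- drop the trailing rows
ALTER TABLE songs AUTO_INCREMENT = 1;      -- silently clamped to MAX(id) + 1, i.e. 101
INSERT INTO songs (title) VALUES ('demo');
SELECT LAST_INSERT_ID();                   -- 101: the freed tail is reused, gaps below stay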

Export (backup) and Truncate (delete) table at the same time (i.e. atomic)?

I am hosting a forum with "forum gold".
People trade this a lot, gift it, award people with it to "thank" or "like" posts, or to increase reputation.
However, I am concerned that there might be some exploit that allows people to hack gold into their forum account, so I added logging on EVERY forum gold transaction.
It works well. I can run SUM queries to verify that no unknown sources are introducing forum gold into the system, and to ensure that all forum gold awarded to users is accounted for.
However, it totally blew up. Within just a couple of days I have more than 100,000 entries in the table. I also got mail from my webhost with a slow MySQL query warning, and the query was just a simple SELECT of a single record from that table: no joins, no ordering, not even functions like DATE_ADD().
So I want to completely export AND empty the table with the logs. Now, I normally back up the rest of my database via the "export" feature in phpMyAdmin. However, this table is highly active; anywhere from 10 to 50 new rows are added every second. I want to keep the integrity and accuracy of my computations by not losing any records.
Is there an "atomic" way I can export then delete all records, with no transactions getting in between?
Okay, so I just ended up:
creating a new TEMP table,
selecting everything from the LOG table,
inserting it into the new TEMP table,
then deleting from LOG every row that also exists in the TEMP table,
exporting the TEMP table,
and doing a global replace of "INSERT INTO `temp`" with "INSERT INTO `log`".
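In SQL the steps above look roughly like this (a sketch, assuming the log table is named log and has an id primary key):

CREATE TABLE temp LIKE log;
INSERT INTO temp SELECT * FROM log;
-- delete only the rows that made it into temp; rows inserted into log
-- after the copy are left untouched
DELETE FROM log WHERE EXISTS (SELECT 1 FROM temp WHERE temp.id = log.id);
-- then export temp (e.g. with mysqldump) and rewrite the table name:
-- mysqldump mydb temp | sed 's/INSERT INTO `temp`/INSERT INTO `log`/' > log_backup.sql

As an aside, RENAME TABLE is atomic in MySQL, so another option is to swap an empty table in under the log name and export the old one at leisure.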

What is the most elegant way to do Concurrency Control of modifying records in Mysql?

OK, consider this scenario: I have a table in which users may modify many rows at the same time. I want that when someone is modifying a record, other people can't modify the same record. For example, there's table1 with 3 columns (ID, text, flag):
ID | text | flag
11 | txt1 | 0
12 | txt2 | 0
13 | txt3 | 0
14 | txt4 | 0
I did some research about concurrency control in MySQL. The first solution people suggest is SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE. However, there is a limitation to this solution: you have to update immediately, right after you select the records, inside the same transaction. For example:
START TRANSACTION;
SELECT text FROM table1 WHERE ID <= 20 FOR UPDATE; -- row locks are taken here
UPDATE table1 SET text = 'new text' WHERE ID <= 20;
COMMIT; -- row locks are released here
However, my application requires users to download the data into the GUI, and they may then spend many hours working on that data before they commit their updates.
There is a second solution: MySQL provides a row-lock mechanism, but it requires InnoDB and its overhead is quite high.
Another option is to use a table lock in MySQL; the overhead is quite low, but we can't lock the whole table for many hours, right? For example, user A may be modifying records 1 to 10 while user B needs to modify records 11 to 20 at the same time, so user B shouldn't have to wait until user A finishes.
In my opinion, I want to have a column "flag": when user A is modifying some records, the flag of these records is set to 1, and if user B wants to modify the same records, the system pops up a message saying someone is modifying them. When user A finishes, the flag goes back to 0, and user B can modify those same records.
But there is a complicated problem with this solution: what if user A forgets to save the data? Does the flag stay 1 forever? Should the flag expire after a few hours, and how would we do that?
Maybe we need a timestamp or some mechanism to let the flag turn back to 0 after a certain time? But this is more complicated than I thought. We can't have the DB check the timestamp of the flag every hour, can we? A sketch of one way to do this follows.
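One way to get an expiring lock without a periodic job is to record who locked the rows and when, and treat stale locks as free. A sketch; the locked_by and locked_at columns are hypothetical:

-- try to claim the rows; locks older than 2 hours are treated as free
UPDATE table1
SET locked_by = 'userA', locked_at = NOW()
WHERE ID <= 20
  AND (locked_by IS NULL OR locked_at < NOW() - INTERVAL 2 HOUR);
-- if the affected-row count is less than expected, someone else holds a fresh lock

-- release the lock on save (or just let it expire on its own)
UPDATE table1 SET locked_by = NULL, locked_at = NULL WHERE locked_by = 'userA';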
I am not sure this is the most elegant solution. I have no commercial experience in DB design, and I want to know how DB people manage this issue in a commercial environment.
Can you find a better, more elegant solution than the ones above?
As Zerkms suggested, we do not need a flag at all, but we do need to build a "record comparing system" into our application.
OK, so any user can download data for updating. After he modifies a record, and before he actually updates the DB, the record comparing system checks the ID and the old text of the data (not the new text the user has just entered) against the same ID and the current text in the DB. If they are the same, he can commit the update; if they differ, someone else updated the record before him, so the system will not allow his update. He then has to download the data again and start over.
Since it is very rare that 2 users update the same records at the same time, I think this solution is quite feasible, easy to implement, and does not increase the overhead. This is called optimistic concurrency control: we build the record comparing system at the application level, so no database overhead is involved. I think this is the most elegant solution for my application's needs.
Do you think so?
Note: with this system we can still use an auto-increment ID (for the case of inserting new records).
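For reference, the check-and-update can be collapsed into a single statement so there is no window between comparing and writing (a sketch; :id, :old_text and :new_text are placeholders bound by the application):

-- succeeds only if the row still holds the text the user originally downloaded
UPDATE table1
SET text = :new_text
WHERE ID = :id AND text = :old_text;
-- if the affected-row count is 0, someone else changed the row first:
-- reload the record and let the user start over

A common variant adds an integer version column and compares that instead of the full text, which keeps the check cheap when the text is large.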

Best practice for enabling "undelete" for database entities?

This is for a CRM application using PHP/MySQL. Various entities like customer, contact, note, etc. can be "deleted" by the user. Rather than actually deleting the entity from the database, I just want it to appear deleted to the application, but be kept in the DB and able to be "restored" if needed at a later time. Maybe even add some kind of "recycle bin" to the app.
I've thought of several ways to do this:
Move the deleted entity to another table. (customer to customer_deleted)
Change an attribute on the entity. (enabled to false)
I'm sure there are other ways, each with its own implications for DB size, performance, etc. I'm just wondering what's the generally recommended way to do something like this?
I would go with a combination of both:
Set a deleted flag to true
Use a cron job to move the entries after a while to a table of type ARCHIVE (a sketch follows below)
If you need to restore an entry, select it back into the original table and delete it from the archive
Why would I go this way?
If a customer deleted the wrong one, the restore can be done instantly
After a few weeks/months the original table may grow too much, so I would archive all entries that have been deleted for more than a week
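In MySQL that cron job could look roughly like this (a sketch; the deleted and deleted_at columns and table names are hypothetical). One caveat: the ARCHIVE storage engine is compact but supports neither UPDATE nor DELETE, so if you want to remove rows from the archive when restoring, keep the archive table on a regular engine:

-- one-time setup
CREATE TABLE customer_archive LIKE customer;

-- run periodically: move rows soft-deleted more than a week ago
START TRANSACTION;
INSERT INTO customer_archive
  SELECT * FROM customer
  WHERE deleted = 1 AND deleted_at < NOW() - INTERVAL 7 DAY;
DELETE FROM customer
WHERE deleted = 1 AND deleted_at < NOW() - INTERVAL 7 DAY;
COMMIT;

-- restore: copy the row back, then remove it from the archive
INSERT INTO customer SELECT * FROM customer_archive WHERE id = :id;
DELETE FROM customer_archive WHERE id = :id;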
A common practice is to set a deleted_at column to the date at which the entity was deleted by the user (it defaults to NULL). You may also include a deleted_by column for marking who deleted it. Using some kind of deleted column makes FK relationships easier to work with, since they won't break. By moving the row to a new table you would have to update FKs (and then update them again if you ever undelete). The downside is that you have to ensure all your queries exclude deleted rows (which wouldn't be a problem if you moved the row to a new table). Many ORMs make this filtering easier, so it depends on what you are using.
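A minimal sketch of the deleted_at pattern (column names as described above; :user_id and :id are placeholders):

ALTER TABLE customer
  ADD COLUMN deleted_at DATETIME NULL DEFAULT NULL,
  ADD COLUMN deleted_by INT NULL DEFAULT NULL;

-- soft delete
UPDATE customer SET deleted_at = NOW(), deleted_by = :user_id WHERE id = :id;

-- undelete
UPDATE customer SET deleted_at = NULL, deleted_by = NULL WHERE id = :id;

-- every normal query must exclude soft-deleted rows
SELECT * FROM customer WHERE deleted_at IS NULL;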

Archiving rows dynamically

I was wondering what would be the best solution for dynamically archiving rows. For instance, when a user marks a task as completed, that task needs to be archived yet still be accessible.
What would be the best practice for achieving this? Should I just leave it all in the same table and filter completed tasks out of the queries? I'm afraid that over time the table will become huge (1,000,000 rows in a year or less). Or should I create another table, e.g. task_archive, and query that table whenever data is needed from it?
I know similar questions have been asked before, but most of them were about archiving thousands of rows simultaneously. I just need to know what would be the best method (and why) to archive one row at a time, once it's been marked completed.
For speed and ease of use, I would generally leave the row in the same table (and flag it as completed) and then later move it to an archive table. This way the user doesn't incur the delay of making that move on the spot; the move can happen as a batch process during non-busy periods.
When that move should happen depends on your application. For example, if they have a dashboard widget that shows "Recently Completed Tasks" that shows all of the tasks completed in the past week (and lets them drill in to see details), it might make sense to move the rows to the archive a week after they've been completed. Or if they frequently need to look at tasks from the current semester (for an academic app) but rarely for previous semesters, make the batch move happen at the end of the semester.
If the table is indexed, 1,000,000 rows shouldn't be that big a deal, honestly.
You could use a trigger to capture that the task was marked completed and copy the row into the archive table. Note that a MySQL trigger can't delete rows from the table it fires on, so the removal from the current table has to happen separately. For example:
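A sketch of such a trigger (the id, title and completed columns are hypothetical; oldtable and newtable stand for the live and archive tables, as in the procedure below):

DELIMITER //
CREATE TRIGGER trg_task_completed
AFTER UPDATE ON oldtable
FOR EACH ROW
BEGIN
  -- copy the row into the archive the moment it flips to completed
  IF NEW.completed = 1 AND OLD.completed = 0 THEN
    INSERT INTO newtable (id, title, completed)
    VALUES (NEW.id, NEW.title, NEW.completed);
  END IF;
END //
DELIMITER ;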
Or, you could create a stored procedure that performs the archive. For example:
DELIMITER //
CREATE PROCEDURE sp_markcompleted(IN taskid INT)
BEGIN
  START TRANSACTION;
  INSERT INTO newtable SELECT * FROM oldtable WHERE id = taskid;
  DELETE FROM oldtable WHERE id = taskid;
  COMMIT;
END //
DELIMITER ;

Versioned and indexed data store

I have a requirement to store all versions of an entity in an easily indexed way and was wondering if anyone has input on what system to use.
Without versioning, the system is simply a relational database with a row per, for example, person. If the person's state changes, that row is changed to reflect this. With versioning, the entry should be updated in such a way that we can always go back to a previous version. If I could use a temporal database this would come for free, and I would be able to ask 'what was the state of all people, as of yesterday at 2pm, living in Dublin and aged 30'. Unfortunately there don't seem to be any mature open-source projects that can do temporal.
A really nasty way to do this is just to insert a new row per state change. This leads to duplication, since a person can have many fields but typically only one changes per update. It is also quite slow to select the correct version of every person given a timestamp.
In theory it should be possible to use a relational database plus a version control system to mimic a temporal database, but this sounds pretty horrendous.
So I was wondering if anyone has come across something similar before and how they approached it?
Update
As suggested by Aaron, here's the query we currently use (in MySQL). It's definitely slow on our table with >200k rows. (id = table key; person_id = id per person, duplicated if the person has many revisions.)
select name
from person p
where p.id = (select max(id)
              from person
              where person_id = p.person_id
                and timestamp <= :timestamp)
Update
It looks like the best way to do this is with a temporal DB, but given that there aren't any open-source ones out there, the next best method is to store a new row per update. The only problems are the duplication of unchanged columns and the slow query.
There are two ways to tackle this. Both assume that you always insert new rows. In every case, you must insert a timestamp (created) which tells you when a row was "modified".
The first approach uses a number to count how many instances you already have. The primary key is the object key plus the version number. The problem with this approach seems to be that you need a select max(version) to make a modification. In practice this is rarely an issue, since for every update from the app you must first load the current version of the person, modify it (and increment the version), and then insert the new row. So the real problem is that this design makes it hard to run bulk updates in the database (for example, assigning a property to many users).
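A sketch of the composite-key approach (table and column names are hypothetical):

CREATE TABLE person (
  person_id INT          NOT NULL,  -- object key
  version   INT          NOT NULL,  -- counts the instances of this object
  name      VARCHAR(100) NOT NULL,
  created   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (person_id, version)
);

-- a modification inserts the next version rather than updating in place
INSERT INTO person (person_id, version, name)
SELECT person_id, MAX(version) + 1, 'New Name'
FROM person
WHERE person_id = 42
GROUP BY person_id;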
The next approach uses links in the database. Instead of a composite key, each version row gets its own key, and a replacedBy field contains the key of the next version. This approach makes it simple to find the current version (... where replacedBy is NULL). Updates are a problem, though, since you must insert a new row and update an existing one.
To solve this, you can add a back pointer (previousVersion). This way, you can insert the new rows and then use the back pointer to update the previous version.
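A sketch of the linked approach (again with hypothetical names; 1001 stands for the key of the row being replaced):

CREATE TABLE person (
  id              INT AUTO_INCREMENT PRIMARY KEY,  -- every version gets its own key
  person_id       INT          NOT NULL,
  name            VARCHAR(100) NOT NULL,
  created         TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  replacedBy      INT NULL,   -- key of the next version; NULL = current
  previousVersion INT NULL    -- back pointer to the row this one replaces
);

-- insert the new version with a back pointer, then patch the old row
INSERT INTO person (person_id, name, previousVersion)
VALUES (42, 'New Name', 1001);
UPDATE person SET replacedBy = LAST_INSERT_ID() WHERE id = 1001;

-- finding current versions is cheap
SELECT * FROM person WHERE replacedBy IS NULL;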
Here is a (somewhat dated) survey of the literature on temporal databases: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.6988&rep=rep1&type=pdf
I would recommend spending a good while sitting down with those references and/or Google Scholar to try to find some good techniques that fit your data model. Good luck!