I have a model Post which has an expiry_date. I want to know the best way to manage scalability in this case. I see two options:
1. Whenever I SELECT from the table, include WHERE expiry_date > NOW(). If the table Post grows like a monster, I will be in trouble. Imagine after 3 years or more; the indexes will be huge too.
2. Have a trigger, cron job, or a plugin (if one exists) that periodically goes over the table and moves expired items to a new table Post_Archive. That way I keep only current Posts in my main table, so after 3 years I won't be as badly off as with option 1.
If you need to archive data on a continuous basis (your #2), then a good option is MaatKit.
http://www.maatkit.org/
It can "nibble" away at data in chunks rather than running mass queries that consume lots of resources (and it avoids polluting your key cache).
So yes, you would run a Maatkit job from cron.
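For illustration, the job could use Maatkit's archiver tool, mk-archiver (later renamed pt-archiver in Percona Toolkit). A rough sketch, where host, database, and batch size are placeholders and the flags should be checked against the docs for your version:

mk-archiver --source h=localhost,D=mydb,t=Post \
            --dest   h=localhost,D=mydb,t=Post_Archive \
            --where "expiry_date <= NOW()" \
            --limit 1000 --commit-each

By default the tool deletes each row from Post as it copies it to Post_Archive, committing after every batch of 1000 rows so it never holds long locks.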
In the meantime, if you want to do #1 at the same time, you could implement a view which conveniently wraps up the "WHERE expiry_date > NOW()" condition so you don't have to repeat it throughout your code.
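A minimal sketch of such a view (the name is just a suggestion):

CREATE OR REPLACE VIEW ActivePost AS
  SELECT * FROM Post
  WHERE expiry_date > NOW();

-- application code then queries the view instead of the table:
SELECT * FROM ActivePost WHERE id = 42;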
A cron job sounds good to me, and it can be done by feeding a simple script directly to the mysql command, e.g., roughly:
-- collect the rows that have expired (note: the column in the question is
-- expiry_date, and "expired" means expiry_date <= NOW(), not > NOW())
CREATE TEMPORARY TABLE Moving
  SELECT * FROM Post WHERE expiry_date <= NOW();

-- copy them into the archive...
INSERT INTO Post_Archive
  SELECT * FROM Moving;

-- ...then remove them from the live table
DELETE FROM Post
  WHERE id IN (SELECT id FROM Moving);

DROP TEMPORARY TABLE Moving;
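Saved as e.g. archive_posts.sql, that script could then run nightly from cron (path, credentials file, and schedule here are placeholders):

0 3 * * * mysql --defaults-extra-file=/etc/mysql/archive.cnf mydb < /path/to/archive_posts.sql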
We are updating a table XYZ which has the following fields:
First Name|Middle Name|Last Name|Address|DOB|Country|County|(etc.)
Initially, we call a web service which sends updated information for a row in XYZ: it may update the first name, or the DOB, or both, or all fields, or none.
Now there is a requirement to create a log table in the database which stores a summary of the old records and of the changes made to XYZ. Every affected row should be reported.
Is it good to create similar fields in a new table, say ABC:
First Name|Middle Name|Last Name|Address|DOB|Country|County|Update_datetime
i.e. the same columns plus an additional field called "Update_datetime"?
Now, each time the service is called, we will select the values of the previous row from XYZ and write them to ABC along with the update datetime.
What are the loopholes in this practice? What better practices could be followed?
Is there a requirement for a log table or a requirement for a proper history?
Oracle has history functionality out of the box.
I doubt MySQL does; you may have to do it a different way.
The pro of Oracle's approach is that it will not fail: it's a core feature. The con of hand-rolled solutions is, well, that they're hand-rolled: lots of stored procedures, triggers, or other nastiness that people can deliberately or inadvertently bypass.
I echo the need to know what the requirements are behind this. Is it to be human readable (auditing, debugging etc.) or machine readable (e.g. event sourcing architectural pattern)? How often will you need to go back to look at previous versions? How often do things change?
If it's event sourcing, then there are a few answers around on Stack Overflow about that, e.g. Using an RDBMS as event sourcing storage and best event sourcing db strategy. For more of an introduction, there's e.g. a Martin Fowler video.
There are also SO answers on logging changes in MySQL and Using MySQL triggers to log all table changes to a secondary table and an alternative approach (using 1 table, but adding sort-of version numbers to show each record's validity).
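To make the trigger route concrete, here is a minimal sketch that logs the old version of each updated XYZ row into ABC, using the column names from the question; adapt it to your real schema:

CREATE TABLE ABC LIKE XYZ;
ALTER TABLE ABC ADD COLUMN Update_datetime DATETIME NOT NULL;
-- drop the copied primary key, if XYZ has one, so a row can be logged many times

DELIMITER //
CREATE TRIGGER xyz_log_update
AFTER UPDATE ON XYZ
FOR EACH ROW
BEGIN
  -- store the row as it looked *before* the change, plus a timestamp
  INSERT INTO ABC (`First Name`, `Middle Name`, `Last Name`, `Address`,
                   `DOB`, `Country`, `County`, Update_datetime)
  VALUES (OLD.`First Name`, OLD.`Middle Name`, OLD.`Last Name`, OLD.`Address`,
          OLD.`DOB`, OLD.`Country`, OLD.`County`, NOW());
END//
DELIMITER ;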
I've searched quite a while but I did not find an answer for my question. First let me explain where my question comes from.
I have a web application with a user form in which the user can specify on which days of the week he/she is available and what the office hours are on those days. Imagine this as a web form with checkboxes for Mo-Su and from-time/to-time dropdowns for each checkbox/day. Once a user profile has been inserted into the database table, I have something like this in my table:
idUser | idDay | fromTime | toTime
1 | 0 | 08:00 | 09:00
(in this example 0 stands for Monday (0=mo - 6=su))
Of course, there is a feature to change the user's profile. So it might be that the user was first available Mo-Fr from 5am to 3pm. Now the profile is changed so that the user is available Mo and Fr from 5am to 3pm, We and Th only till 1pm, AND Sa from 9am to 5pm. In this case I have to update the existing entries, delete the removed entries (Tu in this example), and insert the newly available days.
Right now I first delete all entries for the user and then perform new inserts. This is quite easy, and in this case probably the best thing to do (because there are at most 7 entries per user). BUT the same problem could appear in a different situation where it might be better to optimize the query.
So finally my question is: Is it possible to perform an update if exists, insert if not and delete all other entries in just ONE query?
I hope my question is understandable.
The simplest thing is just to delete all instances and then create a new schedule from the updated entries. It sounds like you are trying to do some premature optimisation of the process, which IMHO will not be necessary. If you find that deleting all existing entries and replacing them is a performance bottleneck, then you might want to consider an approach which tries to minimise the additional inserts. But I think that would probably end up being more work than the simpler option of just removing everything and replacing it.
You cannot do an INSERT, an UPDATE, and a DELETE in one query.
You have some options:
In MySQL: write a stored procedure that handles your problem, and call the stored procedure to perform the task. One call from PHP gets the job done.
In PHP: write some code to perform the task and wrap it in a transaction.
If you would ask what's the best method to get the job done, you'd probably start a war. I put my money on the PHP solution.
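For what it's worth, whichever variant you pick, the SQL underneath boils down to two statements inside a transaction. A rough sketch, assuming (idUser, idDay) is the table's primary key and calling the table user_availability (table name and values are illustrative):

START TRANSACTION;

-- update-if-exists / insert-if-not for every submitted day
INSERT INTO user_availability (idUser, idDay, fromTime, toTime)
VALUES (1, 0, '05:00', '15:00'),
       (1, 4, '05:00', '15:00'),
       (1, 5, '09:00', '17:00')
ON DUPLICATE KEY UPDATE
  fromTime = VALUES(fromTime),
  toTime   = VALUES(toTime);

-- delete the days that are no longer in the submission
DELETE FROM user_availability
WHERE idUser = 1 AND idDay NOT IN (0, 4, 5);

COMMIT;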
Just curious to see your opinions, as I am developing a forum script alongside a coexisting CMS, for which I have the ACL etc. sorted.
I was wondering whether it is best to do DELETE FROM ... (as in, actually delete the record) or just do an UPDATE that sets a column to 1 (bool), which hides the record (so it looks deleted).
PS: only those who I trust have access to moderation tools
That's up to you; it's usually a question of how important the data you are deleting is, or how tolerant you want to be of accidents.
The method I like to use is to have a clone database for items you wish to delete. On delete, copy the contents of the selected row to the new database, then delete the original. Having extra "deleted" articles or items in your system just uses up more space and will (potentially) slow down queries over time.
Once you fill up your "deleted articles" database, run a dump, archive it, and truncate.
Let's say you have the database CMS with a table called ARTICLES whose deleted posts you want to store elsewhere. We will create a separate database with an identical table structure:
CREATE DATABASE `deleted`;
CREATE TABLE deleted.cmsarticles LIKE CMS.ARTICLES;
In your PHP script that's deleting the content, you would do something like this:
// Grab the ID of the article you are deleting; make sure to sanitize!
$article_id = $_POST['id'];
if (is_numeric($article_id)) {
    $dbconnect = databaseFunction();
    $result = $dbconnect->query("SELECT `row1`,`row2` FROM `ARTICLES` WHERE `id`=$article_id");
    if ($result->num_rows != 0) {
        $row = $result->fetch_array(MYSQLI_ASSOC);
        // Open a new connection to the deleted database
        $dbconnect2 = otherDBFunction();
        $dbconnect2->query("INSERT INTO `cmsarticles` (row1, row2) VALUES ('{$row['row1']}', '{$row['row2']}')");
        // Reuse the validated $article_id here rather than the raw $_POST value
        $dbconnect->query("DELETE FROM `ARTICLES` WHERE `id`=$article_id");
    }
}
Your intuition is correct. There will be occasions when you want to "undelete" a post, or refer to a deleted post's contents. Imagine if someone posted a threat on someone else's life; you'd want to remove it from view but also want to keep a record of it to, say, share with the police if things escalated.
One common approach is to add a DATETIME column to your table called e.g. deleted_at. This column will be NULL until you "delete" a post, at which time it just gets UPDATEd with the current date and time. Then it's easy to show only records that aren't deleted, e.g. SELECT ... WHERE deleted_at IS NULL.
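A minimal sketch of that approach, assuming a posts table with an integer primary key:

ALTER TABLE posts ADD COLUMN deleted_at DATETIME NULL DEFAULT NULL;

-- "delete" a post: it simply becomes invisible to normal queries
UPDATE posts SET deleted_at = NOW() WHERE id = 123;

-- fetch only live posts
SELECT * FROM posts WHERE deleted_at IS NULL;

-- and undeleting is just as easy
UPDATE posts SET deleted_at = NULL WHERE id = 123;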
If performance becomes an issue (it probably won't), or disk size does, then you can occasionally prune deleted posts, e.g.:
DELETE FROM posts
WHERE
# DELETE all rows that were "deleted" more than a year ago
COALESCE( deleted_at, NOW() ) < NOW() - INTERVAL 1 YEAR
;
I'm pretty sure this particular quirk isn't a duplicate so here goes.
I have a table of services. In this table, I have about 40 rows of the following columns:
Services:
id_Services -- primary key
Name -- name of the service
Cost_a -- for variant a of service
Cost_b -- for variant b of service
Order -- order service is displayed in
The user can go into an admin tool and update any of this information - including deleting multiple rows, adding a row, editing info, and changing the order they are displayed in.
My question is this: since I will never know how many rows will be coming in from a submission (there could be one more, or 100% fewer), I was wondering how to address this in my query.
Upon submission, every value is resubmitted. I'd hate to do it this way but the easiest way I can think of is to truncate the table and reinsert everything... but that seems a little... uhhh... bad! What is the best way to accomplish this?
RE-EDIT: For example: I start with 40 rows and update with 36. I still have to do something about rows 37-40. How can I do this? Are there any MySQL tricks or functions that will do this for me?
Thank you very much for your help!
You're slightly limited by the use case; you're doing insertion, update, and deletion presented to the user as a single batch operation, but in the back end you'll have to do these as separate statements.
Watch out for concurrency: use transactions if you can.
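As a sketch, those separate statements could look like this, assuming id_Services is the primary key and the submitted rows carry their ids (a genuinely new row would omit its id and rely on AUTO_INCREMENT). The literal values are illustrative:

START TRANSACTION;

-- remove the rows absent from the submission
-- (this handles the "40 rows down to 36" case from the question)
DELETE FROM Services WHERE id_Services NOT IN (1, 2, 5);

-- update-if-exists / insert-if-not for the submitted rows;
-- `Order` is backticked because ORDER is a reserved word
INSERT INTO Services (id_Services, Name, Cost_a, Cost_b, `Order`)
VALUES (1, 'Wash', 10.00, 15.00, 1),
       (2, 'Dry',   8.00, 12.00, 2),
       (5, 'Fold',  5.00,  7.50, 3)
ON DUPLICATE KEY UPDATE
  Name    = VALUES(Name),
  Cost_a  = VALUES(Cost_a),
  Cost_b  = VALUES(Cost_b),
  `Order` = VALUES(`Order`);

COMMIT;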
I was wondering what would be the best solution for dynamically archiving rows. For instance, when a user marks a task as completed, that task needs to be archived yet still be accessible.
What would be the best practice for achieving this? Should I just leave everything in the same table and filter completed tasks out of my queries? I'm afraid that over time the table will become huge (1,000,000 rows in a year or less). Or should I create another table, e.g. task_archive, and query that table whenever archived data is needed?
I know similar questions have been asked before, but most of them were about archiving thousands of rows simultaneously. I just need to know what would be the best method (and why) to archive one row at a time, once it's been marked completed.
For speed and ease of use, I would generally leave the row in the same table (and flag it as completed) and then later move it to an archive table. This way the user doesn't incur the delay of making that move on the spot; the move can happen as a batch process during non-busy periods.
When that move should happen depends on your application. For example, if they have a dashboard widget that shows "Recently Completed Tasks" that shows all of the tasks completed in the past week (and lets them drill in to see details), it might make sense to move the rows to the archive a week after they've been completed. Or if they frequently need to look at tasks from the current semester (for an academic app) but rarely for previous semesters, make the batch move happen at the end of the semester.
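A sketch of that nightly batch move, assuming task has a completed_at DATETIME column (NULL while the task is open) and task_archive mirrors its structure; the 7-day window matches the dashboard example above:

SET @cutoff = NOW() - INTERVAL 7 DAY;

START TRANSACTION;
-- copy old completed tasks to the archive...
INSERT INTO task_archive
  SELECT * FROM task
  WHERE completed_at IS NOT NULL AND completed_at < @cutoff;
-- ...then remove exactly the same rows from the live table
DELETE FROM task
  WHERE completed_at IS NOT NULL AND completed_at < @cutoff;
COMMIT;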
If the table is indexed, 1,000,000 rows shouldn't be that big a deal, honestly.
You could use a trigger to capture that the task was marked completed and insert it into the archive table. (Note, though, that a MySQL trigger cannot modify the table it fires on, so removing the row from the current table would have to happen separately.)
Or, you could create a stored procedure that performs the archive. For example:
DELIMITER //
CREATE PROCEDURE sp_markcompleted(IN taskid INT)
BEGIN
  START TRANSACTION;
  INSERT INTO newtable SELECT * FROM oldtable WHERE id = taskid;
  DELETE FROM oldtable WHERE id = taskid;
  COMMIT;
END//
DELIMITER ;
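The application would then archive a task with a single call, e.g.:
CALL sp_markcompleted(42);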