MySQL: UPDATE before first checking if necessary, or just UPDATE?

I'm using mysql to update a field in a table when a condition is met...
Should I first do a SELECT to see if the condition is met, or do I just run the UPDATE every time, since if the condition is not met, nothing happens?
To be concrete, here is my SELECT:
SELECT * FROM forum_subscriptions
WHERE IDTopic=11111 AND IDUser=11111 and status=0
I am checking here whether I (user ID 1) am subscribed to forum topic 11111 and whether my subscription status is 0 (meaning I haven't yet received an email about a new post in the topic).
So when this is met do:
UPDATE forum_subscriptions SET Status=1 where IDTopic=11111 AND IDUser=1
Now I am wondering: I always do a SELECT here to check whether a user is subscribed to this topic and whether his status shows he has already visited it, so that new posts will not trigger another email notification. When he visits the page again, the UPDATE runs and resets the status, so any new post will send him an email again.
So the SELECT runs for every user, subscribed or not, to test the subscription. The UPDATE runs only when necessary.
Is it better to just run the UPDATE on every page view? If the user is not subscribed to the topic, it will not update anything.
How fast is an UPDATE that doesn't match any rows? How does it work internally: does UPDATE first find the matching records (effectively a select) and then update them? If so, it would be better to only UPDATE, because I would achieve the same thing without any slowdown. If the UPDATE is more expensive than the SELECT, I should check first and update only when necessary.
This is a real-life example, but the logic behind this update/select trade-off is what really interests me, because I run into this kind of problem quite often.
Thanks
UPDATE: Thanks, both of you, but your links don't tell me whether an UPDATE that matches nothing still takes locks. As you gave different answers, I still don't know what to do.
The subscription table really doesn't need to be MyISAM; I could change it to InnoDB because I don't need full-text search on it. Is it a good solution to only use the UPDATE and change this small table to InnoDB? Does mixing storage engines have any drawbacks?

You just do the update, with no previous select:
UPDATE forum_subscriptions SET Status=1 where IDTopic=11111 AND IDUser=1
If the conditions are not met, update will do nothing.
This update is very fast if you have an index on Status, IDTopic and IDUser!
An empty update is just as fast as an empty select.
If you do the select first, you will just slow things down for no reason.
If you want to know how many rows were updated, do a
SELECT ROW_COUNT() as rows_affected
after doing the update. This will tell you 0 if no rows were updated, or the number of rows updated (or inserted or deleted, if you used those statements).
This function is ultra fast because it just has to fetch one value from memory.
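For example, a minimal sketch putting the pieces together (the index name is made up; adding Status=0 to the WHERE clause keeps the statement a no-op when the subscription has already been marked):
-- composite index so the UPDATE can locate the row without scanning the table
ALTER TABLE forum_subscriptions ADD INDEX idx_topic_user_status (IDTopic, IDUser, Status);
-- always run the UPDATE; it does nothing when no row matches
UPDATE forum_subscriptions SET Status=1 WHERE IDTopic=11111 AND IDUser=1 AND Status=0;
-- 0 means nothing was updated, 1 means the subscription was reset
SELECT ROW_COUNT() AS rows_affected;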
Workarounds for table locking issues
See here: http://dev.mysql.com/doc/refman/5.5/en/table-locking.html

A potential side effect of always issuing the UPDATE is the locking that must be taken to ensure that no other connection modifies these rows.
If the table is MyISAM, a lock will be placed on the entire table during the search.
If the table is InnoDB, locks will be placed on the index records/gaps.
From the Docs:
A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL statement. It does not matter whether there are WHERE conditions in the statement that would exclude the row.

Related

MySql Logic Optimization

Currently we have a ticket management system, and like all ticketing systems it needs to assign cases to agents in a round-robin manner. At the same time, an agent can apply their own filtering logic and work on their own queue.
The problem:
The table with the tickets is very large now, spans over 10 million rows.
One ticket should never be assigned to two different users.
To solve the above problem, this is the flow we have:
A SELECT query is fired with the filter criteria and LIMIT 0,1.
The row returned by the above query is then selected by its id and locked FOR UPDATE.
Lastly, we fire the UPDATE saying user X has picked the case.
While step 3 executes, other users cannot get a lock on the same case, so they re-fire the query, possibly multiple times, to get the next available case.
As the number of users increases, this retry time grows higher and higher.
We tried doing a SELECT ... FOR UPDATE in the query at step 1 itself, but that makes the entire query slow, presumably because of the huge number of rows the SELECT covers.
Questions:
Is there a different approach we need to take altogether?
Would doing a select and update in a stored procedure ensure the same results as doing a select for update and then update?
P.S. I have asked the same question on Stack Exchange.
The problem is that you are trying to use MySQL-level locking to ensure that a ticket cannot be assigned to more than one person. That way there is no way to detect whether a ticket is already locked by a user.
I would implement an application-level lock by adding 2 lock-related fields to the tickets table: a timestamp when the lock was applied and a user id field telling you which user holds the lock. The lock-related fields may also be held in another table (a shopping cart, for example, can be used for this purpose).
When a user selects a ticket, try to update these lock fields with a conditional update statement:
update tickets
set lock_time=now(), lock_user=...
where ticket_id=... and lock_time is null
Values in place of ... are supplied by your application. The lock_time is null criterion is there to make sure that if the ticket has already been selected by another user, the later user does not override the lock. After the update statement, check the number of rows affected. If it is one, the current user acquired the lock. If it is 0, someone else has locked the ticket.
If you keep the locking data in another table, place a unique constraint on the ticket id field in that table and use an INSERT to acquire the lock. If the insert succeeds, the lock is acquired. If it fails, another user has locked the ticket.
The lock is usually held for a number of minutes; after that, your application must release it (set the locking fields to null, or delete the locking record from the other table).
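A minimal sketch of the insert-based variant, with made-up table and column names:
-- one row per locked ticket; the primary key guarantees only one holder
CREATE TABLE ticket_locks (
    ticket_id INT NOT NULL,
    lock_user INT NOT NULL,
    lock_time DATETIME NOT NULL,
    PRIMARY KEY (ticket_id)
);
-- acquiring the lock: succeeds for exactly one user, fails with a duplicate-key error for everyone else
INSERT INTO ticket_locks (ticket_id, lock_user, lock_time) VALUES (12345, 42, NOW());
-- releasing the lock (or run a cleanup job that deletes locks older than N minutes)
DELETE FROM ticket_locks WHERE ticket_id = 12345 AND lock_user = 42;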

MySql Triggers and performance

I have the following requirement. I have 4 MySQL databases and an application in which the user needs to get the count of records in the tables of each of these databases. The issue is that the count may change every minute or second. So whenever the user hovers over a particular UI area, I need to make a call to all these databases and get the counts. I don't think this is the best approach, as these tables contain millions of records and every mouse-over triggers a call to all these databases.
A trigger is one approach I found. Rather than pulling the data from the databases each time, whenever an insert/update/delete happens on these tables a trigger would execute and increment/decrement the count in another table (which contains only the counts of these tables). But I have read that triggers can hurt database performance, and also that in some situations a trigger is the only solution.
So please guide me: in my situation, are triggers the solution? If they hurt database performance, I don't want them. Is there any better approach for this problem?
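To illustrate, this is roughly what I mean by the counter-table idea (all table and column names here are made up):
CREATE TABLE row_counts (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    row_count  BIGINT NOT NULL
);
DELIMITER //
CREATE TRIGGER orders_count_ins AFTER INSERT ON orders
FOR EACH ROW
BEGIN
    -- keep the cached count in step with the big table
    UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'orders';
END//
CREATE TRIGGER orders_count_del AFTER DELETE ON orders
FOR EACH ROW
BEGIN
    UPDATE row_counts SET row_count = row_count - 1 WHERE table_name = 'orders';
END//
DELIMITER ;
-- the UI then only ever reads the tiny counter table
SELECT row_count FROM row_counts WHERE table_name = 'orders';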
Thanks
What I understood is that you have 4 databases with a number of tables in each, and when the user hovers over a particular area in your application, the user should see the number of rows in that table.
I would suggest you use COUNT(*) to return the number of rows in each table. Triggers are meant to do something when a particular event like an UPDATE, DELETE or INSERT occurs in the database; it's not a good idea to rely on them to react to user interactions like hovering. If you can tell me in which language you are designing the front end, I can be more specific.
Example:
SELECT COUNT(*) FROM tablename where condition
OR
SELECT SQL_CALC_FOUND_ROWS * FROM tablename
WHERE condition
LIMIT 5;
SELECT FOUND_ROWS();
The second one is used when you want to limit the results but still return the total number of rows found. Hope it helps.
Please don't use count(*). This is inefficient, possibly to the point of causing a table scan. If you can get to the information schema, this should return the result you need sub-second:
select table_rows from information_schema.tables where table_name = 'tablename'
If you can't for some reason, and your table has a primary key, try:
SELECT COUNT(field) FROM tablename
...where field is part of the primary key. This will be slower, especially on large tables, but still better than count(*).
Definitely don't use a trigger.

How to retrieve the new rows of a table every minute

I have a table to which rows are only appended (not updated or deleted) inside transactions (I'll explain why this is important), and I need to fetch the new, previously unfetched rows of this table every minute with a cron job.
How am I going to do this? In any programming language (I use Perl but that's irrelevant.)
I list the ways I thought of to solve this problem, and ask you to show me the correct one (there HAS to be one...).
The first way that popped into my head was to save (in a file) the largest auto_incrementing id of the rows fetched, so in the next minute I can fetch with: WHERE id > $last_id. But that can miss rows. Because new rows are inserted in transactions, it's possible that the transaction that saves the row with id = 5 commits before the transaction that saves the row with id = 4. It's therefore possible that the cron script retrieves row 5 but not row 4, and when row 4 gets committed one split second later, it will never get fetched (because 4 is not greater than 5, which is the $last_id).
Then I thought I could make the cron job fetch all rows that have a date field in the last TWO minutes, check which of these rows have been retrieved again in the previous run of the cron job (to do this I would need to save somewhere which row ids were retrieved), compare, and process only the new ones. Unfortunately this is complicated, and also doesn't solve the problem that will occur if a certain inserting transaction takes TWO AND A HALF minutes to commit for some weird database reason, which will cause the date to be too old for the next iteration of the cron job to fetch.
Then I thought of installing a message queue (MQ) like RabbitMQ or any other. The same process that does the inserting transaction would notify RabbitMQ of the new row, and RabbitMQ would then notify an always-running process that processes new rows. So instead of getting a batch of rows inserted in the last minute, that process would get the new rows one by one as they are written. This sounds good, but has too many points of failure - RabbitMQ might be down for a second (during a restart, for example), and in that case the insert transaction would commit without the receiving process ever getting the new row. So the new row will be missed. Not good.
I just thought of one more solution: the receiving processes (there are 30 of them, doing the exact same job on exactly the same data, so the same rows get processed 30 times, once by each receiving process) could write to another table that they have processed row X when they process it; then, when the time comes, they can ask for all rows in the main table that don't exist in the "have_processed" table with an OUTER JOIN query. But I believe (correct me if I'm wrong) that such a query will consume a lot of CPU and disk on the DB server, since it will have to compare the entire lists of ids of the two tables to find new entries (and the table is huge and getting bigger each minute). It would have been fast if there were only one receiving process - then I would have been able to add an indexed field named "have_read" in the main table, which would make looking for new rows extremely fast and easy on the DB server.
What is the right way to do it? What do you suggest? The question is simple, but a solution seems hard (for me) to find.
Thank you.
I believe the 'best' way to do this would be to use one process that checks for new rows and delegates them to the thirty consumer processes. Then your problem becomes simpler to manage from a database perspective and a delegating process is not that difficult to write.
If you are stuck with communicating with the thirty consumer processes through the database, the best option I could come up with is to create a trigger on the table which copies each row to a secondary table. Copy each row to the secondary table thirty times (once for each consumer process). Add a column to this secondary table indicating the 'target' consumer process (for example a number from 1 to 30). Each consumer process checks for new rows with its unique number and then deletes them. If you are worried that some rows might be deleted before they are processed (because the consumer crashes in the middle of processing), you can fetch, process and delete them one by one.
Since the secondary table is kept small by continuously deleting processed rows, INSERTs, SELECTs and DELETEs would be very fast. All operations on this secondary table would also be indexed by the primary key (if you place the consumer ID as first field of the primary key).
In MySQL statements, this would look like this:
CREATE TABLE `consumer`(
    `id` INTEGER NOT NULL,
    PRIMARY KEY (`id`)
);
INSERT INTO `consumer`(`id`) VALUES
(1),
(2),
(3)
-- all the way to 30
;
CREATE TABLE `secondaryTable` LIKE `primaryTable`;
ALTER TABLE `secondaryTable` ADD COLUMN `targetConsumerId` INTEGER NOT NULL FIRST;
-- alter the secondary table further so that several copies of the same primaryTable row can coexist,
-- e.g. by making (targetConsumerId, id) the primary key
DELIMITER //
CREATE TRIGGER `mark_to_process` AFTER INSERT ON `primaryTable`
FOR EACH ROW
BEGIN
    -- the cross join with the consumer table automatically inserts one copy of the new row per consumer;
    -- adding or deleting consumers is just a matter of adding or deleting rows in the consumer table
    INSERT INTO `secondaryTable`(`targetConsumerId`, `id`, `field1`, `field2`)
    SELECT `consumer`.`id`, NEW.`id`, NEW.`field1`, NEW.`field2`
    FROM `consumer`;
END//
DELIMITER ;
-- loop over the following statements in each consumer until the SELECT doesn't return any more rows
START TRANSACTION;
SELECT * FROM secondaryTable WHERE targetConsumerId = MY_UNIQUE_CONSUMER_ID LIMIT 1;
-- here, do the processing (before the COMMIT, so that a crash won't let you miss rows)
DELETE FROM secondaryTable WHERE targetConsumerId = MY_UNIQUE_CONSUMER_ID AND id = ID_OF_ROW_JUST_SELECTED;
COMMIT;
I've been thinking about this for a while. So, let me see if I got it right. You have a HUGE table into which N processes write (let's call them producers), where N may vary over time. There are also M other processes (let's call them consumers), where M may also vary over time, that each need to process every added record at least once.
The main issues detected are:
Making sure the solution will work with dynamic N and M
The need to keep track of the unprocessed records for each consumer
The solution has to scale as well as possible given the huge number of records
In order to tackle those issues I thought of this. Create this table (both columns form the PK):
PENDING_RECORDS(ConsumerID, HugeTableID)
Modify the producers so that each time they add a record to the HUGE_TABLE they also add M records to the PENDING_RECORDS table: the HugeTableID plus each ConsumerID that exists at that time. Each time a consumer runs, it will query the PENDING_RECORDS table and find a small number of matches for itself. It will then join against the HUGE_TABLE (note it will be an inner join, not a left join) and fetch the actual data it needs to process. Once the data is processed, the consumer deletes the fetched records from the PENDING_RECORDS table, keeping it decently small.
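A rough sketch of what that could look like (everything except PENDING_RECORDS is an assumed name):
CREATE TABLE PENDING_RECORDS (
    ConsumerID  INT NOT NULL,
    HugeTableID BIGINT NOT NULL,
    PRIMARY KEY (ConsumerID, HugeTableID)
);
-- producer side: after inserting into HUGE_TABLE, fan the new id out to every registered consumer
INSERT INTO PENDING_RECORDS (ConsumerID, HugeTableID)
SELECT c.id, LAST_INSERT_ID() FROM consumers c;
-- consumer number 7: fetch its pending work (inner join back to the big table)
SELECT h.* FROM PENDING_RECORDS p JOIN HUGE_TABLE h ON h.id = p.HugeTableID WHERE p.ConsumerID = 7;
-- once processed, remove the pointers so the table stays decently small
DELETE FROM PENDING_RECORDS WHERE ConsumerID = 7 AND HugeTableID IN (/* processed ids */);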
Interesting, I must say :)
1) First of all, is it possible to add a field to the append-only table (let's call it 'transactional_table')? I mean, is it a design paradigm and you have a reason not to do any updates on this table, or is it "structurally" blocked (i.e. the user connecting to the db has no privileges to perform updates on this table)?
Because then the simplest way to do it is to add a "have_read" column to this table with a default of 0, and update this column to 1 on fetched rows (even if 30 processes do this simultaneously, you should be fine, as it would be very fast and it won't corrupt your data). Even if 30 processes mark the same 1000 rows as fetched, nothing is corrupted. Although if you do not operate on InnoDB, this might not be the best way as far as performance is concerned (MyISAM locks whole tables on updates, InnoDB only the rows that are updated).
2) If this is not something you can use, I would surely check out the solution you gave as your last one, with a little modification. Create a table (let's say: fetched_ids), and save fetched rows' ids in that table. Then you could use something like:
SELECT tt.* FROM transactional_table tt
LEFT JOIN fetched_ids fi ON tt.id = fi.row_id
WHERE fi.row_id IS NULL
This will return the rows from your transactional table that have not been recorded as already fetched. As long as both (tt.id) and (fi.row_id) have (ideally unique) indexes, this should work just fine even on large sets of data. MySQL handles JOINs on indexed fields pretty well. Do not be afraid to try it out - create a new table, copy ids to it, delete some of them and run your query. You'll see the results and you'll know if they are satisfactory :)
P.S. Of course, adding rows to this 'fetched_ids' table should be done carefully so as not to create unnecessary duplicates (30 simultaneous processes could write 30 times the data you need - and if you need performance, you should watch out for this case).
How about a second table with a structure like this:
source_fk - this would hold the ID of the data row you want to read.
process_id - this would be a unique id for one of the 30 processes.
Then do a LEFT JOIN and exclude items from your source that already have entries matching the specified process_id.
Once you get your results, just go back and add the source_fk and process_id for each result you got.
One plus about this is you can add more processes later on with no problem.
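A small sketch of that idea, with made-up table and column names:
CREATE TABLE processed_rows (
    source_fk  BIGINT NOT NULL,  -- id of the row in the source table
    process_id INT NOT NULL,     -- which of the 30 processes handled it
    PRIMARY KEY (source_fk, process_id)
);
-- rows process 7 has not handled yet
SELECT s.* FROM source_table s
LEFT JOIN processed_rows p ON p.source_fk = s.id AND p.process_id = 7
WHERE p.source_fk IS NULL;
-- after handling a row, record it
INSERT INTO processed_rows (source_fk, process_id) VALUES (12345, 7);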
I would try adding a timestamp column and using it as a reference when retrieving new rows.

How to properly avoid Mysql Race Conditions

I know this has been asked before, but I'm still confused and would like to avoid any problems before I go into programming if possible.
I plan on having an internal website with at least 100 users active at any given time. Users would post an item (inserted into the db with 0 as its value) and that item would be shown via a PHP site (db query). Users then get the option to press a button and lock that item as theirs (setting the item's value to their user id).
How do I ensure that 2 or more users don't claim the same item at the same time? I know that in a language like C++ I would just use a plain old mutex lock. Is there an equivalent in MySQL that will lock just one item row like that? I've seen references to LOCK TABLES and GET_LOCK() and many others, so I'm still very confused about what would be best.
There is potential for many people all racing to press that one button and it would be disastrous if multiple people get a confirmation.
I know this is a prime example of a race condition, but mysql is foreign territory for me.
I will obviously query the value of the item before I update it and make sure it hasn't been claimed, but what is the best way to ensure that this race condition is avoided?
Thanks in advance.
To achieve this, you will need to lock the record somehow.
Add a column LockedBy defaulting to 0.
When someone pushes the button execute a query resembling this:
UPDATE table SET LockedBy = <user_id> WHERE LockedBy = 0 AND id = <item_id>;
After the update, check the number of affected rows (in PHP, mysql_affected_rows()). If the value is 0, the query did not update anything because the LockedBy column was not 0, i.e. the item is already locked by someone else.
Hope this helps
When you post a row, set the column to NULL, not 0.
Then when a user updates the row to make it their own, update it as follows:
UPDATE MyTable SET ownership = COALESCE(ownership, $my_user_id) WHERE id = ...
COALESCE() returns its first non-null argument. So even if you and I are updating concurrently, the first one to commit gets to set the value. The second one will not override that value.
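If the application needs to know whether it won the race, the affected-rows count tells it, since the losing UPDATE changes nothing (this assumes the usual driver behaviour of counting changed rather than matched rows); the ids below are placeholders:
UPDATE MyTable SET ownership = COALESCE(ownership, 42) WHERE id = 12345;
-- 1 means this session claimed the row, 0 means someone else already owns it
SELECT ROW_COUNT() AS claimed;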
You may consider transactions:
START TRANSACTION;
SELECT ownership FROM ....;
UPDATE table .....; -- set the ownership if the row is not owned yet
COMMIT;
You can also ROLLBACK all the queries in the transaction if you catch an error!
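Note that for the transaction to actually block a concurrent claimer, the SELECT inside it generally needs to be a locking read (SELECT ... FOR UPDATE) on an InnoDB table. A rough sketch with made-up names:
START TRANSACTION;
-- lock the row so no other transaction can lock or modify it until COMMIT
SELECT ownership FROM items WHERE id = 12345 FOR UPDATE;
-- if ownership turned out to be NULL/0, claim it
UPDATE items SET ownership = 42 WHERE id = 12345;
COMMIT;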

MySQL Update entire table with unknown # of rows and clear the rest

I'm pretty sure this particular quirk isn't a duplicate so here goes.
I have a table of services. In this table, I have about 40 rows with the following columns:
Services:
id_Services -- primary key
Name -- name of the service
Cost_a -- for variant a of service
Cost_b -- for variant b of service
Order -- order service is displayed in
The user can go into an admin tool and update any of this information - including deleting multiple rows, adding a row, editing info, and changing the order they are displayed in.
My question is this: since I will never know how many rows will be incoming from a submission (there could be 1 more, or 100% fewer), I was wondering how to address this in my query.
Upon submission, every value is resubmitted. I'd hate to do it this way but the easiest way I can think of is to truncate the table and reinsert everything... but that seems a little... uhhh... bad! What is the best way to accomplish this?
RE-EDIT: For example: I start with 40 rows and update with 36. I still have to do something with the values in rows 37-40. How can I do this? Are there any MySQL tricks or functions that will do this for me?
Thank you very much for your help!
You're slightly limited by the use case; you're doing insertion/update/truncation that's presented to the user as a batch operation, but in the back-end you'll have to do these in separate statements.
Watch out for concurrency: use transactions if you can.
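One way to do the batch without truncating, sketched against the question's schema (the concrete values are placeholders): upsert every submitted row, then delete whatever the submission no longer contains.
START TRANSACTION;
-- upsert the submitted rows (id_Services must be the primary key or a unique key)
INSERT INTO Services (id_Services, Name, Cost_a, Cost_b, `Order`)
VALUES (1, 'Service A', 10.00, 12.00, 1),
       (2, 'Service B', 20.00, 22.00, 2)
ON DUPLICATE KEY UPDATE
    Name = VALUES(Name), Cost_a = VALUES(Cost_a), Cost_b = VALUES(Cost_b), `Order` = VALUES(`Order`);
-- anything not in the submitted id list (rows 37-40 in the example) gets removed
DELETE FROM Services WHERE id_Services NOT IN (1, 2);
COMMIT;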