I have the following MySQL table:
myTable:
id int auto_increment
voucher int not null
id_user int null
I've populated the voucher field with values from 1 to 100000, so I've got 100000 records. When a user clicks a button in a PHP page, I need to allocate a record to that user, so I run something like:
update myTable set id_user=XXX where
voucher=(SELECT * FROM (SELECT MIN(voucher) FROM myTable WHERE id_user is null) v);
The problem is that I don't use locks, and I should: if two users click at the same moment, I risk assigning the same voucher to both of them (two updates on the same record, so one user's assignment is lost).
I think there must be a correct way to do this, can you help me please?
Thanks!
If you truly want to serialize your process, you can issue LOCK TABLES tablename WRITE at the start of your transaction, and UNLOCK TABLES when done.
If you are using InnoDB and transactions, begin with SET autocommit = 0 rather than START TRANSACTION (LOCK TABLES implicitly commits any active transaction), then issue LOCK TABLES, and COMMIT before you UNLOCK TABLES.
I am not advocating this method, as there is usually a better way of handling, however if you need a quick and dirty solution, this will work with a minimal amount of code changes.
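A minimal sketch of that quick-and-dirty approach, assuming an InnoDB table and using 42 as a placeholder for the current user's id. It replaces the nested MIN() subquery with UPDATE ... ORDER BY ... LIMIT 1, so that myTable is referenced only once while it is locked (MySQL rejects a second reference to a locked table under the same name):

```sql
-- Sketch only: 42 stands in for the id of the logged-in user.
SET autocommit = 0;            -- open the transaction without START TRANSACTION
LOCK TABLES myTable WRITE;     -- other sessions wait here until UNLOCK TABLES

-- Claim the lowest free voucher in a single statement.
UPDATE myTable
SET id_user = 42
WHERE id_user IS NULL
ORDER BY voucher
LIMIT 1;

COMMIT;                        -- commit first, then release the table lock
UNLOCK TABLES;
SET autocommit = 1;
```

Every click is serialized on the table lock, so the section between LOCK TABLES and UNLOCK TABLES should stay as short as possible.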
I have a MySQL table of Users, and a table of Actions performed by the Users (linked to that User by the primary key, userid). The Actions table has an incrementing key indx. Whenever I add a new row to that table, I then update the latest column of the relevant Users row with the indx of the row I just added to the Actions table. So something like:
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
The idea being that I can check for updates for a User by seeing if the latest is higher than the last time I checked.
My issue is that if more than one connection is opened on the database and they try and add an Action for the same User at the same time, connection2 could conceivably run their INSERT and UPDATE between the INSERT and update of connection1, and the latest entry of the user they're both trying to update will no longer have the indx of the most recent action entry.
I've been reading up on transaction, isolation levels, etc. But haven't really found a way around this (though my understanding of how these work exactly is pretty shaky, so maybe I just misunderstood). I think I need a way to lock the Actions table until the User table is updated. This application only gets used by a few hundred users tops, so I don't think the performance hit due to momentarily locking the table will be too bad.
So is that something that can be done in MySQL? Is there a better solution? I imagine this general pattern must be pretty common: having one table with a bunch of varieties of rows, and a second table with a row that tracks meta data for each variety in table A and needs to be updated atomically each time that first table is changed. So I'm hoping there's a solution that isn't too complex
Use SELECT ... FOR UPDATE to lock the user's row; this serializes access for that user and prevents the race condition:
START TRANSACTION;
SELECT any_column FROM users WHERE userid=1 FOR UPDATE;
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
COMMIT;
However, this will slow down your INSERT rate, because all these transactions from all sessions will be serialized.
The better option is to not store the last ID in the users table at all. Just use SELECT MAX(indx) FROM actions WHERE userid = xxxx wherever this number is required. With an index on actions(userid) this query will be very fast (assuming that indx is the primary key of this table), and the inserts will not be slowed down.
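As a rough sketch of that option (the index name idx_actions_userid is made up for illustration, and indx is assumed to be the primary key of an InnoDB actions table):

```sql
-- One-time setup: index the column used to filter actions by user.
CREATE INDEX idx_actions_userid ON actions (userid);

-- Wherever "latest" is needed, derive it instead of storing it.
SELECT MAX(indx) AS latest
FROM actions
WHERE userid = 1;
```

Because nothing is written to users on each insert, there is no second statement left to race against.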
I've looked over all of the related questions I could find, but couldn't find one that answers mine.
I have a table like this:
id | name | age | active | ...... | ... |
where "id" is the primary key, and the ... meaning there are something like 30 columns.
the "active" column is of tinyint type.
My task:
Update ids 1,4,12,55,111 (those are just an example, it can be 1000 different id in total) with active = 1 in a single query.
I did:
UPDATE table SET active = 1 WHERE id IN (1,4,12,55,111)
its inside a transaction, cause i'm updating something else in this process.
the engine is InnoDB
My problem:
Someone told me that such a query is equivalent to 5 queries at execution, because the IN is translated into the given number of ORs, which are run one after another.
So eventually, instead of 1 query I get N, where N is the number of values in the IN.
He suggests creating a temp table, inserting all the new values into it, and then updating by join.
Is he right, both about the equivalence and about performance?
What do you suggest? I thought INSERT INTO .. ON DUPLICATE KEY UPDATE would help, but I don't have all the data for the row, only its id, and all I want is to set active = 1 on it.
Maybe this query is better?
UPDATE table SET
active = CASE
WHEN id='1' THEN '1'
WHEN id='4' THEN '1'
WHEN id='12' THEN '1'
WHEN id='55' THEN '1'
WHEN id='111' THEN '1'
ELSE active END
WHERE campaign_id > 0; -- otherwise it throws an error about updating without a WHERE clause in safe mode, and I don't know if I can toggle safe mode off.
Thanks.
It's the other way around. OR can sometimes be turned into IN. IN is then efficiently executed, especially if there is an index on the column. If you have 1000 entries in the IN, it will do 1000 probes into the table based on id.
If you are running a new enough version of MySQL, I think you can do EXPLAIN EXTENDED UPDATE ...OR...; SHOW WARNINGS; to see this conversion;
The UPDATE CASE... will probably tediously check each and every row.
It would probably be better for other users of the system if you broke the UPDATE up into multiple UPDATEs, each having 100-1000 rows.
Where did you get the ids in the first place? If it was via a SELECT, then perhaps it would be practical to combine it with the UPDATE to make it one step instead of two.
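A hedged sketch of both suggestions, chunking and folding the SELECT into the UPDATE. The batch boundaries and the age >= 18 filter are made-up stand-ins for wherever the ids really come from:

```sql
-- Chunking: several smaller statements instead of one huge IN list.
UPDATE `table` SET active = 1 WHERE id IN (1, 4, 12);
UPDATE `table` SET active = 1 WHERE id IN (55, 111);

-- One step instead of two: if the ids came from a query such as
--   SELECT id FROM `table` WHERE age >= 18 AND active = 0;
-- the same condition can drive the UPDATE directly.
UPDATE `table` SET active = 1 WHERE age >= 18 AND active = 0;
```

Chunking only shortens lock times if each chunk is committed on its own; inside one enclosing transaction the row locks are held until the final COMMIT either way.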
I think below is better because it uses primary key.
UPDATE table SET active = 1 WHERE id<=5
I've got a bit of a stupid question. The thing is my program has to have the function to delete data from my database. Yay, not really the problem. But how can I delete data without the danger that others can see, that there has been something deleted.
User Table:
U_ID U_NAME
1 Chris
2 Peter
OTHER TABLE
ID TIMESTAMP FK_U_ID
1 2012-12-01 1
2 2012-12-02 1
Sooooo the ID's are AUTO_INCREMENT, so if I delete one of them there's a gap. Furthermore, the timestamp is also bigger than the row before, so ascending.
I want to let the data with ID 1 disappear from the user's profile (U_ID 1).
If I delete it, there is a gap. If I just change the FK_U_ID to 2 (Peter) it's obvious, because when I insert data, there are 20 or 30 data rows with the same U_ID...so it's obvious that there has been a modification.
If I set the FK_U_ID NULL --> same sh** like when I change it to another U_ID.
Is there any solution to get this to work? I know that if nobody but me has access to the database, it's just no problem. But just in case, if somebody inspects my program, it should not be obvious that there have been modifications.
So here we go.
For the ID gaps issue you can use GUIDs as @SLaks suggests, but then you can't use the native RDBMS auto_increment which means you have to create the GUID and insert it along with the rest of the record data upon creation. Of course, you don't really need the ID to be globally unique, you could just store a random string of 20 characters or something, but then you have to do a DB read to see if that ID is taken and repeat (recursively) that process until you find an unused ID... could be quite taxing.
It's not at all clear why you would want to "hide" evidence that a delete was performed. That sounds like a really bad idea. I'm not a fan of promulgating misinformation.
Two of the characteristics of an ideal primary key are:
- anonymous (be void of any useful information, doesn't matter what it's set to)
- immutable (once assigned, it will never be changed.)
But, if we set that whole discussion aside...
I can answer a slightly different question (an answer you might find helpful to your particular situation)
The only way to eliminate a "gap" in the values in a column with an AUTO_INCREMENT would be to change the column values from their current values to a contiguous sequence of new values. If there are any foreign keys that reference that column, the values in those columns would need to be updated as well, to preserve the relationship. That will likely leave the current auto_increment value of the table higher than the largest value of the id column, so I'd want to reset that as well, to avoid a "gap" on the next insert.
(I have done re-sequencing of auto_increment values in development and test environments, to "cleanup" lookup tables, and to move the id values of some tables to ranges that are distinct from ranges in other tables... that lets me test SQL to make sure the SQL join predicates aren't inadvertently referencing the wrong table, and returning rows that look correct by accident... those are some reasons I've done reassignment of auto_increment values)
Note that the database can "automagically" update foreign key values (for InnoDB tables) when you change the primary key value, as long as the foreign key constraint is defined with ON UPDATE CASCADE, and FOREIGN_KEY_CHECKS is not disabled.
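To illustrate that cascade, here is a small sketch. The child table mychild, its columns, and the constraint name are hypothetical, and mytable.id is assumed to be an INT primary key on an InnoDB table:

```sql
-- Hypothetical child table referencing mytable(id); all names are illustrative.
CREATE TABLE mychild (
  id        INT AUTO_INCREMENT PRIMARY KEY,
  parent_id INT NOT NULL,
  CONSTRAINT fk_mychild_parent
    FOREIGN KEY (parent_id) REFERENCES mytable (id)
    ON UPDATE CASCADE
) ENGINE=InnoDB;

-- Changing a parent key now rewrites the matching mychild.parent_id values too.
UPDATE mytable SET id = 1000 WHERE id = 7;
```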
If there are no foreign keys to deal with, and assuming that all of the current values of id are positive integers, then I've been able to do something like this: (with appropriate backups in place, so I can recover if things don't work right)
UPDATE mytable t
  JOIN (
         SELECT s.id AS old_id
              , @i := @i + 1 AS new_id
           FROM mytable s
          CROSS
           JOIN (SELECT @i := 0) i
          ORDER BY s.id
       ) c
    ON t.id = c.old_id
   SET t.id = c.new_id
 WHERE t.id <> c.new_id
To reset the table AUTO_INCREMENT back down to the largest id value in the table:
ALTER TABLE mytable AUTO_INCREMENT = 1;
Typically, I will create a table and populate it from that query in the inline view (aliased as c) above. I can then use that table to update both foreign key columns and the primary key column, first disabling the FOREIGN_KEY_CHECKS and then re-enabling it. (In a concurrent environment, where other processes might be inserting/updating/deleting rows from one of the tables, I would of course first obtain an exclusive lock on all of the tables to be updated.)
Taking up again, the discussion I set aside earlier... this type of "administrative" function can be useful in a test environment, when setting up test cases. But it is NOT a function that is ever performed in a production environment, with live data.
My database knowledge is reasonable, I would say; I'm using MySQL (InnoDB) for this and have done some Postgres work as well. Anyway...
I have a large amount of Yes or No questions.
A large amount of people can contribute to the same poll.
A user can choose either option and this will be recorded in the database.
User can change their mind later and swap choices which will require an update to the data stored.
My current plan for storing this data:
POLLID, USERID, DECISION, TIMESTAMP
Obviously user data is in another table.
To add their choice, I would have to query to see if they have voted before, then update if so and insert otherwise.
If I want to see the poll results I would need to go iterate through all decisions (albeit indexed portions) every time someone wants to see the poll.
My questions are
Is there any more efficient way to store/query this?
Would I have an index on POLLID, or POLLID & USERID (maybe just a unique constraint)? Or other?
Additional side question: Why don't I have an option to choose HASH vs BTREE indexes on my tables like I would in Postgres?
The design sounds good, a few ideas:
A table for polls: poll id, question.
A table for choices: choice id, text.
A table to link polls to choices: poll id->choice ids.
A table for users: user details, user ids.
A votes table: (user id, poll id), choice id, time stamp. (brackets are a unique pair)
Inserting/updating for a single user will work fine, as you can just check if an entry exists for the user id and the poll id.
You can view the results much easier than iterating through by using COUNT.
e.g.: SELECT COUNT(*) FROM votes WHERE pollid = id AND decision = choiceid
That would tell you how many people voted for "choiceid" in the poll "pollid".
Late Edit:
This is a way of inserting if it doesn't exist and updating if it does:
IF EXISTS (SELECT * FROM TableName WHERE UserId='Uid' AND PollId = 'pollid')
UPDATE TableName SET (set values here) WHERE UserId='Uid' AND PollId = 'pollid'
ELSE
INSERT INTO TableName VALUES (insert values here)
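The IF EXISTS ... ELSE form above is T-SQL style and will not run as a plain statement in MySQL. A minimal MySQL sketch of the same idea, assuming a votes table whose primary key is the (pollid, userid) pair; the column name voted_at and the literal ids 7 and 42 are placeholders:

```sql
-- The composite primary key is what makes "one vote per user per poll" enforceable.
CREATE TABLE votes (
  pollid   INT NOT NULL,
  userid   INT NOT NULL,
  decision TINYINT NOT NULL,
  voted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (pollid, userid)
) ENGINE=InnoDB;

-- Insert the vote, or overwrite the earlier one if the pair already exists.
-- VALUES(decision) refers to the value this INSERT tried to store.
INSERT INTO votes (pollid, userid, decision)
VALUES (7, 42, 1)
ON DUPLICATE KEY UPDATE
  decision = VALUES(decision),
  voted_at = CURRENT_TIMESTAMP;

-- Tally the poll without iterating over rows in application code.
SELECT decision, COUNT(*) AS votes
FROM votes
WHERE pollid = 7
GROUP BY decision;
```

The primary key doubles as the index for the tally query, since pollid is its leading column.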
I am running many instances of a webcrawler in parallel.
Each crawler selects a domain from a table, inserts that url and a start time into a log table, and then starts crawling the domain.
Other parallel crawlers check the log table to see what domains are already being crawled before selecting their own domain to crawl.
I need to prevent other crawlers from selecting a domain that has just been selected by another crawler but doesn't have a log entry yet. My best guess at how to do this is to lock the database from all other read/writes while one crawler selects a domain and inserts a row in the log table (two queries).
How the heck does one do this? I'm afraid this is terribly complex and relies on many other things. Please help get me started.
This code seems like a good solution (see the error below, however):
INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
(
SELECT companies.id FROM companies
LEFT OUTER JOIN crawlLog
ON companies.id = crawlLog.companyId
WHERE crawlLog.companyId IS NULL
LIMIT 1
),
now()
)
but I keep getting the following mysql error:
You can't specify target table 'crawlLog' for update in FROM clause
Is there a way to accomplish the same thing without this problem? I've tried a couple different ways. Including this:
INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
(
SELECT id
FROM companies
WHERE id NOT IN (SELECT companyId FROM crawlLog) LIMIT 1
),
now()
)
You can lock tables using the MySQL LOCK TABLES command like this:
LOCK TABLES tablename WRITE;
# Do other queries here
UNLOCK TABLES;
See:
http://dev.mysql.com/doc/refman/5.5/en/lock-tables.html
Well, table locks are one way to deal with that; but this makes parallel requests impossible. If the table is InnoDB you could force a row lock instead, using SELECT ... FOR UPDATE within a transaction.
BEGIN;
SELECT ... FROM your_table WHERE domainname = ... FOR UPDATE;
# do whatever you have to do
COMMIT;
Please note that you will need an index on domainname (or whatever column you use in the WHERE-clause) for this to work, but this makes sense in general and I assume you will have that anyway.
You probably don't want to lock the table. If you do that you'll have to worry about trapping errors when the other crawlers try to write to the database - which is what you were thinking when you said "...terribly complex and relies on many other things."
Instead you should probably wrap the group of queries in a MySQL transaction (see http://dev.mysql.com/doc/refman/5.0/en/commit.html) like this:
START TRANSACTION;
SELECT @URL:=url FROM tablewiththeurls WHERE uncrawled=1 ORDER BY somecriterion LIMIT 1;
INSERT INTO loggingtable SET url=@URL;
COMMIT;
Or something close to that.
[edit] I just realized - you could probably do everything you need in a single query and not even have to worry about transactions. Something like this:
INSERT INTO loggingtable (url) SELECT u.url FROM tablewithurls u LEFT JOIN loggingtable l ON l.url=u.url WHERE {some criterion used to pick the url to work on} AND l.url IS NULL;
I got some inspiration from @Eljakim's answer and started this new thread where I figured out a great trick. It doesn't involve locking anything and is very simple.
INSERT INTO crawlLog (companyId, timeStartCrawling)
SELECT id, now()
FROM companies
WHERE id NOT IN
(
SELECT companyId
FROM crawlLog AS crawlLogAlias
)
LIMIT 1
I wouldn't use locking, or transactions.
The easiest way to go is to INSERT a record in the logging table if it's not yet present, and then check for that record.
Assume you have tblcrawels (cra_id) that is filled with your crawlers and tblurl (url_id) that is filled with the URLs, and a table tbllogging (log_cra_id, log_url_id) for your logfile.
You would run the following query if crawler 1 wants to start crawling url 2:
INSERT INTO tbllogging (log_cra_id, log_url_id)
SELECT 1, url_id FROM tblurl LEFT JOIN tbllogging ON url_id=log_url_id
WHERE url_id=2 AND log_url_id IS NULL;
The next step is to check whether this record has been inserted.
SELECT * FROM tbllogging WHERE log_url_id=2 AND log_cra_id=1
If you get any results then crawler 1 can crawl this url. If you don't get any results this means that another crawler has inserted in the same line and is already crawling.
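The insert-then-check idea can be made stricter with a unique key, so that a second claim on the same domain is rejected by the database itself. A sketch using the crawlLog and companies tables from the question; the key name uq_crawllog_company is made up, and the unique key is an added assumption that only holds if each company is logged at most once:

```sql
-- One-time setup: at most one log row per company (assumption, see above).
ALTER TABLE crawlLog
  ADD UNIQUE KEY uq_crawllog_company (companyId);

-- Try to claim one company that has no log row yet.
-- INSERT IGNORE turns a duplicate-key collision into "0 rows inserted".
INSERT IGNORE INTO crawlLog (companyId, timeStartCrawling)
SELECT c.id, NOW()
FROM companies AS c
LEFT JOIN crawlLog AS l ON l.companyId = c.id
WHERE l.companyId IS NULL
ORDER BY c.id
LIMIT 1;

-- 1 means this crawler holds the claim; 0 means nothing was claimed.
SELECT ROW_COUNT();
```

A crawler that gets 0 back either lost the race or found no unclaimed company, so it can retry or stop.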
It's better to use a row lock or a transaction-based query, so that other parallel requests can still access the table.