I've been developing an application that handles accounts and transactions made over these accounts.
Currently the tables the application uses are modelled the following way:
account
+----+-----------------+---------+-----+
| id | current_balance | version | ... |
+----+-----------------+---------+-----+
|  1 |            1000 |   48902 | ... |
|  2 |            2000 |   34933 | ... |
|  3 |             100 |     103 | ... |
+----+-----------------+---------+-----+
account_transaction
+-----+------------+----------------------+-------+------------------+-----+
| id  | account_id | date                 | value | resulting_amount | ... |
+-----+------------+----------------------+-------+------------------+-----+
| 101 |          1 | 03/may/2012 10:13:33 |  1000 |             2000 | ... |
| 102 |          2 | 03/may/2012 10:13:33 |   500 |             1500 | ... |
| 103 |          1 | 03/may/2012 10:13:34 |  -500 |             1500 | ... |
| 104 |          2 | 03/may/2012 10:13:35 |   -50 |             1450 | ... |
| 105 |          2 | 03/may/2012 10:13:35 |   550 |             2000 | ... |
| 106 |          1 | 03/may/2012 10:13:35 |  -500 |             1000 | ... |
+-----+------------+----------------------+-------+------------------+-----+
Whenever the application processes a new transaction, it inserts a new row into account_transaction and, in the account table, updates the current_balance column (which stores the account's current balance) and the version column (which is used for optimistic locking).
If the optimistic lock succeeds, the transaction is committed; if it doesn't, the transaction is rolled back.
As a rough example, when processing transaction 105, the application ran the following pseudo SQL/Java:
set autocommit = 0;
insert into account_transaction
(account_id, date, value, resulting_amount)
values
(2, sysdate(), 550, 2000);
update account set
current_balance = 2000,
version = 34933
where
id = 2 and
version = 34932;
if (ROW_COUNT() != 1) {
rollback;
}
else {
commit;
}
However, certain accounts are very active and receive many simultaneous transactions, which causes deadlocks in MySQL while updating rows in the account table. These deadlocks impose a serious performance penalty on the application, since transactions have to be reprocessed whenever a database deadlock occurs.
How can I efficiently handle the current balance for the accounts? The current balance is needed to authorize/deny new transactions and is used in various reports.
I think this whole model is over-engineered.
Abandoning the optimistic locking through version and having a simple...
UPDATE account SET current_balance = current_balance + value WHERE id = ...
...at the end of the transaction that inserts a new account_transaction should be plenty fast. For data integrity, consider putting this into an AFTER INSERT trigger on account_transaction.[1]
First of all, you are doing it at the end of the transaction, so even if the transaction is long, the lock contention on this row should be short.
SQL guarantees consistent data view within a single statement, so there is no need for separate SELECT ... FOR UPDATE.
Also, since you are adding a value instead of directly setting the sum, it doesn't really matter in which order these operations are done; addition is commutative (so shorter transactions can "overtake" the longer ones).
[1] But be careful not to fire it too early: only insert a new account_transaction once it is completely "cooked"; don't (for example) insert early and update the resulting_amount later.
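A rough sketch of that trigger (the trigger name is mine; the table and column names are taken from the tables above):
DELIMITER //
CREATE TRIGGER account_transaction_ai
AFTER INSERT ON account_transaction
FOR EACH ROW
BEGIN
    -- Add the value of the freshly inserted transaction to the running balance.
    UPDATE account
       SET current_balance = current_balance + NEW.value
     WHERE id = NEW.account_id;
END//
DELIMITER ;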
Related
Say I have multiple workers that can concurrently read and write against a MySQL table (e.g. jobs). The task for each worker is:
Find the oldest QUEUED job
Set its status to RUNNING
Return the corresponding ID.
Note that there may not be any qualifying (i.e. QUEUED) jobs when a worker runs step #1.
I have the following pseudo-code so far. I believe I need to cancel (ROLLBACK) the transaction if step #1 returns no jobs. How would I do that in the code below?
BEGIN TRANSACTION;
# Update the status of jobs fetched by this query:
SELECT id from jobs WHERE status = "QUEUED"
ORDER BY created_at ASC LIMIT 1;
# Do the actual update, otherwise abort (i.e. ROLLBACK?)
UPDATE jobs
SET status="RUNNING"
# HERE: Not sure how to make this conditional on the previous ID
# WHERE id = <ID from the previous SELECT>
COMMIT;
I am implementing something very similar to your case this week. A number of workers, each grabbing the "next" row in a set of rows to work on.
The pseudocode is something like this:
BEGIN;
SELECT ID INTO @id FROM mytable WHERE status = 'QUEUED' LIMIT 1 FOR UPDATE;
UPDATE mytable SET status = 'RUNNING' WHERE id = @id;
COMMIT;
Using FOR UPDATE is important to avoid race conditions, i.e. more than one worker trying to grab the same row.
See https://dev.mysql.com/doc/refman/8.0/en/select-into.html for information about SELECT ... INTO.
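If there is a chance that no row is QUEUED (as the question mentions), a small variant of the above (my addition, not part of the original pseudocode) makes that case easy to detect, so the application can simply ROLLBACK:
BEGIN;
SET @id = NULL;
SELECT id INTO @id
  FROM mytable
 WHERE status = 'QUEUED'
 LIMIT 1
 FOR UPDATE;
-- If @id is still NULL here, nothing was queued; the application can ROLLBACK.
UPDATE mytable SET status = 'RUNNING' WHERE id = @id;
COMMIT;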
It's still not quite clear what you are after. But assuming your task is: find the next QUEUED job, set its status to RUNNING and select the corresponding ID.
In a single-threaded environment, you can just use your code. Fetch the selected ID into a variable in your application code and pass it to the UPDATE query in the WHERE clause. You don't even need a transaction, since there is only one writing statement. You can mimic this in an SQL script.
Assuming this is your current state:
| id | created_at | status |
| --- | ------------------- | -------- |
| 1 | 2020-06-15 12:00:00 | COMPLETED |
| 2 | 2020-06-15 12:00:10 | QUEUED |
| 3 | 2020-06-15 12:00:20 | QUEUED |
| 4 | 2020-06-15 12:00:30 | QUEUED |
You want to start the next queued job (which has id=2).
SET @id_for_update = (
SELECT id
FROM jobs
WHERE status = 'QUEUED'
ORDER BY id
LIMIT 1
);
UPDATE jobs
SET status="RUNNING"
WHERE id = @id_for_update;
SELECT @id_for_update;
You will get
| @id_for_update |
| -------------- |
| 2              |
from the last select. And the table will have this state:
| id | created_at | status |
| --- | ------------------- | -------- |
| 1 | 2020-06-15 12:00:00 | COMPLETED |
| 2 | 2020-06-15 12:00:10 | RUNNING |
| 3 | 2020-06-15 12:00:20 | QUEUED |
| 4 | 2020-06-15 12:00:30 | QUEUED |
If you have multiple processes that start jobs, you would need to lock the row with FOR UPDATE. But that can be avoided by using LAST_INSERT_ID():
Starting from the state above, with job 2 already running:
UPDATE jobs
SET status = 'RUNNING',
id = LAST_INSERT_ID(id)
WHERE status = 'QUEUED'
ORDER BY id
LIMIT 1;
SELECT LAST_INSERT_ID(), ROW_COUNT();
You will get:
| LAST_INSERT_ID() | ROW_COUNT() |
| ---------------- | ----------- |
| 3 | 1 |
And the new state is:
| id | created_at | status |
| --- | ------------------- | -------- |
| 1 | 2020-06-15 12:00:00 | COMPLETED |
| 2 | 2020-06-15 12:00:10 | RUNNING |
| 3 | 2020-06-15 12:00:20 | RUNNING |
| 4 | 2020-06-15 12:00:30 | QUEUED |
If the UPDATE statement affected no row (there were no queued rows) ROW_COUNT() will be 0.
There might be some risks which I am not aware of. But this is also not really how I would approach this. I would rather store more information in the jobs table. A simple example:
CREATE TABLE jobs (
id INT auto_increment primary key,
created_at timestamp not null default now(),
updated_at timestamp not null default now() on update now(),
status varchar(50) not null default 'QUEUED',
process_id varchar(50) null default null
);
and
UPDATE jobs
SET status = 'RUNNING',
process_id = 'some_unique_pid'
WHERE status = 'QUEUED'
ORDER BY id
LIMIT 1;
Now a running job belongs to a specific process and you can just select it with
SELECT * FROM jobs WHERE process_id = 'some_unique_pid';
You might even like to have more information, e.g. queued_at, started_at and finished_at.
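For example (just a sketch; the column names are only suggestions):
ALTER TABLE jobs
    ADD COLUMN queued_at   TIMESTAMP NULL DEFAULT NULL,
    ADD COLUMN started_at  TIMESTAMP NULL DEFAULT NULL,
    ADD COLUMN finished_at TIMESTAMP NULL DEFAULT NULL;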
Adding SKIP LOCKED to the SELECT query, and wrapping everything in a SQL transaction that is committed only when the job is done, avoids jobs getting stuck in status RUNNING if a worker crashes (because the uncommitted transaction rolls back). It is now supported in the newest versions of most common DBMSs.
See:
Select only unlocked rows mysql
https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html#innodb-locking-reads-nowait-skip-locked
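A minimal sketch of the SKIP LOCKED pattern described above, assuming MySQL 8.0+ and the jobs table from the answers above:
START TRANSACTION;

SET @id = NULL;
SELECT id INTO @id
  FROM jobs
 WHERE status = 'QUEUED'
 ORDER BY created_at
 LIMIT 1
 FOR UPDATE SKIP LOCKED;   -- other workers skip this row instead of blocking on it

UPDATE jobs SET status = 'RUNNING' WHERE id = @id;

-- ... do the actual work here, keeping the transaction open ...

UPDATE jobs SET status = 'COMPLETED' WHERE id = @id;
COMMIT;   -- if the worker crashes before this point, the rollback re-queues the job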
(This is not an answer to the question, but a list of caveats that you need to be aware of when using any of the real Answers. Some of these have already been mentioned.)
Replication -- You must do all the locking on the Primary. If you are using a cluster with multiple writable nodes, be aware of the inter-node delays.
Backlog -- When something breaks, you could get a huge list of tasks in the queue. This may lead to some ugly messes.
Number of 'workers' -- Don't have more than a "few" workers. If you try to have, say, 100 concurrent workers, they will stumble over each other and cause nasty problems.
Reaper -- Since a worker may crash, the task assigned to it may never get cleared. Have a TIMESTAMP on the rows so a separate (cron/EVENT/whatever) job can discover which tasks are long overdue and clear them (a sketch follows at the end of this answer).
If the tasks are fast enough, then the overhead of the queue could be a burden. That is, "Don't queue it, just do it."
You are right to grab the task in one transaction, then later release the task in a separate transaction. Using InnoDB's locking is folly for anything but trivially fast actions.
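For the "Reaper" caveat above, a rough sketch (assuming the jobs table from the earlier answers with its updated_at and process_id columns; the 10-minute threshold is arbitrary):
-- Re-queue jobs that have been RUNNING for too long (run from cron or an EVENT).
UPDATE jobs
   SET status = 'QUEUED',
       process_id = NULL
 WHERE status = 'RUNNING'
   AND updated_at < NOW() - INTERVAL 10 MINUTE;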
I am creating a web app that lets N users enter receipt data.
A set of scanned receipts is given to users, but no more than 2 users should work on the same receipt.
i.e. User A and User B can work on receipt-1, but User C cannot work on it (another receipt, say receipt-2, should be assigned to User C instead).
The table structure I am using looks similar to the following.
[User-Receipt Table]
+-----------+-------------+
| user_id   | receipt_id  |
+-----------+-------------+
| 000000001 | R0000000000 |
| 000000001 | R0000000001 |
| 000000001 | R0000000002 |
| 000000002 | R0000000000 |
| 000000002 | R0000000001 |
+-----------+-------------+
[Receipt Table]
+-------------+--------+
| receipt_id  | status |
+-------------+--------+
| R0000000000 | 0      |
| R0000000001 | 1      |
| R0000000002 | 0      |
| R0000000003 | 2      |
+-------------+--------+
★ status: 0 = not assigned, 1 = assigned to one user, 2 = assigned to two users
This is how I plan to achieve the above requirement:
1. Select receipts from the receipt table whose status is not equal to 2.
2. Insert the receipts fetched in step 1, along with the user to whom they are assigned, into the user-receipt table.
3. Update the receipt status (0 -> 1 or 1 -> 2).
The problem with this approach is that there is a chance that the SELECT (step 1) in one session is executed right before the UPDATE (step 3) of another session has committed. If this happens, receipts that are effectively at status 2 might be fetched and assigned to yet another user, which does not meet the requirement.
How can I make sure that this does not happen?
For all purposes, use transactions:
START TRANSACTION
your SQL commands
COMMIT
Transactions ensure that either all of your statements are executed or none of them are, and they implicitly lock the updated rows, which is more efficient than the second approach.
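A minimal sketch of the three steps inside one transaction (the table and column names are guesses based on the question's layout, and the SELECT ... FOR UPDATE locking clause is my addition to keep two sessions from grabbing the same receipt):
START TRANSACTION;

-- Step 1: pick a receipt that is not yet assigned to two users and lock its row.
SELECT receipt_id, status INTO @rid, @status
  FROM receipt
 WHERE status <> 2
 LIMIT 1
 FOR UPDATE;

-- Step 2: record the assignment for the current (hypothetical) user.
INSERT INTO user_receipt (user_id, receipt_id) VALUES ('000000003', @rid);

-- Step 3: bump the status (0 -> 1 or 1 -> 2).
UPDATE receipt SET status = @status + 1 WHERE receipt_id = @rid;

COMMIT;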
You can also do it using LOCK TABLES.
I have a project, an online service; I have built part of it and stopped. When a user uses the service, it must charge some amount (e.g. $5 per service). I don't know how to design the MySQL tables. I have made two tables: the first for the remaining balance and the second for the added and subtracted amounts. Maybe this is the wrong way; what is the best practice?
action_table
id | userId | reason          | amount
 1 |      4 | for service 3   |     -5
 2 |      2 | refill account  |    100
 3 |     13 | for service 3   |     -5
balance_table
id | userId | balance
 1 |      4 |      23
 2 |      2 |     125
 3 |     13 |       0
After the service is used, a query adds one row to action_table and updates balance_table.
Personally, if I were making an account database, I would have one table for accounts and one for transactions, like this:
Accounts:
| id | user | name | balance |
Transactions:
| id | account_id | description | amount | is_withdrawal |
The reason I came up with this is that it sometimes helps to think of database tables as real-world objects, and in this case you have a Transaction and an Account.
Then, you can use a TRIGGER to update the account table anytime you add a transaction.
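A rough sketch of such a trigger (lowercase accounts/transactions are assumed table names; amount is assumed to be stored as a positive number, with is_withdrawal marking the sign):
DELIMITER //
CREATE TRIGGER transactions_ai
AFTER INSERT ON transactions
FOR EACH ROW
BEGIN
    -- Subtract withdrawals, add everything else.
    UPDATE accounts
       SET balance = balance + IF(NEW.is_withdrawal, -NEW.amount, NEW.amount)
     WHERE id = NEW.account_id;
END//
DELIMITER ;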
For a personal project I'm working on right now I want to make a line graph of game prices on Steam, Impulse, EA Origins, and several other sites over time. At the moment I've modified a script used by SteamCalculator.com to record the current price (sale price if applicable) for every game, in every available country code, for each of these sites. I also have a column for the date on which the price was stored. My current tables look something like this:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us   | at   | au   | de   | no   | uk   | date       |
+----------+------+------+------+------+------+------+------------+
|   112233 |  999 |  899 |  999 | NULL |  899 |  699 | 2011-8-21  |
|   123456 | 1999 |  999 | 1999 |  999 |  999 |  999 | 2011-8-20  |
| ...      | ...  | ...  | ...  | ...  | ...  | ...  | ...        |
+----------+------+------+------+------+------+------+------------+
At the moment each country is updated separately (there's a for loop going through the countries), although if it would simplify things, this could be modified to temporarily store the new prices in an array and then update an entire row at a time. I'll likely be doing this eventually anyway, for performance reasons.
Now my issue is determining how to best update this table if one of the prices changes. For instance, let's suppose that on 8/22/2011 the game 112233 goes on sale in America for $4.99, Austria for 3.99€, and the other prices remain the same. I would need the table to look like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us   | at   | au   | de   | no   | uk   | date       |
+----------+------+------+------+------+------+------+------------+
|   112233 |  999 |  899 |  999 | NULL |  899 |  699 | 2011-8-21  |
|   123456 | 1999 |  999 | 1999 |  999 |  999 |  999 | 2011-8-20  |
| ...      | ...  | ...  | ...  | ...  | ...  | ...  | ...        |
|   112233 |  499 |  399 |  999 | NULL |  899 |  699 | 2011-8-22  |
+----------+------+------+------+------+------+------+------------+
I don't want to create a new row EVERY time the price is checked, otherwise I'll end up having millions of rows of repeated prices day after day. I also don't want to create a new row per changed price like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us   | at   | au   | de   | no   | uk   | date       |
+----------+------+------+------+------+------+------+------------+
|   112233 |  999 |  899 |  999 | NULL |  899 |  699 | 2011-8-21  |
|   123456 | 1999 |  999 | 1999 |  999 |  999 |  999 | 2011-8-20  |
| ...      | ...  | ...  | ...  | ...  | ...  | ...  | ...        |
|   112233 |  499 |  899 |  999 | NULL |  899 |  699 | 2011-8-22  |
|   112233 |  499 |  399 |  999 | NULL |  899 |  699 | 2011-8-22  |
+----------+------+------+------+------+------+------+------------+
I can prevent the first problem but not the second by making each (steam_id, <country>) a unique index and then adding ON DUPLICATE KEY UPDATE to every database query. This will only add a row if the price is different; however, it will add a new row for each country that changes. It also does not allow the same price for a single game on two different days (for instance, suppose game 112233 goes off sale later and returns to $9.99), so this is clearly an awful option.
I can prevent the second problem but not the first by making (steam_id, date) a unique index then adding ON DUPLICATE KEY UPDATE to every query. Every single day when the script is run the date has changed, so it will create a new row. This method ends up with hundreds of lines of the same prices from day to day.
How can I tell MySQL to create a new row if (and only if) any of the prices has changed since the latest date?
UPDATE -
At the recommendation of people in this thread I have changed the schema of my database to facilitate adding new country codes in the future and avoid the issue of needing to update entire rows at a time. The new schema looks something like:
+----------+------+---------+------------+
| steam_id | cc   | price   | date       |
+----------+------+---------+------------+
|   112233 | us   |     999 | 2011-8-21  |
|   123456 | uk   |     699 | 2011-8-20  |
| ...      | ...  | ...     | ...        |
+----------+------+---------+------------+
On top of this new schema I have discovered that I can use the following SQL query to grab the price from the most recent update:
SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1
At this point my question boils down to this:
Is it possible to (using only SQL rather than application logic) insert a row only if a condition is true? For instance:
INSERT INTO `steam_prices` (...) VALUES (...) IF price<>(SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1)
From the MySQL manual I can not find any way to do this. I have only found that you can ignore or update if a unique index is the same. However if I made the price a unique index (allowing me to update the date if it was the same) then I would not be able to recognize when a game went on sale and then returned to its original price. For instance:
+----------+------+---------+------------+
| steam_id | cc   | price   | date       |
+----------+------+---------+------------+
|   112233 | us   |     999 | 2011-8-20  |
|   112233 | us   |     499 | 2011-8-21  |
|   112233 | us   |     999 | 2011-8-22  |
| ...      | ...  | ...     | ...        |
+----------+------+---------+------------+
Also, after just finding and reading MySQL Conditional INSERT, I created and tried the following query:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`update`,
`price`
)
SELECT '7870', 'us', NOW(), 999
FROM `steam_prices`
WHERE
`price`<>999
AND `update` IN (
SELECT `update`
FROM `steam_prices`
ORDER BY `update`
ASC LIMIT 1
)
The idea was to insert the row '7870', 'us', NOW(), 999 if (and only if) the price of the most recent update wasn't 999. When I ran this I got the following error:
1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Any ideas?
You will probably find this easier if you simply change your schema to something like:
steam_id integer
country varchar(2)
date date
price float
primary key (steam_id,country,date)
(with other appropriate indexes) and then only worrying about each country in turn.
In other words, your for loop has a unique ID/country combo so it can simply query the latest-date record for that combo and add a new row if it's different.
That will make your selections a little more complicated but I believe it's a better solution, especially if there's any chance at all that more countries may be added in future (it won't break the schema in that case).
First, I suggest you store your data in a form that is less hard-coded per country:
+----------+--------------+------------+-------+
| steam_id | country_code | date       | price |
+----------+--------------+------------+-------+
|   112233 | us           | 2011-08-20 | 12.45 |
|   112233 | uk           | 2011-08-20 | 12.46 |
|   112233 | de           | 2011-08-20 | 12.47 |
|   112233 | at           | 2011-08-20 | 12.48 |
|   112233 | us           | 2011-08-21 | 12.49 |
| ......   | ..           | .......... | ..... |
+----------+--------------+------------+-------+
From here, you place a primary key on the first three columns...
Now for your question about not creating extra rows... That is what a simple transaction + application logic is great at.
Start a transaction
Run a select to see if the record in question is there
If not, insert one (see the sketch below)
Was there a problem with that approach?
Hope this helps.
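Read against the question's requirement (only add a row when the price has actually changed since the latest date), those steps could look roughly like this. @new_price stands for the freshly scraped price, and steam_prices for whatever the table ends up being called; the column names follow the suggested layout above:
START TRANSACTION;

-- Latest stored price for this game/country (NULL if there is none yet).
SET @last_price = (SELECT price
                     FROM steam_prices
                    WHERE steam_id = 112233 AND country_code = 'us'
                    ORDER BY date DESC
                    LIMIT 1);

-- Insert only when there is no previous row or the price has changed.
INSERT INTO steam_prices (steam_id, country_code, date, price)
SELECT 112233, 'us', CURDATE(), @new_price
  FROM DUAL
 WHERE @last_price IS NULL OR @last_price <> @new_price;

COMMIT;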
After experimentation, and with some help from MySQL Conditional INSERT and http://www.artfulsoftware.com/infotree/queries.php#101, I found a query that worked:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`price`,
`update`
)
SELECT 7870, 'us', 999, NOW()
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`update` < p2.`update`
WHERE
p2.`steam_id` IS NULL
AND p1.`steam_id`=7870
AND p1.`cc`='us'
AND (
p1.`price`<>999
)
The answer is to first return all rows for which there is no later timestamp. This is done with a within-group aggregate: you join the table with itself, matching each row only to rows with a later timestamp. If a row fails to join (no later timestamp exists), then you know that row contains the latest timestamp. These rows will have a NULL id in the joined table (failed to join).
After you have selected all rows with the latest timestamp, grab only those rows where the steam_id is the steam_id you're looking for and where the price is different from the new price that you're entering. If there are no rows with a different price for that game at this point then the price has not changed since the last update, so an empty set is returned. When an empty set is returned the SELECT statement fails and nothing is inserted. If the SELECT statement succeeds (a different price was found) then it returns the row 7870, 'us', 999, NOW() which is inserted into our table.
EDIT - I actually found a mistake with the above query a little while later and I have since revised it. The query above will insert a new row if the price has changed since the last update, but it will not insert a row if there are currently no prices in the database for that item.
To resolve this I had to take advantage of the DUAL table (which always contains one row), then use an OR in the WHERE clause to test for either a different price or no existing rows at all:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`price`,
`update`
)
SELECT 12345, 'us', 999, NOW()
FROM DUAL
WHERE
NOT EXISTS (
SELECT `steam_id`
FROM `steam_prices`
WHERE `steam_id`=12345
)
OR
EXISTS (
SELECT p1.`steam_id`
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`update` < p2.`update`
WHERE
p2.`steam_id` IS NULL
AND p1.`steam_id`=12345
AND p1.`cc`='us'
AND (
p1.`price`<>999
)
)
It's very long, it's very ugly, and it's very complicated. But it works exactly as advertised. If there is no price in the database for a certain steam_id then it inserts a new row. If there is already a price then it checks the price with the most recent update and, if different, inserts a new row.