We are currently moving our tables over to an append-only model to increase write performance by avoiding UPDATE and DELETE, with a memcached front end for SELECTs.
All rows are timestamped, and the latest row is selected using MAX(timestamp).
This works well, although over time the table will fill up with old, irrelevant data. We could write a simple
DELETE FROM table WHERE timestamp < XXXX
However, that would also delete rows which have not been updated in the last XX amount of time, removing that ID from the table completely rather than just its old rows.
A very simple example schema and data to demonstrate is provided below
---------------------------
| id | INT |
| name | VARCHAR |
| timestamp | TIMESTAMP |
---------------------------
Initial data
-------------------------------------------
| id | name | timestamp |
-------------------------------------------
| 1 | Trevor | 1 |
| 2 | Mike | 1 |
-------------------------------------------
Should a user's name be updated, a row will be appended, not updated, with the user's new name.
-------------------------------------------
| id | name | timestamp |
-------------------------------------------
| 1 | Trevor | 1 |
| 2 | Mike | 1 |
| 1 | Trev | 60 |
-------------------------------------------
Using a simple DELETE query to remove rows older than 60 seconds (Real case would be more like an hour or even a day) would delete Trevor on row 1 as intended but it will also delete the only record of Mike.
-------------------------------------------
| id | name | timestamp |
-------------------------------------------
| 1 | Trev | 60 |
-------------------------------------------
We need it to only DELETE distinct ID rows which are older than XX, so we would be left with both users even though Mike hasn't updated his name and his timestamp is older than XX amount of time.
-------------------------------------------
| id | name | timestamp |
-------------------------------------------
| 2 | Mike | 1 |
| 1 | Trev | 60 |
-------------------------------------------
We could go through each ID, get the latest timestamp, then DELETE all rows older than that timestamp; however, as the table gets more users this process will take longer.
Is there any SQL that could clean up the table as described above, preferably in one or two queries?
Thanks
I'm not an expert on MySQL but I believe this query should do the trick:
DELETE t1 FROM
table1 AS t1, table1 AS t2
WHERE
t1.id = t2.id
AND
t1.timestamp < t2.timestamp;
You could also add a condition on t1.timestamp (for example, that it is more than 60 minutes old) so that only superseded rows older than 60 minutes are deleted.
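A minimal sketch of the same idea, run against Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no multi-table DELETE, so this version uses a correlated subquery instead of the self-join; the table name `users`, the helper `prune_superseded` and the `cutoff` parameter are assumptions made for illustration.

```python
import sqlite3


def prune_superseded(conn, cutoff):
    """Delete rows older than `cutoff` that have a newer row for the same id."""
    conn.execute(
        """
        DELETE FROM users
        WHERE timestamp < ?
          AND timestamp < (SELECT MAX(u2.timestamp)
                           FROM users AS u2
                           WHERE u2.id = users.id)
        """,
        (cutoff,),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Trevor", 1), (2, "Mike", 1), (1, "Trev", 60)],
)

prune_superseded(conn, cutoff=60)
remaining = conn.execute(
    "SELECT id, name, timestamp FROM users ORDER BY timestamp, id"
).fetchall()
print(remaining)
```

With the sample data from the question, Mike's only row survives because nothing newer exists for id 2, while the old Trevor row is removed.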
Consider my table below,
register_device
id | user_id | device_id | status |
1 | 12 | 1234 | 1 |
2 | 1 | 5678 | 1 |
3 | 11 | 1456 | 1 |
Logic for trigger:
Before inserting a value into the register_device table, I want to check whether the new device_id value already exists in the table. If it does, the status of the existing row needs to be changed to 0 before the new value is inserted, like below.
id | user_id | device_id | status |
1 | 12 | 1234 | 0 |
2 | 1 | 5678 | 1 |
3 | 11 | 1456 | 1 |
4 | 14 | 1234 | 1 |
So when I run the insert from my code, the trigger needs to perform the above logic on its own in the database.
Your current approach is going to make it very difficult to implement your insert logic for several reasons:
You can't use a trigger, because in MySQL a trigger cannot modify the table which caused it to fire
Using ON DUPLICATE KEY UPDATE also won't fly because you actually want to insert the record even if an update also happens
I propose a slightly different table design:
id | user_id | device_id | status_timestamp
The only change here is that status has become status_timestamp, which is a timestamp column. You can insert records and use NOW() for the current timestamp value.
Logically speaking, any record which does not have the max value of status_timestamp for a given device_id would correspond to a zero status in your original table. And records having the max status_timestamp value for a given device_id would correspond to a status of 1.
For example, to get the status=1 record for device_id 1234 you could use the following query:
SELECT *
FROM register_device
WHERE device_id = 1234
ORDER BY status_timestamp DESC
LIMIT 1
And to get all the status=0 records for device_id 1234 you could use:
SELECT *
FROM register_device
WHERE device_id = 1234 AND
status_timestamp < (SELECT MAX(status_timestamp) FROM register_device
WHERE device_id = 1234)
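The two queries above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, stores `status_timestamp` as a plain integer instead of NOW(), and the helper names `active_row` and `inactive_rows` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE register_device (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        device_id INTEGER,
        status_timestamp INTEGER  -- assumption: integer ticks instead of NOW()
    )
    """
)
conn.executemany(
    "INSERT INTO register_device (user_id, device_id, status_timestamp) "
    "VALUES (?, ?, ?)",
    [(12, 1234, 100), (1, 5678, 110), (11, 1456, 120), (14, 1234, 130)],
)


def active_row(conn, device_id):
    """Return (id, user_id) of the newest registration, i.e. the status=1 row."""
    return conn.execute(
        "SELECT id, user_id FROM register_device "
        "WHERE device_id = ? ORDER BY status_timestamp DESC LIMIT 1",
        (device_id,),
    ).fetchone()


def inactive_rows(conn, device_id):
    """Return (id, user_id) of every older registration, i.e. the status=0 rows."""
    return conn.execute(
        "SELECT id, user_id FROM register_device "
        "WHERE device_id = ? AND status_timestamp < ("
        "    SELECT MAX(status_timestamp) FROM register_device "
        "    WHERE device_id = ?) "
        "ORDER BY id",
        (device_id, device_id),
    ).fetchall()


current = active_row(conn, 1234)
previous = inactive_rows(conn, 1234)
print(current, previous)
```

No row is ever updated here: registering device 1234 again for user 14 is a plain INSERT, and the older row for user 12 becomes inactive simply by no longer holding the maximum timestamp.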
I'm a MySQL newbie, but I'm sure there must be a way to do this. I've been looking through StackOverflow for quite a while, though, and haven't found it yet.
I have a MySQL table that is generated from a multi-reducer Hadoop MapReduce job which is analyzing log files. The table is being used in the database that supports a Ruby-on-Rails app, and it looks like this:
+----+-----+------+---------+-----------+
| id | src | dest | time | requests |
+----+-----+------+---------+-----------+
| 0 | abc | xyz | 1000000 | 200000000 |
| 1 | def | uvw | 10 | 300 |
| 2 | abc | xyz | 100000 | 200000 |
| 3 | def | xyz | 1000 | 40000 |
| 4 | abc | uvw | 100 | 5000 |
| 5 | def | xyz | 10000 | 100000 |
+----+-----+------+---------+-----------+
I'm trying to coalesce/sum the rows which have the same src and dest, but I just can't figure out how to do it even after searching through the MySQL 5.1 documentation.
I'm trying to write a script which I could run and obtain something like this at the end (neither the order of the rows nor the id column is important):
+----+-----+------+---------+-----------+
| id | src | dest | time | requests |
+----+-----+------+---------+-----------+
| 6 | abc | xyz | 1100000 | 200200000 |
| 7 | def | uvw | 10 | 300 |
| 8 | abc | uvw | 100 | 5000 |
| 9 | def | xyz | 11000 | 140000 |
+----+-----+------+---------+-----------+
Any ideas on how I could figure this out?
You can't really combine the rows in a single table -- at least not easily. That would require both updates and deletes.
So, just create another table:
create table summary_t as
select src, dest, sum(time) as time, sum(requests) as requests
from t
group by src, dest;
If you really want this to go back into the original table, then use a temporary table and re-insert the data:
create temporary table summary_t as
select src, dest, sum(time) as time, sum(requests) as requests
from t
group by src, dest;
truncate table t;
insert into t(src, dest, time, requests)
select src, dest, time, requests
from summary_t;
However, having said all that, you should just add another step to your map-reduce application to do that final summary.
Group By with SUM aggregate should work
select src, dest, sum(`time`) as `time`, sum(requests) as requests
from yourtable
group by src, dest
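As a quick check of the GROUP BY approach against the sample rows from the question, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name `log_summary` is an assumption; the ORDER BY is only there to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_summary ("
    "id INTEGER PRIMARY KEY, src TEXT, dest TEXT, time INTEGER, requests INTEGER)"
)
conn.executemany(
    "INSERT INTO log_summary VALUES (?, ?, ?, ?, ?)",
    [
        (0, "abc", "xyz", 1000000, 200000000),
        (1, "def", "uvw", 10, 300),
        (2, "abc", "xyz", 100000, 200000),
        (3, "def", "xyz", 1000, 40000),
        (4, "abc", "uvw", 100, 5000),
        (5, "def", "xyz", 10000, 100000),
    ],
)

# One output row per (src, dest) pair, with both counters summed.
totals = conn.execute(
    "SELECT src, dest, SUM(time) AS time, SUM(requests) AS requests "
    "FROM log_summary GROUP BY src, dest ORDER BY src, dest"
).fetchall()
for row in totals:
    print(row)
```

The totals match the table the question asks for, apart from the id column, which the question says is not important.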
Check whether this suits your needs: create a table with the columns src and dest as the primary key, plus other fields such as totaltime and totalrequest.
Create an AFTER INSERT trigger on the existing table which updates totaltime and totalrequest in the other table with (old + new), using src and dest as the key in the WHERE condition.
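A rough sketch of that trigger idea, written for SQLite through Python's sqlite3 module; MySQL trigger syntax differs (it needs FOR EACH ROW and would typically use INSERT ... ON DUPLICATE KEY UPDATE). The table names `requests_log` and `summary` and the trigger name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE requests_log (
        id INTEGER PRIMARY KEY, src TEXT, dest TEXT,
        time INTEGER, requests INTEGER
    );
    CREATE TABLE summary (
        src TEXT, dest TEXT,
        totaltime INTEGER NOT NULL DEFAULT 0,
        totalrequest INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (src, dest)
    );
    -- After every insert, make sure a summary row exists, then add to it.
    CREATE TRIGGER requests_log_ai AFTER INSERT ON requests_log
    BEGIN
        INSERT OR IGNORE INTO summary (src, dest) VALUES (NEW.src, NEW.dest);
        UPDATE summary
           SET totaltime = totaltime + NEW.time,
               totalrequest = totalrequest + NEW.requests
         WHERE src = NEW.src AND dest = NEW.dest;
    END;
    """
)
conn.executemany(
    "INSERT INTO requests_log (src, dest, time, requests) VALUES (?, ?, ?, ?)",
    [
        ("abc", "xyz", 1000000, 200000000),
        ("def", "uvw", 10, 300),
        ("abc", "xyz", 100000, 200000),
        ("def", "xyz", 1000, 40000),
        ("abc", "uvw", 100, 5000),
        ("def", "xyz", 10000, 100000),
    ],
)
summary = conn.execute(
    "SELECT src, dest, totaltime, totalrequest FROM summary ORDER BY src, dest"
).fetchall()
print(summary)
```

The trade-off is that every insert into the log table now pays for an extra write, in exchange for a summary table that is always current.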
I have table:
+----+--------+----------+
| id | doc_id | next_req |
+----+--------+----------+
| 1 | 1 | 4 |
| 2 | 1 | 3 |
| 3 | 1 | 0 |
| 4 | 1 | 2 |
+----+--------+----------+
id - auto-increment primary key.
next_req - represents the order of the records (next_req = id of the next record).
How can I build a SQL query to get the records in this order:
+----+--------+----------+
| id | doc_id | next_req |
+----+--------+----------+
| 1 | 1 | 4 |
| 4 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 0 |
+----+--------+----------+
Explanation:
record1 with id=1 and next_req=4 means: next must be record4 with id=4 and next_req=2
record4 with id=4 and next_req=2 means: next must be record2 with id=2 and next_req=3
record2 with id=2 and next_req=3 means: next must be record3 with id=3 and next_req=0
record3 with id=3 and next_req=0 means that this is the last record
I need to store the order of the records in the table. It's important for me.
If you can, change your table format. Rather than naming the next record, mark the records in order so you can use a natural SQL sort:
+----+--------+------+
| id | doc_id | sort |
+----+--------+------+
| 1 | 1 | 1 |
| 4 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 4 |
+----+--------+------+
Then you can even create a clustered index on (doc_id, sort) if you need to for performance. And honestly, if you need to re-order rows, it is not any more work than the linked list you were working with.
I can give you a solution in Oracle:
select id,doc_id,next_req from table2
start with id =
(select id from table2 where rowid=(select min(rowid) from table2))
connect by prior next_req=id
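Outside Oracle, the same walk along next_req can be sketched with a recursive common table expression, which SQLite supports and MySQL supports from version 8.0. The example below runs on Python's sqlite3 module; the table name `reqs` and the rule for finding the first record (the one no other record points to) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reqs (id INTEGER PRIMARY KEY, doc_id INTEGER, next_req INTEGER)"
)
conn.executemany(
    "INSERT INTO reqs VALUES (?, ?, ?)",
    [(1, 1, 4), (2, 1, 3), (3, 1, 0), (4, 1, 2)],
)

ordered = conn.execute(
    """
    WITH RECURSIVE chain(id, doc_id, next_req, pos) AS (
        -- Head of the list: the record that no other record points to.
        SELECT id, doc_id, next_req, 1
        FROM reqs
        WHERE doc_id = :doc
          AND id NOT IN (SELECT next_req FROM reqs WHERE doc_id = :doc)
        UNION ALL
        -- Follow next_req to the following record.
        SELECT r.id, r.doc_id, r.next_req, c.pos + 1
        FROM reqs AS r
        JOIN chain AS c ON r.id = c.next_req
    )
    SELECT id, doc_id, next_req FROM chain ORDER BY pos
    """,
    {"doc": 1},
).fetchall()
print(ordered)
```

The recursion stops on its own at next_req = 0 because no record has id 0. A cycle in the data would make it loop, so this assumes the list is well formed.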
I'd suggest to modify your table and add another column OrderNumber, so eventually it would be easy to order by this column.
Though there may be problems with this approach:
1) You have existing table and need to set OrderNumber column values. I guess this part is easy. You can simply set initial zero values and add a CURSOR for example moving through your records and incrementing your order number value.
2) When new row appears in your table, you have to modify your OrderNumber, but here it depends on your particular situation. If you only need to add items to the end of the list then you can set your new value as MAX + 1. In another situation you may try writing TRIGGER on inserting new items and calling similar steps to point 1). This may cause very bad hit on performance, so you have to carefully investigate your architecture and maybe modify this unusual construction.
As I am new to MySQL, let me clear up this doubt: how do I write a query to select only the most recently added records?
Example:
Consider a table to which a certain number of records is added daily. Say the table now contains 1000 records, and all 1000 have already been taken out for some processing. Some time later, 100 more records are added. Now I would like to take only those 100 new records out of the 1100 to do some operation. How can I do that?
(The numbers are only an example; in reality I don't know the count at the last read or how many records were added since.)
My table contains three columns: sno, time and data, where sno is the primary key.
Sample table:
| sno | time | data |
| 1 | 2012-02-27 12:44:07 | 100 |
| 2 | 2012-02-27 12:44:07 | 120 |
| 3 | 2012-02-27 12:44:07 | 140 |
| 4 | 2012-02-27 12:44:07 | 160 |
| 5 | 2012-02-27 12:44:07 | 180 |
| 6 | 2012-02-27 12:44:07 | 160 |
| 7 | 2012-02-28 13:00:35 | 100 |
| 8 | 2012-03-02 15:23:25 | 160 |
Add TIMESTAMP field with 'ON UPDATE CURRENT_TIMESTAMP' option, and you will be able to find last added or last edited records.
Automatic Initialization and Updating for TIMESTAMP.
Create table as below
Create table sample
(id int auto_increment primary key,
time timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
data nvarchar(100)
);
then query as
select * from sample order by time desc limit 1
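To fetch every row added since the previous run rather than only the single newest one, one option is to remember the largest timestamp already processed and filter on it. This is a sketch under that assumption, using Python's sqlite3 module with timestamps stored as ISO-formatted text; `rows_added_since` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (sno INTEGER PRIMARY KEY, time TEXT, data INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        (1, "2012-02-27 12:44:07", 100),
        (2, "2012-02-27 12:44:07", 120),
        (3, "2012-02-27 12:44:07", 140),
        (4, "2012-02-27 12:44:07", 160),
        (5, "2012-02-27 12:44:07", 180),
        (6, "2012-02-27 12:44:07", 160),
        (7, "2012-02-28 13:00:35", 100),
        (8, "2012-03-02 15:23:25", 160),
    ],
)


def rows_added_since(conn, last_seen):
    """Return rows whose timestamp is strictly newer than `last_seen`."""
    return conn.execute(
        "SELECT sno, time, data FROM sample WHERE time > ? ORDER BY sno",
        (last_seen,),
    ).fetchall()


# Assumption: the previous run stopped at this timestamp.
new_rows = rows_added_since(conn, "2012-02-27 12:44:07")
new_snos = [row[0] for row in new_rows]
print(new_snos)
```

Rows that share the exact timestamp of the last processed row would be skipped by the strict comparison, so if sno always increases, filtering on `sno > last_seen_sno` is a more robust variant of the same idea.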
For a personal project I'm working on right now I want to make a line graph of game prices on Steam, Impulse, EA Origins, and several other sites over time. At the moment I've modified a script used by SteamCalculator.com to record the current price (sale price if applicable) for every game in every country code possible for each of these sites. I also have a column for the date on which the price was stored. My current tables look something like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
+----------+------+------+------+------+------+------+------------+
At the moment each country is updated separately (there's a for loop going through the countries), although if it would simplify it then this could be modified to temporarily store new prices to an array then update an entire row at a time. I'll likely be doing this eventually, anyway, for performance reasons.
Now my issue is determining how to best update this table if one of the prices changes. For instance, let's suppose that on 8/22/2011 the game 112233 goes on sale in America for $4.99, Austria for 3.99€, and the other prices remain the same. I would need the table to look like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 112233 | 499 | 399 | 999 | NULL | 899 | 699 | 2011-8-22 |
+----------+------+------+------+------+------+------+------------+
I don't want to create a new row EVERY time the price is checked, otherwise I'll end up having millions of rows of repeated prices day after day. I also don't want to create a new row per changed price like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 112233 | 499 | 899 | 999 | NULL | 899 | 699 | 2011-8-22 |
| 112233 | 499 | 399 | 999 | NULL | 899 | 699 | 2011-8-22 |
+----------+------+------+------+------+------+------+------------+
I can prevent the first problem but not the second by making each (steam_id, <country>) a unique index then adding ON DUPLICATE KEY UPDATE to every database query. This will only add a row if the price is different, however it will add a new row for each country which changes. It also does not allow the same price for a single game for two different days (for instance, suppose game 112233 goes off sale later and returns to $9.99) so this is clearly an awful option.
I can prevent the second problem but not the first by making (steam_id, date) a unique index then adding ON DUPLICATE KEY UPDATE to every query. Every single day when the script is run the date has changed, so it will create a new row. This method ends up with hundreds of lines of the same prices from day to day.
How can I tell MySQL to create a new row if (and only if) any of the prices has changed since the latest date?
UPDATE -
At the recommendation of people in this thread I have changed the schema of my database to facilitate adding new country codes in the future and avoid the issue of needing to update entire rows at a time. The new schema looks something like:
+----------+------+---------+------------+
| steam_id | cc | price | date |
+----------+------+---------+------------+
| 112233 | us | 999 | 2011-8-21 |
| 123456 | uk | 699 | 2011-8-20 |
| ... | ... | ... | ... |
+----------+------+---------+------------+
On top of this new schema I have discovered that I can use the following SQL query to grab the price from the most recent update:
SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1
At this point my question boils down to this:
Is it possible to (using only SQL rather than application logic) insert a row only if a condition is true? For instance:
INSERT INTO `steam_prices` (...) VALUES (...) IF price<>(SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1)
From the MySQL manual I can not find any way to do this. I have only found that you can ignore or update if a unique index is the same. However if I made the price a unique index (allowing me to update the date if it was the same) then I would not be able to recognize when a game went on sale and then returned to its original price. For instance:
+----------+------+---------+------------+
| steam_id | cc | price | date |
+----------+------+---------+------------+
| 112233 | us | 999 | 2011-8-20 |
| 112233 | us | 499 | 2011-8-21 |
| 112233 | us | 999 | 2011-8-22 |
| ... | ... | ... | ... |
+----------+------+---------+------------+
Also, after just finding and reading MySQL Conditional INSERT, I created and tried the following query:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`update`,
`price`
)
SELECT '7870', 'us', NOW(), 999
FROM `steam_prices`
WHERE
`price`<>999
AND `update` IN (
SELECT `update`
FROM `steam_prices`
ORDER BY `update`
DESC LIMIT 1
)
The idea was to insert the row '7870', 'us', NOW(), 999 if (and only if) the price of the most recent update wasn't 999. When I ran this I got the following error:
1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Any ideas?
You will probably find this easier if you simply change your schema to something like:
steam_id integer
country varchar(2)
date date
price float
primary key (steam_id,country,date)
(with other appropriate indexes) and then only worrying about each country in turn.
In other words, your for loop has a unique ID/country combo so it can simply query the latest-date record for that combo and add a new row if it's different.
That will make your selections a little more complicated but I believe it's a better solution, especially if there's any chance at all that more countries may be added in future (it won't break the schema in that case).
First, I suggest you store your data in a form that is less hard-coded per country:
+----------+--------------+------------+-------+
| steam_id | country_code | date | price |
+----------+--------------+------------+-------+
| 112233 | us | 2011-08-20 | 12.45 |
| 112233 | uk | 2011-08-20 | 12.46 |
| 112233 | de | 2011-08-20 | 12.47 |
| 112233 | at | 2011-08-20 | 12.48 |
| 112233 | us | 2011-08-21 | 12.49 |
| ...... | .. | .......... | ..... |
+----------+--------------+------------+-------+
From here, you place a primary key on the first three columns...
Now for your question about not creating extra rows... That is what a simple transaction + application logic is great at.
Start a transaction
Run a select to see if the record in question is there
If not, insert one
Was there a problem with that approach?
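The three steps above could be sketched roughly as follows, here with Python's sqlite3 module standing in for MySQL. The column names follow the question's updated schema, dates are ISO-formatted text, and `record_price` is a hypothetical helper; it compares against the latest stored price so that a price returning to an earlier value is still recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices ("
    "steam_id INTEGER, cc TEXT, date TEXT, price INTEGER, "
    "PRIMARY KEY (steam_id, cc, date))"
)


def record_price(conn, steam_id, cc, price, day):
    """Insert a row only if the price differs from the latest stored one."""
    with conn:  # commits on success, rolls back if an exception is raised
        latest = conn.execute(
            "SELECT price FROM steam_prices "
            "WHERE steam_id = ? AND cc = ? ORDER BY date DESC LIMIT 1",
            (steam_id, cc),
        ).fetchone()
        if latest is None or latest[0] != price:
            conn.execute(
                "INSERT INTO steam_prices (steam_id, cc, date, price) "
                "VALUES (?, ?, ?, ?)",
                (steam_id, cc, day, price),
            )
            return True
    return False


results = [
    record_price(conn, 112233, "us", 999, "2011-08-20"),  # first price
    record_price(conn, 112233, "us", 999, "2011-08-21"),  # unchanged
    record_price(conn, 112233, "us", 499, "2011-08-22"),  # sale starts
    record_price(conn, 112233, "us", 999, "2011-08-23"),  # sale ends
]
row_count = conn.execute("SELECT COUNT(*) FROM steam_prices").fetchone()[0]
print(results, row_count)
```

With several writers running at once, the check and the insert would need to happen under a lock or a suitable isolation level; this single-connection sketch does not cover that.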
Hope this helps.
After experimentation, and with some help from MySQL Conditional INSERT and http://www.artfulsoftware.com/infotree/queries.php#101, I found a query that worked:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`price`,
`update`
)
SELECT 7870, 'us', 999, NOW()
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`cc`=p2.`cc` AND p1.`update` < p2.`update`
WHERE
p2.`steam_id` IS NULL
AND p1.`steam_id`=7870
AND p1.`cc`='us'
AND (
p1.`price`<>999
)
The answer is to first return all rows for which there is no later timestamp. This is done with a within-group aggregate: you join the table with itself only on rows where the other copy's timestamp is later. If a row fails to join (no later timestamp exists) then you know that row contains the latest timestamp. These rows will have a NULL steam_id in the joined table (failed to join).
After you have selected all rows with the latest timestamp, grab only those rows where the steam_id is the steam_id you're looking for and where the price is different from the new price that you're entering. If there are no rows with a different price for that game at this point then the price has not changed since the last update, so an empty set is returned. When an empty set is returned the SELECT yields no rows and nothing is inserted. If the SELECT does return a row (a different price was found) then it returns the row 7870, 'us', 999, NOW(), which is inserted into our table.
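To see the self-join on its own, here is a small sketch that lists the latest row per (steam_id, cc) using Python's sqlite3 module. Two things are assumptions made for the example: the timestamp column is called `updated` (UPDATE is a reserved word and would need quoting), and the join also matches on cc so that each country is tracked separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices ("
    "steam_id INTEGER, cc TEXT, price INTEGER, updated TEXT)"
)
conn.executemany(
    "INSERT INTO steam_prices VALUES (?, ?, ?, ?)",
    [
        (7870, "us", 999, "2011-08-20"),
        (7870, "us", 499, "2011-08-21"),
        (7870, "uk", 699, "2011-08-20"),
        (12345, "us", 1999, "2011-08-20"),
    ],
)

# p2 only matches rows that are newer than p1 within the same group, so
# p2.steam_id IS NULL keeps exactly the rows that have nothing newer.
latest = conn.execute(
    """
    SELECT p1.steam_id, p1.cc, p1.price
    FROM steam_prices AS p1
    LEFT JOIN steam_prices AS p2
           ON p1.steam_id = p2.steam_id
          AND p1.cc = p2.cc
          AND p1.updated < p2.updated
    WHERE p2.steam_id IS NULL
    ORDER BY p1.steam_id, p1.cc
    """
).fetchall()
print(latest)
```

The older 999 row for (7870, 'us') is filtered out because it joins to the newer 499 row, which is the behaviour the conditional insert relies on.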
EDIT - I actually found a mistake with the above query a little while later and I have since revised it. The query above will insert a new row if the price has changed since the last update, but it will not insert a row if there are currently no prices in the database for that item.
To resolve this I had to take advantage of the DUAL table (which always contains one row), then use an OR in the where clause to test for a different price OR an empty set
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`price`,
`update`
)
SELECT 12345, 'us', 999, NOW()
FROM DUAL
WHERE
NOT EXISTS (
SELECT `steam_id`
FROM `steam_prices`
WHERE `steam_id`=12345 AND `cc`='us'
)
OR
EXISTS (
SELECT p1.`steam_id`
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`cc`=p2.`cc` AND p1.`update` < p2.`update`
WHERE
p2.`steam_id` IS NULL
AND p1.`steam_id`=12345
AND p1.`cc`='us'
AND (
p1.`price`<>999
)
)
It's very long, it's very ugly, and it's very complicated. But it works exactly as advertised. If there is no price in the database for a certain steam_id then it inserts a new row. If there is already a price then it checks the price with the most recent update and, if different, inserts a new row.
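For comparison, a shorter single-statement variant of the same conditional insert, sketched with Python's sqlite3 module. It is not the query above: it replaces the self-join with an ORDER BY ... LIMIT 1 scalar subquery, uses a column named `updated` instead of `update`, and takes the timestamp as a parameter instead of NOW(). `insert_if_changed` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices ("
    "steam_id INTEGER, cc TEXT, price INTEGER, updated TEXT)"
)

CONDITIONAL_INSERT = """
INSERT INTO steam_prices (steam_id, cc, price, updated)
SELECT :steam_id, :cc, :price, :updated
WHERE NOT EXISTS (
          SELECT 1 FROM steam_prices
          WHERE steam_id = :steam_id AND cc = :cc)
   OR :price <> (
          SELECT price FROM steam_prices
          WHERE steam_id = :steam_id AND cc = :cc
          ORDER BY updated DESC LIMIT 1)
"""


def insert_if_changed(conn, steam_id, cc, price, updated):
    """Run the conditional insert and return the number of rows added (0 or 1)."""
    cur = conn.execute(
        CONDITIONAL_INSERT,
        {"steam_id": steam_id, "cc": cc, "price": price, "updated": updated},
    )
    return cur.rowcount


inserted = [
    insert_if_changed(conn, 12345, "us", 999, "2011-08-20"),  # no price yet
    insert_if_changed(conn, 12345, "us", 999, "2011-08-21"),  # unchanged
    insert_if_changed(conn, 12345, "us", 499, "2011-08-22"),  # changed
    insert_if_changed(conn, 12345, "us", 999, "2011-08-23"),  # back to 999
]
print(inserted)
```

The LIMIT here sits in a scalar subquery rather than inside IN (...), which is the construct error 1235 complains about, but whether a given MySQL version accepts reading from the table being inserted into in this form should be checked before relying on it.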