TRUNCATE-INSERT vs SELECT-UPDATE-INSERT - mysql

I have a table that I am using as a temporary table. A cron runs every hour to set a certain value for each row.
| id | item_id | value |
+====+=========+=======+
| 1 | 5 | 52 |
| 2 | 34 | 314 |
| 3 | 27 | 189 |
| 4 | 19 | 200 |
+====+=========+=======+
What I would like to know is whether it is better to first TRUNCATE and then refill this table, or to SELECT each existing row and UPDATE it, INSERTing it if it doesn't exist.

Insert the record if it doesn't exist in your temporary table; if it already exists but its value needs to change, update only that specific record.
That would be the wiser approach, because it should reduce the execution time of the operation.
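A minimal sketch of the "update it or insert it" approach as a single statement. It assumes a UNIQUE index on item_id (the question does not say one exists), and the table name item_values is made up for illustration:

```sql
-- Hypothetical table; the UNIQUE key on item_id is an assumption.
CREATE TABLE item_values (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  item_id INT NOT NULL,
  value   INT NOT NULL,
  UNIQUE KEY uq_item_id (item_id)
);

-- Inserts a new row, or updates value when the item_id already exists.
INSERT INTO item_values (item_id, value)
VALUES (5, 52), (34, 314), (27, 189)
ON DUPLICATE KEY UPDATE value = VALUES(value);
```

Unlike TRUNCATE followed by a refill, the table never looks empty to readers while the cron job runs, although rows for items that no longer exist would have to be deleted separately. VALUES(value) refers to the value that would have been inserted; recent MySQL 8.0 releases deprecate it in favour of a row alias.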

Related

Moving some data from one column of MySQL table to another

I was wondering if there is an easy way of moving some (not all) data from one column to another.
My MySQL table has 200 entries but this is the simplified version of what I am trying to do:
| ID | A | B |
| 1 | | |
| 2 | | |
| 3 | | aa|
| 4 | | bb|
| 5 | | cc|
So I need to get data from column B to column A, but only for rows with an ID greater than 2, so that aa from 3B goes to 3A, bb from 4B goes to 4A, and so on.
UPDATE <tablename> SET
A=B
-- , B=''
WHERE ID>2
This might help. Leave the commented-out line disabled to copy the values, or enable it to move them (clearing column B) instead.

Erasing duplicate records from MySQL

Due to a bug in my javascript click handling, multiple Location objects are posted to a JSON array that is sent to the server. I think I know how to fix that bug, but I'd also like to implement a server side database duplicate erase function. However, I'm not sure how to write this query.
The only affected table is laid out as
+----+------------+--------+
| ID | locationID | linkID |
+----+------------+--------+
| 64 | 13 | 14 |
| 65 | 14 | 13 |
| 66 | 14 | 15 |
| 67 | 15 | 14 |
| 68 | 15 | 16 |
| 69 | 16 | 17 |
| 70 | 16 | 14 |
| 71 | 17 | 16 |
| 72 | 17 | 16 |
| 73 | 17 | 16 |
| 74 | 17 | 16 |
| 75 | 17 | 16 |
| 76 | 17 | 16 |
| 77 | 17 | 16 |
+----+------------+--------+
As you can see, I have multiple pairs of (17, 16), while 14 has two pairs of (14, 13) and (14, 15). How can I delete all but one record of any duplicate entries?
Don't implement after-the-fact correction logic; put a unique index on the fields that need to be unique, so that the database rejects duplicate inserts before it's too late.
If you're using MySQL 5.1 or higher (but older than 5.7.4, where the IGNORE clause of ALTER TABLE was removed) you can remove dupes and create a unique index in one command:
ALTER IGNORE TABLE `YOURTABLE`
ADD UNIQUE INDEX somefancynamefortheindex (locationID, linkID)
You can create a temporary table to hold the distinct records, then empty the original table and insert the data back from the temp table.
CREATE TEMPORARY TABLE temp_table (locationId INT, linkID INT);
INSERT INTO temp_table (locationId, linkId) SELECT DISTINCT locationId, linkId FROM table1;
DELETE FROM table1;
INSERT INTO table1 (locationId, linkId) SELECT locationId, linkId FROM temp_table;
delete from tbl
using tbl,tbl t2
where tbl.locationID=t2.locationID
and tbl.linkID=t2.linkID
and tbl.ID>t2.ID
I am assuming you are asking about checking new inserts rather than the clean-up? Put a unique index on the columns if possible; if you don't have control of the DB, do an upsert and check for nulls instead of a plain insert.
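On MySQL versions where ALTER IGNORE is not available, the clean-up and the unique index can be combined in separate steps. This is a sketch only: table1 is the placeholder table name used above, and the index name is made up.

```sql
-- 1. Delete every duplicate except the row with the lowest ID in each pair.
DELETE t1
FROM table1 AS t1
JOIN table1 AS t2
  ON  t1.locationID = t2.locationID
  AND t1.linkID     = t2.linkID
  AND t1.ID         > t2.ID;

-- 2. Stop new duplicates at the database level (index name is hypothetical).
ALTER TABLE table1
  ADD UNIQUE INDEX uq_location_link (locationID, linkID);

-- 3. With the index in place, a repeated pair is skipped instead of raising an error.
INSERT IGNORE INTO table1 (locationID, linkID) VALUES (17, 16);
```

Step 2 fails if any duplicates remain, so it also serves as a check that step 1 did its job.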

Query to select newly added records only

As I am new to MySQL, I'd like to clear up this doubt: how do I write a query to find/select only the most recently added records?
Example:
Consider a table to which a certain number of records is added daily. Say the table now contains 1000 records, and all 1000 have been taken out for some processing. Some time later, 100 more records are added. Now I would like to take only the remaining 100 out of the 1100 to do some operation. How do I do it?
(The numbers are for example only; in reality I don't know the count at the last retrieval or how many records were newly added.)
My table contains three columns: sno, time, and data, where sno is the primary key.
Sample table:
| sno | time | data |
| 1 | 2012-02-27 12:44:07 | 100 |
| 2 | 2012-02-27 12:44:07 | 120 |
| 3 | 2012-02-27 12:44:07 | 140 |
| 4 | 2012-02-27 12:44:07 | 160 |
| 5 | 2012-02-27 12:44:07 | 180 |
| 6 | 2012-02-27 12:44:07 | 160 |
| 7 | 2012-02-28 13:00:35 | 100 |
| 8 | 2012-03-02 15:23:25 | 160 |
Add a TIMESTAMP field with the 'ON UPDATE CURRENT_TIMESTAMP' option, and you will be able to find the last added or last edited records.
See "Automatic Initialization and Updating for TIMESTAMP" in the MySQL manual.
Create the table as below:
Create table sample
(id int auto_increment primary key,
time timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
data nvarchar(100)
);
then query as
select * from sample order by time desc limit 1
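The query above returns only the single newest row. To fetch every row added since the previous run, one option is to remember the highest id already processed. A sketch, assuming the sample table above and an application that stores that id between runs (@last_id is a hypothetical bookmark, not something MySQL maintains for you):

```sql
-- @last_id: highest id handled by the previous run (stored by the application).
SET @last_id = 1000;

-- Rows added since the previous run, oldest first.
SELECT id, `time`, data
FROM sample
WHERE id > @last_id
ORDER BY id;
```

After processing, the application would save the largest id it received as the bookmark for the next run.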

MySQL Multi Duplicate Record Merging

A previous DBA managed a non-relational table with 2.4M entries, all with unique IDs. However, there are duplicate records with different data in each record, for example:
+----+---------+--------------+-------+------------+-------------+
| id | Name    | Address      | Phone | Email      | LastVisited |
+----+---------+--------------+-------+------------+-------------+
| 1  | bob     | 12 Some Road | 02456 |            |             |
| 2  | bobby   |              | 02456 | bob#domain |             |
| 3  | bob     | 12 Some Rd   | 02456 |            | 2010-07-13  |
| 4  | sir bob |              | 02456 |            |             |
| 5  | bob     | 12SomeRoad   | 02456 |            |             |
| 6  | mr bob  |              | 02456 |            |             |
| 7  | robert  |              | 02456 |            |             |
+----+---------+--------------+-------+------------+-------------+
This isn't the exact table (the real table has 32 columns); it is just to illustrate.
I know how to identify the duplicates; in this case I'm using the phone number. I've extracted the duplicates into a separate table; there are 730k entries in total.
What would be the most efficient way of merging these records (and flagging the un-needed records for deletion)?
I've looked at using UPDATE with INNER JOINs, but several WHERE clauses are needed, because I want to update the first record with data from subsequent records wherever a subsequent record has data the first record lacks.
I've looked at third-party software such as Fuzzy Dups, but I'd like a pure MySQL option if possible.
The end goal is that I'd be left with something like:
+----+---------+--------------+-------+------------+-------------+
| id | Name    | Address      | Phone | Email      | LastVisited |
+----+---------+--------------+-------+------------+-------------+
| 1  | bob     | 12 Some Road | 02456 | bob#domain | 2010-07-13  |
+----+---------+--------------+-------+------------+-------------+
Should I be looking at looping in a stored procedure / function, or is there some really easy thing I've missed?
You have to create a PROCEDURE, but before that create your own temp_table, like:
INSERT INTO temp_table (column1, column2, ...) SELECT column1, column2, ... FROM myTable GROUP BY phoneNumber
You have to create this as a physical table so that you can run a cursor on it.
CREATE PROCEDURE myPROC
{
Create a cursor on temp_table.
Fetch the phoneNumber and id of the current row from temp_table into local variables (L_id, L_phoneNum).
Here you also need a second table, similar_tempTable, filled with:
INSERT INTO similar_tempTable (column1, column2, ...) SELECT column1, column2, ... FROM myTable WHERE phoneNumber = L_phoneNum
The next step is to extract the value you want for each column from similar_tempTable, update the row of myTable where id = L_id with it, and delete the remaining duplicate rows from myTable.
Finally, truncate similar_tempTable after every iteration of the cursor.
}
Hope this helps.
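As a set-based alternative to the cursor loop (a different technique from the procedure described above), a GROUP BY can collapse each phone number into one candidate row. This is only a sketch: contacts is a hypothetical name for the table in the question, and it assumes empty cells are stored as NULL or as empty strings.

```sql
-- One merged row per phone number.
-- NULLIF(col, '') turns empty strings into NULL so that MAX() skips them.
SELECT MIN(id)                  AS id,
       MAX(NULLIF(Name, ''))    AS Name,
       MAX(NULLIF(Address, '')) AS Address,
       Phone,
       MAX(NULLIF(Email, ''))   AS Email,
       MAX(LastVisited)         AS LastVisited
FROM contacts
GROUP BY Phone;
```

Note that MAX() picks the largest non-empty value in each column, not the value from the first record, so it will not reproduce the desired row exactly where several records hold different non-empty values (Name and Address in the sample). It fills gaps such as Email and LastVisited well, and the result could be written to a new table before the surplus ids are flagged for deletion.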

MySQL insert new row on value change

For a personal project I'm working on right now I want to make a line graph of game prices on Steam, Impulse, EA Origins, and several other sites over time. At the moment I've modified a script used by SteamCalculator.com to record the current price (sale price if applicable) for every game in every country code possible for each of these sites. I also have a column for the date on which the price was stored. My current tables look something like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
+----------+------+------+------+------+------+------+------------+
At the moment each country is updated separately (there's a for loop going through the countries), although if it would simplify it then this could be modified to temporarily store new prices to an array then update an entire row at a time. I'll likely be doing this eventually, anyway, for performance reasons.
Now my issue is determining how to best update this table if one of the prices changes. For instance, let's suppose that on 8/22/2011 the game 112233 goes on sale in America for $4.99, Austria for 3.99€, and the other prices remain the same. I would need the table to look like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 112233 | 499 | 399 | 999 | NULL | 899 | 699 | 2011-8-22 |
+----------+------+------+------+------+------+------+------------+
I don't want to create a new row EVERY time the price is checked, otherwise I'll end up having millions of rows of repeated prices day after day. I also don't want to create a new row per changed price like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 112233 | 499 | 899 | 999 | NULL | 899 | 699 | 2011-8-22 |
| 112233 | 499 | 399 | 999 | NULL | 899 | 699 | 2011-8-22 |
+----------+------+------+------+------+------+------+------------+
I can prevent the first problem but not the second by making each (steam_id, <country>) a unique index then adding ON DUPLICATE KEY UPDATE to every database query. This will only add a row if the price is different, however it will add a new row for each country which changes. It also does not allow the same price for a single game for two different days (for instance, suppose game 112233 goes off sale later and returns to $9.99) so this is clearly an awful option.
I can prevent the second problem but not the first by making (steam_id, date) a unique index then adding ON DUPLICATE KEY UPDATE to every query. Every single day when the script is run the date has changed, so it will create a new row. This method ends up with hundreds of lines of the same prices from day to day.
How can I tell MySQL to create a new row if (and only if) any of the prices has changed since the latest date?
UPDATE -
At the recommendation of people in this thread I have changed the schema of my database to facilitate adding new country codes in the future and avoid the issue of needing to update entire rows at a time. The new schema looks something like:
+----------+------+---------+------------+
| steam_id | cc | price | date |
+----------+------+---------+------------+
| 112233 | us | 999 | 2011-8-21 |
| 123456 | uk | 699 | 2011-8-20 |
| ... | ... | ... | ... |
+----------+------+---------+------------+
On top of this new schema I have discovered that I can use the following SQL query to grab the price from the most recent update:
SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1
At this point my question boils down to this:
Is it possible to (using only SQL rather than application logic) insert a row only if a condition is true? For instance:
INSERT INTO `steam_prices` (...) VALUES (...) IF price<>(SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1)
From the MySQL manual I can not find any way to do this. I have only found that you can ignore or update if a unique index is the same. However if I made the price a unique index (allowing me to update the date if it was the same) then I would not be able to recognize when a game went on sale and then returned to its original price. For instance:
+----------+------+---------+------------+
| steam_id | cc | price | date |
+----------+------+---------+------------+
| 112233 | us | 999 | 2011-8-20 |
| 112233 | us | 499 | 2011-8-21 |
| 112233 | us | 999 | 2011-8-22 |
| ... | ... | ... | ... |
+----------+------+---------+------------+
Also, after just finding and reading MySQL Conditional INSERT, I created and tried the following query:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`update`,
`price`
)
SELECT '7870', 'us', NOW(), 999
FROM `steam_prices`
WHERE
`price`<>999
AND `update` IN (
SELECT `update`
FROM `steam_prices`
ORDER BY `update`
ASC LIMIT 1
)
The idea was to insert the row '7870', 'us', NOW(), 999 if (and only if) the price of the most recent update wasn't 999. When I ran this I got the following error:
1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Any ideas?
You will probably find this easier if you simply change your schema to something like:
steam_id integer
country varchar(2)
date date
price float
primary key (steam_id,country,date)
(with other appropriate indexes) and then only worrying about each country in turn.
In other words, your for loop has a unique ID/country combo so it can simply query the latest-date record for that combo and add a new row if it's different.
That will make your selections a little more complicated but I believe it's a better solution, especially if there's any chance at all that more countries may be added in future (it won't break the schema in that case).
First, I suggest you store your data in a form that is less hard-coded per country:
+----------+--------------+------------+-------+
| steam_id | country_code | date | price |
+----------+--------------+------------+-------+
| 112233 | us | 2011-08-20 | 12.45 |
| 112233 | uk | 2011-08-20 | 12.46 |
| 112233 | de | 2011-08-20 | 12.47 |
| 112233 | at | 2011-08-20 | 12.48 |
| 112233 | us | 2011-08-21 | 12.49 |
| ...... | .. | .......... | ..... |
+----------+--------------+------------+-------+
From here, you place a primary key on the first three columns...
Now for your question about not creating extra rows... That is what a simple transaction + application logic is great at.
Start a transaction
Run a select to see if the record in question is there
If not, insert one
Was there a problem with that approach?
Hope this helps.
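The transaction steps above could look roughly like this. It is a sketch that assumes an InnoDB table named steam_prices with the columns from the layout above; the comparison itself happens in the application, between the SELECT and the INSERT.

```sql
START TRANSACTION;

-- Latest stored price for this game and country.
-- FOR UPDATE locks the row that was read (InnoDB assumed).
SELECT price
FROM steam_prices
WHERE steam_id = 112233
  AND country_code = 'us'
ORDER BY `date` DESC
LIMIT 1
FOR UPDATE;

-- Application logic: run this INSERT only if the SELECT returned no row
-- or returned a price different from the new one.
INSERT INTO steam_prices (steam_id, country_code, `date`, price)
VALUES (112233, 'us', '2011-08-22', 4.99);

COMMIT;
```

If the price is unchanged, the application simply issues COMMIT (or ROLLBACK) without running the INSERT.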
After experimentation, and with some help from MySQL Conditional INSERT and http://www.artfulsoftware.com/infotree/queries.php#101, I found a query that worked:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`price`,
`update`
)
SELECT 7870, 'us', 999, NOW()
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`cc`=p2.`cc` AND p1.`update` < p2.`update`
WHERE
p2.`steam_id` IS NULL
AND p1.`steam_id`=7870
AND p1.`cc`='us'
AND (
p1.`price`<>999
)
The answer is to first return all rows for which no later timestamp exists. This is done with a within-group aggregate: you join the table with itself only on rows where the other row's timestamp is later. If a row fails to join (no later timestamp exists) then you know that row contains the latest timestamp. These rows will have a NULL steam_id in the joined table (failed to join).
After you have selected all rows with the latest timestamp, grab only those rows where the steam_id is the steam_id you're looking for and where the price is different from the new price that you're entering. If there are no rows with a different price for that game at this point then the price has not changed since the last update, so an empty set is returned. When an empty set is returned the SELECT produces no rows and nothing is inserted. If the SELECT does return a row (a different price was found) then that row is 7870, 'us', 999, NOW(), which is inserted into our table.
EDIT - I actually found a mistake with the above query a little while later and I have since revised it. The query above will insert a new row if the price has changed since the last update, but it will not insert a row if there are currently no prices in the database for that item.
To resolve this I had to take advantage of the DUAL table (which always contains one row), then use an OR in the WHERE clause to test for a different price OR an empty set:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`price`,
`update`
)
SELECT 12345, 'us', 999, NOW()
FROM DUAL
WHERE
NOT EXISTS (
SELECT `steam_id`
FROM `steam_prices`
WHERE `steam_id`=12345
AND `cc`='us'
)
OR
EXISTS (
SELECT p1.`steam_id`
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`cc`=p2.`cc` AND p1.`update` < p2.`update`
WHERE
p2.`steam_id` IS NULL
AND p1.`steam_id`=12345
AND p1.`cc`='us'
AND (
p1.`price`<>999
)
)
It's very long, it's very ugly, and it's very complicated. But it works exactly as advertised. If there is no price in the database for a certain steam_id and country then it inserts a new row. If there is already a price then it checks the price with the most recent update and, if different, inserts a new row.
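A possibly shorter variant of the same idea, offered as an untested sketch rather than a drop-in replacement: wrap the "latest price" lookup in a derived table. LIMIT is allowed there, which sidesteps error 1235, and the derived table is materialized before the insert happens.

```sql
INSERT INTO `steam_prices` (`steam_id`, `cc`, `price`, `update`)
SELECT 12345, 'us', 999, NOW()
FROM DUAL
WHERE NOT EXISTS (
  SELECT 1
  FROM (
    -- Most recent stored price for this game and country (zero or one row).
    SELECT `price`
    FROM `steam_prices`
    WHERE `steam_id` = 12345
      AND `cc` = 'us'
    ORDER BY `update` DESC
    LIMIT 1
  ) AS latest
  WHERE latest.`price` = 999
);
```

The row is inserted unless the most recent stored price for that game and country already equals the new price, which covers both the "no rows yet" case and the "price changed" case in one condition.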