Reuse rows in a MySQL table without auto-incrementing

Hey, let me explain my problem. I have a MySQL table in which I store data feeds from, say, 5 different sites, and I update the feeds once daily. The table has a primary key, FeedId, which auto-increments. When I update the feeds from a particular site, I delete that site's previous data from the table and insert the new data. The new rows fill the space freed by the deleted ones, and if there are more feeds this time, the extras are appended at the end of the table. But FeedId is incremented for all of the new rows.
What I want is for the feeds stored in the old locations to retain their previous ids, with only the extra ones saved at the end getting new incremented ids. Please help; I can't figure out how to do that.

A better solution would be to set a unique key on the feed (aside from the auto-incremented key), then use INSERT ... ON DUPLICATE KEY UPDATE:
INSERT INTO feeds (name, url, etc, etc2, `update_count`)
VALUES ('name', 'url', 'etc', 'etc2', 1)
ON DUPLICATE KEY UPDATE
  `etc` = VALUES(`etc`),
  `etc2` = VALUES(`etc2`),
  `update_count` = `update_count` + 1;
The benefit is that you're not incrementing the ids, and you're still doing it in one atomic query. Plus, you're only updating/changing what you need to change. (Note that I included the update_count column to show how to update a field.)

Marking this post as deleted based on the comments.
Try REPLACE INTO to merge the data.
More information:
http://dev.mysql.com/doc/refman/5.0/en/replace.html
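A minimal sketch, assuming a feeds table with a UNIQUE key on (site, url); the column names here are assumptions:
-- REPLACE deletes any row that collides on a unique key, then inserts a new one,
-- so FeedId is re-generated unless it is supplied explicitly. This is why,
-- per the comments above, it does not preserve the old ids.
REPLACE INTO feeds (site, url, title)
VALUES ('site1', 'http://example.com/item', 'New title');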

Related

Tweaking INSERT ON DUPLICATE KEY UPDATE

I'm trying to change some old deprecated code for how items are saved in my game. The current method deletes all items, then inserts all items, then inserts equipment (a child table of the items). It has a pretty big performance impact, and I was wondering if changing the INSERTs to INSERT ... ON DUPLICATE KEY UPDATE would noticeably improve performance.
If it does, then I have a follow-up question. My plan is to load the items in with their original inventoryitemid and use that as the key to save with later.
The issue is that the following statement may miss some conditions:
INSERT INTO inventoryitems (inventoryitemid, itemid) VALUES (?, ?) ON DUPLICATE KEY UPDATE ...
What I would like is for MySQL to INSERT if the row doesn't exist, using the default (auto-increment) value, and otherwise UPDATE. At the moment new items are generated with an inventoryitemid of 0, since keys are generated on INSERT anyway.
tl;dr: I need a way to INSERT ... ON DUPLICATE KEY UPDATE without having the inventoryitemid beforehand (since new items are generated in-game with an inventoryitemid of 0). At the moment I have to specify the inventoryitemid beforehand, and I might not have access to that information.
First goal and issue:
Try to insert a new item that doesn't exist in the database, without having the inventoryitemid beforehand.
Issue: the item isn't inserted into the database with the next auto-incremented value.
Second goal (no issue):
Attempt to insert an item into the database with an existing inventoryitemid.
The database updates the item successfully (yay!).
Trying out a solution: inserting NULL to trigger the auto-increment
When you're inserting a new row, specify inventoryitemid = NULL. This will create a new row, because NULL is not a duplicate key, and the inventoryitemid will be auto-incremented.
After you insert a new item like this, you can use LAST_INSERT_ID() to get the inventoryitemid that was assigned to it. Then you can send that back to the game code, so it knows the new item has this id and can use it in future updates instead of sending NULL.
When you're modifying an existing row, specify the value of its inventoryitemid. Then the ON DUPLICATE KEY UPDATE branch will update it instead of inserting a new row.
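For example, a minimal sketch; the itemid/quantity columns and the literal values are assumptions for illustration:
-- New item: NULL lets AUTO_INCREMENT assign inventoryitemid.
INSERT INTO inventoryitems (inventoryitemid, itemid, quantity)
VALUES (NULL, 4001, 1)
ON DUPLICATE KEY UPDATE quantity = VALUES(quantity);
-- Fetch the id that was just assigned and hand it back to the game code.
SELECT LAST_INSERT_ID();
-- Existing item: pass its real id so the ON DUPLICATE KEY branch fires.
INSERT INTO inventoryitems (inventoryitemid, itemid, quantity)
VALUES (42, 4001, 3)
ON DUPLICATE KEY UPDATE quantity = VALUES(quantity);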

How to implement temporal data in MySQL

I currently have a non-temporal MySQL DB and need to change it to a temporal MySQL DB. In other words, I need to be able to retain a history of changes that have been made to a record over time for reporting purposes.
My first thought for implementing this was simply to do inserts into the tables instead of updates, and when I need to select data, do a GROUP BY on some column and order by the timestamp DESC.
However, after thinking about it a bit, I realized that this will really mess things up, because the primary key for each insert (which would really just be simulating a number of updates to a single record) will be different, and will thus break any linkage that uses the primary key to link to other records in the DB.
As such, my next thought was to continue updating the main tables in the DB, but also create a new insert into an "audit table" that is simply a copy of the full record after the update, and then when I needed to report on temporal data, I could use the audit table for querying purposes.
Can someone please give me some guidance or links on how to properly do this?
Thank you.
Make the given table R temporal (i.e., maintain its history).
One design is to leave table R as it is and create a new table R_Hist with valid_start_time and valid_end_time columns.
Valid time is the time during which the fact is true.
The CRUD operations can then be given as follows (a sketch in SQL appears after the list):
INSERT
Insert into R
Insert into R_Hist with valid_end_time as infinity
UPDATE
Update in R
Update valid_end_time with the current time for the "latest" tuple in R_Hist
Insert into R_Hist with valid_end_time as infinity
DELETE
Delete from R
Update valid_end_time with the current time for the "latest" tuple
SELECT
Select from R for 'snapshot' queries (implicitly the 'latest' timestamp)
Select from R_Hist for temporal operations
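A minimal sketch of the UPDATE case, assuming R(id, val) and R_Hist(id, val, valid_start_time, valid_end_time), with '9999-12-31' standing in for infinity:
-- Change the current version in R.
UPDATE R SET val = 'new' WHERE id = 1;
-- Close out the "latest" history tuple.
UPDATE R_Hist SET valid_end_time = NOW()
WHERE id = 1 AND valid_end_time = '9999-12-31';
-- Record the new version, valid until further notice.
INSERT INTO R_Hist (id, val, valid_start_time, valid_end_time)
VALUES (1, 'new', NOW(), '9999-12-31');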
Alternatively, you can design a new table for every attribute of table R. This design captures attribute-level temporal data, as opposed to the entity-level data of the previous design. The CRUD operations are almost the same.
I added a Deleted column and a DeletedDate column; Deleted defaults to false and DeletedDate to null.
Composite primary key on IDColumn, Deleted, and DeletedDate.
You can index by Deleted, so you get really fast queries.
There is no duplicate primary key on your IDColumn, because the primary key includes Deleted and DeletedDate.
Assumption: you won't write to the same record more than once per millisecond; a non-unique DeletedDate could cause a duplicate-primary-key error.
So for updates I do a transaction-type deal: select the row, take the results, update the specific values, then insert. Really it's an update that sets Deleted to true and DeletedDate to NOW(); you then have it spit out the row after the update, and use that to get the primary key and/or any values not available to whatever API you built.
Not as good as a temporal table, and it takes some discipline, but it builds the history into one table that is easy to report on.
I may start updating the date column and change it to an Added/Deleted date, in addition to the added date, so I can sort records by that one column, while always updating the AddedBy column with the same value for logging's sake.
Either way, you can do a CASE: when the deleted date is not null use it as addedDate, else use addedDate, and ORDER BY addedDate DESC. So, yeah, this works.
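A rough sketch of this design; all names here are assumptions, and note that MySQL requires primary-key columns to be NOT NULL, so a sentinel date stands in for the null DeletedDate described above:
CREATE TABLE records (
  IDColumn    INT NOT NULL,
  Deleted     TINYINT NOT NULL DEFAULT 0,
  DeletedDate DATETIME(3) NOT NULL DEFAULT '9999-12-31 00:00:00',  -- millisecond precision
  payload     VARCHAR(255),
  PRIMARY KEY (IDColumn, Deleted, DeletedDate),
  KEY idx_deleted (Deleted)
);
-- An "update" closes out the live row, then inserts the new version.
START TRANSACTION;
UPDATE records SET Deleted = 1, DeletedDate = NOW(3)
WHERE IDColumn = 7 AND Deleted = 0;
INSERT INTO records (IDColumn, payload) VALUES (7, 'new value');
COMMIT;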

MySQL SELECT DISTINCT rows (not columns) to filter $_POST for duplicates

I'm trying to filter rows in the MySQL table where all the $_POST data from an online form is stored. Sometimes the user's internet connection stalls or the browser screws up, and the new page after form submission is not displayed (though the INSERT worked and the table row was created). They then hit refresh and submit their form twice, creating a duplicate row (except for the timestamp and auto-increment id columns).
I'd like to select unique form submissions. This has to be a really common task, but I can't seem to find a succinct way to apply DISTINCT to every column except the timestamp and id (sort of like SELECT id, timestamp, DISTINCT everything_else FROM table;). At the moment, I can do:
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS (
  SELECT DISTINCT everything, except, id, and, timestamp
  FROM table1
);
SELECT * FROM table1 LEFT OUTER JOIN temp1
ON table1.everything = temp1.everything
...
;
My table has 20k rows with about 25 columns (classification features for a machine learning exercise). This query takes forever (I presume because it traverses the 20k rows 20k times?); I've never even let it run to completion. What's the standard-practice way to do this?
Note: this question suggests adding an index to the relevant columns, but an index can have at most 16 key parts. Should I just choose the columns most likely to be unique? I can find about 700 duplicates in 2 seconds that way, but I can't be sure I'm not throwing away a unique row, because I also have to ignore some columns when specifying the index.
If you have a UNIQUE key (other than an AUTO_INCREMENT), simply use INSERT IGNORE ... to silently avoid duplicate rows. If you don't have a UNIQUE key, do you never need to find a row again?
If you have already allowed duplicates and you need to get rid of them, that is a different question.
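For instance, a minimal sketch, assuming you pick a subset of columns likely to identify a submission; the table and column names are assumptions, and the 16-key-part limit mentioned above still applies:
ALTER TABLE submissions
ADD UNIQUE KEY uniq_submission (email, feature1, feature2);
-- A re-submission with the same values is now silently skipped.
INSERT IGNORE INTO submissions (email, feature1, feature2)
VALUES ('user@example.com', 'a', 'b');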
I would try to eliminate the problem in the first place. There are techniques for this; the first one that comes to mind is to generate a random string and store it both in the session and as a hidden field in the form. This random string should be generated each time the form is displayed. When the user submits the form, check that the session key and the submitted key match, and make sure to generate a different key on each request. That way, when a user refreshes the page, he will submit an old key and it will not match.
Another solution: if this data should always be unique in the database, check whether that exact data is already in the database before inserting. And if the data is unique by, let's say, the email address, you can create a unique key index on it, so that field will have to be unique in the table.

Correct way of inserting new row into SQL table, but only if pair does not exist

This has been discussed before; however, I cannot understand the answers I have found.
Essentially I have a table with three columns: memo, user and keyid (the last one is the primary key and AUTO_INCREMENT). I insert a pair of values (memo and user), but if I try to insert that same pair again, the insert should not happen.
From what I've found, the methods to do this all depend on a unique key (which I've got, in keyid), but what I don't understand is that you would still need a second query just to get the keyid of the existing pair (or get nothing, in which case you go ahead with the insertion).
Is there any way to do all of this in a single query? Or am I understanding what I've read (using REPLACE or IGNORE) wrong?
You need to set a UNIQUE KEY on user + memo:
ALTER TABLE mytable
ADD CONSTRAINT unique_user_memo UNIQUE (memo, user);
Then use INSERT IGNORE or REPLACE, according to your needs, when inserting. Your current unique key is the primary key, which is all well and good, but you need a second one in order to disallow the insertion of duplicate data. If you do not create a new unique key on the two columns together, then you'll need to run a SELECT query before every insert to check whether the pair already exists.
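For instance, once unique_user_memo exists (the values here are made up):
-- Duplicate (memo, user) pairs are silently skipped:
INSERT IGNORE INTO mytable (memo, `user`) VALUES ('buy milk', 'alice');
-- Or, with REPLACE, the old row is deleted and re-inserted,
-- which assigns a new keyid:
REPLACE INTO mytable (memo, `user`) VALUES ('buy milk', 'alice');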

Overwrite MySQL table data

I have a web crawler. It gathers links from the web pages I give it, but while retrieving them, some links are duplicated because of the way the website is built. Is there a way in MySQL to overwrite a row when a new row is exactly the same as an old one?
Say I have http://www.facebook.com in a link field.
If I pick up http://www.facebook.com again, I would like the latter to overwrite the old row, so I don't get clashes in my search engine.
I'm assuming you want to update a last-updated date if the url already exists; otherwise there is no good reason to do an update.
INSERT INTO `scrapping_table` (`url`)
VALUES ('www.facebook.com')
ON DUPLICATE KEY UPDATE
  `date_updated` = NOW();
Look into ON DUPLICATE KEY actions:
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
Basically, make the columns you're concerned with a unique key, write your INSERT statement, and then add
ON DUPLICATE KEY UPDATE col = overwriting_value
If your link field is unique, then you can use:
INSERT INTO mytable (link_field, x_column, y_column)
VALUES ('www.facebook.com', 'something new for x', 'something new for y')
ON DUPLICATE KEY UPDATE x_column = 'something new for x', y_column = 'something new for y';
Just make sure your link field is unique. If you have more unique fields in your table, I suggest using the second method below, because the docs suggest avoiding an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes.
Set your link field as unique.
Before inserting a row, try:
SELECT primary_id FROM mytable WHERE link_field = 'www.facebook.com';
Count the number of rows returned by this SQL.
If count > 0, UPDATE the row using the primary_id we just grabbed through the SELECT.
If count == 0, just insert your row.
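A sketch of the two branches; the last_seen column and the id 123 are just placeholders:
SELECT primary_id FROM mytable WHERE link_field = 'www.facebook.com';
-- count > 0: update the row found above
UPDATE mytable SET last_seen = NOW() WHERE primary_id = 123;
-- count == 0: insert a fresh row
INSERT INTO mytable (link_field, last_seen) VALUES ('www.facebook.com', NOW());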
Beware! A web crawler will probably find millions of links, so you want to minimize the queries each crawl process fires.
Do you want to build a unique link table that will feed the bots, or do you want to prevent duplicate search results?
Unique URL pool table:
While crawling a page, save the URLs to an array (or list) and make sure (!in_array()) that it holds only unique values; you will find that each page you crawl contains a lot of repeated links, so clean them before touching SQL.
Convert the URLs to hashes (a "simhash" of 32 binary digits).
Then open a connection to the db and check whether each hash exists; if it does, dump it! Don't update (that's a second process). Match the links using the hashes over an indexed table; it will be far faster (see the sketch below).
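For instance, a sketch of such a pool table, with MD5 standing in for the simhash mentioned above; the names are assumptions:
CREATE TABLE url_pool (
  url_hash CHAR(32) NOT NULL,
  url      TEXT NOT NULL,
  PRIMARY KEY (url_hash)  -- matching on the indexed hash is fast
);
-- Known hashes are dumped, not updated:
INSERT IGNORE INTO url_pool (url_hash, url)
VALUES (MD5('http://www.facebook.com'), 'http://www.facebook.com');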
Preventing duplicate search results:
If you indexed the URLs with the above methodology you should not find duplicate URLs; if you do, there is a problem in your crawling operation.
Even if you have duplicate values in another table and you want to search it without returning duplicate results, you can use DISTINCT in your query.
Good luck!