Perl MySQL - How do I skip updating or inserting a row if a particular field matches?

I am pretty new to this so sorry for my lack of knowledge.
I set up a few tables, which I have successfully written to and accessed via a Perl script using the CGI and DBI modules, thanks to advice here.
This is a member list for a local band newsletter. Yeah I know, tons of apps out there but, I desire to learn this.
1- I wanted to avoid updating or inserting a row if a piece of my input matches the data in one particular column/field.
When creating the table in phpMyAdmin, I clicked the "U" (unique) on that column's name in the structure view.
That seemed to work and no dupes are inserted but, I desire a hard coded Perl solution so, I understand the mechanics of this.
I read up on "insert ignore" / "update ignore" and searched all over but, everything I found seems to not just skip a dupe.
The column is not a key or autoinc just a plain old field with an email address. (mistake?)
2- When I write to the database, I want to do NOTHING if the incoming email address matches one in that field.
I desire the fastest method so I can loop through their existing list's export data (they cannot figure out the software) with no racing / locking issues or whatever other conditions I am obviously ignorant of.
Since I am creating this from scratch, 1 and 2 may be in fact partially moot. If so, what would be the best approach?
I would still like an auto increment ID so, I can access via the ID number or loop through with some kind of count++ foreach.
My stone knife approach may be laughable to the gurus here but, I need to start somewhere.
Thanks in advance for your assistance.

With the email address column declared UNIQUE, INSERT IGNORE is exactly what you want for insertion. Sounds like you already know how to do the right thing!
(You could perform the "don't insert if it already exists" functionality in perl, but it's difficult to get right, because you have to wrap the test and update in a transaction. One of the big advantages of a relational database is that it will perform constraint checks like this for you, ensuring data integrity even if your application is buggy.)
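As a minimal sketch of letting the constraint do the work: the example below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. SQLite spells the statement INSERT OR IGNORE where MySQL spells it INSERT IGNORE; the members table layout and the add_member helper are assumptions made for illustration, not something from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE members (
        member_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-increment ID
        firstname TEXT NOT NULL,
        email     TEXT NOT NULL UNIQUE                -- the "U" from phpMyAdmin
    )
    """
)


def add_member(conn, firstname, email):
    """Hypothetical helper: insert a member, silently skipping duplicate emails.

    Returns True if a row was added, False if the email already existed.
    """
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            # MySQL equivalent: INSERT IGNORE INTO members ...
            "INSERT OR IGNORE INTO members (firstname, email) VALUES (?, ?)",
            (firstname, email),
        )
    return cur.rowcount == 1


first = add_member(conn, "Sue", "sue@example.com")
second = add_member(conn, "Susan", "sue@example.com")  # same email, skipped
count = conn.execute("SELECT COUNT(*) FROM members").fetchone()[0]
```

Because the uniqueness check happens inside the database, the loop over the exported list needs no separate lookup before each insert.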
For updating, I'm not sure what an "update ignore" would look like. What is in the WHERE clause that is limiting your UPDATE to only affect the 1 desired row? Perhaps that auto_increment primary key you mentioned? If you are wanting to write, for example,
UPDATE members SET firstname='Sue' WHERE member_id = 5;
then I think this "update ignore" functionality you want might just be something like
UPDATE members SET firstname='Sue' WHERE member_id = 5
AND email != 'sue@example.com';
which is an odd thing to do, but that's my best guess for what you might mean :)

Just do the insert. If the data would make the unique column non-unique, you'll get an SQL error; you should be able to trap this and do whatever is appropriate (e.g. ignore it, log it, alert the user ...).
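A rough sketch of trapping that error, again using Python's sqlite3 as a runnable stand-in for MySQL. The try_insert helper and the table layout are hypothetical; in a Perl DBI script the analogous approach would presumably be to enable RaiseError and wrap the insert in an eval block.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members ("
    " member_id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE)"
)


def try_insert(conn, email):
    """Hypothetical helper: attempt a plain INSERT and report what happened."""
    try:
        with conn:  # rolls back automatically if the insert raises
            conn.execute("INSERT INTO members (email) VALUES (?)", (email,))
        return "inserted"
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row; here we simply report it,
        # but this is the place to log it or alert the user instead.
        return "duplicate"


outcome_first = try_insert(conn, "sue@example.com")
outcome_second = try_insert(conn, "sue@example.com")
```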

Related

What is the best way to prevent duplicate values in databases

What is the best way to prevent duplicate values in databases ?
I have a table called names that has only one column called name that is unique (declared as unique attribute).
What is the best way to insert a new name (x) ?
Way1: Should I make a select query for the name x first to check whether it exists, then make another query to insert the name only if it does not exist in the table?
Way2: Make only one query to insert the name and ignore the error if name already exists.
The second way is the better way. Why run two queries when you can just run one?
When you declare the column as unique, you have told the database to do the extra work to ensure that this is true. You don't need to do anything else, other than check the errors on the return.
The database constraint will take care of uniqueness. Only if your logic needs the last inserted ID for a child table do I think you need a manual check before the insert; otherwise just ignore the exception raised by the duplicate.
The first way works. After the action you can be sure that the record exists (unless some other error occurred). You do need a second query (or some other mechanism) to retrieve the actual tuple, either the existing one or a freshly inserted one.
The second way is terrible: the DBMS session is in an error state (your current work has implicitly been rolled back, and all your cursors have been closed). So you'll have to start your work all over again, hopefully without the duplicate.
The case you give is a simplified "upsert". Do a search for upsert and you will find answers to the more general question. Some databases, like MySQL, provide INSERT IGNORE for this simple case.
Otherwise for the simple case you mention you can use the second approach. For the more general upsert, it is surprisingly difficult to get it right. The issue is concurrent updates. In fact, I have not seen a satisfactory answer for general upserts. Some say to use "merge" but that is subject to concurrency issues.
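For the simple case, and for the situation raised above where the row's ID is needed for a child table, one possible sketch is "insert and ignore, then select" inside one transaction. It is written with Python's sqlite3 as a stand-in for MySQL; the id column and the get_or_create_id helper are assumptions (the names table in the question has only the name column).

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names ("
    " id INTEGER PRIMARY KEY,"      # assumed surrogate key, not in the question
    " name TEXT NOT NULL UNIQUE)"
)


def get_or_create_id(conn, name):
    """Hypothetical helper: return the id for name, inserting it if needed."""
    with conn:
        # The UNIQUE constraint makes this a no-op when the name exists.
        # MySQL equivalent: INSERT IGNORE INTO names ...
        conn.execute("INSERT OR IGNORE INTO names (name) VALUES (?)", (name,))
        # Whether the row is old or freshly inserted, it exists now.
        row = conn.execute(
            "SELECT id FROM names WHERE name = ?", (name,)
        ).fetchone()
    return row[0]


id_x = get_or_create_id(conn, "x")
id_x_again = get_or_create_id(conn, "x")
id_y = get_or_create_id(conn, "y")
```

This covers only the insert-if-missing case; it does not attempt the general upsert, whose concurrency problems are described above.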

MySQL Update entire table with unknown # of rows and clear the rest

I'm pretty sure this particular quirk isn't a duplicate so here goes.
I have a table of services. In this table, I have about 40 rows of the following columns:
Services:
id_Services -- primary key
Name -- name of the service
Cost_a -- for variant a of service
Cost_b -- for variant b of service
Order -- order service is displayed in
The user can go into an admin tool and update any of this information - including deleting multiple rows, adding a row, editing info, and changing the order they are displayed in.
My question is this, since I will never know how many rows will be incoming from a submission (there could be 1 more or 100% less), I was wondering how to address this in my query.
Upon submission, every value is resubmitted. I'd hate to do it this way but the easiest way I can think of is to truncate the table and reinsert everything... but that seems a little... uhhh... bad! What is the best way to accomplish this?
RE-EDIT: For example: I start with 40 rows, update with 36. I still have to do something to the values in rows 37-40. How can I do this? Are there any mysql tricks or functions that will do this for me?
Thank you very much for your help!
You're slightly limited by the use case; you're doing insertion/update/truncation that's presented to the user as a batch operation, but in the back-end you'll have to do these in separate statements.
Watch out for concurrency: use transactions if you can.
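One way the separate statements could be grouped into a single transaction is sketched below with Python's sqlite3 standing in for MySQL. The replace_services helper, the lower-case column names and display_order (used instead of Order, which is a reserved word) are assumptions. MySQL would use REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE where SQLite uses INSERT OR REPLACE.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE services (
        id_services   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        cost_a        REAL,
        cost_b        REAL,
        display_order INTEGER   -- "Order" in the question; renamed here
    )
    """
)


def replace_services(conn, rows):
    """Hypothetical helper: make the table match the submitted rows.

    rows is a list of (id_services, name, cost_a, cost_b, display_order);
    id_services is None for rows the user just added.
    """
    keep_ids = [r[0] for r in rows if r[0] is not None]
    with conn:  # all statements commit together or not at all
        if keep_ids:
            marks = ",".join("?" * len(keep_ids))
            # Drop every existing row that was not resubmitted.
            conn.execute(
                f"DELETE FROM services WHERE id_services NOT IN ({marks})",
                keep_ids,
            )
        else:
            conn.execute("DELETE FROM services")
        # Overwrite surviving rows and insert the new ones.
        conn.executemany(
            "INSERT OR REPLACE INTO services"
            " (id_services, name, cost_a, cost_b, display_order)"
            " VALUES (?, ?, ?, ?, ?)",
            rows,
        )


replace_services(conn, [(1, "a", 1.0, 2.0, 1), (2, "b", 1.0, 2.0, 2),
                        (3, "c", 1.0, 2.0, 3)])
replace_services(conn, [(1, "a2", 5.0, 6.0, 2), (None, "d", 1.0, 1.0, 1)])
remaining = conn.execute(
    "SELECT name FROM services ORDER BY display_order"
).fetchall()
```

If any statement fails, the whole batch is rolled back, so the table is never left half-updated the way a truncate followed by a failed reinsert could leave it.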

Keeping id's unique Client Side and Server Side

I have been scratching my head for hours trying to solve the following situation:
Several HTML forms on a webpage are identified by an id. Users can create forms on the client side themselves and fill in data. How can I guarantee that the id of the form the user generates is unique, and that no collision occurs in the saving process because the same id was generated by someone else's client?
The problems/questions:
A random function on the client side could return identical ids on two clients
Looking up the SQL table for a free id wouldn't solve the problem
Auto-incrementing a new id would complicate the whole process because the DOM id and the SQL id differ, so we come to the next point:
A "left join" to combine dom_id and user_id to identify the forms in the database looks like a performance killer because I expect these tables will be huge
The question (formed as simple as i can):
Is there a way that the client can create/fetch a unique id which will later be used as the primary key for a database entry without any collisions? What's the best practice?
My current solution (bad):
No unique ids at all to identify the forms. Always a combination through a left join to identify the forms generated by the specific user. But what happens if the user says: Delete my account (and my user_id) but leave the data on the server? I would lose the user id and this query wouldn't work anymore...
I am really sorry that I couldn't explain it in another way. But I hope someone understood what I am faced with and could give me at least a hint.
THANK YOU VERY MUCH!
GUIDs (Globally Unique IDentifiers) might help. See http://en.wikipedia.org/wiki/GUID
For each form the client could generate a new GUID. Theoretically it should be unique.
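A small sketch of the GUID idea, in Python for illustration only (in the browser the client would use its own UUID facility). The new_form_id helper is hypothetical. Version 4 UUIDs are random, so a collision is not strictly impossible, only extremely unlikely.

```python
import uuid


def new_form_id():
    """Hypothetical helper: return a random (version 4) UUID as a string.

    The 36-character string can serve both as the DOM id of the form and as
    the primary key of its database row, so no id translation is needed.
    """
    return str(uuid.uuid4())


form_a = new_form_id()
form_b = new_form_id()
```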
I just don't show IDs to the user until they've submitted something, at which point they get to see the generated auto-increment id. It keeps things simple. If you however really need it, you could use a sequence table, but it has some caveats which make me advise against it:
CREATE TABLE sequence (id integer default 0, sequencename varchar(32));
Incrementing:
UPDATE sequence
SET id = @generated := id + 1
WHERE sequencename = 'yoursequencename';
Getting:
SELECT @generated;
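The sequence-table idea above can be sketched in runnable form with Python's sqlite3 as a stand-in for MySQL. SQLite has no session variables, so this version reads the value back inside the same transaction instead of using a MySQL user variable; the next_value helper and the 'forms' sequence name are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (id INTEGER DEFAULT 0, sequencename VARCHAR(32))"
)
conn.execute("INSERT INTO sequence (id, sequencename) VALUES (0, 'forms')")
conn.commit()


def next_value(conn, name):
    """Hypothetical helper: increment the named sequence and return its value."""
    with conn:  # increment and read-back happen in one transaction
        cur = conn.execute(
            "UPDATE sequence SET id = id + 1 WHERE sequencename = ?", (name,)
        )
        if cur.rowcount != 1:
            raise KeyError(f"unknown sequence: {name}")
        row = conn.execute(
            "SELECT id FROM sequence WHERE sequencename = ?", (name,)
        ).fetchone()
    return row[0]


v1 = next_value(conn, "forms")
v2 = next_value(conn, "forms")
```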

get the next id

My question is how to get the next id using NHibernate in a mysql db for an auto-increment ID column ?
Thanks,
Based on the further description you give (as an answer?) below it seems to me that you are indeed looking for the NHibernate feature to automatically read back IDs generated by the database: identity
This will tell NHibernate the ID's value is determined by the database upon insert, it will not send a value as part of its INSERT statement and it will read back the value of the ID column after it has performed the insert. But you do have to tell the database (in the table definition) that it should auto-generate a value for the ID column for each record inserted...
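The underlying mechanism (insert without an ID, then read back what the database generated) can be illustrated outside NHibernate. The sketch below uses Python's sqlite3 rather than NHibernate or MySQL, and the items table and insert_item helper are assumptions; it shows the general pattern, not NHibernate's API.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"   # database generates the ID
    " label TEXT NOT NULL)"
)


def insert_item(conn, label):
    """Hypothetical helper: insert a row and return its generated ID."""
    with conn:
        # No id is sent in the INSERT; the database assigns it.
        cur = conn.execute("INSERT INTO items (label) VALUES (?)", (label,))
    # Read back the ID this connection's insert just produced.
    return cur.lastrowid


first_id = insert_item(conn, "first")
second_id = insert_item(conn, "second")
```

The ID is only known after the insert has happened, which is why asking for the "next" ID in advance is unreliable.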
You're going to create a race condition if you do this. To answer your question, I don't think there is a specific way for Hibernate to give you this information, since no application can give you this information. By the time the "next id" is returned to you, it might already be invalid. The easiest way I can think of is to get the last_insert_id() and add 1 to it.
Why don't you post more information about what you're trying to accomplish so we can find a better solution for you?
Provided that you are the only writer to your database then you could get your application to maintain the sequence number for you and allocate the next number yourself.
If you want to do this then you'll want to ensure that your application counter is thread safe.
You'll also want a way to get the last written sequence number when restarting your application.
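A minimal sketch of such an application-side counter, in Python for illustration. The SequenceAllocator class is hypothetical, and it only makes sense under the stated condition that this process is the sole writer; last_written would be loaded from the database at startup.

```python
import threading


class SequenceAllocator:
    """Hypothetical thread-safe counter for a single-writer application."""

    def __init__(self, last_written=0):
        # On restart, pass in the highest sequence number already stored.
        self._last = last_written
        self._lock = threading.Lock()

    def next_id(self):
        # The lock makes increment-and-read a single atomic step, so two
        # threads can never be handed the same number.
        with self._lock:
            self._last += 1
            return self._last


allocator = SequenceAllocator(last_written=10)
allocated = allocator.next_id()
```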

Versioned and indexed data store

I have a requirement to store all versions of an entity in an easily indexed way and was wondering if anyone has input on what system to use.
Without versioning, the system is simply a relational database with a row per, for example, person. If the person's state changes, that row is changed to reflect this. With versioning, the entry should be updated in such a way that we can always go back to a previous version. If I could use a temporal database this would be free, and I would be able to ask 'what is the state of all people as of yesterday at 2pm living in Dublin and aged 30'. Unfortunately there don't seem to be any mature open source projects that can do temporal.
A really nasty way to do this is just to insert a new row per state change. This leads to duplication, as a person can have many fields but only one changing per update. It is also then quite slow to select the correct version for every person given a timestamp.
In theory it should be possible to use a relational database and a version control system to mimic a temporal database but this sounds pretty horrendous.
So I was wondering if anyone has come across something similar before and how they approached it?
Update
As suggested by Aaron here's the query we currently use (in mysql). It's definitely slow on our table with >200k rows. (id = table key, person_id = id per person, duplicated if the person has many revisions)
select name from person p where p.id = (select max(id) from person where person_id = p.person_id and timestamp <= :timestamp)
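To make the query above concrete, here is a small runnable sketch of the row-per-revision layout, using Python's sqlite3 as a stand-in for MySQL. The sample rows, the integer timestamps and the names_as_of helper are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"        # table key, one per revision
    " person_id INTEGER NOT NULL,"    # id per person, repeated per revision
    " name TEXT NOT NULL,"
    " timestamp INTEGER NOT NULL)"    # integer timestamps for simplicity
)
conn.executemany(
    "INSERT INTO person (person_id, name, timestamp) VALUES (?, ?, ?)",
    [(1, "Ann", 100), (2, "Bob", 100), (1, "Anne", 200), (2, "Robert", 300)],
)
conn.commit()


def names_as_of(conn, timestamp):
    """Hypothetical helper: each person's name as of the given timestamp."""
    rows = conn.execute(
        """
        SELECT p.name FROM person p
        WHERE p.id = (SELECT MAX(id) FROM person
                      WHERE person_id = p.person_id
                        AND timestamp <= :timestamp)
        ORDER BY p.person_id
        """,
        {"timestamp": timestamp},
    ).fetchall()
    return [r[0] for r in rows]


at_250 = names_as_of(conn, 250)
at_300 = names_as_of(conn, 300)
at_50 = names_as_of(conn, 50)
```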
Update
It looks like the best way to do this is with a temporal db but given that there aren't any open source ones out there the next best method is to store a new row per update. The only problem is duplication of unchanged columns and a slow query.
There are two ways to tackle this. Both assume that you always insert new rows. In every case, you must insert a timestamp (created) which tells you when a row was "modified".
The first approach uses a number to count how many instances you already have. The primary key is the object key plus the version number. The problem with this approach seems to be that you'll need a select max(version) to make a modification. In practice, this is rarely an issue since for all updates from the app, you must first load the current version of the person, modify it (and increment the version) and then insert the new row. So the real problem is that this design makes it hard to run updates in the database (for example, assign a property to many users).
The next approach uses links in the database. Instead of a composite key, you give each object a new key and you have a replacedBy field which contains the key of the next version. This approach makes it simple to find the current version (... where replacedBy is NULL). Updates are a problem, though, since you must insert a new row and update an existing one.
To solve this, you can add a back pointer (previousVersion). This way, you can insert the new rows and then use the back pointer to update the previous version.
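The linked-version approach could look roughly like the sketch below, written with Python's sqlite3 as a stand-in for the real database. The person_version table, the snake_case column names (replaced_by, previous_version) and the helper functions are assumptions based on the description above.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person_version (
        id               INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        created          INTEGER NOT NULL,   -- when this version was written
        previous_version INTEGER REFERENCES person_version(id),
        replaced_by      INTEGER REFERENCES person_version(id)
    )
    """
)


def create_person(conn, name, created):
    """Hypothetical helper: insert the first version of a person."""
    with conn:
        cur = conn.execute(
            "INSERT INTO person_version (name, created) VALUES (?, ?)",
            (name, created),
        )
    return cur.lastrowid


def new_version(conn, current_id, name, created):
    """Hypothetical helper: add a version and link it to the one it replaces."""
    with conn:  # insert and link update succeed or fail together
        cur = conn.execute(
            "INSERT INTO person_version (name, created, previous_version)"
            " VALUES (?, ?, ?)",
            (name, created, current_id),
        )
        new_id = cur.lastrowid
        # Use the back pointer's target to mark the old row as superseded.
        conn.execute(
            "UPDATE person_version SET replaced_by = ?"
            " WHERE id = ? AND replaced_by IS NULL",
            (new_id, current_id),
        )
    return new_id


def current_names(conn):
    """Current versions are the rows nothing has replaced yet."""
    rows = conn.execute(
        "SELECT name FROM person_version WHERE replaced_by IS NULL ORDER BY id"
    ).fetchall()
    return [r[0] for r in rows]


ann_v1 = create_person(conn, "Ann", 100)
ann_v2 = new_version(conn, ann_v1, "Anne", 200)
current = current_names(conn)
```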
Here is a (somewhat dated) survey of the literature on temporal databases: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.6988&rep=rep1&type=pdf
I would recommend spending a good while sitting down with those references and/or Google Scholar to try to find some good techniques that fit your data model. Good luck!