MySQL on duplicate key delete

I am looking for a (not too convoluted) solution to a MySQL problem. Say I have the following table, with a composite unique index on (group, item):
group    item
nogroup  item_a
group_a  item_a
Then, eventually, item_a no longer belongs to group_a. So I want to do something like:
update table set group = "nogroup" where item = "item_a" on duplicate key delete.
(Obviously this is not valid syntax, but I am looking for a way around it.)
I still want to keep a copy of the record with nogroup because, if item_a comes back later, I can change its group back to group_a or to any other group, depending on the case. Whenever item_a is added, an insert copies all the data from the nogroup record and sets the proper group label. At that point there are two records for item_a: one with group_a and one with nogroup. It is done this way to reuse previous data as much as possible, because creating a new entry (with no previous record) is much more involved and takes significantly more time and processing.
Say an item belongs to group_a and group_b, and suddenly it no longer belongs to any group: the first update setting group to "nogroup" will work, but the second update will raise a duplicate-key error.
The option of not updating the group column at all and using INSERT ... ON DUPLICATE KEY UPDATE does not work: there won't be duplicates when the groups differ, which leads to cases where an item no longer belongs to a group and yet its record is still present in the database. Checking first whether a "nogroup" row exists and then updating it to a specific group does not work either, because if item_a belongs to more than one group, this would update all the other records to the same group.
Basically, an item can either (1) belong to any number of groups, including "nogroup", or (2) belong solely to "nogroup"; either way, there should always be at least a "nogroup" copy somewhere in the database.
It looks like I won't be able to do this in a single query, but if someone has a clean way of dealing with it, that would be much appreciated. Maybe some of my assumptions above are wrong and there is an easy way to do it.

Your whole process of maintaining this items-to-groups mapping sounds too complicated. Why not just have a table that has a mapping? Then, when an item is removed from a group, delete it from the table. When it is added, add it to the table. Don't bother with "nogroup".
If you want an archive table, then create one. Add an insert, update, or delete trigger (whichever are appropriate) that populates the archive with the information you want to keep over time.
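A minimal sketch of the mapping-table-plus-archive idea, using Python's built-in sqlite3 module as a stand-in for MySQL (a MySQL trigger would also need FOR EACH ROW). All table and column names are hypothetical. Items live in their own table, group membership is one row per (item, group), and an AFTER DELETE trigger copies every removed membership into an archive table:

```python
import sqlite3

# Hypothetical schema; every table and column name here is illustrative.
SCHEMA = """
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE item_groups (
    item_id INTEGER NOT NULL REFERENCES items (item_id),
    grp     TEXT    NOT NULL,  -- "group" is a reserved word
    PRIMARY KEY (item_id, grp)
);
CREATE TABLE item_groups_archive (
    item_id    INTEGER NOT NULL,
    grp        TEXT    NOT NULL,
    removed_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Every membership that is deleted gets a copy in the archive.
CREATE TRIGGER archive_item_group AFTER DELETE ON item_groups
BEGIN
    INSERT INTO item_groups_archive (item_id, grp) VALUES (OLD.item_id, OLD.grp);
END;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_to_group(conn: sqlite3.Connection, item_id: int, grp: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO item_groups (item_id, grp) VALUES (?, ?)", (item_id, grp)
        )


def remove_from_group(conn: sqlite3.Connection, item_id: int, grp: str) -> None:
    # A plain DELETE; the trigger takes care of the archive copy.
    with conn:
        conn.execute(
            "DELETE FROM item_groups WHERE item_id = ? AND grp = ?", (item_id, grp)
        )
```

The item's row in items is never touched, so an item "in no group" is simply an item with no rows in item_groups, and its history is in item_groups_archive.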
I do not understand why re-using an existing row would be beneficial in terms of performance. There is no obvious database reason why this would be the case.
I am also confused as to why you need a "nogroup" tag at all. If you need a list of items, maintain that list in its own table. And call the table Items -- a much clearer name than "nogroup".

I agree with Gordan's approach. However, if you have to do it with a single table, it cannot be done in one SQL statement. You will need two: one UPDATE and one DELETE.
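A sketch of the two-statement approach, assuming the single table from the question with a unique key on (group, item). The column is called grp here because GROUP is a reserved word. It uses SQLite through Python's sqlite3; the first statement relies on SQLite's UPDATE OR IGNORE, and in MySQL the closest counterpart would be UPDATE IGNORE, which skips rows that would cause a duplicate-key conflict. The function name leave_group is hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # "grp" stands in for the question's "group" column (GROUP is reserved).
    conn.execute(
        "CREATE TABLE membership ("
        " grp TEXT NOT NULL, item TEXT NOT NULL, PRIMARY KEY (grp, item))"
    )
    return conn


def leave_group(conn: sqlite3.Connection, item: str, grp: str) -> None:
    """Take `item` out of `grp`, keeping exactly one "nogroup" row for it."""
    if grp == "nogroup":
        raise ValueError("the nogroup row must always be kept")
    with conn:  # both statements commit together or roll back together
        # Statement 1: relabel the row as "nogroup". OR IGNORE skips the row
        # instead of failing if ("nogroup", item) already exists.
        conn.execute(
            "UPDATE OR IGNORE membership SET grp = 'nogroup'"
            " WHERE item = ? AND grp = ?",
            (item, grp),
        )
        # Statement 2: if statement 1 was skipped, the old row is still
        # there and is now redundant, so delete it.
        conn.execute(
            "DELETE FROM membership WHERE item = ? AND grp = ?", (item, grp)
        )


def rows(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT grp, item FROM membership ORDER BY grp, item"
    ).fetchall()
```

For an item in group_a and group_b, leaving group_a turns that row into the "nogroup" row; leaving group_b afterwards hits the duplicate key, so the update is skipped and the DELETE removes the group_b row.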

Related

MySQL: Data structure for transitive relations

I am trying to design a data structure for easy and fast querying (insert, delete, and update speed do not really matter to me).
The problem: transitive relations. One entry can be related to others through intermediate entries, and I don't want to store those relations separately for every possibility.
That is, I know that Entry-A is related to Entry-B and that Entry-B is related to Entry-C; even though I don't know explicitly that Entry-A is related to Entry-C, I want to be able to query it.
What I think the solution is: eliminating the transitive part when inserting, deleting, or updating, using this table:
Entry (id, representative_id)
I would store them as sets, i.e. groups of entries (not the MySQL SET type, but sets in the mathematical sense). Every set would have a representative entry, and all of the set's elements would be related to the representative element.
A new insert would insert the entry and set its representative to itself.
If the newly inserted entry should be connected to another one, I simply set its representative_id to the referred entry's representative_id.
[Figure: Attach B to A]
It doesn't matter if I need to connect it to something that is not a representative entry: the result is the same, because every entry in the set has the same representative_id.
[Figure: Attach C to B]
Detach B-C: the detached entry becomes a representative entry, meaning it relates to itself.
[Figure: Detach B-C and attach C to X]
Deletion:
Deleting a non-representative entry is self-explanatory, but deleting a representative entry is a bit harder: I need to choose a new representative for the set and set every member's representative_id to the new representative's id.
So, deleting A in this set:
[Figure: set before deleting A]
would result in this:
[Figure: set after deleting A]
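The insert, attach, detach, and delete rules described above can be sketched as follows, using Python's sqlite3 in place of MySQL. The function names are hypothetical, ids are short strings to match the A/B/C figures, and "smallest remaining id becomes the new representative" is an arbitrary rule chosen for illustration:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: entry ids are short strings to match the A/B/C examples.
    conn.execute("CREATE TABLE entries (id TEXT PRIMARY KEY, rep_id TEXT NOT NULL)")
    conn.execute("CREATE INDEX entries_by_rep ON entries (rep_id)")
    return conn


def rep_of(conn, entry_id):
    return conn.execute(
        "SELECT rep_id FROM entries WHERE id = ?", (entry_id,)
    ).fetchone()[0]


def insert(conn, entry_id):
    # A new entry is its own representative (a one-element set).
    conn.execute("INSERT INTO entries (id, rep_id) VALUES (?, ?)", (entry_id, entry_id))


def attach(conn, entry_id, target_id):
    # Assumes entry_id is alone in its set (freshly inserted or detached).
    # Copying the target's rep_id works whether or not the target is the rep.
    conn.execute(
        "UPDATE entries SET rep_id = ? WHERE id = ?", (rep_of(conn, target_id), entry_id)
    )


def _reelect(conn, old_rep):
    # Hand the rest of the set to a new representative (smallest other id).
    new_rep = conn.execute(
        "SELECT MIN(id) FROM entries WHERE rep_id = ? AND id <> ?", (old_rep, old_rep)
    ).fetchone()[0]
    if new_rep is not None:
        conn.execute(
            "UPDATE entries SET rep_id = ? WHERE rep_id = ? AND id <> ?",
            (new_rep, old_rep, old_rep),
        )


def detach(conn, entry_id):
    with conn:
        if rep_of(conn, entry_id) == entry_id:
            _reelect(conn, entry_id)  # the other members keep a set without it
        else:
            conn.execute("UPDATE entries SET rep_id = id WHERE id = ?", (entry_id,))


def delete(conn, entry_id):
    with conn:
        if rep_of(conn, entry_id) == entry_id:
            _reelect(conn, entry_id)
        conn.execute("DELETE FROM entries WHERE id = ?", (entry_id,))


def related(conn, entry_id):
    # Every entry in the same set, including entry_id itself.
    return sorted(
        row[0]
        for row in conn.execute(
            "SELECT id FROM entries WHERE rep_id ="
            " (SELECT rep_id FROM entries WHERE id = ?)",
            (entry_id,),
        )
    )
```

Attaching looks up the target's rep_id first instead of using a subquery inside the UPDATE, because MySQL rejects an UPDATE whose subquery reads the table being updated.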
What do you think about this? Is it a correct approach? Am I missing something? What should I improve?
Edit:
Querying:
If I want to query every entry that is related to a certain entry whose id I know:
SELECT b.*
FROM entries a
JOIN entries b ON a.rep_id = b.rep_id
WHERE a.id = :id;
or, with a subquery:
SELECT * FROM AlkReferencia
WHERE rep_id = (SELECT rep_id FROM AlkReferencia
                WHERE id = :id);
About the application that requires this:
Basically, I am storing vehicle part numbers (references). One manufacturer can make multiple parts that can replace another part, and another manufacturer can make parts that replace other manufacturers' parts.
Reference: one manufacturer's OEM number for a certain product.
Cross-reference: a manufacturer can make products whose purpose is to replace another manufacturer's product.
I must connect these references in such a way that, when a customer searches for a number (no matter what kind of number they have), I can list an exact result and the alternative products.
To use the example above (last picture): B, D, and E are different products we may have in stock. Each one has a manufacturer and a string name/reference (I called it a number before, but it can be almost any character string). If I search for B's reference number, I should return B as the exact result and D and E as alternatives.
So far so good. BUT I need to upload these reference numbers. I can't just migrate them from an all-in-one database. Most of the time, when I upload references I got from a manufacturer (usually entered manually, but I can use catalogs too), I only get a list where the manufacturer tells which other reference numbers point to its numbers.
Example:
Filter manufacturer Asas: the "AS 1" filter has these cross-references (i.e., it replaces these):
GOLDEN SUPER --> 1
ALFA ROMEO --> 101000603000
ALFA ROMEO --> 105000603007
ALFA ROMEO --> 1050006040
RENAULT TRUCKS (RVI) --> 122577600
RENAULT TRUCKS (RVI) --> 1225961
ALFA ROMEO --> 131559401
FRAD --> 19.36.03/10
LANDINI --> 1896000
MASSEY FERGUSON --> 1851815M1
...
It would take ages to write down all of the AS 1 references; there are many (~1500?). And that is ONE filter. There are more than 4000 filters whose references I need to store (and these are only the filters). As you can see, I can't connect everything explicitly, but I must know that Alfa Romeo 101000603000 and 105000603007 are the same, even when I only know (AS 1 --> Alfa Romeo 101000603000) and (AS 1 --> Alfa Romeo 105000603007).
That is why I want to organize them as sets. Each set member would connect to only one other set member, via rep_id, which would be the representative member. And when someone (e.g. an admin uploading these references) wants to attach a new reference to a set member, I simply run INSERT INTO References (rep_id, attached_to_originally_id, refnumber) VALUES ([rep_id of the entry I am attaching to], [id of the entry I am attaching to], "16548752324551..");
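A sketch of that attach step as a single INSERT ... SELECT that copies rep_id from the target row, so the application does not have to read it first. It runs on SQLite through Python's sqlite3, and MySQL also accepts INSERT ... SELECT. The table is named refs here because REFERENCES is an SQL keyword; that name and the helper functions are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE refs (
            id                        INTEGER PRIMARY KEY,
            rep_id                    INTEGER,
            attached_to_originally_id INTEGER,
            refnumber                 TEXT NOT NULL
        )""")
    return conn


def add_root(conn, refnumber):
    """Insert a reference that starts its own set (rep_id = its own id)."""
    with conn:
        cur = conn.execute("INSERT INTO refs (refnumber) VALUES (?)", (refnumber,))
        conn.execute("UPDATE refs SET rep_id = id WHERE id = ?", (cur.lastrowid,))
    return cur.lastrowid


def attach_ref(conn, target_id, refnumber):
    """Attach a new reference to an existing one in a single statement."""
    with conn:
        cur = conn.execute(
            "INSERT INTO refs (rep_id, attached_to_originally_id, refnumber)"
            " SELECT rep_id, id, ? FROM refs WHERE id = ?",
            (refnumber, target_id),
        )
    if cur.rowcount != 1:  # the SELECT found no target row
        raise LookupError(f"no reference with id {target_id}")
    return cur.lastrowid


def same_set(conn, refnumber):
    return sorted(
        row[0]
        for row in conn.execute(
            "SELECT refnumber FROM refs WHERE rep_id ="
            " (SELECT rep_id FROM refs WHERE refnumber = ?)",
            (refnumber,),
        )
    )
```

With AS 1 as the root and the two Alfa Romeo numbers attached to it, a lookup by either Alfa Romeo number returns both of them plus AS 1, even though they were never linked to each other directly.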
Another thing: I don't need to worry much about insert, delete, or update speed, because this is an admin task in our system and will be done rarely.
It is not clear what you are trying to do, and it is not clear that you understand how to think & design relationally. But you seem to want rows satisfying "[id] is a member of the set named by member [rep_id]".
Stop thinking in terms of representations and pointers. Just find fill-in-the-(named-)blank statements ("predicates") that say what you know about your application situations and that you can combine to ask about your application situations. Every statement gets a table ("relation"). The columns of the table are the names of the blanks. The rows of the table are the ones that make its statement true. A query has a statement built from its table's statements. The rows of its result are the ones that make its statement true. (When a query has JOIN of table names its statement ANDs the tables' statements. UNION ORs them. EXCEPT puts in AND NOT. WHERE ANDs a condition. Dropping a column by SELECT corresponds to logical EXISTS.)
Maybe your application situations are a bunch of cells with values and pointers. But I suspect that your cells and pointers and connections and attaching and inserting are just your way of explaining & justifying your table design. Your application seems to have something to do with sets or partitions. If you really are trying to represent relations then you should understand that a relational table represents (is) a relation. Regardless, you should determine what your table statements are. If you want design help or criticism tell us more about your application situations, not about representation of them. All relational representation is by tables of rows satisfying statements.
Do you really need to name sets by representative elements? If we don't care what the name is then we typically use a "surrogate" name that is chosen by the DBMS, typically via some integer auto-increment facility. A benefit of using such a membership-independent name for a set is that we don't have to rename, in particular by choosing an element.
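One possible reading of the surrogate-name suggestion, sketched with Python's sqlite3. Each set gets a DBMS-generated set_id (AUTOINCREMENT here, AUTO_INCREMENT in MySQL), and each reference just records which set it is in. All names are hypothetical. Deleting a member never requires choosing a new representative, and merging two sets is one UPDATE:

```python
import sqlite3

SCHEMA = """
CREATE TABLE ref_sets (set_id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE refs (
    refnumber TEXT PRIMARY KEY,
    set_id    INTEGER NOT NULL REFERENCES ref_sets (set_id)
);
CREATE INDEX refs_by_set ON refs (set_id);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def new_set(conn, *refnumbers):
    """Create a set with a DBMS-chosen name and put the given references in it."""
    with conn:
        set_id = conn.execute("INSERT INTO ref_sets DEFAULT VALUES").lastrowid
        conn.executemany(
            "INSERT INTO refs (refnumber, set_id) VALUES (?, ?)",
            [(r, set_id) for r in refnumbers],
        )
    return set_id


def merge_sets(conn, keep_id, drop_id):
    with conn:
        conn.execute("UPDATE refs SET set_id = ? WHERE set_id = ?", (keep_id, drop_id))
        conn.execute("DELETE FROM ref_sets WHERE set_id = ?", (drop_id,))


def delete_ref(conn, refnumber):
    # No re-election step: the set's name does not depend on its members.
    with conn:
        conn.execute("DELETE FROM refs WHERE refnumber = ?", (refnumber,))


def alternatives(conn, refnumber):
    """Every other reference in the same set as refnumber."""
    return sorted(
        row[0]
        for row in conn.execute(
            "SELECT b.refnumber FROM refs a JOIN refs b ON a.set_id = b.set_id"
            " WHERE a.refnumber = ? AND b.refnumber <> a.refnumber",
            (refnumber,),
        )
    )
```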

Complex MySQL Delete Query

Current Structure
[Figure: current table structure]
As you can see, Path can be referenced by multiple tables and by multiple records within those tables.
Points can also be referenced by two different tables.
My Question
I would like to delete a PathType; however, this gets complicated because a Path may be owned by more than one PathType, so deleting the Path without checking how many references there are to it is out of the question.
Secondly, if this Path's only reference is the PathType I'm trying to delete, then I want to delete this Path and any records in PathPoints.
Lastly, if no other records reference a Point, it also needs to be deleted, but only if it's not used by any other object.
Attempts So Far
DELETE PathType1.*, Path.*, PathPoints.*, Point.* FROM PathType1,Path,PathPoints,Point WHERE PathType1.ID = 1 AND PathType1.PATH = Path.ID AND (SELECT COUNT(*) FROM PathType1 WHERE PathType1.PATH = Path.ID) < 1 AND (SELECT COUNT(*) FROM PathType2 WHERE PathType2.PATH = Path.ID) = 0
Obviously the statement above goes on, but I don't think this is the right approach, because if one condition fails, nothing is deleted.
I think it may not be possible to do what I'm attempting in one statement, and I may have to iterate through each section and handle it based on the outcome. That is not very efficient, but I don't see any alternative at this time.
I hope this is clear. If you have any more questions or need any clarification, please do not hesitate to ask.
First, there is no way I would do this in a query like that, even if the database allowed it, which most will not. It is an unmaintainable mess.
The preferred method is to start a transaction, then delete from one table at a time, starting with the bottommost child table, and then commit the transaction. And of course have error handling so that the entire transaction is rolled back if one delete fails, to maintain data integrity. If I intended to do this repeatedly, I would do it in a stored procedure.
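A sketch of that transaction-per-operation approach, written as application code against SQLite via Python's sqlite3 rather than as a MySQL stored procedure. PathType1, PathType2, Path, PathPoints, Point, ID, and PATH come from the question. The PathPoints column names (PATH, POINT) and OtherPointRef, the second table that references Point, are hypothetical.

```python
import sqlite3


def delete_path_type1(conn: sqlite3.Connection, path_type_id: int) -> None:
    """Delete one PathType1 row plus any Path/PathPoints/Point rows it orphans."""
    with conn:  # commit at the end, or roll back everything on any error
        row = conn.execute(
            "SELECT PATH FROM PathType1 WHERE ID = ?", (path_type_id,)
        ).fetchone()
        if row is None:
            return
        path_id = row[0]
        # 1. The child row first.
        conn.execute("DELETE FROM PathType1 WHERE ID = ?", (path_type_id,))
        # 2. Keep the Path if any other PathType row still owns it.
        still_owned = conn.execute(
            "SELECT EXISTS (SELECT 1 FROM PathType1 WHERE PATH = ?)"
            " OR EXISTS (SELECT 1 FROM PathType2 WHERE PATH = ?)",
            (path_id, path_id),
        ).fetchone()[0]
        if still_owned:
            return
        # 3. Remember the Path's points, then delete PathPoints and the Path.
        point_ids = [
            r[0]
            for r in conn.execute("SELECT POINT FROM PathPoints WHERE PATH = ?", (path_id,))
        ]
        conn.execute("DELETE FROM PathPoints WHERE PATH = ?", (path_id,))
        conn.execute("DELETE FROM Path WHERE ID = ?", (path_id,))
        # 4. Delete only those points that nothing references any more.
        for point_id in point_ids:
            conn.execute(
                "DELETE FROM Point WHERE ID = ?"
                " AND NOT EXISTS (SELECT 1 FROM PathPoints WHERE POINT = ?)"
                " AND NOT EXISTS (SELECT 1 FROM OtherPointRef WHERE POINT = ?)",
                (point_id, point_id, point_id),
            )


def build_demo() -> sqlite3.Connection:
    """Small hypothetical data set: Path 2 is shared, Point 11 is shared."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Path (ID INTEGER PRIMARY KEY);
        CREATE TABLE Point (ID INTEGER PRIMARY KEY);
        CREATE TABLE PathType1 (ID INTEGER PRIMARY KEY, PATH INTEGER REFERENCES Path (ID));
        CREATE TABLE PathType2 (ID INTEGER PRIMARY KEY, PATH INTEGER REFERENCES Path (ID));
        CREATE TABLE PathPoints (PATH INTEGER REFERENCES Path (ID),
                                 POINT INTEGER REFERENCES Point (ID));
        CREATE TABLE OtherPointRef (POINT INTEGER REFERENCES Point (ID));
        INSERT INTO Path VALUES (1), (2);
        INSERT INTO Point VALUES (10), (11), (12);
        INSERT INTO PathType1 VALUES (1, 1), (2, 2);
        INSERT INTO PathType2 VALUES (1, 2);
        INSERT INTO PathPoints VALUES (1, 10), (1, 11), (2, 11), (2, 12);
        INSERT INTO OtherPointRef VALUES (12);
    """)
    return conn
```

Each step is a small, readable statement, and the surrounding transaction gives the all-or-nothing behaviour the single giant DELETE was aiming for.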

Can I combine INSERT, JOIN and ON DUPLICATE KEY UPDATE

Here's my database structure as it stands today...
inventory_transactions stores movements of inventory, each with a quantity_offset value that is either negative or positive. Each also has an inventory_transaction_id.
shipments stores shipments, which are groups of inventory_transactions, each with a shipment_id.
The relationship between inventory_transactions and shipments is in a table called shipment_inventory_transactions
What I would like to do is increment the quantity_offset of an inventory_transaction that is associated with a given shipment (i.e., increase the quantity of a given inventory item within the shipment) if that item already exists in the shipment.
If the item doesn't exist, create the required rows in inventory_transactions and shipment_inventory_transactions.
I think some combination of JOIN and ON DUPLICATE KEY UPDATE can do this, but I can't wrap my head around it.
To simplify the situation, I'm considering removing the shipment_inventory_transactions table, because the relationship between shipments and inventory_transactions is now going to be one-to-one. The only gotcha is that each inventory_transaction can be associated with either a shipment or a receipt, but not both. Storing both in the same column sounds skanky, but I don't love having an extra column in every row when only one or the other will be used.
Wooh... Brain dump complete. If this made sense and you can provide a sensible answer that has eluded me, I'd be most appreciative.
Ultimately, I found simplifying the database to eliminate the many-to-many relationship let me accomplish what I wanted with a simple Insert. Better to simplify at this point than add great complexity that'll become problematic as the application grows.
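A sketch of what the simplified one-to-one schema might allow, using SQLite's upsert (INSERT ... ON CONFLICT ... DO UPDATE, available since SQLite 3.24) via Python's sqlite3 as a stand-in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE. The item_id and receipt_id columns are hypothetical. A CHECK constraint addresses the "shipment or receipt, but not both" concern; MySQL enforces CHECK constraints only from 8.0.16 on.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE inventory_transactions (
            inventory_transaction_id INTEGER PRIMARY KEY,
            item_id         INTEGER NOT NULL,
            quantity_offset INTEGER NOT NULL,
            shipment_id     INTEGER,
            receipt_id      INTEGER,
            -- exactly one of shipment_id / receipt_id must be set
            CHECK ((shipment_id IS NULL) <> (receipt_id IS NULL)),
            -- one row per item per shipment; NULLs (receipt rows) never collide
            UNIQUE (shipment_id, item_id)
        )""")
    return conn


def add_to_shipment(conn, shipment_id, item_id, qty):
    """Insert the item into the shipment, or add qty to its existing row."""
    with conn:
        conn.execute("""
            INSERT INTO inventory_transactions (shipment_id, item_id, quantity_offset)
            VALUES (?, ?, ?)
            ON CONFLICT (shipment_id, item_id)
            DO UPDATE SET quantity_offset = quantity_offset + excluded.quantity_offset
        """, (shipment_id, item_id, qty))


def shipment_rows(conn, shipment_id):
    return conn.execute(
        "SELECT item_id, quantity_offset FROM inventory_transactions"
        " WHERE shipment_id = ? ORDER BY item_id",
        (shipment_id,),
    ).fetchall()
```

In MySQL the update clause would read roughly `ON DUPLICATE KEY UPDATE quantity_offset = quantity_offset + VALUES(quantity_offset)`, with the unique key on (shipment_id, item_id) doing the same job.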

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP MySQL. I want to make my forum as efficient as I can.
I have made these two tables:
tbl_threads
tbl_comments
Now, the problem is that there is a Like and a Dislike button under each comment. I have to store the user_name of the user who clicked the Like or Dislike button, together with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store comma-separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the likes and dislikes, and to make my database design comply with 1NF.
But the problem is: if I make a third table, tbl_user_opinion, with two fields like this:
1. comment_id
2. type (like or dislike)
will I have to run as many SQL queries as there are comments on my page to get the like and dislike data for each comment? Won't that be inefficient? I think there is some confusion on my part here. Can someone clarify this?
You have a relational schema like this:
There are two ways to solve this. The first, "clean" one is to build your "like" table and do COUNT(*)s on the appropriate column.
The second one would be to store a counter in each comment, indicating how many up-votes and down-votes there have been.
If you want to check whether a specific user has voted on a comment, you only have to check one entry, which you can easily handle as a separate query and merge the two results outside of your database (for this, use a query returning comment_id and the kind of vote the user cast, for a specific thread).
Your approach with a comma-separated list does not perform well, because you cannot query it without extra logic or a huge amount of string parsing. If you have a database, use it!
("One Information - One Dataset"!)
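A sketch of the "clean" option, showing that one grouped query returns the like and dislike counts for every comment in a thread, so there is no need for one query per comment. It uses Python's sqlite3 in place of MySQL. The user_name column in tbl_user_opinion, and the thread_id and body columns in tbl_comments, are hypothetical, since the question's full table definitions are not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tbl_comments (
    comment_id INTEGER PRIMARY KEY,
    thread_id  INTEGER NOT NULL,
    body       TEXT
);
CREATE TABLE tbl_user_opinion (
    comment_id INTEGER NOT NULL REFERENCES tbl_comments (comment_id),
    user_name  TEXT    NOT NULL,
    type       TEXT    NOT NULL CHECK (type IN ('like', 'dislike')),
    PRIMARY KEY (comment_id, user_name)   -- one vote per user per comment
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def vote_counts(conn, thread_id):
    """(comment_id, likes, dislikes) for every comment in a thread, in one query."""
    return conn.execute("""
        SELECT c.comment_id,
               COUNT(CASE WHEN o.type = 'like'    THEN 1 END) AS likes,
               COUNT(CASE WHEN o.type = 'dislike' THEN 1 END) AS dislikes
        FROM tbl_comments AS c
        LEFT JOIN tbl_user_opinion AS o ON o.comment_id = c.comment_id
        WHERE c.thread_id = ?
        GROUP BY c.comment_id
        ORDER BY c.comment_id
    """, (thread_id,)).fetchall()


def my_votes(conn, thread_id, user_name):
    """{comment_id: type} for the comments in the thread this user voted on."""
    return dict(conn.execute("""
        SELECT o.comment_id, o.type
        FROM tbl_user_opinion AS o
        JOIN tbl_comments AS c ON c.comment_id = o.comment_id
        WHERE c.thread_id = ? AND o.user_name = ?
    """, (thread_id, user_name)).fetchall())
```

The LEFT JOIN keeps comments with no votes (they come back with 0 and 0), and my_votes is the separate per-user query mentioned above, whose result can be merged with vote_counts in application code.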
The comma-separated list violates the principle of atomicity, and therefore 1NF. You'll have a hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
    COMMENT.COMMENT_ID,
    <other COMMENT fields>,
    COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) AS SCORE
FROM COMMENT
    LEFT JOIN UP_VOTE
        ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
    LEFT JOIN DOWN_VOTE
        ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
    COMMENT.COMMENT_ID = <whatever>
GROUP BY
    COMMENT.COMMENT_ID,
    <other COMMENT fields>;
[SQL Fiddle]
Please measure on realistic amounts of data whether that works fast enough for you. If not, denormalize the model: cache the total score in the COMMENT table and keep it current through triggers every time a row is inserted into or deleted from the *_VOTE tables.
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.[1]
[1] This is one of the reasons why I didn't go with a single VOTE table containing an additional field that can be either 1 (for an up-vote) or -1 (for a down-vote): it's less efficient to cover with secondary indexes.
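A sketch of the denormalized variant, using Python's sqlite3 in place of MySQL, where a trigger would also need FOR EACH ROW. The table names follow this answer. The SCORE column on COMMENT and the trigger and index names are hypothetical. WITHOUT ROWID makes SQLite store the vote tables clustered by their primary key, roughly as InnoDB does.

```python
import sqlite3

SCHEMA = """
CREATE TABLE COMMENT (
    COMMENT_ID INTEGER PRIMARY KEY,
    SCORE      INTEGER NOT NULL DEFAULT 0  -- cached: up-votes minus down-votes
);
CREATE TABLE UP_VOTE (
    COMMENT_ID INTEGER NOT NULL REFERENCES COMMENT (COMMENT_ID),
    USER_ID    INTEGER NOT NULL,
    PRIMARY KEY (COMMENT_ID, USER_ID)
) WITHOUT ROWID;
CREATE TABLE DOWN_VOTE (
    COMMENT_ID INTEGER NOT NULL REFERENCES COMMENT (COMMENT_ID),
    USER_ID    INTEGER NOT NULL,
    PRIMARY KEY (COMMENT_ID, USER_ID)
) WITHOUT ROWID;
-- Reverse indexes for "which comments did this user vote on?"
CREATE INDEX UP_VOTE_BY_USER ON UP_VOTE (USER_ID, COMMENT_ID);
CREATE INDEX DOWN_VOTE_BY_USER ON DOWN_VOTE (USER_ID, COMMENT_ID);
-- Triggers keep the cached SCORE in step with the vote tables.
CREATE TRIGGER UP_VOTE_INS AFTER INSERT ON UP_VOTE BEGIN
    UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = NEW.COMMENT_ID;
END;
CREATE TRIGGER UP_VOTE_DEL AFTER DELETE ON UP_VOTE BEGIN
    UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = OLD.COMMENT_ID;
END;
CREATE TRIGGER DOWN_VOTE_INS AFTER INSERT ON DOWN_VOTE BEGIN
    UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = NEW.COMMENT_ID;
END;
CREATE TRIGGER DOWN_VOTE_DEL AFTER DELETE ON DOWN_VOTE BEGIN
    UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = OLD.COMMENT_ID;
END;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def cached_score(conn, comment_id):
    return conn.execute(
        "SELECT SCORE FROM COMMENT WHERE COMMENT_ID = ?", (comment_id,)
    ).fetchone()[0]


def computed_score(conn, comment_id):
    # The normalized query from above; DISTINCT undoes the row fan-out
    # caused by joining two vote tables at once.
    return conn.execute("""
        SELECT COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID)
        FROM COMMENT
        LEFT JOIN UP_VOTE ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
        LEFT JOIN DOWN_VOTE ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
        WHERE COMMENT.COMMENT_ID = ?
        GROUP BY COMMENT.COMMENT_ID
    """, (comment_id,)).fetchone()[0]
```

Comparing cached_score with computed_score on real data is one way to check that the triggers stay correct.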

How to merge 2 Records in innoDB MySQL databases

This is related to "How to change ID in mysql".
I have also checked other questions, and none are quite like this one.
As we know, InnoDB has a feature: if I change the ID of a record, all other tables that point to the previous ID are automatically updated (through foreign keys declared with ON UPDATE CASCADE).
What about if I want to MERGE 2 records?
Say I have two businesses.
They have two IDs.
I want to merge them into one. I also want to use InnoDB's awesome feature to change things automatically.
I can't just change one of the IDs to the other ID. Or can I?
What would you do to merge two similar records in a database?
Of course what actually goes into the combined record will be business decisions.
Basically, I do not want to pinpoint all the other tables one by one. I think the ON UPDATE rule is there for a reason. Is there a way to just change slaveID to masterID, keep ALL data in the master the same, and then have the database itself (rather than my program) repoint all tables that point to slaveID to point to masterID? Of course, the records for slaveID will be gone anyway.
For example, with a storage engine that does not enforce foreign keys (such as MyISAM), you can change an ID, but then you have to go through every table that points to the old ID and make it point to the new ID instead. With InnoDB, that repointing is done by the database engine itself, which is kind of cool. Why would anyone use a non-InnoDB engine anyway?
I want to do the same but for merging.
Trying to set a record's primary key to an already existing value will simply result in a key violation error. While this seems obvious at first glance, it has a side effect: you cannot use ON UPDATE CASCADE to merge two records; it will simply not work.
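A small check of that claim, using SQLite with foreign keys switched on (through Python's sqlite3) in place of InnoDB. The business and invoice tables are hypothetical. Changing an ID to a free value cascades to the child table, but changing it to an existing ID, which is what a merge would need, fails with a key violation and changes nothing.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE invoice (
            id          INTEGER PRIMARY KEY,
            business_id INTEGER REFERENCES business (id) ON UPDATE CASCADE
        );
        INSERT INTO business VALUES (1, 'Acme'), (2, 'Acme Ltd');
        INSERT INTO invoice VALUES (100, 1), (101, 2);
    """)
    return conn


def change_id(conn: sqlite3.Connection, old_id: int, new_id: int) -> None:
    with conn:  # rolled back if the UPDATE raises
        conn.execute("UPDATE business SET id = ? WHERE id = ?", (new_id, old_id))


def invoice_owner(conn: sqlite3.Connection, invoice_id: int) -> int:
    return conn.execute(
        "SELECT business_id FROM invoice WHERE id = ?", (invoice_id,)
    ).fetchone()[0]
```

After `change_id(conn, 1, 3)`, invoice 100 points at business 3; a following `change_id(conn, 3, 2)` raises sqlite3.IntegrityError and invoice 100 still points at 3.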
If you have the possibility to change the schema, you can use the old but good redirect trick:
(assuming your IDs are positive, e.g. unsigned ints)
Add a field: redirect INT NOT NULL DEFAULT 0
Create a view:
CREATE VIEW tablename_view AS
SELECT
    -- repeat the next line for every field apart from redirect,
    -- separating the repeated lines with commas
    IF(s.redirect > 0, m.<fieldname>, s.<fieldname>) AS <fieldname>
FROM tablename AS s
LEFT JOIN tablename AS m ON s.redirect = m.id;
When you merge a record (slave) into another record (master), run UPDATE tablename SET redirect=<id_of_master> WHERE id=<id_of_slave>
Adapt your SELECT queries to select from tablename_view instead of tablename.
Create and use a maintenance script to weed out merged slave records.
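A concrete instance of the redirect trick, sketched with Python's sqlite3 and a hypothetical business(id, name, city) table. CASE WHEN replaces MySQL's IF() so the view runs on SQLite. Two choices here go beyond the answer and are assumptions. First, the view keeps each row's own id, so lookups by a slave's old ID still work, and exposes the master's ID as effective_id. Second, merge() also repoints earlier slaves, so redirects never chain more than one level deep.

```python
import sqlite3

SCHEMA = """
CREATE TABLE business (
    id       INTEGER PRIMARY KEY,
    name     TEXT,
    city     TEXT,
    redirect INTEGER NOT NULL DEFAULT 0    -- 0 = not merged into anything
);
CREATE VIEW business_view AS
SELECT
    s.id AS id,
    CASE WHEN s.redirect > 0 THEN m.id   ELSE s.id   END AS effective_id,
    CASE WHEN s.redirect > 0 THEN m.name ELSE s.name END AS name,
    CASE WHEN s.redirect > 0 THEN m.city ELSE s.city END AS city
FROM business AS s
LEFT JOIN business AS m ON s.redirect = m.id;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def merge(conn: sqlite3.Connection, master_id: int, slave_id: int) -> None:
    with conn:
        conn.execute("UPDATE business SET redirect = ? WHERE id = ?", (master_id, slave_id))
        # Rows that already redirected to the slave now go straight to the master,
        # because the view only follows one level of redirect.
        conn.execute("UPDATE business SET redirect = ? WHERE redirect = ?", (master_id, slave_id))


def lookup(conn: sqlite3.Connection, business_id: int):
    return conn.execute(
        "SELECT name, city, effective_id FROM business_view WHERE id = ?", (business_id,)
    ).fetchone()
```

Child tables keep pointing at the slave's ID; reading through the view resolves that ID to the master's data until the maintenance script repoints the children and removes the slave.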