Current Structure
As you can see, a Path can be referenced by multiple tables and by multiple records within those tables.
Points can also be referenced by two different tables.
My Question
I would like to delete a PathType. However, this gets complicated: a Path may be owned by more than one PathType, so deleting the Path without checking how many references there are to it is out of the question.
Secondly, if this Path's only reference is the PathType I'm trying to delete, then I will want to delete this Path and any related records in PathPoints.
Lastly, if there are no other references to a Point from any other records, then it will also need to be deleted, but only if it's not used by any other object.
Attempts So Far
DELETE PathType1.*, Path.*, PathPoints.*, Point.*
FROM PathType1, Path, PathPoints, Point
WHERE PathType1.ID = 1
  AND PathType1.PATH = Path.ID
  AND (SELECT COUNT(*) FROM PathType1 WHERE PathType1.PATH = Path.ID) < 1
  AND (SELECT COUNT(*) FROM PathType2 WHERE PathType2.PATH = Path.ID) = 0
Obviously the above statement goes on, but I don't think this is the right way to go about it, because if one condition fails then nothing is deleted...
I think it may not be possible to do what I'm attempting in one statement, and I may have to iterate through each section and handle each based on the outcome. Not very efficient, but I don't see an alternative at this time.
I hope this is clear. If you have any more questions or need any clarification then please do not hesitate to ask
First, there is no way I would do this in a single query like that, even if the database allowed it, which most will not. It is an unmaintainable mess.
The preferred method is to create a transaction, then delete from one table at a time, starting with the bottommost child table, and then commit the transaction. And of course have error handling so that the entire transaction is rolled back if any delete fails, to maintain data integrity. If I intended to do this repeatedly, I would do it in a stored procedure.
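A hedged sketch of what that transaction might look like for the schema in the question; the column names (ID, PATH, POINT) are taken from the attempted query, and the orphan rules are my reading of the requirements:

START TRANSACTION;

-- 1. remove the PathType row being deleted
DELETE FROM PathType1 WHERE ID = 1;

-- 2. remove join rows belonging to paths that no PathType references any more
--    (this covers the path freed in step 1)
DELETE pp FROM PathPoints pp
JOIN Path p ON p.ID = pp.PATH
WHERE NOT EXISTS (SELECT 1 FROM PathType1 t1 WHERE t1.PATH = p.ID)
  AND NOT EXISTS (SELECT 1 FROM PathType2 t2 WHERE t2.PATH = p.ID);

-- 3. remove points that no PathPoints row references any more; any other
--    table that references Point would need a similar NOT EXISTS check here
DELETE pt FROM Point pt
WHERE NOT EXISTS (SELECT 1 FROM PathPoints pp WHERE pp.POINT = pt.ID);

-- 4. remove the now-orphaned paths themselves
DELETE p FROM Path p
WHERE NOT EXISTS (SELECT 1 FROM PathType1 t1 WHERE t1.PATH = p.ID)
  AND NOT EXISTS (SELECT 1 FROM PathType2 t2 WHERE t2.PATH = p.ID);

COMMIT;

In practice this would go in a stored procedure with an error handler that issues ROLLBACK, so a failure in any step leaves the data untouched.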
Related
Example tables (not the actual database): Person and Times.
In this example, I would have the SecurityCode (unique) and the Time. My current solution involves attempting to add a new Person using the security code, then querying for its ID, then adding a row to the Times table. This is three separate statements and could likely be a lot faster. Any advice on how to optimise this?
Thanks.
Edit: I previously forgot to mention that this is normally done in a batch of 30-40 records.
I am also considering using SecurityCode as the foreign key in Times.
I think there are many ways of achieving this; the easiest:
Try using an IF: you only need it for the first step of your statement; the last two are independent of the result of this evaluation.
Plus, save your security code in a variable; then you will save one table scan (you already have the value in hand).
**please note it's just pseudo-code**
IF NOT EXISTS (SELECT 1 FROM Person WHERE SecurityCode = @securityCode) THEN
    -- Step 1: insert the new Person
END IF;
-- Step 2: look up the Person ID
-- Step 3: insert the row into Times
Can you try it?
The fastest way seemed to be to batch INSERT IGNORE all security codes, then batch-insert all Times rows with a subquery to select the correct ID from Person.
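For reference, a minimal sketch of that approach; the table and column names (Person(ID, SecurityCode), Times(PersonID, Time)) and the literal values are illustrative, and INSERT IGNORE relies on the unique index on SecurityCode:

INSERT IGNORE INTO Person (SecurityCode)
VALUES ('code1'), ('code2'), ('code3');

INSERT INTO Times (PersonID, `Time`)
SELECT p.ID, v.t
FROM (SELECT 'code1' AS sc, '2013-05-01 09:00:00' AS t
      UNION ALL SELECT 'code2', '2013-05-01 09:05:00'
      UNION ALL SELECT 'code3', '2013-05-01 09:10:00') AS v
JOIN Person p ON p.SecurityCode = v.sc;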
I am looking for a (not too convoluted) solution to a MySQL problem. Say I have the following table (with a composite unique index on group and item):
group     item
-------   ------
nogroup   item_a
group_a   item_a
Then, eventually, item_a no longer belongs to group_a. So I want to do something like:
update table set group = "nogroup" where item = "item_a" on duplicate key delete.
(obviously this is not valid syntax, but I am looking for a way around this)
I still want to keep a copy of the record with nogroup because, if item_a comes back later on, I can change its group back to group_a or to any other group, depending on the case. Whenever item_a is added, there is an insert that copies all the data from the nogroup record and sets a proper group label. At that point there are two records for item_a: one with group_a and one with nogroup. The reason it is done this way is to reuse previous data as much as possible, as a new entry (with no previous record) is much more involved and takes significantly more time and processing.
Say an item belongs to group_a and group_b, but suddenly it does not belong to any group: the first update, setting its group to "nogroup", will work, but the second update will hit a duplicate key error.
The option of not updating the group column at all and using "insert on duplicate key update" does not work, because there won't be duplicates when the groups are different, and this leads to cases where an item no longer belongs to a group and yet its record is still present in the database. The option of first checking whether a "nogroup" record exists and then updating it to a specific group does not work either, because if item_a belongs to more than one group, this would update all the other records to the same group.
Basically, an item can either 1) belong to any number of groups, including "nogroup", or 2) belong solely to "nogroup", and there should always be at least the "nogroup" copy somewhere in the database.
It looks like I won't be able to do this in just one query but if someone has a clean way of dealing with this, that would be much appreciated. Maybe some of my assumptions above are wrong and there is an easy way to do it.
Your whole process of maintaining this items-to-groups mapping sounds too complicated. Why not just have a table that has a mapping? Then, when an item is removed from a group, delete it from the table. When it is added, add it to the table. Don't bother with "nogroup".
If you want an archive table, then create one. Have an insert/update/delete trigger (whichever is or are appropriate) that will populate an archive with information that you want to keep over time.
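A hedged sketch of the trigger idea; the mapping table item_group_map(item, group_name) and the archive table are hypothetical (group_name sidesteps the reserved word GROUP):

CREATE TRIGGER item_group_map_archive_del
AFTER DELETE ON item_group_map
FOR EACH ROW
INSERT INTO item_group_map_archive (item, group_name, removed_at)
VALUES (OLD.item, OLD.group_name, NOW());

A matching AFTER INSERT trigger would record additions the same way.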
I do not understand why re-using an existing row would be beneficial in terms of performance. There is no obvious database reason why this would be the case.
I am also confused as to why you need a "nogroup" tag at all. If you need a list of items, maintain that list in its own table. And call the table Items -- a much clearer name than "nogroup".
I agree with Gordon's approach. However, if you have to do it with a single table, it cannot be done in one SQL query. You will have to use two queries: one for the update and one for the delete.
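A hedged sketch of those two statements, using a hypothetical table name mytable; the derived table works around MySQL's restriction on referencing the delete target in a subquery, and `group` is backticked because it is a reserved word:

-- 1. if a 'nogroup' copy already exists, just drop the group row
DELETE FROM mytable
WHERE item = 'item_a' AND `group` = 'group_a'
  AND EXISTS (SELECT 1 FROM (SELECT 1 FROM mytable
                             WHERE item = 'item_a'
                               AND `group` = 'nogroup') AS x);

-- 2. otherwise demote the row to 'nogroup' (affects no rows if step 1 deleted it)
UPDATE mytable SET `group` = 'nogroup'
WHERE item = 'item_a' AND `group` = 'group_a';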
We're trying to figure out what the relative costs are between a couple of approaches.
We have a web page where people choose to add/keep/remove rows from a table, by marking them with checkboxes. (People can add new entries to the page as well as see existing ones.)
When posted to the web server the page loops over the entries and calls a stored procedure, passing in the state of the checkbox as one of the parameters.
The stored procedure currently calls a delete statement for each entry, followed by an insert if the checkbox is marked. This has the virtue of simplicity.
We're thinking instead of putting some if exists logic in there, to test whether the row is already in the table.
If so and the checkbox is marked, we'd leave it alone. Otherwise we'd insert it. Conversely, if the row isn't in the table and the checkbox is unmarked, we'd skip the delete and insert statements. This minimizes the number of deletes and such but at a cost of more logic.
In terms of load on the database, is one approach generally preferred to the other?
Is there a cost to calling delete statements that don't, in fact, affect any rows, as would be the case when adding new records? Is this worse than an if exists check?
The table is indexed on all relevant columns. I assume for posting 600,000 entries there would be a big advantage to checking beforehand, but the page in question will have 100 entries at most.
The biggest problem you're going to have with performance here is that you are calling a stored procedure for every entry - it really doesn't matter if inside that stored procedure you use DELETE/INSERT or check first, you're still going to have the overhead of 600K procedure calls, some potentially large portion of 600K logged transactions, etc.
I strongly recommend you look at table-valued parameters. Your C# or whatever can pass a set of 600K entries to a single stored procedure, once, and then you can perform two set-based operations (pseudo-code):
UPDATE src SET val = t.val
FROM dbo.source AS src
INNER JOIN @tvp AS t ON t.[key] = src.[key];

INSERT dbo.source ([key], val)
SELECT t.[key], t.val FROM @tvp AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.source WHERE [key] = t.[key]);
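For completeness, the table-valued parameter requires a user-defined table type on the SQL Server side; a hypothetical declaration and procedure signature (the names and column types are illustrative):

CREATE TYPE dbo.EntryList AS TABLE ([key] INT PRIMARY KEY, val NVARCHAR(100));
GO
CREATE PROCEDURE dbo.SyncEntries @tvp dbo.EntryList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- the UPDATE / INSERT pair above goes here, reading from @tvp
END
GO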
I am a bit rusty with MySQL and trying to jump in again... sorry if this is too easy of a question.
I basically created a data model that has a table called "Master", with required fields of a name and an IDcode, and then a "Details" table with a foreign key of IDcode.
Now here's where it's getting tricky. I am entering:
INSERT INTO Details (Name, UpdateDate) VALUES (name, updateDate)
I get an error saying IDcode on Details doesn't have a default value, so I add one; then it complains that Field 'Master_IDcode' doesn't have a default value.
It all makes sense, but I'm wondering if there's an easy way to do what I am trying to do. I want to add data into Details, and if no IDcode exists, I want to add an entry to the Master table. The problem is that I have to first add the name to Master, wait for a unique ID to be generated (for IDcode), then figure that out and add it to my query when I insert the Details data. As you can imagine, the queries are going to get quite long, since I have many tables.
Is there an easier way, where every time I add something it searches by name to see whether the foreign key exists, and if not, adds it to all the tables it's linked to? Is there a standard way people do this? I can't imagine, with all the complex databases out there, that people have not figured out an easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
P.S. This may be a different question, but I have heard of Django for Python and that it helps create queries... would it help my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) into those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
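A hypothetical load, just to make that step concrete; the file path, delimiters, and header handling are assumptions:

LOAD DATA INFILE '/tmp/import_x.csv'
INTO TABLE staging_table_x
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;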
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging tables contains a country code that should be used as a master ID. You could add the missing master records by doing something along the lines of:
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows reference a valid master record.
If you need to get reference information along with the data (such as translating some code) you can do this with a simple join. Also, if you want to filter rows by some other table this is now also very easy.
insert
into real_table_x(
`key`
,colA
,colB
,colC
,computed_column_not_present_in_staging_table
,understandableCode
)
select x.`key`
,x.colA
,x.colB
,x.colC
,(x.colA + x.colB) / x.colC
,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL is that it doesn't support hash joins, a join mechanism very well suited to fully joining two tables. MySQL uses nested loops instead, which means that you need to index the join columns very carefully.
InnoDB tables with their clustering feature on the primary key can help to make this a bit more efficient.
One last point. When you have the staging data inside the database, it is easy to add some analysis of the data and put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through csv files in your editor.
I don't think there's a one-step way to do this.
What I do is issue a
INSERT IGNORE INTO master (..) VALUES (..)
to the master table, which will either create the row if it doesn't exist, or do nothing, and then issue a
SELECT id FROM master WHERE someUniqueAttribute = ..
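Concretely, assuming Master(IDcode INT AUTO_INCREMENT PRIMARY KEY, Name VARCHAR(100) UNIQUE), the pair might look like this; the names are illustrative, and the UNIQUE index on Name is what makes INSERT IGNORE silently skip existing rows:

INSERT IGNORE INTO Master (Name) VALUES ('some name');
SELECT IDcode FROM Master WHERE Name = 'some name';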
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt whether this would help performance.
Is there a way I can check whether a row could potentially be deleted? That it, for example, is not currently connected through RESTRICT foreign keys to anything else.
Reason: I am making an admin page listing all the users in the system. They can always be disabled, but they may also be deleted. However, they can only be deleted if they are not connected to anything critical. And I would rather not check that manually if it can be done easily in the database.
Note: I do not want to actually delete any user. I just want to display to the admin that a user could be deleted.
You could try deleting it as part of a transaction, and then roll back the transaction if it succeeds. BUT, I guess the immediate follow-up question is: why wouldn't you know in the first place whether you could delete the row or not?
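A minimal sketch of that probe, assuming InnoDB and a users table; the delete is rolled back either way, so nothing is actually removed:

START TRANSACTION;
DELETE FROM users WHERE id = 42;
-- a foreign key constraint error here means the row cannot be deleted
ROLLBACK;

Application code would catch the constraint error and report the user as undeletable.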
You could use a view to sum up the number of dependencies without having to worry about storing the data and keeping it current. When the number of dependencies is zero, make the delete option available in the UI...
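A hedged sketch of such a view; jobs is taken from the answer below, while orders is a hypothetical second child table:

CREATE VIEW user_dependency_counts AS
SELECT u.id,
       (SELECT COUNT(*) FROM jobs j WHERE j.user_id = u.id)
     + (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS dependencies
FROM users u;

A user is safe to delete when dependencies is zero.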
You can get all orphaned rows by left joining to the table they're connected to, e.g. this will give you all the user IDs that don't have any jobs:
SELECT u.id FROM users u LEFT JOIN jobs j on u.id=j.user_id WHERE j.user_id is null;
Try one of the answers here.
MySQL: How do I find all tables that have foreign keys that reference a particular table.column AND have values for those foreign keys?