I have the following two queries being run when the assignment of staff on a team changes:
delete from `team_staff` where `team_id`=5
insert into `team_staff` (`team_id`,`staff_id`) values (5,1),(5,2)
In this case, I'm changing the staff roster of Team 5 to be Staff 1 and Staff 2. Anyone else who was assigned is now unassigned.
It works, but I like reducing the number of queries being run - it's good for the query cache and such.
Usually I would do something like this:
insert ignore into `team_staff` (`team_id`,`staff_id`) values (5,1),(5,2)
However this won't remove any staff who aren't Staff 1 or Staff 2.
Is there any way to do this in a single query, or am I stuck with two?
There isn't an insert & delete MySQL function, but that doesn't mean you're out of luck. Consider these options:
1) Your current method deletes rows without regard to whether your insert will re-insert something you've just deleted, creating work that doesn't really need to be done. Also, deleting all those rows all of the time will require OPTIMIZE TABLE to be run periodically. You can narrow your delete command to skip the rows you are about to insert, but you will still need to optimize periodically.
2) Since your teams are numbered, if there are only a few team numbers ever used, you may want to look at using a SET on the staff table. Then you will only need to execute an update command.
3) If SET is too small, you may want to use one of the INT column types as a bit field, with one bit per team OR'd together. For example, team 1 is 2^1, team 2 is 2^2, and so on. Then when you update the INT column on the staff table for the team membership, you set it to the combined value of the teams they are part of (e.g. UPDATE staff SET teams = (1 << 1) | (1 << 2) | (1 << 3); note that in MySQL ^ is XOR rather than exponentiation, and & would AND the bits together and yield 0). This would allow you to search for matches using only numbers.
Those are the potential optimizations that come to mind. None of them directly answers your specific question, but they do address the intent.
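The bit-flag idea in option 3 can be sketched outside the database as follows. This is only an illustration of the arithmetic, assuming bit n stands for team n; the helper names are hypothetical. Flags are combined with bitwise OR, and bitwise AND is used to test a single flag.

```python
# Hypothetical helpers; bit n stands for team n, as in option 3.

def teams_to_mask(team_ids):
    """Combine team numbers into one integer with bitwise OR."""
    mask = 0
    for team_id in team_ids:
        mask |= 1 << team_id
    return mask


def is_member(mask, team_id):
    """Test a single team's bit with bitwise AND."""
    return (mask & (1 << team_id)) != 0


def mask_to_teams(mask):
    """Recover the sorted list of team numbers stored in the mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


mask = teams_to_mask([1, 2, 3])
print(mask)                 # 14  (2 + 4 + 8)
print(is_member(mask, 2))   # True
print(is_member(mask, 5))   # False
print(mask_to_teams(mask))  # [1, 2, 3]
```

The same OR and AND operators exist in MySQL, so a membership test would become a WHERE condition on the integer column.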
This question deals with how one should handle conditional insertions into tables.
Suppose we have customers and employees.
A customer can only be assigned 4 employees at a time.
We will come back to this in a moment.
On the database level, we have checks and triggers.
In MariaDB, CHECK constraints cannot have subqueries, so we cannot impose constraints
regarding degree of participation in this way. For instance, we cannot say something
like
CHECK ( customer_id IN (SUBQUERY that returns count of Employees with == 4 employees) ).
Triggers may be the solution.
If a manager attempts to assign a customer to an employee who already has 4 customers, we should not allow that record to be inserted into the table that links customers to their employees. In this case we want a trigger that acts before insertion. We only want the
insertion to occur if that employee is not in the subquery that lists
employees with 4 customers.
We want to stop the insertion, but by default the trigger will not do this
based on a condition that is merely stated in the trigger. From my understanding, the only way to do this is to raise a signal (see "Use a trigger to stop an insert or update").
This leads to my next two questions.
Is using a signal 'ideal'? Is it problematic? Is there a better way to insert into a table
based on a condition, instead of merely performing side effects prior to
an insertion or perhaps after an insertion?
It appears that the db would send a signal if the constraint was violated to begin with, so would this ever impact the application built on top of it?
Some flavors of constraint can be dealt with thus:
INSERT INTO tbl (a,b,c)
SELECT ...
WHERE ...  -- put the logic here (if possible)
That is, arrange to have the SELECT deliver more or fewer rows based on your business logic.
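As a rough sketch of that pattern applied to the 4-customer limit, the example below uses Python's sqlite3 module as a stand-in for MariaDB. The table name customer_employee, its columns, and the assign helper are assumptions made for illustration. The WHERE clause filters the single candidate row out when the employee is already at the limit, so nothing is inserted.

```python
import sqlite3

# SQLite stands in for MariaDB here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_employee (customer_id INTEGER, employee_id INTEGER)"
)
# Employee 1 already has 4 customers; employee 2 has none.
conn.executemany(
    "INSERT INTO customer_employee VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1), (13, 1)],
)

MAX_CUSTOMERS = 4


def assign(conn, customer_id, employee_id):
    """Insert the link only if the employee is below the limit.

    Returns True when a row was inserted and False when the WHERE
    clause filtered the candidate row out.
    """
    cur = conn.execute(
        """
        INSERT INTO customer_employee (customer_id, employee_id)
        SELECT ?, ?
        WHERE (SELECT COUNT(*) FROM customer_employee
               WHERE employee_id = ?) < ?
        """,
        (customer_id, employee_id, employee_id, MAX_CUSTOMERS),
    )
    return cur.rowcount == 1


print(assign(conn, 14, 1))  # False: employee 1 is already at the limit
print(assign(conn, 14, 2))  # True: employee 2 has room
```

Unlike a trigger with a signal, this raises no error; the caller has to inspect the affected-row count to learn that nothing was inserted. Under concurrent writers the count and the insert would also need to be protected by a transaction or lock, which this sketch does not attempt.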
I have to use this for a project at work, and am running into some trouble. I have a large database (58 million rows) that I have figured out how to query down to what I want, and I then write the resulting row into a separate table. Here is my code so far:
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
where pollutantID=45
and fuelTypeID=2
and sourceTypeID=32;
I have about 60 different pollutant IDs, and currently I am manually changing the pollutantID number on line 5 and executing the script to write the row into my 'emissionfactors' table. Each run takes 45 seconds and I have several other fuel types and source types to do, so this could take something like 8 hours of clicking every 45 seconds. I have some training in MATLAB and thought I could put a while loop around the above code, create an index, and have it loop through the pollutant IDs from 1 to 184, but I can't seem to get it to work.
Here are my goals:
- loop the pollutantID from 1 to 184.
-- not every integer in this range is present, so if the index is not found in the pollutantID column, simply add one to the index and check the next number.
-- if the index number is found in the pollutant ID column, execute my above code to write the data into my other table
You do not need a while loop. All you need is to change your WHERE clause to use BETWEEN, and to tell the query what to base the average on by adding a GROUP BY clause:
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
where pollutantID BETWEEN 1 AND 184
and fuelTypeID=2
and sourceTypeID=32
GROUP BY pollutantID , fuelTypeID, sourceTypeID;
If in fact you want the entire range of the pollutantID, fuelTypeID and sourceTypeID that exists you can just remove the where clause altogether.
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
GROUP BY pollutantID , fuelTypeID, sourceTypeID;
You also don't need to check whether a row exists before executing the query: if a pollutantID doesn't exist, the SELECT returns no rows for it and nothing is inserted.
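To see that missing IDs are simply skipped, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows are made up for illustration; only the table and column names come from the question. Pollutant 2 has no rows, so it produces no output row, and the fuelTypeID = 1 row is excluded by the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rateperdistance (
    pollutantID INT, fuelTypeID INT, sourceTypeID INT, ratePerDistance REAL);
CREATE TABLE emissionfactors (
    pollutantID INT, fuelTypeID INT, sourceTypeID INT, emissionFactor REAL);
-- Made-up sample data: pollutant 2 is absent on purpose.
INSERT INTO rateperdistance VALUES
    (1, 2, 32, 1.0), (1, 2, 32, 3.0),
    (3, 2, 32, 10.0),
    (3, 1, 32, 99.0);
""")

# One statement covers the whole pollutantID range; GROUP BY yields one
# averaged row per combination that actually exists in the source table.
conn.execute("""
INSERT INTO emissionfactors (pollutantID, fuelTypeID, sourceTypeID, emissionFactor)
SELECT pollutantID, fuelTypeID, sourceTypeID, AVG(ratePerDistance)
FROM rateperdistance
WHERE pollutantID BETWEEN 1 AND 184
  AND fuelTypeID = 2
  AND sourceTypeID = 32
GROUP BY pollutantID, fuelTypeID, sourceTypeID
""")

rows = conn.execute(
    "SELECT * FROM emissionfactors ORDER BY pollutantID"
).fetchall()
print(rows)  # [(1, 2, 32, 2.0), (3, 2, 32, 10.0)]
```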
As to the speed issue, you will need to look at adding some table indexes to your table to improve performance. In this case an index that has pollutantID, fuelTypeID and sourceTypeID would speed things up greatly.
My advice: ask for help at work. It is better to admit early that you do not know how to do something and get proper help. You also mention that you want several different fuel types, but the details of that are missing from your question.
I have a food_table and a person_table. Then I have a third table fav_food_table that stores the relation between food and person using food_id and person_id.
When the person goes to account info and updates his favourite food, the input data is passed to the PHP (HTTP) script as an array of selected food_id. A person can have multiple fav_food, the relation is one-to-many.
The naïve way to update fav_food_table is to delete from fav_food_table all that belongs to person_id then re-insert all the rows again. Thus, using 2 statements.
Is there a single statement that can do the same thing?
PSEUDO CODE:
CREATE TABLE food_table (food_id, food_name);
CREATE TABLE person_table (person_id, person_name);
CREATE TABLE fav_food (person_id, food_id);
You have an m:n relationship between person_table and food_table. This means you have multiple records in your relationship table related to the same person. When a person updates their favourite foods, any combination of four independent cases can occur:
1) Food A was a favourite before, but is not anymore (DELETE FROM fav_food_table)
2) Food B was a favourite before and still is! (do nothing)
3) Food C was not a favourite before, but now is (INSERT INTO fav_food_table)
4) Food D was not a favourite before, and still is not! (do nothing)
To correctly keep your database up to date, you have to handle all four cases. Cases 2 and 4 are covered easily. Just don't do anything :)
That means you have to do at least two steps to keep your database up to date: 1 and 3.
Your goal seems to be to reduce the number of SQL statements that have to be executed.
Deleting all favourites for one person from the table can always be done with a single statement:
DELETE
FROM fav_food_table
WHERE person_id = ?;
To delete selected favourites for one person, only an AND food_id IN (?,?,?) has to be added to the WHERE clause.
Inserting into the table can also be done with a single statement:
INSERT INTO fav_food_table (person_id,food_id)
VALUES (?,?),
(?,?),
(?,?),
.....;
Summary as of right now:
No matter whether we delete all old records for this person and then insert all new records, or whether we only delete selected records and insert new ones: we can do it with two statements!
In the second case (the "smart" case), however, we need to know not only the new state of the relation, but also the old state, to compute the difference between the two. This requires either one more SELECT statement or some smart "client" (PHP) logic.
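Computing that difference is a pair of set subtractions. The sketch below shows the idea in Python rather than PHP; the function name diff_favourites is hypothetical, and the old IDs are assumed to come from the extra SELECT.

```python
def diff_favourites(old_ids, new_ids):
    """Return (to_delete, to_insert) for moving from old_ids to new_ids.

    IDs present in both sets need no statement at all.
    """
    old, new = set(old_ids), set(new_ids)
    to_delete = sorted(old - new)  # was a favourite, no longer is
    to_insert = sorted(new - old)  # was not a favourite, now is
    return to_delete, to_insert


# Old state from the database, new state from the submitted form.
to_delete, to_insert = diff_favourites([1, 2, 3], [2, 3, 4, 5])
print(to_delete)  # [1]
print(to_insert)  # [4, 5]
```

The to_delete list would feed the AND food_id IN (?,?,?) variant of the DELETE, and to_insert the multi-row INSERT.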
Your two-step process doesn't seem naive to me, but easy and effective. There is no way to reduce this process to fewer than two statements.
If you want to reduce the number of times you have to send commands from PHP to the server, you can either look into mysqli_multi_query() or into creating a stored procedure that holds both your DELETE and INSERT statements. But bottom line, this will be the same thing as executing the two queries on their own.
You can also look into MySQL Transactions to implement a safer process and be able to rollback your DELETE command should an error occur later on.
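A minimal sketch of the transactional version follows, using Python's sqlite3 module as a stand-in for MySQL with PHP. The helper name replace_favourites and the composite primary key are assumptions for illustration. If the INSERT fails, the preceding DELETE is rolled back and the old favourites survive.

```python
import sqlite3


def replace_favourites(conn, person_id, food_ids):
    """Replace all favourites of one person atomically."""
    # The connection context manager commits on success and rolls back
    # if an exception is raised inside the block.
    with conn:
        conn.execute(
            "DELETE FROM fav_food_table WHERE person_id = ?", (person_id,)
        )
        conn.executemany(
            "INSERT INTO fav_food_table (person_id, food_id) VALUES (?, ?)",
            [(person_id, food_id) for food_id in food_ids],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fav_food_table ("
    " person_id INTEGER NOT NULL,"
    " food_id INTEGER NOT NULL,"
    " PRIMARY KEY (person_id, food_id))"
)

replace_favourites(conn, 1, [10, 11])
replace_favourites(conn, 1, [11, 12])

try:
    # The duplicate food_id violates the primary key, so the whole
    # replacement, including the DELETE, is rolled back.
    replace_favourites(conn, 1, [20, 20])
except sqlite3.IntegrityError:
    pass

rows = conn.execute(
    "SELECT person_id, food_id FROM fav_food_table ORDER BY food_id"
).fetchall()
print(rows)  # [(1, 11), (1, 12)]
```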
May I suggest something different? Instead of deleting, update.
Deleting data in general is something that should be thought through. Deleted data cannot be retrieved. So check this out.
We are adjusting your code a bit.
CREATE TABLE food_table (food_id, food_name);
CREATE TABLE person_table (person_id, person_name);
CREATE TABLE fav_food (person_id, food_id, fav_food_active);
/* Note the last column I added. It should be a BIT type and can only hold the values 0 and 1. */
Now you can technically update this every time, with no need to delete rows. The statements are:
/* deletion */
UPDATE fav_food
SET fav_food_active = 0
WHERE food_id = (your food_id)
AND person_id = (your person_id);
/* activating it again */
UPDATE fav_food
SET fav_food_active = 1
WHERE food_id = (your food_id)
AND person_id = (your person_id);
So now you switch between those two to deactivate and reactivate a favourite, without the consequences of deleting hard data. Overall, you can then query it like this in your code:
SELECT *
FROM fav_food
WHERE (your search conditions here)
AND fav_food_active = 1
Remember that when you insert a row into the database, fav_food_active should always start as 1. You can set that as the column's default value in phpMyAdmin, or hard-code it in your INSERT statement.
I am not sure what your backend code is (okay, taking that back: it's PHP, I read your post). Judging by your question you are passing everything through an array, so try to work in some checks: for each fav_food entry, first check in the db whether the row exists. If it does, update it; if not, insert it. Let that run in a loop.
So something like:
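foreach ($food_id as $value) {
    // check here whether a fav_food row for this person and $value exists in your db
    // and put the resulting row count in $count
    if ($count == 1) {
        // row exists: run your UPDATE query on fav_food_active here
    } else {
        // row does not exist: run your INSERT query here
    }
}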
Hope this helps. Happy coding!
I tried to design a data structure for easy and fast querying (delete, insert and update speed does not really matter for me).
The problem: transitive relations. One entry could have relations through other entries, and I don't want to save those relations separately for every possibility.
That is, I know that Entry-A is related to Entry-B and also know that Entry-B is related to Entry-C. Even though I don't know explicitly that Entry-A is related to Entry-C, I want to be able to query it.
What I think the solution is:
Eliminating the transitive part when inserting, deleting or updating.
Entry:
id
representative_id
I would store them as sets, like group of entries (not mysql set type, the Math set, sorry if my English is wrong). Every set would have a representative entry, all of the set elements would be related to the representative element.
A new insert would insert the Entry and set the representative as itself.
If the newly inserted entry should be connected to another, I simply set the representative id of the newly inserted entry to the referred entry's rep.id.
Attach B to A
It doesn't matter if I need to connect it to something that is not a representative entry. It would be the same, because every entry in the set has the same rep.id.
Attach C to B
Detach B-C: The detached item would have become a representative entry, meaning it would relate to itself.
Detach B-C and attach C to X
Deletion:
If I delete a non-representative entry, it is self-explanatory. But deleting a rep.entry is a bit harder. I need to choose a new rep.entry for the set and set every set member's rep.id to the new rep.entry's id.
So, delete A in this:
Would result in this:
What do you think about this? Is it a correct approach? Am I missing something? What should I improve?
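To make the operations above concrete, here is a small in-memory sketch in Python. It is only one possible reading of the scheme: the class and method names are hypothetical, attach is assumed to move the whole set of the attached entry, and the new representative after a deletion is assumed to be the smallest remaining id.

```python
class EntrySets:
    """Each entry stores the id of its set's representative."""

    def __init__(self):
        self.rep = {}  # entry id -> representative id

    def insert(self, entry_id):
        # A new entry starts as its own representative.
        self.rep[entry_id] = entry_id

    def attach(self, entry_id, target_id):
        # Assumption: everything in entry_id's set joins target_id's set.
        old_rep, new_rep = self.rep[entry_id], self.rep[target_id]
        for member, rep in self.rep.items():
            if rep == old_rep:
                self.rep[member] = new_rep

    def _elect_new_rep(self, old_rep):
        # Assumption: the smallest remaining member becomes the new rep.
        others = sorted(
            m for m, r in self.rep.items() if r == old_rep and m != old_rep
        )
        for member in others:
            self.rep[member] = others[0]

    def detach(self, entry_id):
        # The detached entry becomes its own representative again.
        if self.rep[entry_id] == entry_id:
            self._elect_new_rep(entry_id)
        self.rep[entry_id] = entry_id

    def delete(self, entry_id):
        if self.rep[entry_id] == entry_id:
            self._elect_new_rep(entry_id)
        del self.rep[entry_id]

    def related(self, entry_id):
        rep = self.rep[entry_id]
        return sorted(m for m, r in self.rep.items() if r == rep)


sets = EntrySets()
for name in "ABC":
    sets.insert(name)
sets.attach("B", "A")
sets.attach("C", "B")
print(sets.related("C"))  # ['A', 'B', 'C']
sets.delete("A")
print(sets.related("C"))  # ['B', 'C']
sets.detach("C")
print(sets.related("B"))  # ['B']
```

In SQL, each loop over members would correspond to one UPDATE ... WHERE rep_id = ... statement.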
Edit:
Querying:
So, if I want to query every entry that is related to a certain entry whose id I know:
SELECT *
FROM entries a
LEFT JOIN entries b ON (a.rep_id = b.rep_id)
WHERE a.id = :id;
SELECT * FROM AlkReferencia
WHERE rep_id=(SELECT rep_id FROM AlkReferencia
WHERE id=:id);
About the application that requires this:
Basically, I am storing vehicle part numbers (references). One manufacturer can make multiple parts that can replace one another, and another manufacturer can make parts that replace other manufacturers' parts.
Reference: One manufacturer's OEM number to a certain product.
Cross-reference: A manufacturer can make products whose purpose is to replace another product from another manufacturer.
I must connect these references so that when a customer searches for a number (it doesn't matter what kind of number he has) I can list an exact result and the alternative products.
To use the example above (last picture): B, D and E are different products we may have in store. Each one has a manufacturer and a string name/reference (I called it a number before, but it can be almost any character string). If I search for B's reference number, I should return B as an exact result and D and E as alternatives.
So far so good. BUT I need to upload these reference numbers. I can't just migrate them from an ALL-IN-ONE database. Most of the time, when I upload references I got from a manufacturer (mostly entered manually, but I can use catalogs too), I only get a list where the manufacturer tells which other reference numbers point to his numbers.
Example:
For the filter manufacturer Asas, the "AS 1" filter has these cross-references (that is, it replaces these):
GOLDEN SUPER --> 1
ALFA ROMEO --> 101000603000
ALFA ROMEO --> 105000603007
ALFA ROMEO --> 1050006040
RENAULT TRUCKS (RVI) --> 122577600
RENAULT TRUCKS (RVI) --> 1225961
ALFA ROMEO --> 131559401
FRAD --> 19.36.03/10
LANDINI --> 1896000
MASSEY FERGUSON --> 1851815M1
...
It would take ages to write down all of the AS 1 references, as there are many (~1500?). And that is ONE filter. There are more than 4000 filters and I need to store their references (and these are only the filters). I think you can see that I can't connect everything explicitly, but I must know that Alfa Romeo 101000603000 and 105000603007 are the same, even when I only know (AS 1 --> Alfa Romeo 101000603000) and (AS 1 --> Alfa Romeo 105000603007).
That is why I want to organize them as sets. Each set member would only connect to one other set member through a rep_id, which points to the representative member. And when someone (for example an admin uploading these references) wants to attach a new reference to a set member, I simply INSERT INTO References (rep_id,attached_to_originally_id,refnumber) VALUES([rep_id of the entry I am trying to attach to],[id of the entry I am trying to attach to], "16548752324551..");
Another thing: I don't need to worry about insert, delete, update speed that much, because it is an admin task in our system and will be done rarely.
It is not clear what you are trying to do, and it is not clear that you understand how to think & design relationally. But you seem to want rows satisfying "[id] is a member of the set named by member [rep_id]".
Stop thinking in terms of representations and pointers. Just find fill-in-the-(named-)blank statements ("predicates") that say what you know about your application situations and that you can combine to ask about your application situations. Every statement gets a table ("relation"). The columns of the table are the names of the blanks. The rows of the table are the ones that make its statement true. A query has a statement built from its table's statements. The rows of its result are the ones that make its statement true. (When a query has JOIN of table names its statement ANDs the tables' statements. UNION ORs them. EXCEPT puts in AND NOT. WHERE ANDs a condition. Dropping a column by SELECT corresponds to logical EXISTS.)
Maybe your application situations are a bunch of cells with values and pointers. But I suspect that your cells and pointers and connections and attaching and inserting are just your way of explaining & justifying your table design. Your application seems to have something to do with sets or partitions. If you really are trying to represent relations then you should understand that a relational table represents (is) a relation. Regardless, you should determine what your table statements are. If you want design help or criticism tell us more about your application situations, not about representation of them. All relational representation is by tables of rows satisfying statements.
Do you really need to name sets by representative elements? If we don't care what the name is then we typically use a "surrogate" name that is chosen by the DBMS, typically via some integer auto-increment facility. A benefit of using such a membership-independent name for a set is that we don't have to rename, in particular by choosing an element.
This may be a little difficult to answer given that I'm still learning to write queries and I'm not able to view the database at the moment, but I'll give it a shot.
The database I'm trying to acquire information from contains a large table (TransactionLineItems) that essentially functions as a store transaction log. This table currently contains about 5 million rows and several columns describing products which are included in each transaction (TLI_ReceiptAlias, TLI_ScanCode, TLI_Quantity and TLI_UnitPrice). This table has a foreign key which is paired with a primary key in another table (Transactions), and this table contains transaction numbers (TRN_ReceiptNumber). When I join these two tables, the query returns one row for every item we've ever sold, and each row has a receipt number. 16 rows might have the same receipt number, meaning that all of these items were sold in a single transaction. Below that might be 12 more rows, each sharing another receipt number. All transactions are broken down into multiple rows like this.
I'm attempting to build a query which returns all rows sharing a single receipt number where at least one row with that receipt number meets certain criteria in another column. For example, three separate types of gift cards all have values in the TLI_ScanCode column that begin with "740000." I want the query to return rows with values beginning with these six digits in the TLI_ScanCode column, but I would also like to return all rows which share a receipt number with any of the rows which meet the given scan code criteria. Essentially, I need the query to return all rows for every receipt number which is also paired in at least one row with a gift card-related scan code.
I attempted to use a subquery to return a column of all receipt numbers paired with gift card scan codes, using "WHERE A.TRN_ReceiptAlias IN (subquery..." to return only those rows with a receipt number which matched one of the receipt numbers returned by the subquery. This appeared to run without issue for five minutes before the server ground to a halt for another twenty while it processed the query. The query appeared to complete successfully, but given that I was working with IT to restore normal store operations during this time I failed to obtain the results of the query (apart from the associated shame and embarrassment).
I'd like to know if there is a way to write a query to obtain this information without causing the server to hang. I'm assuming that either: a) it wasn't very smart to use a subquery in this manner on such a large table, or b) I don't know enough about SQL to obtain the information I need. I'm assuming the answer is both A and B, but I'd very much like to learn how to do this the right way. Any help would be greatly appreciated. Thanks!
SELECT *
FROM a as a1
JOIN b
ON b.id = a1.id
JOIN a as a2
ON a2.id = b.id
WHERE b.some_criteria = 'something';
Include an index on (b.id,b.some_criteria)
You aren't the first person, nor will you be the last to bring down your system with an inefficient query.
The most important lesson is that "Decision Support" and "Analytics" really don't co-exist with a transaction system. You really want to pull the data into a datamart or datawarehouse or some other database that isn't your transaction database, so that you don't take the business offline.
In terms of understanding why your initial query was so inefficient, you want to familiarize yourself with the EXPLAIN syntax (EXPLAIN EXTENDED on older MySQL versions), which returns plan information that should help you debug your query and work on making it perform acceptably. If you update your question with the actual explain plan output for it, that would be helpful in determining what the issue is.
Just from the outline you provided, it does sound like a self join would make sense rather than the subquery.
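As a rough illustration of that self join, the sketch below runs against a tiny in-memory SQLite database via Python's sqlite3 module. The key columns TRN_ID and TLI_TRN_ID, the index name, and the sample rows are assumptions, since the question does not name the key columns. The line-items table is joined to itself on the transaction key: one alias finds the gift card lines, the other returns every line on the same receipt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Key column names are hypothetical; the question does not give them.
CREATE TABLE Transactions (
    TRN_ID INTEGER PRIMARY KEY, TRN_ReceiptNumber TEXT);
CREATE TABLE TransactionLineItems (
    TLI_ID INTEGER PRIMARY KEY, TLI_TRN_ID INTEGER, TLI_ScanCode TEXT);
CREATE INDEX idx_tli_trn_scan
    ON TransactionLineItems (TLI_TRN_ID, TLI_ScanCode);
INSERT INTO Transactions VALUES (1, 'R-1'), (2, 'R-2');
INSERT INTO TransactionLineItems VALUES
    (1, 1, '7400001111'),  -- gift card on receipt R-1
    (2, 1, '0001234567'),  -- other item on receipt R-1
    (3, 2, '0009999999');  -- receipt R-2 has no gift card
""")

# "gift" selects the gift card lines; "li" returns all lines that share
# a transaction with one of them. DISTINCT avoids duplicates when a
# receipt contains more than one gift card.
rows = conn.execute("""
SELECT DISTINCT li.TLI_ID, t.TRN_ReceiptNumber, li.TLI_ScanCode
FROM TransactionLineItems AS gift
JOIN TransactionLineItems AS li ON li.TLI_TRN_ID = gift.TLI_TRN_ID
JOIN Transactions AS t ON t.TRN_ID = li.TLI_TRN_ID
WHERE gift.TLI_ScanCode LIKE '740000%'
ORDER BY li.TLI_ID
""").fetchall()
print(rows)  # [(1, 'R-1', '7400001111'), (2, 'R-1', '0001234567')]
```

Whether this outperforms the IN subquery on the real 5-million-row table depends on the indexes and the optimizer, so it is still worth checking the plan on a copy of the data first.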