How to write SQL statement with joins, set subtraction, subqueries - mysql

We have an issue-tracking system for our software. Yesterday my boss gave me the task of finding all issues in the issue table submitted by this certain guy with an ID of 2992:
issue table:
issue_id (int)
submitter_id (int)
...other fields
Then for each issue in that result set I had to look at another table to find who was subscribed to it:
issue_subscribers table:
issue_id (int) [foreign key to issue table]
subscriber_id (int)
...other fields
I had these ID numbers (3247, 4748, 7430, 8429) which represent other users, and if any of them were NOT subscribed, I had to add them as subscribers to the issue. This means simply adding a new row to issue_subscribers table.
This was way over my head. I ended up doing this with 3 different queries in a Java JDBC program instead of trying to figure it out with pure SQL. But now I'm curious how it could be done. Is this a simple procedure for professional SQL guys? Is it possible to write a single query to do this? How would it be done?

Supposing:
subscriber_id is a foreign key to some table that holds all possible subscribers.
You only need to do this for issue 2992 and subscribers 3247, 4748, 7430 and 8429.
You could write the following SQL query:
Insert Into issue_subscribers
(
issue_id,
subscriber_id
-- and other fields if necessary
)
(
Select 2992 issue_id,
subscriber_id
-- and other fields if necessary
From subscribers subs -- table that holds all possible subscribers
Where subscriber_id In (3247, 4748, 7430, 8429)
And Not Exists (Select 1
From issue_subscribers iss
Where issue_id = 2992
And iss.subscriber_id = subs.subscriber_id)
);
If you need to use it for other issues and subscribers, change the ids to parameters and supply the subscriber_ids for the IN condition some other way (perhaps a subselect of some kind).
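Note that in the question 2992 is the submitter's ID, not an issue ID. A minimal sketch of the same NOT EXISTS idea applied to every issue submitted by that user follows. It runs against an in-memory SQLite database from Python purely so it is self-contained; the sample rows and the subscribers table name are assumptions, and the INSERT ... SELECT itself is plain SQL that should carry over to MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; table contents are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issue (issue_id INTEGER PRIMARY KEY, submitter_id INTEGER);
    CREATE TABLE subscribers (subscriber_id INTEGER PRIMARY KEY);
    CREATE TABLE issue_subscribers (issue_id INTEGER, subscriber_id INTEGER);

    -- Issues 10 and 11 were submitted by user 2992, issue 12 by someone else.
    INSERT INTO issue VALUES (10, 2992), (11, 2992), (12, 1111);
    INSERT INTO subscribers VALUES (3247), (4748), (7430), (8429), (9999);
    INSERT INTO issue_subscribers VALUES (10, 3247), (10, 4748), (12, 3247);
""")

conn.execute("""
    INSERT INTO issue_subscribers (issue_id, subscriber_id)
    SELECT i.issue_id, subs.subscriber_id
    FROM issue i
    CROSS JOIN subscribers subs            -- pair every issue with every candidate
    WHERE i.submitter_id = 2992
      AND subs.subscriber_id IN (3247, 4748, 7430, 8429)
      AND NOT EXISTS (SELECT 1             -- skip pairs that are already subscribed
                      FROM issue_subscribers iss
                      WHERE iss.issue_id = i.issue_id
                        AND iss.subscriber_id = subs.subscriber_id)
""")
conn.commit()

rows = conn.execute(
    "SELECT issue_id, subscriber_id FROM issue_subscribers "
    "ORDER BY issue_id, subscriber_id"
).fetchall()
print(rows)
```

The cross join produces the full set of (issue, subscriber) pairs and the NOT EXISTS performs the "set subtraction", so existing subscriptions are left alone and issue 12 is never touched.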

Related

Insert/Update on table with autoincrement and foreign key

I have a table as such:
id entity_id first_year last_year sessions_attended age
1 2020 1996 2008 3 34.7
2 2024 1993 2005 2 45.1
3 ... ... ...
id is auto-increment primary key, and entity_id is a foreign key that must be unique for the table.
I have a query that calculates first and last year of attendance, and I want to be able to update this table with fresh data each time it is run, only updating the first and last year columns:
This is my insert/update for "first year":
insert into my_table (entity_id, first_year)
( select contact_id, @sd:= year(start_date)
from
( select contact_id, event_id, start_date from participations
join events on participations.event_id = events.id where events.event_type_id = 7
group by contact_id order by event_id ASC) as starter)
ON DUPLICATE KEY UPDATE first_year_85 = @sd;
I have one similar that does "last year", identical except for the target column and the order by.
The queries alone return the desired values, but I am having issues with the insert/update queries. When I run them, I end up with the same values for both fields (the correct first_year value).
Does anything stand out as the cause for this?
Anecdotal Note: This seems to work on MySQL 5.5.54, but when run on my local MariaDB, it just exhibits the above behavior...
Update:
Not my table design to dictate. This is a CRM that allows custom fields to be defined by end-users, I am populating the data via external queries.
The participations table holds all event registrations for all entity_ids, but the start dates are held in a separate events table, hence the join.
The variable is there because ON DUPLICATE KEY UPDATE will not accept a reference to the column without it.
Age is actually slightly more involved: It is age by the start date of the next active event of a certain type.
Fields are being "hard" updated as the values in this table are being pulled by in-CRM reports and searches, they need to be present, can't be dynamically calculated.
Since you have a 'natural' PK (entity_id), why have the id?
age? Are you going to have to change that column daily, or at least monthly? Not a good design. It would be better to have the constant birth_date in the table, then compute the ages in SELECT.
"calculates first and last year of attendance" -- This implies you have a table that lists all years of attendance (yoa)? If so, MAX(yoa) and MIN(yoa) would probably be a better way to compute things.
One rarely needs @variables in queries.
Munch on my comments; come back for more thoughts after you provide a new query, SHOW CREATE TABLE, EXPLAIN, and some sample data.
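As a rough illustration of the MIN/MAX suggestion, here is a sketch that fills both year columns in one statement and needs no variable. It uses an in-memory SQLite database from Python so it can be run as-is, which means the upsert clause is SQLite's ON CONFLICT ... DO UPDATE (SQLite 3.24 or later), not MySQL syntax. In MySQL or MariaDB the corresponding clause would be along the lines of ON DUPLICATE KEY UPDATE first_year = VALUES(first_year), last_year = VALUES(last_year), with YEAR(start_date) instead of strftime. The table layout and sample rows are assumptions based on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, event_type_id INTEGER, start_date TEXT);
    CREATE TABLE participations (contact_id INTEGER, event_id INTEGER);
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        entity_id INTEGER UNIQUE,          -- the unique key the upsert relies on
        first_year INTEGER,
        last_year INTEGER
    );
    -- Made-up sample data; event 3 has a different type and must be ignored.
    INSERT INTO events VALUES (1, 7, '1996-05-01'), (2, 7, '2008-05-01'),
                              (3, 5, '1990-01-01'), (4, 7, '2001-05-01');
    INSERT INTO participations VALUES (2020, 1), (2020, 2), (2020, 3), (2024, 4);
    -- A stale row that should be updated in place, not duplicated.
    INSERT INTO my_table (entity_id, first_year, last_year) VALUES (2020, 1999, 1999);
""")

conn.execute("""
    INSERT INTO my_table (entity_id, first_year, last_year)
    SELECT p.contact_id,
           MIN(CAST(strftime('%Y', e.start_date) AS INTEGER)),
           MAX(CAST(strftime('%Y', e.start_date) AS INTEGER))
    FROM participations p
    JOIN events e ON p.event_id = e.id
    WHERE e.event_type_id = 7
    GROUP BY p.contact_id
    ON CONFLICT(entity_id) DO UPDATE
        SET first_year = excluded.first_year,
            last_year  = excluded.last_year
""")
conn.commit()

rows = conn.execute(
    "SELECT entity_id, first_year, last_year FROM my_table ORDER BY entity_id"
).fetchall()
stale_row_id = conn.execute(
    "SELECT id FROM my_table WHERE entity_id = 2020"
).fetchone()[0]
print(rows, stale_row_id)
```

Because the values come from aggregates over the whole group, the result does not depend on which row GROUP BY happens to pick or on ORDER BY inside a subquery, which is the part of the original query that is most likely to behave differently between servers.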

Use value from a table in a second table

I have two tables called users and packages.
In users there is a column called "package" and in packages a column called "id".
What I'm trying to accomplish is this: if the package id in the users table is changed to, let's say, "1", then another field in the users table called "storage" should be changed to the corresponding "maxstorage" from the packages table... A little illustration here:
DATABASE:
Let's say Joe would like to upgrade to package number 2. Then his storage amount should be changed when his package is changed. It should pull the maxstorage from the packages table into the "storage" column of the users table...
How can I accomplish this?
It's pretty hard for me to explain; if anyone gets it, please edit for an easier explanation.
What you want is not possible in a query (or at least, not simple). You have to move this logic to your code, e.g. you have a query which changes a row in the users table. In that query, also update storage.
Even better, drop users.storage completely. Good databases don't repeat data. You already have the data in the packages table, so why copy it to users?
SELECT users.name, packages.maxstorage
FROM users
LEFT JOIN packages ON (users.package_id = packages.id)
I may not have understood your question correctly, but what about this
(please take transactions into account to avoid conflicts):
CREATE TABLE P (ID INT, MAXSTORAGE INT);
CREATE TABLE U (USR_ID INT, PACKAGE_ID INT, STORAGE INT);
CREATE TRIGGER U_STORAGE_UPDATE BEFORE UPDATE ON U
FOR EACH ROW BEGIN
SET NEW.STORAGE = IF(NEW.PACKAGE_ID<>OLD.PACKAGE_ID , (SELECT MAXSTORAGE FROM P WHERE ID = NEW.PACKAGE_ID), NEW.STORAGE);
END;
INSERT INTO P VALUES (1,12345);
INSERT INTO P VALUES (2,54321);
INSERT INTO U VALUES (1,1,12000);
INSERT INTO U VALUES (2,2,60000);
SELECT * FROM U;
UPDATE U SET PACKAGE_ID=2 WHERE USR_ID=1;
SELECT * FROM U;
UPDATE U SET STORAGE=23
WHERE USR_ID=1;
SELECT * FROM U;
DROP TABLE P;
DROP TABLE U;
Output:
Before the updates:
    USR_ID  PACKAGE_ID  STORAGE
1   1       1           12000
2   2       2           60000
After the 1st update:
    USR_ID  PACKAGE_ID  STORAGE
1   1       2           54321
2   2       2           60000
After the 2nd update:
    USR_ID  PACKAGE_ID  STORAGE
1   1       2           23
2   2       2           60000
Doesn't answer the question but might be useful to you:
The 'package' column in Users should have a foreign key constraint on 'id' in Packages. This ensures that all data in the 'package' column corresponds to a valid value in the Packages table. Otherwise you could enter some data into the 'package' column that doesn't have a value in the Packages table.
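A small sketch of what that constraint buys you, combined with the "join instead of copying" idea from the earlier answer. It uses in-memory SQLite from Python so it is runnable on its own; SQLite needs PRAGMA foreign_keys = ON before it enforces the constraint, whereas MySQL's InnoDB enforces foreign keys by default. Column names follow the question (users.package, packages.id, packages.maxstorage); the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for enforcement
conn.executescript("""
    CREATE TABLE packages (id INTEGER PRIMARY KEY, maxstorage INTEGER);
    CREATE TABLE users (
        name    TEXT,
        package INTEGER REFERENCES packages(id)   -- the foreign key constraint
    );
    INSERT INTO packages VALUES (1, 12345), (2, 54321);
    INSERT INTO users VALUES ('Joe', 1);
""")

# Pointing Joe at a package that does not exist is rejected by the database.
try:
    conn.execute("UPDATE users SET package = 99 WHERE name = 'Joe'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A valid upgrade goes through, and the storage limit is read via a join,
# so no separate users.storage column has to be kept in sync.
conn.execute("UPDATE users SET package = 2 WHERE name = 'Joe'")
row = conn.execute("""
    SELECT u.name, p.maxstorage
    FROM users u
    LEFT JOIN packages p ON u.package = p.id
""").fetchone()
print(rejected, row)
```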

MySQL Insert if select is null to avoid bidirectional duplicates

I'm building a db to hold friendships between users of my app.
The server I use to communicate with the MySQL instance is written using Node.js (Express).
My table 'friendships' consists mainly of two INTs which correspond (foreign keys) to user ids.
I want to avoid bidirectional duplicates ( 1,2 vs. 2,1) so I need to write a query which does the following:
INSERT INTO friendships f (id_1, id_2) VALUES (?, ?) IF (SELECT * FROM friendships s WHERE s.id_1=? AND s.id_2=?) IS NULL ;
Obviously this one doesn't really work. And of course I would have the last two question marks have opposite values compared to the first ones, and a UNIQUE key on the ids (id_1, id_2).
The usual answer for this kind of question is "just order your ids by size to avoid duplicates" and it's a good answer. But in my case, I want to keep a record of who sent the friend request (and who approved), without using any extra variables (and extra queries).
Also, I don't want to use code for this, in order to avoid "concurrent" problems.
Thanks!
In MySQL, you can do this with a trigger that does the check. Some other databases have functional indexes, indexes on computed columns, or check constraints that help implement this functionality.
If you want to do the check in the insert, you can do:
INSERT INTO friendships(id_1, id_2)
select new1, new2
from (select ? as new1, ? as new2) t
where not exists (select 1
from friendships f
where f.id_1 = new2 and f.id_2 = new1
);
You should also have a unique index on id_1 and id_2:
create unique index idx_friendships_id1_id2 on friendships(id_1, id_2);
EDIT:
The basic query is:
INSERT INTO friendships(id_1, id_2)
select ?, ?
from dual
where not exists (select 1
from friendships f
where f.id_1 = ? and f.id_2 = ?
);
But you have to get the arguments in the right order, so the earlier method is less prone to error.
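One way to sidestep the argument-order problem from application code is to use named placeholders, so each id is passed once and reused. The sketch below does that with Python's sqlite3 module against an in-memory SQLite database (a stand-in for MySQL; the Node.js driver in the question may or may not offer named placeholders, so treat that as an assumption to check). The helper name add_friendship is made up, and unlike the queries above this version also checks the same-direction pair, so a repeated request is skipped instead of hitting the unique index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friendships (
        id_1 INTEGER NOT NULL,   -- who sent the request
        id_2 INTEGER NOT NULL,   -- who received it
        UNIQUE (id_1, id_2)
    )
""")

INSERT_IF_NEW = """
    INSERT INTO friendships (id_1, id_2)
    SELECT :sender, :recipient
    WHERE NOT EXISTS (SELECT 1
                      FROM friendships f
                      WHERE (f.id_1 = :recipient AND f.id_2 = :sender)
                         OR (f.id_1 = :sender AND f.id_2 = :recipient))
"""


def add_friendship(sender, recipient):
    """Insert the pair unless it already exists in either direction."""
    conn.execute(INSERT_IF_NEW, {"sender": sender, "recipient": recipient})
    conn.commit()


add_friendship(1, 2)   # inserted
add_friendship(2, 1)   # skipped: reverse of an existing pair
add_friendship(1, 2)   # skipped: exact duplicate
add_friendship(3, 1)   # inserted

rows = conn.execute("SELECT id_1, id_2 FROM friendships ORDER BY id_1, id_2").fetchall()
print(rows)
```

The direction of the first request is preserved, which is the property the asker wanted to keep.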

Inserting data from one table to another?

I've got two tables and I'm trying to insert data from one into the other. I've found a few examples on the web of how this can be accomplished, but they mostly rely on the two tables having an identical structure. I'm trying to insert some data from one table into another table with quite a different structure.
I'm trying to insert data from a table called 'catalog_product_entity_media_gallery' into a table called 'catalog_product_entity_varchar'. Below is a simple description of their structure
The 'catalog_product_entity_varchar' looks as follows:
value_id | entity_type_id | attribute_id | store_id | entity_id | value
PK INT INT INT INT VARCHAR
And the 'catalog_product_entity_media_gallery' table looks as follows:
value_id | attribute_id | entity_id | value
PK INT INT VARCHAR
I need to insert the entity_id and value columns from catalog_product_entity_media_gallery into catalog_product_entity_varchar. However, as you can see, the structure is quite different.
The query I'm trying to use is as follows
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4,
74,
0,
catalog_product_entity_media_gallery.entity_id,
catalog_product_entity_media_gallery.value
FROM catalog_product_entity_media_gallery;
I only need the entity_id and value from media_gallery; the other values are always the same. I have tried to do this using the above, but it just hangs in MySQL (no errors).
I think it's due to the fact that I'm trying to select 4, 74 and 0 from catalog_product_entity_media_gallery, but I'm not 100% sure (apologies, I'm a bit of a novice with MySQL).
Can anybody point me in the right direction? Is there any way I can insert some data from the media table whilst inserting static values for some columns? (I hope this all makes sense.)
The query syntax is ok.
However, there may be issues with the unique and foreign keys in the catalog_product_entity_varchar table that prevent you from inserting data. The query may also be waiting for some other query to complete (if your query is just part of a bigger scenario), in which case it is a locking issue. The first case is the most probable.
Currently, the question lacks important details:
The MySQL client / programming code you use to perform the query. Without it we cannot see the case in full or reproduce it correctly.
The scenario you perform, i.e. whether you do it inside the Magento application in some module during a web request, or whether there are other queries in your script, open transactions, other people accessing the DB server, etc.
Based on the most probable assumption, that you just don't see the actual error with the unique/foreign keys, you may try the following queries.
1) Unique index failure.
Try this:
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4 as etid,
74 as aid,
0 as sid,
catalog_product_entity_media_gallery.entity_id as eid,
catalog_product_entity_media_gallery.value as val
FROM
catalog_product_entity_media_gallery
GROUP BY
eid, aid, sid;
It is very likely that you are inserting non-unique entries, because catalog_product_entity_media_gallery can hold multiple entries for the same product, while catalog_product_entity_varchar cannot. If the query above completes successfully, then the issue really is with the unique key. In that case you must re-verify what you want to achieve, because the initial aim (not the query itself) is wrong.
2) Wrong foreign key (non-existing attribute 74)
Try this (replacing ATTRIBUTE_CODE and ATTRIBUTE_ENTITY_TYPE_ID with the values you need, e.g. 'firstname' and 6):
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4 as etid,
eav_attribute.attribute_id as aid,
0 as sid,
gallery.entity_id as eid,
gallery.value as val
FROM
catalog_product_entity_media_gallery AS gallery
INNER JOIN
eav_attribute
ON
eav_attribute.attribute_code = '<ATTRIBUTE_CODE>'
AND eav_attribute.entity_type_id = <ATTRIBUTE_ENTITY_TYPE_ID>
GROUP BY
eid, aid, sid;
If it executes successfully AND
Some rows are added to catalog_product_entity_varchar - then it seems that 74 was the wrong id for the attribute you needed, so the foreign key in catalog_product_entity_varchar didn't allow you to insert the records.
No rows are added to catalog_product_entity_varchar - then it seems that you made a mistake in the attribute id, attribute code or entity type. Recheck what you put as ATTRIBUTE_CODE and ATTRIBUTE_ENTITY_TYPE_ID.
If both queries still hang - then you have issues with your MySQL client, server or execution scenario.
Note: your initial query may make sense in your specific case, but some things signal that something may be wrong with your approach:
You're using literal numbers for ids. But ids differ between installations and Magento versions. It is better to use more stable values, like the attribute code in the second query, and look up the actual attribute id from them.
You copy data from catalog_product_entity_media_gallery, which can store multiple entries for the same product, to catalog_product_entity_varchar, which can store only one entry for the product. That means you cannot copy all the data this way. Probably your query doesn't reflect the goal you want to achieve.
The entity type id inserted into catalog_product_entity_varchar is not related to the attribute id, while in Magento these are deeply connected. Putting the wrong entity type id in a table will either make Magento behave incorrectly, or it won't notice your changes at all.
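Before running the insert, the duplicate-entry concern can be checked directly by counting gallery rows per product. A minimal sketch, run against in-memory SQLite from Python so it is self-contained; the table is cut down to the columns listed in the question, the rows (including attribute id 88) are made-up sample data, and the SELECT itself is plain SQL that should run unchanged on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog_product_entity_media_gallery (
        value_id     INTEGER PRIMARY KEY,
        attribute_id INTEGER,
        entity_id    INTEGER,
        value        TEXT
    );
    -- Product 1 has two images, product 2 has one.
    INSERT INTO catalog_product_entity_media_gallery (attribute_id, entity_id, value)
    VALUES (88, 1, '/a/1.jpg'), (88, 1, '/a/2.jpg'), (88, 2, '/b/1.jpg');
""")

# Products with more than one gallery row would collide on a unique key
# that allows only one value per product in the target table.
dupes = conn.execute("""
    SELECT entity_id, COUNT(*) AS n
    FROM catalog_product_entity_media_gallery
    GROUP BY entity_id
    HAVING COUNT(*) > 1
    ORDER BY entity_id
""").fetchall()
print(dupes)
```

If this returns any rows on the real data, a plain INSERT ... SELECT cannot copy everything, and you have to decide which gallery entry per product should win.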
Try this:
INSERT INTO catalog_product_entity_varchar (entity_id, value)
SELECT entity_id, value
FROM catalog_product_entity_media_gallery
WHERE value_id = <value_id>; -- the value_id of the row that has those values 4, 74, 0
Assuming the value_id in the catalog_product_entity_varchar table is an autoincrement, could you not do the following?
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, store_id, entity_id, value)
SELECT
4,
74,
catalog_product_entity_media_gallery.entity_id,
catalog_product_entity_media_gallery.value
FROM catalog_product_entity_media_gallery;
Note that there is no attribute_id column in your catalog_product_entity_varchar table.

Database design - primary key naming conventions

I am interested to know what people think about (AND WHY) the following 3 different conventions for naming database table primary keys in MySQL?
-Example 1-
Table name: User,
Primary key column name: user_id
-Example 2-
Table name: User,
Primary key column name: id
-Example 3-
Table name: User,
Primary key column name: pk_user_id
Just want to hear ideas and perhaps learn something in the process :)
Thanks.
I would go with option 2. To me, "id" itself seems sufficient.
Since the table is User, the column "id" within "user" indicates that it is the identifier for User.
However, I must add that naming conventions are all about consistency.
There is usually no right or wrong as long as there is a consistent pattern and it is applied across the application. That's probably the more important factor in how effective the naming conventions will be and how far they go towards making the application easier to understand and hence maintain.
I always prefer the option in example 1, in which the table name is (redundantly) used in the column name. This is because I prefer to see ON user.user_id = history.user_id than ON user.id = history.user_id in JOINs.
However, the weight of opinion on this issue generally seems to run against me here on Stack Overflow, where most people prefer example 2.
Incidentally, I prefer UserID to user_id as a column naming convention. I don't like typing underscores, and the use of the underscore as the common SQL single-character-match character can sometimes be a little confusing.
ID is the worst PK name you can have in my opinion. TablenameID works much better for reporting so you don't have to alias a bunch of columns named the same thing when doing complex reporting queries.
It is my personal belief that columns should only be named the same thing if they mean the same thing. The customer ID does not mean the same thing as the order ID, and thus they should conceptually have different names. When you have many joins and a complex data structure, it is also easier to maintain when the PK and FK have the same name. It is harder to spot an error in a join when you have ID columns. For instance, suppose you joined to four tables, all of which have an ID column. In the last join you accidentally used the alias for the first table and not the third one. If you used OrderID, CustomerID etc. instead of ID, you would get a syntax error because the first table doesn't contain that column. If you use ID it would happily join incorrectly.
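To make the wrong-alias scenario concrete, here is a small sketch with three tables instead of four, run against in-memory SQLite from Python. The table names, the _a/_b suffixes and the rows are invented for the demonstration. With generic id columns the mistyped join runs and quietly returns the wrong shipments; with descriptive names the same slip is rejected with an unknown-column error (SQLite's wording is shown; MySQL reports it differently but also refuses to run the query).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Style A: every primary key is just "id"
    CREATE TABLE customer_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders_a   (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE shipment_a (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO customer_a VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders_a   VALUES (1, 2), (2, 1);
    INSERT INTO shipment_a VALUES (1, 2), (2, 1);

    -- Style B: keys carry the table name
    CREATE TABLE customer_b (CustomerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders_b   (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE shipment_b (ShipmentID INTEGER PRIMARY KEY, OrderID INTEGER);
    INSERT INTO customer_b VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders_b   VALUES (1, 2), (2, 1);
    INSERT INTO shipment_b VALUES (1, 2), (2, 1);
""")

correct = conn.execute("""
    SELECT c.name, s.id
    FROM customer_a c
    JOIN orders_a o   ON o.customer_id = c.id
    JOIN shipment_a s ON s.order_id = o.id
    ORDER BY c.name
""").fetchall()

# Same query with the wrong alias in the last join (c instead of o): it still runs.
wrong = conn.execute("""
    SELECT c.name, s.id
    FROM customer_a c
    JOIN orders_a o   ON o.customer_id = c.id
    JOIN shipment_a s ON s.order_id = c.id
    ORDER BY c.name
""").fetchall()

# The same slip with descriptive names: customer_b has no OrderID column.
try:
    conn.execute("""
        SELECT c.name, s.ShipmentID
        FROM customer_b c
        JOIN orders_b o   ON o.CustomerID = c.CustomerID
        JOIN shipment_b s ON s.OrderID = c.OrderID
    """)
    caught = None
except sqlite3.OperationalError as exc:
    caught = str(exc)

print(correct, wrong, caught)
```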
I tend to go with the first option, user_id.
If you go with id, you usually end up with a need to alias excessively in your queries.
If you go with more_complicated_id, then you either must abbreviate, or you run out of room, and you get tired of typing such long column names.
2 cents.
I agree with @InSane and like just Id. And here's why:
If you have a table called User, and a column dealing with the user's name, do you call it UserName or just Name? The "User" seems redundant. If you have a table called Customer, and a column called Address, do you call the column CustomerAddress?
Though I have also seen where you would use UserId, and then if you have a table with a foreign key to User, the column would also be UserId. This allows for the consistency in naming, but IMO, doesn't buy you that much.
In response to Tomas' answer, there will still be ambiguity assuming that the PK for the comment table is also named id.
In response to the question, Example 1 gets my vote. [table name]_id would actually remove the ambiguity.
Instead of
SELECT u.id AS user_id, c.id AS comment_id FROM user u JOIN comment c ON u.id=c.user_id
I could simply write
SELECT user_id, comment_id FROM user u JOIN comment c ON u.user_id=c.user_id
There's nothing ambiguous about using the same ID name in both WHERE and ON. It actually adds clarity IMHO.
I've always appreciated Justinsomnia's take on database naming conventions. Give it a read: http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/
I would suggest example 2. That way there is no ambiguity between foreign keys and primary keys, as there is in example 1. You can do for instance
SELECT * FROM user, comment WHERE user.id = comment.user_id
which is clear and concise.
The third example is redundant in a design where all ids are used as primary keys.
OK so forget example 3 - it's just plain silly, so it's between 1 and 2.
the id for PK school of thought (2)
drop table if exists customer;
create table customer
(
id int unsigned not null auto_increment primary key, -- my names are id, cid, cusid, custid ????
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
id int unsigned not null auto_increment primary key, -- my names are id, oid, ordid
cid int unsigned not null -- hmmm what shall i call this ?
)engine=innodb;
insert into orders (cid) values (1),(2),(1),(1),(2);
-- so if i do a simple give me all of the customer orders query we get the following output
select
c.id,
o.id
from
customer c
inner join orders o on c.id = o.cid;
id id1 -- big fan of column names like id1, id2, id3 : they are sooo descriptive
== ===
1 1
2 2
1 3
1 4
2 5
-- so now i have to alias my columns like so:
select
c.id as cid, -- shall i call it cid or custid, customer_id whatever ??
o.id as oid
from
customer c
inner join orders o on c.id = o.cid; -- cid here but id in customer - where is my consistency ?
cid oid
== ===
1 1
2 2
1 3
1 4
2 5
the tablename_id prefix for PK/FK name school of thought (1)
(feel free to use an abbreviated form of tablename i.e cust_id instead of customer_id)
drop table if exists customer;
create table customer
(
cust_id int unsigned not null auto_increment primary key, -- pk
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
order_id int unsigned not null auto_increment primary key,
cust_id int unsigned not null
)engine=innodb;
insert into orders (cust_id) values (1),(2),(1),(1),(2);
select
c.cust_id,
o.order_id
from
customer c
inner join orders o on c.cust_id = o.cust_id; -- ahhhh, cust_id is cust_id is cust_id :)
cust_id order_id
======= ========
1 1
2 2
1 3
1 4
2 5
So you see, the tablename_ prefix (or abbreviated tablename_ prefix) method is ofc the most consistent and easily the best convention.
I don't disagree with what most of the answers note - just be consistent. However, I wanted to add that one benefit of the redundant approach with user_id is that it allows use of the USING syntactic sugar. If it weren't for this factor, I think I'd personally opt to avoid the redundancy.
For example,
SELECT *
FROM user
INNER JOIN subscription ON user.id = subscription.user_id
vs
SELECT *
FROM user
INNER JOIN subscription USING(user_id)
It's not a crazy significant difference, but I find it helpful.
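One more practical difference, sketched below with in-memory SQLite from Python: with USING, SELECT * returns the shared column once, while the ON form returns it from both tables. The subscription columns and rows are made up, and for the ON variant the user table is given a user_id key so both queries run on the same schema. As far as I know MySQL also removes the redundant column for USING joins, but check that on your version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subscription (
        subscription_id INTEGER PRIMARY KEY,
        user_id INTEGER,
        tier TEXT
    );
    INSERT INTO user VALUES (1, 'ann');
    INSERT INTO subscription VALUES (10, 1, 'pro');
""")

# Join written with ON: both user_id columns appear in the result.
on_cur = conn.execute("""
    SELECT *
    FROM user
    INNER JOIN subscription ON user.user_id = subscription.user_id
""")
on_columns = [col[0] for col in on_cur.description]

# Join written with USING: the shared column appears only once.
using_cur = conn.execute("""
    SELECT *
    FROM user
    INNER JOIN subscription USING (user_id)
""")
using_columns = [col[0] for col in using_cur.description]

print(on_columns)
print(using_columns)
```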