complex SQL query - one table

complex SQL query - one table - mysql

I am new to SQL.
I was wondering if there is a way to form a complex (I think) query of a certain form, regarding a single table - or a simple query for the same effect.
Let's say I have a table of voice actor candidates, with different attributes (columns) - name and characteristics.
Let's say I have two different actor evaluators (Stewie and Griffin), and every candidate was evaluated by at least one of them (one, or both). The evaluators evaluate the actors, and the table is built.
The rows in the table are per-evaluation, not per-person, meaning that some candidates have two separate rows, one from each evaluation.
The evaluator's name is also an attribute, a column.
Can I make a query that will choose all candidates that were evaluated by both evaluators? (and let's say show all these rows, an even number then)
(There is no attribute "evaluated by both" - that's the core)
I think it should find all rows with evaluator Stewie, then search the entire table for rows with the corresponding candidates' names, and get those with evaluator Griffin.
Summary
A table with people - names and characteristics. One or two rows per person. Each row was filled according to a different observer. There is an attribute "Is Nice". How to find all people that were observed by two observers, one marked "Yes" and one "No" under "Is Nice"?
Update
It will take me some time to check all the answers (as not enough experience yet), and I will update what worked for me.

Can I make a query that will choose all candidates that were evaluated
by both evaluators?
(and let's say show all these rows, an even number then)
There are multiple ways to do this. You can check the existence of other evaluator's evaluation, using EXISTS:
SELECT * FROM Candidate AS C1 WHERE EXISTS (SELECT * FROM Candidate AS C2 WHERE C1.id = C2.id AND C1.evaluator != C2.evaluator)
Or, you could join the table to itself: (The checks for evaluators should be changed as appropriate)
SELECT C1.candidateName FROM Candidate AS C1 JOIN Candidate AS C2 USING (id) WHERE C1.evaluator = 'Stewie' AND C2.evaluator = 'Griffin'
How to find all people that were observed by two observers, one marked
"Yes" and one "No" under "Is Nice"?
For this one, you add another condition to the queries above, that checks if one evaluation was "Yes" and the other one was "No".
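As a rough sketch of that extra condition: the table name evaluations, its columns (name, evaluator, is_nice) and the sample rows below are all made up for illustration, and the SQL is run through Python's sqlite3 only so the example is self-contained. The query itself should carry over to MySQL.

```python
import sqlite3

# Hypothetical schema: one row per evaluation, not per candidate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evaluations (name TEXT, evaluator TEXT, is_nice TEXT)")
conn.executemany(
    "INSERT INTO evaluations VALUES (?, ?, ?)",
    [
        ("Ann", "Stewie", "Yes"),
        ("Ann", "Griffin", "No"),   # evaluated twice, evaluators disagree
        ("Bob", "Stewie", "Yes"),
        ("Bob", "Griffin", "Yes"),  # evaluated twice, evaluators agree
        ("Cid", "Griffin", "No"),   # evaluated once
    ],
)

# Keep a row only if the same candidate has another row from a different
# evaluator whose is_nice value differs.
disagreeing = conn.execute(
    """
    SELECT e1.name, e1.evaluator, e1.is_nice
    FROM evaluations AS e1
    WHERE EXISTS (
        SELECT 1
        FROM evaluations AS e2
        WHERE e2.name = e1.name
          AND e2.evaluator <> e1.evaluator
          AND e2.is_nice <> e1.is_nice
    )
    ORDER BY e1.name, e1.evaluator
    """
).fetchall()
print(disagreeing)
```

Only Ann's two rows come back here, since she is the only candidate with both a "Yes" and a "No".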

You seem to want group by and having. Since a person cannot have more than two rows, and there are only two distinct possible values for isnice (yes or no), we can phrase the query as:
select name
from people
group by name
having max(isnice) <> min(isnice)
This filters for names that have (at least) two different values in isnice. Starting from the above assumptions, this is sufficient to ensure that the person was evaluated more than once, and that isnice has (at least) two different values.
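A small way to try this out, assuming a people table with just name and isnice columns and a few invented rows (run with Python's sqlite3 only for convenience; the query is the same one as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, isnice TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Ann", "yes"),
        ("Ann", "no"),   # two rows with different values
        ("Bob", "yes"),
        ("Bob", "yes"),  # two rows with the same value
        ("Cid", "no"),   # a single row
    ],
)

# MAX and MIN differ only when a name has at least two distinct isnice values.
mixed = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM people GROUP BY name HAVING MAX(isnice) <> MIN(isnice)"
    )
]
print(mixed)
```

With this sample data only Ann qualifies; Bob is filtered out because both of his rows agree, and Cid because a single row has MAX equal to MIN.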

So, I read the problem very carefully, and came up with my own solution.
Please verify the code below if this is what you were really asking for?
--Create Candidates Table
CREATE TABLE tbl_candidates
(
c_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
c_name VARCHAR(30),
)
--Create Evaluators Table
CREATE TABLE tbl_evaluators
(
e_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
e_name VARCHAR(30),
)
--Create Evaluations Table
CREATE TABLE tbl_evaluations
(
ee_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
ee_title VARCHAR(30) NOT NULL,
ee_remarks VARCHAR(30) NOT NULL,
ee_date date NOT NULL,
c_id INT FOREIGN KEY (c_id) REFERENCES tbl_candidates(c_id) NOT NULL,
e_id1 INT FOREIGN KEY (e_id1) REFERENCES tbl_evaluators(e_id) NOT NULL,
e_id2 INT FOREIGN KEY (e_id2) REFERENCES tbl_evaluators(e_id),
IsNice VARCHAR(4)
)
--Populate data & check to verify
INSERT INTO tbl_candidates (c_name) VALUES ('Sam') , ('Smith'), ('Jones') -- third candidate (placeholder name) so that c_id 3 below satisfies the foreign key
SELECT * FROM tbl_candidates
INSERT INTO tbl_evaluators (e_name) VALUES ('Stewie'),('Griffin')
SELECT * FROM tbl_evaluators
INSERT INTO tbl_evaluations
(ee_title,ee_remarks,ee_date,c_id,e_id1,e_id2,IsNice)
VALUES
('Some Title','Some Comment','2020-6-12',1,1,NULL,'No'),
('Some Title','Some Comment','2020-6-12',2,1,2,'Yes'),
('Some Title','Some Comment','2020-6-12',3,2,NULL,'No')
--finally comparing whether we have the matching data of our input vs tables combined data display
select * from tbl_evaluations
select ee_id,ee_title,c_name,ee_remarks,e1.e_name,e2.e_name,ee_date,IsNice from tbl_evaluations ee
left join tbl_candidates c on c.c_id = ee.c_id left join tbl_evaluators e1 on e1.e_id = ee.e_id1 left join tbl_evaluators e2 on e2.e_id = ee.e_id2

This is surely not the best way to write it, but my first thought is
SELECT * FROM evaluations
WHERE PrName IN (
SELECT PrName
FROM evaluations
WHERE IsNice ='No')
AND PrName IN (
SELECT PrName
FROM evaluations
WHERE IsNice ='Yes')

Related

Uncaught mysqli_sql_exception: Subquery returns more than 1 row

I have two tables one with names and telnumbers the second with calls
addressbook name(VARCHAR) number(VARCHAR)
calls date(DATE) number(VARCHAR) name(VARCHAR)
I want to update the names column in the calls table with the entries in the addressbook for the respective
UPDATE calls
SET name = ( SELECT name FROM addressbook WHERE number = calls.number )
WHERE DATE = "2020.01.01"
ORDER BY DATE
And I get Uncaught mysqli_sql_exception: Subquery returns more than 1 row, but there are no duplicates in the addressbook; I checked it several times.

The only way your update statement can fail with
Subquery returns more than 1 row
is if there is at least one calls row whose number appears more than once in addressbook. You can find them with this query:
select number, count(*)
from addressbook
group by number
having count(*) > 1;
Let's say you have these two rows in addressbook:
name number
------ ------
fred 123
barney 123
And let's say this is the row in calls:
date name number
---------- ---- ------
2020.01.01 null 123
When you execute Stefano's update statement, the limit clause is not deterministic because there's no associated order by clause in the subquery. Nor is there any attribute common to calls and addressbook that would make it meaningful. The order by clause on the update is irrelevant. Therefore, you cannot guarantee which name will be assigned to the calls row. This is the point I was trying to make in my comment to Stefano's answer.
If the design of the system is to allow a number to be owned by multiple people over time (which they are of course), then your schema is not complete. And if that's true, then addressbook needs an effective date for the owner of the number.
If the design of the system is not to allow a number to be owned by multiple people over time, then you must delete the duplicate rows.
In either case, you need to do two things:
employ declarative referential integrity constraints so you don't run afoul again
stop updating calls: either insert (not update) the name or remove the column entirely
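To illustrate what a declarative constraint buys you, here is a minimal sketch. It uses a UNIQUE constraint on addressbook.number, which is my simplification rather than a full referential design, and it runs on Python's sqlite3 rather than MySQL. MySQL rejects the second insert in a comparable way, with a duplicate-entry error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for this sketch: a number may belong to only one name at a time.
conn.execute(
    "CREATE TABLE addressbook (name TEXT NOT NULL, number TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO addressbook VALUES ('fred', '123')")

try:
    # Same number again: the database refuses it, so the duplicate never exists.
    conn.execute("INSERT INTO addressbook VALUES ('barney', '123')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM addressbook").fetchone()[0]
print(rejected, row_count)
```

With the constraint in place, the scalar subquery in the original UPDATE can never see two rows for one number.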
If I were to implement the tables of a telephony system, I would start with something like this:
create table PERSON (
PERSON_ID integer not null primary key,
NAME varchar(100) not null /*lots of other columns*/);
create table PERSON_PHONE (
PERSON_PHONE_ID integer not null primary key,
PERSON_ID integer not null,
PHONE_NUM varchar(30) not null,
CONTRACTED date not null, /*lots of other columns*/
unique (PERSON_ID, PHONE_NUM, CONTRACTED),
foreign key (PERSON_ID) references PERSON(PERSON_ID));
create table PHONE_CALL (
START_DATE date not null,
END_DATE date not null,
PERSON_PHONE_ID integer not null,
primary key (PERSON_PHONE_ID, START_DATE),
foreign key (PERSON_PHONE_ID) references PERSON_PHONE(PERSON_PHONE_ID));
It is true that, for the sake of making queries finish faster using fewer resources, people will sometimes denormalize a schema to decrease the number of join operations that would otherwise be required. Denormalization requires careful consideration.

The error is self explanatory, the sub query returns more than one row, a quick solution is:
SELECT name FROM addressbook WHERE number = calls.number LIMIT 1
If this solves the issue, then the subquery was returning more than one row. If you need it to return just one row without using LIMIT 1, you should review your query and add more constraints, or define a primary key for the addressbook table and continue to use your subquery as it is. This is on you.

How to find people who have deceased through SQL from a table?

I have two doubts:
I have a table as follows:
AUTHOR
(
authorID int NOT NULL,
authName varchar(255) NOT NULL,
authSurname varchar(255),
authPlaceOfBirth varchar(255),
authDOB date(),
authDoD varchar(255),
PRIMARY KEY (authorID)
)
Now, I want to find the authors who have died. That is, if the value of DoD exists in the table, then they have died. How to do this? That is, a particular value in a column exists?
Something like this:
SELECT authName
FROM AUTHOR
WHERE authDoD is not NULL?
Second, I have two tables as follows:
TABLE inventory_genre
{
genreID int NOT NULL,
inventoryID int NOT NULL,
PRIMARY KEY (genreID,inventoryID)
}
TABLE INVENTORY
{
inventoryID int NOT NULL,
title varchar(255),
wholesale int,
markup int,
qtyinStock int,
Discount int,
PRIMARY KEY (inventoryID)
}
I want to list all the genres that have no associated titles in the inventory. I know I have to subtract but I am not able to come up with it exactly. Please guide me in the right direction!

Not sure I understand the criteria you are describing in the first question, but either
select * from author where authDoD is not null;
or
select * from author where authDoD = 'some value that I dont know';
For the second one, you could use exists or in with a nested select:
select * from genre where id not in (select genreId from inventory_genre);
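A sketch of the same idea written with NOT EXISTS. The genre table (genreID, genreName) is not shown in the question, so its shape here is a guess, and the rows are invented. It runs on Python's sqlite3 just to keep it self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical genre table; only inventory_genre appears in the question.
    CREATE TABLE genre (genreID INTEGER PRIMARY KEY, genreName TEXT);
    CREATE TABLE inventory_genre (
        genreID INT NOT NULL,
        inventoryID INT NOT NULL,
        PRIMARY KEY (genreID, inventoryID)
    );
    INSERT INTO genre VALUES (1, 'Crime'), (2, 'Poetry'), (3, 'Travel');
    INSERT INTO inventory_genre VALUES (1, 100), (1, 101), (3, 102);
    """
)

# A genre is unused when no inventory_genre row points at it.
unused = conn.execute(
    """
    SELECT g.genreID, g.genreName
    FROM genre AS g
    WHERE NOT EXISTS (
        SELECT 1 FROM inventory_genre AS ig WHERE ig.genreID = g.genreID
    )
    """
).fetchall()
print(unused)
```

NOT IN gives the same answer here because genreID is declared NOT NULL. If the subquery column could contain NULL, NOT EXISTS would be the safer form.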

Part 1
Think about it logically:
you're looking for people who are referenced by what?
By their death.
How do you know they're dead?
Because a certain field in the table has been filled in.
Can you quantify this?
Yes, the value of their death exists.
So, you can construct an SQL search that looks in the AUTHOR table for rows (people) where the column (authDoD) value is not nothing/null.
As a note, in a perfect world you should declare the column authDoD with DEFAULT NULL, so that if no value is set the column value is NULL and easier to handle in queries.
SQL (assuming column can be empty but not Null):
SELECT * FROM AUTHOR WHERE authDoD > ''
SQL (assuming column can only be null unless dead):
SELECT * FROM AUTHOR WHERE authDoD IS NOT NULL
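To see how the two conditions differ, here is a quick sketch with three invented authors: one with NULL, one with an empty string, one with an actual date. The table is trimmed to the relevant columns and run on Python's sqlite3; string and NULL comparison behave the same way in MySQL for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AUTHOR (authorID INTEGER PRIMARY KEY, authName TEXT NOT NULL, authDoD TEXT)"
)
conn.executemany(
    "INSERT INTO AUTHOR VALUES (?, ?, ?)",
    [
        (1, "Alive, stored as NULL", None),
        (2, "Alive, stored as empty string", ""),
        (3, "Deceased", "1990-05-01"),
    ],
)

# IS NOT NULL keeps the empty string as well as the real date.
not_null = [
    r[0]
    for r in conn.execute(
        "SELECT authorID FROM AUTHOR WHERE authDoD IS NOT NULL ORDER BY authorID"
    )
]
# > '' drops both NULL and the empty string.
non_empty = [
    r[0]
    for r in conn.execute(
        "SELECT authorID FROM AUTHOR WHERE authDoD > '' ORDER BY authorID"
    )
]
print(not_null, non_empty)
```

So which condition is right depends on how "no date of death" is actually stored in your table.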
Part 2
You want to return a negative search, a search that turns up no results, so find each genre which does not feature in any inventory table row.
While I'm about to write a longer answer for this, the answer just posted by Tobb does exactly what you need.
Also note that you should ideally have an INDEX on your ID columns to make traversing them smoother and faster.

For the first part of your question, you could use
SELECT * FROM AUTHOR WHERE authDoD IS NOT NULL;
This would SELECT the deceased people

Your first problem is solved if you check for authDoD > ''. It will evaluate to false if there's NULL or the empty string ''.
To solve your second problem you could just JOIN your tables, since the inventoryID isn't allowed to be NULL.
JOIN INVENTORY i ON (inventory_genre.inventoryID = i.inventoryID)
After that you can check the existence of a title like above with title > ''.

Most efficient way to select data from one sql table and see if it matches data on another table in the same database

I have a database with 2 tables, both tables have around 200,000 records.
Let's call these tables TableA and TableB.
Currently I have a function that triggers a select query; this query grabs all records in TableA that match a condition. Once I have that data, I have a foreach loop that uses the data from TableA to see if it matches any record in TableB.
The problem is that it takes a while to do this because there are so many records. I know the way I'm doing it works, because it does what it's supposed to, but it takes a good 3 minutes to finish the script. Is there a faster, more efficient way to do something like this?
Thank you in advance for the help.
PS: I'm using PHP.

The most efficient way to achieve what you want is to:
1. Create a primary key column for each table (if you do not already have one). Example schema where column "id" is a unique identifier for the table row:
TableA
id firstname lastname
1 Michael Douglas
2 Michael Jackson
TableB
id table_a_id pet
1 1 cat
2 2 ape
3 1 dog
Google or search here on stackoverflow on how to create or add a primary key for a mysql table column. An example of creating TableA with a primary key:
CREATE TABLE `TableA` (
`id` int(11) unsigned AUTO_INCREMENT,
`firstname` varchar(100),
`lastname` varchar(100),
PRIMARY KEY (`id`)
)
2. Create an SQL-query to fetch what you need. For example:
To get all rows with at least one match in BOTH tables:
SELECT TableA.id, TableA.firstname, TableA.lastname, TableB.pet
FROM TableA
INNER JOIN TableB
ON TableA.id = TableB.table_a_id;
To instead get all rows from TableA, and only the matching rows from TableB:
SELECT TableA.id, TableA.firstname, TableA.lastname, TableB.pet
FROM TableA
LEFT JOIN TableB
ON TableA.id=TableB.table_a_id;
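The point is to let the database do the matching in one query instead of looping in application code. A rough sketch using the sample rows above, plus one extra TableA row with no pet (my addition, to show the difference between the two joins). It uses Python's sqlite3 rather than PHP/MySQL only to stay self-contained, and the index name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, table_a_id INTEGER, pet TEXT);
    INSERT INTO TableA VALUES
        (1, 'Michael', 'Douglas'),
        (2, 'Michael', 'Jackson'),
        (3, 'Michael', 'Caine');   -- extra row with no match in TableB
    INSERT INTO TableB VALUES (1, 1, 'cat'), (2, 2, 'ape'), (3, 1, 'dog');
    -- An index on the join column lets the join avoid scanning TableB per row.
    CREATE INDEX idx_tableb_table_a_id ON TableB (table_a_id);
    """
)

inner = conn.execute(
    """
    SELECT TableA.id, TableB.pet
    FROM TableA
    INNER JOIN TableB ON TableA.id = TableB.table_a_id
    ORDER BY TableA.id, TableB.id
    """
).fetchall()

left = conn.execute(
    """
    SELECT TableA.id, TableB.pet
    FROM TableA
    LEFT JOIN TableB ON TableA.id = TableB.table_a_id
    ORDER BY TableA.id, TableB.id
    """
).fetchall()
print(inner)
print(left)
```

The inner join returns only the matched pairs, while the left join also returns the unmatched TableA row with a NULL pet.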

The answer to your question ultimately depends on what you mean by "if it matches."
Let's assume, for a moment, that you have primary keys on each of these tables, TableA and TableB, and that you're NOT matching those. But that you have one or more other columns, the actual data that you're storing in each row, which you are considering for your matching. Let's call those ColA and ColB.
In that case you could use:
SELECT TableA.id, TableB.id, TableA.ColA, TableB.ColB
FROM TableA
LEFT JOIN TableB
ON (TableA.ColA = TableB.ColA)
AND (TableA.ColB = TableB.ColB);
... notice that we're using a complex expression on which to JOIN. You'd want to add an AND (TableA.XXX = TableB.XXX) for each column that you want to consider significant in your matching.
Of course I'm assuming that these tables don't share a common surrogate key (otherwise MicKri's JOIN would be simpler ... or a "NATURAL JOIN" would be even simpler still).
What you're doing, conceptually, is defining a pair of (mathematical) sets and finding the intersection between them. The complication of doing this in SQL is that real world tables often have these extra columns (surrogate primary keys, and foreign keys) which aren't attributes of the underlying entities ... but which serve to map relationships among them.
In my example I'm just showing a way to formulate a JOIN query that finds the intersection based only on the attributes that are significant for your purposes.
(By the way, the parentheses in my example are there for human legibility. They should not be required by your SQL engine ... though they don't hurt, either).
Here's one of a number of visual explanations of SQL JOINs that's handy for learning this sort of thing. An INNER JOIN is an intersection. The ON and WHERE clauses define the subsets of the data (columns and rows, respectively) which are to be related.

Remove Records from LEFT JOIN Only If Duplicates and Fits Certain Criteria

I have a fairly big file that I matched with another file before uploading it to my database using MySQL. The original file was ~211k (t1) and the returning match after matching it with the existing database (t2) is around 300k -- which means I have to do almost 90k work of record-removal before I can upload.
Since the first query where I used a LEFT JOIN to match them on name took so long, I saved the results as a new table called matchnew (the 300k records, seemingly with 90k of duplicates or bad matches). Here's a sample of the matchnew schema after I joined t1 and t2:
CREATE TABLE `matchnew` (
`id1` varchar(255) DEFAULT NULL,
`first1` varchar(255) DEFAULT NULL,
`last1` varchar(255) DEFAULT NULL,
`phone1` varchar(255) DEFAULT NULL,
`zip1` varchar(255) DEFAULT NULL,
`id2` varchar(255) DEFAULT NULL,
`first2` varchar(255) DEFAULT NULL,
`last2` varchar(255) DEFAULT NULL,
`phone2` varchar(255) DEFAULT NULL,
`zip2` varchar(255) DEFAULT NULL
);
(And the two IDs [id1 and id2] do not match -- they're two unique identifiers from two different databases.)
Right now I'm looking at most of those duplicates or bad matches by using this simple query:
SELECT *, COUNT(id1)
FROM matchnew
GROUP BY id1
HAVING COUNT(id1) > 1;
The good thing is that each table I matched had a different unique identifier attached (id1 from the first table and id2 from the second table, which now both exist in matchnew), so it should be fairly easy to see when records appear multiple times. Also, because I left joined two existing tables together to get matchnew, I have two sets of data for each person, one from each table: two names, two phone numbers, two addresses, etc. But I only did the LEFT JOIN on first and last name to ensure I'd get the biggest possible return, so that I didn't miss anybody in case they moved or we have different phone numbers for them, etc.
My question is: Is there code I can write or add to the above query which will remove rows if they fit a certain criteria only if there is more than one unique ID in the table? So for example, if my id1 was 1234567 and my query above showed that there were now three of me in the final column, is there additional code I can write to remove one or two (but not all three) of the duplicates or bad matches if my data doesn't match up with other qualifiers (e.g. phone number or zip code)?
To further clarify, if my record with id1: 1234567 from the initial t1 matched with three people with my name from t2 -- is there a way to remove up to two of the rows if, for example, the record from t1 matched the same phone number as one of the three records with the same name from t2? (The only reason why I specify "up to two" is because this example has three duplicates -- and if none of them match the phone number, I don't want to lose them all entirely in case that's a decision I can make manually.)
That was way more complicated to describe than I expected -- so please just let me know if I can provide any further clarification! Thanks so much for the help.

You first need to add an identity column to all the rows.
With an identity column id, the rows will look like this:
id id1 phone1 first1
1 1 732 t1
2 1 732 t1
3 1 732 t2
4 1 891 t3
The query removes only the row with id 1: rows 1 and 2 match on id1, phone1 and first1, and within such a group only the row with the highest id is kept.
We group by id1, phone1 and first1; if a combination has duplicates, we retain only the row with the maximum id.
DELETE M FROM matchnew M
INNER JOIN (
SELECT id1, phone1, first1, MAX(id) as id
FROM matchnew
GROUP BY id1,phone1,first1
HAVING COUNT(*) > 1 )T
ON M.id < T.id
AND M.phone1 = T.phone1
AND M.id1 = T.id1
AND M.first1 = T.first1
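If you want to check the effect on the four sample rows before touching real data, here is a small sketch. Note that it is not the same statement: SQLite has no DELETE ... JOIN, so this uses a NOT IN subquery on MAX(id), which should delete the same rows. As far as I know MySQL rejects a direct subquery on the table being deleted from, which is presumably why the join form is used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matchnew (id INTEGER PRIMARY KEY, id1 TEXT, phone1 TEXT, first1 TEXT)"
)
conn.executemany(
    "INSERT INTO matchnew VALUES (?, ?, ?, ?)",
    [
        (1, "1", "732", "t1"),
        (2, "1", "732", "t1"),  # duplicate of row 1 on (id1, phone1, first1)
        (3, "1", "732", "t2"),
        (4, "1", "891", "t3"),
    ],
)

# Keep the highest id in every (id1, phone1, first1) group, delete the rest.
conn.execute(
    """
    DELETE FROM matchnew
    WHERE id NOT IN (
        SELECT MAX(id) FROM matchnew GROUP BY id1, phone1, first1
    )
    """
)
remaining = [r[0] for r in conn.execute("SELECT id FROM matchnew ORDER BY id")]
print(remaining)
```

In this sketch the lower id of the duplicated pair (row 1) is the one that gets deleted, and rows 2, 3 and 4 survive.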

Database design - primary key naming conventions

I am interested to know what people think about (AND WHY) the following 3 different conventions for naming database table primary keys in MySQL?
-Example 1-
Table name: User,
Primary key column name: user_id
-Example 2-
Table name: User,
Primary key column name: id
-Example 3-
Table name: User,
Primary key column name: pk_user_id
Just want to hear ideas and perhaps learn something in the process :)
Thanks.

I would go with option 2. To me, "id" itself seems sufficient enough.
Since the table is User so the column "id" within "user" indicates that it is the identification criteria for User.
However, i must add that naming conventions are all about consistency.
There is usually no right / wrong as long as there is a consistent pattern and it is applied across the application; that's probably the more important factor in how effective the naming conventions will be and how far they go towards making the application easier to understand and hence maintain.

I always prefer the option in example 1, in which the table name is (redundantly) used in the column name. This is because I prefer to see ON user.user_id = history.user_id than ON user.id = history.user_id in JOINs.
However, the weight of opinion on this issue generally seems to run against me here on Stackoverflow, where most people prefer example 2.
Incidentally, I prefer UserID to user_id as a column naming convention. I don't like typing underscores, and the use of the underscore as the common SQL single-character-match character can sometimes be a little confusing.

ID is the worst PK name you can have in my opinion. TablenameID works much better for reporting so you don't have to alias a bunch of columns named the same thing when doing complex reporting queries.
It is my personal belief that columns should only be named the same thing if they mean the same thing. The customer ID does not mean the same thing as the orderid and thus they should conceptually have different names. When you have many joins and a complex data structure, it is easier to maintain as well when the pk and fk have the same name. It is harder to spot an error in a join when you have ID columns. For instance suppose you joined to four tables all of which have an ID column. In the last join you accidentally used the alias for the first table and not the third one. If you used OrderID, CustomerID etc. instead of ID, you would get a syntax error because the first table doesn't contain that column. If you use ID it would happily join incorrectly.
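A small sketch of that failure mode, with three invented tables (customer, orders, shipment) built once per convention. It runs on Python's sqlite3, where the error shows up as "no such column"; MySQL reports an unknown column in a similar way.

```python
import sqlite3

# Convention: every primary key is simply called id.
generic = sqlite3.connect(":memory:")
generic.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE shipment (id INTEGER PRIMARY KEY, order_id INTEGER);
    """
)
# Mistake: the last join uses alias c where o was intended. It still runs.
generic.execute(
    """
    SELECT * FROM customer c
    JOIN orders o ON o.customer_id = c.id
    JOIN shipment s ON s.order_id = c.id
    """
).fetchall()
generic_accepted = True

# Convention: TablenameID for primary keys and foreign keys.
named = sqlite3.connect(":memory:")
named.executescript(
    """
    CREATE TABLE customer (CustomerID INTEGER PRIMARY KEY);
    CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE shipment (ShipmentID INTEGER PRIMARY KEY, OrderID INTEGER);
    """
)
try:
    # Same mistake: customer has no OrderID column, so the query is rejected.
    named.execute(
        """
        SELECT * FROM customer c
        JOIN orders o ON o.CustomerID = c.CustomerID
        JOIN shipment s ON s.OrderID = c.OrderID
        """
    )
    named_error = None
except sqlite3.OperationalError as exc:
    named_error = str(exc)
print(generic_accepted, named_error)
```

The first query runs without complaint despite the wrong alias, while the second one fails immediately and names the offending column.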

I tend to go with the first option, user_id.
If you go with id, you usually end up with a need to alias excessively in your queries.
If you go with more_complicated_id, then you either must abbreviate, or you run out of room, and you get tired of typing such long column names.
2 cents.

I agree with @InSane and like just Id. And here's why:
If you have a table called User, and a column dealing with the user's name, do you call it UserName or just Name? The "User" seems redundant. If you have a table called Customer, and a column called Address, do you call the column CustomerAddress?
Though I have also seen where you would use UserId, and then if you have a table with a foreign key to User, the column would also be UserId. This allows for the consistency in naming, but IMO, doesn't buy you that much.

In response to Tomas' answer, there will still be ambiguity assuming that the PK for the comment table is also named id.
In response to the question, Example 1 gets my vote. [table name]_id would actually remove the ambiguity.
Instead of
SELECT u.id AS user_id, c.id AS comment_id FROM user u JOIN comment c ON u.id=c.user_id
I could simply write
SELECT u.user_id, c.comment_id FROM user u JOIN comment c ON u.user_id=c.user_id
There's nothing ambiguous about using the same ID name in both WHERE and ON. It actually adds clarity IMHO.

I've always appreciated Justinsomnia's take on database naming conventions. Give it a read: http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/

I would suggest example 2. That way there is no ambiguity between foreign keys and primary keys, as there is in example 1. You can do for instance
SELECT * FROM user, comment WHERE user.id = comment.user_id
which is clear and concise.
The third example is redundant in a design where all id's are used as primary keys.

OK so forget example 3 - it's just plain silly, so it's between 1 and 2.
the id for PK school of thought (2)
drop table if exists customer;
create table customer
(
id int unsigned not null auto_increment primary key, -- my names are id, cid, cusid, custid ????
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
id int unsigned not null auto_increment primary key, -- my names are id, oid, ordid
cid int unsigned not null -- hmmm what shall i call this ?
)engine=innodb;
insert into orders (cid) values (1),(2),(1),(1),(2);
-- so if i do a simple give me all of the customer orders query we get the following output
select
c.id,
o.id
from
customer c
inner join orders o on c.id = o.cid;
id id1 -- big fan of column names like id1, id2, id3 : they are sooo descriptive
== ===
1 1
2 2
1 3
1 4
2 5
-- so now i have to alias my columns like so:
select
c.id as cid, -- shall i call it cid or custid, customer_id whatever ??
o.id as oid
from
customer c
inner join orders o on c.id = o.cid; -- cid here but id in customer - where is my consistency ?
cid oid
== ===
1 1
2 2
1 3
1 4
2 5
the tablename_id prefix for PK/FK name school of thought (1)
(feel free to use an abbreviated form of tablename i.e cust_id instead of customer_id)
drop table if exists customer;
create table customer
(
cust_id int unsigned not null auto_increment primary key, -- pk
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
order_id int unsigned not null auto_increment primary key,
cust_id int unsigned not null
)engine=innodb;
insert into orders (cust_id) values (1),(2),(1),(1),(2);
select
c.cust_id,
o.order_id
from
customer c
inner join orders o on c.cust_id = o.cust_id; -- ahhhh, cust_id is cust_id is cust_id :)
cust_id order_id
======= ========
1 1
2 2
1 3
1 4
2 5
so you see the tablename_ prefix or abbreviated tablename_prefix method is ofc the most
consistent and easily the best convention.

I don't disagree with what most of the answers note - just be consistent. However, I just wanted to add that one benefit of the redundant approach with user_id allows for use of the USING syntactic sugar. If it weren't for this factor, I think I'd personally opt to avoid the redundancy.
For example,
SELECT *
FROM user
INNER JOIN subscription ON user.id = subscription.user_id
vs
SELECT *
FROM user
INNER JOIN subscription USING(user_id)
It's not a crazy significant difference, but I find it helpful.
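One side effect worth seeing for yourself: with USING, SELECT * returns the shared column once instead of twice. A minimal sketch with made-up columns (name, subscription_id, plan) and one row per table, run on Python's sqlite3; I believe MySQL coalesces the USING column in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subscription (
        subscription_id INTEGER PRIMARY KEY,
        user_id INTEGER,
        plan TEXT
    );
    INSERT INTO user VALUES (1, 'ann');
    INSERT INTO subscription VALUES (10, 1, 'pro');
    """
)

# Join with ON: both user_id columns appear in the result.
cur_on = conn.execute(
    "SELECT * FROM user INNER JOIN subscription ON user.user_id = subscription.user_id"
)
on_columns = [d[0] for d in cur_on.description]

# Join with USING: the shared column appears only once.
cur_using = conn.execute(
    "SELECT * FROM user INNER JOIN subscription USING (user_id)"
)
using_columns = [d[0] for d in cur_using.description]
print(on_columns)
print(using_columns)
```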
