Group by query in sql - mysql

I have got 2 tables, Security and SecurityTransactions.
Security:
create table security(SecurityId int, SecurityName varchar(50));
insert into security values(1,'apple');
insert into security values(2,'google');
insert into security values(3,'ibm');
SecurityTransactions:
create table SecurityTransactions(SecurityId int, Buy_sell boolean, Quantity int);
insert into securitytransactions values ( 1 , false, 100 );
insert into securitytransactions values ( 1 , true, 20 );
insert into securitytransactions values ( 1 , false, 50 );
insert into securitytransactions values ( 2 , false, 120 );
I want to find each security name and the number of times it appears in SecurityTransactions.
The expected result is:
SecurityName | Appearance
apple | 3
google | 1
I wrote the below sql query :
select S.SecurityName, count(t.securityID) as Appearance
from security S inner join securitytransactions t on S.SecurityId = t.SecurityId
group by t.SecurityId, S.SecurityName;
This query gave me the desired result, but it was still rejected by a reviewer who said the group by should have been on s.securityName. Why is that?
EDIT:
Which one do you think is correct, and why?
a. group by t.SecurityId, S.securityName
b. group by t.SecurityId
c. group by S.securityName

According to ANSI SQL, if you use a group by clause, your select list may only contain items in the group by clause, single-row transformations thereof, or aggregate expressions. MySQL is non-standard and allows other columns too. In this case it happened to produce the right answer, as there's a 1:1 relationship between SecurityId and SecurityName, but generally speaking this is a bad practice that will make your code hard to understand at best, and unpredictable at worst.
EDIT:
To address the edited question: grouping by both SecurityId and SecurityName isn't technically wrong, it's just redundant. Since there's a 1:1 relationship between the two columns, adding the SecurityId column to the group by clause won't change the result, and will just confuse people reading the query.
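The redundancy can be checked with a small sketch. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made so the example is self-contained), loads the data from the question, and runs the query once with each group by variant.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema and rows mirror the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security (SecurityId INT, SecurityName VARCHAR(50));
    INSERT INTO security VALUES (1, 'apple'), (2, 'google'), (3, 'ibm');
    CREATE TABLE securitytransactions (SecurityId INT, Buy_sell BOOLEAN, Quantity INT);
    INSERT INTO securitytransactions VALUES
        (1, 0, 100), (1, 1, 20), (1, 0, 50), (2, 0, 120);
""")

query = """
    SELECT s.SecurityName, COUNT(t.SecurityId) AS Appearance
    FROM security s
    INNER JOIN securitytransactions t ON s.SecurityId = t.SecurityId
    GROUP BY {group_by}
    ORDER BY s.SecurityName
"""

# Variant (a): group by both columns. Variant (c): group by the name only.
by_both = conn.execute(query.format(group_by="t.SecurityId, s.SecurityName")).fetchall()
by_name = conn.execute(query.format(group_by="s.SecurityName")).fetchall()

print(by_both)  # [('apple', 3), ('google', 1)]
print(by_name)  # [('apple', 3), ('google', 1)]
```

Both variants agree only because each SecurityId maps to exactly one SecurityName in this data; if two securities shared a name, grouping by name alone would merge them.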


Unexpected behaviour in MySQL subquery [duplicate]

This issue came up when I got different record counts for what I thought were identical queries, one using a not in constraint in the where clause and the other a left join. The table in the not in constraint had one null value (bad data), which caused that query to return a count of 0 records. I sort of understand why, but I could use some help fully grasping the concept.
To state it simply, why does query A return a result but B doesn't?
A: select 'true' where 3 in (1, 2, 3, null)
B: select 'true' where 3 not in (1, 2, null)
This was on SQL Server 2005. I also found that calling set ansi_nulls off causes B to return a result.
Query A is the same as:
select 'true' where 3 = 1 or 3 = 2 or 3 = 3 or 3 = null
Since 3 = 3 is true, you get a result.
Query B is the same as:
select 'true' where 3 <> 1 and 3 <> 2 and 3 <> null
When ansi_nulls is on, 3 <> null is UNKNOWN, so the predicate evaluates to UNKNOWN, and you don't get any rows.
When ansi_nulls is off, 3 <> null is true, so the predicate evaluates to true, and you get a row.
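As a quick check of the default (ANSI) behaviour, here is a minimal sketch using Python's sqlite3 module. SQLite is an assumption here: it has no ansi_nulls switch, so it only demonstrates the "on" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Query A: one of the comparisons is TRUE, so the OR chain is TRUE.
rows_a = conn.execute("SELECT 'true' WHERE 3 IN (1, 2, 3, NULL)").fetchall()

# Query B: TRUE AND TRUE AND UNKNOWN is UNKNOWN, so the row is filtered out.
rows_b = conn.execute("SELECT 'true' WHERE 3 NOT IN (1, 2, NULL)").fetchall()

# Selecting the predicate itself exposes UNKNOWN as NULL (None in Python).
value_b = conn.execute("SELECT 3 NOT IN (1, 2, NULL)").fetchone()[0]

print(rows_a)   # [('true',)]
print(rows_b)   # []
print(value_b)  # None
```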
NOT IN returns 0 records when compared against an unknown value
Since NULL is an unknown, a NOT IN query containing a NULL or NULLs in the list of possible values will always return 0 records since there is no way to be sure that the NULL value is not the value being tested.
Whenever you use NULL you are really dealing with a Three-Valued logic.
Your first query returns results as the WHERE clause evaluates to:
3 = 1 or 3 = 2 or 3 = 3 or 3 = null
which is:
FALSE or FALSE or TRUE or UNKNOWN
which evaluates to
TRUE
The second one:
3 <> 1 and 3 <> 2 and 3 <> null
which evaluates to:
TRUE and TRUE and UNKNOWN
which evaluates to:
UNKNOWN
The UNKNOWN is not the same as FALSE
you can easily test it by calling:
select 'true' where 3 <> null
select 'true' where not (3 <> null)
Both queries will give you no results
If the UNKNOWN was the same as FALSE then assuming that the first query would give you FALSE the second would have to evaluate to TRUE as it would have been the same as NOT(FALSE).
That is not the case.
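The same test can be run outside T-SQL. The sketch below assumes SQLite (via Python's sqlite3 module) as the engine and, for contrast, adds a pair of predicates that really are FALSE and TRUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

predicates = [
    "3 <> NULL",        # UNKNOWN
    "NOT (3 <> NULL)",  # NOT UNKNOWN is still UNKNOWN
    "3 <> 3",           # FALSE
    "NOT (3 <> 3)",     # NOT FALSE is TRUE
]

results = {}
for predicate in predicates:
    # The raw value of the predicate: None for UNKNOWN, 0 for FALSE, 1 for TRUE.
    value = conn.execute(f"SELECT {predicate}").fetchone()[0]
    # Whether a WHERE clause using the predicate lets a row through.
    rows = conn.execute(f"SELECT 'true' WHERE {predicate}").fetchall()
    results[predicate] = (value, len(rows))

for predicate, outcome in results.items():
    print(predicate, outcome)
```

Only the last predicate returns a row: negating FALSE gives TRUE, but negating UNKNOWN gives UNKNOWN again.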
There is a very good article on this subject on SqlServerCentral.
The whole issue of NULLs and Three-Valued Logic can be a bit confusing at first but it is essential to understand in order to write correct queries in TSQL
Another article I would recommend is SQL Aggregate Functions and NULL.
Comparing to NULL is undefined unless you use IS NULL.
So when 3 is compared to NULL, the comparison returns undefined.
I.e. SELECT 'true' where 3 in (1,2,null)
and
SELECT 'true' where 3 not in (1,2,null)
will produce the same result, as NOT (UNDEFINED) is still undefined, not TRUE.
If you want to filter with NOT IN against a subquery containing NULLs, just exclude the NULLs in the subquery:
SELECT blah FROM t WHERE blah NOT IN
(SELECT someotherBlah FROM t2 WHERE someotherBlah IS NOT NULL )
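A small sketch of the difference, assuming SQLite through Python's sqlite3 module. The table and column names come from the query above; the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (blah INT);
    CREATE TABLE t2 (someotherBlah INT);
    INSERT INTO t  VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (NULL);   -- hypothetical data with one NULL
""")

# Without the filter, the NULL in t2 makes every NOT IN test UNKNOWN or FALSE.
unfiltered = conn.execute("""
    SELECT blah FROM t
    WHERE blah NOT IN (SELECT someotherBlah FROM t2)
    ORDER BY blah
""").fetchall()

# With the filter, the subquery contains only known values.
filtered = conn.execute("""
    SELECT blah FROM t
    WHERE blah NOT IN (SELECT someotherBlah FROM t2 WHERE someotherBlah IS NOT NULL)
    ORDER BY blah
""").fetchall()

print(unfiltered)  # []
print(filtered)    # [(2,), (3,)]
```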
The title of this question at the time of writing is
SQL NOT IN constraint and NULL values
From the text of the question it appears that the problem was occurring in a SQL DML SELECT query, rather than a SQL DDL CONSTRAINT.
However, especially given the wording of the title, I want to point out that some statements made here are potentially misleading statements, those along the lines of (paraphrasing)
When the predicate evaluates to UNKNOWN you don't get any rows.
Although this is the case for SQL DML, when considering constraints the effect is different.
Consider this very simple table with two constraints taken directly from the predicates in the question (and addressed in an excellent answer by @Brannon):
DECLARE @T TABLE
(
true CHAR(4) DEFAULT 'true' NOT NULL,
CHECK ( 3 IN (1, 2, 3, NULL )),
CHECK ( 3 NOT IN (1, 2, NULL ))
);
INSERT INTO @T VALUES ('true');
SELECT COUNT(*) AS tally FROM @T;
As per @Brannon's answer, the first constraint (using IN) evaluates to TRUE and the second constraint (using NOT IN) evaluates to UNKNOWN. However, the insert succeeds! Therefore, in this case it is not strictly correct to say, "you don't get any rows" because we have indeed got a row inserted as a result.
The above effect is indeed the correct one as regards the SQL-92 Standard. Compare and contrast the following section from the SQL-92 spec
7.6 where clause
The result of the <where clause> is a table of those rows of T for
which the result of the <search condition> is true.
4.10 Integrity constraints
A table check constraint is satisfied if and only if the specified
<search condition> is not false for any row of a table.
In other words:
In SQL DML, rows are removed from the result when the WHERE evaluates to UNKNOWN because it does not satisfy the condition "is true".
In SQL DDL (i.e. constraints), rows are not removed from the result when they evaluate to UNKNOWN because it does satisfy the condition "is not false".
Although the effects in SQL DML and SQL DDL respectively may seem contradictory, there is a practical reason for giving UNKNOWN results the 'benefit of the doubt' by allowing them to satisfy a constraint (more correctly, allowing them to not fail to satisfy a constraint): without this behaviour, every constraint would have to explicitly handle nulls, and that would be very unsatisfactory from a language design perspective (not to mention a right pain for coders!)
p.s. if you are finding it as challenging to follow such logic as "unknown does not fail to satisfy a constraint" as I am to write it, then consider you can dispense with all this simply by avoiding nullable columns in SQL DDL and anything in SQL DML that produces nulls (e.g. outer joins)!
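The contrast between a WHERE clause and a CHECK constraint can be reproduced in a few lines. This is a sketch under the assumption that SQLite (via Python's sqlite3 module) is an acceptable stand-in for the T-SQL table variable; the column is renamed to flag, and the table F with a definitely-false constraint is added purely for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same two constraints as above: the first is TRUE, the second is UNKNOWN.
conn.execute("""
    CREATE TABLE T (
        flag CHAR(4) DEFAULT 'true' NOT NULL,
        CHECK (3 IN (1, 2, 3, NULL)),
        CHECK (3 NOT IN (1, 2, NULL))
    )
""")
conn.execute("INSERT INTO T VALUES ('true')")  # accepted: UNKNOWN is "not false"
tally = conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]

# The very same UNKNOWN predicate in a WHERE clause removes the row.
where_rows = conn.execute("SELECT flag FROM T WHERE 3 NOT IN (1, 2, NULL)").fetchall()

# Hypothetical comparison table: a constraint that is definitely FALSE is rejected.
conn.execute("CREATE TABLE F (flag CHAR(4), CHECK (3 NOT IN (1, 2, 3)))")
try:
    conn.execute("INSERT INTO F VALUES ('true')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(tally, where_rows, rejected)  # 1 [] True
```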
In A, 3 is tested for equality against each member of the set, yielding (FALSE, FALSE, TRUE, UNKNOWN). Since one of the elements is TRUE, the condition is TRUE. (It's also possible that some short-circuiting takes place here, so it actually stops as soon as it hits the first TRUE and never evaluates 3=NULL.)
In B, I think it is evaluating the condition as NOT (3 in (1,2,null)). Testing 3 for equality against the set yields (FALSE, FALSE, UNKNOWN), which is aggregated to UNKNOWN. NOT ( UNKNOWN ) yields UNKNOWN. So overall the truth of the condition is unknown, which at the end is essentially treated as FALSE.
SQL uses three-valued logic for truth values. The IN query produces the expected result:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE col IN (NULL, 1)
-- returns first row
But adding a NOT does not invert the results:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE NOT col IN (NULL, 1)
-- returns zero rows
This is because the above query is equivalent of the following:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE NOT (col = NULL OR col = 1)
Here is how the where clause is evaluated:
| col | col = NULL⁽¹⁾ | col = 1 | col = NULL OR col = 1 | NOT (col = NULL OR col = 1) |
|-----|----------------|---------|-----------------------|-----------------------------|
| 1 | UNKNOWN | TRUE | TRUE | FALSE |
| 2 | UNKNOWN | FALSE | UNKNOWN⁽²⁾ | UNKNOWN⁽³⁾ |
Notice that:
The comparison involving NULL yields UNKNOWN
The OR expression where none of the operands are TRUE and at least one operand is UNKNOWN yields UNKNOWN (ref)
The NOT of UNKNOWN yields UNKNOWN (ref)
You can extend the above example to more than two values (e.g. NULL, 1 and 2) but the result will be same: if one of the values is NULL then no row will match.
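The truth table above can be reproduced by selecting each sub-expression as a column. The sketch assumes SQLite through Python's sqlite3 module, where TRUE, FALSE and UNKNOWN come back as 1, 0 and None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One output column per column of the truth table.
truth_table = conn.execute("""
    WITH tbl(col) AS (VALUES (1), (2))
    SELECT col,
           col = NULL,
           col = 1,
           (col = NULL OR col = 1),
           NOT (col = NULL OR col = 1)
    FROM tbl
    ORDER BY col
""").fetchall()

# The NOT IN form of the query: no row passes the WHERE clause.
matches = conn.execute("""
    WITH tbl(col) AS (VALUES (1), (2))
    SELECT col FROM tbl WHERE col NOT IN (NULL, 1)
""").fetchall()

for row in truth_table:
    print(row)
# (1, None, 1, 1, 0)
# (2, None, 0, None, None)
print(matches)  # []
```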
Null signifies an absence of data; that is, the value is unknown, not a data value of nothing. It's very easy for people from a programming background to confuse this, because in C-type languages null, when using pointers, is indeed nothing.
Hence in the first case 3 is indeed in the set of (1,2,3,null) so true is returned
In the second however you can reduce it to
select 'true' where 3 not in (null)
So nothing is returned because the parser knows nothing about the set to which you are comparing it - it's not an empty set but an unknown set. Using (1, 2, null) doesn't help because the (1,2) set is obviously false, but then you're and'ing that against unknown, which is unknown.
It may be concluded from answers here that NOT IN (subquery) doesn't handle nulls correctly and should be avoided in favour of NOT EXISTS. However, such a conclusion may be premature. In the following scenario, credited to Chris Date (Database Programming and Design, Vol 2 No 9, September 1989), it is NOT IN that handles nulls correctly and returns the correct result, rather than NOT EXISTS.
Consider a table sp to represent suppliers (sno) who are known to supply parts (pno) in quantity (qty). The table currently holds the following values:
VALUES ('S1', 'P1', NULL),
('S2', 'P1', 200),
('S3', 'P1', 1000)
Note that quantity is nullable i.e. to be able to record the fact a supplier is known to supply parts even if it is not known in what quantity.
The task is to find the suppliers who are known to supply part number 'P1' but not in quantities of 1000.
The following uses NOT IN to correctly identify supplier 'S2' only:
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND 1000 NOT IN (
SELECT spy.qty
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
);
However, the query below uses the same general structure with NOT EXISTS instead, and incorrectly includes supplier 'S1' in the result (i.e. the supplier for which the quantity is null):
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND NOT EXISTS (
SELECT *
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
AND spy.qty = 1000
);
So NOT EXISTS is not the silver bullet it may have appeared!
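To see the two results side by side, here is a sketch that loads the same three rows into an in-memory SQLite database (an assumption; the queries above are written for SQL Server) and runs both forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INT);
    INSERT INTO sp VALUES ('S1', 'P1', NULL), ('S2', 'P1', 200), ('S3', 'P1', 1000);
""")

# NOT IN: for S1 the test is "1000 NOT IN (NULL)", which is UNKNOWN, so S1 is excluded.
not_in = [row[0] for row in conn.execute("""
    SELECT DISTINCT spx.sno
    FROM sp spx
    WHERE spx.pno = 'P1'
      AND 1000 NOT IN (SELECT spy.qty
                       FROM sp spy
                       WHERE spy.sno = spx.sno AND spy.pno = 'P1')
    ORDER BY spx.sno
""")]

# NOT EXISTS: for S1 the inner "qty = 1000" is UNKNOWN, the subquery is empty,
# and NOT EXISTS is TRUE, so S1 is included.
not_exists = [row[0] for row in conn.execute("""
    SELECT DISTINCT spx.sno
    FROM sp spx
    WHERE spx.pno = 'P1'
      AND NOT EXISTS (SELECT *
                      FROM sp spy
                      WHERE spy.sno = spx.sno AND spy.pno = 'P1' AND spy.qty = 1000)
    ORDER BY spx.sno
""")]

print(not_in)      # ['S2']
print(not_exists)  # ['S1', 'S2']
```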
Of course, the source of the problem is the presence of nulls, therefore the 'real' solution is to eliminate those nulls.
This can be achieved (among other possible designs) using two tables:
sp suppliers known to supply parts
spq suppliers known to supply parts in known quantities
noting there should probably be a foreign key constraint where spq references sp.
The result can then be obtained using the 'minus' relational operator (being the EXCEPT keyword in Standard SQL) e.g.
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1' ),
( 'S2', 'P1' ),
( 'S3', 'P1' ) )
AS T ( sno, pno )
),
spq AS
( SELECT *
FROM ( VALUES ( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT sno
FROM spq
WHERE pno = 'P1'
EXCEPT
SELECT sno
FROM spq
WHERE pno = 'P1'
AND qty = 1000;
this is for Boy:
select party_code
from abc as a
where party_code not in (select party_code
from xyz
where party_code = a.party_code);
This works regardless of the ANSI settings.
It may also be useful to know the logical difference between join, exists and in:
http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

Select id from input list NOT present in database

With MySQL version 8.0:
CREATE TABLE cnacs(cid VARCHAR(20), PRIMARY KEY(cid));
Then,
INSERT INTO cnacs VALUES('1');
The first two statements execute successfully. The next statement does not, however. My goal is to return a list of unused cid's from the input table [1, 2]:
SELECT * FROM (VALUES ('1'),('2')) as T(cid) EXCEPT SELECT cid FROM cnacs;
In theory, I'd like the output to be '2', since it has not yet been added. The aforementioned query was inspired by Remus's answer on https://dba.stackexchange.com/questions/37627/identifying-which-values-do-not-match-a-table-row
This is at least the correct syntax for what you are trying to do.
If this query is anything more than a learning exercise, though, I'd rethink the approach and store these '1' and '2' values (or however many there end up being) in their own table.
SELECT Column_0
FROM (SELECT * FROM (VALUES ROW('1'), ROW('2')) TMP) VALS
LEFT JOIN cnacs
ON VALS.Column_0 = cnacs.cid
WHERE cnacs.cid IS NULL
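The anti-join pattern itself is portable. The sketch below runs it on SQLite through Python's sqlite3 module (an assumption: SQLite accepts a plain VALUES list where MySQL needs VALUES ROW(...)), next to the EXCEPT form from the question. The CTE name wanted is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cnacs (cid VARCHAR(20) PRIMARY KEY)")
conn.execute("INSERT INTO cnacs VALUES ('1')")

# Anti-join: keep input values that have no matching row in cnacs.
anti_join = conn.execute("""
    WITH wanted(cid) AS (VALUES ('1'), ('2'))
    SELECT wanted.cid
    FROM wanted
    LEFT JOIN cnacs ON cnacs.cid = wanted.cid
    WHERE cnacs.cid IS NULL
""").fetchall()

# Set difference: the same result expressed with EXCEPT.
set_difference = conn.execute("""
    WITH wanted(cid) AS (VALUES ('1'), ('2'))
    SELECT cid FROM wanted
    EXCEPT
    SELECT cid FROM cnacs
""").fetchall()

print(anti_join)       # [('2',)]
print(set_difference)  # [('2',)]
```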

Associated Name spiderweb

Say for instance I have the following entries in my table:
ID - 1
Name - Daryl
ID - 2
Name - Terry
ID - 3
Name - Dave
ID - 4
Name - Mitch
I eventually wish to search my table(s) for one specific name, but show all associated names. For instance,
Searching Daryl will return Terry, Dave & Daryl.
Searching Terry will return Dave, Daryl & Terry
Searching Mitch will only return Mitch.
The current table housing the names is as follows:
--
-- Table structure for table `members`
--
CREATE TABLE `members` (
`ID` int(255) NOT NULL,
`GuildID` int(255) NOT NULL,
`ToonName` varchar(255) NOT NULL,
`AddedOn` date NOT NULL,
`AddedByID` int(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
--
-- Dumping data for table `members`
--
INSERT INTO `members` (`ID`, `GuildID`, `ToonName`, `AddedOn`, `AddedByID`) VALUES
(1, 1, 'Daryl', '2020-01-17', 5),
(2, 1, 'Terry', '2020-01-17', 5),
(3, 1, 'Mitch', '2020-01-17', 5),
(4, 1, 'Dave', '2020-01-17', 5);
--
For Reference. GuildID will be a default search criteria based on the searchers login details. With a spiderweb like this, how would I go about creating another table (or another Column) to bring a combined search spiderweb structure based on the search criteria?
I was thinking something along the lines of:
CREATE TABLE `Associated`(
`ID` INT(255) NOT NULL,
`MainID` INT(255) NOT NULL,
`SecondaryID` INT(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `Associated` (`ID`, `MainID`, `SecondaryID`) VALUES
(1, 1, 2), -- Daryl Associated With Terry
(2, 1, 4); -- Daryl Associated With Dave
But I feel this will make an overcomplicated value structure with a lot of redundant inputs. Is there a more effective way to create a unified search?
The whole idea is that each name is individual, so certain entries can be put under Daryl or Terry acting alone. But one search should bring together all associated names: search one name, then pull together the total entries based on the aliases.
You can try this:
Select IFNULL(m.ToonName , members.ToonName) as ToonName
from members
LEFT JOIN Associated on Associated.MainID = members.ID
LEFT JOIN members as m on m.ID = Associated.SecondaryID
Where members.ToonName = "Mitch"
If "Mitch" has an entry in the Associated table, the query returns the associated name (for example, Daryl); when there is no associated ID, it returns the name from the members table.
If you run it with "Daryl", it returns two results:
Select IFNULL(m.ToonName , members.ToonName) as ToonName
from members
LEFT JOIN Associated on Associated.MainID = members.ID
LEFT JOIN members as m on m.ID = Associated.SecondaryID
Where members.ToonName = "Daryl"
In case you want all the names in a single column, you can use GROUP_CONCAT as @flash suggested in another answer.
You can directly get the data from the following SQL statement.
For individual rows:
SELECT `members`.ToonName FROM `associated` JOIN `members` ON associated.SecondaryID = members.ID WHERE `associated`.MainID = (SELECT ID FROM `members` WHERE ToonName = 'Daryl');
# Output: **ToonName**
Terry
Dave
Grouped into one row:
-- You can also join all rows with commas using the following statement
SELECT GROUP_CONCAT(`members`.ToonName) FROM `associated` JOIN `members` ON associated.SecondaryID = members.ID WHERE `associated`.MainID = (SELECT ID FROM `members` WHERE ToonName = 'Daryl');
# Output: **ToonName**
Terry,Dave
Plan A
Add a column to each member. It holds the number (or name) of the one group he/she belongs to. Terry, Dave & Daryl would get one value; Mitch would get a different value. Index the column for efficient lookup of related names.
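A minimal sketch of Plan A, run on SQLite through Python's sqlite3 module for convenience. The GroupID column, the index name and the helper group_names are assumptions introduced for illustration; the members table is reduced to the columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- GroupID is the hypothetical extra column: one value per cluster of names.
    CREATE TABLE members (
        ID INT PRIMARY KEY,
        ToonName VARCHAR(255) NOT NULL,
        GroupID INT NOT NULL
    );
    CREATE INDEX idx_members_group ON members (GroupID);
    INSERT INTO members VALUES
        (1, 'Daryl', 1), (2, 'Terry', 1), (3, 'Mitch', 2), (4, 'Dave', 1);
""")

def group_names(name):
    """Return every name that shares a group with the searched name."""
    rows = conn.execute("""
        SELECT m.ToonName
        FROM members AS searched
        JOIN members AS m ON m.GroupID = searched.GroupID
        WHERE searched.ToonName = ?
        ORDER BY m.ToonName
    """, (name,)).fetchall()
    return [row[0] for row in rows]

print(group_names("Daryl"))  # ['Daryl', 'Dave', 'Terry']
print(group_names("Mitch"))  # ['Mitch']
```

The trade-off is that each name can belong to only one group, so merging two groups means rewriting the GroupID of every member of one of them.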
Plan B
Implement a graph, as you suggested. Some tips: get rid of ID; instead have PRIMARY KEY(MainID, SecondaryID). There is an issue to resolve: this design implies a "directedness" of the relationships: Terry --> Dave, but not necessarily Dave --> Terry. If you want to force it to be symmetric, then either force two rows to be inserted, or insert the two IDs in a canonical order and check both directions when querying.
Also, you need to "walk the graph". This is best done with a Recursive CTE. For that feature, you need MySQL 8.0 or MariaDB 10.2.
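Walking the graph with a recursive CTE might look like the sketch below. It runs on SQLite via Python's sqlite3 module (an assumption; the syntax has not been tested on MySQL 8.0 or MariaDB here), treats every association as usable in both directions, and relies on UNION rather than UNION ALL to drop already-visited IDs so that loops terminate. The helper related_names and the CTE name linked are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (ID INT PRIMARY KEY, ToonName VARCHAR(255) NOT NULL);
    INSERT INTO members VALUES (1, 'Daryl'), (2, 'Terry'), (3, 'Mitch'), (4, 'Dave');
    CREATE TABLE Associated (
        MainID INT NOT NULL,
        SecondaryID INT NOT NULL,
        PRIMARY KEY (MainID, SecondaryID)
    );
    INSERT INTO Associated VALUES (1, 2), (1, 4);  -- Daryl-Terry, Daryl-Dave
""")

def related_names(name):
    """Return the searched name plus every name reachable through Associated."""
    rows = conn.execute("""
        WITH RECURSIVE linked(member_id) AS (
            -- Anchor: the member being searched for.
            SELECT ID FROM members WHERE ToonName = ?
            UNION
            -- Step: follow each association from either end to the other end.
            SELECT CASE WHEN a.MainID = linked.member_id
                        THEN a.SecondaryID ELSE a.MainID END
            FROM linked
            JOIN Associated a ON linked.member_id IN (a.MainID, a.SecondaryID)
        )
        SELECT ToonName
        FROM members
        WHERE ID IN (SELECT member_id FROM linked)
        ORDER BY ToonName
    """, (name,)).fetchall()
    return [row[0] for row in rows]

print(related_names("Terry"))  # ['Daryl', 'Dave', 'Terry']
print(related_names("Mitch"))  # ['Mitch']
```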
Plan C
Without the directedness of B, you run into more difficult issues. One is "cluster analysis". Another is messy paths and loops in the 'graph'. Let's avoid these.
Short answer: yes, you need a second table, Associated, and it will not make the structure complicated.
Below is the query to get the required result
SELECT ID, ToonName,
(
SELECT GROUP_CONCAT(ToonName) FROM Associated
JOIN members child ON SecondaryID = child.ID
WHERE MainID = parent.ID
)
FROM members parent
You can also use a join, but I think a subquery is better in this case.
NOTE: your tables need some optimization, such as removing the ID field from the Associated table, adding indexes, etc.

MySQL query all existent and non existent entries from list (inline table)

I have a MySQL database with a table of tag names. I have a list of tags I want to assign and need to check whether they are in the database or not. Therefore I want to write a query which gives me the ids of all tags in the list which are already present and the ones which are not present yet.
In SQLite I already managed to write this query, but as it contains a CTE it can't directly be converted to MySQL.
The SQLite query is:
WITH
check_tags(name) AS ( VALUES ("name1"), ("name2") )
SELECT check_tags.name, tags.id FROM check_tags
LEFT JOIN tags ON check_tags.name = tags.name
The result would be for example:
id | name
---------------
1 | name1
Null | name2
In MySQL it could be something with SELECT * FROM ( VALUES("name1"), ("name2") ) ... which I have seen for other database systems, but this also doesn't work with MySQL.
All these different SQL dialects make searching for help difficult.
The answer was to use an inline table as Aaron Kurtzhals pointed out.
My query now is:
CREATE TEMPORARY TABLE MyInlineTable (id INT, content VARCHAR(255) );
INSERT INTO MyInlineTable VALUES
(1, 'name1'),
(2, 'name2');
SELECT * from MyInlineTable LEFT JOIN tags on MyInlineTable.content = tags.name
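For comparison, the original CTE-based lookup can be driven from application code with bound parameters instead of a temporary table. This sketch runs on SQLite through Python's sqlite3 module; MySQL 8.0 also supports CTEs (and, from 8.0.19, a VALUES ROW(...) constructor), but the exact MySQL form is not tested here. The helper name lookup and the sample row in tags are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name VARCHAR(255));
    INSERT INTO tags (id, name) VALUES (1, 'name1');  -- sample existing tag
""")

def lookup(names):
    """Return (name, id) per requested tag; id is None when the tag is missing."""
    # One "(?)" row per name, so the values are bound rather than concatenated.
    placeholders = ", ".join("(?)" for _ in names)
    sql = f"""
        WITH check_tags(name) AS (VALUES {placeholders})
        SELECT check_tags.name, tags.id
        FROM check_tags
        LEFT JOIN tags ON check_tags.name = tags.name
        ORDER BY check_tags.name
    """
    return conn.execute(sql, list(names)).fetchall()

found = lookup(["name1", "name2"])
print(found)  # [('name1', 1), ('name2', None)]
```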

MySQL: Is it possible to "INSERT if number of rows with a specific value is less than X"?

To give a simple analogy, I have a table as follows:
id (PK) | gift_giver_id (FK) | gift_receiver_id (FK) | gift_date
Is it possible to update the table in a single query in such a way that would add a row (i.e. another gift for a person) only if the person has less than 10 gifts so far (i.e. less than 10 rows with the same gift_giver_id)?
The purpose of this would be to limit the table size to 10 gifts per person.
Thanks in advance.
Try:
insert into tablename
(gift_giver_id, gift_receiver_id, gift_date)
select GIVER_ID, RECEIVER_ID, DATE from Dual where
(select count(*) from tablename where gift_giver_id = GIVER_ID) < 10
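The conditional insert can be exercised end to end with a short sketch. It runs on SQLite through Python's sqlite3 module (an assumption; SQLite needs no Dual table), uses the table name gifts and the column names from the question, and wraps the statement in a hypothetical helper give that reports whether a row was actually inserted.

```python
import sqlite3

MAX_GIFTS = 10  # limit per giver, as in the question

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gifts (
        id INTEGER PRIMARY KEY,
        gift_giver_id INT,
        gift_receiver_id INT,
        gift_date TEXT
    )
""")

def give(giver, receiver, date):
    """Insert a gift only while the giver has fewer than MAX_GIFTS rows."""
    cur = conn.execute("""
        INSERT INTO gifts (gift_giver_id, gift_receiver_id, gift_date)
        SELECT ?, ?, ?
        WHERE (SELECT COUNT(*) FROM gifts WHERE gift_giver_id = ?) < ?
    """, (giver, receiver, date, giver, MAX_GIFTS))
    return cur.rowcount == 1

# Twelve attempts by the same giver: only the first ten should be stored.
outcomes = [give(1, 2, "2020-01-%02d" % day) for day in range(1, 13)]
stored = conn.execute(
    "SELECT COUNT(*) FROM gifts WHERE gift_giver_id = 1"
).fetchone()[0]

print(stored)  # 10
```

Because the count and the insert happen in one statement, no separate read-then-write step is needed in application code; concurrent writers on a real server would still need appropriate transaction isolation.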
"And would that also be, 'otherwise, update the fields in the oldest row'?"
That is a rather significant addendum :P
I wouldn't do something that complex in a single query. I'd select first to find the oldest row, and then either update or insert accordingly.
Not knowing what language you're working in other than SQL, I'll just stick to pseudocode for non-SQL portions.
-- Find the giver's oldest gift, but only once the limit of 10 has been reached
SELECT id FROM gifts
WHERE gift_giver_id = senderidvalue
AND (SELECT COUNT(*) FROM gifts WHERE gift_giver_id = senderidvalue) > 9
ORDER BY gift_date ASC
LIMIT 1;
{if result.row_count = 0 then}
INSERT INTO gifts (gift_giver_id, gift_receiver_id, gift_date)
VALUES (val1, val2, val3);
{else}
UPDATE gifts SET gift_giver_id = val1,
gift_receiver_id = val2, gift_date = val3
WHERE id = {result.first_row.id};
The problem with your request is you're trying to find a single query to perform a SELECT as well as either an INSERT or an UPDATE. Someone may well come along and call me out on this to prove me wrong but I think you're asking for the impossible unless you want to get into stored procedures.
I'm no SQL guru but I'm thinking something like the following should work: (assuming a table name of gifts):
INSERT INTO gifts (gift_giver_id, gift_receiver_id,gift_date)
SELECT DISTINCT senderidvalue,receiveridvalue,datevalue FROM gifts
WHERE (SELECT COUNT(*) FROM gifts WHERE gift_giver_id = senderidvalue ) < 10;
[edit] Code formatting doesn't like me :(