How to calculate the negation of a selection predicate of a SQL query? - mysql

I have a SQL query 'q' of the form:
Select attribute from table1, table2 where SC;
Here 'SC' is the conjunction of all the selection predicates in q.
In my case, SC is balance < 1000 and salary >= 50000
Now I want to calculate "NSC", the negation of SC in q, and need some help.

Per your post, you want to calculate "NSC", the negation of SC, where
SC is balance < 1000 and salary >= 50000
A conjunction is false as soon as either of its conditions fails, so the negation of SC is balance >= 1000 or salary < 50000
So, you can do like
Select attribute from table1 where balance >= 1000 or salary < 50000
(OR)
Select attribute from table1
where attribute not in
( select attribute from table1
where balance < 1000 and salary >= 50000 )

The general answer to your question is given by De Morgan's laws.
NOT(P AND Q) = NOT(P) OR NOT(Q)
and
NOT(P OR Q) = NOT(P) AND NOT(Q)
Your expression NOT(balance < 1000 AND salary >= 50000) becomes
balance >= 1000 OR salary < 50000
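De Morgan's laws apply to Boolean algebra with two-valued logic. If you have NULL values, things get more complicated: a comparison with NULL evaluates to UNKNOWN rather than true or false, and WHERE keeps only the rows for which the condition is true, so a row with a NULL balance may be returned by neither SC nor its negation. The way to go depends on the result you are expecting. A very useful function in this respect is COALESCE, which returns the first non-null argument passed to it. For example, to treat missing values as 0:
COALESCE(balance, 0) >= 1000 OR COALESCE(salary, 0) < 50000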
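To check these equivalences on actual rows, here is a minimal sketch using Python's built-in sqlite3 module. The table name accounts, its id column and the five rows are made up for illustration; only the predicates come from the question. It compares NOT(SC), the De Morgan form, a variant that flips the comparisons but keeps AND, and the COALESCE form.

```python
import sqlite3

# Hypothetical sample data: the table and rows are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (1, 500, 60000),   # satisfies SC
        (2, 1500, 60000),  # fails SC on balance
        (3, 500, 40000),   # fails SC on salary
        (4, 1500, 40000),  # fails SC on both
        (5, None, 60000),  # balance is NULL
    ],
)

def ids(where):
    """Return the sorted ids of the rows matching a WHERE clause."""
    rows = conn.execute(f"SELECT id FROM accounts WHERE {where} ORDER BY id")
    return [r[0] for r in rows]

sc = ids("balance < 1000 AND salary >= 50000")
not_sc = ids("NOT (balance < 1000 AND salary >= 50000)")
de_morgan = ids("balance >= 1000 OR salary < 50000")
# Flipping the comparisons but keeping AND only finds rows failing both.
flipped_and = ids("balance >= 1000 AND salary < 50000")
# With COALESCE, the NULL balance is treated as 0 on both sides.
sc_coalesce = ids("COALESCE(balance, 0) < 1000 AND COALESCE(salary, 0) >= 50000")
nsc_coalesce = ids("COALESCE(balance, 0) >= 1000 OR COALESCE(salary, 0) < 50000")

print(sc, not_sc, de_morgan, flipped_and, sc_coalesce, nsc_coalesce)
```

Row 5 appears in neither sc nor not_sc because its comparison on balance is UNKNOWN. Once COALESCE supplies a default, every row falls on exactly one side.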

Take your pick of two methods.
Aside: null
I will assume there are no NULLs in tables or expressions, because NULLs complicate things. In particular, if we want to treat NULL as if it were a value in conditions then:
v = w is true in SQL when v = w AND v IS NOT NULL AND w IS NOT NULL
v != w is true in SQL when v != w AND v IS NOT NULL AND w IS NOT NULL
SQL also has lots of special behaviour when there are nulls. Beware too that nulls get generated by SQL in things like OUTER JOIN and IN of a subquery. See SQL and Relational Theory: How to Write Accurate SQL Code, 2nd Edition.
1. not ( expression )
SQL doesn't have a comparison-negating "not". You can build an expression that is the not of another expression by rewriting it from the top down:
not (c AND d) is (not (c) OR not (d))
not (c OR d) is (not (c) AND not(d))
not (e NOT NOTable-operator o) is (e NOTable-operator o)
not (e NOTable-operator o) is (e NOT NOTable-operator o)
not (e NOT NOTable-function(f)) is (e NOTable-function(f))
not (e NOTable-function(f)) is (e NOT NOTable-function(f))
not (e != f) is (e = f)
not (e = f) is (e != f)
not (v < w) is (v >= w)
not NOT b is b -- boolean b
not b is (NOT (b)) -- boolean b
Now given SC you find expression not(SC) and you write:
SELECT ... FROM ... WHERE not(SC)
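The top-down rewriting can be sketched in a few lines of Python. The tuple representation of expressions and the helper names negate and to_sql are invented for this illustration, and the sketch assumes two-valued logic (no NULLs), as the answer does. It covers AND, OR, NOT and the six comparison operators only.

```python
# Each comparison operator maps to the operator that negates it.
FLIP = {"<": ">=", ">=": "<", ">": "<=", "<=": ">", "=": "!=", "!=": "="}

def negate(expr):
    """Return an expression equivalent to NOT expr, with NOT pushed down."""
    kind = expr[0]
    if kind == "and":   # not (c AND d) is (not c OR not d)
        return ("or", negate(expr[1]), negate(expr[2]))
    if kind == "or":    # not (c OR d) is (not c AND not d)
        return ("and", negate(expr[1]), negate(expr[2]))
    if kind == "not":   # not NOT b is b
        return expr[1]
    if kind == "cmp":   # flip the comparison operator
        _, op, left, right = expr
        return ("cmp", FLIP[op], left, right)
    raise ValueError(f"unsupported expression kind: {kind!r}")

def to_sql(expr):
    """Render the tuple form as a SQL condition string."""
    kind = expr[0]
    if kind in ("and", "or"):
        return f"({to_sql(expr[1])} {kind.upper()} {to_sql(expr[2])})"
    if kind == "not":
        return f"NOT ({to_sql(expr[1])})"
    _, op, left, right = expr
    return f"{left} {op} {right}"

# SC from the question: balance < 1000 AND salary >= 50000
sc = ("and", ("cmp", "<", "balance", "1000"), ("cmp", ">=", "salary", "50000"))
nsc_sql = to_sql(negate(sc))
print(nsc_sql)  # (balance >= 1000 OR salary < 50000)
```

Negating twice gives back the original expression, which is a quick sanity check on the rule table.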
2. Not not ( expression )
In standard SQL you could write the preceding query as:
(SELECT ... FROM ...)
EXCEPT
(SELECT ... FROM ... WHERE SC)
(In Oracle it's MINUS.) MySQL did not support EXCEPT before version 8.0.31. Where it is unavailable, s EXCEPT t can be written as
SELECT s.a,...
FROM s
LEFT JOIN t
ON s.a = t.a AND ...
WHERE t.a IS NULL
When s.a has no match in t, t.a IS NULL. So this returns only unmatched rows. So
SELECT ... FROM ... WHERE not(SC)
is:
SELECT s.a,...
FROM (SELECT ... FROM ...) s
LEFT JOIN (SELECT ... FROM ... WHERE SC) t
ON s.a = t.a AND ...
WHERE t.a IS NULL
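The three formulations can be compared on a small table. This is a sketch with Python's sqlite3 module (SQLite supports EXCEPT); the accounts table, its unique id column and the four rows are assumptions for illustration, and the columns are declared NOT NULL to match the no-NULLs assumption above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, salary INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 500, 60000), (2, 1500, 60000), (3, 500, 40000), (4, 1500, 40000)],
)

def column(sql):
    """Run a query and return its first column as a list."""
    return [r[0] for r in conn.execute(sql)]

# 1. WHERE not(SC), written directly.
direct = column(
    "SELECT id FROM accounts "
    "WHERE NOT (balance < 1000 AND salary >= 50000) ORDER BY id"
)

# 2. All rows EXCEPT the rows satisfying SC.
with_except = column("""
    SELECT id FROM accounts
    EXCEPT
    SELECT id FROM accounts WHERE balance < 1000 AND salary >= 50000
    ORDER BY id
""")

# 3. The same difference as a LEFT JOIN that keeps only unmatched rows.
anti_join = column("""
    SELECT s.id
    FROM accounts s
    LEFT JOIN (SELECT id FROM accounts
               WHERE balance < 1000 AND salary >= 50000) t
      ON s.id = t.id
    WHERE t.id IS NULL
    ORDER BY s.id
""")

print(direct, with_except, anti_join)
```

The join condition uses the unique id here. With non-unique or nullable columns the three queries can differ, since EXCEPT removes duplicates and the join does not.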

Related

SQL query: how can I remove duplicates regardless of the column order?

Here is the table called flight info.
departure arrival
A B
C A
A C
C A
B C
C D
D B
A C
B A
The output should be:
departure Arrival
A B
A C
B C
C D
D B
I tried to use GROUP BY on both columns. However, I cannot find a way to identify the same letters appearing in a different order in the two columns. Please help me out. Thank you so much, I appreciate it.
[I prefer MYSQL solution]
select distinct
case when departure<=arrival then departure else arrival end as X,
case when departure> arrival then departure else arrival end as Y
from T
You can use the query below.
First, concatenate both columns in a canonical order:
SELECT departure,arrival,
CASE WHEN departure>arrival THEN CONCAT(arrival,departure) ELSE CONCAT(departure,arrival) END c
FROM t1
ORDER BY c
Then keep only the first row of each run of equal c values. If the data has A,B and B,A, the first row keeps its value of c and the following rows get NULL, much like a ranking:
SELECT departure,arrival,@winrank := IF( @cvalue IS NULL ,c, IF(@cvalue=c,NULL,c)) AS r ,
@cvalue :=c AS r1
FROM (
SELECT departure,arrival,
CASE WHEN departure>arrival THEN CONCAT(arrival,departure) ELSE CONCAT(departure,arrival) END c
FROM t1
ORDER BY c) t , (SELECT @cvalue :=NULL) r
And the final query is
SELECT departure,arrival FROM (
SELECT departure,arrival,@winrank := IF( @cvalue IS NULL ,c, IF(@cvalue=c,NULL,c)) AS r ,
@cvalue :=c AS r1
FROM (
SELECT departure,arrival,
CASE WHEN departure>arrival THEN CONCAT(arrival,departure) ELSE CONCAT(departure,arrival) END c
FROM t1
ORDER BY c) t , (SELECT @cvalue :=NULL) r) f
WHERE f.r IS NOT NULL
One method is:
select distinct departure, arrival
from t
where departure < arrival or
not exists (select 1
from t t2
where t2.arrival = t.departure and t2.departure = t.arrival
);
The first condition selects all rows where the departure is smaller than the arrival. The second adds in the pairs where the inverse is not in the table.
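As a rough check, two of the approaches can be run against the sample data from the question with Python's sqlite3 module. The table name flights is an assumption. The first query normalises every pair so the smaller value comes first. The second is a NOT EXISTS query that keeps a pair in its original orientation when the reverse pair is absent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("A", "B"), ("C", "A"), ("A", "C"), ("C", "A"), ("B", "C"),
     ("C", "D"), ("D", "B"), ("A", "C"), ("B", "A")],
)

# Approach 1: put the smaller value first, then de-duplicate.
normalised = conn.execute("""
    SELECT DISTINCT
        CASE WHEN departure <= arrival THEN departure ELSE arrival END AS x,
        CASE WHEN departure >  arrival THEN departure ELSE arrival END AS y
    FROM flights
    ORDER BY x, y
""").fetchall()

# Approach 2: keep rows already in ascending order, plus rows whose
# reverse pair never occurs in the table.
oriented = conn.execute("""
    SELECT DISTINCT departure, arrival
    FROM flights f
    WHERE departure < arrival
       OR NOT EXISTS (SELECT 1 FROM flights r
                      WHERE r.arrival = f.departure
                        AND r.departure = f.arrival)
    ORDER BY departure, arrival
""").fetchall()

print(normalised)
print(oriented)
```

The two results differ only for the D,B flight: the first approach reports it as B,D, while the second keeps D,B as in the expected output of the question.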

SQL Query - Combined rows with selective columns [MySQL]

I have the following table:
Game | Name | Result | Stage
1 A W F
1 B L 0
2 C L F
2 D W 0
3 E L 0
3 F W 0
The output I am looking for:
Game | Name | Result | Stage
1 A W F
2 D W F
I only want to see the winners (W) from the results of stage F.
I can do this via joins (which isn't very fast):
SELECT *
FROM (
SELECT *
FROM MyTable
WHERE Stage = 'F'
) AA
JOIN MyTable
ON AA.Game = MyTable.Game AND AA.Result <> MyTable.Result
..but I am wondering if there is an easier and more efficient way to do it. Plus this requires I do some more filtering afterwards.
Thanks in advance!
To perform a job of this sort without a self-join or an equivalent, you would want to use SQL window functions, which MySQL did not support before version 8.0. The join you are using is not too bad, but this would be a little simpler:
SELECT
players.Game AS Game,
players.Name AS Name,
'W' AS Result,
'F' as Stage
FROM
MyTable stage
JOIN MyTable players
ON stage.Game = players.Game
WHERE
stage.stage = 'F'
AND players.result = 'W'
With "The winners only from stage F" you only need:
SELECT * FROM MyTable WHERE stage="F" and result="W";
Your own result example however also shows name "D" which is not a winner in stage F.
If you only want to see the winners (W) from the results of stage F you don't need a join. The following statement will work:
SELECT * FROM MyTable where Stage='F' AND Result= 'W'
You probably need a subquery :
SELECT
*
FROM
MyTable
WHERE
Result = 'W'
AND Game IN (SELECT Game FROM MyTable WHERE Stage = 'F')
SELECT x.Game
, y.Name
FROM my_table x
JOIN my_table y
ON y.game = x.game
AND y.result = 'w'
WHERE x.stage = 'f';
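The answers above read the question in two different ways, and the difference shows up on the sample data. The following sketch loads the table from the question into SQLite through Python's sqlite3 module (storing Stage as text is an assumption) and runs the plain filter, the IN subquery and the self-join side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Game INTEGER, Name TEXT, Result TEXT, Stage TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, "A", "W", "F"), (1, "B", "L", "0"),
     (2, "C", "L", "F"), (2, "D", "W", "0"),
     (3, "E", "L", "0"), (3, "F", "W", "0")],
)

# Reading 1: rows that are themselves both stage F and a win.
plain_filter = conn.execute("""
    SELECT Game, Name FROM MyTable
    WHERE Stage = 'F' AND Result = 'W'
    ORDER BY Game
""").fetchall()

# Reading 2: winners of any game that has a stage F row (IN subquery).
in_subquery = conn.execute("""
    SELECT Game, Name FROM MyTable
    WHERE Result = 'W'
      AND Game IN (SELECT Game FROM MyTable WHERE Stage = 'F')
    ORDER BY Game
""").fetchall()

# Reading 2 again, written as a self-join.
self_join = conn.execute("""
    SELECT players.Game, players.Name
    FROM MyTable stage
    JOIN MyTable players ON stage.Game = players.Game
    WHERE stage.Stage = 'F' AND players.Result = 'W'
    ORDER BY players.Game
""").fetchall()

print(plain_filter, in_subquery, self_join)
```

Only the second reading returns player D, which matches the output requested in the question.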

JOIN using MIN() where MIN() is greater than value on left side of join

I'm trying to do some conversion analysis on a load of existing data that we have in a few SQL databases.
The data structure itself is very simple. It's just a list of actors (think user_id) and the name of something they did. It looks like this (there's other data, but that will not be used in this query):
CREATE TABLE views(
project_id integer not null,
name varchar(128) not null,
datetime timestamp not null,
actor varchar(256) not null
)
The goal is standard conversion analysis stuff. Number of people who did action A, then did B, C, D, E etc, and the average time between steps.
For clarity, the funnel steps dictate order, but not exclusivity. For example, a funnel looking for names A, B, C should include an actor whose sequence was B, A, B, D, C (as that contains an A, then later a B, then later a C, even though there are steps in between).
Currently I am querying this table using the following (Each join represents the next step in the conversion funnel):
SELECT count(actor), count(span2), avg(span2), count(span3), avg(span3), count(span4), avg(span4), count(span5), avg (span5)
FROM
(
SELECT e1.actor,
DATEDIFF(SECOND, MIN(e1.datetime), MIN(e2.datetime)) AS span2,
DATEDIFF(SECOND, MIN(e2.datetime), MIN(e3.datetime)) AS span3,
DATEDIFF(SECOND, MIN(e3.datetime), MIN(e4.datetime)) AS span4,
DATEDIFF(SECOND, MIN(e4.datetime), MIN(e5.datetime)) AS span5
FROM views AS e1
LEFT JOIN (SELECT actor, MIN(datetime) as datetime FROM views WHERE name = 'Action 2' group by actor) as e2 ON e1.actor = e2.actor AND e2.datetime > e1.datetime
LEFT JOIN (SELECT actor, MIN(datetime) as datetime FROM views WHERE name = 'Action 3' group by actor) as e3 ON e1.actor = e3.actor AND e3.datetime > e2.datetime
LEFT JOIN (SELECT actor, MIN(datetime) as datetime FROM views WHERE name = 'Action 4' group by actor) as e4 ON e1.actor = e4.actor AND e4.datetime > e3.datetime
LEFT JOIN (SELECT actor, MIN(datetime) as datetime FROM views WHERE name = 'Action 5' group by actor) as e5 ON e1.actor = e5.actor AND e5.datetime > e4.datetime
WHERE e1.project_id = 1 and e1.name = 'Action 1'
GROUP BY e1.actor
) AS aggregates
This is quite fast on the data set (<1s on 10M rows). The problem is that it is not actually the correct result. The sub selected joins are asking for MIN(datetime) each time. If an actor sequence happens in the order B, A, B this will not be counted as MIN(A) is greater than MIN(B).
Given a set of actors, who have performed a list of views, I need to check each actor to see if they have performed view A, then later view B, then later view C, regardless of any steps they did in the middle. B, A, B, C qualifies, A, B, B, C qualifies, A, B, Z, C qualifies, A, Z, C does not
To query this "properly" I can remove the MIN(datetime) in the sub joins, and do the MIN() outside the join. This however takes an extremely long time as each row is then joined multiple times for each funnel step (steps are often repeated out of order). The cross product in this case is huge - 21 quadrillion rows says the query planner! (21,666,755,307,950,608). That's obviously no longer a sub 1 second query.
What I want to achieve is a join where the join happens on the MIN value, but where the MIN value is the "MIN value greater than the previous join step". I.e. for step A to B, the B.datetime is the single MIN B.datetime that is still greater than A.datetime. Something like (not valid SQL!):
....
LEFT JOIN (SELECT actor, datetime FROM views WHERE name = 'Action 2') as e2
ON e1.actor = e2.actor AND e2.datetime > e1.datetime HAVING MIN(e.datetime)
....
Any suggestions on how this can be achieved?
Functions specific to either MySQL or PostgreSQL are fine if suitable.
I would suggest just looking at all transition times. Here is how you can do this in SQL:
SELECT prevName, name, count(*) as NumTransitions,
avg(DATEDIFF(SECOND, prevdatetime, "datetime"))
FROM (SELECT e1.actor, "datetime", name,
lag(name) over (partition by actor order by "datetime") as prevName,
lag("datetime") over (partition by actor order by "datetime") as prevDateTime
FROM views AS e1
WHERE e1.project_id = 1
) t
GROUP BY prevName, name;
If you want the number of "actors" for each transition, you can add count(distinct actor).
At first glance I think the SQL you're using for the inline views is to blame.
Please try the query below. If it does not work, could you post some example data and the desired results?
SELECT count(actor), count(span2), avg(span2), count(span3), avg(span3), count(span4), avg(span4), count(span5), avg (span5)
FROM
(
SELECT e1.actor,
DATEDIFF(SECOND, MIN(e1.datetime), MIN(e2.datetime)) AS span2,
DATEDIFF(SECOND, MIN(e2.datetime), MIN(e3.datetime)) AS span3,
DATEDIFF(SECOND, MIN(e3.datetime), MIN(e4.datetime)) AS span4,
DATEDIFF(SECOND, MIN(e4.datetime), MIN(e5.datetime)) AS span5
FROM views as e1, views as e2, views as e3, views as e4, views as e5
where e1.actor = e2.actor and e1.actor = e3.actor and e1.actor = e4.actor and e1.actor = e5.actor
and e2.datetime = (select min(x.datetime) from views x where x.name = 'Action 2' and x.actor = e2.actor and x.datetime > e1.datetime)
and e3.datetime = (select min(x.datetime) from views x where x.name = 'Action 3' and x.actor = e3.actor and x.datetime > e2.datetime)
and e4.datetime = (select min(x.datetime) from views x where x.name = 'Action 4' and x.actor = e4.actor and x.datetime > e3.datetime)
and e5.datetime = (select min(x.datetime) from views x where x.name = 'Action 5' and x.actor = e5.actor and x.datetime > e4.datetime)
and e1.project_id = 1 and e1.name = 'Action 1'
GROUP BY e1.actor
) AS aggregates
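The idea of taking, at each step, the earliest event that is later than the previous step can be tried on a toy data set. The sketch below uses Python's sqlite3 module with a simplified views table: project_id is dropped, the timestamp column is called ts and holds integer seconds, and the four actors and their events are invented to mirror the sequences listed in the question. It illustrates the chained "MIN greater than previous" lookup for three steps and says nothing about performance on 10M rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (actor TEXT, name TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO views VALUES (?, ?, ?)",
    [
        ("u1", "B", 1), ("u1", "A", 2), ("u1", "B", 3), ("u1", "C", 4),  # B, A, B, C
        ("u2", "A", 1), ("u2", "B", 2), ("u2", "B", 3), ("u2", "C", 5),  # A, B, B, C
        ("u3", "A", 1), ("u3", "Z", 2), ("u3", "C", 3),                  # A, Z, C
        ("u4", "A", 1), ("u4", "B", 5), ("u4", "Z", 6), ("u4", "C", 9),  # A, B, Z, C
    ],
)

# Each step looks up the earliest matching event after the previous step.
# If the previous step is NULL, the comparison never matches and the
# lookup yields NULL as well, so an incomplete funnel stays incomplete.
funnel = conn.execute("""
    WITH s1 AS (
        SELECT actor, MIN(ts) AS t1
        FROM views WHERE name = 'A' GROUP BY actor
    ),
    s2 AS (
        SELECT s1.actor, s1.t1,
               (SELECT MIN(v.ts) FROM views v
                WHERE v.actor = s1.actor AND v.name = 'B' AND v.ts > s1.t1) AS t2
        FROM s1
    ),
    s3 AS (
        SELECT s2.actor, s2.t1, s2.t2,
               (SELECT MIN(v.ts) FROM views v
                WHERE v.actor = s2.actor AND v.name = 'C' AND v.ts > s2.t2) AS t3
        FROM s2
    )
    SELECT actor, t1, t2, t3 FROM s3 ORDER BY actor
""").fetchall()

# Time from step A to step B for the actors who reached B.
span2 = [t2 - t1 for _, t1, t2, _ in funnel if t2 is not None]
print(funnel, span2)
```

Actor u1 is counted even though the first B precedes the first A, and u3 stops at step A because no B follows it.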

Linq to SQL using IQueryable API for 1 to many relationship

Assume A is a parent table with many B records. Essentially I need LINQ to SQL to generate this query:
Select * from A
Join B on B.Id = A.Id
where A.OtherId in (0,1,2,3)
and B.DateTime >= '2011-02-03 00:30:00.000'
and A.TypeId = 1
order by B.DateTime
The LINQ I have looks like this:
List<string> programIds = new List<string> { "0", "1", "2", "3" };
IQueryable<A> query = db.As;
query = query.Where(a => programIds.Contains(a.ProgramId));
query = query.Where(a => a.B.Any(b => b.DateTime >= new DateTime(2011, 2, 3, 0, 30, 0)));
The problem begins on this last statement, the generated query then looks like this:
?query
{SELECT *
FROM [dbo].[A] AS [A]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[B] AS [B]
WHERE ([B].[AirsOnStartDateTime] >= @p0) AND ([B].[Id] = [A].[Id])
)) AND ((CONVERT(BigInt,[A].[OtherId])) IN (@p1, @p2, @p3, @p4))
}
Any ideas?
Sorry, I'm typing this up in notepad and can't verify the syntax at the moment, but I think something like this is what you are looking for:
List<string> programIds = new List<string> { "0", "1", "2", "3" };
var query = from a in db.As
from b in db.Bs
where programIds.Contains(a.ProgramID)
&& a.TypeID == 1
&& a.ID == b.ID
&& b.DateTime >= new DateTime(2011, 2, 3, 0, 30, 0)
select new
{
a....
b....
....
}
Just fill in the fields you want to return in the anonymous type select section.
Also, I made an assumption on the db.Bs being the way to get the B values out of your table... Fix that as appropriate.

L2E: GroupBy always gets translated into Distincts, never Group By

How can I get the Linq-to-Entities provider to truly perform a GROUP BY? No matter what I do, it always generates SQL that is far slower than an actual GROUP BY. For example:
var foo = (from x in Context.AccountQuantities
where x.AccountID == 27777
group x by x.StartDate
into groups
select new
{
groups.FirstOrDefault().StartDate,
Quantity = groups.Max(y => y.Quantity)
}).ToList();
translates to:
SELECT
1 AS [C1],
[Project4].[C1] AS [C2],
[Project4].[C2] AS [C3]
FROM ( SELECT
[Project3].[C1] AS [C1],
(SELECT
MAX([Extent3].[Quantity]) AS [A1]
FROM [dbo].[AccountQuantities] AS [Extent3]
WHERE (27777 = [Extent3].[AccountID]) AND ([Project3].[StartDate] = [Extent3].[StartDate])) AS [C2]
FROM ( SELECT
[Distinct1].[StartDate] AS [StartDate],
(SELECT TOP (1)
[Extent2].[StartDate] AS [StartDate]
FROM [dbo].[AccountQuantities] AS [Extent2]
WHERE (27777 = [Extent2].[AccountID]) AND ([Distinct1].[StartDate] = [Extent2].[StartDate])) AS [C1]
FROM ( SELECT DISTINCT
[Extent1].[StartDate] AS [StartDate]
FROM [dbo].[AccountQuantities] AS [Extent1]
WHERE 27777 = [Extent1].[AccountID]
) AS [Distinct1]
) AS [Project3]
) AS [Project4]
How can I get it to execute this instead?
SELECT
AccountQuantities.StartDate,
MAX(AccountQuantities.Quantity)
FROM AccountQuantities
WHERE AccountID=27777
GROUP BY StartDate
Again, note the lack of ANY GROUP BY in what's executed. I'm fine with EF optimizing things non-ideally, but this is orders of magnitude slower, I cannot find any way to convince it to really do a GROUP BY, and it is a major problem for us!
Use groups.Key instead of groups.FirstOrDefault().StartDate. The key is the grouping value itself, so no per-group lookup is needed to fetch it:
var foo = (from x in Context.AccountQuantities
where x.AccountID == 27777
group x by x.StartDate
into groups
select new
{
groups.Key,
Quantity = groups.Max(y => y.Quantity)
}).ToList();