SQL exclude rows matching all of multiple criteria - mysql

I currently have a query merging two tables to create a new one for analysis. After getting some funny results when trying to chart it for presentation, I learned that some of it is fake data that was never cleaned up. I've been able to identify the data causing the problems and, for the sake of time, would like to exclude it within the query so I can move ahead with the analysis.
This fake data matches ALL these criteria:
rate_type = Standard
client_net_cleared = 0
program is blank (not Null)
I identified these in SELECT with a CASE statement, but realized that to make any use of that I'd have to do another table querying everything in this one minus what's identified as meeting the above criteria based on the CASE statement. There has to be a better solution than that.
I'm currently trying to exclude these as part of the WHERE statement, but read around other question topics and found out WHERE is not very good at managing multiple sub-conditions.
What I have:
SELECT *
, CASE WHEN tad.rate_type = 'Standard'
AND tad.client_net_cleared = '0'
AND program= '' THEN 1
ELSE '0'
END AS noise
FROM tableau.km_tv_ad_data_import tad
JOIN tableau.km_tv_ad_report ga
ON ga.session_timestamp >= tad.timestamp - INTERVAL '4 minute'
AND ga.session_timestamp <= tad.timestamp + INTERVAL '5 minute'
AND ga.session_timestamp != tad.timestamp
WHERE tad.timestamp >= '2016-09-01'
AND (tad.rate_type != 'Standard'
AND tad.client_net_cleared != '0'
AND tad.program != '')
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21
Sample data set:
timestamp | rate_type | program | client_net_cleared | noise
---------------------|-----------|-----------------|--------------------|-------
2016-11-01 18:00:00 | Standard | Diving | 50 | 0
2016-12-01 21:00:00 | Holiday | Classic Albums | 100 | 0
2016-11-01 09:00:00 | FireSale | Panorama | 0 | 0
2016-10-01 12:00:00 | Standard | | 0 | 1
2016-12-01 15:00:00 | Holiday | MythBusters | 100 | 0
2016-10-01 13:00:00 | FireSale | House | 200 | 0
What I need:
Exclude rows matching ALL three criteria: rate_type = Standard, client_net_cleared = 0, program is blank (not Null).

The correct condition is:
AND NOT (tad.rate_type = 'Standard'
AND tad.client_net_cleared = '0'
AND tad.program = '')
By De Morgan's law, this is equivalent to:
AND (tad.rate_type != 'Standard'
OR tad.client_net_cleared != '0'
OR tad.program != '')
This is like your query, except notice that it uses OR, not AND.
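A quick way to check the equivalence is to run both forms against the sample rows from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so the example is self-contained; the table name `tad`, the column name `ts`, and the trimmed column list are illustrative assumptions, and client_net_cleared is assumed to be numeric.

```python
import sqlite3

# Sample rows from the question: (ts, rate_type, program, client_net_cleared)
ROWS = [
    ("2016-11-01 18:00:00", "Standard", "Diving", 50),
    ("2016-12-01 21:00:00", "Holiday", "Classic Albums", 100),
    ("2016-11-01 09:00:00", "FireSale", "Panorama", 0),
    ("2016-10-01 12:00:00", "Standard", "", 0),  # the "noise" row
    ("2016-12-01 15:00:00", "Holiday", "MythBusters", 100),
    ("2016-10-01 13:00:00", "FireSale", "House", 200),
]

NOT_AND = "NOT (rate_type = 'Standard' AND client_net_cleared = 0 AND program = '')"
OR_FORM = "(rate_type != 'Standard' OR client_net_cleared != 0 OR program != '')"
AND_FORM = "(rate_type != 'Standard' AND client_net_cleared != 0 AND program != '')"


def kept_programs(condition):
    """Return the program names that survive the given WHERE condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tad (ts TEXT, rate_type TEXT, program TEXT, "
        "client_net_cleared INTEGER)"
    )
    conn.executemany("INSERT INTO tad VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        f"SELECT program FROM tad WHERE {condition} ORDER BY ts"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(kept_programs(NOT_AND))   # drops only the blank-program noise row
print(kept_programs(OR_FORM))   # same rows as NOT_AND
print(kept_programs(AND_FORM))  # the original AND version drops far too much
```

The AND version keeps a row only if it differs from the noise pattern in every column, which is why legitimate rows such as Standard/Diving/50 disappear with it.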

You can also use a subquery in the WHERE clause to exclude rows with NOT IN. For example, the query below returns all qualifications from one vendor, and the NOT IN subquery excludes employees who also hold qualifications from other vendors:
select * from qualification q
inner join certification c on c.id = q.certificationid
where c.vendorid = 3 and
employeeid not in
(
select employeeid from qualification q
inner join
certification c on c.id = q.certificationid
where c.vendorid <> 3
)

Related

Create a query that fetches rows plus their adjacent rows

I am having problems with a specific query - respectively creating the query in the first place.
The columns can be reduced to id, seconds and status.
=============================
| id | seconds | status |
-----------------------------
| 0 | 0 | 0 |
| 1 | 12 | 1 |
| 2 | 25 | 0 |
| 3 | 37 | 1 |
| 4 | 42 | 0 |
=============================
What I'd like to have: All entries with status = 1 PLUS all entries that are less than 10 seconds away from those entries. Basically, I want to fetch all possible pairs (or triplets, etc.) of rows to check manually (later automatically) whether they need to be paired (there is a column parent_id for this purpose, but we don't need that for the query). I could do this in code (first select all status=1, then loop), but I wonder whether it is possible to do this purely in the database.
Thus, my desired output would be the following:
=============================
| id | seconds | status |
-----------------------------
| 1 | 12 | 1 | <- status = 1
| 3 | 37 | 1 | <- status = 1
| 4 | 42 | 0 | <- only 5 seconds after status = 1
=============================
My current best guess is this:
SELECT * FROM entries e0
WHERE
e0.status = 1 OR
e0.status = 0 AND
0 < (SELECT count(*)
FROM entries e1
WHERE e1.status = 1 AND abs(e1.seconds - e0.seconds) < 10)
But this fetches the whole table, and I don't really know why - and it takes a long time to do so (there is an index on the column seconds, the table has 9000 entries).
Is there a way to do this (maybe even efficiently)?
Here's one option with union all and exists:
select * from entries where status = 1
union all
select * from entries e where status = 0 and
exists (select 1
from entries e2
where e2.status = 1 and
abs(e.seconds - e2.seconds) < 10
)
SQL Fiddle Demo
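For a check that does not depend on the fiddle, the same UNION ALL / EXISTS query can be run against the five sample rows. This is a minimal sketch that assumes SQLite (via Python's sqlite3 module) behaves like MySQL for this query; the table definition is inferred from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER, seconds INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [(0, 0, 0), (1, 12, 1), (2, 25, 0), (3, 37, 1), (4, 42, 0)],
)

QUERY = """
SELECT * FROM entries WHERE status = 1
UNION ALL
SELECT * FROM entries e
WHERE status = 0
  AND EXISTS (SELECT 1
              FROM entries e2
              WHERE e2.status = 1
                AND abs(e.seconds - e2.seconds) < 10)
"""

# Sort in Python so the comparison does not depend on row order.
result = sorted(conn.execute(QUERY).fetchall())
print(result)  # [(1, 12, 1), (3, 37, 1), (4, 42, 0)]
```

Only id 4 qualifies among the status = 0 rows, because it is 5 seconds away from id 3; ids 0 and 2 are at least 12 seconds from every status = 1 row.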
Alternatively you could use an outer join with distinct instead of exists:
select distinct e.*
from entries e
left join entries e2 on e2.status = 1
where e.status = 1 or abs(e.seconds - e2.seconds) < 10
More Fiddle
I prefer to do it in a single query, although there are also ways of doing it with exists or subqueries. With an outer join you can grab everything at once using carefully crafted join and where clauses; adding a group by or distinct, depending on your performance situation, then tidies up the results so each row is unique.
My suggestion for where clauses is to use parentheses to establish your intended precedence. It makes your intentions clearer to anyone reading the code.
WHERE Condition1 = True OR Condition2 = True AND Condition3 = True
Should be
WHERE Condition1 = True OR (Condition2 = True AND Condition3 = True)
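For what it's worth, SQL gives AND higher precedence than OR, so the two forms above are parsed the same way and the parentheses only make the grouping explicit. A small sketch of that, assuming SQLite (through Python's sqlite3 module) as a stand-in for the database in question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Evaluate a boolean SQL expression and return 1 or 0."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


unparenthesized = evaluate("1 OR 0 AND 0")
and_first = evaluate("1 OR (0 AND 0)")   # how the parser groups it
or_first = evaluate("(1 OR 0) AND 0")    # the other possible reading

print(unparenthesized, and_first, or_first)  # 1 1 0
```

If OR were evaluated first the unparenthesized expression would be 0, so the output shows AND binding tighter.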
Oddly, from past experience I would not have expected it to evaluate in the manner you mention, but then again I ALWAYS use parentheses to establish precedence, both for clarity and because it makes more complex conditions easier to craft.
The reason you are getting the whole table is the data in your table. Sometimes we go looking for an answer and make it complicated. I prefer my way of writing the query, but on your example result set my query and yours return the same results. Try lowering the 10 seconds to 1, 2, 3, etc. and see how the result of your query changes. My assumption is that in your full dataset every record with a status of 0 is within 10 seconds of a record with a status of 1. I would have commented instead, but this is one of the first questions I have answered.
Here is some example code based on your dataset and query.
DECLARE @Entries AS TABLE (
Id INT
,Seconds INT
,[Status] BIT
)
INSERT INTO @Entries (Id, Seconds, [Status])
VALUES (0,0,0 )
,(1,12,1 )
,(2,25,0 )
,(3,37,1 )
,(4,42,0 )
SELECT *
FROM
@Entries e0
WHERE
e0.Status = 1
OR e0.Status = 0
AND 0 < (SELECT count(*)
FROM
@Entries e1
WHERE e1.Status = 1 AND ABS(e1.Seconds - e0.Seconds) < 10)
SELECT DISTINCT
e0.*
FROM
@Entries e0
LEFT JOIN @Entries e1
ON e1.[Status] = 1
AND ABS(e1.seconds - e0.seconds) < 10
WHERE
e0.[Status] = 1
OR e1.id IS NOT NULL

mysql for percentage between rows

I have some sql that looks like this:
SELECT
stageName,
count(*) as `count`
FROM x2production.contact_stages
WHERE FROM_UNIXTIME(createDate) between '2016-05-01' AND DATE_ADD('2016-08-31', INTERVAL 1 DAY)
AND (stageName = 'DI-Whatever' OR stageName = 'DI-Quote' or stageName = 'DI-Meeting')
Group by stageName
Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Whatever')
This produces a table that looks like:
+-------------+-------+
| stageName | count |
+-------------+-------+
| DI-quote | 1230 |
| DI-Meeting | 985 |
| DI-Whatever | 325 |
+-------------+-------+
Question:
I would like a percentage from one row to the next. For example the percentage of DI-Meeting to DI-quote. The math would be 100*985/1230 = 80.0%
So in the end the table would look like so:
+-------------+-------+------+
| stageName | count | perc |
+-------------+-------+------+
| DI-quote | 1230 | 0 |
| DI-Meeting | 985 | 80.0 |
| DI-Whatever | 325 | 32.9 |
+-------------+-------+------+
Is there any way to do this in mysql?
Here is an SQL fiddle to mess w/ the data: http://sqlfiddle.com/#!9/61398/1
The query
select stageName,count,if(rownum=1,0,round(count/toDivideBy*100,3)) as percent
from
( select stageName,count,greatest(@rn:=@rn+1,0) as rownum,
coalesce(if(@rn=1,count,@prev),null) as toDivideBy,
@prev:=count as dummy2
from
( SELECT
stageName,
count(*) as `count`
FROM Table1
WHERE FROM_UNIXTIME(createDate) between '2016-05-01' AND DATE_ADD('2016-08-31', INTERVAL 1 DAY)
AND (stageName = 'DI-Underwriting' OR stageName = 'DI-Quote' or stageName = 'DI-Meeting')
Group by stageName
Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Underwriting')
) xDerived1
cross join (select @rn:=0,@prev:=-1) as xParams1
) xDerived2;
Results
+-----------------+-------+---------+
| stageName | count | percent |
+-----------------+-------+---------+
| DI-Quote | 16 | 0 |
| DI-Meeting | 13 | 81.250 |
| DI-Underwriting | 4 | 30.769 |
+-----------------+-------+---------+
Note, you want a 0 as the percent for the first row. That is easily changed to 100.
The cross join brings in the variables for use and initializes them. The greatest and coalesce are used for safety in variable use as spelled out well in this article, and clues from the MySQL Manual Page Operator Precedence. The derived table names are just that: names. Every derived table needs one.
If you do not adhere to the principles in those referenced articles, then the use of variables is unsafe. I am not saying I nailed it, but that safety is always my focus.
The assignment of variables needs to follow a safe form, such as the @rn variable being set inside a function like greatest or least. We know that @rn is always greater than 0, so we use the greatest function to force our will on the query. Same trick with coalesce: null will never happen, and := has lower precedence in the column that follows it. That is, the last one: @prev:=, which follows the coalesce.
That way, a variable is set before other columns in that select row attempt to use its value.
So, just getting the expected results does not mean you did it safely and that it will work with your real data.
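To make the ordering requirement concrete, here is a rough sketch of what the variable trick computes, written as a plain Python loop rather than SQL. The function name `percent_of_previous` and its parameters are hypothetical; `prev` plays the role of the previous-count variable, and the input is assumed to be already grouped and ordered.

```python
def percent_of_previous(rows, first_row_percent=0):
    """rows: list of (stage_name, count) in display order.

    Returns (stage_name, count, percent) where percent is the count
    relative to the previous row's count, rounded to 3 places.
    """
    result = []
    prev = None  # no previous row yet
    for stage, count in rows:
        if prev is None:
            pct = first_row_percent
        elif prev == 0:
            pct = None  # mirrors MySQL returning NULL on division by zero
        else:
            pct = round(100 * count / prev, 3)
        result.append((stage, count, pct))
        # Update only after the current row has read the old value,
        # which is the ordering the SQL version has to enforce.
        prev = count
    return result


table = percent_of_previous(
    [("DI-Quote", 16), ("DI-Meeting", 13), ("DI-Underwriting", 4)]
)
for row in table:
    print(row)
```

With the counts from the results table above this gives 0, 81.25 and 30.769; passing first_row_percent=100 covers the variation mentioned for the first row.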
What you need is a LAG function. Since MySQL doesn't support it (window functions such as LAG only arrived in MySQL 8.0), you have to mimic it this way:
select stageName,
cnt,
IF(valBefore is null,0,((100*cnt)/valBefore)) as perc
from (SELECT tb.stageName,
tb.cnt,
@ct AS valBefore,
(@ct := cnt)
FROM (SELECT stageName,
count(*) as cnt
FROM Table1,
(SELECT @_stage := NULL,
@ct := NULL) vars
WHERE FROM_UNIXTIME(createDate) between '2016-05-01'
AND DATE_ADD('2016-08-31', INTERVAL 1 DAY)
AND stageName in ('DI-Underwriting', 'DI-Quote', 'DI-Meeting')
Group by stageName
Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Underwriting')
) tb
WHERE (CASE WHEN @_stage IS NULL OR @_stage <> tb.stageName
THEN @ct := NULL
ELSE NULL END IS NULL)
) as final
See it working here: http://sqlfiddle.com/#!9/61398/35
EDIT I've actually edited it to remove an unnecessary step (subquery)

MySQL query to select the max date for the relation table based on criteria at second level

Here is my SQLFIDDLE
Basically I have three tables: issues, journals and journal details.
I would like to have in a single query the following way of representation.
id | status_id | X |
90001 | 12 | NULL |
90002 | 12 | NULL |
90003 | 12 | 2015-01-06 |
90004 | 12 | 2015-01-09 |
The rule for X is: the max 'journals' created date among journals that have a detail with 'fixed_version_id' == 55.
Please help.
Thank You,
I recommend you start by getting the details of all the journals that meet your requirement like this:
SELECT *
FROM journal_details
WHERE property = 'fixed_version_id' AND value = '55';
Then you can use those values to get the created date of the journal rows that meet this requirement:
SELECT j.issue_id, MAX(j.created_on) AS created_on
FROM journals j
JOIN journal_details jd ON jd.journal_id = j.id AND jd.property = 'fixed_version_id' AND jd.value = '55'
GROUP BY j.issue_id;
From these results, you can join back to get all issues. If you use an outer join, you'll get null for any issues that have no journal meeting the criteria:
SELECT i.id, i.status_id, tmp.created_on
FROM issues i
LEFT JOIN(
SELECT j.issue_id, MAX(j.created_on) AS created_on
FROM journals j
JOIN journal_details jd ON jd.journal_id = j.id AND jd.property = 'fixed_version_id' AND jd.value = '55'
GROUP BY j.issue_id
) tmp ON tmp.issue_id = i.id;
Here is an SQL Fiddle example.
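Since the fiddle data is not reproduced here, the following is a self-contained sketch of the final query run on made-up rows. The schema is inferred from the column names used in the queries above, every journal row is hypothetical and chosen only to reproduce the shape of the desired output, and SQLite (via Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER, status_id INTEGER);
CREATE TABLE journals (id INTEGER, issue_id INTEGER, created_on TEXT);
CREATE TABLE journal_details (journal_id INTEGER, property TEXT, value TEXT);

-- Hypothetical data, not taken from the original fiddle.
INSERT INTO issues VALUES (90001, 12), (90002, 12), (90003, 12), (90004, 12);
INSERT INTO journals VALUES
    (1, 90001, '2015-01-02'),
    (2, 90003, '2015-01-05'),
    (3, 90003, '2015-01-06'),
    (4, 90004, '2015-01-09'),
    (5, 90004, '2015-01-10');
INSERT INTO journal_details VALUES
    (1, 'status_id', '12'),
    (2, 'fixed_version_id', '55'),
    (3, 'fixed_version_id', '55'),
    (4, 'fixed_version_id', '55'),
    (5, 'fixed_version_id', '56');
""")

QUERY = """
SELECT i.id, i.status_id, tmp.created_on
FROM issues i
LEFT JOIN (
    SELECT j.issue_id, MAX(j.created_on) AS created_on
    FROM journals j
    JOIN journal_details jd
      ON jd.journal_id = j.id
     AND jd.property = 'fixed_version_id'
     AND jd.value = '55'
    GROUP BY j.issue_id
) tmp ON tmp.issue_id = i.id
ORDER BY i.id
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Issue 90004 shows why the filter belongs inside the derived table: its latest journal (2015-01-10) sets a different version, so the reported date is the earlier 2015-01-09.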

SQL GROUP BY Issue on GROUP BY

I've written a query that builds a small table of information from a couple of data sources. It uses a self-made table to look up the vehicle model for the final GROUP BY, which is how the data needs to be viewed. However, when I group by Vehicle, the subquery column (Qty) misses some figures: if I group by Prefix it shows the correct numbers, but grouping by Vehicle drops some of the data.
The Prefix can relate to a couple of similar vehicle models, hence the need to group by vehicle. Can anyone easily see what I've done wrong in the SQL query below?
SELECT Vehicle, COUNT(`Chassis-No`) AS Stock,
ROUND((100/COUNT(`Chassis-No`)) * SUM(CASE WHEN `Vehicle Age` > '182' THEN 1 ELSE 0 END),1) AS Perc6Months,
ROUND((100/COUNT(`Chassis-No`)) * SUM(CASE WHEN `Vehicle Age` > '365' THEN 1 ELSE 0 END),1) AS Perc12Months,
(SELECT COUNT(VIN_Prefix) FROM Orderdownload
INNER JOIN VehicleMatrix ON (`VIN_Prefix` LIKE 'S%' AND Prefix = LEFT(`VIN_Prefix`,2)) OR (`VIN_Prefix` NOT LIKE 'S%' AND Prefix = LEFT(`VIN_Prefix`,1)) WHERE DealerCode = 'AA12345' AND `VIN_Prefix` = IF(LEFT(`Chassis-No`,1)='S',LEFT(`Chassis-No`,2),LEFT(`Chassis-No`,1))) As Qty
FROM DealerAgedStock
INNER JOIN VehicleMatrix AS VM
ON (`Chassis-No` LIKE 'S%' AND Prefix = LEFT(`Chassis-No`,2)) OR (`Chassis-No` NOT LIKE 'S%' AND Prefix = LEFT(`Chassis-No`,1))
WHERE `DL Dealer Code` = 'AA12345'
GROUP BY Vehicle
Grouped on Vehicle I get the following:
Vehicle | Perc6Months | Perc12Months | Qty
Mondeo | 37.5 | 0 | 2
Grouped on Prefix I get the following:
VIN_Prefix | Perc6Months | Perc12Months | Qty
S1 | 25 | 0 | 2
S2 | 50 | 0 | 2
Ideally it should look like this:
Vehicle | Perc6Months | Perc12Months | Qty
Mondeo | 37.5 | 0 | 4
Where S1 and S2 are relative to the Vehicle Mondeo, thus it gives me the first instance of subquery rather than adding them together.
My question is: why does the Group By not add the figures together properly from the subquery? I need it to add them to have the correct figures...

Improving this MySQL Query - Select as sub-query

I have this query
SELECT
shot.hole AS hole,
shot.id AS id,
(SELECT s.id FROM shot AS s
WHERE s.hole = shot.hole AND s.shot_number > shot.shot_number AND shot.round_id = s.round_id
ORDER BY s.shot_number ASC LIMIT 1) AS next_shot_id,
shot.distance AS distance_remaining,
shot.type AS hit_type,
shot.area AS onto
FROM shot
JOIN course ON shot.course_id = course.id
JOIN round ON shot.round_id = round.id
WHERE round.uID = 78
This returns about 900 rows in around 0.7 seconds. This is OK-ish, but more lines like this are required:
(SELECT s.id FROM shot AS s
WHERE s.hole = shot.hole AND s.shot_number > shot.shot_number AND shot.round_id = s.round_id
ORDER BY s.shot_number ASC LIMIT 1) AS next_shot_id,
For example
(SELECT s.id FROM shot AS s
WHERE s.hole = shot.hole AND s.shot_number < shot.shot_number AND shot.round_id = s.round_id
ORDER BY s.shot_number ASC LIMIT 1) AS past_shot_id,
Adding this increases the load time to tens of seconds, which is far too long. The page often doesn't load at all, or MySQL just locks up, and show processlist shows the query just sitting there sending data.
Removing the ORDER BY s.shot_number ASC clause in those sub queries reduces the query time down to 0.05 seconds which is much much better. But the ORDER BY is required to ensure that the next or past row (shot) is returned, rather than any old random row.
How can I improve this query to make it run faster and return the same results. Perhaps my approach for obtaining the next and past rows is sub optimal and I need to look at a different way of returning those next and previous row IDs?
EDIT - additional background info
The query was fine on my testing domain, a subdomain. But when moved to the live domain the issues started. Hardly anything was changed yet the whole site came to halt because of these new slow queries. Key notes:
Different domain
Different folder in /var/www
Same DB
Same DB credentials
Same code
Added indexes in an attempt to fix - this didn't help
Could any of these have affected the load time?
This will get marked down in a minute for 'not being an answer', but it illustrates a possible solution without simply handing it to you on a plate....
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT x.i, MIN(y.i) FROM ints x LEFT JOIN ints y ON y.i > x.i GROUP BY x.i;
+---+----------+
| i | MIN(y.i) |
+---+----------+
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | NULL |
+---+----------+
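The same self-join idea gives the previous value as well, by flipping the comparison and using MAX instead of MIN. A small sketch of both directions, assuming SQLite (via Python's sqlite3 module) in place of MySQL and the same ten-row ints table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ints (i INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO ints VALUES (?)", [(n,) for n in range(10)])

# Next value: the smallest y.i that is greater than x.i.
next_of = dict(conn.execute(
    "SELECT x.i, MIN(y.i) FROM ints x "
    "LEFT JOIN ints y ON y.i > x.i GROUP BY x.i"
))

# Previous value: the largest y.i that is smaller than x.i.
prev_of = dict(conn.execute(
    "SELECT x.i, MAX(y.i) FROM ints x "
    "LEFT JOIN ints y ON y.i < x.i GROUP BY x.i"
))

print(next_of[0], next_of[9])  # 1 None
print(prev_of[0], prev_of[9])  # None 8
```

The LEFT JOIN is what keeps the first and last rows in the result, with NULL (None in Python) where no neighbour exists.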
To expand on Strawberry's answer: do an additional left join in a "pre-query" to get all the prior / next IDs, then join out from that to get whatever details you need.
select
Shot.ID,
Shot.Hole,
Shot.Distance as Distance_Remaining,
Shot.Type as Hit_Type,
Shot.Area as Onto,
PriorShot.Hole as PriorHole,
PriorShot.Distance as PriorDistanceRemain,
NextShot.Hole as NextHole,
NextShot.Distance as NextDistanceRemain
from
( SELECT
shot.id,
MIN(nextshot.id) as NextShotID,
MAX(priorshot.id) as PriorShotID
FROM
round
JOIN shot
on round.id = shot.round_id
LEFT JOIN shot nextshot
ON shot.round_id = nextshot.round_id
AND shot.hole = nextshot.hole
AND shot.shot_number < nextshot.shot_number
LEFT JOIN shot priorshot
ON shot.round_id = priorshot.round_id
AND shot.hole = priorshot.hole
AND shot.shot_number > priorshot.shot_number
WHERE
round.uID = 78
GROUP BY
shot.id ) AllShots
JOIN Shot
on AllShots.id = Shot.ID
LEFT JOIN shot PriorShot
on AllShots.PriorShotID = PriorShot.ID
LEFT JOIN shot NextShot
on AllShots.NextShotID = NextShot.ID
The inner query gets only those for round.uID = 78, then you can join to the next / prior as needed. I did not add the joins to the course and round tables as no result columns were presented, but could easily be added.
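As a rough check of the inner "pre-query", here it is run on a handful of made-up shots. The rows are hypothetical, the tables are cut down to the columns the pre-query touches, and SQLite (via Python's sqlite3 module) stands in for MySQL. Note that taking MIN/MAX of the ids only finds the adjacent shot if ids increase with shot_number within a hole, which is an assumption about the data rather than something the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE round (id INTEGER, uID INTEGER);
CREATE TABLE shot (id INTEGER, round_id INTEGER, hole INTEGER,
                   shot_number INTEGER);

-- Hypothetical rows: round 1 belongs to user 78, round 2 to someone else.
INSERT INTO round VALUES (1, 78), (2, 99);
INSERT INTO shot VALUES
    (10, 1, 1, 1), (11, 1, 1, 2), (12, 1, 1, 3),
    (13, 1, 2, 1), (14, 1, 2, 2),
    (20, 2, 1, 1);
""")

PRE_QUERY = """
SELECT shot.id,
       MIN(nextshot.id)  AS NextShotID,
       MAX(priorshot.id) AS PriorShotID
FROM round
JOIN shot ON round.id = shot.round_id
LEFT JOIN shot nextshot
       ON shot.round_id = nextshot.round_id
      AND shot.hole = nextshot.hole
      AND shot.shot_number < nextshot.shot_number
LEFT JOIN shot priorshot
       ON shot.round_id = priorshot.round_id
      AND shot.hole = priorshot.hole
      AND shot.shot_number > priorshot.shot_number
WHERE round.uID = 78
GROUP BY shot.id
ORDER BY shot.id
"""

all_shots = conn.execute(PRE_QUERY).fetchall()
for row in all_shots:
    print(row)  # (shot id, next shot id, prior shot id)
```

The first shot on each hole gets no prior id and the last gets no next id, and shot 20 is left out because its round belongs to a different user.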
I wonder how well the following performs. It replaces the joining operations with string operations.
SELECT shot.hole AS hole, shot.id AS id,
substring_index(substring_index(shots, ',', find_in_set(shot.id, ss.shots) + 1), ',', -1
) as nextsi,
substring_index(substring_index(shots, ',', find_in_set(shot.id, ss.shots) - 1), ',', -1
) as prevsi,
shot.distance AS distance_remaining, shot.type AS hit_type, shot.area AS onto
FROM shot JOIN
course
ON shot.course_id = course.id JOIN
round
ON shot.round_id = round.id join
(select s.round_id, s.hole, group_concat(s.id order by s.shot_number) as shots
from shot s
group by s.round_id, s.hole
) ss
on ss.round_id = shot.round_id and ss.hole = shot.hole
WHERE round.uID = 78
Note that this doesn't work fully -- it will produce erroneous results on the first and last shot. I'm wondering how the performance is before fixing those details.