MySQL: count hits after a specific value appears

The problem is, I need to calculate the average number of pages/hits after reaching the pax
page (including the pax hit).
The table is:
CREATE TABLE search (
SESSION_ID INTEGER,
HIT_NUMBER INTEGER,
PAGE VARCHAR(24),
MEDIUM_T VARCHAR(24)
);
INSERT INTO search
(SESSION_ID, HIT_NUMBER, PAGE, MEDIUM_T)
VALUES
('123', '1', 'home', 'direct'),
('123', '2', 'flights_home', 'direct'),
('123', '3', 'results', 'direct'),
('456', '1', 'pax', 'metasearch'),
('789', '1', 'home', 'partners'),
('789', '2', 'flights_home', 'partners'),
('789', '3', 'results', 'partners'),
('789', '4', 'home', 'partners'),
('146', '1', 'results', 'SEM'),
('146', '2', 'pax', 'SEM'),
('146', '3', 'payment', 'SEM'),
('146', '4', 'confirmation', 'SEM');
And my approach is:
SELECT s1.SESSION_ID, COUNT(*) as sCOUNT
FROM search s1
WHERE PAGE = 'pax'
GROUP BY s1.SESSION_ID
UNION ALL
SELECT 'Total AVG', AVG(a.sCOUNT)
FROM (
SELECT COUNT(*) as sCOUNT
FROM search s2
GROUP BY s2.SESSION_ID
) a
Obviously the 3rd line is wrong: my code is missing the part that starts counting once 'pax' is shown, and I don't have any clue how to do that.
Thank you in advance :)

Finding all pax pages and the ones after them can be done with EXISTS. The rest is straightforward:
SELECT AVG(hits)
FROM (
SELECT session_id, COUNT(*) AS hits
FROM search AS s1
WHERE page = 'pax' OR EXISTS (
SELECT *
FROM search AS s2
WHERE s2.session_id = s1.session_id
AND s2.hit_number < s1.hit_number
AND s2.page = 'pax'
)
GROUP BY session_id
) AS x
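To sanity-check the EXISTS query against the sample rows, here is a small harness. It is a sketch that assumes SQLite, driven through Python's standard sqlite3 module, as a stand-in for MySQL; the query text is copied from the answer, while the helper name avg_hits_after_pax is hypothetical.

```python
import sqlite3

# Sample rows from the question: (session_id, hit_number, page, medium_t).
ROWS = [
    (123, 1, "home", "direct"), (123, 2, "flights_home", "direct"),
    (123, 3, "results", "direct"), (456, 1, "pax", "metasearch"),
    (789, 1, "home", "partners"), (789, 2, "flights_home", "partners"),
    (789, 3, "results", "partners"), (789, 4, "home", "partners"),
    (146, 1, "results", "SEM"), (146, 2, "pax", "SEM"),
    (146, 3, "payment", "SEM"), (146, 4, "confirmation", "SEM"),
]

# The EXISTS query from the answer above.
EXISTS_QUERY = """
SELECT AVG(hits)
FROM (
    SELECT session_id, COUNT(*) AS hits
    FROM search AS s1
    WHERE page = 'pax' OR EXISTS (
        SELECT *
        FROM search AS s2
        WHERE s2.session_id = s1.session_id
          AND s2.hit_number < s1.hit_number
          AND s2.page = 'pax'
    )
    GROUP BY session_id
) AS x
"""


def avg_hits_after_pax():
    """Load the sample rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE search (session_id INTEGER, hit_number INTEGER,"
            " page VARCHAR(24), medium_t VARCHAR(24))"
        )
        conn.executemany("INSERT INTO search VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(EXISTS_QUERY).fetchone()[0]
    finally:
        conn.close()


# Sessions 456 (1 hit) and 146 (3 hits) qualify, so the average is 2.0.
print(avg_hits_after_pax())
```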
If you are using MySQL 8, window functions provide a simpler solution:
WITH cte1 AS (
SELECT session_id, MAX(CASE WHEN page = 'pax' THEN 1 END) OVER (
PARTITION BY session_id
ORDER BY hit_number
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS countme
FROM search
), cte2 as (
SELECT COUNT(*) AS hits
FROM cte1
WHERE countme IS NOT NULL
GROUP BY session_id
)
SELECT AVG(hits)
FROM cte2

My approach uses a WITH CTE (common table expression) to pre-declare the underlying query, then queries and averages from that.
First, one premise that was not explicitly covered by your sample data: what happens IF a user bounces back and forth between pages and hits the PAX page more than once? You then have multiple pax page hits. I assume you want to count from the FIRST such pax hit, inclusive of all page hits after it. This solution accounts for that.
Let's look at the innermost FROM clause, aliased "pxHits".
I am grouping by session ID and grabbing the FIRST INSTANCE of a pax page hit (or NULL if no pax page was encountered), but ALSO the HIGHEST hit number per session. The HAVING clause makes sure that only sessions that HAD a PAX page are returned; all other sessions are excluded from the results.
This results in two entries being passed up to the outer SELECT, which computes 1 + lastHitNumber - firstPaxHit (this assumes hit numbers are consecutive within a session). The 1 + is there because you at least HIT the pax page once. In the scenario of your session 456, where the first pax hit and the last hit are both hit 1, lastHitNumber - firstPaxHit nets zero, so you need it. The same holds if a person had 25 page hits before reaching the pax page on hit 26 and stopped there: the result would still be 1 via 1 + 26 - 26 = 1 total page including the pax page, not the 25 prior to it.
Your other qualifying session is 146. The first pax hit was 2, but they proceeded to a highest hit of 4, so 1 + 4 - 2 = 3 total pages.
Now on to the final part. Since you can see HOW things are prepared, we can get the averages. Some databases won't implicitly mix different data types in a UNION (session_id vs. the fixed message of your 'Total Avg'), so to be safe both should be the same type. My query therefore converts session_id to character to match. I happen to be getting the AVERAGE row first, as a simple select from the WITH CTE alias, and THEN the actual session_id rows and counts.
with PaxSummary as
(
select
pxHits.*,
1 + lastHitNumber - firstPaxHit HitsIncludingPax
from
( select
session_id,
min( case when page = 'pax'
then hit_number
else null end ) firstPaxHit,
max( hit_number ) lastHitNumber
from
search
group by
session_id
having
min( case when page = 'pax'
then hit_number
else null end ) > 0 ) pxHits
)
select
'Avg Pax Pages' FinalMsg,
avg( ps2.HitsIncludingPax ) HitsIncludingPax
from
PaxSummary ps2
union all
select
cast( ps1.session_id as char) FinalMsg,
ps1.HitsIncludingPax
from
PaxSummary ps1
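The same kind of harness can run the PaxSummary query on the sample data to confirm the per-session counts and the average. This is a sketch assuming SQLite via Python's sqlite3 module rather than MySQL; SQLite also accepts CAST(... AS CHAR) (it maps to TEXT), so the query is kept as written with the CHAR cast. The function name pax_summary is hypothetical.

```python
import sqlite3

# Sample rows from the question: (session_id, hit_number, page, medium_t).
ROWS = [
    (123, 1, "home", "direct"), (123, 2, "flights_home", "direct"),
    (123, 3, "results", "direct"), (456, 1, "pax", "metasearch"),
    (789, 1, "home", "partners"), (789, 2, "flights_home", "partners"),
    (789, 3, "results", "partners"), (789, 4, "home", "partners"),
    (146, 1, "results", "SEM"), (146, 2, "pax", "SEM"),
    (146, 3, "payment", "SEM"), (146, 4, "confirmation", "SEM"),
]

PAX_SUMMARY_QUERY = """
with PaxSummary as
(
  select pxHits.*,
         1 + lastHitNumber - firstPaxHit HitsIncludingPax
  from ( select session_id,
                min( case when page = 'pax' then hit_number else null end ) firstPaxHit,
                max( hit_number ) lastHitNumber
         from search
         group by session_id
         having min( case when page = 'pax' then hit_number else null end ) > 0 ) pxHits
)
select 'Avg Pax Pages' FinalMsg, avg( ps2.HitsIncludingPax ) HitsIncludingPax
from PaxSummary ps2
union all
select cast( ps1.session_id as char) FinalMsg, ps1.HitsIncludingPax
from PaxSummary ps1
"""


def pax_summary():
    """Return {FinalMsg: value} for the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE search (session_id INTEGER, hit_number INTEGER,"
            " page VARCHAR(24), medium_t VARCHAR(24))"
        )
        conn.executemany("INSERT INTO search VALUES (?, ?, ?, ?)", ROWS)
        # Each result row is (label, number); the labels are unique here.
        return dict(conn.execute(PAX_SUMMARY_QUERY).fetchall())
    finally:
        conn.close()


print(pax_summary())
```

A dict is used for the check because the order of the per-session rows after UNION ALL is not guaranteed.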

As an alternative to the EXISTS (correlated subquery) pattern, we can write a query that gets us the hit_number of the first 'pax' hit for each session_id, and use that as an inline view.
Something along these lines:
-- count hits on or after the first 'pax' of each session_id that has a 'pax' hit
SELECT s.session_id
, COUNT(*) AS cnt_hits_after_pax
FROM ( -- get the first 'pax' hit for each session_id
-- exclude session_id that do not have a 'pax' hit
SELECT px.session_id AS pax_session_id
, MIN(px.hit_number) AS pax_hit_number
FROM search px
WHERE px.page = 'pax'
GROUP BY px.session_id
) p
-- all the hits for session_id on or after the first 'pax' hit
JOIN search s
ON s.session_id = p.pax_session_id
AND s.hit_number >= p.pax_hit_number
GROUP BY s.session_id
To get an average from that query, we can wrap it in parentheses and turn it into an inline view:
SELECT AVG(c.cnt_hits_after_pax) AS avg_cnt_hits_after_pax
FROM (
-- query above goes here
) c
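Here is a sketch that runs the inline-view version, with the inline view grouped by session_id and the join using the aliases it actually defines (pax_session_id, pax_hit_number), and then wraps it for the average. It assumes SQLite through Python's sqlite3 module in place of MySQL; hits_after_first_pax is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question: (session_id, hit_number, page, medium_t).
ROWS = [
    (123, 1, "home", "direct"), (123, 2, "flights_home", "direct"),
    (123, 3, "results", "direct"), (456, 1, "pax", "metasearch"),
    (789, 1, "home", "partners"), (789, 2, "flights_home", "partners"),
    (789, 3, "results", "partners"), (789, 4, "home", "partners"),
    (146, 1, "results", "SEM"), (146, 2, "pax", "SEM"),
    (146, 3, "payment", "SEM"), (146, 4, "confirmation", "SEM"),
]

# Per-session count of hits on or after the first 'pax' hit.
PER_SESSION_QUERY = """
SELECT s.session_id, COUNT(*) AS cnt_hits_after_pax
FROM (SELECT px.session_id AS pax_session_id,
             MIN(px.hit_number) AS pax_hit_number
      FROM search px
      WHERE px.page = 'pax'
      GROUP BY px.session_id) p
JOIN search s
  ON s.session_id = p.pax_session_id
 AND s.hit_number >= p.pax_hit_number
GROUP BY s.session_id
"""

# The same query wrapped as an inline view so it can be averaged.
AVG_QUERY = (
    "SELECT AVG(c.cnt_hits_after_pax) AS avg_cnt_hits_after_pax FROM ("
    + PER_SESSION_QUERY
    + ") c"
)


def hits_after_first_pax():
    """Return ({session_id: hit count}, average) for the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE search (session_id INTEGER, hit_number INTEGER,"
            " page VARCHAR(24), medium_t VARCHAR(24))"
        )
        conn.executemany("INSERT INTO search VALUES (?, ?, ?, ?)", ROWS)
        per_session = dict(conn.execute(PER_SESSION_QUERY).fetchall())
        average = conn.execute(AVG_QUERY).fetchone()[0]
        return per_session, average
    finally:
        conn.close()


per_session, average = hits_after_first_pax()
print(per_session, average)
```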

Related

Is there a way to use aggregate COUNT() values within CASE?

I need to retrieve unique yet truncated part numbers, with their description values being conditionally determined.
DATA:
Here's some simplified sample data:
(the real table has half a million rows)
create table inventory(
partnumber VARCHAR(10),
description VARCHAR(10)
);
INSERT INTO inventory (partnumber,description) VALUES
('12345','ABCDE'),
('123456','ABCDEF'),
('1234567','ABCDEFG'),
('98765','ZYXWV'),
('987654','ZYXWVU'),
('9876543','ZYXWVUT'),
('abcde',''),
('abcdef','123'),
('abcdefg','321'),
('zyxwv',NULL),
('zyxwvu','987'),
('zyxwvut','789');
TRIED:
I've tried too many things to list here.
I've finally found a way to get past all the 'unknown field' errors and at least get SOME results, but:
- it's SUPER kludgy!
- my results are not limited to unique prods.
Here's my current query:
SELECT
LEFT(i.partnumber, 6) AS prod,
CASE
WHEN agg.cnt > 1
OR i.description IS NULL
OR i.description = ''
THEN LEFT(i.partnumber, 6)
ELSE i.description
END AS `descrip`
FROM inventory i
INNER JOIN (SELECT LEFT(ii.partnumber, 6) t, COUNT(*) cnt
FROM inventory ii GROUP BY ii.partnumber) AS agg
ON LEFT(i.partnumber, 6) = agg.t;
GOAL:
My goal is to retrieve:
| prod   | descrip |
|--------|---------|
| 12345  | ABCDE   |
| 123456 | 123456  |
| 98765  | ZYXWV   |
| 987654 | 987654  |
| abcde  | abcde   |
| abcdef | abcdef  |
| zyxwv  | zyxwv   |
| zyxwvu | zyxwvu  |
QUESTION:
What are some cleaner ways to use the COUNT() aggregate data with a CASE type conditional?
How can I limit my results so that all prods are UNIQUE?
You can check whether a left(partnumber, 6) is non-unique in the result by checking count(*) > 1. In that case, let descrip be left(partnumber, 6). Otherwise, you can use max(description) (or min(description)) to get the single description while satisfying the requirement to use an aggregate function on columns not in the GROUP BY. To replace empty or NULL descriptions, nullif() and coalesce() can be used.
That would lead to the following using just one level of aggregation and no joins:
SELECT left(partnumber, 6) AS prod,
CASE
WHEN count(*) > 1 THEN
left(partnumber, 6)
ELSE
coalesce(nullif(max(description), ''), left(partnumber, 6))
END AS descrip
FROM inventory
GROUP BY left(partnumber, 6)
ORDER BY left(partnumber, 6);
But there seems to be a bug in MySQL, and this query fails when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5). The engine doesn't "see" that, in the select list, partnumber is only used in the expression left(partnumber, 6), which is also in the GROUP BY. Instead, the engine wrongly complains that partnumber is not in the GROUP BY and not subject to an aggregate function.
As a workaround, we can use a derived table that shortens partnumber to its first six characters. We then use that column of the derived table instead of left(partnumber, 6).
SELECT l6pn AS prod,
CASE
WHEN count(*) > 1 THEN
l6pn
ELSE
coalesce(nullif(max(description), ''), l6pn)
END AS descrip
FROM (SELECT left(partnumber, 6) AS l6pn,
description
FROM inventory) AS x
GROUP BY l6pn
ORDER BY l6pn;
Or we wrap every occurrence of left(partnumber, 6) other than the first in an otherwise pointless max() to work around the bug.
SELECT left(partnumber, 6) AS prod,
CASE
WHEN count(*) > 1 THEN
max(left(partnumber, 6))
ELSE
coalesce(nullif(max(description), ''), max(left(partnumber, 6)))
END AS descrip
FROM inventory
GROUP BY left(partnumber, 6)
ORDER BY left(partnumber, 6);
db<>fiddle (Change the DBMS to another one, such as Postgres or MariaDB, to see that they accept the first query.)
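As a quick check that the derived-table query produces exactly the GOAL rows, here is a sketch against the question's sample data. It assumes SQLite via Python's sqlite3 module instead of MySQL; because SQLite has no left() function, substr(partnumber, 1, 6) is swapped in. The helper name truncated_parts is hypothetical.

```python
import sqlite3

# Sample rows from the question: (partnumber, description).
ROWS = [
    ("12345", "ABCDE"), ("123456", "ABCDEF"), ("1234567", "ABCDEFG"),
    ("98765", "ZYXWV"), ("987654", "ZYXWVU"), ("9876543", "ZYXWVUT"),
    ("abcde", ""), ("abcdef", "123"), ("abcdefg", "321"),
    ("zyxwv", None), ("zyxwvu", "987"), ("zyxwvut", "789"),
]

# Derived-table variant of the answer; substr(x, 1, 6) stands in for left(x, 6).
QUERY = """
SELECT l6pn AS prod,
       CASE
         WHEN count(*) > 1 THEN l6pn
         ELSE coalesce(nullif(max(description), ''), l6pn)
       END AS descrip
FROM (SELECT substr(partnumber, 1, 6) AS l6pn, description
      FROM inventory) AS x
GROUP BY l6pn
ORDER BY l6pn
"""


def truncated_parts():
    """Return the (prod, descrip) rows for the sample inventory."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE inventory (partnumber VARCHAR(10), description VARCHAR(10))"
        )
        conn.executemany("INSERT INTO inventory VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for prod, descrip in truncated_parts():
    print(prod, descrip)
```

Groups with more than one part number fall back to the truncated part number, and single-row groups with an empty or NULL description do too, which matches the GOAL table.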

mysql count distinct value

I'm having trouble figuring out how to count distinct values using IF on the select column.
I have an SQLFiddle here:
http://sqlfiddle.com/#!2/6bfb9/3
The records are:
create table team_record (
id tinyint,
project_id int,
position varchar(45)
);
insert into team_record values
(1,1, 'Junior1'),
(2,1, 'Junior1'),
(3,1, 'Junior2'),
(4,1, 'Junior3'),
(5,1, 'Senior1'),
(6,1, 'Senior1'),
(8,1, 'Senior2'),
(9,1, 'Senior2'),
(10,1,'Senior3'),
(11,1, 'Senior3'),
(12,1, 'Senior3');
I need to count all distinct values, separately for the Junior and Senior positions.
Identical values count as 1.
I need to see a result like this:
| PROJECT_ID | SENIOR_TOTAL | JUNIOR_TOTAL |
|------------|--------------|--------------|
| 1          | 3            | 3            |
My MySQL query is below, but it doesn't produce the result above.
SELECT
`team_record`.`project_id`,
`position`,
SUM(IF(position LIKE 'Senior%',
1,
0)) AS `Senior_Total`,
SUM(IF(position LIKE 'Junior%',
1,
0)) AS `Junior_Total`
FROM
(`team_record`)
WHERE
project_id = '1'
GROUP BY `team_record`.`project_id`
Maybe you could help me fix my query above to get the result I need.
Thanks.
I think you want this:
SELECT
project_id,
COUNT(DISTINCT CASE when position LIKE 'Senior%' THEN position END) Senior_Total,
COUNT(DISTINCT CASE when position LIKE 'Junior%' THEN position END) Junior_Total
FROM team_record
WHERE project_id = 1
GROUP BY project_id
The CASE returns NULL if the WHEN is false (i.e. ELSE NULL is the default, which I omitted for brevity), and NULLs aren't counted by COUNT(DISTINCT ...).
Also, unnecessary backticks, brackets and qualification removed.
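To confirm that the COUNT(DISTINCT CASE ...) query yields 3 and 3 for the sample rows, here is a small check. It is a sketch assuming SQLite via Python's sqlite3 module as a stand-in for MySQL; the function name position_totals is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, project_id, position).
ROWS = [
    (1, 1, "Junior1"), (2, 1, "Junior1"), (3, 1, "Junior2"), (4, 1, "Junior3"),
    (5, 1, "Senior1"), (6, 1, "Senior1"), (8, 1, "Senior2"), (9, 1, "Senior2"),
    (10, 1, "Senior3"), (11, 1, "Senior3"), (12, 1, "Senior3"),
]

# The answer's query; non-matching rows yield NULL, which COUNT(DISTINCT ...) skips.
QUERY = """
SELECT project_id,
       COUNT(DISTINCT CASE WHEN position LIKE 'Senior%' THEN position END) AS Senior_Total,
       COUNT(DISTINCT CASE WHEN position LIKE 'Junior%' THEN position END) AS Junior_Total
FROM team_record
WHERE project_id = 1
GROUP BY project_id
"""


def position_totals():
    """Return [(project_id, Senior_Total, Junior_Total)] for the sample rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE team_record (id TINYINT, project_id INT, position VARCHAR(45))"
        )
        conn.executemany("INSERT INTO team_record VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(position_totals())
```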

Working out Percentages whilst leaving out blank entries

I have a table called feedback, to monitor feedback from an event:
id, heardabout, booking
0 friend 0
1 online 5
2 friend 3
I've been using this query to work out the percentages for them:
SELECT `booking` AS `rating`, (COUNT(`booking`) * 100 / (Select COUNT(*) FROM `feedback`)) AS `percent` FROM `feedback` GROUP BY `booking`;
Which produces:
rating, percent
0 10.1449
3 5.7971
4 13.0435
5 71.0145
Which is fine, and correct. However, I don't want to count the '0' entries for certain fields (these mean the question wasn't applicable to the user). How would I go about doing that? Simply adding WHERE `booking` != '0' to the above query only leaves the 0 row out; the other percentages don't change, so they no longer add up to 100.
Just for future reference, I actually solved this:
The SQL statement I needed was:
SELECT `booking` AS `rating`,
(COUNT(`booking`) * 100 / (Select COUNT(*) FROM `feedback` WHERE `booking` != '0')) AS `percent`
FROM `feedback` WHERE `booking` != '0' GROUP BY `booking`;
The key was putting the WHERE condition inside the subquery SELECT COUNT(*) (and also at the end). This way, the total to divide by excludes the '0' entries.
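The difference between filtering only the outer query and filtering both the outer query and the denominator can be seen with the three sample rows from the question. This sketch assumes SQLite via Python's sqlite3 module rather than MySQL, so it multiplies by 100.0 instead of 100 (SQLite's / on two integers truncates, while MySQL's / returns a decimal). The names OUTER_ONLY, BOTH and percentages are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, heardabout, booking).
ROWS = [(0, "friend", 0), (1, "online", 5), (2, "friend", 3)]

# Filter only in the outer query: the denominator still counts the 0 rows.
OUTER_ONLY = """
SELECT booking AS rating,
       COUNT(booking) * 100.0 / (SELECT COUNT(*) FROM feedback) AS percent
FROM feedback WHERE booking != 0 GROUP BY booking
"""

# Filter in both places, as in the self-answer.
BOTH = """
SELECT booking AS rating,
       COUNT(booking) * 100.0 / (SELECT COUNT(*) FROM feedback WHERE booking != 0) AS percent
FROM feedback WHERE booking != 0 GROUP BY booking
"""


def percentages(query):
    """Return {rating: percent} for the sample rows using the given query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE feedback (id INTEGER, heardabout TEXT, booking INTEGER)")
        conn.executemany("INSERT INTO feedback VALUES (?, ?, ?)", ROWS)
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


print(percentages(OUTER_ONLY))  # the two remaining ratings sum to about 66.7
print(percentages(BOTH))        # the two remaining ratings sum to 100
```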

SELECT CASE, COUNT(*)

I want to select the number of users that have marked some content as favorite and also return whether the current user has "voted" or not. My table looks like this:
CREATE TABLE IF NOT EXISTS `favorites` (
`user` int(11) NOT NULL DEFAULT '0',
`content` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`user`,`content`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Say I have 3 rows containing
INSERT INTO `favorites` (`user`, `content`) VALUES
(11, 26977),
(22, 26977),
(33, 26977);
Using this query:
SELECT COUNT(*), CASE
WHEN user='22'
THEN 1
ELSE 0
END as has_voted
FROM favorites WHERE content = '26977'
I expect to get has_voted=1 and COUNT(*)=3 but
I get has_voted=0 and COUNT(*)=3. Why is that? How to fix it?
This is because you mixed aggregated and non-aggregated expressions in a single SELECT. Aggregated expressions work on many rows; non-aggregated expressions work on a single row. Aggregated (e.g. COUNT(*)) and non-aggregated (e.g. CASE) expressions should appear together in the same SELECT only when you have a GROUP BY, which does not make sense in your situation.
You can fix your query by aggregating the second expression - i.e. adding a SUM around it, like this:
SELECT
COUNT(*) AS FavoriteCount
, SUM(CASE WHEN user=22 THEN 1 ELSE 0 END) as has_voted
FROM favorites
WHERE content = 26977
Now both expressions are aggregated, so you should get the expected results.
Try this, using SUM() without CASE (in MySQL a comparison such as USER = '22' evaluates to 1 or 0, so it can be summed directly):
SELECT
COUNT(*),
SUM(USER = '22') AS has_voted
FROM
favorites
WHERE content = '26977'
See Fiddle Demo
Try this:
SELECT COUNT(*), MAX(USER=22) AS has_voted
FROM favorites
WHERE content = 26977;
Check the SQL FIDDLE DEMO
OUTPUT
| COUNT(*) | HAS_VOTED |
|----------|-----------|
| 3 | 1 |
You need the sum of votes:
SELECT COUNT(*), SUM(CASE
WHEN user='22'
THEN 1
ELSE 0
END) as has_voted
FROM favorites WHERE content = '26977'
You are inadvertently using a MySQL feature here: you aggregate your results to get only one result record showing the number of matches (aggregate function COUNT). But you also show the user (or rather an expression built on it) in your result line, without any aggregate function. So the question is: which user? Another DBMS would have given you an error, asking you to either state the user in a GROUP BY or aggregate users. MySQL instead takes the value from an arbitrary row (unless the ONLY_FULL_GROUP_BY SQL mode is enabled, in which case it raises an error too), and here that row was not user 22.
What you want to do here is aggregate users (or rather have your expression aggregated). Use SUM to sum all votes the user has given on the requested content:
SELECT
COUNT(*),
SUM(CASE WHEN user='22' THEN 1 ELSE 0 END) as sum_votes
FROM favorites
WHERE content = '26977';
You forgot to wrap the CASE expression inside an aggregate function. In this case has_voted will contain unexpected results, since you are actually doing a "partial group by". Here is what you need to do:
SELECT COUNT(*), SUM(CASE WHEN USER = 22 THEN 1 ELSE 0 END) AS has_voted
FROM favorites
WHERE content = 26977
Or:
SELECT COUNT(*), COUNT(CASE WHEN USER = 22 THEN 1 ELSE NULL END) AS has_voted
FROM favorites
WHERE content = 26977
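All of the aggregated variants above should agree on the sample rows. Here is a sketch that runs each of them, assuming SQLite via Python's sqlite3 module in place of MySQL (SQLite also evaluates user = 22 to 1 or 0). The dictionary keys and the function name run_variants are hypothetical labels.

```python
import sqlite3

# Sample rows from the question: (user, content).
ROWS = [(11, 26977), (22, 26977), (33, 26977)]

# The aggregated variants suggested in the answers; each returns (count, has_voted).
VARIANTS = {
    "sum_case": "SELECT COUNT(*), SUM(CASE WHEN user = 22 THEN 1 ELSE 0 END)"
                " FROM favorites WHERE content = 26977",
    "sum_bool": "SELECT COUNT(*), SUM(user = 22) FROM favorites WHERE content = 26977",
    "max_bool": "SELECT COUNT(*), MAX(user = 22) FROM favorites WHERE content = 26977",
    "count_case": "SELECT COUNT(*), COUNT(CASE WHEN user = 22 THEN 1 ELSE NULL END)"
                  " FROM favorites WHERE content = 26977",
}


def run_variants():
    """Return {variant name: (count, has_voted)} for the sample rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE favorites (user INTEGER NOT NULL DEFAULT 0,"
            " content INTEGER NOT NULL DEFAULT 0, PRIMARY KEY (user, content))"
        )
        conn.executemany("INSERT INTO favorites VALUES (?, ?)", ROWS)
        return {name: conn.execute(sql).fetchone() for name, sql in VARIANTS.items()}
    finally:
        conn.close()


for name, row in run_variants().items():
    print(name, row)
```

MAX() answers "has this user voted at all" as 0 or 1, while SUM() and COUNT() count the matching rows; with the (user, content) primary key a user can match at most once per content, so all four give 1 here.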

MAX with extra criteria

I have the following part of a query I'm working on in MySQL.
SELECT
MAX(CAST(MatchPlayerBatting.BatRuns AS SIGNED)) AS HighestScore
FROM
MatchPlayerBatting
It returns the correct result. However, there is another column I need it to take into account.
That is, if the row with the maximum value also has "not out" in "BatHowOut", it should show the result as, for example, 96* rather than just 96.
How could this be done?
To help make the data concrete, consider two cases:
| BatRuns | BatHowOut |
|---------|-----------|
| 96      | not out   |
| 96      | lbw       |

| BatRuns | BatHowOut |
|---------|-----------|
| 96      | not out   |
| 102     | lbw       |
For the first data, the answer should be '96*'; for the second, '102'.
You can achieve this using a self-join like this (BatRuns is cast to a number, as in the question, so that 102 ranks above 96):
SELECT t1.ID
, CONCAT(t1.BatRuns,
CASE WHEN t1.BatHowOut = 'Not Out' THEN '*' ELSE '' END
) AS HighScore
FROM MatchPlayerBatting t1
JOIN
(
SELECT MAX(CAST(BatRuns AS SIGNED)) AS HighestScore
FROM MatchPlayerBatting
) t2
ON CAST(t1.BatRuns AS SIGNED) = t2.HighestScore
See this sample SQLFiddle with highest "Not Out"
See this another sample SQLFiddle with highest "Out"
See this another sample SQLFiddle with two highest scores
How about ordering the scores in descending order and selecting only the first record?
select concat(BatRuns , case when BatHowOut = 'not out' then '*' else '' end)
from MatchPlayerBatting
order by cast(BatRuns as signed) desc,
(case when BatHowOut = 'not out' then 1 else 2 end)
limit 1;
Sample here.
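Here is a sketch that runs the ORDER BY ... LIMIT 1 approach on both data sets from the question, expecting '96*' and '102'. It assumes SQLite via Python's sqlite3 module instead of MySQL, so CONCAT() is replaced by SQLite's || operator and the cast target is INTEGER; the function name highest_score is hypothetical.

```python
import sqlite3

# Highest score first; on a tie, the 'not out' row sorts first.
QUERY = """
SELECT BatRuns || CASE WHEN BatHowOut = 'not out' THEN '*' ELSE '' END
FROM MatchPlayerBatting
ORDER BY CAST(BatRuns AS INTEGER) DESC,
         CASE WHEN BatHowOut = 'not out' THEN 1 ELSE 2 END
LIMIT 1
"""


def highest_score(rows):
    """Return the formatted top score for a list of (BatRuns, BatHowOut) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # BatRuns is stored as text, which is why the query casts it.
        conn.execute(
            "CREATE TABLE MatchPlayerBatting (BatRuns VARCHAR(10), BatHowOut VARCHAR(20))"
        )
        conn.executemany("INSERT INTO MatchPlayerBatting VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


print(highest_score([("96", "not out"), ("96", "lbw")]))   # first data set
print(highest_score([("96", "not out"), ("102", "lbw")]))  # second data set
```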
If you want to find the highest score for each player, here is a solution that may not be elegant, but is quite effective.
select PlayerID,
case when runs != round(runs)
then concat(round(runs),'*')
else
round(runs)
end highest_score
from (select PlayerID,
max(cast(BatRuns as decimal) +
case when BatHowOut = 'not out' then 0.1 else 0 end
) runs
from MatchPlayerBatting
group by PlayerID) max_runs;
This takes advantage of the fact that runs can never be fractions, only whole numbers. When there is a tie for the highest score and one of them is unbeaten,
adding 0.1 to the unbeaten score makes it the highest. The fraction can later be removed and replaced with a * suffix.
Sample here.
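The 0.1 tie-break can be checked per player with the question's two data sets assigned to two players. This is a sketch assuming SQLite via Python's sqlite3 module rather than MySQL: CONCAT() becomes ||, and round() is replaced by CAST(... AS INTEGER) because SQLite's round() returns a REAL such as 96.0; for these non-negative whole scores plus at most 0.1, truncation gives the same whole number. The player IDs 1 and 2 and the function name best_per_player are hypothetical.

```python
import sqlite3

# Hypothetical players: player 1 has the first data set, player 2 the second.
ROWS = [
    (1, "96", "not out"), (1, "96", "lbw"),
    (2, "96", "not out"), (2, "102", "lbw"),
]

# Unbeaten scores get +0.1 so they win ties; a fractional max means "not out".
QUERY = """
SELECT PlayerID,
       CASE WHEN runs != CAST(runs AS INTEGER)
            THEN CAST(CAST(runs AS INTEGER) AS TEXT) || '*'
            ELSE CAST(CAST(runs AS INTEGER) AS TEXT)
       END AS highest_score
FROM (SELECT PlayerID,
             MAX(CAST(BatRuns AS INTEGER)
                 + CASE WHEN BatHowOut = 'not out' THEN 0.1 ELSE 0 END) AS runs
      FROM MatchPlayerBatting
      GROUP BY PlayerID) AS max_runs
"""


def best_per_player():
    """Return {PlayerID: formatted highest score} for the sample rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE MatchPlayerBatting (PlayerID INTEGER,"
            " BatRuns VARCHAR(10), BatHowOut VARCHAR(20))"
        )
        conn.executemany("INSERT INTO MatchPlayerBatting VALUES (?, ?, ?)", ROWS)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(best_per_player())
```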