MySQL Query to average 3 columns and exclude 0's?

This is obviously wrong, but what would be the correct way to average the SUM of 3 columns and exclude the 0's?
SELECT (
AVG(NULLIF(`dices`.`Die1`,0)) +
AVG(NULLIF(`dices`.`Die2`,0)) +
AVG(NULLIF(`dices`.`Die3`,0))
) /3 as avgAllDice
FROM (
SELECT `Die1`,`Die2`,`Die3` FROM `GameLog`
WHERE PlayerId = "12345"
) dices
Thanks.

If I were keeping the inline view query (it's not clear why it's needed), I'd probably do something like this:
SELECT AVG( NULLIF( CASE d.i
WHEN 1 THEN dices.`Die1`
WHEN 2 THEN dices.`Die2`
WHEN 3 THEN dices.`Die3`
END
,0)
) AS `avgAllDice`
FROM ( SELECT gl.`Die1`
, gl.`Die2`
, gl.`Die3`
FROM `GameLog` gl
WHERE gl.playerId = '12345'
) dices
CROSS
JOIN ( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3 ) d
The trick is the cross join, which produces three rows for each row returned from dices, plus a CASE expression that picks out Die1, Die2 and Die3 on the first, second and third of those rows, respectively.
To exclude values of 0, we replace 0 with NULL (since AVG ignores NULL values).
Now, with all of the non-zero DieN values stacked into a single column, we can just use the AVG function.
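Here's a minimal sketch of the cross-join query that runs without a MySQL server. It is a stand-in under stated assumptions: it uses SQLite through Python's built-in sqlite3 module, drops the backticks, and loads the same rows used in the FOLLOWUP test.

```python
import sqlite3

def avg_all_dice(conn, player_id):
    """Average die1..die3 for one player, ignoring zeros, via the cross-join unpivot."""
    sql = """
        SELECT AVG(NULLIF(CASE d.i
                              WHEN 1 THEN dices.die1
                              WHEN 2 THEN dices.die2
                              WHEN 3 THEN dices.die3
                          END, 0)) AS avgAllDice
        FROM (SELECT die1, die2, die3
              FROM GameLog
              WHERE playerid = ?) AS dices
        CROSS JOIN (SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3) AS d
    """
    return conn.execute(sql, (player_id,)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GameLog (playerid TEXT DEFAULT '12345', die1 INT, die2 INT, die3 INT)")
conn.executemany(
    "INSERT INTO GameLog (die1, die2, die3) VALUES (?, ?, ?)",
    [(3, 0, 0), (2, 1, 0), (4, 3, 3), (3, 3, 3), (0, 0, 0), (4, 4, 4), (5, 4, 0), (0, 0, 2)],
)
print(avg_all_dice(conn, "12345"))  # 3.2 (48 pips over 15 non-zero dice)
```

A player with no rows gets NULL (None in Python), since AVG over an empty set is NULL.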
Another way to do it would be to get the numerator and denominator for each of Die1, Die2 and Die3, then total up the numerators, total up the denominators, and divide the total numerator by the total denominator.
This should give an equivalent result.
SELECT ( IFNULL(t.n_die1,0) + IFNULL(t.n_die2,0) + IFNULL(t.n_die3,0) )
/ ( t.d_die1 + t.d_die2 + t.d_die3 )
AS avgAllDice
FROM ( SELECT SUM( NULLIF(gl.die1,0)) AS n_die1
, COUNT(NULLIF(gl.die1,0)) AS d_die1
, SUM( NULLIF(gl.die2,0)) AS n_die2
, COUNT(NULLIF(gl.die2,0)) AS d_die2
, SUM( NULLIF(gl.die3,0)) AS n_die3
, COUNT(NULLIF(gl.die3,0)) AS d_die3
FROM `GameLog` gl
WHERE gl.playerid = '12345'
) t
(I didn't work out what either query returns in the edge and corner cases: no matching rows in GameLog, all values of Die1, Die2 and Die3 being zero, etc. The results might differ slightly, e.g. returning zero instead of NULL, or hitting a divide-by-zero.)
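Those edge cases can be probed directly. The sketch below runs both queries in SQLite via Python's sqlite3 rather than MySQL, so treat it as indicative only. Two assumptions to flag: SQLite truncates INTEGER / INTEGER, so the ratio query multiplies by 1.0 (MySQL's / already returns a decimal); and SQLite returns NULL on division by zero, which as far as I know is also what a plain MySQL SELECT does by default (with a warning).

```python
import sqlite3

CROSS_JOIN_SQL = """
    SELECT AVG(NULLIF(CASE d.i WHEN 1 THEN g.die1 WHEN 2 THEN g.die2 WHEN 3 THEN g.die3 END, 0))
    FROM (SELECT die1, die2, die3 FROM GameLog WHERE playerid = ?) AS g
    CROSS JOIN (SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3) AS d
"""

# 1.0 * forces real division in SQLite; the numerator/denominator logic is unchanged.
RATIO_SQL = """
    SELECT 1.0 * (IFNULL(t.n1, 0) + IFNULL(t.n2, 0) + IFNULL(t.n3, 0))
               / (t.d1 + t.d2 + t.d3)
    FROM (SELECT SUM(NULLIF(die1, 0)) AS n1, COUNT(NULLIF(die1, 0)) AS d1,
                 SUM(NULLIF(die2, 0)) AS n2, COUNT(NULLIF(die2, 0)) AS d2,
                 SUM(NULLIF(die3, 0)) AS n3, COUNT(NULLIF(die3, 0)) AS d3
          FROM GameLog WHERE playerid = ?) AS t
"""

def both(conn, player_id):
    """Return (cross-join result, ratio result) for one player."""
    return tuple(conn.execute(sql, (player_id,)).fetchone()[0]
                 for sql in (CROSS_JOIN_SQL, RATIO_SQL))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GameLog (playerid TEXT, die1 INT, die2 INT, die3 INT)")
conn.executemany("INSERT INTO GameLog VALUES (?, ?, ?, ?)",
                 [("12345", 3, 0, 0), ("12345", 2, 1, 0), ("zeros", 0, 0, 0)])

print(both(conn, "12345"))   # (2.0, 2.0): normal case
print(both(conn, "nobody"))  # (None, None): no matching rows
print(both(conn, "zeros"))   # (None, None): every die is 0
```

In this setup both queries agree: no rows and all-zero rows both give NULL rather than 0, because SUM of only NULLs is NULL, COUNT is 0, and 0/0 is NULL.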
FOLLOWUP
I ran a quick test of both queries.
CREATE DATABASE d20170228 ;
USE d20170228 ;
CREATE TABLE GameLog
( playerid VARCHAR(5) DEFAULT '12345'
, die1 TINYINT
, die2 TINYINT
, die3 TINYINT
);
INSERT INTO GameLog (die1,die2,die3)
VALUES (3,0,0),(2,1,0),(4,3,3),(3,3,3),(0,0,0),(4,4,4),(5,4,0),(0,0,2)
;
SELECT (3+2+1+4+3+3+3+3+3+4+4+4+5+4+2)/15 AS manual_avg
manual_avg comes out to 3.2.
Both queries also return 3.2.

If you want to eliminate zeroes and NULLs, you can simply SELECT from the filtered master set multiple times, doing a UNION ALL on the results, then averaging against that.
SELECT AVG(`allDice`.`DieResult`)
FROM (
SELECT `Die1` AS `DieResult` FROM `GameLog` WHERE COALESCE(`Die1`, 0) <> 0 AND PlayerId = '12345'
UNION ALL
SELECT `Die2` FROM `GameLog` WHERE COALESCE(`Die2`, 0) <> 0 AND PlayerId = '12345'
UNION ALL
SELECT `Die3` FROM `GameLog` WHERE COALESCE(`Die3`, 0) <> 0 AND PlayerId = '12345'
) AS `allDice`
There's no need to overthink this one; it's not too difficult a problem.
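A quick way to check that this version skips NULLs as well as zeros is the sketch below, again in SQLite via Python's sqlite3 instead of MySQL (an assumption for portability). The rows are made up so that two different dice both show a 2.

```python
import sqlite3

UNION_SQL = """
    SELECT AVG(allDice.DieResult)
    FROM (
        SELECT Die1 AS DieResult FROM GameLog WHERE COALESCE(Die1, 0) <> 0 AND PlayerId = ?
        UNION ALL
        SELECT Die2 FROM GameLog WHERE COALESCE(Die2, 0) <> 0 AND PlayerId = ?
        UNION ALL
        SELECT Die3 FROM GameLog WHERE COALESCE(Die3, 0) <> 0 AND PlayerId = ?
    ) AS allDice
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GameLog (PlayerId TEXT, Die1 INT, Die2 INT, Die3 INT)")
# Made-up rows mixing zeros and NULLs; the non-zero, non-NULL values are 6, 2, 2, 5.
conn.executemany("INSERT INTO GameLog VALUES ('12345', ?, ?, ?)",
                 [(6, 0, None), (2, 2, 0), (None, None, 5)])

result = conn.execute(UNION_SQL, ("12345",) * 3).fetchone()[0]
print(result)  # 3.75, i.e. (6 + 2 + 2 + 5) / 4
```

Swapping UNION ALL for plain UNION here would collapse the two 2s into one and give (6 + 2 + 5) / 3 instead, which is why the ALL matters in this query.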

Related

Wrong result in MySQL, but only inside a function

I have the following MySQL function:
BEGIN
DECLARE r float(10,2);
DECLARE var_total float(10,2);
DECLARE var_discount float(10,2) DEFAULT null;
SELECT
sum(x.amount)
FROM
(
(SELECT
student_booking_school_course_price as amount
FROM
tbl_student_booking_school_course
WHERE
student_booking_id=par_student_booking_id
)
UNION
(SELECT
student_booking_school_accommodation_price as amount
FROM
tbl_student_booking_school_accommodation
WHERE
student_booking_id=par_student_booking_id
)
UNION
(SELECT
student_booking_school_insurance_price as amount
FROM
tbl_student_booking_school_insurance
WHERE
student_booking_id=par_student_booking_id
)
UNION
(SELECT
student_booking_school_transfer_price as amount
FROM
tbl_student_booking_school_transfer
WHERE
student_booking_id=par_student_booking_id
)
) x
INTO var_total;
IF var_total IS NULL THEN
SET r = 0;
END IF;
-- discount
SET var_discount = (SELECT
sb.student_booking_discount_amount
FROM
tbl_student_booking sb
WHERE
sb.student_booking_id=par_student_booking_id LIMIT 1);
IF var_discount IS NOT NULL THEN
SET r = var_total - var_discount;
end if;
return r;
END
The values are:
9698.88 course
559.55 accommodation
559.55 insurance
145.98 discount
It seems that the first query inside the function only sums distinct values: the result after the discount is 10112.45, so one of the two 559.55 values is not being added. I tried outputting the individual rows (CONCAT-ed with a label) and saw 9698.88course, 559.55accommodation, etc., which looks fine. So I assume the problem is that equal values are not both summed. The strange thing is that when I run just the query on its own from the console, outside the function, it sums correctly.
My question: is this normal behaviour for MySQL? If so, is there a way to prevent it? Or is this a bug?
What you need here is UNION ALL instead of UNION:
SELECT
sum(x.amount)
FROM
(
(SELECT
student_booking_school_course_price as amount
FROM
tbl_student_booking_school_course
WHERE
student_booking_id=par_student_booking_id
)
UNION ALL
(SELECT
student_booking_school_accommodation_price as amount
FROM
tbl_student_booking_school_accommodation
WHERE
student_booking_id=par_student_booking_id
)
UNION ALL
(SELECT
student_booking_school_insurance_price as amount
FROM
tbl_student_booking_school_insurance
WHERE
student_booking_id=par_student_booking_id
)
UNION ALL
(SELECT
student_booking_school_transfer_price as amount
FROM
tbl_student_booking_school_transfer
WHERE
student_booking_id=par_student_booking_id
)
) x
INTO var_total;
The MySQL UNION Documentation says:
A DISTINCT union can be produced explicitly by using UNION DISTINCT or
implicitly by using UNION with no following DISTINCT or ALL keyword.
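To see the effect with the asker's numbers, here's a cut-down sketch in SQLite via Python's sqlite3, not MySQL. The table and column names are hypothetical simplifications of the tbl_student_booking_school_* tables; only the three amounts come from the question.

```python
import sqlite3

TABLES = ("course", "accommodation", "insurance", "transfer")

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the four booking tables: one price column each.
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (booking_id INTEGER, price REAL)")
conn.execute("INSERT INTO course VALUES (1, 9698.88)")
conn.execute("INSERT INTO accommodation VALUES (1, 559.55)")
conn.execute("INSERT INTO insurance VALUES (1, 559.55)")  # same amount as accommodation
# transfer stays empty

def booking_total(booking_id, set_op):
    """Sum one booking's prices, combining the four SELECTs with set_op ('UNION' or 'UNION ALL')."""
    parts = [f"SELECT price AS amount FROM {t} WHERE booking_id = ?" for t in TABLES]
    combined = f" {set_op} ".join(parts)
    sql = f"SELECT SUM(amount) FROM ({combined}) AS x"
    total = conn.execute(sql, (booking_id,) * len(parts)).fetchone()[0]
    return round(total, 2)

print(booking_total(1, "UNION"))      # 10258.43: the second 559.55 is discarded as a duplicate
print(booking_total(1, "UNION ALL"))  # 10817.98: every row is kept
```

Subtracting the 145.98 discount from the UNION total gives 10112.45, the wrong figure reported in the question; the UNION ALL total gives 10672.00.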

Calculate medians for multiple columns in the same table in one query call

StackOverflow to the rescue! I need to find the medians for five columns at once, in one query call.
The median calculations below work for single columns, but when combined, the multiple uses of "rownum" throw the query off. How can I update this to work for multiple columns? THANK YOU. It's for a web tool where nonprofits can compare their financial metrics to user-defined peer groups.
SELECT t1_wages.totalwages_pctoftotexp AS median_totalwages_pctoftotexp
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , d_wages.totalwages_pctoftotexp
FROM data_990_c3 d_wages, (
SELECT @rownum :=0
)r_wages
WHERE totalwages_pctoftotexp >0
ORDER BY d_wages.totalwages_pctoftotexp
) AS t1_wages, (
SELECT COUNT( * ) AS total_rows
FROM data_990_c3 d_wages
WHERE totalwages_pctoftotexp >0
) AS t2_wages
WHERE 1
AND t1_wages.row_number = FLOOR( total_rows /2 ) +1
--- [that was one median, below is another] ---
SELECT t1_solvent.solvent_days AS median_solvent_days
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , d_solvent.solvent_days
FROM data_990_c3 d_solvent, (
SELECT @rownum :=0
)r_solvent
WHERE solvent_days >0
ORDER BY d_solvent.solvent_days
) AS t1_solvent, (
SELECT COUNT( * ) AS total_rows
FROM data_990_c3 d_solvent
WHERE solvent_days >0
) AS t2_solvent
WHERE 1
AND t1_solvent.row_number = FLOOR( total_rows /2 ) +1
[those are two - there are five in total I'll eventually need to find medians for at once]
This kind of thing is a big pain in the neck in MySQL. You might be wise to use the free Oracle Express Edition or PostgreSQL if you're going to do tonnage of this statistical ranking work. Both have MEDIAN(value) aggregate functions that are either built in or available as extensions. Here's a little sqlfiddle demonstrating that. http://sqlfiddle.com/#!4/53de8/6/0
But you didn't ask about that.
In MySQL, your basic problem is the scope of variables like @rownum. You also have a pivoting problem: that is, you need to turn rows of your query into columns.
Let's tackle the pivot problem first. What you're going to do is create a union of several big fat queries. For example:
SELECT 'median_wages' AS tag, wages AS value
FROM (big fat query making median wages) A
UNION
SELECT 'median_volunteer_hours' AS tag, hours AS value
FROM (big fat query making median volunteer hours) B
UNION
SELECT 'median_solvent_days' AS tag, days AS value
FROM (big fat query making median solvency days) C
So here are your results in a table of tag / value pairs. You can pivot that table like so, to get one row with a value in each column.
SELECT SUM( CASE tag WHEN 'median_wages' THEN value ELSE 0 END
) AS median_wages,
SUM( CASE tag WHEN 'median_volunteer_hours' THEN value ELSE 0 END
) AS median_volunteer_hours,
SUM( CASE tag WHEN 'median_solvent_days' THEN value ELSE 0 END
) AS median_solvent_days
FROM (
/* the above gigantic UNION query */
) Q
That's how you pivot up rows (from the UNION query in this case) to columns. Here's a tutorial on the topic. http://www.artfulsoftware.com/infotree/qrytip.php?id=523
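The pivot step on its own, as a runnable sketch in SQLite via Python's sqlite3 (an assumption: MySQL isn't needed to show the CASE/SUM trick). The tagged table and its values are invented stand-ins for the output of the big UNION query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented tag/value pairs standing in for the output of the big UNION query.
conn.execute("CREATE TABLE tagged (tag TEXT, value REAL)")
conn.executemany("INSERT INTO tagged VALUES (?, ?)", [
    ("median_wages", 0.42),
    ("median_volunteer_hours", 120.0),
    ("median_solvent_days", 35.0),
])

# Each SUM keeps only the value whose tag matches its column and adds zeros for the rest.
PIVOT_SQL = """
    SELECT SUM(CASE tag WHEN 'median_wages'           THEN value ELSE 0 END) AS median_wages,
           SUM(CASE tag WHEN 'median_volunteer_hours' THEN value ELSE 0 END) AS median_volunteer_hours,
           SUM(CASE tag WHEN 'median_solvent_days'    THEN value ELSE 0 END) AS median_solvent_days
    FROM tagged AS Q
"""
row = conn.execute(PIVOT_SQL).fetchone()
print(row)  # (0.42, 120.0, 35.0): one row, one column per tag
```

Because of ELSE 0, a tag missing from the UNION output shows up as 0 rather than NULL; ELSE NULL would make that case visible.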
Now we need to tackle the median-computing subqueries. The code in your question looks pretty good. I don't have your data so it's hard for me to evaluate it.
But you need to avoid reusing the @rownum variable. Call it @rownum1 in one of your queries, @rownum2 in the next one, and so on. Here's a dinky sql fiddle doing just one of these. http://sqlfiddle.com/#!2/2f770/1/0
Now let's build it up a bit, doing two different medians. Here's the fiddle http://sqlfiddle.com/#!2/2f770/2/0 and here's the UNION query. Notice the second half of the union query uses @rownum2 instead of @rownum.
Finally, here's the full query with the pivoting. http://sqlfiddle.com/#!2/2f770/13/0
SELECT SUM( CASE tag WHEN 'Boston' THEN value ELSE 0 END ) AS Boston,
SUM( CASE tag WHEN 'Bronx' THEN value ELSE 0 END ) AS Bronx
FROM (
SELECT 'Boston' AS tag, pop AS VALUE
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , pop
FROM pops,
(SELECT @rownum :=0)r
WHERE pop >0 AND city = 'Boston'
ORDER BY pop
) AS ordered_rows,
(
SELECT COUNT( * ) AS total_rows
FROM pops
WHERE pop >0 AND city = 'Boston'
) AS rowcount
WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1
UNION ALL
SELECT 'Bronx' AS tag, pop AS VALUE
FROM (
SELECT @rownum2 := @rownum2 +1 AS `row_number` , pop
FROM pops,
(SELECT @rownum2 :=0)r
WHERE pop >0 AND city = 'Bronx'
ORDER BY pop
) AS ordered_rows,
(
SELECT COUNT( * ) AS total_rows
FROM pops
WHERE pop >0 AND city = 'Bronx'
) AS rowcount
WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1
) D
This is just two medians. You need five. I think it's easy to make the case that this median computation is absurdly difficult to do in MySQL in a single query.
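For comparison, on a server with window functions (MySQL 8.0 has them, as does SQLite 3.25+), the row counter doesn't need user variables at all, and several medians fit in one statement as scalar subqueries. This is a swapped-in technique (ROW_NUMBER() instead of @rownum), sketched in SQLite via Python's sqlite3 with made-up rows in the question's data_990_c3 table. Like the original, it keeps values > 0 and picks row FLOOR(n/2)+1, which for an even count is the upper of the two middle values rather than their average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_990_c3 (totalwages_pctoftotexp REAL, solvent_days INT)")
conn.executemany("INSERT INTO data_990_c3 VALUES (?, ?)",
                 [(0.40, 30), (0.10, 10), (0.30, 20), (0.00, 50), (0.20, 0)])  # made-up rows

# One CTE per column; n / 2 is integer division in SQLite, matching FLOOR(total_rows / 2).
MEDIANS_SQL = """
WITH wages AS (
    SELECT totalwages_pctoftotexp AS v,
           ROW_NUMBER() OVER (ORDER BY totalwages_pctoftotexp) AS rn,
           COUNT(*) OVER () AS n
    FROM data_990_c3 WHERE totalwages_pctoftotexp > 0
), solvent AS (
    SELECT solvent_days AS v,
           ROW_NUMBER() OVER (ORDER BY solvent_days) AS rn,
           COUNT(*) OVER () AS n
    FROM data_990_c3 WHERE solvent_days > 0
)
SELECT (SELECT v FROM wages   WHERE rn = n / 2 + 1) AS median_totalwages_pctoftotexp,
       (SELECT v FROM solvent WHERE rn = n / 2 + 1) AS median_solvent_days
"""
print(conn.execute(MEDIANS_SQL).fetchone())  # (0.3, 30)
```

Adding the remaining three columns means three more CTEs and three more scalar subqueries; no pivot step is needed because each median is already its own column.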
Suppose you have a table with three columns, like table(key, value1, value2).
This PostgreSQL query (array_agg with ORDER BY and array subscripts are not available in MySQL) gives you the median of each of the two value columns for each key:
SELECT key,
((array_agg(value1 order by value1 asc) )[floor( (count(*)+1)::float/2)] + (array_agg(value1 order by value1 asc) )[ceiling( (count(*)+1)::float/2) ] )/2.0,
((array_agg(value2 order by value2 asc) )[floor( (count(*)+1)::float/2)] + (array_agg(value2 order by value2 asc) )[ceiling( (count(*)+1)::float/2) ] )/2.0
FROM table
GROUP BY key
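The index arithmetic in that query is easy to misread, so here it is mirrored in Python (a sketch, not PostgreSQL itself). For odd n both positions land on the middle element; for even n they are the two middle elements. PostgreSQL arrays are 1-based, hence the -1 when indexing the Python list. The final division here is true division; with integer columns in PostgreSQL, a plain /2 would truncate.

```python
import math
import statistics

def median_by_array_index(values):
    """Sort, then average the elements at 1-based positions
    floor((n+1)/2) and ceiling((n+1)/2), as the array_agg expression does."""
    arr = sorted(values)
    n = len(arr)
    lo = math.floor((n + 1) / 2)  # 1-based position, as in PostgreSQL arrays
    hi = math.ceil((n + 1) / 2)
    return (arr[lo - 1] + arr[hi - 1]) / 2  # shift to Python's 0-based indexing

for data in ([7, 1, 3], [7, 1, 3, 5]):
    # The two columns printed should agree.
    print(median_by_array_index(data), statistics.median(data))
```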

How to get the smallest Integers not yet in a database column

I have a table in a MySQL DB with a UNIQUE INT(10) column. The table is fairly well populated, and that column contains non-consecutive integers. I would like a query that gets me the smallest number (or the n smallest numbers) that is not in any row.
Example: the column contains the values (1, 2, 3, 5, 7, 8, 10, 12, 15). The SQL statement should return, e.g., the five lowest values that are not present, which are 4, 6, 9, 11, 13 in this case.
Is this possible with MySQL?
You can use a "numbers" table (it's handy for various operations):
CREATE TABLE num
( i INT UNSIGNED NOT NULL
, PRIMARY KEY (i)
) ;
INSERT INTO num (i)
VALUES
(1), (2), ..., (1000000) ;
Then:
SELECT
num.i
FROM
num
LEFT JOIN
tableX AS t
ON num.i = t.columnX
WHERE
t.columnX IS NULL
ORDER BY
num.i
LIMIT 5
or:
SELECT
num.i
FROM
num
WHERE
NOT EXISTS
( SELECT *
FROM tableX AS t
WHERE num.i = t.columnX
)
ORDER BY
num.i
LIMIT 5
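Here's a runnable sketch of both queries in SQLite via Python's sqlite3 (an assumption; the SQL itself carries over to MySQL). Rather than typing out the VALUES list, it fills the numbers table with a recursive CTE up to a hypothetical bound of 1000; the tableX values are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (columnX INTEGER UNIQUE)")
conn.executemany("INSERT INTO tableX VALUES (?)",
                 [(v,) for v in (1, 2, 3, 5, 7, 8, 10, 12, 15)])

# Numbers table filled by a recursive CTE; 1000 is a hypothetical upper bound.
conn.execute("CREATE TABLE num (i INTEGER NOT NULL PRIMARY KEY)")
conn.execute("""
    WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM seq WHERE i < 1000)
    INSERT INTO num (i) SELECT i FROM seq
""")

LEFT_JOIN_SQL = """
    SELECT num.i FROM num
    LEFT JOIN tableX AS t ON num.i = t.columnX
    WHERE t.columnX IS NULL
    ORDER BY num.i
    LIMIT 5
"""
NOT_EXISTS_SQL = """
    SELECT num.i FROM num
    WHERE NOT EXISTS (SELECT * FROM tableX AS t WHERE num.i = t.columnX)
    ORDER BY num.i
    LIMIT 5
"""

for sql in (LEFT_JOIN_SQL, NOT_EXISTS_SQL):
    print([i for (i,) in conn.execute(sql)])  # [4, 6, 9, 11, 13]
```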
Another approach, without an auxiliary table, would be to use MySQL variables. You can test it in SQL-Fiddle, test-2. The output is not the same as the previous queries' output (just to show that it can be done):
SELECT start_id, end_id
FROM
( SELECT
IF( t.columnX <> @id, @id, NULL) AS start_id
, IF( t.columnX <> @id, t.columnX-1, NULL) AS end_id
, @rows := @rows + (t.columnX - @id) AS r
, @id := t.columnX + 1 AS running_id
FROM
tableX AS t
CROSS JOIN
( SELECT @rows := 0
, @id := 1
) AS dummy
WHERE
@rows < 5
ORDER BY
t.columnX
) AS tmp
WHERE
start_id IS NOT NULL
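Setting user variables inside a SELECT like this is deprecated as of MySQL 8.0, and the evaluation order of such expressions isn't guaranteed. On a server with window functions, the same start/end gap ranges fall out of LAG(). This sketch swaps in that technique and runs it in SQLite (3.25+) through Python's sqlite3 with the question's values; unlike the variable version, it does not stop after five missing numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (columnX INTEGER UNIQUE)")
conn.executemany("INSERT INTO tableX VALUES (?)",
                 [(v,) for v in (1, 2, 3, 5, 7, 8, 10, 12, 15)])

GAP_RANGES_SQL = """
    SELECT prev_x + 1 AS start_id, columnX - 1 AS end_id
    FROM (
        -- LAG's default of 0 stands for "before the first row", so a gap starting at 1 is caught
        SELECT columnX, LAG(columnX, 1, 0) OVER (ORDER BY columnX) AS prev_x
        FROM tableX
    ) AS t
    WHERE columnX - prev_x > 1
    ORDER BY start_id
"""
print(conn.execute(GAP_RANGES_SQL).fetchall())  # [(4, 4), (6, 6), (9, 9), (11, 11), (13, 14)]
```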
This will work, but I think it is pretty inefficient. You won't need an extra table, though (a numbers table covering every positive INT would take (2^31-1)*4/1024^3 ≈ 8 GB). Also, I advise you to look at why you need this, because it might not be necessary.
Also, it will return the start and end of each missing range, but not all numbers in that range (e.g. if you have the numbers 1 and 5, it will return {0, 2, 4, 6}).
SELECT (t.num-1) AS bound FROM t
WHERE t.num-1 NOT IN (SELECT t.num FROM t)
UNION
SELECT (t.num+1) AS bound FROM t
WHERE t.num+1 NOT IN (SELECT t.num FROM t)
As I said, this will be pretty inefficient. JOINs might be faster, but you would need to benchmark it.
SELECT (t.num-1) AS bound FROM t
LEFT JOIN t AS u ON t.num-1 = u.num
WHERE u.num IS NULL
UNION
SELECT (t.num+1) AS bound FROM t
LEFT JOIN t AS u ON t.num+1 = u.num
WHERE u.num IS NULL
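Both bound queries can be checked against the {1, 5} example in SQLite via Python's sqlite3 (a stand-in for MySQL; the SQL is unchanged apart from spacing).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (num INTEGER UNIQUE)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (5,)])

NOT_IN_SQL = """
    SELECT (t.num - 1) AS bound FROM t WHERE t.num - 1 NOT IN (SELECT t.num FROM t)
    UNION
    SELECT (t.num + 1) AS bound FROM t WHERE t.num + 1 NOT IN (SELECT t.num FROM t)
"""
LEFT_JOIN_SQL = """
    SELECT (t.num - 1) AS bound FROM t LEFT JOIN t AS u ON t.num - 1 = u.num WHERE u.num IS NULL
    UNION
    SELECT (t.num + 1) AS bound FROM t LEFT JOIN t AS u ON t.num + 1 = u.num WHERE u.num IS NULL
"""

for sql in (NOT_IN_SQL, LEFT_JOIN_SQL):
    print(sorted(b for (b,) in conn.execute(sql)))  # [0, 2, 4, 6]
```

One more reason to prefer the JOIN form: if the column can contain NULL, x NOT IN (subquery) is never true, so the first query returns no rows at all.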

MySQL - match post code based on one or two first characters

I'm trying to create a SQL statement to find the matching record based on the provided post code and stored post codes in the database plus the weight aspect.
The post codes in the database are either 1 or 2 characters long, e.g. B, BA, ...
Now, the value passed to the SQL statement will always be the first 2 characters of the client's post code. How can I find the match for it? Say I have the post code B1: it should only match the single B in the database (plus the weight aspect, which I'm OK with).
Here's my current SQL statement, which also takes the factor of the free shipping above certain weight:
SELECT `s`.*,
IF (
'{$weight}' > (
SELECT MAX(`weight_from`)
FROM `shipping`
WHERE UPPER(SUBSTRING(`post_code`, 1, 2)) = 'B1'
),
(
SELECT `cost`
FROM `shipping`
WHERE UPPER(SUBSTRING(`post_code`, 1, 2)) = 'B1'
ORDER BY `weight_from` DESC
LIMIT 0, 1
),
`s`.`cost`
) AS `cost`
FROM `shipping` `s`
WHERE UPPER(SUBSTRING(`s`.`post_code`, 1, 2)) = 'B1'
AND
(
(
'{$weight}' > (
SELECT MAX(`weight_from`)
FROM `shipping`
WHERE UPPER(SUBSTRING(`post_code`, 1, 2)) = 'B1'
)
)
OR
('{$weight}' BETWEEN `s`.`weight_from` AND `s`.`weight_to`)
)
LIMIT 0, 1
The above, however, uses the SUBSTRING() function with the number of characters hard-coded to 2. This is where I really need some help: making it compare only as many characters as the stored post code has, so that it matches the provided post code, in this case B1.
Marcus, thanks for the help, outstanding example. Here's what my code looks like, for anyone else wondering:
First I ran the following statement to get the right post code:
(
SELECT `post_code`
FROM `shipping`
WHERE `post_code` = 'B1'
)
UNION
(
SELECT `post_code`
FROM `shipping`
WHERE `post_code` = SUBSTRING('B1', 1, 1)
)
ORDER BY `post_code` DESC
LIMIT 0, 1
Then, using the returned value (the 'post_code' index of the result), my second statement is:
$post_code = $result['post_code'];
SELECT `s`.*,
IF (
'1000' > (
SELECT MAX(`weight_from`)
FROM `shipping`
WHERE `post_code` = '{$post_code}'
),
(
SELECT `cost`
FROM `shipping`
WHERE `post_code` = '{$post_code}'
ORDER BY `weight_from` DESC
LIMIT 0, 1
),
`s`.`cost`
) AS `cost`
FROM `shipping` `s`
WHERE `s`.`post_code` = '{$post_code}'
AND
(
(
'1000' > (
SELECT MAX(`weight_from`)
FROM `shipping`
WHERE `post_code` = '{$post_code}'
ORDER BY LENGTH(`post_code`) DESC
)
)
OR
('1000' BETWEEN `s`.`weight_from` AND `s`.`weight_to`)
)
LIMIT 0, 1
The following query gets all rows where the post_code in the shipping table matches the beginning of the passed-in post code, orders them from most specific (longest) to least specific, and returns the most specific one:
SELECT *
FROM shipping
WHERE post_code = SUBSTRING('B1', 1, LENGTH(post_code))
ORDER BY LENGTH(post_code) DESC
LIMIT 1
Update
While this query is flexible, it's not very fast, since it can't utilize an index. If the shipping table is large, and you'll only pass in up to two characters, it might be faster to make two separate calls.
First, try the most explicit call.
SELECT *
FROM shipping
WHERE post_code = 'B1'
If it doesn't return a result then search on a single character:
SELECT *
FROM shipping
WHERE post_code = SUBSTRING('B1', 1, 1)
Of course, you can combine these with a UNION if you must do it in a single call:
SELECT * FROM
((SELECT *
FROM shipping
WHERE post_code = 'B1')
UNION
(SELECT *
FROM shipping
WHERE post_code = SUBSTRING('B1', 1, 1))) a
ORDER BY post_code DESC
LIMIT 1
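A sketch comparing the flexible prefix query with the UNION version, run in SQLite via Python's sqlite3 with made-up shipping rows. Assumptions: SUBSTR stands in for MySQL's SUBSTRING, only post_code is selected, and the parentheses around each SELECT are dropped because SQLite rejects them in a compound query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipping (post_code TEXT, cost REAL)")
conn.executemany("INSERT INTO shipping VALUES (?, ?)",
                 [("B", 5.0), ("BA", 7.5), ("C", 4.0)])  # made-up rows

# Flexible version: a stored code matches if it equals the same-length prefix of the input.
PREFIX_SQL = """
    SELECT post_code FROM shipping
    WHERE post_code = SUBSTR(:pc, 1, LENGTH(post_code))
    ORDER BY LENGTH(post_code) DESC
    LIMIT 1
"""
# Index-friendly version: two exact lookups; the 2-character code sorts after its 1-character prefix.
UNION_SQL = """
    SELECT post_code FROM shipping WHERE post_code = :pc
    UNION
    SELECT post_code FROM shipping WHERE post_code = SUBSTR(:pc, 1, 1)
    ORDER BY post_code DESC
    LIMIT 1
"""

def lookup(sql, pc):
    """Return the best-matching stored post code, or None if nothing matches."""
    row = conn.execute(sql, {"pc": pc}).fetchone()
    return row[0] if row else None

for pc in ("B1", "BA", "C9", "Z1"):
    print(pc, lookup(PREFIX_SQL, pc), lookup(UNION_SQL, pc))
```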

Filter out orphan table entries

Suppose there is a table with only two columns (an example is shown below). Every '1' entry should be followed (in the sorted order given below) by a '0'. However, as you can see, in the table, there are some 'orphans' where there are two consecutive '1's.
How can I create a query that returns all the rows, except for the first of any consecutive '1's? (This would reduce the example below from 16 rows to 14)
1 E
0 A
1 T
0 S
1 R
0 E
1 F
0 T
1 G
1 T
0 R
1 X
1 R
0 R
1 E
0 T
I'm going to try to clarify my problem; I think I simplified it too much above. Imagine one table called logs, with four columns:
user (a string containing a username)
machine (a string uniquely identifying various PCs)
type (event's type: a 1 for login and a 0 for logout)
time (the time of the event being logged)
[The machine/time pair provides a unique key, as no machine can be logged in or out of twice at the same instant. Presumably an 'ID' column could be artificially created based on machine/time sort if needed.]
The idea is that every login event should be accompanied by a logout event. In an ideal world it would be fairly easy to match logins to logouts, and hence analyse the time spent logged in.
However, in the case of a power cut, the logout will not be recorded. Therefore (considering only one machine's data, sorted by time) if there are two login events in a row, we want to ignore the first login, because we don't have any reliable data from it. This is the problem I am trying to solve.
Provided that:
only 1's are dupes, never 0's, and
you want to get rid of every 1 except the last one when there are several in a row.
Your text says "except for the first of any consecutive", but I think this is what you want. If there can only ever be 2 in a row, it comes to the same thing.
SELECT x.*
FROM x
LEFT JOIN x y on y.id = (x.id + 1)
WHERE (x.nr = y.nr) IS NOT TRUE -- OR x.nr = 0
ORDER BY x.id
If you want to preserve double 0's, add the commented-out clause as well, though it's probably not needed.
Edit after question edit:
You may want to add an auto-increment column to your data to make this simpler:
Generate (i.e. write) a row number index column in MySQL
Other RDBMS (PostgreSQL, Oracle, SQL Server, ..) have window functions like row_number() or lag() and lead() that make such an operation much easier.
Assuming you have an id (add a column and set it to the record number in the sorted order), this returns the orphaned 1's, i.e. the rows to filter out:
select a.*
from the_table a
left join the_table b on b.id = a.id + 1
and b.col1 = 0
where a.col1 = 1
and b.id is null
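With window functions (MySQL 8.0 now has them too), the lead() route mentioned above looks like this. It's a sketch in SQLite 3.25+ via Python's sqlite3, loading the 16-row example from the question into a hypothetical bits table with an id in the listed order. LEAD(bit, 1, 0) defaults to 0 past the last row, so a trailing 1 isn't treated as an orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bits (id INTEGER PRIMARY KEY, bit INTEGER NOT NULL, code TEXT)")
# The 16 rows from the question, in the listed order (id is assigned 1..16).
pairs = "1E 0A 1T 0S 1R 0E 1F 0T 1G 1T 0R 1X 1R 0R 1E 0T".split()
conn.executemany("INSERT INTO bits (bit, code) VALUES (?, ?)",
                 [(int(p[0]), p[1]) for p in pairs])

FILTER_SQL = """
    SELECT id, bit, code
    FROM (
        -- next_bit is the following row's bit, or 0 after the last row
        SELECT id, bit, code, LEAD(bit, 1, 0) OVER (ORDER BY id) AS next_bit
        FROM bits
    ) AS t
    WHERE NOT (bit = 1 AND next_bit = 1)  -- drop a 1 that is followed by another 1
    ORDER BY id
"""
kept = conn.execute(FILTER_SQL).fetchall()
print(len(kept))  # 14
print([code for (_, _, code) in kept])
```

For the real logs table, the window would presumably be LEAD(type, 1, 0) OVER (PARTITION BY machine ORDER BY time), so each machine's events are compared only with each other.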
Try:
select l.*
from logs l
where l.type = 0 or
not (select n.type
from logs n
where n.machine = l.machine and
n.user = l.user and
n.time > l.time
order by n.time
limit 1)
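The correlated-subquery idea (look up the type of the next event for the same machine and user, and drop a login whose next event is another login) can be checked in SQLite via Python's sqlite3. This sketch fetches the next event with ORDER BY ... LIMIT 1; the events are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (user TEXT, machine TEXT, type INTEGER, time INTEGER)")
# Made-up events: login, logout, login (orphaned by a power cut), login, logout.
conn.executemany("INSERT INTO logs VALUES ('alice', 'pc1', ?, ?)",
                 [(1, 1), (0, 2), (1, 3), (1, 4), (0, 5)])

NEXT_EVENT_SQL = """
    SELECT l.*
    FROM logs l
    WHERE l.type = 0
       OR NOT (SELECT n.type          -- type of the next event for this machine/user
               FROM logs n
               WHERE n.machine = l.machine
                 AND n.user = l.user
                 AND n.time > l.time
               ORDER BY n.time
               LIMIT 1)
    ORDER BY l.time
"""
kept_times = [row[3] for row in conn.execute(NEXT_EVENT_SQL)]
print(kept_times)  # [1, 2, 4, 5]: the login at time 3 is dropped
```

A login with no later event makes the subquery NULL, so NOT (...) is NULL and that login is dropped as well.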
Using a CTE to separate the lag logic from the selection criteria (this script is PostgreSQL; SET search_path is PostgreSQL syntax):
DROP TABLE IF EXISTS tmp.bits;
CREATE TABLE tmp.bits
( id SERIAL NOT NULL
, bit INTEGER NOT NULL
, code CHAR(1)
);
INSERT INTO tmp.bits(bit, code) VALUES
(1, 'T' )
, (0, 'S' )
, (1, 'R' )
, (0, 'E' )
, (1, 'F' )
, (0, 'T' )
, (1, 'G' )
, (1, 'T' )
, (0, 'R' )
, (1, 'X' )
, (1, 'R' )
, (0, 'R' )
, (1, 'E' )
, (0, 'T' )
;
SET search_path='tmp';
SELECT * FROM bits;
-- EXPLAIN ANALYZE
WITH prevnext AS (
SELECT
bt.id AS thisid
, bt.bit AS thisbit
, bt.code AS thiscode
, bp.bit AS prevbit
, bp.code AS prevcode
FROM bits bt
LEFT JOIN bits bp ON (bt.id > bp.id)
AND NOT EXISTS ( SELECT * FROM bits nx
WHERE nx.id > bp.id
AND nx.id < bt.id
)
)
SELECT thisid, thisbit, thiscode
FROM prevnext
WHERE thisbit=0
OR prevbit IS NULL OR thisbit <> prevbit
;
EDIT:
For those poor souls who cannot use CTEs, it is easy to create a view instead:
CREATE VIEW prevnext AS (
SELECT
bt.id AS thisid
, bt.bit AS thisbit
,bt.code AS thiscode
, bp.bit AS prevbit
, bp.code AS prevcode
FROM bits bt
LEFT JOIN bits bp ON (bt.id > bp.id)
AND NOT EXISTS ( SELECT * FROM bits nx
WHERE nx.id > bp.id
AND nx.id < bt.id
)
)
;
SELECT thisid, thisbit, thiscode
FROM prevnext
WHERE thisbit=0
OR prevbit IS NULL OR thisbit <> prevbit
;