Subcounts as fields in a MySQL query - mysql

For a statistics application, with a table structure as such:
unique_id | browser_family | os | date_time | js_enabled | flash_enabled | non_binary_field
1         | firefox        | w7 | ...       | 1          | 0             | yes
2         | chrome         | w7 | ...       | 1          | 1             | no
3         | ie9            | wx | ...       | 0          | 0             | yes
So, I'd like to perform a query with where clauses on any fields, and have it give me counts of js_enabled=1, flash_enabled =0, non_binary_field = 'yes' for those criteria (say `os` = 'w7' and date(`date_time`) = '01-08-2012').
The result would be:
count(js_enabled=1) | count(flash_enabled=1) | count(non_binary_field='yes')
2                   | 1                      | 1
Is this possible in a single query?
Thanks!

select sum(js_enabled=1),
sum(flash_enabled=1),
sum(non_binary_field='yes')
from your_table
where `os` = 'w7'
and date(`date_time`) = '2012-08-01'
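The query above relies on MySQL evaluating a comparison such as js_enabled=1 to 1 or 0, so SUM adds up the matching rows. As a more portable sketch of the same idea, conditional counting with CASE could look like this (the aliases js_count, flash_count and yes_count are illustrative names, not from the question):

```sql
-- COUNT ignores NULL, so each CASE counts only the rows matching its condition.
-- Aliases are illustrative; table and column names follow the question.
SELECT COUNT(CASE WHEN js_enabled = 1 THEN 1 END)           AS js_count,
       COUNT(CASE WHEN flash_enabled = 1 THEN 1 END)        AS flash_count,
       COUNT(CASE WHEN non_binary_field = 'yes' THEN 1 END) AS yes_count
FROM your_table
WHERE `os` = 'w7'
  AND DATE(`date_time`) = '2012-08-01';
```

One difference worth knowing: when no rows match the WHERE clause, COUNT returns 0 whereas SUM returns NULL.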

Each field can be filled by a separate subquery:
select
(select count(js_enabled) from yourtable where js_enabled=1),
(select count(flash_enabled) from yourtable where flash_enabled=1),
(select count(non_binary_field) from yourtable where non_binary_field='yes')
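As written, these subqueries count over the whole table rather than over the rows matching the question's criteria. A sketch that repeats the filter (os = 'w7' and the date) inside each subquery might look like this; the aliases are illustrative names:

```sql
-- Each scalar subquery applies the same criteria as the question.
-- Aliases (js_count, ...) are illustrative and not part of the original answer.
SELECT
  (SELECT COUNT(*) FROM yourtable
    WHERE js_enabled = 1
      AND `os` = 'w7' AND DATE(`date_time`) = '2012-08-01') AS js_count,
  (SELECT COUNT(*) FROM yourtable
    WHERE flash_enabled = 1
      AND `os` = 'w7' AND DATE(`date_time`) = '2012-08-01') AS flash_count,
  (SELECT COUNT(*) FROM yourtable
    WHERE non_binary_field = 'yes'
      AND `os` = 'w7' AND DATE(`date_time`) = '2012-08-01') AS yes_count;
```

Because the filter has to be repeated three times and the table is read once per subquery, the single-pass aggregate form is usually easier to maintain.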

Related

Problems using SQL ALL operator

I'm having trouble using/understanding the SQL ALL operator. I have a table FOLDER_PERMISSION with the following columns:
+----+-----------+---------+----------+
| ID | FOLDER_ID | USER_ID | CAN_READ |
+----+-----------+---------+----------+
| 1 | 34353 | 45453 | 0 |
| 2 | 46374 | 342532 | 1 |
| 3 | 46374 | 32352 | 1 |
+----+-----------+---------+----------+
I want to select the folders where all the users have permission to read, how could I do it?
Use aggregation and having:
select folder_id
from t
group by folder_id
having min(can_read) = 1;
Gordon's answer seems better but for the sake of completeness, using ALL a query could look like:
SELECT x1.folder_id
FROM (SELECT DISTINCT
fp1.folder_id
FROM folder_permission fp1) x1
WHERE 1 = ALL (SELECT fp2.can_read
FROM folder_permission fp2
WHERE fp2.folder_id = x1.folder_id);
If you have a table for the folders themselves replace the derived table (aliased x1) with it.
But this only considers users present in folder_permission. If some users have no row in that table, the result may include folders that not all users can actually read.
You can use aggregation:
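To illustrate that caveat, here is a minimal sketch that compares the number of users who can read each folder against the total number of users. It assumes a hypothetical app_user table with one row per user, which is not part of the schema shown in the question:

```sql
-- ASSUMPTION: app_user is a hypothetical table holding one row per user.
-- A folder qualifies only if every user has a row with can_read = 1 for it.
SELECT fp.folder_id
FROM folder_permission fp
GROUP BY fp.folder_id
HAVING COUNT(DISTINCT CASE WHEN fp.can_read = 1 THEN fp.user_id END)
     = (SELECT COUNT(*) FROM app_user);
```

Users without any row for a folder are then treated as not having read permission, which may or may not match the intended semantics.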
SELECT fp.FOLDER_ID
FROM folder_permission fp
GROUP BY fp.FOLDER_ID
HAVING SUM( can_read = 0 ) = 0;
You can also express it as:
SELECT fp.FOLDER_ID
FROM folder_permission fp
GROUP BY fp.FOLDER_ID
HAVING MIN(CAN_READ) = MAX(CAN_READ) AND MIN(CAN_READ) = 1;
If you wanted to return the full matching records, you could try using some exists logic:
SELECT ID, FOLDER_ID, USER_ID, CAN_READ
FROM yourTable t1
WHERE NOT EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.FOLDER_ID = t1.FOLDER_ID AND t2.CAN_READ = 0);
The existence of a matching record in the above exists subquery would imply that there exist one or more users for that folder who do not have read access rights.

Select all records where last n characters in column are not unique

I have a somewhat strange requirement in MySQL.
I need to select all records from a table where the last 6 characters are not unique.
For example, if I have this table:
I should select rows 1 and 3, since the last 6 characters of these values are not unique.
Do you have any idea how to implement this?
Thank you for your help.
I use a JOIN against a subquery where I count the occurrences of each unique combination of the last n characters (2 in my example):
SELECT t.*
FROM t
JOIN (SELECT RIGHT(value, 2) r, COUNT(RIGHT(value, 2)) rc
FROM t
GROUP BY r) c ON c.r = RIGHT(value, 2) AND c.rc > 1
Something like this should work:
SELECT `mytable`.*
FROM (SELECT RIGHT(`value`, 6) AS `ending` FROM `mytable` GROUP BY `ending` HAVING COUNT(*) > 1) `grouped`
INNER JOIN `mytable` ON `grouped`.`ending` = RIGHT(`value`, 6)
However, it is not fast: it requires a full table scan. Maybe you should rethink your problem.
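One possible way to reduce that cost, sketched here for MySQL 5.7 or later, is to store the ending in an indexed generated column. The column name value_ending and the index name idx_value_ending are made up for this example, and whether the optimizer actually benefits depends on the data:

```sql
-- ASSUMPTION: value_ending and idx_value_ending are illustrative names.
-- The stored generated column keeps the last 6 characters of `value`.
ALTER TABLE `mytable`
  ADD COLUMN `value_ending` VARCHAR(6)
      GENERATED ALWAYS AS (RIGHT(`value`, 6)) STORED,
  ADD INDEX `idx_value_ending` (`value_ending`);

-- Same logic as the query above, but grouping and joining on the indexed column.
SELECT m.*
FROM `mytable` m
JOIN (SELECT `value_ending`
      FROM `mytable`
      GROUP BY `value_ending`
      HAVING COUNT(*) > 1) g
  ON g.`value_ending` = m.`value_ending`;
```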
EDITED: I had a wrong understanding of the question previously and I don't really want to change anything from my initial answer. But if my previous answer is not acceptable in some environment and it might mislead people, I have to correct it anyhow.
SELECT GROUP_CONCAT(id),RIGHT(VALUE,6)
FROM table1
GROUP BY RIGHT(VALUE,6) HAVING COUNT(RIGHT(VALUE,6)) > 1;
Since this question already has good answers, I wrote my query in a slightly different way. And I've tested it with sql_mode=ONLY_FULL_GROUP_BY. ;)
This is what you need: a subquery to find the duplicated RIGHT(value,6) endings, and a main query to fetch the rows matching that condition.
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT RIGHT(`value`,6)
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1);
UPDATE
This is a solution that avoids the MySQL error when sql_mode=only_full_group_by is set:
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT DISTINCT right_value FROM (
SELECT RIGHT(`value`,6) AS right_value,
COUNT(*) AS TOT
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1) t2
)
This might be fast, as no counting is involved.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/0
select *
from tbl outr
where not exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | --------------- |
| 2 | aaaaaaaaaaaaaa |
| 4 | aaaaaaaaaaaaaaB |
| 5 | Hello |
The logic is to check the other rows, that is, rows whose id differs from the outer row's id. If any of those rows has the same last 6 characters as the outer row, the outer row is not shown.
UPDATE
I misunderstood the OP's intent. It's the reverse. Anyway, just reverse the logic: use EXISTS instead of NOT EXISTS.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/3
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | ----------- |
| 1 | abcdePuzzle |
| 3 | abcPuzzle |
UPDATE
Tested the query. The performance of my answer (correlated EXISTS approach) is not optimal. Just keeping my answer, so others will know what approach to avoid :)
GhostGambler's answer is faster than correlated EXISTS approach. For 5 million rows, his answer takes 2.762 seconds only:
explain analyze
SELECT
tbl.*
FROM
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
) grouped
JOIN tbl ON grouped.ending = RIGHT(value, 6)
My answer (correlated EXISTS) takes 4.08 seconds:
explain analyze
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
The straightforward query is the fastest: no join, just a plain IN query, at 2.722 seconds. It has practically the same performance as the JOIN approach, since they have the same execution plan. This is kiks73's answer. I just don't know why he made his second answer unnecessarily complicated.
So it's just a matter of taste, or of which code you find more readable: SELECT ... IN versus SELECT ... JOIN.
explain analyze
SELECT *
FROM tbl
where right(value, 6) in
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
)
Test data used:
CREATE TABLE tbl (
id INTEGER primary key,
value VARCHAR(20)
);
INSERT INTO tbl
(id, value)
VALUES
('1', 'abcdePuzzle'),
('2', 'aaaaaaaaaaaaaa'),
('3', 'abcPuzzle'),
('4', 'aaaaaaaaaaaaaaB'),
('5', 'Hello');
insert into tbl(id, value)
select x.y, 'Puzzle'
from generate_series(6, 5000000) as x(y);
create index ix_tbl__right on tbl(right(value, 6));
Performances without the index, and with index on tbl(right(value, 6)):
JOIN approach:
Without index: 3.805 seconds
With index: 2.762 seconds
IN approach:
Without index: 3.719 seconds
With index: 2.722 seconds
Slightly neater code (if using MySQL 8.0). I can't guarantee the performance, though.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/1
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count = 1
Output:
| id | value | unique_count |
| --- | --------------- | ------------ |
| 2 | aaaaaaaaaaaaaa | 1 |
| 4 | aaaaaaaaaaaaaaB | 1 |
| 5 | Hello | 1 |
UPDATE
I misunderstood the OP's intent. It's the reverse. Just change the count:
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count > 1
Output:
| id | value | unique_count |
| --- | ----------- | ------------ |
| 1 | abcdePuzzle | 2 |
| 3 | abcPuzzle | 2 |

mysql for percentage between rows

I have some sql that looks like this:
SELECT
stageName,
count(*) as `count`
FROM x2production.contact_stages
WHERE FROM_UNIXTIME(createDate) between '2016-05-01' AND DATE_ADD('2016-08-31', INTERVAL 1 DAY)
AND (stageName = 'DI-Whatever' OR stageName = 'DI-Quote' or stageName = 'DI-Meeting')
Group by stageName
Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Whatever')
This produces a table that looks like:
+-------------+-------+
| stageName | count |
+-------------+-------+
| DI-quote | 1230 |
| DI-Meeting | 985 |
| DI-Whatever | 325 |
+-------------+-------+
Question:
I would like a percentage from one row to the next. For example the percentage of DI-Meeting to DI-quote. The math would be 100*985/1230 = 80.0%
So in the end the table would look like so:
+-------------+-------+------+
| stageName | count | perc |
+-------------+-------+------+
| DI-quote | 1230 | 0 |
| DI-Meeting | 985 | 80.0 |
| DI-Whatever | 325 | 32.9 |
+-------------+-------+------+
Is there any way to do this in mysql?
Here is an SQL fiddle to mess w/ the data: http://sqlfiddle.com/#!9/61398/1
The query
select stageName,count,if(rownum=1,0,round(count/toDivideBy*100,3)) as percent
from
( select stageName,count,greatest(@rn:=@rn+1,0) as rownum,
coalesce(if(@rn=1,count,@prev),null) as toDivideBy,
@prev:=count as dummy2
from
( SELECT
stageName,
count(*) as `count`
FROM Table1
WHERE FROM_UNIXTIME(createDate) between '2016-05-01' AND DATE_ADD('2016-08-31', INTERVAL 1 DAY)
AND (stageName = 'DI-Underwriting' OR stageName = 'DI-Quote' or stageName = 'DI-Meeting')
Group by stageName
Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Underwriting')
) xDerived1
cross join (select @rn:=0,@prev:=-1) as xParams1
) xDerived2;
Results
+-----------------+-------+---------+
| stageName | count | percent |
+-----------------+-------+---------+
| DI-Quote | 16 | 0 |
| DI-Meeting | 13 | 81.250 |
| DI-Underwriting | 4 | 30.769 |
+-----------------+-------+---------+
Note, you want a 0 as the percent for the first row. That is easily changed to 100.
The cross join brings in the variables for use and initializes them. The greatest and coalesce are used for safety in variable use as spelled out well in this article, and clues from the MySQL Manual Page Operator Precedence. The derived tables names are just that: every derived table needs a name.
If you do not adhere to the principles in those referenced articles, then the use of variables is unsafe. I am not saying I nailed it, but that safety is always my focus.
The assignment of variables needs to follow a safe form, such as the @rn variable being set inside a function like greatest or least. We know that @rn is always greater than 0, so we use the greatest function to force our will on the query. The same trick applies with coalesce: null will never happen, and := has lower precedence in the column that follows it. That is, the last one: @prev:=, which follows the coalesce.
That way, a variable is set before other columns in that select row attempt to use its value.
So, just getting the expected results does not mean you did it safely and that it will work with your real data.
What you need is a LAG function. Since MySQL doesn't support it (before version 8.0), you have to mimic it this way:
select stageName,
cnt,
IF(valBefore is null,0,((100*cnt)/valBefore)) as perc
from (SELECT tb.stageName,
tb.cnt,
@ct AS valBefore,
(@ct := cnt)
FROM (SELECT stageName,
count(*) as cnt
FROM Table1,
(SELECT @_stage := NULL,
@ct := NULL) vars
WHERE FROM_UNIXTIME(createDate) between '2016-05-01'
AND DATE_ADD('2016-08-31', INTERVAL 1 DAY)
AND stageName in ('DI-Underwriting', 'DI-Quote', 'DI-Meeting')
Group by stageName
Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Underwriting')
) tb
WHERE (CASE WHEN @_stage IS NULL OR @_stage <> tb.stageName
THEN @ct := NULL
ELSE NULL END IS NULL)
) as final
See it working here: http://sqlfiddle.com/#!9/61398/35
EDIT I've actually edited it to remove an unnecessary step (subquery)
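On MySQL 8.0 or later, window functions are available, so LAG can be used directly instead of user variables. The following is a sketch under that assumption, reusing the Table1 data from the fiddle; the alias counts and the window name w are illustrative:

```sql
-- Requires MySQL 8.0+ (window functions). Names `counts` and `w` are illustrative.
SELECT stageName,
       cnt,
       -- LAG(cnt) is NULL on the first row, so COALESCE turns that into 0.
       COALESCE(ROUND(100 * cnt / LAG(cnt) OVER w, 1), 0) AS perc
FROM (
    SELECT stageName, COUNT(*) AS cnt
    FROM Table1
    WHERE FROM_UNIXTIME(createDate) BETWEEN '2016-05-01'
                                        AND DATE_ADD('2016-08-31', INTERVAL 1 DAY)
      AND stageName IN ('DI-Underwriting', 'DI-Quote', 'DI-Meeting')
    GROUP BY stageName
) AS counts
WINDOW w AS (ORDER BY FIELD(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Underwriting'))
ORDER BY FIELD(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Underwriting');
```

The row order used by LAG is stated explicitly in the window definition, so the result does not depend on the evaluation order of user variables.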

SQL: Get the most frequent value for each group

Let's say that I have a table (MS Access / MySQL) with two columns (Time 'hh:mm:ss', Value) and I want to get the most frequent value for each group of rows.
For example, I have:
Time | Value
4:35:49 | 122
4:35:49 | 122
4:35:50 | 121
4:35:50 | 121
4:35:50 | 111
4:35:51 | 122
4:35:51 | 111
4:35:51 | 111
4:35:51 | 132
4:35:51 | 132
And I want to get the most frequent value for each Time:
Time | Value
4:35:49 | 122
4:35:50 | 121
4:35:51 | 132
Thanks in advance
Remark
I need to get the same result of this Excel solution : Get the most frequent value for each group
** MY SQL Solution **
I found a solution (Source) that works fine with MySQL, but I can't get it to work in MS Access:
select cnt1.`Time`,MAX(cnt1.`Value`)
from (select COUNT(*) as total, `Time`,`Value`
from `my_table`
group by `Time`,`Value`) cnt1,
(select MAX(total) as maxtotal from (select COUNT(*) as total,
`Time`,`Value` from `my_table` group by `Time`,`Value`) cnt3 ) cnt2
where cnt1.total = cnt2.maxtotal GROUP BY cnt1.`Time`
Consider an INNER JOIN to match the two derived table subqueries rather than a list of subquery select statements matched with WHERE clause. This has been tested in MS Access:
SELECT MaxCountSub.`Time`, CountSub.`Value`
FROM
(SELECT myTable.`Time`, myTable.`Value`, Count(myTable.`Value`) AS CountOfValue
FROM myTable
GROUP BY myTable.`Time`, myTable.`Value`) As CountSub
INNER JOIN
(SELECT dT.`Time`, Max(CountOfValue) As MaxCountOfValue
FROM
(SELECT myTable.`Time`, myTable.`Value`, Count(myTable.`Value`) AS CountOfValue
FROM myTable
GROUP BY myTable.`Time`, myTable.`Value`) As dT
GROUP BY dT.`Time`) As MaxCountSub
ON CountSub.`Time` = MaxCountSub.`Time`
AND CountSub.CountOfValue = MaxCountSub.MaxCountOfValue
You can do this with a query like this:
select time, value
from (select value, time from your_table
group by value , time
order by count(time) desc
) temp where temp.value = value
group by value
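For MySQL 8.0 or later (this will not run in MS Access), a window-function sketch of the same idea could rank the values within each Time by how often they occur. Ties are broken here by taking the larger Value, which matches the expected output in the question (132 rather than 111 for 4:35:51); the names cnt, rn, counted and ranked are illustrative:

```sql
-- Requires MySQL 8.0+. Names cnt, rn, counted and ranked are illustrative.
SELECT `Time`, `Value`
FROM (
    SELECT `Time`, `Value`,
           -- Rank values per Time: most frequent first, larger Value wins ties.
           ROW_NUMBER() OVER (PARTITION BY `Time`
                              ORDER BY cnt DESC, `Value` DESC) AS rn
    FROM (
        SELECT `Time`, `Value`, COUNT(*) AS cnt
        FROM my_table
        GROUP BY `Time`, `Value`
    ) AS counted
) AS ranked
WHERE rn = 1
ORDER BY `Time`;
```

Replacing ROW_NUMBER with RANK would instead return every value that shares the highest count for a given Time.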

Return NULL for missing values in an IN list

I have a table like this:
id | val
---------
1 | abc
2 | def
5 | xyz
6 | foo
8 | bar
and a query like
SELECT id, val FROM tab WHERE id IN (1,2,3,4,5)
which returns
id | val
---------
1 | abc
2 | def
5 | xyz
Is there a way to make it return NULLs on missing ids, that is
id | val
---------
1 | abc
2 | def
3 | NULL
4 | NULL
5 | xyz
I guess there should be a tricky LEFT JOIN with itself, but can't wrap my head around it.
EDIT: I see people are thinking I want to "fill the gaps" in a sequence, but actually what I want is to substitute NULL for the missing values from the IN list. For example, this
SELECT id, val FROM tab WHERE id IN (1,100,8,200)
should return
id | val
---------
1 | abc
100 | NULL
8 | bar
200 | NULL
Also, the order doesn't matter much.
EDIT2: Just adding a couple of related links:
How to select multiple rows filled with constants?
Is it possible to have a tableless select with multiple rows?
You could use this trick:
SELECT v.id, t.val
FROM
(SELECT 1 AS id
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5) v
LEFT JOIN tab t
ON v.id = t.id
Please see fiddle here.
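On MySQL 8.0.19 or later, the list of ids can also be written with the VALUES table value constructor instead of a chain of UNION ALL. This is a sketch under that version assumption; column_0 is the name MySQL assigns to the first column of a VALUES table:

```sql
-- Requires MySQL 8.0.19+ (VALUES ROW(...) table value constructor).
-- column_0 is MySQL's implicit name for the first column of the VALUES table.
SELECT v.column_0 AS id, t.val
FROM (VALUES ROW(1), ROW(100), ROW(8), ROW(200)) AS v
LEFT JOIN tab t
       ON t.id = v.column_0;
```

Ids that have no matching row in tab come back with val set to NULL, as with the UNION ALL version.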
Yes, you can. But that will be tricky since there are no sequences in MySQL.
I assume you want just any selection, so it's:
SELECT
*
FROM
(SELECT
(two_1.id + two_2.id + two_4.id +
two_8.id + two_16.id) AS id
FROM
(SELECT 0 AS id UNION ALL SELECT 1 AS id) AS two_1
CROSS JOIN (SELECT 0 id UNION ALL SELECT 2 id) AS two_2
CROSS JOIN (SELECT 0 id UNION ALL SELECT 4 id) AS two_4
CROSS JOIN (SELECT 0 id UNION ALL SELECT 8 id) AS two_8
CROSS JOIN (SELECT 0 id UNION ALL SELECT 16 id) AS two_16
) AS sequence
LEFT JOIN
tab AS t
ON sequence.id=t.id
WHERE
sequence.id IN (1,2,3,4,5);
(check the fiddle)
It works by combining powers of 2 to generate a table of consecutive numbers (0 to 31 here). Your values are passed to the WHERE clause, so you can substitute any set of values there.
I would recommend doing this in the application instead, because it will be faster. Doing it in SQL may make sense if you want to use this row set somewhere else (e.g. in other queries), but if not, it's a job for your application.
If you need higher values, add more rows to the sequence generator, like in this fiddle.
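On MySQL 8.0 or later, a recursive common table expression is another way to build the number table. The sketch below generates 0 to 31, the same range as the five cross-joined pairs, and uses the table name tab from the question; the CTE name seq is illustrative:

```sql
-- Requires MySQL 8.0+ (recursive CTEs). The CTE name `seq` is illustrative.
WITH RECURSIVE seq (id) AS (
    SELECT 0
    UNION ALL
    SELECT id + 1 FROM seq WHERE id < 31   -- upper bound of the generated range
)
SELECT seq.id, t.val
FROM seq
LEFT JOIN tab t ON t.id = seq.id
WHERE seq.id IN (1, 2, 3, 4, 5);
```

Raising the bound past 1000 also requires increasing the cte_max_recursion_depth system variable, whose default is 1000.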