Top N rows per group only getting 1 row - mysql

I am trying to show the top 3 (or any number) housings per category. The top meaning the most visited. So if I have a table like this:
+------------+--------------------+--------+
| housing_id | category | visits |
+------------+--------------------+--------+
| 7 | cat | 2 |
| 8 | New Category | 1 |
| 10 | bead and breakfast | 1 |
| 11 | bead and breakfast | 4 |
| 15 | 2 | 3 |
| 16 | 2 | 1 |
| 17 | New Category | 1 |
| 18 | cat | 1 |
+------------+--------------------+--------+
I want to select the top 3 most visited housings per category, so I am doing this:
select housing_id, category, visits
from
    (select housing_id, category, visits,
            @category_rank := if(@current_category = category, @country_rank + 1, 1) as category_rank,
            @current_category := category
     from visit_counts
     order by category, visits desc
    ) ranked
where category_rank <= 3;
I get:
+------------+--------------------+--------+
| housing_id | category | visits |
+------------+--------------------+--------+
| 15 | 2 | 3 |
| 11 | bead and breakfast | 4 |
| 7 | cat | 2 |
| 8 | New Category | 1 |
+------------+--------------------+--------+
but I want:
+------------+--------------------+--------+
| housing_id | category | visits |
+------------+--------------------+--------+
| 15 | 2 | 3 |
| 16 | 2 | 1 |
| 11 | bead and breakfast | 4 |
| 10 | bead and breakfast | 1 |
| 7 | cat | 2 |
| 18 | cat | 1 |
| 8 | New Category | 1 |
| 17 | New Category | 1 |
+------------+--------------------+--------+

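You are using the user variables without initializing them. The rank expression also reads a variable named country_rank, which is never assigned, so the increment evaluates to NULL; that would explain why only the first row of each category passes the category_rank <= 3 filter. Also, you should assign and read the user variables in a single expression, because MySQL doesn't guarantee the order in which the selected columns are evaluated (so an assignment may happen before or after you read the variable).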
Try this:
select housing_id, category, visits
from (
    select housing_id, category, visits,
           @category_rank := if(@current_category = category,
                                @category_rank + 1,
                                if(@current_category := category, 1, 1)
                               ) as category_rank
    from visit_counts, (select @category_rank := 0, @current_category := null) t2
    order by category, visits desc
) ranked
where category_rank <= 3;
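To check the expected output independently of the database, the ranking that the query performs can be sketched in a few lines of Python. This is only an illustration and not part of the original answer: the helper name `top_n_per_group` and the in-memory `rows` list are assumptions, with the data copied from the question's table. Categories are compared case-insensitively to imitate MySQL's usual default collation.

```python
from itertools import groupby

# Sample rows copied from the question: (housing_id, category, visits).
rows = [
    (7, "cat", 2),
    (8, "New Category", 1),
    (10, "bead and breakfast", 1),
    (11, "bead and breakfast", 4),
    (15, "2", 3),
    (16, "2", 1),
    (17, "New Category", 1),
    (18, "cat", 1),
]


def category_key(row):
    # Case-insensitive category, an assumption mirroring a *_ci collation.
    return row[1].lower()


def top_n_per_group(rows, n):
    """Return up to n rows per category, most visited first (hypothetical helper)."""
    # Mirrors ORDER BY category, visits DESC; the sort is stable, so ties
    # keep their input order.
    ordered = sorted(rows, key=lambda r: (category_key(r), -r[2]))
    result = []
    for _, group in groupby(ordered, key=category_key):
        # The counter restarts at 1 for every category, like category_rank.
        for rank, row in enumerate(group, start=1):
            if rank <= n:
                result.append(row)
    return result


for row in top_n_per_group(rows, 3):
    print(row)
```

With `n = 1` the same helper reproduces the one-row-per-category result the question complains about, which is a quick way to see that the filter itself is fine and the rank values were the problem.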

Related

mysql sequence number by value column (query UPDATE)

example:
I have a table with the columns
______________________
|field_id|Code|seq_num|
| 1 | a | 1 |
| 1 | a | 2 |
| 1 | a | 3 |
| 2 | a | 4 |
| 2 | a | 5 |
| 3 | a | 6 |
| 3 | a | 7 |
| 3 | a | 8 |
How can I query it so that the sequence numbers look like this?
_____________________
|field_id|Code|seq_num|
| 1 | a | 1 |
| 1 | a | 2 |
| 1 | a | 3 |
| 2 | a | 1 |
| 2 | a | 2 |
| 3 | a | 1 |
| 3 | a | 2 |
| 3 | a | 3 |
please help!!
One method is to get the minimum sequence for the field:
select t.field_id, t.code,
       (t.seq_num - f.min_seq_num + 1) as seqnum
from t join
     (select field_id, min(seq_num) as min_seq_num
      from t
      group by field_id
     ) f
     on t.field_id = f.field_id;
You can also do this using variables, if you don't trust the current sequence numbers to have no gaps:
select . . .,
       (@rn := if(@f = field_id, @rn + 1,
                  if(@f := field_id, 1, 1)
                 )
       ) as seq_no
from (select t.*
      from t
      order by field_id, seq_num
     ) t cross join
     (select @f := '', @rn := 0) params;
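The variable-based query is effectively a loop that remembers the previous field_id and restarts a counter when it changes. A minimal Python sketch of that loop is shown here for illustration only; the helper name `renumber` is an assumption, and the rows are copied from the question.

```python
# Rows copied from the question: (field_id, code, seq_num).
rows = [
    (1, "a", 1), (1, "a", 2), (1, "a", 3),
    (2, "a", 4), (2, "a", 5),
    (3, "a", 6), (3, "a", 7), (3, "a", 8),
]


def renumber(rows):
    """Restart the sequence at 1 whenever field_id changes (hypothetical helper)."""
    # Same ordering as the inner query: by field_id, then the old seq_num.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    current_field = None  # plays the role of the "f" user variable
    counter = 0           # plays the role of the "rn" user variable
    result = []
    for field_id, code, _old_seq in ordered:
        if field_id == current_field:
            counter += 1
        else:
            current_field = field_id
            counter = 1
        result.append((field_id, code, counter))
    return result


for row in renumber(rows):
    print(row)
```

Because the counter does not depend on the old values, this version also works when the existing sequence numbers have gaps, which is the case the second query is meant for.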

Select second min() or second smallest from mysql table

I'm wondering how to select the second smallest value from a mysql table, grouped on a non-numeric column. If I have a table that looks like this:
+----+----------+------------+--------+------------+
| id | customer | order_type | amount | created_dt |
+----+----------+------------+--------+------------+
| 1 | 1 | web | 5 | 2017-01-01 |
| 2 | 1 | web | 7 | 2017-01-05 |
| 3 | 2 | web | 2 | 2017-01-07 |
| 4 | 3 | web | 2 | 2017-02-01 |
| 5 | 3 | web | 3 | 2017-02-01 |
| 6 | 2 | web | 5 | 2017-03-15 |
| 7 | 1 | in_person | 7 | 2017-02-01 |
| 8 | 3 | web | 8 | 2017-01-01 |
| 9 | 2 | web | 1 | 2017-04-01 |
+----+----------+------------+--------+------------+
I want to count the number of second orders in each month/year. I also have a customer table (which is where the customer ids come from). I can find the number of customers with at least 2 orders, grouped by the customer's created date, by querying:
select date(c.created_dt) as create_date, count(c.id)
from customer c
where c.id in
    (select ord.customer
     from orders ord
     where
        (select count(o.created_dt)
         from orders o
         where ord.customer = o.customer and o.order_type in ('web')
        ) > 1
    )
group by 1;
However, that result gives customers by their created date, and I can't seem to figure out how to find the number of second orders by date.
The desired output I'd like to see, based on the data above, is:
+-------+------+---------------+
| month | year | second_orders |
+-------+------+---------------+
| 1 | 2017 | 1 |
| 2 | 2017 | 1 |
| 3 | 2017 | 1 |
+-------+------+---------------+
One way to approach this:
SELECT YEAR(created_dt) year, MONTH(created_dt) month, COUNT(*) second_orders
FROM (
    SELECT created_dt,
           @rn := IF(@c = customer, @rn + 1, 1) rn,
           @c := customer
    FROM orders CROSS JOIN (
        SELECT @c := NULL, @rn := 1
    ) i
    WHERE order_type = 'web'
    ORDER BY customer, id
) q
WHERE rn = 2
GROUP BY YEAR(created_dt), MONTH(created_dt)
ORDER BY year, month
Here is a dbfiddle demo
Output:
+------+-------+---------------+
| year | month | second_orders |
+------+-------+---------------+
| 2017 | 1 | 1 |
| 2017 | 2 | 1 |
| 2017 | 3 | 1 |
+------+-------+---------------+
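The same counting can be sketched in Python to make the logic explicit: walk each customer's web orders in id order and record the month of the second one. This is an illustrative sketch, not the answerer's code; the helper name `second_orders_per_month` is an assumption and the rows are copied from the question.

```python
from collections import Counter
from datetime import date

# Rows copied from the question: (id, customer, order_type, amount, created_dt).
orders = [
    (1, 1, "web", 5, date(2017, 1, 1)),
    (2, 1, "web", 7, date(2017, 1, 5)),
    (3, 2, "web", 2, date(2017, 1, 7)),
    (4, 3, "web", 2, date(2017, 2, 1)),
    (5, 3, "web", 3, date(2017, 2, 1)),
    (6, 2, "web", 5, date(2017, 3, 15)),
    (7, 1, "in_person", 7, date(2017, 2, 1)),
    (8, 3, "web", 8, date(2017, 1, 1)),
    (9, 2, "web", 1, date(2017, 4, 1)),
]


def second_orders_per_month(orders, order_type="web"):
    """Count, per (year, month), the orders that are a customer's second one."""
    seen = Counter()       # orders of the given type seen so far, per customer
    per_month = Counter()  # (year, month) -> number of second orders
    # Same ordering as the query: by customer, then by id.
    for _id, customer, kind, _amount, created in sorted(
        orders, key=lambda o: (o[1], o[0])
    ):
        if kind != order_type:
            continue
        seen[customer] += 1
        if seen[customer] == 2:
            per_month[(created.year, created.month)] += 1
    return sorted(per_month.items())


for (year, month), count in second_orders_per_month(orders):
    print(year, month, count)
```

Like the query, the sketch treats "second" as second by id. Customer 3's order with id 8 has an earlier date than ids 4 and 5, so ordering by created_dt instead would pick a different second order for that customer.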

SQL Query Conditional accumulation

Is it possible to display accumulated data, resetting the count based on a condition?
I would like to create a script that keeps accumulating while the number cell contains the value 1, and restarts the count on any other value. Something like what is displayed in the column cumulative_with_condition.
+----+------------+--------+
| id | release | number |
+----+------------+--------+
| 1 | 2016-07-08 | 4 |
| 2 | 2016-07-09 | 1 |
| 3 | 2016-07-10 | 1 |
| 4 | 2016-07-12 | 2 |
| 5 | 2016-07-13 | 1 |
| 6 | 2016-07-14 | 1 |
| 7 | 2016-07-15 | 1 |
| 8 | 2016-07-16 | 2-3 |
| 9 | 2016-07-17 | 3 |
| 10 | 2016-07-18 | 1 |
+----+------------+--------+
select * from version where id > 1 and id < 9;
+----+------------+--------+---------------------------+
| id | release | number | cumulative_with_condition |
+----+------------+--------+---------------------------+
| 2 | 2016-07-09 | 1 | 1 |
| 3 | 2016-07-10 | 1 | 2 |
| 4 | 2016-07-12 | 2 | 0 |
| 5 | 2016-07-13 | 1 | 1 |
| 6 | 2016-07-14 | 1 | 2 |
| 7 | 2016-07-15 | 1 | 3 |
| 8 | 2016-07-16 | 2-3 | 0 |
+----+------------+--------+---------------------------+
You want something like row_number() (not exactly, but like that). You can do that using variables:
select t.*,
       (@rn := if(number = 1, @rn + 1,
                  if(@n := number, 0, 0)
                 )
       ) as cumulative_with_condition
from t cross join
     (select @n := '', @rn := 0) params
order by t.id;
As an alternative to using user variables, as demonstrated by Gordon Linoff, in this case it's also possible to self-join, group and count:
SELECT t.id, t.release, t.number, COUNT(version.id) AS cumulative_with_condition
FROM version RIGHT JOIN (
SELECT highs.*, MAX(lows.id) min
FROM version lows RIGHT JOIN version highs ON lows.id <= highs.id
WHERE lows.number <> '1'
GROUP BY highs.id
) t ON version.id > t.min AND version.id <= t.id
WHERE t.id > 1 AND t.id < 9
GROUP BY t.id
See it on sqlfiddle.
But, frankly, neither approach is particularly elegant—as I commented previously, you're probably best off implementing this within your application code.
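For the application-code route, a minimal Python sketch could look like the following. It is only an illustration under stated assumptions: the helper name `cumulative_with_condition` is invented here, the rows are copied from the question, and `number` is kept as a string because the sample data contains the value "2-3".

```python
# Rows copied from the question: (id, release, number).
rows = [
    (1, "2016-07-08", "4"),
    (2, "2016-07-09", "1"),
    (3, "2016-07-10", "1"),
    (4, "2016-07-12", "2"),
    (5, "2016-07-13", "1"),
    (6, "2016-07-14", "1"),
    (7, "2016-07-15", "1"),
    (8, "2016-07-16", "2-3"),
    (9, "2016-07-17", "3"),
    (10, "2016-07-18", "1"),
]


def cumulative_with_condition(rows):
    """Append a running count of consecutive rows whose number is '1'."""
    count = 0
    result = []
    for row_id, release, number in sorted(rows, key=lambda r: r[0]):
        # Keep counting while the value is "1"; any other value resets to 0.
        count = count + 1 if number == "1" else 0
        result.append((row_id, release, number, count))
    return result


# Same filter as the question's query: id > 1 and id < 9.
for row in cumulative_with_condition(rows):
    if 1 < row[0] < 9:
        print(row)
```

The count is computed over the whole table before filtering, so a run of 1s that starts before the first displayed row would still be counted in full.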

How to get Latest N Records of selected Group

I want to run a query on MySQL version 5.1.9 that returns only the top two rows (ordered by JoiningDate) of each selected Dept.
For example, my data is like:
+-------+------------------------------------------+----------+------------+
| empid | title | Dept | JoiningDate|
+-------+------------------------------------------+----------+------------+
| 1 | Research and Development | 1 | 2015-08-06 |
| 2 | Consultant | 2 | 2015-08-06 |
| 3 | Medical Consultant | 3 | 2015-08-06 |
| 4 | Officer | 4 | 2015-08-06 |
| 5 | English Translator | 5 | 2015-08-06 |
| 6 | Teacher | 1 | 2015-08-01 |
| 7 | Physical Education | 2 | 2015-08-01 |
| 8 | Accountant | 3 | 2015-08-01 |
| 9 | Science Teacher | 4 | 2015-08-01 |
| 10 | Home Science | 5 | 2015-08-01 |
| 11 | Research Assistant | 1 | 2015-08-05 |
| 12 | Consultant | 2 | 2015-08-05 |
| 13 | Consultant HR | 3 | 2015-08-05 |
| 14 | Technical Lead | 4 | 2015-08-05 |
| 15 | Hindi Translator | 5 | 2015-08-05 |
| 16 | Urdu Teacher | 1 | 2015-08-02 |
| 17 | Physical Education | 2 | 2015-08-02 |
| 18 | Accountant | 3 | 2015-08-02 |
| 19 | Science | 4 | 2015-08-02 |
| 20 | Home Science | 5 | 2015-08-02 |
+-------+------------------------------------------+----------+------------+
I want the query to output the two most recently joined empids for each Dept in (1,2,3), i.e.:
+-------+------------------------------------------+----------+------------+
| empid | title | Dept | JoiningDate|
+-------+------------------------------------------+----------+------------+
| 1 | Research and Development | 1 | 2015-08-06 |
| 11 | Research Assistant | 1 | 2015-08-05 |
| 2 | Consultant | 2 | 2015-08-06 |
| 12 | Consultant | 2 | 2015-08-05 |
| 3 | Medical Consultant | 3 | 2015-08-06 |
| 13 | Consultant HR | 3 | 2015-08-05 |
+-------+------------------------------------------+----------+------------+
In MySQL you can use user-defined variables to achieve your desired results:
SELECT
    t.empid,
    t.title,
    t.Dept,
    t.JoiningDate
FROM
(
    SELECT
        *,
        @r:= CASE WHEN @g = b.Dept THEN @r + 1 ELSE 1 END rounum,
        @g:= b.Dept
    FROM (
        SELECT *
        FROM table1
        CROSS JOIN (SELECT @r:= NULL,@g:=NULL) a
        WHERE Dept IN(1,2,3)
        ORDER BY Dept,JoiningDate DESC
    ) b
) t
WHERE t.rounum <=2
Use a correlated sub-select to count the number of rows with the same Dept but a later JoiningDate. If the count is less than 2, return the row.
select empid, title, Dept, JoiningDate
from tablename t1
where (select count(*) from tablename t2
where t2.Dept = t1.Dept
and t2.JoiningDate > t1.JoiningDate) < 2
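The correlated count can be read as: for each row, count the rows in the same department that joined later, and keep the row if that count is below N. A small Python sketch of the same rule is given here for illustration; the helper name `latest_n_per_dept` is an assumption, only the Dept 1 to 3 rows from the question are included, and dates are ISO strings so that string comparison matches date order.

```python
# Subset of the question's data: (empid, dept, joining_date as ISO string).
employees = [
    (1, 1, "2015-08-06"), (2, 2, "2015-08-06"), (3, 3, "2015-08-06"),
    (6, 1, "2015-08-01"), (7, 2, "2015-08-01"), (8, 3, "2015-08-01"),
    (11, 1, "2015-08-05"), (12, 2, "2015-08-05"), (13, 3, "2015-08-05"),
    (16, 1, "2015-08-02"), (17, 2, "2015-08-02"), (18, 3, "2015-08-02"),
]


def latest_n_per_dept(rows, n):
    """Keep a row if fewer than n rows in its dept have a later joining date."""
    kept = []
    for empid, dept, joined in rows:
        # Equivalent of the correlated COUNT(*) sub-select.
        later = sum(1 for _, d, j in rows if d == dept and j > joined)
        if later < n:
            kept.append((empid, dept, joined))
    # Present the result by dept, newest first (two stable sorts).
    kept.sort(key=lambda r: r[2], reverse=True)
    kept.sort(key=lambda r: r[1])
    return kept


for row in latest_n_per_dept(employees, 2):
    print(row)
```

One property worth knowing: ties are all kept. If two employees in a department share the second-latest date, both have exactly one later row, so the department returns three rows instead of two.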
Query
select *
from emp_ t1
where
(
select count(*) from emp_ t2
where t2.Dept = t1.Dept
and t2.JoiningDate > t1.JoiningDate
) <= 1
and t1.Dept in (1,2,3)
order by t1.Dept;
You can also achieve it by assigning a row number.
Query
select t2.empid,
       t2.title,
       t2.Dept,
       t2.JoiningDate
from
(
    select empid,
           title,
           Dept,
           JoiningDate,
           (
               case Dept
                   when @curA
                   then @curRow := @curRow + 1
                   else @curRow := 1 and @curA := Dept end
           ) as rn
    from employee t,
         (select @curRow := 0, @curA := '') r
    where Dept in (1,2,3)
    order by Dept,JoiningDate desc
)t2
where rn < 3;

Advanced MySQL: Find correlations between poll responses

I've got four MySQL tables:
users (id, name)
polls (id, text)
options (id, poll_id, text)
responses (id, poll_id, option_id, user_id)
Given a particular poll and a particular option, I'd like to generate a table that shows which options from other polls are most strongly correlated.
Suppose this is our data set:
TABLE users:
+------+-------+
| id | name |
+------+-------+
| 1 | Abe |
| 2 | Bob |
| 3 | Che |
| 4 | Den |
+------+-------+
TABLE polls:
+------+-----------------------+
| id | text |
+------+-----------------------+
| 1 | Do you like apples? |
| 2 | What is your gender? |
| 3 | What is your height? |
| 4 | Do you like polls? |
+------+-----------------------+
TABLE options:
+------+----------+---------+
| id | poll_id | text |
+------+----------+---------+
| 1 | 1 | Yes |
| 2 | 1 | No |
| 3 | 2 | Male |
| 4 | 2 | Female |
| 5 | 3 | Short |
| 6 | 3 | Tall |
| 7 | 4 | Yes |
| 8 | 4 | No |
+------+----------+---------+
TABLE responses:
+------+----------+------------+----------+
| id | poll_id | option_id | user_id |
+------+----------+------------+----------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 3 |
| 4 | 1 | 2 | 4 |
| 5 | 2 | 3 | 1 |
| 6 | 2 | 3 | 2 |
| 7 | 2 | 3 | 3 |
| 8 | 2 | 4 | 4 |
| 9 | 3 | 5 | 1 |
| 10 | 3 | 6 | 2 |
| 11 | 3 | 5 | 3 |
| 12 | 3 | 6 | 4 |
| 13 | 4 | 7 | 1 |
| 14 | 4 | 7 | 2 |
| 15 | 4 | 7 | 3 |
| 16 | 4 | 7 | 4 |
+------+----------+------------+----------+
Given the poll ID 1 and the option ID 2, the generated table should be something like this:
+----------+------------+-----------------------+
| poll_id | option_id | percent_correlated |
+----------+------------+-----------------------+
| 4 | 7 | 100 |
| 2 | 3 | 66.66 |
| 3 | 6 | 66.66 |
| 2 | 4 | 33.33 |
| 3 | 5 | 33.33 |
| 4 | 8 | 0 |
+----------+------------+-----------------------+
So basically, we're identifying all of the users who responded to poll ID 1 and selected option ID 2, and we're looking through all the other polls to see what percentage of them also selected each other option.
Don't have an instance handy to test, can you see if this gets proper results:
select
poll_id,
option_id,
((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
n
from
(
select
poll_id,
option_id,
SUM(score) AS sum1,
SUM(score_rev) AS sum2,
SUM(score * score) AS sum1sq,
SUM(score_rev * score_rev) AS sum2sq,
SUM(score * score_rev) AS psum,
COUNT(*) AS n
from
(
select
responses.poll_id,
responses.option_id,
CASE
WHEN user_resp.user_id IS NULL THEN 0
ELSE 1
END as score,
CASE
WHEN user_resp.user_id IS NULL THEN 1
ELSE 0
END as score_rev
from responses left outer join
(
select
user_id
from
responses
where
poll_id = 1 and
option_id = 2
)user_resp
ON (user_resp.user_id = responses.user_id)
) temp1
group by
poll_id,
option_id
)components
After a few hours of trial and error, I managed to put together a query that works correctly:
SELECT poll_id AS p_id,
option_id AS o_id,
COUNT(*) AS optCount,
(SELECT COUNT(*) FROM response WHERE option_id = o_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2')) /
(SELECT COUNT(*) FROM response WHERE poll_id = p_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2'))
AS percentage
FROM response
INNER JOIN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2') AS user_ids
ON response.user_id = user_ids.user_id
WHERE poll_id != '1'
GROUP BY option_id DESC
ORDER BY percentage DESC, optCount DESC
Based on tests with a small data set, this query looks to be reasonably fast, but I'd like to modify it so the "IN" subquery is not repeated three times. Any suggestions?
This seems to give the right results for me:
select poll_stats.poll_id,
option_stats.option_id,
(100 * option_responses / poll_responses) as percent_correlated
from (select response.poll_id,
count(*) as poll_responses
from response selecting_response
join response on response.user_id = selecting_response.user_id
where selecting_response.poll_id = 1 and selecting_response.option_id = 2
group by response.poll_id) poll_stats
join (select options.poll_id,
options.id as option_id,
count(response.id) as option_responses
from options
left join response on response.poll_id = options.poll_id
and response.option_id = options.id
and exists (
select 1 from response selecting_response
where selecting_response.user_id = response.user_id
and selecting_response.poll_id = 1
and selecting_response.option_id = 2)
group by options.poll_id, options.id
) as option_stats
on option_stats.poll_id = poll_stats.poll_id
where poll_stats.poll_id <> 1
order by 3 desc, option_responses desc
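To cross-check the percentages against the expected table in the question, the calculation can be sketched in Python: collect the users who picked the given option, then for every option of the other polls divide the number of those users who chose it by the number of those users who answered its poll. The helper name `percent_correlated` is an assumption made for this sketch, and the data is copied from the question's options and responses tables.

```python
# (option_id, poll_id), copied from the question's options table.
options = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3), (7, 4), (8, 4)]

# (poll_id, option_id, user_id), copied from the question's responses table.
responses = [
    (1, 1, 1), (1, 2, 2), (1, 2, 3), (1, 2, 4),
    (2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 4, 4),
    (3, 5, 1), (3, 6, 2), (3, 5, 3), (3, 6, 4),
    (4, 7, 1), (4, 7, 2), (4, 7, 3), (4, 7, 4),
]


def percent_correlated(responses, options, poll_id, option_id):
    """Return (poll_id, option_id, percent) rows for all other polls' options."""
    # Users who selected the given option in the given poll.
    selected = {u for p, o, u in responses if p == poll_id and o == option_id}
    results = []
    for opt_id, opt_poll in options:
        if opt_poll == poll_id:
            continue
        # Responses from the selected users to this poll, and to this option.
        poll_total = sum(1 for p, _, u in responses if p == opt_poll and u in selected)
        opt_total = sum(1 for _, o, u in responses if o == opt_id and u in selected)
        percent = 100.0 * opt_total / poll_total if poll_total else 0.0
        results.append((opt_poll, opt_id, percent))
    # Highest percentage first; ties keep the options-table order.
    results.sort(key=lambda r: -r[2])
    return results


for poll, option, percent in percent_correlated(responses, options, 1, 2):
    print(poll, option, round(percent, 2))
```

Iterating over the options table rather than the responses is what lets option 8 appear with 0, in the same way the left join from options does in the query above.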