Select every other row as male/female from mysql table - mysql

I've got a table containing persons gender-coded as 0 and 1. I need to select every other row as male/female. I thought I could manage this somehow by using modulo and the gender-codes 0 and 1, but I haven't managed to figure it out yet...
The result I'm looking for would look like this:
+-----+--------+-------+
| row | gender | name |
+-----+--------+-------+
| 1 | female | Lisa |
| 2 | male | Greg |
| 3 | female | Mary |
| 4 | male | John |
| 5 | female | Jenny |
+-----+--------+-------+
etc.
The alternative is to do it in PHP by merging 2 separate arrays, but I would really like it as a SQL query...
Any suggestions are appreciated!

Do two subqueries to select male and female. Use ranking function to have them enumerated.
Males:
1 | Peter
2 | John
3 | Chris
Females:
1 | Marry
2 | Christina
3 | Kate
Then multiply the ranking result by 10, and add 5 for females. So you have this:
Males:
10 | Peter
20 | John
30 | Chris
Females:
15 | Marry
25 | Christina
35 | Kate
Then do the UNION ALL and sort by new sort order/new ID.
Together it should look like this (pseudocode):
SELECT
Name
FROM (
(subquery for Males: RANK() * 10 AS SortOrd, Name)
UNION ALL
(subquery for Females: RANK() * 10 + 5 AS SortOrd, Name)
) AS combined
ORDER BY SortOrd
Result should be like this:
Males and Females:
10 | Peter
15 | Marry
20 | John
25 | Christina
30 | Chris
35 | Kate
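The scaled-rank idea above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0+ (both support ROW_NUMBER()), and the persons table, its id column, and the 0 = male / 1 = female coding are hypothetical.

```python
import sqlite3

# Hypothetical sample data; gender 0 = male, 1 = female (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, gender INTEGER)")
conn.executemany(
    "INSERT INTO persons (name, gender) VALUES (?, ?)",
    [("Peter", 0), ("John", 0), ("Chris", 0),
     ("Marry", 1), ("Christina", 1), ("Kate", 1)],
)

# Males get 10, 20, 30, ...; females get 15, 25, 35, ...
# Sorting the union by that key interleaves the two groups.
query = """
SELECT sort_ord, name
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY id) * 10 AS sort_ord, name
    FROM persons WHERE gender = 0
    UNION ALL
    SELECT ROW_NUMBER() OVER (ORDER BY id) * 10 + 5 AS sort_ord, name
    FROM persons WHERE gender = 1
) AS combined
ORDER BY sort_ord
"""
interleaved = conn.execute(query).fetchall()
for row in interleaved:
    print(row)
```

If one group has more rows than the other, the surplus rows simply end up at the tail of the result.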

Found Emulate Row_Number() and modified a bit for your case.
set @rownum := 0;
set @pg := -1;
select p.name,
p.gender
from
(
select name,
gender,
@rownum := if(@pg = gender, @rownum+1, 1) as rn,
@pg := gender as pg
from persons
order by gender
) as p
order by p.rn, p.gender
Try on SQL Fiddle
Note: From 9.4. User-Defined Variables
As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the
results you expect, but this is not guaranteed.
I will leave it up to you to decide if you can use this. I don't use MySQL so I can't really tell you if you should be concerned or not.

Similar to Mikael's solution but without the need to order the resultset multiple times -
SELECT *
FROM (
SELECT people.*,
IF(gender=0, @mr:=@mr+1, @fr:=@fr+1) AS rank
FROM people, (SELECT @mr:=0, @fr:=0) initvars
) tmp
ORDER BY rank ASC, gender ASC;
To avoid having to order both the inner and outer selects I have used separate counters (@mr - male rank, @fr - female rank) in the inner select.

I've got a table containing persons gender-coded as 0 and 1
Then why would you make assumptions on the order of rows in the result set? Seems to me transforming the 0/1 into 'male'/'female' is far more robust:
select name, case gender when 0 then 'male' else 'female' end
from Person

SELECT alias.*, ROW_NUMBER() OVER (PARTITION BY GENDER ORDER BY GENDER) rnk
FROM TABLE_NAME alias
ORDER BY rnk, GENDER DESC

Related

MySQL: Error: Operand should contain 1 column(s). What's wrong with my use of WHERE...NOT IN(SELECT...)?

Credit: LeetCode 1355. Activity Participants
Question:
Write an SQL query to find the names of all the activities with neither maximum, nor minimum number of participants.
Return the result table in any order. Each activity in table Activities is performed by any person in the table Friends.
Friends table:
+------+--------------+---------------+
| id | name | activity |
+------+--------------+---------------+
| 1 | Jonathan D. | Eating |
| 2 | Jade W. | Singing |
| 3 | Victor J. | Singing |
| 4 | Elvis Q. | Eating |
| 5 | Daniel A. | Eating |
| 6 | Bob B. | Horse Riding |
+------+--------------+---------------+
Activities table:
+------+--------------+
| id | name |
+------+--------------+
| 1 | Eating |
| 2 | Singing |
| 3 | Horse Riding |
+------+--------------+
Result table:
+--------------+
| activity |
+--------------+
| Singing |
+--------------+
My code is as follows:
WITH a AS(
SELECT activity, COUNT(1) AS n
FROM Friends
GROUP BY activity
)
SELECT activity
FROM a
WHERE n NOT IN (SELECT MAX(n),MIN(n) FROM a)
I have seen solutions succeed using n != (select min(n) from a) and n != (select max(n) from a), but I don't know why my code went wrong. My guess is that 'SELECT MAX(n), MIN(n) FROM a' generates two columns rather than two rows, but I still don't know the exact reason.
Hope someone can help me out! Thank you so much!
You are close. But NOT IN does not work that way, because the subquery returns multiple columns and you are comparing them to only one value. Instead, use two separate comparisons:
SELECT activity
FROM a
WHERE n <> (SELECT MAX(n) FROM a) AND
n <> (SELECT MIN(n) FROM a) ;
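The difference can be reproduced on the sample data from the question. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL: SQLite words its error differently from MySQL's "Operand should contain 1 column(s)", but it rejects the two-column subquery for the same reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Friends (id INTEGER, name TEXT, activity TEXT)")
conn.executemany("INSERT INTO Friends VALUES (?, ?, ?)", [
    (1, "Jonathan D.", "Eating"), (2, "Jade W.", "Singing"),
    (3, "Victor J.", "Singing"), (4, "Elvis Q.", "Eating"),
    (5, "Daniel A.", "Eating"), (6, "Bob B.", "Horse Riding"),
])

cte = "WITH a AS (SELECT activity, COUNT(1) AS n FROM Friends GROUP BY activity) "

# One row with two columns: NOT IN compares n against single values,
# so the statement is rejected.
try:
    conn.execute(
        cte + "SELECT activity FROM a WHERE n NOT IN (SELECT MAX(n), MIN(n) FROM a)"
    )
    two_column_failed = False
except sqlite3.Error as exc:
    two_column_failed = True
    print("rejected:", exc)

# Two scalar subqueries, one per comparison, as in the answer above.
fixed = conn.execute(
    cte + "SELECT activity FROM a "
    "WHERE n <> (SELECT MAX(n) FROM a) AND n <> (SELECT MIN(n) FROM a)"
).fetchall()
print(fixed)
```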
My guess is that it's because SELECT MAX(n), MIN(n) FROM a will generate two columns, rather than two rows.
Yes, that's the point. Other than using two subqueries (which you already found out by yourself), you can also take advantage of window functions here (the fact that you use a with clause indicates that you are running MySQL 8.0, which supports window functions):
select activity
from (
select
activity,
row_number() over(order by count(*) asc) rn_asc,
row_number() over(order by count(*) desc) rn_desc
from friends
group by activity
) t
where 1 not in (rn_asc, rn_desc)
I suspect that this performs better than a with clause and two subqueries.
Instead of using the subquery in WHERE, you can join with the subquery.
WITH a AS(
SELECT activity, COUNT(1) AS n
FROM Friends
GROUP BY activity
)
SELECT activity
FROM a AS a1
JOIN (SELECT MAX(n) AS maxn, MIN(n) AS minn FROM a) AS a2
ON a1.n NOT IN (a2.maxn, a2.minn)
You can use MIN() and MAX() window functions:
WITH cte AS (
SELECT activity,
COUNT(*) AS n,
MIN(COUNT(*)) OVER () min_n,
MAX(COUNT(*)) OVER () max_n
FROM Friends
GROUP BY activity
)
SELECT activity, n
FROM cte
WHERE n NOT IN (min_n, max_n)
See the demo.
Results:
| activity | n |
| -------- | --- |
| Singing | 2 |

Combining MySQL queries

This SQL tells me the maximum value in the last hour and when it occurred, and is easily modified to show the same for the minimum.
SELECT
mt.mB as Hr_mB_Max,
mt.UTC as Hr_mB_Max_when
FROM
thundersense mt
WHERE
mt.mB =(
SELECT
MAX(mB)
FROM
thundersense mt2
WHERE
mt2.UTC >(UNIX_TIMESTAMP() -3600))
ORDER BY
utc
DESC
LIMIT 1
How do I modify it so it returns both maximum & minimum and their respective times?
Yours Simon M.
Based on my understanding of your question, you are looking to create a 4 column and 1 row answer where it looks like:
+-------+-----------------+----------+-----------------+
| event | time_it_occured | event | time_it_occured |
+-------+-----------------+----------+-----------------+
| fun | 90000 | homework | 12000 |
+-------+-----------------+----------+-----------------+
Below is a similar situation/queries you can adapt for your situation.
So, given a table called 'people' that looks like:
+----+------+--------+
| ID | name | salary |
+----+------+--------+
| 1 | bob | 40000 |
| 2 | cat | 12000 |
| 3 | dude | 50000 |
+----+------+--------+
You can use this query:
SELECT * FROM
(SELECT name, salary FROM people WHERE salary = (SELECT MAX(salary) FROM people)) t JOIN
(SELECT name, salary FROM people WHERE salary = (SELECT MIN(salary) FROM people)) a;
to generate:
+------+--------+------+--------+
| name | salary | name | salary |
+------+--------+------+--------+
| dude | 50000 | cat | 12000 |
+------+--------+------+--------+
Some things to note:
You can change the WHERE clauses to the ones mentioned in your question (for MAX and MIN).
Please be careful with the above query: I am using a cartesian join (cross join in MySQL) to get the 4 columns. To be honest, it doesn't make sense to me to get the data back in one row, but you said that's what you're looking for.
Here is what I would work with instead, getting two tuples/rows back:
+----------+--------+
| name | salary |
+----------+--------+
| dude | 50000 |
| cat | 12000 |
+----------+--------+
And to generate this, you would use:
(SELECT name, salary FROM people WHERE salary = (SELECT MAX(salary) FROM people))
UNION
(SELECT name, salary FROM people WHERE salary = (SELECT MIN(salary) FROM people));
Also: A JOIN without an ON clause is just a CROSS JOIN.
How to use mysql JOIN without ON condition?
One method uses a join:
SELECT mt.mB as Hr_mB_Max, mt.UTC as Hr_mB_Max_when
FROM thundersense mt JOIN
(SELECT MAX(mB) as max_mb, MIN(mb) as min_mb
FROM thundersense mt2
WHERE mt2.UTC >(UNIX_TIMESTAMP() - 3600)
) mm
ON mt.mB IN (mm.max_mb, mm.min_mb)
ORDER BY utc DESC;
My only concern is your limit 1. Presumably, the mBs should be unique. If not, there is a bit of a challenge. One possibility would be to use an auto-incremented id rather than mB.
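The join approach can be tried on a small hypothetical data set. This is a sketch under assumptions, not the asker's real table: it uses Python's built-in sqlite3 module instead of MySQL, a fixed value now in place of UNIX_TIMESTAMP(), and invented readings. The kind column and the outer time filter are additions for illustration; the outer filter keeps older rows that happen to share an extreme value out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thundersense (UTC INTEGER, mB REAL)")

now = 10_000  # stand-in for UNIX_TIMESTAMP(); hypothetical value
conn.executemany("INSERT INTO thundersense VALUES (?, ?)", [
    (now - 5000, 990.0),   # older than one hour, must be ignored
    (now - 3000, 1012.5),
    (now - 2000, 1015.0),  # maximum within the last hour
    (now - 1000, 1009.0),  # minimum within the last hour
    (now - 500, 1011.0),
])

# Compute both extremes once, then join back to find when they occurred.
query = """
SELECT mt.mB, mt.UTC,
       CASE WHEN mt.mB = mm.max_mb THEN 'max' ELSE 'min' END AS kind
FROM thundersense mt
JOIN (SELECT MAX(mB) AS max_mb, MIN(mB) AS min_mb
      FROM thundersense
      WHERE UTC > :now - 3600) mm
  ON mt.mB IN (mm.max_mb, mm.min_mb)
WHERE mt.UTC > :now - 3600
ORDER BY mt.UTC DESC
"""
extremes = conn.execute(query, {"now": now}).fetchall()
for row in extremes:
    print(row)
```

If several rows share the maximum or minimum, all of them are returned, which is the uniqueness concern raised above.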

MySQL: JOINs on 1-to-1 basis

I think this problem belongs to the more advanced category of SQL (MySQL in this case): I have two tables (TABLE_FRUIT, TABLE_ORIGIN - just example names) which have columns that can be joined (fruit_name).
Consider the following diagram:
TABLE_FRUIT
fruit_id|fruit_name |variety
--------|----------------------
1|Orange |sweet
2|Orange |large
3|Lemon |wild
4|Apple |red
5|Apple |yellow
6|Pear |early
etc...
TABLE_ORIGIN
fruit_id|fruit_name|Origin
---------|----------|--------
1|Apple | Italy
2|Pear | Portugal
3|Grape | Italy
4|Orange | Spain
5|Orange | Portugal
6|Orange | Italy
etc...
Desired Result:
TABLE_FRUIT_ORIGIN
fruit_id|fruit_name|Origin
---------|----------|--------
1|Orange | Spain
2|Orange | Portugal
3|Apple | Italy
4|Pear | Portugal
The tables have multiple identical values in columns that compose the joins(fruit_name). Despite that, I need to join the values on 1-to-1 basis. In other words, there is "Orange" value 2 times in TABLE_FRUIT and 3 times in TABLE_ORIGIN. I am looking for a result of two matches, one for Spain, one for Portugal. Italy value from TABLE_ORIGIN must be ignored, because there is no available third Orange value in TABLE_FRUIT to match Orange value in TABLE_ORIGIN.
I tried what I could, but I can not find anything relevant on Google. For example, I tried adding one more column record_used and tried UPDATE but without success.
TABLE_ORIGIN
fruit_id|fruit_name|origin |record_used
---------|----------|-----------|-----------
1|Apple | Italy |
2|Pear | Portugal |
3|Grape | Italy |
4|Orange | Spain |
5|Orange | Portugal |
6|Orange | Italy |
etc...
UPDATE
TABLE_FRUIT t1
INNER JOIN
TABLE_ORIGIN t2
ON
(t1.fruit_name = t2.fruit_name)
AND
(t2.record_used IS NULL)
SET
t2.record_used = 1;
Summary:
Find matching records between two tables on 1-to-1 basis (probably JOIN)
For each record in TABLE_FRUIT find just one (next first) matching record in TABLE_ORIGIN
If a record in TABLE_ORIGIN was already matched once with a record from TABLE_FRUIT, it may not be considered again in the same query run.
Here is what I had in mind with the RANK function. After commenting, I realized MySQL doesn't have a built-in RANK over GROUP BY function, so I had to find this workaround.
SELECT *
FROM (SELECT fruit_name,
@f_rank := IF(@f_name = fruit_name, @f_rank + 1, 1) AS rank,
@f_name := fruit_name
FROM table_fruit
ORDER BY fruit_name DESC) f
-- separate variables here, so the two counters cannot interfere
INNER JOIN (SELECT fruit_name,
@o_rank := IF(@o_name = fruit_name, @o_rank + 1, 1) AS
rank,
@o_name := fruit_name
FROM table_origin
ORDER BY fruit_name DESC) o
ON f.fruit_name = o.fruit_name
AND f.rank = o.rank;
Explanation: Rank each item in the table for each fruit. So Orange in the first table would have rank 1 and 2 and so will Apple. In the second table, Orange will have rank 1, 2 and 3 but others will only have rank 1. Then when joining the tables based on names, you can also join based on rank so that way, you'll get Orange rank 1 and 2 match but Orange with rank 3 will not match.
This is based on my understanding of the problem. Let me know if the requirement is something different than what I have given here.
There is an arbitrary relationship between the number of entries and the order of those entries, so use techniques to match the number of items and order of those items. In MariaDB v10 which supports "window functions" dense_rank() and row_number() this is relatively easy:
select
row_number() over(order by fn.fruit_id) as fruit_id
, fn.fruit_name, o.Origin, fn.variety
from (
select fruit_name, variety, fruit_id
, dense_rank() over(partition by fruit_name order by fruit_id) rnk
from table_fruit
) fn
inner join (
select fruit_name, Origin
, dense_rank() over(partition by fruit_name order by fruit_id) rnk
from table_origin
) o on fn.fruit_name = o.fruit_name and fn.rnk = o.rnk
fruit_id | fruit_name | Origin | variety
-------: | :--------- | :------- | :------
1 | Orange | Spain | sweet
2 | Orange | Portugal | large
3 | Apple | Italy | red
4 | Pear | Portugal | early
dbfiddle here
A pure MySQL solution is a bit more complex because it requires use of @variables that will substitute for those window functions.
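The rank-and-join idea from both answers can be checked against the sample data. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a server with window functions (MariaDB 10.2+ or MySQL 8.0+). It uses ROW_NUMBER() rather than dense_rank(); the two give the same numbers here because fruit_id is unique within each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_fruit (fruit_id INTEGER, fruit_name TEXT, variety TEXT)")
conn.execute("CREATE TABLE table_origin (fruit_id INTEGER, fruit_name TEXT, origin TEXT)")
conn.executemany("INSERT INTO table_fruit VALUES (?, ?, ?)", [
    (1, "Orange", "sweet"), (2, "Orange", "large"), (3, "Lemon", "wild"),
    (4, "Apple", "red"), (5, "Apple", "yellow"), (6, "Pear", "early"),
])
conn.executemany("INSERT INTO table_origin VALUES (?, ?, ?)", [
    (1, "Apple", "Italy"), (2, "Pear", "Portugal"), (3, "Grape", "Italy"),
    (4, "Orange", "Spain"), (5, "Orange", "Portugal"), (6, "Orange", "Italy"),
])

# Number the rows per fruit_name in each table, then match the n-th fruit
# row to the n-th origin row. Each row is used at most once.
query = """
SELECT f.fruit_name, o.origin, f.variety
FROM (SELECT fruit_id, fruit_name, variety,
             ROW_NUMBER() OVER (PARTITION BY fruit_name ORDER BY fruit_id) AS rnk
      FROM table_fruit) f
JOIN (SELECT fruit_name, origin,
             ROW_NUMBER() OVER (PARTITION BY fruit_name ORDER BY fruit_id) AS rnk
      FROM table_origin) o
  ON f.fruit_name = o.fruit_name AND f.rnk = o.rnk
ORDER BY f.fruit_id
"""
matched = conn.execute(query).fetchall()
for row in matched:
    print(row)
```

The third Orange origin (Italy) and the second Apple (yellow) have no partner with the same rank, so they drop out of the inner join.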

What is SQL to select a property and the max number of occurrences of a related property?

I have a table like this:
Table: p
+----------------+
| id | w_id |
+---------+------+
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 6 | 5 |
| 6 | 8 |
| 6 | 10 |
| 6 | 10 |
| 7 | 8 |
| 7 | 10 |
+----------------+
What is the best SQL to get the following result? :
+-----------------------------+
| id | most_used_w_id |
+---------+-------------------+
| 5 | 8 |
| 6 | 10 |
| 7 | 8 |
+-----------------------------+
In other words, to get, per id, the most frequent related w_id.
Note that on the example above, id 7 is related to 8 once and to 10 once.
So, either (7, 8) or (7, 10) will do as result. If it is not possible to
pick up one, then both (7, 8) and (7, 10) on result set will be ok.
I have come up with something like:
select counters2.p_id as id, counters2.w_id as most_used_w_id
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters2
join (
select p_id, max(count_of_w_ids) as max_counter_for_w_ids
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters
group by p_id
) as p_max
on p_max.p_id = counters2.p_id
and p_max.max_counter_for_w_ids = counters2.count_of_w_ids
;
but I am not sure at all whether this is the best way to do it. And I had to repeat the same sub-query two times.
Any better solution?
Try to use User defined variables
select id,w_id
FROM
( select T.*,
if(@id<>id,1,0) as row,
@id:=id FROM
(
select id,W_id, Count(*) as cnt FROM p Group by ID,W_id
) as T,(SELECT @id:=0) as T1
ORDER BY id,cnt DESC
) as T2
WHERE Row=1
SQLFiddle demo
Formal SQL
In fact, your solution is correct in terms of standard SQL. Why? Because you have to join values from the original data to the grouped data, so your query cannot be simplified. MySQL allows mixing non-grouped columns with aggregate functions, but that is unreliable, so I do not recommend relying on it.
MySQL
Since you're using MySQL, you can use variables. I'm not a big fan of them, but for your case they may be used to simplify things:
SELECT
c.*,
IF(@id!=id, @i:=1, @i:=@i+1) AS num,
@id:=id AS gid
FROM
(SELECT id, w_id, COUNT(w_id) AS w_count
FROM p
GROUP BY id, w_id
ORDER BY id DESC, w_count DESC) AS c
CROSS JOIN (SELECT @i:=-1, @id:=-1) AS init
HAVING
num=1;
So for your data result will look like:
+------+------+---------+------+------+
| id | w_id | w_count | num | gid |
+------+------+---------+------+------+
| 7 | 8 | 1 | 1 | 7 |
| 6 | 10 | 2 | 1 | 6 |
| 5 | 8 | 3 | 1 | 5 |
+------+------+---------+------+------+
Thus, you've found each id and its corresponding w_id. The idea is to enumerate the rows within each id, relying on the fact that the subquery orders them by count. We then need only the first row per id, because it holds the highest count.
This may be replaced with a single GROUP BY id, but, again, the server is free to choose any row in that case (it will work because it takes the first row, but the documentation does not guarantee that in the general case).
One nice thing about this approach is that you can also select, for example, the 2nd or 3rd most frequent value; it's very flexible.
Performance
To increase performance, you can create an index on (id, w_id); it will be used for ordering and grouping records. The variables and HAVING, however, will produce a row-by-row scan of the set derived by the inner GROUP BY. That isn't as bad as a full scan of the original data, but it still counts against doing this with variables. On the other hand, doing it with JOIN and a subquery, as in your query, won't be much different, because a temporary table is created for the subquery result set too.
But to be certain, you'll have to test. And keep in mind - you already have valid solution, which, by the way, isn't bound to DBMS-specific stuff and is good in terms of common SQL.
Try this query
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having max(ccc)
Here is the SQL Fiddle link
You can also use this code if you do not want to rely on the first record of non-grouping columns
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having ccc=max(ccc);
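On servers with window functions (MySQL 8.0+), neither the repeated subquery nor user variables are needed. The following is a sketch under assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, and the tie-break on the smaller w_id is an arbitrary choice made here, since the question allows either value for id 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (id INTEGER, w_id INTEGER)")
conn.executemany("INSERT INTO p VALUES (?, ?)", [
    (5, 8), (5, 10), (5, 8), (5, 10), (5, 8),
    (6, 5), (6, 8), (6, 10), (6, 10),
    (7, 8), (7, 10),
])

# Count each (id, w_id) pair once, rank the pairs per id by frequency,
# and keep the top-ranked pair for every id.
query = """
WITH counts AS (
    SELECT id, w_id, COUNT(*) AS cnt
    FROM p
    GROUP BY id, w_id
),
ranked AS (
    SELECT id, w_id,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY cnt DESC, w_id) AS rn
    FROM counts
)
SELECT id, w_id AS most_used_w_id
FROM ranked
WHERE rn = 1
ORDER BY id
"""
most_used = conn.execute(query).fetchall()
for row in most_used:
    print(row)
```

Replacing ROW_NUMBER() with RANK() would return both (7, 8) and (7, 10) for the tie, and changing rn = 1 to rn = 2 selects the second most frequent value.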

MySQL Trying to Return Staggered Results

For example I have a table with three fields:
id (int)
name (varchar)
company (int)
Let's say that I have the following data (example only)
id --- name --- company
---------------------------------------
1 --- John Baker --- 1
2 --- Ann Johnson --- 1
3 --- John Wu --- 1
4 --- Mike Johns --- 2
5 --- John John --- 2
6 --- Johnny Boy --- 2
I would like perform a search on name, and return the data staggered by company. So if I perform a search on LIKE '%John%' , I wish to return the data in a way where it is sorted by company like: 1, 2, 1, 2, 1, 2 whilst maintaining as much relevancy in return order to the original search term as possible.
I have no idea how to return the data in this staggered way, and I have thought about it for hours. If somebody can please help me I'd love to hear their ideas!
This would be rather easy if we could use SQL standard functions to ROW_NUMBER each name PARTITIONed BY company. (ROW_NUMBER gives us a "naive ranking" or simple numbering, with no ties and no gaps.) This is essentially @zebediah49's proposal in his comment. MySQL before 8.0, sadly, cannot do this.
@GordonLindoff's answer simulates this functionality in MySQL with a self-join technique. Here's another way to do the same.
First we group every person by company and then globally naively rank them, using a user variable. So, the three people in company A become #1, #2, and #3, and the two people in company B become #4 and #5, and so on:
company | name == ROW_NUMBER()d after ==> company | name | rank
--------+-------- == GROUP BY `company` ==> --------+---------+------
A | Alice A | Alice | 1
B | Bob A | Charlie | 2
A | Charlie A | Deborah | 3
A | Deborah B | Bob | 4
B | Erwin B | Erwin | 5
The user variable technique to simulate ROW_NUMBER in MySQL is easy to search for, but here's a compact demonstration from another SO answer.
Now if we MOD a global naive rank by the number of people within a company, we get a "partitioned rank", a relative rank within the company:
company | name | rank | npeople | rank % npeople
--------+----------+------+---------+----------------
A | Alice | 1 | 3 | 1
A | Charlie | 2 | 3 | 2
A | Deborah | 3 | 3 | 0
B | Bob | 4 | 2 | 0
B | Erwin | 5 | 2 | 1
Putting it all together, JOINing against a query to count the number of people in each company, we get:
SELECT id, name, ranked.company
FROM ( SELECT tbl.id, tbl.name, tbl.company, (@rn := @rn + 1) AS rn
FROM tbl
JOIN (SELECT @rn := 0) vars
WHERE tbl.name LIKE '%John%'
ORDER BY company) ranked
JOIN (SELECT company, COUNT(id) AS npeople FROM tbl GROUP BY company) companies
ON ranked.company = companies.company
ORDER BY rn MOD companies.npeople, company
If you want it sorted by company:
select *
from t
where . . .
order by company, id;
If you want it interleaved, then a counter within a company helps. Here is one way:
select t.*
from (select *,
(select count(*) from t t2 where <where clause on t2 here> and t2.company = t.company and t2.id < t.id) as seqnum
from t
where . . .
) t
order by seqnum, company
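The per-company counter can also be produced with ROW_NUMBER() where the server supports it (MySQL 8.0+). This is a minimal sketch on the sample rows from the question, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the table name tbl follows the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, company INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, "John Baker", 1), (2, "Ann Johnson", 1), (3, "John Wu", 1),
    (4, "Mike Johns", 2), (5, "John John", 2), (6, "Johnny Boy", 2),
])

# seqnum restarts at 1 for every company; sorting by it first, then by
# company, yields the first hit of each company, then the second, and so on.
query = """
SELECT id, name, company
FROM (SELECT id, name, company,
             ROW_NUMBER() OVER (PARTITION BY company ORDER BY id) AS seqnum
      FROM tbl
      WHERE name LIKE '%John%') ranked
ORDER BY seqnum, company
"""
staggered = conn.execute(query).fetchall()
for row in staggered:
    print(row)
```

To keep search relevancy within each company, the ORDER BY id inside the window could be swapped for whatever relevancy score the search produces.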
One possible solution:
select *
from yourTable
where ...
order by ((company*1000) + id);
Add as many zeros as you need. At the least, you'll need as many zeros as this number has:
select pow(10,length(max(company))) from yourTable;
The ordering may be quite slow if you pull a lot of records in this query, so I suggest you use an optimal where condition.
Not the most elegant solution, but it may work.