MySQL: JOINs on 1-to-1 basis

I think this problem belongs to the more advanced SQL category (MySQL in this case). I have two tables (TABLE_FRUIT and TABLE_ORIGIN, just example names) that can be joined on a shared column (fruit_name).
Consider the following diagram:
TABLE_FRUIT
fruit_id|fruit_name |variety
--------|----------------------
1|Orange |sweet
2|Orange |large
3|Lemon |wild
4|Apple |red
5|Apple |yellow
6|Pear |early
etc...
TABLE_ORIGIN
fruit_id |fruit_name|Origin
---------|----------|--------
1|Apple | Italy
2|Pear | Portugal
3|Grape | Italy
4|Orange | Spain
5|Orange | Portugal
6|Orange | Italy
etc...
Desired Result:
TABLE_FRUIT_ORIGIN
fruit_id |fruit_name|Origin
---------|----------|--------
1|Orange | Spain
2|Orange | Portugal
3|Apple | Italy
4|Pear | Portugal
The tables have multiple identical values in the join column (fruit_name). Despite that, I need to join the values on a 1-to-1 basis. In other words, "Orange" appears 2 times in TABLE_FRUIT and 3 times in TABLE_ORIGIN, and I am looking for a result of two matches, one for Spain and one for Portugal. The Italy row from TABLE_ORIGIN must be ignored, because there is no third Orange row in TABLE_FRUIT left to match it.
I tried what I could, but I cannot find anything relevant on Google. For example, I tried adding one more column, record_used, and running an UPDATE, but without success.
TABLE_ORIGIN
fruit_id |fruit_name|origin |record_used
---------|----------|-----------|-----------
1|Apple | Italy |
2|Pear | Portugal |
3|Grape | Italy |
4|Orange | Spain |
5|Orange | Portugal |
6|Orange | Italy |
etc...
UPDATE
TABLE_FRUIT t1
INNER JOIN
TABLE_ORIGIN t2
ON
(t1.fruit_name = t2.fruit_name)
AND
(t2.record_used IS NULL)
SET
t2.record_used = 1;
Summary:
Find matching records between two tables on 1-to-1 basis (probably JOIN)
For each record in TABLE_FRUIT find just one (next first) matching record in TABLE_ORIGIN
If a record in TABLE_ORIGIN was already matched once with a record from TABLE_FRUIT, it may not be considered again in the same query run.

Here is what I had in mind with the RANK function. After commenting, I realized MySQL doesn't have a built-in RANK over GROUP BY function, so I had to find this workaround.
SELECT *
FROM (SELECT fruit_name,
             -- alias is rnk because RANK is a reserved word in MySQL 8.0
             @f_rank := IF(@f_name = fruit_name, @f_rank + 1, 1) AS rnk,
             @f_name := fruit_name AS f_name
      FROM table_fruit
      ORDER BY fruit_name DESC, fruit_id) f
INNER JOIN (SELECT fruit_name, origin,
                   -- separate variables, so the first subquery's state does not leak in
                   @o_rank := IF(@o_name = fruit_name, @o_rank + 1, 1) AS rnk,
                   @o_name := fruit_name AS o_name
            FROM table_origin
            ORDER BY fruit_name DESC, fruit_id) o
ON f.fruit_name = o.fruit_name
AND f.rnk = o.rnk;
Explanation: Rank each item in the table for each fruit. So Orange in the first table would have rank 1 and 2 and so will Apple. In the second table, Orange will have rank 1, 2 and 3 but others will only have rank 1. Then when joining the tables based on names, you can also join based on rank so that way, you'll get Orange rank 1 and 2 match but Orange with rank 3 will not match.
This is based on my understanding of the problem. Let me know if the requirement is something different than what I have given here.

The relationship between the number of entries and their order is arbitrary, so the technique is to number the rows per fruit in each table and then match on both the name and that number. In MariaDB 10.2 and later, which supports the window functions dense_rank() and row_number(), this is relatively easy:
select
row_number() over(order by fn.fruit_id) as fruit_id
, fn.fruit_name, o.Origin, fn.variety
from (
select fruit_name, variety, fruit_id
, dense_rank() over(partition by fruit_name order by fruit_id) rnk
from table_fruit
) fn
inner join (
select fruit_name, Origin
, dense_rank() over(partition by fruit_name order by fruit_id) rnk
from table_origin
) o on fn.fruit_name = o.fruit_name and fn.rnk = o.rnk
fruit_id | fruit_name | Origin | variety
-------: | :--------- | :------- | :------
1 | Orange | Spain | sweet
2 | Orange | Portugal | large
3 | Apple | Italy | red
4 | Pear | Portugal | early
A pure MySQL solution is a bit more complex because it requires @variables as a substitute for those window functions.
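To reproduce the 1-to-1 match locally, here is a minimal sketch that loads only the rows shown in the question (the "etc..." rows are omitted) and runs the same idea with row_number(). The column types are assumptions, and the query needs an engine with window functions (MariaDB 10.2+ or MySQL 8.0+).

```sql
-- Sample data copied from the question; column types are assumptions.
CREATE TABLE table_fruit (
    fruit_id   INT PRIMARY KEY,
    fruit_name VARCHAR(20) NOT NULL,
    variety    VARCHAR(20)
);
CREATE TABLE table_origin (
    fruit_id   INT PRIMARY KEY,
    fruit_name VARCHAR(20) NOT NULL,
    origin     VARCHAR(20)
);

INSERT INTO table_fruit (fruit_id, fruit_name, variety) VALUES
    (1, 'Orange', 'sweet'), (2, 'Orange', 'large'), (3, 'Lemon', 'wild'),
    (4, 'Apple', 'red'), (5, 'Apple', 'yellow'), (6, 'Pear', 'early');
INSERT INTO table_origin (fruit_id, fruit_name, origin) VALUES
    (1, 'Apple', 'Italy'), (2, 'Pear', 'Portugal'), (3, 'Grape', 'Italy'),
    (4, 'Orange', 'Spain'), (5, 'Orange', 'Portugal'), (6, 'Orange', 'Italy');

-- Number the rows per fruit in each table, then match on name and number.
SELECT f.fruit_name, f.variety, o.origin
FROM (SELECT fruit_name, variety,
             ROW_NUMBER() OVER (PARTITION BY fruit_name ORDER BY fruit_id) AS rn
      FROM table_fruit) f
JOIN (SELECT fruit_name, origin,
             ROW_NUMBER() OVER (PARTITION BY fruit_name ORDER BY fruit_id) AS rn
      FROM table_origin) o
  ON f.fruit_name = o.fruit_name AND f.rn = o.rn
ORDER BY f.fruit_name, f.rn;
```

With this data the join keeps four rows: the third Orange origin (Italy) has no partner in table_fruit, and Lemon and Grape have no counterpart at all.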

MySQL: Error: Operand should contain 1 column(s). What's wrong with my use of WHERE...NOT IN(SELECT...)?

Credit: LeetCode 1355. Activity Participants
Question:
Write an SQL query to find the names of all the activities with neither maximum, nor minimum number of participants.
Return the result table in any order. Each activity in table Activities is performed by any person in the table Friends.
Friends table:
+------+--------------+---------------+
| id | name | activity |
+------+--------------+---------------+
| 1 | Jonathan D. | Eating |
| 2 | Jade W. | Singing |
| 3 | Victor J. | Singing |
| 4 | Elvis Q. | Eating |
| 5 | Daniel A. | Eating |
| 6 | Bob B. | Horse Riding |
+------+--------------+---------------+
Activities table:
+------+--------------+
| id | name |
+------+--------------+
| 1 | Eating |
| 2 | Singing |
| 3 | Horse Riding |
+------+--------------+
Result table:
+--------------+
| activity |
+--------------+
| Singing |
+--------------+
My code is as follows:
WITH a AS(
SELECT activity, COUNT(1) AS n
FROM Friends
GROUP BY activity
)
SELECT activity
FROM a
WHERE n NOT IN (SELECT MAX(n),MIN(n) FROM a)
I have seen that using n != (select min(n) from a) and n != (select max(n) from a) works, but I don't know why my code goes wrong. My guess is that 'SELECT MAX(n), MIN(n) FROM a' generates two columns rather than two rows, but I still don't know the exact reason.
Hope someone can help me out! Thank you so much!
You are close. But NOT IN does not work that way, because the subquery returns multiple columns and you are comparing them to only one value. Instead, use two separate comparisons:
SELECT activity
FROM a
WHERE n <> (SELECT MAX(n) FROM a) AND
n <> (SELECT MIN(n) FROM a) ;
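One way to see the column-versus-row distinction: NOT IN accepts a subquery that returns a single column with any number of rows, so stacking the two aggregates with UNION ALL is another option. This is a sketch, assuming MySQL 8.0 or later (for the WITH clause) and the Friends table from the question.

```sql
WITH a AS (
    SELECT activity, COUNT(*) AS n
    FROM Friends
    GROUP BY activity
)
SELECT activity
FROM a
WHERE n NOT IN (
    SELECT MAX(n) FROM a   -- one column, one row
    UNION ALL
    SELECT MIN(n) FROM a   -- same column, second row
);
```

For the sample data the counts are Eating 3, Singing 2 and Horse Riding 1, so only Singing remains.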
"My guess is that it's because SELECT MAX(n), MIN(n) FROM a will generate two columns, rather than two rows."
Yes, that's the point. Other than using two subqueries (which you already found out by yourself), you can also take advantage of window functions here (the fact that you use a with clause indicates that you are running MySQL 8.0, which supports window functions):
select activity
from (
select
activity,
-- rank() rather than row_number(), so that activities tied for the minimum or maximum are all excluded
rank() over(order by count(*) asc) rn_asc,
rank() over(order by count(*) desc) rn_desc
from Friends
group by activity
) t
where 1 not in (rn_asc, rn_desc)
I suspect that this performs better than a with clause and two subqueries.
Instead of using the subquery in WHERE, you can join with the subquery.
WITH a AS(
SELECT activity, COUNT(1) AS n
FROM Friends
GROUP BY activity
)
SELECT activity
FROM a AS a1
JOIN (SELECT MAX(n) AS maxn, MIN(n) AS minn FROM a) AS a2
ON a1.n NOT IN (a2.maxn, a2.minn)
You can use MIN() and MAX() window functions:
WITH cte AS (
SELECT activity,
COUNT(*) AS n,
MIN(COUNT(*)) OVER () min_n,
MAX(COUNT(*)) OVER () max_n
FROM Friends
GROUP BY activity
)
SELECT activity, n
FROM cte
WHERE n NOT IN (min_n, max_n)
Results:
| activity | n |
| -------- | --- |
| Singing | 2 |

SQL Select count of categories across multiple columns

I have a data structure in the table of these columns
ID | Title | Category_level_1 | Category_level_2 | Category_level_3
1 | offer 1 | Browns | Greens | White
2 | offer 1 | Browns | White |
3 | offer 2 | Greens | Yellow |
4 | offer 3 | Browns | Greens |
5 | offer 4 | Browns | Yellow | White
Without the ability to change the table structure, I need to count the number of offers per category across the 3 columns.
There are also columns for the date range of the offer, to limit the results to current offers, but I want to work out the query first.
I need to get a list of all the categories and then put offers against them.
An offer can be in the table more than once.
What I have so far is a temp table built first with a UNION:
CREATE TEMPORARY TABLE IF NOT EXISTS Cats AS
( SELECT DISTINCT(opt) FROM (
SELECT Category_level_1 AS opt FROM a_table
UNION
SELECT Category_level_2 AS opt FROM a_table
UNION
SELECT Category_level_3 AS opt FROM a_table
) AS Temp
) ;
SELECT
Cats.opt AS "Joint Cat",
(
SELECT count(*)
FROM a_table
WHERE a_table.`Category_level_1` = Cats.opt
OR a_table.`Category_level_2` = Cats.opt
OR a_table.`Category_level_3` = Cats.opt
GROUP BY a_table.Title
) As Total
FROM Cats
WHERE Category_level_1 != ''
ORDER BY Category_level_1 ASC;
ISSUE:
a) The union works well and I get my values. DONE
b) The Total subselect, though, is not grouping correctly.
I just want a count of all the rows returned, but it returns a count per title rather than a count over all rows.
I am trying to work out how this should be done; the SQL in the answer could be totally different. The expected result:
Joint Category | Total Count of offers
Browns | 3
White | 3
Greens | 2
Yellow | 2
plan
take a union of all distinct categories, alias to Joint Category
aggregate count over Joint Category ( where not null or blank - not clear from your rendering if those fields are null or blank.. )
grouping by Joint Category
query
select `Joint Category`, count(*) as `Total Count of offers`
from
(
select Title, Category_level_1 as `Joint Category`
from a_table
union
select Title, Category_level_2
from a_table
union
select Title, Category_level_3
from a_table
) allcats
where `Joint Category` is not null
and `Joint Category` <> ''
group by `Joint Category`
;
output
+----------------+-----------------------+
| Joint Category | Total Count of offers |
+----------------+-----------------------+
| Browns | 3 |
| Greens | 3 |
| White | 2 |
| Yellow | 2 |
+----------------+-----------------------+
Your results are a bit confusing . . . I cannot tell why browns and whites both have a count of 3. I think you are counting the combination of level and category.
I would be inclined to approach this using union all and then use count() or count(distinct), depending on what the counting logic really is. For the combination of level and category:
SELECT cat, COUNT(DISTINCT level, title) as numtitles
FROM ((SELECT title, 1 as level, Category_level_1 as cat FROM a_table) union all
(SELECT title, 2 as level, Category_level_2 as cat FROM a_table) union all
(SELECT title, 3 as level, Category_level_3 as cat FROM a_table)
) tc
WHERE cat is not null
GROUP BY cat;
You can include the date column in each of the subqueries and then include a condition in the WHERE clause.
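As a rough sketch of that date filter: the question does not name the date-range columns, so start_date and end_date below are hypothetical names, and "current" is assumed to mean that today falls inside the range.

```sql
-- start_date and end_date are hypothetical column names for the offer's date range.
SELECT cat, COUNT(DISTINCT level, title) AS numtitles
FROM ((SELECT title, 1 AS level, Category_level_1 AS cat, start_date, end_date FROM a_table) UNION ALL
      (SELECT title, 2 AS level, Category_level_2 AS cat, start_date, end_date FROM a_table) UNION ALL
      (SELECT title, 3 AS level, Category_level_3 AS cat, start_date, end_date FROM a_table)
     ) tc
WHERE cat IS NOT NULL
  AND cat <> ''                                      -- in case empty levels are stored as blank strings
  AND CURRENT_DATE BETWEEN start_date AND end_date   -- keep only offers that are current today
GROUP BY cat;
```

Carrying the date columns through each branch keeps the filter in one place; the same condition could instead be repeated inside each of the three subqueries.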

Outliers of data by groups

I want to analyse outliers of grouped data. Let's say I have this data:
+--------+---------+-------+
| fruit | country | price |
+--------+---------+-------+
| apple | UK | 1 |
| apple | USA | 3 |
| apple | LT | 2 |
| apple | LV | 5 |
| apple | EE | 4 |
| pear | SW | 6 |
| pear | NO | 2 |
| pear | FI | 3 |
| pear | PL | 7 |
+--------+---------+-------+
Let's take pears. If my method of finding outliers is to take the 25% highest and the 25% lowest prices of pears, the outliers for pears would be
+--------+---------+-------+
| pear | NO | 2 |
| pear | PL | 7 |
+--------+---------+-------+
As for apples:
+--------+---------+-------+
| apple | UK | 1 |
| apple | LV | 5 |
+--------+---------+-------+
What I want is to create a view that shows the union of the outliers of all fruits. With this view I could analyse only the tails, and also compare the view against the main table to get a table without outliers; that's my goal. A solution would be:
(SELECT * FROM fruits f WHERE f.fruit = 'pear' ORDER BY f.price ASC
LIMIT (SELECT ROUND(COUNT(*) * 0.25,0)
FROM fruits f2
WHERE f2.fruit = 'pear')
)
union all
(SELECT * FROM fruits f WHERE f.fruit = 'pear' ORDER BY f.price DESC
LIMIT (SELECT ROUND(COUNT(*) * 0.25,0)
FROM fruits f2
WHERE f2.fruit = 'pear')
)
union all
(SELECT * FROM fruits f WHERE f.fruit = 'apple' ORDER BY f.price ASC
LIMIT (SELECT ROUND(COUNT(*) * 0.25,0)
FROM fruits f2
WHERE f2.fruit = 'apple')
)
union all
(SELECT * FROM fruits f WHERE f.fruit = 'apple' ORDER BY f.price DESC
LIMIT (SELECT ROUND(COUNT(*) * 0.25,0)
FROM fruits f2
WHERE f2.fruit = 'apple')
)
This would give me the table I want; however, the code after LIMIT doesn't seem to be correct. Another problem is the number of groups. In this example there are only two groups (pears, apples), but in my actual data there are around 100. So the 'union all' should somehow go through all unique fruits automatically, without code written for each fruit: find the number of outliers for each unique fruit, take only that number of rows, and show it all in another table (view).
You can't supply LIMIT with a value from a subquery, in any RDBMS I'm aware of. Some dbs don't even allow host variables/parameters in their versions of the clause (I'm thinking of iSeries DB2).
This is essentially a greatest-n-per-group problem. Similar queries in most other RDBMSs are solved with what are called Windowing functions - essentially, you're looking at a movable selection of data.
MySQL doesn't have this functionality, so we have to counterfeit it. The actual mechanics of the query will depend on the actual data you need, so I can only speak to what you're attempting here. The techniques should be generally adaptable, but may require rather more creativity than otherwise.
To start with, you want a function that returns a number indicating a row's position. I'm assuming duplicate prices should be given the same rank (ties), and that doing so won't create a gap in the numbering. This is essentially the DENSE_RANK() windowing function. We can get these results by doing the following:
SELECT fruit, country, price,
@Rnk := IF(@last_fruit <> fruit, 1,
IF(@last_price = price, @Rnk, @Rnk + 1)) AS Rnk,
@last_fruit := fruit,
@last_price := price
FROM Fruits
JOIN (SELECT @Rnk := 0) n
ORDER BY fruit, price
... Which generates the following for the 'apple' group:
fruit country price rank
=============================
apple UK 1 1
apple LT 2 2
apple USA 3 3
apple EE 4 4
apple LV 5 5
Now, you're trying to get the top/bottom 25% of rows. In this case, you need a count of distinct prices:
SELECT fruit, COUNT(DISTINCT price)
FROM Fruits
GROUP BY fruit
... And now we just need to join this to the previous statement to limit the top/bottom:
SELECT RankedFruit.fruit, RankedFruit.country, RankedFruit.price
FROM (SELECT fruit, COUNT(DISTINCT price) AS priceCount
FROM Fruits
GROUP BY fruit) CountedFruit
JOIN (SELECT fruit, country, price,
@Rnk := IF(@last_fruit <> fruit, 1,
IF(@last_price = price, @Rnk, @Rnk + 1)) AS rnk,
@last_fruit := fruit,
@last_price := price
FROM Fruits
JOIN (SELECT @Rnk := 0) n
ORDER BY fruit, price) RankedFruit
ON RankedFruit.fruit = CountedFruit.fruit
AND (RankedFruit.rnk > ROUND(CountedFruit.priceCount * .75)
OR RankedFruit.rnk <= ROUND(CountedFruit.priceCount * .25))
...which yields the following:
fruit country price
=======================
apple UK 1
apple LV 5
pear NN 2
pear NO 2
pear PL 7
(I duplicated a pear row to show "tied" prices.)
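On MySQL 8.0 or later, which does provide window functions, the same logic can be written without user variables. This is a sketch under that version assumption, using the table and column names from the question (fruits with fruit, country, price). The highest dense rank within a fruit equals its number of distinct prices, so it stands in for the COUNT(DISTINCT price) join.

```sql
-- Assumes MySQL 8.0+ and the question's table fruits(fruit, country, price).
SELECT fruit, country, price
FROM (
    SELECT r.*,
           -- the highest dense rank per fruit equals its number of distinct prices
           MAX(rnk) OVER (PARTITION BY fruit) AS price_count
    FROM (
        SELECT fruit, country, price,
               DENSE_RANK() OVER (PARTITION BY fruit ORDER BY price) AS rnk
        FROM fruits
    ) r
) t
WHERE rnk >  ROUND(price_count * 0.75)   -- top quarter of distinct prices
   OR rnk <= ROUND(price_count * 0.25);  -- bottom quarter of distinct prices
```

Because the query covers every fruit at once, it can be wrapped in CREATE VIEW without writing one branch per group.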
Does round not need 2 / 3 arguments? I.e. do you not need to put in, to what decimal place you wish to round?
so
...
LIMIT (SELECT ROUND(COUNT(*) * 0.25)
FROM #fruits f2
WHERE f2.fruit = 'apple')
becomes
...
LIMIT (SELECT ROUND(COUNT(*) * 0.25,2)
FROM #fruits f2
WHERE f2.fruit = 'apple')
also, just having a quick look at lunch, but it looks like you're just expecting the min / max values. Could you not just use those functions instead?

MySQL Trying to Return Staggered Results

For example I have a table with three fields:
id (int)
name (varchar)
company (int)
Let's say that I have the following data (example only)
id --- name --- company
---------------------------------------
1 --- John Baker --- 1
2 --- Ann Johnson --- 1
3 --- John Wu --- 1
4 --- Mike Johns --- 2
5 --- John John --- 2
6 --- Johnny Boy --- 2
I would like to perform a search on name, and return the data staggered by company. So if I perform a search on LIKE '%John%', I wish to return the data sorted by company like 1, 2, 1, 2, 1, 2, whilst keeping the return order as relevant to the original search term as possible.
I have no idea how to return the data in this staggered way, and I have thought about it for hours. If somebody can please help me I'd love to hear their ideas!
This would be rather easy if we could use the SQL standard ROW_NUMBER function to number each name, PARTITIONed BY company. (ROW_NUMBER gives us a "naive ranking" or simple numbering, with no ties and no gaps.) This is essentially @zebediah49's proposal in his comment. MySQL, sadly, cannot do this today.
@GordonLindoff's answer simulates this functionality in MySQL with a self-join technique. Here's another way to do the same.
First we group every person by company and then globally naively rank them, using a user variable. So, the three people in company A become #1, #2, and #3, and the two people in company B become #4 and #5, and so on:
company | name == ROW_NUMBER()d after ==> company | name | rank
--------+-------- == GROUP BY `company` ==> --------+---------+------
A | Alice A | Alice | 1
B | Bob A | Charlie | 2
A | Charlie A | Deborah | 3
A | Deborah B | Bob | 4
B | Erwin B | Erwin | 5
The user variable technique to simulate ROW_NUMBER in MySQL is easy to search for, but here's a compact demonstration from another SO answer.
Now if we MOD a global naive rank by the number of people within a company, we get a "partitioned rank", a relative rank within the company:
company | name | rank | npeople | rank % npeople
--------+----------+------+---------+----------------
A | Alice | 1 | 3 | 1
A | Charlie | 2 | 3 | 2
A | Deborah | 3 | 3 | 0
B | Bob | 4 | 2 | 0
B | Erwin | 5 | 2 | 1
Putting it all together, JOINing against a query to count the number of people in each company, we get:
SELECT id, name, ranked.company
FROM ( SELECT tbl.id, tbl.name, tbl.company, (@rn := @rn + 1) AS rn
FROM tbl
JOIN (SELECT @rn := 0) vars
WHERE tbl.name LIKE '%John%'
ORDER BY company) ranked
-- count only the matching rows, so npeople agrees with the ranks computed above
JOIN (SELECT company, COUNT(id) AS npeople FROM tbl WHERE name LIKE '%John%' GROUP BY company) companies
ON ranked.company = companies.company
ORDER BY rn MOD companies.npeople, company
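For comparison, on MySQL 8.0 or later (which added window functions) the partitioned numbering can be written directly. This is a sketch under that version assumption, using the tbl name from the query above and id as a stand-in for relevance order.

```sql
-- Assumes MySQL 8.0+; tbl(id, name, company) as in the question.
SELECT id, name, company
FROM (
    SELECT id, name, company,
           -- position of each matching row within its own company
           ROW_NUMBER() OVER (PARTITION BY company ORDER BY id) AS seqnum
    FROM tbl
    WHERE name LIKE '%John%'
) ranked
ORDER BY seqnum, company;
```

All six sample rows match '%John%', so the companies come back as 1, 2, 1, 2, 1, 2. Swapping ORDER BY id inside the window for a relevance expression would change which row of each company comes first.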
If you want it sorted by company:
select *
from t
where . . .
order by company, id;
If you want it interleaved, then a counter within a company helps. Here is one way:
select t.*
from (select *,
(select count(*) from t t2 where <where clause on t2 here> and t2.company = t.company and t2.id < t.id) as seqnum
from t
where . . .
) t
order by seqnum, company
One possible solution:
select *
from yourTable
where ...
order by ((company*1000) + id);
Add as many zeros as you need. You'll need at least as many zeros as this number has:
select pow(10,length(max(company))) from yourTable;
The ordering may be quite slow if you pull a lot of records in this query, so I suggest you use an optimal where condition.
Not the most elegant solution, but it may work.

Select every other row as male/female from mysql table

I've got a table containing persons gender-coded as 0 and 1. I need to select every other row as male/female. I thought I could manage this somehow by using modulo and the gender-codes 0 and 1, but I haven't managed to figure it out yet...
The result I'm looking for would look like this:
+-----+--------+-------+
| row | gender | name |
+-----+--------+-------+
| 1 | female | Lisa |
| 2 | male | Greg |
| 3 | female | Mary |
| 4 | male | John |
| 5 | female | Jenny |
+-----+--------+-------+
etc.
The alternative is to do it in PHP by merging 2 separate arrays, but I would really like it as a SQL query...
Any suggestions are appreciated!
Do two subqueries to select male and female. Use ranking function to have them enumerated.
Males:
1 | Peter
2 | John
3 | Chris
Females:
1 | Marry
2 | Christina
3 | Kate
Then multiply the ranking result by 10, and add 5 for females. So you have this:
Males:
10 | Peter
20 | John
30 | Chris
Females:
15 | Marry
25 | Christina
35 | Kate
Then do the UNION ALL and sort by new sort order/new ID.
Together it should look like this (pseudo code)
SELECT
Name
FROM
(
(subquery for Males: RANK()*10 AS SortOrd, Name)
UNION ALL
(subquery for Females: RANK()*10+5 AS SortOrd, Name)
) AS ranked
ORDER BY SortOrd
Result should be like this:
Males and Females:
10 | Peter
15 | Marry
20 | John
25 | Christina
30 | Chris
35 | Kate
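A runnable version of that pseudo code might look like the sketch below. It assumes MySQL 8.0 or later for ROW_NUMBER(), a table named persons with name and gender columns, and that 0 means male; none of these are confirmed by the question.

```sql
-- Assumptions: MySQL 8.0+, table persons(name, gender), gender 0 = male and 1 = female.
SELECT sort_ord, name
FROM (
    -- males get 10, 20, 30, ...
    SELECT name, ROW_NUMBER() OVER (ORDER BY name) * 10 AS sort_ord
    FROM persons
    WHERE gender = 0
    UNION ALL
    -- females get 15, 25, 35, ...
    SELECT name, ROW_NUMBER() OVER (ORDER BY name) * 10 + 5 AS sort_ord
    FROM persons
    WHERE gender = 1
) ranked
ORDER BY sort_ord;
```

To start with a female row instead, move the "+ 5" offset to the male branch.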
Found Emulate Row_Number() and modified a bit for your case.
set @rownum := 0;
set @pg := -1;
select p.name,
p.gender
from
(
select name,
gender,
@rownum := if(@pg = gender, @rownum+1, 1) as rn,
@pg := gender as pg
from persons
order by gender
) as p
order by p.rn, p.gender
Note: From 9.4. User-Defined Variables
As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the
results you expect, but this is not guaranteed.
I will leave it up to you to decide whether you can use this. I don't use MySQL, so I can't really tell you whether you should be concerned or not.
Similar to Mikael's solution but without the need to order the resultset multiple times -
SELECT *
FROM (
SELECT people.*,
IF(gender=0, @mr:=@mr+1, @fr:=@fr+1) AS rank
FROM people, (SELECT @mr:=0, @fr:=0) initvars
) tmp
ORDER BY rank ASC, gender ASC;
To avoid having to order both the inner and outer selects I have used separate counters (@mr - male rank, @fr - female rank) in the inner select.
"I've got a table containing persons gender-coded as 0 and 1"
Then why would you make assumptions on the order of rows in the result set? Seems to me transforming the 0/1 into 'male'/'female' is far more robust:
select name, case gender when 0 then 'male' else 'female' end
from Person
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY GENDER ORDER BY GENDER) rnk
FROM TABLE_NAME t
ORDER BY rnk, GENDER DESC