SQL - select data from various tables based on multiple values in CELL - mysql

I'm working with an old database and I want to select data spread across different tables in a recurring manner, once for each of the multiple values stored in one cell.
There are three tables (here with fictional data): treaties, parties, related_parties. The treaties table holds information about specific treaties as well as the IDs of the parties that signed them. The parties table holds information about the parties, whereas the related_parties table records which parties are related (the related_party_id values being the IDs used in the parties table).
Table 1: treaties
id | name | party_id
1 | Peace of Westphalia | 49, 80
2 | Peace of Rijswijk | 49, 50, 81
Table 2: parties
party_id | party_name
49 | Holy Roman Empire
50 | Dutch Republic
51 | Mainz
52 | Cologne
80 | France
81 | Sweden
82 | Paris
83 | Bordeaux
Table 3: related_parties
party_id | related_party_id
49 | 51, 52
80 | 82, 83
What I want as the output is something like this, where information about the parties is gathered for every value in the relevant cells. So for the first treaty (1) this would be:
id | name | party_id | party_name | related_party_id | related_party_name
1 | Peace of Westphalia | 49 | Holy Roman Empire | 51 | Mainz
1 | Peace of Westphalia | 49 | Holy Roman Empire | 52 | Cologne
1 | Peace of Westphalia | 80 | France | 82 | Paris
1 | Peace of Westphalia | 80 | France | 83 | Bordeaux
Is this at all doable? So far every query I've created only retrieves the data pertaining to the first value in a cell.

To achieve this, the first step is to split the comma-separated columns into multiple rows. You have to define the maximum number of values a column can hold, but I came across this query, which is pretty handy for the task:
https://www.appsloveworld.com/mysql/100/79/split-string-into-multiple-rows-in-sql
Applied to this particular case:
select t5.*, t6.party_name as related_party_name
from (
    select t3.id, t3.name, t3.party_id, t4.party_name, t3.related_party_id
    from (
        select t2.id, t2.name, t1.party_id, t1.related_party_id
        from (
            -- one row per value in related_parties.related_party_id;
            -- TRIM drops the space after each comma (cells look like '51, 52')
            select B.party_id,
                   TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(B.related_party_id, ',', NS.n), ',', -1)) as related_party_id
            from (
                select 1 as n union all select 2 union all select 3 union all
                select 4 union all select 5 union all select 6 union all
                select 7 union all select 8 union all select 9 union all select 10
            ) NS
            inner join related_parties B
                on NS.n <= CHAR_LENGTH(B.related_party_id) - CHAR_LENGTH(REPLACE(B.related_party_id, ',', '')) + 1
        ) t1
        left join (
            -- one row per value in treaties.party_id
            select B.id, B.name,
                   TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(B.party_id, ',', NS.n), ',', -1)) as party_id
            from (
                select 1 as n union all select 2 union all select 3 union all
                select 4 union all select 5 union all select 6 union all
                select 7 union all select 8 union all select 9 union all select 10
            ) NS
            inner join treaties B
                on NS.n <= CHAR_LENGTH(B.party_id) - CHAR_LENGTH(REPLACE(B.party_id, ',', '')) + 1
        ) t2 using (party_id)
    ) t3
    left join parties t4 using (party_id)
) t5
left join parties t6 on t5.related_party_id = t6.party_id
order by 1, 3
This returns the output you asked for (add more numbers to the union all list if a cell can hold more than 10 values).
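To make the splitting trick in the query above easier to follow, here is a minimal Python sketch of the same logic. The helper names substring_index and split_cell are hypothetical, written only to mimic what SUBSTRING_INDEX and the join against the numbers table do; this is not a replacement for the SQL. The data is the sample data from the question.

```python
def substring_index(text, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    return delim.join(parts[count:])       # everything right of it, counting from the end


def split_cell(cell, max_values=10):
    """Return each value of a comma-separated cell, like the join against NS."""
    n_values = len(cell) - len(cell.replace(",", "")) + 1   # number of commas + 1
    values = []
    for n in range(1, max_values + 1):     # the numbers table 1..10
        if n <= n_values:                  # the ON condition of the inner join
            piece = substring_index(substring_index(cell, ",", n), ",", -1)
            values.append(piece.strip())   # same role as TRIM in SQL
    return values


treaties = [(1, "Peace of Westphalia", "49, 80"), (2, "Peace of Rijswijk", "49, 50, 81")]
parties = {49: "Holy Roman Empire", 50: "Dutch Republic", 51: "Mainz", 52: "Cologne",
           80: "France", 81: "Sweden", 82: "Paris", 83: "Bordeaux"}
related_parties = {49: "51, 52", 80: "82, 83"}

rows = []
for treaty_id, treaty_name, party_cell in treaties:
    for party in map(int, split_cell(party_cell)):
        if party not in related_parties:   # parties without related rows are skipped
            continue
        for related in map(int, split_cell(related_parties[party])):
            rows.append((treaty_id, treaty_name, party, parties[party],
                         related, parties[related]))

for row in rows:
    print(row)
```

Each cell is expanded into one row per value first, and only then joined to the lookup table, which is the same order of operations the SQL uses.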

Related

Aggregate information from one table to another with a different “layout” (mysql)

This is my starting table, which provides sales information by Id.
Id | Store_Name | Market | Sales | Main_Product
1 | StoreA | Rome | 10 | a
2 | StoreB | Rome | 15 | b
3 | StoreC | Rome | 9 | c
4 | Mag1 | Paris | 10 | a
5 | Mag2 | Paris | 23 | b
6 | Mag3 | Paris | 12 | c
7 | Shop1 | London | 11 | a
8 | Shop2 | London | 31 | b
9 | Shop3 | London | 45 | c
10 | Shop4 | London | 63 | d
In order to build a report and create some dynamic sentences, I will need the dataset to be "paginated" as per below table:
Id | Dimension | Dimension_Name | Sales | Main_Product
1 | ShoppingCentre | StoreA | 10 | a
1 | Market | Rome | 34 | a
2 | ShoppingCentre | StoreB | 15 | b
2 | Market | Rome | 34 | b
3 | ShoppingCentre | StoreC | 9 | c
3 | Market | Rome | 34 | c
Do you have any tips on how to build the last table starting from the first one?
To sum up:
The new table will always be keyed by Id
Market sales are aggregated at row level, on the row of every shopping centre located in that market
This is the query that I have built so far, but I am wondering whether there is a better and more efficient way to accomplish the same:
with store_temp_table as (
    select
        id
        , Store_Name
        , Market
        , Main_Product
        , sum(Sales) as Sales
    from Production_Table
    where 1=1
    group by 1, 2, 3, 4
)
, market_temp_table as (
    select
        market
        , sum(Sales) as Sales
    from Production_Table
    where 1=1
    group by 1
)
, store_temp_table_refined as (
    select
        a.id
        , a.Main_Product
        , 'ShoppingCentre' as Dimension_Name
        , sum(a.Sales) as Sales
    from store_temp_table a
    inner join market_temp_table b on a.market = b.market
    group by 1, 2, 3
)
, market_temp_table_refined as (
    select
        a.id
        , a.Main_Product
        , 'Market' as DimensionName
        , sum(b.Sales) as Sales
    from store_temp_table a
    inner join market_temp_table b on a.market = b.market
    group by 1, 2, 3
)
select * from store_temp_table_refined
union all
select * from market_temp_table_refined
Thank you
Use a CTE that returns the dimensions that you want and cross join it to a query that returns the columns of the table and an additional column with the total sales of each market:
WITH Dimensions(id, Dimension) AS (VALUES
    ROW(1, 'ShoppingCentre'),
    ROW(2, 'Market')
)
SELECT p.Id,
       d.Dimension,
       CASE d.id WHEN 1 THEN p.Store_Name ELSE p.Market END Dimension_Name,
       CASE d.id WHEN 1 THEN p.Sales ELSE p.MarketSales END Sales,
       p.Main_Product
FROM Dimensions d
CROSS JOIN (
    SELECT *, SUM(Sales) OVER (PARTITION BY Market) AS MarketSales
    FROM Production_Table
) p
ORDER BY p.id, d.id;
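As a quick way to try the cross join approach without a MySQL server, the sketch below runs essentially the same query against an in-memory SQLite database from Python. This is an assumption-laden stand-in: SQLite writes the VALUES rows without the ROW keyword, and window functions need SQLite 3.25 or newer. The data is the sample table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Production_Table "
    "(Id INTEGER, Store_Name TEXT, Market TEXT, Sales INTEGER, Main_Product TEXT)"
)
conn.executemany(
    "INSERT INTO Production_Table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "StoreA", "Rome", 10, "a"), (2, "StoreB", "Rome", 15, "b"),
        (3, "StoreC", "Rome", 9, "c"), (4, "Mag1", "Paris", 10, "a"),
        (5, "Mag2", "Paris", 23, "b"), (6, "Mag3", "Paris", 12, "c"),
        (7, "Shop1", "London", 11, "a"), (8, "Shop2", "London", 31, "b"),
        (9, "Shop3", "London", 45, "c"), (10, "Shop4", "London", 63, "d"),
    ],
)

query = """
WITH Dimensions(id, Dimension) AS (
    VALUES (1, 'ShoppingCentre'), (2, 'Market')   -- SQLite syntax, no ROW keyword
)
SELECT p.Id,
       d.Dimension,
       CASE d.id WHEN 1 THEN p.Store_Name ELSE p.Market END AS Dimension_Name,
       CASE d.id WHEN 1 THEN p.Sales ELSE p.MarketSales END AS Sales,
       p.Main_Product
FROM Dimensions d
CROSS JOIN (
    -- every store row also carries the total sales of its market
    SELECT *, SUM(Sales) OVER (PARTITION BY Market) AS MarketSales
    FROM Production_Table
) p
ORDER BY p.Id, d.id
"""
rows = conn.execute(query).fetchall()
for row in rows[:4]:
    print(row)
```

Every store row appears twice, once per dimension, so the ten input rows produce twenty output rows.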
Or, with UNION ALL:
SELECT Id,
       'ShoppingCentre' Dimension,
       Store_Name Dimension_Name,
       Sales,
       Main_Product
FROM Production_Table
UNION ALL
SELECT Id,
       'Market',
       Market,
       SUM(Sales) OVER (PARTITION BY Market),
       Main_Product
FROM Production_Table
ORDER BY Id,
         CASE Dimension WHEN 'ShoppingCentre' THEN 1 WHEN 'Market' THEN 2 END;

Solving for outlier range, how to calculate on two different rows from same output?

I have the query below:
SELECT
    age_quartile,
    MAX(age) AS quartile_break
FROM (
    SELECT
        full_name,
        age,
        NTILE(4) OVER (ORDER BY age) AS age_quartile
    FROM friends
) AS quartiles
WHERE age_quartile IN (1, 3)
GROUP BY age_quartile
This gives me output that looks like:
age_quartile | quartile_break
1 | 31
3 | 35
Desired Output:
outlier range
25
41
where 25 = 31-6 and 41 = 35 + 6
How can I extend my query above to get my final desired output? The query currently gives me the quartile breaks, and I need one additional step to solve for the outlier range. Thanks!
table data looks like:
friends
full_name | age
Ameila Lara | 1
Evangeline Griffin | 21
Kiara Atkinson | 31
Isobel Nieslen | 31
Genevuve Miles | 32
Jane Jenkins | 99
Marie Acevedo | null
I don't know whether NTILE is the right function to use here. One way is to define the age quartiles in a temp table, join it with the age table, and derive the results from that. This is just an attempt; there may be a better way, and I'm interested to see other answers.
Sample Query:
with friends as
(
    select 'user1' as full_name, 31 as age union all
    select 'user2' as full_name, 55 as age union all
    select 'user3' as full_name, 75 as age
),
quartiles_age as
(
    select 1 as quartile, 0 as st_range, 25 as end_range union all
    select 2 as quartile, 26 as st_range, 50 as end_range union all
    select 3 as quartile, 51 as st_range, 75 as end_range union all
    select 4 as quartile, 76 as st_range, 100 as end_range
)
SELECT
    fr.full_name,
    fr.age,
    qrtl_age.quartile,
    qrtl_age.end_range - fr.age as diff_age
FROM
    friends fr
    join quartiles_age qrtl_age on fr.age between qrtl_age.st_range and qrtl_age.end_range
Fiddle URL : (https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c48d209264e90d276cea6ae03f2a7af6)
You can calculate the range using:
MAX(age) - MIN(age) AS age_range
That answers the question you asked. Your sample data has an arbitrary 6 for the calculation, which the question does not explain.
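One reading that is consistent with the numbers in the question is the common 1.5 x IQR rule: 35 - 31 = 4, and 1.5 x 4 = 6. That is an assumption, since the question never says where the 6 comes from. Under that assumption, the two result rows can be pivoted into one row with conditional aggregation and the bounds computed from there. The sketch below uses Python's sqlite3 module, with a hypothetical quartile_breaks table standing in for the output of the quartile query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the two rows the question's query returns.
conn.execute("CREATE TABLE quartile_breaks (age_quartile INTEGER, quartile_break INTEGER)")
conn.executemany("INSERT INTO quartile_breaks VALUES (?, ?)", [(1, 31), (3, 35)])

query = """
SELECT q1 - 1.5 * (q3 - q1) AS lower_bound,   -- assumed 1.5 x IQR rule
       q3 + 1.5 * (q3 - q1) AS upper_bound
FROM (
    -- conditional aggregation puts both quartile breaks on one row
    SELECT MAX(CASE WHEN age_quartile = 1 THEN quartile_break END) AS q1,
           MAX(CASE WHEN age_quartile = 3 THEN quartile_break END) AS q3
    FROM quartile_breaks
) AS pivoted
"""
lower_bound, upper_bound = conn.execute(query).fetchone()
print(lower_bound, upper_bound)
```

In the original query, the inner SELECT of this sketch could read from the quartiles derived table instead of a separate table, with MAX(CASE WHEN age_quartile = 1 THEN age END) and no GROUP BY.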

Better approach to solving this Mysql query

I have two tables similar to the examples below. I wrote a query to combine the two tables and get the total score of each student. The total score is caone + catwo + examscore. I am looking for better approaches in terms of both performance and syntax. Thanks
ca table
name id course ca_cat score
one 1 maths 1 10
one 1 maths 2 6
two 2 maths 1 9
two 2 maths 2 7
exam table
name id course score
one 1 maths 50
two 2 maths 49
My query is shown below
WITH
  firstca AS (
    SELECT id, name, score, course
    FROM ca
    WHERE ca_cat = 1),
  secondca AS (
    SELECT id, name, score, course
    FROM ca
    WHERE ca_cat = 2),
  exam AS (
    SELECT id, name, score, course
    FROM exam),
  totalscore AS (
    SELECT
      fca.id,
      fca.name,
      fca.course,
      fca.score AS firstcascore,
      sca.score AS secondcascore,
      ex.score AS examscore,
      (fca.score + sca.score) AS totalca,
      (fca.score + sca.score + ex.score) AS totalscores
    FROM
      firstca AS fca
    JOIN
      secondca AS sca
    ON
      fca.id = sca.id
      AND fca.course = sca.course
    JOIN
      exam AS ex
    ON
      fca.id = ex.id
      AND fca.course = ex.course)
SELECT * FROM totalscore
The final result table can be similar to this
name id course caone catwo exam totalscore
one 1 maths 10 6 50 66
two 2 maths 9 7 49 65
Is there a better way to write this query, maybe without the with statement or using subqueries and unions?
I wish to learn from every answer here.
Below is for BigQuery Standard SQL
#standardSQL
SELECT name, id, course, caone, catwo, exam,
  caone + catwo + exam AS totalscore
FROM (
  SELECT name, id, course,
    MAX(IF(ca_cat = 1, t2.score, NULL)) AS caone,
    MAX(IF(ca_cat = 2, t2.score, NULL)) AS catwo,
    ANY_VALUE(t1.score) AS exam
  FROM `project.dataset.exam` t1
  JOIN `project.dataset.ca` t2
  USING (name, id, course)
  GROUP BY name, id, course
)
Applied to the sample data from your question, the output is:
Row name id course caone catwo exam totalscore
1 one 1 maths 10 6 50 66
2 two 2 maths 9 7 49 65
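The same pivot-by-conditional-aggregation idea can be tried outside BigQuery. The sketch below is an adaptation, not the answer's exact query: it runs on an in-memory SQLite database from Python, so IF(...) becomes CASE WHEN and ANY_VALUE becomes MAX (safe here because each student has one exam row per course). Table and column names follow the sample tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ca (name TEXT, id INTEGER, course TEXT, ca_cat INTEGER, score INTEGER);
CREATE TABLE exam (name TEXT, id INTEGER, course TEXT, score INTEGER);
INSERT INTO ca VALUES ('one', 1, 'maths', 1, 10), ('one', 1, 'maths', 2, 6),
                      ('two', 2, 'maths', 1, 9),  ('two', 2, 'maths', 2, 7);
INSERT INTO exam VALUES ('one', 1, 'maths', 50), ('two', 2, 'maths', 49);
""")

query = """
SELECT name, id, course, caone, catwo, exam,
       caone + catwo + exam AS totalscore
FROM (
    SELECT name, id, course,
           MAX(CASE WHEN t2.ca_cat = 1 THEN t2.score END) AS caone,  -- first CA
           MAX(CASE WHEN t2.ca_cat = 2 THEN t2.score END) AS catwo,  -- second CA
           MAX(t1.score) AS exam          -- stands in for ANY_VALUE
    FROM exam t1
    JOIN ca t2 USING (name, id, course)
    GROUP BY name, id, course
) AS pivoted
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A single join plus one GROUP BY replaces the three CTEs and two joins of the original query, since both CA scores are picked out of the same grouped rows.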

How to select one column with all distinct values based on some clause

I'd like a single query that I execute once to get the result (no multiple query executions), and the query should use plain MySQL constructs only (nothing complex or advanced such as BEGIN, loops, or cursors).
Say I've two tables.
1st Table = Country (id(PK), name);
2nd Table = Businessman (id(PK), name, city, country_id(FK))
I'd like to SELECT all countries whose businessmen are all from distinct cities, i.e. no two businessmen in the same country are from the same city. If a country has two businessmen from the same city, it should not be selected.
Country
id name
1 India
2 China
3 Bahrain
4 Finland
5 Germany
6 France
Businessman
id name city country_id
1 BM1 Kolkata 1
2 BM2 Delhi 1
3 BM3 Mumbai 1
4 BM4 Beijing 2
5 BM5 Paris 6
6 BM6 Beijing 2
7 BM7 Forssa 4
8 BM8 Anqing 2
9 BM9 Berlin 5
10 BM10 Riffa 3
11 BM11 Nice 6
12 BM12 Helsinki 4
13 BM13 Bremen 5
14 BM14 Wiesbaden 5
15 BM15 Angers 6
16 BM16 Sitra  3
17 BM17 Adliya 3
18 BM18 Caen 6
19 BM19 Jinjiang 2
20 BM20 Tubli 3
21 BM21 Duisburg 5
22 BM22 Helsinki 4
23 BM23 Kaarina 4
24 BM24 Bonn 5
25 BM25 Kemi 4
In this respect, China and Finland shouldn't be listed.
I've attempted using count and group by, but no luck.
Can you please help me build this query?
All you need is to join the Businessman table and count both the cities and the distinct cities; if the two counts are equal, all businessmen in that country are from different cities:
SELECT
    c.`id`,
    c.`name`,
    COUNT(b.`id`) AS BusinessmanCount,
    COUNT(b.`city`) AS CityCount,
    COUNT(DISTINCT b.`city`) AS DistinctCityCount
FROM `Country` c
INNER JOIN Businessman b ON c.`id` = b.`country_id`
GROUP BY c.`id`
HAVING CityCount = DistinctCityCount
A minimal version that returns only what you need:
SELECT
    c.`id`,
    c.`name`
FROM `Country` c
INNER JOIN Businessman b ON c.`id` = b.`country_id`
GROUP BY c.`id`
HAVING COUNT(b.`city`) = COUNT(DISTINCT b.`city`)
Well, I think we should have waited for you to show your own query, because one learns best from mistakes and their explanations. However, now that you've got answers already:
Yes, you need group by and count. I'd group by cities to see if I got duplicates. Then select countries and exclude those that have duplicate cities.
select *
from Country
where id not in
(
    select country_id
    from Businessman
    group by city, country_id
    having count(*) > 1
);
You need either nested aggregations:
select *
from Country
where id in
(
    select country_id
    from
    (
        select city, country_id,
               count(*) as cnt -- get the number of rows per country/city
        from Businessman
        group by city, country_id
    ) as dt
    group by country_id
    having max(cnt) = 1 -- return only those countries where every city occurs exactly once
)
Or compare two counts:
select *
from Country
where id in
(
    select country_id
    from Businessman
    group by country_id
    having count(*) = count(distinct city) -- number of distinct cities equals the number of rows
)
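To check the "compare two counts" query against the sample data from the question, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption made for convenience; the query itself is plain SQL and unchanged apart from selecting only the name and adding an ORDER BY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Businessman (id INTEGER PRIMARY KEY, name TEXT, city TEXT, "
    "country_id INTEGER REFERENCES Country(id))"
)
conn.executemany("INSERT INTO Country VALUES (?, ?)", [
    (1, "India"), (2, "China"), (3, "Bahrain"),
    (4, "Finland"), (5, "Germany"), (6, "France"),
])

# (city, country_id) pairs in the same order as the question's Businessman table.
cities = [
    ("Kolkata", 1), ("Delhi", 1), ("Mumbai", 1), ("Beijing", 2), ("Paris", 6),
    ("Beijing", 2), ("Forssa", 4), ("Anqing", 2), ("Berlin", 5), ("Riffa", 3),
    ("Nice", 6), ("Helsinki", 4), ("Bremen", 5), ("Wiesbaden", 5), ("Angers", 6),
    ("Sitra", 3), ("Adliya", 3), ("Caen", 6), ("Jinjiang", 2), ("Tubli", 3),
    ("Duisburg", 5), ("Helsinki", 4), ("Kaarina", 4), ("Bonn", 5), ("Kemi", 4),
]
conn.executemany(
    "INSERT INTO Businessman VALUES (?, ?, ?, ?)",
    [(i, f"BM{i}", city, country_id) for i, (city, country_id) in enumerate(cities, start=1)],
)

query = """
SELECT name
FROM Country
WHERE id IN (
    SELECT country_id
    FROM Businessman
    GROUP BY country_id
    HAVING COUNT(*) = COUNT(DISTINCT city)   -- no city appears twice in the country
)
ORDER BY id
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

China (Beijing twice) and Finland (Helsinki twice) drop out, as the question requires.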

Using SELECT clause inside AVG function

I have the following table:
table: people
id | name | income
==========================
1 Bob 10
2 John 5
3 Amy 15
4 Alyson 5
5 Henry 20
I want to take the average of only a select number of rows, like this:
SELECT
id,
name,
(AVG(
SELECT income FROM people WHERE FIND_IN_SET(id, '1,2,3')
) - income) AS averageDiff
FROM people;
I expect to get a result like this:
id | name | averageDiff
==========================
1 Bob 0
2 John 5
3 Amy -5
4 Alyson 5
5 Henry -10
However, I get an error (#1064) when trying to use the SELECT clause inside of the AVG function. How can I do this?
Use this syntax:
SELECT avg(income) FROM people WHERE FIND_IN_SET(id, '1,2,3')
You need to enclose the above query in parentheses, like this:
SELECT
......
(ABS(IFNULL(`age`, 0)
- IFNULL((SELECT AVG(age) FROM people WHERE FIND_IN_SET(id, '1,2,3')), 0)))
+ (ABS(IFNULL(`income`, 0)
- IFNULL((SELECT AVG(income) FROM people WHERE FIND_IN_SET(id, '1,2,3')), 0))) AS sumAvg
FROM `people`
....
If you want the average for everybody as your starting point, calculate that with a query and cross join it to the people you want to include:
SELECT
people.id,
people.name,
people.income - av.avgincome AS averageDiff
FROM people
CROSS JOIN (SELECT AVG(income) AS avgincome FROM people) av
WHERE people.ID IN (1, 2, 3)
If you want the average for the subset of people with ID 1, 2 or 3 as your starting point you can do it like this:
SELECT
people.id,
people.name,
people.income - av.avgincome AS averageDiff
FROM people
CROSS JOIN (
SELECT AVG(income) AS avgincome
FROM people
WHERE ID IN (1, 2, 3)) av
WHERE people.ID IN (1, 2, 3)
Both approaches avoid the correlated subquery (meaning a SELECT for a column name based on the top-level table), which is slow with large recordsets.
FIND_IN_SET(people.id, '1,2,3') will work, but if you have an index on table.id then IN (1, 2, 3) will be much faster. It will probably be faster even if you don't have the index.
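For comparison, here is a minimal sketch that computes the difference the way the question words it (the average over IDs 1, 2 and 3, minus each person's income, for every row of the table), once with a parenthesised scalar subquery and once with a cross join. It uses Python's sqlite3 module as a stand-in for MySQL, and IN (1, 2, 3) instead of FIND_IN_SET, which SQLite does not provide. Note the sign and the missing outer WHERE differ from the cross join queries above; that is a deliberate choice to match the question's formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, income INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", [
    (1, "Bob", 10), (2, "John", 5), (3, "Amy", 15), (4, "Alyson", 5), (5, "Henry", 20),
])

# Variant 1: the aggregate lives inside a parenthesised scalar subquery.
scalar_query = """
SELECT id, name,
       (SELECT AVG(income) FROM people WHERE id IN (1, 2, 3)) - income AS averageDiff
FROM people
ORDER BY id
"""

# Variant 2: the average is computed once and cross joined to every row.
cross_join_query = """
SELECT people.id, people.name,
       av.avgincome - people.income AS averageDiff
FROM people
CROSS JOIN (SELECT AVG(income) AS avgincome FROM people WHERE id IN (1, 2, 3)) av
ORDER BY people.id
"""

scalar_rows = conn.execute(scalar_query).fetchall()
cross_join_rows = conn.execute(cross_join_query).fetchall()
for row in scalar_rows:
    print(row)
```

Both variants return the same rows; the average of 10, 5 and 15 is 10, so each averageDiff is 10 minus that person's income.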