MySQL query to get common or duplicate values from different columns - mysql

I have a table with two columns, one column (AffiliationCountry) shows the countries and the other column (ArtSubareaKeyword) shows the subject areas in related countries with comma-separated values.
I want to extract the subject area that repeats one or more times for the same country and save it in a new column named "MostPopularSubjectArea".
Table with values:
As you can see in the table, a country can appear more than once, and its subject areas can repeat as well.
| AffiliationCountry | ArtSubareaKeyword1 | ArtSubareaKeyword2  | ArtSubareaKeyword3    |
|--------------------|--------------------|---------------------|-----------------------|
| Spain              | Cell membranes     | Cell staining       | Coimmunoprecipitation |
| Kazakhstan         | Factor analysis    | Human performance   | Immunofluorescence    |
| Japan              | Bone marrow        | Diagnostic medicine | Genetic loci          |
| Kazakhstan         | Drug research      | Factor analysis     | Human performance     |
Results that are required:
I want a SQL query that, for each such country, fills a new column with the subject area that occurs most often.
| AffiliationCountry | MostPopularSubjectArea |
|--------------------|------------------------|
| Kazakhstan         | Human performance      |

As per the table, you can select the pair of columns, union them and find the count using group by:
select
    t1.affiliation_country, t1.keyword, count(t1.keyword) as count_keyword
from
(
    select affiliation_country, lower(artsubareakeyword1) keyword from affliation_details
    union all
    select affiliation_country, lower(artsubareakeyword2) from affliation_details
    union all
    select affiliation_country, lower(artsubareakeyword3) from affliation_details
) t1
group by
    t1.affiliation_country, t1.keyword
order by
    count(t1.keyword) desc
Query Reference(Fiddle): https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=25bded45786a85f6740902699e633846
Updated Query:
-- The CTE is named affiliation_details; it reads from the base table affliation_details.
with affiliation_details as
(
    select
        t1.affiliation_country, t1.keyword, count(t1.keyword) as count_keyword
    from
    (
        select affiliation_country, lower(artsubareakeyword1) keyword from affliation_details
        union all
        select affiliation_country, lower(artsubareakeyword2) from affliation_details
        union all
        select affiliation_country, lower(artsubareakeyword3) from affliation_details
    ) t1
    group by
        t1.affiliation_country, t1.keyword
    order by
        count(t1.keyword) desc
)
-- Countries that own a keyword with the highest overall count.
select
    distinct affiliation_country
from
    affiliation_details
where
    count_keyword in (
        select
            max(count_keyword)
        from
            affiliation_details
    )
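To also return the subject area itself, per country, the counting query above can be joined against each country's maximum count. The following is only a sketch under stated assumptions: it runs in SQLite through Python's sqlite3 module as a stand-in for MySQL, it reuses the table and column names from the answer and the sample rows from the question, and the names keyword_counts, best and most_popular_subject_area are hypothetical. Ties are returned as separate rows.

```python
import sqlite3

# SQLite is used as a runnable stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE affliation_details (
        affiliation_country TEXT,
        artsubareakeyword1 TEXT,
        artsubareakeyword2 TEXT,
        artsubareakeyword3 TEXT
    )
""")
conn.executemany(
    "INSERT INTO affliation_details VALUES (?, ?, ?, ?)",
    [
        ("Spain", "Cell membranes", "Cell staining", "Coimmunoprecipitation"),
        ("Kazakhstan", "Factor analysis", "Human performance", "Immunofluorescence"),
        ("Japan", "Bone marrow", "Diagnostic medicine", "Genetic loci"),
        ("Kazakhstan", "Drug research", "Factor analysis", "Human performance"),
    ],
)

QUERY = """
WITH keyword_counts AS (
    -- Unpivot the three keyword columns, then count per country/keyword.
    SELECT affiliation_country, keyword, COUNT(*) AS count_keyword
    FROM (
        SELECT affiliation_country, LOWER(artsubareakeyword1) AS keyword FROM affliation_details
        UNION ALL
        SELECT affiliation_country, LOWER(artsubareakeyword2) FROM affliation_details
        UNION ALL
        SELECT affiliation_country, LOWER(artsubareakeyword3) FROM affliation_details
    ) AS unpivoted
    GROUP BY affiliation_country, keyword
)
SELECT kc.affiliation_country,
       kc.keyword AS most_popular_subject_area,
       kc.count_keyword
FROM keyword_counts AS kc
JOIN (
    -- Highest keyword count within each country.
    SELECT affiliation_country, MAX(count_keyword) AS max_count
    FROM keyword_counts
    GROUP BY affiliation_country
) AS best
  ON best.affiliation_country = kc.affiliation_country
 AND best.max_count = kc.count_keyword
WHERE kc.count_keyword > 1   -- keep only keywords that actually repeat
ORDER BY kc.affiliation_country, kc.keyword
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample data, Kazakhstan has two keywords tied at two occurrences each, so both come back; picking a single one would need an extra tie-breaking rule that the question does not specify.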

Related

why UNION ALL command in mysql doesn't give back any results?

I am trying to merge two queries into one, but UNION is not working for me.
Here is the code:
SELECT
Customer_A,
Activity,
Customer_P,
Purchase
FROM (
SELECT
buyer_id as Customer_A,
COUNT(buyer_id) As Activity
FROM
customer_info_mxs
GROUP BY buyer_id
UNION ALL
SELECT
buyer_id as Customer_P,
SUM(purchase_amount) As Purchase
FROM
customer_info_mxs
GROUP BY buyer_id
)sub
I expect to have 4 columns as a result, but I get only 2 (Customer_A and Activity).
If the query is supposed to return a list of customers, their number of purchases, and the total amount they’ve spent, then you can use a single query like this:
SELECT mxs.buyer_id as Customer,
COUNT(mxs.purchase_id) As Activity,
SUM(mxs.purchase_amount) As Purchases
FROM customer_info_mxs mxs
GROUP BY mxs.buyer_id;
Otherwise, your first subquery just returns each buyer_id with its row count (which is 1 whenever a buyer has a single row).
Be sure to change purchase_id to whatever the unique id is for each purchase if you wish to see that number.
I think there is some confusion about the union statement. The union statement returns a row set that is the sum of all of the 'unioned' queries; since these queries have only 2 columns, the combined output only has two columns. The fact that the columns have different names is irrelevant. The column names in the output are being applied from the first query of the union.
One option is to just do
select buyer_id, count(buyer_id), sum(purchase_amount) from customer_info_mxs group by buyer_id
From your question, it looks like you are trying to do a pivot, turning some of the rows into additional columns. That could be done with ... some difficulty.
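The point about UNION stacking rows rather than adding columns can be checked with a small experiment. This is a sketch only: it uses SQLite via Python's sqlite3 module as a stand-in for MySQL, and the three sample rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_info_mxs (buyer_id INTEGER, purchase_amount REAL)")
conn.executemany(
    "INSERT INTO customer_info_mxs VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# The UNION ALL from the question: two 2-column queries stacked on top of each other.
union_cursor = conn.execute("""
    SELECT buyer_id AS Customer_A, COUNT(buyer_id) AS Activity
    FROM customer_info_mxs GROUP BY buyer_id
    UNION ALL
    SELECT buyer_id AS Customer_P, SUM(purchase_amount) AS Purchase
    FROM customer_info_mxs GROUP BY buyer_id
""")
union_columns = [col[0] for col in union_cursor.description]
union_rows = union_cursor.fetchall()

# The single-query alternative: one row per buyer, both aggregates side by side.
single_rows = conn.execute("""
    SELECT buyer_id AS Customer, COUNT(*) AS Activity, SUM(purchase_amount) AS Purchases
    FROM customer_info_mxs
    GROUP BY buyer_id
    ORDER BY buyer_id
""").fetchall()

print(union_columns)   # column names come from the first SELECT only
print(len(union_rows)) # rows of both SELECTs, one under the other
print(single_rows)
```

The union result keeps the two column names of its first SELECT and has twice as many rows, while the single grouped query gives one row per buyer with both values.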
I read your comment:
'The main goal is to create a dataset that returns 5 columns: Customer_A, Activity (top 100), Customer_P, Purchase (top 100), inner join of activity and purchase'
Please try this query:
SET @row_number = 0, @row_number2 = 0;
SELECT t1.Customer_A, t1.Activity, t2.Customer_P, t2.Purchase
FROM (
    -- Number the top 100 buyers by activity.
    SELECT (@row_number := @row_number + 1) AS n, t.Customer_A, t.Activity
    FROM (
        SELECT buyer_id AS Customer_A, COUNT(buyer_id) AS Activity
        FROM customer_info_mxs
        GROUP BY buyer_id
        ORDER BY Activity DESC
        LIMIT 100
    ) t
) t1
LEFT JOIN (
    -- Number the top 100 buyers by total purchase amount.
    SELECT (@row_number2 := @row_number2 + 1) AS n, t.Customer_P, t.Purchase
    FROM (
        SELECT buyer_id AS Customer_P, SUM(purchase_amount) AS Purchase
        FROM customer_info_mxs
        GROUP BY buyer_id
        ORDER BY Purchase DESC
        LIMIT 100
    ) t
) t2 ON t2.n = t1.n;
The basic idea is to give each row of the first top-100 list (t1) a temporary row number from 1 to 100, do the same for the second list (t2), and join the two lists on that number.

mysql select from 2 different tables and return either highest price

I have a mysql query that I'm trying to figure out.
Basically I have table 1 cols: estate agent, price, location, bungalow, cottage
and I have table 2 cols: estate agent, price, location, penthouse, duplex
As you can see these tables are very different.
I need a query to select all cols from table 1 or 2 depending on which has the highest price. For example:
SELECT * FROM table1, table2 WHERE table1.price = table2.price ORDER BY price DESC LIMIT 1,1;
If the tables are so similar, you could UNION them, then sort the result descending and get the higher price.
(SELECT * FROM table1)
UNION ALL
(SELECT * FROM table2)
ORDER BY price DESC
LIMIT 1
You would need to specify the columns explicitly if you want to give them aliases.
Re your followup question:
If you have a few columns different between the two tables, and you want to preserve them, you need to move away from SELECT * and name all the columns explicitly.
(SELECT `estate agent`, price, location, bungalow, cottage, NULL AS penthouse, NULL AS duplex
FROM table1)
UNION ALL
(SELECT `estate agent`, price, location, NULL, NULL, penthouse, duplex
FROM table2)
ORDER BY price DESC
LIMIT 1
You don't need to give aliases in the second subquery because column names are always determined by the first query of a UNION. Even if you do declare column aliases in the second query, they'll be ignored.
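The NULL-padding approach can be tried end to end with a small sketch. Assumptions: SQLite (through Python's sqlite3 module) stands in for MySQL, the column is called estate_agent instead of a name containing a space, and the two sample rows are invented. SQLite does not accept parentheses around the members of a UNION, so they are dropped here; the trailing ORDER BY and LIMIT still apply to the combined result.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); estate_agent and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (estate_agent TEXT, price INTEGER, location TEXT, "
    "bungalow INTEGER, cottage INTEGER)"
)
conn.execute(
    "CREATE TABLE table2 (estate_agent TEXT, price INTEGER, location TEXT, "
    "penthouse INTEGER, duplex INTEGER)"
)
conn.execute("INSERT INTO table1 VALUES ('Agent A', 250000, 'Leeds', 1, 0)")
conn.execute("INSERT INTO table2 VALUES ('Agent B', 400000, 'London', 1, 0)")

cursor = conn.execute("""
    SELECT estate_agent, price, location, bungalow, cottage,
           NULL AS penthouse, NULL AS duplex      -- pad the columns table1 lacks
    FROM table1
    UNION ALL
    SELECT estate_agent, price, location, NULL, NULL, penthouse, duplex
    FROM table2
    ORDER BY price DESC                           -- applies to the whole union
    LIMIT 1
""")
columns = [col[0] for col in cursor.description]
highest = cursor.fetchone()

print(columns)
print(highest)
```

The winning row comes from table2, so its bungalow and cottage slots are NULL (None in Python), which also tells you which table the row originated from.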
try this:
SELECT * FROM (
SELECT * FROM table1
UNION
SELECT * FROM table2
ORDER BY price desc)x
LIMIT 1;

MYSQL For/While Loop

Is it possible to do a For or While loop in MYSQL?
I've got the following code extract, but the full code goes up to home_id_15, home_score_15, away_id_15 and away_score_15:
$query3 = '
SELECT match_date, fixture_id, COUNT(a.home) AS home, SUM(a.points) AS points FROM
(
SELECT match_date, fixture_id, home_id_1 AS home, home_score_1 AS points FROM scores
WHERE home_id_1 =' .intval($_REQUEST['ID']).'
UNION ALL
SELECT match_date, fixture_id, away_id_1 AS home, away_score_1 AS points
FROM scores
WHERE away_id_1 =' .intval($_REQUEST['ID']).'
UNION ALL
SELECT match_date, fixture_id, home_id_2 AS home, home_score_2 AS points
FROM scores
WHERE home_id_2 =' .intval($_REQUEST['ID']).'
UNION ALL
SELECT match_date, fixture_id, away_id_2 AS home, away_score_2 AS points
FROM scores
WHERE away_id_2 =' .intval($_REQUEST['ID']).'
UNION ALL) a
GROUP BY match_date'
The first and second sub-SELECTS are basically being repeated until they reach 15.
This seems a bit long-winded and I was wondering if it's possible to use a loop in MYSQL to output
home_id_1, home_score_1, away_id_1, away_score_1 [up to] home_id_15, home_score_15, away_id_15, away_score_15
, respectively?
Thanks,
Dan.
It looks like you might need to normalize your database a little bit more. Let's say you had 6 scores for each row. Instead of making each score a column, make a separate table called "scores" or something like that with a foreign key column and a score column. Then join the table with this scores table.
Example:
TABLE: team
team_id
name
TABLE: scores
team_id
score
SELECT t.*, s.score
FROM team t
join scores s
on t.team_id=s.team_id;
Todo: Add the concept of matches into your schema and the Join
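To make the normalization idea concrete, here is a minimal sketch of one possible layout with matches included. Everything in it is an assumption for illustration: the table fixture_scores, its columns (fixture_id, match_date, team_id, is_home, points) and the sample rows are hypothetical, and SQLite via Python's sqlite3 module stands in for MySQL. With one row per team per fixture, the 30 UNION ALL branches collapse into a single filtered query.

```python
import sqlite3

# Hypothetical normalized layout: one row per team per fixture.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fixture_scores (
        fixture_id INTEGER,
        match_date TEXT,
        team_id    INTEGER,
        is_home    INTEGER,   -- 1 = home side, 0 = away side
        points     INTEGER
    )
""")
conn.executemany(
    "INSERT INTO fixture_scores VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2012-01-07", 7, 1, 3),
        (1, "2012-01-07", 9, 0, 1),
        (2, "2012-01-07", 7, 0, 2),
        (3, "2012-01-14", 7, 1, 5),
    ],
)

def totals_for_team(team_id):
    """Appearances and points per match date for one team, home or away."""
    return conn.execute(
        """
        SELECT match_date, COUNT(*) AS appearances, SUM(points) AS points
        FROM fixture_scores
        WHERE team_id = ?          -- bound parameter instead of string concatenation
        GROUP BY match_date
        ORDER BY match_date
        """,
        (team_id,),
    ).fetchall()

team_totals = totals_for_team(7)
print(team_totals)
```

No loop is needed in SQL with this layout, and the bound parameter replaces the intval() concatenation used in the question's PHP string.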

MySQL count all items, aggregate less than as others

I need to fetch data from many mysql 5.6 tables to create a pie chart. As you know, the pie chart is useful if it represents meaningful data. However when you have many non meaningful data points, say less than .. or non important values, the pie chart becomes unclear. I need to count the occurrence of each category and aggregate the not significant counts, less than X, as OTHERS.
At the moment I run:
SELECT category, COUNT(*) AS total FROM table_name GROUP BY category
It gives me each category and its count. How can I keep the categories whose totals are over 50 and have the ones below that summarized under "Others"? Thanks, Jorge.
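-- Use an alias that differs from the inner column name: MySQL resolves GROUP BY names
-- against the FROM clause first, so grouping by "category" would not merge the 'Others' rows.
SELECT IF(total > 50, category, 'Others') AS category_group, SUM(total) AS total
FROM (SELECT category, COUNT(*) AS total
      FROM table_name
      GROUP BY category) AS subquery
GROUP BY category_group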
Say you want to summarize all categories with less than 4 entries:
select category, count(*) as total from table_name group by category having count(*) >= 4
union
select 'others', sum(c) as total from (
select category, count(*) c from table_name group by category having count(*) < 4
) tmp
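Both answers follow the same two steps: count per category, then fold the small counts into one bucket. A runnable sketch of that idea, under assumptions: SQLite via Python's sqlite3 module stands in for MySQL 5.6, CASE replaces MySQL's IF(), the threshold of 2 and the sample rows are invented, and bucket and bucket_total are hypothetical alias names chosen so they cannot clash with an inner column.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample categories are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (category TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [(c,) for c in ["a", "a", "a", "b", "b", "c", "d"]],
)

def pie_slices(threshold):
    """Categories with at least `threshold` rows, the rest merged into 'Others'."""
    return conn.execute(
        """
        SELECT CASE WHEN total >= ? THEN category ELSE 'Others' END AS bucket,
               SUM(total) AS bucket_total
        FROM (
            SELECT category, COUNT(*) AS total   -- step 1: count per category
            FROM table_name
            GROUP BY category
        ) AS per_category
        GROUP BY bucket                          -- step 2: regroup by the bucket label
        ORDER BY bucket
        """,
        (threshold,),
    ).fetchall()

slices = pie_slices(2)
print(slices)
```

Here "c" and "d" each occur once, so they end up together in a single 'Others' slice, while "a" and "b" keep their own slices.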

Complex Query using Multiple Columns in a single table

Let's say that I have a table of race results. The table consists of seven columns: Date (MySQL date format, xxxx-xx-xx) and one column for the name of each of the top six finishers, named First, Second, Third, Fourth, Fifth, and Sixth. I have several sets of results and maybe 100 or so different names in the various finisher columns.
I need a query that lists each person whose name has appeared in any of the finisher columns (First, Second, Third, Fourth, Fifth, Sixth) along with only the most recent date that their name appeared. I do NOT need separate results based on finish place, so I need all six finisher columns lumped together. Most of the names will appear on dozens of different dates, but I only need the most recent date for each name. Ideally the result would be a list of each name and their most recent finish date, sorted from least recent to most recent.
I tried to create a fiddle to demonstrate this, but for whatever reason I could not get the date to work correctly in the fiddle. Anyway, anyone who can offer even a shred of help on this would be greatly appreciated.
SELECT <table>.date, <table>.first as name FROM <table> GROUP BY name
UNION DISTINCT
SELECT <table>.date, <table>.second as name FROM <table> GROUP BY name
UNION DISTINCT
SELECT <table>.date, <table>.third as name FROM <table> GROUP BY name
UNION DISTINCT
SELECT <table>.date, <table>.fourth as name FROM <table> GROUP BY name
UNION DISTINCT
SELECT <table>.date, <table>.fifth as name FROM <table> GROUP BY name
UNION DISTINCT
SELECT <table>.date, <table>.sixth as name FROM <table> GROUP BY name
ORDER BY date
This gave me exactly the results I needed. I basically just used "name" to catch the finishers, and "MAX(Date) as last" to pick out only the most recent showing for each "name" when I queried all six columns.
SELECT name, MAX(Date) AS last
FROM (
    SELECT first AS name, Date FROM Results
    UNION ALL
    SELECT second, Date FROM Results
    UNION ALL
    SELECT third, Date FROM Results
    UNION ALL
    SELECT fourth, Date FROM Results
    UNION ALL
    SELECT fifth, Date FROM Results
    UNION ALL
    SELECT sixth, Date FROM Results
) Q
GROUP BY name
ORDER BY last ASC;
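The unpivot-then-MAX pattern above can be exercised on a couple of sample rows. This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for MySQL, the column names race_date and first_place to sixth_place are hypothetical replacements for Date and First to Sixth, the runner names are invented, and the six UNION ALL branches are generated from a list purely to keep the example short.

```python
import sqlite3

# Hypothetical column names; SQLite stands in for MySQL (assumption).
FINISHER_COLUMNS = [
    "first_place", "second_place", "third_place",
    "fourth_place", "fifth_place", "sixth_place",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (race_date TEXT, "
    + ", ".join(f"{col} TEXT" for col in FINISHER_COLUMNS)
    + ")"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2020-01-01", "Ann", "Bob", "Cy", "Di", "Ed", "Flo"),
        ("2020-02-01", "Bob", "Ann", "Gus", "Cy", "Di", "Ed"),
    ],
)

# One SELECT per finisher column, stacked into a single (name, race_date) list.
unpivot = "\nUNION ALL\n".join(
    f"SELECT {col} AS name, race_date FROM results" for col in FINISHER_COLUMNS
)

# Most recent date per name, least recent first (name breaks ties).
query = f"""
SELECT name, MAX(race_date) AS last_seen
FROM ({unpivot}) AS q
GROUP BY name
ORDER BY last_seen ASC, name ASC
"""

last_seen = conn.execute(query).fetchall()
for row in last_seen:
    print(row)
```

Flo only appears in the earlier race and therefore sorts first; everyone else carries the later date. ISO-formatted date strings compare correctly as text, which is what MAX relies on in this SQLite sketch.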
select t.first, a.date
from (
    select distinct(first) from race
    union select distinct(second) from race
    union select distinct(third) from race
    union select distinct(fourth) from race
    union select distinct(fifth) from race
    union select distinct(sixth) from race
) as t, race a
where t.first in (a.first, a.second, a.third, a.fourth, a.fifth, a.sixth)