Combine multiple tables and use GROUP BY in MySQL

I have 5 different datasets from 5 different tables. From those 5 tables I have taken the grouped data below.
select number,count(*) as total from tb01 group by number limit 5;
select number,count(*) as total from tb02 group by number limit 5;
Like that I can retrieve 5 different datasets. Here is an example.
+-----------+-------+
| number | total |
+-----------+-------+
| 114000259 | 1 |
| 114000400 | 1 |
| 114000686 | 1 |
| 114000858 | 1 |
| 114003895 | 1 |
+-----------+-------+
Now I need to combine those 5 result sets into a tabular format like the one below.
+-----------+-------+-------+-------+
| number | tb01 | tb02 | tb03 |
+-----------+-------+-------+-------+
| 114000259 | 1 | 2 | 1 |
| 114000400 | 1 | 0 | 1 |
| 114000686 | 1 | 3 | 1 |
| 114000858 | 1 | 1 | 5 |
| 114003895 | 1 | 0 | 1 |
+-----------+-------+-------+-------+
Can someone help me combine those 5 grouped datasets and get the union as above?
Note: I don't need the headers to match the table names; the headers can be anything.
Further, I don't need the LIMIT 5; it is only there to show a sample of 5 rows. I have a large dataset.

It's a job for JOINs and subqueries. My answer will consider three tables. It should be obvious how to expand it to five.
Your first subquery: get all possible numbers.
SELECT number FROM tb01 UNION
SELECT number FROM tb02 UNION
SELECT number FROM tb03
Then you have a subquery for each table to get the count.
SELECT number, COUNT(*) AS total
FROM tb02 GROUP BY number
Then you LEFT JOIN everything and SELECT from that.
SELECT numbers.number,
tb01.total tb01,
tb02.total tb02,
tb03.total tb03
FROM (
SELECT number FROM tb01 UNION
SELECT number FROM tb02 UNION
SELECT number FROM tb03
) numbers
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb01 GROUP BY number
) tb01 ON numbers.number = tb01.number
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb02 GROUP BY number
) tb02 ON numbers.number = tb02.number
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb03 GROUP BY number
) tb03 ON numbers.number = tb03.number
You can add ORDER BY and LIMIT clauses to that overall query as necessary.
The first subquery together with the LEFT JOIN ensures that you get results even if some of your tables are missing number rows. (Some DBMSs have FULL OUTER JOIN, but MySQL does not.)
Pro tip: If you use LIMIT without ORDER BY, you get an unpredictable subset of your rows. Unpredictable is worse than random, because you get the same subset in testing with small tables, but when your tables grow you may start getting different subsets. You'll never catch the problem in unit testing. LIMIT without ORDER BY is a serious error.
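The UNION plus LEFT JOIN approach above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are made up for illustration. The aliases c1, c2, c3 and the COALESCE calls are additions not in the answer: COALESCE turns the NULL that LEFT JOIN produces for a missing number into the 0 shown in the desired output.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the query is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb01 (number INTEGER);
    CREATE TABLE tb02 (number INTEGER);
    CREATE TABLE tb03 (number INTEGER);
    -- Made-up sample rows, not the asker's data.
    INSERT INTO tb01 VALUES (114000259), (114000400);
    INSERT INTO tb02 VALUES (114000259), (114000259);
    INSERT INTO tb03 VALUES (114000400), (114000686);
""")

QUERY = """
SELECT numbers.number,
       COALESCE(c1.total, 0) AS tb01,   -- 0 instead of NULL when absent
       COALESCE(c2.total, 0) AS tb02,
       COALESCE(c3.total, 0) AS tb03
FROM (
    SELECT number FROM tb01 UNION
    SELECT number FROM tb02 UNION
    SELECT number FROM tb03
) numbers
LEFT JOIN (SELECT number, COUNT(*) AS total FROM tb01 GROUP BY number) c1
       ON numbers.number = c1.number
LEFT JOIN (SELECT number, COUNT(*) AS total FROM tb02 GROUP BY number) c2
       ON numbers.number = c2.number
LEFT JOIN (SELECT number, COUNT(*) AS total FROM tb03 GROUP BY number) c3
       ON numbers.number = c3.number
ORDER BY numbers.number
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each derived table joins on its own number column, so a number missing from one table still appears once, with 0 in that table's column.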

Related

Mysql order by top two then id

I want to show the two top-voted posts first, then the others sorted by id.
This is the table:
+----+-------+--------------+--------+
| Id | Name | Post | Votes |
+====+=======+==============+========+
| 1 | John | John's msg | -6 |
| 2 |Joseph |Joseph's msg | 8 |
| 3 | Ivan | Ivan's msg | 3 |
| 4 |Natalie|Natalie's msg | 10 |
+----+-------+--------------+--------+
After query result should be:
+----+-------+--------------+--------+
| Id | Name | Post | Votes |
+====+=======+==============+========+
| 4 |Natalie|Natalie's msg | 10 |
| 2 |Joseph |Joseph's msg | 8 |
-----------------------------------------------
| 1 | John | John's msg | -6 |
| 3 | Ivan | Ivan's msg | 3 |
+----+-------+--------------+--------+
I have one solution, but I feel like there is a better and faster way to do it.
I run 2 queries: one to get the top 2, then a second to get the others:
SELECT * FROM table order by Votes desc LIMIT 2
SELECT * FROM table order by Id desc
And then in PHP I make sure that I show the 1st query's result as it is, and when displaying the 2nd query's result I remove the entries that are in the 1st so they don't appear twice.
Can this be done in a single query that selects the two top-voted posts first, then the others?
You would have to use subqueries or a union, meaning you have a single outer query which contains multiple queries inside. I would simply retrieve the IDs from the first query and add an id NOT IN (...) criterion to the WHERE clause of the 2nd query, thus filtering out the posts retrieved by the first query:
SELECT * FROM table WHERE Id NOT IN (...) ORDER BY Id DESC
With a union, the query would look as follows:
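(SELECT table.*, 1 as o FROM table order by Votes desc LIMIT 2)
UNION
(SELECT table.*, 0 FROM table
-- MySQL rejects LIMIT directly inside an IN subquery, so wrap it in a derived table
WHERE Id NOT IN (SELECT Id FROM
(SELECT Id FROM table order by Votes desc LIMIT 2) AS top2))
ORDER BY o DESC, if(o=1,Votes,Id) DESC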
As you can see, it wraps 3 queries into one and has a more complicated ordering as well because in union the order of the records retrieved is not guaranteed.
Two simple queries seem to be a lot more efficient to me in this particular case.
There could be different ways to write a query that returns the rows in the order you want. My solution is this:
select
table.*
from
table left join (select id from table order by votes desc limit 2) l
on table.id = l.id
order by
case when l.id is not null then votes end desc,
table.id
The subquery returns the ids of the first two rows ordered by votes desc. The join succeeds whenever the row is one of those two; otherwise l.id is NULL.
The ORDER BY sorts by votes desc whenever the row is one of the top two (l.id is not null); when l.id is NULL it puts the rows at the bottom and orders them by id instead.
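The LEFT JOIN ordering described above can be checked with a short sketch. It runs against SQLite through Python's sqlite3 module rather than MySQL, and the table name posts and alias top2 are placeholders chosen for the example. It relies on NULL sorting last under DESC, which holds in both SQLite and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, post TEXT, votes INTEGER);
    INSERT INTO posts VALUES
        (1, 'John',    'John''s msg',    -6),
        (2, 'Joseph',  'Joseph''s msg',   8),
        (3, 'Ivan',    'Ivan''s msg',     3),
        (4, 'Natalie', 'Natalie''s msg', 10);
""")

QUERY = """
SELECT p.id, p.name, p.votes
FROM posts p
LEFT JOIN (SELECT id FROM posts ORDER BY votes DESC LIMIT 2) top2
       ON p.id = top2.id
ORDER BY
    -- Top-two rows sort by votes; all other rows yield NULL and fall to the end.
    CASE WHEN top2.id IS NOT NULL THEN p.votes END DESC,
    p.id
"""

ordered_ids = [row[0] for row in conn.execute(QUERY)]
print(ordered_ids)
```

With the sample table from the question, the two top-voted posts come first (by votes), followed by the remaining posts in ascending id order.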

What is SQL to select a property and the max number of occurrences of a related property?

I have a table like this:
Table: p
+----+------+
| id | w_id |
+----+------+
|  5 |    8 |
|  5 |   10 |
|  5 |    8 |
|  5 |   10 |
|  5 |    8 |
|  6 |    5 |
|  6 |    8 |
|  6 |   10 |
|  6 |   10 |
|  7 |    8 |
|  7 |   10 |
+----+------+
What is the best SQL to get the following result? :
+----+----------------+
| id | most_used_w_id |
+----+----------------+
|  5 |              8 |
|  6 |             10 |
|  7 |              8 |
+----+----------------+
In other words, to get, per id, the most frequent related w_id.
Note that on the example above, id 7 is related to 8 once and to 10 once.
So, either (7, 8) or (7, 10) will do as result. If it is not possible to
pick up one, then both (7, 8) and (7, 10) on result set will be ok.
I have come up with something like:
select counters2.p_id as id, counters2.w_id as most_used_w_id
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters2
join (
select p_id, max(count_of_w_ids) as max_counter_for_w_ids
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters
group by p_id
) as p_max
on p_max.p_id = counters2.p_id
and p_max.max_counter_for_w_ids = counters2.count_of_w_ids
;
but I am not sure at all whether this is the best way to do it. And I had to repeat the same sub-query two times.
Any better solution?
Try using user-defined variables:
select id,w_id
FROM
( select T.*,
if(@id<>id,1,0) as first_row, -- 1 on the first row of each id
@id:=id FROM
(
select id,W_id, Count(*) as cnt FROM p Group by ID,W_id
) as T,(SELECT @id:=0) as T1
ORDER BY id,cnt DESC
) as T2
WHERE first_row=1
Formal SQL
In fact, your solution is correct in terms of standard SQL. Why? Because you have to join values from the original data to the grouped data, so your query cannot be simplified. MySQL allows mixing non-grouped columns with aggregate functions, but the result is unreliable, so I would not recommend relying on that behavior.
MySQL
Since you're using MySQL, you can use variables. I'm not a big fan of them, but for your case they may be used to simplify things:
SELECT
c.*,
IF(@id!=id, @i:=1, @i:=@i+1) AS num,
@id:=id AS gid
FROM
(SELECT id, w_id, COUNT(w_id) AS w_count
FROM p
GROUP BY id, w_id
ORDER BY id DESC, w_count DESC) AS c
CROSS JOIN (SELECT @i:=-1, @id:=-1) AS init
HAVING
num=1;
So for your data result will look like:
+------+------+---------+------+------+
| id | w_id | w_count | num | gid |
+------+------+---------+------+------+
| 7 | 8 | 1 | 1 | 7 |
| 6 | 10 | 2 | 1 | 6 |
| 5 | 8 | 3 | 1 | 5 |
+------+------+---------+------+------+
Thus, you've found each id and its corresponding w_id. The idea is to count the rows and enumerate them, relying on the fact that they are ordered in the subquery. We then need only the first row per id, because it represents the w_id with the highest count.
This could be replaced with a single GROUP BY id, but, again, the server is free to choose any row in that case (it works in practice because the first row is taken, but the documentation does not guarantee that in general).
One nice thing about this approach is its flexibility: you can also select, for example, the 2nd or 3rd most frequent value.
Performance
To increase performance, you can create an index on (id, w_id); it will be used for ordering and grouping the records. The variables and HAVING, however, will cause a row-by-row scan of the set derived by the inner GROUP BY. That is not as bad as a full scan of the original data, but it is still a drawback of doing this with variables. On the other hand, doing it with a JOIN and subquery, as in your query, won't be much different, because a temporary table is created for the subquery result set too.
But to be certain, you'll have to test. And keep in mind that you already have a valid solution which, by the way, isn't bound to DBMS-specific features and is good in terms of standard SQL.
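As a sketch of the portable join-to-the-maximum approach discussed above, the grouped counts can be written once as a common table expression so the subquery is not repeated. This assumes a server with CTE support (MySQL 8.0 or later); the names counters, cnt and max_cnt are illustrative. It is run here against SQLite via Python's sqlite3 module with the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (id INTEGER, w_id INTEGER)")
conn.executemany(
    "INSERT INTO p VALUES (?, ?)",
    [(5, 8), (5, 10), (5, 8), (5, 10), (5, 8),
     (6, 5), (6, 8), (6, 10), (6, 10),
     (7, 8), (7, 10)],
)

QUERY = """
WITH counters AS (
    -- One row per (id, w_id) pair with its number of occurrences.
    SELECT id AS p_id, w_id, COUNT(*) AS cnt
    FROM p
    GROUP BY id, w_id
)
SELECT c.p_id AS id, c.w_id AS most_used_w_id
FROM counters c
JOIN (
    SELECT p_id, MAX(cnt) AS max_cnt
    FROM counters
    GROUP BY p_id
) m ON m.p_id = c.p_id AND m.max_cnt = c.cnt
ORDER BY c.p_id, c.w_id
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

Ties are kept, so id 7 yields both (7, 8) and (7, 10), which the question says is acceptable.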
Try this query
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having max(ccc)
You can also use this code if you do not want to rely on the first record of non-grouping columns
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having ccc=max(ccc);

Mysql join with counting results in another table

I have two tables, one with ranges of numbers and a second with numbers. I need to select all ranges which have at least one number with status not in (2, 0). I have tried a number of different joins; some of them took forever to execute, and the one I ended up with is fast, but it selects only a really small number of ranges.
SELECT SQL_CALC_FOUND_ROWS md_number_ranges.*
FROM md_number_list
JOIN md_number_ranges
ON md_number_list.range_id = md_number_ranges.id
WHERE md_number_list.phone_num_status NOT IN (2, 0)
AND md_number_ranges.reseller_id=1
GROUP BY range_id
LIMIT 10
OFFSET 0
What I need is something like "select all ranges, join numbers where number.range_id = range.id and where there is at least one number with phone_num_status not in (2, 0)".
Any help would be really appreciated.
Example data structure:
md_number_ranges:
id | range_start | range_end | reseller_id
1 | 000001 | 000999 | 1
2 | 100001 | 100999 | 2
md_number_list:
id | range_id | number | phone_num_status
1 | 1 | 0000001 | 1
2 | 1 | 0000002 | 2
3 | 2 | 1000012 | 0
4 | 2 | 1000015 | 2
I want to be able to select range 1, because it has one number with status 1, but not range 2, because it has two numbers, both with a status which I do not want to select.
It's a bit hard to tell what you want, but perhaps this will do:
SELECT *
from md_number_ranges m
join (
SELECT md_number_ranges.id
, count(*) as FOUND_ROWS
FROM md_number_list
JOIN md_number_ranges
ON md_number_list.range_id = md_number_ranges.id
WHERE md_number_list.phone_num_status NOT IN (2, 0)
AND md_number_ranges.reseller_id=1
GROUP BY range_id
) x
on x.id=m.id
LIMIT 10
OFFSET 0
Is this what you're looking for?
SELECT DISTINCT r.*
FROM md_number_ranges r
JOIN md_number_list l ON r.id = l.range_id
WHERE l.phone_num_status NOT IN (0,2)
SQL Fiddle Demo
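A quick way to check the DISTINCT join against the question's sample data is sketched below, using SQLite through Python's sqlite3 module in place of MySQL. The second query is an alternative that is not from the answers: it expresses "at least one matching number" with EXISTS, which avoids de-duplicating joined rows. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE md_number_ranges (
        id INTEGER PRIMARY KEY, range_start TEXT, range_end TEXT, reseller_id INTEGER);
    CREATE TABLE md_number_list (
        id INTEGER PRIMARY KEY, range_id INTEGER, number TEXT, phone_num_status INTEGER);
    INSERT INTO md_number_ranges VALUES
        (1, '000001', '000999', 1),
        (2, '100001', '100999', 2);
    INSERT INTO md_number_list VALUES
        (1, 1, '0000001', 1),
        (2, 1, '0000002', 2),
        (3, 2, '1000012', 0),
        (4, 2, '1000015', 2);
""")

# The join from the answer: one output row per range thanks to DISTINCT.
DISTINCT_JOIN = """
SELECT DISTINCT r.id
FROM md_number_ranges r
JOIN md_number_list l ON r.id = l.range_id
WHERE l.phone_num_status NOT IN (0, 2)
ORDER BY r.id
"""

# Alternative sketch: keep a range if any of its numbers has a wanted status.
EXISTS_FORM = """
SELECT r.id
FROM md_number_ranges r
WHERE EXISTS (
    SELECT 1 FROM md_number_list l
    WHERE l.range_id = r.id AND l.phone_num_status NOT IN (0, 2)
)
ORDER BY r.id
"""

join_ids = [row[0] for row in conn.execute(DISTINCT_JOIN)]
exists_ids = [row[0] for row in conn.execute(EXISTS_FORM)]
print(join_ids, exists_ids)
```

Both forms select range 1 only, since range 2 has no number whose status is outside (0, 2).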

Show all grouped results and sort

I have a table, like that one:
| B | 1 |
| C | 2 |
| B | 2 |
| A | 2 |
| C | 3 |
| A | 2 |
I would like to fetch it, but sorted and grouped. That is, I would like it grouped by the letter, but sorted by the highest sum of the group. Also, I want to show all entries within the group:
| C | 3 |
| C | 2 |
| A | 2 |
| A | 2 |
| B | 2 |
| B | 1 |
The order is that way because C has 3 and 2. 3+2=5, which is higher than 2+2=4 for A which in turn is higher than 2+1=3 for B.
I need to show all "grouped" letters because there are other columns that are distinct all of which I need shown.
EDIT:
Thanks for the quick reply. I have the audacity, however, to inquire further.
I have this query:
SELECT * FROM `ip_log` WHERE `IP` IN
(SELECT `IP` FROM `ip_log` GROUP BY `IP` HAVING COUNT(DISTINCT `uid`) > 1)
GROUP BY `uid` ORDER BY `IP`
The letters in the upper description are ip (I need it grouped by the IP addresses) and the numbers are timestamp (I need it sorted by the sum (or just used as the sorting parameter)). Should I create a temporary table and then use the solution below?
select t.Letter, t.Value
from MyTable t
inner join (
select Letter, sum(Value) as ValueSum
from MyTable
group by Letter
) ts on t.Letter = ts.Letter
order by ts.ValueSum desc, t.Letter, t.Value desc
If your table's columns are letter and number, the way I would go about this is the following:
SELECT
letter,
GROUP_CONCAT(number ORDER BY number DESC),
SUM(number) AS total
FROM table
GROUP BY letter
ORDER BY total desc
What you will get, based on your example is the following:
| C | 3,2 | 5
| A | 2,2 | 4
| B | 2,1 | 3
You can then process that data to get the actual information you want/need.
If you still want the data in the format you requested originally, it is not possible without a subquery. The reason is that you can't sort by an aggregate (the SUM of the number column) that isn't calculated in the same query. So you will need a subquery to calculate it and join it back to the original query (disclaimer: untested query):
SELECT
letter,
number
FROM table
JOIN (SELECT letter AS ltr, SUM(number) AS total FROM table GROUP BY letter) AS totals
ON table.letter = totals.ltr
ORDER BY totals.total desc, letter desc, number desc
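The join-to-group-sum idea used in the answers above can be verified on the question's sample rows with a small sketch. It uses SQLite via Python's sqlite3 module; the table name my_table and the column names letter and value are placeholders, since the question does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (letter TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("B", 1), ("C", 2), ("B", 2), ("A", 2), ("C", 3), ("A", 2)],
)

QUERY = """
SELECT t.letter, t.value
FROM my_table t
JOIN (
    -- Per-letter sum, used only as the sort key.
    SELECT letter, SUM(value) AS value_sum
    FROM my_table
    GROUP BY letter
) ts ON t.letter = ts.letter
ORDER BY ts.value_sum DESC, t.letter, t.value DESC
"""

ordered = conn.execute(QUERY).fetchall()
for row in ordered:
    print(row)
```

Every original row is kept; the derived table only supplies the group total that drives the ordering (C = 5, A = 4, B = 3).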

Score algorithm in multiple join

I have a list of publications stored in publications table. Each publication has a many-to-many relation with categories and also a many-to-many relation with keywords.
Given a publication I'd like to find related ones based on a score value computed with the following algorithm:
each shared category with other publications counts as one point
each shared keyword with other publications counts as one point
the score value is the sum of the points computed with previous steps
I want to retrieve with a single query the list of related publications ordered by this score.
Now I have these two queries which compute the score for both categories and keyword
SELECT c.publication_id, (COUNT(c.category_id)) AS cscore
FROM cat_pub c
WHERE c.category_id IN <list of category ids obtained from the current publication>
GROUP BY c.publication_id
ORDER BY cscore DESC
and for the keyword score
SELECT k.publication_id, (COUNT(k.keyword_id)) AS kscore
FROM key_pub k
WHERE k.keyword_id IN <list of keyword ids obtained from the current publication>
GROUP BY k.publication_id
ORDER BY kscore DESC
Finally I need to JOIN the resulting query with a SELECT query which should retrieve publications data (title, intro, etc,) ordering them by score and with a limit clause to get the most relevant publications related to the selected one.
Currently I tried to use these two queries as subtables in a join:
SELECT p.*, (q1.cscore + q2.kscore) AS score
FROM publications p
INNER JOIN (<cscore query>) q1 ON p.id = q1.publication_id
INNER JOIN (<kscore query>) q2 ON p.id = q2.publication_id
ORDER BY score DESC
LIMIT 5
EXPLAIN shows me that a couple of temporary tables will be used. Could that be a performance problem? Is there a better way to implement this?
Update
To answer to Johan's comment
Your solution is wrong. Using a LIMIT clause in the subqueries can lead to inconsistent results, whatever value is chosen for the limit. What if I have the following results for the subqueries? (I'll show 11 records, but your query will fetch only the first ten.)
+-------+--------+ +-------+--------+
| p.id | cscore | | p.id | kscore |
+-------+--------+ +-------+--------+
| 27854 | 100 | | 27865 | 100 |
| 27853 | 100 | | 27864 | 100 |
| 27852 | 100 | | 27863 | 100 |
| 27851 | 100 | | 27862 | 100 |
| 27850 | 100 | | 27861 | 100 |
| 27849 | 100 | | 27860 | 100 |
| 27848 | 100 | | 27859 | 100 |
| 27847 | 100 | | 27858 | 100 |
| 27846 | 100 | | 27857 | 100 |
| 27845 | 100 | | 27856 | 100 |
| 27844 | 100 | | 27855 | 100 |
| 1000 | 99 | | 1000 | 99 |
+-------+--------+ +-------+--------+
If I have ten records with 100 as cscore and ten different records with 100 as kscore, the join will produce an empty set. So I get no results, while the publication with id 1000 should be the answer and is left out of the result set.
Furthermore I could consider your solution with a LEFT JOIN, in this case only records from the left table will be fetched, and each record will get a total score of 100 (because of the NULL given by the empty kscore field in the second table). Again, the result is wrong because the highest scored record should be p1000 with a total score of 198 (= 99 + 99)
Your solution cannot produce reliable results.
You only want 5 results each from the subqueries.
I think it is best to select only 5 from them and use that in the query.
Rewrite q1 as:
SELECT c.publication_id, COUNT(*) AS cscore
FROM cat_pub c
WHERE c.publication_id = p.id
AND c.category_id IN <list of category ids obtained from the current publication>
GROUP BY c.publication_id
ORDER BY cscore DESC
LIMIT 10
Rewrite q2 as:
SELECT k.publication_id, COUNT(*) AS kscore
FROM key_pub k
WHERE p.id = k.publication_id
AND k.keyword_id IN <list of keyword ids obtained from the current publication>
GROUP BY k.publication_id
ORDER BY kscore DESC
LIMIT 10
Leave the join as is:
SELECT p.*, (q1.cscore + q2.kscore) AS score
FROM publications p
INNER JOIN (<cscore query>) q1 ON p.id = q1.publication_id
INNER JOIN (<kscore query>) q2 ON p.id = q2.publication_id
ORDER BY score DESC
LIMIT 5
Note that COUNT(*) is usually the faster choice, because it does not test for NULL. If you can have NULL values and don't want to include them in the count, then name the field explicitly with COUNT(field).
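For comparison, here is one possible way to compute the combined score without limiting the subqueries and without dropping publications that match on categories only or keywords only. It is not taken from the answers in this thread: it stacks one point per shared category and per shared keyword with UNION ALL and sums them. The schema (publications.title, cat_pub, key_pub with a keyword_id column) and the sample rows are assumptions, and the sketch runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publications (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE cat_pub (publication_id INTEGER, category_id INTEGER);
    CREATE TABLE key_pub (publication_id INTEGER, keyword_id INTEGER);
    -- Made-up sample data; publication 1 is the "current" one.
    INSERT INTO publications VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
    INSERT INTO cat_pub VALUES (1, 10), (1, 11), (2, 10), (2, 11), (3, 10), (5, 12);
    INSERT INTO key_pub VALUES (1, 20), (2, 20), (3, 20), (4, 20);
""")

QUERY = """
SELECT p.id, p.title, SUM(s.points) AS score
FROM publications p
JOIN (
    -- One point per category shared with the current publication.
    SELECT c.publication_id, 1 AS points
    FROM cat_pub c
    WHERE c.category_id IN (
        SELECT category_id FROM cat_pub WHERE publication_id = :current)
    UNION ALL
    -- One point per keyword shared with the current publication.
    SELECT k.publication_id, 1 AS points
    FROM key_pub k
    WHERE k.keyword_id IN (
        SELECT keyword_id FROM key_pub WHERE publication_id = :current)
) s ON s.publication_id = p.id
WHERE p.id <> :current
GROUP BY p.id, p.title
ORDER BY score DESC, p.id
LIMIT 5
"""

related = conn.execute(QUERY, {"current": 1}).fetchall()
print(related)
```

Publication 4 shares only a keyword with the current one, so an INNER JOIN of separate category and keyword score queries would drop it; here it still appears with a score of 1. Whether this performs better than the joined subqueries would need to be checked with EXPLAIN on real data.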