MySQL join with counting results in another table - mysql

I have two tables: one with ranges of numbers, the other with numbers. I need to select all ranges that have at least one number with a status not in (2, 0). I have tried a number of different joins; some of them took forever to execute. The one I ended up with is fast, but it selects only a small number of ranges.
SELECT SQL_CALC_FOUND_ROWS md_number_ranges.*
FROM md_number_list
JOIN md_number_ranges
ON md_number_list.range_id = md_number_ranges.id
WHERE md_number_list.phone_num_status NOT IN (2, 0)
AND md_number_ranges.reseller_id=1
GROUP BY range_id
LIMIT 10
OFFSET 0
What I need is something like "select all ranges, join numbers where number.range_id = range.id and where there is at least one number with phone_num_status not in (2, 0)".
Any help would be really appreciated.
Example data structure:
md_number_ranges:
id | range_start | range_end | reseller_id
1 | 000001 | 000999 | 1
2 | 100001 | 100999 | 2
md_number_list:
id | range_id | number | phone_num_status
1 | 1 | 0000001 | 1
2 | 1 | 0000002 | 2
3 | 2 | 1000012 | 0
4 | 2 | 1000015 | 2
I want to be able to select range 1, because it has one number with status 1, but not range 2, because both of its numbers have statuses that I do not want to select.

It's a bit hard to tell what you want, but perhaps this will do:
SELECT *
from md_number_ranges m
join (
SELECT md_number_ranges.id
, count(*) as FOUND_ROWS
FROM md_number_list
JOIN md_number_ranges
ON md_number_list.range_id = md_number_ranges.id
WHERE md_number_list.phone_num_status NOT IN (2, 0)
AND md_number_ranges.reseller_id=1
GROUP BY range_id
) x
on x.id=m.id
LIMIT 10
OFFSET 0

Is this what you're looking for?
SELECT DISTINCT r.*
FROM md_number_ranges r
JOIN md_number_list l ON r.id = l.range_id
WHERE l.phone_num_status NOT IN (0,2)
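Another option is an EXISTS subquery, which needs neither GROUP BY nor DISTINCT. This is only a minimal sketch: it assumes the sample tables from the question, and it uses SQLite (through Python's sqlite3 module) purely as a runnable stand-in for MySQL. The SELECT itself uses nothing SQLite-specific.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the example can be run anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE md_number_ranges (
    id INTEGER PRIMARY KEY, range_start TEXT, range_end TEXT, reseller_id INTEGER);
CREATE TABLE md_number_list (
    id INTEGER PRIMARY KEY, range_id INTEGER, number TEXT, phone_num_status INTEGER);
-- Sample rows copied from the question.
INSERT INTO md_number_ranges VALUES
    (1, '000001', '000999', 1), (2, '100001', '100999', 2);
INSERT INTO md_number_list VALUES
    (1, 1, '0000001', 1), (2, 1, '0000002', 2),
    (3, 2, '1000012', 0), (4, 2, '1000015', 2);
""")

# Keep a range only if at least one of its numbers has a status outside (0, 2).
rows = conn.execute("""
SELECT r.id, r.range_start, r.range_end, r.reseller_id
FROM md_number_ranges r
WHERE EXISTS (
    SELECT 1
    FROM md_number_list l
    WHERE l.range_id = r.id
      AND l.phone_num_status NOT IN (0, 2)
)
ORDER BY r.id
LIMIT 10 OFFSET 0
""").fetchall()

print(rows)  # only range 1 qualifies for the sample data
```

A reseller filter such as `AND r.reseller_id = 1` can be added to the outer WHERE if needed.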

Related

Combine multiple table and use Group By Function in MYSQL

I have 5 different datasets from 5 different tables. From those tables I have taken the grouped data below.
select number,count(*) as total from tb01 group by number limit 5;
select number,count(*) as total from tb02 group by number limit 5;
Like that I can retrieve 5 different datasets. Here is an example.
+-----------+-------+
| number | total |
+-----------+-------+
| 114000259 | 1 |
| 114000400 | 1 |
| 114000686 | 1 |
| 114000858 | 1 |
| 114003895 | 1 |
+-----------+-------+
Now I need to combine those 5 different tables such as below tabular format.
+-----------+-------+-------+-------+
| number | tb01 | tb02 | tb03 |
+-----------+-------+-------+-------+
| 114000259 | 1 | 2 | 1 |
| 114000400 | 1 | 0 | 1 |
| 114000686 | 1 | 3 | 1 |
| 114000858 | 1 | 1 | 5 |
| 114003895 | 1 | 0 | 1 |
+-----------+-------+-------+-------+
Can someone help me combine those 5 grouped data sets and get the union as above?
Note: the headers do not need to match the table names; they can be anything.
Also, I don't need the LIMIT 5; it is only there to get a sample of 5 rows. I have a large dataset.
It's a job for JOINs and subqueries. My answer will consider three tables. It should be obvious how to expand it to five.
Your first subquery: get all possible numbers.
SELECT number FROM tb01 UNION
SELECT number FROM tb02 UNION
SELECT number FROM tb03
Then you have a subquery for each table to get the count.
SELECT number, COUNT(*) AS total
FROM tb02 GROUP BY number
Then you LEFT JOIN everything and SELECT from that.
SELECT numbers.number,
tb01.total tb01,
tb02.total tb02,
tb03.total tb03
FROM (
SELECT number FROM tb01 UNION
SELECT number FROM tb02 UNION
SELECT number FROM tb03
) numbers
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb01 GROUP BY number
) tb01 ON numbers.number = tb01.number
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb02 GROUP BY number
) tb02 ON numbers.number = tb02.number
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb03 GROUP BY number
) tb03 ON numbers.number = tb03.number
You can add ORDER BY and LIMIT clauses to that overall query as necessary.
The first subquery together with the LEFT JOIN ensures that you get results even if some of your tables are missing number rows. (Some DBMSs have FULL OUTER JOIN, but MySQL does not.)
Pro tip: If you use LIMIT without ORDER BY, you get an unpredictable subset of your rows. Unpredictable is worse than random, because you get the same subset in testing with small tables, but when your tables grow you may start getting different subsets. You'll never catch the problem in unit testing. LIMIT without ORDER BY is a serious error.
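One detail worth noting: with LEFT JOIN, a number that is missing from a table gets NULL in that table's column, while the desired output in the question shows 0. Wrapping each count in COALESCE takes care of that. A small sketch with made-up rows (the data and the aliases t1, t2, t3 are my own, and SQLite via Python's sqlite3 is only a stand-in for MySQL):

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb01 (number INTEGER);
CREATE TABLE tb02 (number INTEGER);
CREATE TABLE tb03 (number INTEGER);
INSERT INTO tb01 VALUES (1), (2);
INSERT INTO tb02 VALUES (1), (1), (3);
INSERT INTO tb03 VALUES (2);
""")

rows = conn.execute("""
SELECT numbers.number,
       COALESCE(t1.total, 0) AS tb01,   -- 0 instead of NULL when absent
       COALESCE(t2.total, 0) AS tb02,
       COALESCE(t3.total, 0) AS tb03
FROM (
    SELECT number FROM tb01 UNION
    SELECT number FROM tb02 UNION
    SELECT number FROM tb03
) numbers
LEFT JOIN (
    SELECT number, COUNT(*) AS total FROM tb01 GROUP BY number
) t1 ON numbers.number = t1.number
LEFT JOIN (
    SELECT number, COUNT(*) AS total FROM tb02 GROUP BY number
) t2 ON numbers.number = t2.number
LEFT JOIN (
    SELECT number, COUNT(*) AS total FROM tb03 GROUP BY number
) t3 ON numbers.number = t3.number
ORDER BY numbers.number
""").fetchall()

for row in rows:
    print(row)
```

Each LEFT JOIN is matched against its own derived table, and the ORDER BY makes the row order deterministic.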

MySQL - Retrieve the max value of an associated column within a LEFT JOIN with a different perimeter than the WHERE clause of the main query

I'm using MySQL 5.6 and have a SELECT query with a LEFT JOIN, but I need to retrieve the max of an associated column (email_nb) with a different "perimeter" of constraints.
Let's take an example. It is a mere example with only 5 rows, but it should also work when I have thousands (I'm stating this because there is a LIMIT clause in my query).
Table 'query_results'
+-----------------------------+------------+--------------+
| query_result_id | query_id | author |
+-----------------------------+------------+--------------+
| 2 | 1 | john |
| 3 | 1 | eric |
| 7 | 3 | martha |
| 9 | 4 | john |
| 10 | 1 | john |
+-----------------------------+------------+--------------+
Table 'customers_emails'
+-------------------+-----------------+-------------+--------+----------+--------------------+
| customer_email_id | query_result_id | customer_id | author | email_nb | days_since_sending |
+-------------------+-----------------+-------------+--------+----------+--------------------+
| 5 | 2 | 12 | john | 2 | 150 |
| 12 | 3 | 7 | eric | 4 | 90 |
| 27 | 3 | 12 | eric | 2 | 86 |
| 40 | 9 | 15 | john | 9 | 87 |
| 42 | 2 | 12 | john | 7 | 23 |
| 51 | 10 | 12 | john | 3 | 89 |
+-------------------+-----------------+-------------+--------+----------+--------------------+
Notes:
a query_result can have an author who appears in no row at all in customers_emails, hence the LEFT JOIN I'm using.
author is, by design, duplicated: it is in both the first table and the second table, each time associated with a query_result_id. This is important to note.
email_nb is an integer between 0 and 10
there is a LIMIT clause, as I need to retrieve a set number of records
Today my query aims at retrieving query_results under a certain number of conditions. The specificity is that I make sure to retrieve query_results with an author who does not appear in any customer_email_id where the days_since_sending would be less than 60 days: it means I check days_since_sending not only within the records for this query, but across all customers_emails, thanks to the NOT IN subquery (see below).
This is my current query for customer_id = 12 and query_id = 1
SELECT
qr.query_result_id,
qr.author
FROM
query_results qr
LEFT JOIN
customers_emails ce
ON
qr.author = ce.author
WHERE
qr.query_id = 1 AND
qr.author IS NOT NULL
AND qr.author NOT IN (
SELECT author
FROM customers_emails
WHERE
customer_id = 12 AND
days_since_sending >= 60
)
# we don't take by coincidence/bad luck 2 query results with the same author
GROUP BY
qr.author
ORDER BY
qr.query_result_id ASC
LIMIT
20
This is the expected output:
+-----------------------------+------------+--------------+
| query_result_id | author | email_nb |
+-----------------------------+------------+--------------+
| 10 | john | 7 |
| 3 | eric | 2 |
+-----------------------------+------------+--------------+
My challenge/difficulty today:
Notice that on the 2nd line eric is tied to email_nb 2 and not 4, which would have been the max if we had taken email_nb across ALL messages to author=eric. We stay within customer_id = 12, so the only one left is email_nb = 2.
Also notice that on the first line, the email_nb associated with query_result_id = 10 is 7 and not 3, which could have been the case since 3 is what appears in the last row of customers_emails.
Indeed, for emails to 'john' I had the choice between email_nb 2, 7 and 3, but I take the highest, so it's 7 (even if this email is from more than 60 days ago!). This is very important and is the part I don't know how to do, because the perimeters are different: today I retrieve all the query_results where the author has NOT been sent an email in the past 60 days (see the NOT IN subquery), BUT I need to have in the column the max email_nb sent to john by customer_id=12 and query_id=1 EVEN if it was sent more than 60 days ago.
In other words, I don't want to find MAX(email_nb) within the same WHERE clauses (such as days_since_sending >= 60) or within the same LIMIT and GROUP BY as my current query. What I need is to retrieve the maximum value of email_nb for customer_id=12 AND query_id=1 sent to john across ALL records in the customers_emails table.
If there is no associated row in customers_emails at all (meaning no email has ever been sent by this customer for this query), then email_nb should be something like NULL.
This means I do NOT want this output:
+-----------------------------+------------+--------------+
| query_result_id | author | email_nb |
+-----------------------------+------------+--------------+
| 10 | john | 3 |
| 3 | eric | 2 |
+-----------------------------+------------+--------------+
How can I achieve this in MySQL 5.6?
Since your question is a bit confusing, I came up with this.
select
max(q.query_result_id) as query_result_id,q.author,max(email_nb) as email_nb
from query_results q
left join customers_emails c on q.author=c.author
where customer_id=12 and query_id=1
group by q.author;
I think the best thing to do in a situation like this is break it down into smaller queries and then combine them together.
The first thing you want to do is this:
The specificity is that I make sure to retrieve query_results with an author who does not appear in any customer_email_id where the days_since_sending would be less than 60 days
This might look something like this:
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
This will get you the list of authors (with duplicates removed) that haven't had an email in the last 60 days that appear for the given query ID. Your next requirement is the following:
I need to have in the column the max email_nb sent to john by customer_id=12 and query_id=1 EVEN if it was sent more than 60 days ago
This query could look like this:
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
That gets you the maximum email_nb for each author/query_result combination, not taking into consideration the date at all.
The only thing left to do is reduce the set of results from the second query down to only the authors that appear in the first query. There are a few different methods for doing that. For example, you could INNER JOIN the two queries by author:
SELECT b.* FROM (
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
) b INNER JOIN (
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
) a ON a.author = b.author
You could also use an IN subquery:
SELECT b.* FROM (
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
) b
WHERE b.author IN (
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
)
There are most likely ways to speed this query up or shorten it, but you now have a working query whose results you can compare against.
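For comparison, here is a sketch of a variant that also keeps authors who have never been emailed, with email_nb coming back as NULL. The idea: compute the maximum per author in a derived table that ignores days_since_sending, LEFT JOIN it, and keep the 60-day rule in a separate NOT EXISTS. Several things here are my assumptions rather than the question's: the rows are hypothetical (zoe and paul are invented), the 60-day check is restricted to customer_id = 12, the alias first_result_id is mine, and SQLite via Python's sqlite3 stands in for MySQL 5.6.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 5.6; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE query_results (query_result_id INTEGER, query_id INTEGER, author TEXT);
CREATE TABLE customers_emails (
    customer_email_id INTEGER, query_result_id INTEGER, customer_id INTEGER,
    author TEXT, email_nb INTEGER, days_since_sending INTEGER);
INSERT INTO query_results VALUES
    (2, 1, 'john'), (3, 1, 'eric'), (7, 3, 'martha'),
    (10, 1, 'john'), (11, 1, 'zoe'), (12, 1, 'paul');
INSERT INTO customers_emails VALUES
    (1, 2, 12, 'john', 2, 150), (2, 2, 12, 'john', 7, 95),
    (3, 10, 12, 'john', 3, 89), (4, 3, 7, 'eric', 4, 90),
    (5, 3, 12, 'eric', 2, 86), (6, 12, 12, 'paul', 5, 10);
""")

rows = conn.execute("""
SELECT MIN(qr.query_result_id) AS first_result_id,
       qr.author,
       m.max_email_nb
FROM query_results qr
LEFT JOIN (
    -- "Wide perimeter": max email_nb per author for this customer, any age.
    SELECT author, MAX(email_nb) AS max_email_nb
    FROM customers_emails
    WHERE customer_id = 12
    GROUP BY author
) m ON m.author = qr.author
WHERE qr.query_id = 1
  AND qr.author IS NOT NULL
  AND NOT EXISTS (
    -- "Narrow perimeter": drop authors emailed within the last 60 days.
    SELECT 1
    FROM customers_emails ce
    WHERE ce.author = qr.author
      AND ce.customer_id = 12
      AND ce.days_since_sending < 60
  )
GROUP BY qr.author, m.max_email_nb   -- one row per author
ORDER BY first_result_id
LIMIT 20
""").fetchall()

for row in rows:
    print(row)
```

With these rows, paul is dropped because of the 10-day-old email, john keeps his maximum of 7 even though it is older than 60 days, and zoe appears with a NULL email_nb.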

Can I count multiple columns with Group By?

I have a table with these columns:
s, s2, s3
1, 2, 3
4
1, 3
4, 2,
2, 1
3, 4
4
I want to know how many times the unique values in column s appears in the columns s, s2 and s3.
So far I have:
$query = "SELECT s, COUNT(*) as count FROM table GROUP BY s";
This will give me:
1 - count 2
2 - count 1
3 - count 1
4 - count 3
But I want to count the column s2 and s3 also so the outcome will be:
1 - count 3
2 - count 3
3 - count 3
4 - count 4
Any idea how I must edit the query so I can count the columns s, s2 and s3 group by the values of column s?
Kind regards,
Arie
You need a UNION ALL for all the columns and then count them:
select
t.s, count(*) counter
from (
select s from tablename union all
select s2 from tablename union all
select s3 from tablename
) t
where t.s is not null
group by t.s
Results:
| s | counter |
| --- | ------- |
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |
| 4 | 4 |
If in the columns s2 and s3 there are values that do not exist in the column s and you want them excluded, then instead of:
where t.s is not null
use
where t.s in (select s from tablename)
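A quick worked example of that second variant. It is only a sketch: the table layout in the question is flattened, so where each value sits in s2 or s3 is my assumption, the extra 5 in s3 is added to show the filter at work, and SQLite via Python's sqlite3 is a stand-in for MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the s2/s3 placement is guessed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (s INTEGER, s2 INTEGER, s3 INTEGER);
INSERT INTO tablename VALUES
    (1, 2, 3), (4, NULL, NULL), (1, 3, NULL), (4, 2, NULL),
    (2, 1, NULL), (3, 4, NULL), (4, NULL, 5);
""")

rows = conn.execute("""
SELECT t.s, COUNT(*) AS counter
FROM (
    SELECT s  FROM tablename UNION ALL   -- stack the three columns
    SELECT s2 FROM tablename UNION ALL
    SELECT s3 FROM tablename
) t
WHERE t.s IN (SELECT s FROM tablename)   -- drops NULLs and values not in s
GROUP BY t.s
ORDER BY t.s
""").fetchall()

for row in rows:
    print(row)
```

The 5 that only occurs in s3 is not counted, and the NULLs drop out as well because `NULL IN (...)` never evaluates to true.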
@forpas's answer is a good one. However, there are two things you should consider.
Due to the use of union the query will become slower as the data size increases.
If the input is as following:
s, s2, s3
1, 2, 3
4
1, 3
4, 2,
2, 1
3, 4
4 5
The result of the provided query will be:
| s | counter |
| --- | ------- |
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |
| 4 | 4 |
| 5 | 1 |
whereas it should remain the same, as 5 is not present in the s column.
In order to resolve both of the above issues, I propose the approach to use JOIN instead of UNION:
SELECT t3.s, IF(t3.s = t4.s3, cnt1 + 2, cnt1 + 1) as counter FROM
(SELECT *, count(*) AS cnt1 FROM
(SELECT s from table) AS t1
LEFT JOIN
(SELECT s2 FROM table) AS t2
ON t1.s = t2.s2 GROUP BY t1.s
) AS t3
LEFT JOIN
(SELECT s3 FROM table) AS t4
ON t3.s = t4.s3
ORDER BY t3.s
The query might look a bit lengthy and complicated but it is really simple when you look into the logic.
Step 1
What I have done here is left join the s column to s2 and count the results. This gives one less than the total number of occurrences, because it counts the relations from left to right.
Step 2
Then I left join s to s3, and increase the count from step 1 by 1 only if a relation is found.
Step 3
Finally, I increase the count by 1 to convert the number of relations into the number of entities.
I hope it makes sense

Select where max Mysql

Please help me write an SQL SELECT for the following data.
My table is:
id | news_id | season | seria | date_update
---|------|---------|-----|--------------------
1 | 4 | 1 | 7 | 2017-04-14 16:38:10
2 | 4 | 1 | 7 | 2017-04-14 17:38:10
5 | 4 | 1 | 7 | 2017-04-14 16:38:10
3 | 4 | 1 | 7 | 2017-04-14 16:38:10
4 | 4 | 1 | 7 | 2017-04-14 16:38:10
6 | 4 | 1 | 7 | 2017-04-14 16:38:10
7 | 4 | 1 | 7 | 2017-04-14 16:38:10
8 | 1 | 1 | 25 | 2017-04-23 18:42:00
I need to get all rows grouped by max season, seria and date, sorted by date_update DESC.
In the result I need the following rows:
id | news_id | season | seria | date_update
---|------|---------|-----|--------------------
8 | 1 | 1 | 25 | 2017-04-23 18:42:00
2 | 4 | 1 | 7 | 2017-04-14 17:38:10
These rows have the highest season, seria and date_update for each news_id. That is, I need to select the rows that have the highest season, seria and date_update grouped by news_id, sorted by date_update DESC.
I tried the query below, but the data is not always correct, and for some reason it does not always cover all the rows that fit the condition.
SELECT serial.*
FROM serial as serial
INNER JOIN (SELECT id, MAX(season) AS maxseason, MAX(seria) AS maxseria FROM serial GROUP BY news_id) as one_serial
ON serial.id = one_serial.id
WHERE serial.season = one_serial.maxseason AND serial.seria = one_serial.maxseria
ORDER BY serial.date_update
Please help. Thanks.
The specification is unclear.
But we do know that the GROUP BY news_id clause is going to collapse all of the rows with a common value of news_id into a single row. (Other databases would throw an error with this syntax; we can get MySQL to throw a similar error if we include ONLY_FULL_GROUP_BY in the sql_mode.)
My suggestion would be to remove the GROUP BY news_id clause from the end of the query.
But that's just a guess. It's not at all clear what you are trying to achieve.
EDIT
SELECT t.*
FROM (
SELECT r.news_id
, r.season
, r.seria
, MAX(r.date_update) AS max_date_update
FROM (
SELECT p.news_id
, p.season
, MAX(p.seria) AS max_seria
FROM (
SELECT n.news_id
, MAX(n.season) AS max_season
FROM serial n
GROUP BY n.news_id
) o
JOIN serial p
ON p.news_id = o.news_id
AND p.season = o.max_season
GROUP BY p.news_id
, p.season
) q
JOIN serial r
ON r.news_id = q.news_id
AND r.season = q.season
AND r.seria = q.max_seria
GROUP BY r.news_id
, r.season
, r.seria
) s
JOIN serial t
ON t.news_id = s.news_id
AND t.season = s.season
AND t.seria = s.seria
AND t.date_update = s.max_date_update
GROUP BY t.news_id
ORDER BY t.news_id
Or, an alternate approach making use of MySQL user-defined variables...
SELECT s.id
, s.season
, s.seria
, s.date_update
FROM (
SELECT IF(q.news_id = @p_news_id,0,1) AS is_max
, q.id
, @p_news_id := q.news_id AS news_id
, q.season
, q.seria
, q.date_update
FROM ( SELECT @p_news_id := NULL ) r
CROSS
JOIN serial q
ORDER
BY q.news_id DESC
, q.season DESC
, q.seria DESC
, q.date_update DESC
) s
WHERE s.is_max
ORDER BY s.news_id
The subquery selects the maximum season and the maximum seria per news_id. We don't know how many records exist for the news_id that match both the maximum season and the maximum seria. It can be one, two, a thousand, or zero.
So with the join you get an unknown number of records per news_id. Then you group by news_id. This gets you one result row per news_id. How then can you select serial.*? * means all columns from a row, but which row, when there can be many for a news_id? MySQL usually picks values arbitrarily in this case (usually all from the same row, but even that is not guaranteed). So you end up with random rows which you order by date_update.
This doesn't make much sense. So the question is: what do you really want to achieve? Maybe my explanation suffices and you are able now to fix your query yourself.
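If the goal is one "latest" row per news_id, ranked by season, then seria, then date_update, one way to write it without GROUP BY or variables is a NOT EXISTS anti-join: keep a row only when no other row for the same news_id ranks higher. This is a sketch under that assumed reading of the question, using the question's data; SQLite via Python's sqlite3 is only a runnable stand-in for MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE serial (
    id INTEGER, news_id INTEGER, season INTEGER, seria INTEGER, date_update TEXT);
INSERT INTO serial VALUES
    (1, 4, 1, 7, '2017-04-14 16:38:10'), (2, 4, 1, 7, '2017-04-14 17:38:10'),
    (5, 4, 1, 7, '2017-04-14 16:38:10'), (3, 4, 1, 7, '2017-04-14 16:38:10'),
    (4, 4, 1, 7, '2017-04-14 16:38:10'), (6, 4, 1, 7, '2017-04-14 16:38:10'),
    (7, 4, 1, 7, '2017-04-14 16:38:10'), (8, 1, 1, 25, '2017-04-23 18:42:00');
""")

rows = conn.execute("""
SELECT t.id, t.news_id, t.season, t.seria, t.date_update
FROM serial t
WHERE NOT EXISTS (
    -- Is there a row for the same news_id that ranks strictly higher?
    SELECT 1
    FROM serial u
    WHERE u.news_id = t.news_id
      AND (u.season > t.season
           OR (u.season = t.season AND u.seria > t.seria)
           OR (u.season = t.season AND u.seria = t.seria
               AND u.date_update > t.date_update))
)
ORDER BY t.date_update DESC
""").fetchall()

for row in rows:
    print(row)
```

If several rows tie on all three columns, all of them are returned, so an extra tie-breaker (for example on id) would be needed to force exactly one row per news_id.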

What is SQL to select a property and the max number of occurrences of a related property?

I have a table like this:
Table: p
+----------------+
| id | w_id |
+---------+------+
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 6 | 5 |
| 6 | 8 |
| 6 | 10 |
| 6 | 10 |
| 7 | 8 |
| 7 | 10 |
+----------------+
What is the best SQL to get the following result? :
+-----------------------------+
| id | most_used_w_id |
+---------+-------------------+
| 5 | 8 |
| 6 | 10 |
| 7 | 8 |
+-----------------------------+
In other words, to get, per id, the most frequent related w_id.
Note that on the example above, id 7 is related to 8 once and to 10 once.
So, either (7, 8) or (7, 10) will do as result. If it is not possible to
pick up one, then both (7, 8) and (7, 10) on result set will be ok.
I have come up with something like:
select counters2.p_id as id, counters2.w_id as most_used_w_id
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters2
join (
select p_id, max(count_of_w_ids) as max_counter_for_w_ids
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters
group by p_id
) as p_max
on p_max.p_id = counters2.p_id
and p_max.max_counter_for_w_ids = counters2.count_of_w_ids
;
but I am not sure at all whether this is the best way to do it. And I had to repeat the same sub-query two times.
Any better solution?
Try using user-defined variables:
select id,w_id
FROM
( select T.*,
if(@id<>id,1,0) as row,
@id:=id FROM
(
select id,W_id, Count(*) as cnt FROM p Group by ID,W_id
) as T,(SELECT @id:=0) as T1
ORDER BY id,cnt DESC
) as T2
WHERE Row=1
Formal SQL
In fact, your solution is correct in terms of standard SQL. Why? Because you have to join values from the original data to the grouped data, so your query cannot be simplified. MySQL allows mixing non-grouped columns with aggregate functions, but that is unreliable, so I do not recommend relying on it.
MySQL
Since you're using MySQL, you can use variables. I'm not a big fan of them, but for your case they may be used to simplify things:
SELECT
c.*,
IF(@id!=id, @i:=1, @i:=@i+1) AS num,
@id:=id AS gid
FROM
(SELECT id, w_id, COUNT(w_id) AS w_count
FROM t
GROUP BY id, w_id
ORDER BY id DESC, w_count DESC) AS c
CROSS JOIN (SELECT @i:=-1, @id:=-1) AS init
HAVING
num=1;
So for your data result will look like:
+------+------+---------+------+------+
| id | w_id | w_count | num | gid |
+------+------+---------+------+------+
| 7 | 8 | 1 | 1 | 7 |
| 6 | 10 | 2 | 1 | 6 |
| 5 | 8 | 3 | 1 | 5 |
+------+------+---------+------+------+
Thus, you've found each id and its corresponding w_id. The idea is to count the rows and enumerate them, relying on the fact that they are ordered in the subquery. So we need only the first row per id (because it represents the data with the highest count).
This could be replaced with a single GROUP BY id, but, again, the server is free to choose any row in that case (it works because it takes the first row, but the documentation does not guarantee that in general).
One nice thing about this approach is that you can also select, for example, the 2nd or 3rd most frequent; it's very flexible.
Performance
To increase performance, you can create an index on (id, w_id), which will be used for ordering and grouping records. The variables and HAVING, however, will produce a row-by-row scan of the set derived by the inner GROUP BY. That isn't as bad as a full scan of the original data, but it is still a drawback of doing this with variables. On the other hand, doing it with a JOIN and subquery, as in your query, won't be much different, because a temporary table is created for the subquery result set too.
But to be certain, you'll have to test. And keep in mind that you already have a valid solution which, by the way, isn't bound to DBMS-specific features and is good in terms of standard SQL.
Try this query
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having max(ccc)
You can also use this code if you do not want to rely on the first record of non-grouping columns
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having ccc=max(ccc);
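For what it's worth, the repeated sub-query from the question can be written once as a common table expression, which MySQL supports from version 8.0 on. A minimal sketch using the question's data; the names counters, cnt and max_cnt are my own, and SQLite via Python's sqlite3 is only a runnable stand-in for MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 8.0+; rows are copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE p (id INTEGER, w_id INTEGER);
INSERT INTO p VALUES
    (5, 8), (5, 10), (5, 8), (5, 10), (5, 8),
    (6, 5), (6, 8), (6, 10), (6, 10),
    (7, 8), (7, 10);
""")

rows = conn.execute("""
WITH counters AS (
    -- How often each w_id occurs for each id (written only once).
    SELECT id, w_id, COUNT(*) AS cnt
    FROM p
    GROUP BY id, w_id
)
SELECT c.id, c.w_id AS most_used_w_id
FROM counters c
JOIN (
    SELECT id, MAX(cnt) AS max_cnt
    FROM counters
    GROUP BY id
) m ON m.id = c.id AND m.max_cnt = c.cnt
ORDER BY c.id, c.w_id
""").fetchall()

for row in rows:
    print(row)
```

Ties are kept, so id 7 comes back with both 8 and 10, which the question says is acceptable.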