How can I add a column to a right join select query - mysql

I am trying to find a way to add a country code to a database call record based on a phone number column. I have a table with countries and their dialling codes called countries. I can query all records and add the country code afterwards, but I need to be able to filter and paginate the results.
I am working with a system I don't have much control over so adding new columns to tables or rewriting large blocks of code isn't really an option. This is what I have to work with.
Countries Table.
id | name    | dialling_code
----------------------------
1  | Ireland | 353
2  | America | 1
Call Record table.
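id | startdatetime       | enddatetime         | route_id | phonenumber | duration_seconds
------------------------------------------------------------------------------------------
1  | 2014-12-18 18:51:12 | 2014-12-18 18:52:12 | 23       | 3538700000  | 60
2  | 2014-12-18 17:41:02 | 2014-12-18 17:43:02 | 43       | 18700000    | 120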
Routes table.
id | number     | enabled
-------------------------
23 | 1234567890 | 1
43 | 0987654321 | 1
I need the total duration and the number of unique phone numbers, grouped by route_id and route_number, but now we also need to group these results by country_id so we can group callers by country. I use the MySQL query below to get the totals grouped by route_id and route_number. This query was written by another developer a long time ago.
SELECT
phone_number,
route_number,
COUNT(callrecord_id) AS total_calls,
SUM(duration_sec) AS total_duration,
callrecord_join.route_id
FROM routes
RIGHT JOIN (
SELECT
DATE(a.startdatetime) AS call_date,
a.id AS callrecord_id,
a.route_id AS route_id,
a.phonenumber AS phone_number,
a.duration_seconds as duration_sec,
b.inboundnumber AS route_number
FROM callrecord AS a
INNER JOIN routes AS b ON a.route_id = b.id
WHERE DATE_FORMAT(a.startdatetime, '%Y-%m-%d') >= '2014-12-18'
AND DATE_FORMAT(a.startdatetime, '%Y-%m-%d') <= '2014-12-18'
AND b.isenabled = 1
) AS callrecord_join ON routes.id = callrecord_join.route_id
GROUP BY route_id, route_number
LIMIT 10 offset 0;
I have everything up to adding a country_id in the right join table so I can group by the country_id.
I know I could loop through each country using PHP and get the results using a WHERE clause, something like the one below, but then I cannot paginate or filter the results easily.
WHERE LEFT(a.phonenumber, strlen($dialling_code)) = $dialling_code
How can I use the countries table to add a column to the join table query with the country id so I can group by route_id, route_number and country_id? Something like the table below.
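id | startdatetime       | enddatetime         | route_id | phonenumber | duration_seconds | country_id
-------------------------------------------------------------------------------------------------------
1  | 2014-12-18 18:51:12 | 2014-12-18 18:52:12 | 23       | 3538700000  | 60               | 1
2  | 2014-12-18 17:41:02 | 2014-12-18 17:43:02 | 43       | 18700000    | 120              | 2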

The RIGHT JOIN from routes to callrecord_join serves no purpose, as you already have the INNER JOIN between routes and callrecord in the sub-query, which is on the right-hand side of the join.
You can use the join you have described -
JOIN countries c ON LEFT(a.phonenumber, LENGTH(c.dialling_code)) = c.dialling_code
but it will give the same result as:
JOIN countries c ON a.phonenumber LIKE CONCAT(c.dialling_code, '%')
which should be slightly less expensive.
You should test the join to countries to make sure none of your numbers in callrecord join to multiple countries. Some international dialling codes are ambiguous, so it depends on which list of dialling codes you are using.
SELECT a.*, COUNT(*), GROUP_CONCAT(c.dialling_code)
FROM callrecord a
JOIN countries c ON a.phonenumber LIKE CONCAT(c.dialling_code, '%')
GROUP BY a.id
HAVING COUNT(*) > 1;
Obviously, you will need to batch the above query if your dataset is very large.
I hope I am not grossly over-simplifying things, but from what I understand of your question the query is just:
SELECT
r.id AS route_id,
r.number AS route_number,
c.name AS country_name,
SUM(a.duration_seconds) AS total_duration,
COUNT(a.id) AS total_calls,
COUNT(DISTINCT a.phonenumber) AS unique_numbers
FROM callrecord AS a
JOIN routes AS r ON a.route_id = r.id
JOIN countries c ON a.phonenumber LIKE CONCAT(c.dialling_code, '%')
WHERE a.startdatetime >= '2014-12-18'
AND a.startdatetime < '2014-12-19'
AND r.isenabled = 1
GROUP BY r.id, r.number, c.name
LIMIT 10 offset 0;
Please note the removal of DATE_FORMAT() from the startdatetime to make these criteria sargable, assuming a suitable index is available.
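To try the prefix join on the sample rows from the question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. It is an illustration under assumptions, not the production query: the column names follow the table listings in the question (routes.number, routes.enabled), phone numbers and dialling codes are stored as text, and SQLite's `||` operator replaces MySQL's CONCAT().

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT, dialling_code TEXT);
    CREATE TABLE routes (id INTEGER PRIMARY KEY, number TEXT, enabled INTEGER);
    CREATE TABLE callrecord (
        id INTEGER PRIMARY KEY, startdatetime TEXT, enddatetime TEXT,
        route_id INTEGER, phonenumber TEXT, duration_seconds INTEGER);
    INSERT INTO countries VALUES (1, 'Ireland', '353'), (2, 'America', '1');
    INSERT INTO routes VALUES (23, '1234567890', 1), (43, '0987654321', 1);
    INSERT INTO callrecord VALUES
        (1, '2014-12-18 18:51:12', '2014-12-18 18:52:12', 23, '3538700000', 60),
        (2, '2014-12-18 17:41:02', '2014-12-18 17:43:02', 43, '18700000', 120);
""")

# Join each call to the country whose dialling code prefixes the phone number,
# then aggregate per route and country. The date filter is a half-open range
# on the raw column, so an index on startdatetime could be used.
QUERY = """
    SELECT r.id, r.number, c.id,
           SUM(a.duration_seconds), COUNT(a.id), COUNT(DISTINCT a.phonenumber)
    FROM callrecord AS a
    JOIN routes AS r ON a.route_id = r.id
    JOIN countries AS c ON a.phonenumber LIKE c.dialling_code || '%'
    WHERE a.startdatetime >= '2014-12-18'
      AND a.startdatetime < '2014-12-19'
      AND r.enabled = 1
    GROUP BY r.id, r.number, c.id
    ORDER BY r.id, c.id
    LIMIT 10 OFFSET 0
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Grouping by c.id instead of c.name gives the country_id column asked for in the question; either works as long as the grouped column also appears in the SELECT list.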

Related

Aggregating three tables but getting wrong values during the aggregation operation

"employee" Table
emp_id | empName
----------------
1      | ABC
2      | xyz
"client" Table:
id | emp_id | clientName
------------------------
1  | 1      | a
2  | 1      | b
3  | 1      | c
4  | 2      | d
"collection" Table
id | emp_id | Amount
--------------------
1  | 2      | 1000
2  | 1      | 2000
3  | 1      | 1000
4  | 1      | 1200
I want to aggregate values from the three input tables shown above as samples. For each employee I need to find:
the total collection amount for that employee (as a sum)
the clients that are involved with the corresponding employee (as a comma-separated value)
Here follows my current query.
MyQuery:
SELECT emp_id,
empName,
GROUP_CONCAT(client.clientName ORDER BY client.id SEPARATOR '') AS clientName,
SUM(collection.Amount)
FROM employee
LEFT JOIN client
ON clent.emp_id = employee.emp_id
LEFT JOIN collection
ON collection.emp_id = employee.emp_id
GROUP BY employee.emp_id;
The problem with this query is that I get wrong sums and duplicated clients when an employee is associated with several of them.
Current Output:
emp_id | empName | clientName        | TotalCollection
------------------------------------------------------
1      | ABC     | a,b,c,c,b,a,a,b,c | 8400
2      | xyz     | d,d               | 1000
Expected Output:
emp_id | empName | clientName | TotalCollection
-----------------------------------------------
1      | ABC     | a , b , c  | 4200
2      | xyz     | d          | 1000
How can I solve this problem?
There are some typos in your query:
the separator inside the GROUP_CONCAT function should be a comma instead of a space, given your current output; a comma is the default value, so you can simply omit that clause.
each column in your SELECT must be qualified with the table it comes from whenever that column name exists in more than one of the tables you are joining.
your GROUP BY clause should contain at least every field that is not aggregated inside the SELECT clause in order to have a potentially correct output.
The overall conceptual problem in your query is that the joins pair every "client" row of an employee with every "collection" row of the same employee (resulting in repeated rows and inflated sums during the aggregation). One way out of the rabbit hole is to aggregate the "client" table first (so there is one row for each "emp_id" value), then join the result back to the other tables.
SELECT emp.emp_id,
emp.empName,
cl.clientName,
SUM(coll.Amount)
FROM employee emp
LEFT JOIN (SELECT emp_id,
GROUP_CONCAT(client.clientName
ORDER BY client.id) AS clientName
FROM client
GROUP BY emp_id) cl
ON cl.emp_id = emp.emp_id
LEFT JOIN (SELECT emp_id, Amount FROM collection) coll
ON coll.emp_id = emp.emp_id
GROUP BY emp.emp_id,
emp.empName,
cl.clientName
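The row multiplication described in this answer can be reproduced with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, loads the sample rows from the question, and compares a naive double join with a version that aggregates each child table before joining. Pre-aggregating "collection" as well as "client" is a variation on the query above, chosen here so that no outer GROUP BY is needed.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, empName TEXT);
    CREATE TABLE client (id INTEGER PRIMARY KEY, emp_id INTEGER, clientName TEXT);
    CREATE TABLE collection (id INTEGER PRIMARY KEY, emp_id INTEGER, Amount INTEGER);
    INSERT INTO employee VALUES (1, 'ABC'), (2, 'xyz');
    INSERT INTO client VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 1, 'c'), (4, 2, 'd');
    INSERT INTO collection VALUES (1, 2, 1000), (2, 1, 2000), (3, 1, 1000), (4, 1, 1200);
""")

# Naive version: joining both child tables directly repeats every collection
# row once per client of the same employee, so the sum is inflated.
NAIVE = """
    SELECT e.emp_id, COUNT(*), SUM(col.Amount)
    FROM employee e
    LEFT JOIN client c ON c.emp_id = e.emp_id
    LEFT JOIN collection col ON col.emp_id = e.emp_id
    GROUP BY e.emp_id
    ORDER BY e.emp_id
"""

# Pre-aggregated version: each derived table has one row per emp_id,
# so the joins cannot multiply rows.
PRE_AGGREGATED = """
    SELECT e.emp_id, e.empName, cl.clientName, co.total
    FROM employee e
    LEFT JOIN (SELECT emp_id, GROUP_CONCAT(clientName) AS clientName
               FROM client GROUP BY emp_id) cl ON cl.emp_id = e.emp_id
    LEFT JOIN (SELECT emp_id, SUM(Amount) AS total
               FROM collection GROUP BY emp_id) co ON co.emp_id = e.emp_id
    ORDER BY e.emp_id
"""

naive = conn.execute(NAIVE).fetchall()
fixed = conn.execute(PRE_AGGREGATED).fetchall()
print("naive (emp_id, joined rows, sum):", naive)
print("pre-aggregated:", fixed)
```

The joined-row count in the naive result makes the fan-out visible: employee 1 has three clients and three collections, so the join produces nine rows for that employee.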
Regardless of my comment, here is a query for your desired output:
SELECT
a.emp_id,
a.empName,
a.clientName,
SUM(col.Amount) AS totalCollection
FROM (SELECT e.emp_id,
e.`empName`,
GROUP_CONCAT(DISTINCT c.clientName ORDER BY c.id ) AS clientName
FROM employee e
LEFT JOIN `client` c
ON c.emp_id = e.emp_id
GROUP BY e.`emp_id`) a
LEFT JOIN collection col
ON col.emp_id = a.emp_id
GROUP BY a.emp_id, a.empName, a.clientName;
When you have multiple joins, you should be careful about the relationships between the tables and the number of rows your query generates. You may end up with more rows in the output than you intended.
Hope this helps
SELECT employee.emp_id,
employee.empName,
GROUP_CONCAT(client.clientName ORDER BY client.id SEPARATOR ',') AS clientName,
C.Amount
FROM employee
LEFT JOIN client
ON client.emp_id = employee.emp_id
LEFT JOIN (select collection.emp_id , sum(collection.Amount ) as Amount from collection group by collection.emp_id) C
ON C.emp_id = employee.emp_id
GROUP BY employee.emp_id, employee.empName, C.Amount;
it works for me now

Show all of sum and count without using group by

I want to retrieve customer names, total orders (how many times they ordered products) and the total amount they have spent over their lifetime, using a single query WITHOUT join, group by or having operators. Show only customers who have at least one order.
Here is my database
Customer- CustomerID| CustomerName SalesOrder- SalesOrderID | CustomerID | SaleTotal
100000 | John 1001 | 100000 | 2000
200000 | Jane 1002 | 100000 | 3000
300000 | Sean 1003 | 200000 | 5000
When I query
SELECT CustomerName,count(*) AS Total_Orders,sum(SaleTotal) AS SaleTotal
FROM Customer C,SalesOrderHeader SH WHERE C.CustomerID=SH.CustomerID;
It shows only one row.
The answer that I want is
CustomerName | Total_Orders | SaleTotal
John 2 5000
Jane 1 5000
I am new to MySQL.
So does anyone here know how to do this?
If you are to do this without joins and group by, then the simplest approach is to use correlated subqueries:
select *
from (
select
c.customerName,
(
select count(*)
from salesOrder so
where so.customerID = c.customerID
) totalOrders,
(
select sum(so.saleTotal)
from salesOrder so
where so.customerID = c.customerID
) saleTotal
from customer c
) t
where totalOrders > 0
Note that this query is clearly suboptimal - because it scans the salesOrder table twice, while a single scan would suffice. A better way to write this would be:
select c.customerName, count(*) totalOrders, sum(so.saleTotal) saleTotal
from customer c
inner join salesOrder so on so.customerID = c.customerID
group by c.customerID, c.customerName
There is no need for a having clause here - the inner join filters out customers that have no order already.
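The correlated-subquery form from this answer can be checked against the sample rows in the question with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the table and column names (customer, salesOrder, saleTotal) follow the table listing in the question and are assumptions about the real schema.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerID INTEGER PRIMARY KEY, customerName TEXT);
    CREATE TABLE salesOrder (
        salesOrderID INTEGER PRIMARY KEY, customerID INTEGER, saleTotal INTEGER);
    INSERT INTO customer VALUES (100000, 'John'), (200000, 'Jane'), (300000, 'Sean');
    INSERT INTO salesOrder VALUES
        (1001, 100000, 2000), (1002, 100000, 3000), (1003, 200000, 5000);
""")

# No JOIN, GROUP BY or HAVING: each scalar subquery is evaluated per customer
# row, and the outer WHERE drops customers without any order.
QUERY = """
    SELECT *
    FROM (
        SELECT c.customerName,
               (SELECT COUNT(*) FROM salesOrder so
                WHERE so.customerID = c.customerID) AS totalOrders,
               (SELECT SUM(so.saleTotal) FROM salesOrder so
                WHERE so.customerID = c.customerID) AS saleTotal
        FROM customer c
    ) t
    WHERE totalOrders > 0
"""
# Sorted in Python so the printed order does not depend on the scan order.
rows = sorted(conn.execute(QUERY).fetchall())
for row in rows:
    print(row)
```

Sean has no orders, so the outer filter on totalOrders removes that row, which matches the "at least one order" requirement.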
Use aggregation . . . and proper join syntax:
SELECT CustomerName, COUNT(*) AS Total_Orders, SUM(SaleTotal) AS SaleTotal
FROM Customer C JOIN
SalesOrderHeader SH
ON C.CustomerID = SH.CustomerID
GROUP BY CustomerName;
Your query would fail in almost any database -- including newer versions of MySQL. You have mixed aggregated columns and unaggregated columns in the SELECT. The unaggregated ones should be in a GROUP BY.
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
You have to use the query below. You cannot achieve it without join and group by
SELECT CustomerName,count(*) AS Total_Orders,sum(SaleTotal) AS SaleTotal
FROM Customer C,SalesOrderHeader SH WHERE C.CustomerID=SH.CustomerID
group by CustomerName;

Optimisation of subqueries

I have a relation between users and groups. Users can be in a group or not.
EDIT : Added some stuff to the model to make it more convenient.
Let's say I have a rule that adds users to a group when they live in a specific town and have a given custom metadata value (such as age 18).
Currently, I do the following to find out which users I have to add to the group of people living in Paris who are 18:
SELECT user.id AS 'id'
FROM user
LEFT JOIN
(
SELECT user_id
FROM user_has_role_group
WHERE role_group_id = 1 -- Group for Paris
)
AS T1
ON user.id = T1.user_id
WHERE
(
user.town = 'Paris' AND JSON_EXTRACT(user.custom_metadata, '$.age') = 18
)
AND T1.user_id IS NULL
It works & gives me the IDs of the users to insert in group.
But when I have 50 groups to process, e.g. for 50 towns or various ages, it forces me to run 50 requests, which is very slow and inefficient for my database.
How could I generate a result for each group ?
Something like :
role_group_id user_to_add
1 1
1 2
2 1
2 3
The only way I know to do that for now is to do a UNION on several sub queries like the one above, but of course it's very slow.
Note that the custom_metadata field is a user defined field. I can't create specific columns or tables.
Thanks a lot for your help.
If I understood you correctly:
select user.id, grp.id
from user, role_group grp
where (user.id, grp.id) not in (select user_id, role_group_id from user_has_role_group) and user.town in ('Paris', 'Warsaw')
That query lists the users from the given towns together with the groups they do not yet belong to.
To add the missing entries to user_has_role_group, you might want to have some mapping between those town names and their group_id's.
The example below is just using a subquery with unions for that.
But you could replace that with a select from a table.
Maybe even from role_group, if those names correlate with the user town names.
insert into user_has_role_group (user_id, role_group_id)
select u.id, g.group_id
from user u
join (
select 'Paris' as name, 1 as group_id union all
select 'Rome', 2
-- add more towns here
) g on (u.town = g.name)
left join user_has_role_group ug
on (ug.user_id = u.id and ug.role_group_id = g.group_id)
where u.town in ('Paris','Rome') -- add more towns here
and json_extract(u.custom_metadata, '$.age') = 18
and ug.user_id is null;
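The mapping-plus-anti-join idea can be sketched end to end with Python's built-in sqlite3 module as a stand-in for MySQL. The rows below are invented for illustration, and a plain age column replaces the JSON custom_metadata field so the sketch does not depend on JSON support; in MySQL the filter would stay as json_extract(u.custom_metadata, '$.age') = 18.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
# All rows are made-up sample data; "age" stands in for the JSON metadata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, town TEXT, age INTEGER);
    CREATE TABLE user_has_role_group (user_id INTEGER, role_group_id INTEGER);
    INSERT INTO user VALUES (1, 'Paris', 18), (2, 'Paris', 18),
                            (3, 'Rome', 18), (4, 'Paris', 30);
    INSERT INTO user_has_role_group VALUES (2, 1);  -- user 2 is already in group 1
""")

# One query for all groups: map each town to its group, then keep only the
# (group, user) pairs that have no matching membership row (anti-join).
QUERY = """
    SELECT g.group_id AS role_group_id, u.id AS user_to_add
    FROM user u
    JOIN (
        SELECT 'Paris' AS name, 1 AS group_id UNION ALL
        SELECT 'Rome', 2
    ) g ON u.town = g.name
    LEFT JOIN user_has_role_group ug
           ON ug.user_id = u.id AND ug.role_group_id = g.group_id
    WHERE u.age = 18
      AND ug.user_id IS NULL
    ORDER BY g.group_id, u.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The result has the role_group_id / user_to_add shape requested in the question, and the same SELECT can feed an INSERT ... SELECT once the pairs look right.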

Eliminate certain duplicated rows after group by

With this db:
Chef(cid,cname,age),
Recipe(rid,rname),
Cooked(orderid,cid,rid,price)
Customers(cuid,orderid,time,daytime,age)
[cid means chef id, and so on]
Given orders from customers, I need to find, for each chef, the difference between his/her age and the average age of the people who ordered his/her meals.
I wrote the following query:
select cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch NATURAL JOIN Cooked Co,Customers Cu
where Co.orderid = Cu.orderid
group by cid
This solves the problem, but if you assume that each customer has a unique id, it might not work, because one customer can then order two meals from the same chef and skew the calculation.
Now I know it can be answered with NOT EXISTS, but I'm looking for a solution that uses GROUP BY (something similar to what I wrote). So far I couldn't find one (I searched and tried many ways, from select distinct, to manipulation in the where clause, to "having count(distinct..)").
Edit: People asked for an example. I'm coding using SQLFiddle and it crashes a lot, so I'll try my best:
cid | cuid | orderid | Cu.age
-----------------------------
1 333 1 20
1 200 2 40
1 200 5 40
2 4 3 36
Let's say Chef 1's age is 50. My query will give you 50 - (20+40+40)/3 = 16 and 2/3, although it should actually be 50 - (20+40)/2 = 20 (because the customer with id 200 ordered two recipes from our beloved Chef 1).
Assume Chef 2's age is 47. My query will result:
cid | Diff
----------
1 16.667
2 11
Another edit: I wasn't taught any particular SQL dialect, so I really have no idea what the differences are between Oracle's, MySQL's and Microsoft SQL Server's, and I'm basically "freestyle" querying. (I hope it will be good in my exam as well :O )
First, you should write your query as:
select Ch.cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch join
Cooked Co
on Ch.cid = Co.cid join
Customers Cu
on Co.orderid = Cu.orderid
group by Ch.cid, Ch.age;
Two different reasons:
NATURAL JOIN is just a bug waiting to happen. List the columns that you want used for the join, lest an unexpected field or spelling difference affect the results.
Never use commas in the FROM clause. Always use explicit JOIN syntax.
Next, the answer to your question is more complicated. For each chef, we can get the average age of the customers by doing:
select cid, avg(age)
from (select distinct co.cid, cu.cuid, cu.age
from Cooked Co join
Customers Cu
on Co.orderid = Cu.orderid
) c
group by cid;
Then, for the difference, you need to bring that information in as well. One method is in the subquery:
select cid, ( cage - avg(cuage) ) as diff
from (select distinct co.cid, cu.cuid, cu.age as cuage, ch.age as cage
from Chef ch join
Cooked Co
on ch.cid = co.cid join
Customers Cu
on Co.orderid = Cu.orderid
) c
group by cid, cage;
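As a quick check of the de-duplication step, the sketch below runs the DISTINCT-subquery approach with Python's built-in sqlite3 module standing in for the asker's database. The ages come from the question's worked arithmetic (customer 200 is 40, Chef 1 is 50, Chef 2 is 47); the chef names, recipe ids and prices are placeholders invented for the sketch.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the real DBMS.
# Names, recipe ids and prices are placeholder values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Chef (cid INTEGER PRIMARY KEY, cname TEXT, age INTEGER);
    CREATE TABLE Cooked (orderid INTEGER PRIMARY KEY, cid INTEGER, rid INTEGER, price REAL);
    CREATE TABLE Customers (cuid INTEGER, orderid INTEGER, age INTEGER);
    INSERT INTO Chef VALUES (1, 'chef_one', 50), (2, 'chef_two', 47);
    INSERT INTO Cooked VALUES (1, 1, 10, 5.0), (2, 1, 11, 5.0),
                              (5, 1, 12, 5.0), (3, 2, 10, 5.0);
    INSERT INTO Customers VALUES (333, 1, 20), (200, 2, 40), (200, 5, 40), (4, 3, 36);
""")

# SELECT DISTINCT collapses repeat orders by the same customer for the same
# chef, so each customer contributes to the average exactly once.
QUERY = """
    SELECT cid, cage - AVG(cuage) AS diff
    FROM (
        SELECT DISTINCT co.cid, cu.cuid, cu.age AS cuage, ch.age AS cage
        FROM Chef ch
        JOIN Cooked co ON ch.cid = co.cid
        JOIN Customers cu ON co.orderid = cu.orderid
    ) d
    GROUP BY cid, cage
    ORDER BY cid
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Customer 200 appears twice for Chef 1 in the raw join but only once after DISTINCT, which is why the difference for Chef 1 comes out as 20 rather than 16.667.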

MySQL MAX_JOIN_SIZE error

I have two tables. One is a call history table which logs calls made (starttime, endtime, phone number, user, etc). The other is an orders table which logs order details (order number, customer info, orderdate, etc.). Orders are not always created when a call is created, so there isn't a guaranteed ID to match them up. Right now, I'm interested in getting totals by day. When I try to run a query to sum calls and join orders by day I get the following error:
The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay
This is the query I use:
SELECT
DATE_FORMAT(c.date_call_start,'%Y-%m-%d') as date,
COUNT(c.id) as calls,
COUNT(o.id) as orders
FROM tbl_calls c
LEFT OUTER JOIN tbl_orders o
ON DATE_FORMAT(c.date_call_start,'%Y-%m-%d') = DATE_FORMAT(o.created,'%Y-%m-%d')
WHERE c.campaign_id = 1
AND DATE_FORMAT(c.date_call_start,'%Y-%m-%d') = '2013-12-09'
GROUP BY DATE_FORMAT(c.date_call_start,'%Y-%m-%d')
Even when there are only a few calls for a particular day, it still shows the same error. So I'm pretty sure it is my query that needs work.
I have also tried a sub query, but that doesn't rollup the totals from the subquery.
SELECT
DATE_FORMAT(c.date_call_start,'%Y-%m-%d') as date,
count(c.id) as calls,
(select count(DISTINCT o.id)
FROM tbl_orders o
WHERE DATE_FORMAT(o.created,'%Y-%m-%d') = DATE_FORMAT(c.date_call_start,'%Y-%m-%d')
) as orders
FROM tbl_calls c
WHERE c.campaign_id = 1
AND DATE_FORMAT(c.date_call_start,'%Y-%m-%d') BETWEEN '2013-12-09' AND '2013-12-15'
GROUP BY DATE_FORMAT(c.date_call_start,'%Y-%m-%d')
WITH ROLLUP
Any thoughts on how I can get this query to work? Ultimately I'd like a result like below so I can do other calculations like % orders etc.
date | calls | orders
------------------------------------
2013-12-01 | 100| 10
2013-12-02 | 125| 20
NULL | 225| 30
UPDATED:
Based on the answer I did the following:
created call_date field with a date field (no datetime) to tbl_calls
created date_order field with a date format (not datetime) to tbl_orders
Updated each table and set the new fields to = date_format(the_date_time_stamp,'%Y-%m-%d') from the same table.
Also added an index to each of the new date fields.
That made the following query work:
SELECT
c.call_date as date,
COUNT(DISTINCT c.id) as calls,
COUNT(DISTINCT o.id) as orders,
ROUND((COUNT(DISTINCT o.id) / COUNT(DISTINCT c.id))*100,2) as conversion
FROM tbl_calls c
JOIN tbl_orders o
ON c.call_date = o.date_order
WHERE c.campaign_id = 1
AND c.call_date BETWEEN '2013-12-09' AND '2013-12-15'
GROUP BY c.call_date
WITH ROLLUP
Which gives me the following result and I can build off of this. Thanks to each of you who provided suggestions. I tried each. All make sense. However, since I ultimately had to create the additional date fields I chose the answer by
date | calls | orders| conversion
-------------------------------------------
2013-12-09 | 151 | 6 | 3.97
2013-12-10 | 164 | 2 | 1.22
2013-12-11 | 165 | 6 | 3.64
2013-12-12 | 189 | 1 | 0.53
2013-12-13 | 116 | 4 | 3.45
null | 785 | 19 | 2.42
First - try the results of EXPLAIN SELECT.... where ... is the rest of your select query above.
Since you're performing the join on two fields which have a function applied to them, I'll take a guess and say MySQL is performing two full table scans and using type ALL for the join. See this for an explanation of the EXPLAIN output.
DATE_FORMAT(c.date_call_start,'%Y-%m-%d') = DATE_FORMAT(o.created,'%Y-%m-%d')
You'll most likely want to create a separate field in each table that contains just the result of the DATE_FORMAT call. Then create an index for each of these new fields. Then join on these new indexed fields. MySQL should like that much better.
Presumably you want to count the calls and orders for each date. However, that is not what your query does, because it creates a cartesian product for all orders on a given date.
Instead, summarize the data first by date and then combine the results. This may be what you want:
select c.date, calls, orders
from (select DATE_FORMAT(c.date_call_start, '%Y-%m-%d') as date, count(*) as calls
from tbl_calls c
WHERE c.campaign_id = 1 and
DATE_FORMAT(c.date_call_start, '%Y-%m-%d') = '2013-12-09'
group by DATE_FORMAT(c.date_call_start, '%Y-%m-%d')
) c left outer join
(select DATE_FORMAT(o.created,'%Y-%m-%d') as date, count(*) as orders
from tbl_orders o
group by DATE_FORMAT(o.created, '%Y-%m-%d')
) o
on c.date = o.date;
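The difference between joining first and summarising first can be seen on a tiny made-up dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with SQLite's DATE() in place of DATE_FORMAT(); the three calls and two orders are invented rows, and call_day is an alias chosen for the sketch.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
# The rows are invented: three calls and two orders on the same day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_calls (id INTEGER PRIMARY KEY, campaign_id INTEGER, date_call_start TEXT);
    CREATE TABLE tbl_orders (id INTEGER PRIMARY KEY, created TEXT);
    INSERT INTO tbl_calls VALUES (1, 1, '2013-12-09 09:00:00'),
                                 (2, 1, '2013-12-09 10:30:00'),
                                 (3, 1, '2013-12-09 16:45:00');
    INSERT INTO tbl_orders VALUES (1, '2013-12-09 09:05:00'),
                                  (2, '2013-12-09 17:00:00');
""")

# Joining on the day pairs every call with every order of that day,
# so both counts reflect the size of the cross product.
JOIN_FIRST = """
    SELECT DATE(c.date_call_start) AS call_day, COUNT(c.id), COUNT(o.id)
    FROM tbl_calls c
    LEFT JOIN tbl_orders o ON DATE(c.date_call_start) = DATE(o.created)
    WHERE c.campaign_id = 1
    GROUP BY DATE(c.date_call_start)
"""

# Summarising each table per day first leaves one row per day on each side,
# so the join can no longer multiply rows.
SUMMARISE_FIRST = """
    SELECT c.call_day, c.calls, o.orders
    FROM (SELECT DATE(date_call_start) AS call_day, COUNT(*) AS calls
          FROM tbl_calls
          WHERE campaign_id = 1
          GROUP BY DATE(date_call_start)) c
    LEFT JOIN (SELECT DATE(created) AS call_day, COUNT(*) AS orders
               FROM tbl_orders
               GROUP BY DATE(created)) o ON c.call_day = o.call_day
"""

join_first = conn.execute(JOIN_FIRST).fetchall()
summarise_first = conn.execute(SUMMARISE_FIRST).fetchall()
print("join first:     ", join_first)
print("summarise first:", summarise_first)
```

With three calls and two orders on the same day, the join-first version counts six of each, while the summarise-first version reports three calls and two orders.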
If @Barmar's suggestion does not work, then you may need to split the fields into DATE and TIME.
A different direction is to make two temp tables (giving you three queries):
CREATE TEMPORARY TABLE `tbl_calls_temp` SELECT * FROM tbl_calls c WHERE DATE(c.date_call_start) = '2013-12-09' AND c.campaign_id = 1
Then do the same restricting for the tbl_orders TABLE
CREATE TEMPORARY TABLE `tbl_orders_temp` SELECT * FROM tbl_orders o WHERE DATE(o.created) = '2013-12-09'
Finally query against the two temporary tables. Depending on how much data you get, you may want to add indexes to the temporary tables... but in all likelihood you are facing a full-join
SELECT
DATE_FORMAT(c.date_call_start,'%Y-%m-%d') as date,
COUNT(c.id) as calls,
COUNT(o.id) as orders
FROM tbl_calls_temp c
LEFT OUTER JOIN tbl_orders_temp o
ON DATE_FORMAT(c.date_call_start,'%Y-%m-%d') = DATE_FORMAT(o.created,'%Y-%m-%d')
GROUP BY DATE_FORMAT(c.date_call_start,'%Y-%m-%d')
And that should be much faster... assuming you have any indexes in your initial tables that can be queried.