using aggregate functions before joining the tables


I have two tables that I join on customer_id.
The first table is deal, where I store the data for each deal. Every deal has volume, rest, pay, etc.
The second table is handle. Its purpose is hard to explain, but it mirrors the deal table and has volume_handle, rest_handle, pay_handle, etc.
I have to use a LEFT JOIN because I want all records from deal plus the matching records from handle.
I want to sum volume and rest from deal and sum volume_handle from handle. The tables are related through deal.customer_id and handle.buy_id.
For example, the deal table:
id = 1
volume = 1000
rest = 1000
customer_id = 1
---------------
id = 2
volume = 500
rest = 0
customer_id = 1
---------------
id = 3
volume = 2000
rest = 0
customer_id = 2
And the handle table is:
id = 1
volume_handle = 3000
buy_id = 1
The query I wrote is:
select sum(deal.rest) as rest , sum(deal.volume) as volume , sum(handle.volume_handle) as handle
from deal
left join handle on deal.customer_id = handle.buy_id
group by deal.customer_id;
And the result of this query is:
//when customer_id is 1
volume = 1500
rest = 1000
handle = 6000
//when customer_id is 2
volume = 2000
rest = 0
handle = null
The volume and rest values are right, but the handle value from the second table is wrong: sum(handle.volume_handle) should be 3000, not 6000 (when customer_id is 1).
I don't know how to apply aggregate functions before joining the tables.
Can anyone write the query for this problem?

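Because the join repeats each handle row once for every deal row with the same customer_id (customer 1 has two deals, so its single 3000 is counted twice), and because handle can itself hold several rows per buy_id, you need to aggregate handle before you JOIN it to deal. Something like this: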
SELECT d.customer_id,
SUM(d.rest) AS rest,
SUM(d.volume) AS volume,
MAX(h.volume_handle) AS handle
FROM deal d
LEFT JOIN (SELECT buy_id, SUM(volume_handle) AS volume_handle
FROM handle
GROUP BY buy_id) h ON h.buy_id = d.customer_id
GROUP BY d.customer_id
Output:
customer_id rest volume handle
1 1000 1500 3000
2 0 2000 null
Note that I have used MAX around h.volume_handle. This won't change the result (after the subquery there is at most one h row per customer, so all the values MAX sees in a group are the same), but it is required to avoid only_full_group_by errors.
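To see the difference side by side, here is a minimal sketch that loads the question's sample rows and runs both the original query and the pre-aggregated one. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the SQL is the same as in the question and the answer above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the queries are unchanged.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deal   (id INTEGER PRIMARY KEY, volume INTEGER, rest INTEGER, customer_id INTEGER);
    CREATE TABLE handle (id INTEGER PRIMARY KEY, volume_handle INTEGER, buy_id INTEGER);
    INSERT INTO deal VALUES (1, 1000, 1000, 1), (2, 500, 0, 1), (3, 2000, 0, 2);
    INSERT INTO handle VALUES (1, 3000, 1);
""")

# Join first, aggregate afterwards: the handle row is repeated per deal row.
naive = conn.execute("""
    SELECT d.customer_id, SUM(d.rest), SUM(d.volume), SUM(h.volume_handle)
    FROM deal d
    LEFT JOIN handle h ON h.buy_id = d.customer_id
    GROUP BY d.customer_id
    ORDER BY d.customer_id
""").fetchall()

# Aggregate handle first, then join: at most one h row per customer.
pre_aggregated = conn.execute("""
    SELECT d.customer_id, SUM(d.rest), SUM(d.volume), MAX(h.volume_handle)
    FROM deal d
    LEFT JOIN (SELECT buy_id, SUM(volume_handle) AS volume_handle
               FROM handle
               GROUP BY buy_id) h ON h.buy_id = d.customer_id
    GROUP BY d.customer_id
    ORDER BY d.customer_id
""").fetchall()

print(naive)           # [(1, 1000, 1500, 6000), (2, 0, 2000, None)]
print(pre_aggregated)  # [(1, 1000, 1500, 3000), (2, 0, 2000, None)]
```

The columns are customer_id, rest, volume, handle. Because the derived table returns one row per buy_id, the sums over deal are not inflated either, even if handle later gets several rows for the same customer.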

Related

Filter large number of records on mysql when using INNER JOIN with two fields

I'm working on an existing database with millions of inserts per day. The database design itself is pretty bad, and filtering records from it takes a huge amount of time. We are in the process of moving this to an ELK cluster, but in the meantime I have to filter some records for immediate use.
I have two tables like this
table - log_1
datetime | id | name | ip
2017-01-01 01:01:00 | 12345 | sam | 192.168.100.100
table - log_2
datetime | mobile | id
2017-01-01 01:01:00 | 999999999 | 12345
I need to filter my data by ip from log_1 and match on datetime in both log_1 and log_2. To do that I use the query below:
SELECT log_1.datetime, log_1.id, log_1.name, log_1.ip, log_2.datetime, log_2.mobile, log_2.id
FROM log_1
INNER JOIN log_2
ON log_1.id = log_2.id AND log_1.datetime = log_2.datetime
where log_1.ip = '192.168.100.100'
limit 100
Needless to say, this takes forever to return results with such a large number of records. Is there a better way to do the same thing without waiting so long for MySQL to respond? In other words, how can I optimize my query against such a large database?
The database is not in production; it is used only for analytics.

First of all, your current LIMIT clause is fairly meaningless, because the query has no ORDER BY clause. It is not clear which 100 records you want to retain. So, you might want to use something like this:
SELECT
l1.datetime,
l1.id,
l1.name,
l1.ip,
l2.datetime,
l2.mobile,
l2.id
FROM log_1 l1
INNER JOIN log_2 l2
ON l1.id = l2.id AND l1.datetime = l2.datetime
WHERE
l1.ip = '192.168.100.100'
ORDER BY
l1.datetime DESC
LIMIT 100;
This would return the 100 most recent matching records. As for speeding up this query, one way to at least make the join faster would be to add the following index on the log_2 table:
CREATE INDEX idx ON log_2 (datetime, id, mobile);
Assuming MySQL chooses to use this index, it should make the join much faster, because each id and datetime value can be looked up in a B-tree instead of scanning the entire table. Note that the index also covers the mobile column, which is needed in the select.
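Here is a small sketch of that setup, using Python's built-in sqlite3 module in place of MySQL (an assumption, so the plan it prints is SQLite's and only illustrative). The sample rows beyond the one in the question are made up, and the second index on log_1 (ip, datetime) is a hypothetical addition that is not part of the answer; it is there because the ip filter otherwise still has to scan log_1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_1 (datetime TEXT, id INTEGER, name TEXT, ip TEXT);
    CREATE TABLE log_2 (datetime TEXT, mobile TEXT, id INTEGER);
    INSERT INTO log_1 VALUES
        ('2017-01-01 01:01:00', 12345, 'sam', '192.168.100.100'),
        ('2017-01-01 01:02:00', 12346, 'ann', '192.168.100.101'),
        ('2017-01-02 09:30:00', 12345, 'sam', '192.168.100.100');
    INSERT INTO log_2 VALUES
        ('2017-01-01 01:01:00', '999999999', 12345),
        ('2017-01-01 01:02:00', '888888888', 12346),
        ('2017-01-02 09:30:00', '999999999', 12345);
    -- Index from the answer: join columns first, then the selected column.
    CREATE INDEX idx ON log_2 (datetime, id, mobile);
    -- Hypothetical extra index (not in the answer) to support the ip filter.
    CREATE INDEX idx_log_1_ip ON log_1 (ip, datetime);
""")

QUERY = """
    SELECT l1.datetime, l1.id, l1.name, l1.ip, l2.datetime, l2.mobile, l2.id
    FROM log_1 l1
    INNER JOIN log_2 l2 ON l1.id = l2.id AND l1.datetime = l2.datetime
    WHERE l1.ip = ?
    ORDER BY l1.datetime DESC
    LIMIT 100
"""
params = ("192.168.100.100",)

rows = conn.execute(QUERY, params).fetchall()
# Ask the engine how it intends to run the query; the last field of each
# plan row is a human-readable description of that step.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, params).fetchall()

for row in rows:
    print(row)
for step in plan:
    print(step[-1])
```

On MySQL the equivalent check is to prefix the query with EXPLAIN and see whether the key column names the index you expect.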

Can you try this:
1. Create an index on the id column of both tables if one does not already exist (this will take time).
2. Create two temp tables, log_1_tmp and log_2_tmp, and fill them as below. Note that log_2 has no ip column, so the second query filters on the ids already copied into log_1_tmp:
Query 1 - insert into log_1_tmp select * from log_1 where log_1.ip = '192.168.100.100'
Query 2 - insert into log_2_tmp select * from log_2 where log_2.id in (select id from log_1_tmp)
3. Run your query on these two tables; you can then remove the where condition from it.
See if this works.

Joining and selecting multiple tables and creating new column names

I have very limited experience with MySQL past standard queries, but when it comes to joins and relations between multiple tables I have a bit of an issue.
I've been tasked with creating a job that will pull a few values from a mysql database every 15 minutes but the info it needs to display is pulled from multiple tables.
I have worked with it for a while to figure out the relationships between everything for the phone system and I have discovered how I need to pull everything out but I'm trying to find the right way to create the job to do the joins.
I'm thinking of creating a new table for the info I need, with columns named as:
Extension | Total Talk Time | Total Calls | Outbound Calls | Inbound Calls | Missed Calls
I know that I need to start with the extension ID from my 'user' table and match it with 'extensionID' in my 'callSession'. There may be multiple instances of each extensionID but each instance creates a new 'UniqueCallID'.
The 'UniqueCallID' field then matches to 'UniqueCallID' in my 'CallSum' table. At that point, I just need to be able to say "For each 'uniqueCallID' that is associated with the same 'extensionID', get the sum of all instances in each column or a count of those instances".
Here is an example of what I need it to do:
callSession table
UniqueCallID | extensionID
----------------------------
A            | 123
B            | 123
C            | 123
callSum table
UniqueCallID | Duration | Answered
------------------------------------
A            | 10       | 1
B            | 5        | 1
C            | 15       | 0
newReport table
Extension | Total Talk Time | Total Calls | Missed Calls
--------------------------------------------------------
123       | 30              | 3           | 1
Hopefully that conveys my idea properly.
If I create a table to hold these values, I need to know how I would select, join and insert those things based on that diagram but I'm unable to construct the right query/statement.

You simply JOIN the two tables and GROUP BY the extensionID, then use aggregate functions to summarize the info.
SELECT
`extensionID` AS `Extension`,
SUM(`Duration`) AS `Total Talk Time`,
COUNT(DISTINCT a.`UniqueCallID`) AS `Total Calls`,
SUM(IF(`Answered` = 1,0,1)) AS `Missed Calls`
FROM `callSession` a
JOIN `callSum` b
ON a.`UniqueCallID` = b.`UniqueCallID`
GROUP BY a.`extensionID`
ORDER BY a.`extensionID`

You can use a join and group by
select
a.extensionID
, sum(b.Duration) as Total_Talk_Time
, count(b.Answered) as Total_Calls
, count(b.Answered) -sum(b.Answered) as Missed_calls
from callSession as a
inner join callSum as b on a.UniqueCallID = b.UniqueCallID
group by a.extensionID

This should do the trick. What you are being asked to do is to aggregate the number and duration of calls. Unless explicitly requested, you do not need to create a new table to do this. The right combination of JOINs and aggregate functions will get the information you need. This should be pretty straightforward... the only semi-interesting part is calculating the number of missed calls, which is accomplished here using a CASE expression as a conditional check on whether each call was answered or not.
Pardon my syntax... My experience is with SQL Server.
SELECT CS.extensionID AS Extension,
SUM(CA.Duration) AS `Total Talk Time`,
COUNT(CS.UniqueCallID) AS `Total Calls`,
SUM(CASE WHEN CA.Answered = 0 THEN 1 ELSE 0 END) AS `Missed Calls`
FROM callSession CS
INNER JOIN callSum CA ON CA.UniqueCallID = CS.UniqueCallID
GROUP BY CS.extensionID
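If you do want to keep the stored newReport table for the 15-minute job, the same aggregate query can feed an INSERT ... SELECT. The sketch below is only one way to do it: it uses Python's built-in sqlite3 module instead of MySQL so it runs anywhere, the newReport column names are assumptions (spaces removed), and refresh_report is a hypothetical helper that simply rebuilds the table on each run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE callSession (UniqueCallID TEXT PRIMARY KEY, extensionID INTEGER);
    CREATE TABLE callSum (UniqueCallID TEXT PRIMARY KEY, Duration INTEGER, Answered INTEGER);
    -- Hypothetical report table; the column names are assumptions.
    CREATE TABLE newReport (Extension INTEGER PRIMARY KEY, TotalTalkTime INTEGER,
                            TotalCalls INTEGER, MissedCalls INTEGER);
    INSERT INTO callSession VALUES ('A', 123), ('B', 123), ('C', 123);
    INSERT INTO callSum VALUES ('A', 10, 1), ('B', 5, 1), ('C', 15, 0);
""")

def refresh_report(conn):
    """Rebuild newReport from scratch inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM newReport")
        conn.execute("""
            INSERT INTO newReport (Extension, TotalTalkTime, TotalCalls, MissedCalls)
            SELECT a.extensionID,
                   SUM(b.Duration),
                   COUNT(*),
                   SUM(CASE WHEN b.Answered = 0 THEN 1 ELSE 0 END)
            FROM callSession a
            JOIN callSum b ON b.UniqueCallID = a.UniqueCallID
            GROUP BY a.extensionID
        """)

refresh_report(conn)
report = conn.execute("SELECT * FROM newReport").fetchall()
print(report)  # [(123, 30, 3, 1)]
```

Running refresh_report again gives the same rows, so the job can be scheduled without worrying about duplicates.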

Compare 2 fields of table A with 6 fields of table B

I've got two tables in my MySQL DB. One contains requiredSkill1, requiredSkillLevel1, requiredSkill2, requiredSkillLevel2, requiredSkill3 and requiredSkillLevel3.
The other table has X rows per user with the following columns: skill and level.
itemid | requiredSkill1 | requiredSkillLevel1 | requiredSkill2 | requiredSkillLevel2 | requiredSkill3 | requiredSkillLevel3
2410 | 3319 | 4 | 20211 | 1 | NULL | NULL
The other table:
userid | skill | level
21058 | 3412 | 4
21058 | 3435 | 2
21058 | 3312 | 4
Keep in mind, these are just examples.
I want every itemid whose requiredSkill{1-3} and requiredSkillLevel{1-3} values are matched by the user's skill and level rows.
Is this even possible with a single query, and will it still perform well? The user table contains up to 300 rows per user and the item table has a fixed 6000 rows. This will be used in a web application, so I can use Ajax to load ranges of items from the database to decrease loading time.

I don't have the data set up. A SQL Fiddle would be helpful, but I think you want to approach it like this:
SELECT itemid FROM items i
INNER JOIN users u1 ON u1.skill = i.requiredSkill1 AND u1.level >= i.requiredSkillLevel1
INNER JOIN users u2 ON u2.skill = i.requiredSkill2 AND u2.level >= i.requiredSkillLevel2 AND u1.userid = u2.userid
INNER JOIN users u3 ON u3.skill = i.requiredSkill3 AND u3.level >= i.requiredSkillLevel3 AND u3.userid = u1.userid
Someone will solve this for you if you post demo data.
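One thing to watch with the INNER JOIN version is the NULLs: an item like 2410 that only uses two of the three skill slots can never match the u3 join. A possible variant, sketched below, uses LEFT JOINs for one given user and then checks in WHERE that every non-NULL requirement found a row. It runs on Python's built-in sqlite3 module instead of MySQL, items 2411 to 2413 are made-up rows, items_for_user is a hypothetical helper, it keeps the answer's >= comparison on level, and it assumes a user has at most one row per skill.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (itemid INTEGER PRIMARY KEY,
                        requiredSkill1 INTEGER, requiredSkillLevel1 INTEGER,
                        requiredSkill2 INTEGER, requiredSkillLevel2 INTEGER,
                        requiredSkill3 INTEGER, requiredSkillLevel3 INTEGER);
    CREATE TABLE users (userid INTEGER, skill INTEGER, level INTEGER);
    INSERT INTO items VALUES
        (2410, 3319, 4, 20211, 1, NULL, NULL),  -- row from the question
        (2411, 3412, 4, 3435, 2, 3312, 4),      -- made up: all three met
        (2412, 3412, 5, NULL, NULL, NULL, NULL),-- made up: level too high
        (2413, 3435, 1, NULL, NULL, NULL, NULL);-- made up: one requirement
    INSERT INTO users VALUES (21058, 3412, 4), (21058, 3435, 2), (21058, 3312, 4);
""")

def items_for_user(conn, userid):
    """Return itemids whose non-NULL requirements are all met by userid."""
    rows = conn.execute("""
        SELECT i.itemid
        FROM items i
        LEFT JOIN users u1 ON u1.userid = :uid AND u1.skill = i.requiredSkill1
                          AND u1.level >= i.requiredSkillLevel1
        LEFT JOIN users u2 ON u2.userid = :uid AND u2.skill = i.requiredSkill2
                          AND u2.level >= i.requiredSkillLevel2
        LEFT JOIN users u3 ON u3.userid = :uid AND u3.skill = i.requiredSkill3
                          AND u3.level >= i.requiredSkillLevel3
        -- An unused slot (NULL) passes; a used slot must have found a row.
        WHERE (i.requiredSkill1 IS NULL OR u1.userid IS NOT NULL)
          AND (i.requiredSkill2 IS NULL OR u2.userid IS NOT NULL)
          AND (i.requiredSkill3 IS NULL OR u3.userid IS NOT NULL)
        ORDER BY i.itemid
    """, {"uid": userid}).fetchall()
    return [r[0] for r in rows]

matching = items_for_user(conn, 21058)
print(matching)  # [2411, 2413]
```

Item 2410 is left out because user 21058 does not have skill 3319, and 2412 because level 4 is below the required 5.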

mysql update with a self referencing query

I have a table of surveys which contains (amongst others) the following columns
survey_id - unique id
user_id - the id of the person the survey relates to
created - datetime
ip_address - of the submission
ip_count - the number of duplicates
Due to the large record set, it's impractical to run this query on the fly, so I'm trying to create an update statement that will periodically store a "cached" result in ip_count.
The purpose of ip_count is to show the number of duplicate ip_address survey submissions that have been received for the same user_id within a 12-month period (+/- 6 months of the created date).
Using the following dataset, this is the expected result.
survey_id | user_id | created | ip_address | ip_count | counted duplicate survey_ids
1 | 1 | 01-Jan-12 | 123.132.123 | 1 | 2
2 | 1 | 01-Apr-12 | 123.132.123 | 2 | 1, 3
3 | 2 | 01-Jul-12 | 123.132.123 | 0 |
4 | 1 | 01-Aug-12 | 123.132.123 | 3 | 2, 6
6 | 1 | 01-Dec-12 | 123.132.123 | 1 | 4
This is the closest solution I have come up with so far, but this query fails to take the date restriction into account, and I am struggling to come up with an alternative method.
UPDATE surveys
JOIN(
SELECT ip_address, created, user_id, COUNT(*) AS total
FROM surveys
WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed
GROUP BY ip_address, user_id
) AS ipCount
ON (
ipCount.ip_address = surveys.ip_address
AND ipCount.user_id = surveys.user_id
AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH)
)
SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id.
WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address
Thank you for your help in advance :)

A few (very) minor tweaks to what is shown above. Thank you again!
UPDATE surveys AS s
INNER JOIN (
SELECT x, count(*) c
FROM (
SELECT s1.id AS x, s2.id AS y
FROM surveys AS s1, surveys AS s2
WHERE s1.state IN (1, 3) # completed and verified
AND s1.id != s2.id # dont self join
AND s1.ip_address != "" AND s1.ip_address IS NOT NULL # not interested in blank entries
AND s1.ip_address = s2.ip_address
AND (s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH))
AND s1.user_id = s2.user_id # where completed for the same user
) AS ipCount
GROUP BY x
) n on s.id = n.x
SET s.ip_count = n.c

I don't have your table with me, so it's hard for me to write SQL that definitely works, but I can take a shot at this and hopefully help you.
First I would need to take the cartesian product of surveys against itself and filter out the rows I don't want
SELECT s1.survey_id x, s2.survey_id y
FROM surveys s1, surveys s2
WHERE s1.survey_id != s2.survey_id
AND s1.user_id = s2.user_id
AND s1.ip_address = s2.ip_address
AND s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH)
The output of this should contain every pair of surveys that match (according to your rules) TWICE (once for each id in the 1st position and once for it to be in the 2nd position)
Then we can do a GROUP BY on the output of this to get a table that basically gives me the correct ip_count for each survey_id
(SELECT x, COUNT(*) c
FROM (SELECT s1.survey_id x, s2.survey_id y
FROM surveys s1, surveys s2
WHERE s1.survey_id != s2.survey_id
AND s1.user_id = s2.user_id
AND s1.ip_address = s2.ip_address
AND s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH)) AS pairs
GROUP BY x)
So now we have a table mapping each survey_id to its correct ip_count. To update the original table, we need to join that against this and copy the values over
So that should look something like
UPDATE surveys s INNER JOIN (ABOVE QUERY) n ON s.survey_id = n.x SET s.ip_count = n.c
There is some pseudo code in there, but I think the general idea should work.
I have never had to update a table from the output of another query myself. I first guessed the syntax from the question "How do I UPDATE from a SELECT in SQL Server?", but MySQL does not accept UPDATE ... SET ... FROM; it puts the JOIN before SET, as in the statement above.
Also, if I needed to do something like this for my own work, I wouldn't attempt to do it in a single query. This would be a pain to maintain and might have memory/performance issues. It would be best to have a script traverse the table row by row, updating a single row in a transaction before moving on to the next row. Much slower, but simpler to understand and possibly lighter on your database.
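For anyone who wants to check the counting logic on the question's sample rows, here is a rough sketch using Python's built-in sqlite3 module. It is not the MySQL statement: SQLite has no UPDATE ... JOIN, so this swaps in a correlated subquery, uses SQLite's date() modifiers instead of INTERVAL, stores the dates as ISO strings, and leaves out the state filter. All of those are assumptions made to keep it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE surveys (survey_id INTEGER PRIMARY KEY, user_id INTEGER,
                          created TEXT, ip_address TEXT,
                          ip_count INTEGER DEFAULT 0);
    INSERT INTO surveys (survey_id, user_id, created, ip_address) VALUES
        (1, 1, '2012-01-01', '123.132.123'),
        (2, 1, '2012-04-01', '123.132.123'),
        (3, 2, '2012-07-01', '123.132.123'),
        (4, 1, '2012-08-01', '123.132.123'),
        (6, 1, '2012-12-01', '123.132.123');
""")

with conn:  # one transaction for the whole refresh
    conn.execute("""
        UPDATE surveys
        SET ip_count = (
            -- Other surveys by the same user from the same ip,
            -- created within 6 months either side of this one.
            SELECT COUNT(*)
            FROM surveys AS s2
            WHERE s2.survey_id != surveys.survey_id
              AND s2.user_id = surveys.user_id
              AND s2.ip_address = surveys.ip_address
              AND s2.created BETWEEN date(surveys.created, '-6 months')
                                 AND date(surveys.created, '+6 months')
        )
        WHERE ip_address IS NOT NULL AND ip_address != ''
    """)

counts = conn.execute(
    "SELECT survey_id, ip_count FROM surveys ORDER BY survey_id"
).fetchall()
print(counts)  # [(1, 1), (2, 2), (3, 0), (4, 2), (6, 1)]
```

Note that with these rules survey 4 comes out as 2 (surveys 2 and 6), whereas the table in the question lists 3 for it, so it is worth double-checking which one is intended.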

Mysql subquery with joins

I have a table 'service' which contains details about serviced vehicles. It has an id and Vehicle_registrationNumber, which is a foreign key. Whenever a vehicle is serviced, a new record is made. So, for example, if I service the car with registration ABCD, it creates a new row in which I set car_reg, date and the car's mileage (id is set to auto-increment), e.g. 12 | 20/01/2012 | ABCD | 1452; another service for the same car creates the row 15 | 26/01/2012 | ABCD | 4782.
Now I want to check if the car needs a service (the last service was 6 or more months ago, or the current mileage of the car is more than 1000 miles above the mileage at the last service). To do that I need to know the date of the last service and the mileage of the car at the last service. So I want to create a subquery that returns one row for each car, and the row I'm interested in is the newest one (either with the greatest id or the latest endDate). I also need to join it with other tables because I need this for my view (I use CodeIgniter, but I don't know if it's possible to write subqueries using CI's ActiveRecord class).
SELECT * FROM (
SELECT *
FROM (`service`)
JOIN `vehicle` ON `service`.`Vehicle_registrationNumber` = `vehicle`.`registrationNumber`
JOIN `branch_has_vehicle` ON `branch_has_vehicle`.`Vehicle_registrationNumber` = `vehicle`.`registrationNumber`
JOIN `branch` ON `branch`.`branchId` = `branch_has_vehicle`.`Branch_branchId`
GROUP BY `service`.`Vehicle_registrationNumber` )
AS temp
WHERE `vehicle`.`available` != 'false'
AND `service`.`endDate` <= '2011-07-20 20:43'
OR service.serviceMileage < vehicle.mileage - 10000

SELECT `service`.`Vehicle_registrationNumber`, Max(`service`.`endDate`) as lastService,
MAX(service.serviceMileage) as lastServiceMileage, vehicle.*
FROM `service`
INNER JOIN `vehicle`
ON `service`.`Vehicle_registrationNumber` = `vehicle`.`registrationNumber`
INNER JOIN `branch_has_vehicle`
ON `branch_has_vehicle`.`Vehicle_registrationNumber` = `vehicle`.`registrationNumber`
INNER JOIN `branch`
ON `branch`.`branchId` = `branch_has_vehicle`.`Branch_branchId`
WHERE vehicle.available != 'false'
GROUP BY `service`.`Vehicle_registrationNumber`
HAVING lastService<=DATE_SUB(CURDATE(),INTERVAL 6 MONTH)
OR lastServiceMileage < vehicle.mileage - 10000
;
I hope I have no typo in it ..

Instead of using * in the subquery, list the fields you need (which is good practice anyway) and use MAX(), which most databases provide, to return the latest value within each group.
Actually, you don't even need the subquery. You can do the joins and use MAX in the SELECT statement. Then you can do something like
SELECT ...., MAX(`service`.`endDate`) AS LAST_SERVICE
...
GROUP BY `service`.`Vehicle_registrationNumber`
Or am I missing something?
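To make the "one row per car, then compare" idea concrete, here is a small sketch that aggregates service per vehicle first and then joins that result to vehicle. It runs on Python's built-in sqlite3 module rather than MySQL, the vehicle rows and the services for EFGH and IJKL are made up, vehicles_due and its parameters are hypothetical, and the date is passed in so the result does not depend on today's date. It uses the 1000-mile threshold from the question's text (the queries above use 10000) and assumes the mileage never goes down, so MAX(serviceMileage) is the mileage at the last service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle (registrationNumber TEXT PRIMARY KEY,
                          mileage INTEGER, available TEXT);
    CREATE TABLE service (id INTEGER PRIMARY KEY, endDate TEXT,
                          Vehicle_registrationNumber TEXT, serviceMileage INTEGER);
    INSERT INTO vehicle VALUES
        ('ABCD', 5000, 'true'), ('EFGH', 20000, 'true'), ('IJKL', 1500, 'true');
    INSERT INTO service VALUES
        (12, '2012-01-20', 'ABCD', 1452),
        (15, '2012-01-26', 'ABCD', 4782),
        (16, '2011-03-01', 'EFGH', 19500),
        (17, '2012-01-10', 'IJKL', 100);
""")

def vehicles_due(conn, today, max_miles=1000):
    """Vehicles whose last service is 6+ months old or max_miles+ miles ago."""
    return conn.execute("""
        SELECT v.registrationNumber, ls.lastService, ls.lastServiceMileage
        FROM vehicle v
        JOIN (SELECT Vehicle_registrationNumber,
                     MAX(endDate) AS lastService,
                     MAX(serviceMileage) AS lastServiceMileage
              FROM service
              GROUP BY Vehicle_registrationNumber) ls
          ON ls.Vehicle_registrationNumber = v.registrationNumber
        WHERE v.available != 'false'
          -- Parentheses matter: AND binds tighter than OR.
          AND (ls.lastService <= date(:today, '-6 months')
               OR ls.lastServiceMileage < v.mileage - :max_miles)
        ORDER BY v.registrationNumber
    """, {"today": today, "max_miles": max_miles}).fetchall()

due = vehicles_due(conn, "2012-02-01")
print(due)  # [('EFGH', '2011-03-01', 19500), ('IJKL', '2012-01-10', 100)]
```

EFGH is due by date, IJKL by mileage, and ABCD is not due because only its latest service (id 15) is considered.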
