I'm very new to SQL/MySQL (and to Stack Overflow, for that matter), and I'm trying to create a query through iReport (though I don't have to use iReport) for SugarCRM CE. I need a report that displays the number of "Referrals", "Voicemails", "Emails", and "Call_ins" linked to a specific "user" (employee). The query I currently have works; however, it runs through the data multiple times and generates a report that is 200+ pages long. This is the code I am currently using:
SELECT
( SELECT COUNT(*) FROM `leads` INNER JOIN `leads_cstm` ON `leads`.`id` = `leads_cstm`.`id_c` WHERE (leadtype_c = 'Referral' AND users.`id` = leads.`assigned_user_id`) ),
( SELECT COUNT(*) FROM `leads` INNER JOIN `leads_cstm` ON `leads`.`id` = `leads_cstm`.`id_c` WHERE (leadtype_c = 'VM' AND users.`id` = leads.`assigned_user_id`) ),
( SELECT COUNT(*) FROM `leads` INNER JOIN `leads_cstm` ON `leads`.`id` = `leads_cstm`.`id_c` WHERE (leadtype_c = 'Email' AND users.`id` = leads.`assigned_user_id`) ),
users.`first_name`,users.`last_name`
FROM
`users` users,
`leads` leads
I would appreciate any guidance!
You want to use conditional summation. The following uses MySQL syntax:
SELECT sum(leadtype_c = 'Referral') as Referrals,
sum(leadtype_c = 'VM') as VMs,
sum(leadtype_c = 'Email') as Emails,
users.`first_name`, users.`last_name`
FROM users join
`leads`
on users.`id` = leads.`assigned_user_id` INNER JOIN
`leads_cstm`
ON `leads`.`id` = `leads_cstm`.`id_c`
group by users.id;
You can use COUNT with CASE for this:
SELECT u.first_name,
u.last_name,
count(case when leadtype_c = 'Referral' then 1 end),
count(case when leadtype_c = 'VM' then 1 end),
count(case when leadtype_c = 'Email' then 1 end)
FROM users u
JOIN leads l ON u.id = l.assigned_user_id
JOIN leads_cstm lc ON l.id = lc.id_c
GROUP BY u.id
To match your exact results, you should probably use an OUTER JOIN instead, but this gives you the idea.
A Visual Explanation of SQL Joins
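As a rough illustration of the outer-join variant mentioned above, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL. The table and column names follow the question, but the sample rows are invented, and the LEFT JOINs are an assumption about what "matching your exact results" needs: they keep users who have no leads, with counts of zero.

```python
import sqlite3

# In-memory database with a cut-down version of the SugarCRM tables.
# The rows below are made-up sample data, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE leads (id INTEGER PRIMARY KEY, assigned_user_id INTEGER);
    CREATE TABLE leads_cstm (id_c INTEGER PRIMARY KEY, leadtype_c TEXT);
    INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO leads VALUES (10, 1), (11, 1), (12, 1), (13, 2);
    INSERT INTO leads_cstm VALUES
        (10, 'Referral'), (11, 'VM'), (12, 'Referral'), (13, 'Email');
""")

# One pass over the joined rows; each COUNT only counts rows where its
# CASE expression is non-NULL. LEFT JOIN keeps users without any leads.
rows = conn.execute("""
    SELECT u.first_name, u.last_name,
           COUNT(CASE WHEN lc.leadtype_c = 'Referral' THEN 1 END) AS referrals,
           COUNT(CASE WHEN lc.leadtype_c = 'VM'       THEN 1 END) AS vms,
           COUNT(CASE WHEN lc.leadtype_c = 'Email'    THEN 1 END) AS emails
    FROM users u
    LEFT JOIN leads l       ON l.assigned_user_id = u.id
    LEFT JOIN leads_cstm lc ON lc.id_c = l.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

With an INNER JOIN the third user would drop out of the report entirely, which may or may not be what you want.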
I could do with some help here with this issue in which I'm trying to generate a set of results with balances brought forward and balances carried forward.
The MySQL script:
CREATE OR REPLACE VIEW vwbalances AS
SELECT th.user_id usrid
, YEAR(th.date_created) year
, IFNULL(
(SELECT SUM(tx.amount)
FROM transdetail tx
JOIN transhdr th
ON th.thdr_id = tx.thdr_id
JOIN users u
ON th.user_id = u.user_id
JOIN transtype tt
ON tt.ttype_id = tx.ttype_id
WHERE tx.ttype_id in (2,9,11)
AND u.user_id = usrid
and YEAR(tx.date_created) < year
)
,0) bal_bfwd
, SUM(td.amount) bal_ytd
, ifnull(
(SELECT SUM(ty.amount)
FROM transdetail ty
JOIN transhdr th
ON th.thdr_id = ty.thdr_id
JOIN users u
ON th.user_id = u.user_id
JOIN transtype tt
ON tt.ttype_id = ty.ttype_id
WHERE ty.ttype_id in (2,9,11)
AND u.user_id = usrid
AND YEAR(ty.date_created) <= year
)
,0) bal_cfwd
FROM transdetail td
JOIN transhdr th
ON th.thdr_id = td.thdr_id
JOIN users u
ON th.user_id = u.user_id
JOIN transtype tt
ON tt.ttype_id = td.ttype_id
WHERE td.ttype_id in (2,9,11)
GROUP
BY th.user_id
, YEAR(th.date_created)
I get this (incorrect results) when I run SELECT * FROM vwbalances:
And I get this (correct results) when I run the full SELECT statement that I used to create the VIEW:
Thanks in advance.
The query you posted shouldn't even work.
The usrid is an alias in your most outer select. It is not available in your subqueries.
Unless you have a column called year, things like year(tx.date_created) < year shouldn't work either.
You use the same table alias in your outer query and in your subqueries. This should also not be possible. If it is, that's probably why you get weird results.
Apart from that, you don't have to do the basically same query multiple times. You can shorten your query to something like this:
SELECT `th`.`user_id` AS usrid
, year(`th`.`date_created`) AS 'year'
, sum(if(year(`td`.`date_created`) < year, amount, 0)) as bal_bfwd
, sum(if(year(`td`.`date_created`) <= year, amount, 0)) as bal_cfwd
FROM `transdetail` `td`
JOIN `transhdr` `th` ON `th`.`thdr_id` = `td`.`thdr_id`
JOIN `users` `u` ON `th`.`user_id` = `u`.`user_id`
JOIN `transtype` `tt` ON `tt`.`ttype_id` = `td`.`ttype_id`
WHERE `td`.`ttype_id` IN (2, 9, 11)
GROUP BY `th`.`user_id`
, year(`th`.`date_created`)
(given of course, that this weird year thing works for you)
The following script has worked correctly. All help and tips appreciated:
CREATE OR REPLACE VIEW vwbalances AS
SELECT tho.user_id AS usrid, YEAR(td.date_created) AS 'year'
, ROUND(IFNULL((SELECT sum(tx.amount) FROM transdetail tx
JOIN transhdr th ON th.thdr_id = tx.thdr_id
JOIN users u ON th.user_id = u.user_id
JOIN transtype tt ON tt.ttype_id = tx.ttype_id
WHERE tx.ttype_id IN (2,9,11) AND u.user_id = tho.user_id AND
YEAR(tx.date_created) < YEAR(td.date_created) ),0),2) AS
bal_bfwd
, round(sum(td.amount),2) AS bal_ytd
, ROUND(IFNULL((select SUM(ty.amount) FROM transdetail ty
JOIN transhdr th on th.thdr_id = ty.thdr_id
JOIN users u ON th.user_id = u.user_id
JOIN transtype tt ON tt.ttype_id = ty.ttype_id
WHERE ty.ttype_id IN (2,9,11) AND u.user_id = tho.user_id AND
YEAR(ty.date_created) <= YEAR(td.date_created)),0),2) AS
bal_cfwd
FROM transdetail td JOIN transhdr tho ON tho.thdr_id =
td.thdr_id
JOIN users u ON tho.user_id = u.user_id
JOIN transtype tt on tt.ttype_id = td.ttype_id
WHERE td.ttype_id IN (2,9,11)
GROUP BY tho.user_id, YEAR(td.date_created);
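For comparison, here is a small sketch of the same brought-forward / carried-forward idea on a deliberately simplified, hypothetical table `tx(user_id, tx_year, amount)`, run through Python's sqlite3 rather than MySQL. It first totals each user's year, then sums the earlier years with a correlated subquery. The CTE is only there for readability; it needs MySQL 8.0 or later, so on older versions the yearly totals would have to be written some other way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, flattened stand-in for transdetail/transhdr.
    CREATE TABLE tx (user_id INTEGER, tx_year INTEGER, amount INTEGER);
    INSERT INTO tx VALUES
        (1, 2019, 100), (1, 2019, 50), (1, 2020, 30), (1, 2021, -20),
        (2, 2020, 10);
""")

rows = conn.execute("""
    WITH yearly AS (
        SELECT user_id, tx_year, SUM(amount) AS bal_ytd
        FROM tx
        GROUP BY user_id, tx_year
    )
    SELECT y.user_id,
           y.tx_year,
           -- everything strictly before this year
           COALESCE((SELECT SUM(p.bal_ytd) FROM yearly p
                     WHERE p.user_id = y.user_id
                       AND p.tx_year < y.tx_year), 0) AS bal_bfwd,
           y.bal_ytd,
           -- everything up to and including this year
           (SELECT SUM(p.bal_ytd) FROM yearly p
            WHERE p.user_id = y.user_id
              AND p.tx_year <= y.tx_year) AS bal_cfwd
    FROM yearly y
    ORDER BY y.user_id, y.tx_year
""").fetchall()

for row in rows:
    print(row)
```

A quick sanity check for any version of this query is that bal_bfwd + bal_ytd equals bal_cfwd on every row.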
Imagine that we have a database with a logs table and types table. I want to do a query where I figure out if UserX has entries for certain types of logs. Let's say that UserX has logged type_1 and type_2, but not type_3. I want to write a simple query to see if this is true or false.
At first I tried something like:
SELECT * FROM logs AS l
INNER JOIN types AS t
ON t.id = l.type_id
WHERE t.name = "type_1"
AND t.name = "type_2"
AND t.name != "type_3";
But I quickly realised that it was not possible to do it like this, since t.name cannot have multiple values. I have tried a bunch of different approaches now, but cannot seem to find the one right for me. I'm sure the solution is fairly simple, I just don't see it at the moment.
Hope someone can point me in the right direction.
I have made a simple test database in this fiddle, to use for testing and example: https://www.db-fiddle.com/f/nA6iKgCcJwKnXKsxaNvsLt/0
One option with conditional aggregation.
SELECT l.userID
FROM logs AS l
JOIN types AS t ON t.id = l.type_id
GROUP BY l.userID
HAVING COUNT(DISTINCT CASE WHEN t.name IN ('type_1','type_2') THEN t.name END) = 2
AND COUNT(DISTINCT CASE WHEN t.name = 'type_3' THEN t.name END) = 0
You can do it like Vamsi, but if you prefer an easier to understand SQL then you can do it like this:
SELECT * FROM logs AS l
INNER JOIN types AS t
ON t.id = l.type_id
WHERE true
AND EXISTS (SELECT 1 FROM logs ll WHERE l.user_id = ll.user_id AND type_id = 1)
AND EXISTS (SELECT 1 FROM logs ll WHERE l.user_id = ll.user_id AND type_id = 2)
AND NOT EXISTS (SELECT 1 FROM logs ll WHERE l.user_id = ll.user_id AND type_id = 3)
I do not recommend using count(distinct) for this purpose. It can be expensive. I would simply do:
SELECT l.userId
FROM logs l INNER JOIN
types t
ON t.id = l.type_id
WHERE t.name IN ('type_1', 'type_2', 'type_3')
GROUP BY l.userId
HAVING SUM(t.name = 'type_1') > 0 AND -- at least one
SUM(t.name = 'type_2') > 0 AND -- at least one
SUM(t.name = 'type_3') = 0 ; -- none
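To see the HAVING approach in action, here is a minimal sketch using Python's sqlite3 in place of MySQL (SQLite also evaluates a comparison to 1 or 0, so SUM over it behaves the same way). The schema is guessed from the queries above and the rows are invented: user 1 has type_1 and type_2 only, user 2 has all three, user 3 has only type_1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE logs (id INTEGER PRIMARY KEY, userId INTEGER, type_id INTEGER);
    INSERT INTO types VALUES (1, 'type_1'), (2, 'type_2'), (3, 'type_3');
    INSERT INTO logs (userId, type_id) VALUES
        (1, 1), (1, 2), (1, 1),      -- type_1 (twice) and type_2
        (2, 1), (2, 2), (2, 3),      -- all three types
        (3, 1);                      -- type_1 only
""")

# Each SUM counts the rows of one type per user; the HAVING clause
# requires at least one type_1, at least one type_2 and no type_3.
matching_users = conn.execute("""
    SELECT l.userId
    FROM logs l
    JOIN types t ON t.id = l.type_id
    WHERE t.name IN ('type_1', 'type_2', 'type_3')
    GROUP BY l.userId
    HAVING SUM(t.name = 'type_1') > 0
       AND SUM(t.name = 'type_2') > 0
       AND SUM(t.name = 'type_3') = 0
""").fetchall()

print(matching_users)
```

Only user 1 satisfies all three conditions with this sample data.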
I'm not sure how to make the following SQL query more efficient. Right now, the query is taking 8 - 12 seconds on a pretty fast server, but that's not close to fast enough for a Website when users are trying to load a page with this code on it. It's looking through tables with many rows, for instance the "Post" table has 717,873 rows. Basically, the query lists all Posts related to what the user is following (newest to oldest).
Is there a way to make it faster by only getting the last 20 results total based on PostTimeOrder?
Any help would be much appreciated or insight on anything that can be done to improve this situation. Thank you.
Here's the full SQL query (lots of nesting):
SELECT DISTINCT p.Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(p.PostCreationTime) AS PostTimeOrder
FROM Post p
WHERE (p.Id IN (SELECT pc.PostId
FROM PostCreator pc
WHERE (pc.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pc.UserId = '100')
))
OR (p.Id IN (SELECT pum.PostId
FROM PostUserMentions pum
WHERE (pum.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pum.UserId = '100')
))
OR (p.Id IN (SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100'))
))
OR (p.Id IN (SELECT psm.PostId
FROM PostSMentions psm
WHERE (psm.StockId IN (SELECT sf.StockId
FROM StockFollowing sf
WHERE sf.UserId = '100' ))
))
UNION ALL
SELECT DISTINCT p.Id AS Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(upe.PostEchoTime) AS PostTimeOrder
FROM Post p
INNER JOIN UserPostE upe
on p.Id = upe.PostId
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND (uf.FollowingId = '100' OR upe.UserId = '100'))
ORDER BY PostTimeOrder DESC;
Changing your p.Id IN (...) predicates to existence predicates with correlated subqueries may help. Also, since both halves of your UNION ALL query pull from the Post table and possibly return nearly identical records, you might be able to combine the two into one query by left outer joining to UserPostE and adding upe.PostID is not null as an OR condition in the WHERE clause. UserFollowing will still inner join to UPE. If you want the same Post record twice, once with upe.PostEchoTime and once with p.PostCreationTime as the PostTimeOrder, you'll need to keep the UNION ALL.
SELECT
DISTINCT -- <<=- May not be needed
p.Id
, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime
, p.Content AS Content
, p.Bu AS Bu
, p.Se AS Se
, UNIX_TIMESTAMP(coalesce( upe.PostEchoTime
, p.PostCreationTime)) AS PostTimeOrder
FROM Post p
LEFT JOIN UserPostE upe
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND
(uf.FollowingId = '100' OR
upe.UserId = '100'))
on p.Id = upe.PostId
WHERE upe.PostID is not null
or exists (SELECT 1
FROM PostCreator pc
WHERE pc.PostId = p.ID
and (pc.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pc.UserID
and uf.FollowingId = '100'))
)
OR exists (SELECT 1
FROM PostUserMentions pum
WHERE pum.PostId = p.ID
and (pum.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pum.UserId
and uf.FollowingId = '100'))
)
OR exists (SELECT 1
FROM SStreamPost ssp
WHERE ssp.PostId = p.ID
and exists (SELECT 1
FROM SStreamFollowing ssf
WHERE ssf.SStreamId = ssp.SStreamId
and ssf.UserId = '100')
)
OR exists (SELECT 1
FROM PostSMentions psm
WHERE psm.PostId = p.ID
and exists (SELECT 1
FROM StockFollowing sf
WHERE sf.StockId = psm.StockId
and sf.UserId = '100' )
)
ORDER BY PostTimeOrder DESC
The from section could alternatively be rewritten to also use an existence clause with a correlated sub query:
FROM Post p
LEFT JOIN UserPostE upe
on p.Id = upe.PostId
and ( upe.UserId = '100'
or exists (select 1
from UserFollowing uf
where uf.FollowedId = upe.UserID
and uf.FollowingId = '100'))
Turn IN ( SELECT ... ) into a JOIN .. ON ... (see below)
Turn OR into UNION (see below)
Are some of the tables many-to-many mappings, such as SStreamFollowing? If so, follow the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
Example of IN:
SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (
SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100' ))
-->
SELECT ssp.PostId
FROM SStreamPost ssp
JOIN SStreamFollowing ssf ON ssp.SStreamId = ssf.SStreamId
WHERE ssf.UserId = '100'
The big WHERE with all the INs becomes something like
JOIN ( ( SELECT pc.PostId AS id ... )
UNION ( SELECT pum.PostId ... )
UNION ( SELECT ssp.PostId ... )
UNION ( SELECT psm.PostId ... ) )
Get done what you can of those suggestions, then come back for more advice if you still need it. And bring SHOW CREATE TABLE with you.
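Here is one way the JOIN-plus-UNION shape could look when filled in, as a sketch only. It runs on Python's sqlite3 rather than MySQL, covers just two of the four sources (PostCreator and SStreamPost), uses a guessed minimal schema with invented rows, and adds the LIMIT 20 the question asked about. Real timings depend entirely on the indexes of the actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Post (Id INTEGER PRIMARY KEY, PostCreationTime INTEGER);
    CREATE TABLE PostCreator (PostId INTEGER, UserId INTEGER);
    CREATE TABLE UserFollowing (FollowingId INTEGER, FollowedId INTEGER);
    CREATE TABLE SStreamPost (SStreamId INTEGER, PostId INTEGER);
    CREATE TABLE SStreamFollowing (SStreamId INTEGER, UserId INTEGER);

    INSERT INTO Post VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);
    INSERT INTO PostCreator VALUES (1, 100), (2, 200), (3, 300), (4, 400);
    INSERT INTO UserFollowing VALUES (100, 200);   -- user 100 follows user 200
    INSERT INTO SStreamPost VALUES (7, 4), (7, 2); -- stream 7 holds posts 4 and 2
    INSERT INTO SStreamFollowing VALUES (7, 100);  -- user 100 follows stream 7
""")

# Each branch of the UNION yields post ids from one source; UNION (not
# UNION ALL) removes ids found by more than one branch, so no DISTINCT
# is needed on the outer query.
feed = conn.execute("""
    SELECT p.Id, p.PostCreationTime
    FROM Post p
    JOIN (
        SELECT pc.PostId AS id
        FROM PostCreator pc
        JOIN UserFollowing uf ON uf.FollowedId = pc.UserId
        WHERE uf.FollowingId = 100
        UNION
        SELECT pc.PostId
        FROM PostCreator pc
        WHERE pc.UserId = 100
        UNION
        SELECT ssp.PostId
        FROM SStreamPost ssp
        JOIN SStreamFollowing ssf ON ssf.SStreamId = ssp.SStreamId
        WHERE ssf.UserId = 100
    ) AS ids ON ids.id = p.Id
    ORDER BY p.PostCreationTime DESC
    LIMIT 20
""").fetchall()

print(feed)
```

Post 2 is reachable through both the followed user and the followed stream, yet it appears once in the result because of the UNION.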
I am trying to bring back a string based on an IF statement but it is extremely slow.
It has something to do with the first subquery but I am unsure of how to rearrange this as to bring back the same results but faster.
Here is my SQL:
SELECT IF
(
(
SELECT COUNT(*)
FROM
(
SELECT DISTINCT enquiryId, type
FROM parts_enquiries, parts_service_types AS pst
WHERE parts_enquiries.serviceTypeId = pst.id
) AS parts
WHERE parts.enquiryId = enquiries.id
) > 1, 'Mixed',
(
SELECT DISTINCT type
FROM parts_enquiries, parts_service_types AS pst
WHERE parts_enquiries.serviceTypeId = pst.id AND enquiryId = enquiries.id
)
) AS partTypes
FROM enquiries,
entities
WHERE enquiries.entityId = entities.id
How can I make it faster?
I have modified my original query below, but I am getting the error that subquery returns more than one row:
SELECT
(SELECT
CASE WHEN COUNT(DISTINCT type) > 1 THEN 'Mixed' ELSE `type` END AS type
FROM parts_enquiries
INNER JOIN parts_service_types AS pst ON parts_enquiries.serviceTypeId = pst.id
INNER JOIN enquiries ON parts_enquiries.enquiryId = enquiries.id
INNER JOIN entities ON enquiries.entityId = entities.id
GROUP BY enquiryId) AS partTypes
FROM enquiries,
entities
WHERE enquiries.entityId = entities.id
Please have a look if this query yields the same results:
SELECT
enquiryId,
CASE WHEN COUNT(DISTINCT type) > 1 THEN 'Mixed' ELSE `type` END AS type
FROM parts_enquiries
INNER JOIN parts_service_types AS pst ON parts_enquiries.serviceTypeId = pst.id
INNER JOIN enquiries ON parts_enquiries.enquiryId = enquiries.id
INNER JOIN entities ON enquiries.entityId = entities.id
GROUP BY enquiryId
But N.B.'s comment is still valid. To see whether an index is used, and to get other information, we need to see the EXPLAIN output and the table definitions.
This should get you what you want.
I would first pre-query your parts enquiries and parts service types, looking for both the distinct count and the MINIMUM of the part 'type', grouped by the enquiry ID.
Then run your IF() against that result. If the distinct count is > 1, then 'Mixed'. If there is only one, since I did the MIN(), it would only have the description of that one value that you desire anyhow.
SELECT
E.ID,
IF ( PreQuery.DistTypes > 1, 'Mixed', PreQuery.FirstType ) as PartType
from
Enquiries E
JOIN ( SELECT
PE.EnquiryID,
COUNT( DISTINCT PE.ServiceTypeID ) as DistTypes,
MIN( PST.Type ) as FirstType
from
Parts_Enquiries PE
JOIN Parts_Service_Types PST
ON PE.ServiceTypeID = PST.ID
group by
PE.EnquiryID ) as PreQuery
ON E.ID = PreQuery.EnquiryID
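The pre-query idea can be tried out with a small sketch like the following, again on Python's sqlite3 instead of MySQL. The schema is reduced to the columns the query touches, the rows are invented, and CASE is used in place of MySQL's IF() so that it runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enquiries (id INTEGER PRIMARY KEY);
    CREATE TABLE parts_service_types (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE parts_enquiries (enquiryId INTEGER, serviceTypeId INTEGER);

    INSERT INTO enquiries VALUES (1), (2), (3);
    INSERT INTO parts_service_types VALUES (1, 'Repair'), (2, 'Install');
    INSERT INTO parts_enquiries VALUES
        (1, 1), (1, 2),   -- enquiry 1 has two different service types
        (2, 2), (2, 2);   -- enquiry 2 has the same type twice
""")

# The derived table is computed once, grouped by enquiry, instead of
# running a correlated subquery for every enquiry row.
part_types = conn.execute("""
    SELECT e.id,
           CASE WHEN pq.dist_types > 1 THEN 'Mixed'
                ELSE pq.first_type END AS part_type
    FROM enquiries e
    JOIN (
        SELECT pe.enquiryId,
               COUNT(DISTINCT pe.serviceTypeId) AS dist_types,
               MIN(pst.type) AS first_type
        FROM parts_enquiries pe
        JOIN parts_service_types pst ON pe.serviceTypeId = pst.id
        GROUP BY pe.enquiryId
    ) AS pq ON e.id = pq.enquiryId
    ORDER BY e.id
""").fetchall()

print(part_types)
```

Enquiry 3 has no parts, so the inner join leaves it out; a LEFT JOIN would keep it with a NULL part type if that is preferred.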
I'm currently writing a webapp that matches users based on answered questions. I've implemented my matching algorithm in just one query and tuned it so that it takes 8.2ms to calculate the match percentage between 2 users. But my webapp has to take a list of users and iterate through the list, performing this query each time. For 5000 users it took 50sec on my local machine. Is it possible to put everything in one query that returns one column with the user_id and one column with the calculated match? Or is a stored procedure an option?
I'm currently working with MySQL but willing to switch databases if needed.
For anyone interested in the schema and data, I've created a SQLFiddle: http://sqlfiddle.com/#!2/84233/1
and my matching query:
SELECT COALESCE(SQRT( (100.0*as1.actual_score/ps1.possible_score) * (100.0*as2.actual_score/ps2.possible_score) ) - (100/ps1.commonquestions), 0) AS perc
FROM (SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
AND (uq1.accans1 = uq2.answer_id
OR uq1.accans2 = uq2.answer_id
OR uq1.accans3 = uq2.answer_id
OR uq1.accans4 = uq2.answer_id)
WHERE uq1.user_id = 1) AS as1,
(SELECT SUM(value) AS possible_score, COUNT(*) AS commonquestions
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
WHERE uq1.user_id = 1) AS ps1,
(SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 1
AND (uq1.accans1 = uq2.answer_id
OR uq1.accans2 = uq2.answer_id
OR uq1.accans3 = uq2.answer_id
OR uq1.accans4 = uq2.answer_id)
WHERE uq1.user_id = 101) AS as2,
(SELECT SUM(value) AS possible_score
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 1
WHERE uq1.user_id = 101) AS ps2
I was bored, so: Here's a rewritten version of your query - based on a PostgreSQL port of your schema - that calculates the matches for all user pairings at once:
http://sqlfiddle.com/#!12/30524/6
I've checked and it produces the same results for the user pair (1,5).
WITH
userids(uid) AS (
select distinct user_id from user_questions
),
users(u1,u2) AS (
SELECT u1.uid, u2.uid FROM userids u1 CROSS JOIN userids u2 WHERE u1.uid <> u2.uid
),
scores AS (
SELECT
sum(CASE WHEN uq2.answer_id IN (uq1.accans1, uq1.accans2, uq1.accans3, uq1.accans4) THEN imp.value ELSE 0 END) AS actual_score,
sum(imp.value) AS potential_score,
count(1) AS common_questions,
users.u1,
users.u2
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id
INNER JOIN users ON (uq1.user_id=users.u1 AND uq2.user_id=users.u2)
GROUP BY u1, u2
),
score_pairs(u1,u2,u1_actual,u2_actual,u1_potential,u2_potential,common_questions) AS (
SELECT s1.u1, s1.u2, s1.actual_score, s2.actual_score, s1.potential_score, s2.potential_score, s1.common_questions
FROM scores s1 INNER JOIN scores s2 ON (s1.u1 = s2.u2 AND s1.u2 = s2.u1)
WHERE s1.u1 < s1.u2
)
SELECT
u1, u2,
COALESCE(SQRT( (100.0*u1_actual/u1_potential) * (100.0*u2_actual/u2_potential) ) - (100/common_questions), 0) AS "match"
FROM score_pairs;
There's no reason you couldn't port this back to MySQL, as the CTE is only there for readability and doesn't do anything you can't do with FROM (SELECT ...). There's no WITH RECURSIVE clause and no CTE is referenced from more than one other CTE. You'd have a bit of a scary nested query, but that's just a formatting challenge.
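To make the CTE-to-subquery point concrete, here is a small sketch that builds the user pairings both ways and checks that they agree. It uses Python's sqlite3 purely as a convenient test bed, with a cut-down, assumed `user_questions` table and invented rows; only the first two CTEs from the query above are ported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_questions (user_id INTEGER, question_id INTEGER);
    INSERT INTO user_questions VALUES
        (1, 10), (1, 11), (2, 10), (3, 11), (3, 12);
""")

# Version 1: pairings built with CTEs, as in the query above.
cte_pairs = conn.execute("""
    WITH userids(uid) AS (
        SELECT DISTINCT user_id FROM user_questions
    ),
    pairs(u1, u2) AS (
        SELECT a.uid, b.uid
        FROM userids a CROSS JOIN userids b
        WHERE a.uid <> b.uid
    )
    SELECT u1, u2 FROM pairs ORDER BY u1, u2
""").fetchall()

# Version 2: the same pairings with each CTE inlined as FROM (SELECT ...).
nested_pairs = conn.execute("""
    SELECT a.uid AS u1, b.uid AS u2
    FROM (SELECT DISTINCT user_id AS uid FROM user_questions) a
    CROSS JOIN (SELECT DISTINCT user_id AS uid FROM user_questions) b
    WHERE a.uid <> b.uid
    ORDER BY u1, u2
""").fetchall()

print(cte_pairs == nested_pairs, len(cte_pairs))
```

The inlined form repeats the DISTINCT subquery once per reference, which is the "formatting challenge" part: the result is the same, it is just harder to read.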
Changes:
Generate a set of distinct users
Self-join that set of distinct users to create a set of user pairings
and then join on that list of pairings in the score query to produce a table of scores
Produce the scores table by combining the largely duplicate queries for possiblescore1 and possiblescore2, actualscore1 and actualscore2.
then summarize it in the final outer query
I haven't optimised the query; as written it runs in 5ms on my system. On bigger data it's possible you may need to restructure some of it or use tricks like converting some CTE clauses into SELECT ... INTO TEMPORARY TABLE temp table creation statements that you then index before querying.
It's also possible that you'll want to move the generation of the users rowset out of the CTE and into a FROM subquery clause of scores. That's because PostgreSQL versions before 12 treat WITH as an optimisation fence between clauses, so the database must materialize rows and can't use tricks like pushing clauses up or down.