I am looking for an efficient alternative to subqueries/joins for this query. Let's say I have a table that stores information about companies with the following columns:
name: the name of the company
state: the state the company is located in
revenue: the annual revenue of the company
employees: how many employees this company has
active_business: whether or not the company is in business (1 = yes, 0 = no)
Let's say that from this table, I want to find out how many companies in each state meet the requirement for some minimum amount of revenue, and also how many companies meet the requirement for some minimum number of employees. This can be expressed as the following subquery (can also be written as a join):
SELECT state,
(
SELECT count(*)
FROM records AS a
WHERE a.state = records.state
AND a.revenue > 1000000
) AS companies_with_min_revenue,
(
SELECT count(*)
FROM records AS a
WHERE a.state = records.state
AND a.employees > 10
) AS companies_with_min_employees
FROM records
WHERE active_business = 1
GROUP BY state
My question is this. Can I do this without the subqueries or joins? Since the query is already iterating over each row (there's no indexes), is there some way I can add a condition that if the row meets the minimum revenue requirements and is in the same state, it will increment some sort of counter for the query (similar to map/reduce)?
I think CASE and SUM will solve it:
SELECT state
, SUM(CASE WHEN R.revenue > 1000000 THEN 1 ELSE 0 END) AS companies_with_min_revenue
, SUM(CASE WHEN R.employees > 10 THEN 1 ELSE 0 END) AS companies_with_min_employees
FROM records R
WHERE R.active_business = 1
GROUP BY R.state
As you can see, we will have a value of 1 per record with a revenue of greater than 1000000 (else 0), then we'll take the sum. The same goes with the other column.
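To see the conditional count in action, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The sample rows are made up for illustration, and SQLite is only an assumption standing in for whatever engine you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (name TEXT, state TEXT, revenue INTEGER, "
    "employees INTEGER, active_business INTEGER)"
)
# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "CA", 2000000, 5, 1),
        ("B", "CA", 500000, 50, 1),
        ("C", "CA", 3000000, 20, 0),  # out of business, excluded by WHERE
        ("D", "NY", 1500000, 12, 1),
        ("E", "NY", 100, 30, 1),
    ],
)

# One pass over the table: each CASE yields 1 or 0 per row, SUM adds them up.
rows = conn.execute(
    """
    SELECT state,
           SUM(CASE WHEN revenue > 1000000 THEN 1 ELSE 0 END) AS companies_with_min_revenue,
           SUM(CASE WHEN employees > 10 THEN 1 ELSE 0 END) AS companies_with_min_employees
    FROM records
    WHERE active_business = 1
    GROUP BY state
    ORDER BY state
    """
).fetchall()
print(rows)
```

One thing to be aware of: the correlated subqueries in the question do not repeat the active_business filter, so they would also count inactive companies, while this version counts only active ones.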
Thanks to this Stack Overflow question, which you'll find when you search for "sql conditional count" on Google.
I could probably do the following via PHP code, but I feel it could most likely be accomplished in MySQL, so I'm just looking for someone to help me out with a query.
I have a contract table which defines a customer's monthly payment, let's say for example it is £500 per month. I then have another table called rent, where every month's rent is entered. Each rent has a status: Paid, Unpaid or Partial.
My query so far is the following. Do I have to do multiple sub-queries, or is there a simpler way?
SELECT cc.property_id, cc.property_rent, r.order_total, r.order_status,
SUM(CASE WHEN r.order_status = 'Partial' THEN cc.property_rent - r.order_total ELSE 0 END) AS partial_rent_owed
FROM t_customers_contract cc JOIN
t_customers_rent r
ON cc.customer_id = r.customer_id WHERE cc.property_id = 62 AND r.transaction_type = 'rent' AND
(r.date_created BETWEEN '2017-04-05' AND '2019-04-05')
GROUP BY cc.property_id
Basically, if the rent status is Partial, subtract the amount paid from what is usually due, and then sum up what is owed.
The desired result would be the total sum of what is owed and what has been contributed per property, with the following output:
property_id, total_rent_made, total_rent_owed
The current contract table structure and data is as follows:
The current rent table structure and data is as follows:
As you can see order_id 20 and 27 are Partial payments and the actual payment to be made based on the contract for these ID's should be 750 and 700.
I have managed to resolve it with the following query. If anyone can improve it, I'm happy to take suggestions on performance or on making it cleaner.
SELECT cc.property_id,
SUM(CASE WHEN r.order_status = 'Partial' THEN (cc.property_rent - r.order_total) ELSE 0 END) AS partial_rent_owed
FROM t_customers_contract cc JOIN t_customers_rent r ON (cc.property_id = r.property_id)
WHERE cc.contract_id = r.contract_id
AND cc.customer_id = 7866
AND r.transaction_type = 'rent'
AND (r.date_created BETWEEN '2016-04-05' AND '2019-04-05')
GROUP BY cc.property_id
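The query above returns only the amount owed, while the desired output also has a total_rent_made column. A possible extension is sketched below using SQLite from Python. The table columns beyond those named in the queries, the sample rows, and the reading of "rent made" as the sum of order_total are all assumptions, since the real table data is not shown. Unpaid rows are not handled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t_customers_contract (
        contract_id INTEGER, customer_id INTEGER,
        property_id INTEGER, property_rent INTEGER);
    CREATE TABLE t_customers_rent (
        order_id INTEGER, contract_id INTEGER, property_id INTEGER,
        order_total INTEGER, order_status TEXT,
        transaction_type TEXT, date_created TEXT);

    -- Hypothetical rows; the real table data is not shown in the question.
    INSERT INTO t_customers_contract VALUES (1, 7866, 62, 750), (2, 7866, 63, 700);
    INSERT INTO t_customers_rent VALUES
        (19, 1, 62, 750, 'Paid',    'rent', '2017-05-01'),
        (20, 1, 62, 500, 'Partial', 'rent', '2017-06-01'),
        (27, 2, 63, 300, 'Partial', 'rent', '2017-06-01'),
        (28, 2, 63, 700, 'Paid',    'rent', '2017-07-01');
    """
)

# Both totals come from the same grouped pass over the joined rows.
rows = conn.execute(
    """
    SELECT cc.property_id,
           SUM(r.order_total) AS total_rent_made,
           SUM(CASE WHEN r.order_status = 'Partial'
                    THEN cc.property_rent - r.order_total ELSE 0 END) AS total_rent_owed
    FROM t_customers_contract cc
    JOIN t_customers_rent r
      ON cc.property_id = r.property_id AND cc.contract_id = r.contract_id
    WHERE cc.customer_id = 7866
      AND r.transaction_type = 'rent'
      AND r.date_created BETWEEN '2016-04-05' AND '2019-04-05'
    GROUP BY cc.property_id
    ORDER BY cc.property_id
    """
).fetchall()
print(rows)
```

Moving the contract_id condition into the ON clause does not change the result of an inner join; it just keeps both join conditions in one place.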
For the purpose of monitoring my users' data, I want to visualise it in a cohort analysis. Let's say that I have the following tables in my database:
Table: track_register
user_id, date, time
And in the following table:
Table: track_login
user_id, date, time, succes
I want my cohort analysis to look like this:
Months Sign Ups Logged in more than once
May 40 80%
I am using Cyfe to visualise this so the data has to be formatted in a table like this:
Month,Sign Ups,Logged in more than once
May 2015,40,32
Jun 2015,60,55
Eventually I want to add more data to the cohort from other tables, such as the percentage of users who actually bought the product and more of that good stuff.
The first set of data (the signups per month) is not the hard part. What I am struggling with is how to fetch the data from the track_login table. I will have to count the number of times a specific user has logged in, and if that's > 1 then +1. I can imagine that you use CASE for that. The trouble is separating it by the correct month, because the month where the +1 is supposed to go needs to be fetched from the track_register table.
It seems kind of hard to me to put this all in one single query. But if it couldn't be done, why go to the trouble of building a cohort analysis on Cyfe?
Hi. DATE is restricted as a field name, so I used DATA.
You can try this code:
SELECT TO_CHAR(NVL(a.data, b.data), 'MON YYYY') months
, COUNT(DISTINCT a.login) sign_ups
, SUM(CASE WHEN COUNT(DISTINCT b.login) > 1 THEN 1 ELSE 0 END) Loged_in_more_then_once
FROM track_register a LEFT JOIN track_login b ON a.login = b.login
GROUP BY TO_CHAR(NVL(a.data, b.data), 'MON YYYY')
ORDER BY 1
Or:
SELECT TO_CHAR(NVL(a.data, b.data), 'MON YYYY') months
, COUNT(DISTINCT a.login) sign_ups
, SUM(CASE WHEN COUNT(DISTINCT b.login) > 1 THEN 1 ELSE 0 END) Loged_in_more_then_once
FROM track_register a LEFT JOIN track_login b
ON a.login = b.login AND LAST_DAY(a.data) = LAST_DAY(b.data)
GROUP BY TO_CHAR(NVL(a.data, b.data), 'MON YYYY')
ORDER BY 1
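Some engines may reject an aggregate nested directly inside another one, such as SUM(CASE WHEN COUNT(...) ...), in a single SELECT. A way around that is to count logins per user in a subquery first and then aggregate per sign-up month. Below is a minimal sketch using SQLite from Python. It assumes the user_id columns from the question, counts every login row regardless of succes, and uses made-up rows, so treat it as an illustration rather than a drop-in replacement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE track_register (user_id INTEGER, date TEXT, time TEXT);
    CREATE TABLE track_login (user_id INTEGER, date TEXT, time TEXT, succes INTEGER);

    -- Hypothetical rows for illustration only.
    INSERT INTO track_register VALUES
        (1, '2015-05-03', '10:00'), (2, '2015-05-20', '11:00'), (3, '2015-06-02', '09:00');
    INSERT INTO track_login VALUES
        (1, '2015-05-04', '08:00', 1), (1, '2015-06-10', '08:00', 1),
        (2, '2015-05-21', '08:00', 1),
        (3, '2015-06-03', '08:00', 1), (3, '2015-06-04', '08:00', 1),
        (3, '2015-07-01', '08:00', 1);
    """
)

# Inner query: one row per registered user with their login count.
# Outer query: group those users by the month they signed up in.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m', per_user.reg_date) AS month,
           COUNT(*) AS sign_ups,
           SUM(CASE WHEN per_user.logins > 1 THEN 1 ELSE 0 END) AS logged_in_more_than_once
    FROM (
        SELECT r.user_id, r.date AS reg_date, COUNT(l.user_id) AS logins
        FROM track_register r
        LEFT JOIN track_login l ON l.user_id = r.user_id
        GROUP BY r.user_id, r.date
    ) AS per_user
    GROUP BY strftime('%Y-%m', per_user.reg_date)
    ORDER BY month
    """
).fetchall()
print(rows)
```

The LEFT JOIN keeps users who never logged in, and COUNT(l.user_id) gives them a login count of 0 because it skips the NULLs produced by the join.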
I'm having a bit of trouble figuring out a good statement to write. I am able to achieve what I want when I query a specific 'Company', but I want to get the values for all of the companies in the database.
Basically I have 3 tables: Users, Companies, Plans_ExchangeMailbox. What I need to do is query how many plans are in use for each company. The plans are assigned at the user level in the users table.
Here are my table layouts:
USERS
DisplayName
CompanyCode (This is the ID from the CompanyCode in the Companies table)
MailboxPlan (This is the ID from the Plans_ExchangeMailbox Table)
Companies
CompanyName
CompanyCode
Plans_ExchangeMailbox
MailboxPlanName
MailboxPlanID
Here is the format I am looking to generate:
CompanyName, MailboxPlanName, Count (this is the number of MailboxPlanID for a company)
I was able to get this working but the problem is it can only do one company at a time and it doesn't get the CompanyName:
SELECT
Plans_ExchangeMailbox.MailboxPlanName,
SUM(CASE WHEN Users.MailboxPlan = Plans_ExchangeMailbox.MailboxPlanId THEN 1 ELSE 0 END) AS PlanCount
FROM
Plans_ExchangeMailbox, Users
WHERE
Users.CompanyCode='CC0'
GROUP BY
Plans_ExchangeMailbox.MailboxPlanName
The final format should be:
Headers: CompanyName, PlanName, Count
Values:
Microsoft, Bronze Plan, 5
Microsoft, Gold Plan, 20
Dell, Bronze Plan, 3
Dell, Silver Plan, 80
etc.....
Try this:
SELECT
C.CompanyName,
E.MailboxPlanName,
COUNT(1) Cnt
FROM Companies C
JOIN Users U
ON C.CompanyCode = U.CompanyCode
JOIN Plans_ExchangeMailbox E
ON U.MailboxPlan = E.MailboxPlanID
GROUP BY
C.CompanyCode,
C.CompanyName,
E.MailboxPlanID,
E.MailboxPlanName
I grouped by C.CompanyCode and E.MailboxPlanID in case there are different companies or mailbox plans with the same name. If not, you can remove them from the GROUP BY clause.
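A quick way to check the join is to run it on a few rows. The sketch below does that with SQLite from Python. The column types and the sample data are assumptions, and an ORDER BY is added only to make the output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Companies (CompanyName TEXT, CompanyCode TEXT);
    CREATE TABLE Plans_ExchangeMailbox (MailboxPlanName TEXT, MailboxPlanID INTEGER);
    CREATE TABLE Users (DisplayName TEXT, CompanyCode TEXT, MailboxPlan INTEGER);

    -- Hypothetical rows for illustration only.
    INSERT INTO Companies VALUES ('Microsoft', 'CC0'), ('Dell', 'CC1');
    INSERT INTO Plans_ExchangeMailbox VALUES
        ('Bronze Plan', 1), ('Silver Plan', 2), ('Gold Plan', 3);
    INSERT INTO Users VALUES
        ('u1', 'CC0', 1), ('u2', 'CC0', 1), ('u3', 'CC0', 3), ('u4', 'CC1', 2);
    """
)

# Each user row joins to exactly one company and one plan, so COUNT(1)
# per (company, plan) group is the number of users on that plan.
rows = conn.execute(
    """
    SELECT C.CompanyName, E.MailboxPlanName, COUNT(1) AS Cnt
    FROM Companies C
    JOIN Users U ON C.CompanyCode = U.CompanyCode
    JOIN Plans_ExchangeMailbox E ON U.MailboxPlan = E.MailboxPlanID
    GROUP BY C.CompanyCode, C.CompanyName, E.MailboxPlanID, E.MailboxPlanName
    ORDER BY C.CompanyName, E.MailboxPlanName
    """
).fetchall()
print(rows)
```

Because these are inner joins, a plan that no user of a company has is simply absent from the result rather than listed with a count of 0.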
I have a question that can be explained using a simple fictitious table.
Table "Drinks" has just three fields:
Id (1..N) - Primary key
Date ('2012-09-19'...) - Each date can occur very often
Hot (1 for yes, and 0 for no).
I would like to produce a list like this:
Date Total Hot Cold
2012-09-19 14 6 8
2012-09-10 21 18 3
Etc.
The field "Cold" is as you might expect calculated as (Total - Hot).
What I've got so far is:
SELECT Date, count(*) AS Total FROM Drinks GROUP BY Date;
This gives me the desired table, but of course without the columns "Hot" and "Cold".
Is there a way to modify my query so I can produce this table in one go? I can of course build the table in phases using PHP code, but that is probably neither the most elegant way nor the fastest.
I'm happy to watch and learn... :)
You can add CASE expressions to your SELECT clause.
SELECT Date,
count(*) AS Total,
SUM(CASE WHEN Hot = 1 THEN 1 ELSE 0 END) totalHot,
SUM(CASE WHEN Hot = 0 THEN 1 ELSE 0 END) totalCold
FROM Drinks
GROUP BY Date;
In MySQL, where a comparison evaluates to 1 or 0, the same thing can be written more compactly:
SELECT Date,
count(*) AS Total,
SUM(Hot = 1) Hot,
SUM(Hot = 0) Cold
FROM Drinks
GROUP BY Date;
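Since Cold is defined as Total - Hot, it can also be computed by subtraction instead of a second CASE. Here is a small sketch that tries this with SQLite from Python. The rows are made up, and the HotCount/ColdCount aliases are hypothetical names chosen to avoid clashing with the Hot column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Drinks (Id INTEGER PRIMARY KEY, Date TEXT, Hot INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Drinks VALUES (?, ?, ?)",
    [
        (1, "2012-09-19", 1),
        (2, "2012-09-19", 1),
        (3, "2012-09-19", 0),
        (4, "2012-09-10", 0),
        (5, "2012-09-10", 0),
    ],
)

# Cold is derived as Total minus Hot within each date group.
rows = conn.execute(
    """
    SELECT Date,
           COUNT(*) AS Total,
           SUM(CASE WHEN Hot = 1 THEN 1 ELSE 0 END) AS HotCount,
           COUNT(*) - SUM(CASE WHEN Hot = 1 THEN 1 ELSE 0 END) AS ColdCount
    FROM Drinks
    GROUP BY Date
    ORDER BY Date DESC
    """
).fetchall()
print(rows)
```

The subtraction form treats anything that is not Hot = 1 as cold, including a NULL, whereas SUM(CASE WHEN Hot = 0 ...) counts only rows that are explicitly 0. With a strict 1/0 column the two agree.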
Although several questions come close to what I want (and as I write this, Stack Overflow has suggested several more, none of which quite capture my problem), I just don't seem to be able to find my way out of the SQL thicket.
I have a single table (let's call it the user_classification_fct) that has three fields: user, week, and class (e.g. user #1 in week #1 had a class of 'Regular User', while user #2 in week #1 has a class of 'Infrequent User'). (As an aside, I have implemented classes as INTs, but wanted to work with something legible in the form of VARCHAR while I sorted out the SQL.)
What I want to do is produce a summary report of how user behaviour is changing in aggregate along the lines of:
There were 50 users who were regular users in both week 1 and week 2 and ...
There were 10 users who were regular users in week 1, but fell to infrequent users in week 2
There were 5 users who went from infrequent in week 1 to regular in week 2
... and so on ...
What makes this slightly more tricky is that user #5000 might only have started using the service in week 2 and so have no record in the table for week 1. In that case, I'd want to see a NULL for week 1 and a 'Regular User' (or whatever is appropriate) for week 2. The size of the table is not strictly relevant, but with 5 weeks' worth of data I'm looking at 42 million rows, so I do not want to insert 4 'fake' rows of 'Non-User' for someone who only starts using the service in week 5 or something.
To me this seems rather obviously like a case for using a LEFT or RIGHT JOIN in MySQL because the NULL should come through on the 'missing' record.
I have tried using both WHERE and AND conditions on the LEFT JOINs and am just not getting the 'right' answers (i.e. I either get no NULL values at all in the case of trailing WHERE conditions, or my counts are far, far too high for the number of distinct users (which is ca. 10 million) in the case of the AND constraints used below). Here was my last attempt to get this working:
SELECT
ucf1.class_nm AS 'Class in 2012/15',
ucf2.class_nm AS 'Class in 2012/16',
ucf3.class_nm AS 'Class in 2012/17',
ucf4.class_nm AS 'Class in 2012/18',
ucf5.class_nm AS 'Class in 2012/19',
count(*) AS 'Count'
FROM
user_classification_fct ucf5
LEFT JOIN user_classification_fct ucf4
ON ucf5.user_id=ucf4.user_id
AND ucf5.week_key=201219 AND ucf4.week_key=201218
LEFT JOIN user_classification_fct ucf3
ON ucf4.user_id=ucf3.user_id
AND ucf4.week_key=201218 AND ucf3.week_key=201217
LEFT JOIN user_classification_fct ucf2
ON ucf3.user_id=ucf2.user_id
AND ucf3.week_key=201217 AND ucf2.week_key=201216
LEFT JOIN user_classification_fct ucf1
ON ucf2.user_id=ucf1.user_id
AND ucf2.week_key=201216 AND ucf1.week_key=201215
GROUP BY 1,2,3,4,5;
In looking at the various other questions on stackoverflow.com, it may well be that I need to perform the queries one-at-a-time and UNION the result sets together or use parentheses to chain them one-to-another, but those approaches are not ones that I'm familiar with (yet) and I can't even get a single LEFT JOIN (i.e. week 5 to week 1, dropping all the other weeks of data) to return something useful.
Any tips would be much, much appreciated and I would really appreciate suggestions that work in MySQL as switching database products is not an option.
You can do this with a group by. I would start by summarizing all the possible combinations for the five weeks as:
select c_201215, c_201216, c_201217, c_201218, c_201219,
count(*) as cnt
from (select user_id,
max(case when week_key=201215 then class_nm end) as c_201215,
max(case when week_key=201216 then class_nm end) as c_201216,
max(case when week_key=201217 then class_nm end) as c_201217,
max(case when week_key=201218 then class_nm end) as c_201218,
max(case when week_key=201219 then class_nm end) as c_201219
from user_classification_fct ucf
group by user_id
) t
group by c_201215, c_201216, c_201217, c_201218, c_201219
This may solve your problem. If you have 5 classes (including NULL), then this will return at most 5^5 or 3,125 rows.
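To show what the pivot produces, here is a cut-down sketch with just two weeks, run on SQLite from Python. The rows are invented, and user 3 deliberately has no record for the first week so that the NULL shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_classification_fct (user_id INTEGER, week_key INTEGER, class_nm TEXT)"
)
# Hypothetical rows; user 3 only appears from week 201216 onwards.
conn.executemany(
    "INSERT INTO user_classification_fct VALUES (?, ?, ?)",
    [
        (1, 201215, "Regular User"), (1, 201216, "Regular User"),
        (2, 201215, "Regular User"), (2, 201216, "Infrequent User"),
        (3, 201216, "Regular User"),
        (4, 201215, "Regular User"), (4, 201216, "Regular User"),
    ],
)

# Inner query: one row per user, one column per week (NULL when missing).
# Outer query: count how many users share each combination of classes.
rows = conn.execute(
    """
    SELECT c_201215, c_201216, COUNT(*) AS cnt
    FROM (
        SELECT user_id,
               MAX(CASE WHEN week_key = 201215 THEN class_nm END) AS c_201215,
               MAX(CASE WHEN week_key = 201216 THEN class_nm END) AS c_201216
        FROM user_classification_fct
        GROUP BY user_id
    ) t
    GROUP BY c_201215, c_201216
    ORDER BY c_201215, c_201216
    """
).fetchall()
print(rows)
```

No fake 'Non-User' rows are needed: a CASE without ELSE yields NULL, and MAX over only NULLs stays NULL, which is how the missing week comes through.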
This fits into Excel, so you can do the final processing there. Alternatively, you can still use the database.
If you want to extract pairs of weeks, then I would suggest putting the above into a temporary table, say "t", and doing a series of extracts with unions:
select *
from ((select '201215' as weekstart, c_201215, c_201216, sum(cnt) as cnt
from t
group by c_201215, c_201216
) union all
(select '201216', c_201216, c_201217, sum(cnt) as cnt
from t
group by c_201216, c_201217
) union all
(select '201217', c_201217, c_201218, sum(cnt) as cnt
from t
group by c_201217, c_201218
) union all
(select '201218', c_201218, c_201219, sum(cnt) as cnt
from t
group by c_201218, c_201219
)
) tg
order by 1, cnt desc
I suggest putting it in a temporary table because you don't want to mess around with common-subquery optimizations on such a large table. You'll get to your final answer by summarizing first, and then bringing the data together.
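Here is a rough sketch of the summarize-first approach on three weeks of made-up data, using SQLite from Python. SQLite does not accept parentheses around the individual SELECTs of a UNION ALL, so they are dropped here, and the from_class/to_class aliases are hypothetical names added for readability.

```python
import sqlite3

R, I = "Regular User", "Infrequent User"
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_classification_fct (user_id INTEGER, week_key INTEGER, class_nm TEXT)"
)
# Hypothetical rows; user 3 has no record for week 201215.
conn.executemany(
    "INSERT INTO user_classification_fct VALUES (?, ?, ?)",
    [
        (1, 201215, R), (1, 201216, R), (1, 201217, I),
        (2, 201215, R), (2, 201216, I), (2, 201217, I),
        (3, 201216, R), (3, 201217, R),
        (4, 201215, R), (4, 201216, R), (4, 201217, I),
    ],
)

# Step 1: summarize the big table once into a small temporary table.
conn.execute(
    """
    CREATE TEMP TABLE t AS
    SELECT c_201215, c_201216, c_201217, COUNT(*) AS cnt
    FROM (
        SELECT user_id,
               MAX(CASE WHEN week_key = 201215 THEN class_nm END) AS c_201215,
               MAX(CASE WHEN week_key = 201216 THEN class_nm END) AS c_201216,
               MAX(CASE WHEN week_key = 201217 THEN class_nm END) AS c_201217
        FROM user_classification_fct
        GROUP BY user_id
    ) AS u
    GROUP BY c_201215, c_201216, c_201217
    """
)

# Step 2: extract week-to-week pairs from the summary only.
# ORDER BY uses column positions: weekstart, cnt DESC, from_class, to_class.
rows = conn.execute(
    """
    SELECT '201215' AS weekstart, c_201215 AS from_class, c_201216 AS to_class,
           SUM(cnt) AS cnt
    FROM t GROUP BY c_201215, c_201216
    UNION ALL
    SELECT '201216', c_201216, c_201217, SUM(cnt)
    FROM t GROUP BY c_201216, c_201217
    ORDER BY 1, 4 DESC, 2, 3
    """
).fetchall()
for row in rows:
    print(row)
```

The point of the two steps is that the UNION ALL branches only ever touch the small summary table, so the large fact table is scanned a single time.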