Build SQL query from multiple tables for Cyfe dashboard - MySQL

To monitor my users' data, I want to visualise it in a cohort analysis. Let's say that I have the following table in my database:
Table: track_register
user_id, date, time
And the following table:
Table: track_login
user_id, date, time, succes
I want my cohort analysis to look like this:
Month    Sign-ups    Logged in more than once
May      40          80%
I am using Cyfe to visualise this, so the data has to be formatted as a table like this:
Month,Sign Ups,Logged in more than once
May 2015,40,32
Jun 2015,60,55
Eventually I want to add more data to the cohort from other tables, such as the percentage of users who actually bought the product, and more of that good stuff.
The first set of data (the sign-ups per month) is not the hard part. What I am struggling with is how to fetch the data from the track_login table. I have to count the number of times each user has logged in and, if that is > 1, add 1. I imagine you would use CASE for that. The trouble is separating the counts by the correct month, because the month that the +1 belongs to has to be taken from the track_register table.
It seems hard to me to put all of this in one single query. But if it couldn't be done, why go to the trouble of building a cohort analysis in Cyfe?
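The cohort logic described above can be sketched as follows: count each user's logins first, then credit the user to the month in which they registered. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL and runs on made-up rows. SQLite has no DATE_FORMAT, so the month is bucketed with strftime('%Y-%m', ...) and the "May 2015" label is built in Python. In MySQL, DATE_FORMAT(date, '%b %Y') produces that label directly. Counting logins over all time and ignoring the succes column are both assumptions.

```python
import sqlite3
from datetime import datetime

# SQLite in-memory database standing in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track_register (user_id INTEGER, date TEXT);
CREATE TABLE track_login (user_id INTEGER, date TEXT);
INSERT INTO track_register VALUES (1, '2015-05-03'), (2, '2015-05-10'), (3, '2015-06-01');
INSERT INTO track_login VALUES
  (1, '2015-05-04'), (1, '2015-05-20'),
  (2, '2015-05-11'),
  (3, '2015-06-02'), (3, '2015-07-01');
""")

# Count logins per user in a derived table first, so no aggregate is nested
# inside another; the cohort month always comes from track_register.
rows = conn.execute("""
SELECT strftime('%Y-%m', r.date) AS cohort,
       COUNT(*) AS sign_ups,
       SUM(CASE WHEN l.logins > 1 THEN 1 ELSE 0 END) AS repeat_users
FROM track_register r
LEFT JOIN (SELECT user_id, COUNT(*) AS logins
           FROM track_login
           GROUP BY user_id) l ON l.user_id = r.user_id
GROUP BY cohort
ORDER BY cohort
""").fetchall()

# Turn each row into the "Month,Sign Ups,..." CSV layout that Cyfe reads.
csv_lines = ["Month,Sign Ups,Logged in more than once"]
for cohort, sign_ups, repeat_users in rows:
    label = datetime.strptime(cohort, "%Y-%m").strftime("%b %Y")
    csv_lines.append(f"{label},{sign_ups},{repeat_users}")
print("\n".join(csv_lines))
```

User 3 counts as a repeat user here because one of the two logins falls in July. Restricting the login count to the registration month would change that.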

Hi, DATE is a keyword, so I named the date field data instead. I also assume user_id identifies the user in both tables.
You can try this code (MySQL):
SELECT DATE_FORMAT(MIN(a.data), '%b %Y') AS months
, COUNT(DISTINCT a.user_id) AS sign_ups
, SUM(CASE WHEN b.logins > 1 THEN 1 ELSE 0 END) AS logged_in_more_than_once
FROM track_register a
LEFT JOIN (SELECT user_id, COUNT(*) AS logins FROM track_login GROUP BY user_id) b
ON a.user_id = b.user_id
GROUP BY YEAR(a.data), MONTH(a.data)
ORDER BY MIN(a.data)
Or, to count only logins made in the same calendar month as the sign-up:
SELECT DATE_FORMAT(MIN(a.data), '%b %Y') AS months
, COUNT(DISTINCT a.user_id) AS sign_ups
, SUM(CASE WHEN b.logins > 1 THEN 1 ELSE 0 END) AS logged_in_more_than_once
FROM track_register a
LEFT JOIN (SELECT r.user_id, COUNT(*) AS logins
FROM track_register r JOIN track_login l
ON l.user_id = r.user_id AND LAST_DAY(l.data) = LAST_DAY(r.data)
GROUP BY r.user_id) b ON a.user_id = b.user_id
GROUP BY YEAR(a.data), MONTH(a.data)
ORDER BY MIN(a.data)

Related

MySQL query to calculate rent owed based on a partial payment

I could probably do the following in PHP, but I feel it could most likely be accomplished in MySQL, so I'm looking for someone to help me with a query.
I have a contract table that defines a customer's monthly payment; let's say, for example, it is £500 per month. I also have another table called rent, where rent is entered every month. Each rent row has a status: Paid, Unpaid or Partial.
Here is my query so far. Do I have to use multiple subqueries, or is there a simpler way?
SELECT cc.property_id, cc.property_rent, r.order_total, r.order_status,
SUM(CASE WHEN r.order_status = 'Partial' THEN cc.property_rent - r.order_total ELSE 0 END) AS partial_rent_owed
FROM t_customers_contract cc JOIN
t_customers_rent r
ON cc.customer_id = r.customer_id WHERE cc.property_id = 62 AND r.transaction_type = 'rent' AND
(r.date_created BETWEEN '2017-04-05' AND '2019-04-05')
GROUP BY cc.property_id
Basically, if the rent status is Partial, subtract the amount paid from the usual rent, and then sum the total of what is owed.
The desired result is the total owed and the total paid per property, with the following output:
property_id, total_rent_made, total_rent_owed
The current contract table structure and data are as follows:
The current rent table structure and data are as follows:
As you can see, order_id 20 and 27 are Partial payments; according to the contract, the full payments for these IDs should be 750 and 700.
I managed to resolve it with the following query. If anyone can improve it, I'm happy to take suggestions on performance or on making it cleaner.
SELECT cc.property_id,
SUM(CASE WHEN r.order_status = 'Partial' THEN (cc.property_rent - r.order_total) ELSE 0 END) AS partial_rent_owed
FROM t_customers_contract cc JOIN t_customers_rent r
ON cc.property_id = r.property_id AND cc.contract_id = r.contract_id
WHERE cc.customer_id = 7866
AND r.transaction_type = 'rent'
AND r.date_created BETWEEN '2016-04-05' AND '2019-04-05'
GROUP BY cc.property_id
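The query above returns only the Partial shortfall. A single conditional-aggregation pass can return both columns of the desired output, total_rent_made and total_rent_owed. Below is a sketch using Python's sqlite3 as a stand-in for MySQL. The amounts paid are made up. Treating an Unpaid row as "the whole contract rent is owed, order_total is 0" is an assumption about the data.

```python
import sqlite3

# SQLite stand-in for MySQL; every row below is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_customers_contract (contract_id INTEGER, property_id INTEGER, property_rent REAL);
CREATE TABLE t_customers_rent (order_id INTEGER, contract_id INTEGER, property_id INTEGER,
                               order_total REAL, order_status TEXT, transaction_type TEXT);
INSERT INTO t_customers_contract VALUES (1, 62, 750), (2, 63, 700);
INSERT INTO t_customers_rent VALUES
  (19, 1, 62, 750, 'Paid',    'rent'),
  (20, 1, 62, 500, 'Partial', 'rent'),
  (21, 1, 62,   0, 'Unpaid',  'rent'),
  (27, 2, 63, 400, 'Partial', 'rent');
""")

# Both totals come from one pass: SUM of what was paid, plus a CASE that
# decides how much of the contract rent is still outstanding for each row.
rows = conn.execute("""
SELECT cc.property_id,
       SUM(r.order_total) AS total_rent_made,
       SUM(CASE r.order_status
             WHEN 'Partial' THEN cc.property_rent - r.order_total
             WHEN 'Unpaid'  THEN cc.property_rent  -- assumption: nothing paid yet
             ELSE 0
           END) AS total_rent_owed
FROM t_customers_contract cc
JOIN t_customers_rent r
  ON r.contract_id = cc.contract_id AND r.property_id = cc.property_id
WHERE r.transaction_type = 'rent'
GROUP BY cc.property_id
ORDER BY cc.property_id
""").fetchall()
print(rows)
```

The same SELECT should run unchanged in MySQL, with the customer and date filters from the query above added to the WHERE clause.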

MySQL alternative to subquery/join

I am looking for an efficient alternative to subqueries/joins for this query. Let's say I have a table that stores information about companies, with the following columns:
name: the name of the company
state: the state the company is located in
revenue: the annual revenue of the company
employees: how many employees the company has
active_business: whether or not the company is in business (1 = yes, 0 = no)
Let's say that from this table I want to find out how many companies in each state meet some minimum amount of revenue, and also how many companies meet some minimum number of employees. This can be expressed with the following subqueries (it can also be written as a join):
SELECT state,
(
SELECT count(*)
FROM records AS a
WHERE a.state = records.state
AND a.revenue > 1000000
) AS companies_with_min_revenue,
(
SELECT count(*)
FROM records AS a
WHERE a.state = records.state
AND a.employees > 10
) AS companies_with_min_employees
FROM records
WHERE active_business = 1
GROUP BY state
My question is this: can I do this without the subqueries or joins? Since the query is already iterating over each row (there are no indexes), is there some way to add a condition so that, if a row meets the minimum revenue requirement and is in the same state, it increments some sort of counter for the query (similar to map/reduce)?
I think CASE and SUM will solve it:
SELECT state
, SUM(CASE WHEN R.revenue > 1000000 THEN 1 ELSE 0 END) AS companies_with_min_revenue
, SUM(CASE WHEN R.employees > 10 THEN 1 ELSE 0 END) AS companies_with_min_employees
FROM records R
WHERE R.active_business = 1
GROUP BY R.state
As you can see, each record contributes 1 if its revenue is greater than 1000000 (otherwise 0), and SUM adds those values up per state. The same goes for the other column.
Credit to this Stack Overflow question; you'll find it by searching Google for "sql conditional count".
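A minimal run of the SUM/CASE pattern above on a few made-up rows shows the single-pass counting per state. Python's sqlite3 stands in for MySQL here, and the company rows are hypothetical.

```python
import sqlite3

# SQLite stand-in for MySQL; the companies below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (name TEXT, state TEXT, revenue INTEGER,
                      employees INTEGER, active_business INTEGER);
INSERT INTO records VALUES
  ('A', 'CA', 2000000, 50, 1),
  ('B', 'CA',  500000, 20, 1),
  ('C', 'CA', 3000000,  5, 0),
  ('D', 'NY', 1500000,  8, 1);
""")

# One pass over the table: each CASE yields 1 or 0 per row,
# and SUM adds those flags up within each state group.
rows = conn.execute("""
SELECT state,
       SUM(CASE WHEN revenue > 1000000 THEN 1 ELSE 0 END) AS companies_with_min_revenue,
       SUM(CASE WHEN employees > 10 THEN 1 ELSE 0 END) AS companies_with_min_employees
FROM records
WHERE active_business = 1
GROUP BY state
ORDER BY state
""").fetchall()
print(rows)
```

In MySQL a comparison evaluates to 1 or 0, so SUM(revenue > 1000000) is an equivalent shorthand; the CASE form is more portable. The subqueries in the question do not filter on active_business, so they can report higher counts than this version. In this data, that is company C.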

Relational Database Logic

I'm fairly new to PHP/MySQL programming, and I'm having a hard time figuring out the logic for a relational database that I'm trying to build. Here's the problem:
I have different leaders who will be in charge of a store anytime between 9am and 9pm.
A customer who has visited the store can rate their experience on a scale of 1 to 5.
I'm building a site that will allow me to store the shifts that a leader worked as seen below.
When I hit submit, the site would take the data leaderName:"George", shiftTimeArray: 11am, 1pm, 6pm (from the example in the picture) and the shiftDate and send them to an SQL database.
Later, I want to get the average score for a person by sending a query to MySQL that retrieves all of the scores that leader received and averages them. I know how to write the code for the forms and the search; what I'm having a hard time with is the logic for the tables that will relate the data. Currently, I have a MySQL table called responses that contains the following fields:
leader_id
shift_date // contains the date that the leader worked
shift_time // contains the time that the leader worked
visit_date // contains the date that the survey/score was given
visit_time // contains the time that the survey/score was given
score // contains the actual score of the survey (1-5)
I enter the shifts that the leader works at the beginning of the week and then enter the survey scores in as they come in during the week.
So here's the question: what MySQL tables and fields should I create to relate this data so that I can query a leader's name and get the average score from all of their surveys?
You want tables like:
Leader (leader_id, name, etc)
Shift (leader_id, shift_date, shift_time)
SurveyResult (visit_date, visit_time, score)
Note: I omitted the surrogate primary keys for Shift and SurveyResult, which I would probably include.
To query, join shifts to surveys, group by leader and take the average, then join that back to Leader to get the name.
The query might look something like this (but I haven't actually built it in MySQL to verify the syntax):
SELECT name
,AverageScore
FROM Leader a
INNER JOIN (
SELECT leader_id
, AVG(score) AverageScore
FROM Shift
INNER JOIN
SurveyResult ON shift_date = visit_date
AND shift_time = visit_time -- what this really needs to be depends on how you record time
GROUP BY leader_id
) b ON a.leader_id = b.leader_id
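Here is a runnable sketch of the Leader / Shift / SurveyResult layout and the "average per leader, then join back for the name" query. It uses Python's sqlite3 in place of MySQL. Storing shift_time and visit_time as the hour of the day (an integer) is an assumption, one possible answer to "depends on how you record time". The second leader and all scores are made up.

```python
import sqlite3

# SQLite stand-in for MySQL; names other than George and all scores are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Leader (leader_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Shift (leader_id INTEGER, shift_date TEXT, shift_time INTEGER);
CREATE TABLE SurveyResult (visit_date TEXT, visit_time INTEGER, score INTEGER);
INSERT INTO Leader VALUES (1, 'George'), (2, 'Maria');
INSERT INTO Shift VALUES (1, '2015-05-04', 11), (1, '2015-05-04', 13), (2, '2015-05-04', 12);
INSERT INTO SurveyResult VALUES
  ('2015-05-04', 11, 4), ('2015-05-04', 13, 2), ('2015-05-04', 12, 5);
""")

# Inner query: match each survey to the shift covering that date and hour,
# then average per leader; outer query: join back to Leader for the name.
rows = conn.execute("""
SELECT a.name, b.AverageScore
FROM Leader a
INNER JOIN (SELECT s.leader_id, AVG(r.score) AS AverageScore
            FROM Shift s
            INNER JOIN SurveyResult r
              ON r.visit_date = s.shift_date AND r.visit_time = s.shift_time
            GROUP BY s.leader_id) b ON a.leader_id = b.leader_id
ORDER BY a.name
""").fetchall()
print(rows)
```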
I would use the following structure:
leaders
id
name
leaders_timetable (can be multiple per leader)
id
leader_id
shift_datetime (I assume it stores the date and hour here; minutes and seconds are always 0)
survey_scores
id
visit_datetime
score
SELECT l.id, l.name, AVG(s.score) FROM leaders l
INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
INNER JOIN survey_scores s ON lt.shift_datetime = DATE_FORMAT(s.visit_datetime, '%Y-%m-%d %H:00:00')
GROUP BY l.id
DATE_FORMAT here truncates visit_datetime to the hour (it sets the minutes and seconds to 0) so that it can be matched against shift_datetime. This is a MySQL function, so if you use a different database you'll need a different function.
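The hour-truncation join can be checked on sample data. SQLite has no DATE_FORMAT, so this sketch with Python's sqlite3 uses strftime('%Y-%m-%d %H:00:00', ...) for the same truncation that DATE_FORMAT(s.visit_datetime, '%Y-%m-%d %H:00:00') does in MySQL. The shifts and scores are hypothetical.

```python
import sqlite3

# SQLite stand-in for MySQL; the shifts and scores are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leaders (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE leaders_timetable (id INTEGER PRIMARY KEY, leader_id INTEGER, shift_datetime TEXT);
CREATE TABLE survey_scores (id INTEGER PRIMARY KEY, visit_datetime TEXT, score INTEGER);
INSERT INTO leaders VALUES (1, 'George');
INSERT INTO leaders_timetable VALUES (1, 1, '2015-05-04 11:00:00'), (2, 1, '2015-05-04 18:00:00');
INSERT INTO survey_scores VALUES
  (1, '2015-05-04 11:42:10', 4),
  (2, '2015-05-04 18:05:00', 5),
  (3, '2015-05-04 15:30:00', 1);
""")

# strftime keeps the date and hour and writes literal ':00:00', so a visit at
# 11:42:10 matches the 11:00:00 shift; the 15:30 visit matches no shift at all.
rows = conn.execute("""
SELECT l.id, l.name, AVG(s.score)
FROM leaders l
INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
INNER JOIN survey_scores s
  ON lt.shift_datetime = strftime('%Y-%m-%d %H:00:00', s.visit_datetime)
GROUP BY l.id
""").fetchall()
print(rows)
```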
Say you have a leader who has 5 survey rows with scores 1, 2, 3, 4 and 5.
If you select all surveys for this leader, sum the scores and divide by 5 (the number of surveys this leader has), you get the average, in this case 3.
(1 + 2 + 3 + 4 + 5) / 5 = 3
You wouldn't need to create any more tables or fields, you have what you need.

How can I find days between different paired rows?

I've been racking my brain about how to do this in one query without PHP code.
In a nutshell, I have a table that records email activity. For the sake of this example, here is the data:
recipient_id activity date
1 delivered 2011-08-30
1 open 2011-08-31
2 delivered 2011-08-30
3 delivered 2011-08-24
3 open 2011-08-30
3 open 2011-08-31
The goal: I want to display to users a single number that tells how many recipients open their email within 24 hours.
E.g. "Users that open their email within 24 hours: 13 Readers"
For the sample data above, the value would be "1": recipient 1 was delivered an email and opened it the next day, recipient 2 never opened it, and recipient 3 waited 6 days.
Can anyone think of a way to express the goal in a single query?
Reminder: In order to count, the person must have a 'delivered' tag and at least one 'open' tag. Each 'open' tag only counts once per recipient.
** EDIT ** Sorry, I'm using MySQL
Here is a version in mysql.
select count(distinct recipient_id)
from email e1
where e1.activity = 'delivered'
and exists
(select * from email e2
where e1.recipient_id = e2.recipient_id
and e2.activity = 'open'
and datediff(e2.action_date,e1.action_date) <= 1)
The basic principle is that you want to find a delivered row for a recipient that also has an open within 24 hours.
The datediff() is a good way to do the date arithmetic in mysql -- other dbs will vary on exact methods for this step. The rest of the sql will work anywhere.
SQLFiddle here: http://sqlfiddle.com/#!2/c9116/4
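The EXISTS approach above can be run against the question's sample rows. This sketch uses Python's sqlite3 instead of MySQL. SQLite has no DATEDIFF, so the day difference is computed with julianday(). The BETWEEN 0 AND 1 lower bound, which ignores an "open" dated before the delivery, is an added assumption.

```python
import sqlite3

# The sample rows from the question, loaded into SQLite as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email (recipient_id INTEGER, activity TEXT, action_date TEXT);
INSERT INTO email VALUES
  (1, 'delivered', '2011-08-30'), (1, 'open', '2011-08-31'),
  (2, 'delivered', '2011-08-30'),
  (3, 'delivered', '2011-08-24'), (3, 'open', '2011-08-30'), (3, 'open', '2011-08-31');
""")

# Count recipients with a 'delivered' row that has at least one 'open'
# 0 or 1 days later; COUNT(DISTINCT ...) counts each recipient once.
(count,) = conn.execute("""
SELECT COUNT(DISTINCT e1.recipient_id)
FROM email e1
WHERE e1.activity = 'delivered'
  AND EXISTS (SELECT 1 FROM email e2
              WHERE e2.recipient_id = e1.recipient_id
                AND e2.activity = 'open'
                AND julianday(e2.action_date) - julianday(e1.action_date) BETWEEN 0 AND 1)
""").fetchone()
print(count)
```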
Untested, but it should work ;) I don't know which SQL dialect you use, so I've used the T-SQL DATEDIFF function.
select distinct opened.recipient_id -- or count(distinct opened.recipient_id) if you want to know number
from actions as opened
inner join actions as delivered
on opened.recipient_id = delivered.recipient_id and delivered.activity = 'delivered'
where opened.activity = 'open' and DATEDIFF(day, delivered.date, opened.date) <= 1
Edit: I had confused opened with delivered; now fixed.
Assumptions: MySQL, and the table is called TABLE (a reserved word, so it has to be quoted with backticks).
OK, I am not 100% sure about this, because I don't have a copy of the table to run it against, but I think you could do something like this:
SELECT COUNT(DISTINCT t1.recipient_id) FROM `TABLE` t1
INNER JOIN `TABLE` t2 ON t1.recipient_id = t2.recipient_id AND t1.activity != t2.activity
WHERE t1.activity IN ('delivered', 'open') AND t2.activity IN ('delivered', 'open')
AND ABS(DATEDIFF(t1.date, t2.date)) <= 1
Basically, you join the table to itself where the recipient_ids match but the activities don't, and each activity is either 'delivered' or 'open'. The joined rows look like this:
1 delivered 2011-08-30 1 open 2011-08-31
You then take the difference between the two dates (as an absolute value, because you don't know which order they will be in) and make sure it is at most 1 day (roughly 24 hours), so that same-day opens are counted too.

Multiple LEFT JOINs to self with criteria to produce distribution

Although several questions come close to what I want (and as I write this, Stack Overflow has suggested several more, none of which quite captures my problem), I just can't seem to find my way out of the SQL thicket.
I have a single table (let's call it user_classification_fct) that has three fields: user, week, and class (e.g. user #1 in week #1 had a class of 'Regular User', while user #2 in week #1 had a class of 'Infrequent User'). (As an aside, I have implemented classes as INTs, but wanted to work with something legible in the form of VARCHAR while I sorted out the SQL.)
What I want to do is produce a summary report of how user behaviour is changing in aggregate along the lines of:
There were 50 users who were regular users in both week 1 and week 2 and ...
There were 10 users who were regular users in week 1, but fell to infrequent users in week 2
There were 5 users who went from infrequent in week 1 to regular in week 2
... and so on ...
What makes this slightly trickier is that user #5000 might only have started using the service in week 2 and so has no record in the table for week 1. In that case, I'd want to see a NULL for week 1 and a 'Regular User' (or whatever is appropriate) for week 2. The size of the table is not strictly relevant, but with 5 weeks' worth of data I'm looking at 42 million rows, so I do not want to insert 4 'fake' rows of 'Non-User' for someone who only starts using the service in week 5.
To me this seems rather obviously like a case for using a LEFT or RIGHT JOIN in MySQL because the NULL should come through on the 'missing' record.
I have tried using both WHERE and AND conditions on the LEFT JOINs and just can't get the 'right' answers: with trailing WHERE conditions I get no NULL values at all, and with the AND constraints used below my counts are far, far too high for the number of distinct users (ca. 10 million). Here was my last attempt to get this working:
SELECT
ucf1.class_nm AS 'Class in 2012/15',
ucf2.class_nm AS 'Class in 2012/16',
ucf3.class_nm AS 'Class in 2012/17',
ucf4.class_nm AS 'Class in 2012/18',
ucf5.class_nm AS 'Class in 2012/19',
count(*) AS 'Count'
FROM
user_classification_fct ucf5
LEFT JOIN user_classification_fct ucf4
ON ucf5.user_id=ucf4.user_id
AND ucf5.week_key=201219 AND ucf4.week_key=201218
LEFT JOIN user_classification_fct ucf3
ON ucf4.user_id=ucf3.user_id
AND ucf4.week_key=201218 AND ucf3.week_key=201217
LEFT JOIN user_classification_fct ucf2
ON ucf3.user_id=ucf2.user_id
AND ucf3.week_key=201217 AND ucf2.week_key=201216
LEFT JOIN user_classification_fct ucf1
ON ucf2.user_id=ucf1.user_id
AND ucf2.week_key=201216 AND ucf1.week_key=201215
GROUP BY 1,2,3,4,5;
In looking at the various other questions on stackoverflow.com, it may well be that I need to perform the queries one-at-a-time and UNION the result sets together or use parentheses to chain them one-to-another, but those approaches are not ones that I'm familiar with (yet) and I can't even get a single LEFT JOIN (i.e. week 5 to week 1, dropping all the other weeks of data) to return something useful.
Any tips would be much appreciated, and I would especially appreciate suggestions that work in MySQL, as switching database products is not an option.
You can do this with a group by. I would start by summarizing all the possible combinations for the five weeks as:
select c_201215, c_201216, c_201217, c_201218, c_201219,
count(*) as cnt
from (select user_id,
max(case when week_key=201215 then class_nm end) as c_201215,
max(case when week_key=201216 then class_nm end) as c_201216,
max(case when week_key=201217 then class_nm end) as c_201217,
max(case when week_key=201218 then class_nm end) as c_201218,
max(case when week_key=201219 then class_nm end) as c_201219
from user_classification_fct ucf
group by user_id
) t
group by c_201215, c_201216, c_201217, c_201218, c_201219
This may solve your problem. If you have 5 classes (including NULL), then this will return at most 5^5 or 3,125 rows.
This fits into Excel, so you can do the final processing there. Alternatively, you can still use the database.
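The MAX(CASE ...) pivot above can be checked on a tiny sample. This sketch is trimmed to two weeks and uses Python's sqlite3 in place of MySQL. The user rows are made up. User 4 has no week-201215 row, so that column comes back NULL (None in Python) without any 'fake' rows.

```python
import sqlite3

# SQLite stand-in for MySQL; the classifications are hypothetical, two weeks only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_classification_fct (user_id INTEGER, week_key INTEGER, class_nm TEXT);
INSERT INTO user_classification_fct VALUES
  (1, 201215, 'Regular'), (1, 201216, 'Regular'),
  (2, 201215, 'Regular'), (2, 201216, 'Infrequent'),
  (3, 201215, 'Regular'), (3, 201216, 'Regular'),
  (4, 201216, 'Regular');
""")

# Inner query: one row per user, one column per week (NULL when the user has
# no row that week). Outer query: count users per combination of classes.
rows = conn.execute("""
SELECT c_201215, c_201216, COUNT(*) AS cnt
FROM (SELECT user_id,
             MAX(CASE WHEN week_key = 201215 THEN class_nm END) AS c_201215,
             MAX(CASE WHEN week_key = 201216 THEN class_nm END) AS c_201216
      FROM user_classification_fct
      GROUP BY user_id) t
GROUP BY c_201215, c_201216
ORDER BY cnt DESC
""").fetchall()
print(rows)
```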
If you want to extract pairs of weeks, I would suggest putting the above into a temporary table, say "t", and doing a series of extracts with unions:
select *
from ((select '201215' as weekstart, c_201215, c_201216, sum(cnt) as cnt
from t
group by c_201215, c_201216
) union all
(select '201216', c_201216, c_201217, sum(cnt) as cnt
from t
group by c_201216, c_201217
) union all
(select '201217', c_201217, c_201218, sum(cnt) as cnt
from t
group by c_201217, c_201218
) union all
(select '201218', c_201218, c_201219, sum(cnt) as cnt
from t
group by c_201218, c_201219
)
) tg
order by 1, cnt desc
I suggest putting it in a subquery because you don't want to mess around with common-subquery optimizations on such a large table. You'll get to your final answer by summarizing first and then bringing the data together.
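The week-pair extraction can be sketched on a small, hypothetical summary table "t" shaped like the pivot output, trimmed here to three weeks. Python's sqlite3 stands in for MySQL. The per-branch parentheses from the query above are omitted for SQLite, and the from_class/to_class column names are illustrative additions.

```python
import sqlite3

# Hypothetical summary table "t" (one row per class combination, with a count).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (c_201215 TEXT, c_201216 TEXT, c_201217 TEXT, cnt INTEGER);
INSERT INTO t VALUES
  ('Regular', 'Regular',    'Regular',    50),
  ('Regular', 'Infrequent', 'Infrequent', 10),
  (NULL,      'Regular',    'Infrequent',  5);
""")

# Each UNION ALL branch collapses the summary to one pair of adjacent weeks;
# SUM(cnt) re-aggregates the combinations that share that pair.
rows = conn.execute("""
SELECT * FROM (
  SELECT '201215' AS weekstart, c_201215 AS from_class, c_201216 AS to_class, SUM(cnt) AS cnt
  FROM t GROUP BY c_201215, c_201216
  UNION ALL
  SELECT '201216', c_201216, c_201217, SUM(cnt)
  FROM t GROUP BY c_201216, c_201217
) tg
ORDER BY weekstart, cnt DESC
""").fetchall()
for row in rows:
    print(row)
```

Each output row reads as "cnt users went from from_class in weekstart to to_class the following week". A NULL from_class marks users who first appeared in the later week.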