MySQL: how to find cumulative total by group - mysql

Can anyone help me sort this out, please? I have an episode table, and each episode is followed by a series of appointments. The episode table looks like this:
+-------------+------------+------------+------------+----------------+------+
| Episode_id | Patientid | St_date | End_date | Status | ... |
+-------------+------------+------------+------------+----------------+------+
| 61112345 | 100001 | 12-01-2010 | | Active | |
| 61112346 | xxxxxx | 20-01-2010 | 10-10-2011 | Withdrawn | |
| ......... | xxxxxxxx | 30-01-2010 | 10-05-2011 | Lost to follow | |
| ......... | xxxxxxxx | 01-02-2011 | Active | Active | |
+-------------+------------+------------+------------+----------------+------+
The Status field holds the status of each episode. An episode has 6 appointments, one every 3 months, so an episode spans 18 months in total. Some patients complete all 6 appointments, some withdraw in the middle, and some are lost to follow-up. I need to create a dashboard.
The appointment table has these fields:
Appointment_id
PatientId
...
Stats // Completed or pending, which is used for reporting
For example, if a patient completes 2 appointments and is then marked as Withdrawn on the episode, he counts as active for visits 1 and 2 and as withdrawn at visit 3. If we lose him to follow-up at the 5th appointment, he counts as active for 4 appointments and is added to lost to follow-up at visit 5. If he completes all appointments, he is counted as active for all 6 visits. The report should look like this:
Report from 01-01-2010 to 31-12-2010
+--------+--------+-------------+----------------+---------+
| | Active | Withdrawn | Lost to follow | Revised |
+--------+--------+-------------+----------------+---------+
| visit1 | 1500 | 30 | 5 | 5 |
| Visit2 | 1800 | 20 | 4 | 3 |
| Visit3 | 1900 | 45 | 3 | 2 |
| Visit4 | 1800 | 34 | 0 | 1 |
| Visit5 | 1900 | 30 | 0 | 1 |
| Visit6 | 1200 | 20 | 0 | 5 |
+--------+--------+-------------+----------------+---------+
Currently we fetch the rows with a query and build this report in a loop, but that takes a long time to process. Is there any way I can achieve this with the query itself?

It isn't really clear what you want to group by, but I can give you a general answer. After your WHERE clause you can add "GROUP BY fieldname ORDER BY fieldname", where fieldname is the column that defines the groups. You can then use COUNT(...) or SUM(...) in the select list to count or add up the rows in each group.
This may be helpful: http://www.artfulsoftware.com/infotree/qrytip.php?id=105
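One way to let the query do the work of the loop is conditional aggregation: count each patient's completed appointments once, cross join that result with the visit numbers 1 to 6, and sum CASE expressions per visit. The sketch below is only an illustration under assumptions. The column names (patient_id, st_date, stats), the helper table visits and the sample rows are made up, each patient is assumed to have a single episode, dates are assumed to be stored as YYYY-MM-DD, and the Revised column is left out because the question does not define it. SQLite (through Python's sqlite3 module) stands in for MySQL so the example runs on its own; the SELECT uses only syntax that MySQL 8 also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE episode (episode_id INTEGER PRIMARY KEY, patient_id INTEGER,
                          st_date TEXT, status TEXT);
    CREATE TABLE appointment (appointment_id INTEGER PRIMARY KEY,
                              patient_id INTEGER, stats TEXT);
    CREATE TABLE visits (n INTEGER PRIMARY KEY);  -- hypothetical helper: 1..6
    INSERT INTO visits (n) VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO episode VALUES
        (1, 101, '2010-01-12', 'Withdrawn'),
        (2, 102, '2010-01-20', 'Lost to follow'),
        (3, 103, '2010-01-30', 'Active');
""")
# Made-up sample: patient 101 completed 2 visits, 102 completed 4, 103 all 6.
for patient_id, completed in [(101, 2), (102, 4), (103, 6)]:
    conn.executemany(
        "INSERT INTO appointment (patient_id, stats) VALUES (?, 'Completed')",
        [(patient_id,)] * completed,
    )

REPORT_SQL = """
WITH done AS (
    -- one row per patient: episode status and number of completed visits
    SELECT e.patient_id, e.status, COUNT(a.appointment_id) AS n_done
    FROM episode e
    LEFT JOIN appointment a
           ON a.patient_id = e.patient_id AND a.stats = 'Completed'
    WHERE e.st_date BETWEEN ? AND ?
    GROUP BY e.patient_id, e.status
)
SELECT v.n AS visit,
       -- active for every visit the patient completed
       SUM(CASE WHEN v.n <= d.n_done THEN 1 ELSE 0 END) AS active,
       -- withdrawn / lost is booked on the first visit that was not completed
       SUM(CASE WHEN v.n = d.n_done + 1 AND d.status = 'Withdrawn'
                THEN 1 ELSE 0 END) AS withdrawn,
       SUM(CASE WHEN v.n = d.n_done + 1 AND d.status = 'Lost to follow'
                THEN 1 ELSE 0 END) AS lost_to_follow
FROM visits v
CROSS JOIN done d
GROUP BY v.n
ORDER BY v.n
"""

report = conn.execute(REPORT_SQL, ("2010-01-01", "2010-12-31")).fetchall()
for row in report:
    print(row)  # (visit, active, withdrawn, lost_to_follow)
```

With the three sample patients, visit 3 shows 2 active and 1 withdrawn, and visit 5 shows 1 active and 1 lost to follow-up, which matches the counting rule described in the question.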

Related

Need an SQL query to calculate the sum of user activity time in desired period

I'm looking for a way to find out how much time in total has each user been active during, for example, the month of February 2020. Is there a way this can be done by querying the MySQL database?
I have a "user_activity" table in my database, which contains all changes that the user makes to his account.
It has 4 columns that can be important for this calculation:
"field" - this column gathers data on what type of change the user is making to his account, the value of this column is "active" for all cases where the user is activating/deactivating his account
"old_value" - this is the value of what is inside of "field" that was present before the change
"new_value" - this is the new value of what is inside of "field"
"activity_time" - this is the time of the change
EXAMPLE TABLE: user_activity
| user_activity_id | user_id | field | old_value | new_value | activity_time |
-------------------------------------------------------------------------------------
| 1 | 1 | active | 1 | 0 | 2020-01-01 15:45:00 |
| 2 | 1 | active | 0 | 1 | 2020-01-02 10:31:00 |
| 3 | 3 | active | 0 | 1 | 2020-01-02 16:22:00 |
| 4 | 4 | active | 0 | 1 | 2020-01-03 03:25:00 |
| 5 | 4 | active | 1 | 0 | 2020-01-06 19:59:00 |
So each time the user activates his account a line is entered in "user_activity" table with new activity_time and values where field = "active" and old_value = 0 and new_value = 1.
A single user can activate or deactivate his account multiple times during 1 month and I'm working on a table with tens of thousands of entries like this.
EXAMPLE DESIRED OUTPUT:
| user_id | active_hours_feb_2020 |
-----------------------------------
| 1 | 500 |
| 2 | 0 |
| 3 | 700 |
| 4 | 250 |
You can get the previous activity time for each row and calculate the difference between the two timestamps. I think the query will work as long as every activation and deactivation is recorded.
query using lag - https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9f6fea01dd9fdc3e91d1e446ec027927
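As a rough illustration of the window-function idea, the sketch below pairs each activation with the next change for the same user and sums the clipped interval lengths. It is not the query from the linked fiddle. It uses LEAD, the forward-looking counterpart of LAG, and it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) so that it is self-contained. In MySQL 8 the two-argument MAX/MIN would become GREATEST/LEAST and the julianday arithmetic would become TIMESTAMPDIFF. The reporting period (January 2020, because the sample rows are from January) and the parameter names p_start and p_end are assumptions. Time before a user's first recorded change is ignored, and users without any activation row do not appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_activity (
        user_activity_id INTEGER PRIMARY KEY,
        user_id INTEGER, field TEXT,
        old_value INTEGER, new_value INTEGER,
        activity_time TEXT)
""")
# Sample rows copied from the example table in the question.
conn.executemany(
    "INSERT INTO user_activity VALUES (?, ?, 'active', ?, ?, ?)",
    [
        (1, 1, 1, 0, "2020-01-01 15:45:00"),
        (2, 1, 0, 1, "2020-01-02 10:31:00"),
        (3, 3, 0, 1, "2020-01-02 16:22:00"),
        (4, 4, 0, 1, "2020-01-03 03:25:00"),
        (5, 4, 1, 0, "2020-01-06 19:59:00"),
    ],
)

ACTIVE_HOURS_SQL = """
WITH changes AS (
    -- pair every change with the time of the user's next change
    SELECT user_id, new_value, activity_time AS started,
           LEAD(activity_time) OVER (
               PARTITION BY user_id ORDER BY activity_time) AS ended
    FROM user_activity
    WHERE field = 'active'
),
clipped AS (
    -- keep only activations and clip each interval to the period;
    -- an activation with no later change runs to the period end
    SELECT user_id,
           MAX(started, :p_start) AS s,
           MIN(COALESCE(ended, :p_end), :p_end) AS e
    FROM changes
    WHERE new_value = 1
)
SELECT user_id, ROUND(SUM((julianday(e) - julianday(s)) * 24), 2) AS hours
FROM clipped
WHERE e > s
GROUP BY user_id
ORDER BY user_id
"""

period = {"p_start": "2020-01-01 00:00:00", "p_end": "2020-02-01 00:00:00"}
active_hours = dict(conn.execute(ACTIVE_HOURS_SQL, period).fetchall())
print(active_hours)
```

For the sample rows, user 4 is active from 2020-01-03 03:25 to 2020-01-06 19:59, which is about 88.57 hours; users 1 and 3 are still active at the end of the period, so their intervals are cut off at the period end.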

Storing changes on MySQL Database properly

I've currently got three tables to track changes on specific entries but it seems like I am ending up with a ton of entries and I am not sure if that is the best possible way.
My first table holds the basic information and the second and third one the extra entries I grab every 8 hours.
The first table has the columns ID | creation_date | removal_date | article_url | status, which are basically the most stable entries. Status and removal_date are the only ones that will change, in case we disable or remove an entry.
Example:
ID | creation_date | removal_date | article_url | status
---|------------------|------------------|-------------|-------
1 | 10/01/2020 20:00 | NULL | http://xxx | 1
2 | 23/01/2020 10:00 | 27/01/2020 13:00 | http://xxx2 | 2
3 | 10/02/2020 15:00 | NULL | http://xxx3 | 1
Status 1 = Active
Status 2 = Inactive
The second table holds everything else:
ID | main_id | last_update | title | description | views | rating | comments
The second table creates a new entry every 8 hours as long as something changes. Then based on the entries added here, I show average views/rating/comments changes on a daily/weekly/monthly basis.
Example:
ID | main_id | last_update | title | description | views | rating | comments
---|---------|------------------|----------------|--------------------|-------|--------|---------
1 | 1 | 10/01/2020 20:00 | First Article | Description.. | 1 | 1 | 0
2 | 2 | 23/01/2020 10:00 | Second Article | Desc.. | 1 | 1 | 0
3 | 1 | 11/01/2020 20:00 | First Article | Description update | 15 | 3 | 2
4 | 1 | 12/01/2020 20:00 | 1st Article | Description update | 30 | 5 | 4
5 | 3 | 10/02/2020 15:00 | 3rd Article | Descript! | 3 | 1 | 1
The third table holds the tags:
ID | main_id | tag_id | date_added | date_removed
Instead of having a status column, I thought of leaving date_removed empty and filling it in when a tag gets updated or removed. The tags themselves are saved in a separate table; this table just stores the connection between an article and a tag ID.
Example:
ID | main_id | tag_id | date_added | date_removed
---|---------|--------|------------------|------------------
1 | 1 | 2 | 10/01/2020 20:00 | NULL
2 | 1 | 3 | 15/01/2020 16:30 | 17/01/2020 13:00
3 | 2 | 3 | 23/01/2020 10:00 | NULL
4 | 3 | 5 | 10/02/2020 15:00 | NULL
5 | 1 | 5 | 11/02/2020 17:00 | NULL
I'd just like to know if there is a better / more proper way to store the above data.
Yepp, #Maria, is clearer.
Assuming that you are dealing with blog entries, you may have a data model like this.
Table 1. articles. // where every article is created.
article_id | article_creation_date | article_title | article_url | article_creator_id | article_description |
-----------|-----------------------|---------------|-------------|--------------------|------------------------------|
1 | 2020/03/31 10:36:05 | "The Dilemma" | /articles/1 | 23 | Explains the relations....|
Table 2. article_status // stores changes of state to each article.
article_status | article_id | status | date_of_change |
---------------|------------|--------|--------------------|
1 | 1 | 7 | 15/04/2020 09:30:00|
Table 3. article_tags. // every article and its tags
article_tag | article_id | tag_id | date_added |
------------|------------|--------|--------------------|
1 | 1 | 24 | 15/04/2020 09:30:00|
Table 4. article_views // stores the summarized number of views of each article for a period, say a day, a week, 8 hours, ...
article_v_id | article_id | views_summarized | time_lapse | time_lapse_value | date_summarization  |
-------------|------------|------------------|------------|------------------|---------------------|
1            | 1          | 1578             | Day        | 10/04/2020       | 12/04/2020 13:27:04 |
Table 5. article_updates // stores changes/updates made to each article.
article_update_id | article_id | type_of_update | update_detail | update_author | date_of_update      |
------------------|------------|----------------|---------------|---------------|---------------------|
1                 | 1          | Title          |               | John Doe      | 19/04/2020 15:27:24 |
The content of the update, say a change of title, is stored directly in the articles table. There is no need to store all modified titles or content, just the event and who made the change.

MySQL return prioritizes value else return other value

I have a table (This is a mock-table),
+------------+------+--------+
| Name | Code | Active |
+------------+------+--------+
| Sales | 55 | 1 |
| Sales | 55 | 0 |
| IT | 22 | 1 |
| Production | 33 | 1 |
| Production | 33 | 0 |
| Marketing | 77 | 0 |
| Marketing | 77 | 0 |
+------------+------+--------+
And I want to return a list of distinct names and codes. However, I want to determine whether the department is active or not. So if Sales has one row with 1 in Active and one row with 0, it is active, but if it had only zeros, it is not.
I've tried a variety of methods and read through a few dozen SO posts, but am not making any progress.
The output I am trying to achieve is:
+------------+------+--------+
| Name | Code | Active |
+------------+------+--------+
| Sales | 55 | 1 |
| IT | 22 | 1 |
| Production | 33 | 1 |
| Marketing | 77 | 0 |
+------------+------+--------+
How can I prioritize the Active column to a value of 1, but still return an entry if all entries with the same code have a value of 0 (such as marketing)?
GROUP BY name and code and get the maximum value of Active. (Assuming 0 and 1 are the possible values for Active column)
SELECT Name, Code, MAX(Active) AS Active
FROM tablename
GROUP BY Name, Code

Database schema - Configurable fields?

I sell leads and charge my clients like so:
(Only one of the following payment types can be charged to a client)
Pay Per Lead:
$__ for the first __ leads per month
$__ for the next __ leads per month
$__ for the next __ leads per month
and so on...
Pay per Appointment:
$__ for the first __ leads per month
$__ for the next __ leads per month
$__ for the next __ leads per month
and so on...
Pay per Percentage of Sale:
__% of the sale price (per sale)
My Question:
What are the best possible database design solutions in such cases?
What I have tried:
+---------+
| clients |
+---------+
| id |
| name |
+---------+
+---------------+
| deals |
+---------------+
| client_id |
| max_quantity |
| cost |
| unit_type |
+---------------+
So records for client with the id 1 might look like:
+-----------+--------------+---------------+-------------+
| client_id | max_quantity | cost_per_unit | unit_type |
+-----------+--------------+---------------+-------------+
| 1 | 10 | 10 | lead |
| 1 | 30 | 5 | lead |
| 1 | 100 | 2 | lead |
| 1 | 10 | 35 | appointment |
| 1 | 30 | 20 | appointment |
| 1 | 100 | 10 | appointment |
| 1 | 1000 | 5 | appointment |
| 1 | 0 | 50 | sale |
+-----------+--------------+---------------+-------------+
Now the above table means that:
$10 will be charged per lead up to 10 leads
$5 will be charged per lead up to 30 leads
$2 will be charged per lead up to 100 leads
$35 will be charged per appointment up to 10 leads
$20 will be charged per appointment up to 30 leads
$10 will be charged per appointment up to 100 leads
$5 will be charged per appointment up to 1000 leads
$50 will be charged per sale
Also, I want to be able to add any number of such rules (per lead, per appointment, per sale).
I personally don't think that my approach is one of the best solutions. Looking forward to hearing from you clever folks! Thank you.
P.S. I know that unit_type can be further normalized but this is not the issue :)
Update
Maybe I can store serialized data?
Your proposed schema is a good start and has some merits. IMO the less elegant parts are the denormalized repetition of unit_type values and non-functional max_quantity value for sale.
I would suggest splitting deals into three tables rather than one. I would personally go with singular rather than plural table names** and begin each with the same prefix so they are listed close to each other: something like commission_lead, commission_appointment and commission_sale.
** [Lots of debate on this here]
I would also suggest including both the lower and the upper bound of the band in each row. This stores more data than is strictly needed, but I think it is worth doing because it makes the table data more readable and simplifies the calculation queries.
So the proposed new schema is:
+---------+
| client |
+---------+
| id |
| name |
+---------+
+-----------------+
| commission_lead |
+-----------------+
| client_id |
| min_quantity |
| max_quantity |
| cost_per_unit |
+-----------------+
+------------------------+
| commission_appointment |
+------------------------+
| client_id |
| min_quantity |
| max_quantity |
| cost_per_unit |
+------------------------+
+-----------------+
| commission_sale |
+-----------------+
| client_id |
| cost_per_unit |
+-----------------+
And the records for client_id = 1 are:
commission_lead
+-----------+--------------+--------------+---------------+
| client_id | min_quantity | max_quantity | cost_per_unit |
+-----------+--------------+--------------+---------------+
| 1 | 0 | 10 | 10 |
| 1 | 11 | 30 | 5 |
| 1 | 31 | 100 | 2 |
+-----------+--------------+--------------+---------------+
commission_appointment
+-----------+--------------+--------------+---------------+
| client_id | min_quantity | max_quantity | cost_per_unit |
+-----------+--------------+--------------+---------------+
| 1 | 0 | 10 | 35 |
| 1 | 11 | 30 | 20 |
| 1 | 31 | 100 | 10 |
| 1 | 101 | 1000 | 5 |
+-----------+--------------+--------------+---------------+
commission_sale
+-----------+---------------+
| client_id | cost_per_unit |
+-----------+---------------+
| 1 | 50 |
+-----------+---------------+
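I am assuming that changes (updates/inserts) are very rare and that most of the time you run a select to calculate the cost, so I propose this design. Each row stores default_cost, the total cost of all lower tiers (for example 100 = 10 leads at $10), which keeps the select that calculates the cost very simple.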
+-----------+--------------+--------------+---------------+--------------+-------------+
| client_id | max_quantity | min_quantity | cost_per_unit | default_cost | unit_type   |
+-----------+--------------+--------------+---------------+--------------+-------------+
| 1         | 10           | 0            | 10            | 0            | lead        |
| 1         | 40           | 10           | 5             | 100          | lead        |
| 1         | 140          | 40           | 2             | 250          | lead        |
| 1         | 10           | 0            | 35            | 0            | appointment |
| 1         | 40           | 10           | 20            | 350          | appointment |
| 1         | 140          | 40           | 10            | 950          | appointment |
| 1         | 1140         | 140          | 5             | 1950         | appointment |
| 1         | 0            | 0            | 50            | 0            | sale        |
+-----------+--------------+--------------+---------------+--------------+-------------+
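The select query looks like this:
select
  default_cost + ($quantity - min_quantity) * cost_per_unit
from
  deals
where
  client_id = $client_id
  and unit_type = $unit_type
  -- strict lower bound, so a boundary quantity such as 10 matches exactly one tier
  and $quantity > min_quantity
  -- max_quantity = 0 marks a rule without an upper bound, such as sale
  and (max_quantity >= $quantity or max_quantity = 0)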
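To check the arithmetic of this design, here is a small runnable sketch that loads the rows from the table above and evaluates the cost for a few quantities. It makes some assumptions: the table is called deals (the answer only says "table"), a client_id filter is added, the lower bound is strict so that a quantity sitting exactly on a tier boundary matches one row only, and SQLite through Python's sqlite3 module stands in for MySQL. The helper name cost is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE deals (
        client_id INTEGER, max_quantity INTEGER, min_quantity INTEGER,
        cost_per_unit INTEGER, default_cost INTEGER, unit_type TEXT)
""")
# Rows copied from the proposed table: default_cost is the total of all lower tiers.
conn.executemany("INSERT INTO deals VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 0, 10, 0, "lead"),
    (1, 40, 10, 5, 100, "lead"),
    (1, 140, 40, 2, 250, "lead"),
    (1, 10, 0, 35, 0, "appointment"),
    (1, 40, 10, 20, 350, "appointment"),
    (1, 140, 40, 10, 950, "appointment"),
    (1, 1140, 140, 5, 1950, "appointment"),
    (1, 0, 0, 50, 0, "sale"),
])

COST_SQL = """
SELECT default_cost + (:quantity - min_quantity) * cost_per_unit
FROM deals
WHERE client_id = :client_id
  AND unit_type = :unit_type
  AND :quantity > min_quantity
  AND (max_quantity >= :quantity OR max_quantity = 0)
"""

def cost(client_id, unit_type, quantity):
    """Total charge for `quantity` units, looked up from the matching tier."""
    if quantity <= 0:
        return 0
    row = conn.execute(COST_SQL, {
        "client_id": client_id, "unit_type": unit_type, "quantity": quantity,
    }).fetchone()
    if row is None:
        raise LookupError("no tier covers this quantity")
    return row[0]

print(cost(1, "lead", 25))          # 10 leads at $10 plus 15 leads at $5
print(cost(1, "appointment", 150))  # falls in the 140..1140 tier
print(cost(1, "sale", 3))           # flat $50 per sale
```

For 25 leads the second tier matches, giving 100 + (25 - 10) * 5 = 175, which is the same as charging the first 10 leads at $10 and the next 15 at $5.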
If you consider the cost calculations to be business logic that is likely to change in the future, and you don't need to filter or sort the table by the calculation constants, I recommend having one column for rule_id, which works pretty much like your unit_type, and one varchar column called properties, where all the specific values needed for that rule are stored with a separator.
You then retrieve the rules that apply for your client to your business logic and do your calculations there. If you need a new rule that suddenly takes 5 parameters, you don't need to change the database schema. Simply write code for a new rule_id in your business logic and you are good to go.
Of course, if you prefer to move calculation logic into stored procedures and/or need to filter or order by rule properties, I think you should go with separate columns for each rule parameter...

Assistance with database design

I've got an Excel sheet that contains all the employees who have worked for my company or are still working for us. It's a sheet of around 200 rows. Each row has basic info like surname, name, position, qualification, etc.: 16 columns of basic info. Now, the tricky part is this. After the 16 columns, there are months (May-05 up to the present (Apr-12)). Under every month column, an employee gets either a 0 (contract), 1 (permanent), 2 (contract-terminated) or 3 (student).
What would be the best way to do this? I was thinking of 4 tables (listed below), where the one table determines permanently terminated people (for the sake of knowing who was on what type of employment).
MySQL Table: hr_employees
|-----------------|-------|----|----|----|
| employee_number | name | sur| etc| etc|
|-----------------|-------|----|----|----|
| 1 | Dave | F | xx | xx |
|-----------------|-------|----|----|----|
MySQL Table: hr_month
|----|--------|
| id | month |
|----|--------|
| 1 | May-05 |
| 2 | Jun-05 |
|----|--------|
MySQL Table: hr_status
|----|------|------|--------|
| id | e_no | date | status |
|----|------|------|--------|
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
|----|------|------|--------|
MySQL Table: hr_terminated
|----|------|
| id | e_no |
|----|------|
| 1 | 1 |
| 2 | 1 |
|----|------|
I hope you guys understand what I want to achieve, otherwise, ask a question, and I'll answer as best I can! :)
Thanks.
Here is a design that simplifies your data entry and is more relational database like and less Excel like, insofar as it's normalized.
MySQL Table: hr_employee
|-----------------|-------|----|----|----|
| employee_number | name | sur| etc| etc|
|-----------------|-------|----|----|----|
| 1 | Dave | F | xx | xx |
|-----------------|-------|----|----|----|
| 2 | Bob | M | xx | xx |
|-----------------|-------|----|----|----|
MySQL Table: hr_employee_status
|-----------------|------------|------------|--------|
| employee_number | from_date | to_date | status |
|-----------------|------------|------------|--------|
| 1 | 2005-05-01 | 2005-08-31 | 3 |
|-----------------|------------|------------|--------|
| 1 | 2006-05-01 | 2010-02-28 | 0 |
|-----------------|------------|------------|--------|
| 2 | 2010-03-01 | 9999-12-31 | 1 |
|-----------------|------------|------------|--------|
Here you can see that Dave was hired on as a student from May '05 to August '05, then he came back in May '06 as a contract employee which he worked as until the end of February '10. Then on March 1, 2010 Bob was hired as permanent employee and he is still working (max collating date means "until further notice").
The great advantage of this design is that you only have to enter/edit data when something changes, not once a month for every employee that you have or have ever had. You can also see what your workforce looked like at any given date (not just by months!) with a very simple SQL query.
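A minimal sketch of such a point-in-time query, using the sample rows above: the date you ask about only has to fall between from_date and to_date. SQLite through Python's sqlite3 module stands in for MySQL here so the example is self-contained, the dates are stored as YYYY-MM-DD text, and the helper name workforce_on is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hr_employee (employee_number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hr_employee_status (
        employee_number INTEGER, from_date TEXT, to_date TEXT, status INTEGER);
    INSERT INTO hr_employee VALUES (1, 'Dave'), (2, 'Bob');
    INSERT INTO hr_employee_status VALUES
        (1, '2005-05-01', '2005-08-31', 3),
        (1, '2006-05-01', '2010-02-28', 0),
        (2, '2010-03-01', '9999-12-31', 1);
""")

WORKFORCE_SQL = """
SELECT e.employee_number, e.name, s.status
FROM hr_employee e
JOIN hr_employee_status s ON s.employee_number = e.employee_number
WHERE ? BETWEEN s.from_date AND s.to_date
ORDER BY e.employee_number
"""

def workforce_on(day):
    """Everyone employed on `day` (YYYY-MM-DD) with their status code."""
    return conn.execute(WORKFORCE_SQL, (day,)).fetchall()

print(workforce_on("2005-06-15"))  # Dave as a student (status 3)
print(workforce_on("2010-03-15"))  # Bob as permanent (status 1)
print(workforce_on("2006-01-01"))  # nobody: Dave's contract starts in May '06
```

Because open-ended rows use 9999-12-31 as to_date, the same BETWEEN condition covers current employees without a special case for NULL.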