I'm having a problem with an aggregate function in MySQL.
As an example I have this table layout. It gets filled with data every x minutes.
Company | Employee | Room | Temperature
---------------------------------------
A | Mike | 301 | 20
A | Mike | 301 | 30
A | Mike | 301 | 30
A | Mike | 402 | 40
A | Lisa | 402 | 10
Now in my query I'm grouping Company + Employee into one result and I'm looking for the count of the maximum occurrences of the Room value while still aggregating other values like temperature.
SELECT
Company,
Employee,
??? as Room,
AVG(Temperature) as Temperature
FROM
example_table
GROUP BY
Company,
Employee
In this example room 301 appears 3 times for Mike, so the aggregate function should output 3. Since the data is recorded at a set interval, this is basically the length of the employee's longest stay in a room. I'm looking for a result like this:
Company | Employee | Room | Temperature
---------------------------------------
A | Mike | 3 | 30
A | Lisa | 1 | 10
I feel like I'm missing something, but so far I have found nothing that worked in a query for me. I can GROUP_CONCAT the rooms and solve this with 2 lines of code in PHP, but the actual data is gigabytes, which I don't want to send to a script. Performance of the database query doesn't matter. MySQL 8 is available.
edit: I've changed the example to make sure COUNT(DISTINCT Room) doesn't accidentally give the correct result, because it's not what I'm looking for.
SELECT Company, Employee
, MAX(roomOccurrence) AS Room
, SUM(roomTemp * roomOccurrence) / SUM(roomOccurrence) AS Temperature
FROM ( SELECT Company, Employee, Room
, COUNT(*) AS roomOccurrence, AVG(Temperature) AS roomTemp
FROM example_table
GROUP BY Company, Employee, Room
) AS subQ
GROUP BY Company, Employee
;
Note that the outer temperature expression weights each room's average temperature from the inner query by that room's row count.
Alternatively, you could SUM the temps in the subquery and then divide the SUM of those SUMs by the SUM of the room COUNTs; the result should be the same either way. I would expect minor performance differences at best, and I'm not sure either way would be consistently faster.
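The SUM-of-SUMs variant described above can be tried on the sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained), and the alias roomTempSum is made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table "
    "(Company TEXT, Employee TEXT, Room INTEGER, Temperature REAL)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?, ?)",
    [
        ("A", "Mike", 301, 20),
        ("A", "Mike", 301, 30),
        ("A", "Mike", 301, 30),
        ("A", "Mike", 402, 40),
        ("A", "Lisa", 402, 10),
    ],
)

query = """
SELECT Company, Employee,
       MAX(roomOccurrence) AS Room,                            -- longest stay
       SUM(roomTempSum) / SUM(roomOccurrence) AS Temperature   -- overall average
FROM (SELECT Company, Employee, Room,
             COUNT(*) AS roomOccurrence,
             SUM(Temperature) AS roomTempSum   -- hypothetical alias
      FROM example_table
      GROUP BY Company, Employee, Room) AS subQ
GROUP BY Company, Employee
ORDER BY Employee DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For Mike the inner query yields two rows (room 301 with 3 readings summing to 80, room 402 with 1 reading of 40), so the outer query reports 3 and 120 / 4 = 30, matching the result table in the question.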
Related
Following is my scenario
I have tables named
Products
id | name | count | Price
-------------------------
1 | meat | 1 | 10
Users
id | name | balance
-----------------
1 | Tim | 10
2 | Joe | 10
Work flow
select products if count >= 1,
reduce user's balance and count = count - 1
if no_balance or count < 1 throw error
Suppose both users place an order for the single remaining product at exactly the same time: the count in the products table ends up at -1, which means the query executed for both users.
Products
id | name | count | Price
-------------------------
1 | meat | -1 | 10
When placing an order, I use the query below to select matching products:
Select * from products where count >= 1 and price >= 10
If the users place their orders even a short time apart, I get the expected output.
Is there any solution to this?
You should consider locking each row you are about to change. Run the select inside a transaction so the lock is held until you commit, for example:
Select * from products where count >= 1 and price >= 10 FOR UPDATE;
But in your scenario, I would advise using Redis for this.
How to design a second kill system for online shop
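A related option, different from the FOR UPDATE lock suggested above, is to put the stock check into the UPDATE statement itself, so that checking and decrementing happen in one statement and the count cannot drop below zero. The sketch below is an illustration only: it uses Python's built-in sqlite3 as a stand-in for MySQL, runs the two orders one after the other rather than concurrently, and place_order is a hypothetical helper that does not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, count INTEGER, price INTEGER)"
)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)")
conn.execute("INSERT INTO products VALUES (1, 'meat', 1, 10)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "Tim", 10), (2, "Joe", 10)])
conn.commit()


def place_order(conn, user_id, product_id):
    """Hypothetical helper: take one unit of stock and charge the user."""
    with conn:  # commits on success, rolls back if an exception is raised
        price = conn.execute(
            "SELECT price FROM products WHERE id = ?", (product_id,)
        ).fetchone()[0]
        # The guard lives in the WHERE clause: no row is updated if count < 1.
        took_stock = conn.execute(
            "UPDATE products SET count = count - 1 WHERE id = ? AND count >= 1",
            (product_id,),
        ).rowcount
        if took_stock != 1:
            raise ValueError("out of stock")
        paid = conn.execute(
            "UPDATE users SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (price, user_id, price),
        ).rowcount
        if paid != 1:
            raise ValueError("insufficient balance")  # also undoes the stock update


results = []
for user_id in (1, 2):
    try:
        place_order(conn, user_id, 1)
        results.append("ok")
    except ValueError as exc:
        results.append(str(exc))

stock = conn.execute("SELECT count FROM products WHERE id = 1").fetchone()[0]
print(results, stock)
```

The application decides from the affected-row count whether the order went through, instead of trusting an earlier SELECT that may already be stale.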
I have two tables (Invoices and taxes) in mysql:
Invoices:
- id
- account_id
- issued_at
- total
- gross_amount
- country
Taxes:
- id
- invoice_id
- tax_name
- tax_rate
- taxable_amount
- tax_amount
I'm trying to retrieve a report like this:
rep_month | country | total_amount | tax_name | tax_rate(%) | taxable_amount | tax_amount
--------------------------------------------------------------------------------------
2017-01-01 | ES | 1000 | TAX1 | 21 | 700 | 147
2017-01-01 | ES | 1000 | TAX2 | -15 | 700 | 105
2016-12-01 | FR | 100 | TAX4 | 20 | 30 | 6
2016-12-01 | FR | 100 | B2B | 0 | 70 | 0
2017-01-01 | GB | 2500 | TAX3 | 20 | 1000 | 200
The idea behind this is that an invoice has a has_many relation with taxes, so an invoice may or may not have taxes. The report should show the total amount collected (total_amount) for a given country (regardless of whether it includes taxes) and indicate which part of that total amount is taxable (taxable_amount) for a specific tax.
My current approach is this one:
SELECT
DATE_FORMAT(invoices.issued_at, '%Y-%m-01') AS rep_month,
invoices.country AS country,
( SELECT sum(docs.gross_amount)
FROM invoices AS docs
WHERE docs.country = invoices.country
AND DATE_FORMAT(docs.issued_at, '%Y-%m-01') = rep_month
) AS total_amount,
taxes.tax_name AS tax_name,
taxes.tax_rate AS tax_rate,
SUM(taxes.taxable_amount) AS taxable_amount,
SUM(taxes.tax_amount) AS tax_amount
FROM invoices
JOIN taxes ON invoices.id = taxes.invoice_id
AND invoices.issued_at BETWEEN '2016-01-01' AND '2017-12-31'
GROUP BY account_id, rep_month, country, tax_name, tax_rate
ORDER BY country desc
Well, this works but for a real dataset (thousands of records) it's really slow as the select subquery for retrieving the total_amount is being run for each row of the report.
I cannot make a LEFT JOIN taxes with a direct SUM(gross_amount) as the GROUP BY groups by tax name and rate and I need to show the total collected per country regardless if the amount was taxed or not. Is there a faster alternative to this?
I do not know the exact use case for this query, but the issue is the way you have structured the work: you are trying to compute the entire report in one go.
Ideally, you should run the query you have once and store its result in a separate table (a summary table), then query the summary table directly whenever you need the report. When new entries are added to the Invoices table, you can either rerun the query on every insert or update the summary table periodically via a cron job.
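One possible reading of the summary-table idea is to precompute only the per-month, per-country totals and join the tax rows against them, so the correlated subquery no longer runs for every report row. The sketch below makes several assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL (so strftime replaces DATE_FORMAT), the table name monthly_totals is hypothetical, the sample rows are invented for illustration, and it groups by month, country, and tax only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY, account_id INTEGER, issued_at TEXT,
                       total REAL, gross_amount REAL, country TEXT);
CREATE TABLE taxes (id INTEGER PRIMARY KEY, invoice_id INTEGER, tax_name TEXT,
                    tax_rate REAL, taxable_amount REAL, tax_amount REAL);

-- Invented sample rows; invoice 2 carries no taxes at all.
INSERT INTO invoices VALUES
  (1, 1, '2017-01-10', 847, 700, 'ES'),
  (2, 1, '2017-01-20', 300, 300, 'ES'),
  (3, 2, '2016-12-05', 106, 100, 'FR');
INSERT INTO taxes VALUES
  (1, 1, 'TAX1', 21, 700, 147),
  (2, 3, 'TAX4', 20, 30, 6),
  (3, 3, 'B2B', 0, 70, 0);

-- Hypothetical summary table: one row per month and country.
CREATE TABLE monthly_totals AS
SELECT strftime('%Y-%m-01', issued_at) AS rep_month,
       country,
       SUM(gross_amount) AS total_amount
FROM invoices
GROUP BY strftime('%Y-%m-01', issued_at), country;
""")

report = conn.execute("""
SELECT m.rep_month, m.country, m.total_amount,
       t.tax_name, t.tax_rate,
       SUM(t.taxable_amount) AS taxable_amount,
       SUM(t.tax_amount) AS tax_amount
FROM invoices AS i
JOIN taxes AS t ON t.invoice_id = i.id
JOIN monthly_totals AS m
  ON m.country = i.country
 AND m.rep_month = strftime('%Y-%m-01', i.issued_at)
GROUP BY m.rep_month, m.country, m.total_amount, t.tax_name, t.tax_rate
ORDER BY m.country DESC, t.tax_name
""").fetchall()
for row in report:
    print(row)
```

The ES total of 1000 includes the untaxed invoice, because the total comes from the summary table rather than from the rows that survive the join to taxes.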
First off, please excuse my formatting. I'm new here and this is my first posting.
I would like to take the value of a column (costs) split by invoice. For example, my table currently looks like this, where worker is the worker who worked on the job, inv is the invoice number, costs is the total costs for the invoice, and amount is the amount of work each worker did on that invoice. I would like to, at the end of this, be able to sum the worker amounts and cost amounts to come up with the invoice total:
Worker | Inv | Costs | Amount
-----------------------------
A | 1 | 12 | 50
B | 1 | 12 | 10
C | 1 | 12 | 40
A | 2 | 1 | 10
B | 2 | 1 | 10
and I would like it to look like this:
Worker | Inv | Costs | Amount
-----------------------------
A | 1 | 4 | 50
B | 1 | 4 | 10
C | 1 | 4 | 40
A | 2 | .5 | 10
B | 2 | .5 | 10
The end result after I throw this into a pivot table in Excel would show that Invoice1 is for a total of $112 and the total for Invoice2 is $21
This one sums the cost and amount for each invoice:
SELECT inv, SUM(Costs+Amount)
FROM MyTable
GROUP BY inv
The result is $112 for Invoice 1 and $21 for Invoice 2
select inv,count(1) as workercount
from tbl
group by inv
This query gives the number of workers on each invoice; workercount is the number you want to divide the costs by. We will use it as a subquery (inv is qualified with the table name because both sides of the join have an inv column):
select tbl.worker, tbl.inv, tbl.costs/invcount.workercount as costs, tbl.amount
from tbl
inner join
(select inv,count(1) as workercount
from tbl
group by inv) invcount
on tbl.inv = invcount.inv
Pivot the results from there. You could also calculate the total invoice amount in SQL instead of in the Excel spreadsheet.
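To check the split against the numbers in the question, the join can be run on the five sample rows. This is a sketch using Python's built-in sqlite3 as a stand-in for MySQL (an assumption). The costs column is declared REAL here because SQLite would otherwise perform integer division, whereas the / operator in MySQL returns a decimal result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE tbl (worker TEXT, inv INTEGER, costs REAL, amount REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [("A", 1, 12, 50), ("B", 1, 12, 10), ("C", 1, 12, 40), ("A", 2, 1, 10), ("B", 2, 1, 10)],
)

# Derived table: number of workers per invoice, used as the divisor.
join_sql = """
FROM tbl
INNER JOIN (SELECT inv, COUNT(*) AS workercount
            FROM tbl
            GROUP BY inv) AS invcount
        ON tbl.inv = invcount.inv
"""

# Each worker's share of the invoice costs.
split_rows = conn.execute(
    "SELECT tbl.worker, tbl.inv, tbl.costs / invcount.workercount AS costs, tbl.amount"
    + join_sql
    + "ORDER BY tbl.inv, tbl.worker"
).fetchall()

# Invoice totals computed in SQL instead of in a spreadsheet pivot.
totals = conn.execute(
    "SELECT tbl.inv, SUM(tbl.costs / invcount.workercount + tbl.amount) AS total"
    + join_sql
    + "GROUP BY tbl.inv ORDER BY tbl.inv"
).fetchall()

print(split_rows)
print(totals)
```

The totals come out as 112 for invoice 1 and 21 for invoice 2, the same figures the question expects from the pivot table.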
I am working on a PHP/MySQL timesheet system, and the report I want to create selects all employees who have worked less than the required amount of time between two dates.
The employee's time is stored in hours and minutes (INT), but I am only concerned with the hours.
The employee table looks like:
ID | Name
1 | George
2 | Fred
The timesheet_entry table:
ID | employeeID | hour | date
1 | 1 | 2 | 2013-07-25
2 | 2 | 4 | 2013-07-25
3 | 1 | 3 | 2013-07-25
So if I SELECT employees who have worked less than 5 hours (PHP variable hrsLimit) on 2013-07-25, it should return 2 Fred, as George has worked a total of 5 hours on that date.
I have a HTML form so the user can set the variables for the query.
I have tried:
SELECT employeeid,
employeename
FROM employee
JOIN timesheet_entry tse
ON tse.tse_employeeid = employeeid
AND Sum(tse.hour) < $hrslimit
I have not worried about the date yet.
The confusing bit here is that we are joining two tables. Perhaps I should select the hours and put the SUM clause at the end in a WHERE instead?
You need to group data and then place SUM condition in the HAVING part of the query.
select employee.id,
employee.Name,
Date,
sum(`hour`)
from timesheet_entry
join employee on timesheet_entry.employeeID=employee.ID
group by timesheet_entry.employeeID,date
having sum(`hour`)<$hrslimit
SQLFiddle demo
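The GROUP BY and HAVING approach can be checked against the sample rows, with the date range added as a WHERE filter. This is a sketch under a few assumptions: Python's built-in sqlite3 stands in for MySQL, and the limit and dates are passed as bound parameters instead of being interpolated from a PHP variable. Because it uses an inner join, employees with no timesheet entries in the range do not appear at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE employee (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE timesheet_entry "
    "(ID INTEGER PRIMARY KEY, employeeID INTEGER, hour INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "George"), (2, "Fred")])
conn.executemany(
    "INSERT INTO timesheet_entry VALUES (?, ?, ?, ?)",
    [(1, 1, 2, "2013-07-25"), (2, 2, 4, "2013-07-25"), (3, 1, 3, "2013-07-25")],
)

hrs_limit = 5  # plays the role of the PHP variable from the question
rows = conn.execute(
    """
    SELECT e.ID, e.Name, SUM(t.hour) AS hours
    FROM timesheet_entry AS t
    JOIN employee AS e ON t.employeeID = e.ID
    WHERE t.date BETWEEN ? AND ?      -- row filter, applied before grouping
    GROUP BY e.ID, e.Name
    HAVING SUM(t.hour) < ?            -- aggregate filter, applied after grouping
    """,
    ("2013-07-25", "2013-07-25", hrs_limit),
).fetchall()
print(rows)
```

George's two entries add up to exactly 5 hours, so only Fred with 4 hours passes the HAVING condition.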
My example table:
+----------+---------------------+
| username | time |
+----------+---------------------+
| john | 2013-02-04 17:39:43 |
| john | 2013-02-03 00:21:31 |
| peter | 2013-02-02 15:04:53 |
| grace | 2013-02-02 03:57:43 |
| peter | 2013-02-03 15:36:15 |
+----------+---------------------+
This table registers activities from users. I need to count the number of users whose last activity date was more than 30 days ago.
I had developed this query:
SELECT
username,
MAX(time),
DATEDIFF(NOW(), MAX(time)) as diff
FROM tracking
GROUP BY username
HAVING diff > 30
Which effectively returns the list of users whose activities are more than 30 days ago, along with the date of that last activity.
But I need the count of this list, not the list itself. Is there any way I can count the list?
NOTES:
I can only rely on SQL statements, I can't use PHP or ASP or anything else.
I can't use STORED PROCEDURES.
I don't need performance, as this statement will only be run once in a while.
Here is a relatively simple way:
SELECT count(distinct username) -
count(distinct case when DATEDIFF(NOW(), time) <= 30 then username end) as numusers
FROM tracking
This takes the total number of users and subtracts the count of the ones with activity in the last 30 days.
Just like that?
Select count(*) as Num FROM
(
SELECT
username,
MAX(time),
DATEDIFF(NOW(), MAX(time)) as diff
FROM tracking
GROUP BY username
HAVING diff > 30
) AS inactive_users
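Both answers can be compared on the sample table. The sketch below rests on some assumptions: Python's built-in sqlite3 stands in for MySQL, a difference of julianday values over date() replaces DATEDIFF, and a fixed reference date (the reference variable, chosen for illustration) replaces NOW() so that the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE tracking (username TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO tracking VALUES (?, ?)",
    [
        ("john", "2013-02-04 17:39:43"),
        ("john", "2013-02-03 00:21:31"),
        ("peter", "2013-02-02 15:04:53"),
        ("grace", "2013-02-02 03:57:43"),
        ("peter", "2013-02-03 15:36:15"),
    ],
)

reference = "2013-03-06 00:00:00"  # fixed stand-in for NOW() (assumption)

# Variant 1: wrap the grouped query in a derived table and count its rows.
count_subquery = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT username
        FROM tracking
        GROUP BY username
        HAVING julianday(date(:now)) - julianday(date(MAX(time))) > 30
    ) AS inactive_users
    """,
    {"now": reference},
).fetchone()[0]

# Variant 2: all users minus the users with any activity in the last 30 days.
count_conditional = conn.execute(
    """
    SELECT COUNT(DISTINCT username)
         - COUNT(DISTINCT CASE
                 WHEN julianday(date(:now)) - julianday(date(time)) <= 30
                 THEN username END)
    FROM tracking
    """,
    {"now": reference},
).fetchone()[0]

print(count_subquery, count_conditional)
```

With this reference date john's last activity is exactly 30 days old and does not count as inactive, while peter (31 days) and grace (32 days) do, so both variants return 2.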