MySQL: the correct way to use GROUP BY

I have this table:
CREATE TABLE IF NOT EXISTS `goldprice` (
`price` double unsigned NOT NULL,
`days` smallint(5) unsigned NOT NULL,
`seconds` mediumint(5) unsigned NOT NULL,
`sid` smallint(4) unsigned NOT NULL,
`gid` smallint(4) NOT NULL,
PRIMARY KEY (`days`,`seconds`,`sid`),
KEY `sid` (`sid`),
KEY `gid` (`gid`),
KEY `days` (`days`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I want the minimum and maximum price of each day, and also the first and last price of each day (based on its seconds).
Using a subquery I can solve part of my problem: I sort inside a subquery before grouping, and the result MySQL returns is correct. That way min, max, and either the last or the first price (depending on the sort direction) can be obtained.
Two important things remain:
both the last and the first price are required;
performance is very important, and a subquery does not seem like a good fit.
Also, I have this SQL:
SELECT price, days, seconds FROM goldprice
where gid=1 and days>=16200 group by days
order by days desc, seconds desc
Changing "days>=16200" to "days=16200" returns a different result for the "days=16200" row; the descending sort is not preserved.
I know how MySQL's GROUP BY behaves, but I can't find a good solution for my needs.
MySQL order by before group by

Your query is incorrect. You select days with a random price match and a random seconds match. You should decide which price and seconds you want to show per day. The SUM? The MINimum? The MAXimum?
When starting with GROUP BY, you should make sure that for each column you either group by it or aggregate it (e.g. use SUM(price) instead of price alone or have price in the group by clause).
Example:
select a, b, MIN(c), MAX(d), e
from mytable
group by a, b;
a and b are okay, because you group by them. MIN(c) and MAX(d) are okay, because you aggregate c and d. e is incorrect; it is neither in the group by clause nor being aggregated. This is allowed in MySQL when the ONLY_FULL_GROUP_BY SQL mode is disabled (that mode is enabled by default since MySQL 5.7.5), but it's an advanced feature one must be aware of and handle carefully. The above select statement would give just any of the matching e per a, b: the minimum e, the maximum e or just any other e. Only do this when you know that e is unique for a, b or you don't care what e you get. As said, it's an advanced feature.
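To make the "group by it or aggregate it" rule concrete for the goldprice table, here is a minimal sketch of one way to get min, max, first and last price per day in a single row. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained; the sample rows are made up, and the SQL itself uses only constructs that MySQL also accepts. It assumes there is a single row per (days, seconds) for the chosen gid; if several sid values share the same second, MAX() picks the highest of them.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE goldprice (price REAL, days INTEGER, seconds INTEGER, "
    "sid INTEGER, gid INTEGER, PRIMARY KEY (days, seconds, sid))"
)
# Made-up sample rows: (price, days, seconds, sid, gid).
conn.executemany(
    "INSERT INTO goldprice VALUES (?, ?, ?, ?, ?)",
    [
        (5.0, 16200, 10, 1, 1),
        (9.0, 16200, 20, 1, 1),
        (3.0, 16200, 30, 1, 1),
        (7.0, 16200, 40, 1, 1),
        (2.0, 16201, 5, 1, 1),
        (4.0, 16201, 50, 1, 1),
        (99.0, 16201, 60, 1, 2),  # different gid, filtered out below
    ],
)

# The derived table b holds the per-day aggregates. Joining back on the
# first and last second of the day lets us pick the matching prices, and
# every selected column is either grouped or aggregated.
query = """
SELECT g.days, b.minp, b.maxp,
       MAX(CASE WHEN g.seconds = b.mint THEN g.price END) AS first_price,
       MAX(CASE WHEN g.seconds = b.maxt THEN g.price END) AS last_price
FROM goldprice g
JOIN (SELECT days, MIN(price) AS minp, MAX(price) AS maxp,
             MIN(seconds) AS mint, MAX(seconds) AS maxt
      FROM goldprice
      WHERE gid = 1
      GROUP BY days) b ON b.days = g.days
WHERE g.gid = 1 AND g.seconds IN (b.mint, b.maxt)
GROUP BY g.days, b.minp, b.maxp
ORDER BY g.days DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (days, min price, max price, first price, last price)
```

Because the outer query only touches the rows at the first and last second of each day, the conditional MAX() has at most two rows to look at per day.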

My solution is here:
select g2.days, group_concat(price) prices,
group_concat(seconds) times, minp, maxp from goldprice g1
inner join (
SELECT gid, days,
max(price) as maxp, min(price) as minp,
max(seconds) as maxt, min(seconds) as mint
FROM `goldprice` where gid = 1 and days = 16200
group by days
) g2
on (g1.days = g2.days and g1.gid = g2.gid and (seconds = mint or seconds = maxt))
where gid = 1 and days = 16200
group by days order by days desc, seconds asc
Is my solution correct?

Related

Query NOT IN a null subquery returning null instead of the whole table

My work is for an appointment system:
I have two tables:
Times(hour varchar);
Reservations(time varchar, date varchar);
The Times table has all the times the store is open (as strings) from 8am to 6pm (08:00, 08:30, 09:00, etc.).
The Reservations table has the times that are reserved.
The store has 3 employees who can each take an appointment at the same time, so 3 clients can reserve at 10:00am, for example.
My goal is to return the list of times that aren't reserved, with one condition: if a time has been reserved fewer than 3 times it can still be reserved. I tried this query:
SELECT `hour` FROM `times` WHERE `hour` NOT IN (SELECT `time` FROM `reservations` WHERE `date` = '$date' HAVING COUNT(`time`)>=3);
The problem is that this returns null if there are no reserved times, and I can't understand why. If the subquery returns null, shouldn't NOT IN (null) return all the times in the Times table? It is giving me empty rows. Does anyone know why?

This query:
SELECT `time`
FROM `reservations`
WHERE `date` = '$date'
GROUP BY `time`
HAVING COUNT(*) >= 3
returns the list of times that are reserved under your condition.
So use a LEFT JOIN of Times to that query and return only the unmatched rows:
SELECT t.`hour`
FROM `times` t LEFT JOIN (
SELECT `time`
FROM `reservations`
WHERE `date` = '$date'
GROUP BY `time`
HAVING COUNT(*) >= 3
) r on r.time = t.`hour`
WHERE r.time IS NULL

If the subquery returns NULL, then the NOT IN operator will always give you NULL as the result, so no row passes the filter. If you want to get results, ensure you don't have NULLs in the subquery or make the subquery return an empty set. I tried it on SQL Server.
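The difference between a NULL inside the NOT IN list and an empty subquery result can be checked with a small sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made so the example is runnable), with made-up rows; both engines follow the standard three-valued logic here. The last query shows a NOT EXISTS variant, which is one possible alternative to the LEFT JOIN shown earlier, and passes the date as a bound parameter instead of interpolating '$date' into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE times (hour TEXT);
CREATE TABLE reservations (time TEXT, date TEXT);
INSERT INTO times VALUES ('08:00'), ('08:30'), ('09:00');
-- Made-up data: 08:00 is fully booked (3 reservations), 08:30 has one.
INSERT INTO reservations VALUES
  ('08:00', '2020-01-01'), ('08:00', '2020-01-01'), ('08:00', '2020-01-01'),
  ('08:30', '2020-01-01');
""")


def hours(sql, params=()):
    """Run a query and return its first column as a list."""
    return [r[0] for r in conn.execute(sql, params)]


# A NULL in the NOT IN list makes the predicate NULL for every row,
# so nothing is returned.
with_null = hours(
    "SELECT hour FROM times WHERE hour NOT IN (SELECT NULL) ORDER BY hour"
)

# An empty subquery result is harmless: every row passes.
with_empty = hours(
    "SELECT hour FROM times "
    "WHERE hour NOT IN (SELECT time FROM reservations WHERE 1 = 0) "
    "ORDER BY hour"
)

# NOT EXISTS is not affected by NULLs; a time stays available until it
# has been reserved 3 times on the given date.
free = hours(
    """
    SELECT t.hour
    FROM times t
    WHERE NOT EXISTS (
        SELECT 1
        FROM reservations r
        WHERE r.date = ? AND r.time = t.hour
        GROUP BY r.time
        HAVING COUNT(*) >= 3)
    ORDER BY t.hour
    """,
    ("2020-01-01",),
)

print(with_null, with_empty, free)
```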

MySQL query is too slow

I'm trying to write a query to get some trend stats, but it is really slow: the execution time is around 134 seconds.
I have a MySQL table called table_1.
Below the create statement
CREATE TABLE `table_1` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`original_id` bigint(11) DEFAULT NULL,
`invoice_num` bigint(11) DEFAULT NULL,
`registration` timestamp NULL DEFAULT NULL,
`paid_amount` decimal(10,6) DEFAULT NULL,
`cost_amount` decimal(10,6) DEFAULT NULL,
`profit_amount` decimal(10,6) DEFAULT NULL,
`net_amount` decimal(10,6) DEFAULT NULL,
`customer_id` bigint(11) DEFAULT NULL,
`recipient_id` text,
`cashier_name` text,
`sales_type` text,
`sales_status` text,
`sales_location` text,
`invoice_duration` text,
`store_id` double DEFAULT NULL,
`is_cash` int(11) DEFAULT NULL,
`is_card` int(11) DEFAULT NULL,
`brandid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_registration_compound` (`id`,`registration`)
) ENGINE=InnoDB AUTO_INCREMENT=47420958 DEFAULT CHARSET=latin1;
I have set a compound index made of id+registration.
Below the query
SELECT
store_id,
CONCAT('[',GROUP_CONCAT(tot SEPARATOR ','),']') timeline_transactions,
SUM(tot) AS total_transactions,
CONCAT('[',GROUP_CONCAT(totalRevenues SEPARATOR ','),']') timeline_revenues,
SUM(totalRevenues) AS revenues,
CONCAT('[',GROUP_CONCAT(totalProfit SEPARATOR ','),']') timeline_profit,
SUM(totalProfit) AS profit,
CONCAT('[',GROUP_CONCAT(totalCost SEPARATOR ','),']') timeline_costs,
SUM(totalCost) AS costs
FROM (select t1.md,
COALESCE(SUM(t1.amount+t2.revenues), 0) AS totalRevenues,
COALESCE(SUM(t1.amount+t2.profit), 0) AS totalProfit,
COALESCE(SUM(t1.amount+t2.costs), 0) AS totalCost,
COALESCE(SUM(t1.amount+t2.tot), 0) AS tot,
t1.store_id
from
(
SELECT a.store_id,b.md,b.amount from ( SELECT DISTINCT store_id FROM table_1) AS a
CROSS JOIN
(
SELECT
DATE_FORMAT(a.DATE, "%m") as md,
'0' as amount
from (
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) month as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) a
where a.Date >='2019-01-01' and a.Date <= '2019-01-14'
group by md) AS b
)t1
left join
(
SELECT
COUNT(epl.invoice_num) AS tot,
SUM(paid_amount) AS revenues,
SUM(profit_amount) AS profit,
SUM(cost_amount) AS costs,
store_id,
date_format(epl.registration, '%m') md
FROM table_1 epl
GROUP BY store_id, date_format(epl.registration, '%m')
)t2
ON t2.md=t1.md AND t2.store_id=t1.store_id
group BY t1.md, t1.store_id) AS t3 GROUP BY store_id ORDER BY total_transactions desc
Below the EXPLAIN
Maybe I should change from timestamp to datetime in registration column?

About 90% of your execution time will be used to execute GROUP BY store_id, date_format(epl.registration, '%m').
Unfortunately, you cannot use an index to group by a derived value, and since this is vital to your report, you need to precalculate this. You can do this by adding that value to your table, e.g. using a generated column:
alter table table_1 add md varchar(2) as (date_format(registration, '%m')) stored
I kept the varchar format you used for the month here, you could also use a number (e.g. tinyint) for the month.
This requires MySQL 5.7, otherwise you can use triggers to achieve the same thing:
alter table table_1 add md varchar(2) null;
create trigger tri_table_1 before insert on table_1
for each row set new.md = date_format(new.registration,'%m');
create trigger tru_table_1 before update on table_1
for each row set new.md = date_format(new.registration,'%m');
Then add an index, preferably a covering index, starting with store_id and md, e.g.
create index idx_table_1_storeid_md on table_1
(store_id, md, invoice_num, paid_amount, profit_amount, cost_amount)
If you have other, similar reports, you may want to check if they use additional columns and could profit from covering more columns. The index will require about 1.5GB of storage space (and how long it takes your drive to read 1.5GB will basically single-handedly define your execution time, short of caching).
Then change your query to group by this new indexed column, e.g.
...
SUM(cost_amount) AS costs,
store_id,
md -- instead of date_format(epl.registration, '%m') md
FROM table_1 epl
GROUP BY store_id, md -- instead of date_format(epl.registration, '%m')
)t2 ...
This index will also take care of another 9% of your execution time, SELECT DISTINCT store_id FROM table_1, which will profit from an index starting with store_id.
Now that 99% of your query is taken care of, some further remarks:
The subquery b and your date range where a.Date >='2019-01-01' and a.Date <= '2019-01-14' might not do what you think they do. You should run the part SELECT DATE_FORMAT(a.DATE, "%m") as md, ... group by md separately to see what it does. In its current state, it will give you one row with the tuple '01', 0, representing "January", so it is basically a complicated way of doing select '01', 0. That is, unless today is the 15th or later, in which case it returns nothing (which is probably unintended).
In particular, it will not limit the invoice dates to that specific range, but to all invoices that are from (the whole) January of any year. If that is what you intended, you should (additionally) add that filter directly, e.g. by using FROM table_1 epl where epl.md = '01' GROUP BY ..., reducing your execution time by an additional factor of about 12. So (apart from the 15th-and-up problem), with your current range you should get the same result if you use
...
SUM(cost_amount) AS costs,
store_id,
md
FROM table_1 epl
WHERE md = '01'
GROUP BY store_id, md
)t2 ...
For different date ranges you will have to adjust that term. And to emphasize my point, this is significantly different from filtering invoices by their date, e.g.
...
SUM(cost_amount) AS costs,
store_id,
md
FROM table_1 epl
WHERE epl.registration >='2019-01-01'
and epl.registration <= '2019-01-14'
GROUP BY store_id, md
)t2 ...
which you may (or may not) have tried to do. You would need a different index in that case though (and it would be a slightly different question).
there might be some additional optimizations, simplifications or beautifications in the rest of your query, e.g group BY t1.md, t1.store_id looks redundant and/or wrong (indicating you are actually not on MySQL 5.7), and the b-subquery can only give you values 1 to 12, so generating 1000 dates and reducing them again could be simplified. But since they are operating on 100-ish rows, they will not affect execution time significantly, and I haven't checked those in detail. Some of it is probably due to getting the right output format or to generalizations (although, if you are dynamically grouping by other formats than by month, you need other indexes/columns, but that would be a different question).
An alternative way to precalculate your values would be a summary table where you e.g. run your inner query (the expensive group by) once a day and store the result in a table and then reuse it (by selecting from this table instead of doing the group by). This is especially viable for data like invoices that never change (although otherwise you can use triggers to keep the summary tables up to date). It also becomes more viable if you have several scenarios, e.g. if your user can decide to group by weekday, year, month or zodiac sign, since otherwise you would need to add an index for each of those. It becomes less viable if you need to dynamically limit your invoice range (to e.g. 2019-01-01 ... 2019-01-14). If you need to include the current day in your report, you can still precalculate and then add the values for the current date from the table (which should only involve a very limited number of rows, which is fast if you have an index starting with your date column), or use triggers to update your summary table on-the-fly.
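The summary-table idea can be sketched as follows. This is only an illustration under stated assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module, so strftime('%m', ...) takes the place of MySQL's date_format(..., '%m'); the table name monthly_summary, the helper refresh_summary and the sample rows are hypothetical and not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced version of table_1 with only the columns the report needs.
CREATE TABLE table_1 (
  id INTEGER PRIMARY KEY, invoice_num INTEGER, registration TEXT,
  paid_amount REAL, cost_amount REAL, profit_amount REAL, store_id INTEGER);
-- Hypothetical summary table: one row per store and month.
CREATE TABLE monthly_summary (
  store_id INTEGER, md TEXT, tot INTEGER,
  revenues REAL, profit REAL, costs REAL,
  PRIMARY KEY (store_id, md));
INSERT INTO table_1
  (invoice_num, registration, paid_amount, cost_amount, profit_amount, store_id)
VALUES
  (1, '2019-01-03 10:00:00', 10.0,  6.0, 4.0, 1),
  (2, '2019-01-09 11:00:00', 20.0, 12.0, 8.0, 1),
  (3, '2019-02-01 09:00:00',  5.0,  3.0, 2.0, 1),
  (4, '2019-01-05 12:00:00',  7.0,  4.0, 3.0, 2);
""")


def refresh_summary(conn):
    """Rebuild the summary; meant to run e.g. once a day, not per report."""
    conn.execute("DELETE FROM monthly_summary")
    conn.execute("""
        INSERT INTO monthly_summary
        SELECT store_id,
               strftime('%m', registration) AS md,  -- date_format() in MySQL
               COUNT(invoice_num), SUM(paid_amount),
               SUM(profit_amount), SUM(cost_amount)
        FROM table_1
        GROUP BY store_id, strftime('%m', registration)
    """)
    conn.commit()


refresh_summary(conn)

summary = conn.execute(
    "SELECT store_id, md, tot, revenues, profit, costs "
    "FROM monthly_summary ORDER BY store_id, md"
).fetchall()

# The report now reads the small summary table instead of grouping
# the full invoice table on every request.
report = conn.execute(
    "SELECT store_id, SUM(tot) AS total_transactions, SUM(revenues) "
    "FROM monthly_summary WHERE md = '01' "
    "GROUP BY store_id ORDER BY total_transactions DESC"
).fetchall()
print(summary)
print(report)
```

The expensive GROUP BY runs only inside refresh_summary; how often to call it, or whether to maintain the table with triggers instead, depends on how fresh the report has to be.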

With PRIMARY KEY(id), having INDEX(id, anything) is virtually useless.
See if you can avoid nesting subqueries.
Consider building that 'date' table permanently and have a PRIMARY KEY(md) on it. Currently, neither subquery has an index on the join column (md).
You may have the "explode-implode" syndrome. This is where JOINs expand the number of rows, only to have the GROUP BY collapse them.
Don't use COUNT(xx) unless you need to check xx for being NULL. Simply do COUNT(*).
store_id double -- Really?
TIMESTAMP vs DATETIME -- they perform about the same; don't bother changing it.
Since you are only looking at 2019-01, get rid of
date_format(epl.registration, '%m')
That, alone, may speed it up a lot. (However, you lose generality.)

MySQL put a specific row at the top of the result

I'm doing a basic SQL select query which returns a set of results. I want the specific rows with the entry "Fee" to be put at the top of the results, followed by the rest.
Something like:
SELECT * FROM tbl ORDER By Charges = Fee DESC, Charges DESC
Can anyone help?

You could try this :
SELECT * from tbl ORDER BY CASE WHEN Charges = 'Fee' THEN 0 ELSE 1 END, Charges DESC;
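A quick way to see what the CASE expression in ORDER BY does is to run it on a few rows. The sketch below uses Python's sqlite3 module with an in-memory database in place of MySQL and made-up rows, both assumptions for the sake of a runnable example. The trailing id in the ORDER BY is an addition that only makes the order of ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, Name TEXT, Charges TEXT)"
)
# Made-up sample rows; ids are assigned 1..10 in insertion order.
conn.executemany(
    "INSERT INTO tbl (Name, Charges) VALUES (?, ?)",
    [
        ("One", "Fee"), ("Two", "Charge"), ("Three", "Charge"),
        ("Four", "Charge"), ("Five", "Fee"), ("Six", "Fee"),
        ("Seven", "Invoice"), ("Eight", "Fee"), ("Nine", "Invoice"),
        ("Ten", "Invoice"),
    ],
)

# The CASE expression yields 0 for 'Fee' rows and 1 for everything else,
# so 'Fee' rows sort first; the remaining keys order the rest.
rows = conn.execute(
    "SELECT Name, Charges FROM tbl "
    "ORDER BY CASE WHEN Charges = 'Fee' THEN 0 ELSE 1 END, Charges DESC, id"
).fetchall()

names = [name for name, _ in rows]
print(names)
```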

I think you'd have to use a UNION query. ORDER BY doesn't support this kind of thing by default as far as I know.
Something like this:
SELECT * FROM tbl WHERE Charges = 'Fee'
UNION
SELECT * FROM tbl ORDER BY Charges DESC

You would have to use ORDER BY with the FIELD() function, which would then order by those values first.
As I don't have your table definitions, I have thrown one together here http://sqlfiddle.com/#!9/91376/13
In case it disappears, the script pretty much consists of:
CREATE TABLE IF NOT EXISTS `tbl` (
`id` int(6) unsigned AUTO_INCREMENT,
`Name` char(6) not null,
`Charges` char(10) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `tbl` (`Name`, `Charges`)
VALUES ('One', 'Fee'), ('Two', 'Charge'), ('Three', 'Charge'),
('Four', 'Charge'), ('Five', 'Fee'), ('Six', 'Fee'),
('Seven', 'Invoice'), ('Eight', 'Fee'), ('Nine', 'Invoice'),
('Ten', 'Invoice');
SELECT *
FROM tbl
ORDER BY FIELD(`Charges`, 'Charge') DESC
;
Which returns:
id Name Charges
2 Two Charge
3 Three Charge
4 Four Charge
1 One Fee
9 Nine Invoice
8 Eight Fee
7 Seven Invoice
6 Six Fee
5 Five Fee
10 Ten Invoice
So, to directly answer your question, your query would be;
SELECT *
FROM tbl
ORDER BY FIELD(Charges, 'Fee') DESC
edit : Viewable, sorted by Charges = Fee here : http://sqlfiddle.com/#!9/91376/15

SELECT * FROM tbl ORDER By FIELD(Charges, 'Fee') DESC
You can use something like the above, where Charges is the field and 'Fee' is the specific value. That way you can keep it simple.

Advanced MySQL Query select and compare time in one field

Employees
EmpID : int(10)
Firstname: varchar(100)
Lastname: varchar(100)
HireDate: timestamp
TerminationDate: timestamp
AnnualReviews
EmpID: int(10)
ReviewDate: timestamp
What is the query that returns each employee and, for each employee row, includes the greatest number of employees that worked for the company at any time during their tenure and the first date that maximum was reached?
So far, this is my query:
select *, (select count(empid) from employees where terminationdate between t.hiredate and t.terminationdate)
from employees as t
group by empid

What you have is close.
But there's more work to do.
We'd need to work out the conditions that determine how many employees were "working" at any point in time (i.e. at a given timestamp value). The condition I'd check:
HireDate <= timestamp < TerminationDate
We'd need to extend that comparison so that a NULL value for TerminationDate is handled as if it were a point in time after the timestamp value. (That's easy enough to do.)
HireDate <= timestamp AND ( timestamp < TerminationDate OR TerminationDate IS NULL )
So, something like this:
SELECT COUNT(1)
FROM Employees e
WHERE ( :timestamp >= e.HireDate )
AND ( :timestamp < e.TerminationDate OR e.TerminationDate IS NULL)
That "count" value would remain the same, and would only change for a "hire" or "terminate" event.
If we got a distinct list of all timestamps for all "hire" and "terminate" events, we could get the number of employees at that point in time.
So, this query would give us the employee count every time the employee count might change:
SELECT t.ts AS `as_of`
, COUNT(1) AS `employee_count`
FROM Employees e
JOIN ( SELECT t.TerminationDate AS ts
FROM Employees t
WHERE t.TerminationDate IS NOT NULL
GROUP BY t.TerminationDate
UNION
SELECT h.HireDate AS ts
FROM Employees h
WHERE h.HireDate IS NOT NULL
GROUP BY h.HireDate
) t
ON ( t.ts >= e.HireDate )
AND ( t.ts < e.TerminationDate OR e.TerminationDate IS NULL)
GROUP BY t.ts
We could use that result (as an inline view) and join it to a particular employee to get just the rows that have an as_of timestamp that falls within the period of employment for that employee. Then we just pull out the maximum employee_count. It wouldn't be difficult to identify the earliest of multiple as_of dates, if that maximum employee_count occurred multiple times.
(The wording of the question leaves open whether it asks for the "earliest date" ever that the employee count met or exceeded the maximum that occurred during an employee's tenure, or just the earliest date within the employee's tenure that the maximum was reached. It's possible to get either result.)
That's just one way to approach the problem.
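The steps described above can be put together roughly as follows. This is a sketch under several assumptions: it runs on an in-memory SQLite database via Python's sqlite3 module rather than MySQL, it uses common table expressions (available in MySQL 8.0 and later), dates are stored as ISO strings, the sample employees are made up, and it takes the "earliest date within the employee's own tenure" reading of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, "
    "HireDate TEXT, TerminationDate TEXT)"
)
# Made-up employees; a NULL TerminationDate means "still employed".
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-06-01"),
        (2, "2020-02-01", None),
        (3, "2020-03-01", "2020-04-01"),
        (4, "2020-07-01", None),
    ],
)

query = """
WITH events AS (            -- every moment the headcount can change
    SELECT HireDate AS ts FROM Employees
    UNION
    SELECT TerminationDate FROM Employees
    WHERE TerminationDate IS NOT NULL
),
headcount AS (              -- employees working at each of those moments
    SELECT ev.ts AS as_of, COUNT(*) AS employee_count
    FROM events ev
    JOIN Employees e
      ON ev.ts >= e.HireDate
     AND (ev.ts < e.TerminationDate OR e.TerminationDate IS NULL)
    GROUP BY ev.ts
),
tenure AS (                 -- headcount rows inside each employee's tenure
    SELECT e.EmpID, h.as_of, h.employee_count
    FROM Employees e
    JOIN headcount h
      ON h.as_of >= e.HireDate
     AND (h.as_of < e.TerminationDate OR e.TerminationDate IS NULL)
)
SELECT t.EmpID, t.employee_count AS peak, MIN(t.as_of) AS first_reached
FROM tenure t
JOIN (SELECT EmpID, MAX(employee_count) AS peak
      FROM tenure GROUP BY EmpID) m
  ON m.EmpID = t.EmpID AND m.peak = t.employee_count
GROUP BY t.EmpID, t.employee_count
ORDER BY t.EmpID
"""
peaks = conn.execute(query).fetchall()
for emp_id, peak, first_reached in peaks:
    print(emp_id, peak, first_reached)
```

On MySQL versions without CTE support, each WITH block would have to be inlined as a derived table, which is the "inline view" form the answer describes.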

SQL GROUP BY return empty set

I have a table rental:
rentalId int(11)
Customer_customerId int(11)
Vehicle_registrationNumber varchar(20)
startDate datetime
endDate datetime
pickUpLocation int(11)
returnLocation int(11)
booking_time timestamp
timePickedUp timestamp
timeReturned timestamp
and table payment:
paymentId int(11)
Rental_rentalId int(11)
amountDue decimal(10,2)
amountPaid decimal(10,2)
paymentDate timestamp
I run two GROUP BY queries. The first one counts the number of reservations and sums the payments by day. It only works as expected when the HAVING condition on `pickUpLocation` is omitted; otherwise it returns incorrect values:
SELECT COUNT(rentalId) AS number_of_rentals, MONTH(booking_time) AS month,
YEAR(booking_time) AS year,
CONCAT(DAY(booking_time), '-', MONTH(booking_time), '-',
YEAR(booking_time) ) AS date, SUM(amountDue) AS total_value, SUM(amountPaid) AS
total_paid, `pickUpLocation`
FROM (`rental`)
JOIN `payment` ON `payment`.`Rental_rentalId` = `rental`.`rentalId`
GROUP BY DAY(booking_time)
HAVING `month` = 2
AND `year` = 2012
AND `pickUpLocation` = 1
ORDER BY `booking_time` desc
LIMIT 31
The second function is expected to sum the reservations and payments (both due and received) for the entire month, for a specific location:
SELECT COUNT(rentalId) AS number_of_rentals, MONTH(booking_time) AS month,
YEAR(booking_time) AS year, SUM(amountDue) AS total_value,
SUM(amountPaid) AS total_paid,
`pickUpLocation`
FROM (`rental`)
JOIN `payment` ON `payment`.`Rental_rentalId` = `rental`.`rentalId`
GROUP BY MONTH(booking_time)
HAVING `month` = 2
AND `year` = 2012
AND `pickUpLocation` = 1
ORDER BY `booking_time` desc
It works for some locations and doesn't work for others (returns correct set when there are many reservations, but when there are only few, it returns empty set). I use MySQL. Any help greatly appreciated.

You're doing an inner join between rental and payment which means you will only ever get rentals that have been paid for. If you want to find rentals without payment info too in your result, you need to use a LEFT JOIN instead of just an (inner) JOIN.
Note that that may result in NULLs in your result if there are no payments to account for, so you may have to adjust the output of your query using one of the control flow functions.
Edit: You're also GROUPing before your conditions, that will GROUP all rows for a month into one single row. Since the year and the PickupLocation may vary, you will get random values (of the ones available) in those two fields. HAVING will then filter on those random fields, leaving you with a possibly empty result set. WHERE on the other hand will see every row before GROUPing and do the right thing (tm) on a row to row basis, so the conditions should be put there instead.
(The same change should probably be done to your first, working, query)
Demo here.

You may need to push some conditions from HAVING to WHERE clause:
WHERE YEAR(booking_time) = 2012
AND MONTH(booking_time) = 2
AND `pickUpLocation` = 1
GROUP BY DAY(booking_time)
LIMIT 31
For a specific month, you don't even need the GROUP BY:
WHERE YEAR(booking_time) = 2012
AND MONTH(booking_time) = 2
AND `pickUpLocation` = 1
The above condition is not very good regarding performance:
WHERE YEAR(booking_time) = 2012
AND MONTH(booking_time) = 2
You should change it into:
WHERE booking_time >= '2012-02-01'
AND booking_time < '2012-03-01'
so the query can use an index on booking_time (if you have or you add one in the future) and so it doesn't call the YEAR() and MONTH() functions for every row of the table.
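Putting both suggestions together (row filters in WHERE, and a half-open date range instead of YEAR() and MONTH()), a daily report for one location could look roughly like the sketch below. It runs on an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, with made-up rows and only the columns the query needs; grouping by DATE(booking_time) instead of DAY(booking_time) is a choice made here so that the grouping key is the full calendar date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rental (
  rentalId INTEGER PRIMARY KEY, pickUpLocation INTEGER, booking_time TEXT);
CREATE TABLE payment (
  paymentId INTEGER PRIMARY KEY, Rental_rentalId INTEGER,
  amountDue REAL, amountPaid REAL);
-- Made-up sample data.
INSERT INTO rental VALUES
  (1, 1, '2012-02-03 10:00:00'),
  (2, 1, '2012-02-03 15:00:00'),
  (3, 2, '2012-02-03 12:00:00'),  -- other location
  (4, 1, '2012-03-01 00:00:00'),  -- first instant of March, excluded
  (5, 1, '2012-02-10 09:00:00');
INSERT INTO payment (Rental_rentalId, amountDue, amountPaid) VALUES
  (1, 100, 100), (2, 50, 0), (3, 70, 70), (4, 80, 80), (5, 30, 30);
""")

# Row-level conditions go into WHERE so they are applied before grouping.
# The half-open range [start, end) needs no function call on the column,
# which lets an index on booking_time be used if one exists.
query = """
SELECT DATE(booking_time) AS day,
       COUNT(*) AS number_of_rentals,
       SUM(amountDue) AS total_value,
       SUM(amountPaid) AS total_paid
FROM rental
JOIN payment ON payment.Rental_rentalId = rental.rentalId
WHERE pickUpLocation = ?
  AND booking_time >= ?
  AND booking_time < ?
GROUP BY DATE(booking_time)
ORDER BY day DESC
"""
daily = conn.execute(query, (1, "2012-02-01", "2012-03-01")).fetchall()
for row in daily:
    print(row)
```

Note that COUNT(*) counts joined rows, so a rental with several payment rows would be counted once per payment; the sample data has exactly one payment per rental.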
