I have a case where I need to query a Couchbase database field with two different WHERE clauses in the same query.
for example:
for the 'sales' field in a database entity, I want one query that returns the sales for a single hour and for the whole day
output example:
`{`
`hourly_sales = 500`
`total_day_sales = 5000`
`}`
I know that I can use 2 different queries, but the requirement is to use one query for both.
You can achieve that using a CASE WHEN statement. For example:
SELECT SUM(CASE WHEN `TIME` BETWEEN "13:00:00" AND "14:00:00" THEN `SALES` END) AS HOURLY_SALES,
SUM(CASE WHEN `TIME` BETWEEN "00:00:00" AND "23:59:59" THEN `SALES` END) AS TOTAL_DAY_SALES
FROM `BUCKET_NAME`
Make use of the time format according to your need.
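If each document also carries the calendar date, you can move the day-level filter into the WHERE clause so both aggregates are computed over the same day, and the daily total then needs no CASE at all. A minimal sketch, assuming a `SALE_DATE` field that is not in the original question:
SELECT SUM(CASE WHEN `TIME` BETWEEN "13:00:00" AND "14:00:00" THEN `SALES` END) AS HOURLY_SALES,
       SUM(`SALES`) AS TOTAL_DAY_SALES
FROM `BUCKET_NAME`
WHERE `SALE_DATE` = "2021-07-01" /* assumed field name and sample date */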
I have a MySQL table that contains 20 000 000 rows and columns like (user_id, registered_timestamp, etc). I have written the query below to get a count of users registered day-wise. The query was taking a long time to execute. Will adding an index to the registered_timestamp column improve the execution time?
select date(registered_timestamp), count(userid) from table group by 1
Consider using this query to get a list of dates and the number of registrations on each date.
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
GROUP BY date(registered_timestamp)
Then an index on table(registered_timestamp) will help a little because it's a covering index.
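For reference, such an index could be created with something like this (index name assumed):
CREATE INDEX idx_registered_timestamp ON `table` (registered_timestamp);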
If you adapt your query to return dates from a limited range, for example:
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
WHERE registered_timestamp >= CURDATE() - INTERVAL 8 DAY
AND registered_timestamp < CURDATE()
GROUP BY date(registered_timestamp)
the index will help. (This query returns results for the week ending yesterday.) However, the index will not help this query.
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
WHERE DATE(registered_timestamp) >= CURDATE() - INTERVAL 8 DAY /* slow! */
GROUP BY date(registered_timestamp)
because the function on the column makes the query non-sargable.
You probably can address this performance issue with a MySQL generated column. This command:
ALTER TABLE `table`
ADD registered_date DATE
GENERATED ALWAYS AS (DATE(registered_timestamp))
STORED;
Then you can add an index on the generated column
CREATE INDEX regdate ON `table` ( registered_date );
Then you can use that generated (derived) column in your query, and get a lot of help from that index.
SELECT registered_date, COUNT(*)
FROM table
GROUP BY registered_date;
But beware, creating the generated column and its index will take a while.
select date(registered_timestamp), count(userid) from table group by 1
Would benefit from INDEX(registered_timestamp, userid) but only because such an index is "covering". The query will still need to read every row of the index, and do a filesort.
If userid is the PRIMARY KEY, then this would give you the same answers without bothering to check each userid for being NOT NULL.
select date(registered_timestamp), count(*) from table group by 1
And INDEX(registered_timestamp) would be equivalent to the above suggestion. (This is because InnoDB implicitly tacks on the PK.)
If this query is common, then you could build and maintain a "summary table", which collects the count every night for the day's registrations. Then the query would be a much faster fetch from that smaller table.
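A minimal sketch of such a summary table, with assumed table and column names, plus the nightly job that fills in yesterday's count:
CREATE TABLE registrations_per_day (
    reg_date  DATE PRIMARY KEY,
    reg_count INT UNSIGNED NOT NULL
);

-- run once per night (cron job or MySQL event); names are assumptions, not from the question
INSERT INTO registrations_per_day (reg_date, reg_count)
SELECT DATE(registered_timestamp), COUNT(*)
FROM `table`
WHERE registered_timestamp >= CURDATE() - INTERVAL 1 DAY
  AND registered_timestamp < CURDATE()
GROUP BY DATE(registered_timestamp);
The day-wise report then becomes a plain SELECT from registrations_per_day.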
I currently have the following query prepared:
select sum(amount) as total
from incomes
where (YEAR(date) = '2019' and MONTH(date) = '07')
and incomes.deleted_at is null
When reviewing it, I noticed that it takes too long when there is a lot of data in the table, since it scans all of it. I do not know much about optimizing queries, but I want to start documenting and researching this case. I read a note saying that although it is possible to create an index on a date-type field, MySQL will not use the index once the column in the WHERE clause is wrapped in a function, in this case YEAR and MONTH. Is this correct? What steps should I follow to improve its performance? Should I try to restructure my query?
I would suggest writing the query as:
select sum(i.amount) as total
from incomes i
where i.date >= '2019-07-01' and
i.date < '2019-08-01' and
i.deleted_at is null;
This query can take advantage of an index on incomes(deleted_at, date, amount):
create index idx_incomes_deleted_at_date_amount on incomes(deleted_at, date, amount)
I have a table with a date column named day. The way I have it indexed is using a multi-column key:
KEY user_id (user_id,day)
I want to make sure I use the index properly when I make a query that selects every row for a user_id from the beginning of the month to a given day in the month. For example, let's say I want to query for every day since the beginning of the month until today. What's the best way to write my query to ensure that it hits the index? Here's what I have so far:
select * from table_name
WHERE user_id = 1
AND (day between DATE_FORMAT(NOW() ,'%Y-%m-01') AND NOW() )
Use EXPLAIN to check whether your query is using the index you've created. For example:
explain
select * from table_name
WHERE user_id = 1
AND (day between DATE_FORMAT(NOW() ,'%Y-%m-01') AND NOW() )
This should give you the details of the query plan chosen by the MySQL optimizer.
In the query plan, possible_keys should list user_id (your index) ---> these are the indexes that could be used to fetch the result set from this table.
The key field should show user_id ---> this is the index that is actually used to get the results.
Extra shows additional information ("Using index" means the whole result set can be fetched from the index itself; "Using where" means the query uses the index to filter on the criteria, and so on).
In your query, user_id = 1 is a constant, which helps reduce the number of records to be scanned even though there is a range check on day, so the query will use the index provided that the user_id column does not have a high percentage of duplicate values.
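For example, a healthy plan for the query above would look roughly like this (the layout is standard EXPLAIN output; the values shown here are illustrative, not real output from this table):
+----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows | Extra                 |
+----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+
|  1 | SIMPLE      | table_name | range | user_id       | user_id | 7       | NULL |   31 | Using index condition |
+----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+
type = range with key = user_id means the composite index is being used to satisfy both the equality on user_id and the range on day.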
I have a product order table in mysql. It's like this:
create table `order`
(productcode int,
quantity tinyint,
order_date timestamp,
blablabla)
Then, to get the rate of rise, I wrote this query:
SELECT thismonth.productcode,
(thismonth.ordercount-lastmonth.ordercount)/lastmonth.ordercount as riserate
FROM ( (SELECT productcode,
sum(quantity) as ordercount
FROM `order`
where date_format(order_date,'%m') = 12
group by productcode) as thismonth,
(SELECT productcode,
sum(quantity) as ordercount
FROM `order`
where date_format(order_date,'%m') = 11
group by productcode) as lastmonth)
WHERE thismonth.productcode = lastmonth.productcode
ORDER BY riserate;
But it runs for about 30s on my PC (200,000 records, 200MB including other fields).
Is there any way to increase the query speed? I have already created an index on the productcode field.
I thought the reason for the low performance was the GROUP BY; is there a different way?
I tried your answers, but none of them seemed to work, and I was wondering if there was something wrong with the indexes (I was not the one who created them), so I deleted all the indexes and re-created them, and now everything is fine -- it only takes 3-4s. The difference between my query and yours is not very obvious, but REALLY thanks, guys, I learned a lot :)
Try adding an index on (ORDER_DATE, PRODUCTCODE) and change the query to eliminate the use of the DATE_FORMAT function, as in:
SELECT thismonth.productcode,
(thismonth.ordercount-lastmonth.ordercount)/lastmonth.ordercount as riserate
FROM ( (SELECT productcode,
sum(quantity) as ordercount
FROM `order`
WHERE ORDER_DATE BETWEEN '01-12-2010' AND '31-12-2010'
GROUP BY PRODUCTCODE) as thismonth,
(SELECT productcode,
sum(quantity) as ordercount
FROM `order`
WHERE ORDER_DATE BETWEEN '01-11-2010' AND '30-11-2010'
group by productcode) as lastmonth)
WHERE thismonth.productcode = lastmonth.productcode
ORDER BY riserate;
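The composite index suggested above could be added with something like this (index name assumed):
ALTER TABLE `order` ADD INDEX idx_order_date_productcode (order_date, productcode);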
Share and enjoy.
Given the sheer amount of data you seem to be working with, optimization may be difficult. I would first look at how you are using the order_date field. It should probably be indexed together with the productcode field. I also don't think date_format is the best way to get the month out of the date - MONTH(order_date) would almost certainly be faster.
Failing that, if this is a query that is going to be hit many times, I would create a new table for the historical data and fill it with the results of your inner queries. Since it's historical data, you won't need to continually fetch the latest data, and since you won't have to recalculate the historical data every time you run the query, it will run a lot faster.
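A sketch of what such a historical table and its monthly load could look like, with assumed names:
CREATE TABLE monthly_product_sales (
    sales_month DATE NOT NULL,   -- first day of the month
    productcode INT  NOT NULL,
    ordercount  INT  NOT NULL,
    PRIMARY KEY (sales_month, productcode)
);

-- run once for each month that has closed
INSERT INTO monthly_product_sales (sales_month, productcode, ordercount)
SELECT '2010-11-01', productcode, SUM(quantity)
FROM `order`
WHERE order_date >= '2010-11-01' AND order_date < '2010-12-01'
GROUP BY productcode;
The rise-rate query can then join two rows per product from this much smaller table instead of aggregating the raw orders every time.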
@Bob Jarvis's solution might resolve your speed issue. If not, or if you want to try an alternative:
Add an update_month column to store the month of order_date
Update the column for existing rows
Add an index on update_month
Create a BEFORE UPDATE trigger to set the value of update_month on row updates
Create a BEFORE INSERT trigger to set the value of update_month on row inserts (a sketch of these steps follows the query below)
Modify your query accordingly
SELECT
productcode,
(this_month_count - last_month_count) / last_month_count AS riserate
FROM (
SELECT
o.productcode,
SUM(CASE MONTH(o.order_date) WHEN MONTH(m.date_start) THEN o.quantity END) AS last_month_count,
SUM(CASE MONTH(o.order_date) WHEN MONTH(m.date_end) THEN o.quantity END) AS this_month_count
FROM `order` o
INNER JOIN (
SELECT
CAST('2010-11-01' AS date) AS date_start,
CAST('2010-12-31' AS date) AS date_end
) m ON o.order_date BETWEEN m.date_start AND m.date_end
GROUP BY o.productcode
) s
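A rough sketch of the column, index and trigger steps listed above, with assumed index and trigger names (the update_month column name comes from the list; it is not in the original table):
ALTER TABLE `order` ADD COLUMN update_month TINYINT;

-- backfill existing rows
UPDATE `order` SET update_month = MONTH(order_date);

CREATE INDEX idx_update_month ON `order` (update_month);

-- keep the column in sync on future writes
CREATE TRIGGER order_month_bi BEFORE INSERT ON `order`
  FOR EACH ROW SET NEW.update_month = MONTH(NEW.order_date);

CREATE TRIGGER order_month_bu BEFORE UPDATE ON `order`
  FOR EACH ROW SET NEW.update_month = MONTH(NEW.order_date);
The two subqueries could then filter on update_month = 12 and update_month = 11 directly, which is sargable and can use the new index.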
Consider using datetime instead of timestamp
If your only reason to use timestamp is to have an automatic default value on insert and update, use datetime instead and put now() into your inserts and updates, or use triggers. Timestamp gives you an additional conversion for time zones, but if you don't have clients connecting to your database from different time zones, you are just losing time on conversions. This alone should give you a 15-30% speed up.
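If you do switch, the column change itself is a single statement, though it rewrites the table and can take a while on a large one:
ALTER TABLE `order` MODIFY order_date DATETIME;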
This might be one of the rare cases where the optimizer chooses the wrong index
And the productcode index is the wrong one in this case. Because you are grouping by productcode and filtering on another, not very selective column, the optimizer may think that using the productcode index will speed things up. But with that index it performs a fairly random scan through index lookups over quite a large number of rows, instead of a faster, roughly sequential semi-full scan without it, with an order_date index used to limit the number of rows scanned. The optimizer simply doesn't know that the rows on disk can be expected to be ordered mostly by order_date and not by productcode. Of course, to make an order_date index work you have to change your query so that in every comparison the order_date column name is on one side of the =, <, > or BETWEEN and constant values are on the other side, as suggested by Bob Jarvis in his answer (+1 to him). So you might want to try his query slightly modified, with corrected date formats and forcing the use of the order_date index - assuming you have it; if not, you really should add it with
ALTER TABLE `order` ADD INDEX order_date( order_date );
So the final query should look like:
SELECT thismonth.productcode,
(thismonth.ordercount-lastmonth.ordercount)/lastmonth.ordercount as riserate
FROM ( (SELECT productcode,
sum(quantity) as ordercount
FROM `order` FORCE INDEX( order_date )
WHERE order_date BETWEEN '2010-12-01' AND '2010-12-31'
GROUP BY productcode) as thismonth,
(SELECT productcode,
sum(quantity) as ordercount
FROM `order` FORCE INDEX( order_date )
WHERE order_date BETWEEN '2010-11-01' AND '2010-11-30'
group by productcode) as lastmonth)
WHERE thismonth.productcode = lastmonth.productcode
ORDER BY riserate;
Not using the productcode index should give you some speed up (a full scan should be faster), and using the order_date index even more, depending on how many rows satisfy the order_date conditions vs. all rows in the table.
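To see whether the hint actually changes anything, you can EXPLAIN one of the subqueries with and without FORCE INDEX and compare the chosen key and row estimates, for example:
EXPLAIN SELECT productcode, SUM(quantity) AS ordercount
FROM `order` FORCE INDEX( order_date )
WHERE order_date BETWEEN '2010-12-01' AND '2010-12-31'
GROUP BY productcode;

EXPLAIN SELECT productcode, SUM(quantity) AS ordercount
FROM `order`
WHERE order_date BETWEEN '2010-12-01' AND '2010-12-31'
GROUP BY productcode;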