I have a table that has over 100,000,000 rows and I have a query that looks like this:
SELECT
COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, null)) AS 'monthly',
COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, null)) AS 'weekly',
COUNT(IF(created_at >= '2015-06-30 07:57:56', 1, null)) AS '30day',
COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null)) AS 'recent'
FROM
items
WHERE
user_id = 123456;
The table looks like so:
CREATE TABLE `items` (
`user_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`user_id`,`item_id`),
KEY `user_id` (`user_id`,`created_at`),
KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The explain looks fairly harmless, minus the massive row counts:
id: 1 | select_type: SIMPLE | table: items | type: ref | possible_keys: PRIMARY,user_id | key: user_id | key_len: 4 | ref: const | rows: 559864 | Extra: Using index
I use the query to gather counts for a specific user for 4 segments of time.
Is there a smarter/faster way to obtain the same data or is my only option to tally these as new rows are put into this table?
If you have an index on created_at, I would also add created_at >= '2015-06-30 07:57:56' to the WHERE clause, since that is the earliest date used in any of your segments.
With the same index, it might also help to split this into 4 queries:
select count(*) AS '30day'
FROM
items
WHERE
user_id = 123456
and created_at >= '2015-06-30 07:57:56'
union ....
And so on
I would add an index on the created_at field:
ALTER TABLE items ADD INDEX idx_created_at (created_at)
or (as Thomas suggested), since you are also filtering on user_id, a composite index on user_id and created_at:
ALTER TABLE items ADD INDEX idx_user_created_at (user_id, created_at)
and then I would write your query as:
SELECT 'monthly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-01 00:00:00' AND user_id = 123456
UNION ALL
SELECT 'weekly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-26 00:00:00' AND user_id = 123456
UNION ALL
SELECT '30day' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-06-30 07:57:56' AND user_id = 123456
UNION ALL
SELECT 'recent' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-29 17:03:44' AND user_id = 123456
Yes, the output shape is a little different. Or you can use inline subqueries:
SELECT
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
...
and if you want each count as a fraction of the total, you could wrap it in a subquery:
SELECT
monthly,
weekly,
monthly / total,
weekly / total
FROM (
SELECT
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
...,
(SELECT COUNT(*) FROM items WHERE user_id=...) AS total
) s
INDEX(user_id, created_at) -- optimal
AND created_at >= '2015-06-30 07:57:56' -- helps because it cuts down on the number of index entries to touch
Doing a UNION does not help, since it leads to 4 times as much work.
Doing four subquery SELECTs does not help, for the same reason.
Also
COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null))
can be shortened to
SUM(created_at >= '2015-07-29 17:03:44')
(But probably does not speed it up much)
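Putting those suggestions together, the whole query would look something like this (a sketch built from the literals in the question, not tested):
SELECT
    SUM(created_at >= '2015-07-01 00:00:00') AS monthly,
    SUM(created_at >= '2015-07-26 00:00:00') AS weekly,
    SUM(created_at >= '2015-06-30 07:57:56') AS `30day`,
    SUM(created_at >= '2015-07-29 17:03:44') AS recent
FROM items
WHERE user_id = 123456
  AND created_at >= '2015-06-30 07:57:56';  -- the earliest of the four cutoffs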
If the data does not change over time and only new rows are added, then summary tables of past data would lead to a significant speedup, but only if you could avoid things like '07:57:56' for '30day'. (Why have '00:00:00' for only some of them?) Perhaps the speedup would be another factor of 10 on top of the other changes. Want to discuss further?
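For illustration, one possible shape for such a summary table, assuming day-granularity cutoffs (the table and column names here are made up, not part of the original answer):
CREATE TABLE items_daily (
    user_id INT NOT NULL,
    `day` DATE NOT NULL,
    cnt INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, `day`)
) ENGINE=InnoDB;

-- Maintained as rows arrive (or by a nightly batch job):
INSERT INTO items_daily (user_id, `day`, cnt)
VALUES (123456, CURDATE(), 1)
ON DUPLICATE KEY UPDATE cnt = cnt + 1;

-- A segment count then becomes a short range scan over at most a few hundred rows:
SELECT SUM(cnt) AS monthly FROM items_daily
WHERE user_id = 123456 AND `day` >= '2015-07-01';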
(I do not see any advantage in using PARTITION.)
Related
Here is my table
It has a field type, where 1 means income and 2 means expense.
The requirement: for example, if two transactions were made on 2-10-2018, I want data like the following.
Expected Output
id   created_date   total_amount
1    1-10-18        10
2    2-10-18        20   (sums only the income transactions made on the 2nd)
3    3-10-18        10
and so on...
It should return a new field which contains only the income transactions made on a particular day.
What I have tried is:
SELECT * FROM `transaction` WHERE type = 1 ORDER BY created_date ASC
UNION
SELECT()
-- but it won't work
SELECT created_date,amount,status FROM
(
SELECT COUNT(amount) AS totalTrans FROM transaction WHERE created_date = created_date
) x
transaction
You can Also See Schema HERE http://sqlfiddle.com/#!9/6983b9
You can COUNT() the total number of expense transactions using the conditional function IF(), grouped by created_date.
Similarly, you can SUM() the expense amount, again using IF() and grouped by created_date.
Try the following:
SELECT
`created_date`,
SUM(IF (`type` = 2, `amount`, 0)) AS total_expense_amount,
COUNT(IF (`type` = 2, `id`, NULL)) AS expense_count
FROM
`transaction`
GROUP BY `created_date`
ORDER BY `created_date` ASC
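Note that the question defines type = 1 as income and 2 as expense; if the per-day income figures described there are what you need, the same pattern applies with the condition flipped (an untested sketch):
SELECT
`created_date`,
SUM(IF (`type` = 1, `amount`, 0)) AS total_income_amount,
COUNT(IF (`type` = 1, `id`, NULL)) AS income_count
FROM
`transaction`
GROUP BY `created_date`
ORDER BY `created_date` ASC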
Do you just want a WHERE clause?
SELECT t.created_date, SUM(amount) as total_amount
FROM transaction t
WHERE type = 2
GROUP BY t.created_date
ORDER BY created_date ASC ;
Tables
CREATE TABLE `aircrafts_in` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city_from` int(11) NOT NULL COMMENT 'Origin city',
`city_to` int(11) NOT NULL COMMENT 'Destination city',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=utf8 COMMENT='Flights by route'
CREATE TABLE `aircrafts_in_parsed_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`price` int(11) NOT NULL COMMENT 'Price',
`airline` varchar(255) NOT NULL COMMENT 'Airline',
`date` date NOT NULL COMMENT 'Departure date',
`info_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `info_id` (`info_id`),
KEY `price` (`price`),
KEY `date` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=940682 DEFAULT CHARSET=utf8
date - departure date
CREATE TABLE `aircrafts_in_parsed_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` enum('success','error') DEFAULT NULL,
`type` enum('roundtrip','oneway') NOT NULL,
`date` datetime NOT NULL COMMENT 'Parse date',
`aircrafts_in_id` int(11) DEFAULT NULL COMMENT 'Route ID',
PRIMARY KEY (`id`),
KEY `aircrafts_in_id` (`aircrafts_in_id`)
) ENGINE=InnoDB AUTO_INCREMENT=577759 DEFAULT CHARSET=utf8
date - created date (when the parsing was done)
Task
Get the lowest ticket price and its departure date for each month. Note that the row carrying the minimum price is what matters, not just the minimum value itself. If multiple dates share the minimum price, we need the first one.
My solution
I think there's something not quite right with it.
I don't like using nested subqueries just for the grouping; how can I solve this problem?
select *
from (
select * from (
select airline,
price,
pdata.`date` as `date`
from aircrafts_in_parsed_data `pdata`
inner join aircrafts_in_parsed_info `pinfo`
on pdata.`info_id` = pinfo.`id`
where pinfo.`aircrafts_in_id` = {$id}
and pinfo.status = 'success'
and pinfo.`type` = 'roundtrip'
and `price` <> 0
group by pdata.`date`, year(pinfo.`date`) desc, month(pinfo.`date`) desc, day(pinfo.`date`) desc
) base
group by `date`
order by price, year(`date`) desc, month(`date`) desc, day(`date`) asc
) minpriceperdate
group by year(`date`) desc, month(`date`) desc
Takes 0.015 s without cache; the table sizes can be seen from the AUTO_INCREMENT values above.
SELECT MIN(price) AS min_price,
LEFT(date, 7) AS yyyy_mm
FROM aircrafts_in_parsed_data
GROUP BY LEFT(date, 7)
will get the lowest price for each month. But it can't say 'first'.
From my groupwise-max cheat-sheet, I derive this:
SELECT
yyyy_mm, date, price, airline -- The desired columns
FROM
( SELECT @prev := '' ) init
JOIN
( SELECT LEFT(date, 7) != @prev AS first,
@prev := LEFT(date, 7),
LEFT(date, 7) AS yyyy_mm, date, price, airline
FROM aircrafts_in_parsed_data
ORDER BY
LEFT(date, 7), -- The 'GROUP BY'
price ASC, -- ASC to do "MIN()"
date -- To get the 'first' if there are dup prices for a month
) x
WHERE first -- extract only the first of the lowest price for each month
ORDER BY yyyy_mm; -- Whatever you like
Sorry, but subqueries are necessary. (I avoided YEAR(), MONTH(), and DAY().)
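As an aside: on MySQL 8.0 or later (which did not exist when this was written), the same groupwise pick can be expressed with a window function instead of user variables. A rough sketch under that assumption, mirroring the simplified form above (still ignoring the joins and filters from the question):
SELECT yyyy_mm, `date`, price, airline
FROM (
    SELECT LEFT(`date`, 7) AS yyyy_mm, `date`, price, airline,
           ROW_NUMBER() OVER (PARTITION BY LEFT(`date`, 7)
                              ORDER BY price ASC, `date` ASC) AS rn
    FROM aircrafts_in_parsed_data
) x
WHERE rn = 1
ORDER BY yyyy_mm;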
You are right, your query is not correct.
Let's start with the innermost query: you group by pdata.date + pinfo.date, so you get one result row per date combination. As you don't specify which price or airline you are interested in for each date combination (such as MAX(airline) or MIN(price)), you get one airline arbitrarily chosen for a date combination and one price also arbitrarily chosen. These don't even have to belong to the same record in the table; the DBMS is free to choose one airline and one price matching the dates. Well, maybe the combination of pdata.date and pinfo.date is already unique, but then you wouldn't need to group by at all. So however we look at this, it isn't proper.
In the next query you group by pdata.date only, thus again getting arbitrary matches for airline and price. You could have done that in the innermost query already. It makes no sense to say "give me a randomly picked price per pdata.date and pinfo.date, and from these give me a randomly picked price per pdata.date"; you could just as well say it directly: "give me a randomly picked price per pdata.date". Then you order your result rows. This is completely useless, as you are using the result as a subquery (derived table) again, and such a result is considered an unordered set. So the ORDER BY gives the DBMS more work to do, but is in no way guaranteed to influence the main query's results.
In your main query then you group by year and month, again resulting in arbitrarily picked values.
Here is the same query a tad shorter and cleaner:
select
pdata.airline, -- some arbitrarily chosen airline matching year and month
pdata.price, -- some arbitrarily chosen price matching year and month
pdata.date -- some arbitrarily chosen date matching year and month
from aircrafts_in_parsed_data pdata
inner join aircrafts_in_parsed_info pinfo on pdata.info_id = pinfo.id
where pinfo.aircrafts_in_id = {$id}
and pinfo.status = 'success'
and pinfo.type = 'roundtrip'
and pdata.price <> 0
group by year(pdata.date), month(pdata.date)
order by year(pdata.date) desc, month(pdata.date) desc
As to the original task (as far as I understand it): Find the records with the lowest price per month. Per month means GROUP BY month. The lowest price is MIN(price).
select
min_price_record.departure_year,
min_price_record.departure_month,
min_price_record.min_price,
full_record.departure_date,
full_record.airline
from
(
select
year(`date`) as departure_year,
month(`date`) as departure_month,
min(price) as min_price
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
group by year(`date`), month(`date`)
) min_price_record
join
(
select
`date` as departure_date,
year(`date`) as departure_year,
month(`date`) as departure_month,
price,
airline
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
) full_record on full_record.departure_year = min_price_record.departure_year
and full_record.departure_month = min_price_record.departure_month
and full_record.price = min_price_record.min_price
order by
min_price_record.departure_year desc,
min_price_record.departure_month desc;
I have a table that contains all purchased items.
I need to check which users purchased items in a specific period of time (say, between 2013-03-21 and 2013-04-21) and never purchased anything after that.
I can select users that purchased items in that period of time, but I don't know how to filter those users that never purchased anything after that...
SELECT `userId`, `email` FROM my_table
WHERE `date` BETWEEN '2013-03-21' AND '2013-04-21' GROUP BY `userId`
Give this a try
SELECT
user_id
FROM
my_table
WHERE
purchase_date >= '2012-05-01' -- your_start_date
GROUP BY
user_id
HAVING
max(purchase_date) <= '2012-06-01'; -- your_end_date
It works by getting all the records with a purchase date >= the start date, grouping the result set by user_id, and then finding the max purchase date for every user; that max purchase date must be <= the end date. Since this query does not use a join or inner query, it could be faster.
Test data
CREATE table user_purchases(user_id int, purchase_date date);
insert into user_purchases values (1, '2012-05-01');
insert into user_purchases values (2, '2012-05-06');
insert into user_purchases values (3, '2012-05-20');
insert into user_purchases values (4, '2012-06-01');
insert into user_purchases values (4, '2012-09-06');
insert into user_purchases values (1, '2012-09-06');
Output
| USER_ID |
-----------
| 2 |
| 3 |
SQLFIDDLE
This is probably a standard way to accomplish that:
SELECT `userId`, `email` FROM my_table mt
WHERE `date` BETWEEN '2013-03-21' AND '2013-04-21'
AND NOT EXISTS (
SELECT * FROM my_table mt2 WHERE
mt2.`userId` = mt.`userId`
and mt2.`date` > '2013-04-21'
)
GROUP BY `userId`
SELECT `userId`, `email` FROM my_table WHERE (`date` BETWEEN '2013-03-21' AND '2013-04-21') and `date` >= '2013-04-21' GROUP BY `userId`
This will select only the users who purchased during that timeframe AND purchased after that timeframe.
Hope this helps.
Try the following
SELECT `userId`, `email`
FROM my_table WHERE `date` BETWEEN '2013-03-21' AND '2013-04-21'
and user_id not in
(select user_id from my_table
where `date` < '2013-03-21' or `date` > '2013-04-21' )
GROUP BY `userId`
You'll have to do it in two stages - one query to get the list of users who did buy within the time period, then another query to take that list of users and see if they bought anything afterwards, e.g.
SELECT during.userID, during.email, COUNT(after.userID) AS purchases
FROM (
    SELECT DISTINCT userID, email
    FROM my_table
    WHERE `date` BETWEEN '2013-03-21' AND '2013-04-21'
) AS during
LEFT JOIN my_table AS after
    ON after.userID = during.userID
    AND after.`date` > '2013-04-21'
GROUP BY during.userID, during.email
HAVING purchases = 0;
The inner query gets the list of userIDs who purchased at least one thing during that period. That list is then joined back against the same table, but filtered for purchases AFTER the period; the query counts how many purchases each of them made and keeps only those users with 0 "after" purchases.
probably won't work as written - haven't had my morning tea yet.
SELECT
a.userId,
a.email
FROM
my_table AS a
WHERE a.date BETWEEN '2013-03-21'
AND '2013-04-21'
AND a.userId NOT IN
(SELECT
b.userId
FROM
my_table AS b
WHERE b.date BETWEEN '2013-04-22'
AND CURDATE()
GROUP BY b.userId)
GROUP BY a.userId
This filters out anyone who has not purchased anything from the end date to the present.
I have the following two tables
CREATE TABLE IF NOT EXISTS `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `events_dates` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_id` bigint(20) NOT NULL,
`date` date NOT NULL,
`start_time` time NOT NULL,
`end_time` time NOT NULL,
PRIMARY KEY (`id`),
KEY `event_id` (`event_id`),
KEY `date` (`event_id`)
) ENGINE=MyISAM;
Where the link is event_id
What I want is to retrieve all unique event records, each with its earliest upcoming event date, ordered by that date ascending, within a certain period.
Basically the following query does exactly what I want
SELECT Event.id, Event.title, EventDate.date, EventDate.start_time, EventDate.end_time
FROM
events AS Event
JOIN
events_dates AS EventDate
ON (Event.id = EventDate.event_id AND EventDate.date = (
SELECT MIN(MinEventDate.date) FROM events_dates AS MinEventDate
WHERE MinEventDate.event_id = Event.id AND MinEventDate.date >= CURDATE() # AND `MinEventDate`.`date` < '2013-02-27'
)
)
WHERE
EventDate.date >= CURDATE() # AND `EventDate`.`date` < '2013-02-27'
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
This query is the result of multiple attempts at further improving the slow time it initially had (1.5 seconds) when I wanted to use GROUP BY and other subqueries. It's the fastest one yet, but considering that there are 1,400 event records and 10,000 event date records in total, the query still takes 400+ ms to process. I also run a count based on this (for paging purposes) that takes a lot of time as well.
Strangely enough, omitting the EventDate condition in the main WHERE clause makes this even slower, 1 s+.
Is there anything I can do to improve this, or a different approach to the table structure?
Just to clarify for anyone else... the "#" in MySQL starts a comment that runs to the end of the line and is ignored in the query, so there is no active "AND EventDate.Date < '2013-02-27'" condition. That said, it appears you want a list of all events COMING UP that have not yet happened. I would start with a simple "prequery" that just grabs each event and its minimum date, based on the event date not having happened yet, then join that result back to the other tables to get the rest of the fields you want:
SELECT
E.ID,
E.Title,
ED2.`date`,
ED2.Start_Time,
ED2.End_Time
FROM
( SELECT
ED.Event_ID,
MIN( ED.`date` ) as MinEventDate
from
Event_Dates ED
where
ED.`date` >= curdate()
group by
ED.Event_ID ) PreQuery
JOIN Events E
ON PreQuery.Event_ID = E.ID
JOIN Event_Dates ED2
ON PreQuery.Event_ID = ED2.Event_ID
AND PreQuery.MinEventDate = ED2.`date`
ORDER BY
ED2.`date`,
ED2.Start_Time,
ED2.End_Time DESC
LIMIT 20
Your table has a redundant index on the event ID, just under a different name. Naming an index date does not mean that is the column being indexed; the value(s) in the parentheses ( event_id ) are what the index is built on.
So, I would change your CREATE TABLE to...
KEY `date` ( `event_id`, `date`, `start_time` )
Or, to create the index manually:
Create index ByEventAndDate on Event_Dates ( `event_id`, `date`, `start_time` )
If you are talking about optimization, it is helpful to include execution plans when possible.
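For example, you can capture the plan by prefixing the query with EXPLAIN; a simplified sketch against the question's tables (not the full original query):
EXPLAIN
SELECT Event.id, Event.title, EventDate.`date`, EventDate.start_time, EventDate.end_time
FROM events AS Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
WHERE EventDate.`date` >= CURDATE()
ORDER BY EventDate.`date`, EventDate.start_time
LIMIT 20;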
By the way, try these (if you have not tried them already):
SELECT
Event.id,
Event.title,
EventDate.date,
EventDate.start_time,
EventDate.end_time
FROM
(select e.id, e.title, min(date) as MinDate
from events_dates as ed
join events as e on e.id = ed.event_id
where date >= CURDATE() and date < '2013-02-27'
group by e.id, e.title) as Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
and Event.MinDate = EventDate.date
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
;
# assuming that a greater events_dates.id always has a greater (or equal) events_dates.date
SELECT
Event.id,
Event.title,
EventDate.date,
EventDate.start_time,
EventDate.end_time
FROM
(select e.id, e.title, min(ed.id) as MinID
from events_dates as ed
join events as e on e.id = ed.event_id
where date >= CURDATE() and date < '2013-02-27'
group by e.id, e.title) as Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
and Event.MinID = EventDate.id
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
I have a huge table with millions of records that stores stock values by timestamp. The structure is as below:
Stock, timestamp, value
goog,1112345,200.4
goog,112346,220.4
Apple,112343,505
Apple,112346,550
I would like to query this table by timestamp. If the timestamp matches, all corresponding stock records should be returned; if there is no record for a stock at that timestamp, the immediately preceding one should be returned. In the above example, if I query by timestamp = 1112345 then the query should return 2 records:
goog,1112345,200.4
Apple,112343,505 (immediate previous record)
I have tried several different ways to write this query but with no success, and I'm sure I'm missing something. Can someone help please?
SELECT `Stock`, `timestamp`, `value`
FROM `myTable`
WHERE `timestamp` = 1112345
UNION ALL
(SELECT `Stock`, `timestamp`, `value`
FROM `myTable`
WHERE `timestamp` < 1112345
ORDER BY `timestamp` DESC
LIMIT 1)
Do you just mean select Stock, timestamp, value from thisTbl where timestamp = ?, filling in the timestamp with whatever it should be? Your demo query is available on this fiddle.
I don't think there is an easy way to do this query. Here is one approach:
select tprev.*
from (select s.stock,
             (select timestamp
              from t
              where t.stock = s.stock and timestamp <= <whatever>
              order by timestamp desc
              limit 1
             ) as prevtimestamp
      from (select distinct stock
            from t
           ) s
     ) s join
     t tprev
     on s.prevtimestamp = tprev.timestamp and s.stock = tprev.stock
This gets the previous-or-equal timestamp for each stock and then joins it back in. If you have an index on (stock, timestamp), then this may be rather fast.
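For reference, a concrete form of that index, assuming the table really is named t as in the query above (the index name is arbitrary):
ALTER TABLE t ADD INDEX idx_stock_timestamp (stock, timestamp);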
Another phrasing of it uses group by:
select tprev.*
from (select t.stock,
max(timestamp) as prevtimestamp
from t
where timestamp <= YOURTIMESTAMP
group by t.stock
) s join
t tprev
on s.prevtimestamp = tprev.timestamp and s.stock = tprev.stock
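For example, instantiated with the timestamp from the question, and assuming the table is actually called myTable (as in an earlier answer), the GROUP BY version becomes:
SELECT tprev.*
FROM (SELECT stock,
             MAX(timestamp) AS prevtimestamp
      FROM myTable
      WHERE timestamp <= 1112345
      GROUP BY stock
     ) s
JOIN myTable tprev
  ON s.prevtimestamp = tprev.timestamp
 AND s.stock = tprev.stock;
-- For each stock this returns the row at its latest timestamp that is <= 1112345.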