MySQL/memSQL not using index on BETWEEN join condition

We have two tables:
A dates table that contains one date per day for the last 10 and next 10 years.
A states table that has the following columns: start_date, end_date, state.
The query we run looks like this:
SELECT dates.date, COUNT(*)
FROM dates
JOIN states
ON dates.date BETWEEN states.start_date AND states.end_date
WHERE dates.date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY dates.date
ORDER BY dates.date;
According to the query plan, memSQL isn't using an index on the JOIN condition and this makes the query slow. Is there a way we can use an index on the JOIN condition?
We tried memSQL skiplist indexes on dates.date, states.start_date, states.end_date, (states.start_date, states.end_date)
Tables & EXPLAIN:
CREATE TABLE `dates` (
`date` date DEFAULT NULL,
KEY `date_index` (`date`)
)
CREATE TABLE `states` (
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`state` varchar(256) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
KEY `start_date` (`start_date`),
KEY `end_date` (`end_date`),
KEY `start_date_end_date` (`start_date`,`end_date`),
)
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.date] partitions:all est_rows:96 alias:remote_0 |
| Project [r2.date, CAST(COALESCE($0,0) AS SIGNED) AS `COUNT(*)`] est_rows:96 |
| Sort [r2.date] |
| HashGroupBy [SUM(r2.`COUNT(*)`) AS $0] groups:[r2.date] |
| TableScan r2 storage:list stream:no |
| Repartition [r1.date, `COUNT(*)`] AS r2 shard_key:[date] est_rows:96 est_select_cost:26764032 |
| HashGroupBy [COUNT(*) AS `COUNT(*)`] groups:[r1.date] |
| Filter [r1.date <= states.end_date] |
| NestedLoopJoin |
| |---IndexRangeScan drstates_test.states, KEY start_date (start_date) scan:[start_date <= r1.date] est_table_rows:123904 est_filtered:123904 |
| TableScan r1 storage:list stream:no |
| Broadcast [dates.date] AS r1 distribution:tree est_rows:96 |
| IndexRangeScan drstates_test.dates, KEY date_index (date) scan:[date >= '2017-01-01' AND date <= '2017-01-31'] est_table_rows:18628 est_filtered:96 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+

ON dates.date BETWEEN states.start_date
AND states.end_date
is essentially un-optimizable. The only practical way to perform this test is to tediously test every row.
If you are using MySQL and don't need the dates table, consider starting with
SELECT *
FROM states
WHERE start_date >= '2017-01-01'
AND end_date < '2017-01-01' + INTERVAL 1 MONTH
Note that this works for any combination of DATE and DATETIME datatypes.
Since I am unclear on the ultimate goal, I am unclear on what to do next.
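If the per-day counts are still needed, one option is to shrink the states side before the range join: a state can only match a day in the window if it starts on or before the last day and ends on or after the first day. The following is a minimal sketch of that idea, run against SQLite from Python purely so it is self-contained. The sample rows, the helper name states_per_day, and the use of ISO date strings are assumptions for illustration; whether MemSQL or MySQL actually picks an index for the pre-filter would need to be checked with EXPLAIN.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (date TEXT)")
conn.execute("CREATE TABLE states (start_date TEXT, end_date TEXT, state TEXT)")

# One row per day around the reporting window (ISO strings sort chronologically).
day = date(2016, 12, 25)
while day <= date(2017, 2, 5):
    conn.execute("INSERT INTO dates VALUES (?)", (day.isoformat(),))
    day += timedelta(days=1)

# Hypothetical sample data, not taken from the question.
conn.executemany("INSERT INTO states VALUES (?, ?, ?)", [
    ("2016-12-01", "2017-01-02", "a"),  # overlaps the start of January
    ("2017-01-02", "2017-01-03", "b"),  # lies inside January
    ("2017-02-01", "2017-02-05", "c"),  # after the window, dropped by the pre-filter
    ("2015-01-01", "2015-06-01", "d"),  # before the window, dropped by the pre-filter
])


def states_per_day(conn, first_day, last_day):
    """Count states covering each day, joining only states that overlap the window."""
    sql = """
        SELECT d.date, COUNT(*)
        FROM dates AS d
        JOIN (SELECT start_date, end_date
              FROM states
              WHERE start_date <= :last AND end_date >= :first) AS s
          ON d.date BETWEEN s.start_date AND s.end_date
        WHERE d.date BETWEEN :first AND :last
        GROUP BY d.date
        ORDER BY d.date
    """
    return conn.execute(sql, {"first": first_day, "last": last_day}).fetchall()


rows = states_per_day(conn, "2017-01-01", "2017-01-31")
for row in rows:
    print(row)
```

The range join itself is unchanged, so every surviving state is still compared with every day in the window; the gain, if any, comes from joining far fewer states.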

Related

Query to find an entry between dates

I have a table containing several records associated to the same entities. Two of the fields are dates - start and end dates of a specific period.
Example:
| ID | Name | Start | End |
| :--- | :--- | :--- | :--- |
| 3 | Fred | 2022/01/01 | 2100/12/31 |
| 2 | John | 2018/01/01 | 2021/12/31 |
| 1 | Mark | 2014/03/22 | 2017/12/31 |
The dates and names vary, but the only rule is that there are NO OVERLAPS - it's a succession of people in charge of a unique role, so there is only one record which is valid for any date.
I have a query returning me a date (let's call it $ThatDay) and what I am trying to do is to find a way to find which name it was at that specific date. For example, if the date was July 4th, 2019, the result of the query I am after would be "John"
I have run out of ideas on how to structure a query to help me find it. Thank you in advance for any help!
You can use a SELECT with BETWEEN in the WHERE clause.
MySQL's date format is yyyy-mm-dd; if you stick to it, you will never have problems.
CREATE TABLE datetab (
`ID` INTEGER,
`Name` VARCHAR(4),
`Start` DATETIME,
`End` DATETIME
);
INSERT INTO datetab
(`ID`, `Name`, `Start`, `End`)
VALUES
(3, 'Fred', '2022-01-01', '2100-12-31'),
(2, 'John', '2018-01-01', '2021-12-31'),
(1, 'Mark', '2014-03-22', '2017-12-31');
SELECT `Name` FROM datetab WHERE '2019-07-04' BETWEEN `Start` AND `End`
| Name |
| :--- |
| John |
If you have a (sub)query that returns a date, you can join it, for example:
SELECT `Name`
FROM datetab CROSS JOIN (SELECT '2019-07-04' as mydate FROM dual) t1
WHERE mydate BETWEEN `Start` AND `End`
| Name |
| :--- |
| John |
Also, when the query returns only one row and one column, you can use the subquery directly, like this:
SELECT `Name`
FROM datetab
WHERE (SELECT '2019-07-04' as mydate FROM dual) BETWEEN `Start` AND `End`
| Name |
| :--- |
| John |
Select where the result of your find-date query is between start and end:
select * from mytable
where (<my find date query>)
between start and end
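For completeness, the lookup can also be driven from application code with a bound parameter instead of a date literal. This is a small sketch using Python's sqlite3 module rather than MySQL, so it runs without a server; the helper name name_on is made up for the example, and the dates are stored as ISO strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Start" and "End" are quoted because END is an SQL keyword.
conn.execute('CREATE TABLE datetab (ID INTEGER, Name TEXT, "Start" TEXT, "End" TEXT)')
conn.executemany("INSERT INTO datetab VALUES (?, ?, ?, ?)", [
    (3, "Fred", "2022-01-01", "2100-12-31"),
    (2, "John", "2018-01-01", "2021-12-31"),
    (1, "Mark", "2014-03-22", "2017-12-31"),
])


def name_on(conn, that_day):
    """Return the name whose period contains that_day, or None if no period does."""
    row = conn.execute(
        'SELECT Name FROM datetab WHERE ? BETWEEN "Start" AND "End"',
        (that_day,),
    ).fetchone()
    return row[0] if row else None


print(name_on(conn, "2019-07-04"))
```

Because the periods never overlap, at most one row can match, so fetching a single row is enough.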

Show last published result

I want to get the last published result. Suppose the current date is 01-08-2019 and the time is 11:00: then I should get the row with ID 2. If instead the current date is 02-08-2019 and the time is 13:00, I should get the row with ID 5.
Note: the result depends on the current date and time.
id | date | time | number
1 | 31-07-2019 | 12:30 | 20
2 | 31-07-2019 | 18:30 | 35
3 | 01-08-2019 | 12:30 | 40
4 | 01-08-2019 | 18:30 | 70
5 | 02-08-2019 | 12:30 | 21
6 | 02-08-2019 | 18:30 | 61
To compare against the server's current time, use NOW(); otherwise supply your own timestamp, for example '2018-12-31 18:30'. The time-of-day test should only apply to rows dated today; rows from earlier dates qualify regardless of their time:
SELECT * FROM `YourTable`
WHERE `Date` < CURDATE() OR (`Date` = CURDATE() AND `Time` <= CURTIME())
ORDER BY `Date` DESC, `Time` DESC LIMIT 1;
Also, judging by your current date column, you may be using the wrong datatype. Consider creating the table like this instead (here the column uses the DATE datatype):
CREATE TABLE `YourTable` (
`ID` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`Date` DATE NOT NULL,
`Time` TIME NOT NULL,
`Number` INTEGER,
PRIMARY KEY (`ID`)
);
Or even better you could combine both Date and time together
CREATE TABLE `YourTable` (
`ID` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`Date` DATETIME NOT NULL,
`Number` INTEGER,
PRIMARY KEY (`ID`)
);
With the above structure, you could simplify your query to
SELECT * FROM `YourTable`
where `Date` <= Now()
Order by `Date` Desc LIMIT 1;
This is just a plain SQL query, for example:
SELECT * FROM `ENTER_HERE_YOUR_TABLE_NAME` ORDER BY `date` DESC, `time` DESC LIMIT 1
No extras are needed, provided the table holds no rows dated in the future.
It should work fine in your case.
select id from tablename where date<'2019-08-01' or (date='2019-08-01' and time<='11:00') order by date desc, time desc limit 1
Try this if you have difficulties in changing your current table datatype.
SELECT * FROM yourtable
WHERE date = DATE_FORMAT(CURDATE(),'%d-%m-%Y')
AND time <= TIME_FORMAT(CURTIME(),'%H:%i')
ORDER BY time DESC
LIMIT 1;
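To check the combined-column approach against the numbers in the question, here is a minimal sketch in Python with sqlite3 (not MySQL, so it runs standalone). The table name results, the column name published_at, and the helper last_published are assumptions for the example; the "current time" is passed in as a parameter instead of calling NOW() so the two scenarios from the question can be reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, published_at TEXT NOT NULL, number INTEGER)"
)
# Date and time merged into one ISO-formatted value, which sorts chronologically as text.
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    (1, "2019-07-31 12:30", 20),
    (2, "2019-07-31 18:30", 35),
    (3, "2019-08-01 12:30", 40),
    (4, "2019-08-01 18:30", 70),
    (5, "2019-08-02 12:30", 21),
    (6, "2019-08-02 18:30", 61),
])


def last_published(conn, now):
    """Return (id, number) of the newest row published at or before `now`, or None."""
    return conn.execute(
        "SELECT id, number FROM results "
        "WHERE published_at <= ? ORDER BY published_at DESC LIMIT 1",
        (now,),
    ).fetchone()


print(last_published(conn, "2019-08-01 11:00"))
print(last_published(conn, "2019-08-02 13:00"))
```

With a single timestamp column there is no separate time-of-day condition to get wrong, which is the main argument for merging the two columns.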

Difference between 2 records timestamp sql

I have this table:
CREATE TABLE result (
id bigint(20) NOT NULL AUTO_INCREMENT,
tag int(11) NOT NULL,
timestamp timestamp NULL DEFAULT NULL,
value double NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY nasudnBBEby333412dsa (timestamp, tag)
) ENGINE=InnoDB AUTO_INCREMENT=115 DEFAULT CHARSET=utf8mb4;
I would like to calculate the difference between two consecutive days that have the same column tag. For example, in timestamp:
| 1 | 1 | 2017-06-18 00:00:00 | 7.3 |
| 2 | 1 | 2017-06-17 00:00:00 | 7.4 |
I want the result to be: -0.1
Which query should I write?
You can try this:
1) Use a join to select the value of the next consecutive day.
2) Then calculate the difference.
SELECT r1.id, r1.tag, r1.value AS CURRENT_VALUE, r2.value AS NEXT_VALUE,
(r1.value - r2.value) AS DIFF, r1.timestamp
FROM `result` r1
LEFT JOIN `result` r2 ON r2.tag = r1.tag AND r2.`timestamp` = r1.`timestamp` + INTERVAL 1 DAY
WHERE r2.value IS NOT NULL
First, if you want to store date values, you can use date, so there is no time component.
Second, you can do this with join:
select r.*, (r.value - rprev.value) as diff
from result r left join
result rprev
on r.tag = rprev.tag and
r.timestamp = rprev.timestamp + interval 1 day;
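The self-join can be tried out on the two sample rows from the question. The sketch below uses Python's sqlite3 module, so the MySQL expression `+ interval 1 day` is replaced by SQLite's datetime(..., '+1 day'); the third row with tag 2 is an invented extra to show that tags are kept apart, and an inner join is used so that rows without a previous day are simply omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE result (
        id INTEGER PRIMARY KEY,
        tag INTEGER NOT NULL,
        timestamp TEXT,
        value REAL NOT NULL,
        UNIQUE (timestamp, tag)
    )
""")
conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?)", [
    (1, 1, "2017-06-18 00:00:00", 7.3),
    (2, 1, "2017-06-17 00:00:00", 7.4),
    (3, 2, "2017-06-18 00:00:00", 1.0),  # hypothetical row: other tag, no previous day
])

# Pair each row with the row of the same tag exactly one day earlier.
diffs = conn.execute("""
    SELECT r.id, r.tag, r.timestamp, r.value - rprev.value AS diff
    FROM result AS r
    JOIN result AS rprev
      ON rprev.tag = r.tag
     AND r.timestamp = datetime(rprev.timestamp, '+1 day')
    ORDER BY r.tag, r.timestamp
""").fetchall()

for row in diffs:
    print(row)
```

The difference is attached to the later day (7.3 minus 7.4 for 2017-06-18), which matches the sign the question asks for, up to floating-point rounding.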

How to exclude rows based on certain values and duplicates

We have data related to subscription events - create, update, delete, etc. I want to be able to query on this data based on certain values to determine if a given user was active on a given date based on the events logged. I have the following table:
CREATE TABLE events (
eid varchar(45) NOT NULL,
cid varchar(45) DEFAULT NULL,
sid varchar(45) DEFAULT NULL,
event_type varchar(45) DEFAULT NULL,
period_start datetime DEFAULT NULL,
period_end datetime DEFAULT NULL,
date date DEFAULT NULL,
datetime datetime DEFAULT NULL,
PRIMARY KEY (eid)
);
with the following example data:
INSERT INTO events
(eid, cid, sid, event_type, period_start, period_end, date, datetime)
VALUES
('event_1', 'customer_1', 'subscription_456', 'created', '2016-03-11 17:38:50', '2016-09-11 18:38:50', '2016-03-11', '2016-03-11 17:38:51'),
('event_2', 'customer_1', 'subscription_456', 'updated', '2016-09-11 18:38:50', '2017-03-11 17:38:50', '2016-09-11', '2016-09-11 18:46:04'),
('event_3', 'customer_1', 'subscription_456', 'deleted', '2016-09-11 18:38:50', '2017-03-11 17:38:50', '2016-09-11', '2016-09-11 22:39:43');
I am looking for a query where I could enter in any date to see if this user was active during this time based on the period_start, period_end, and event_type.
Basically, if a row exists with event_type = 'deleted', then it should exclude that row and any other rows with the same sid, period_start and period_end values. I have tried:
SELECT e.* FROM events e
JOIN (SELECT sid, event_type, period_start, period_end
FROM events) e2
ON
(e2.sid = e.sid AND e2.event_type = "deleted"
AND e2.period_start = e.period_start
AND e2.period_end = e.period_end)
WHERE
(e.event_type = 'created' OR e.event_type = 'updated')
AND date(e.period_start) <= '2016-04-01'
AND date(e.period_end) >= '2016-04-01';
which should return the created event (but isn't returning anything), while using the dates 2016-09-01 or 2017-01-01 should return nothing. I'm not sure what to try next. I'd really like to be able to accomplish this in a query rather than having to process the data in PHP or JS.
As you have now added a description of how you want deleted information excluded I suggest the following:
SELECT e.eid, e.cid, e.sid, e.event_type, e.period_start, e.period_end, e2.eid, e2.event_type
FROM events e
left join events e2
on e.sid = e2.sid and e.event_type <> 'deleted' and e2.event_type = 'deleted'
AND e.period_start = e2.period_start AND e.period_end = e2.period_end
WHERE e.event_type <> 'deleted'
AND e2.eid IS NULL
AND '2016-04-01' between e.period_start and e.period_end
Previous answer:
I really don't know what you want; perhaps it would help if you listed a set of parameters and then the expected result for each?
In the absence of that perhaps this will help:
Query 1:
SELECT e.*
FROM events e
WHERE '2016-04-01' between e.period_start and e.period_end
Results:
| eid | cid | sid | event_type | period_start | period_end | date | datetime |
|---------|------------|------------------|------------|-------------------------|-----------------------------|-------------------------|-------------------------|
| event_1 | customer_1 | subscription_456 | created | March, 11 2016 17:38:50 | September, 11 2016 18:38:50 | March, 11 2016 00:00:00 | March, 11 2016 17:38:51 |
Query 2:
SELECT e.eid, e.cid, e.sid, e.event_type, e.period_start, e.period_end, e2.eid, e2.event_type
FROM events e
left join events e2
on e.sid = e2.sid and e.event_type <> 'deleted' and e2.event_type = 'deleted'
WHERE '2016-04-01' between e.period_start and e.period_end
Results:
| eid | cid | sid | event_type | period_start | period_end | eid | event_type |
|---------|------------|------------------|------------|-------------------------|-----------------------------|---------|------------|
| event_1 | customer_1 | subscription_456 | created | March, 11 2016 17:38:50 | September, 11 2016 18:38:50 | event_3 | deleted |
It is hard to tell what is broken from the given sample data.
However, it is important to find out at which line the SQL stops returning rows.
In the following example, the highlighted part of the SQL still returns one row.
However, if you add the final two lines, there is no result, and you need to work out why: no data matches after SQL line #11.
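The LEFT JOIN ... IS NULL query from the first answer can be exercised against the three sample events. This is a sketch in Python with sqlite3 rather than MySQL, trimmed to the columns the query touches; the helper name active_events is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        eid TEXT PRIMARY KEY,
        sid TEXT,
        event_type TEXT,
        period_start TEXT,
        period_end TEXT
    )
""")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", [
    ("event_1", "subscription_456", "created", "2016-03-11 17:38:50", "2016-09-11 18:38:50"),
    ("event_2", "subscription_456", "updated", "2016-09-11 18:38:50", "2017-03-11 17:38:50"),
    ("event_3", "subscription_456", "deleted", "2016-09-11 18:38:50", "2017-03-11 17:38:50"),
])


def active_events(conn, day):
    """Return non-deleted events covering `day` that have no 'deleted' twin
    with the same sid, period_start and period_end."""
    sql = """
        SELECT e.eid
        FROM events AS e
        LEFT JOIN events AS d
          ON d.sid = e.sid
         AND d.event_type = 'deleted'
         AND d.period_start = e.period_start
         AND d.period_end = e.period_end
        WHERE e.event_type <> 'deleted'
          AND d.eid IS NULL
          AND ? BETWEEN e.period_start AND e.period_end
        ORDER BY e.eid
    """
    return [row[0] for row in conn.execute(sql, (day,))]


print(active_events(conn, "2016-04-01"))
print(active_events(conn, "2017-01-01"))
```

The condition `d.eid IS NULL` is what turns the left join into an exclusion: a row survives only when no matching 'deleted' event was found for its period.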

Unable to INSERT ON DUPLICATE KEY UPDATE from another query

I am working on the following table:
CREATE TABLE `cons` (
`Id` char(20) NOT NULL,
`Client_ID` char(12) NOT NULL,
`voice_cons` decimal(11,8) DEFAULT '0.00000000',
`data_cons` int(11) DEFAULT '0',
`day` date DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I need to get some data from another table, cdr, which contains a row per event. This means every call or data connection has its own row.
+-----------+--------------+----------------+-------+
| Client_ID | Data_Up_Link | Data_Down_Link | Price |
+-----------+--------------+----------------+-------+
| 1 | 23 | 56 | 0 |
| 1 | 12 | 3 | 0 |
| 1 | 0 | 0 | 5 |
+-----------+--------------+----------------+-------+
I need to compute the total voice and data consumption for each Client_ID in my new cons table, but just keeping a single record for each Client_ID and day. To keep the question simple, I will consider just one day.
+-----------+-----------+------------+
| Client_ID | data_cons | voice_cons |
+-----------+-----------+------------+
| 1 | 94 | 5 |
+-----------+-----------+------------+
I have unsuccessfully tried the following, among many other (alias, .
insert into cons_day (Id, Client_ID, voice_cons, MSISDN, day)
select
concat(Client_ID,date_format(date,'%Y%m%d')),
Client_ID,
sum(Price) as voice_cons,
date as day
from cdr
where Type_Cdr='VOICE'
group by Client_ID;
insert into cons_day (Id, Client_ID, data_cons, MSISDN, day)
select
concat(Client_ID,date_format(date,'%Y%m%d')),
Client_ID,
sum(Data_Down_Link+Data_Up_Link) as data_cons,
Calling_Number as MSISDN,
date as day
from cdr
where Type_Cdr='DATA'
group by Client_ID
on duplicate key update data_cons=data_cons;
But I keep getting the values unchanged or receiving SQL errors. I would really appreciate a piece of advice.
Thank you very much in advance.
First of all, it seems that the Id column in the cons table is redundant. You already have the Client_ID and day columns; just make them the PRIMARY KEY.
That being said the proposed table schema might look like
CREATE TABLE `cons`
(
`Client_ID` char(12) NOT NULL,
`voice_cons` decimal(11,8) DEFAULT '0.00000000',
`data_cons` int(11) DEFAULT '0',
`day` date NOT NULL,
PRIMARY KEY (`Client_ID`, `day`)
);
Now you can use conditional aggregation to get your voice_cons and data_cons in one go
SELECT Client_ID,
SUM(CASE WHEN Type_CDR = 'VOICE' THEN price END) voice_cons,
SUM(CASE WHEN Type_CDR = 'DATA' THEN Data_Up_Link + Data_Down_Link END) data_cons,
DATE(date) day
FROM cdr
GROUP BY Client_ID, DATE(date)
Note: you have to GROUP BY both Client_ID and DATE(date).
Now the INSERT statement should look like
INSERT INTO cons (Client_ID, voice_cons, data_cons, day)
SELECT Client_ID,
SUM(CASE WHEN Type_CDR = 'VOICE' THEN price END) voice_cons,
SUM(CASE WHEN Type_CDR = 'DATA' THEN Data_Up_Link + Data_Down_Link END) data_cons,
DATE(date) day
FROM cdr
GROUP BY Client_ID, DATE(date)
ON DUPLICATE KEY UPDATE voice_cons = VALUES(voice_cons),
data_cons = VALUES(data_cons);
Note: since now you simultaneously get both voice_cons and data_cons you might not need ON DUPLICATE KEY clause at all if you don't process data for the same dates multiple times.
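The conditional-aggregation upsert can be checked against the sample figures from the question (94 for data, 5 for voice). The sketch below uses Python's sqlite3 module, so MySQL's ON DUPLICATE KEY UPDATE ... VALUES(col) is swapped for SQLite's ON CONFLICT ... DO UPDATE SET col = excluded.col, which needs SQLite 3.24 or newer. The Type_Cdr values and timestamps of the sample rows are assumptions, since the question's cdr excerpt does not show those columns, and ELSE 0 is added so a client without voice or data rows gets 0 rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cdr (
        Client_ID TEXT, Type_Cdr TEXT, Data_Up_Link INTEGER,
        Data_Down_Link INTEGER, Price REAL, date TEXT
    )
""")
conn.execute("""
    CREATE TABLE cons (
        Client_ID TEXT NOT NULL,
        voice_cons REAL DEFAULT 0,
        data_cons INTEGER DEFAULT 0,
        day TEXT NOT NULL,
        PRIMARY KEY (Client_ID, day)
    )
""")
conn.executemany("INSERT INTO cdr VALUES (?, ?, ?, ?, ?, ?)", [
    ("1", "DATA", 23, 56, 0, "2017-01-01 09:00:00"),
    ("1", "DATA", 12, 3, 0, "2017-01-01 10:00:00"),
    ("1", "VOICE", 0, 0, 5, "2017-01-01 11:00:00"),
])

UPSERT = """
    INSERT INTO cons (Client_ID, voice_cons, data_cons, day)
    SELECT Client_ID,
           SUM(CASE WHEN Type_Cdr = 'VOICE' THEN Price ELSE 0 END),
           SUM(CASE WHEN Type_Cdr = 'DATA' THEN Data_Up_Link + Data_Down_Link ELSE 0 END),
           DATE(date)
    FROM cdr
    WHERE 1  -- SQLite wants a WHERE here so ON CONFLICT is not read as a join condition
    GROUP BY Client_ID, DATE(date)
    ON CONFLICT (Client_ID, day) DO UPDATE SET
        voice_cons = excluded.voice_cons,
        data_cons = excluded.data_cons
"""

conn.execute(UPSERT)
conn.execute(UPSERT)  # running it again updates the existing row instead of failing

cons_rows = conn.execute(
    "SELECT Client_ID, day, voice_cons, data_cons FROM cons"
).fetchall()
print(cons_rows)
```

Running the statement twice leaves a single row per client and day, which is the behaviour the composite primary key is meant to guarantee.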