False positives outside date range in this MySQL JOIN

I am putting historical count data together for an automated report. The two main table schemas are below. The third table referenced is person, whose ids are foreign keys in email_list_subscription. That table's primary key consists of the two foreign keys email_list and person.
The query below returns a count that falls outside the date range allowed in the query, and I can't figure out why. It returns rows for an email list that definitely has no rows in 2014 at all.
CREATE TABLE `email_list` (
`id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`handle` varchar(50) NOT NULL DEFAULT '',
`title` varchar(255) DEFAULT NULL,
`operator` varchar(255) DEFAULT NULL,
`operator_contact_name` varchar(255) DEFAULT NULL,
`operator_contact_email` varchar(150) DEFAULT NULL,
`operator_contact_phone` varchar(20) DEFAULT NULL,
`operator_listid` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `handle` (`handle`),
KEY `handle_2` (`handle`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `email_list_subscription` (
`email_list` smallint(5) unsigned NOT NULL DEFAULT '0',
`person` int(10) unsigned NOT NULL DEFAULT '0',
`as_email_address` varchar(150) DEFAULT NULL,
`datetime_synced_to_operator` datetime DEFAULT NULL,
`opted_in` datetime DEFAULT NULL,
`opted_out` datetime NOT NULL,
`undeliverable` datetime NOT NULL,
PRIMARY KEY (`email_list`,`person`),
KEY `email_list` (`email_list`),
KEY `person` (`person`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Here is a query dumped from the script, with its results checked directly in the MySQL monitor:
SELECT
el.id, el.handle,
els.`email_list` ,
COUNT( els.person ) AS c
FROM
`email_list` el,
`email_list_subscription` els
WHERE
el.id = els.email_list
AND (
DATE( els.`datetime_synced_to_operator` ) >= '2014-04-01'
OR
DATE( els.`opted_in` ) >= '2014-04-01'
)
AND (
DATE( els.`datetime_synced_to_operator` ) <= '2014-05-18'
OR
DATE( els.`opted_in` ) <= '2014-05-18'
)
GROUP BY els.`email_list`
How is this capturing els rows whose dates are not in the range?

Those DATE() calls are going to kill your performance; it is much better to compare the column directly:
els.`datetime_synced_to_operator` >= '2014-04-01 00:00:00'
(for example).
Also, it is not clear your date ranges are going to work as intended; this seems clearer (but may give different results depending on the data):
WHERE el.id = els.email_list
AND (
( els.`datetime_synced_to_operator` BETWEEN '2014-04-01 00:00:00' AND '2014-05-18 23:59:59')
OR
( els.`opted_in` BETWEEN '2014-04-01 00:00:00' AND '2014-05-18 23:59:59')
)
;
Also: What was wrong with the original where (below)?
AND (
DATE( els.`datetime_synced_to_operator` ) >= '2014-04-01'
OR
DATE( els.`opted_in` ) >= '2014-04-01'
)
AND (
DATE( els.`datetime_synced_to_operator` ) <= '2014-05-18'
OR
DATE( els.`opted_in` ) <= '2014-05-18'
)
The two OR groups are evaluated independently, so each can be satisfied by a different column. Any row with a datetime_synced_to_operator any time after the start date (even after the end date) and an opted_in any time before the end date (even before the start date) makes this clause true, and vice versa.
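To make that concrete, here is a small check that can be run in the mysql client. The two timestamps are made up for illustration (they are not from the question's data): a row synced in 2015 and opted in during 2013, so neither date falls inside the April to May 2014 window.

```sql
-- Hypothetical row: both dates lie outside 2014-04-01 .. 2014-05-18.
SELECT
    -- Original predicate: each OR group may be satisfied by a different column.
    (   (DATE(t.synced) >= '2014-04-01' OR DATE(t.opted_in) >= '2014-04-01')
    AND (DATE(t.synced) <= '2014-05-18' OR DATE(t.opted_in) <= '2014-05-18')
    ) AS original_clause,
    -- Per-column predicate: a single column must satisfy both bounds.
    (   t.synced   BETWEEN '2014-04-01 00:00:00' AND '2014-05-18 23:59:59'
     OR t.opted_in BETWEEN '2014-04-01 00:00:00' AND '2014-05-18 23:59:59'
    ) AS per_column_clause
FROM (
    SELECT CAST('2015-01-10 09:00:00' AS DATETIME) AS synced,
           CAST('2013-06-01 12:00:00' AS DATETIME) AS opted_in
) AS t;
```

The first column comes back as 1 (the row is counted by the original WHERE) and the second as 0 (the row is rejected once each column has to fall inside the range on its own).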

Related

MySQL 8.0.26 Slow Query When Counting Results on Three Tables with Indexes

I have a statistics page on my internal admin site to show some traffic information on individual sites. However, the query is taking nearly 80 seconds to run, even with Indexes placed on the keys for each of the tables.
I typically run this query to look at session status within the 7 days before the date it is run.
SELECT
*,
(
SELECT
COUNT(`session_id`)
FROM
`my-db`.`sessions`
WHERE
`my-db`.`sessions`.`site_id` = `my-db`.`sites`.`site_id`
AND `session_datetime` > '2021-10-17 00:00:00'
) as session_count,
(
SELECT
`session_datetime`
FROM
`my-db`.`sessions`
WHERE
`my-db`.`sessions`.`site_id` = `my-db`.`sites`.`site_id`
AND `session_datetime` > '2021-10-17 00:00:00'
ORDER BY
`session_id` ASC
LIMIT
1
) as first_session,
(
SELECT
`session_datetime`
FROM
`my-db`.`sessions`
WHERE
`my-db`.`sessions`.`site_id` = `my-db`.`sites`.`site_id`
AND `session_datetime` > '2021-10-17 00:00:00'
ORDER BY
`session_id` DESC
LIMIT
1
) as last_session,
(
SELECT
COUNT(`site_profiles_id`)
FROM
`my-db`.`sites_profiles`
WHERE
`my-db`.`sites_profiles`.`site_id` = `my-db`.`sites`.`site_id`
AND `origin` = 1
AND `date_added` > '2021-10-17 00:00:00'
) as profiles_originated,
(
SELECT
COUNT(`site_profiles_id`)
FROM
`my-db`.`sites_profiles`
WHERE
`my-db`.`sites_profiles`.`site_id` = `my-db`.`sites`.`site_id`
AND `scanned` = 1
AND `date_added` > '2021-10-17 00:00:00'
) as profiles_scanned,
(
SELECT
COUNT(`site_profiles_id`)
FROM
`my-db`.`sites_profiles`
WHERE
`my-db`.`sites_profiles`.`site_id` = `my-db`.`sites`.`site_id`
AND `date_added` > '2021-10-17 00:00:00'
) as profiles_collected
FROM
`my-db`.`sites`
WHERE
`site_id` in (
SELECT
DISTINCT(`site_id`)
FROM
`my-db`.`sessions`
WHERE
`session_datetime` > '2021-10-17 00:00:00'
)
ORDER BY
`session_count` DESC
LIMIT
25;
I'm trying to understand the results of EXPLAIN, but I believe the issue is because of the RANGE type of the index used on the datetime.
It's worth noting, I'm dynamically changing the ORDER BY clause depending on a sort dropdown selected by the admin user to sort the results by - site_id ASC/DESC, session_count ASC/DESC and profiles_collected ASC/DESC.
The performance of the profiles_collected DESC is significantly impacted when compared to the others.
network_sites
CREATE TABLE `sites` (
`site_id` bigint NOT NULL AUTO_INCREMENT,
`account_id` bigint NOT NULL,
`site_hash` varchar(128) CHARACTER SET utf8 NOT NULL,
`site_address` varchar(255) CHARACTER SET utf8 NOT NULL,
`site_status` int NOT NULL,
`site_created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`site_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`site_id`),
UNIQUE KEY `site_id_UNIQUE` (`site_id`),
UNIQUE KEY `site_hash_UNIQUE` (`site_hash`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
network_profiles_sessions
CREATE TABLE `sessions` (
`session_id` bigint NOT NULL AUTO_INCREMENT,
`site_id` bigint NOT NULL,
`profile_id` bigint DEFAULT NULL,
`session_hash` varchar(128) CHARACTER SET utf8 NOT NULL,
`session_ip_address` varchar(45) CHARACTER SET utf8 DEFAULT NULL,
`session_useragent` text CHARACTER SET utf8,
`session_page_uri` text CHARACTER SET utf8,
`session_datetime` datetime DEFAULT CURRENT_TIMESTAMP,
`session_has_data` tinyint DEFAULT '0',
`session_processed` tinyint DEFAULT '0',
`session_queued` tinyint DEFAULT '0',
PRIMARY KEY (`session_id`),
UNIQUE KEY `session_id_UNIQUE` (`session_id`),
KEY `session_has_data` (`session_has_data`,`session_id`),
KEY `session_processed` (`session_processed`,`session_id`),
KEY `session_queued` (`session_queued`,`session_id`),
KEY `session_datetime` (`session_datetime`,`session_id`),
KEY `session_hash` (`session_hash`,`session_id`),
KEY `site_id` (`site_id`,`session_id`),
FULLTEXT KEY `session_page_uri` (`session_page_uri`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
network_sites_profiles
CREATE TABLE `sites_profiles` (
`site_profiles_id` bigint NOT NULL AUTO_INCREMENT,
`site_id` bigint NOT NULL,
`profile_id` bigint NOT NULL,
`origin` int DEFAULT NULL,
`scanned` int DEFAULT NULL,
`date_added` datetime DEFAULT CURRENT_TIMESTAMP,
`date_lastseen` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`site_profiles_id`),
UNIQUE KEY `site_users_id_UNIQUE` (`site_profiles_id`),
KEY `site_id` (`site_id`,`site_profiles_id`),
KEY `date_added` (`date_added` DESC,`site_profiles_id`),
KEY `origin` (`origin`,`site_profiles_id`),
KEY `scanned` (`scanned`,`site_profiles_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PRIMARY KEY(a)
UNIQUE KEY(a) -- redundant, DROP it
A PRIMARY KEY is already a UNIQUE key, which in turn is an INDEX, so the extra UNIQUE KEY on the same column adds nothing.
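For the three tables in the question, dropping the duplicates might look like the following. The index names are the ones shown in the posted CREATE TABLE statements; it is worth checking SHOW CREATE TABLE first in case the live schema differs.

```sql
-- Each of these duplicates the table's PRIMARY KEY column.
ALTER TABLE `sites`          DROP INDEX `site_id_UNIQUE`;
ALTER TABLE `sessions`       DROP INDEX `session_id_UNIQUE`;
ALTER TABLE `sites_profiles` DROP INDEX `site_users_id_UNIQUE`;
```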
The last 3 subqueries can be combined:
SELECT site_id,
SUM(origin = 1) AS profiles_originated,
SUM(scanned = 1) AS profiles_scanned,
COUNT(*) AS profiles_collected
FROM sites_profiles
WHERE date_added >= '2021-10-17'
GROUP BY site_id
And then JOIN to that. However, there are some potential problems...
How do sessions.session_datetime and sites_profiles.date_added compare? I'm assuming that a session is added before it happens?
I assume you want to include midnight of the morning of Oct 17?
The first 3 subqueries can perhaps be similarly simplified. Note that MAX(session_datetime) is sufficient for last_session.
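Putting those suggestions together, a rough sketch of the whole report as two grouped derived tables joined to sites might look like this. It rests on a few assumptions that are not confirmed by the question: that session_id order matches session_datetime order (so MIN/MAX can replace the ORDER BY session_id ... LIMIT 1 subqueries), that midnight of Oct 17 should be included, and that the tables live in the default schema rather than being qualified with `my-db`.

```sql
SELECT s.*,
       ss.session_count,
       ss.first_session,
       ss.last_session,
       -- Sites with sessions but no profiles in the window get 0 instead of NULL.
       COALESCE(sp.profiles_originated, 0) AS profiles_originated,
       COALESCE(sp.profiles_scanned, 0)    AS profiles_scanned,
       COALESCE(sp.profiles_collected, 0)  AS profiles_collected
FROM `sites` AS s
-- Inner join: keeps only sites with at least one session in the window,
-- which is what the original IN (SELECT DISTINCT site_id ...) did.
JOIN (
    SELECT site_id,
           COUNT(*)              AS session_count,
           MIN(session_datetime) AS first_session,
           MAX(session_datetime) AS last_session
    FROM `sessions`
    WHERE session_datetime >= '2021-10-17'
    GROUP BY site_id
) AS ss ON ss.site_id = s.site_id
LEFT JOIN (
    SELECT site_id,
           SUM(origin = 1)  AS profiles_originated,
           SUM(scanned = 1) AS profiles_scanned,
           COUNT(*)         AS profiles_collected
    FROM `sites_profiles`
    WHERE date_added >= '2021-10-17'
    GROUP BY site_id
) AS sp ON sp.site_id = s.site_id
ORDER BY ss.session_count DESC
LIMIT 25;
```

Each table is scanned once for the date window instead of once per site per subquery. Whether it is actually faster depends on the data, so comparing EXPLAIN output before and after is the safest way to judge.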

SQL - Find Available Slots Within Date Range

I have the following database schema:
CREATE TABLE `property` (
`id` INT(11) PRIMARY KEY NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL
);
CREATE TABLE `venue` (
`id` INT(11) PRIMARY KEY NOT NULL AUTO_INCREMENT,
`property_id` INT(11) NOT NULL,
`name` VARCHAR(100) NOT NULL
);
CREATE TABLE `venue_available` (
`id` INT(11) PRIMARY KEY NOT NULL AUTO_INCREMENT,
`venue_id` INT(100) NOT NULL,
`day` VARCHAR(10) NOT NULL,
`from_time` TIME NOT NULL,
`to_time` TIME NOT NULL,
`lead_time_in_minutes` INT(11)
);
CREATE TABLE `venue_unavailable` (
`id` INT(11) PRIMARY KEY NOT NULL AUTO_INCREMENT,
`venue_id` INT(100) NOT NULL,
`from_datetime` DATETIME NOT NULL,
`to_datetime` DATETIME NOT NULL
);
CREATE TABLE `venue_reservation` (
`id` INT(11) PRIMARY KEY NOT NULL AUTO_INCREMENT,
`venue_id` INT(100) NOT NULL,
`start_datetime` DATETIME NOT NULL,
`end_datetime` DATETIME NOT NULL
);
I want to find properties having venues available from 25th Aug (Sat) to 27th August (Mon), from 10am to 3pm.
Here is the SQL query I tried
SELECT
p.id,
p.name AS property_name,
v.name AS venue_name
FROM
venue v
LEFT JOIN
property p ON v.property_id = p.id
-- venue_available
LEFT JOIN
venue_available va_0 ON va_0.venue_id = v.id
LEFT JOIN
venue_available va_1 ON va_1.venue_id = v.id
WHERE 1 = 1
-- venue_available
AND (
(va_0.day = 'sat' AND va_0.from_time <= '2018-08-25 10:00:00' AND va_0.to_time >= '2018-08-25 15:00:00') AND
(va_1.day = 'sun' AND va_1.from_time <= '2018-08-26 10:00:00' AND va_1.to_time >= '2018-08-26 15:00:00')
)
-- venue_unavailable
AND NOT EXISTS (SELECT * FROM venue_unavailable vu WHERE '2018-08-25 10:00:00' <= vu.to_datetime AND '2018-08-26 15:00:00' >= vu.from_datetime)
GROUP BY
p.id;
The problem with the current query is this: the condition for venue_available seems to work correctly, but when I add the condition for venue_unavailable it returns an empty result, although based on the data I am expecting 1 result.
Here is the link to SQL fiddle, if you want to play around with schema and fixtures
http://sqlfiddle.com/#!9/33d60f/10
Here is what I am trying to do
1. Get the list of all properties (not venues)
2. List the property only if one or more venues are available after checking against
venue_available
venue_unavailable
venue_reservation
Can you help me with how to go about this?
Thank you.
UPDATE1
I followed this post to determine overlapping dates in venue_unavailable: "Select rows that are not between dates (reservation)".
Alright, so the way I solved it is by using a subquery, which is working now.
I am now using the WHERE clause with something like this
WHERE v.id NOT IN (SELECT venue_id FROM provider_block pb WHERE :start_datetime <= pb.to_date AND :end_datetime >= pb.from_date)
This seems to do the job for now.
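For comparison, the NOT EXISTS in the question never references v.id inside its subquery, so a single overlapping venue_unavailable row for any venue excludes every venue. A correlated version, written against the schema posted in the question, might look like the sketch below. It deliberately leaves out the venue_available weekday check and treats the whole span from Saturday 10:00 to Sunday 15:00 as one blocked interval, as the original query did; both are simplifications.

```sql
SELECT DISTINCT p.id, p.name AS property_name
FROM property p
JOIN venue v ON v.property_id = p.id
WHERE NOT EXISTS (
        -- Correlated on v.id: only this venue's blocks can exclude it.
        SELECT 1
        FROM venue_unavailable vu
        WHERE vu.venue_id = v.id
          AND vu.from_datetime <= '2018-08-26 15:00:00'
          AND vu.to_datetime   >= '2018-08-25 10:00:00'
      )
  AND NOT EXISTS (
        -- Same overlap test against existing reservations.
        SELECT 1
        FROM venue_reservation vr
        WHERE vr.venue_id = v.id
          AND vr.start_datetime <= '2018-08-26 15:00:00'
          AND vr.end_datetime   >= '2018-08-25 10:00:00'
      );
```

Unlike NOT IN, NOT EXISTS also behaves predictably if the subquery column can ever be NULL.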

Mysql Queries in big data table

I have a problem with my MySQL database table. It has more than 20 million rows; the table structure is shown below. The main problem is that queries take a really long time to execute (some take more than 20 seconds). I use indexes where I can, but many queries use a date range, and with a date range my indexes don't work. The queries also use almost every column. What do I need to change in my table to improve efficiency?
CREATE TABLE `history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`barcode` varchar(100) DEFAULT NULL,
`bag` varchar(100) DEFAULT NULL,
`action` int(10) unsigned DEFAULT NULL,
`place` int(10) unsigned DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
`old_price` decimal(10,2) DEFAULT NULL,
`user` int(11) DEFAULT NULL,
`amount` int(10) DEFAULT NULL,
`rotation` int(10) unsigned DEFAULT NULL,
`discount` decimal(10,2) DEFAULT NULL,
`discount_type` tinyint(2) unsigned DEFAULT NULL,
`original` int(10) unsigned DEFAULT NULL,
`was_in_shop` int(10) unsigned DEFAULT NULL,
`cate` int(10) unsigned DEFAULT NULL COMMENT 'grupe',
`sub_cate` int(10) unsigned DEFAULT NULL,
`comment` varchar(255) DEFAULT NULL,
`helper` varchar(255) DEFAULT NULL,
`ywd` varchar(255) DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `barcode` (`barcode`) USING BTREE,
KEY `action` (`action`) USING BTREE,
KEY `original` (`original`) USING BTREE,
KEY `created_at` (`created_at`) USING BTREE,
KEY `bag` (`bag`) USING BTREE
) ENGINE=InnoDB;
Some of my queries:
select SUM(amount) as amount,
SUM(comment) as price,
cate
from `history`
where ( `action` = '4'
and `place` = '28'
and `created_at` >= '2018-04-01 00:00:00'
and `created_at` <= '2018-04-30 23:59:59'
)
and `history`.`deleted_at` is null
group by `cate`;
select cate,
SUM(amount) AS kiekis,
SUM(IF(discount>0,(price*amount)-discount,(price*amount))) AS suma,
SUM(IF(discount>0,IF(discount_type=1,(discount*price)/100,discount),0)) AS nuolaida
from `history`
where ( `history`.`action` = '4'
and `history`.`created_at` >= '2018-01-01 00:00:00'
and `history`.`created_at` <= '2018-01-23 23:59:59'
)
and LENGTH(barcode) > 7
and `history`.`deleted_at` is null
group by `cate`;
Your first query is better written as:
select SUM(h.amount) as amount,
SUM(h.comment) as price,
h.cate
from history h
where h.action = 4 and
h.place = 28 and
h.created_at >= '2018-04-01' and
h.created_at < '2018-05-01' and
h.deleted_at is null
group by h.cate;
Why?
place and action are numbers. The comparison should be to a number. Mixing types can prevent the use of indexes.
The time component is unnecessary: a half-open range (>= the first day, < the first day of the next month) covers the whole month without a 23:59:59 bound.
Qualifying all column names is just a good idea.
Then, for this query, a reasonable index is history(action, place, created_at, deleted_at).
So, I would start with multi-column indexes.
If you continue to have performance issues, you should then consider partitioning the data based on the created_at date.
INDEX(a), INDEX(b) serves some purposes, but the "composite" INDEX(a,b) better serves some queries.
where ( `action` = '4'
and `place` = '28'
and `created_at` >= '2018-04-01 00:00:00'
and `created_at` <= '2018-04-30 23:59:59'
)
and `history`.`deleted_at` is null
Needs
INDEX(action, place, -- first, but in either order
deleted_at,
created_at) -- last
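As DDL, that index could be added roughly like this. The index name is made up; the column order follows the rule above, with the equality-tested columns first and the range column last. The EXPLAIN afterwards is only a way to check whether the optimizer picks it up.

```sql
-- Hypothetical index name; columns ordered as suggested above.
ALTER TABLE `history`
    ADD INDEX `idx_action_place_deleted_created`
        (`action`, `place`, `deleted_at`, `created_at`);

-- Check the `key` column of the output to see which index is chosen.
EXPLAIN
SELECT SUM(amount) AS amount, cate
FROM `history`
WHERE `action` = 4
  AND `place` = 28
  AND `deleted_at` IS NULL
  AND `created_at` >= '2018-04-01 00:00:00'
  AND `created_at` <= '2018-04-30 23:59:59'
GROUP BY `cate`;
```

Adding an index to a table with 20 million rows takes time and disk space, so it is worth trying on a copy first.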
I prefer to write the date range thus:
and `history`.`created_at` >= '2018-04-01'
and `history`.`created_at` < '2018-04-01' + INTERVAL 1 MONTH
It's a lot easier than dealing with leap year, end of year, etc. And it works 'correctly' for DATE, DATETIME, DATETIME(6), TIMESTAMP, and TIMESTAMP(6).
For this
where ( `history`.`action` = '4'
and `history`.`created_at` >= '2018-01-01 00:00:00'
and `history`.`created_at` <= '2018-01-23 23:59:59'
)
and LENGTH(barcode) > 7
and `history`.`deleted_at` is null
I would try this as the most likely:
INDEX(action, deleted_at, created_at) -- in this order
Do not have separate tables for separate years. If you will be deleting old data, then consider PARTITION BY RANGE(TO_DAYS(...)) in order to get the speed of DROP PARTITION. (But that is another discussion.)
If I were in your situation, I would consider paged table names. By this I mean having multiple history_X tables, where X is an integer related to the content.
Since this is a history table is it possible to include part of the date in the name?
You said that you use ranges to search for the data, so if you were to use year in the table name you could have
history_2014
history_2015
history_2016
history_2017
history_2018
etc.
Then you could search with the table that applies to your date range.
If you need data from a range that spans two tables, then you could use a UNION query to bridge the 2 result sets into one.
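A sketch of such a bridging query is below. It assumes history_2017 and history_2018 exist with the same columns as history, which is purely hypothetical here, and the date range is invented for the example.

```sql
-- Sum amounts per category for 2017-12-01 .. 2018-01-31 across two yearly tables.
SELECT h.cate, SUM(h.amount) AS amount
FROM (
    SELECT cate, amount
    FROM history_2017
    WHERE action = 4
      AND deleted_at IS NULL
      AND created_at >= '2017-12-01'
    UNION ALL
    SELECT cate, amount
    FROM history_2018
    WHERE action = 4
      AND deleted_at IS NULL
      AND created_at < '2018-02-01'
) AS h
GROUP BY h.cate;
```

UNION ALL is used rather than UNION because plain UNION removes duplicate rows, which would silently drop identical (cate, amount) pairs and distort the sums.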

Constraint to only allow users one record in each hour

I would like to create a time constraint on the start and end datetimes so that only one appointment per hour can be entered on a calendar of this form:
CREATE TABLE `events` (
`id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`color` varchar(7) DEFAULT NULL,
`start` datetime NOT NULL,
`end` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I tried this but it does not work:
alter table calendar
add constraint chk_HH_mm
check( start( HH:mm ) >= end( HH:mm) && end( HH:mm) <= start( HH:mm ));
Any ideas?

Hotel Booking application rooms availability

I'm working on a hotel booking app, where you can select a date and a room type and the application shows the available rooms. I'm using the Java Spring framework to do this.
These are the tables that I think matter to this query:
CREATE TABLE IF NOT EXISTS `booking` (
`id` bigint(20) NOT NULL,
`aproved` bit(1) NOT NULL,
`begin_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`room_id` bigint(20) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=17 DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `room` (
`id` bigint(20) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`room_type_id` bigint(20) DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=161 DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `room_type` (
`id` bigint(20) NOT NULL,
`number_of_rooms` int(11) NOT NULL,
`price` int(11) NOT NULL,
`type` varchar(255) DEFAULT NULL,
`hotel_id` bigint(20) DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=latin1;
I'm having difficulties writing that query...
I made this one, but it's not a good idea to join the rooms with bookings, because that will only select rooms that have bookings...
SELECT *
FROM room r join booking b on b.room_id = r.id join room_type rt on r.room_type_id = rt.id
WHERE not ((b.begin_date >= :initDate And b.begin_date <= :endDate) or (b.begin_date >= :initDate And b.end_date <= :endDate) or (b.begin_date <= :initDate and b.end_date >= :endDate) and b.aproved = true and rt.id = :roomType)
Any ideas?
select * from room r
where r.room_type_id = :desiredRoomType
and not exists (
select * from booking b
where b.room_id = r.id
and b.begin_date <= :desiredDate
and b.end_date >= :desiredDate
)
I am not sure why begin_date/end_date might be NULL in your case; if they really can be, the query should reflect that.
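If the search is for a date range rather than a single day, the same idea extends to an overlap test. The sketch below reuses the :initDate, :endDate and :roomType parameter names from the question and assumes that only approved bookings should block a room, which the question's own attempt suggests but does not state outright.

```sql
SELECT r.*
FROM room r
WHERE r.room_type_id = :roomType
  AND NOT EXISTS (
        SELECT 1
        FROM booking b
        WHERE b.room_id = r.id
          AND b.aproved = true          -- column name spelled as in the schema
          -- Two intervals overlap when each starts before the other ends.
          AND b.begin_date <= :endDate
          AND b.end_date   >= :initDate
      );
```

Rooms with no bookings at all are returned, since NOT EXISTS is true for them. Bookings with a NULL begin_date or end_date never satisfy the overlap test, so they do not block a room; whether that is the desired behaviour depends on what a NULL date means in this application.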