MySQL - searching a self join and ranges of data

My local community center has tasked me with building a 'newlywed'-type game in time for Valentine's Day, so no rush!
We've got 50-odd couples who know each other quite well, and each person will be asked 100 questions ahead of time. For each question, a user gives their own answer plus a range that allows a margin of error (the total range allowance will be limited). They then select what they think their partner's answer will be, with the same kind of range for margin of error.
EG (I'll play a round as me and my GF):
Question: Do you like fruit?
I am quite fussy about fruit, so I'll put a low score out of 100, say 20. But what I do like, I LOVE, and my GF might think I'd put a higher answer, so I'll allow a margin of error of 30.
I think she loves fruit and will put at least 90, but she enjoys a lot of foods so may just rank it lower, so I'll give her a margin of 20.
Ok, repeat that process for 100 questions and 50 couples.
I'm left with a table like this:
u_a = User answer
u_l = user margin of error level
p_a = partner answer
p_l = partner margin of error level
CREATE TABLE IF NOT EXISTS `large` (
`id_user` int(11) NOT NULL,
`id_q` int(11) NOT NULL,
`u_a` int(11) NOT NULL,
`u_l` int(11) NOT NULL,
`p_a` int(11) NOT NULL,
`p_l` int(11) NOT NULL,
KEY `id_user` (`id_user`,`id_q`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Stackoverflow Test';
So my row will be in the previous example:
(1, 1, 20, 30, 90, 20)
My mission is to search ALL users to see who the best matches are out of the 50 (and hope that the couples are good together!).
For every other user, I want to find the questions where my guess for my partner matches that user's own answer.
Here's what I've got so far (note that I've commented out some code because I'm trying two ways and am not sure which is best):
SELECT
match.id_user,
count(*) as count
from `large` `match`
INNER JOIN `large` `me` ON me.id_q = match.id_q
WHERE
me.id_user = 1 AND
match.id_user != 1 AND
GREATEST(abs(me.p_a - match.u_a), 0) <= me.p_l
AND
GREATEST(abs(match.p_a - me.u_a), 0) <= match.p_l
#match.u_a BETWEEN GREATEST(me.p_a - me.p_l, 0) AND (me.p_a + me.p_l)
#AND
#me.u_a BETWEEN GREATEST(match.p_a - match.p_l, 0) AND (match.p_a + match.p_l)
GROUP BY match.id_user
ORDER BY count DESC
My question today is :
This query takes AGES! I'd like to do it during the game and allow users a chance to change answers on the night and get instant results, so this has to be quick. I'm looking at 40 seconds when looking up all matches for me (user 1).
I'm reading about DB engines and indexing now to make sure I'm doing all that I can... but suggestions are welcome!
Cheers and PHEW!

Your query shouldn't take 40 seconds on a smallish data set. The best way to see what is going on is to put EXPLAIN in front of the query.
However, I suspect the problem is the condition on me. The MySQL engine might be building all possible combinations for all users and only then filtering down to your rows. You can test this by changing this code:
from `large` `match` INNER JOIN
`large` `me`
ON me.id_q = match.id_q
WHERE me.id_user = 1 AND
match.id_user != 1 AND . . . .
To:
from `large` `match` INNER JOIN
(select me.*
from `large` `me`
where me.id_user = 1
) me
ON me.id_q = match.id_q
WHERE match.id_user != 1 AND . . . .
In addition, the following indexes might help the query: large(id_user, id_q) and large(id_q).
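As a rough illustration of the rewrite and the indexes above, here is a minimal sketch that runs the derived-table version of the query on a few invented rows. It uses Python's sqlite3 module with SQLite standing in for MySQL, so it shows the query shape only, not MySQL's timings. The index names, the alias m (used instead of match) and every row except the question's example row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE large (
    id_user INTEGER NOT NULL,
    id_q    INTEGER NOT NULL,
    u_a INTEGER NOT NULL, u_l INTEGER NOT NULL,
    p_a INTEGER NOT NULL, p_l INTEGER NOT NULL
);
-- The two indexes suggested in the answer (index names are made up here).
CREATE INDEX idx_user_q ON large (id_user, id_q);
CREATE INDEX idx_q ON large (id_q);
""")

rows = [
    # id_user, id_q, u_a, u_l, p_a, p_l
    (1, 1, 20, 30, 90, 20),   # the example row from the question
    (2, 1, 85, 10, 30, 15),   # mutual match with user 1
    (3, 1, 50, 10, 50, 10),   # no match with user 1
    (1, 2, 70, 10, 40, 10),
    (2, 2, 45, 10, 65, 10),   # mutual match with user 1
    (3, 2, 35, 10, 10, 10),   # user 1 guesses user 3 correctly, but not vice versa
]
conn.executemany("INSERT INTO large VALUES (?, ?, ?, ?, ?, ?)", rows)

# Filter "me" down to one user first, then join the other users per question.
query = """
SELECT m.id_user, COUNT(*) AS matches
FROM (SELECT * FROM large WHERE id_user = ?) AS me
JOIN large AS m ON m.id_q = me.id_q
WHERE m.id_user != me.id_user
  AND ABS(me.p_a - m.u_a) <= me.p_l   -- my guess about them is within my margin
  AND ABS(m.p_a - me.u_a) <= m.p_l    -- their guess about me is within their margin
GROUP BY m.id_user
ORDER BY matches DESC
"""
result = conn.execute(query, (1,)).fetchall()
print(result)
```

On MySQL itself, running EXPLAIN on both forms of the query is still the way to confirm which one the optimizer handles better.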

Related

MySQL select HOTTEST (most upvoted in shortest time)

For a long time I have been trying to figure out how to build a "Hottest Posts" list.
What I want to achieve: ORDER BY MOST UPVOTED IN THE SHORTEST TIME
For example, I have 4 posts:
ID  UPVOTES(Total)  UPVOTES(Weekly)  DATE
1   50              50               01.09.2017
2   421             6                25.07.2017
3   71              50               13.08.2017
4   111             37               15.08.2017
And it would need to order them like 1 -> 3 -> 4 -> 2
My goal is to get UPVOTES(Weekly), but I don't know how to calculate
it. I just made it up here to better explain what I want to achieve.
I have 2 database tables, fun_posts and fun_post_upvotes.
I was trying to achieve it like this, but it didn't work; it just ordered by id or by upvotes:
$stmt = $this->conn->prepare("SELECT * , (SELECT COUNT(*) FROM
fun_post_upvotes WHERE image_id=fun_posts.id GROUP BY image_id) FROM fun_posts
ORDER BY fun_posts.id DESC, fun_posts.upvotes DESC");
This gets you part of the way:
SELECT fun_post_upvotes.image_id, COUNT(image_id) as 'Upvotes' FROM fun_post_upvotes GROUP BY fun_post_upvotes.image_id ORDER BY Upvotes DESC;
Just try it and add the date part. ;-) You can ask again once you've tried something. :P
If I understand the problem correctly you should quite easily be able to apply the following to your problem:
MYSQL Order By Sum of Columns
Order By Sum of Two Fields
CREATE TABLE `tb1` (
`id` int(11) NOT NULL,
`Name` varchar(50) NOT NULL,
`up` int(11) NOT NULL,
`Down` int(11) NOT NULL,
`Date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- heat = upvotes per day since posting; GREATEST(..., 1) avoids dividing by zero for posts made today
SELECT `Name`, (`tb1`.`up` / GREATEST(DATEDIFF(NOW(), `tb1`.`Date`), 1)) as `heat`
FROM `tb1`
ORDER BY `heat` DESC
This should illustrate my point.
I think adding to your query:
ORDER BY UPVOTES(Weekly) DESC
would work for your issue.
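For that ORDER BY to work, the weekly count has to be computed first. A minimal sketch of one way to do it follows, using Python's sqlite3 module with SQLite standing in for MySQL. It assumes fun_post_upvotes stores one row per upvote with a timestamp column, here called created_at; the question does not show such a column, so that name, the title column and all the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fun_posts (id INTEGER PRIMARY KEY, title TEXT);
-- created_at is a hypothetical column; the question does not show one.
CREATE TABLE fun_post_upvotes (image_id INTEGER, created_at TEXT);
""")
conn.executemany("INSERT INTO fun_posts VALUES (?, ?)",
                 [(1, "new"), (2, "old"), (3, "quiet")])
conn.executemany("INSERT INTO fun_post_upvotes VALUES (?, ?)", [
    (1, "2017-09-01"), (1, "2017-08-30"), (1, "2017-08-29"),
    (2, "2017-07-25"), (2, "2017-07-26"), (2, "2017-07-27"), (2, "2017-08-31"),
])

now = "2017-09-02"  # fixed "current date" so the result is reproducible

# Count all upvotes and, separately, only those from the last 7 days.
query = """
SELECT p.id,
       COUNT(u.image_id) AS total_upvotes,
       SUM(CASE WHEN u.created_at >= date(?, '-7 days') THEN 1 ELSE 0 END)
           AS weekly_upvotes
FROM fun_posts AS p
LEFT JOIN fun_post_upvotes AS u ON u.image_id = p.id
GROUP BY p.id
ORDER BY weekly_upvotes DESC, total_upvotes DESC
"""
hottest = conn.execute(query, (now,)).fetchall()
print(hottest)
```

In MySQL the date condition would be written along the lines of u.created_at >= NOW() - INTERVAL 7 DAY. The LEFT JOIN keeps posts that have no upvotes at all.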

Mysql Select Statement isn't working

So maybe it's due to lack of sleep, but I am having a major brain malfunction and can't remember what is going wrong here. Here is my statement:
SELECT DISTINCT `county`, COUNT(*)
FROM `ips`
WHERE `county` != 'NULL' AND `county` != '' AND
EXISTS (SELECT * FROM `pages`
WHERE (`timestamp` BETWEEN FROM_UNIXTIME(?) AND FROM_UNIXTIME(?)))
GROUP BY `county`
I'm expecting the results to be something like:
County | Number
Some county | 42
Other county | 27
My pages table has a timestamp for each time a page is viewed by a user, so if they viewed a page between the two dates, the county from the ips table should be selected and the total for that county returned as the number. I'm using PDO and I'm passing in two times that I've run through strtotime().
I'm currently stuck. All help is appreciated. Hopefully it's not some stupid little mistake that I've overlooked.
'NULL' in quotes is a string literal, not SQL NULL, and you can't compare NULL with != anyway; you need to use IS NOT NULL.
SELECT `county`, COUNT(*)
FROM `ips`
WHERE `county` IS NOT NULL AND `county` != '' AND
EXISTS (SELECT 1 FROM `pages`
WHERE (`timestamp` BETWEEN FROM_UNIXTIME(?) AND FROM_UNIXTIME(?)))
GROUP BY `county`
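The difference between the three filters is easy to check on a throwaway table. The sketch below uses Python's sqlite3 module with SQLite standing in for MySQL (both follow the same three-valued NULL logic for these comparisons); the ip column, the sample rows and the counties helper are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ips (ip TEXT, county TEXT)")
conn.executemany("INSERT INTO ips VALUES (?, ?)", [
    ("10.0.0.1", "Some county"),
    ("10.0.0.2", "Some county"),
    ("10.0.0.3", None),      # SQL NULL
    ("10.0.0.4", ""),        # empty string
    ("10.0.0.5", "NULL"),    # the four-character string 'NULL'
])

def counties(where):
    """Group the rows that pass the given WHERE clause by county."""
    sql = (f"SELECT county, COUNT(*) FROM ips WHERE {where} "
           "GROUP BY county ORDER BY county")
    return conn.execute(sql).fetchall()

# Comparing with NULL yields NULL (unknown) for every row, so nothing passes.
vs_null = counties("county != NULL")
# Comparing with the string 'NULL' drops rows holding that literal text.
vs_string = counties("county != 'NULL' AND county != ''")
# IS NOT NULL is the explicit test for missing values.
is_not_null = counties("county IS NOT NULL AND county != ''")

print(vs_null, vs_string, is_not_null)
```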

MySQL: How to construct a given query

I am not a MySQL guru at all, and I would really appreciate if someone takes some time to help me. I have three tables as shown below:
TEAM(teamID, teamName, userID)
YOUTH_TEAM(youthTeamID, youthTeamName, teamID)
YOUTH_PLAYER(youthPlayerID, youthPlayerFirstName, youthPlayerLastName, youthPlayerAge, youthPlayerDays, youthPlayerRating, youthPlayerPosition, youthTeamID)
And this is the query that I have now:
SELECT team.teamName, youth_team.youthTeamName, youth_player.*
FROM youth_player
INNER JOIN youth_team ON youth_player.youthTeamID = youth_team.youthTeamID
INNER JOIN team ON youth_team.teamID = team.teamID
WHERE youth_player.youthPlayerAge < 18
AND youth_player.youthPlayerDays < 21
AND youth_player.youthPlayerRating >= 5.5
What I would like to add to this query is a more thorough checks like the following:
if a player is 16 years old and his position is scorer, then the player should have a rating of at least 7 in order to be returned
if a player is 15 years old and his position is playmaker, then the player should have a rating of at least 5.5 in order to be returned
etc., etc.
How can I implement these requirements in my query (if possible), and would such a query be a bad solution? Would it be better to do the selection in PHP code (supposing I use PHP) instead of in the query?
Here is a possible solution with an additional "criteria/filter" table:
-- SAMPLE TEAMS: Yankees, Knicks:
INSERT INTO `team` VALUES (1,'Yankees',2),(2,'Knicks',1);
-- SAMPLE YOUTH TEAMS: Yankees Juniors, Knicks Juniors
INSERT INTO `youth_team` VALUES (1,'Knicks Juniors',1),(2,'Yankees Juniors',2);
-- SAMPLE PLAYERS
INSERT INTO `youth_player` VALUES
(1,'Carmelo','Anthony',16,20,7.5,'scorer',1),
(2,'Amar\'e','Stoudemire',17,45,5.5,'playmaker',1),
(3,'Iman','Shumpert',15,15,6.1,'playmaker',1),
(4,'Alex','Rodriguez',18,60,3.5,'playmaker',2),
(5,'Hiroki','Kuroda',16,17,8.7,'scorer',2),
(6,'Ichiro','Suzuki',19,73,8.3,'playmaker',2);
-- CRITERIA TABLE
CREATE TABLE `criterias` (
`id` int(11) NOT NULL,
`age` int(11) DEFAULT NULL,
`position` varchar(45) DEFAULT NULL,
`min_rating` double DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- SAMPLE CRITERIAS
-- AGE=16, POSITION=SCORER, MIN_RATING=7
-- AGE=15, POSITION=PLAYMAKER, MIN_RATING=5.5
INSERT INTO `criterias` VALUES (1,16,'scorer',7), (2,15,'playmaker',5.5);
Now your query could look like:
SELECT team.teamName, youth_team.youthTeamName, youth_player.*
FROM youth_player
CROSS JOIN criterias
INNER JOIN youth_team ON youth_player.youthTeamID = youth_team.youthTeamID
INNER JOIN team ON youth_team.teamID = team.teamID
WHERE
(
youth_player.youthPlayerAge < 18
AND youth_player.youthPlayerDays < 21
AND youth_player.youthPlayerRating >= 5.5
)
AND
(
youth_player.youthPlayerAge = criterias.age
AND youth_player.youthPlayerPosition = criterias.position
AND youth_player.youthPlayerRating >= criterias.min_rating
)
This yields (shortened results):
teamName  youthTeamName      youthPlayerName  Age  Days  Rating  Position
=============================================================================
Yankees   "Knicks Juniors"   Carmelo Anthony  16   20    7.5     scorer
Yankees   "Knicks Juniors"   Iman Shumpert    15   15    6.1     playmaker
Knicks    "Yankees Juniors"  Hiroki Kuroda    16   17    8.7     scorer
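The point of the criteria table is that the rules become data. A minimal sketch of that idea follows, using Python's sqlite3 module with SQLite standing in for MySQL and the sample players from above. The team tables are left out to keep it short, and the UPDATE at the end is an invented example of tightening a rule without touching the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE youth_player (
    youthPlayerID INTEGER PRIMARY KEY, youthPlayerFirstName TEXT,
    youthPlayerLastName TEXT, youthPlayerAge INTEGER, youthPlayerDays INTEGER,
    youthPlayerRating REAL, youthPlayerPosition TEXT, youthTeamID INTEGER
);
CREATE TABLE criterias (
    id INTEGER PRIMARY KEY, age INTEGER, position TEXT, min_rating REAL
);
INSERT INTO youth_player VALUES
    (1,'Carmelo','Anthony',16,20,7.5,'scorer',1),
    (2,'Amar''e','Stoudemire',17,45,5.5,'playmaker',1),
    (3,'Iman','Shumpert',15,15,6.1,'playmaker',1),
    (4,'Alex','Rodriguez',18,60,3.5,'playmaker',2),
    (5,'Hiroki','Kuroda',16,17,8.7,'scorer',2),
    (6,'Ichiro','Suzuki',19,73,8.3,'playmaker',2);
INSERT INTO criterias VALUES (1,16,'scorer',7), (2,15,'playmaker',5.5);
""")

# Same conditions as the CROSS JOIN query, written as an explicit join.
query = """
SELECT p.youthPlayerLastName
FROM youth_player AS p
JOIN criterias AS c
  ON p.youthPlayerAge = c.age
 AND p.youthPlayerPosition = c.position
 AND p.youthPlayerRating >= c.min_rating
WHERE p.youthPlayerAge < 18
  AND p.youthPlayerDays < 21
  AND p.youthPlayerRating >= 5.5
ORDER BY p.youthPlayerID
"""
before = [row[0] for row in conn.execute(query)]

# Tightening a rule is a data change, not a query change (hypothetical rule).
conn.execute("UPDATE criterias SET min_rating = 8 WHERE position = 'scorer'")
after = [row[0] for row in conn.execute(query)]
print(before, after)
```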
Doing it in the query is quite fine, as long as it doesn't get too messy. You can do a lot in your query, but it may become hard to maintain. So if it gets too long and you want somebody else to be able to read it, you should split it up or move part of the logic into your PHP script.
As for your requirements, add this to your WHERE part:
AND
(
(youth_player.youthPlayerAge = 16 AND youth_player.youthPlayerPosition = 'scorer' AND youth_player.youthPlayerRating >= 7)
OR (youth_player.youthPlayerAge = 15 AND youth_player.youthPlayerPosition = 'playmaker' AND youth_player.youthPlayerRating >= 5.5)
)

SQL SUM issues with joins

I've got a fairly complex query (at least for me).
I want to create a list of users that are ready to be paid. Two conditions need to be met: the order status should be 3 and the total should be more than 50. Currently I have this query (generated with CodeIgniter Active Record):
SELECT `services_payments`.`consultant_id`
, `consultant_userdata`.`iban`
, `consultant_userdata`.`kvk`, `consultant_userdata`.`bic`
, `consultant_userdata`.`bankname`
, SUM(`services_payments`.`amount`) AS amount
FROM (`services_payments`)
JOIN `consultant_userdata`
ON `consultant_userdata`.`user_id` = `services_payments`.`consultant_id`
JOIN `services`
ON `services`.`id` = `services_payments`.`service_id`
WHERE `services`.`status` = 3
AND `services_payments`.`paid` = 0
HAVING `amount` > 50
The services_payments table contains the commissions, consultant_userdata contains the userdata and services keeps the order data. The current query only gives me 1 result while I'm expecting 4 results.
Could someone please tell me what I'm doing wrong and what would be the solution?
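The query has no GROUP BY, so SUM() collapses all matching rows into a single row; grouping by the consultant columns gives one row per consultant. For ActiveRecord, rsanchez's answer would look more like: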
$this->db->group_by('services_payments.consultant_id, consultant_userdata.iban, consultant_userdata.kvk, consultant_userdata.bic, consultant_userdata.bankname');
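To see the effect of the missing GROUP BY, here is a minimal sketch using Python's sqlite3 module with SQLite standing in for MySQL. The tables are cut down to the columns the conditions need, the consultant_userdata join is left out, and all rows are invented; the alias total is used instead of amount so that it cannot be confused with the column of the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE services (id INTEGER PRIMARY KEY, status INTEGER);
CREATE TABLE services_payments (
    consultant_id INTEGER, service_id INTEGER, amount REAL, paid INTEGER
);
INSERT INTO services VALUES (1,3), (2,3), (3,3), (4,3), (5,3), (6,1), (7,3);
INSERT INTO services_payments VALUES
    (10, 1, 40, 0), (10, 2, 30, 0),   -- consultant 10: 70 payable
    (11, 3, 20, 0), (11, 4, 25, 0),   -- consultant 11: 45, below the threshold
    (12, 5, 60, 0), (12, 6, 99, 0),   -- consultant 12: 60 (service 6 is not status 3)
    (13, 7, 80, 1);                   -- consultant 13: already paid
""")

base = """
FROM services_payments AS sp
JOIN services AS s ON s.id = sp.service_id
WHERE s.status = 3 AND sp.paid = 0
"""

# Without GROUP BY the aggregate covers every matching row: one row in total.
ungrouped = conn.execute("SELECT SUM(sp.amount) " + base).fetchall()

# With GROUP BY there is one sum per consultant, and HAVING filters those sums.
payable = conn.execute(
    "SELECT sp.consultant_id, SUM(sp.amount) AS total " + base +
    "GROUP BY sp.consultant_id HAVING SUM(sp.amount) > 50 "
    "ORDER BY sp.consultant_id"
).fetchall()
print(ungrouped, payable)
```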

Multiple sort depending on the current day and tomorrow (bus trips)

I am stuck on a huge problem with the query below. Here j5 represents Friday and j6 represents Saturday (j1 to j7 run from Monday to Sunday).
As you know, buses have different schedules depending on the day of the week. Here, I am taking the next 5 trip departures after 25:00:00 on cal (j5) and/or after 01:00:00 on cal2 (j6). Bus schedules are built like this:
If it's 1 AM, then the current bus time is 25, 2 AM is 26, and so on. So if I want the departures for today after, let's say, 1 AM, I may get only 2 or 3, since the "bus" day ends soon. To solve this problem, I want to add the next departures from the following day (here, Saturday after Friday). But the next day starts at 00, like every day in our world.
So what I want to do is: get all next trips for Friday (j5) after 25:00:00. If I don't have 5, then get the remaining departures for Saturday after 01:00:00 (since 25:00:00 = 01:00:00).
Example:
I get departures at 25:16:00, 25:46:00 and 26:16:00 for Friday. That's 3. I then want 2 more departures from the next day so that I end up with 5, for example 04:50:00 and 05:15:00.
So the next departures from stop X are: 25:16:00 (Friday), 25:46:00 (Friday), 26:16:00 (Friday), 04:50:00 (Saturday), 05:15:00 (Saturday).
I am having trouble sorting both sets of results by trips.trip_departure.
I know it may be complicated, and it's complicated for me to explain, but anyway, thanks a lot in advance!
PS: Using MySQL 5.1.49 and PHP 5.3.8
PS2: I want to avoid doing multiple query in PHP so I'd like to do this in one query, no matter what.
SELECT
trips.trip_departure,
trips.trip_arrival,
trips.trip_total_time,
trips.trip_direction
FROM
trips,
trips_assoc,
(
SELECT calendar_regular.cal_regular_id
FROM calendar_regular
WHERE calendar_regular.j5 = 1
) as cal,
(
SELECT calendar_regular.cal_regular_id
FROM calendar_regular
WHERE calendar_regular.j6 = 1
) as cal2
WHERE
trips.trip_id = trips_assoc.trip_id
AND
trips.route_id IN (109)
AND
trips.trip_direction IN (0)
AND
trips.trip_period_start <= "2011-11-25"
AND
trips.trip_period_end >= "2011-11-25"
AND
(
(
cal.cal_regular_id = trips_assoc.calendar_id
AND
trips.trip_departure >= "25:00:00"
)
OR
(
cal2.cal_regular_id = trips_assoc.calendar_id
AND
trips.trip_departure >= "01:00:00"
)
)
ORDER BY
trips.trip_departure ASC
LIMIT
5
EDIT Table structure :
Table calendar_regular
(j1 means Monday, j7 means Sunday, etc.; the column comments are the French day names.)
`cal_regular_id` tinyint(3) unsigned NOT NULL AUTO_INCREMENT,
`j1` tinyint(1) NOT NULL COMMENT 'Lundi',
`j2` tinyint(1) NOT NULL COMMENT 'Mardi',
`j3` tinyint(1) NOT NULL COMMENT 'Mercredi',
`j4` tinyint(1) NOT NULL COMMENT 'Jeudi',
`j5` tinyint(1) NOT NULL COMMENT 'Vendredi',
`j6` tinyint(1) NOT NULL COMMENT 'Samedi',
`j7` tinyint(1) NOT NULL COMMENT 'Dimanche',
PRIMARY KEY (`cal_regular_id`),
KEY `j1` (`j1`),
KEY `j2` (`j2`),
KEY `j3` (`j3`),
KEY `j4` (`j4`),
KEY `j5` (`j5`),
KEY `j6` (`j6`),
KEY `j7` (`j7`)
Data :
cal_regular_id  j1  j2  j3  j4  j5  j6  j7
1               0   0   0   0   1   0   0
2               0   0   0   1   1   0   0
3               1   1   1   1   1   0   0
4               0   0   0   0   0   1   0
5               0   0   0   0   0   0   1
Some buses run only on certain days; this table defines on which days of the week, and its rows are assigned to trips through the trips_assoc table.
Trips table
`agency_id` smallint(5) unsigned NOT NULL,
`trip_id` binary(16) NOT NULL,
`trip_period_start` date NOT NULL,
`trip_period_end` date NOT NULL,
`trip_direction` tinyint(1) unsigned NOT NULL,
`trip_departure` time NOT NULL,
`trip_arrival` time NOT NULL,
`trip_total_time` mediumint(8) NOT NULL,
`trip_terminus` mediumint(8) NOT NULL,
`route_id` mediumint(8) NOT NULL,
`shape_id` binary(16) NOT NULL,
`block` binary(16) DEFAULT NULL,
KEY `testing` (`route_id`,`trip_direction`),
KEY `trip_departure` (`trip_departure`)
trips_assoc table
`agency_id` tinyint(4) NOT NULL,
`trip_id` binary(16) NOT NULL,
`calendar_id` smallint(6) NOT NULL,
KEY `agency_id` (`agency_id`),
KEY `trip_id` (`trip_id`,`calendar_id`)
First off, NEVER let an outside entity dictate a non-unique join column. They can possibly (with authorization/authentication) dictate unique ones (like a deterministic GUID value). Otherwise, they get to dictate a natural key somewhere, and your database automatically assigns row ids for joining. Also, unless you're dealing with a huge number of joins (multiple dozens) over un-indexed rows, the performance is going to be far less of a factor than the headaches of dealing with it elsewhere.
So, from the look of things, you are storing bus schedules from multiple companies (something like what Google must be doing to provide public transit directions).
Here's how I would deal with this:
You're going to need a calendar file. This is useful for all business scenarios, but will be extremely useful here (note: don't put any route-related information in it).
Modify your agency table to control join keys. Agencies do not get to specify their ids, only their names (or some similar identifier). Something like the following should suffice:
agency
=============
id - identity, incrementing
name - Externally specified name, unique
Modify your route table to control join keys. Agencies should only be able to specify their (potentially non-unique) natural keys, so we need a surrogate key for joins:
route
==============
id - identity, incrementing
agency_id - fk reference to agency.id
route_identifier - natural key specified by agency, potentially non-unique.
- required unique per agency_id, however (or include variation for unique)
route_variation - some agencies use the same routes for both directions, but they're still different.
route_status_id - fk reference to route_status.id (potential attribute, debatable)
Please note that the route table shouldn't actually list the stops that are on the route - its sole purpose is to control which agency has which routes.
Create a location or address table. This will benefit you mostly in the fact that most transit companies tend to put multiple routes through the same locations:
location
=============
id - identity, incrementing
address - there are multiple ways to represent addresses in a database.
- if nothing else, separating the fields should suffice
lat/long - please store these properly, not as a single column.
- two floats/doubles will suffice, although there are some dedicated solutions.
At this point, you have two options for dealing with stops on a route:
Define a stop table, and list out all stops. Something like this:
stop
================
id - identity, incrementing
route_id - fk reference to route.id
location_id - fk reference to location.id
departure - Timestamp (date and time) when the route leaves the stop.
This of course gets large very quickly, but makes dealing with holiday schedules easy.
Define a schedule table set, and a schedule_override table set:
schedule
===================
id - identity, incrementing
route_id - fk reference to route.id
start_date - date schedule goes into effect.
schedule_stop
===================
schedule_id - fk reference to schedule.id
location_id - fk reference to location.id
departure - Time (time only) when the route leaves the stop
dayOfWeek - equivalent to whatever is in calendar.nameOfDay
- This does not have to be an id, so long as they match
schedule_override
===================
id - identity, incrementing
route_id - fk reference to route.id
effective_date - date override is in effect. Should be listed in the calendar file.
reason_id - why there's an override in effect.
schedule_override_stop
===========================
schedule_override_id - fk reference to schedule_override.id
location_id - fk reference to location.id
departure - time (time only) when the route leaves the stop
With this information, I can now get the information I need:
SELECT
FROM agency as a
JOIN route as b
ON b.agency_id = a.id
AND b.route_identifier = :(whatever 109 equates to)
AND b.route_variation = :(whatever 0 equates to)
JOIN (SELECT COALESCE(d.route_id, j.route_id) as route_id,
COALESCE(e.location_id, j.location_id) as location_id,
COALESCE(TIMESTAMP(c.date, e.departure),
TIMESTAMP(c.date, j.departure)) as departure_timestamp
FROM calendar as c
LEFT JOIN (schedule_override as d
JOIN schedule_override_stop as e
ON e.schedule_override_id = d.id)
ON d.effective_date = c.date
LEFT JOIN (SELECT f.route_id, f.start_date,
g.dayOfWeek, g.departure, g.location_id,
(SELECT MIN(h.start_date)
FROM schedule as h
WHERE h.route_id = f.route_id
AND h.start_date > f.start_date) as end_date
FROM schedule as f
JOIN schedule_stop as g
ON g.schedule_id = f.id) as j
ON j.start_date <= c.date
AND j.end_date > c.date
AND j.dayOfWeek = c.dayOfWeek
WHERE c.date >= :startDate
AND c.date < :endDate) as k
ON k.route_id = b.id
AND k.departure_timestamp >= :leaveAfter
JOIN location as m
ON m.id = k.location_id
AND m.(location information) = :(input location information)
ORDER BY k.departure_timestamp ASC
LIMIT 5
This will give a list of all departures leaving from the specified location, for the given route, between startDate and endDate (exclusive), and after the leaveAfter timestamp. An equivalent statement runs on DB2. It picks up changes to schedules, overrides for holidays, etc.
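The core idea, that an override replaces the regular schedule on the dates it covers, can be tried out on a much smaller model. The sketch below uses Python's sqlite3 module with SQLite standing in for DB2/MySQL. It is a simplification and not the statement above: routes, locations and schedule start dates are dropped, the override stops carry their effective date directly, and NOT EXISTS is swapped in for the COALESCE over two LEFT JOINs. All rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (date TEXT PRIMARY KEY, dayOfWeek TEXT);
CREATE TABLE schedule_stop (dayOfWeek TEXT, departure TEXT);
-- Simplification: the override stop holds its own effective_date.
CREATE TABLE schedule_override_stop (effective_date TEXT, departure TEXT);

INSERT INTO calendar VALUES ('2011-11-24', 'Thursday'), ('2011-11-25', 'Friday');
INSERT INTO schedule_stop VALUES
    ('Thursday', '08:00:00'), ('Thursday', '09:00:00'),
    ('Friday',   '08:00:00'), ('Friday',   '09:00:00');
INSERT INTO schedule_override_stop VALUES ('2011-11-24', '10:00:00');
""")

query = """
-- Dates with an override use the override departures ...
SELECT c.date, o.departure
FROM calendar AS c
JOIN schedule_override_stop AS o ON o.effective_date = c.date
WHERE c.date >= :start AND c.date < :end
UNION ALL
-- ... and every other date falls back to the regular weekly schedule.
SELECT c.date, s.departure
FROM calendar AS c
JOIN schedule_stop AS s ON s.dayOfWeek = c.dayOfWeek
WHERE c.date >= :start AND c.date < :end
  AND NOT EXISTS (SELECT 1 FROM schedule_override_stop AS o
                  WHERE o.effective_date = c.date)
ORDER BY 1, 2
"""
departures = conn.execute(
    query, {"start": "2011-11-24", "end": "2011-11-26"}
).fetchall()
print(departures)
```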
I think X-Zero's advice is the best solution, but I had some free time :) Please see below: I have used CONCAT to build values that can be handled as timestamps, and then ordered by those two columns. I wrote this freehand, so there may be errors. I have used EXISTS because I read somewhere that it is faster than a join, but you can also take just the CONCAT and ORDER BY parts of the query.
SELECT
trips.trip_departure,
trips.trip_arrival,
trips.trip_total_time,
trips.trip_direction,
CONCAT(trips.trip_period_start,' ',trips.trip_departure) as start,
CONCAT(trips.trip_period_end,' ',trips.trip_departure) as end
FROM trips
WHERE EXISTS
(
SELECT
trips_assoc.calendar_id
FROM
trips_assoc
WHERE
trips.trip_id = trips_assoc.trip_id
AND EXISTS
(
SELECT
calendar_regular.cal_regular_id
FROM
calendar_regular
WHERE
calendar_regular.cal_regular_id = trips_assoc.calendar_id
AND
(
calendar_regular.j5 = 1
OR
calendar_regular.j6 = 1
)
)
)
AND
trips.route_id IN (109)
AND
trips.trip_direction IN (0)
AND
trips.trip_period_start <= "2011-11-25"
AND
trips.trip_period_end >= "2011-11-25"
AND
(
trips.trip_departure >= "25:00:00"
OR
trips.trip_departure >= "01:00:00"
)
ORDER BY
TIMESTAMP(start) ASC,TIMESTAMP(end) ASC
LIMIT
5
EDIT: COPY/PASTE issue corrected
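The sorting problem from the question (Friday's 25:xx departures must come before Saturday's early-morning ones) can also be sketched with a day-offset column and a UNION ALL. The example below uses Python's sqlite3 module with SQLite standing in for MySQL. It is a simplified model and not a drop-in query: trips_assoc is folded into trips through a calendar_id column, times are stored as zero-padded text, the day_offset column and the upcoming alias are invented, and the rows are made up around the departures mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar_regular (
    cal_regular_id INTEGER PRIMARY KEY, j5 INTEGER, j6 INTEGER
);
-- Simplification: calendar_id lives on trips instead of trips_assoc.
CREATE TABLE trips (
    trip_id INTEGER PRIMARY KEY, calendar_id INTEGER, trip_departure TEXT
);
INSERT INTO calendar_regular VALUES (1, 1, 0), (4, 0, 1);  -- Friday only, Saturday only
INSERT INTO trips (calendar_id, trip_departure) VALUES
    (1, '24:40:00'), (1, '25:16:00'), (1, '25:46:00'), (1, '26:16:00'),
    (4, '00:30:00'), (4, '04:50:00'), (4, '05:15:00'), (4, '05:45:00');
""")

# day_offset 0 = today's service day, 1 = tomorrow's; sorting on it first
# keeps 25:16:00 (Friday) ahead of 04:50:00 (Saturday).
query = """
SELECT day_offset, trip_departure
FROM (
    SELECT 0 AS day_offset, t.trip_departure
    FROM trips AS t
    JOIN calendar_regular AS c ON c.cal_regular_id = t.calendar_id
    WHERE c.j5 = 1 AND t.trip_departure >= :today_after
    UNION ALL
    SELECT 1 AS day_offset, t.trip_departure
    FROM trips AS t
    JOIN calendar_regular AS c ON c.cal_regular_id = t.calendar_id
    WHERE c.j6 = 1 AND t.trip_departure >= :tomorrow_after
) AS upcoming
ORDER BY day_offset, trip_departure
LIMIT 5
"""
next_trips = conn.execute(
    query, {"today_after": "25:00:00", "tomorrow_after": "01:00:00"}
).fetchall()
print(next_trips)
```

This keeps everything in one query, as requested in the question; the route, direction and period filters would go into both halves of the UNION ALL.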