I have a table where I store the standings data of a racing championship. The problem is - I don't actually store the driver's position, but I just store the number of points along with other data (number of wins, etc.) and then let MySQL sort it for me. This usually works fine, but what if I want to know a driver's position in a certain season? I could go with the following code:
SELECT l.driver, @curRow := @curRow + 1 AS position
FROM driverStandings l
JOIN (SELECT @curRow := 0) r
WHERE l.season = 1
ORDER BY l.points DESC, l.racesWon DESC
However, this would return a list of every driver and his/her position in season 1. If I just wanted to know a driver's (e.g. "Vettel") position, what could I do? I've tried everything I could think of, to no avail.
SELECT COUNT(v.id) + 1
FROM driverStandings d
LEFT JOIN driverStandings v
ON d.points > v.points
OR (d.points = v.points AND d.racesWon > v.racesWon)
WHERE v.id = ?
If you want the position of every driver, you can GROUP BY v.id instead of filtering on it (just remember to add v.id to the SELECT list so that you can identify each record).
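A minimal sketch of the counting approach, for anyone who wants to try it out. It uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are made up, and the `position` helper is a hypothetical name. It also assumes you want to restrict the comparison to one season and look the driver up by name rather than by id, which the query above does not do.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE driverStandings (
        id INTEGER PRIMARY KEY,
        season INTEGER, driver TEXT, points INTEGER, racesWon INTEGER
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO driverStandings (season, driver, points, racesWon) VALUES
        (1, 'Vettel',   256, 5),
        (1, 'Alonso',   256, 4),
        (1, 'Webber',   242, 4),
        (1, 'Hamilton', 270, 6),
        (2, 'Vettel',   392, 11);
""")


def position(conn, driver, season):
    """Return 1 + the number of drivers ranked ahead of `driver` in `season`."""
    row = conn.execute(
        """
        SELECT COUNT(d.id) + 1
        FROM driverStandings v
        LEFT JOIN driverStandings d
               ON d.season = v.season
              AND (d.points > v.points
                   OR (d.points = v.points AND d.racesWon > v.racesWon))
        WHERE v.driver = ? AND v.season = ?
        """,
        (driver, season),
    ).fetchone()
    return row[0]


print(position(conn, "Vettel", 1))
```

Note that a driver who is missing from the table also yields 1 here, because COUNT over zero rows is 0, so you may want to check that the driver exists first.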
Do a select count where points are greater than that driver's points. This would be easiest in a second query, but there is that chance of a race condition (no pun intended).
select (count(*) + 1) as position from driverStandings l where l.points > :driverPoints
You really ought to store the position explicitly, as mentioned in the comment on your original post. It will make all of the code surrounding this problem cleaner.
It should be possible to populate the new driver position field with the current row number, as in your example, using an INSERT ... SELECT.
However, you could also get the same information, less efficiently, using a subquery.
CONDENSED VERSION
I'm trying to join a new list with my existing database with no unique identifier -- but I'm trying to figure out a way to do it in one query that's more specific than matching by first name/last name but less specific than by all the fields available (first name/middle name/last name/address/phone).
So my idea was to match solely on first/last name and then try to assign each possible matching field with points to see if anyone who matched had 'zero points' and thus have the first name/last name match stripped from them. Here's what I came up with:
SELECT *,
@MidMatch := IF(LEFT(l.middle,1)=LEFT(d.middle,1),"TRUE","FALSE") MidMatch,
@AddressMatch := IF(left(l.address,5)=left(d.address,5),"TRUE","FALSE") AddressMatch,
@PhoneMatch := IF(right(l.phone,4)=right(d.phone,4),"TRUE","FALSE") PhoneMatch,
@Points := IF(@MidMatch = "TRUE",4,0) + IF(@AddressMatch = "TRUE",3,0) + IF(@PhoneMatch = "TRUE",1,0) Points
FROM list l
LEFT JOIN database d on IF(@Points <> 0,(l.first = d.first AND l.last = d.last),(l.first = d.first AND l.last = d.last AND l.address = d.vaddress));
The query runs fine, but it still matches people whose first/last names are identical even if their points are zero (and even if their addresses don't match).
Is there a way to do what I'm looking for with this roundabout points system? I've found that it helps me a lot when trying to identify which duplicate to choose, so I'm trying to expand it to the initial match. Or should I do something different?
SPECIFIC VERSION
This is kind of a roundabout idea -- so if somebody has something more straightforward, I'd definitely be willing to bail on this completely and try something else. But basically I have a 93k person table (from a database) that I'm matching against a 92k person table (from a new list). I expect many of them to be the same but certainly not all -- and I'm trying to avoid creating duplicates. Unfortunately, there are no unique identifiers that can be matched, so I'm generally stuck with matching based on some variation of first name, middle name, last name, address, and/or phone number.
The schemas for the two tables (list and database) are nearly identical, with the fields you see above (first name, middle name, last name, address, phone) -- the only difference is that the database table also has a unique numerical ID that I would use to upload back into the database after this match. Unfortunately the list table has no such ID. Records with the ID would get matched and loaded in on top of the old record, and any record without that ID would get loaded as a new record.
What I'm trying to avoid with this question is creating a bunch of different tables and queries that start with a really specific JOIN statement and then eventually get down to just first and last name -- since there's likely some folks who should match but have moved and/or gotten a new phone number since this last list.
I could write a very simple query as a JOIN and do it numerous times, each time taking out another qualifier:
SELECT *
FROM list l
JOIN database d
ON d.first = l.first AND d.last = l.last AND d.middle = l.middle AND d.address = l.address AND d.phone = l.phone;
And I'd certainly feel confident that those people from the new list matched with the existing people in my database, but it'd only return a very small amount of people, then I'd have to go back and loosen the criteria (e.g. drop the middle name restriction, etc.) and continually create tables then merge them all back together at the end along with all the ones that didn't match at all, which I would assume would be the new people.
But is there a way to write the query solely using a first/last name match, then evaluating the other criteria and wiping the match from people who have zero 'points' (below)? Here's what I attempted to do assigning [arbitrary] points to each match:
SELECT *,
@MidMatch := IF(LEFT(l.middle,1)=LEFT(d.middle,1),"TRUE","FALSE") MidMatch,
@AddressMatch := IF(left(l.address,5)=left(d.address,5),"TRUE","FALSE") AddressMatch,
@PhoneMatch := IF(right(l.phone,4)=right(d.phone,4),"TRUE","FALSE") PhoneMatch,
@Points := IF(@MidMatch = "TRUE",4,0) + IF(@AddressMatch = "TRUE",3,0) + IF(@PhoneMatch = "TRUE",1,0) Points
FROM list l
LEFT JOIN database d on IF(@Points <> 0,(l.first = d.first AND l.last = d.last),(l.first = d.first AND l.last = d.last AND l.address = d.vaddress));
The LEFT and RIGHT formulas within the IF statements are just attempting to control for unstandardized data that gets sent. I also would've done something with a WHERE statement, but I still need the NULL values to return so I know who matched and who didn't. So I ended up attempting to use an IF statement in the LEFT JOIN to say that if the Points cell was equal to zero, that the JOIN statement would get really specific and what I thought would hopefully still return the row but it wouldn't be matched to the database even if their first and last name did.
The query doesn't produce any errors, though unfortunately I'm still getting people back who have zeros in their Points column but matched with the database because their first and last names matched (which is what I was hoping the IF/Points stuff would stop).
Is this potentially a way to avoid bad matches, or am I going down the wrong path? If this isn't the right way to go, is there any other way to write one query that will return a full LEFT JOIN along with NULLs that don't match but have it be more specific than just first/last name but less work than doing a million queries based on a new table each time?
Thanks and hopefully that made some sense!
Your first query:
SELECT *,
@MidMatch := IF(LEFT(l.middle,1)=LEFT(d.middle,1),"TRUE","FALSE") MidMatch,
@AddressMatch := IF(left(l.address,5)=left(d.address,5),"TRUE","FALSE") AddressMatch,
@PhoneMatch := IF(right(l.phone,4)=right(d.phone,4),"TRUE","FALSE") PhoneMatch,
@Points := IF(@MidMatch = "TRUE",4,0) + IF(@AddressMatch = "TRUE",3,0) + IF(@PhoneMatch = "TRUE",1,0) Points
FROM list l LEFT JOIN
database d
on IF(@Points <> 0,(l.first = d.first AND l.last = d.last),(l.first = d.first AND l.last = d.last AND l.address = d.vaddress));
This makes a serious mistake with regard to variables. The simplest problem is in the SELECT: it does not guarantee the order in which expressions are evaluated, so they could be calculated in any order, and the logic is wrong if @Points is calculated first. This problem is compounded by referring to variables in different clauses. A SQL statement is a logical statement describing the result set, not a programmatic statement of how the query is run.
Let me assume that you have a unique identifier for each row in the database (just to identify the row). Then you can get the match by using a correlated subquery:
select l.*,
(select d.databaseid
from database d
where l.first = d.first and l.last = d.last
order by (4 * (LEFT(l.middle, 1) = LEFT(d.middle, 1) ) +
3 * (left(l.address, 5) = left(d.address, 5)) +
1 * (right(l.phone, 4) = right(d.phone, 4))
) desc
limit 1
) as did
from list l;
You can join back to the database table to get more information if you need it.
EDIT:
Your comment made it clear. You don't just want the first and last name but something else as well.
select l.*,
(select d.databaseid
from database d
where l.first = d.first and l.last = d.last and
(LEFT(l.middle, 1) = LEFT(d.middle, 1) or
left(l.address, 5) = left(d.address, 5) or
right(l.phone, 4) = right(d.phone, 4)
)
order by (4 * (LEFT(l.middle, 1) = LEFT(d.middle, 1) ) +
3 * (left(l.address, 5) = left(d.address, 5)) +
1 * (right(l.phone, 4) = right(d.phone, 4))
) desc
limit 1
) as did
from list l;
I'm trying to build a query that will show me the number of points for each team (as a start anyway) in my database. I have two tables, one with the list of teams, and one with the matches, in the format
Week|HomeTeam|AwayTeam|HomeScore|AwayScore|Result|MatchID
Where Result can be either A (away win), H (home win), or D (draw). I can find the number of points for a given team easily enough with
SELECT
(SELECT COUNT(matches.Result)*3 FROM matches WHERE (HomeTeam='Arsenal' AND Result='H') OR (AwayTeam='Arsenal' AND Result='A')) +
(SELECT COUNT(matches.Result) FROM matches WHERE (HomeTeam='Arsenal' OR AwayTeam='Arsenal') AND Result='D');
What I want to do is something like:
FOR EACH teams.Team in teams
SELECT
(SELECT COUNT(matches.Result)*3 FROM matches WHERE (HomeTeam=Team AND Result='H') OR (AwayTeam=Team AND Result='A')) +
(SELECT COUNT(matches.Result) FROM matches WHERE (HomeTeam=Team OR AwayTeam=Team) AND Result='D');
Unfortunately, this isn't working and I can't quite figure out why. What am I doing wrong?
Simply use correlated subqueries, one row per team:
SELECT teams.Team,
((SELECT COUNT(matches.Result)*3 FROM matches WHERE (HomeTeam=teams.Team AND Result='H') OR (AwayTeam=teams.Team AND Result='A')) +
(SELECT COUNT(matches.Result) FROM matches WHERE (HomeTeam=teams.Team OR AwayTeam=teams.Team) AND Result='D')) as Points
FROM teams;
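A quick way to check the per-team query is to run it against a few rows. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with the table layout from the question and three made-up matches. The table aliases and the ORDER BY are additions of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (Team TEXT PRIMARY KEY);
    CREATE TABLE matches (
        Week INTEGER, HomeTeam TEXT, AwayTeam TEXT,
        HomeScore INTEGER, AwayScore INTEGER, Result TEXT,
        MatchID INTEGER PRIMARY KEY
    );
    INSERT INTO teams VALUES ('Arsenal'), ('Chelsea'), ('Everton');
    -- Made-up fixtures: a home win, a draw and an away win.
    INSERT INTO matches VALUES
        (1, 'Arsenal', 'Chelsea', 2, 0, 'H', 1),
        (2, 'Everton', 'Arsenal', 1, 1, 'D', 2),
        (3, 'Chelsea', 'Everton', 0, 3, 'A', 3);
""")

# 3 points per win (home or away) plus 1 point per draw, computed per team.
POINTS_SQL = """
SELECT t.Team,
       (SELECT COUNT(*) * 3 FROM matches m
         WHERE (m.HomeTeam = t.Team AND m.Result = 'H')
            OR (m.AwayTeam = t.Team AND m.Result = 'A'))
     + (SELECT COUNT(*) FROM matches m
         WHERE (m.HomeTeam = t.Team OR m.AwayTeam = t.Team)
           AND m.Result = 'D') AS Points
  FROM teams t
 ORDER BY Points DESC, t.Team
"""

standings = conn.execute(POINTS_SQL).fetchall()
for team, points in standings:
    print(team, points)
```

Teams with no matches still appear with 0 points, because the query starts from the teams table rather than from matches.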
I have a query as follows
select
Sum(If(departments.vat, If(weeklytransactions.weekendingdate Between
'2011-01-04' And '2099-12-31', weeklytransactions.takings / 1.2,
If(weeklytransactions.weekendingdate Between '2008-11-30' And '2010-01-01',
weeklytransactions.takings / 1.15, weeklytransactions.takings / 1.175)),
weeklytransactions.takings)) As Total,
weeklytransactions.weekendingdate,......
and another that returns a vat rate as follows
select format(Max(Distinct vat_rates.Vat_Rate),3) From vat_rates Where
vat_rates.Vat_From <= '2011-01-03'
I want to replace the hard coded if statement with the lower query, replacing the date in the lower query with weeklytransactions.weekendingdate.
After Kevin's comments, here is the full query I'm trying to get to work;
Select Max(vat_rates.vat_rate) As r,
If(departments.vat, weeklytransactions.takings / r, weeklytransactions.takings) As Total,
weeklytransactions.weekendingdate,
Week(weeklytransactions.weekendingdate),
round(datediff(weekendingdate, (if(month(weekendingdate)>5,concat(year(weekendingdate),'-06-01'),concat(year(weekendingdate)-1,'-06-01'))))/7,0)+1 as fyweek,
cast((Case When Month(weeklytransactions.weekendingdate) >5 Then Concat(Year(weeklytransactions.weekendingdate), '-',Year(weeklytransactions.weekendingdate) + 1) Else Concat(Year(weeklytransactions.weekendingdate) - 1, '-',Year(weeklytransactions.weekendingdate)) End) as char) As fy,
business_units.business_unit
From departments Inner Join (business_units Inner Join weeklytransactions On business_units.buID = weeklytransactions.businessUnit) On departments.deptid = weeklytransactions.departmentId
Where (vat_rates.vat_from <= weeklytransactions.weekendingdate and business_units.Active = true and business_units.sales=1)
Group By weeklytransactions.weekendingdate, business_units.business_unit Order By fy desc, business_unit, fyweek
Regards
Pete
Assuming I read your question correctly, your problem is about using the result of another SELECT within the result of your main query (and, depending on how familiar you are with SQL, maybe you haven't yet had occasion to learn about JOINs?).
You can extract data from a subquery within a SELECT, provided you define it within the FROM clause. The following query will work, for example:
SELECT A.a, B.b
FROM A
JOIN (SELECT aggregate(c) AS b FROM C) AS B
Notice that there is no reference to table A within the subquery. Thing is, you cannot just add it like that to the query, as the subquery doesn't know it is a subquery. So the following won't work:
SELECT A.a, B.b
FROM A
JOIN (SELECT aggregate(c) AS b FROM C WHERE C.someValue = A.someValue) AS B
Back to basics. What you visibly want to do here is aggregate some data associated with each of the records of another table. For that, you will need to merge your SELECT queries and use GROUP BY:
SELECT A.a, aggregate(C.c)
FROM A, C
WHERE C.someValue = A.someValue
GROUP BY A.a
Back to your tables, the following should work:
SELECT w.weekendingdate, FORMAT(MAX(v.Vat_Rate), 3)
FROM weeklytransactions AS w, vat_rates AS v
WHERE v.Vat_From <= w.weekendingdate
GROUP BY w.weekendingdate
Feel free to add and remove fields and conditions as you see fit (I wouldn't be surprised that you'd also want to use a lower bound when filtering the records from vat_rates, since the way I have written it above, for a given weekendingdate, you get records from that week + the weeks before!).
So it looks like my first try did not address the actual problem. With the additional information provided in the comments, as well as the new complete query, let's see how this goes.
We are still missing error messages, but normally the query as written should result in MySQL having the following complaint:
ERROR 1109 (42S02): Unknown table 'vat_rates' in field list
Why? Because the vat_rates table does not appear in the FROM clause, whereas it should. Let's make that more obvious by simplifying the query, removing all references to the business_units table as well as the fields, calculations and order that do not add or remove anything to the problem, leaving us with the following:
SELECT MAX(vat_rates.vat_rate) AS r,
IF(d.vat, w.takings / r, w.takings) AS Total
FROM departments AS d
INNER JOIN weeklytransactions AS w ON w.departmentId = d.deptid
WHERE vat_rates.vat_from <= w.weekendingdate
GROUP BY w.weekendingdate
That cannot work, and will produce the error mentioned above. It looks like there is no foreign key between the weeklytransactions and vat_rates tables, so we have no choice but to do a CROSS JOIN for the moment, hoping that the condition in the WHERE clause and the aggregate function used to get r are enough to fit the business logic at hand here. The following query should return the expected data instead of an error message (I also removed r, since that seems to be an intermediate value judging by the comments that were written):
SELECT IF(d.vat, w.takings / MAX(v.vat_rate), w.takings) AS Total
FROM vat_rates AS v, departments AS d
INNER JOIN weeklytransactions AS w ON w.departmentId = d.deptid
WHERE v.vat_from <= w.weekendingdate
GROUP BY w.weekendingdate
From there, assuming it works, you will only need to put back all the parts I removed to get your final query. I am a tad doubtful about the way the VAT rate is gotten here, but I have no idea what your requirements are in that regard so I leave it up to you to make sure that works as expected.
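On the doubt about how the VAT rate is obtained: MAX(vat_rate) only gives the rate in force on a date if rates never go down. A hedged alternative is to take the rate with the latest Vat_From on or before the week-ending date. The sketch below shows that with Python's sqlite3 module standing in for MySQL. The rows are invented (the rates loosely mirror the divisors hard-coded in the question), the department names are hypothetical, and it returns one row per transaction rather than grouped sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vat_rates (Vat_From TEXT, Vat_Rate REAL);
    CREATE TABLE departments (deptid INTEGER PRIMARY KEY, name TEXT, vat INTEGER);
    CREATE TABLE weeklytransactions (
        weekendingdate TEXT, departmentId INTEGER, takings REAL
    );
    -- Invented sample rows; rates are stored as divisors (1.15 means 15% VAT).
    INSERT INTO vat_rates VALUES
        ('2000-01-01', 1.175),
        ('2008-12-01', 1.15),
        ('2010-01-01', 1.175),
        ('2011-01-04', 1.2);
    INSERT INTO departments VALUES (1, 'Bar', 1), (2, 'Books', 0);
    INSERT INTO weeklytransactions VALUES
        ('2009-06-07', 1, 115.0),
        ('2011-02-06', 1, 120.0),
        ('2011-02-06', 2, 50.0);
""")

# For VAT-able departments, divide by the most recent rate that was already
# in force on the week-ending date; otherwise keep the takings unchanged.
NET_SQL = """
SELECT w.weekendingdate, w.departmentId,
       CASE WHEN d.vat = 1
            THEN w.takings / (SELECT v.Vat_Rate
                                FROM vat_rates v
                               WHERE v.Vat_From <= w.weekendingdate
                               ORDER BY v.Vat_From DESC
                               LIMIT 1)
            ELSE w.takings
       END AS Total
  FROM weeklytransactions w
  JOIN departments d ON d.deptid = w.departmentId
 ORDER BY w.weekendingdate, w.departmentId
"""

rows = conn.execute(NET_SQL).fetchall()
for row in rows:
    print(row)
```

With these rows, the 2009 transaction is divided by 1.15, whereas MAX(Vat_Rate) over all earlier rates would have picked 1.175. Whether that matches your requirements is for you to confirm.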
I currently have the following columns:
hit_id, visit_id, timestamp, page_url, page_next
hit_id increments upwards
visit_id is an ID of the visit and unique to each visitor
timestamp is a unix timestamp of the hit
page_url is the page being looked at
page_next is the page that was looked at next
I would like to add a new column, page_last, to hold the previous page URL - I should be able to extract this from page_url and page_next. I do not know why I did not create this column in the first place; it was probably a slight oversight.
Is there any way to fill this column using some MySQL trickery? page_last would always be empty on the initial hit on the website (it doesn't contain the referrer website).
I find the name page_last ambiguous (does it mean the previous page, or the last page of the visit?). I suggest you change it to page_prev.
The following comes close to filling this in, assuming that no one visited the same page multiple times in a visit:
select h.*, hprev.page_url as page_prev
from hits h left outer join
hits hprev
on hprev.page_next = h.page_url and hprev.visit_id = h.visit_id
If that is not true, then you need the most recent one. You can get that using a correlated subquery:
select h.*,
(select h2.page_url
from hits h2
where h2.visit_id = h.visit_id and h2.page_next = h.page_url and
h2.timestamp < h.timestamp
order by timestamp desc
limit 1
) as page_prev
from hits h
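To see how the correlated subquery behaves when a page is visited twice in one visit, here is a small sketch using Python's sqlite3 module in place of MySQL. The column names come from the question; the hits themselves are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hits (
        hit_id INTEGER PRIMARY KEY,
        visit_id INTEGER, timestamp INTEGER,
        page_url TEXT, page_next TEXT
    );
    -- Made-up hits: visit 1 returns to /home, visit 2 is a single page view.
    INSERT INTO hits VALUES
        (1, 1, 100, '/home',    '/about'),
        (2, 1, 110, '/about',   '/home'),
        (3, 1, 120, '/home',    '/contact'),
        (4, 1, 130, '/contact', NULL),
        (5, 2, 105, '/home',    NULL);
""")

# For each hit, find the most recent earlier hit in the same visit
# whose page_next points at the current page.
PREV_SQL = """
SELECT h.hit_id,
       (SELECT h2.page_url
          FROM hits h2
         WHERE h2.visit_id = h.visit_id
           AND h2.page_next = h.page_url
           AND h2.timestamp < h.timestamp
         ORDER BY h2.timestamp DESC
         LIMIT 1) AS page_prev
  FROM hits h
 ORDER BY h.hit_id
"""

prev_pages = [row[1] for row in conn.execute(PREV_SQL)]
print(prev_pages)
```

The first hit of each visit gets NULL (None in Python), which matches the requirement that the column stays empty on the initial hit.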
Doing the update is a bit tricky in MySQL, because you are not able to directly use the updated table in the update. But, the following trick should work:
update hits
set page_prev = (select page_url
from (select page_url
from hits h2
where h2.visit_id = hits.visit_id and
h2.page_next = hits.page_url and
h2.timestamp < hits.timestamp
order by timestamp desc
limit 1
) h3
)
The trick works because MySQL materializes the derived table (the subquery in the FROM clause), so it actually creates a "temporary table" containing the necessary information for the update.
The first query works just fine. It returns one row from the table 'routepoint'. It has a certain 'route_id' and 'geo_distance()' is on its minimum given the parameters. I know that the subquery in the FROM section seems unnecessarily complicated but in my eyes it helps to highlight the problem with the second query.
The differences are in the last two lines.
SELECT rp.*
FROM routepoint rp, route r, (SELECT * FROM ride_offer WHERE id = 6) as ro
WHERE rp.route_id = r.id
AND r.id = ro.current_route_id
AND geo_distance(rp.lat,rp.lng,52372070,9735690) =
(SELECT MIN(geo_distance(lat,lng,52372070,9735690))
FROM routepoint rp1, ride_offer ro1
WHERE rp1.route_id = ro1.current_route_id AND ro1.id = 6);
The next query does not work at all. It completely freezes mysql and I have to restart.
What am I doing wrong? The first subquery returns exactly one row. I don't understand the difference.
SELECT rp.*
FROM routepoint rp, route r, (SELECT * FROM ride_offer WHERE id = 6) as ro
WHERE
rp.route_id = r.id
AND r.id = ro.current_route_id
AND geo_distance(rp.lat,rp.lng,52372070,9735690) =
(SELECT MIN(geo_distance(lat,lng,52372070,9735690))
FROM routepoint rp1
WHERE rp1.route_id = ro.current_route_id);
The problem is, as pointed out by Romain, that this is costly.
This article describes an algorithm that reduces the cost by a 2-step process.
Step 1: Find a bounding box that contains at least one point.
Step 2: Find the closest point by examining all points in the bounding box, which should be a comparatively small number, thus not so costly.
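The two steps can be sketched roughly as follows. This is not the article's code, just an illustration under some assumptions: points are plain (x, y) pairs, distance is Euclidean rather than whatever geo_distance() computes, and the box filter is a list scan where a database would use an indexed range condition on lat and lng. The function name `nearest_point` and its parameters are hypothetical.

```python
import math


def nearest_point(points, qx, qy, start_half_width=1.0):
    """Return the point in `points` closest to (qx, qy), or None if empty."""
    if not points:
        return None
    if start_half_width <= 0:
        raise ValueError("start_half_width must be positive")

    def in_box(half):
        # Points inside the square of side 2*half centred on the query point.
        return [(x, y) for (x, y) in points
                if abs(x - qx) <= half and abs(y - qy) <= half]

    def dist(p):
        return math.hypot(p[0] - qx, p[1] - qy)

    # Step 1: grow the bounding box until it contains at least one point.
    half = start_half_width
    candidates = in_box(half)
    while not candidates:
        half *= 2
        candidates = in_box(half)

    # A point near a corner of the box can be farther away than a point just
    # outside one of its edges, so widen the box to the best distance found.
    best = min(dist(p) for p in candidates)
    candidates = in_box(best)

    # Step 2: examine only the (hopefully few) points inside the box.
    return min(candidates, key=dist)


print(nearest_point([(3.9, 3.9), (0.0, 4.5), (10.0, 10.0)], 0.0, 0.0, 4.0))
```

The widening step before step 2 is my own addition for correctness. Without it, the search could return a corner point of the first box even though a closer point lies just outside that box.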