SQL Select with multiple search parameters using joins and subqueries - mysql

I have spent hours searching for an answer to my problem without satisfying results.
I want to select everything from the players, villages and alliances tables, plus date and population from the histories table, with one query.
The selection must be filtered by the following rules:
1. Select the latest information by date.
2. Select only if the player currently has no more than a given number of villages.
3. Select only if the total population of the player's villages is currently no more than a given limit.
Rules 2 and 3 are the ones making my head hurt. How do I add those to my query?
Here is my current query:
SELECT players.name AS player,
players.uid as uid,
players.tid,
villages.name AS village,
villages.vid as vid,
villages.fid as fid,
alliances.name AS alliance,
alliances.aid as aid,
SQRT( POW( least(abs($xcoord - villages.x),
400-abs($xcoord - villages.x)), 2 ) +
POW( least(abs($ycoord - villages.y),
400-abs($ycoord - villages.y)), 2 ) ) AS distance
FROM histories
LEFT JOIN players ON players.uid = histories.uid
LEFT JOIN villages ON villages.vid = histories.vid
LEFT JOIN alliances ON alliances.aid = histories.aid
LEFT JOIN histories h2
ON ( histories.vid = h2.vid AND histories.idhistory < h2.idhistory )
WHERE h2.vid IS NULL
AND histories.uid != $uid
AND SQRT( POW(least(abs($xcoord - villages.x),
400-abs($xcoord - villages.x)), 2 ) +
POW(least(abs($ycoord - villages.y),
400-abs($ycoord - villages.y)), 2 ) ) < $rad
ORDER BY distance
Notice: xcoord and ycoord are posted from the search form.
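For readers puzzling over the distance expression: it appears to measure the shortest distance on a map that wraps around every 400 units, so a village at x = 399 is one step away from x = 0. Here is a minimal Python sketch of the same arithmetic. The function name `wrap_distance` and the `size` parameter are made up for illustration, and the wrap-around reading is an assumption based on the `LEAST(..., 400 - ...)` terms in the query.

```python
import math


def wrap_distance(x1, y1, x2, y2, size=400):
    """Shortest distance between two points on a map that wraps every `size` units.

    Hypothetical helper: mirrors the LEAST(ABS(d), 400 - ABS(d)) terms in the query.
    """
    dx = abs(x1 - x2)
    dy = abs(y1 - y2)
    # Going "around the edge" may be shorter than going straight across.
    dx = min(dx, size - dx)
    dy = min(dy, size - dy)
    return math.sqrt(dx ** 2 + dy ** 2)


print(wrap_distance(0, 0, 399, 0))        # wraps around the edge
print(wrap_distance(100, 100, 103, 104))  # plain 3-4-5 triangle, no wrapping
```

The first call goes around the edge (distance 1.0) and the second is an ordinary straight-line distance (5.0).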
Example output:
Player | Village | Alliance | Distance
P1 | V1 | A1 | 1
P2 | V4 | A2 | 2
P1 | V2 | A1 | 3
P1 | V3 | A1 | 4
P2 | V5 | A2 | 5
Thank you in advance for helping. :)
This query can find players that have fewer than 2 villages. I just can't put my original query and this one together. Is it even possible?
SELECT
b.*, count(b.uid) as hasvillages
FROM
histories b
WHERE
b.vid IN (SELECT a.vid FROM villages a)
GROUP BY
b.uid
HAVING
count(b.uid) < 2

After one week of try-outs I have finally found the answer.
With this query I can use following search parameters:
Find latest rows by date
Find rows by limiting the number of villages the player has.
Find rows by limiting the total population of villages the player has.
Find rows by calculating the distance.
Exclude players or alliances from selection.
Here is the query
SELECT players.name AS player, players.uid as uid, players.tid,
villages.name AS village, villages.vid as vid, villages.fid as fid,
alliances.name AS alliance, alliances.aid as aid,
SQRT( POW( least(abs(100 - villages.x),400-abs(100 - villages.x)), 2 ) +
POW( least(abs(100 - villages.y),400-abs(100 - villages.y)), 2 ) ) AS distance
FROM histories
LEFT JOIN players ON players.uid = histories.uid
LEFT JOIN villages ON villages.vid = histories.vid
LEFT JOIN alliances ON alliances.aid = histories.aid
WHERE histories.uid IN
(SELECT b.uid FROM histories b
WHERE (b.vid IN (SELECT a.vid FROM villages a) and b.date
in (select max(date) from histories))
GROUP BY b.uid HAVING count(b.uid) < 4 AND
sum(b.population) < 2000)
AND histories.uid != 1
and histories.date in (select max(date) from histories)
AND SQRT( POW( least(abs(100 - villages.x),400-abs(100 - villages.x)),2)+
POW( least(abs(100 - villages.y),400-abs(100 - villages.y)), 2 ) ) < 200
ORDER BY distance
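The part of this query that does the per-player filtering is the grouped subquery with HAVING. Here is a small sketch of just that piece, run against an in-memory SQLite database from Python so it can be tried without a MySQL server. The table contents, the limits and the variable names are invented for illustration, and SQLite stands in for MySQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE histories (uid INTEGER, vid INTEGER, date TEXT, population INTEGER);
INSERT INTO histories VALUES
  (1, 10, '2013-01-01', 100),   -- older snapshot, ignored by the date filter
  (1, 10, '2013-01-02', 150),
  (1, 11, '2013-01-02', 200),
  (2, 20, '2013-01-02', 900),
  (2, 21, '2013-01-02', 800),
  (2, 22, '2013-01-02', 700),
  (3, 30, '2013-01-02', 50);
""")

MAX_VILLAGES = 3       # hypothetical limit: keep players with fewer villages than this
MAX_POPULATION = 2000  # hypothetical limit: and a total population below this

# Only rows from the latest date are counted, then each player is
# kept or dropped as a whole by the HAVING clause.
rows = conn.execute("""
    SELECT uid, COUNT(*) AS villages, SUM(population) AS total_population
    FROM histories
    WHERE date = (SELECT MAX(date) FROM histories)
    GROUP BY uid
    HAVING COUNT(*) < ? AND SUM(population) < ?
    ORDER BY uid
""", (MAX_VILLAGES, MAX_POPULATION)).fetchall()

print(rows)
```

Player 2 is dropped because it has 3 villages with a total population of 2400. Passing the limits as bound parameters instead of pasting form values into the SQL string also avoids injection problems.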

Related

Getting percentage of total in SQL with two joins

So I'm trying to do something that I think should be fairly simple with SQL. But I'm having a hard time figuring it out. Here is the format of my data:
One table with user information, let's call it User:
ID name_user Drive_Type
1 Tim Stick shift
2 Jim Automatic
3 Bob Automatic
4 Lisa Stick shift
Then I have one table used for the join, let's call it Join_bridge:
user_ID car_has_ID
1 12
2 13
3 14
4 14
And one table with car information, let's call it Car:
car_ID name
12 Honda
13 Toyota
14 Ford
Then what I want is something that looks like this, with the total number of Fords that are stick shift and their percentage of all rows:
name Total percentage
Ford 1 25%
I have tried the following, which gets the total right, but not the percentage:
select Drive_Type,
name,
count(Drive_Type) as Total,
(count(Drive_Type) / (select count(*)
from User
join Join_bridge
on User.ID = user_ID
join Car
on Car.car_ID = Join_bridge.car_has_ID
) * 100.0 as Percent
from User
join Join_bridge
on User.ID = Join_bridge.user_ID
join Car
on Car.car_ID = Join_bridge.car_has_ID
where name = 'Ford' and Drive_Type = "Automatic"
;
What am I missing? Thanks.
The trick is to SUM over a CASE expression that returns 1 for the rows you are looking for and 0 for the rest. That gives you "Total", and in the same pass you can COUNT all rows to calculate the percentage.
Here's the SQL query:
SELECT
'Ford' name,
SUM(a.ford_with_stack_flag) Total,
100.0 * SUM(a.ford_with_stack_flag) / COUNT(*) percentage
FROM (
SELECT
Car.name,
(CASE WHEN User.Drive_Type = 'Stick shift' and Car.name = 'Ford' THEN 1 ELSE 0 END) ford_with_stack_flag
FROM User
JOIN Join_bridge on User.ID = Join_bridge.user_ID
JOIN Car ON Car.car_ID = Join_bridge.car_has_ID
) a
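To check the SUM-over-CASE idea against the sample data from the question, here is a sketch that loads the three tables into an in-memory SQLite database from Python. SQLite instead of MySQL and the shortened alias `flag` are assumptions made for the sake of a self-contained example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (ID INTEGER, name_user TEXT, Drive_Type TEXT);
CREATE TABLE Join_bridge (user_ID INTEGER, car_has_ID INTEGER);
CREATE TABLE Car (car_ID INTEGER, name TEXT);
INSERT INTO "User" VALUES (1, 'Tim', 'Stick shift'), (2, 'Jim', 'Automatic'),
                          (3, 'Bob', 'Automatic'), (4, 'Lisa', 'Stick shift');
INSERT INTO Join_bridge VALUES (1, 12), (2, 13), (3, 14), (4, 14);
INSERT INTO Car VALUES (12, 'Honda'), (13, 'Toyota'), (14, 'Ford');
""")

# The inner query flags matching rows with 1; the outer query sums the
# flags (Total) and divides by the count of all joined rows (percentage).
row = conn.execute("""
    SELECT 'Ford' AS name,
           SUM(a.flag) AS Total,
           100.0 * SUM(a.flag) / COUNT(*) AS percentage
    FROM (
        SELECT CASE WHEN u.Drive_Type = 'Stick shift' AND c.name = 'Ford'
                    THEN 1 ELSE 0 END AS flag
        FROM "User" u
        JOIN Join_bridge b ON u.ID = b.user_ID
        JOIN Car c ON c.car_ID = b.car_has_ID
    ) a
""").fetchone()

print(row)
```

With the four sample users this gives one stick-shift Ford out of four rows, i.e. 25 percent, which matches the output asked for in the question.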
Compute the percentage in a grouped subquery and join the result to Car. Window functions are supported as of MySQL 8.0.
select c.car_ID, c.name, p.cnt, p.Percent
from car c
join (
select car_has_ID, u.Drive_Type,
count(*) cnt,
count(*) / sum(count(*)) over() Percent
from Join_bridge b
join user u on u.ID = b.user_ID
group by b.car_has_ID, u.Drive_Type
) p on p.car_has_ID = c.car_ID
where c.name = 'Ford' and p.Drive_Type='Stick shift';

Mysql Bayesian and sort by star ratings

Say I have two tables. businesses and reviews for businesses.
businesses table:
+----+-------+
| id | title |
+----+-------+
reviews table:
+----+-------------+---------+------+
| id | business_id | message | rate |
+----+-------------+---------+------+
each review has a rate ( 1 to 5 stars )
I want to sort businesses by their reviews rates, based on Bayesian Ranking with condition of having at least 2 reviews.
Here is my query:
SELECT b.id,
(SELECT COUNT(r.rate) as rr FROM reviews r WHERE r.business_id = b.id) as rr,
(SELECT
((COUNT(r.rate) / (COUNT(r.rate) + 2)) AVG(r.rate) +
(2 /(COUNT(r.rate) + 2)) 4)
FROM reviews r where r.business_id = b.id AND rr > 2
) as score
FROM businesses b
order by score desc
LIMIT 4
This outputs:
+------+----+------------+
| id | rr | score |
+------+----+------------+
| 992 | 14 | 4.31250000 |
+------+----+------------+
| 237 | 3 | 4.2000000 |
+------+----+------------+
| 19 | 5 | 4.0000000 |
+------+----+------------+
| 1009 | 12 | 3.9285142 |
+------+----+------------+
I have two questions:
1. As you can see in ((COUNT(r.rate) / (COUNT(r.rate) + 2)) AVG(r.rate) +
(2 /(COUNT(r.rate) + 2)) 4) FROM reviews r where r.business_id = b.id AND rr > 2 ), some functions such as COUNT and AVG are called more than once. Are they evaluated once in the background with the result cached, or are they run for every single call?
2. Is there an equivalent query that is better optimized?
thanks in advance.
I would hope that MySQL optimises the multiple counts away, but I am not certain.
However, you could rearrange your query to join against a subquery. This way you are not performing 2 subqueries for every row.
SELECT b.id,
sub0.rr,
sub0.score
FROM businesses b
INNER JOIN
(
SELECT r.business_id,
COUNT(r.rate) AS rr ,
((COUNT(r.rate) / (COUNT(r.rate) + 2)) AVG(r.rate) + (2 /(COUNT(r.rate) + 2)) 4) AS score
FROM reviews r
GROUP BY r.business_id
HAVING rr > 2
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
Note that the results here are very slightly different, as this will exclude businesses with only 2 reviews, while your query will still return them but with a score of NULL. I have kept the apparently missing operators (before AVG(r.rate) and before 4) AS score) exactly as they appear in your original query.
Using the above idea you could recode it to return both the count and the average rate in the sub query, and just use the values of those returned columns for the calculation.
SELECT b.id,
sub0.rr,
((rr / (rr + 2)) * arr + (2 /(rr + 2)) * 4) AS score
FROM businesses b
INNER JOIN
(
SELECT r.business_id,
COUNT(r.rate) AS rr ,
AVG(r.rate) AS arr
FROM reviews r
GROUP BY r.business_id
HAVING rr > 2
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
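Here is a runnable sketch of the same idea (count and average computed once in a grouped subquery, score computed from those two columns). It assumes the operators missing from the formula are multiplications, which is consistent with the sample output in the question: 3 reviews rated 4, 4 and 5 give 4.2. It uses an in-memory SQLite database from Python rather than MySQL, and the formula is algebraically rearranged to (rr * arr + 2 * 4) / (rr + 2) because SQLite would otherwise do integer division. The constant names and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE businesses (id INTEGER, title TEXT);
CREATE TABLE reviews (id INTEGER, business_id INTEGER, message TEXT, rate INTEGER);
INSERT INTO businesses VALUES (19, 'a'), (237, 'b'), (500, 'c');
INSERT INTO reviews (business_id, rate) VALUES
  (237, 4), (237, 4), (237, 5),
  (19, 4), (19, 4), (19, 4), (19, 4), (19, 4),
  (500, 5), (500, 5);   -- only 2 reviews, so the HAVING filter drops it
""")

PRIOR_WEIGHT = 2  # the constant 2 in the original formula
PRIOR_MEAN = 4    # the constant 4 in the original formula

# COUNT and AVG are computed once per business in the subquery;
# the outer query only combines the two resulting columns.
rows = conn.execute("""
    SELECT b.id, s.rr,
           ROUND((s.rr * s.arr + :w * :m) * 1.0 / (s.rr + :w), 4) AS score
    FROM businesses b
    JOIN (SELECT business_id, COUNT(rate) AS rr, AVG(rate) AS arr
          FROM reviews
          GROUP BY business_id
          HAVING COUNT(rate) > 2) s
      ON s.business_id = b.id
    ORDER BY score DESC
    LIMIT 4
""", {"w": PRIOR_WEIGHT, "m": PRIOR_MEAN}).fetchall()

print(rows)
```

Business 237 comes first with a score of 4.2 and business 19 second with 4.0, the same values shown for those ids in the question.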

SQL: get A with max B for every distinct C

In my example, I have a table containing info about different venues, with columns for city, venue_name, and capacity. I need to select the city and venue_name for the venue with the highest capacity within each city. So if I have data:
city | venue | capacity
LA | venue1 | 10000
LA | venue2 | 20000
NY | venue3 | 1000
NY | venue4 | 500
... the query should return:
LA | venue2
NY | venue3
Can anybody give me advice on how to accomplish this query in SQL? I've gotten tangled up in joins and nested queries :P. Thanks!
select t.city, t.venue
from tbl t
join (select city, max(capacity) as max_capacity from tbl group by city) v
on t.city = v.city
and t.capacity = v.max_capacity
One way to do this is with not exists:
select i.*
from info i
where not exists (select 1
from info i2
where i2.city = i.city and i2.capacity > i.capacity);
The common approach is to join the table back to itself using a subquery with max:
select y.city, y.venue_name
from yourtable y
join (select city, max(capacity) maxcapacity
from yourtable
group by city
) t on y.city = t.city and y.capacity = t.maxcapacity
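For anyone who wants to try the join-on-max and NOT EXISTS approaches side by side, here is a sketch using the sample rows from the question in an in-memory SQLite database driven from Python. The table name `venues` is an assumption, since the question does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE venues (city TEXT, venue_name TEXT, capacity INTEGER);
INSERT INTO venues VALUES ('LA', 'venue1', 10000), ('LA', 'venue2', 20000),
                          ('NY', 'venue3', 1000),  ('NY', 'venue4', 500);
""")

# Approach 1: join each row to the per-city maximum capacity.
join_rows = conn.execute("""
    SELECT v.city, v.venue_name
    FROM venues v
    JOIN (SELECT city, MAX(capacity) AS max_capacity
          FROM venues
          GROUP BY city) m
      ON v.city = m.city AND v.capacity = m.max_capacity
    ORDER BY v.city
""").fetchall()

# Approach 2: keep a row only if no venue in the same city is larger.
not_exists_rows = conn.execute("""
    SELECT v.city, v.venue_name
    FROM venues v
    WHERE NOT EXISTS (SELECT 1
                      FROM venues v2
                      WHERE v2.city = v.city AND v2.capacity > v.capacity)
    ORDER BY v.city
""").fetchall()

print(join_rows)
print(not_exists_rows)
```

Both queries return venue2 for LA and venue3 for NY. Note that both also return every tied venue if two venues in a city share the maximum capacity.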
You can use an outer apply to order those values and bring the results back to your main query.
http://www.codeproject.com/Articles/607246/Making-OUTER-and-CROSS-APPLY-work-for-you
Another alternative would be to use the ROW_NUMBER() function. http://msdn.microsoft.com/en-us/library/ms186734.aspx
SELECT
v.city,
Ranked.Venue,
Ranked.Capacity
FROM Venues v WITH (NOLOCK)
Outer Apply
(
SELECT TOP 1
Venue, Capacity
FROM Venues Ranked WITH (NOLOCK)
WHERE v.City = Ranked.City
ORDER BY Capacity DESC
) as Ranked
GROUP BY
v.city,
Ranked.Venue,
Ranked.Capacity

Combine multiple SQL select statements into columns

I'm having a hard time wrapping my mind around this, any assistance is most appreciated.
I have two select statements with joins to 1 or more tables.
SELECT repinfo.repName, SUM(callstatssummary.CallsIn)
FROM repinfo
LEFT JOIN callstatssummary
ON repinfo.isaacID = callstatssummary.IsaacID AND callstatssummary.ShiftDate >= '2013-02-10' AND callstatssummary.ShiftDate <= '2013-02-16'
GROUP BY repinfo.repName;
The output of the first statement is a list of everyone in the repinfo table, with the sum of the total calls they took during the week. I used a left join to include people who didn't take calls in the result.
SELECT repinfo.repName, SUM(`1036`.afterRgu) - SUM(`1036`.priorRgu)
FROM repinfo
JOIN reporders
ON repinfo.repID = reporders.oRep
JOIN `1036`
ON reporders.workOrder = `1036`.workOrder AND `1036`.entryDate >= '2013-02-10' AND `1036`.entryDate <= '2013-02-16' AND `1036`.afterRgu >= `1036`.priorRgu
GROUP BY repinfo.repName;
The second statement outputs the number of products that each person sold during the week. The repinfo table has the information about the representative, which joins with the reporders table to match the work order. The 1036 table has detailed information about the orders.
I am looking to output something like this - essentially combine the output of the two select statements:
| repName | SUM(callstatssummary.CallsIn) | SUM(`1036`.afterRgu) - SUM(`1036`.priorRgu) |
______________________________________________________________________________________________
| Bruce W | 41 | 13 |
| Cathy M | 84 | 17 |
| Jonah S | NULL | 29 |
Any suggestions?
One way to combine those statements is to make each of them a derived-table / inline-view and join on repName.
Please note: Obviously you would want to join on a rep ID number (or whatever you call the primary key of the repinfo table) if two reps can have the same name.
select
r.repName, c.sumCallsIn, o.sumProdSold
from
repinfo r
left join (
SELECT repinfo.repName,
SUM(callstatssummary.CallsIn) sumCallsIn
FROM repinfo
LEFT JOIN callstatssummary
ON repinfo.isaacID = callstatssummary.IsaacID
AND callstatssummary.ShiftDate >= '2013-02-10'
AND callstatssummary.ShiftDate <= '2013-02-16'
GROUP BY repinfo.repName
) c
on c.repName = r.repName
left join (
SELECT repinfo.repName,
SUM(`1036`.afterRgu) - SUM(`1036`.priorRgu) sumProdSold
FROM repinfo
JOIN reporders
ON repinfo.repID = reporders.oRep
JOIN `1036`
ON reporders.workOrder = `1036`.workOrder
AND `1036`.entryDate >= '2013-02-10'
AND `1036`.entryDate <= '2013-02-16'
AND `1036`.afterRgu >= `1036`.priorRgu
GROUP BY repinfo.repName
) o
on r.repName = o.repName
order by r.repName;
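A cut-down sketch of the derived-table pattern, runnable from Python against an in-memory SQLite database. The schema is deliberately simplified and hypothetical: `calls` stands in for callstatssummary, `sales` stands in for the reporders and 1036 join, the date filters are left out, and the join is on repID rather than repName, as suggested in the note above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repinfo (repID INTEGER PRIMARY KEY, repName TEXT);
CREATE TABLE calls (repID INTEGER, CallsIn INTEGER);
CREATE TABLE sales (repID INTEGER, priorRgu INTEGER, afterRgu INTEGER);
INSERT INTO repinfo VALUES (1, 'Bruce W'), (2, 'Cathy M'), (3, 'Jonah S');
INSERT INTO calls VALUES (1, 20), (1, 21), (2, 84);       -- Jonah took no calls
INSERT INTO sales VALUES (1, 0, 13), (2, 5, 22), (3, 1, 30);
""")

# Each aggregate lives in its own derived table, so the two SUMs cannot
# multiply each other's rows; repinfo then picks up one row from each.
rows = conn.execute("""
    SELECT r.repName, c.sumCallsIn, s.sumProdSold
    FROM repinfo r
    LEFT JOIN (SELECT repID, SUM(CallsIn) AS sumCallsIn
               FROM calls
               GROUP BY repID) c ON c.repID = r.repID
    LEFT JOIN (SELECT repID, SUM(afterRgu) - SUM(priorRgu) AS sumProdSold
               FROM sales
               GROUP BY repID) s ON s.repID = r.repID
    ORDER BY r.repName
""").fetchall()

for row in rows:
    print(row)
```

Jonah S comes back with None (NULL) for calls and 29 for products sold, the same shape as the desired output in the question.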

MySQL: Group by date proximity?

I wrote this query, it does almost what I want:
SELECT * FROM
(
SELECT COUNT(*) as cnt,
lat,
lon,
elev,
GROUP_CONCAT(CONCAT(usaf,'-',wban))
FROM `ISH-HISTORY_HASPOS`
GROUP BY lat,lon,elev
) AS x WHERE cnt >=1;
output:
+-----+--------+----------+--------+-------------------------------------------------+
| cnt | lat | lon | elev | GROUP_CONCAT(CONCAT(usaf,'-',wban)) |
+-----+--------+----------+--------+-------------------------------------------------+
| 4 | 30.478 | -87.187 | 36 | 722220-13899,722221-13899,722223-13899,999999-13899 |
| 4 | 36.134 | -80.222 | 295.7 | 723190-93807,723191-93807,723193-93807,999999-93807 |
| 5 | 37.087 | -84.077 | 369.1 | 723290-03849,723291-03849,723293-03849,724243-03849,999999-03849 |
| 5 | 38.417 | -113.017 | 1534.1 | 745200-23176,745201-23176,999999-23176,724757-23176,724797-23176 |
| 4 | 40.217 | -76.851 | 105.8 | 999999-14751,725110-14751,725111-14751,725118-14751 |
+-----+--------+----------+--------+-------------------------------------------------+
This returns a concatenated list of stations that are located at identical coordinates. However, I am only interested in concatenating stations with adjoining date ranges. The table that I select from (ISH-HISTORY_HASPOS) has two datetime columns : 'begin' and 'end'. I need the values for these two columns to be within 3 days of each other to satisfy the GROUP_CONCAT conditions.
Edit: In order for a station to be included in the final result's GROUP_CONCAT it must satisfy the following conditions:
1. It must be co-located with another station in the list (group by lat,lon,elev).
2. Its end time must be within 3 days of another station's begin time OR its begin time must be within 3 days of another station's end time. When I say "another station", I am referring to stations that are co-located (meet the conditions for #1).
I figure that I will have to use a subquery but I can't seem to figure out how to do it. Some help would be greatly appreciated! Either a query or a stored procedure would be great but a php solution would also be acceptable.
Here is a dump of the table that I am querying:sql dump
The results should look the same as my example, but non-adjoining items (date-wise) should not be there.
One solution is to use a subquery that computes the list of stations within 3 days of each other, and to add this subquery to the WHERE clause of the main query.
The subquery is a self-join that lists all possible pairs of co-located stations, with a first condition (t1.id < t2.id) that keeps only one half of the resulting matrix and two conditions that express the time constraints. I only guessed at these last two conditions, since I don't know what unit the begin and end fields are stored in.
The resulting query could be this:
SELECT * FROM (
SELECT COUNT(*) AS cnt,
lat,
lon,
elev,
GROUP_CONCAT(CONCAT(usaf, '-', wban))
FROM `ISH-HISTORY_HASPOS`
WHERE id IN (
SELECT DISTINCT t1.id
FROM `ISH-HISTORY_HASPOS` t1
INNER JOIN `ISH-HISTORY_HASPOS` t2
ON t1.lon = t2.lon
AND t1.lat = t2.lat
AND t1.elev = t2.elev
WHERE t1.id < t2.id
AND abs(t1.begin - t2.end) < 259200
AND abs(t1.end - t2.begin) < 259200
UNION
SELECT DISTINCT t2.id
FROM `ISH-HISTORY_HASPOS` t1
INNER JOIN `ISH-HISTORY_HASPOS` t2
ON t1.lon = t2.lon
AND t1.lat = t2.lat
AND t1.elev = t2.elev
WHERE t1.id < t2.id
AND abs(t1.begin - t2.end) < 259200
AND abs(t1.end - t2.begin) < 259200
)
GROUP BY lat, lon, elev
) AS x WHERE cnt >= 1;
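As a way to experiment with the time conditions, here is a sketch of the filter written with EXISTS and an OR between the two date checks, which is how the question words the rule. It runs from Python against an in-memory SQLite database, so `julianday` is SQLite's date function and not something MySQL provides. The table name `stations`, the column names `begin_date` and `end_date`, and the sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stations (id INTEGER PRIMARY KEY, usaf TEXT, wban TEXT,
                       lat REAL, lon REAL, elev REAL,
                       begin_date TEXT, end_date TEXT);
INSERT INTO stations VALUES
  (1, '722220', '13899', 30.478, -87.187, 36,    '2000-01-01', '2005-12-31'),
  (2, '722221', '13899', 30.478, -87.187, 36,    '2006-01-02', '2010-12-31'),
  (3, '999999', '13899', 30.478, -87.187, 36,    '2015-06-01', '2016-06-01'),
  (4, '723190', '93807', 36.134, -80.222, 295.7, '2000-01-01', '2005-12-31');
""")

# A station is kept only if some other co-located station begins within
# 3 days of its end, or ends within 3 days of its begin.
rows = conn.execute("""
    SELECT COUNT(*) AS cnt, lat, lon, elev,
           GROUP_CONCAT(usaf || '-' || wban) AS station_list
    FROM stations s
    WHERE EXISTS (
        SELECT 1
        FROM stations o
        WHERE o.id <> s.id
          AND o.lat = s.lat AND o.lon = s.lon AND o.elev = s.elev
          AND (ABS(julianday(s.end_date) - julianday(o.begin_date)) <= 3
               OR ABS(julianday(s.begin_date) - julianday(o.end_date)) <= 3)
    )
    GROUP BY lat, lon, elev
""").fetchall()

print(rows)
```

Stations 1 and 2 are grouped because station 2 begins two days after station 1 ends. Station 3 is co-located but years apart, and station 4 has no co-located partner, so both are left out.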
I only have access to and knowledge of SQL Server, so I can't get your data to work, and I don't know if MySQL has equivalent functionality, but here is a verbal description of what you need to do.
You need a recursive statement (WITH CTE in SQL Server) to join the table to itself on lat, lon, elev and begin BETWEEN end -3 AND end +3. You will need to be careful not to get caught in an infinite loop. I suggest building a comma-separated list of the IDs you have visited and checking this as you go. It's painful, but keep this list in ID order because it is what you will need to group on at the end. You also need to keep track of your depth and the original id.
Something like ...
WITH cte(id, idlist, lat, lon, elev, starts, ends, depth)
AS (
SELECT id, CAST(id AS varchar), lat, lon, elev, starts, ends, 0
FROM `ISH-HISTORY_HASPOS`
UNION ALL
SELECT i.id, FunctionToManagetheList(cte.idlist, i.id), i.lat, i.lon, i.elev, i.starts, i.ends, cte.depth + 1
FROM `ISH-HISTORY_HASPOS` i
INNER JOIN
cte ON i.lat=cte.lat AND
i.lon=cte.lon AND
i.elev=cte.elev AND
NOT FunctionToCheckIfTheIDisintheList(i.id, cte.idlist)
)
SELECT stuffyouneed
FROM `ISH-HISTORY_HASPOS` i
INNER JOIN
(SELECT id, MAX(depth) AS MaxDepth
FROM cte
GROUP BY id) cte1 ON i.id=cte1.id
INNER JOIN
cte cte2 ON cte1.id=cte2.id AND cte1.MaxDepth=cte2.depth
GROUP BY cte2.idlist