MySQL: retrieve data based on two matching values in the same column

Project aim:
We are developing a bus timing API where users search for buses.
I have the following tables:
buses
id | bus_name
Description: stores all bus names.
routes
id | route_name
Description: stores all city names.
stops
id | stop_name
Description: stores all stop names.
stop_orders
id | route_id | stop_id | stop_order
Description: assigns stops to each city (route); the stop_order column identifies which stops are next to each other.
bus_timing
id | stop_order_id | bus_id | bus_timing | trip | trip_direction
Description: assigns buses to route stops, along with the time, trip and direction.
Expected output:
When a user searches from a source to a destination within a time range, the API must return all buses with their times.
If there are no direct buses, interconnected (connecting) buses should be shown.
For example, if a user searches from stop_8 to stop_18 between 01:00:00 and 12:00:00, all buses with their times should be shown. If no direct bus travels between the two stops, the list of interconnected buses should be shown.
The output I got is described here:
PHP compare associative array based on condition
Problems with the current result:
It returns all buses, even a bus that only travels to stop_8 and not to stop_18. The result must include only buses that travel between the two stops, i.e. buses that serve both of them.
I also have no idea how to find the list of interconnected buses.
When the time range is long, the same bus may travel the route (same trip and direction) multiple times.
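Below is a minimal sketch of the "must fall between both stops" condition. It uses Python's built-in sqlite3 module as a stand-in for MySQL and does not come from the original post. It self-joins bus_timing so that the same bus, trip and trip_direction must visit both stops, and it requires the time at the source to be earlier than the time at the destination. Assumptions:

- (bus_id, trip, trip_direction) identifies one run.
- bus_timing holds 'HH:MM:SS' strings.
- No run crosses midnight.
- The search window applies to the departure from the source stop.

All sample rows are hypothetical.

```python
import sqlite3

# Table and column names follow the question; the data is hypothetical.
SCHEMA = """
CREATE TABLE stops (id INTEGER PRIMARY KEY, stop_name TEXT);
CREATE TABLE stop_orders (id INTEGER PRIMARY KEY, route_id INTEGER,
                          stop_id INTEGER, stop_order INTEGER);
CREATE TABLE bus_timing (id INTEGER PRIMARY KEY, stop_order_id INTEGER,
                         bus_id INTEGER, bus_timing TEXT,
                         trip INTEGER, trip_direction TEXT);
"""

# Same bus, same trip, same direction must visit both stops, source first.
DIRECT_BUSES = """
SELECT src.bus_id, src.trip, src.trip_direction,
       src.bus_timing AS departs, dst.bus_timing AS arrives
FROM bus_timing src
JOIN stop_orders so_src ON so_src.id = src.stop_order_id
JOIN bus_timing dst
  ON dst.bus_id = src.bus_id
 AND dst.trip = src.trip
 AND dst.trip_direction = src.trip_direction
JOIN stop_orders so_dst ON so_dst.id = dst.stop_order_id
WHERE so_src.stop_id = :src
  AND so_dst.stop_id = :dst
  AND src.bus_timing < dst.bus_timing   -- source is visited before destination
  AND src.bus_timing BETWEEN :t_from AND :t_to
ORDER BY departs, src.bus_id
"""


def direct_buses(conn, src, dst, t_from, t_to):
    params = {"src": src, "dst": dst, "t_from": t_from, "t_to": t_to}
    return conn.execute(DIRECT_BUSES, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO stops VALUES (?, ?)",
                 [(8, "stop_8"), (12, "stop_12"), (18, "stop_18")])
conn.executemany("INSERT INTO stop_orders VALUES (?, ?, ?, ?)",
                 [(1, 1, 8, 1), (2, 1, 12, 2), (3, 1, 18, 3)])
conn.executemany(
    "INSERT INTO bus_timing (stop_order_id, bus_id, bus_timing, trip, trip_direction) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "07:00:00", 1, "up"), (2, 1, "07:20:00", 1, "up"), (3, 1, "07:40:00", 1, "up"),
        (3, 1, "09:00:00", 1, "down"), (2, 1, "09:20:00", 1, "down"), (1, 1, "09:40:00", 1, "down"),
        (1, 2, "08:00:00", 1, "up"), (2, 2, "08:15:00", 1, "up"),  # bus 2 never reaches stop_18
    ],
)
print(direct_buses(conn, 8, 18, "01:00:00", "12:00:00"))
```

Each result row is one run. A bus that repeats the route several times inside a long window therefore shows up once per trip and direction.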
Update
I'm still looking for an answer. The current answer makes some useful points, so I have offered a bounty.

You can't simply put both conditions on stop_id in the WHERE clause, because stop_id cannot be two different values in the same row.
Aggregation is one way to do what you want:
SELECT b.bus_name
FROM buses b JOIN
route_connect rc
ON rc.busid = b.id JOIN
stops s
ON s.id = rc.stop_id
GROUP BY b.bus_name
HAVING SUM( s.stop_name = 'Sydney' ) > 0 AND
SUM( s.stop_name = 'Melbourne' ) > 0;
This returns buses that have stops with the name of both cities.
Given that buses can have lots of stops, it might be more efficient to do:
SELECT b.bus_name
FROM buses b JOIN
route_connect rc
ON rc.busid = b.id JOIN
stops s
ON s.id = rc.stop_id
WHERE s.stop_name in ('Sydney', 'Melbourne')
GROUP BY b.bus_name
HAVING COUNT(DISTINCT s.stop_name) = 2;
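The second query can be tried out with Python's sqlite3 module standing in for MySQL. The sample rows are hypothetical, and route_connect(busid, stop_id) follows the column names used in the answer. Like the original query, this only checks that a bus serves both stops. It does not check that the bus reaches them in the right order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buses (id INTEGER PRIMARY KEY, bus_name TEXT);
CREATE TABLE stops (id INTEGER PRIMARY KEY, stop_name TEXT);
CREATE TABLE route_connect (busid INTEGER, stop_id INTEGER);
INSERT INTO buses VALUES (1, 'Bus A'), (2, 'Bus B'), (3, 'Bus C');
INSERT INTO stops VALUES (1, 'Sydney'), (2, 'Canberra'), (3, 'Melbourne');
-- Bus A serves all three cities, Bus B never reaches Melbourne, Bus C never reaches Sydney.
INSERT INTO route_connect VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 3);
""")

# Keep only rows for the two requested stops, then require both to be present.
BOTH_STOPS = """
SELECT b.bus_name
FROM buses b
JOIN route_connect rc ON rc.busid = b.id
JOIN stops s ON s.id = rc.stop_id
WHERE s.stop_name IN (?, ?)
GROUP BY b.bus_name
HAVING COUNT(DISTINCT s.stop_name) = 2
"""


def buses_serving_both(a, b):
    return [row[0] for row in conn.execute(BOTH_STOPS, (a, b))]


print(buses_serving_both("Sydney", "Melbourne"))  # ['Bus A']
```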

Also, if no bus travels directly between the two cities, I need to show interconnected buses.
That's a massive problem, from a class of problems called routing problems. For it you need a better tool: consider migrating to or integrating PostgreSQL and examining pgRouting; specifically, you'll likely want Dijkstra's shortest path. pgRouting runs on top of the PostGIS extension.
Or, consider working on integrating with Esri.
Alternatively you can mess around with this, but I wouldn't advise it.
OQgraph (update)
From symcbean in the comments, you could use the "OQgraph database engine" to do this too. There is an example of shortest path here.
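To get a feel for what the suggested Dijkstra's shortest path computes, here is a minimal in-memory sketch in Python using heapq. It is not pgRouting or OQgraph. The edge format (from_stop, to_stop, minutes, bus) and the sample network are hypothetical. The sketch also ignores timetables and transfer waiting times, which a real journey planner would have to model.

```python
import heapq


def shortest_route(edges, source, target):
    """Dijkstra's shortest path over a directed stop graph.

    edges: iterable of (from_stop, to_stop, minutes, bus) tuples (hypothetical format).
    Returns (total_minutes, [(from_stop, to_stop, bus), ...]) or None if unreachable.
    """
    graph = {}
    for frm, to, minutes, bus in edges:
        graph.setdefault(frm, []).append((to, minutes, bus))

    best = {source: 0}   # shortest known travel time to each stop
    prev = {}            # stop -> (previous stop, bus used on that leg)
    heap = [(0, source)]
    while heap:
        dist, stop = heapq.heappop(heap)
        if dist > best[stop]:
            continue  # stale entry; a shorter path was already found
        if stop == target:
            break     # the first time the target is popped, its time is minimal
        for nxt, minutes, bus in graph.get(stop, []):
            cand = dist + minutes
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = (stop, bus)
                heapq.heappush(heap, (cand, nxt))

    if target not in best:
        return None
    legs, stop = [], target
    while stop != source:
        frm, bus = prev[stop]
        legs.append((frm, stop, bus))
        stop = frm
    return best[target], legs[::-1]


# Hypothetical network: changing buses at stop_12 beats the slow direct bus.
edges = [
    ("stop_8", "stop_12", 20, "bus_1"),
    ("stop_12", "stop_18", 15, "bus_2"),
    ("stop_8", "stop_18", 60, "bus_3"),
]
print(shortest_route(edges, "stop_8", "stop_18"))
# (35, [('stop_8', 'stop_12', 'bus_1'), ('stop_12', 'stop_18', 'bus_2')])
```

Each leg records which bus it uses, so consecutive legs on different buses mark the interchanges.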

Related

MySQL Query Is Too Slow Or Times Out

I am having a complete nightmare with my application. I haven't worked with datasets this big before, and my query is either timing out or taking ages to return something. I've got a feeling that my approach is just all wrong.
I have a payments table with a postcode field (among others). It has roughly 40,000 rows (one for each transaction), an auto-increment PRIMARY key and an INDEX on the postcode foreign key.
I also have a postcodes lookup table with 2,500,000 rows. The table is structured like so:
postcode | country | county | localauthority | gor
AB1 1AA | S99999 | E2304 | X | 45
AB1 1AB | S99999 | E2304 | X | 45
The postcode field is PRIMARY and I have INDEXes on all the other fields.
Each field (apart from postcode) has a lookup table. In the case of country it's something like:
code | description
S99999 | Wales
The point of the application is that the user can select areas of interest (such as "England", "London", "South West England" etc) and be shown payments results for those areas.
To do this, when a user selects the areas they are interested in, I create a temp table with one row per postcode, listing ALL postcodes for the selected areas. Then I LEFT JOIN it onto my payments table.
The problem is that if the user selects a big region (like "England"), I have to create a massive temp table (of about 1 million rows) and then compare it to the 40,000 payments to decide which to display.
UPDATE
Here is my code:
$generated_temp_table = "selected_postcodes_".$timestamp_string."_".$userid;
$temp_table_data = $temp_table
->setTempTable($generated_temp_table)
->newQuery()
->with(['payment' => function ($query) use ($column_values) {
$query->select($column_values);
}])
;
Here is my attempt to print out the raw query:
$sql = str_replace(['%', '?'], ['%%', "'%s'"], $temp_table_data->toSql());
$fullSql = vsprintf($sql, $temp_table_data->getBindings());
print_r($fullSql);
This is the result:
select * from `selected_postcodes_1434967426_1`
This doesn't look like the right query. I can't work out what Eloquent is doing here, or why the full query is not printed.
If you have a very large result, such as 1 million rows, use LIMIT with OFFSET to fetch it in pages; that will save query time. Also make sure your SELECT returns only the fields you need (avoid SELECT * FROM XXXX; use SELECT A, B FROM XXX).
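Here is a small sketch of LIMIT/OFFSET paging. It uses Python's sqlite3 module as a stand-in for MySQL; both accept the LIMIT n OFFSET m form. The payments columns and rows are hypothetical. The ORDER BY is there so that each page is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, postcode TEXT, amount REAL)")
conn.executemany("INSERT INTO payments (postcode, amount) VALUES (?, ?)",
                 [(f"AB1 1A{chr(65 + i)}", 10.0 * i) for i in range(10)])


def fetch_page(page, page_size):
    # Select only the needed columns and fetch one page at a time.
    return conn.execute(
        "SELECT id, postcode FROM payments ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()


print(fetch_page(0, 3))  # [(1, 'AB1 1AA'), (2, 'AB1 1AB'), (3, 'AB1 1AC')]
```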

How to get the average of rows that have a certain relationship

I have a bunch of county demographic data stored in a database. I need to be able to get the average of the data within the state of a certain county.
For example, I need to be able to get the average of all counties whose state_id matches the state_id of the county with a county_id of 1. Essentially, if a county is in Virginia, I need the average of all of the counties in Virginia. I'm having trouble setting up this query and was hoping you could help. Here's what I have written, but it only returns one row from the database because it links the county_id of the two tables together.
SELECT AVG(demographic_data.percent_white) as avg_percent_white
FROM demographic_data,counties, states
WHERE counties.county_id = demographic_data.county_id AND counties.state_id = states.state_id
Here's my basic database layout:
counties
------------------------
county_id | county_name | state_id
states
---------------------
state_id | state_name
demographic_data
-----------------------------------------
percent_white | percent_black | county_id
Your query is returning one row, because there's an aggregate and no GROUP BY. If you want an average of all counties within a state, we'd expect only one row.
To get a "statewide" average, of all counties within a state, here's one way to do it:
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a
ON a.county_id = d.county_id
JOIN counties o
ON o.state_id = a.state_id
WHERE o.county_id = 42
Note that there's no need to join to the state table. You just need all counties that have a matching state_id. The query above is using two references to the counties table. The reference aliased as "a" is for all the counties within a state, the reference aliased as "o" is to get the state_id for a particular county.
If you already had the state_id, you wouldn't need a second reference:
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a
ON a.county_id = d.county_id
WHERE a.state_id = 11
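Both queries can be checked against a tiny hypothetical dataset, with Python's sqlite3 module standing in for MySQL. The counties table has a state_id column, which both queries rely on. Names and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE counties (county_id INTEGER PRIMARY KEY, county_name TEXT, state_id INTEGER);
CREATE TABLE demographic_data (percent_white REAL, percent_black REAL, county_id INTEGER);
INSERT INTO counties VALUES (1, 'County A', 11), (2, 'County B', 11), (3, 'County C', 12);
INSERT INTO demographic_data VALUES (50, 40, 1), (70, 20, 2), (90, 5, 3);
""")

# "o" picks the given county; "a" is every county sharing its state_id.
BY_COUNTY = """
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a ON a.county_id = d.county_id
JOIN counties o ON o.state_id = a.state_id
WHERE o.county_id = ?
"""

# Same average when the state_id is already known.
BY_STATE = """
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a ON a.county_id = d.county_id
WHERE a.state_id = ?
"""

print(conn.execute(BY_COUNTY, (1,)).fetchone()[0])  # 60.0
print(conn.execute(BY_STATE, (12,)).fetchone()[0])  # 90.0
```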
FOLLOWUP
Q: What if I wanted to bring in another table, let's call it demographic_data_2, that is also linked via county_id?
A: I assumed that the demographic_data table has one row per county_id. If the same holds true for the second table, it's a simple JOIN:
JOIN demographic_data_2 c
ON c.county_id = d.county_id
With that table joined in, you could add an appropriate aggregate expression in the SELECT list (e.g. SUM, MIN, MAX, AVG).
The trouble spots are typically "missing" and "duplicate" data... when there isn't a row for every county_id in that second table, or there's more than one row for a particular county_id, that leads to rows not included in the aggregate, or getting double counted in the aggregate.
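Here is a minimal sketch of the double-counting problem, again with sqlite3 standing in for MySQL. The demographic_data_2 column median_age and all values are hypothetical. County 2 has two rows in the second table, so the JOIN repeats its percent_white value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demographic_data (county_id INTEGER, percent_white REAL);
CREATE TABLE demographic_data_2 (county_id INTEGER, median_age REAL);
INSERT INTO demographic_data VALUES (1, 50), (2, 70);
-- County 2 accidentally has two rows in the second table.
INSERT INTO demographic_data_2 VALUES (1, 40), (2, 35), (2, 35);
""")

# Without the second table: (50 + 70) / 2
plain = conn.execute("SELECT AVG(percent_white) FROM demographic_data").fetchone()[0]

# With the JOIN, county 2 is counted twice: (50 + 70 + 70) / 3
joined = conn.execute("""
    SELECT AVG(d.percent_white)
    FROM demographic_data d
    JOIN demographic_data_2 c ON c.county_id = d.county_id
""").fetchone()[0]

print(plain, joined)
```

One common remedy is to aggregate or de-duplicate the second table per county_id in a derived table before joining it.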
We note that the aggregate returned in the original query is an "average of averages". It's an average of the values for each county.
Consider:
bucket count_red count_blue count_total percent_red
------ --------- ---------- ----------- -----------
1 480 4 1000 48
2 60 1 200 30
Note that there's a difference between an "average of averages", and calculating an average using totals.
SELECT AVG(percent_red) AS avg_percent_red
, 100 * SUM(count_red)/SUM(count_total) AS tot_percent_red
avg_percent_red tot_percent_red
--------------- ---------------
39 45
Both values are valid; we just don't want to misinterpret or misrepresent either value.
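The two numbers in the result can be reproduced with sqlite3 standing in for MySQL. The table name buckets is hypothetical. Multiplying by 100.0 turns the ratio into a percentage and also avoids integer division in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buckets (bucket INTEGER, count_red INTEGER, count_blue INTEGER,
                      count_total INTEGER, percent_red REAL);
INSERT INTO buckets VALUES (1, 480, 4, 1000, 48), (2, 60, 1, 200, 30);
""")

avg_of_avgs, from_totals = conn.execute("""
    SELECT AVG(percent_red),                          -- (48 + 30) / 2
           100.0 * SUM(count_red) / SUM(count_total)  -- 100 * 540 / 1200
    FROM buckets
""").fetchone()
print(avg_of_avgs, from_totals)  # 39.0 45.0
```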

Querying normalized database, 3 tables

I have three tables in a MySQL database:
stores (PK stores_id)
states (PK states_id)
join_stores_states (PK join_id, FK stores_id, FK states_id)
The "stores" table has a single row for every business. The join_stores_states table links an individual business to each state it's in. So some businesses have stores in 3 states and therefore 3 rows in join_stores_states, while others have stores in 1 state and just 1 row in join_stores_states.
I'm trying to figure out how to write a query that will list each business in one row, but still show all the states it's in.
Here's what I have so far, which is obviously giving me every row out of join_stores_states:
SELECT states.*, stores.*, join_stores_states.*
FROM join_stores_states
JOIN stores
ON join_stores_states.stores_id=stores.stores_id
JOIN states
ON join_stores_states.states_id=states.states_id
Loosely, this is what it's giving me:
store 1 | alabama
store 1 | florida
store 1 | kansas
store 2 | montana
store 3 | georgia
store 3 | vermont
This is more of what I want to see:
store 1 | alabama, florida, kansas
store 2 | montana
store 3 | georgia, vermont
Suggestions as to which query methods to try would be just as appreciated as a working query.
If you need the list of states as a string, you can use MySQL's GROUP_CONCAT function (or the equivalent in another SQL dialect), as in the example below. If you want to do any further processing of the states separately, I would recommend running the query as you did and then collecting the result set into a more complex structure in the client by iterating over the rows. A hashtable of arrays is the simplest option, but more complex OO designs are certainly possible.
SELECT stores.name,
GROUP_CONCAT(states.name ORDER BY states.name ASC SEPARATOR ', ') AS state_names
FROM join_stores_states
JOIN stores
ON join_stores_states.stores_id=stores.stores_id
JOIN states
ON join_stores_states.states_id=states.states_id
GROUP BY stores.name
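Here is a sketch of the same query run against SQLite through Python's sqlite3 module. SQLite's group_concat(X, separator) takes the separator as a second argument. This sketch does not rely on any ordering inside the aggregate, so the order of states within each string is not guaranteed. In MySQL, the ORDER BY inside GROUP_CONCAT controls that order. Grouping by stores.stores_id as well as the name keeps two stores that happen to share a name apart. The sample rows mirror the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (stores_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE states (states_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE join_stores_states (join_id INTEGER PRIMARY KEY,
                                 stores_id INTEGER, states_id INTEGER);
INSERT INTO stores VALUES (1, 'store 1'), (2, 'store 2'), (3, 'store 3');
INSERT INTO states VALUES (1, 'alabama'), (2, 'florida'), (3, 'kansas'),
                          (4, 'montana'), (5, 'georgia'), (6, 'vermont');
INSERT INTO join_stores_states (stores_id, states_id)
VALUES (1, 1), (1, 2), (1, 3), (2, 4), (3, 5), (3, 6);
""")

# One row per store, with all of its states concatenated into one string.
rows = conn.execute("""
    SELECT stores.name, group_concat(states.name, ', ') AS state_names
    FROM join_stores_states
    JOIN stores ON join_stores_states.stores_id = stores.stores_id
    JOIN states ON join_stores_states.states_id = states.states_id
    GROUP BY stores.stores_id, stores.name
    ORDER BY stores.name
""").fetchall()
for name, state_names in rows:
    print(name, "|", state_names)
```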
Also, even if you only need the concatenated string and not a data structure, some databases might not have an aggregate concatenation function, in which case you will have to do the client processing anyway. In pseudocode, since you did not specify a language either:
perform query
stores = empty hash
for each row from query results:
    if the store name isn't in the hash:
        put an empty store object into the hash under the name
    get the store object from the hash by name
    add the state name to the store object's states array
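The pseudocode translates almost directly to Python. This sketch assumes the query results are fetched as (store_name, state_name) tuples. A dict of lists plays the role of the hash of store objects.

```python
def group_states_by_store(rows):
    """Collect (store_name, state_name) rows into {store_name: [state_name, ...]}."""
    stores = {}
    for store_name, state_name in rows:
        # Create an empty list the first time a store is seen, then append to it.
        stores.setdefault(store_name, []).append(state_name)
    return stores


rows = [("store 1", "alabama"), ("store 1", "florida"), ("store 1", "kansas"),
        ("store 2", "montana"), ("store 3", "georgia"), ("store 3", "vermont")]
for store, states in group_states_by_store(rows).items():
    print(store, "|", ", ".join(states))
# store 1 | alabama, florida, kansas
# store 2 | montana
# store 3 | georgia, vermont
```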

MySQL, how to repeat same line x times

I have a query that outputs address order data:
SELECT ordernumber
, article_description
, article_size_description
, concat(NumberPerBox,' pieces') as contents
, NumberOrdered
FROM customerorder
WHERE customerorder.id = 1;
I would like the above row to be output NumberOrdered divided by NumberPerBox times, e.g. 50,000 / 2,000 = 25 times.
Is there a SQL query that can do this, I'm not against using temporary tables to join against if that's what it takes.
I checked the previous questions; however, the nearest one:
is to be posible in mysql repeat the same result
only has answers that give a fixed number of rows, and I need the count to be dynamic, depending on the value of (NumberOrdered div NumberPerBox).
The result I want is:
Boxnr | Ordernr      | as_description | contents  | NumberOrdered
------+--------------+----------------+-----------+---------------
1 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
2 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
....
25 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
First, let me say that I am more familiar with SQL Server so my answer has a bit of a bias.
Second, I did not test my code sample and it should probably be used as a reference point to start from.
It would appear to me that this situation is a prime candidate for a numbers table. Simply put, it is a table (usually called "Numbers") containing nothing more than a single PK column of integers from 1 to n. Once you've used a Numbers table and are aware of how it's used, you'll start finding many uses for it, such as querying for time intervals, string splitting, etc.
That said, here is my untested response to your question:
SELECT
IV.number as Boxnr
,ordernumber
,article_description
,article_size_description
,concat(NumberPerBox,' pieces') as contents
,NumberOrdered
FROM
customerorder
INNER JOIN (
SELECT
Numbers.number
,customerorder.ordernumber
,customerorder.NumberPerBox
FROM
Numbers
INNER JOIN customerorder
ON Numbers.number BETWEEN 1 AND customerorder.NumberOrdered / customerorder.NumberPerBox
WHERE
customerorder.id = 1
) AS IV
ON customerorder.ordernumber = IV.ordernumber
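Below is a simplified variant of the same idea, run through Python's sqlite3 module as a stand-in for MySQL. It fills a Numbers table with a recursive CTE and joins it directly to customerorder instead of going through the derived table IV. SQLite uses || instead of CONCAT, and its / on two integers truncates. In MySQL, / returns a decimal, so if a partial box should get its own row you may want CEIL(NumberOrdered / NumberPerBox). The article_size_description value is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Numbers (number INTEGER PRIMARY KEY);
WITH RECURSIVE seq(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 1000)
INSERT INTO Numbers (number) SELECT n FROM seq;

CREATE TABLE customerorder (id INTEGER PRIMARY KEY, ordernumber TEXT,
    article_description TEXT, article_size_description TEXT,
    NumberPerBox INTEGER, NumberOrdered INTEGER);
INSERT INTO customerorder VALUES
    (1, 'CORDO1245', 'Carrying bags', 'placeholder size', 2000, 50000);
""")

# One output row per box number from 1 to NumberOrdered / NumberPerBox.
BOXES = """
SELECT n.number AS Boxnr,
       c.ordernumber,
       c.article_description,
       c.article_size_description,
       c.NumberPerBox || ' pieces' AS contents,  -- MySQL: CONCAT(NumberPerBox, ' pieces')
       c.NumberOrdered
FROM customerorder c
JOIN Numbers n
  ON n.number BETWEEN 1 AND c.NumberOrdered / c.NumberPerBox
WHERE c.id = ?
ORDER BY n.number
"""

rows = conn.execute(BOXES, (1,)).fetchall()
print(len(rows), rows[0], rows[-1])
```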
As I said, most of my experience is in SQL Server. I reference http://www.sqlservercentral.com/articles/Advanced+Querying/2547/ (registration required). However, there appear to be quite a few resources available when I search for "SQL numbers table".

How to get the sum of a column from combined tables in mySQL?

I've been trying to write a mySQL-statement for the scenario below, but I just can't get it to work as intended. I would be very grateful if you guys could help me get it right!
I have two tables in a mySQL-database, event and route:
event:
id | date | destination | drivers | passengers | description | executed
route:
name | distance
drivers contains a string with the usernames of the registered drivers for an event, in the form "jack:jill:john".
destination contains the event destination (oh, really?) and its value is always the same as one of the values in the field name in the table route (i.e. the destination must already exist in route).
executed tells if the event is upcoming (0) or already executed (1).
distance is the distance to the destination in km from the home location.
What I want is to get the total distance covered for one specific user, only counting already executed events.
E.g., if Jill has been registered as a driver in two executed events where the distances to the destinations are 50km and 100km respectively, I would like the query to return the value 150.
I know I can use something like ...WHERE drivers LIKE '%jill%' AND executed = 1 to get the executed events where Jill was driving, and SUM() to get the total distance, but how do I combine the two tables and get it all to work?
Your help is very much appreciated!
/Linus
I haven't used MySQL for years, so sorry if I've got the syntax wrong, but something like this should do it:
In generic SQL:
select sum(distance) from route
join event on route.name = event.destination
where drivers like '%jill%' AND executed = 1
Or not using JOIN:
select sum(distance) from route, event
where drivers like '%jill%' AND executed = 1
and route.name = event.destination
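The JOIN version can be checked with sqlite3 standing in for MySQL. The destination names and distances are hypothetical, chosen to match the 50 km + 100 km example from the question. The third event is upcoming (executed = 0), so it is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE route (name TEXT PRIMARY KEY, distance INTEGER);
CREATE TABLE event (id INTEGER PRIMARY KEY, date TEXT, destination TEXT,
    drivers TEXT, passengers TEXT, description TEXT, executed INTEGER);
INSERT INTO route VALUES ('Lake', 50), ('Coast', 100), ('Forest', 30);
INSERT INTO event (destination, drivers, executed) VALUES
    ('Lake', 'jack:jill', 1),
    ('Coast', 'jill:john', 1),
    ('Forest', 'jill', 0);  -- upcoming event, must not be counted
""")

TOTAL = """
SELECT SUM(route.distance)
FROM route
JOIN event ON route.name = event.destination
WHERE event.drivers LIKE ? AND event.executed = 1
"""
print(conn.execute(TOTAL, ("%jill%",)).fetchone()[0])  # 150
```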
Stuart's answer shows you how to get the sum of the column, but I just want to note that:
...WHERE drivers LIKE '%jill%'...
will return any event with a driver whose name contains the letters 'jill'.
Secondly, this database design doesn't seem to be normalized. You have driver names and route names repeated. If you normalize the database and have something like:
participant
id | name | role
event
id | date | route_id | description | executed
route
id | name | distance
participant_event
id | participant_id | event_id
then it would be a lot easier to work with the data.
Then if you wanted to implement a user search, you could make the query:
SELECT id FROM participant WHERE
name LIKE '%jill%' AND
role='driver';
Then if the query returns more than one result, let the user/application choose the correct driver and then run a SELECT SUM like Stuart's query:
SELECT SUM(r.distance) FROM route r
JOIN event e ON e.route_id=r.id
JOIN participant_event pe ON e.id=pe.event_id
JOIN participant p ON pe.participant_id=p.id
WHERE p.id=? AND e.executed=1;
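Here is a sketch of the normalized schema, with sqlite3 standing in for MySQL and hypothetical sample data. It adds e.executed = 1 so that only executed events are counted, as the question requires. It also filters on pe.participant_id directly, because the join to participant is only needed if you want columns from it. A second driver named 'jillian' shows that looking up by id avoids the LIKE '%jill%' ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participant (id INTEGER PRIMARY KEY, name TEXT, role TEXT);
CREATE TABLE route (id INTEGER PRIMARY KEY, name TEXT, distance INTEGER);
CREATE TABLE event (id INTEGER PRIMARY KEY, date TEXT, route_id INTEGER,
                    description TEXT, executed INTEGER);
CREATE TABLE participant_event (id INTEGER PRIMARY KEY,
                                participant_id INTEGER, event_id INTEGER);
INSERT INTO participant VALUES (1, 'jill', 'driver'), (2, 'jillian', 'driver');
INSERT INTO route VALUES (1, 'Lake', 50), (2, 'Coast', 100), (3, 'Forest', 30);
INSERT INTO event (id, route_id, executed) VALUES (1, 1, 1), (2, 2, 1), (3, 3, 0), (4, 3, 1);
INSERT INTO participant_event (participant_id, event_id) VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# Total distance of executed events for one participant id.
DISTANCE_FOR_DRIVER = """
SELECT SUM(r.distance)
FROM route r
JOIN event e ON e.route_id = r.id
JOIN participant_event pe ON e.id = pe.event_id
WHERE pe.participant_id = ? AND e.executed = 1
"""
print(conn.execute(DISTANCE_FOR_DRIVER, (1,)).fetchone()[0])  # 150
```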
Otherwise, the only way to ensure that you're only getting the total distance driven by one driver is to do something like this (assuming drivers is comma-delimited):
...WHERE LCASE(drivers)='jill' OR
drivers LIKE 'jill, %' OR
drivers LIKE '%, jill' OR
drivers LIKE '%, jill,%';
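The question says drivers is actually stored as "jack:jill:john". Instead of listing the separate LIKE cases, you can wrap both the list and the name in the delimiter so that only whole entries match. This is a commonly used trick, not something from the answers above. The sketch below uses sqlite3 as a stand-in for MySQL, and the sample rows are hypothetical. For comma-separated lists without spaces, MySQL also offers FIND_IN_SET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, drivers TEXT)")
conn.executemany("INSERT INTO event (drivers) VALUES (?)",
                 [("jack:jill:john",), ("jillian:jack",), ("jill",), ("john",)])

# Wrap both the list and the name in ':' so only whole entries can match.
# MySQL equivalent: CONCAT(':', drivers, ':') LIKE CONCAT('%:', ?, ':%')
WHOLE_NAME = """
SELECT id FROM event
WHERE (':' || drivers || ':') LIKE ('%:' || ? || ':%')
ORDER BY id
"""


def events_for(name):
    return [row[0] for row in conn.execute(WHOLE_NAME, (name,))]


print(events_for("jill"))  # [1, 3]
```

A name containing % or _ would still act as a LIKE wildcard. Normalizing the schema as described above avoids the problem entirely.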