How do I group stops into a parent station in GTFS? - gtfs

In GTFS (defines public transportation schedules and geographic information), a station (parent_station) contains several stops (stop_id).
I am analyzing Paris GTFS data. All parent_station fields are blank.
mysql> SELECT DISTINCT parent_station FROM stops;
+----------------+
| parent_station |
+----------------+
| |
| 0 |
+----------------+
How do I specify parent stations for stops (or group stops into a parent station)?
mysql> SELECT * FROM stops LIMIT 10;
+---------+-----------+------------------------------------+-------------------------------------------+-----------+----------+---------------+----------------+
| stop_id | stop_code | stop_name | stop_desc | stop_lat | stop_lon | location_type | parent_station |
+---------+-----------+------------------------------------+-------------------------------------------+-----------+----------+---------------+----------------+
| 1166824 | | "Olympiades" | "91 rue de Tolbiac - 75113" | 48.826948 | 2.367038 | 0 | |
| 1166825 | | "Olympiades" | "91 rue de Tolbiac - 75113" | 48.826948 | 2.367038 | 0 | |
| 1166826 | | "Bibliotheque-Francois Mitterrand" | "Face au 62 rue du Chevaleret - 75113" | 48.829831 | 2.376120 | 0 | |
| 1166827 | | "Bibliotheque-Francois Mitterrand" | "Face au 62 rue du Chevaleret - 75113" | 48.829831 | 2.376120 | 0 | |
| 1166828 | | "Cour Saint-Emilion" | "Cour Chamonard - 75112" | 48.833314 | 2.387300 | 0 | |
| 1166829 | | "Cour Saint-Emilion" | "Cour Chamonard - 75112" | 48.833314 | 2.387300 | 0 | |
| 1166830 | | "Bercy" | "Place du Bataillon du Pacifique - 75112" | 48.840543 | 2.379409 | 0 | |
| 1166831 | | "Bercy" | "Place du Bataillon du Pacifique - 75112" | 48.840543 | 2.379409 | 0 | |
| 1166832 | | "Gare de Lyon" | "Gare SNCF - 75112" | 48.844652 | 2.373108 | 0 | |
| 1166833 | | "Gare de Lyon" | "Gare SNCF - 75112" | 48.844652 | 2.373108 | 0 | |
+---------+-----------+------------------------------------+-------------------------------------------+-----------+----------+---------------+----------------+
Stops 1166830 and 1166831 should belong to the same parent station because they have the same longitude and latitude.
One idea comes to mind: given a radius r, two stops belong to the same station if the distance d between them is less than r, i.e., d < r.
Any better ideas?
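The radius idea above can be sketched in Python as follows. This is only an illustration: the 50 m default for `radius_m` and the helper names `haversine_m` and `same_station` are assumptions, not something defined by GTFS. Note that a pairwise d < r test is not transitive, so chaining nearby stops could merge stops that are farther apart than r.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_station(stop_a, stop_b, radius_m=50.0):
    """True if two (lat, lon) stops are closer than radius_m (hypothetical threshold)."""
    return haversine_m(stop_a[0], stop_a[1], stop_b[0], stop_b[1]) < radius_m


# Coordinates taken from the stops table above.
bercy_a = (48.840543, 2.379409)       # stop 1166830
bercy_b = (48.840543, 2.379409)       # stop 1166831
gare_de_lyon = (48.844652, 2.373108)  # stop 1166832
```

With these values, `same_station(bercy_a, bercy_b)` is true, while Bercy and Gare de Lyon are several hundred metres apart and would stay separate.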

Assuming you are sure that the stop entries are not duplicates but are stops located inside a station, I propose the following solution:
Find each list of distinct stops with the same name and location, then mark the first stop in the list as a "station" (location_type = 1) and set parent_station on the remaining stops in the list so that they point to it.
The GTFS reference document will help you do this. As an example, here are two edited rows (edits marked with ^^^^):
| 1166830 | | "Bercy"| "Place du Bataillon du Pacifique - 75112" | 48.840543 | 2.379409 | 1 | |
^^^
| 1166831 | | "Bercy"| "Place du Bataillon du Pacifique - 75112" | 48.840543 | 2.379409 | 0 | 1166830 |
^^^^^^^
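The grouping step could be sketched in Python like this, assuming the stops have been loaded as dictionaries keyed by the stops.txt column names. The function name `assign_parent_stations` is hypothetical, and the sketch simply promotes the first stop of each group as described above.

```python
from collections import defaultdict


def assign_parent_stations(stops):
    """Group stops sharing (stop_name, stop_lat, stop_lon).

    The first stop of each group becomes the station (location_type = 1);
    the others keep location_type = 0 and point to it via parent_station.
    Stops without a twin are left untouched. The list is modified in place.
    """
    groups = defaultdict(list)
    for stop in stops:
        key = (stop["stop_name"], stop["stop_lat"], stop["stop_lon"])
        groups[key].append(stop)

    for members in groups.values():
        if len(members) < 2:
            continue  # nothing to group
        station = members[0]
        station["location_type"] = 1
        station["parent_station"] = ""
        for child in members[1:]:
            child["location_type"] = 0
            child["parent_station"] = station["stop_id"]
    return stops


# Rows taken from the stops table in the question.
stops = [
    {"stop_id": "1166830", "stop_name": "Bercy", "stop_lat": "48.840543",
     "stop_lon": "2.379409", "location_type": 0, "parent_station": ""},
    {"stop_id": "1166831", "stop_name": "Bercy", "stop_lat": "48.840543",
     "stop_lon": "2.379409", "location_type": 0, "parent_station": ""},
    {"stop_id": "1166832", "stop_name": "Gare de Lyon", "stop_lat": "48.844652",
     "stop_lon": "2.373108", "location_type": 0, "parent_station": ""},
]
assign_parent_stations(stops)
```

One caveat worth checking against the GTFS reference: stop_times.txt is expected to reference stops with location_type 0, so if trips already stop at the promoted stop, adding a new station row with its own stop_id may be the safer variant.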

Related

Update field with query using longest match number

I have the following table of call_logs:
+------------+---------------------+------------+-------------+--------------+
| id | datetime | a_number | b_number | nem |
+------------+---------------------+------------+-------------+--------------+
| 1262662410 | 2020-07-17 10:43:57 | 3415529238 | 12642356719 | |
| 1262661229 | 2020-07-17 10:43:48 | 1126751251 | 12641344559 | |
| 1262658679 | 2020-07-17 10:43:28 | 3516807236 | 16199573103 | |
+------------+---------------------+------------+-------------+--------------+
and another table of prefixes:
+---------+-------+-------------------+------+
| prefix | lenght| description | nem |
+---------+-------+-------------------+------+
| 1907 | 4 | ALASKA | ALAS |
| 1684 | 4 | AMERICAN SAMOA | ASAM |
| 1264 | 4 | ANGUILLA | AGLL |
| 1264235 | 7 | ANGUILLA - MOBILE | AGLM |
| 1264469 | 7 | ANGUILLA - MOBILE | AGLM |
| 1264476 | 7 | ANGUILLA - MOBILE | AGLM |
| 1264536 | 7 | ANGUILLA - MOBILE | AGLM |
| 1264537 | 7 | ANGUILLA - MOBILE | AGLM |
| 1264538 | 7 | ANGUILLA - MOBILE | AGLM |
| 1264539 | 7 | ANGUILLA - MOBILE | AGLM |
+---------+-------+-------------------+------+
What MySQL query or procedure do you recommend for updating call_logs.nem from the prefixes row whose prefix best matches (with the greatest number of digits) the start of call_logs.b_number?
Example:
+------------+---------------------+------------+-------------+--------------+
| id | datetime | a_number | b_number | nem |
+------------+---------------------+------------+-------------+--------------+
| 1262662410 | 2020-07-17 10:43:57 | 3415529238 | 12642356719 | AGLM |
| 1262661229 | 2020-07-17 10:43:48 | 1126751251 | 12641344559 | AGLL |
+------------+---------------------+------------+-------------+--------------+
call_logs is a big table, it would be good to find the most efficient method.
Can anyone help me on this?
Thanks a lot!
CLARIFICATION:
Both fields, b_number and prefix, are of type VARCHAR.
What type of UPDATE Query could be done in this case?
You can use not exists. Since b_number and prefix are both VARCHAR, match the prefix with LIKE and compare prefix lengths with CHAR_LENGTH:
update call_logs c
inner join prefixes p on c.b_number like concat(p.prefix, '%')
set c.nem = p.nem
where not exists (
select 1
from prefixes p1
where c.b_number like concat(p1.prefix, '%')
and char_length(p1.prefix) > char_length(p.prefix)
)
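To make the "longest match wins" rule concrete, here is a small Python sketch of the same logic outside the database. The function name `longest_prefix_nem` and the dictionary layout are assumptions for illustration; the sample values come from the tables in the question.

```python
def longest_prefix_nem(number, nem_by_prefix):
    """Return the nem of the longest prefix that starts `number`, or None."""
    max_len = max((len(p) for p in nem_by_prefix), default=0)
    # Try the longest candidate first, then shorten one digit at a time.
    for n in range(min(len(number), max_len), 0, -1):
        nem = nem_by_prefix.get(number[:n])
        if nem is not None:
            return nem
    return None


# Subset of the prefixes table from the question.
nem_by_prefix = {
    "1907": "ALAS",
    "1684": "ASAM",
    "1264": "AGLL",
    "1264235": "AGLM",
    "1264469": "AGLM",
}

mobile = longest_prefix_nem("12642356719", nem_by_prefix)    # matches 1264235
landline = longest_prefix_nem("12641344559", nem_by_prefix)  # falls back to 1264
unknown = longest_prefix_nem("16199573103", nem_by_prefix)   # no prefix matches
```

Each lookup costs at most as many dictionary probes as the longest prefix has digits, which is one reason this shape is sometimes used when the log table is large.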

MySQL returns bad result

I have a question about a SELECT ... FROM ... WHERE statement that returns an unexpected result.
Here is my table called friends:
+----------+-----------+------------+--------+--------+-------+
| lastname | firstname | callprefix | phone | region | zip |
+----------+-----------+------------+--------+--------+-------+
| Lužný | Bob | 602 | 111222 | OL | 79821 |
| Matyáš | Bob | 773 | 123456 | BR | NULL |
| Strouhal | Fido | 300 | 343434 | ZL | 76701 |
| Přikryl | Tom | 581 | 010101 | PL | 72000 |
| Černý | Franta | 777 | 000999 | OL | 79801 |
| Zavadil | Olda | 911 | 111311 | OL | 79604 |
| Berka | Standa | 604 | 111234 | ZL | 72801 |
| Vlcik | BbB | 736 | 555444 | KV | 35210 |
+----------+-----------+------------+--------+--------+-------+
And here is my query.
SELECT * FROM friends WHERE region <= 'z';
I would expect that the rows with region ZL should be present, but they are not. Can you please tell me why?
Result is:
+----------+-----------+------------+--------+--------+-------+
| lastname | firstname | callprefix | phone | region | zip |
+----------+-----------+------------+--------+--------+-------+
| Lužný | Bob | 602 | 111222 | OL | 79821 |
| Matyáš | Bob | 773 | 123456 | BR | NULL |
| Přikryl | Tom | 581 | 010101 | PL | 72000 |
| Černý | Franta | 777 | 000999 | OL | 79801 |
| Zavadil | Olda | 911 | 111311 | OL | 79604 |
| Vlcik | BbB | 736 | 555444 | KV | 35210 |
+----------+-----------+------------+--------+--------+-------+
When I try this query:
SELECT * FROM friends WHERE region >= 'z';
the result contains both rows with region = 'ZL'
????
Thank you!
Because "ZL" is greater than "Z". Z is just one character, so the query only returns values that are less than Z or equal to Z. What are you trying to achieve with this query?
Can you please tell me why?
If you add a record where region is Z and sort those rows alphabetically by region, would you expect ZL to come before or after Z? Obviously it would come after, so it does not meet your criteria.
If you want to only consider the first character, then add that to your criteria:
SELECT * FROM friends WHERE LEFT(region,1) <= 'Z';
I would also make Z explicitly a capital letter in case your database settings make it a case-sensitive search.
Have you tried
SELECT * FROM friends WHERE region <= 'zl';?
From the computer's perspective, 'z' < 'zl'
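The same ordering can be checked quickly in Python. This is a sketch that approximates a case-insensitive MySQL collation with `casefold()`; that is an assumption about the server settings, and the helper name `ci_le` is made up for this example.

```python
def ci_le(a, b):
    """Case-insensitive 'a <= b', roughly what a *_ci collation does (assumption)."""
    return a.casefold() <= b.casefold()


# Region values from the friends table, in row order.
regions = ["OL", "BR", "ZL", "PL", "OL", "OL", "ZL", "KV"]

# Whole-string comparison: 'zl' sorts after 'z', so the ZL rows drop out.
whole_value = [r for r in regions if ci_le(r, "z")]

# First-character comparison, like LEFT(region, 1) <= 'Z': every row is kept.
first_char = [r for r in regions if ci_le(r[:1], "Z")]
```

A string that extends another string always sorts after it, which is why comparing only the first character changes the result.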

Replace < 1 values in an aggregate SUM T-SQL query

I am trying to avoid using a #temp table to get an average where a field in the table contains just a few values less than 1. Converting to varchar, replacing, then converting back seems ugly. Suggestions are very welcome!
Select
(select cmp_zip from company where cmp_id = ord_originpoint) as OriginZip,
ord_originpoint as OriginPoint,
replace((select cty_nmstct from company where cmp_id = ord_originpoint),'/','') as Origin_City_State,
(select cmp_zip from company where cmp_id = ord_destpoint) as DestZip,
ord_destpoint as DestPoint,
replace((select cty_nmstct from company where cmp_id = ord_destpoint),'/','') as Dest_City_State,
COUNT(ord_hdrnumber) as OrdCount,
SUM(ord_rate)/COUNT(ord_hdrnumber) as AvgRate,
SUM(ord_totalmiles)/COUNT(ord_hdrnumber) as AvgMiles,
(SUM(ord_rate) / COUNT(ord_hdrnumber)) / (SUM(ord_totalmiles) / COUNT(ord_hdrnumber)) as AvgRevperMiles
from
orderheader (NOLOCK)
where ord_billto <> 'CSXJAC01'
and ord_revtype1 = 'NE'
and ord_status = 'CMP'
and ord_bookdate > GETDATE() - 730
and ord_completiondate < GETDATE()
and ord_totalmiles > 0
group by
ord_originpoint,ord_destpoint
order by
OrdCount desc
If I exclude rows with values less than zero in the totalmiles column, I get:
+-------+----------+---------------+-------+-------+----------+-----------------+-------+---+------+-----+
| 8850 | DSDMIL | MILLTOWN | NJMID | 7206 | NORELI | ELIZABETH | NJ | 1 | 243 | 25 |
| 7047 | CSXNOR | NORTH BERGEN | NJ | 11550 | NASHEM | HEMPSTEAD | NYNAS | 1 | 492 | 34 |
| 7047 | CSXNOR | NORTH BERGEN | NJ | 7022 | PERFAI | FAIRVIEW | NJBER | 1 | 190 | 1 |
| 17013 | PEPCAR01 | CARLISLE | PA | 21224 | CSXBAL | BALTIMORE | MD | 1 | 350 | 97 |
| 23944 | GARKEN | KENBRIDGE | VA | 21224 | CSXBAL | BALTIMORE | MD | 1 | 814 | 230 |
| 21224 | CSXBAL | BALTIMORE | MD | 18202 | HAZHAZ04 | HAZLETON | PA | 1 | 621 | 161 |
| 17055 | WOOMEC | MECHANICSBURG | PACUM | 21224 | CSXBAL | BALTIMORE | MD | 1 | 355 | 85 |
| 23139 | MORPOW01 | POWHATAN | VA | 21224 | CSXBAL | BALTIMORE | MD | 1 | 1376 | 186 |
| 17109 | PEPHAR | HARRISBURG | PA | 21224 | CSXBAL | BALTIMORE | MD | 1 | 350 | 78 |
| 21224 | CSXBAL | BALTIMORE | MD | 20066 | WASWAS05 | WASHINGTON | DC | 1 | 675 | 54 |
| 21224 | CSXBAL | BALTIMORE | MD | 20743 | GRACAP | CAPITOL HEIGHTS | MD | 1 | 300 | 45 |
| 7047 | CSXNOR | NORTH BERGEN | NJ | 7866 | ROCROC05 | ROCKAWAY | NJ | 1 | 243 | 34 |
| 21224 | CSXBAL | BALTIMORE | MD | 20772 | BOBUPP | UPPER MARLBORO | MD | 1 | 283 | 37 |
+-------+----------+---------------+-------+-------+----------+-----------------+-------+---+------+-----+
I'm still not entirely sure what exact question you're asking, but subqueries inside the SELECT list like this tend to perform poorly. Try using proper joins, something like this:
SELECT
c1.cmp_zip AS OriginZip
,oh.ord_originpoint AS OriginPoint
,replace(c1.cty_nmstct,'/','') AS Origin_City_State
,c2.cmp_zip AS DestZip
,oh.ord_destpoint AS DestPoint
,replace(c2.cty_nmstct,'/','') AS Dest_City_State
,COUNT(oh.ord_hdrnumber) AS OrdCount
,SUM(oh.ord_rate)/COUNT(oh.ord_hdrnumber) AS AvgRate
,SUM(oh.ord_totalmiles)/COUNT(oh.ord_hdrnumber) AS AvgMiles
,(SUM(oh.rd_rate)/COUNT(oh.ord_hdrnumber))/( SUM(oh.ord_totalmiles)/COUNT(oh.ord_hdrnumber)) AS AvgRevperMiles
FROM orderheader oh
LEFT JOIN company c1
ON oh.ord_originpoint = c1.cmp_id
LEFT JOIN company c2
ON oh.ord_destpoint = c2.cmp_id
WHERE oh.ord_billto <> 'CSXJAC01'
AND oh.ord_revtype1 = 'NE'
AND oh.ord_status = 'CMP'
AND oh.ord_bookdate > GETDATE() - 730
AND oh.ord_completiondate < GETDATE()
AND oh.ord_totalmiles > 0
GROUP BY
c1.cmp_zip
,oh.ord_originpoint
,replace(c1.cty_nmstct,'/','')
,c2.cmp_zip
,oh.ord_destpoint
,replace(c2.cty_nmstct,'/','')
ORDER BY OrdCount DESC
You may need to tweak the odd field (I'm not sure which table contains the field rd_rate, for example, although at a guess this may be a typo for ord_rate).

mysql: comparing two columns

my tables and their layout:
mysql> select * FROM xt_shipping_zones;
+---------+-------------+---------------------------------------------------------------------------+
| zone_id | zone_name | zone_countries |
+---------+-------------+---------------------------------------------------------------------------+
| 5 | ZONE1 | AT,BE,BG,DK,FI,FR,GR,IE,IT,LV,LT,LU,MC,NL,PL,PT,RO,SM,SE,SK,SI,ES,HU,GB |
| 6 | Deutschland | DE |
| 8 | ZONE2Brutto | AD,NO,VA |
| 9 | ZONE2NETTO | CH,LI |
+---------+-------------+---------------------------------------------------------------------------+
mysql> select * FROM xt_shipping_cost WHERE shipping_geo_zone = 99995 LIMIT 5;
+------------------+-------------+-------------------+-----------------------+--------------------------+------------------------+----------------+------------------+
| shipping_cost_id | shipping_id | shipping_geo_zone | shipping_country_code | shipping_type_value_from | shipping_type_value_to | shipping_price | shipping_allowed |
+------------------+-------------+-------------------+-----------------------+--------------------------+------------------------+----------------+------------------+
| 269 | 34 | 99995 | | 0.31 | 17.99 | 17.0000 | 1 |
| 270 | 34 | 99995 | | 17.99 | 35.99 | 34.0000 | 1 |
| 271 | 34 | 99995 | | 35.99 | 53.99 | 51.0000 | 1 |
| 272 | 34 | 99995 | | 53.99 | 71.99 | 68.0000 | 1 |
| 273 | 34 | 99995 | | 71.99 | 89.99 | 85.0000 | 1 |
+------------------+-------------+-------------------+-----------------------+--------------------------+------------------------+----------------+------------------+
mysql> SELECT * FROM geoip WHERE 92569600 BETWEEN start AND end;
+----------+----------+---------+-----+
| start | end | country | id |
+----------+----------+---------+-----+
| 92569600 | 92585983 | AT | 895 |
+----------+----------+---------+-----+
My Query:
SELECT
xt_shipping_cost.shipping_type_value_from,
xt_shipping_cost.shipping_type_value_to,
xt_shipping_cost.shipping_price,
geoip.country
FROM xt_shipping_cost
INNER JOIN xt_shipping_zones
ON xt_shipping_cost.shipping_geo_zone = xt_shipping_zones.zone_id + 99990
INNER JOIN geoip
ON geoip.country REGEXP xt_shipping_zones.zone_countries
WHERE 34664448 BETWEEN geoip.start AND geoip.end
My Problem:
The query works if there is only ONE entry in xt_shipping_zones.zone_countries, like DE. If there are multiple comma-separated entries, I can't get a match on that row.
Doing it manually:
mysql> SELECT * FROM `xt_shipping_zones` WHERE `zone_countries` REGEXP 'AT';
+---------+-----------+---------------------------------------------------------------------------+
| zone_id | zone_name | zone_countries |
+---------+-----------+---------------------------------------------------------------------------+
| 5 | ZONE1 | AT,BE,BG,DK,FI,FR,GR,IE,IT,LV,LT,LU,MC,NL,PL,PT,RO,SM,SE,SK,SI,ES,HU,GB |
+---------+-----------+---------------------------------------------------------------------------+
SQLFiddle: http://sqlfiddle.com/#!9/68f8d0/1
I hope I have made my problem clear enough.
Thank you
I think you can use find_in_set()
SELECT
xt_shipping_cost.shipping_type_value_from,
xt_shipping_cost.shipping_type_value_to,
xt_shipping_cost.shipping_price,
geoip.country
FROM xt_shipping_cost
INNER JOIN xt_shipping_zones
ON xt_shipping_cost.shipping_geo_zone = xt_shipping_zones.zone_id + 99990
INNER JOIN geoip
ON find_in_set(geoip.country, xt_shipping_zones.zone_countries)
WHERE 34664448 BETWEEN geoip.start AND geoip.end
Storing the values as CSV is not a good idea, though. That is very bad database design.
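To see why the direction of the match matters, here is a rough Python stand-in. The `find_in_set` function below only imitates what MySQL's FIND_IN_SET returns (a 1-based position, or 0 when absent) and is an approximation for illustration; the regular-expression lines mirror the original join, where the country list ends up on the pattern side.

```python
import re


def find_in_set(needle, csv_list):
    """Approximation of MySQL FIND_IN_SET: 1-based position in the list, 0 if absent."""
    if not csv_list:
        return 0
    items = csv_list.split(",")
    return items.index(needle) + 1 if needle in items else 0


# zone_countries value of ZONE1 from the question.
zone1 = "AT,BE,BG,DK,FI,FR,GR,IE,IT,LV,LT,LU,MC,NL,PL,PT,RO,SM,SE,SK,SI,ES,HU,GB"

# Original join: geoip.country REGEXP zone_countries, so the whole list is the
# pattern and the two-letter country would have to contain it.
regexp_multi = re.search(zone1, "AT") is not None   # no match
regexp_single = re.search("DE", "DE") is not None   # matches, hence DE worked

# List membership gives the intended answer.
at_position = find_in_set("AT", zone1)
de_position = find_in_set("DE", zone1)
```

Swapping the REGEXP operands would also match, but a plain substring test can produce false positives across element boundaries, which an element-wise lookup avoids.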

Mysql: joining tables for translation records

I have 2 tables with this configuration:
table language('id', 'language_name', 'iso_code')
table translation('id', 'language_id', 'translated_text')
In the first table I have records:
---------------------------------
| id | language_name | iso_code |
---------------------------------
| 1 | English | en |
| 2 | Español | es |
| 3 | Français | fr |
---------------------------------
The second table:
--------------------------------------
| id | language_id | translated_text |
--------------------------------------
| 1 | 1 | Good Morning |
| 2 | 1 | How are you? |
| 1 | 2 | Buenos dias |
| 2 | 3 | Comment ça va? |
--------------------------------------
All English text strings exist, but some of the strings in other languages don't.
I would like to show a table with ALL English text strings and corresponding translations, like:
----------------------------------------
| text_id | en | es |
----------------------------------------
| 1 | Good Morning | Buenos dias |
| 2 | How are you? | |
----------------------------------------
or
-------------------------------------------
| text_id | en | fr |
-------------------------------------------
| 1 | Good Morning | |
| 2 | How are you? | Comment ça va? |
-------------------------------------------
Any ideas?
Just keep doing left joins to the same table on the ID, adding one extra column per language...
Edited to show the English text when there is no value in the corresponding column, per a comment inquiry.
select
eng.id,
eng.translated_text InEnglish,
coalesce( spn.translated_text, eng.translated_text ) InSpanish,
coalesce( frn.translated_text, eng.translated_text ) InFrench
from
translation eng
left join translation spn
on eng.id = spn.id
and spn.Language_ID = 2
left join translation frn
on eng.id = frn.id
and frn.Language_ID = 3
where
eng.Language_id = 1
order by
eng.id
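The self-join can be tried end to end with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for MySQL (an assumption; the query uses nothing dialect-specific), and the French join filters on its own alias, frn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translation (id INTEGER, language_id INTEGER, translated_text TEXT)"
)
# Rows from the question: language 1 = English, 2 = Spanish, 3 = French.
conn.executemany(
    "INSERT INTO translation VALUES (?, ?, ?)",
    [
        (1, 1, "Good Morning"),
        (2, 1, "How are you?"),
        (1, 2, "Buenos dias"),
        (2, 3, "Comment ça va?"),
    ],
)

rows = conn.execute(
    """
    SELECT eng.id,
           eng.translated_text AS InEnglish,
           COALESCE(spn.translated_text, eng.translated_text) AS InSpanish,
           COALESCE(frn.translated_text, eng.translated_text) AS InFrench
    FROM translation eng
    LEFT JOIN translation spn ON eng.id = spn.id AND spn.language_id = 2
    LEFT JOIN translation frn ON eng.id = frn.id AND frn.language_id = 3
    WHERE eng.language_id = 1
    ORDER BY eng.id
    """
).fetchall()
conn.close()
```

Here `rows` holds one tuple per English string, with the English text filled in wherever a translation is missing. Dropping the COALESCE calls would leave those cells as NULL instead, which is closer to the blank cells in the question's desired output.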