Finding the route with connecting flights using N1QL - couchbase

I am trying to extend the sample travel app available in Couchbase to display flights from a source to a destination, with possible routes that include connecting flights, similar to what we generally see when searching Expedia or similar travel sites.
For example: Source: Orlando, Destination: Dayton. I am trying to display the following:
1. Direct flights from Orlando to Dayton
2. Flights with one connection (Orlando -> New York, New York -> Dayton)
3. Other flights with more than one connecting flight
Is there a way we can achieve this using N1QL?
Note: Using the travel sample data available in the couchbase installation.

There is no way to do such queries for arbitrary-length paths. This is a well-known limitation of traditional SQL that N1QL shares. Some SQL dialects permit recursive queries (for example, recursive common table expressions), and those can solve this problem, but N1QL isn't one of them.
The best that N1QL can offer is (a query for direct flights UNION ALL a query for paths of length 2 UNION ALL a query for paths of length 3 UNION ALL ...), but then you need to know the maximum path length you will accept.
The queries for paths of length 2 or more would be written using JOINs.
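As a rough illustration of that UNION ALL approach, here is a minimal sketch in Python, with SQLite standing in for Couchbase. It assumes a hypothetical flat table route(sourceairport, destinationairport), loosely modeled on the route documents in the travel sample. Real N1QL syntax and the bucket's document structure differ, so treat this only as the shape of the query: one branch per path length, with each extra leg adding one more self-join.

```python
import sqlite3

# Hypothetical flat schema, loosely modeled on travel-sample route documents.
def build_db(routes):
    """Create an in-memory table holding one row per direct flight."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE route (sourceairport TEXT, destinationairport TEXT)")
    conn.executemany("INSERT INTO route VALUES (?, ?)", routes)
    return conn

# Direct flights UNION ALL one-stop flights. Accepting three legs would need
# a third branch with one more self-join, and so on for each extra leg.
UP_TO_TWO_LEGS = """
SELECT r1.sourceairport, NULL, r1.destinationairport
FROM route r1
WHERE r1.sourceairport = :src AND r1.destinationairport = :dst
UNION ALL
SELECT r1.sourceairport, r1.destinationairport, r2.destinationairport
FROM route r1
JOIN route r2 ON r2.sourceairport = r1.destinationairport
WHERE r1.sourceairport = :src
  AND r2.destinationairport = :dst
  AND r1.destinationairport NOT IN (:src, :dst)  -- skip redundant legs
"""

def find_routes(conn, src, dst):
    """Return (origin, connection or None, destination) tuples."""
    return conn.execute(UP_TO_TWO_LEGS, {"src": src, "dst": dst}).fetchall()
```

For example, with direct flights MCO->DAY, MCO->JFK and JFK->DAY loaded, find_routes(conn, "MCO", "DAY") returns both the direct flight and the connection through JFK.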
You'll also need to avoid odd things such as redundant legs, as in this path:
Toronto -> New York -> Atlanta -> New York -> Miami
I suggest you implement the search for these multi-step flights in the app server using something like breadth-first search, and use N1QL to retrieve eligible flights at each step.
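Here is a minimal Python sketch of that breadth-first search. The callback fetch_destinations is hypothetical: in the app, it would run a N1QL query returning the airports reachable by one direct flight from a given airport. Here, a plain dictionary stands in for it. The search drops any path that revisits an airport, which rules out redundant legs like the New York example, and max_legs caps the search depth.

```python
from collections import deque

def find_itineraries(src, dst, fetch_destinations, max_legs=3):
    """Breadth-first search from src to dst, returning airport lists.

    fetch_destinations(airport) is a hypothetical callback standing in for a
    N1QL query that returns airports reachable by one direct flight.
    """
    itineraries = []
    queue = deque([[src]])              # each entry is a partial path
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_legs:   # path already uses the maximum legs
            continue
        for nxt in fetch_destinations(path[-1]):
            if nxt in path:             # revisiting an airport = redundant leg
                continue
            if nxt == dst:
                itineraries.append(path + [nxt])
            else:
                queue.append(path + [nxt])
    return itineraries

FLIGHTS = {  # illustrative data only
    "MCO": ["JFK", "ATL", "DAY"],
    "JFK": ["DAY", "ATL"],
    "ATL": ["JFK", "DAY"],
}
```

Because the search is breadth-first, itineraries come out ordered by number of legs, so direct flights are listed first. In a real app, you would probably cache fetch_destinations per airport, since the same airport can be expanded many times.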

Related

Database structure for a mobile app: which is faster?

The question may be a bit complex, which is why I need to give a brief introduction.
My team and I are developing a mobile app with Node.js, and we are now designing the database structure. Our idea is to use Azure SQL, but we have a couple of questions about the structure of the database.
We currently offer 5 services, and each user can be assigned several of them (possibly none or all). Based on the services a user has, they will be redirected to a screen showing all the services, where only the assigned ones are in color (and can be clicked) and the others are gray (so they can't be clicked).
Which is better, create one column per service or all services in a single column array style?
for example
service 1 | service 2 | service 3 | service 4 | service 5
true      | false     | true      | true      | false
or
service
[service 1,service 2,service 3,service 4,service 5]
I'm asking because if we have x services in the future, going through the entire array and checking each entry for the user's services may slow the app down, whereas reading a specific column might be faster.
I hope the question is clear.
Regards
This isn't really an Azure SQL question but more of a relational database question.
In general, you should avoid both of these methods and normalize your database instead.
Relational databases are built to join multiple tables efficiently; with proper keys and indexes, the extra joins are not a performance problem.
The best option in my experience is a many-to-many relationship with a junction table.
So you have one table that holds the original data without any mention of a service, maybe called Entities:
Id, Data, Time, Active
Another table that holds the relations to the services, called EntitiesToServices
Id, ServiceId, OtherTableId (the Id of the row in Entities)
And a third table that holds data about the services called Services
Id, Name
This way, you can add services freely, and add more tables, without them interfering with each other.
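Here is a minimal sketch of this three-table design, using Python's built-in sqlite3 module in place of Azure SQL. The column types and the sample rows are illustrative. The LEFT JOIN returns every service together with a 0/1 flag for one entity, which maps directly onto the colored/gray screen described in the question. Adding a sixth service is just one more row in Services.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Entities (Id INTEGER PRIMARY KEY, Data TEXT, Time TEXT, Active INTEGER);
CREATE TABLE Services (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE EntitiesToServices (
    Id INTEGER PRIMARY KEY,
    ServiceId INTEGER NOT NULL REFERENCES Services(Id),
    OtherTableId INTEGER NOT NULL REFERENCES Entities(Id),
    UNIQUE (OtherTableId, ServiceId)
);
"""

# Every service, plus a 0/1 flag saying whether this entity (user) has it:
# 1 -> show in color (clickable), 0 -> show in gray.
SERVICES_FOR_ENTITY = """
SELECT s.Name, CASE WHEN ets.Id IS NULL THEN 0 ELSE 1 END AS assigned
FROM Services s
LEFT JOIN EntitiesToServices ets
       ON ets.ServiceId = s.Id AND ets.OtherTableId = ?
ORDER BY s.Id
"""

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Services (Id, Name) VALUES (?, ?)",
                     [(i, f"service {i}") for i in range(1, 6)])
    conn.execute("INSERT INTO Entities (Id, Data, Time, Active) VALUES (1, 'user 1', NULL, 1)")
    # User 1 has services 1, 3 and 4, as in the true/false example.
    conn.executemany("INSERT INTO EntitiesToServices (ServiceId, OtherTableId) VALUES (?, 1)",
                     [(1,), (3,), (4,)])
    return conn

def services_for(conn, entity_id):
    return conn.execute(SERVICES_FOR_ENTITY, (entity_id,)).fetchall()
```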
If all you need is a set of up to 64 true/false values, consider a single column of type SET (a MySQL column type). Similarly, you could use an INT of any size (again with a limit of 64 flags, for a BIGINT) and turn each 'service' on or off as a bit. Today you have 5; tomorrow, as you say, there will be more.
The syntax for SET is a bit clumsy. So is using INT for this.
It is very compact; this may or may not be a bonus.
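To show what the INT approach looks like, here is a minimal Python sketch of the bit arithmetic. The mapping of service 1 to bit 0, service 2 to bit 1, and so on is an assumption. In MySQL, the same tests can be written in SQL with the & and | operators on a BIGINT UNSIGNED column.

```python
MAX_FLAGS = 64  # a 64-bit integer column holds at most 64 flags

def _bit(service_no: int) -> int:
    """Bit mask for a 1-based service number (assumed: service 1 -> bit 0)."""
    if not 1 <= service_no <= MAX_FLAGS:
        raise ValueError(f"service number must be 1..{MAX_FLAGS}")
    return 1 << (service_no - 1)

def has_service(flags: int, service_no: int) -> bool:
    return (flags & _bit(service_no)) != 0

def grant(flags: int, service_no: int) -> int:
    return flags | _bit(service_no)

def revoke(flags: int, service_no: int) -> int:
    return flags & ~_bit(service_no)

# Services 1, 3 and 4 enabled, matching the true/false row in the question.
flags = grant(grant(grant(0, 1), 3), 4)
```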
Normalizing (as mentioned in another answer) may be a better solution, especially if you need to store more than just on/off for each service for each user.
Please provide more details on what actions will happen with these flags; then we can get into more detail.

Can I integrate a MySQL and a PostgreSQL database into a Spring MVC Maven Project?

I'm working on a project where I have been using MySQL as the primary database for a number of entities, with their relations in place. The purpose of the project is to rate neighbourhoods in my hometown based on a number of criteria, one of which is house prices. I have an Excel sheet with 36,000+ records of house sales, containing their prices and the latitude and longitude of each house. A user will search for a location, and a latitude and longitude will be calculated from it. I aim to return the latitude, longitude and price of every house sale within a radius of 1 km, so that I can calculate the average house price for that area. Please see the screenshot below for a visual aid of what I'm talking about.
Based on my research, the PostGIS extension for PostgreSQL seems to be the way to go, as it has a lot of tools for querying data based on geographic criteria.
I already have a MySQL database in place, and I have no intention of creating relationships between the entities in that database and the house prices in a potential PostgreSQL database. Is it possible to have two separate databases in one Spring MVC project, or should I put everything in one database?
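Whichever database ends up holding the house sales, the radius search itself is a distance filter followed by an average. The sketch below does it in plain Python with the haversine formula. It assumes sales are (latitude, longitude, price) tuples and treats the Earth as a sphere with a mean radius of 6,371 km. It scans every record, which is workable for 36,000 rows but is exactly what a spatial index avoids. For reference, PostGIS usually expresses this filter with ST_DWithin on a geography column, and MySQL 5.7 and later also provide ST_Distance_Sphere.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical model)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def average_price_within(sales, lat, lon, radius_m=1000):
    """Average price of sales within radius_m of (lat, lon), or None if none."""
    prices = [price for s_lat, s_lon, price in sales
              if haversine_m(lat, lon, s_lat, s_lon) <= radius_m]
    return sum(prices) / len(prices) if prices else None
```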

How can I extract location hierarchy from Openstreetmap?

I need to store the first parent of every location entity in a MySQL database, so that in the end I have the complete hierarchy. For example, I need to know that Berlin is part of Germany and store Germany as the first parent of Berlin in the table. How can I query OSM for this information?
You can't query OSM directly for this information. OSM does contain it, mainly through boundary relations and their admin_level tags, but the exact hierarchy between different elements has to be calculated first.
Geocoders for OSM can be used to obtain this information. The currently most popular one is Nominatim. You can install your own Nominatim instance by importing either the whole planet or a country or area extract, and then try to obtain the information from the database that Nominatim creates.
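Once the enclosing boundaries of a place are known (for example, from the tables Nominatim builds), picking the first parent is a matter of comparing admin_level values. A lower value means a larger unit, and countries use admin_level 2. The sketch below assumes a hypothetical input list of (name, admin_level) pairs. The levels in the example are illustrative, and real tagging varies between countries. The result can then be stored in MySQL as an ordinary parent_id column.

```python
def parent_chain(boundaries):
    """Names of the enclosing boundaries, largest unit first.

    boundaries: hypothetical list of (name, admin_level) pairs for every
    administrative boundary containing the place.
    """
    return [name for name, _ in sorted(boundaries, key=lambda b: b[1])]

def first_parent(place_level, boundaries):
    """Closest enclosing boundary: the highest admin_level below the place's own."""
    candidates = [b for b in boundaries if b[1] < place_level]
    return max(candidates, key=lambda b: b[1])[0] if candidates else None

# Illustrative levels only: a place at level 4 inside a country at level 2.
example = [("Germany", 2)]
```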

Method to Match Multiple Columns Dynamically in one Table to another Table (e.g. Address, City, State, Zip)

I'm using SQL Server 2008 with T-SQL syntax. I know a little bit of C# and Python, so I could possibly venture down those paths to make this work.
The Problem:
I have multiple databases where I have to match the customers in each database to a "Master Customer" file.
It's basically mapping the customers at those distributors to the supplier level.
Each of the 8 databases has 3-8 million customers that have to be matched to the Supplier table (1,800 customers).
I'd rather not play an Excel "matching game" for 3-4 weeks (30 million customers). I need some shortcuts, as doing this by hand is exhausting.
This is one Distributor Table:
select
master_cust_Num,
master_cust_name,
cust_shipto_name,
cust_shipto_address,
cust_shipto_address_2,
cust_shipto_city,
cust_shipto_state,
cust_shipto_zip
from
Distributor.sales
group by
master_cust_Num,
master_cust_name,
cust_shipto_name,
cust_shipto_address,
cust_shipto_address_2,
cust_shipto_city,
cust_shipto_state,
cust_shipto_zip
This is a small snippet of what the table yields:
And I'd have to match that to a table that looks like this:
Basically, I need a function that will search the address lines in the Distributor databases for exact or closest matches. I could write case when address like '%birch%' to find all 'birch' street matches where distributor.zip_code = supplier.zip_code, but I'd rather not have to write each of those strings into the like statements.
Is there an XML trick that will pick out parts of the multiple distributor address columns to match the supplier address column? (Or maybe one column at a time, if that's easier.)
Can I do a like '%[0-9][A-Z]%' type of search? I'm open to best practices (I will award points) for tackling this beast. I'm not even sure how to tackle it other than by brute force: grouping by zip code and working out street matches from there.
The matching 'like' function (or XML, or whatever) would have to dynamically take one value, say "Birch St" from the Supplier address column, and find all matches such as "Birch Ave", "Birch St" and "Birch" that have the same zip.
Do you have SQL Server Integration Services? If so, you could use the SSIS Fuzzy Lookup to look up customers in the master customer table by address.
If you're not familiar with that feature of SSIS, it will let you specify columns to compare, how many potential matches to return, and a threshold that a comparison has to meet to be reported to you.
Here's an article specifically about matching addresses: Using Fuzzy Lookup Transformations in SQL Server Integration Services
If you don't have it available... well, then we start getting into implementing your own fuzzy-matching algorithm, which is quite possible; it's just a bit more laborious.
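If SSIS is not available, a home-grown fuzzy match can start small. The sketch below is not SSIS Fuzzy Lookup. Instead, it uses Python's standard-library difflib.SequenceMatcher after normalizing addresses: lower-casing, stripping punctuation, and mapping suffixes such as "Street"/"St" to one form. It compares only rows that share a zip code, which is the grouping idea from the question. The suffix table, the (id, address, zip) row shape, and the 0.8 threshold are assumptions to tune against your data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; extend it with the variants in your data.
SUFFIXES = {"street": "st", "st": "st", "avenue": "ave", "ave": "ave", "av": "ave",
            "road": "rd", "rd": "rd", "drive": "dr", "dr": "dr"}

def normalize(address):
    """Lower-case, replace punctuation with spaces, canonicalize street suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(SUFFIXES.get(w, w) for w in words)

def best_matches(distributor_rows, supplier_rows, threshold=0.8):
    """Best supplier match per distributor row as (dist_id, sup_id, score).

    Rows are (id, address, zip). Only rows with the same zip are compared
    (blocking), and matches scoring below the threshold are dropped.
    """
    by_zip = {}
    for sup_id, addr, zip_code in supplier_rows:
        by_zip.setdefault(zip_code, []).append((sup_id, normalize(addr)))
    matches = []
    for dist_id, addr, zip_code in distributor_rows:
        norm = normalize(addr)
        scored = [(SequenceMatcher(None, norm, s_addr).ratio(), sup_id)
                  for sup_id, s_addr in by_zip.get(zip_code, [])]
        if scored:
            score, sup_id = max(scored)
            if score >= threshold:
                matches.append((dist_id, sup_id, round(score, 3)))
    return matches
```

Rows scoring below the threshold are left out rather than guessed, so they can be reviewed by hand.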

How to design this "bus stations" database?

I want to design a database about bus stations. There are about 60 buses in the city, and each of them has the following information:
BusID
BusName
Lists of stations in the way (forward and back)
The database must be efficient for searching: for example, when a user wants to list the buses that go through stations A and B, the query must run quickly.
My first thought is to put the stations in a separate table with StationId and Station, so that a bus's list of stations contains those StationIds. I guess it may work, but I'm not sure that it's efficient.
How can I design this database?
Thank you very much.
Have you looked at Database Answers to see if there is a schema that fits your requirements?
I had to solve this problem, and I used this:
Line: number, name
Station: name, latitude, longitude, is_terminal
Holiday: date, description
Route: line_id, from_terminal (station_id), to_terminal (station_id)
Route schedule: route_id, is_holiday_schedule, starting_at
Route stop: route_id, station_id, elapsed_time_from_start (in minutes)
Does this look good to you?
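To check that this schema answers the original question (which lines go from station A to station B), here is a minimal SQLite sketch via Python. It keeps only the columns the query needs and assumes an integer id on each table, which the schema listing does not show. A route runs from A to B if A's elapsed_time_from_start is smaller than B's on the same route.

```python
import sqlite3

SCHEMA = """
CREATE TABLE line (id INTEGER PRIMARY KEY, number TEXT, name TEXT);
CREATE TABLE route (id INTEGER PRIMARY KEY, line_id INTEGER,
                    from_terminal INTEGER, to_terminal INTEGER);
CREATE TABLE route_stop (route_id INTEGER, station_id INTEGER,
                         elapsed_time_from_start INTEGER);
"""

# A route serves A -> B when it stops at both and reaches B later than A.
LINES_FROM_A_TO_B = """
SELECT DISTINCT l.number
FROM route_stop a
JOIN route_stop b ON b.route_id = a.route_id
                 AND b.elapsed_time_from_start > a.elapsed_time_from_start
JOIN route r ON r.id = a.route_id
JOIN line l ON l.id = r.line_id
WHERE a.station_id = ? AND b.station_id = ?
ORDER BY l.number
"""

def lines_between(conn, station_a, station_b):
    return [row[0] for row in conn.execute(LINES_FROM_A_TO_B, (station_a, station_b))]

def demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO line VALUES (?, ?, ?)",
                     [(1, "12", "North"), (2, "34", "Cross-town")])
    # Line 12 runs 1 -> 2 -> 3 and back; line 34 only runs 2 -> 4.
    conn.executemany("INSERT INTO route VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 3), (2, 1, 3, 1), (3, 2, 2, 4)])
    conn.executemany("INSERT INTO route_stop VALUES (?, ?, ?)", [
        (1, 1, 0), (1, 2, 5), (1, 3, 10),
        (2, 3, 0), (2, 2, 5), (2, 1, 10),
        (3, 2, 0), (3, 4, 7),
    ])
    return conn
```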
Some random thoughts based on traveling on London buses In My Youth, because I think this could get quite complex.
You might need entities for the following:
Bus -- the physical entity, with a particular model (i.e. a particular seating capacity, disabled access, dimensions, etc.) and VIN.
Bus stop -- the location at which a bus stops. Usually bus stops come in pairs, one for each side of the road, but sometimes they are on a one-way road.
Route -- a sequence of bus stops and the route between them (multiple possible roads exist). Sometimes buses do not run the entire route, or skip stops (fast service). Is a route just one direction, or is it both? Maybe a route is actually a loop, not a there-and-back.
Service -- a bus following a certain route
Scheduled Run -- an event when a bus on a particular service follows a particular route. It starts at some part of the route, ends at another part, and maybe skips certain stops (see Route above).
Actual Run -- a particular bus followed a particular scheduled run. What time did it start, what time did it get to particular stops, how many people got on and off, what kind of ticket did they have?
(This sounds like homework, so I won't give a full answer.)
It seems like you just need a many-to-many relationship between buses and stops using 3 tables. A query with two inner joins will give you the buses that stop at two specific stops.
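Here is a minimal sketch of that many-to-many design and the two-join query, using SQLite through Python. The table and column names are hypothetical. Note that the query only checks that a bus stops at both stations, not in which order.

```python
import sqlite3

SCHEMA = """
CREATE TABLE bus (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stop (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bus_stop (bus_id INTEGER REFERENCES bus(id),
                       stop_id INTEGER REFERENCES stop(id),
                       PRIMARY KEY (bus_id, stop_id));
-- Lets the database find all buses for a given stop without a full scan.
CREATE INDEX bus_stop_by_stop ON bus_stop (stop_id, bus_id);
"""

# Two inner joins: a second copy of bus_stop for the other stop, then the bus row.
BUSES_SERVING_BOTH = """
SELECT b.name
FROM bus_stop s1
JOIN bus_stop s2 ON s2.bus_id = s1.bus_id
JOIN bus b ON b.id = s1.bus_id
WHERE s1.stop_id = ? AND s2.stop_id = ?
ORDER BY b.name
"""

def demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO bus VALUES (?, ?)", [(1, "Bus 1"), (2, "Bus 2"), (3, "Bus 3")])
    conn.executemany("INSERT INTO stop VALUES (?, ?)", [(i, f"Stop {i}") for i in range(1, 5)])
    conn.executemany("INSERT INTO bus_stop VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 1), (3, 4)])
    return conn

def buses_serving(conn, stop_a, stop_b):
    return [row[0] for row in conn.execute(BUSES_SERVING_BOTH, (stop_a, stop_b))]
```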
I'd hack it.
bus_id int
path varchar(max)
If a bus goes through the following stations (in this order):
01
03
09
17
28
Then I'd put in a record where path was set to
'-01-03-09-17-28-'
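When someone wants to find a bus to get from station 03 to 28, my select statement is
select * from buses where path like '%-03-%28-%'
(28 is written without a leading hyphen because the hyphen after 03 is shared with the next station; with '%-03-%-28-%', a path like '-03-28-', where 28 directly follows 03, would not match.)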
Not scalable, not elegant, but dead simple and won't churn through tables like mad when trying to find a route. Of course, it only works if there's a single bus that goes through the two stations in question.
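The path-string hack can be tried out with SQLite from Python. One detail in this sketch is worth noting: the pattern writes the destination code without a leading hyphen ('%-03-%28-%' rather than '%-03-%-28-%'). That is because the hyphen after 03 is shared with the next station, so the hyphenated pattern would miss a bus where 28 directly follows 03. That fix assumes all station codes have the same width, as the two-digit codes above do.

```python
import sqlite3

def make_db(paths):
    """paths: {bus_id: '-01-03-...-'} strings, one fixed-width code per station."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE buses (bus_id INTEGER PRIMARY KEY, path TEXT)")
    conn.executemany("INSERT INTO buses VALUES (?, ?)", paths.items())
    return conn

def buses_from_to(conn, start, end):
    # e.g. '%-03-%28-%': the start code with both hyphens, then the end code
    # with only its trailing hyphen, so adjacent stations ('-03-28-') match too.
    pattern = f"%-{start}-%{end}-%"
    rows = conn.execute("SELECT bus_id FROM buses WHERE path LIKE ? ORDER BY bus_id",
                        (pattern,))
    return [r[0] for r in rows]
```

Because the start code must appear before the end code in the string, the same query also respects direction: a bus running 28 -> 03 is not returned for a trip from 03 to 28.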
What you have thought of is good, though in some cases it may or may not be efficient. I think you should create the tables as table1(BusID, BusName) and table 2(Station List, Bus Id); I think this would help. Then use joins between these two tables to get the result. One more thing: if possible, normalize the tables; that will help you.
I'd go for 3 tables:
bus
stations
bus_stations
"bus" for what the name says, "stations" for the station IDs and names, and "bus_stations" to connect the other 2 tables, which would have bus_id, station_id_from and station_id_to.
This is probably more complex than you really need, but it will be useful if, in the future, you need to know the full trajectory of a bus, or which station a bus is coming from when it arrives at "B station".
That said, 60 buses will not have much impact on performance.
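With a bus_stations table of (bus_id, station_id_from, station_id_to) rows, the full trajectory and the previous station can be recovered in application code. Here is a minimal Python sketch. It assumes the rows for one bus and one direction have already been fetched as (station_id_from, station_id_to) pairs, and that the trip is a simple chain without loops. The function names are hypothetical.

```python
def trajectory(legs):
    """Ordered station list for one bus and one direction.

    legs: (station_id_from, station_id_to) pairs, in any order.
    """
    next_stop = dict(legs)
    if len(next_stop) != len(legs):
        raise ValueError("a station has more than one outgoing leg")
    # The first station is the only one that never appears as a destination.
    starts = set(next_stop) - set(next_stop.values())
    if len(starts) != 1:
        raise ValueError("expected exactly one starting station")
    stop = starts.pop()
    path = [stop]
    while stop in next_stop:
        stop = next_stop[stop]
        path.append(stop)
        if len(path) > len(legs) + 1:
            raise ValueError("legs contain a loop")
    if len(path) != len(legs) + 1:
        raise ValueError("legs do not form a single chain")
    return path

def previous_station(legs, station):
    """Station the bus comes from when it arrives at `station` (None at the start)."""
    for frm, to in legs:
        if to == station:
            return frm
    return None
```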