OpenStreetMap: Bring ways of a motorway into the correct order - gis

I want to analyze the duration of speed limits on German motorways.
For this I chose relation 20904, which is a German motorway: https://www.openstreetmap.org/relation/20904.
I expected the members (way pieces) to be in the correct order, meaning that each way is followed by the way that continues it. But some ways are not sorted at all. For example, the last member of that relation lies somewhere in the middle of the motorway.
How can I sort the members of the relation into sequential order along the road? I couldn't find anything about it, so I think I have misunderstood something fundamental about working with OSM.
I tried to sort the ways in a JSON export by matching the last and the first node of each way, but here again I'm not sure whether the nodes are in the correct order.
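The endpoint-matching idea described above can be sketched roughly as follows. This is only a sketch under stated assumptions: the `ways` dictionary (way id mapped to its ordered node ids) is a hypothetical input shape, the ids are made up, every way is assumed to be one-way with its nodes listed in the direction of travel, and no two ways are assumed to start at the same node (so ramps and splits are not handled).

```python
def chain_ways(ways):
    """Group ways into chains in which each way starts where the previous one ends.

    `ways` maps a way id to its ordered list of node ids (hypothetical shape).
    """
    # Index every way by its first node; assumes no two ways start at the same node.
    by_first_node = {nodes[0]: way_id for way_id, nodes in ways.items()}
    last_nodes = {nodes[-1] for nodes in ways.values()}

    # A chain starts at a way whose first node is not the end of any other way.
    starts = [way_id for way_id, nodes in ways.items() if nodes[0] not in last_nodes]

    chains = []
    for start in starts:
        chain, seen, current = [], set(), start
        while current is not None and current not in seen:
            chain.append(current)
            seen.add(current)
            # Follow the way that begins at the current way's last node, if any.
            current = by_first_node.get(ways[current][-1])
        chains.append(chain)
    return chains


# Made-up ids: three connected ways listed out of order, plus one unconnected way.
example = {30: [5, 6, 7], 10: [1, 2, 3], 20: [3, 4, 5], 99: [50, 51]}
print(chain_ways(example))
```

If the relation contains a separate carriageway per direction, this would return one chain per direction, and any gap in the data would show up as an additional chain.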

Related

SQL - Finding rows with unknown, but slightly similar, values?

I am trying to write a query that returns rows whose values in the "Name" column are similar.
My issue is that my SQL database contains entries like the following:
NAME          DOB
Doe, John     1990-01-01
Doe, John A   1990-01-01
I would like a query that returns similar, but not exact, duplicates of the "Name" column. Since I do not know exactly which patients this occurs for, I cannot just query for "Doe, John%".
I have written this query using MySQL Workbench:
SELECT Name, DOB, id, COUNT(*)
FROM Table
GROUP BY DOB
HAVING COUNT(*) > 1;
However, this returns far too many rows whose Name values are not similar at all. Is there any way to narrow down the results to include only similar (but not exactly duplicate!) names? It seems impossible, since I do not know exactly which rows have similar names, but I figured I'd ask some experts.
To be clear, this is not a duplicate of the other question posted, since I do not know the content of the two (or more) strings, whereas that poster seemed to know some of the content. Ideally, I would like the query to limit results to rows where the first 3 or 4 characters of the "Name" column are the same.
But again, I do not know the content of the strings in question. I hope this helps clarify my issue.
What I intend to do with these results is manually audit the rest of the information in each of the duplicate rows (over 90 other columns per row may or may not have abstract information in them that must be accurate) and then delete the unneeded row.
I would just like to get the most concise and accurate list I can to go through, so I don't have to scroll through over 10,000 rows looking for similar names.
For the record, I do know for a fact that the two rows will have identical names up to the middle initial. In the past, someone used a tool that exported names from one database to my SQL database, and those names included middle initials. Since then, I have imported another list that does not include middle initials. I am looking for the rows from that first subset that have middle initials.
This is a very large topic, and the effort depends on what you consider "similar" and how the data is structured. For example, do you also want to match Doe, Johnathan?
Several algorithms exist, but they can be extremely resource intensive when matching on name alone if you have a large data set. That is why it typically works better to first narrow the possible matches using other attributes such as DOB, email, or address, and then compare names.
When comparing, you can use algorithms such as Jaro-Winkler, Levenshtein distance, or n-grams. But you should also consider the "confidence" of a match by looking at the other information, as suggested above.
The issue with matching addresses is that you have the same fuzzy logic problems, e.g. "1st" vs "first". So if going this route, I would convert addresses into GPS coordinates using another service and then accept records within X amount of distance.
And the age-old issue with this is matching a husband and wife. I personally know a married couple both named Michael Hatfield. So you could try to bring in the gender of the name, but then Terry, Tracy, etc. can be either.
Bottom line: only go the route of name similarity if you have to, and if you do, look into other solutions such as services by Melissa Data or SQL Server Data Quality Services.
Update per the comment about the middle initial: if you always know the name will be the same except for the middle initial, then this task can be fairly simple and does not need any complicated algorithm. You could match on one string + '%' being LIKE the other, then test that the lengths differ by exactly 2 and that the longer string has one more space than the shorter one. Or you could attempt to cleanse/remove the middle initial; this can be a little complicated if the name itself has a space in it, as in Doe, Ann Marie. But you could do it by testing whether the second-to-last character is a space.
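The middle-initial check from the update could look roughly like the sketch below. It runs against an in-memory SQLite database via Python's sqlite3 module purely so it is self-contained; the `patients` table, its columns, and the sample rows are assumptions for illustration. Instead of LIKE it compares a substring, which is a swapped-in variation that avoids surprises when a name contains `%` or `_`. In MySQL the same idea would use CONCAT, CHAR_LENGTH and SUBSTRING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, dob TEXT);
INSERT INTO patients (name, dob) VALUES
  ('Doe, John',      '1990-01-01'),
  ('Doe, John A',    '1990-01-01'),
  ('Doe, Jane',      '1990-01-01'),
  ('Roe, Ann',       '1985-05-05'),
  ('Roe, Ann Marie', '1985-05-05');
""")

# Pair each short name (s) with a long name (l) that has the same DOB,
# is exactly two characters longer, and starts with the short name plus a space.
query = """
SELECT s.id, s.name, l.id, l.name, s.dob
FROM patients s
JOIN patients l
  ON l.dob = s.dob
 AND LENGTH(l.name) = LENGTH(s.name) + 2
 AND SUBSTR(l.name, 1, LENGTH(s.name) + 1) = s.name || ' '
ORDER BY s.id
"""
pairs = conn.execute(query).fetchall()
for pair in pairs:
    print(pair)
```

With this sample data, 'Roe, Ann Marie' is not reported as a duplicate of 'Roe, Ann', because the length difference is more than two characters.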

Infinite Sub Category Ordering in MySQL

Given this table
Is it possible to write a SELECT query that would produce an order such as this, and that works regardless of the number of subcategory levels in the table? I can write queries that fetch the correct order for a hard-coded number of expected subcategory levels, but I run into difficulty when that number is unknown.
Miscellaneous
Personal
Bobby
Jane
Susie
Tom
Other
Work Lunches
Or would I have to adjust the schema?
Well, the best way is to manage the hierarchical data via the "nested set" model:
http://en.wikipedia.org/wiki/Nested_set_model
It may seem alien at first, but it is just fantastic once you get your head around it. (Yes, I am using it, and it works great.)
Of course this means you have to change your schema a bit (to include the left and right values), and the selects/inserts/updates are different. But you can select or re-attach whole branches in one go very easily.
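As a rough illustration of the nested set idea, here is a minimal sketch using Python's sqlite3 module with an in-memory database so it runs on its own. The `categories` table, the `lft`/`rgt` column names, and the particular tree (including which names are children of which) are assumptions, not the asker's actual data. Ordering by `lft` yields the depth-first order, and the depth of a row is the number of rows whose interval encloses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (name TEXT, lft INTEGER, rgt INTEGER);
-- Assumed tree: Bobby and Jane are children of Personal.
INSERT INTO categories VALUES
  ('Miscellaneous', 1, 2),
  ('Personal',      3, 8),
  ('Bobby',         4, 5),
  ('Jane',          6, 7),
  ('Work Lunches',  9, 10);
""")

# Every ancestor's (lft, rgt) interval contains the node's lft, so counting
# the enclosing rows (minus the node itself) gives the depth at any nesting level.
rows = conn.execute("""
    SELECT node.name, COUNT(parent.name) - 1 AS depth
    FROM categories AS node
    JOIN categories AS parent
      ON node.lft BETWEEN parent.lft AND parent.rgt
    GROUP BY node.lft, node.name
    ORDER BY node.lft
""").fetchall()

for name, depth in rows:
    print("  " * depth + name)
```

The query itself does not depend on how deep the tree is, which is the property the question asks for.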

How slow is the LIKE query on MySQL? (Custom fields related)

Apologies if this is redundant, and it probably is; I had a look but couldn't find a question here that covered what I wanted to know.
Basically we have a table with about 50,000 rows, and it's expected to grow much bigger than that. We need to allow admin users to add custom data to an item based on its category, and users can pick which of the administrator-defined fields they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields with item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values, which links values with fields; this is how we handled things in .NET. The project is using CakePHP though, and we're just learning as we go, so it can get a bit annoying at times.
However, I'm thinking of maybe just adding an item_custom_fields table that is essentially the item_id and a text field that stores XML-ish formatted data. This is just for the values of the custom fields.
There is no problem if I want to fetch an item by its id, as the required data is stored in the items table, but what if I wanted to search on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user input related issues aside) be practical if I wanted to fetch items made of plastic in this case? Like how slow would that be?
Thanks.
Edit: I was afraid of that as realistically this thing will be around 400k rows for that one table at launch, thanks guys.
Any LIKE pattern that starts with % cannot use an index you have on the column, so the query will scan the whole table to find the result.
The response time depends heavily on your machine and the size of the table, but it definitely won't be efficient.
Your previous/existing solution (if well indexed) should be quite a bit faster.
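For comparison, a lookup against the normalized tables might look like the sketch below. It uses Python's sqlite3 module with an in-memory database so it is self-contained; the table names come from the question, but the column names, the index, and the sample rows are assumptions. The point is that the search becomes an equality match on indexed columns instead of a leading-wildcard LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_fields (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_field_values (item_id INTEGER, field_id INTEGER, value TEXT);
-- Composite index so a lookup by field and value does not need a full scan.
CREATE INDEX idx_field_value ON item_field_values (field_id, value);
INSERT INTO item_fields VALUES (1, 'material'), (2, 'colour');
INSERT INTO item_field_values VALUES
  (100, 1, 'Plastic'), (100, 2, 'Red'),
  (101, 1, 'Wood'),    (102, 1, 'Plastic');
""")

# Equality predicates on (field_id, value) can be served by the index.
plastic_items = [row[0] for row in conn.execute("""
    SELECT v.item_id
    FROM item_field_values AS v
    JOIN item_fields AS f ON f.id = v.field_id
    WHERE f.name = ? AND v.value = ?
    ORDER BY v.item_id
""", ("material", "Plastic"))]
print(plastic_items)
```

Passing the field name and value as parameters also sidesteps the user-input concerns mentioned in the question.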

Joining & Grouping SQL Results

I'm building a small cinema booking system as a PHP web application.
The database has a Film and a Showing table (amongst others, but those are not important here).
A Showing has a date and a time, and each Showing is for one Film.
A Film can have many Showings.
I'm trying to build a query that gets the film_name, showing_date and showing_time, but I want to group the results so that the same film does not appear multiple times, since a film can have more than one showing on the same date.
I have this SQL:
SELECT f.film_name, s.showing_date, s.showing_time
FROM film f, showing s
WHERE f.film_id = s.film_id
GROUP BY s.film_id
However, it's not showing all the times for each film, just the first one. I guess there is a lot I'm missing, and maybe I should split the showing times into a separate table, but any help would be greatly appreciated. I will post more information and diagrams if necessary.
Thanks
Assuming you want one row per film, with all showings in the same row, try:
SELECT f.film_name,
       GROUP_CONCAT(CONCAT(s.showing_date, ' ', s.showing_time)) AS showings
FROM film f
JOIN showing s ON f.film_id = s.film_id
GROUP BY f.film_id, f.film_name
You cannot do what you are asking to do.
Each row in your result set can only show one film name and one show time. If film A is showing 5 times, then you can either get a result set of five rows, all listing film A with the different show times, or, if you group by film, you will get only one row, and it will list the first show time.
Based upon what you have told us, I believe what you are looking for is some way to condense each film into one row that still lists the showing dates and times properly. To do this, you will need to collapse these rows into one row in a way that is not often used. Normally you would use some sort of aggregate function on these rows (SUM, COUNT, etc.) to give aggregate data. However, it sounds like you want to see the actual data.
To do this, there is a really helpful SO question here:
Concatenate many rows into a single text string?
The second-highest rated response talks about using XML PATH, which would probably be the cleanest way of doing it if your database supports that feature. If not, look at the accepted answer (COALESCE). I would suggest putting this type of code into a scalar function that returns one field with comma-separated showtimes. Then you could list a film and have a list of showtimes next to it.
Sorry for the confusion and for maybe wasting your time; I think I have found the solution by splitting the showing times into a separate table.
I find all of the films being shown on a certain date, then loop through them and select all the showing times for each film based on the showing id returned from the first query, as there will only be one showing of a film per day. On each loop cycle I add this information to the first result and pass the whole data set back.
There are probably better ways of doing it, but this will do for now.
Thanks
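A variation on the loop described above is to fetch everything for a date in one ordered query and group the rows in application code, which avoids one extra query per film. The sketch below uses Python's sqlite3 module with an in-memory database rather than PHP and MySQL, purely so it is runnable as written; the table layout follows the original Film/Showing description, while the column types, the `showings_on` helper, and the sample rows are assumptions.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, film_name TEXT);
CREATE TABLE showing (showing_id INTEGER PRIMARY KEY, film_id INTEGER,
                      showing_date TEXT, showing_time TEXT);
INSERT INTO film VALUES (1, 'Film A'), (2, 'Film B');
INSERT INTO showing (film_id, showing_date, showing_time) VALUES
  (1, '2024-01-05', '18:00'), (2, '2024-01-05', '19:30'),
  (1, '2024-01-05', '21:00'), (1, '2024-01-06', '18:00');
""")


def showings_on(conn, date):
    """Return {film_name: [showing_time, ...]} for one date (hypothetical helper)."""
    rows = conn.execute("""
        SELECT f.film_name, s.showing_time
        FROM film f
        JOIN showing s ON s.film_id = f.film_id
        WHERE s.showing_date = ?
        ORDER BY f.film_name, s.showing_time
    """, (date,)).fetchall()
    # groupby needs rows sorted by the grouping key, which ORDER BY guarantees.
    return {name: [time for _, time in group]
            for name, group in groupby(rows, key=lambda row: row[0])}


schedule = showings_on(conn, "2024-01-05")
print(schedule)
```

This sketch groups by film name for brevity; if two films could share a name, grouping by film_id would be the safer key.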

mySQL is there a type of array that is searchable?

I know this may be a simple question, and if I knew specifically what I was looking for I might be able to find it on my own. However, this idea is a little outside my normal line of thinking. So the question is: can I store an object/array of data in a single column that is actually searchable, without having to break the object/array down with a server-side script?
The concept is this: I have a table in my db, and it is not even finalized yet. What I was initially thinking of is a single table in which each row has a unique id and, with this id, a set of numbers (or more, if I can actually store an object). My hope is to avoid having rows of what could be redundant data. This is part of a one-to-many / many-to-many concept. The only comparison I can think of off the top of my head is Google+ and its "Circles": I want to be able to take a set of things and group them together in a Circle-like thing, so that if I choose that circle, content is shown only to those I want it shown to.
Maybe I have this all wrong. If so, it would be awesome if someone could point me in a more solid direction. Bottom line: I have a series of tables that share one distinct, unique ID across all of them. This table is meant to bridge some of those IDs to other things I have in the works, so that I can group these IDs together under one distinct id.
You probably want to implement an ER diagram like this example:
A user can have zero, one or more circles.
Then there's a many-to-many relationship between users and circles.
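The many-to-many relationship described above is usually stored in a separate junction table rather than in an array column. Here is a minimal sketch using Python's sqlite3 module with an in-memory database; all table names, column names, the `members_of` helper, and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE circles (id INTEGER PRIMARY KEY,
                      owner_id INTEGER REFERENCES users(id),
                      name TEXT);
-- Junction table: one row per (circle, member) pair.
CREATE TABLE circle_members (
    circle_id INTEGER REFERENCES circles(id),
    user_id   INTEGER REFERENCES users(id),
    PRIMARY KEY (circle_id, user_id));
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO circles VALUES (10, 1, 'Family'), (11, 1, 'Work');
INSERT INTO circle_members VALUES (10, 2), (10, 3), (11, 3);
""")


def members_of(conn, circle_id):
    """Return the names of all users in a circle (hypothetical helper)."""
    rows = conn.execute("""
        SELECT u.name
        FROM circle_members AS m
        JOIN users AS u ON u.id = m.user_id
        WHERE m.circle_id = ?
        ORDER BY u.name
    """, (circle_id,))
    return [row[0] for row in rows]


print(members_of(conn, 10))
print(members_of(conn, 11))
```

Because each membership is its own row, it can be indexed and searched directly, with no need to unpack a stored array in server-side code.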