DB Design regarding intersections - mysql

Of these two designs for a crossroad database
#Street
street_id | street_nm
#Crossing
crossing_id | x | y | street_id_1 | street_id_2
VS
#Street
street_id | street_nm
#Crossing
crossing_id | x | y
#street crossing relationship
street_id | crossing_id
Assuming every crossing has exactly two roads, is there a reason why one would use the first design over the second?
EDIT: For the second setup, how could I create a view where results look like this:
crossing_id | x | y | street_nm_1 | street_nm_2
Also, I'm not sure how a junction with three roads would affect the view.

I'd prefer the second.
First of all, "assuming every crossing has exactly two roads" is quite risky. In general, when designing I prefer not to rely on assumptions that clash with reality, because sooner or later your design will have to accommodate the "extra cases".
But the second design is better for another reason. Assuming you want a query that returns all roads crossing road "X" (which I suppose would be a pretty common requirement), the first design forces you to test for road "X"'s id in both street_id_1 and street_id_2. In general, the queries become more convoluted because whenever you are looking for a given road, you don't know whether it will be listed in id_1 or id_2.
The relationship "x crosses y" should be symmetrical (unless you want to distinguish between "main roads" and "tributaries", which does not seem to be the case here) so the second design is closer to the intent.
Regarding your question about the view... what about:
SELECT a.crossing_id, a.x, a.y, b.street_nm, c.street_nm
FROM crossing a, crossing_rel e1, crossing_rel e2, street b, street c
WHERE a.crossing_id = e1.crossing_id AND
      a.crossing_id = e2.crossing_id AND
      b.street_id = e1.street_id AND
      c.street_id = e2.street_id AND
      b.street_id <> c.street_id
(the relationship table has to be joined twice - e1 and e2 - once for each street of the pair)
note that this will not give any specific order to which street appears as street_nm_1 and which as street_nm_2, and that each crossing comes out twice (once as A/B and once as B/A)... maybe you will prefer something like:
SELECT a.crossing_id, a.x, a.y, b.street_nm, c.street_nm
FROM crossing a, crossing_rel e1, crossing_rel e2, street b, street c
WHERE a.crossing_id = e1.crossing_id AND
      a.crossing_id = e2.crossing_id AND
      b.street_id = e1.street_id AND
      c.street_id = e2.street_id AND
      b.street_nm < c.street_nm

The second solution is a little more flexible for additions to either crossings or streets while keeping the relationship between them in its proper context. It's a subtle distinction, but worth mentioning.

How to get rid of arrays in mysql database?

A dining room specializes in complex dinners. It has a collection of recipes (each recipe lists the amounts of the products it uses). Every product has a price that can change.
Is this the best design?
Recipe(r_id, r_title, r_category, r_price)
Product(p_id, p_title, p_price)
UsingProducts(r_id, p_id, amount)
I am just not sure about UsingProducts.
The design looks quite okay.
As zerkms mentioned, you're lacking units. That doesn't have to be a problem, as your product can be "100 g flour", so the unit is implicit. However, when printing the recipe, you would print "5 x 100 g flour" instead of "500 g flour". It would also print "10 x 100 g flour" instead of "1 kilo flour".
Just think about whether this is an issue for you and whether you even need unit conversion like 1000 g = 1 kilo.
Another point is your category. A recipe can only belong to one category, so you won't have something like "vegetarian" and "soups" with the problem of where to place a vegetarian soup; you use distinct categories instead. Okay. However, don't you want a table for them, to be able to select them easily? If you want to stay with this design you should at least make the column an ENUM (something special in MySQL), so you don't mistakenly have some recipes in "soups" and others in "suops".
At last: what is the r_price for? Shouldn't that be the sum of all sub-prices (product price x amount)? Don't hold data redundantly; otherwise inconsistencies can occur (e.g. 10$ + 10$ = 30$). Remove r_price from table Recipe to have a normalized database.
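For illustration, deriving the price on the fly could look like this (a minimal Python sketch; the function and the sample rows are hypothetical):
def recipe_price(using_products, product_prices):
    # using_products: list of (p_id, amount) rows for one recipe
    # product_prices: dict mapping p_id to the current p_price
    return sum(product_prices[p_id] * amount
               for p_id, amount in using_products)

# 3 x product 1 at 2.50 plus 1 x product 2 at 4.00 -> 11.5
print(recipe_price([(1, 3), (2, 1)], {1: 2.50, 2: 4.00}))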

Data structure for storing chord progression rules? [closed]

What would be the most appropriate (naturally suited) way to represent the various chord progression (musical) rules in a data-structure such that each chord had a weighted set of options that it could progress to?
This data structure would be implemented in a procedural music generation program in a way that you could code: (language-agnostic pseudo-code)
Chord[7] songArray;
Chord first = new Chord(I); // set the first chord's value
songArray[0] = first;
for (i = 0; i < 6; i++) {   // fill the remaining 6 slots
    Chord temp = songArray[i].next(); // select the following chord
    songArray[i+1] = temp;
}
Note: In classical-type music, each chord in a given key can naturally progress to another chord following these rules:
-----------------------
| Chord | Leads to    |
|=====================|
| I     | any         |
| ii    | V, vii      |
| iii   | IV, vi      |
| IV    | ii, V, vii  |
| V     | vi          |
| vi    | ii, IV, V   |
| vii   | I           |
-----------------------
The data structure would store the various progressions as weighted options. As an example, consider the IV chord in any given major key: IV can naturally progress to ii, V, or vii, but could also break the rules in progressing to any other chord. Breaking the rules would happen infrequently.
I have considered some sort of linked list/tree data structure, but it would hardly resemble any type of tree or list I've ever used -- additionally, I can't work out how to implement the weighting.
Another thought was to use JSON or something similar, but it seems to get redundant very quickly:
{
    "I": {
        "100%": [
            "I",
            "ii",
            "iii",
            "IV",
            "V",
            "vi",
            "vii"
        ]
    },
    "ii": {
        "80%": [
            "V",
            "vii"
        ],
        "20%": [
            "I",
            "ii",
            "iii",
            "IV",
            "vi"
        ]
    }
    // ...
}
Note: I am comfortable implementing this in a handful of languages, and at this point am NOT concerned with a specific language implementation, but a language-agnostic data-structure architecture.
A Markov Chain might be a good fit for this problem.
A Markov chain is a stochastic process where the probability of progressing to the next state depends only on the current state. So for a given chord from your table you would apply weights to the "Leads to" values and then randomly determine which state to progress to.
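A minimal sketch of one Markov step (Python; the transition weights are invented for illustration):
import random

# The next chord depends only on the current chord.
TRANSITIONS = {
    "ii": {"V": 0.45, "vii": 0.45,                           # rule-following
           "I": 0.04, "iii": 0.02, "IV": 0.02, "vi": 0.02},  # rule-breaking
    # ... one entry per chord in the table above
}

def next_chord(current):
    options = TRANSITIONS[current]
    return random.choices(list(options), weights=list(options.values()))[0]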
I'd expect you to have less than 100 chords, therefore if you use 32 bits to represent probability series (likely extreme overkill) you'd end up with a 100x100x4 (40000) byte array for a flat Markov matrix representation. Depending on the sparsity of the matrix (e.g. if you have 50 chords, but each one typically maps to 2 or 3 chords) for speed and less importantly space reasons you may want an array of arrays where each final array element is (chord ID, probability).
In either case, one of the key points here is that you should use a probability series, not a probability sequence. That is, instead of saying "this chord has a 10% chance, and this one has a 10% chance, and this one has an 80% chance", say "the first chord has a 10% chance, the first two chords have a 20% chance, and the first three chords have a 100% chance."
Here's why: When you go to select a random but weighted value, you can generate a number in a fixed range (for unsigned integers, 0 to 0xFFFFFFFF) and then perform a binary search through the chords rather than linear search. (Search for the element with least probability series value that is still greater than or equal to the number you generated.)
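In Python terms, the series plus binary search might look like this (a sketch with invented numbers; the bisect module does the binary search):
import bisect
import random

chords = ["V", "vii", "I"]      # hypothetical successors of some chord
series = [0.45, 0.90, 1.00]     # cumulative weights: 45%, 45%, 10%

def pick():
    r = random.random()                            # uniform in [0, 1)
    return chords[bisect.bisect_right(series, r)]  # binary search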
On the other hand, if you've only got a few following chords for each chord, a linear search would likely be faster than a binary search due to a tighter loop, and then all the probability series saves you is calculating a simple running sum of the probability values.
If you don't require the most staggeringly amazing performance (and I suspect you don't -- for a computer there's just not that many chords in a piece of music) for this portion of your code, I'd honestly just stick to a flat representation of a Markov matrix -- easy to understand, easy to implement, reasonable execution speed.
Just as a fun aside, this sort of thing lends itself well to thinking about predictive coding -- a common methodology in data compression. You might consider an n-gram based algorithm (e.g. PPM) to achieve higher-order structure in your music generation without too much example material required. It's been working in data compression for years.
It sounds like you want some form of directed, weighted graph where the nodes are the chords and the edges are the progression options with edge weights being the progression's likelihood.
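One concrete shape for such a graph (a Python sketch; the 90/10 split between rule-following and rule-breaking progressions is an invented example):
# Rules table from the question; "any" is spelled out for chord I.
RULES = {
    "I":   ["I", "ii", "iii", "IV", "V", "vi", "vii"],
    "ii":  ["V", "vii"],
    "iii": ["IV", "vi"],
    "IV":  ["ii", "V", "vii"],
    "V":   ["vi"],
    "vi":  ["ii", "IV", "V"],
    "vii": ["I"],
}

def weighted_options(chord, rule_mass=0.9):
    # rule-following successors share rule_mass; the rest share the remainder
    allowed = RULES[chord]
    others = [c for c in RULES if c not in allowed]
    if not others:                 # "any": there are no rules left to break
        rule_mass = 1.0
    opts = {c: rule_mass / len(allowed) for c in allowed}
    opts.update({c: (1 - rule_mass) / len(others) for c in others})
    return opts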

How many combinations of k neighboring pixels are there in an image?

I suck at math, so I can't figure this out: how many combinations of k neighboring pixels are there in an image? That is, combinations of k pixels out of the n * n total pixels in the image, with the restriction that they must be neighbors, for each k from 2 to n * n. I need the sum over all values of k for a program that must take into account that many elements in a set it is reasoning about.
Neighbors are 4-connected and do not wrap-around.
Once you get the number of distinct shapes for a blob of pixels of size k (here's a reference) then it comes down to two things:
How many ways on your image can you place this blob?
How many of these are the same so that you don't double-count (because of symmetries)?
Getting an exact answer is a huge computational job (you're looking at more than 10^30 distinct shapes for k=56 -- imagine if k = 10,000), but you may be able to get close enough for what you need by fitting a curve to the first 50 values of k.
(Note: the reference in the wikipedia article takes care of duplicates with their definition of A_k.)
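If you only need to sanity-check small cases, brute force at least pins down the definition (a Python sketch; hopeless beyond tiny n and k, for the reasons above):
from itertools import combinations

def connected(cells):
    # flood fill to test whether a set of (x, y) pixels is 4-connected
    cells = set(cells)
    start = next(iter(cells))
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(cells)

def count_connected(n, k):
    # count the 4-connected k-subsets of an n x n image
    grid = [(x, y) for x in range(n) for y in range(n)]
    return sum(connected(c) for c in combinations(grid, k))

print(count_connected(2, 2))   # -> 4 (the four adjacent pairs)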
It seems that you are working on a problem that can be mapped to Markovian Walks.
If I understand your question, you are trying to count paths of length k like this:
Start (end) -> any pixel after visiting k neighbours

* - - - - - *
|           |
|           |
 - - - - - -
in a structure that is similar to a chess board, and you want to connect only vertical and horizontal neighbours.
I think that you want the paths to be self-avoiding, meaning that a pixel should not be traversed twice in a walk (i.e. no loops). This condition leads to a classical problem called SAWs (Self-Avoiding Walks).
Well, now the bad news: the problem is open! No one has solved it yet.
You can find a nice intro to the problem here, starting at page 54 (or page 16; the counting is confusing because the page numbers repeat in the doc). The whole paper is very interesting and easy to read: it manages to explain the mathematical background, the historical anecdotes and the scientific importance of Markovian chains in a few slides.
Hope this helps ... to avoid the problem.
If you were planning to iterate over all possible polyominoes, I'm afraid you'll be waiting a long time. From the Wikipedia article about polyominoes, the count grows at least as fast as O(4.0626^n) and probably closer to O(8^n). By the time n=14, the count will be over 5 billion and too big to fit into an int. By the time n=30, the count will be more than 17 quintillion and you won't be able to fit it into a long. If all the world's governments pooled their resources to iterate through all polyominoes in a 32 x 32 icon, they would not be able to do it before the sun goes supernova.
Now, that doesn't mean what you want to do is intractable. It is likely that almost all the work you do on one polyomino is shared in part with others. It may be a fun task to achieve an exponential speedup using dynamic programming. What is it you're trying to accomplish?

Why does the arrow go up in inheritance?

When you draw an inheritance diagram you usually go
Base
  ^
  |
Derived
Derived extends Base. So why does the arrow go up?
I thought it meant that "Derived communicates with Base" by calling functions in it, but Base cannot call functions in Derived.
AFAIK one of the reasons is notational consistency: all other directed arrows (dependency, aggregation, composition) point from the dependant to the dependee.
In inheritance, B depends on A but not vice versa. Thus the arrow points from B to A.
In UML the arrow is called a "Generalization" relationship and it only signals that each object of class Derived is also an object of class Base.
From the UML Superstructure specification, v2.1.2:
A Generalization is shown as a line with a hollow triangle as an arrowhead between the symbols representing the involved classifiers. The arrowhead points to the symbol representing the general classifier. This notation is referred to as the "separate target style."
Not really an answer though to the question :-)
Read the arrow as "inherits from" and it makes sense. Or, if you like, think of it as the direction calls can be made.
I always think of it as B having more stuff in it than A (subclasses often have more methods than superclasses), hence B gets the wide end of the arrow and A gets the pointy end!
B is the subject, A is the object, action is "inherit". So B acts on A, hence the direction of the arrow.
I think the point is to express "generalization": A is a generalization of B. This way the arrow expresses the same concept as "extends" but goes the "right" way.
A note about ASCII notation, from the wonderful c2 wiki page. You might consider the following ASCII diagram arrows for the
IS_A relation (inheritance):
+-------+       +-----------+
| Base  |       | Interface |
+---^---+       +-----^-----+
   /_\               /_\       _
    |                 :       (_)  OtherInterface
    |                 :        |
    |                 :        |
+---------+     +----------------+
| Derived |     | Implementation |
+---------+     +----------------+
vs the
HAS_A relation (containment):
+-------+
| User  |
+-------+
    |
    |
    |
   \ /
+---V----+
| Roles  |
+--------+

Human name comparison: ways to approach this task

I'm not a Natural Language Processing student, yet I know it's not a trivial strcmp(n1, n2).
Here's what I've learned so far:
comparing Personal Names can't be solved 100%
there are ways to achieve a certain degree of accuracy.
the answer will be locale-specific, that's OK.
I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.
For example, all the names below can refer to the same person:
Berry Tsakala
Bernard Tsakala
Berry J. Tsakala
Tsakala, Berry
I'm trying to:
build (or copy) an algorithm which grades the relationship between 2 input names
find an indexing method (for names in my database, for hash tables, etc.)
note:
My task isn't about finding names in text, but about comparing 2 names, e.g.
name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%
I used the Tanimoto Coefficient for a quick (but not super) solution, in Python:
"""
Formula:
Na = number of set A elements
Nb = number of set B elements
Nc = number of common items
T = Nc / (Na + Nb - Nc)
"""
def tanimoto(a, b):
c = [v for v in a if v in b]
return float(len(c)) / (len(a)+len(b)-len(c))
def name_compare(name1, name2):
return tanimoto(name1, name2)
>>> name_compare("James Brown", "Brown, James")
0.91666666666666663
>>> name_compare("Berry Tsakala", "Bernard Tsakala")
0.75
>>>
Edit: A link to a good and useful book.
Soundex is sometimes used to compare similar names. It doesn't deal with first name/last name ordering, but you could probably just have your code look for the comma to solve that problem.
We've just been doing this sort of work non-stop lately, and the approach we've taken is to have a look-up table or alias list. If you can discount misspelled/misheard/non-English names then the difficult part is taken away. In your examples we would assume that the first word and the last word are the forename and the surname; anything in between would be discarded (middle names, initials). "Berry" and "Bernard" would be in the alias list, and when "Tsakala" did not match "Berry" we would flip the word order around and then get the match.
One thing you need to understand is the database/people lists you are dealing with. In the English-speaking world middle names are inconsistently recorded, so you can't make or deny a match based on the middle name or middle initial. Soundex will not help you with common name aliases such as "Dick" and "Richard", "Berry" and "Bernard", and possibly "Steve" and "Stephen".
In some communities it is quite common for 2 or 3 generations of people with the same name to live at the same address. The only way you can separate them is by date of birth. Date of birth may or may not be recorded; if you have the clout, then you should probably make the recording of date of birth mandatory. A lot of "people databases" either don't record date of birth or won't give it away for privacy reasons.
Effectively, people-name matching is not that complicated. It's entirely based on the quality of the data supplied. What happens in practice is that a lot of records remain unmatched, and even a human looking at them can't resolve the mismatch. A human may notice name aliases not recorded in the alias list or may be able to look up details of the person on the internet, but you can't really expect your programme to do that.
Banks, credit rating organisations and the government have a lot of detailed information about us. Previous addresses, date of birth etc. And that helps them join up names. But for us normal programmers there is no magic bullet.
Analyzing name order and the existence of middle names/initials is trivial, of course, so it looks like the real challenge is knowing common name alternatives. I doubt this can be done without using some sort of nickname lookup table. This list is a good starting point. It doesn't map Bernard to Berry, but it would probably catch the most common cases. Perhaps an even more exhaustive list can be found elsewhere, but I definitely think that a locale-specific lookup table is the way to go.
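As a sketch of the lookup-table idea (Python; the tiny NICKNAMES dict and the normalize() helper are hypothetical, and a real table would be locale-specific and much larger):
# Map nicknames to a canonical form before comparing.
NICKNAMES = {"berry": "bernard", "dick": "richard", "steve": "stephen"}

def normalize(name):
    if "," in name:                    # "Tsakala, Berry" -> "Berry Tsakala"
        last, first = [p.strip() for p in name.split(",", 1)]
        name = first + " " + last
    parts = [p.lower().rstrip(".") for p in name.split()]
    parts = [NICKNAMES.get(p, p) for p in parts]
    return [p for p in parts if len(p) > 1]    # drop middle initials

>>> normalize("Tsakala, Berry") == normalize("Bernard J. Tsakala")
True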
I had real problems with the Tanimoto coefficient using UTF-8.
What works for languages that use diacritical signs is difflib.SequenceMatcher().
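For example (a minimal sketch; SequenceMatcher works on Unicode strings directly, so diacritics are no problem):
import difflib

def name_similarity(a, b):
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

>>> name_similarity("Berry Tsakala", "Bernard Tsakala")
0.8571428571428571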