What is the best way to handle the restriction of an API? - language-agnostic

Our core domain so far has an abstraction called PersonName with methods for firstName, lastName, middleInitial etc. As we are expanding the domain to Spain, we figured that they only talk in terms of name, firstSurname and secondSurname i.e. middleInitial etc have no significance to them.
The PersonName interface is currently used in many places in the existing API, and SpainPersonName should be usable in those same places. So my option is to extend SpainPersonName from PersonName. But if I do this, I will end up exposing the API for firstName, middleInitial etc., which are not applicable to the Spain domain.
My question is how best we can refactor the current abstractions still keeping the backward compatibility? Any refactoring or design suggestions are greatly appreciated.

I am not too sure what your question is. By "to extend SpainPersonName from PersonName", do you mean make SpainPersonName implement or inherit from PersonName?
In any case, let me speculate that the PersonName abstraction might be a flawed one. An abstraction must be widely applicable, at least to the situations where it is applied, right? We Spaniards do not think in terms of first name vs. last name, as you well point out. Maybe the abstraction needs to be rethought. In my experience, an abstraction based on GivenName plus FamilyName is the most widely applicable one, even to Asian cultures where the order of names is not the "usual" one.
Being constructive, I think that you need to map the Spanish first and second surnames to the abstract last name, because that (first and second surnames) is what we Spaniards conceive as our "last name". If you can do that, then you are doing acceptably well.

Sounds like you need to change PersonName to AngloPersonName and create a PersonName interface?
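The two suggestions above (a culture-neutral abstraction based on given name plus family name, with the existing class renamed to AngloPersonName) could be combined roughly as follows. This is only a sketch in Python: the attribute names given_name and family_name, the formal_greeting helper and the sample names are assumptions made for illustration, not part of the original API.

```python
from dataclasses import dataclass
from typing import Protocol


class PersonName(Protocol):
    """Culture-neutral view of a name (assumed shape, not the original API)."""

    @property
    def given_name(self) -> str: ...

    @property
    def family_name(self) -> str: ...


@dataclass(frozen=True)
class AngloPersonName:
    first_name: str
    last_name: str
    middle_initial: str = ""

    @property
    def given_name(self) -> str:
        return self.first_name

    @property
    def family_name(self) -> str:
        return self.last_name


@dataclass(frozen=True)
class SpainPersonName:
    name: str
    first_surname: str
    second_surname: str = ""

    @property
    def given_name(self) -> str:
        return self.name

    @property
    def family_name(self) -> str:
        # Both surnames together play the role of the abstract "last name".
        parts = (self.first_surname, self.second_surname)
        return " ".join(part for part in parts if part)


def formal_greeting(person: PersonName) -> str:
    """Hypothetical caller that depends only on the neutral abstraction."""
    return f"Dear {person.given_name} {person.family_name}"
```

Existing callers that need firstName or middleInitial would keep depending on AngloPersonName, while new code depends only on the narrower PersonName view, so SpainPersonName never has to expose fields that mean nothing in Spain.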


How to handle properties that exist "between" entities (in a many-to-many relationship in this case)?

I've found a few questions on modelling many-to-many relationships, but nothing that helps me solve my current problem.
Scenario
I'm modelling a domain that has Users and Challenges. Challenges have many users, and users belong to many challenges. Challenges exist, even if they don't have any users.
Simple enough. My question gets a bit more complicated as users can be ranked on the challenge. I can store this information on the challenge, as a set of users and their rank - again not too tough.
Question
What scheme should I use if I want to query the individual rank of a user on a challenge (without getting the ranks of all users on the challenge)? At this stage, I don't care how I make the call in data access, I just don't want to return hundreds of rank data points when I only need one.
I also want to know where to store the rank information; it feels like it's dependent upon both a user and a challenge. Here's what I've considered:
The obvious: when instantiating a Challenge, just get all the rank information; slower but works.
Make a composite UserChallenge entity, but that feels like it goes against the domain (we don't go around talking about "user-challenges").
Third option?
I want to go with number two, but I'm not confident enough to know if this is really the DDD approach.
Update
I suppose I could call UserChallenge something more domain appropriate like Rank, UserRank or something?
The DDD approach here would be to reason in terms of the domain and talk with your domain expert/business analyst/whoever about this particular point to refine the model. Don't forget that the names of your entities are part of the ubiquitous language and need to be understood and used by non-technical people, so maybe "UserChallenge" is not the most appropriate term here.
What I'd first do is try to determine if that "middle entity" deserves a place in the domain model and the ubiquitous language. For instance, if you're building a website and there's a dedicated Rankings page where the user can see a list of all his challenges with the associated ranks, chances are ranks are a key matter in the application and a Ranking entity will be a good choice to represent that. You can talk with your domain expert to see if Rankings is a good name for it, or go for another name.
On the other hand, if there's no evidence that such an entity is needed, I'd stick to option 1. If you're worried about performance issues, there are ways of reducing the multiplicity of the relationship. Eric Evans calls that qualifying the association (DDD, p.83-84). Technically speaking, it could mean that the Challenge has a map - or a dictionary of ranks with the User as a key.
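A qualified association of the kind described above might look like the following Python sketch. The class and method names (record_rank, rank_of, leaderboard) are assumptions for illustration. The point is only that a Challenge keyed by User can answer "what is this user's rank?" without handing back every rank.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


@dataclass
class Challenge:
    title: str
    # Qualified association: the User acts as the key into the ranks.
    ranks: Dict[User, int] = field(default_factory=dict)

    def record_rank(self, user: User, rank: int) -> None:
        self.ranks[user] = rank

    def rank_of(self, user: User) -> Optional[int]:
        """Return one user's rank, or None if the user is not ranked."""
        return self.ranks.get(user)

    def leaderboard(self) -> List[User]:
        """All ranked users, best rank (lowest number) first."""
        return sorted(self.ranks, key=self.ranks.get)


alice = User(1, "alice")
bob = User(2, "bob")
challenge = Challenge("100 push-ups")
challenge.record_rank(alice, 2)
challenge.record_rank(bob, 1)
```

In a persistent store the same idea would presumably translate to a lookup keyed on the (challenge, user) pair rather than an in-memory dictionary.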
I would go with Option 2. You don't have to "go around talkin about user-challenges", but you do have to go around grabbin all them Users for a given challenge and sorting them by rank and this model provides you a great way to do it!

Implementing search on medical link list/table that allows for synonyms/abbreviations- and importing such a thing

I'm making a simple searchable list which will end up containing about 100,000 links on various medical topics- mostly medical conditions/diseases.
Now on the surface of things this sounds easy... in fact I've set my tables up in the following way:
Links: id, url, name, topic
Topics (eg cardiology, paediatrics etc): id, name
Conditions (eg asthma, influenza etc): id, name, aliases
And possibly another table:
Link & condition (since 1 link can pertain to multiple conditions): link id, condition id
So basically since doctors (including myself) are super fussy, I want to make it so that if you're searching for a condition, whether it be an abbreviation, British or American English, or an alternative ancient name, you get relevant results (e.g. "angiooedema", "angioedema", "Quincke's edema" etc. would give you the same results; similarly with "gastroesophageal reflux", "gastro-oesophageal reflux disease", GERD, GORD, GOR). Additionally, at the top of the results it would be good to group together links for a diagnosis that matches the search string, then have matches to link name, then finally matches to the topic.
My main problem is that there are thousands if not tens of thousands of conditions, each with up to 20 synonyms/spellings etc. One option is to get data from MeSH, which happens to be a sort of medical thesaurus (but in American English only, so there would have to be a way of converting from British English). The trouble is that the XML they provide is INSANE and about 250 MB. To help, they have got a guide to what the data elements are.
Honestly, I am at a loss as to how to tackle this most effectively as I've just started programming and working with databases and most of the possibilities of what to do seem difficult/suboptimal.
Was wondering if anyone could give me a hand? Happy to clarify anything that is unclear.
Your problem is well suited to a document-oriented store such as Lucene. For example you can design a schema such as
Link
Topic
Conditions
Then you can write a Lucene query such as Topic:edema and you should get all results.
You can do wildcard search for more.
To match british spellings (or even misspellings) you can use the ~ query which finds terms within a certain string distance. For example edema~0.5 matches oedema, oedoema and so on...
Apache Lucene is a Java library with ports available for most major languages. Apache Solr is a full-fledged search server built on the Lucene library and is easy to integrate into your platform of choice because it has a RESTful API.
Summary: my recommendation is to use Apache Solr as an adjunct to your MySQL db.
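To make the idea concrete without setting up a search server, here is a small Python sketch of the alias lookup from the question combined with fuzzy matching. It uses the standard library's difflib as a stand-in for Lucene's fuzzy queries, so it is not how Lucene or Solr work internally. The alias data, function names and the 0.8 cutoff are assumptions for illustration.

```python
import difflib

# Hypothetical alias data; a real table might be populated from a source such as MeSH.
CONDITION_ALIASES = {
    "gastro-oesophageal reflux disease": [
        "gastroesophageal reflux", "GERD", "GORD", "GOR",
    ],
    "angioedema": ["angiooedema", "Quincke's edema"],
}


def build_alias_index(aliases):
    """Map every lower-cased name (canonical or alias) to its canonical condition."""
    index = {}
    for canonical, names in aliases.items():
        for name in [canonical, *names]:
            index[name.lower()] = canonical
    return index


ALIAS_INDEX = build_alias_index(CONDITION_ALIASES)


def find_condition(term, cutoff=0.8):
    """Return the canonical condition for a search term, or None if nothing is close."""
    key = term.strip().lower()
    if key in ALIAS_INDEX:  # exact alias hit
        return ALIAS_INDEX[key]
    # Fall back to approximate matching for spelling variants and typos.
    close = difflib.get_close_matches(key, list(ALIAS_INDEX), n=1, cutoff=cutoff)
    return ALIAS_INDEX[close[0]] if close else None
```

Very short abbreviations are risky to match approximately (a one-letter difference can be a different condition), so in practice one would probably require exact matches below some minimum length.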
It's hard. Your best bet is to use MeSH and then perhaps soundex to match on British English terms.

Object Oriented style programming for interaction between objects

I am trying to write a program in object-oriented style. I have some confusions when coding the interaction between two objects.
Scenario:
Person (John) gives Person (Betty) $ 5.
Possible solutions (pseudo code):
A) John.pays(Betty, 5);
B) Betty.receives(John, 5);
C) Bank.transfer(John, Betty, 5);
D)
begin transaction:
John.decrease(5);
Betty.increase(5);
end transaction:
E) Service.transferMoney(John, Betty, 5); // Service is a generic service object
Please tell me which one is a more appropriate way of coding in OOP way, and the reason behind it. I am looking for some guidelines like "Tell, Don't Ask" rule.
Thanks.
One thing I've noticed is that people who are new to OOP get caught up in trying to map the physical world into the code they are writing. Do you really care that John and Betty are people, or do you actually want to depict a bank account? I think your choice of objects in the example actually makes it harder to figure out the solution to the problem.
The important parts of this are
1) Where to put the logic of how to move the money.
2) Where to store the data of how much money each person has.
You need to decide if you want to talk about the problem in the context of a person or a customer of a bank (may be a person, company, or something else). I'm guessing you are talking about a customer because assuming it is a person would be limiting and misleading. Also, a Bank is a pretty generic term: is it the big brick building with people inside of it, or is it the online website with several different pages that do different things?
A bank account object can have a method (possibly static depending on how you decide to store your data and what all you are going to use your object for) that knows how to transfer from one account to another. The logic of how to transfer does not belong to Betty or John or a bank; it belongs to a bankAccount, which can have special logic based on the type of account if there are fees involved or the like. If you gave that logic to the bank you would end up with a giant bank class with methods for everything from greeting a customer to dealing with money in very specific account types. Each account type may have different rules for how it handles transfers. Think of times where you may want to show a transfer or deposit as pending.
If you are just solving the problem of transferring money, there is no need to create a bunch of objects. Based on the known requirements and presumed future requirements, the below would be a good choice.
CheckingAccount.Transfer(johnsAccountNo, bettysAccountNo, amount)
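As a rough illustration of keeping the transfer logic on the account, here is a Python sketch. It is a variant of the call above rather than the same design: it uses an instance method on account objects instead of a static method taking account numbers. The BankAccount and InsufficientFunds names and the validation rules are assumptions.

```python
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when an account cannot cover a requested transfer."""


class BankAccount:
    def __init__(self, account_no, balance=0):
        self.account_no = account_no
        self.balance = Decimal(balance)

    def transfer_to(self, target, amount):
        """Move `amount` from this account to `target`, or raise without changing anything."""
        amount = Decimal(amount)
        if amount <= 0:
            raise ValueError("transfer amount must be positive")
        if amount > self.balance:
            raise InsufficientFunds(self.account_no)
        # Both checks passed, so both balances change together.
        self.balance -= amount
        target.balance += amount


johns_account = BankAccount("J-1", 20)
bettys_account = BankAccount("B-1")
johns_account.transfer_to(bettys_account, 5)
```

A subclass for a particular account type could override transfer_to to add fees or pending states, which is the kind of per-account rule the answer has in mind.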
Can I ask a question now? Who controls the money? Does John decide the transaction amount, does Betty, or some unspecified 3rd party?
The reason I am asking is because there is no real right or wrong answer here, just one that might be more flexible, or robust than the others. If this is a real life situation then I would model the transaction as something that both parties have to agree on before it proceeds, and the person spending the money (John) initiating it. Something like answer C and #Mysterie Man
tx transaction_request = John.WantsToBuyFor(5);      // check if John can
if( Betty.AgreesWith( transaction_request ) )        // check if Betty wants
{
    transaction_request.FinalizeWith(Betty);         // do it with Betty
}
and the FinalizeWith function does the math
void FinalizeWith(Person party)
{
    requestor.cash -= amount;
    party.cash += amount;
}
Of course you might want to add some description of what item is John buying.
The answer to this question is a long and complicated one that you'll get in bits and pieces from a large number of people. Why only in bits and pieces? Because the correct answer depends almost entirely upon what your system's requirements are.
One trap you will have to make sure you don't fall into, however, is this one. Read the answers you get here. You'll get a lot of good advice. (Pay the most attention to the advice that's been voted up a lot.) Once you've read and understood those, read Steve Yegge's rant (and understand it!) as well. It will save you sanity in the long run.
I'd vote for none of the above :)
Why is John paying Betty? That's an important question, as it explains where the entry point is. Let's say John owes Betty money, and it's payday.
public class John
{
    public void onPayday()
    {
        Betty.Receive(5.0f);
    }
}
This is if you want to go with a pure object-interaction style approach, of course.
The difference here is that we don't have an outside routine coordinating the interactions between John and Betty. Instead, we have John responding to external events, and choosing when to interact with Betty. This style also leads to very easy descriptions of desired functionality - eg "on payday, John should pay Betty."
This is a pretty good example of what Inversion of Control means - the objects are interacting with each other, rather than being manipulated by some external routine. It's also an example of Tell, Don't Ask, as the objects are telling each other things (John was told it's payday, John tells Betty to accept 5 dollars).
There are a number of alternate solutions here. For instance,
Betty.Receives(John.Gives(5))
This assumes that the Gives function returns the amount given.
tx = CashTransaction(John, Betty);
tx.Transfer(5);
This assumes the first parameter is the Payor and the second is the Payee; you can then perform multiple transactions without creating new objects.
Things can be modeled in a number of ways. You should choose the one that most closely resembles what you are trying to model.
There is one property of pure OOP that can help with the example which easily passes under the radar, but the object-capability model makes explicit and centers on. The linked document ("From Objects to Capabilities" (FOtC)) goes into the topic in detail, but (in short) the point of capabilities is that the ability of an object to affect its world is limited to objects it has references to. That may not seem significant at first, but is very important when it comes to protecting access and affects what methods of a class are available in methods of other classes.
Option A) gives account John access to account Betty, while option B) gives Betty access to account John; neither is desirable. With option C), account access is mediated by a Bank, so only Banks could steal or counterfeit money. Option D) is different than the other three: the others show a message being sent but not the implementation, while D) is a method implementation that doesn't show what message it handles, nor what class it handles it for. D) could easily be the implementation for any of the first three options.
FOtC has the beginning of a solution that includes a few other classes:
sealers & unsealers,
purses, which are a little like accounts in that they contain money but don't necessarily have an owner.
mints, which are the only things that can create purses with positive balances
A mint has a sealer/unsealer pair, which it endows to a purse whenever the mint creates one. Purses oversee balance changes; they use the sealer when decreasing a balance, and the unsealer to transfer from one purse to another. Purses can spawn empty purses. Because of the use of sealers & unsealers, a purse only works with other purses created by the same mint. Someone can't write their own purse to counterfeit money; only an object with access to a mint can create money. Counterfeiting is prevented by limiting access to mints.
Anyone with access to a purse can initiate a transaction by spawning an empty purse and transferring money from the first purse into it. The temporary purse can then be sent to a recipient, which can transfer money from the temporary purse to some other purse that it owns. Theft is prevented by limiting access to purses. For example, a bank holds purses on behalf of clients in accounts. Since a bank has access only to the purses of its clients' accounts and temporary purses, only a client's bank can steal from the client (though note that in a transfer between bank accounts, there are two clients that can be victimized, hence two potential thieves).
This system is missing some important details, such as monetary authorities (which hold references to one or more mints) to create money.
All in all, monetary transactions are tricky to implement securely, and thus may not be the best examples to learn the basics of OOP.
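A heavily simplified Python sketch of the purse-and-mint idea described above follows. It is not the FOtC implementation: a private token object stands in for the sealer/unsealer pair, and Python does not actually enforce the access restrictions that a capability language would. All class and method names are assumptions.

```python
class Purse:
    """Holds money; only cooperates with purses carrying the same mint token."""

    def __init__(self, mint_token, balance):
        self._mint_token = mint_token  # stand-in for the sealer/unsealer pair
        self._balance = balance

    @property
    def balance(self):
        return self._balance

    def sprout(self):
        """Spawn an empty purse belonging to the same mint."""
        return Purse(self._mint_token, 0)

    def deposit(self, amount, source):
        """Move `amount` out of `source` into this purse."""
        if source._mint_token is not self._mint_token:
            raise ValueError("purses come from different mints")
        if amount < 0 or amount > source._balance:
            raise ValueError("invalid amount")
        source._balance -= amount
        self._balance += amount


class Mint:
    """The only object meant to create purses with a positive balance."""

    def __init__(self):
        self._token = object()

    def make_purse(self, balance):
        return Purse(self._token, balance)


mint = Mint()
john = mint.make_purse(10)
betty = mint.make_purse(0)

payment = john.sprout()      # temporary purse created by the payer
payment.deposit(5, john)     # John fills it from his own purse
betty.deposit(5, payment)    # Betty empties it into hers
```

Note that John only ever hands Betty the temporary purse, so Betty never holds a reference to John's main purse.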
If you really want to get OOPy, try the following
Person Betty,John;
CashTransfer PocketMoney;
PocketMoney.from = John;
PocketMoney.to = Betty;
PocketMoney.amount = 20.00;
PocketMoney.transfer();
The point of OOP isn't to make code more like written language, but to have objects with different methods and parameters to make code more readable.
So from the above code, you can see that John is giving Betty $20 in pocket money. The code is meaningful, allowing for easier code readability, as well as understandability.
My vote: C. Where C does what D does (e.g. doesn't lose money, etc.).
In this small example, "the bank" is a perfectly valid entity which knows how much money John and Betty have. Neither John nor Betty should be able to lie to the bank.
Don't be afraid to invert (or not) the logic in an "OO" program as required for the situation.
You should model according to your domain. Option C looks like the best choice, as it separates the transaction logic into the Bank/Service class.
This is a question I often struggle with myself as a novice programmer. I agree that "C" seems like the best choice. In something like this, I think it's best to use a "neutral" entity such as the "bank". This actually models most real life transactions of importance since most transactions of import utilize checks and/or credit (a neutral 3rd party).
Being new to OOP and finally using some OOP, I'd say that it should be A and B.
We are focusing on persons and it's up to each person to handle his money. We don't know if he's going to use the bank or if he's just getting cash directly from Betty.
You create a Person class with two methods: send and receive. It must also have a public variable named balance to keep track of each person's balance.
You create two Person objects: Betty and John. Use methods accordingly. Like John.sends(Betty, 5). That should create Betty and update Betty's balance as well.
What if they want to use the bank? Add another method, say... Transfer(acct) whatever it is.
That's what I would think.

How important is database normalization in a very simple database?

I am making a very simple database (mysql) with essentially two types of data, always with a 1 to 1 relationship:
Events
Sponsor
Time (Optional)
Location (City, State)
Venue (Optional)
Details URL
Sponsors
Name
URL
Cities will be duplicated often, but is there really much value in having a cities table for such a simple database schema?
The database is populated by screen-scraping a website. On this site the city field is populated via selecting from a dropdown, so there will not be mistypes, etc and it would be easy to match the records up with a city table. I'm just not sure there would be much of a point even if the users of my database will be searching by city frequently.
Normalize the database now.
It's a lot easier to optimize queries on normalized data than it is to normalize a pile of data.
You say it's simple now - these things have a tendency to grow. Design it right and you'll get the experience of proper design and some future proofing.
I think you are looking at things the wrong way - you should always normalize unless you have a good reason not to.
Trusting your application to maintain data integrity is a needless risk. You say the data is made uniform because it is selected from a dropdown. What if someone hacks on the form and modifies the data, or if your code inadvertently allows a querystring param with the same name?
Where will the city data come from that populates your dropdown box for the user? Wouldn't you want a table for that?
It looks like you are treating Location as one attribute including city and state. Suppose you want to sort or analyse events by state alone rather than city and state? That could be hard to do if you don't have an attribute for state. Logically I would expect state to belong in a city table - although that may depend on exactly how you want to identify cities.
Direct answer: Just because a problem is relatively simple is no reason to not do things to keep it simple. It's a lot easier to walk on my feet than on my hands. I don't recall ever saying, "Oh, I only have to go half a mile, that's a short distance so I might as well walk on my hands."
Longer answer: If you don't keep any information about a city other than its name, and you don't have a pre-set list of cities (e.g. to build a drop-down), then your schema is already normalized. What would be in a City table other than the city name? (I presume State cannot be dependent on City because you could have two cities with the same name in different states, e.g. Dayton OH and Dayton TN.) The relevant rule of normalization is "no non-key dependencies", that is, you cannot have data that depends on data that is not a key. If you had, say, latitude and longitude of each city, then this data would be repeated in every record that referenced the same city. In that case you would certainly want to break out a separate city table to hold the latitude and longitude. You could, of course, create a "city code" that is an integer or abbreviation that links to a city table. But if there's no other data about a city, I don't see how this gains anything.
Technically, I would assume that City depends on Venue. If the venue is "Rockefeller Center", that implies that the city must be New York. But if venue is optional, this creates problems. One possibility is to have a Venue table that lists venue name, city, and state, and for cases where you don't specify a venue, have an "unspecified" for each city. This would be more textbook correct, but in practice if in most cases you do not specify a venue, it would gain little. If most of the time you DO specify a venue, it would probably be a good idea.
Oh, and, is there really a 1:1 relation between event and sponsor? I can believe that an event cannot have more than one sponsor. (In real life, there are plenty of events with multiple sponsors, but maybe for your purposes you only care about a "primary sponsor" or some such.) But does a sponsor never hold more than one event? That seems unlikely.
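The schema this answer arrives at (no separate city table, city and state as distinct columns, and a sponsor that can hold several events) could be sketched as below. The question uses MySQL; this sketch uses Python's built-in sqlite3 module purely so it can run standalone. The column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sponsors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    url  TEXT
);
CREATE TABLE events (
    id          INTEGER PRIMARY KEY,
    sponsor_id  INTEGER NOT NULL REFERENCES sponsors(id),
    city        TEXT NOT NULL,
    state       TEXT NOT NULL,   -- kept separate so events can be filtered by state alone
    venue       TEXT,            -- optional
    event_time  TEXT,            -- optional
    details_url TEXT
);
""")

# Hypothetical sample data: one sponsor holding two events in same-named cities.
conn.execute("INSERT INTO sponsors (id, name, url) VALUES (1, 'Acme', 'http://example.com')")
conn.executemany(
    "INSERT INTO events (sponsor_id, city, state) VALUES (?, ?, ?)",
    [(1, "Dayton", "OH"), (1, "Dayton", "TN")],
)

ohio_events = conn.execute(
    "SELECT city, state FROM events WHERE state = ?", ("OH",)
).fetchall()
events_per_sponsor = conn.execute(
    "SELECT COUNT(*) FROM events WHERE sponsor_id = 1"
).fetchone()[0]
```

Because events carry a sponsor_id rather than embedding sponsor details, the one-sponsor-to-many-events case raised above needs no schema change.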
Why not go ahead and normalize? You write as if there are significant costs of normalizing that outweigh the benefits. It's easier to set it up in a normal form before you populate it than to try and normalize it later.
Also, I wonder about your 1-to-1 relationship. Naively, I would imagine that an event might have multiple sponsors, or that a sponsor might be involved in more than one event. But I don't know your business logic...
ETA:
I don't know why I didn't notice this before, but if you are really averse to normalizing your database, and you know that you will always have a 1-to-1 relationship between the events and sponsors, then why would you have the sponsors in a separate table?
It sounds like you may be a little confused about what normalization is and why you would do it.
The answer hinges, IMO, on whether you want to prevent errors during data-entry. If you do, you will need a VENUES table:
VENUES
City
State
VenueName
as well as a CITIES and STATES table. (Note: I've seen situations where the same city occurs multiple times in the same state, usually smaller towns, so CITY/STATE do not comprise a unique dyad. Normally there's a zipcode to disambiguate.)
To prevent situations where the data-entry operator enters a venue for NY NY which is actually in SF CA, you'd need to validate the venue entry to see if such a venue exists in the city/state supplied on the record.
Then you'd need to make CITY/STATE mandatory, and have to write code to rollback the transaction and handle the error.
If you are not concerned about enforcing this sort of accuracy, then you don't really need to have CITY and STATES tables either.
If you are interested in learning about normalization, you should learn what happens when you don't normalize. For each normal form (beyond 1NF) there is an update anomaly that will occur as a consequence of harmful redundancy.
Often it's possible to program around the update anomalies, and sometimes that's more practical than always normalizing to the ultimate degree.
Sometimes, it's possible for a database to get into an inconsistent state due to failure to normalize, and failure to program the application to compensate.
In your example, the best I can come up with is a sort of lame hypothetical. What if the name of a city got misspelled in one row, but spelled correctly in all the others? What if you summarized by city and sponsor? Your output would reflect the error, and divide one group into two groups. Maybe it would be better if the city were only spelled out once in the database, for better or for worse. At least the grouping for the summary would be correct, even if the name were misspelled.
Is this worth normalizing for? Hey, it's your project, not mine. You decide.
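The grouping problem in that hypothetical is easy to reproduce. The Python sketch below uses made-up rows (the city and sponsor values are assumptions) to show how one misspelled city splits a summary in the denormalized layout, while a layout that stores the name once and references it by id keeps the group intact.

```python
from collections import Counter

# Denormalized: the city name is repeated (and once misspelled) in every row.
denormalized_rows = [
    ("Springfield", "Acme"),
    ("Springfield", "Acme"),
    ("Springfeild", "Acme"),  # typo in a single row
]
denormalized_summary = Counter(denormalized_rows)

# Normalized: the name is stored once; rows refer to it by id.
cities = {1: "Springfield"}
normalized_rows = [(1, "Acme"), (1, "Acme"), (1, "Acme")]
normalized_summary = Counter(normalized_rows)
```

Here denormalized_summary ends up with two groups for what is really one city, while normalized_summary has a single group of three. If the one stored name were misspelled, the grouping would still be correct.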

What should I call a class that contains a sequence of states

I have a GUI tool that manages state sequences. One component is a class that contains a set of states, your typical DFA state machine. For now, I'll call this a StateSet (I have a more specific name in mind for the actual class that makes sense, but this name I think will suffice for the purpose of this question.)
However, I have another class that has a collection (possibly partially unordered) of those state sets, and lists them in a particular order. I'm trying to come up with a good name for it, not just for internal code, but for customers to refer to it.
The role of this particular second collection is to encapsulate the entire currently used/available collection of StateSets that the user has created. All of the StateSets will be used eventually in the application. A good analogy would be a hand of cards versus the entire table: The 'table' contains all of the currently available hands, while the 'hand' contains a particular collection of cards.
I've got these as starter ideas I could throw out for the class name; I'm not comfortable with either at the moment:
Sequence (maybe...with something else tacked on to the name)
StateSetSet (reasonable for code, but not for customers)
And as ewernli mentions, these are really technical terms, which don't really convey the idea well. Any other suggestions or ideas?
Sequence - Definitely NOT. It's too generic, and doesn't have any real semantic meaning.
StateSetSet - While more semantically correct, this is confusing. You have a sequence, which implies order, which is different from a set, which does not.
That being said, the best option, IMO, is StateSetSequence, as it implies you have a sequence of StateSet instances.
What is the role/function of your StateSetSet?
StateSetSet or Sequence are technical terms.
Prefer a term that conveys the role/function of the class.
That could well be something like History, Timeline, WorldSnapshot,...
EDIT
According to your updated description, StateSet looks to me like StateSpace (the space of all possible states). If the user can then interactively create something, it might be appropriate to speak of a Workspace. If the user creates various state spaces of interest, I would then go for StateSpaceWorkspace. Isn't that a cool name :)
"StateSets" may be sufficient.
Others:
StateSetList
StateSetLister
StateSetListing
StateSetSequencer
I like StateSetArrangement, implying an ordering without implying anything about the underlying storage mechanisms.