Is it possible to compute the intersection of two incompletely specified DFAs?

If the two DFAs whose intersection has to be computed are not completely specified, it can happen that you reach a product state that contains a state from only one of the two DFAs. Is that correct, or are additional steps required before computing the intersection?

Given a DFA that's missing a transition on some state, you can always convert that into a "full" DFA by adding in a new state qdead that isn't accepting and has transitions only to itself, then adding transitions into qdead for each missing transition. So in that sense, if you don't have full DFAs specified, you could always convert the DFAs into "full" DFAs and then run the cross-product construction as usual.
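As a small illustration, here is a Python sketch of that completion step. The representation (a partial transition dict keyed by (state, symbol)) and the name qdead are just assumptions made for this example, not a standard API.

def complete_dfa(states, alphabet, delta, dead="qdead"):
    # Return (all states including the dead state, total transition dict):
    # every missing (state, symbol) pair is routed to a non-accepting dead
    # state that loops on itself; the accepting set is left unchanged.
    full = dict(delta)
    for q in list(states) + [dead]:
        for a in alphabet:
            full.setdefault((q, a), dead)
    return set(states) | {dead}, full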
If you're specifically building a DFA for the intersection of the two input DFAs, though, you don't need to do this, because all states generated this way are going to be equivalent to one another (none of them can reach an accepting state in the original machine). There are a couple of ways of forming intersections, and depending on which approach you take you could make anywhere from minor to major adjustments:
One algorithm simply computes all possible pairs of states from the two input DFAs and then fills in the transitions by looking at where the pairs of states transition to. If you're using this approach, you can just have each state that's missing a transition in one of the input DFAs have no transitions in the cross-product, simulating "one of the automata would have died here."
Another algorithm uses a DFS or BFS to only construct reachable states in the cross-product. In that case, no modifications are necessary - if you find a pair of states where one is missing a transition, simply don't add any successor states.
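Here is a minimal Python sketch of that second, reachability-based construction, assuming each DFA is given as a (start state, accepting set, partial transition dict) triple; whenever one machine is missing a transition, the product pair simply gets no successor for that symbol.

from collections import deque

def intersect(dfa1, dfa2, alphabet):
    (s1, acc1, d1), (s2, acc2, d2) = dfa1, dfa2
    start = (s1, s2)
    delta, accepting = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if p in acc1 and q in acc2:
            accepting.add((p, q))          # accepting iff both components accept
        for a in alphabet:
            if (p, a) not in d1 or (q, a) not in d2:
                continue                   # one machine "dies": no successor added
            nxt = (d1[(p, a)], d2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, accepting, delta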
On the other hand, if you're doing something like a union construction, where you only need one of the two states to be accepting, these approaches won't work because you need to be able to simulate the idea of "one automaton died, and the other is still happily running." For that, adding in the explicit dead state is a simple and effective approach.

Related

Stop rete activations

I have a rule that retracts thousands of facts when a certain condition is met. This rule sits in a module that contains two other rules that use "not" statements. My questions are:
Does the rete network get recalculated every time the first rule retracts a fact?
Is it because of the "not" statements in the other two rules, or would that happen anyway?
Is there a way to stop recomputing activations until the first rule has no more facts to retract?
Thanks!
Precise answers aren't possible without knowing the patterns in the rules that use the type of the retracted facts.
Clearly, if Fact is that type and the rules #2 and #3 contain just
not Fact(...constraints...)
nothing tremendous should happen until the last of those Fact facts (meeting the constraints, if any) is removed from working memory: at that point an additional node may have to be created, depending on what else is in that not CE; this may continue depending on what follows the not CE and may result in terminal nodes, i.e., activations.
If a pattern like
Fact(...constraints...)
is in any of these rules, retracting a Fact (that meets those constraints, if any) causes some immediate work: pending activations may be cancelled and nodes removed from the network, provided the fact had been incorporated before.
There is not much you can do to avoid this activity in the Rete network.
That said, the need to retract thousands of facts is rather scary. How many remain? It might be cheaper to pick out the select few and start over in an entirely new Rete. Or use a design pattern that does not expose all of those thousands at once to the Engine. Or something else.
We've written a lazy algorithm that avoids re-producing partial matches and activations until the rule is potentially ready to fire. Because it's lazy, you can use salience to delay when the rule is evaluated.
http://blog.athico.com/2013/11/rip-rete-time-to-get-phreaky.html

DDD: The conundrum of Side-Effect-Free functions

I apologize for so many questions, but I felt that they make the most sense only when treated as a unit.
Note - all quotes are from DDD: Tackling Complexity in the Heart of Software (pages 250 and 251).
1)
Operations can be broadly divided into two categories, commands and
queries.
...
Operations that return results without producing side effects are
called functions. A function can be called multiple times and return
the same value each time.
...
Obviously, you can't avoid commands in most software systems, but the
problem can be mitigated in two ways. First, you can keep the commands
and queries strictly segregated in different operations. Ensure that
the methods that cause changes do not return domain data and are kept
as simple as possible. Perform all queries and calculations in methods
that cause no observable side effects
a) The author implies that a query is a function since it doesn't produce side effects. He also notes that a function will always return the same value, by which I assume he means that for the same input we will always get the same output?
b) Assume we have a method QandC(int entityId) which queries for a specific domain entity, extracts certain values from it, uses those values to initialize a new Value Object, and returns that VO to the caller. Isn't QandC, according to the above quote, a function, since it doesn't change any state?
c) But the author also argues that for the same input a function will always produce the same output, which isn't the case with QandC: several calls to QandC can produce different results if, in the time between the calls, the entity was modified or even deleted. As such, how can we claim QandC is a function?
d)
Ensure that the methods that cause changes do not return domain data
...
Is the reason that the state of a returned non-VO may be changed by some future operation, making the side effects of such methods unpredictable?
e)
Ensure that the methods that cause changes do not return domain data
...
Is a query method that returns an entity still considered a function, even if it doesn't change any state?
2)
VALUE OBJECTS are immutable, which implies that, apart from
initializers called only during creation, all their operations are
functions.
...
An operation that mixes logic or calculations with state change
should be refactored into two separate operations. But by definition,
this segregation of side effects into simple command methods only
applies to ENTITIES. After completing the refactoring to separate
modification from querying, consider a second refactoring to move the
responsibility for the complex calculations into a VALUE OBJECT. The
side effect often can be completely eliminated by deriving a VALUE
OBJECT instead of changing existing state, or by moving the entire
responsibility into a VALUE OBJECT.
a)
VALUE OBJECTS are immutable, which implies that, apart from
initializers called only during creation, all their operations are
functions ... But by definition, this segregation of side effects into
simple command methods only applies to ENTITIES.
I think the author is saying that all methods defined on VOs are functions, which doesn't make sense to me, since even though a method defined on a VO can't change its own state, it can still change the state of other, non-VO objects?!
b) Assuming a method defined on an entity doesn't change any state, do we consider such a method a function, even though it is defined on an entity?
c)
... consider a second refactoring to move the responsibility for the
complex calculations into a VALUE OBJECT.
Why is the author suggesting we should refactor out of entities only those functions that perform complex calculations? Why shouldn't we also refactor simpler functions?
d)
... consider a second refactoring to move the responsibility for the
complex calculations into a VALUE OBJECT.
In any case, why is the author suggesting we should refactor functions out of entities and place them inside VOs? Just because it makes it more apparent to the client that this operation MAY be a function?
e)
The side effect often can be completely eliminated by deriving a VALUE
OBJECT instead of changing existing state, or by moving the entire
responsibility into a VALUE OBJECT.
This doesn't make sense to me: it appears the author is arguing that if we move a command (i.e., an operation which changes state) into a VO, then we will in essence eliminate any side effects, even though the command is still changing state. Any ideas what the author was actually trying to say?
UPDATE:
1b)
It depends on the perspective. A database query does not change state
and thus has no side effects, however it isn't deterministic by
nature, since as you point out the data can change. In the book, the
author is referring to functions associated with value objects and
entities, which don't themselves make external calls. Therefore, the
rules don't apply to QandC.
So the author was describing only functions that don't make external calls, and as such QandC isn't the type of function the author was describing?
1c)
QandC does not itself change state - there are no side effects. The
underlying state may be changed out of band however. Due to this, it
is not a pure function.
But it also isn't a side-effect-free function in the sense the author defined them?
1d)
Again, this is based on CQS.
I know I'm repeating myself, but I assume the discussion in the book is based on CQS, and CQS doesn't consider QandC a side-effect-free function because the entity returned by QandC may have its state modified (by some other operation) sometime in the future?
1e)
It is considered a query from the CQRS perspective, but it cannot be
called a function in the sense that a pure function on a VO is a
function due to lack of determinism.
I don't quite understand what you were trying to say (the confusing part is in bold). Perhaps that while QandC is considered a query, it is not considered a function because it returns an entity, so the side effects are unpredictable, which makes QandC non-deterministic by nature?
So the author is only making those statements (see the quote in 1e) under the implicit assumption that no operation defined on a VO will ever try to change the state of non-VO objects?
2d)
Given that VOs are immutable, they are a fitting place to house pure
functions. This is another step towards freeing domain knowledge from
technical constraints.
I don't understand why moving a function from an entity to a VO would help free domain knowledge from technical constraints (I'm also not really sure what you mean by technical – technical as in technology-related, or...)?
I assume another reason for putting a function in a VO is that it is that much more obvious (to the client) that it is a function?
2e)
I view this as a hint towards event-sourcing. Instead of changing
existing state, you add a new event which represents the change. There
is still a net side effect, however existing state remains stable.
I must confess I know nothing about event sourcing, since I'd like to first wrap my head around DDD. Anyway, so the author didn't imply that just moving a command to a VO would automatically eliminate side effects; instead some additional actions would have to be taken (such as implementing event sourcing), only he "forgot" to mention that part?
SECOND UPDATE:
2d)
One of the defining characteristics of an entity is its identity ....
By placing business logic into VOs you can consider it outside of the
context of an entity's identity. This makes it easier to test this
logic, among other things.
I somewhat understand the point you're making (when thinking about the concept from a distance), but on the other hand I really don't. Why would a function within an entity be influenced by the identity of that entity (assuming this function is a pure function, in other words it doesn't change state and is deterministic)?
2e)
Yes that is my understanding of it - there is still a net "side
effect". However, there are different ways to attain a side effect.
One way is to mutate existing state. Another way is to make the state
change explicit with an object representing that change.
I - Just to be sure... From your answer I gather that the author didn't imply that side effects would be eliminated simply by moving a command into a VO?
II - OK, if I understand you correctly, we can move a command into a VO (even though VOs shouldn't change the state of anything and as such shouldn't cause any side effects), and this command inside the VO is still allowed to produce some sort of side effect, but that side effect is somehow more acceptable (OR MORE CONTROLLABLE) because the state change is made explicit (which I interpret as: the thing that changed is returned to the caller as a VO)?
3) I must say that I still don't quite understand why a state-changing method SC shouldn't return domain objects. Perhaps because a non-VO may be changed by some future operation, and as such the side effects of SC are very unpredictable?
THIRD UPDATE:
Delegating the management of state to the entity and the
implementation of behavior to VOs creates certain advantages. One is
basic partitioning of responsibilities.
a) You're saying that even though a method describes a behavior of an entity (and thus the entity containing this method adheres to the SRP) and as such belongs in the entity, it may still be a good idea to move it into a VO? Thus in essence, we would partition the responsibility of the entity into two even smaller responsibilities?
b) But won't moving behavior into a VO basically turn the entity into a mere data container (I understand that the entity will still manage its state, but still...)?
thank you
1a) Yes. The discourse on separating queries from commands is based on the Command-query separation principle.
1b) It depends on the perspective. A database query does not change state and thus has no side effects, however it isn't deterministic by nature, since as you point out the data can change. In the book, the author is referring to functions associated with value objects and entities, which don't themselves make external calls. Therefore, the rules don't apply to QandC. Determinism could be fabricated however, offering degrees of "pureness". For instance, a serializable transaction could be created which can ensure that data doesn't change for its duration.
1c) QandC does not itself change state - there are no side effects. The underlying state may be changed out of band however. Due to this, it is not a pure function. However, the restriction that QandC doesn't change state is still valuable. The value is fittingly demonstrated by CQRS which is the application of CQS in distributed scenarios.
1d) Again, this is based on CQS. Another take on this is the Tell-Don't-Ask principle. Given an understanding of these principles however, the rule can be bent IMO. A side-effecting method could return a VO representing the result for instance. However, in certain scenarios such as CQRS + Event Sourcing it could be desirable for commands to return void.
1e) It is considered a query from the CQRS perspective, but it cannot be called a function in the sense that a pure function on a VO is a function due to lack of determinism.
2a) No, a VO function shouldn't change the state of anything; it should instead return a new object.
2b) Yes.
2c) Because functional purity tends to become more important in more complex scenarios. However, as you point out, this isn't a clear and definitive rule. It shouldn't be based on complexity so much as on the domain at hand.
2d) Given that VOs are immutable, they are a fitting place to house pure functions. This is another step towards freeing domain knowledge from technical constraints.
2e) I view this as a hint towards event-sourcing. Instead of changing existing state, you add a new event which represents the change. There is still a net side effect, however existing state remains stable.
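To make 2a and 2e a bit more concrete, here is a minimal Python sketch of the command/query split and of a VO whose operations are side-effect-free functions. The names (Money, Account, deposit) are my own illustration, not an example from the book.

from dataclasses import dataclass

@dataclass(frozen=True)
class Money:                       # VALUE OBJECT: immutable
    amount: int
    currency: str

    def add(self, other: "Money") -> "Money":
        # side-effect-free function: derives a new VO instead of mutating
        assert self.currency == other.currency
        return Money(self.amount + other.amount, self.currency)

class Account:                     # ENTITY: has identity, holds mutable state
    def __init__(self, account_id: str, balance: Money):
        self.account_id = account_id
        self.balance = balance

    def deposit(self, amount: Money) -> None:
        # command: changes state, returns no domain data (CQS)
        self.balance = self.balance.add(amount)

    def current_balance(self) -> Money:
        # query: returns data, causes no observable side effect
        return self.balance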
UPDATE
1b) Yes.
1c) It is a side-effect free function, however it is not a deterministic function because it cannot be thought to always return the same value given the same input. For example, the function that returns the current time is a side-effect free function, but it certainly does not return the same value in subsequent calls.
1d) QandC can be thought of as side-effect free, but not pure. Another way to look at functional purity is as referential transparency - the ability to replace a function call by its value without changing program behavior. In other words, asking the question does not change the answer. QandC can guarantee that, but only within a context such as a transaction. So QandC can be thought of as a function, but only in a specific context.
1e) I think the confusing part is that the author is talking specifically about functions on VOs and entities - not database queries, whereas we are talking about both. My statement extends the discussion to database queries and CQRS given certain restrictions, i.e., an ambient transaction.
2d) I can see how what I said was a bit vague, I was getting lazy. One of the defining characteristics of an entity is its identity. It maintains its identity throughout its life-cycle while its state may change. By placing business logic into VOs you can consider it outside of the context of an entity's identity. This makes it easier to test this logic, among other things.
2e) Yes that is my understanding of it - there is still a net "side effect". However, there are different ways to attain a side effect. One way is to mutate existing state. Another way is to make the state change explicit with an object representing that change.
UPDATE 2
2d) This particular point can be argued, or can be a matter of preference. One perspective is that the idea is based on the single-responsibility principle (SRP). The responsibility of an entity is the association of an identity with behavior and state. Behavior combines input with existing state to produce state transitions. Delegating the management of state to the entity and the implementation of behavior to VOs creates certain advantages. One is basic partitioning of responsibilities. Another is more subtle and perhaps more arguable: it is the idea that logic can be considered in a stateless manner. This makes thinking about such logic easier, more like thinking about a mathematical equation where all changes are explicit - no hidden state.
2e.1) Yes, eliminating a net side effect would alter behavior, which is not the goal.
2e.2) Yes.
3) Commands returning void have several advantages. One is that they become naturally more adept in async scenarios - no need to wait for a result. Another is that it allows you to represent the operation as a single command object - again, because there is no return value. This applies in CQRS and also event sourcing. In these cases, any command output is dispatched as an event instead of a result. But again, if these requirements don't apply returning a result object can be appropriate.
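As a hedged sketch of the "commands return void and any output is dispatched as an event" idea, reusing the made-up Account example from the sketch above in a heavily simplified event-sourcing flavour:

from dataclasses import dataclass

@dataclass(frozen=True)
class FundsDeposited:              # event: an immutable record of the change
    account_id: str
    amount: int

class Account:
    def __init__(self, account_id: str):
        self.account_id = account_id
        self.balance = 0
        self.pending_events = []

    def deposit(self, amount: int) -> None:
        # command returns void; the "result" is the event, not a return value
        event = FundsDeposited(self.account_id, amount)
        self._apply(event)
        self.pending_events.append(event)  # to be dispatched by infrastructure

    def _apply(self, event: FundsDeposited) -> None:
        self.balance += event.amount       # existing state changes only via events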
UPDATE 3
a) Yes, and this is a specific type of partitioning.
b) The responsibility of the entity is to coordinate behavior by delegating to VOs and applying the resulting state changes.

Methods for automated synonym detection

I am currently working on a neural network based approach to short document classification, and since the corpuses I am working with are usually around ten words, the standard statistical document classification methods are of limited use. Due to this fact I am attempting to implement some form of automated synonym detection for the matches provided in the training. My question more specifically is about resolving a situation as follows:
Say I have classifications of "Involving Food", and one of "Involving Spheres" and a data set as follows:
"Eating Apples"(Food);"Eating Marbles"(Spheres); "Eating Oranges"(Food, Spheres);
"Throwing Baseballs(Spheres)";"Throwing Apples(Food)";"Throwing Balls(Spheres)";
"Spinning Apples"(Food);"Spinning Baseballs";
I am looking for an incremental method that would move towards the following linkages:
Eating --> Food
Apples --> Food
Marbles --> Spheres
Oranges --> Food, Spheres
Throwing --> Spheres
Baseballs --> Spheres
Balls --> Spheres
Spinning --> Neutral
Involving --> Neutral
I do realize that in this specific case these might be slightly suspect matches, but it illustrates the problems I am having. My general thought was to increment a word's score for appearing opposite the words in a category, but in that case I would end up incidentally linking everything to the word "Involving". I then thought that I would simply decrement a word for appearing in conjunction with multiple synonyms, or with non-synonyms, but then I would lose the link between "Eating" and "Food". Does anyone have any clue as to how I would put together an algorithm that would move me in the directions indicated above?
There is an unsupervised bootstrapping approach that was explained to me for doing this.
There are different ways of applying this approach, and variants, but here's a simplified version.
Concept:
Start by assuming that if two words are synonyms, then in your corpus they will appear in similar settings (eating grapes, eating a sandwich, etc.).
(In this variant I will use co-occurrence as the setting.)
Bootstrapping Algorithm:
We have two lists,
one list will contain the words that co-occur with food items
one list will contain the words that are food items
Supervised Part
Start by seeding one of the lists, for instance I might write the word Apple on the food items list.
Now let the computer take over.
Unsupervised Parts
It will first find all the words in the corpus that appear just before Apple, and sort them in order of most occurring.
Take the top two (or however many you want) and add them into the co-occur with food items list. For example, perhaps "eating" and "Delicious" are the top two.
Now use that list to find the next two top food words by ranking the words that appear to the right of each word in the list.
Continue this process expanding each list until you are happy with the results.
Once that's done, you're done (though you may need to manually remove some entries from the lists as you go that are clearly wrong).
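A rough Python sketch of this loop, assuming the corpus is a list of lower-cased, tokenised phrases and using "the word just before / just after" as the co-occurrence setting. The seed word and the top-two cut-off follow the description above; everything else is my own simplification.

from collections import Counter

def bootstrap(corpus, seed_items, rounds=3, top_k=2):
    items = set(seed_items)            # e.g. {"apples"}
    contexts = set()                   # words that co-occur with the items
    for _ in range(rounds):
        ctx_counts = Counter()
        for tokens in corpus:
            for i in range(1, len(tokens)):
                if tokens[i] in items:
                    ctx_counts[tokens[i - 1]] += 1   # word just before an item
        contexts |= {w for w, _ in ctx_counts.most_common(top_k)}

        item_counts = Counter()
        for tokens in corpus:
            for i in range(len(tokens) - 1):
                if tokens[i] in contexts:
                    item_counts[tokens[i + 1]] += 1  # word just after a context
        items |= {w for w, _ in item_counts.most_common(top_k)}
    return items, contexts

# items, contexts = bootstrap([["eating", "apples"], ["eating", "oranges"],
#                              ["throwing", "apples"]], {"apples"})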
Variants
This procedure can be made quite effective if you take into account the grammatical setting of the keywords.
Subj ate NounPhrase
NounPhrase are/is Moldy
The workers harvested the Apples.
subj verb Apples
That might imply harvested is an important verb for distinguishing foods.
Then look for other occurrences of subj harvested nounPhrase
You can expand this process to move words into categories, instead of a single category at each step.
My Source
This approach was used in a system developed at the University of Utah a few years back which was successful at compiling a decent list of weapon words, victim words, and place words by just looking at news articles.
It was an interesting approach, and it had good results.
Not a neural network approach, but an intriguing methodology.
Edit:
The system at the University of Utah was called AutoSlog-TS; a short slide about it can be seen here towards the end of the presentation, and a link to a paper about it is here.
You could try LDA, which is unsupervised. There is a supervised version of LDA but I can't remember the name! The Stanford parser will have the algorithm, which you can play around with. I understand it's not the NN approach you are looking for, but if you are just looking to group information together, LDA would seem appropriate, especially if you are looking for 'topics'.
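If you want to experiment with plain LDA, a hedged sketch using the gensim library (my choice for the example; the answer above points at the Stanford toolkit instead) could look like this:

from gensim import corpora, models

docs = [["eating", "apples"], ["eating", "marbles"], ["throwing", "baseballs"]]
dictionary = corpora.Dictionary(docs)                 # word <-> id mapping
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.show_topics(num_topics=2, num_words=3, formatted=False):
    print(topic_id, [w for w, _ in words])            # top words per "topic"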
The code here (http://ronan.collobert.com/senna/) implements a neural network to perform a variety of NLP tasks. The page also links to a paper that describes one of the most successful approaches so far of applying convolutional neural nets to NLP tasks.
It is possible to modify their code to use the trained networks that they provide to classify sentences, but this may take more work than you were hoping for, and it can be tricky to correctly train neural networks.
I had a lot of success using a similar technique to classify biological sequences, but, in contrast to English language sentences, my sequences had only 20 possible symbols per position rather than 50-100k.
One interesting feature of their network that may be useful to you is their word embeddings. Word embeddings map individual words (each can be considered an indicator vector of length 100k) to real valued vectors of length 50. Euclidean distance between the embedded vectors should reflect semantic distance between words, so this could help you detect synonyms.
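As a toy illustration of that last point, nearest neighbours in embedding space can serve as a synonym signal. Here the 50-dimensional vectors are random stand-ins; in practice they would come from the trained model.

import numpy as np

rng = np.random.default_rng(0)
embeddings = {w: rng.standard_normal(50) for w in ["apple", "orange", "marble"]}

def nearest(word, embeddings, k=2):
    v = embeddings[word]
    dists = {w: np.linalg.norm(v - u) for w, u in embeddings.items() if w != word}
    return sorted(dists, key=dists.get)[:k]   # smallest Euclidean distance first

print(nearest("apple", embeddings))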
For a simpler approach WordNet (http://wordnet.princeton.edu/) provides lists of synonyms, but I have never used this myself.
I'm not sure if I misunderstand your question. Do you require the system to be able to reason based on your input data alone, or would it be acceptable to refer to an external dictionary?
If it is acceptable, I would recommend you take a look at http://wordnet.princeton.edu/ which is a database of English word relationships. (It also exists for a few other languages.) These relationships include synonyms, antonyms, hyperonyms (which is what you really seem to be looking for, rather than synonyms), hyponyms, etc.
The hyperonym / hyponym relationship links more generic terms to more specific ones. The words "banana" and "orange" are hyponyms of "fruit"; it is a hyperonym of both. http://en.wikipedia.org/wiki/Hyponymy Of course, "orange" is ambiguous, and is also a hyponym of "color".
You asked for a method, but I can only point you to data. Even if this turns out to be useful, you will obviously need quite a bit of work to use it for your particular application. For one thing, how do you know when you have reached a suitable level of abstraction? Unless your input is heavily normalized, you will have a mix of generic and specific terms. Do you stop at "citrus", "fruit", "plant", "animate", "concrete", or "noun"? (Sorry, just made up this particular hierarchy.) Still, hope this helps.
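If an external dictionary is acceptable, a hedged sketch using NLTK's WordNet interface shows how to walk up a hypernym chain; picking the right sense, and the right level of abstraction to stop at, is still up to you.

from nltk.corpus import wordnet as wn   # requires nltk and the wordnet data

for synset in wn.synsets("orange", pos=wn.NOUN):
    chain = [synset]
    while chain[-1].hypernyms():
        chain.append(chain[-1].hypernyms()[0])   # follow one hypernym path
    print(" -> ".join(s.name() for s in chain))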

Searching in graphs/trees with depth-first/breadth-first/A* algorithms

I have a couple of questions about searching in graphs/trees:
Let's assume I have an empty chess board and I want to move a pawn around from point A to B.
A. When using depth first search or breadth first search must we use open and closed lists? That is, a list that has all the elements to check, and another with all the elements that were already checked? Is it even possible to do it without those lists? What about A*, does it need it?
B. When using lists, after having found a solution, how can you get the sequence of states from A to B? I assume when you have items in the open and closed list, instead of just having the (x, y) states, you have an "extended state" formed with (x, y, parent_of_this_node)?
C. State A has 4 possible moves (right, left, up, down). If my first move is left, should I let the next state come back to the original state, that is, do the "right" move? If not, must I traverse the search tree every time to check which states I've been to?
D. When I see a state in the tree where I've already been, should I just ignore it, as I know it's a dead end? I guess to do this I'd have to always keep the list of visited states, right?
E. Is there any difference between search trees and graphs? Are they just different ways to look at the same thing?
A. When using depth first search or breadth first search must we use open and closed lists?
With DFS you definitely need to store at least the current path. Otherwise you would not be able to backtrack. If you decide upon maintaining a list of all visited (closed) nodes, you are able to detect and avoid cycles (expanding the same node more than once). On the other hand, you don't have the space efficiency of DFS anymore. DFS without a closed list only needs space proportional to the depth of the search space.
With BFS you need to maintain an open list (sometimes called fringe). Otherwise the algorithm simply can't work. When you additionally maintain a closed list, you will (again) be able to detect/avoid cycles. With BFS the additional space for the closed list might be not that bad, since you have to maintain the fringe anyway. The relation between fringe size and closed list size strongly depends upon the structure of the search space, so this has to be considered here. E.g. for a branching factor of 2, both lists are equal in size and the impact of having the closed list doesn't seem very bad compared to its benefits.
What about A*, does it need it?
A*, as it can be seen as some special (informed) type of BFS, needs the open list. Omitting the closed list is more delicate than with BFS; also deciding upon updating costs inside the closed list. Depending upon those decisions, the algorithm can stop being optimal and/or complete depending on the type of heuristic used, etc. I won't go into details here.
B.
Yup, the closed list should form some kind of inverse tree (pointers going towards the root node), so you can extract the solution path; a sketch of this appears after item E below. You usually need the closed list for doing this. For DFS, your current stack is exactly the solution path (no need for a closed list here). Also note that sometimes you are not interested in the path but only in the solution or the existence of it.
C.
Read previous answers and look for the parts which talk about the detection of cycles.
D.
To avoid cycles with a closed list: don't expand nodes that are already inside the closed list. Note: with path-costs coming into play (remember A*), things might get more tricky.
E. Is there any difference between search trees and graphs?
You could consider searches that maintain a closed list to avoid cycles as graph-searches and those without one tree-searches.
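To illustrate B and E, here is a minimal Python sketch of a breadth-first search on a 2-D board that keeps a parent map (serving both as the closed list and as the inverse tree) and walks it backwards to recover the path. The board bounds and the passable() callback are assumptions made for the example.

from collections import deque

def bfs(start, goal, passable, width, height):
    parent = {start: None}                       # doubles as the closed list
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            path, node = [], goal
            while node is not None:              # follow parent pointers back
                path.append(node)
                node = parent[node]
            return list(reversed(path))
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and passable(nx, ny) and (nx, ny) not in parent):
                parent[(nx, ny)] = (x, y)
                queue.append((nx, ny))
    return None                                  # no path exists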
A) It's possible to avoid the open/closed lists - you could try all possible paths, but that would take a VERY long time.
B) Once you've reached the goal, you use the parent_of_this_node information to "walk backwards" from the goal. Start with the goal, get its parent, get the parent's parent, etc. until you reach the start.
C) I think it doesn't matter - there's no way that the step you describe would result in a shorter path (unless your steps have negative weight, in which case you can't use Dijkstra/A*). In my A* code, I check for this case and ignore it, but do whatever is easiest to code up.
D) It depends - I believe Dijkstra can never reopen the same node (can someone correct me on that?). A* definitely can revisit a node - if you find a shorter path to the same node, you keep that path, otherwise you ignore it.
E) Not sure, I've never done anything specifically for trees myself.
There's a good introduction to A* here:
http://theory.stanford.edu/~amitp/GameProgramming/
that covers a lot of details about how to implement the open set, pick a heuristic, etc.
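For comparison, here is a hedged sketch of a basic A* loop in the same style: a heap as the open list, a best-cost map instead of an explicit closed list, and Manhattan distance as the heuristic. This is one reasonable set of choices, not the only way to implement what those pages describe.

import heapq

def a_star(start, goal, passable, width, height):
    def h(p):                                    # admissible heuristic on a grid
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start)]           # entries are (f, g, node)
    parent, best_g = {start: None}, {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return list(reversed(path))
        if g > best_g.get(node, float("inf")):
            continue                             # stale heap entry, skip it
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height and passable(*nxt)):
                continue
            ng = g + 1                           # uniform step cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = node
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None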
A. Open and Closed lists are common implementation details, not part of the algorithm as such. It's common to do a depth-first tree search without either of these for example, the canonical way being a recursive traversal of the tree.
B. It is typical to ensure that nodes refer back to previous nodes allowing for a plan to be reconstructed by following the back-links. Alternatively you just store the entire solution so far in each candidate, though it would then be misleading to call it a node really.
C. I'm assuming that moving left and then moving right bring you to an equivalent state - in this case, you would have already explored the original state, it would be on the closed list, and therefore should not have been put back onto the open list. You don't traverse the search tree each time because you keep a closed list - often implemented as an O(1) structure - for precisely this purpose of knowing which states have already been fully examined. Note that you cannot always assume that being in the same position is the same as being in the same state - for most game path-finding purposes, it is, but for general purpose search, it is not.
D. Yes, the list of visited states is what you're calling the closed list. You also want to check the open list to ensure you're not planning to examine a given state twice. You don't need to search any tree as such, since you typically store these things in linear structures. The algorithm as a whole is searching a tree (or a graph), and it generates a tree (of nodes representing the state space) but you don't explicitly search through a tree structure at any point within the algorithm.
E. A tree is a type of graph with no cycles/loops in it. Therefore you use the same graph search procedure to search either. It's common to generate a tree structure that represents your search through the graph, which is represented implicitly by the backwards links from each node to the node that preceded/generated it in the search. (Although if you go down the route of holding the entire plan in each state, there will be no tree, just a list of partial solutions.)

Do you use particular conventions for naming complementary variables?

I often find myself trying to come up with good names for complementary pairs of variables; where two variables denote opposing concepts, two participants in some sort of duologue, and so on.
This might be better explained by a counter-example - I maintain an app that prints two graphics as part of a print advertisement. They're stored in the database as TopLogo and LowerLogo, which I have to stop and double-check every time I use them because I'm expecting top to complement bottom, and lower should complement upper.
There's some obvious examples that I think work well:
client / server
source / target for copying/moving data or files from one variable to another
minimum / maximum
but there are some concepts that just don't lend themselves to such neat naming schemes. For example, when paging through records, does 'last' mean 'final' or 'previous'? I recently saw some code that used firstPage, previousPage, nextPage and finalPage to avoid the ambiguous lastPage completely, which I thought was very neat, hence this question.
Do you have any particularly neat variable name pairs you'd care to share with us? (Bonus points if they're the same length, which makes the code so much neater in monospaced fonts.)
Like with all kinds of code style conventions, consistency is what you should strive for.
I would have the development team agree on "standard" pairs of prefixes for common scenarios like "source/destination" or "from/to" and then stick with them for the whole project. As long as every developer is aware of what is meant with a particular prefix in the codebase, it is easier to avoid misunderstandings.
Exceptions to the rule should be clarified in the documentation if the variable is part of a public API, or in comments within the code if its visibility is restricted to a single class or method.
In my databases you'll find many valid-state temporal ("history") tables containing a pair of columns named start_date and end_date. No bonus points for me, then, because I'd rather use the commonly used 'end' than try to come up with an intuitive alternative with the same number of characters as the word 'start'.
I tend to prefer these generic terms even when more context-specific terms may be viable, e.g. preferring employee_start_date over employee_hire_date (what if their employment started for a reason other than being formally hired, e.g. their company was the subject of an acquisition). That said, I'd prefer person_birth_date over person_start_date :)
While one does try to be semantically coherent in obvious cases -- e.g., maximum goes with minimum, and not "lowest" -- in well-structured OO code (which isn't all code, I know) the problem disappears with a good IDE. Classes are short, methods are short, and variables are few in each method. So it doesn't matter what you call the variable pairs so long as they're clear. Your code might not look professional, but real quality is in the code, not in the look of your code.
The problem further disappears if there is good JavaDoc or whatever the documentation system is, and if you have good class names that go with them. For instance, if you have an instance of a Connection class and it has a method called setDestination, that's okay, but if you know that setDestination takes one parameter called destination and it's of the Server class, you're cool... even though you might prefer to call it target, aimHere, placeToSendTheData, or whatever (and the corresponding names, source, comingFromHere, and placeToGetTheDataFrom). Plus the doc system says what the thing is for, and that is priceless.
This next thing might sound stupid and I'm sure I'll get voted down here on StackOverflow, but unique non-professional sounding variable names have a great advantage: I know that my variables have names like placeWeWantTheDataToGo (and the IDE takes care of typing it), but the "serious" guys who do the JDK would never use such silly names. So I know immediately that the variable is one of mine. Incidentally, when I worked with developers in Spain and Italy, they write code with Spanish variable names (not always, but usually). This causes the same effect: we can quickly see that the Conexion class is ours, but the Connection class is not.
[Also, instead of typing your variable names, assign them a constant String somewhere in your code and use that, so if they called it lower or downer instead of low, you're still okay.]
Yes, I do try to name complementary sets of variables systematically so that the symmetry is clear. It is not always easy; sometimes, not even possible. Well, not possible using the rules I lay down for myself - which means I usually try to have the names the same length. The 'top' and 'lower' example would drive me batty (assuming I'm not batty already, which is far from certain); I'd probably use 'upper' and 'lower' because those are the same length; 'top' and 'bottom' would frustrate me too because of the difference in length.