I really need your help with a query in BaseX.
The problem is that I really do not understand the logic behind this language, XQuery.
So I have this first exercise and it is asking me:
"Find the first symptom(s) appearing after June 5, 2012. Report the result in a document having root SYMSAFTER, containing elements SYM."
The database looks like this:
<?xml version="1.0"?>
<PATIENT_SYMS>
  <PATIENT>
    <NAME>Bob</NAME>
    <SYMOCC>
      <SYM>
        <INT>high</INT>
        <DESC> edema </DESC>
      </SYM>
    </SYMOCC>
  </PATIENT>
  <PATIENT>
    <NAME>Ann</NAME>
    <SYMOCC>
      <DATE>2015-08-03</DATE>
      <SYM>
        <INT>low</INT>
        <DESC> asthma </DESC>
      </SYM>
    </SYMOCC>
    <SYMOCC>
      <DATE>2017-05-03</DATE>
      <SYM>
        <INT> high </INT>
        <DESC> nausea </DESC>
      </SYM>
    </SYMOCC>
  </PATIENT>
  <PATIENT>
    <NAME> Tom </NAME>
    <SYMOCC>
      <DATE>2011-01-01</DATE>
      <SYM>
        <INT>high</INT>
        <DESC> headache </DESC>
      </SYM>
      <SYM>
        <INT> low </INT>
        <DESC> nausea </DESC>
      </SYM>
    </SYMOCC>
  </PATIENT>
  <PATIENT>
    <NAME>Sue</NAME>
  </PATIENT>
</PATIENT_SYMS>
The answer to the question is the following:
<SYMSAFTER> {
  for $s in doc('Ps.xml')//SYMOCC
  where $s/DATE > '2012-06-05' and (every $s1 in doc('Ps.xml')//SYMOCC satisfies not($s1/DATE > '2012-06-05') or $s1/DATE >= $s/DATE)
  return $s
}
</SYMSAFTER>
The output will be:
<SYMSAFTER>
  <SYMOCC>
    <DATE>2015-08-03</DATE>
    <SYM>
      <INT>low</INT>
      <DESC>asthma</DESC>
    </SYM>
  </SYMOCC>
</SYMSAFTER>
I honestly don't understand the logic behind that.
How are instructions executed in this language? Is it comparing every single date in $s with every other date in $s1? Is there any order it follows?
How does satisfies / satisfies not work? To understand what is going on I thought: "well, if this works:
satisfies not($s1/DATE > '2012-06-05')
then why does the one below not work?
satisfies ($s1/DATE < '2012-06-05')
Isn't it the exact same thing?"
Why is the last part "OR" and not "AND"? I get that we're checking whether the date is actually the first one by checking that there isn't another date before it, but shouldn't it be "AND"?
Why in this line
$s1/DATE >= $s/DATE
do we put greater-or-equal (and not just greater)? Isn't it obvious that it is going to find the same date as the one in $s?
As you can imagine I'm a bit confused about this, but the information available online is really poor and I had no idea what I needed to do.
Thank you!
Learning any language from online resources alone can be very tough. There's so much information, but it is typically of very mixed quality, and most of it's written in an hour or two with very little design or review. Get yourself a good old-fashioned book, like Priscilla Walmsley's - you know that's written by an expert, who has spent months thinking carefully about how to present information in a logical sequence, and it will have been carefully reviewed by others.
Now let's look at this example query.
for $s in doc('Ps.xml')//SYMOCC
where $s/DATE > '2012-06-05'
  and (every $s1 in doc('Ps.xml')//SYMOCC
       satisfies not($s1/DATE > '2012-06-05')
                 or $s1/DATE >= $s/DATE)
return $s
I actually think this is a very poor answer to the question, but let's analyse what it means.
Firstly, you have to know the language pretty well to know the precedence of the operators, specifically, whether the "or xxxx" clause is part of the "satisfies" condition or not. In fact it is, as I have tried to show in my indentation - but it would be better to use parentheses to make it clear.
The query is looking for SYMOCC elements in doc('Ps.xml')//SYMOCC that satisfy two conditions: (a) the date D must be after 2012-06-05, and (b) every SYMOCC in the document must either have no date after 2012-06-05, or have a date >= D. Those two conditions correspond to the conditions in the requirement that (a) the date must be after 2012-06-05, and (b) no other date after 2012-06-05 may be earlier than it.
Let's try and answer your questions:
How are instructions executed in this language? Is it comparing every single date in $s with every other date in $s1? Is there any order it follows?
It's not an imperative, procedural language, it's a declarative language. It doesn't have instructions, and they aren't executed. It's a logic-based declarative language where you say what conditions the answer must satisfy, and the system works out how to get that answer. Different implementations will do it quite differently depending on their optimization strategy.
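The difference between not(DATE > XXX) and DATE < XXX arises when there is no DATE (some of the SYMOCC elements do not have a DATE child). If there is no DATE, then DATE < XXX and DATE > XXX are both false, so not(DATE > XXX) is true while DATE < XXX is false. The two also differ when DATE is exactly equal to XXX.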
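To make the empty-sequence behaviour concrete, here is a minimal sketch in Python rather than XQuery. It assumes that an XQuery general comparison such as DATE > XXX can be modelled as "some selected DATE value is greater than XXX", and that ISO dates compare correctly as plain strings. The helper names gt and lt are invented for this illustration.

```python
# Model of XQuery general comparisons on a sequence of DATE values.
# Assumption for this sketch: "DATE > x" is true when SOME value in the
# sequence is greater than x, so it is false for an empty sequence.

def gt(dates, x):
    """Model of the test  DATE > x."""
    return any(d > x for d in dates)

def lt(dates, x):
    """Model of the test  DATE < x."""
    return any(d < x for d in dates)

CUTOFF = "2012-06-05"
bob = []                # Bob's SYMOCC has no DATE child
tom = ["2011-01-01"]    # before the cutoff
ann = ["2015-08-03"]    # after the cutoff

for name, dates in [("Bob", bob), ("Tom", tom), ("Ann", ann)]:
    # Columns: name, not(DATE > cutoff), DATE < cutoff
    print(name, not gt(dates, CUTOFF), lt(dates, CUTOFF))
```

For Tom and Ann the two tests agree; for Bob, who has no DATE, not(DATE > cutoff) is true while DATE < cutoff is false, which is exactly the case where the two formulations part ways.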
Why is it OR rather than AND? Well, I think the way the query is expressed is a little perverse, but given the approach taken, it's correct. The date D we're looking for is the first one after 2012-06-05 if every date in the document is either (a) not after 2012-06-05, or (b) no earlier than D. With AND, every date would have to satisfy both conditions at once, which D itself never does, because D is after 2012-06-05.
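Why is the final condition >= rather than >? Because $s1 ranges over every SYMOCC in the document, including $s itself and any other symptoms appearing on the same date. If you wrote >, the condition would fail when $s1 is $s (a date is never greater than itself), so you'd get no results at all.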
Most of your questions seem to be less a problem with XQuery notation, and more a lack of understanding of how predicate logic works. But having said that, I would have produced a different solution to this problem. I would start by sorting all the events by date, then removing those before 2012-06-05, then removing those after the first date in the sequence. That would be something like
let $selected :=
  for $s in doc('Ps.xml')//SYMOCC[DATE]
  where $s/DATE > '2012-06-05'
  order by $s/DATE
  return $s
return $selected[DATE = $selected[1]/DATE]
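For readers more at home in a procedural language, the same three steps (keep the dated occurrences after the cutoff, sort them, keep everything that ties with the first) can be sketched in Python. This is only an illustration of the logic, not a replacement for the XQuery. The XML string is an abbreviated copy of the sample data, and first_after is a name invented here.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample data, kept inline so the sketch is self-contained.
XML = """<PATIENT_SYMS>
  <PATIENT><NAME>Bob</NAME>
    <SYMOCC><SYM><INT>high</INT><DESC>edema</DESC></SYM></SYMOCC>
  </PATIENT>
  <PATIENT><NAME>Ann</NAME>
    <SYMOCC><DATE>2015-08-03</DATE><SYM><INT>low</INT><DESC>asthma</DESC></SYM></SYMOCC>
    <SYMOCC><DATE>2017-05-03</DATE><SYM><INT>high</INT><DESC>nausea</DESC></SYM></SYMOCC>
  </PATIENT>
  <PATIENT><NAME>Tom</NAME>
    <SYMOCC><DATE>2011-01-01</DATE><SYM><INT>high</INT><DESC>headache</DESC></SYM></SYMOCC>
  </PATIENT>
</PATIENT_SYMS>"""

def first_after(root, cutoff):
    """Return the SYMOCC elements sharing the earliest DATE after cutoff."""
    # Step 1: only occurrences that have a DATE, and only those after the cutoff.
    dated = [s for s in root.iter("SYMOCC") if s.find("DATE") is not None]
    selected = [s for s in dated if s.findtext("DATE") > cutoff]
    # Step 2: sort by date (ISO dates sort correctly as strings).
    selected.sort(key=lambda s: s.findtext("DATE"))
    if not selected:
        return []
    # Step 3: keep every occurrence that falls on the first date.
    first = selected[0].findtext("DATE")
    return [s for s in selected if s.findtext("DATE") == first]

root = ET.fromstring(XML)
result = first_after(root, "2012-06-05")
print([s.findtext("DATE") for s in result])  # ['2015-08-03']
```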
This has been bugging me for a long time.
Suppose I have a boolean function F defined as follows:
Now, it can be expressed in its SOP form as:
$F = \bar{X}Y\bar{Z} + XYZ$
But I fail to understand why we always complement the 0s to express them as 1. Is it assumed that the inputs X, Y and Z will always be 1?
What is the practical application of that? All the YouTube videos I watched on this topic show how to express a function in SOP form or as a sum of minterms, but none of them explained why we need this. Why do we need minterms in the first place?
As of now, I believe that we design circuits to yield and take only 1 and that's where minterms come in handy. But I couldn't get any confirmation of this thing anywhere so I am not sure I am right.
Maxterms are even more confusing. Do we design circuits that would yield and take only 0s? Is that the purpose of maxterms?
Why do we need minterms in the first place?
We do not need minterms, we need a way to solve a logic design problem, i.e. given a truth table, find a logic circuit able to reproduce this truth table.
Obviously, this requires a methodology. Minterms and sum-of-products are one means to achieve that. Maxterms and product-of-sums are another. In either case, you get an algebraic representation of your truth table and you can either implement it directly or try to apply standard theorems of boolean algebra to find an equivalent, but simpler, representation.
But these are not the only tools. For instance, with Karnaugh maps, you rewrite your truth table with some rules and you can simultaneously find an algebraic representation and reduce its complexity, and it does not consider minterms. Its main drawback is that it becomes unworkable if the number of inputs rises and it cannot be considered as a general way to solve the problem of logic design.
It happens that minterms (or maxterms) do not have this drawback, and can be used to solve any problem. We get a truth table and we can directly convert it into an equation with ands, ors and nots. Indeed minterms are somewhat simpler for human beings than maxterms, but that is just a matter of taste or of a reduced number of parentheses; they are actually equivalent.
But I fail to understand why we always complement the 0s to express them as 1. Is it assumed that the inputs X, Y and Z will always be 1?
Assume that we have a truth table with only one output at 1, for instance line 3 of your table. It means that when x=0, y=1 and z=0, the output will be one. So, can I express that in boolean logic? With the SOP methodology, we say that we want a solution for this problem that is an "and" of the inputs or of their complements. And obviously the solution is "x must be false and y must be true and z must be false" or "(not x) must be true and y must be true and (not z) must be true", hence the minterm /x.y./z. So complementing when we have a 0 and leaving unchanged when we have a 1 is a way to find the equation that will be true only when xyz=010.
If I have another table with only one output at 1 (for instance line 8 of your table), we can find similarly that I can implement this TT with x.y.z.
Now if I have a TT with 2 lines at 1, one can use the property of OR gates and do the OR of the previous circuits. when the output of the first one is 1, it will force this behavior and ditto for the second. And we directly get the solution for your table /xy/z+xyz
This can be extended to any number of ones in the TT and gives a systematic way to find an equation equivalent to a truth table.
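The procedure just described is mechanical enough to sketch in a few lines of Python. The function names minterm and sum_of_products are invented for this illustration, and the notation (/x for "not x", . for "and", + for "or") follows the one used above. The function F encodes the table under discussion, assuming it has ones only at xyz=010 and xyz=111.

```python
from itertools import product

def minterm(names, bits):
    """Product term that is 1 only for this input row: complement every input that is 0."""
    return ".".join(n if b else "/" + n for n, b in zip(names, bits))

def sum_of_products(names, f):
    """OR together the minterms of all truth-table rows where f is 1."""
    rows = product((0, 1), repeat=len(names))   # rows in the usual 000, 001, ... order
    terms = [minterm(names, bits) for bits in rows if f(*bits)]
    return " + ".join(terms) if terms else "0"

def F(x, y, z):
    # Assumed truth table: output 1 only on line 3 (010) and line 8 (111).
    return (x, y, z) in {(0, 1, 0), (1, 1, 1)}

print(sum_of_products("xyz", F))  # /x.y./z + x.y.z
```

Nothing here assumes the inputs are always 1: each minterm is simply an expression that evaluates to 1 for exactly one combination of inputs.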
So just think of minterms and maxterms as a tool to translate a TT into equations. What is important is the truth table (that describes the behaviour of what you want to do) and the equations (that give you a way to realize it).
Pattern matching (as found in e.g. Prolog, the ML family languages and various expert system shells) normally operates by matching a query against data element by element in strict order.
In domains like automated theorem proving, however, there is a requirement to take into account that some operators are associative and commutative. Suppose we have data
A or B or C
and query
C or $X
Going by surface syntax this doesn't match, but logically it should match with $X bound to A or B because or is associative and commutative.
Is there any existing system, in any language, that does this sort of thing?
Associative-Commutative pattern matching has been around since 1981 and earlier, and is still a hot topic today.
There are lots of systems that implement this idea and make it useful; it means you can avoid writing complicated pattern matches when associativity or commutativity could be used to make the pattern match. Yes, it can be expensive; better that the pattern matcher does this automatically than that you do it badly by hand.
You can see an example in a rewrite system for algebra and simple calculus implemented using our program transformation system. In this example, the symbolic language to be processed is defined by grammar rules, and those rules that have A-C properties are marked. Rewrites on trees produced by parsing the symbolic language are automatically extended to match.
The Maude term rewriter implements associative and commutative pattern matching.
http://maude.cs.uiuc.edu/
I've never encountered such a thing, and I just had a more detailed look.
There is a sound computational reason for not implementing this by default - one has to essentially generate all combinations of the input before pattern matching, or you have to generate the full cross-product worth of match clauses.
I suspect that the usual way to implement this would be to simply write both patterns (in the binary case), i.e., have patterns for both C or $X and $X or C.
Depending on the underlying organisation of data (it's usually tuples), this pattern matching would involve rearranging the order of tuple elements, which would be weird (particularly in a strongly typed environment!). If it's lists instead, then you're on even shakier ground.
Incidentally, I suspect that the operation you fundamentally want is disjoint union patterns on sets, e.g.:
foo (Or ({C} disjointUnion {X})) = ...
The only programming environment I've seen that deals with sets in any detail would be Isabelle/HOL, and I'm still not sure that you can construct pattern matches over them.
EDIT: It looks like Isabelle's function functionality (rather than fun) will let you define complex non-constructor patterns, except then you have to prove that they are used consistently, and you can't use the code generator anymore.
EDIT 2: The way I implemented similar functionality over n commutative, associative and transitive operators was this:
My terms were of the form A | B | C | D, while queries were of the form B | C | $X, where $X was permitted to match zero or more things. I pre-sorted these using lexicographic ordering, so that variables always occurred in the last position.
First, you construct all pairwise matches, ignoring variables for now, and recording those that match according to your rules.
{ (B,B), (C,C) }
If you treat this as a bipartite graph, then you are essentially doing a perfect marriage problem. There exist fast algorithms for finding these.
Assuming you find one, then you gather up everything that does not appear on the left-hand side of your relation (in this example, A and D), and you stuff them into the variable $X, and your match is complete. Obviously you can fail at any stage here, but this will mostly happen if there is no variable free on the RHS, or if there exists a constructor on the LHS that is not matched by anything (preventing you from finding a perfect match).
Sorry if this is a bit muddled. It's been a while since I wrote this code, but I hope this helps you, even a little bit!
For the record, this might not be a good approach in all cases. I had very complex notions of 'match' on subterms (i.e., not simple equality), and so building sets or anything would not have worked. Maybe that'll work in your case though and you can compute disjoint unions directly.
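For the simple case where "match" does mean plain equality, the disjoint-union idea can be sketched with multisets. This is a simplification of the approach described above, not the original code: it swaps the bipartite matching step for a multiset subtraction, and it assumes terms are already flattened into lists of operands and that a pattern has at most one variable (written with a leading $). The name ac_match is invented here.

```python
from collections import Counter

def ac_match(term, pattern):
    """Match a flattened associative-commutative term against a pattern.

    term:    operands of the term, e.g. ["A", "B", "C"] for  A or B or C
    pattern: operands of the query; a name starting with "$" is a variable
             that absorbs whatever operands are left over (possibly none).
    Returns a dict of bindings, or None when there is no match.
    """
    variables = [p for p in pattern if p.startswith("$")]
    if len(variables) > 1:
        raise ValueError("this sketch handles at most one variable")
    constants = Counter(p for p in pattern if not p.startswith("$"))
    remaining = Counter(term)
    # Every constant in the pattern must occur in the term, with multiplicity.
    if constants - remaining:
        return None
    remaining -= constants
    rest = sorted(remaining.elements())
    if not variables:
        # Without a variable, nothing may be left unmatched.
        return {} if not rest else None
    return {variables[0]: rest}

print(ac_match(["A", "B", "C"], ["C", "$X"]))            # {'$X': ['A', 'B']}
print(ac_match(["A", "B", "C", "D"], ["B", "C", "$X"]))  # {'$X': ['A', 'D']}
print(ac_match(["A", "B"], ["C", "$X"]))                 # None
```

Because operands are compared as a multiset, the order in which they were written does not matter, which is the effect of treating the operator as associative and commutative.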
What existing terminology and art is there for data types that have values implying ranges of tolerance, not specific points?
An example: time values. In ISO 8601 notation, the value 1964 encompasses the values 1964-05, 1964-05-02, 1964-05-02T18, 1964-05-02T18:27, 1964-05-02T18:27:43, 1964-05-02T18:27:43.0613.
That is, each one of those values is not a zero-dimensional point, but an interval encompassing a range of more-precise values.
The more precise values in that set should compare equal to the less-precise ones:
1964 < 1964-05-02 → False
1964 > 1964-05-02 → False
1964 = 1964-05-02 → True
and ‘greater than’ and ‘less than’ should be both false for values encompassed within a less-precise value. The intervals don't overlap, so that's not a concern.
1964-05-02T18:27:43 < 1964-05-02T18:30:11 → True
1964-05-02T18:27:43 < 1964-05-02 → False
1964-05-02T18:27:43 < 1964-05-04 → True
But how should such types be implemented? What kind of comparison am I talking about? What about arithmetic on such values?
In short, what existing body of knowledge should I be looking to for exploration of these concepts?
As your italics managed to work out, this is called interval arithmetic.
You're specifically interested in order and equality relationships between interval values. The Wikipedia article doesn't talk about that, but I assume it has been worked on, as it's a fairly basic thing to want to do with numbers, even fuzzy ones.
I would imagine that you would say that two intervals are not equal if their ranges do not overlap at all, and that an interval is greater than another interval if the former's range lies entirely above the latter's.
However, i don't think you can have a sensible definition of equal; you might need several different kinds of quasi-equality. You could say two ranges which are not not equal are equal, but i don't think that really helps. That's more like possibly equal. Then there's your idea of one range containing another, in which case you might say that the larger was roughly equal to the smaller. However, since the roughly equal relationship is not symmetric, it's not an equivalence relation, and so it doesn't make a good kind of general-purpose equality.
Or maybe this whole thing is just a generalised case of the idea of significant figures? I suppose interval arithmetic is just the arithmetic you use to deal with numbers that have significant figures.
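The relations discussed in this thread can be sketched by treating each value as a half-open time interval. This is one possible reading under stated assumptions, not an established library: the class Span and the helpers year, day and second are invented for the illustration, and "possibly equal" is modelled as overlap, which (as noted above) is not a true equivalence relation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Span:
    start: datetime  # inclusive
    end: datetime    # exclusive

    def before(self, other):
        """Strictly less than: this range lies entirely below the other."""
        return self.end <= other.start

    def after(self, other):
        """Strictly greater than: this range lies entirely above the other."""
        return other.end <= self.start

    def possibly_equal(self, other):
        """Neither before nor after, i.e. the ranges overlap."""
        return not self.before(other) and not self.after(other)

    def contains(self, other):
        """The other value is a more precise value inside this one."""
        return self.start <= other.start and other.end <= self.end

def year(y):
    return Span(datetime(y, 1, 1), datetime(y + 1, 1, 1))

def day(y, mo, d):
    start = datetime(y, mo, d)
    return Span(start, start + timedelta(days=1))

def second(y, mo, d, h, mi, s):
    start = datetime(y, mo, d, h, mi, s)
    return Span(start, start + timedelta(seconds=1))

print(year(1964).before(day(1964, 5, 2)))          # False
print(year(1964).possibly_equal(day(1964, 5, 2)))  # True
print(second(1964, 5, 2, 18, 27, 43).before(day(1964, 5, 4)))  # True
```

Keeping before, after, possibly_equal and contains as separate methods, rather than overloading < and ==, avoids pretending that any one of them is a general-purpose equality.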
I'm not a Natural Language Processing student, yet I know it's not a trivial strcmp(n1,n2).
Here's what i've learned so far:
comparing Personal Names can't be solved 100%
there are ways to achieve certain degree of accuracy.
the answer will be locale-specific, that's OK.
I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.
For example, all the names below can refer to the same person:
Berry Tsakala
Bernard Tsakala
Berry J. Tsakala
Tsakala, Berry
I'm trying to:
build (or copy) an algorithm which grades the relationship between 2 input names
find an indexing method (for names in my database, for hash tables, etc.)
note:
My task isn't about finding names in text, but to compare 2 names. e.g.
name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%
I used Tanimoto Coefficient for a quick (but not super) solution, in Python:
"""
Formula:
Na = number of set A elements
Nb = number of set B elements
Nc = number of common items
T = Nc / (Na + Nb - Nc)
"""
def tanimoto(a, b):
    # Elements of a that also occur in b (characters, when a and b are strings).
    c = [v for v in a if v in b]
    return float(len(c)) / (len(a) + len(b) - len(c))

def name_compare(name1, name2):
    return tanimoto(name1, name2)
>>> name_compare("James Brown", "Brown, James")
0.91666666666666663
>>> name_compare("Berry Tsakala", "Bernard Tsakala")
0.75
>>>
Edit: A link to a good and useful book.
Soundex is sometimes used to compare similar names. It doesn't deal with first name/last name ordering, but you could probably just have your code look for the comma to solve that problem.
We've just been doing this sort of work non-stop lately and the approach we've taken is to have a look-up table or alias list. If you can discount misspellings/misheard/non-english names then the difficult part is taken away. In your examples we would assume that the first word and the last word are the forename and the surname. Anything in between would be discarded (middle names, initials). Berry and Bernard would be in the alias list - and when Tsakala did not match to Berry we would flip the word order around and then get the match.
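A rough sketch of that approach in Python, under stated assumptions: the alias table below is a tiny made-up sample (a real one would be much larger and locale-specific), and the names ALIASES, canonical, first_last and same_person are invented for this illustration.

```python
# Hypothetical alias list: maps a nickname to a canonical forename.
ALIASES = {"berry": "bernard", "dick": "richard", "steve": "stephen"}

def canonical(word):
    """Lower-case a word and replace a known alias by its canonical form."""
    word = word.lower()
    return ALIASES.get(word, word)

def first_last(name):
    """Keep only the first and last word; middle names and initials are discarded."""
    words = name.replace(",", " ").replace(".", " ").split()
    if not words:
        return ("", "")
    # Aliases are applied to both words because we do not yet know
    # which one is the forename and which one is the surname.
    return (canonical(words[0]), canonical(words[-1]))

def same_person(name1, name2):
    """True if the names match directly or after flipping the word order."""
    a = first_last(name1)
    b = first_last(name2)
    return a == b or a == (b[1], b[0])

names = ["Berry Tsakala", "Bernard Tsakala", "Berry J. Tsakala", "Tsakala, Berry"]
print(all(same_person(names[0], n) for n in names))  # True
```

This gives a yes/no answer rather than a graded score; a score could be layered on top, for example by lowering confidence when a match needed an alias or a flip.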
One thing you need to understand is the database/people lists you are dealing with. In the English speaking world middle names are inconsistently recorded. So you can't make or deny a match based on the middle name or middle initial. Soundex will not help you with common name aliases such as "Dick" and "Richard", "Berry" and "Bernard" and possibly "Steve" and "Stephen". In some communities it is quite common for people to live at the same address and have 2 or 3 generations living at that address with the same name. The only way you can separate them is by date of birth. Date of birth may or may not be recorded. If you have the clout then you should probably make the recording of date of birth mandatory. A lot of "people databases" either don't record date of birth or won't give them away due to privacy reasons.
Effectively people name matching is not that complicated. Its entirely based on the quality of the data supplied. What happens in practice is that a lot of records remain unmatched - and even a human looking at them can't resolve the mismatch. A human may notice name aliases not recorded in the aliases list or may be able to look up details of the person on the internet - but you can't really expect your programme to do that.
Banks, credit rating organisations and the government have a lot of detailed information about us. Previous addresses, date of birth etc. And that helps them join up names. But for us normal programmers there is no magic bullet.
Analyzing name order and the existence of middle names/initials is trivial, of course, so it looks like the real challenge is knowing common name alternatives. I doubt this can be done without using some sort of nickname lookup table. This list is a good starting point. It doesn't map Bernard to Berry, but it would probably catch the most common cases. Perhaps an even more exhaustive list can be found elsewhere, but I definitely think that a locale-specific lookup table is the way to go.
I had real problems with the Tanimoto using utf-8.
What works for languages that use diacritical signs is difflib.SequenceMatcher()
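A minimal sketch of how difflib.SequenceMatcher might be used for this. The normalisation steps (Unicode NFC, case folding, dropping commas and sorting the words so that "Brown, James" lines up with "James Brown") are assumptions added for the illustration, and name_similarity is an invented name.

```python
import difflib
import unicodedata

def normalise(name):
    """Case-fold, normalise Unicode, drop commas and sort the words."""
    name = unicodedata.normalize("NFC", name).casefold()
    words = name.replace(",", " ").split()
    return " ".join(sorted(words))

def name_similarity(name1, name2):
    """Similarity ratio in [0, 1] between the two normalised names."""
    matcher = difflib.SequenceMatcher(None, normalise(name1), normalise(name2))
    return matcher.ratio()

print(name_similarity("James Brown", "Brown, James"))  # 1.0
print(name_similarity("Berry Tsakala", "Bernard Tsakala"))
```

Because SequenceMatcher works on Python strings (sequences of code points), accented characters are compared as whole characters rather than as raw UTF-8 bytes.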
I'm quite anal about form validation. So while creating a validator for a "date of birth" (DOB) field in one of my current projects for a job application form (platform/language is neutral in this context), I wanted something to prevent 'punky' inputs.
I used a date picker and restricted the max date to be XX years before the current day. XX makes sense for this scenario as anyone younger shouldn't even be applying for the job.
The validation error message is: You seem too young for the job.
Then I began to get adventurous. How about?
If DOB is more than 120 years ago, message: "You cannot be that old!!!"
If DOB is in the future, message: "You must be kidding, you are not born yet!!!"
In the end, I deployed without the last 2, too cheeky for my no-nonsense client.
I would like to know how far/much would you guys go to validate DOB fields for good usability (or humor)?
Similarly for dates like, "Date of marriage", "Year of graduation" etc...
PS: As I was about to submit this post, there's a warning under the title textbox:
"The question you're asking appears subjective and is likely to be closed."
Fingers crossed.
To add:
I'm quite surprised that some/most of the guys are not too concerned about the validation. I repeat one of my comments here:
If the user entered the date wrongly (something very obvious), whether by intent or by mistake, catching that is one of the purposes of validators. Once data goes into the system, the site owner only knows the input is wrong; he/she would not know the actual value without asking the user. If this field is highly important, it will not be a pretty scenario.
Think about the times you've filled out forms. How many times have you been frustrated because some "overly clever" programmer inserted some "validation" that just happened to be incorrect for your circumstance? I say, trust the user. Come to think of it, as time goes on I guess people are living longer and getting on the net at earlier ages, anyway. :P
don't forget you can also warn the user against unlikely values. In most cases, a typo is more likely than deliberately being awkward.
So for your application, maybe something like this:
Age < min. applicant age - error
Age > common retirement age - warning
Age > expected life span - error
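Those three rules could be sketched as follows. The threshold values are placeholders chosen for the illustration (the real ones depend on the job and the jurisdiction), and check_age is an invented name.

```python
# Hypothetical thresholds, for illustration only.
MIN_APPLICANT_AGE = 18
RETIREMENT_AGE = 65
EXPECTED_LIFE_SPAN = 120

def check_age(age):
    """Return (level, message) where level is 'ok', 'warning' or 'error'."""
    if age < MIN_APPLICANT_AGE:
        return ("error", "Applicants must be at least %d." % MIN_APPLICANT_AGE)
    if age > EXPECTED_LIFE_SPAN:
        return ("error", "Please check the year of birth.")
    if age > RETIREMENT_AGE:
        # Unlikely but possible: let the user confirm instead of rejecting.
        return ("warning", "Please confirm your date of birth.")
    return ("ok", "")

for age in (17, 30, 70, 130):
    print(age, check_age(age)[0])
```

A warning lets the form be submitted after confirmation, so a typo is caught without blocking the rare applicant for whom the unusual value is correct.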
Validation vs. Correctness
The point of input validation is to ensure all elements are within the range allowed for and expected by further processing - i.e. if your database guarantees all applicants in the DB are 18 years or older, validate that. If your database also accepts school kids applying for internships, don't.
Everything unusual is just a warning. Yes, a value of 120 years is crazy, you should warn the user and possibly flag this record as suspicious / for review. However, there's no point in rejecting it (unless you have a business rule that e.g. all applicants are younger than 70).
Fake trust
Imagine what happens if you tell one user that "you rule out unlikely DOBs at the input". She might tell her co-worker that the DOB is "already validated". He ends up with an unfounded trust that the applicant really is 90, because if it were a fake you would have rejected it.
All further processing - by human or by computer - must still assume the DOB may be incorrect - if only because of a typo. You are trying to create a guarantee you can't actually make. Many users trust the computer they use every day more than a stranger; you are trying to reinforce this trust - which is IMO a fallacy.
Transmutation
Many applications live much longer than the original implementer imagined, and quite a few will be used for purposes beyond his wildest dreams. Building in artificial limits that simplify neither the actual processing nor the job of the operator doesn't actually help.
(That probably puts me into the no-nonsense category of your client - but that's my way to be "anal about validation": knowing when to stop :))
I think validation is incredibly important, but not necessarily in your situation. Which isn't to say that your situation is trivial, I just have my own date-oriented nits to pick.
Specifically, my concerns are always in keeping things in logical order. If someone says they were born in 1802, that's fine (sorta), I just want their date of graduation to be greater than their date of birth. But you run into itchy little problems when it comes to time (as in hours and minutes), for instance, if a user chooses 8:30 as the start time and then chooses 9:15 as the end time, but then realizes that the end time was 8:45. They decide to change the 9 to an 8 with the intention of changing the minutes to :45. But my validation script is too busy saying "Hey Wait! 8:15 is before 8:30, nice try!" but I can't risk letting them leave it wrong, etc etc.
For your situation specifically, I would lean toward what is ethically right. Because, as has been pointed out, someone could be entering a family history (with DOBs in the 1600s) or future purchases (with dates after today), there is no realistic limit on dates in general. But there are limits to your scenario, i.e.:
If Age is less than legal working age (16 in most parts of the US), don't even offer anything higher than that year as an option (if you are using drop down).
If Age is beyond reasonable working age (which can be a sensitive subject) offer the highest value based on retirement age and simply add a ">" in front of that year. If someone is 75 and applying for an admin-level job, they will be more pleased that you made things simple rather than offended that you didn't have their year of birth listed. If anything, they will be impressed (I think) that you went this route instead of nothing at all, implying they shouldn't waste their time.
In the end you have a simple drop down very easy to script (example in PHP):
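// Offer birth years for ages 16 to 64, then a single catch-all entry for 65 and over.
$currentYear = date('Y');
echo "<select name=\"YearOfBirth\">";
for ($i = 16; $i <= 64; $i++) {
    $optionYear = $currentYear - $i;
    echo "<option value=\"$optionYear\">$optionYear</option>";
}
// Catch-all entry, shown with a leading ">" as described above (escaped for HTML).
$greaterYear = $currentYear - 65;
echo "<option value=\"&gt;$greaterYear\">&gt;$greaterYear</option>";
echo "</select>";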
When asking living people for their birthdate, only reject values that are definitely wrong. Any birthdate in the future is definitely wrong. And I would draw a line and say that any birthdate before (say) 1880 is definitely wrong. Anything else is a valid birthdate.
So any birthdate that fails the above tests is rejected with a message at field level, like "This date is in the future/too far in the past. Please enter your birthdate."
Any other birthdate is valid (maybe the user really is 11 years old, or 108). But the overall form may be rejected by business rules. For example, "You must be at least 18 years old to apply."
The idea is to separate individual field validation from form validation. Conflating them yields complicated rules. Separating means you can re-use the rules for the field (e.g. "DOB of a living person must be between 1/1/1880 and today") in other contexts.
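A small sketch of that separation, with the field rule and the business rule kept in different functions. The function names and the explicit today parameter (passed in to keep the example deterministic) are choices made for this illustration; the 1880 limit, the messages and the minimum age of 18 are the examples used above.

```python
from datetime import date

EARLIEST_DOB = date(1880, 1, 1)

def validate_dob_field(dob, today):
    """Field-level rule, reusable anywhere: DOB of a living person. Returns an error or None."""
    if dob > today:
        return "This date is in the future. Please enter your birthdate."
    if dob < EARLIEST_DOB:
        return "This date is too far in the past. Please enter your birthdate."
    return None

def age_on(dob, today):
    """Completed years between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

def validate_application(dob, today, min_age=18):
    """Form-level check: the field rule first, then the business rule."""
    field_error = validate_dob_field(dob, today)
    if field_error:
        return field_error
    if age_on(dob, today) < min_age:
        return "You must be at least %d years old to apply." % min_age
    return None

today = date(2024, 6, 1)
print(validate_dob_field(date(2013, 1, 1), today))    # None: a valid birthdate
print(validate_application(date(2013, 1, 1), today))  # rejected by the business rule
```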
If you're doing this for anything professional - like a job application - I might not use "!" in messages to users. Take a look at any well done website you'd like, you're not going to find it in common use.
Valid date: check
Date not in future: maybe (I deal with medical applications, so I suppose you could be treating unborn babies)
Date not older than 120 years: probably
I'm not a big fan of over-engineering these things, particularly if a user mistake is relatively harmless and can be spotted and fixed easily. That's how I approach it anyway.
Valid Date:
I'll go to the extent of checking whether this date exists or not, e.g. 29th Feb in a leap year and so on.
Date in the future:
we usually check the age (this year - dob given) and must be at least a certain age to sign up.
Date older than 120 years or not:
I won't check. 200 years would be a safer limit? (in case a 121 year old man wants to use the computer *chuckles*)
I think you should consider your actual requirements when designing validations. Yes, if the field is a date field (and perhaps more importantly if it stores a date but some less than stellar DBA made it a varchar), make sure only a valid date is submitted. This is critical. Invalid dates cause all sorts of issues with querying the data. If it is a date that must of necessity have occurred in the past, limit the date range to the present date or earlier.
After that go with what your client wants. If they want to pay for you to eliminate people younger than work age, they will tell you. Rejecting applicants above a top age limit can get you into legal trouble for age discrimination. The client may not want you to do this either.
Humour is a pretty subjective thing and very project specific so it’s a bit difficult to answer along those lines. Having said that, if the application supports a formal process such as applying for a job I’d probably err on the side of caution and keep it pretty factual.
As for validation, I believe the effort you go to here should be proportional to the impact of invalid data making its way through from the UI. Going back to the job application form, I imagine there will be a human review process at some time so the risk of invalid data is minimal whether the data was intentionally or inadvertently entered incorrectly.
If you’re worried about “punky” or bot driven inputs then use Captcha. Having said all that, I reckon you’re pretty safe with the validation rules you’ve used.
Well I'm not a programmer (more of a BA) though I'm trying to gain some development skills as I think it may help me be a better BA. I've done a bit of VBA (don't laugh).
Anyway in thinking about this here's my two cents
1) Drop the humour. What's funny to you now won't be to someone else. Furthermore, what's funny after two or three goes isn't funny after 25 or 30 - it's just tiresome even if you are dealing with a jokey crowd!
2) I am coming round to the idea that unless you can definitively validate something as being plain wrong, E.g. you don't want to let someone enter a value < 0, then you should consider warning rather than prevention via dialogues or whatever the OS standard happens to be.
Hey, what do I know? In a week I'll have changed my mind (I'm a Business Analyst) and will be demanding instant responses from developers ;->
Let's just use two digit years everywhere. No one's going to be using our software after 1999!
Below are the checks that you can do while validating the DOB:
calculate the age from the DOB and do the following checks
AGE > XX [XX is the min age required to apply]
AGE < XX {Should throw a message mentioning that you are not old enough}
AGE = XX
If there is no upper limit of age then we can take it as retirement age else verify with the upper limit for the next two checks
AGE < Retirement Age
AGE > Retirement Age {Should throw a message mentioning that you are too old to apply}
AGE = retirement Age
DOB is a valid date (by giving valid date)
DOB is invalid -
Enter 0 in either of day/month/Year
Enter some negative Value
Enter some invalid date e.g. 30th feb or 32 Jan etc
Enter valid date with different separators (although the date is a valid one but due to different separators it will become an invalid one)
Enter date with different formats such as by giving dd/mm/yyyy, dd/mm/yy, dd/MON/yyyy etc.
Enter some future date (Invalid here as your purpose is something different)
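Several of the invalid-input cases in this checklist can be exercised against a strict parser. The sketch below assumes a single accepted format, dd/mm/yyyy, and relies on Python's datetime.strptime to reject impossible dates. The names parse_dob and is_valid_dob are invented for the illustration, and today is passed in explicitly to keep the checks deterministic.

```python
from datetime import date, datetime

def parse_dob(text, today, fmt="%d/%m/%Y"):
    """Parse text as a date of birth in the given format, or raise ValueError."""
    try:
        dob = datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        # Covers non-existent dates, zero or negative parts, wrong separators and formats.
        raise ValueError("not a valid date in the expected format: %r" % text) from None
    if dob > today:
        raise ValueError("date of birth is in the future")
    return dob

def is_valid_dob(text, today):
    """Convenience wrapper returning True or False instead of raising."""
    try:
        parse_dob(text, today)
        return True
    except ValueError:
        return False

today = date(2024, 6, 1)
cases = ["15/08/1990", "00/01/1990", "-1/01/1990", "30/02/1990",
         "32/01/1990", "15-08-1990", "15/08/90", "15/AUG/1990", "01/01/2030"]
for text in cases:
    print(text, is_valid_dob(text, today))
```

Only the first case is accepted; every other entry corresponds to one of the invalid inputs listed above.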
being a perfectionist i would go here for 150 :D
as low as the chances are, people have passed the 120, and who know what shall happens in the coming 30 years :D
i don't find it that important however..
It all depends on the application. A line of business (LOB) application for order processing is very different to tracking historical or future data.
One can agree it needs to be a valid date, but consider there are multiple calendars (e.g. month number can be 13, year can be over 5000).
Validate that it is an integer, and be helpful; I think anything else (an abusive, big-brother, over-engineered system) is a bad idea.
People should be allowed to lie on these forms if they wish; it's not a legal thing, it's a website.
Don't take it so seriously.
Just let the user pick a date. The user should be in control..not the system/developer. The only date you should avoid with respect to DOB is the future as that is incorrect (i.e. preventing error by design). The date picker you provide should handle any date format issues.
And definitely do not throw up any cheeky exceptions/messages. Your message should aid the user in recognising & recovering from an error.
Hope that helps.