Timezones in MySQL

I'm playing with timezones in MySQL.
I need to assign people to timezones, and so I looked in mysql.time_zone_data.
Australia seems to have 5 independent timezones [1], so why does mysql.time_zone_data have 23 options?
Australia/ACT
Australia/Adelaide
Australia/Brisbane
Australia/Broken_Hill
Australia/Canberra
Australia/Currie
Australia/Darwin
Australia/Eucla
Australia/Hobart
Australia/LHI
Australia/Lindeman
Australia/Lord_Howe
Australia/Melbourne
Australia/NSW
Australia/North
Australia/Perth
Australia/Queensland
Australia/South
Australia/Sydney
Australia/Tasmania
Australia/Victoria
Australia/West
Australia/Yancowinna
[1] http://www.timetemperature.com/australia/australia_time_zones.shtml

For the same reason that your OS offers several options.
Not everyone knows which timezone their town is in, so the list includes a number of large cities, several of which may share a zone. You can look for a city near your location and that selects the correct timezone for you.
For example, Berlin and Munich are in the same zone, as are Canberra and Sydney.

why does mysql.time_zone_data have 23 options?
Usually because each of those mini-regions has historically had different time rules. They may be using the same timezones now, but if you want to reliably convert a time that might be in the past, you'll need to know which exact set of rules the locale has not just now, but for as far back in history as timezones have been stably legislated.
This is what makes timezone databases so absurdly large. Timezones are a horror.
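To check whether two entries really have different histories, one could compare the UTC offsets they yield on the same dates across a range of years. The sketch below is only an illustration: the helper names are made up for this example, it samples a single mid-January instant per year (so it can miss differences at other times of the year), and compare_zone_names assumes Python 3.9+ with the IANA database available to zoneinfo.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs the IANA database installed


def utc_offset_hours(tz, year, month=1, day=15):
    """UTC offset, in hours, that tz applies at noon UTC on the given date."""
    instant = datetime(year, month, day, 12, tzinfo=timezone.utc)
    return instant.astimezone(tz).utcoffset() / timedelta(hours=1)


def years_where_offsets_differ(tz_a, tz_b, years):
    """Years in which the two zones disagree on the sampled date."""
    return [y for y in years
            if utc_offset_hours(tz_a, y) != utc_offset_hours(tz_b, y)]


def compare_zone_names(name_a, name_b, years=range(1970, 2011)):
    """Same comparison, looking the zones up by IANA name (hypothetical helper)."""
    return years_where_offsets_differ(ZoneInfo(name_a), ZoneInfo(name_b), years)
```

For instance, compare_zone_names("Australia/Brisbane", "Australia/Lindeman") would return the years, if any, in which those two entries disagreed in mid-January according to the installed database.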

Related

How do I specify a geographic location with the HTML "time" element?

The HTML5 specs for the time element have a note under the heading "A valid time-zone offset string" that says this:
For times without dates (or times referring to events that recur on multiple dates), specifying the geographic location that controls the time is usually more useful than specifying a time zone offset, because geographic locations change time zone offsets with daylight savings time. [...]
While I totally agree with this statement, I have been wondering - and this is my question - how can I specify a geographic location in the time element? I've been looking through the specs but I haven't found a clue. Additional web research also didn't yield any useful information. Can someone point me in the right direction?
BTW: I'm a beginner in web programming, and although this really seems to be just a minor detail I like to get things right from the start.
As far as I am aware, there is no way to specify <time> via region with raw HTML. I believe the documentation is simply stating that it's more useful to do it based on region, not that it is necessarily possible with raw HTML. This can certainly be achieved with a back-end language however, and injected into the <time> element (or datetime attribute).
Timezones can be specified with +, offset in relation to GMT:
<!-- GMT+1 (like Italy) -->
<time>+01:00</time>
And can be combined with fully-qualified times as well:
<!-- 16th September 2014 at 18 hours, 20 minutes, and 30 seconds
in a time zone of GMT+1 (like Italy) -->
<time>2014-09-16T18:20:30+01:00</time> in Italy
As is demonstrated above, perhaps the best you can do is explicitly state the relevant region, such as <time …>…</time> in Italy.
In order to retrieve the geographic timezone, IANA has a list of all applicable timezones per region.
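As a rough illustration of the back-end approach, the sketch below builds the markup from a local wall-clock time plus a timezone object. The function name and the output layout are assumptions made for this example, not part of any spec; the geographic knowledge lives in the tzinfo object that is passed in.

```python
from datetime import datetime, timedelta, timezone
from html import escape


def render_time_element(local_naive, tz, place):
    """Return a <time> element whose datetime attribute carries the offset
    that tz applies to the given local wall-clock time."""
    aware = local_naive.replace(tzinfo=tz)
    machine = aware.isoformat(timespec="seconds")   # e.g. 2014-09-16T18:20:30+01:00
    human = aware.strftime("%Y-%m-%d %H:%M")
    return '<time datetime="{}">{}</time> in {}'.format(machine, human, escape(place))


# Fixed +01:00 offset, mirroring the hand-written example above.
print(render_time_element(datetime(2014, 9, 16, 18, 20, 30),
                          timezone(timedelta(hours=1)), "Italy"))
```

Passing a region-based zone instead (for example zoneinfo.ZoneInfo("Europe/Rome") on Python 3.9+) would let the offset follow that region's daylight saving rules rather than being hard-coded.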
Dates should be in the format yyyy-mm-ddTHH:MM[:SS[.mmm]] or yyyy-mm-dd HH:MM[:SS[.mmm]], where:
H stands for hours
M stands for minutes
S stands for seconds
m stands for milliseconds
The square brackets indicate the parts that are optional
Hope this helps! :)
From W3:
Definition and Usage
The <time> tag defines a human-readable date/time.
This element can also be used to encode dates and times in a machine-readable way so that user agents can offer to add birthday reminders or scheduled events to the user's calendar, and search engines can produce smarter search results.
From Mozilla:
The HTML time element represents either a time on a 24-hour clock or a precise date in the Gregorian calendar (with optional time and timezone information).
So in other words, the time element isn't really supposed to be used for a precise geolocation, but maybe a timezone. For location, like @Ryan suggested, do something along the lines of <time …>…</time> in Paris

pyephem, libnova, stellarium, JPL Horizons disagree on moon RA/DEC?

MINOR EDIT: I say below that JPL's Horizons library is not open source. Actually, it is, and it's available here: http://naif.jpl.nasa.gov/naif/tutorials.html
At 2013-01-01 00:00:00 UTC at 0 degrees north latitude, 0 degrees east
longitude, sea level elevation, what is the J2000 epoch right ascension
and declination of the moon?
Sadly, different libraries give slightly different answers. Converted
to degrees, the summarized results (RA first):
Stellarium: 141.9408333000, 9.8899166666 [precision: .0004166640, .0000277777]
Pyephem: 142.1278749990, 9.8274722221 [precision .0000416655, .0000277777]
Libnova: 141.320712606865, 9.76909442356909 [precision unknown]
Horizons: 141.9455833320, 9.8878888888 [precision: .0000416655, .0000277777]
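For reference, the conversion to degrees used for the figures above can be sketched as follows. The function name is made up for this illustration; it assumes colon-separated sexagesimal text such as PyEphem prints, with right ascension given in hours (15 degrees per hour) and declination in degrees.

```python
def sexagesimal_to_degrees(text, hours=False):
    """Convert 'D:M:S' (or 'H:M:S' when hours=True) to decimal degrees."""
    sign = -1.0 if text.strip().startswith("-") else 1.0
    first, minutes, seconds = (abs(float(part)) for part in text.split(":"))
    value = first + minutes / 60.0 + seconds / 3600.0
    # One hour of right ascension corresponds to 15 degrees.
    return sign * value * (15.0 if hours else 1.0)


ra_deg = sexagesimal_to_degrees("9:28:30.69", hours=True)   # about 142.1279
dec_deg = sexagesimal_to_degrees("9:49:38.9")               # about 9.8275
print(ra_deg, dec_deg)
```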
My question: why? Notes:
I realize these differences are small, but:
I use pyephem and libnova to calculate sun/moon rise/set, and
these times can be very sensitive to position at higher latitudes
(eg, midnight sun).
I can understand JPL's Horizons library not being open source,
but the other three are. Shouldn't someone work out the
differences in these libraries and merge them? This is my main
complaint. Do the stellarium/pyephem/libnova library authors have
a fundamental difference in how to make these calculations, or do
they just need to merge their code?
I also realize there might be other reasons the calculations are
different, and would appreciate any help in rectifying these
possible errors:
Pyephem and Libnova may be using the epoch of the date instead of J2000
The moon is close enough that observer location can affect its
RA/DEC (parallax effect).
I'm using Perl's Astro::Nova and Python's pyephem, not the
original C implementations of these libraries. However, if these
differences are caused by using Perl/Python, that is important in
my opinion.
My code (w/ raw results):
First, Perl and Astro::Nova:
#!/bin/perl
# RA/DEC of moon at 0N 0E at 0000 UTC 01 Jan 2013
use Astro::Nova;
# 1356998400 == 01 Jan 2013 0000 UTC
$jd = Astro::Nova::get_julian_from_timet(1356998400);
$coords = Astro::Nova::get_lunar_equ_coords($jd);
print join(",",($coords->get_ra(), $coords->get_dec())),"\n";
RESULT: 141.320712606865,9.76909442356909
- Second, Python and pyephem:
#!/usr/local/bin/python
# RA/DEC of moon at 0N 0E at 0000 UTC 01 Jan 2013
import ephem
e = ephem.Observer()
e.date = '2013/01/01 00:00:00'
moon = ephem.Moon()
moon.compute(e)
print moon.ra, moon.dec
RESULT: 9:28:30.69 9:49:38.9
- The stellarium result (snapshot):
- The JPL Horizons result (snapshot):
[JPL Horizons requires POST data (not really, but pretend), so I
couldn't post a URL].
I haven't linked them (lazy), but I believe there are many
unanswered questions on stackoverflow that effectively reduce to
this question (inconsistency of precision astronomical libraries),
including some of my own questions.
I'm playing w this stuff at: https://github.com/barrycarter/bcapps/tree/master/ASTRO
I have no idea what Stellarium is doing, but I think I know about the other three. You are correct that only Horizons is using J2000 instead of the epoch-of-date for this apparent, locale-specific observation. You can bring it into close agreement with PyEphem by clicking "change" next to the "Table Settings" and switching from "1. Astrometric RA & DEC" to "2. Apparent RA & DEC."
The difference with Libnova is a bit trickier, but my late-night guess is that Libnova uses UT instead of Ephemeris Time, and so to make PyEphem give the same answer you have to convert from one time to the other:
import ephem
moon, e = ephem.Moon(), ephem.Observer()
e.date = '2013/01/01 00:00:00'
e.date -= ephem.delta_t() * ephem.second
moon.compute(e)
print moon.a_ra / ephem.degree, moon.a_dec / ephem.degree
This outputs:
141.320681918 9.77023197401
Which is, at least, much closer than before. Note that you might also want to do this in your PyEphem code if you want it to ignore refraction like you have asked Horizons to; though for this particular observation I am not seeing it make any difference:
e.pressure = 0
Any residual difference is probably (but not definitely; there could be other sources of error that are not occurring to me right now) due to the different programs using different formulae to predict where the planets will be. PyEphem uses the old but popular VSOP87. Horizons uses the much more recent — and exact — DE405 and DE406, as stated in its output. I do not know what models of the solar system the other products use.

Calculate date from numeric value

The number 71867806 represents the present day, with the smallest unit of days.
Sorry guys, caching owned me, it's actually milliseconds!
How can I
calculate the current date from it?
(or) convert it into a Unix timestamp?
The solution shouldn't use language-dependent features.
Thanks!
This depends on:
What unit this number represents (days, seconds, milliseconds, ticks?)
When the starting date was
In general I would discourage you from trying to reinvent the wheel here, since you will have to handle every single exception in regards to dates yourself.
If it's truly an integer number of days, and the number you've given is for today (April 21, 2010, for me as I'm reading this), then the "zero day" (the epoch) was obviously enough 71867806 days ago. I can't quite imagine why somebody would pick that though -- it works out to roughly 196,763 years ago (~194,753 BC, if you prefer). That seems like a strange enough time to pick that I'm going to guess that there's more to this than what you've told us (perhaps more than you know about).
It seems to me the first thing to do is verify that the number does increase by one every 24 hours. If at all possible keep track of the exact time when it does increment.
First, you have only one point, and that's not quite enough. Get the number for "tomorrow" and see if that's 71867806+1. If it is, then you can safely bet that +1 means +1 day. If it's something like tomorrow-today = 24, then odds are +1 means +1 hour, and the logic to display days only shows you the "day" part. If it's something else check to see if it's near (24*60, which would be minutes), (24*60*60, which would be seconds), or (24*60*60*1000, which would be milliseconds).
Once you have an idea of what kind of units you are using, you can estimate how many years ago the "start" date of 0 was. See if that aligns with any of the common calendar systems located at http://en.wikipedia.org/wiki/List_of_calendars. Odds are that the calendar you are using isn't a truly new creation, but a reimplementation of an existing calendar. If it seems very far back, it might be a Julian Date, which has day 0 equivalent to BCE 4713 January 01 12:00:00.0 UT Monday. Julian Dates and Modified Julian Dates are often used in astronomy calculations.
The next major goal is to find Jan 1, 1970 00:00:00. If you can find the number that represents that date, then you simply subtract it from the value in this foreign calendar system and convert the remainder from the discovered units to seconds. That will give you UNIX time, which you can then use with the standard UNIX utilities to convert to a time in any time zone you like.
In the end, you might not be able to be 100% certain that your conversion is exactly the same as the hand-implemented system, but you can test your assumptions about the calendar by plugging in numbers and seeing if they display as you predicted. Use this technique to create a battery of tests which will help you determine how this system handles leap years, etc. Remember, it might not handle them at all!
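The procedure above can be sketched roughly as follows. The names and the 5% tolerance are assumptions made for this illustration; the counter value for 1 Jan 1970 has to be found by observation, as described.

```python
# How much the counter should grow in 24 hours for each candidate unit.
UNITS_PER_DAY = {
    "days": 1,
    "hours": 24,
    "minutes": 24 * 60,
    "seconds": 24 * 60 * 60,
    "milliseconds": 24 * 60 * 60 * 1000,
}


def guess_unit(reading_today, reading_tomorrow, tolerance=0.05):
    """Guess the unit from two readings taken about 24 hours apart."""
    delta = reading_tomorrow - reading_today
    for unit, per_day in UNITS_PER_DAY.items():
        if abs(delta - per_day) <= per_day * tolerance:
            return unit
    return None  # does not look like any of the candidate units


def to_unix_seconds(reading, reading_at_unix_epoch, unit):
    """Convert a counter value to UNIX time, given the counter's value
    at 1970-01-01 00:00:00 and its unit."""
    seconds_per_unit = 86400 / UNITS_PER_DAY[unit]
    return (reading - reading_at_unix_epoch) * seconds_per_unit
```

For example, guess_unit(71867806, 71867807) yields "days", since the counter moved by one over 24 hours.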
What time is 71,867,806 milliseconds from midnight?
There are:
- 86,400,000 ms/day
- 3,600,000 ms/hour
- 60,000 ms/minute
- 1,000 ms/second
Remove and tally these units until you have the time, as follows:
How many days? None because 71,867,806 is less than 86,400,000
How many hours? Maximum times 3,600,000 can be removed is 19 times
71,867,806 - (3,600,000 * 19) = 3,467,806 ms left.
How many minutes? Maximum times 60,000 can be removed is 57 times.
3,467,806 - (60,000 * 57) = 47,806 ms left
How many seconds? Maximum times 1,000 can be removed is 47 times.
47,806 - (1,000 * 47) = 806
So the time is: 19:57:47.806
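The same remove-and-tally steps can be written with repeated integer division. This is a minimal sketch with a made-up function name; it assumes the value counts milliseconds since midnight and discards any whole days.

```python
def ms_to_clock(ms):
    """Break a millisecond count into HH:MM:SS.mmm, discarding whole days."""
    _days, rest = divmod(ms, 86_400_000)
    hours, rest = divmod(rest, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return "{:02d}:{:02d}:{:02d}.{:03d}".format(hours, minutes, seconds, millis)


print(ms_to_clock(71_867_806))  # 19:57:47.806
```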
It is indeed a fairly long time ago if the smallest unit is days. However, assuming you're sure about it, I could suggest the following shell command, which would obviously not be valid for dates before 1st Jan. 1970:
date -d "@$(echo '(71867806-71853086)*3600*24'|bc)" +%D
or without bc:
date -d "@$(((71867806 - 71853086) * 3600 * 24))" +%D
Sorry again for the messy question; I've got the solution now. In JS it looks like this:
var dayZero = new Date(new Date().getTime() - 71867806 * 1000);

Human name comparison: ways to approach this task

I'm not a Natural Language Processing student, yet I know it's not a trivial strcmp(n1,n2).
Here's what I've learned so far:
comparing Personal Names can't be solved 100%
there are ways to achieve certain degree of accuracy.
the answer will be locale-specific, that's OK.
I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.
For example, all the names below can refer to the same person:
Berry Tsakala
Bernard Tsakala
Berry J. Tsakala
Tsakala, Berry
I'm trying to:
build (or copy) an algorithm which grades the relationship between 2 input names
find an indexing method (for names in my database, for hash tables, etc.)
note:
My task isn't about finding names in text, but to compare 2 names. e.g.
name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%
I used Tanimoto Coefficient for a quick (but not super) solution, in Python:
"""
Formula:
Na = number of set A elements
Nb = number of set B elements
Nc = number of common items
T = Nc / (Na + Nb - Nc)
"""
def tanimoto(a, b):
    c = [v for v in a if v in b]
    return float(len(c)) / (len(a)+len(b)-len(c))
def name_compare(name1, name2):
    return tanimoto(name1, name2)
>>> name_compare("James Brown", "Brown, James")
0.91666666666666663
>>> name_compare("Berry Tsakala", "Bernard Tsakala")
0.75
>>>
Edit: A link to a good and useful book.
Soundex is sometimes used to compare similar names. It doesn't deal with first name/last name ordering, but you could probably just have your code look for the comma to solve that problem.
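For readers who have not met it, here is a minimal sketch of the common American Soundex variant (first letter plus three digits). It is written for illustration only, assumes plain English letters, and variants of the algorithm differ in small details.

```python
# Consonant groups used by American Soundex; vowels, H, W and Y get no code.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex key for a single name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        if ch not in "HW":          # H and W do not separate equal codes
            previous = code
    return (letters[0] + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))   # R163 R163
print(soundex("Berry"), soundex("Bernard"))   # B600 B656
```

Note that the two keys for Berry and Bernard differ, so Soundex alone would not link those two forms.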
We've just been doing this sort of work non-stop lately and the approach we've taken is to have a look-up table or alias list. If you can discount misspellings/misheard/non-english names then the difficult part is taken away. In your examples we would assume that the first word and the last word are the forename and the surname. Anything in between would be discarded (middle names, initials). Berry and Bernard would be in the alias list - and when Tsakala did not match to Berry we would flip the word order around and then get the match.
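A rough sketch of that look-up approach might look like this. The alias table is a tiny hypothetical stand-in for a real list, the function names are invented for the example, and each name is assumed to contain at least one word.

```python
# Hypothetical alias table: nickname -> canonical forename.
ALIASES = {"berry": "bernard", "dick": "richard", "steve": "stephen"}


def first_and_last(name):
    """Lower-cased first and last words; anything in between is discarded."""
    words = [w.strip(".,").lower() for w in name.split()]
    words = [w for w in words if w]
    return words[0], words[-1]


def canonical(forename):
    return ALIASES.get(forename, forename)


def same_person(name1, name2):
    """True if the names match directly or after flipping the word order."""
    a_first, a_last = first_and_last(name1)
    b_first, b_last = first_and_last(name2)
    if (canonical(a_first), a_last) == (canonical(b_first), b_last):
        return True
    # No match: retry with the second name read as "surname, forename".
    return (canonical(a_first), a_last) == (canonical(b_last), b_first)


print(same_person("Berry J. Tsakala", "Tsakala, Berry"))   # True
print(same_person("Berry Tsakala", "Bernard Tsakala"))     # True
```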
One thing you need to understand is the database/people lists you are dealing with. In the English speaking world middle names are inconsistently recorded. So you can't make or deny a match based on the middle name or middle initial. Soundex will not help you with common name aliases such as "Dick" and "Richard", "Berry" and "Bernard" and possibly "Steve" and "Stephen". In some communities it is quite common for people to live at the same address and have 2 or 3 generations living at that address with the same name. The only way you can separate them is by date of birth. Date of birth may or may not be recorded. If you have the clout then you should probably make the recording of date of birth mandatory. A lot of "people databases" either don't record date of birth or won't give them away due to privacy reasons.
Effectively people name matching is not that complicated. It's entirely based on the quality of the data supplied. What happens in practice is that a lot of records remain unmatched - and even a human looking at them can't resolve the mismatch. A human may notice name aliases not recorded in the aliases list or may be able to look up details of the person on the internet - but you can't really expect your programme to do that.
Banks, credit rating organisations and the government have a lot of detailed information about us. Previous addresses, date of birth etc. And that helps them join up names. But for us normal programmers there is no magic bullet.
Analyzing name order and the existence of middle names/initials is trivial, of course, so it looks like the real challenge is knowing common name alternatives. I doubt this can be done without using some sort of nickname lookup table. This list is a good starting point. It doesn't map Bernard to Berry, but it would probably catch the most common cases. Perhaps an even more exhaustive list can be found elsewhere, but I definitely think that a locale-specific lookup table is the way to go.
I had real problems with the Tanimoto using utf-8.
What works for languages that use diacritical signs is difflib.SequenceMatcher()
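A minimal usage sketch, assuming Python 3, where str values are sequences of Unicode code points and accented letters are therefore compared as single characters. The wrapper name is made up for this example.

```python
from difflib import SequenceMatcher


def name_similarity(name1, name2):
    """Similarity ratio in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, name1.lower(), name2.lower()).ratio()


print(name_similarity("James Brown", "James Brown"))   # 1.0
print(name_similarity("José Núñez", "Jose Nunez"))
```

Like the Tanimoto version, this measures character overlap only; it knows nothing about nicknames or word order.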

"Date of birth" validation: How far/much would you go?

I'm quite anal about form validation. So while creating a validator for a "date of birth" (DOB) field in one of my current projects for a job application form (platform/language is neutral in this context), I wanted something to prevent 'punky' inputs.
I used a date picker and restricted the max date to be XX years before the current day. XX makes sense for this scenario, as anyone younger shouldn't even be applying for the job.
The validation error message is: You seem too young for the job.
Then I began to get adventurous. How about?
If DOB is more than 120 years ago, message: "You cannot be that old!!!"
If DOB is in the future, message: "You must be kidding, you are not born yet!!!"
In the end, I deployed without the last 2, too cheeky for my no-nonsense client.
I would like to know how far/much would you guys go to validate DOB fields for good usability (or humor)?
Similarly for dates like, "Date of marriage", "Year of graduation" etc...
PS: As I was about to submit this post, there's a warning under the title textbox:
"The question you're asking appears subjective and is likely to be closed."
Fingers crossed.
To add:
I'm quite surprised that some/most of the guys are not too concerned about the validation. I repeat one of my comments here:
If the user entered the date wrongly (something very obvious), whether by intent or by mistake, catching it is one of the purposes of the validators. Once the data goes into the system, the site owner only knows the input is wrong; he/she would not know the actual value without asking the user. If this field is highly important, it will not be a pretty scenario.
Think about the times you've filled out forms. How many times have you been frustrated because some "overly clever" programmer inserted some "validation" that just happened to be incorrect for your circumstance? I say, trust the user. Come to think of it, as time goes on I guess people are living longer and getting on the net at earlier ages, anyway. :P
Don't forget you can also warn the user about unlikely values. In most cases, a typo is more likely than someone deliberately being awkward.
So for your application, maybe something like this:
Age < min. applicant age - error
Age > common retirement age - warning
Age > expected life span - error
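The three tiers above could be sketched like this. The thresholds (18, 65 and 120) and the function names are placeholders picked for the illustration; real values would come from the client's business rules.

```python
from datetime import date


def age_on(dob, today):
    """Completed years between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def check_applicant_age(dob, today, min_age=18, retirement_age=65, max_lifespan=120):
    """Return (severity, message) where severity is 'ok', 'warning' or 'error'."""
    age = age_on(dob, today)
    if age < min_age:
        return ("error", "Applicants must be at least {} years old.".format(min_age))
    if age > max_lifespan:
        return ("error", "Please check the year of birth.")
    if age > retirement_age:
        return ("warning", "Please confirm that the date of birth is correct.")
    return ("ok", "")


print(check_applicant_age(date(1940, 1, 1), date(2010, 4, 21)))
```

A warning would let the form be submitted after confirmation, while an error would block it.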
Validation vs. Correctness
The point of input validation is to ensure all elements are within the range allowed for and expected by further processing - i.e. if your database guarantees all applicants in the DB are 18 years or older, validate that. If your database also accepts school kids applying for internships, don't.
Everything unusual is just a warning. Yes, a value of 120 years is crazy, you should warn the user and possibly flag this record as suspicious / for review. However, there's no point in rejecting it (unless you have a business rule that e.g. all applicants are younger than 70).
Fake trust
Imagine what happens if you tell one user that "you rule out unlikely DOBs at the input". She might tell her co-worker that DOB is "already validated". He ends up with an unfounded trust that the applicant is 90, and if it were a fake you would have rejected it.
All further processing - by human or by computer - must still assume the DOB may be incorrect - just because of a typo. You are trying to create a guarantee you can't actually make. Many users trust the computer they use every day more than a stranger; you are trying to enforce this trust - which is IMO a fallacy.
Transmutation
Many applications live much longer than the original implementer imagined, and quite a few will be used for purposes beyond his wildest dreams. Building in artificial limits that simplify neither the actual processing nor the job of the operator doesn't actually help.
(That puts me probably into the no-nonsense category of your client - but that's my way to be "anal about validation": knowing when to stop :))
I think validation is incredibly important, but not necessarily in your situation. Which isn't to say that your situation is trivial, I just have my own date-oriented nits to pick.
Specifically, my concerns are always in keeping things in logical order. If someone says they were born in 1802, that's fine (sorta), I just want their date of graduation to be greater than their date of birth. But you run into itchy little problems when it comes to time (as in hours and minutes), for instance, if a user chooses 8:30 as the start time and then chooses 9:15 as the end time, but then realizes that the end time was 8:45. They decide to change the 9 to an 8 with the intention of changing the minutes to :45. But my validation script is too busy saying "Hey Wait! 8:15 is before 8:30, nice try!" but I can't risk letting them leave it wrong, etc etc.
For your situation specifically, I would lean toward what is ethically right. Because as it's been pointed out, someone could be entering a family history (with DOBs in the 1600's) or future purchases (with dates after today), so there is no realistic limit on dates in general. But there are limits to your scenario, i.e.:
If Age is less than legal working age (16 in most parts of the US), don't even offer anything higher than that year as an option (if you are using drop down).
If Age is beyond reasonable working age (which can be a sensitive subject) offer the highest value based on retirement age and simply add a ">" in front of that year. If someone is 75 and applying for an admin-level job, they will be more pleased that you made things simple rather than offended that you didn't have their year of birth listed. If anything, they will be impressed (I think) that you went this route instead of nothing at all, implying they shouldn't waste their time.
In the end you have a simple drop down very easy to script (example in PHP):
$currentYear = date('Y');
echo "<select name=\"YearOfBirth\">";
// One option per birth year for ages 16 through 64.
for ($i = 16; $i <= 64; $i++) {
    $optionYear = $currentYear - $i;
    echo "<option value=\"$optionYear\">$optionYear</option>";
}
// Catch-all entry for ages 65 and over; the ">" prefix is escaped as &gt;.
$greaterYear = $currentYear - 65;
echo "<option value=\"&gt;$greaterYear\">&gt;$greaterYear</option>";
echo "</select>";
When asking living people for their birthdate, only reject values that are definitely wrong. Any birthdate in the future is definitely wrong. And I would draw a line and say that any birthdate before (say) 1880 is definitely wrong. Anything else is a valid birthdate.
So any birthdate that fails the above tests is rejected with a message at field level, like "This date is in the future/too far in the past. Please enter your birthdate."
Any other birthdate is valid (maybe the user really is 11 years old, or 108). But the overall form may be rejected by business rules. For example, "You must be at least 18 years old to apply."
The idea is to separate individual field validation from form validation. Conflating them yields complicated rules. Separating means you can re-use the rules for the field (e.g. "DOB of a living person must be between 1/1/1880 and today") in other contexts.
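A small sketch of that separation, using the 1880 cut-off and the 18-year rule mentioned above. The function names are made up for this illustration.

```python
from datetime import date

EARLIEST_DOB = date(1880, 1, 1)   # field rule: DOB of a living person


def validate_dob_field(dob, today):
    """Field-level check: reject only values that are definitely wrong."""
    if dob > today:
        return "This date is in the future. Please enter your birthdate."
    if dob < EARLIEST_DOB:
        return "This date is too far in the past. Please enter your birthdate."
    return None


def validate_application(dob, today, min_age=18):
    """Form-level check: apply the business rule on top of the field rule."""
    field_error = validate_dob_field(dob, today)
    if field_error:
        return [field_error]
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    if age < min_age:
        return ["You must be at least {} years old to apply.".format(min_age)]
    return []
```

Because validate_dob_field knows nothing about job applications, it could be reused unchanged on any other form that asks a living person for a birthdate.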
If you're doing this for anything professional - like a job application - I might not use "!" in messages to users. Take a look at any well done website you'd like, you're not going to find it in common use.
Valid date: check
Date not in future: maybe (I deal with medical applications, so I suppose you could be treating unborn babies)
Date not older than 120 years: probably
I'm not a big fan of over-engineering these things, particularly if a user mistake is relatively harmless and can be spotted and fixed easily. That's how I approach it anyway.
Valid Date:
I'll go to the extent of checking whether this date exists or not, i.e. leap year 29th Feb and so on.
Date in the future:
we usually check the age (this year - dob given) and must be at least a certain age to sign up.
Date older than 120 years or not:
I won't check. 200 years would be a safer limit? (in case a 121 year old man wants to use the computer *chuckles*)
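The existence check described under "Valid Date" above can lean on a date library instead of hand-written leap year rules. A minimal sketch, with a made-up function name:

```python
from datetime import date


def date_exists(year, month, day):
    """True if the given day exists in the (proleptic) Gregorian calendar."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(date_exists(2012, 2, 29))  # True, 2012 is a leap year
print(date_exists(2013, 2, 29))  # False
```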
I think you should consider your actual requirements when designing validations. Yes, if the field is a date field (and perhaps more importantly if it stores a date but some less-than-stellar DBA made it a varchar), make sure only a valid date is submitted. This is critical. Invalid dates cause all sorts of issues with querying the data. If it is a date that must of necessity have occurred in the past, limit the date range to the present date or earlier.
After that, go with what your client wants. If they want to pay for you to eliminate people younger than working age, they will tell you. Imposing a top age limit can get you into legal trouble for age discrimination. The client may not want you to do this either.
Humour is a pretty subjective thing and very project specific so it’s a bit difficult to answer along those lines. Having said that, if the application supports a formal process such as applying for a job I’d probably err on the side of caution and keep it pretty factual.
As for validation, I believe the effort you go to here should be proportional to the impact of invalid data making its way through from the UI. Going back to the job application form, I imagine there will be a human review process at some time so the risk of invalid data is minimal whether the data was intentionally or inadvertently entered incorrectly.
If you’re worried about “punky” or bot driven inputs then use Captcha. Having said all that, I reckon you’re pretty safe with the validation rules you’ve used.
Well, I'm not a programmer (more of a BA), though I'm trying to gain some development skills as I think it may help me be a better BA. I've done a bit of VBA (don't laugh).
Anyway in thinking about this here's my two cents
1) Drop the humour. What's funny to you now won't be to someone else. Furthermore, what's funny after two or three goes isn't funny after 25 or 30 - it's just tiresome even if you are dealing with a jokey crowd!
2) I am coming round to the idea that unless you can definitively validate something as being plain wrong, E.g. you don't want to let someone enter a value < 0, then you should consider warning rather than prevention via dialogues or whatever the OS standard happens to be.
Hey, what do I know? In a week I'll have changed my mind (I'm a Business Analyst) and will be demanding instant responses from developers ;->
Let's just use two digit years everywhere. No one's going to be using our software after 1999!
Below are the checks that you can do while validating the DOB:
calculate the age from the DOB and do the following checks
AGE > XX [XX is the min age required to apply]
AGE < XX {Should throw a message mentioning that you are not old enough}
AGE = XX
If there is no upper limit of age then we can take it as retirement age else verify with the upper limit for the next two checks
AGE < Retirement Age
AGE > Retirement Age {Should throw a message mentioning that you are too old to apply}
AGE = retirement Age
DOB is a valid date (by giving valid date)
DOB is invalid -
Enter 0 in either of day/month/Year
Enter some negative Value
Enter some invalid date e.g. 30th feb or 32 Jan etc
Enter valid date with different separators (although the date is a valid one but due to different separators it will become an invalid one)
Enter date with different formats such as by giving dd/mm/yyyy, dd/mm/yy, dd/MON/yyyy etc.
Enter some future date (Invalid here as your purpose is something different)
Being a perfectionist, I would go for 150 here :D
As low as the chances are, people have passed 120, and who knows what will happen in the coming 30 years :D
I don't find it that important, however.
It all depends on the application. A line of business (LOB) application for order processing is very different to tracking historical or future data.
One can agree it needs to be a valid date, but consider there are multiple calendars (e.g. month number can be 13, year can be over 5000).
Validate for an integer and try to be helpful; I think anything else (an abusive/big brother/over-engineered system) is a bad idea.
People should be allowed to lie on these forms if they wish; it's not a legal thing, it's a website.
Don't take it so seriously.
Just let the user pick a date. The user should be in control..not the system/developer. The only date you should avoid with respect to DOB is the future as that is incorrect (i.e. preventing error by design). The date picker you provide should handle any date format issues.
And definitely do not throw up any cheeky exceptions/messages. Your message should aid the user in recognising & recovering from an error.
Hope that helps.