free form date field - html

I was reading this archive on joel on sw http://www.joelonsoftware.com/articles/TwoStories.html
here he mentions he wanted to make a date field free format. Then I thought how would someone do that? ofcourse search in gmail does it.
has anyone implemented this before? and how?

You could check out any of the following date parsing implementations
Google's parsedatetime on google code (http://code.google.com/p/parsedatetime/)
python-dateutil (http://labix.org/python-dateutil)
javascript lib (http://www.datejs.com/)
perl (http://metacpan.org/pod/DateTime::Format::Flexible)

PHP's strtotime() is a very nice implementation of this.

If you have users both side of the Atlantic you'll need to prevent 09/10/11 style dates explicitly as much confusion will happen - either require the month in letters or use a drop-down for it and abandon free-form.
Remember 09/11 is the ninth of November in Europe. Using an IP address's location to disambiguate fails because Europeans and Americans can cross the Atlantic!

Related

Workflowy + iCal integration

Tasks entered in Workflowy that have a date set, shall appear in my iCal.
Is this somehow possible?
As Frequently Asked Questions of WorkFlowy list, date is in plan.
iCal may be implemented by some guy using API...
What features are coming up for WorkFlowy?
The following are some of the features we plan on building into WorkFlowy. We'll implement them with the standard WorkFlowy style and simplicity.
Offline access
Dates & reminders
Multi-item selection
Colors and formatting
Collaboration improvements
API
I agree that would be awsome, but as you might have guessed by now, nope not possible.. yet.

Is there a way to formerly define a time interval for configuring a process?

Horribly worded question...I know.
I'm working on an application that processes data for the previous day. The problem is that I know the customer is going to eventually ask to it for every hour or some other arbitrary time interval. I know that languages such as Java or SQL have masks for defining dates. Well what about a way to define a time interval?
Let me ask it this way. If someone asked you to create a configurable piece of software how would you allow the user to specify the time intervals?
UPDATE
What I'm looking for is a textual representation. Imagine somethign that would be put into a config file.
Here's how Google App Engine's cron API is configured, it's a bit more user-friendly than cron:
http://code.google.com/appengine/docs/python/config/cron.html#The_Schedule_Format
Python (and Java?) source to implement it should be available in the SDK. Haven't looked to see how easy it would be to extract, but it should at least provide some ideas.
I think it'd be possible to add various things to the format as required - it's structured enough to be extensible. For example it's currently missing the ability to say "every hour at 4 minutes past", which is common in UNIX cron but not really relevant to GAE because although perhaps the engine could figure out which minutes are busy and which aren't, and balance load, the user certainly can't.
Obviously the major weakness of offering something that looks like natural language to the average user, is that they'll think your code is psychic, and expect it to also understand things like "every other Wednesday except the week after Easter", or "whenever the clocks change" ;-)
I'm not sure if I understand your question correctly. Are you asking for a way to input time intervals?
I don't know your environment, but most language frameworks offer some more or less sophisticated GUI components. Something like a calendar control may be what you are looking for?
Edit: Ah, command line (or config file) it is.
Can you parse an XML file? In this case, you could just serialize a timespan (most languages have a class like this). The use may edit the XML file, typically just replacing, say, the value of the <minutes> part. Deserializing this XML file to obtain a Timespan again is easy.
If you prefer plaintext or command line input, it is a bit less easy. However, many languages support some kind of Timespan.Parse (string text, string format). You should have a look if this concept is present in your environment.
Many environment offer some kind of sscanf(), that parses an input string. Sometimes there is a "time" format as well.
If the earlier ideas don't provide a solution, you can try to parse the date using regex. There should be many resources on this topic on this site as well as the web.
If you dislike regex, split the input string according to your own format rules. That's kind of an ugly solution though.
If you don't feel like splitting user input stringy by hand, use Random.NextInt(1000) and hope nobody notices. 0:-)

any way to "ping" a phone number?

We have a customer who wants to go through their CRM database and somehow determine phone numbers which are valid, without actually having someone sit there and try calling them all.
Is there any way to do something akin to a "ping" on a phone number (including landlines)?
You will need to go through a third party. I have used Melissa data for address verification with good success, they also offer phone verification, but I have not used it
http://www.melissadata.com/listservices/resphoneverify.htm
If getting a 100% correct phone number is crucial, I'd look into a service which would actually call the number, give a verification code and make the user confirm that code with the site. It is a PIA from the users perspective, but that is the most complete route you can take. Doing a quick little googling came up with this site, http://www.phoneconfirm.com which seems to do what I mentioned. I am sure there are others though.
If you can't/don't want to go through a third party, I can't imagine writing something like this yourself would be impossible. Scaling it would be the biggest issue.
could always go with the good ole war dialer
I believe a CTI system using ISDN calling based service can quickly return a status code that the number is either valid/invalid before the destination begins to ring.
One vendor is Katalina systems, their product is called VoiceGuide and they have a dialling out module that may give you what you want. see www.voiceguide.com.
Just export the calling list to the dialler (csv file) and review the call status after processing.
If the list is very large, it may justify purchasing a system to do this. The rate of calling depends upon the number of lines installed/availble. You might require some custom modifications to abort the call after obtaining the status. Katalina should be able to help. I am not sure if VoIP trunks can give you full access to the line status.
I once did something like that. Yeah, for telemarketers. And yeah, it haunts my conscience to this day.
It was based on a module called app_amd.c (Answering Machine Detection) which was a third party add-on for Asterisk and, AFAIK, can be found in their main tree now. With an E1/T1, you can also distinguish between bad numbers, busy, and many other status codes. Look that up, it may help.

Reliably extracting identity fields from scanned documents / images?

I have to pull two pre-printed (not hand-written) fields out of a paper form, such that it can be automatically routed after being scanned. The fields contain batch and item identifiers, like "GG-9192" or "EPN/245G".
I've tried the following software:
Tesseract-OCR
Cuneiform
Canon ImageRunner built-in OCR
Asprise OCR Java API (demo)
I've tried the following settings:
Scanning at resolutions of 300dpi and 600dpi
Tried different fonts, including OCR-A and OCR-B.
In all cases output was pretty much all over the place. I can kick back documents for which I can't properly extract the necessary information, but I'm thinking it's going to be at least half of them. I considered some sort of fuzzy logic based on known values in a database, but sometimes these identifiers can differ by a single character, like "123G" and "123C".
Is this a lost cause? Perhaps OCR just isn't mature enough to handle a requirement of this nature? What other techniques might you recommend? Barcodes?
Edit: the containing application is in Java, so any recommendations for which there are free or cheap Java-based APIs for would help.
Edit 2: if anyone is interested...without any special tuning, Cuneiform for Linux and the Canon ImageRunner worked best, with Tesserect-OCR and Asprise Java API producing the worst results...none of the four was acceptable for anything but standard document search grade OCR. I'm beginning to think that this isn't going to work out.
If you have control over the fields, why use a human-readable format in the first place? For scanning, it seems like a QR Code, or something similar would be best. It is marked for orientation, and has some built-in error correction.
http://en.wikipedia.org/wiki/QR_Code
I started digging for products starting with Tomato's suggestion. I tried ABBYY and CVISION. Both have products that can automate OCR:
CVISION Maestro Recognition Server 4.0
ABBYY Recognition Server 2.0
In addition, ABBYY has SDKs for various platforms, and CVISION has an SDK that appears to work with at least VB/VC++.
I haven't tried either SDK yet, and am not sure it's necessary for my project. All I need is PDFs coming in that I can extract the text from. I did however try CVISION's server product and with the OCR on its most accurate settings, it worked really well. I haven't tried ABBYY's server product yet because I have to go through a reseller to get a trial. I'm in the process of doing so, but if it starts getting annoying I'm probably going to go with CVISION. I did try ABBYY's FineReader standalone product, and it worked very well, so I assume that their server product would also.

Internationalization in your projects

How have you implemented Internationalization (i18n) in actual projects you've worked on?
I took an interest in making software cross-cultural after I read the famous post by Joel, The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). However, I have yet to able to take advantage of this in a real project, besides making sure I used Unicode strings where possible. But making all your strings Unicode and ensuring you understand what encoding everything you work with is in is just the tip of the i18n iceberg.
Everything I have worked on to date has been for use by a controlled set of US English speaking people, or i18n just wasn't something we had time to work on before pushing the project live. So I am looking for any tips or war stories people have about making software more localized in real world projects.
It has been a while, so this is not comprehensive.
Character Sets
Unicode is great, but you can't get away with ignoring other character sets. The default character set on Windows XP (English) is Cp1252. On the web, you don't know what a browser will send you (though hopefully your container will handle most of this). And don't be surprised when there are bugs in whatever implementation you are using. Character sets can have interesting interactions with filenames when they move to between machines.
Translating Strings
Translators are, generally speaking, not coders. If you send a source file to a translator, they will break it. Strings should be extracted to resource files (e.g. properties files in Java or resource DLLs in Visual C++). Translators should be given files that are difficult to break and tools that don't let them break them.
Translators do not know where strings come from in a product. It is difficult to translate a string without context. If you do not provide guidance, the quality of the translation will suffer.
While on the subject of context, you may see the same string "foo" crop up in multiple times and think it would be more efficient to have all instances in the UI point to the same resource. This is a bad idea. Words may be very context-sensitive in some languages.
Translating strings costs money. If you release a new version of a product, it makes sense to recover the old versions. Have tools to recover strings from your old resource files.
String concatenation and manual manipulation of strings should be minimized. Use the format functions where applicable.
Translators need to be able to modify hotkeys. Ctrl+P is print in English; the Germans use Ctrl+D.
If you have a translation process that requires someone to manually cut and paste strings at any time, you are asking for trouble.
Dates, Times, Calendars, Currency, Number Formats, Time Zones
These can all vary from country to country. A comma may be used to denote decimal places. Times may be in 24hour notation. Not everyone uses the Gregorian calendar. You need to be unambiguous, too. If you take care to display dates as MM/DD/YYYY for the USA and DD/MM/YYYY for the UK on your website, the dates are ambiguous unless the user knows you've done it.
Especially Currency
The Locale functions provided in the class libraries will give you the local currency symbol, but you can't just stick a pound (sterling) or euro symbol in front of a value that gives a price in dollars.
User Interfaces
Layout should be dynamic. Not only are strings likely to double in length on translation, the entire UI may need to be inverted (Hebrew; Arabic) so that the controls run from right to left. And that is before we get to Asia.
Testing Prior To Translation
Use static analysis of your code to locate problems. At a bare minimum, leverage the tools built into your IDE. (Eclipse users can go to Window > Preferences > Java > Compiler > Errors/Warnings and check for non-externalised strings.)
Smoke test by simulating translation. It isn't difficult to parse a resource file and replace strings with a pseudo-translated version that doubles the length and inserts funky characters. You don't have to speak a language to use a foreign operating system. Modern systems should let you log in as a foreign user with translated strings and foreign locale. If you are familiar with your OS, you can figure out what does what without knowing a single word of the language.
Keyboard maps and character set references are very useful.
Virtualisation would be very useful here.
Non-technical Issues
Sometimes you have to be sensitive to cultural differences (offence or incomprehension may result). A mistake you often see is the use of flags as a visual cue choosing a website language or geography. Unless you want your software to declare sides in global politics, this is a bad idea. If you were French and offered the option for English with St. George's flag (the flag of England is a red cross on a white field), this might result in confusion for many English speakers - assume similar issues will arise with foreign languages and countries. Icons need to be vetted for cultural relevance. What does a thumbs-up or a green tick mean? Language should be relatively neutral - addressing users in a particular manner may be acceptable in one region, but considered rude in another.
Resources
C++ and Java programmers may find the ICU website useful: http://www.icu-project.org/
Some fun things:
Having a PHP and MySQL Application that works well with German and French, but now needs to support Russian and Chinese. I think I move this over to .net, as PHP's Unicode support is - in my opinion - not really good. Sure, juggling around with utf8_de/encode or the mbstring-functions is fun. Almost as fun as having Freddy Krüger visit you at night...
Realizing that some languages are a LOT more Verbose than others. German is a LOT more verbose than English usually, and seeing how the German Version destroys the User Interface because too little space was allocated was not fun. Some products gained some fame for their creative ways to work around that, with Oblivion's "Schw.Tr.d.Le.En.W." being memorable :-)
Playing around with date formats, woohoo! Yes, there ARE actually people in the world who use date formats where the day goes in the middle. Sooooo much fun trying to find out what 07/02/2008 is supposed to mean, just because some users might believe it could be July 2... But then again, you guys over the pond may believe the same about users who put the month in the middle :-P, especially because in English, July 2 sounds a lot better than 2nd of July, something that does not neccessarily apply to other languages (i.e. in German, you would never say Juli 2 but always Zweiter Juli). I use 2008-02-07 whenever possible. It's clear that it means February 7 and it sorts properly, but dd/mm vs. mm/dd can be a really tricky problem.
Anoter fun thing, Number formats! 10.000,50 vs 10,000.50 vs. 10 000,50 vs. 10'000,50... This is my biggest nightmare right now, having to support a multi-cultural environent but not having any way to reliably know what number format the user will use.
Formal or Informal. In some language, there are two ways to address people, a formal way and a more informal way. In English, you just say "You", but in German you have to decide between the formal "Sie" and the informal "Du", same for French Tu/Vous. It's usually a safe bet to choose the formal way, but this is easily overlooked.
Calendars. In Europe, the first day of the Week is Monday, whereas in the US it's Sunday. Calendar Widgets are nice. Showing a Calendar with Sunday on the left and Saturday on the right to a European user is not so nice, it confuses them.
I worked on a project for my previous employer that used .NET, and there was a built in .resx format we used. We basically had a file that had all translations in the .resx file, and then multiple files with different translations. The consequence of this is that you have to be very diligent about ensuring that all strings visible in the application are stored in the .resx, and anytime one is changed you have to update all languages you support.
If you get lazy and don't notify the people in charge of translations, or you embed strings without going through your localization system, it will be a nightmare to try and fix it later. Similarly, if localization is an afterthought, it will be very difficult to put in place. Bottom line, if you don't have all visible strings stored externally in a standard place, it will be very difficult to find all that need to be localized.
One other note, very strictly avoid concatenating visible strings directly, such as
String message = "The " + item + " is on sale!";
Instead, you must use something like
String message = String.Format("The {0} is on sale!", item);
The reason for this is that different languages often order the words differently, and concatenating strings directly will need a new build to fix, but if you used some kind of string replacement mechanism like above, you can modify your .resx file (or whatever localization files you use) for the specific language that needs to reorder the words.
I was just listening to a Podcast from Scott Hanselman this morning, where he talks about internationalization, especially the really tricky things, like Turkish (with it's four i's) and Thai. Also, Jeff Atwood had a post:
Besides all the previous tips, remember that i18n it's not just about changing words for their equivalent on other languages, especially for non-latin languages alphabets (korean, Arabic) which written right to left, so the whole UI will have to conform, like
item 1
item 2
item 3
would have to be
arabic text 1 -
arabic text 2 -
arabic text 3 -
(reversed bullet list doesn't seem to work :P)
which can be a UI nightmare if your system has to apply changes dinamically once the user changes the language being used.
Another very hard thing is to test different languages, not just for the correctness of word, but since languages like Korean usually have bigger font type for their characters this may lead to language specific bugs (like "SAVE" text on a button being larger than the button itself for some language).
One of the funnier things to discover: italics and bold text makrup does not work with CJK (Chinese/Japanese/Korean) characters. They simply become unreadable. (OK, I couldn't really read them before either, but especially bolding just creates ink blots)
I think everyone working in internationalization should be familiar with the Common Locale Data Repository, which is now a sub-project of Unicode:
Common Locale Data Repository
Those folks are working hard to establish a standard resource for all kinds of i18n issues: currency, geographical names, tons of stuff. Any project that's maintaining its own core local data given that this project exists is pretty bonkers, IMHO.
I suggest to use something like 99translations.com to maintain your translations . Otherwise you won't be able to tell what of your translations are up to date in every language.
Another challenge will be accepting input from your users. In many cases, this is eased by the input processing provided by the operating system, such as IME in Windows, which works transparently with common text widgets, but this facility will not be available for every possible need.
One website I use has a translation method the owner calls "wiki + machine translation". This is a community based site so is obviously different to the needs of companies.
http://blog.bookmooch.com/2007/09/23/how-bookmooch-does-its-translations/
One thing no one have mentioned yet is strings with some warying part as in "The unit will arive in 5 days" or "On Monday something happens." where 5 and Monday will change depending on state. It is not a good idea to split those in two and concatenate them. With only one varying part and good documentation you might get away with it, with two varying parts there will be some language that preferes to change the order of them.