Most people would agree that internationalizing an existing app is more expensive than developing an internationalized app from scratch.
Is that really true? Or, when you write an internationalized app from scratch, is the cost of doing I18N simply amortized over many small assignments, so that nobody ever feels the whole weight of the internationalization task on their shoulders?
You could even argue that a mature app has many, many LOC that were deleted during the project's history; those lines never need to be I18Ned if internationalization is done as an afterthought, but they would have been if the project had been internationalized from the very beginning.
So do you think a project starting today must be internationalized, or can that decision be deferred to the future based on the success (or not) the software enjoys and the geographic distribution of the demand?
I am not talking about the ability to manipulate Unicode data. That comes for free in most mainstream languages, databases and libraries. I am talking specifically about supporting your own software's user interface in multiple languages and locales.
"when you write an internationalized app from scratch the cost of doing I18N is ... amortized"
However, that's not the whole story.
Retroactively tracking down every message to the users is -- in some cases -- impossible.
Not hard. Impossible.
Consider this.
theMessage = "Some initial part" + some_function() + "some following part";
You're going to have a terrible time finding all of these kinds of situations. After all, some_function just returns a String. You don't know whether it's a database key (never shown to a person) or a message which must be translated. And once it is translated, grammar rules may reveal that a three-part string concatenation was a dumb idea.
You can't simply grep for every String-valued function as containing a possible I18N message that must be translated. You have to actually read the code, and possibly rewrite the function.
Clearly, when some_function has any complexity to it at all, you're stumped as to why one part of your application is still in Swedish while the rest was successfully I18N'd into other languages. (Not to pick on Swedes in particular, replace this with any language used for development different from final deployment.)
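To make the fix concrete (this is my own illustration, not from the answer above): in a Java codebase the retrofit usually means moving the whole sentence into a keyed, per-locale pattern, so the translator rather than the code decides where the variable part lands. The bundle name Messages and the key status.line are made up for this sketch.

import java.text.MessageFormat;
import java.util.Locale;
import java.util.ResourceBundle;

public class StatusMessages {
    // Before: theMessage = "Some initial part" + some_function() + "some following part";
    // After: the whole sentence lives in one per-locale pattern, and the
    // translator can move the {0} placeholder wherever the grammar requires.
    // If some_function() returns translatable text rather than, say, a
    // database key, it must be keyed the same way -- which is exactly why
    // someone has to read the code.
    static String statusMessage(Locale locale, String somePart) {
        ResourceBundle bundle = ResourceBundle.getBundle("Messages", locale); // hypothetical bundle
        String pattern = bundle.getString("status.line"); // e.g. "Some initial part {0} some following part"
        return MessageFormat.format(pattern, somePart);
    }
}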
Worse, of course, if you're working in C or C++, you might have some of this split between pre-processor macros and proper C-language syntax.
And in a dynamic language -- where code can be built on the fly -- you'll be paralyzed by a design in which you can't positively identify all the code. Not only is dynamically generating code usually a bad idea, it also makes your retroactive I18N job impossible.
I'm going to have to disagree with the idea that it costs more to add i18n to an existing application than to build it into a new one from scratch.
A lot of the time, i18n is not required until the application gets "big". When you do get big, you will likely have a bigger development team to devote to i18n, so it will be less of a burden.
You may not actually need it. A lot of small teams put great effort into supporting internationalization when they have no customers who require it.
Once you have internationalized, it makes incremental changes more time-consuming. Every time you need to add a string to the product, you need to add it to the resource bundle first and then reference it. It is not a lot of work, but it is effort and does take a bit of time.
I prefer to 'cross that bridge when we come to it' and internationalize only when you have a paying customer looking for it.
Yes, internationalizing an existing app is definitely more expensive than developing the app as internationalized from day one. And it's almost never trivial.
For instance
Message = "Do you want to load the " & fileType() & " file?"
cannot be internationalised without some code alterations, because many languages have grammatical rules such as gender agreement. You often need a different message string for loading every possible file type, unlike in English, where it is possible to bolt substrings together.
There are many other issues like this: you need more UI space because some languages need more characters than English to express the same concept, you need bigger fonts for East Asia, you need to use localised dates/times in the user interface but perhaps US English conventions when communicating with databases, you need to use a semicolon as the delimiter for CSV files, string comparisons and sorting are cultural, phone numbers & addresses...
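A few of those issues are directly visible from the standard Java library. A small sketch of my own (not the answerer's code) showing locale-dependent number formatting and collation:

import java.text.Collator;
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleDifferences {
    public static void main(String[] args) {
        // Number formats differ: the same value renders as "1,234,567.89"
        // in the US but "1.234.567,89" in Germany -- which is also why ';'
        // is the usual CSV delimiter where ',' is the decimal separator.
        System.out.println(NumberFormat.getNumberInstance(Locale.US).format(1234567.89));
        System.out.println(NumberFormat.getNumberInstance(Locale.GERMANY).format(1234567.89));

        // Sorting is cultural: a naive char-by-char comparison puts "z"
        // before "ä", while a German collator sorts "ä" next to "a".
        Collator german = Collator.getInstance(Locale.GERMANY);
        System.out.println("naive: " + ("ä".compareTo("z") < 0));         // false
        System.out.println("collated: " + (german.compare("ä", "z") < 0)); // true
    }
}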
"So do you think a project starting today, must be internationalized, or can that decision be deferred to the future based on the success (or not) the software enjoys and the geographic distribution of the demand?"
It depends. How likely is the specific project to be internationalised? How important is it to get a first version out fast?
If you truly think you get "Unicode handling" for free, you may have a surprise coming your way when you try.
Unless you use a framework with proven i18n ability beyond languages that use ANSI or very similar character sets, you will find several niggles and some major issues where the Unicode handling isn't quite right, or simply unavailable. Even with relatively common languages (e.g. German) you can run into difficulty with shrinking or expanding letter counts and APIs that don't support Unicode (a concrete example follows below).
And then think of languages with a different reading order!
This is one of the reasons you should really plan it in from the beginning, and test the stuff to destruction on the set of languages you plan to support.
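To make the "expanding letter counts" point above concrete, here is a small illustration of my own, using only the standard Java locale-sensitive case mappings:

import java.util.Locale;

public class CaseMapping {
    public static void main(String[] args) {
        // German: upper-casing "straße" yields "STRASSE" -- the string grows
        // by one character, breaking any code that assumes case mapping
        // preserves length.
        System.out.println("straße".toUpperCase(Locale.GERMAN)); // STRASSE

        // Turkish: lower-casing "TITLE" under the Turkish locale produces a
        // dotless ı, so naive case-insensitive comparisons against "title"
        // suddenly fail.
        System.out.println("TITLE".toLowerCase(new Locale("tr", "TR"))); // tıtle
    }
}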
The concept of i18n and l10n is broader than merely translating strings to and from other languages.
Example: Consider the input of date and time by users. If you haven't had internationalization in mind when you designed
a) the interface for the user and
b) the storage, retrieval and display mechanism,
you will have a really hard time when you want to enable other input schemes.
Agreed, in most cases i18n is not necessary in the first place. But, and that is my point, if you don't spend a thought on the areas that must be touched for i18n, you will find yourself rewriting large portions of the original code. And then adding i18n is a lot more expensive than having spent some thought beforehand.
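A minimal sketch of the date/time idea, assuming Java's java.time (the method names are made up for illustration): parse at the edge with the user's locale, store a locale-neutral ISO-8601 value, and localize again only for display.

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class DateHandling {
    // Parse what the user typed using *their* locale's conventions...
    static LocalDate parseUserDate(String input, Locale userLocale) {
        DateTimeFormatter f = DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                                               .withLocale(userLocale);
        return LocalDate.parse(input, f);
    }

    // ...store a locale-neutral value...
    static String toStorage(LocalDate date) {
        return date.toString(); // ISO-8601, e.g. "2008-02-07"
    }

    // ...and localize again only at the display edge.
    static String display(LocalDate date, Locale userLocale) {
        return date.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG)
                                            .withLocale(userLocale));
    }
}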
One thing that can be a big issue is the different character counts for a message in various languages. I do some work on iPhone apps, and especially on a small screen, if you design the UI for a message that takes 10 characters and later discover that the same thing needs 20 characters in another language, you have to redo your UI to accommodate it. Even with desktop apps this can be a large PITA.
It depends on your project and how your team is organised.
I've been involved in the internationalization of a website, and it was one developer full-time for a year, probably about 6-8 months part-time for me to handle installation impacts when needed (reorganising files, etc), and other developers getting involved from time to time when their projects needed heavy refactoring. This was in an application that was at v3.
So that's definitely expensive. What you have to ask is how expensive is it to provide a localization system from the start, and how will that impact the project in the early stages. Your project at v1 may not be able to survive delays and setbacks caused by issues with a hastily-designed internationalization framework, while a stable v3 project with a wide customer base may have the capital to invest in doing that properly.
It also depends on whether you want to internationalize everything including log messages, or just the UI strings, and how many of those UI strings there are, and who you have available to do localization and the QA that goes with it, and even what languages you want to support - for example, does your system need to support unicode strings (which is a requirement for Asian languages).
And don't forget that changing the database backend to support internationalized data can be costly as well. Just try changing that varchar field to nvarchar when you already have 20,000,000 records.
I think it depends on the language. Every J2EE (Java web) app is internationalized, because it's very easy (an IDE can even extract the strings for you; you just name them).
In J2EE it's cheaper to add it later; however, the culture is to add it as soon as possible. I think that's because J2EE uses a lot of open source, and almost all open-source libs are i18ned. It's a great idea for the libraries, but not for most J2EE apps: most enterprise apps are built for one company that speaks one language.
Plus, if you have bad testers, adding it too soon means they file bug reports about labels and translations (I only once saw translations done NOT by developers). After the testers are done with it, you have a buggy app with excellent i18n support. It might be fun for users to switch the language and see whether they can still use the app, but using your app is just boring work for them, so they won't even do that. The only users of the i18n are the testers.
Weird string joining is not part of J2EE culture, since you know that one day someone might want to make the app i18n. The only problem is extracting labels from HTML templates.
I can't say exactly what is expensive, but I can tell you that a clean API lets you internationalise your application at very low cost.
I maintain an Open Source Android app. Every once in a while, some anonymous hero localizes it into their mother tongue, sending files or using our online tool.
At first I thought the magic of collaboration would be enough to provide timely localizations, but actually, the UI strings change, and each release ships with roughly:
5 languages localized at 100%
8 languages localized at 70% (because recent strings have not been localized)
I am extremely thankful, but is there something I can do to bring the localizations from 70% to 100% when the release comes?
I send messages on the mailing list at code freeze a month before each release, and then a week before, but most of the people who contributed localizations don't read the mailing list, in fact most of them were probably just good samaritans passing by.
Should I stalk the translators and ask them personally?
I have been thinking about having a person responsible for each language. This person (what should I call the role?) would be "responsible" for bringing the translation to 100% before each release. Their names would be listed in the "About" dialog. Is it a good or a bad strategy? Any tips?
It's the 90-9-1 principle - http://www.90-9-1.com/ - and designating people in charge of things isn't going to do it. You can offer cash - rewards/pay - or you can groom them. If you bring money into the mix, remember that it quickly becomes tradeoff analysis. People will compare what you offer versus what they can earn on their own. Since I assume you don't have that much money, you don't want that sort of comparison.
Realistically, grooming them is a better option. You've done the first step - include them - and shown that their fixes, updates, etc. are included and help the product. The next step is publicly thanking them and appreciating them. That will get the first 80%, as you've seen. The next step is getting them personally committed. Start interacting with them directly. Send them your thanks, and not just in email. If you have a product t-shirt, send them one with a hand-written note. In your release notes, link to their sites. If you ever see them in person, buy their coffee... whatever. The point is that you go out of your way - however small - to acknowledge that they're going out of their own way.
I'm the Project Lead of the Open Source Project Management System web2project and have been doing this longer than I care to consider. ;)
Use both a carrot and a stick approach. Giving people credit in your "About" screen, and calling them the lead translator for a particular language (if they're willing to accept that responsibility) is a good thing. Don't include a translation until you've gotten someone to commit to being the lead translator for that language; just like with code, you don't want to accept contributions unless (a) you are able and willing to maintain those contributions or (b) someone who you consider reliable has volunteered to be responsible for maintaining it.
The "stick" is that if someone stops maintaining a language, and fails to appropriately pass the responsibility off to someone else, you will remove that language from the next version of your application. Most people who translate your app probably do so because they would prefer to use your app in their native language. The threat of removing the language from the next version, or the wake-up call when they find that the next version doesn't contain their language, might inspire them to come back and finish up the last 30% translation work.
You are getting it wrong: community translation is an ongoing process, or let's say a never-ending approach.
This doesn't mean that it is wrong, but you have to live with the fact that localization is always a compromise. Usually, it is better to have 20 languages 50% translated than to have 10 languages 100% translated.
If one language is important, it will have more users, so it will have more contributors and the translation rate will be greater.
You don't know when, or if, the translation for a specific language will reach 100%; probably never.
The good part is that you shouldn't care about obtaining 100%, you should spend your effort in motivating people to contribute.
Probably you already know this: community translation doesn't play well with packaged products.
The solution to this problem is to change the way you work: provide partial translations in your release, plus a simple update system that refreshes them (preferably a silent one, like the Chromium updater). A sketch of why partial translations are still shippable follows below.
PS. If you need to guarantee a 100% translation rate for a limited number of languages before your release, consider paying a translation vendor to do it.
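For what it's worth, the "partial translations are shippable" part is cheap in ecosystems with bundle fallback. A hedged Java sketch, with a hypothetical Messages bundle and greeting key:

import java.util.Locale;
import java.util.ResourceBundle;

public class PartialTranslation {
    public static void main(String[] args) {
        // Messages_fr.properties may be only 70% complete: any key missing
        // from it is looked up in the parent bundle Messages.properties
        // (say, English), so the app still shows *something* sensible.
        ResourceBundle bundle = ResourceBundle.getBundle("Messages", Locale.FRENCH); // hypothetical bundle
        System.out.println(bundle.getString("greeting")); // French if translated, base text otherwise
    }
}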
Well, you get what you pay for. The strategy you are planning to implement will not give you anything, because heroes have their own lives. The only way to get this done is to engage more people in translating, because the completion percentage is a function of community size.
If your application is useful enough, you may try adding a "help translate into your language" link, and that could do for a while.
Think about creating a discussion group or board where you can post a "call for translation" when you are ready to ship. Translators usually don't have time to track changes.
One thing to note is that localization people need time to do the localizations, and the less they're getting paid for it, the longer they need. This means that you should at least try to avoid changing the localization keys (especially including defining new ones) shortly before a release. I know it's nice to not be constrained like that, but the reality is that if you're not paying then you're not going to get a fast turnaround in the majority of cases, so you have to plan for this and mitigate in your schedule. In effect, you have to stop thinking about the text shown to a user as something that can be finalized late in the project and instead start treating it as an important interface matter that you'll plan to lock down much earlier.
I am currently writing a dissertation about the implications or dangers that today's software development practices or teachings may have for the long-term future of programming.
Just to make it clear: I am not attacking the use of abstractions in programming. Every programmer knows that abstractions are the basis for modularity.
What I want to investigate with this dissertation are the positive and negative effects abstractions can have on software development. As regards the positive, I am sure I can find many sources to confirm them. But what about the negative effects of abstractions? Do you have any stories to share about times when certain abstractions failed on you?
The main concern is that many programmers today program against abstractions without having the faintest idea of what the abstraction is doing under the covers. This may very well lead to bugs and bad design. So, in your opinion, how important is it that programmers actually know what is going on below the abstractions?
Taking a simple example from Joel's Back to Basics, C's strcat:
void strcat( char* dest, char* src )
{
    while (*dest) dest++;
    while (*dest++ = *src++);
}
The above function has the issue that, when you are doing repeated string concatenation, it always starts from the beginning of the dest pointer to find the null terminator. If you instead write the function as follows, it returns a pointer to the end of the concatenated string, which you can then pass as the *dest parameter to the next concatenation:
char* mystrcat( char* dest, char* src )
{
    while (*dest) dest++;
    while (*dest++ = *src++);
    return --dest;
}
Now this is obviously a very simple example as abstractions go, but it is the same concept I shall be investigating.
Finally, what do you think about the fact that schools prefer to teach Java instead of C and Lisp?
Can you please give your opinions and views on this subject?
Thank you for your time and I appreciate every comment.
First of all, abstractions are inevitable because they help us to deal with the mind-blowing complexity of things.
Abstractions are also inevitable because individuals are increasingly required to undertake more tasks, or even complete projects, alone. To address the problem, one uses libraries which wrap lower-level concepts and expose more complex behavior.
Naturally, a developer has less and less time to know the internals of things. The latest concern I heard about on SO pages is people starting to learn JavaScript with the jQuery library, ignoring raw JavaScript altogether.
The issue is about the balance between:
Knowing the tiniest details of some technology and being a master of it, but at the same time being unable to work with anything else.
Superficial knowledge of a wide variety of technologies and tools, which nevertheless proves sufficient for common everyday tasks and allows an individual to perform in multiple areas, possibly covering all sides of some (moderately big) project.
Take your pick.
Some work requires the one, another position requires the other.
So, in your opinion, how important is it that programmers actually know what is going on below the abstractions?
It would be nice if people knew what is happening behind the scenes. This knowledge comes with time and practice, up to a certain degree, and depends on what kinds of tasks you have. You certainly shouldn't blame people for not knowing everything: if you want a person to be able to perform in a variety of fields, it is inevitable that he won't have time to cover each one down to the last bit.
What is essential is knowledge of the basic building blocks: data structures, algorithms, complexity. That should provide a basis for everything else.
Knowing the tiniest details of some particular technology is good, but not essential. Anyway, you can't learn them all. There are too many, and they keep coming.
Finally, what do you think about the fact that schools prefer to teach Java instead of C and Lisp?
Schools shouldn't be teaching programming languages at all. They're there to teach the basics of theoretical and practical CS, social skills, communication, and team work; to cover a vast variety of topics and problems and provide a wide-angle view for their graduates. This will help them find their way. Whatever they need to know in detail, they'll learn on their own.
An example where abstraction has failed:
In this case, a piece of software was needed to communicate to many different third party data processors. The communication was done through various messaging protocols; the transport method/protocol is not important in this case. Just assume everyone communicated through messaging.
The idea was to abstract the features of each of these third parties into a single, unified message format. It seemed relatively straightforward because each of the third parties performed a similar service. The problem was that some third parties used different terms to explain similar features. It was also found that some third parties had additional features that other third parties did not have.
The designers of the abstraction did not see through the difference of third party terms nor did they think it was reasonable to limit the scope of the unified features to only support the common features of the third parties. Instead, a single, monolithic message schema was developed to support any and all features of the third parties considered at the time. In what was probably considered a future-proofing move, they added a means of also passing an infinite number of name/value pairs along with the monolithic message in case there were future data elements that the monolithic message could not handle.
Early on, it became clear that changing the monolithic message was going to be difficult because so many people were using it in mission-critical systems. The use of the name/value pairs increased. Each name that could be used was documented inside a large spreadsheet, and developers were required to consult the spreadsheet to avoid duplicating name/value purposes. The list got so large, however, that collisions between the purposes of name values turned out to be frequent.
The majority of the monolithic message's fields now have no purpose and are kept mainly for backwards compatibility. There are name values that can be used to replace fields in the monolithic message. The majority of the interfacing is now done through the name/value pairs. In cases where a client intends to communicate with more than one third party, it needs to reconcile the name values available for each third party. It would be almost simpler to interface directly with the third parties themselves.
I believe this illustrates that, from the perspective of a consumer of the monolithic message, it is important that developers of the consuming code not need to know what is happening under the covers. If the designers had considered that the consumers of the monolithic message should not have to understand the abstraction in great detail, the monolithic message and its associated name/value pairs might never have happened. Documenting the abstraction with assertions regarding input and expected output would make life so much simpler.
As for colleges not teaching C and Lisp: they are cheating the students. You get a better understanding of what is going on with the machine and the OS with C. You get a different perspective on processing data and approaching problems with Lisp. I have used ideas I learned in Lisp in programs written in C, C++, .NET, and Java. Learning Java after knowing even just C is not very difficult. The OO part is not really language-specific, so perhaps using Java for that is acceptable.
An understanding of fundamentals of algorithms (e.g. time complexity) and some knowledge about the metal is essential to designing/writing smells-good code.
I would suggest, though, that just as important is education in modern abstractions and profiling. I feel that modern abstractions make me so much more productive than I would be without them that they are at least as important as good fundamentals, if not more so.
An important element that lacked in my education was the use of profilers. When used routinely and correctly, profilers can help mitigate problems with poor fundamentals.
Since you quote Joel Spolsky, I take it you're aware of his "Law of Leaky Abstractions"? I'll mention it for future readers: http://www.joelonsoftware.com/articles/LeakyAbstractions.html
Green & Blackwell's Ironies of Abstractions talks a bit about the effort of learning the abstraction. http://homepage.ntlworld.com/greenery/workStuff/Papers/index.html
The term "astronaut architecture" is a reaction to over-abstraction.
I know I certainly curse abstraction when I haven't touched Java or C# in a while and I want to write to a file, but have to instantiate a Stream...Writer...Adaptor...Handler...
Also, patterns, as in Gang of Four: they seemed great when I first read about them in the mid-90s, but I can never remember factory, facade, interface, helper, worker, flyweight...
I have an interesting idea for a new programming language. It's based on a new programming paradigm that I've been working out in my head for some time. I finally got around to start working on a basic parser and interpreter for it a few weeks ago.
I want my new language to be successful, and I want to eventually create a community around it when it's ready to release. The idea behind it is fairly innovative, so I don't expect it to gain a lot of ground in the business world, but it would thrill me more than anything else to see a handful of startups or open source projects use it.
So taking those aims into account, what can I do to help make my language successful? What do language projects do to become successful? What should I avoid at all costs? I'd love to hear opinions or stories about other languages -- successful or not -- so I can think about them as I continue to develop.
So far, the biggest concerns on my mind are finding a market, access to existing libraries, and having amazing tool support. What else might I add to this list?
The true answer is by having a beard.
http://blogs.microsoft.co.il/blogs/tamir/archive/2008/04/28/computer-languages-and-facial-hair-take-two.aspx
Although not specific to new programming languages, the book Producing Open Source Software by Karl Fogel (available to read online) may contain some hints on building a community around your new programming language.
In terms of adoption of programming languages in general, it seems like the trend lately has been to have a rich library to make development times shorter.
As there isn't much detail on what your language is like, it's hard to determine whether adoption of the language is going to depend on the availability of a rich library. Perhaps your language will be able to fill a niche that has been overlooked by other languages and be able to gain users. Or perhaps it has a slick name that will draw people in -- there are many factors which can affect the adoption of a language.
Here are some factors that come to mind when thinking about recent successful languages:
* Ability to leverage existing libraries in the new language.
  * Having an adapter to external libraries written in other languages: Python allows access to code written in C through the Python/C API.
  * Targeting a platform which already has plenty of libraries available for use: Groovy and Scala target the Java platform, allowing the use of and interoperation with existing Java code.
* Language design and syntax that allow increased productivity.
  * Many dynamically-typed languages have gained popularity, such as Ruby and Python, to name a couple.
  * More concise and clear code can be written in languages such as Groovy, as opposed to verbose languages such as Java.
  * Offering features such as functions as first-class objects and closures, which aren't offered in more "traditional" languages such as C and Java.
* A community of dedicated users who are also willing to teach newcomers the benefits of the language. The human factor is going to be big in widespread support for a language -- if people never start using your language, it won't gain more users.
Another suggestion I could add is to make the development of your language open: keep your users posted on developments in your language, and allow people to give you feedback. Better yet, let your users take part in the decision-making process, if you feel that is appropriate.
I believe that by offering ways to participate in the bringing up of a language, more people will feel they have a stake in its success, and so it will be more likely to gain support.
Good luck!
Most languages that end up taking off rapidly do so by means of a killer app. For C it was Unix. Ruby had Rails. JavaScript is the only available programming system common to most browsers without third-party add-ons.
Another means of success is by fiat. This only works if you have significant clout. For example, C#, as nice a language as it might be, wouldn't be anywhere near as popular as it is now if Microsoft had not pushed it as hard as it does. Objective-C is the language of Mac OS X simply because Apple says so.
The vast majority of languages, though, which lack a single killer app or a major corporate backer have gained success through long term investment of their respective creators. Perl and Python are prime examples. C++ has no single entity behind it, but it has evolved as the needs of developers have changed.
Don't worry about trying to make the language be successful; worry about using it to solve real problems and make real money.
You'll either make lots of money from using this language, or not. Once you have lots of money, others may care how you did it. Or not, either way you have lots of money.
If you don't make lots of money, nobody will want to know how you did it.
Edit based on comment: I define successful as people using it, and people use languages to solve problems, most for profit; thus successful == profitable.
In addition to making the language easy to use (which has several meanings), you should develop a comprehensive library that covers, and provides a good level of abstraction over, the following most important areas:
* Data structures and manipulation
* File I/O support
* XML processing
* Networking (plus web based technologies like HTTP/HTTPS)
* Database support
* Synchronous and asynchronous I/O
* Processes and threads
* Math
A well thought out framework that makes rapid development faster (and easier to maintain) would be a great addition. For this, you should know the currently popular frameworks well.
Keep in mind that it takes a lot of time. I think it took Python about 10 years (someone please correct me if I'm wrong).
So even if your community still seems small after say, 5 years, that's not the end of the story.
"It's based on a new programming paradigm that I've been working out in my head for some time."
While laudable, odds are really good that someone has already done something with your "new" paradigm.
To make a language usable, it must build on prior art. Totally new is not a good path to success. My favorite example is Algol 68.
Algol 60 was wildly popular (back in the day, which is a while ago, admittedly).
The experts wanted to build on this success. They proposed some new paradigms, and the effort split into factions. The purists put the new paradigms into Algol 68; it disappeared into obscurity. Some folks created a different version of Algol, called PL/I. It did not have any really new paradigms. It actually went somewhere and was used heavily. Another group created Pascal; it didn't have much that was new, and it discarded things from Algol 60. It also went somewhere and was used heavily.
Your new paradigm must have a clear and concise summary so people can fit it into a context of where the language is usable, how it can be used, what the costs and benefits of using it are.
A "new programming paradigm" causes some people to say "why learn a completely new paradigm when the ones I have work so nicely?" You have to be very clear on how it helps to have a new paradigm.
The language and libraries must work, and work very, very well. A language that isn't rock-solid is worthless. In order to be rock-solid it must be very simple.
It has to have a tutorial that will help anyone get started with your language.
Good Framework for Common Tasks
Easy Installation/Deployment
Good Documentation
Debugger/IDE and other Tools
A popular flagship product that uses your language!
Good documentation, including a detailed reference manual as well as simple examples to get people started quickly.
Good library support so that people can actually write useful programs.
Most popular languages seem to be very strong in either or both of those.
Use the Trojan Horse approach:
C++ - The Forgotten Trojan Horse
An interesting article on why C++ managed to grab the hearts of programmers successfully.
I know that there is no way to fully protect our code.
I also know that if a user wants to crack our app, then he or she is not a user that would buy our app.
I also know that it is better to improve our app.. instead of being afraid of anticracking techniques.
I also know that there is no commercial tool that can protect our app...
I also know that....
Ok. Enough. I've heard everything.
I really think that adding a little protection won't hurt.
So... have you ever used Code Virtualizer from Oreans or VMProtect?
I've heard that they are sometimes detected as viruses by some antivirus programs.
Any experiences I should be aware of before buying?
I know these tools create virtual machines and obfuscate the code a little to make it harder to find the weaknesses of our registration routines.
Is there any warning I should know?
Thanks.
Any advice would be appreciated.
Jag
In my humble opinion, you should feel lucky or even eager to be pirated, because that means your product is successful and popular.
That's plain incorrect. My software that I worked many months on was cracked the moment it was released. There are organised cracking groups that feed off download.com's RSS channel etc and crack each app that appears. It's a piece of cake to extract the keygen code of any app, so my response was to:
a) resort to digital certificate key files, which are impossible to forge as they are signed with a private key and validated by a public key embedded in the app (see: aquaticmac.com; I use the STL C++ implementation, which is cross-platform; a sketch of the verification side appears below), along with
b) the excellent Code Virtualizer™. I will say that the moment I started using Code Virtualizer™ I got complaints from one or two users about app crashes. When I removed it from their builds, the crashes ceased. Still, I'm not sure whether it was a problem with CV per se, as it could have been an obscure bug in my code, but since reshuffling my code I have heard no more complaints.
After the above, no more cracks. Some people look at being cracked as a positive thing, since it's a free publicity channel, but those people usually haven't spent months or years on an idea only to find they're being ripped off. It's quite hard to take.
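For readers who want the shape of the scheme in a): note that AES is a symmetric cipher with no public/private pair, so the signing presumably uses something like RSA or DSA. Here is a minimal, hypothetical Java sketch of the verification side only (the vendor signs the license text with the private key at purchase time):

import java.nio.charset.StandardCharsets;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;

public class LicenseCheck {
    // The public key ships inside the app; the matching private key stays
    // with the vendor. A forged or edited license fails verification
    // without the private key.
    static boolean isLicenseValid(byte[] publicKeyDer, String licenseText,
                                  byte[] signatureBytes) throws Exception {
        PublicKey publicKey = KeyFactory.getInstance("RSA")
                .generatePublic(new X509EncodedKeySpec(publicKeyDer));
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initVerify(publicKey);
        sig.update(licenseText.getBytes(StandardCharsets.UTF_8));
        return sig.verify(signatureBytes);
    }
}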
Unfortunately, VM-protected software is more likely to be affected by false positives than conventionally packed software. The reason is that since VM protection is so complicated, AV software is often unable to analyze the protected code and may rely on pattern libraries, or issue generic warnings for any file protected by a system it can't analyze. If your priority is to eliminate false positives, I suggest picking a widely-used protection solution, e.g. AsProtect (although Oreans' products are becoming quite popular as well).
Software VM protection is quite popular today, especially as it's now available at an accessible price for small companies and independent software developers. It also takes a considerable amount of effort to crack in comparison to non-VM techniques - the wrappers usually have the standard anti-debugging tricks that other protections have, as well as the VM protection. Since the virtual machine is generated randomly on each build, the crackers will need to analyze the VM instruction set and reverse engineer the protected code back to machine code.
The main disadvantage of VM protection is that if it's overused (used to protect excessive parts of the code), it can slow down your application considerably - so you'll need to protect just the critical parts (registration checks, etc). It also doesn't apply to certain application types - it likely won't work on DLLs that are used for injection, as well as device drivers.
I've also heard that StrongBit EXECryptor is a decent protection package at a decent price. (I'm not affiliated with said company nor guarantee any quality whatsoever; it's just word of mouth and worth checking out, IMO.)
How have you implemented Internationalization (i18n) in actual projects you've worked on?
I took an interest in making software cross-cultural after I read the famous post by Joel, The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). However, I have yet to be able to take advantage of this in a real project, beyond making sure I used Unicode strings where possible. But making all your strings Unicode, and understanding the encoding of everything you work with, is just the tip of the i18n iceberg.
Everything I have worked on to date has been for use by a controlled set of US English speaking people, or i18n just wasn't something we had time to work on before pushing the project live. So I am looking for any tips or war stories people have about making software more localized in real world projects.
It has been a while, so this is not comprehensive.
Character Sets
Unicode is great, but you can't get away with ignoring other character sets. The default character set on Windows XP (English) is Cp1252. On the web, you don't know what a browser will send you (though hopefully your container will handle most of this). And don't be surprised when there are bugs in whatever implementation you are using. Character sets can have interesting interactions with filenames when files move between machines.
Translating Strings
Translators are, generally speaking, not coders. If you send a source file to a translator, they will break it. Strings should be extracted to resource files (e.g. properties files in Java or resource DLLs in Visual C++). Translators should be given files that are difficult to break and tools that don't let them break them.
Translators do not know where strings come from in a product. It is difficult to translate a string without context. If you do not provide guidance, the quality of the translation will suffer.
While on the subject of context, you may see the same string "foo" crop up multiple times and think it would be more efficient to have all instances in the UI point to the same resource. This is a bad idea. Words may be very context-sensitive in some languages.
Translating strings costs money. If you release a new version of a product, it makes sense to reuse the translations from the old version. Have tools to recover strings from your old resource files.
String concatenation and manual manipulation of strings should be minimized. Use the format functions where applicable.
Translators need to be able to modify hotkeys. Ctrl+P is print in English; the Germans use Ctrl+D.
If you have a translation process that requires someone to manually cut and paste strings at any time, you are asking for trouble.
Dates, Times, Calendars, Currency, Number Formats, Time Zones
These can all vary from country to country. A comma may be used to denote decimal places. Times may be in 24-hour notation. Not everyone uses the Gregorian calendar. You need to be unambiguous, too: if you take care to display dates as MM/DD/YYYY for the USA and DD/MM/YYYY for the UK on your website, the dates are still ambiguous unless the user knows you've done it.
Especially Currency
The Locale functions provided in the class libraries will give you the local currency symbol, but you can't just stick a pound (sterling) or euro symbol in front of a value that gives a price in dollars.
User Interfaces
Layout should be dynamic. Not only are strings likely to double in length on translation, the entire UI may need to be inverted (Hebrew; Arabic) so that the controls run from right to left. And that is before we get to Asia.
Testing Prior To Translation
Use static analysis of your code to locate problems. At a bare minimum, leverage the tools built into your IDE. (Eclipse users can go to Window > Preferences > Java > Compiler > Errors/Warnings and check for non-externalised strings.)
Smoke test by simulating translation. It isn't difficult to parse a resource file and replace strings with a pseudo-translated version that doubles the length and inserts funky characters. You don't have to speak a language to use a foreign operating system. Modern systems should let you log in as a foreign user with translated strings and foreign locale. If you are familiar with your OS, you can figure out what does what without knowing a single word of the language.
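A hedged sketch of such a pseudo-translation pass (my own illustration; real tools usually operate on whole resource files rather than single strings):

public class PseudoTranslate {
    // Turn "Save file" into something like "[Šåvé fîlé xxxxxxxxx]": still
    // readable to a developer, but it roughly doubles the length and forces
    // non-ASCII through the UI, exposing clipped layouts, encoding bugs, and
    // hard-coded strings (which show up untranslated, without brackets).
    static String pseudo(String s) {
        String funky = s.replace('a', 'å').replace('e', 'é')
                        .replace('i', 'î').replace('S', 'Š');
        return "[" + funky + " " + "x".repeat(Math.max(1, s.length())) + "]";
    }

    public static void main(String[] args) {
        System.out.println(pseudo("Save file")); // [Šåvé fîlé xxxxxxxxx]
    }
}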
Keyboard maps and character set references are very useful.
Virtualisation would be very useful here.
Non-technical Issues
Sometimes you have to be sensitive to cultural differences (offence or incomprehension may result). A mistake you often see is the use of flags as a visual cue choosing a website language or geography. Unless you want your software to declare sides in global politics, this is a bad idea. If you were French and offered the option for English with St. George's flag (the flag of England is a red cross on a white field), this might result in confusion for many English speakers - assume similar issues will arise with foreign languages and countries. Icons need to be vetted for cultural relevance. What does a thumbs-up or a green tick mean? Language should be relatively neutral - addressing users in a particular manner may be acceptable in one region, but considered rude in another.
Resources
C++ and Java programmers may find the ICU website useful: http://www.icu-project.org/
Some fun things:
Having a PHP and MySQL application that works well with German and French, but now needs to support Russian and Chinese. I think I'll move this over to .NET, as PHP's Unicode support is, in my opinion, not really good. Sure, juggling with utf8_encode/utf8_decode or the mbstring functions is fun. Almost as much fun as having Freddy Krüger visit you at night...
Realizing that some languages are a LOT more verbose than others. German is usually a LOT more verbose than English, and seeing how the German version destroys the user interface because too little space was allocated was not fun. Some products gained a certain fame for their creative workarounds, Oblivion's "Schw.Tr.d.Le.En.W." being memorable :-)
Playing around with date formats, woohoo! Yes, there ARE actually people in the world who use date formats where the day goes in the middle. Sooooo much fun trying to find out what 07/02/2008 is supposed to mean, just because some users might believe it could be July 2... But then again, you guys over the pond may believe the same about users who put the month in the middle :-P, especially because in English, "July 2" sounds a lot better than "2nd of July", something that does not necessarily apply to other languages (in German, you would never say "Juli 2" but always "Zweiter Juli"). I use 2008-02-07 whenever possible. It's clear that it means February 7, and it sorts properly, but dd/mm vs. mm/dd can be a really tricky problem.
Another fun thing: number formats! 10.000,50 vs 10,000.50 vs 10 000,50 vs 10'000,50... This is my biggest nightmare right now: having to support a multi-cultural environment without any way to reliably know what number format the user will use.
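The nightmare is real because parsing, not just formatting, is locale-bound. A small Java illustration (assuming java.text.NumberFormat; the locale has to come from somewhere, e.g. a user profile or the Accept-Language header):

import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class AmbiguousNumbers {
    public static void main(String[] args) throws ParseException {
        String input = "10.000";
        // The same characters mean different values in different locales:
        System.out.println(NumberFormat.getNumberInstance(Locale.GERMANY).parse(input)); // 10000
        System.out.println(NumberFormat.getNumberInstance(Locale.US).parse(input));      // 10
        // So "10.000,50" vs "10,000.50" is genuinely undecidable without
        // knowing the user's locale.
    }
}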
Formal or informal. In some languages, there are two ways to address people: a formal way and a more informal way. In English, you just say "you", but in German you have to decide between the formal "Sie" and the informal "Du"; the same goes for French tu/vous. It's usually a safe bet to choose the formal way, but this is easily overlooked.
Calendars. In Europe, the first day of the week is Monday, whereas in the US it's Sunday. Calendar widgets are nice, but showing a calendar with Sunday on the left and Saturday on the right to a European user is not so nice; it confuses them.
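In Java, for instance, the widget can ask the locale instead of hard-coding the leftmost column; a tiny sketch:

import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class WeekStart {
    public static void main(String[] args) {
        DayOfWeek german = WeekFields.of(Locale.GERMANY).getFirstDayOfWeek(); // MONDAY
        DayOfWeek american = WeekFields.of(Locale.US).getFirstDayOfWeek();    // SUNDAY
        System.out.println(german + " / " + american);
    }
}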
I worked on a project for my previous employer that used .NET, and we used the built-in .resx format. We basically had a default .resx file with all the strings, and then multiple files with the different translations. The consequence of this is that you have to be very diligent about ensuring that every string visible in the application is stored in the .resx, and any time one is changed you have to update all the languages you support.
If you get lazy and don't notify the people in charge of translations, or you embed strings without going through your localization system, it will be a nightmare to try and fix it later. Similarly, if localization is an afterthought, it will be very difficult to put in place. Bottom line, if you don't have all visible strings stored externally in a standard place, it will be very difficult to find all that need to be localized.
One other note: strictly avoid concatenating visible strings directly, such as
String message = "The " + item + " is on sale!";
Instead, you must use something like
String message = String.Format("The {0} is on sale!", item);
The reason for this is that different languages often order the words differently, and fixing directly concatenated strings would need a new build; if you use a string replacement mechanism like the above, you can instead modify your .resx file (or whatever localization files you use) for the specific language that needs to reorder the words.
I was just listening to a podcast from Scott Hanselman this morning where he talks about internationalization, especially the really tricky things, like Turkish (with its four i's) and Thai. Jeff Atwood also had a post on the topic.
Besides all the previous tips, remember that i18n is not just about changing words for their equivalents in other languages. Some alphabets, such as Arabic and Hebrew, are written right to left, so the whole UI will have to conform, e.g.
item 1
item 2
item 3
would have to be
arabic text 1 -
arabic text 2 -
arabic text 3 -
(reversed bullet list doesn't seem to work :P)
which can be a UI nightmare if your system has to apply changes dynamically once the user changes the language being used.
Another very hard thing is testing different languages, not just for the correctness of the words: languages like Korean usually need a bigger font for their characters, and this may lead to language-specific bugs (like the "SAVE" text on a button being larger than the button itself in some language).
One of the funnier things to discover: italic and bold text markup does not work with CJK (Chinese/Japanese/Korean) characters. They simply become unreadable. (OK, I couldn't really read them before either, but bolding especially just creates ink blots.)
I think everyone working in internationalization should be familiar with the Common Locale Data Repository, which is now a sub-project of Unicode:
Common Locale Data Repository
Those folks are working hard to establish a standard resource for all kinds of i18n issues: currency, geographical names, tons of stuff. Any project that maintains its own core locale data, given that this project exists, is pretty bonkers, IMHO.
I suggest using something like 99translations.com to maintain your translations. Otherwise you won't be able to tell which of your translations are up to date in every language.
Another challenge will be accepting input from your users. In many cases, this is eased by the input processing provided by the operating system, such as IME in Windows, which works transparently with common text widgets, but this facility will not be available for every possible need.
One website I use has a translation method the owner calls "wiki + machine translation". Since it is a community-based site, its needs are obviously different from those of companies.
http://blog.bookmooch.com/2007/09/23/how-bookmooch-does-its-translations/
One thing no one has mentioned yet is strings with a varying part, as in "The unit will arrive in 5 days" or "On Monday something happens", where "5" and "Monday" change depending on state. It is not a good idea to split those in two and concatenate them. With only one varying part and good documentation you might get away with it; with two varying parts, there will be some language that prefers to change their order.
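One way to keep such sentences whole is a pattern language with an embedded choice, as in this hedged Java sketch using MessageFormat (languages with richer plural rules than English would need something like ICU4J's PluralFormat instead):

import java.text.MessageFormat;

public class VaryingParts {
    public static void main(String[] args) {
        // The whole sentence is one translatable pattern, so a translator can
        // reorder {0} freely; the choice sub-format covers the English
        // singular/plural split.
        String pattern = "The unit will arrive in {0,choice,1#1 day|1<{0} days}.";
        System.out.println(MessageFormat.format(pattern, 1)); // The unit will arrive in 1 day.
        System.out.println(MessageFormat.format(pattern, 5)); // The unit will arrive in 5 days.
    }
}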