Jobs/Files stuck in Training for more than 24h - microsoft-translator

I have a couple of files in 2 different projects that have been in "Training" for more than 24h now in the new Custom Translator (not Translator Hub). Each is only about 26,000 sentences, so the size doesn't really justify the wait. I have no other status screens or resources to check, so any ideas would be welcome.

We were experiencing some issues related to training last week that were causing model training times to run longer than they should. We have since fixed the problem and I hope that your job has completed at this time.

Related

App not being shown in the gallery

I couldn't find any contact information, so I thought posting here would be the next best thing. I resubmitted an app last week (actually I've done it twice now in case there was some error) and I'm experiencing some problems:
1) the previous version of the app has disappeared from the gallery; and
2) it's taking a huge amount of time (relative to my other experiences) to get it approved.
Who may I contact to follow up on this? It is quite urgent for our needs.
Many thanks,
Eamonn
There are occasionally delays in the review process for scripts being published to the gallery. If it takes a very long time (weeks) you may want to consider re-submitting the script.

Slightly Slow Approval to the Gallery, non-removal of existing script

So, I have to make a minor bug fix to all of my scripts: I didn't realize there was a limit to the amount you could push into the Cache (BTW Google, I'm pretty sure this isn't documented anywhere).
Anyhow, so my three line fix resulted in my having to resubmit a bunch of scripts. Typically this isn't a big deal, Google is usually super awesome about approving them (usually the next business day). However, unfortunately they seem to be taking more time this time. This became a problem because I had to do a presentation today, and I just assumed they would be approved by now (I fudged it and just showed a spreadsheet with the script already installed).
So, I guess my main question here is: could we have a more graceful upgrade process? It sometimes doesn't make sense to have the script removed from the gallery while waiting for approval.
Thanks!
Ben
I opened an issue regarding this a while ago (nearly 2 years now). You probably want to star it to keep track of updates.
About the approval process, it is not "reliable" as you could see. I had scripts that took 3 months to be re-approved and then, the next upgrade, only a couple of days.

Discovering "templates" in a given text?

If I have significant amounts of text and am trying to discover templates that occur most frequently, I was thinking of solving it using the N-Gram approach and in fact it was suggested as a solution in this question as well but my requirement is slightly different. Just to clarify, I have some text like this:
I wake up every day morning and read the newspaper and then go to work
I wake up every day morning and eat my breakfast and then go to work
I am not sure that this is the solution but I will try
I am not sure that this is the answer but I will try
I am not feeling well today but I will get the work done and deliver it tomorrow
I was not feeling well yesterday but I will get the work done and let you know by tomorrow
and am trying to extract "templates" like this:
I wake up every day morning and ... and then go to work
I am not sure that this is the ... but I will try
I ... not feeling well ... but I will get the work done and ... tomorrow
I am looking for an approach that can scale to millions of lines of text, so I was wondering whether I can adapt the same N-gram approach to solve this problem, or whether there are any alternatives?
Millions of lines of text isn't a really big number :)
What you're looking for is at least similar to collocation finding. You could try to compute pointwise mutual information on n-grams. See Manning & Schütze (1999) for this and other approaches to the problem.
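To make the PMI suggestion concrete, here is a minimal sketch for bigrams only, assuming plain whitespace tokenisation and the usual definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The function name `bigram_pmi` and the `min_count` cutoff are made up for illustration; this is not a full template extractor, just a way to score which adjacent word pairs stick together.

```python
import math
from collections import Counter

def bigram_pmi(sentences, min_count=2):
    """Return {(w1, w2): PMI} for adjacent word pairs seen at least min_count times."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenisation
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

sentences = [
    "I wake up every day morning and read the newspaper and then go to work",
    "I wake up every day morning and eat my breakfast and then go to work",
    "I am not sure that this is the solution but I will try",
    "I am not sure that this is the answer but I will try",
]

pmi = bigram_pmi(sentences)
# Show the five most strongly associated pairs.
for pair, score in sorted(pmi.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(pair, round(score, 2))
```

Pairs that appear only once (such as "read the") are dropped by the cutoff, which is roughly where the variable "..." slots of a template would sit.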

How long does code last?

I'm in the process of going back over some of the more minor TODO's in my code. One of them is in a class that handles partial dates, e.g. Jan 2001. It works fine for dates that will be seen in our system (1990 - 2099) and gracefully fails for other dates.
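For readers who want something concrete, a partial-date handler of the kind described could look roughly like the sketch below. The names `PartialDate` and `parse_partial_date` are hypothetical, and it assumes English month abbreviations (the default C locale) and that "gracefully fails" means returning None rather than raising.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

MIN_YEAR, MAX_YEAR = 1990, 2099  # supported range stated in the question

@dataclass(frozen=True)
class PartialDate:
    year: int
    month: int

def parse_partial_date(text: str) -> Optional[PartialDate]:
    """Parse text such as 'Jan 2001'; return None instead of raising on bad input."""
    try:
        parsed = datetime.strptime(text.strip(), "%b %Y")
    except ValueError:
        return None  # unparseable input fails gracefully
    if not MIN_YEAR <= parsed.year <= MAX_YEAR:
        return None  # out-of-range years, e.g. 2100, are rejected explicitly
    return PartialDate(parsed.year, parsed.month)

print(parse_partial_date("Jan 2001"))
print(parse_partial_date("Jan 2100"))
```

Keeping the range in two named constants means the 2100 limitation lives in exactly one place if it ever needs lifting.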
The TODO that I've left for myself is that I don't handle dates in the century 2100 and beyond. I don't really think it worth the effort fixing this particular problem, but I am cognisant of the Y2k bugs. If we were in 2080 already I think I'd be thinking differently and would fix the bug.
So how long does code last for? How far ahead should we plan for our systems to keep running for?
Update
Ok, thanks for all your input. I think I'm going for the option of leaving the TODO in the code and doing nothing. The thoughts I found most interesting were:
#Adrian - Eternity, I think that's the most correct assumption, your point about VM's is a good one.
#jan-hancic - It depends, yes it does.
#chris-ballance - I'm guessing I'll be dead by the time this restriction is hit, so they can come defile my grave if they want, but I'll be dead, so I'll just haunt his ass.
The reason I decided to do nothing was simple. It added negligible business value, whereas the other things that needed looking at did add value, so I'll do them first. If I get the time I'll fix it, but really it'll be nothing more than an academic exercise.
Longer than you expect.
Eternity.
Given the trend that old systems keep running in virtual machines, we must assume that all useful code will run forever. There are many systems that have been running since the 60s, e.g. backend code in the financial sector, and there seems to be no indication that these systems will ever get replaced. (And in the meantime, the frontend is being replaced every other year with the latest fad in web technology. So, the closer your code is to the core of your system, the more likely it will run forever.)
You can't have a general answer here. Depends on what kind of project you are building.
If you are writing software for a space probe then you might want to code it so that it will work for the next 100 years and more.
But if you are programming a special Xmas offer for your company's web page, a few weeks should be enough ...
Assume that whoever will maintain the code is a psychopath and has your home address.
Nobody really knows. Professional programming has been around for 30-40 years, so nobody really knows if code is going to last for 100 years. But if the Y2K bug is an indication, it is that a lot of code is going to stick around for a lot longer than the programmer intended. Keep in mind that even if you take that into account, it could still stick around longer than you expected. No matter how much you prepare, it might still outlive its intended life expectancy.
My advice is to not plan for code to last 100 years. Instead, try to make sure all your code will work for roughly the same length of time; that is, one part should not fail in 2 years while another part lasts 100 years. Remember, you should always fix the weakest link first, so there is no point making the strongest link stronger.
Sometimes, code lasts longer than you think. But, more important is the slippery slope argument. Once you forgive yourself a bit of non-bullet-proofness, you may be tempted to optimize further and skimp on logical correctness, until it finally bites you.
By the way, I recommend having an issue ID (such as a FogBugz case number) in every TODO comment, so that people can actually subscribe to and track this TODO.
In Dan Bernstein's immortal words: Don't contribute to the Y10K problem!
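One way to enforce that habit is a small check that flags TODO comments without an issue reference. The sketch below is only an illustration: the "case <number>" convention and the name `untracked_todos` are assumptions, not anything a particular tracker prescribes.

```python
import re

# Hypothetical convention: a tracked TODO mentions "case <number>" on the same line.
TODO_RE = re.compile(r"\bTODO\b", re.IGNORECASE)
CASE_RE = re.compile(r"\bcase\s+\d+\b", re.IGNORECASE)

def untracked_todos(source: str):
    """Return (line_number, line) pairs for TODO lines lacking a case number."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if TODO_RE.search(line) and not CASE_RE.search(line)
    ]

example = """\
# TODO (case 4711): handle years from 2100 onwards
# TODO: tidy this up
"""
print(untracked_todos(example))
```

Run over a code base, this lists exactly the TODOs nobody can subscribe to.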
I don't think the code will last so long.
Think about all the inventions and progress made in the last 90 years.
In 2100 we won't have to write down code.
There will be some kind of brain-machine interface.
Well, we recently made a timestamp format where time is stored in an unsigned 64-bit integer as microseconds from 1970. It will last until the year 586912, which should be enough.
Coding for "forever" is unnecessary. Of course you could use BigIntegers and such everywhere, but why? Just be prepared for more than 5 or 10 years. Twenty-year-old production code is not unusual nowadays, and I suspect that the average life cycle will get even longer in the near future.
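The arithmetic behind that figure is easy to check. The sketch below assumes the counter starts at 1970 and simply divides the largest representable value by the length of a year; the helper name `final_year` is made up for illustration.

```python
EPOCH_YEAR = 1970
MICROSECONDS_PER_SECOND = 10**6
SECONDS_PER_DAY = 86_400

# Largest whole number of seconds an unsigned 64-bit microsecond counter can hold.
max_seconds = (2**64 - 1) // MICROSECONDS_PER_SECOND

def final_year(seconds_per_year: int) -> int:
    """Year in which the counter overflows, for a given year length in seconds."""
    return EPOCH_YEAR + max_seconds // seconds_per_year

flat_365 = final_year(365 * SECONDS_PER_DAY)  # every year counted as 365 days
gregorian = final_year(31_556_952)            # average Gregorian year, 365.2425 days

print(flat_365, gregorian)
```

With flat 365-day years this reproduces the 586912 quoted above; allowing for leap years brings it down to about 586524, which is still comfortably far away.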
It depends on how much business value the code has and how much resources it takes to write it from scratch. The more value and resources the longer it lasts. Ten years and more is typical for commercial "works, don't touch it" code.
I have always tried to code as if my applications must work "forever". I am very sure I won't be around anymore in 2100, but knowing my software has a built-in expiration date doesn't make me feel good. If you know about such things, try to avoid them! You will never know, but some unknown programmer in the future may be grateful.
Right up until the time that it breaks, or otherwise ceases to be useful, and then for a bit longer after that
The essential things are:
How good is your internal date class (get a very robust library version and stick to it!)
It's not just the passage of time, but also the growth in the range of inputs your users want. For example, maybe you have 30 year mortgage inputs now, but next month someone decides to input a 99 year lease with maturity 2110, or a 100 year Disney bond!
If you accept 2 digit year inputs with a date window, think very carefully about how that is applied to start and end dates, and give lots of immediate feedback.
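To illustrate the last point, here is a minimal sketch of a fixed date window and the kind of immediate feedback it needs when start and end dates fall on different sides of the pivot. The pivot of 50 (mapping 00-49 to 2000-2049 and 50-99 to 1950-1999) and both function names are assumptions chosen for the example.

```python
def expand_two_digit_year(yy: int, pivot: int = 50) -> int:
    """Expand a two-digit year using a fixed window (hypothetical pivot of 50)."""
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a two-digit year, got {yy}")
    return 2000 + yy if yy < pivot else 1900 + yy

def expand_range(start_yy: int, end_yy: int, pivot: int = 50):
    """Expand a start/end pair and complain at once if the window inverts them."""
    start = expand_two_digit_year(start_yy, pivot)
    end = expand_two_digit_year(end_yy, pivot)
    if end < start:
        raise ValueError(
            f"end year {end} precedes start year {start}; "
            "enter four-digit years for this range"
        )
    return start, end

print(expand_range(95, 25))  # a loan from 1995 to 2025 expands as intended
```

A 30 year mortgage entered as 25 to 55 would expand to 2025 and 1955, so `expand_range(25, 55)` raises instead of silently storing an end date before the start date.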
Here are my two cents:
When we design a project, we usually declare it to last "at least" 5 years. Usually no more than 10 years before we re-design it and build it all over. (We're talking about mid-large size projects here).
What usually happens is that the new project you build is supposed to replace the old one, technology-wise (i.e. moving from MF to Windows, VB to .NET etc.), but this project never ends. So your client ends up working with 2 systems at once, and that leftover system is what is later referred to as "legacy".
If you wait long enough, a third project will rise causing the client to work with 3 systems at once and so on...
But to answer your question, I would bet on 5-10 years before redesign, and unless your dates are supposed to be long into the future - no need to worry about the 2100 limitation.
IMHO it comes down to craftsmanship: the pride we take in our work, coding to a standard we would not be ashamed for another real coder to see.
In the case of dates like this, you've stated that it gracefully fails after 2100. This sounds like you can remove the TODO without a bad conscience, because you have built in a response that will allow the cause of failure to be easily diagnosed and fixed in the (however likely or unlikely) circumstance that a failure occurs.
There are examples of code running on older machines which is 40 or 50 years old.
(Interesting bits in this thread: http://developers.slashdot.org/developers/08/05/11/1759213.shtml).
You've got to ask yourself about the nature of the problem you're solving but generally speaking even "quick fixes" will be around for years so you could realistically be looking at a decade or more for code intended to have a decent shelf life.
The other things you need to think about are:
1) What is the "active life" of the application - that is where it's going to be used and processing.
2) What is the "inactive life" of the application - that is it's not going to be used day to day but might be used for retrieving and viewing old records. For instance UK audit law means that records need to be available for 7 years, so that's potentially 7 years from last system use.
3) What is the range of future data it needs to handle? For instance say you're taking down credit card expiry dates - you can have a card which won't expire for a decade. Can you handle that date?
The answers to these questions will generally lead you to the assumption that you should never knowingly write code which has date constraints beyond those the OS/Language you're using dictates.
The question isn't "How long does code last?" but rather "How long will things in my code affect an application?"
Even if your code is replaced, it's possible that it will get replaced with code that does the exact same thing. To some extent, this is the direct cause of the Y2K problem. More to the point, it is the direct cause of the Y2038 problem.
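For anyone who hasn't met the Y2038 problem: a signed 32-bit count of seconds since 1970 runs out in January 2038. The sketch below shows the last representable moment and what a wrapped counter would decode to; `wrap_int32` is a made-up helper that emulates two's-complement overflow, since Python integers do not overflow on their own.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31

# Last second a signed 32-bit time_t can represent.
last_moment = EPOCH + timedelta(seconds=INT32_MAX)

# One second later the counter wraps to the most negative value.
after_wrap = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_moment)  # 2038-01-19 03:14:07+00:00
print(after_wrap)   # 1901-12-13 20:45:52+00:00
```

Any replacement code that keeps the same 32-bit field inherits the same deadline, which is the point being made above.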
Also keep in mind what you mean by last.
For example, the original UNIX operating system was developed 30 years ago. But during those 30 years, the product has evolved over time.
Though it wouldn't surprise me if a little bit of the original code still exists in it today.
So think of it 2 ways ... 1) do you ever anticipate the code being touched in the future, 2) the product/code will evolve if you have support and involvement.
My current shop has a large code base that runs financial applications with complex business rules. Some of these rules are encoded in stored procedures, some in triggers, and some in 3gl and 4gl application code. There is running code from the late 90's, and none of it in your "traditional" Legacy languages like COBOL or FORTRAN. As one could imagine, it's a steaming pile of spaghetti code, most created before TDD meant anything, so people are reluctant to touch anything.
I have had occasion to be brought in on contract more than a decade after the fact to consult on porting code to a new platform (OS/2 just isn't that popular these days!). When in doubt, assume that your code will live longer than you will. At the very least, document the heck out of limitations like this; fix them unless that takes tremendously more work than to document them.
In 1995 I started work at a new job, on an 8 year old code base.
So its inception date was 1987 or thereabouts.
The code is probably still in service. That's what? 23 years.
There have been some moves of the company, but they probably kept the software (because it works).
If it's still in service now, it will still be in service in a decade or so.
It's not surprising, particularly high tech code, in C (mostly)
In 1999 I started at a new job, and the codebase had antecedents back to 1984.
The new version I designed in the 2000s is still in service, with design elements like data file formats carried over from the previous one (and so on back), so that would be a development program spanning over 26 years.
So the year 2038 problem is starting to loom a little as those 32-bit signed time_t's roll over.
Remember that one of the major bonuses of modern programming is reuse.
So that means the code you write originally to solve one problem may get re-purposed and used in a completely different scenario years later (maybe even without your knowledge, by a team mate).
As a side note:
One of the major pluses of automated unit testing, is testing code that you can't even remember is there in a system! :)
As long as people can continue to bill for support with people willing to pay for it.
3-5 years max. After that you have moved on to another job and left your crappy code behind.

Commercial coding for free - how much if any should you do?

I recently answered a question with a proposition that the asker should improve his resistant-to-change boss's legacy system by coding the alternative in his spare time and then presenting it as an alternative approach to his peers.
It got me thinking about all the unpaid development work I have done in my working life. Although I know it is in our character to work late, in darkened rooms, eating pizza and slouching in front of a couple of monitors, when do you shut down and go home?
As one of the comments on that question said, why not put that time into a project of your own? The real answer is an indication of how much you have "signed up" to the project. I give two examples:
1) A brilliant project where we did 6 impossible things before breakfast. The PM managed to get us all (and I mean everyone) to work an all-nighter, then do a long day, then go home and restart at 4:00 AM in time for a 12:00 demo. The thing is, we loved it, and reaped the rewards later.
2) A desperate mess going nowhere, demoralised team, a clear mandate to fail from senior management. At that stage what can you do, long lunches, google, and CV work. No work on out of hours time at all.
It all depends on the project.
I have the luxury (burden?) of working from home so I end up doing quite a bit more work than I would were I in the office. With the laptop already set up and connected to the VPN, it's entirely too difficult for me to resist the temptation to "just pound this out". I'd say I probably average a few (less than 10) hours a week of "unpaid" work. BUT, I will only work "unpaid" if what I'm doing is at my own direction or to help fill a gap in my knowledge that is relevant to the current project.
The little "oh, I should try it this way" moments would qualify for "unpaid" time as would the "now, how the fsck do I do that?" problems.