I have a dream to improve the world of distributed programming :)
In particular, I'm feeling a lack of necessary tools for debugging, monitoring, understanding and visualizing the behavior of distributed systems (heck, I had to write my own logger and visualizers to satisfy my requirements), and I'm writing a couple of such tools in my free time.
Community, what tools do you lack in this regard? Please describe one per answer, with a rough idea of what the tool would be supposed to do. Others can point out the existence of such tools, or someone might get inspired and write them.
OK, let me start.
A distributed logger with a high-precision global time axis: one that registers events from different machines in a distributed system with high precision, independent of clock offset and drift, and that scales to the load of several hundred machines and several thousand logging processes. Such a logger makes it possible to find transport-level latency bottlenecks in a distributed system by seeing, for example, how many milliseconds it actually takes for a message to travel from the publisher to the subscriber through a message queue.
Syslog is not ok because it's not scalable enough - 50000 logging events per second will be too much for it, and timestamp precision will suffer greatly under such load.
Facebook's Scribe is not ok because it doesn't provide a global time axis.
Actually, both syslog and Scribe register events under arrival timestamps, not under occurrence timestamps.
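To make the occurrence-versus-arrival distinction concrete, here is a minimal sketch of how a collector could estimate each client's clock offset from a request/response round trip and then place client-side occurrence timestamps on its own time axis. It uses the familiar midpoint estimate and assumes roughly symmetric network delay. All function names and sample numbers are hypothetical; this is not taken from syslog, Scribe or any particular logger.

```python
def estimate_offset(t_send, t_server, t_recv):
    """Estimate (server clock - client clock) from one round trip.

    t_send, t_recv: client clock readings when the probe left and the reply arrived.
    t_server: server clock reading when it handled the probe.
    Assumption: the network delay is roughly the same in both directions.
    """
    midpoint = (t_send + t_recv) / 2.0
    return t_server - midpoint


def best_offset(samples):
    """Use the sample with the smallest round-trip time; its error bound is tightest."""
    t_send, t_server, t_recv = min(samples, key=lambda s: s[2] - s[0])
    return estimate_offset(t_send, t_server, t_recv)


def to_global_time(client_timestamp, offset):
    """Map a client-side occurrence timestamp onto the server's time axis."""
    return client_timestamp + offset


# Hypothetical probes: (client send, server time, client receive), in seconds.
samples = [
    (100.000, 105.030, 100.040),  # round trip 40 ms
    (101.000, 106.006, 101.010),  # round trip 10 ms, the most trustworthy sample
    (102.000, 107.050, 102.080),  # round trip 80 ms
]
offset = best_offset(samples)
event_time = to_global_time(101.500, offset)
```

Repeating the probe periodically would also let the collector track drift, since the offset is then re-estimated rather than assumed constant.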
Honestly, I don't lack such a tool - I've written one for myself, I'm greatly pleased with it and I'm going to open-source it. But others might.
P.S. I've open-sourced it: http://code.google.com/p/greg
Dear Santa, I would like visualizations of the interactions between components in the distributed system.
I would like a visual representation showing:
The interactions among components, either as a UML collaboration diagram or sequence diagram.
Component shutdown and startup times as self-interactions.
On which hosts components are currently running.
Location of those hosts, if available, within a building or geographically.
Host shutdown and startup times.
I would like to be able to:
Filter the components and/or interactions displayed to show only those of interest.
Record interactions.
Display a desired range of time in a static diagram.
Play back the interactions in an animation, with typical video controls for playing, pausing, rewinding, fast-forwarding.
I've been a good developer all year, and would really like this.
Then again, see this question - How to visualize the behavior of many concurrent multi-stage processes?.
(I'm shamelessly referring to my own stuff, but that's because the problems solved by this stuff were important for me, and the current question is precisely about problems that are important for someone).
You could have a look at some of the tools that come with Erlang/OTP. It doesn't have all the features other people suggested, but some of them are quite handy, and built with a lot of experience. Some of these are, for instance:
Debugger that can debug concurrent processes, also remotely, AFAIR
Introspection tools for mnesia/ets tables as well as process heaps
Message tracing
Load monitoring on local and remote nodes
Distributed logging and error report system
Profiler which works for distributed scenarios
Process/task/application manager for distributed systems
These come of course in addition to the base features the platform provides, like node discovery, IPC protocol, RPC protocols & services, transparent distribution, distributed built-in database storage, global and node-local registry for process names and all the other underlying stuff that makes the platform tick.
I think this is a great question and here's my 0.02 on a tool I would find really useful.
One of the challenges I find with distributed programming is in the deployment of code to multiple machines. Quite often these machines may have slightly varying configuration or worse have different application settings.
The tool I have in mind would be one that could on demand reach out to all the machines on which the application is deployed and provide system information. If one specifies a settings file or a resource like a registry, it would provide the list for all the machines. It could also look at the user access privileges for the users running the application.
A refinement would be to provide indications when settings are not matching a master list provided by the developer. It could also indicate servers that have differing configurations and provide diff functionality.
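The comparison step of such a tool can be sketched in a few lines, assuming the settings of each machine have already been collected into a dictionary (how they are gathered remotely is left out). The function name, host names and settings below are hypothetical.

```python
def diff_against_master(master, machines):
    """Return {host: {key: (expected, actual)}} for every setting that deviates.

    master: dict of expected settings provided by the developer.
    machines: dict mapping host name -> dict of settings collected from that host.
    A key missing on the host is reported with actual=None,
    a key missing from the master list with expected=None.
    """
    report = {}
    for host, settings in machines.items():
        deviations = {}
        for key in sorted(set(master) | set(settings)):
            expected = master.get(key)
            actual = settings.get(key)
            if expected != actual:
                deviations[key] = (expected, actual)
        if deviations:
            report[host] = deviations
    return report


# Hypothetical data; in practice each dict would be gathered from the machine.
master = {"timeout": "30", "debug": "false", "db_host": "db1"}
machines = {
    "web01": {"timeout": "30", "debug": "false", "db_host": "db1"},
    "web02": {"timeout": "60", "debug": "false", "db_host": "db1"},
    "web03": {"timeout": "30", "db_host": "db1", "trace": "on"},
}
report = diff_against_master(master, machines)
```

Hosts that match the master list do not appear in the report at all, so an empty report means every machine is configured as expected.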
This would be really useful for .NET applications since there are so many configurations (machine.config, application.config, IIS Settings, user permissions, etc) that the chances of varying configurations are high.
In my opinion, what is missing is a distributed programming platform...a platform that makes application programming over distributed systems as transparent as non-distributed programming is now.
Isn't it a bit early to work on Tools when we don't even agree on a platform? We have several flavors of actor models, virtual shared memory, UMA, NUMA, synchronous dataflow, tagged token dataflow, multi-hierarchical memory vector processors, clusters, message passing mesh or network-on-a-chip, PGAS, DGAS, etc.
Feel free to add more.
To contribute:
I find myself writing a lot of distributed programs by constructing a DAG, which gets transformed into platform-specific code. Every platform optimization is a different set of transformation rules on this DAG. You can see the same happening in Microsoft's Accelerator and Dryad, Intel's Concurrent Collections, MIT's StreamIt, etc.
A language-agnostic library that collects all these DAG transformations would save re-inventing the wheel every time.
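As a rough illustration of what one such transformation rule might look like, the sketch below fuses linear chains of operators: whenever node a has b as its only successor and b has a as its only predecessor, the two are merged into one node. This is just one hypothetical rule chosen for illustration, not something taken from Accelerator, Dryad or the other systems mentioned. The DAG is assumed to be a dict mapping every node name to its list of successors.

```python
def predecessors(dag):
    """Invert the successor lists. Every node must appear as a key in dag."""
    preds = {node: [] for node in dag}
    for node, succs in dag.items():
        for succ in succs:
            preds[succ].append(node)
    return preds


def fuse_chains(dag):
    """Merge a -> b whenever b is a's only successor and a is b's only predecessor."""
    dag = {node: list(succs) for node, succs in dag.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        preds = predecessors(dag)
        for a in list(dag):
            succs = dag[a]
            if len(succs) == 1 and len(preds[succs[0]]) == 1:
                b = succs[0]
                fused = a + "+" + b
                dag[fused] = dag.pop(b)  # fused node inherits b's successors
                del dag[a]
                for p in preds[a]:       # re-point a's predecessors at the fused node
                    dag[p] = [fused if s == a else s for s in dag[p]]
                changed = True
                break                    # predecessor map is stale, so recompute it
    return dag


# Hypothetical pipeline.
pipeline = {
    "read": ["parse"],
    "parse": ["filter"],
    "filter": ["sum", "count"],
    "sum": ["write"],
    "count": ["write"],
    "write": [],
}
fused = fuse_chains(pipeline)
```

In this example read, parse and filter collapse into a single node, while the fan-out to sum and count and the join at write stay untouched.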
You can also take a look at Akka:
http://akka.io
Let me notify those who've favourited this question by pointing to the Greg logger - http://code.google.com/p/greg . It is the distributed logger with a high-precision global time axis that I've talked about in the other answer in this thread.
Apart from the mentioned tool for "visualizing the behavior of many concurrent multi-stage processes" (splot), I've also written "tplot" which is appropriate for displaying quantitative patterns in logs.
A large presentation about both tools, with lots of pretty pictures here.
I have always worked in environments where developers had to go through a process of working with Network Operations (server guys) to deploy stuff from development/test to production.
I recently started a job where developers can go directly from their machines to production with no middle man. Are there reasons that developers should not be able to do this?
What I have so far:
You are more careful about deploying something if it has to go through someone else. As a young programmer it sometimes took me several tries to get a working deployment out. Since the NetOps guys were pissed I learned to make sure it was right the first time.
There is some accountability if something goes wrong and more than one person knows what's going on. Boss: "The site just went down!", Everyone else in the office: "Abe just did a deploy, it's his fault!"
When someone's sole responsibility is the production server, it's less likely that they will do something stupid.
There will (hopefully) be more information on the deploy and roll back capabilities. Logs, backups that can be rolled back to, automated features...
Are there any other good reasons? Am I just being a control freak?
A few that come to mind (there may be overlap with yours):
A developer can tweak something until it works. This shouldn't be done in Production. If that developer is hit by a bus the next day, nobody will know the system. A documented and repeatable-by-someone-else deployment process helps ensure that such business knowledge is captured.
As a developer, I don't want that kind of access. If something fails, it's far less likely that it's my fault. I'll come in and help, we're all on the same team after all, but I like to know that someone else had to review my work and agree with it. (The same is true of my DB delta scripts. I want a more qualified DBA whose sole responsibility is to the database to review my work. If all they do is run what I tell them when I tell them, then that's essentially no different than giving me direct access. It's just slower.)
Developers often make quick fixes to simple things. We all know that it's often not as cut and dry as the developer thought, and that quick fix either didn't fix it or broke something else. No matter how small the change/fix, there should still be a QA process. (For some shops where uptime isn't so critical that QA process can actually be Production, but that's a rare exception. It shouldn't be that way, from a purist perspective, but as with anything it's a risk/reward ratio. If the risk is low (as in a Production failure doesn't incur much penalty if any at all) and the cost of QA is comparatively high, then it's fine.)
Regulatory needs. PCI compliance, etc. often mandates clear separation of tasks between jobs. This is often misconstrued as "developers can't access production" and treated very black and white. But it does mean that developers should be able to access only what they need in order to do their job. If you don't need production data, and that data is sensitive, you shouldn't have it.
Because many developers are congenitally incapable of thinking they make mistakes - the same reason good dev groups have dedicated test teams.
"I'll just make this small config change in Prod, that won't break anything."
OOP developers should understand separation of responsibilities, I would have thought. You break it, you own it. Avoid the problem with a separate Ops team.
In some environments (e.g. finance) large sums of money (and sometimes the law) are also at risk from ill-advised or ill-intentioned changes in an uncontrolled Production environment.
In small teams, I can see a case for developers having production access, but that has to be controlled and auditable so that you ALWAYS know what is in Production. In that sense, it does not matter who pushes the deploy and rollback buttons, but that they exist and are the only way to change the Production environment.
I for one do not want that to be a large part of my job. You may find that your own devs agree once they see how much more time they can spend coding.
The main reason is because allowing a dev to deploy directly to production cuts out the QA process. Which introduces risk. Which management types don't like.
So another bullet point for you is massive increase in RISK.
Security - By having one gatekeeper (with a backup) only one person is accessing production data and servers. This means fewer access points.
Ease of management - You don't need to create as many accounts in your production environment to keep track of - or even worse, share one account among many (assuming your prod environment is separated from your dev environment).
Practice makes perfect - one person who builds a routine and sticks to it has fewer chances for screw-ups.
If there is a way to make a mistake it will eventually happen. Law of big numbers. It is unreasonable to put the burden on developers to be perfect, if you also want them to be productive.
Change management
Accountability
QA
One button builds / deployment
Unit tests
Code stability - suppose you push, right when someone else just checked in code?
Now, the amount of overhead / difficulty to change should be directly related to your up time requirements. Restated: the more costly downtime is, the more you should invest in preventing downtime.
By deploying directly to the production environment, there is a good chance that no QA was involved (i.e. nothing was tested).
Because there needs to be ONE person you can go to who knows what's deployed on the site. If every developer can deploy, you don't know who deployed what when somebody notices something wrong.
SOC-1 compliance may (unnecessarily) suggest or require that the developer be a different person from the one deploying to production so that controls are in place to prevent malicious intent.
I am a non-technical person, and a small company has been supporting my company's software for a number of years. The solution works well, and permutations of it have been with the current IT service provider for over 15 years. I recently got a more established IT firm to do a general audit of the software. The current solution uses Access as a front end with SQL Server 2005 as the database. The company that did the audit presented a list of faults, among them that the technology is outdated, the solution is not scalable, the design is bad, the interfaces are not user-friendly, the tables are not normalised and have no referential integrity, proper coding standards and naming conventions are not used, there is no application security, only database security, etc. The firm that did the audit proposed that the solution be rewritten and offered to do so. The current service provider acknowledges some of the findings but assures me that they pose very little or no risk to my business. Rewriting the application will cost a lot of money. I am in a difficult situation and would appreciate some technical advice. I basically need to know whether my business is at risk running on the current technology. I have a maximum of 70 concurrent users working on the system at any given time.
Well, if you value Joel's word, I would say that you are indeed risking a lot here.
Rewriting software has never been, and never will be, a safe thing to do for a company.
To boil it down into simple terms, ask yourself these questions:
Are you having problems with the software currently? Are users complaining about the user interface, or is it particularly hard for new users to pick up the software when using it? Is data being lost or corrupted at any stage, or are you having problems retrieving reports from the database?
Do you currently, or in the future are you likely to need modifications? If your software is badly written, modifications will be more costly, and more likely to break the application and cause downtime in general.
If the answer to both questions is no, then you likely don't need to rewrite the software. You have to remember that good software developers see badly written software and want to re-write it properly - as well as this, there is money for them in developing the software, so their view isn't totally unbiased.
Like others have said, re-writing a system has its own share of risks - old bugs that were fixed a long time ago can rear their heads again, new bugs can be introduced, the developers of the new system can totally miss the point and make the system less usable than the previous system.
If there are problems with the current system though it may be worthwhile to consider having the system re-written by competent developers - if you opt to go this route however, make sure to get feedback from your current users, especially the 'expert' or 'power' users, to ensure that the system will fulfill all of their requirements.
Before you view your problem from the technical perspective, you must assess how critical the application is to your business. It sounds as though you have a functioning application. If it delivers consistent behavior AND you have no need for upgrades / new development, you may want to leave it alone. We software developers love to complain about everyone else's code and re-write others' work with "elegant" solutions. That means money.
However, you have an investment that may need maintenance, and when you have the underlying code and database in disarray, you will incur more cost because the application does not lend itself to being modified. You'll want to get a feel for how much change you need to support. Given that it has been in production for 15 years you've had a good run, so you don't have much risk there.
To do a re-write will cost you, because you need to recreate what the app does, and since the supporting database and program seem to be "de-normalized" and unstructured, it's going to be a big effort. There are advantages to having a clean database model because it will be easier to do reports, export to Excel, etc. AND should you want to modify it the developers will have an easier time figuring out what to do.
To spend money to get what you already have requires that you challenge the firm to detail what additional benefits you'll receive. Are these benefits beyond what you're getting today, and will this firm deliver on their promises? Will your company be better off if the database is "normalized" but you receive no other benefit than what the current app gives you? Keep these in mind before you make the jump to a new platform.
Fix the problems in the existing app. It will be much cheaper, can be done incrementally, and if done properly, will result in a more maintainable app.
The suggestion to replace the ADP front end sounds like pure prejudice/ignorance to me -- they don't sell Access development so they want to build you an entirely new app.
On the other hand, the advice about the back end sounds like something that you shouldn't wait to fix (though it could require a lot of work, since existing data likely won't fit proper RI).
The front end and back end problems are two separate issues, and can be handled independently (though the app may need to be updated to reflect changes in RI -- impossible to say without a case-by-case evaluation).
I would hire a competent Access developer with ADP experience to handle the project. This will be much cheaper than the complete rewrite and you won't sacrifice any functionality, nor will you re-introduce bugs that have already been addressed in your existing app. It can also likely be done incrementally (depending on what the problems are and how they need to be solved).
The suggestions offered by the "more established IT firm" are pretty common for Access/SQL Server projects. The suggestion is almost always to re-write them as web applications.
I just did this myself last year -- took an MS Access front-end/SQL Server back end application, and rewrote the Access part as a C#/ASP.Net website. We enjoyed better performance and more flexibility as a result of the switch, but the old front end had been around long enough that we never did get back all of the functionality that we used to have before the rewrite.
If you're actually seeing 70 concurrent users, and none of them are experiencing performance issues, and your corporate network is secure enough, then you may lose more by rewriting the application, at least in terms of functionality. On the other hand, this may be a good chance to evaluate "what works" and "what could work better"--and enhance workflows.
Excellent use of coding standards doesn't necessarily translate to an excellent application.
What prompted the audit? Does their solution address this issue?
Let's do the math:
People: 70
Avg. Hrs Using software/Day: 2 (Conservative)
Salary/Hour: $8.00 (Really Conservative)
Business Days/Year: 250 (Took out weekends & vacation/sick)
Cost of labor using application: 70 * 2 * 8 * 250 = $280,000 / Year (Could go over 500K)
How much improvement can you get? 5%, 10%, 25%
How much will the new application cost? 50K, 100K, 200K
If you are able to save this time, will your users be freed up to do revenue generating activities or will they just have more time to surf the web? You may want to create some worker efficiency factor: 90%, 75%
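The math above is easy to play with in a few lines. This is only a sketch using the figures from this answer; the payback formula (rewrite cost divided by yearly savings, scaled by the efficiency factor) is my own simple assumption about how to combine them.

```python
def annual_labor_cost(people, hours_per_day, hourly_rate, business_days):
    """Yearly cost of the time users spend in the application."""
    return people * hours_per_day * hourly_rate * business_days


def payback_years(rewrite_cost, annual_cost, improvement, efficiency=1.0):
    """Years until the rewrite pays for itself.

    improvement: fraction of user time the new application saves (0.10 = 10%).
    efficiency: fraction of the saved time that turns into productive work.
    """
    annual_saving = annual_cost * improvement * efficiency
    return rewrite_cost / annual_saving


cost = annual_labor_cost(people=70, hours_per_day=2, hourly_rate=8.00, business_days=250)

# Try every combination of improvement and rewrite cost listed above.
for improvement in (0.05, 0.10, 0.25):
    for rewrite_cost in (50_000, 100_000, 200_000):
        years = payback_years(rewrite_cost, cost, improvement, efficiency=0.75)
        print(f"{improvement:.0%} better, ${rewrite_cost:,} rewrite: {years:.1f} years to break even")
```

With these conservative inputs, a 10% improvement and a $100K rewrite take several years to break even, which is the kind of number worth having in hand before talking to either firm.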
Simple answer... Most of the "risks" of using Access are surmounted by using SQL Server as the backend. You already said your current solution works.
So it boils down to your future plans. If your existing application isn't missing any functionality, or whatever is missing can be provided via Access, I would just stick with what you have.
If you need new features I would consider a few things.
Are they something Access can't provide or do well (ex: Internet-facing Solutions)?
What is the potential benefit reaped by having the new features?
What is the potential cost incurred by not having the new features?
Can you put a dollar figure on 1 & 2?
How much to develop the solution in Access?
How much to develop the solution in C#?
In other words, always do the CBA :) Better yet, do your own CBA, then ask both companies to provide you with one, and compare for fun. In the worst case you might get your existing company to come down on their price to retain you as a client.
I read in The Mythical Man-Month that integration takes 3 times the amount of time it took to develop the individual components.
What have you guys experienced?
Yes and no. Integration effort is probably still 3x, but now it's amortized over the whole development process (e.g. early integration, integration tests (esp. in TDD), etc.)
We still have to do the work but it doesn't catch us by surprise anymore.
That was referring to the bad old days when they built all the software components separately and then tried to put them all together. Smart people don't work like that anymore - they integrate continuously.
I would agree, if not higher. Though it really depends on the integration touch points.
I was involved in a project to carry out integration of a number of modules between Siebel and SAP. While both of these products have integration modules available, all the problems on the project (and there were many) were in the integration.
It wasn't helped by the fact that the majority of SAP that we were using was in German, and the messages being transferred were in different XML encoding formats (UTF8 / UTF16).
Once we'd got to grips with the intricacies of what SAP wanted to send and receive, the whole project moved along much quicker.
Key things for a successful integration project:
Good documentation (in English!) on the integration modules
Good documentation on the message formats
Good project management
The project management bit is important as they supply the pizza, and do show some understanding when you have been working 30 hours straight to get an account name from one textbox on one machine to appear in another textbox on another machine.
Our project lasted over a year. The rest of the Siebel configuration that we did, which was a lot, took only a couple of months.
So Integration - 10 months+, rest of the config 2 months.
Integration time depends on a few factors: the size of the project, communication between teams, and your integration philosophy.
Small projects take less time to integrate. Large, very large, or huge projects will take more time to integrate. I've been on small projects where integration was minimal. I've been on huge projects spread across multiple component teams where integration took a very long time.
Time to integrate also depends on how well your teams communicate with each other. If your teams are not communicating it can take 3x or more time to integrate and work out all the related bugs.
Continuous integration helps with the perception that integration takes less time. With CI the time to integrate is amortized over the life of the project. But again, if you have a poor relationship with the other component teams, the total time for all integrations will still be far from zero.
CI is definitely better than the alternative. Waiting until late in the development cycle to integrate is bound to cause you much pain. The earlier you begin the integration the more comfortable each team becomes with the process.
Kind of a side note, there is an interesting talk given by Juval Lowy regarding this. If you have many tiny components, then you increase effort to integrate. If you have only a few large components, you decrease integration, but you have more complex components to maintain. So your effort to integrate is dependent on the architecture and where it balances the number of components against their complexity. A good balance is key, because if you drift too far to one side or the other, the effort required increases exponentially. If you have 20 components and add a 21st, it's not just 5% (1/20) more complex, because you have to consider the interactions with potentially all of the other 20 components. So adding one component adds the potential for 20 ways to interact with the existing components. Of course a good design will limit this interaction to as few components as possible, but that was basically why Juval felt it was exponential.
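The 20-versus-21 arithmetic can be checked with a small sketch. It counts only pairwise links between components, which is an assumption on my part; under that counting the growth is quadratic, and it would be steeper still if larger groups of interacting components were counted as well.

```python
def potential_interactions(components):
    """Number of distinct pairs among the given number of components."""
    return components * (components - 1) // 2


before = potential_interactions(20)
after = potential_interactions(21)
added = after - before  # links introduced by the 21st component

print(f"20 components: {before} potential pairwise interactions")
print(f"21 components: {after} potential pairwise interactions")
print(f"Adding one component (5% more components) adds {added} links, "
      f"about {added / before:.1%} more")
```

So a 5% increase in the number of components gives roughly a 10% increase in potential pairwise interactions, and the gap widens as the system grows.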
Neil and Markus' answers are spot on. In Windows, we integrate continuously. Our source code control system (we call it "Source Depot") is hierarchical with "WinMAIN" being at the top, and a tree of branches underneath it that generally correspond to the organizational structure.
In general, changes flow up and down this tree in a process of "forward integrations" and "reverse integrations". This happens very regularly - almost every day for the top level branches under WinMAIN, and a bit less often for the lower level branches. Each major branch builds fully every day in at least four flavors. The top level branches build in six flavors. A flavor is something like x86/fre, or x64/chk. We also get some daily pseudo-localized builds.
By "build" I mean build a fully installable Windows client product from source. This happens several hundred times per day.
This works well for us - there are two important goals here:
maintaining good code flow (we call it velocity) up and down the tree. The idea is that any branch is never too different from WinMAIN.
Catching integration errors as early as possible.
Markus is very correct that this amortizes integration costs over the life of the project. For us, this makes the costs a LOT lower than they would be if we deferred costs more toward the ends of cycles. This is how things used to work. Not to air too much dirty laundry, but when I started in Windows about 5 1/2 years ago, getting a build out took a very, very long time. They now happen every day, like clockwork. The Windows Engineering Tools and Release group (WETR) gets all the credit for this.
As many others have suggested in various forums - regular integration and full daily automated builds are essential for most projects.
Note, daily and regular automated tests are a whole other topic (it's a massive effort, as you can imagine).
If you have an integration phase towards the end of your project on your project plan, you are doomed, and x3 is not too bad.
You should rather go for continuous integration, where you have, say every 2 weeks, a release with some integration before it.
Sometimes you don't have the source code and need to reverse engineer a program or a black box. Any fun war stories?
Here's one of mine:
Some years ago I needed to rewrite a device driver for which I didn't have source code. The device driver ran on an old CP/M microcomputer and drove a dedicated phototypesetting machine through a serial port. Almost no documentation for the phototypesetting machine was available to me.
I finally hacked together a serial port monitor on a DOS PC that mimicked the responses of the phototypesetting machine. I cabled the DOS PC to the CP/M machine and started logging the data coming out of the device driver as I fed data in through the CP/M machine. This enabled me to figure out the handshaking and encoding used by the device driver and re-create an equivalent one for a DOS machine.
Read the story of FCopy for the C-64 here:
Back in the 80s, the Commodore C-64 had an intelligent floppy drive, the 1541, i.e. an external unit that had its own CPU and everything.
The C-64 would send commands to the drive which in turn would then execute them on its own, reading files, and such, then send the data to the C-64, all over a proprietary serial cable.
The manual for the 1541 mentioned, besides the commands for reading and writing files, that one could read and write to its internal memory space. Even more exciting was that one could download 6502 code into the drive's memory and have it executed there.
This got me hooked and I wanted to play with that - execute code on the drive. Of course, there was no documentation on what code could be executed there, and which functions it could use.
A friend of mine had written a disassembler in BASIC, and so I read out all of the drive's ROM contents, which was 16KB of 6502 CPU code, and tried to understand what it did. The OS on the drive was quite amazing and advanced IMO - it had a kind of task management, with commands being sent from the communication unit to the disk i/o task handler.
I learned enough to understand how to use the disk i/o commands to read/write sectors of the disc. Actually, having read the Apple ]['s DOS 3.3 book which explained all of the workings of its disk format and algos in much detail, was a big help in understanding it all.
(I later learned that I could have also found reverse-eng'd info on the 4032/4016 disk drives for the "business" Commodore models which worked much the same as the 1541, but that was not available to me as a rather disconnected hobby programmer at that time.)
Most importantly, I also learned how the serial comms worked. I realized that the serial comms, using 4 lines, two for data, two for handshake, was programmed very inefficiently, all in software (though done properly, using classic serial handshaking).
Thus I managed to write a much faster comms routine, where I made fixed timing assumptions, using both the data and the handshake line for data transmission.
Now I was able to read and write sectors, and also transmit data faster than ever before.
Of course, it would have been great if one could simply load some code into the drive which speeds up the comms, and then use the normal commands to read a file, which in turn would use the faster comms. This was not possible, though, as the OS on the drive did not provide any hooks for that (mind that all of the OS was in ROM, unmodifiable).
Hence I was wondering how I could turn my exciting findings into a useful application.
Having been a programmer for a while already, dealing with data loss all the time (music tapes and floppy discs were not very reliable back then), I thought: Backup!
So I wrote a backup program which could duplicate a floppy disc at never-before-seen speed: The first version copied an entire 170 KB disc in only 8 minutes (yes, minutes), and the second version did it in about 4.5 minutes, whereas the apps before mine took over 25 minutes. (Mind you, the Apple ][, which had its disc OS running on the Apple directly, with fast parallel data access, did this all in a minute or so).
And so FCopy for the C-64 was born.
It soon became extremely popular. Not as a backup program as I had intended it, but as the primary choice for anyone wanting to copy games and other software for their friends.
Turned out that a simplification in my code, which would simply skip unreadable sectors, writing a sector with a bad CRC to the copy, did circumvent most of the then-used copy protection schemes, making it possible to copy most formerly uncopyable discs.
I had tried to sell my app and actually sold it 70 times. When it got advertised in the magazines, claiming it would copy a disc in less than 5 minutes, customers would call and not believe it, "knowing better" that it can't be done, yet giving it a try.
Not much later, others started to reverse engineer my app, and optimize it, making the comms even faster, leading to copy apps that did it even in 1.5 minutes. Faster was hardly possible, because, due to the limited amount of memory available on the 1541 and the C-64, you had to swap discs several times in the single disc drive to copy all 170 KB of its contents.
In the end, FCopy and its optimized successors were probably the most-popular software ever on the C-64 in the 80s. And even though it didn't pay off financially for me, it still made me proud, and I learned a lot about reverse-engineering, futility of copy protection and how stardom feels. (Actually, Jim Butterfield, an editor for a C-64 magazine in Canada, told its readers my story, and soon he had a cheque for about 1000 CA$ for me - collected by the magazine from many grateful users sending 5$-cheques, which was a big bunch of money back then for me.)
I actually have another story:
A few years past my FCopy "success" story, I was approached by someone who asked me if I could crack a slot machine's software.
This was in Germany, where almost every pub had one or two of those: You'd throw in some money, about the equivalent of a US quarter, then it would spin three wheels and if you got lucky with some pattern, you'd then have the choice to "double or nothing" your win on the next play, or take the current win. The goal of the play was to try to double your win a few times until you'd get into the "series" mode where any succeeding win, no matter how minor, would get you a big payment (of about 10 times your spending per game).
The difficulty was to know when to double and when not. For an "outsider" this was completely random, of course. But it turned out that those German-made machines were using simple pseudo-randomized tables in their ROMs. Now, if you watched the machine play for a few rounds, you could figure out where this "random table pointer" was and predict its next move. That way, a player would know when to double and when to pass, leading him eventually to the "big win series".
Now, this was already a common thing when this person approached me. There was an underground scene which had access to the ROMs in those machines, found the tables and created software for computers such as the C-64 to predict the machine's next moves.
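To illustrate the table-pointer idea in general terms, here is a minimal Python sketch. It is not the software the underground scene used: the table contents are made up, and treating the table as circular is an assumption. Given a table dumped from a ROM and a few observed results, it looks for the one position where they line up and reads the upcoming values from there.

```python
def find_table_positions(table, observed):
    """Return every start index where `observed` matches `table`.

    The table is treated as circular (an assumption for this sketch).
    """
    n = len(table)
    return [
        start
        for start in range(n)
        if all(table[(start + i) % n] == value for i, value in enumerate(observed))
    ]


def predict_next(table, observed, count):
    """Predict the next `count` values, or None if the position is ambiguous."""
    positions = find_table_positions(table, observed)
    if len(positions) != 1:
        return None  # no match or several matches: watch more rounds first
    start = positions[0] + len(observed)
    return [table[(start + i) % len(table)] for i in range(count)]


# Hypothetical table; a real one would come out of the machine's ROM.
EXAMPLE_TABLE = [3, 1, 4, 1, 5, 9, 2, 6]
```

With this made-up table, watching a single result of 1 is ambiguous, while watching 1 followed by 5 pins the pointer down, which is why you had to watch a few rounds before predicting.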
Then came a new type of machine, though, which used a different algorithm: Instead of using pre-calc'd tables, it did something else and none of the resident crackers could figure that out. So I was approached, being known as a sort of genius since my FCopy fame.
So I got the ROMs. 16KB, as usual. No information on what it did and how it worked whatsoever. I was on my own. Even the code didn't look familiar (I knew 6502 and 8080 by then only). After some digging and asking, I found it was a 6809 (which I found to be the nicest 8 bit CPU to exist, and which had analogies to the 680x0 CPU design, which was much more linear than the x86 family's instruction mess).
By that time, I already had a 68000 computer (I worked for the company "Gepard Computer" which built and sold such a machine, with its own developer OS and all) and was into programming Modula-2. So I wrote a disassembler for the 6809, one that helped me with reverse engineering by finding subroutines, jumps, etc. Slowly I got an idea of the flow control of the slot machine's program. Eventually I found some code that looked like a mathematical algorithm and it dawned on me that this could be the random generating code.
As I never had a formal education in computer science, up to then I had no idea how a typical randomgen using mul, add and mod worked. But I remembered seeing something mentioned in a Modula-2 book and then realized what it was.
Now I could quickly find the code that would call this randomgen and learn which "events" led to a randomgen iteration, meaning I knew how to predict the next iterations and their values during a game.
What was left was to figure out the current position of the randomgen. I had never been good with abstract things such as algebra. I knew someone who studied math and was a programmer too, though. When I called him, he quickly knew how to solve the problem and rambled on a lot about how simple it would be to determine the randomgen's seed value. I understood nothing. Well, I understood one thing: The code to accomplish this would take a lot of time, and a C-64 or any other 8 bit computer would take hours if not days for it.
Thus, I decided to offer him 1000 DM (which was a lot of money for me back then) if he could write me an assembler routine in 68000. It didn't take him long and I had the code, which I could test on my 68000 computer. It usually took between 5 and 8 minutes, which was acceptable. So I was almost there.
It still required a portable 68000 computer to be carried to the pub where the slot machine stood. My Gepard computer was clearly not of the portable type. Luckily, someone else I knew in Germany produced entire 68000 computers on a small circuit board. For I/O it only had serial comms (RS-232) and a parallel port (Centronics was the standard of those days). I could hook up some 9V block batteries to it to make it work. Then I bought a Sharp pocket computer, which had a rubber keyboard and a single-line 32 chars display. Running on batteries, it was my terminal. It had an RS-232 connector which I connected to the 68000 board. The Sharp also had some kind of non-volatile memory, which allowed me to store the 68000 random-cracking software on the Sharp and transfer it on demand to the 68000 computer, which then calculated the seed value. Finally I had a small Centronics printer which printed on narrow thermal paper (which was the size of what cash registers use to print receipts). Hence, once the 68000 had the results, it would send a row of results for the upcoming games on the slot machine to the Sharp, which printed them on paper.
So, to empty one of these slot machines, you'd work with two people: You'd start playing and write down the results. Once you had the minimum number of games required for the seed calculation, one of you would go to the car parked outside, turn on the Sharp and enter the results. It would have the 68000 computer rattle for 8 minutes, and out came a printed list of upcoming game runs. Then all you needed was to take this tiny piece of paper back to your buddy, who kept the machine occupied, align the past results with the printout, and no more than 2 minutes later you were "surprised" to win the all-time 100s series. You'd then play these 100 games, practically emptying the machine (and if the machine was empty before the 100 games were played, you had the right to wait for it to be refilled, maybe even come back the next day, while the machine was stopped until you came back).
This wasn't Las Vegas, so you'd only get about 400 DM out of a machine that way, but it was quick and sure money, and it was exciting. Some pub owners suspected us of cheating but had nothing against us due to the laws back then, and even when some called the police, the police were in favor of us.
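For readers who, like me back then, have never seen a "mul, add and mod" randomgen, here is a minimal Python sketch of one, together with a plain brute-force seed search. Everything specific in it is an assumption for illustration: the constants are a textbook 16-bit example and not the machine's numbers, the way a state is turned into a visible result is invented, and brute force is simply the most obvious approach, not necessarily what my math friend's routine did.

```python
def lcg_next(state, a, c, m):
    """One step of a linear congruential generator: multiply, add, take the remainder."""
    return (a * state + c) % m


def visible_outcomes(seed, count, a, c, m, symbols):
    """Results a player would see: only a small part of each internal state.

    Mapping a state to a result via (state >> 8) % symbols is a made-up rule.
    """
    state = seed
    shown = []
    for _ in range(count):
        state = lcg_next(state, a, c, m)
        shown.append((state >> 8) % symbols)
    return shown


def recover_seeds(observed, a, c, m, symbols):
    """Try every possible seed and keep those that reproduce the observed results."""
    return [
        seed
        for seed in range(m)
        if visible_outcomes(seed, len(observed), a, c, m, symbols) == observed
    ]


# Illustrative parameters only (hypothetical, not taken from any slot machine).
A, C, M, SYMBOLS = 25173, 13849, 2**16, 10
```

Once `recover_seeds` is down to a single candidate, calling `visible_outcomes` with that seed and a larger `count` lists the upcoming results. The search space grows with the modulus, which gives a feel for why a slow 8 bit machine was out of the question.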
Of course, the slot making company soon got wind of this and tried to counteract, turning off those particular machines until new ROMs were installed. But the first few times they only changed the randomgen's numbers. We only had to get hold of the new ROMs, and it took me a few minutes to find the new numbers and implement them into my software.
So this went on for a while, during which my friends and I roamed through pubs of several towns in Germany looking for those machines only we could crack.
Eventually, though, the machine maker learned how to "fix" it: Until then, the randomgen was only advanced at certain predictable times, e.g. something like 4 times during play, and once more per the player's pressing of the "double or nothing" button.
But then they finally changed it so that the randomgen would continually be polled, meaning we were no longer able to predict the next seed value exactly on time for the pressing of the button.
That was the end of it. Still, making the effort of writing a disassembler just for this single crack, finding the key routines in 16KB of 8 bit CPU code, figuring out unknown algorithms, investing quite a lot of money to pay someone else to develop code I didn't understand, finding the items for a portable high-speed computer involving the "blind" 68000 CPU with the Sharp as a terminal and the printer for the convenient output, and then actually emptying the machines myself, was one of the most exciting things I ever did with my programming skills.
Way back in the early 90s, I forgot my Compuserve password. I had the encrypted version in CIS.INI, so I wrote a small program to do a plaintext attack and analysis in an attempt to reverse-engineer the encryption algorithm. 24 hours later, I figured out how it worked and what my password was.
Soon after that, I did a clean-up and published the program as freeware so that Compuserve customers could recover their lost passwords. The company's support staff would frequently refer these people to my program.
It eventually found its way onto a few bulletin boards (remember them?) and Internet forums, and was included in a German book about Compuserve. It's still floating around out there somewhere. In fact, Google takes me straight to it.
Once, when playing Daggerfall II, I could not afford the Daedric Dai-Katana so I hex-edited the savegame.
Being serious though, I managed to remove the dongle check on my dad's AutoCAD installation using SoftICE many years ago. This was before the Internet was big. He works as an engineer so he had a legitimate copy. He had just forgotten the dongle at his job and he needed to do some things and I thought it would be a fun challenge. I was very proud afterwards.
Okay, this wasn't reverse engineering (quite) but a simple hardware hack born of pure frustration. I was an IT manager for a region of Southwestern Bell's cell phone service in the early 90s. My IT department was dramatically underfunded and so we spent money on smart people rather than equipment.
We had a WAN between major cities, used exclusively for customer service, with critical IP links. Our corporate bosses were insistent that we install a network monitoring system to notify us when the lines went down (no money for redundancy, but spend bucks for handling failures. Sigh.)
The STRONGLY recommended solution ran on a SPARC workstation and started at $30K plus the cost of a SPARC station (around $20K then), which together was a substantial chunk of my budget. I couldn't see it - this was a waste of $$. So I decided a little hacking was in order.
I took an old PC scheduled for destruction, put a copy of ProComm on it (remember ProComm?) and had it ping each of the required nodes along the route (this was one of the later versions of ProComm that scripted FTP as well as serial lines, KERMIT, etc.) A little logic in the coding fired off a pager message when a node couldn't be reached. I had already used it to cobble together a pager system for our techs, so I reused the pager code. The script ran continuously, sending a ping once each minute across each of the critical links, and branched into the pager code when a ping wasn't returned.
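The original was a ProComm script, which I won't try to reproduce here. As a rough modern equivalent, the same logic might look like the Python sketch below. The names are hypothetical, `send_page` stands in for whatever pager gateway you have, and the `ping` flags assume the Linux iputils version of the command.

```python
import subprocess
import time


def ping(host):
    """Return True if one ping is answered (assumes Linux-style -c and -W flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_links(nodes, is_up, was_up, send_page):
    """Probe each node once and page on every up/down change.

    Nodes not seen before are assumed to have been up. Returns the new state.
    """
    now_up = {}
    for node in nodes:
        now_up[node] = is_up(node)
        previously_up = was_up.get(node, True)
        if previously_up and not now_up[node]:
            send_page(f"LINK DOWN: {node}")
        elif not previously_up and now_up[node]:
            send_page(f"LINK UP: {node}")
    return now_up


def monitor(nodes, send_page, interval_seconds=60):
    """Run forever, checking every critical link once per interval."""
    state = {}
    while True:
        state = check_links(nodes, ping, state, send_page)
        time.sleep(interval_seconds)
```

Keeping the probe and the pager as plain function arguments means the paging logic can be tried out with fake functions before pointing it at real routers.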
We duplicated this system at each critical location for a cost of less than $500 and had a very fast notification when a link went down. Next issue - one of our first trouble-shooting methods was to power-cycle our routers and/or terminal servers. I got some dial-up X10 controllers and a few X10 on/off appliance power switches. You had to know the correct phone number to use and the correct tones to push, but we printed up a cheat card for each technician and they kept it with their pager. Instant fast response! One of my techs then programmed the phones we all had to reset specific equipment at each site as a speed-dial. One-tech solves the problem!
Now the "told-you-so" unveiling.
I'm sitting at lunch with our corporate network manager in Dallas who is insisting on a purchase of the Sun-based network management product. I get a page that one of our links is down, and then a second page. Since the pager messages are coming from two different servers, I know exactly which router is involved (it was a set-up, I knew anyway, as the tech at the meeting with me was cued to "down a router" during the meal so we could show off.) I show the pager messages to the manager and ask him what he would do to resolve this problem. He eyes me suspiciously, since he hasn't yet successfully been paged by his Solaris NMS system that is supposed to track critical links. "Well, I guess you'd better call a tech and get them to reset the router and see if that fixes it." I turned to the tech who was lunching with us and asked him to handle it. He drew out his cell phone (above the table this time) and pressed the speed-dial he had programmed to reset the router in question. The phone dialed the X10 switch, told it to power-off the router, paused for five seconds, told it to power-up the router, and disconnected. Our ProComm script sent us pages telling us the link was back up within three minutes of this routine. :-)
The corporate network manager was very impressed. He asked me what the cost was for my new system. When I told him less than $1K, he was apoplectic. He had just ordered a BIG set of the Sun Solaris network management solution just for the tasks I'd illustrated. I think he had spent something like $150K. I told him how the magic was done and offered him the ProComm script for the price of lunch. TANSTAAFL. He told me he'd buy me lunch to keep my mouth shut.
Cleaning out my old drawers of disks and such, I found a copy of the code - "Pingasaurus Rex" was the name I had given it. That was hacking in the good old days.
The most painful for me was for this product where we wanted to include an image on an Excel spreadsheet (a few years back, before the open standards). So I had to get an "understanding", if such a thing exists, of the internal format for the docs as well. I ended up doing some hex comparison between files with and without the image to figure out how to put it there, plus working on some little endian math....
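For anyone who has not done this kind of hex comparison, the idea can be sketched in a few lines of Python. This is not the tooling I used back then, and the function names are made up. The first helper reports the byte ranges where two files differ, the second decodes a little-endian 32-bit number such as a length field.

```python
import struct


def differing_ranges(a, b):
    """Return (start, end) byte offsets, end exclusive, where a and b differ.

    Only the common length is compared; inserted data that shifts everything
    after it will show up as one long differing range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    return ranges


def read_u32_le(data, offset):
    """Decode an unsigned 32-bit little-endian integer at `offset`."""
    return struct.unpack_from("<I", data, offset)[0]
```

In little-endian order the least significant byte comes first, so the bytes 10 27 00 00 in a hex dump mean 0x00002710, which is 10000.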
I once worked on a tool that would gather inventory information from a PC as it logged into the network. The idea was to keep track of all the PCs in your company.
We had a new requirement to support the Banyan VINES network system, now long forgotten but pretty cool at the time it came out. I couldn't figure out how to get the Ethernet MAC address from Banyan's adapter as there was no documented API to do this.
Digging around online, I found a program that some other Banyan nerd had posted that performed this exact action. (I think it would store the MAC address in an environment variable so you could use it in a script). I tried writing to the author to find out how his program worked, but he either didn't want to tell me or wanted some ridiculous amount of money for the information (I don't recall).
So I simply fired up a disassembler and took his utility apart. It turned out he was making one simple call to the server, which was an undocumented function code in the Banyan API. I worked out the details of the call pretty easily: it was basically asking the server for this workstation's address via RPC, and the MAC was part of the Banyan network address.
I then simply emailed the engineers at Banyan and told them what I needed to do. "Hey, it appears that RPC function number 528 (or whatever) returns what I need. Is that safe to call?"
The Banyan engineers were very cool, they verified that the function I had found was correct and was pretty unlikely to go away. I wrote my own fresh code to call it and I was off and running.
Years later I used basically the same technique to reverse engineer an undocumented compression scheme on an otherwise documented file format. I found a little-known support tool provided by the (now defunct) company which would decompress these files, and reverse engineered it. It turned out to be a very straightforward Lempel-Ziv variant applied within the block structure of their file format. The results of that work are recorded for posterity in the Wireshark source code, just search for my name.
Almost 10 years ago, I picked up the UFO/XCOM Collector's Edition in the bargain bin at a local bookstore, mostly out of nostalgia. When I got back home, I got kinda excited that it had been ported to Windows (the DOS versions didn't run under win2k)... and then disappointed that it had garbled graphics.
I was about to shrug my shoulders (bargain bin and all), but then my friend said "Haven't you... bugfixed... software before?", which led to a night of drinking lots of cola and reverse-engineering while hanging out with my friend. In the end, I had written a bugfix loader that fixed the pitch vs. width issue, and could finally play the two first XCOM games without booting old hardware (DOSBOX wasn't around yet, and my machine wasn't really powerful enough for fullblown virtualization).
The loader gained some popularity, and was even distributed with the STEAM re-release of the games for a while - I think they've switched to dosbox nowadays, though.
I wrote a driver for the Atari ST that supported Wacom tablets. Some of the Wacom information could be found on their web sites, but I still had to figure out a lot on my own.
Then, once I had written a library to access the Wacom tablets (and a test application to show the results) - it dawned on me that there was no API for the OS (GEM windowing system) to actually place the mouse cursor somewhere. I ended up having to hook some interrupts in something called the VDI (like GDI in Windows), and be very very careful not to crash the computer inside there. I had some help (in the form of suggestions) from the developers of an accelerated version of the VDI (NVDI), and everything was written in PurePascal. I still occasionally have people asking me how to move the mouse cursor in GEM, etc.
I've had to reverse engineer a video-processing app, where I only had part of the source code. It took me weeks and weeks to even work out the control-flow, as it kept using CORBA to call itself, or be called from CORBA in some part of the app that I couldn't access.
Sheer idiocy.
I recently wrote an app that downloads the whole content from a Domino Webmail server using Curl. This is because the subcontractor running the server asks for a few hundred bucks for every archive request.
They changed their webmail version about one week after I released the app for the department, but I managed to get it working again using a GREAT deal of regex and XML.
When I was in highschool they introduced special hours each week (there were 3 hours if I recall correctly) in which we had to pick a classroom with a teacher there to help with any questions on their subject. Now of course everybody always wanted to spend their time in the computer room to play around on the computers there.
To choose the room where you would have to be there was an application that would monitor how many students would go to a certain room, and so you had to reserve your slot on time or otherwise there was not much choice where to go.
At that time I always liked to play around on the computers there and I had obtained administrator access already, but for this application that alone did not help me much. So I used my administrator access to make a copy of the application and take it home to examine. Now I don't remember all the details, but I discovered this application used an Access database file located on a hidden network share. Then after taking a copy of this database file I found there was a password on the database. Using some Linux Access database tools I could easily work around that, and after that it was easy to import this database into my own MySQL server.
Then through a simple web interface I could find details for every student in the school to change their slots and promote myself to sit in my room of choice every time.
The next step was to write my own application that would allow me to just select a student from the list and change anything without having to look up their password, which was implemented in only a few hours.
While not such a very impressive story as some others in this thread, I still remember it was a lot of fun to do for a highschool child back then.