How would you go about reverse engineering a set of binary data pulled from a device?

A friend of mine brought up this question the other day. He recently bought a Garmin heart rate monitor which keeps track of his heart rate and allows him to upload a day's worth of stats to his computer.
The only problem is that there are no Linux drivers for the Garmin USB device. He's managed to interpret some of the data, such as the model number and his user details, and has identified what are essentially binary data tables, which we assume represent a series of recordings of his heart rate and the time each recording was taken.
Where does one start when reverse engineering data when you know nothing about the structure?

I had the same problem and initially found this project at Google Code that aims to provide a cross-platform version of tools for the Garmin devices ... see: http://code.google.com/p/garmintools/. There's a link on the front page of that project to the protocols you need, which Garmin was thoughtful enough to release publicly.
And here's a direct link to the Garmin I/O specification: http://www.garmin.com/support/pdf/IOSDK.zip

I'd start by looking at the data in a hex editor, ideally a good one that knows the most common encodings (ASCII, Unicode, etc.), and then try to make sense of it based on what you know it has stored.
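If you'd rather poke at it from a script, here's a minimal Python sketch of the same idea - a hex dump with the ASCII interpretation alongside (the file name is just a placeholder for whatever you pulled off the device):

    # Minimal hex dump: offset, hex bytes, and printable ASCII side by side.
    # "hrm_dump.bin" is a placeholder for the file pulled from the device.
    with open("hrm_dump.bin", "rb") as f:
        data = f.read()

    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        print(f"{offset:08x}  {hex_part:<47}  {ascii_part}")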

As another poster mentioned, reverse engineering can be hairy, not in practice but in legality.
That being said, you may be able to find everything related to your root question by checking out this project and its code... and they do handle the runner's heart rate/GPS combo data as well:
http://www.gpsbabel.org/

I'd suggest you start with checking the legality of reverse engineering in your country of origin. Most countries have very strict laws about what is allowed and what isn't regarding reverse engineering devices and code.

I would start by seeing what data is being sent by the device, then consider how such data could be represented and packed.
I would first capture many samples and see if any pattern presents itself; heart rate is fairly regular, so anything periodic would suggest a measurement related to the heart itself. I would also look for bit fields which are monotonically increasing, as that would suggest some sort of time stamp.
Having formed a hypothesis for what is where, I would write a program to test it, graph the results and see if it makes sense. If it almost fits but not quite, closer inspection will probably reveal that you need a scaling factor here or there. It is also entirely possible I would need to process the data first before it looks anything like what their program is showing, i.e. I might need to integrate the data points. If I get garbage, then it is back to the drawing board :-)
I would also check the manufacturer's website, or maybe run strings on their binaries. Finding someone who works in the field of biomedical engineering would also be on my list, as they would probably know what protocols are typically used, if any. I would also look for these protocols and see if any could be applied to the data I am seeing.
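To make the "monotonically increasing bit fields" idea concrete, here is a rough Python sketch; the record size and the little-endian 32-bit interpretation are pure guesses that you would vary:

    import struct

    # Guess a record size, then scan each offset within the record for a field
    # that increases monotonically from record to record - a timestamp candidate.
    def find_monotonic_fields(data, record_size):
        records = [data[i:i + record_size]
                   for i in range(0, len(data) - record_size + 1, record_size)]
        candidates = []
        for offset in range(record_size - 3):
            values = [struct.unpack_from("<I", rec, offset)[0] for rec in records]
            if len(values) > 2 and all(a < b for a, b in zip(values, values[1:])):
                candidates.append((offset, values[:5]))
        return candidates

    # Try a few plausible record sizes against the raw dump:
    # for size in (8, 12, 16, 24, 32):
    #     print(size, find_monotonic_fields(data, size))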

I'd start by creating a hex dump of the data. Figure it's probably blocked in some power-of-two-sized chunks. Start looking for repeating patterns. Think about what kind of data they're probably sending. Either they're recording each heart beat individually, or they're recording whatever the sensor is sending at fixed intervals. If it's individual beats, then there's going to be a time delta (since the last beat), a duration, and a max or avg strength of some sort. If it's fixed intervals, then it'll probably be a simple vector of readings. There'll probably be a preamble of some sort, with a start timestamp and the sampling rate. You can try decoding the timestamp yourself, or you might try simply feeding it to ctime() and see if they're using standard absolute time format.
Keep in mind that lots of cheap A/D converters only produce 12-bit outputs, so your readings are unlikely to be larger than 16 bits (and the high-order 4 bits may be used for flags). I'd recommend resetting the device so that it's "blank", dumping and storing the contents, then take a set of readings, record the results (whatever the device normally reports), then dump the contents again and try to correlate the recorded results with whatever data appeared after the "blank" dump.
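As a rough illustration of the ctime() idea and the blank-vs-recorded comparison (little-endian is an assumption; try big-endian too, and adjust the date window to when the recordings were made):

    import struct
    import time

    # Scan every byte offset for a 32-bit value that, read as a Unix timestamp,
    # falls inside a plausible window (here roughly 2005-2015).
    def find_timestamps(data, lo=1104537600, hi=1420070400):
        for offset in range(len(data) - 3):
            value = struct.unpack_from("<I", data, offset)[0]
            if lo <= value <= hi:
                print(f"offset {offset:#06x}: {value} -> {time.ctime(value)}")

    # Find where the dump taken after recording starts to differ from the
    # "blank" dump - the recording data most likely lives from there onwards.
    def first_difference(blank, recorded):
        for i, (a, b) in enumerate(zip(blank, recorded)):
            if a != b:
                return i
        return min(len(blank), len(recorded))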

Unsure if this is what you're looking for, but Garmin has created an API that runs in your browser. It seems OS X is supported, as well as Windows browsers... I would try it from Google Chromium to see if it can be used instead of this reverse engineering...
http://developer.garmin.com/web-device/garmin-communicator-plugin/
API Features
- Auto-detection of devices connected to a computer
- Access to device product information like product name and software version
- Read tracks, routes and waypoints from supported recreational, fitness and navigation devices
- Write tracks, routes and waypoints to supported recreational, fitness and navigation devices
- Read fitness data from supported fitness devices
- Geo-code address and save to a device as a waypoint or favorite
- Read and write Garmin XML files (GPX and TCX) as well as binary files
- Support for most Garmin devices (USB, USB mass-storage, most serial devices)
- Support for Internet Explorer, Firefox and Chrome on Microsoft Windows
- Support for Safari, Firefox and Chrome on Mac OS X

Can you synthesize a heart beat using something like a computer speaker? (I have no idea how such devices actually work). Watch how the binary results change based on different inputs.
Ripping apart the device and checking out what's inside would probably help too.


Adobe Air unique id issue

I created an AIR app which sends an ID to my server to verify the user's licence.
I created it using
NetworkInfo.networkInfo.findInterfaces(), and I use the first "name" value whose "displayName" contains "LAN" (or the first MAC address I get if the user is on a Mac).
But I get a problem:
Sometimes users connect to the internet using a USB stick (supplied by a mobile phone company), and it changes the serial number I get; probably the USB stick becomes the first value in the vector returned by findInterfaces().
I could take the last value instead, but I think I could run into similar problems.
So is there a better way to identify the computer even with these small hardware changes?
It would be nice to get the motherboard or CPU serial, but that seems not to be possible. I've found some workarounds to get it, but they work on Windows and not on a Mac.
I don't want to store data on the user's computer for authentication, to make the software "a little" more difficult to hack.
Any idea?
Thanks
Nadia
So is there a better way to identify the computer even with these small hardware changes?
No, there is no best practice for identifying a personal computer and building your software licensing on top of that. You should use a server-side licensing manager to provide that functionality. It will also give your users more flexibility with your desktop software. It's much easier both for the product owner (you don't need a call centre that responds to every call about a changed network card, hard drive, whatever) and for the users of such a product.
Briefly speaking, the user's personal computer is an insecure (frankly speaking, you have no way to store anything valuable there) and very dynamic environment (the hardware cycle is too short to use it as part of a licensing program).
I am in much the same boat as you, and I am now finally starting to address this... I have researched this for over a year and there are a couple options out there.
The biggest thing to watch out for when using a 3rd party system is the leech effect. Nearly all of them want a percentage of your profit - which in my mind makes them nothing more than vampireware. This is on top of the percentage you WILL pay to PayPal, a merchant processor, etc.
The route I will end up taking is creating a secondary ANE, probably written in Java, because of 1) transitioning my knowledge and 2) the ability to run on various architectures. I have to concede this solution is not foolproof, since reverse engineering Java is nearly as easy as anything running on Flash Player. The point is just to make it harder, not bulletproof.
As a side note - to any naysayers about changing the CPU / motherboard: this is extremely rare, if it is even done any more. I work on a laptop, and obviously once that hardware cycle is over I need to re-register everything on a new one. So please...
Zarqon was developed by: Cliff Hall
This appears to be a good solution for small scale. The reason I do not believe it scales well (say beyond a few thousand users), based on the documentation, is that it appears to be a completely manual process, i.e. there is no ability to tie into a payment system to auto-generate the key and notify the user (I could be wrong about this).
Other helpful resources:
http://www.adobe.com/devnet/flex/articles/flex_paypal.html

How can I analyze live data from webcam?

I am going to be working on self-chosen project for my college networking class and I just had a couple questions to help get me started in the right direction.
My project will involve creating a new "physical" link over which data, in the form of text, will be transmitted from one computer to another. This link will involve one computer with a webcam that reads a series of flashing colors (black/white) as binary and converts it to text. Each series of flashes will simulate a packet of data. I will be using OS X and the integrated webcam in a MacBook; the flashing computer will be either Windows or OS X.
So my questions are: which programming languages or APIs would be best for reading live webcam data and analyzing the color of a certain area, as well as programming and timing the flashes? Also, would I need to worry about matching the flash rate of the "writing" computer to the frame capture rate of the "reading" computer?
Thank you for any help you might be able to provide.
Regarding the frame capture rate, the Shannon sampling theorem says that "perfect reconstruction of a signal is possible when the sampling frequency is greater than twice the maximum frequency of the signal being sampled". In other words, if your flashing light switches 10 times per second, you need a camera of more than 20 fps to properly capture that. So basically check your camera specs, divide by two, lower the result a little, and you have your maximum flashing rate.
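As a back-of-the-envelope check (the numbers here are invented; plug in your camera's real specs):

    camera_fps = 30           # frame rate from the camera spec sheet (assumed)
    safety_margin = 0.8       # back off a little from the theoretical limit
    max_flash_rate = camera_fps / 2 * safety_margin
    print(f"maximum safe flashing rate: {max_flash_rate:.1f} Hz")  # 12.0 Hz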
Whatever can get the frames will work. If the lighting conditions the camera works in are stable, and the position of the light in the image is static, then it is going to be very, very easy: just check the average pixel value of a certain area.
If you need additional image processing you should probably also find out about OpenCV (it has bindings to every programming language).
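A minimal sketch of that region-averaging approach with OpenCV's Python bindings; the region coordinates and the brightness threshold are placeholders you would tune for your own setup:

    import cv2

    # Open the default webcam and threshold the mean brightness of a fixed
    # region to decide whether the transmitter is currently black or white.
    cap = cv2.VideoCapture(0)
    x, y, w, h = 300, 200, 40, 40    # region of interest - adjust for your setup
    threshold = 128                  # brightness cutoff between "black" and "white"

    bits = []
    while len(bits) < 64:            # sample 64 frames as a demo
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        roi = gray[y:y + h, x:x + w]
        bits.append(1 if roi.mean() > threshold else 0)

    cap.release()
    print(bits)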
To answer your question about language choice, I would recommend java. The Java Media Framework is great and easy to use. I have used it for capturing video from webcams in the past. Be warned, however, that everyone you ask will recommend a different language - everyone has their preferences!
What are you using as the flashing device? What kind of distance are you trying to achieve? Something worth thinking about is how are you going to get the receiver to recognise where within the captured image to look for the flashes. Some kind of fiducial marker might be necessary. Longer ranges will make this problem harder to resolve.
If you're thinking about shorter ranges, have you considered using a two-dimensional transmitter? (given that you're using a two-dimensional receiver, it makes sense) and maybe have a transmitter that shows a sequence of QR codes (or similar encodings) on a monitor?
You will have to consider some kind of error-correction encoding, such as a Hamming code. While encoding would increase the data footprint, it might give you overall better bandwidth given that you can crank up the speed much higher without having to worry about the odd corrupt bit.
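If you go the Hamming route, a tiny Hamming(7,4) sketch in Python shows the flavour of it: four data bits become seven transmitted bits, and any single flipped bit can be corrected on the receiving side.

    # Hamming(7,4): encode 4 data bits into 7, correcting any single-bit error.
    def hamming74_encode(d):                 # d = [d1, d2, d3, d4]
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p4 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p4, d2, d3, d4]  # bit positions 1..7

    def hamming74_decode(c):                 # c = received 7-bit codeword
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]       # parity over positions 1,3,5,7
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]       # parity over positions 2,3,6,7
        s4 = c[3] ^ c[4] ^ c[5] ^ c[6]       # parity over positions 4,5,6,7
        error_pos = s1 + 2 * s2 + 4 * s4     # 0 means no detectable error
        c = list(c)
        if error_pos:
            c[error_pos - 1] ^= 1            # flip the corrupted bit back
        return [c[2], c[4], c[5], c[6]]      # recover d1..d4

    # Round trip with one deliberately flipped bit:
    code = hamming74_encode([1, 0, 1, 1])
    code[4] ^= 1                             # simulate a corrupt bit on the channel
    assert hamming74_decode(code) == [1, 0, 1, 1]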
Some 'evaluation' type material might include you discussing the obvious security risks in using such a channel - anyone with line of sight to the transmitter can eavesdrop! You could suggest in your writeup using some kind of encryption, a block cipher in CBC would do, but would require a key-exchange prior to transmission, so you could think about public key encryption.

Simulator or Emulator? What is the difference?

While I understand what simulation and emulation mean in general, I almost always get confused about them. Assume that I create a piece of software that mimics existing hardware/software, what should I call it? A simulator or an emulator?
Could anyone explain the difference in terms of programming?
Bonus: What is the difference in English between these two terms? (Sorry, I am not a native speaker :))
Emulation is the process of mimicking the outwardly observable behavior to match an existing target. The internal state of the emulation mechanism does not have to accurately reflect the internal state of the target which it is emulating.
Simulation, on the other hand, involves modeling the underlying state of the target. The end result of a good simulation is that the simulation model will emulate the target which it is simulating.
Ideally, you should be able to look into the simulation and observe properties that you would also see if you looked into the original target. In practice, there may be some shortcuts in the simulation for performance reasons -- that is, some internal aspects of the simulation may actually be an emulation.
MAME is an arcade game emulator; Hyperterm is a (not very good) terminal emulator. There's no need to model the arcade machine or a terminal in detail to get the desired emulated behavior.
Flight Simulator is a simulator; SPICE is an electronics simulator. They model as much as possible every detail of the target to represent what the target does in reality.
EDIT: Other responses have pointed out that the goal of an emulation is to able to substitute for the object it is emulating. That's an important point. A simulation's focus is more on the modeling of the internal state of the target -- and the simulation does not necessarily lead to emulation. In particular, a simulation may run far slower than real-time. SPICE, for example, cannot substitute for an actual electronics circuit (even if assuming there was some kind of magical device that perfectly interfaces electrical circuits to a SPICE simulation.)
A simulation does not always lead to emulation --
If a flight-simulator could transport you from A to B then it would be a flight-emulator.
An emulator can replace the original for real use.
A Virtual PC emulates a PC.
A simulator is a model for study and analysis.
An emulator will always have to operate close to real-time. For a simulator that is not always the case. A geological simulation could do 1000 years/second or more.
Simulation = For analysis and study
Emulation = For usage as a substitute
A simulator is an environment which models a system, while an emulator is one that replicates the usage as on the original device or system.
Simulator mimics the activity of something that it is simulating. It "appears"(a lot can go with this "appears", depending on the context) to be the same as the thing being simulated. For example the flight simulator "appears" to be a real flight to the user, although it does not transport you from one place to another.
Emulator, on the other hand, actually "does" what the thing being emulated does, and in doing so it too "appears to be doing the same thing". An emulator may use different set of protocols for mimicking the thing being emulated, but the result/outcome is always the same as the original object. For example, EMU8086 emulates the 8086 microprocessor on your computer, which obviously is not running on 8086 (= different protocols), but the output it gives is what a real 8086 would give.
It's a difference in focus. Emulators [1] focus on recreating the behavior of a system, with no regard for how the system functions internally. Simulators [2] focus on modeling the components of a system. You use an emulator when you care mostly about what a system does, and a simulator when you care about how it does it.
As for their general English meanings, emulation is "the endeavor to equal or to excel another in qualities or actions", while simulation is "to model, replicate, duplicate the behavior, appearance or properties of". Not much difference. Emulation comes from æmulus, "striving, rivaling," and is related to "imitate" and "image," which suggests a surface-level resemblance. "Simulation" comes from similis "like", as does the word "similar," which perhaps suggests a deeper congruence.
References:
Wikipedia: Emulator
Wikipedia: Computer Simulation
Wiktionary: emulation
Wiktionary: simulation
Etymology Online: emulation
Etymology Online: simulation
I don't think emulator and simulator can really be compared. Both mimic something, but they are not part of the same scope of reasoning and are not used in the same context.
In short: an emulator is designed to copy some features of the original and can even replace it in the real environment. A simulator is not designed to copy the features of the original, but only to appear similar to the original to human beings. Without the features of the original, the simulator cannot replace it in the real environment.
An emulator is a device that mimics something closely enough that it can be substituted for the real thing. E.g. you want a circuit to work like a ROM (read-only memory) circuit, but you also want to adjust the content until it is what you want. You'll use a ROM emulator, a black box (likely CPU-based) with physical and electrical interfaces compatible with the ROM you want to emulate. The emulator is plugged into the device in place of the real ROM. The motherboard will not see any difference when working, but you will be able to change the emulated-ROM content easily. Put otherwise, the emulator acts exactly like the real thing in its motherboard context (maybe a little bit slower due to its internal model), but there are additional functions (like re-writing) visible only to the designer, outside the motherboard context. So an emulator definition would be: something that mimics the original, has all of its functional features, can actually replace it to some extent in the real world, and may have additional features not visible in the normal context.
A simulator is used in another context, e.g. a plane simulator, a car simulator, etc. The simulation takes care of only some aspects of the actual thing, usually those related to how a human being will perceive and control it. The simulator will not perform the functions of the real thing, and cannot be substituted for it. The plane simulator will not fly or carry someone; that's not its purpose at all. The simulator is not intended to work, but to appear to the pilot somehow like the actual thing for purposes other than its normal ones, e.g. to allow ground training (including in unusual situations like all-engine failure). So a simulator definition would be: something that can appear to a human, to some extent, like the original, but cannot replace it for actual use. In addition, the pilot will know that the simulator is a simulator.
I don't think we'll see any ROM simulator, because ROM are not interacting with human beings, nor we'll see any plane emulator, because planes cannot have a replacement performing the same functions in the real world.
In my view the model inside an emulator or a simulator can be anything, and has not to be similar to the model of the original. A ROM emulator model will likely be software instead of hardware, MS Flight Simulator cannot be more software than it is.
This comparison of both terms contradicts the currently selected answer (from Toybuilder), which puts the difference on the internal model, while my suggestion is that the difference is whether the fake can or cannot be used to perform the actual function in the real world (to some accepted extent, of course).
Note that the plane simulator also has to simulate the earth, the sun, the wind, etc., which are not part of the plane, so a plane simulator has to mimic some aspects of the plane as well as the environment of the plane, because it is not used in that actual environment, but in a training room.
This is a big difference from the emulator, which emulates only the original, and whose purpose is to be used in the environment of the original with no need to emulate it. Back to the plane context... what could a plane emulator be? Maybe a train that connects two airports -- actually two plane steps -- carrying passengers, with stewardesses on board, with a car interior looking like an actual plane cabin, and with the captain saying "ladies and gentlemen our altitude is currently 10 km and the temperature at our destination is 24°C". Its benefit is difficult to see, hmm...
As a conclusion, the emulator is a real thing intended to work, the simulator is a fake intended to trick the user.
To understand the difference between a simulator and an emulator, keep in mind that a simulator tries to mimic the behavior of a real device. For example, in the case of the iOS Simulator, it simulates the real behavior of an actual iPhone/iPad device. However, the Simulator itself uses the various libraries installed on the Mac (such as QuickTime) to perform its rendering so that the effect looks the same as an actual iPhone. In addition, applications tested on the Simulator are compiled into x86 code, which is the byte-code understood by the Simulator. A real iPhone device, conversely, uses ARM-based code.
In contrast, an emulator emulates the working of a real device. Applications tested on an emulator are compiled into the actual byte-code used by the real device. The emulator executes the application by translating the byte-code into a form that can be executed by the host computer running the emulator.
To understand the subtle difference between simulation and emulation, imagine you are trying to convince a child that playing with knives is dangerous. To simulate this, you pretend to cut yourself with a knife and groan in pain. To emulate this, you actually cut yourself.
In more or less normal parlance: If your software can do everything the mimicked system can do, it's an emulator. If it only approximates the results of a system (IT or otherwise), it's a simulator.
An emulator is a model of a system which will accept any valid input that the emulated system would accept, and produce the same output or result. So your software is an emulator only if it reproduces the behavior of the emulated system precisely.
Some years ago I came up with a very short adage that, I believe, captures the essence of the difference quite nicely:
A simulator is an emulator on a mission.
By that I mean that you use an emulator when you can't use the real thing, and you use a simulator when you can't use the real thing and you want to find something out about it.
Simple Explanation.
If you want to convert your PC (running Windows) into Mac, you can do either of these:
(1) You can simply install a Mac theme on your Windows. So, your PC feels more like Mac, but you can't actually run any Mac programs. (SIMULATION)
(or)
(2) You can program your PC to run like Mac (I'm not sure if this is possible :P ). Now you can even run Mac programs successfully and expect the same output as on Mac. (EMULATION)
In the first case, you can experience Mac, but you can't expect the same output as on Mac.
In the second case, you can expect the same output as on Mac, but still the fact remains that it is only a PC.
Simulator: it is similar to an interpreter.
i.e. it actually executes the real code line by line to mimic the behaviour.
Emulator: it is similar to an executable.
i.e. it takes compiled code and executes it.
The distinction between the two terms is a bit fuzzy. I come from a world where "emulators" are pieces of hardware that allow you to debug embedded systems, and I remember products that gave you ICE (In-Circuit Emulation) capabilities to debug a PC platform, so I find the use of the term "emulation" to be somewhat of a misnomer for software that SIMULATES the behaviour of a piece of hardware.
My justification for the current use of the term emulation is that it may "augment" the functionality and is only concerned with a "reasonable" approximation of the behaviour of the system.
ICE: (In Circuit Emulation)
A piece of hardware that is plugged into a board in place of the actual processor. It allows you to run the system as if the actual processor were present. Typically these have a variant of the processor on them to actually execute the software, with glue logic to allow the user to break execution and single-step under hardware control. Some would also provide logging capability. Most modern processor development systems have replaced ICE-type emulation with JTAG emulation, where the JTAG port just talks to the processor via a special-purpose serial link and all execution is performed by the processor mounted on the board.
Software EMULATOR:
An x86 emulator is only concerned with being able to execute x86 assembly language, not with providing an accurate cycle-by-cycle behavioural model of a SPECIFIC x86 processor. Bochs is an example of this. QEMU does this too, but also allows "virtualization" using special kernel modules.
SIMULATOR:
Texas Instruments provides a CYCLE-ACCURATE behavioural model of their processors for software development, intended to be an accurate SIMULATION of a SPECIFIC processor core's behavior for developers to use prior to having working hardware.
Software EMULATOR augmenting functionality:
BLEEM not only allowed you to run PlayStation software, but also allowed the display to be output at a higher resolution than the PlayStation was able to provide, and took advantage of more advanced capabilities of GPUs that were available (i.e. better blending and smoothing of textures).
An emulator is an alternative to the real system but a simulator is used to optimize, understand and estimate the real system.
A simulation is a system that behaves similar to something else, but is implemented in an entirely different way. It provides the basic behavior of a system but may not necessarily abide by all of the rules of the system being simulated. It is there to give you an idea about how something works.
An emulation is a system that behaves exactly like something else, and abides by all of the rules of the system being emulated. It is effectively a complete replication of another system, right down to being binary compatible with the emulated system's inputs and outputs, but operating in a different environment to the environment of the original emulated system. The rules are fixed, and cannot be changed or the system fails.
Both terms are something completely different and only intersect very little. To find the right term is actually very easy, just think about following:
A simulation does not do anything for real. You can study it, for example how computer work, but it usually has no outcome other than that. A plane crash in a Flight Simulator causes no real harm. A weather forecast simulation itself does not change the weather.
An emulation does something for real. You can work with an emulated computer like with a physical one and create documents with it. And a plane crash in a Flight Emulator would have an outcome, like people experiencing the real impact including possible physical harm.
Your confusion probably stems from the fact that "studying the simulation" and "accessing the emulation" are often quite the same thing.
You are not alone in your confusion. The film "The Matrix" speaks of a simulation. However, the Matrix is running an emulation, as it has a real impact on all members of the Matrix. In contrast, the training room has no real impact, so that is a simulation (of the Matrix).
Let's see some examples.
Simulated vs. Emulated Rain
Take a water hose in the garden and let it rain. What's the difference between simulation and emulation here?
When you are simulating rain, people still will blame you for getting wet. Your rain has some real impact on the world, but your simulation hasn't, as the simulation does not fool anybody in that it is real rain.
In contrast, when you are emulating rain, people would blame the weather. This is, your emulated rain really behaves like rain in reality.
This rain emulation hence distorts reality, making people believe in the wrong culprit.
It took me quite some time to understand that, so it isn't easy or obvious, which explains all the confusion.
Keep in mind that a simulation can have side effects: the weather forecast is based on simulations, which take quite some computing power and thus electrical energy, which has an environmental impact.
Hence in the example of "simulated rain", people getting wet is just a side effect and not part of the simulation. The same is true if you simulate a rainbow with this simulated rain. While the property of "how rainbows work" is part of the simulation, the simulation itself is not providing the rainbow; that just happens due to refraction of the sun in the side effect of the water drops.
Simulated vs. Emulated Computer
While you might think "a simulated computer can have an outcome", this is practically wrong reasoning. If you save files onto a simulated hard drive, these files cannot leave the simulated drive outside of the simulation. You can obtain the files by studying the simulated drive, but that is not part of the simulation itself.
In case the hard drive saves the data such that the data is actually usable outside of the simulation, you have an emulated hard drive within the simulation that is doing so.
So an emulation can be part of a simulation and vice versa.
Simulated vs. Emulated Filesystem
If you simulate a filesystem, you probably, for practicability, will choose to save the files onto your real filesystem as-is (perhaps with some additional meta-information). In that case the simulation seems to create real "value" outside of the simulation: Usable files!
But this is just by coincidence, because your simulated filesystem actually emulates a filesystem as well. You actually emulated the outside filesystem inside your simulation!
Simulated vs. Emulated TPM or HSM
A good example of the difference is when you think about security. A TPM is a specific device to keep its own keys secure (a source of identity), while an HSM is a general device to secure foreign keys (verifying identity).
Fun Fact: My fingers constantly type TMP instead of TPM.
If you simulate a TPM, this has a huge effect on security, because then you can observe the internal states of the TPM, which renders all the security void. Even though such a simulation can give you valuable hints for improving the design of a TPM itself, you won't want to expose precious data to the simulated TPM for real.
However, if you emulate a TPM, you will try to hide these internal states from the outside as well as you can. Such an emulated TPM can then possibly be used to really secure something else better than without it.
With a real TPM you cannot emulate the properties of a real HSM. All you can achieve is to simulate an HSM, but this will not have the security properties of a real HSM, so all data which is stored in this simulated HSM will not be protected (it will only be protected within the simulation itself).
In contrast, with a real HSM you can emulate a TPM with all the properties of a real TPM. For this the HSM needs to be constructed such that no information leaves the HSM which would not leave a TPM as well.
(Please note that I do not know anything about HSMs or TPMs in particular, so it might be that there are no HSMs out there which are able to provide emulated TPMs.)
Simulated vs. Emulated World
If our world is simulated, we are simulations too. Hence some spectator (let's call her God) can look at us and change the simulation at any time. Also, we cannot find out whether we are simulated or not. As I am pretty sure that I know that I am, I do not think I am simulated, because self-awareness looks like an effect with a real component to me, which contradicts simulation. This also means our world cannot be a simulation either, as a simulation could only affect me the way the world does if I were part of the simulation.
But our world could still be emulated (like in the film "The Matrix"), as all I have to "prove the world" is my state of mind and sensory input, which I cannot verify, as I cannot leave myself. If I am not part of the emulation, then there should be a chance to observe discontinuities (like in the film "The Matrix"), in case the emulation does not work flawlessly.
This changes when I am emulated too, like an OS running in an emulator. Then I cannot observe such errors, as my state can be reset from within the emulation (call it sleep) without observable discontinuity.
However, I rather think that the world is a holographic hallucination than something like an emulation. Because if it is emulated, then I am pwned by somebody (call him Rick) who is running the emulation for some purpose, while a hallucination is purely my own thing.
I stop here, because hallucinations lead us to something completely different.
This question is probably best answered by taking a look at historical practice.
In the past, I've seen gaming console emulators on PC for the PlayStation & SEGA.
Simulators are commonplace when referring to software that tries to mimic real life actions, such as driving or flying. Gran Turismo and Microsoft Flight Simulator spring to mind as classic examples of simulators.
As for the linguistic difference, emulation usually refers to the action of copying someone's (or something's) praiseworthy characteristics or behaviors. Emulation is distinct from imitation, in which a person is copied for the purpose of mockery.
The linguistic meaning of the verb 'simulation' is essentially to pretend or mimic someone or something.
In computer science, both a simulation and an emulation produce the same outputs, from the same inputs, that the original system does; however, an emulation also uses the same processes to achieve this and is made out of the same materials. A simulation uses different processes from the original system. Also worth noting is the term replication, which is the intermediate of the two - using the same processes but being made out of a different material.
So if I want to run my old Super Mario Bros game on my PC I use an SNES emulator, because it is using the same or similar computer code (processes) to run the game, and uses the same or similar materials (silicon chip).
However, if I want to fly a Boeing 747 jet on my PC I use a flight simulator because it uses completely different processes from the original (there are no actual wings, lift or aerodynamics involved!).
Here are the exact definitions taken from a computer science glossary:
A simulation is a model of a system that captures the functional connections between inputs and outputs of the system, but without necessarily being based on processes that are the same as, or similar to, those of the system itself.
A replication is a model of a system that captures the functional connections between inputs and outputs of the system and is based on processes that are the same as, or similar to, those of the system itself.
An emulation is a model of some system that captures the functional connections between inputs and outputs of the system, based on processes that are the same as, or similar to, those of that system, and that is built of the same materials as that system.
Reference: The Open University, M366 Glossary 1.1, 2007
Both are models of an object that you have some means of controlling inputs to and observing outputs from.
The key difference is that:
With an emulator, you want the output exactly match what the object you are emulating would produce.
With a simulator, you want certain properties of your output to be similar to what the object would produce.
Let me give an example -- suppose you want to do some system testing to see how adding a new sensor (like a thermometer) to a system would affect the system. You know that the thermometer sends a message 8 times a second containing its measurement.
Simulation -- if you do not have the thermometer yet, but you want to test that this message rate will not overload your system, you can simulate the sensor by attaching a unit that sends a random number 8 times a second. You can run any test that does not rely on the actual value the sensor sends.
Emulation -- suppose you have a very expensive thermometer that measures to 0.001 C, and you want to see if you can get by with a cheaper thermometer that only measures to the nearest 0.5 C. You can emulate the cheaper thermometer using an expensive thermometer and then rounding the reading to the nearest 0.5 C and running tests that rely on the temperature values.
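A toy sketch of that emulation idea (the numbers are invented): the precise sensor's reading is quantised to the cheaper device's resolution before the rest of the system sees it.

    import random

    def expensive_thermometer():
        # Stand-in for the 0.001 C sensor; here it just returns a random value.
        return round(random.uniform(19.0, 23.0), 3)

    def emulated_cheap_thermometer():
        # Emulate the 0.5 C device by rounding the precise reading to 0.5 steps.
        return round(expensive_thermometer() * 2) / 2

    for _ in range(8):
        print(emulated_cheap_thermometer())   # e.g. 21.0, 21.5, 20.0, ...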
Note that simulations can also be used for forecasting or predicting behavior. Finite element analysis simulations are used in many applications, including weather prediction and virtual wind tunnels.
The definitions of the terms:
emulation -- surpass or exactly match
simulate -- imitate in appearance or character
The definitions of the words describe the difference the best. A google search gives the following definitions of simulate and emulate:
simulate: imitate the appearance or character of.
emulate: match or surpass (a person or achievement), typically by imitation.
A simulation imitates a system. An emulation simulates a system so well that it could replace it or may even surpass it.
In computing, an emulation would be a drop in replacement for the system it is emulating. Often times it will even outperform the system it is imitating. For example, game console emulators usually make improvements such as greater hardware compatibility, better performance, and improved audio/video quality.
Simulations, on the other hand, are limited by them being models. They are a best attempt to mimic a system, but not replacements for it. There are hardware emulators because hardware can be imitated and it would be hard to tell the difference. There is no Farming Emulator because there is no emulation that could replace actual farming. We can only simulate a model of farming to gain insight on how to farm better.
A Virtual PC tries to emulate a computer from the point of view of a programmer BUT, at the same time, it simulates a computer from the point of view of an electrical engineer.
Simulator is something broader than emulator, and it seems like the duality of these terms is overthought in the posts above.
Emulator
People decided to use the new word emulation in the "computer world" when they started replacing some hardware parts of an existing system in a straightforward manner - imitating their behaviour and relying on the computational nature of the replacement to be sure not to break anything and to leave everything in an equivalent state. So we have emulated a piece of this! (and the whole still works as before)
Emulator is usually used in a narrow sense in the digital area, as a replacement and virtualization - presented in digital form as a piece of software - of something known and existing before (virtual chips, circuit boards, electronic devices). So when the world became more digital and brought the word emulator to the masses, the masses added uncertainty to it (or additional meanings).
Simulator
First of all, I saw many comments saying that emulators do or replace something real but simulators do not.
BUT a flight simulator is used for a real thing - it trains pilots, builds their skills and knowledge, and it replaces expensive real planes and saves a lot of money. And we cannot just call it a plane emulator, because we have an inner feeling that it is much more than that, so we call it a simulator :) A plane simulator could contain an emulated radar or transponder, that is true.
As for the counter-statements that simulators are used for analysis and study (and emulators for something real): that analysis and study is no less a real thing than an emulated GSM board (even more so in the informational age we live in). Analysis adds value to the business, cuts costs or points to profits no less than the replaced (emulated) hardware does.
A simulator is about modelling something that we can't obtain for some reason (cost, technology, physical impossibility). It usually simulates something new or intangible or complex or not properly known to us, like a market, weather, combustion, or a user. So here come the flight, black hole, and stock exchange simulations.
So finally:
Simulator is broader than Emulator
Simulator tends to imitate/model more global processes/things in general, with the ability to narrow the imitation down (e.g. a capacitor simulator with presets representing some known models)
Emulator tends to imitate certain hardware devices with a certain specification, known characteristics and properties (e.g. an SNES emulator, an Intel 8087 or a Roland TB-303)
As for the words' origin:
Both came from Latin and mean:
emulate is "to be equal" (which sounds more aggressive and straightforward - rivalry)
simulate is "to be similar" (which sounds more sly and tricky - imitation)
Emulator:
Consider a situation where you know only English and you are in China. In order to interact with a Chinese person you need a translator. Now, the role of the translator is to take input from you in English, convert it to Chinese, give that input to the Chinese person, get the response from the Chinese person, convert it to English and give the output to you in English. The translator and the Chinese person combined are the emulator: together they provide similar functionality as if you were communicating with an English-speaking person. So the hardware may be different but the functionality will be the same.
Simulator:
I can't give a better example than SPICE or a flight simulator. Both replace hardware component behavior with a software or mathematical model which behaves similarly to the hardware.
In the end it depends on the context that which solution better suits project needs.
Emulation is like abstraction: it shows what it can do. Example: a car driving emulation.
Simulation is like encapsulation: it shows how it does it. Example: the car engine's inner activity.
The simulator is necessarily a scale model.
Emulators pretend to be a 1:1 model.

How do I go about reverse engineering a UDP-based custom game protocol with nothing other than Wireshark?

How do I go about reverse engineering a UDP-based custom game protocol with nothing other than Wireshark? I can log a bunch of traffic, but then what? My goal is to write a dissector plugin for Wireshark that will eventually be able to decode the game commands. Does this seem feasible? What challenges might I face? Is it possible the commands are encrypted?
Yeah, it's feasible. But how practical it is will depend on the game in question. Compression will make your job harder, and encryption will make it impossible (at least through Wireshark - you can still get at the data in memory).
Probably the best way to go about this is to do it methodically - don't log 'a bunch of traffic' but instead perform a single action or command within the game and see what data is sent out to communicate that. Then you can look at the packet and try to spot anything of interest. Usually you won't learn much from that, so try another command and compare the new message with the first one. Which parts are in the same place? Which parts have moved? And which parts have changed entirely? Look especially for a value in a fixed position near the start of the packet that could be describing the message type. Generally speaking the start of the packet will be the generic stuff like the header and later parts of the packet will be the message-specifics. Consider that a UDP protocol often has its own hand-rolled ordering or reliability scheme and that you might find sequence numbers in there near the start.
Knowing your data types is handy. Integer values might be stored in big-endian or little-endian format, for example. And many games send data as floating point values, so be on the look-out for 2 or 3 floats in a row that might be describing a position or velocity.
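To make the diff-and-look-for-floats step concrete, here's a hedged Python sketch; it assumes you've exported two UDP payloads from Wireshark as raw files, and the file names are just placeholders:

    import struct

    def load(path):
        with open(path, "rb") as f:
            return f.read()

    a = load("capture_move_forward.bin")   # payload captured after one command
    b = load("capture_turn_left.bin")      # payload captured after another command

    # Which byte offsets differ between the two messages?
    print("differing offsets:", [i for i, (x, y) in enumerate(zip(a, b)) if x != y])

    # Scan for little-endian floats in a "sane" range - candidate positions or
    # velocities often show up this way.
    for offset in range(len(a) - 3):
        value = struct.unpack_from("<f", a, offset)[0]
        if value == value and 1e-3 < abs(value) < 1e6:   # skip NaN and junk
            print(f"offset {offset:3d}: {value:.3f}")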
Commercial games expect that people will try to hack the protocol as a means to cheat, so will generally use encryption and probably tamper-detection as well.
Stopping this type of activity is of great concern to game makers because it ruins the experience for the majority of players when a few players have super-tools. For games like online poker the consequences are even more severe.

Reliably extracting identity fields from scanned documents / images?

I have to pull two pre-printed (not hand-written) fields out of a paper form, such that it can be automatically routed after being scanned. The fields contain batch and item identifiers, like "GG-9192" or "EPN/245G".
I've tried the following software:
Tesseract-OCR
Cuneiform
Canon ImageRunner built-in OCR
Asprise OCR Java API (demo)
I've tried the following settings:
Scanning at resolutions of 300dpi and 600dpi
Tried different fonts, including OCR-A and OCR-B.
In all cases output was pretty much all over the place. I can kick back documents for which I can't properly extract the necessary information, but I'm thinking it's going to be at least half of them. I considered some sort of fuzzy logic based on known values in a database, but sometimes these identifiers can differ by a single character, like "123G" and "123C".
Is this a lost cause? Perhaps OCR just isn't mature enough to handle a requirement of this nature? What other techniques might you recommend? Barcodes?
Edit: the containing application is in Java, so any recommendations for which there are free or cheap Java-based APIs would help.
Edit 2: if anyone is interested... without any special tuning, Cuneiform for Linux and the Canon ImageRunner worked best, with Tesseract-OCR and the Asprise Java API producing the worst results... none of the four was acceptable for anything but standard document-search-grade OCR. I'm beginning to think that this isn't going to work out.
If you have control over the fields, why use a human-readable format in the first place? For scanning, it seems like a QR Code, or something similar would be best. It is marked for orientation, and has some built-in error correction.
http://en.wikipedia.org/wiki/QR_Code
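Generating the codes is the easy part; for instance with the Python qrcode package (just an illustration - since your application is in Java you would more likely reach for a Java library, but the idea is the same):

    import qrcode

    # Encode the batch/item identifier directly in the printed mark.
    img = qrcode.make("GG-9192")
    img.save("gg-9192.png")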
I started digging for products starting with Tomato's suggestion. I tried ABBYY and CVISION. Both have products that can automate OCR:
CVISION Maestro Recognition Server 4.0
ABBYY Recognition Server 2.0
In addition, ABBYY has SDKs for various platforms, and CVISION has an SDK that appears to work with at least VB/VC++.
I haven't tried either SDK yet, and am not sure it's necessary for my project. All I need is PDFs coming in that I can extract the text from. I did however try CVISION's server product and with the OCR on its most accurate settings, it worked really well. I haven't tried ABBYY's server product yet because I have to go through a reseller to get a trial. I'm in the process of doing so, but if it starts getting annoying I'm probably going to go with CVISION. I did try ABBYY's FineReader standalone product, and it worked very well, so I assume that their server product would also.