How can I analyze live data from webcam? - binary

I am going to be working on a self-chosen project for my college networking class, and I just had a couple of questions to help get me started in the right direction.
My project will involve creating a new "physical" link over which data, in the form of text, will be transmitted from one computer to another. This link will involve one computer with a webcam that reads a series of flashing colors (black/white) as binary and converts it to text. Each series of flashes will simulate a packet of data. I will be using OS X and the integrated webcam in a MacBook; the flashing computer will be either Windows or OS X.
So my questions are: which programming languages or APIs would be best for reading live webcam data and analyzing the color of a certain area, as well as for programming and timing the flashes? Also, would I need to worry about matching the flash rate of the "writing" computer to the frame capture rate of the "reading" computer?
Thank you for any help you might be able to provide.

Regarding the frame capture rate, the Shannon sampling theorem says that "perfect reconstruction of a signal is possible when the sampling frequency is greater than twice the maximum frequency of the signal being sampled". In other words, if your flashing light switches 10 times per second, you need a camera running at more than 20 fps to capture it properly. So basically: check your camera specs, divide the frame rate by 2, lower the result a little, and you have your maximum flashing rate.
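The rule of thumb above, plus one common way of turning per-frame readings back into bits (take the frame closest to the middle of each bit period), can be sketched as follows. This is a minimal sketch under stated assumptions: `max_flash_rate`, `sample_bits` and the 0.8 safety margin are illustrative names and values, and the readings are assumed to start exactly at the beginning of the first bit. A real link would need a sync preamble to find that starting point.

```python
def max_flash_rate(camera_fps, margin=0.8):
    """Highest flash (bit) rate to try: half the camera frame rate,
    lowered by a safety margin (0.8 is an arbitrary example value)."""
    return camera_fps / 2 * margin


def sample_bits(levels, camera_fps, bit_rate):
    """Recover bits from per-frame readings (True = white, False = black).

    Assumes `levels` starts exactly at the start of the first bit and that
    the transmitter holds each bit for 1 / bit_rate seconds. For every bit,
    the frame nearest the middle of the bit period is used, which keeps the
    sample away from the black/white transitions."""
    frames_per_bit = camera_fps / bit_rate
    bits = []
    k = 0
    while True:
        idx = int((k + 0.5) * frames_per_bit)  # frame nearest the middle of bit k
        if idx >= len(levels):
            break
        bits.append(1 if levels[idx] else 0)
        k += 1
    return bits
```

For a 30 fps webcam, `max_flash_rate(30)` suggests about 12 flashes per second; at 10 bits per second each bit then spans 3 frames, and `sample_bits` reads frames 1, 4, 7 and so on.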
Whatever can get the frames will work. If the lighting conditions in which the camera works are going to be stable, and the position of the light in the images is going to be static, then it will be very easy: just check the average pixel values of a certain area.
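Checking the average pixel value of a fixed area could look roughly like this. It is a sketch assuming the frame has already been converted to grayscale and is given as a list of rows of 0-255 values; `region_mean`, `frame_to_level` and the threshold of 128 are hypothetical choices, and in practice you would calibrate the threshold against your own lighting.

```python
def region_mean(frame, x, y, w, h):
    """Average brightness of the w x h region whose top-left corner is (x, y).
    `frame` is a list of rows, each a list of 0-255 grayscale values."""
    total = 0
    for row in frame[y:y + h]:
        total += sum(row[x:x + w])
    return total / (w * h)


def frame_to_level(frame, region, threshold=128):
    """True if the watched region is 'white', False if it is 'black'."""
    x, y, w, h = region
    return region_mean(frame, x, y, w, h) >= threshold
```

If you end up using OpenCV with NumPy, the same average is simply `gray[y:y + h, x:x + w].mean()` on a grayscale frame.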
If you need additional image processing, you should probably also look into OpenCV (it has bindings for many programming languages).

To answer your question about language choice, I would recommend Java. The Java Media Framework is great and easy to use; I have used it for capturing video from webcams in the past. Be warned, however, that everyone you ask will recommend a different language. Everyone has their preferences!
What are you using as the flashing device? What kind of distance are you trying to achieve? Something worth thinking about is how you are going to get the receiver to recognise where within the captured image to look for the flashes. Some kind of fiducial marker might be necessary. Longer ranges will make this problem harder to resolve.
If you're thinking about shorter ranges, have you considered using a two-dimensional transmitter? Given that you're using a two-dimensional receiver, it makes sense. You could, for example, have a transmitter that shows a sequence of QR codes (or similar encodings) on a monitor.
You will have to consider some kind of error-correction encoding, such as a Hamming code. While encoding increases the data footprint, it might give you better overall bandwidth, since you can crank the speed up much higher without having to worry about the odd corrupt bit.
Some 'evaluation' type material might include a discussion of the obvious security risks of using such a channel: anyone with line of sight to the transmitter can eavesdrop! You could suggest in your write-up using some kind of encryption. A block cipher in CBC mode would do, but it would require a key exchange prior to transmission, so you could think about public-key encryption.
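As an illustration of the Hamming code idea, here is a minimal Hamming(7,4) sketch: every 4 data bits become 7 transmitted flashes, and any single flipped bit in a block can be corrected at the receiver. The function names are illustrative; the bit layout (parity bits at positions 1, 2 and 4) is the standard textbook arrangement.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as the 7-bit codeword
    [p1, p2, d1, p3, d2, d3, d4] (parity at positions 1, 2 and 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

For example, encoding `[1, 0, 1, 1]`, flipping any one of the 7 bits in transit and decoding still yields `[1, 0, 1, 1]`. Two errors in the same block are not corrected, which is why the flash rate still needs some headroom.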

Chess Engine - Confusion in engine types - Flash as3

I am not sure whether this kind of question has been asked and answered before; as far as my search is concerned, I haven't found an answer yet.
First let me tell you my scenario.
I want to develop a chess game in Flash AS3. I have developed the interface. I have coded the movement of the pieces and their movement rules. (Please note: only movement rules so far, not capture rules.)
Now the problem is that I need to implement the AI for a one-player game. I am feeling helpless, because although I know every rule of chess, applying AI is not simple at all.
And my biggest confusion is this: all of my searches point me to chess engines, but I always get confused by two types of engines. One is for the front end, and the other is the real engine. But none of them specifies (or I might not get it) which one is for what.
I need some kind of API that can search for the right pieces to move and make moves according to a difficulty level. Is there anything like that?
Please note: I want an open source and something to be used in Flash.
Thanks.
First of all, http://nanochess.110mb.com/archive/toledo_javascript_chess_3.html is the original project, which implements a relatively simple AI (I think it's only 2 steps deep) in JavaScript. Since that was a contest project for minimal code, it is somewhat "obfuscated" by hand-made reduction of the source code. Here someone has tried to restore the same code to a more or less readable source: https://github.com/bormand/nanochess .
I think it might be a little too difficult to write it, given you have no background in AI. A good engine needs to calculate more than two steps ahead, but just to give you some numbers: with all pieces on the board, the number of possible moves per step would be approximately 140 at most. The second step would then be every combination of these moves with all possible replies of the opponent (140 * 140), and the third step again that many combinations, i.e. 140 * 140 * 140, which is about 2.7 million positions. This means you would need a very good technique to discard the bad moves and only try to predict good moves.
As of today, there isn't a known deterministic winning strategy for chess (in other words, it hasn't been solved by computers, unlike some other board games), which means it is a fairly complex game; but an AI that could play at a hobbyist level isn't all that difficult to come up with.
A recommended further reading: http://aima.cs.berkeley.edu/
A chess program these days comes in two parts:
The User Interface, which provides the chess board, moves view, clocks, etc.
The Chess Engine, which provides the ability to play the game of chess.
These two programs communicate using a simple text protocol (UCI or XBoard): the UI program runs the chess engine as a child process and talks to it over pipes.
This has several significant advantages:
You only need one UI program which can use any compliant chess engine.
Time to develop the chess engine is reduced as only a simple interface need be provided.
It also means that the developers get to do the stuff they are good at and don't necessarily have to be part of a team in order to get the other bit finished. Note that there are many more chess engines than chess UIs available today.
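To make the UCI text protocol mentioned above more concrete, here is a minimal sketch of the messages a UI writes to an engine and how it could pick the engine's answer out of the reply. Only the message formatting is shown; `uci_session` and `parse_bestmove` are hypothetical helper names, and a real UI would also wait for the engine's `uciok` and `readyok` replies before sending the next command.

```python
def uci_session(moves, depth=10):
    """Commands a UI would write to a UCI engine's stdin to ask for a move
    after the given moves (long algebraic notation, e.g. 'e2e4')."""
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    return ["uci", "isready", position, f"go depth {depth}"]


def parse_bestmove(line):
    """Return the move from an engine line such as 'bestmove e2e4 ponder e7e5',
    or None for any other line (e.g. 'info ...' progress lines)."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None
```

After 1. e4 e5, the UI would send `position startpos moves e2e4 e7e5` followed by `go depth 10`, then read lines until one starts with `bestmove`.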
You are coming to the problem with several disadvantages:
As you are using Flash, you cannot use this two-program approach (AFAIK Flash cannot use fork(), exec() or posix_spawn()). You will therefore need to provide the whole solution yourself, and you should at least attempt to make it multi-threaded so the engine can work while the user is interacting with the UI.
You are using a language which is very slow compared to C++, which is what engines are generally developed in.
You have access to limited system resources, especially memory. You might be able to override this with some setting of the Flash runtime.
If you want your program to actually play chess then you need to solve the following problems:
Move Generator: Generates all legal moves in a position. Some engine implementations don't worry about the "legal" part and prune illegal moves later. However, you still need to detect check, mate and stalemate conditions at some point.
Position Evaluation: Provide a score for a given position. If you cannot determine if one position is better for one side than another then you have no way of finding winning moves.
Move Tree and pruning: You need to store the move sequences you are evaluating and a way to prune (ignore) branches that don't interest you (normally because you have determined that they are weak). A chess move tree is vast given every possible reply to every possible move and pruning the tree is the way to manage this.
Transposition table: There are many transpositions in chess (a position reached by moving the pieces in a different order). One method of avoiding re-evaluation of a position you have already evaluated is to store the position's score in a transposition table. In order to do that you need to come up with a hash key for the position, which is normally implemented using Zobrist hashing.
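The four components listed above can be sketched in miniature. These are toy, independent pieces under stated assumptions, not a working engine: move generation covers knights only and ignores checks, evaluation counts material only (the piece values are a common convention, chosen here as an assumption), the search runs over a ready-made tree of scores, and the Zobrist keys cover piece placement only. All function names are illustrative.

```python
import math
import random

# --- Move generation (pseudo-legal, knights only); squares 0 = a1 ... 63 = h8
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_moves(square, own_squares):
    """Target squares for a knight, skipping squares holding our own pieces.
    Whether the move leaves our king in check is NOT tested here."""
    file, rank = square % 8, square // 8
    targets = []
    for df, dr in KNIGHT_STEPS:
        f, r = file + df, rank + dr
        if 0 <= f < 8 and 0 <= r < 8 and r * 8 + f not in own_squares:
            targets.append(r * 8 + f)
    return sorted(targets)


# --- Position evaluation (material only, centipawns, White's point of view)
PIECE_VALUES = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900, "K": 0}


def evaluate(board):
    """`board` maps square -> piece letter; uppercase = White, lowercase = Black."""
    score = 0
    for piece in board.values():
        value = PIECE_VALUES[piece.upper()]
        score += value if piece.isupper() else -value
    return score


# --- Move tree search with alpha-beta pruning
def alphabeta(node, alpha, beta, maximizing, visited):
    """Minimax value of a tree of nested lists whose leaves are scores from the
    maximizing side's point of view. Each evaluated leaf is appended to `visited`."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the other side will never allow this line: prune the rest
    return best


# --- Zobrist hashing for a transposition table
def make_zobrist_table(seed=12345):
    """One random 64-bit key per (piece, square). A full engine also needs keys
    for side to move, castling rights and en passant; they are omitted here."""
    rng = random.Random(seed)
    return {(p, sq): rng.getrandbits(64) for p in "PNBRQKpnbrqk" for sq in range(64)}


def zobrist_hash(board, table):
    """XOR together the keys of every piece on its square."""
    h = 0
    for sq, piece in board.items():
        h ^= table[(piece, sq)]
    return h


def update_hash(h, table, piece, from_sq, to_sq):
    """Incremental update for a non-capturing move: XOR the piece out of its
    old square and into its new one, instead of rehashing the whole board."""
    return h ^ table[(piece, from_sq)] ^ table[(piece, to_sq)]
```

On the tree `[[3, 5], [6, 9], [1, 2]]` the search returns 6 without ever looking at the last leaf, which is the pruning effect in small. A transposition table can then be as simple as a dict keyed by `zobrist_hash(board, table)` storing the score already computed for that position.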
The best sites to get more detailed information (I am not a chess engine author) would be:
TalkChess Forum
Chess Programming Wiki
Good luck and please keep us posted of your progress!

Manually feeding x264 with my own motion data?

I am trying to encode a stream using x264 (by feeding individual images), but what's unusual is that I already have some motion information for my frames. I know exactly which areas have been modified in each frame, and I know where motion has occurred in the frame.
Is there a way to feed x264 my own motion information? I'd like to give it motion vectors for given areas in the frame, and somehow tell it that certain areas in the frame are guaranteed to not have had any motion in them.
I think this might significantly improve the performance of the encoding (because I'm allowing the codec to completely skip the motion estimation phase), and should also somewhat increase quality in cases where the encoder's motion estimation algos might have missed the motion that actually occurred.
Do I need to modify the encoder in order to do this, or is this supported in the existing API?
Short answer: No you can't feed in your motion estimation data to x264.
Long answer: IIRC, x264 does its work by being fed the raw frames, with no extra data. To accommodate the motion estimation data you have, you'd have to modify the x264 source code.
You may be able to find what you need within common\mvpred.c or encoder\me.c. I'm not sure how many of the x264 developers actually visit Stack Overflow (I know one of their lead developers has an account here), but you can try talking to them through their usual channels: their IRC channels or the doom9 forums.
doom9: http://forum.doom9.org/forumdisplay.php?f=77
doom10: http://doom10.org/index.php?board=5.0
IRC: irc://irc.freenode.net/x264 and irc://irc.freenode.net/x264dev
Mailing list: http://mailman.videolan.org/listinfo/x264-devel
I wish I could give you more information, but unfortunately I'm not particularly well versed in the code base. The developers are always willing and able to help anyone wishing to work on x264 though.

Solar system computer model

I'm interested in building a 3D model of our solar system for web use (probably with AS3 and Papervision) and have been looking into how I would go about encoding the planetary positions. My idea was to download the already calculated positions from NASA, as calculating the positions myself seems a bit overcomplicated. I'm not sure, though, whether I should use a heliocentric or an Earth-centric encoding.
I wanted to know if anyone has experience with this. Which approach would be better? The NASA JPL website seems to have the positions of all the major bodies in our solar system as Earth-centric. I can see this becoming a problem later on, though, when adding the Voyager and Mars lander missions to the model.
Any feedback, comments and links are very welcome.
EDIT: I have a rough model running that uses heliocentric coordinates, but I haven't been able to find the coordinates for all planets in this format.
UPDATE:
I don't have a lot of detail to provide for now because I really don't know what I'm doing (from the space point of view). I wanted to get a handle on 3D programming, and am interested in space. The idea was that I would make a rough solar system simulator with, at first, all the planets and their orbiters (maybe excluding satellites at first). Perhaps include a news aggregator and some links to news/resources and so on. The general idea would be to allow people to click around and get super excited about going to the Moon and Mars (for a starter).
In the long run I would hopefully be able to add satellites and the Moon missions (scroll back in time to the 1970s and see the Moon missions).
So to answer Arrieta's question the idea was not to calculate eclipses but to build an easy to approach, interactive space exploratorium, and learn some 3D and space related stuff on the way.
Glad you want to build your own simulator, but depending on what you want to do it may be far from an easy task. The simplest approach is as follows:
Download the JPL-DE405 ephemerides and the subroutines for retrieving the planetary positions (wrt Solar System Barycenter).
Request a timespan, compute the positions, and display them on the screen in a visually appealing manner
Done
Now, why would you want to do this? If you want to view the planets' orbits, that's it. You are done. If you want to compute geometric events (like eclipses, line-of-sight, or illumination), then you are in a whole different ball game. That's astronautics, and it is not simple.
Please be more specific. The distinction you make of "geocentric" or "heliocentric" coordinates really has no major difficulty involved. If you have all the states in heliocentric frame, you can compute the geocentric frame by simple vector subtraction. That's not the problem! The problems are a thousand more, but you need to be specific so we can provide more guidance.
JPL has provided high quality ephemerides for decades now, and we have a full team of brilliant people working on it. It is one of the most difficult things to get right!
Again, provide more details or check out other sources of information.
Please google "Solar System Simulator" (done here, at JPL) and see if it fulfills your needs.
Cheers.
It may be worth you checking out the ASCOM Platform (we also have a stack exchange site called ASCOM Answers).
The ASCOM Platform has several useful libraries for doing this sort of thing.
USNO NOVAS (Naval Observatory Vector Astrometry)
Kepler orbit engine
The USNO/NOVAS stuff was originally written in C and we've wrapped it up in .NET for ease of use from C# and VB.
As an added bonus (actually it's the raison d’être for ASCOM), the Platform makes it easy for you to control things like telescopes; it's used by Microsoft's World Wide Telescope for exactly that purpose. It might be a fun extension to your model to be able to point a telescope at things.
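To give a feel for what a Kepler orbit engine computes, here is a minimal sketch for a single body: solve Kepler's equation M = E - e sin E for the eccentric anomaly E by Newton's method, then place the body in its orbital plane. This is not the ASCOM Platform's API; the function names are illustrative, and rotating the result into ecliptic coordinates (using inclination, node and argument of perihelion) is left out.

```python
import math


def solve_kepler(mean_anomaly, eccentricity, tol=1e-12, max_iter=50):
    """Eccentric anomaly E (radians) satisfying M = E - e*sin(E), for 0 <= e < 1."""
    E = mean_anomaly if eccentricity < 0.8 else math.pi  # usual starting guess
    for _ in range(max_iter):
        f = E - eccentricity * math.sin(E) - mean_anomaly
        dE = f / (1 - eccentricity * math.cos(E))  # Newton step: f / f'
        E -= dE
        if abs(dE) < tol:
            break
    return E


def orbital_plane_position(a, e, mean_anomaly):
    """(x, y) in the orbital plane with the Sun at the focus and x pointing to
    perihelion. `a` is the semi-major axis; the output uses the same unit."""
    E = solve_kepler(mean_anomaly, e)
    x = a * (math.cos(E) - e)
    y = a * math.sqrt(1 - e * e) * math.sin(E)
    return x, y
```

At mean anomaly 0 the body sits at perihelion, a distance a(1 - e) from the Sun; advancing the mean anomaly linearly with time and recomputing gives a simple animated orbit.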
I'd probably start (well, I did a while back) with heliocentric coordinates and get a few of the planets up and running. But sooner or later you'll want to write a heliocentric-to-geocentric coordinate conversion routine, and its inverse. For some bodies, such as artificial satellites, the geocentric coordinates will be easier to deal with.
You can use the astro-phys API to get a JSON-formatted state vector for all the planets. It calculates them using JPL's DE406, so it's pretty accurate, and it uses the solar system barycenter.
Alternatively, if you are working in a geocentric model and know where the Sun is relative to the Earth, you can subtract the Sun's position from every body (including the Earth) to get heliocentric coordinates.
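A conversion routine and its inverse can be as small as a vector subtraction, as sketched below. This is a purely geometric translation under stated assumptions: positions are (x, y, z) tuples in the same axes and units (say AU), and effects that precise astronomy cares about, such as light-time and aberration, are ignored. The function names are illustrative.

```python
def helio_to_geo(body_helio, earth_helio):
    """Geocentric position = body's heliocentric position - Earth's heliocentric position."""
    return tuple(b - e for b, e in zip(body_helio, earth_helio))


def geo_to_helio(body_geo, sun_geo):
    """Heliocentric position = body's geocentric position - Sun's geocentric position."""
    return tuple(b - s for b, s in zip(body_geo, sun_geo))
```

Since the Sun's geocentric position is just `helio_to_geo((0, 0, 0), earth_helio)`, converting a body to geocentric and back returns the original heliocentric coordinates.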

Usability: speech recognition versus keypad

We are seeing more and more speech recognition implemented, and more requests for libraries that do good speech recognition. What's the rationale (in terms of usability) behind it versus a keyboard or keypad? What reasons would you have to invest in this development?
For example, let's take the call centers. A few years ago, almost every call center used an IVR that prompted for a key for the menus. Now, we're seeing more and more menus with prompt for a spoken keyword and/or a pressed keypad: "please say invoice or press 1 to see your invoice". Or we are seeing the same thing in companies' phone directory: "please say the name of the person you are trying to reach" ... "Franck Loyd" ... "Did you say Jack Freud? Please say yes if you want to reach this person or say no to try again".
I guess it's a plus when you're in your car without holding your phone, but is it worth the additional waiting time? Longer interactions for all the choices, longer prompt times while the system tries to determine whether something was said, and so on? Also, reliability is better than it was, definitely, but sometimes it feels more like a toy someone decided to plug into the system so it can feel futuristic.
Any experience designing IVR or software that used (or chose not to) speech recognition?
Thanks!
What's the rationale (in terms of usability) behind it versus a keyboard or keypad?
Usability is a very broad term. If I were to attempt to enter my address with a touch pad, it wouldn't be considered very usable. Some argue that using a speech engine with an overall success rate of 70-80% isn't very usable either. As indicated in other posts, hands free input can be much easier for those on a mobile phone. However, using words versus numeric input can actually be less intuitive than a touch tone phone if the topic is somewhat foreign to the caller. A caller hearing terms and phrases that aren't very familiar can't remember them in the 10-30 seconds of the prompt but they can hover over the best sounding choice with their finger or remember the order of choices.
What reasons would you have to invest in this development?
This is an odd question. Usually the decision to use speech or not in an IVR environment is not driven from the development view of the world. Unless you have a specific requirement that really requires speech, you are almost always reducing overall success rates. Speech is usually a factor of corporate image ... or having the latest technological toy.
I guess it's a plus when you're in your car without holding your phone but is it worth the additional waiting time?
Speech recognition latencies aren't very high these days when using modern ASRs. In most cases, input is handled in parallel with speech, and the time between the end of speech and the recognition result is 0.5 to 1 s. Be aware that many IVRs then need to perform data look-ups after some inputs, and this can make the system appear slower. Normal inputs pushing beyond 1 s are usually the sign of an under-powered deployment.
It may not have been under-powered when originally implemented, but through tuning efforts you make a lot of performance-versus-accuracy decisions. To get that next 0.1%, resources can be pushed beyond what they should be at peak.
Also, reliability is better than it was, definitely, but sometimes it feels more like a toy someone decided to plug into the system so it can feel futuristic.
In general, yes. On the reliability note, you really need to look at the overall numbers to get a sense of the system. It is a battle of statistics where the individual isn't very important (unless they hold the title of VP or above). Through optimization of the input (shifting prompting), resource usage and other speech recognition tuning parameters, you attempt to maximize accuracy. For basic natural-language responses, you can get into the upper 90s. However, your overall success rate is much lower. Imagine 5 prompts, all at 98% (in reality, you tend to have a bunch at 99% and then a few in the mid 90s or slightly below): 0.98 * 0.98 * 0.98 * 0.98 * 0.98 ≈ 90%. That means 1 out of 10 calls failing, and that is before caller confusion and business rules. DTMF input is usually very near 100%, even after several inputs.
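The compounding in the paragraph above is easy to reproduce, assuming each prompt succeeds independently of the others (a simplification: in practice a caller who struggles with one prompt is more likely to struggle with the next). `overall_success` is an illustrative name.

```python
import math


def overall_success(prompt_rates):
    """Probability that a call gets through every prompt, assuming each
    prompt succeeds independently with the given rate."""
    return math.prod(prompt_rates)
```

Five prompts at 98% give `overall_success([0.98] * 5)` of about 0.904, and the more realistic mix of three prompts at 99%, one at 95% and one at 94% already drops to roughly 0.87.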
Any experience designing IVR or software that used (or chose not to) speech recognition?
Yes. But, I suspect that really isn't the question you want. As someone on the technology side, this is usually not your decision and you have limited influence on it. If you are really looking for the pros/cons of speech:
Pros:
Cool/hip (note, speech alone isn't sufficient. You need a great VUI and voice talent)
Good for a highly mobile crowd that shuns earpieces. The future is supposed to be blending speech with tactile input. Maybe. It probably won't come from the IVR side of the market.
Good for tasks that can't be done with DTMF. Note, many of these problems tend to have low success rates in speech as well. Cost (versus humans) is usually the driving factor not usability. Dropping a call into a voicemail box for things like address change can be very cost effective.
Cons:
Expensive to develop, deploy and maintain. Adding new choices can have a significant impact on success rates if you aren't careful. Always monitor the impact of change.
Is often deployed inappropriately. For example: "just say your numeric menu choice". This is very often a case of "we want speech coolness", without being able to afford what it really takes to achieve speech coolness.
Success rates will be lower and therefore call center costs will be higher.
Failures tend to focus on specific prompts and individual callers. A caller that regularly experiences problems with your system will be very unhappy with you.
Callers get angry when they aren't understood. Is your goal to identify a subset of your customer base and really get them angry?
I think that speech recognition, like any method of input, has its pros and cons.
Pros
No learning curve, we have been speaking since a very young age.
Very user-intuitive.
On the phone, no need to constantly move the headset from your ear.
Cons
Longer wait time
If bad sound quality, takes multiple attempts to get the selection right.
In some cases a company is required to handle rotary phones. It might be more cost-effective to just set up the recognition system instead of both.
Voice recognition has a lot more overhead than touch tones. If you want the best results you need to constantly tweak the app and train the system on unrecognized word pronunciations. You also need to be very particular on how you prompt the user with voice recognition or you may get unexpected responses.
Overall touch tone is a lot easier as there are only a limited set of possible options at any given time.
If your app is straightforward enough, voice recognition may only complicate it. Press 2 for some other language...
Speech recognition is definitely the wave of the future when combined with touchscreen technology. For example, I use tazti speech recognition. It's available in XP and Vista versions. Since Microsoft's touchscreen "Surface" platform runs on Vista, I'm sure tazti will work with the touchscreen technology. When I tried tazti speech recognition, the built-in commands worked great. It also lets me create my own speech commands, and those work great too. Voice searching Google, Yahoo, Wikipedia, YouTube and many other search engines works great. It has many other features as well, but it doesn't have dictation. I found that I eliminated 70% or more of my internet-generated clicks... maybe more. NOTE: tazti is a free download from their website.

How would you go about reverse engineering a set of binary data pulled from a device?

A friend of mine brought up this question the other day. He recently bought a Garmin heart rate monitor, which keeps track of his heart rate and allows him to upload his heart rate stats for a day to his computer.
The only problem is that there are no Linux drivers for the Garmin USB device. He's managed to interpret some of the data, such as the model number and his user details, and has identified what are essentially binary data tables, which we assume represent a series of recordings of his heart rate and the time each recording was taken.
Where does one start when reverse engineering data when you know nothing about the structure?
I had the same problem and initially found this project on Google Code that aims to complete a cross-platform version of tools for Garmin devices; see http://code.google.com/p/garmintools/. There's a link on the front page of that project to the protocols you need, which Garmin was thoughtful enough to release publicly.
And here's a direct link to the Garmin I/O specification: http://www.garmin.com/support/pdf/IOSDK.zip
I'd start looking at the data in a hexadecimal editor, hopefully a good one which knows the most common encodings (ASCII, Unicode, etc.) and then try to make sense of it out of the data you know it has stored.
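If no suitable hex editor is at hand, a classic offset / hex / ASCII dump takes only a few lines. This sketch shows printable ASCII only (non-printable bytes become dots); `hexdump` is an illustrative name.

```python
def hexdump(data, width=16):
    """Return dump lines: offset, hex bytes, then printable ASCII ('.' otherwise)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines
```

Running `print("\n".join(hexdump(raw_bytes)))` on a captured transfer makes strings such as the model number stand out, and changing the `width` can make fixed-size records line up in columns.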
As another poster mentioned, reverse engineering can be hairy, not in practice but in legality.
That being said, you may be able to find everything related to your root question by checking out this project and its code... and they do handle the runner's heart rate/GPS combo data as well:
http://www.gpsbabel.org/
I'd suggest you start with checking the legality of reverse engineering in your country of origin. Most countries have very strict laws about what is allowed and what isn't regarding reverse engineering devices and code.
I would start by seeing what data is being sent by the device, then consider how such data could be represented and packed.
I would first capture many samples and see if any pattern presents itself: a heartbeat is regular, so a regular pattern would suggest a measurement related to the heart itself. I would also look for bit fields which are monotonically increasing, as that would suggest some sort of time stamp.
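The search for monotonically increasing fields can be automated. The sketch below assumes the dump has already been split into equal-length records (at least two of them) and that the candidate field is a little-endian unsigned 32-bit integer; both are guesses you would vary (for example `fmt="<H"` or `">I"`). `monotonic_offsets` is a hypothetical helper name.

```python
import struct


def monotonic_offsets(records, fmt="<I"):
    """Byte offsets where the field `fmt` strictly increases from each record
    to the next. Needs at least two records to mean anything."""
    size = struct.calcsize(fmt)
    length = min(len(r) for r in records)
    hits = []
    for off in range(length - size + 1):
        values = [struct.unpack_from(fmt, r, off)[0] for r in records]
        if all(a < b for a, b in zip(values, values[1:])):
            hits.append(off)
    return hits
```

An offset that shows up consistently across several captures is a good time stamp candidate; plotting the values at that offset should then give a straight or steadily rising line.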
Having formed a hypothesis for what is where, I would write a program to test it and graph the results and see if it makes sense. If it does but not quite, then closer inspection would probably reveal you need some scaling factors here or there. It is also entirely possible I need to process the data first before it looks anything like what their program is showing, i.e. might need to integrate the data points. If I get garbage, then it is back to the drawing board :-)
I would also check the manufacturer's website, or maybe run strings on their binaries. Finding someone who works in the field of biomedical engineering would also be on my list, as they would probably know what protocols are typically used, if any. I would also look for these protocols and see if any could be applied to the data I am seeing.
I'd start by creating a hex dump of the data. Figure it's probably blocked in some power-of-two-sized chunks. Start looking for repeating patterns. Think about what kind of data they're probably sending. Either they're recording each heart beat individually, or they're recording whatever the sensor is sending at fixed intervals. If it's individual beats, then there's going to be a time delta (since the last beat), a duration, and a max or avg strength of some sort. If it's fixed intervals, then it'll probably be a simple vector of readings. There'll probably be a preamble of some sort, with a start timestamp and the sampling rate. You can try decoding the timestamp yourself, or you might try simply feeding it to ctime() and see if they're using standard absolute time format.
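Trying a candidate field as a standard absolute time, which is what `ctime()` would do with seconds since the Unix epoch, can be sketched in Python as below. The device may well count from a different epoch or in different units, so the epoch is a parameter; `decode_timestamp`, `looks_plausible` and the year-2000 lower bound are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_timestamp(seconds, epoch=UNIX_EPOCH):
    """Interpret `seconds` as a count of seconds since `epoch` (UTC)."""
    return epoch + timedelta(seconds=seconds)


def looks_plausible(seconds, epoch=UNIX_EPOCH,
                    earliest=datetime(2000, 1, 1, tzinfo=timezone.utc)):
    """True if the decoded time falls between `earliest` and now, i.e. it could
    plausibly be a recording time on a recent device."""
    when = decode_timestamp(seconds, epoch)
    return earliest <= when <= datetime.now(timezone.utc)
```

A 4-byte candidate from the dump can be turned into an integer with `int.from_bytes(raw, "little")` (or `"big"`) before testing it; if it decodes to the wrong decade, try another epoch.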
Keep in mind that lots of cheap A/D converters only produce 12-bit outputs, so your readings are unlikely to be larger than 16 bits (and the high-order 4 bits may be used for flags). I'd recommend resetting the device so that it's "blank", dumping and storing the contents, then take a set of readings, record the results (whatever the device normally reports), then dump the contents again and try to correlate the recorded results with whatever data appeared after the "blank" dump.
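Both ideas from the paragraph above, splitting a 16-bit word into a 12-bit reading plus 4 flag bits and comparing a "blank" dump with a dump taken after recording, can be sketched as follows. The bit layout (reading in the low 12 bits, flags in the high 4) is only a hypothesis to test, and the function names are illustrative.

```python
def split_reading(word):
    """Split a 16-bit word into (12-bit reading, 4-bit flags), assuming the
    reading is in the low 12 bits and any flags are in the high 4 bits."""
    return word & 0x0FFF, word >> 12


def changed_ranges(before, after):
    """(start, end) byte ranges, end exclusive, where two dumps differ.
    Bytes present in only one of the dumps count as changed."""
    n = max(len(before), len(after))
    ranges = []
    start = None
    for i in range(n):
        a = before[i] if i < len(before) else None
        b = after[i] if i < len(after) else None
        if a != b:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, n))
    return ranges
```

The ranges returned by `changed_ranges(blank_dump, recorded_dump)` are where the new readings should live, which narrows down where to apply `split_reading` and compare against the values the device reports.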
Unsure if this is what you're looking for, but Garmin has created an API that runs in your browser. It seems OS X is supported, as well as Windows browsers... I would try it from Google Chromium to see if it can be used instead of this reverse engineering...
http://developer.garmin.com/web-device/garmin-communicator-plugin/
API Features
Auto-detection of devices connected to a computer
Access to device product information like product name and software version
Read tracks, routes and waypoints from supported recreational, fitness and navigation devices
Write tracks, routes and waypoints to supported recreational, fitness and navigation devices
Read fitness data from supported fitness devices
Geo-code address and save to a device as a waypoint or favorite
Read and write Garmin XML files (GPX and TCX) as well as binary files.
Support for most Garmin devices (USB, USB mass-storage, most serial devices)
Support for Internet Explorer, Firefox and Chrome on Microsoft Windows.
Support for Safari, Firefox and Chrome on Mac OS X.
Can you synthesize a heart beat using something like a computer speaker? (I have no idea how such devices actually work). Watch how the binary results change based on different inputs.
Ripping apart the device and checking out what's inside would probably help too.