What are the benefits of Gray code in evolutionary computation? - binary

Books and tutorials on genetic algorithms explain that encoding an integer in a binary genome using Gray code is often better than using standard base 2. The reason given is that a change of +1 or -1 in the encoded integer, requires only one bit flip for any number. In other words, neighboring integers are also neighboring in Gray code, and the optimization problem in Gray encoding has at most as many local optima as the original numeric problem.
Are there other benefits to using Gray code, compared to standard base 2?

Gray Encoding is used to avoid the occurrences of Hamming Walls. As explained in this paper, Section 3.5.
Basically A Hamming wall is a point at which it becomes rare or highly unlikely that the GA will mutate in exactly the right way to produce the next step in fitness.
Due to the properties of Gray Coding, this is much less likely to happen.

Related

Is there a trick to creating an animated gif of tv static that will allow it to be relatively small?

Apologies in advance, but this isn't really a photoshop question. Rather, I'm trying to come up with something that is convincing but exploits the compression and features of the gif format as best as possible to produce the smallest possible file for the animation.
Some constraints:
It needs to be at least 20 or 30 frames. I've tried with fewer (and since they're largely uncompressable 15 frames is half the size of 30, generally speaking)
Size needs to be no less than about 256x192
It doesn't need to be color though, nor even full grayscale. I've seen convincing stills with as few as about 16 grays
It can have a pattern, but not one that is instantly obvious to the human eye. If someone takes a single frame and after a minute or two can spot the pattern (which makes it compressable?) that's ok
Frames 2 through n can use quite a bit of alpha, but when I started using big horizontal stripes of alpha, it was instantly noticeable to my eyes. So you don't get to rack up a bunch of RLE with the easy cheat.
All of the above and still needs to look good at 30-33ms frame speed. No variable speed or relying on anything significantly faster than that.
Also acceptable: an apng that complies with the above constraints. Possibly even mpeg, if you can come up with that (I'm ignorant of how the DCT does its magic).
Ideally I could get something down in the 250kbyte range, but I'd settle for anything significantly smaller than the 9 meg monstrosity I cooked up last week.
Oh, and one last thing: obviously I don't expect anyone to supply the graphic for me. I'm just looking for some trick(s) that will let me get there myself eventually.
This is a very interesting question.
Static (random noise) by its nature is actually highly incompressible. Information theory says that true noise is basically incompressible, and the more patterns something contains the more compressible it becomes (to the point of a solid line of 1's or 0's being perfectly compressible.
The ideal would be to create a true noise generator (just random numbers), but that doesn't help within the constraints of your problem.
The best thing I can think of is storing a number of small tiles of static and displaying them in staggered fashion to prevent the eye catching on to any patterns. Aside from that, you won't have much luck compressing this beyond 256 x 192 x 20 / 2 or about 500 kilobytes ( assuming 20 frames with resolution of 256 x 192, using 4 bit color depth ).
Simply encoding your animated gif in 16 color mode should get you to that point.
Well old but still unanswered answer (not checked anyway)
so create the NoSignal image data
If it is not obvious how read this:
NoSignal in asm and C++
encode into gif
Had played with it a bit so I used resolution 320x240, the lowest bit resolution usable is 3 bit per pixel. Lower does not look good. Single global palette only (obvious) here 300KB example
[Notes]
if this is just for some app then generate the image on the run it is really just few lines of code see that linked answer in bullet #1
Yes, you can achieve that with a lossy GIF compression, or rather a specifically rigged compressor that outputs noisy LZW stream.
A best-case scenario for LZW compression is to output X pixels, then X+1 pixels, then X+2 pixels, etc. It's easy to make that noisy.
Try screwing up the gfc_lookup function to (almost) always return longest dictionary item and compress series of noisy frames with it:
https://github.com/pornel/giflossy/blob/master/src/gifwrite.c#L270
Not easily normally. Good randomness (high entropy) by definition does not compress well. Having it greyscale may help, but not much.
If you want to do this on a web page and you have (some) control, you can always write a very small bit of JS to help... if you can do this, then you can do the following:
Create a gif about 1.5x the size you need with high-entropy static.
Set the clipping to the size you want.
Then you randomly move it around by changing the starting offset.
As long as your offsets are a decent distance away from one another (and don't repeat patterns) it is usually difficult to discern it as movement, and it looks truly like static.
I did this trick about 20 years ago on an Amiga to emulate static on a limited-memory demo, and it worked remarkably well... it also does not require fast low-level code as all was done by changing offsets and the co-processor bitblit-ed the rest.

html5 svg vs canvas for granite like background

i want to make to make a granite like background like http://www.tivli.com/ with a gradient at the center. i have found how to do gradient with both in the w3c tutorials, but are there any tutorials on how to make granite backgrounds in html5 canvas or svg? Thanks.
The site you referenced actually uses a simple 'noize.png' and then uses css3 radial gradients to buildup that background. I know you already knew that, I'm mentioning this for future readers.
Given this fact, I'll assume in the rest of my answer you want to learn, not a copy-pasta solution.
I've given up on svg looong time ago. But in canvas it's easy and fun... (especially now flash is FINALLY officially dead. Hurray).
So as others have already suggested in the comments to your question, why not use a seamless noise image? (you know where to find one :P).
You could still embed this image as 'DATA' in the html(, HINT: even or even feed image-data straight into canvas that will render it as your 'noise.php').
But then.. you are right: what if you wanted to change the noize-size?
And you want to know how to make granite/noise anyway..
And is mathematically/programmatically describing this noise lower in character-count (file-size) than supplying a ready-made image(-fragment)?
Start UPDATE 2 part 1:
Actually, after some good night sleep I realized/remembered that visual noise is one of the BEST way's to determine randomness. Humans are notoriously good at finding visual patterns, even professionals use this (and as such this is also heavily used in cryptography where one would need -for instance- a useful one time pad).
Also see 'commander' Crockford's YUI-lecture 'Principles of Security' from 19m07s to 22m37s.
Now why is this important? Well ECMA-script (aka javascript) defines a loose Math.random() function:
"returns a number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with
approximately uniform distribution over that range, using an
implementation-dependent algorithm or strategy"
Re-read the italic/bold part and welcome yourself to reality: each and every browser (brand/version) has it's own random-routine!!
"But what does it mean?" Well.. simply put.. Depending on browser(version)'s ES-Script implementation (cough cough IE): Noise based on Math.random() will/might render visible patterns in your noise (independently of possible tile-size)!!
So for the rest of this answer we are going to assume either an ideal world where browsers spit-out proper random numbers, or that you took control and use a stronger 'predictable' random-solution as is discussed on this wonderful article that google's bubble accidentally leaked :)
End Update 2 part 1
So let's start with the radial gradient-part. You already figured that one out.
Ok, then follows the noise-function in canvas (you could you could do this before the radial gradient, but this order gives a nicer grain and diffuses color banding the gradient produces -on a average lcd you would see them anyway since they're not true color-) : this is done by generating random pixels.
There would be a lot of different algorithms to use, I've used a straight-forward one that you can understand without math..
Note that generating noise for a modern day full-screen resolution is easily larger than 1 mega-pixel in resolution, so this would be slow! To overcome this we need to generate and RE-USE a small seamless tile. We use this as a pattern-fill in our full-size image that already has the radial gradient.
I also assume you want the radial gradient liquidly placed in the middle of the view-port, so if you want to go the fixed way (as opposed to the noize.png/css3 way you referenced), you'll also need an extra eventhandler 'onResize()' to have canvas render a new background.
Why? Well if you where to let the browser scale this background-image (created upon page-load) automatically, then the nice grain-size of your noise would change to, EVEN leading to visible PATTERNS that you would not want..
(Since I desperately want to go to sleep now..): The rest is thoroughly explained in the source-code of the function I wrote for you..
Here is the link to the fully documented code I wrote for you: jsfiddle.net/sU74C/ and here you can see it in full-screen preview.   UPDATE 1: function genNoise 80% FASTER!!
Use it if you like (retaining the link to this answer) or learn from it and do your own thing.
PLEASE DON'T FORGET to accept AN answer to this question (hopefully mine :))
Hope this helps!
UPDATE 2 part 2:
There are more way's to interact with canvas. One could also calculate/(re-)use/generate/save/import pixel-maps/array's (as png or base64 or jpg or ...) for instance, see this excellent article on faster 8bit and even faster 32bit (if the browser supports 'Uint8ClampedArray' as the type of the data property of the ImageData object) pixel-array's, including a proper solution to account for the endianness of the processor!!
So after giving this some considerable thought, it turns out that to do this 'right' is actually a challenge and should be divided in 2 parts:
Where do I get my noise-data (Math.random() or custom random or pre-defined external (image, json-string, random.com) or embedded (packed?) data)?
What is the fastest way to build/store/re-use this noise on full-screen size/canvas.
Given the statements in part 1 of this update and that we don't want patterns in our visible noise, I'm starting to lean to using some pre-rendered 'random' noise data (meant to tile seamlessly) that is embedded in the noise-generator: otherwise there is the overhead of running your own none-engine-optimized random function (times..a lot..).
Also I think one might get away with just black and white and transparency afterwards.. This might considerably speed-up things up AND reduce embedded pixel-data.
Think about it: black or white equals 0 or 1..
In base 64 one character can represent 6 bits. So a 30x30px image has 900 px divided by 6 bits = 150 characters (sweet-spot increments by 6px, so next is 36px*36px is 216 characters).

Item matching with domain knowlege

I have various product items that I need to decide if they are the same. A quick example:
Microsoft RS400 mouse with middle button should match Microsoft Red Style 400 three buttoned mouse but not Microsoft Red Style 500 mouse
There isn't anything else nice that I can match with apart from the name and just doing it on the ratio of matching words isn't good enough (The error rate is far too high)
I do know about the domain and so I can (for example) hand write the fact that a three buttoned mouse is probably the same as a mouse with a middle button. I also know the manufacturers (or can take a very good guess at them).
The only thought I have had so far is matching them by trying to use hand written rules to reduce the size of the string and then checking the matching words, but I wondered if anyone had any ideas best way of doing this matching was with a better accuracy and precision (or where to start looking) and if anyone knew of any work that had been done in this area? (papers, examples etc).
"I do know about the domain..."
How much exactly do you know about the domain? If you know everything about the domain, then you might be better off building an index of all your manufacturers products (basically the description of the product from the manufacturers webpage). Then instead of trying to match your descriptions to each other, matching them to your index of products.
Advantages to this approach:
presumably all words used in the description of the product have been used somewhere in the promotional literature
if when building the index you were able to weight some of the information (such as product codes) then you may have more success
Disadvantages:
may take a long time to create the index (especially if done by hand)
If you don't know everything about your domain, then you might consider down-ranking words that are very common (you can get lists of common words off the internet), and up-ranking numbers and words that aren't in a dictionary (you can get lists of words off the internet/most linux/unix distributions come with them for spell checking purposes).
I don't know how much you know about search, but in the past I've found the book "Search Engines: Information Retrieval in Practice" by W. Bruce Croft, Donald Metzler, Trevor Strohman to be useful. There are some sample chapters in the publishers website which will tell you if the book's for you or not: pearsonhighered.com
Hope that helps.
In addition to hand-written rules, you may try to use supervised learning with feature extraction.
Let features be the words in description, than look on descriptions as feature vectors.
When teaching the algorithm, let it show you two vectors that look similar by the ratio, and if it's same item, let the algorithm improve weighs for those words.
For example, each pair of words may have bigger weight than simple ratio, as you have done.
[3-button] [middle]
[wheel] [button]
[mouse] [mouse]
By your algorithm, it'll give ratio of 1/3 to similarity. When you set this as "same item" algorithm should add more value to those pair of words, when it reaches them next time.
Just tokenize (you should seperate numbers from letters in that step aswell, so not just a whitespace tokenizer), stem, filter stopwords and uninteresting words like mouse. Perhaps you should have a list with words producers aswell and shorten all not producers and numbers to their first letter. (if you do that, you have to seperate capital letters aswell in the tokenizer)
Microsoft RS400 mouse with middle button -> Microsoft R S 400
Microsoft Red Style 400 three buttoned mouse -> Microsoft R S 400
Microsoft Red Style 500 mouse -> Microsoft R S 500
If you want a better solution
vsm (vector space model) out of plagiarism detection would be nice. (Every word gets a weight, according to their discriminative value and those weights are projected into a multidimensional space. After that you just measure the angular degree between 2 texts)
I would suggest something a lot more generally applicable. As I understand it, you want some nlp processing that will deal with things that you recognize as synonyms. I think that's a pretty simple implementation right there.
If I were you I would make a keyword object that had a list of synonyms as a parameter, then write a script that would scrape whatever text you have for words that only appear occasionally (have some capped frequency at which the keyword is actually considered applicable), then add a list of keywords as a parameter of each keyword that contains it's synonyms. If you were willing to go a step further I would set weights on the synonym list showing how similar they are.
With this kind of nlp problem, the chance that you will get to 100% accuracy is 0, but you could well get above 90%, I would suggest adding an element by which you can adjust the weights in an automated way. I have to be fairly vague here, but in my last job I was tasked with a similar problem, and was able to get accuracy in the high 90's. My implementation was also probably more complicated than what you need, but even a simple implementation should get you pretty good return, but if you aren't dealing with a fairly large data set (~hundreds+) it's probably not worth scripting.
Quick example, in your example the difference can be distilled pretty accurately to just saying that "middle" and "three" are synonyms. You can get more complex if you need to, but that would match a lot.

Electrically charging edges in a force-based graph drawing algorithm?

I'm attempting to write a short mini-program in Python that plays around with force-based algorithms for graph drawing.
I'm trying to minimize the number of times lines intersect. Wikipedia suggests giving the lines an electrical charge so that they repel each other. I asked my physics teacher how I might simulate this, and she mentioned using calculus with Coulomb's Law, but I'm uncertain how to start.
Could somebody give me a hint on how I could do this? (Or alternatively, another way to tweak a force-based graph drawing algorithm to minimize the number of times the lines cross?) I'm just looking for a hint; no source code please.
In case anybody's interested, my source code and a youtube vid I made about it.
You need to explicitly include a term in your cost function that minimizes the number of edge crossings. For example, for every pair of edges that cross, you incur a fixed penalty or, if the edges are weighted, you incur a penalty that is the product of the two weights.

Optical character recognition

Hey everyone,
I'm trying to create a program in Java that can read numbers of the screen, and also recognise images on the screen. I was wondering how i can achieve this?
The font of the numbers will always be the same. I have never programmed anything like this before, but my idea of how it works is to have the program take a screenshot, then overlay the image of the numbers with the section of the screenshot image and check if they match, repeating this for each numbers. If this is the correct way to do this, how would i put that in code.
Thanks in advance for any help.
You could always train a neural net to do it for you. They can get pretty accurate sometimes. If you use something like Matlab it actually has capabilities for that already. Apparently there's a neural network library for java (http://neuroph.sourceforge.net/) although I've never used it personally.
Here's a tutorial about using neuroph: http://www.certpal.com/blogs/2010/04/java-neural-networks-and-neuroph-a-tutorial/
You can use a neural network, support vector machine, or other machine learning construct for this. But it will not do the entire job. If you do a screen shot, you are going to be left with a very large image that you will need to find the individual characters on. You also need to deal with the fact that the camera might not be pointed straight at the text that you want to read. You will likely need to use a series of algorithms to lock onto the right parts of the image and then downsample it in a way that size becomes neutral.
Here is a simple Java applet I wrote that does some of this.
http://www.heatonresearch.com/articles/42/page1.html
It lets you draw on a relatively large area and locks in on your char. Then it recognizes it. I am using the alphabet, but digits should be easier. The complete Java source code is included.
One simpler approach could be to use template matching. If the fonts are same, and/or the size (in pixels)is known, then simple template matching can do the job for you. ifsize of input is unknown, you might have to create copies of images at different scales and do the matching at each scale.
One with the extreme value(highest or lowest depending on the method you follow for template matching) is your result.
Follow this link for details