I've been playing with the dynamic audio library standingwave 3. Almost the first thing one notices is that if one tries out the code samples in the developer's guide, namely this code:
// Create a chord of three simultaneous sine tones: A3, E4, A4.
var sequence:ListPerformance = new ListPerformance();
sequence.addSourceAt(0, new SineSource(new AudioDescriptor(), 0.1, 440));
sequence.addSourceAt(0, new SineSource(new AudioDescriptor(), 0.1, 660));
sequence.addSourceAt(0, new SineSource(new AudioDescriptor(), 0.2, 880));
// Play it back.
var source:IAudioSource = new AudioPerformer(sequence);
player.play(source);
then one gets a really unpleasant sound and trace messages that read "AUDIO CLIPPING". The developer explains in one of the issue reports on github that you need to reduce the gain of samples when you mix them together to avoid this, and that there's no easy way to know dynamically how much reduction is needed.
My question is, how is it that stangingwave2 seems to have dealt with this automatically? For instance, the code quoted above did not clip in SW2. Likewise consider this SW2 example demo - if you increase the sustain and hold (the S/H sliders) and press one of the sequence buttons, multiple tones will overlap without clipping, even though the source doesn't show any apparent sign of changing the gain or the volume of the sin tones, they just get mixed together.
What's going on here - did SW2 have some way of dealing with this automagically, or is there some robust way of generally overlaying arbitrary numbers of sounds dynamically without causing clipping? Thanks!
As no activity seems likely here, I'll note for the ages that apparently SW2 simply returns sine sources at much less than full scale, but it turns out that if you combine enough sources you do get clipping. SW3 returns the sources at full scale so the clipping becomes apparent with fewer sources.
Related
I understand it's often faster to pre-render graphics to an off-screen canvas. Is this the case for a shape as simple as a circle? Would it make a significant difference for rendering 100 circles at a game-like framerate? 50 circles? 25?
To break this into two slightly different problems, there are two aspects to what you're asking:
1) is drawing a shape off-screen and putting it on-screen faster
2) is drawing a shape one time and copying it to 100 different places faster than drawing a shape 100 times
The answer to the first one is "it depends".
That's a technique known as "buffering" and it's not really about speed.
The goal of buffering an image is to remove jerkiness from it.
If you drew everything on-screen, then as you loop through all of your objects and draw them, they're updating in real-time.
In the NES-days, that was normal, because there wasn't much room in memory, or much power to do anything about it, and because programmers didn't know much better, with the limited instructions they had to work with.
But that's not really the way games do things, these days.
Typically, they call all of the draw updates for one frame, then they take that whole frame as a finished image, and paste that whole thing on the screen.
The GPU (and GL/DirectX) takes care of this, by default, in a process called "double-buffering".
It's a double-buffer, because there's room for the "in-progress" buffer used for the updates, as well as the buffer that holds the final image from the last frame, that's being read by the monitor.
At the end of the frame processing, the buffers will "swap". The newly full frame will be sent to the monitor and the old frame will be overwritten with the new image data from the other draw calls.
Now, in HTML5, there isn't really access to the frame-buffer, so we do it ourselves; make every draw call to an offscreen canvas. When all of the updates are finished (the image is stable), then copy and paste that whole image to the onscreen canvas.
There is a large speed-optimization in here, called "blitting", which basically copies over only the parts that have changed, and reuses the old image.
There's a lot more to it than that, and there are a lot of caveats, these days, because of all of the special-effects we add, but there it is.
The second part of your question has to do with a concept called "instancing".
Instancing is similar to blitting, but while blitting is about only redrawing what's changed, instancing is about drawing the exact same thing several times in different places.
Say you're painting a forest in Photoshop.
You've got two options:
Draw every tree from scratch.
Draw one tree, copy it, paste it all over the image.
The downside of the second one is that each "instance" of the image looks exactly the same.
If your "template" image changes colour or takes damage, then all instances of the image do, too.
Also, if you had 87 different tree variations for an 8000 tree forest, making instances of them all would still be very fast, but it would take more memory, because you now need to save 87x more images than when it was just one tree, to reference on every draw call.
The upside is that it's still much, much faster.
To answer your specific question about X circles, versus instancing 1 circle:
Yes, it's still going to be a lot faster.
What a "lot" means, though, will change based on a lot of different things, because now you're talking about browsers on PCs.
How strong is the PC?
How good is the videocard?
How large is the canvas in software-pixels (not CSS pixels)?
How large are the circles? Do they have alpha-blending?
Is this written in WebGL or software?
If software is the canvas compositing in hardware mode?
For a typical PC, you should still be able to hit 60fps in Chrome, drawing 20 circles, I think (depending on what you're doing to them... ...just drawing them onscreen, every frame is simple), so in this case, the instances are still a "lot" faster, but it's not going to matter, because you've already passed the performance-ceiling of Canvas.
I don't know that the same would be true on phones/tablets, or battery-powered laptops/netbooks, though.
Yes, transferring from an offscreen canvas is faster than even primitive drawings like an arc-circle.
That's because the GPU just copies the pixels from the offscreen canvas (not much CPU effort required)
I working on an application for processing document images (mainly invoices) and basically, I'd like to convert certain regions of interest into an XML-structure and then classify the document based on that data. Currently I am using ImageJ for analyzing the document image and Asprise/tesseract for OCR.
Now I am looking for something to make developing easier. Specifically, I am looking for something to automatically deskew a document image and analyze the document structure (e.g. converting an image into a quadtree structure for easier processing). Although I prefer Java and ImageJ I am interested in any libraries/code/papers regardless of the programming language it's written in.
While the system I am working on should as far as possible process data automatically, the user should oversee the results and, if necessary, correct the classification suggested by the system. Therefore I am interested in using machine learning techniques to achieve more reliable results. When similar documents are processed, e.g. invoices of a specific company, its structure is usually the same. When the user has previously corrected data of documents from a company, these corrections should be considered in the future. I have only limited knowledge of machine learning techniques and would like to know how I could realize my idea.
The following prototype in Mathematica finds the coordinates of blocks of text and performs OCR within each block. You may need to adapt the parameters values to fit the dimensions of your actual images. I do not address the machine learning part of the question; perhaps you would not even need it for this application.
Import the picture, create a binary mask for the printed parts, and enlarge these parts using an horizontal closing (dilation and erosion).
Query for each blob's orientation, cluster the orientations, and determine the overall rotation by averaging the orientations of the largest cluster.
Use the previous angle to straighten the image. At this time OCR is possible, but you would lose the spatial information for the blocks of text, which will make the post-processing much more difficult than it needs to be. Instead, find blobs of text by horizontal closing.
For each connected component, query for the bounding box position and the centroid position. Use the bounding box positions to extract the corresponding image patch and perform OCR on the patch.
At this point, you have a list of strings and their spatial positions. That's not XML yet, but it sounds like a good starting point to be tailored straightforwardly to your needs.
This is the code. Again, the parameters (structuring elements) of the morphological functions may need to change, based on the scale of your actual images; also, if the invoice is too tilted, you may need to "rotate" roughly the structuring elements in order to still achieve good "un-skewing."
img = ColorConvert[Import#"http://www.team-bhp.com/forum/attachments/test-drives-initial-ownership-reports/490952d1296308008-laura-tsi-initial-ownership-experience-img023.jpg", "Grayscale"];
b = ColorNegate#Binarize[img];
mask = Closing[b, BoxMatrix[{2, 20}]]
orientations = ComponentMeasurements[mask, "Orientation"];
angles = FindClusters#orientations[[All, 2]]
\[Theta] = Mean[angles[[1]]]
straight = ColorNegate#Binarize[ImageRotate[img, \[Pi] - \[Theta], Background -> 1]]
TextRecognize[straight]
boxes = Closing[straight, BoxMatrix[{1, 20}]]
comp = MorphologicalComponents[boxes];
measurements = ComponentMeasurements[{comp, straight}, {"BoundingBox", "Centroid"}];
texts = TextRecognize#ImageTrim[straight, #] & /# measurements[[All, 2, 1]];
Cases[Thread[measurements[[All, 2, 2]] -> texts], (_ -> t_) /; StringLength[t] > 0] // TableForm
The paper we use for skew angle detection is: Skew detection and text line position determination in digitized documents by Gatos et. al. The only limitation with this paper is that it can detect skew upto -5 and +5 degrees. After that, we need something to slap the user with a message! :)
In your case, where there are primarily invoice scans, you may beautifully use: Multiresolution Analysis in Extraction of Reference Lines from Documents with Gray Level Background by Tag et. al.
We wrote the code in MATLAB, if you need help let me know!
I worked on a similar project once, and for being a long time user of OpenCV I ended up using it once again. OpenCV is a popular-cross-platform-computer-vision-library that offers programming interfaces for C and C++.
I found an interesting blog that had a post on how to detect the skew angle of a text using OpenCV, and then another on how to deskew.
To retrieve the text of the document and be able to pass a smaller image to tesseract, I suggest taking a look at the bounding box technique.
I don't know if the image acquisition procedure is your responsibility, but if it is you might want to take a look at how to do camera calibration with OpenCV to fix the distortion in the image caused by some camera lenses.
I am writing an HTML5 canvas app in javascript. I am using multiple canvas elements as layers to support animation without having to re-draw the whole image every frame.
Is there a maximum number of canvas elements that I can layer on top of each other in this way -- (and see an appropriate result on all of the HTML5 platforms, of course).
Thank you.
I imagine you will probably hit a practical performance ceiling long before you hit the hard specified limit of somewhere between several thousand and 2,147,483,647 ... depending on the browser and what you're measuring (number of physical elements allowed on the DOM or the maximum allowable z-index).
This is correlated to another of my favorite answers to pretty much any question that involves the phrase "maximum number" - if you have to ask, you're probably Doing It Wrong™. Taking an approach that is aligned with the intended design is almost always just as possible, and avoids these unpleasant murky questions like "will my user's iPhone melt if I try to render 32,768 canvas elements stacked on top of each other?"
This is a question of the limits of the DOM, which are large. I expect you will hit a performance bottleneck before you hit a hard limit.
The key in your situation, I would say, is to prepare some simple benchmarks/tests that dynamically generate Canvases (of arbitrary number), fill them with content, and add them to the DOM. You should be able to construct your tests in such a way where A) if there is a hard limit you will spot it (using identifiable canvas content or exception handling), or B) if there is a performance limit you will spot it (using profiling or timers). Then perform these tests on a variety of browsers to establish your practical "limit".
There are also great resources available here https://developers.facebook.com/html5/build/games/ from the Facebook HTML5 games initiative. Therein are links to articles and open source benchmarking tools that address and test different strategies similar to yours.
I am working on a simple drawing application, and i need an algorithm to make flood fills.
The user workflow will look like this (similar to Flash CS, just more simpler):
the user draws straight lines on the workspace. These are treated as vectors, and can be selected and moved after they are drawn.
user selects the fill tool, and clicks on the drawing area. If the area is surrounded by lines in every direction a fill is applied to the area.
if the lines are moved after the fill is applied, the area of fill is changed accordingly.
Anyone has a nice idea, how to implement such algorithm? The main task is basically to determine the line segments surrounding a point. (and storing this information somehow, incase the lines are moved)
EDIT: an explanation image: (there can be other lines of course in the canvas, that do not matter for the fill algorithm)
EDIT2: a more difficult situation:
EDIT3: I have found a way to fill polygons with holes http://alienryderflex.com/polygon_fill/ , now the main question is, how do i find my polygons?
You're looking for a point location algorithm. It's not overly complex, but it's not simple enough to explain here. There's a good chapter on it in this book: http://www.cs.uu.nl/geobook/
When I get home I'll get my copy of the book and see if I can try anyway. There's just a lot of details you need to know about. It all boils down to building a DCEL of the input and maintain a datastructure as lines are added or removed. Any query with a mouse coord will simply return an inner halfedge of the component, and those in particular contain pointers to all of the inner components, which is exactly what you're asking for.
One thing though, is that you need to know the intersections in the input (because you cannot build the trapezoidal map if you have intersecting lines) , and if you can get away with it (i.e. input is few enough segments) I strongly suggest that you just use the naive O(n²) algorithm (simple, codeable and testable in less than 1 hour). The O(n log n) algorithm takes a few days to code and use a clever and very non-trivial data structure for the status. It is however also mentioned in the book, so if you feel up to the task you have 2 reasons to buy it. It is a really good book on geometric problems in general, so for that reason alone any programmer with interest in algorithms and datastructures should have a copy.
Try this:
http://keith-hair.net/blog/2008/08/04/find-intersection-point-of-two-lines-in-as3/
The function returns the intersection (if any) between two lines in ActionScript. You'll need to loop through all your lines against each other to get all of them.
Of course the order of the points will be significant if you're planning on filling them - that could be harder!
With ActionScript you can use beginFill and endFill, e.g.
pen_mc.beginFill(0x000000,100);
pen_mc.lineTo(400,100);
pen_mc.lineTo(400,200);
pen_mc.lineTo(300,200);
pen_mc.lineTo(300,100);
pen_mc.endFill();
http://www.actionscript.org/resources/articles/212/1/Dynamic-Drawing-Using-ActionScript/Page1.html
Flash CS4 also introduces support for paths:
http://www.flashandmath.com/basic/drawpathCS4/index.html
If you want to get crazy and code your own flood fill then Wikipedia has a decent primer, but I think that would be reinventing the atom for these purposes.
I'm currently working on a pure html 5 canvas implementation of the "flying tag cloud sphere", which many of you have undoubtedly seen as a flash object in some pages.
The tags are drawn fine, and the performance is satisfactory, but there's one thing in the canvas element that's kind of breaking this idea: you can't identify the objects that you've drawn on a canvas, as it's just a simple flat "image"..
What I have to do in this case is catch the click event, and try to "guess" which element was clicked. So I have to have some kind of matrix, which stores a link to a tag object for each pixel on the canvas, AND I have to update this matrix on every redraw. Now this sounds incredibly inefficient, and before I even start trying to implement this, I want to ask the community - is there some "well known" algorithm that would help me in this case? Or maybe I'm just missing something, and the answer is right behind the corner? :)
This is called the point location problem, and it's one of the basic topics in computational geometry. There are a lot of methods you could use that would be much faster than the approach you're thinking of, but the details depend on what exactly you want to accomplish.
For example, each text string is contained in a bounding box. Do you just want to test whether the user clicked somewhere in that box? Then simply store the minimum and maximum coordinates of each rendered string, and test the point against each bounding box to see if it's contained in that range. If you have a large number of points to test, you can build any number of data structures to speed this up (e.g. R-trees), but for a single point the overhead of constructing such a structure probably isn't worthwhile.
If you care about whether the point actually falls within the opaque area of the stroked characters, the problem is slightly trickier. One solution would be to use the bounding box approach to first eliminate most of the possibilities, and then render the remaining strings one at a time to an offscreen buffer, checking each time to see if the target point has been touched.