I've read Taking still photos from MDN, which describes how to capture photos from the webcam (using a video element and mediaDevices.getUserMedia) and show them to the user using a canvas.
However, I don't need (and don't want) to display the captured image to the user, so I'd also rather not use a canvas: I'd say that drawing to a canvas first just to get the image data (as ImageData or a data URI) could be bad for performance.
So is there a way to get the image data without using canvas?
Using canvas is totally fine and performs well. Just don't attach the canvas to the document and it won't be shown. I think your concern about performance is unfounded.
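For example, something along these lines (a minimal sketch, assuming `video` is an HTMLVideoElement already playing your getUserMedia stream):

```js
// Minimal sketch: grab a frame from a playing <video> without ever showing a canvas.
// Assumes `video` is an HTMLVideoElement already attached to a getUserMedia stream.
function captureFrame(video) {
  const canvas = document.createElement('canvas'); // never appended to the document
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(video, 0, 0);
  return {
    dataUri: canvas.toDataURL('image/png'),                         // data URI
    imageData: ctx.getImageData(0, 0, canvas.width, canvas.height)  // raw pixels
  };
}
```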
The only other way to get data out of a stream would be MediaRecorder, but it produces video at a given frame rate, which isn't what you want, and it's much more involved.
There's talk of an ImageCapture API, but it's not implemented in any browser yet, except behind a flag. It would offer a .takePhoto() method, which would also give access to full-resolution photo cameras where available (e.g. on phones).
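For reference, the proposed API would look roughly like this (a sketch only, since it's behind a flag at the time of writing):

```js
// Sketch of the proposed ImageCapture API (behind a flag at the time of writing).
navigator.mediaDevices.getUserMedia({ video: true })
  .then((stream) => {
    const track = stream.getVideoTracks()[0];
    const imageCapture = new ImageCapture(track);
    return imageCapture.takePhoto(); // resolves with a Blob, at full photo resolution where supported
  })
  .then((blob) => {
    console.log('Captured photo:', blob.type, blob.size);
  });
```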
I am posting this question at the request of a Firebase engineer.
I am using the Camera2 API in conjunction with Firebase ML Kit vision, for both barcode scanning and on-device OCR. The things I am trying to decode are mostly labels on equipment. In testing the application, I have found that trying to scan the entire camera image produces mixed results. The main problem is that the field of view is too wide.
If there are multiple bar codes in view, firebase returns multiple results. You can sort of work around this by looking at the coordinates and picking the one closest to the center.
When scanning text, it's more or less the same, except that you get multiple Blocks, many of them incomplete (you'll get a couple of letters here and there).
You can't just narrow the camera mode, though - for this type of scanning, the user benefits from the "wide" camera view for alignment. The ideal situation would be if you have a camera image (let's say for the sake of argument it's 1920x1080) but only a subset of the image is given to firebase-ml. You can imagine a camera view that has a guide box on the screen, and you orient and zoom the item you want to scan within that box.
You can select what kind of image comes from the Camera2 API, but firebase-ml spits out warnings if you choose anything other than YUV_420_888. The problem is that there's no great way in the Android API to deal with YUV images unless you do it yourself. That's what I ultimately ended up doing - I solved my problem by writing a RenderScript that takes an input YUV image, converts it to RGBA, crops it, then applies any rotation if necessary. The result of this is a Bitmap, which I then feed into either the FirebaseVisionBarcodeDetectorOptions or FirebaseVisionTextRecognizer.
Note that the Bitmap itself causes mlkit runtime warnings, urging me to use the YUV format instead. This is possible, but difficult. You would have to read the byte arrays and stride information from the original camera2 YUV image and create your own. The object that comes from camera2 is unfortunately a package-protected class, so you can't subclass it or create your own instance - you'd essentially have to start from scratch. (I'm sure there's a reason Google made this class package-protected, but it's extremely annoying that they did.)
The steps I outlined above all work, but with format warnings from mlkit. What makes it even better is the performance gain - the barcode scanner operating on an 800x300 image takes a tiny fraction of the time it takes on the full-size image!
It occurs to me that none of this would be necessary if Firebase paid attention to cropRect. According to the Image API, cropRect defines what portion of the image is valid. That property seems to be mutable, meaning you can get an Image and change its cropRect after the fact. That sounds perfect: I thought I could get an Image off of the ImageReader, set its cropRect to a subset of the image, and pass it to Firebase, and that Firebase would ignore anything outside of cropRect.
This does not seem to be the case. Firebase seems to ignore cropRect. In my opinion, firebase should either support cropRect, or the documentation should explicitly state that it ignores it.
My request to the firebase-mlkit team is:
1. Define the behavior I should expect with regard to cropRect, and document it more explicitly.
2. Explain at least a little about how images are processed by these recognizers. Why is it so insistent that YUV_420_888 be used? Maybe only the Y channel is used in decoding? Doesn't the recognizer have to convert to RGBA internally? If so, why does it get angry at me when I feed in Bitmaps?
3. Make these recognizers either pay attention to cropRect, or state that they don't and provide another way to tell them to work on a subset of the image, so that I can get the performance (reliability and speed) one would expect from running ML on a smaller image.
--Chris
Let's say there is a function getScreenShot. We call it like:
scrShot = getScreenShot(videoID, time, quality)
and it gives us a screenshot of the video at the specified time (like 1:23) in the specified quality (like 720p).
Is there any possible way to do this automatically, without loading the full video?
Not sure about the actual screenshot part, but I imagine you can seek to the time by setting .currentTime and then use this post to hide the various controls, giving you an unobstructed image.
The real question is whether you can capture a screenshot at all. I would guess not, due to privacy concerns?
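A rough sketch of the seek-and-capture idea, assuming the video is same-origin (or served with CORS headers); otherwise the canvas is tainted and reading pixels will fail:

```js
// Rough sketch: seek a <video> to a given time and capture that frame as a data URI.
// Assumes the video is same-origin or CORS-enabled; otherwise toDataURL() will throw.
function getScreenShot(video, time, quality) {
  return new Promise((resolve) => {
    video.addEventListener('seeked', function onSeeked() {
      video.removeEventListener('seeked', onSeeked);
      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      resolve(canvas.toDataURL('image/jpeg', quality)); // quality in the 0..1 range
    });
    video.currentTime = time; // e.g. 83 seconds for 1:23
  });
}
```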
Is there any way to be able to query the GPU to tell me if my viewport in my webpage is currently on screen or not? For example, if I had a 3d scene rendering in a canvas in an iframe, is there a way to query the hardware (within my iframe and only the pixels or verts in the viewport) to say if I am on screen or scrolled off screen?
I'm curious as to whether this is something I can do at the vertex shader level. Does WebGL even run the shader program for a viewport that is offscreen? Let's say the canvas is scrolled out of view, or the viewport is obstructed by another browser window? Is there a way to query the compositing portion of WebGL to see if it is even in view, or to iterate through the "RenderObject" tree to test whether it is on screen, and then return this value? I am trying to get much more performance out of a project I am working on, and I am trying to only render what is visible on screen.
Any possible ideas? Is this even possible? Thanks!
requestAnimationFrame is the only reasonable way to avoid unnecessary performance loss, even semantically: window.requestAnimationFrame tells the browser that you wish to perform an animation, so the browser will figure out how to handle your wish optimally, taking the current page state into account.
And since iframes can communicate (for example via local storage), you can push your base page's state to them so each one can decide whether or not it should request an animation frame. But I'm not sure it's a good thing to have multiple render contexts on your page: they all consume resources and can't share them (data stored on the GPU is sandboxed), so eventually they will start pushing each other out of GPU memory and cause lag, and the GPU pipeline might not be happy with all those tiny standalone entities. Fragmentation is the GPU's main performance enemy.
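A rough sketch of that idea, assuming the parent page and the iframes are same-origin (the `iframeIsInView` check is hypothetical and yours to write):

```js
// Parent page: publish the base page state; same-origin iframes receive 'storage' events.
window.addEventListener('scroll', () => {
  localStorage.setItem('basePageState', JSON.stringify({ scrollY: window.scrollY }));
});

// Inside each iframe: decide whether to keep requesting animation frames.
let shouldAnimate = true;
window.addEventListener('storage', (e) => {
  if (e.key === 'basePageState') {
    const state = JSON.parse(e.newValue);
    shouldAnimate = iframeIsInView(state.scrollY); // hypothetical visibility check for this iframe
  }
});
```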
You can't ask this question at the canvas/WebGL level, because the canvas might, for example, be scrolled back on screen before you draw another frame, and browsers don't want to be left with no content to show, so there's no provision for skipping the draw.
I believe you will have to consult the DOM geometry properties (e.g. .scrollLeft) of your scrollable areas to determine whether the canvas is visible. There is enough information in said properties that you can do this generically without hardcoding knowledge of your page structure.
Also, make sure you are exclusively using requestAnimationFrame for your drawing/simulation scheduling; it will pause animations if the page is hidden/minimized/in another tab/otherwise explicitly invisible.
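A simple sketch of that visibility check, assuming the canvas scrolls with the page (nested scrollable containers would need extra checks) and that `canvas` and `render` are your own WebGL canvas and draw function:

```js
// Sketch: skip rendering while the canvas is scrolled out of the viewport.
function isCanvasVisible(canvas) {
  const rect = canvas.getBoundingClientRect();
  return rect.bottom > 0 && rect.right > 0 &&
         rect.top < window.innerHeight && rect.left < window.innerWidth;
}

function frame() {
  if (isCanvasVisible(canvas)) {
    render(); // your WebGL draw function (hypothetical)
  }
  requestAnimationFrame(frame); // also stops firing while the tab is hidden
}
requestAnimationFrame(frame);
```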
I would like to tackle a problem for which I haven't found a solution anywhere on the Internet. It is this: I need to save the elements drawn by web users on a canvas not as a flat image, but each one individually, in order to let the same user, or even other users, modify every single element (drag and drop, erase, partially erase, etc.) at a later time. This should also make it possible to save a drawing history and restore it in future working sessions. All the examples I have found only save the canvas as a flat image.
Update:
To clarify: not necessarily as layers, but I certainly plan to implement several different drawing tools; a drawing element is a single application/instance of a tool: a circle, a box, an added image, a straight line, or even a freehand stroke that starts the moment the mouse button is pressed and ends when it is released. Then I need the ability to save each element's state, allowing each one to be modified at a later time.
You can't do this natively with canvas. You should look at using a third-party library. Fabric.js is a library that was built to do what you want.
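A minimal sketch of what that looks like with Fabric.js: every shape stays a separate object that you can serialize to JSON and load back in a later session (the element id and values here are just examples):

```js
// Minimal Fabric.js sketch: each shape remains a separate, editable object.
const canvas = new fabric.Canvas('c'); // a <canvas id="c"> element in the page

canvas.add(new fabric.Circle({ left: 50, top: 50, radius: 30, fill: 'red' }));
canvas.add(new fabric.Rect({ left: 150, top: 80, width: 60, height: 40, fill: 'blue' }));

// Save the scene as structured data (not a flat image)...
const saved = JSON.stringify(canvas.toJSON());

// ...and restore it later, with every element still individually editable.
canvas.loadFromJSON(saved, canvas.renderAll.bind(canvas));
```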
The base idea was to use the canvas as a container for vector shapes (triangles, squares, circles, etc.), hand-drawn figures (see for example http://www.williammalone.com/articles/create-html5-canvas-javascript-drawing-app/) and inserted images, giving users the chance to save/upload the content not as a serialized image, but with each distinct element in its original format, in order to continue working on them in a future session.
Here's the thing: I have a Google Map with a lot of markers in it.
The problem is that the map loads, stays empty for a little while, and only then are the markers displayed. The markers are customized PNGs.
My idea is to "preload" them (not sure it's the right word) so they appear almost at the same time as the map.
What I've done so far is add the same images I use on my map earlier in the page, outside of the map, with display:none;
I'm not sure, but it seems the delay between the map and the markers being displayed has been reduced.
Is it the best way to do it, and is it a good practice?
You could use "sprites" i.e. a collection of separate images on 1 single png. This bears the advantage of requiring only 1 load i.e. less separate loads. Google GWT pushes this technique a lot (i.e. Image bundles).
The value of this technique increases with the number of discrete images that require loading: the more separate images, the longer it takes to load them.
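With the Maps JavaScript API you can point several markers at different regions of one sprite sheet; a rough sketch (the file name, offsets and positions are made up):

```js
// Rough sketch: marker icons cut from a single sprite sheet (file name and offsets are made up).
function iconFromSprite(offsetX) {
  return {
    url: 'markers-sprite.png',                  // one PNG containing all the marker images
    size: new google.maps.Size(32, 32),         // size of a single marker
    origin: new google.maps.Point(offsetX, 0),  // where this marker starts inside the sprite
    anchor: new google.maps.Point(16, 32)       // tip of the marker
  };
}

new google.maps.Marker({ position: positionA, map: map, icon: iconFromSprite(0) });
new google.maps.Marker({ position: positionB, map: map, icon: iconFromSprite(32) });
```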
Don't use display:none for preloading. Because an element set to display:none doesn't render any of its physical attributes, the browser doesn't bother downloading it until it's made visible.
An alternative is to use visibility:hidden, but you run the risk of running into a user agent that does pretty much the same thing. visibility:hidden requires that the browser compute the box model for the image, which requires that the image is loaded (to get its dimensions). I don't believe this works in IE6, though.
The last technique (and my favorite) is to create a div directly before your </body> tag. Position it absolutely with left: -99999999px; top: -99999999px. The browser is forced to render the images (and consequently load them), and there's no messy JavaScript to deal with.
Now, to integrate this with your issue, put the code for your Google map after your "preload div". Your browser will be forced to load the images before it runs the code to create the map. This should solve your problem.
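Something along these lines, placed directly before the closing body tag (the image paths are placeholders for your real marker files):

```html
<!-- Preload div: positioned far off-screen, so the browser still renders
     (and therefore downloads) the marker images. Paths are placeholders. -->
<div style="position: absolute; left: -99999999px; top: -99999999px;">
  <img src="marker-red.png" alt="">
  <img src="marker-blue.png" alt="">
</div>
```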
Hope this helps!
From what I recall, most modern browsers only load an image once (given that the src of the image is the same). I guess you mean loading them before the map loads.
In my opinion it does not really matter that much. Markers should be relatively light compared to the map imagery itself, and they can't really be used without the map anyway.
If you think it improves your user experience then I think it is good practice, but I'd try to load them in a cleaner way, probably with an Ajax call early in the page load?
Take a look at Ajax In Action: Preloading Images
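The basic technique is only a few lines of JavaScript (the marker URLs are placeholders):

```js
// Classic image preloading: create Image objects early so the files are already
// cached by the time the map code needs them. URLs are placeholders.
const markerUrls = ['marker-red.png', 'marker-blue.png'];
const preloaded = markerUrls.map((url) => {
  const img = new Image();
  img.src = url; // triggers the download immediately
  return img;
});
```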