How to recognize and count objects with Firebase / ML Kit - firebase-mlkit

I'd like to recognize and count objects in a picture, e.g. count the number of houses in a picture of a neighbourhood. What's the best way to do this with ML Kit?
Do I need to use the Object Detection API? Or is it possible to get multiple "house" tags using a straightforward image labeler?

The ML Kit Object Detection API (note that it is now offered as a standalone SDK) can count objects in an image / video stream, but it is limited to the 5 largest objects. Also, you should evaluate whether object detection works for your use case: it is a very general localizer and works for most objects, but when objects are close together or overlapping it may not distinguish between them.
If you need to detect more than 5 objects, I would recommend using TensorFlow Lite directly with one of the pre-trained models available on TF Hub, or training one yourself with AutoML Vision Edge if the general models don't fit your use case.
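To make the TensorFlow Lite route concrete, here is a minimal Python sketch of counting one class of objects with an SSD-style detection model. It is not ML Kit code, and the model file name, output ordering, score threshold, and class index are assumptions that depend on the specific TF Hub model (and on whether its label map even contains a "house" class):

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Placeholder model: any SSD-style TFLite detector exported from TF Hub / AutoML
interpreter = tf.lite.Interpreter(model_path="detector.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize the photo to the model's expected input size (uint8 input assumed here)
_, height, width, _ = input_details[0]["shape"]
image = Image.open("neighbourhood.jpg").convert("RGB").resize((width, height))
interpreter.set_tensor(input_details[0]["index"],
                       np.expand_dims(np.asarray(image, dtype=np.uint8), axis=0))
interpreter.invoke()

# Typical SSD output layout: [boxes, class indices, scores, count]; verify for your model
classes = interpreter.get_tensor(output_details[1]["index"])[0]
scores = interpreter.get_tensor(output_details[2]["index"])[0]

HOUSE_CLASS = 0  # placeholder index; depends entirely on the model's label map
house_count = int(np.sum((scores > 0.5) & (classes == HOUSE_CLASS)))
print("houses detected:", house_count)
```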
Fwiw, Image Labeling assigns labels that describe the scene of an image; it does not count objects, so for your example you would typically get a single "house" label.

Autodesk Forge - Do any IDs persist when translating a new source file with minor changes?

Background
We currently use Autodesk Forge to display 2D models of floor plans. Users can upload new floor plans (the file is uploaded to OSS and then translated) to replace existing ones; a new plan may include new objects in the room or slightly different positioning of existing objects. Currently, we grab certain objects' dbIds via viewer.getSelection() to "bind" (using the term loosely here) some external data to the object and perform certain interactions within our web app. We are using .DWG files.
Issue
When a new floor plan that removes an object is uploaded, the dbIds of the other objects shift, and our external binding becomes inaccurate.
Question
• Are there any IDs that persist between the uploads/translations?
*We do not maintain control of the .dwg files prior to their upload, so adding attributes to the drawing before it is translated likely won't be viable for our particular case - but if that's the best (or only) approach, I would like to know so I can propose it as a possible solution to my team.
Example
Let's say there's a simple square room with 5 chairs, and it's rendered and visible in the viewer after being uploaded as room1 (object key). We identify 3 chairs by their dbIds and save them, so a user can jump right to the object in question, and we put a label on it. Then someone removes one of the chairs and uploads/translates the document again with the same object key. Now the dbIds have changed and are assigned to different chairs, so our saved bindings point to the wrong objects. The question is: is there something other than dbId that persists between the different renderings? Or is there something I haven't considered that would be a better approach to keeping the binding accurate between uploads?
EDIT: The same scenario happens with element IDs. An interesting finding is that the element ID and dbId are tightly coupled (meaning if Object A's dbId is 3 and its element ID is 6E, and a deletion happens, a new object will take 3 and 6E respectively). Also, I believe the designers creating the AutoCAD files are drawing these as polylines, if that makes a difference.
Contingency Plan
If there's not an ID of some sort that persists, I'm considering storing the coordinates of the object we want to bind to, and later finding the closest object to those x/y/z coordinates. Is that a possibility - finding an object close to or overlapping a given xyz?
Use the external ID - unlike dbIds, which are assigned during each translation, the external ID maps back to the source file's own persistent identifier.
Useful blog: https://forge.autodesk.com/blog/get-dbid-externalid

Is there a reality capture parameter to request the desired number of vertices?

In the previous Reality Capture system, users could set a parameter which would determine the resolution of the output models. I want to wind up with models of about 100-150K vertices. Is there a setting somewhere in the Forge API that allows me to request that the modeler keep the number of generated vertices within some bounds?
Vertex/triangle decimation is usually what could be called a "subjective" task, which also explains why there are so many optimization algorithms in the wild.
You need and expect one type of optimization for "organic" models, and a totally different one for an architectural building.
The Reality Capture API provides you only with the raw high-resolution results, avoiding "opinionated" optimizations. It should be considered just one step in an automation pipeline.
The next step would be, upon receiving the output, to automatically optimize the resulting mesh based on the settings you need.
One option for that step is Design Automation for 3ds Max, where you feed in a model and, using the ProOptimizer modifier within 3ds Max, output the mesh with the needed level of detail. A sample of this step can be found here: https://forge-showroom.autodesk.io/post/prooptimizer.
There are also numerous open-source solutions which should help you cover this post-processing step.

Is there an open source solution for Multiple camera multiple object (people) tracking system?

I have been trying to tackle a problem where I need to track multiple people through multiple camera viewpoints on a real-time basis.
I found one solution, DeepCC (https://github.com/daiwc/DeepCC), built on the DukeMTMC dataset, but unfortunately it has been taken down because of data-confidentiality issues. It used Fast R-CNN for object detection, triplet loss for re-identification, and DeepSORT for real-time multiple-object tracking.
Questions:
1. Can someone share some other resources regarding the same problem?
2. Is there a way to download and still use the DukeMTMC dataset for the multiple-object tracking problem?
3. Is anyone aware when the official website (http://vision.cs.duke.edu/DukeMTMC/) will be available again?
Please feel free to provide different variations of the question :)
The Intel OpenVINO framework has all the parts of this task:
Object detection with pretrained Faster R-CNN, SSD, or YOLO models.
Re-identification models.
A complete demo application.
You can also use other models. If you want to run detection on the GPU, use OpenCV's DNN module with CUDA for detection and OpenVINO for re-identification.
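As a rough illustration of the OpenCV DNN option mentioned above, here is a minimal Python sketch of GPU-accelerated detection on one camera. The model/config file names are placeholders for whatever OpenCV-DNN-compatible person detector you pick, and the CUDA backend requires an OpenCV build with CUDA support:

```python
import cv2

# Placeholder files: any SSD-style detector that OpenCV's DNN module can read
net = cv2.dnn.readNet("frozen_inference_graph.pb", "graph.pbtxt")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)  # needs an OpenCV build with CUDA
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    net.setInput(cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True))
    detections = net.forward()  # SSD-style output, shape [1, 1, N, 7]
    for det in detections[0, 0]:
        if float(det[2]) > 0.5:  # confidence threshold
            x1, y1, x2, y2 = (det[3:7] * [w, h, w, h]).astype(int)
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

The crops from these boxes would then be fed to a re-identification model (in OpenVINO or elsewhere) to match identities across cameras.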
A good deep-learning library that I have used in the past for my work is Mask R-CNN (Mask Region-based Convolutional Neural Network). Although I have only used this algorithm on images and not on videos, the same principles apply, and it's very easy to make the transition to detecting objects in a video. The implementation uses TensorFlow and Keras, and you split your input data, i.e. images of people, into two sets: training and validation.
For training, use a third-party tool like VIA (VGG Image Annotator) to annotate the people in the images. After the annotations have been drawn, export a JSON file containing all of them, which will be used for the training process. Do the same thing for the validation set, BUT make sure the validation images have not been seen by the algorithm before.
Once you have annotated both groups and generated the JSON files, you can start training. Mask R-CNN makes training very easy: all you need to do is run a single command to start it. If you want to train on your GPU instead of your CPU, install Nvidia's CUDA, which works very well with supported GPUs and requires no coding after the installation.
During the training stage you will be generating weights files, which are stored in the .h5 format; one weights file is generated per epoch, depending on the number of epochs you choose. Once training has finished, you just reference the final weights file any time you want to detect the relevant objects, e.g. in your video feed.
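As a rough sketch of that last step, assuming the Matterport implementation linked below, a config matching your own training run, and a hypothetical weights file name:

```python
import cv2
from mrcnn.config import Config
from mrcnn import model as modellib

class InferenceConfig(Config):
    # These values are assumptions; NUM_CLASSES must match your training setup
    NAME = "people"
    NUM_CLASSES = 1 + 1  # background + person
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir="logs")
model.load_weights("mask_rcnn_people_0030.h5", by_name=True)  # hypothetical .h5 from training

# Mask R-CNN expects RGB images
image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
result = model.detect([image], verbose=0)[0]
# result holds 'rois' (boxes), 'class_ids', 'scores' and 'masks'
print("people detected:", len(result["rois"]))
```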
Some important info:
Mask R-CNN is somewhat of an older algorithm, but it still works flawlessly today. Although some people have updated the implementation to TensorFlow 2.0+, to get the best use out of it, use the following versions:
Tensorflow-gpu 1.13.2+
Keras 2.0.0+
CUDA 9.0 to 10.0
Honestly, the hardest part for me in the past was not using the algorithm, but finding versions of TensorFlow, Keras, and CUDA that all play well with each other and don't error out. Although the above-mentioned versions will work, try upgrading or downgrading certain libraries to see if you can get better results.
Here is an article about Mask R-CNN with video which I find very useful and resourceful:
https://www.pyimagesearch.com/2018/11/19/mask-r-cnn-with-opencv/
The GitHub repo can be found below.
https://github.com/matterport/Mask_RCNN
EDIT
You can use this method across multiple cameras; just set up multiple video captures within a computer-vision library like OpenCV. I assume this would be done with Python, which both Mask R-CNN and OpenCV are primarily based in.
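A minimal sketch of that multi-camera setup in OpenCV; the device indices (or stream URLs) are placeholders, and each frame would be passed to whatever detector/tracker you use, tagged with its camera id:

```python
import cv2

sources = [0, 1]  # placeholder: device indices or RTSP/HTTP stream URLs
captures = [cv2.VideoCapture(src) for src in sources]

while True:
    for cam_id, cap in enumerate(captures):
        ok, frame = cap.read()
        if not ok:
            continue
        # run detection / tracking on `frame` here, keyed by cam_id
        cv2.imshow(f"camera {cam_id}", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

for cap in captures:
    cap.release()
cv2.destroyAllWindows()
```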

Most efficient way to build an evolvable world map for an HTML game

I have a multiplayer cooperative game project in mind and my main concern is the game map.
A bit of context
The players interact with a world map, which is pre-generated at first. The map should be tile-based (each tile representing a part of the world). However, the players should be able to change the map (build something here, destroy another thing there), and these modifications should be visible to all other players.
Question
What is an efficient way of doing this ?
A classic array stored server-side, updated whenever a user performs an action? Wouldn't it be quite CPU-consuming on the client side to build the map from that array? (image maps? <map></map>)
Use a game "engine" such as GDevelop or Babylon.js?
From my point of view, the array solution seems an easy way to keep the map fully customizable, but I don't have any experience on this topic.
I recently had a look at this map generator and tried to build a map with it (<map></map>), but it does not allow me to customize the map after it has been generated.
I think your best option would be to:
Store the map data in a simple serializable data structure, for example a 2D array of objects holding a few integers: a tile-type enum, a building type, state data if you need it, etc. That will allow you to easily serialize the data and send it between the server and the clients.
Use a game engine / canvas renderer / WebGL renderer to render a view for the client from that data array. I have experience with PIXI.js (a 2D rendering framework using either WebGL or Canvas) and Phaser (a 2D game engine built on top of PIXI), so I can recommend those two if your game is 2D. PIXI handles only rendering; there is no game logic in it, so you will have to implement that yourself. It's good if the game is not that complex or if you want to learn how to do things on your own. Phaser, on the other hand, is a full game engine with all sorts of game-development functionality, but that also means it's more bloated with things you might not need.
When a user clicks something, send the server a message like "user X clicked tile x,y", process the input, edit the main data array, and send the change back to all clients. You can use WebSockets for that, or just plain HTTP requests.
Alternatively, you can use one of the "big" game engines and compile down to JS and HTML from there: Unity, Godot, or Cocos Creator (in this last one you actually write JS).

Is it possible to embed [bokeh] high level charts?

It seems most Bokeh embedding examples use a bokeh.plotting.figure object. Is it possible to embed a high-level chart, like bokeh.charts.Bar or bokeh.charts.Scatter? Or is it possible to convert a high-level chart to a bokeh.plotting.figure object?
Thanks a lot.
The User's Guide section on Bokeh APIs has a good rundown of how all these parts fit together, which I would suggest reading.
The long story short: regardless of which API you use, bokeh.plotting or bokeh.charts, the end result is always just a collection of the same low-level bokeh.models objects. You can think of bokeh.models as very basic building blocks, and the higher-level APIs as conveniences that help you assemble the building blocks more efficiently and correctly.
So, in that light, yes, it is perfectly fine to embed a bokeh.chart using exactly the same functions described in Embedding Plots and Apps.
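For example, here is a minimal sketch using the bokeh.charts API of that era together with bokeh.embed.components; the data and column names are made up purely for illustration:

```python
from bokeh.charts import Bar           # high-level charts API
from bokeh.embed import components

# toy data, purely for illustration
data = {"fruit": ["apples", "pears", "plums"], "count": [10, 4, 6]}
chart = Bar(data, label="fruit", values="count", title="Fruit counts")

# components() works on high-level charts just like on bokeh.plotting figures
script, div = components(chart)
print(div)     # place the <div> in your template, with the <script> alongside it
```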
The one thing I will add is that if you need to update the plot's data after the fact, in place, then the bokeh.plotting API will probably be more straightforward, because the mapping between your data and what gets plotted is more direct. Things generated by bokeh.charts may transform your input data into entirely different forms before plotting (e.g. you give it a series, and Histogram has to spit out coordinates for boxes, not the data you started with).