I am working on a project for detecting abandoned luggage in train stations and airports. Is there a dataset that contains all kinds of bags and luggage? I have searched a lot but can't find any, and I would really appreciate it if someone could help me.
Thanks!
I was also looking for this kind of dataset and haven't found any.
So my solution was to build one from the images of the COCO dataset.
Download the annotation metadata and keep only the labelled images that contain one of these 3 classes: backpack, suitcase, handbag (from here on I'll refer to them all as just "bag").
Get rid of the pictures where the bag is carried by a person. To do this, write a script that considers the bounding rects of all objects in the picture: if a bag's bounding box intersects the bounding box of a person object, that's a good indicator that the bag is in a person's hands or on their back/shoulder. The remaining bags may be considered abandoned.
Save the links to these images in a separate file and download them with curl.
Note that this approach still requires manual clean-up after the images are downloaded, but it's the best I could come up with in the absence of a ready-to-use dataset. A rough sketch of the filtering script is below.
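A minimal sketch of steps 1 and 2, assuming pycocotools and the standard COCO 2017 annotation file (the path and the exact overlap heuristic are just illustrations of the steps above):

```python
from pycocotools.coco import COCO

# Path is an assumption; point it at whichever COCO annotation file you use
coco = COCO('annotations/instances_train2017.json')

bag_ids = coco.getCatIds(catNms=['backpack', 'suitcase', 'handbag'])
person_id = coco.getCatIds(catNms=['person'])[0]

def intersects(a, b):
    # COCO bboxes are [x, y, width, height]
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

# Union of all images containing at least one bag class
img_ids = set()
for cid in bag_ids:
    img_ids.update(coco.getImgIds(catIds=[cid]))

urls = []
for img_id in img_ids:
    anns = coco.loadAnns(coco.getAnnIds(imgIds=[img_id]))
    bags = [a['bbox'] for a in anns if a['category_id'] in bag_ids]
    people = [a['bbox'] for a in anns if a['category_id'] == person_id]
    # Keep the image if at least one bag overlaps no person box
    if any(all(not intersects(bag, p) for p in people) for bag in bags):
        urls.append(coco.loadImgs([img_id])[0]['coco_url'])

with open('urls.txt', 'w') as f:
    f.write('\n'.join(urls))
# then download, e.g.: xargs -n 1 curl -O < urls.txt
```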
I have been working on this problem for weeks, but to no avail.
My problem is:
The deep learning model has the following specification:
INPUT: a sequence of images
OUTPUT: what is happening in the sequence, i.e. classify the activity as one of 10 possible activities.
I have two cameras recording the same activity from two views; how could I combine those two views to improve the accuracy?
I think you could use DELF features: extract features from both views of the same scene and combine them.
How to combine the two views depends entirely on your understanding of the problem. Let me give you two different cases:
CASE I: when you review your training data, you can easily tell which camera is better for some data, e.g. one camera may capture everything useful while the other may not, due to occlusions (note: I am not saying one camera is always better than the other). In this case, you may use a late fusion technique to fuse the two features representing the sequences from the two cameras; a minimal sketch follows.
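Here is what I mean by late fusion, assuming you already have one feature vector per camera from whatever sequence encoder you use (the dimensions and the weight are made-up example values):

```python
import numpy as np

# Stand-ins for the per-camera sequence descriptors (e.g. the last
# hidden state of a CNN+LSTM run over each 10-frame sequence)
feat_cam1 = np.random.rand(256)
feat_cam2 = np.random.rand(256)

# Plain late fusion: concatenate and feed one classifier
# over the 10 activity classes
fused = np.concatenate([feat_cam1, feat_cam2])      # shape (512,)

# Weighted variant when one camera tends to be more reliable;
# the 0.7 weight is an arbitrary example value
w = 0.7
fused_weighted = np.concatenate([w * feat_cam1, (1 - w) * feat_cam2])
```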
CASE II: it is difficult for you to tell which camera is better. This basically indicates that you may not see a large performance boost from considering both cameras, though perhaps a small improvement.
Finally, when you say two cameras, is it possible for you to do something like binocular stereo vision? In that case you could obtain extra depth information that is not available from either camera alone, which may be helpful for the recognition task.
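If you do go the stereo route, here is a minimal sketch using OpenCV's block matcher (the file names are placeholders, and the cameras would need to be calibrated and the frames rectified first):

```python
import cv2

# Rectified grayscale frames from the two cameras (placeholder names)
imgL = cv2.imread('cam_left.png', cv2.IMREAD_GRAYSCALE)
imgR = cv2.imread('cam_right.png', cv2.IMREAD_GRAYSCALE)

# Block matching yields a disparity map, i.e. inverse depth, which could
# be fed to the recognition model as an extra input channel
stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL, imgR)  # 16-bit fixed-point disparities

# Rescale for visual inspection and save
disp_vis = cv2.normalize(disparity, None, 0, 255,
                         cv2.NORM_MINMAX).astype('uint8')
cv2.imwrite('disparity.png', disp_vis)
```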
I'm using Autodesk Forge to integrate with our remodeling tool. In particular, I need to count objects of different families and types and determine which room they actually belong to. I use the Model Derivative API for this purpose. To keep the room/area information I convert .rvt files to .nwc files as suggested here. However, when I retrieve data with GET /modelderivative/v2/designdata/{urn}/metadata/{guid}/properties I face the following problems from time to time:
Room information sometimes disappears from objects for no apparent reason
Objects disappear from the result data (though they seem to exist when I browse them in A360)
I have no idea what the reason for this could be.
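For reference, this is roughly how I make the call (a sketch; the URN, GUID, and token below are placeholders for my real values):

```python
import requests

URN = '<base64-encoded-urn>'        # placeholder
GUID = '<metadata-guid>'            # placeholder
TOKEN = '<oauth-access-token>'      # placeholder

url = ('https://developer.api.autodesk.com/modelderivative/v2/'
       f'designdata/{URN}/metadata/{GUID}/properties')
resp = requests.get(url, headers={'Authorization': f'Bearer {TOKEN}'})
resp.raise_for_status()

# Each object in the collection carries its properties, grouped by
# category; the room information, when it survives, appears in here
for obj in resp.json()['data']['collection']:
    print(obj['objectid'], obj.get('name'), obj.get('properties', {}))
```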
I have no explanation for the disappearance of room data or objects for you.
If you can provide a reproducible case demonstrating that, I will gladly pass it on to the development team for analysis.
If you are interested in an immediate reliable solution and full control, which I assume is the case, I would suggest following the second bullet item in the advice provided by Eason in the previous answer that you refer to above:
Extract all the room information and object relationships you are interested in via the Revit API, store that data somewhere yourself, and use it later on wherever you like to your heart's content.
Then you will be completely safe and independent of all other components and their unpredictable behaviour.
If the only information that you need is the room containing each family instance, I can even implement a suitable Revit add-in for you.
Another suggestion that might help, if that is indeed the data you require: determine that information in a Revit add-in and attach it to each family instance in your own shared parameter. That will ensure it remains intact through the translation process. AFAIK, all shared parameter data is retained, independent of other behaviour.
I tried following the siamese network MNIST example for Caffe, as well as many other Stack Overflow posts and Google Groups threads, but the information is always incomplete or a dead end. All I am trying to do is feed a siamese network 2 RGB images to calculate their similarity.
What I've done so far: I concatenated the 2 RGB images into one, converted it to leveldb, and edited the slicing layer in "mnist_siamese_train_test.prototxt" to "slice_point: 3". From what I understand, the problem is now with the channels. How do I fix this issue? I haven't found any useful resource that explains how to do this or fits my case. Please let me know if there is another way entirely, such as feeding the network directories and lists instead of leveldb files of concatenated images. Let me know if anything needs further explanation.
You can find the answer in detail in this thread; in short, you have two options:
Use a Slice layer to slice the blob you created in the lmdb. As you pointed out in the question, with slice_point: 3 and a 6-"channel" image (3 channels for each image), it should split the blob into 2 images with 3 channels each (see the sketch after this list).
Use 2 different InputDataLayers, each one with a different file; you can see a working example in the thread.
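For option 1, here is a sketch using pycaffe's NetSpec to generate the relevant prototxt (the layer and file names are illustrative): the 6-channel blob is split back into two 3-channel images along the channel axis.

```python
import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
# pair_data: N x 6 x H x W blob holding the two concatenated RGB images;
# sim: the 0/1 similarity label stored alongside it
n.pair_data, n.sim = L.Data(batch_size=64, backend=P.Data.LMDB,
                            source='train_pairs_lmdb', ntop=2)
# Slice along the channel axis (axis=1) at channel 3
# -> two N x 3 x H x W blobs
n.data, n.data_p = L.Slice(n.pair_data, ntop=2,
                           slice_param=dict(axis=1, slice_point=3))
print(n.to_proto())  # emits the prototxt for these layers
```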
Now, as pointed out, you seem to be doing the right thing. Can you copy-paste your error and your .prototxt file here?
Also check that the dimension along which you are slicing is correct.
I tried to look for free H.264 editors that could do the tasks described below, but I couldn't find any suitable results (probably due to my incorrect(?) search terms).
Basically, I have quite a few 20 to 40 second looping movie files (rendered with Adobe Premiere), and I would like to multiply an individual movie about ten or twenty fold and then save it again, preferably without the need to re-render (or re-encode). Is this possible?
Hopefully I managed to make myself understood, thanks :)
After a whole bunch of searching and testing, I ended up with Avidemux, and it did the task perfectly this time around. I'm sure there would have been a way to automate the appending / "multiplying" of the file, but with a tablet pen (main click set as double click) and a dedicated folder it truly went by like a breeze.
I was also interested in Smart Cutter (I quite enjoyed the interface after getting used to it), but since it is trialware the choice was clear. Still, if I had to do this kind of task more often, I might consider purchasing it.
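For the record, the automation should be doable without re-encoding via ffmpeg's concat demuxer; a sketch (assuming ffmpeg is on the PATH and the container is one the demuxer accepts):

```python
import subprocess

clip = 'loop.mp4'   # placeholder: one of the 20-40 s rendered clips
copies = 20         # how many times to repeat it

# The concat demuxer reads a text file listing the inputs in order
with open('list.txt', 'w') as f:
    f.write(f"file '{clip}'\n" * copies)

# -c copy copies the streams as-is, so nothing is re-encoded
subprocess.run(['ffmpeg', '-f', 'concat', '-safe', '0',
                '-i', 'list.txt', '-c', 'copy', 'looped.mp4'],
               check=True)
```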
I am trying to develop a strategy game and I am thinking of creating game maps similar to the ones below.
(example map image; source: totaldiplomacy.com)
(example map image; source: teamteabag.com)
How do I go about doing this, and what kind of software should I use, or which books/tutorials should I follow?
Thanks
Assuming that you can draw the graphics that you need, the rest is accomplished by defining the "territories".
A territory will have
a name
a location (just a simple position would probably suffice, one for each place where you want to draw the key bits of information)
a list of neighboring territories
any other game-relevant information, such as what units are there, what resources it provides, etc.
The "hard" bit may be generating the connectivity graph. It's probably easiest to refer to each of your territories by number, as in your second image. Then, the "list of neighboring territories" for territory 14 would be 13, 15, and 23. So don't try to do this automatically, it'll be much easier (as long as the scope doesn't grow too large) to just define this manually.
In the general case, ignoring language and framework, you want to have two things:
a model, which in both those examples would store all the domains, armies, etc.
a map view, which in the simple case is an image file plus some kind of tagging to indicate where each bit of the model is drawn.
If you're looking to program games, I would recommend the XNA framework. There are a lot of good resources for new programmers; head over to http://msdn.microsoft.com/en-us/library/bb203893%28v=XNAGameStudio.40%29.aspx to get started on your first game!