I am training YOLOv5. My labels.txt files contain 60 labels, but I want to train the model on only 3 classes. How can I do that? - yolov5

I am training YOLOv5 on the xView dataset, which contains 60 classes. The label.txt files contain 60 labels, but I want to train the model on only 3 classes so that training is faster. Does anyone know how I can do that? Should I change the class names in data.yaml?

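Delete all the classes you don't want to use from the .txt label files that belong to the images in your dataset (you can write a shell script to do it). Then update label.txt and data.yaml for the new 3-class setup (nc: 3 and the three class names). Because YOLOv5 expects class indices from 0 to nc-1, the three remaining classes also have to be renumbered to 0, 1 and 2 in the label files. That should work.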
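A minimal Python sketch of the label-filtering step, assuming the standard YOLO label format (one "class x_center y_center width height" line per object). The helper names, the directories and the class IDs in keep_ids are hypothetical placeholders; substitute the xView IDs of the three classes you actually want. The order of keep_ids decides the new IDs 0, 1, 2, so list the names in data.yaml in that same order.

```python
from pathlib import Path


def filter_label_lines(lines, keep_ids):
    """Keep only objects whose class is in keep_ids and renumber them 0..len(keep_ids)-1."""
    # Old class id -> new contiguous id, in the order given by keep_ids.
    remap = {old: new for new, old in enumerate(keep_ids)}
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cls = int(parts[0])
        if cls in remap:
            kept.append(" ".join([str(remap[cls])] + parts[1:]))
    return kept


def filter_label_dir(label_dir, out_dir, keep_ids):
    """Apply filter_label_lines to every *.txt label file in label_dir, writing results to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in Path(label_dir).glob("*.txt"):
        kept = filter_label_lines(src.read_text().splitlines(), keep_ids)
        # A file left with no objects is written empty (treated as a background image).
        (out / src.name).write_text("".join(line + "\n" for line in kept))


# Hypothetical usage: keep three made-up class ids and write the filtered labels elsewhere.
# filter_label_dir("xView/labels/train", "xView3/labels/train", keep_ids=[5, 12, 40])
```

Writing to a separate directory keeps the original 60-class labels intact. Keep in mind that YOLOv5 locates each label file by replacing the images directory with labels in the image path, so the filtered files have to end up in that labels position (for example in a copied dataset tree). As far as I know, YOLOv5 treats images with empty label files as background images; if you would rather drop those images, remove them separately.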

Related

Is there any way to combine two YOLOv5 models with different classes using an ensemble technique?

I have two trained models: one for cvv and the other for cards and no_cards. How can I combine the two models' .pt files in the detect.py command so that I get a single output from both models?
I've tried it like this:
weights=['runs/exp5/weights/best.pt', 'runs/exp6/weights/best.pt'], # model.pt path(s)
I am getting an error. How can I modify this? Kindly suggest.
Thanks in advance.
I have also tried writing
weights=['yolov5x.pt', 'yolov5l.pt']
and in yolov5x.pt I set the number of classes (nc) to 2, and in yolov5l.pt to 1.
But I also want to use my trained exp models, and I'm not sure where to specify them.

How can I add an extra class to an existing YOLOv5 model?

I need to add one extra class to the existing 80 classes of YOLOv5. I am aware of custom training, but after that the model will lose the information about the 80 pretrained classes. My requirement is the existing 80 classes + 1 custom class.
Check the answer here: https://github.com/ultralytics/yolov5/issues/1071
you can label your new classes starting from 80 and then simply append
your new data to coco in your data.yaml. See GlobalWheat2020.yaml for
an example of training on multiple datasets.
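A small Python sketch of the relabelling step the quoted answer describes: if the new dataset was annotated with its own class 0, shifting every class ID by 80 places it after the 80 COCO classes. The function name and the file path in the usage comment are hypothetical. The matching data.yaml would then need nc: 81, the 80 COCO names followed by the new class name, and a train entry covering both the COCO images and the new images (GlobalWheat2020.yaml shows train given as a list of several image directories).

```python
def shift_class_ids(lines, offset=80):
    """Shift the class id of every YOLO label line by offset (custom class 0 -> 80)."""
    shifted = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        shifted.append(" ".join([str(int(parts[0]) + offset)] + parts[1:]))
    return shifted


# Hypothetical usage on one label file of the new dataset:
# from pathlib import Path
# p = Path("my_data/labels/train/img001.txt")
# p.write_text("".join(l + "\n" for l in shift_class_ids(p.read_text().splitlines())))
```

Only the class column changes; the box coordinates are passed through untouched, so the labels stay valid YOLO format.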

How to train a DNN classification model using arbitrary data features stored in MySQL?

I have a term project that needs to use data stored in MySQL to train a classification model with TensorFlow or another framework.
I've tried the examples from https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/keras/feature_columns.ipynb, but it took me a lot of time to export the data to a CSV file and modify the Python script. Since I need to run a lot of experiments, is there a simpler tool for training and experimenting on my MySQL dataset?
Maybe SQLFlow can meet your needs. I tried to write an SQLFlow statement for the dataset you provided; it should look like this:
SELECT *
FROM Heart_Disease
TRAIN DNNClassifier /* a pre-defined TensorFlow estimator, tf.estimator.DNNClassifier */
WITH n_classes = 3, hidden_units = [10, 20] /* parameters of the Estimator class constructor */
COLUMN Age, Sex, CP, FBS .. /* from the raw data, list the columns you think will help predict the target */
LABEL Target /* label column */
INTO Heart_Disease.test_model; /* the trained model is saved to the specified table */
It is also very easy to apply this model:
SELECT *
FROM Heart_Disease.predict
PREDICT Heart_Disease.predict_result.Target
USING Heart_Disease.test_model;
The Target column of Heart_Disease.predict is empty; the predicted Target values are saved to the Target column of the Heart_Disease.predict_result table.
FYI: https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/demo.md
This is my first answer; I hope it helps.
I think what you can do is take a dump of the data from MySQL, if the data is not real-time and is not being updated, and then use that dump for everything else,
or you can open a MySQL connection and pass it to the pandas read_sql function to get a DataFrame.
A way to do that
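A minimal sketch of the read_sql route, assuming pandas is installed. The table name heart, its columns and the helper name load_training_frame are hypothetical; an in-memory SQLite database stands in for MySQL so the example runs without a server. For MySQL you would pass an SQLAlchemy engine instead (the connection URL in the comment is a placeholder).

```python
import sqlite3

import pandas as pd


def load_training_frame(con, table, feature_cols, label_col):
    """Pull feature and label columns from a SQL table into pandas as (X, y).

    Table and column names are interpolated directly, so only pass trusted names.
    """
    cols = ", ".join(feature_cols + [label_col])
    df = pd.read_sql(f"SELECT {cols} FROM {table}", con)
    return df[feature_cols], df[label_col]


# For MySQL you would typically pass an SQLAlchemy engine (hypothetical credentials):
#   from sqlalchemy import create_engine
#   con = create_engine("mysql+pymysql://user:password@localhost/mydb")
# Here an in-memory SQLite database stands in so the example runs without a server.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE heart (Age INTEGER, Sex INTEGER, Target INTEGER)")
con.executemany("INSERT INTO heart VALUES (?, ?, ?)", [(63, 1, 1), (41, 0, 0), (57, 1, 2)])
X, y = load_training_frame(con, "heart", ["Age", "Sex"], "Target")
```

X and y can then go straight into a Keras model or an Estimator input function, which avoids the CSV export step entirely.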
Also, if you're new to TensorFlow, you should look at TensorFlow's Estimator API, which should do the job. Alternatively, you can use TensorFlow's Keras API, which also makes building a neural network easier.

CNTK: loading pictures with class affiliation as a percentage

I am trying to build a neural network with CNTK to estimate a person's age.
For now I want to try an approach that uses only one class: every picture gets label 0, plus an affiliation to that class in percent.
So the net should learn that the probability of a 30-year-old person matching class 0 is 30% ... 60-year-old = 60% ... 93-year-old = 93%.
Currently I am working on a reduced dataset of 50k images (.jpg) and use the MinibatchSourceFromData function.
Since I have a lot more training data available (400k + augmentations), I want to load the pictures in chunks for training, because the server's RAM is limited.
Following this CNTK tutorial, I have to use the MinibatchSource function and feed a deserializer a map_file that contains the paths and labels of my training data.
My problem is that the map_file doesn't support class affiliations; I can only define which class each picture belongs to.
Since I am new to CNTK and deep learning in general, I'd like to know if there is another option to read chunked data as well as tell the network how likely it is that the picture corresponds to a specific class.
Best regards.
You can create a composite reader: one deserializer reads your images, and another reads your numeric data.
Read this; the last section shows how to use a composite reader.
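A sketch of preparing the two input files for such a composite reader, in Python. Assumptions: each sample is an (image path, age) pair, the affiliation is age / 100 as described in the question, and the stream name affiliation and the helper names are hypothetical. As I understand CNTK's composite reader, sequences from the two deserializers are matched by key, with the image deserializer using the map-file line index as the key, which is why the CTF sequence ID below is that index. The commented CNTK calls only indicate how the files might be consumed; verify them against your CNTK version and the linked documentation.

```python
def composite_reader_lines(samples, max_age=100.0):
    """Build map-file lines and matching CTF lines for (image_path, age) samples.

    Map file line: <image path>\t<class label>; every image gets class 0, as in the question.
    CTF line: <sequence id> |affiliation <age / max_age>; the id is the map-file line index.
    """
    map_lines, ctf_lines = [], []
    for seq_id, (image_path, age) in enumerate(samples):
        map_lines.append(f"{image_path}\t0")
        ctf_lines.append(f"{seq_id} |affiliation {age / max_age:.2f}")
    return map_lines, ctf_lines


def write_lines(path, lines):
    """Write one line per entry to path."""
    with open(path, "w") as fh:
        fh.writelines(line + "\n" for line in lines)


# Assumed CNTK usage (not run here; check against your CNTK version):
# import cntk.io as cio
# images = cio.ImageDeserializer("train_map.txt", cio.StreamDefs(
#     features=cio.StreamDef(field="image", transforms=[]),  # add scale/crop transforms
#     labels=cio.StreamDef(field="label", shape=1)))
# ages = cio.CTFDeserializer("train_affiliation.ctf", cio.StreamDefs(
#     affiliation=cio.StreamDef(field="affiliation", shape=1, is_sparse=False)))
# reader = cio.MinibatchSource([images, ages])
```

Because both files are read by deserializers rather than held in memory, this keeps the chunked loading the question needs for the 400k images.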

Extract aligned sections of a FASTA alignment to a new file

I've already looked here and in other forums, but couldn't find an answer to my question. I want to design baits for a target-enrichment sequencing approach, and I have the output of a MarkerMiner search for orthologous loci from four different genomes, with A. thaliana as the reference. The output alignments are separate FASTA files, one for each annotated A. thaliana gene, with the sequences from my datasets aligned to it.
I have already run a script that selects the loci supported as orthologous by at least two of my four input datasets.
Now, however, I'm stumped.
My alignments are gappy, because the input data is mostly RNA-seq whereas the reference also contains the introns. So they look like this:
>AT01G1234567
ATCGATCGATGCGCGCTAGCTGAATCGATCGGATCGCGGTAGCTGGAGCTAGSTCGGATCGC
>MyData1
CGATGCGCGC-----------CGGATCGCGG---------------CGGATCGC
>MyData2
CGCTGCGCGC------------GGATAGCGG---------------CGGATCCC
To design baits effectively, I now need to extract all the aligned parts from each file, so that I end up with separate files (or separate alignments within the file) for the regions aligned between MyData and the reference sequence, with all the gappy parts excluded. There are about 1300 of these FASTA files, so doing it manually is not an option.
I have a bit of programming experience in Python and with Linux command-line tools, but I am completely lost on how to go about this. I would appreciate a hint on what kinds of tools are out there that I could use, or what kind of algorithm I need to come up with.
Thank you.
Cheers
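One possible algorithm, sketched in plain Python (no Biopython needed): read each alignment, find the runs of columns in which no sequence has a gap, and write every run of at least min_len columns to its own FASTA file. The helper names, min_len and the output naming scheme are hypothetical. Counting a column as aligned only when all sequences are gap-free is a design choice; you might instead require the reference plus at least one of your datasets.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict {header: sequence}, preserving order."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may be wrapped over several lines
    return {k: "".join(v) for k, v in records.items()}


def gap_free_blocks(records, min_len=1, gap="-"):
    """Yield (start, end) column ranges (0-based, end exclusive) where no sequence has a gap."""
    seqs = list(records.values())
    if not seqs:
        return
    # A real alignment has equal lengths; min() guards against ragged input.
    length = min(len(s) for s in seqs)
    start = None
    for col in range(length + 1):  # the extra step closes a block ending at the last column
        ok = col < length and all(s[col] != gap for s in seqs)
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_len:
                yield start, col
            start = None


def write_blocks(records, out_prefix, min_len=1):
    """Write each gap-free block to <out_prefix>_block<i>_<start>-<end>.fasta (1-based coords)."""
    paths = []
    for i, (start, end) in enumerate(gap_free_blocks(records, min_len), 1):
        path = f"{out_prefix}_block{i}_{start + 1}-{end}.fasta"
        with open(path, "w") as fh:
            for name, seq in records.items():
                fh.write(f">{name}\n{seq[start:end]}\n")
        paths.append(path)
    return paths
```

For the ~1300 files, loop over them with glob.glob and call write_blocks(read_fasta(open(path)), prefix) for each, choosing min_len close to your intended bait length so that short fragments are skipped.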