Is there any way to combine two YOLOv5 models with different classes using the ensemble technique? - yolov5

I have two trained models: one for cvv and another for cards and no_cards. When running detect.py, how can I combine the two models' .pt files so that I get a single output from both?
I tried the following:
weights=['runs/exp5/weights/best.pt', 'runs/exp6/weights/best.pt'], # model.pt path(s)
I am getting an error. How can I modify this? Kindly suggest.
Thanks in advance.
I have also tried writing
weights=['yolov5x.pt', 'yolov5l.pt']
and set the number of classes (nc) to 2 for yolov5x.pt and to 1 for yolov5l.pt.
However, I also want to specify the trained models from the exp runs, and I am not sure where to put them.
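One possible workaround, as a minimal sketch only: it assumes that the weight-list ensemble expects all models to share one class list, so models with different classes are run separately and their detections merged afterwards. The detection layout (x1, y1, x2, y2, confidence, class index), the function name merge_detections and the sample boxes are all hypothetical and not part of YOLOv5.

```python
from typing import List, Tuple

# One detection: (x1, y1, x2, y2, confidence, class_index). Assumed layout.
Detection = Tuple[float, float, float, float, float, int]


def merge_detections(
    dets_a: List[Detection],
    dets_b: List[Detection],
    num_classes_a: int,
) -> List[Detection]:
    """Combine detections from two models into one shared class index space.

    Classes of model A keep their indices; classes of model B are shifted by
    num_classes_a so that the two label sets do not collide.
    """
    shifted_b = [
        (x1, y1, x2, y2, conf, cls + num_classes_a)
        for (x1, y1, x2, y2, conf, cls) in dets_b
    ]
    # Sort by confidence, highest first, so the result reads like one model's output.
    return sorted(dets_a + shifted_b, key=lambda d: d[4], reverse=True)


# Hypothetical example: model A knows card / no_card, model B knows cvv.
names = ["card", "no_card"] + ["cvv"]
cards = [(10.0, 20.0, 300.0, 200.0, 0.91, 0)]
cvv = [(220.0, 150.0, 260.0, 170.0, 0.84, 0)]

for *box, conf, cls in merge_detections(cards, cvv, num_classes_a=2):
    print(names[cls], conf, box)
```

The combined names list has to follow the same order as the offsets, that is, model A's classes first and model B's classes after them.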

Related

BentoML - Serving a CatBoostClassifier with cat_features

I am trying to create a BentoML service for a CatBoostClassifier model that was trained using a column as a categorical feature. If I save the model and make predictions with the saved model (not as a BentoML service), everything works as expected, but when I create the service using BentoML I get an error:
_catboost.CatBoostError: Bad value for num_feature[non_default_doc_idx=0,feature_idx=2]="Tertiary": Cannot convert 'b'Tertiary'' to float
The value is found in a column named 'road_type' and the model was trained using 'object' as the data type for the column.
If I try to give a float or an integer for the 'road_type' column I get the following error
_catboost.CatBoostError: catboost/libs/data/model_dataset_compatibility.cpp:53: Feature road_type is Categorical in model but marked different in the dataset
If someone has encountered the same issue and found a solution I would appreciate it. Thanks!
I have tried different approaches for saving and loading the model, but unfortunately none of them worked.
You can try to explicitly pass the cat_features to the bentoml runner.
It would be something like this:
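import bentoml
from catboost import Pool

# Load the saved CatBoost model as a BentoML runner
runner = bentoml.catboost.get("bentoml_catboost_model:latest").to_runner()
cat_features = [2]  # indexes of your categorical columns (the error above reports feature_idx=2)
# Wrap the input in a Pool so the categorical columns are declared explicitly
prediction = runner.predict.run(Pool(input_data, cat_features=cat_features))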

I am training Yolov5. I have labels.txt files that contain 60 labels, but I want to train the model only on 3 classes, how can I do that?

I am training YOLOv5 on the xView dataset, which contains 60 classes. The label .txt files contain 60 labels, but I want to train the model on only 3 classes so that training is faster. Does anyone know how I can do that? Should I change the class names in data.yaml?
Delete the rows for all the classes you don't want from the label .txt files that belong to the images in the dataset (you can write a shell script to do this). Then update label.txt and data.yaml for the new 3-class setup, renumbering the remaining class IDs to 0, 1 and 2 so that they match the names in data.yaml. That should work.
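The filtering step could be sketched in Python instead of a shell script. This is only a minimal sketch, assuming the usual YOLO label layout (one object per line, with an integer class ID followed by the box coordinates). The mapping in CLASS_MAP and the function names are hypothetical and need to be replaced with your own classes.

```python
from pathlib import Path
from typing import Dict, List


def filter_label_lines(lines: List[str], class_map: Dict[int, int]) -> List[str]:
    """Keep only rows whose class ID is in class_map and renumber them."""
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        old_id = int(parts[0])
        if old_id in class_map:
            kept.append(" ".join([str(class_map[old_id])] + parts[1:]))
    return kept


def filter_label_dir(src: Path, dst: Path, class_map: Dict[int, int]) -> None:
    """Write filtered copies of every .txt label file in src into dst."""
    dst.mkdir(parents=True, exist_ok=True)
    for label_file in src.glob("*.txt"):
        lines = label_file.read_text().splitlines()
        kept = filter_label_lines(lines, class_map)
        (dst / label_file.name).write_text("\n".join(kept) + ("\n" if kept else ""))


# Hypothetical mapping: keep old classes 5, 17 and 42 as new classes 0, 1 and 2.
CLASS_MAP = {5: 0, 17: 1, 42: 2}

sample = ["5 0.5 0.5 0.1 0.1", "9 0.2 0.2 0.1 0.1", "42 0.7 0.3 0.2 0.2"]
print(filter_label_lines(sample, CLASS_MAP))
# ['0 0.5 0.5 0.1 0.1', '2 0.7 0.3 0.2 0.2']
```

Writing to a separate output directory keeps the original 60-class labels intact in case a different subset is needed later.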

How do I get molecular structural information from SMILES

My question is: is there any algorithm that can convert a SMILES structure into a topological fingerprint? For example, if glycerol is the input, the answer would be 3x -OH, 2x -CH2 and 1x -CH.
I'm trying to build a Python script that can predict the density of a mixture using an artificial neural network. As input I want to have the structure/fingerprint of my molecules, starting from the SMILES structure.
I'm already familiar with RDKit and the Morgan fingerprint, but that is not what I'm looking for. I'm also aware that I can use the 'matching substructure' search in RDKit, but then I would have to define all the different subgroups. Is there a more convenient/shorter way?
For most structures, there is no existing option to find the fragments. However, there is a module in RDKit that can give you the number of fragments, especially when the fragment is a functional group. Check it out here. As an example, let's say you want to find the number of aliphatic -OH groups in your molecule. You can simply call the following function to do that:
from rdkit.Chem.Fragments import fr_Al_OH
fr_Al_OH(mol)
or the following would return the number of aromatic -OH groups:
from rdkit.Chem.Fragments import fr_Ar_OH
fr_Ar_OH(mol)
Similarly, there are 83 more functions available. Some of them would be useful for your task. For the groups that have no pre-written function, you can always go to the source code of these RDKit modules, figure out how they did it, and then implement the same approach for your features. But as you already mentioned, the way to do that would be to define a SMARTS string and then use fragment matching. The fragment matching module can be found here.
If you want to predict densities of pure components before predicting the mixtures I recommend the following paper:
https://pubs.acs.org/doi/abs/10.1021/acs.iecr.6b03809
You can use the fragments specified by RDKit as mnis proposes. Or you could specify the groups as SMARTS patterns and look for them yourself using GetSubstructMatches, as you proposed.
Dissecting a molecule into specific groups is not as straightforward as it might appear at first. You could also use an algorithm I published a while ago:
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0382-3
It includes a list of SMARTS for the UNIFAC model, but you could also use them for other things, like density prediction.

Extract aligned sections of FASTA to new file

I've already looked here and in other forums, but couldn't find the answer to my question. I want to design baits for a target enrichment sequencing approach and have the output of a MarkerMiner search for orthologous loci from four different genomes, with A. thaliana as the reference. The output alignments are separate FASTA files, one for each annotated A. thaliana gene, with the sequences from my datasets aligned to it.
I have already run a script to filter out those loci supported to be orthologous by at least two of my four input datasets.
However, now, I'm stumped.
My alignments are gappy, since the input data is mostly RNAseq whereas the Reference contains the introns as well. So it looks like this :
AT01G1234567
ATCGATCGATGCGCGCTAGCTGAATCGATCGGATCGCGGTAGCTGGAGCTAGSTCGGATCGC
MyData1
CGATGCGCGC-----------CGGATCGCGG---------------CGGATCGC
MyData2
CGCTGCGCGC------------GGATAGCGG---------------CGGATCCC
To design baits effectively, I now need to extract all the aligned parts from each file, so that I end up with separate files (or separate alignments within the file) for the parts that are aligned between MyData and the reference sequence, with all the gappy parts excluded. There are about 1300 of these FASTA files, so doing it manually is not an option.
I have a bit of programming experience in python and with Linux command line tools, however I am completely lost on how to go about this. I would appreciate a hint, on what kind of tools are out there I could use or what kind of algorithm I need to come up with.
Thank you.
Cheers
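One way to approach this, as a rough sketch rather than a finished tool: scan one aligned data sequence for gap-free column ranges, then cut every sequence in the alignment to those ranges. The sketch assumes standard FASTA files with '>' header lines, rows of equal length, and '-' as the gap character. The function names and the tiny example alignment are made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}; expects '>' header lines."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def aligned_blocks(seq: str, min_len: int = 1, gap: str = "-") -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, where seq has no gap."""
    blocks = []
    start = None
    for i, ch in enumerate(seq):
        if ch != gap and start is None:
            start = i  # a gap-free block begins here
        elif ch == gap and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    # Close a block that runs to the end of the sequence.
    if start is not None and len(seq) - start >= min_len:
        blocks.append((start, len(seq)))
    return blocks


def extract_blocks(records: Dict[str, str], sample: str, min_len: int = 1) -> List[Dict[str, str]]:
    """Cut every sequence to each gap-free block of the chosen sample."""
    return [
        {name: seq[a:b] for name, seq in records.items()}
        for a, b in aligned_blocks(records[sample], min_len)
    ]


# Made-up miniature alignment, not real data.
example = """>Ref
ATCGATCGGATC
>MyData1
ATCG----GATC
"""
records = parse_fasta(example)
for block in extract_blocks(records, "MyData1"):
    print(block)
# {'Ref': 'ATCG', 'MyData1': 'ATCG'}
# {'Ref': 'GATC', 'MyData1': 'GATC'}
```

Raising min_len drops blocks that are too short to carry a bait. With several data sequences per file, the blocks differ per sample, so they would have to be computed per sample or intersected across samples, depending on what the bait design needs.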

ParagraphVectors in deeplearning4j

I am new to deeplearning4j. I am running the ParagraphVectors classifier on a dataset that includes labeled and unlabeled data, and I got a result. When I run it again on the same dataset with the same configuration, I get different results. The new results are close to the previous ones, but why are they slightly different? By slightly different I mean that in the first run it assigns two test samples to the first class, while in the second run it assigns those two samples, or one of them, to another class. This usually happens for just one, two or maybe three samples. For context, we have three classes, all of which are types of cancer.
Any hint/help/advice would be highly appreciated.
I use the configuration below:
paragraphVectors = new ParagraphVectors.Builder()
.learningRate(0.2)
.minLearningRate(0.001)
.windowSize(2)
.iterations(3)
.batchSize(500)
.workers(4)
.stopWords(stopWords())
.minWordFrequency(10)
.layerSize(100)
.epochs(1)
.iterate(iterator)
.trainWordVectors(true)
.tokenizerFactory(tokenizerFactory)
.build();
The problem turned out to be bad input with the tokenizer.