Yolo V5 issue "Exception: Dataset not found." on local machine - yolov5

I am trying to train a model using YOLOv5.
I am getting a "Dataset not found" error.
I have train, test, and valid folders that contain all the image and label files.
I have tested the files on Google Colab and it does work. However, on my local machine it raises Exception: Dataset not found.
(Yolo_5) D:\\YOLO_V_5\Yolo_V5\yolov5>python train.py --img 416 --batch 8 --epochs 100 --data /data.yaml --cfg models/yolov5s.yaml --weights '' --name yolov5s_results --cache
Using torch 1.7.0 CUDA:0 (GeForce GTX 1080, 8192MB)
Namespace(adam=False, batch_size=8, bucket='', cache_images=True, cfg='models/yolov5s.yaml', data='.\\data.yaml', device='', epochs=100, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[416, 416], local_rank=-1, log_imgs=16, multi_scale=False, name='yolov5s_results', noautoanchor=False, nosave=False, notest=False, project='runs/train', rect=False, resume=False, save_dir='runs\\train\\yolov5s_results55', single_cls=False, sync_bn=False, total_batch_size=8, weights="''", workers=16, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/train", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'warmup_bias_lr': 0.1, 'box': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'anchors': 3, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0}
WARNING: Dataset not found, nonexistent paths: ['D:\\me1eye\\Yolo_V5\\valid\\images']
Traceback (most recent call last):
File "train.py", line 501, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 78, in train
check_dataset(data_dict) # check
File "D:\me1eye\YOLO_V_5\Yolo_V5\yolov5\utils\general.py", line 92, in check_dataset
raise Exception('Dataset not found.')
Exception: Dataset not found.
Internal process exited
(Olive_Yolo_5) D:\me1eye\YOLO_V_5\Yolo_V5\yolov5>

There is a much simpler solution. Just go into data.yaml, wherever you saved it, and change the relative paths to absolute ones - i.e. just write the whole path! e.g.
train: C:\hazlab\BCCD\train\images
val: C:\hazlab\BCCD\valid\images
nc: 3
names: ['Platelets', 'RBC', 'WBC']
Job done - note that, as you are on Windows, there is a known issue with the invocation of train.py: do not use quotes around the file names in the CLI, e.g.
!python train.py --img 416 --batch 16 --epochs 100 --data C:\hazlab\BCCD\data.yaml --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --cache

Well! I have also encountered this problem and have now fixed it.
All you have to do is keep train, test, and valid (the three folders containing images and labels) and the yolov5 folder (cloned from GitHub) in the same directory, as laid out in the example after the command below. Also, the data.yaml file has to be inside the yolov5 folder.
Command to train the model would be like this:
!python train.py --img 416 --batch 16 --epochs 10 --data ./data.yaml --cfg ./models/yolov5m.yaml --weights '' --name yolov5m_results
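The layout described above would then look roughly like this (folder names are taken from the question and answer; adjust them to your own dataset):
project_root/
    train/      (images/ and labels/)
    test/       (images/ and labels/)
    valid/      (images/ and labels/)
    yolov5/     (cloned from GitHub)
        data.yaml
        train.py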

The issue is due to the actual dataset path not being found. I ran into the same issue when I trained a YOLOv5 model on a custom dataset using Google Colab, and I did the following to resolve it.
Make sure you provide the correct path to the dataset's data.yaml.
Make sure the dataset paths inside data.yaml are correct.
The train, test, and valid keys should contain paths relative to the main path of the dataset.
An example data.yaml file is given below.
path: /content/drive/MyDrive/car-detection-dataset
train: train/images
val: valid/images
test: test/images
nc: 1
names: ['car']
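If the error persists, a quick sanity check is to resolve the paths listed in data.yaml yourself and confirm they exist before training. This is only a sketch (it is not part of YOLOv5) and assumes a simple data.yaml like the examples above:
import os
import yaml  # PyYAML

with open("data.yaml") as f:      # adjust to the actual location of your data.yaml
    data = yaml.safe_load(f)

root = data.get("path", "")       # optional top-level dataset root key
for key in ("train", "val", "test"):
    if key in data:
        p = os.path.join(root, data[key])
        print(key, "->", os.path.abspath(p), "exists:", os.path.isdir(p))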

Pandoc Mermaid filter

I am trying to use this pandoc filter to convert markdown to HTML.
This is the example file:
gantt
dateFormat YYYY-MM-DD
title Adding GANTT diagram functionality to mermaid
section A section
Completed task :done, des1, 2014-01-06,2014-01-08
Active task :active, des2, 2014-01-09, 3d
Future task : des3, after des2, 5d
Future task2 : des4, after des3, 5d
section Critical tasks
Completed task in the critical line :crit, done, 2014-01-06,24h
Implement parser and jison :crit, done, after des1, 2d
Create tests for parser :crit, active, 3d
Future task in critical line :crit, 5d
Create tests for renderer :2d
Add to mermaid :1d
This is the command that I am running:
pandoc file.md -f markdown -o out.html --filter=pandoc-mermaid
This is the error message:
File "D:\Anaconda3\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Anaconda3\Scripts\pandoc-mermaid.exe\__main__.py", line 7, in <module>
File "D:\Anaconda3\lib\site-packages\pandoc_mermaid_filter.py", line 38, in main
toJSONFilter(mermaid)
File "D:\Anaconda3\lib\site-packages\pandocfilters.py", line 130, in toJSONFilter
toJSONFilters([action])
File "D:\Anaconda3\lib\site-packages\pandocfilters.py", line 164, in toJSONFilters
sys.stdout.write(applyJSONFilters(actions, source, format))
File "D:\Anaconda3\lib\site-packages\pandocfilters.py", line 195, in applyJSONFilters
altered = walk(altered, action, format, meta)
File "D:\Anaconda3\lib\site-packages\pandocfilters.py", line 123, in walk
return {k: walk(v, action, format, meta) for k, v in x.items()}
File "D:\Anaconda3\lib\site-packages\pandocfilters.py", line 123, in <dictcomp>
return {k: walk(v, action, format, meta) for k, v in x.items()}
File "D:\Anaconda3\lib\site-packages\pandocfilters.py", line 110, in walk
res = action(item['t'],
File "D:\Anaconda3\lib\site-packages\pandoc_mermaid_filter.py", line 31, in mermaid
subprocess.check_call([MERMAID_BIN, "-i", src, "-o", dest])
File "D:\Anaconda3\lib\subprocess.py", line 359, in check_call
retcode = call(*popenargs, **kwargs)
File "D:\Anaconda3\lib\subprocess.py", line 340, in call
with Popen(*popenargs, **kwargs) as p:
File "D:\Anaconda3\lib\subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "D:\Anaconda3\lib\subprocess.py", line 1311, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
Error running filter pandoc-mermaid:
Filter returned error status 1
The folder where the executable is located is apparently added to PATH. Any ideas on how I could fix it?
Specifications:
Windows 10 Home
pandoc 2.14.0.1
Thanks
Not solving the problem directly, but providing a different way to achieve the same result, in case it's useful to others who arrive here:
Mermaid-cli has the feature to act as a preprocessor for Markdown.
As such, you can do the following:
mkdir build
npx -p @mermaid-js/mermaid-cli mmdc -i file.md -o build/output-svg.md
# Generates lots of output-svg-1.svg files linked from the doc
cd build
pandoc output-svg.md -o output.html
It defaults to SVG images (which generally works fine for HTML but fails when targeting PDF). It has a less well documented feature to force PNG generation.
npx -p @mermaid-js/mermaid-cli mmdc -i file.md --outputFormat=png -o build/output-png.md
# Generates lots of output-png-1.png files linked from the doc
cd build
pandoc output-png.md -o output.pdf
I found success with the npm package raghur/mermaid-filter, which adds a similar Mermaid filter for Pandoc.
You should be able to do:
pandoc file.md -f markdown -o out.html --filter=mermaid-filter
IMPORTANT: On Windows, you need to specify the filter as --filter mermaid-filter.cmd instead of --filter mermaid-filter (I missed this at first and was very confused)
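The FileNotFoundError in the original traceback usually means the filter's subprocess call could not resolve the Mermaid executable at all. A quick way to see what is actually resolvable from Python on your machine is a check like this (a sketch, not part of pandoc or either filter; the executable names listed are common ones, adjust as needed):
import shutil

for exe in ("mmdc", "mmdc.cmd", "mermaid-filter", "mermaid-filter.cmd"):
    print(f"{exe}: {shutil.which(exe)}")
If the name your filter invokes prints None, the folder containing it is not on the PATH seen by that process (or, on Windows, only the .cmd shim exists).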

Span-Aste with allennlp - testing against new unseen and unlabeled data

I am trying to use this colab of this github page to extract the triplet [term, opinion, value] from a sentence from my custom dataset.
Here is an overview of the system architecture:
While I can use the sample offered in the colab and also train the model with my data, I don't know how I should re-use it against an unlabeled sample.
If I try to run the colab as-is, changing only the test and dev data to unlabeled data, I encounter this error:
DEVICE=0 { "names": "sample", "seeds": [
0 ], "sep": ",", "name_out": "results", "kwargs": {
"trainer__cuda_device": 0,
"trainer__num_epochs": 10,
"trainer__checkpointer__num_serialized_models_to_keep": 1,
"model__span_extractor_type": "endpoint",
"model__modules__relation__use_single_pool": false,
"model__relation_head_type": "proper",
"model__use_span_width_embeds": true,
"model__modules__relation__use_distance_embeds": true,
"model__modules__relation__use_pair_feature_multiply": false,
"model__modules__relation__use_pair_feature_maxpool": false,
"model__modules__relation__use_pair_feature_cls": false,
"model__modules__relation__use_span_pair_aux_task": false,
"model__modules__relation__use_span_loss_for_pruners": false,
"model__loss_weights__ner": 1.0,
"model__modules__relation__spans_per_word": 0.5,
"model__modules__relation__neg_class_weight": -1 }, "root": "aste/data/triplet_data" } { "root": "/content/Span-ASTE/aste/data/triplet_data/sample", "train_kwargs": {
"seed": 0,
"trainer__cuda_device": 0,
"trainer__num_epochs": 10,
"trainer__checkpointer__num_serialized_models_to_keep": 1,
"model__span_extractor_type": "endpoint",
"model__modules__relation__use_single_pool": false,
"model__relation_head_type": "proper",
"model__use_span_width_embeds": true,
"model__modules__relation__use_distance_embeds": true,
"model__modules__relation__use_pair_feature_multiply": false,
"model__modules__relation__use_pair_feature_maxpool": false,
"model__modules__relation__use_pair_feature_cls": false,
"model__modules__relation__use_span_pair_aux_task": false,
"model__modules__relation__use_span_loss_for_pruners": false,
"model__loss_weights__ner": 1.0,
"model__modules__relation__spans_per_word": 0.5,
"model__modules__relation__neg_class_weight": -1 }, "path_config": "/content/Span-ASTE/training_config/aste.jsonnet", "repo_span_model": "/content/Span-ASTE", "output_dir": "model_outputs/aste_sample_c7b00b66bf7ec669d23b80879fda043d", "model_path": "models/aste_sample_c7b00b66bf7ec669d23b80879fda043d/model.tar.gz", "data_name": "sample", "task_name": "aste" }
# of original triplets: 11
# of triplets for current setup: 11
# of original triplets: 7
# of triplets for current setup: 7
Traceback (most recent call last):
File "/usr/lib/python3.7/pdb.py", line 1699, in main
pdb._runscript(mainpyfile)
File "/usr/lib/python3.7/pdb.py", line 1568, in _runscript
self.run(statement)
File "/usr/lib/python3.7/bdb.py", line 578, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/content/Span-ASTE/aste/main.py", line 1, in <module>
import json
File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 468, in _Fire
target=component.__name__)
File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/content/Span-ASTE/aste/main.py", line 278, in main
scores = main_single(p, overwrite=True, seed=seeds[i], **kwargs)
File "/content/Span-ASTE/aste/main.py", line 254, in main_single
trainer.train(overwrite=overwrite)
File "/content/Span-ASTE/aste/main.py", line 185, in train
self.setup_data()
File "/content/Span-ASTE/aste/main.py", line 177, in setup_data
data.load()
File "aste/data_utils.py", line 214, in load
opinion_offset=self.opinion_offset,
File "aste/evaluation.py", line 165, in read_inst
o_output = line[2].split() # opinion
IndexError: list index out of range
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /content/Span-ASTE/aste/evaluation.py(165)read_inst()
-> o_output = line[2].split() # opinion
(Pdb)
From my understanding, it seems that it is looking for the labels to start the evaluation. The problem is that I don't have those labels - although I have provided a training set with similar data and the associated labels.
I am new to deep learning and also to allennlp, so I am probably missing some knowledge. I have tried to solve this for the past 2 weeks but I am still stuck, so here I am.
KeyPi, this is a supervised learning model; it needs labelled data for your text corpus in the form of a sentence (e.g. "I charge it at night and skip taking the cord with me because of the good battery life .") followed by '#### #### ####' as a separator and a list of labels (the aspect/target word indices in the first list and the opinion token indices in the second, followed by 'POS' for positive and 'NEG' for negative): [([16, 17], [15], 'POS')]
16 and 17 - "battery life", and at index 15 we have the opinion word "good".
I am not sure if you have figured this out already and found some way to label the corpus.
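For reference, here is a minimal sketch (not from the Span-ASTE repo) of producing one labelled line in the format described above, so your own sentences can be annotated before training or evaluation:
# (aspect token indices, opinion token indices, sentiment)
sentence = "I charge it at night and skip taking the cord with me because of the good battery life ."
triplets = [([16, 17], [15], "POS")]

separator = "####"  # match the separator used in the repo's sample data files
print(f"{sentence}{separator}{triplets}")
# -> I charge it at night ... battery life .####[([16, 17], [15], 'POS')]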

How to specify gunicorn log max size

I'm running gunicorn as:
gunicorn --bind=0.0.0.0:5000 --log-file gunicorn.log myapp:app
It seems like gunicorn.log keeps growing. Is there a way to specify a max size for the log file, so that if it reaches the max size, it'll just overwrite it?
Thanks!!
TL;DR:
I believe there might be a "Python only" solution using the rotating file handler provided in Python's standard library (at least on 3.10).
To test
I created a pet project for you to fiddle with:
Create the following python file
test_logs.py
import logging
import logging.config
import time

logging.config.fileConfig(fname='log.conf', disable_existing_loggers=False)

while True:
    time.sleep(0.5)
    logging.debug('This is a debug message')
    logging.info('This is an info message')
    logging.warning('This is a warning message')
    logging.error('This is an error message')
    logging.critical('This is a critical message')
Create the following config file
log.conf
[loggers]
keys=root
[handlers]
keys=rotatingHandler
[formatters]
keys=sampleFormatter
[logger_root]
level=DEBUG
handlers=rotatingHandler
[handler_rotatingHandler]
class=logging.handlers.RotatingFileHandler
level=DEBUG
formatter=sampleFormatter
args=('./logs/logs.log', 'a', 1200, 10, 'utf-8')
[formatter_sampleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
Create the ./logs directory
Run python test_logs.py
To Understand
As you may have noticed already, the setting that allows for this behaviour is logging.handlers.RotatingFileHandler and the provided arguments args=('./logs/logs.log', 'a', 1200, 10, 'utf-8')
RotatingFileHandler is a stream handler writing to a file. It allows for 2 parameters of interest:
maxBytes set arbitrarily at 1200
backupCount set arbitrarily to 10
The behaviour is that upon reaching 1200 bytes in size, the file is closed, renamed to ./logs/logs.log.<a number up to 10>, and a new file is opened.
BUT if either maxBytes or backupCount is 0, no rotation is done!
In Gunicorn
As per the documentation you can feed a config file.
This could look like:
gunicorn --bind=0.0.0.0:5000 --log-config log.conf myapp:app
You will need to tweak it to your existing setup.
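For example, a log.conf adapted to gunicorn might look like the sketch below. It assumes gunicorn's standard logger names (gunicorn.error and gunicorn.access) and reuses the rotation settings from above with larger, more realistic sizes; adjust paths and levels to your setup:
[loggers]
keys=root, gunicorn.error, gunicorn.access

[handlers]
keys=rotatingHandler

[formatters]
keys=sampleFormatter

[logger_root]
level=INFO
handlers=rotatingHandler

[logger_gunicorn.error]
level=INFO
handlers=rotatingHandler
propagate=0
qualname=gunicorn.error

[logger_gunicorn.access]
level=INFO
handlers=rotatingHandler
propagate=0
qualname=gunicorn.access

[handler_rotatingHandler]
class=logging.handlers.RotatingFileHandler
level=INFO
formatter=sampleFormatter
args=('./logs/gunicorn.log', 'a', 10485760, 5, 'utf-8')

[formatter_sampleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s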
On Ubuntu/Linux, I suggest using logrotate to manage your logs; see this answer: https://stackoverflow.com/a/55643449/6705684
Since Python 3.3, with RotatingFileHandler, here is my solution (macOS/Windows/Linux/...):
import logging
from logging.handlers import RotatingFileHandler

fmt_str = '[%(asctime)s]%(module)s - %(funcName)s - %(message)s'
fmt = logging.Formatter(fmt_str)

def rotating_logger(name, fmt=fmt,
                    level=logging.INFO,
                    logfile='.log',
                    maxBytes=10 * 1024 * 1024,
                    backupCount=5,
                    **kwargs
                    ):
    logger = logging.getLogger(name)
    logger.setLevel(level)  # set the logger level as well, otherwise records below WARNING are dropped
    hdl = RotatingFileHandler(logfile, maxBytes=maxBytes, backupCount=backupCount)
    hdl.setLevel(level)
    hdl.setFormatter(fmt)
    logger.addHandler(hdl)
    return logger
For more, refer to:
https://docs.python.org/3/library/logging.handlers.html#rotatingfilehandler
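Example usage of the helper above (the file name and logger name are just placeholders):
logger = rotating_logger("myapp", logfile="myapp.log")
logger.info("application started")  # rotates automatically once myapp.log exceeds maxBytes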

MALLET - How to pass a CSV file which contains word counts to Naïve Bayes in MALLET?

I have created a CSV file which contains the label name and word frequencies.
e.g.
0, 4.0, 0.0, 0.0, 1.0, 0.0
0, 0.0, 1.0, 2.0, 0.0, 0.0
1, 1.0, 0.0, 0.0, 0.0, 3.0
Where index zero represents the label (0 and 1).
My question is: how do I import this kind of CSV file into MALLET to generate an instance list, and how do I pass it to the Naïve Bayes classifier?
I found the answer to my own question.
In MALLET, there are pipes which convert a CSV row into a feature vector.
pipeList.add(new Csv2Array());
pipeList.add(new Target2Label());
pipeList.add(new Array2FeatureVector());
Output for the above example:
0 and 1: these are taken as the target names.
For the first line:
1(1)=4.0
2(2)=0.0
3(3)=0.0
4(4)=1.0
5(5)=0.0
The same applies to the other two lines.

How does one specify the input when using a CSV with Kur

I'm trying to feed a CSV file to Kur, but I don't know how to specify more than one column in the input without the program crashing. Here's a small example:
model:
  - input:
      - SepalWidthCm
      - SepalLengthCm
  - dense: 10
  - activation: tanh
  - dense: 3
  - activation: tanh
    name: Species

train:
  data:
    - csv:
        path: Iris.csv
        header: yes
  epochs: 1000
  weights: best.w
  log: tutorial-log

loss:
  - target: Species
    name: mean_squared_error
The error:
File "/Users/bytter/.pyenv/versions/3.5.2/bin/kur", line 11, in <module>
sys.exit(main())
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/__main__.py", line 269, in main
sys.exit(args.func(args) or 0)
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/__main__.py", line 48, in train
func = spec.get_training_function()
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/kurfile.py", line 282, in get_training_function
model = self.get_model(provider)
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/kurfile.py", line 148, in get_model
self.model.build()
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/model/model.py", line 282, in build
self.build_graph(input_nodes, output_nodes, network)
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/model/model.py", line 356, in build_graph
for layer in node.container.build(self):
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/containers/container.py", line 281, in build
self._built = list(self._build(model))
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/containers/layers/placeholder.py", line 122, in _build
'Placeholder "{}" requires a shape.'.format(self.name))
kur.containers.parsing_error.ParsingError: Placeholder "..input.0" requires a shape.
Using - input: SepalWidthCm works as expected.
The problem with your approach is that Kur doesn't know how you want the inputs concatenated. Should your input become a 2D tensor of dimensions (2, N) (where N is the number of data points in your CSV file), like this?
[
[SepalWidthCm_0, SepalWidthCm_1, ...],
[SepalLengthCm_0, SepalLengthCm_1, ...]
]
(N.B., that example isn't a very deep-learning friendly structure.) Or should it be combined into a tensor of dimensions (N, 2), like this?
[
[SepalWidthCm_0, SepalLengthCm_0],
[SepalWidthCm_1, SepalLengthCm_1],
...
]
Or maybe you want to apply the same operations to each column in parallel? Regardless, this problem gets a lot harder / more ambiguous to answer when your input data is multi-dimensional (e.g., instead of scalars like length or width, you have vectors or even matrices).
Instead of trying to guess what you want (and possibly getting it wrong), Kur expects each input to be a single data source, which you can then combine however you see fit.
Here are a couple ways you might want your data combined, and how to do it in Kur.
Row-wise Combination. This is the second example above, where we want to combine "rows" of CSV data into tuples, so that the input tensor has dimensionality (batchSize, 2). Then your Kur model would look like:
model:
  # Define the model inputs.
  - input: SepalWidthCm
  - input: SepalLengthCm
  # Concatenate the inputs.
  - merge: concat
    inputs: [SepalWidthCm, SepalLengthCm]
  # Do processing on these "vectorized" inputs.
  - dense: 10
  - activation: tanh
  - dense: 1
  - activation: tanh
  # Output
  - output: Species
Independent Processing, and then Combining. This is the setup where you do some operations on each input column independently, and then you merge them together (potentially with some more operations afterwards). In ASCII-art, this might look like:
INPUT_1 --> dense, activation --\
                                 +---> dense, activation --> OUTPUT
INPUT_2 --> dense, activation --/
In this case, you would have a Kur model that looks like this:
model:
  # First "branch" of processing.
  - input: SepalWidthCm
  - dense: 10
  - activation: tanh
    name: WidthBranch
  # Second "branch" of processing.
  - input: SepalLengthCm
  - dense: 10
  - activation: tanh
    name: LengthBranch
  # Fuse things together.
  - merge:
    inputs: [WidthBranch, LengthBranch]
  # Continue some processing
  - dense: 1
  - activation: tanh
  # Output
  - output: Species
Keep in mind that the merge layer has been around since Kur 0.3, so make sure you are using a recent version.
(Disclaimer: I am the core maintainer of Kur.)