MALLET - How to pass a CSV file containing word counts to Naive Bayes in MALLET?

I have created a CSV file which contains the label and the word frequencies, e.g.:
0, 4.0, 0.0, 0.0, 1.0, 0.0
0, 0.0, 1.0, 2.0, 0.0, 0.0
1, 1.0, 0.0, 0.0, 0.0, 3.0
where index zero represents the label (0 or 1).
My question is: how do I import this kind of CSV file into MALLET to generate an InstanceList, and how do I pass it to the Naive Bayes classifier?

I found the answer to my own question.
In MALLET, there are pipes which convert CSV input into a feature vector:
pipeList.add(new Csv2Array());            // parses "4.0, 0.0, ..." into a double[]
pipeList.add(new Target2Label());         // maps the leading 0/1 onto a MALLET Label
pipeList.add(new Array2FeatureVector());  // turns the double[] into a FeatureVector
Output for the above example:
0 and 1 are taken as the target labels.
For the first line:
1(1)=4.0
2(2)=0.0
3(3)=0.0
4(4)=1.0
5(5)=0.0
and the same for the other two lines.
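Not part of the original answer, but to complete it: a minimal sketch of wiring these pipes into an InstanceList and training MALLET's Naive Bayes classifier. The file name train.csv, the regex, and the group indices are assumptions to adapt to your own layout.
import cc.mallet.classify.NaiveBayes;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.CsvIterator;
import cc.mallet.types.InstanceList;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.regex.Pattern;

public class TrainNB {
    public static void main(String[] args) throws Exception {
        ArrayList<Pipe> pipeList = new ArrayList<>();
        pipeList.add(new Csv2Array());            // "4.0, 0.0, ..." -> double[]
        pipeList.add(new Target2Label());         // leading 0/1 -> Label
        pipeList.add(new Array2FeatureVector());  // double[] -> FeatureVector

        InstanceList instances = new InstanceList(new SerialPipes(pipeList));
        // Group 1 = the label, group 2 = the remaining comma-separated values;
        // group 1 is reused as the instance name. "train.csv" is a placeholder.
        instances.addThruPipe(new CsvIterator(new FileReader("train.csv"),
                Pattern.compile("^(\\S+),\\s*(.*)$"), 2, 1, 1));

        NaiveBayes classifier = new NaiveBayesTrainer().train(instances);
    }
}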

How can I read such a .prj file?

I'm writing a GIS application using DotSpatial 2.0. I have data from the town hall, and
Shapefile sf = Shapefile.OpenFile(Path);
throws
DotSpatial.Projections.ProjectionException
'Projection Not Found'
ProjectionInfo proj = ProjectionInfo.Open(Path);
throws:
DotSpatial.Projections.InvalidEsriFormatException
Could anyone tell me what is wrong with this file and how to handle this case?
The file content is:
PROJCS["ETRS89 / Poland CS2000 zone 6",
GEOGCS["ETRS89",
DATUM["European Terrestrial Reference System 1989",
SPHEROID["GRS 1980", 6378137.0, 298.257222101,
AUTHORITY["EPSG","7019"]],
TOWGS84[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
AUTHORITY["EPSG","6258"]],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AXIS["Geodetic longitude", EAST],
AXIS["Geodetic latitude", NORTH],
AUTHORITY["EPSG","4258"]],
PROJECTION["Transverse_Mercator", AUTHORITY["EPSG","9807"]],
PARAMETER["central_meridian", 18.0],
PARAMETER["latitude_of_origin", 0.0],
PARAMETER["scale_factor", 0.999923],
PARAMETER["false_easting", 6500000.0],
PARAMETER["false_northing", 0.0],
UNIT["m", 1.0],
AXIS["Easting", EAST],
AXIS["Northing", NORTH],
AUTHORITY["EPSG","2177"]]
Other applications like SpatialManager and ArcMap read it correctly.
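No answer is included above, so here is a hedged workaround (my suggestion, not from the thread): the WKT is OGC-flavoured (spelled-out datum name, AXIS entries) rather than the ESRI dialect DotSpatial's parser expects, but it does carry AUTHORITY["EPSG","2177"], so one can try building the projection from the EPSG code and assigning it manually:
using DotSpatial.Data;
using DotSpatial.Projections;

// Sketch only: assumes the unreadable .prj has been renamed aside
// so OpenFile no longer tries to parse the OGC-style WKT.
string path = @"C:\data\townhall.shp";  // placeholder path
Shapefile sf = Shapefile.OpenFile(path);
// EPSG:2177 = ETRS89 / Poland CS2000 zone 6, read off the AUTHORITY
// entry at the end of the WKT above.
sf.Projection = ProjectionInfo.FromEpsgCode(2177);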

What can a `RecordBatch` do that a `StructArray` cannot?

Among the different types of arrays that exist in Arrow, one of them is the StructArray. When converted to a pandas structure using PyArrow, it returns a pd.Series with a number of rows, each of them containing a dictionary:
>>> pa.array([{'x': 1, 'y': True}, {'z': 3.4, 'x': 4.5}]).to_pandas()
0 {'x': 1.0, 'y': True, 'z': None}
1 {'x': 4.5, 'y': None, 'z': 3.4}
dtype: object
On the other hand, a RecordBatch advertises itself as "a collection of equal-length array instances":
>>> pa.RecordBatch.from_pylist([{'x': 1, 'y': True, 'z': None}, {'z': 3.4, 'x': 4.5}]).to_pandas()
x y z
0 1.0 True NaN
1 4.5 None 3.4
However, even though one takes the form of a pd.Series and the other a pd.DataFrame, it seems to me that they essentially contain the same information: one is an array, the other is a collection of arrays.
Therefore, what can a RecordBatch do that a StructArray cannot?
They're different abstractions. A StructArray is an Array with an associated type; it's a nested array. A RecordBatch is a…RecordBatch and contains a schema; abstractly, it's a 2D chunk of data where each column is contiguous in memory.
They intentionally have similar interfaces because they "look similar" to each other, but for instance, a StructArray can't be written to a file by itself; it would need to be wrapped and converted into a RecordBatch.
Also, a StructArray (as an Array) has a top-level validity bitmap (i.e. each of its child arrays has its own validity information, and so does the StructArray itself), but a RecordBatch doesn't have its own validity bitmap.
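A small sketch of the wrapping step mentioned above: PyArrow can promote a StructArray to a RecordBatch with RecordBatch.from_struct_array, and only then can the data go through the IPC writer (the file name is arbitrary):
import pyarrow as pa

struct_arr = pa.array([{'x': 1, 'y': True}, {'z': 3.4, 'x': 4.5}])

# Promote the StructArray's children to top-level columns.
batch = pa.RecordBatch.from_struct_array(struct_arr)

# A RecordBatch (unlike a bare StructArray) can be written to an IPC file.
with pa.ipc.new_file('example.arrow', batch.schema) as writer:
    writer.write_batch(batch)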

Yolo V5 issue "Exception: Dataset not found." on local machine

I am trying to train a model using YOLOv5, and I am getting a Dataset not found error.
I have train, test, and valid folders that contain all the image and label files.
I have tested the files on Google Colab and it does work there; however, on my local machine it shows Exception: Dataset not found.
(Yolo_5) D:\YOLO_V_5\Yolo_V5\yolov5>python train.py --img 416 --batch 8 --epochs 100 --data /data.yaml --cfg models/yolov5s.yaml --weights '' --name yolov5s_results --cache
Using torch 1.7.0 CUDA:0 (GeForce GTX 1080, 8192MB)
Namespace(adam=False, batch_size=8, bucket='', cache_images=True, cfg='models/yolov5s.yaml', data='.\\data.yaml', device='', epochs=100, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[416, 416], local_rank=-1, log_imgs=16, multi_scale=False, name='yolov5s_results', noautoanchor=False, nosave=False, notest=False, project='runs/train', rect=False, resume=False, save_dir='runs\\train\\yolov5s_results55', single_cls=False, sync_bn=False, total_batch_size=8, weights="''", workers=16, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/train", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'warmup_bias_lr': 0.1, 'box': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'anchors': 3, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0}
WARNING: Dataset not found, nonexistent paths: ['D:\\me1eye\\Yolo_V5\\valid\\images']
Traceback (most recent call last):
File "train.py", line 501, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 78, in train
check_dataset(data_dict) # check
File "D:\me1eye\YOLO_V_5\Yolo_V5\yolov5\utils\general.py", line 92, in check_dataset
raise Exception('Dataset not found.')
Exception: Dataset not found.
Internal process exited
(Olive_Yolo_5) D:\me1eye\YOLO_V_5\Yolo_V5\yolov5>
There is a much simpler solution. Just go into data.yaml, wherever you saved it, and change the relative paths to absolute ones, i.e. just write the whole path! E.g.
train: C:\hazlab\BCCD\train\images
val: C:\hazlab\BCCD\valid\images
nc: 3
names: ['Platelets', 'RBC', 'WBC']
Job done. Note: as you are on Windows, there is a known issue in the invocation of train.py - do not use quotes around the file names in the CLI, e.g.
!python train.py --img 416 --batch 16 --epochs 100 --data C:\hazlab\BCCD\data.yaml --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --cache
Well! I have also encountered this problem, and now I have fixed it.
All you have to do is keep train, test, and valid (the three folders containing images and labels) and the yolov5 folder (cloned from GitHub) in the same directory. Also, the data.yaml file has to be inside the yolov5 folder.
The command to train the model would then look like this:
!python train.py --img 416 --batch 16 --epochs 10 --data ./data.yaml --cfg ./models/yolov5m.yaml --weights '' --name yolov5m_results
The issue is that the actual dataset path is not found. I ran into the same issue when I trained a YOLOv5 model on a custom dataset using Google Colab, and I did the following to resolve it:
Make sure you provide the correct path to the dataset's data.yaml.
Make sure the dataset paths inside data.yaml are correct.
The train, test, and valid keys should contain paths relative to the main dataset path (the path key).
An example data.yaml file is given below.
path: /content/drive/MyDrive/car-detection-dataset
train: train/images
val: valid/images
test: test/images
nc: 1
names: ['car']
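A quick way to catch this before launching training (my sketch, not from the thread): resolve the paths in data.yaml the same way and check that they exist. The data.yaml location is a placeholder.
import yaml
from pathlib import Path

cfg = yaml.safe_load(Path('data.yaml').read_text())  # placeholder location
root = Path(cfg.get('path', '.'))  # optional top-level dataset root
for key in ('train', 'val', 'test'):
    if key in cfg:
        p = root / cfg[key]
        print(key, p, '->', 'OK' if p.exists() else 'MISSING')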

Modularity calculation for weighted graphs in igraph

I used the fastgreedy algorithm in igraph for community detection in a weighted, undirected graph. Afterwards I wanted to have a look at the modularity, and I got different values from different methods, and I am wondering why. I have included a short example which demonstrates my problem:
library(igraph)
d<-matrix(c(1, 0.2, 0.3, 0.9, 0.9,
0.2, 1, 0.6, 0.4, 0.5,
0.3, 0.6, 1, 0.1, 0.8,
0.9, 0.4, 0.1, 1, 0.5,
0.9, 0.5, 0.8, 0.5, 1), byrow=T, nrow=5)
g<-graph.adjacency(d, weighted=T, mode="lower",diag=FALSE, add.colnames=NA)
fc<-fastgreedy.community(g)
fc$modularity[3]
#[1] -0.05011095
modularity(g,membership=cutat(fc,steps=2),weights=get.adjacency(g,attr="weight"))
#[1] 0.07193047
I would expect both of the values to be identical, and indeed, if I try the same with an unweighted graph, I get the same values.
d2<-round(d,digits=0)
g2<- graph.adjacency(d2, weighted=NULL, mode="lower",diag=FALSE, add.colnames=NA)
fc2<-fastgreedy.community(g2)
plot(fc2,g2)
fc2$modularity[3]
#[1] 0.15625
modularity(g2,membership=cutat(fc2,steps=2))
#[1] 0.15625
Another user had a similar problem, but I have the current version of igraph, so that should not be the problem. Can someone explain to me why there is a difference, or is there a problem with my code that I don't see?
The line
modularity(g,membership=cutat(fc,steps=2),weights=get.adjacency(g,attr="weight"))
is wrong: get.adjacency() returns the whole adjacency matrix, whereas the weights argument expects one value per edge. If you want to pass the edge weights to modularity(), do it with E(g)$weight:
modularity(g, membership = cutat(fc, steps = 2), weights = E(g)$weight)
# [1] -0.05011095
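To see why the original call misbehaved, compare the shapes of the two objects (a short sketch using the g from the question):
dim(get.adjacency(g, attr = "weight"))  # 5 x 5 matrix: the wrong shape for `weights`
length(E(g)$weight)                     # one entry per edge: what modularity() expects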

How do I plot a function and data in Mathematica?

Simple question, but I can't find the answer.
I want to combine a ListLinePlot and a regular Plot (of a function) into one plot. How do I do this?
Thanks.
Use Show, e.g.
Show[Plot[x^2, {x, 0, 3.5}], ListPlot[{1, 4, 9}]]
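The same works with ListLinePlot, which the question actually asks about:
Show[Plot[x^2, {x, 0, 3.5}], ListLinePlot[{1, 4, 9}]]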
Note: if plot options conflict, Show uses the first plot's option, unless the option is specified inside Show. I.e.
Show[Plot[x^2, {x, 0, 3.5}, ImageSize -> 100],
ListPlot[{1, 4, 9}, ImageSize -> 400]]
shows a combined plot of size 100.
Show[Plot[x^2, {x, 0, 3.5}, ImageSize -> 100],
ListPlot[{1, 4, 9}, ImageSize -> 400], ImageSize -> 300]
shows a combined plot of size 300.
An alternative to using Show to combine two separate plots is to use Epilog to add the data points to the main plot. For example:
data = Table[{i, Sin[i] + .1 RandomReal[]}, {i, 0, 10, .5}];
Plot[Sin[x], {x, 0, 10}, Epilog -> Point[data], PlotRange -> All]
or
Plot[Sin[x], {x, 0, 10}, Epilog -> Line[data], PlotRange -> All]