Early stopping based on BLEU in FairSeq - deep-learning

My goal is to use BLEU as early stopping metric while training a translation model in FairSeq.
Following the documentation, I am adding the following arguments to my training script:
--eval-bleu --eval-bleu-args --eval-bleu-detok --eval-bleu-remove-bpe
I am getting the following error:
fairseq-train: error: unrecognized arguments: --eval-bleu --eval-bleu-args --eval-bleu-detok --eval-bleu-remove-bpe
System information:
fairseq version: 0.10.2
torch: 1.10.1+cu113
More Details:
When I try to fine-tune the M2M100 model, I get the following error:
KeyError: 'bleu'
when using the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train \
$path_2_data --ddp-backend=no_c10d \
--best-checkpoint-metric bleu \
--maximize-best-checkpoint-metric \
--max-tokens 2048 --no-epoch-checkpoints \
--finetune-from-model $pretrained_model \
--save-dir $checkpoint --task translation_multi_simple_epoch \
--encoder-normalize-before \
--langs 'af,am,ar,ast,az,ba,be,bg,bn,br,bs,ca,ceb,cs,cy,da,de,el,en,es,et,fa,ff,fi,fr,fy,ga,gd,gl,gu,ha,he,hi,hr,ht,hu,hy,id,ig,ilo,is,it,ja,jv,ka,kk,km,kn,ko,lb,lg,ln,lo,lt,lv,mg,mk,ml,mn,mr,ms,my,ne,nl,no,ns,oc,or,pa,pl,ps,pt,ro,ru,sd,si,sk,sl,so,sq,sr,ss,su,sv,sw,ta,th,tl,tn,tr,uk,ur,uz,vi,wo,xh,yi,yo,zh,zu' \
--lang-pairs $lang_pairs \
--decoder-normalize-before --sampling-method temperature \
--sampling-temperature 1.5 --encoder-langtok src \
--decoder-langtok --criterion label_smoothed_cross_entropy \
--label-smoothing 0.2 --optimizer adam --adam-eps 1e-06 \
--adam-betas '(0.9, 0.98)' --lr-scheduler inverse_sqrt \
--lr 3e-05 --warmup-updates 2500 --max-update 400000 \
--dropout 0.3 --attention-dropout 0.1 \
--weight-decay 0.0 --update-freq 2 --save-interval 1 \
--save-interval-updates 5000 --keep-interval-updates 10 \
--seed 222 --log-format simple --log-interval 2 --patience 5 \
--arch transformer_wmt_en_de_big --encoder-layers 24 \
--decoder-layers 24 --encoder-ffn-embed-dim 8192 \
--decoder-ffn-embed-dim 8192 --encoder-layerdrop 0.05 \
--decoder-layerdrop 0.05 --share-decoder-input-output-embed \
--share-all-embeddings --fixed-dictionary $fix_dict --fp16 \
--skip-invalid-size-inputs-valid-test

The task you are using, translation_multi_simple_epoch, does not define these arguments; they are specific to the translation task. This is presumably also why --best-checkpoint-metric bleu ends in KeyError: 'bleu': no bleu value is logged during validation for this task.
Note also that some of the arguments you are passing require values.
--eval-bleu-args expects a JSON string of generation arguments used for BLEU scoring (for example '{"beam": 4, "lenpen": 0.6}'). If the default settings are fine for you, you should skip it.
--eval-bleu-detok expects a specification of how you want to detokenize the model output. The default value is space, which does not do anything.
For more details, see the documentation of the translation task in FairSeq.

Scala - How to Split all List of List Json Nodes using json-path

I have a JSON document from which I want to pick a list-of-lists structure, where each list may contain multiple instances. With json-path it is easy to pick an element when the index into the list/array is given. But in a big file we don't know in advance how many instances there will be, and we must not lose any data. So the number of instances has to be determined dynamically, and a separate JSON object has to be picked for every element inside the list node. Additionally, a relation_path has to be created for all the data.
Can anyone suggest how to check whether a JSON node is an array/list (e.g. the 2 Drive entries) and how many nested list objects it contains, such as 2 Partition entries in the 1st Drive and 1 Partition in the 2nd Drive? These numbers are not fixed, so they cannot be hard-coded in the json-path code.
Input List of List Json :
{"Start":{"HInfo":{"InfoId":"650FEC74","Revision":"5.2.0.51","Drive":[{"InfoId":"650FEC74","Index":0,"Name":"Drive0","Partition":[{"InfoId":"650FEC74","DriveID":"F91B1F36","Index":0},{"InfoId":"650FEC74","DriveID":"F91B1F36","Index":1}]},{"InfoId":"650FEC74","Index":1,"Name":"Drive1","Partition":{"InfoId":"650FEC74","DriveID":"3F275869","Index":0}}]}}}
Output List of Json :
[{"Partition":[{"InfoId":"650FEC74","DriveID":"F91B1F36","Index":0},{"InfoId":"650FEC74","DriveID":"F91B1F36","Index":1}],"relation_tree":"Start/HInfo/Drive/Drive-1/Partition"},{"Partition":{"InfoId":"650FEC74","DriveID":"3F275869","Index":0},"relation_tree":"Start/HInfo/Drive/Drive-2/Partition"}]
This is what I am trying with json-path, but it is not suitable because I am providing the index number manually, which is not possible in general since the index can range from 0 to any number.
val jsonString = """{"Start":{"HInfo":{"InfoId":"650FEC74","Revision":"5.2.0.51","Drive":[{"InfoId":"650FEC74","Index":0,"Name":"Drive0","Partition":[{"InfoId":"650FEC74","DriveID":"F91B1F36","Index":0},{"InfoId":"650FEC74","DriveID":"F91B1F36","Index":1}]},{"InfoId":"650FEC74","Index":1,"Name":"Drive1","Partition":{"InfoId":"650FEC74","DriveID":"3F275869","Index":0}}]}}}"""
val jsonStr: JsValue = Json.parse(jsonString)
var pruneJson1 = (__ \ "Partition").json.copyFrom((__ \ "Start" \ "HInfo" \ "Drive" \ (0) \ "Partition").json.pick)
val finalPartitionPrune1 = Option(jsonStr.transform(pruneJson1)).get.get.as[JsObject] + ("relation_tree" -> Json.toJson("Start"+"/"+"HInfo"+"/"+"Drive"+"/"+"Drive-1"+"/"+"Partition"))
println(finalPartitionPrune1)
var pruneJson2 = (__ \ "Partition").json.copyFrom((__ \ "Start" \ "HInfo" \ "Drive" \ (1) \ "Partition").json.pick)
val finalPartitionPrune2 = Option(jsonStr.transform(pruneJson2)).get.get.as[JsObject] + ("relation_tree" -> Json.toJson("Start"+"/"+"HInfo"+"/"+"Drive"+"/"+"Drive-2"+"/"+"Partition"))
println(finalPartitionPrune2)
This is the simplest solution I could think of:
val finalJson = Json.toJson(
  (jsonStr \ "Start" \ "HInfo" \ "Drive")
    .as[Seq[JsValue]]
    .map(jsValue => JsObject(Seq(
      "Partition" -> (jsValue \ "Partition").get,
      "relation_tree" -> JsString(s"Start/HInfo/Drive/Drive-${(jsValue \ "Index").get}/Partition")))))
Basically, it reads all drives as a sequence of JsValues and maps each one to a JsObject in the required format. It uses each drive's Index value to build the relation_tree value, so it will fail if that value is missing. As an alternative, you can use the zipWithIndex method to attach your own indices to the sequence. As a final step, it converts the sequence back to a JsValue.
Here's zipWithIndex version:
val finalJson = Json.toJson(
  (jsonStr \ "Start" \ "HInfo" \ "Drive")
    .as[Seq[JsValue]]
    .zipWithIndex
    // zipWithIndex counts from 0; add 1 so the labels match the expected Drive-1, Drive-2
    .map { case (jsValue, index) => JsObject(Seq(
      "Partition" -> (jsValue \ "Partition").get,
      "relation_tree" -> JsString(s"Start/HInfo/Drive/Drive-${index + 1}/Partition")))
    })

How can I enter Null/None into mysql

I am using Python 3.7 and pyMySQL as a connector to MySQL server.
I am trying to do the following query:
query="INSERT IGNORE INTO tweets (ID, Text,create," \
"Date,local,foolowed,Count," \
"isqwdii,inR,in " \
"Sensitive,redirection)" \
"VALUES (%s,%s,%s,%s,%s,%d,%d,%d,%s,%s,%d,%d)"
vals=[kwargs['ID'], kwargs['Text'],
kwargs['create'], kwargs['Date'],
kwargs['local'], kwargs['foolowed'],
kwargs['Count'], kwargs['isascii'],
kwargs['inR'], kwargs['in'],
kwargs['Sensitive'], kwargs['redirection']]
self.__cur.execute(query,vals)
self.__conn.commit()
The problem is that some of the %d values can be None, and when that happens I get the following error: "TypeError: %d format: a number is required, not str".
I can't use string formatting because the DB would then receive None as a string. I want the DB to receive it as None/NULL.
Thanks in advance
I think you will need to test in your Python code whether the value is None and, if so, replace the %d with %s for a string.
(In my experience with PHP, NULL is passed as a string type, so I am assuming this carries across to the Python MySQL interfaces.)
Therefore the pseudo-code would be:
# choose the placeholder for isascii depending on whether the value is None
fmt = {'isascii': '%d'}
if kwargs['isascii'] is None:
    fmt['isascii'] = '%s'
query = "INSERT IGNORE INTO tweets (ID, Text,create," \
    "Date,local,foolowed,Count," \
    "isqwdii,inR,in " \
    "Sensitive,redirection)" \
    f"VALUES (%s,%s,%s,%s,%s,%d,%d,{fmt['isascii']},%s,%s,%d,%d)"
I apologise because I do not know Python 3.7 syntax, but I hope this gives you the concept.
Response from Leo120:
Actually, it worked when I used %s for all the placeholders (and all the data arrived in MySQL as needed). As it turns out, in pymysql %s stands for every type. Weird. Thank you for your help.
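The resolution above can be sketched as follows. This is only an illustration, not the asker's actual code: build_insert is a hypothetical helper, and the three-column row is made up for the example. It uses %s for every column regardless of type and leaves None values in the parameter list, so that the driver can send them as NULL. Column names are wrapped in backticks because names such as create and in are reserved words in MySQL.

```python
def build_insert(table, row):
    """Return (query, params) with a %s placeholder for every column.

    `row` maps column names to values; None values are kept as None so
    that the driver can translate them to SQL NULL.
    """
    columns = ", ".join(f"`{name}`" for name in row)
    placeholders = ", ".join(["%s"] * len(row))
    query = f"INSERT IGNORE INTO `{table}` ({columns}) VALUES ({placeholders})"
    return query, list(row.values())


# Hypothetical row: Count is None and should end up as NULL in the table.
query, params = build_insert("tweets", {"ID": 1, "Text": "hello", "Count": None})
print(query)
print(params)
```

With an open pymysql cursor, the pair would then be passed on as cursor.execute(query, params), never formatted into the string by hand.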

QEmu `eject` complains device is not found while it is there

I need to eject a floppy from QEmu 3.0 monitor, but the command surprisingly fails complaining the device is not found, while it is really there.
Listing of devices:
(qemu) info block
fda: dos-6-22/Dos622-1.img (raw)
Attached to: /machine/unattached/device[11]
Removable device: not locked, tray closed
Cache mode: writeback
hda: hda.img (raw)
Attached to: /machine/peripheral-anon/device[1]
Cache mode: writeback
Eject command result:
(qemu) eject fda
Device 'fda' not found
This happens even though this documentation says this is how to do it: https://www.linux-kvm.org/page/Change_cdrom (except that I want to eject the floppy instead of the CD‑ROM).
The change command complains the same:
(qemu) change fda dos-6-22/Dos622-2.img raw
Device 'fda' not found
Is this a bug, or am I doing something wrong?
I tried using different node names, with always the same result.
Update:
I’m pretty sure there is no correct answer and it’s rather a bug, which I just submitted: https://bugs.launchpad.net/qemu/+bug/1799766.
I’m posting this as an answer, although I’m not entirely sure. All I can say is that, if I understand correctly, this is a bug.
The answer comes in two parts.
The first part is a stripped-down failing invocation:
qemu-system-i386 \
-monitor stdio \
-machine type=isapc,vmport=off \
-blockdev driver=file,node-name=fda-img,filename=fda.img \
-blockdev driver=raw,node-name=fda,file=fda-img \
-global isa-fdc.driveA=fda
(qemu) info block
ide1-cd0: [not inserted]
Attached to: /machine/unattached/device[19]
Removable device: not locked, tray closed
sd0: [not inserted]
Removable device: not locked, tray closed
fda: fda.img (raw)
Attached to: /machine/unattached/device[13]
Removable device: not locked, tray closed
Cache mode: writeback
(qemu) eject fda
Device 'fda' not found
The second part is the same invocation without the last argument, -global isa-fdc.driveA=fda:
qemu-system-i386 \
-monitor stdio \
-machine type=isapc,vmport=off \
-blockdev driver=file,node-name=fda-img,filename=fda.img \
-blockdev driver=raw,node-name=fda,file=fda-img
(qemu) info block
ide1-cd0: [not inserted]
Attached to: /machine/unattached/device[19]
Removable device: not locked, tray closed
floppy0: [not inserted]
Attached to: /machine/unattached/device[13]
Removable device: not locked, tray closed
sd0: [not inserted]
Removable device: not locked, tray closed
(qemu) eject floppy0
There is no more error when -global isa-fdc.driveA=fda is removed. However, the documentation says:
-global driver=driver,property=property,value=value
Set default value of driver’s property prop to value, e.g.:
qemu-system-i386 -global ide-hd.physical_block_size=4096 disk-image.img
In particular, you can use this to set driver properties for devices which are created automatically by the machine model. To create a device which is not created automatically and set properties on it, use -device.
-global driver.prop=value is shorthand for -global driver=driver,property=prop,value=value. The longhand syntax works even when driver contains a dot.
The quoted passage, in particular the sentence about devices created automatically by the machine model, suggests I’m not misusing -global and that this is most probably a bug.
Update for more details:
It seems that using -drive instead of -device and the driveA assignment does not give the same result, although the Red Hat documentation recommends using -device instead of -drive and the QEmu 3.0 documentation says -drive is essentially a shortcut for -device (“essentially”, without saying what the difference is).
Below are two cases, each with an excerpt of info block and an excerpt of info qtree.
With this one, eject floppy0 works:
qemu-system-i386 \
-monitor stdio \
-machine type=isapc,vmport=off \
-drive format=raw,if=floppy,media=disk,file=fda.img \
-device isa-vga,vgamem_mb=1 \
-serial msmouse
[…]
floppy0 (#block156): fda.img (raw)
Attached to: /machine/unattached/device[12]
Removable device: not locked, tray closed
Cache mode: writeback
[…]
dev: isa-fdc, id ""
iobase = 1008 (0x3f0)
irq = 6 (0x6)
dma = 2 (0x2)
driveA = ""
driveB = ""
check_media_rate = true
fdtypeA = "auto"
fdtypeB = "auto"
fallback = "288"
isa irq 6
bus: floppy-bus.0
type floppy-bus
dev: floppy, id ""
unit = 0 (0x0)
drive = "floppy0"
logical_block_size = 512 (0x200)
physical_block_size = 512 (0x200)
min_io_size = 0 (0x0)
opt_io_size = 0 (0x0)
discard_granularity = 4294967295 (0xffffffff)
write-cache = "auto"
share-rw = false
drive-type = "144"
With this one, eject fda does not work:
qemu-system-i386 \
-monitor stdio \
-machine type=isapc,vmport=off \
-blockdev driver=file,node-name=fda-img,filename=fda.img \
-blockdev driver=raw,node-name=fda,file=fda-img \
-global isa-fdc.driveA=fda \
-device isa-vga,vgamem_mb=1 \
-serial msmouse
[…]
fda: fda.img (raw)
Attached to: /machine/unattached/device[12]
Removable device: not locked, tray closed
Cache mode: writeback
[…]
dev: isa-fdc, id ""
iobase = 1008 (0x3f0)
irq = 6 (0x6)
dma = 2 (0x2)
driveA = ""
driveB = ""
check_media_rate = true
fdtypeA = "auto"
fdtypeB = "auto"
fallback = "288"
isa irq 6
bus: floppy-bus.0
type floppy-bus
dev: floppy, id ""
unit = 0 (0x0)
drive = "fda"
logical_block_size = 512 (0x200)
physical_block_size = 512 (0x200)
min_io_size = 0 (0x0)
opt_io_size = 0 (0x0)
discard_granularity = 4294967295 (0xffffffff)
write-cache = "auto"
share-rw = false
drive-type = "144"

Extract and sum certain layers of NetCDF variable

I have a NetCDF file, here is a truncated output of ncdump -h:
dimensions:
lat = 720 ;
lon = 1440 ;
cft = 64 ;
natpft = 14 ;
double PCT_CFT(cft, lat, lon) ;
PCT_CFT:long_name = "percent cft" ;
PCT_CFT:units = "unitless" ;
PCT_CFT:_FillValue = -9999. ;
PCT_CFT:coordinates = "LON LAT" ;
double PCT_NAT_PFT(natpft, lat, lon) ;
PCT_NAT_PFT:long_name = "percent pft" ;
PCT_NAT_PFT:units = "unitless" ;
PCT_NAT_PFT:_FillValue = -9999. ;
PCT_NAT_PFT:coordinates = "LON LAT" ;
What I need is to extract and sum values from the variable PCT_CFT for the layers 3, 4, 61 and 62 along the dimension cft and then sum up almost all the remaining layers (ie. 5-60, 63, 64) and add these two results to the variable PCT_NAT_PFT as layers 16 and 15 along the dimension natpft respectively.
I would like to achieve this using NCO (or CDO) if possible, I want to avoid using other tools like Python or R... I only know how to sum up the variable across the whole dimension but not across selected layers only - I could therefore probably work around this problem, but I'd like to know if there's a better and cleaner way to do so.
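I will assume your input file is in.nc and that the cft layers in your question are one-based. NCO dimension indices are zero-based by default, so each index below is one less than the layer number.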
1) sum along cft layers,3-4,61-62
ncks --msa_usr_rdr -v PCT_CFT -d cft,2,3 -d cft,60,61 in.nc in_1.nc
ncwa -a cft -y sum in_1.nc sum_1.nc
2) sum along cft layers, 5-60,63-64
ncks --msa_usr_rdr -v PCT_CFT -d cft,4,59 -d cft,62,63 in.nc in_2.nc
ncwa -a cft -y sum in_2.nc sum_2.nc
3) add two new layers to PCT_NAT_PFT
ncks -v PCT_NAT_PFT --msa_usr_rdr -d natpft,0,13 -d natpft,0,1 in.nc in_3.nc
4) add sums from 1), 2) to PCT_NAT_PFT
ncap2 -v -A -s 'PCT_NAT_PFT(15,:,:)=PCT_CFT(:,:);' sum_1.nc in_3.nc
ncap2 -v -A -s 'PCT_NAT_PFT(14,:,:)=PCT_CFT(:,:);' sum_2.nc in_3.nc
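The index arithmetic behind the -d arguments in steps 1) and 2) can be checked with a small sketch. It is only an illustration of the one-based to zero-based conversion; hyperslab_args is a hypothetical helper, not part of NCO, and it just builds the argument strings without running any NCO command.

```python
def hyperslab_args(dim, one_based_ranges):
    """Build NCO '-d dim,min,max' arguments from one-based inclusive layer ranges."""
    args = []
    for first, last in one_based_ranges:
        if first < 1 or last < first:
            raise ValueError(f"invalid one-based range: {first}-{last}")
        # NCO indices are zero-based by default, so shift both ends down by one.
        args += ["-d", f"{dim},{first - 1},{last - 1}"]
    return args


# Layers 3-4 and 61-62 (first sum), then layers 5-60 and 63-64 (second sum).
first_sum = hyperslab_args("cft", [(3, 4), (61, 62)])
second_sum = hyperslab_args("cft", [(5, 60), (63, 64)])
print(" ".join(first_sum))
print(" ".join(second_sum))
```

The two printed strings should agree with the -d arguments passed to ncks in steps 1) and 2).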

Understand tcpdump output for RTCP RR and SR

Can somebody explain the SR/RR parts of this tcpdump output?
Example: tcpdump -n udp -x port 5091 and less 129 -T rtcp
16:58:15.034159 IP 1.2.3.4.5091 > 10.2.3.4.45041: sr #3665059093.56 3025985984 1003p 160480b 3l 1012s 12j #23811.54+1.80 sdes 12
16:58:23.753766 IP 1.2.3.4.5091 > 10.2.3.4.45041: rr 5l 1446s 24j #23820.57+1.49 bye 8
Thanks!
I found the information here
I believe values that are 0 (e.g. packet loss) are left out of the output.
Adding -vvv for more verbose output gives something along the lines of:
sr 608743728 #3665062971.29 3057007839 124p 19840b 458089647 2l 135s 7j #0.00+0.00 sdes 12 608743728
which corresponds to:
rtcp_type;ssrc_sender;ntp_timestamp_reference;media_timestamp_reference;num_packets_sent;num_bytes_sent;ssrc_source;packet_loss;ext_last_seq_received;jitter;some_ts;no_bytes_of_source_desc;ssrc_sender
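As a rough illustration of that mapping, the sketch below labels the leading fields of a -vvv sender report line by position. It assumes the exact layout shown above; parse_sr and SR_FIELDS are hypothetical names, and because zero-valued fields may be left out of the output, a positional parse like this would mislabel such lines.

```python
# Labels for the first eleven whitespace-separated tokens of a verbose SR line,
# in the order given in the mapping above.
SR_FIELDS = [
    "rtcp_type", "ssrc_sender", "ntp_timestamp_reference",
    "media_timestamp_reference", "num_packets_sent", "num_bytes_sent",
    "ssrc_source", "packet_loss", "ext_last_seq_received", "jitter", "some_ts",
]


def parse_sr(line):
    """Label the leading tokens of a verbose tcpdump 'sr' line by position."""
    tokens = line.split()
    if len(tokens) < len(SR_FIELDS) or tokens[0] != "sr":
        raise ValueError("not a complete verbose sender report line")
    # zip stops at the shorter sequence, so the trailing 'sdes ...' part is ignored.
    return dict(zip(SR_FIELDS, tokens))


report = parse_sr(
    "sr 608743728 #3665062971.29 3057007839 124p 19840b 458089647 "
    "2l 135s 7j #0.00+0.00 sdes 12 608743728"
)
for name, value in report.items():
    print(f"{name}: {value}")
```

The values keep their tcpdump suffixes (p, b, l, s, j), which makes it easy to compare them against the raw line.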