What is the difference between ipfs.add and ipfs.dag.put?

Using the js-ipfs lib, I'm struggling to find good information regarding the difference between the following commands:
> await ipfs.add('hello world',{cidVersion:1})
{
path: 'bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e',
cid: CID(bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e),
size: 11,
mode: undefined,
mtime: undefined
}
> await ipfs.dag.put('hello world')
CID(bafyreifg3qptriirganaf6ggmbdhclgzz7gncundvtsyrovyzqigm25jfe)
My expectation was that the two CIDs would be the same.
Would appreciate any pointers.

The following is based on ipfs add and ipfs dag put from the Kubo RPC API, but js-ipfs was modeled on it and most likely does something similar (the only difference may be that dag put in Kubo expects dag-json input by default).
ipfs add is for adding files and directories and representing them as UnixFS. It will chunk bigger files into smaller blocks and represent them as dag-pb or raw bytes (leaves).
ipfs dag put allows you to operate on IPLD data structures other than files and directories. In Kubo, this command will assume input to be dag-json and will store it as binary dag-cbor.
You can compare the produced CIDs at https://cid.ipfs.tech – in your example:
ipfs add created a raw block (because 'hello world' fits in a single block, there is no need for dag-pb)
ipfs dag put created a dag-cbor block (because that is the default --store-codec)
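If it helps to see where the difference comes from, here is a rough Python sketch that rebuilds both CIDs by hand. This is not js-ipfs; cbor2 is just a generic CBOR encoder standing in for dag-cbor, and the sha2-256/base32 defaults are assumed from the output above:

import base64
import hashlib

import cbor2  # stand-in CBOR encoder; dag-cbor of a plain string is ordinary CBOR

def cid_v1(codec: int, digest: bytes) -> str:
    # CIDv1 = <version><codec><multihash>, rendered as multibase base32 ("b" prefix).
    # 0x12 0x20 is the multihash header for a 32-byte sha2-256 digest.
    body = bytes([0x01, codec, 0x12, 0x20]) + digest
    return "b" + base64.b32encode(body).decode().lower().rstrip("=")

data = b"hello world"

# ipfs.add stored the bytes as-is in a single raw block (codec 0x55)...
print(cid_v1(0x55, hashlib.sha256(data).digest()))
# ...while ipfs.dag.put first encoded the string as dag-cbor (codec 0x71).
print(cid_v1(0x71, hashlib.sha256(cbor2.dumps("hello world")).digest()))

The first line should print the bafkrei... CID from ipfs.add and the second the bafyrei... CID from ipfs.dag.put: the same input, but a different codec (and, for dag-cbor, a different byte representation) goes into the hash.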

CSV Data Set Config not looping

I'm using v5.1.1 of JMeter and attempting to use the "CSV Data Set Config". The file is read correctly as I can tell from the Debug Sampler/Results Tree, but the file is not being read line by line. In other words, it reads the first line and never proceeds to the next line for processing.
I would like to use the data inside the CSV to iterate over a series of HTTP Requests to an external API. I currently have a single thread with only the "CSV Data Set Config" and "HTTP Request".
Do I need to wrap this with a ForEach controller or another looping construct? Perhaps I'm missing it but I do not see in the documentation that would indicate it's necessary.
Thanks
You don't need to wrap this in a ForEach controller. The first line in the CSV file holds the variable names.
Let's say your CSV file looks like this:
foo, bar
1, John
2, George
3, Laura
If you then use an HTTP Request sampler, ${foo} and ${bar} will be filled in from the next CSV line on each iteration. However, please be mindful of the CSV Data Set Config options (Recycle on EOF, Stop thread on EOF, Sharing mode).
By default CSV Data Set Config doesn't trigger any "looping"; it reads the next line from the CSV file for each thread (virtual user) on each iteration.
So if you want to see more values from the CSV file - either add more users or loops or both.
Given this CSV file:
line1
line2
line3
With a CSV Data Set Config and a Thread Group using more than one thread and/or loop (screenshots of both setups and of the resulting values omitted), each virtual user reads the next line on each iteration; the __threadNum() function can be used to visualize the current virtual user number and the ${__jm__Thread Group__idx} pre-defined variable the current Thread Group iteration.
Check out the JMeter Parameterization - The Complete Guide article for more information on various approaches to parameterizing JMeter tests using external data sources.

LMDB: How to interpret output from mdb_stat and mdb_dump utilities

I have a functional LMDB that, for test purposes, currently contains only 21 key / value records. I've successfully tested inserting and reading records, and I'm comfortable with the database working as intended.
However, when I use the mdb_stat and mdb_dump utilities, I see the following output, respectively:
Status of Main DB
Tree depth: 1
Branch pages: 0
Leaf pages: 1
Overflow pages: 0
Entries: 1
VERSION=3
format=bytevalue
type=btree
mapsize=1073741824
maxreaders=126
db_pagesize=4096
HEADER=END
4d65737361676573
000000000000010000000000000000000100000000000000d81e0000000000001500000000000000ba1d000000000000
DATA=END
In particular, why would mdb_stat indicate only one entry when I have 21? Moreover, each entry comprises 1024 x 300 values of five bytes per value. mdb_dump obviously doesn't show anywhere near the 1,536,000 bytes I'd expect to see, yet the values I mdb_put() and mdb_get() on the fly are correct. Anyone know what's going on?
The relationship between an operating system's directory and an LMDB environment's data.mdb and lock.mdb files is one-to-one.
If the LMDB environment (in the OS directory) contains more than one named database, then the environment also keeps a separate main database whose entries are the names of those named databases.
By default, when fed an environment directory on the command line, the mdb_stat and mdb_dump utilities report only on that main database of names, not on the database(s) storing the actual data of interest.
4d65737361676573 is the ASCII for "Messages", which is the name of the table ("sub-db" in LMDB terminology) storing the actual data in your case.
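For the curious, you can decode that single main-DB entry by hand: the value is LMDB's internal MDB_db record for the sub-db. A rough Python sketch, assuming the MDB_db layout from mdb.c on a little-endian 64-bit build of LMDB 0.9:

import struct

key = bytes.fromhex("4d65737361676573")
val = bytes.fromhex(
    "00000000000001000000000000000000"
    "0100000000000000d81e000000000000"
    "1500000000000000ba1d000000000000"
)

print(key.decode("ascii"))  # -> Messages

# Assumed MDB_db layout: uint32 pad, uint16 flags, uint16 depth, then
# branch, leaf and overflow page counts, entry count, and root page number.
pad, flags, depth, branch, leaf, overflow, entries, root = struct.unpack("<IHHQQQQQ", val)
print(depth, branch, leaf, overflow, entries, root)
# -> 1 0 1 7896 21 7610

So the 21 entries are there, just one level down, and the ~1.5 MB values live in the 7896 overflow pages (roughly 376 four-kilobyte pages per value), which is why the plain dump of the main DB looks so small.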
The mdb_dump command only dumps the main db by default. You can use the -s option to dump that sub-db, i.e.
mdb_dump -s Messages
or you can use the -a option to dump all the sub-dbs.
Since you are using a sub-database, the number of entries in the main database corresponds to the number of sub-databases you've created (i.e. just 1).
Try using mdb_stat -a. This will show you a break-down of all the sub-databases (as well as the main DB). In this breakdown it will list the number of entries for each sub-database. Here you should see your 21 entries.
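The same break-down can be reproduced programmatically; for example, a small sketch using the py-lmdb binding (the environment path is a placeholder):

import lmdb

# max_dbs must be non-zero so that named sub-databases can be opened.
env = lmdb.open("/path/to/env", max_dbs=4, readonly=True, lock=False)
messages = env.open_db(b"Messages", create=False)

with env.begin() as txn:
    print("main DB entries:", env.stat()["entries"])           # 1 (just the sub-db name)
    print("Messages entries:", txn.stat(messages)["entries"])  # your 21 records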

How to Convert Open Image Dataset to LMDB [duplicate]

I am relatively new to machine learning/python/ubuntu.
I have a set of images in .jpg format where half contain a feature I want Caffe to learn and half don't. I'm having trouble finding a way to convert them to the required LMDB format.
I have the necessary text input files.
My question is: can anyone provide a step-by-step guide on how to use convert_imageset.cpp in the Ubuntu terminal?
Thanks
A quick guide to Caffe's convert_imageset
Build
The first thing you must do is build Caffe and Caffe's tools (convert_imageset is one of these tools).
After installing Caffe and running make, make sure you also ran make tools.
Verify that a binary file convert_imageset is created in $CAFFE_ROOT/build/tools.
Prepare your data
Images: put all images in a folder (I'll call it here /path/to/jpegs/).
Labels: create a text file (e.g., /path/to/labels/train.txt) with a line per input image. For example:
img_0000.jpeg 1
img_0001.jpeg 0
img_0002.jpeg 0
In this example the first image is labeled 1 while the other two are labeled 0.
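The question mentions the text input files already exist, but in case anyone needs to generate one, here is a small sketch; the with_feature/without_feature subfolders and the paths are hypothetical, so adapt them to your own layout:

import os

root = "/path/to/jpegs"                               # the folder later passed to convert_imageset
folders = {"with_feature": 1, "without_feature": 0}   # hypothetical subfolders per class

with open("/path/to/labels/train.txt", "w") as f:
    for folder, label in folders.items():
        for name in sorted(os.listdir(os.path.join(root, folder))):
            if name.lower().endswith((".jpg", ".jpeg")):
                # paths in the list file are resolved relative to the root folder
                f.write("%s/%s %d\n" % (folder, name, label))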
Convert the dataset
Run the binary in shell
~$ GLOG_logtostderr=1 $CAFFE_ROOT/build/tools/convert_imageset \
--resize_height=200 --resize_width=200 --shuffle \
/path/to/jpegs/ \
/path/to/labels/train.txt \
/path/to/lmdb/train_lmdb
Command line explained:
Setting the GLOG_logtostderr flag to 1 before calling convert_imageset tells the logging mechanism to redirect log messages to stderr.
--resize_height and --resize_width resize all input images to the same size, 200x200.
--shuffle randomly changes the order of images rather than preserving the order of the /path/to/labels/train.txt file.
The remaining arguments are the path to the images folder, the labels text file and the output name. Note that the output path should not exist prior to calling convert_imageset, otherwise you'll get a scary error message.
Other flags that might be useful:
--backend - lets you choose between an LMDB dataset and LevelDB.
--gray - converts all images to grayscale.
--encoded and --encoded_type - keep image data in encoded (jpg/png) compressed form in the database.
--help - shows some help; see all relevant flags under Flags from tools/convert_imageset.cpp
You can check out $CAFFE_ROOT/examples/imagenet/convert_imagenet.sh for an example of how to use convert_imageset.
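To sanity-check the result, you can open the produced LMDB and parse one record back into a Caffe Datum. A rough sketch, assuming pycaffe is built and the paths used above:

import lmdb
from caffe.proto import caffe_pb2  # available after building pycaffe

env = lmdb.open("/path/to/lmdb/train_lmdb", readonly=True, lock=False)
with env.begin() as txn:
    print("entries:", txn.stat()["entries"])  # should match the number of listed images
    key, value = next(iter(txn.cursor()))     # first (key, value) pair in the database
    datum = caffe_pb2.Datum()
    datum.ParseFromString(value)
    print(key, datum.label, datum.channels, datum.height, datum.width)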

Entry delimiter of JSON files for Hive table

We are collecting JSON data (public social media posts in particular) via REST API invocations, which we plan to dump into HDFS and then abstract a Hive table on top of it using a SerDe. I wonder, though, what the appropriate delimiter per JSON entry in a file would be. Is it a new line ("\n")? So it would look like this:
{ id: entry1 ... post: }
{ id: entry2 ... post: }
...
{ id: entryn ... post: }
How about if we encounter a new line character within the JSON data itself, for example in post?
The best way would be one record per line, separated by "\n" exactly as you guessed.
This also means that you should be careful to escape "\n" that may be inside the JSON elements.
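Writing the records out with a standard JSON library already takes care of that, since an embedded newline is escaped to the two characters \n inside the string. A small Python sketch (the file name and fields are made up):

import json

posts = [
    {"id": "entry1", "post": "first line\nsecond line"},  # embedded newline in the post
    {"id": "entry2", "post": "plain text"},
]

with open("posts.json", "w") as f:
    for post in posts:
        # json.dumps never emits a raw newline, so one record == one physical line
        f.write(json.dumps(post) + "\n")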
Indented (pretty-printed) JSON won't work well with Hadoop/Hive, since to distribute processing, Hadoop must be able to tell where a record ends, so that it can split a file of N bytes among W workers into W chunks of size roughly N/W.
The splitting is done by the particular InputFormat that's been used, in case of text, TextInputFormat.
TextInputFormat will basically split the file at the first instance of "\n" found after byte i*N/W (for i from 1 to W-1).
For this reason, having other "\n" characters around would confuse Hadoop and give you incomplete records.
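As a concrete illustration, here is a toy simulation in plain Python (not Hadoop code) of a reader that splits a file of N bytes among W workers and treats every "\n" as a record boundary; the record containing a raw embedded newline comes out as two broken fragments:

records = [
    b'{"id": "entry1", "post": "ok"}',
    b'{"id": "entry2", "post": "line one\nline two"}',  # raw, unescaped newline
    b'{"id": "entry3", "post": "ok"}',
]
data = b"\n".join(records) + b"\n"

W = 2
N = len(data)
# each split boundary is the first "\n" found after byte i*N/W
boundaries = [0] + [data.index(b"\n", i * N // W) + 1 for i in range(1, W)] + [N]
for i in range(W):
    chunk = data[boundaries[i]:boundaries[i + 1]]
    for line in chunk.splitlines():
        print(i, line)  # entry2 shows up as two lines, neither of them valid JSON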
As an alternative (I wouldn't recommend it), you could use a character other than "\n" by configuring the property "textinputformat.record.delimiter" when reading the file through Hadoop/Hive, picking a character that won't appear in the JSON (for instance \001, i.e. CTRL-A, which Hive commonly uses as a field delimiter). That can be tricky, though, since the delimiter also has to be supported by the SerDe.
Also, if you change the record delimiter, anybody who copies or uses the file on HDFS must be aware of the delimiter, or they won't be able to parse it correctly and will need special code to do so. If you keep "\n" as the delimiter, the files remain normal text files that can be used by other tools.
As for the SerDe, I'd recommend this one, with the disclaimer that I wrote it :)
https://github.com/rcongiu/Hive-JSON-Serde

How to trigger an OpenNMS event with thresholds

It seems that it is not possible for me to trigger an event in OpenNMS using a threshold...
First the facts (in as much detail as I can give):
I want to monitor an HTML file, or more precisely its content.
If a value is not what I expect, OpenNMS should alert me.
My HTML file:
Document Count: 5
In /var/lib/opennms/rrd/snmp/NODE there are two files named "documentCount" (.jrb & .meta)
--> because of the http-datacollection-config.xml
In my log files I find:
INFO [LegacyScheduler-Thread-2-of-50] RrdUtils: updateRRD: updating RRD file /var/lib/opennms/rrd/snmp/21/documentCount.jrb with values '1385031023:5'
So the "5" is collected correctly.
Now I have created a threshold for this case:
<threshold type="high" ds-type="node"
value="4.0" rearm="2.0" trigger="1" triggeredUEI="uei.opennms.org/threshold/highThresholdExceeded"
filterOperator="or" ds-name="documentCount"
/>
Thresholding is also enabled in my collectd-configuration.xml.
In my opinion the threshold of 4 is exceeded, because the value is 5, so the highThresholdExceeded event should be fired. BUT IT DOESN'T.
So I'm here to ask if someone has an idea.
Regards, dawn
Check collectd.log with the following:
tail -f collectd.log | grep -i thresholding
A while back, threshold checking was moved to evaluate values while the data is being collected, as opposed to post-processing the RRD files.
Even with the log level at INFO you should find some clues as to why the threshold rule is not matching any data.