Programmatically check data transfer on IPFS

We are building a desktop app on Electron to share media on IPFS. We want to incentivize the people who, via an IPFS add or pin, make data available to other users and in effect are "seeding" the data. We want to track how much data is being sent and received by each user, programmatically and periodically.
Is there a standard pattern or a service to be able to do this?
TIA!

On the CLI you can use the ipfs stats bw -p <peer id> command to see the total bytes sent and received between your node and the peer id you pass in.
$ ipfs stats bw -p QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a
Bandwidth
TotalIn: 875 B
TotalOut: 14 kB
RateIn: 0 B/s
RateOut: 0 B/s
See: https://docs.ipfs.io/reference/api/cli/#ipfs-stats-bw
You can use the ipfs.stats.bw method to get the same data programmatically from the js implementation of IPFS, js-ipfs, or via js-ipfs-http-client talking to the HTTP API of a locally running IPFS daemon.
ipfs.stats.bw will show all traffic between the two peers, which can include DHT queries and other traffic that isn't directly related to sharing blocks of data.
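For example, from Node or an Electron process you could poll the per-peer bandwidth periodically along these lines. This is a minimal sketch, assuming a recent ipfs-http-client talking to a local daemon on port 5001, where stats.bw returns an async iterable of samples; older releases return a single object and newer ones expose a create() constructor instead of the default export, so check the version you ship:

// Sketch: periodically log per-peer bandwidth from a local IPFS daemon.
// Assumes ipfs-http-client (older default-export constructor; newer versions use create()).
const ipfsClient = require('ipfs-http-client')
const ipfs = ipfsClient('http://127.0.0.1:5001')

async function logPeerBandwidth (peerId) {
  // stats.bw filtered to a single peer; each sample has totalIn/totalOut/rateIn/rateOut
  for await (const stat of ipfs.stats.bw({ peer: peerId })) {
    console.log('TotalIn: ', stat.totalIn.toString())
    console.log('TotalOut:', stat.totalOut.toString())
    console.log('RateIn:  ', stat.rateIn.toString())
    console.log('RateOut: ', stat.rateOut.toString())
  }
}

// Poll once a minute, using the example peer id from above.
setInterval(() => {
  logPeerBandwidth('QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a').catch(console.error)
}, 60 * 1000)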
If you want info on just the blocks of data shared, then you can use ipfs bitswap ledger from the command line.
$ ipfs bitswap ledger QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a
Ledger for QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a
Debt ratio: 0.000000
Exchanges: 0
Bytes sent: 0
Bytes received: 0
See: https://docs.ipfs.io/reference/api/cli/#ipfs-bitswap-ledger
That API is not directly available in js-ipfs or js-ipfs-http-client yet.
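Until it is, one workaround from an Electron/Node app is to call the daemon's HTTP API for that command directly. A sketch, assuming a local go-ipfs daemon with its API on port 5001 and an environment that has fetch (Electron renderer or Node 18+); recent daemons require POST for API calls, and the exact response field names should be checked against your daemon version:

// Sketch: query the bitswap ledger for a peer via the daemon's HTTP API.
// The endpoint mirrors the CLI command shown above.
const peerId = 'QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a'

async function fetchLedger (peer) {
  const res = await fetch(
    `http://127.0.0.1:5001/api/v0/bitswap/ledger?arg=${peer}`,
    { method: 'POST' }
  )
  if (!res.ok) throw new Error(`ledger request failed: ${res.status}`)
  // Expect fields corresponding to the CLI output (bytes sent/received, exchanges, debt ratio).
  console.log(await res.json())
}

fetchLedger(peerId).catch(console.error)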

Related

Why does gsutil restore a file from a bucket encrypted with KMS (using a service account without DECRYPT permission)?

I am working with GCP KMS, and it seems that when I send a file to a GCP bucket (using gsutil cp) it is encrypted.
However, I have a question related to the permission to restore that file from the same bucket using a different service account. I mean, the service account that I am using to restore the file from the bucket doesn't have the Decrypt privilege, and even so the gsutil cp works.
My question is whether this is normal behavior, or if I'm missing something?
Let me describe my question:
First of all, I confirm that the default encryption for the bucket is the KEY that I set up previously:
$ gsutil kms encryption gs://my-bucket
Default encryption key for gs://my-bucket:
projects/my-kms-project/locations/my-location/keyRings/my-keyring/cryptoKeys/MY-KEY
Next, with gcloud config, I set a service account, which has "Storage Object Creator" and "Cloud KMS CryptoKey Encrypter" permissions:
$ gcloud config set account my-service-account-with-Encrypter-and-object-creator-permissions
Updated property [core/account].
I send a local file to the bucket:
$ gsutil cp my-file gs://my-bucket
Copying file://my-file [Content-Type=application/vnd.openxmlformats-officedocument.presentationml.presentation]...
| [1 files][602.5 KiB/602.5 KiB]
Operation completed over 1 objects/602.5 KiB.
After sending the file to the bucket, I confirm that the file is encrypted using the KMS key I created before:
$ gsutil ls -L gs://my-bucket
gs://my-bucket/my-file:
Creation time: Mon, 25 Mar 2019 06:41:02 GMT
Update time: Mon, 25 Mar 2019 06:41:02 GMT
Storage class: REGIONAL
KMS key: projects/my-kms-project/locations/my-location/keyRings/my-keyring/cryptoKeys/MY-KEY/cryptoKeyVersions/1
Content-Language: en
Content-Length: 616959
Content-Type: application/vnd.openxmlformats-officedocument.presentationml.presentation
Hash (crc32c): 8VXRTU==
Hash (md5): fhfhfhfhfhfhfhf==
ETag: xvxvxvxvxvxvxvxvx=
Generation: 876868686868686
Metageneration: 1
ACL: []
Next, I set another service account, but this time WITHOUT the DECRYPT permission and with the object viewer permission (so that it is able to read files from the bucket):
$ gcloud config set account my-service-account-WITHOUT-DECRYPT-and-with-object-viewer-permissions
Updated property [core/account].
After setting up the new service account (WITHOUT the Decrypt permission), the gsutil command to restore the file from the bucket works smoothly...
gsutil cp gs://my-bucket/my-file .
Copying gs://my-bucket/my-file...
\ [1 files][602.5 KiB/602.5 KiB]
Operation completed over 1 objects/602.5 KiB.
My question is whether this is normal behavior? Or, since the new service account doesn't have the Decrypt permission, shouldn't the gsutil cp to restore the file fail? I mean, isn't the idea that with KMS encryption the 2nd gsutil cp command should fail with a "403 permission denied" error message or something?
If I revoke the "Storage Object Viewer" privilege from the 2nd service account (used to restore the file from the bucket), then gsutil fails, but that is because it doesn't have permission to read the file:
$ gsutil cp gs://my-bucket/my-file .
AccessDeniedException: 403 my-service-account-WITHOUT-DECRYPT-and-with-object-viewer-permissions does not have storage.objects.list access to my-bucket.
I would appreciate it if someone could give me a hand and clarify the question... specifically, I am not sure whether the command gsutil cp gs://my-bucket/my-file . should work or not.
I think it shouldn't work (because the service account doesn't have the Decrypt permission), or should it work?
This is working correctly. When you use Cloud KMS with Cloud Storage, the data is encrypted and decrypted under the authority of the Cloud Storage service, not under the authority of the entity requesting access to the object. This is why you have to add the Cloud Storage service account to the ACL for your key in order for CMEK to work.
When an encrypted GCS object is accessed, the KMS decrypt permission of the accessor is never used and its presence isn't relevant.
If you don't want the second service account to be able to access the file, remove its read access.
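If you want to verify this yourself, you can look up the Cloud Storage service agent and check that it, rather than your per-user service accounts, is the principal granted on the key. A rough sketch with gsutil/gcloud, reusing the key names from the question and a placeholder my-bucket-project for the project that owns the bucket:

$ gsutil kms serviceaccount -p my-bucket-project
$ gcloud kms keys get-iam-policy MY-KEY \
    --keyring my-keyring --location my-location --project my-kms-project

The first command prints the Cloud Storage service agent (of the form service-<project-number>@gs-project-accounts.iam.gserviceaccount.com); the second lists the IAM bindings on the key, where that agent should hold roles/cloudkms.cryptoKeyEncrypterDecrypter.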
By default, Cloud Storage encrypts all object data using Google-managed encryption keys. You can instead provide your own keys. There are two types:
CSEK, which you must supply yourself
CMEK, which you also supply, but which is managed by the Cloud KMS service (this is the one you are using)
When you use gsutil cp, you are already using the encryption method behind the scenes. So, as stated in the documentation for Using Encryption Keys:
While decrypting a CSEK-encrypted object requires supplying the CSEK in one of the decryption_key attributes, this is not necessary for decrypting CMEK-encrypted objects because the name of the CMEK used to encrypt the object is stored in the object's metadata.
As you can see, supplying the key is not necessary because its name is already stored in the object's metadata, which is what gsutil uses.
If encryption_key is not supplied, gsutil ensures that all data it writes or copies instead uses the destination bucket's default encryption type - if the bucket has a default KMS key set, that CMEK is used for encryption; if not, Google-managed encryption is used.

Is the "aws s3 cp" command implemented with multiple threads?

I am a newbie at using the aws s3 client. I tried to use the "aws s3 cp" command to download a batch of files from S3 to the local file system, and it is pretty fast. But I then tried to read all the contents of that batch of files in a single-threaded loop using the Amazon Java SDK API, and it is surprisingly several times slower than the "aws s3 cp" command :<
Does anyone know the reason? I suspect that "aws s3 cp" is multi-threaded.
If you look at the source of transferconfig.py, it indicates that the defaults are:
DEFAULTS = {
    'multipart_threshold': 8 * (1024 ** 2),
    'multipart_chunksize': 8 * (1024 ** 2),
    'max_concurrent_requests': 10,
    'max_queue_size': 1000,
}
which means that it can be doing 10 requests at the same time, and that it also chunks the transfers into 8 MB pieces when the file is larger than 8 MB.
This is also documented in the S3 CLI configuration documentation.
These are the configuration values you can set for S3:
max_concurrent_requests - The maximum number of concurrent requests.
max_queue_size - The maximum number of tasks in the task queue.
multipart_threshold - The size threshold the CLI uses for multipart transfers of individual files.
multipart_chunksize - When using multipart transfers, this is the chunk size that the CLI uses for multipart transfers of individual files.
You could tune it down, to see if it compares with your simple method:
aws configure set default.s3.max_concurrent_requests 1
Don't forget to tune it back up afterwards, or else your AWS performance will be miserable.
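If you want your own code to behave more like the CLI, the key is to keep several GET requests in flight at once rather than reading the objects one by one. Here is a rough sketch of that idea using the AWS SDK for JavaScript v2 (not the Java SDK from the question); the bucket and prefix are placeholders, and pagination is ignored for brevity:

// Sketch: download a batch of objects with N requests in flight, mimicking
// the CLI's max_concurrent_requests behaviour.
const AWS = require('aws-sdk')
const s3 = new AWS.S3()

async function downloadAll (bucket, prefix, concurrency = 10) {
  const { Contents = [] } = await s3
    .listObjectsV2({ Bucket: bucket, Prefix: prefix })
    .promise()
  const queue = Contents.map(o => o.Key)

  // Simple worker pool: `concurrency` downloads running at any given time.
  const workers = Array.from({ length: concurrency }, async () => {
    while (queue.length > 0) {
      const key = queue.shift()
      const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise()
      console.log(key, obj.Body.length, 'bytes')
    }
  })
  await Promise.all(workers)
}

downloadAll('my-bucket', 'my-prefix/').catch(console.error)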

How to limit the number of jobs on a host using Sungrid?

I am using SunGrid 6.2u5. I am trying to submit some jobs on 4 hosts. I need to run 50 jobs using all 4 hosts, but I want to tell SGE that only 5 jobs should run on the 4th host at any given time. How do I do that?
I am new to SunGrid. Could anyone please point me to the SGE basics? I mean, where do I get started?
I found this online,
Beginner's Guide to Sun Grid Engine 6.2 by Daniel Templeton
but apparently this is intended for system administrators; I am just a normal user who is trying to understand the SGE features.
Thanks,
If you should not run more than 5 jobs on the 4th node (let's call it computer04), it is probably not capable of running more than that. In general, you are encouraged to specify the amount of resources for your job properly, to prevent overloading cores and out-of-memory situations.
If you have 20 GB in total on computer04 and your job uses 5 GB, you can limit all your jobs to 5 GB of memory usage:
qsub -l vmem=5G my_work
The same holds for the amount of disk space:
qsub -l fsize=10G my_work
I found it is possible to run a job on a specific host with the -l h= option:
qsub -l h=computer04 -l vmem=5G my_work
for 5 jobs. Then use
qsub -l vmem=5G my_work
for other 45 jobs.
(A dirtier way)
You could do it without memory/disk restrictions:
qsub -l h=computer04 my_work # 5 jobs
qsub -l h="!computer04" my_work # for the other 45 jobs
If you have different queues or resources, you could use them for different jobs. E.g., if you have queue_4 that runs everything on computer04, and queue_main that is linked to the other computers, then you do
qsub -q queue_4 my_work
for 5 jobs, and
qsub -q queue_main my_work
for other jobs.
UPD on comment:
It is possible to force SGE to deny more than X jobs per user/host. This has to be done by the queue administrator.
qconf -arqs
{
   name         max_jobs_per_computer04
   description  "maximal number of jobs for user1 on computer04 restricted to 5!"
   enabled      TRUE
   limit        users user1 hosts computer04 to slots=5
}
If you want to restrict your user only when submitting jobs of a certain kind to computer04, you need to define a complex parameter as shown here.

Unable to access Google Compute Engine instance using external IP address

I have a Google Compute Engine instance (CentOS) which I could access using its external IP address until recently.
Now, suddenly, the instance cannot be accessed using its external IP address.
I logged in to the developer console and tried rebooting the instance but that did not help.
I also noticed that the CPU usage is almost at 100% continuously.
On further analysis of the serial port output, it appears the init process is not loading properly.
I am pasting below the last few lines from the serial port output of the virtual machine.
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2014-07-04 07:40:53 UTC (1404459653)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
Failed to execute /init
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
Pid: 1, comm: swapper Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? init_post+0xa8/0x100
[] ? kernel_init+0x2e6/0x2f7
[] ? child_rip+0xa/0x20
[] ? kernel_init+0x0/0x2f7
[] ? child_rip+0x0/0x20
Thanks in advance for any tips to resolve this issue.
Mathew
It looks like you might have a script or other program that is causing you to run out of inodes.
You can delete the instance without deleting the persistent disk (PD) and create a new VM with a higher capacity using your PD; however, if it's a script causing this, you will end up with the same issue. It's always recommended to back up your PD before making any changes.
Run this command to find more info about your instance:
gcutil --project=<project-id> getserialportoutput <instance-name>
If the issue still continues, you can either:
- Make a snapshot of your PD and work on a copy of it, or
- Delete the instance without deleting the PD.
Then attach and mount the PD to another VM as a second disk, so you can access it to find out what is causing the issue. Visit https://developers.google.com/compute/docs/disks#attach_disk for more information on how to do this.
Visit this page http://www.ivankuznetsov.com/2010/02/no-space-left-on-device-running-out-of-inodes.html for more information about inodes troubleshooting.
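As a quick check once the PD is mounted on the second VM, compare inode usage with block usage; if IUse% is at 100% while space is still free, you have run out of inodes (standard Linux commands, with /mnt/pd as a placeholder mount point):

$ df -i /mnt/pd
$ df -h /mnt/pd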
Make sure the Allow HTTP traffic setting on the VM is still enabled.
Then see which network firewall you are using and check its rules.
If your network is set up to use an ephemeral IP, it will be periodically released back, which will cause your IP to change over time. In that case, set it to static/reserved (on the Networks page).
https://developers.google.com/compute/docs/instances-and-network#externaladdresses

Hot reconfiguration of HAProxy still leads to failed requests, any suggestions?

I found there are still failed requests when the traffic is high, using a command like this
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
to hot reload the updated config file.
Below is the pressure-testing result using webbench:
/usr/local/bin/webbench -c 10 -t 30 targetHProxyIP:1080
Webbench – Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.
Benchmarking: GET targetHProxyIP:1080
10 clients, running 30 sec.
Speed=70586 pages/min, 13372974 bytes/sec.
Requests: 35289 susceed, 4 failed.
I ran the command
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
several times during the pressure testing.
In the haproxy documentation, it is mentioned:
They will receive the SIGTTOU signal to ask them to temporarily stop listening to the ports so that the new process can grab them
so there is a time period when the old process is not listening on the port (say 80) and the new process hasn't started listening on the port (say 80) yet, and during this specific time period new connections will fail. Does that make sense?
So is there any approach to reloading the haproxy configuration that will not impact either existing connections or new connections?
On recent kernels where SO_REUSEPORT is finally implemented (3.9+), this dead period does not exist anymore. While a patch has been available for older kernels for something like 10 years, it's obvious that many users cannot patch their kernels. If your system is more recent, then the new process will succeed its attempt to bind() before asking the previous one to release the port, then there's a period where both processes are bound to the port instead of no process.
There is still a very tiny possibility that a connection arrived in the leaving process' queue at the moment it closes it. There is no reliable way to stop this from happening though.