Pin 100k hashes on own IPFS cluster - ipfs

I am running an IPFS cluster in the cloud and I would like to pin about 100k hashes of objects from the network.
I'm currently iterating through the list using ipfs pin add <hash>, but it's taking forever (some hashes can't be found immediately or take a long time to be found).
Is there a way to ask an IPFS node/cluster to pin hashes in batches? A best-effort approach would suffice, as I know some hashes may have disappeared or may not be reachable anymore.
Is there a way to achieve this quickly?

You can stream a list of hashes to ipfs pin add on STDIN. Here, /path/to/hashes is a file with one IPFS hash on each line:
ipfs pin add < /path/to/hashes
You can also pass the --progress flag to see the current pinning progress.
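For a best-effort run over a large list, one option is to pin each hash in its own ipfs pin add call with a time limit, several at a time, and collect the failures for a later retry. The following is a minimal Python sketch of that idea, not something from the answer above; the function names, the 120-second limit and the worker count are all assumptions to tune.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def pin_one(cid, timeout_s=120):
    """Try to pin one hash; return True on success, False on error or timeout."""
    try:
        result = subprocess.run(
            ["ipfs", "pin", "add", cid],
            capture_output=True,
            timeout=timeout_s,  # give up on hashes that cannot be found in time
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def pin_all(cids, pin=pin_one, workers=16):
    """Pin hashes concurrently; return (pinned, failed) lists in input order."""
    cids = list(cids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(pin, cids))
    pinned = [c for c, ok in zip(cids, results) if ok]
    failed = [c for c, ok in zip(cids, results) if not ok]
    return pinned, failed
```

Reading /path/to/hashes into a list and passing it to pin_all would give back the hashes that were pinned and those that were not, so the failed ones can be retried later or dropped.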

Related

IPFS: How to add a file to an existing folder?

Given a rather large folder that has already been pushed to the network and deleted locally, how would a file be added to that folder without re-downloading the entire folder?
You can only do it by using ipns after downloading it again with ipfs get, which should be fast if it's still pinned to your local storage:
(1) first add (i.e. re-add) your folder to ipfs recursively: ipfs add -r /path/to/folder. The second column of the last stdout line has the ipfs hash of the parent folder you just added. (The original files are still the same, so the hashes will be the same too.)
(2) then publish that hash: ipfs name publish /ipfs/<CURRENT_PARENTFOLDER_HASH>. This will return your peer ID, and you can share the link as /ipns/<PEER_ID>; repeat this step (ipfs name publish) whenever the folder contents (and therefore the parent folder hash) changes. The ipns object will then always point to the latest version of your folder.
(3) if you plan on sharing a lot, you can create a new keypair for each folder you share: ipfs key gen --type=rsa --size=2048 new-share-key … and then use that key (instead of your default key) to publish (and later republish) that folder: ipfs name publish --key=new-share-key /ipfs/<CURRENT_PARENTFOLDER_HASH>
See also the documentation here: https://docs.ipfs.io/reference/cli/#ipfs-name-publish
I'm a bit late to answer this, but I found the 2 existing answers a bit unclear.
TL;DR: Just commands and minimal info
If you want a thorough detailed explanation, scroll down to the section starting with The 2 keys to mutability.
If you just need the commands you should run, plus barebones usage info so you know how to adjust them for your use case, then read this TL;DR section.
Use IPNS / DNSLink for references to IPFS objects that can be updated
IPNS
Create a key, back it up if using in production, then use ipfs name publish to change the object that your key currently points to. Access your key by prefixing /ipns/ to commands / URLs instead of /ipfs/.
ipfs key gen test
# backup your key if used in production
ipfs key export -o /home/somewhere/safe/test.key test
umount /ipns
ipfs name publish -k test QmWRsWoZjiandZUXLyczXSoWi84hXNHvBQ49BiQx9hPdjs
# Published to k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0: /ipfs/QmWRsWoZjiandZUXLyczXSoWi84hXNHvBQ49BiQx9hPdjs
ipfs ls /ipns/k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0
# Qme85tx5Wnsjc5pZZs1JGogBNUVM2WThC18ERh6t2YFJSK 37 lorem.txt
ipfs name publish -k test QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8
# Published to k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0: /ipfs/QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8
# Since it's not a folder this time, we use 'ipfs cat' to read
# it to the console, since we know the file was plain text.
ipfs cat /ipns/k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0
# foo bar foo bar foo foo foo
# bar foo foo bar bar foo bar
DNSLink
Set a TXT record named _dnslink.<your (sub)domain>, i.e. _dnslink prefixed to the (sub)domain you want to use as an IPNS reference. Set the value to dnslink=/ipfs/<id> or dnslink=/ipns/<id> depending on whether you're pointing it at an IPFS object or an IPNS address, and replace <id> with the object ID / IPNS address you want to point it to.
Domain: privex.io
(Subdomain) Name: _dnslink.test
Record Type: TXT
Value: dnslink=/ipns/k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0
TTL (expiry): 120 (seconds)
Just like normal IPNS, you should now be able to query it with IPFS CLI tools, or IPFS gateways by using /ipns/<your_domain> instead of /ipfs/<object_id>.
If we now cat /ipns/test.privex.io we can see it's working properly, pointing to the foo bar text file (no wrapped folder).
ipfs#privex ~ $ ipfs cat /ipns/test.privex.io
foo bar foo bar foo foo foo
bar foo foo bar bar foo bar
Add an existing IPFS object ID to another IPFS object (wrapped folder)
Using the following command, you can add an individual IPFS file, or an entire wrapped folder to an existing object using their respective object IDs, and the command will output a new object ID, referencing a new object that contains both the original folder data, and the new data that you wanted to add.
The syntax for the command is: ipfs object patch add-link [object-to-add-to] [name-of-newly-added-file-or-folder] [object-to-inject]
ipfs#privex:~$ ipfs object patch add-link QmXCfnzXHThHwaTvSSAKeErxK48XkyVoL6ZNEhkpKmZyW3 hello/foo.txt QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8
QmaWoYZnSXnKqzskrBwtmZPE74qKe4AF5YfwaY83nzeCCL
The 2 keys to mutability
1. Having an IPFS object ID that stays the same despite the content changing
Unfortunately, IPFS object IDs (the ones starting with Q) are immutable, meaning their contents cannot be altered in the future without getting a new ID, due to the fact an object ID is effectively a hash (usually a form of SHA256).
HOWEVER, both IPNS and DNSLink have a solution for this.
IPNS is the "InterPlanetary Name System", which is strongly integrated into IPFS. It allows you to generate an address (public key) and a private key, similar to how Bitcoin and many other cryptocurrencies work. Using your private key, you can point your IPNS address at an IPFS object, and later re-point it at a different one.
First, you'll want to generate a key (note: you'll need a key per individual IPNS address you want)
ipfs#privex:~$ ipfs key gen test
k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0
If you plan to use your IPNS address for something other than testing, you should export the private key and keep a copy of it somewhere safe. Note that the private key is a binary file, so if you want to store it somewhere that expects plain text, you can convert it into base64 like so: base64 test.key
ipfs key export -o /home/somewhere/safe/test.key test
Next we'll publish a random IPFS folder to the IPNS address, which contains one file (lorem.txt) with a few lines of lorem ipsum text. If you use the FUSE /ipns folder, you may need to unmount it before you're able to publish via IPNS:
ipfs#privex:~$ umount /ipns
ipfs#privex:~$ ipfs name publish -k test QmWRsWoZjiandZUXLyczXSoWi84hXNHvBQ49BiQx9hPdjs
Published to k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0: /ipfs/QmWRsWoZjiandZUXLyczXSoWi84hXNHvBQ49BiQx9hPdjs
ipfs#privex:~$ ipfs ls /ipns/k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0
Qme85tx5Wnsjc5pZZs1JGogBNUVM2WThC18ERh6t2YFJSK 37 lorem.txt
That's just one example though - to prove that the IPNS address can actually be updated with different content, in this next example, I'll publish an individual text file directly to the IPNS address (not a wrapped folder).
# Publish the IPFS object 'QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8'
# to our existing named key 'test'
ipfs#privex:~$ ipfs name publish -k test QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8
# Since it's not a folder this time, 'ipfs ls' won't return anything.
# So instead, we use 'ipfs cat' to read it to the console, since we
# know the file was plain text.
ipfs#privex:~$ ipfs cat /ipns/k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0
foo bar foo bar foo foo foo
bar foo foo bar bar foo bar
DNSLink
DNSLink is a part of IPNS that allows for human readable IPNS addresses through the standard domain system (e.g. example.com).
Since the IPNS section was rather long, I'll keep this one short and sweet. If you want to know more about DNSLink, please visit dnslink.io.
First, either you already have a domain to use, or you acquire a domain from a registrar such as Namecheap.
Go to your domain record management panel - if you use Cloudflare, then they are your domain management panel. Add a TXT record for _dnslink.yourdomain.com or if you want to use a subdomain, _dnslink.mysub.yourdomain.com (on most registrars, you only enter the part before the domain you're managing, i.e. _dnslink or _dnslink.mysub).
In the value box, enter dnslink= followed by either /ipfs/ or /ipns/ depending on whether you want to use an IPFS object ID or an IPNS name address, then enter your object ID / IPNS name to the end.
For example, if you were pointing your domain to the IPNS address in the earlier example, you'd enter:
dnslink=/ipns/k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0
Or if you wanted to point it to the example folder containing lorem.txt with a few lines of lorem ipsum, it would be
dnslink=/ipfs/QmWRsWoZjiandZUXLyczXSoWi84hXNHvBQ49BiQx9hPdjs
For example purposes, here's a summary of how I set up test.privex.io
Domain: privex.io
(Subdomain) Name: _dnslink.test
Record Type: TXT
Value: dnslink=/ipns/k51qzi5uqu5dkqxbxeulacqmz5ekmopr3nsh9zmgve1dji0dccdy86uqyhq1m0
TTL (expiry): 120 (seconds)
(note: most people are fine with "auto" TTL, or the somewhat standard 600 TTL. If you intend to change the DNSLink value regularly, or you're experimenting and likely updating it constantly, you may want a low TTL of 60 or even 30)
After setting it up, with the IPNS address still pointing at the raw foo bar text data, I used ipfs cat to read the data that the domain pointed to:
ipfs#privex:~$ ipfs cat /ipns/test.privex.io
foo bar foo bar foo foo foo
bar foo foo bar bar foo bar
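As a small illustration of the record layout described above, the following Python sketch builds the name and value you would type into a DNS panel. The helper dnslink_record is a made-up name for this example, not part of any IPFS tooling, and it only formats strings; it does not create or check any DNS records.

```python
def dnslink_record(target_id, kind="ipns", subdomain=""):
    """Return (name, value) for a DNSLink TXT record.

    kind is "ipfs" for an IPFS object ID or "ipns" for an IPNS address.
    subdomain is the part before the managed domain, or "" for the domain itself.
    """
    if kind not in ("ipfs", "ipns"):
        raise ValueError("kind must be 'ipfs' or 'ipns'")
    name = "_dnslink." + subdomain if subdomain else "_dnslink"
    value = "dnslink=/{}/{}".format(kind, target_id)
    return name, value
```

Called with the IPNS address from the example and subdomain "test", it returns the name _dnslink.test and the same dnslink=/ipns/... value shown in the summary.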
2. Add existing IPFS objects to your object, without having to download/organise the object being added.
First we create the IPFS object - a wrapped folder containing hello/lorem.txt - which has the object ID QmXCfnzXHThHwaTvSSAKeErxK48XkyVoL6ZNEhkpKmZyW3
ipfs#privex:~$ mkdir hello
ipfs#privex:~$ echo -e "lorem ipsum dolor\nlorem ipsum dolor\n" > hello/lorem.txt
ipfs#privex:~$ ipfs add -p -r -w hello
added Qme85tx5Wnsjc5pZZs1JGogBNUVM2WThC18ERh6t2YFJSK hello/lorem.txt
added QmWRsWoZjiandZUXLyczXSoWi84hXNHvBQ49BiQx9hPdjs hello
added QmXCfnzXHThHwaTvSSAKeErxK48XkyVoL6ZNEhkpKmZyW3
37 B / 37 B [=======================================================================] 100.00%
ipfs#privex:~$ ipfs ls QmXCfnzXHThHwaTvSSAKeErxK48XkyVoL6ZNEhkpKmZyW3
QmWRsWoZjiandZUXLyczXSoWi84hXNHvBQ49BiQx9hPdjs - hello/
ipfs#privex:~$ ipfs ls QmXCfnzXHThHwaTvSSAKeErxK48XkyVoL6ZNEhkpKmZyW3/hello
Qme85tx5Wnsjc5pZZs1JGogBNUVM2WThC18ERh6t2YFJSK 37 lorem.txt
Next, for the sake of creating an example external object ID that isn't part of the original wrapped folder, I created foo.txt containing a couple of lines of random foo bar text, and uploaded it to IPFS on its own. Its object ID is QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8
ipfs#privex:~$ echo -e "foo bar foo bar foo foo foo\nbar foo foo bar bar foo bar\n" > foo.txt
ipfs#privex:~$ ipfs add foo.txt
added QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8 foo.txt
57 B / 57 B [======================================================================] 100.00%
Finally, we use ipfs object patch add-link to add the foo.txt object (QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8) I created before, inside of the hello/ folder of the original wrapped folder I created (QmXCfnzXHThHwaTvSSAKeErxK48XkyVoL6ZNEhkpKmZyW3).
The syntax for the command is: ipfs object patch add-link [object-to-add-to] [name-of-newly-added-file-or-folder] [object-to-inject]
ipfs#privex:~$ ipfs object patch add-link QmXCfnzXHThHwaTvSSAKeErxK48XkyVoL6ZNEhkpKmZyW3 hello/foo.txt QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8
QmaWoYZnSXnKqzskrBwtmZPE74qKe4AF5YfwaY83nzeCCL
It outputs a new object ID QmaWoYZnSXnKqzskrBwtmZPE74qKe4AF5YfwaY83nzeCCL which is the ID of the newly created object that contains both hello/lorem.txt from the original, and hello/foo.txt which was injected later on.
NOTE: This command ALSO works when adding entire wrapped folders to another wrapped folder, however, be careful to avoid double nesting. e.g. you have Qxxxx/hello/world and Qyyyy/lorem/ipsum - if you add Qyyyy to Qxxxx specifying the name lorem - it will be added as Qzzzz/lorem/lorem/ipsum
If we now do ipfs ls on the new object ID, we can see that the hello/ sub-folder contains BOTH foo.txt and lorem.txt - confirming that foo.txt was successfully injected into the duplicate, without needing to download both the original and foo.txt - then organising them properly in a folder before uploading.
ipfs#privex:~$ ipfs ls QmaWoYZnSXnKqzskrBwtmZPE74qKe4AF5YfwaY83nzeCCL
QmbU3BwdMarL8n6KCzVdYqMh6HEjCv6pLJQZhoVGWZ5bWW - hello/
ipfs#privex:~$ ipfs ls QmaWoYZnSXnKqzskrBwtmZPE74qKe4AF5YfwaY83nzeCCL/hello
QmaDDLFL3fM4sQkQfV82LdNqtNnyaeAmgC46Qc7FDQdkq8 57 foo.txt
Qme85tx5Wnsjc5pZZs1JGogBNUVM2WThC18ERh6t2YFJSK 37 lorem.txt
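To see why patching yields a new ID while the original object stays intact, here is a toy Python model of content-addressed folders. It is only an illustration under simplifying assumptions: store, put and add_link are made-up names, and the IDs are truncated SHA-256 hex digests of JSON, not real IPFS object IDs.

```python
import hashlib
import json

store = {}  # toy object store: object ID -> node


def put(node):
    """Store a node and return an ID derived from its content."""
    data = json.dumps(node, sort_keys=True).encode()
    oid = hashlib.sha256(data).hexdigest()[:16]
    store[oid] = node
    return oid


def add_link(root_id, path, target_id):
    """Return the ID of a new root in which `path` points at `target_id`.

    Existing nodes are never modified; every folder along `path` is
    copied with one link changed, so each of them gets a new ID.
    Intermediate folders in `path` are assumed to exist already.
    """
    name, _, rest = path.partition("/")
    links = dict(store[root_id]["links"])  # copy, leave the original alone
    if rest:
        links[name] = add_link(links[name], rest, target_id)
    else:
        links[name] = target_id
    return put({"links": links})


# Mirror the example: a wrapped folder with hello/lorem.txt, plus foo.txt.
lorem = put({"data": "lorem ipsum dolor"})
hello = put({"links": {"lorem.txt": lorem}})
root = put({"links": {"hello": hello}})
foo = put({"data": "foo bar foo bar"})

new_root = add_link(root, "hello/foo.txt", foo)
```

In this model new_root differs from root, the old root still lists only lorem.txt, and neither file had to be read or re-added, which is the same behaviour the ipfs ls output above shows.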
Summary
As explained in the first section, IPFS object IDs are immutable, thus while it's possible to merge existing objects on IPFS, it still results in a new object ID.
BUT, by using IPNS key addresses and/or DNSLink, you can have a mutable (editable) reference that points to any IPFS object, and can be updated to point to a new object ID on-demand, e.g. whenever you update the contents of an existing object, or if you decide you simply want your IPNS key/domain to point at something completely different, you're free to do so :)
This should be easy with the files API. Assuming you have already added the new file to ipfs and obtained its hash, try:
ipfs files cp /ipfs/QmExistingLargeFolderHash /folder-to-modify
ipfs files cp /ipfs/QmNewFileHash /folder-to-modify/new-file
This of course does not add a file to an existing folder (because folders and files are immutable), it just creates a copy/new version of the folder with a new file added. Hence, it will have a new hash:
ipfs files stat /folder-to-modify
The files API does not pin the files that are referenced or retrieve any subfolders unless necessary, so this can be done on any node in the network without incurring lots of traffic.
[Edit]
A while later, I learned that there are a few more things you can do:
Instead of
ipfs files cp /ipfs/QmNewFileHash /folder-to-modify/new-file
you can use ipfs files write -te if you haven't added the file to ipfs yet.
You can enable write features of the HTTP API to use PUT requests to obtain hashes of new versions of a folder. See this blogpost.
You can mount ipns via fuse and write to …/ipns/local.
And probably best: you can use ipfs object patch add-link /ipfs/QmExistingLargeFolderHash new-file /ipfs/QmNewFileHash to do it in one step

Why matchInDirectory should return mount points

I am implementing a Tcl Filesystem object. Can someone explain what mount points are and why they are needed? And what will happen if my matchInDirectoryProc does not return any mount points, as the native filesystem implementation does?
Let's say there is foo/bar/vfs.myzip, where vfs.myzip is a container file for which I am implementing the filesystem. I am assuming that vfs.myzip is a mount point. Should my implementation return foo/bar/vfs.myzip if the type is TCL_GLOB_TYPE_MOUNT, the path is foo/bar/ and the pattern is "*"? What if the pattern is "*/*"?
A mount point is a prefix of a path that is the root of a particular virtual filesystem (the native filesystem is special-cased, IIRC). Everything in a VFS will appear below that mount point.
So, suppose /foo/bar/vfs.myzip is the mount point, and inside the VFS is a file abc.txt, a directory def, and another file def/ghi.html. In that case, once correctly mounted the following would exist:
/foo/bar/vfs.myzip/abc.txt
/foo/bar/vfs.myzip/def
/foo/bar/vfs.myzip/def/ghi.html
Now, the matchInDirectoryProc is used inside the globbing code. Its purpose is to return the list of directory entries that match a particular set of constraints in a particular (virtual) directory. It's wrapped inside the Tcl API function Tcl_FSMatchInDirectory, whose documentation notes that:
Note that the glob code implements recursive patterns internally, so this function will only ever be passed simple patterns, which can be matched using the logic of string match. To handle recursion, Tcl will call this function frequently asking only for directories to be returned. A special case of being called with a NULL pattern indicates that the path needs to be checked only for the correct type.
That is, don't worry about that */* pattern; you'll never see it.
I'm not entirely sure how the search for mounts works, but I think it is determining if there is a mount handled by the particular VFS that matches a path. The main example of doing this that I can find online is the TclVFS package, which is rather odd in a few ways. Here's the relevant code but I think that it isn't easy to understand. But for all that, one thing is relatively clear: it's asking about mounts within a particular directory, and not recursively.
Thus, if your mount point is /foo/bar/vfs.myzip then when your code is called asking about mount points in /foo/bar it ought to return an entry for vfs.myzip. If that's the only mount point you maintain, that's the only thing you need to handle in that case.
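The lookup described above (only mounts directly inside the given directory, matched against a simple pattern) can be sketched in a language-neutral way. The Python below is an illustration of that logic only, not the Tcl C API: mounts_in_directory is a hypothetical helper, and fnmatch stands in roughly for the matching rules of string match.

```python
import fnmatch
import posixpath


def mounts_in_directory(mount_points, directory, pattern):
    """Return mount points that sit directly inside `directory` and whose
    last path component matches the simple glob `pattern`."""
    directory = posixpath.normpath(directory)
    matches = []
    for mount in mount_points:
        mount = posixpath.normpath(mount)
        parent, name = posixpath.split(mount)
        # Not recursive: only the immediate parent directory counts.
        if parent == directory and fnmatch.fnmatchcase(name, pattern):
            matches.append(mount)
    return matches
```

With /foo/bar/vfs.myzip as the only mount point, asking about /foo/bar with pattern * yields that one entry, and asking about /foo yields nothing.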
Assuming that I'm correct anyway. I don't know the virtual filesystem layer well, so this is based on reading code and documentation, not real experience…

Model derivative: translation stops at 50%, never fails, never completes

I have the following scenario: 2 Revit files, ModelA.rvt and ModelB.rvt. They cross-reference each other, and are zipped and uploaded twice under different object keys (ModelA.zip, ModelB.zip). The ZIP files are identical, very small (4 MB) and contain both files. Both are uploaded successfully in a loop using:
PUT https://developer.api.autodesk.com/oss/v2/buckets/:bucketKey/objects/:objectName
Files are overwritten with token scope data:write, and in case of a model update the POST job is called with x-ads-force = true. Then I call the POST job twice in a loop, once with ModelA.rvt as rootFilename for ModelA.zip and once with ModelB.rvt for ModelB.zip. Both POST jobs complete successfully.
Right after that I fetch the manifest for both zip files every 10 seconds. ModelB.zip is translated to 100% within a few seconds, but ModelA.zip never finishes (a few hours so far); it just hangs for no apparent reason. On Friday I thought it was just a temporary issue, but it still persists.
I tried this scenario 3 times, each time with a different set of files, today and 3 days ago, with the same result. This is the simplest case, and the files are all already present in the cloud. I still have no idea what is going on.
Another weird thing: when I list the bucket objects, the zip files are never present, while other files with non-zip extensions are.
Does anyone have a clue what is causing this, and what a possible workaround could be? This is a serious issue, because it undermines the usability and reliability of the whole API.
The linked revit files need to be in one zipfile with the new v2 API. See this post for more details: http://adndevblog.typepad.com/cloud_and_mobile/2016/07/translate-referenced-files-by-derivative-api.html

TCL help: How to check for unmount/bad disks before read file

I need help here.
I have a list of directory/file paths, and my program reads through every one of them.
One of the directories is on an unmounted or bad disk, which causes my program to hang when I try to open the file using the command below.
catch {set directory_fid [open $filePath r]}
So, how can I check the directory status before I open/read the file? I want to skip that file if there is no response within a certain time and continue with the next file.
* file isdir $dir does not work either.
* There is also no response when I use ls -dir in Unix.
Before you start down this path, I would review your requirements and see if there's any easier way to handle this. It would be better to fix the mounts so that they don't cause a hang condition if an access attempt is made.
The main problem is that for the directories you are checking, you need to know the corresponding mount point. If you don't know the mount point, it's hard to tell whether the directory you want to check will cause any hangs when you try to access it.
First, you would have to parse /etc/fstab and get a list of possible filesystem mount points (Assumption, Linux system -- if not Linux, there will be an equivalent file).
Second, to see what is currently mounted you need the di Tcl extension (wiki page, or main page w/download links) (*). Using this extension, you can get a list of mounted filesystems.
# the load only needs to be done once...
set ext [info sharedlibextension]
set lfn [file normalize [file join [file dirname [info script]] diskspace$ext]]
load $lfn
# there are various options that can be passed to the `diskspace`
# command that will change which filesystems are listed.
set fsdata [diskspace -f {}]
set fslist [dict keys $fsdata]
Now you have a list of possible mount points, and you know which are mounted.
Third, you need to figure out which mount point corresponds to the directory you want to check. For example, if you have:
/user/bll/source/stuff.c
You need to check for /user/bll/source, then /user/bll, then /user, then / as possible mount points.
There's a huge assumption here that the file or any of its parent directories are not symlinked to another place.
Once you determine the probable mount point, you can check if it is mounted:
if { $mountpoint in $fslist } {
...
} else {
# better skip this one, the probable mount point is not mounted.
}
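The third step, walking up from the file to find its probable mount point, is plain path arithmetic and can be sketched separately. The Python below is an illustrative sketch only (the answer itself uses Tcl); the function names are hypothetical, it assumes absolute POSIX paths, and it shares the assumption that no symlinks are involved.

```python
import posixpath


def candidate_mount_points(path):
    """List the parent directories of an absolute path, deepest first."""
    current = posixpath.dirname(posixpath.normpath(path))
    candidates = []
    while True:
        candidates.append(current)
        parent = posixpath.dirname(current)
        if parent == current:  # reached "/" (or ran out of components)
            break
        current = parent
    return candidates


def probable_mount_point(path, known_mount_points):
    """Return the deepest candidate that is a known mount point, else None."""
    known = set(known_mount_points)
    for candidate in candidate_mount_points(path):
        if candidate in known:
            return candidate
    return None
```

Here known_mount_points would be the list taken from /etc/fstab; the result is what then gets checked against the list of filesystems that are actually mounted.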
As you can see, this is a lot of work. It's fragile.
Better to fix the mounts so they don't hang.
(*) I wrote di and the di Tcl extension. This is a portable solution. You can of course use exec to run df or mount, but there are other issues (parsing, portability, determining which filesystems to use) if you use the more manual method.

How do you configure Xcode Server (Bot) to only keep the n most recent integrations?

We've recently discovered that Xcode Server (i.e. a Bot) will keep all past integrations. (We discovered this as the builds started failing and we realized the CI server was completely out of disk space).
How can you configure a bot (or the server in general) to only keep the last n integrations? Or even the last n days?
If there is no built-in setting, is there a way to accomplish this via a cron job that doesn't have to use the unofficial Xcode Server API?
The current max disk size is a ratio of 0.75 of the capacity (if I understand the output well). You can see it for yourself if you run curl -k -u USER:PASS https://localhost:20343/api/settings. You might be able to change it by calling this API as a PATCH request with a modified value for max_percent_disk_usage to something smaller and then giving it time to clean up. I haven't tested that however.
If you're interested in how this works, see /Applications/Xcode.app/Contents/Developer/usr/share/xcs/xcsd/routes/routes_setting.js line 19. From there you should be able to dig deeper and see for yourself.
Hope this helps.
This was very helpful, @czechboy!
The JSON document returned when you fetch the settings will contain the _id of the Xcode instance whose settings you wish to modify, and you must send the PATCH request to https://localhost:20343/api/settings/<id>. The body of the request should be something like:
{ "set_props": { "max_percent_disk_usage": 0.40 } }
After doing this I needed to restart the server before old files were cleaned up.
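For scripting this (for example from a cron job), the request could be assembled along these lines. This is an untested Python sketch that only builds the request object; build_settings_patch is a made-up helper name, the Content-Type header is an assumption, and the URL and body simply follow the description above.

```python
import base64
import json
import urllib.request


def build_settings_patch(settings_id, max_ratio, user, password,
                         base_url="https://localhost:20343/api/settings"):
    """Build (but do not send) the PATCH request for the settings document."""
    body = json.dumps({"set_props": {"max_percent_disk_usage": max_ratio}})
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return urllib.request.Request(
        "{}/{}".format(base_url, settings_id),
        data=body.encode(),
        method="PATCH",
        headers={
            "Authorization": "Basic " + token,  # same credentials as curl -u
            "Content-Type": "application/json",  # assumption, not confirmed
        },
    )
```

Sending it with urllib.request.urlopen would additionally need an SSL context with certificate verification disabled, which is what the -k flag does in the curl command.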