Is it possible to query an IPFS network for the uptime (or at least the blocks added over a given time) of individual nodes? - ipfs

I’d like to query an IPFS network for the uptime of its nodes individually. That is, over a duration d I would like to know roughly how much time a node has been participating in the network. Instead of time, I believe it’s also safe to frame this problem in terms of work and query for the number of blocks added within a time period.
Is there a way to do this?
(Cross-posted to discuss.ipfs.io.)

The short answer is not really. See the response on discuss.ipfs.io for more detail.
If you are fine with a trusted (rather than trustless) setup, nodes could potentially expose their block-add data through the RPC API, but this isn't supported out of the box.
Otherwise, it seems you'll have to resort to some kind of polling and crawling.
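For the polling route, a minimal sketch of the idea, assuming a local Kubo node with its HTTP RPC API on the default port 5001 and a known set of peer IDs you care about (the peer ID below is a hypothetical placeholder), might look like this:

import time
import requests

RPC = "http://127.0.0.1:5001/api/v0"
WATCHED_PEERS = {"12D3KooW..."}  # hypothetical peer IDs you want to track

def connected_peers():
    # Kubo's RPC endpoints expect POST requests.
    resp = requests.post(f"{RPC}/swarm/peers", timeout=10)
    resp.raise_for_status()
    return {p["Peer"] for p in (resp.json().get("Peers") or [])}

seen_seconds = {p: 0 for p in WATCHED_PEERS}
POLL_INTERVAL = 60

while True:
    # credit each watched peer we can currently see with another interval of "uptime"
    for peer in connected_peers() & WATCHED_PEERS:
        seen_seconds[peer] += POLL_INTERVAL
    print(seen_seconds)
    time.sleep(POLL_INTERVAL)

This only measures visibility from your own node, of course; it says nothing about blocks added, and a peer can be online without being in your swarm.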

Replacing items in message queue

Our system requirements say that we need to build a slightly unusual producer-consumer processing system. Imagine we have multiple data streams; we take a snapshot of each every X seconds and put it into the queue for processing. The number of streams is not constant: the more clients we have, the more streams we need to process. At the same time, we don't need to process ALL taken snapshots. If we have too many clients and are not able to process all items in real time, we would prefer to skip old snapshots and process only the latest ones.
So as I see it, the requirements can be met by keeping only one item in the queue for each stream. If there is a new snapshot while the previous one is still there, we need to REPLACE it, using the stream id as a key.
Is it possible to implement such behavior with a Service Bus queue or something similar? Or maybe it makes sense to look into some other solutions, like Redis?
So as I see it, the requirements can be met by keeping only one item in the queue for each stream. If there is a new snapshot while the previous one is still there, we need to REPLACE it, using the stream id as a key. Is it possible to implement such behavior with a Service Bus queue or something similar?
To the best of my knowledge, Azure Service Bus does not support this scenario; through its duplicate detection functionality, it in fact supports the exact opposite. You would need to use some other mechanism (like the Redis Cache you mentioned) to accomplish this.
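If you go the Redis route, a minimal sketch of the "one latest snapshot per stream" idea, assuming redis-py and with the hash name and payload format made up here for illustration, might look like this:

import json
import redis

r = redis.Redis()          # assumes a local Redis instance
SNAPSHOTS = "snapshots"    # hypothetical hash: field = stream id, value = latest snapshot

def publish_snapshot(stream_id, snapshot):
    # HSET overwrites the field, which gives the "replace by stream id" behaviour.
    r.hset(SNAPSHOTS, stream_id, json.dumps(snapshot))

def consume_snapshots():
    # Take whatever is pending and clear it; a production version would use a
    # pipeline or Lua script to avoid the race between HGETALL and HDEL.
    for stream_id, raw in r.hgetall(SNAPSHOTS).items():
        r.hdel(SNAPSHOTS, stream_id)
        yield stream_id.decode(), json.loads(raw)

The trade-off is that you lose the queue's ordering and delivery guarantees; you only ever see the newest snapshot per stream, which is exactly what was asked for here.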

real time data web page

I am new to programming and am working on pushing real-time data from a PLC to a web page, either by deploying HTML5 on the WAGO or by using a Modbus driver wrapper. I honestly have tried to research this but don't know where to start. It will be a closed private network with little to no influence from the outside web. I am simply looking to display a single piece of live information as a proof of concept. Basically, I'm trying to custom-design a Groov program.
You might want to look into using OPC. Kepware & SoftwareToolbox are just 2 of many vendors that offer tools to help you get your data the way you want it.
There is an existing tool that does what you want, but I am under the impression you have to write one from scratch. The existing tool is http://www.softwaretoolbox.com/cogentdatahub/ if you are interested in looking at it for ideas.
I've been able to interface with a PLC using Python and Modbus TCP, with a Raspberry Pi as the web server. Python is a quick and easy language to learn, and WebSockets are the HTML5 component best suited to real-time data.
Simple connect code (after you install everything):
from pymodbus.client.sync import ModbusTcpClient as ModbusClient
from time import sleep

client = ModbusClient('ip_address_of_modbus_IO')
if client.connect():
    # read one discrete input starting at address 200 and print its state
    print(client.read_discrete_inputs(200, 1).bits[0])
    client.write_coil(0, True)   # switch coil 0 on
    sleep(100)
    client.write_coil(2, True)   # then switch coil 2 on
found here:
http://simplyautomationized.blogspot.com/2013/09/home-automation-project-2-rpi-light.html
You can create a websocket broadcast server using the example here:
http://simplyautomationized.blogspot.com/2015/09/raspberry-pi-create-websocket-api.html
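A stripped-down broadcast server along those lines, assuming the Python websockets library (this is a sketch, not the code from that post, and read_plc_value() is a placeholder for a real pymodbus read), could look like:

import asyncio
import websockets

CLIENTS = set()

async def handler(websocket, path=None):
    # register each browser connection and keep it until it closes
    CLIENTS.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        CLIENTS.discard(websocket)

def read_plc_value():
    return 42  # hypothetical stand-in for a pymodbus read as shown above

async def broadcaster():
    while True:
        websockets.broadcast(CLIENTS, str(read_plc_value()))
        await asyncio.sleep(1)   # push the latest value once a second

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await broadcaster()

asyncio.run(main())

The page then just opens a WebSocket to port 8765 and writes each received message into a DOM element.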
Fortunately you cannot push data to a browser.
The Internet would become an even greater mess if you could.
To solve this, have your webpage contain a timer, written in JavaScript.
Every second or so, it makes an AJAX request (e.g. using jQuery's implementation) to the server, which then delivers (almost) real-time data.
The webpage then displays that in some DOM element, e.g. an empty DIV.
So it's the browser polling your server.
@BlueDog:
The data is "almost" realtime because sampling once a second gives a delay of at least one second. In the ideal case, as soon as data changes, it would be pushed to the browser. Unfortunately the browser has no way of knowing that anything changed, so the best it can do is frequently "ask" for updates (polling).
How much the delay is depends on your poll frequency. If it's once per second one has to add the delays for transmission of the page request and the reply of the server. The transmission time depends on your network (which may be the Internet with all uncertainty involved). If the backbones involved have enough capacity I expect overall delay to be between 1 and 1.5 seconds. With a dedicated network and even more frequent polling, I expect that 0.5 seconds should be possible. These are however estimated averages. If I request a page over the Internet and my provider (again) has a problem, it may be hours before I receive what I want. Also things like virus scanners and OS updates may spoil your game.
So, practically: with a good broadband connection, a stable browser, and the right process priorities, it should be possible to get below 1 second overall delay (including the poll interval) 95% of the time. Be prepared to reboot the client every few days; most browsers leak memory and most OSes do too.
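The server side of that polling approach can be tiny. A sketch, assuming Flask and with the PLC read stubbed out (read_plc_value() is a placeholder), might be:

from flask import Flask, jsonify

app = Flask(__name__)

def read_plc_value():
    return 42  # hypothetical stand-in for the real Modbus read

@app.route("/live")
def live():
    # the page's JavaScript timer polls this endpoint and updates a DIV with the result
    return jsonify(value=read_plc_value())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)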

Best way to test for total reachability

I have a set of nodes, each of which is connected to at least one other node. I would like to know if the connections are such that each node is reachable from every other. For example:
1--2
|
3--4
Versus:
1--2
3--4
I am convinced this kind of reachability testing can be framed as an exact cover problem; however, I can't seem to wrap my brain around how to do so. Does anyone have any pointers, documentation, web sites, etc., on how to do this? Examples would be extremely valuable.
Update: My ignorance has betrayed me, as it would seem there are far more efficient algorithms for this kind of test. If you have one, please point me to it.
1. Start from any node and do a depth-first or breadth-first traversal.
2. Count the number of visited nodes (of course, don't visit any node twice!).
3. Compare that count with the total number of nodes; they are equal exactly when every node is reachable from every other.
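In code, a sketch of that traversal (Python, with the graph given as an adjacency dict of sets) could be:

from collections import deque

def is_connected(adjacency):
    # adjacency: dict mapping each node to the set of its neighbours
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:   # never visit a node twice
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adjacency)

# The two examples from the question:
print(is_connected({1: {2, 3}, 2: {1}, 3: {1, 4}, 4: {3}}))  # True
print(is_connected({1: {2}, 2: {1}, 3: {4}, 4: {3}}))        # False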
There is also a fast (but quite complicated) algorithm for maintaining connectivity dynamically (i.e. under edge insertions/deletions), shown in this paper: Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity
The basic idea is maintaining a spanning tree. The easy cases are inserting an edge and deleting a non-spanning-tree edge. The problem is deleting a spanning tree edge, because connectivity is no longer guaranteed: we have to efficiently search for an alternate route to connect the broken parts, otherwise the graph is disconnected.

What is the best approach to monitor site performance in rails

I'd like to have a special part of the admin area which shows a list of pages/queries that were slow over the past week.
Is there a gem / plugin or other easy way to implement/automate such functionality for the Rails framework?
http://www.newrelic.com/features.html
And a short screencast for RPM:
http://railslab.newrelic.com/2009/01/22/new-relic-rpm
P.S. I don't work for New Relic:)
Here's one example, and another and yet another. See google for even more.
Yeah, it seems like a nice idea. I've never seen it before, but I imagine you could just log the time each request takes to a DB; then a simple query will show you the slowest requests.
How you optimize your app depends entirely on your code; I guess there's no silver bullet for that.
First thing that comes to mind: one could track the execution time of each query, and if it exceeds some threshold that is considered normal (the average, maybe) it gets logged along with some profiling information (which is discarded otherwise).
It might also be feasible to profile individual parts of the request (like data acquisition from the DB, the logic, and so on), and again compare the times against some averages.
One pitfall is that some pages/queries are bound to take significantly longer than others because of the difference in the amount of "work" they do. One would have to keep a whole lot of averages for different parts of the site / different types of queries in order to filter out the constant flow of normal queries that execute longer by design.
This is a very simple approach though, I'm sure there are better ways to do it.
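The idea is language-agnostic; here is a rough sketch in Python purely to illustrate the threshold logic (in Rails you would hang the equivalent off a controller filter or Rack middleware, and the names below are made up):

import time
from collections import defaultdict

averages = defaultdict(lambda: None)   # path -> running average response time in seconds
SLOW_FACTOR = 2.0                      # hypothetical "noticeably slower than normal" threshold

def timed(path, handler, *args, **kwargs):
    start = time.monotonic()
    result = handler(*args, **kwargs)
    elapsed = time.monotonic() - start

    avg = averages[path]
    if avg is not None and elapsed > SLOW_FACTOR * avg:
        print(f"SLOW {path}: {elapsed:.3f}s (avg {avg:.3f}s)")  # or log it to a DB table

    # exponential moving average, so one slow hit doesn't skew the baseline
    averages[path] = elapsed if avg is None else 0.9 * avg + 0.1 * elapsed
    return result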

How do download accelerators work?

We require all requests for downloads to have a valid login (non-http) and we generate transaction tickets for each download. If you were to go to one of the download links and attempt to "replay" the transaction, we use HTTP codes to forward you to get a new transaction ticket. This works fine for a majority of users. There's a small subset, however, that are using Download Accelerators that simply try to replay the transaction ticket several times.
So, in order to determine whether we want to or even can support download accelerators or not, we are trying to understand how they work.
How does having a second, third or even fourth concurrent connection to the web server delivering a static file speed the download process?
What does the accelerator program do?
You'll get a more comprehensive overview of download accelerators on Wikipedia.
Acceleration is multi-faceted
First
A substantial benefit of managed/accelerated downloads is that the tool in question remembers the start/stop offsets already transferred and uses HTTP Range headers (and the resulting partial-content responses) to request parts of the file instead of all of it.
This means if something dies mid-transfer (e.g. a TCP timeout) it just reconnects where it left off, and you don't have to start from scratch.
Thus, if you have an intermittent connection, the aggregate transfer time is greatly lessened.
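As a concrete illustration of the resume case (a sketch using Python's requests; the URL and file name are made up), the tool simply asks for the bytes it doesn't have yet:

import os
import requests

url = "https://example.com/big.iso"   # hypothetical URL
path = "big.iso"

already_have = os.path.getsize(path) if os.path.exists(path) else 0
headers = {"Range": f"bytes={already_have}-"}

with requests.get(url, headers=headers, stream=True, timeout=30) as resp:
    if resp.status_code == 206:        # server honoured the Range header
        with open(path, "ab") as f:
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                f.write(chunk)
    elif resp.status_code == 200:
        pass                           # server ignored the Range header; start over instead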
Second
Download accelerators like to break a single transfer into several smaller segments of equal size, using the same start-range-stop mechanics, and perform them in parallel, which greatly improves transfer time over slow networks.
There's this annoying thing called the bandwidth-delay product, where the size of the TCP buffers at either end combines with the round-trip (ping) time to determine the speed you actually experience, and in practice this means large ping times will limit your speed regardless of how many megabits/sec all the interim links have.
However, this limitation appears to be "per connection", so multiple TCP connections to a single server can help mitigate the performance hit of the high latency ping time.
Hence, people who live nearby are not so likely to need to do a segmented transfer, but people who live in far-away locations are more likely to benefit from going crazy with their segmentation.
Third
In some cases it is possible to find multiple servers that provide the same resource, sometimes a single DNS address round-robins to several IP addresses, or a server is part of a mirror network of some kind. And download managers/accelerators can detect this and apply the segmented transfer technique across multiple servers, allowing the downloader to get more collective bandwidth delivered to them.
Support
Supporting the first kind of acceleration is what I personally suggest as a "minimum" for support, mostly because it makes a user's life easy, and it reduces the amount of aggregate data transfer you have to provide, since users don't have to fetch the same content repeatedly.
And to facilitate this, it's recommended you compute how much they have transferred and don't expire the ticket until they look "finished" (while binding traffic to the first IP that used the ticket), or until a given "reasonable" time to download it has passed, i.e. give them a window of grace before requiring they get a new ticket.
Supporting the second and third earns you bonus points, and users generally desire at least the second, mostly because international customers don't like being treated as second-class customers simply because of the greater ping time, and it doesn't objectively consume more bandwidth in any sense that matters. The worst that happens is that they might cause your total throughput to be undesirable for how your service operates.
It's reasonably straightforward to deliver the first kind of benefit without allowing the second, simply by restricting the number of concurrent transfers from a single ticket.
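One way to express that restriction (purely illustrative Python; the ticket store and the limit are made up) is to keep a per-ticket counter of in-flight transfers:

import threading

MAX_CONCURRENT_PER_TICKET = 1   # hypothetical policy: resume is fine, parallel segments are not
_active = {}                    # ticket_id -> number of in-flight transfers
_lock = threading.Lock()

def try_start_transfer(ticket_id):
    with _lock:
        if _active.get(ticket_id, 0) >= MAX_CONCURRENT_PER_TICKET:
            return False        # reject the extra parallel segment
        _active[ticket_id] = _active.get(ticket_id, 0) + 1
        return True

def finish_transfer(ticket_id):
    with _lock:
        _active[ticket_id] = max(0, _active.get(ticket_id, 0) - 1)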
I believe the idea is that many servers limit or evenly distribute bandwidth across connections. By having multiple connections, you're cheating that system and getting more than your "fair" share of bandwidth.
It's all about Little's Law. Specifically, each stream to the web server sees a certain amount of TCP latency and so will only carry so much data. Tricks like increasing the TCP window size and implementing selective ACKs help, but they are often poorly implemented and generally cause more problems than they solve.
Having multiple streams means that the latency seen by each stream is less important as the global throughput increases overall.
Another key advantage of a download accelerator, even when using a single thread, is that it's generally better than the web browser's built-in download tool. For example, if the web browser decides to die, the download tool will continue, and the download tool may support functionality like pausing/resuming that the built-in browser tool doesn't.
My understanding is that one method download accelerators use is opening many parallel TCP connections - each TCP connection can only go so fast, and is often limited on the server side.
TCP is implemented such that if a timeout occurs, the timeout period is increased. This is very effective at preventing network overloads, at the cost of speed on individual TCP connections.
Download accelerators can get around this by opening dozens of TCP connections and dropping the ones that slow to below a certain threshold, then opening new ones to replace the slow connections.
While effective for a single user, I believe it is bad etiquette in general.
You're seeing the download accelerator trying to re-authenticate using the same transaction ticket - I'd recommend ignoring these requests.
From: http://askville.amazon.com/download-accelerator-protocol-work-advantages-benefits-application-area-scope-plz-suggest-URLs/AnswerViewer.do?requestId=9337813
Quote:
The most common way of accelerating downloads is to open up parallel downloads. Many servers limit the bandwidth of one connection, so opening more in parallel increases the rate. This works by specifying an offset at which a download should start, which is supported by HTTP and FTP alike.
Of course this way of acceleration is quite "unsocial". The limitation of bandwidth is implemented to be able to serve a higher number of clients, so using this technique lowers the maximum number of peers that are able to download. That's the reason why many servers limit the number of parallel connections (recognized by IP); e.g. many FTP servers do this, so you run into problems if you download a file and try to continue browsing using your browser - technically these are two parallel connections.
Another technique to increase the download rate is a peer-to-peer network, where different sources, e.g. ones limited by asymmetric DSL on the upload side, are used for downloading.
Most download 'accelerators' really don't speed up anything at all. What they are good at doing is congesting network traffic, hammering your server, and breaking custom scripts like you've seen. Basically, instead of making one request and downloading the file from beginning to end, it makes, say, four requests: the first one downloads from 0-25%, the second from 25-50%, and so on, and it makes them all at the same time. The only particular case where this helps is if their ISP or firewall does some kind of traffic shaping such that an individual download's speed is limited to less than their total download speed.
Personally, if it's causing you any trouble, I'd say just put up a notice that download accelerators are not supported, and have the users download normally, or using only a single thread.
They don't, generally.
To answer the substance of your question, the assumption is that the server is rate-limiting downloads on a per-connection basis, so simultaneously downloading multiple chunks will enable the user to make the most of the bandwidth available at their end.
Typically download accelerators depend on partial content downloads - status code 206. Just like streaming media players, which ask the server for a small chunk of the full file, download it, and play it. Now the catch is that if a server restricts partial-content downloads, then the download accelerator won't work. It's easy to configure a server like Nginx to restrict partial-content downloads.
How to know if a file can be downloaded via ranges/partially?
Ans: check for the Accept-Ranges: header in the response. If it is present, then you are good to go.
How to implement a feature like this in any programming language?
Ans: well, it's pretty easy. Just spin up some threads/coroutines (choose threads/coroutines over processes in an I/O- or network-bound system) to download the N chunks in parallel, and save each partial chunk at the right position in the file; then you are technically done. Calculate the download speed by keeping a global variable downloaded_till_now = 0 and incrementing it as each thread finishes downloading a chunk. Don't forget about a mutex, since we are writing to a global resource from multiple threads, so do a lock.acquire() and lock.release(). Also keep a Unix-time counter and do math like
speed_in_bytes_per_sec = downloaded_till_now/(current_unix_time-start_unix_time)
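Putting that recipe together, a rough sketch (Python, requests plus a thread pool; the URL is made up and error handling is omitted) might look like this:

import threading
import time
from concurrent.futures import ThreadPoolExecutor
import requests

url = "https://example.com/big.iso"   # hypothetical URL
out_path = "big.iso"
N = 4                                 # number of parallel chunks

head = requests.head(url, timeout=30)
assert head.headers.get("Accept-Ranges") == "bytes", "server does not support ranges"
total = int(head.headers["Content-Length"])

downloaded_till_now = 0
lock = threading.Lock()
start_unix_time = time.time()

with open(out_path, "wb") as f:
    f.truncate(total)                 # pre-size the file so each worker can seek to its region

def fetch(part):
    global downloaded_till_now
    start = part * total // N
    end = (part + 1) * total // N - 1
    resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, stream=True, timeout=30)
    with open(out_path, "r+b") as f:
        f.seek(start)
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            f.write(chunk)
            with lock:                # the mutex mentioned above
                downloaded_till_now += len(chunk)

with ThreadPoolExecutor(max_workers=N) as pool:
    list(pool.map(fetch, range(N)))   # list() so worker exceptions are raised here

speed_in_bytes_per_sec = downloaded_till_now / (time.time() - start_unix_time)
print(speed_in_bytes_per_sec)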