How do download accelerators work? - html

We require all requests for downloads to have a valid login (non-http) and we generate transaction tickets for each download. If you were to go to one of the download links and attempt to "replay" the transaction, we use HTTP codes to forward you to get a new transaction ticket. This works fine for a majority of users. There's a small subset, however, that are using Download Accelerators that simply try to replay the transaction ticket several times.
So, in order to determine whether we want to or even can support download accelerators or not, we are trying to understand how they work.
How does having a second, third or even fourth concurrent connection to the web server delivering a static file speed the download process?
What does the accelerator program do?

You'll get a more comprehensive overview of Download Accelerators at wikipedia.
Acceleration is multi-faceted
First
A substantial benefit of managed/accelerated downloads is the tool in question remembers Start/Stop offsets transferred and uses "partial" and 'range' headers to request parts of the file instead of all of it.
This means if something dies mid transaction ( ie: TCP Time-out ) it just reconnects where it left off and you don't have to start from scratch.
Thus, if you have an intermittent connection, the aggregate transfer time is greatly lessened.
Second
Download accelerators like to break a single transfer into several smaller segments of equal size, using the same start-range-stop mechanics, and perform them in parallel, which greatly improves transfer time over slow networks.
There's this annoying thing called bandwidth-delay-product where the size of the TCP buffers at either end do some math thing in conjunction with ping time to get the actual experienced speed, and this in practice means large ping times will limit your speed regardless how many megabits/sec all the interim connections have.
However, this limitation appears to be "per connection", so multiple TCP connections to a single server can help mitigate the performance hit of the high latency ping time.
Hence, people who live near by are not so likely to need to do a segmented transfer, but people who live in far away locations are more likely to benefit from going crazy with their segmentation.
Thirdly
In some cases it is possible to find multiple servers that provide the same resource, sometimes a single DNS address round-robins to several IP addresses, or a server is part of a mirror network of some kind. And download managers/accelerators can detect this and apply the segmented transfer technique across multiple servers, allowing the downloader to get more collective bandwidth delivered to them.
Support
Supporting the first kind of acceleration is what I personally suggest as a "minimum" for support. Mostly, because it makes a users life easy, and it reduces the amount of aggregate data transfer you have to provide due to users not having to fetch the same content repeatedly.
And to facilitate this, its recommended you, compute how much they have transferred and don't expire the ticket till they look "finished" ( while binding traffic to the first IP that used the ticket ), or a given 'reasonable' time to download it has passed. ie: give them a window of grace before requiring they get a new ticket.
Supporting the second and third give you bonus points, and users generally desire it at least the second, mostly because international customers don't like being treated as second class customers simply because of the greater ping time, and it doesn't objectively consume more bandwidth in any sense that matters. The worst that happens is they might cause your total throughput to be undesirable for how your service operates.
It's reasonably straight forward to deliver the first kind of benefit without allowing the second simply by restricting the number of concurrent transfers from a single ticket.

I believe the idea is that many servers limit or evenly distribute bandwidth across connections. By having multiple connections, you're cheating that system and getting more than your "fair" share of bandwidth.

It's all about Little's Law. Specifically each stream to the web server is seeing a certain amount of TCP latency and so will only carry so much data. Tricks like increasing the TCP window size and implementing selective acks help but are poorly implemented and generally cause more problems than they solve.
Having multiple streams means that the latency seen by each stream is less important as the global throughput increases overall.
Another key advantage with a download accelerator even when using a single thread is that it's generally better than using the web browsers built in download tool. For example if the web browser decides to die the download tool will continue. And the download tool may support functionality like pausing/resuming that the built-in brower doesn't.

My understanding is that one method download accelerators use is by opening many parallel TCP connections - each TCP connection can only go so fast, and is often limited on the server side.
TCP is implemented such that if a timeout occurs, the timeout period is increased. This is very effective at preventing network overloads, at the cost of speed on individual TCP connections.
Download accelerators can get around this by opening dozens of TCP connections and dropping the ones that slow to below a certain threshold, then opening new ones to replace the slow connections.
While effective for a single user, I believe it is bad etiquette in general.
You're seeing the download accelerator trying to re-authenticate using the same transaction ticket - I'd recommend ignoring these requests.

From: http://askville.amazon.com/download-accelerator-protocol-work-advantages-benefits-application-area-scope-plz-suggest-URLs/AnswerViewer.do?requestId=9337813
Quote:
The most common way of accelerating downloads is to open up parllel downloads. Many servers limit the bandwith of one connection so opening more in parallel increases the rate. This works by specifying an offset a download should start which is supported for HTTP and FTP alike.
Of course this way of acceleration is quite "unsocial". The limitation of bandwith is implemented to be able to serve a higher number of clients so using this technique lowers the maximum number of peers that is able to download. That's the reason why many servers are limiting the number of parallel connection (recognized by IP), e.g. many FTP-servers do this so you run into problems if you download a file and try to continue browsing using your browser. Technically these are two parallel connections.
Another technique to increase the download-rate is a peer-to-peer-network where different sources e.g. limited by asynchron DSL on the upload-side are used for downloading.

Most download 'accelerators' really don't speed up anything at all. What they are good at doing is congesting network traffic, hammering your server, and breaking custom scripts like you've seen. Basically how it works is that instead of making one request and downloading the file from beginning to end, it makes say four requests...the first one downloads from 0-25%, the second from 25-50%, and so on, and it makes them all at the same time. The only particular case where this helps any, is if their ISP or firewall does some kind of traffic shaping such that an individual download speed is limited to less than their total download speed.
Personally, if it's causing you any trouble, I'd say just put a notice that download accelerators are not supported, and have the users download them normally, or only using a single thread.

They don't, generally.
To answer the substance of your question, the assumption is that the server is rate-limiting downloads on a per-connection basis, so simultaneously downloading multiple chunks will enable the user to make the most of the bandwidth available at their end.

Typically download-accelerators depend on partial content download - status code 206. Just like the streaming media players, media players ask for a small chunk of the full file to the server and then download it and play. Now the catch is if a server restricts partial-content-download then the download accelerator won't work!. It's easy to configure a server like Nginx to restrict partial-content-download.
How to know if a file can be downloaded via ranges/partially?
Ans: check for a header value Accept-Ranges:. If it does exist then you are good to go.
How to implement a feature like this in any programming language?
Ans: well, it's pretty easy. Just spin up some threads/co-routines(choose threads/co-routines over processes in I/O or network bound system) to download the N-number of chunks in parallel. Save the partial files in the right position in the file. and you are technically done. Calculate the download speed by keeping a global variable downloaded_till_now=0 and increment it as one thread completes downloading a chunk. don't forget about mutex as we are writing to a global resource from multiple thread so do a thread.acquire() and thread.release(). And also keep a unix-time counter. and do math like
speed_in_bytes_per_sec = downloaded_till_now/(current_unix_time-start_unix_time)

Related

real time data web page

I am new to programming and am working on pushing real time data from a PLC to a web page either by deploying HTML 5 on the WAGO or a Modbus driver wrapper. I honestly have tried to research but don't know where to start. it will be a closed private network with little to no influence from the outside web. I am simply looking to display a single piece of live information for proof of concept. basically I'm trying to custom design a Groov program.
You might want to look into using OPC. Kepware & SoftwareToolbox are just 2 of many vendors that offer tools to help you get your data the way you want it.
There is an existing tool to do what you want, but I am under the impression you have to write one from scratch. The existing tool is http://www.softwaretoolbox.com/cogentdatahub/ if you are interested in looking at it for ideas.
I've been able to interface with PLC using python and modbusTCP and an Raspberry pi as the webserver. Python is a quick and easy to learn language. Websockets are the HTML5 component best used for realtime data.
simple connect code (after you install everything):
from pymodbus.client.sync import ModbusTcpClient as ModbusClient
from time import sleep
client = ModbusClient('ip_address_of_modbus_IO')
if(client.connect()):
print(client.read_discrete_inputs(200,1).bits[0])
client.write_coil(0,True)
sleep(100)
client.write_coil(2,True)
found here:
http://simplyautomationized.blogspot.com/2013/09/home-automation-project-2-rpi-light.html
Can create a websocket broadcast server using an example here:
http://simplyautomationized.blogspot.com/2015/09/raspberry-pi-create-websocket-api.html
Fortunately you can not push data to a browser.
The Internet would become an even greater mess if you could.
To solve this, have your webpage contain a timer, written in JavaScript.
Every say 1 sec. it does an AJAX request (e.g. use jQuery implementation) to the server, which then delivers (almost) realtime data.
The webpage then displays that in some DOM element, e.g. an empty DIV.
So it's the browser polling your server.
#BlueDog
The data is "almost" realtime because sampling once a second gives a delay of at least one second. In the ideal case, as soon as data changes, it would be pushed to the browser. Unfortunately the browser has no way of knowing that anything changed, so the best it can do is frequently "ask" for updates (polling).
How much the delay is depends on your poll frequency. If it's once per second one has to add the delays for transmission of the page request and the reply of the server. The transmission time depends on your network (which may be the Internet with all uncertainty involved). If the backbones involved have enough capacity I expect overall delay to be between 1 and 1.5 seconds. With a dedicated network and even more frequent polling, I expect that 0.5 seconds should be possible. These are however estimated averages. If I request a page over the Internet and my provider (again) has a problem, it may be hours before I receive what I want. Also things like virus scanners and OS updates may spoil your game.
So, practically: with a good broadband connection, a stable browser and the right process priorities it should be possible to get below 1 second overall delay (incl. poll time interval) for 95% of the time. Be prepared to reboot the client every few days. Most browsers leak memory and most OS'es do so too.

What does Chrome Network Timings really mean and what does affects each timing length?

I was looking at chrome dev tools #resource network timing to detect requests that must be improved. In the link before there's a definition for each timing but I don't understand what processes are being taken behind the scenes that are affecting the length of the period.
Below are 3 different images and here is my understanding of what's going on, please correct me if I'm wrong.
Stalled: Why there are timings where the request get's stalled for 1.17s while others are taking less?
Request Sent: it's the time that our request took to reach server
TTFB: Time took until the server responds with the first byte of data
Content Download: The time until the whole response reaches the client
Thanks
Network is an area where things will vary greatly. There are a lot of different numbers that go into play with these and they vary between different locations and even the same location with different types of content.
Here is some more detail on the areas you need more understanding with:
Stalled: This depends on what else is going on in the network stack. One thing could not be stalled at all, while other requests could be stalled because 6 connections to the same location are already open. There are more reasons for stalling, but the maximum connection limit is an easy way to explain why it may occur.
The stalled state means, we just can't send the request right now it needs to wait for some reason. Generally, this isn't a big deal. If you see it a lot and you are not on the HTTP2 protocol, then you should look into minimizing the number of resources being pulled from a given location. If you are on HTTP2, then don't worry too much about this since it deals with numerous requests differently.
Look around and see how many requests are going to a single domain. You can use the filter box to trim down the view. If you have a lot of requests going off to the same domain, then that is most likely hitting the connection limit. Domain sharding is one method to handle this with HTTP 1.1, but with HTTP 2 it is an anti-pattern and hurts performance.
If you are not hitting the max connection limit, then the problem is more nuanced and needs a more hands-on debugging approach to figure out what is going on.
Request sent: This is not the time to reach the server, that is the Time To First Byte. All request sent means is the request is sent and it took the network stack X time to carry it out.
Nothing you can do to speed this up, it is more for informational and internal debugging purposes.
Time to First Byte (TTFB): This is the total time for the sent request to get to the destination, then for the destination to process the request, and finally for the response to traverse the networks back to the client.
A high TTFB reveals one of two issues. The first is a bad network connection between the client and server. So data is slow to reach the server and get back. The second cause is, a slow server processing the request. This is either because the hardware is weak or the application running is slow. Or, both of these problems can exist at once.
To address a high TTFB, first cut out as much network as possible. Ideally, host the application locally on a low-resource virtual machine and see if there is still a big TTFB. If there is, then the application needs to be optimized for response speed. If the TTFB is super-low locally, then the networks between your client and the server are the problem. There are various ways to handle this that I won't get into since it is an area of expertise unto itself. Research network optimization, and even try moving hosts and seeing if your server providers network is the issue.
Remember the entire server-stack comes into play here. So if nginx or apache are configured poorly, or your database is taking a long time to respond, or your cache is having trouble, then these can cause delays. They are also difficult to detect locally, since your local server could vary in configuration from the remote stack.
Content Download: This is the total time from the TTFB resolving for the client to get the rest of the content from the server. This should be short unless you are downloading a large file. You should take a look at the size of the file, the conditions of the network, and then judge about how long this should take.

How do web spiders differ from Wget's spider?

The next sentence caught my eye in Wget's manual
wget --spider --force-html -i bookmarks.html
This feature needs much more work for Wget to get close to the functionality of real web spiders.
I find the following lines of code relevant for the spider option in wget.
src/ftp.c
780: /* If we're in spider mode, don't really retrieve anything. The
784: if (opt.spider)
889: if (!(cmd & (DO_LIST | DO_RETR)) || (opt.spider && !(cmd & DO_LIST)))
1227: if (!opt.spider)
1239: if (!opt.spider)
1268: else if (!opt.spider)
1827: if (opt.htmlify && !opt.spider)
src/http.c
64:#include "spider.h"
2405: /* Skip preliminary HEAD request if we're not in spider mode AND
2407: if (!opt.spider
2428: if (opt.spider && !got_head)
2456: /* Default document type is empty. However, if spider mode is
2570: * spider mode. */
2571: else if (opt.spider)
2661: if (opt.spider)
src/res.c
543: int saved_sp_val = opt.spider;
548: opt.spider = false;
551: opt.spider = saved_sp_val;
src/spider.c
1:/* Keep track of visited URLs in spider mode.
37:#include "spider.h"
49:spider_cleanup (void)
src/spider.h
1:/* Declarations for spider.c
src/recur.c
52:#include "spider.h"
279: if (opt.spider)
366: || opt.spider /* opt.recursive is implicitely true */
370: (otherwise unneeded because of --spider or rejected by -R)
375: (opt.spider ? "--spider" :
378: (opt.delete_after || opt.spider
440: if (opt.spider)
src/options.h
62: bool spider; /* Is Wget in spider mode? */
src/init.c
238: { "spider", &opt.spider, cmd_boolean },
src/main.c
56:#include "spider.h"
238: { "spider", 0, OPT_BOOLEAN, "spider", -1 },
435: --spider don't download anything.\n"),
1045: if (opt.recursive && opt.spider)
I would like to see the differences in code, not abstractly. I love code examples.
How do web spiders differ from Wget's spider in code?
A real spider is a lot of work
Writing a spider for the whole WWW is quite a task --- you have to take care about many "little details" such as:
Each spider computer should receive data from a few thousand servers in parallel in order to make efficient use of the connection bandwidth. (asynchronous socket i/o).
You need several computers that spider in parallel in order to cover the vast amount of information on the WWW (clustering; partitioning the work)
You need to be polite to the spidered web sites:
Respect the robots.txt files.
Don't fetch a lot of information too quickly: this overloads the servers.
Don't fetch files that you really don't need (e.g. iso disk images; tgz packages for software download...).
You have to deal with cookies/session ids: Many sites attach unique session ids to URLs to identify client sessions. Each time you arrive at the site, you get a new session id and a new virtual world of pages (with the same content). Because of such problems, early search engines ignored dynamic content. Modern search engines have learned what the problems are and how to deal with them.
You have to detect and ignore troublesome data: connections that provide a seemingly infinite amount of data or connections that are too slow to finish.
Besides following links, you may want to parse sitemaps to get URLs of pages.
You may want to evaluate which information is important for you and changes frequently to be refreshed more frequently than other pages. Note: A spider for the whole WWW receives a lot of data --- you pay for that bandwidth. You may want to use HTTP HEAD requests to guess whether a page has changed or not.
Besides receiving, you want to process the information and store it. Google builds indices that list for each word the pages that contain it. You may need separate storage computers and an infrastructure to connect them. Traditional relational data bases don't keep up with the data volume and performance requirements of storing/indexing the whole WWW.
This is a lot of work. But if your target is more modest than reading the whole WWW, you may skip some of the parts. If you just want to download a copy of a wiki etc. you get down to the specs of wget.
Note: If you don't believe that it's so much work, you may want to read up on how Google re-invented most of the computing wheels (on top of the basic Linux kernel) to build good spiders. Even if you cut a lot of corners, it's a lot of work.
Let me add a few more technical remarks on three points
Parallel connections / asynchronous socket communication
You could run several spider programs in parallel processes or threads. But you need about 5000-10000 parallel connections in order to make good use of your network connection. And this amount of parallel processes/threads produces too much overhead.
A better solution is asynchronous input/output: process about 1000 parallel connections in one single thread by opening the sockets in non-blocking mode and use epoll or select to process just those connections that have received data. Since Linux kernel 2.4, Linux has excellent support for scalability (I also recommend that you study memory-mapped files) continuously improved in later versions.
Note: Using asynchronous i/o helps much more than using a "fast language": It's better to write an epoll-driven process for 1000 connections written in Perl than to run 1000 processes written in C. If you do it right, you can saturate a 100Mb connection with processes written in perl.
From the original answer:
The down side of this approach is that you will have to implement the HTTP specification yourself in an asynchronous form (I am not aware of a re-usable library that does this). It's much easier to do this with the simpler HTTP/1.0 protocol than the modern HTTP/1.1 protocol. You probably would not benefit from the advantages of HTTP/1.1 for normal browsers anyhow, so this may be a good place to save some development costs.
Edit five years later:
Today, there is a lot of free/open source technology available to help you with this work. I personally like the asynchronous http implementation of node.js --- it saves you all the work mentioned in the above original paragraph. Of course, today there are also a lot of modules readily available for the other components that you need in your spider. Note, however, that the quality of third-party modules may vary considerably. You have to check out whatever you use. [Aging info:] Recently, I wrote a spider using node.js and I found the reliability of npm modules for HTML processing for link and data extraction insufficient. For this job, I "outsourced" this processing to a process written in another programming language. But things are changing quickly and by the time you read this comment, this problem may already a thing of the past...
Partitioning the work over several servers
One computer can't keep up with spidering the whole WWW. You need to distribute your work over several servers and exchange information between them. I suggest to assign certain "ranges of domain names" to each server: keep a central data base of domain names with a reference to a spider computer.
Extract URLs from received web pages in batches: sort them according to their domain names; remove duplicates and send them to the responsible spider computer. On that computer, keep an index of URLs that already are fetched and fetch the remaining URLs.
If you keep a queue of URLs waiting to be fetched on each spider computer, you will have no performance bottlenecks. But it's quite a lot of programming to implement this.
Read the standards
I mentioned several standards (HTTP/1.x, Robots.txt, Cookies). Take your time to read them and implement them. If you just follow examples of sites that you know, you will make mistakes (forget parts of the standard that are not relevant to your samples) and cause trouble for those sites that use these additional features.
It's a pain to read the HTTP/1.1 standard document. But all the little details got added to it because somebody really needs that little detail and now uses it.
I am not sure exactly what the original author of the comment was referring to, but I can guess that wget is slow as a spider, since it appears to only use a single thread of execution (at least by what you have shown).
"Real" spiders such as heritrix use a lot of parallelism and tricks to optimize their crawling speed, while simultaneously being nice to the website they are crawling. This typically means limiting hits to one site at a rate of 1 per second (or so), and crawling multiple websites at the same time.
Again this is all just a guess based on what I know of spiders in general, and what you posted here.
Unfortunately, many of the more well-known 'real' web spiders are closed-source, and indeed closed-binary. However there are a number of basic techniques wget is missing:
Parallelism; you're never going to be able to keep up with the entire web without retrieving multiple pages at a time
Prioritization; some pages are more important to spider than others
Rate limiting; you'll be banned quickly if you keep pulling down pages as quickly as you can
Saving to something other than a local filesystem; the Web is big enough that it's not going to fit in a single directory tree
Rechecking pages periodically without restarting the entire process; in practice, with a real spider you'll want to recheck 'important' pages for updates frequently, while less interesting pages can go for months.
There are also various other inputs that can be used such as sitemaps and the like. Point is, wget isn't designed to spider the entire web, and it's not really a thing that can be captured in a small code sample, as it's a problem of the whole overall technique being used, rather than any single small subroutine being wrong for the task.
I'm not going to go into details of how to spider the internet, I think that wget comment is regarding to spidering one website which is still a serious challenge.
As a spider you need to figure out when to stop, not go into recursive crawls just because the URL changed like date=1/1/1900 to 1/2/1900 and so
Even bigger challenge to sort out URL Rewrite (I have no clue what so ever how google or any other handles this). It's pretty big challenge to crawl enough but not too much. And how one can automatically recognise URL Rewrite with some random parameters and random changes in the content?
You need to parse Flash / Javascript at least up to some level
You need to consider some crazy HTTP issues like base tag. Even parsing the HTML is not easy, considering most of the websites are not XHTML and browsers are so flexible in the syntax.
I don't know how much of these implemented or considered in wget but you might want to take a look at httrack to understand the challenges of this task.
I'd love to give you some code examples but this is big tasks and a decent spider will be about 5000 loc without 3rd party libraries.
+ Some of them already explained by #yaakov-belch so I'm not going to type them again

How to get your code ready for Loadbalancing

As we did this in the past, i'd like to gather useful information for everyone moving to loadbalancing, as there are issues which your code must be aware of.
We moved from one apache server to squid as reverse proxy/loadbalancer with three apache servers behind.
We are using PHP/MySQL, so issues may differ.
Things we had to solve:
Sessions
We moved from "default" php sessions (files) to distributed memcached-sessions. Simple solution, has to be done. This way, you also don't need "sticky sessions" on your loadbalancer.
Caching
To our non-distributed apc-cache per webserver, we added anoter memcached-layer for distributed object caching, and replaced all old/outdated filecaching systems with it.
Uploads
Uploads go to a shared (nfs) folder.
Things we optimized for speed:
Static Files
Our main NFS runs a lighttpd, serving (also user-uploaded) images. Squid is aware of that and never queries our apache-nodes for images, which gave a nice performance boost. Squid is also configured to cache those files in ram.
What did you do to get your code/project ready for loadbalancing, any other concerns for people thinking about this move, and which platform/language are you using?
When doing this:
For http nodes, I push hard for a single system image (ocfs2 is good for this) and use either pound or crossroads as a load balancer, depending on the scenario. Nodes should have a small local disk for swap and to avoid most (but not all) headaches of CDSLs.
Then I bring Xen into the mix. If you place a small, temporal amount of information on Xenbus (i.e. how much virtual memory Linux has actually promised to processes per VM aka Committed_AS) you can quickly detect a brain dead load balancer and adjust it. Oracle caught on to this too .. and is now working to improve the balloon driver in Linux.
After that I look at the cost of splitting the database usage for any given app across sqlite3 and whatever db the app wants natively, while realizing that I need to split the db so posix_fadvise() can do its job and not pollute kernel buffers needlessly. Since most DBMS services want to do their own buffering, you must also let them do their own clustering. This really dictates the type of DB cluster that I use and what I do to the balloon driver.
Memcache servers then boot from a skinny initrd, again while the privileged domain watches their memory and CPU use so it knows when to boot more.
The choice of heartbeat / takeover really depends on the given network and the expected usage of the cluster. Its hard to generalize that one.
The end result is typically 5 or 6 physical nodes with quite a bit of memory booting a virtual machine monitor + guests while attached to mirrored storage.
Storage is also hard to describe in general terms.. sometimes I use cluster LVM, sometimes not. The not will change when LVM2 finally moves away from its current string based API.
Finally, all of this coordination results in something like Augeas updating configurations on the fly, based on events communicated via Xenbus. That includes ocfs2 itself, or any other service where configurations just can't reside on a single system image.
This is really an application specific question .. can you give an example? I love memcache, but not everyone can benefit from using it, for instance. Are we reviewing your configuration or talking about best practices in general?
Edit:
Sorry for being so Linux centric ... its typically what I use when designing a cluster.

How Does Listening to a Multicast Hurt Me?

I'm receiving a recovery feed from an exchange for recovering data missed from their primary feed.
The exchange strongly recommends listening to the recovery feed only when data is needed, and leaving the multicast once I have recovered the data I need.
My question is, if I am using asio, and not reading from the NIC when I don't need it, what is the harm? The messages have sequence numbers, so I can't accidentally process an old message "left" on the card.
Is this really harming my application?
It's likely not harming your application so much as harming your machine - since the nic is still configured into the multicast group, it's still listening to those messages and passing them up, before your software ignores them and they get discarded. That's a lot of extra work that your network stack and kernel are doing, and therefore a lot of extra load on the machine in general, not just your app.
Listening to your recovery feed could also have a potential impact on a network level. As pjz mentioned, your NIC and IP stack will have more frames/packets to process. In addition, more of your available bandwidth is being used up by data that is not being used by your application; this could lead to dropped frames if there is congestion on your link. Whether congestion is likely to occur depends on whether your server is 100Mb or 1Gb attached, how much other traffic your host is sending/receiving, etc.
Another potential concern is the impact on other hosts. If the switch your host is attached to does not have IGMP snooping enabled, then all hosts on the same VLAN will receive the additional multicast traffic, which could lead them to experience the same problems as mentioned above.
If you have a networking team administering your network for you, it may be worth seeking out some recommendations from them? If you feel it is necessary to subscribe to a redundant feed, it would seem prudent to work out what level of redundancy already exists in your network and how likely it is that messages on the primary feed will be lost.
An addition to muz's comment...
It's unlikely that this will make any difference to your system, but it's worth being aware that there is an overhead associated with maintaining a multicast membership (assuming that you're using IGMP - which is probably reasonable given the restriction about "leaving the multicast")
IGMP requires the sending and processing of multicast group memberships at regular intervals. And (as alluded to in muz's comment) if you have any switches or routers between you and the multicast source that are capable of igmp snooping then they are able to disable the multicast for a given network.