What exactly is a MIME type [closed] - manifest

I've researched, but all I can find is that the manifest file should be served with the correct MIME type, which is text/cache-manifest. I have no idea what a MIME type is.

As stated on Wikipedia:
An Internet media type is a standard identifier used on the Internet
to indicate the type of data that a file contains. Common uses include
the following:
email clients use them to identify attachment files,
web browsers use them to determine how to display or output files that are not in HTML format,
search engines use them to classify data files on the web.
A media type is composed of a type, a subtype, and zero or more
optional parameters. As an example, an HTML file might be designated
text/html; charset=UTF-8. In this example text is the type, html is
the subtype, and charset=UTF-8 is an optional parameter indicating the
character encoding.
IANA manages the official registry of media types.
The identifiers were originally defined in RFC 2046, and were called
MIME types because they referred to the non-ASCII parts of email
messages that were composed using the MIME (Multipurpose Internet Mail
Extensions) specification. They are also sometimes referred to as
Content-types.
Their use has expanded from email sent through SMTP, to other
protocols such as HTTP, RTP and SIP.
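For instance, you can see the media type a server declares for a resource by looking at the Content-Type header of the HTTP response. A tiny sketch (Python standard library; the URL is just a placeholder):

import urllib.request

# Fetch any URL and print the declared media (MIME) type.
# example.com is only a placeholder for illustration.
with urllib.request.urlopen("https://example.com/") as response:
    print(response.headers.get("Content-Type"))   # e.g. "text/html; charset=UTF-8"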

MIME types have this name because of their original purpose. According to Wikipedia:
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email to support: text in character sets other than ASCII, non-text attachments, message bodies with multiple parts, and header information in non-ASCII character sets.
Although MIME was designed mainly for the SMTP protocol, its use today has grown beyond describing the content of email; it now often covers describing content types in general, including for the web (see Internet media type) and as a storage format for rich content in some commercial
products (e.g., IBM Lotus Domino and IBM Lotus Quickr).
Virtually all human-written Internet email and a fairly large
proportion of automated email is transmitted via SMTP in MIME format.
Internet email is so closely associated with the SMTP and MIME
standards that it is sometimes called SMTP/MIME email.[1] The content
types defined by MIME standards are also of importance outside of
email, such as in communication protocols like HTTP for the World Wide
Web. HTTP requires that data be transmitted in the context of
email-like messages, although the data most often is not actually
email.
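Bringing this back to the original question: serving a cache manifest simply means the web server's HTTP response for that file carries the header Content-Type: text/cache-manifest. Here is a minimal sketch (Python standard library; the .appcache extension is just the conventional example) of a static file server that does this:

import mimetypes
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Register the manifest type so responses carry
# "Content-Type: text/cache-manifest" instead of a generic fallback.
mimetypes.add_type("text/cache-manifest", ".appcache")
mimetypes.add_type("text/cache-manifest", ".manifest")

class ManifestAwareHandler(SimpleHTTPRequestHandler):
    # SimpleHTTPRequestHandler consults this map when picking a Content-Type.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".appcache": "text/cache-manifest",
        ".manifest": "text/cache-manifest",
    }

if __name__ == "__main__":
    # Serves the current directory on http://localhost:8000/
    HTTPServer(("localhost", 8000), ManifestAwareHandler).serve_forever()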

Related

How were HTML forms interpreted in the early 90s?

On the modern web, an HTML <form> element is submitted and then interpreted by scripting: either by a server-side programming language (usually PHP) or by a client-side script (almost always JavaScript).
Forms existed even in the early 90s. How were they interpreted back then?
According to this Wikipedia article there was an email based HTML form submission back then, but it was unreliable. Was this all there was? Why did HTML even have forms if they were so useless without scripting? Or was it a chicken and egg sort of situation?
Before server-side scripting (PHP, Ruby, Node.js) there was server-side programming.
One of the original interfaces between web servers and back-end processes was the Common Gateway Interface (CGI). It was introduced in the early 90s by the NCSA back-end team at roughly the same time forms were introduced into HTML by Tim Berners-Lee (who was at CERN at the time). So forms were introduced at roughly the same time CGI was invented.
Initially a lot of people wrote CGI programs in C. I was one of those who had to do so as a homework assignment. Instead of a giant all-encompassing framework, we wrote small C programs that read from stdin and printed to stdout (printing the full HTTP response, not just the HTML, as per the CGI spec). A website had lots of these small programs, each doing one small thing and updating some database (sometimes that database was just a flat file).
Almost as soon as it was introduced people also started writing CGI scripts in Perl. So there was really no transition period between C programs and scripting languages. People simply stopped writing CGI scripts in C because it was faster to do so in scripting languages.
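To make the mechanics concrete, here is a minimal sketch of what such a CGI program does (written in Python only for brevity, whereas the originals were C or Perl; the form field "name" is a hypothetical example):

#!/usr/bin/env python3
# Minimal CGI program: the web server hands the request body to the
# program on stdin and request metadata in environment variables; the
# program writes headers, a blank line, and the body to stdout.
import os
import sys
from urllib.parse import parse_qs

length = int(os.environ.get("CONTENT_LENGTH") or 0)
fields = parse_qs(sys.stdin.read(length))       # POSTed form fields
name = fields.get("name", ["stranger"])[0]      # hypothetical field name

print("Content-Type: text/html")
print()
print(f"<html><body><p>Hello, {name}!</p></body></html>")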
Server side was actually always in the picture.
The Apache HTTP Server has been available since 1995, and in 1996 it also gained Perl support (which was used as a server-side programming language).
JavaScript was created in 1995, and Netscape was the first browser to support the client-side language (other browser vendors' implementations were based on the work done at Netscape).
In 1993 the Mosaic browser was released with support for images, nested lists and fill-out forms.
Basically, every HTTP server that can handle a request and pass it to some application (no matter what language that application is written in) is serving a server-side application. It can be written in a scripting language (Perl/Python/PHP/Ruby), a high-level language (Java/C#), and if you really want, even assembly. All you need to do is make sure you "follow the protocol".
JavaScript wasn't so advanced back then (Ajax wasn't even out yet), so it was pure server-side: mostly CGI (usually Perl) and PHP.
There was also ColdFusion, but it wasn't a popular favorite.
Eventually, at the end of 1999 and in the early 2000s, ASP.NET (aspx) and JavaServer Pages (jsp) came out, and a lot of commercial sites used aspx and jsp for obvious reasons.
Note that Java applets also existed (mostly for rendering, though), but they had to be separately downloaded and supported by the browser.
Additionally, I stumbled on an interesting piece of history on Wikipedia: HTML forms could also be sent by e-mail, using a mailto: address in the action attribute. It didn't seem to be popular, but still cool!
Quoting the Wikipedia article:
User-agent support for email based HTML form submission, using a
'mailto' URL as the form action, was proposed in RFC 1867 section 5.6,
during the HTML 3.2 era. Various web browsers implemented it by
invoking a separate email program or using their own rudimentary SMTP
capabilities. Although sometimes unreliable, it was briefly popular as
a simple way to transmit form data without involving a web server or
CGI scripts.
And RFC 1867 (November 1995):
5.6 Allow form ACTION to be "mailto:"
Independent of this proposal, it would be very useful for HTML
interpreting user agents to allow a ACTION in a form to be a
"mailto:" URL. This seems like a good idea, with or without this
proposal. Similarly, the ACTION for a HTML form which is received via
mail should probably default to the "reply-to:" of the message.
These two proposals would allow HTML forms to be served via HTTP
servers but sent back via mail, or, alternatively, allow HTML forms
to be sent by mail, filled out by HTML-aware mail recipients, and the
results mailed back.

Why implement Base64 encoding instead of some other encoding in MIME over the SMTP protocol?

The title says it all.
I would like to ask if there's any place on the internet where I can consult the other "candidates" to the MIME protocol.
Thanks in advance.
MIME isn't a protocol, it's really just a format specification.
That said, there are no alternatives for use with SMTP, and no open-specification alternatives either (there are proprietary alternatives, but they aren't what the general internet uses - for example, Exchange can/used to(?) use a SOAP-based protocol, GroupWise had its own custom protocol as well, and I'm sure so did Lotus Notes... but they all also support MIME, SMTP, POP3 and IMAP).
Also there's no website that I know of that lists them.
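For what it's worth, the reason Base64 fits MIME-over-SMTP is that it maps arbitrary bytes onto a 64-character, purely ASCII alphabet, so the payload survives transports that only guarantee 7-bit text; the main alternative MIME itself defines is quoted-printable, which suits mostly-ASCII content. A quick illustration (Python, standard base64 module):

import base64

payload = bytes(range(256))          # arbitrary binary data
encoded = base64.b64encode(payload)  # only characters A-Z a-z 0-9 + / =

print(encoded[:32])                  # b'AAECAwQFBgcICQoLDA0ODxAREhMUFRYX'
print(len(payload), len(encoded))    # 256 344  (about 4/3 size growth)
assert all(byte < 128 for byte in encoded)   # every output byte is 7-bit ASCII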

Is Microsoft Exchange NDR compliant with RFC 3461, RFC 3834?

I am trying to parse NDR (Non Delivery Report) from a plethora of providers, including but not limited to, Microsoft Exchange, GMail, Yahoo! and Microsoft Live.
However, I am not sure whether Microsoft Exchange (all currently supported versions) conforms to the relevant RFCs that other providers listed above are conforming to.
Any help would be appreciated.
A very quick Google search for "Exchange RFC 3461" returns this Knowledge Base page with all the supported RFCs. Both RFCs are in there.

Why use data URI scheme?

Basically the question is in the title.
Many people on Stack Overflow have asked how to create a data URI and about problems in doing so.
My question is why use data URI?
What are the advantages to doing:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
Over doing:
<img src="dot.png" alt="Red dot" />
I understand one has less overhead on the server side (maybe), but what are the real advantages/disadvantages to using data URI?
According to Wikipedia:
Advantages:
HTTP request and header traffic is not required for embedded data, so
data URIs consume less bandwidth whenever the overhead of encoding
the inline content as a data URI is smaller than the HTTP overhead.
For example, the required base64 encoding for an image 600 bytes long
would be 800 bytes, so if an HTTP request required more than 200
bytes of overhead, the data URI would be more efficient.
For transferring many small files (less than a few kilobytes each), this can be faster. TCP transfers tend to start slowly. If each file requires a new TCP connection, the transfer speed is limited by the round-trip time rather than the available bandwidth. Using HTTP keep-alive improves the situation, but may not entirely alleviate the bottleneck.
When browsing a secure HTTPS web site, web browsers commonly require that all elements of a web page be downloaded over secure connections, or the user will be notified of reduced security due to a mixture of secure and insecure elements. On badly configured servers, HTTPS requests have significant overhead over common HTTP requests, so embedding data in data URIs may improve speed in this case.
Web browsers are usually configured to make only a certain number
(often two) of concurrent HTTP connections to a domain, so inline
data frees up a download connection for other content.
Environments with limited or restricted access to external resources
may embed content when it is disallowed or impractical to reference
it externally. For example, an advanced HTML editing field could
accept a pasted or inserted image and convert it to a data URI to
hide the complexity of external resources from the user.
Alternatively, a browser can convert (encode) image based data from
the clipboard to a data URI and paste it in a HTML editing field.
Mozilla Firefox 4 supports this functionality.
It is possible to manage a multimedia page as a single file. Email
message templates can contain images (for backgrounds or signatures)
without the image appearing to be an "attachment".
Disadvantages:
Data URIs are not separately cached from their containing documents
(e.g. CSS or HTML files) so data is downloaded every time the
containing documents are redownloaded. Content must be re-encoded and
re-embedded every time a change is made.
Internet Explorer through version 7 (approximately 15% of the market as of January 2011), lacks support. However this can be overcome by serving browser specific content.
Internet Explorer 8 limits data URIs to a maximum length of 32 KB.
Data is included as a simple stream, and many processing environments (such as web browsers) may not support using containers (such as multipart/alternative or message/rfc822) to provide greater complexity such as metadata, data compression, or content negotiation.
Base64-encoded data URIs are 1/3 larger in size than their binary equivalent. (However, this overhead is reduced to 2-3% if the HTTP server compresses the response using gzip.)
Data URIs make it more difficult for security software to filter content.
According to other sources
- Data URLs are significantly slower on mobile browsers.
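To make the overhead arithmetic quoted above concrete, here is a small sketch (Python standard library; "dot.png" is just an example file name) that turns an image into a data URI and compares the sizes:

import base64
from pathlib import Path

raw = Path("dot.png").read_bytes()                  # example image file
encoded = base64.b64encode(raw).decode("ascii")
data_uri = f"data:image/png;base64,{encoded}"

print(f'<img src="{data_uri}" alt="Red dot" />')    # inline version of the tag above
print(len(raw), "bytes on disk,", len(data_uri), "bytes as a data URI")
# Roughly: len(data_uri) is about 4/3 * len(raw) + len("data:image/png;base64,")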
A good use of data URIs is allowing the download of content that has been generated client-side, without resorting to a server-side 'proxy'. Here are some examples I can think of:
saving the output of a canvas element as an image.
offering download of a table as CSV
downloading output of any kind of online editor (text, drawing, CSS code ...etc)
Mainly I find this useful when I can't (for some reason) use CSS sprites and I don't want to download every single little image that I will use for styling.
Or when, for some reason, you don't want anyone to hotlink the image from an external page. This can be achieved by other methods, but embedding works as well.
Otherwise, personally, I wouldn't encode large images such as photos. It's better to have your media on a different server, one that can lack all of the web-application software and simply delivers media. A much better use of resources.
I have used the data URI scheme in several (C++, Python) command-line applications to generate reports which include data plots.
Having a single file is quite convenient for sending the reports by email (or moving them around in general). Compared to PDF, I did not need an additional library (other than a base64 encoding routine), and I don't need to take care of page breaks (and I almost never need to print these reports). I usually don't put these reports on a web server; I just view them on the local filesystem with a browser.
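A sketch of that single-file report idea (Python; "plot.png" and "report.html" are hypothetical names, and how the plot image gets produced is out of scope here):

import base64
from pathlib import Path

png = Path("plot.png").read_bytes()
uri = "data:image/png;base64," + base64.b64encode(png).decode("ascii")

Path("report.html").write_text(
    "<html><body>"
    "<h1>Measurement report</h1>"        # hypothetical report title
    f'<img src="{uri}" alt="data plot"/>'
    "</body></html>",
    encoding="utf-8",
)
# The resulting report.html is self-contained and can be emailed as one file.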
I agree with BiAiB that the real value of Data URIs is making client-side generated content available as file download without any need for server round-trips.
A working example of using Data URIs for "offering download of a table as CSV" is described on my blog.
IMHO, the embedding of image (or other binary resource) data into an HTML file for performance reasons is a red herring. The speed gain due to less HTTP connections is negligible and breaks the nice principle of separation between (textual) markup and binary resources (image files, videos, etc.).
I think HTTP 1.1 pipelining and some suggested improvements to HTTP are a cleaner and better way to handle HTTP network speed issues.

How can you access the info on a website via a program?

Suppose I want to write a program to read movie info from IMDb, music info from Last.fm, or weather info from weather.com, etc. Just reading the webpage and parsing it is quite tedious. Often websites have an XML feed (such as Last.fm) set up exactly for this.
Is there a particular link/standard that websites follow for this feed? Like robots.txt, is there a similar standard for information feeds, or does each website have its own?
This is the kind of problem RSS or Atom feeds were designed for, so look for a link for an RSS feed if there is one. They're both designed to be simple to parse too. That's normally on sites that have regularly updated content though, like news or blogs. If you're lucky, they'll provide many different RSS feeds for different aspects of the site (the way Stackoverflow does for questions, for instance)
Otherwise, the site may have an API you can use to get the data (like Facebook, Twitter, Google services etc). Failing that, you'll have to resort to screen-scraping and the possible copyright and legal implications that are involved with that.
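If the site does expose an RSS feed, consuming it is straightforward. A minimal sketch (Python standard library only; the feed URL is a placeholder, real sites usually advertise theirs with a <link> tag in the page head):

import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feed.rss"    # placeholder; use the site's real feed URL

with urllib.request.urlopen(FEED_URL) as response:
    root = ET.parse(response).getroot()

# RSS 2.0 puts entries in <rss><channel><item> elements.
for item in root.findall("./channel/item"):
    title = item.findtext("title", default="(no title)")
    link = item.findtext("link", default="")
    print(title, "->", link)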
Websites provide different ways to access this data, such as web services, feeds, and endpoints for querying their data.
There are also programs used to collect data from pages without using standard techniques. These programs are called bots, and they use various techniques to get data from websites (note: be careful, the data may be copyright protected).
The most common such standards are RSS and the related Atom. Both are formats for XML syndication of web content. Most software libraries include components for parsing these formats, as they are widespread.
Yes, the RSS standard, and the XML standard.
Sounds to me like you're referring to RSS or Atom feeds. These are specified for a given page in the source; for instance, open the source html for this very page and go to line 22.
Both Atom and RSS are standards. They are both XML based, and there are many parsers for each.
You mentioned screen scraping as the "tedious" option; it is also normally against the terms of service for the website. Doing this may get you blocked. Feed reading is by definition allowed.
There are a number of standards websites use for this, depending on what they are doing, and what they want to do.
RSS is a format for sending out formatted chunks of data in machine-parsable form. It stands for "Really Simple Syndication" and is usually used for news feeds, blogs, and other things where there is new content on a periodic or sporadic basis. There are dozens of RSS readers which allow one to subscribe to multiple RSS sources and periodically check them for new data. It is intended to be lightweight.
AJAX is a technique for sending requests from a web page to the web server and getting results back in machine-parsable form. It is designed to work with JavaScript on the web client. The transport is ordinary HTTP, and the request and reply formats (often XML or JSON) are up to each site, so it tends to be up to the developers to know what commands are available via AJAX.
SOAP is a protocol with similar goals, but its uses tend to be more program-to-program rather than from web client to server. SOAP allows for auto-discovery of what commands are available by use of a machine-readable file in WSDL format, which essentially specifies in XML the method signatures and types used by a particular SOAP interface.
Not all sites use RSS, AJAX, or SOAP. Last.fm, one of the examples you listed, does not seem to support RSS and uses its own web-based API for getting information from the site. In those cases, you have to find out what their API is (Last.fm appears to be well documented, however).
Choosing the method of obtaining data depends on the application. If it's a public/commercial application, screen scraping won't be an option. (E.g. if you want to use IMDb information commercially, you will need to sign a contract paying them $15,000 or more, according to their website's usage policy.)
I think your problem isn't not knowing the standard procedure for obtaining website information, but rather not realizing that your inability to obtain data is due to websites not wanting to provide that data.
If a website wants you to be able to use its information, then there will almost certainly be a well-documented API with various standard protocols for queries.
A list of APIs can be found here.
Data formats listed at that particular site include: CSV, GeoRSS, HTML, JSON, KML, OPML, OpenSearch, PHP, RDF, RSS, Text, XML, XSPF, and YAML.
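As a generic illustration of the documented-API route, most of these APIs boil down to an HTTP GET with query parameters that returns JSON or XML. A sketch (Python standard library; the endpoint and parameter names are hypothetical, a real API documents its own):

import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameters for illustration only.
params = urllib.parse.urlencode({"q": "The Matrix", "format": "json"})
url = "https://api.example.com/movies?" + params

with urllib.request.urlopen(url) as response:
    data = json.load(response)     # parsed JSON document

print(data)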