Is there a way to explicitly tell the CUPS server that the file you are sending is text/html thus overriding the mime.types lookup?
Yes, there is.
Use this command line:
lp -d printername -o document-format=text/html file.html
Update (in response to comments)
I provided an exact answer to the OP's question.
However, this (alone) does not guarantee that the file will be successfully printed. To achieve that, CUPS needs a filter which can process the input of MIME type text/html.
Such a filter is not provided by CUPS itself. However, it is easy to plug your own filter into the CUPS filtering system, and some Linux distributions ship such a filter, capable of consuming HTML files and converting them to a printable format.
You can check what happens in such a situation on your system. The cupsfilter command is a helper utility to run available/installed CUPS filters without the need to do actual printing through the CUPS daemon:
touch 1.html
/usr/sbin/cupsfilter --list-filters 1.html
On a system with no HTML-consuming filter available, you'd get this response:
cupsfilter: No filter to convert from text/html to application/pdf.
On a different system (like on a Mac), you'll see this:
xhtmltopdf
You can even force input and output MIME types to see which filters CUPS would run automatically when asked to print this file on a printer supporting that particular output MIME type (-i sets the input MIME type, -m the output):
/usr/sbin/cupsfilter \
-i text/html \
-m application/postscript \
--list-filters \
1.html
xhtmltopdf
cgpdftops
Here it would first convert HTML to PDF using xhtmltopdf, then transform the resulting PDF to PostScript using cgpdftops.
If you skip the --list-filters parameter, the command goes ahead and does the conversion, actively running (not just listing) the two filters and emitting the result on <stdout>.
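For example, this would run the full conversion for the file above and write the result to a file (the output name is just an example):
/usr/sbin/cupsfilter -i text/html -m application/postscript 1.html > out.ps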
You could write your own CUPS filter based on a shell script. The only other ingredient you need is a command line tool, such as htmldoc or wkhtmltopdf, which can process HTML input and produce some format that the rest of the CUPS filtering chain can consume.
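As a rough sketch only (not a drop-in solution), such a filter wrapping wkhtmltopdf could look like this; the filter name, paths and options here are assumptions, not something shipped with CUPS:
#!/bin/sh
# Minimal sketch of an HTML-to-PDF CUPS filter wrapping wkhtmltopdf.
# CUPS invokes filters as: job-id user title copies options [filename];
# when no filename argument is given, the job data arrives on stdin.
if [ $# -ge 6 ]; then
    INPUT="$6"
else
    INPUT="-"    # "-" makes wkhtmltopdf read the HTML from stdin
fi
# wkhtmltopdf writes the PDF to stdout when "-" is given as the output file
exec wkhtmltopdf --quiet "$INPUT" -
To make CUPS use it, the script would still have to be installed into the filter directory (typically /usr/lib/cups/filter/) and registered via a mime.convs conversion rule (e.g. text/html application/pdf 33 htmltopdf); the exact locations differ between distributions.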
Be aware that some (especially JavaScript-heavy) HTML files cannot be successfully processed into print-ready formats by simple command line tools.
If you need more details about this, just ask another question...
Related
Basically, I would like to export Wireshark's RTP stream analysis into CSV or XML format so I can read it back for some tests. I can do the following using tshark on the command line:
tshark -r rtp.pcap -q -z rtp,streams
Is there a way to specify an output file and its format? If there's a way to do this through Wireshark directly, that's welcome too.
Note: what I need to store is the overall statistics of all the streams, not the detailed per-stream output.
You can save the output to a text file using the redirect operator, i.e. > output.txt. This is very basic and difficult to parse, but unfortunately there does not seem to be any way to control the format of the output. The -T -E -e combination outputs details from each packet, and the -w option writes a raw capture file.
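For example:
tshark -r rtp.pcap -q -z rtp,streams > output.txt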
Wireshark
Go to Telephony -> RTP -> Show All Streams
You can copy the values to the clipboard as CSV.
See also the Wireshark Wiki.
We are using log4cplus to generate logs in our Linux-based home appliance. These logs are currently available on the home appliance, on which a web server is running, and we also show the log file through a web browser. However, since the log file is plain text (i.e. not HTML), it is unformatted and it is difficult to view each log entry separately.
We would like to view these logs through the web server with the logs formatted as HTML. log4j supports HTML-formatted log output, but we have not found a way to generate HTML-formatted logs using log4cplus. This posting is to gather ideas on how this might be done with log4cplus, either within log4cplus itself or via post-processing, but in real time, since we need the logs in real time.
aha could be a starting point to produce an HTML file, but for richer formatting you could probably write a script using awk to HTML-ize your output.
For example, considering the following output file:
2014-07-02 20:52:39 DEBUG className:200 - This is debug message
2014-07-02 20:52:39 DEBUG className:201 - This is debug message2
The following script will produce a valid HTML table based on the first three fields:
#!/usr/bin/awk -f
BEGIN { print "<table>"; }
{ print "<tr><td>" $1 "</td><td>" $2 "</td><td>" $3 "</td></tr>" }
END { print "</table>" }
Just expand on this.
To get real-time handling, you will need to daemonize it.
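A low-tech sketch, assuming the script above is saved as log2html.awk and the web root is /var/www/html (both paths are just examples):
tail -n +1 -F /var/log/appliance.log | awk -f log2html.awk > /var/www/html/log.html
Note that the END block never fires while tail keeps following the file, so the closing </table> is never written; browsers generally tolerate that.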
log4cplus currently (2015-04-18) does not support HTML-formatted file output in any specific way. You could sort of fake it using a layout, or you could write your own Appender instance.
When I run certain console applications (particularly MSBuild or PowerShell), they produce output containing text in different colors (for warnings, errors, etc.). Sometimes I need to save it for future analysis, or to send it in an e-mail. I can only copy plain text from the console, or redirect the program output to a file, but either way all colors are lost. Is there a way to capture the output of a console application in a color-preserving format like HTML or RTF?
The PowerShell team blogged this script, which captures the console screen buffer up to the current cursor position and returns it in HTML format.
I am creating a filter for files coming onto a Unix machine. I only want to allow plain text files that do not look like scripts to pass through.
To check for plain text, I am checking the executable bit of the file and using the -T file test from Perl. (I understand this is not 100% reliable, but it will catch the binary files I most want to avoid.) I think this will be sufficient, but any suggestions are welcome.
My main question is in recognizing when a plain text file is a script. Every script I've ever written starts with a #! line, so my first thought is to read the file's first line and block any file containing one. Are there common non-script plain text files that start with a #! line that I will flag as false positives? Are there better or additional methods of identifying a script?
That's what the file command (see Wikipedia) is for. It recognizes much more than just the shebang (#!), and can tell you what kind of script it is, if any.
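A minimal sketch of how that could be scripted (the variable name is hypothetical, and file's exact answers vary a little between versions):
#!/bin/sh
# Accept only files that file(1) classifies as plain text;
# scripts typically come back as text/x-shellscript, text/x-perl, and so on.
incoming="$1"
case "$(file -b --mime-type "$incoming")" in
    text/plain) echo "accepting $incoming" ;;
    *)          echo "rejecting $incoming" ;;
esac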
I'm trying to find a way to automatically download all links from a web page, but I also want to rename them. For example:
<a href = fileName.txt> Name I want to have </a>
I want to be able to get a file named 'Name I want to have' (I don't care about the extension).
I am aware that I could get the page source, then parse all the links, and download them all manually, but I'm wondering if there are any built-in tools for that.
lynx --dump | grep http:// | cut -d ' ' -f 4
will print all the links, which can then be batch-fetched with wget, but is there a way to rename the links on the fly?
I doubt anything does this out of the box. I suggest you write a script in Python or similar to download the page and load the source (try the Beautiful Soup library for tolerant parsing). Then it's a simple matter of traversing the source to capture the links with their attributes and text, and downloading the files with the names you want. With the exception of Beautiful Soup (if you need to be able to parse sloppy HTML), all you need is built into Python.
I solved the problem by converting the web page entirely to Unicode on the first pass (using Notepad++'s built-in conversion).
Then I wrote a small shell script that used cat, awk and wget to fetch all the data.
Unfortunately, I couldn't automate the process, since I didn't find any tools for Linux that would convert an entire page from KOI8-R to Unicode.
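For reference, a rough sketch of what such a cat/awk/wget script can look like, assuming simple one-per-line anchors with quoted hrefs and relative links (the page name and base URL are placeholders):
#!/bin/sh
# Extract href and link text from each anchor, then download each file under its link text.
base="http://example.com/"
cat page.html |
awk -F'"' '/<a href=/ {
    href = $2                      # the part between the quotes
    name = $3
    sub(/^[^>]*>/, "", name)       # drop everything up to the closing ">"
    sub(/<\/a>.*/, "", name)       # drop the trailing </a>
    print href "\t" name
}' |
while IFS="$(printf '\t')" read -r href name; do
    wget -O "$name" "$base$href"
done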