By default, G-WAN strips whitespace from HTML files to minimize their size.
What's the best way to let pre-formatted text inside a <pre> tag get through unmodified?
#Richard Heath
Interesting -- I'm using a vanilla installation of G-Wan with the <pre> block starting like this <pre class="fragment">.
See sample of doxygen generated doc
This is being hosted up on a vanilla installation of g-wan.
Update:
As a temporary workaround (a quick fix, not a clean one), I've changed the startup to look like this:
START=""
...
nohup ./$NAME $START &>/dev/null &
I will try later to write a handler to filter the return.
updated sample files for comparison
./gwan -d
http://alex4u2nv.com/test/test.html
nohup ./gwan &> /dev/null &
http://alex4u2nv.com/docs/test.html
If you look at this link showing both preformatted source code and text, it is clear that G-WAN v3.3 respects the <pre> tag even when running in daemon mode.
If you have an example of a broken page then publish the broken text rather than such a huge HTML page.
Further, in the link you provided the text is NOT broken, but a client-side script blocks the browser (one has to disable JavaScript to see the whole page).
TL;DR: vim seems to be sourcing both indent/javascript.vim and indent/html.vim on editing html files; is this intentional or a bug? How can I make html files only source html.vim?
Recently I found out that vim seems to be using indent filetype plugins for both javascript and html on editing html files, and I've done some testing based on this behaviour on minimal vim configurations.
Here is my one-line .vimrc:
filetype plugin indent on
Inside my .vim directory:
~ % tree .vim
.vim
└── indent
    ├── html.vim
    └── javascript.vim
1 directory, 2 files
Where:
~ % cat .vim/indent/javascript.vim
setlocal formatprg=js-beautify
let g:testvar_js="js testvar"
let g:testvar="testvar defined in javascript.vim"
and
~ % cat .vim/indent/html.vim
setlocal formatprg=html-beautify
let g:testvar_html="html testvar"
let g:testvar="testvar defined in html.vim"
Then I opened a new, empty buffer with vim foo.html and ran a few commands:
:set filetype?
filetype=html
:set formatprg?
formatprg=js-beautify
:echo g:testvar
testvar defined in javascript.vim
:echo g:testvar_html
html testvar
:echo g:testvar_js
js testvar
It looks as if vim sources both indent scripts, indent/html.vim first and then indent/javascript.vim.
Therefore, my questions are:
Did I make any silly mistakes?
If no, then is this an intentional design, a bug, or is that vim has nothing to do with this at all?
Is there a way to make vim only source on html.vim when editing html files?
Some additional information that might be helpful:
I'm on vim 8.2, macOS arm64, using Terminal.app
Neovim exhibits the same behaviour; actually that's where I first noticed it
This behaviour does not occur for ftplugin/, only indent/
javascript files are not affected by indent/html.vim: variables defined in indent/html.vim are all undefined in a javascript buffer
formatprg of html files is always js-beautify on open, regardless of whether the file contains any javascript code or <script> tags
An indent/css.vim will not be involved at all when editing html - I've tested
js-beautify and html-beautify are two separate executables (repository is here)
bin % ls -n js-beautify
lrwxr-xr-x 1 501 80 53 Apr 19 17:59 js-beautify -> ../lib/node_modules/js-beautify/js/bin/js-beautify.js
bin % ls -n html-beautify
lrwxr-xr-x 1 501 80 55 Apr 19 17:59 html-beautify -> ../lib/node_modules/js-beautify/js/bin/html-beautify.js
If you want me to do some additional tests or need more information, just shout.
Many thanks
Here is a perfectly valid HTML sample:
<!DOCTYPE html>
<html lang="en">
<head>
<title>Sample</title>
<script>
console.log('Hello, World!');
</script>
<style>
body {
background: orange;
}
</style>
</head>
<body>
<h1>Sample</h1>
</body>
</html>
You will notice it has a tiny bit of embedded JavaScript in it, which is a good enough reason for $VIMRUNTIME/indent/html.vim to source $VIMRUNTIME/indent/javascript.vim. After all, the javascript indent script is supposed to know how to indent JavaScript, so why not use it in a html buffer that can contain embedded JavaScript?
FWIW, here is the snippet responsible for that behaviour:
if !exists('*GetJavascriptIndent')
runtime! indent/javascript.vim
endif
Note that the maintainers of $VIMRUNTIME/indent/html.vim chose the external route for javascript and the internal one for css. Maybe because $VIMRUNTIME/indent/css.vim didn't fit the bill? I don't know and, frankly, I don't think it matters.
Now, let's go through your mistakes…
Filetype-specific scripts (indent, syntax, ftplugins) are sourced in this order:
~/.vim/indent/<filetype>.vim
$VIMRUNTIME/indent/<filetype>.vim
~/.vim/after/indent/<filetype>.vim
If you are not very careful, stuff you put in an earlier script might be overwritten when a later script is sourced. For that reason, it makes a lot more sense to put your own stuff in scripts under after/.
The following lines have nothing to do in indent scripts:
setlocal formatprg=js-beautify
setlocal formatprg=html-beautify
They are supposed to be in ftplugins:
" after/ftplugin/javascript.vim
setlocal formatprg=js-beautify
" after/ftplugin/html.vim
setlocal formatprg=html-beautify
So…
Did I make any silly mistakes?
Yes, see above.
If no, then is this an intentional design, a bug, or is that vim has nothing to do with this at all?
Well yes, this is an intentional design that works pretty well. It only caused problems because you misused it.
Is there a way to make vim only source on html.vim when editing html files?
indent/html.vim? Yes, it certainly is possible but why would you want to do that?
ftplugin/html.vim? It already works the way you want and it is the right place for the things you mistakenly put in indent/html.vim to begin with.
--- EDIT ---
Just curious, indent/ files are supposed to set indentation options right, then why shouldn't I set the indentation program there?
Filetype-specific scripts are typically sourced once, when a file of the corresponding filetype is loaded into a buffer. Because it is relatively common to have languages embedded in other languages (JavaScript in HTML) or languages that are supersets of other languages (C++ vs C), Vim makes it possible to source other filetype-specific scripts. That's pretty much a concrete example of code reuse and that's generally considered a good thing.
Indent scripts can source other indent scripts, syntax scripts can source other syntax scripts, and ftplugins can source other ftplugins.
So Vim gives us a useful low-level mechanism but it is up to us to decide what to put where, and that always depends on the context.
In the case of HTML, it makes sense to use the existing JavaScript indent stuff, so $VIMRUNTIME/indent/html.vim sources $VIMRUNTIME/indent/javascript.vim early on and then proceeds with setting HTML-specific stuff. The end result is a html indent script that also supports embedded JavaScript. The html syntax script uses a similar mechanism in order to highlight embedded JavaScript. In some simple cases, you can even have one ftplugin sourcing another ftplugin but $VIMRUNTIME/ftplugin/html.vim doesn't.
But it doesn't always make sense: options may be overwritten, mappings may be overwritten or defined in contexts where they don't make sense, etc. In this specific case, what external tool to use for formatting is highly context-sensitive: you can't really expect js-beautify to format HTML properly or html-beautify to format JavaScript properly, so formatprg must be set separately for the javascript and html filetypes.
And this is where your first mistake kicks in.
Here is once again the snippet that sources $VIMRUNTIME/indent/javascript.vim from $VIMRUNTIME/indent/html.vim:
if !exists('*GetJavascriptIndent')
runtime! indent/javascript.vim
endif
:help :runtime is a smart alternative to :help :source that looks for files in :help 'runtimepath'. Because your ~/.vim/indent/javascript.vim is in your runtimepath, it will be sourced. Because there is a !, every matching file is going to be sourced. Because it comes first in runtimepath, it might be overwritten by later scripts.
In your case, $VIMRUNTIME/indent/html.vim automatically sources your ~/.vim/indent/javascript.vim, which contains stuff that shouldn't be set in a html buffer.
The after directory allows you to have the last word on what is set for a given filetype because built-in scripts rarely, if ever, do runtime! after/indent/<filetype>.vim
That explains why it is a bad idea to carelessly put your filetype-specific stuff in ~/.vim/{ftplugin,indent,syntax}/ and why you should put it in ~/.vim/after/{ftplugin,indent,syntax}/ instead.
I am using the free plan of the wordpress.com platform to host reference information on a small site. The goal is to let readers copy code from a page on the site and paste it into their own IDE, such as VSCode. Since the plan is free, WordPress features are cut to a minimum, and plugins cannot be installed. Only standard blocks such as HTML, Code, and Classic Editor are available.
When I needed to publish highlighted code, I found nothing better than copying it from my code editor, converting it to HTML, and inserting it into a standard WordPress HTML block. At first everything was fine: I could copy a block of highlighted code from a page on my site and paste it into VSCode, and the code looked the same as on the page.
But suddenly everything changed and the following problems arose. The single quote character (') began to appear as an opening single quote (‘) and a closing single quote (’), which breaks the code and forces manual editing, which is extremely inconvenient:
describe(‘Examples for Querying commands’, () => {
before(‘Navigate to querying page’, () => {
cy.visit(‘https://example.cypress.io/commands/querying‘);
});
// Copy the example you are interested in and paste it here
});
Double quotes began to display incorrectly on the site itself. Instead of ("), they began to display as (»):
cy.get(‘[data-test-id=»test-example»]’)
What could have caused this change? It happened after I re-saved the edited page. The single quote character is encoded in the page source as &apos; and replacing it with the literal symbol (') does not help either. You can see it here: https://kitchensinkcypress.wordpress.com/%d0%bf%d0%be%d0%b8%d1%81%d0%ba-%d1%8d%d0%bb%d0%b5%d0%bc%d0%b5%d0%bd%d1%82%d0%be%d0%b2/. The site is under construction. Please tell me how I can work around this.
I am not sure whether this counts as an answer, but it would be one if this is a WordPress bug. I found that the issue is triggered by updating the page. Steps to reproduce:
Create html snippet for the highlighted code example.
Save it into html file, open the file in browser and ensure that all characters are displayed properly.
Create HTML block on the page and insert there the content of the mentioned html file.
Open the page in a browser and ensure that the code is displayed properly.
Copy the code snippet from the page and paste it into VSCode editor. Ensure that the code is displayed properly.
Now make any change anywhere on the page except in the HTML block mentioned above, and press the Save button.
Expected result: the code snippet is still displayed properly, since no changes were made inside its HTML block.
Actual result: the snippet is corrupted. Double quotes (") are turned into (»), and single quotes become opening and closing single quotes when copied from the page and pasted into VSCode:
cy.get(‘[data-test-id=»test-example»]’)
which makes the code unusable. So I believe this is a WordPress bug: the same block of HTML is rendered in two different ways even though the user never touched that block.
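Until the cause is fixed, a reader who copies a corrupted snippet can repair it on their own side. The following is only a minimal sketch, assuming the corruption is limited to the typographic quote characters shown above. The function name restoreAsciiQuotes is hypothetical and is not part of WordPress or VSCode.

```javascript
// Hypothetical helper: maps typographic quotes back to ASCII quotes.
// Assumption: every curly or angle quote in the snippet was originally
// a plain ' or " (true for the examples above, not for arbitrary text).
function restoreAsciiQuotes(code) {
  return code
    .replace(/[\u2018\u2019]/g, "'")              // opening/closing single quotes
    .replace(/[\u201C\u201D\u00AB\u00BB]/g, '"'); // curly and angle double quotes
}

// The corrupted line from the example above, written with escapes.
const broken = "cy.get(\u2018[data-test-id=\u00BBtest-example\u00BB]\u2019)";
console.log(restoreAsciiQuotes(broken));
```

This does not stop the page from being rewritten on save; it only makes a copied snippet usable again.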
I noticed on my website, http://www.cscc.org.sg/, there's this odd symbol that shows up.
It says L SEP. In the HTML code, it displays the same thing.
Can someone show me how to remove it?
That character is U+2028 (HTML entity &#8232;), which is a kind of newline character. It's not actually supposed to be displayed. I'm guessing that either your server-side scripts failed to translate it into a new line or you are using a font that displays it.
But, since we know the HTML and Unicode values for the character, we can add a few lines of jQuery that should get rid of it. Right now, I'm just replacing it with a space in the code below. Just add this:
$(document).ready(function() {
  $("body").children().each(function() {
    // \u2028 is the LINE SEPARATOR character; replace each occurrence with a space
    $(this).html($(this).html().replace(/\u2028/g, " "));
  });
});
This should work, though please note that I have not tested it and it may not work, as none of my browsers display the character.
But if it doesn't, you can always try pasting your text block onto http://www.nousphere.net/cleanspecial.php which will remove any special characters.
Some fonts render LS as L SEP. Such a glyph is designed for unformatted presentations of the character, such as when viewing the raw characters of a file in a binary editor. In a formatted presentation, actual line spacing should be displayed instead of the glyph.
The problem is that neither the web server nor the web browser interprets the LS as a newline. The web server could detect the LS and replace it with <br>. Such a feature would fit well with a web server that dynamically generates HTML anyway, but would add overhead and complexity to a web server that serves file contents without modification.
If a LS makes its way to the web browser, the web browser doesn't interpret it as formatting. Page formatting is based only on HTML tags. For example, LF and CR just affect formatting of the HTML source code, not the web page's formatting (except in <pre> sections). The browser could in principle interpret LS and PS (paragraph separator) as <br> and <p>, but the HTML standard doesn't tell browsers to do that. (It seems to me like it would be a good addition.)
To replace the raw LS character with the line separation that the content creator likely intended, you'll need to replace the LS characters with HTML markup such as <br>.
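The replacement described above can be sketched as a small string function. This is an illustration under assumptions, not a tested fix for any particular site: the name lineSeparatorsToBr is hypothetical, and it assumes the text is not inside a <pre> block, where a real newline would be the better substitute.

```javascript
// Matches every LINE SEPARATOR (U+2028) in a string.
const LINE_SEPARATOR = /\u2028/g;

// Hypothetical helper: turns each raw LS character into a <br> element
// so that the browser renders the line break the author intended.
function lineSeparatorsToBr(html) {
  return html.replace(LINE_SEPARATOR, "<br>");
}

console.log(lineSeparatorsToBr("first line\u2028second line"));
```

Running this where the content is generated or stored means the browser never receives the raw character.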
This is the solution for the 'strange symbol' issue.
$(document).ready(function () {
  // Replace every LINE SEPARATOR (U+2028) in the page with a regular space
  document.body.innerHTML = document.body.innerHTML.replace(/\u2028/g, ' ');
});
The jQuery/JS solutions here do remove the character, but they broke my Revolution Slider. I ended up doing a search and replace for the character in the wp_posts table with the Better Search Replace plugin: https://wordpress.org/plugins/better-search-replace/
When you copy paste the character from a page to the plugin box, it is invisible, but it does work. Before doing DB replaces, always have a database (or full) backup ready! And be sure to uncheck the bottom checkbox to not do a dry run with the plugin.
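If a database replace is not an option, a gentler client-side variant is possible. Assigning to innerHTML makes the browser re-create the affected elements, so listeners attached by other scripts are lost. That may be why the slider broke, though this is an assumption and not a diagnosis. The sketch below edits text nodes only; the function name stripLineSeparators is hypothetical.

```javascript
// Hypothetical helper: replaces U+2028 in text nodes only, leaving elements
// (and any event listeners attached to them) in place.
function stripLineSeparators(node) {
  const TEXT_NODE = 3; // value of Node.TEXT_NODE in the DOM standard
  if (node.nodeType === TEXT_NODE) {
    if (node.nodeValue.includes("\u2028")) {
      node.nodeValue = node.nodeValue.replace(/\u2028/g, " ");
    }
    return;
  }
  // Copy the child list first so the loop does not depend on a live NodeList.
  for (const child of Array.from(node.childNodes || [])) {
    stripLineSeparators(child);
  }
}

// In a browser, run it once the DOM has been parsed.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    stripLineSeparators(document.body);
  });
}
```

Content injected after page load is not covered; it would need another call on the newly inserted element.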
I'm afraid this is highly specific, so please bear with me and read carefully.
The problem:
Open a PDF file, select and copy some text that contains line breaks and paste it into a TinyMCE textarea in the Google Chrome browser. Then delete any line break and insert a space at the same point: the space that is added is non-breaking even though I used a regular "space bar" key stroke in TinyMCE.
How do I know there is a non-breaking space?
You can click the "show invisible characters" button on the first row of my TinyMCE implementation (see link below). Remember that with TinyMCE you must turn that option off and on again every time you modify the text to see the changes.
The non-breaking spaces will appear in orange, normal spaces appear normally.
What I have found so far:
If I delete the character that comes after the line break and then type that character again, I can insert a normal space. The problem seems to be attached to that character.
If I delete the character occurring before the line break, the problem persists, i.e. when I delete the space and type a new space it is still a non-breaking space.
Also when I save the text to the MySQL database, and read it again in TinyMCE, the problem still occurs, which reinforces my impression that the "hidden" character is attached to the letter following the line break (there is no saving on the test page of course).
Replicating it
You could of course try it yourself, but here is my testbed for you: http://www.roseback.com/test/tinymce4.html
I have tested it with many PDF files that we receive from graphic designers, from many products and eras. These PDFs are the files that are used for printing and there is no problem with those files for that use.
I uploaded a sample file here: http://www.roseback.com/test/languedoc.pdf. Test with the first paragraph starting with "Ce film exceptionnel".
However I have also tested random PDF files from the web and replicated the problem every time. So if you try with your own files and can't replicate, that might be interesting.
Environment:
Web page: the page is in HTML5, in UTF-8.
On the original page, the page is served via PHP and the textarea content comes from a MySQL 5.1 DB. The DB connection is set to UTF-8 in PHP, the content of the table and of the text field is in utf8_unicode_ci
On the test page there is no content and no saving, so no DB is involved.
Browser: Chrome. Does not happen in Firefox or Opera (not tested elsewhere)
TinyMCE: version 3 and version 4 (both standard version, not jQuery)
OS: on Windows 7 Pro 64 bit and also on Windows XP Pro 32 bit
I would appreciate any feedback, even simple confirmation / replication of the problem.
Hmm, I think what you observe has something to do with the fact that TinyMCE inserts non-breaking spaces instead of regular spaces. TinyMCE needs to do this so that the browser does not collapse several consecutive spaces into a single one (which is the default browser behaviour).
You can verify this by inserting more than one space and then have a look at the non-visible characters.
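If the stray non-breaking spaces need to be cleaned up after editing, one option is to post-process the text. This is a rough sketch under assumptions: it works on decoded text (the U+00A0 character, not the &nbsp; entity), and the function name normalizeNbsp is hypothetical and not part of the TinyMCE API.

```javascript
// Hypothetical helper: converts isolated non-breaking spaces into regular
// spaces while keeping the ones that protect runs of consecutive spaces.
function normalizeNbsp(text) {
  // In JavaScript, \s also matches U+00A0, so requiring \S on both sides
  // means the non-breaking space stands alone between two visible characters.
  return text.replace(/(?<=\S)\u00A0(?=\S)/g, " ");
}

console.log(normalizeNbsp("Ce\u00A0film exceptionnel"));
```

Non-breaking spaces that sit next to another space are left alone, since those are the ones that keep consecutive spaces visible in the browser.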
I am on Linux and I want to fetch an HTML page from the web and print it in the terminal. I found that html2text essentially does the job, but it converts the HTML to plain text, whereas I would rather convert it to ANSI-colored text in the spirit of ls --color=auto. Any ideas?
The elinks browser can do that. Other text browsers such as lynx or w3m might be able to do that as well.
elinks -dump -dump-color-mode 1 http://example.com/
The above example provides a text version of http://example.com/ using 16 colors. The output format can be customized further depending on need.
The -dump option enables the dump mode, which just prints the whole page as text, with the link destinations printed out in a kind of "email-style".
-dump-color-mode 1 enables the coloring of the output using the 16 basic terminal colors. Depending on the value and the capabilities of the terminal emulator this can be up to ~16 million (True Color). The values are documented in elinks.conf(5).
The colors used for output can be configured as well, which is documented in elinks.conf(5) as well.
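To make "ANSI-colored text" concrete, here is a toy sketch of the idea: a few inline tags are mapped to ANSI SGR escape sequences and everything else is stripped. It is not how elinks works and is no substitute for it. The function name htmlToAnsi and the tag-to-color table are assumptions chosen for illustration, and regex-based tag matching is fragile on real-world HTML.

```javascript
// Toy converter: maps a few tags to ANSI SGR escape sequences.
const ESC = "\x1b[";
const RESET = ESC + "0m";
const STYLES = {
  b: ESC + "1m",       // bold
  strong: ESC + "1m",  // bold
  a: ESC + "34m",      // blue, one of the 16 basic colors
  h1: ESC + "1;33m",   // bold yellow
};

// Hypothetical helper: styles known tags, drops all other tags,
// then decodes the three most common entities.
function htmlToAnsi(html) {
  return html
    .replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (match, closing, name) => {
      const style = STYLES[name.toLowerCase()];
      if (!style) return "";          // unknown tag: remove it
      return closing ? RESET : style; // closing tag resets all attributes
    })
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

console.log(htmlToAnsi('<h1>Sample</h1>\n<p>See <a href="#">this</a> &amp; <b>that</b>.</p>'));
```

Because each closing tag emits a full reset, nested styles are not restored after the inner tag closes. A real text browser tracks that state, which is one reason to prefer elinks, w3m, or lynx for actual use.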
The w3m browser supports coloring the output text.
You can use the lynx browser to output the text using this command.
lynx -dump http://example.com