Make Plone accept any HTML input

What's the easiest way, in a repeatable manner, to make Plone's WYSIWYG and HTML input accept all elements and styles?
The use case
Lots of private sites
Trusted editors
Advanced editors (able to produce hand-crafted HTML, and wanting to)
In these kinds of cases, Plone's element whitelisting is more of a hindrance than a help.
Implementation
An add-on product with a big "disable all HTML security" button
Since HTML filtering wants you to type in every CSS directive by hand, it is not practical in any way. Surely there must be a hidden switch somewhere to turn off all HTML filtering?
Also, is there an easy way to allow unsafe HTML in Archetypes/Dexterity rich text fields?

It's not a big button, but it's not hard either. In the Zope Management Interface, at the top of a Plone site, go to portal_transforms and open the safe_html transform. Disable it by putting a "1" in the disable_transform field and saving.
This should not be done if there are any untrusted content authors or editors, or any who are naive enough to copy code from a third-party site and paste it into an editor.

You now need to follow the post by David Glick linked below, but it's stupidly complicated and I gave up, favouring the customOverrides product to insert my JS instead.
http://glicksoftware.com/blog/disable-html-filtering
Also, the Plone doco about this is wrong (http://docs.plone.org/develop/plone/forms/wysiwyg.html#disabling-html-filtering-and-safe-html-transformation) and needs to be updated. In all honesty, this sort of thing is an instant turn-off for would-be new adopters.

Related

Can I integrate the GrapesJS website builder into my own website?

Does anyone know if I can integrate GrapesJS into my own website so that clients could build their own websites with it? If anyone has done this, how easy is it, and are there downsides?
This question is pretty open ended, but I'll take a shot at it.
The short answer is yes, you can use GrapesJS to allow clients to make their own sites; however, the details matter.
GrapesJS by default doesn't know anything about your stack, website structure, metadata, etc. You will need to either supply plugins or implement those features yourself. I've worked on a project for a company that used GrapesJS to implement single-page apps, and here are just some of the tweaks we had to make:
Hiding certain layers that only confuse average users.
Hiding pretty much all of the styling, and using traits to allow people to pick from some predefined styles.
Taking the HTML and CSS on store, generating the final HTML page, and saving it in our static serving folder on the server.
Implementing a wrapping "App" component with traits for the metadata we want users to control (Open Graph metadata, title, etc.).
And those are just the big things; I'm sure I'm forgetting several small ones.
For your application, you'll also need to implement a custom trait for links/buttons that lets users link from one "page" to another, as well as a way for users to pick which page to work on.
The long answer is yes, but GrapesJS is only the starting point.
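For the step that generates the final HTML page from the stored HTML and CSS, here is a minimal Python sketch of what the server side might do. It assumes the stored project is a dict with hypothetical "html", "css" and "title" keys (what GrapesJS actually stores depends on its version and your storage configuration); the helper names and the one-folder-per-page layout are made up for illustration.

```python
import html
from pathlib import Path

# Hypothetical page template. Editor output is inserted verbatim because it
# comes from trusted editors; sanitize it first if that is not the case.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
<style>{css}</style>
</head>
<body>
{body}
</body>
</html>
"""


def build_page(project):
    """Combine stored editor output into one standalone HTML page."""
    # "html", "css" and "title" are assumed key names, not a GrapesJS API.
    return PAGE_TEMPLATE.format(
        title=html.escape(project.get("title", "Untitled")),
        css=project.get("css", ""),
        body=project.get("html", ""),
    )


def publish(project, slug, static_dir):
    """Write the generated page to <static_dir>/<slug>/index.html."""
    # Reject slugs that could escape the static folder.
    if not slug or "/" in slug or "\\" in slug or slug in (".", ".."):
        raise ValueError(f"invalid slug: {slug!r}")
    target = Path(static_dir) / slug / "index.html"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(build_page(project), encoding="utf-8")
    return target
```

Writing each page to its own folder as index.html lets a plain static file server serve it at /<slug>/ without any extra routing.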
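One possible way to handle page-to-page links is to have the custom trait write a placeholder href and resolve it when the page is published. The "page:<id>" scheme and the id-to-URL mapping below are hypothetical conventions for this sketch, not something GrapesJS provides.

```python
import re

# Matches href="page:<id>" or href='page:<id>'; the "page:" scheme is a
# hypothetical convention that your custom link trait would write.
PAGE_LINK = re.compile(r"""href=(["'])page:([\w-]+)\1""")


def resolve_page_links(body, page_urls):
    """Replace page:<id> hrefs with the published URL of each page."""

    def repl(match):
        quote, page_id = match.group(1), match.group(2)
        # Unknown ids (e.g. a deleted page) fall back to "#".
        return f"href={quote}{page_urls.get(page_id, '#')}{quote}"

    return PAGE_LINK.sub(repl, body)
```

Resolving at publish time means renaming a page only changes the mapping, not every link that points to it.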
Yes, you can.
However, it is not straightforward.
If you want to build a drag-and-drop editor like the GrapesJS demo, here is the source code: https://github.com/artf/grapesjs-preset-webpage
You can see an implementation at https://codegres.org/dragdrop

Keep user-generated content from breaking layout?

I have a site that wraps some user-generated content, and I want to separate the markup for the layout from the markup of the user-generated content, so that the user-generated content can't break the site layout.
The user-generated content is trusted, as it is coming from a known group of users on my network, but nonetheless only a small subset of html tags are allowed (p, ul/ol/li, em, strong, and a couple more). However, the user-generated content is not guaranteed to be well-formed, and we have had some instances of malformed user-generated content breaking the layout of the site.
We are working with our users to keep the content well-formed, but in the meantime I am trying to find a good way to separate the content from the layout. I have been looking into namespaces, but have been unable to find good documentation about CSS support for embedded namespaces.
Anyone have any good ideas?
EDIT
I have seen some really good suggestions here, but I should probably clarify that I have absolutely no control over the input mechanism that the users use. They are entering content into one system, and my page uses that system's API to pull content out of it. That system is using TinyMCE, but like I said, we are still getting some malformed content.
Why not use Markdown?
If your users are HTML-literate, or can grasp the concept of Markdown syntax, I suggest you go with that instead. Stack Overflow works great with it; I can't imagine having a typical rich editor on Stack Overflow. Markdown editors are much simpler and faster to use and provide enough formatting capabilities for most situations. If you need some special additional features you can always add them, but for starters the out-of-the-box capabilities will suffice.
Real-time preview for self-validation
But don't forget to include a real-time preview of what users are writing. Self-validation works miracles: users correct their own mistakes before posting.
Instead of parsing the result or forcing the user to use a structured format, just display the content within an iframe:
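<iframe id="user_html"></iframe>
<script>
// content holds the user-generated HTML string; encodeURIComponent (unlike the deprecated escape) percent-encodes it as UTF-8
document.getElementById("user_html").src = "data:text/html;charset=utf-8," + encodeURIComponent(content);
</script>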
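If the page is rendered on the server instead, a variant of the same idea is to put the escaped content into the iframe's srcdoc attribute and add the sandbox attribute so that any scripts in the content don't run. This swaps srcdoc/sandbox in for the data: URL approach; the helper name is hypothetical.

```python
import html


def isolate_user_html(content, frame_id="user_html"):
    """Return an <iframe> whose document is the user-generated HTML.

    Tags the content leaves unclosed only affect the iframe's own
    document, so they cannot break the surrounding layout.
    """
    # quote=True escapes " and ' so the content cannot end the attribute early.
    escaped = html.escape(content, quote=True)
    # An empty sandbox attribute applies all restrictions (no scripts, forms, ...).
    return (f'<iframe id="{html.escape(frame_id)}" sandbox '
            f'srcdoc="{escaped}"></iframe>')
```

Note that an iframe does not size itself to its content, so you will probably need to give it an explicit height.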
I built custom CMS systems exclusively for several years and always had great luck with a combination of a quality WYSIWYG, strong front-end validation, and relentless back-end validation.
I always gravitate toward CKEditor because it's the only editor that can deal with Microsoft Word output on the front end; that's a must-have in my book. Sure, others have a paste-from-Word solution, but good luck getting users to use it. I've actually had a client overload a DB insert with Microsoft Word markup that didn't get scrubbed by TinyMCE. HTML Tidy is a great solution for cleaning things up before validation on the back end.
CKEditor has built-in templates and classes, so I used those to help my users format without going overboard. On the back end I checked to ensure they hadn't tried any funny business with CSS, but it was never a concern with that group of users. Give them enough (safe) features and they'll never HAVE to go rogue.
Maybe overkill, but HTML Tidy could help if you can use it.
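HTML Tidy is an external tool; if you would rather stay within Python's standard library, the sketch below takes a different approach (it is not Tidy): it rebuilds the fragment with html.parser, keeping only whitelisted tags, dropping every attribute, escaping text, and closing any tags left open so the fragment can't leak into the layout. The whitelist is an assumption based on the question's "p, ul/ol/li, em, strong"; since the input system can't be changed, this would run on content after pulling it from the other system's API.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"p", "ul", "ol", "li", "em", "strong"}  # assumed whitelist
SKIP_CONTENT = {"script", "style"}  # drop these tags and their text


class FragmentCleaner(HTMLParser):
    """Re-emit a fragment using only ALLOWED tags, without attributes, balanced."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []     # allowed tags that are currently open
        self.skipping = 0   # > 0 while inside <script> or <style>

    def _close_through(self, tag):
        # Close every tag opened after `tag`, then `tag` itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skipping += 1
            return
        if tag not in ALLOWED:
            return
        # Approximate HTML's implied end tags: a new <p>/<li> closes an open one.
        if tag in ("p", "li") and self.stack and self.stack[-1] == tag:
            self._close_through(tag)
        self.stack.append(tag)
        self.out.append(f"<{tag}>")  # attributes (style, onclick, ...) dropped

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skipping = max(0, self.skipping - 1)
        elif tag in self.stack:
            self._close_through(tag)
        # Stray closing tags that were never opened are ignored.

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered trailing text first
        while self.stack:  # then close whatever the author left open
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def clean_fragment(content):
    cleaner = FragmentCleaner()
    cleaner.feed(content)
    return cleaner.result()
```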
Use a WYSIWYG editor like TinyMCE or CKEditor that has built-in cleanup methods.
Robert Koritnik's suggestion to use Markdown seems brilliant, especially considering that you only allow a few harmless formatting tags.
I don't think there's anything you can do with CSS to stop layouts from breaking due to open HTML tags, so I would probably forget that idea.

FCKEditor breaking HTML forms

I'm in the process of reproducing some standalone HTML forms as pages in a CMS that uses FCKEditor by simply copying and pasting the relevant code into the editor.
But when I save and view the page, the HTML has been changed: the closing </form> tag has been moved up to just below the opening <form> tag, instead of sitting at the bottom of the form. This obviously renders all of the fields in the form, including the submit button, useless.
Is there a way to tell FCKEditor that I know what I'm doing and I don't need it to validate the HTML output?
Unfortunately this is a hosted CMS service (actually part of an email blast tool), so making changes to the configuration means going through the company's support system. That's fine, but they haven't been able to solve it for me yet, so I'm hoping to get the answers for them.
Thanks!
This is a bit of a difficult thing because, as far as I know, it's not necessarily the WYSIWYG editors that "fix" "broken" HTML; it's the browsers' HTML editing engines themselves, and it's often near impossible to talk them out of doing this.
You'd have to show your exact source to get detailed feedback, but check whether protectedSource would work for you. It's supposed to protect any code matched by the regular expression you specify.
I'm not sure about FCKEditor, but you might want to consider switching to TinyMCE. TinyMCE lets you both edit the list of allowed tags and turn off HTML validation completely if you like.
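The idea behind protectedSource is to hide regex-matched chunks from the editing engine and put them back on output. The Python sketch below only illustrates that general placeholder technique; it is not FCKeditor's implementation, and the placeholder format is an assumption.

```python
import re

PLACEHOLDER = "<!--protected:{}-->"  # assumed placeholder format


def protect(source, patterns):
    """Swap every regex match for a numbered placeholder comment."""
    stash = []

    def stash_match(match):
        stash.append(match.group(0))
        return PLACEHOLDER.format(len(stash) - 1)

    for pattern in patterns:
        source = re.sub(pattern, stash_match, source)
    return source, stash


def restore(source, stash):
    """Put the protected chunks back where their placeholders are."""
    return re.sub(r"<!--protected:(\d+)-->",
                  lambda m: stash[int(m.group(1))], source)
```

In principle, because the editing engine only sees comments in place of the form tags, it has no <form> element to "repair".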

WYSIWYG editor with "add custom HTML" feature and secure (validated) HTML output?

I've been looking into some of the WYSIWYG editors (TinyMCE, FCKEditor, etc.) and they all seem to offer a lot of options.
However, one vital feature that seems to be lacking is a simple "add custom HTML" option, which would allow the user to input any of the embed snippets you find all around the web these days, for example a YouTube video. This is different from an "edit HTML/source" feature, which requires actual knowledge of HTML and carries the risk of the user writing invalid code.
Another issue that I couldn't find much about is the output HTML. How would I make sure that this output causes no security vulnerabilities, even when the user can add their own HTML?
So, basically, is there an open source WYSIWYG editor which covers these 2 features?
FCKEditor achieves this via plugins, e.g. http://sourceforge.net/projects/youtubepluginfo/
For the first part, you either have the "view source" view of the editor or, if that is too complex, I'm pretty sure such plugins already exist for all major editors. If they don't, an "insert arbitrary HTML" plugin should be easy to build by tweaking another simple plugin, like the YouTube one linked to in Martin's answer.
The second part, sanitizing the incoming HTML, is impossible to achieve in the WYSIWYG editor itself, because the editor acts solely on the client side and fills content into a form input that could be manipulated anyway, even if you turn off the "custom HTML" function in the editor.
Therefore, the HTML needs to be sanitized on the server side. If you can use PHP, a tool that looks very good to me from the outside (I haven't worked with it, but plan to in the near future) is HTML Purifier. It claims to produce reliable HTML with minimum hassle.
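For the embed case specifically, one server-side approach is to accept a snippet only when it is a single <iframe> pointing at a host you trust, and to rebuild the tag yourself rather than storing what the user pasted. The sketch below uses Python's standard library instead of HTML Purifier; the trusted hosts, the /embed/ path check, and the attributes kept are assumptions for illustration.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

TRUSTED_EMBED_HOSTS = {"www.youtube.com", "www.youtube-nocookie.com"}  # assumption


class IframeCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.iframes = []     # attribute dicts of every <iframe> seen
        self.other_tags = 0   # any other start tag makes the snippet suspect

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframes.append(dict(attrs))
        else:
            self.other_tags += 1


def safe_embed(snippet):
    """Return a rebuilt <iframe> for a trusted embed, or None to reject it."""
    parser = IframeCollector()
    parser.feed(snippet)
    parser.close()
    if len(parser.iframes) != 1 or parser.other_tags:
        return None
    attrs = parser.iframes[0]
    src = attrs.get("src") or ""
    parts = urlsplit(src)
    if parts.scheme != "https" or parts.hostname not in TRUSTED_EMBED_HOSTS:
        return None
    if not parts.path.startswith("/embed/"):
        return None
    width = attrs.get("width") or "560"
    height = attrs.get("height") or "315"
    if not (width.isdigit() and height.isdigit()):
        return None
    # Re-emit only known attributes; onload, style, etc. are dropped.
    return (f'<iframe src="{escape(src)}" width="{width}" height="{height}" '
            f'allowfullscreen></iframe>')
```

Rebuilding the tag from known-good parts is what keeps onload and similar attributes out, even if the user pastes them.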

Embed section of HTML from another site?

Is there a way to embed only a section of a website in another HTML page?
Example: I see an answer I want to blog about, so I grab the HTML content, splat it in somewhere, and show only that, styled like it is on Stack Overflow. Basically, I want to blockquote the section of the page with its original styling, if that makes sense. Is that something the site itself has to provide, or can I use an iframe and tell it to show only a certain element, or something crazy? I'm open to all options, but I want it to show up as HTML, not as an image (that's really a last resort).
If this is even possible, are there security concerns I need to be aware of?
I don't think an image should really be a last resort. You have no control over the HTML/CSS of the source page, so even if you craft a solution (probably by using JavaScript to parse out the desired snippet), there is no guarantee that the site won't decide to change its layout tomorrow.
Even Jeff, who has control over the layout of stackoverflow.com, still prefers to screen-capture the site, rather than pull in the contents live.
Now if your goal was to have the contents auto-update, that would be a different story. But still, unless you use some agreed-upon method of sharing content, such as RSS, your solution would be very fragile.
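As an example of the agreed-upon route, if the source site publishes an RSS 2.0 feed, pulling out item titles and links takes only Python's standard library. The sketch assumes a plain RSS 2.0 document (channel/item/title/link); Atom feeds use different element names and namespaces, and for feeds you don't control the Python documentation warns that its XML modules are not hardened against maliciously constructed data.

```python
import xml.etree.ElementTree as ET


def feed_items(rss_xml):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iterfind("./channel/item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items
```

The feed itself could be fetched with urllib.request.urlopen(url).read() and passed straight in, since ET.fromstring accepts bytes.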
The concept you are describing is roughly what is called a "purple include" or "transclusion". There is a library out there for it, but it's not exactly actively developed. Here are a couple of Ajaxian articles on it.
I'd recommend a server-side solution with Python: use urllib2 (urllib.request in Python 3) to request the page, then use BeautifulSoup to parse out the bit that you need. BeautifulSoup has a very flexible selection API with which you can craft heuristics for the section you are interested in.
To illustrate:
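from urllib.request import urlopen  # urllib2.urlopen in Python 2
from bs4 import BeautifulSoup
html = urlopen("https://example.com/some-page").read()  # placeholder URL
soup = BeautifulSoup(html, "html.parser")
text = soup.find(string="Some text on the page that is unlikely to change")
print(text.parent.prettify())  # the element that contains that text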
That way, if the webmaster later changes the markup on the page, your scraping script should still work, as long as that piece of text stays the same.
On the client side, <iframe> is the only practical option. It is possible to scroll it, but that might not work in the long term, because it's technically close to a clickjacking attack.
There's also cross-site XHR, but it requires opt-in from the destination site and today works only in a few of the latest browsers.
Getting the HTML on the server side is easy: every decent web framework can download a page and parse the HTML, and you can use XPath/XSLT or the DOM to extract the bit you want.
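A dependency-free sketch of that extraction step is below: it returns the raw markup of the first element with a given id, using only Python's html.parser. It assumes the source markup is properly nested (implied end tags such as an unclosed <p> will throw the depth count off), so for real-world pages a tolerant parser like lxml or BeautifulSoup is the safer choice; the function name is hypothetical, and the extracted markup is not sanitized.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ElementById(HTMLParser):
    """Collect the raw markup of the first element with a given id."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.target_id = target_id
        self.depth = 0      # open elements inside the target (0 = outside)
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and dict(attrs).get("id") != self.target_id:
            return  # still looking for the target element
        self.parts.append(self.get_starttag_text())
        if tag not in VOID:
            self.depth += 1
        elif self.depth == 0:
            self.done = True  # the target itself is a void element

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        self.handle_data(f"&{name};")

    def handle_charref(self, name):
        self.handle_data(f"&#{name};")


def extract_by_id(page_html, target_id):
    """Return the element's markup, or None if no element has that id."""
    parser = ElementById(target_id)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts) or None
```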
Getting the styles, however, is going to be tricky: CSS rules may not work with an HTML fragment taken out of context. You'd have to parse the CSS and extract and transform the rules, or use a browser and read the currentStyle of every node.
Obviously you have to heavily filter the HTML you extract to avoid XSS. It's harder than it seems.
If you don't need to automate this, a good HTML+CSS WYSIWYG editor might be able to extract a content fragment along with its styles.
That sounds like something IE8's Web Slices would be perfect for. However, they're only available in IE8, and the site of origin would have to implement them for you to be able to take advantage of them.