CakePHP JSON Rendering - json

I cannot for the life of me work out why
$xml['interaction']['twitteraccount'] = 'hello';
Causes my JSON output to render as HTML rather than JSON. I've tried all options and played around for a while. Surely I'm missing something? As soon as I take that line out again, it renders as JSON!
public function lifestream()
{
$this->RequestHandler->setContent('json', 'application/json' );
$this->set('interactions', $this->Interaction->find('all'));
$xmlArray = array();
foreach($this->Interaction->find('all') as $interaction) {
$sourceexploded = explode("/",$interaction['Interaction']['source']);
if($sourceexploded[0] == "twitter") {
$xml['interaction']['source'] = $sourceexploded[0];
$xml['interaction']['twitteraccount'] = 'hello';
} else {
$xml['interaction']['source'] = $interaction['Interaction']['source'];
}
$xml['interaction']['timestamp'] = $interaction['Interaction']['timestamp'];
$xml['interaction']['receivedfrom'] = $interaction['Interaction']['receivedfrom'];
$xmlArray[] = $xml;
}
echo json_encode($xmlArray);
}

You have to use the JsonView.
In your routes.php, add: Router::parseExtensions('json');
In your controller, load the RequestHandler component.
class SomeNameController extends AppController {
public $components = array('RequestHandler');
public function lifestream(){
$this->RequestHandler->setContent('json', 'application/json' );
$this->set('interactions', $this->Interaction->find('all'));
$xmlArray = array();
foreach($this->Interaction->find('all') as $interaction) {
/* do stuff */
$xmlArray[] = $xml;
}
$this->set('data', $xmlArray);
$this->set('_serialize', array(
'data',
));
}
}
Now try requesting "somename/lifestream.json", or make an HTTP request with the header "Accept: application/json".
See: http://book.cakephp.org/2.0/en/views/json-and-xml-views.html

Add these two lines of code:
$this->layout = 'ajax';
$this->autoRender = false;
The first line tells the renderer to use an empty layout called ajax (you can find it in Views/Layouts/ajax.ctp).
The second one tells the renderer not to look for a view template (ctp file).
Then, when you echo the json_encode output, it will be rendered as JSON.


Download CSV in Mezzio Framework (Zend/Laminas)

In the Mezzio framework I have the following handler:
<?php
namespace Bgc\Handler;
use App\Service\GenerateReportToCSV;
use Bgc\Queue\BGCQueueManager;
use Laminas\Diactoros\Response\TextResponse;
use League\Csv\Writer;
use Psr\Http\Message\ServerRequestInterface;
use Psr\Http\Server\RequestHandlerInterface;
class DownloadBgcReportHandler implements RequestHandlerInterface
{
protected $bgcQManager;
protected $reportToCSV;
public function __construct(BGCQueueManager $bgcQManager, $reportToCSV)
{
$this->bgcQManager = $bgcQManager;
$this->reportToCSV = $reportToCSV;
}
public function handle(ServerRequestInterface $request): TextResponse
{
$queryParams = $request->getQueryParams();
$params = [];
if (isset($queryParams['startDate'])) {
$starDate = new \DateTime($queryParams['startDate']);
$params['startDate'] = $starDate->modify('midnight');
}
if (isset($queryParams['endDate'])) {
$endDate = new \DateTime($queryParams['endDate']);
$params['endDate'] = $endDate->modify('tomorrow');
}
$itemsBGC = $this->bgcQManager->getDataToDownload($params);
$time = time();
$fileName = "bgc-report-$time.csv";
$csv = Writer::createFromFileObject(new \SplFileObject());
$csv->insertOne($this->reportToCSV->getHeadingsBGC());
foreach ($itemsBGC as $item) {
$csv->insertOne($item);
}
return new TextResponse($csv->getContent(), 200, [
'Content-Type' => 'text/csv',
'Content-Transfer-Encoding' => 'binary',
'Content-Disposition' => "attachment; filename='$fileName'"
]);
}
}
I have the below error:
Whoops\Exception\ErrorException: Declaration of Bgc\Handler\DownloadBgcReportHandler::handle(Psr\Http\Message\ServerRequestInterface $request): Laminas\Diactoros\Response\TextResponse must be compatible with Psr\Http\Server\RequestHandlerInterface::handle(Psr\Http\Message\ServerRequestInterface $request): Psr\Http\Message\ResponseInterface in file /home/peter/proyectos/revelations-thena-api/src/Bgc/src/Handler/DownloadBgcReportHandler.php on line 20
I don't know how to create a downloadable file. The handler works fine with JSON. I tried to change the return type from ResponseInterface to TextResponse.
How can I download a CSV file?
Thank you
The error you received is telling you that your method signature is not compatible with the interface's method signature.
RequestHandlerInterface:
interface RequestHandlerInterface
{
public function handle(ServerRequestInterface $request): ResponseInterface;
}
As you see, the signature states that an object of type ResponseInterface is returned.
You modified the signature:
class DownloadBgcReportHandler implements RequestHandlerInterface
{
public function handle(ServerRequestInterface $request): TextResponse;
}
The signature must be the same, but you can still return the TextResponse without any problem (since it extends Laminas\Diactoros\Response, which implements Psr\Http\Message\ResponseInterface).
Just change that and it will work :)
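A minimal sketch of that rule, using hypothetical stand-ins (ResponseLike, CsvResponse, HandlerLike, DownloadHandler) rather than the real PSR and Laminas classes. The method keeps the return type declared by the interface, while its body is free to return any object that implements that type:

```php
<?php
// Hypothetical stand-in for Psr\Http\Message\ResponseInterface.
interface ResponseLike
{
    public function body(): string;
}

// Hypothetical stand-in for a concrete response such as TextResponse.
final class CsvResponse implements ResponseLike
{
    private $body;

    public function __construct(string $body)
    {
        $this->body = $body;
    }

    public function body(): string
    {
        return $this->body;
    }
}

// Hypothetical stand-in for Psr\Http\Server\RequestHandlerInterface.
interface HandlerLike
{
    public function handle(array $query): ResponseLike;
}

final class DownloadHandler implements HandlerLike
{
    // Same declared return type as the interface...
    public function handle(array $query): ResponseLike
    {
        // ...but the returned object can be any implementation of it.
        return new CsvResponse("id,name\n1,alice\n");
    }
}

$response = (new DownloadHandler())->handle([]);
```

Callers only rely on ResponseLike here, so the concrete class can change later without touching the signature.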
You have modified your handle method, so right now you aren't fulfilling the requirements of the RequestHandlerInterface.
Replace the return type of the handler with the ResponseInterface that the RequestHandlerInterface enforces.
So I think this will help you most:
<?php
namespace Bgc\Handler;
use App\Service\GenerateReportToCSV;
use Bgc\Queue\BGCQueueManager;
use Laminas\Diactoros\Response;
use Laminas\Diactoros\Stream;
use League\Csv\Writer;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\ServerRequestInterface;
use Psr\Http\Server\RequestHandlerInterface;
class DownloadBgcReportHandler implements RequestHandlerInterface
{
protected $bgcQManager;
protected $reportToCSV;
public function __construct(BGCQueueManager $bgcQManager, $reportToCSV)
{
$this->bgcQManager = $bgcQManager;
$this->reportToCSV = $reportToCSV;
}
public function handle(ServerRequestInterface $request): ResponseInterface
{
$queryParams = $request->getQueryParams();
$params = [];
if (isset($queryParams['startDate'])) {
$starDate = new \DateTime($queryParams['startDate']);
$params['startDate'] = $starDate->modify('midnight');
}
if (isset($queryParams['endDate'])) {
$endDate = new \DateTime($queryParams['endDate']);
$params['endDate'] = $endDate->modify('tomorrow');
}
$itemsBGC = $this->bgcQManager->getDataToDownload($params);
$time = time();
$fileName = "bgc-report-$time.csv";
// $csv = Writer::createFromFileObject(new \SplFileObject());
// Start from an empty in-memory CSV document, then write the heading row.
$csv = Writer::createFromString('');
$csv->insertOne($this->reportToCSV->getHeadingsBGC());
foreach ($itemsBGC as $item) {
$csv->insertOne($item);
}
// Stream expects a resource or a stream identifier, not the content itself.
$body = new Stream('php://temp', 'wb+');
$body->write($csv->getContent());
$body->rewind();
return new Response($body, 200, [
'Cache-Control' => 'must-revalidate',
'Content-Disposition' => 'attachment; filename=' . $fileName,
'Content-Length' => strval($body->getSize()),
'Content-Type' => 'text/csv',
'Content-Transfer-Encoding' => 'binary',
'Expires' => '0',
'Pragma' => 'public',
]);
}
}
PS: I commented out the line that used an empty new \SplFileObject(), because its required $filename parameter was missing (and I did not want to make that decision for you), and used Writer::createFromString() instead.

Laravel "Malformed UTF-8 characters, possibly incorrectly encoded", how to fix?

I am trying to return, as JSON, an added attribute that I get with the following method in my User model, but I keep getting
"message": "Malformed UTF-8 characters, possibly incorrectly encoded",
"exception": "InvalidArgumentException",
"file": "/var/www/timetool/vendor/laravel/framework/src/Illuminate/Http/JsonResponse.php",
the code
/**
* @return string
*/
public function getAvatarImageAttribute($value)
{
if($this->hasMedia('avatar')) {
$image = $this->getMedia('avatar');
$img = \Intervention\Image\ImageManagerStatic::make($image[0]->getPath())->encode('data-url');
}
elseif (isset($this->blob->dokument)) {
$img = 'data:image/jpeg;base64,'. base64_encode($this->blob->document);
} else {
$img = '';
}
return $img;
}
in controller I have
return \Response::json($users, 200, array('Content-Type' => 'application/json;charset=utf8'), JSON_UNESCAPED_UNICODE);
I'm thinking it's related to JSON needing only UTF8 chars and your blob may have invalid chars. Try utf8_encode($img). http://at2.php.net/manual/en/function.utf8-encode.php
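A small sketch of why raw binary data trips json_encode, independent of Laravel. The four bytes below are a made-up sample standing in for the start of a JPEG blob, and which value actually triggers the error in the question is not known. Base64 output is plain ASCII, so it always encodes cleanly:

```php
<?php
// Made-up sample: the first bytes of a JPEG file, which are not valid UTF-8.
$binary = "\xFF\xD8\xFF\xE0";

// Encoding the raw bytes fails; capture the error right away,
// because the next json_encode call resets it.
$encoded = json_encode(['avatar' => $binary]);
$error = json_last_error();
$message = json_last_error_msg();

// Base64 turns the bytes into ASCII, which is always valid UTF-8.
$dataUrl = 'data:image/jpeg;base64,' . base64_encode($binary);
$safe = json_encode(['avatar' => $dataUrl]);
```

Checking json_last_error() on each attribute separately is one way to find out which field carries the invalid bytes.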
In your controller, just return the data (for example, return $users;). Laravel will build a proper JSON response for you.
Paste this function at the top of your class:
public static function convert_from_latin1_to_utf8_recursively($dat)
{
if (is_string($dat)) {
return utf8_encode($dat);
} elseif (is_array($dat)) {
$ret = [];
foreach ($dat as $i => $d) $ret[ $i ] = self::convert_from_latin1_to_utf8_recursively($d);
return $ret;
} elseif (is_object($dat)) {
foreach ($dat as $i => $d) $dat->$i = self::convert_from_latin1_to_utf8_recursively($d);
return $dat;
} else {
return $dat;
}
}
Call the above function to convert the content. It takes a single parameter, the binary value of the blob image:
$img = self::convert_from_latin1_to_utf8_recursively($this->blob->document);
In my case the problem was the encoding of the controller file. Converting the file to UTF-8 fixed the bug.

How to force convert json action in Cakephp Rest

I need to force the JSON encoding options in a CakePHP REST response. When I set '_serialize' like this
$this->set(array('message' => $lessons, '_serialize' => array('message')));
it works, but some Turkish characters are shown as Unicode escapes, like "\u00e7al\u0131\u015fma alan\u0131".
The solution is to render the data with json_encode($data, JSON_UNESCAPED_UNICODE), but CakePHP renders it automatically. How can I force json_encode to use JSON_UNESCAPED_UNICODE?
Sorry for bad English.
In CakePHP 3 you can set this in the controller:
$this->set('_jsonOptions', JSON_UNESCAPED_UNICODE);
$this->set('_serialize', ['zones']);
which will override the options used.
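To see what the flag changes outside of CakePHP, here is a plain PHP sketch using the string from the question (the variable names are only for illustration). Both results are valid JSON and decode to the same text; the flag only affects how non-ASCII characters are written:

```php
<?php
$text = 'çalışma alanı';

// Default behaviour: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($text);

// With the flag, the UTF-8 characters are written out as they are.
$readable = json_encode($text, JSON_UNESCAPED_UNICODE);

// Either form decodes back to the original string.
$roundTrip = json_decode($escaped);
```

Because a JSON parser treats both forms identically, the flag mainly matters for humans reading the raw response and for payload size.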
JsonView doesn't accept options
There's no way to inject options in the json_encode() call invoked by JsonView, as it's hard coded optionless in the _serialize() method like this:
protected function _serialize($serialize) {
// ...
if (version_compare(PHP_VERSION, '5.4.0', '>=') && Configure::read('debug')) {
return json_encode($data, JSON_PRETTY_PRINT);
}
return json_encode($data);
}
Use a custom/extended view
So if you want to use automatic serialization, then you have to create your own/an extended view that either accepts options, or hard codes your desired options.
Here's an (untested) example with hard coded options. The _serialize() method is basically just a copy with the JSON_UNESCAPED_UNICODE option added:
App::uses('JsonView', 'View');
class MyJsonView extends JsonView {
protected function _serialize($serialize) {
if (is_array($serialize)) {
$data = array();
foreach ($serialize as $alias => $key) {
if (is_numeric($alias)) {
$alias = $key;
}
if (array_key_exists($key, $this->viewVars)) {
$data[$alias] = $this->viewVars[$key];
}
}
$data = !empty($data) ? $data : null;
} else {
$data = isset($this->viewVars[$serialize]) ? $this->viewVars[$serialize] : null;
}
if (version_compare(PHP_VERSION, '5.4.0', '>=') && Configure::read('debug')) {
return json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
}
return json_encode($data, JSON_UNESCAPED_UNICODE);
}
}
See also http://book.cakephp.org/2.0/en/views.html#creating-your-own-view-classes
I haven't tested this myself, so it's just off the top of my head.
You could write your own View class extending JsonView and override the _serialize method.
https://github.com/cakephp/cakephp/blob/4e8e266754a25748f481b2f567e45f767808be53/lib/Cake/View/JsonView.php#L131
<?php
App::uses('JsonView', 'View');
class MyCustomView extends JsonView {
protected function _serialize($serialize) {
if (is_array($serialize)) {
$data = array();
foreach ($serialize as $alias => $key) {
if (is_numeric($alias)) {
$alias = $key;
}
if (array_key_exists($key, $this->viewVars)) {
$data[$alias] = $this->viewVars[$key];
}
}
$data = !empty($data) ? $data : null;
} else {
$data = isset($this->viewVars[$serialize]) ? $this->viewVars[$serialize] : null;
}
return json_encode($data, JSON_UNESCAPED_UNICODE);
}
}
And then in your controller do something like
<?php
App::uses('MyCustomView', 'View');
class SomeController extends AppController {
public function someMethod() {
$this->viewClass = 'MyCustom'; // CakePHP 2 appends "View" itself, resolving to MyCustomView
// What ever you normally do
}
}

How to return Repository Objects as Json on Symfony2

I'm trying to return the users like this, but of course it doesn't work. I need the data as JSON since I'm working with Backbone.js.
/**
* @Route("/mytest",name="ajax_user_path")
*/
public function ajaxAction()
{
$em = $this->get('doctrine')->getManager();
$users = $this->get('doctrine')->getRepository('GabrielUserBundle:Fosuser')->findAll();
$response = array("users"=>$users);
return new Response(json_encode($response));
}
Thanks for your help guys, here is the Solution
Get the JMSSerializerBundle,
This is the code on the controller
/**
* @Route("/user")
* @Template()
*/
public function userAction()
{
$em = $this->get('doctrine')->getManager();
$users = $this->get('doctrine')->getRepository('GabrielUserBundle:Fosuser')->findAll();
$serializer = $this->get('jms_serializer');
$response = $serializer->serialize($users,'json');
return new Response($response);
}
So, findAll returns an array of entities (objects), and json_encode cannot correctly encode that array. You have to prepare your data before sending the response, like this:
Example:
use Symfony\Component\HttpFoundation\JsonResponse;
/**
* @Route("/mytest",name="ajax_user_path")
*/
public function ajaxAction()
{
$users = $this->get('doctrine')->getRepository('GabrielUserBundle:Fosuser')->findAll();
$response = array();
foreach ($users as $user) {
$response[] = array(
'user_id' => $user->getId(),
// other fields
);
}
return new JsonResponse($response); // JsonResponse encodes the array itself
}
Moreover, it would be better to move this response preparation into, for example, the UserRepository class.
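The underlying behaviour can be reproduced in plain PHP. This sketch uses a hypothetical User class with private properties, as Doctrine entities typically have; it is not the actual Fosuser entity. json_encode only sees public properties, so the entities have to be mapped to arrays first:

```php
<?php
// Hypothetical entity with private state and getters.
final class User
{
    private $id;
    private $name;

    public function __construct(int $id, string $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId(): int
    {
        return $this->id;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

$users = [new User(1, 'alice'), new User(2, 'bob')];

// Private properties are invisible to json_encode: each object becomes {}.
$naive = json_encode($users);

// Map each entity to a plain array of the fields you want to expose.
$rows = array_map(function (User $user) {
    return ['user_id' => $user->getId(), 'name' => $user->getName()];
}, $users);
$json = json_encode($rows);
```

Implementing JsonSerializable on the entity is another option, at the cost of tying the entity to one JSON shape.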
With Symfony you have JsonResponse, like this:
return new JsonResponse($users);
And don't forget to add the use statement:
use Symfony\Component\HttpFoundation\JsonResponse;
I have never tried to encode a complete object, but I have used JSON with arrays of data like this:
$vars = array(
'test' => 'test'
);
$response = new JsonResponse($vars);
return $response;
As you can see in JsonResponse, its function setData() is encoding the array, so you don't have to do it yourself:
public function setData($data = array())
{
// Encode <, >, ', &, and " for RFC4627-compliant JSON, which may also be embedded into HTML.
$this->data = json_encode($data, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_AMP | JSON_HEX_QUOT);
return $this->update();
}

Sanitizing untrusted HTML5

I want to be able to accept HTML from untrusted users and sanitize it so that I can safely include it in pages on my website. By this I mean that markup should not be stripped or escaped, but should be passed through essentially unchanged unless it contains dangerous tags such as <script> or <iframe>, dangerous attributes such as onload, or dangerous CSS properties such as background URLs. (Apparently some older IEs will execute javascript URLs in CSS?)
Serving the content from a different domain, enclosed in an iframe, is not a good option because there is no way to tell in advance how tall the iframe has to be so it will always look ugly for some pages.
I looked into HTML Purifier, but it looks like it doesn't support HTML5 yet. I also looked into Google Caja, but I'm looking for a solution that doesn't use scripts.
Does anyone know of a library that will accomplish this? PHP is preferred, but beggars can't be choosers.
The blacklisting approach puts you under upgrade pressure: each time browsers start to support new standards, you MUST bring your sanitizing tool up to the same level. Such changes happen more often than you think.
Whitelisting (which is achieved by strip_tags with well-defined exceptions) of course shrinks the options for your users, but puts you on the safe side.
On my own sites I have a policy of applying blacklisting on pages for very trusted users (such as admins) and whitelisting on all other pages. That puts me in a position where I don't need to put much effort into the blacklisting. With more mature role and permission concepts you can even fine-tune your blacklists and whitelists.
UPDATE:
I guess you are looking for these:
Allow user submitted HTML in PHP
with HTMLpurifier, how to add a couple attributes to the default whitelist, e.g. 'onclick'
I got the point that strip_tags whitelists on tag level but does accept everything on attribute level. Interestingly HTMLpurifier seems to do the whitelisting on attribute level. Thanks, was a nice learning here.
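That limitation of strip_tags is easy to check with a short sketch (the input string is made up for illustration). The allowed tag survives together with all of its attributes, and the text inside a removed tag is kept:

```php
<?php
// Made-up untrusted input: an allowed tag with an event handler, plus a script.
$dirty = '<p onclick="steal()">hi</p><script>alert(1)</script>';

// Whitelist only <p>; every other tag is stripped.
$result = strip_tags($dirty, '<p>');

// The <script> tag is gone, but its text content and the onclick
// attribute on the allowed <p> tag are both still present.
$stillHasHandler = strpos($result, 'onclick') !== false;
$scriptTagGone = strpos($result, '<script') === false;
```

So strip_tags alone is only safe when no tags at all are allowed; with a tag whitelist, attributes need a separate filtering step.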
You might be able to do something along the lines of:
$html = preg_replace('/<\s*iframe\b[^>]*>.*?<\s*\/\s*iframe\s*>/is', '', $html);
$html = preg_replace('/<\s*script\b[^>]*>.*?<\s*\/\s*script\s*>/is', '', $html);
$html = preg_replace('/\s+onload\s*=\s*"[^"]*"/i', '', $html);
... but then again: you have RegExes, now you have two problems - this might remove more than wanted and leave more than wanted as well.
But since HTML Purifier is probably the most modern and well suited (and open source) project you should still use that one and maybe make adjustments if you really need them.
You can check out one of the following as well:
kses - de facto standard, found a way into wordpress as well
htmLawed - a further developed kses
PHP Input Filter - can filter tags and attributes
You also have to make sure that your own page layout doesn't break when you include the results, for example because of unclosed tags.
Maybe it's better to take a different approach?
How about telling them what they can use?
In that case you can use strip_tags. It will be easier and a lot more controllable this way, and very easy to extend in the future as well.
In Ruby I use Nokogiri (there is a PHP version) to parse HTML content. You can parse the user's data, remove unnecessary tags or attributes, and then convert it back to text.
phpQuery - another parser.
And in PHP there is the strip_tags function.
Or you can manually remove specific attributes, such as style:
$dom = new DOMDocument;
$dom -> loadHTML( $html );
$xpath = new DOMXPath( $dom );
$nodes = $xpath -> query( "//*[@style]" ); // all elements with style attribute
foreach ( $nodes as $node ) {
// remove or do what you want
$node -> removeAttribute( "style" );
}
echo $dom -> saveHTML();
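The same DOM approach can be extended to other dangerous pieces. The sketch below is only an illustration with a made-up input string and a deliberately short deny-list (script and iframe elements, attributes whose names start with "on"); as noted elsewhere in this thread, a deny-list like this is not a complete sanitizer:

```php
<?php
// Made-up untrusted input for illustration.
$html = '<p onload="alert(1)" style="color:red"><b>hello,</b> world</p>'
      . '<script>alert("hax")</script>';

// Collect HTML5 parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Remove whole elements that should never appear.
foreach (iterator_to_array($xpath->query('//script | //iframe')) as $node) {
    $node->parentNode->removeChild($node);
}

// Remove every attribute whose name starts with "on" (onload, onclick, ...).
foreach (iterator_to_array($xpath->query('//@*[starts-with(name(), "on")]')) as $attr) {
    $attr->ownerElement->removeAttributeNode($attr);
}

// saveHTML() returns a full document; loadHTML wraps the fragment in html/body.
$clean = $dom->saveHTML();
```

Anything not covered by the two queries (javascript: URLs, CSS expressions, and so on) passes through untouched, which is why a maintained whitelist-based library is the safer choice.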
See the WdHTMLParser class. I use this class for my forum.
Sample with WdHTMLParser:
This class parses HTML into an array:
<div>
<span>
<br />
<span>
un bout de texte
</span>
<input type="text" />
</span>
</div>
Array :
Array (
[0] => Array (
[name] => div
[args] => Array ()
[children] => Array (
[0] => Array (
[name] => span
[args] => Array ()
[children] => Array (
[0] => Array (
[name] => br
[args] => Array ()
)
[1] => Array (
[name] => span
[args] => Array ()
[children] => Array (
[0] => un bout de texte
)
)
[2] => Array (
[name] => input
[args] => Array (
[type] => text
)
)
)
)
)
)
)
WdHTMLParser array to HTML
I use this class on my website to convert the array back to HTML.
voyageWdHTML_allowattr: only these attributes are allowed.
voyageWdHTML_allowtag: only these tags are allowed.
voyageWdHTML_special: make your own rules. Currently, I add "_blank" to each link and replace <br> with a new line (\n) inside pre tags.
fix_javascript: you can enable or disable this function, but it is redundant.
Sample PHP:
<?php
include "WdHTMLParser.php";
include "parser.php";
list($erreur, $message) = (new Parser())->parseBadHTML("<div>
<span>
<a onclick=\"alert('Hacked ! :'(');\">Check javascript</a>
<script>alert(\"lol\");</script>
</span>
</div>");
if ($erreur) {
die("Error : ".$message);
}
echo $message;
Output :
<div>
<span>
<a target="_blank">Check javascript</a>
<pre>alert("lol");</pre>
</span>
</div>
My Parser class :
<?php
class Parser {
//private function fix_javascript(&$message) { }
private function voyageWdHTML_args($tab_args, $objname) {
$html = "";
foreach ($tab_args as $attr => $valeur) {
if ($valeur !== null && $this->voyageWdHTML_allowattr($attr)) {
$html .= " $attr=\"".htmlentities($valeur)."\"";
}
}
return $html;
}
private function voyageWdHTML_allowattr($attr) {
return in_array($attr, array("align", "face", "size", "href", "title", "target", "src", "color", "style",
"data-class", "data-format"));
}
private function voyageWdHTML_allowtag($name) {
return in_array($name, array("br", "b", "i", "u", "strike", "sub", "sup", "div", "ol", "ul", "li", "font", "span", "code",
"hr", "blockquote", "cite", "a", "img", "p", "pre", "h6", "h5", "h4", "h3", "h2", "h1"));
}
private function voyageWdHTML_special(&$obj) {
if ($obj["name"] == "a") { $obj["args"]["target"] = "_blank"; }
if ($obj["name"] == "pre") {
// Keep text nodes, turn <br> into newlines, drop every other element.
$children = array();
foreach ($obj["children"] as $var) {
if (is_string($var)) { $children[] = $var; }
elseif ($var["name"] == "br") { $children[] = "\n"; }
}
$obj["children"] = $children;
}
}
private function voyageWdHTML($tableau, $lvl = 0) {
$html = "";
foreach ($tableau as $obj) {
if (is_array($obj)) {
if (!$this->voyageWdHTML_allowtag($obj["name"])) {
$obj["name"] = "pre";
if (!isset($obj["children"])) {
$obj["children"] = array();
}
}
if (isset($obj["children"])) {
$this->voyageWdHTML_special($obj);
$html .= "<{$obj["name"]}{$this->voyageWdHTML_args($obj["args"], $obj["name"])}>{$this->voyageWdHTML($obj["children"], $lvl+1)}</{$obj["name"]}>";
} else {
$html .= "<{$obj["name"]}>";
}
} else {
$html .= $obj;
}
}
return $html;
}
public function parseBadHTML($message) {
$WdHTMLParser = new WdHTMLParser();
$message = str_replace(array("<br>", "<hr>"), array("<br/>", "<hr/>"), $message);
$tableau = $WdHTMLParser->parse($message);
if ($WdHTMLParser->malformed) {
$retour = $WdHTMLParser->error;
} else {
$retour = $this->voyageWdHTML($tableau);
//$this->fix_javascript($retour); // Optional extra pass, to make sure
}
return array($WdHTMLParser->malformed, $retour);
}
}
WdHTMLParser class
<?php
class WdHTMLParser {
private $encoding;
private $matches;
private $escaped;
private $opened = array();
public $malformed;
public function parse($html, $namespace = NULL, $encoding = 'utf-8') {
$this->malformed = false;
$this->encoding = $encoding;
$html = $this->escapeSpecials($html);
$this->matches = preg_split('#<(/?)' . $namespace . '([^>]*)>#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
$tree = $this->buildTree();
if ($this->escaped) {
$tree = $this->unescapeSpecials($tree);
}
return $tree;
}
private function escapeSpecials($html) {
$html = preg_replace_callback('#<\!--.+-->#sU', array($this, 'escapeSpecials_callback'), $html);
$html = preg_replace_callback('#<\?.+\?>#sU', array($this, 'escapeSpecials_callback'), $html);
return $html;
}
private function escapeSpecials_callback($m) {
$this->escaped = true;
$text = $m[0];
$text = str_replace(array('<', '>'), array("\x01", "\x02"), $text);
return $text;
}
private function unescapeSpecials($tree) {
return is_array($tree) ? array_map(array($this, 'unescapeSpecials'), $tree) : str_replace(array("\x01", "\x02"), array('<', '>'), $tree);
}
private function buildTree() {
$nodes = array();
$i = 0;
$text = NULL;
while (($value = array_shift($this->matches)) !== NULL) {
switch ($i++ % 3) {
case 0: {
if (trim($value)) {
$nodes[] = $value;
}
}
break;
case 1: {
$closing = ($value == '/');
}
break;
case 2: {
if (substr($value, -1, 1) == '/') {
$nodes[] = $this->parseMarkup(substr($value, 0, -1));
} else if ($closing) {
$open = array_pop($this->opened);
if ($value != $open) {
$this->error($value, $open);
}
return $nodes;
} else {
$node = $this->parseMarkup($value);
$this->opened[] = $node['name'];
$node['children'] = $this->buildTree($this->matches);
$nodes[] = $node;
}
}
}
}
return $nodes;
}
public function parseMarkup($markup) {
preg_match('#^[^\s]+#', $markup, $matches);
$name = $matches[0];
preg_match_all('#\s+([^=]+)\s*=\s*"([^"]+)"#', $markup, $matches, PREG_SET_ORDER);
$args = array();
foreach ($matches as $m) {
$args[$m[1]] = html_entity_decode($m[2], ENT_QUOTES, $this->encoding);
}
return array('name' => $name, 'args' => $args);
}
public function error($markup, $expected) {
$this->malformed = true;
printf('unexpected closing markup "%s", should be "%s"', $markup, $expected);
}
}
To make sure, you can use this function (from mybb.com):
<?php
class Parser {
private function fix_javascript(&$message) {
$js_array = array(
"#(&\#(0*)106;?|&\#(0*)74;?|&\#x(0*)4a;?|&\#x(0*)6a;?|j)((&\#(0*)97;?|&\#(0*)65;?|a)(&\#(0*)118;?|&\#(0*)86;?|v)(&\#(0*)97;?|&\#(0*)65;?|a)(\s)?(&\#(0*)115;?|&\#(0*)83;?|s)(&\#(0*)99;?|&\#(0*)67;?|c)(&\#(0*)114;?|&\#(0*)82;?|r)(&\#(0*)105;?|&\#(0*)73;?|i)(&\#112;?|&\#(0*)80;?|p)(&\#(0*)116;?|&\#(0*)84;?|t)(&\#(0*)58;?|\:))#i",
"#(o)(nmouseover\s?=)#i",
"#(o)(nmouseout\s?=)#i",
"#(o)(nmousedown\s?=)#i",
"#(o)(nmousemove\s?=)#i",
"#(o)(nmouseup\s?=)#i",
"#(o)(nclick\s?=)#i",
"#(o)(ndblclick\s?=)#i",
"#(o)(nload\s?=)#i",
"#(o)(nsubmit\s?=)#i",
"#(o)(nblur\s?=)#i",
"#(o)(nchange\s?=)#i",
"#(o)(nfocus\s?=)#i",
"#(o)(nselect\s?=)#i",
"#(o)(nunload\s?=)#i",
"#(o)(nkeypress\s?=)#i"
);
$message = preg_replace($js_array, "$1<b></b>$2$4", $message);
}
}
I decided to just use html5lib-python. This is what I came up with:
#!/usr/bin/env python
import sys
from xml.dom.minidom import Node
import html5lib
from html5lib import (HTMLParser, sanitizer, serializer, treebuilders,
                      treewalkers)
parser = HTMLParser(tokenizer=sanitizer.HTMLSanitizer,
                    tree=treebuilders.getTreeBuilder("dom"))
serializer = serializer.htmlserializer.HTMLSerializer(omit_optional_tags=False)
document = parser.parse(sys.stdin.read(), encoding="utf-8")
# find the <html> node
for child in document.childNodes:
    if child.nodeType == Node.ELEMENT_NODE and child.nodeName == 'html':
        htmlNode = child
# find the <body> node
for child in htmlNode.childNodes:
    if child.nodeType == Node.ELEMENT_NODE and child.nodeName == 'body':
        bodyNode = child
# serialize all children of the <body> node
for child in bodyNode.childNodes:
    stream = treewalkers.getTreeWalker("dom")(child)
    sys.stdout.write(serializer.render(stream, encoding="utf-8"))
Example input:
<script>alert("hax")</script>
<p onload="alert('this is a dangerous attribute')"><b>hello,</b> world</p>
Example output:
&lt;script&gt;alert("hax")&lt;/script&gt;
<p><b>hello,</b> world</p>
I personally use HTML Purifier for this exact purpose:
http://htmlpurifier.org/docs
It works well and allows you to customize down to every tag and attribute. So far I have had no security issues with this plugin.