TL;DR: I want to enable database logging in xss_clean() when it replaces evil data.
I want to enable database logging for the xss_clean() function in Security.php. Basically, I want to know whether the input I feed to xss_clean() was identified as containing malicious data that was then filtered out, or not.
So basically:
$str = '<script>alert();</script>';
$str = xss_clean($str);
Ideally, the following would happen:
Clean the string from XSS
Return the cleaned $str
Insert information about the malicious data (and possibly the logged-in user) into the database
As far as I can see, there is nothing in the Security.php file that takes care of this for me, or that could do so via hooks etc. I might be mistaken, of course.
Since Security.php does no logging of how many replacements were made, am I forced to extend Security.php, copy-pasting the code of the original function and altering it to support this? Or is there a solution that is cleaner and safer with respect to future CodeIgniter updates (especially for the files being tampered with/extended)?
You would need to extend the Security class, but there is absolutely no need to copy and paste any code if all you need is a log of the input/output. Something along the lines of the following would allow you to do so:
class MY_Security extends CI_Security {
    public function xss_clean($str, $is_image = FALSE) {
        // Do whatever you need here with the input ... ($str, $is_image)
        $str = parent::xss_clean($str, $is_image);
        // Do whatever you need here with the output ... ($str)
        return $str;
    }
}
That way, you are just wrapping the existing method and intercepting its input/output. You could make it more forward-compatible by using the PHP function func_get_args() to transparently pass the arguments along, if you were concerned about changes to the underlying method's signature.
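For example, here is a sketch of that forward-compatible wrapper combined with the logging idea from the question. The xss_log table and the get_instance()->db call in the comment are assumptions for illustration, not something CodeIgniter sets up for you:
class MY_Security extends CI_Security {
    public function xss_clean($str, $is_image = FALSE) {
        $original = $str;

        // Pass every argument through untouched, so a future change to the
        // parent method's signature does not break this wrapper.
        $cleaned = call_user_func_array(array('parent', 'xss_clean'), func_get_args());

        // If the output differs from the input, something was altered. Note that
        // xss_clean() may also rewrite harmless input, so treat a difference as a
        // hint rather than proof of an attack.
        if ($cleaned !== $original) {
            // Hypothetical logging step (requires the database library to be loaded), e.g.:
            // get_instance()->db->insert('xss_log', array('input' => $original));
        }

        return $cleaned;
    }
}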
I have been reading Clean Code by Robert C. Martin and have a basic (but fundamental) question about functions and program structure.
The book emphasizes that functions should:
be brief (like 10 lines, or less)
do one, and only one, thing
I am a little unclear on how to apply this in practice. For example, I am developing a program to:
load a baseline text file
parse baseline text file
load a test text file
parse test text file
compare parsed test with parsed baseline
aggregate results
I have tried two approaches, but neither seems to meet Martin's criteria:
APPROACH 1
Set up a main() function that centrally commands the other functions in the workflow. But then main() can end up being very long (violates #1) and is obviously doing many things (violates #2). Something like this:
main()
{
// manage steps, one at a time, from start to finish
baseFile = loadFile("baseline.txt");
parsedBaseline = parseFile(baseFile);
testFile = loadFile("test.txt");
parsedTest = parseFile(testFile);
comparisonResults = compareFiles(parsedBaseline, parsedTest);
aggregateResults(comparisonResults);
}
APPROACH 2
Use main() to trigger a function "cascade". But each function calls a dependency, so it still seems like they are doing more than one thing (violates #2?). For example, calling the aggregation function internally triggers the results comparison. The flow also seems backwards, as it starts with the end goal and calls dependencies as it goes. Something like this:
main()
{
// trigger end result, and let functions internally manage
aggregateResults("baseline.txt", "comparison.txt");
}
aggregateResults(baseFile, testFile)
{
comparisonResults = compareFiles(baseFile, testFile);
// aggregate results here
return theAggregatedResult;
}
compareFiles(baseFile, testFile)
{
parsedBase = parseFile(baseFile);
parsedTest = parseFile(testFile);
// compare parsed files here
return theFileComparison;
}
parseFile(filename)
{
loadedFile = loadFile(filename);
// parse the file here
return theParsedFile;
}
loadFile(filename)
{
//load the file here
return theLoadedFile;
}
Obviously functions need to call one another. So what is the right way to structure a program to meet Martin's criteria, please?
I think you are interpreting rule 2 wrongly by not taking context into account. The main() function does only one thing, and that is everything, i.e. running the whole program. Say you have a convert_abc_file_xyz_file(source_filename, target_filename); this function should only do the one thing its name (and arguments) implies: converting a file of format abc into one of format xyz. Of course, at a lower level there are many things to be done to achieve this: for instance reading the source file (read_abc_file(…)), converting the data from format abc into format xyz (convert_abc_to_xyz(…)), and then writing the converted data into a new file (write_xyz_file(…)).
The second approach is wrong because it becomes impossible to write functions that only do one thing: every function does all the other things via the "cascaded" calls. In the first approach it is possible to test or reuse single functions, i.e. just call read_abc_file() to read a file. If that function called convert_abc_to_xyz(), which in turn called write_xyz_file(), that would no longer be possible.
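To make the distinction concrete, here is a rough sketch of the first approach written in PHP (the question's pseudocode is language-agnostic; file names are taken from the question and the helper bodies are trivial placeholders, not a real diff tool):
// Each helper does exactly the one thing its name says, and can be tested
// or reused on its own.
function loadFile($filename)        { return file_get_contents($filename); }
function parseFile($contents)       { return explode("\n", $contents); /* real parsing here */ }
function compareFiles($base, $test) { return array_diff($base, $test); /* real comparison here */ }
function aggregateResults($diff)    { return count($diff); /* real aggregation here */ }

// main() also does one thing: run the whole comparison, step by step.
function main()
{
    $parsedBaseline = parseFile(loadFile('baseline.txt'));
    $parsedTest     = parseFile(loadFile('test.txt'));

    $comparison = compareFiles($parsedBaseline, $parsedTest);

    return aggregateResults($comparison);
}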
First, here is the way I'm calling the function:
eval([functionName '(''stringArg'')']); % functionName = 'someStringForTheFunctionName'
Now, I have two functionName functions on my path: one that takes the stringArg and another that takes something else. I'm getting errors because, right now, the first one MATLAB finds is the one that doesn't take the stringArg. Given the way I'm calling the functionName function, how can I call the correct one?
Edit:
I tried the which function:
which -all someStringForTheFunctionName
The result:
C:\........\x\someStringForTheFunctionName
C:\........\y\someStringForTheFunctionName % Shadowed
The shadowed function is the one I want to call.
Function names must be unique in MATLAB. If they are not, i.e. there are duplicate names, then MATLAB uses the first one it finds on your search path.
Having said that, there are a few options open to you.
Option 1. Use @ directories, putting each version in a separate directory. Essentially you are using the ability of MATLAB to apply a function to specific classes. So, you might set up a pair of directories:
@char
@double
Put your copies of myfun.m in the respective directories. Now when MATLAB sees a double input to myfun, it will direct the call to the double version. When MATLAB gets char input, it goes to the char version.
BE CAREFUL. Do not put these @ directories explicitly on your search path. DO put them INSIDE a directory that is on your search path.
A problem with this scheme is if you call the function with a SINGLE precision input, MATLAB will probably have a fit, so you would need separate versions for single, uint8, int8, int32, etc. You cannot just have one version for all numeric types.
Option 2. Have only one version of the function, which tests the first argument to see whether it is numeric or char, then branches to perform either task as appropriate. Both pieces of code will then most simply live in one file. The simplest scheme is to have subfunctions or nested functions do the work.
Option 3. Name the functions differently. Hey, it's not the end of the world.
Option 4. As Shaun points out, one can simply change the current directory. MATLAB always looks first in your current directory, so it will find the function there as needed. One problem is that this is time-consuming: any time you touch a directory, things slow down, because disk access is now needed.
The worst part of changing directories is in how you use MATLAB. It is (IMHO) poor programming style to force the user to always be in a specific directory based on what code they wish to run. Better is a data-driven scheme. If you will be reading in or writing out data, then be in THAT directory. Use the MATLAB search path to categorize all of your functions, as functions tend not to change much. This is a far cleaner way to work than requiring the user to migrate to specific directories based on how they will be calling a given function.
Personally, I'd tend to suggest option 2 as the best. It is clean. It has only ONE main function that you need to work with. If you want to keep the functions distinct, put them as separate nested functions or subfunctions inside the main function body. Inside, of course, they will have distinct names, based on how they are driven.
OK, so a messy answer, but it should do it. My test function was 'echo'.
funcstr = 'echo'; % string representation of function
Fs = which('-all', funcstr);
% Hide every .m version of the function by renaming it.
for v = 1:length(Fs)
    if (strcmp(Fs{v}(end-1:end), '.m')) % Don't move built-ins, they will be shadowed anyway
        movefile(Fs{v}, [Fs{v} '_BK']);
    end
end
% Restore them one at a time and try the call; stop at the first one that works.
for v = 1:length(Fs)
    if (strcmp(Fs{v}(end-1:end), '.m'))
        movefile([Fs{v} '_BK'], Fs{v});
    end
    try
        eval([funcstr '(''stringArg'')']);
        break;
    catch
        if (strcmp(Fs{v}(end-1:end), '.m'))
            movefile(Fs{v}, [Fs{v} '_BK']);
        end
    end
end
% Finally, restore anything that is still renamed.
for w = 1:length(Fs)
    if (strcmp(Fs{w}(end-1:end), '.m') && exist([Fs{w} '_BK'], 'file'))
        movefile([Fs{w} '_BK'], Fs{w});
    end
end
You can also create a function handle for the shadowed function. The problem is that the other function is higher on the MATLAB path, but you can circumvent that by (temporarily) changing the current directory.
Although it is not nice, IMO, to change the current directory (actually I'd rather never change it while executing code), it will solve the problem quite easily, especially if you use it in the configuration part of your function with a persistent function handle:
function outputpars = myMainExecFunction(inputpars)
    % configuration
    persistent shadowfun;
    if isempty(shadowfun)
        funpath1 = 'C:\........\x\fun';
        funpath2 = 'C:\........\y\fun'; % Shadowed
        curcd = cd;
        cd(funpath2);
        shadowfun = @fun; % grab a handle while the shadowed copy is first on the path
        cd(curcd);        % and go back to the original cd
    end
    outputpars{1} = shadowfun(inputpars); % will use the shadowed function
    outputpars{2} = fun(inputpars);       % will use the function highest on the MATLAB path
end
This approach was also discussed here as a possible solution to this problem.
I believe it actually is the only way to overload a built-in function outside the source directory of the overloading function (e.g. you want to run your own sum.m from a directory other than the one where your sum.m is located).
EDIT: Old answer no longer good
The run command won't work because it's a function, not a script.
Instead, your best approach would honestly be to just figure out which of the functions needs to be run, get the current directory, change to the directory your function is in, run it, and then change back to your starting directory.
This approach, while not perfect, seems MUCH easier to code, to read, and less prone to breaking. And it requires no changing of names or creating extra files or function handles.
I'm a big fan of abbrev-mode and I'd like something a bit similar: you start typing and as soon as you enter some punctuation (or even just a space) it invokes a function (if I type space after a special abbreviation, of course, just like abbrev-mode does).
I definitely do NOT want to execute some function every single time I hit space...
So instead of expanding the abbreviation using abbrev-mode, it would run a function of my choice.
Of course it needs to be compatible with abbrev-mode, which I use all the time.
How can I get this behavior?
One approach could be to use pre-abbrev-expand-hook. I don't use abbrev mode myself, but it rather sounds as if you could re-use the abbrev mode machinery this way, and simply define some 'abbreviations' which expand to themselves (or to nothing?), and then you catch them in that hook and take whatever action you wish to.
The expand library is apparently related, and that provides expand-expand-hook, which may be another alternative?
edit: Whoops; pre-abbrev-expand-hook is obsolete since 23.1
abbrev-expand-functions is the correct variable to use:
Wrapper hook around `expand-abbrev'.
The functions on this special hook are called with one argument:
a function that performs the abbrev expansion. It should return
the abbrev symbol if expansion took place.
See M-x find-function RET expand-abbrev RET for the code, and you'll also want to read C-h f with-wrapper-hook RET to understand how this hook is used.
EDIT:
Your revised question adds some key details that my answer didn't address. phils has provided one way to approach this issue. Another would be to use yasnippet. You can include arbitrary Lisp code in your snippet templates, so you could do something like this:
# -*- mode: snippet -*-
# name: foobars
# key: fbf
# binding: direct-keybinding
# --
`(foo-bar-for-the-win)`
You'd need to ensure your function didn't return anything, or it would be inserted in the buffer. I don't use abbrev-mode, so I don't know if this would introduce conflicts. yas/snippet takes a bit of experimenting to get it running, but it's pretty handy once you get it set up.
Original answer:
You can bind space to any function you like. You could bind all of the punctuation keys to the same function, or to different functions.
(define-key your-mode-map " " 'your-choice-function)
You probably want to do this within a custom mode map, so you can return to normal behaviour when you switch modes. Globally setting space to anything but self-insert would be unhelpful.
Every abbrev is composed of several elements. Among the main elements are the name (e.g. "fbf"), the expansion (any string you like), and the hook (a function that gets called). In your case it sounds like you want the expansion to be the empty string and simply specify your foo-bar-for-the-win as the hook.
Sometimes I come across this problem where you have a set of functions that obviously belong to the same group. Those functions are needed in several places, and often together.
To give a specific example: consider the filemtime, fileatime and filectime functions. They all provide similar functionality. If you are building something like a file manager, you'll probably need to call them one after another to get the info you need. This is the moment you start thinking about a wrapper. PHP already provides stat, but suppose we don't have that function.
I looked at the PHP source code to find out how they solved this particular problem, but I can't really figure out what's going on.
Obviously, a naive implementation of such a grouping function, say filetimes, would look like this:
function filetimes($file) {
    return array(
        'filectime' => filectime($file)
        ,'fileatime' => fileatime($file)
        ,'filemtime' => filemtime($file)
    );
}
This would work, but incurs overhead since you would have to open a file pointer for each function call. (I don't know if it's necessary to open a file pointer, but let's assume that for the sake of the example).
Another approach would be to duplicate the code of the fileXtime functions and let them share a file pointer, but this obviously introduces code duplication, which is probably worse than the overhead introduced in the first example.
The third, and probably best, solution I came up with is to add an optional second parameter to the fileXtime functions to supply a file pointer.
The filetimes function would then look like this:
function filetimes($file) {
    $fp = fopen($file, 'r');
    return array(
        'filectime' => filectime($file, $fp)
        ,'fileatime' => fileatime($file, $fp)
        ,'filemtime' => filemtime($file, $fp)
    );
}
Somehow this still feels 'wrong'. There's this extra parameter that is only used in some very specific conditions.
So basically the question is: what is best practice in situations like these?
Edit:
I'm aware that this is a typical situation where OOP comes into play. But first off: not everything needs to be a class. I always use an object oriented approach, but I also always have some functions in the global space.
Let's say we're talking about a legacy system here (with these 'non-oop' parts) and there are lots of dependencies on the fileXtime functions.
tdammer's answer is good for the specific example I gave, but does it extend to the broader problem set? Can a solution be defined such that it is applicable to most other problems in this domain?
Use classes, Luke.
I'd rewrite the fileXtime functions to accept either a filename or a file handle as their only parameter. Languages that can overload functions (like C++, C# etc) can use this feature; in PHP, you'd have to check for the type of the argument at run time.
Passing both a filename and a file handle would be redundant, and ambiguous calls could be made:
$fp = fopen('foo', 'r');
$times = file_times('bar', $fp);
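Here is a sketch of the single-parameter idea; file_ctime_ex is an invented name for illustration (PHP's real filectime() only accepts a filename), and fstat()/stat() are used so either form of argument works:
// Hypothetical replacement for filectime() that accepts either a filename
// (string) or an already-open file handle (resource).
function file_ctime_ex($file) {
    if (is_resource($file)) {
        $stat = fstat($file); // reuse the existing handle
    } else {
        $stat = stat($file);  // plain stat() call for a filename
    }
    return $stat['ctime'];
}

$ctime = file_ctime_ex('foo.txt'); // by name
$fp = fopen('foo.txt', 'r');
$ctime = file_ctime_ex($fp);       // by handle; no ambiguous mixed call is possible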
Of course, if you want to go OOP, you'd just wrap them all in a FileInfo class, and store a (lazy-loaded?) private file handle there.
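A minimal sketch of what that class could look like (the method names and the lazily opened handle are assumptions, not an existing API):
class FileInfo {
    private $filename;
    private $fp = null; // lazily opened, shared by all accessors

    public function __construct($filename) {
        $this->filename = $filename;
    }

    private function handle() {
        if ($this->fp === null) {
            $this->fp = fopen($this->filename, 'r'); // opened once, on first use
        }
        return $this->fp;
    }

    public function ctime() { $s = fstat($this->handle()); return $s['ctime']; }
    public function atime() { $s = fstat($this->handle()); return $s['atime']; }
    public function mtime() { $s = fstat($this->handle()); return $s['mtime']; }

    public function __destruct() {
        if ($this->fp) {
            fclose($this->fp);
        }
    }
}

$info = new FileInfo('foo.txt');
$times = array(
    'filectime' => $info->ctime(),
    'fileatime' => $info->atime(),
    'filemtime' => $info->mtime(),
);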
Working on a particular application, I keep writing very similar queries, again and again. They're not exactly the same, but are of very similar form, and embedded in almost identical chunks of code, e.g.,
$Mysqli = new mysqli;
if ($Stmt = $Mysqli->prepare("SELECT foo
                              FROM tblFoo
                              WHERE something = ?")) {
    $Stmt->bind_param('s', $this->_something);
    $Stmt->execute();
    if (0 != $Stmt->errno)
        throw new Exception("blah, blah, blah");
    $Stmt->bind_result($foo);
    while ($Stmt->fetch()) {
        $this->_foos[] = new Foo($foo);
    }
    $Stmt->close();
} else {
    throw new Exception("blah, blah, blah");
}
and later, somewhere else ...
$Mysqli = new mysqli;
if ($Stmt = $Mysqli->prepare("SELECT bar, baz
                              FROM tblBar
                              WHERE somethingElse = ?")) {
    $Stmt->bind_param('s', $this->_somethingElse);
    $Stmt->execute();
    if (0 != $Stmt->errno)
        throw new Exception("blah, blah, blah");
    $Stmt->bind_result($bar, $baz);
    while ($Stmt->fetch()) {
        // do something else with $bar and $baz
    }
    $Stmt->close();
} else {
    throw new Exception("blah, blah, blah");
}
... and then another, and elsewhere another ... etc.
Is this a real violation of DRY? It doesn't seem to make sense to write a class for performing this kind of query (with constructor params or setters for table, column, bound variables, etc.) and then reuse it throughout my app. But at the same time, I can't shake this nagging feeling that I'm repeating myself.
Maybe it's just that there are only so many ways to write a simple query and that a certain amount of repetition like this is to be expected.
Thoughts?
A lot of people do exactly this. Create a class representing each table, all inheriting from the same base class. The base class can handle loading and saving the data, so if you want to load the data, you just call the load method, and you can set and access the field values through the objects' properties.
There are also libraries like Hibernate that handle a lot of the dirty work for you.
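A rough sketch of that shape; the class names and the load() details are invented ($Mysqli and the tblFoo table are reused from the question), and mysqli_stmt::get_result() assumes the mysqlnd driver:
// Hypothetical base class: each table gets a tiny subclass that only names
// its table; loading (and, analogously, saving) lives in one place.
abstract class Table {
    protected $db;
    protected $table;            // set by each subclass; never user input
    protected $fields = array(); // column => value

    public function __construct(mysqli $db) {
        $this->db = $db;
    }

    public function load($id) {
        $stmt = $this->db->prepare("SELECT * FROM {$this->table} WHERE id = ?");
        $stmt->bind_param('i', $id);
        $stmt->execute();
        $this->fields = $stmt->get_result()->fetch_assoc(); // needs mysqlnd
        $stmt->close();
    }

    public function __get($name) { return $this->fields[$name]; }
    public function __set($name, $value) { $this->fields[$name] = $value; }
}

class FooRow extends Table {
    protected $table = 'tblFoo';
}

$row = new FooRow($Mysqli);
$row->load(42);
echo $row->foo; // field access via the object's properties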
I would say it was a violation. From a glance at your code (unless I'm missing something), those two statements are identical except for the strings "foo" and "bar" (repeated a few times) and your actual business logic.
At the minimum you could extract that. More to the point, all the SQL strings should probably be extracted. They've always seemed more like data to me than code. If you had an array of SQL strings, tables, etc., would that make refactoring more obvious?
(Extracting strings and other data from your code is a great way to begin a refactor)
One possibility: if you had an object that held the table name and a method containing your business logic, you could pass it into a "Processor" containing all the boilerplate (which is all the rest of the code you have there).
I think that might make it DRY.
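For instance, a minimal sketch of such a processor; the run_query() name is invented, the row callback carries the per-call-site business logic, get_result() assumes the mysqlnd driver, and $Mysqli, $something and Foo are reused from the question:
function run_query(mysqli $db, $sql, $types, array $params, callable $onRow) {
    if (!$stmt = $db->prepare($sql)) {
        throw new Exception($db->error);
    }
    $stmt->bind_param($types, ...$params); // PHP 5.6+ argument unpacking
    $stmt->execute();
    if (0 != $stmt->errno) {
        throw new Exception($stmt->error);
    }
    $result = $stmt->get_result();
    while ($row = $result->fetch_assoc()) {
        $onRow($row); // the only part that differs between call sites
    }
    $stmt->close();
}

// The first query from the question then shrinks to something like:
$foos = array();
run_query($Mysqli, 'SELECT foo FROM tblFoo WHERE something = ?', 's',
          array($something),
          function ($row) use (&$foos) { $foos[] = new Foo($row['foo']); });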
(PS. Always prefer composition over inheritance. There is no advantage to using inheritance over a class that can "Run" your objects)
You're encountering one of the first desires to modularise. :-)
There are two general solutions. The first is to abstract the call for all (or most) of the similar queries. And then again for another set of queries. And then for another. And so on. Unfortunately, this results in numerous blocks that do the same thing: call a query, check for problems, assemble a result set, pass it back. It's a DAL, but only of a sort. It also doesn't scale.
The second solution is to abstract the process of making a query and returning the result, and then to look at abstracting the data access on top of that (which basically means you build something that assembles SQL). This is more likely to become a proper ORM than a simple DAL, and it is much easier to scale.
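A small sketch of that second direction, layered on top of something like the run_query() processor sketched above; find_all() is an invented helper, and the table/column names must come from code rather than user input, since they are interpolated into the SQL:
// Assembles the SQL itself, so call sites only name the table, the columns
// and the criterion.
function find_all(mysqli $db, $table, array $columns, $whereColumn, $value) {
    $sql = sprintf('SELECT %s FROM %s WHERE %s = ?',
                   implode(', ', $columns), $table, $whereColumn);

    $rows = array();
    run_query($db, $sql, 's', array($value), function ($row) use (&$rows) {
        $rows[] = $row;
    });
    return $rows;
}

// e.g. the second example from the question becomes:
// $rows = find_all($Mysqli, 'tblBar', array('bar', 'baz'), 'somethingElse', $this->_somethingElse);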