This might be a stupid question but after all the research on best practices – including this great SO post that explains sanitizing, validation, escaping for storage and escaping for display – I am still confused.
I have built a routine where I sanitize user input – say, a comment post, or "edit my first name" string – with $value = filter_var($value, FILTER_SANITIZE_STRING);. Given a value of O'Hara, that gets rid of <a></a> and similar tags nicely. Then this new value gets validated: error if empty value and field is not nullable; or if too long; etc. Lastly, I save that value in the DB using a CakePHP query builder – which, of course, supports binding string values.
But when I then save that value in the DB, it is saved as O&#39;Hara instead of O'Hara, because of said sanitization.
Am I supposed to decode it back / to yet another format? If so with which method?
Or, am I to use the sanitized version for validation but then the original value for DB stora-- that can't be it.
Or is FILTER_SANITIZE_STRING a flag I need to tweak? The tutorials I've seen [1] [2] suggest that the flag is enough.
I feel so dumb because that great post mentioned earlier seems to still not be enough for me. All I can find are posts from ~2012 that say you should bind.
Any help would be appreciated.
I have some HTML on a page that has a bunch of tables and data (it's a report page). This is all legacy code, so no harassment necessary on the use of tables.
Given that it is legacy code, it is fragile, and we want to confirm that the table looks like we want (number of columns, rows, and the data inside of them are accurate).
My first inclination is to use Selenium WebDriver and run through everything that way (Page Object Pattern), but a co-worker suggested that I just view the source of the page, copy the table in question, and then use this to do a string comparison in the test.
My initial thought on his proposal is that it is not a good test, because you're starting with the answer and then writing a test to make sure you get that answer (essentially non-TDD). But I'm not sure that's a good enough answer in this case.
How should I test an HTML table to make sure all columns and rows are how we like, in addition to the contents of each cell?
It depends. String matching sounds like Approval Testing; depending on just how dynamic the table is, that could be fine.
If I already had Selenium tests running, I'd stick with what I have, using findElements to count and verify the various columns, rows, and values.
Re: your comment: if you cannot convince the developers to add ids, names, or something else to make your job easier, and you do go the Selenium route, then XPath is probably what you will want to use. We've created utility methods to help in these sorts of situations:
    // Returns true if the <td> in the same row as the matching <th> label contains the given text.
    public boolean isLabeledTextPresent(String label, String text) {
        WebElement element = findElement(By.xpath("//tr/th/label[contains(text(), '" +
                label + "')]/ancestor::tr/td"));
        String labeledText = element.getText().trim();
        return labeledText.contains(text);
    }
I think both methods are valid; it really depends on what you are trying to do and which advantages/disadvantages work best for you.
Method #1: write a Selenium script. It would take a little work (depending on your or your teammates' skill sets and experience) to write a Selenium script to scrape the table and verify certain things.
Advantages:
Once completed, it will validate very quickly and will be less fragile than method #2.
Disadvantages:
This is dependent on how quickly you can write a script and how easily you are able to validate all the things you want to validate. If all you want is # cols/rows and cell content, that should be very easy. If you want to validate things like formatting (size, color, etc.) then that starts to get a little more complicated to do through code.
Method #2: copy/paste the HTML. It would be super easy to copy/paste HTML and validate against that. The problem is, as you pointed out, you are starting with the answer in some respects. You can get around that by validating that the HTML source for the table is correct. That will have to be done manually, but once you have that, you can open the page and compare the source of the table vs. what you have validated.
Advantages:
You will be able to tell when anything changes... formatting, data, # cells, ... everything.
Disadvantages:
You will be able to tell when anything changes... lol. Your test will fail when anything is changed which will make the test very fragile if you expect that the table will ever get updated. If it gets updated, you will have to revalidate all the HTML for the table which could get to be a tedious process depending how often you expect this to happen. One thing that will help with this validation is to use a diffing tool... you can quickly determine what has changed and validate that instead of having to validate everything each time there is a change.
I would lean towards #1, write the script. It will be less fragile and as long as someone has the right skills shouldn't be that big of a task.
EDIT
You didn't specify what language you are working in but here's some code in Java that hopefully will point you in the right direction if you choose to write a script.
    // Assumes JUnit 4 (import static org.junit.Assert.assertEquals); argument order is (message, expected, actual).
    WebElement table = driver.findElement(...);
    List<WebElement> rows = table.findElements(By.tagName("tr"));
    assertEquals("Validate expected number of rows", expectedNoOfRows, rows.size());
    for (int row = 0; row < rows.size(); row++)
    {
        List<WebElement> cells = rows.get(row).findElements(By.tagName("td"));
        assertEquals("Validate expected number of cells in row " + row, expectedNoOfCells[row], cells.size());
        for (int cell = 0; cell < cells.size(); cell++)
        {
            assertEquals("Validate expected text in (" + cell + "," + row + ")", expectedText[cell][row], cells.get(cell).getText().trim());
        }
    }
You could do something like this at a basic level. If you want to get more fancy, you could add logic that looks for specific parts of the report so you can get a "landmark", e.g. Summary, Data, ... making headings up ... so you will know what to expect in the next section.
You could run a variation of this code to dump the different values, number of rows, number of cells in each row, and cell contents. Once you validate that those values are correct, you could use that as your master and do comparisons vs it. That will keep you from false fails on comparing straight HTML source. Maybe it's something in the middle between a script and text comparison based on HTML source.
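The "dump the values, validate them once, then compare against that master" idea can be sketched without Selenium at all. This is only a minimal sketch, assuming the table has already been scraped into a list of rows of trimmed cell text; TableSnapshot and diff are hypothetical names, not part of Selenium or any library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares scraped cell text against a hand-validated master copy.
final class TableSnapshot {
    private TableSnapshot() {}

    // Returns one readable message per difference; an empty list means the tables match.
    static List<String> diff(List<List<String>> master, List<List<String>> actual) {
        List<String> problems = new ArrayList<>();
        if (master.size() != actual.size()) {
            problems.add("row count: expected " + master.size() + " but was " + actual.size());
        }
        int rows = Math.min(master.size(), actual.size());
        for (int r = 0; r < rows; r++) {
            List<String> want = master.get(r);
            List<String> got = actual.get(r);
            if (want.size() != got.size()) {
                problems.add("row " + r + " cell count: expected " + want.size()
                        + " but was " + got.size());
            }
            int cells = Math.min(want.size(), got.size());
            for (int c = 0; c < cells; c++) {
                if (!want.get(c).equals(got.get(c))) {
                    problems.add("cell (" + c + "," + r + "): expected \"" + want.get(c)
                            + "\" but was \"" + got.get(c) + "\"");
                }
            }
        }
        return problems;
    }
}
```

Because only row counts, cell counts, and cell text are compared, a change in markup or styling alone would not fail the test, and the returned messages play the role of the diffing tool.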
I'll take a real example I have to implement in a program I'm coding:
I have a database that has the score of every game bowled in the past three years in a bowling center. With a GUI, you can choose to either search for the best score on each lane, search for the best score between two dates, for the best score for each week, etc.
I'm wondering what the best way to implement this is. Should I code something like this:
    public Vector<Scores> grabMaxScores(sortType, param1, param2)
    {
        if(sortType.equals("By lane"))
            ...
        else if(sortType.equals("Between given dates"))
            ...
    }
Or is it more appropriate to code different methods for each type and call the correct one in the listener?
    public Vector<Scores> grabMaxScoresBetweenDates(startDate, endDate)
    {
        ...
    }
    public Vector<Scores> grabMaxScoresByLane(minLane, maxLane)
    {
        ...
    }
I'm not necessarily asking about this particular problem; it's just a question I find myself asking often when I'm coding multiple methods that are alike, where the principle is the same but the parameters are different.
I can see there are good reasons to use each of them, but I want to know if there is a "more correct" or standard way of coding this.
In my personal opinion, I would prefer your second option over the first. This is because you have the opportunity to be precise about things like the types of the parameters. For example, minLane and maxLane may just be integers, but startDate and endDate could very well be Date objects. It's often nicer if you can actually specify what you expect, as it reduces the need for such things as casting and range checks, etc. Also, I would find it more readable, as the function names just say what you are trying to do.
However, I may have an alternative idea, which is kind of a variation on your first example (I actually got this inspiration from Java's Comparator, in case you're familiar with that). Rather than pass a string as the first argument, pass some sort of Selector object. Selector would be the name of a class or an interface, which would look something like so (in Java):
    interface Selector {
        public void select(Score next);
        public Score getBest( );
    }
If the select method "likes" the value of next which is given to it, it can store the value for later. If it doesn't like it, it can simply discard it, and keep whatever value it already has. After all the data is processed, the best value will be left over, and can be requested by calling getBest. Of course, you can alter the interface to suit your particular needs (e.g. it seems like you might be expecting more than one value to be retrieved. Also, generics might help a lot as well).
The reason I like this idea is that now your function is very general purpose. In order to add new functionality, you don't need to add functions, and you don't need to modify any functions you already have. Instead, the user of your code can simply define their own implementation of Selector as they see fit. This allows your code to be far more compositional, which makes it easier to use. The only inconvenience is the need to define implementations of Selector, though, you could also provide several default ones.
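To make that concrete, here is a minimal sketch of one possible Selector implementation and the single general-purpose function that uses it. The Score fields, BestInLaneRange, and ScoreSearch.grabBest are assumptions made up for illustration; only the Selector interface comes from the text above.

```java
import java.util.List;

// Minimal stand-in for the question's score type (the fields are assumptions).
final class Score {
    final int lane;
    final int points;

    Score(int lane, int points) {
        this.lane = lane;
        this.points = points;
    }
}

// The interface described above.
interface Selector {
    void select(Score next);
    Score getBest();
}

// Hypothetical implementation: keeps the highest score seen within a lane range.
final class BestInLaneRange implements Selector {
    private final int minLane;
    private final int maxLane;
    private Score best; // stays null until a matching score is seen

    BestInLaneRange(int minLane, int maxLane) {
        this.minLane = minLane;
        this.maxLane = maxLane;
    }

    @Override
    public void select(Score next) {
        boolean inRange = next.lane >= minLane && next.lane <= maxLane;
        if (inRange && (best == null || next.points > best.points)) {
            best = next;
        }
    }

    @Override
    public Score getBest() {
        return best;
    }
}

final class ScoreSearch {
    private ScoreSearch() {}

    // The one general-purpose function: it only feeds scores to the selector.
    static Score grabBest(List<Score> scores, Selector selector) {
        for (Score s : scores) {
            selector.select(s);
        }
        return selector.getBest();
    }
}
```

A "best score between two dates" search would then be another small class implementing Selector, with no change to grabBest.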
The approach you have used would also work. But if you want to add some new functionality like "get lowest scores on Friday evening", you will need to add one more function, which is not ideal.
As you already have the data in a database, you can generate database queries that fetch the required results and display them. That way you need not modify your code every time.
The code base I'm currently working on is littered with hard-coded values.
I view all hard coded values as a code smell and I try to eliminate them where possible...however there are some cases that I am unsure about.
Here are two examples that I can think of that make me wonder what the best practice is:
1. MyTextBox.Text = someCondition ? "Yes" : "No"
2. double myPercentage = myValue / 100;
In the first case, is the best thing to do to create a class that allows me to do MyHelper.Yes and MyHelper.No, or perhaps something similar in a config file (though it isn't likely to change, and who knows if there might ever be a case where its usage would be case sensitive)?
In the second case, finding a percentage by dividing by 100 isn't likely to ever change unless the laws of mathematics change...but I still wonder if there is a better way.
Can anyone suggest an appropriate way to deal with this sort of hard coding? And can anyone think of any places where hard coding is an acceptable practice?
And can anyone think of any places where hard coding is an acceptable practice?
Small apps
Single man projects
Throw aways
Short-lived projects
In short, anything that won't be maintained by others.
Gee, I've just realized how much being a maintenance coder has hurt me in the past :)
The real question isn't about hard coding, but rather repetition. Take the excellent advice found in "The Pragmatic Programmer": simply Don't Repeat Yourself (DRY).
Taking the principle of DRY, it is fine to hardcode something at any point. However, once you use that particular value again, refactor so this value is only hardcoded once.
Of course hard-coding is sometimes acceptable. Following dogma is rarely as useful a practice as using your brain.
(For an example of this, perhaps it's interesting to go back to the goto wars. How many programmers do you know that will swear by all things holy that goto is evil? Why then does Steve McConnell devote a dozen pages to a measured discussion of the subject in Code Complete?)
Sure, there's a lot of hard-gained experience that tells us that small throw-away applications often mutate into production code, but that's no reason for zealotry. The agilists tell us we should do the simplest thing that could possibly work and refactor when needed.
That's not to say that the "simplest thing" shouldn't be readable code. It may make perfect sense, even in a throw-away spike to write:
const MAX_CACHE_RECORDS = 50
foo = GetNewCache(MAX_CACHE_RECORDS)
This is regardless of the fact that in three iterations' time, someone might ask for the number of cache records to be configurable, and you might end up refactoring the constant away.
Just remember, if you go to the extremes of stuff like
const ONE_HUNDRED = 100
const ONE_HUNDRED_AND_ONE = 101
we'll all come to The Daily WTF and laugh at you. :-)
Think! That's all.
It's never good and you just proved it...
double myPercentage = myValue / 100;
This is NOT a percentage. What you wanted to write is:
double myPercentage = (myValue / 100) * 100;
Or more correctly:
double myPercentage = (myValue / myMaxValue) * 100;
But this hard coded 100 messed with your mind... So go for the getPercentage method that Colen suggested :)
    double getpercentage(double myValue, double maxValue)
    {
        return (myValue / maxValue) * 100;
    }
Also, as ctacke suggested, in the first case you will be in a world of pain if you ever need to localize these literals. It's never too much trouble to add a couple more variables and/or functions.
The first case will kill you if you ever need to localize. Moving it to some static or constant that is app-wide would at least make localizing it a little easier.
Case 1: When should you hard-code stuff: when you have no reason to think that it will ever change. That said, you should NEVER hard code stuff in-line. Take the time to make static variables or global variables or whatever your language gives you. Do them in the class in question, and if you notice that two classes or areas of your code share the same value FOR THE SAME REASON (meaning it's not just coincidence), point them to the same place.
Case 2: For case 2, you're correct: the laws of "percentage" will not change (being reasonable, here), so you can hard code inline.
Case 3: The third case is where you think the thing could change but you don't want to/have time to bother loading ResourceBundles or XML or whatever. In that case, you use whatever centralizing mechanism you can -- the hated Singleton class is a good one -- and go with that until you actually have need to deal with the problem.
The third case is tricky, though: it's extraordinarily hard to internationalize an application without really doing it... so you will want to hard-code stuff and just hope that, when the i18n guys come knocking, your code is not the worst-tasting code around :)
Edit: Let me mention that I've just finished a refactoring project in which the prior developer had placed the MySQL connect strings in 100+ places in the code (PHP). Sometimes they were uppercase, sometimes they were lowercase, etc., so they were hard to search and replace (though NetBeans and PDT did help a lot). There are reasons why he/she did this (a project called POG basically forces this stupidity), but there is just nothing that seems less like good code than repeating the same thing in a million places.
The better way for your second example would be to define an inline function:
    double getpercentage(double myValue)
    {
        return myValue / 100;
    }
    ...
    double myPercentage = getpercentage(myValue);
That way it's a lot more obvious what you're doing.
Hardcoded literals should appear in unit tests for the test values, unless there is so much reuse of a value within a single test class that a local constant is useful.
The unit tests are a description of expected values without any abstraction or redirection.
Imagine yourself reading the test - you want the information literally in front of you.
The only time I use constants for test values is when many tests repeat a value (itself a bit suspicious) and the value may be subject to change.
I do use constants for things like names of test files to compare.
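As a rough illustration of literals placed directly in tests, here is a framework-free sketch. Percentages.percentOf is a hypothetical method under test, and the tiny check helper merely stands in for whatever assertion library you actually use.

```java
// Hypothetical code under test.
final class Percentages {
    private Percentages() {}

    static double percentOf(double value, double max) {
        return (value / max) * 100;
    }
}

// Test sketch: inputs and expected values are literals, visible at a glance.
final class PercentagesTest {
    private PercentagesTest() {}

    static void halfIsFiftyPercent() {
        check(Percentages.percentOf(50, 100) == 50.0);
    }

    static void quarterIsTwentyFivePercent() {
        check(Percentages.percentOf(1, 4) == 25.0);
    }

    // Stand-in for a real assertion library.
    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("test failed");
        }
    }
}
```

Reading either test tells you the input and the expected result without chasing a constant defined somewhere else.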
I don't think that your second example is really an example of hardcoding. That's like having a Halve() method that takes in a value to use to divide by; it doesn't make sense.
Beyond that, for example 1: if you want to change the language of your app, you don't want to have to change the class, so it should absolutely be in a config.
Hard coding should be avoided like Dracula avoids the sun. It'll come back to bite you in the ass eventually.
"hardcoding" is the wrong thing to worry about. The point is not whether special values are in code or in config files, the point is:
If the value could ever change, how much work is that and how hard is it to find? Putting it in one place and referring to that place elsewhere is not much work and therefore a way to play it safe.
Will maintenance programmers definitely understand why the value is what it is? If there is any doubt whatsoever, use a named constant that explains the meaning.
Both of these goals can be achieved without any need for config files; in fact I'd avoid those if possible. "putting stuff in config files means it's easier to change" is a myth, unless either
you actually want to support customers changing the values themselves
no value that could possibly be put in the config file can cause a bug (buffer overflow, anyone?)
your build and deployment process sucks
The text for the conditions should be in a resource file; that's what it's there for.
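In Java terms this could look roughly like the sketch below. To keep it self-contained it uses ListResourceBundle subclasses in place of real resource files; the class names and the "answer.yes" / "answer.no" keys are assumptions made up for illustration.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// In-code stand-in for an English resource file (keys are assumptions).
final class EnglishLabels extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"answer.yes", "Yes"},
            {"answer.no", "No"},
        };
    }
}

// In-code stand-in for a German resource file.
final class GermanLabels extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"answer.yes", "Ja"},
            {"answer.no", "Nein"},
        };
    }
}

final class Labels {
    private Labels() {}

    // The calling code never mentions "Yes" or "No"; it only knows the keys.
    static String yesNo(ResourceBundle labels, boolean condition) {
        return labels.getString(condition ? "answer.yes" : "answer.no");
    }
}
```

In a real application the bundles would more likely be .properties files picked by locale via ResourceBundle.getBundle, but the calling code would look the same.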
Not normally (Are hard-coding literals acceptable)
Another way of looking at this is that a good naming convention
for constants used in place of hard-coded literals provides additional
documentation in the program.
Even if the number is used only once, it can still be hard to recognize
and may even be hard to find for future changes.
IMHO, making programs easier to read should be second nature to a
seasoned software professional. Raw numbers rarely communicate
meaningfully.
The extra time taken to use a well-named constant will make the
code readable (easy to recall to mind) and useful for future
re-mining (code re-use).
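A tiny sketch of the point about names acting as documentation; Durations and SECONDS_PER_DAY are made-up names for illustration only.

```java
final class Durations {
    private Durations() {}

    // The name says what the number means; a bare 86400 in the code would not.
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    static long daysToSeconds(long days) {
        return days * SECONDS_PER_DAY;
    }
}
```

Even if daysToSeconds were the only user of the constant, a reader searching for where "a day" is defined has something to search for.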
I tend to view it in terms of the project's scope and size.
Some simple projects that I am a solo dev on? Sure, I hard code lots of things. Tools I write that only I will ever use? Sure, if it gets the job done.
But, in working on larger, team projects? I agree, they are suspect and usually the product of laziness. Tag them for review and see if you can spot a pattern where they can be abstracted away.
In your example, the text box should be localizable, so why not a class that handles that?
Remember that you WILL forget the meaning of any non-obvious hard-coded value.
So be certain to put a short comment after each to remind you.
A Delphi example:
Length := Length * 0.3048; { 0.3048 converts feet to meters }
no.
What is a simple throw away app today will be driving your entire enterprise tomorrow. Always use best practices or you'll regret it.
Code always evolves. When you initially write stuff, hard coding is the easiest way to go. Later, when a need arrives to change the value, it can be improved. In some cases the need never comes.
The need can arrive in many forms:
The value is used in many places and it needs to be changed by a programmer. In this case a constant is clearly needed.
User needs to be able to change the value.
I don't see the need to avoid hard coding. I do see the need to change things when there is a clear need.
A totally separate issue is that of course the code needs to be readable, and this means that there might be a need for a comment on the hard-coded value.
For the first value, it really depends. If you don't anticipate any kind of wide-spread adoption of your application and internationalization will never be an issue, I think it's mostly fine. However, if you are writing some kind of open source software or something with a larger audience consider the fact that it may one day need to be translated. In that case, you may be better off using string resources.
It's okay as long as you don't do refactoring, unit-testing, peer code reviews. And, you don't want repeat customers. Who cares?
I once had a boss who refused to not hardcode something because in his mind it gave him full control over the software and the items related to the software. Problem was, when the hardware that ran the software died, the server got renamed... meaning he had to find his code. That took a while. I simply found a hex editor and hacked around it instead of waiting.
I normally add a set of helper methods for strings and numbers.
For example, when I have strings such as 'yes' and 'no', I have a function called __, so I call __('yes'); which starts out in the project by just returning the first parameter, but when I need to do more complex stuff (such as internationalization) it's already there and the param can be used as a key.
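A rough Java counterpart of that helper might look like the sketch below. The names Text, tr, and load are assumptions for illustration; the original helper is the PHP function __ described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java counterpart of the __('yes') helper.
final class Text {
    private static final Map<String, String> TRANSLATIONS = new HashMap<>();

    private Text() {}

    // Starts life as a pass-through: with nothing loaded it returns the key itself.
    static String tr(String key) {
        return TRANSLATIONS.getOrDefault(key, key);
    }

    // Added later, once internationalization is actually needed.
    static void load(Map<String, String> translations) {
        TRANSLATIONS.clear();
        TRANSLATIONS.putAll(translations);
    }
}
```

Every call site already goes through tr, so switching languages later means loading a map rather than touching the callers.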
Another example is VAT (a form of UK tax) in online shops; recently it changed from 17.5% to 15%. Anyone who hard coded VAT by doing:
$vat = $price * 0.175;
had to then go through all references and change it to 0.15. The far more useful way of doing it would be to have a function or variable for VAT.
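In Java, keeping the rate in a single place could be sketched like this. The Vat class, its on method, and the rounding mode are assumptions for illustration; 0.15 is the rate mentioned above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class Vat {
    private Vat() {}

    // The only place the rate is written down.
    static final BigDecimal RATE = new BigDecimal("0.15");

    // VAT due on a net price, rounded to two decimal places (HALF_UP is an assumption).
    static BigDecimal on(BigDecimal netPrice) {
        return netPrice.multiply(RATE).setScale(2, RoundingMode.HALF_UP);
    }
}
```

When the rate changes again, only RATE needs editing, or it can be read from configuration without touching any caller.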
In my opinion anything that could change should be written in a changeable way. If I find myself doing the same thing more than 5 times in the same day then it becomes a function or a config var.
Hard coding should be banned forever. Although in your very simple examples I don't see anything wrong with using them in any kind of project.
In my opinion hard coding is when you believe that a variable/value/define etc. will never change and create all your code based on that belief.
An example of such hard coding is the book Teach Yourself C in 24 Hours, which everybody should avoid.