Handling POST data sent by html form in CGI C - html

So, as I said a few days ago, I'm trying to make a login script using CGI-C on an Apache server.
My form is submitting two variables to Test.cgi: username and password (pattern 2 to 40 characters only) using the POST method.
Here is my code so far:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *lengthy;
int figures;
char somelimit[512];
lengthy = getenv("CONTENT_LENGTH");
figures = atoi(lengthy);
fgets(somelimit, figures, stdin);
printf("Content-type: text/html\n\n");
printf("%s\n", somelimit);
return 0;
}
Q. How do I extract the username and password values from stdin? The output I typically get in the above case is "username=xyz&password=xyz12". How do I process this?
Q. I want to limit how much I read based on the CONTENT_LENGTH header, in case the header is malformed.
What type of data is this header returning? I know it is supposed to return a "decimal number of octets". Valid values are 0 or more. I want to accept 1 to X, where X is the upper limit, considering I have two variables, username/password, both limited to 40 characters each in the HTML form.
I tried int[] and char[] instead of the pointer. Why can't I convert it directly with something like:
int some[1024];
some = atoi(getenv("CONTENT_LENGTH"));
Why is atoi considered unsafe?
Q. How do I make sure that what I read from stdin contains only US-ASCII characters, to avoid a malformed message body?
I'm a C Newbie, so please go easy :)
PS: Please don't recommend any frameworks/web-servers, etc.
Edit:I just realized that perhaps I asked too many questions. Sorry about that. I'm going to fix this post to make it cohesive and well bounded. Please stand by.
Edit2: This is the final question, no more edits. I will accept an answer which at least answers 2 out of 3 questions above.

A lot of things going on here, and a lot of questions.
First, I recommend that you not output your HTTP header until you're about to output the rest. It's more logical that way, and allows you to output a Redirect header instead if something in your program requires it.
Second, use strtoul() instead of atoi(), since the latter has no error-checking.
You only need one buffer, not two; and I recommend you allocate it dynamically based off the content length. Use malloc() and don't forget to check the return value. Do NOT try to anticipate the upper bounds of your content length.
You'll have to decode the argument string to get any values out. What you do with them is up to you, but handling user names and passwords is a wholly separate topic that could take days to cover. But suffice it to say, never EVER store a password in plain text in a file.
CONTENT_LENGTH is text passed in an environment variable, not on stdin. That text holds the number of bytes of content waiting on stdin. You will have to convert the text to an integer type, probably size_t, before it is useful to you. That's what your atoi() call is doing (which, again, you should replace with strtoul()).
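A minimal sketch of that conversion, assuming you do want to reject anything that is not a plain decimal number. The name parse_content_length and the max_len parameter are hypothetical and only here for illustration; pass SIZE_MAX as max_len if you do not want a cap.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a CONTENT_LENGTH-style string into *out.
 * Returns 0 on success, -1 if the text is missing, empty, not a plain
 * decimal number, out of range, or larger than max_len.
 * (Hypothetical helper, not part of any CGI library.) */
int parse_content_length(const char *text, size_t max_len, size_t *out)
{
    char *end = NULL;
    unsigned long value;

    if (text == NULL || *text == '\0')
        return -1;
    /* strtoul() accepts leading whitespace and a sign, so insist on a digit. */
    if (*text < '0' || *text > '9')
        return -1;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;              /* overflow, or trailing junk such as "12abc" */
    if (value > max_len)
        return -1;

    *out = (size_t)value;
    return 0;
}
```

You would call it as parse_content_length(getenv("CONTENT_LENGTH"), limit, &len) and refuse the request when it returns -1. This is also why atoi() is considered unsafe: it gives you no way to tell "0" apart from garbage.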
Use HTTPS.
Stop emitting your Content-type header prematurely. Then, if you decide you need to redirect, you can emit a Redirect header instead.
getenv() returns a pointer to a static text block that you cannot change. If you copy the pointer, you cannot change the text in the string. If you copy the string to a new array, you would be able to change the text in the string; however, I cannot think of a reason why you'd want to do that.
In your current code, you do not allocate any memory off the heap so you do not need to call free(). However, I recommend you rethink this aspect of your design.
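To make the "decode the argument string" step concrete, here is a rough sketch, assuming the body is ordinary application/x-www-form-urlencoded text like "username=xyz&password=xyz12". The helpers hex_value, form_decode and form_value are names I made up; field names are compared without decoding, which is fine for plain names like username.

```c
#include <stdlib.h>
#include <string.h>

/* Value of a single hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode form-encoded text in place: '+' becomes a space, %XX becomes a byte. */
static void form_decode(char *s)
{
    char *out = s;
    while (*s != '\0') {
        int hi, lo;
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && (hi = hex_value((unsigned char)s[1])) >= 0
                             && (lo = hex_value((unsigned char)s[2])) >= 0) {
            *out++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *out++ = *s++;      /* ordinary character, or a malformed escape */
        }
    }
    *out = '\0';
}

/* Return a malloc()ed, decoded copy of the value of field `name` in `body`,
 * or NULL if the field is absent or memory runs out. Caller must free(). */
char *form_value(const char *body, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = body;

    while (*p != '\0') {
        const char *amp = strchr(p, '&');
        size_t pair_len = amp ? (size_t)(amp - p) : strlen(p);

        if (pair_len > name_len && strncmp(p, name, name_len) == 0
                                && p[name_len] == '=') {
            size_t value_len = pair_len - name_len - 1;
            char *value = malloc(value_len + 1);
            if (value == NULL)
                return NULL;
            memcpy(value, p + name_len + 1, value_len);
            value[value_len] = '\0';
            form_decode(value);
            return value;
        }
        p += pair_len;
        if (*p == '&')
            p++;
    }
    return NULL;
}
```

With the body from the question, form_value(body, "username") gives you "xyz" and form_value(body, "password") gives you "xyz12". Remember to free() both.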

Related

How to add back comments/whitespaces in translator using the Antlr4's visitor model

I'm currently writing a TSQL (Sybase/Microsoft SQL) to MySQL translator using the ANTLR4 visitor approach.
I'm able to push comments and whitespaces to different channels so that I can use that information later.
What's not super clear is:
how do I get the data back?
and more importantly how do I plug the comments and whitespaces back into my translated MySQL code?
Re: #1, this seems to work to get the list of all tokens including the comments/whitespaces:
public static List<Token> getHiddenTokensFromString(String sqlIn, int hiddenChannel) {
CharStream charStream = CharStreams.fromString(sqlIn);
CaseChangingCharStream upper = new CaseChangingCharStream(charStream, true);
TSqlLexer lexer = new TSqlLexer(upper);
CommonTokenStream commonTokenStream = new CommonTokenStream(lexer, hiddenChannel);
commonTokenStream.fill();
List<Token> hiddenTokens = commonTokenStream.getTokens();
return hiddenTokens;
}
Re #2, what makes it particularly challenging is that as part of the translation, lines of SQL have to be moved around, some lines removed and some lines added.
Any help will be greatly appreciated.
Thanks.
The ANTLR4 lexer creates a number of tokens, each with an index (a running number). Provided you didn't just skip a token, all tokens are available for later inspection, once the parsing step is done, regardless of their channels (the channel is actually just a number property on a token).
So, given you have a token you want to translate, get its index and then ask the token stream for the tokens with the next smaller index or next higher index. These are usually the hidden whitespaces.
Once you have the whitespace token use its start and stop index to get the original text from the char stream. And since you know where you are in the translation process when you do that, it should be easy to know where to insert the original text.

Why is this %2B string being urldecoded?

[This may not be precisely a programming question, but it's a puzzle that may best be answered by programmers. I tried it first on the Pro Webmasters site, to overwhelming silence]
We have an email address verification process on our website. The site first generates an appropriate key as a string
mykey
It then encodes that key as a bunch of bytes
&$dac~ʌ����!
It then base64 encodes that bunch of bytes
JiRkYWN+yoyIhIQ==
Since this key is going to be given as a querystring value of a URL that is to be placed in an HTML email, we need to first URLEncode it then HTMLEncode the result, giving us (there's no effect of HTMLEncoding in the example case, but I can't be bothered to rework the example)
JiRkYWN%2ByoyIhIQ%3D%3D
This is then embedded in HTML that is sent as part of an email, something like:
click here.
Or paste <b>http://myapp/verify?key=JiRkYWN%2ByoyIhIQ%3D%3D</b> into your browser.
When the receiving user clicks on the link, the site receives the request, extracts the value of the querystring 'key' parameter, base64 decodes it, decrypts it, and does the appropriate thing in terms of the site logic.
However on occasion we have users who report that their clicking is ineffective. One such user forwarded us the email he had been sent, and on inspection the HTML had been transformed into (to put it in terms of the example above)
click here
Or paste <b>http://myapp/verify?key=JiRkYWN+yoyIhIQ%3D%3D</b> into your browser.
That is, the %2B string - but none of the other percentage encoded strings - had been converted into a plus. (It's definitely leaving us with the right values - I've looked at the appropriate SMTP logs).
key=JiRkYWN%2ByoyIhIQ%3D%3D
key=JiRkYWN+yoyIhIQ%3D%3D
So I think that there are a couple of possibilities:
There's something I'm doing that's stupid, that I can't see, or
Some mail clients convert %2b strings to plus signs, perhaps to try to cope with the problem of people mistakenly URLEncoding plus signs
In case of 1 - what is it? In case of 2 - is there a standard, known way of dealing with this kind of scenario?
Many thanks for any help
The problem lies at this step
on inspection the HTML had been transformed into (to put it in terms of the example above)
click here
Or paste <b>http://myapp/verify?key=JiRkYWN+yoyIhIQ%3D%3D</b> into
your browser.
That is, the %2B string - but none of the other percentage encoded
strings - had been converted into a plus
Your application at "the other end" must be missing an unescaping step. Regardless of whether there is a %2B or a +, a function like Perl's uri_unescape returns consistent answers:
DB<9> use URI::Escape;
DB<10> x uri_unescape("JiRkYWN+yoyIhIQ%3D%3D")
0 'JiRkYWN+yoyIhIQ=='
DB<11> x uri_unescape("JiRkYWN%2ByoyIhIQ%3D%3D")
0 'JiRkYWN+yoyIhIQ=='
Here is what should be happening. All I'm showing are the steps. I'm using perl in a debugger. Step 54 encodes the string to base64. Step 55 shows how the base64 encoded string could be made into a uri escaped parameter. Steps 56 and 57 are what the client end should be doing to decode.
One possible work around is to ensure that your base64 "key" does not contain any plus signs!
DB<53> $key="AB~"
DB<54> x encode_base64($key)
0 'QUJ+
'
DB<55> x uri_escape('QUJ+')
0 'QUJ%2B'
DB<56> x uri_unescape('QUJ%2B')
0 'QUJ+'
DB<57> $result=decode_base64('QUJ+')
DB<58> x $result
0 'AB~'
What may be happening here is that the URLDecode is turning the %2b into a +, which is being interpreted as a space character in the URL. I was able to overcome a similar problem by first urldecoding the string, then using a replace function to replace spaces in the decoded string with + characters, and then decrypting the "fixed" string.
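The replace step described above is tiny. A sketch in C, assuming the key is standard base64 (so a space can never be a legitimate character in it) and that the string has already been URL-decoded; restore_plus_signs is a made-up name:

```c
#include <stddef.h>

/* Replace every space with '+', in place, and return how many were replaced.
 * Only safe for text such as base64 where a space is never valid data. */
size_t restore_plus_signs(char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (*s == ' ') {
            *s = '+';
            count++;
        }
    }
    return count;
}
```

Run it on the decoded key before the base64 decode. A non-zero return value also tells you that the link was mangled somewhere along the way, which may be worth logging.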

Where do I get "junk" data to help test my code?

For my C class I've written a simple statistics program -- it calculates max, min, mean, etc. Anyway, I've gotten the program successfully compiled, so all I need to do now is actually test it; the only problem is that I don't have anything to test with.
In my case, I need a list of doubles -- my program needs to accept between 2 and 1,000,000; Is there some resource online that can produce lists of otherwise meaningless data? I know Lorem Ipsum gets used for typesetting, and I'm wondering if there's something similar for various types of numerical data.
Or am I out of luck, and I'll have to just create my own junk data?
The problem with testing software is not the source of the data, but the test set. I mean, can you test an int sum(int a, int b) method by just inputting random numbers to it? No, you need to know what to expect. This is a test set: inputs and expected outputs.
What do you say when you discover that 548888876+99814465=643503341? How can you tell this is the real result?
More than finding random numbers to give your program, you must somehow know the results of your computation in advance in order to compare it.
There are a few ways to do it. What I suggest is to pick a random number generator (amphetamachine +1) and feed the same data both to your code and to a program that you already know is good, e.g. Matlab for your purposes. After computing your statistics with both, compare the results and see whether your code is correct or needs some debugging.
By the way, I deliberately altered the result of the above sum...
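If you don't have a known-good program at hand, another way to get expected outputs is to use data whose statistics you can work out on paper. A small sketch, assuming the sequence 1, 2, ..., n as input; fill_sequence and expected_mean are names I made up, and mean() is only a stand-in for the function you are actually testing:

```c
#include <stddef.h>

/* Fill data[0..n-1] with 1.0, 2.0, ..., n. For this sequence the answers
 * are known in advance: min = 1, max = n, mean = (n + 1) / 2. */
void fill_sequence(double *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] = (double)(i + 1);
}

/* Expected mean of the sequence produced by fill_sequence(). */
double expected_mean(size_t n)
{
    return ((double)n + 1.0) / 2.0;
}

/* Stand-in for the code under test (n must be greater than 0). */
double mean(const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum / (double)n;
}
```

Compare mean(data, n) against expected_mean(n). For large n or less friendly data, compare with a small tolerance instead of ==, since floating-point sums pick up rounding error.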
What about just generating a random double?
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
double number = r.NextDouble();
//do something with the value
}
Since the data you need will depend on the program, there is no source of generic data that I know of.
If you are able to write that program, you should be able to write a script to generate dummy data for yourself.
Just use a loop to print out random numbers within the range your program can accept.
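Since the class is in C, that loop could look roughly like this. It is only a sketch: fill_random_doubles is a made-up name, and rand() is a weak generator, which should be fine for junk data.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill out[0..n-1] with pseudo-random doubles in [lo, hi].
 * With the same C library, the same seed reproduces the same data,
 * so a failing run can be repeated. */
void fill_random_doubles(double *out, size_t n, double lo, double hi, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        double unit = (double)rand() / (double)RAND_MAX;   /* in [0, 1] */
        out[i] = lo + unit * (hi - lo);
    }
}
```

Print each value with printf("%f\n", out[i]) and redirect the output to a file if your program reads its input from one.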
Generate a file with random bytes:
$ dd \
of=random-bytes \
if=/dev/urandom \
bs=1024 \
count=1024
http://www.generatedata.com/#generator
I've used that data generator before with some success. To be fair, it will usually involve copy/pasting the data it generates into some other format that you'll be able to read in.
You can generate your own data for this specific case quite easily though. Loop a random number of times, up to a limit of 1,000,000, generating random doubles within the range you expect. Feed that in and away you go.
Generating your own test data in this case is probably the best option.
You could take the first million digits of pi and chop them up into however many doubles you want.
The first few could be 3.14159, 2.65358, 9.79323, 8.46264, 3.38327, 9.50288, 4.19716, and 9.39937, for example.

How should substring() work?

I do not understand why Java's [String.substring() method](http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#substring(int,%20int%29) is specified the way it is. I can't tell it to start at a numbered-position and return a specified number of characters; I have to compute the end position myself. And if I specify an end position beyond the end of the String, instead of just returning the rest of the String for me, Java throws an Exception.
I'm used to languages where substring() (or substr()) takes two parameters: a start position, and a length. Is this objectively better than the way Java does it, and if so, can you prove it? What's the best language specification for substring() that you have seen, and when if ever would it be a good idea for a language to do things differently? Is that IndexOutOfBoundsException that Java throws a good design idea, or not? Does all this just come down to personal preference?
There are times when the second parameter being a length is more convenient, and there are times when the second parameter being the "offset to stop before" is more convenient. Likewise there are times when "if I give you something that's too big, just go to the end of the string" is convenient, and there are times when it indicates a bug and should really throw an exception.
The second parameter being a length is useful if you've got a fixed length of field. For instance:
// C#
String guid = fullString.Substring(offset, 36);
The second parameter being an offset is useful if you're going up to another delimiter:
// Java
int nextColon = fullString.indexOf(':', start);
if (nextColon == -1)
{
// Handle error
}
else
{
String value = fullString.substring(start, nextColon);
}
Typically, the one you want to use is the opposite to the one that's provided on your current platform, in my experience :)
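To show that neither convention is hard to build from the other, here is a sketch of both in C, a language that ships with neither. The names substring_range and substring_length are made up. Both fail on a bad range, in the spirit of Java's exception, instead of quietly clamping to the end of the string.

```c
#include <stdlib.h>
#include <string.h>

/* Copy s[begin, end) into a new malloc()ed string.
 * Returns NULL if the range is invalid or memory runs out. Caller must free(). */
char *substring_range(const char *s, size_t begin, size_t end)
{
    size_t len = strlen(s);
    char *out;

    if (begin > end || end > len)
        return NULL;

    out = malloc(end - begin + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + begin, end - begin);
    out[end - begin] = '\0';
    return out;
}

/* Length-based variant, built on top of the range-based one. */
char *substring_length(const char *s, size_t begin, size_t count)
{
    if (count > strlen(s))      /* a count longer than the string is never valid */
        return NULL;
    return substring_range(s, begin, begin + count);
}
```

substring_range("key:value", 4, 9) and substring_length("key:value", 4, 5) both give "value".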
I'm used to languages where
substring() (or substr()) takes two
parameters: a start position, and a
length. Is this objectively better
than the way Java does it, and if so,
can you prove it?
No, it's not objectively better. It all depends on the context in which you want to use it. If you want to extract a substring of a specific length, Java's version is inconvenient, but if you want to extract a substring that ends at, say, the first occurrence of "." in the string, it's better than first having to compute a length. The question is: which requirement is more common? I'd say the latter. Of course, the best solution would be to have both versions in the API, but if you need the length-based one all the time, using a static utility method isn't that horrible.
As for the exception, yeah, that's definitely good design. You asked for something specific, and when you can't get that specific thing, the API should not try to guess what you might have wanted instead - that way, bugs become apparent more quickly.
Also, Java DOES have an alternative substring() method that returns the substring from a start index until the end of the string.
The second parameter should be optional, and the first parameter should accept negative values.
If you leave off the 2nd parameter it will go to the end of the string for you without you having to compute it.
Having gotten some feedback, I see when the second-parameter-as-index scenario is useful, but so far all of those scenarios seem to be working around other language/API limitations. For example, the API doesn't provide a convenient routine to give me the Strings before and after the first colon in the input String, so instead I get that String's index and call substring(). (And this explains why the second position parameter in substr() overshoots the desired index by 1, IMO.)
It seems to me that with a more comprehensive set of string-processing functions in the language's toolkit, the second-parameter-as-index scenario loses out to second-parameter-as-length. But somebody please post me a counterexample. :)
If you store this away, the problem should stop plaguing your dreams and you'll finally achieve a good night's rest:
public String skipsSubstring(String s, int index, int length) {
return s.substring(index, index + length);
}

Are hard-coded STRINGS ever acceptable?

Similar to Is hard-coding literals ever acceptable?, but I'm specifically thinking of "magic strings" here.
On a large project, we have a table of configuration options like these:
Name Value
---- -----
FOO_ENABLED Y
BAR_ENABLED N
...
(Hundreds of them).
The common practice is to call a generic function to test an option like this:
if (config_options.value('FOO_ENABLED') == 'Y') ...
(Of course, this same option may need to be checked in many places in the system code.)
When adding a new option, I was considering adding a function to hide the "magic string" like this:
if (config_options.foo_enabled()) ...
However, colleagues thought I'd gone overboard and objected to doing this, preferring the hard-coding because:
That's what we normally do
It makes it easier to see what's going on when debugging the code
The trouble is, I can see their point! Realistically, we are never going to rename the options for any reason, so about the only advantage I can think of for my function is that the compiler would catch any typo like fo_enabled(), but not 'FO_ENABLED'.
What do you think? Have I missed any other advantages/disadvantages?
If I use a string once in the code, I don't generally worry about making it a constant somewhere.
If I use a string twice in the code, I'll consider making it a constant.
If I use a string three times in the code, I'll almost certainly make it a constant.
if (config_options.isTrue('FOO_ENABLED')) {...
}
Restrict your hard coded Y check to one place, even if it means writing a wrapper class for your Map.
if (config_options.isFooEnabled()) {...
}
Might seem okay until you have 100 configuration options and 100 methods (so here you can make a judgement about future application growth and needs before deciding on your implementation). Otherwise it is better to have a class of static strings for parameter names.
if (config_options.isTrue(ConfigKeys.FOO_ENABLED)) {...
}
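The same idea as a rough C sketch, assuming a tiny in-memory table in place of the real one. The KEY_ constants, struct config_option and config_is_true are made-up names. The point is that the option names and the "Y" check each live in exactly one place.

```c
#include <stddef.h>
#include <string.h>

/* Option names are defined once; misspelling KEY_FOO_ENABLED fails to compile. */
#define KEY_FOO_ENABLED "FOO_ENABLED"
#define KEY_BAR_ENABLED "BAR_ENABLED"

struct config_option {
    const char *name;
    const char *value;
};

/* Stand-in for the real configuration table from the question. */
static const struct config_option config_options[] = {
    { KEY_FOO_ENABLED, "Y" },
    { KEY_BAR_ENABLED, "N" },
};

/* The only place that knows "Y" means enabled. Unknown names count as disabled. */
int config_is_true(const char *name)
{
    size_t count = sizeof config_options / sizeof config_options[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(config_options[i].name, name) == 0)
            return strcmp(config_options[i].value, "Y") == 0;
    }
    return 0;
}
```

Callers then write if (config_is_true(KEY_FOO_ENABLED)) and never spell the string or the 'Y' themselves.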
I realise the question is old, but it came up on my margin.
AFAIC, the issue here has not been identified accurately, either in the question or in the answers. Forget about "hardcoding strings" or not, for a moment.
The database has a Reference table, containing config_options. The PK is a string.
There are two types of PKs:
Meaningful Identifiers, that the users (and developers) see and use. These PKs are supposed to be stable, they can be relied upon.
Meaningless Id columns which the users should never see, that the developers have to be aware of, and code around. These cannot be relied upon.
It is ordinary, normal, to write code using the absolute value of a meaningful PK: IF CustomerCode = "IBM" ... or IF CountryCode = "AUS" etc.
Referencing the absolute value of a meaningless PK is not acceptable (due to auto-increment; gaps being changed; values being replaced wholesale).
Your reference table uses meaningful PKs. Referencing those literal strings in code is unavoidable. Hiding the value will make maintenance more difficult; the code is no longer literal; your colleagues are right. Plus there is the additional redundant function that chews cycles. If there is a typo in the literal, you will soon find that out during Dev testing, long before UAT.
Hundreds of functions for hundreds of literals is absurd. If you do implement a function, then Normalise your code, and provide a single function that can be used for any of the hundreds of literals. In which case, we are back to a naked literal, and the function can be dispensed with.
The point is, the attempt to hide the literal has no value.
It cannot be construed as "hardcoding"; that is something quite different. I think that is where your issue is: identifying these constructs as "hardcoded". It is just referencing a Meaningful PK literally.
Now from the perspective of any code segment only, if you use the same value a few times, you can improve the code by capturing the literal string in a variable, and then using the variable in the rest of the code block. Certainly not a function. But that is an efficiency and good practice issue. Even that does not change the effect IF CountryCode = #cc_aus
You really should use constants and not hard-coded literals.
You can say they won't be changed, but you never know. And it is best to make it a habit to use symbolic constants.
In my experience, this kind of issue is masking a deeper problem: failure to do actual OOP and to follow the DRY principle.
In a nutshell, capture the decision at startup time by an appropriate definition for each action inside the if statements, and then throw away both the config_options and the run-time tests.
Details below.
The sample usage was:
if (config_options.value('FOO_ENABLED') == 'Y') ...
which raises the obvious question, "What's going on in the ellipsis?", especially given the following statement:
(Of course, this same option may need to be checked in many places in the system code.)
Let's assume that each of these config_option values really does correspond to a single problem domain (or implementation strategy) concept.
Instead of doing this (repeatedly, in various places throughout the code):
Take a string (tag),
Find its corresponding other string (value),
Test that value as a boolean-equivalent,
Based on that test, decide whether to perform some action.
I suggest encapsulating the concept of a "configurable action".
Let's take as an example (obviously just as hypothetical as FOO_ENABLED ... ;-) that your code has to work in either English units or metric units. If METRIC_ENABLED is "true", convert user-entered data from metric to English for internal computation, and convert back prior to displaying results.
Define an interface:
public interface MetricConverter {
double toInches(double length);
double toCentimeters(double length);
double toPounds(double weight);
double toKilograms(double weight);
}
which identifies in one place all the behavior associated with the concept of METRIC_ENABLED.
Then write concrete implementations of all the ways those behaviors are to be carried out:
public class NullConv implements MetricConverter {
public double toInches(double length) {return length;}
public double toCentimeters(double length) {return length;}
public double toPounds(double weight) {return weight;}
public double toKilograms(double weight) {return weight;}
}
and
// lame implementation, just for illustration!!!!
public class MetricConv implements MetricConverter {
public static final double LBS_PER_KG = 2.2D;
public static final double CM_PER_IN = 2.54D;
// centimeters in, inches out
public double toInches(double length) {return length / CM_PER_IN;}
// inches in, centimeters out
public double toCentimeters(double length) {return length * CM_PER_IN;}
// kilograms in, pounds out
public double toPounds(double weight) {return weight * LBS_PER_KG;}
// pounds in, kilograms out
public double toKilograms(double weight) {return weight / LBS_PER_KG;}
}
At startup time, instead of loading a bunch of config_options values, initialize a set of configurable actions, as in:
MetricConverter converter = (metricOption()) ? new MetricConv() : new NullConv();
(where the expression metricOption() above is a stand-in for whatever one-time-only check you need to make, including looking at the value of METRIC_ENABLED ;-)
Then, wherever the code would have said:
double length = getLengthFromGui();
if (config_options.value('METRIC_ENABLED') == 'Y') {
length = length / 2.54D;
}
// do some computation to produce result
// ...
if (config_options.value('METRIC_ENABLED') == 'Y') {
result = result * 2.54D;
}
displayResultingLengthOnGui(result);
rewrite it as:
double length = converter.toInches(getLengthFromGui());
// do some computation to produce result
// ...
displayResultingLengthOnGui(converter.toCentimeters(result));
Because all of the implementation details related to that one concept are now packaged cleanly, all future maintenance related to METRIC_ENABLED can be done in one place. In addition, the run-time trade-off is a win; the "overhead" of invoking a method is trivial compared with the overhead of fetching a String value from a Map and performing String#equals.
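None of this depends on Java or on classes. For comparison, here is a rough sketch of the same "decide once at startup" idea in C, using a struct of function pointers in place of the interface. All names here (metric_converter, choose_converter and so on) are made up, and only the length conversions are shown.

```c
/* One "configurable action": every behavior tied to METRIC_ENABLED. */
struct metric_converter {
    double (*to_inches)(double length);
    double (*to_centimeters)(double length);
};

#define CM_PER_IN 2.54

static double identity(double x)          { return x; }
static double cm_to_inches(double cm)     { return cm / CM_PER_IN; }
static double inches_to_cm(double inches) { return inches * CM_PER_IN; }

static const struct metric_converter NULL_CONV   = { identity, identity };
static const struct metric_converter METRIC_CONV = { cm_to_inches, inches_to_cm };

/* Called once at startup. metric_enabled stands in for the one-time
 * check of the METRIC_ENABLED option. */
const struct metric_converter *choose_converter(int metric_enabled)
{
    return metric_enabled ? &METRIC_CONV : &NULL_CONV;
}
```

The rest of the code then calls converter->to_inches(...) and converter->to_centimeters(...) without ever looking at the option again.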
I believe that the two reasons you have mentioned, a possible misspelling in a string that cannot be detected until run time, and the possibility (although slim) of a name change, would justify your idea.
On top of that, you can get typed functions. Right now it seems you only store booleans, but what if you need to store an int, a string, etc.? I would rather use get_foo() with a type than get_string("FOO") or get_int("FOO").
I think there are two different issues here:
In the current project, the convention of using hard-coded strings is already well established, so all the developers working on the project are familiar with it. It might be a sub-optimal convention for all the reasons that have been listed, but everybody familiar with the code can look at it and instinctively knows what the code is supposed to do. Changing the code so that in certain parts, it uses the "new" functionality will make the code slightly harder to read (because people will have to think and remember what the new convention does) and thus a little harder to maintain. But I would guess that changing over the whole project to the new convention would potentially be prohibitively expensive unless you can quickly script the conversion.
On a new project, symbolic constants are the way IMO, for all the reasons listed. Especially because anything that makes the compiler catch errors at compile time that would otherwise be caught by a human at run time is a very useful convention to establish.
Another thing to consider is intent. If you are on a project that requires localization hard coded strings can be ambiguous. Consider the following:
const string HELLO_WORLD = "Hello world!";
print(HELLO_WORLD);
The programmer's intent is clear. Using a constant implies that this string does not need to be localized. Now look at this example:
print("Hello world!");
Here we aren't so sure. Did the programmer really not want this string to be localized or did the programmer forget about localization while he was writing this code?
I too prefer a strongly-typed configuration class if it is used throughout the code. With properly named methods you don't lose any readability. If you need to do conversions from strings to another data type (decimal/float/int), you don't need to repeat the code that does the conversion in multiple places, and you can cache the result so the conversion only takes place once. You've already got the basis of this in place, so I don't think it would take much to get used to the new way of doing things.