MediaWiki extension that produces cryptographic hash value of the input string?

I'm looking for an extension, probably a parser function, that takes an input string (could be any wikitext, and probably needs to support wikitext parsing in the call) and produces the cryptographic hash of the input.
Example usage (with Semantic MediaWiki queries):
{{#hash: sha1|{{#show: SomeTestingPage|?SomeValue# -}} }}
{{#hash: md5|{{#ask: [[Category: Boats]] [[Displacement::>100000]] [[Purpose::Freighter]]|?format=list|link=none|headers=hide}} }}
Is there anything like this in Mediawiki?
Important: Don't confuse this with the HashTables extension, which implements hash table lookup but doesn't actually produce cryptographic hash values at the page level.

Beyond the fact that the cryptographic value of md5 and sha1 is questionable at this point, I'm not aware of an already existing extension that does this. Creating one should be rather easy though, including it being able to parse wikitext.
The manual at https://www.mediawiki.org/wiki/Manual:Parser_functions gives a basic example of how this would work; it should be fairly easy to follow.

Related

How do you detect that a visitor changed a value in the query string?

For our last week in school (finals next week) our teacher decided to give us a crash course in Perl. We talked about all the differences we would encounter if we used Perl and then we started talking about "spoofing".
We were given an HTML example where a user could input their first and last names. Of course our example already had Mickey as the first name and Mouse as the last name.
<form action="action_page.php">
First name:<br>
<input type="text" name="firstname" value="Mickey">
<br>
Last name:<br>
<input type="text" name="lastname" value="Mouse">
<br><br>
<input type="submit" value="Submit">
</form>
At the end when you hit submit you were redirected to a new screen that said your first name is Mickey and your last name is Mouse.
Our teacher said "spoofing" is when you change the method = get in the URL so instead of having
firstname=Mickey&lastname=Mouse
you would enter something like
firstname=baseball&lastname=bat
That would instantly alter the intended command and you would end up getting first name as baseball and lastname as bat.
This all sounds pretty straightforward, until he said he wanted us to write a program to prevent spoofing without using a post method.
Instead when a user attempts to spoof the system we're supposed to print out some anti-spoofing comment.
Unfortunately, we never really talked about spoofing aside from the examples. I've attempted to Google spoofing to see some example code, or at least understand this concept, but I haven't had much luck, or I haven't looked in the right places.
So I thought I would ask here. Can someone who is decent at Perl direct me towards basic anti-spoofing programs and content, or at least explain and show how spoofing is supposed to work?
What you need to do is to authenticate the data in the query string, and validate it when you receive it. There is a standard tool(set) for this: a cryptographic Message Authentication Code (MAC).
Basically, a MAC is a function that takes in a message (any arbitrary string) and a secret key, and outputs a random-looking token that depends, in a complicated way, on both the message and the key. Importantly, it is effectively impossible to compute a valid MAC token for a modified message without knowing the key.
To validate a query string (or some other data) with a MAC, you'd basically follow these steps:
Encode the data into a "canonical" form as a string. For an HTTP URL, you could just use the query string (and/or the entire URL) as it is, although you may wish to normalize it e.g. by %-decoding any characters that don't have to be encoded, and normalizing the case of any %-encoded values (e.g. %3f → %3F).
Alternatively, you could decode the query string into, say, an associative array, and serialize this array in a format of your choice. This can make it easier to combine parameters from multiple sources (e.g. hidden form fields), to add extra data fields (see below) and to choose which fields you want to validate.
Optionally, combine the data with any additional information you wish to associate it with, such as a user ID and/or a timestamp. (You can either transmit the timestamp explicitly, or just round it down to, say, the last hour, and check both the current and the previous timestamp when validating it.) Changing any of these values will change the MAC output, thus preventing attackers from e.g. trying to submit one user's data under another user's account.
Store a secret key (preferably, a securely generated random value of, say, 128 bits) on the server. Obviously, this secret key must be stored so that users cannot access it (e.g. by guessing the path to the config file).
Feed the canonically encoded data and the secret key into the MAC algorithm. Take the result and (if your MAC library doesn't do this for you) encode it in some convenient manner (e.g. using the URL-safe Base64 variant).
Append the encoded MAC token as an extra parameter in the URL.
When you receive the data back, remove the MAC token, feed the rest of the data back into the MAC generation code as described above, and check that the resulting MAC matches the one you received.
MAC algorithms can be constructed from cryptographic hash functions like MD5 or SHA-1/2/3. In fact, a basic MAC can be obtained simply by concatenating the secret and the message, hashing them, and using the result as the token.
For some hash functions, like SHA-3, the simple MAC construction described above is actually believed to be secure; for older hash functions, which were not explicitly designed with this use in mind, however, it's safer to use the (slightly) more complicated HMAC construction, which hashes the input twice.
Alternatively, there are also MAC algorithms, such as CMAC, which are based on block ciphers (like AES) instead of hash functions. In some cases (e.g. on embedded platforms, where a fast hash function may not be available) these may be more efficient than HMAC; for a web application, however, the choice is essentially a matter of taste.
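As a rough sketch of the steps above, here is one way the sign-and-verify round trip might look in Python, using the standard hmac module with SHA-256. The key value, the parameter name mac, and the helper names sign_query and verify_query are assumptions made for illustration. The optional timestamp and user ID are left out, and the token is hex-encoded rather than Base64-encoded for brevity.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode

# Hypothetical key; in practice load a securely generated random value
# from server-side configuration that users cannot read.
SECRET_KEY = b"replace-with-a-random-128-bit-key"


def _mac(canonical):
    """Return the hex HMAC-SHA256 token for a canonical query string."""
    return hmac.new(SECRET_KEY, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_query(params):
    """Serialize params in a canonical (sorted) order and append the MAC."""
    canonical = urlencode(sorted(params.items()))
    return canonical + "&mac=" + _mac(canonical)


def verify_query(query_string):
    """Recompute the MAC over everything except 'mac' and compare."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    received = [value for key, value in pairs if key == "mac"]
    if len(received) != 1:
        return False
    data = sorted((key, value) for key, value in pairs if key != "mac")
    expected = _mac(urlencode(data))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("ascii"), received[0].encode("utf-8"))


signed = sign_query({"firstname": "Mickey", "lastname": "Mouse"})
tampered = signed.replace("Mickey", "baseball")
```

Here verify_query(signed) accepts the untouched query string, while verify_query(tampered) rejects it, because the attacker cannot recompute the token without the key.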
One difference between GET and POST is that the information for the former is passed in the URL itself. That means you can type what you like in the browser's address bar -- it doesn't have to have come from an HTML form. I think that's what is meant by spoofing here.
The most obvious protection is to calculate a CRC of all the protected fields -- in this case MickeyMouse -- and put that value in a hidden field of the HTML form sent out by the server. Then, when the request comes back, calculate the CRC of the same fields and check that it matches the value of the returned hidden field.
Of course that can be circumvented if the user works out how the protection functions and adds his own calculation of the CRC of his spoofed data as well. But this should be sufficient for a proof of concept.
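A minimal proof-of-concept sketch of this idea in Python, assuming zlib.crc32 as the CRC and the helper names checksum and is_untampered (both hypothetical). It also illustrates the weakness just described: with no secret involved, anyone can compute a matching CRC for spoofed values.

```python
import zlib


def checksum(firstname, lastname):
    """CRC-32 of the concatenated protected fields, e.g. 'MickeyMouse'."""
    return zlib.crc32((firstname + lastname).encode("utf-8"))


def is_untampered(firstname, lastname, hidden_value):
    """Compare the recomputed CRC against the value from the hidden field."""
    return checksum(firstname, lastname) == hidden_value


# Value the server would place in the hidden form field.
issued = checksum("Mickey", "Mouse")

# A naive spoof changes the names but keeps the original hidden value.
naive_spoof_ok = is_untampered("baseball", "bat", issued)

# A determined spoofer simply recomputes the CRC for the new names.
forged = checksum("baseball", "bat")
forged_spoof_ok = is_untampered("baseball", "bat", forged)
```

The naive spoof is detected, but the forged one passes the check, which is why a keyed MAC is needed outside of a classroom exercise.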
If you want to detect if a user has changed a parameter in the querystring of a url after a form has performed a GET action, then generate a client side hash before the form is submitted. The hash would be based on the values of the form fields, and then compared to a recalculated hash based on the current parameter values on the response page. If the hashes don't match the querystring has been tampered with.
Here's a client side Crypto library to calculate the hashes https://code.google.com/p/crypto-js/
Note this is only for educational use, and wouldn't provide enough security in the real world, as a person could also discover the hashing key by inspecting the page source and use that to generate their own hashes.
A POST method wouldn't prevent spoofing anyway. POST and GET do almost exactly the same thing - they send plain text encoded variables to a web server.
They're insanely easy to "spoof" - the point isn't the spoofing, it's that you shouldn't trust "user input" like that, ever.
I would suggest in the case of the names, it doesn't matter. So what if I fudge your web page to "pretend" I am called "baseball bat" instead?
If it's important, like for example, ensuring I can only see my test results - then you need to handle the data processing server side. One method of doing this is via session tracking - so rather than including a field in a web form, I instead use a "session token".
You would 'send' me a username and password - ideally using a hash to make it impossible to 'see' as you're sending it, or in your browser history. And then I would check it against my server, to check if that hash is 'valid' by performing the same operation on the server, and comparing the two.
So perlishly:
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw ( sha1_base64 );
my ( $firstname, $lastname ) = qw ( Mickey Mouse );
my $timewindow = int ( time / 300 );
my $token = sha1_base64 ( $timewindow.$firstname.$lastname );
print $token;
This produces a token that doesn't last long - it changes every 5 minutes - but it's extremely difficult to tamper with.
The reason for including the time is to avoid replay attacks, whereby if I look in your browser history, I can find "your" token and reuse it. (That's probably the next question after the "spoofing" one though :))
If you sent the parameters with the token, bear in mind that it's actually quite easy for a malicious actor to perform the same calculation themselves, and send some completely faked credentials and tokens.
This is something of a simplistic example though - because really, faked parameters shouldn't matter, because you shouldn't trust them in the first place. If 'Mickey Mouse' is valid, and 'baseball bat' isn't, then your server should detect that when processing the form, and discard the latter, which makes the whole 'form spoofing' thing irrelevant.
The question is rather narrowly phrased, so this answer might not quite address what you're asking. But as a matter of policy, if you don't want your users to tamper with your data you should not give them custody of it. Why are you relying on the query string for the user name if the server already knows it? Rely on the client for authentication and for new information, and rely on your records for any information that should stay beyond the user's control.
POST requests can be crafted almost as easily as GET requests, and cryptographic protection, even when it is secure, is only useful to the extent that the client cannot access the encrypted data; so why transmit it back and forth?

JSON REST endpoint returning / consuming JSON literals

Is it advisable or not in a RESTful web service to use JSON literal values (string / number) as input parameter in the payload or in the response body?
If I have an endpoint PUT /mytodolist is it OK for it to accept a JSON string literal value "Take out the rubbish" in the request payload (with Content-Type=application/json) or should it accept a JSON object instead ({"value":"Take out the rubbish"})?
Similarly, is it fine for GET /mytodolist/1 to return "Take out the rubbish" in the response body or should it return a proper JSON object {"value":"Take out the rubbish"}
Spring MVC makes implementing and testing such endpoints easy; however, clients have flagged this as non-standard or hard to implement. In my point of view JSON literals are JSON, but not JSON objects, so I'd say it is fine. I have found no recommendations using Google.
EDIT 1: Clarification
The question is entirely about the 'standard', if it allows this or not.
I understand the problem with extensibility, but one can never design a fully extensible interface IMHO. If changes need to be made, we can try extending what we have in a backwards compatible way, but there will come a time when it becomes messy and another approach is required - which is commonly handled by versioning the API in one way or another. I find it a fair point, though, because using literals as request/response body immediately becomes inextensible, while coming up with a reasonable one-attribute JSON object does not.
It is also understood that some frameworks have problems with handling JSON literals, this is the origin of this question. The tool I used happened to support this, so I thought this was all right, but the front-end library did not.
Still, what I am intending to find out right now is whether using JSON literals is in line with the de facto standard (even if it is a corner case) or not.
I would recommend always using a JSON object. One reason is that for Content-Type application/json people expect something starting with "{", and not all frameworks will handle JSON literals properly. The second reason is that you will probably add some additional attributes to your list item (due date, category, priority, etc.). At that point, adding a new field to a bare literal would break backward compatibility.
It may be acceptable in the context of your example, but keep in mind that unambiguous interfaces are easier to use and that will encourage adoption.
For example, your interface could interpret "Take out the rubbish" as the same as {task:"take out the rubbish"}, but once you add additional properties (e.g. "when" or "who") the meaning of a solitary string in the request becomes ambiguous. It's inevitable that you'll add support for new properties as your interface matures.
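On the narrower question of what the standard allows: as far as I know, the older RFC 4627 required a JSON text to be an object or an array, while the later RFC 7159 and RFC 8259 permit any JSON value, including a bare string or number, at the top level. The short Python sketch below only illustrates that a standards-following parser accepts both forms; the "value" key is the one from the question, and other parsers or frameworks may behave differently.

```python
import json

# A bare JSON string literal as the whole request or response body.
literal_body = json.dumps("Take out the rubbish")
parsed_literal = json.loads(literal_body)

# The same data wrapped in a one-attribute object, which leaves room
# for adding fields such as a due date later.
object_body = json.dumps({"value": "Take out the rubbish"})
parsed_object = json.loads(object_body)
```

Both bodies parse without error here; the object form simply gives clients a dictionary they can extend, whereas the literal form yields a plain string.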

Why use a JSON object to pass data with POST versus a Query String in Perl?

I'm looking to run a script using Stripe.pm, basically looking to do credit card processing. The credit card number is not being passed at all. All the examples I see use a JSON object passed in a POST call but I have a lot of experience using Query Strings i.e.
http://www.example.com/cgi-bin/processingscript.pl?param1=XXXX&param2=YYYYY&param3=ZZZZZ
Is this a security risk? What is the advantage or disadvantage of posting using JSON versus a query string like I'm used to using?
From a purely technical point of view, there is no difference between POST and GET if you pass a reasonably short parameter. You can also just pass JSON as a GET parameter no problem:
GET foo.pl?json={"foo":"bar"}
It would make sense to url-encode the data in this case. You can also send the same request using POST.
If you do not want to use query params at all, you need POST and put your JSON into the request body. Depending on which option you choose, there are differences in how to deal with it in Perl. Let's say you are using the CGI module... Perl makes no difference between POST and GET params.
For the query string GET or POST, you need to do:
use CGI;
my $cgi = CGI->new;
my $json = $cgi->param('json');
If you put the payload directly into the request body, you will instead need to do:
use CGI;
my $cgi = CGI->new;
my $json = $cgi->param('POSTDATA');
This is documented in CGI under "handling non url-encoded ...".
For JSON, there is also of course the time it takes to parse it, but that should be negligible.
The advantage JSON has over query strings without JSON inside them is that you can encode arbitrarily complex data structures inside JSON, while plain-text query strings are just one level deep.
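To make the difference concrete, here is a small Python sketch (the order structure and its field names are made up for illustration). It URL-encodes a nested JSON document into a single json parameter, decodes it again, and contrasts that with what a plain query string parses to.

```python
import json
from urllib.parse import parse_qs, urlencode

# Hypothetical nested payload; none of these field names come from Stripe.
order = {
    "customer": {"name": "Mickey Mouse"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# JSON carried in one URL-encoded query parameter (works for GET or POST).
query = urlencode({"json": json.dumps(order)})
decoded = json.loads(parse_qs(query)["json"][0])

# A plain query string only yields a flat mapping of names to values.
flat = parse_qs("param1=XXXX&param2=YYYYY")
```

The decoded value is the same nested structure that went in, while the plain query string gives one level of name/value pairs and nothing deeper.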
From a security point of view, pretty much everything has been said. I'll recap my own ideas:
use SSL
do not put sensitive stuff into log files
if you are dealing with CC data (even if it is not the number itself), take extra care; read up on PCI DSS and encrypt stuff during transmission
NEVER store a cvc!
if you want to learn more about that topic, there is a Stack Exchange site called Information Security.
Security
- Making sure no one other than the user and the server knows the data: the choice of GET or POST has no influence on this. Use SSL to ensure it.
- Stopping the user passing "interesting" arguments to your script. POST has a certain amount of security-through-obscurity in that most users won't see the parameters, but it's no real security. You should check the parameters in your application to cover that.
Generally POST with a nice data package (e.g. JSON) makes for a much more flexible and maintainable application interface, and has the advantage that you don't need to worry about encoding and length of parameters the same way you do when using GET.
POST:
Generally safer for important data
Parameters not shown in url
You can pass longer strings and structures (e.g. JSON) than with GET, where URL length limits apply to the parameters you send

How can I safely add user-supplied URLs to my HTML page?

As with any user supplied data, the URLs will need to be escaped and filtered appropriately to avoid all sorts of exploits. I want to be able to
Put user supplied URLs in href attributes. (Bonus points if I don't get screwed if I forget to write the quotes)
...
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
Allow some leeway for the users. I don't want to raise an error just because they forgot to add an http:// or something like that.
Unfortunately, I can't find any "canonical" solution to this sort of problem. The only thing I could find as inspiration is the encodeURI function from JavaScript, but that doesn't help with my second point since it just does simple URL parameter encoding and leaves alone special characters such as : and /.
OWASP provides a list of regular expressions for validating user input, one of which is used for validating URLs. This is as close as you're going to get to a language-neutral, canonical solution.
More likely you'll rely on the URL parsing library of the programming language in use. Or, use a URL parsing regex.
The workflow would be something like:
Verify the supplied string is a well-formed URL.
Provide a default protocol such as http: when no protocol is specified.
Maintain a whitelist of acceptable protocols (http:, https:, ftp:, mailto:, etc.)
The whitelist will be application-specific. For an address-book app the mailto: protocol would be indispensable. It's hard to imagine a use case for the javascript: and data: protocols.
Enforce a maximum URL length - ensures cross-browser URLs and prevents attackers from polluting the page with megabyte-length strings. With any luck your URL-parsing library will do this for you.
Encode a URL string for the usage context. (Escaped for HTML output, escaped for use in an SQL query, etc.).
Forbid malicious URLs such as javascript: stuff or links or evil domain names.
You can utilize the Google Safe Browsing API to check a domain for spyware, spam or other "evilness".
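The workflow above could be sketched roughly as follows in Python, using the standard urllib.parse and html modules. The whitelist contents, the 2000-character limit, and the helper names sanitize_url and href_attribute are assumptions for illustration only; the reputation check against a service such as Safe Browsing is not included.

```python
import html
from urllib.parse import urlsplit

# Application-specific assumptions: adjust both to your own needs.
ALLOWED_SCHEMES = {"http", "https", "ftp", "mailto"}
MAX_URL_LENGTH = 2000


def sanitize_url(raw):
    """Return a cleaned URL string, or None if the input is rejected."""
    url = raw.strip()
    if not url or len(url) > MAX_URL_LENGTH:
        return None
    try:
        parts = urlsplit(url)
        if not parts.scheme:
            # Leeway for users who forgot the protocol.
            url = "http://" + url
            parts = urlsplit(url)
    except ValueError:
        return None  # not a well-formed URL
    if parts.scheme not in ALLOWED_SCHEMES:
        return None  # rejects javascript:, data:, and anything unlisted
    if parts.scheme != "mailto" and not parts.netloc:
        return None  # http(s)/ftp URLs must name a host
    return url


def href_attribute(url):
    """Escape an already sanitized URL for use inside a quoted href."""
    return 'href="' + html.escape(url, quote=True) + '"'
```

For instance, sanitize_url("example.com/page") gains a default http:// prefix, while sanitize_url("javascript:alert(1)") returns None. The escaping step is kept separate because it depends on the output context (HTML here, but it would differ for SQL or JavaScript).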
For the first point, regular attribute encoding works just fine (escape characters into HTML entities). Escaping quotes, the ampersand and brackets is OK if attributes are guaranteed to be quoted. Escaping all other non-alphanumeric characters as well will keep the attribute safe even if it's accidentally unquoted.
The second point is vague and depends on what you want to do. Just remember to use a whitelist approach instead of a blacklist one; it's possible to use HTML entity encoding and other tricks to get around most simple blacklists.

Is use of #attributes in JSON non-standard or standard?

We're adding JSON output to an existing API that outputs XML, to make MobileHTML integration much easier. However, our developer has queried the use of #attributes appearing in the JSON output.
Our original XML looks like:
<markers>
<marker id="11906" latitude="52.226578"
...
and so the JSON comes out as:
callbackname({"marker":[{"#attributes":{"id":"11906","latitude":"52.226578"
....
Our developer has stated:
"Although '#attributes' is legal JSON, it seems to break dot notation, so I can't call data.#attributes. I can call data['#attributes'], so there's a workaround, but it seems safer just to avoid the #-symbol, unless there's a good reason for it."
The XML->JSON(P) conversion is done using:
$xmlObject = simplexml_load_string ($data);
$data = json_encode ($xmlObject);
I want to make our API as easy-to-integrate as possible, and therefore to use standard stuff where possible. But we're using the native PHP json_encode function, so I'd be surprised if it's doing something non-standard.
Is use of #attributes non-standard? Is this basically just the problem that our API is a bit broken in terms of using <marker id..> rather than <marker><id> ?
The JSON standard only specifies what is and isn't valid; it doesn't set convention. There's nothing inherently wrong with using property names which aren't valid Javascript identifiers.
However, as your developer points out, this does make it slightly more awkward to use the result in JS, as it makes it impossible to use dot notation. On the gripping hand, using attributes for simple content is usually seen as "good XML" and you're using default, built-in tools to convert from XML to JSON. I'd tend to consider that a good enough reason to leave it as it is.
If it were me, I'd look at how difficult it would be to implement a custom XML -> JSON converter. If it's simple and straightforward, go that route and avoid #attributes (it will also likely make your JSON smaller and simpler). If it's too much hassle, however, missing out on dot notation isn't the end of the world. At worst, var attr = data.marker["#attributes"]; will get around the issue.
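To give a feel for how small such a custom converter can be, here is a sketch in Python rather than PHP, using xml.etree.ElementTree. The sample XML is a hypothetical, completed version of the truncated snippet in the question (only the two attributes shown there), and markers_to_json is an illustrative name, not part of any existing API.

```python
import json
import xml.etree.ElementTree as ET


def markers_to_json(xml_text):
    """Flatten each <marker> element's attributes into a plain JSON object."""
    root = ET.fromstring(xml_text)
    # Copy attributes directly so no "#attributes" wrapper key is produced.
    markers = [dict(marker.attrib) for marker in root.findall("marker")]
    return json.dumps({"marker": markers})


# Hypothetical sample; the real feed has more attributes and markers.
sample = '<markers><marker id="11906" latitude="52.226578"/></markers>'
result = markers_to_json(sample)
```

With this shape a JavaScript client could read data.marker[0].id using dot notation, at the cost of maintaining the conversion code yourself.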