HTMLForm's default action - mediawiki

While doing Code Review on Wikimedia Gerrit, I stumbled across comments saying:
$htmlForm->setAction( wfScript() );
Reviewer: not needed, wfScript() is the default for the action.
So I consulted the documentation about HTMLForm::setAction (huge page).
Set the value for the action attribute of the form.
When set to false (which is the default state), the set title is used.
However, what I do not understand is how wfScript() could be derived from the title (an instance of Title?). Its documentation reads: "Get the path to a specified script file, respecting file extensions; this is a wrapper around $wgScriptPath etc. except for 'index' and 'load' which use $wgScript/$wgLoadScript".
This doesn't make any sense to me as wfScript() returns an entry point and all Titles usually share the same entry point.
Looking up HTMLForm::getAction, I see that the code really does use the Title, but only conditionally. Put simply, if Title::getLocalURL would return a URL containing a query string, e.g. /mw/index.php?title=Special:Contributions, wfScript() is returned and the title isn't used at all, contrary to what is documented for HTMLForm::setAction(). The rationale is clear: browsers may strip or amend the query string, which is unwanted here.
Why isn't the hidden form field approach always used and why does the Title have to know about its entry point?
How is $this->getConfig()->get( 'ArticlePath' ) related to $this->getTitle()->getLocalURL()? (The former is used as a condition in, and the latter is possibly returned from, HTMLForm::getAction.)

I'm not totally sure I understand your question, so if this answer doesn't really answer your questions, feel free to comment on it and I'll try to fix my answer :)
Why isn't the hidden form field approach always used and why does the Title have to know about its entry point?
Why should it? It would be possible, yes, but the only reason to use the hidden field is that browsers drop the query string of the form's action attribute when the form is submitted via GET. Other values (such as short URLs) work fine. The other aspect is this: if you configure short URLs (e.g. yourdomain.com/wiki/Special:UserLogin instead of yourdomain.com/w/index.php?title=Special:UserLogin), why should HTMLForm use
yourdomain.com/w/index.php?title=Special:UserLogin&wpusername=test&wppassword=123 (bad example, because UserLogin doesn't use HTMLForm and wouldn't use GET, but think about any other example :P) instead of the (for the user) nicer yourdomain.com/wiki/Special:UserLogin?wpusername=test&wppassword=123? So, iirc, there is no real technical reason not to always use the hidden title field.
How is $this->getConfig()->get( 'ArticlePath' ) related to $this->getTitle()->getLocalURL()
The wgArticlePath configuration variable specifies the base URL for article links. When you call getLocalURL on a Title object without a query, that config variable is used to build the URL (see the code of getLocalURL for the details). In other words, it determines the shape of the links this function returns (e.g. /w/index.php?title=$1 or /wiki/$1). That makes it the deciding condition in HTMLForm: ArticlePath determines whether Title::getLocalURL() returns a URL containing a question mark, so it also decides whether wfScript() or the local URL (from the Title object) is used as the action.
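To make the decision concrete, here is a rough model of it in Python. This is only a sketch of the rule as described in this thread, not MediaWiki's actual code: the function names, the default script path /w/index.php, and the use of None for "no action set" are all assumptions made for illustration.

```python
from typing import Optional


def local_url(article_path: str, title: str) -> str:
    """Rough model of Title::getLocalURL() without a query: fill in $1."""
    return article_path.replace("$1", title)


def form_action(article_path: str, title: str,
                script: str = "/w/index.php",  # hypothetical wfScript() result
                explicit_action: Optional[str] = None) -> str:
    """Choose the form's action attribute following the rule described above."""
    if explicit_action is not None:
        # An action was set explicitly (setAction()), so it wins.
        return explicit_action
    if "?" in article_path:
        # The title's URL would contain a query string, which the browser may
        # drop on submit. Fall back to the bare entry point; the title then
        # has to travel in a hidden form field instead.
        return script
    return local_url(article_path, title)


print(form_action("/wiki/$1", "Special:Contributions"))
print(form_action("/w/index.php?title=$1", "Special:Contributions"))
```

With a short-URL ArticlePath the title's own URL is used; with the index.php?title=$1 form the entry point is returned instead.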
I hope that helps a bit to understand what HTMLForm does, if not, feel free to comment :)

Related

Replace iframe.src attribute with javascript comment holding value

I am looking for a way to replace the content of the src attribute for an iframe with a dummy variant containing the original src value (but will not actually fetch anything). I am loading the html code via Ajax so I can change the src-attribute before the code is injected into the DOM - so I don't need help with that part. What I would appreciate feedback on is what to put in the src attribute. There is a related post here discussing what can go in the src attribute, but in contrast to this post, I want to store data (namely the original src value) so that I can extract it later. It seems the alternatives are:
src="javascript:/*http://originalsrcvalue.com*/"
src="about:blank/*http://originalsrcvalue.com*/"
src="#http://originalsrcvalue.com"
I am leaning towards the last variant using bookmarks. I'm looking for feedback on potential problems or cross-browser issues that might arise or suggestions for alternative solutions.
Edit: One way of addressing the problem is to use custom attributes - and this is probably what I'll end up using in this specific case. However, I would also like feedback on ways to store data in src attributes in the fashion shown above.
You could store the actual URL in a data-your-data-name attribute and fetch it with JavaScript when you need it, either with element.getAttribute('data-your-data-name') or, if you don't care much about IE users, with element.dataset.yourDataName.
References:
https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Using_data_attributes
https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/dataset

Hide the text input field portion in a form's GET query url when the value is an empty string

e.g.: http://127.0.0.1:8000/database/?reference_doi=&submit=Submit
I know it appears to be standard HTML behaviour, but is there a tag to switch it off, so that the empty text input does not appear in the query URL?
Or alternatively, since I'm using Django, I tried doing the following in my view.
request_get_copy = request.GET.copy()
# Iterate over a snapshot so that keys can be removed inside the loop.
for key, value in list(request_get_copy.items()):
    if not value or key == 'submit':
        request_get_copy.pop(key)
request.GET = request_get_copy
request.META['QUERY_STRING'] = request_get_copy.urlencode()
I displayed request.GET and request.META['QUERY_STRING'] in the actual page through my template, along with the results of several methods of the request object, and they all gave successfully "corrected" values, like http://127.0.0.1:8000/database/. But since the GET request first goes through the browser, the displayed URL still contains the empty-string parts. Is there anything I can do?
The easiest thing you could do is to issue a redirect to your fixed URL:
query = request_get_copy.urlencode()
fixed_url = request.path + ('?' + query if query else '')
return redirect(fixed_url)
Even better if you do that only if it actually changed, and before any DB access or heavy work.
This means an additional GET, but gives you the result you want, and I guess that's more valuable to you :)
If JavaScript is an option, you could also make these changes before the submit actually happens. It's a tad more convoluted but avoids the extra request.
Edit: Just to be clear, there's no way to "turn it off", you could say this is how HTTP and browsers work :)
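As a rough illustration of the "redirect only if it actually changed" idea, the cleaning step can be written with the standard library alone, so it can be tried outside Django. The helper names cleaned_query and redirect_target are made up for this sketch, and dropping the submit key mirrors the view code in the question.

```python
from urllib.parse import parse_qsl, urlencode


def cleaned_query(query_string, drop_keys=("submit",)):
    """Return the query string without empty values and without drop_keys."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs
            if value and key not in drop_keys]
    return urlencode(kept)


def redirect_target(path, query_string):
    """Return the URL to redirect to, or None if the URL is already clean."""
    cleaned = cleaned_query(query_string)
    if cleaned == query_string:
        # Nothing to fix, so skip the extra round trip (and avoid a loop).
        return None
    return path + ("?" + cleaned if cleaned else "")


print(redirect_target("/database/", "reference_doi=&submit=Submit"))
print(redirect_target("/database/", "reference_doi=10.1000%2F182"))
```

In a Django view this would presumably be called at the very top with request.path and request.META['QUERY_STRING'], returning redirect(target) whenever the result is not None.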

How to expand a json for a specified website, based on the url

This might be a bit of a confounded question, but please bear with me:
If I were on a site, wanting to read comments through the json, as with this particular site, how would I expand this particular site such that I can see more than 10 comments? Currently, the ending to the url looks like /?content_id=60902841-c238-364c-92f0-68e8b4dce996&_device=full&count=10&sortBy=highestRated&isNext=true&offset=10&pageNumber=1&_media.modules.content_comments.switches._enable_view_others=1&_media.modules.content_comments.switches._enable_mutecommenter=1&enable_collapsed_comment=1.
I tried changing the pageNumber to a higher number and got the same results. I also tried changing &count=10 to &count=50, which doesn't work either.
Thanks!
This varies case by case from website to website. Some websites let you expand the result set through a parameter at the end of their URL, as in:
https://somesite.com/search?page=1&resultcount=100
where you would change the parameter of resultcount to a higher value. Some sites cap the value at arbitrary values, and some don't have this parameter.
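If you want to try different values without editing the URL by hand, a small helper can rewrite individual query parameters. This is just a sketch: with_params is a made-up name, the URL is the placeholder from above, and whether the site honours the new value is still entirely up to the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url, **overrides):
    """Return url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    # A dict keeps one value per key, which is enough for simple URLs
    # (repeated keys would be collapsed to their last value).
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update({key: str(value) for key, value in overrides.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_params("https://somesite.com/search?page=1&resultcount=10",
                  resultcount=100))
```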

Capybara won't find a button by its "name" attribute

A W3C-validated HTML 5 web page contains this working, simple button inside a login form.
<input data-disable-with="Signing in, please wait&hellip;"
name="commit" type="submit" value="Sign in" />
I'm writing a largely pointless test :-) in a Rails 3.2.17 application that's just to get the hang of Capybara and I've already got completely stuck Googling, reading documentation and reading source code to the test framework, with no joy - attempting to find this button by its name (i.e. "commit") fails.
click_button("commit")
find_button("commit")
Both result in Capybara::ElementNotFound: Unable to find button "commit". If I use the visible button text of Sign in then the element is found, i.e. these:
click_button("Sign in")
find_button("Sign in")
...both work fine, so it would appear that the XML parser isn't having any trouble finding the element.
Documentation for click_button says that the locator works on "id, text or value", with "text" being meaningless for an input element like this (the visible text is taken from the value attribute), but relevant perhaps for button elements. So, we might expect that to fail, though if we view the code via the documentation, we find that it calls down to find in the same way as find_button. Yet find_button is documented differently; it says it locates by "id, name or value". So sadly, we know from this that the documentation is broken because it says two different things for what turns out to be an identical call at the back end.
Either way, the element isn't found by name, and that means the lower level find call isn't searching name attributes as far as I can see. This means Capybara (2.2.1, on Nokogiri 1.6.1) is rather broken in that respect. How come nobody has noticed? I've Googled for ages and it doesn't seem to come up. I seem to be rather missing the point :-)
Why don't I just search for the English text in the button, you might ask? Because of internationalisation. This old, Rails 1 -> 2 -> 3 upgraded app has some I18n parts and other static text parts. I don't want to be forced to put I18n into any view that Capybara tests, just so I can have the test use I18n.t() to ensure a match despite different languages or locale file updates. Likewise, it would clearly be very stupid in 2014 to write hard-coded English strings into my tests.
That's why we have names and IDs and such... The unique (in theory!) identifiers that are machine-read, not human-read.
I could hack up something that CSS-selected by "type=submit" but seriously, why isn't Capybara searching the name attribute when its documentation says it does, and why does the documentation disagree on what attributes are searched on two methods that call down to exactly the same back-end implementation with exactly the same parameters?
TIA :)
It turns out the docs are misleading for both calls, as neither looks at the attributes listed. It's also clearly very confusing what exactly a "button" means, since a couple of people herein seemed to think it literally only meant an HTML button element, but that's not the case.
If you view the source for the documentation of, say, click_button:
https://github.com/jnicklas/capybara/blob/a94dfbc4d07dcfe53bbea334f7f47f584737a0c0/lib/capybara/node/actions.rb#L36
...you will see that this just calls (as I've mentioned elsewhere) to find with a type of :button, which in turn passes through to Capybara's Query engine which, in turn, ends up just using the standard internal selection mechanism to find things. It's quite elegant; in the same way that an external client can add their own custom selectors to make finding things more convenient:
http://rubydoc.info/github/jnicklas/capybara/master/Capybara#add_selector-class_method
...so Capybara adds its own selectors internally, including, importantly, :button:
https://github.com/jnicklas/capybara/blob/a94dfbc4d07dcfe53bbea334f7f47f584737a0c0/lib/capybara/selector.rb#L133
It's not done by any special case magic, just some predefined custom selectors. Thus, if you've been wondering what custom selectors are available from the get-go in Capybara, that's the file to read (it's probably buried in the docs too but I've not found the list myself yet).
Here, we see that the button code is actually calling XPath::HTML.button, which is a different chunk of code in a different repository, with this documentation:
http://rdoc.info/github/jnicklas/xpath/XPath/HTML#button-instance_method
...which is at the time of writing slightly out of date with respect to the code, since the code shows quite a lot more stuff being recognised, including input types of reset and button (i.e. <input type="button"...> rather than <button...>...</button>, though the latter is also included of course).
https://github.com/jnicklas/xpath/blob/59badfa50d645ac64c70fc6a0c2f7fe826999a1f/lib/xpath/html.rb#L22
We can also see in this code that the finder method really only finds by id, value and title - i.e. not by "text" and not by name either.
So assuming XPath is behaving as intended, though it's not clear from docs, we can see that Capybara isn't documenting itself correctly but probably ought to make the link down to XPath APIs for more information, to avoid the current duplication of information and the problems this can cause for both maintainers and API clients.
In the mean time, I've filed this issue:
https://github.com/jnicklas/capybara/issues/1267
You can also use css selectors which are default capybara locators. People say they are faster.
find('[name=commit]').click
Capybara does not look at the name attribute in its finders :(
You can use xpath selector if you want
find(:xpath, "//input[contains(@name, 'commit')]").click()
If anyone wants to, it is quite easy to add a find-by-name selector. To do so:
Add the following code to test/test_helper.rb (for minitest):
Capybara.add_selector(:name) do
  xpath { |name| XPath.descendant[XPath.attr(:name).contains(name)] }
end
Use it
Now in your tests you can use the following selector:
find(:name, 'part_of_the_name_attribute')
It will find every element whose name attribute contains the searched value.
Example
find(:name, 'user')
This will find elements (element could be of any type):
<select name='user_name'>
<input name='name_of_user'>
<textarea name='some_user_info'>
You can use this selector to find a button on a page with RSpec and Capybara:
expect(page).to have_selector(:link_or_button, "Button text")
Check your gem dependencies. RSpec 3 or higher works with gem 'rspec-rails', '~> 3.7.1'; the Capybara version must then be gem 'capybara', '~> 2.18.0' and Poltergeist should be gem 'poltergeist', '~> 1.17.0'.

REST/Ajax deep linking compatibility - Anchor tags vs query string

So I'm working on a web app, and I want to filter search results.
A nice restful implementation might look like this:
1. mysite.com/clothes/men/hats+scarfs
But let's say we want to ajax up the filtering, like the cool kids, and we want to retain deep linking. We might use the anchor tag and parse that with JavaScript to show the correct listings:
2. mysite.com/clothes#/men/hats+scarfs
However, if someone clicks the first link with JS enabled, and then changes filters, we might get:
3. mysite.com/clothes/men/hats+scarfs#/women/shoes
Urk.
Similarly, if someone does not have JS enabled, and clicks link 2 - JS will not parse the options and the correct listings will not be shown.
Are Ajax deep links and non-Ajax links incompatible? It would seem so, as servers cannot parse the # part of a url, since it is not sent to the server.
There's a monkeywrench being thrown into this issue by Google: A proposal for making Ajax crawlable. Google is including recommendations for url structure there that may give you ideas for your own application.
Here's the wrapup:
In summary, starting with a stateful URL such as http://example.com/dictionary.html#AJAX, it could be available to both crawlers and users as http://example.com/dictionary.html#!AJAX which could be crawled as http://example.com/dictionary.html?_escaped_fragment_=AJAX which in turn would be shown to users and accessed as http://example.com/dictionary.html#!AJAX
View Google's Presentation here (note: google docs presentation)
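The mapping in that summary is mechanical, so it can be sketched in a few lines. This is a simplified illustration of the #! to _escaped_fragment_ rewrite only: the function name crawlable_url is made up, and the exact set of characters the proposal requires to be escaped is an assumption here (everything except "=" and the usual unreserved characters gets percent-encoded).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def crawlable_url(url):
    """Map a '#!' URL to its '_escaped_fragment_' form; leave others alone."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # Plain '#fragment' URLs are not part of the scheme.
        return url
    # Drop the leading '!' and percent-encode the remaining state.
    extra = "_escaped_fragment_=" + quote(parts.fragment[1:], safe="=")
    # Append to an existing query string if there is one.
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(crawlable_url("http://example.com/dictionary.html#!AJAX"))
```

The server would then look for the _escaped_fragment_ parameter and render the matching state as static HTML.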
In general I think it's useful to simply turn off JavaScript and CSS entirely and browse your website and web application and see what ends up getting exposed. Once you get a sense of what's visible, you will understand what most search engines see and that in turn will show you what is and is not getting spidered.
If you go to mysite.com/clothes/men/hats+scarfs with JavaScript enabled, then your JavaScript should automatically rewrite that to mysite.com/clothes#men/hats+scarfs. When you click on a filter, the click should be handled by JavaScript, meaning you'll only change the hash rather than the entire URL (as you're going to have to return false anyway).
The problem you have is with non-JS users going to your JS-enabled deeplinks, as the server can't determine that stuff. Unfortunately, the only thing you can do is take them to mysite.com/clothes and make them start their journey again (as far as I'm aware). You'll need to try and ensure that when people link to the site, they use the hardcoded deeplink rather than the hashed deeplink.
I don't recommend ever using the query string, as you are sending data back to the server without direct relevance to the previously specified destination. That is a potential security hole, as malicious code can be manually added to the query string to attempt an XSS or buffer overflow attack on your webserver.
I believe REST was intended to work with absolute URIs without a query string, because then you're specifying only the location of a resource, and it is that location that is descriptive and semantically relevant, in addition to the possibility of the resource being equally relevant. Even if there is no resource at the specified path, you have still instantiated a potentially unique and descriptive location that can be processed accordingly.
Users entering the site via deep links
Nonsensical links (like /clothes/men/hats#women/shoes) can be avoided if you construct your Ajax initialisation code in such a way that users who enter the site on filtered pages (e.g. /clothes/women/shoes) are taken to the /clothes page before any Ajax filtering happens. For example, you might do something like this (using jQuery):
$("a.filter")
    .each(function() {
        var href = $(this).attr("href").replace("/clothes/", "/clothes#");
        $(this).attr("href", href);
    })
    .click(function() {
        update_filter($(this).attr("href").split("#")[1]);
    });
Users without JavaScript
As you said in the question, there's no way for the server to know about the URL fragment so filtering would not be applied for users without JavaScript enabled if they were given a link to /clothes#filter.
However, even without filtering, these links could be made more meaningful for non-JS users by using the filter strings as IDs in your /clothes page. To prevent this messing with the Ajax experience the IDs would need to be changed (or the elements removed) with JavaScript before the Ajax links were initialised.
How practical this is depends on how many categories you have and what your /clothes page contains.