Delphi7 Indy HTTPServer not getting form parameters - html

I have an Indy 10 IdHTTPServer in a Windows application which serves a virtual HTML form with two text boxes and a submit button. When the button is pressed in the browser I am not seeing any form params returned to the server.
Note that this is a bit of proof-of-concept code which will be used to make a Windows service respond to button presses in a web form.
The HTML form is like this:
<form action="http://<addressofsite>/" method="post">
First name:<br>
<input type="text" name="firstname" value="Mickey"><br>
Last name:<br>
<input type="text" name="lastname" value="Mouse"><br><br>
<input type="submit" value="Submit">
</form>
In the Delphi code I have this:
procedure TForm1.HTTPServer1CommandGet(AThread: TIdPeerThread;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
begin
  ...
  if ARequestInfo.Command = 'POST' then
  begin
    {******* POSTS ***************}
    Memo1.Text := ARequestInfo.RawHTTPCommand;
  end;
end;
I have tried various bits of the ARequestInfo structure, but whatever I try, all I see when the button is pressed in the browser is:
POST / HTTP/1.1
No params appear to be passed.
I'm obviously doing something wrong, so please can someone point out my idiocy.
Update:
As pointed out by The Arioch below, I should have checked that the browser is actually sending the data - so using Chrome developer tools I examined the headers, the results of which are:
Response Headers
Connection:close
Content-Type:text/html
Server:Indy/10.0.52
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, br
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Authorization:Basic YWRtaW46cGFzcw==
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:31
Content-Type:application/x-www-form-urlencoded
Host:127.0.0.1:8091
Origin:http://127.0.0.1:8091
Referer:http://127.0.0.1:8091/main
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
Form Data
firstname:Mickey
lastname:Mouse
So the browser is definitely sending the form data.

The raw encoded form data is stored in the ARequestInfo.FormParams and ARequestInfo.UnparsedParams properties.
If TIdHTTPServer.ParseParams is true (which it is by default), the decoded form data is stored in the ARequestInfo.Params property, eg:
procedure TForm1.HTTPServer1CommandGet(AThread: TIdPeerThread;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  FirstName, LastName: string;
begin
  ...
  if (ARequestInfo.CommandType = hcPOST) and
     IsHeaderMediaType(ARequestInfo.ContentType, 'application/x-www-form-urlencoded') then
  begin
    FirstName := ARequestInfo.Params.Values['firstname'];
    LastName := ARequestInfo.Params.Values['lastname'];
    ...
  end;
end;
Note that TIdHTTPServer is a multi-threaded component. The various events, including OnCommandGet, are fired in the context of worker threads. So, if you need to touch UI controls, like your TMemo, you must synchronize with the main UI thread, eg:
procedure TForm1.HTTPServer1CommandGet(AThread: TIdPeerThread;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
begin
  ...
  if (ARequestInfo.CommandType = hcPOST) and
     IsHeaderMediaType(ARequestInfo.ContentType, 'application/x-www-form-urlencoded') then
  begin
    TThread.Synchronize(nil,
      procedure
      begin
        Memo1.Text := ARequestInfo.Params.Text;
      end
    );
    ...
  end;
end;
Also, 10.0.52 is an outdated version of Indy. The current version (at the time of this writing) is 10.6.2.5384.

Related

Timeout Error - DHL API to Google Sheets - UrlFetchApp

In Python, I use as headers the "Request Headers" from the request captured with the browser's developer tools, and it works fine.
I tried the same with Apps Script, but UrlFetchApp throws a Timeout exception:
function WS() {
  var myHeaders = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,es;q=0.8,pt;q=0.7',
    'cookie': '', // the cookies that appear here in my browser
    'referer': 'https://www.dhl.com/global-en/home/tracking/tracking-express.html?submit=1&tracking-id=4045339815',
    'sec-ch-ua': '"Microsoft Edge";v="105", "Not)A;Brand";v="8", "Chromium";v="105"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.53',
    'x-sec-clge-req-type': 'ajax',
  };
  var options = {
    'method': 'GET',
    'headers': myHeaders,
  };
  var response = UrlFetchApp.fetch("https://www.dhl.com/utapi?trackingNumber=4045339815&language=en&source=tt", options);
  Logger.log(response.getContentText());
}
I would appreciate any ideas or hints.
EDIT:
Website used to capture the cookies:
https://www.dhl.com/global-en/home/tracking/tracking-express.html?submit=1&tracking-id=4045339815
I think the problem is most likely the user-agent header. Apps Script's URL Fetch Service uses Google's servers to send the request instead of your browser. As a result, Apps Script forces its own user agent that looks like this:
"User-Agent": "Mozilla/5.0 (compatible; Google-Apps-Script; beanserver; +https://script.google.com; id: ...)"
On the other hand, Python sends the headers exactly as you specified them. You can test this yourself by sending your requests to a test server like https://httpbin.org/headers. The only difference between the Python and Apps Script requests is the user-agent header.
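To make that comparison concrete, here is a minimal Python sketch along the lines of the test described above (the header values are placeholders, not the exact ones from the capture):
import requests

# httpbin.org/headers echoes back the headers it received as JSON,
# so you can see exactly what arrives at the server.
headers = {
    'accept': '*/*',
    'user-agent': 'my-custom-agent/1.0',  # placeholder; use the browser UA you captured
}

response = requests.get('https://httpbin.org/headers', headers=headers)
# From Python, the echoed 'User-Agent' matches what you set; from Apps Script's
# UrlFetchApp it comes back as the forced Google-Apps-Script user agent.
print(response.json()['headers'])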
It doesn't look like there's a way to bypass this. There's a request in Google's issue tracker here to allow customization of the user agent but it's been open since 2013 so it doesn't seem like something they want to do, maybe for transparency reasons or something similar.
The reason this header would be a problem is that DHL doesn't want you to use their user-facing endpoints to request information with scripts, though you probably already know this since you're trying to replicate the browser's headers and cookies. Trying to access the endpoint without the right headers just results in an error response.
My guess is that DHL has blacklisted the Apps Script user agent, hence the timeout. If you want to use Apps Script you probably will have to go to https://developer.dhl and set up a developer account to get your own API key. If you want to keep using your current method then you'll have to stick to Python or anything else that won't change your headers.
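For completeness, here is a rough Python sketch of what a call to DHL's official tracking API might look like once you have a key; the endpoint URL and the DHL-API-Key header name are assumptions from memory of their documentation, so verify them at https://developer.dhl before relying on this:
import requests

# Hypothetical call to DHL's official shipment tracking API.
# The endpoint and header name below are assumptions -- check
# https://developer.dhl for the real contract.
API_KEY = 'your-api-key-here'

response = requests.get(
    'https://api-eu.dhl.com/track/shipments',
    params={'trackingNumber': '4045339815'},
    headers={'DHL-API-Key': API_KEY, 'Accept': 'application/json'},
)
print(response.status_code)
print(response.json())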
Edit:
Here's a quick Python sample that seems to support the theory:
import requests

# Chrome user agent, this works
useragent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.53'
# No user agent, this also works
#useragent = ''
# Fake user agent, this still works
#useragent = 'Mozilla/5.0 (compatible; Googlu-Opps-Script)'
# Apps Script user agent, this just hangs
#useragent = 'Mozilla/5.0 (compatible; Google-Apps-Script)'

headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,es;q=0.8,pt;q=0.7',
    'cookie': 'your-cookie',
    'referer': 'https://www.dhl.com/global-en/home/tracking/tracking-express.html?submit=1&tracking-id=4045339815',
    'sec-ch-ua': '"Microsoft Edge";v="105", "Not)A;Brand";v="8", "Chromium";v="105"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': useragent,
    'x-sec-clge-req-type': 'ajax',
}

url = "https://www.dhl.com/utapi?trackingNumber=4045339815&language=en&source=tt"
result = requests.get(url, headers=headers)
print(result.content.decode())
Based on my testing in Python, even a blank or fake user agent will work, but one that has Google-Apps-Script will just keep hanging. Even changing a single letter to Google-Opps-Script or something similar will make it work.

How to get div with multiple classes BS4

What is the most efficient way to get divs with BeautifulSoup4 if they have multiple classes?
I have an html structure like this:
<div class='class1 class2 class3 class4'>
  <div class='class5 class6 class7'>
    <div class='comment class14 class15'>
      <div class='date class20 showdate'> 1/10/2017</div>
      <p>comment2</p>
    </div>
    <div class='comment class25 class9'>
      <div class='date class20 showdate'> 7/10/2017</div>
      <p>comment1</p>
    </div>
  </div>
</div>
I want to get the divs with the comment class. Usually there is no problem with nested classes, but I don't know why the command:
html = BeautifulSoup(content, "html.parser")
comments = html.find_all("div", {"class":"comment"})
doesn't work. It gives an empty array.
I guess this happens because there are a lot of classes, so it looks for divs with only the comment class, and such a div doesn't exist. How can I find all the comments?
Apparently, the URL that fetches the comments section is different from the original URL that retrieves the main contents.
This is the original URL you gave:
http://community.sparknotes.com/2017/10/06/find-out-your-colleges-secret-mantra-we-hack-college-life-at-the-100-of-the-best
Behind the scenes, if you record the network log in the network tab of Chrome's developer menu, you'll see a list of all the URLs requested by the browser. Most of them are for fetching images and scripts. A few relate to other sites such as Facebook or Google (for analytics, etc.). The browser sends another request to this particular site (sparknotes), which gives you the comments section. This is the URL:
http://community.sparknotes.com/commentlist?post_id=1375724&page=1&comment_type=&_=1507467541548
The value for post_id can be found in the web page returned when we request the first URL. It is contained in an input tag of type hidden.
<input type="hidden" id="postid" name="postid" value="1375724">
You can extract this info from the first web page using a simple soup.find('input', {'id': 'postid'})['value']. Of course, since this identifies the post uniquely, you need not worry about its changing dynamically on each request.
I couldn't find the '1507467541548' value passed to the '_' parameter (the last parameter of the URL) anywhere in the main page or in the cookies set by the response headers of any of the pages. (It looks like a cache-busting timestamp appended by the page's JavaScript, which would explain why it can be dropped.)
However, I went out on a limb and tried fetching the URL without the '_' parameter, and it worked.
So, here's the entire script that worked for me:
from bs4 import BeautifulSoup
import requests

req_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'community.sparknotes.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

with requests.Session() as s:
    url = 'http://community.sparknotes.com/2017/10/06/find-out-your-colleges-secret-mantra-we-hack-college-life-at-the-100-of-the-best'
    r = s.get(url, headers=req_headers)
    soup = BeautifulSoup(r.content, 'lxml')
    post_id = soup.find('input', {'id': 'postid'})['value']

    # url = 'http://community.sparknotes.com/commentlist?post_id=1375724&page=1&comment_type=&_=1507467541548'  # the original URL found in the network tab
    url = 'http://community.sparknotes.com/commentlist?post_id={}&page=1&comment_type='.format(post_id)  # modified by removing the '_' parameter
    r = s.get(url)
    soup = BeautifulSoup(r.content, 'lxml')

    comments = soup.findAll('div', {'class': 'commentCite'})
    for comment in comments:
        c_name = comment.div.a.text.strip()
        c_date_text = comment.find('div', {'class': 'commentBodyInner'}).text.strip()
        print(c_name, c_date_text)
As you can see, I haven't used headers for the second requests.get, so I'm not sure whether they're required at all. You can experiment with omitting them in the first request as well. But make sure you use requests, as I haven't tried urllib; cookies might play a vital role here.

Send data to Delphi server

I'm making a Delphi XE5 VCL Forms application with a TIdHTTPServer on the main form and the following OnCommandGet handler:
procedure TForm1.IdHTTPServerCommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  pageContent: TStringList;
begin
  if Pos('&command=add', ARequestInfo.UnparsedParams) > 0 then
  begin
    pageContent := TStringList.Create;
    try
      pageContent.Add('<html>');
      pageContent.Add('<head>');
      pageContent.Add('<title>Profile</title>');
      pageContent.Add('</head>');
      pageContent.Add('<body>');
      pageContent.Add('<input id="subjects" type="text"/>');
      pageContent.Add('<input id="Add" type="button" onclick="sendData()"/>');
      pageContent.Add('</body>');
      pageContent.Add('</html>');
      AResponseInfo.ContentText := pageContent.Text;
    finally
      pageContent.Free;
    end;
  end;
end;
My question is how the user input is sent to the server when the user clicks the 'Add' button.
With this HTML, the client (web browser) will not send any data because there is no HTML form element present.

Plupload crossdomain upload 200 http error

I would like to upload files to a remote server using the plupload library. Everything works with Chrome (32.0) and IE 10 using the html5 runtime, but when I try with Firefox 27 (html5 runtime) or IE 8 (html4 runtime) I get the error Error #-200: HTTP Error.
Client-side script:
$(function() {
  var uploader = new plupload.Uploader({
    browse_button: 'browse',
    url: 'https://remote.com/API/action.php',
    runtimes: 'html5,flash,silverlight,html4',
    flash_swf_url: './js/Moxie.swf',
    silverlight_xap_url: './js/Moxie.xap'
  });
  uploader.init();
  uploader.settings.multipart_params = {
    [...]
  };
  // PreInit events, bound before any internal events
  uploader.bind('init', function(up, info) {
    console.log('[Init]', 'Info:', info, 'Features:', up.features);
    alert(info['runtime']);
  });
  uploader.bind('Error', function(up, err) {
    document.getElementById('console').innerHTML += "\nError #" + err.code + ": " + err.message;
  });
  document.getElementById('start-upload').onclick = function() {
    uploader.start();
  };
});
First request with Chrome :
Request URL:https://remote.com/API/action.php
Request Method:OPTIONS
Status Code:200 OK
Second request with Chrome :
Request URL:https://remote.com/API/action.php
Request Method:POST
Status Code:200 OK
Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Access-Control-Request-Headers:content-type
Access-Control-Request-Method:POST
Cache-Control:no-cache
Connection:keep-alive
Host:hipt.ucc.ie
Origin:http://server.com
Pragma:no-cache
Referer: XXX
User-Agent:Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36
Response Headers
Access-Control-Allow-Headers:Content-Type, Authorization, X-Requested-With
Access-Control-Allow-Methods:GET, PUT, POST, DELETE, OPTIONS
Access-Control-Allow-Origin:*
Access-Control-Max-Age:1000
Cache-Control:no-cache
Connection:close
Content-Length:5
Content-Type:text/html; charset=UTF-8
Date:Mon, 24 Feb 2014 11:57:54 GMT
Server:Apache/2.2.3 (CentOS)
X-Powered-By:PHP/5.1.6
Server-side script:
<?php
header('Access-Control-Allow-Origin: *');
header('Access-Control-Allow-Methods: GET, PUT, POST, DELETE, OPTIONS');
header('Cache-Control: no-cache');
header('Access-Control-Max-Age: 1000');
header('Access-Control-Allow-Headers: Content-Type, Authorization, X-Requested-With');
if (!empty($_FILES)) {
With Firefox, the response to the OPTIONS request is empty and there is no following POST request.
I cannot figure out why it is not working with Firefox and IE 8.
Thanks for your help.
[EDIT] I just tried with the flash runtime: same thing, it works with Chrome and IE 10 but not with Firefox and IE 8. The weird thing is that the alert(info['runtime']); does not appear, but there is no JavaScript error in the console...
OK, so I finally found out why it wasn't working. I checked using Wireshark and noticed that there was an encrypted alert.
I then checked the certificate of the remote server using http://www.sslshopper.com/ssl-checker.html and got this answer:
The certificate is not trusted in all web browsers. You may need to install an Intermediate/chain certificate to link it to a trusted root certificate. Learn more about this error. The fastest way to fix this problem is to contact your SSL provider.
I had to add an exception and it finally worked \o/
This error is also raised when you get a server-side 500 error. For example, if you have a syntax error in your program (or a fatal run-time error).
I had the same problem, but I clearly knew that the problem was the CSRF token.
The solution is the following:
HTML:
<HTML>
<body>
  <p>your template content</p>
  <!-- upload demo -->
  <ul id="filelist"></ul>
  <br />
  <pre id="console"></pre>
  <div id="container">
    <a id="browse" href="javascript:;">[Browse...]</a>
    {% csrf_token %} <!-- it may be placed elsewhere in the HTML -->
    <a id="start-upload" href="javascript:;">[Start Upload]</a>
  </div>
  <!-- upload demo end -->
  <p>your template cont'd</p>
  <script type="text/javascript">
    // function to get your csrf token from the cookie
    function getCookie(name) {
      let cookieValue = null;
      if (document.cookie && document.cookie !== '') {
        const cookies = document.cookie.split(';');
        for (let i = 0; i < cookies.length; i++) {
          const cookie = cookies[i].trim();
          // Does this cookie string begin with the name we want?
          if (cookie.substring(0, name.length + 1) === (name + '=')) {
            cookieValue = decodeURIComponent(cookie.substring(name.length + 1));
            break;
          }
        }
      }
      return cookieValue;
    }
    const csrf = getCookie('csrftoken');
    var uploader = new plupload.Uploader({
      browse_button: 'browse', // this can be an id of a DOM element or the DOM element itself
      url: '/upload/',
      chunk_size: '822kb',
      headers: {'X-CSRFToken': csrf}, // here we add the token to a header
      max_retries: 1
    });
    uploader.init();
    // and so on
  </script>
</body>
</HTML>
# urls.py
urlpatterns = [
    ...
    path(route='upload/', view=Upload.as_view(), name='upload'),
]

# views.py
from django.http import HttpResponse
from django.views import View

class Upload(View):
    def post(self, request):
        print('got upload post request')
        print(request.headers)
        # do with the chunk whatever you need...
        return HttpResponse('email good, thank you!', status=200)
The example from here:
https://www.plupload.com/docs/v2/Getting-Started#wiki-full-example
And you may read about settings over here:
https://www.plupload.com/docs/v2/Uploader#settings-property
It works with the post() method in Django 2.1+.

Why can I login through this form with a browser, but not LWP?

I was trying to log in to a website which uses this form, with three inputs, to authenticate.
<form action="/login.html" method="post">
<div class="loginlabel1 aright">ID / Email: </div>
<div class="bsearchfield">
<input type="text" name="profid" class="inputBx" size="15" value="" />
</div>
<div class="clear"></div>
<div class="loginlabel1 aright">Password: </div>
<div class="bsearchfield">
<input type="password" name="password" class="inputBx" size="15" value="" />
</div>
<div class="clear"></div>
<div class="loginbutton1">
<input name="login"type="image" src="images/logi.gif" align="right" border="0" />
</div>
</form>
If I login through browser, a successful login redirects me to http://www.example.com/myhome.html.
But the following script is not logging me in and returns the same login.html page. Did I miss something? I am not getting any error message. Did I post successfully?
#!/usr/bin/perl -w
use LWP 5.64;

my $browser = LWP::UserAgent->new || die " Failed LWP USER AGENT : $!";
$ENV{HTTP_proxy} = "http://proxy:port";
$browser->env_proxy;
$browser->cookie_jar({});

my @Header = (
    'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
    'Accept'          => 'image/gif, image/x-xbitmap, image/jpeg,image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'iso-8859-1,*,utf-8',
    'Accept-Language' => 'en-US',
);

push @{$browser->requests_redirectable}, 'POST';

$response = $browser->post(
    "http://www.example.com/login.html",
    [
        'profid'   => 'username',
        'password' => 'password'
    ],
    @Header
);

$response->is_success or die "Failed to post: ", $response->status_line;
print "Successfully posted username and password.\n" if $response->is_fresh;
#printf("%s",$response->content);
printf("%s\n", $response->status_line);
printf("%s", $response->header("Accept-Ranges"));
printf("%s", $response->header("Age"));
printf("%s", $response->header("ETag"));
printf("%s", $response->header("Location"));
printf("%s", $response->header("Proxy-Authenticate"));
printf("%s", $response->header("Retry-After"));
printf("%s", $response->header("Server"));
printf("%s", $response->header("Vary"));
printf("%s", $response->header("WWW-Authenticate"));

delete $ENV{HTTP_PROXY};
Your submit button is an image. When clicking on an input of type image, a browser sends the pixel coordinates where you clicked to the CGI. In your form, a browser would send login.x and login.y along with profid and password.
BTW, Firebug is a great tool for debugging CGI.
Sometimes they require correct accept-encoding and/or referer headers. I'd also try a user-agent header, to be sure.
I'd also recommend LiveHTTPHeaders for Firefox. You turn it on, then submit your form and it shows exactly what was GET or POST'd to the site, including all headers, params, and cookies and then shows all of the responses from the server including set cookies, headers, and redirects.
There could be JavaScript on the page creating additional params that you are not seeing when you just look at the form, the image coordinates as PacoRG stated above, or it may be requiring that you accept a cookie first and send it with the login.
LiveHTTPHeaders also lets you modify the headers and "replay" - this lets you modify what is sent to the server (any headers, cookies, params, etc) to help determine what is actually required by the server to login.
Also, I believe that LWP by default automatically follows redirects, so the page may actually be redirecting and you are not seeing it (I believe the "simple_request" function does not follow redirects).
In the LWP response, you can walk backwards through any redirects like so:
my $prev_res = $res->previous();
while ( $prev_res ) {
    print $prev_res->status_line . "\n";
    $prev_res = $prev_res->previous();
}
Hope this helps!
You're not submitting the name of the submit button which is clicked; I suspect the code at the other end is checking for the presence of that variable in the request to see if the form has been submitted or not.
As PacoRG points out, the submit button is an image; as such, submitting by clicking that button in a browser would submit fields named "login.x" and "login.y", along with "login".
A good way to avoid problems like this is to use WWW::Mechanize to do a lot of the work for you, for instance:
my $mech = WWW::Mechanize->new;
$mech->get('http://www.example.com/login.html');
$mech->submit_form(
    with_fields => {
        profid   => $username,
        password => $password,
    },
);
The above would request the login page, find the appropriate form, and submit it.
Also, as others have said, if requests from your script are handled differently to requests from your browser, the best way to debug is to grab the full HTTP request that both send, and look for pertinent differences. For the browser, you can use an extension like Firefox's LiveHTTPHeaders or Tamper Data plugins, or use something like Wireshark to capture the request as it's sent. For the script, you can easily have it output the request being sent.
For instance, for a script using LWP::UserAgent or WWW::Mechanize (which subclasses LWP::UserAgent), you can add:
$mech->add_handler("request_send", sub { shift->dump; return });
$mech->add_handler("response_done", sub { shift->dump; return });
This will dump the raw request sent, along with the raw response from the server. (Change $mech to whatever var your LWP::UserAgent / WWW::Mechanize object is in - $browser in your example.)