Correct way to use InternetReadFile() asynchronously - wininet

I've got code that's performing HTTP requests using WinInet API's asynchronously. In general, my code works, but I'm confused about the 'right' way to do things. In the documentation for InternetReadFile(), it states:
To ensure all data is retrieved, an application must continue to call
the InternetReadFile function until the function returns TRUE and the
lpdwNumberOfBytesRead parameter equals zero.
but in asynchronous mode, it may (or may not) return false, and an error of ERROR_IO_PENDING, indicating it'll do the work asynchronously, and call my callback when finished. If I read the documentation literally, it seems that the asynchronous calls could also just do a partial read of the requested buffer, and require the caller to keep calling InternetReadFile until a read of 0 bytes is encountered.
A typical implementation using InternetReadFile() synchronously would look something like this:
while(InternetReadFile(Request, Buffer, BufferSize, &BytesRead) && BytesRead != 0)
{
// do something with Buffer
}
but with the possibility that any one call to InternetReadFile() could signal that it's going to do the work asynchronously (and perhaps read part, but not all of your request), it becomes much more complicated. If I turn to MSDN sample code for guidance, the implementation is simple, simply calling InternetReadFile() once, and expecting a single return having read the entire requested buffer either instantly or asynchronously. Is this the correct way to use this function, or is MSDN Sample Code ignoring the possibility that InternetReadFile() will only read part of the requested buffer?

After a more careful reading of the asynchronous example, I see now that it is reading repeatedly until a successful read of 0 bytes is encountered. So to answer my own question, you must call InternetReadFile() over and over again, and be prepared for either a synchronous or asynchronous response.

Reading InternetReadFile() repeatedly until it returns TRUE and BytesRead is 0 is a correct way to use InternetReadFile(), but not enough if you work asynchronously.
As MSDN says
When running asynchronously, if a call to InternetReadFile does not result in a completed transaction, it will return FALSE and a subsequent call to GetLastError will return ERROR_IO_PENDING. When the transaction is completed the InternetStatusCallback specified in a previous call to InternetSetStatusCallback will be called with INTERNET_STATUS_REQUEST_COMPLETE.
So InternetReadFile() may return FALSE and set the last error to ERROR_IO_PENDING value if you work in asynchronous mode.
When InternetSetStatusCallback will be called again with INTERNET_STATUS_REQUEST_COMPLETE, the lpvStatusInformation parameter will contain the address of an INTERNET_ASYNC_RESULT structure (see InternetStatusCallback callback function). The INTERNET_ASYNC_RESULT.dwResult member will contain the result of asynchronous operation (TRUE or FALSE since you called InternetReadFile) and the INTERNET_ASYNC_RESULT.dwError will contain a error code only if dwResult is FALSE.
If dwResult is TRUE then your Buffer contains data read from Internet, and the BytesRead contains the number of bytes read asynchronously.
So one of the most important things when you work asynchronously, the Buffer and the BytesRead must be persistent between InternetStatusCallback calls, i.e. must not be allocated on the stack. Otherwise it has undefined behaviour, causes memory corruption, etc.

Related

What is the memory allocation when you return a function in Golang?

Here is a simplified code
func MyHandler(a int) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteCode(a)
})
}
Whenever a http request comes MyHandler will be called, and it will return a function which will be used to handle the request. So whenever a http request comes a new function object will be created. Function is taken as the first class in Go. I'm trying to understand what actually happened when you return a function from memory's perspective. When you return a value, for example an integer, it will occupy 4 bytes in stack. So how about return a function and lots of things inside the function body? Is it an efficient way to do so? What's shortcomings?
If you're not used to closures, they may seem a bit magic. They are, however, easy to implement in compilers:
The compiler finds any variables that must be captured by the closure. It puts them into a working area that will be allocated and remain allocated as long as the closure itself exists.
The compiler then generates the inner function with a secret extra parameter, or some other runtime trickery,1 such that calling the function activates the closure.
Because the returned function accesses its closure variables through the compile-time arrangement, there's nothing special needed. And since Go is a garbage-collected language, there's nothing else needed either: the pointer to the closure keeps the closure data alive until the pointer is gone because the function cannot be called any more, at which point the closure data evaporates (well, at the next GC).
1GCC sometimes uses trampolines to do this for C, where trampolines are executable code generated at runtime. The executable code may set a register or pass an extra parameter or some such. This can be expensive since something treated as data at runtime (generated code) must be turned into executable code at runtime (possibly requiring a system call and potentially requiring that some supervisory code "vet" the resulting runtime code).
Go does not need any of this because the language was defined with closures in mind, so implementors don't, er, "close off" any easy ways to make this all work. Some runtime ABIs are defined with closures in mind as well, e.g., register r1 is reserved as the closure-variables pointer in all pointer-to-function types, or some such.
Actual function size is irrelevant. When you return a function like this, memory will be allocated for the closure, that is, any variables in the scope that the function uses. In this case, a pointer will be returned containing the address of the function and a pointer to the closure, which will contain a reference to the variable a.

Reading byte array the second time causes error?

I am using the following code to read an error message from a byte array and it works fine the first time but if I try to access it the second time it throws an error:
errorData = process.standardError.readUTFBytes(process.standardError.bytesAvailable);
StandardError is of type InboundPipe?
The error is:
Error: Error #3212: Cannot perform operation on a NativeProcess that is not running.
even though the process is running (process.running is true). It's on the second call to readUTFBytes that seems to be the cause.
Update:
Here is the code calling the same call one after another. The error happens on the next line and process.running has not changed from true. Happens on the second call.
errorData = process.standardError.readUTFBytes(process.standardError.bytesAvailable);
errorData = process.standardError.readUTFBytes(process.standardError.bytesAvailable);
I also found out the standardError is a InboundPipe instance and implements IDataInput.
Update 2:
Thanks for all the help. I found this documentation when viewing the bytesAvailable property.
[Read Only] Returns the number of bytes of data available for reading
in the input buffer. User code must call bytesAvailable to ensure that
sufficient data is available before trying to read it with one of the
read methods.
When I call readUTFBytes() it resets the bytes available to 0. So when I read it a second time and there are no bytes available it causes the error. The error is or may be incorrect in my opinion or the native process.running flag is incorrect.
I looked into seeing if it has a position property and it does not, at least not in this instance.
Could you try to set position to zero before reading process, especially before repetitive access:
Moves, or returns the current position, in bytes, of the file pointer into the ByteArray object. This is the point at which the next call to a read method starts reading or a write method starts writing.
//ByteArray example
var source: String = "Some data";
var data: ByteArray = new ByteArray();
data.writeUTFBytes(source);
data.position = 0;
trace(data.readUTFBytes(data.bytesAvailable));
data.position = 0;
trace(data.readUTFBytes(data.bytesAvailable));
This was a tricky problem since the object was not a byte array although it looks and acts like one (same methods and almost same properties). It is an InboundPipe that also implements IDataInput.
I found this documentation when viewing the bytesAvailable property.
[Read Only] Returns the number of bytes of data available for reading
in the input buffer. User code must call bytesAvailable to ensure that
sufficient data is available before trying to read it with one of the
read methods.
When I call readUTFBytes() it resets the bytes available to 0. So when I call it a second time and there are no bytes available it causes the error. The error is or may be incorrect in my opinion or the native process.running flag is incorrect although I have reason to believe it's the former.
The solution is to check bytesAvailable before calling read operations and store the value if it needs to be accessed later.
if (process.standardError.bytesAvailable) {
errorData = process.standardError.readUTFBytes(process.standardError.bytesAvailable);
errorDataArray.push(errorData);
}
I looked into seeing if it has a position property and it does not, at least not in this instance.

Is currying just a way to avoid inheritance?

So my understanding of currying (based on SO questions) is that it lets you partially set parameters of a function and return a "truncated" function as a result.
If you have a big hairy function takes 10 parameters and looks like
function (location, type, gender, jumpShot%, SSN, vegetarian, salary) {
//weird stuff
}
and you want a "subset" function that will let you deal with presets for all but the jumpShot%, shouldn't you just break out a class that inherits from the original function?
I suppose what I'm looking for is a use case for this pattern. Thanks!
Currying has many uses. From simply specifying default parameters for functions you use often to returning specialized functions that serve a specific purpose.
But let me give you this example:
function log_message(log_level, message){}
log_error = curry(log_message, ERROR)
log_warning = curry(log_message, WARNING)
log_message(WARNING, 'This would be a warning')
log_warning('This would also be a warning')
In javascript I do currying on callback functions (because they cannot be passed any parameters after they are called (from the caller)
So something like:
...
var test = "something specifically set in this function";
onSuccess: this.returnCallback.curry(test).bind(this),
// This will fail (because this would pass the var and possibly change it should the function be run elsewhere
onSuccess: this.returnCallback.bind(this,test),
...
// this has 2 params, but in the ajax callback, only the 'ajaxResponse' is passed, so I have to use curry
returnCallback: function(thePassedVar, ajaxResponse){
// now in here i can have 'thePassedVar', if
}
I'm not sure if that was detailed or coherent enough... but currying basically lets you 'prefill' the parameters and return a bare function call that already has data filled (instead of requiring you to fill that info at some other point)
When programming in a functional style, you often bind arguments to generate new functions (in this example, predicates) from old. Pseudo-code:
filter(bind_second(greater_than, 5), some_list)
might be equivalent to:
filter({x : x > 5}, some_list)
where {x : x > 5} is an anonymous function definition. That is, it constructs a list of all values from some_list which are greater than 5.
In many cases, the parameters to be omitted will not be known at compile time, but rather at run time. Further, there's no limit to the number of curried delegates that may exist for a given function. The following is adapted from a real-world program.
I have a system in which I send out command packets to a remote machine and receive back response packets. Every command packet has an index number, and each reply bears the index number of the command to which it is a response. A typical command, translated into English, might be "give me 128 bytes starting at address 0x12300". A typical response might be "Successful." along with 128 bytes of data.
To handle communication, I have a routine which accepts a number of command packets, each with a delegate. As each response is received, the corresponding delegate will be run on the received data. The delegate associated with the command above would be something like "Confirm that I got a 'success' with 128 bytes of data, and if so, store them into my buffer at address 0x12300". Note that multiple packets may be outstanding at any given time; the curried address parameter is necessary for the routine to know where the incoming data should go. Even if I wanted to write a "store data to buffer" routine which didn't require an address parameter, it would have no way of knowing where the incoming data should go.

What are the cons of returning an Exception instance instead of raising it in Python?

I have been doing some work with python-couchdb and desktopcouch. In one of the patches I submitted I wrapped the db.update function from couchdb. For anyone that is not familiar with python-couchdb the function is the following:
def update(self, documents, **options):
"""Perform a bulk update or insertion of the given documents using a
single HTTP request.
>>> server = Server('http://localhost:5984/')
>>> db = server.create('python-tests')
>>> for doc in db.update([
... Document(type='Person', name='John Doe'),
... Document(type='Person', name='Mary Jane'),
... Document(type='City', name='Gotham City')
... ]):
... print repr(doc) #doctest: +ELLIPSIS
(True, '...', '...')
(True, '...', '...')
(True, '...', '...')
>>> del server['python-tests']
The return value of this method is a list containing a tuple for every
element in the `documents` sequence. Each tuple is of the form
``(success, docid, rev_or_exc)``, where ``success`` is a boolean
indicating whether the update succeeded, ``docid`` is the ID of the
document, and ``rev_or_exc`` is either the new document revision, or
an exception instance (e.g. `ResourceConflict`) if the update failed.
If an object in the documents list is not a dictionary, this method
looks for an ``items()`` method that can be used to convert the object
to a dictionary. Effectively this means you can also use this method
with `schema.Document` objects.
:param documents: a sequence of dictionaries or `Document` objects, or
objects providing a ``items()`` method that can be
used to convert them to a dictionary
:return: an iterable over the resulting documents
:rtype: ``list``
:since: version 0.2
"""
As you can see, this function does not raise the exceptions that have been raised by the couchdb server but it rather returns them in a tuple with the id of the document that we wanted to update.
One of the reviewers went to #python on irc to ask about the matter. In #python they recommended to use sentinel values rather than exceptions. As you can imaging just an approach is not practical since there are lots of possible exceptions that can be received. My questions is, what are the cons of using Exceptions over sentinel values besides that using exceptions is uglier?
I think it is ok to return the exceptions in this case, because some parts of the update function may succeed and some may fail. When you raise the exception, the API user has no control over what succeeded already.
Raising an Exception is a notification that something that was expected to work did not work. It breaks the program flow, and should only be done if whatever is going on now is flawed in a way that the program doesn't know how to handle.
But sometimes you want to raise a little error flag without breaking program flow. You can do this by returning special values, and these values can very well be exceptions.
Python does this internally in one case. When you compare two values like foo < bar, the actual call is foo.__lt__(bar). If this method raises an exception, program flow will be broken, as expected. But if it returns NotImplemented, Python will then try bar.__ge__(foo) instead. So in this case returning the exception rather than raising it is used to flag that it didn't work, but in an expected way.
It's really the difference between an expected error and an unexpected one, IMO.
exceptions intended to be raised. It helps with debugging, handling causes of the errors and it's clear and well-established practise of other developers.
I think looking at the interface of the programme, it's not clear what am I supposed to do with returned exception. raise it? from outside of the chain that actually caused it? it seems a bit convoluted.
I'd suggest, returning docid, new_rev_doc tuple on success and propagating/raising exception as it is. Your approach duplicates success and type of 3rd returned value too.
Exceptions cause the normal program flow to break; then exceptions go up the call stack until they're intercept, or they may reach the top if they aren't. Hence they're employed to mark a really special condition that should be handled by the caller. Raising an exception is useful since the program won't continue if a necessary condition has not been met.
In languages that don't support exceptions (like C) you're often forced to check return values of functions to verify everything went on correctly; otherwise the program may misbehave.
By the way the update() is a bit different:
it takes multiple arguments; some may fail, some may succeed, hence it needs a way to communicate results for each arg.
a previous failure has no relation with operations coming next, e.g. it is not a permanent error
In that situation raising an exception would NOT be usueful in an API. On the other hand, if the connection to the db drops while executing the query, then an exception is the way to go (since it's a permament error and would impact all operations coming next).
By the way if your business logic requires all operations to complete successfully and you don't know what to do when an update fails (i.e. your design says it should never happen), feel free to raise an exception in your own code.

What is an idempotent operation?

What is an idempotent operation?
In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters. For example, removing an item from a set can be considered an idempotent operation on the set.
In mathematics, an idempotent operation is one where f(f(x)) = f(x). For example, the abs() function is idempotent because abs(abs(x)) = abs(x) for all x.
These slightly different definitions can be reconciled by considering that x in the mathematical definition represents the state of an object, and f is an operation that may mutate that object. For example, consider the Python set and its discard method. The discard method removes an element from a set, and does nothing if the element does not exist. So:
my_set.discard(x)
has exactly the same effect as doing the same operation twice:
my_set.discard(x)
my_set.discard(x)
Idempotent operations are often used in the design of network protocols, where a request to perform an operation is guaranteed to happen at least once, but might also happen more than once. If the operation is idempotent, then there is no harm in performing the operation two or more times.
See the Wikipedia article on idempotence for more information.
The above answer previously had some incorrect and misleading examples. Comments below written before April 2014 refer to an older revision.
An idempotent operation can be repeated an arbitrary number of times and the result will be the same as if it had been done only once. In arithmetic, adding zero to a number is idempotent.
Idempotence is talked about a lot in the context of "RESTful" web services. REST seeks to maximally leverage HTTP to give programs access to web content, and is usually set in contrast to SOAP-based web services, which just tunnel remote procedure call style services inside HTTP requests and responses.
REST organizes a web application into "resources" (like a Twitter user, or a Flickr image) and then uses the HTTP verbs of POST, PUT, GET, and DELETE to create, update, read, and delete those resources.
Idempotence plays an important role in REST. If you GET a representation of a REST resource (eg, GET a jpeg image from Flickr), and the operation fails, you can just repeat the GET again and again until the operation succeeds. To the web service, it doesn't matter how many times the image is gotten. Likewise, if you use a RESTful web service to update your Twitter account information, you can PUT the new information as many times as it takes in order to get confirmation from the web service. PUT-ing it a thousand times is the same as PUT-ing it once. Similarly DELETE-ing a REST resource a thousand times is the same as deleting it once. Idempotence thus makes it a lot easier to construct a web service that's resilient to communication errors.
Further reading: RESTful Web Services, by Richardson and Ruby (idempotence is discussed on page 103-104), and Roy Fielding's PhD dissertation on REST. Fielding was one of the authors of HTTP 1.1, RFC-2616, which talks about idempotence in section 9.1.2.
No matter how many times you call the operation, the result will be the same.
Idempotence means that applying an operation once or applying it multiple times has the same effect.
Examples:
Multiplication by zero. No matter how many times you do it, the result is still zero.
Setting a boolean flag. No matter how many times you do it, the flag stays set.
Deleting a row from a database with a given ID. If you try it again, the row is still gone.
For pure functions (functions with no side effects) then idempotency implies that f(x) = f(f(x)) = f(f(f(x))) = f(f(f(f(x)))) = ...... for all values of x
For functions with side effects, idempotency furthermore implies that no additional side effects will be caused after the first application. You can consider the state of the world to be an additional "hidden" parameter to the function if you like.
Note that in a world where you have concurrent actions going on, you may find that operations you thought were idempotent cease to be so (for example, another thread could unset the value of the boolean flag in the example above). Basically whenever you have concurrency and mutable state, you need to think much more carefully about idempotency.
Idempotency is often a useful property in building robust systems. For example, if there is a risk that you may receive a duplicate message from a third party, it is helpful to have the message handler act as an idempotent operation so that the message effect only happens once.
A good example of understanding an idempotent operation might be locking a car with remote key.
log(Car.state) // unlocked
Remote.lock();
log(Car.state) // locked
Remote.lock();
Remote.lock();
Remote.lock();
log(Car.state) // locked
lock is an idempotent operation. Even if there are some side effect each time you run lock, like blinking, the car is still in the same locked state, no matter how many times you run lock operation.
An idempotent operation produces the result in the same state even if you call it more than once, provided you pass in the same parameters.
An idempotent operation is an operation, action, or request that can be applied multiple times without changing the result, i.e. the state of the system, beyond the initial application.
EXAMPLES (WEB APP CONTEXT):
IDEMPOTENT:
Making multiple identical requests has the same effect as making a single request. A message in an email messaging system is opened and marked as "opened" in the database. One can open the message many times but this repeated action will only ever result in that message being in the "opened" state. This is an idempotent operation. The first time one PUTs an update to a resource using information that does not match the resource (the state of the system), the state of the system will change as the resource is updated. If one PUTs the same update to a resource repeatedly then the information in the update will match the information already in the system upon every PUT, and no change to the state of the system will occur. Repeated PUTs with the same information are idempotent: the first PUT may change the state of the system, subsequent PUTs should not.
NON-IDEMPOTENT:
If an operation always causes a change in state, like POSTing the same message to a user over and over, resulting in a new message sent and stored in the database every time, we say that the operation is NON-IDEMPOTENT.
NULLIPOTENT:
If an operation has no side effects, like purely displaying information on a web page without any change in a database (in other words you are only reading the database), we say the operation is NULLIPOTENT. All GETs should be nullipotent.
When talking about the state of the system we are obviously ignoring hopefully harmless and inevitable effects like logging and diagnostics.
Just wanted to throw out a real use case that demonstrates idempotence. In JavaScript, say you are defining a bunch of model classes (as in MVC model). The way this is often implemented is functionally equivalent to something like this (basic example):
function model(name) {
function Model() {
this.name = name;
}
return Model;
}
You could then define new classes like this:
var User = model('user');
var Article = model('article');
But if you were to try to get the User class via model('user'), from somewhere else in the code, it would fail:
var User = model('user');
// ... then somewhere else in the code (in a different scope)
var User = model('user');
Those two User constructors would be different. That is,
model('user') !== model('user');
To make it idempotent, you would just add some sort of caching mechanism, like this:
var collection = {};
function model(name) {
if (collection[name])
return collection[name];
function Model() {
this.name = name;
}
collection[name] = Model;
return Model;
}
By adding caching, every time you did model('user') it will be the same object, and so it's idempotent. So:
model('user') === model('user');
Quite a detailed and technical answers. Just adding a simple definition.
Idempotent = Re-runnable
For example,
Create operation in itself is not guaranteed to run without error if executed more than once.
But if there is an operation CreateOrUpdate then it states re-runnability (Idempotency).
Idempotent Operations: Operations that have no side-effects if executed multiple times.
Example: An operation that retrieves values from a data resource and say, prints it
Non-Idempotent Operations: Operations that would cause some harm if executed multiple times. (As they change some values or states)
Example: An operation that withdraws from a bank account
It is any operation that every nth result will result in an output matching the value of the 1st result. For instance the absolute value of -1 is 1. The absolute value of the absolute value of -1 is 1. The absolute value of the absolute value of absolute value of -1 is 1. And so on. See also: When would be a really silly time to use recursion?
An idempotent operation over a set leaves its members unchanged when applied one or more times.
It can be a unary operation like absolute(x) where x belongs to a set of positive integers. Here absolute(absolute(x)) = x.
It can be a binary operation like union of a set with itself would always return the same set.
cheers
In short, Idempotent operations means that the operation will not result in different results no matter how many times you operate the idempotent operations.
For example, according to the definition of the spec of HTTP, GET, HEAD, PUT, and DELETE are idempotent operations; however POST and PATCH are not. That's why sometimes POST is replaced by PUT.
An operation is said to be idempotent if executing it multiple times is equivalent to executing it once.
For eg: setting volume to 20.
No matter how many times the volume of TV is set to 20, end result will be that volume is 20. Even if a process executes the operation 50/100 times or more, at the end of the process the volume will be 20.
Counter example: increasing the volume by 1. If a process executes this operation 50 times, at the end volume will be initial Volume + 50 and if a process executes the operation 100 times, at the end volume will be initial Volume + 100. As you can clearly see that the end result varies based upon how many times the operation was executed. Hence, we can conclude that this operation is NOT idempotent.
I have highlighted the end result in bold.
If you think in terms of programming, let's say that I have an operation in which a function f takes foo as the input and the output of f is set to foo back. If at the end of the process (that executes this operation 50/100 times or more), my foo variable holds the value that it did when the operation was executed only ONCE, then the operation is idempotent, otherwise NOT.
foo = <some random value here, let's say -2>
{ foo = f( foo ) }   curly brackets outline the operation
if f returns the square of the input then the operation is NOT idempotent. Because foo at the end will be (-2) raised to the power (number of times operation is executed)
if f returns the absolute of the input then the operation is idempotent because no matter how many multiple times the operation is executed foo will be abs(-2).
Here, end result is defined as the final value of variable foo.
In mathematical sense, idempotence has a slightly different meaning of:
f(f(....f(x))) = f(x)
here output of f(x) is passed as input to f again which doesn't need to be the case always with programming.
my 5c:
In integration and networking the idempotency is very important.
Several examples from real-life:
Imagine, we deliver data to the target system. Data delivered by a sequence of messages.
1. What would happen if the sequence is mixed in channel? (As network packages always do :) ). If the target system is idempotent, the result will not be different. If the target system depends of the right order in the sequence, we have to implement resequencer on the target site, which would restore the right order.
2. What would happen if there are the message duplicates? If the channel of target system does not acknowledge timely, the source system (or channel itself) usually sends another copy of the message. As a result we can have duplicate message on the target system side.
If the target system is idempotent, it takes care of it and result will not be different.
If the target system is not idempotent, we have to implement deduplicator on the target system side of the channel.
For a workflow manager (as Apache Airflow) if an idempotency operation fails in your pipeline the system can retry the task automatically without affecting the system. Even if the logs change, that is good because you can see the incident.
The most important in this case is that your system can retry the task that failed and doesn't mess up the pipeline (e.g. appending the same data in a table each retry)
Let's say the client makes a request to "IstanceA" service which process the request, passes it to DB, and shuts down before sending the response. since the client does not see that it was processed and it will retry the same request. Load balancer will forward the request to another service instance, "InstanceB", which will make the same change on the same DB item.
We should use idempotent tokens. When a client sends a request to a service, it should have some kind of request-id that can be saved in DB to show that we have already executed the request. if the client retries the request, "InstanceB" will check the requestId. Since that particular request already has been executed, it will not make any change to the DB item. Those kinds of requests are called idempotent requests. So we send the same request multiple times, but we won't make any change