How to create a repeatable POST request that contains multipart-form-data? - apache-httpclient-4.x

I am trying to create a POST request that contains multipart-form-data and requires NT credentials. The authentication request causes the POST to be resent, and I get a non-repeatable entity exception.
I tried wrapping the MultipartContent entity that is produced in a BufferedHttpEntity, but it throws NullPointerExceptions.
final GenericUrl sau = new GenericUrl(baseURI.resolve("Record"));
final MultipartContent c = new MultipartContent().setMediaType(MULTIPART_FORM_DATA).setBoundary("__END_OF_PART__");
final MultipartContent.Part p0 = new MultipartContent.Part(new HttpHeaders().set("Content-Disposition", format("form-data; name=\"%s\"", "RecordRecordType")), ByteArrayContent.fromString(null, "C_APP_BOX"));
final MultipartContent.Part p1 = new MultipartContent.Part(new HttpHeaders().set("Content-Disposition", format("form-data; name=\"%s\"", "RecordTitle")), ByteArrayContent.fromString(null, "JAVA_TEST"));
c.addPart(p0);
c.addPart(p1);
The documentation for ByteArrayContent says
Concrete implementation of AbstractInputStreamContent that generates repeatable input streams based on the contents of byte array.
Making all the parts repeatable does not solve the problem, because they already are: this code
System.out.println("c.retrySupported() = " + c.retrySupported());
outputs c.retrySupported() = true.
I found the following documentation:
1.1.4.1. Repeatable entities: An entity can be repeatable, meaning its content can be read more than once. This is only possible with self-contained entities (like ByteArrayEntity or StringEntity).
I have now converted my MultipartContent to a ByteArrayContent with a multipart/form-data media type by extracting the string contents, but I still get the following exception when I call request.execute():
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
So how do I convince ApacheHttpTransport to create a repeatable entity?

I had to modify all the classes that inherit from HttpContent so that they report .retrySupported() correctly. That way, when the ApacheHttpTransport code is entered, it creates repeatable content.
The changes were made against version 1.20.0 because that is what I was using. I am submitting a pull request against the dev branch HEAD, so hopefully this, or some version of it, will make it into the next release.
Here are the modifications that need to be merged in.

If the content length of every part in a multipart entity is known (returned as a non-negative value), the entity will be treated as repeatable. The easiest way to make a multipart entity repeatable is to make all its parts repeatable.
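The principle in that answer (known length, replayable content) can be illustrated outside any HTTP library. Below is a minimal Python sketch, assuming only simple text fields. It serializes the whole multipart/form-data body into bytes up front, so the length is known and the exact same bytes can be resent after an authentication challenge. build_multipart_body is a hypothetical helper, not part of HttpClient or google-http-client.

```python
import uuid


def build_multipart_body(fields, boundary=None):
    """Serialize simple text form fields into one multipart/form-data body.

    The result is a plain bytes object with a known length, so it can be
    sent as many times as needed (e.g. after an NTLM challenge).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")  # ensures a trailing CRLF after the closing delimiter
    body = "\r\n".join(lines).encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type
```

Passing boundary="__END_OF_PART__" reproduces the boundary from the question. File parts would also need their own Content-Type header, which this sketch leaves out.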

Related

How to add comments/whitespace back in a translator using ANTLR4's visitor model

I'm currently writing a TSQL (Sybase/Microsoft SQL) to MySQL translator using the ANTLR4 visitor approach.
I'm able to push comments and whitespaces to different channels so that I can use that information later.
What's not super clear is:
1. How do I get the data back?
2. More importantly, how do I plug the comments and whitespace back into my translated MySQL code?
Re: #1, this seems to work to get the list of all tokens including the comments/whitespaces:
public static List<Token> getHiddenTokensFromString(String sqlIn, int hiddenChannel) {
    CharStream charStream = CharStreams.fromString(sqlIn);
    CaseChangingCharStream upper = new CaseChangingCharStream(charStream, true);
    TSqlLexer lexer = new TSqlLexer(upper);
    CommonTokenStream commonTokenStream = new CommonTokenStream(lexer, hiddenChannel);
    commonTokenStream.fill();
    List<Token> hiddenTokens = commonTokenStream.getTokens();
    return hiddenTokens;
}
Re #2, what makes it particularly challenging is that as part of the translation, lines of SQL have to be moved around, some lines removed and some lines added.
Any help will be greatly appreciated.
Thanks.
The ANTLR4 lexer creates a number of tokens, each with an index (a running number). Provided you didn't skip a token, all tokens are available for later inspection once the parsing step is done, regardless of their channels (the channel is just a number property on a token).
So, given a token you want to translate, get its index and then ask the token stream for the tokens with the next smaller or next higher index. These are usually the hidden whitespace tokens.
Once you have the whitespace token, use its start and stop index to get the original text from the char stream. Since you know where you are in the translation process when you do that, it should be easy to know where to insert the original text.
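The index-neighbour lookup can be sketched in Python with a hypothetical minimal Token class, so the example needs no ANTLR runtime. For simplicity it reads each hidden token's text directly instead of slicing the char stream by start/stop index. In the real Java runtime, BufferedTokenStream (the superclass of CommonTokenStream) already provides getHiddenTokensToLeft and getHiddenTokensToRight for this. The helper names below are illustrative only.

```python
from dataclasses import dataclass

# ANTLR convention: default channel is 0, the standard hidden channel is 1.
DEFAULT_CHANNEL = 0
HIDDEN_CHANNEL = 1


@dataclass
class Token:
    # Hypothetical stand-in for an ANTLR token.
    index: int
    channel: int
    text: str


def hidden_to_left(tokens, index):
    """Collect the run of off-channel tokens immediately before tokens[index]."""
    result = []
    i = index - 1
    while i >= 0 and tokens[i].channel != DEFAULT_CHANNEL:
        result.append(tokens[i])
        i -= 1
    return list(reversed(result))  # restore source order


def hidden_to_right(tokens, index):
    """Collect the run of off-channel tokens immediately after tokens[index]."""
    result = []
    i = index + 1
    while i < len(tokens) and tokens[i].channel != DEFAULT_CHANNEL:
        result.append(tokens[i])
        i += 1
    return result


def translate_with_trivia(tokens, index, translated_text):
    """Surround a translated token with the original whitespace/comments."""
    before = "".join(t.text for t in hidden_to_left(tokens, index))
    after = "".join(t.text for t in hidden_to_right(tokens, index))
    return before + translated_text + after
```

If you attach both sides for every token, trivia lying between two tokens would be emitted twice. Attaching only the leading (left) trivia to each token, plus the trailing trivia of the last token, avoids that.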

kafka-python 1.3.3: KafkaProducer.send with explicit key fails to send message to broker

(Possibly a duplicate of Can't send a keyedMessage to brokers with partitioner.class=kafka.producer.DefaultPartitioner, although the OP of that question didn't mention kafka-python. And anyway, it never got an answer.)
I have a Python program that has been successfully (for many months) sending messages to the Kafka broker, using essentially the following logic:
producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
                               retries=3)
...
msg = json.dumps(some_message)
res = producer.send(some_topic, value=msg)
Recently, I tried to upgrade it to send messages to different partitions based on a definite key value extracted from the message:
producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
                               key_serializer=str.encode,
                               retries=3)
...
try:
    key = some_message[0]
except:
    key = None
msg = json.dumps(some_message)
res = producer.send(some_topic, value=msg, key=key)
However, with this code, no messages ever make it out of the program to the broker. I've verified that the key value extracted from some_message is always a valid string. Presumably I don't need to define my own partitioner, since, according to the documentation:
The default partitioner implementation hashes each non-None key using the same murmur2 algorithm as the java client so that messages with the same key are assigned to the same partition.
Furthermore, with the new code, when I try to determine what happened to my send by calling res.get (to obtain a kafka.FutureRecordMetadata), that call throws a TypeError with the message "descriptor 'encode' requires a 'str' object but received a 'unicode'".
(As a side question, I'm not exactly sure what I'd do with the FutureRecordMetadata if I were actually able to get it. Based on the kafka-python source code, I assume I'd want to call either its succeeded or its failed method, but the documentation is silent on the point. The documentation does say that the return value of send "resolves to" RecordMetadata, but I haven't been able to figure out, from either the documentation or the code, what "resolves to" means in this context.)
Anyway: I can't be the only person using kafka-python 1.3.3 who's ever tried to send messages with a partitioning key, and I have not seen anything on the Intertubes describing a similar problem (except for the SO question I referenced at the top of this post).
I'm certainly willing to believe that I'm doing something wrong, but I have no idea what that might be. Is there some additional parameter I need to supply to the KafkaProducer constructor?
The fundamental problem turned out to be that my key value was a unicode, even though I was quite convinced that it was a str. Hence the selection of str.encode for my key_serializer was inappropriate, and was what led to the exception from res.get. Omitting the key_serializer and calling key.encode('utf-8') was enough to get my messages published, and partitioned as expected.
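Under Python 3 semantics, a key serializer that accepts both text and bytes avoids the str/unicode mismatch altogether. This is a sketch. encode_key is a hypothetical helper. kafka-python's KafkaProducer does accept a key_serializer callable, so it could be passed as key_serializer=encode_key. In the Python 2 setting of the question, the equivalent fix is calling key.encode('utf-8') on the unicode value, as described above.

```python
def encode_key(key):
    """Serialize a partitioning key to bytes, whatever type it arrives as."""
    if key is None:
        return None  # no key: let the partitioner choose a partition
    if isinstance(key, bytes):
        return key  # already serialized
    if not isinstance(key, str):
        key = str(key)  # e.g. an int extracted from the message
    return key.encode("utf-8")
```

Usage, assuming a reachable broker: producer = kafka.KafkaProducer(bootstrap_servers=[some_addr], key_serializer=encode_key, retries=3).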
A large contributor to the obscurity of this problem (for me) was that the kafka-python 1.3.3 documentation does not go into any detail on what a FutureRecordMetadata really is, nor what one should expect in the way of exceptions its get method can raise. The sole usage example in the documentation:
# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')

# Block for 'synchronous' sends
try:
    record_metadata = future.get(timeout=10)
except KafkaError:
    # Decide what to do if produce request failed...
    log.exception()
    pass
suggests that the only kind of exception it will raise is KafkaError, which is not true. In fact, get can and will (re-)raise any exception that the asynchronous publishing mechanism encountered in trying to get the message out the door.
I also faced the same error. Once I added json.dumps while sending the key, it worked.
(producer.send(topic="first_topic",
               key=json.dumps(key).encode('utf-8'),
               value=json.dumps(msg).encode('utf-8'))
    .add_callback(on_send_success)
    .add_errback(on_send_error))

Postman/Newman junit report customization

I'm using Postman and Newman to perform automated tests, and I do a JUnit export so that I can use the results in TFS.
However, when I open my .xml report, failures are indicated as follows:
<failure type="AssertionFailure">
<![CDATA[Failed 1 times.]]>
</failure>
I would like to know if it is possible to customize the "Failed 1 times." information to pass more relevant data about the failure (i.e. the JSON body error and description).
Thank you
Alexandre
Well, I finally found out how to proceed (not a clean way, but sufficient for my purpose so far):
I modified the file C:\Users\<myself>\AppData\Roaming\npm\node_modules\newman\lib\reporters\junit\index.js
The request data and response can be recovered from the 'executions' object:
stringExecutions = JSON.stringify(executions); //provide information about the arguments of the object "executions"
From this I can take general information by JSON-parsing this element and extracting what I want:
jsonExecutions = JSON.parse(stringExecutions)
jsonExecutions[0].response._details.code // gives me the http return code,
jsonExecutions[0].response._details.name // gives me the status,
jsonExecutions[0].response._details.detail //gives a bit more details
Error data (at test case/testsuite level) can be recovered from the 'err.error' object:
stringData = JSON.stringify(err.error); jsonData = JSON.parse(stringData);
From that I extract the data I need, i.e.:
jsonData.name // the error type
jsonData.message // the error detail
jsonData.stacktrace // the error stack
By the way, in the original file the stack cannot be displayed, because there is no 'stack' property in err.error (it is named 'stacktrace').
Finally, failure data (at test step/testcase level) can be recovered from the 'failures' object:
stringFailure = JSON.stringify(failures); jsonFailure = JSON.parse(stringFailure);
From this I extract:
jsonFailure[0].name // the failure type
jsonFailure[0].stack // the failure stack
For my purpose, I add the response details from jsonExecutions to my testsuite error data, which makes the XML report much more verbose than before.
If there is a cleaner/smarter way to do this, do not hesitate to tell me; I'll be grateful.
Next step : do it clean by creating a custom reporter. :)
Alexandre

How to best validate JSON on the server-side

When handling POST, PUT, and PATCH requests on the server-side, we often need to process some JSON to perform the requests.
We obviously need to validate these JSON payloads in some way (e.g. structure, permitted/expected keys, and value types), and I can see at least two ways:
1. Upon receiving the JSON, validate it upfront as it is, before doing anything with it to complete the request.
2. Take the JSON as it is, start processing it (e.g. access its various key-values), and try to validate it on the go while performing business logic, possibly using exception handling to deal with malformed data.
The 1st approach seems more robust compared to the 2nd, but probably more expensive (in time cost) because every request will be validated (and hopefully most of them are valid so the validation is sort of redundant).
The 2nd approach may save the compulsory validation on valid requests, but mixing the checks within business logic might be buggy or even risky.
Which of the two above is better? Or, is there yet a better way?
What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that map JSON to objects, which is very powerful and performs much of that validation for you. In Java, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.
If efficiency is your goal and you want to validate every single request, it would be ideal if you could also validate on the front end; if you are using JavaScript, you can use json2.js.
Regarding your two methods, here is a pros/cons list.
Method #1: Upon Request
Pros
The integrity of the business logic is maintained. As you mentioned, trying to validate while processing business logic could produce wrong results (input flagged as invalid that is actually valid, and vice versa), and the validation could also inadvertently affect the business logic negatively.
As Norbert mentioned, catching the errors beforehand will improve efficiency. The logical question this poses is: why spend time processing if there are errors in the first place?
The code will be cleaner and easier to read. Keeping validation and business logic separate results in code that is cleaner and easier to read and maintain.
Cons
It could result in redundant processing meaning longer computing time.
Method #2: Validation on the Go
Pros
In theory, it is more efficient, because validation and processing happen at the same time, saving processing and compute time.
Cons
In reality, the processing time saved is likely negligible (as Norbert mentioned). You are still doing the validation check either way. In addition, the processing already done is wasted whenever an error is found partway through.
Data integrity can be compromised. The JSON could become corrupted when it is processed this way.
The code is not as clear. When reading the business logic, it may not be as apparent what is happening because validation logic is mixed in.
What it really boils down to is accuracy vs. speed, which generally have an inverse relationship: as you become more accurate by validating your JSON, you may have to compromise somewhat on speed. This is really only noticeable with large data sets, as computers are very fast these days. It is up to you to decide what is more important, given how accurate you expect your data to be when you receive it, or whether that extra second or so is crucial. In some cases it does matter (e.g. in stock market and healthcare applications, milliseconds matter), and both are highly important. In those cases, as you increase one (for example, accuracy), you may have to make up the lost speed by using a higher-performance machine.
Hope this helps.
The first approach is more robust, but it does not have to be noticeably more expensive. It even becomes less expensive when you are able to abort the parsing process due to errors. Your business logic usually takes >90% of the resources in a process, so with an error rate of 10% you are already resource neutral. If you optimize the validation process so that the validations from the business process are performed upfront, the error rate needed to stay resource neutral might be much lower (like 1 in 20 to 1 in 100).
For an example of an implementation that assumes upfront data validation, look at GSON (https://code.google.com/p/google-gson/):
GSON works as follows: Every part of the JSON can be cast into an object. This object is typed or contains typed data:
Sample object (Java used as the example language):
public class someInnerDataFromJSON {
    String name;
    String address;
    int housenumber;
    String buildingType;

    // Getters and setters
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    // etc.
}
Data parsed by GSON using the provided model is already type-checked.
This is the first point where your code can abort.
After this exit point, assuming the data conformed to the model, you can validate whether the data is within certain limits. You can also write that into the model.
For this example, assume buildingType must be one of the values in this list:
Single family house
Multi family house
Apartment
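You can check the data during parsing with a setter that validates it, or after parsing, as the first step of applying your business rules. (Note that GSON populates fields directly via reflection rather than calling setters, so with GSON a check during parsing belongs in a custom TypeAdapter or JsonDeserializer.) The benefit of checking the data first is that your later code needs less exception handling, so it is shorter and easier to understand.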
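Here is a rough Python analogue of that typed-model-plus-limit-check idea. A dataclass's __post_init__ rejects wrong types and unknown buildingType values, so bad input fails before any business logic runs. This is not GSON. The field names come from the sample class, while InnerData, FIELD_TYPES and parse_inner_data are hypothetical names.

```python
import json
from dataclasses import dataclass

# Allowed values from the buildingType list above.
BUILDING_TYPES = {"Single family house", "Multi family house", "Apartment"}
# Expected type per field, mirroring the sample Java class.
FIELD_TYPES = {"name": str, "address": str, "housenumber": int, "buildingType": str}


@dataclass
class InnerData:
    # Field names mirror the JSON keys, hence the camelCase buildingType.
    name: str
    address: str
    housenumber: int
    buildingType: str

    def __post_init__(self):
        # Step 1: strict type check (type(...) is, so True is not accepted as an int).
        for field_name, expected in FIELD_TYPES.items():
            if type(getattr(self, field_name)) is not expected:
                raise TypeError(f"{field_name} must be {expected.__name__}")
        # Step 2: limit check, once the types are known to be right.
        if self.buildingType not in BUILDING_TYPES:
            raise ValueError(f"unknown buildingType: {self.buildingType!r}")


def parse_inner_data(raw_json):
    """Parse and validate in one step, so bad input aborts before business logic."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    return InnerData(**data)  # missing or unknown keys also raise TypeError here
```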
I would definitely go for validation before processing.
Let's say you receive some json data with 10 variables of which you expect:
the first 5 variables to be of type string
6 and 7 are supposed to be integers
8, 9 and 10 are supposed to be arrays
You can do a quick variable type validation before you start processing any of this data and return a validation error response if one of the ten fails.
foreach ($data as $varName => $varValue) {
    $varType = gettype($varValue);
    if (!$this->isTypeValid($varName, $varType)) {
        // return validation error
    }
}
// continue processing
Think of the scenario where you process the data directly and then the 10th value turns out to be of an invalid type. Processing the previous 9 variables was a waste of resources, since you end up returning a validation error response anyway. On top of that, you have to roll back any changes already persisted to your storage.
I only use the variable type in my example, but I would suggest full validation (length, max/min values, etc.) of all variables before processing any of them.
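Here is a Python version of the same idea, sketched under assumptions. It checks presence, type and a length limit for all ten variables before any processing, and collects every error instead of stopping at the first one. The names var1..var10 and the 255-character limit are hypothetical stand-ins for your real rules.

```python
# Hypothetical rules for the ten-variable example:
# var1..var5 strings, var6..var7 integers, var8..var10 arrays (lists).
EXPECTED_TYPES = {
    **{f"var{i}": str for i in range(1, 6)},
    **{f"var{i}": int for i in range(6, 8)},
    **{f"var{i}": list for i in range(8, 11)},
}
MAX_STRING_LENGTH = 255  # assumed limit, for illustration only


def validate_upfront(data):
    """Return a list of validation errors; an empty list means safe to process."""
    errors = []
    missing = EXPECTED_TYPES.keys() - data.keys()
    unexpected = data.keys() - EXPECTED_TYPES.keys()
    errors += [f"missing: {k}" for k in sorted(missing)]
    errors += [f"unexpected: {k}" for k in sorted(unexpected)]
    for name, expected in EXPECTED_TYPES.items():
        if name not in data:
            continue  # already reported as missing
        value = data[name]
        if type(value) is not expected:  # strict: True does not pass as int
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        elif expected is str and len(value) > MAX_STRING_LENGTH:
            errors.append(f"{name}: longer than {MAX_STRING_LENGTH} characters")
    return errors
```

A request handler would call validate_upfront first and return a validation error response if the list is non-empty, so nothing is persisted and nothing needs rolling back.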
In general, the first option would be the way to go. The only reason why you might need to think of the second option is if you were dealing with JSON data which was tens of MBs large or more.
In other words, only if you are trying to stream JSON and process it on the fly will you need to think about the second option.
Assuming that you are dealing with a few hundred KB at most per JSON, you can just go for option one.
Here are some steps you could follow:
1. Use a JSON parser like GSON to convert your entire JSON input into the corresponding Java domain model object. (If GSON doesn't throw an exception, you know the JSON could be parsed into your model's types.)
2. Of course, the objects constructed by GSON in step 1 may not be in a functionally valid state. For example, functional checks such as mandatory fields and limit checks still have to be done. For this, you could define a validateState method that recursively validates the state of the object itself and its child objects.
Here is an example of a validateState method:
public void validateState() {
    // Assume this validateState is part of the Customer class.
    if (age < 12 || age > 150)
        throw new IllegalArgumentException("Age should be in the range 12 to 150");
    if (age < 18 && (guardianId == null || guardianId.trim().isEmpty()))
        throw new IllegalArgumentException("Guardian id is mandatory for minors");
    for (Account a : getAccounts()) {
        a.validateState(); // Throws appropriate exceptions if any inconsistency in state
    }
}
The answer depends entirely on your use case.
If you expect all calls to originate from trusted clients, then upfront schema validation should be implemented so that it is activated only when you set a debug flag.
However, if your server delivers public API services, then you should validate the calls upfront. This isn't just a performance issue: your server will likely be scrutinized for security vulnerabilities by your customers, hackers, rivals, etc.
If your server delivers private API services to non-trusted clients (e.g., in a closed network setup where it has to integrate with systems from third-party developers), then you should at least run upfront those checks that will save you from getting blamed for someone else's goofs.
It really depends on your requirements. But in general I'd always go for #1.
A few considerations:
For consistency I'd use method #1, for performance #2. However, when using #2 you have to take into account that rolling back in case of invalid input may become complicated in the future, as the logic changes.
JSON validation should not take that long. In Python you can use ujson for parsing JSON strings; it is an ultrafast C implementation of the json Python module.
For validation, I use the jsonschema Python module, which makes JSON validation easy.
Another approach:
If you use jsonschema, you can validate the JSON request in steps. I'd perform an initial validation of the most common/important parts of the JSON structure, and validate the remaining parts along the business logic path. This lets you write simpler, and therefore more lightweight, JSON schemas.
The final decision:
If (and only if) this decision is critical, I'd implement both solutions, time-profile them with correct and wrong input, and weight the results by the wrong-input frequency. Therefore:
1c = average time spent with method 1 on correct input
1w = average time spent with method 1 on wrong input
2c = average time spent with method 2 on correct input
2w = average time spent with method 2 on wrong input
CR = correct input rate (or frequency)
WR = wrong input rate (or frequency)
if (1c * CR) + (1w * WR) <= (2c * CR) + (2w * WR):
    choose method 1
else:
    choose method 2
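The decision rule above translates directly into a small function. This sketch assumes CR and WR are fractions of all requests, so CR = 1 - WR. If they are raw counts instead, the comparison works the same way with counts. choose_method is a hypothetical name, and the timings come from your own profiling.

```python
def choose_method(t1c, t1w, t2c, t2w, wrong_rate):
    """Pick method 1 or 2 by expected time per request.

    t1c/t1w: average time of method 1 on correct/wrong input (same for t2c/t2w).
    wrong_rate: WR as a fraction in [0, 1]; CR is taken as 1 - WR.
    """
    if not 0.0 <= wrong_rate <= 1.0:
        raise ValueError("wrong_rate must be between 0 and 1")
    correct_rate = 1.0 - wrong_rate
    cost1 = t1c * correct_rate + t1w * wrong_rate
    cost2 = t2c * correct_rate + t2w * wrong_rate
    return 1 if cost1 <= cost2 else 2  # ties go to method 1, as in the rule above
```

For example, if method 1 costs 10 ms on correct input but only 2 ms on wrong input (it aborts early), while method 2 costs 9 ms and 10 ms, method 1 wins once wrong input is frequent enough. At a 50% wrong-input rate the expected costs are 6 ms vs 9.5 ms.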

Unit test being confused by three question marks

I'm writing some JUnit tests and have this check, which compares the keys and values of two HashMaps:
Iterator<Map.Entry<String, String>> it = expected.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String, String> pairs = it.next();
    assertTrue("Checks key exists", actual.containsKey(pairs.getKey()));
    assertThat("Checks value", actual.get(pairs.getKey()), equalTo(pairs.getValue()));
}
This works great, but I have a value that trips it up:
java.lang.AssertionError: Checks value
Expected: "Member???s "
but: was "Member���s "
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
I checked the data, and the data is correct. It appears that the triple ? is tripping something up somehow. Does anyone know why this would happen? It seems pretty basic to me; it's not even Hamcrest getting messed up, it's the actual assert.
You have an encoding conflict. It can manifest itself in many different ways, but it is generally caused by not enforcing a consistent encoding across the board.
Assuming you are using UTF-8 somewhere...
If you use Maven, set the property project.build.sourceEncoding to UTF-8. See the Maven documentation for more details.
Other build systems will certainly have options to specify code and resource file encodings.
If using IO to read (or write), always specify the encoding. For example, when reading from a file:
Reader reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
In short, find any build system setting for encoding, and any point where IO is involved when reading text, and ensure your desired encoding is set.
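The three question marks can be reproduced in a few lines. Python is used here only to keep the demo short; the same byte-level effect applies to Java. The sketch assumes the character in "Member's" is U+2019 RIGHT SINGLE QUOTATION MARK, which UTF-8 encodes as three bytes. Decoding those bytes with the wrong charset yields one replacement character per byte, and writing that text through an ASCII-only channel turns each one into '?'.

```python
# Assumption: the apostrophe is U+2019, encoded in UTF-8 as E2 80 99.
original = "Member\u2019s "
utf8_bytes = original.encode("utf-8")  # 3 bytes for the one apostrophe

# Wrong charset on read: each non-ASCII byte becomes U+FFFD.
mangled = utf8_bytes.decode("ascii", errors="replace")
# mangled == "Member\ufffd\ufffd\ufffds "

# ASCII-only output: each U+FFFD becomes '?'.
as_ascii_output = mangled.encode("ascii", errors="replace").decode("ascii")
# as_ascii_output == "Member???s "

# Same charset on both sides: the text round-trips cleanly.
round_trip = utf8_bytes.decode("utf-8")
```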