Flying Saucer (xhtmlrenderer) word split

I'm hitting a bug with Flying Saucer (xhtmlrenderer) where a word at the end of a line gets split across two lines, e.g. "thinking" becomes "thin king", with "king" appearing at the beginning of the following line. This is puzzling because the split doesn't seem to follow any pattern and happens rarely and apparently at random, roughly 1 in every 20 PDFs generated.
Has anyone else who has used Flying Saucer encountered a similar issue?

Sorted. Answer available at http://abhirampal.com/2012/04/18/pdf-document-word-split-using-flying-saucerxhtml-renderer/

Related

How can I prevent my polygons from being split into several parts by the erase function?

I have a problem with the Erase function of ET GeoWizards in ArcMap. I don't have the license needed for ArcMap's own Erase tool, but it would probably show the same problem anyway. I have a feature class with many overlapping buffer polygons. These are derived from surfaces and carry a lot of information in the attribute table; there are 55 of them. I created the 10 km buffers in order to measure the additional area each buffer adds around its surface. However, the buffers now also overlap other areas that should not be counted again. So I want to erase the "origin surfaces" out of each of the 55 buffer rings, but without each ring being cut up into several individual pieces. If that cannot be avoided, each ring should at least remain a single feature with its attributes, so that the attribute table still holds 55 features. Do you know why the splitting happens and how to avoid or fix it?
For now I have saved each buffer individually, applied the Erase function one by one (or via batch), and then merged the results again. But I still have to do this for several people, so for the future I would really like to know whether I have an error in my thinking or whether the problem lies in the program. I'd be happy for any help :)
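For what it's worth, the same workflow can be reproduced outside ET GeoWizards. Below is a minimal Python sketch using geopandas (the package choice and the toy circle geometries are assumptions, not your actual data): erase the origin surfaces from the buffers, and if the tool explodes the result into single-part fragments, dissolve on the original ID so each ring becomes one (multi-part) feature again.

import geopandas as gpd
from shapely.geometry import Point

# One origin surface (a disk) plus two more placed so that their footprints
# cut clean through the first surface's buffer ring.
origins = gpd.GeoDataFrame(
    {"id": [1, 2, 3]},
    geometry=[
        Point(0, 0).buffer(5000),
        Point(10000, 0).buffer(5500),
        Point(-10000, 0).buffer(5500),
    ],
)
buffers = origins.copy()
buffers["geometry"] = origins.geometry.buffer(10000)  # 10 km buffer per surface

# Erase: subtract the union of all origin surfaces from every buffer.
erased = buffers.copy()
erased["geometry"] = buffers.geometry.difference(origins.geometry.unary_union)

# A tool that explodes multi-part results produces extra rows ...
fragments = erased.explode(index_parts=False)

# ... but dissolving on the original ID collapses them back to one row
# per buffer ring, so the attribute table keeps its original feature count.
rings = fragments.dissolve(by="id", as_index=False)
print(len(fragments), "fragments ->", len(rings), "features")  # 4 fragments -> 3 features

The dissolve step is the key: it makes fragmentation harmless, because every fragment still carries the ID of the ring it came from.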

Allow page breaks inside the row/cell of Tablix

Not really new to SSRS, but I still can't get around an issue that has been bothering me for some time ...
So I have a page with a title, a text box, then a Tablix, then another text box ...
The Tablix receives user comments of varying length; some are short, some rather long, and the long ones cause the problems ...
So I'm trying to avoid the white space that comes from (I presume) a page break.
Almost forgot ... the problem appears when I render the report to PDF. That is the output that matters most.
I'm not that new, so I know how to disable the 'Keep together' options, and I can only say that I did.
I also checked all the static members (in the Grouping pane with Advanced Mode enabled).
I'm prepared to make as many attempts as needed to get to the bottom of this, so please don't hesitate to spell out every step you think I should follow. I'm also willing to give any additional info you may need; just ask ... I was trying not to write a novel for a problem you can easily spot in the image.
Thanks in advance!
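One blunt way to rule out a missed flag: the report definition (.rdl) is plain XML, so every KeepTogether element can be flipped programmatically before re-deploying. A rough Python sketch (the file names are placeholders; the tag match deliberately ignores the RDL namespace, which varies by version):

import xml.etree.ElementTree as ET

tree = ET.parse("report.rdl")  # placeholder path to the report definition
root = tree.getroot()

# Preserve the document's default namespace when writing back out.
if root.tag.startswith("{"):
    ET.register_namespace("", root.tag[1:root.tag.index("}")])

changed = 0
for elem in root.iter():
    # RDL tags are namespaced, so compare on the local name only.
    if elem.tag.split("}")[-1] == "KeepTogether" and elem.text != "false":
        elem.text = "false"
        changed += 1

tree.write("report_nokeep.rdl", xml_declaration=True, encoding="utf-8")
print(f"Flipped {changed} KeepTogether flag(s)")

If the white space survives even with every flag set to false, the page break is coming from somewhere other than a Keep together setting.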

Tesseract handles similar pictures completely differently

I was just playing around a little with Tesseract when I noticed some strange behavior that I can't explain. First I gave Tesseract this preprocessed Picture1, but it didn't recognize a single letter.
Then I put this one in and guess what it gave me?
Neuinitialisierung des automatischen
Karten-Updates erforderlich. Aktuellste
The exact letters and words; every single letter was correct!
So can anybody tell me why it didn't get the text in the first picture?
(btw, I preprocessed the two pictures in exactly the same way)
Thanks in advance!
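No answer was recorded here, but page-segmentation mode, language data, and input DPI often explain why two visually similar images OCR completely differently. A hedged sketch using pytesseract (an assumption: the question does not say how Tesseract was invoked, and the file names are placeholders):

from PIL import Image
import pytesseract

for name in ("picture1.png", "picture2.png"):  # placeholder file names
    img = Image.open(name)
    # --psm 6 treats the image as a single uniform block of text;
    # lang="deu" matches the German sample output quoted above.
    text = pytesseract.image_to_string(img, lang="deu", config="--psm 6")
    print(name, repr(text))

Comparing a few --psm values (e.g. 3, 6, 7) is a quick way to see whether the layout analysis, rather than the character recognition, is what fails on the first picture.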

Find the average length of the preceding word of another word in a string of text

I'm trying to write a function prevword_ave_len(word) that takes a string argument word and returns the average length, in characters, of the words that immediately precede the occurrences of word in the text.
The text is the first paragraph of Moby Dick:
Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.
There are a few special requirements to be aware of:
If word happens to be the first word occurring in the text, then the length of the preceding word for that occurrence should be counted as 0.
If word is not in the text then the function should return False.
A "word" is simply a string that is delimited by "whitespace." Punctuation following a word is included as part of the word.
The casing in the original text and in word should be preserved.
How would I go about doing this? My thought process was to split the text into a list of words, then use a for loop to search for each instance of word; wherever word is found, somehow index the word before it, find its length, and append it to a list. Then I would average the elements of that list, and that would be my output. I just don't know how to implement this.
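That plan works almost word for word. A minimal sketch of it (taking the text as a second argument, as the answer below also does):

def prevword_ave_len(word, text):
    textlist = text.split()  # whitespace-delimited words, punctuation included
    # length of the word before each occurrence of `word` (0 for the first word)
    lengths = [len(textlist[i - 1]) if i > 0 else 0
               for i, w in enumerate(textlist) if w == word]
    return sum(lengths) / len(lengths) if lengths else False

The answer below takes a different route and precomputes the preceding lengths for every word at once.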
This solution uses a dictionary whose values are lists of the lengths of all preceding words.
The example below prints the result for the word the (last line).
In case you are not familiar with defaultdict, take a look at the collections module documentation.
from collections import defaultdict

def prevword_ave_len(word, text):
    words = defaultdict(list)     # maps each word to the lengths of its preceding words
    textlist = text.split()       # split text into whitespace-delimited words
    words[textlist[0]].append(0)  # the first word's preceding length counts as 0
    # iterate over the remaining words, appending each preceding word's length
    for i in range(1, len(textlist)):
        words[textlist[i]].append(len(textlist[i - 1]))
    if word in words:
        return sum(words[word]) / len(words[word])  # calculate the mean
    else:
        return False

if __name__ == "__main__":
    text = "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me."
    print(prevword_ave_len('the', text))

"L" characters showing up randomly in text in IE 8

I'm having this problem with L characters showing up in IE 8. It's happening in the Healthcare Professionals block and the bottom two blocks. Any experience with this, or a clue as to what's wrong? I'm going to start deconstructing the whole page soon and rebuilding it line by line, but it would be great to get an answer as to what the heck the cause is.
Maybe you can refer to this https://webmasters.stackexchange.com/questions/15709/strange-characters-appearing-on-websites-ascii-unicode
There may be some encoding issue with the content.
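To make that theory concrete, here is a small illustration (not the actual page, just the general mechanism) of how a charset mismatch injects stray characters: UTF-8 bytes decoded as Windows-1252 turn a single curly apostrophe into three junk characters.

# A curly apostrophe (U+2019) encoded as UTF-8 but decoded as Windows-1252:
s = "it\u2019s"
mojibake = s.encode("utf-8").decode("windows-1252")
print(mojibake)  # prints: itâ€™s  (three stray characters where one quote should be)

Checking that the server's Content-Type header, the page's meta charset, and the file's actual encoding all agree is usually the fastest way to rule this out.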