(AS3) Getting an HTML-specific character index in a textfield after word wrap - html

I didn't know how to phrase the title, so sorry about that. If you have a better title suggestion, let me know and I'll change it.
I've got a chunk of text that is displayed as HTML in a TextField. An example of this text is this:
1
<font size="30" color="#FF0000">When your only tool is a hammer, all problems start looking like nails.</font>
</br>
2
<i>99 percent of lawyers give the rest a bad name.</i>
<b>Artificial intelligence is no match for natural stupidity.</b>
<u>The last thing I want to do is insult you. But it IS on the list.</u>
</br>
3<showimage=Images/image1.jpg>
I don't have a solution, but I do admire the problem.
The only substitute for good manners is fast reflexes.
Support bacteria - they're the only culture some people have.
</br>
4
Letting the cat out of the bag is a whole lot easier than putting it back in.
Well, here I am! What are your other two wishes?
Most of the tags are basic, meant to show what I can do formatting-wise. However, since Adobe AIR has a sandbox that prevents inline images (via the <img src='foo.png'> tag), I've had to come up with another way to display images.
Basically, I intend to have an image displayed somewhere on the screen, and as the user scrolls, the image will change based on where in the text they have scrolled to. The image can be a background image, a slideshow on the right, anything really.
In the snippet above, look for my custom tag <showimage=Images/image1.jpg>. I want to get the local y position of that tag once the TextField is rendered as HTML and word wrapped. The trouble is, when I query the y position of the tag (using getCharBoundaries), I can only find the tag if I assign the text through .text instead of .htmlText. If I search for the tag after assigning the text through .htmlText, it isn't found, because the tags are hidden and replaced with formatting.
The trouble with the y value I get before rendering the HTML is that it will be wrong: font sizes, hidden tags and word wrap all change the line, and therefore the y value, at which the tag ends up.
How do I get the correct y value of an HTML tag once the HTML has been rendered?
I've considered using a different style of tag, maybe something like &&&&&showImage=Images/image1.jpg&&&&, but that seems like a cop-out, and I'd still run into problems if several of those tags were in a block of text: once the tags were removed, word wrap would shift the lines in a pretty unpredictable way.

myTextField.textHeight tells you the height of the text in pixels. So you can split the string on whatever you're looking for, put the text before your target in the textField and get the textHeight, then put the rest of the text in.
Here's some example code - tMain is the name of the textField:
var iTextHeight: int = 0;
var sText: String = '<font size="30" color="#FF0000">When your only tool is a hammer, all problems start looking like nails.</font></br><i>99 percent of lawyers give the rest a bad name.</i><b>Artificial intelligence is no match for natural stupidity.</b><u>The last thing I want to do is insult you. But it IS on the list.</u></br><showimage=Images/image1.jpg> I don\'t have a solution, but I do admire the problem. The only substitute for good manners is fast reflexes. Support bacteria - they\'re the only culture some people have. </br>Letting the cat out of the bag is a whole lot easier than putting it back in. Well, here I am! What are your other two wishes?';
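var aStringParts:Array = sText.split("<showimage=Images/image1.jpg>");
// Render only the HTML that precedes the tag and measure its height.
tMain.htmlText = aStringParts[0];
iTextHeight = tMain.textHeight;
trace("height of text: " + iTextHeight);
// Then render the full text as HTML with the tag removed.
// (appendText() would add the remainder as plain text and show its tags literally.)
tMain.htmlText = aStringParts.join("");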
sText gets split on the tag you're looking for (this removes the tag and breaks the remaining text into an array). The text leading up to the tag is put in the textField and its textHeight is traced. Then the rest of the text is put in the textField. The traced height is the y pixel value you need to arrange things.
Let me know of any questions you have.
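The question also asks about several tags in one block of text. Here is a minimal sketch of how the split step above might be generalised, written in JavaScript rather than AS3 and covering only the string handling. The helper name splitOnImageTags and the returned field names are made up for illustration. Each precedingHtml would still have to be assigned to the TextField's htmlText and measured with textHeight, as in the answer above.

```javascript
// Hypothetical helper: find every <showimage=PATH> tag and return, for each,
// the image path plus the HTML that precedes it (earlier custom tags removed).
function splitOnImageTags(html) {
  const tagPattern = /<showimage=([^>]+)>/g;
  const markers = [];
  let cleanSoFar = ""; // HTML before the current tag, without custom tags
  let lastIndex = 0;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    cleanSoFar += html.slice(lastIndex, match.index);
    markers.push({ image: match[1], precedingHtml: cleanSoFar });
    lastIndex = match.index + match[0].length;
  }
  // Whatever follows the last tag completes the tag-free HTML.
  const cleanHtml = cleanSoFar + html.slice(lastIndex);
  return { cleanHtml, markers };
}

const sample = "<b>One</b><showimage=Images/a.jpg>Two<showimage=Images/b.jpg>Three";
const result = splitOnImageTags(sample);
console.log(result.markers.map(m => m.image));
console.log(result.cleanHtml);
```

Measuring each prefix separately costs one layout pass per tag, which should be fine for a handful of images but may be worth caching for long texts.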

Instead of going through the trouble of parsing your image tag, have you tried playing with HTMLLoader and using the loadString method? This should load everything in its proper place including the image using the img tag.
private var htmlLoader:HTMLLoader;
private function loadHtml(content:String):void
{
    htmlLoader = new HTMLLoader(); //Constructor
    htmlLoader.addEventListener(Event.COMPLETE, handleHtmlLoadComplete); //Handle complete
    htmlLoader.loadString(content); //Load html from string
}
private function handleHtmlLoadComplete(e:Event):void
{
    htmlLoader.removeEventListener(Event.COMPLETE, handleHtmlLoadComplete); //Always remove event listeners!
    htmlLoader.width = htmlLoader.contentWidth; //Set width and height container
    htmlLoader.height = htmlLoader.contentHeight;
    addChild(htmlLoader); //Add to stage
}

Another approach is to search your HTML string for <showImage ..> tags and replace them with shortcodes, e.g. [showImage ..], before inserting the HTML string into the TextField. The shortcode is then plain text rather than a tag, so it survives rendering and you can retrieve its y value (that is, if I understand your issue correctly).
Then the rest of your code can take it from there.
(P.S. Using HTMLLoader seems like a nice alternative, though.)
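A rough sketch of that shortcode swap, in JavaScript rather than AS3 (AS3's String.replace also accepts a RegExp, but treat the port as an assumption). The function names tagsToShortcodes and findShortcodes are hypothetical. The character index it reports is the kind of value you could pass to getCharBoundaries once the text has been rendered.

```javascript
// Swap each <showimage=PATH> tag for a [showimage=PATH] shortcode so the
// marker survives HTML rendering as ordinary visible text.
function tagsToShortcodes(html) {
  return html.replace(/<showimage=([^>]+)>/gi, "[showimage=$1]");
}

// Locate every shortcode in the rendered plain text and report its image
// path and the character index where the shortcode starts.
function findShortcodes(plainText) {
  const found = [];
  const pattern = /\[showimage=([^\]]+)\]/gi;
  let match;
  while ((match = pattern.exec(plainText)) !== null) {
    found.push({ image: match[1], charIndex: match.index });
  }
  return found;
}

const converted = tagsToShortcodes("3<showimage=Images/image1.jpg> I don't have a solution");
console.log(converted);
console.log(findShortcodes(converted));
```

Note that the shortcode stays visible in the rendered text, so it would still need to be hidden or removed after its position has been read.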

Related

Docs Inserted Image always before all text

I'm making a simple Apps Script that puts images and text into a Google Doc separated into 2 columns. For whatever reason, no matter how I try it, the images always end up above the text in the Doc, even though they are inserted inline and should sit where the text was.
//Replace QR Code
let qrText = editLocalBody.findText("{{qrCode}}");
let setImagePlace = qrText.getElement().asText().replaceText("{{qrCode}}", "");
let qrCodeImage = setImagePlace.getParent().asParagraph().insertInlineImage(0, qrCodeBlob);
From what I've seen, this should insert an image wherever the text was previously located, but when it runs, the image is always in the wrong spot, somehow above the text it was supposed to be in!
//Edit - To Show The Progression Of What Is Supposed To Happen And What Actually Happens:
I'm making QR code badges for a proprietary system that is tightly integrated with Google, so I'm using Apps Script to get an entry from a Google Form containing a number of badges (with relevant data) and autofill a Google Doc accordingly.
// Loop Start
I fill my template with a text line that has keywords in it that I can select and replace later, including a keyword it can use to insert another such line (this part works).
I first find the QR code keyword (findText("{{qrCode}}");) and replace it (.replaceText) with nothing ("").
I then get the parent of the element returned above, which is a block of text, as a paragraph, and insert an inline image at child index 0 (insertInlineImage(0, qrCodeBlob)). I think that parent is all the text in the Doc, and I think this is where the issue lies: the image is put above the text because it's all just one 'paragraph' rather than multiple 'bodies' of text. If I could separate this, I think it would work!
I've debugged this script quite a bit, so I know it's that final line, the one inserting images, that fails: it sees all the text as 'one'.
// I want this (in descending order, each on its own full line):
Image
Text
Image
Text
// What it gives me (in descending order, each on its own full line):
Image
Image
Text
Text
let qrCodeImage = setImagePlace.getParent().asParagraph().insertInlineImage(0, qrCodeBlob);

Access VBA Create Word header with text and position picture

I'm having trouble getting Access VBA to set a Word document's header properly. I've got this:
oDoc.PageSetup.DifferentFirstPageHeaderFooter = True
oDoc.Sections(1).Headers(wdHeaderFooterFirstPage).Range.InlineShapes.AddPicture "C:\Users\mr.helpless\Pictures\doody.jpg"
oDoc.Sections(1).Headers(wdHeaderFooterFirstPage).Range.Text = "hello there"
oDoc.Sections(1).Headers(wdHeaderFooterPrimary).Range.Text = "whooo hooo!"
What happens right now is the text will replace the picture for the first page (subsequent pages are fine).
I need to have the picture and text - and I need to offset the picture to the left about half an inch while text is centered with normal margins.
Any idea how to go about it? Basically I need to set a document letterhead with a logo.
Update
Dim myText As String
myText = "hello there"
With oDoc.Sections(1).Headers(wdHeaderFooterFirstPage)
.Shapes.AddPicture Filename:="C:\Users\mr.helpless\Pictures\doody.jpg", LinkToFile:=False, SaveWithDocument:=True
.Range.Collapse
.Range.InsertAfter (myText)
.Range.Font.Name = "Helvetica"
.Range.Font.Size = 8
.Range.Font.Bold = True
.Range.Paragraphs.Alignment = wdAlignParagraphCenter
End With
I've got half of it done; now I just need to position the image about half an inch to the left of the margin.
Completed Solution
Just add "Left:=-35" to the picture like so (or whatever value works):
.Shapes.AddPicture Filename:="C:\Users\mr.helpless\Pictures\doody.jpg", LinkToFile:=False, SaveWithDocument:=True, Left:=-35
Have you tried recording a macro in Word that does the rough reposition - then bring the code over to Access and edit it for the correct object and size?
All of it is updated in the original thread. It took using .Range.Collapse to add the text alongside the image, and putting Left:=(value) on the picture to move it where I needed it.

Selenium, Python 3, simple scraping text from Erowid LSD experiences?

Based off of an answer on here about a similar thing, I tried to scrape the text of Erowid trip experiences. The URL has a bunch of trip links. I want to click each link and then print the 'report-text-surround' element, which is the trip text.
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.erowid.org/experiences/exp.cgi?S1=2&S2=-3&C1=9&Str=')
#I tried to get hrefs by xpath, knowing that each trip links starts with 'exp.php?ID'.
view_links = driver.find_elements_by_xpath("""//*[contains(text(), 'exp.php?ID')]""")
for index, view in enumerate(view_links):
    html = view.get_attribute('innerHTML')
    href = html.split('"')[1]
    view_links[index] = href
#And then visit each href and get the data
for href in view_links:
    driver.get(href)
    #I know this is the element containing the trip text.
    trip_text = driver.find_elements_by_class_name('report-text-surround')
    for trip in trip_text:
        print (trip.text.encode('utf-8'))
So you are pretty close but there are just 2 small mistakes.
trip_text = driver.find_elements_by_class_name('report-text-surround')
for trip in trip_text:
    print (trip.text.encode('utf-8'))
Your driver.find_elements_by_class_name should not be plural, as there is only one such element on the page. It has a lot of child elements, but only one element has the class 'report-text-surround'. This means you're going to get all the text at once; you could change this, but you'd have to go through the child elements or get the elements separately.
You can change that entire section to this:
text = driver.find_element_by_class_name('report-text-surround').text.encode('utf-8')
print(text)
That will give you all of the text in the entire article. An easy way to split this up afterwards would be to split the text on \n\n.

How to get a html element content

I want to ask if there is a way for me to get the content of a web element. What I mean is:
the site
the program
You don't need to type the site address or where the element is; I need it only for this case (a fully empty site with only a few words).
My question is this: let's say that you have some text on a webpage, and you want that text to appear in a textbox. That's it.
You can use this:
Dim We As New System.Net.WebClient()
textbox1.text = We.DownloadString(_Url)
We.Dispose()

Get page selection including HTML?

I'm writing a Chrome Extension, and I was wondering if it was possible to get the selected text of a particular tab, including the underlying HTML? So if I select a link, it should also return the <a> tag.
I tried looking at the context menu event objects (yes, I'm using a context menu for this), and this is all that comes with the callback:
editable : false
menuItemId : 1
pageUrl : <the URL>
selectionText : <the selected text in plaintext formatting, not HTML>
It also returns a Tab object, but nothing in there was very useful, either.
So I'm kind of at a loss here. Is this even possible? If so, any ideas you might have would be great. Thanks! :)
Getting the selected text of a page is fairly easy, you can do something like
var text = window.getSelection().toString();
and you'll get a text representation of the currently selected text that you can pass from a content script to a background page or a popup.
Getting HTML content is a lot more difficult, mostly because the selection isn't always at a clean HTML boundary in the document (what if you only select a small part of a long link, or a few cells of a table for example). The most direct way to get all of the html associated with a selection is to reference commonAncestorContainer, which is a property on a selection range that corresponds with the deepest node which contains both the start and end of the selection. To get this, you'd do something like:
var selection = window.getSelection();
// Only works with a single range - add extra logic to
// iterate over more ranges if needed
var range = selection.getRangeAt(0);
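var container = range.commonAncestorContainer;
// The common ancestor can be a text node, which has no innerHTML;
// in that case fall back to its parent element.
if (container.nodeType !== Node.ELEMENT_NODE) {
  container = container.parentNode;
}
var html = container.innerHTML;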
Of course, this will likely contain a lot of HTML that wasn't actually selected. It's possible that you could iterate through the children of the common ancestor and prune out anything that wasn't in the selection, but that's going to be a bit more involved and may not be necessary depending on what you're trying to do.
To show how to wrap this all up into an extension, I've written a short sample which you can reference:
http://github.com/kurrik/chrome-extensions/tree/master/contentscript-selection/
If you don't want all of the siblings, just the selected HTML, use range's other methods like .cloneContents() (to copy) or .extractContents() (to cut).
Here I use .cloneContents():
function getSelectedHTML() {
  var range = window.getSelection().getRangeAt(0); // Get the selected range
  var div = document.createElement("div");
  div.appendChild(range.cloneContents()); // Get the document fragment from selected range
  return div.innerHTML; // Return the actual HTML
}