When using officedown, changing from portrait to landscape causes problems with page numbering - officer

I have been using officer and officedown a lot in the past months. These are great packages that save me a lot of time and effort, since many of my collaborators want *.docx files.
However, I have one problem which can be reproduced using the bookdown example included in the package. After installing the packages you need to run:
dir <- system.file(package = "officedown", "examples", "bookdown")
file.copy(dir, getwd(), recursive = TRUE, overwrite = TRUE)
rmarkdown::render_site("bookdown")
browseURL("bookdown/_book/bookdown.docx")
The resulting Word document shows all the possibilities of officedown. However, the document appears to have 10 pages, while Word “says” that there are 11. After inserting page numbers using the MS Word function, you can see that in the bookdown example page 5 is in portrait and the next page (landscape) has the number 7. Page 6 seems to be missing. If you print the file or convert it to PDF, an (empty) page 6 appears. I have encountered the problem always and only when I included landscape pages using “<!---BLOCK_LANDSCAPE_START--->” and “<!---BLOCK_LANDSCAPE_STOP--->”. A change from landscape back to portrait does not seem to cause problems.
Any ideas to solve this?
Best wishes
Jörg

I don't think the issue is related to BLOCK_LANDSCAPE_START because my current documents do not run into this error. I think it might be related to how you are producing your document.
Is there a reason you are using these 2 lines?
rmarkdown::render_site("bookdown")
browseURL("bookdown/_book/bookdown.docx")
Assuming you have officedown installed try the following steps
When rendered using the knit button (Ctrl+Shift+K), it does not produce the extra page you describe.

Reliable method of scraping page source i.e the tv at the beginning of each line?

When extracting data you can use CSS selectors or XPaths. But is there a similar, reliable method of doing this in the page source?
www.amazon.com/Best-Sellers-Electronics-Televisions/zgbs/electronics/172659
You could get the page source and then parse it using regex, but that would probably not be reliable if, for instance, a TV did not load on the page. I have looked up various solutions, but I have yet to find one that mentions getting every TV at the start of each line (1, 4, 7, etc. in the source) or using a reliable method, e.g. CSS/XPaths, on the source of a page.
What is the gold standard for a reliable method of doing what I am after?
To get the page source you can use curl if the page is rendered entirely on the server side (most pages won't be), or headless Chrome to get the actual DOM that will render in the browser (https://developers.google.com/web/updates/2017/04/headless-chrome).
For scraping the content, I've used cheerio (https://github.com/cheeriojs/cheerio) which will allow you to read in HTML to an object and then scrape your data off that using jQuery expressions. (Headless chrome allows you to execute JS on the pages you visit, so you don't necessarily need cheerio).
In your specific example you could get the TV on each line by combining the right class selectors to get the divs containing TVs, and using an attribute selector with 'margin-left=0px', which would get the first item on each line. That is obviously very much bound to the structure of the page and will likely be broken by the smallest of changes in the page source. (And not really any different from using XPaths. Still better than regex though.)
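As a rough illustration of the "class plus margin-left" idea, here is a minimal Python sketch using only the standard library's html.parser. The class name "item" and the inline style "margin-left: 0px" are assumptions made for the example, not taken from the actual Amazon markup, and the same fragility caveat applies.

```python
from html.parser import HTMLParser


class FirstInRowParser(HTMLParser):
    """Collect the text of <div> elements that start a row.

    Assumption: items are <div class="item"> and the first item on each
    row carries an inline style containing "margin-left: 0px".
    """

    def __init__(self, item_class="item"):
        super().__init__()
        self.item_class = item_class  # hypothetical class name
        self.depth = 0      # div nesting depth inside a matching div
        self.current = []   # text pieces of the div being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # Nested div inside a matching div: just track the depth.
            self.depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        style = (attrs.get("style") or "").replace(" ", "").lower()
        if self.item_class in classes and "margin-left:0px" in style:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append(" ".join(self.current))

    def handle_data(self, data):
        if self.depth and data.strip():
            self.current.append(data.strip())


def first_items(html, item_class="item"):
    """Return the text of every row-starting item found in the HTML."""
    parser = FirstInRowParser(item_class)
    parser.feed(html)
    return parser.results
```

An empty result from first_items is a cheap signal that the page did not load fully or that its structure changed, which is where the graceful-failure handling would hook in.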
With certain elements loading / not loading on the page (if that was what you meant by TV not being there), no golden solutions that I know of, except allowing sufficient time for the page to load and handling your scraper failing gracefully.

Html code broke when shiny upgraded to bootstrap 3

When Shiny upgraded to bootstrap 3, some of my programs looked wonky as a result. So I used shinybootstrap2 for backward compatibility. Unfortunately, there's still a chunk of code that's not being displayed properly (see column Link in the snapshot below). Before the upgrade, this column used to display hyperlinks which take the user to an external website upon clicking. However, now they are just being displayed as text and do not behave interactively. Here's the code I used to populate the column:
paste('<a href = ', shQuote(url), '>', 'Click</a>')
Here, url is just another variable where the actual link address is stored.
Any clue/thoughts as to why this might be happening?
Try this to allow parsing of links...
output$table <- renderDataTable({
  get_table()
}, escape = FALSE)
Or escape individual columns as indicated in the documentation
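To see why escaping turns the links into plain text, here is a small Python sketch of what HTML escaping does to an anchor tag. It uses Python's html module as a stand-in to show the mechanism. It is not the Shiny/DataTables code path, and the URL is a made-up placeholder.

```python
import html

url = "https://example.com"  # hypothetical address
link = '<a href="{}">Click</a>'.format(url)

# Escaping replaces the markup characters with entities, so the browser
# shows the tag as literal text instead of rendering a hyperlink.
escaped = html.escape(link)
print(escaped)
# &lt;a href=&quot;https://example.com&quot;&gt;Click&lt;/a&gt;
```

With escaping turned off for that column, the original string reaches the browser unchanged and is rendered as a clickable link.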

Design a Java webapp that prints HTML signs and labels

I want to design a webapp that can print signs for various products, for example in a big store.
The content of the signs (product names, descriptions, prices) comes from the server and changes daily. Each product can be printed to an A3 or A4 document.
It is also possible to have 3 signs on one A4 page.
In addition, each product type has a differently designed sign (TVs have the price at the top of the page in RED, and printers have the price at the bottom left in BOLD).
The idea is that the program will get the product data from the DB, push it into an HTML template according to the page size and product type, and print the HTML (or convert the HTML to PDF and print).
Some problems I have faced so far:
- Text fields from the DB can be too long and overlap an area with other text or scramble the rest of the sign.
- There are many product types and each one has its own HTML design and CSS, so it's very hard to maintain if I need to change things.
- Different browsers show the sign differently.
- Different printers print the sign differently.
What would be the best way to approach the problem? Could CSS frameworks help?
I'm open to ideas.
I've developed an app that does printing, and HTML layout is about the furthest direction from the path that I would take. HTML printing loses elements such as background, positioning, etc. quite unpredictably, and it depends on the printer brand and driver. If you're serious about going this route, the only two paths I'd consider are PostScript or Adobe PDF. HTML can be a valid "preview", but there again you will be fighting against the discrepancies between how the browsers render your code to the screen; no two are the same. Better still to generate a .pdf and just display it.
In my app, I do general layout snapped to a draggable grid in JavaScript, then output the coordinates and elements to a database that my (very specialized) printer picks up via an automated text document FTP and reassembles using a proprietary print server. From there, the print server puts all the elements together, positions them via the grid and outputs the job. It's been months in the making and a huge pain to build, but the outcome is just what my company needed for custom printing on demand. We train all our users to understand that layout is not guaranteed perfect like InDesign or Quark, and even then we get occasional complaints. Bottom line: the web wasn't made to be a print layout tool!
Use XML with a server-side XSLT transformation.
Keep the data in standard XML (store that XML in the DB).
Keep the style in XSLT (select the XSLT depending on the product company).
This could be pretty complex, but you can apply style templates in the form of XSLT.
Most browsers support this if you do it on the server side and stream the result.
If you want PDF, HTML, or Word docs to be generated, just write XSL-FO and use the Apache Xalan framework (paired with an XSL-FO formatter such as Apache FOP for PDF output) to create them.
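The underlying idea, keeping product data separate from a per-type style template, can be sketched without XSLT. The Python example below swaps in the standard library's string.Template as a stand-in for the stylesheets. The template markup, the field names, and the 80-character description limit are all assumptions made up for illustration.

```python
from html import escape
from string import Template

# Hypothetical templates, one per product type (stand-ins for XSLT files).
TEMPLATES = {
    "tv": Template(
        '<div class="sign tv"><p class="price top red">$price</p>'
        "<h1>$name</h1><p>$description</p></div>"
    ),
    "printer": Template(
        '<div class="sign printer"><h1>$name</h1><p>$description</p>'
        '<p class="price bottom-left bold">$price</p></div>'
    ),
}

MAX_DESCRIPTION = 80  # assumed limit so long DB text cannot overflow the sign


def shorten(text, limit):
    """Cut text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def render_sign(product):
    """Pick the template for the product type and fill in escaped data."""
    template = TEMPLATES[product["type"]]
    return template.substitute(
        name=escape(product["name"]),
        description=escape(shorten(product["description"], MAX_DESCRIPTION)),
        price=escape(product["price"]),
    )
```

Changing how TV signs look then means editing one template, and the length check runs in one place before any layout happens. Whether truncation is acceptable for a real sign is a business decision. Shrinking the font or rejecting the record are alternatives.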

Is there anything wrong with this Google App Engine url-fetching code that I have here?

When I view the source of the page in my browser (Firefox) (View->Page Source), copy it and paste it into my HTML editor, I view almost the same page (in this example it is www.google.com) as it appears in my browser. But when I get the HTML source through this code (through Google App Engine):
from google.appengine.api import urlfetch
url = "http://www.google.com/"
result = urlfetch.fetch(url)
if result.status_code == 200:
    print result.content
copy it and paste it into my HTML editor, the page then looks quite different. Why is it so? Is there something wrong with the code?
++++++++++++++++++++++++++++++
Follow-up:
By this moment (Sunday, December 13th, 2009, 1:01 PM, GMT, to be precise) I have received two comments-questions (from Aaron and Christian P.) and one answer from Alex Martelli.
Both Aaron and Christian P. are asking what actually differs between the Firefox-obtained source and the Google-App-Engine-obtained source when they are both displayed through the same HTML editor.
Here I have uploaded two screenshots:
One shows the Firefox-obtained source
And the other one shows the Google-App-Engine-obtained source
when they are both displayed through the “MS Front Page” editor.
One difference, which is quite obvious, is the encoding: in the Firefox code everything is displayed in English, while in the Google App Engine code I get a lot of various symbols instead.
Another difference is some additional lines at the top of the page in the Google App Engine code. I think, this is what Alex Martelli was talking about in his answer (“…the fetch-and-print approach is going to have metadata around it as well…”).
One more minor difference is that the box for the Google image is split into several boxes in one code, while it remains whole in the other one.
Alex Martelli suggested that I use this code (if I understood him correctly):
from google.appengine.api import urlfetch
url = "http://www.google.com/"
result = urlfetch.fetch(url)
if result.status_code == 200:
    print "content-type: text/plain"
    print
I’ve tried it, but in this case nothing is displayed at all.
Thank you all for your responses and, please, continue responding – I really want to see this issue finally resolved.
++++++++++++++++++++++++++++++
Follow-up:
Okay, the issue has been resolved.
I did not pay full attention to Alex Martelli's instructions and, therefore, came up with the wrong code. Here is the right one:
from google.appengine.api import urlfetch
url = "http://www.google.com/"
result = urlfetch.fetch(url)
if result.status_code == 200:
    print "content-type: text/plain"
    print
    print result.content
This code displays exactly what is needed - no additional lines at the top of the page.
Well, I still get the strange symbols, but I discovered that it's probably Google's problem. The thing is I am currently in Taiwan, and Google seems to be aware of that and automatically switches from www.google.com (which is in English) to www.google.com.tw (which is in Chinese), but this one, I guess, is already another topic.
Thanks to everyone who has responded here.
You have not explicitly emitted a "content type" header, and an end-of-headers empty line, so the first few lines are probably going to be lost; try adding before the final print something like
print "content-type: text/plain"
print
Beyond this, what you're getting in either case is essentially a big <script> with a little extra HTML around it -- that's all that Firefox is going to give you in the "view source" page, while the fetch-and-print approach is going to have metadata around it as well, e.g., the "doctype" (depending on what HTML editor you're targeting, this may or may not be an issue).
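To make the header point concrete, here is a minimal Python 3 sketch of the CGI-style response layout: header lines, one empty line, then the body. The function names are made up for illustration, and the second function only mimics how a server would split the output. It is not App Engine's actual implementation.

```python
def build_cgi_response(body, content_type="text/plain"):
    """Return header line, blank separator line, then the body."""
    lines = ["Content-Type: " + content_type, "", body]
    return "\n".join(lines)


def split_cgi_response(raw):
    """Split raw output the way a server would: headers end at the
    first empty line, and everything after it is the body."""
    head, _, body = raw.partition("\n\n")
    return head.split("\n"), body
```

If the output starts with the page content instead of a header, whatever precedes the first empty line is consumed as "headers", which matches the symptom of the first few lines going missing.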

Printing an HTML email that contains a table longer than 1 page

When I try printing an HTML-email with a table that stretches multiple pages then Windows-Mail and IE7 will only print whatever fits on the first page but Firefox prints everything just fine.
Any thoughts how I can make the Microsoft products print the entire thing?
Regards, Pieter
If this is just for a one-off email, you can try one of the following:
1) Click on file > print to ensure you get the print dialog box
2) Ensure that "all pages" is selected in the print dialog
3) Confirm the print job
or
1) Copy and paste the email into another application, such as Word and print the document from there.
If this is not a one-off issue, you could look at generating the information as a printable attachment, for example PDF format - which gives you more control over print-layout than HTML does.
I just dove into the problem again and found out that Internet Explorer, and Windows Mail as well (same HTML engine?), cannot handle nested tables (a table inside the cell of another table) very well when it comes to printing them.
Removing the nested tables from the HTML email solved the problem (and created another one, but I was able to solve the layout issues nicely without nested tables ;) )
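If you need to check many generated emails for this, a quick scan for nested tables can be done before sending. A minimal Python sketch with the standard library's html.parser follows. The function name is made up, and the parser assumes reasonably well-formed markup with explicit closing table tags.

```python
from html.parser import HTMLParser


class TableDepthParser(HTMLParser):
    """Track how deeply <table> elements are nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def max_table_depth(html):
    """Return 0 for no tables, 1 for flat tables, 2 or more for nesting."""
    parser = TableDepthParser()
    parser.feed(html)
    return parser.max_depth
```

A result of 2 or more means at least one table sits inside another, which is the layout that failed to print here.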