I'm trying to make a plot with a two-layer strip. I want the first layer of strips to have a horizontal text orientation and the second layer to have a vertical text orientation.
In the example below, I want the strip layers that say 'horizontal' to be horizontal and I want '1999' and '2008' to remain vertical.
library(ggplot2)
library(ggtext)
library(glue)
df <- mpg
df$outer <- "horizontal"
p <- ggplot(df, aes(displ, cty)) +
  geom_point() +
  theme(
    strip.text.y.left = element_markdown()
  )
p + facet_grid(
  outer + year ~ .,
  switch = "y"
)
The ggtext package is great because it lets us use ggtext::element_markdown() to conditionally format layers of a strip with HTML tags, as in the example below:
p + facet_grid(
  glue("<span style = 'color:red'>{outer}</span>") + year ~ .,
  switch = "y"
)
Created on 2021-07-11 by the reprex package (v1.0.0)
Instead of applying a red color, is there an (HTML) tag I could use to make the text orientation horizontal? I'm not very fluent in HTML. After googling some options, I've tried the following spans with no success:
"<span style = 'transform:rotate(90deg)'>"
"<span style = 'text-orientation:sideways'>"
As a side-note: I know that I can edit the gtable of a plot to manually make edits to labels and whatnot. That is exactly what I'm trying not to do!
In addition to a solution to my problem, there are two other ways I'd consider my question answered.
A link to some documentation that says it is not (yet) possible to do this with ggtext. If that is the case, please post it as an answer with a small description so I can accept it. A post by ggtext's creator, Claus O. Wilke, commenting on this is also fine.*
A code example where an attempt to use canonical HTML tags (besides the two I already tried) fails to rotate the text. I'd then know that someone with more knowledge than me about HTML tried and my question has no apparent solution.
* I'm aware of the paragraph in ggtext's readme that reads the following:
As a general rule, any Markdown, HTML, or CSS feature that isn’t shown in any of the ggtext or gridtext documentation likely doesn’t exist.
I'm fishing for a more explicit statement that says text cannot be rotated with tags.
For better or worse, I don't use LaTeX (yet). I like producing stargazer formatted tables on the fly for class examples in both HTML and in the console. However, I'm having trouble with 3 formatting elements; so far I've found solutions for LaTeX and some in HTML, but the ASCII console text eludes me.
The 3 challenges are:
Breaking a line so that a variable name can wrap instead of increasing the table width.
Aligning coefficients & std. errors at the decimal, even when there are p-value stars.
Making space in the covariate labels & coefficients to allow for a reference group.
Let's start with some reproducible data & outputs to reference.
library(stargazer)
set.seed(3); x1 <- factor(sample(letters[1:4], 1000, replace=TRUE))
set.seed(4); x2 <- runif(1000, -10, 10)
set.seed(5); x3 <- rbinom(1000, size = 1, prob = 0.13)
set.seed(6); y <- runif(1000, -10, 10)
model <- (lm(y ~ x1 + x2 + x3))
stargazer(model, align=TRUE,
          #type="html", out="SO_stargazer.html",
          type="text", out="SO_stargazer.txt",
          title="Example Title Goes Here",
          dep.var.caption="",
          dep.var.labels="This is my long title for the Dependent Variable Y",
          covariate.labels=c("X1 Group B",
                             "X1 Group C",
                             "X1 Group D",
                             "X2 with a super ridiculous and annoyingly long name",
                             "X3"))
Line break
My default approach is to use \n in the character string. For example, I might try to break the DV caption:
dep.var.labels="This is my long title for \n the Dependent Variable Y",
But that generates the following error message:
Error in if (nchar(text.matrix[r, c]) > max.length[real.c]) { : missing value where TRUE/FALSE needed
I found a couple of posts about this issue (here, which references here), but the poster of the first did not provide much of an example to follow, and the second either pertained to an underscore that I don't have or gave LaTeX solutions. The only change that broke what already worked was the addition of the \n. I did try using the TeX \\ escape, but that didn't do anything useful for text output.
I am able to get line breaks using <br> in the string for the html output file version.
This post also mentions the tex and html solutions, but not text.
Alignment on the decimal
When there are no statistical significance stars on coefficients, both the coefficients and std. errors align nicely, centered on the decimal point. However, once the stars appear, they 'push' the coefficient to the left. This happens in both the text and HTML output. It is not so bad with 1 star, but 3 stars can make quite a difference. How can I coerce the values back to align on the decimal point in both formats? The issue persists even if I use the single.row=TRUE option. This answer by @Marco Doe has a great visual of what I'm talking about, but notes that the centering is for TeX. I found a LaTeX solution, but that post does not mention the other formats. I've tinkered with the align and float options to no avail (inspired by these quasi-related TeX solution posts here and here). The latter post hinted at using xtable or post-process edits, but that was more than 5 years ago, so I'm hoping for an updated, viable solution.
This image is from Marco Doe's solution and shows the LaTeX output, but it does a good job of showing the output format I get (left) and what I would like to have (right).
Reference categories
I found a LaTeX solution that 'pushes' the covariates and coefficient data down a row, making room for a reference group to be printed in the covariate column; however, the solution is in TeX. How can I replicate this for the text output? Can I replicate it for the HTML version as part of the R code without having to get surgical with the HTML output code?
@Giac posted the images (linked above) to illustrate the have (left) and want (right). Although these images are TeX, how could I get the output in the right image in text and HTML?
I am trying to find an easy way to convert my Word documents to HTML without the awful save-as that is built in. These are structured documents (designed for our screen-reader (JAWS) users), and so they use Heading 1, 2, 3, 4 & the Table of Contents.
We plan to convert these to DAISY audiobooks (https://en.wikipedia.org/wiki/DAISY_Digital_Talking_Book ) , so we need pretty clean, but structured, HTML to convert.
I tried the find-replace, using Styles, but it would just replace anything in the text part of the search. I could convert it from any one style to another, but adding text in the box messed it up.
(I think CSS for DAISY means that instead of just <h2> it will have to be <level2 class="section"><h2> plus the closing tags, but that's step 2, after I handle this part.)
I just want to be able to find any text using Style 2 and add text to the start of that line saying "yep, here's some style 2" so that I can do the HTML/CSS stuff.
Thanks!
You can do that with a simple Find/Replace. For example, specify the Heading 1 Style for the Find parameter and use:
Replace = <h1>^&</h1>
For a macro you could incorporate that into, see: Convert a Word Range to a String with HTML tags in VBA
I'm having some difficulty using a RegExp to search for text between HTML tags. This is for a search function that searches the text of an HTML page without matching characters in the tags or attributes of the HTML. When a match is found, I surround it with a div and assign it a highlight class to highlight the search words in the HTML page. If the RegExp also matches inside tags or attributes, the HTML code becomes corrupt.
Here is the HTML code:
<html>
<span>assigned</span>
<span>Assigned > to</span>
<span>assigned > to</span>
<div>ticket assigned to</div>
<div id="assigned" class="assignedClass">Ticket being assigned to</div>
</html>
and the current RegExp I've come up with is:
/(?<=(>))assigned(?!\<)(?!>)/gi
which matches if assigned or Assigned is the start of text in a tag, but not on the others. It does a good job of ignoring the attributes and tags but it is not working well if the text does not start with the search string.
Can anyone help me out here? I've been working on this for an hour now but can't find a solution (RegExp noob here..)
UPDATE 2
https://regex101.com/r/ZwXr4Y/1 shows the remaining problem regarding HTML entities and HTML comments.
The remaining problem when searching is that HTML entities are not ignored; all text inside HTML entities and comments should be ignored. So a search for "b" should not match inside an entity, even if the HTML entity is correctly between HTML tags.
Update #2
Regex:
(<)(script[^>]*>[^<]*(?:<(?!\/script>)[^<]*)*<\/script>|\/?\b[^<>]+>|!(?:--\s*(?:(?:\[if\s*!IE]>\s*-->)?[^-]*(?:-(?!->)-*[^-]*)*)--|\[CDATA[^\]]*(?:](?!]>)[^\]]*)*]])>)|(e)
Usage:
html.replace(/.../g, function(match, p1, p2, p3) {
    return p3 ? "<div class=\"highlight\">" + p3 + "</div>" : match;
})
Live demo
Explanation:
As you ran into more situations, I had to modify the regex to cover more cases. This version covers almost all of them. How it works:
Captures all <script> tags and their contents
Captures all CDATA blocks
Captures all HTML tags (opening / closing)
Captures all HTML comments (as well as IE if conditional statements)
Captures all targeted strings defined in the last group inside the remaining text (here it is (e))
Doing so lets us quickly manipulate the target, e.g. wrap it in tags as shown in the usage section. Performance-wise, I tried to write it so that it performs well.
This regex is not guaranteed to match the correct positions in every case, but it should give the expected results most of the time and is easy to modify later.
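The same idea (consume tags, comments and entities first, and only treat what is left as a candidate match) can be sketched with a much simpler pattern. This is an illustration only, not the regex above: the function name highlight and the simplified alternation are assumptions made for the example, and it deliberately ignores <script> contents and CDATA blocks.

```javascript
// Simplified sketch of the "match what to skip first, then the target" technique.
// highlight() is a hypothetical helper, not part of the answer's regex.
function highlight(html, term) {
  // Escape regex metacharacters so the search term is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Alternatives are tried left to right: comments, tags, entities, then the target.
  const pattern = new RegExp(
    "<!--[\\s\\S]*?-->|<[^>]*>|&[#\\w]+;|(" + escaped + ")",
    "gi"
  );
  // Only the last alternative has a capture group; everything else is returned unchanged.
  return html.replace(pattern, (match, target) =>
    target ? '<div class="highlight">' + target + "</div>" : match
  );
}

const sample = '<div id="assigned" class="assignedClass">Ticket being assigned to</div>';
console.log(highlight(sample, "assigned"));
// <div id="assigned" class="assignedClass">Ticket being <div class="highlight">assigned</div> to</div>
```

Because the tag alternative matches the whole opening tag before the target alternative gets a chance, the id and class attributes are left untouched and only the text node is wrapped.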
Try this:
Live Demo
string.match(/<.{1,15}>(.*?)<\/.{1,15}>/g)
The pattern <.{1,15}>(.*?)<\/.{1,15}> matches anything between an opening and a closing HTML tag, such as
<any> Content </any>
The text between the tags is the target. For example, given
<div> this is the content </div>
the captured group holds "this is the content".
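One detail worth checking when using this pattern: with the g flag, String.prototype.match returns the whole matches (tags included), not the captured groups. A minimal sketch of reading the inner text with matchAll instead, assuming a two-line sample string invented for this example:

```javascript
// Sample input is an assumption for illustration; the two elements sit on separate lines.
const html = "<div> this is the content </div>\n<span>assigned</span>";
const pattern = /<.{1,15}>(.*?)<\/.{1,15}>/g;

// match() with the g flag yields full matches, including the surrounding tags.
const whole = html.match(pattern);
console.log(whole);
// [ '<div> this is the content </div>', '<span>assigned</span>' ]

// matchAll() exposes the capture group, which holds the text between the tags.
const inner = Array.from(html.matchAll(pattern), (m) => m[1].trim());
console.log(inner);
// [ 'this is the content', 'assigned' ]
```

Note that the pattern does not check that the closing tag name matches the opening one, and the greedy .{1,15} can swallow a following tag when two elements sit on the same line, so it is best treated as a quick approximation.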
I know the basics of accessing a website (I just started learning yesterday), but now I want to extract data. I checked out many tutorials on Mechanize/Nokogiri, but each of them did things differently, which confused me. I want a direct, straightforward way to do this:
I have this website: http://openie.allenai.org/sentences/?rel=contains&arg2=antioxidant&title=Green+tea
and I want to extract certain things in a structured way. If I inspect the element of this webpage and go to the body, I see so many <dd>..</dd>'s under the <dl class="dl-horizontal">. Each one of them has an <a> part which contains a href. I would like to extract this href and the bold parts of the text ex <b>green tea</b>.
I created a simple structure:
info = Struct.new(:ObjectID, :SourceID)
Thus, for each of these <dd> elements, I will add the bold text to the object id and the href to the source id.
This is the start of the code I have, just retrieval no extraction:
require 'mechanize'
require 'nokogiri'
agent = Mechanize.new { |agent| agent.user_agent_alias = "Windows Chrome" }
html = agent.get('http://openie.allenai.org/sentences/?rel=contains&arg2=antioxidant&title=Green+tea').body
html_doc = Nokogiri::HTML(html)
The other thing is that I am confused about whether to use Nokogiri directly or through Mechanize. The problem is that there isn't enough documentation provided by Mechanize so I was thinking of using it separately.
For now I would like to know how to loop through these and extract the info.
Here's an example of how you could parse the bold text and href attribute from the anchor elements you describe:
require 'nokogiri'
require 'open-uri'
url = 'http://openie.allenai.org/sentences/?rel=contains&arg2=antioxidant&title=Green%20tea'
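# URI.open is required on Ruby 3.0+, where Kernel#open no longer opens URLs
doc = Nokogiri::HTML(URI.open(url))
doc.xpath('//dd/*/a').each do |a|
  # Collect the text of every <b> inside the anchor, collapsing whitespace
  text = a.xpath('.//b').map {|b| b.text.gsub(/\s+/, ' ').strip}
  href = a['href']
  puts "OK: text=#{text.inspect}, href=#{href.inspect}"
end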
# OK: text=["Green tea", "many antioxidants"], href="http://www.talbottteas.com/category_s/55.htm"
# OK: text=["Green tea", "potent antioxidants"], href="http://www.skin-care-experts.com/tag/best-skin-care/page/4"
# OK: text=["Green tea", "potent antioxidants"], href="http://www.specialitybrand.com/news/view/207.html"
In a nutshell, this solution uses XPath in two places:
Initially, to find every a element underneath each dd element.
Then, to find each b element inside each of those a elements.
The final trick is cleaning up the text within the b elements into something presentable. Of course, you might want it to look different.
According to https://chemistry.meta.stackexchange.com/a/88, Stack Exchange sites use MathJax to format math equations.
When I looked at the demo page (http://www.mathjax.org/demos/tex-samples/), the source code for the first example is:
\[\begin{aligned}
\dot{x} & = \sigma(y-x) \\
\dot{y} & = \rho x - y - xz \\
\dot{z} & = -\beta z + xy
\end{aligned} \]
Since the result is text, I am assuming that some fancy CSS makes it look nice like that. My question is can someone help me find a way to get that CSS and convert that code to raw HTML that looks the same?
If you are using Firefox, you can install a browser add-on called "Web Developer", which gives you an added menu bar. One of the commands available from this bar is CSS/Display Style Information. You can then select any element on the page, and the styling for that element will be shown in a separate window at the bottom of the page. Using this, you can potentially reconstruct from scratch the HTML styling for a particular element or set of elements.