Skip a subgoal while proving in Isabelle - proof

I am trying to prove a theorem but got stuck on a subgoal that I would prefer to skip and prove later. How can I skip it and prove the others?
First, I tried oops and sorry, but they both abort the entire proof (instead of only the subgoal). I also tried putting the subgoal into a dummy lemma (assumed proven with sorry) and then using it (apply (rule [my dummy lemma])), but that applies the dummy lemma to every other subgoal as well (not only the first one).

It mostly depends on whether you are using the archaic (sorry for that ;)) apply-style or proper structured Isar for proving. I will give a small example to cover both styles. Assume you wanted to prove
lemma "A & B"
where A and B just serve as placeholders for potentially huge formulas.
As a structured proof you would do something like:
proof
show "A" sorry
next
show "B" sorry
qed
I.e., in this style you can use sorry to omit proofs for subgoals.
In apply-style you could do
apply (rule conjI)
defer -- "moves the first subgoal to the last position"
apply (*proof for subgoal "B"*)
apply (*proof for subgoal "A"*)
There is also the apply-style command prefer n which moves subgoal n to the front.


R Stargazer table ASCII text output formatting (line break, alignment & reference group)

For better or worse, I don't use LaTeX (yet). I like producing stargazer formatted tables on the fly for class examples in both HTML and in the console. However, I'm having trouble with 3 formatting elements; so far I've found solutions for LaTeX and some in HTML, but the ASCII console text eludes me.
The 3 challenges are:
Breaking a line so that a variable name can wrap instead of increasing the table width.
Aligning coefficients & std. errors at the decimal, even when there are p-value stars.
Making space in the covariate labels & coefficients to allow for a reference group.
Let's start with some reproducible data & outputs to reference.
library(stargazer)
set.seed(3); x1 <- factor(sample(letters[1:4], 1000, replace=TRUE))
set.seed(4); x2 <- runif(1000, -10, 10)
set.seed(5); x3 <- rbinom(1000, size = 1, prob = 0.13)
set.seed(6); y <- runif(1000, -10, 10)
model <- (lm(y ~ x1 + x2 + x3))
stargazer(model, align=TRUE,
#type="html", out="SO_stargazer.html",
type="text", out="SO_stargazer.txt",
title="Example Title Goes Here",
dep.var.caption="",
dep.var.labels="This is my long title for the Dependent Variable Y",
covariate.labels=c("X1 Group B",
"X1 Group C",
"X1 Group D",
"X2 with a super ridiculous and annoyingly long name",
"X3"))
Line break
My default approach is to use \n in the character string. For example, I might try to break the DV caption:
dep.var.labels="This is my long title for \n the Dependent Variable Y",
But that generates the following error message:
Error in if (nchar(text.matrix[r, c]) > max.length[real.c]) { : missing value where TRUE/FALSE needed
I found a couple of posts about this issue (here, which references here), but the poster on the first did not provide much of an example to follow, and the second either pertained to an underscore that I don't have or gave LaTeX solutions. The only change that broke what already worked was the addition of the \n. I did try using the tex \\ escape, but that didn't do anything useful for text output.
I am able to get line breaks using <br> in the string for the html output file version.
This post also mentions the tex and html solutions, but not text.
Alignment on the decimal
When there are no statistical significance stars on coefficients, both the coefficients and std. errors align nicely, centered on the decimal point. However, once the stars appear, they 'push' the coefficient to the left. This happens in both the text and html output. This is not so bad with 1 star, but 3 stars can make quite a difference. How can I coerce it back to align on the decimal value for both formats? This issue persists even if I use the single.row=TRUE option. This post answer by @Marco Doe has a great visual of what I'm talking about, but noted the centering is for tex. I found a LaTeX solution, but there is no mention of the other formats on that post. I've tinkered with the align and float options to no avail (inspired by these quasi-related tex solution posts here and here). The latter post hinted at using xtable or post-process edits, but that was more than 5 years ago, so I'm hoping for an updated viable solution.
This image is from Marco Doe's solution and shows the LaTeX output, but it does a good job of showing the output format I get (left) and what I would like to have (right).
Reference categories
I found a LaTeX solution that 'pushes' the covariates & coefficient data down a row, making room for a reference group to be printed in the covariate column; however, the solution is in tex. How can I replicate this for the text output? Can I replicate it for the HTML version as part of the R code without having to get surgical with the HTML output code?
@Giac posted the images (linked above) to illustrate the have (left) and want (right). Although these images are tex, how could I get the right image output in text and html?

How does grouping work in regex? In my case, group 2 should apply only if group 1 succeeds

As the image I added for my problem shows, it is selecting the end tag even though the first condition is not true. It should only select the end tag when the first condition is true.
link
(<a\s\b(href|title)\b.*\">)?|(<[\/]a>)
for the below use cases
www.ags.ny.gov
<a title=\"ba.com/redeem\" href=\"http://ba.com/rertem\" target=\"_blank\" rel=\"nkiops noreferrer\">ba.com/rertem</a>.
www.ags.ay.gov, for free information
Thanks for including the picture. That helped me understand what you're driving at.
The bad boy is the pipe (logical OR) operator, 10th character from the end of your Regular Expression. That says, "Even if there is no match for the first part, if I have that closing anchor tag, then it's a match."
That's what you said you don't want, but the "|" operator says give you a match for </a> no matter what happens before.
I think this is the answer you're looking for. Clarity in your question helps people who want to answer. Your original ask was kind of hazy, but then you added clarification, so I upvoted your question. I hope this answer helps you and encourages you to solicit feedback.
Follow up to your comment:
I knew the logical operator | giving a match at the end though
the first group is not match. However I need to know how can I update
my regex for the second condition should only applicable when the
first condition is true.
Okay, let's think about this a little differently. Your requirement is for the truth values at both ends to be TRUE. That's the same as simply saying the whole thing needs to be one long match. The logical operator | signifies logical disjunction, but your requirement is conjunction. Don't split them up at all. Just make one match expression for the whole string you want.
Where you have |, write .+ instead. When I tried that, at first it didn't work because of the ? quantifier, signifying optionality. In that case, my match picked every character even if the first part wasn't there! Well, don't make the first part optional. Get rid of the ?.
My environment will look a little different from yours because I work in R, and it changes the escaping a little bit. You'll get the idea, though. My demonstration did what you appear to be describing you want.
library(stringr)
str1 <- 'www.dfs.ny.gov, for free'
str2 <- '<a title = \\\"http"://www.blah/\\\">www.blah</a>'
str3 <- '<a ID=stuff></a>, for free'
re <- '(<a\\s\\b(href|title)\\b.*\">).+(<[/]a>)'
str_extract(str1, re)
str_extract(str2, re)
str_extract(str3, re)
My results for matching those three strings in my demo were as follows. My change to your RegEx matched only the second string, the one containing a complete anchor tag, and did not match the first or the third (indicated by NA in my output, which means Not Available, as in no match), which are your expected results.
> str_extract(str1, re)
[1] NA
> str_extract(str2, re)
[1] "<a title = \\\"http\"://www.blah/\\\">www.blah</a>"
> str_extract(str3, re)
[1] NA
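For readers who are not working in R, the same idea can be sketched in Python. This is only an illustration under assumptions: the variable names and the lone closing tag sample are made up, and the anchor string is the one from the question with its backslash escapes removed. It shows that once the opening tag, the content and the closing tag are one sequence, a closing tag without a qualifying opening tag no longer matches.

```python
import re

# One sequence instead of two alternatives: opening tag, content, closing tag.
combined = re.compile(r'(<a\s\b(href|title)\b.*">).+(<[/]a>)')

# Anchor tag from the question (escapes removed), followed by a full stop.
anchor = ('<a title="ba.com/redeem" href="http://ba.com/rertem" '
          'target="_blank" rel="nkiops noreferrer">ba.com/rertem</a>.')
lone_close = 'some text </a>'             # hypothetical stray closing tag
other_tag = '<a ID=stuff></a>, for free'  # first attribute is neither href nor title

full = combined.search(anchor)
print(full.group(0))                      # the whole tag, without the trailing "."
print(combined.search(lone_close))        # None: no opening tag, so no match
print(combined.search(other_tag))         # None: first part is now mandatory

# The right-hand side of the original "|" on its own matches a stray closing tag.
print(re.search(r'(<[/]a>)', lone_close).group(0))
```

The three capture groups are kept from the original expression, so the opening tag and the closing tag are still available separately as group 1 and group 3 of a successful match.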

Idiomatic Proof by Contradiction in Isabelle?

So far I wrote proofs by contradiction in the following style in Isabelle (using a pattern by Jeremy Siek):
lemma "<expression>"
proof -
{
assume "¬ <expression>"
then have False sorry
}
then show ?thesis by blast
qed
Is there a way that works without the nested raw proof block { ... }?
There is the rule ccontr for classical proofs by contradiction:
have "<expression>"
proof (rule ccontr)
assume "¬ <expression>"
then show False sorry
qed
It may sometimes help to use by contradiction to prove the last step.
There is also the rule classical (which looks less intuitive):
have "<expression>"
proof (rule classical)
assume "¬ <expression>"
then show "<expression>" sorry
qed
For further examples using classical, see $ISABELLE_HOME/src/HOL/Isar_Examples/Drinker.thy
For better understanding of rule classical it can be printed in structured Isar style like this:
print_statement classical
Output:
theorem classical:
obtains "¬ thesis"
Thus the pure evil to intuitionists appears a bit more intuitive: in order to prove some arbitrary thesis, we may assume that its negation holds.
The corresponding canonical proof pattern is this:
notepad
begin
have A
proof (rule classical)
assume "¬ ?thesis"
then show ?thesis sorry
qed
end
Here ?thesis is the concrete thesis of the above claim of A, which may be an arbitrarily complex statement. This quasi abstraction via the abbreviation ?thesis is typical for idiomatic Isar, to emphasize the structure of reasoning.

How to generate hash from ~200k text/html that would match/compare to similar text?

I would like to make a sort of hash key out of a text (in my case html) that would match/compare to the hash of other similar text
ex of matching texts:
"2012/10/01 This is my webpage #1"+ 100k_of_same_text + random_words_1 + ..
"2012/10/02 This is my webpage #2"+ 100k_of_same_text + random_words_2 + ..
...
"2012/10/02 This is my webpage #2"+ 100k_of_same_text + random_words_3 + ..
So far I've thought of removing numbers and tags, but that would still leave the random words.
Is there anything out there that does this?
I have root access to the server, so I can add any UDF that is necessary, and if needed I can do the processing in C or other languages.
The ideal would be a function like generateSimilarHash(text) and another function compareSimilarHashes(hash1,hash2) that would return the percentage of matching text.
A function like compare(text1,text2) would not work in my case, as I have many pages to compare (~20 mil at the moment).
Any advice is welcomed!
UPDATE:
I'm referring to a hash function as it is described on Wikipedia:
A hash function is any algorithm or subroutine that maps large data
sets of variable length to smaller data sets of a fixed length.
the fixed length part is not necessary in my case.
It sounds like you need to utilize a program like diff.
If you are just trying to compare text, a hash is not the way to go, because slight differences in input cause total and complete differences in output. (That is the reason why they are used to encode passwords and secure text.) Character difference programs are pretty complicated; unless you really are interested in how they work and are trying to write your own, I would just use a solution like the one shown here, using sdiff to get a percentage.
Percentage value with GNU Diff
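If shelling out to sdiff is inconvenient, a comparable percentage can be sketched with Python's standard difflib module. Note that this swaps difflib.SequenceMatcher in for GNU diff, so it is not the method from the linked answer; the function name and the two sample pages are made up for illustration.

```python
import difflib

def percent_similar(text_a: str, text_b: str) -> float:
    """Return a 0-100 score based on how many lines the two texts share."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    # ratio() is 2 * (matching lines) / (total lines in both texts).
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b)
    return 100.0 * matcher.ratio()

# Hypothetical pages: same body, different first and last line.
page_1 = ("2012/10/01 This is my webpage #1\n"
          "shared line one\nshared line two\n"
          "random words alpha")
page_2 = ("2012/10/02 This is my webpage #2\n"
          "shared line one\nshared line two\n"
          "random words beta")

score = percent_similar(page_1, page_2)
print(score)  # 2 of 4 lines match on each side -> 50.0
```

Like any pairwise comparison, this does not by itself address the ~20 million pages mentioned in the question; it only shows how a percentage can be obtained for one pair.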
You could use some sort of Levenshtein distance algorithm. This works for small pieces of text, but I'm rather sure that something similar can be applied to large chunks of text.
Ref: http://en.m.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance
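As a rough sketch of the Levenshtein idea, here is the usual two-row dynamic programming version in Python. The function names and the normalisation into a 0 to 1 similarity are assumptions made for illustration, not part of the referenced page. The running time grows with the product of the two lengths, which is why it suits short strings far better than 200k documents.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string as the row
    previous = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]

def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

print(levenshtein("kitten", "sitting"))  # 3
print(similarity("webpage #1", "webpage #2"))  # 0.9
```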
I've found out that tag order in webpages can create a very distinctive pattern, that remains the same even if portions of text / css / script change. So I've made a string generated by the tag order (ex: html head meta title body div table tr td span bold... => "hhmtbdttsb...") and then I just do exact matches between these strings. I can even apply the Levenshtein distance algorithm and get accurate results.
If I didn't have html, I would have used the punctuation/end-lines for splitting, or something similar.
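The tag-order signature described above could be sketched in Python along these lines, using the standard html.parser module. The class name, function name and sample pages are hypothetical; taking the first letter of every opening tag is my reading of the "hhmtbdttsb..." example.

```python
from html.parser import HTMLParser

class TagOrderCollector(HTMLParser):
    """Records the first letter of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.initials = []

    def handle_starttag(self, tag, attrs):
        self.initials.append(tag[0])

def tag_signature(html: str) -> str:
    collector = TagOrderCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.initials)

# Two hypothetical pages with the same layout but different text.
page_1 = ("<html><head><title>2012/10/01 page #1</title></head>"
          "<body><div><span>random words one</span></div></body></html>")
page_2 = ("<html><head><title>2012/10/02 page #2</title></head>"
          "<body><div><span>other random words</span></div></body></html>")

print(tag_signature(page_1))                           # hhtbds
print(tag_signature(page_1) == tag_signature(page_2))  # True
```

Because the signature is a plain string, it can be stored and indexed for exact matches, or fed to an edit distance when near matches are wanted.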

Variable order regex syntax

Is there a way to indicate that two or more regex phrases can occur in any order? For instance, XML attributes can be written in any order. Say that I have the following XML:
<a href="home.php" class="link" title="Home">Home</a>
<a href="home.php" title="Home" class="link">Home</a>
How would I write a match that checks the class and title and works for both cases? I'm mainly looking for the syntax that allows me to check in any order, not just matching the class and title as I can do that. Is there any way besides just including both combinations and connecting them with a '|'?
Edit: My preference would be to do it in a single regex, as I'm building it programmatically and also unit testing it.
No, I believe the best way to do it with a single RE is exactly as you describe. Unfortunately, it'll get very messy when your XML can have 5 different attributes, giving you a large number of different REs to check.
On the other hand, I wouldn't be doing this with an RE at all since they're not meant to be programming languages. What's wrong with the old fashioned approach of using an XML processing library?
If you're required to use an RE, this answer probably won't help much, but I believe in using the right tools for the job.
Have you considered xpath? (where attribute order doesn't matter)
//a[@class and @title]
Will select both <a> nodes as valid matches. The only caveat being that the input must be xhtml (well formed xml).
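A minimal sketch of the XPath route in Python, using the standard xml.etree.ElementTree module. ElementTree only implements a subset of XPath and has no "and" operator, so the two attribute tests are written as chained predicates, which select the same nodes here. The nav wrapper element and the third link are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed document; the <nav> wrapper is only there
# because an XML document needs a single root element.
document = """
<nav>
  <a href="home.php" class="link" title="Home">Home</a>
  <a href="home.php" title="Home" class="link">Home</a>
  <a href="about.php" class="link">About</a>
</nav>
"""

root = ET.fromstring(document)

# [@class][@title] keeps only <a> elements that carry both attributes,
# regardless of the order in which they were written.
links = root.findall(".//a[@class][@title]")

print(len(links))                       # 2
print([a.get("title") for a in links])  # ['Home', 'Home']
```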
You can create a lookahead for each of the attributes and plug them into a regex for the whole tag. For example, the regex for the tag could be
<a\b[^<>]*>
If you're using this on XML you'll probably need something more elaborate. By itself, this base regex will match a tag with zero or more attributes. Then you add a lookahead for each of the attributes you want to match:
(?=[^<>]*\s+class="link")
(?=[^<>]*\s+title="Home")
The [^<>]* lets it scan ahead for the attribute, but won't let it look beyond the closing angle bracket. Matching the leading whitespace here in the lookahead serves two purposes: it's more flexible than matching it in the base regex, and it ensures that we're matching a whole attribute name. Combining them we get:
<a\b(?=[^<>]*\s+class="link")(?=[^<>]*\s+title="Home")[^<>]+>[^<>]+</a>
Of course, I've made some simplifying assumptions for the sake of clarity. I didn't allow for whitespace around the equals signs, for single-quotes or no quotes around the attribute values, or for angle brackets in the attribute values (which I hear is legal, but I've never seen it done). Plugging those leaks (if you need to) will make the regex uglier, but won't require changes to the basic structure.
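The combined lookahead regex above can be tried out in Python as follows. The sample strings are the two orderings from the question plus a made-up tag without a title; the variable names are arbitrary.

```python
import re

# Base tag regex with one lookahead per required attribute.
pattern = re.compile(
    r'<a\b'
    r'(?=[^<>]*\s+class="link")'   # class must appear somewhere in the tag
    r'(?=[^<>]*\s+title="Home")'   # title must appear somewhere in the tag
    r'[^<>]+>[^<>]+</a>'
)

class_first = '<a href="home.php" class="link" title="Home">Home</a>'
title_first = '<a href="home.php" title="Home" class="link">Home</a>'
no_title = '<a href="home.php" class="link">Home</a>'  # hypothetical

print(bool(pattern.search(class_first)))  # True
print(bool(pattern.search(title_first)))  # True
print(bool(pattern.search(no_title)))     # False
```

Since each lookahead is independent, supporting another required attribute means adding one more lookahead rather than enumerating more orderings.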
You could use named groups to pull the attributes out of the tag. Run the regex and then loop over the groups doing whatever tests that you need.
Something like this (untested, using .net regex syntax with the \w for word characters and \s for whitespace):
<a ((?<key>\w+)\s?=\s?['"](?<value>\w+)['"])+ />
The easiest way would be to write a regex that picks up the <a .... > part, and then write two more regexes to pull out the class and the title. Although you could probably do it with a single regex, it would be very complicated, and probably a lot more error prone.
With a single regex you would need something like
<a[^>]*((class="([^"]*)")|(title="([^"]*)"))?((title="([^"]*)")|(class="([^"]*)"))?[^>]*>
Which is just a first hand guess without checking to see if it's even valid. Much easier to just divide and conquer the problem.
A first ad hoc solution might be to do the following.
((class|title)="[^"]*?" *)+
This is far from perfect because it allows every attribute to occur more than once. I could imagine that this might be solvable with assertions. But if you just want to extract the attributes, this might already be sufficient.
If you want to match a permutation of a set of elements, you could use a combination of back references and zero-width negative forward matching (negative lookahead).
Say you want to match any one of these six lines:
123-abc-456-def-789-ghi-0AB
123-abc-456-ghi-789-def-0AB
123-def-456-abc-789-ghi-0AB
123-def-456-ghi-789-abc-0AB
123-ghi-456-abc-789-def-0AB
123-ghi-456-def-789-abc-0AB
You can do this with the following regex:
/123-(abc|def|ghi)-456-(?!\1)(abc|def|ghi)-789-(?!\1|\2)(abc|def|ghi)-0AB/
The back references (\1, \2) let you refer to your previous matches, and the zero-width forward matching ((?!...)) lets you negate a positional match, saying don't match if the contained pattern matches at this position. Combining the two makes sure that your match is a legitimate permutation of the given elements, with each possibility occurring only once.
So, for example, in ruby:
input = <<LINES
123-abc-456-abc-789-abc-0AB
123-abc-456-abc-789-def-0AB
123-abc-456-abc-789-ghi-0AB
123-abc-456-def-789-abc-0AB
123-abc-456-def-789-def-0AB
123-abc-456-def-789-ghi-0AB
123-abc-456-ghi-789-abc-0AB
123-abc-456-ghi-789-def-0AB
123-abc-456-ghi-789-ghi-0AB
123-def-456-abc-789-abc-0AB
123-def-456-abc-789-def-0AB
123-def-456-abc-789-ghi-0AB
123-def-456-def-789-abc-0AB
123-def-456-def-789-def-0AB
123-def-456-def-789-ghi-0AB
123-def-456-ghi-789-abc-0AB
123-def-456-ghi-789-def-0AB
123-def-456-ghi-789-ghi-0AB
123-ghi-456-abc-789-abc-0AB
123-ghi-456-abc-789-def-0AB
123-ghi-456-abc-789-ghi-0AB
123-ghi-456-def-789-abc-0AB
123-ghi-456-def-789-def-0AB
123-ghi-456-def-789-ghi-0AB
123-ghi-456-ghi-789-abc-0AB
123-ghi-456-ghi-789-def-0AB
123-ghi-456-ghi-789-ghi-0AB
LINES
# outputs only the permutations
# String is no longer Enumerable since Ruby 1.9, so split into lines first
puts input.lines.grep(/123-(abc|def|ghi)-456-(?!\1)(abc|def|ghi)-789-(?!\1|\2)(abc|def|ghi)-0AB/)
For a permutation of five elements, it would be:
/1-(abc|def|ghi|jkl|mno)-
2-(?!\1)(abc|def|ghi|jkl|mno)-
3-(?!\1|\2)(abc|def|ghi|jkl|mno)-
4-(?!\1|\2|\3)(abc|def|ghi|jkl|mno)-
5-(?!\1|\2|\3|\4)(abc|def|ghi|jkl|mno)-6/x
For your example, the regex would be
/<a href="home.php" (class="link"|title="Home") (?!\1)(class="link"|title="Home")>Home<\/a>/
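The same back reference plus negative lookahead technique carries over to Python's re module. The sketch below generates all 27 candidate lines instead of listing them, then applies the three-element pattern and the regex for the anchor tag; the variable names and the duplicated-attribute sample are made up for illustration.

```python
import re
from itertools import product

permutation = re.compile(
    r'123-(abc|def|ghi)-456-(?!\1)(abc|def|ghi)-789-(?!\1|\2)(abc|def|ghi)-0AB'
)

# All 3 * 3 * 3 combinations, including those that repeat an element.
words = ["abc", "def", "ghi"]
candidates = [f"123-{a}-456-{b}-789-{c}-0AB"
              for a, b, c in product(words, repeat=3)]

matches = [line for line in candidates if permutation.fullmatch(line)]
print(len(candidates), len(matches))  # 27 6

# The anchor tag version: the second attribute must differ from the first.
anchor = re.compile(
    r'<a href="home.php" (class="link"|title="Home") '
    r'(?!\1)(class="link"|title="Home")>Home</a>'
)
class_first = '<a href="home.php" class="link" title="Home">Home</a>'
title_first = '<a href="home.php" title="Home" class="link">Home</a>'
repeated = '<a href="home.php" class="link" class="link">Home</a>'

print(bool(anchor.fullmatch(class_first)))  # True
print(bool(anchor.fullmatch(title_first)))  # True
print(bool(anchor.fullmatch(repeated)))     # False
```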