JSP - Read page's meta tag - mysql

What I need to do is read in the meta tag of an html file (it's the keyword meta tag) and I was wondering if it was possible with JSP? I need it because I am accessing a database, but the MySQL search is based off of the keyword. Is this possible? If so, do you know how?
SO essentially my MySQL is:
"SELECT * FROM db WHERE searchterm= [KEYWORD META TAG]"

If you make your own custom tag, which outputs the meta tag, then you can easily store the value somewhere when the tag is run.
EDIT
That assumed that your meta tags are on a JSP page. If they're not (as you indicated in comments), and you need to extract them from an external HTML file, then you're going need an HTML parser of some sort (or some ugly/unreliable regular expressions). You might want to try something like http://jsoup.org/.

Related

AEM Rich Text Source Editor Anchor Tag Stripping href formed like Sightly tag

In my AEM project, we have client-side dynamic variable functionality which checks for any strings that are formed inside of a ${ } wrapper. The dynamic variable values are coming from our cookies. Replacing this with a more friendly format that does not conflict with Sightly is not an option at the moment, so please don't tell me to do that :)
When creating an anchor tag in the source editor of the Text core component, I am setting the href as the following: href="/content/en/opt-in.html?hash=${/profile/hash}". The anti-Samy configuration is blocking the href attribute from being rendered on this element, but I have tried to add the following to the overlayed file /apps/cq/xssprotection/config.xml:
<regexp name="expressionURLWithSpecialCharacters" value="(\$\{(\w|\/|:)+\})"/>
<regexp-list>
<regexp name="onsiteURL"/>
<regexp name="offsiteURL"/>
<regexp name="expressionURL"/>
<regexp name="expressionURLWithSpecialCharacters"/>
</regexp-list>
^ inside of the <attribute name="href"> block of common-attributes. Is there something else I need to do in order to make this not be filtered out so that it can be correctly parsed by the global variable replacement? Thanks!
There are two issues here:
The RTE will encode your URL and turn hash=${/profile/hash} into hash=$%7B/profile/hash%7D when storing into JCR
Even if you pass 1, the expression you are trying to use will only match EXACTLY the URL of ${/profile/hash}. You would need to expand the expression to include everything else (scheme, domain/host, path, query etc.). Think onsiteURL and offsiteURL but allowing your expression as well in query parameters. Have a look at https://github.com/apache/sling-org-apache-sling-xss/blob/master/src/main/java/org/apache/sling/xss/impl/XSSFilterImpl.java#L115 to get a starting point.
Have you tried adding disableXSSFiltering="{Boolean}true”?
Vlad, your second point was helpful in that I hadn't considered that one of the regular expressions in the XSS Protection configuration href attribute block needed to match the ${/profile/hash} in addition to the rest of the URL preceding and following it. Although to your first point, the RTE actually did save the special characters as-is into the JCR and did not encode them, probably since I was using the source editor mode and not the inline text editor.
What I ended up doing was creating a new regular expression as follows:
<regexp name="onsiteURLWithVariableExpression"
value="(?!\s*javascript(?::|&colon;))(?:(?://(?:(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~])|(?:%\p{XDigit}\p{XDigit})|(?:[!$&&apos;()*+,;=]))*#)?(?:\[(?:(?:(?:\p{XDigit}{1,4}:){6}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:::(?:\p{XDigit}{1,4}:){5}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:\p{XDigit}{1,4}){0,1}::(?:\p{XDigit}{1,4}:){4}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,1}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){3}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,2}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){2}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,3}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){1}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,4}\p{XDigit}{1,4})?::(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,5}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}))|(?:(?:(?:\p{XDigit}{1,4}:){0,6}\p{XDigit}{1,4})?::))]|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])|(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~])*|(?:%\p{XDigit}\p{XDigit})*|(?:[!$&&apos;()*+,;=])*))(?::\p{Digit}+)?(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+/?)*))|(?:/(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+/?)*))?)|(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+)*)))?(?:\?(?:(?:\p{L}\p{M}*)|(\$\{(\w|\/|:)+\})|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#|/|\?)*)?(?:#(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#|/|\?)*)?"/>
which is just the onsiteURL with my original expressionURLWithSpecialCharacters: (\$\{(\w|\/|:)+\}) value added as a group in the query string parameter section. This enabled AEM to accept this as an href value in my anchor tag.
I appreciate everyone's help!

Django Haystack search in Html

i was just wondering (since i didn't find anything quick on Google) if its possible (and how do i achieve that) to search directly in an html file, and ignore the tags or not as i please?
explaining a bit further. we wrote a crawler and obviously the crawler gives back the HTML of the page. But if i feel like searching the content of the crawler, do i need 2 separate fields one with html and one without or i can just have one field with html and search ignoring the html tags or not.
thanks in advance.
If i correctly understand you, all you need is to set search indexes without html tags?
We solved that problem this way:
class PostIndex(indexes.SearchIndex, indexes.Indexable):
text = indexes.CharField(model_attr='text', use_template=True, document=True)
and in template (search/indexes/blogs/post_test.html) we just used striptags filter
{{ object.content|striptags }}
After that you need to build_schema and rebuild_index. Now it search correctly without tags.

Getting all image tags and changing src in Java

I have created a method that for a given url converts the html into a string. With this string in memory, I would like to find all img tags with a certain data-XXX attribute, extract their src attribute and then change it.
What would be the cleanest way to do this? I have tried XPathReader but it crashes when it finds comments in code... Any other XML parser that would allow me to query for certain attributes without having to go through all tags myself?
Also I've read about regexp but it does not feel right somehow.

Showing certain tags django templates

If I want to be able to show only certain tags in (say as in a forum post) using django tempalte variables how would I do that?
Say the content of my post is:
<div><b>Hell</div>o <i>everyone</i></b>
I don't want to show the div tags, but the b and i tags are fine. I know you can use |safe and autoescape but that seems to escape all html. Is there a better way to do this?
You could use a Custom Django Filter with a Regular Expression that does this.
Have a look here: http://djangosnippets.org/snippets/60/ replace the Regular Expression with what you need to remove the HTMl tags you don't want.

How To Embed a HTML Source Inside Another?

I have two HTML source files. One called index.html and the other embed.html, but I want to embed the second file inside the index one, using HTML this is possible? If not, maybe using JavaScript or VBScript?
You can use an IFrame.
In general, if both documents are valid documents, you cannot simply drop the content of one into another and expect the result to be valid HTML (for example, you will end up with two <body> tags (either nested or not), which is not valid HTML.
This is why frames exist, though they are not very user friendly.
In jQuery, I would use $.load()