Is it possible to make a filter, for example a PropertyFilter, that is neutral (and passes the event on to the next filter in the chain) if either one value or another matches? Something like:
<filter type="log4net.Filter.PropertyFilter">
<Key value="myProperty" />
<StringsToMatch Operator="OR">
<Match>value1</Match>
<Match>value2</Match>
</StringsToMatch>
</filter>
I really don't want to write my own filter and would prefer to accomplish this with the normal Log4Net filters. Is this possible?
You could certainly develop such a filter yourself by subclassing FilterSkeleton.
But instead of making a specialized filter like this, I suggest implementing a more generic filter that can be configured with a collection of filters and applies the operator over those. The config could look something like this:
<filter type="CompositeFilter">
<operator value="Or" />
<filters>
<filter type="log4net.Filter.PropertyFilter">
<stringToMatch value="value1" />
</filter>
<filter type="log4net.Filter.PropertyFilter">
<stringToMatch value="value2" />
</filter>
</filters>
</filter>
If you make such a filter I encourage you to submit it to the log4net project. It would certainly be useful for the general public :)
I'm trying to search a collection of HTML files and also provide excerpts in Solr 6.4.1. And since the highlighting needs to return clean readable text, the HTML needs to be stripped down to bare text and stored.
But no matter what I change in the core's configuration, the field I'm specifying does not get returned in the result and highlighting for the document is always empty {}.
managed-schema:
<fieldType name="text_en_splitting_html" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
<filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" catenateAll="0" catenateWords="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
<filter class="solr.WordDelimiterFilterFactory" catenateNumbers="0" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" catenateAll="0" catenateWords="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<field name="content1" type="text_en_splitting_html" multiValued="true" indexed="true" stored="true"/>
solrconfig.xml is the default one, with the default /update/extract requestHandler. The response I'm getting is:
{
"responseHeader":{
"status":0,
"QTime":4,
"params":{
"q":"*:*",
"hl":"on",
"indent":"on",
"hl.fl":"content1",
"wt":"json",
"_":"1488077854581"}},
"response":{"numFound":100,"start":0,"docs":[
{
"id":"/home/me/files/d1/test.html",
"stream_size":[62963],
"x_parsed_by":["org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.html.HtmlParser"],
"creator":["createhtml"],
"stream_content_type":["text/html"],
"viewport":["width=device-width, initial-scale=1"],
"dc_title":["A nice read"],
"content_encoding":["UTF-8"],
"resourcename":["/home/me/files/d1/test.html"],
"title":["A nice read"],
"creator_url":["http://createhtml.net"],
"content_type":["text/html; charset=UTF-8"],
"_version_":1560362957551960064}
...
},
"highlighting":{
"/home/me/files/d1/test.html":{},
...
I'm indexing with
/opt/solr/bin/post -c mycollection -filetypes html files/
I've also tried with the Tika extract handler
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="fmap.Last-Modified">last_modified</str>
</lst>
</requestHandler>
but with limited success. A "content" field now appears in the response and it contains what appears to be a poorly stripped and badly formatted version of the initial document. Highlighting appears to work but it's not clean.
So what I need Solr to do is:
clean up my HTML entirely (no tags, class names, or inline styles - just like JavaScript's .text() method)
perform the search on the stripped content
return the stripped content if I ask it to
return the highlighting on the stripped content
It seems that no matter what I change (except Tika above), "content1" is ignored.
All I'm trying to do here, simply put, is be able to search HTML files and provide excerpts like any other search engine.
I was unable to make this work and Tika would not correctly strip the HTML, so I fixed this by using the Solarium PHP Client for Solr and PHPQuery to parse, strip, extract data, then form my own document to post directly to Solr.
The problem was the ExtractingRequestHandler (ERH) defined in solrconfig.xml, which was enforcing the use of Tika. By using Solarium, the ERH was bypassed, so all fields I defined in managed-schema started being used by the /update request handler.
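For reference, posting a hand-stripped document directly to /update uses Solr's plain XML update format; a document like the one in the question could be sent like this (the content1 value is illustrative):

```xml
<add>
  <doc>
    <field name="id">/home/me/files/d1/test.html</field>
    <field name="title">A nice read</field>
    <!-- content1 holds the tag-free text produced by your own HTML parser -->
    <field name="content1">The clean, stripped text of the page goes here.</field>
  </doc>
</add>
```

Because this goes through /update rather than /update/extract, Tika never touches the document and the content1 field is stored exactly as posted.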
I am building an ecommerce project where I am using the Solr search engine. I want to search based on a specific keyword. If I enter "c1234", it should display all the documents containing the keyword "c1234". That works fine. But if I enter "c12#34", it should also be treated as "c1234". So the problem is that I want to ignore the hash character: Solr should not consider the hash and should display the same result in both cases.
The other problem is that I want to trim whitespace. If I search "HP 940", it should trim the whitespace and display the same results as "HP940". So I want similar results to be displayed with or without the whitespace. For example, if I enter "Hp 940", Solr should treat it as "HP940". So the problem is trimming the whitespace.
Thanks in Advance
Try to use solr.WordDelimiterFilterFactory.
Test case:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" catenateWords="1"
generateNumberParts="1" catenateNumbers="0" splitOnNumerics="1"
catenateAll="0" splitOnCaseChange="1"
stemEnglishPossessive="1" preserveOriginal="1" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
To replace the # character you should use one of the CharFilterFactories: https://cwiki.apache.org/confluence/display/solr/CharFilterFactories
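For example, a solr.PatternReplaceCharFilterFactory placed before the tokenizer strips the hash from the input stream, so "c12#34" and "c1234" analyze identically (this sketch assumes it is added to a field type like the one above):

```xml
<analyzer type="query">
  <!-- remove '#' before tokenization, so "c12#34" becomes "c1234" -->
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="#" replacement=""/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Adding the same charFilter to the index-side analyzer keeps indexed and queried terms consistent.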
For the hash character and other delimiters, take a look at solr.WordDelimiterFilterFactory with the catenateWords parameter, or alternatively solr.PatternReplaceCharFilterFactory.
For words like HP 940 also consider something like phrase fields on the dismax handler with no slop.
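One way to apply the phrase-field idea is through the (e)dismax defaults in solrconfig.xml; here is a sketch, where the field name product_name is purely illustrative:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">product_name</str>
    <!-- pf boosts documents where the query terms appear as a phrase -->
    <str name="pf">product_name</str>
    <!-- ps=0: no slop, the terms must be adjacent ("HP 940" as one unit) -->
    <str name="ps">0</str>
  </lst>
</requestHandler>
```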
Consider the following NHibernate mapping, and notice the commented properties. These are the same columns as the key and index column specified in the map. When I remove the comments (thus including the TypeOfPart and UnitId columns for the properties) I get the "Repeated column in mapping for collection" exception.
<map name="Parts" table="ActiveUnitParts" lazy="false">
<key column="UnitId" />
<index column="TypeOfPart" type="integer"/>
<composite-element class="ActiveUnitPart">
<property name="Id" />
<property name="CreationDate" />
<property name="PartInfo"/>
<property name="Remarks"/>
<!-- <property name="TypeOfPart" /> -->
<!-- <property name="UnitId" /> -->
</composite-element>
</map>
What I need in code is a Dictionary<TypeOfPart, ActiveUnitPart>. But the problem I have is that the values for the properties UnitId and TypeOfPart aren't set on the ActiveUnitPart instances in that Dictionary<TypeOfPart, ActiveUnitPart>.
Yes, the list of related parts of this unit is loaded, and yes, the key in the dictionary is related to the right part. But I do not understand why I cannot reference TypeOfPart and UnitId to fill the properties on ActiveUnitPart itself as well.
How can I solve or workaround this?
Motivation of why I need this:
I must be able to work with ActiveUnitParts without referencing the related Unit (UnitId)
EDIT 1:
I know I can intercept the setter of the Parts property of the Unit and iterate through the Dictionary<TypeOfPart, ActiveUnitPart> to set the values in code, but it seems like a hack, and I wish to learn a more elegant NHibernate way of getting it done, if possible.
It is possible; just change the mapping from column to formula. The best way to achieve that would be:
<property name="TypeOfPart" formula="TypeOfPart" insert="false" update="false" />
<property name="UnitId" formula="UnitId" insert="false" update="false" />
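Putting it together, the map from the question with these formula-based properties uncommented would look like this:

```xml
<map name="Parts" table="ActiveUnitParts" lazy="false">
  <key column="UnitId" />
  <index column="TypeOfPart" type="integer"/>
  <composite-element class="ActiveUnitPart">
    <property name="Id" />
    <property name="CreationDate" />
    <property name="PartInfo"/>
    <property name="Remarks"/>
    <!-- formula lets NHibernate read these columns into the element
         without declaring the same column twice in the mapping -->
    <property name="TypeOfPart" formula="TypeOfPart" insert="false" update="false" />
    <property name="UnitId" formula="UnitId" insert="false" update="false" />
  </composite-element>
</map>
```

Because the properties are read-only (insert="false" update="false"), NHibernate still writes the columns only through the key and index, which is what avoids the "Repeated column in mapping" exception.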
Part of my web application is a RESTful API, and part is more standard web pages.
I wish the REST part to have some custom filters, such as the EntryPoint, SuccessHandler and FailureHandler. This part is within the /rest/** mapping.
On the other hand, everything else needs more common filters and is mapped with /**.
The problem is to find an easy way to define the filterChainProxy with different mapping-filters.
Right now this solution doesn't work:
<!-- Normal web app -->
<http auto-config="true" use-expressions="true" authentication-manager-ref="authenticationManager">
<form-login/>
<logout/>
<intercept-url pattern="/**" access="hasRole('ROLE_USER')" />
</http>
<!-- Configure RESTfull services -->
<http use-expressions="true" authentication-manager-ref="authenticationManager" entry-point-ref="restAuthenticationEntryPoint" >
<form-login authentication-success-handler-ref="restAuthenticationSuccessHandler" login-page="/rest/login" username-parameter="username" password-parameter="password" />
<logout logout-url="/rest/logout" />
<intercept-url pattern="/rest/**" method="GET" access="hasRole('ROLE_USER')" />
<intercept-url pattern="/rest/**" method="POST" access="hasRole('ROLE_ADMIN')" />
<intercept-url pattern="/rest/**" method="PUT" access="hasRole('ROLE_ADMIN')" />
<intercept-url pattern="/rest/**" method="DELETE" access="hasRole('ROLE_ADMIN')" />
</http>
It complains about the universal match being placed before other patterns.
Is there a way to define such a thing without resorting to defining the filterChainProxy manually? The http namespace version helps reduce the amount of configuration quite a lot, as otherwise I would have to manually set up a UsernamePasswordAuthenticationFilter etc.
The next problem is more simple: I have to respond, after the form-login authentication with a JSON object.
So far, I have implemented a SuccessHandler (actually a version of the SimpleUrlAuthenticationSuccessHandler without the redirect part).
How do I write my JSON output?
I must have something like this:
HTTP 200 OK
with:
{"success":true,"customer":{"email":"customer#email.com","session_id":"b83a41dfaca6785399f00837888886e646ff9088"}}
and a similar thing with the FailureHandler. It must be quite simple and surely is some very basic thing, but how do you do that? Redirecting to a custom controller is not a solution, since I would get a 301 redirect status that a very simple REST client might not be able to understand.
At the very least, I wish to have only the http header and no body at all.
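To make the desired body concrete, it can be assembled with plain string building; this is only a sketch with illustrative names (LoginJson is not a Spring class), and in a real AuthenticationSuccessHandler you would set the content type to application/json and print the result via response.getWriter() instead of redirecting:

```java
// Sketch: building the JSON bodies a REST login handler could write.
// Class, method, and field names are illustrative, not from any Spring API.
public class LoginJson {

    // Escape the JSON-special characters that could appear in these fields.
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Body for the SuccessHandler: {"success":true,"customer":{...}}
    public static String success(String email, String sessionId) {
        return "{\"success\":true,\"customer\":{\"email\":\"" + esc(email)
                + "\",\"session_id\":\"" + esc(sessionId) + "\"}}";
    }

    // Body for the FailureHandler.
    public static String failure(String reason) {
        return "{\"success\":false,\"error\":\"" + esc(reason) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(success("customer@example.com", "b83a41df"));
    }
}
```

In the handler, writing the string and returning without calling any redirect keeps the response at HTTP 200 with the JSON body and nothing else.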
thanks!
If you can upgrade to Spring Security 3.1 it supports multiple chains using namespace configuration.
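With the 3.1 namespace, each <http> element takes a pattern attribute, and the more specific chain must be declared first; a sketch along the lines of the setup in the question (handler bean names taken from it) could look like this:

```xml
<!-- REST chain: declared first because it is more specific -->
<http pattern="/rest/**" use-expressions="true"
      authentication-manager-ref="authenticationManager"
      entry-point-ref="restAuthenticationEntryPoint">
  <form-login login-page="/rest/login"
              authentication-success-handler-ref="restAuthenticationSuccessHandler"/>
  <logout logout-url="/rest/logout"/>
  <intercept-url pattern="/rest/**" method="GET" access="hasRole('ROLE_USER')"/>
  <intercept-url pattern="/rest/**" access="hasRole('ROLE_ADMIN')"/>
</http>

<!-- Everything else falls through to the standard web chain -->
<http use-expressions="true" authentication-manager-ref="authenticationManager">
  <form-login/>
  <logout/>
  <intercept-url pattern="/**" access="hasRole('ROLE_USER')"/>
</http>
```

Each <http> element now produces its own filter chain, so the "universal match before other patterns" complaint goes away.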
Hello,
I'm implementing an autocompletion feature in Solr and have one problem.
For autocompletion I am using
<fieldType name="text_auto" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
I thought that the LowerCaseFilter would make the token case-insensitive, but that is wrong. In fact it just lowercases the token, which means that a query like "comput" leads to "computer" while "Comput" doesn't.
I actually want both "comput" and "Comput" to lead to "Computer".
I already tried this:
<fieldType name="text_auto_low" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="text_auto_up" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>
</fieldType>
For some reason that doesn't work either. My question is why, and how can I fix it?
Lucene has the Analyzer class, which you can use (or extend) in three ways:
SimpleAnalyzer: converts all of the input to lower case.
StopAnalyzer: removes stop words that add noise to your search.
StandardAnalyzer: does both of the above filter processes and thus can 'clean up' your query.
Now, coming to your question, I would recommend a technique called n-grams, which splits your terms into smaller fragments and searches for those instead. That way you can still get excellent results even if there are typos.
To learn how to do this, I suggest you read this to get started. It also has other great info regarding queries.
This will not only solve your problem but also enhance your app.
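For the case-insensitivity part specifically, a common sketch is to lowercase on both the index and query side, and apply edge n-grams only at index time (the gram sizes here are just an example):

```xml
<fieldType name="text_auto" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index "computer" as c, co, com, comp, compu, comput, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since both "comput" and "Comput" are lowercased to "comput" at query time, and "comput" was indexed as an edge n-gram of "computer", both forms now match.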
Have fun :D