Enhance MediaWiki Search - mediawiki

I was wondering whether I can enhance the search facility in MediaWiki so that it returns a suggested result set closest to the search key instead of a 0-results page.
For example, I have created the following articles:
Guidelines to Database Management
Guidelines to SQL Reporting
But when I search by entering just Guidelines, instead of showing or suggesting something close, it returns 0 results and asks whether I want to create the page.
Can I make the search a little more suggestive?

Try installing Lucene in place of the default MediaWiki search engine, which merely does MySQL full-text indexing. Lucene would not only solve the above problem; it would also cope if you mistyped "Guidlines", asking "Did you mean 'Guidelines'?"
http://www.mediawiki.org/wiki/Extension:Lucene-search
http://www.mediawiki.org/wiki/Extension:MWSearch
Lucene is nontrivial to install, but the results are well worth it.
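To illustrate the behaviour being asked for, here is a minimal sketch in Python. It is not Lucene and not MediaWiki code; it only uses the standard difflib module to match a query word against the words of each title. The function name suggest_titles and the 0.8 cutoff are assumptions chosen for the example.

```python
import difflib

# The two article titles from the question.
TITLES = [
    "Guidelines to Database Management",
    "Guidelines to SQL Reporting",
]

def suggest_titles(query, titles, cutoff=0.8):
    """Return the titles that contain a word equal or close to the query word."""
    q = query.lower()
    matches = []
    for title in titles:
        words = title.lower().split()
        # get_close_matches tolerates small typos such as "guidlines".
        if difflib.get_close_matches(q, words, n=1, cutoff=cutoff):
            matches.append(title)
    return matches

if __name__ == "__main__":
    print(suggest_titles("Guidelines", TITLES))
    print(suggest_titles("Guidlines", TITLES))
```

A real search backend does this against an index rather than by scanning every title, which is why a dedicated engine is still the better choice for a wiki of any size.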

Related

How does stackoverflow manage tags

I am interested in the underlying data structure of the database and the way stackoverflow manages tags. I am about to build an application that will rely entirely on tag-based filters, and I am looking for the right approach. What is the best way to design the database so that a minimum number of queries have to run when working with sets of tags to filter my data? I did use the search, but couldn't find what I am looking for.
Stackoverflow does not rely entirely on a SQL database to work with tags. They cache, pre-sort and pre-aggregate them aggressively.
Read this interesting story of one optimization.
From there you can get some insights on how stackoverflow works.
I don't know if they do it well, but you may want to look at Drupal taxonomy for ideas (http://drupal.org/documentation/modules/taxonomy). If you run the installation, you can look at how they handle this in the generated db.
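As a starting point for the database side, here is a minimal sketch of the common three-table design (items, tags, and a link table), using Python's sqlite3 module. This is not stackoverflow's or Drupal's actual schema; all table and function names are assumptions made for the example. It shows that filtering by a set of tags can be done in a single query.

```python
import sqlite3

def build_db():
    """Create an in-memory database with a many-to-many tag schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE item_tags (
            item_id INTEGER NOT NULL REFERENCES items(id),
            tag_id  INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (item_id, tag_id)
        );
        -- Lets the filter query start from the tag side.
        CREATE INDEX idx_item_tags_tag ON item_tags(tag_id, item_id);
    """)
    return conn

def add_item(conn, title, tag_names):
    """Insert an item and link it to its tags, creating tags as needed."""
    item_id = conn.execute(
        "INSERT INTO items (title) VALUES (?)", (title,)
    ).lastrowid
    for name in tag_names:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO item_tags VALUES (?, ?)", (item_id, tag_id)
        )
    return item_id

def items_with_all_tags(conn, tag_names):
    """Return titles of items that carry every tag in tag_names (one query)."""
    names = sorted(set(tag_names))
    if not names:
        return []
    placeholders = ",".join("?" * len(names))
    rows = conn.execute(
        f"""SELECT i.title FROM items i
            JOIN item_tags it ON it.item_id = i.id
            JOIN tags t ON t.id = it.tag_id
            WHERE t.name IN ({placeholders})
            GROUP BY i.id
            HAVING COUNT(DISTINCT t.id) = ?
            ORDER BY i.id""",
        (*names, len(names)),
    ).fetchall()
    return [row[0] for row in rows]
```

If this query becomes the bottleneck, the caching and pre-aggregation mentioned above would sit in front of it rather than replace the schema.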

full text search sql server (which stackoverflow turned down)

My application is a help (user assistance) system, just like online MSDN, but the only way to navigate it is through search. Either the search is good or my system is dead.
I am looking for a third-party search engine that can connect to a database and provide out-of-the-box full-text searching.
I have researched SQL Server 2008 iFTS, the Lucene.NET API, and SQLite FTS4, but none of them ranks results as well as Google does.
I'm not expecting something like Google, but I need the search engine product with the best ranking.
Any suggestions or experience?
Maybe I should not go for a third-party search engine and should use Lucene.NET or SQL Server 2008 FTS instead, but how can I establish good ranking for a user-provided search query like "how can i do upload excel file in XYZ interface"?
My short answer is discouraging: you won't be able to do it yourself, even for an "okay" solution.
If you want good ranking:
Make your site friendly to search engines (which doesn't necessarily mean that you have to open it to the public; just make sure search engines understand the URLs).
Pay Google to do it (look for Google Apps).
As you said, a search engine has to do two things at least. The first one is indexing, i.e., finding the documents out of the database based on queried keywords. The second is ranking, which sorts all documents and highlights the most relevant ones.
Ranking is one of the key factors in how good a search engine is, so it's not surprising that ranking is hard.
To give you an idea of how hard it is, take the sentence in your question (i.e., "how can i do upload excel file in XYZ interface") as an example. A search engine has to answer at least two questions to get good results:
Which keywords are most important? For example, "XYZ" might be more important than the words "how" and "can".
What are the possible meanings of each word? "Excel" can be Microsoft Excel or Xcel Energy (a company whose name sounds like "excel").
There is a whole field of computer science dedicated to this problem. If you want more evidence, take a quick look at ACM WWW.
One thing that is even more discouraging is that getting even an "okay" solution would be difficult. The high-level point is that the computer knows nothing about English; it has to read a lot to learn how to rank documents.
Sadly, "a lot" means a lot of work. For example, many textbooks suggest ranking documents based on TF/IDF, but getting a reasonable cut for these values requires crawling millions of web pages.
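For reference, here is a minimal sketch of what TF/IDF ranking looks like, in Python. It computes the statistics from the indexed documents themselves, which is exactly the small-corpus limitation described above. The function names and the smoothed IDF formula are assumptions made for the example, not a standard any particular product uses.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it on anything that is not alphanumeric."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()

def rank(query, docs):
    """Return indices of matching docs, best TF/IDF score first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                # Smoothed IDF (an assumption): rare terms weigh more.
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                score += (tf[term] / len(toks)) * idf
        scores.append((score, idx))
    ordered = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [idx for score, idx in ordered if score > 0]

if __name__ == "__main__":
    docs = [
        "how to upload an excel file in the XYZ interface",
        "how to reset your password",
        "how can i print a report",
    ]
    print(rank("how can i do upload excel file in XYZ interface", docs))
```

In this toy corpus "how" occurs in every document and so contributes little, while "upload" and "XYZ" are rare and dominate the score. With only a handful of documents those weights are noisy, which is the point made above.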
To summarize:
Ranking is hard.
Therefore it's not surprising that you won't be able to find any free, out-of-the-box solutions, and Google and Microsoft keep their ranking algorithms proprietary.
If you want to rank documents in a large database, get a search engine.
Check out the new semantic search feature in SQL Server 2012:
http://msdn.microsoft.com/en-us/library/gg492075%28v=sql.110%29.aspx
It won't be a silver bullet, but it might provide you with an "out of the box" approach.

Creating a Full Text Index search

I've created a blog and I wish to search through certain tables in my MySQL databases and then return results for the user on a separate search page. I do not wish to use Google CSE. How would I go about creating this for my site? I found a post on StackOverflow.com from a friend of mine in which he wished to make his search more efficient. How would I go about implementing his search engine on my site?
His Code - Here
Are you limited to SQL? There is a lot of software better suited to text search than any relational database engine: Sphinx, Lucene, Xapian, just to name a few.
EDIT
MySQL has some full-text indexing capabilities as well. You may want to check them out.

Best Way To Partial Search in SQL 2008

I've looked into SQL 2008's built-in Full-Text search, and also Lucene.NET, but I don't think they'll do what I need. I just want to make sure I'm building my program as efficiently as possible.
So here's the dream. I want to have a single textbox on a page (like Google) and allow the user to enter ANYTHING. Based on their text, I will search tens of tables to find what they're looking for.
For example, my database contains thousands of locations, each of which has multiple names / codes. Within each location, there are tonnes of data associated with it.
So if the user wants to display all the locations with codes that contain "VM" ("CD-VM01", "CD-VM02", "CD-VM03", etc.), they should be able to. Or if they want to find all the locations in Toronto, they just type Toronto. I want to make the search as easy as possible for people. (I've found that people don't like thinking.)
Plus it ends up being easier to scale to more search options if I can just search the database, and not have to add new fields to a search screen.
So if I don't use Full Text search (which I can't for partial matches), the only thing I can see that I'm left with is "Like". Is that right? Is that my only option?
I guess the question is, even if you were able to do this in the database, how would you handle it in the UI?
Most likely every search result from a different table will have different attributes that need to be displayed in order for the end user to understand what it is.
The Google search box only needs to search one thing - the content of web pages - and return one type of result - web page URLs and excerpts. Fundamentally you are trying to search for many different things, and so you'll most likely need to handle each case separately.
Alternatively, you could maintain a denormalized search table that contains only the search text and the common attributes you think need to be displayed with each hit. Maintain it either with a scheduled task or with triggers. You'd be able to use FTS on this as well.
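Here is a minimal sketch of the denormalized search table idea, written in Python against SQLite as a stand-in for SQL Server. The tables locations and location_codes, their columns, and the function names are hypothetical. The rebuild function is what a scheduled task would call, and the query uses LIKE for partial matches.

```python
import sqlite3

def make_demo_db():
    """Create hypothetical source tables with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE location_codes (location_id INTEGER, code TEXT);
        INSERT INTO locations VALUES
            (1, 'Head Office', 'Toronto'), (2, 'Warehouse', 'Ottawa');
        INSERT INTO location_codes VALUES
            (1, 'CD-VM01'), (1, 'CD-VM02'), (2, 'CD-XX09');
    """)
    return conn

def build_search_table(conn):
    """(Re)build one flat table holding the searchable text of every source."""
    conn.executescript("""
        DROP TABLE IF EXISTS search_index;
        CREATE TABLE search_index (
            source_table TEXT NOT NULL,    -- where the hit came from
            source_id    INTEGER NOT NULL, -- key to load the full record
            label        TEXT NOT NULL,    -- common attribute to display
            search_text  TEXT NOT NULL     -- everything that is searchable
        );
        INSERT INTO search_index
            SELECT 'locations', id, name, name || ' ' || city FROM locations;
        INSERT INTO search_index
            SELECT 'location_codes', location_id, code, code
            FROM location_codes;
    """)

def search(conn, term):
    """Return (source_table, source_id, label) rows containing the term."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return conn.execute(
        "SELECT source_table, source_id, label FROM search_index "
        "WHERE search_text LIKE ? ESCAPE '\\' "
        "ORDER BY source_table, source_id, label",
        (f"%{escaped}%",),
    ).fetchall()
```

The source_table column is what lets the UI decide how to render each kind of hit, which addresses the display problem raised above.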
Update
Some of the comments express some uncertainty over what SQL Server Full-Text Search is capable of. FTS can most definitely search for a single string anywhere within the text of a column, and can do other things as well (proximity search, free-text search, etc.). If you're just getting started, I'd recommend the TechNet pages on the subject; the documentation is very comprehensive.
In particular I'd suggest having a look at the section on Configuring Catalogs and the Getting Started page (Cole's Notes: you have to create catalogs - writing CONTAINS queries without them won't get you very far). Then take a look at the querying page. I'd be very surprised if you can't find answers to any and all of your questions there.
If you still can't get it to work, I would post a new question with the specifics of your problem - what you've tried, what you're expecting, and what's happening instead.
I believe Lucene does exactly what you're looking for. You can add an index from any external data source (including multiple database tables), then query that index and you'll get back pointers to the matching records.
The drawback is that unlike with full-text indexing, you're responsible for building and maintaining the index yourself.
You can see an example of how Lucene.NET might be used.
It appears that the easiest / quickest solution for this exact problem would be to use LIKE.

Spider that tosses results into mysql

I'm looking to use Sphinx for site search, but not all of my site is in MySQL. Rather than reinvent the wheel, I'm wondering if there's an open source spider that easily tosses its findings into a MySQL database so that Sphinx can then index it.
Thanks for any advice.
There's also the XML pipe datasource that can feed documents to Sphinx. Not sure if it'd be any easier to set something up to output your site's content as XML than it would be to insert it into the DB, but it's an option.
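As a rough idea of what the XML route involves, here is a minimal Python sketch that turns already-fetched pages into an xmlpipe2-style document stream. The element names follow the xmlpipe2 format as I understand it from the Sphinx documentation, so check them against your Sphinx version. The field names title and content and the function name to_xmlpipe2 are assumptions.

```python
from xml.sax.saxutils import escape

def to_xmlpipe2(pages):
    """Build an xmlpipe2 document stream.

    pages: iterable of (doc_id, title, content) tuples, e.g. spider output.
    """
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="title"/>',
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, title, content in pages:
        # Document ids are written as integers.
        out.append(f'<sphinx:document id="{int(doc_id)}">')
        # escape() protects &, < and > in the page text.
        out.append(f"<title>{escape(title)}</title>")
        out.append(f"<content>{escape(content)}</content>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)

if __name__ == "__main__":
    print(to_xmlpipe2([(1, "Home", "Welcome to the site")]))
```

A script like this, printing to standard output, would be what the xmlpipe_command setting of an xmlpipe2 source in sphinx.conf points at. Fetching the pages themselves is still up to whatever spider you choose.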
If you're not 100% stuck on using Sphinx, you could consider Lucene, as this site does. This should work regardless of the underlying technology (database driven or static pages).
I am also currently looking to implement a site search. This question may also help.