I'm implementing a search feature in my application, and I'm curious about the use of LIKE. Many websites and books "crucify" the use of LIKE, but what's the proper alternative? I really don't want to install a third-party system such as Elasticsearch.
For search, the usual approach is the (very powerful) full text search functionality:
http://www.postgresql.org/docs/current/static/textsearch.html
Depending on your specific needs, there are also more specialized tools in contrib, such as trigram matching (pg_trgm) and a case-insensitive text type (citext):
http://www.postgresql.org/docs/current/static/pgtrgm.html
http://www.postgresql.org/docs/current/static/citext.html
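To give a flavour of how these fit together, here is a minimal sketch in Python with psycopg2. The documents(id, body) table and the connection string are made up for illustration, and the trigram query assumes the pg_trgm extension has been installed in the database:

```python
# Minimal sketch: PostgreSQL full-text search plus pg_trgm similarity.
# Assumes a hypothetical table documents(id, body) and that pg_trgm has been
# installed (CREATE EXTENSION pg_trgm); connection details are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()

term = "swift bird"

# Full-text search: to_tsvector/plainto_tsquery handle stemming and stop
# words, and ts_rank orders the matches by relevance.
cur.execute(
    """
    SELECT id, body,
           ts_rank(to_tsvector('english', body),
                   plainto_tsquery('english', %s)) AS rank
    FROM documents
    WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)
    ORDER BY rank DESC
    LIMIT 10
    """,
    (term, term),
)
print(cur.fetchall())

# Trigram similarity (pg_trgm): useful for typos and partial matches that
# full-text search will not catch.
cur.execute(
    """
    SELECT id, body, similarity(body, %s) AS sim
    FROM documents
    WHERE body %% %s          -- %% escapes the pg_trgm '%' operator for psycopg2
    ORDER BY sim DESC
    LIMIT 10
    """,
    (term, term),
)
print(cur.fetchall())

cur.close()
conn.close()
```

In a real schema you would typically back both queries with GIN indexes (on the to_tsvector expression and with gin_trgm_ops, respectively) rather than computing everything per row.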
I am developing an application on Google Maps and checking out various options to store and retrieve spatial information within a bounding box.
Initially I thought MySQL was not a good option, but after checking http://dev.mysql.com/doc/refman/5.6/en/spatial-analysis-functions.html and http://code.google.com/apis/maps/articles/phpsqlsearch.html, it looks like I can use MySQL and it does support my use cases.
I was also evaluating Node.js and CouchDB with GeoCouch. With modules like socket.io, geo, etc., this also looks like a good choice; see the book "Getting Started with GEO, CouchDB, and Node.js". My application would be a single-page application, and I do not foresee needing an RDBMS at any point in the future.
I have also seen http://nodeguide.com/convincing_the_boss.html, and it makes me a little apprehensive about going with Node.js and GeoCouch:
If the architecture for your next apps reads like the cookbook of NoSQL ingredients, please pause for a second and read this.
Yes, Redis, CouchDB, MongoDB, Riak, Cassandra, etc. all look really tempting, but so did that red apple Eve couldn't resist. If you're already taking a technological risk with using node.js, you shouldn't multiply it with more technology you probably don't fully understand yet.
Sure, there are legitimate use cases for choosing a document oriented database. But if you are trying to build a business on top of your software, sticking to conservative database technology (like postgres or mysql) might just outweigh the benefits of satisfying your inner nerd and impressing your friends.
What is your opinion?
GeoCouch sounds like a good solution in your case. If you want an easy installation, you can have a look at Couchbase Single Server, which is basically CouchDB with GeoCouch included (check out the Developer Preview for 2.0).
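For a rough idea of what a bounding-box query looks like against a GeoCouch spatial view, here is a sketch in Python. The database name, design document, view name and the bbox parameter layout follow the classic GeoCouch API and are assumptions that may differ between versions:

```python
# Rough sketch of querying a GeoCouch spatial view for documents inside a
# bounding box. The database ("places"), design document ("geo"), view
# ("points") and the bbox parameter format are assumptions based on the
# classic GeoCouch HTTP API and may vary with the version you install.
import requests

COUCH = "http://localhost:5984"

# Bounding box as west,south,east,north (longitude/latitude).
bbox = "-122.5,37.7,-122.3,37.8"

resp = requests.get(
    f"{COUCH}/places/_design/geo/_spatial/points",
    params={"bbox": bbox},
)
resp.raise_for_status()

for row in resp.json().get("rows", []):
    print(row["id"], row.get("value"))
```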
I implemented a basic search on my site using a LIKE clause in MySQL, but it doesn't help in many cases.
I have a search I am testing with: "swift bird"
The entry in the database is: "Swift"
What do people usually do in order to catch as many of the possibilities, abbreviations, and variations of the words they need to find when implementing their own basic search on their site?
If you want to test, here is the URL for this:
http://www.comehike.com/outdoors/birds/search_birds.php
Thanks,
Alex
Have you investigated MySQL's Full Text Search capabilities?
http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
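To make that concrete, here is a minimal sketch from Python. The birds(id, name) table, its FULLTEXT index, and the connection details are assumptions for illustration:

```python
# Minimal sketch of MySQL full-text search from Python, assuming a
# hypothetical table birds(id, name) with a FULLTEXT index on name, e.g.:
#   ALTER TABLE birds ADD FULLTEXT INDEX ft_name (name);
# Uses the mysql-connector-python driver; connection details are placeholders.
import mysql.connector

conn = mysql.connector.connect(user="user", password="pass", database="hike")
cur = conn.cursor()

query = "swift bird"

# Natural-language mode matches rows containing any of the search words,
# so the row "Swift" is found even though "bird" does not appear in it.
cur.execute(
    """
    SELECT id, name,
           MATCH(name) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score
    FROM birds
    WHERE MATCH(name) AGAINST (%s IN NATURAL LANGUAGE MODE)
    ORDER BY score DESC
    """,
    (query, query),
)
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```

Keep in mind that, depending on the storage engine and its settings, very short words and words that occur in more than half of the rows may be ignored by the full-text parser, which can be confusing when testing on a small table.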
Can anybody suggest a way to process and analyze the data from the comments users post on an article on my website?
Specifically, I want to process the comments as follows:
Example: an article on computerization may get the following comments:
I love computerization as it makes the work easier.
Computerization is spreading unemployment as 1 computer can work better than 4 people.
How I process this information: I take the comments and try to recognize some predefined (and extensible) keywords in them.
Assuming that you are trying to extract some useful information from the comments, you could apply some machine learning to them to classify or categorize the data they contain, the sentiments expressed, and so on.
There are a number of different types of learning you can apply to text; however, I personally recommend using support vector machines or a naive Bayes classifier to categorize and analyze the comments. You could also possibly use clustering, but there needs to be an element of natural language processing in whichever solution you choose. There are a number of different libraries you can use to implement either, e.g. svmlight, javaml, etc. I have personally used javaml and it is a good library.
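Purely as an illustration of the idea (the answer above names Java libraries, but the same approach works anywhere), here is a sketch in Python with scikit-learn. The tiny training set and the "positive"/"negative" labels are made up for the example; a real classifier needs far more labelled data:

```python
# Illustrative sketch of classifying comments with a naive Bayes model.
# Uses scikit-learn for brevity; the training comments and labels below are
# invented for the example and far too small for real use.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_comments = [
    "I love computerization as it makes the work easier.",
    "This change is fantastic and saves a lot of time.",
    "Computerization is spreading unemployment.",
    "One computer replacing four people is a bad outcome.",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words features (TF-IDF) feeding a multinomial naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_comments, train_labels)

new_comments = [
    "Computers do the boring work for us, which I really like.",
    "Automation took my job away.",
]
print(list(zip(new_comments, model.predict(new_comments))))
```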
I want to write a simple application like stardict (but not so huge) that searches for a phrase in the dictionary and returns the corresponding entry.
I guess this is a case of reinventing the wheel, and that it has been done many times by different people... But the thing is that the only suitable open software available on the web is stardict, and to me personally it is just incredibly ugly.
I think I can write a back-end that searches for articles in the dictionary and returns the result in plain form. A second app, the front-end, would then present the result on screen in an acceptable form.
Please recommend a dictionary file format to start with; I just want to hear suggestions from my fellow programmers.
Requirements: free, open, and with converters to and from popular formats.
P.S. Apple's "Dictionary" would be just perfect, but it cannot search for phrases. So if anyone knows how to extend it with "plugins", just let me know. That app is not free, but it would also be acceptable.
Have you thought of Unix spell/ispell and the like?
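If you go down that road, the ispell/aspell family can be driven from another process via its pipe mode (-a), which a front-end could use for "did you mean" style suggestions before the actual dictionary lookup. A rough Python sketch, assuming aspell and an English dictionary are installed:

```python
# Rough sketch: using aspell's ispell-compatible pipe mode (-a) to get
# spelling suggestions, e.g. as a preprocessing step before a dictionary
# lookup. Assumes the aspell binary and an English dictionary are installed.
import subprocess

def suggestions(word):
    proc = subprocess.run(
        ["aspell", "-a"],
        input=word + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    # Output: a version banner line, then one result line per word.
    # Lines starting with '&' carry suggestions after the colon;
    # '*' means the word is already spelled correctly.
    for line in proc.stdout.splitlines()[1:]:
        if line.startswith("&"):
            return [s.strip() for s in line.split(":", 1)[1].split(",")]
        if line.startswith("*"):
            return [word]
    return []

print(suggestions("compter"))  # e.g. ['computer', 'compote', ...]
```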
We are in the starting phase of a project, and we are currently wondering which crawler is the best choice for us.
Our project:
Basically, we're going to set up Hadoop and crawl the web for images. We will then run our own indexing software on the images stored in HDFS, based on the Map/Reduce facility in Hadoop. We will not use any indexing other than our own.
Some particular questions:
Which crawler will handle crawling for images best?
Which crawler will best adapt to a distributed crawling system, in which many servers crawl together?
Right now these look like the three best options:
Nutch: Known to scale. Doesn't look like the best option, because it seems to be tied closely to their text-search software.
Heritrix: Also scales. This one currently looks like the best option.
Scrapy: Has not been used on a large scale (not sure, though). I don't know if it has the basic features like URL canonicalization. I would like to use this one because it is a Python framework (I like Python more than Java), but I don't know whether it implements the more advanced features of a web crawler (a rough sketch of what I mean is below).
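For reference, a minimal Scrapy spider that only collects images might look roughly like this. The seed URL, settings, and storage path are placeholders, the built-in ImagesPipeline requires Pillow, and a real distributed crawl would need much more (politeness, scheduling across machines, deduplication, etc.):

```python
# Rough sketch of an image-focused Scrapy spider using the built-in
# ImagesPipeline. Seed URL, settings values, and output directory are
# placeholders for illustration only.
import scrapy

class ImageSpider(scrapy.Spider):
    name = "images"
    start_urls = ["https://example.com/"]  # placeholder seed

    custom_settings = {
        "ITEM_PIPELINES": {"scrapy.pipelines.images.ImagesPipeline": 1},
        "IMAGES_STORE": "/tmp/crawled-images",  # could point at an HDFS mount
    }

    def parse(self, response):
        # Hand every <img src> on the page to the images pipeline
        # (URLs must be absolute, hence urljoin).
        yield {
            "image_urls": [
                response.urljoin(src)
                for src in response.css("img::attr(src)").getall()
            ],
        }
        # Follow links to keep crawling.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```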
Summary:
We need to get as many images as possible from the web. Which existing crawling framework is both scalable and efficient, and also the easiest to modify so that it fetches only images?
Thanks!
http://lucene.apache.org/nutch/
I would think going with something that has the broadest use and support (community support) would be the better approach.
Nutch may be a good option because you want to end up on HDFS. It may be useful to look into the HBase integration that is currently in the works (NUTCH-650).
You may be able to get the data you need by skipping the index step at the end and instead looking at the segments themselves.
However, for flexibility another option may be Droids: http://incubator.apache.org/droids/. It's still in the incubator phase at Apache, but worth looking at.
You may get some ideas by looking at the SimpleRuntime example in org.apache.droids.examples. Perhaps replacing the Sysout handler with one that stores the images on HDFS would give you what you want.