Implement search filters in Java/MySQL

We need to implement a search filter (Netlog-like) for my social networking site against user profiles. Filters on a profile include age range, gender, and interests.
We have approximately 1M profiles running on MySQL. MySQL doesn't seem like the right option for implementing such filters, so we are looking at Cassandra as well.
So what is the best way to implement such a filter? The results need to be very fast,
e.g. age = 18 - 24 and gender = male and interest = Football
Age is a DATE; gender and interests are VARCHAR.
EDITED:
Let me rephrase the problem: how can I get the fastest result for any type of search?
It could be on the basis of profile name, or any other profile attribute, across 1M profile records.
Thanks

It would serve your project well to make an underlying SQL change. You might want to consider changing the Interest column from a free-input field (varchar) to a tag (Many-to-many on an additional table, for example).
You used the example of Football and running a LIKE operator against it. If you change it to a tag, you will have an initial structural problem of deciding where to place:
football
Football
American Football
Australian-rules football
But once you have done so, the tags will help your select statement go much faster.
Without this change, you will be pushing your data management problem from a database (which is equipped to handle it) to Java (which might not be).
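As a sketch of that many-to-many change, here is the tag layout driven from Python with an in-memory SQLite database standing in for MySQL (the table and column names are invented for illustration; MySQL syntax differs only slightly):

```python
import sqlite3

# Hypothetical schema: profiles, an interest (tag) table, and a join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT, gender TEXT, birthdate TEXT);
CREATE TABLE interest (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE profile_interest (
    profile_id INTEGER REFERENCES profile(id),
    interest_id INTEGER REFERENCES interest(id),
    PRIMARY KEY (profile_id, interest_id)
);
""")
conn.executemany("INSERT INTO profile VALUES (?,?,?,?)", [
    (1, "alice", "female", "1999-04-02"),
    (2, "bob",   "male",   "2000-11-20"),
    (3, "carol", "female", "1985-07-15"),
])
conn.execute("INSERT INTO interest (name) VALUES ('Football')")
conn.executemany("INSERT INTO profile_interest VALUES (?,?)", [(2, 1), (3, 1)])

# The interest filter becomes an indexed equality join instead of a LIKE scan.
rows = conn.execute("""
    SELECT p.name FROM profile p
    JOIN profile_interest pi ON pi.profile_id = p.id
    JOIN interest i ON i.id = pi.interest_id
    WHERE i.name = 'Football'
""").fetchall()
print(sorted(r[0] for r in rows))  # → ['bob', 'carol']
```

Gender and the age range would then be added to the same WHERE clause as ordinary indexed columns.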

It may make sense to try to optimize your query (there are at least some things that you can do). It sounds like you have a large database, and if you are returning a large result set and filtering the results in Java, you may run into performance issues because of all the data kept in memory.
If this is the case, one thing you could try is caching the results outside of the database and reading from that cache. This is something Hibernate does very well, but you could implement your own version if needed. If this interests you, Memcached is a good starting place.
I just noticed this for MySQL: I do not know how efficient it is, but it has some built-in full-text search functions that may help speed things up.

Related

Dynamic search filter with multiple factors in web app

I'm building a real estate website, and I'm a little confused about how to filter apartment search results. The user can filter his search by clicking checkboxes and by entering keywords into a textbox.
My problem is that I have many filtering options (by city, and/or location within the city, and/or apartment size, and/or number of bedrooms, and/or ...). So my problem is how to write a MySQL stored procedure that is dynamic enough to accept different inputs and give back filtered results with pagination. For example, someone might choose the number of bedrooms to be 2 or 3 and the apartment to be in a certain city, and simply not care about the other conditions. The user might also add a keyword to search for along with the conditions. I'm using Spring MVC and MySQL, but I guess the help I need is more about the concept than about the languages and relational DB I'm using.
At first, I thought of passing key-value pairs, but I guess this will complicate the procedure a lot and will depend on enum tables. So, can you please suggest a proper way to implement this kind of search based on best practices and your expertise?
Many thanks
Faceted search is really an analytic problem, meaning you need an analytic schema to do it properly.
This means a dimensional design. It also means OLAP-style querying.
So, you should read up on those first.
Basically, you want one big table (where each row is a house for sale), with all applicable columns. This doesn't have to be a real table, it could be a view or materialized view.
I'd pass on using sprocs for this. I don't see how it would help.
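If you do skip stored procedures, the usual alternative is to compose the WHERE clause in application code from whichever filters the user actually set. A minimal sketch in Python, with an in-memory SQLite database standing in for MySQL (the apartment schema and column names are invented for illustration):

```python
import sqlite3

def build_search(filters, page=0, per_page=20):
    """Build a parameterized query from only the filters that were supplied."""
    clauses, params = [], []
    if filters.get("city"):
        clauses.append("city = ?")
        params.append(filters["city"])
    if filters.get("bedrooms"):  # a list of acceptable counts, e.g. [2, 3]
        marks = ",".join("?" * len(filters["bedrooms"]))
        clauses.append(f"bedrooms IN ({marks})")
        params.extend(filters["bedrooms"])
    if filters.get("keyword"):
        clauses.append("description LIKE ?")
        params.append(f"%{filters['keyword']}%")
    where = " AND ".join(clauses) if clauses else "1=1"
    sql = f"SELECT id FROM apartment WHERE {where} LIMIT ? OFFSET ?"
    return sql, params + [per_page, page * per_page]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apartment (id INTEGER PRIMARY KEY, city TEXT, "
             "bedrooms INTEGER, description TEXT)")
conn.executemany("INSERT INTO apartment VALUES (?,?,?,?)", [
    (1, "Beirut",  2, "sunny flat near the sea"),
    (2, "Beirut",  3, "spacious, garden view"),
    (3, "Tripoli", 2, "sunny corner unit"),
])
sql, params = build_search({"city": "Beirut", "bedrooms": [2, 3]})
ids = [r[0] for r in conn.execute(sql, params)]
print(ids)  # → [1, 2]
```

Conditions the user didn't set simply never make it into the query, and pagination is just the LIMIT/OFFSET pair at the end.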

Follower System, better in MySQL or Redis?

I'm just wondering which solution to choose to implement a follower system.
In MySQL I would have a table:
userID INT,
followID INT,
PRIMARY KEY (userID, followID)
And in Redis I would just use a SET per userID and add all the followIDs to it.
What would be faster for, let's say, someone with 2000 followers when you want to list all the followers (in a table that has about 1M entries)?
What would be faster to find out if two Users follow each other?
Thank you very much!
By modern standards, 1M items is nothing. Any database or NoSQL system will work fine with that volume, so you just have to pick the one you are most comfortable with.
In terms of absolute performance, Redis will be faster than MySQL for this use case, because:
the whole dataset will be in memory
hash tables are faster than btrees
there is no SQL query to parse or execute
However, please note a relational database is far more flexible than a key/value store like Redis. If you can anticipate all the access paths to your data, then Redis is a good solution. Otherwise you will be better served by a more traditional database.
In my opinion, go with MySQL.
The two biggest points you will think about when making the decision are:
1) Have you thought about your use-cases?
You said you want to implement a follower system. If you're only going to be displaying a list of followers which each user has, then the Redis SET will be enough.
But what if you want to get a list of "users you are currently following"? You can't dig that up easily from your Redis SET, right? Or what if you wanted to know whether User-X is following User-A? If User-A had 10,000 followers, this wouldn't be easy either, would it?
MySQL is much more flexible when querying different types of results in different scenarios.
2) Do you really need the performance difference?
As you know, Redis IS faster than MySQL in these kinds of cases.
It is a simple Key-Value system, so it will exceed the performance of MySQL.
Check out performance results like these:
http://colinhowe.wordpress.com/2009/04/27/redis-vs-mysql/
http://ruturaj.net/redis-memcached-tokyo-tyrant-and-mysql-comparision/
But the performance difference between Redis and MySQL really starts to kick in only after about 5,000 requests/sec. Below that, you wouldn't see a difference of more than 50ms.
The performance difference will not be an issue until you have VERY large traffic.
So, after thinking about these two points, MySQL would be the better answer.
Redis will be good only if:
1) The purpose of the set/list is specific, and there is no need for flexibility in the future
2) You feel that the performance difference will actually have an effect on your architecture.
It depends on what you want to do with the data. You gave some examples, but it does not sound as though you have a full definition of what the product needs to do. If all you really want is to show users whether they follow each other, then either is fine, as you are just talking about two simple queries. However, what if you want to show two users the intersection of users they share, or you want to make suggestions from the data based on the users' profile data? Then it becomes more interesting, as Redis has functionality to give you the intersection of sets very, very quickly. We're talking differences of orders of magnitude in speed, not just milliseconds, and the gap grows as there are more users/relationships to parse, since the SQL joins required to get the data can become prohibitive if you want to serve it in real time.
sadd friends:alex george paul bart
sadd friends:alice mary sarah bart
sinterstore friends:alex_alice friends:alex friends:alice
Note that the above can be done with mysql as well, but your performance will suffer and it would be something that you are more likely to run as a batch job and then store the results for future use. On the other hand, keep in mind that the largest "friends" network in the world, Facebook, started with mysql to store relationships. The graphs of those relationships were batched and heavily denormalized for storage in thousands of memcached servers to get decent performance.
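To make the set intersection concrete, here is what SINTERSTORE computes, mirrored with plain Python sets (the names come from the example above; this is illustration, not a Redis client):

```python
# Follower sets keyed by user, as in the friends:alex / friends:alice example.
friends = {
    "alex":  {"george", "paul", "bart"},
    "alice": {"mary", "sarah", "bart"},
}

# The intersection SINTERSTORE would write to friends:alex_alice.
common = friends["alex"] & friends["alice"]
print(common)  # → {'bart'}

# "Do two users follow each other?" is two O(1) membership checks
# (SISMEMBER in Redis terms).
follows = {"x": {"a"}, "a": {"x"}}
mutual = "a" in follows["x"] and "x" in follows["a"]
print(mutual)  # → True
```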
Then, if you are looking for more options beyond MySQL or Redis, you might want to read what Michael Stonebraker has to say (he helped create Postgres and Ingres) about using an RDBMS for graph data such as friend relationships: http://gigaom.com/2011/07/07/facebook-trapped-in-mysql-fate-worse-than-death/. Of course, he's trying to sell his new VoltDB, but it is interesting food for thought.
So I think you really need to map out the requirements for the app (as I assume it will do more than just show you who your friends are) in terms of expected load (did you just throw out 2000, or is that really what you expect to handle?), features, and budget. Then really examine the many different options on the market.

Complex SQL String Comparison

I'm merging two databases for a client. In an ideal world, I'd simply use the unique ID to join them, but in this case the newer table has different IDs.
So I have to join the tables on another column. For this I need to use a complex LIKE statement to join on the Title field. But... they have changed the titles of some rows, which breaks the join on those rows.
How can I write a complex LIKE statement to connect slightly different titles?
For instance:
Table 1 Title = Freezer/Pantry Storage Basket
Table 2 Title = Deep Freezer/Pantry Storage Basket
or
Table 1 Title = Buddeez Bread Buddy
Table 2 Title = Buddeez Bread Buddy Bread Dispenser
Again, there are hundreds of rows with titles that are only slightly different, but inconsistently so.
Thanks!
UPDATE:
How far can MySQL full-text search get me? It looks similar to Shark's suggestion for SQL Server.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Do it in stages. First get all the ones that match exactly out of the way, so that you are only working with the exceptions. Your mind is incredibly smarter than the computer at finding things that are 'like' each other, so scan over the data, look for similarities, and write SQL statements that cover the specific cases you see until you have narrowed it down as much as possible.
You will have better results if you 'help' the computer in stages like this than if you try to develop one big routine to cover all cases at once.
Of course there are certainly APIs out there that do this already (such as the one Google uses to guess your search phrase before you finish it), but whether any are freely available, I don't know. It certainly wouldn't hurt to search for one, though.
It's fairly difficult to describe 'only slightly different' in a way that a computer would understand. I suggest choosing a group of criteria that can be considered either most common or most important and working around them. I am not sure what those criteria should be, though, since I have only a vague idea of what the data set looks like.
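One concrete way to attack "slightly different" titles is the standard library's difflib, which scores string similarity. A sketch using the examples from the question (the 0.6 cutoff is a guess you would tune against your own data):

```python
import difflib

table1 = ["Freezer/Pantry Storage Basket", "Buddeez Bread Buddy"]
table2 = ["Deep Freezer/Pantry Storage Basket",
          "Buddeez Bread Buddy Bread Dispenser"]

def best_match(title, candidates, cutoff=0.6):
    """Return the closest candidate by difflib's ratio, or None below the cutoff."""
    hits = difflib.get_close_matches(title, candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None

pairs = {t1: best_match(t1, table2) for t1 in table1}
for old, new in pairs.items():
    print(f"{old!r} -> {new!r}")
```

Anything best_match pairs up should still get a human eyeball pass before you trust the join, in the spirit of the staged approach above.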

Is it a good idea to store code in database?

I'm writing a game for a social network, and we have a lot of formulas for weapon stats, item stats, etc. that depend on the player's characteristics. Formulas look like
player.money += 10 * player.level
It looks like a good idea to just store functions like these in the DB and let the game designer enter them through the admin site.
But I'm not sure about this. What problems can occur with this approach?
Thank you.
The issue you need to contend with is not whether you should use a database (you should) but the proper design of such. Consider the object hierarchy you have in the game and how that would be reflected in a database.
Making a database column for each property is a bad idea in that it is too rigid. You want to look at a "property bag" approach where you have look-up tables for most of it that can be indexed for performance.
Think of a model like this:
itemId, propertyId, propertyValue
For higher performance, combine this with something like Memcached.
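A minimal sketch of that property-bag table, using SQLite from Python (table and property names are invented; the same shape works in MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_property (
    itemId INTEGER,
    propertyId TEXT,
    propertyValue TEXT,
    PRIMARY KEY (itemId, propertyId)   -- also serves as the lookup index
);
""")
conn.executemany("INSERT INTO item_property VALUES (?,?,?)", [
    (1, "damage", "12"),
    (1, "weight", "3"),
    (2, "damage", "7"),
])

# All properties of item 1 in one indexed query, loaded into a dict.
props = dict(conn.execute(
    "SELECT propertyId, propertyValue FROM item_property WHERE itemId = ?", (1,)))
print(props)  # → {'damage': '12', 'weight': '3'}
```

New properties become new rows, not new columns, which is what makes the layout flexible.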
This would be an OK approach. As Jon pointed out, it's not really "code", more like item/equipment attributes.
Pros
Easy to modify
Cons
Requires a lookup for each item or piece of equipment
The fact that you will be doing a lookup for each item could be a performance bottleneck. If you decide to use a DB in the end, I would suggest pulling the data from it once and then caching it for, say, the rest of the session. The item attributes aren't likely to change frequently, and this will limit the queries to the database.
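That pull-once-and-cache idea can be as simple as memoizing the lookup for the session. A sketch with a counter to show the database is only hit once (the schema and attribute encoding are invented):

```python
import functools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, attrs TEXT)")
conn.execute("INSERT INTO item VALUES (1, 'damage=12;weight=3')")

calls = {"db": 0}  # instrumentation only, to show the cache working

@functools.lru_cache(maxsize=None)
def item_attrs(item_id):
    """Fetch an item's attributes once; later calls are served from the cache."""
    calls["db"] += 1
    row = conn.execute("SELECT attrs FROM item WHERE id = ?", (item_id,)).fetchone()
    return dict(p.split("=") for p in row[0].split(";"))

item_attrs(1); item_attrs(1); item_attrs(1)
print(calls["db"])  # → 1 (two of the three calls never reached the database)
```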

high load on mysql DB how to avoid?

I have a table containing the cities of the world; it contains more than 70,000 cities.
I also have an auto-suggest input on my home page (which is used intensively) that runs a SQL query (a LIKE search) for each character typed into the input (after the second letter).
So I'm afraid of that heavy load, and I'm looking for any solution or technique that can help in such a situation.
Cache the table, preferably in memory. 70,000 cities is not that much data. If each city takes up 50 bytes, that's only 70,000 * 50 / (1024^2) ≈ 3.3 MB. And after all, a list of cities doesn't change that fast.
If you are using AJAX calls exclusively, you could cache the data for every combination of the first two letters in JSON. Assuming a Latin-like alphabet, that would be around 680 combinations. Save each of those to a text file in JSON format, and have jQuery access the text files directly.
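A sketch of that precomputation in Python: bucket the city names by their first two letters and write each bucket out as a JSON file the page can fetch directly (the file layout, directory, and city list are made up):

```python
import json
import os
import tempfile
from collections import defaultdict

cities = ["Kabul", "Kampala", "Karachi", "Kathmandu", "Berlin", "Bern"]

buckets = defaultdict(list)
for city in cities:
    buckets[city[:2].lower()].append(city)

outdir = tempfile.mkdtemp()  # stand-in for your web server's static directory
for prefix, names in buckets.items():
    with open(os.path.join(outdir, f"cities_{prefix}.json"), "w") as f:
        json.dump(sorted(names), f)

with open(os.path.join(outdir, "cities_ka.json")) as f:
    print(json.load(f))  # → ['Kabul', 'Kampala', 'Karachi', 'Kathmandu']
```

jQuery would then simply GET cities_ka.json once the user has typed "ka", with no database involved at request time.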
Create an index on the city 'names' to begin with. This speeds up queries that look like:
SELECT name FROM cities WHERE name LIKE 'ka%'
Also try making your autocomplete form a little 'lazy'. The more letters a user enters, the fewer records your database has to deal with.
You should cache as much data as you can on the web server. Data that does not change often like list of Countries, Cities, etc is a good candidate for this. Realistically, how often do you add a country? Even if you change the list, a simple refresh of the cache will handle this.
You should make sure that your queries are tuned properly to make best use of Index and Join techniques.
You may have load on your DB from other queries as well. You may want to look into techniques to improve performance of MySQL databases.
Just get your table to fit in memory, which should be trivial for 70k rows.
Then you can do a scan very easily. Maybe don't even use a SQL database for this (as the data doesn't change very often); just dump the cities into a text file and scan that. That'd definitely be better if you have many web servers but only one DB server, as each server could keep its own copy of the file.
How many queries per second are you seeing at peak? I can't imagine there being that many people typing city names in, even if it is a very busy site.
Also, you could cache the individual responses (e.g. in Memcached) if you get a good hit rate (e.g. because people tend to type the same things in).
Actually, you could probably also precalculate the responses for all one- to three-letter combinations; that's only about 26*26*26 (≈17.6k) entries. As input of four or more letters must logically be a subset of one of those, you could then scan the appropriate one of the ~17k entries.
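A sketch of that idea, with an in-memory dict standing in for the precalculated responses (the city list is made up):

```python
cities = ["Kabul", "Kampala", "Karachi", "Kathmandu", "Katowice"]

# Precalculate a response for every 1-3 letter prefix that actually occurs.
index = {}
for city in cities:
    for n in (1, 2, 3):
        index.setdefault(city[:n].lower(), []).append(city)

def suggest(query):
    q = query.lower()
    if len(q) <= 3:
        return index.get(q, [])
    # A 4+ letter query is a subset of its own 3-letter bucket, so scan only that.
    return [c for c in index.get(q[:3], []) if c.lower().startswith(q)]

print(suggest("kat"))   # → ['Kathmandu', 'Katowice']
print(suggest("kath"))  # → ['Kathmandu']
```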
If you have an index on the city name, it should be handled efficiently by the database. This statement is wrong, see the comments below.
To lower the demands on your server resources, you can offer autocompletion only after n or more characters. Also allow for some timeout, i.e. don't issue a request while the user is still typing.
Once the user has stopped typing for a while, you can request autocompletion.