Dealing with asterisks in Sphinx results - mysql

I'm using Sphinx to search MySQL.
One of the results Sphinx returns for a search is M*A*S*H, as in the hit television show.
The problem I'm facing is that M*A*S*H is returned for nearly any query made with Sphinx. I'm guessing this is due to the asterisks. If not, then what could the problem be?
If the asterisks are causing my problem, how can I work around this to not have M*A*S*H returned for every query?

Make sure that asterisks are included in the charset_table.
charset_table = <blah blah blah>, U+002A
http://sphinxsearch.com/docs/current.html#conf-charset-table

Does this Sphinx function (EscapeString) do what you want?
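If your client library has no such helper, the idea can be sketched by hand. This is only an illustration: `escape_sphinx_query` is a hypothetical name, and the set of characters treated as special is an assumption modeled on Sphinx's extended query operators, so check it against your Sphinx version before relying on it.

```python
# Characters treated as query operators. This set is an ASSUMPTION modeled on
# Sphinx's extended query syntax; adjust it for the version you run.
SPECIAL_CHARS = set('\\()|-!@~"&/^$=<*')


def escape_sphinx_query(text: str) -> str:
    """Prefix every assumed-special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


if __name__ == "__main__":
    print(escape_sphinx_query("M*A*S*H"))
```

Note that this only affects the query side. If the asterisks need to be searchable in the indexed text, the charset_table or exceptions settings mentioned in the other answers still apply.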

This problem no longer seems to exist, though I don't know why. I'm sure that something must have been amiss in my sphinx.conf. If someone else has this issue, let me know here and I'll try to update this answer accordingly.

In any case you can use the exceptions file to specify any word you may want to include in your searches. Remember to reindex whenever you change the file.
You can read the details here: http://sphinxsearch.com/docs/1.10/conf-exceptions.html

Related

How can I get the location of a term or phrase in a chunk of text in a MS Sql 2008 Full Text index?

Possibly related question: How could I generate a contextual text extract from text returned from a SQL Server Full-Text Index?
At any rate, I'd like to know if there is a way to get the position of the hit(s) within an indexed document. If this isn't possible, I was wondering if it's possible in any other full-text search technologies out there.
Thanks in advance!
EDIT: Derp. I just walked through the implications of the answer to that related question. If that's the best way then I guess it's the best way. Anyone have any better suggestions?
In the end sys.dm_fts_parser appears to be the best/only solution to this. Unfortunately it seems to handle only a limited number of characters, so it's on to Lucene.NET for me.
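For anyone who only needs character offsets and can live without the full-text engine's word breaking and stemming, a rough application-side scan over the returned text may be enough. This is a minimal sketch, not what SQL Server does internally; `find_hit_positions` is a hypothetical helper and it matches whole words case-insensitively.

```python
import re
from typing import List, Tuple


def find_hit_positions(text: str, phrase: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of each whole-word,
    case-insensitive occurrence of phrase in text."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [match.span() for match in pattern.finditer(text)]


if __name__ == "__main__":
    document = "Full-text search finds text, but text offsets are not returned."
    for start, end in find_hit_positions(document, "text"):
        print(start, end, document[start:end])
```

The offsets can then be used to cut a contextual extract around each hit.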

Determine if an Index has been used as a hint

In SQL Server, there is the option to use query hints.
For example:
SELECT c.ContactID
FROM Person.Contact c
WITH (INDEX(AK_Contact_rowguid))
I am in the process of getting rid of unused indexes and was wondering how I could go about determining if an index was used as a query hint. Does anyone have suggestions on how I could do this?
Cheers,
Joe
For SQL sent by client applications, all you can do is run Profiler; for stored code, you can search sys.sql_modules.
To find unused indexes you'd normally use something based on dmvs. This would show you what indexes are in use and need to be kept.
That's a great question, and I don't think I can give you an easy answer. If it were me, I would script the entire database in Management Studio and do a text search for the index name. I would also do that in all of my reports and source code, just to be sure.
I don't think that hints make their way to sys.dependencies for procs and functions, but even if they did, you'd have some ad-hoc SQL to potentially deal with, so that's why I'd use the text searching route.
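The text search over a scripted database or source tree can be sketched roughly like this. It is only an illustration: `find_index_hints` is a hypothetical helper, it works line by line (so a hint split across lines would be missed), and it only recognizes the `INDEX(name)` and `INDEX = name` hint forms.

```python
import re
from typing import List


def find_index_hints(sql_text: str, index_name: str) -> List[int]:
    """Return 1-based line numbers where index_name appears inside an
    INDEX(...) or INDEX = ... table hint."""
    pattern = re.compile(
        r"INDEX\s*(?:=\s*|\(\s*)[^)]*\b" + re.escape(index_name) + r"\b",
        re.IGNORECASE,
    )
    return [
        line_no
        for line_no, line in enumerate(sql_text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    script = (
        "SELECT c.ContactID\n"
        "FROM Person.Contact c\n"
        "WITH (INDEX(AK_Contact_rowguid))\n"
    )
    print(find_index_hints(script, "AK_Contact_rowguid"))
```

Plain `CREATE INDEX` statements are deliberately not reported, since they define the index rather than hint it.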

Best practice for building a "Narrow your results" product filtering feature

I'm building a "Narrow your results by" feature similar to Best Buy's and NewEgg's. What is the best practice for storing the user's filter selections in a URL that can be shared/bookmarked?
The obvious choice is to simply keep all the user's selections in the query string. However, both of these examples are doing something far more cryptic:
Best Buy:
http://www.bestbuy.com/site/olstemplatemapper.jsp?id=pcat17080&type=page&qp=crootcategoryid%23%23-1%23%23-1~~q70726f63657373696e6774696d653a3e313930302d30312d3031~~cabcat0500000%23%230%23%2311a~~cabcat0502000%23%230%23%23o~~nf518||24363030202d2024383939&list=y&nrp=15&sc=abComputerSP&sp=%2Bcurrentprice+skuid&usc=abcat0500000
It appears they're assigning some unique value to the search and storing it temporarily on their side. Or perhaps wrapping their db id's in a bunch of garbage because they believe in security through obscurity?
Is there some inherent disadvantage to keeping things simple like this?
www.mydomain.com?color=blue&type=laptop
So when I select a 17" screen size as a filter, it would simply reload the page with the additional query string tacked on:
www.mydomain.com?color=blue&type=laptop&screen-size=17
Also, to clarify, I would likely use corresponding ids from the database in the URL to make validation and parsing easier/faster, but the question remains about whether there's some problem I'm missing in my simple approach.
Thanks in advance!
One of the first players in the faceted search domain was Endeca, and they are still used by many of the larger online stores (PC Connection, Home Depot, Walmart ...). You may want to take a look at their website.
There is a Drupal plug-in for faceted search. Check out the demo.
I don't think the URL composition matters much, but I actually think presenting the parameters in a readable form may be dangerous. One of the advantages of using "Guided search" is that you can avoid producing empty result sets by not allowing invalid parameter combinations. If the query-string is user-editable, they can come up with invalid combinations, circumventing the guided search.
I think the more human-readable manner, i.e. www.mydomain.com?color=blue&type=laptop&screen-size=17 is the better approach to take here. Just make sure you are sanitizing everything coming from the url before it gets to the database.
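As a rough sketch of the readable approach with validation, assuming a fixed set of allowed filter values (the `ALLOWED_FILTERS` table and both function names are made up for illustration):

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode

# Hypothetical allow-list; in practice this would come from the catalog.
ALLOWED_FILTERS = {
    "color": {"blue", "red"},
    "type": {"laptop", "desktop"},
    "screen-size": {"15", "17"},
}


def build_filter_query(selections: Dict[str, str]) -> str:
    """Serialize selections into a stable, shareable query string."""
    return urlencode(sorted(selections.items()))


def parse_filter_query(query: str) -> Dict[str, str]:
    """Keep only known filters with known values; drop everything else."""
    clean = {}
    for key, values in parse_qs(query).items():
        allowed = ALLOWED_FILTERS.get(key)
        if allowed and values[0] in allowed:
            clean[key] = values[0]
    return clean


if __name__ == "__main__":
    print(build_filter_query({"type": "laptop", "color": "blue"}))
    print(parse_filter_query("color=blue&type=laptop&screen-size=17"))
```

Sorting the keys keeps the URL identical no matter what order the filters were clicked in, which helps with bookmarking and caching. Checking against an allow-list also covers hand-edited URLs with unknown parameters or values.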
The query string has a very reachable max length (255?), which is probably the reason for the serialization.

mysql random generated value

I need to generate a random alphanumeric code to give to users, which they then come to the site to enter. I don't know much about random numbers; I know there are seeding issues, but I'm not sure what they are.
So, I used this:
SELECT SUBSTRING(MD5(CONCAT_WS('-', MD5(username_usr),
MD5(zip_usr), MD5(id_usr),
MD5(created_usr))), -12) FROM users_usr
Is this safe? I used concat_ws because sometimes zip is null, but the others never are.
And yes, I know this is kinda short, but 1. They have to enter the last 4 of their social, 2. It's 1 time use, 3. There's no private data displayed back in the application and 4. I may use captcha, but since there's no private data, that's probably overkill.
Thanks
Maybe using the Universal Unique Identifier would suffice? Just to keep it simple?
If you need a random alphanumeric value, why are you using so many variables? Something like the following should be perfectly enough:
md5(rand())
--Flavor: MySql
It'd help to know the purpose of the "random" string. This isn't random - it's repeatable - and fairly easily repeatable, at that. You're not exposing any sensitive information in a way that's easily reversible, but I'm guessing you're really looking for a way to generate a UUID (universally unique ID). Not coincidentally, recent MySQL versions have a function called UUID.
http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
That might better solve the problem you're trying to address. If you really want a random number (which can definitely have collisions, by the way) for some reason, don't worry about seeding. If you don't specify a seed, it'll self-seed in a way that's probably better than a fixed seed anyway. You'd then map that random number (or a series of random numbers) to a character (possibly by casting the integer to a char), and repeat that until you have a string of chars long enough. But it bears repeating that a random number is not a guaranteed unique number...
Someone in the deleted duplicate of this question suggested using UUID(), which I think is a good idea. I don't think there's anything greatly wrong with using MD5(RAND()) either.
You'd have to store those, of course, which you don't have to do with your example.
>>SELECT md5(RAND()+CURRENT_TIMESTAMP())
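To illustrate the "repeatable vs. random" point from the answers above, here is a small sketch done on the application side rather than in MySQL. Both function names are hypothetical, and `derived_code` only imitates the idea of hashing fixed user fields, not the exact query from the question.

```python
import hashlib
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def derived_code(*fields) -> str:
    """Hash fixed fields: the same inputs always give the same code."""
    joined = "-".join(str(f) for f in fields if f is not None)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()[-12:]


def random_code(length: int = 12) -> str:
    """Draw each character from the OS-backed secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(derived_code("alice", None, 42))  # identical on every run
    print(random_code())                    # differs on every run
```

A code generated this way has to be stored with the user so it can be checked later, and uniqueness still needs a constraint or a retry, since random values can collide.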

Search and replace a term in a MySQL database

I recently added a new project to our issue tracker, which is Redmine. When creating a project, you give it a name and an identifier (which are often the same). There is a note when creating the project that you cannot change the identifier once it has been set. For this reason I was very careful to choose a generic identifier. Unfortunately, I wasn't careful enough and I spelled it wrong! The misspelled identifier appears in the issue tracker URLs. These will be seen by other developers and another company we are working with, so it's a very embarrassing mistake.
So I'm looking for suggestions as to how to fix this. Either Redmine-specific, or something I can do at the database level (which is MySQL).
I've already found a solution that I will probably go with, but I thought it would be worth asking here anyway. I'm hoping someone can offer a simpler solution - maybe a magical SQL one-liner.
The solution I've found is this:
Dump the database to SQL (using mysqldump)
search and replace with sed or a text editor
recreate the database from this SQL.
Thanks for any suggestions.
Turns out it was as simple as:
update `projects` set `identifier` = '[NEWNAME]' where `identifier` = '[OLDNAME]';
If the identifier is just confined to some column or set of columns, you can use:
update [table] set [field] = replace([field],'[find]','[replace]');
Replace the bracketed text with the identifiers in your case, as appropriate.
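If you'd like to try the update on a throwaway copy first, here is a minimal sketch. It uses an in-memory SQLite database purely as a stand-in for MySQL, and the `projects` table is reduced to the one column the statement touches, so treat the schema and the `rename_identifier` helper as assumptions.

```python
import sqlite3


def rename_identifier(conn: sqlite3.Connection, old: str, new: str) -> int:
    """Run the one-line UPDATE with bound parameters; return rows changed."""
    cursor = conn.execute(
        "UPDATE projects SET identifier = ? WHERE identifier = ?",
        (new, old),
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, identifier TEXT)")
    conn.executemany(
        "INSERT INTO projects (identifier) VALUES (?)",
        [("widgit",), ("other",)],
    )
    print(rename_identifier(conn, "widgit", "widget"))
    print(conn.execute("SELECT identifier FROM projects ORDER BY id").fetchall())
```

Checking the returned row count is a cheap way to confirm the WHERE clause matched exactly the project you meant.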