More advanced search with Sourcegraph

I am trying to create a query that states the following:
Find me repos that have file 'baz', which contain 'foo', and also contain file 'bar'.
What I have now doesn't include the second part:
file:baz foo

This query can't be expressed in the current query syntax, which was designed primarily for finding text/regex matches in code.
However, we're expanding search to cover more advanced queries, so I've filed an issue here for this feature: https://github.com/sourcegraph/issues/issues/127. Thanks for using Sourcegraph and feel free to comment on the issue with more context about your use case :)

Related

How to boost results that contain the search phrase on a field

I am trying to put Solr search results that contain my search phrase in a specific field (here resourcename) on the top of the result set.
I am a beginner with Solr. I have searched the web for quite a while and found some related questions, like:
Use function query for boosting score in Solr
SolrNet queries with boost functions
Then I started experimenting myself with queries like these:
https://localhost:8898/solr/collection1/select?defType=edismax&fl=resourcename&indent=on&q=resourcename:"test"*^200,%20content:"test"*^1&qf=resourcename^200%20content^2&rows=1000&wt=json
https://localhost:8898/solr/collection1/select?bf=if(exists(resourcename),100,1)&defType=edismax&fl=resourcename&indent=on&q=resourcename:"test"*^200,%20content:"test"*^1&rows=1000&wt=json
https://localhost:8898/solr/collection1/select?bf=if(exists(resourcename),100,1)&defType=edismax&fl=resourcename&indent=on&q=*:"test"*&rows=1000&wt=json
https://localhost:8898/solr/collection1/select?defType=edismax&fl=resourcename&indent=on&q=*:"test"*&qf=resourcename^200%20content^2&rows=1000&wt=json
But, no matter what I try, I get results containing the word test in the resourcename all over the place and not only on the top of the results.
Any ideas what I might be missing or doing wrong?
There are a lot of syntax mistakes in those queries; I would recommend taking a look at the Solr wiki page on query parsers [1].
As a suggestion, always look at the parsed query and explore the debug functionality for search results.
To get the behavior you are asking for, I would use the following request parameters (quoting from the wiki):
q=foo bar
qf=field1^5 field2^10
pf=field1^50 field2^20
defType=dismax
With these parameters, the Dismax Query Parser generates a query that looks something like this:
(+(field1:foo^5 OR field2:foo^10) AND (field1:bar^5 OR field2:bar^10))
But it also generates another query that will only be used for boosting results:
field1:"foo bar"^50 OR field2:"foo bar"^20
This way you can boost results according to matches in some fields, each with its own boost, and additionally boost documents in which the whole phrase appears in specific fields.
[1] https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
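Applied to the fields in the question, the parameters above could be assembled as in the sketch below. It is only an illustration: the endpoint is copied from the question's URLs, the boost values in qf and pf are arbitrary examples, and edismax is used because the question already uses it. Building the URL with urlencode also avoids hand-escaping the ^ and space characters.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs in the question; adjust to your own core.
SOLR_SELECT = "https://localhost:8898/solr/collection1/select"


def build_boosted_query(phrase):
    """Return a select URL that ranks resourcename matches above content matches."""
    params = {
        "defType": "edismax",
        # Plain user input; field weights go in qf/pf, not in q.
        "q": phrase,
        # Fields to search and their per-field boosts (example values).
        "qf": "resourcename^200 content^2",
        # Extra boost when the whole phrase appears in the field (example value).
        "pf": "resourcename^50",
        "fl": "resourcename",
        "rows": 1000,
        "wt": "json",
        # Ask Solr to return the parsed query and score explanations.
        "debugQuery": "true",
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(build_boosted_query("test"))
```

The debugQuery output is where you can check whether the parsed query actually contains the boosts you expect.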

SQL query for words (not the sentence)

I would like to query a single column (varchar):
sample datarows:
1) The fox jumps like a foo on my bar
2) Jumpers are not cool
3) Apple introduced iJump
When I enter a search term such as: jump
I expect to get a result set of: jumps, Jumpers, iJump
(So I don't want the complete row.)
Currently I'm using MySQL (I'm open to suggestions as long as it's open source).
Since you're using MySQL, I might suggest looking into LIB_MYSQLUDF_PREG.
This open source library will provide you with additional regex functionality, including the PREG_CAPTURE function, which extracts a regex match from a string.
Using this function, you could easily build a regex to return the match you're looking for... Something like:
\b\w*jump\w*\b
Getting any row with your search criteria is easy:
SELECT sentence
FROM sentences
WHERE sentence LIKE '%jump%'
I'd probably do the rest in application logic, since doing it in the database doesn't help you at all.
Also, any method of splitting a string and handling it will probably be database-specific, so you would need to say which one you're using.
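If you do go the application-logic route, a minimal sketch in Python might look like the following. It assumes the rows have already been fetched (for example with the LIKE query above) and reuses the \b\w*jump\w*\b idea from the other answer; the function name matching_words is made up for this example.

```python
import re


def matching_words(rows, term):
    """Return every whole word in rows that contains term, ignoring case."""
    # re.escape keeps characters in the search term from being read as regex syntax.
    pattern = re.compile(r"\b\w*" + re.escape(term) + r"\w*\b", re.IGNORECASE)
    words = []
    for row in rows:
        words.extend(pattern.findall(row))
    return words


rows = [
    "The fox jumps like a foo on my bar",
    "Jumpers are not cool",
    "Apple introduced iJump",
]
print(matching_words(rows, "jump"))  # ['jumps', 'Jumpers', 'iJump']
```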

Extracting MySQL data within "tags" using regular expressions? [duplicate]

Possible Duplicate:
Simulating regex capture groups in mysql
Good day,
I have many rows of data stored in a MySQL table. A typical value could look something like this:
::image-gallery::
::gallery-entry::images/01.jpg::/gallery-entry::
::/image-gallery::
Is there a way, by means of a regular expression, to a) extract the term image-gallery from the first line (it could be any phrase, not just image-gallery) and b) extract the center line as two separate values, like this:
gallery-entry and then images/01.jpg
There could be many lines of ::gallery-entry:: values, and they could be called anything as well. A more complete example would be:
::image-gallery::
::title::MY GALLERY::/title::
::date::2011-05-20::/date::
::gallery-entry::images/01.jpg::/gallery-entry::
::/image-gallery::
In essence I want this information: the content type (image-gallery in the above case), which appears on the first and last lines. Then I need the title as a key-value pair, with title as the key and MY GALLERY as the value. After that, I would need all the following fields (such as gallery-entry) as key-value pairs too.
This is for a migration script where data from an old system will be migrated over to a new system with different syntax.
If MySQL select statements would not work, would it be easier to parse the results with a PHP script for data extraction?
Any and all help is always appreciated.
Kind regards,
Simon
Try this regex:
::image-gallery::\s+::title::(.*?)::/title::.*?::gallery-entry::(.*?)::/gallery-entry::\s+::/image-gallery::
Use single-line mode (/pattern/s) so the .*? chews up newlines.
Your key-value pairs will be:
title: $1 (matching group 1)
gallery-entry: $2 (matching group 2)
According to the linked question, Simulating regex capture groups in mysql, there does not seem to be an easy way to capture groups with a regex in MySQL, because MySQL does not natively support capture groups. If you want that functionality, you can use a server-side extension like lib_mysqludf_preg to add that capability to MySQL.
The easiest way is to extract the whole column with SQL and then do the text matching in another language (such as PHP).
In my tests kenbritton's regex didn't work, but building off of it the following regex worked on your test data:
::image-gallery::\s+::title::(.*?)::\/title::\s+(?:.*\s+)*::gallery-entry::(.*?)::\/gallery-entry::\s+::\/image-gallery::
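Since the tag names can be anything, a more general take on the "parse it in a script" route is to match every ::key::value::/key:: line with a backreference instead of hard-coding title and gallery-entry. The sketch below uses Python rather than PHP purely for illustration; parse_block and the two pattern names are hypothetical, and it assumes each pair sits on its own line with no trailing whitespace.

```python
import re

# One "::key::value::/key::" pair per line; \1 forces the closing tag to match the key.
PAIR = re.compile(r"^::([\w-]+)::(.*?)::/\1::$", re.MULTILINE)
# A line holding only an opening tag, e.g. "::image-gallery::".
OPEN = re.compile(r"^::([\w-]+)::$", re.MULTILINE)


def parse_block(text):
    """Return (content_type, [(key, value), ...]) for one tagged block."""
    opening = OPEN.search(text)
    content_type = opening.group(1) if opening else None
    return content_type, PAIR.findall(text)


sample = """::image-gallery::
::title::MY GALLERY::/title::
::date::2011-05-20::/date::
::gallery-entry::images/01.jpg::/gallery-entry::
::/image-gallery::"""

content_type, pairs = parse_block(sample)
print(content_type)  # image-gallery
print(pairs)
```

The pairs come back as a list rather than a dict so that repeated keys, such as several gallery-entry lines, are all preserved.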

MySQL: Find and Replace Between Certain Characters

In field post_content I have a string like this in nearly 800 rows:
http://somesite.com/">This is some site</a>
I need to remove everything from "> onwards so that it leaves just the URL. I can't do a straight find and replace because the text is unique.
Any clues? This is really my first foray into MySQL database modifications but I did do an extensive search before posting here.
Thanks,
~Kyle~
From this site: http://www.regular-expressions.info/mysql.html
LIB_MYSQLUDF_PREG
If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you'll need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:
Here it comes...
PREG_CAPTURE extracts a regex match from a string.
PREG_POSITION returns the position at which a regular expression matches a string.
PREG_REPLACE performs a search-and-replace on a string.
PREG_RLIKE tests whether a regex matches a string.
Sounds exactly what you're looking for.
All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. E.g. to test if regex matches the subject case insensitively, you'd use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.
See this post: How to do a regular expression replace in MySQL?
Either that, or you could just write a script in any language that goes through each record, does a regex replacement, and then updates the field. For more info on regex, see here: http://www.regular-expressions.info/reference.html
There are a number of options. One might be to use SUBSTRING_INDEX():
UPDATE
table
SET field = SUBSTRING_INDEX( field, '">', 1 )
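If you would rather dry-run the change on exported values before touching the table, the same "keep everything before the first marker" logic can be sketched in Python. This assumes the field holds only the string shown in the question; strip_after_marker is a made-up helper name.

```python
def strip_after_marker(value, marker='">'):
    """Keep the text before the first marker, like SUBSTRING_INDEX(value, marker, 1)."""
    # split with maxsplit=1 returns the whole string as the only item when the
    # marker is absent, so rows without the marker are left unchanged.
    return value.split(marker, 1)[0]


print(strip_after_marker('http://somesite.com/">This is some site</a>'))
# http://somesite.com/
```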
It's possible - there is a syntax for User Defined Functions which would let you pass in a regular expression pattern that matches the link and strips everything else.
However, this is quite complicated for somebody new to MySQL, and from your question, this sounds like a one-off. In which case - why not just use Excel and then reimport the data?
Great stuff!
All seems doable with a little bit of time and self education.
In the end, I exported that table as a CSV in Sequel Pro and did some nifty find and replace work in Coda. Not as sophisticated as your suggestions, but it worked.
Thanks again,
~Kyle~

MySQL query for "starts with but may not fully contain"

Is there a way to do a MySQL query for data fields that start with but may not fully contain a given string?
For instance, if I had the following list of data items:
my_table
1. example.com
2. example.com/subpage
3. subdomain.example.com
4. ain.example.com
5. ple.com
I would like to feed
"example.com/subpage" and return #1, #2
"example.com" and return #1
"wexample.com" and return nothing
"exa" and return nothing
"subdomain.example.com/subpage" and return #3
Thanks a lot!
Given:
CREATE TABLE paths ( path VARCHAR(255) NOT NULL );
Searching for "example.com/subpage" would require the following query:
SELECT * FROM paths WHERE INSTR("example.com/subpage", path) = 1;
Just don't try to run it over a large dataset frequently...
Docs: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr
Since your test data indicates you don't want character-by-character matching (but something more like component by component), split the input into the components and search on all prefixes.
If you want to return results for example.com but not exam, you are NOT searching for something that "starts with" your input. I'm not sure whether the question or the examples are wrong.
If the examples are correct, you're going to need to do something to identify if your input is a URL or not using pattern matching like regex or at least specify some solid rules around what you want to match. You'll probably need to explain those rules before a correct recommendation can be made too.
It might be as simple as extracting anything before the "/" if there is one or using your application to break up your request to a url component and a path component.
More info on regex in MySQL
It seems that you want the column value to match the start of your pattern:
SELECT * FROM my_table WHERE 'example.com' LIKE CONCAT(my_table.my_column, '%');
The downside of this is that it isn't going to use any indexes on my_column.
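To sanity-check the rule "the stored value must be a prefix of the input" against the examples in the question, here is a small Python sketch. It is not a replacement for the query, and stored_prefixes is a hypothetical name. One difference worth knowing: str.startswith compares literally, whereas in the LIKE version any % or _ inside a stored value would act as a wildcard.

```python
def stored_prefixes(stored_values, query):
    """Return the stored values that the query string starts with."""
    return [value for value in stored_values if query.startswith(value)]


my_table = [
    "example.com",
    "example.com/subpage",
    "subdomain.example.com",
    "ain.example.com",
    "ple.com",
]

print(stored_prefixes(my_table, "example.com/subpage"))
# ['example.com', 'example.com/subpage']
print(stored_prefixes(my_table, "exa"))
# []
```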