LinqToSql and full text search - can it be done? - linq-to-sql

Has anyone come up with a good way of performing full text searches (FREETEXT(), CONTAINS()) for any number of arbitrary keywords using standard LinqToSql query syntax?
I'd obviously like to avoid having to use a stored procedure or generate dynamic SQL calls.
Obviously I could just pump the search string in on a parameter to a SPROC that uses FREETEXT() or CONTAINS(), but I was hoping to be more creative with the search and build up queries like:
"pepperoni pizza" and burger, not "apple pie".
Crazy I know - but wouldn't it be neat to be able to do this directly from LinqToSql? Any tips on how to achieve this would be much appreciated.
Update: I think I may be on to something here...
Also: I rolled back the change made to my question title because it actually changed the meaning of what I was asking. I know that full text search is not supported in LinqToSql - I would have asked that question if I wanted to know that. Instead - I have updated my title to appease the edit-happy-trigger-fingered masses.

I've managed to get around this by using a table-valued function to encapsulate the full text search component, and then referencing it within my LINQ expression, which keeps the benefits of delayed execution:
string q = query.Query;
IQueryable<Story> stories = ActiveStories
.Join(tvf_SearchStories(q), o => o.StoryId, i => i.StoryId, (o,i) => o)
.Where (s => (query.CategoryIds.Contains(s.CategoryId)) &&
/* time frame filter */
(s.PostedOn >= (query.Start ?? SqlDateTime.MinValue.Value)) &&
(s.PostedOn <= (query.End ?? SqlDateTime.MaxValue.Value)));
Here 'tvf_SearchStories' is the table-valued function that internally uses full text search.
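The string handed to a function like tvf_SearchStories still has to be in a form CONTAINS() accepts. As a rough sketch only (in Python for brevity, not part of the original answer), a query such as the one in the question could be translated into a CONTAINS search condition like this. The function name to_contains_condition is made up for illustration, and only the keywords "and" and "not" are handled; anything else is treated as a search term.

```python
import re

# A token is either a quoted phrase or a bare word; commas act as separators.
TOKEN = re.compile(r'"([^"]+)"|([^\s,"]+)')


def to_contains_condition(user_query: str) -> str:
    """Translate a simple user query into a CONTAINS-style search condition.

    Hypothetical helper: every term is combined with AND, and a term that
    follows the bare word "not" is combined with AND NOT.
    """
    parts = []
    negate = False
    for phrase, word in TOKEN.findall(user_query):
        if not phrase and word.lower() == "and":
            continue  # AND is the default combinator
        if not phrase and word.lower() == "not":
            negate = True
            continue
        term = phrase or word
        if not parts and negate:
            # A search condition cannot start with NOT.
            raise ValueError("query must start with a positive term")
        if parts:
            parts.append("AND NOT" if negate else "AND")
        parts.append(f'"{term}"')
        negate = False
    return " ".join(parts)


if __name__ == "__main__":
    print(to_contains_condition('"pepperoni pizza" and burger, not "apple pie"'))
```

The resulting string would be passed as an ordinary parameter (the q variable above), so the query itself stays parameterised.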

Unfortunately LINQ to SQL does not support Full Text Search.
There are a bunch of products out there that I think could: Lucene.NET and NHibernate Search come to mind. LINQ for NHibernate combined with NHibernate Search would probably give that functionality, but both are still way deep in beta.

Related

Using django query set values() to index into JSONField

I am using django with postgres, and have a bunch of JSON fields (some of them quite large and detailed) within my model. I'm in the process of switching from char based ones to jsonb fields, which allows me to filter on a key within the field, and I'm wondering if there is any way to get the equivalent benefit out of a call to the query set values method.
Example:
What I would like to do, given a Car model with options JSONField, is something like
qset = Car.objects.filter(options__interior__color='red')
vals = qset.values('options__interior__material')
Please excuse the lame toy problem, but hopefully it gets the idea across. Here the filter call does exactly what I want, but the call to values does not seem to be aware of the special nature of the JSON field. I get an error because values can't find the field called "interior" to join on. Is there some other syntax or option that I am missing that will make this work?
Seems like a pretty obvious extension to the existing functionality, but I have so far failed to find any reference to something similar in the docs or through stack overflow or google searches.
Edit - a workaround:
After playing around, looks like this could be fudged by inserting the following in between the two lines of code above:
qset=qset.annotate(options__interior__material=RawSQL("SELECT options->'interior'->'material'",()))
I say "fudged" because it seems like an abuse of notation and would require special treatment for integer indices.
Still hoping for a better answer.
I can suggest a slightly cleaner way, using Django's Func expressions (https://docs.djangoproject.com/en/2.0/ref/models/expressions/#func-expressions) together with the Postgres function jsonb_extract_path_text (https://www.postgresql.org/docs/9.5/static/functions-json.html):
from django.db.models import F, Func, CharField, Value
Car.objects.all().annotate(
    options__interior__material=Func(
        F('options'),
        Value('interior'),
        Value('material'),
        function='jsonb_extract_path_text',
        # jsonb_extract_path_text returns text, so declare the result type
        output_field=CharField(),
    ),
)
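To make it clearer what such a path lookup returns, here is a plain-Python sketch of the semantics as I understand them; it is not how Postgres or Django implement it. The name extract_path_text is invented for illustration, and the JSON text it produces for nested objects may differ from Postgres in key order and spacing. It also shows how integer indices into arrays are just path elements written as text.

```python
import json


def extract_path_text(document, *path):
    """Approximate jsonb_extract_path_text(document, *path) on Python data.

    Returns None where Postgres would return NULL (missing key, bad index,
    or a JSON null at the end of the path).
    """
    current = document
    for key in path:
        if isinstance(current, dict):
            if key not in current:
                return None
            current = current[key]
        elif isinstance(current, list):
            # Array elements are addressed by integer indices given as text.
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                return None
        else:
            return None  # cannot descend into a scalar
    if current is None:
        return None
    if isinstance(current, str):
        return current  # strings come back unquoted
    return json.dumps(current)  # other values come back as JSON text


if __name__ == "__main__":
    options = {"interior": {"color": "red", "material": "leather",
                            "seats": [{"heated": True}]}}
    print(extract_path_text(options, "interior", "material"))
    print(extract_path_text(options, "interior", "seats", "0", "heated"))
```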
Perhaps a better solution (for Django >= 1.11) is to use something like this:
from django.contrib.postgres.fields.jsonb import KeyTextTransform
Car.objects.filter(options__interior__color='red').annotate(
interior_material=KeyTextTransform('material', KeyTextTransform('interior', 'options'))
).values('interior_material')
Note that you can nest KeyTextTransform expressions to pull out the value(s) you need.
You can also use .extra() and add the Postgres jsonb operators:
(Car.objects.extra(select={'interior_material': "options#>'{interior, material}'"})
    .filter(options__interior__color='red')
    .values('interior_material'))
Postgres jsonb operators: https://www.postgresql.org/docs/9.5/static/functions-json.html#FUNCTIONS-JSON-OP-TABLE

Extract all words from text field in mysql

I have a table that contains text fields. In those fields I store text. There are around 20 to 50 sentences in each field depending on the row. I am making an auto-complete HTML object with HTML and PHP, and I would like to start typing the beginning of a word and have the database return sentences containing those words (like the Microsoft Office 2007/2010 navigation pane).
I need MySQL to return those words or sentences as separate results, so I can manipulate them further.
Example:
-------------------------------------------------------------------------
| id | title  | content                                                 |
-------------------------------------------------------------------------
| 1  | test 1 | PHP is a very nice language and has nice features.      |
| 2  | test 2 | Spain is a nice country to visit and has nice language. |
| 3  | test 3 | Perl isn't as nice a language as PHP.                   |
-------------------------------------------------------------------------
I need a MySQL query to return the following as separate results:
1,"nice language"
1,"nice features"
2,"nice country"
2,"nice language"
3,"nice a language"
Here is my SQL query:
SELECT id, SUBSTR(content, POSITION('nice' IN content), 50)
FROM entries
WHERE MATCH (title, content) AGAINST ('nice' WITH QUERY EXPANSION)
New Answer
The OP's question actually has nothing to do with PHP or JavaScript: it is about doing string manipulation directly within MySQL.
String manipulation isn't really the main focus of a DBMS. When dealing with "words" in a fluid text sense, there's a lot of logic required to determine where the next word boundary is, and you don't want your database doing this really. Plus, any queries written to do this will probably be incredibly difficult to read.
It depends exactly what you are doing, but it's quite likely that a DB only approach will be slower because there will be more function calls: SQL functions are pretty limited.
And for re-usability and best practice, what if you wanted to change your database in the future to say MongoDB? You'd need to re-write the whole damned awkward query.
No, my suggestion would be to pull the whole value using standard MySQL into PHP, throw it into PCRE, very simple regex, job done. It's better to show what you're actually doing in your PHP code as it's more "intention revealing".
At least 33% of a developer's work is picking the right tool for the job. PHP is the right tool in this example.
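The "pull the value out and run a very simple regex" step could look roughly like this. The sketch is in Python rather than PHP, and the function name phrases_starting_with and the choice to return the matched word plus the one word after it are my own assumptions, not something from the question.

```python
import re


def phrases_starting_with(prefix, rows):
    """Return (id, "word next_word") pairs for words that start with prefix.

    rows is an iterable of (id, content) tuples as fetched from the database.
    """
    # \b anchors the prefix to the start of a word; \w* completes that word,
    # and the second group captures the word that follows it.
    pattern = re.compile(
        r"\b(" + re.escape(prefix) + r"\w*)\s+(\w+)", re.IGNORECASE
    )
    results = []
    for row_id, content in rows:
        for word, following in pattern.findall(content):
            results.append((row_id, f"{word} {following}"))
    return results


if __name__ == "__main__":
    rows = [
        (1, "PHP is a very nice language and has nice features."),
        (2, "Spain is a nice country to visit and has nice language."),
        (3, "Perl isn't as nice a language as PHP."),
    ]
    for row_id, phrase in phrases_starting_with("nice", rows):
        print(row_id, phrase)
```

The rows would come from an ordinary SELECT id, content query; the word-boundary logic then lives in application code where it is easy to read and change.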
Original Answer
You have included the tags php and javascript, so I'm guessing (although your question needs more clarification on this) that you obviously want this 'autocomplete' running client-side. So as a result, you have to get your data from server-side to client-side first.
Twitter Bootstrap has something really cool called Typeahead. This uses JavaScript to perform what (I think) you require: the example on that page shows how you can type a country and it'll auto-complete it for you.
How do you get this working? Include the required JavaScript file first, and then write your HTML.
Here's some from the source code of the bootstrap page so you can see how it works:
<input type="text" data-provide="typeahead" data-items="4" data-source='["Alabama","Alaska","Arizona","Arkansas","California"]'>
Can you see how the data-source attribute is the one that gives the typeahead the information you want? You want to connect to MySQL, grab your data, and shove these into the data-source array for the JavaScript to work with, as above.
So, on your page load, you connect to MySQL and you pull all the relevant strings you would like to be "auto-complete-able" from the Database. You then put these as new Data attributes for the typeahead, and that's pretty much it!
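As a rough sketch of that last step (in Python here rather than PHP, with a made-up helper name render_typeahead_input), the strings pulled from the database can be serialised as JSON and HTML-escaped before being written into the data-source attribute:

```python
import html
import json


def render_typeahead_input(suggestions, items=4):
    """Build the typeahead <input> tag for a list of suggestion strings.

    The list is serialised as JSON and then HTML-escaped so that quotes in
    the data cannot break out of the attribute value.
    """
    source = html.escape(json.dumps(suggestions), quote=True)
    return (
        '<input type="text" data-provide="typeahead" '
        f'data-items="{items}" data-source="{source}">'
    )


if __name__ == "__main__":
    # In a real page the list would come from the database query.
    print(render_typeahead_input(["Alabama", "Alaska", "Arizona"]))
```

The browser decodes the escaped quotes when it reads the attribute, so the JavaScript still sees a normal JSON array.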
--
Edit: There's a fork of twitter bootstrap's typeahead that allows AJAX calls, so you could use this to perform the data retrieval asynchronously (if you can figure it out, I'd recommend this approach).

LINQ-SQL reuse - CompiledQuery.Compile

I have been playing about with LINQ-SQL, trying to get re-usable chunks of expressions that I can hot plug into other queries. So, I started with something like this:
Func<TaskFile, double> TimeSpent = (t =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Then, we can use the above in a LINQ query like the below (LINQPad example):
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
This produces the expected output, except, a query per row is generated for the plugged expression. This is visible within LINQPad. Not good.
Anyway, I noticed the CompiledQuery.Compile method. Although this takes a DataContext as a parameter, I thought I would ignore it and try the same Func. So I ended up with the following:
static Func<UserQuery, TaskFile, double> TimeSpent =
CompiledQuery.Compile<UserQuery, TaskFile, double>(
(UserQuery db, TaskFile t) =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Notice here that I am not using the db parameter. However, when we now use this updated Func, only one SQL query is generated. The expression is successfully translated to SQL and included within the original query.
So my ultimate question is, what makes CompiledQuery.Compile so special? It seems that the DataContext parameter isn't needed at all, and at this point I am thinking it is more a convenience parameter to generate full queries.
Would it be considered a good idea to use the CompiledQuery.Compile method like this? It seems like a big hack, but it seems like the only viable route for LINQ re-use.
UPDATE
Using the first Func within a Where statement throws the following exception:
NotSupportedException: Method 'System.Object DynamicInvoke(System.Object[])' has no supported translation to SQL.
The Where statement looked like the following:
.Where(t => TimeSpent(t) > 2)
However, when we use the Func generated by CompiledQuery.Compile, the query is successfully executed and the correct SQL is generated.
I know this is not the ideal way to re-use Where statements, but it shows a little how the Expression Tree is generated.
Exec Summary:
Expression.Compile generates a CLR method, whereas CompiledQuery.Compile generates a delegate that is a placeholder for SQL.
One of the reasons you did not get a correct answer until now is that some things in your sample code are incorrect. And without the database or a generic sample someone else can play with chances are further reduced (I know it's difficult to provide that, but it's usually worth it).
On to the facts:
Expression<Func<TaskFile, double>> TimeSpent = (t =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Then, we can use the above in a LINQ query like the below:
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
(Note: Maybe you used a Func<> type for TimeSpent. That leads to the same situation as the scenario outlined in the paragraph below. Make sure to read and understand it though.)
No, this won't compile. Expressions can't be invoked (TimeSpent is an expression). They need to be compiled into a delegate first. What happens under the hood when you invoke Expression.Compile() is that the Expression Tree is compiled down to IL which is injected into a DynamicMethod, for which you get a delegate then.
The following would work:
var q = TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent.Compile().DynamicInvoke(t)
});
"This produces the expected output, except, a query per row is generated for the plugged expression. This is visible within LINQPad. Not good."
Why does that happen? Well, Linq To Sql will need to fetch all TaskFiles, materialize TaskFile instances and then run your selector against them in memory. You likely get a query per TaskFile because they contain one or more 1:m mappings.
While LTS allows projecting in memory for selects, it does not do so for Wheres (citation needed, this is to the best of my knowledge). When you think about it, this makes perfect sense: it is likely you will transfer a lot more data by filtering the whole database in memory than by transforming a subset of it in memory. (Though it creates query performance issues as you see, something to be aware of when using an ORM.)
CompiledQuery.Compile() does something different. It compiles the query to SQL and the delegate it returns is only a placeholder Linq to SQL will use internally. You can't "invoke" this method in the CLR, it can only be used as a node in another expression tree.
So why does LTS generate an efficient query with the CompiledQuery.Compile'd expression then? Because it knows what this expression node does, because it knows the SQL behind it. In the Expression.Compile case, it's just an InvocationExpression that invokes the DynamicMethod as I explained previously.
Why does it require a DataContext Parameter? Yes, it's more convenient for creating full queries, but it's also because the Expression Tree compiler needs to know the Mapping to use for generating the SQL. Without this parameter, it would be a pain to find this mapping, so it's a very sensible requirement.
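The distinction between "a tree the provider can read" and "an opaque compiled method" can be shown with a toy model. This is only an analogy in Python, not how LINQ to SQL is implemented; the node classes and to_sql are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


def to_sql(node):
    """Translate a tree of known node types into a SQL fragment."""
    if isinstance(node, Column):
        return node.name
    if isinstance(node, Const):
        return repr(node.value)
    if isinstance(node, BinOp):
        return f"({to_sql(node.left)} {node.op} {to_sql(node.right)})"
    # Anything else (for example an already-compiled function) is a black box.
    raise NotImplementedError(f"{node!r} has no supported translation to SQL")


if __name__ == "__main__":
    # An inspectable tree: the translator can walk it and emit SQL.
    tree = BinOp(">", BinOp("-", Column("DateEnded"), Column("DateStarted")), Const(2))
    print(to_sql(tree))

    # The same logic as a compiled callable: nothing to walk, so it fails.
    opaque = lambda row: row["DateEnded"] - row["DateStarted"] > 2
    try:
        to_sql(opaque)
    except NotImplementedError as exc:
        print("cannot translate:", type(exc).__name__)
```

The opaque callable can still be run row by row in memory, which is roughly what happens in the Select case, but it cannot be folded into the generated SQL.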
I'm surprised you've got no answers on this so far. CompiledQuery.Compile compiles and caches the query. That is why you see only one query being generated.
Not only is this NOT a hack, it is the recommended way!
Check out these MSDN articles for detailed info and example:
Compiled Queries (LINQ to Entities)
How to: Store and Reuse Queries (LINQ to SQL)
Update: (exceeded the limit for comments)
I did some digging in reflector & I do see DataContext being used. In your example, you're simply not using it.
Having said that, the main difference between the two is that the former creates a delegate (for the expression tree) and the latter creates the SQL that gets cached and actually returns a function (sort of). The first two expressions produce the query when you call Invoke on them, this is why you see multiple of them.
If your query doesn't change, but only the DataContext and Parameters, and if you plan to use it repeatedly, CompiledQuery.Compile will help. It is expensive to Compile, so for one off queries, there is no benefit.
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
This isn't a LinqToSql query, as there is no DataContext instance. Most likely you are querying some EntitySet, which does not implement IQueryable.
Please post complete statements, not statement fragments. (I see invalid comma, no semicolon, no assignment).
Also, Try this:
var query = myDataContext.TaskFiles
.Where(tf => tf.Parent.Key == myParent.Key)
.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t)
});
// where myParent is the source of the EntitySet and Parent is a relational property.
// and Key is the primary key property of Parent.

Parsing HTML content into a MySQL database using a parser

I want to be able to parse specific content from a website into a MySQL database. For example, from the page http://allrecipes.com/Recipe/Fluffy-Pancakes-2/Detail.aspx I want to parse content into my database (which has a table with columns RecipeName, Ingredients 1-10).
So basically my database will contain the name and all the ingredients for that recipe. There is no need to edit the content, simply parse them in as is (i.e. 3/4 cup milk) since I am using character columns in my database.
How exactly do I go about doing this? I was looking at pre-built parsers and it seems it's tough to find one that's easy to use, since I am fairly new to programming. Of course, I can manually enter values in but I want to parse them in.
Would it be possible to just parse this content and write a file that has a RecipeName, Ingredient string which I can then parse into my database? Or should I just do it directly into the database? I am also unsure how to connect a database directly to a parser, but I might be able to find some information online.
Basically, I am looking for help on how to exactly go about doing this since I am not very well versed in programming and this seems to be a lot more complicated than it might be.
I am using Java as my main language right now, although I can't say I am very good at it. But I should be able to understand the basic concepts.
Any suggestions on what parser to use or how to do this?
Thanks!
This is how I would do it in PHP. This is almost certainly NOT the most efficient way to do it, nor has it been debugged.
function parseHTML($rawHTML){
    // Find the position of the beginning of the ingredients list (character offset).
    $startPosition = strpos($rawHTML, '<div class="ingredients"');
    if ($startPosition === false) { return ''; } // No ingredients list found.
    // Find the end of the ingredients list, searching from the start found above.
    $endPosition = strpos($rawHTML, '</div>', $startPosition);
    if ($endPosition === false) { $endPosition = strlen($rawHTML); }
    // Isolate the ingredients list. Note: substr() takes a length, not an end position.
    $relevantPart = substr($rawHTML, $startPosition, $endPosition - $startPosition);
    // Strip the HTML tags off of the ingredients list.
    $parsedString = strip_tags($relevantPart);
    return $parsedString;
}
Still to be done: You say you have a mySQL database with 10 separate ingredients columns. This code outputs everything as one big string. You would have to change the strip_tags($relevantPart) function to strip_tags($relevantPart,"<li>"). That would let the <li> tags through. Then, you would have to loop through every <li> tag, performing a similar function to this. It shouldn't be too hard, but I don't feel comfortable writing it with no functioning PHP server.
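For the remaining step of getting one string per <li>, a real HTML parser is less fragile than string offsets. Here is a minimal sketch using only Python's standard library (Python rather than PHP, as an illustration). It assumes, like the PHP above, that the ingredients sit in <li> elements inside a <div class="ingredients">; the actual page markup may differ, and IngredientParser and parse_ingredients are names I made up.

```python
from html.parser import HTMLParser


class IngredientParser(HTMLParser):
    """Collect the text of each <li> inside <div class="ingredients">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # > 0 while inside the ingredients <div>
        self.in_item = False  # True while inside an <li> of that <div>
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Track nested <div>s so the matching </div> is found correctly.
            if self.div_depth or "ingredients" in classes:
                self.div_depth += 1
        elif tag == "li" and self.div_depth:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data


def parse_ingredients(raw_html):
    parser = IngredientParser()
    parser.feed(raw_html)
    parser.close()
    # Collapse runs of whitespace and drop empty items.
    return [" ".join(item.split()) for item in parser.items if item.strip()]


if __name__ == "__main__":
    page = ('<div class="ingredients"><ul><li>3/4 cup milk</li>'
            '<li>2 tablespoons <b>white</b> vinegar</li></ul></div>')
    print(parse_ingredients(page))
```

Each returned string can then go into one of the ingredient columns with a parameterised INSERT.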

Which DAL libraries support stored procedure execution and results materialisation

I'm used to EF because it usually works just fine as long as you get to know it better, so you know how to optimize your queries. But.
What would you choose when you know you'll be working with large quantities of data? I know I wouldn't want to use EF in the first place and cripple my application. I would write highly optimised stored procedures and call those to get certain very narrow results (with many joins so they probably won't just return certain entities anyway).
So I'm a bit confused which DAL technology/library I should use? I don't want to use SqlConnection/SqlCommand way of doing it, since I would have to write much more code that's likely to hide some obscure bugs.
I would like to make the bug surface as small as possible and use a technology that will accommodate my process, not vice versa...
Is there any library that gives me the possibility to:
provide the means of simple SP execution by name
provide automatic materialisation of returned data so I could just provide certain materialisers by means of lambda functions?
like:
List<Person> result = Context.Execute("StoredProcName", record => new Person{
Name = record.GetData<string>("PersonName"),
UserName = record.GetData<string>("UserName"),
Age = record.GetData<int>("Age"),
Gender = record.GetEnum<PersonGender>("Gender")
...
});
or even calling stored procedure that returns multiple result sets etc.
List<Question> result = Context.ExecuteMulti("SPMultipleResults", q => new Question {
Id = q.GetData<int>("QuestionID"),
Title = q.GetData<string>("Title"),
Content = q.GetData<string>("Content"),
Comments = new List<Comment>()
}, c => new Comment {
Id = c.GetData<int>("CommentID"),
Content = c.GetData<string>("Content")
});
Basically this last one wouldn't work, since this one doesn't have any knowledge how to bind both together... but you get the point.
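To illustrate what is missing for binding the two result sets, here is a small sketch in Python (not C#, and not any existing library's API). Every name in it is hypothetical, and it assumes the comment rows also carry a QuestionID column, which the example above does not show.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    id: int
    content: str


@dataclass
class Question:
    id: int
    title: str
    comments: list = field(default_factory=list)


def materialise_multi(parent_rows, child_rows, make_parent, make_child,
                      parent_key, child_parent_key, attach):
    """Materialise two result sets and bind children to parents by key."""
    by_key = {}
    parents = []
    for row in parent_rows:
        obj = make_parent(row)
        by_key[parent_key(row)] = obj
        parents.append(obj)
    for row in child_rows:
        parent = by_key.get(child_parent_key(row))
        if parent is not None:  # ignore children without a matching parent
            attach(parent, make_child(row))
    return parents


if __name__ == "__main__":
    # Rows as they might come back from the two result sets (assumed shape).
    questions = [{"QuestionID": 1, "Title": "First"}]
    comments = [{"CommentID": 10, "QuestionID": 1, "Content": "Nice"}]
    result = materialise_multi(
        questions, comments,
        make_parent=lambda q: Question(q["QuestionID"], q["Title"]),
        make_child=lambda c: Comment(c["CommentID"], c["Content"]),
        parent_key=lambda q: q["QuestionID"],
        child_parent_key=lambda c: c["QuestionID"],
        attach=lambda q, c: q.comments.append(c),
    )
    print(result)
```

So the extra pieces a library would need are the two key selectors and a way to attach a child to its parent.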
So to put it all down to a single question: Is there a DAL library that's optimised for stored procedure execution and data materialisation?
Business Logic Toolkit (BLToolkit) might be exactly what's needed here. It's a lightweight ORM tool that supports lots of scenarios, including multiple result sets, although those seem very complicated to do.