Using a cts query to retrieve the collections associated with a given document URI - MarkLogic - JSON

I need to retrieve the collections to which a given document belongs in MarkLogic.
I know an xdmp function does that, but I need to use it in a cts query to retrieve the data and then filter records from it.
xdmp:document-get-collections("uri of document") can't be run inside a cts query to give the appropriate data.
Any idea how this can be done using a cts query?
Thanks

A few options come to mind:
Option One: Use cts:values()
cts:values(cts:collection-reference())
If you check out the documentation, you will see that you can also restrict this to certain fragments by passing a query as one of the parameters.
Update [11-10-2017]:
The comment attached to this asked for a sample of restricting the results of cts:values() to a single document (for practical purposes, I will say fragment == document).
The documentation for cts:values explains this. It is the 4th parameter - a query to restrict the results. Get to know this pattern, as it is part of many features of MarkLogic. It is your friend. The query I would use for this problem statement would be cts:document-query().
An Example:
cts:values(
  cts:collection-reference(),
  (),
  (),
  cts:document-query('/path/to/my/document')
)
Full Example:
cts:search(
  collection(),
  cts:collection-query(
    cts:values(
      cts:collection-reference(),
      (),
      (),
      cts:document-query('/path/to/my/document')
    )
  )
)[1 to 10]
Option Two: Use cts:collection-match()
If you need more control over returning only some of the collections from a document, use cts:collection-match(). Like the first option, you can restrict the results to just some fragments. However, it has the added benefit of accepting a pattern.
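A minimal sketch, assuming the usual lexicon match-function parameter order of pattern, options, then a restricting query (the "proj-*" pattern and the document URI are hypothetical): return only the collections whose names match the pattern, for a single document:
cts:collection-match(
  "proj-*",
  (),
  cts:document-query("/path/to/my/document")
)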
Attention:
They both return a sequence - perfect for feeding into other parts of your query. However, under the hood, I believe they work differently. The second option is run against a lexicon. The larger the list of unique collection names and the more complex your pattern match, the longer it takes to resolve. I use collection-match in projects; however, I usually use it when I can limit the possible choices by restricting the results to a smaller number of documents.

You can't do this in a single step. You have to run code first to retrieve the collections associated with a document; you can use something like xdmp:document-get-collections for that. You then feed the result into a cts query that you build dynamically:
let $doc-collections := xdmp:document-get-collections($doc-uri)
return
cts:search(collection(), cts:collection-query($doc-collections))[1 to 10]
HTH!

Are you looking for cts:collection-query()?

Insert two XML files into the same collection:
xquery version "1.0-ml";
xdmp:document-insert("/a.xml", <root><sub1><a>aaa</a></sub1></root>,
map:map() => map:with("collections", ("coll1")));
xdmp:document-insert("/b.xml", <root><sub2><a>aaa</a></sub2></root>,
map:map() => map:with("collections", ("coll1")));
Search the collection:
xquery version "1.0-ml";
let $myColl := xdmp:document-get-collections("/a.xml")
return
  cts:search(/root,
    cts:and-query((
      cts:collection-query($myColl),
      cts:element-query(xs:QName("a"), "aaa")
    )))

Related

Using property OR in "conditions" parameter of askargs action with Semantic MediaWiki API

I'm trying to fetch results via the API using the askargs module. I have no problem getting results when I have just one condition, or several conditions combined with the AND operator, where I use the pipe character to separate them (as described in the documentation).
E.g.
[[Category:+]] AND [[Jurisdiction::A]] AND [[Type::B]]
Category:+ | Jurisdiction::A | Type::B
But the pipe character doesn't work with OR.
I need to be able to use both logical operators with several arguments within the same query.
Am I missing something?
Am I missing something?
No. The API doesn't handle OR conditions, due to simplistic code in the query parameter formatter.
See file SemanticMediaWiki/src/MediaWiki/Api/ApiRequestParameterFormatter.php
at line 132:
protected function formatConditions( $condition ) {
    return "[[$condition]]";
}
Every condition in the query is formatted with surrounding brackets, causing OR to be interpreted as a page title.
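To make the failure mode concrete, here is a hedged illustration (the condition names are hypothetical) of what happens to a pipe-separated conditions string under that formatter:
// a request such as
//   conditions=Jurisdiction::A|OR|Type::B
// is split on the pipe, and each part passes through formatConditions(),
// yielding:
//   [[Jurisdiction::A]] [[OR]] [[Type::B]]
// so "OR" becomes a page-title condition instead of a disjunction.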
An alternative is to use Special:Ask with a URL-encoded query and JSON format:
https://www.semantic-mediawiki.org/wiki/Special:Ask/-5B-5BHas-20keyword::askargs-5D-5DOR-5B-5BHas-20keyword::ask-5D-5D/-3F%3Dhelp-20page/-3FHas-20description%3Ddescription/format%3Djson
Since I came here from a web search, I'm going to add another neat possibility:
If you use the alternative separator, you can use a double pipe as a logical OR conjunction.
Example:
%1FCategory:+%1FJurisdiction::A%1FType::B||C
which should be read as follows:
Category:+ AND Jurisdiction::A AND (Type::B OR Type::C)

Using django query set values() to index into JSONField

I am using Django with Postgres and have a bunch of JSON fields (some of them quite large and detailed) within my model. I'm in the process of switching from char-based fields to jsonb fields, which lets me filter on a key within the field, and I'm wondering if there is any way to get the equivalent benefit out of a call to the queryset values() method.
Example:
What I would like to do, given a Car model with an options JSONField, is something like
qset = Car.objects.filter(options__interior__color='red')
vals = qset.values('options__interior__material')
Please excuse the lame toy problem, but hopefully it gets the idea across. Here the filter call does exactly what I want, but the call to values does not seem to be aware of the special nature of the JSON field. I get an error because values can't find the field called "interior" to join on. Is there some other syntax or option that I am missing that will make this work?
Seems like a pretty obvious extension to the existing functionality, but I have so far failed to find any reference to something similar in the docs or through stack overflow or google searches.
Edit - a workaround:
After playing around, it looks like this can be fudged by inserting the following between the two lines of code above:
qset = qset.annotate(options__interior__material=RawSQL("SELECT options->'interior'->'material'", ()))
I say "fudged" because it seems like an abuse of notation and would require special treatment for integer indices.
Still hoping for a better answer.
I can suggest a slightly cleaner way, using Django's Func expressions (https://docs.djangoproject.com/en/2.0/ref/models/expressions/#func-expressions) and the Postgres function jsonb_extract_path_text (https://www.postgresql.org/docs/9.5/static/functions-json.html):
from django.db.models import F, Func, Value

Car.objects.all().annotate(
    options__interior__material=Func(
        F('options'),
        Value('interior'),
        Value('material'),
        function='jsonb_extract_path_text',
    ),
)
Perhaps a better solution (for Django >= 1.11) is to use something like this:
from django.contrib.postgres.fields.jsonb import KeyTextTransform
Car.objects.filter(options__interior__color='red').annotate(
interior_material=KeyTextTransform('material', KeyTextTransform('interior', 'options'))
).values('interior_material')
Note that you can nest KeyTextTransform expressions to pull out the value(s) you need.
You can utilize .extra() and add Postgres jsonb operators (see the jsonb operator table: https://www.postgresql.org/docs/9.5/static/functions-json.html#FUNCTIONS-JSON-OP-TABLE):
Car.objects.extra(select={'interior_material': "options#>'{interior, material}'"})
    .filter(options__interior__color='red')
    .values('interior_material')

Complex Gremlin queries to output nodes/edges

I am trying to implement a query and graph visualisation framework that allows a user to enter a Gremlin query and returns a D3 graph of the results. The D3 graph is built from a JSON document, which is created using separate vertex and edge outputs from the Gremlin query. For simple queries such as:
g.V.filter{it.attr_a == "foo"}
this works fine. However, when I try to perform a more complicated query such as the following:
g.E.filter{it.attr_a == 'foo'}.groupBy{it.attr_b}{it.outV.value}.cap.next().findAll{k,e->e.size()<=3}
- Find all instances of *value*
- Grouped by unique *attr_b*
- Where *attr_a* = foo
- And *attr_b* is paired with no more than 2 other instances of *value*
Instead of nodes and edges, the output is of the following form:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
I would like to know if there is a way for Gremlin to output the results as a list of nodes and edges so I can display the results as a graph. I am aware that I could edit my D3 code to take in this new output, but there are currently no restrictions on the type/complexity of the query, so the key/value pairs will not necessarily be the same every time.
Thanks.
You've hit what I consider one of the key problems with visualizing Gremlin results: they can be anything. Gremlin results are not necessarily just a list of vertices and edges, and there is no way to really control this that I can think of. At the end of the day, you can only visualize results that match a pattern D3 expects. I'd start by trying to detect that pattern and visualize only in those cases (simply display non-recognized patterns as JSON, perhaps).
Think of your specific example, which results in something like this:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
What would you want D3 to visualize there? The vertices/edges that were traversed over to get that result? If so, you might be stuck. Gremlin doesn't give you a way to introspect the pipeline to see what's passing through it. In other words, unless the user explicitly gathers the vertices and edges that were touched within the pipeline, you won't have access to them. It would be nice to be able to "spy" on a pipeline in that way, but at the moment it doesn't do that. There has been internal discussion within TinkerPop about creating a new kind of pipeline implementation that would help with this, but at the moment, it doesn't exist.
So, without the "spying" capability, I think your only workarounds would be to:
- Detect a vertex/edge list on the client side and only render those with D3. This would force users to always write Gremlin that returns data in such a format if they want visualization. Put it in the users' hands.
- Supply server-side bindings for a list of vertices/edges into which users could explicitly side-effect the vertices/edges they touch if their results do not conform to what your visualization engine expects (see the sketch below). Again, this would force users to write their Gremlin appropriately for your needs if they want visualization.
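A rough sketch of the second workaround, in the TinkerPop 2 Gremlin-Groovy syntax the question uses (the property names come from the question; "edges" is a hypothetical server-side binding):
// collect every edge the traversal touches into a side-effect list
edges = []
g.E.filter{it.attr_a == 'foo'}.aggregate(edges).outV.value.iterate()
// "edges" now holds the traversed edges and can be serialized for D3
// alongside the actual query result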

mongodb translation for sql INSERT...SELECT

How can I duplicate documents from collectionABC into collectionB when they match a condition like {conditionB: 1}, adding a timestamp such as ts_imported, without knowing the details contained within the original documents?
I could not find a simple MongoDB equivalent of MySQL's INSERT ... SELECT ...
You can use JavaScript from the mongo shell to achieve a similar result:
db.collectionABC.find({ conditionB: 1 }).forEach(function(i) {
    i.ts_imported = new Date();
    db.collectionB.insert(i);
});
I realise that this is an old question, but there is a better way of doing it now. MongoDB has something called the aggregation pipeline (v3.6 and above, maybe some older versions too - I haven't checked). The aggregation pipeline allows you to do more complex things, like perform joins, add fields, and save documents into a different collection. For the OP's case, the pipeline would look like this:
var pipeline = [
    { $match: { conditionB: 1 } },
    { $addFields: { ts_imported: ISODate() } },
    { $out: 'collectionB' }
];
// now run the pipeline
db.collectionABC.aggregate(pipeline)
Relevant docs:
Aggregation pipeline
$out stage
some important limits
MongoDB does not have that kind of querying ability whereby you can (inside the query) insert into another collection based upon variables from the first collection.
You will need to pull each document out first and then operate on it.
You could technically use a map-reduce (MR) job for this, but I have a feeling it will not work for your scenario.
It seems that this approach works in the context of sequence generation:
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/

LINQ-SQL reuse - CompiledQuery.Compile

I have been playing around with LINQ to SQL, trying to get reusable chunks of expressions that I can hot-plug into other queries. So I started with something like this:
Func<TaskFile, double> TimeSpent = (t =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Then, we can use the above in a LINQ query like the below (LINQPad example):
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
This produces the expected output, except that a query per row is generated for the plugged expression. This is visible within LINQPad. Not good.
Anyway, I noticed the CompiledQuery.Compile method. Although this takes a DataContext as a parameter, I thought I would ignore it and try the same Func. So I ended up with the following:
static Func<UserQuery, TaskFile, double> TimeSpent =
CompiledQuery.Compile<UserQuery, TaskFile, double>(
(UserQuery db, TaskFile t) =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Notice here that I am not using the db parameter. However, when we use this updated version, only one SQL query is generated. The expression is successfully translated to SQL and included within the original query.
So my ultimate question is: what makes CompiledQuery.Compile so special? It seems that the DataContext parameter isn't needed at all, and at this point I am thinking it is more a convenience parameter for generating full queries.
Would it be considered a good idea to use the CompiledQuery.Compile method like this? It seems like a big hack, but it seems like the only viable route for LINQ re-use.
UPDATE
Using the first Func within a Where statement, we see the following exception:
NotSupportedException: Method 'System.Object DynamicInvoke(System.Object[])' has no supported translation to SQL.
Like the following:
.Where(t => TimeSpent(t) > 2)
However, when we use the Func generated by CompiledQuery.Compile, the query is successfully executed and the correct SQL is generated.
I know this is not the ideal way to re-use Where statements, but it shows a little of how the expression tree is generated.
Exec Summary:
Expression.Compile generates a CLR method, whereas CompiledQuery.Compile generates a delegate that is a placeholder for SQL.
One of the reasons you did not get a correct answer until now is that some things in your sample code are incorrect, and without the database or a generic sample someone else can play with, the chances are reduced further (I know it's difficult to provide that, but it's usually worth it).
On to the facts:
Expression<Func<TaskFile, double>> TimeSpent = (t =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Then, we can use the above in a LINQ query like the below:
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
(Note: Maybe you used a Func<> type for TimeSpent. That yields the same situation as the scenario outlined in the paragraph below. Make sure to read and understand it though.)
No, this won't compile. Expressions can't be invoked (TimeSpent is an expression). They need to be compiled into a delegate first. What happens under the hood when you invoke Expression.Compile() is that the Expression Tree is compiled down to IL which is injected into a DynamicMethod, for which you get a delegate then.
The following would work:
var q = TaskFiles.Select(t => new {
    t.TaskId,
    TimeSpent = TimeSpent.Compile().DynamicInvoke(t)
});
This produces the expected output, except that a query per row is generated for the plugged expression. This is visible within LINQPad. Not good.
Why does that happen? Well, LINQ to SQL will need to fetch all TaskFiles, hydrate the TaskFile instances, and then run your selector against them in memory. You likely get a query per TaskFile because they contain one or more 1:m mappings.
While LTS allows projecting in memory for selects, it does not do so for wheres (citation needed; this is to the best of my knowledge). When you think about it, this makes perfect sense: you would likely transfer a lot more data by filtering the whole database in memory than by transforming a subset of it in memory. (Though it creates query performance issues, as you see - something to be aware of when using an ORM.)
CompiledQuery.Compile() does something different. It compiles the query to SQL, and the delegate it returns is only a placeholder that LINQ to SQL will use internally. You can't "invoke" this method in the CLR; it can only be used as a node in another expression tree.
So why does LTS generate an efficient query with the CompiledQuery.Compile'd expression? Because it knows what this expression node does, because it knows the SQL behind it. In the Expression.Compile case, it's just an InvokeExpression that invokes the DynamicMethod, as I explained previously.
Why does it require a DataContext parameter? Yes, it's more convenient for creating full queries, but it's also because the expression tree compiler needs to know the mapping to use for generating the SQL. Without this parameter, it would be a pain to find this mapping, so it's a very sensible requirement.
I'm surprised you've got no answers on this so far. CompiledQuery.Compile compiles and caches the query. That is why you see only one query being generated.
Not only is this NOT a hack, it is the recommended way!
Check out these MSDN articles for detailed info and example:
Compiled Queries (LINQ to Entities)
How to: Store and Reuse Queries (LINQ to SQL)
Update (exceeded the limit for comments):
I did some digging in Reflector, and I do see the DataContext being used. In your example, you're simply not using it.
Having said that, the main difference between the two is that the former creates a delegate (for the expression tree) and the latter creates the SQL, which gets cached, and actually returns a function (sort of). The first two expressions produce the query when you call Invoke on them; this is why you see multiple queries.
If your query doesn't change, but only the DataContext and parameters do, and if you plan to use it repeatedly, CompiledQuery.Compile will help. Compiling is expensive, so for one-off queries there is no benefit.
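For reference, a minimal sketch of that pattern with the DataContext parameter actually used (MyDataContext is a hypothetical context type; TaskFiles and TaskId follow the question's model):
static readonly Func<MyDataContext, int, IQueryable<TaskFile>> TasksById =
    CompiledQuery.Compile((MyDataContext db, int id) =>
        db.TaskFiles.Where(t => t.TaskId == id));

// usage: the SQL is translated once and reused on every call
var tasks = TasksById(context, 42);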
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
This isn't a LINQ to SQL query, as there is no DataContext instance. Most likely you are querying some EntitySet, which does not implement IQueryable.
Please post complete statements, not statement fragments. (I see an invalid comma, no semicolon, and no assignment.)
Also, try this:
var query = myDataContext.TaskFiles
    .Where(tf => tf.Parent.Key == myParent.Key)
    .Select(t => new {
        t.TaskId,
        TimeSpent = TimeSpent(t)
    });
// where myParent is the source of the EntitySet, Parent is a relational property,
// and Key is the primary key property of Parent.