MongoDB translation for SQL INSERT ... SELECT (MySQL)

How can I simply copy documents from collectionABC into collectionB when they match a condition like {conditionB: 1}, adding a timestamp field like ts_imported, without knowing anything about the fields inside the original documents?
I could not find a simple MongoDB equivalent of MySQL's INSERT ... SELECT ...

You can use JavaScript from the mongo shell to achieve a similar result:
db.collectionABC.find({ conditionB: 1 }).forEach(function (i) {
    i.ts_imported = new Date();
    db.collectionB.insert(i);
});

I realise that this is an old question, but there is a better way of doing it now: the aggregation pipeline. It allows you to do more complex things such as perform joins, add fields and write documents into a different collection (the $addFields stage used below requires MongoDB 3.4 or newer). For the OP's case, the pipeline would look like this:
var pipeline = [
    { $match: { conditionB: 1 } },
    { $addFields: { ts_imported: ISODate() } },
    { $out: 'collectionB' }
];
// now run the pipeline
db.collectionABC.aggregate(pipeline)
Relevant docs:
- Aggregation pipeline
- $out stage
- some important limits
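One caveat worth noting: $out replaces the entire contents of the target collection. If the goal is to append the copies to an existing collectionB rather than overwrite it, MongoDB 4.2+ offers a $merge stage instead; a minimal sketch (same match/addFields stages as above, untested here):
// requires MongoDB 4.2 or newer; inserts the copies without wiping collectionB
db.collectionABC.aggregate([
    { $match: { conditionB: 1 } },
    { $addFields: { ts_imported: new Date() } },
    { $merge: { into: "collectionB", whenMatched: "merge", whenNotMatched: "insert" } }
])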

MongoDB does not have that kind of querying ability whereby you can (inside a single query) insert into another collection based upon values from the first collection.
You will need to pull the documents out first and then operate on them.
You could technically use a map/reduce job for this, but I have a feeling it will not work for your scenario.

It seems that this works in the context of sequence generation
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/

Using a cts query to retrieve collections associated with a given document URI - MarkLogic

I need to retrieve the collections to which a given document belongs in MarkLogic.
I know the xdmp:document-get-collections function does that, but I need to use it in a cts query to retrieve the data and then filter records from it.
xdmp:document-get-collections("uri of document") can't be run inside a cts query to give the appropriate data.
Any idea how this can be done using a cts query?
Thanks
A few options come to mind:
Option One: Use cts:values()
cts:values(cts:collection-reference())
If you check out the documentation, you will see that you can also restrict this to certain fragments by passing a query as one of the parameters.
Update [11-10-2017]:
The comment attached to this asked for a sample of restricting the results of cts:values() to a single document (for practical purposes, I will say fragment == document).
The documentation for cts:values explains this. It is the 4th parameter - a query to restrict the results. Get to know this pattern, as it is part of many features of MarkLogic. It is your friend. The query I would use for this problem statement would be a cts:document-query().
An Example:
cts:values(
  cts:collection-reference(),
  (),
  (),
  cts:document-query('/path/to/my/document')
)
Full Example:
cts:search(
  collection(),
  cts:collection-query(
    cts:values(
      cts:collection-reference(),
      (),
      (),
      cts:document-query('/path/to/my/document')
    )
  )
)[1 to 10]
Option Two: Use cts:collection-match()
If you need more control over returning just some of the collections from a document, then use cts:collection-match(). Like the first option, you can restrict the results to just some fragments. However, it has the added benefit of accepting a pattern.
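For example, a minimal sketch of that (the "proj-*" pattern is just a made-up example; the document path is the same one used above):
(: return only the collections on one document whose names match a pattern :)
cts:collection-match(
  "proj-*",
  (),
  cts:document-query('/path/to/my/document')
)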
Attention:
They both return a sequence - perfect for feeding into other parts of your query. However, under the hood I believe they work differently. The second option runs against a lexicon: the larger the list of unique collection names and the more complex your pattern match, the longer it takes to resolve. I use collection-match in projects, but usually when I can limit the possible choices by restricting the results to a smaller number of documents.
You can't do this in a single step. You have to run code first to retrieve collections associated with a document. You can use something like xdmp:document-get-collections for that. You then have to feed that into a cts query that you build dynamically:
let $doc-collections := xdmp:document-get-collections($doc-uri)
return
  cts:search(collection(), cts:collection-query($doc-collections))[1 to 10]
HTH!
Are you looking for cts:collection-query()?
Insert two XML files to the same collection:
xquery version "1.0-ml";
xdmp:document-insert("/a.xml", <root><sub1><a>aaa</a></sub1></root>,
  map:map() => map:with("collections", ("coll1")));
xdmp:document-insert("/b.xml", <root><sub2><a>aaa</a></sub2></root>,
  map:map() => map:with("collections", ("coll1")));
Search the collection:
xquery version "1.0-ml";
let $myColl := xdmp:document-get-collections("/a.xml")
return
  cts:search(/root,
    cts:and-query((
      cts:collection-query($myColl),
      cts:element-query(xs:QName("a"), "aaa")
    )))

Custom `returnFormat` in ColdFusion 10 or 11?

I have a function which is called from different components, .cfm pages or remotely. It returns the results of a query.
Sometimes the response from this function is manually inspected - a person may want to see the ID of a specific record so they can use it elsewhere.
The provided return formats (wddx, json, plain) aren't very easily readable for a layman.
I'd love to be able to create a new return format: dump, where the result is first run through writeDump() and then returned to the caller.
I know there are more complicated ways of solving this, like writing a dump function and calling it as a proxy, providing the component, function and parameters so it can call that function and return the results.
However, I don't think it's worth going that far. I figured it'd be great if I could just write a new return format, because that's just intuitive and nice, and I may also be able to use that technique to solve different problems or improve various workflows.
Is there a way to create custom function returnFormats in ColdFusion 10 or 11?
(From comments)
AFAIK, you cannot add a custom returnformat to a cffunction, but take a look at onCFCRequest. You might be able to use it to build something more generic that responds differently whenever a custom URL parameter is passed, i.e. url.returnformat=yourType. Same net effect as dumping and/or manipulating the result manually, just a little more automated.
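A rough sketch of that onCFCRequest idea, using script syntax in Application.cfc; the url.dumpoutput flag and the JSON fallback are made-up choices for illustration, not a drop-in solution:
// Application.cfc - hypothetical sketch only
component {
    this.name = "myApp";

    public any function onCFCRequest(string cfcname, string method, struct args) {
        // call the requested remote method ourselves
        var result = invoke(createObject("component", arguments.cfcname), arguments.method, arguments.args);

        // custom "dump" output driven by a URL parameter
        if (structKeyExists(url, "dumpoutput")) {
            writeDump(result);
        } else {
            writeOutput(serializeJSON(result));
        }
    }
}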
From the comments, the return type of the function is query. That being the case, there is simply no need for a custom return format. If you want to dump the query results, do so.
queryVar = objectName.nameOfFunction(arguments);
writeDump(queryVar);

Complex Gremlin queries to output nodes/edges

I am trying to implement a query and graph visualisation framework that allows a user to enter a Gremlin query, returning a D3 graph of results. The D3 graph is built using a JSON - this is created using separate vertices and edges outputs from the Gremlin query. For simple queries such as:
g.V.filter{it.attr_a == "foo"}
this works fine. However, when I try to perform a more complicated query such as the following:
g.E.filter{it.attr_a == 'foo'}.groupBy{it.attr_b}{it.outV.value}.cap.next().findAll{k,e->e.size()<=3}
- Find all instances of *value*
- Grouped by unique *attr_b*
- Where *attr_a* = foo
- And *attr_b* is paired with no more than 2 other instances of *value*
Instead, the output is of the following form:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
I would like to know if there is a way for Gremlin to output the results as a list of nodes and edges so I can display the results as a graph. I am aware that I could edit my D3 code to take in this new output, but there are currently no restrictions on the type/complexity of the query, so the key/value pairs will not necessarily be the same every time.
Thanks.
You've hit what I consider one of the key problems with visualizing Gremlin results. They can be anything. Gremlin results might not just be a list of vertices and edges. There is no way to really control this that I can think of. At the end of the day, you can really only visualize results that match a pattern that D3 expects. I'd start by trying to detect that pattern and visualize only in those cases (simply display non-recognized patterns as JSON perhaps).
Thinking of your specific example that results in something like this:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
What would you want D3 to visualize there? The vertices/edges that were traversed over to get that result? If so, you might be stuck. Gremlin doesn't give you a way to introspect the pipeline to see what's passing through it. In other words, unless the user explicitly gathers vertices and edges within the pipeline that were touched you won't have access to them. It would be nice to be able to "spy" on a pipeline in that way, but at the moment it doesn't do that. There's been internal discussion within TinkerPop to create a new kind of pipeline implementation that would help with that, but at the moment, it doesn't exist.
So, without the "spying" capability, I think your only workarounds would be to:
- Detect a vertex/edge list on your client side and only render those with D3. This would force users to always write Gremlin that returns data in such a format if they want visualization; put it in the users' hands.
- Perhaps supply server-side bindings for a list of vertices/edges that a user could explicitly side-effect their vertices/edges into if their results did not conform to those expected by your visualization engine (see the sketch below). Again, this would force users to write their Gremlin appropriately for your needs if they want visualization.
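For the second workaround, a rough Gremlin 2.x sketch of that idea; the es binding and the reuse of the original query's property names are assumptions, not a tested solution:
// es would be a server-side binding (an empty list) supplied alongside the query
es = []
g.E.filter{it.attr_a == 'foo'}.
    sideEffect{es << it}.
    groupBy{it.attr_b}{it.outV.value}.cap.next().findAll{k, e -> e.size() <= 3}
// es now holds the matched edges; their in/out vertices can be read from each edge
// to build the nodes/links JSON that D3 expects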

In CouchDB, how do you take in parameters from a REST call?

Hi, I'm new to CouchDB and it looks great so far, but I'm really struggling with what must be a simple thing to do!
I have documents structured as:
{
    "_id": "245431e914ce42e6b2fc6e09cb00184d",
    "_rev": "3-2a69f0325962b93c149204aa3b1fa683",
    "type": "student",
    "studentID": "12345678",
    "Name": "Test",
    "group": "A"
}
I would like to access them with queries such as http://couchIP/student?group=A or something like that. Are views what I need here? I don't understand how to take the parameter from the query in the map functions in views. For example:
function(doc, req) {
    if (req.group === 'A') {
        emit(doc.id, doc.name);
    }
}
Is my understanding of how Couch works wrong, or what is my problem here? Thanks in advance, I'm sure this is Couch 101.
I've already read through http://guide.couchdb.org/ but it didn't really answer the question!
You need views to achieve the desired results.
Define the following map function inside a view of a design document (let's name the view "byGroup" and assume it lives in a design document named "_design/students"):
function(doc) {
    if (doc.group) {
        emit(doc.group, null);
    }
}
Results can be obtained from the following url
http://couchIP:5984/dbname/_design/students/_view/byGroup?startkey="A"&endkey="A"&include_docs=true
For friendlier URLs, CouchDB also provides URL rewriting options.
You need to do some further reading about views and the fact that they return key/value pairs.
It's not clear what you want to return from the view, so I'll guess. If you want to return the whole document you'd create a view like:
function (doc) { emit(doc.group, doc); }
This will emit the group name as a key which you can look up against; the whole doc will be returned as the value when you look it up.
If you want to just have access to the names of those users, you want something like:
function (doc) { emit(doc.group, doc.name); }
Your question arises from a misconception about what a view does. Views use map/reduce to generate a representation of your data. You have no control of the output of your view in your query because the view is updated according to changes in your DB documents only.
Using a list is also not a good option. It may seem that you can use knowledge of the request in your list to generate different output depending on the query parameters, but this is wrong: CouchDB uses ETags for caching, which means that most of the time you will get the same answer regardless of your list parameters, since the underlying documents won't have changed. There is a trick to fool CouchDB in this case (it involves two different alternating users), but I wouldn't even try it, because there are surely easier ways to achieve your objectives and you can probably solve your problem by using group as a key in your map function.

LINQ-SQL reuse - CompiledQuery.Compile

I have been playing about with LINQ to SQL, trying to get re-usable chunks of expressions that I can hot-plug into other queries. So, I started with something like this:
Func<TaskFile, double> TimeSpent = t =>
    t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours);
Then, we can use the above in a LINQ query like the below (LINQPad example):
TaskFiles.Select(t => new {
    t.TaskId,
    TimeSpent = TimeSpent(t),
})
This produces the expected output, except, a query per row is generated for the plugged expression. This is visible within LINQPad. Not good.
Anyway, I noticed the CompiledQuery.Compile method. Although this takes a DataContext as a parameter, I thought I would just ignore it and try the same Func. So I ended up with the following:
static Func<UserQuery, TaskFile, double> TimeSpent =
    CompiledQuery.Compile<UserQuery, TaskFile, double>(
        (UserQuery db, TaskFile t) =>
            t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Notice here, that I am not using the db parameter. However, now when we use this updated parameter, only 1 SQL query is generated. The Expression is successfully translated to SQL and included within the original query.
So my ultimate question is: what makes CompiledQuery.Compile so special? It seems that the DataContext parameter isn't needed at all, and at this point I am thinking it is more a convenience parameter for generating full queries.
Would it be considered a good idea to use the CompiledQuery.Compile method like this? It seems like a big hack, but it seems like the only viable route for LINQ re-use.
UPDATE
Using the first Func within a Where statement, we see the following exception:
NotSupportedException: Method 'System.Object DynamicInvoke(System.Object[])' has no supported translation to SQL.
Like the following:
.Where(t => TimeSpent(t) > 2)
However, when we use the Func generated by CompiledQuery.Compile, the query is successfully executed and the correct SQL is generated.
I know this is not the ideal way to re-use Where statements, but it shows a little how the Expression Tree is generated.
Exec Summary:
Expression.Compile generates a CLR method, whereas CompiledQuery.Compile generates a delegate that is a placeholder for SQL.
One of the reasons you did not get a correct answer until now is that some things in your sample code are incorrect. And without the database, or a generic sample someone else can play with, the chances are further reduced (I know it's difficult to provide that, but it's usually worth it).
On to the facts:
Expression<Func<TaskFile, double>> TimeSpent = t =>
    t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours);
Then, we can use the above in a LINQ query like the below:
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
(Note: Maybe you used a Func<> type for TimeSpent. This yields the same situation as if your scenario was as outlined in the paragraph below. Make sure to read and understand it though.)
No, this won't compile. Expressions can't be invoked (TimeSpent is an expression). They need to be compiled into a delegate first. What happens under the hood when you invoke Expression.Compile() is that the expression tree is compiled down to IL, which is injected into a DynamicMethod, for which you then get a delegate.
The following would work:
var q = TaskFiles.Select(t => new {
    t.TaskId,
    TimeSpent = TimeSpent.Compile().DynamicInvoke(t)
});
This produces the expected output, except, a query per row is generated for the plugged expression. This is visible within LINQPad. Not good.
Why does that happen? Well, LINQ to SQL will need to fetch all TaskFiles, hydrate TaskFile instances and then run your selector against them in memory. You get a query per TaskFile likely because each contains one or more 1:m mappings.
While LTS allows projecting in memory for selects, it does not do so for Wheres (citation needed; this is to the best of my knowledge). When you think about it, this makes perfect sense: you would likely transfer a lot more data by filtering the whole database in memory than by transforming a subset of it in memory. (Though it creates query performance issues as you see, something to be aware of when using an ORM.)
CompiledQuery.Compile() does something different. It compiles the query to SQL and the delegate it returns is only a placeholder Linq to SQL will use internally. You can't "invoke" this method in the CLR, it can only be used as a node in another expression tree.
So why does LTS generate an efficient query with the CompiledQuery.Compile'd expression then? Because it knows what this expression node does, because it knows the SQL behind it. In the Expression.Compile case, it's just an InvokeExpression that invokes the DynamicMethod, as I explained previously.
Why does it require a DataContext Parameter? Yes, it's more convenient for creating full queries, but it's also because the Expression Tree compiler needs to know the Mapping to use for generating the SQL. Without this parameter, it would be a pain to find this mapping, so it's a very sensible requirement.
I'm surprised you've got no answers on this so far. CompiledQuery.Compile compiles and caches the query. That is why you see only one query being generated.
Not only is this NOT a hack, it is the recommended way!
Check out these MSDN articles for detailed info and example:
Compiled Queries (LINQ to Entities)
How to: Store and Reuse Queries (LINQ to SQL)
Update (exceeded the limit for comments):
I did some digging in Reflector and I do see DataContext being used. In your example, you're simply not using it.
Having said that, the main difference between the two is that the former creates a delegate (for the expression tree) and the latter creates the SQL that gets cached and actually returns a function (sort of). The first two expressions produce a query each time you call Invoke on them, which is why you see multiple queries.
If your query doesn't change, but only the DataContext and parameters do, and if you plan to use it repeatedly, CompiledQuery.Compile will help. It is expensive to compile, so for one-off queries there is no benefit.
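For reference, a minimal sketch of the conventional full-query pattern; MyDataContext and the minHours parameter are made-up names for illustration (needs using System.Data.Linq and System.Linq):
// compile once, e.g. in a static field, then reuse it with any DataContext instance
static readonly Func<MyDataContext, double, IQueryable<TaskFile>> LongTasks =
    CompiledQuery.Compile((MyDataContext db, double minHours) =>
        db.TaskFiles.Where(t =>
            t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours) > minHours));

// usage: var tasks = LongTasks(myDataContext, 2.0).ToList();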
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
This isn't a LINQ to SQL query, as there is no DataContext instance. Most likely you are querying some EntitySet, which does not implement IQueryable.
Please post complete statements, not statement fragments. (I see an invalid comma, no semicolon, and no assignment.)
Also, try this:
var query = myDataContext.TaskFiles
    .Where(tf => tf.Parent.Key == myParent.Key)
    .Select(t => new {
        t.TaskId,
        TimeSpent = TimeSpent(t)
    });
// where myParent is the source of the EntitySet, Parent is a relational property,
// and Key is the primary key property of Parent.