I have about 8 Million Documents in my Collection.
And I want to remove the special Characters in one of the fields.
I will post my Statement below.
I am using the mongo shell in the Mongo db compass tool.
The update is working about 30-50 Minutes and then throws the following error:
MongoServerError: Error on remote shard thisisjustforstack.com:27000 :: caused by :: cursor id 1272890412590646833 not found
I also see that after throwing this error, he did not update all documents.
db.getCollection('TEST_Collection').aggregate(
[{
$match: {
'1List.Comment': {
$exists: true
}
}
}, {
$project: {
'1List.Comment': 1
}
}]
)
.forEach(function(doc,Index) {doc.1List.Comment=doc.1List.Comment.replace(/[^a-zA-Z 0-9 ]/g, '');
db.TEST_Collection.updateMany({ "_id": doc._id },{ "$set": { "1List.Comment": doc.1List.Comment } });})
Can somebody please help to get this update statement working without running in some sort of timeout? I have read something about noCursorTimeout() but I am not sure on how to use it with my statement and using it in the shell.
Thank you all!
Cursor timeout can't be disabled on individual aggregation cursors.
But you can set on global config:
mongod --setParameter cursorTimeoutMillis=3600000 #1 hour
Anyway I think dividing the task in small batches is a better option
From latest couchbase doc,Could see FTS index can be created/updated using below
PUT /api/index/{indexName}
Creates/updates an index definition.
I have created index with name fts-idx and created successfully.
But looks like update of index is failing with REST API.
Response:
responseMessage : ,{"error":"rest_create_index: error creating index: fts-idx, err: manager_api: cannot create index because an index with the same name already exists: fts-idx"
Anything i have missed here.
I was able to replicate this issue, and I think figured it out. It's not a bug, but it should really be documented better.
You need to pass in the index's UUID as part of the PUT (I think this is a concurrency check). You can get the index's current uuid via GET /api/index/fts-index (it's in indexDef->uuid)
And once you have that, make it part of your update PUT body:
{
"name": "fts-index",
"type": "fulltext-index",
"params": {
// ... etc ...
},
"sourceType": "couchbase",
"sourceName": "travel-sample",
"sourceUUID": "307a1042c094b7314697980312f4b66b",
"sourceParams": {},
"planParams": {
// ... etc ...
},
"uuid": "89a125824b012319" // <--- right here
}
Once I did that, the update PUT went through just fine.
Here is the situation.
1) There is an existing document (let's say the index is baseball-a
2) baseball-a, baseball-b, and baseball-c are aliased to baseball
3) Update a document in baseball-a
POST /baseball-a/1/_update?pretty'
{
"doc": { "my_name": "Casey at the bat2"}
}'
4) now if I do a GET baseball-a/1/ everything is updated
5) but if I do a search
POST /baseball/_search?pretty
{
"query": { "match": { "id": "1" } }
}
then the document that is returned has the old my_name of "Casey at the bat" (missing the '2') but 15 minutes later it shows up... how do I fix this or speed it up?
I think I figured it out. Basically you need to look at the refresh_interval for the alias by doing
GET /baseball/_settings
Mine was set to -1 and it should be set to either 1s or 5s
Additionally after I manually ran this command
POST /baseball/_refresh
it also worked but that is just a hassle... let elastic do it for you automatically. Now if I could only figure out why I can't set the refresh inteval correctly Updating ElasticSearch interval_refresh when aliased
what is the mongodb equivalent of the MySQL query
SELECT username AS `consname` FROM `consumer`
As it was mentioned by sammaye, you have to use $project in aggregation framework to rename fields.
So in your case it would be:
db.consumer.aggregate([
{ "$project": {
"_id": 0,
"consname": "$username"
}}
])
Cool thing is that in 2.6.x version aggregate returns a cursor which means it behaves like find.
You might also take a look at $rename operator to permanently change schema.
Salvador Dali's answer is fine, but not working in meteor versions before 3.0.
I currently try meteor, and their build in mongodb is in version 2.
The way I solved this is like this:
var result = []:
db.consumer.forEach( function(doc) {
result.push({
consname:doc.username
});
});
I have 2 big CSV file with millions of rows.
Because those 2 CSVs are from MySQL, I want to merge those 2 tables into one Document in couch DB.
What is the most efficient way to do this?
My current method is:
import 1st CSV
import 2nd CSV
To prevent duplication, the program will search the Document with the key for each row. after the row is found, then the Document is updated with the columns from the 2nd CSV
The problem is, it really takes a long time to search for each row.
while importing the 2nd CSV, it updates 30 documents/second and I have about 7 million rows. rough calculation, it will take about 64 hours to complete the whole importing.
Thank You
It sounds like you have a "primary key" that you know from the row (or you can compute it from the row). That is ideal as the document _id.
The problem is, you get 409 Conflict if you try to add the 2nd CSV data but there was already a document with the same _id. Is that correct? (If so, please correct me so I can fix the answer.)
I think there is a good answer for you:
Use _bulk_docs to import everything, then fix the conflicts.
Begin with a clean database.
Use the Bulk docuent API to insert all the rows from the 1st and then 2nd CSV set—as many as possible per HTTP query, e.g. 1,000 at a time. (Bulk docs are much faster than inserting one-by-one.)
Always add "all_or_nothing": true in your _bulk_docs POST data. That will guarantee that every insertion will be successful (assuming no disasters such as power loss or full HD).
When you are done, some documents will be conflicted, which means you inserted twice for the same _id value. That is no problem. Simply follow this procedure to merge the two versions:
For each _id that has conflicts, fetch it from couch by GET /db/the_doc_id?conflicts=true.
Merge all the values from the conflicting versions into a new final version of the document.
Commit the final, merged document into CouchDB and delete the conflicted revisions. See the CouchDB Definitive Guide section on conflict resolution. (You can use _bulk_docs to speed this up too.)
Example
Hopefully this will clarify a bit. Note, I installed the *manage_couchdb* couchapp from http://github.com/iriscouch/manage_couchdb. It has a simple view to show conflicts.
$ curl -XPUT -Hcontent-type:application/json localhost:5984/db
{"ok":true}
$ curl -XPOST -Hcontent-type:application/json localhost:5984/db/_bulk_docs --data-binary #-
{ "all_or_nothing": true
, "docs": [ { "_id": "some_id"
, "first_value": "This is the first value"
}
, { "_id": "some_id"
, "second_value": "The second value is here"
}
]
}
[{"id":"some_id","rev":"1-d1b74e67eee657f42e27614613936993"},{"id":"some_id","rev":"1-d1b74e67eee657f42e27614613936993"}]
$ curl localhost:5984/db/_design/couchdb/_view/conflicts?reduce=false\&include_docs=true
{"total_rows":2,"offset":0,"rows":[
{"id":"some_id","key":["some_id","1-0cb8fd1fd7801b94bcd2f365ce4812ba"],"value":{"_id":"some_id","_rev":"1-0cb8fd1fd7801b94bcd2f365ce4812ba"},"doc":{"_id":"some_id","_rev":"1-0cb8fd1fd7801b94bcd2f365ce4812ba","first_value":"This is the first value"}},
{"id":"some_id","key":["some_id","1-d1b74e67eee657f42e27614613936993"],"value":{"_id":"some_id","_rev":"1-d1b74e67eee657f42e27614613936993"},"doc":{"_id":"some_id","_rev":"1-d1b74e67eee657f42e27614613936993","second_value":"The second value is here"}}
]}
$ curl -XPOST -Hcontent-type:application/json localhost:5984/db/_bulk_docs --data-binary #-
{ "all_or_nothing": true
, "docs": [ { "_id": "some_id"
, "_rev": "1-0cb8fd1fd7801b94bcd2f365ce4812ba"
, "first_value": "This is the first value"
, "second_value": "The second value is here"
}
, { "_id": "some_id"
, "_rev": "1-d1b74e67eee657f42e27614613936993"
, "_deleted": true
}
]
}
[{"id":"some_id","rev":"2-df5b9dc55e40805d7f74d1675af29c1a"},{"id":"some_id","rev":"2-123aab97613f9b621e154c1d5aa1371b"}]
$ curl localhost:5984/db/_design/couchdb/_view/conflicts?reduce=false\&include_docs=true
{"total_rows":0,"offset":0,"rows":[]}
$ curl localhost:5984/db/some_id?conflicts=true\&include_docs=true
{"_id":"some_id","_rev":"2-df5b9dc55e40805d7f74d1675af29c1a","first_value":"This is the first value","second_value":"The second value is here"}
The final two commands show that there are no conflicts, and the "merged" document is now served as "some_id".
Another option is simply to do what you are doing already, but use the bulk document API to get a performance boost.
For each batch of documents:
POST to /db/_all_docs?include_docs=true with a body like this:
{ "keys": [ "some_id_1"
, "some_id_2"
, "some_id_3"
]
}
Build your _bulk_docs update depending on the results you get.
Doc already exists, you must update it: {"key":"some_id_1", "doc": {"existing":"data"}}
Doc does not exist, you must create it: {"key":"some_id_2", "error":"not_found"}
POST to /db/_bulk_docs with a body like this:
{ "docs": [ { "_id": "some_id_1"
, "_rev": "the _rev from the previous query"
, "existing": "data"
, "perhaps": "some more data I merged in"
}
, { "_id": "some_id_2"
, "brand": "new data, since this is the first doc creation"
}
]
}