Solr/Lucene update query deletes attribute from data - json

I am running into an issue with an attribute missing in the data after running an update query.
I run a select query like this:
curl "http://localhost:8080/solr/collection1/select?q=title%3AHans+head:true&fl=title,uid,articleId,missing_Attribute,my_otherAttribute&wt=json&indent=true"
It returns an article:
{
  "title":"Hans",
  "uid":"18_UNIQUEID_123",
  "articleId":"123123123",
  "missing_Attribute":"M"
}
So missing_Attribute = "M" and my_otherAttribute is not present yet, which is fine.
Then I run an update query on this document using:
curl 'http://localhost:8080/solr/collection1/update?commit=true' --data-binary @MyUpdate.json -H 'Content-type:application/json'
with MyUpdate.json as:
[
  {
    "uid": "18_UNIQUEID_123",
    "my_otherAttribute": {
      "set": "12"
    }
  }
]
Running the select query again results in:
{
  "title":"Hans",
  "uid":"18_UNIQUEID_123",
  "articleId":"123123123",
  "my_otherAttribute":"12"
}
my_otherAttribute = 12 but missing_Attribute is gone!
Why is missing_Attribute gone when I update my_otherAttribute?
Why does it not affect any of the other fields?

To answer my own question, the answer is here:
https://wiki.apache.org/solr/Atomic_Updates
The issue I face is that I want to make a partial (atomic) update of a document. I am using Solr 4.10, so in theory it would work, but only if the schema supports it, i.e. all updatable fields are stored (except copyField destinations), and ours are not. That is why the other attributes disappear.
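For reference, a minimal sketch of what the schema would have to look like for atomic updates to preserve the other fields (the field names are taken from the question; the string type is an assumption):
<!-- Atomic updates rebuild the document from stored values, so every field
     that is not a copyField destination must have stored="true" -->
<field name="missing_Attribute" type="string" indexed="true" stored="true"/>
<field name="my_otherAttribute" type="string" indexed="true" stored="true"/>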

Related

Mongo DB Error while Updating - error on remote shard - caused by cursor id

I have about 8 million documents in my collection, and I want to remove the special characters in one of the fields.
I will post my statement below.
I am using the mongo shell in the MongoDB Compass tool.
The update runs for about 30-50 minutes and then throws the following error:
MongoServerError: Error on remote shard thisisjustforstack.com:27000 :: caused by :: cursor id 1272890412590646833 not found
I also see that, after throwing this error, it did not update all documents.
db.getCollection('TEST_Collection').aggregate([
    {
        $match: {
            '1List.Comment': { $exists: true }
        }
    },
    {
        $project: {
            '1List.Comment': 1
        }
    }
]).forEach(function (doc) {
    // A field name starting with a digit needs bracket notation in JavaScript
    doc['1List'].Comment = doc['1List'].Comment.replace(/[^a-zA-Z 0-9 ]/g, '');
    db.TEST_Collection.updateMany(
        { "_id": doc._id },
        { "$set": { "1List.Comment": doc['1List'].Comment } }
    );
});
Can somebody please help me get this update statement working without running into some sort of timeout? I have read something about noCursorTimeout(), but I am not sure how to use it with my statement in the shell.
Thank you all!
Cursor timeout can't be disabled on individual aggregation cursors,
but you can set it in the global config:
mongod --setParameter cursorTimeoutMillis=3600000  # 1 hour
Anyway, I think dividing the task into small batches is a better option, as sketched below.
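A rough sketch of that batched approach, assuming the same collection and field names as in the question; buffering the updates and flushing a bulkWrite() every 1,000 documents keeps the client-side loop fast, so the aggregation cursor is drained long before it can time out:
var ops = [];
db.getCollection('TEST_Collection').aggregate([
    { $match: { '1List.Comment': { $exists: true } } },
    { $project: { '1List.Comment': 1 } }
]).forEach(function (doc) {
    ops.push({
        updateOne: {
            filter: { _id: doc._id },
            update: { $set: { '1List.Comment': doc['1List'].Comment.replace(/[^a-zA-Z 0-9 ]/g, '') } }
        }
    });
    if (ops.length === 1000) {   // flush a full batch
        db.TEST_Collection.bulkWrite(ops, { ordered: false });
        ops = [];
    }
});
if (ops.length > 0) {            // flush the remainder
    db.TEST_Collection.bulkWrite(ops, { ordered: false });
}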

Couchbase FTS Index update through REST API

From the latest Couchbase documentation, I can see that an FTS index can be created/updated using:
PUT /api/index/{indexName}
Creates/updates an index definition.
I have created an index with the name fts-idx successfully,
but updating the index through the REST API fails.
Response:
{"error":"rest_create_index: error creating index: fts-idx, err: manager_api: cannot create index because an index with the same name already exists: fts-idx"}
Is there anything I have missed here?
I was able to replicate this issue, and I think I figured it out. It's not a bug, but it should really be documented better.
You need to pass in the index's UUID as part of the PUT (I think this is a concurrency check). You can get the index's current uuid via GET /api/index/fts-index (it's in indexDef->uuid).
And once you have that, make it part of your update PUT body:
{
  "name": "fts-index",
  "type": "fulltext-index",
  "params": {
    // ... etc ...
  },
  "sourceType": "couchbase",
  "sourceName": "travel-sample",
  "sourceUUID": "307a1042c094b7314697980312f4b66b",
  "sourceParams": {},
  "planParams": {
    // ... etc ...
  },
  "uuid": "89a125824b012319"  // <--- right here
}
Once I did that, the update PUT went through just fine.
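For reference, a rough curl sketch of the two calls; the host, the default Search-service port 8094, the credentials, and the updated-index.json file holding the body above are assumptions, while the /api/index/{indexName} path is the one quoted from the docs:
# 1) Fetch the current definition and note indexDef->uuid
curl -u Administrator:password http://localhost:8094/api/index/fts-index
# 2) Re-PUT the definition with that uuid included in the body
curl -u Administrator:password -XPUT -H 'Content-type: application/json' \
     http://localhost:8094/api/index/fts-index --data-binary @updated-index.json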

Elastic Search Document Update taking too long when retrieved by alias

Here is the situation.
1) There is an existing document (let's say the index is baseball-a)
2) baseball-a, baseball-b, and baseball-c are aliased to baseball
3) Update a document in baseball-a:
POST /baseball-a/1/_update?pretty
{
  "doc": { "my_name": "Casey at the bat2" }
}
4) Now if I do a GET baseball-a/1/, everything is updated.
5) But if I do a search:
POST /baseball/_search?pretty
{
  "query": { "match": { "id": "1" } }
}
then the document that is returned has the old my_name of "Casey at the bat" (missing the '2'), but 15 minutes later it shows up... How do I fix this or speed it up?
I think I figured it out. Basically you need to look at the refresh_interval for the alias by doing
GET /baseball/_settings
Mine was set to -1, and it should be set to either 1s or 5s.
Additionally, after I manually ran this command:
POST /baseball/_refresh
it also worked, but that is just a hassle... let Elasticsearch do it for you automatically. Now if I could only figure out why I can't set the refresh interval correctly: Updating ElasticSearch interval_refresh when aliased
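For reference, a sketch of setting the interval back on one of the concrete indices behind the alias (repeat for baseball-b and baseball-c; the 1s value follows the recommendation above):
PUT /baseball-a/_settings
{
  "index": { "refresh_interval": "1s" }
}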

mongodb equivalent of SELECT field AS `anothername`

What is the MongoDB equivalent of the MySQL query
SELECT username AS `consname` FROM `consumer`
As mentioned by sammaye, you have to use $project in the aggregation framework to rename fields.
So in your case it would be:
db.consumer.aggregate([
    { "$project": {
        "_id": 0,
        "consname": "$username"
    }}
])
A cool thing is that in version 2.6.x, aggregate returns a cursor, which means it behaves like find.
You might also take a look at the $rename operator to permanently change the schema; see the sketch below.
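For completeness, a sketch of such a permanent rename; it rewrites every document once, unlike the per-query aliasing above (updateMany assumes a reasonably recent shell):
// Rename username to consname on every document in the collection
db.consumer.updateMany({}, { $rename: { "username": "consname" } })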
Salvador Dali's answer is fine, but it does not work in Meteor versions before 3.0.
I am currently trying Meteor, and its built-in MongoDB is version 2.
The way I solved it is like this:
var result = [];
db.consumer.find().forEach(function (doc) {
    result.push({
        consname: doc.username
    });
});

Alternative to preventing duplicates in importing CSV to CouchDB

I have 2 big CSV files with millions of rows.
Because those 2 CSVs are from MySQL, I want to merge those 2 tables into one document in CouchDB.
What is the most efficient way to do this?
My current method is:
import the 1st CSV
import the 2nd CSV
To prevent duplication, the program searches for the document with the key of each row; after the row is found, the document is updated with the columns from the 2nd CSV.
The problem is, it really takes a long time to search for each row.
While importing the 2nd CSV, it updates about 30 documents/second, and with about 7 million rows a rough calculation puts the whole import at about 64 hours.
Thank You
It sounds like you have a "primary key" that you know from the row (or you can compute it from the row). That is ideal as the document _id.
The problem is, you get a 409 Conflict if you try to add the 2nd CSV data and there is already a document with the same _id. Is that correct? (If not, please correct me so I can fix the answer.)
I think there is a good answer for you:
Use _bulk_docs to import everything, then fix the conflicts.
Begin with a clean database.
Use the bulk document API to insert all the rows from the 1st and then the 2nd CSV set, as many as possible per HTTP query, e.g. 1,000 at a time. (Bulk docs are much faster than inserting one-by-one.)
Always add "all_or_nothing": true in your _bulk_docs POST data. That will guarantee that every insertion is successful (assuming no disasters such as power loss or a full hard drive).
When you are done, some documents will be conflicted, which means you inserted twice for the same _id value. That is no problem. Simply follow this procedure to merge the two versions:
For each _id that has conflicts, fetch it from couch by GET /db/the_doc_id?conflicts=true.
Merge all the values from the conflicting versions into a new final version of the document.
Commit the final, merged document into CouchDB and delete the conflicted revisions. See the CouchDB Definitive Guide section on conflict resolution. (You can use _bulk_docs to speed this up too.)
Example
Hopefully this will clarify a bit. Note: I installed the manage_couchdb couchapp from http://github.com/iriscouch/manage_couchdb. It has a simple view to show conflicts.
$ curl -XPUT -Hcontent-type:application/json localhost:5984/db
{"ok":true}
$ curl -XPOST -Hcontent-type:application/json localhost:5984/db/_bulk_docs --data-binary @-
{ "all_or_nothing": true
, "docs": [ { "_id": "some_id"
, "first_value": "This is the first value"
}
, { "_id": "some_id"
, "second_value": "The second value is here"
}
]
}
[{"id":"some_id","rev":"1-d1b74e67eee657f42e27614613936993"},{"id":"some_id","rev":"1-d1b74e67eee657f42e27614613936993"}]
$ curl localhost:5984/db/_design/couchdb/_view/conflicts?reduce=false\&include_docs=true
{"total_rows":2,"offset":0,"rows":[
{"id":"some_id","key":["some_id","1-0cb8fd1fd7801b94bcd2f365ce4812ba"],"value":{"_id":"some_id","_rev":"1-0cb8fd1fd7801b94bcd2f365ce4812ba"},"doc":{"_id":"some_id","_rev":"1-0cb8fd1fd7801b94bcd2f365ce4812ba","first_value":"This is the first value"}},
{"id":"some_id","key":["some_id","1-d1b74e67eee657f42e27614613936993"],"value":{"_id":"some_id","_rev":"1-d1b74e67eee657f42e27614613936993"},"doc":{"_id":"some_id","_rev":"1-d1b74e67eee657f42e27614613936993","second_value":"The second value is here"}}
]}
$ curl -XPOST -Hcontent-type:application/json localhost:5984/db/_bulk_docs --data-binary @-
{ "all_or_nothing": true
, "docs": [ { "_id": "some_id"
, "_rev": "1-0cb8fd1fd7801b94bcd2f365ce4812ba"
, "first_value": "This is the first value"
, "second_value": "The second value is here"
}
, { "_id": "some_id"
, "_rev": "1-d1b74e67eee657f42e27614613936993"
, "_deleted": true
}
]
}
[{"id":"some_id","rev":"2-df5b9dc55e40805d7f74d1675af29c1a"},{"id":"some_id","rev":"2-123aab97613f9b621e154c1d5aa1371b"}]
$ curl localhost:5984/db/_design/couchdb/_view/conflicts?reduce=false\&include_docs=true
{"total_rows":0,"offset":0,"rows":[]}
$ curl localhost:5984/db/some_id?conflicts=true\&include_docs=true
{"_id":"some_id","_rev":"2-df5b9dc55e40805d7f74d1675af29c1a","first_value":"This is the first value","second_value":"The second value is here"}
The final two commands show that there are no conflicts, and the "merged" document is now served as "some_id".
Another option is simply to do what you are doing already, but use the bulk document API to get a performance boost.
For each batch of documents:
POST to /db/_all_docs?include_docs=true with a body like this:
{ "keys": [ "some_id_1"
, "some_id_2"
, "some_id_3"
]
}
Build your _bulk_docs update depending on the results you get.
Doc already exists, you must update it: {"key":"some_id_1", "doc": {"existing":"data"}}
Doc does not exist, you must create it: {"key":"some_id_2", "error":"not_found"}
POST to /db/_bulk_docs with a body like this:
{ "docs": [ { "_id": "some_id_1"
, "_rev": "the _rev from the previous query"
, "existing": "data"
, "perhaps": "some more data I merged in"
}
, { "_id": "some_id_2"
, "brand": "new data, since this is the first doc creation"
}
]
}
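A rough curl sketch of one such batch, assuming the same database name as in the example above and hypothetical keys.json and merged.json files holding the two request bodies shown:
$ curl -XPOST -Hcontent-type:application/json 'localhost:5984/db/_all_docs?include_docs=true' --data-binary @keys.json
# ...merge each returned "doc" (or handle "error":"not_found") with the corresponding CSV row, then:
$ curl -XPOST -Hcontent-type:application/json localhost:5984/db/_bulk_docs --data-binary @merged.json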