MongoDB: convert a field from numeric string to number in a large collection (3 million documents) [duplicate]

This question already has answers here:
MongoDB CursorNotFound Error on collection.find() for a few hundred small records
(2 answers)
Closed 3 years ago.
I have a large MongoDB collection (3 million documents) in which I need to convert one field from a numeric string to a number.
I'm using a mongo shell script that works for a small collection of 100k documents. Here is the script:
db.SurName.find().forEach(function(tmp){
tmp.NUMBER = parseInt(tmp.NUMBER);
db.SurName.save(tmp);
})
But after about a dozen minutes of work I get an error (it occurs even on a smaller collection of about 1 million documents):
MongoDB Enterprise Test-shard-0:PRIMARY> db.SurName.find().forEach(function(tmp){
... tmp.NUMBER = parseInt(tmp.NUMBER);
... db.SurName.save(tmp);
... })
2020-01-18T16:59:21.173+0100 E QUERY [js] Error: command failed: {
"operationTime" : Timestamp(1579363161, 14),
"ok" : 0,
"errmsg" : "cursor id 4811116025485863761 not found",
"code" : 43,
"codeName" : "CursorNotFound",
"$clusterTime" : {
"clusterTime" : Timestamp(1579363161, 14),
"signature" : {
"hash" : BinData(0,"EemWWenbArSdh4dTFa0aNcfAPms="),
"keyId" : NumberLong("6748451824648323073")
}
}
} : getMore command failed: {
"operationTime" : Timestamp(1579363161, 14),
"ok" : 0,
"errmsg" : "cursor id 4811116025485863761 not found",
"code" : 43,
"codeName" : "CursorNotFound",
"$clusterTime" : {
"clusterTime" : Timestamp(1579363161, 14),
"signature" : {
"hash" : BinData(0,"EemWWenbArSdh4dTFa0aNcfAPms="),
"keyId" : NumberLong("6748451824648323073")
}
}
} :
_getErrorWithCode#src/mongo/shell/utils.js:25:13
doassert#src/mongo/shell/assert.js:18:14
_assertCommandWorked#src/mongo/shell/assert.js:583:17
assert.commandWorked#src/mongo/shell/assert.js:673:16
DBCommandCursor.prototype._runGetMoreCommand#src/mongo/shell/query.js:802:5
DBCommandCursor.prototype._hasNextUsingCommands#src/mongo/shell/query.js:832:9
DBCommandCursor.prototype.hasNext#src/mongo/shell/query.js:840:16
DBQuery.prototype.hasNext#src/mongo/shell/query.js:288:13
DBQuery.prototype.forEach#src/mongo/shell/query.js:493:12
#(shell):1:1
Is there a way to do this better/right?
EDIT:
The document schema:
{"_id":{"$oid":"5e241b98c7cab1382c7c9d95"},
"SURNAME":"KOWALSKA",
"SEX":"KOBIETA",
"TERYT":"0201011",
"NUMBER":"51",
"COMMUNES":"BOLESŁAWIEC",
"COUNTIES":"BOLESŁAWIECKI",
"PROVINCES":"DOLNOŚLĄSKIE"
}

The best and fastest solution is to use a MongoDB aggregation with the $out stage.
It is equivalent to the following SQL:
insert into new_table
select * from old_table
We convert the NUMBER field with the $toInt operator (MongoDB version >= 4.0) and store the documents in the SurName2 collection. Once that has finished, we drop the old collection and rename SurName2 to SurName.
db.SurName.aggregate([
{$addFields:{
NUMBER : {$toInt:"$NUMBER"}
}},
{$out: "SurName2"}
])
Once you have checked that everything is fine, run these statements:
db.SurName.drop()
db.SurName2.renameCollection("SurName")

** EDIT - START **
Googling "cursor id not found code 43", yielded this answer: https://stackoverflow.com/a/51602507/2279082
** EDIT - END **
I don't have your data set, so I cannot test my answer very well. That being said, you can try to update only the specific field instead of saving the whole document (see db.collection.update in the docs).
Your script would then look like this:
db.SurName.find({}, {NUMBER: 1}).forEach(function(tmp){
db.SurName.update({_id: tmp._id}, {$set: {NUMBER: parseInt(tmp.NUMBER)}});
})
Let me know if it helps or if it needs an edit.
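If you stay with per-document updates, one way to cut down round trips is to group them into batches and send each batch with db.collection.bulkWrite() (available since MongoDB 3.2). The sketch below is plain JavaScript and only builds the batches. The helper name buildConversionBatches, the batch size and the sample documents are assumptions for illustration, and I have not tested whether this avoids the cursor error on a collection of this size.

```javascript
// Hypothetical helper: turn documents with a string NUMBER field into
// batches of updateOne operations suitable for db.collection.bulkWrite().
function buildConversionBatches(docs, batchSize) {
  const batches = [];
  let current = [];
  for (const doc of docs) {
    const parsed = parseInt(doc.NUMBER, 10); // always pass the radix
    if (Number.isNaN(parsed)) {
      continue; // skip values that are not numeric strings
    }
    current.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { NUMBER: parsed } }
      }
    });
    if (current.length === batchSize) {
      batches.push(current);
      current = [];
    }
  }
  if (current.length > 0) {
    batches.push(current); // flush the last, partially filled batch
  }
  return batches;
}

// Made-up sample input shaped like the documents in the question.
const sampleDocs = [
  { _id: 1, NUMBER: "51" },
  { _id: 2, NUMBER: "7" },
  { _id: 3, NUMBER: "n/a" }, // not numeric, so it is skipped
  { _id: 4, NUMBER: "120" }
];
const batches = buildConversionBatches(sampleDocs, 2);
console.log(JSON.stringify(batches));
```

In the mongo shell each batch would then be passed to db.SurName.bulkWrite(batch), with the documents read from a find({}, {NUMBER: 1}) cursor as in the script above.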

Related

MongoDB regex string startswith and endswith [duplicate]

This question already has answers here:
Regex matching beginning AND end strings
(6 answers)
Regular Expression to find string starts with letter and ends with slash /
(3 answers)
Closed 4 years ago.
So I have something that looks like this:
db.usuarios.insert
(
[
{
"nome" : "neymala",
"idade" : 40,
"status" : "solteira"
},
{
"nome" : "gabriel",
"idade" : 31,
"status" : "casado"
},
{
"nome" : "jose",
"idade" : 25,
"status" : "solteiro"
},
{
"nome" : "manoel",
"idade" : 25,
"status" : "solteiro",
"interesses" : [
"esporte",
"musica"
]
}
]
)
I would like to find names that start with ma and end with l, for example "manoel" or "manuel".
I have figured out how to do one or the other with the following queries:
db.usuarios.find({nome:{$regex: /^ma/ }})
db.usuarios.find({nome:{$regex: /l$/ }})
Now I would like to combine them into a single query.
You can combine the two requirements into a single regex:
db.usuarios.find({nome: /^ma.*l$/})
In a regex, .* means to match 0 or more of any character. So this regex matches names that start with ma and end with l, ignoring whatever is between.
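To see what the combined pattern accepts, here is a quick check in plain JavaScript, with no database involved. The names "manuel" and "mal" are extra test values I added; the rest come from the question.

```javascript
// Plain JavaScript check of the combined pattern.
const pattern = /^ma.*l$/;
const names = ["neymala", "gabriel", "jose", "manoel", "manuel", "mal"];

// Keep only the names that start with "ma" and end with "l".
const matches = names.filter((name) => pattern.test(name));
console.log(matches);
```

Note that "mal" matches too, because .* is allowed to match zero characters.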
Combine both queries with the $and operator:
db.usuarios.find({
$and:[
{nome:{$regex: /^ma/ }},
{nome:{$regex: /l$/ }}
]
})
In JavaScript, /.../ is a regex literal, so $regex is not needed:
db.usuarios.find({
$and:[
{nome: /^ma/ },
{nome: /l$/ }
]
})
Also note that a starts-with (prefix) regex can use an index efficiently, while an ends-with regex cannot. So make sure the starts-with part is selective; otherwise many documents will be scanned, which may consume extra CPU and slow down your query.
Maybe this code will help you.
You can search for both lowercase and uppercase strings.

MongoDB queries return no results

I'm having a problem with querying a MongoDB dataset ("On Street Crime in Camden" from data.gov.uk)
The database name is Crime_Data_in_Camden and the collection name is Street_Crime_Camden. The query to find all records, db.Street_Crime_Camden.find(), works fine but anything else returns nothing at all. Here is a portion of the metadata:
{
"id" : 509935,
"name" : "Ward Name",
"dataTypeName" : "text",
"fieldName" : "ward_name",
"position" : 13,
"renderTypeName" : "text",
"tableColumnId" : 258836,
"width" : 100,
"cachedContents" : {
"largest" : "West Hampstead",
"non_null" : 79813,
"null" : 0,
"top" : [ {
"item" : "Regent's Park",
"count" : 20
}, {
"item" : "Swiss Cottage",
"count" : 19
}, {
"item" : "Holborn and Covent Garden",
"count" : 18
} ]
}
}
I've tried 3 attempts at a basic query:
db.Street_Crime_Camden.find({"ward_name":"West Hampstead"});
db.Street_Crime_Camden.find({'meta.ward_name':'West Hampstead'});
db.Street_Crime_Camden.find({meta:{ward_name:"West Hampstead"} });
According to any documentation or tutorial that I've seen any of these approaches should be valid. And I know that there are hundreds of rows (or documents) that match those terms, so why are these queries returning nothing? Advice would be appreciated.
The common theme in the three approaches you tried is some form of ward_name = West Hampstead, but there is no attribute named ward_name in the document you shared with us.
Based on the document you show in your question the only way of addressing an attribute with the value West Hampstead is:
db.Street_Crime_Camden.find({"cachedContents.largest": "West Hampstead"});
For background: you address attributes in your documents by using dot notation, so the document you included in your question could be found by any of the following find commands:
db.Street_Crime_Camden.find({"name": "Ward Name"});
db.Street_Crime_Camden.find({"position": 13});
db.Street_Crime_Camden.find({"cachedContents.top.item": "Swiss Cottage"});
db.Street_Crime_Camden.find({"cachedContents.top.0.count": 20});
... etc
These examples might help you to understand how to form find criteria. The MongoDB docs are also useful.
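To make the dot notation more concrete, here is a simplified model of the lookup in plain JavaScript. It is only a sketch: valuesAtPath is a hypothetical helper, not a MongoDB function, and the real matching rules have more cases than this. The sample object is a cut-down copy of the document from the question.

```javascript
// Simplified, hypothetical model of dot-notation lookup: returns every
// value reachable at the path, fanning out over arrays.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      if (node === null || typeof node !== "object") continue;
      if (Array.isArray(node)) {
        if (/^\d+$/.test(key)) {
          // Numeric segment: treat it as an array index.
          if (Number(key) < node.length) next.push(node[Number(key)]);
        } else {
          // Field name: look inside each element of the array.
          for (const item of node) {
            if (item !== null && typeof item === "object" && key in item) {
              next.push(item[key]);
            }
          }
        }
      } else if (key in node) {
        next.push(node[key]);
      }
    }
    current = next;
  }
  return current;
}

const sample = {
  name: "Ward Name",
  position: 13,
  cachedContents: {
    largest: "West Hampstead",
    top: [
      { item: "Regent's Park", count: 20 },
      { item: "Swiss Cottage", count: 19 },
      { item: "Holborn and Covent Garden", count: 18 }
    ]
  }
};

console.log(valuesAtPath(sample, "cachedContents.largest"));
console.log(valuesAtPath(sample, "cachedContents.top.item"));
console.log(valuesAtPath(sample, "cachedContents.top.0.count"));
console.log(valuesAtPath(sample, "ward_name")); // empty: no such attribute
```

A find() criterion matches when the value you ask for is among the values at that path, which is why the three ward_name queries return nothing for this document.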

Update json array in postgres

I have a data field that looks like this :
{ "field1" : [{"name":'name1',"value1":true},
{"name":'name2',"value2":false}
],
"field2" : [{"name":'name1',"value1":true},
{"name":'name2',"value2":false}
]
}
Is it possible to update a specific field with an UPDATE statement?
create table t_json (
t_data json
);
insert into t_json values('{"field1":[{"name":"name1","value" : true},{"name":"name2","value" : false}],"field1":[{"name":"name1","value" : true},{"name":"name2","value" : false}]}');
select t_data->'field1'
from t_json;
I tried this :
update t_json
set t_data->'a' = '[{"value1" : true, "value2" : false}]';
But I get an error: "syntax error at or near ->"
What is missing?
I wanted to post this here in case it helps anybody else. By all means use JSON over JSONB unless you actually need features that JSONB affords you. In general, if you need to perform queries on the JSON data itself, use JSONB. If you are just needing to store data, use JSON.
Anyhow, here is how I am updating a JSON[] field:
UPDATE foo SET bar = ARRAY[$${"hello": "world"}$$, $${"baz": "bing"}$$]::JSON[]
The important things to notice are these:
The array is wrapped like this: ARRAY[ ... ]::JSON[]
Each item in the array is wrapped like this: $${ "foo": "bar" }$$
It is worth noting that this same technique can be used for other array types. For example, if you have a text[] column, the query would look like this:
UPDATE foo SET bar = ARRAY[$$hello world$$, $$baz bing$$]::TEXT[]
Fixing your typos
Doubt it. This is not valid json. name1 and name2 must be double quoted. To ease working with json, ALWAYS use double quotes. ALWAYS query-quote with double-dollar.
{ "field1" : [{"name":'name1',"value1":true},
{"name":'name2',"value2":false}
],
"field2" : [{"name":'name1',"value1":true},
{"name":'name2',"value2":false}
]
}
And, what you INSERTED is also funky.. ALWAYS paste beautified valid JSON in your question.
{
"field1":[{"name":"name1","value" : true},{"name":"name2","value" : false}],
"field1":[{"name":"name1","value" : true},{"name":"name2","value" : false}]
}
Let's change that and fix it.
{
"field1":[{"name":"name1","value" : true},{"name":"name2","value" : false}],
"field2":[{"name":"name1","value" : true},{"name":"name2","value" : false}]
}
Now let's put it in a query..
TRUNCATE t_json;
INSERT INTO t_json (t_data) VALUES ($$
{
"field1":[{"name":"name1","value" : true},{"name":"name2","value" : false}],
"field2":[{"name":"name1","value" : true},{"name":"name2","value" : false}]
}
$$);
Making the update of the JSON
Now it works.. Now you can update it as you want..
UPDATE t_json
SET t_data = jsonb_set(
t_data::jsonb,
'{field1}',
$${"whatever":1}$$
);
Change from JSON to JSONB
Notice we're having to cast to jsonb. As a general rule, NEVER use JSON (not everyone agrees, see comments). There is no point. Instead use the newer JSONB.
ALTER TABLE t_json ALTER COLUMN t_data TYPE jsonb ;
Now you can do
UPDATE t_json
SET t_data = jsonb_set(
t_data,
'{field1}',
$${"whatever":1}$$
);

How to remove an attribute from list of json object in mongo shell? [duplicate]

This question already has answers here:
How to Update Multiple Array Elements in mongodb
(16 answers)
Remove field found in any mongodb array
(2 answers)
Closed 3 years ago.
I have the document below in MongoDB (2.4.5):
{
"_id" : 235399,
"casts" : {
"crew" : [
{
"_id" : 1186343,
"withBase" : true,
"department" : "Directing",
"job" : "Director",
"name" : "Connie Rasinski"
},
{
"_id" : 86342,
"withBase" : true
}
]
},
"likes" : 0,
"rating" : 0,
"rating_count" : 0,
"release_date" : "1955-11-11"
}
I want to remove the withBase field from the array elements inside casts.crew.
I tried this:
db.coll.update({_id:235399},{$unset: { "casts.crew.withBase" : 1 } },false,true)
Nothing changed.
Then I tried this:
db.coll.update({_id:235399},{$unset: { "casts.crew" : { $elemMatch: { "withBase": 1 } } } },false,true)
It removed the entire crew array from the document.
Can someone please provide me the right query?
In MongoDB 3.6 and later you can use the all positional operator $[] to update every element of an array.
Something like
db.coll.update( {_id:235399}, {$unset: {"casts.crew.$[].withBase":""}} )
Here $[] removes the withBase property from every element of the crew array. It acts as a placeholder for all elements in the array.
Use multi: true to affect multiple documents.
Sorry to disappoint you, but your answer
db.coll.update({
_id:235399,
"casts.crew.withBase": {$exists: true}
},{
$unset: {
"casts.crew.$.withBase" : true
}
},false,true)
is not correct. It will remove the field, but only from the first matching subdocument, because of the way the positional operator works:
the positional $ operator acts as a placeholder for the first element
that matches the query document
You also cannot use $unset on the array path directly (as you tried before), because it does not reach into array elements, and you are basically trying to remove a key from each subdocument in the array. Nor can you do it with $pull, because $pull removes whole array elements, not just one field of them.
Therefore, as far as I know, you cannot do this with a single operator. The last resort is to do a find() and then forEach with save(). You can see how to do this in my answer here. In your case you need another loop inside the forEach function to iterate through the array and delete the key. I hope you will be able to adapt it; if not, I will try to help you.
P.S. If someone is looking for a way to do this, here is Sandra's function:
db.coll.find({_id:235399}).forEach( function(doc) {
var arr = doc.casts.crew;
var length = arr.length;
for (var i = 0; i < length; i++) {
delete arr[i]["withBase"];
}
db.coll.save(doc);
});
I found a way to unset these fields without having to pull up the object (meaning, just doing an update). It's pretty hackish, but if you have a huge database it will do the job:
db.coll.update({},{$unset: {"casts.crew.0.withBase" : 1, "casts.crew.1.withBase" : 1} }, {multi: 1})
In other words, you have to work out how many objects there can be in the list of any of your documents and add those indexes explicitly, in this case as {"casts.crew.NUMBER.withBase": 1}.
Also, to count the longest array in a mongodb object, an aggregate can be done, something like this:
db.coll.aggregate( [ { $unwind : "$casts.crew" }, { $group : { _id : "$_id", len : { $sum : 1 } } }, { $sort : { len : -1 } }, { $limit : 1 } ], {allowDiskUse: true} )
Just want to emphasize that this is not a pretty solution but is way faster than fetching and saving.
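Writing the index list by hand gets tedious for longer arrays, so the $unset document can be generated instead. This is a minimal sketch in plain JavaScript; buildUnsetSpec is a hypothetical helper name, and the length 3 is just an example value that would really come from the aggregate above.

```javascript
// Hypothetical helper: build the $unset document for indexes 0..maxLength-1.
function buildUnsetSpec(arrayPath, field, maxLength) {
  const spec = {};
  for (let i = 0; i < maxLength; i++) {
    // e.g. "casts.crew.0.withBase"
    spec[arrayPath + "." + i + "." + field] = 1;
  }
  return spec;
}

const unsetSpec = buildUnsetSpec("casts.crew", "withBase", 3);
console.log(JSON.stringify(unsetSpec));
```

The result would then be used in the same update as before: db.coll.update({}, {$unset: unsetSpec}, {multi: 1}).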

MongoDB: how to select an empty-key subdocument?

Ahoy! I'm having a very funny issue with MongoDB and, possibly more in general, with JSON. Basically, I accidentally created some MongoDB documents whose subdocuments contain an empty key, e.g. (I stripped ObjectIDs to make the code look nicer):
{
"_id" : ObjectId("..."),
"stats" :
{
"violations" : 0,
"cost" : 170,
},
"parameters" :
{
"" : "../instances/comp/comp20.ectt",
"repetition" : 29,
"time" : 600000
},
"batch" : ObjectId("..."),
"system" : "Linux 3.5.0-27-generic",
"host" : "host3",
"date_started" : ISODate("2013-05-14T16:46:46.788Z"),
"date_stopped" : ISODate("2013-05-14T16:56:48.483Z"),
"copy" : false
}
Of course the problem is the line:
"" : "../instances/comp/comp20.ectt"
since I cannot get back the value of the field. If I query using:
db.experiments.find({"batch": ObjectId("...")}, { "parameters.": 1 })
what I get is the full content of the parameters subdocument. My guess is that . is probably ignored if followed by an empty selector. From the JSON specification (15.12.*) it looks like empty keys are allowed. Do you have any ideas about how to solve that?
Is that a known behavior? Is there a use for that?
Update: I tried to $rename the field, but that won't work, for the same reasons. Keys that end with . are not allowed.
Update: I filed an issue on the MongoDB issue tracker.
Thanks,
Tommaso
I have this same problem. You can select your sub-documents with something like this:
db.foo.find({"parameters.":{$exists:true}})
The dot at the end of "parameters" tells Mongo to look for an empty key in that sub-document. This works for me with Mongo 2.4.x.
Empty keys are not well supported by Mongo. I don't think they are officially supported, but you can insert data with them. So you shouldn't use them: find the place in your system where these keys are inserted and eliminate it.
I just checked the code and this does not currently seem possible for the reasons you mention. Since it is allowed to create documents with zero length field names I would consider this a bug. You can report it here : https://jira.mongodb.org
By the way, ironically you can query on it :
> db.c.save({a:{"":1}})
> db.c.save({a:{"":2}})
> db.c.find({"a.":1})
{ "_id" : ObjectId("519349da6bd8a34a4985520a"), "a" : { "" : 1 } }
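As for getting the value back: one possible client-side workaround is to fetch the whole parameters subdocument and read the empty key with bracket notation. The sketch below is plain JavaScript on a hand-written sample object shaped like the document in the question. It only shows that an empty key survives JSON parsing and is readable in JavaScript; it is my assumption that this carries over to documents returned by the shell, and I have not verified that.

```javascript
// Sample shaped like the "parameters" subdocument in the question.
const fetched = JSON.parse(
  '{"parameters": {"": "../instances/comp/comp20.ectt", "repetition": 29, "time": 600000}}'
);

// Dot syntax cannot name an empty key, but bracket notation can.
const emptyKeyValue = fetched.parameters[""];
console.log(emptyKeyValue);
```

In the shell that would mean projecting {parameters: 1} and reading doc.parameters[""] inside a forEach.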