mongoDB delete from array - json

I am wondering how I can delete certain elements from an array in mongoDB.
I have the following json saved in the collection geojsons:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"ID": "1753242",
"TYPE": "8003"
}
},
{
"type": "Feature",
"properties": {
"ID": "4823034",
"TYPE": "7005"
}
},
{
"type": "Feature",
"properties": {
"ID": "4823034",
"TYPE": "8003"
}
}
]
}
And I want to delete every element in the array features, where properties.TYPE equals 8003.
I tried it with the following statement, but it does not delete anything.
db.geojsons.aggregate([
{$match: {'features': {$elemMatch : {"properties.TYPE": '8003' }}}},
{$project: {
features: {$filter: {
input: '$features',
as: 'feature',
cond: {$eq: ['$$feature.properties.TYPE', '8003']}
}}
}}
])
.forEach(function(doc) {
doc.features.forEach(function(feature) {
db.collection.deleteOne(
{ "feature.properties.TYPE": "8003" }
);
print( "in ")
});
});
Does anybody know why this does not delete anything, or how to delete the matching elements?
To me it seems that the failure is inside the forEach, since the print statement gets executed as expected.

There is no need for aggregation here; you can use the $pull operator in an update:
db.geojsons.update(
{ "type": "FeatureCollection" }, //or any matching criteria
{$pull: { "features": { "properties.TYPE": "8003"} }}
);

You can use $pull to delete specific elements from the array
db.geojsons.update({}, {
$pull: {features: {'properties.TYPE':'8003'}}
}, {multi:true})

I guess in this case a workable solution is to read the matching documents, modify them, and save them again. So simply write a query that matches the documents you want, remove the unwanted elements from the array, and save the result.
Moreover, this statement does not do what you want:
db.collection.deleteOne(
{ "feature.properties.TYPE": "8003" }
);
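deleteOne removes whole documents, not array elements, and db.collection addresses a collection literally named "collection" rather than geojsons. The filter itself looks for a document with a top-level feature field, structured like this:
{
"type": "FeatureCollection",
"feature": {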
"properties": {
"TYPE": "8003"
}
}
}
And this is definitely not the structure you have.
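The read-modify-write idea can be sketched in plain JavaScript, without a database connection. The removeByType helper is hypothetical (it is not a MongoDB API); it only shows the array transformation that $pull performs on the server, applied here to an in-memory copy of the document from the question.

```javascript
// Hypothetical helper: returns a copy of the document whose `features`
// array no longer contains elements with the given properties.TYPE.
function removeByType(doc, type) {
  return {
    ...doc,
    features: doc.features.filter((f) => f.properties.TYPE !== type),
  };
}

// In-memory copy of the document from the question.
const doc = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { ID: "1753242", TYPE: "8003" } },
    { type: "Feature", properties: { ID: "4823034", TYPE: "7005" } },
    { type: "Feature", properties: { ID: "4823034", TYPE: "8003" } },
  ],
};

const cleaned = removeByType(doc, "8003");
console.log(JSON.stringify(cleaned.features));
```

In a real script the cleaned document would still have to be written back to the collection, which is why a single $pull update is usually the shorter route.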


How can I specify in a json schema that a certain property is mandatory and also must contain a specific value?

I want to create several json schemas for different scenarios.
For scenario 1 I would like to specify that:
a) The property "draftenabled" must have the value true.
b) the property "draftenabled" does exist.
I have checked the post "Validating Mandatory String values in JSON Schema" and tried to validate this JSON
{
"$schema": "./test-schema.json",
"draftenabled": false,
"prefix": "hugo"
}
against this schema, test-schema.json, which I created in Visual Studio Code:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"properties": {
"$schema": {
"type": "string"
},
"draftenabled": {
"type": "boolean"
},
"prefix": {
"type": "string"
}
},
"additionalItems": false,
"contains": {
"properties": {
"draftenabled": {
"const": true
}
},
"required": [
"draftenabled"
]
}
}
I would have expected an error since the value for draftenabled is false rather than true.
It looks like there is some confusion around how the keywords apply to instances (data) of different types.
properties only applies to objects
additionalItems and contains only apply to arrays
Since your instance is an object, additionalItems and contains will be ignored.
Based on your description of what you want, I would do something like the following:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"properties": {
"$schema": {
"type": "string"
},
"draftenabled": {
"const": true
},
"prefix": {
"type": "string"
}
},
"required": [
"draftenabled"
]
}
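This moves the definitions you had inside contains into the main schema. You got that bit right, just in the wrong place. Note that const was introduced in draft-06; with a validator that strictly follows draft-04, "enum": [true] expresses the same constraint.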
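To make the intent of the two keywords concrete, here is a minimal JavaScript sketch of what required plus a boolean const of true assert about an instance. The checkScenario1 function is a hypothetical stand-in written for illustration; it is not a JSON Schema validator and covers only these two constraints.

```javascript
// Hypothetical stand-in for a schema validator: it checks only the two
// constraints discussed here (property is required, value must be true).
function checkScenario1(instance) {
  const errors = [];
  if (!Object.prototype.hasOwnProperty.call(instance, "draftenabled")) {
    errors.push("draftenabled is required");
  } else if (instance.draftenabled !== true) {
    // Strict comparison: the string "true" is rejected as well.
    errors.push("draftenabled must be true");
  }
  return errors;
}

console.log(checkScenario1({ draftenabled: false, prefix: "hugo" }));
console.log(checkScenario1({ prefix: "hugo" }));
console.log(checkScenario1({ draftenabled: true, prefix: "hugo" }));
```

The strict comparison is the reason the const value in the schema should be the boolean true and not the string "true".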
You also mention that this is a "scenario 1." If there are other scenarios, I suggest creating schemas like this for each scenario then wrapping all of them together in a oneOf or anyOf:
{
"oneOf": [
{ <scenario 1> },
{ <scenario 2> },
...
]
}

In Logic Apps JSON Array while parsing throwing error for single object but for multiple objects it is working fine

While parsing JSON in an Azure Logic App, my array can contain a single object or multiple objects (Box in the example below).
Both kinds of input are valid, but when only a single object arrives, the step fails with the error "Invalid type. Expected Array but got Object".
Input 1 (throws the error):
{
"MyBoxCollection":
{
"Box":{
"BoxName": "Box 1"
}
}
}
Input 2 (works fine):
{
"MyBoxCollection":
[
{
"Box":{
"BoxName": "Box 1"
},
"Box":{
"BoxName": "Box 2"
}
}]
}
JSON Schema:
"MyBoxCollection": {
"type": "object",
"properties": {
"box": {
"type": "array",
"items": {
"type": "object",
"properties": {
"BoxName": {
"type": "string"
},
......
.....
..
}
Error details:
[
{
"message": "Invalid type. Expected Array but got Object .",
"lineNumber": 0,
"linePosition": 0,
"path": "Order.MyBoxCollection.Box",
"schemaId": "#/properties/Root/properties/MyBoxCollection/properties/Box",
"errorType": "type",
"childErrors": []
}
]
I used to use the trick of injecting a couple of dummy rows in the resultset as suggested by the other posts, but I recently found a better way. Kudos to Thomas Prokov for providing the inspiration in his NETWORG blog post.
The JSON parse schema accepts multiple choices as type, so simply replace
"type": "array"
with
"type": ["array","object"]
and your parse step will happily parse either an array or a single value (or no value at all).
You may then need to identify which scenario you're in: 0, 1 or multiple records in the resultset? I'm pasting below how you can create a variable (ResultsetSize) which takes one of 3 values (rs_0, rs_1 or rs_n) for your switch:
"Initialize_ResultsetSize": {
"inputs": {
"variables": [
{
"name": "ResultsetSize",
"type": "string",
"value": "rs_n"
}
]
},
"runAfter": {
"<replace_with_name_of_previous_action>": [
"Succeeded"
]
},
"type": "InitializeVariable"
},
"Check_if_resultset_is_0_or_1_records": {
"actions": {
"Set_ResultsetSize_to_0": {
"inputs": {
"name": "ResultsetSize",
"value": "rs_0"
},
"runAfter": {},
"type": "SetVariable"
}
},
"else": {
"actions": {
"Set_ResultsetSize_to_1": {
"inputs": {
"name": "ResultsetSize",
"value": "rs_1"
},
"runAfter": {},
"type": "SetVariable"
}
}
},
"expression": {
"and": [
{
"equals": [
"@string(body('<replace_with_name_of_Parse_JSON_action>')?['<replace_with_name_of_root_element>']?['<replace_with_name_of_list_container_element>']?['<replace_with_name_of_item_element>']?['<replace_with_non_null_element_or_attribute>'])",
""
]
}
]
},
"runAfter": {
"Initialize_ResultsetSize": [
"Succeeded"
]
},
"type": "If"
},
"Process_resultset_depending_on_ResultsetSize": {
"cases": {
"Case_no_record": {
"actions": {
},
"case": "rs_0"
},
"Case_one_record_only": {
"actions": {
},
"case": "rs_1"
}
},
"default": {
"actions": {
}
},
"expression": "@variables('ResultsetSize')",
"runAfter": {
"Check_if_resultset_is_0_or_1_records": [
"Succeeded",
"Failed",
"Skipped",
"TimedOut"
]
},
"type": "Switch"
}
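Once the schema accepts both shapes, later steps still have to cope with them. One way to reason about the three cases (rs_0, rs_1, rs_n) is to normalize the value to an array first. The following is a plain JavaScript sketch, not Logic Apps workflow definition language; the toArray helper and the shapes of the many and none inputs are assumptions made for illustration.

```javascript
// Hypothetical helper: turns "missing", "single object" or "array"
// into an array, so later steps need only one code path.
function toArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Assumed input shapes: one Box, several Boxes, and no Box at all.
const single = { MyBoxCollection: { Box: { BoxName: "Box 1" } } };
const many = {
  MyBoxCollection: { Box: [{ BoxName: "Box 1" }, { BoxName: "Box 2" }] },
};
const none = { MyBoxCollection: {} };

for (const input of [single, many, none]) {
  const boxes = toArray(input.MyBoxCollection.Box);
  console.log(boxes.length, boxes.map((b) => b.BoxName));
}
```

Whether such a step runs in an inline code action or in a separate function depends on your setup.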
I found another Stack Overflow post describing a similar problem. When there is only one "Box" element, the conversion to JSON emits it as an object ({key/value pairs}) rather than an [array]. I think this is by design, so one option is to add a dummy "Box" record at the source of your XML data, such as:
<Box>specific_test</Box>
and then filter out the "specific_test" record in a later step.
Another workaround for your reference:
If your JSON data has only one array, you can test for it directly by checking whether the data contains a "[" character. If it does, indexOf returns the index of that character; if it does not, indexOf returns -1.
The expression looks like this:
indexOf('{"MyBoxCollection":{"Box":[aaa,bbb]}}', '[')
When the data does not contain "[", the expression returns -1.
Then add an "If" condition: if the result is 0 or greater, run "Parse JSON" with one schema; if it is -1, run "Parse JSON" with the other schema.
Hope this helps with your problem.
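The same decision can be sketched in plain JavaScript, whose String indexOf also returns -1 when the character is absent. The pickSchema function and the two schema labels are hypothetical names used only for illustration.

```javascript
// Hypothetical pre-check on the raw JSON text, mirroring the indexOf idea:
// -1 means no "[" was found, so the single-object schema would be used.
function pickSchema(rawJson) {
  return rawJson.indexOf("[") === -1 ? "object-schema" : "array-schema";
}

const singleBox = '{"MyBoxCollection":{"Box":{"BoxName":"Box 1"}}}';
const manyBoxes =
  '{"MyBoxCollection":{"Box":[{"BoxName":"Box 1"},{"BoxName":"Box 2"}]}}';

console.log(pickSchema(singleBox));
console.log(pickSchema(manyBoxes));
```

Keep in mind that this check is purely textual: a "[" inside a string value would also be treated as an array, so it is only safe when the data cannot contain that character elsewhere.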
We faced a similar issue. The only solution we found was to manipulate the XML before conversion: we updated the XML nodes that need to be arrays even when they hold a single element. We used an Azure Function to update the required XML attributes and then returned the XML for conversion in Logic Apps. Hope this helps someone.

MongoDB: Search for a string in a collection and return only matched collection items

I have the following json saved in the mongoDB:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"ID": "1753242",
"TYPE": "8003"
}
},
{
"type": "Feature",
"properties": {
"ID": "4823034",
"TYPE": "7005"
}
}
]
}
When I want to search for a specific TYPE, I can do it like this:
db.geo.find({"features.properties.TYPE":"8003"})
My problem is that this query returns the whole JSON document and not just the elements with the TYPE "8003".
Does anybody know how to retrieve just the elements with the TYPE "8003" with a query?
As of MongoDB 3.2, you can use the $filter aggregation operator to filter an array during projection; it returns all matching elements of the array:
db.geo.aggregate([
{$match: {'features': {$elemMatch : {"properties.TYPE": '8003' }}}},
{$project: {
features: {$filter: {
input: '$features',
as: 'feature',
cond: {$eq: ['$$feature.properties.TYPE', '8003']}
}}
}}
]);
If you just want the first matching element, you can use the positional $ operator like this:
db.geo.find( { "features.properties.TYPE":"8003"}, { "features.$": 1 } )
The $elemMatch projection operator returns only the first element that matches the $elemMatch condition.
Try executing the following query:
db.geo.find({
features: {
$elemMatch: {
"properties.TYPE": "8003"
}
}
}, {
features: {
$elemMatch: {
"properties.TYPE": "8003"
}
}
})
Please refer to the documentation of the $elemMatch operator at the URL below:
https://docs.mongodb.com/manual/reference/operator/projection/elemMatch/
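The difference between the answers above (all matching elements versus only the first one) can be illustrated in plain JavaScript on an in-memory array. This is a sketch of the semantics only, not MongoDB code; the third feature is an extra element added here for illustration and is not part of the question's data.

```javascript
const features = [
  { type: "Feature", properties: { ID: "1753242", TYPE: "8003" } },
  { type: "Feature", properties: { ID: "4823034", TYPE: "7005" } },
  // Extra element added for illustration only (not in the question's data).
  { type: "Feature", properties: { ID: "9999999", TYPE: "8003" } },
];

const isMatch = (f) => f.properties.TYPE === "8003";

// Comparable to the $filter projection: every matching element is kept.
const allMatches = features.filter(isMatch);

// Comparable to the positional $ or $elemMatch projection: first match only.
const first = features.find(isMatch);
const firstMatch = first === undefined ? [] : [first];

console.log(allMatches.map((f) => f.properties.ID));
console.log(firstMatch.map((f) => f.properties.ID));
```

If an array can hold several matching elements and you need all of them, the $filter variant is the one to reach for.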

JSON Schema - matching based on array inclusion

I have the following simple JSON schema that does the regular expression match based on the content field of my data:
{
"$schema":"http://json-schema.org/schema#",
"allOf":[
{
"properties":{
"content":{
"pattern":"some_regex"
}
}
}
]
}
It successfully matches the following data:
{
"content": "some_regex"
}
Now let's say I want to add a list of UUIDs to ignore to my data:
{
"content": "some_regex",
"ignoreIds": ["123", "456"]
}
The problem arises when I want to modify my schema so that it does not match when a given value is present in the list of ignoreIds.
Here is my failed attempt:
{
  "$schema": "http://json-schema.org/schema#",
  "allOf": [{
    "properties": {
      "content": {
        "pattern": "some_regex"
      }
    }
  }, {
    "properties": {
      "ignoreIds": {
        "not": {
          // how do I say 'do not match if "123" is in the ignoreIds array'????
        }
      }
    }
  }]
}
Any help will be appreciated!
Your JSON schema for ignoreIds has to be (put whatever numbers you want to reject in the enum):
"ignoreIds": {
"type": "array",
"items": {
"type": "integer",
"not": {
"enum": [131, 132]
}
}
}
which says: any value in the ignoreIds array that appears in the not/enum list makes the JSON invalid.
This of course also works for an array of strings:
"ignoreIds": {
"type": "array",
"items": {
"type": "string",
"not": {
"enum": ["131", "132"]
}
}
}
Tested with JSON Schema Lint
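The rule expressed by items with not and enum can be spelled out in plain JavaScript. The ignoreIdsValid function below is a hypothetical, simplified check written for illustration; it mirrors only the array type and the not/enum constraint and skips the per-item type check that a real validator would also perform.

```javascript
// Hypothetical check mirroring "items": { "not": { "enum": [...] } }:
// the array is valid only if none of its items is in the banned list.
function ignoreIdsValid(ignoreIds, banned) {
  return (
    Array.isArray(ignoreIds) && ignoreIds.every((id) => !banned.includes(id))
  );
}

const banned = ["131", "132"];

console.log(ignoreIdsValid(["123", "456"], banned));
console.log(ignoreIdsValid(["123", "131"], banned));
console.log(ignoreIdsValid([], banned));
```

Note that an empty array passes, because there is no item that could violate the constraint.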

How do Elasticsearch's "include_in_parent" / "include_in_root" work? Should it show in '_source'?

In a simple Elasticsearch mapping like this:
{
"personal_document": {
"analyzer": "standard",
"_timestamp": {
"enabled": true
},
"properties": {
"description": {
"type": "multi_field",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed"
},
"description": {
"type": "string",
"include_in_root": true
}
}
},
"my_nested": {
"type": "nested",
"include_in_root": true,
"properties": {
"description": {
"type": "string"
}
}
}
}
}
}
.... isn't "include_in_root": true supposed to add the field my_nested.description to the root document?
And when I run a query, shouldn't I see that field in _source?
Also, would specifying a highlight directive on the field 'my_nested.description' automatically retrieve the value included in the root instead of the nested field?
(something like this)
"highlight": {
"fields": {
"description": {},
"my_nested.description": {}
}
}
Or have I misunderstood the official nested type documentation (which is not really clear)?
If the include_in_parent or include_in_root options are enabled on the nested documents then Elasticsearch internally indexes the data with nested fields flattened on the parent document. However, this is just internal for Elasticsearch and you'll never see them in the _source field.
As the documentation puts it: "If the user field is of type object, this document would be indexed internally something like this..."
Thus, you continue to perform actions (like the highlights that you mention) by referring to the nested document's fields. The highlight syntax that you refer to should look like this
"highlight": {
"fields": {
"my_nested.description": {}
}
}
and not
"highlight": {
"fields": {
"description": {}
}
}
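As a mental model for the flattening described above, the following plain JavaScript sketch collects the values of a nested field under a dotted key on the parent. The flattenNested function and the sample document are hypothetical; the actual internal index representation is not shown in this thread, so treat the output shape as an assumption.

```javascript
// Hypothetical illustration of "flattening": values of a nested field are
// collected under a dotted key on the parent, as arrays.
function flattenNested(doc, field) {
  const flat = {};
  for (const item of doc[field] || []) {
    for (const [key, value] of Object.entries(item)) {
      const dotted = `${field}.${key}`;
      if (!flat[dotted]) flat[dotted] = [];
      flat[dotted].push(value);
    }
  }
  return flat;
}

// Assumed sample document; _source would still show it in this form.
const source = {
  description: "parent text",
  my_nested: [{ description: "first" }, { description: "second" }],
};

console.log(flattenNested(source, "my_nested"));
console.log(Object.keys(source));
```

The second line of output is the point of the sketch: the original document is left untouched, which matches the statement that the flattened fields never appear in _source.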
You can use a wildcard for specifying highlight field:
POST /test-1/page/_search
{
"query": {
"query_string": {
"query": "Lorem ipsum"
}
},
"highlight" : {
"fields" : {
"*" : {}
}
}
}
Whether it's a good idea, I don't know; I guess it depends on your application.
This also works with nested documents, by the way, but it seems to hiccup when doing attachments on nested documents without include_in_root.