Aggregated distinct from find results - json

First of all, db.collection.distinct("name"); gives a reasonable result for me, but the problem is that distinct has a limitation (the results must not be larger than the maximum BSON document size), so I need to aggregate instead, right?
Another thing is that I really want the distinct values of filtered find results, i.e. from something like this:
db.collection.find({ name: { $exists: true, $ne: null }, state: "published" });
So the main idea is to save all published "name" values from the full collection to a JSON file, without any size limitation.
So I used:
> cat 1.json
db.collection.distinct("name");
> mongo db < 1.json > 2.json

The correct query is:
db.collection.aggregate([
  { $match: { name: { $exists: true, $ne: null }, state: "published" } },
  { $group: { _id: null, uniqueValues: { $addToSet: "$name" } } }
]);

Mongo query to get comma separated value

I have a query that only matches from the start of the field value.
Example:
{
  "orderStatus": "SUBMITTED",
  "orderNumber": "785654",
  "orderLine": [
    {
      "lineNumber": "E1000",
      "trackingnumber": "12345,67890",
      "lineStatus": "IN-PROGRESS",
      "lineStatusCode": 50
    }
  ],
  "accountNumber": 9076
}
find({'orderLine.trackingNumber' : { $regex: "^12345.*"} })
When I use the above query I get the entire document, but I also want to fetch the document when I search with the value 67890.
At any point in time I will always be querying with a single tracking number, either 12345 or 67890. The tracking number value can also grow, e.g. 12345,56789,01234,56678.
I need to pull the whole document no matter where in the value the tracking number sits.
OUTPUT should be the whole document:
{
  "orderStatus": "SUBMITTED",
  "orderNumber": "785654",
  "orderLine": [
    {
      "lineNumber": "E1000",
      "trackingnumber": "12345,67890",
      "lineStatus": "IN-PROGRESS",
      "lineStatusCode": 50
    }
  ],
  "accountNumber": 9076
}
I have also created an index on the trackingNumber field. Need help here. Thanks in advance.
The following will search for either 12345 or 67890. It is similar to a SQL LIKE condition:
find({'orderLine.trackingNumber' : { $regex: /12345/} })
find({'orderLine.trackingNumber' : { $regex: /67890/} })
There's also an alternative way to do this.
Create a text index:
db.order.createIndex({'orderLine.trackingnumber':"text"})
You can make use of this index to search for the value in the trackingnumber field:
db.order.find({$text:{$search:'12345'}})
db.order.find({$text:{$search:'67890'}})
// Do take note that you can't search using a few in-between characters:
// the following query won't give any result (the trailing 0 has been purposefully removed).
db.order.find({$text:{$search:'6789'}})
To further understand how $text searches work, please go through the MongoDB documentation on text search.
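As a side note on the regex approach: a plain /12345/ pattern would also match a longer number that merely contains those digits (for example 612345). A minimal sketch that anchors the match to the comma boundaries instead, assuming the lowercase trackingnumber spelling from the sample document and a hypothetical helper name:

// Hypothetical helper: match the tracking number only as a whole
// comma-separated token, e.g. "12345" in "12345,67890" but not in "612345".
function byTrackingNumber(tn) {
  return { 'orderLine.trackingnumber': new RegExp('(^|,)' + tn + '(,|$)') };
}

db.order.find(byTrackingNumber('67890'))

Like the unanchored regex, this pattern cannot use a regular index efficiently because it is not anchored to the start of the string, so it trades index use for exact token matching.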

How do we ignore duplicates while inserting JSON data into MongoDB based on multiple conditions

[
  {
    "RollNo": 1,
    "name": "John",
    "age": 20,
    "Hobby": "Music",
    "Date": "9/05/2018",
    "sal": 5000
  },
  {
    "RollNo": 2,
    "name": "Ravi",
    "age": 25,
    "Hobby": "TV",
    "Date": "9/05/2018",
    "sal": 5000
  },
  {
    "RollNo": 3,
    "name": "Devi",
    "age": 30,
    "Hobby": "cooking",
    "Date": "9/04/2018",
    "sal": 5000
  }
]
Above is the JSON file I need to insert into MongoDB. Similar JSON data is already in my MongoDB collection named 'Tests'. I have to ignore the records that are already in MongoDB, based on this condition:
[RollNo in MongoDB == RollNo in the JSON to insert && Hobby in MongoDB == Hobby in the JSON to insert && Date in MongoDB == Date in the JSON to insert]
If this condition matches, I need to ignore the insertion; otherwise I need to insert the data into the DB.
I am using Node.js. Can anyone please help me do this?
If you are using Mongoose, then use an upsert.
db.people.update(
  { RollNo: 1 },
  {
    "RollNo": 1,
    "name": "John",
    "age": 20,
    "Hobby": "Music",
    "Date": "9/05/2018",
    "sal": 5000
  },
  { upsert: true }
)
But to avoid inserting the same document more than once, only use upsert: true if the query field is uniquely indexed.
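Since the question mentions Node.js, a minimal sketch of the same upsert with the official mongodb driver; the collection handle and the idea of matching on all three duplicate-defining fields are assumptions, not part of the original answer:

// Sketch only: `collection` is assumed to be a handle such as
// client.db('test').collection('Tests') from the official Node.js driver.
async function upsertPerson(collection, person) {
  await collection.updateOne(
    { RollNo: person.RollNo, Hobby: person.Hobby, Date: person.Date }, // the duplicate condition
    { $setOnInsert: person },  // only write the document when no match exists
    { upsert: true }
  );
}

With $setOnInsert an existing matching document is left untouched, which matches the "ignore the insertion" requirement.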
The easiest and safest way to do this is by using a compound index.
You can create a compound index like this:
db.people.createIndex( { "RollNo": 1, "Hobby": 1, "Date" : 1 }, { unique: true } )
Then duplicate inserts will produce an error (E11000 duplicate key) which you need to handle in your code.
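A sketch of handling that error from Node.js with the official driver; the ordered: false option is an assumption that lets the non-duplicate documents still be inserted when some of them collide with the unique index:

// Sketch only: `collection` is the 'Tests' collection, `records` is the parsed JSON array.
async function insertIgnoringDuplicates(collection, records) {
  try {
    await collection.insertMany(records, { ordered: false });
  } catch (err) {
    // 11000 is MongoDB's duplicate-key error code; re-throw anything else.
    const writeErrors = err.writeErrors || [];
    const onlyDuplicates =
      writeErrors.length > 0 && writeErrors.every(e => e.code === 11000);
    if (!onlyDuplicates) throw err;
  }
}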

Update multiple elements of a list using Couchbase N1QL

Context
Somewhere in my Couchbase documents I have a node looking like this:
"metadata": {
"configurations": {
"AU": {
"enabled": false,
"order": 2147483647
},
"BE": {
"enabled": false,
"order": 2147483647
},
"BG": {
"enabled": false,
"order": 2147483647
} ...
}
}
and it goes on with a list of country codes and their "enabled" state.
What I want to achieve
Update this document to mark it as disabled ("enabled" = false) for all countries.
To do this I hoped the following syntax would work (let's say I'm trying to update the document with id 03c53a2d-6208-4a35-b9ec-f61e74d81dab):
UPDATE `data` t
SET country.enabled = false
FOR country IN t.metadata.configurations END
where meta(t).id = "03c53a2d-6208-4a35-b9ec-f61e74d81dab";
But it seems it doesn't change anything in my document.
Any hints? :)
Thanks guys.
As the field names are dynamic, you can generate them using OBJECT_NAMES() and use that during the update of the field.
UPDATE data t USE KEYS "03c53a2d-6208-4a35-b9ec-f61e74d81dab"
SET t.metadata.configurations.[v].enabled = false FOR v IN OBJECT_NAMES(t.metadata.configurations) END ;
In the above example OBJECT_NAMES(t.metadata.configurations) generates ["AU", "BE", "BG"].
When a JSON field is referenced as .[v], the expression v is evaluated and its value becomes the field name. So during the looping construct t.metadata.configurations.[v].enabled becomes
t.metadata.configurations.`AU`.enabled,
t.metadata.configurations.`BE`.enabled,
t.metadata.configurations.`BG`.enabled
depending on the value of v.
This query should work:
update data
use keys "03c53a2d-6208-4a35-b9ec-f61e74d81dab"
set country.enabled = false for country within metadata.configurations
when country.enabled is defined end
The WITHIN allows "country" to be found at any level of the metadata.configurations structure, and we use the "WHEN country.enabled IS DEFINED" to make sure we are looking at the correct type of "country" structure.
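For completeness, a minimal sketch of running the OBJECT_NAMES() variant from application code, assuming the Couchbase Node.js SDK 3.x and placeholder connection details:

const couchbase = require('couchbase');

// Sketch only: connection string, credentials and document id are placeholders.
async function disableAllCountries(docId) {
  const cluster = await couchbase.connect('couchbase://localhost', {
    username: 'user',
    password: 'password',
  });
  // The document key is bound as the named parameter $id instead of being
  // concatenated into the statement.
  const statement = `
    UPDATE \`data\` t USE KEYS $id
    SET t.metadata.configurations.[v].enabled = false
    FOR v IN OBJECT_NAMES(t.metadata.configurations) END`;
  await cluster.query(statement, { parameters: { id: docId } });
  await cluster.close();
}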

MySQL to MongoDB query translation

I need to convert the following MySQL query to MongoDB.
Any help will be highly appreciated.
SELECT cr.*, COUNT(cj.job_id) AS finished_chunks FROM `checks_reports_df8` cr
LEFT JOIN `checks_jobs_df8` cj ON cr.id = cj.report_id
WHERE cr.started IS NOT NULL AND cr.finished IS NULL AND cj.is_done = 1
MongoDB doesn't do JOINs, so you will have to query both collections and do the join in the application layer. How to do this exactly depends on which programming language you use to develop your application. You don't say which one, so I will just give you an example in JavaScript; in a different language, the second snippet is just a simple FOR loop.
These are the MongoDB queries you would use. I don't have access to your data, so I cannot guarantee correctness.
var reports = db.checks_reports_df8.find({
  "started": { $exists: 1 },
  "finished": { $exists: 0 }
});
This query assumes that your null values are represented by missing fields, which is normal practice in MongoDB. When you have actual null values, use "started": { $ne: null } and "finished": null.
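For reference, that variant would look like this (same assumptions as above):

var reports = db.checks_reports_df8.find({
  "started": { $ne: null },   // started is set and not null
  "finished": null            // finished is null (or missing)
});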
Then iterate over the documents you get. For each report, perform this query:
reports.forEach(function(report) {
  var job_count = db.checks_jobs_df8.aggregate([
    { $match: {
      "report_id": report.id,
      "is_done": 1
    }},
    { $group: {
      _id: "$job_id",
      "count": { $sum: 1 }
    }}
  ]);
  // output the data from report and job_count here
});
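As a side note, on MongoDB 3.4 or newer the $lookup stage can do the left join on the server, which avoids running one aggregation per report. A sketch, with the field names (id, report_id, is_done) taken from the SQL query as assumptions:

db.checks_reports_df8.aggregate([
  { $match: { "started": { $exists: 1 }, "finished": { $exists: 0 } } },
  // pull in all jobs whose report_id matches this report's id
  { $lookup: {
      from: "checks_jobs_df8",
      localField: "id",
      foreignField: "report_id",
      as: "jobs"
  }},
  // count only the finished jobs, mirroring COUNT(cj.job_id) with cj.is_done = 1
  { $addFields: {
      finished_chunks: {
        $size: { $filter: { input: "$jobs", as: "j", cond: { $eq: ["$$j.is_done", 1] } } }
      }
  }},
  { $project: { jobs: 0 } }
]);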

Improving the performance of aggregating the value of a key spread across multiple JSON rows

I'm currently storing the data in the following format (JSON) in a Redis ZSET. The score is the timestamp in milliseconds.
<timestamp_1> - [ { "key1" : 200 }, { "key2": 100 }, {"key3" : 5 }, .... {"key_n" : 1} ]
<timestamp_2> - [ { "key50" : 500 }, { "key2": 300 }, {"key3" : 290 }, ....{"key_m" : 26} ]
....
....
<timestamp_k> - [ { "key1" : 100 }, { "key2": 200 }, {"key3" : 50 }, ....{"key_p" : 150} ]
I want to extract the values for a key between a given time range.
For example, the values of key2 in the above example, over the entire time range, would be:
[timestamp_1:100, timestamp_2:300, ..... timestamp_k:200]
I can get this output now, but I have to parse the JSON of each row and then iterate through it to get the value of the given key. The parsing becomes a bottleneck as the size of each row increases (n, m, and p can be as large as 10000).
I'm looking for suggestions on whether there is a way to improve the performance in Redis. Are there any specific parsers (in Scala) that can help here?
I'm also open to using other stores such as Cassandra or Elasticsearch if they give better performance, and to formats other than JSON for storing the data in the Redis ZSET.
Cassandra will work just fine for your requirement.
You can keep key_id as the partition key and timestamp as the clustering key.
In Cassandra you always define your query before designing your column family: extract the values for a key between a given time range.
If you are using CQL3,
Create the schema:
CREATE TABLE imp_keys (key_id text, score int, timestamp timeuuid, PRIMARY KEY (key_id, timestamp));
Access the data:
SELECT score FROM imp_keys WHERE key_id='key2' AND timestamp > minTimeuuid(start_date) AND timestamp < maxTimeuuid(end_date);
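A minimal sketch of running that query from application code with the DataStax Node.js driver (cassandra-driver); contact point, data center and keyspace are placeholders, and the same pattern maps directly to the Java/Scala driver:

const cassandra = require('cassandra-driver');

// Sketch only: connection details are placeholders.
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'mykeyspace',
});

async function scoresForKey(keyId, startDate, endDate) {
  // minTimeuuid/maxTimeuuid turn the date bounds into timeuuid bounds.
  const query =
    'SELECT score, timestamp FROM imp_keys ' +
    'WHERE key_id = ? AND timestamp > minTimeuuid(?) AND timestamp < maxTimeuuid(?)';
  const result = await client.execute(query, [keyId, startDate, endDate], { prepare: true });
  return result.rows;  // each row holds a score and its timeuuid
}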