How to query data inside subobjects in MongoDB?

I need to query data in MongoDB, and my JSON files look like the examples below.
The problem is that the time/date stamp changes every five minutes, so MongoDB treats the files as having different keys; for example, "2018-01-02T00:00:00+09:00" and "2018-01-02T00:05:00+09:00" are different keys.
Ultimately, how would I query for readings where T1 is below 280?
I have taken a look at $elemMatch, but it is for arrays, not subobjects.
I am just starting out in the MongoDB world, so I am sorry if this is a silly question, but I could not find the answer anywhere.
File 1
{
  "2018-01-02T00:00:00+09:00": {
    "141474": {
      "T1": 276.5029,
      "T2": 279.3629
    },
    "141475": {
      "T1": 280.1534,
      "T2": 279.7219
    }
  }
}
File 2
{
  "2018-01-02T12:00:00+09:00": {
    "141474": {
      "T1": 275.1324,
      "T2": 276.9986
    },
    "141475": {
      "T1": 267.3324,
      "T2": 250.6574
    }
  }
}
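One possible approach, as a minimal sketch: since the timestamp and sensor-id keys are dynamic, $objectToArray (MongoDB 3.4.4+) can turn the keys themselves into data you can filter on. The collection name readings is assumed here:

db.readings.aggregate([
  // turn the unknown timestamp keys into { k, v } pairs
  { $project: { pairs: { $objectToArray: "$$ROOT" } } },
  { $unwind: "$pairs" },
  // drop the _id entry, keeping only the timestamp sub-objects
  { $match: { "pairs.k": { $ne: "_id" } } },
  // do the same for the sensor ids under each timestamp
  { $project: { timestamp: "$pairs.k", sensors: { $objectToArray: "$pairs.v" } } },
  { $unwind: "$sensors" },
  // keep only readings with T1 below 280
  { $match: { "sensors.v.T1": { $lt: 280 } } },
  { $project: { _id: 0, timestamp: 1, sensor: "$sensors.k", T1: "$sensors.v.T1" } }
])

A longer-term fix is to store the timestamp and sensor id as field values rather than keys, which makes a plain find({ T1: { $lt: 280 } }) possible.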

Related

JSON Database doesn't work correctly as per REST technique

I created a JSON server and this is the data I'm using. However, when I try to query the examlist and relate it to the students (I'd like to receive the students based on their IDs; the REST query I'm using is ?_expand=student), it won't display them. The code passes JSON validators, but my goal is to have the relations working.
The way my data is organized, the examlist "table" won't display the students because, apparently, it cannot read their IDs. This database will be used for HTTP requests, so I need it fully working.
At the moment, instead of my student IDs, it shows some arbitrary 0, 1 indices, and the student IDs are pushed further down the tree.
(Just the examlist "table")
It's an M:M relationship (as in a relational database), and I want it structured as follows:
A "table" students that contains information about the students;
A "table" exams that contains information about the exams;
And another "table" examlist that contains information about the exam (examId) and the students enrolled in it (basically relating the two tables above).
When I try querying the students through the "examlist" table, it doesn't work; the other "table", exams, does work.
My assumption is that the way I have organized the students in the examlist "table" is the problem, but given my very limited experience I cannot see where the issue is.
I hope that clears things up! Sorry for any inconvenience caused.
{
  "students": [
    {
      "id": 3021,
      "nume": "Ionut",
      "prenume": "Grigorescu",
      "an": 3,
      "departament": "IE"
    },
    {
      "id": 3061,
      "nume": "Nadina",
      "prenume": "Pop",
      "an": 3,
      "departament": "MG"
    },
    {
      "id": 3051,
      "nume": "Ionut",
      "prenume": "Casca",
      "an": 3,
      "departament": "IE"
    }
  ],
  "exams": [
    {
      "id": 1,
      "subiect": "Web Semantic",
      "profesor": {
        "Nume": "Robert Buchman"
      }
    },
    {
      "id": 2,
      "subiect": "Programare Web",
      "profesor": {
        "Nume": "Mario Cretu"
      }
    },
    {
      "id": 3,
      "subiect": "Medii de Programare",
      "profesor": {
        "Nume": "Valentin Stinga"
      }
    }
  ],
  "listaexamene": [
    {
      "examId": 1,
      "Data Examen": "02/06/2022 12:00",
      "studentId": [
        { "id": 3021 },
        { "id": 3051 }
      ]
    },
    {
      "examId": 2,
      "Data Examen": "27/05/2022 10:00",
      "studentId": [
        { "id": 3021 },
        { "id": 3051 }
      ]
    },
    {
      "examId": 1,
      "Data Examen": "04/06/2022 10:00",
      "studentId": [
        { "id": 3021 },
        { "id": 3051 },
        { "id": 3061 }
      ]
    }
  ]
}
I had to repost with more information after my first post got closed.
I think I finally got the answer: the problem lies in the JSON server. Apparently, it cannot read relations nested further down the tree, only the first layer.
Thank you all for your input on the previous post!
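For reference, a sketch of a restructuring that json-server can expand: ?_expand=student only follows a scalar studentId foreign key on each record, so flattening listaexamene into one row per exam/student pair lets the lookup resolve (the row ids and the dataExamen field name here are illustrative):

{
  "listaexamene": [
    { "id": 1, "examId": 1, "dataExamen": "02/06/2022 12:00", "studentId": 3021 },
    { "id": 2, "examId": 1, "dataExamen": "02/06/2022 12:00", "studentId": 3051 },
    { "id": 3, "examId": 2, "dataExamen": "27/05/2022 10:00", "studentId": 3021 },
    { "id": 4, "examId": 2, "dataExamen": "27/05/2022 10:00", "studentId": 3051 }
  ]
}

GET /listaexamene?_expand=student should then return each row with the matching student object embedded, and _expand=exam works the same way for the exam side.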

How do I properly use deleteMany() with an $and query in the MongoDB shell?

I am trying to delete all documents in my collection infrastructure that have a type.primary property of "pipelines" and a type.secondary property of "oil".
I'm trying to use the following query:
db.infrastructure.deleteMany({$and: [{"properties.type.primary": "pipelines"}, {"properties.type.secondary": "oil"}] })
That returns: { acknowledged: true, deletedCount: 0 }
I expect my query to work because in MongoDB Compass, I can retrieve 182 documents that match the query {$and: [{"properties.type.primary": "pipelines"}, {"properties.type.secondary": "oil"}] }
My documents appear with the following structure (relevant section only):
properties": {
"optional": {
"description": ""
},
"original": {
"Opername": "ENBRIDGE",
"Pipename": "Lakehead",
"Shape_Leng": 604328.294581,
"Source": "EIA"
},
"required": {
"unit": null,
"viz_dim": null,
"years": []
},
"type": {
"primary": "pipelines",
"secondary": "oil"
}
...
My understanding is that I just need to pass a filter to deleteMany() and that $and expects an array of objects. For some reason the combination isn't working here.
I realized the simplest answer was the correct one -- I spelled my database name incorrectly.
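As an aside, once the database name is right, the $and wrapper is only required when combining multiple conditions on the same field; a plain filter document ANDs its clauses implicitly, so this equivalent sketch works too:

db.infrastructure.deleteMany({
  "properties.type.primary": "pipelines",   // distinct fields are ANDed implicitly
  "properties.type.secondary": "oil"
})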

Pentaho Kettle: How to dynamically fetch JSON file columns

Background: I work for a company that basically sells passes. Every order placed by a customer contains N passes.
Issue: I have JSON event-transaction files arriving daily in an S3 bucket from DocumentDB (MongoDB-compatible). Each JSON file records the relevant type of event (insert, modify, or delete) for a document key (an order, in my case). The example below illustrates an "insert" event that came through to the S3 bucket:
{
  "_id": {
    "_data": "11111111111111"
  },
  "operationType": "insert",
  "clusterTime": {
    "$timestamp": {
      "t": 11111111,
      "i": 1
    }
  },
  "ns": {
    "db": "abc",
    "coll": "abc"
  },
  "documentKey": {
    "_id": {
      "$uuid": "abcabcabcabcabcabc"
    }
  },
  "fullDocument": {
    "_id": {
      "$uuid": "abcabcabcabcabcabc"
    },
    "orderNumber": "1234567",
    "externalOrderId": "12345678",
    "orderDateTime": "2020-09-11T08:06:26Z[UTC]",
    "attraction": "abc",
    "entryDate": {
      "$date": "2020-09-13"
    },
    "entryTime": {
      "$date": "04000000"
    },
    "requestId": "abc",
    "ticketUrl": "abc",
    "tickets": [
      {
        "passId": "1111111",
        "externalTicketId": "1234567"
      },
      {
        "passId": "222222222",
        "externalTicketId": "122442492"
      }
    ],
    "_class": "abc"
  }
}
As seen above, every JSON file can contain N passes, and every pass is in turn associated with an external ticket id, which is a separate column. I want to use Pentaho Kettle to read these JSON files and load the data into the DW. I am aware of the Json input step and the Row Normalizer step, which could transpose the "PassID 1", "PassID 2", "PassID 3"..."PassID N" columns into one "Pass" column, and I would have to apply similar logic to the "External ticket id" column. The problem with that approach is that it is quite static: I have to "tell" Pentaho in the Json input step how many passes are coming, in advance. What if tomorrow an order arrives with 10 different passes? How can I do this dynamically to ensure the job will not break?
If you want a tabular output like
TicketUrl   Pass            ExternalTicketID
---------   -------------   ----------------
abc         PassID1Value1   ExTicketIDvalue1
abc         PassID1Value2   ExTicketIDvalue2
abc         PassID1Value3   ExTicketIDvalue3
and want the incoming values handled dynamically based on whatever the JSON input file contains, you can download this transformation: Updated Link
I found everything works dynamically in the Json input.
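A sketch of the underlying idea, assuming the stock Json input step (the field names on the left are illustrative): JSONPath wildcards emit one output row per array element, so the step never needs to know the number of passes in advance. Paths like these, against the document shown in the question, yield one (passId, externalTicketId) row per ticket:

Fields in the Json input step:
  ticketUrl           $.fullDocument.ticketUrl
  passId              $.fullDocument.tickets[*].passId
  externalTicketId    $.fullDocument.tickets[*].externalTicketId

With two wildcard paths over the same tickets array, the step pairs the values index by index, producing one row per ticket whether an order has 2 passes or 10.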

Logstash json field removal

We have a heavily nested JSON document containing server metrics. The document contains > 1000 fields, some of which are completely irrelevant to us for analytics purposes, so I would like to remove them before indexing the document in Elastic.
However, I am unable to find the correct filter to use, as the fields I want to remove have common names in multiple different objects within the document.
The source document looks like this (reduced in size for brevity):
[
  {
    "server": {
      "is_master": true,
      "name": "MYServer",
      "id": 2111
    },
    "metrics": {
      "Server": {
        "time": {
          "boundary": {},
          "type": "TEXT",
          "display_name": "Time",
          "value": "2018-11-01 14:57:52"
        }
      },
      "Mem_OldGen": {
        "used": {
          "boundary": {},
          "display_name": "Used(mb)",
          "value": 687
        },
        "committed": {
          "boundary": {},
          "display_name": "Committed(mb)",
          "value": 7116
        },
        "cpu_count": {
          "boundary": {},
          "display_name": "Cores",
          "value": 4
        }
      }
    }
  }
]
The data is loaded into Logstash using the http_poller input plugin and needs to be processed before being sent to Elastic for indexing.
I am trying to remove the fields that are not relevant for us to track for analytics purposes; these include the "display_name" and "boundary" fields in each JSON object under the different metrics.
I have tried using the mutate filter to remove the fields, but because they exist in so many different objects it would require too many hard-coded paths in the Logstash config.
I have also looked at the ruby filter, which seems promising as it can inspect the event, but I am unable to get it to crawl the entire JSON document or, more importantly, actually remove the fields.
Here is what I was trying as a test:
filter {
  split {
    field => "message"
  }
  ruby {
    code => '
      # note: k is a bare key name like "display_name", so event.remove(k)
      # targets a top-level field rather than the nested [metrics][...] path
      event.get("[metrics][Mem_OldGen][used]").to_hash.keys.each { |k|
        logger.info("field is:", k)
        if k.include?("display_name")
          event.remove(k)
        end
        if k.include?("boundary")
          event.remove(k)
        end
      }
    '
  }
}
It first splits the input at the message level to create one event per server, then tries to remove the fields from a specific metric.
Any help would be greatly appreciated.
If I get your point, you want to keep just the value key.
So, considering the response hash:
response = {
  "server": {
    "is_master": true,
    "name": "MYServer",
    "id": 2111
  },
  "metrics": {
    ...
You could do:
response[:metrics].transform_values { |hh| hh.transform_values { |h| h.delete_if { |k,v| k != :value } } }
#=> {:server=>{:is_master=>true, :name=>"MYServer", :id=>2111}, :metrics=>{:Server=>{:time=>{:value=>"2018-11-01 14:57:52"}}, :Mem_OldGen=>{:used=>{:value=>687}, :committed=>{:value=>7116}, :cpu_count=>{:value=>4}}}}

Elasticsearch mapping of nested structure

I'm looking for some pointers on mapping a somewhat dynamic structure for consumption by Elasticsearch.
The raw structure itself is JSON, but the problem is that a portion of the structure has variable keys, rather than the outer elements of the structure being static.
To provide a somewhat redacted example, my JSON looks like this:
"stat": {
"state": "valid",
"duration": 5,
},
"12345-abc": {
"content_length": 5,
"version": 2
}
"54321-xyz": {
"content_length": 2,
"version", 1
}
The first block is easy; Elasticsearch does a great job of mapping the "stat" portion of the structure, and if I were to dump a lot of that data into an index it would work as expected. The problem is that the next two blocks are essentially the same thing, but the raw JSON is formatted in such a way that a unique element has crept into the structure, and Elasticsearch wants to map that by default, generating a mapping that looks like this:
"stat": {
"properties": {
"state": {
"type": "string"
},
"duration": {
"type": "double"
}
}
},
"12345-abc": {
"properties": {
"content_length": {
"type": "double"
},
"version": {
"type": "double"
}
}
},
"54321-xyz": {
"properties": {
"content_length": {
"type": "double"
},
"version": {
"type": "double"
}
}
}
I'd like the ability to index all of the "content_length" data, but it's getting separated, and with some of the variable names being used, when I drop the data into Kibana I wind up with really long field names that are next to useless.
Is it possible to provide a generic tag for the structure? Or is this more trivially addressed at the JSON generation phase, with our developers hard-coding a generic structure name and adding an identifier field?
Any insight / help greatly appreciated.
Thanks!
If those keys like 12345-abc are generated, with potentially unbounded values, it will be hard (if not impossible) to run useful queries or aggregations over them. It's not entirely clear which use case you have for analyzing your data, but you should probably have a look at nested objects (https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html) and generate your input JSON according to what you want to query for. You will likely get better aggregation results if you put these additional objects into an array, with a dedicated field holding what is currently the key:
{
  "stat": ...,
  "things": [
    {
      "thingkey": "12345-abc",
      "content_length": 5,
      "version": 2
    },
    ...
  ]
}
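To pair with that shape, a minimal mapping sketch: the things and thingkey names come from the example above, the field types are assumptions, and the typeless syntax shown here is the current Elasticsearch form (older versions wrap this in a document type):

{
  "mappings": {
    "properties": {
      "things": {
        "type": "nested",
        "properties": {
          "thingkey": { "type": "keyword" },
          "content_length": { "type": "long" },
          "version": { "type": "long" }
        }
      }
    }
  }
}

With that in place, every document carries the same three field names, so content_length can be queried and aggregated across all entries with a single nested query instead of one field per generated key.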