Remove duplicates with date parameter - json

I need to find duplicate IP entries in a JSON array, compare them by the Date field, and remove the ones with the older dates.
For example:
[
  {
    "IP": "10.0.0.20",
    "Date": "2019-09-14T20:00:11.543-03:00"
  },
  {
    "IP": "10.0.0.10",
    "Date": "2019-09-17T15:45:16.943-03:00"
  },
  {
    "IP": "10.0.0.10",
    "Date": "2019-09-18T15:45:16.943-03:00"
  }
]
The output of the operation needs to look like this:
[
  {
    "IP": "10.0.0.20",
    "Date": "2019-09-14T20:00:11.543-03:00"
  },
  {
    "IP": "10.0.0.10",
    "Date": "2019-09-18T15:45:16.943-03:00"
  }
]

For simplicity's sake, I'll assume the order of the data doesn't matter.
First, if your data isn't already in Python, you can use json.load or json.loads to convert it into a Python object, following the straightforward type mappings.
Then your problem has three parts: comparing date strings as dates, finding the maximum element of a list by that date, and performing this process for each distinct IP address. For these purposes, you can use two of Python's built-in functions and two tools from the standard library.
Python's built-in max and sorted functions (as well as list.sort) support a (keyword-only) key argument, which uses a function to determine the value to compare by. For example, max(d1, d2, key=lambda x: x[0]) compares the arguments by their first element (like d1[0] < d2[0]) and returns whichever of d1 and d2 produced the larger key.
To allow that type of comparison between dates, you can use the datetime.datetime class. If your dates are all in the format specified by datetime.datetime.fromisoformat, you can use that function to turn your date strings into datetimes, which can then be compared to each other. Using that in a function that extracts the dates from the dictionaries gives you the key function you need.
import datetime

def extract_date(item):
    return datetime.datetime.fromisoformat(item['Date'])
Those functions allow you to choose the object from the list with the largest date, but not to keep separate values for different IP addresses. To do that, you can use itertools.groupby, which takes a key function and puts the elements of the input into separate outputs based on that key. However, there are two things you might need to watch out for with groupby:
It only groups elements that are next to each other. For example, if you give it [3, 3, 2, 2, 3], it will group two 3s, then two 2s, then one 3, rather than grouping all three 3s together.
It returns an iterator of (key, iterator) pairs, so you have to collect the results yourself. The best way to do that may depend on your application, but a basic approach is nested iteration:
from itertools import groupby

for key, values in groupby(data, key_function):
    for value in values:
        print(key, value)
With the functions I've mentioned above, it should be relatively straightforward to assemble an answer to your problem.
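Putting those pieces together, a full solution might look like the following sketch (raw_json here is a hypothetical stand-in for your JSON text):

import datetime
import json
from itertools import groupby

def extract_ip(item):
    return item['IP']

def extract_date(item):
    return datetime.datetime.fromisoformat(item['Date'])

data = json.loads(raw_json)  # raw_json stands in for your JSON text

# groupby only groups adjacent elements, so sort by IP first.
data.sort(key=extract_ip)

# For each distinct IP, keep only the entry with the latest date.
# (Per the assumption above, the output order doesn't matter.)
result = [max(group, key=extract_date)
          for _, group in groupby(data, key=extract_ip)]

print(json.dumps(result, indent=2))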

How can I query for multiple values after a wildcard?

I have a json object like so:
{
  _id: "12345",
  identifier: [
    {
      value: "1",
      system: "system1",
      text: "text!"
    },
    {
      value: "2",
      system: "system1"
    }
  ]
}
How can I use the XDevAPI SearchConditionStr to look for the specific combination of value + system in the identifier array? Something like this, but this doesn't seem to work...
collection.find("'${identifier.value}' IN identifier[*].value && '${identifier.system} IN identifier[*].system")
By using the IN operator, what happens underneath the covers is basically a call to JSON_CONTAINS().
So, if you call:
collection.find(":v IN identifier[*].value && :s IN identifier[*].system")
.bind('v', '1')
.bind('s', 'system1')
.execute()
What gets executed, in the end, is (simplified):
JSON_CONTAINS('["1", "2"]', '"1"') AND JSON_CONTAINS('["system1", "system1"]', '"system1"')
In this case, both those conditions are true, and the document will be returned.
The atomic unit is the document (not a slice of that document). So, in your case, regardless of the value of value and/or system, you are still looking for the same document (the one whose _id is '12345'). With such a statement, the document is returned if all the search values are part of it, and not returned if any one of them is not.
For instance, the following would not yield any results:
collection.find(":v IN identifier[*].value && :s IN identifier[*].system")
.bind('v', '1')
.bind('s', 'system2')
.execute()
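To make that aggregate behavior concrete, here is a plain-Python sketch (not the connector API, just the sample document from the question) of what the two IN conditions effectively check:

doc = {
    "_id": "12345",
    "identifier": [
        {"value": "1", "system": "system1", "text": "text!"},
        {"value": "2", "system": "system1"},
    ],
}

def matches(doc, v, s):
    values = [item.get("value") for item in doc["identifier"]]
    systems = [item.get("system") for item in doc["identifier"]]
    # Like JSON_CONTAINS(values, v) AND JSON_CONTAINS(systems, s):
    # v and s may be satisfied by *different* array items.
    return v in values and s in systems

print(matches(doc, "1", "system1"))  # True
print(matches(doc, "1", "system2"))  # False: 'system2' appears in no item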
EDIT: Potential workaround
I don't think using the CRUD API will allow you to perform this kind of "cherry-picking", but you can always use SQL. In that case, one strategy that comes to mind is to use JSON_SEARCH() to retrieve the array of paths (i.e. the array indexes) corresponding to each value in the scope of identifier[*].value and identifier[*].system, and then use JSON_OVERLAPS() to check that they are equal.
session.sql(`select * from collection WHERE json_overlaps(
    json_search(json_extract(doc, '$.identifier[*].value'), 'all', ?),
    json_search(json_extract(doc, '$.identifier[*].system'), 'all', ?))`)
  .bind('2', 'system1')
  .execute()
In this case, the result set will only include documents where the identifier array contains at least one JSON object element whose value is equal to '2' and whose system is equal to 'system1'. The filter is effectively applied to individual array items, not in aggregate as with a basic IN operation.
Disclaimer: I'm the lead developer of the MySQL X DevAPI Connector for Node.js

Using json_extract in sqlite to pull data from parent and child objects

I'm starting to explore the JSON1 library for SQLite and have so far been successful with the basic queries I've created. I'm now looking to create a more complicated query that pulls data from multiple levels.
Here's the example JSON object I'm starting with (and most of the data is very similar).
{
  "height": 140.0,
  "id": "cp",
  "label": {
    "bind": "cp_label"
  },
  "type": "color_picker",
  "user_data": {
    "my_property": 2
  },
  "uuid": "948cb959-74df-4af8-9e9c-c3cb53ac9915",
  "value": {
    "bind": "cp_color"
  },
  "width": 200.0
}
This JSON object is buried about seven levels deep in a JSON structure, and I pulled it from the larger JSON construct using an SQL statement like this:
SELECT value FROM forms, json_tree(forms.formJSON, '$.root')
WHERE type = 'object'
AND json_extract(value, '$.id') = #sControlID
-- In this example, #sControlID is a variable that represents the `id` value we're looking for, which is 'cp'
But what I really need to pull from this object are the following:
the value from key type ("color_picker" in this example)
the values from keys bind ("cp_color" and "cp_label" in this example)
the keys value and label (which have values of {"bind":"<string>"} in this example)
For that last item, the key name (value and label in this case) can be any number of keywords, but no matter the keyword, the value will be an object of the form {"bind":"<some_string>"}. Also, there could be multiple keys that have a bind object associated with them, and I'd need to return all of them.
For the first two items, the keywords will always be type and bind.
With the json example above, I'd ideally like to retrieve two rows:
type         | key   | value
-------------+-------+----------
color_picker | value | cp_color
color_picker | label | cp_label
When I use json_extract methods, I end up retrieving the object {"bind":"cp_color"} from the json_tree table, but I also need to retrieve the data from the parent object. I feel like I need to do some kind of union, but my attempts have so far been unsuccessful. Any ideas here?
Note: if the {"bind":"<string>"} object doesn't exist as a child of the parent object, I don't want any rows returned.
Well, I was on the right track and eventually figured it out. I created a separate query for each of the items I was looking for, then INNER JOINed the json_tree tables from each of the queries so that all the required fields were available. Then I json_extracted the required data from each of the JSON fields I needed. In the end, it gave me exactly what I was looking for, though I'm sure it could be written more efficiently.
For anyone interested, this is what the final query ended up looking like:
SELECT IFNULL(json_extract(parent.value, '$.type'), '_window_'), child.key, json_extract(child.value, '$.bind')
FROM (
    SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$')
    WHERE type = 'object'
    AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID
) parent
INNER JOIN (
    SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$')
    WHERE type = 'object'
    AND json_extract(value, '$.bind') IS NOT NULL
    AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID
) child
ON child.parent = parent.id;
If you have any tips on reducing its complexity, feel free to comment!
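If you want to experiment with this outside an application, here is a cut-down sketch using Python's sqlite3 module (assuming a SQLite build that includes the JSON1 functions; the table holds a single simplified form, so the #sWindowID filter is omitted):

import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nui_forms (formJSON TEXT)")

# A single form with one control, trimmed down from the question.
form = {
    "id": "win1",
    "root": {
        "cp": {
            "id": "cp",
            "type": "color_picker",
            "label": {"bind": "cp_label"},
            "value": {"bind": "cp_color"},
        }
    },
}
conn.execute("INSERT INTO nui_forms (formJSON) VALUES (?)", (json.dumps(form),))

rows = conn.execute("""
    SELECT json_extract(parent.value, '$.type'), child.key,
           json_extract(child.value, '$.bind')
    FROM (SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$')
          WHERE type = 'object') parent
    INNER JOIN (SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$')
                WHERE type = 'object'
                AND json_extract(value, '$.bind') IS NOT NULL) child
    ON child.parent = parent.id
""").fetchall()

print(rows)  # [('color_picker', 'label', 'cp_label'), ('color_picker', 'value', 'cp_color')]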

Is it possible to get a sorted list of attribute values from a JSON array using JSONPath

Given JSON like:
[
  {
    "Serial no": 994
  },
  {
    "Serial no": 456
  }
]
I know this query will give me an array of all Serial no values, in the order they are in the JSON: $..['Serial no']
I'm not sure exactly what sorting capabilities JSONPath has, but I think you can use / and \ to sort - how would they be used to modify my query string in this case? I'm only interested in doing this in pure JSONPath, not JS or post-query sorting - that's easy; I just want to know if I can avoid it.
This is a source I found suggesting sorting is supported but it might be product-specific?
I'm using http://www.jsonquerytool.com/ to test this

Filtering on JSON internal keys stored in PostgreSQL table

I have a report in JSON format stored in a field in a PostgreSQL database table.
Say the (simplified) table format is:
   Column   |           Type
------------+----------------------------
 id         | integer
 element_id | character varying(256)
 report     | json
and the structure of the data in the reports is like this
{
  "section1": {
    "test1": {
      "outcome": "nominal",
      "results": {
        "value1": 34.0,
        "value2": 56.0
      }
    },
    "test2": {
      "outcome": "warning",
      "results": {
        "avg": 4.5,
        "std": 21.0
      }
    }
  },
  ...
  "sectionN": {
    ...
  }
}
That is, there are N keys at the first level (the sections), each of them being an object with a set of keys (the tests), each with an outcome and a variable set of results in the form of (key, value) pairs.
I need to do filtering based on internal JSON keys. More specifically, in this example, I want to know whether it is possible, using SQL alone, to obtain the elements that have, for example, the std value in the results section above a certain threshold, say 10. I know that std is in test2, but I do not know a priori in which section. With this filter (test2.std > 10), for example, the record with the sample data shown above should appear, since the std variable in test2 has a value of 21 (> 10).
Another, simpler, filter could be to request all the records for which the test2.outcome is not nominal.
One way is jsonb_each, like:
select section.key
, test.key
from t1
cross join
jsonb_each(t1.col1) section
cross join
jsonb_each(section.value) test
where (test.value->'results'->>'std')::numeric > 10
Example at SQL Fiddle.
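For checking the logic outside the database, a rough pure-Python equivalent of that nested jsonb_each expansion might look like this (report being the parsed JSON from one row):

def failing_tests(report, threshold=10):
    for section_key, section in report.items():    # like jsonb_each(report)
        for test_key, test in section.items():     # like jsonb_each(section.value)
            std = test.get("results", {}).get("std")
            if std is not None and std > threshold:
                yield section_key, test_key

report = {
    "section1": {
        "test1": {"outcome": "nominal", "results": {"value1": 34.0, "value2": 56.0}},
        "test2": {"outcome": "warning", "results": {"avg": 4.5, "std": 21.0}},
    }
}

print(list(failing_tests(report)))  # [('section1', 'test2')]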

How to insert data from multiple tables into MongoDB using JSON

I am trying to learn MongoDB. Suppose there are two tables and they are related. For example, like this:
1st table has
First name- Fred, last name- Zhang, age- 20, id- s1234
2nd table has
id- s1234, course- COSC2406, semester- 1
id- s1234, course- COSC1127, semester- 1
id- s1234, course- COSC2110, semester- 1
How do I insert this data into MongoDB? I wrote it like this, but I'm not sure whether it's correct:
db.users.insert({
  given_name: 'Fred',
  family_name: 'Zhang',
  Age: 20,
  student_number: 's1234',
  Course: ['COSC2406', 'COSC1127', 'COSC2110'],
  Semester: 1
});
Thank you in advance
This would be assuming that what you want to model treats the "student_number" and the "Semester" as what is basically a unique identifier for the entries. But there is a way to do this without accumulating the array contents in code.
You can make use of the upsert functionality in the .update() method, with the help of a few other operators in the statement.
I am going to assume you are doing this inside a loop of sorts, so every value on the right-hand side is actually a variable:
db.users.update(
  {
    "student_number": student_number,
    "Semester": semester
  },
  {
    "$setOnInsert": {
      "given_name": given_name,
      "family_name": family_name,
      "Age": age
    },
    "$addToSet": { "courses": course }
  },
  { "upsert": true }
)
What this does in an "upsert" operation is first look for a document in your collection that matches the given query criteria - in this case, a "student_number" with the current "Semester" value.
When that match is found, the document is merely "updated". So what is being done here is using the $addToSet operator in order to "update" only unique values into the "courses" array element. It would seem to make sense for the courses to be unique, but if that is not your case then you can simply use the $push operator instead. So that is the operation you want to happen every time, whether the document was "matched" or not.
In the case where no "matching" document is found, a new document will then be inserted into the collection. This is where the $setOnInsert operator comes in.
So the point of that section is that it will only be called when a new document is created as there is no need to update those fields with the same information every time. In addition to this, the fields you specified in the query criteria have explicit values, so the behavior of the "upsert" is to automatically create those fields with those values in the newly created document.
After a new document is created, then the next "upsert" statement that uses the same criteria will of course only "update" the now existing document, and as such only your new course information would be added.
Overall, working like this allows you to "pre-join" the two tables from your source with an appropriate query. Then you just loop over the results, letting MongoDB do the accumulation work instead of writing code to group the correct entries together yourself.
Of course you can always just write the code to do this yourself and it would result in fewer "trips" to the database in order to insert your already accumulated records if that would suit your needs.
As a final note, though it does require some additional complexity, you can get better performance out of the operation shown by using the newly introduced "batch updates" functionality. For this, your MongoDB server version will need to be 2.6 or higher. It is one way of keeping the logic simple while reducing the number of actual "over the wire" writes to the database.
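If you are driving this from Python rather than the mongo shell, the same loop might look like the following sketch with pymongo (the database/collection names and the rows list are hypothetical stand-ins for your pre-joined source data):

from pymongo import MongoClient

client = MongoClient()
users = client.school.users  # hypothetical database and collection names

# One row per (student, course) pair, as produced by the "pre-join" query.
rows = [
    {"given_name": "Fred", "family_name": "Zhang", "age": 20,
     "student_number": "s1234", "course": "COSC2406", "semester": 1},
    {"given_name": "Fred", "family_name": "Zhang", "age": 20,
     "student_number": "s1234", "course": "COSC1127", "semester": 1},
]

for row in rows:
    users.update_one(
        # Match on the unique (student_number, Semester) pair.
        {"student_number": row["student_number"], "Semester": row["semester"]},
        {
            # Only set these fields when the document is first created.
            "$setOnInsert": {
                "given_name": row["given_name"],
                "family_name": row["family_name"],
                "Age": row["age"],
            },
            # Accumulate unique courses on every pass.
            "$addToSet": {"courses": row["course"]},
        },
        upsert=True,
    )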
You can either have two separate collections - one with student details and the other with courses, linked by "id".
Alternatively, you can have a single document with the courses embedded as an array of subdocuments, as below:
{
  "FirstName": "Fred",
  "LastName": "Zhang",
  "age": 20,
  "id": "s1234",
  "Courses": [
    {
      "courseId": "COSC2406",
      "semester": 1
    },
    {
      "courseId": "COSC1127",
      "semester": 1
    },
    {
      "courseId": "COSC2110",
      "semester": 1
    },
    {
      "courseId": "COSC2110",
      "semester": 2
    }
  ]
}
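Inserting that embedded form is then a single write; a minimal pymongo sketch (database and collection names are hypothetical):

from pymongo import MongoClient

client = MongoClient()
# Insert the whole student, courses embedded, as one document.
client.school.students.insert_one({
    "FirstName": "Fred",
    "LastName": "Zhang",
    "age": 20,
    "id": "s1234",
    "Courses": [
        {"courseId": "COSC2406", "semester": 1},
        {"courseId": "COSC1127", "semester": 1},
        {"courseId": "COSC2110", "semester": 1},
    ],
})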