Intersect a FeatureCollection in MongoDB - json

I have a GeoJSON file containing the states of Austria, and I want a query that returns the states that intersect my polygon.
This is my query:
db.GeoAustria.find({
    'features.geometry': {
        $geoIntersects: {
            $geometry: {
                type: "Polygon",
                coordinates: [
                    [
                        [16.21685028076172, 48.007381433478855],
                        [16.24225616455078, 47.98716432210271],
                        [16.256675720214844, 48.00669234420252],
                        [16.21685028076172, 48.007381433478855]
                    ]
                ]
            }
        }
    }
})
But it gives me all the features, including those that don't overlap the polygon...
Where is my mistake in this query?

Basic array match misunderstanding here. The input set is a single doc with 95 polygons in an array in a single FeatureCollection object. When you do a find() on such things, any individual geo that is an intersect will cause the entire doc to be returned as a match. This is exactly the same as:
> db.foo.insert({x:["A","B","C"]})
WriteResult({ "nInserted" : 1 })
> db.foo.find({x:"A"});
{ "_id" : ObjectId("5fb1845b08c09fb8dfe8d1c1"), "x" : [ "A", "B", "C" ] }
The whole doc is returned, not just element "A".
Let's assume that you might have more than one big doc in your collection. This pipeline yields the single target geometry for Baden (I tested it on your input set):
var Xcoords = [
    [
        [16.21685028076172, 48.007381433478855],
        [16.24225616455078, 47.98716432210271],
        [16.256675720214844, 48.00669234420252],
        [16.21685028076172, 48.007381433478855]
    ]
];
var targ = {type: "Polygon", coordinates: Xcoords};
db.geo1.aggregate([
    // First, eliminate any docs where the geometry array has zero intersects. In this
    // context, features.geometry means "for each element of array features get the
    // geometry field from the object there", almost like saying "features.?.geometry"
    {$match: {"features.geometry": {$geoIntersects: {$geometry: targ}} }}
    // Next, break up any passing docs of 95 geoms into 95 docs of 1 geom...
    ,{$unwind: "$features"}
    // .. and run THE SAME $match as before to match just the one we are looking for.
    // In this context, the array is gone and "features.geometry" means get JUST the
    // object named geometry:
    ,{$match: {"features.geometry": {$geoIntersects: {$geometry: targ}} }}
]);
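The match / unwind / match sequence can also be mimicked outside MongoDB. The following Python sketch only illustrates the matching semantics and is not how MongoDB implements them; find_docs and unwind are helper names made up for this example, and a simple array of letters stands in for the geometries.

```python
# Illustration only: plain-Python stand-ins for array matching and $unwind.
# find_docs and unwind are hypothetical helper names, not MongoDB APIs.

def find_docs(docs, field, value):
    """Return whole documents whose array `field` contains `value`."""
    return [doc for doc in docs if value in doc.get(field, [])]


def unwind(docs, field):
    """Yield one copy of each document per array element, like $unwind."""
    for doc in docs:
        for element in doc.get(field, []):
            yield {**doc, field: element}


docs = [{"_id": 1, "x": ["A", "B", "C"]}, {"_id": 2, "x": ["D"]}]

# Matching on the array returns the entire document, not just "A".
whole = find_docs(docs, "x", "A")

# Unwinding first, then filtering again, isolates the single matching element.
single = [d for d in unwind(whole, "x") if d["x"] == "A"]

print(whole)
print(single)
```

The first filter plays the role of the initial $match (cheaply discarding documents with no hit at all), and the second one does the per-element selection.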
Beyond this, I would recommend breaking up that FeatureCollection into something that is both indexable (a FeatureCollection is NOT indexable in MongoDB) and easier to deal with. For example, this little script, run against your single-doc/many-polys design, will convert it into 95 docs with extra info:
db.geo2.drop();
mainDoc = db.geo1.findOne(); // the one Austria doc
mainDoc['features'].forEach(function(oneFeature) {
    var qq = {
        country: "Austria",
        crs: mainDoc['crs'],
        properties: oneFeature['properties'],
        geometry: oneFeature['geometry']
    };
    db.geo2.insert(qq);
});
db.geo2.aggregate([
    {$match: {"geometry": {$geoIntersects: {$geometry: targ}} }}
]);
// yields same single doc output (Baden)
This allows ease of matching and filtering. For more on FeatureCollection vs. GeometryCollection see https://www.moschetti.org/rants/hurricane.html.

How to access data object from a nested array

I am using ObservableHQ and the Vega-Lite API to do data visualizations and have run into a problem I can't figure out.
The problem is that I would like to access a data object from the following data structure:
Array
Array
Array
Item
Item
Array
As you can see in my bad drawing, I have a multidimensional array and would like to access a specific array from the main array. How can I do that using Vegalite API?
vl.markCircle({
thickness: 4,
bandSize: 2
})
.data(diff[0])
.encode(
vl.x().fieldQ("mins").scale({ domain: [-60, 60] }),
vl.color().fieldN('type').scale({ range: ['#636363', '#f03b20'] }),
)
.config({bandSize: 10})
.width(600)
.height(40)
.render()
Thank you,
Based on your comments, I’m assuming that you’re trying to automatically chart all of the nested arrays (separately), not just one of them. And based on your chart code, I’m assuming that your data looks sorta like this:
const diff = [
[
{ mins: 38, type: "Type B" },
{ mins: 30, type: "Type B" },
{ mins: 28, type: "Type A" },
…
],
[
{ mins: 20, type: "Type B" },
{ mins: 17, type: "Type A" },
{ mins: 19, type: "Type A" },
…
],
…
];
First, flatten all the arrays into one big array, and record which array each came from with a new array property on the item object, with flatMap. If each child array represents, say, a different city, or a different year, or a different person collecting the data, you could replace array: i with something more meaningful about the data.
const flat = diff.flatMap((arr, i) => arr.map((d) => ({ ...d, array: i })));
Then use Vega-Lite’s “faceting” (documentation, Observable tutorial and examples) to split the chart into sections, one for each value of array: i, with shared scales. This just adds one line to your example:
vl
.markCircle({
thickness: 4,
bandSize: 2
})
.data(flat)
.encode(
vl.row().fieldN("array"), // this line is new
vl
.x()
.fieldQ("mins")
.scale({ domain: [-60, 60] }),
vl
.color()
.fieldN("type")
.scale({ range: ["#636363", "#f03b20"] })
)
.config({ bandSize: 10 })
.width(600)
.height(40)
.render()
Here’s an Observable notebook with examples of this working. As I show there at the bottom, you can also map over your array to make a totally separate chart for each nested array.
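For readers more at home in Python, the flattening step can be sketched as a single comprehension. This is only an analogue of the flatMap call above; the sample rows are a shortened, illustrative version of the assumed data.

```python
# Python analogue of the flatMap step; the sample rows are illustrative.
diff = [
    [{"mins": 38, "type": "Type B"}, {"mins": 30, "type": "Type B"}],
    [{"mins": 20, "type": "Type B"}, {"mins": 17, "type": "Type A"}],
]

# Tag every item with the index of the nested array it came from,
# then concatenate everything into one flat list.
flat = [{**d, "array": i} for i, arr in enumerate(diff) for d in arr]

for row in flat:
    print(row)
```

As with the JavaScript version, the "array" key could hold something more meaningful than the index if each nested array has a natural label.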

Unable to scrape the names from json response

There are around 966 names in the webpage with JSON content, but with my script I'm getting only 10 of them. I'm very new to JSON, which is why I can't figure out the mistake I'm making. How can I get all the names? I'm trying with the code below:
import requests
url = 'https://www.zebra.com/bin/zebra/partnersearch?inMiles=true&start=0&numRows=10&latitude=39.5500507&longitude=-105.7820674&sortOrder=asc&sortBy=distance&country=US&searchRadius=5000'
response = requests.get(url)
data = response.json()
for item in data:
    print(item['name'])
Partial json content from that page is:
[{"id":"001i000001XR9dqAAD","website":"www.resortinternet.com","type":"partner","phoneNumber":"+1.970.262.3515","name":"Resortnet, LLC","logoPresent":"No","logoExtension":"","des":"Technology provider for destination resorts","translatedName":"ResortInternet","dbaName":"ResortInternet","PR":"NA","AN":"6244306","accountType":["Reseller"],"contentType":"parent","countries":["US"],"HSA":[],"countriesAndHsa":["US"],"premierSolutionPartner":false,"premierBusinessPartner":false,"solutionPartner":true,"businessPartner":false,"advancedSpecialistBarcodePrinterSupplies":false,"advancedSpecialistCardPrinters":false,"advancedSpecialistSupplies":false,"advancedSpecialistWirelessNetworks":false,"advancedSpecialistPrintEngines":false,"advancedSpecialistRfid":false,"specialistBarcodePrinterSupplies":false,"specialistCardPrinters":false,"specialistSupplies":false,"specialistWirelessNetworks":false,"specialistPrintEngines":false,"specialistRfid":false,"advancedRepairSpecialistLabelPrinter":false,"advancedRepairSpecialistCardPrinter":false,"advancedRepairSpecialistMobilePrinter":false,"advancedRepairSpecialistPrintEngine":false,"repairSpecialistLabelPrinter":false,"repairSpecialistCardPrinter":false,"repairSpecialistMobilePrinter":false,"repairSpecialistPrintEngine":false,"registeredResellerNoSpecialization":false,"pmiWraps":[{"programName":"Solution Partner","category":"Reseller","id":"001i000001XR9dqAAD_2","type":"pmiWrap","contentType":"child"}],"partnerLocations":[{"locationType":"Headquarters","addressLine1":"117 S 6th Ave.,","addressLine2":"PO Box 2718","city":"Frisco","state":"Colorado","zipCode":"80443","country":"United States","phone":"(970) 262-3515","fax":"(970) 668-9431","latlon":"39.5754576,-106.0952117","distance":16.8,"countryCode":"US","HSA":[],"id":"001i000001XR9dqAAD_0","type":"partnerLocation","contentType":"child"},{"locationType":"Primary Location","addressLine1":"117 S 6th Ave.,","city":"Frisco","state":"Colorado","zipCode":"80443","country":"United 
States","phone":"+1.970.262.3515","latlon":"39.5754576,-106.0952117","distance":16.8,"countryCode":"US","HSA":[],"id":"001i000001XR9dqAAD_1","type":"partnerLocation","contentType":"child"},{"locationType":"Address","addressLine1":"RESORTINTERNET\r2718:FRISCO:80443\r117 S 6TH AVERM UNIT 2","city":"Frisco","state":"Colorado","stateCode":"CO","zipCode":"80443","country":"United States","latlon":"39.5744309,-106.0975203","distance":16.9,"countryCode":"US","HSA":[],"id":"001i000001XR9dqAAD_100","type":"partnerLocation","contentType":"child"}],"verticalHierarchyWraps":[],"primaryLocation":{"locationType":"Headquarters","addressLine1":"117 S 6th Ave.,","addressLine2":"PO Box 2718","city":"Frisco","state":"Colorado","zipCode":"80443","country":"United States","phone":"(970) 262-3515","fax":"(970) 668-9431","latlon":"39.5754576,-106.0952117","distance":16.8,"countryCode":"US","HSA":
I don't think there's a problem with your code. If you check len(data) it returns 10, which means that the list of results contains only 10 (large) JSON objects.
Is there some reason you're expecting more than 10, or are you trying to access the name property of something inside each of these larger objects?
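One thing worth checking is the numRows=10 parameter in the request URL, which may be what caps the response at 10 entries. The sketch below pages through results under the assumption (not verified against the real endpoint) that start is a zero-based offset and numRows is a page size. fetch_page is a hypothetical callable standing in for the HTTP request, and the fake data source exists only so the example runs offline.

```python
# Sketch under assumptions: `start` is an offset, `numRows` a page size.
# fetch_page is a hypothetical stand-in for requests.get(...).json().

def collect_names(fetch_page, page_size=100):
    """Request pages until a short or empty page signals the end."""
    names = []
    start = 0
    while True:
        page = fetch_page(start=start, numRows=page_size)
        names.extend(item["name"] for item in page)
        if len(page) < page_size:
            return names
        start += page_size


# Fake data source so the example runs without network access.
FAKE = [{"name": f"Partner {n}"} for n in range(25)]


def fake_fetch(start, numRows):
    return FAKE[start:start + numRows]


names = collect_names(fake_fetch, page_size=10)
print(len(names))
```

If the endpoint simply accepts a larger numRows, raising that value in the original URL may be all that is needed.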
Your JSON is an array of objects. In Python, looping over the parsed list already gives you each object in the loop variable, so you can read properties such as name straight from it (using the loop variable as an index, as in data[item], would raise a TypeError).
Like this:
for item in data:
    print(item['name'])
In JavaScript, by contrast, for...in iterates over the indices of the array, so there you look each item up by its index first:
<script>
var data = [
{
"id":"001i000001XR9dqAAD",
"website":"www.resortinternet.com",
"type":"partner",
"phoneNumber":"+1.970.262.3515",
"name":"Resortnet, LLC",
"logoPresent":"No",
"logoExtension":"",
"des":"Technology provider for destination resorts",
"translatedName":"ResortInternet",
"dbaName":"ResortInternet",
"PR":"NA",
"AN":"6244306",
"accountType":[
"Reseller"
],
"contentType":"parent",
"countries":[
"US"
],
"HSA":[
],
"countriesAndHsa":[
"US"
],
"premierSolutionPartner":false,
"premierBusinessPartner":false,
"solutionPartner":true,
"businessPartner":false,
"advancedSpecialistBarcodePrinterSupplies":false,
"advancedSpecialistCardPrinters":false,
"advancedSpecialistSupplies":false,
"advancedSpecialistWirelessNetworks":false,
"advancedSpecialistPrintEngines":false,
"advancedSpecialistRfid":false,
"specialistBarcodePrinterSupplies":false,
"specialistCardPrinters":false,
"specialistSupplies":false,
"specialistWirelessNetworks":false,
"specialistPrintEngines":false,
"specialistRfid":false,
"advancedRepairSpecialistLabelPrinter":false,
"advancedRepairSpecialistCardPrinter":false,
"advancedRepairSpecialistMobilePrinter":false,
"advancedRepairSpecialistPrintEngine":false,
"repairSpecialistLabelPrinter":false,
"repairSpecialistCardPrinter":false,
"repairSpecialistMobilePrinter":false,
"repairSpecialistPrintEngine":false,
"registeredResellerNoSpecialization":false,
"pmiWraps":[
{
"programName":"Solution Partner",
"category":"Reseller",
"id":"001i000001XR9dqAAD_2",
"type":"pmiWrap",
"contentType":"child"
}
],
"partnerLocations":[
{
"locationType":"Headquarters",
"addressLine1":"117 S 6th Ave.,",
"addressLine2":"PO Box 2718",
"city":"Frisco",
"state":"Colorado",
"zipCode":"80443",
"country":"United States",
"phone":"(970) 262-3515",
"fax":"(970) 668-9431",
"latlon":"39.5754576,-106.0952117",
"distance":16.8,
"countryCode":"US",
"HSA":[
],
"id":"001i000001XR9dqAAD_0",
"type":"partnerLocation",
"contentType":"child"
},
{
"locationType":"Primary Location",
"addressLine1":"117 S 6th Ave.,",
"city":"Frisco",
"state":"Colorado",
"zipCode":"80443",
"country":"United States",
"phone":"+1.970.262.3515",
"latlon":"39.5754576,-106.0952117",
"distance":16.8,
"countryCode":"US",
"HSA":[
],
"id":"001i000001XR9dqAAD_1",
"type":"partnerLocation",
"contentType":"child"
},
{
"locationType":"Address",
"addressLine1":"RESORTINTERNET\r2718:FRISCO:80443\r117 S 6TH AVERM UNIT 2",
"city":"Frisco",
"state":"Colorado",
"stateCode":"CO",
"zipCode":"80443",
"country":"United States",
"latlon":"39.5744309,-106.0975203",
"distance":16.9,
"countryCode":"US",
"HSA":[
],
"id":"001i000001XR9dqAAD_100",
"type":"partnerLocation",
"contentType":"child"
}
],
"verticalHierarchyWraps":[
],
"primaryLocation":{
"locationType":"Headquarters",
"addressLine1":"117 S 6th Ave.,",
"addressLine2":"PO Box 2718",
"city":"Frisco",
"state":"Colorado",
"zipCode":"80443",
"country":"United States",
"phone":"(970) 262-3515",
"fax":"(970) 668-9431",
"latlon":"39.5754576,-106.0952117",
"distance":16.8,
"countryCode":"US"
}
}
];
for (var index in data)
{
var item=data[index];
console.log(item["name"]);
console.log(item);
}
</script>

What is the regex to find part of JSON

I have this JSON part:
{
"destination_addresses":[
"8000 AA Zwolle, Nederland"
],
"origin_addresses":[
"8100 AA Heino, Nederland"
],
"rows":[
{
"elements":[
{
"distance":{
"text":"14,6 km",
"value":14555
},
"duration":{
"text":"17 min.",
"value":1022
},
"status":"OK"
}
]
}
],
"status":"OK"
}
I want a regex that finds the values 14555 and 1022, not by searching for these numbers (because the numbers always change) but by searching by the node (value in distance and value in duration).
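Any ideas?
"distance"\s*:\s*\{[^}]*"value"\s*:\s*([0-9]+)
Do you mean something like this, if you really need a RegExp? The \s* parts allow for optional whitespace and line breaks around the colons, and [^}]* keeps the match inside the distance object. Replace "distance" with "duration" to capture the other value.
I don't know which language you use, but in most languages you can simply parse the JSON into an object and then read the values directly.
For example, in JavaScript, var obj = JSON.parse(str); should do what you need.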
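As an illustration of both routes in Python, here is a minimal sketch using only the standard library. The raw string is a trimmed copy of the JSON from the question, and value_of is a helper name invented for this example.

```python
import json
import re

raw = """
{
  "rows": [
    {"elements": [
      {"distance": {"text": "14,6 km", "value": 14555},
       "duration": {"text": "17 min.", "value": 1022},
       "status": "OK"}
    ]}
  ],
  "status": "OK"
}
"""

# Preferred route: parse the JSON and walk the structure.
element = json.loads(raw)["rows"][0]["elements"][0]
distance = element["distance"]["value"]
duration = element["duration"]["value"]


# Regex fallback: tolerant of whitespace; [^{}]*? stays inside the node's braces.
def value_of(node, text):
    pattern = r'"%s"\s*:\s*\{[^{}]*?"value"\s*:\s*(\d+)' % re.escape(node)
    match = re.search(pattern, text)
    return int(match.group(1)) if match else None


print(distance, duration)
print(value_of("distance", raw), value_of("duration", raw))
```

Parsing is generally the more robust choice, since the regex breaks as soon as the node contains a nested object before its value field.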

How to display 'c' array values alone from the given JSON document below using MongoDB?

I am a newbie to MongoDB. I am experimenting with the various ways of extracting fields from a document inside a collection.
With the JSON document below, I am finding it difficult to extract just the part I need:
{
"_id":1,
"dependencies":{
"a":[
"hello",
"hi"
],
"b":[
"Hmmm"
],
"c":[
"Vanilla",
"Strawberry",
"Pista"
],
"d":[
"Carrot",
"Cauliflower",
"Potato",
"Cabbage"
]
},
"productid":"25",
"date":"Thu Jul 30 11:36:49 PDT 2015"
}
I need to display the following output:
c:[
"Vanilla",
"Strawberry",
"Pista"
]
Can anyone please help me in solving it?
MongoDB's aggregation framework comes to the rescue to get the result you are looking for:
$project: Passes along the documents with only the specified fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
db.collection.aggregate( [
{ $project :
{ c: "$dependencies.c", _id : 0 }
}
]).pretty();
To get the output you require, we only need to project (display) the field "dependencies.c", so we create a new field "c" and assign the value of "dependencies.c" to it.
Also, by default the "_id" field is displayed along with the result. Since you don't need it, we suppress it by setting "_id" to 0 (or false), so that it does not appear in the output.
The above query gives the following result:
"c" : [
"Vanilla",
"Strawberry",
"Pista"
]
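To make the effect of the $project stage concrete, here is a plain-Python illustration of what it does to one document. It does not talk to MongoDB at all; get_path is a hypothetical helper written for this sketch, and the document is a shortened copy of the one in the question.

```python
# Plain-Python illustration of what the $project stage does to one document.
# get_path is a hypothetical helper, not part of any MongoDB driver.

def get_path(doc, dotted):
    """Follow a dotted path such as 'dependencies.c' through nested dicts."""
    for key in dotted.split("."):
        doc = doc[key]
    return doc


doc = {
    "_id": 1,
    "dependencies": {
        "a": ["hello", "hi"],
        "b": ["Hmmm"],
        "c": ["Vanilla", "Strawberry", "Pista"],
    },
    "productid": "25",
}

# Equivalent of {$project: {c: "$dependencies.c", _id: 0}}:
# build a new document holding only the computed field "c".
projected = {"c": get_path(doc, "dependencies.c")}
print(projected)
```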

How to get data in specific format using scala?

I have a raw json in following format-
"luns": [
{
"numReadBlocks": 15444876,
"numWriteBlocks": 13530714,
"blockSizeInBytes": 512,
"writeIops": 495344,
"readIops": 312702,
"serialNumber": "aaaaaaa",
"uuid": "id",
"shareState": "none",
"usedBytes": 6721716224,
"totalSizeBytes": 16106127360,
"path": "/vol/lun_23052014_025830_vol/lun_23052014_025830"
},
{
"numReadBlocks": 15444876,
"numWriteBlocks": 13530714,
"blockSizeInBytes": 512,
"writeIops": 495344,
"readIops": 312702,
"serialNumber": "aaaaaaa",
"uuid": "id",
"shareState": "none",
"usedBytes": 6721716224,
"totalSizeBytes": 16106127360,
"path": "/vol/lun_23052014_025830_vol/lun_23052014_025830"
}]
The luns field may contain a list of entries.
I want to process above json and form output as following-
"topStorageLuns": [
{
"name": "Free (in GB)",
"data": [7.79,7.79]
},
{
"name": "Used (in GB)",
"data": [7.21,7.21]
}]
I tried following in order to get output-
val storageLuns = myRawJson
val topStorageLuns = storageLuns.map { storageLun =>
val totalLunsSizeOnStorageDevice = storageLun.luns.foldLeft(0.0) {
case (totalBytesOnDevice, lun) =>
totalBytesOnDevice + lun.usedBytes.getOrElse(0.0).toString.toLong
}
val totalAvailableLunsOnStorageDevice = storageLun.luns.foldLeft(0.0) {
case (totalBytesOnDevice, lun) =>
totalBytesOnDevice + lun.usedBytes.getOrElse(0.0).toString.toLong
}
Json.obj("name" -> storageLun.hostId, "data" -> "%.2f".format(totalLunsSizeOnStorageDevice / (1024 * 1024 * 1024)).toDouble)
}
Can anybody help me to get desired output please???
The key lesson I want to impart is that your algorithm should reflect the shape of the output you want. Work backward from the result you want to build the algorithm.
It looks to me like you want to create an array of length 2, where each entry has a corresponding algorithm (space used, space free). Within each of these elements, you want a nested array with an element for each item in your input array, calculated using the algorithm from the outer array. Here's how I would approach the problem:
1) Define your algorithms
val dfAlgorithm: (Seq[(String, JsValue)] => Double) = _.foldLeft(0.0) { (acc, item) =>
/* whatever logic you need to do */
}
val duAlgorithm: (Seq[(String, JsValue)] => Double) = _.foldLeft(0.0) { (acc, item) =>
/* whatever logic you need to do */
}
2) Create a data structure to map over to build your final output
val stats = Seq("Free (in GB)" -> dfAlgorithm, "Used (in GB)" -> duAlgorithm)
3) Map over your input data within your mapping over your algorithms (the logic here reflects the shape of the result you want)
stats.map { case (name, algorithm) =>
  Json.obj("name" -> name, "data" -> storageLuns.map { storageLun => algorithm(storageLun) })
}
This isn't going to be a turnkey solution, since I don't know how your free/used algorithms are supposed to work, but this overall scheme should get you there.
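The same shape-first scheme can be sketched in Python. The two formulas here are assumptions made for illustration (free = totalSizeBytes - usedBytes, and 1 GB = 1024**3 bytes); the question does not say how free and used space are meant to be derived, so treat used_gb and free_gb as placeholders.

```python
# Sketch only: assumes free = totalSizeBytes - usedBytes and GB = 1024**3 bytes.
GB = 1024 ** 3


def used_gb(lun):
    return round(lun["usedBytes"] / GB, 2)


def free_gb(lun):
    return round((lun["totalSizeBytes"] - lun["usedBytes"]) / GB, 2)


luns = [
    {"usedBytes": 6721716224, "totalSizeBytes": 16106127360},
    {"usedBytes": 6721716224, "totalSizeBytes": 16106127360},
]

# The outer loop follows the output shape (one entry per statistic);
# the inner loop walks the input LUNs.
stats = [("Free (in GB)", free_gb), ("Used (in GB)", used_gb)]
top_storage_luns = [
    {"name": name, "data": [algorithm(lun) for lun in luns]}
    for name, algorithm in stats
]
print(top_storage_luns)
```

Swapping in different formulas only means changing the two small functions; the outer structure stays the same.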