I am starting to explore Fiware tools and I am testing Cepheus-CEP.
I know I can set rules and other configuration parameters in the "config.json" file that Cepheus-CEP loads when I launch the service, but I don't know how to specify these rules at runtime.
For example, a "config.json" could be:
{
  "host": "http://localhost:8080",
  "in": [
    {
      "id": "RoomX",
      "type": "Room",
      "attributes": [
        { "name": "temperature", "type": "double" },
        { "name": "shutter", "type": "string" }
      ]
    }
  ],
  "out": [
    {
      "id": "ShutterX",
      "type": "Shutter",
      "attributes": [
        { "name": "status", "type": "string" }
      ]
    }
  ],
  "statements": [
    "INSERT INTO Shutter SELECT R.r.shutter as id, 'closed' as status FROM pattern [ every r=Room(temperature > 26.0) -> (timer:interval(5 sec) and not Room(temperature < 26.0 and id=r.id))] as R unidirectional LEFT OUTER JOIN Shutter.std:groupwin(id).std:lastevent() as S ON R.r.shutter = S.id WHERE S is null OR S.status = 'opened'",
    "INSERT INTO Shutter SELECT R.r.shutter as id, 'opened' as status FROM pattern [ every r=Room(temperature < 24.0) -> (timer:interval(5 sec) and not Room(temperature > 24.0 and id=r.id))] as R unidirectional LEFT OUTER JOIN Shutter.std:groupwin(id).std:lastevent() as S ON R.r.shutter = S.id WHERE S is null OR S.status = 'closed'"
  ]
}
I want to change the statements (rules) list at runtime. Is there a query or something else that I could use?
I am afraid there is no CRUD API in Cepheus-CEP, and in my experience you have to replace the whole config file at once.
However, you can replace the file at runtime and it should work!
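In practice that replacement can be done over HTTP: Cepheus-CEP exposes an admin endpoint that accepts a complete configuration at runtime. As a sketch (verify the exact path, /v1/admin/config here, against the documentation of your Cepheus-CEP version):
# Push an updated config.json (including the new statements list)
# to the running CEP without restarting the service.
curl -X POST http://localhost:8080/v1/admin/config \
     -H "Content-Type: application/json" \
     -d @config.json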
Tell me if you find a different approach.
Best regards!
SELECT * from bucket b WHERE meta().id = 'PROFILE_LIST'
The query above gives the result below, but in addition I need the inner array matchingProfile_ to come back sorted by createdDate. Is it possible? If yes, what changes do I have to make to this query to achieve that?
[
  {
    "matchingProfile_": [
      {
        "createdDate": "2020-09-26T02:30:00",
        "contactDetails_": {
          "address_": {
            "addressLine1_": "",
            "addressLine2_": "",
            "city_": ""
          }
        }
      },
      {
        "createdDate": "2020-09-27T02:30:00",
        "contactDetails_": {
          "address_": {
            "addressLine1_": "",
            "addressLine2_": "",
            "city_": ""
          }
        }
      }
    ]
  }
]
Use a subquery expression to preserve the whole document structure and sort the array the desired way (you can even use complete SQL functionality):
SELECT b.*,
(SELECT RAW mp
FROM b.matchingProfile_ AS mp
ORDER BY mp.createdDate) AS matchingProfile_
FROM bucket AS b USE KEYS 'PROFILE_LIST';
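If you need the newest profiles first instead, the same subquery works with a descending sort; just change the inner ORDER BY to ORDER BY mp.createdDate DESC.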
[PROBLEM - My final solution below]
I'd like to import a JSON file containing my data into Neo4j.
However, it is super slow.
The JSON file is structured as follows:
{
  "graph": {
    "nodes": [
      { "id": 3510982, "labels": ["XXX"], "properties": { ... } },
      { "id": 3510983, "labels": ["XYY"], "properties": { ... } },
      { "id": 3510984, "labels": ["XZZ"], "properties": { ... } },
      ...
    ],
    "relationships": [
      { "type": "bla", "startNode": 3510983, "endNode": 3510982, "properties": {} },
      { "type": "bla", "startNode": 3510984, "endNode": 3510982, "properties": {} },
      ...
    ]
  }
}
It is similar to the one proposed here: How can I restore data from a previous result in the browser?
By looking at the answer, I discovered that I can use
CALL apoc.load.json("file:///test.json") YIELD value AS row
WITH row, row.graph.nodes AS nodes
UNWIND nodes AS node
CALL apoc.create.node(node.labels, node.properties) YIELD node AS n
SET n.id = node.id
and then
CALL apoc.load.json("file:///test.json") YIELD value AS row
with row
UNWIND row.graph.relationships AS rel
MATCH (a) WHERE a.id = rel.endNode
MATCH (b) WHERE b.id = rel.startNode
CALL apoc.create.relationship(a, rel.type, rel.properties, b) YIELD rel AS r
return *
(I have to do it in two passes, because otherwise the two UNWINDs cause duplicate relationships.)
But this is super slow, because I have a lot of entities and I suspect the engine searches over all of them for each relationship.
At the same time, I know "startNode": 3510983 refers to a node.
So the question: is there any way to speed up the import process by using the ids as an index, or something else?
Note that my nodes have different labels, so I did not find a way to create a single index for all of them, and I suppose that it would be too huge (memory) anyway.
[MY SOLUTION]
CALL apoc.load.json('file:///test.json') YIELD value
WITH value.graph.nodes AS nodes, value.graph.relationships AS rels
// Create each node, storing the original id as a property
UNWIND nodes AS n
CALL apoc.create.node(n.labels, apoc.map.setKey(n.properties, 'id', n.id)) YIELD node
// COLLECT collapses all node rows into one, so the next UNWIND
// does not produce a cartesian product with the node rows
WITH rels, COLLECT({id: n.id, node: node, labels: labels(node)}) AS nMap
UNWIND rels AS r
MATCH (w {id: r.startNode})
MATCH (y {id: r.endNode})
CALL apoc.create.relationship(w, r.type, r.properties, y) YIELD rel
RETURN rel
[EDITED]
This approach may work more efficiently:
CALL apoc.load.json("file:///test.json") YIELD value
WITH value.graph.nodes AS nodes, value.graph.relationships AS rels
UNWIND nodes AS n
CALL apoc.create.node(n.labels, apoc.map.setKey(n.properties, 'id', n.id)) YIELD node
WITH rels, apoc.map.mergeList(COLLECT(apoc.map.fromValues([toString(n.id), node]))) AS nMap
UNWIND rels AS r
CALL apoc.create.relationship(nMap[toString(r.startNode)], r.type, r.properties, nMap[toString(r.endNode)]) YIELD rel
RETURN rel
This query does not use MATCH at all (and does not need indexing), since it just relies on an in-memory mapping from the imported node ids to the created nodes. However, this query could run out of memory if there are a lot of imported nodes.
It also avoids invoking SET by using apoc.map.setKey to add the id property to n.properties.
The 2 UNWINDs do not cause a cartesian product, since this query uses the aggregating function COLLECT (before the second UNWIND) to condense all the preceding rows into one (because the grouping key, rels, is a singleton).
Have you tried indexing the nodes before the load? This may not be tenable since you have multiple node labels, but if they are limited you can create a placeholder node, create an index, and then delete the placeholder. After this, run the import:
CREATE (n:YourLabel {indx: 'xxx'})
CREATE INDEX ON :YourLabel(indx)
MATCH (n:YourLabel) DELETE n
The index will speed up the matching or merging.
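For example, if every created node carried a common label (YourLabel here is just a placeholder name) and an index existed on its id property, the relationship pass could use index lookups instead of full scans. A sketch under those assumptions:
// Sketch only: assumes all nodes were created with the label
// YourLabel and that an index exists on :YourLabel(id).
CALL apoc.load.json("file:///test.json") YIELD value
UNWIND value.graph.relationships AS r
MATCH (a:YourLabel {id: r.startNode})
MATCH (b:YourLabel {id: r.endNode})
CALL apoc.create.relationship(a, r.type, r.properties, b) YIELD rel
RETURN count(rel)
With the label present in the MATCH patterns, the planner can resolve each endpoint through the index rather than scanning every node.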
I have JSON documents in my Couchbase cluster that look like this:
{
  "giata_properties": {
    "propertyCodes": {
      "provider": [
        {
          "code": [
            {
              "value": [
                {
                  "name": "Country Code",
                  "value": "EG"
                },
                {
                  "name": "City Code",
                  "value": "HRG"
                },
                {
                  "name": "Hotel Code",
                  "value": "91U"
                }
              ]
            }
          ],
          "providerCode": "gta",
          "providerType": "gds"
        },
        {
          "code": [
            {
              "value": [
                {
                  "value": "071801"
                }
              ]
            },
            {
              "value": [
                {
                  "value": "766344"
                }
              ]
            }
          ],
          "providerCode": "restel",
          "providerType": "gds"
        },
        {
          "code": [
            {
              "value": [
                {
                  "value": "HRG03Z"
                }
              ]
            },
            {
              "value": [
                {
                  "value": "HRG04Z"
                }
              ]
            }
          ],
          "providerCode": "5VF",
          "providerType": "tourOperator"
        }
      ]
    }
  }
}
I'm trying to create a query that fetches a single document based on the value of giata_properties.propertyCodes.provider.code.value.value and a specific providerType.
So, for example, if my input is 071801 and restel, I want a query that will fetch the document I pasted above (because it contains these values).
I'm pretty new to N1QL, so what I have tried so far (without the providerType input) is:
SELECT * FROM giata_properties AS gp
WHERE ANY `field` IN `gp.propertyCodes.provider.code.value` SATISFIES `field.value` = '071801' END;
This returns an empty result set. I'm probably doing all of this wrong.
edit1:
Following geraldss's answer, I was able to achieve my goal via two different queries:
1st (More general) ~2m50.9903732s
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp SATISFIES v.`value` = '071801' END;
2nd (More specific) ~2m31.3660388s
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END;
The bucket has around 550K documents. Currently there are no indexes other than the primary.
Question part 2
When I run either of the above queries, a result is streamed to my shell very quickly, and then I spend the rest of the query time waiting for the engine to finish iterating over all the documents. I'm sure I'll only be getting one result from future queries, so I thought I could use LIMIT 1 to make the engine stop searching at the first result. I tried something like
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp SATISFIES v.`value` = '071801' END LIMIT 1;
But that made no difference: I get a document written to my shell and then keep waiting until the query finishes completely. How can this be configured correctly?
edit2:
I've upgraded to the latest Enterprise release, 4.5.1-2844. I have only the primary index created on the giata_properties bucket; when I execute the query with the LIMIT 1 keyword it still takes the same time, it doesn't stop any quicker.
I've also tried creating the array index you suggested, but the query does not use it and keeps insisting on the #primary index (even if I add a USE INDEX clause).
I tried removing SELF from the index you suggested; it took much longer to build, and now the query can use this new index, but I'm honestly not sure what I'm doing here.
So 3 questions:
1) Why doesn't LIMIT 1 with the primary index make the query stop at the first result?
2) What's the difference between the index you suggested with and without SELF? I tried to find documentation for the SELF keyword but I couldn't find anything.
This is how both indexes look in the web UI:
Index 1 (Your original suggestion) - Not working
CREATE INDEX `gp_idx1` ON `giata_properties`((distinct (array (`v`.`value`) for `v` within (array_star((((self.`giata_properties`).`propertyCodes`).`provider`)).`code`) end)))
Index 2 (Without SELF)
CREATE INDEX `gp_idx2` ON `giata_properties`((distinct (array (`v`.`value`) for `v` within (array_star(((self.`propertyCodes`).`provider`)).`code`) end)))
3) What would be the query for a specific giata_properties.propertyCodes.provider.code.value.value and a specific providerCode? I managed to do both separately but I wasn't successful in merging them.
Thanks for all your help!
Here is a query without the providerType.
EXPLAIN SELECT *
FROM giata_properties AS gp
WHERE ANY v WITHIN gp.giata_properties.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END;
You can also index this in Couchbase 4.5.0 and above.
CREATE INDEX idx1 ON giata_properties( DISTINCT ARRAY v.`value` FOR v WITHIN SELF.giata_properties.propertyCodes.provider[*].code END );
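For question 3, the two conditions can be combined by iterating over the provider array and nesting the WITHIN check inside each provider element. A sketch (untested against your data; adjust the providerCode value as needed):
SELECT *
FROM giata_properties AS gp
WHERE ANY p IN gp.giata_properties.propertyCodes.provider SATISFIES
          p.providerCode = 'restel'
          AND (ANY v WITHIN p.code SATISFIES v.`value` = '071801' END)
      END;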
Edit to answer question edits
The performance has been addressed in 4.5.x. You should try the following on Couchbase 4.5.1 and post the execution times here.
Test on 4.5.1.
Create the index.
Use the LIMIT. In 4.5.1, the limit is pushed down to the index.
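Putting it together, the test would look something like this (a sketch; with idx1 in place on 4.5.1, the LIMIT should be pushed down to the index scan, letting the query stop at the first match):
SELECT *
FROM giata_properties AS gp
WHERE ANY v WITHIN gp.giata_properties.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END
LIMIT 1;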
I'm currently trying to write a fairly complex N1QL query for a project I'm working on. Theoretically I could do all of this processing with multiple N1QL calls, parsing the results each time, but if possible I'd like it to be contained in one call.
What I would like to do is:
filter all documents that contain a "dataSync.test.id" field with more than 1 id
Read back all other ids in that list
Use that list to get other documents containing those ids
Get the "dataSync.test._channels" field for those documents (optionally a filter by docType might help parsing)
This would probably return a list of "dataSync.test._channels"
Is this possible in N1QL? It seems like it might be, but I can't get the syntax right.
My data structures look a little like this:
{
  "dataSync": {
    "test": {
      "_channels": [
        "RP"
      ],
      "id": [
        "dataSync_user_1015",
        "dataSync_user_1010",
        "dataSync_user_1005"
      ],
      "_lastUpdatedBy": "TEST"
    }
  },
  ...
}
{
  "dataSync": {
    "test": {
      "_channels": [
        "RSD"
      ],
      "id": [
        "dataSync_user_1010"
      ],
      "_lastUpdatedBy": "TEST"
    }
  },
  ...
}
Yes, I think you can do all of these.
The initial set of IDs, with filtering, can be retrieved as a subquery, and then you can get the subsequent documents via joins:
SELECT fulldoc
FROM (SELECT META().id AS dockey FROM doc WHERE a = 1) AS mydoc
INNER JOIN doc AS fulldoc ON KEYS mydoc.dockey;
There are optimizations that can be done here. Try this sequencing first to make sure it gets the job done.
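Applied to your dataSync structure, a rough sketch might look like the following (untested; it assumes a bucket named mybucket and a Couchbase release that allows non-correlated subquery expressions without USE KEYS, so on older servers you may need to split it into two calls):
SELECT other.dataSync.test._channels
FROM mybucket AS other
WHERE ANY i IN other.dataSync.test.id SATISFIES i IN (
          SELECT RAW ARRAY_FLATTEN(ARRAY_AGG(d.dataSync.test.id), 1)
          FROM mybucket AS d
          WHERE ARRAY_LENGTH(d.dataSync.test.id) > 1
      )[0] END;
The inner query collects and flattens the id arrays that hold more than one id; the outer query then returns the _channels of every document sharing any of those ids.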
I need to convert the following MySQL query to MongoDB.
Any help will be highly appreciated.
SELECT cr.*, COUNT(cj.job_id) AS finished_chunks FROM `checks_reports_df8` cr
LEFT JOIN `checks_jobs_df8` cj ON cr.id = cj.report_id
WHERE cr.started IS NOT NULL AND cr.finished IS NULL AND cj.is_done = 1
MongoDB doesn't do JOINs, so you will have to query both collections and do the JOIN on the application layer. How to do this exactly depends on which programming language you use to develop your application. You don't say which one you use, so I will just give you an example in JavaScript. If you use a different language: the second snippet is just a simple FOR loop.
These are the MongoDB queries you would use. I don't have access to your data, so I can not guarantee correctness.
var reports = db.checks_reports_df8.find({
"started": {$exists: 1 },
"finished": {$exists: 0 }
});
This query assumes that your null values are represented by missing fields, which is normal practice in MongoDB. If you have actual null values, use "started": { $ne: null } and "finished": null instead.
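With actual null values, the first query would therefore read:
var reports = db.checks_reports_df8.find({
    "started": { $ne: null },  // started is set (not null)
    "finished": null           // finished is null or missing
});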
Then iterate over the documents you get. For each report, perform this query:
reports.forEach(function(report) {
    // Count the finished jobs that belong to this report
    var job_count = db.checks_jobs_df8.aggregate([
        {$match: {
            "report_id": report.id,
            "is_done": 1
        }},
        {$group: {
            _id: "$job_id",
            "count": { $sum: 1 }
        }}
    ]);
    // output the data from report and job_count here
});
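As a side note for newer deployments: MongoDB 3.2 introduced $lookup, which can express the LEFT JOIN server-side in a single aggregation. A sketch, assuming the MySQL column names id and report_id were carried over as document fields ($addFields requires 3.4+):
db.checks_reports_df8.aggregate([
    // started IS NOT NULL AND finished IS NULL
    { $match: { "started": { $ne: null }, "finished": null } },
    // LEFT JOIN checks_jobs_df8 cj ON cr.id = cj.report_id
    { $lookup: {
        from: "checks_jobs_df8",
        localField: "id",
        foreignField: "report_id",
        as: "jobs"
    }},
    // COUNT only the joined jobs with is_done = 1
    { $addFields: {
        finished_chunks: {
            $size: { $filter: { input: "$jobs", cond: { $eq: ["$$this.is_done", 1] } } }
        }
    }},
    // drop the joined array, keeping cr.* plus the count
    { $project: { jobs: 0 } }
]);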