Neo4j query performance takes a lot of time - JSON

I am facing a query performance issue; do let me know if I am making a mistake here.
I have created around 1700 Router nodes and around 4000 Interface nodes, where interfaces are connected to their respective routers with a has_interface relationship.
Now I want to create links between these interfaces. A link will be a relationship. Every interface has an IfIPAddress property associated with it.
When I try to create the links using this query, it runs for a very long time, consumes a lot of CPU, and then does not create any links.
Here is my query:
MATCH (I:Interface), (I2:Interface)
FOREACH(p in FILTER(z in {props} WHERE z.OrigIPAddress = I.IfIPAddress and z.TermIPAddress = I2.IfIPAddress) |
MERGE (:Interface {IfIPAddress:p.OrigIPAddress})-[r:link]->(:Interface {IfIPAddress:p.TermIPAddress})
ON CREATE SET r = p
ON MATCH SET r = p)
Here is what I send to Neo4j using JSON and curl:
{
    "params" : {
        "props" : [
            {
                "AreaId" : "",
                "OrigIPAddress" : "172.16.42.9",
                "OrigNodeID" : "192.168.1.221",
                "TermIPAddress" : "172.16.42.10",
                "TermNodeID" : "10.229.140.28",
                "eEntityStatus" : "1",
                "iTotalBW" : "0"
            }
        ]
    },
    "query" : "MATCH (I:Interface), (I2:Interface) FOREACH(p in FILTER(z in {props} WHERE z.OrigIPAddress = I.IfIPAddress and z.TermIPAddress = I2.IfIPAddress) | MERGE (:Interface {IfIPAddress:p.OrigIPAddress})-[r:link]->(:Interface {IfIPAddress:p.TermIPAddress}) ON CREATE SET r = p ON MATCH SET r = p)"
}
This is what I am doing in the query:
First, in the FILTER I remove all those links whose OrigIPAddress or TermIPAddress is not present in Neo4j.
After that, for each entry in props, I create a link between the interfaces.
I am using Neo4j 2.1. When the server was running with its default configuration, it failed with an "OutOfMemory" exception.
I increased the heap size of the server, and now the query just takes a very long time.
Let me know if I have missed anything, and do let me know if you need logs.

First of all: your query builds a cross product, i.e. it pairs every Interface node with every other Interface node, which with 4000 interfaces means about 16M pairs.
Try something like this first, and report back if it does not work.
FOREACH(p in {props} |
    MERGE (I:Interface {IfIPAddress:p.OrigIPAddress})
    MERGE (I2:Interface {IfIPAddress:p.TermIPAddress})
    MERGE (I)-[r:link]->(I2)
    SET r = p
)
You should not set all the properties on r; that's just a waste. Only set the properties that you really need.
Instead of:
SET r = p
do something like this:
SET r.uptime = p.uptime
How many elements do you have in {props}?
What is your current server configuration, in terms of heap, mmio, etc.?
Best to share path/to/neo4j/data/graph.db/messages.log for diagnostics.
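For illustration, the whole request body with the simplified query might look like this (a sketch reusing the OP's params wrapper; setting only eEntityStatus is just an example of picking the one property you need). Creating a schema index first, as a separate one-time statement, should also speed up the MERGE lookups:
CREATE INDEX ON :Interface(IfIPAddress)
{
    "params" : {
        "props" : [ ...same array as above... ]
    },
    "query" : "FOREACH(p in {props} | MERGE (I:Interface {IfIPAddress:p.OrigIPAddress}) MERGE (I2:Interface {IfIPAddress:p.TermIPAddress}) MERGE (I)-[r:link]->(I2) SET r.eEntityStatus = p.eEntityStatus)"
}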


How can I create an EMR cluster resource that uses spot instances without hardcoding the bid_price variable?

I'm using Terraform to create an AWS EMR cluster that uses spot instances as core instances.
I know I can use the bid_price variable within the core_instance_group block on an aws_emr_cluster resource, but I don't want to hardcode prices, as I'd have to change them manually every time the instance type changes.
Using the AWS Web UI, I'm able to choose the "Use on-demand as max price" option. That's exactly what I'm trying to reproduce, but in Terraform.
Right now I am trying to solve my problem using the aws_pricing_product data source. You can see what I have so far below:
data "aws_pricing_product" "m4_large_price" {
service_code = "AmazonEC2"
filters {
field = "instanceType"
value = "m4.large"
}
filters {
field = "operatingSystem"
value = "Linux"
}
filters {
field = "tenancy"
value = "Shared"
}
filters {
field = "usagetype"
value = "BoxUsage:m4.large"
}
filters {
field = "preInstalledSw"
value = "NA"
}
filters {
field = "location"
value = "US East (N. Virginia)"
}
}
data.aws_pricing_product.m4_large_price.result returns a JSON document containing the details of a single product (you can check the response of the example here). The actual on-demand price is buried somewhere inside this JSON, but I don't know how I can get at it.
I know I might be able to solve this by using an external data source and piping the output of an AWS CLI call to something like jq, e.g.:
aws pricing get-products --filters "Type=TERM_MATCH,Field=sku,Value=8VCNEHQMSCQS4P39" --format-version aws_v1 --service-code AmazonEC2 | jq [........]
But I'd like to know if there is any way to accomplish what I'm trying to do with pure Terraform. Thanks in advance!
Unfortunately, the aws_pricing_product data source docs don't expand on how it should be used effectively, but the discussion in the pull request that added it offers some insight.
In Terraform 0.12 you should be able to use the jsondecode function to nicely get at what you want, with the following given as an example in the linked pull request:
data "aws_pricing_product" "example" {
service_code = "AmazonRedshift"
filters = [
{
field = "instanceType"
value = "ds1.xlarge"
},
{
field = "location"
value = "US East (N. Virginia)"
},
]
}
# Potential Terraform 0.12 syntax - may change during implementation
# Also, not sure about the exact attribute reference architecture myself :)
output "example" {
  value = jsondecode(data.aws_pricing_product.example.result).terms.OnDemand.*.priceDimensions.*.pricePerUnit.USD
}
If you are stuck on Terraform <0.12, you might struggle to do this natively in Terraform, other than with the external data source approach you've already suggested.
@cfelipe: put that ${jsondecode(data.aws_pricing_product.m4_large_price.result).terms.OnDemand.*.priceDimensions.*.pricePerUnit.USD} in a locals block.
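For completeness, the external data source route could be sketched like this (the script name is hypothetical, and the jq path just mirrors the terms.OnDemand.*.priceDimensions.*.pricePerUnit structure shown above, so verify it against your actual response):
data "external" "m4_large_price" {
  program = ["bash", "${path.module}/get_price.sh"]
}
where get_price.sh prints a single JSON object of strings to stdout:
#!/bin/bash
# PriceList entries are stringified JSON, hence the fromjson
aws pricing get-products \
  --service-code AmazonEC2 \
  --format-version aws_v1 \
  --filters "Type=TERM_MATCH,Field=instanceType,Value=m4.large" \
            "Type=TERM_MATCH,Field=location,Value=US East (N. Virginia)" \
  | jq '[.PriceList[0] | fromjson | .terms.OnDemand[].priceDimensions[].pricePerUnit][0]'
The result (e.g. {"USD": "0.10"}) is then available as data.external.m4_large_price.result["USD"].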

Looping through list to store variables in dictionary runs into an error

What I want to do:
Get user input from HTML form, store input in variables within Django and perform calculations with variables.
To accomplish that, I use the following code:
my_var = request.POST.get('my_var')
To prevent having 'None' stored in 'my_var' when a Django page is first rendered, I usually use
if my_var == None:
    my_var = 1
To keep it simple when using a bunch of variables, I came up with the following idea:
I store all variable names in a list.
I loop through the list and create a dictionary with the variable names as keys and the user input as values.
For that I wrote this code in Python, which works great:
list_eCar_properties = [
    'car_manufacturer',
    'car_model',
    'car_consumption',
]
dict_sample_eCar = {
    'car_manufacturer' : "Supr-Duper",
    'car_model' : "Lightning 1000",
    'car_consumption' : 15.8,
}
dict_user_eCar = {}
my_dict = {
    'car_manufacturer' : None,
    'car_model' : None,
    'car_consumption' : None,
}
for item in list_eCar_properties:
    if my_dict[item] == None:
        dict_user_eCar[item] = dict_sample_eCar[item]
    else:
        dict_user_eCar[item] = my_dict[item]
print(dict_user_eCar)
Works great: when I run the code, a dictionary (dict_user_eCar) is created in which the user input (in this case None, simulated using a second dictionary my_dict) is stored. When the user leaves an input blank, the data from dict_sample_eCar is used.
Now, when I transfer that code to my Django view, things don't work as nicely anymore. Code as follows:
def Verbrauchsrechner_eAuto(request):
    list_eCar_properties = [
        'car_manufacturer',
        'car_model',
        'car_consumption',
    ]
    dict_model_eCar = {
        'car_manufacturer' : "Supr-Duper",
        'car_model' : "Lightning 1000",
        'car_consumption' : 15.8,
    }
    dict_user_eCar = {}
    for item in list_eCar_properties:
        dict_user_eCar[item] = dict_model_eCar[item]
        context = {
            'dict_user_eCar' : dict_user_eCar,
            'dict_model_eCar' : dict_model_eCar,
            'list_eCar_properties' : list_eCar_properties,
        }
        return render(request, 'eAuto/Verbrauchsrechner_eAuto.html', context = context)
Result: the page gets rendered with only the first dictionary entry; all others are left out. In this case only car_manufacturer gets rendered to the HTML page.
Sorry folks: as I was reviewing my post, I realized that I had a major screw-up in the last part's indentation:
context and return were both part of the for loop, which obviously resulted in the page being rendered after the first iteration.
I corrected the code as follows:
for item in list_eCar_properties:
    dict_user_eCar[item] = dict_model_eCar[item]
context = {
    'dict_user_eCar' : dict_user_eCar,
    'dict_model_eCar' : dict_model_eCar,
    'list_eCar_properties' : list_eCar_properties,
}
return render(request, 'eAuto/Verbrauchsrechner_eAuto.html', context = context)
Since I didn't want the time I spent writing this post to be wasted, I simply posted it anyway, even though I found the mistake myself.
Lessons learned for a newbie in programming:
Too many comments in your own code can result in big confusion.
Try to be precise and keep code neat and tidy.
Do 1 and 2 before writing long posts on Stack Overflow.
Maybe someone else will benefit from this.
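For anyone after the original goal (reading the form values with a fallback), the loop can be combined with request.POST.get like this; a sketch, assuming the form field names match the dictionary keys:
def Verbrauchsrechner_eAuto(request):
    list_eCar_properties = ['car_manufacturer', 'car_model', 'car_consumption']
    dict_model_eCar = {
        'car_manufacturer' : "Supr-Duper",
        'car_model' : "Lightning 1000",
        'car_consumption' : 15.8,
    }
    dict_user_eCar = {}
    for item in list_eCar_properties:
        value = request.POST.get(item)  # None on first render; a blank submitted field arrives as ''
        if not value:
            dict_user_eCar[item] = dict_model_eCar[item]  # fall back to the sample data
        else:
            dict_user_eCar[item] = value
    context = {'dict_user_eCar' : dict_user_eCar}
    return render(request, 'eAuto/Verbrauchsrechner_eAuto.html', context = context)
Using "if not value:" covers both None and the empty string in one check.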

Reference data join on stream analytics input not giving output

I'm trying to set up a rule in an Azure Stream Analytics job using reference data and an input stream coming from an event hub.
This is my reference data JSON packet in blob storage:
{
    "ruleId": 1234,
    "Tag": "TAG1",
    "metricName": "velocity",
    "alertName": "velocity over 500",
    "operator": "AVGGREATEROREQUAL",
    "value": 500
}
And here is the transformation query in the Stream Analytics job:
WITH transformedInput AS
(
    SELECT
        metric = GetArrayElement(DeviceInputStream.data, 0),
        masterTag = rules.Tag,
        ruleId = rules.ruleId,
        alertName = rules.alertName,
        ruleOperator = rules.operator,
        ruleValue = rules.value
    FROM DeviceInputStream TIMESTAMP BY EventProcessedUtcTime
    JOIN rules
        ON DeviceInputStream.masterTag = rules.Tag
)

--rule output--
SELECT
    System.Timestamp AS time,
    transformedInput.masterTag AS Tag,
    transformedInput.ruleId AS ruleId,
    transformedInput.alertName AS alert,
    AVG(metric.velocity) AS avg
INTO alertruleblob
FROM transformedInput
GROUP BY
    transformedInput.masterTag,
    transformedInput.ruleId,
    transformedInput.alertName,
    ruleOperator,
    ruleValue,
    TumblingWindow(second, 6)
HAVING
    ruleOperator = 'AVGGREATEROREQUAL' AND AVG(metric.velocity) >= ruleValue
This is not yielding any results. However, when I run a test with sample input and reference data, I get the expected results; it just doesn't seem to work with the streaming data. My use case is: if the average velocity is greater than 500 over a 6-second window, store that result in another blob storage. The velocity value has been greater than 500 for some time, but I'm not getting any results.
What am I doing wrong?
This was working all along. I just had to specify the path of the reference blob, including the file name, in the reference input path of the Stream Analytics job. I had basically been referencing only the blob container without the actual file. When I changed the path pattern to "filename.json", I got the results. It was a stupid mistake.
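In other words, the reference input's path pattern has to point at the blob itself, not just its container. With illustrative names:
Container:            referencedata
Path pattern (wrong): rules/              -- folder only, the join finds no reference rows
Path pattern (works): rules/ruleset.json  -- includes the actual file name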

Gathering information from a text file. Corona SDK

Earlier I asked about gathering information from an API link, and I have managed to get out most of the details using the answer I got.
Now my problem is with another API that provides more information.
This time the file will contain this information:
{
    "username":"UserName",
    "confirmed_rewards":"0",
    "round_estimate":"0.00000000",
    "total_hashrate":"0.000",
    "payout_history":"0",
    "round_shares":"0",
    "workers":{
        "UserName.1":{
            "alive":"0",
            "hashrate":"0.000"
        },
        "UserName.2":{
            "alive":"0",
            "hashrate":"0.000"
        },
        "UserName.3":{
            "alive":"1",
            "hashrate":"1517.540",
            "last_share_timestamp":1369598007
        },
        "UserName.4":{
            "alive":"0",
            "hashrate":"0.000"
        }
    }
}
I want to gather each of the workers and print them out. The "workers" object can contain multiple entries, but each key starts with "UserName.x", where the username comes from the "username" parameter each time.
The numbers will always vary from 0 and up.
I want to gather the information in the same way: by accessing the document, decoding it, and printing out all the workers, however many there are.
Using the script provided in my last question (see the link at the start), I was thinking it would be something like:
local t = json.decode( txt )
print("Workers: ".. t["workers.UserName.1"])
But this was not the way.
Since the username changes all the time, I was also thinking of something like:
print("Workers: ".. t["workers" .. "." .. "username" .. "." .. "1"])
From here I have no clue how to gather the information when the names and numbers vary.
Thanks in advance
Here is the perfect solution:
local json = require "json"

-- jsonFile() is the helper from the Corona JSON tutorial linked below;
-- it reads the file and returns its contents as a string
local t = json.decode( jsonFile( "data.json" ) )

local workers = t.workers
for name, user in pairs(workers) do
    print("--------------------")
    print(name)
    for tag, value in pairs(user) do
        print(tag, value)
    end
end
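If you need one specific worker rather than all of them, you can also build the key dynamically from the username field (a sketch based on the JSON above):
local t = json.decode( jsonFile( "data.json" ) )
local key = t.username .. ".1"   -- e.g. "UserName.1"
local worker = t.workers[key]
if worker then
    print(key, "alive:", worker.alive, "hashrate:", worker.hashrate)
end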
Here is some more info:
http://www.coronalabs.com/blog/2011/08/03/tutorial-exploring-json-usage-in-corona/
http://www.coronalabs.com/blog/2011/06/21/understanding-lua-tables-in-corona-sdk/
http://lua-users.org/wiki/TablesTutorial

Create "batch" request using ExtJS 4.1 REST Proxy

I've got two model/proxy/store sets I'm concerned with: Questions and Choices. Both get data from a REST server as JSON. My process currently goes like this:
// load numQuestions records from store.Questions
var qs = Ext.getStore('Question');
//... loadmask, etc.
qs.load({
    scope : this,
    params : {
        limit : numQuestions
    },
    callback : function() {
        this.createQuestionCards(numQuestions);
    }
});
Once I have the Questions, I loop through and fetch the Choices relevant to each Question, like so:
for ( i = 0; i < numQuestions; i++) {
    // ... misc ...
    Assessor.questionChoices[i] = qs.getAt(i).choices();
    // ... misc ...
}
This works well, except that it makes an XMLHttpRequest for every loop iteration. With minimum response times in the 0.15 s range, that is fine for N < ~40; once the numbers reach 200, which should be a common use case, the delay is nasty.
How do I get ExtJS to "batch" the requests and send them after the loop body? For example:
var choiceBatch = qs.createBatch();
for ( i = 0; i < numQuestions; i++) {
    // ... misc ...
    Assessor.questionChoices[i] = choiceBatch.getAt(i).choices();
    // ... misc ...
};
choiceBatch.execute();
The Ext.data.proxy.Rest has a config option batchActions, and since it's basically an AjaxProxy with different methods, it will probably work the same way as the AjaxProxy.
Since I never got a clear answer about RESTful batching with multipart, I tested it on my own with batchActions=true in Ext.data.proxy.Rest v4.2.1. The result: batching only happens within the same store and the same HTTP method (batchActions defaults to false for the REST proxy).
That means if there are 200 POSTs and 1 DELETE and you call store.sync(), they are batched into 2 requests, and the POST request body is wrapped with an array of records instead of a single record.
I also looked at whether it can batch all stores with all GET, POST, PUT and DELETE requests using multipart/mixed, but the result is negative. (Check out OData Batch Processing.)
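To illustrate the sync-based batching (a sketch; the model name and URL are made up, and this only batches operations of the same kind within one store):
var store = Ext.create('Ext.data.Store', {
    model    : 'Question',
    autoSync : false,
    proxy    : {
        type         : 'rest',
        url          : '/questions',
        batchActions : true   // defaults to false for the REST proxy
    }
});

// queue up changes locally...
store.add({ text : 'New question 1' });
store.add({ text : 'New question 2' });

// ...then a single sync() sends the queued creates together,
// with the POST body wrapped in an array of records
store.sync();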
Regarding the OP, what you are looking for is model associations. Once you create the Question and Choice Ext models and let the server respond with nested JSON data (so each Question contains its child Choices embedded in the response), Ext will create the question record along with its question.choices() child store automatically.
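As a sketch of that approach (the Assessor.* names, fields, URL and root are made up; the key points are the hasMany association and the server nesting the choices inside each question):
Ext.define('Assessor.model.Choice', {
    extend : 'Ext.data.Model',
    fields : ['id', 'questionId', 'text']
});

Ext.define('Assessor.model.Question', {
    extend  : 'Ext.data.Model',
    fields  : ['id', 'text'],
    hasMany : { model : 'Assessor.model.Choice', name : 'choices' },
    proxy   : {
        type   : 'rest',
        url    : '/questions',
        reader : { type : 'json', root : 'data' }
    }
});

// One load fetches everything; each question record's choices()
// store is filled from the nested JSON, with no per-question XHRs.
If the server responds with, e.g., { "data": [ { "id": 1, "text": "...", "choices": [ ... ] } ] }, then qs.getAt(i).choices() is populated without any further requests.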