How does a script optimally lay out a pure hierarchical graphviz/dot graph?

I am about to write a script that generates graphviz/dot graphs with the following two characteristics:
All except one node have exactly one parent node (so, it's a tree).
If two or more nodes share the same parent node, they are themselves in a specific order.
With these characteristics, I'd like the resulting (that is, dot-generated) graph to look like this:
No edges should cross.
Nodes with the same parent should have the same distance from the graph's top border.
Nodes with the same parent should be drawn from left to right according to their ordering.
However, I can't make dot behave the way I'd like. Here's a dot file to demonstrate my problem:
digraph G {
    node [shape=plaintext fontname="Arial"];
    0  [label="zero"      ];
    1  [label="one"       ];
    2  [label="two"       ];
    3  [label="three"     ];
    4  [label="four"      ];
    5  [label="five"      ];
    6  [label="six"       ];
    7  [label="seven"     ];
    8  [label="eight"     ];
    9  [label="nine"      ];
    10 [label="ten"       ];
    11 [label="eleven"    ];
    12 [label="twelve"    ];
    13 [label="thirteen"  ];
    14 [label="fourteen"  ];
    15 [label="fifteen"   ];
    16 [label="sixteen"   ];
    17 [label="seventeen" ];
    18 [label="eighteen"  ];
    19 [label="nineteen"  ];
    20 [label="twenty"    ];
    21 [label="twenty-one"];
    22 [label="twenty-two"];
    0 -> 1   [arrowhead=none];
    1 -> 2   [arrowhead=none];
    2 -> 7   [arrowhead=none];
    7 -> 8   [arrowhead=none];
    8 -> 9   [arrowhead=none];
    8 -> 10  [arrowhead=none];
    9 -> 10  [color="#aaaaaa" constraint=false];
    10 -> 11 [arrowhead=none];
    10 -> 12 [arrowhead=none];
    11 -> 12 [color="#aaaaaa" constraint=false];
    7 -> 13  [arrowhead=none];
    8 -> 13  [color="#aaaaaa" constraint=false];
    13 -> 14 [arrowhead=none];
    7 -> 15  [arrowhead=none];
    13 -> 15 [color="#aaaaaa" constraint=false];
    15 -> 16 [arrowhead=none];
    15 -> 17 [arrowhead=none];
    16 -> 17 [color="#aaaaaa" constraint=false];
    2 -> 3   [arrowhead=none];
    7 -> 3   [color="#aaaaaa" constraint=false];
    3 -> 4   [arrowhead=none];
    2 -> 5   [arrowhead=none];
    3 -> 5   [color="#aaaaaa" constraint=false];
    5 -> 6   [arrowhead=none];
    2 -> 18  [arrowhead=none];
    5 -> 18  [color="#aaaaaa" constraint=false];
    18 -> 19 [arrowhead=none];
    19 -> 20 [arrowhead=none];
    19 -> 21 [arrowhead=none];
    20 -> 21 [color="#aaaaaa" constraint=false];
    18 -> 22 [arrowhead=none];
    19 -> 22 [color="#aaaaaa" constraint=false];
}
which renders as follows (rendered image omitted).
Note that the ordering between siblings is indicated by the grey edges (arrows).
So, for example, I am happy with the seven -> three -> five -> eighteen siblings, since they are drawn from left to right in their correct order (as indicated by the arrows).
But I am unhappy with the siblings eight -> thirteen -> fifteen, because their edges cross other edges and because they are not ordered from left to right as I'd like.
Also, nine -> ten, twenty -> twenty-one and nineteen -> twenty-two are laid out in the wrong direction.
I am aware that I could probably get the picture I want if I used additional (invisible) edges, the weight attribute, and possibly even more features. But as the graphs (and there are many of them) are generated by a script, I can't do that manually.
So, is there a way to achieve what I want?

In this case it's actually very simple: the order in which the nodes first appear in the file matters. In your file they appear from node 0 to node 22 and are laid out to respect that order as much as possible. However, they should appear in the order in which you add the edges (0, 1, 2, 7, 3, 5, 18, ...). Therefore the simplest solution is to move the block which defines the labels after the block which defines the edges, and you'll get the desired layout (rendered image omitted):
No weights, no invisible edges and no invisible nodes.
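A minimal sketch of the reordering (the attribute values are taken from the file above; only the relative order of the edge block and the label block changes):

digraph G {
    node [shape=plaintext fontname="Arial"];
    // edges first: dot creates each node the first time it is mentioned,
    // so node creation now follows the edge order
    0 -> 1 [arrowhead=none];
    0 -> 2 [arrowhead=none];
    1 -> 2 [color="#aaaaaa" constraint=false];
    // labels last: these statements no longer create the nodes,
    // they only attach attributes to nodes that already exist
    0 [label="zero"];
    1 [label="one"];
    2 [label="two"];
}

Since dot creates a node the first time it is mentioned, putting the edges first makes the creation order follow the edge order, which dot then prefers when placing siblings left to right.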

Related

How to create complex JSON config maps in q?

Is there a good way in q to input somewhat large, complicated nested dictionaries which represent (or will be converted to) JSON? I'm trying to control the echarts javascript library, which basically just renders charts based on JSON config options. What I'm doing now is:
opt.title.text:"my chart"
opt.xAxis.data:til 100
opt.series.data:100?5
opt.series.type:`line
toClient[opt] /serializes and sends to browser
but is there an obvious way to get rid of the intermediate assignment? Is making a function to take key-path/value pairs and turn them into a dictionary the way to go or is there a better way to go about this?
Or is this something that should be avoided in q? Should I instead just write q code to set specific options manually and handle the JSON object map in the javascript client?
Not sure if this is really what you are looking for, but you can create the nested dictionary structure directly:
q)`title`xAxis`series!(enlist[`text]!enlist"my chart";enlist[`data]!enlist til 100;`data`type!(100?5;`line))
title | (,`text)!,"my chart"
xAxis | (,`data)!,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ..
series| `data`type!(0 1 1 3 3 3 2 2 4 1 3 3 1 4 0 4 4 4 2 4 3 3 4 0 4 0 0 1 0..
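If you do want the key-path/value approach from the question, here is a small untested sketch. It relies on the generic-null prototype trick for growing nested dictionaries; the helper names setp and mk are mine:

q)setp:{[d;pv] .[d;first pv;:;last pv]}        / amend dict d at one key path
q)mk:{[pvs] setp/[enlist[`]!enlist(::);pvs]}   / fold path/value pairs into one dict
q)opt:mk ((`title`text;"my chart");(`xAxis`data;til 100);(`series`data;100?5);(`series`type;`line))
q)toClient enlist[`] _ opt                     / drop the prototype key before sending

The enlist[`]!enlist(::) prototype (a dictionary whose only value is generic null) is what lets amend create key paths that don't exist yet; and .j.j would serialize the result to JSON if your toClient doesn't already.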

How to find all the triples in a graph?

The graph consists of more than three million nodes and more than 20 million edges. I'm using the igraph package on an 8 GB RAM Linux server. The code is
cliques(g,min=3,max=3)
After six days, the code is still running. Is there a better way to find all the triples in a graph?
Following @GaborCsardi's suggestion, you can see this simple example (I used the igraph dev version from http://igraph.org/nightly):
kite <- graph.famous("Krackhardt_Kite")
triangles(kite)
which yields:
[1] 4 1 2 4 1 3 4 2 5 4 6 1 4 6 3 4 6 7 4 7 2 4 7 5 6 1 3 6 7 8 7 2 5
for the (undirected) graph "Krackhardt_Kite"
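The returned vector is the list of triangles flattened, three vertex ids per triangle, so if you want one triangle per row you can reshape it (a small sketch):

tri <- matrix(triangles(kite), ncol = 3, byrow = TRUE)  # one row per triangle
head(tri)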
You can compare the results with
plot(kite)
Hope this helps

How to create a query based on condition with 3 joined tables in MySQL

I'm a bit confused when trying to create a specific query with the following data tables:
**table 1 - referral_data:**
ID attribution_name
------------------------
1 Category
2 Brand
3 Size
4 Color
5 Processor
6 OS
7 Screen Size
.....
**table 2 - referral_values:**
ID ref_data_id attribution_value
---------------------------------------------
1 1 Cell Phones
2 1 Tablets
3 1 Laptops
4 1 Computers
5 1 LCD Monitors
6 2 Nokia
7 2 Motorola
8 2 Samsung
9 2 Lenovo
10 2 Philips
11 3 10x10x11
12 3 100x100x20
13 3 10x200x200
14 3 2x2x3
15 4 Black
16 4 Cyan
17 4 Magenta
18 4 White
19 4 Blue
20 5 ARM Cortex A11
21 5 Snapdragon 11
22 5 Intel I3 XXXXX
23 5 Exynos XXXX
24 6 Android 4.1
25 6 Android 3.0
26 6 Windows Phone 3
27 6 Windows 8 Professional
28 7 18.5"
29 7 11.8"
30 7 7.0"
31 7 5.0"
32 7 3.5"
......
**table 3 - product_specs:**
ID   product_id   referral_data_id   referral_value_id
--------------------------------------------------------
1    1050         1                  1     // <-- Product 1 - Category: Cell Phones
2    1050         2                  8     // <-- Product 1 - Brand: Samsung
3    1050         4                  19    // <-- Product 1 - Color: Blue
4    1050         6                  24    // <-- Product 1 - OS: Android 4.1
5    1050         7                  30    // <-- Product 1 - Screen Size: 7.0"
6    1068         1                  4     // <-- Product 2 - Category: Computers
7    1068         2                  9     // <-- Product 2 - Brand: Lenovo
8    1068         5                  22    // <-- Product 2 - Processor: Intel I3 XXXXX
9    1068         7                  28    // <-- Product 2 - Screen Size: 18.5"
......
These tables make up a "Product Catalog" that I'm planning to use on an e-commerce website.
This is intended to optimize "client side" search functions and organize internal product data, making the content administrators' tasks as simple and easy as possible (letting them, for example, choose already-entered data values instead of re-entering them, avoiding duplicated data and typos).
The content administrators will have options, according to these table dynamics, to insert new attribution data (product characteristics) and/or new attribution values for them.
Info: the product code field named "product_id" is outside the table relations; it is just used as a link to attach information to the products it belongs to.
With general SQL joins to get data across the tables I'm OK. But there are some kinds of information I need to get and manage that are driving me nuts. I've spent A LOT of hours on this and all I've found is a headache.
My question is how to build a single query that gets the commonly used referral data based on a CATEGORY
(when the content admin chooses, for example, "Cell Phones" in the "Category" field, they should get the commonly used "data table information" for that category, like Brand, Color, Screen Size, etc., to just pick their category attributions), and how to create a similar query to highlight or order by the commonly used attribution values (i.e. commonly used screen sizes).
In a single query?
SELECT ps2.product_id, rv2.attribution_value, rd.attribution_name
FROM (SELECT ps.*
      FROM product_specs ps
      JOIN referral_values rv ON rv.id = ps.referral_value_id
      WHERE rv.attribution_value = 'Cell Phones') ps1
JOIN product_specs ps2 ON ps1.product_id = ps2.product_id
JOIN referral_values rv2 ON ps2.referral_value_id = rv2.id
JOIN referral_data rd ON rd.id = rv2.ref_data_id;
The inner select gets the right product_ids based on the criterion 'Cell Phones' (note that it joins referral_values on its primary key, rv.id = ps.referral_value_id). The other joins populate those products with all their details: the first JOIN is a self-join of product_specs to fetch every spec row of the matched products, and the following two joins resolve the ids to their string values.
By the way: The column product_specs.referral_data_id is redundant and can/should be removed
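For the second part of the question (highlighting or ordering by the commonly used attribution values within a category), an untested sketch along the same lines, counting how often each value occurs among the products of that category (same schema assumptions as above):

SELECT rd.attribution_name, rv2.attribution_value, COUNT(*) AS times_used
FROM product_specs ps1
JOIN referral_values rv1 ON rv1.id = ps1.referral_value_id
                        AND rv1.attribution_value = 'Cell Phones'
JOIN product_specs ps2 ON ps2.product_id = ps1.product_id
JOIN referral_values rv2 ON rv2.id = ps2.referral_value_id
JOIN referral_data rd ON rd.id = rv2.ref_data_id
GROUP BY rd.attribution_name, rv2.attribution_value
ORDER BY times_used DESC;

ORDER BY times_used DESC puts the most commonly used values (e.g. the most common screen sizes) first.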

How to apply a formula for removing data noise in R?

I am working on NGSIM traffic data, having 18 columns and 1180598 rows in a text file. I want to smooth the position data in the column 'Local Y'. I know there are built-in functions for data smoothing in R, but none of them seems to match the formula I am required to apply. The data in the text file looks something like this:
Index VehicleID Total_Frames Local Y
1 2 5 35.381
2 2 5 39.381
3 2 5 43.381
4 2 5 47.38
5 2 5 51.381
6 4 8 504.828
7 4 8 508.325
8 4 8 512.841
9 4 8 516.338
10 4 8 520.854
11 4 8 524.592
12 4 8 528.682
13 4 8 532.901
14 5 7 39.154
15 5 7 43.153
16 5 7 47.154
17 5 7 51.154
18 5 7 55.153
19 5 7 59.154
20 5 7 63.154
The above data columns are just an example taken from the original file. Here you can see 3 vehicles, with vehicle IDs 2, 4 and 5, but in fact there are 2169 vehicles with different IDs. The column Total_Frames tells us how many times the vehicle ID of each vehicle is repeated in the VehicleID column; for example, in the table above, vehicle ID 2 is repeated 5 times, hence '5' in the Total_Frames column. Following is the formula I am required to apply to remove data noise (smoothing) from column 'Local Y':
Smoothed position value:

$$\hat{y}_i = \frac{\sum_{k=i-D}^{i+D} y_k \, e^{-|i-k|/\delta}}{\sum_{k=i-D}^{i+D} e^{-|i-k|/\delta}}$$

where
y_k = the raw Local Y value at index k,
i = index #,
delta = 5,
D = 15.
I have tried the built-in functions I know of, but they don't smooth the data as required. My question is: is there any built-in function in R which can smooth the data according to the given formula, or which could take this formula as an argument? I need to apply the formula to every value in Local Y, using the 15 values before and the 15 values after it (i-D to i+D) for the same vehicle ID. Can anyone give me an idea how to approach the problem? Thanks in advance.
You can place your formula in a function and then use R's apply machinery to run it over the 'Local Y' column of the data frame, grouped by vehicle ID.
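For instance, a minimal sketch of that approach, assuming the file has been read into a data frame (the file name and the column names VehicleID / Local.Y are my guesses; adjust them to your data):

smooth_y <- function(y, D = 15, delta = 5) {
  n <- length(y)
  sapply(seq_len(n), function(i) {
    k <- max(1, i - D):min(n, i + D)   # window, truncated at the series ends
    w <- exp(-abs(i - k) / delta)      # exponential weights from the formula
    sum(w * y[k]) / sum(w)             # weighted average = smoothed value
  })
}

df <- read.table("ngsim.txt", header = TRUE)
df$Smoothed.Y <- ave(df$Local.Y, df$VehicleID, FUN = smooth_y)

ave() applies smooth_y separately to each vehicle's trajectory, so the smoothing window never mixes rows from two different vehicles.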

How to build vtkPolyData based on the information within a txt file

I have a txt file which contains a set of 3-dimensional data points, and I would like to create a vtkPolyData based on those points.
The first line of the file gives the number of points, in my case 6 x 6, and after that come the actual coordinates of each point. The content of the file looks like this:
6 6
1 1 3
2 1 3.4
3 1 3.6
4 1 3.6
5 1 3.4
6 1 3
1 2 3
2 2 3.8
3 2 4.2
4 2 4.2
5 2 3.8
6 2 3
1 3 3
2 3 3
3 3 3
4 3 3
5 3 3
6 3 3
1 4 3
2 4 3
3 4 3
4 4 3
5 4 3
6 4 3
1 5 3
2 5 3.8
3 5 4.2
4 5 4.2
5 5 3.8
6 5 3
1 6 3
2 6 3.4
3 6 3.6
4 6 3.6
5 6 3.4
6 6 3
How can I build a vtkPolyData structure from a txt file with this data?
It looks to me like you have a regularly gridded series of points, right? If so, vtkImageData might be a better choice. You can always use a geometry filter afterwards to convert to polydata if you really need it that way.
1. Create a vtkImageData instance.
2. Set its dimensions to (6, 6, 1) (the third dimension is ignored).
3. Set its data type to an appropriate type (float or double, I guess).
4. Call AllocateScalars().
5. If in C++, call GetScalarPointer() and cast it to the data type set in step 3. This pointer will point to an array of size 36; you can just fill each point as you would normally. If in another language (Tcl/Python/Java), call SetScalarComponentFromFloat on the image data with the arguments (x, y, 0, 0, value); the first 0 is the 3rd dimension and the second is for the first component.
This will give you a grid, and it'll be far more memory efficient than a polydata.
If you want to visualize only the points, use a vtkDataSetMapper, and setup the actor's property with SetRepresentationToPoints(), setting an appropriate point size. That will do a simple job of visualization.
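A minimal Python sketch of the steps above (the file name is made up, and the two-argument AllocateScalars assumes the VTK 6+ API):

import vtk

# first line: grid dimensions; remaining lines: x y z per point
with open("points.txt") as f:
    nx, ny = (int(v) for v in f.readline().split())
    pts = [[float(v) for v in line.split()] for line in f if line.strip()]

img = vtk.vtkImageData()
img.SetDimensions(nx, ny, 1)            # third dimension is ignored
img.AllocateScalars(vtk.VTK_DOUBLE, 1)  # one double component per point
for x, y, z in pts:
    # store the height (z) as the scalar at grid cell (x, y)
    img.SetScalarComponentFromDouble(int(x) - 1, int(y) - 1, 0, 0, z)

# if you really need polydata, convert afterwards with a geometry filter
geom = vtk.vtkImageDataGeometryFilter()
geom.SetInputData(img)
geom.Update()
polydata = geom.GetOutput()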
Are these examples useful? In particular, this does generation of points and polygons, so it should be possible to adapt. The core seems to be (with lots left out):
# ...
vtkPolyData shell
vtkFloatPoints points
vtkCellArray strips

# Generate points...
loop {
    ...
    points InsertPoint $k $x0 $x1 $x2
}
shell SetPoints points
points Delete

# Generate triangles/polygons...
loop {
    strips InsertNextCell $NP2
    # ...
    strips InsertCellPoint [expr $kb + $ke]
    # ...
    strips InsertCellPoint [expr $kb + $ke]
}
shell SetStrips strips
strips Delete
# ...