Importing CSV While Creating Relationship In Neo4j

I am trying to create a relationship between two different graphs, using information in a CSV file. I built the query the way I did because of the size of each graph, one being 500k+ nodes and the other 1.5m+.
This is the query I have:
LOAD CSV WITH HEADERS FROM "file:///customers_table.csv" AS row WITH row
MATCH (m:Main) WITH m
MATCH (c:Customers) USING INDEX c:Customers(customer)
WHERE m.ASIN = row.asin AND c.customer = row.customer
CREATE (c)-[:RATED]->(m)
This is the error I receive:
Variable `row` not defined (line 4, column 16 (offset: 164))
"WHERE m.ASIN = row.asin AND c.customer = row.customer"
^
An example of the Main table is:
{
"ASIN": "0827229534",
"totalreviews": "2",
"categories": "2",
"title": "Patterns of Preaching: A Sermon Sampler",
"avgrating": "5",
"group": "Book"
}
And an example of a customer is:
{
"customer": "A2FMUVHRO76A32"
}
And inside the customers table csv, I have:
Customer, ASIN, rating
A2FMUVHRO76A32, 0827229534, 5
I can't seem to figure out why it's throwing back that error.

The first WITH clause in your query (WITH row) is redundant, but the real problem is that the next one (WITH m) drops row from scope: every variable you still need has to be listed in each WITH clause. So this version compiles:
LOAD CSV WITH HEADERS FROM "file:///customers_table.csv" AS row
MATCH (m:Main)
WITH m, row
MATCH (c:Customers) USING INDEX c:Customers(customer)
WHERE m.ASIN = row.asin AND c.customer = row.customer
CREATE (c)-[:RATED]->(m)
The reason for this is that, in essence, WITH chains two query parts together while limiting the scope to its variables (and in some cases also performing calculations, aggregations, etc.).
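For instance, in a fragment like the following (labels made up purely for illustration), only what is listed in the WITH clause survives into the next query part:
MATCH (a:Foo)
WITH a          // from here on, only `a` is in scope
MATCH (b:Bar)   // `a` is carried along; any variable not listed above is gone
RETURN a, b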
Having said that, you do not even need the second WITH clause: you can simply omit it and even merge the two MATCH clauses into a single one:
LOAD CSV WITH HEADERS FROM "file:///customers_table.csv" AS row
MATCH (m:Main), (c:Customers) USING INDEX c:Customers(customer)
WHERE m.ASIN = row.asin AND c.customer = row.customer
CREATE (c)-[:RATED]->(m)
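As a side note, you can also move the equality tests into the MATCH patterns themselves, which lets the planner use indexes on both labels without an explicit hint. A sketch, assuming indexes exist on :Main(ASIN) and :Customers(customer); note too that LOAD CSV header names are case-sensitive, so row.asin and row.customer must match the header spelling in the file (the sample shows Customer and ASIN):
LOAD CSV WITH HEADERS FROM "file:///customers_table.csv" AS row
MATCH (m:Main {ASIN: row.asin})
MATCH (c:Customers {customer: row.customer})
CREATE (c)-[:RATED]->(m)
On Neo4j versions that support it, prefixing the statement with USING PERIODIC COMMIT also keeps memory usage in check for inputs of this size.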

ADF dataflow and columns/rows in separate array in JSON

I have a bunch of JSON files which have an array with column names and a separate array for the rows.
I want a dynamic way of retrieving the column names and merging them with the rows for each JSON file.
I've been playing around with derived columns and column patterns, but I'm struggling to get it working.
I want the column names from [data.columns.shortText] paired with the values from each corresponding [data.rows.value], according to their order.
Example format
{
"messages":{
},
"data":{
"columns":[
{
"columnName":"SelectionCriteria1",
"shortText":"Case no."
},
{
"columnName":"SelectionCriteria2",
"shortText":"Period for periodical values",
},
{
"columnName":"SelectionCriteria3",
"shortText":"Location"
},
{
"columnName":"SelectionCriteriaAggregate",
"shortText":"Value"
}
],
"rows":[
[
{
"value":"23523"
},
{
"value":12342349
},
{
"value":"234234",
"code":3342
},
{
"value":234234234
}
]
]
}
}
First, you need to fix your JSON data: there is an extra comma in the second object of columns, and in rows you have value both as an int and as a string, so when I tried to parse it in ADF I got an error.
I don't quite understand why you're trying to merge by position, because normally there are more rows than columns; if you get 5 rows and 3 columns you will get an error.
Here is my approach to your problem. The main idea is to add an index column to both arrays and then join the JSONs with an inner join; a sketch of the same positional merge outside ADF follows the list.
1. Create the source data (I used two sources, but you can make it one to simplify the data flow).
2. Add a Select activity to pick the relevant arrays from the data.
3. Flatten each array (in order to be able to add an index column).
4. Add the index using a Rank activity (please read up on rank and dense rank and the difference between the two).
5. Add a Join activity, inner-joining on the index column.
6. Add another Select activity to remove the index column from the result.
7. Save the output to the sink.
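Outside ADF, the positional merge these steps implement amounts to pairing each column's shortText with the row value at the same index. A minimal sketch in Python, just to illustrate the target result (the file name is an assumption, and the JSON is the fixed version described above):
import json

# Load one of the JSON files (name assumed for illustration).
with open("example.json") as f:
    doc = json.load(f)

# Column names come from data.columns[*].shortText.
names = [col["shortText"] for col in doc["data"]["columns"]]

# Each entry of data.rows is itself an array of {"value": ...} cells;
# zip pairs the cells with the column names by position.
for row in doc["data"]["rows"]:
    values = [cell.get("value") for cell in row]
    print(dict(zip(names, values)))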
(The original answer included screenshots of the JSON data, the data flow, and the Select, Flatten, Rank and Join activities.)
Please also check these links:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expressions-usage#mapAssociation
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-map-functions

Using json_extract in sqlite to pull data from parent and child objects

I'm starting to explore the JSON1 library for SQLite and have so far been successful with the basic queries I've created. I'm now looking to create a more complicated query that pulls data from multiple levels.
Here's the example JSON object I'm starting with (most of the data is very similar).
{
"height": 140.0,
"id": "cp",
"label": {
"bind": "cp_label"
},
"type": "color_picker",
"user_data": {
"my_property": 2
},
"uuid": "948cb959-74df-4af8-9e9c-c3cb53ac9915",
"value": {
"bind": "cp_color"
},
"width": 200.0
}
This JSON object is buried about seven levels deep in a JSON structure, and I pulled it from the larger JSON construct using an SQL statement like this:
SELECT value FROM forms, json_tree(forms.formJSON, '$.root')
WHERE type = 'object'
AND json_extract(value, '$.id') = #sControlID
-- In this example, #sControlID is a variable that represents the `id` value we're looking for, which is 'cp'
But what I really need to pull from this object are the following:
the value from key type ("color_picker" in this example)
the values from keys bind ("cp_color" and "cp_label" in this example)
the keys value and label (which have values of {"bind":"<string>"} in this example)
For that last item, the key name (value and label in this case) can be any number of keywords, but no matter the keyword, the value will be an object of the form {"bind":"<some_string>"}. Also, there could be multiple keys that have a bind object associated with them, and I'd need to return all of them.
For the first two items, the keywords will always be type and bind.
With the json example above, I'd ideally like to retrieve two rows:
type          key     value
color_picker  value   cp_color
color_picker  label   cp_label
When I use json_extract methods, I end up retrieving the object {"bind":"cp_color"} from the json_tree table, but I also need to retrieve the data from the parent object. I feel like I need to do some kind of union, but my attempts have so far been unsuccessful. Any ideas here?
Note: if the {"bind":"<string>"} object doesn't exist as a child of the parent object, I don't want any rows returned.
Well, I was on the right track and eventually figured it out. I created a separate query for each of the items I was looking for, then INNER JOINed all the json_tree tables from each of the queries to have all the required fields available. Then I json_extracted the required data from each of the JSON fields I needed data from. In the end, it gave me exactly what I was looking for, though I'm sure it could be written more efficiently.
For anyone interested, this is what the final query ended up looking like:
SELECT IFNULL(json_extract(parent.value, '$.type'), '_window_'),
       child.key,
       json_extract(child.value, '$.bind')
FROM (
    SELECT json_tree.*
    FROM nui_forms, json_tree(nui_forms.formJSON, '$')
    WHERE type = 'object'
      AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID
) parent
INNER JOIN (
    SELECT json_tree.*
    FROM nui_forms, json_tree(nui_forms.formJSON, '$')
    WHERE type = 'object'
      AND json_extract(value, '$.bind') != 'NULL'
      AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID
) child
ON child.parent = parent.id;
If you have any tips on reducing its complexity, feel free to comment!
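For anyone wanting to see the mechanics in isolation: json_tree exposes id and parent columns for every node, which is what makes the parent/child join above possible. A minimal, self-contained sketch on a literal document (no table required):
-- Two scans of the same document, joined child-to-parent;
-- filtering on child.key = 'bind' surfaces each {"bind": ...} object
-- together with the full key of the object that contains it.
SELECT parent.fullkey, child.key, child.value
FROM json_tree('{"label":{"bind":"cp_label"},"value":{"bind":"cp_color"}}') AS parent
JOIN json_tree('{"label":{"bind":"cp_label"},"value":{"bind":"cp_color"}}') AS child
  ON child.parent = parent.id
WHERE child.key = 'bind';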

Using N1QL with document keys

I'm fairly new to couchbase and have tried to find the answer to a particular query I'm trying to create with not much success so far.
I've debated between using a view or N1QL for this particular case and settled on N1QL, but I haven't managed to get it to work, so maybe a view is better after all.
Basically I have the document key (Group_1) for the following document:
Group_1
{
"cbType": "group",
"ID": 1,
"Name": "Group Atlas 3",
"StoreList": [
2,
4,
6
]
}
I also have 'store' documents; their keys are listed in this document's StoreList. (Store_2, Store_4, Store_6, and they have a storeID value that is 2, 4 and 6.) I basically want to obtain all 3 documents listed.
What I do have that works is obtaining this document with its id by doing:
var result = CouchbaseManager.Bucket.Get<dynamic>(couchbaseKey);
mygroup = JsonConvert.DeserializeObject<Group> (result.ToString());
I can then loop through its StoreList and obtain all its stores in the same manner, but I don't need anything else from the group; all I want are the stores, and I would have preferred to do this in a single operation.
Does anyone know how to run a N1QL query directly against a specified document's value?
Something like (and this is total imaginary non working code I'm just trying to clearly illustrate what I'm trying to get at):
SELECT * FROM mycouchbase WHERE documentkey IN
Group_1.StoreList
Thanks
UPDATE:
So Nic's solution does not work;
This is the closest I get to what I need atm:
SELECT b from DataBoard c USE KEYS ["Group_X"] UNNEST c.StoreList b;
"results":[{"b":2},{"b":4},{"b":6}]
Which returns the list of IDs of the stores I want for any given group (Group_X). I haven't found a way to get the full stores instead of just the IDs in the same statement yet.
Once I have, I'll post the full solution as well as all the speed bumps I've encountered in the process.
I apologize if I have a misunderstanding of your question, but I'm going to give it my best shot. If I misunderstood, please let me know and we'll work from there.
Let's use the following scenario:
group_1
{
"cbType": "group",
"ID": 1,
"Name": "Group Atlas 3",
"StoreList": [
2,
4,
6
]
}
store_2
{
"cbType": "store",
"ID": 2,
"name": "some store name"
}
store_4
{
"cbType": "store",
"ID": 4,
"name": "another store name"
}
store_6
{
"cbType": "store",
"ID": 6,
"name": "last store name"
}
Now let's say you want to query the stores from a particular group (group_1), but include no other information about the group. You essentially want to use N1QL's UNNEST and JOIN operators.
This might leave you with a query like so:
SELECT
stores.name
FROM `bucket-name-here` AS groups
UNNEST groups.StoreList AS groupstore
JOIN `bucket-name-here` AS stores ON KEYS ("store_" || groupstore.ID)
WHERE
META(groups).id = 'group_1';
A few assumptions are made in this. Both your documents exist in the same bucket and you only want to select from group_1. Of course you could use a LIKE and switch the group id to a percent wildcard.
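For example, the last line could become something like this (a sketch of that wildcard variant):
WHERE META(groups).id LIKE 'group_%';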
Let me know if something doesn't make sense.
Best,
Try this query:
select b.Name
from bucketname a join bucketname b ON KEYS a.StoreList
where a.Name = "Group Atlas 3"
Based on your update, you can do the following:
SELECT b, s
FROM DataBoard c USE KEYS ["Group_X"]
UNNEST c.StoreList b
JOIN store_bucket s ON KEYS "Store_" || TO_STRING(b);
I have a similar requirement and I got what I needed with a query like this:
SELECT store
FROM `bucket-name-here` AS `group`
JOIN `bucket-name-here` AS store ON KEYS `group`.StoreList
WHERE `group`.cbType = 'group'
AND `group`.ID = 1

How to enter multiple table data in MongoDB using JSON

I am trying to learn MongoDB. Suppose there are two tables and they are related. For example, like this:
1st table has
First name- Fred, last name- Zhang, age- 20, id- s1234
2nd table has
id- s1234, course- COSC2406, semester- 1
id- s1234, course- COSC1127, semester- 1
id- s1234, course- COSC2110, semester- 1
How do I insert this data in MongoDB? I wrote it like this, but I am not sure whether it is correct:
db.users.insert({
given_name: 'Fred',
family_name: 'Zhang',
Age: 20,
student_number: 's1234',
Course: ['COSC2406', 'COSC1127', 'COSC2110'],
Semester: 1
});
Thank you in advance
This is assuming that what you want to model has the "student_number" and the "Semester" as what is basically a unique identifier for the entries. But there is a way to do this without accumulating the array contents in code.
You can make use of the upsert functionality in the .update() method, with the help of a few other operators in the statement.
I am going to assume you are doing this inside a loop of sorts, so everything on the right-hand side is actually a variable:
db.users.update(
{
"student_number": student_number,
"Semester": semester
},
{
"$setOnInsert": {
"given_name": given_name,
"family_name": family_name,
"Age": age
},
"$addToSet": { "courses": course }
},
{ "upsert": true }
)
What an "upsert" operation does is first look for a document in your collection that matches the given query criteria. In this case, that is a "student_number" with the current "Semester" value.
When a match is found, the document is merely "updated". So what is being done here is using the $addToSet operator in order to "update" only unique values into the "courses" array element. It seems sensible for courses to be unique, but if that is not your case you can simply use the $push operator instead. Either way, that is the operation you want to happen every time, whether the document was "matched" or not.
In the case where no "matching" document is found, a new document will then be inserted into the collection. This is where the $setOnInsert operator comes in.
So the point of that section is that it will only be called when a new document is created as there is no need to update those fields with the same information every time. In addition to this, the fields you specified in the query criteria have explicit values, so the behavior of the "upsert" is to automatically create those fields with those values in the newly created document.
After a new document is created, then the next "upsert" statement that uses the same criteria will of course only "update" the now existing document, and as such only your new course information would be added.
Overall, working like this allows you to "pre-join" the two tables from your source with an appropriate query. Then you just loop over the results, without needing to write code that tries to group the correct entries together, and simply let MongoDB do the accumulation work for you.
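That "pre-join" is plain SQL on the relational side; a sketch, with table and column names assumed from the question's description:
-- One row per (student, course) pair; the upsert loop above then folds
-- these rows back into a single MongoDB document per student and semester.
SELECT s.first_name, s.last_name, s.age, s.id, c.course, c.semester
FROM students s
JOIN courses c ON c.id = s.id;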
Of course you can always just write the code to do this yourself and it would result in fewer "trips" to the database in order to insert your already accumulated records if that would suit your needs.
As a final note, though it does require some additional complexity, you can get better performance out of the operation by using the newly introduced "batch updates" functionality. For this, your MongoDB server version will need to be 2.6 or higher. It is one way of still reducing the logic while keeping fewer actual "over the wire" writes to the database.
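A sketch of what that can look like with the Bulk API that shipped in MongoDB 2.6 (shell syntax; rows stands in for your pre-joined source data, so all the field accesses on it are assumptions):
var bulk = db.users.initializeOrderedBulkOp();
rows.forEach(function(row) {
    bulk.find({ "student_number": row.id, "Semester": row.semester })
        .upsert()
        .updateOne({
            "$setOnInsert": {
                "given_name": row.given_name,
                "family_name": row.family_name,
                "Age": row.age
            },
            "$addToSet": { "courses": row.course }
        });
});
bulk.execute();   // sends the queued operations in batches rather than one write per row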
You can either have two separate collections, one with student details and the other with courses, and link them with "id".
Or you can have a single document with the courses as an inner array of documents, as below:
{
"FirstName": "Fred",
"LastName": "Zhang",
"age": 20,
"id": "s1234",
"Courses": [
{
"courseId": "COSC2406",
"semester": 1
},
{
"courseId": "COSC1127",
"semester": 1
},
{
"courseId": "COSC2110",
"semester": 1
},
{
"courseId": "COSC2110",
"semester": 2
}
]
}
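With the embedded design, a single query returns a student together with their courses. As an illustration (the collection name is an assumption), an $elemMatch projection pulls out the course entry matching a given semester:
// $elemMatch in a projection returns only the first matching array element
db.students.find(
    { "id": "s1234" },
    { "Courses": { "$elemMatch": { "semester": 2 } } }
)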

Parsing numerical data using Prolog?

I am new to Prolog and am considering using it for a small data analysis application. Here is what I am seeking to accomplish:
I have a CSV file with some data of the following form:
a,b,c
d,e,f
g,h,i
...
The data is purely numerical and I need to do the following. First, I need to group rows according to this scheme:
I start at the first row, which has value 'a' in column one. Then I keep going down the rows until I hit a row whose value in column one differs from 'a' by a certain amount, 'z'. The process is then repeated, and many "groups" are formed after the process is complete.
For each of these groups, I want to find the mean of columns two and three (as an example, for the first group, the mean of column two would be: (b+e+h)/3).
I am pretty sure this can be done in Prolog. However, I have 50,000+ rows of data, and since Prolog is declarative, I am not sure how efficient it would be at this task.
Is it feasible to work out a Prolog program to accomplish the above, such that the efficiency of the program is not significantly lower than that of a procedural analog?
This snippet could be a starting point for your task:
:- use_module(library(dcg/basics)).

% parse 'numbers.txt', collecting the averages of each group in AveList
rownum(Z, AveList) :- phrase_from_file(row_scan(Z, [], [], AveList), 'numbers.txt').

% scan the file row by row, threading the current group and the averages so far
row_scan(Z, Group, AveSoFar, AveList) -->
    number(A), ",", number(B), ",", number(C), "\n",
    { row_match(Z, A,B,C, Group,AveSoFar, Group1,AveUpdated) },
    row_scan(Z, Group1, AveUpdated, AveList).
row_scan(_Z, _Group, AveList, AveList) --> "\n" ; [].

% row_match(Z, A,B,C, Group,Ave, Group1,Ave1)
% an empty group is (re)started with the current row
row_match(_, A,B,C, [],Ave, [(A,B,C)],Ave).
row_match(Z, A,B,C, [H|T],Ave, Group1,Ave1) :-
    H = (F,_,_),   % F is column one of the row that opened the group
    (   A - F =:= Z
    ->  % the difference reached Z: close the group, averaging columns two and three
        aggregate_all(agg(count,sum(C2),sum(C3)),
            member((_,C2,C3), [(A,B,C), H|T]), agg(Count,T2,T3)),
        A2 is T2/Count, A3 is T3/Count,
        Group1 = [], Ave1 = [(A2,A3)|Ave]
    ;   % otherwise keep accumulating rows in the current group
        Group1 = [H,(A,B,C)|T], Ave1 = Ave
    ).
With this input:
1,2,3
4,5,6
7,8,9
10,2,3
40,5,6
70,8,9
16,0,0
it yields:
?- rownum(6,L).
L = [ (3.75, 4.5), (5, 6)]