How to update a property in CSV Data in neo4j - csv

I want to update properties in the graph by importing CSV data into Neo4j.
I have created some labels and relationships from CSV data like this:
node,name
1,leo
2,sun
3,wang
4,hi
Now I would like to add a property "descripe" to the table:
descripe
leader
pro-leader
crew
crew
How can I add this property to the graph? I only want to add this new property; I don't want to create four new labels.
Thanks

It would be useful if you posted the Cypher statements that you used to perform your initial graph creation. Without them it is difficult to know exactly how the update should be written. Essentially, though, you want to load your CSV file containing the new properties, perform a MATCH on the nodes in the graph, and then set the properties on these nodes. Assuming that you have created a schema index against your labels, it would be something like:
LOAD CSV WITH HEADERS FROM "file:///C:/temp/myfile.csv" AS csvLine
MATCH (n:`Label` { indexedproperty : csvLine.value })
SET n.newproperty = csvLine.newpropertyvalue
where Label is the label you applied to your nodes on creation, indexedproperty is the name of the property you added and indexed, csvLine.value is the lookup value of the indexed property read from the .csv file, csvLine.newpropertyvalue is the new property you wish to add (read from the .csv file).
If you post more details on your graph we can help more precisely.
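One gap worth noting: the second file in the question has no key column, so there is nothing for the MATCH to look up. Below is a minimal sketch in Python, assuming the rows of both files are in the same order (the question does not confirm this). It pairs the rows by position and produces a single CSV with a key column. The function name `attach_key_column` is made up for illustration.

```python
import csv
import io


def attach_key_column(nodes_csv: str, extra_csv: str) -> str:
    """Pair the rows of two CSV texts by position and return one combined CSV text."""
    node_rows = list(csv.DictReader(io.StringIO(nodes_csv)))
    extra_rows = list(csv.DictReader(io.StringIO(extra_csv)))
    # Assumption: row N of the second file describes row N of the first file.
    if len(node_rows) != len(extra_rows):
        raise ValueError("row counts differ; cannot pair rows by position")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["node", "descripe"])
    for node_row, extra_row in zip(node_rows, extra_rows):
        writer.writerow([node_row["node"], extra_row["descripe"]])
    return out.getvalue()


nodes = "node,name\n1,leo\n2,sun\n3,wang\n4,hi\n"
extra = "descripe\nleader\npro-leader\ncrew\ncrew\n"
print(attach_key_column(nodes, extra))
```

With a file like this, the MATCH could look up on csvLine.node and the SET could assign csvLine.descripe. Keep in mind that LOAD CSV reads every field as a string, so if the key was stored as an integer it would need converting before the comparison.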

Related

Is there a way to insert the filename into the record when importing a JSON file into Power BI?

Not sure how to ask this but here goes. I have a collection of 500+ JSON files that I need to import into PowerBI. Each JSON has four different levels of information that I need to parse out. I converted the JSON top-level info into a table and transposed it so I had one row like the attached screenshot.
My first question is: can I easily add the filename to the JSON record? I would like to use the filename as a unique key in later queries.
Thanks!
It looks like you may be connecting to each JSON file individually? If I'm correct, assuming all the JSON files can be in a single folder, you can use the "Folder" connection. That then allows you to right-click on the original folder query and choose "reference" to then create various transformations for each JSON file, and it includes the file name.
Related details:
https://powerbi.tips/2016/06/loading-data-from-folder/
https://learn.microsoft.com/en-us/power-bi/guidance/power-query-referenced-queries
Hoping that helps!
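For comparison, the same idea (read every JSON file in one folder and tag each record with the name of the file it came from) can be sketched outside Power BI in Python. This is not Power Query code. It assumes each file holds a single top-level JSON object, and the `source_file` key and the function name are made up for illustration.

```python
import json
from pathlib import Path


def load_json_folder(folder: Path) -> list:
    """Load every *.json file in a folder and tag each record with its file name."""
    records = []
    for path in sorted(folder.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            record = json.load(handle)  # assumed to be one JSON object per file
        record["source_file"] = path.name  # hypothetical key used as the unique id
        records.append(record)
    return records
```

Calling `load_json_folder(Path("some_folder"))` would return one dictionary per file, each carrying its file name, which is the role the file name column plays in the Folder connection.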

To validate the header name of each field of a csv file in azure blob storage using Azure data factory v2 pipelines

I have a scenario where the user will upload a file with some data and a header in that file. I need to process the file and make sure that the field names in the header are correct and contain no whitespace and no special characters.
E.g. the file the user dropped in the storage account contains the following header:
I need to change it to this:
How can I do this in ADF v2?
Data Factory won't really do this as is, but if this is part of a larger ETL process, you can rename the columns in a Data Flow using Select.
Source:
Add a Select node and go to the "Select settings" tab. If you know the schema, you can just fix the columns manually here:
You can also use a Rule-based mapping to remove spaces from all the column names. To do this, remove all the existing mappings and add the following:
"true()" in this context means apply to all columns, and '$$' refers to the column name. The "Inspect" tab will show the updated column names:

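The renaming rule described in this answer (apply one cleanup to every column name) can be illustrated outside Data Factory with a short Python sketch. This is not Data Flow expression syntax. The sample header names are invented, since the original screenshots are not available, and the function names are hypothetical.

```python
import re


def clean_header(name: str) -> str:
    """Drop every character that is not an ASCII letter, digit, or underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "", name)


def clean_headers(names):
    """Apply the same cleanup to every column name, like a rule that matches all columns."""
    return [clean_header(name) for name in names]


# Invented sample header, for illustration only.
print(clean_headers(["First Name", "e-mail!", " Zip Code "]))
```

Two different headers can collapse to the same cleaned name (for example "A B" and "AB"), so a real pipeline would probably also want to check the result for duplicates.
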
How can I reference a JSON source for a derived column action in Azure Data Factory

I'm new to Azure Data Factory. I've been able to generate a set of JSON files from a REST API source using a Pipeline. Each file consists of one top level JSON object with an array of up to 100 child objects. The output is saved to an Azure Blob Storage container.
I now want to use a Mapping Data Flow to modify the JSON before I write it to Azure SQL, however I'm struggling with the syntax. I've configured the source to point to the directory containing the JSON files. The Source Projection tab displays the correct schema. I can preview the data and I see a row for each file and I can expand the child objects to see the full structure.
However, when I add a Derived Column action, the Input Schema is blank in the Expression Builder. I can refer to the top level elements in the source using the byName and byPosition functions, but I don't know how I can reference the child elements.
The examples that I have been able to find online use a SQL table or CSV file as a source. I can't find any examples that use hierarchical data as the source for a derived column.
Am I missing something? Is this scenario supported?
I found a way to achieve what I want. This may not be the best approach, but it works.
It seems that it is difficult to deal with JSON that has multiple hierarchies as a source for copy data activities. You can choose one level of repeating data to map to a table structure (the Collection Reference property on the Mapping tab).
In my scenario, there was additional repeating data within the data I was mapping to my table. I updated the mapping to write the child JSON data to a text field in my SQL table. To do this, I needed to use the Azure Data Factory JSON editor for my pipeline. You can access this from the "Code" link in the top right corner of the pipeline visual editor.
I added the following line after the closing bracket for the "mappings" array for my copy activity:
"mapComplexValuesToString": true
The full path to the mapping array in the activity definition is typeProperties - translator - mappings. Make sure your commas are correct after you add the new element.
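To make the placement concrete, here is a small Python sketch that adds the flag next to the "mappings" array in an activity definition loaded as a dictionary. The activity content below is a stripped-down, hypothetical example and not a complete pipeline. Only the typeProperties - translator - mappings path and the "mapComplexValuesToString" key come from the steps above.

```python
import json


def enable_complex_to_string(activity: dict) -> dict:
    """Set mapComplexValuesToString as a sibling of the 'mappings' array."""
    translator = activity["typeProperties"]["translator"]
    translator["mapComplexValuesToString"] = True
    return activity


# Hypothetical, minimal copy activity definition for illustration.
activity = {
    "name": "CopyJsonToSql",
    "type": "Copy",
    "typeProperties": {
        "translator": {
            "type": "TabularTranslator",
            "mappings": [],
        }
    },
}

print(json.dumps(enable_complex_to_string(activity), indent=2))
```

The printed JSON shows the new key inside "translator", after the closing bracket of "mappings", which is where the line goes in the pipeline's code view.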
With this approach, I had a row in my SQL table for each array item in my Collection Reference. The scalar child elements in the array items are mapped to table columns and the child JSON element is written to a data column in the same table.
To extract the values I need within the child JSON, I created a SQL view that uses the CROSS APPLY OPENJSON syntax. This allows me to treat the JSON in the data field similar to a related table. You can specify the structure that your JSON is in. If you have nested data in your JSON, you can apply the same approach for each level.
The OPENJSON command is only supported by more recent versions of SQL Server. I'm using Azure SQL, so that works for me.

Data Factory v2 - Generate a json file per row

I'm using Data Factory v2. I have a copy activity that has an Azure SQL dataset as input and a Azure Storage Blob as output. I want to write each row in my SQL dataset as a separate blob, but I don't see how I can do this.
I see a copyBehavior in the copy activity, but that only works from a file based source.
Another possible setting is the filePattern in my dataset:
Indicate the pattern of data stored in each JSON file. Allowed values are: setOfObjects and arrayOfObjects.
setOfObjects - Each file contains single object, or line-delimited/concatenated multiple objects. When this option is chosen in an output dataset, copy activity produces a single JSON file with each object per line (line-delimited).
arrayOfObjects - Each file contains an array of objects.
The description talks about "each file", so initially I thought it would be possible, but now that I've tested them it seems that setOfObjects creates a line-separated file, where each row is written to a new line. The arrayOfObjects setting creates a file with a JSON array and adds each row as a new element of the array.
I'm wondering if I'm missing a configuration somewhere, or is it just not possible?
What I did for now is to load the rows into a SQL table and run a foreach for each record in the table. Then I use a Lookup activity to get an array to loop over in a ForEach activity. The ForEach activity writes each row to a blob store.
For Olga's documentDb question, it would look like this:
In the lookup, you get a list of the documentid's you want to copy:
You use that set in your foreach activity
Then you copy the files using a copy activity within the foreach activity. You query a single document in your source:
And you can use the id to dynamically name your file in the sink. (you'll have to define the param in your dataset too):
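The lookup-then-foreach pattern above boils down to "for each row, write one file named after its id". A minimal Python sketch of that logic follows. It writes to a local folder rather than blob storage, and the `id` key and the function name are assumptions made for illustration.

```python
import json
from pathlib import Path


def write_rows_as_files(rows, out_dir: Path, key: str = "id"):
    """Write each row to its own JSON file, named after the row's key value."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in rows:  # plays the role of the ForEach activity
        target = out_dir / f"{row[key]}.json"  # file name built from the id
        target.write_text(json.dumps(row), encoding="utf-8")
        written.append(target.name)
    return written
```

For example, `write_rows_as_files([{"id": 1, "name": "a"}], Path("out"))` would create `out/1.json` containing that single row.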

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
Eg. Using a trivial example, the original file has the following content :
OurReference,Client,Amount
235,ClientA,20000.00
236,ClientB,30000.00
The output from the flat file source is :
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to :
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is :
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem - ie which produces output based on the column names, instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one with a single flat row. This one can read the first row and tell if it's OK or not and abort.
If you want to do the work, you can take it a step further and make that flat one-field row the only connection manager, and use a script component in your flow to parse the row and assign to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so every column you need must be added to the script component's output. Whether they can be found and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. For missing columns, you could supply a default or throw an exception.
A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
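The parsing logic that the script component approach relies on (read the header row, then pick fields by name rather than by position) can be sketched in Python. This is not SSIS code; it only illustrates the lookup. The column names come from the question's example, while the function name and the default handling are assumptions.

```python
import csv
import io

REQUIRED = ["OurReference", "Client", "Amount"]


def read_by_name(text, required=REQUIRED, default=None):
    """Return one dict per data row, selecting the required columns by header name."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Map each column name to its position in this particular file.
    position = {name: index for index, name in enumerate(header)}
    rows = []
    for fields in reader:
        rows.append({
            # Missing columns fall back to the default; unexpected ones are ignored.
            name: fields[position[name]] if name in position else default
            for name in required
        })
    return rows


old_layout = "OurReference,Client,Amount\n235,ClientA,20000.00\n"
new_layout = "OurReference,ClientReference,Client,Amount\n235,A244,ClientA,20000.00\n"
print(read_by_name(old_layout) == read_by_name(new_layout))
```

Both layouts yield the same rows because the extra ClientReference column is simply never looked up.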
Best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate data in the MS Business Intelligence Studio.
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check Column names in first data row.
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // The whole input line arrives in a single column.
    string sRow = Row.Column0;
    string sManipulated = string.Empty;
    string[] columns = sRow.Split(',');
    foreach (string column in columns)
    {
        // Note: for the sake of demonstration, each field is padded to 15 characters.
        sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
    }
    Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need to have each field of same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, fields, and field sizes.
Use the file name to dynamically create the source and destination connection manager.
Use the field name and corresponding field size to decide the padding. Not sure, if you need this much flexibility. If you have any question, please respond.
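The last point (padding driven by a table of field names and sizes) could look roughly like the Python sketch below. The widths are invented for illustration, and a plain dictionary stands in for the metadata table. Like PadRight, `ljust` only pads and never truncates.

```python
# Hypothetical field sizes; in the approach above these would come from a metadata table.
FIELD_WIDTHS = {"OurReference": 12, "Client": 20, "Amount": 15}


def to_fixed_width(line, field_names, widths=FIELD_WIDTHS):
    """Pad each comma-separated value to the width configured for its field."""
    values = line.split(",")
    return "".join(
        value.ljust(widths[name]) for name, value in zip(field_names, values)
    )


row = to_fixed_width("235,ClientA,20000.00", ["OurReference", "Client", "Amount"])
print(repr(row))
```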