Identifying AEM Content Fragments related to a particular field value updated in a CSV

When a Content Fragment is updated by drawing values from a linked CSV file, is there a way to identify only those CFs that are related to the particular field value updated in the CSV file?
Suppose we update only the income for Year1 in a sample CSV file (row: Income, column: Year1). This CSV may be linked to a number of CFs, but I only want to find the CFs containing the particular field value that was updated, i.e. the income for Year1.
I initially thought this would trigger an event for the updated CF, as it would for any other update made directly to a CF, but in this case no event is triggered, possibly because the values are drawn from an updated CSV file.
What would be a possible solution to this problem?

There is more than one way; in general, it depends on how you upload the CSV file to AEM.
You can implement an event listener to monitor changes to the CSV file. When it is updated, you can take the path from the event data, parse the contents, and update the affected CFs.
If you are updating the CSV file via the Sling POST Servlet, you can instead implement org.apache.sling.servlets.post.PostOperation and put the logic there.
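As a rough sketch of the listener approach, assuming the CSV is stored as a DAM asset (the package name, DAM folder path, and class name below are placeholders, not part of any AEM API):

package com.example.core.listeners; // hypothetical package

import java.util.List;

import org.apache.sling.api.resource.observation.ResourceChange;
import org.apache.sling.api.resource.observation.ResourceChangeListener;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Listens for changes under an example DAM folder and reacts when a CSV asset changes.
@Component(service = ResourceChangeListener.class, property = {
        ResourceChangeListener.PATHS + "=/content/dam/my-project/data",   // example folder
        ResourceChangeListener.CHANGES + "=ADDED",
        ResourceChangeListener.CHANGES + "=CHANGED"
})
public class CsvChangeListener implements ResourceChangeListener {

    private static final Logger LOG = LoggerFactory.getLogger(CsvChangeListener.class);

    @Override
    public void onChange(List<ResourceChange> changes) {
        for (ResourceChange change : changes) {
            String path = change.getPath();
            if (path.endsWith(".csv")) {
                LOG.info("CSV updated: {} ({})", path, change.getType());
                // From here: open the asset with a service ResourceResolver, parse the CSV,
                // diff it against the previously stored version to find the changed cells
                // (e.g. Income/Year1), and query for the Content Fragments that reference them.
            }
        }
    }
}

If the CSV instead arrives through the Sling POST Servlet, the same parse-and-diff logic could live in a custom PostOperation.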

Related

Is there a way to insert the filename into the record when importing a JSON file into Power BI?

Not sure how to ask this, but here goes. I have a collection of 500+ JSON files that I need to import into Power BI. Each JSON has four different levels of information that I need to parse out. I converted the JSON top-level info into a table and transposed it so I had one row, as in the screenshot below.
[screenshot: the top-level JSON converted to a table and transposed into a single row]
My first question is: can I easily add the filename to the JSON record? I would like to use the filename as a unique key in later queries.
Thanks!
It looks like you may be connecting to each JSON file individually? If so, and assuming all the JSON files can sit in a single folder, you can use the "Folder" connector. That then allows you to right-click the original folder query and choose "Reference" to create the various transformations for each JSON file, and the folder query includes the file name.
Related details:
https://powerbi.tips/2016/06/loading-data-from-folder/
https://learn.microsoft.com/en-us/power-bi/guidance/power-query-referenced-queries
Hoping that helps!

Source File Connection (Flat File) - Not reading column metadata

When I create the SSIS package, it requires a file to be referenced to pick up the file's metadata. For example, the column headers will be ColumnA, ColumnB.
I have always assumed that these column names need to be present in the file for it to be loaded. Recently the business, for whatever reason, changed one of the column names in the file, so the file now contains ColumnA, NotColumnB. When the SSIS package runs it ignores this and loads the file. I assumed that it would fail. Is my assumption correct and something odd is going on, or is my assumption incorrect? If so, please let me know why.
I have changed the column names in a few other packages that load data from a file, and they also don't care what the column names are.
Click on the Flat File Source and press F4 to show the Properties pane. There is a property called ValidateExternalMetadata; change it to True.
For more information check the following answer:
Detect new column in source not mapped to destination and fail in SSIS
Update 1
It looks like the flat file connection manager has no validation engine; the metadata defined at configuration time is only used to configure the mappings between the data file and the database.
Why Does't SSIS Flat File Data Check If Columns Names or Order Have Changed? What is best way to check?
Flat file destination columns data types validation

Metadata map for importing CSV data into IPTC XMP images using Bridge

Let's say I have 100 scanned Tif files. I also have a CSV of the metadata for those 100 Tif files. Each file is named with its unique identifier, which is also column 1 of the CSV.
First: How do I find a map that tells me what each column should be named in order to stay within the IPTC standard using XMP? (I've googled for most of the day and have found nothing.)
Second: How can I merge the metadata in the CSV to each corresponding image?
I'm basically creating a spreadsheet with all 50,000 images in an archival collection, and plan to use the CSV to create the metadata for the images once they're scanned.
Thanks!
To know where to put your metadata, I'd suggest looking at the IPTC Photo Metadata Standard page. Without knowing more about your data, it's hard for someone else to say what data should go where.
As for embedding your data into your files from a CSV file, I'd suggest exiftool. Change the header of each column to the name of the tag to write to and make the first column the path/filename of each file; your command would then be as simple as
exiftool -csv=file.csv /path/to/files
See exiftool FAQ #26 for more details.
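For illustration only (the tag names and filenames here are made up; check the IPTC mapping for the fields you actually need), the CSV might look like this, with a SourceFile column giving each file's path as exiftool will see it:

SourceFile,Title,Creator,Description
scan_0001.tif,Portrait of J. Smith,Archive Dept.,Scanned from glass plate negative
scan_0002.tif,Main Street looking north,Archive Dept.,Scanned from silver gelatin print

Running the command above against the folder containing the files would then write each row's values into the matching image.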

How to Write the File and File Path to table

I have an SSIS package in which, within a For Loop Container, I look in a particular location for a particular file format and import it into a database.
This is working fine: when I have two files, the contents of both files are imported.
I have a Variable Mapping under my loop which records the fully qualified name. What I want is that, when I import each file, I also record the file path it came from.
I'm unsure where I would put that in my data flow task. Under the data flow I have my source file and a destination.
I tried having a SQL task after the data flow that updated the field in the database with the variable (via Parameter Mapping), but that set the field to the same value for every row (the last file path found), which is not what I'm after.
Any advice would be welcome
In your data flow task, between your source and destination, add a Derived Column transformation. This will add columns to your dataset with a name and value that you specify. If you reference the variable in which your loop container stores the file name, the name of the file currently being processed will be added as an extra column in your dataset. Obviously, you need to make sure that this column is present in your destination table.
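As a rough sketch of the Derived Column configuration (the variable and column names are placeholders; use whatever your loop's Variable Mapping actually populates):

    Derived Column Name:  SourceFilePath
    Derived Column:       <add as new column>
    Expression:           @[User::FileName]

Then map SourceFilePath to the corresponding column in the destination table, and each row will carry the path of the file it was loaded from.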

How to create an SSIS package which creates three text files, using the same variables, where each text file is only created when the corresponding data is found?

There are only 3 files that can be created: "File_1", "File_2" and "File_3". The same variable names are used in each instance (User::FileDirectory and User::File_name), but because the actual value of the variable changes, a new file is created. However, the files are only created if there is data to go into them; i.e. if there are no records to populate a file, it will not be created at all. When the files are created, the date the file was created should also be added to the filename, e.g. File1_22102011.txt.
OK, if the above was a little confusing, the following is how it works:
All the files use the same variable, but it is reset before each file is created.
• So it populates a result set in memory with the first SQL selection (ID number, First_Name and Main_Name). It sets the file variable to "File_1". If there are records in the result set, it creates and writes to this filename.
• Then it creates a new result set with the second selection (Contract No). It sets the variable to "File_2". If there are records in this new result set, a new file will be created from the variable (which now has a new value).
• Finally, a third result set is created (Contract_no, ExperianNo, Entity_ID_Number, First_Name, Main_Name), and the file variable is set to "File_3". Again, if there are records in the result set, then this file will be created and written to.
I have worked on a few methods to achieve this but they have all failed, so a little help would be greatly appreciated.
While what you have works, I think it'd be rather painful to maintain.
I would approach it as 3 sequence containers running in parallel. Each container would have a data flow and two file tasks hanging off it, based on the success of the parent and the value of a row-count variable. If the row-count variable is 0, delete the file; if it's greater than 0, rename it to File_n.
For example, in the container for the first file, the data flow creates an output file a.txt. Based on the value of the variable RowCount1, it will either delete the empty file or rename it to File_1.
Each data flow would consist of a source query, a Row Count transformation, and a flat file destination with a temporary name (a.txt, b.txt, c.txt). As a file is always created even if it's empty, it needs to be deleted or renamed afterwards, which is handled by the file operation tasks.
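A rough sketch of the wiring for the first file (the task and variable names are examples only):

    Data Flow "Load File 1"
        --> File System Task "Delete a.txt"
            precedence constraint: Success AND @[User::RowCount1] == 0
        --> File System Task "Rename a.txt to File_1"
            precedence constraint: Success AND @[User::RowCount1] > 0
            (the destination name can be built with an expression that appends the date, e.g. File1_22102011.txt)

The other two containers repeat the same pattern with their own temporary file and row-count variable.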
In my opinion, this approach will be cleaner, as it allows you to test and debug each item in isolation rather than dealing with an in-memory dataset.