Add transaction rid, generated timestamp or any other uuid to each imported file from S3/directory - palantir-foundry

In the Foundry Data Connection app, I can apply Transformers to the files ingested from S3 or any other filesystem-like source.
Is there any Transformer available to add an identifier (e.g. current transaction_rid, a generated timestamp when the ingestion started, or a uuid) to all filenames?
We have a source that provides numbered files, which are deleted automatically and replaced with new versions, but we want to leverage an APPEND/UPDATE transaction type in the ingestion.
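For context, the only workaround I can think of so far is a downstream transforms-python step that stamps each ingested file path with a per-build timestamp and uuid. This is just a rough sketch under my own assumptions (the dataset paths and column names are made up), not a Data Connection Transformer:

```python
import uuid

from pyspark.sql import functions as F
from transforms.api import transform, Input, Output


@transform(
    out=Output("/Project/datasets/raw_files_tagged"),  # hypothetical output dataset
    raw=Input("/Project/datasets/raw_files"),          # hypothetical ingested dataset
)
def tag_files(out, raw):
    # List the files visible in the input dataset's current view
    files = raw.filesystem().files()  # columns include path, size, modified
    # Stamp every file path with the build timestamp and one uuid per build
    tagged = (
        files.withColumn("ingested_at", F.current_timestamp())
             .withColumn("ingest_uuid", F.lit(str(uuid.uuid4())))
    )
    out.write_dataframe(tagged)
```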
Thanks

Related

What is an alternative to CSV data set config in JMeter?

We want to use 100 credentials from a .csv file, but I would like to know whether there is any other alternative to this available in JMeter.
If you have the credentials in a CSV file, there is no better way of "feeding" them to JMeter than the CSV Data Set Config.
Just in case if you're still looking for alternatives:
__CSVRead() function. The disadvantage is that the function reads the whole file into memory, which might be a problem for large CSV files. The advantage is that you can choose/change the name of the CSV file dynamically (at runtime), while with the CSV Data Set Config it has to be immutable and cannot be changed once it's initialized.
JDBC Test Elements - allow fetching data (e.g. credentials) from a database rather than from a file
Redis Data Set - allows fetching data from Redis data storage
HTTP Simple Table Server - exposes a simple HTTP API for fetching data from CSV (useful in a distributed architecture when you want to ensure that different JMeter slaves use different data); this way you don't have to copy the .csv file to the slave machines and split it
There are a few alternatives:
JMeter plugin for reading random CSV data: Random CSV Data Set Config
JMeter function: __CSVRead
Reading CSV file data from a JSR223 PreProcessor (see the sketch below)
CSV Data Set Config is simple, easier to use and available out of the box.
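If you go the JSR223 route, the PreProcessor body is just a script that reads the file and puts values into JMeter variables. Groovy is the usual engine; the minimal sketch below is written in Python (Jython) only to keep all examples on this page in one language, and it assumes the Jython engine jar is on JMeter's classpath plus a hypothetical credentials.csv with username,password rows:

```python
# JSR223 PreProcessor sketch (Jython). 'vars' and 'ctx' are the standard JMeter bindings.
import csv

CSV_PATH = '/path/to/credentials.csv'  # hypothetical location

# For brevity this re-reads the file on every iteration; cache it via props if that matters.
with open(CSV_PATH, 'rb') as f:
    rows = list(csv.reader(f))

# Spread the rows across threads and iterations so samplers get different credentials
index = (ctx.getThreadNum() + vars.getIteration() - 1) % len(rows)
username, password = rows[index][0], rows[index][1]

vars.put('username', username)
vars.put('password', password)
# Samplers can now reference ${username} and ${password}
```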

Using Azure Data Factory to read only one file from blob storage and load into a DB

I'd like to read just one file from a blob storage container and load it into a DB via a copy operation, after the arrival of the file has fired a trigger.
Using the Microsoft documentation, the closest I can get is reading all the files in order of modified date.
Would anyone out there know how to read one file after it has arrived in my blob storage?
EDIT:
Just to clarify, I am looking to read only the latest file automatically, without hardcoding the filename.
You can specify a single blob in the Dataset. This value can be hard-coded or supplied via variables (using Dataset parameters).
If you need to run this process whenever a new blob is created/updated, you can use the Event Trigger.
EDIT:
Based on your addition of "only the latest", I don't have a direct solution. Normally, you could use Lookup or GetMetadata activities, but neither they nor the expression language support sorting or ordering. One option would be to use an Azure Function to determine the file to process.
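For what it's worth, here is a minimal sketch of what such an Azure Function (or a standalone script) could do with the azure-storage-blob v12 SDK; the container name and connection string are placeholders:

```python
# List the blobs in a container and pick the most recently modified one.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("incoming")  # hypothetical container

# list_blobs() yields BlobProperties objects (name, last_modified, size, ...)
blobs = list(container.list_blobs())
latest = max(blobs, key=lambda b: b.last_modified)

# Hand this name to the Copy activity, e.g. via a pipeline parameter
print(latest.name)
```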
However - if you think about the Event Trigger I mention above, every time it fires, the file (blob) is the most recent one in the folder. If you want to coalesce this across a certain period of time, something like this might work:
Logic App 1 on event trigger: store the blob name in a log [blob, SQL, whatever works for you].
Logic App 2 OR ADF pipeline on recurrence trigger: read the log to grab the "most recent" blob name.

Azure Data Factory v2 Data Transformation

I am new to Azure Data Factory. I have a requirement to move data from an on-premises Oracle and an on-premises SQL Server to Blob storage. The data needs to be transformed into JSON format, each row as one JSON file. This will then be moved to an Event Hub. How can I achieve this? Any suggestions?
You could use a Lookup activity + ForEach activity, with a Copy activity inside the ForEach. Please reference this post: How to copy CosmosDb docs to Blob storage (each doc in single json file) with Azure Data Factory
The Copy Data tool that is part of Azure Data Factory is an option for copying on-premises data to Azure.
The Copy Data tool comes with a configuration wizard where you do all the required steps, like configuring the source, the sink, the integration runtime, the pipeline, etc.
In the source you need to write a custom query to fetch the data from the tables you require in JSON format.
In the case of SQL Server, you would use FOR JSON AUTO to convert the rows to JSON (and OPENJSON to read JSON back into rows); both are supported from SQL Server 2016. For older versions you need to explore the options available. Worst case, you can write a simple console app in C#/Java to fetch the rows, convert them to JSON files, and then upload the files to Azure Blob storage (see the sketch below). If this is a one-time activity this option should work and you may not require a Data Factory.
In the case of Oracle you can use the JSON_OBJECT function.
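Purely as an illustration of the console-app route, here is the same idea sketched in Python (rather than C#/Java) with pyodbc and azure-storage-blob; the connection strings, table and container names are hypothetical:

```python
import json
import uuid

import pyodbc
from azure.storage.blob import BlobServiceClient

# Fetch rows from the on-premises SQL Server (connection string is a placeholder)
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=onprem;DATABASE=sales;UID=user;PWD=secret"
)
cursor = conn.cursor()
cursor.execute("SELECT OrderId, CustomerName, Amount FROM dbo.Orders")
columns = [col[0] for col in cursor.description]

# Upload each row as its own JSON blob (one row -> one JSON file)
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("staging")

for row in cursor.fetchall():
    doc = dict(zip(columns, row))
    blob_name = "orders/{}.json".format(uuid.uuid4())
    container.upload_blob(blob_name, json.dumps(doc, default=str))
```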

Multiple or 1 JSON File

I'm creating a questionnaire application in Qt, where surveys are created, and users log on and complete these surveys. I am saving these as JSON.
Each survey could have 60+ questions, and is completed multiple times by different people.
Is it more appropriate to save everything as one JSON file, or a file for each survey?
I would use a database rather than a JSON file. You can use JSON to serialize data and transfer it between processes, computers, or servers, but you don't want to save large amounts of data to a JSON file.
Anyway, if that's what you want to do, I would save each survey in a different JSON file. Keep them in order by assigning a unique identifier to each file (the file name) so that you can find and search for them easily.
One single file would be a single point of failure, and there would be concurrency problems when reading and writing it. One file for each survey should ease the problem.
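To illustrate the one-file-per-survey suggestion (the question is Qt/C++, but the pattern is language-agnostic, so this Python sketch only shows the naming scheme; the directory and field names are made up):

```python
import json
import uuid
from pathlib import Path

SURVEY_DIR = Path("surveys")  # hypothetical storage directory
SURVEY_DIR.mkdir(exist_ok=True)

def save_survey(answers):
    """Write one completed survey to its own uuid-named JSON file."""
    path = SURVEY_DIR / "{}.json".format(uuid.uuid4())
    path.write_text(json.dumps(answers, indent=2))
    return path

# Each completed questionnaire lands in its own file, so concurrent writes
# never touch the same file and one corrupt file cannot take down the rest.
save_survey({"respondent": "alice", "q1": "yes", "q2": 4})
```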

Import into Elasticsearch from repeatedly created JSON but store old values

I'm new to Elasticsearch and Kibana, so please bear with me. I'm not using Logstash but JSON files to import the information I need. Basically, Kibana is used for monitoring changes in values in a MySQL database over time. Right now, the transfer of information works via a script that 1) deletes the previous versions of the JSON files containing my information, 2) exports the MySQL information into JSON-format files, and 3) re-imports the newly created JSON files.
Each row of my data has a timestamp. Here comes the problem: the old versions of the information I imported are no longer reflected in Kibana (maybe because of the previous files I deleted?). Is there a way to keep the information with the old timestamps and simultaneously import the new ones?
If you were to leave the old documents in Elasticsearch and insert new ones, then you would be able to see/manage/etc. both. Elasticsearch will issue new IDs for the new documents, and all should be well.
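A minimal sketch of that approach with the official Python client (8.x-style API assumed; the index name and file layout are illustrative): index each exported row as a new document without an explicit id, so Elasticsearch generates one and the documents from earlier runs stay in place.

```python
import json

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # connection details are placeholders

# export.json is assumed to hold one JSON document per line (one per MySQL row)
with open("export.json") as f:
    for line in f:
        doc = json.loads(line)
        # No id= argument: Elasticsearch assigns a fresh _id, so documents
        # imported on previous runs are kept alongside the new ones.
        es.index(index="mysql-history", document=doc)
```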