QlikView - Loading specific files from remote server - csv

I've been trying to solve this problem for a long time, but now I have to ask for your help.
I have one QVD file on my local PC, named e.g. server001_CPU.qvd, and on remote servers I have a shared folder with many files of many types. Among them are files named server001_CPU_YYYYMMDD.csv (e.g. server001_CPU_20140806.csv) that are generated every day and have the same structure as the local QVD file, including a DATE column. What I need is a load script that checks the last DATE in the local file, loads the remote files from that day up to today, and concatenates everything together. Something like this:
CPU:
LOAD * FROM server001_CPU.qvd
LET vMAX = Max(DATE) FROM CPU
DO WHILE vMAX <= Today()
CPU:
LOAD * FROM serverpath/server001_CPU_$(vMAX).csv
LOOP
I'm really trying, but I'm new to QlikView and its logic still feels strange to me. Thanks in advance for any help.

You can try the script snippet below, which should do what you need.
It first loads your existing data set from the QVD, then finds the maximum date and stores it in the table MaxCPUDate. This maximum value is read into a variable and the table is dropped.
This "Max Date" value is then subtracted from today's date to determine how many loop iterations are needed to load the individual files. On each iteration, the loop counter is added to the "Max Date" value to build the filename to load.
CPU:
LOAD
    *
FROM server001_CPU.qvd (qvd);

// Find the most recent DATE already present in the QVD
MaxCPUDate:
LOAD DISTINCT
    max(DATE) as MaxDate
RESIDENT CPU;

// Store the value as a plain number so it expands safely in the arithmetic below
LET vMaxCPUDate = num(peek('MaxDate', -1, 'MaxCPUDate'));

DROP TABLE MaxCPUDate;

// One iteration per day between the last loaded date and today
FOR vFileNum = 0 TO (num(Today()) - $(vMaxCPUDate))

    // Build the filename for this day, e.g. serverpath/server001_CPU_20140806.csv
    LET Filename = 'serverpath/server001_CPU_' & date($(vMaxCPUDate) + $(vFileNum), 'YYYYMMDD') & '.csv';

    CONCATENATE (CPU)
    LOAD
        *
    FROM $(Filename) (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

NEXT
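Note that the first pass of the loop (vFileNum = 0) points at the file for the day that is already present in the QVD, so that day's rows can be loaded twice. If that matters for your data, a simple tweak (an untested sketch) is to start the loop one day later:
FOR vFileNum = 1 TO (num(Today()) - $(vMaxCPUDate))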

Related

Data factory copy based off last high water mark value (Dynamic date)

I'm currently working on a project where I need the Data Factory pipeline to copy based on the last run date.
The process breakdown:
Data is ingested into a storage account
The ingested data lands in a directory structure of the form topic/yyyy/mm/dd, i.e. multiple files arrive in a single directory, so the files are partitioned by year, month and day.
The process currently filters on the last high-water-mark date, which is updated each time the pipeline runs; the pipeline triggers daily at 4 AM, and once the copy succeeds a Set Variable step increases the high-water-mark value by 1 (i.e. one day). However, files are not brought over on weekends (this is the problem).
The high-water-mark date will not increase if no files are brought over, so the pipeline keeps looping over the same date.
How do I get the pipeline to advance, or to look for the next file in that directory, given that I use the high-water-mark value as the directory path to the file, and copy and update the high-water-mark value dynamically only once the copy has completed?
Instead of adding 1 to the last high-water-mark value, you can update the watermark to the current UTC date. That way, even on days when the pipeline brings over no files, data is still copied to the correct destination folder on the next run. I have tried to reproduce this in my environment; below is the approach.
A watermark table is created initially with the watermark value '1970-01-01'.
This table is referenced in the Lookup activity.
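For reference, a minimal sketch of such a table (the names watermark_table, tab_name and watermark_value are the ones used in the update statement below; adjust to your schema):
CREATE TABLE watermark_table (
    tab_name        varchar(100),
    watermark_value datetime
);
INSERT INTO watermark_table (tab_name, watermark_value)
VALUES ('tab1', '1970-01-01');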
A Copy data activity is added, and in its source the query is given as
select * from tab1 where lastmodified > '@{activity('Lookup1').output.firstRow.watermark_value}'
In the sink, Blob storage is used. In order to get a year/month/day folder structure,
@concat(formatDateTime(utcnow(),'yyyy'),'/',formatDateTime(utcnow(),'MM'),'/',formatDateTime(utcnow(),'dd'))
is given as the folder path. (Note the uppercase 'MM' for the month; lowercase 'mm' would return minutes.)
Once the file is copied, the watermark value is updated with the current UTC date:
update watermark_table
set
watermark_value='@{formatDateTime(utcnow(),'yyyy-MM-dd')}'
where tab_name='tab1'
When the pipeline triggers the next day, data is copied starting from the stored watermark value, and once the file is copied the watermark is again updated to the current UTC date.
Reading the post a couple of times, what I understood is:
You already have watermark logic.
On weekends, when there are NO files in the folder, the current logic does NOT increment the watermark, and so you are facing issues.
If I understand the ask correctly, please use the dayOfWeek() function. Add an If condition and let the current logic execute only when the day of the week is Monday (2) to Friday (6).
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expressions-usage#dayofweek
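If you do this check in the pipeline itself (an If Condition activity) rather than in a data flow, note that the pipeline expression language also has a dayOfWeek() function, but there Sunday is 0, so Monday to Friday is 1 to 5. A minimal sketch of the condition under that assumption:
@and(greaterOrEquals(dayOfWeek(utcnow()), 1), lessOrEquals(dayOfWeek(utcnow()), 5))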

Inserting Data from Flat file source to database between 2 dates

I have an SSIS package set up that imports downloaded data files into the database (one file at a time, by date).
Current Setup (for a file):
The downloaded file is at the location below (a file exists for each date in the range 1st Feb to today):
C:\DataFiles\GeneralSale_20170201.txt
In SSIS, for each file there are 4 variables. The first is the location where the file is, called @Location.
The second simply gives the name of the file, @GeneralSales, returning the value
GeneralSale_
The third is the date (@ExportDateFormatted), whose expression is (DT_WSTR,8)(DATEPART("yyyy", @[User::ExportDate]) * 10000 + DATEPART("mm", @[User::ExportDate]) * 100 + DATEPART("dd", @[User::ExportDate])), where [ExportDate] is set as DATEADD("DD", 0, GETDATE()).
[ExportDate] lets me set the date of the (already downloaded) file that I want to import into my table dbo.GeneralSale, i.e. if I want to import the file for 20170205, I adjust the export date and then run the package.
The final variable is @ExportFileExtension, returning the value
txt
Then, in the Data Flow:
The Flat File Source connects to the connection manager below. The connection manager's Properties > Expressions > ConnectionString combines the variables to build the file name. This is where I use the variables from before:
@[User::Location] + @[User::GeneralSales] + @[User::ExportDateFormatted] + "." + @[User::ExportFileExtension]
Returning the value:
C:\DataFiles\GeneralSale_20170201.txt
This then populates the table with the data from that file. But to insert the data for another day, I have to amend the date and run the package again.
What I am trying to do is pass a start and end date to let the package insert all data from the files between those dates.
I hope the above makes it clear what goes on and what I am trying to achieve.
You need to iterate between two dates. In SSIS it's pretty straightforward; these are the main steps:
Define two package parameters, StartDate and EndDate, of type Date, and at package start validate that StartDate <= EndDate.
Define a Date variable ExtrDate, and add a For Loop with InitExpression @ExtrDate = @StartDate, EvalExpression @ExtrDate <= @EndDate and AssignExpression @ExtrDate = DATEADD("dd", 1, @ExtrDate). The purpose of this loop is quite clear (see the expression sketch after these steps).
Put your extraction tasks inside the For Loop container.
The ExtrDate variable is incremented on each iteration of the loop.
Package parameters allow you to build a more flexible package.
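A minimal sketch of the relevant expressions (parameter and variable names are assumptions; the date-formatting part simply reuses the logic from the question with ExtrDate in place of ExportDate):
For Loop container:
InitExpression:   @[User::ExtrDate] = @[$Package::StartDate]
EvalExpression:   @[User::ExtrDate] <= @[$Package::EndDate]
AssignExpression: @[User::ExtrDate] = DATEADD("dd", 1, @[User::ExtrDate])
ExportDateFormatted expression (now driven by ExtrDate):
(DT_WSTR,8)(DATEPART("yyyy", @[User::ExtrDate]) * 10000 + DATEPART("mm", @[User::ExtrDate]) * 100 + DATEPART("dd", @[User::ExtrDate]))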

Running a thread group multiple times for all the values in a csv file

I have recorded a series of 5 HTTP requests in a thread group (say TG). The response value of each request has to be sent as a parameter in the next request, and so on until the last request is made.
To send the parameter in the first request, I have created a CSV file with unique values (say 1, 2, 3, 4, 5).
Now I want this TG to run for all the values read from the CSV file (in the above case, the TG should run for value 1, then value 2, and so on up to 5).
How do I do this?
Given your CSV file looks like:
1
2
3
4
5
In the Thread Group set Loop Count to "Forever"
Add CSV Data Set Config element under the Thread Group and configure it as follows:
Filename: if the file is in JMeter's bin folder, the file name only; otherwise, the full path to the CSV file
Variable Names: anything meaningful, e.g. parameter
Recycle on EOF - false
Stop thread on EOF - true
Sharing mode - according to your scenario
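With Variable Names set to parameter, the first HTTP Request sampler can then reference the current CSV value as ${parameter}; on each loop iteration the thread picks up the next line from the file.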
See the Using CSV DATA SET CONFIG guide for a more detailed explanation.
Another option is to use the __CSVRead() function.
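For example, assuming the file is called values.csv and has a single column, ${__CSVRead(values.csv,0)} returns the value from the first column of the current row, and ${__CSVRead(values.csv,next)} advances to the next row.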
This method of creating an individual request for each record will not scale to many records. There is a more scalable solution here - Jmeter multiple executions for each record from CSV

How to select Multiple CSV files based on date and load into table

I receive input files daily in a folder called INPUTFILES. The filenames include a datetime.
My package is scheduled to run every day. If I receive 2 files for the day, I need to fetch those 2 files and load them into the table.
For example, I had these files in my folder:
test20120508_122334.csv
test20120608_122455.csv
test20120608_014455.csv
Now I need to process the files test20120608_122455.csv and test20120608_014455.csv, which belong to the same day.
I solved the issue. I created a variable that checks whether a file exists for that particular day.
If a file exists for that day, the variable is set to 1.
A Foreach Loop container is used, and this file-exists variable is placed inside the container.
For Loop properties:
EvalExpression: @fileexists == 1
If no file exists for that particular day, the loop fails.
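To make the loop pick up only the current day's files, one option (a sketch; the test prefix and setting the enumerator's FileSpec through an expression are assumptions based on the filenames above) is to build the file mask from today's date:
"test" + (DT_WSTR,4)YEAR(GETDATE()) + RIGHT("0" + (DT_WSTR,2)MONTH(GETDATE()), 2) + RIGHT("0" + (DT_WSTR,2)DAY(GETDATE()), 2) + "_*.csv"
For 8 June 2012 this evaluates to test20120608_*.csv, which matches both of that day's files.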

Pre-process (classic) ASP page

I am running a classic vbscript ASP site with a SQL2008 database. There are a few pages that are processor-heavy, but don't actually change that often. Ideally, I would like the server to process these once a night, perhaps into HTML pages, that can then fly off the server, rather than having to be processed for each user.
Any ideas how I can make this happen?
The application itself works very well, so I am not keen to rewrite the whole thing in another scripting language, even if classic asp is a bit over the hill!!
Yes:
You didn't specify which parts of the pages are "processor heavy", but I will assume it's the querying and processing of the SQL data. One idea is to retrieve the data and store it as a cached file in the filesystem. XML is a good choice for the data format.
Whereas your original code is something like this:
(pseudocode)
get results from database
process results to generate html file
...your modified code can look like this:
check if cache file exists
if not exist
get results from database
store results in cache file
get results from cache file
process results to generate html file.
This is a general caching approach and can be applied to a situation where you've got query parameters determining the output. Simply generate the name of the cache file based on all the constituent parameters. So if the results depend on query parameters named p1 and p2, then when p1 and p2 have the values 1234 and blue respectively, the cache file might be named cache-1234-blue.xml. If you have 5 distinct queries, you can cache them as query1-1234-blue.xml, query2-1234-blue.xml and so on.
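For example, a minimal sketch of building that cache filename in VBScript (p1 and p2 are the assumed parameter names):
cacheFileName = "cache-" & Request.QueryString("p1") & "-" & Request.QueryString("p2") & ".xml"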
You need not do this "nightly". You can include in your code a cache lifetime, and in place of the "if cache file exists" test, use "if cache file exists and is fresh". To do that just get the last modified timestamp on the cache file, and see if it is older than your cache lifetime.
Function FileOlderThan(fname, age)
    'Returns True if the file is older than the age, specified in minutes.
    Dim LastModified, FSO, DateDifference
    Set FSO = CreateObject("Scripting.FileSystemObject")
    LastModified = FSO.GetFile(fname).DateLastModified
    DateDifference = DateDiff("n", LastModified, Now())
    'Assign the result to the function name so the value is actually returned
    If DateDifference > age Then
        FileOlderThan = True
    Else
        FileOlderThan = False
    End If
    Set FSO = Nothing
End Function
fname = Server.MapPath(".") & "\" & cacheFileName
If FileOlderThan(fname, 10) Then
    ' ... retrieve fresh data ...
End If
This could be 10 minutes, 10 hours, 10 requests, whatever you like.
I said above that XML is a good choice for the data format in the cache file. ADO recordsets can be persisted as XML via the Save method with the adPersistXML format, and you can also generate XML directly from SQL 2008 by appending a FOR XML clause to the query.
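A minimal sketch of persisting a recordset to the cache file this way (rs is assumed to be an open ADODB.Recordset; note that Save will not overwrite an existing file, so delete any stale copy first):
Const adPersistXML = 1
rs.Save Server.MapPath(".") & "\" & cacheFileName, adPersistXML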
If the "processor heavy" part is not the query and retrieval, but is the generation of the html page, then you could apply the same sort of approach, but simply cache the html file directly.