Create a SAS function that takes a dataset as input and returns a dataset as output

I apply the same 10-step transformation to multiple datasets. Let's call this transformation flag_price_change.
It takes a dataset and a threshold (a real number) as input and creates 10 intermediate datasets to arrive at a final one with some added columns.
Since I process multiple tables the same way, I would like to know whether I can create a function like this in SAS:
flag_price_change(input_table,column_name1,column_name2,threshold,output_table).
Here column_name1 and column_name2 are the names of the columns the algorithm focuses on, and output_table is the table created once flag_price_change has run.
Questions:
What's the procedure to define such a function?
Can I store it in a separate SAS file?
How do I call this function from another SAS program?

SAS functions are for individual observations of data. What you want is a macro (check out a starter guide here), which is defined like this:
%macro flag_price_change(input_table, column_name1, column_name2, threshold, output_table);
    /** Inside the macro, you can refer to each parameter/argument
        with an ampersand in front of it. So for example, to add
        column_name1 to column_name2, you would do the following:
    **/
    DATA &output_table;
        set &input_table;
        new_variable = &column_name1 + &column_name2;
    RUN;
%mend;
To call the macro, you would do this:
%flag_price_change(
input_table = data1,
column_name1 = var1,
column_name2 = var2,
threshold = 0.5,
output_table = output1);
To call the same code on another data set with different variable names and threshold:
%flag_price_change(
input_table = data2,
column_name1 = var3,
column_name2 = var4,
threshold = 0.25,
output_table = output2);
There are a lot of tricks and catches with macro programming to be aware of, so do check your work at each step.

Django bulk update setting each to different values? [duplicate]

I'd like to update a table with Django - something like this in raw SQL:
update tbl_name set name = 'foo' where name = 'bar'
My first attempt is something like this, but that's nasty, isn't it?
list = ModelClass.objects.filter(name = 'bar')
for obj in list:
    obj.name = 'foo'
    obj.save()
Is there a more elegant way?
Update:
Django 2.2 added a bulk_update method.
Old answer:
Refer to the following django documentation section
Updating multiple objects at once
In short you should be able to use:
ModelClass.objects.filter(name='bar').update(name="foo")
You can also use F objects to do things like incrementing rows:
from django.db.models import F
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
See the documentation.
However, note that:
This won't call the ModelClass.save() method (so any logic inside it won't be triggered).
No Django signals will be emitted.
You can't call .update() on a sliced QuerySet; it must be called on an unsliced QuerySet, so you'll need to lean on the .filter() and .exclude() methods.
Consider using django-bulk-update found here on GitHub.
Install: pip install django-bulk-update
Implement: (code taken directly from the project's README file)
import random
from bulk_update.helper import bulk_update
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    r = random.randrange(4)
    person.name = random_names[r]
bulk_update(people) # updates all columns using the default db
Update: As Marc points out in the comments, this is not suitable for updating thousands of rows at once, though it is suitable for smaller batches of tens to hundreds. The batch size that is right for you depends on your CPU and query complexity. This tool is more like a wheelbarrow than a dump truck.
Django 2.2 added a bulk_update method (release notes).
https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update
Example:
from django.db import transaction
# get a pk: record dictionary of existing records
updates = YourModel.objects.filter(...).in_bulk()
....
# do something with the updates dict
....
if hasattr(YourModel.objects, 'bulk_update') and updates:
    # Use the new method
    YourModel.objects.bulk_update(updates.values(), [list the fields to update], batch_size=100)
else:
    # The old & slow way
    with transaction.atomic():
        for obj in updates.values():
            obj.save(update_fields=[list the fields to update])
If you want to set the same value on a collection of rows, you can use the update() method combined with any query term to update all rows in one query:
some_list = ModelClass.objects.filter(some condition).values('id')
ModelClass.objects.filter(pk__in=some_list).update(foo=bar)
If you want to update a collection of rows with different values depending on some condition, the best you can do is batch the updates by value. Say you have 1000 rows where you want to set a column to one of X values: you could prepare the batches beforehand and then run only X update queries (each essentially having the form of the first example above) plus the initial SELECT query.
If every row requires a unique value there is no way to avoid one query per update. Perhaps look into other architectures like CQRS/Event sourcing if you need performance in this latter case.
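The batching idea above can be sketched in plain Python. This is a minimal illustration, not Django's own API: the helper name group_ids_by_value and the sample data are hypothetical, and the Django call appears only as a comment because it needs a configured project to run.

```python
from collections import defaultdict


def group_ids_by_value(new_values):
    """Group primary keys by the value each row should receive.

    new_values maps pk -> target value (hypothetical input for illustration).
    """
    batches = defaultdict(list)
    for pk, value in new_values.items():
        batches[value].append(pk)
    return dict(batches)


# Hypothetical data: five rows, but only two distinct target values.
new_values = {1: "foo", 2: "bar", 3: "foo", 4: "foo", 5: "bar"}
batches = group_ids_by_value(new_values)

for value, pks in batches.items():
    # With Django, each batch would become one UPDATE query, e.g.:
    # ModelClass.objects.filter(pk__in=pks).update(name=value)
    print(value, pks)
```

Here five rows collapse into two update queries, one per distinct value, instead of five separate saves.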
Here is some useful content I found online regarding the above question:
https://www.sankalpjonna.com/learn-django/running-a-bulk-update-with-django
The inefficient way
model_qs = ModelClass.objects.filter(name = 'bar')
for obj in model_qs:
    obj.name = 'foo'
    obj.save()
The efficient way
ModelClass.objects.filter(name = 'bar').update(name="foo") # for single value 'foo' or add loop
Using bulk_update
update_list = []
model_qs = ModelClass.objects.filter(name = 'bar')
for model_obj in model_qs:
    model_obj.name = "foo" # Or whatever the value is; for simplicity I'm using foo only
    update_list.append(model_obj)
ModelClass.objects.bulk_update(update_list,['name'])
Using an atomic transaction
from django.db import transaction
with transaction.atomic():
    model_qs = ModelClass.objects.filter(name = 'bar')
    for obj in model_qs:
        # Each save() still issues its own query, but all of them run in one transaction
        obj.name = 'foo'
        obj.save()
To update with same value we can simply use this
ModelClass.objects.filter(name = 'bar').update(name='foo')
To update with different values
obj_list = ModelClass.objects.filter(name = 'bar')
obj_to_be_update = []
for obj in obj_list:
    obj.name = "Dear "+obj.name
    obj_to_be_update.append(obj)
ModelClass.objects.bulk_update(obj_to_be_update, ['name'], batch_size=1000)
bulk_update does not call save() on each object, so no save signals are sent; the modified objects are collected in a list and written to the database in batches.
update() returns the number of rows it matched:
update_counts = ModelClass.objects.filter(name='bar').update(name="foo")
Refer to this link for more information on bulk update and create:
Bulk update and Create

How to self-reference the variable to the left in an assignment in GNU Octave?

Say you need to store the name of all local variables in a script: local = union(who,"local"). Is there a function to replace the name ("local") with a reference to the variable which is to the left of the expression?
Update: For the sake of clarity, the statement would be rewritten as local = union(who, leftside()), where the leftside function returns local.
Is there a sort of leftside() function with this behavior?
Do you mean like this?
>> a = "test"
a = test
>> b = "rest "
b = rest
>> a = union(b, a)
a = erst
>> help union
'union' is a function from the file C:\Octave\Octave-4.0.3\share\octave\4.0.3\m\set\union.m
-- Function File: C = union (A, B)
-- Function File: C = union (A, B, "rows")
-- Function File: [C, IA, IB] = union (...)
Return the unique elements that are in either A or B sorted in
ascending order.
Update:
I'm not aware of such a function. There are a bunch of built-in functions for internal use with names like __func_name__; you can list them by typing __ in the Octave command window and pressing Tab twice. At a glance I don't think any of them does what you need, but take a more detailed look if you like.
Can you describe why you need this behavior? There may be a different approach that suits your needs; for example, there are functions for creating variables programmatically, such as genvarname.

postgres crosstab query with $libdir/tablefunc crosstab_hash function

My crosstab query (see below) runs just fine. However, I have to generate a large number of such queries, and, crucially, the number of column definitions will vary from day to day. If the number of output column definitions does not match the number of categories returned by the second argument of crosstab, the query will throw an error and abort. Therefore, I cannot hard-wire the column definitions as in my current query; I need instead a function that keeps the column definitions synchronized on the fly. Is it possible to write a generic Postgres function that is reusable in all such instances? Here is my query:
SELECT *
FROM crosstab
('SELECT
    to_char(ipstimestamp, ''mon DD HH24h'') As row_name,
    ips.objectid::text As category,
    COUNT(*)::integer As value
  FROM loggingdb_ips_boolean As log
  INNER JOIN IpsObjects As ips
    ON log.Varid=ips.ObjectId
  WHERE (( log.varid = 37551)
    OR (log.varid = 27087)
    OR (log.varid = 29469)
    OR (log.varid = 50876)
    OR (log.varid = 45096)
    OR (log.varid = 54708)
    OR (log.varid = 47475)
    OR (log.varid = 54606)
    OR (log.varid = 25528)
    OR (log.varid = 54729))
  GROUP BY to_char(ipstimestamp, ''yyyy MM DD HH24h''), row_name, objectid, category
  ORDER BY to_char(ipstimestamp, ''yyyy MM DD HH24h''), row_name, objectid, category',
 'SELECT DISTINCT varid
  FROM loggingdb_ips_boolean ORDER BY 1;'
)
As CountsPerHour(row_name text,
    "25528" integer,
    "27087" integer,
    "29469" integer,
    "37551" integer,
    "45096" integer,
    "54606" integer,
    "54708" integer,
    "54729" integer)
PS: Note that this query can be run against test data at the following server:
host: bellariastrasse.com
database: IpsLogging
user: guest
password: guest
I am afraid what you want is not completely possible. If the return type varies, you can either
create a function returning a generic SETOF record.
But then you'd have to provide a column definition list with every call - bringing you right back to where you started.
create a new function with a matching return type for every different case.
But that's what you are trying to avoid ...
If you have to write "a large number of such queries", you could use a query-generator function instead, which would return not the results but the SQL script, which you would then execute in a second step. Basically, a function that takes the variable parts as parameters and generates the query string from your example, declared with RETURNS text.
This can get pretty complex. Several meta-levels on top of each other have to be considered, but it is absolutely possible. Be sure to make heavy use of dollar-quoting to keep the quoting madness at bay.
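As a rough sketch of the query-generator idea, the function below builds the crosstab query text on the client side in Python rather than in a Postgres function with RETURNS text, which is a swap made here for illustration. The name build_crosstab_query, the alias ct, the tags $src$ and $cat$, and the simplified source query are all assumptions; only the table and column names come from the question. It assumes the category names contain no double quotes and are listed in the same order the category query returns them.

```python
def build_crosstab_query(source_sql, category_sql, categories, value_type="integer"):
    """Return crosstab query text whose column list matches `categories`."""
    # One output column per category; quoted because the names are numeric.
    column_defs = ",\n    ".join(f'"{c}" {value_type}' for c in categories)
    # Dollar-quoting ($src$...$src$) avoids doubling every single quote.
    return (
        "SELECT *\n"
        "FROM crosstab(\n"
        f"  $src${source_sql}$src$,\n"
        f"  $cat${category_sql}$cat$\n"
        ") AS ct(\n"
        "    row_name text,\n"
        f"    {column_defs}\n"
        ");"
    )


# Hypothetical subset of the varids from the question, sorted to match ORDER BY 1.
varids = sorted([37551, 27087, 29469])
id_list = ", ".join(str(v) for v in varids)

query = build_crosstab_query(
    source_sql=(
        "SELECT to_char(ipstimestamp, 'mon DD HH24h'), varid::text, COUNT(*)::integer "
        f"FROM loggingdb_ips_boolean WHERE varid IN ({id_list}) "
        "GROUP BY 1, 2 ORDER BY 1, 2"
    ),
    category_sql=(
        f"SELECT DISTINCT varid FROM loggingdb_ips_boolean WHERE varid IN ({id_list}) ORDER BY 1"
    ),
    categories=varids,
)
print(query)
```

Because the column list and the category query are derived from the same list of ids, the two stay in sync no matter how many ids there are on a given day. The generated text is then executed as a second step.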

How to update an SQL table with F#

I would like to update a MySQL table from an F# query.
Basically, I am now able to import a MySQL table into an F# matrix.
I am doing some calculations on this matrix in F#, and afterwards I would like to update the original MySQL table with the new values.
For example let's take a simple matrix coming from a MySQL table :
let m = matrix [[1.;2.;3.];[1.;1.;3.];[3.;3.;3.]]
Now I would like to design a query which updates the MySQL table.
let query() =
    seq { use conn = new SqlConnection(connString)
          do conn.Open()
          // Placeholder: the value should be the first column of the matrix
          use comm = new SqlCommand("UPDATE MyTAble SET ID = <the first column of the matrix>",
                                    conn)
        }
Do I need to convert the matrix into a sequence?
For this iterative update, do I need to use T-SQL? (I found this approach by reading some other answers.)
There is no built-in functionality that would make it easier to update F# matrix in a database, so you'll have to write the update yourself. I think the best approach is to simply iterate over the matrix using for and run the update command if the value has changed. Something like:
let m1 = matrix [[1.;0.]; [0.;2.]] // The original value from DB
let m2 = m1 + m1 // New updated matrix
for i in 0 .. (fst m1.Dimensions) - 1 do
    for j in 0 .. (snd m1.Dimensions) - 1 do
        if m1.[i, j] <> m2.[i, j] then
            // Here you need to do the actual update
            printfn "SET %d, %d = %f" i j m2.[i, j]
Your function already shows how to run a single UPDATE command - just note that you don't need to wrap it in seq { .. } if you're not returning a sequence of results (here, you just want to perform some action).
If the number of changed values is big, then it may be useful to use some bulk update functionality (so that you don't run too many SQL commands which may be slow), but I'm not sure if MySQL has anything like that.
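To make the "actual update" step concrete, here is a small Python sketch of the same idea: find the changed cells, then send them as one batch of parameterized UPDATE statements. It uses an in-memory SQLite database in place of MySQL so that it runs anywhere, and the table layout matrix_cells(row_idx, col_idx, value) is an assumption, not something from the question.

```python
import sqlite3


def changed_cells(original, updated):
    """Yield (row, col, new_value) for every cell that differs."""
    for i, (old_row, new_row) in enumerate(zip(original, updated)):
        for j, (old, new) in enumerate(zip(old_row, new_row)):
            if old != new:
                yield i, j, new


m1 = [[1.0, 0.0], [0.0, 2.0]]               # original values from the DB
m2 = [[2 * v for v in row] for row in m1]   # new values (same as m1 + m1)
changes = list(changed_cells(m1, m2))

# Hypothetical table layout: one row per matrix cell.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matrix_cells ("
    "row_idx INTEGER, col_idx INTEGER, value REAL, "
    "PRIMARY KEY (row_idx, col_idx))"
)
conn.executemany(
    "INSERT INTO matrix_cells VALUES (?, ?, ?)",
    [(i, j, v) for i, row in enumerate(m1) for j, v in enumerate(row)],
)

# Only the changed cells are written, all inside one transaction.
with conn:
    conn.executemany(
        "UPDATE matrix_cells SET value = ? WHERE row_idx = ? AND col_idx = ?",
        [(v, i, j) for i, j, v in changes],
    )

stored = conn.execute(
    "SELECT value FROM matrix_cells ORDER BY row_idx, col_idx"
).fetchall()
print(changes, stored)
```

Only the two diagonal cells differ, so only two UPDATE statements are executed; the unchanged zeros are never touched.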

Using transactions and parameters in ADO in VBScript

I'm a bit stuck with parameters and transactions in ADO, in VBScript and Access. Basically, I'm working through a massive loop and writing the results to a database, so I need to wrap it in a transaction otherwise it takes ages.
I've written the below script which works for a single parameter, (although this seems a bit of a long way of doing it, so if anyone knows a shorter way, please shout). However I can't work out how to expand this to two parameters:
objConn.BeginTrans
set oParm = CreateObject("ADODB.Parameter")
oParm.Value = ""
oParm.Type = 200
oParm.Direction = 1
oParm.Size = 100
Set oCmd = CreateObject("ADODB.Command")
oCmd.ActiveConnection = objConn
oCmd.commandText = "INSERT INTO table (field) VALUES (?)"
oCmd.commandType = 1
oCmd.Parameters.Append oParm
'Big loop here that goes through lots of lines.
oCmd.Execute ,"Field",1
'Loop
objConn.CommitTrans
For example, if I wanted to expand this to:
oCmd.commandText = "INSERT INTO table (field1, field2) VALUES (?,?)"
I can't figure out what I do with my parameters. I'm sure I'm just being stupid here and not quite following how these work.
I've never tried passing parameter values through the Execute method, so I can't quite say what's wrong. I will say that the documentation states that the second argument should be an array of values, so maybe if you tried Array("Field1Val", "Field2Val"), that would work.
What I usually do is give each parameter a name, so that you can reference it within your loop to change its value. You can use any name you like, as long as each parameter has a unique name. As an example:
' Sometime before your loop
oParm.Name = "foobar"
' Start loop
oCmd.Parameters("foobar").Value = "someValue"
oCmd.Execute , , 1
' End loop
As far as shortening the code, the only suggestion I can make is using the CreateParameter method to, well, create the parameter. That will allow you to set all the relevant properties on one line.
Set oParm = oCmd.CreateParameter("foobar", 200, 1, 100)
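The same pattern, a statement with two placeholders executed repeatedly inside one transaction, can be sketched in Python. This uses the standard sqlite3 module with an in-memory database rather than ADO and Access, so it illustrates the idea and is not a translation of the ADO calls; the table name results and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (field1 TEXT, field2 TEXT)")

# Hypothetical stand-in for the values produced by the big loop.
rows = [("a", "1"), ("b", "2"), ("c", "3")]

# "with conn" commits once at the end, or rolls back if an exception is raised.
with conn:
    for first, second in rows:
        # One value per placeholder, supplied in order.
        conn.execute(
            "INSERT INTO results (field1, field2) VALUES (?, ?)",
            (first, second),
        )

count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)
```

Each ? is bound positionally, which mirrors appending one ADO parameter per placeholder and setting both values on every pass through the loop.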