Retrieving the script name in SBCL

Suppose I have a file named '1.sb' with this content:
#!/usr/local/bin/sbcl --script
(prin1 sb-ext:*posix-argv*) (terpri)
If I invoke it as '1.sb a b c' I get this output:
("/usr/local/bin/sbcl" "a" "b" "c")
How do I get the name of the script file itself ('1.sb')?

It turns out that for SBCL a lightweight solution, correct as far as I can tell, is the following script:
#!/usr/local/bin/sbcl --script
(prin1 (apply #'concatenate 'string
              (remove-if #'null
                         (list (pathname-name *load-truename*)
                               (when (pathname-type *load-truename*)
                                 ".")
                               (pathname-type *load-truename*)))))
(terpri)
Thanks to user tobik at the FreeBSD forums for pointing me to *load-truename*. (For what it's worth, (file-namestring *load-truename*) should return the same name-plus-extension string in a single call.)


Match-zero-or-more operator in nextflow glob

I am trying to create a robust glob pattern that will match most of the different naming conventions used for fastq files we receive. However, the version of nextflow I am using (20.10.0) on the HPC doesn't seem to accept what I've written.
Here are some examples of file names:
19_S8_R1_001.fastq.gz 19_S8_R2_001.fastq.gz
F1HD1_S28_R1.fastq.gz F1HD1_S28_R2.fastq.gz
SRR3137747_1.fastq SRR3137747_2.fastq
The pattern I originally wrote to go with the fromFilePairs operator was *_?(R){1,2}?(_001).f?(ast)q?(.gz), which I tested in a bash environment. Here is the output from testing in a directory containing the top two example files:
-bash-4.2$ shopt -s extglob
-bash-4.2$ ls -1 *_?(R){1,2}?(_001).f?(ast)q?(.gz)
19_S8_R1_001.fastq.gz
19_S8_R2_001.fastq.gz
But when I tried to run this with nextflow, it just gave me the error message I put into the ifEmpty operator.
I eventually got it working with this pattern: *_{R1,R2,1,2}{.fastq.gz,.fq.gz,.fastq,.fq,_001.fastq.gz,_001.fq.gz,_001.fastq,_001.fq}, but it isn't particularly robust.
Unless I've missed it in the Nextflow documentation (and in what I've found about glob syntax), there doesn't seem to be a match-zero-or-more operator in Nextflow. Are there any alternative solutions?
Thanks in advance.
The following glob pattern seems to match some of the more common FASTQ filenames. Nextflow's glob syntax doesn't appear to support extglob-style ?(...) operators, but a brace group with an empty alternative (such as {,R}) gives the same match-zero-or-one effect:
Channel
    .fromFilePairs( '*_{,R}{1,2}{,_001}.{fq,fastq}{,.gz}' )
    .view()
Or with a parameterized directory prefix:
Channel
    .fromFilePairs( "${params.input_dir}/*_{,R}{1,2}{,_001}.{fq,fastq}{,.gz}" )
    .view()
Results:
N E X T F L O W ~ version 21.04.3
Launching `script.nf` [serene_austin] - revision: 5527b9b3c0
[SRR3137747, [/path/to/fasta/SRR3137747_1.fastq, /path/to/fasta/SRR3137747_2.fastq]]
[19_S8, [/path/to/fasta/19_S8_R1_001.fastq.gz, /path/to/fasta/19_S8_R2_001.fastq.gz]]
[F1HD1_S28, [/path/to/fasta/F1HD1_S28_R1.fastq.gz, /path/to/fasta/F1HD1_S28_R2.fastq.gz]]
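If it helps to see exactly what the brace groups cover, here is a small standalone Python sketch (nothing Nextflow-specific; the alternative lists are transcribed by hand from the pattern above) that enumerates every concrete shape the pattern can expand to:
import itertools

# Alternatives from '*_{,R}{1,2}{,_001}.{fq,fastq}{,.gz}', one list per brace group
groups = [
    ['', 'R'],          # optional R prefix
    ['1', '2'],         # read number
    ['', '_001'],       # optional lane/chunk suffix
    ['fq', 'fastq'],    # extension
    ['', '.gz'],        # optional compression suffix
]
for r, n, lane, ext, gz in itertools.product(*groups):
    print("*_%s%s%s.%s%s" % (r, n, lane, ext, gz))
This prints all 32 concrete patterns the glob covers, e.g. *_R1_001.fastq.gz and *_1.fq.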
Another option, which might be more robust (and readable), is to make use of the fact that fromFilePairs accepts a list of glob patterns as its argument, and to build that list of patterns dynamically:
nextflow.enable.dsl=2

params.input_dir = '/path/to/fasta'

def cartesian_product(A, B) {
    A.collectMany { a -> B.collect { b -> [a, b] } }
}

def extensions = [
    '.fastq.gz',
    '.fastq',
    '.fq.gz',
    '.fq',
]

def suffixes = [
    '*_R{1,2}_001',
    '*_R{1,2}',
    '*_{1,2}',
]

workflow {
    def patterns = cartesian_product(suffixes, extensions).collect {
        "${params.input_dir}/${it.join()}"
    }

    Channel.fromFilePairs( patterns ).view()
}
Results:
N E X T F L O W ~ version 21.04.3
Launching `script.nf` [deadly_payne] - revision: 6d2472ef23
[19_S8, [/path/to/fasta/19_S8_R1_001.fastq.gz, /path/to/fasta/19_S8_R2_001.fastq.gz]]
[F1HD1_S28, [/path/to/fasta/F1HD1_S28_R1.fastq.gz, /path/to/fasta/F1HD1_S28_R2.fastq.gz]]
[SRR3137747, [/path/to/fasta/SRR3137747_1.fastq, /path/to/fasta/SRR3137747_2.fastq]]

jq - How to extract domains and remove duplicates

Given the following json:
Full file here: https://pastebin.com/Hzt9bq2a
{
  "name": "Visma Public",
  "domains": [
    "accountsettings.connect.identity.stagaws.visma.com",
    "admin.stage.vismaonline.com",
    "api.home.stag.visma.com",
    "api.workbox.dk",
    "app.workbox.dk",
    "app.workbox.co.uk",
    "authz.workbox.dk",
    "connect.identity.stagaws.visma.com",
    "eaccounting.stage.vismaonline.com",
    "eaccountingprinting.stage.vismaonline.com",
    "http://myservices-api.stage.vismaonline.com/",
    "identity.stage.vismaonline.com",
    "myservices.stage.vismaonline.com"
  ]
}
How can I transform the data into the form below? That is, identify the domains in the format site.SLD.TLD that are present and then remove the duplicates (not including the subdomains, protocols, or paths, as illustrated below).
{
  "name": "Visma Public",
  "domains": [
    "workbox.co.uk",
    "workbox.dk",
    "visma.com",
    "vismaonline.com"
  ]
}
I would like to do this in jq, as that is what I've used to wrangle the data into this format so far, but at this stage any solution that I can run on Debian (I'm using bash) without extraneous tooling would ideally be fine.
I'm aware that regex can be used within jq, so I assume the best approach is to regex out the domain and then pipe to unique; however, I'm unable to get anything working so far. I'm currently trying this version, which seems to need only the text-transformation stage added in somehow, either during the jq process or with a pass over the output afterwards using something like awk:
jq '[.[] | {name: .name, domain: [.domains[]] | unique}]' testfile.json
This appears to be useful: https://github.com/stedolan/jq/issues/537
One solution was offered which does a regex match to extract the last two strings separated by . and calls the unique function on that. It works up to a point, but doesn't cover site.SLD.TLD where the public suffix has two parts: google.co.uk, for example, would return only co.uk with this jq:
jq '.domains |= (map(capture("(?<x>[[:alpha:]]+).(?<z>[[:alpha:]]+)(.?)$") | join(".")) | unique)'
A programming language is much more expressive than jq.
Try the following snippet with python3.
import json
import pprint
import urllib.request
import os


def get_tlds():
    f = urllib.request.urlopen("https://publicsuffix.org/list/effective_tld_names.dat")
    content = f.read()
    lines = content.decode('utf-8').split("\n")
    # remove comments and blank lines
    tlds = [line for line in lines if not line.startswith("//") and not line == ""]
    return tlds


def extract_domain(url, tlds):
    # strip the scheme and any path to get the bare hostname
    url = url.replace("http://", "").replace("https://", "")
    url = url.split("/")[0]
    # candidate tld/sld pairs from the last labels of the hostname
    parts = url.split(".")
    suffix1 = parts[-1]
    sld1 = parts[-2]
    if len(parts) > 2:
        suffix2 = ".".join(parts[-2:])
        sld2 = parts[-3]
    else:
        suffix2 = suffix1
        sld2 = sld1
    # try the longer suffix first
    if suffix2 in tlds:
        tld = suffix2
        sld = sld2
    else:
        tld = suffix1
        sld = sld1
    return sld + "." + tld


def clean(site, tlds):
    site["domains"] = list(set([extract_domain(url, tlds) for url in site["domains"]]))
    return site


if __name__ == "__main__":
    filename = "Hzt9bq2a.json"
    cache_path = "tlds.json"
    if os.path.exists(cache_path):
        with open(cache_path, "r") as f:
            tlds = json.load(f)
    else:
        tlds = get_tlds()
        with open(cache_path, "w") as f:
            json.dump(tlds, f)
    with open(filename) as f:
        d = json.load(f)
    d = [clean(site, tlds) for site in d]
    pprint.pprint(d)
    with open("clean.json", "w") as f:
        json.dump(d, f)
May I offer you a way of achieving the same query with jtc? The same could be achieved in other languages (and of course in jq); the query is mostly a matter of coming up with a regex to satisfy your ask:
bash $ <file.json jtc -w'<domains>l:>((?:[a-z0-9]+\.)?[a-z0-9]+\.[a-z0-9]+)[^.]*$<R:' -u'{{$1}}' /\
-ppw'<domains>l:><q:' -w'[domains]:<[]>j:' -w'<name>l:'
{
  "domains": [
    "stagaws.visma.com",
    "stage.vismaonline.com",
    "stag.visma.com",
    "api.workbox.dk",
    "app.workbox.dk",
    "workbox.co.uk",
    "authz.workbox.dk"
  ],
  "name": "Visma Public"
}
bash $
Note: it extracts only DOMAIN.TLD, as per your ask. If you'd like to extract DOMAIN.SLD.TLD, then the task becomes a bit less trivial.
Update:
Modified the solution as per the comment: extract domain.sld.tld where there are 3 or more levels, and domain.tld where there are only 2.
PS. I'm the creator of jtc, the JSON processing utility. This disclaimer is an SO requirement.
One of the solutions presented on this page offers that:
A programming language is much more expressive than jq.
It may therefore be worthwhile pointing out that jq is an expressive, Turing-complete programming language, and that it would be as straightforward (and as tedious) to capture all the intricacies of the "Public Suffix List" using jq as any other programming language that does not already provide support for this list.
It may be useful to illustrate an approach to the problem that passes the (revised) test presented in the Q. This approach could easily be extended in any one of a number of ways:
def extract:
    sub("^[^:]*://"; "")
  | sub("/.*$"; "")
  | split(".")
  | (if (.[-1]|length) == 2 and (.[-2]|length) <= 3
     then -3 else -2 end) as $ix
  | .[$ix:]
  | join(".");

{name, domain: (.domains | map(extract) | unique)}
Output
{
  "name": "Visma Public",
  "domain": [
    "visma.com",
    "vismaonline.com",
    "workbox.co.uk",
    "workbox.dk"
  ]
}
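If the filter above is saved to a file (say extract.jq, a name chosen here just for illustration), it can be run as jq -f extract.jq input.json.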
Judging from your example, you don't actually want top-level domains (just one component, e.g. ".com"), and you probably don't really want second-level domains (last two components) either, because some domain registries don't operate at the TLD level. Given www.foo.com.br, you presumably want to find out about foo.com.br, not com.br.
To do that, you need to consult the Public Suffix List. The file format isn't too complicated, but it has support for wildcards and exceptions. I dare say that jq isn't the ideal language to use here — pick one that has a URL-parsing module (for extracting hostnames) and an existing Public Suffix List module (for extracting the domain parts from those hostnames).
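As one illustration of that route (a sketch, not the only way to do it): Python's third-party tldextract package bundles the Public Suffix List and exposes the registrable domain directly. This assumes, as the python3 answer above does, that the full pastebin file is a JSON array of objects like the one shown:
import json

import tldextract  # pip install tldextract; bundles the Public Suffix List

with open("Hzt9bq2a.json") as f:
    data = json.load(f)

for site in data:
    # registered_domain is the registrable part of the hostname, e.g.
    # "www.foo.com.br" -> "foo.com.br" and "google.co.uk" -> "google.co.uk";
    # tldextract also tolerates a leading scheme and a trailing path.
    site["domains"] = sorted({
        tldextract.extract(url).registered_domain
        for url in site["domains"]
    })

print(json.dumps(data, indent=2))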

Using Sikuli Finder() to search screenshot with icon, providing cached like response when used in a loop

(Using Sikuli IDE -288 20/04/19 on Windows 10)
I am currently having issues with a portion of code that runs correctly the first time around, but when the function is called again in a loop, it somehow uses the old information from the first iteration instead of overwriting it.
A function called selectRewards() is called, and it takes a few screenshots of the reward region over a few seconds to gather a usable still of an animation; the file names are numerically incremented. The function then creates a Finder using the screenshots, starting with screenshot 1. The Finder and the image I want to check against are passed into a search() function, which uses the passed Finder and image to find matches. It checks for all defined images in screenshot1, screenshot2, etc. until matches are found, and the matches are selected on the screen using coordinates from the screenshot image.
This all works well in the first iteration of selectRewards(): it cycles through the screenshots and finds the images on a stable screenshot. But when the function is called again, the exact same "found" results are returned, and the clicks land in exactly the same places, even when the images don't exist in the new screenshots. (I've even deleted the screenshots at the end of the first loop to try to clear any incorrect information being sent to the Finder.)
I've tried to pull the section out to share in a cleaner way, and it still provides the same issue. Any help and advice would be deeply appreciated.
(I'm currently having even stranger issues with the code: with the main script open in one IDE tab and the new script in another, neither running, running the snippet script will use the coordinates/image finds from a previous run of the scripts.) Could there be some sort of memory issue or caching in Windows? ALT+SHIFT+R to restart the IDE normally clears the issue.
Settings.MoveMouseDelay = 0.5

# Define Regions
rewardRegion = Region(536,429,832,207)

# Define Images
searchCoupons = Pattern("coupons.png").similar(0.85)
searchAdvanced = Pattern("2011.png").similar(0.85)
searchAdvancedFrag = Pattern("2012.png").similar(0.85)

matchesFound = False

def search(image, rewardGlimpse, descr = ""):
    print("##### searching for: (%s) %s" % (image, descr))
    rewardGlimpse.findAll(image)  # find all matches (using the passed Finder & image)
    matches = []  # an empty list to store the matches
    while rewardGlimpse.hasNext():  # loop as long as there are more matches
        matches.append(rewardGlimpse.next())  # access next match and add to matches
    # now we have our matches saved in the list matches
    print("  Does FindAll have next? (should be false):" + str(rewardGlimpse.hasNext()))
    print("  Found matches:" + str(len(matches)))
    if len(matches) > 0:
        global matchesFound
        matchesFound = True
        obtainedReward = str(descr)
        print("  Match found should be true " + str(matchesFound) + ". Found: " + obtainedReward)
    # we want to use our matches
    for m in matches:
        # Find x & y location of rewards in screenshot
        matchx = m.x
        matchy = m.y
        # Append them to the reward region to line it up.
        newx = rewardRegion.getX() + matchx
        newy = rewardRegion.getY() + matchy
        rewardHover = Location(newx, newy)
        # click the found reward location
        click(rewardHover)
        wait(1)

def selectRewards():
    # ---- Save Incremental Screenshots
    wait(1)
    capture(rewardRegion, "screenshot1.png")
    wait(0.5)
    capture(rewardRegion, "screenshot2.png")
    wait(0.5)
    capture(rewardRegion, "screenshot3.png")
    wait(0.5)
    capture(rewardRegion, "screenshot4.png")
    wait(0.5)
    capture(rewardRegion, "screenshot5.png")
    wait(0.5)
    capture(rewardRegion, "screenshot6.png")
    wait(0.5)

    # ----- Test the screenshots
    snum = 1  # screenshot file number
    while True:
        global matchesFound
        if matchesFound == True:
            print("Rewards Found - breaking search loop")
            matchesFound = False
            break
        # Start with _screenshot1.png, increment snum.
        screenshotURL = "_screenshot" + str(snum) + ".png"
        rewardGlimpse = Finder(screenshotURL)  # set up the Finder
        print("Currently searching in: " + str(screenshotURL))
        # Pass along the image to search, the screenshot's Finder, and a description.
        search(searchCoupons, rewardGlimpse, "Coupons")
        search(searchAdvanced, rewardGlimpse, "Advanced Recruit Proof")
        search(searchAdvancedFrag, rewardGlimpse, "Advanced Recruit Fragments")
        snum = snum + 1
        if snum > 6:
            break

    while True:  # All rewards available this round are collected
        if exists("1558962266403.png"):
            click("1558962266403.png")
            # confirm
            break
        print("No reward found at this point.")
        print("Matches Found at No Reward Debug: " + str(matchesFound))
        # Needed matches not found, selecting random reward.
        hover("1558979979033.png")
        click("1558980645042.png")
        # matchesFound = False  # Toggle back to False
        # print("Matches found: Variable Value(Should be false)" + str(matchesFound))

def main():
    i = 0
    SSLoops = 2
    while i < 2:
        print("Loop #" + str(i + 1) + "/" + str(SSLoops))
        print("--------------")
        if i == 1:  # remove this if statement for live
            click("1559251066942.png")  # switches spoofed html pages to show diff rewards
        selectRewards()
        i = i + 1

if __name__ == '__main__':
    main()
The first call of selectRewards() is correct: there were 3 images in the reward region that matched the things to search for. But the second loop is incorrect: only one of the matching images was there, and it wasn't in the same exact position. The script clicked in the 3 locations from the previous loop during the second time around.
Message log:
====
Loop #1/2
--------------
Currently searching in: _screenshot1.png
##### searching for: (P(coupons.png) S: 0.85) Coupons
Does FindAll have next? (should be false):False
Found matches:1
Match found should be true True. Found: Coupons
[log] CLICK on L[603,556]#S(0) (586 msec)
##### searching for: (P(2011.png) S: 0.85) Advanced Recruit Proof
Does FindAll have next? (should be false):False
Found matches:1
Match found should be true True. Found: Advanced Recruit Proof
[log] CLICK on L[653,556]#S(0) (867 msec)
##### searching for: (P(2012.png) S: 0.85) Advanced Recruit Fragments
Does FindAll have next? (should be false):False
Found matches:1
Match found should be true True. Found: Advanced Recruit Fragments
[log] CLICK on L[703,556]#S(0) (539 msec)
Rewards Found - breaking search loop
[log] CLICK on L[90,163]#S(0) (541 msec)
Loop #2/2
--------------
[log] CLICK on L[311,17]#S(0) (593 msec)
Currently searching in: _screenshot1.png
##### searching for: (P(coupons.png) S: 0.85) Coupons
Does FindAll have next? (should be false):False
Found matches:1
Match found should be true True. Found: Coupons
[log] CLICK on L[603,556]#S(0) (617 msec)
##### searching for: (P(2011.png) S: 0.85) Advanced Recruit Proof
Does FindAll have next? (should be false):False
Found matches:1
Match found should be true True. Found: Advanced Recruit Proof
[log] CLICK on L[653,556]#S(0) (535 msec)
##### searching for: (P(2012.png) S: 0.85) Advanced Recruit Fragments
Does FindAll have next? (should be false):False
Found matches:1
Match found should be true True. Found: Advanced Recruit Fragments
[log] CLICK on L[703,556]#S(0) (539 msec)
Rewards Found - breaking search loop
[log] CLICK on L[304,289]#S(0) (687 msec)
====
RaiMan from SikuliX:
ok, the reason is the internal image caching, which is based on filenames.
When a new Finder is created with an image filename that is already in the cache, the cached version is used.
So you might either switch off caching globally:
Settings.setImageCache(0)
or add:
Image.reset()
at the beginning of selectRewards(),
or add the loop count to the filenames of the captured images (take care: memory use then grows constantly!)
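To illustrate the Image.reset() option, a minimal sketch of where the call would go (the rest of selectRewards() stays exactly as in the question):
def selectRewards():
    # Drop all internally cached images first, so this iteration's fresh
    # captures are re-read from disk rather than served from the cache.
    Image.reset()

    # ---- Save Incremental Screenshots
    wait(1)
    capture(rewardRegion, "screenshot1.png")
    # ... rest of the function unchanged ...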
BTW: when selecting another tab in the IDE, the images from the previously selected tab are uncached automatically.

CSV Reader works, Trouble with CSV writer

I am writing a very simple python script to READ a CSV (no problem) and to write to another CSV (issue):
System info:
Windows 10
Powershell
Python 3.6.5 :: Anaconda, Inc.
Sample Data: Office Events
The purpose is to filter events based on criteria, and to write to another CSV with desired criteria.
For Example:
For example, I would like to read from this CSV and write out the events where Registrations (column 4) is greater than 0 (i.e., remove rows where Registrations = 0):
# SCRIPT TO FILTER EVENTS TO BE PROCESSED
import os
import time
import shutil
import os.path
import fnmatch
import csv
import glob
import pandas

# Location of file containing ALL events
path = r'allEvents.csv'

# Writes to writer
writer = csv.writer(open(r'RegisteredEvents' + time.strftime("%m_%d_%Y-%I_%M_%S") + '.csv', "wb"))
writer.writerow(["Event Name", "Start Date", "End Date", "Registrations", "Total Revenue", "ID", "Status"])
#writer.writerow([r'Event Name', r'Start Date', r'End Date', r'Registrations', r'Total Revenue', r'ID', r'Status'])
#writer.writerow([b'Event Name', b'Start Date', b'End Date', b'Registrations', b'Total Revenue', b'ID', b'Status'])

def checkRegistrations(file):
    reader = csv.reader(file)
    data = list(reader)
    for row in data:
        #if row[3] > str(0):
        if row[3] > int(0):
            writer.writerow(([data]))
The Error I continue to get is:
writer.writerow(["Event Name", "Start Date", "End Date", "Registrations", "Total Revenue", "ID", "Status"])
TypeError: a bytes-like object is required, not 'str'
I have tried the various commented-out statements, for example:
"" vs r"" vs r'' vs b''
if row[3] > int(0) vs if row[3] > str(0)
Every time I execute my script, it creates the file, so the first csv.writer line works (it creates and opens the file); the second line (writing the headers) is where the error appears.
Perhaps I am mixing up syntax between Python versions, or perhaps I am misusing the csv library, or (more than likely) I simply have a lot left to learn about data type IO and conversion. Someone please help!
I am aware of the excess of imported libraries; the script came from another basic script that moved files from one location to another based on filename and output a row count for each file being moved. With that being said, I may be unaware of any missing/needed libraries.
Please let me know if you have any questions, concerns or clarifications
Thanks in advance!
It looks like you are calling:
writer = csv.writer(open('file.csv', 'wb'))
The 'wb' argument is the file mode. The 'b' means that you are opening the file in binary mode, which in Python 3 accepts only bytes-like objects, while the csv module writes strings. That mismatch is exactly the TypeError you're seeing.
Try getting rid of the 'b' in the 'wb'.
writer = csv.writer(open('file.csv', 'w'))
Let me know if that works for you.
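For completeness, here is a minimal sketch of the whole filtering script in Python 3 (assuming allEvents.csv has a header row and that Registrations is the fourth column, as in the question; newline='' is what the csv docs recommend when opening files in text mode):
import csv
import time

out_name = 'RegisteredEvents' + time.strftime("%m_%d_%Y-%I_%M_%S") + '.csv'

with open('allEvents.csv', newline='') as src, \
     open(out_name, 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(["Event Name", "Start Date", "End Date",
                     "Registrations", "Total Revenue", "ID", "Status"])
    next(reader, None)  # skip the source file's header row
    for row in reader:
        # csv.reader yields strings, so convert before comparing numerically
        if int(row[3]) > 0:
            writer.writerow(row)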

Why is the stream blocking in this iex session?

I'm trying to parse some CSVs using elixir:
iex -S mix
Erlang/OTP 18 [erts-7.2] [source-e6dd627] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Interactive Elixir (1.3.0-dev) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> a = File.stream!("test/lib/fit_notes/fit_notes_export.csv") |> CSV.decode
#Function<49.97003610/2 in Stream.transform/3>
iex(2)> Stream.take(a, 1)
#Stream<[enum: #Function<49.97003610/2 in Stream.transform/3>,
funs: [#Function<38.97003610/1 in Stream.take/2>]]>
iex(3)> Enum.take(a, 1)
[["Date", "Exercise", "Category", "Weight (kgs)", "Reps", "Distance",
"Distance Unit", "Time"]]
iex(4)> Enum.take(a, 2)
^ this just blocks
The first Enum.take that I issue works; the second blocks forever. Can you tell me what I'm doing wrong? I'm using this library for CSV parsing.
If you follow the example from http://hexdocs.pm/csv/CSV.html, your code would end up something like this:
a = File.stream!("test/lib/fit_notes/fit_notes_export.csv") |> CSV.decode |>
    Stream.take(1) |>
    Stream.take(1) |>
    Enum.take(2)
Note I changed your first Enum.take(1) into a Stream.take(1) so that the stream doesn't get prematurely terminated. Also note that two consecutive Stream.take(1) calls would be better written as a single Stream.take(2). Finally, note how the stream piping works by adding a |> to the end of each line until you reach an Enum call, which then fires the whole operation.
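If a rough analogy helps (Python, not Elixir; just a sketch of the same lazy-versus-eager distinction): generator pipelines behave the same way, in that nothing runs until something eager consumes them.
import csv
import itertools

# Lazy steps, analogous to File.stream! |> CSV.decode |> Stream.take/2 ...
lines = open("fit_notes_export.csv", newline="")
rows = csv.reader(lines)               # lazy: nothing is read yet
first_two = itertools.islice(rows, 2)  # still lazy

# Eager step, analogous to Enum.take/2: this is what fires the pipeline
print(list(first_two))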
Added:
For streams with side effects (like logging), see https://github.com/elixir-lang/elixir/issues/1831, where they recommend Stream.each.