I am creating a filter for files coming onto a Unix machine. I only want to allow plain text files that do not look like scripts to pass through.
To check for plain text I look at the file's executable bit and use Perl's -T file test. (I understand this is not 100% reliable, but it will catch the binary files I most want to avoid.) I think this will be sufficient, but any suggestions are welcome.
My main question is how to recognize when a plain text file is a script. Every script I've ever written has started with a #! line, so my first thought is to read each file's first line and block any file that has one. Are there common non-script plain text files that start with #! which I would flag as false positives? Are there better or additional methods of identifying a script?
That's what the file command (see Wikipedia) is for. It recognizes much more than just the shebang (#!), and can tell you what kind of script it is, if any.
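For example, here is a minimal sketch of how the two checks could be combined in Python. Treating the word "script" in file's output as the verdict is an assumption on my part; the exact wording varies between versions of the tool.

import subprocess

def looks_like_script(path):
    # Block anything whose first two bytes are the shebang magic.
    with open(path, "rb") as f:
        if f.read(2) == b"#!":
            return True
    # Fall back on file(1)'s classification; -b gives a brief description.
    desc = subprocess.run(["file", "-b", path],
                          capture_output=True, text=True).stdout.lower()
    return "script" in desc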
In short: I am trying to read a CSV file, but the program always overwrites the existing file with an empty new one.
Longer: I am pretty new to Fortran, so bear with me. I am trying to read data from a CSV file into a Fortran program. I didn't write the program and it is pretty big, so I can't post the whole thing here. It consists of a whole bunch of .f90 files and everything is compiled using a makefile. Since I load the gcc module before compiling, I assume it is compiled with GNU Fortran, as that is part of gcc. (I don't know how to check whether that is correct.)
The compiler puts an executable in a different directory. When I execute the program there, it apparently overwrites the existing .csv file with a new blank one, so the program only reads "End of File". I don't know why it always creates a new file; how do I stop it from doing so?
As a side note, the csv file I am trying to read simply consists of a single column of floats, e.g.
"0.01, 0.13, 0.041,..." etc.
The code that I inserted into a subroutine of one of the .f90 files is the following:
real*8, dimension(nz) :: Nsq
integer :: i

open(10, file='Nsq.csv')
do i = 1, 20
    read(10, *) Nsq(i)
enddo
close(10)
I have also tried writing a small test program, compiled with gfortran, that runs essentially the same code as above. That one works just fine and prints the contents of the CSV file without any issues.
I have no experience in Fortran at all, so I am completely stumped as to why this happens. I know the chances are slim that you can help me with this, since I can't provide the whole source code, but maybe someone has an idea why this occurs. Or maybe you know an alternative way of reading CSV files?
Thanks for your time.
The open statement in Fortran, OPEN(connect-spec-list), has a lot of connection specifications which define how an external file should be managed (see the Fortran 2018 standard, section 12.5.6).
When you open a file using the simplest form of the open-statement:
OPEN(unit=unitid,file="filename")
a lot of default assumptions are made, such as ACCESS="SEQUENTIAL", ASYNCHRONOUS="NO", BLANK="NULL", .... The most important ones, however, are ACTION and STATUS, which define the purpose of the file. The ACTION specification states whether you want to use the file for reading, writing or both, while STATUS essentially defines whether we work on an existing file or not, and what should be done with it (replace it, keep it, ...).
Both of these specifications have compiler-dependent defaults.
In the Intel compiler suite, the defaults are ACTION="READWRITE" and STATUS="UNKNOWN" (see here and here).
Intel defines STATUS="UNKNOWN" as: indicates the file may or may not exist. If the file does not exist, a new file is created and its status changes to 'OLD'.
The GNU compiler suite has a different take on this. The default ACTION is determined by a set of rules which depend on the file's accessibility, if it exists (+rw, +r-w, -r+w) (see here). The behaviour of the default STATUS="UNKNOWN" is not documented, but seems to be REWRITE (see Default Status of "Unknown" in Open).
If you know what you want to do with the file, it is advisable to say so explicitly:
OPEN(newunit=unitid, file="filename", action="read", status="old")
I'm trying to understand whether it's possible to write to an xls file with a bash script. The situation is outlined below.
I have a cronjob that runs every Monday, generates an xls filled with data from a MySQL DB, and emails it to my client. When the report is empty and the client attempts to open it, it shows as corrupt. Originally I addressed this issue by excluding empty files from the email with an if statement. However, the constraint is that all 4 reports must reach the client, empty or not.
So my question is: can I simply add a row of text at the top with a bash script so the file is never "empty"? I'm not an expert in bash scripting by any means, so feedback here would be great. Thanks!
Tony
I'm not aware of any pure bash implementation for writing XLS files; there are solutions in other languages such as Perl, Python, or PHP. But if you think outside the box, there is another option available to you. You mentioned that you currently use an if statement to avoid attaching empty files. Create a blank spreadsheet in a program like MS Excel, optionally enter some text in A1 like "No records", save it, and transfer it to a known location on the server that runs the cronjob. Then, rather than skipping the attachment whenever your if statement detects an empty file, just attach the blank "No records" template XLS instead. You may need to copy the template to a temporary location before attaching it if you need to rename the file.
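Here is a minimal sketch of that substitution in Python; the paths are placeholders, and the same logic is just as easy to express directly in the bash script around your if statement.

import os
import shutil

REPORT = "/var/reports/weekly.xls"        # placeholder paths
TEMPLATE = "/var/reports/no_records.xls"  # pre-made "No records" workbook

# If the generated report is empty, substitute the blank template,
# copying it so the attachment keeps the expected file name.
if os.path.getsize(REPORT) == 0:
    shutil.copy(TEMPLATE, REPORT)
# ...attach REPORT to the email as before...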
I have many text files that I want to upload to a wiki running MediaWiki.
I don't even know if this is really possible, but I want to give it a shot.
Each text file's name will be the title of the wiki page.
One wiki page for one file.
I want to upload all text files from the same folder as the program is in.
Perhaps asking you to code it all is asking too much, so could you at least tell me which language I should look into to give it a shot?
What you probably want is a bot to create the articles for you using the MediaWiki API. Probably the best known bot framework is pywikipedia for Python, but there are API libraries and bot frameworks for many other languages too.
In fact, pywikipedia comes with a script called pagefromfile.py that does something pretty close to what you want. By default, it creates multiple pages from a single file, but if you know some Python, it shouldn't be too hard to change that.
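If you do end up scripting it yourself, here is a minimal sketch of the idea using pywikibot (the current successor to pywikipedia); it assumes a configured user-config.py for your wiki and creates one page per .txt file in the script's own folder, with the file name as the page title.

import os
import pywikibot  # assumes a configured user-config.py for your wiki

site = pywikibot.Site()
folder = os.path.dirname(os.path.abspath(__file__))  # the script's own folder

for name in os.listdir(folder):
    if not name.endswith(".txt"):
        continue
    # The file name (minus extension) becomes the page title.
    page = pywikibot.Page(site, os.path.splitext(name)[0])
    with open(os.path.join(folder, name), encoding="utf-8") as f:
        page.text = f.read()
    page.save(summary="Importing text file")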
Actually, if the files are on the same server your wiki runs on (or you can upload them there), then you don't even need a bot at all: there's a MediaWiki maintenance script called importTextFile.php that can do it for you. You can run it for all files in a given directory with a simple shell script, e.g.:
for file in directory/*.txt; do
    php /path/to/your/mediawiki/maintenance/importTextFile.php "$file";
done
(Obviously, replace directory with the directory containing the text files and /path/to/your/mediawiki with the actual path of your MediaWiki installation.)
By default, importTextFile.php will base the name of the created page on the filename, stripping any directory prefixes and extensions. Also, per standard MediaWiki page naming rules, underscores will be replaced by spaces and the first letter will be capitalized (unless you've turned that off in your LocalSettings.php); thus, for example, the file directory/foo_bar.txt would be imported as the page "Foo bar". If you want finer control over the page naming, importTextFile.php also supports an explicit --title parameter. Or you could always copy the script and modify it yourself to change the page naming rules.
Ps. There's also another MediaWiki maintenance script called edit.php that does pretty much the same thing as importTextFile.php, except that it reads the page text from standard input and doesn't have the convenient default page naming rules of importTextFile.php. It can be quite handy for automated edits using Unix pipelines, though.
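Assuming edit.php's usual interface (the page title as its argument, the page text on standard input), a sketch of driving it from a script might look like this; the paths and the title are placeholders:

import subprocess

# edit.php reads the page text from standard input and takes
# the page title as its argument.
with open("some_page.txt", "rb") as f:
    subprocess.run(
        ["php", "/path/to/your/mediawiki/maintenance/edit.php", "Some page"],
        stdin=f,
        check=True,
    )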
Addendum: The importTextFile.php script expects the file names and contents to be in the UTF-8 encoding. If your files are in some other encoding, you'll have to either fix them first or modify the script to do the conversion, e.g. using mb_convert_encoding().
In particular, the following modifications to the script ought to do it:
To convert the file names to UTF-8, edit the titleFromFilename() function, near the bottom of the script, and replace its last line:
return $parts[0];
with:
return mb_convert_encoding( $parts[0], "UTF-8", "your-encoding" );
where your-encoding should be the character encoding used for your file names (or auto to attempt auto-detection).
To also convert the contents of the files, make a similar change higher up, inside the main code of the script, replacing the line:
$text = file_get_contents( $filename );
with:
$text = file_get_contents( $filename );
$text = mb_convert_encoding( $text, "UTF-8", "your-encoding" );
In MediaWiki 1.27, there is a new maintenance script, importTextFiles.php, which can do this. See https://www.mediawiki.org/wiki/Manual:ImportTextFiles.php for information. It improves on the old (now removed) importTextFile.php script in that it can handle file wildcards, so it allows the import of many text files at once.
I am building a web app that takes several tiff image files and merges them together into one single tiff image file using GNUWin32 tiffcp.exe from command line.
The way I was doing it was to loop through the file list and build all the file names into one single string variable.
strFileList = "c:\folder\folder\folder\aased98-def-wsdeff-434fsdsd-dvv.tif c:\folder\folder\folder\aased98-def-wsdeff-434fsdsd-axs.tif c:\folder\folder\folder\aased98-def-wsdeff-434fsdsd-dxzs.tif"
Then I would just write to the command line:
tiffcp.exe strFileList results.tif
The file names are GUIDs, so the paths are fairly long and I do not have any control to shorten them. So if I have a bunch of these documents (over 20 files or so), the length of the string variable exceeds the Windows command-line limit and the merge fails.
Since this process is just merging files, my next thought was, instead of writing the file names to a string, to do the merge one file at a time. So the first time through, the loop runs the following type of code:
tiffcp.exe file1.tif results.tif
The result is a perfect 476k tif file. But the next iteration of the loop needs to merge the second file plus the contents of the first "results" tif file. So I do this:
tiffcp.exe results.tif file2.tiff results.tif
The result each time is a blank 1K tiff file.
All the examples I can find for tiffcp.exe say file1.tif file2.tif results.tif; none use the results file to write back to itself.
Any suggestions on how to do this?
Try the -a (append) switch to tiffcp.exe.
I'm doing something similar in Python and inside my file processing loop I'm issuing the command:
tiffcp.exe -a temp.tif output.tif
works fine.
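For what it's worth, a minimal sketch of that loop in Python (the input file names are placeholders, and tiffcp.exe is assumed to be on the PATH):

import subprocess

input_files = ["file1.tif", "file2.tif", "file3.tif"]  # placeholder names

# -a appends each input to output.tif instead of overwriting it.
for tif in input_files:
    subprocess.run(["tiffcp.exe", "-a", tif, "output.tif"], check=True)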
For an ASP.NET project you may want to try LibTiff.Net (free, open source, BSD license). That port of the libtiff library contains the tiffcp utility with source code, so you could try to use it in your code.
Disclaimer: I am one of the maintainers of the library.
I believe your problem is caused by using results.tif as both input and output. If you increment the file name each time (i.e. results1.tif, results2.tif, etc.), I believe it should work.
This is a rather inefficient approach, though (tiff1 is copied 9 times if you have 10 files). Since you refer to libtiff, you may want to take a look at the source of libtiff's tiffcp and check whether it is worthwhile to embed it.
Here's the situation:
I have a lot of HTML files, and these HTML files link to a lot of documents. The documents have ALL been renamed, and I have an Excel sheet listing each file's old name and new name.
What would be the quickest way to change the links inside the HTML files to accommodate the new names?
The method I'm using now:
Have all the HTML files opened in Notepad++
Use Notepad++'s 'Replace in All Opened Documents' function to replace all occurrences of a certain link with the new file name.
Is there a quicker, better way?
Perl's regular expressions.
Elaboration, as a sketch in Perl:

use strict;
use warnings;

foreach my $file (@files) {   # @files, $oldtext, $newtext: your data
    open my $in, '<', $file or die "$file: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;
    $text =~ s/\Q$oldtext\E/$newtext/g;  # do the desired replacement
    open my $out, '>', $file or die "$file: $!";
    print $out $text;                    # write out the new text
    close $out;
}
It's not hard, but it requires some testing. If you have a lot of edits (and more may happen later), this approach is more efficient.
There are several free and open-source tools that replace text in several files at once; one of them is FART.
If you prefer something with a GUI, try the free Text Crawler
First save the Excel sheet to something nice and simple like a CSV file, so it's easy to read in your favourite language, e.g. Perl. Then iterate over each file and do the search and replace. One gotcha, though: do it all in one pass, otherwise you could create problems if there are links that have changed in complex ways. I.e., if a.html changed to b.html and b.html changed to a.html, you can mess up the links by doing it in multiple passes. So load all the changes into memory, then cycle through each file and replace all the links in it simultaneously.
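A minimal sketch of that one-pass approach in Python (the CSV file name and its two-column layout are assumptions about your sheet):

import csv
import glob
import re

# Load the old-name -> new-name mapping; assumes a two-column CSV.
with open("renames.csv", newline="") as f:
    mapping = {old: new for old, new in csv.reader(f)}

# One alternation over all old names (longest first, so no name is
# shadowed by a shorter prefix) rewrites each file in a single pass,
# which keeps swaps like a.html <-> b.html safe.
names = sorted(mapping, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(n) for n in names))

for path in glob.glob("*.html"):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(pattern.sub(lambda m: mapping[m.group(0)], text))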
Because it is specifically HTML search and replace, a tool like this would be ideal:
http://www.aliassoftware.com/
Finds and Replaces multiple text strings in multiple files at once!