Find and Replace in Files - UTF8

Find and Replace in Files - UTF8 - language-agnostic

Searching for a free application for commercial usage that allows find/replace in multiple files (regular expressions are nice but not a must), that supports opening and saving in UTF-8.
Tried a few like BKReplaceEm but the application ends up saving all the files as ASCII which causes some problems with web-rendering.
Please advise.
[UPDATE] To further clarify, I am searching for a windows utility.
[UPDATE #2] This is going to be used to run through our 450 page site and replace all french characters with the much needed HTML entities.

Notepad++ supports this feature, and is a great little editor in it's own regard.
Edit : Actually, Notepad++ does support replace in files. Click Search -> Find in Files, then select "Replace in files" in the dialog.

In the spirit of previous answer, you can use Perl (which has seamless native Unicode support and whose RegEx capablity are unparalleled). There are Windows perl versions avialable (ActivePerl, Strawberry, or you can use CygWin), and you can even slap GUIs on top of it -= for the latter, you can see what answers are given to my very recent So question :)
Plus, Perl can grab pretty much unlimitedly powerful collection of files, by using globs for simple things, File::Find for more complicated, and using grep on resulting file list to refine further if you need more fancy stuff, e.g. by content of modification time.
UPDATE For a Windows Editor, you can use UltraEdit. It has free evaluation period, and to be perfectly honest, I find the purchase price to be WELL worth paying for this very nice and powerful editor. Among its other features, it supports Unicode, and has pretty fancy search/replace ablities, including Perl RegEx support and S/R in multiple files.

Use sed.

jEdit has a feature called "HyperSearch" (just open the find dialog). You can specify a directory, a file name pattern and jEdit (being based on Java) does support lots of different encodings (and is often smart enough to figure out the correct one).

You could try my editor, Code Trowel
If it doesn't do what you want I'd probably fix it :-)

For windows, Notepad++ is awesome. It's licensed under the GPL. It does search and replace in files and does support regular expressions.

Related

Choice of database for a dictionary which can be edited as plain text

I am creating a dictionary app but I am not sure what kind of database to use !
My Requirements :
Support for Unicode characters.
Reasonably Fast search.
In case of an installed version of the app, the database file can be edited using a text editor (such as notepad).
I want to implement a web version of the app (WHICH USES THE SAME DATABASE FILE USED BY THE INSTALLED VERSION) and viewers can add/modify entries so it requires concurrent access via network.
Easy to parse in any programming language.
Expecting a maximum file size of 100 MB (May not reach anywhere near that but just to make it future proof.)
So with my limited knowledge I ended up with 3 options : CSV, XML or JSON. I prefer CSV for easy editing.
I understand that they are not as good as RDBMS but for the specific scenario is it possible to use any of these or how good can they perform?
Any alternate ideas are also welcome !
Thanks in advance.

I think that using a single file as a "database" is not such a great choice if you think/plan to extend your application further.
My recommendation is SQLite (https://www.sqlite.org/about.html). As it is currently promoted on their website it's suitable on almost all of your requirements except that of being edited with a text editor. But I think that are easy enough solutions for content management as well, like SqliteBrowser http://sqlitebrowser.org/

CHM Creator with ability to parse html meta keywords

I have lots of scanned images of a magazine(published monthly) and i have to organize it in searchable manner.
User should be able to view magazine issue wise or can search for predefined categories/keywords.
What i have thought for now, is to create CHM as it will need less effort than creating a new custom built software.
For that i will create seperate HTMl page(Programatically) with image embedded in it along with the keywords(Stored in Excel sheet along with path of Image) for which that image should be included in result.
So i want a chm creator that can parse html meta tags and add keywords in chm keywords list.
One such software i have found is Abee CHM Maker
But i need some free alternative.
If you have any other idea to organize it with minimal efforts, then also you are welcome...

The standard (free) way to create chm files is using Microsoft's HTML help workshop:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms670169(v=vs.85).aspx
Kind regards,
Bo

Free Pascal has a CHM creator package, a html DOM implementation and a basic commandline compiler for CHM projects (.hhp). The creator package is independent of MS tools or any other binary blob, and available in source. It is portable as far as FPC is portable (not as portable as gcc on paper, but enough in practice with all major architectures and OSes supported)
One could make something like that, I made something similar, but instead of meta, I folded back titles into TOC and index and cleaned up html (TeX4ht output) and fixed links before turning it into a chm.
But it will require some work, and if you are not familiar with Object Pascal/Delphi (the language), it might be a bridge too far. (the hours required would not compare favorably with the costs of the Abee thing, if that would suit your goals).
On the other hand, in a freely programmable system you can decide yourself how far you automatize things. I put in a lot of work once, and now all new output of tex4ht (with a certain fixed set of settings) formats nicely to chms.

See if this helps you (it certainly does what you need):
KEL CHM Creator: http://dumah7.wordpress.com/2009/02/17/kel-chm-creator-v-1-4-0-0/
Alternatively, I think you could add tags on each picture (right click on it-> Properties->Details->Tags) and use Windows explorer for searching them. I have never done this but it is supposed to be working (I guess).

View the innards of a .ppt file?

I need to figure out what is going on inside a client's .ppt files. What is a good way to get started?
My eventual hope is to convert it to HTML. But if I just export the .ppt to HTML, I get a lot of images (as opposed to text), which is not a Good Thing.
EDIT: software that automatically converts .ppt to HTML would be terrific, provided that it preserves as much information as possible in text format. If that doesn't exist, the next best thing would be to understand the innards of the .ppt and write my own code to do a partial conversion.
EDIT: I used OfficeConvert as recommended by Michiel Leenaars. It got me text all right. My 50-page, 8MB test file turned into 40MB of text. The fact that I got text is good. The fact that the amount went way up is moving in the wrong direction. And there is an awful lot of repetition in there. The word "style" appeared 410815 times; the word "draw" appeared 351229 times.

I think a safe way would be to use OfficeConvert to automatically convert to ODF programmatically with Microsoft Office. Run it with /? to get help. There are some dependencies (see below).
Then use a good ODF library like lpod to look inside it.
You can view some interesting code examples here.
Dependencies:
Microsoft .NET Framework Version 2.0 Redistributable Package (x86)
Primary Interop Assemblies for Office 2007 or Office 2010 (whichever you are using).

I like the Aspose products. (I'm not associated with them other than as a customer.) I've used the PPT one specifically to write code that pokes around in the insides of a PPT. Overkill if you just want to convert it to HTML, but invaluable for the sorts of things I use it for.

If you know Java, Apache has the POI project which lets you take a look at the inners of a PPT project. Could get all the info you want about the project (images, text) and then convert it to html however you like.
Its free too.

How can I analyze a closed format (e.g. doc or vce)?

I want to study the .vce format. It's a binary format and it seems more complicated than a simple object serialization. Does it exist any tool or technique to analyze a binary format?

You might need to "Reverse-Code-Engineer" a programm using this file format (http://www.openrce.org/). Tools used for this kind of analysis are: brain, disassembler (IDA Pro for example) and Debugger (OllyDBG for example). But beware - the way for successfull reverse engineering a file format is veeeeeerrry hard.
And reversing an application might be illegal depending on where you live!

You'll have to get a library that can read the format (or create one yourself).

Here is some of the microsoft office binary format specifications

I believe it would only be possible through some nasty reversed-engineering. It would be very useful to have access to application that uses mentioned format, so that you can generate few simple files and compare them in hex editor. You cannot get far with this method, but you might be able to figure out the header.
It would also be useful to study some binary format mechanisms, such as encryption and compression. If you're talking about Visual CertExam file format, than it is likely that useful data will be strongly encrypted.

My 2 cents:
Start by reversing the application reading the files themselves. Particularly android applications are helpful, as the resulting java source is easier to read (you might want to try A+ vce reader for android for example). This program indicates that vce uses/embeds sqlite in the file (in line with what is hinted here: Reverse Engineer a File Format).
Where to go from here? You might want to explore sqlite file carving tools to see if there might be a way to programatically identify the patterns in the file. Good luck!

Repository of BNF Grammars?

Is there a place I can find Backus–Naur Form or BNF grammars for popular languages? Whenever I do a search I don't turn up much, but I figure they must be published somewhere. I'm most interested in seeing one for Objective-C and maybe MySQL.

you have to search on tools used to create grammars: "lex/yacc grammar", "antlr grammar" "railroad diagram"
http://www.antlr3.org/grammar/list.html
Here's some grammar files
objective-c
http://www.omnigroup.com/mailman/archive/macosx-dev/2001-March/022979.html
http://www.cilinder.be/docs/next/NeXTStep/3.3/nd/Concepts/ObjectiveC/B_Grammar/Grammar.htmld/index.html
https://github.com/pornel/objc2grammar
python
http://www.python.org/dev/summary/2006-04-16_2006-04-30/#the-grammar-file-and-syntaxerrors
javascript
http://tomcopeland.blogs.com/EcmaScript.html
http://www.ccs.neu.edu/home/dherman/javascript/
ruby
http://www.ruby-doc.org/docs/ruby-doc-bundle/Manual/man-1.4/yacc.html

FWIW, the MySQL grammar file (mysql-server/sql/sql_yacc.y) is open source and browseable at launchpad.net (though it's a bit slow and I got an error when I tried to pull up the specific file).
Also, a snapshot of the whole MySQL Server source is downloadable from dev.mysql.com.

There are some links from w:BNF#Language Grammars.
BNF Grammars for SQL-92, SQL-99 and SQL-2003
I also found a page that lists grammars for Objective C.
Objective-C grammar for Lex/Yacc Flex/Bison
Reference Manual for the Objective-C Language

IIRC, BNF grammars are just different enough from what yacc/bison want as input to be really annoying :) If you intend to feed these files into a parser generator, you may want to look for files in the appropriate format. I recall seeing such files for Java, JavaScript and C++ at one point. Probably as part of Eclipse, Firefox and GCC, respectively, but I can't remember for sure. I would assume you can find pretty much any parser input file by finding an open source project that uses that language.

I also searched this and i collected this repository
http://slps.github.io/zoo/

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Find and Replace in Files - UTF8 - language-agnostic

Notepad++ supports this feature, and is a great little editor in it's own regard. Edit : Actually, Notepad++ does support replace in files. Click Search -> Find in Files, then select "Replace in files" in the dialog.

Use sed.

jEdit has a feature called "HyperSearch" (just open the find dialog). You can specify a directory, a file name pattern and jEdit (being based on Java) does support lots of different encodings (and is often smart enough to figure out the correct one).

You could try my editor, Code Trowel If it doesn't do what you want I'd probably fix it :-)

For windows, Notepad++ is awesome. It's licensed under the GPL. It does search and replace in files and does support regular expressions.

Related

Choice of database for a dictionary which can be edited as plain text

CHM Creator with ability to parse html meta keywords

View the innards of a .ppt file?

How can I analyze a closed format (e.g. doc or vce)?

Repository of BNF Grammars?

Categories

Resources