Parse txt eliminate header and footer Apache-Hop - apache-hop

I´m trying Apache HOP Desktop and I´m stuck in a pipeline where I have to clean and parse this one column txt. First eliminating header and footer then parsing according to position and lenght.
name: Initial_Pos: Lenght:
N1 2 10
N2 12 24
N3 108 30
i´m trying to use 'Text File Input' as a transform but no luck. Any suggestions?
Thanks in advance

The text file input transform has a wide variety of options you can use.
Under the content tab you can set the filetype to a fixed with file, add the number of header/footer lines and special things like encoding.
text file input content tab
Then under the fields tab you can set the position and length of each field it has to read.
After a bit of fiddling you should be able to extract the information you need.

Related

Word html format: insert a custom TOC via field code

I am generating Word docs from html. Basically, I build a file with html and save it as a .doc. Then I open it in Word and apply a template. All good so far.
I would like to automatically generate a custom TOC via the HTML ie when I am building the document. I need to insert a field code to do that, in the same way I do to add page numbering via the HML. eg:
<span style="mso-field-code: PAGE " class="page-field"></span>
If I save my html doc as docx and apply a template, I can make a TOC based in the styles in the way one would normally create a TOC in Word. I customised the TOC so the Title style is the top level followed by H1, H2 then H3. If I then toggle the field code on the TOC, the field code looks like this:
{ TOC \t "Heading 1,2,Heading 2,3,Heading 3,4,Title,1" }
Now, I can add HTML like this to insert the TOC:
<div style="mso-field-code: TOC " class="toc-field">TOC goes HERE</div>
When I do that, if I right click the text "TOC goes HERE" I get the option to "Update field" and if I do that a TOC is generated using the default H1,H2,H3 tags.
But, what I can't work out is how to include the
\t "Heading 1,2,Heading 2,3,Heading 3,4,Title,1"
part so my custom style sequence is applied. I have tried all sorts of combinations and it seems that adding anything after TOC causes Word to not make a field code.
Does anyone have any suggestions?
Update:
Based on the essential help from #slightlysnarky below, I thought I would summarise the outcome here because the information I needed was in a Microsoft chm file that was taken down many years ago. If you read the following extract from that help manual and compare it to the solution below you will see how this all works.
Word marks and stores information for simple fields by means of the Span element with the mso-field-code style. The mso-field-code value represents the string value of the field code. Formatting in the original field code might be lost when saving as HTML if only the string value of the code is necessary for its calculation.
Word has a different way of storing field information to HTML for more complex fields, such as ones that have formatted text or long values. Word marks these fields with so the data is not displayed in the browser. Word uses the Span element with the mso-element: field-begin, mso-element: field-separator, and mso-element: field-end attributes to contain the three respective parts of the field code: the field start, the separator between field code and field results, and the field end. Whenever possible, Word will save the field to HTML in the method that uses the least file space.
So, basically, add tags as shown below to your HTML at the point you wish the TOC to appear.
:-)
Word recognises a "complex field format" in HTML, along the same lines as it does in the Office Open XML format. So you can use
<span style='mso-element:field-begin'></span>TOC \t "Heading 1,2,Heading 2,3,Heading 3,4,Title,1"
<span style='mso-element:field-separator'></span>This text will show but the user will need to update the field
<span style='mso-element:field-end'></span>
This construct is outlined in a Microsoft document called "Microsoft Office HTML and XML Reference". It's a Windows .exe that unpacks to a .chm Help file. You can get it here
The info. on encoding fields is in Getting Started with Microsoft Office 2000 HTML and XML->Microsoft Word->Fields
There may be a later version but that's the only one I could find.

Find Replace text FOO with Style "Heading 1" with <h1>Foo</h1>

I am trying to find an easy way to convert my Word documents to HTML without the awful save-as that is built in. These are structured documents (designed for our screen-reader (JAWS) users), and so they use Heading 1, 2, 3, 4 & the Table of Contents.
We plan to convert these to DAISY audiobooks (https://en.wikipedia.org/wiki/DAISY_Digital_Talking_Book ) , so we need pretty clean, but structured, HTML to convert.
I tried the find-replace, using Styles, but it would just replace anything in the text part of the search. I could convert it from any one style to another, but adding text in the box messed it up.
(I think I see that CSS for DAISY means that instead of just <h2> it will have to be <level2 class=='section' <h2> and closing tags), but that's step 2 after I handle this part.)
I just want to be able to find any text using Style 2 and add text to the start of that line saying "yep, here's some style 2" so that I can do the HTML/CSS stuff.
Thanks!
You can do that with a simple Find/Replace. For example, specify the Heading 1 Style for the Find parameter and use:
Replace = <h1>^&</h1>
For a macro you could incorporate that into, see: Convert a Word Range to a String with HTML tags in VBA

How to remove a div from the entire project?

I've got a project consisting of over 200 html files. There's a div repeated throughout most of these, looking like this:
<div class='foobar' id="abcdef123'></div>
I have found all uses of the class using the Find in Files function in Sublime Text 2 - now I want to remove them, i.e. completely delete any line containing that div (and its closing tag).
Is there an easy way to do it in Sublime Text 2?
EDIT: I have forgotten to mention that sometimes the div has additional classes and the ID is always different. How would I write a regexp to deal with that?
In Notepad++, open all 200 files and replace with the following regular expression.
<div class='foobar[^']*' id="[^']*"></div>
and replace it by nothing. I don't know Sublimetext2.

HTML text input plugin for CSV data preserving column alignment?

When pasting CSV data into an HTML text area, or in all the jquery rich text editors I could find, data is visually "messed up": columns alignment is lost, in particular when a cell is long and the one below is very short.
Is anyone aware of some kind of plugin similar to a text area that would visually preserve columns alignment when pasting some CSV data into it? That would require interpreting the tabs as columns separators, and not just as a fixed number of spaces.
Thanks!
Write a script to replace the tabs for celltags and the newlines for rowtags like:
r='foo\tbar\tfoobar\nbar\tfoo\tfoobar\nfoobar\tbar\tfoo';
r = r.replace(/\n/g,'</td></tr><tr><td>');
r = r.replace(/\t/g, '</td><td>');
r = '<table><tr><td>' + r + '</td></tr></table>';
alert(r);

How can I preserve leading white space in PDF export of reporting services

I have some leagacy reporting data which is accessed from SSRS via an xml web service data source. The service returns one big field containing formatted plain text.
I've been able to preserve white space in the output by replacing space chars with a non-breaking space, however, when exporting to PDF leading white space is not preserved on lines that do not begin with a visible character. So a report that should render like this:
Report Title
Name Sales
Bob 100.00
Wendy 199.50
Is rendered like this:
Report Title (leading white space stripped on this line)
Name Sales (intra-line white space is preserved)
Bob 100.00
Wendy 199.50
I've not been able to find any solution other than prefixing each line with a character which I really don't want to do.
Using SQL 2005 SP3
I googled and googled the answer to this question. Many answers included changing spaces to Chr(20) or Chr(160). I found a simple solution that seems to work.
If your leading spaces come from a tab stop replace "/t" with actual spaces, 5 or so
string newString = oldString.Replace("/t"," ")
In the expression field for the textbox I found that simply adding a null "Chr(0)" at the beginning of the string preserves the leading spaces.
Example:
=Chr(0) & "My Text"
Have you tried non-breaking spaces of the ASCII variety?
=Replace(Fields!Report.Value, " ", chr(160))
I use chr(160) to keep phone numbers together (12 345 6789). In your case you may want to only replace leading spaces.
You can use padding property of the textbox containing the text. Padding on the left can be increased to add space that does not get stripped on output.
I have used this work around:
In the Textbox properties select the alignment tab.
In the Padding option section edit the right or left padding(wherever you need to add space).
If you need to conditionally indent the text you can use the expression as well.