I want to force a document to classify against a particular layout on Hyperscience - is this possible? I can use the uuid, layout_uuid, layout_version_uuid, along with other metadata. I also want to include the pages belonging to the document if it has been classified already.
I’ve already set up the custom code block to perform this function as below:
def force_classification(submission: Any) -> Any:
    # ***insert code here***
    return submission

cct_force_classification = CodeBlock(
    reference_name='force_classification',
    code=force_classification,
    code_input={'submission': previous_block.output('submission')},
    title='Force Classification',
    description='Force Classification',
)
Reading the SDK docs, I didn't see a clear way to do this. I'm wondering if this is just not possible?
Yes, this is possible! However, there are some limitations. You are able to use a custom code block to specify the layout that a document must be classified against if it has already been classified, as long as the layout that you’re forcing classification against is a semi-structured layout.
new_documents = []
for document in submission.get('documents', []):
    if document['layout_uuid'] == 'layout_uuid[1]':
        new_document = {
            'uuid': document['id'],
            'layout_version_uuid': 'layout_version_uuid[2]',
            'layout_uuid': 'layout_uuid[1]',
            'pages': [{
                'submission_page_id': page['id'],
                'page_number': page['submission_page_number'],
                'classification_type': page['classification_type'],
            } for page in document.get('pages', [])],
            'metadata': {},
        }
        new_documents.append(new_document)
return {'submission': submission, 'new_documents': new_documents}
Note that here the placeholder marked [1] refers to the existing document, and the placeholder marked [2] corresponds to the metadata of the other layout you want to force classification against.
Keep in mind that this is still superficial (client side) and will not reflect in the Hyperscience db until you sync this new document back.
I have a base .docx for which I need to change the page header / footer image on a case by case basis. I read that python-docx does not yet handle headers/footers but it does handle Pictures.
What I cannot work out is how to replace them.
I found the pictures in the document's ._package.parts objects as ImagePart instances, and I could even identify the image by its partname attribute.
What I could not find in any way is how to replace the image. I tried replacing the ImagePart ._blob and ._image attributes but it makes no difference after saving.
So, what would be the "good" way to replace one Image blob with another one using python-docx? (it is the only change I need to do).
Current code is:
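import docx
from docx import Document

d = Document(docx='basefile.docx')
parts = d._package.parts
for p in parts:
    # str.find() returns -1 (truthy) when the text is absent, so test membership instead
    if isinstance(p, docx.parts.image.ImagePart) and 'image1.png' in p.partname:
        img = p
        break
# image data is binary, so read the file in 'rb' mode
img._blob = open('newfile.png', 'rb').read()
d.save('newfile.docx')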
Thanks,
marc
There is no requirement to use python-docx. I found another Python library for messing with docx files called "paradocx"; although it seems a bit abandoned, it works for what I need.
python-docx would be preferable, as the project seems healthier, so a solution based on it is still desired.
Anyway, here is the paradocx based solution:
from paradocx import Document
from paradocx.headerfooter import HeaderPart

template = 'template.docx'
doc = Document.from_file(template)
header = next(doc.get_parts_by_class(HeaderPart))
img = header.related('http://schemas.openxmlformats.org/officeDocument/2006/relationships/image')[0]
# read the replacement image as bytes; the with block closes the file afterwards
with open('new_file.png', 'rb') as newimg:
    img.data = newimg.read()
doc.save('prueba.docx')
I have an array of 2000 items, that I need to display in html - each of the items is placed into a div. Now each of the items can have 6 links to click on for further action. Here is how a single item currently looks:
<div class='b'>
    <div class='r'>
        <span id='l1' onclick='doSomething(itemId, linkId);'>1</span>
        <span id='l2' onclick='doSomething(itemId, linkId);'>2</span>
        <span id='l3' onclick='doSomething(itemId, linkId);'>3</span>
        <span id='l4' onclick='doSomething(itemId, linkId);'>4</span>
        <span id='l5' onclick='doSomething(itemId, linkId);'>5</span>
        <span id='l6' onclick='doSomething(itemId, linkId);'>6</span>
    </div>
    <div class='c'>
        some item text
    </div>
</div>
Now the problem is with the performance. I am using innerHTML to set the items into a master div on the page. The more HTML my "single item" contains, the longer the DOM takes to add it. I am now trying to reduce the HTML to make it as small as possible. Is there a way to render the spans differently, without having to use a separate span for each of them? Maybe using jQuery?
The first thing you should do is attach the click handler to the DIV via jQuery or some other framework and let the events bubble up to it, so that a single doSomething covers all cases; depending on which element was clicked, you can extract the item ID and link ID. Also, do the spans really need IDs? I can't tell from your sample code. Also, instead of loading the link and item IDs on page load, maybe fetch them via AJAX on an as-needed basis.
My two cents while eating salad for lunch,
nickyt
Update off the top of my head for vikasde . Syntax of this might not be entirely correct. I'm on lunch break.
$(".b").bind( // the class of your div; use an ID, e.g. #someID, if you have more than one element with class b
    "click",
    function(e) { // e is the event object
        // do something with $(e.target), like check if it's one of your links and then do something with it.
    }
);
If you set the innerHTML property of a node, the DOM has to interpret your HTML text and convert it into nodes. Essentially, you're running a language interpreter here. More text, more processing time. I suspect (but am not sure) that it would be faster to create actual DOM element nodes, with all requisite nesting of contents, and hook those to the containing node. Your innerHTML solution does the same thing under the covers, plus the additional work of making sense of your text.
I also second the suggestion of someone else who said it might be more economical to build all this content on the server rather than in the client via JS.
Finally, I think you can eliminate much of the content of your spans. You don't need an ID, you don't need arguments in your onclick(). Call a JS function which will figure out which node it's called from, go up one node to find the containing div and perhaps loop down the contained nodes and/or look at the text to figure out which item within a div it should be responding to. You can make the onclick handler do a whole lot of work - this work only gets done once, at mouse click time, and will not be multiplied by 2000x something. It will not take a perceptible amount of user time.
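A minimal sketch of that idea in plain JavaScript, under some assumptions: the `data-item` and `data-link` attributes and the `findClickInfo` helper are hypothetical names introduced here for illustration (the original markup uses ids and inline onclick attributes instead), and `doSomething` is the asker's own function.

```javascript
// Walk up from the clicked node until the item container is reached.
// Returns { itemId, linkId }, or null if the click was not on a link span.
// Assumed (hypothetical) markup:
//   <div class="b" data-item="17"><div class="r"><span data-link="3">3</span>...
function findClickInfo(target) {
  var linkId = null;
  for (var node = target; node; node = node.parentNode) {
    if (!node.getAttribute) continue; // skip non-element nodes such as document
    if (linkId === null && node.getAttribute('data-link') !== null) {
      linkId = node.getAttribute('data-link');
    }
    if (node.getAttribute('data-item') !== null) {
      if (linkId === null) return null; // clicked the item, but not one of its links
      return { itemId: node.getAttribute('data-item'), linkId: linkId };
    }
  }
  return null;
}

// One listener for the whole page instead of an onclick on each of the 12000 spans.
if (typeof document !== 'undefined') {
  document.addEventListener('click', function (e) {
    var info = findClickInfo(e.target);
    if (info) doSomething(info.itemId, info.linkId);
  });
}
```

With this in place each span shrinks to something like `<span data-link='3'>3</span>`, and the lookup work happens once per click rather than once per rendered item.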
John Resig wrote a blog post on DocumentFragments: http://ejohn.org/blog/dom-documentfragments/
My suggestion is to create a DocumentFragment for each row and append that to the DOM as you create it. A timeout wrapping each appendChild may help if the browser hangs.
function addRow(row) {
    var fragment = document.createDocumentFragment();
    var div = document.createElement('div');
    div.className = 'b'; // elements have no addAttribute(); set className (or use setAttribute)
    div.innerHTML = "<div>what ever you want in each row</div>"; // the property is innerHTML, not innerHtml
    fragment.appendChild(div);
    // setting a timeout of zero will allow the browser to intersperse the action of attaching to the dom with other things so that the delay isn't so noticeable
    window.setTimeout(function() {
        document.body.appendChild(fragment); // appending the fragment moves its children into the DOM
    }, 0);
}
hope that helps
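A variation on the same idea, offered as a sketch rather than as the approach above: instead of one fragment and one timeout per row, build a batch of rows into a single fragment and yield to the browser between batches. The names `toBatches`, `renderInBatches` and `buildRow`, and any particular batch size, are assumptions for illustration.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Append one batch per timer tick so the browser can repaint in between.
// `container` is the master div; `buildRow` returns a DOM node for one item.
function renderInBatches(container, items, buildRow, size) {
  var batches = toBatches(items, size);
  function next() {
    var batch = batches.shift();
    if (!batch) return; // nothing left to render
    var fragment = document.createDocumentFragment();
    for (var i = 0; i < batch.length; i++) {
      fragment.appendChild(buildRow(batch[i]));
    }
    container.appendChild(fragment); // a single DOM insertion per batch
    window.setTimeout(next, 0);
  }
  next();
}
```

A call such as `renderInBatches(masterDiv, items, buildRow, 100)` would add the 2000 items in 20 insertions; the best batch size depends on the browser and the markup, so it is worth measuring.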
One other problem is that there's too much stuff on the page for your browser to handle gracefully. I'm not sure if the page's design permits this, but how about putting those 2000 lines into a DIV with a fixed size and overflow: auto so the user gets a scrollable window in the page?
It's not what I'd prefer as a user, but if it fixes the cursor weirdness it might be an acceptable workaround.
Yet Another Solution
...to the "too much stuff on the page" problem:
(please let me know when you get sick and tired of these suggestions!)
If you have the option of using an embedded object, say a Java Applet (my personal preference but most people won't touch it) or JavaFX or Flash or Silverlight or...
then you could display all that funky data in that technology, embedded into your browser page. The contents of the page wouldn't be any of the browser's business and hence it wouldn't choke up on you.
Apart from the load time for Java or whatever, this could be transparent and invisible to the user, i.e. it's (almost) possible to do this so the text appears to be displayed on the page just as if it were directly in the HTML.
I have a particularly stupid insecurity about the aesthetics of my code... my use of white space is, frankly, awkward. My code looks like a geek dancing; not quite frightening, but awkward enough that you feel bad staring, yet can't look away.
I'm just never sure when I should leave a blank line or use an end of line comment instead of an above line comment. I prefer to comment above my code, but sometimes it seems strange to break the flow for a three word comment. Sometimes throwing an empty line before and after a block of code is like putting a speed bump in an otherwise smooth section of code. For instance, in a nested loop separating a three or four line block of code in the center almost nullifies the visual effect of indentation (I've noticed K&R bracers are less prone to this problem than Allman/BSD/GNU styles).
My personal preference is dense code with very few "speed bumps" except between functions/methods/comment blocks. For tricky sections of code, I like to leave a large comment block telling you what I'm about to do and why, followed by a few 'marker' comments in that code section. Unfortunately, I've found that some other people generally enjoy generous vertical white space. On one hand I could have a higher information density that some others don't think flows very well, and on the other hand I could have a better flowing code base at the cost of a lower signal to noise ratio.
I know this is such a petty, stupid thing, but it's something I really want to work on as I improve the rest of my skill set.
Would anyone be willing to offer some hints? What do you consider to be well flowing code and where is it appropriate to use vertical white space? Any thoughts on end of line commenting for two or three words comments?
Thanks!
P.S.
Here's a method from a code base I've been working on. Not my best, but not my worst by far.
/**
 * TODO Clean this up a bit. Nothing glaringly wrong, just a little messy.
 * Packs all of the Options, correctly ordered, in a CommandThread for executing.
 */
public CommandThread[] generateCommands() throws Exception
{
    OptionConstants[] notRegular = {OptionConstants.bucket, OptionConstants.fileLocation, OptionConstants.test, OptionConstants.executable, OptionConstants.mountLocation};
    ArrayList<Option> nonRegularOptions = new ArrayList<Option>();
    CommandLine cLine = new CommandLine(getValue(OptionConstants.executable));
    for (OptionConstants constant : notRegular)
        nonRegularOptions.add(getOption(constant));
    // --test must be first
    cLine.addOption(getOption(OptionConstants.test));
    // and the regular options...
    Option option;
    for (OptionBox optionBox : optionBoxes.values())
    {
        option = optionBox.getOption();
        if (!nonRegularOptions.contains(option))
            cLine.addOption(option);
    }
    // bucket and fileLocation must be last
    cLine.addOption(getOption(OptionConstants.bucket));
    cLine.addOption(getOption(OptionConstants.fileLocation));
    // Create, setup and deploy the CommandThread
    GUIInteractiveCommand command = new GUIInteractiveCommand(cLine, console);
    command.addComponentsToEnable(enableOnConnect);
    command.addComponentsToDisable(disableOnConnect);
    if (!getValue(OptionConstants.mountLocation).equals(""))
        command.addComponentToEnable(mountButton);
    // Piggy-back a Thread to start a StatReader if the call succeeds.
    class PiggyBack extends Command
    {
        Configuration config = new Configuration("piggyBack");
        OptionConstants fileLocation = OptionConstants.fileLocation;
        OptionConstants statsFilename = OptionConstants.statsFilename;
        OptionConstants mountLocation = OptionConstants.mountLocation;
        PiggyBack()
        {
            config.put(OptionConstants.fileLocation, getOption(fileLocation));
            config.put(OptionConstants.statsFilename, getOption(statsFilename));
        }
        @Override
        public void doPostRunWork()
        {
            if (retVal == 0)
            {
                // TODO move this to the s3fronterSet or mounts or something. Take advantage of PiggyBack's scope.
                connected = true;
                statReader = new StatReader(eventHandler, config);
                if (getValue(mountLocation).equals(""))
                {
                    OptionBox optBox = getOptionBox(mountLocation);
                    optBox.getOption().setRequired(true);
                    optBox.requestFocusInWindow();
                }
                // UGLY HACK... Send a 'ps aux' to grab the parent PID.
                setNextLink(new PSCommand(getValue(fileLocation), null));
                fireNextLink();
            }
        }
    }
    PiggyBack piggyBack = new PiggyBack();
    piggyBack.setConsole(console);
    command.setNextLink(piggyBack);
    return new CommandThread[]{command};
}
It doesn't matter.
1) Develop a style that is your own. Whatever it is that you find easiest and most comfortable, do it. Try to be as consistent as you can, but don't become a slave to consistency. Shoot for about 90%.
2) When you're modifying another developer's code, or working on a group project, use the stylistic conventions that exist in the codebase or that have been laid out in the style guide. Don't complain about it. If you are in a position to define the style, present your preferences but be willing to compromise.
If you follow both of those you'll be all set. Think of it as speaking the same language in two different ways. For example: speaking differently around your friends than you do with your grandfather.
It's not petty to make pretty code. When I write something I'm really proud of, I can usually take a step back, look at an entire method or class, and realize exactly what it does at a glance - even months later. Aesthetics play a part in that, though not as large of a part as good design. Also, realize you can't always write pretty code, (untyped ADO.NET anyone?) but when you can, please do.
Unfortunately, at this higher level at least, I'm not sure there are any hard rules you can adhere to to always produce aesthetically pleasing code. One piece of advice I can offer is to simply read code. Lots of it. In many different frameworks and languages.
I like to break up logical "phrases" of code with white space. This helps others easily visualize the logic in the method, or reminds me when I go back and look at old code. For example, I prefer
reader.MoveToContent();
if( reader.Name != "Limit" )
    return false;

string type = reader.GetAttribute( "type" );
if( type == null )
    throw new SecureLicenseException( "E_MissingXmlAttribute" );

if( String.Compare( type, GetLimitName(), false ) != 0 )
    throw new SecureLicenseException( "E_LimitValueMismatch", type, "type" );
instead of
reader.MoveToContent();
if( reader.Name != "Limit" )
    return false;
string type = reader.GetAttribute( "type" );
if( type == null )
    throw new SecureLicenseException( "E_MissingXmlAttribute" );
if( String.Compare( type, GetLimitName(), false ) != 0 )
    throw new SecureLicenseException( "E_LimitValueMismatch", type, "type" );
The same break can almost be accomplished with braces but I find that actually adds visual noise and reduces the amount of code that can be visually consumed simultaneously.
Comments on code lines
As for comments at the end of the line: almost never. They're not really bad, just easy to miss when scanning through code. They also clutter up the line, taking attention away from the code and making it harder to read. Our brains are already wired to grok line by line. When the comment is at the end of the line, we have to split the line into two separate concepts: code and comment. I say if it's important enough to comment on, put it on the line preceding the code.
That being said, I do find one or two line hint comments about the meaning of a specific value are sometimes OK.
I find code with very little whitespace hard to read and navigate in, since I need to actually read the code to find logical structure in it. Clever use of whitespace to separate logical parts in functions can increase the ease of understanding the code, not only for the author but also for others.
Keep in mind that if you are working in an environment where your code is likely to be maintained by others, they will have spent the majority of their time looking at code that was not written by you. If your style distinctly differs from what they are used to seeing, your smooth code may be a speed bump for them.
I minimize white space. I put the main comment block above the code block and additional end-of-line comments on the stuff that may not be obvious to another developer. I think you are doing that already.
My preferred style is probably anathema to most developers, but I will add occasional blank lines to separate what seem like appropriate 'paragraphs' of code. It works for me, nobody has complained during code reviews (yet!), but I can imagine that it might seem arbitrary to others. If other people don't like it I'll probably stop.
The most important thing to remember is that when you join an existing code base (as you almost always will in your professional career) you need to adhere to the code style guide dictated by the project.
Many developers, when starting a project afresh, choose to use a style based on the Linux kernel coding-style document. The latest version of that doc can be viewed at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/CodingStyle;h=8bb37237ebd25b19759cc47874c63155406ea28f;hb=HEAD.
Likewise, many maintainers insist that you use Checkpatch before submitting changes to version control. You can see the latest version that ships with the Linux kernel in the same tree I linked to above, at scripts/checkpatch.pl (I would link to it but I'm new and can only post one hyperlink per answer).
While Checkpatch is not specifically related to your question about whitespace usage, it will certainly help you eliminate trailing whitespace, spaces before tabs, etc.
Code Complete, by Steve McConnell (available in the usual locations) is my bible on this sort of thing. It has a whole chapter on layout and style that is just excellent. The whole book is just chock full of useful and practical advice.
I use exactly the same amount of whitespace as you :) Whitespace before methods, before comment blocks. In C, C++ the brackets also provide some "pseudo-whitespace" as there is only a single opening/closing brace on some lines, so this also serves to break up the code density.
Your code is fine, just do what you (and others you might work with) are comfortable with.
The only thing I see wrong with some (inexperienced) programmers about whitespace is that they can be afraid to use it, which is not true in this case.
I did however notice that you did not use more than one consecutive blank line in your sample code, which, in certain cases, you should use.
Here is how I would refactor that method. Things can surely still be improved and I did not yet refactor the PiggyBack class (I just moved it to an upper level).
With the Composed Method pattern, the code becomes easier to read because it is divided into methods that each do one thing and work at a single level of abstraction. Fewer comments are needed, too. Comments that answer the question "what" are code smells (i.e. the code should be refactored to be more readable). Useful comments answer the question "why", and even then it would be better to improve the code so that the reason is obvious (sometimes that can be done by having a test that fails without the non-obvious code).
public CommandThread[] buildCommandsForExecution() {
    CommandLine cLine = buildCommandLine();
    CommandThread command = buildCommandThread(cLine);
    initPiggyBack(command);
    return new CommandThread[]{command};
}

private CommandLine buildCommandLine() {
    CommandLine cLine = new CommandLine(getValue(OptionConstants.EXECUTABLE));
    // "--test" must be first, and bucket and file location must be last,
    // because [TODO: enter the reason]
    cLine.addOption(getOption(OptionConstants.TEST));
    for (Option regularOption : getRegularOptions()) {
        cLine.addOption(regularOption);
    }
    cLine.addOption(getOption(OptionConstants.BUCKET));
    cLine.addOption(getOption(OptionConstants.FILE_LOCATION));
    return cLine;
}

private List<Option> getRegularOptions() {
    List<Option> options = getAllOptions();
    options.removeAll(getNonRegularOptions());
    return options;
}

private List<Option> getAllOptions() {
    List<Option> options = new ArrayList<Option>();
    for (OptionBox optionBox : optionBoxes.values()) {
        options.add(optionBox.getOption());
    }
    return options;
}

private List<Option> getNonRegularOptions() {
    OptionConstants[] nonRegular = {
        OptionConstants.BUCKET,
        OptionConstants.FILE_LOCATION,
        OptionConstants.TEST,
        OptionConstants.EXECUTABLE,
        OptionConstants.MOUNT_LOCATION
    };
    List<Option> options = new ArrayList<Option>();
    for (OptionConstants c : nonRegular) {
        options.add(getOption(c));
    }
    return options;
}

private CommandThread buildCommandThread(CommandLine cLine) {
    GUIInteractiveCommand command = new GUIInteractiveCommand(cLine, console);
    command.addComponentsToEnable(enableOnConnect);
    command.addComponentsToDisable(disableOnConnect);
    if (isMountLocationSet()) {
        command.addComponentToEnable(mountButton);
    }
    return command;
}

private boolean isMountLocationSet() {
    String mountLocation = getValue(OptionConstants.MOUNT_LOCATION);
    return !mountLocation.equals("");
}

private void initPiggyBack(CommandThread command) {
    PiggyBack piggyBack = new PiggyBack();
    piggyBack.setConsole(console);
    command.setNextLink(piggyBack);
}
For C#, I say "if" is just a word, while "if(" is code - a space after "if", "for", "try" etc. doesn't help readability at all, so I think it's better without the space.
Also: Visual Studio> Tools> Options> Text Editor> All Languages> Tabs> KEEP TABS!
If you're a software developer who insists upon using spaces where tabs belong, I'll insist that you're a slob - but whatever - in the end, it's all compiled. On the other hand, if you're a web developer with a bunch of consecutive spaces and other excess whitespace all over your HTML/CSS/JavaScript, then you're either clueless about client-side code, or you just don't give a crap. Client-side code is not compiled (and not compressed with IIS default settings) - pointless whitespace in client-side script is like adding pointless Thread.Sleep() calls in server-side code.
I like to maximize the amount of code that can be seen in a window, so I only use a single blank line between functions, and rarely within. Hopefully your functions are not too long. Looking at your example, I don't like a blank line for an open brace, but I'll have one for a close. Indentation and colorization should suffice to show the structure.
I want to store some additional data on an HTML page and, on demand by the client, use this data to show different things using JS. How should I store this data? In invisible divs, or something else?
Is there some standard way?
I'd argue that if you're using JS to display it, you should store it in some sort of JS data structure (depending on what you want to do). If you just want to swap one element for another though, invisible [insert type of element here] can work well too.
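As a minimal sketch of the "JS data structure" option: the extra data can live in a plain object keyed by item id, written into the page once and looked up on demand. The `itemDetails` shape, its sample values and the `getDetails` helper are all hypothetical, chosen only to illustrate the idea.

```javascript
// Extra data emitted once with the page, keyed by item id (hypothetical shape).
var itemDetails = {
  "17": { title: "First item", links: [1, 2, 3] },
  "42": { title: "Second item", links: [4, 5, 6] }
};

// Look up the extra data for an item; returns null for unknown ids.
// hasOwnProperty guards against inherited keys such as "toString".
function getDetails(id) {
  return Object.prototype.hasOwnProperty.call(itemDetails, id) ? itemDetails[id] : null;
}
```

Nothing is added to the DOM until the client asks for it, so the page markup stays small.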
I don't think there is a standard way; I would store them in JavaScript source code.
One of:
1. Hidden input fields (if you want to submit it back to the server); or
2. Hidden elements on the page (hidden by CSS).
Each has applications.
If you use (1) to, say, identify something about the form submission you should never rely on it on the server (like anything that comes from the client). (2) is most useful for things like "rich" tool tips, dialog boxes and other content that isn't normally visible on the page. Usually the content is either made visible or cloned as appropriate, possibly being modified in the process.
If I need to put some information in the html that will be used by the javascript then I use
<input id="someuniqueid" type="hidden" value="..." />
Invisible divs are generally the way to go. If you know what needs to be shown first, you can improve user experience by only loading that initially, then using an AJAX call to load the remaining elements on the page.
Any data that will be presented as HTML should be stored in an HTML structure. I would build out the data or content you intend to display as proper HTML on the page. Ensure that everything is complete, semantic, and accessible. Then ensure that the CSS presents the data properly. When you are finished, add an inline style of "display:none;" to the top container you wish to have appear dynamically. Text readers can read that inline style, so they will not read the content until the element's display style changes.
Then use JavaScript to change the style of the container when you are ready:
var blockit = function () {
    var container = document.getElementById("containerid");
    container.style.display = "block";
};
For small amounts of additional data you can use the HTML5 "data-*" attributes:
<div id="mydiv" data-rowindex="45">
then access these fields with the jQuery data methods
$("#mydiv").data("rowindex")
or select item by attribute value
$('div[data-rowindex="45"]')
attach additional data to element
$( "body" ).data( "bar", { myType: "test", count: 40 } );
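The same attribute can also be read without jQuery through the standard `dataset` property. A minimal sketch, assuming the `mydiv` element from the example above is on the page; `toDatasetKey` is a hypothetical helper, included only to show how attribute names map to `dataset` keys.

```javascript
// Convert an attribute name like "data-row-index" into its dataset key.
// HTML rule: strip the "data-" prefix, then each "-" followed by a lowercase
// letter becomes that letter in upper case.
function toDatasetKey(attrName) {
  return attrName
    .replace(/^data-/, '')
    .replace(/-([a-z])/g, function (match, letter) {
      return letter.toUpperCase();
    });
}

// Only runs in a browser; in other environments there is no document.
if (typeof document !== 'undefined') {
  var el = document.getElementById('mydiv');
  if (el) {
    var key = toDatasetKey('data-rowindex'); // "rowindex"
    var rowIndex = Number(el.dataset[key]);  // dataset values are always strings
    console.log(rowIndex);
  }
}
```

One difference to keep in mind: `dataset` always returns strings, while jQuery's `.data()` tries to convert values such as "45" to a number, so the explicit `Number(...)` is only needed in the plain DOM version.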