How to see table contents using simplehtmldom - html

I'm using a parsing library called 'simplehtmldom'. All I want to do is extract the textual contents of table cells. That's all! It seems so simple... Everything I've tried results in the ENTIRE FRIGGIN' PAGE being dumped, apparently because all of the primitives traverse the DOM tree up, down, and sideways. Here's a trivialised example of what I'm trying to do:
$saved = '';
foreach ($html->find('tr') as $tr) {
    foreach ($tr->find('td') as $td) {
        $contents = $td->plaintext;
        if ($saved) {
            echo "$saved : $contents<br>\n";
            $saved = '';
        }
        if (strstr($contents, 'Title') || strstr($contents, 'Author')) {
            $saved = $contents;
        }
    }
}
I've tried using 'plaintext', 'innertext', and 'text', but no matter what I try, I end up getting either endless loads of crap echoed out, or else nothing at all.
Does anyone know how to use this parser? Or could someone suggest an alternative that does what I want?

CAVEAT - this is not really an answer, but rather an alternative.
I'm closing this question because I was able to solve the problem using a different approach, the DOM class, mentioned here. Hopefully this will save someone some time if you're just looking for a way to get the contents of table cells and aren't constrained to a particular package or approach.

Related

Using 'preserve_interword_spaces' in tesseract.js

I am trying to use Tesseract.js for OCR, but I'm not able to get the 'preserve_interword_spaces' option to work. Here is what I am trying:
Tesseract.recognize(
    element.files[0],
    'eng',
    {
        preserve_interword_spaces: 1,
        logger: progress => {
            console.log(progress);
            progressBar.querySelector("div").innerText = progress.status;
            progressBar.querySelector("progress").value = progress.progress;
        }
    }
).then( //etc )
The OCR is coming out with multiple spaces combined into one. Help?
I'd prefer to define the .recognize() this way, rather than using await(). I know preserve_interword_spaces is supported since I can see it in the documentation here and here, but I'm not sure how to get it to work in my case.
Just an update that I was able to resolve the issue by changing to async(). As the documentation states, Tesseract.recognize() is only meant for quick tasks, not more involved ones.

JList does not clear

I have an issue with removing old elements from my lists. I tried using the methods clear() and removeAllElements() and removeAll() wherever I could but that does not seem to clear them.
To help you understand the situation a little bit better:
d1 is an ArrayList that contains all available devices in our program.
availList2 and availList3 are using the DefaultListModel.
We wanted to make it so that when the user loads the products from the proper text file, if he did that a second time the products already listed in our gui would be overwritten with the ones in the original text file. However we ended up having duplicates of the products, even though we used the clear() method in both the d1 (ArrayList) and the JList.
Any useful tips or possible causes would be appreciated. Thank you very much in advance.
if (ev.getSource() == load_availables) {
    int returnVal = chooser.showOpenDialog(mainApp.this);
    if (returnVal == JFileChooser.APPROVE_OPTION) {
        d1.returnDevices().removeAll(d1.returnDevices());
        availList2.clear();
        availList3.clear();
        //availList2.removeAllElements();
        //availList3.removeAllElements();
        File file = chooser.getSelectedFile();
        read.ReadDevices(file);
        for (int i = 0; i < read.Size(); i++) {
            d1.add_AvailableDevices(read.get(i));
        }
    }
}
If the list is not cleared then I would suggest you don't have the proper reference to the DefaultListModel that is being used by the JList when you invoke the clear() method.
Start by reading the section from the Swing tutorial on How to Use Lists.
Download the ListDemo code and play with it. Change the "Fire" button to use the clear() method on the DefaultListModel to prove to yourself that is all you need to do.
Once you see the code working, you can figure out how your code differs from the working ListDemo version.
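To illustrate the point about holding the right reference, here is a minimal sketch (not the asker's actual code). The class name ListModelClearDemo, its methods, and the sample device names are all made up for illustration. It keeps a single DefaultListModel, hands that same instance to the JList, and clears it before every reload, so loading twice should not produce duplicates.

```java
import java.util.Arrays;
import java.util.List;
import javax.swing.DefaultListModel;
import javax.swing.JList;

// Hypothetical demo class; the names here are assumptions, not from the question.
class ListModelClearDemo {
    // The one model instance that the JList actually displays.
    private final DefaultListModel<String> model = new DefaultListModel<>();
    private final JList<String> list = new JList<>(model);

    // Replace the displayed items: clear the JList's own model, then refill it.
    void reload(List<String> devices) {
        model.clear();
        for (String device : devices) {
            model.addElement(device);
        }
    }

    // Number of entries the JList currently shows, read back through the JList.
    int displayedCount() {
        return list.getModel().getSize();
    }

    public static void main(String[] args) {
        ListModelClearDemo demo = new ListModelClearDemo();
        demo.reload(Arrays.asList("router", "switch"));
        demo.reload(Arrays.asList("router", "switch")); // second load replaces, not appends
        System.out.println(demo.displayedCount()); // prints 2
    }
}
```

If clear() appears to do nothing in your program, one thing worth checking is whether the model you clear is the same object that list.getModel() returns, and whether the backing ArrayList is really emptied before it is refilled.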

Linq to SQL - Turn off UpdateCheck in code

I am wanting to turn off the UpdateCheck functionality for all members (except their primary keys). Now I was following the example below as guidance, however my MetaDataMembers of the table are still set to Always.
http://www.the-lazy-coder.com/2013/04/set-updatecheck-to-never.html
The above code snippet just gets you to change the attribute, however it seems to never get picked up, as I can debug the code when it is running and I see all the properties being set, so I am presuming that the attributes changing does not change the underlying object.
Now if I were to change approach and just get the MetaDataMembers directly from the RowType I notice they have the UpdateCheck property, however only a getter. So is there a way to (via reflection if needed) overwrite this property once it is set? Even after looking at decompiled source it is an abstract class and I cannot find any implementations to use for reference.
I am using SQLMetal to generate the Context files, so there is no designer tinkering available, and although some people will say that I should run some text editing macros to parse and change the attributes, it all sounds too long winded when I should just be able to go into the object in memory and tell it to ignore whatever it has been told previously.
SO! Is there a way to override the property in the entities? I have tried running the original code in that link in the constructor, after the objects are created, and just before I am about to do an update. However, none of the changes seem to stick or at least propagate to where it matters, and there is hardly any material on how to do any of this programmatically.
After searching around the internet I found no nice way to do it. The link I mentioned originally doesn't work because it changes the attributes, which is partly right, but the attributes it touches are only the decorations and not the objects held in memory. The code below seems to work, but it is not nice:
// Needs: using System.Data.Linq.Mapping; using System.Reflection;
public static void SetUpdateCheckStatus(this IDataContext dataContext, UpdateCheck updateCheckStatus)
{
    var tables = dataContext.Mapping.GetTables();
    foreach (var table in tables)
    {
        var dataMembers = table.RowType.DataMembers;
        foreach (var dataMember in dataMembers)
        {
            if (!dataMember.IsPrimaryKey)
            {
                var dataMemberType = dataMember.GetType();
                if (dataMemberType.Name == "AttributedMetaDataMember")
                {
                    // This relies on a non-public field name, which is an implementation detail.
                    var underlyingAttributeField = dataMemberType.GetField("attrColumn", BindingFlags.Instance | BindingFlags.NonPublic);
                    if (underlyingAttributeField != null)
                    {
                        var underlyingAttribute = underlyingAttributeField.GetValue(dataMember) as ColumnAttribute;
                        if (underlyingAttribute != null)
                        { underlyingAttribute.UpdateCheck = updateCheckStatus; }
                    }
                }
                else
                {
                    // Look the field up on the member's own runtime type, because SetValue targets dataMember.
                    var underlyingField = dataMemberType.GetField("updateCheck", BindingFlags.Instance | BindingFlags.NonPublic);
                    if (underlyingField != null)
                    { underlyingField.SetValue(dataMember, updateCheckStatus); }
                }
            }
        }
    }
}
The IDataContext is just a wrapper we put around a DataContext for mocking purposes, so feel free to change that to just DataContext. It is written extremely defensively as this way pulls back lots of members which do not have all the desired data so it has to filter them out and only work on the ones which do.

no "current_page_item" class being added to top level nav item

If you go to www.lindysez.com and click on the top level navigation links, you'll notice that position presence is indicated by a slightly darker background (same as hover state). This works for each of the links EXCEPT "Tips & Techniques".
If I look under the hood, I noticed that the "current_page_item" class isn't being added to the parent of the "Tips & Techniques" menu item when you go to http://www.lindysez.com/tips
Also, curiously enough, when on the tips page, the parent of the "Blog" navigation item gets the class "current_page_parent", which is neither expected nor desired. This is harmless from a UI standpoint, but probably an indication what the underlying problem might be.
Anyone know why the "Tips & Techniques" menu item isn't getting the "current_page_item" class once selected?
Update: Directed at @user2019515's comment below
Thanks for the nudge in this direction, it does seem like I could create a workaround that would fix this. Couple things however.
This treats the symptom of the problem rather than the root. I'd like to figure out how to get wp to just add the "current_page_item" class natively. Any idea what's causing this? I think it may also be related to another problem I'm having which involves "tips" not showing in the RSS feed. I posted another question about that, but haven't seen any answers...
https://stackoverflow.com/questions/15980846/custom-post-type-not-showing-up-in-wordpress-rss-feed
Even as a workaround, I'd rather try to avoid a solution that relies on hard coded "menu-item-##". My development and production servers both have different ID#'s. I know I can account for this, but I hope there's a better, scalable workaround.
The code you provided didn't work for me, not that I can find any fault in it, nor are there any errors. Seems like it should work, but just doesn't. I even tried creating a very simple version that would add a class to all the items, but this too isn't executing. Here's that code.
function add_class_to_wp_nav_menu($classes, $item)
{
    $classes[] = 'classy-class';
    return $classes;
}
add_filter('nav_menu_css_class', 'add_class_to_wp_nav_menu', 10, 2);
Thanks so much for your help!
The Tips page is a post archive page: it shows posts rather than a normal page, which is why the blog item was being highlighted. You can easily fix this issue by adding the following custom CSS to your style.css file.
.post-type-archive-tips .menu-item-23 {
    background: url(images/nav-hover.png) repeat-x;
}
You'll need to loop over your menu. By default the current page class is added to the blog page because, in the end, custom posts are still posts.
// Change current_page_parent for custom post type
function remove_parent_classes($class)
{
    // check for current page classes, return false if they exist.
    return ($class == 'current_page_item' || $class == 'current_page_parent' || $class == 'current_page_ancestor' || $class == 'current-menu-item') ? FALSE : TRUE;
}
function add_class_to_wp_nav_menu($classes)
{
    switch (get_post_type()) {
        case 'portfolio':
            // we're viewing a custom post type, so remove the 'current_page_xxx and current-menu-item' from all menu items.
            $classes = array_filter($classes, "remove_parent_classes");
            // add the current page class to a specific menu item (replace ###).
            if (in_array('menu-item-14', $classes)) {
                $classes[] = 'current_page_parent';
            }
            break;
        case 'tips':
            // we're viewing a custom post type, so remove the 'current_page_xxx and current-menu-item' from all menu items.
            $classes = array_filter($classes, "remove_parent_classes");
            // add the current page class to a specific menu item (replace ###).
            if (in_array('menu-item-23', $classes)) {
                $classes[] = 'current_page_parent';
            }
            break;
    }
    return $classes;
}
add_filter('nav_menu_css_class', 'add_class_to_wp_nav_menu');
There's just one custom post type in the function, but I'm sure you can figure out how to add more. Got any questions, feel free to leave a comment! :)

Stylistic Question: Use of White Space

I have a particularly stupid insecurity about the aesthetics of my code... my use of white space is, frankly, awkward. My code looks like a geek dancing; not quite frightening, but awkward enough that you feel bad staring, yet can't look away.
I'm just never sure when I should leave a blank line or use an end of line comment instead of an above line comment. I prefer to comment above my code, but sometimes it seems strange to break the flow for a three word comment. Sometimes throwing an empty line before and after a block of code is like putting a speed bump in an otherwise smooth section of code. For instance, in a nested loop separating a three or four line block of code in the center almost nullifies the visual effect of indentation (I've noticed K&R bracers are less prone to this problem than Allman/BSD/GNU styles).
My personal preference is dense code with very few "speed bumps" except between functions/methods/comment blocks. For tricky sections of code, I like to leave a large comment block telling you what I'm about to do and why, followed by a few 'marker' comments in that code section. Unfortunately, I've found that some other people generally enjoy generous vertical white space. On one hand I could have a higher information density that some others don't think flows very well, and on the other hand I could have a better flowing code base at the cost of a lower signal to noise ratio.
I know this is such a petty, stupid thing, but it's something I really want to work on as I improve the rest of my skill set.
Would anyone be willing to offer some hints? What do you consider to be well flowing code and where is it appropriate to use vertical white space? Any thoughts on end of line commenting for two- or three-word comments?
Thanks!
P.S.
Here's a method from a code base I've been working on. Not my best, but not my worst by far.
/**
 * TODO Clean this up a bit. Nothing glaringly wrong, just a little messy.
 * Packs all of the Options, correctly ordered, in a CommandThread for executing.
 */
public CommandThread[] generateCommands() throws Exception
{
    OptionConstants[] notRegular = {OptionConstants.bucket, OptionConstants.fileLocation, OptionConstants.test, OptionConstants.executable, OptionConstants.mountLocation};
    ArrayList<Option> nonRegularOptions = new ArrayList<Option>();
    CommandLine cLine = new CommandLine(getValue(OptionConstants.executable));
    for (OptionConstants constant : notRegular)
        nonRegularOptions.add(getOption(constant));
    // --test must be first
    cLine.addOption(getOption(OptionConstants.test));
    // and the regular options...
    Option option;
    for (OptionBox optionBox : optionBoxes.values())
    {
        option = optionBox.getOption();
        if (!nonRegularOptions.contains(option))
            cLine.addOption(option);
    }
    // bucket and fileLocation must be last
    cLine.addOption(getOption(OptionConstants.bucket));
    cLine.addOption(getOption(OptionConstants.fileLocation));
    // Create, setup and deploy the CommandThread
    GUIInteractiveCommand command = new GUIInteractiveCommand(cLine, console);
    command.addComponentsToEnable(enableOnConnect);
    command.addComponentsToDisable(disableOnConnect);
    if (!getValue(OptionConstants.mountLocation).equals(""))
        command.addComponentToEnable(mountButton);
    // Piggy-back a Thread to start a StatReader if the call succeeds.
    class PiggyBack extends Command
    {
        Configuration config = new Configuration("piggyBack");
        OptionConstants fileLocation = OptionConstants.fileLocation;
        OptionConstants statsFilename = OptionConstants.statsFilename;
        OptionConstants mountLocation = OptionConstants.mountLocation;
        PiggyBack()
        {
            config.put(OptionConstants.fileLocation, getOption(fileLocation));
            config.put(OptionConstants.statsFilename, getOption(statsFilename));
        }
        @Override
        public void doPostRunWork()
        {
            if (retVal == 0)
            {
                // TODO move this to the s3fronterSet or mounts or something. Take advantage of PiggyBack's scope.
                connected = true;
                statReader = new StatReader(eventHandler, config);
                if (getValue(mountLocation).equals(""))
                {
                    OptionBox optBox = getOptionBox(mountLocation);
                    optBox.getOption().setRequired(true);
                    optBox.requestFocusInWindow();
                }
                // UGLY HACK... Send a 'ps aux' to grab the parent PID.
                setNextLink(new PSCommand(getValue(fileLocation), null));
                fireNextLink();
            }
        }
    }
    PiggyBack piggyBack = new PiggyBack();
    piggyBack.setConsole(console);
    command.setNextLink(piggyBack);
    return new CommandThread[]{command};
}
It doesn't matter.
1) Develop a style that is your own. Whatever it is that you find easiest and most comfortable, do it. Try to be as consistent as you can, but don't become a slave to consistency. Shoot for about 90%.
2) When you're modifying another developer's code, or working on a group project, use the stylistic conventions that exist in the codebase or that have been laid out in the style guide. Don't complain about it. If you are in a position to define the style, present your preferences but be willing to compromise.
If you follow both of those you'll be all set. Think of it as speaking the same language in two different ways. For example: speaking differently around your friends than you do with your grandfather.
It's not petty to make pretty code. When I write something I'm really proud of, I can usually take a step back, look at an entire method or class, and realize exactly what it does at a glance - even months later. Aesthetics play a part in that, though not as large of a part as good design. Also, realize you can't always write pretty code, (untyped ADO.NET anyone?) but when you can, please do.
Unfortunately, at this higher level at least, I'm not sure there are any hard rules you can adhere to to always produce aesthetically pleasing code. One piece of advice I can offer is to simply read code. Lots of it. In many different frameworks and languages.
I like to break up logical "phrases" of code with white space. This helps others easily visualize the logic in the method - or reminds me when I go back and look at old code. For example, I prefer
reader.MoveToContent();
if( reader.Name != "Limit" )
    return false;

string type = reader.GetAttribute( "type" );
if( type == null )
    throw new SecureLicenseException( "E_MissingXmlAttribute" );

if( String.Compare( type, GetLimitName(), false ) != 0 )
    throw new SecureLicenseException( "E_LimitValueMismatch", type, "type" );
instead of
reader.MoveToContent();
if( reader.Name != "Limit" )
    return false;
string type = reader.GetAttribute( "type" );
if( type == null )
    throw new SecureLicenseException( "E_MissingXmlAttribute" );
if( String.Compare( type, GetLimitName(), false ) != 0 )
    throw new SecureLicenseException( "E_LimitValueMismatch", type, "type" );
The same break can almost be accomplished with braces but I find that actually adds visual noise and reduces the amount of code that can be visually consumed simultaneously.
Comments on the code line
As for comments at the end of the line: almost never. They're not really bad, just easy to miss when scanning through code. They also clutter up the line, taking attention away from the code and making it harder to read. Our brains are already wired to grok line by line. When the comment is at the end of the line we have to split the line into two concrete concepts: code and comment. I say if it's important enough to comment on, put it on the line preceding the code.
That being said, I do find one or two line hint comments about the meaning of a specific value are sometimes OK.
I find code with very little whitespace hard to read and navigate in, since I need to actually read the code to find logical structure in it. Clever use of whitespace to separate logical parts in functions can increase the ease of understanding the code, not only for the author but also for others.
Keep in mind that if you are working in an environment where your code is likely to be maintained by others, they will have spent the majority of their time looking at code that was not written by you. If your style distinctly differs from what they are used to seeing, your smooth code may be a speed bump for them.
I minimize white space. I put the main comment block above the code block and additional end-of-line comments on the stuff that may not be obvious to another developer. I think you are doing that already.
My preferred style is probably anathema to most developers, but I will add occasional blank lines to separate what seem like appropriate 'paragraphs' of code. It works for me, nobody has complained during code reviews (yet!), but I can imagine that it might seem arbitrary to others. If other people don't like it I'll probably stop.
The most important thing to remember is that when you join an existing code base (as you almost always will in your professional career) you need to adhere to the code style guide dictated by the project.
Many developers, when starting a project afresh, choose to use a style based on the Linux kernel coding-style document. The latest version of that doc can be viewed at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/CodingStyle;h=8bb37237ebd25b19759cc47874c63155406ea28f;hb=HEAD.
Likewise, many maintainers insist that you use Checkpatch before submitting changes to version control. You can see the latest version that ships with the Linux kernel in the same tree I linked to above at scripts/checkpatch.pl (I would link to it but I'm new and can only post one hyperlink per answer).
While Checkpatch is not specifically related to your question about whitespace usage, it will certainly help you eliminate trailing whitespace, spaces before tabs, etc.
Code Complete, by Steve McConnell (available in the usual locations) is my bible on this sort of thing. It has a whole chapter on layout and style that is just excellent. The whole book is just chock full of useful and practical advice.
I use exactly the same amount of whitespace as you :) Whitespace before methods, before comment blocks. In C, C++ the brackets also provide some "pseudo-whitespace" as there is only a single opening/closing brace on some lines, so this also serves to break up the code density.
Your code is fine, just do what you (and others you might work with) are comfortable with.
The only thing I see wrong with some (inexperienced) programmers about whitespace is that they can be afraid to use it, which is not true in this case.
I did however notice that you did not use more than one consecutive blank line in your sample code, which, in certain cases, you should use.
Here is how I would refactor that method. Things can surely still be improved and I did not yet refactor the PiggyBack class (I just moved it to an upper level).
By using the Composed Method pattern, the code becomes easier to read because it is divided into methods that each do one thing and work on a single level of abstraction. Fewer comments are needed, too. Comments that answer the question "what" are code smells (i.e. the code should be refactored to be more readable). Useful comments answer the question "why", and even then it would be better to improve the code so that the reason is obvious (sometimes that can be done by having a test that fails without the non-obvious code).
public CommandThread[] buildCommandsForExecution() {
    CommandLine cLine = buildCommandLine();
    CommandThread command = buildCommandThread(cLine);
    initPiggyBack(command);
    return new CommandThread[]{command};
}
private CommandLine buildCommandLine() {
    CommandLine cLine = new CommandLine(getValue(OptionConstants.EXECUTABLE));
    // "--test" must be first, and bucket and file location must be last,
    // because [TODO: enter the reason]
    cLine.addOption(getOption(OptionConstants.TEST));
    for (Option regularOption : getRegularOptions()) {
        cLine.addOption(regularOption);
    }
    cLine.addOption(getOption(OptionConstants.BUCKET));
    cLine.addOption(getOption(OptionConstants.FILE_LOCATION));
    return cLine;
}
private List<Option> getRegularOptions() {
    List<Option> options = getAllOptions();
    options.removeAll(getNonRegularOptions());
    return options;
}
private List<Option> getAllOptions() {
    List<Option> options = new ArrayList<Option>();
    for (OptionBox optionBox : optionBoxes.values()) {
        options.add(optionBox.getOption());
    }
    return options;
}
private List<Option> getNonRegularOptions() {
    OptionConstants[] nonRegular = {
        OptionConstants.BUCKET,
        OptionConstants.FILE_LOCATION,
        OptionConstants.TEST,
        OptionConstants.EXECUTABLE,
        OptionConstants.MOUNT_LOCATION
    };
    List<Option> options = new ArrayList<Option>();
    for (OptionConstants c : nonRegular) {
        options.add(getOption(c));
    }
    return options;
}
private CommandThread buildCommandThread(CommandLine cLine) {
    GUIInteractiveCommand command = new GUIInteractiveCommand(cLine, console);
    command.addComponentsToEnable(enableOnConnect);
    command.addComponentsToDisable(disableOnConnect);
    if (isMountLocationSet()) {
        command.addComponentToEnable(mountButton);
    }
    return command;
}
private boolean isMountLocationSet() {
    String mountLocation = getValue(OptionConstants.MOUNT_LOCATION);
    return !mountLocation.equals("");
}
private void initPiggyBack(CommandThread command) {
    PiggyBack piggyBack = new PiggyBack();
    piggyBack.setConsole(console);
    command.setNextLink(piggyBack);
}
For C#, I say "if" is just a word, while "if(" is code - a space after "if", "for", "try" etc. doesn't help readability at all, so I think it's better without the space.
Also: Visual Studio> Tools> Options> Text Editor> All Languages> Tabs> KEEP TABS!
If you're a software developer who insists upon using spaces where tabs belong, I'll insist that you're a slob - but whatever - in the end, it's all compiled. On the other hand, if you're a web developer with a bunch of consecutive spaces and other excess whitespace all over your HTML/CSS/JavaScript, then you're either clueless about client-side code, or you just don't give a crap. Client-side code is not compiled (and not compressed with IIS default settings) - pointless whitespace in client-side script is like adding pointless Thread.Sleep() calls in server-side code.
I like to maximize the amount of code that can be seen in a window, so I only use a single blank line between functions, and rarely within. Hopefully your functions are not too long. Looking at your example, I don't like a blank line for an open brace, but I'll have one for a close. Indentation and colorization should suffice to show the structure.