Parsing site HTML instead of API [closed] - html

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I'm in the process of making an app and wondered if it is bad practice to parse a website's HTML page (in terms of efficiency) when their API does not provide me with the specific information I need for an element in my app. (Of course all due credit/sources will be provided visibly in my app, etc, etc)
For instance, suppose the Google Places API does not provide a venue's business hours. As a workaround, I would go to that venue's Google Places page and parse the HTML for the business's hours to show in my app.

Just some thoughts that I hope would make things clearer.
If an API doesn't provide the data you need, a good first step is to contact the API developers and request the functionality. Also, before taking the web-scraping/HTML-parsing approach, study the legal side: read the Terms of Use and make sure the website does not prohibit scraping.
Also, take into account the possible complexity of the html-parsing code. You would depend on the actual HTML markup that can be changed at any point. The solution you would implement can be really fragile because of it.
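One way to contain that fragility is to keep the scraping logic in a single function that reports failure explicitly, so the app can fall back when the markup changes. The sketch below is only illustrative: the `class="hours"` markup and the name `extractHours` are assumptions, not anything Google Places actually serves, and a regular expression stands in for a real HTML parser to keep the example short.

```javascript
// Hypothetical markup: <span class="hours">Mon-Fri 9am-5pm</span>
// Returns the hours text, or null when the expected markup is missing.
function extractHours(html) {
  const match = /<span[^>]*class="hours"[^>]*>([^<]*)<\/span>/i.exec(html);
  if (!match) {
    return null; // markup changed or element absent: let the caller decide
  }
  const text = match[1].trim();
  return text.length > 0 ? text : null;
}

const page = '<div><span class="hours"> Mon-Fri 9am-5pm </span></div>';
const redesigned = '<div><p data-role="opening-times">Mon-Fri 9am-5pm</p></div>';

console.log(extractHours(page));       // Mon-Fri 9am-5pm
console.log(extractHours(redesigned)); // null
```

The second call shows what a site redesign does to the scraper: the data is still on the page, but the function can no longer find it, so the caller has to handle `null`.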
Besides, some things can be very difficult to get without a real browser. For example, a value can be computed by JavaScript running in the browser, or loaded through a set of complicated AJAX calls. In that case you would need to drive a real browser, which is both an extra dependency and something that slows things down dramatically.
Some sites also have anti-crawling measures in place, such as banning an IP address after many consecutive requests or requiring a certain header to be sent with each request.
You can also take another defensive step: contact the webmaster and discuss the problem.
Follow-up: Web scraping etiquette.


Sitecore, why is so difficult? [closed]

I am new to Sitecore. My company uses an external company to manage its Sitecore site, which I totally understand, since so much development work is involved.
As a designer, I find it extremely difficult to create a custom page unless I use what's already available. I could use a simple page and insert my HTML code, but again, that just takes way too long. Normally, when you build an HTML site, you can simply create pages in Dreamweaver and view them on your local computer.
I have tried to make a custom page with the presentation controls, but each time I call a sub-rendering, the page is just BLANK.
So my 1st question will be: what's the procedure to create a custom page?
I know Sitecore is supposed to be powerful and there are many APIs, but I would really like to find out why I find it so difficult...
My background: I am a designer with knowledge of HTML, CSS, and PHP. I am not a developer, that's for sure. :)
Thanks for taking the time to read my blah blah. :)
1st question will be, What's the procedure to create a custom page?
To answer your first question, there are some high-level steps you would generally take:
1. Create a page template that includes any fields or metadata you need to render the page
2. Create any layouts, sublayouts, or renderings necessary to render the custom page - this is where having access to a developer normally becomes necessary
3. Assign the renderings and datasources to the instance of your new template (or better yet, assign the renderings to the __Standard Values item)
4. Publish everything out
You should reference the Self-Study to Building a Very Simple Site from Sitecore
2nd question will be, why do we need to call the developer each time when we want to have some feature inputs?
To answer your second question: To get very far with customizing Sitecore you will need to be a developer or have access to one. This can be mitigated to some extent depending on how flexible the solution is they developed. But let's be real - Sitecore is an Enterprise CMS, it's not Wordpress where you can install a theme and a few plugins.
As someone just learning, you have a number of options:
Training from Sitecore - this is probably your best bet
Download and play with Launch Sitecore for sample code and examples to build a real website
Check out the Sitecore Marketplace for modules that can get things done for you
Subscribe to and read John West's blog for inside information of basically every aspect of Sitecore
3rd question is, why I can access the CSS?
This question doesn't make sense frankly, so I will assume it was meant to ask "How?" or "Where?" Without any more information about the site in question, you can normally map the URL to the location on disk. For example:
http://www.mysite.com/css/styles.css
This URL might map to c:\inetpub\wwwroot\mysite\website\css\styles.css
I do highly recommend that any code changes, including CSS, be done through your source control system and only be deployed following your standard release management.
Honestly, I don't believe you are qualified to modify and maintain the Sitecore site given your current training and experience level. The first step I recommend is getting that Sitecore developer training and any training available from your vendor on the specific implementation. Good luck!

Can anyone (including Google) get a list of the files on the server that holds my website? [closed]

If I have an html file on a web server without any links in it and without any links pointing to it anywhere, will Google be able to see it? Will Google be able to promote it?
Generally, Google and other search engines find new pages to add to their indices by following links from one web page to another.
Some search engines, including Bing and Google, also allow webmasters to submit URLs directly, meaning that your site may get indexed even if there are no links pointing to it from the “outside world”. (Links like these are called “inbound links” in the trade.)
Short answer: Probably not.
Longer answer: For the most part, search engines like Google work by following links around, not by guessing what URLs are on your server. As long as the HTML file isn't a well-known name like "index" or "home" or another value used as a default index page by web servers, then it's unlikely to be included in a search index. (disclaimer: I don't work for Google and search algorithms are proprietary, so they may actually have some URL-guessing going on)
However, if you're relying on that behavior to protect something you don't want seen until you're ready to promote it, you're gonna have a bad time. History is full of examples of companies that decided to "hide" a URL they weren't ready to promote, only to be foiled by someone editing the URL string in their browser to troll for hidden content.
In general, if you really have no links to it, the answer is NO. HTTP has no command for getting a directory listing. (Well, I'm not discussing the possibility of Google spying via the Chrome browser.) If you DON'T WANT Google to see it, you can put it into a directory disallowed in robots.txt to keep well-behaved crawlers out, and make sure that your server is set up not to serve a directory index. If you WANT Google to scan it, the only way is to post a link to it somewhere.

Are there any analytics on how many people actually print webpages? [closed]

Has anyone done research, with a large sample, on how many users actually print webpages? I'm looking for percentage values: do .01%, 1%, etc. of users actually print webpages?
It seems like a waste of time to create design-oriented print pages if the numbers are extremely low.
It is very easy to create some print styles for your stylesheet to make printing easier on people.
As an example: http://www.alistapart.com/articles/goingtoprint/
Not everyone who visits your site will be disabled, yet the best practice is still to create sites that work for people with accessibility problems. The same reasoning applies to print styles.
I don't have a link to a study for you, but I'm very confident that it depends heavily on the type of content. For example, the percentage of people who print a YouTube video page is surely much lower than the percentage who print a recipe from an online cookbook.
So it's probably best to run your own study on the particular website where you need it. You can either make a little poll for the users of your site or track how often pages actually get printed.
This is not a metric that is usually tracked.
Since one needs to differentiate the regular page from the printable page, this requires a custom implementation on the printable version page that sends a particular tracking code/cookie.
It is not that hard to implement, one can even have printable pages tracked in google analytics or any analytic engine, but as I said it does require preparation and most people don't track this metric in particular.
It is possible through JavaScript to track the actual printing event in IE browsers. Since users most likely won't switch to IE just to print, this would give a reasonably accurate indication of what percentage of IE users print the page.
For more information about the onbeforeprint and onafterprint events have a look at:
http://msdn.microsoft.com/en-us/library/ms536672(v=vs.85).aspx
Btw, I am not saying that collecting this data solely from IE users would give an accurate indication of the overall % of printed pages across all browsers, because IE is far more commonly used in office environments rather than home environments.
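A minimal sketch of that event-based tracking follows. It assumes a browser that fires `beforeprint` and `afterprint` through `addEventListener` (the answer above only vouches for IE's `onbeforeprint`/`onafterprint`), and the names `trackPrinting` and `report` are hypothetical; `report` stands in for whatever analytics call you actually use.

```javascript
// Attach print listeners to a window-like object and report each print once.
// `report` is a hypothetical callback, e.g. a wrapper around an analytics hit.
function trackPrinting(win, report) {
  let printing = false;
  win.addEventListener('beforeprint', function () {
    printing = true;
  });
  win.addEventListener('afterprint', function () {
    if (printing) {
      printing = false;
      report({ event: 'print', page: win.location.pathname });
    }
  });
}

// Only wire it up when a real browser window exists.
if (typeof window !== 'undefined') {
  trackPrinting(window, function (hit) {
    console.log('printed', hit.page);
  });
}
```

Note that these events fire when the print dialog opens and closes, so this counts print attempts; it cannot tell whether the user cancelled the dialog.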

How/When to design an interface? [closed]

I'm trying to get in the habit of designing the interfaces to my websites at the very beginning, before I do any actual coding. I've read "Getting Real" by 37signals, and they recommend doing the interface first, before any actual code is produced.
What exactly is meant by that? Does that mean use pure HTML and CSS to design the site and add php, js logic to page afterwards, or is it okay to sprinkle in the php, js from the beginning?
What if you're using a framework: should I set up empty controllers that simply call the views, or should the early stages be solely HTML and CSS?
Also, what do you guys think about design first vs later?
EDIT: I'm talking about AFTER I have sketched everything with pen and paper. I'm talking solely about the HTML mockups. And I'm not too sure about using extra tools that I would need to learn to do this.
I think that the majority of the benefit of designing the interface first has been achieved once you are done with your paper sketches. Basically, you are just ensuring that you have a design in your head and that your coding process is somewhat end-user driven. You are also trying not to waste time on needless documentation.
Getting the HTML in place (or the skeletons of the Views in an MVC app) makes some sense and this is the main thrust of what 37signals says. I would certainly not do anything beyond this that is just going to be thrown away.
I think if you have a proper design, it is immaterial if you next move on to writing the back-end code after the HTML or if you do the CSS and JavaScript. The CSS and the code should not even need to be aware of each other.
Do whatever gets you excited and motivated. Do whatever gets you thinking more deeply about how the app will actually work so you can catch any flaws in your original thinking. I like to code before CSS but that is just me. You might find it important to get the CSS further along before the app takes shape in your head.
Joel Spolsky likes Balsamiq as a mockup tool. I think that 37signals uses Draft (an iPhone app). I use a Sharpie. The key is not getting too detailed though.
Opinions vary, but I believe that JavaScript should come last. I believe most sites should be designed so that they work 100% without JavaScript and then have JavaScript added for polish.
Learn more about Unobtrusive JavaScript
So (for me):
1. Quick and dirty sketches of views
2. Get some HTML in place
3. Maybe some basic CSS for layout (or more if I need to impress somebody early)
4. Write the core logic
5. Add support for web services and AJAX calls
6. Pretty it all up with snazzy CSS
7. Write some JavaScript to add the sizzle
Let me ask you this. Do you paint a car before or after you have made the working parts? Maybe you have chosen which paint but ultimately it cannot go on until the car is finished. Maybe you don't agree with this analogy but I think coding will bring out issues that cannot be understood before a site is designed. Code first, design second.
Get a pad of paper. Each page represents one page of your site.
Sketch the interface. What controls go on each page? What controls are the same on each page? What forms are there and on which pages? What happens when user clicks on item x? Item y?
This will help you solidify your plan of both the content and behaviour of your site.
If you just start blindly coding you will end up with burnt spaghetti.
The user interface is what the users of the website will see. Before coding you probably start with some very basic sketches of the site that are not code, to identify page navigation, general placement of content and interaction with the site.
But the earlier you can show and discuss a working UI, the easier it is for the users/client to get an idea of the final product. So quickly move to the HTML, CSS, JavaScript and things like images, to identify:
The data presented on the page (HTML)
The representation of the data (CSS)
The interaction with the data (JavaScript)
Doing so helps to gradually develop an actual working UI that you can discuss with the client. This keeps them involved from early in the project. It forces them to think about the site, and make decisions about content, look and interaction.
Getting such feedback early in the project reduces the risk of building a product that needs to be changed later on. And making changes early in the project is easier and cheaper than making them later.
While the UI is being developed you can already start looking into data structures, software components and integrations with other systems to drive the site. But that's not what users/clients are interested in, they want to see and use the product.

Client-side templating frameworks to streamline using jQuery with REST/JSON [closed]

I'm starting to migrate some html generation tasks from a server-side framework to the client. I'm using jQuery on the client. My goal is to get JSON data via a REST api and use this data to populate HTML into the page.
Right now, when a user on my site clicks a link to My Projects, the server generates HTML like this:
<dl>
<dt>Clean Toilet</dt>
<dd>Get off your butt and clean this filth!</dd>
<dt>Clean Car</dt>
<dd>I think there's something growing in there...</dd>
<dt>Replace Puked on Baby Sheets</dt>
</dl>
I'm changing this so that clicking My Projects will now do a GET request that returns something like this:
[
{
"name":"Clean Car",
"description":"I think there's something growing in there..."
},
{
"name":"Clean Toilets",
"description":"Get off your butt and clean this filth!"
},
{
"name":"Replace Puked on Baby Sheets"
}
]
I can certainly write custom jQuery code to take that JSON and generate the HTML from it. This is not my question, and I don't need advice on how to do that.
What I'd like to do is completely separate the presentation and layout from the logic (jQuery code). I don't want to be creating DL, DT, and DD elements via jQuery code. I'd rather use some sort of HTML templates that I can fill the data into. These templates could simply be HTML snippets that are hidden in the page that the application was loaded from. Or they could be dynamically loaded from the server (to support user-specific layouts, i18n, etc.). They could be displayed a single time, as well as allow looping and repeating. Perhaps they should support sub-templates, if/then/else, and so forth.
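To make the idea concrete, here is a minimal sketch of placeholder-based templating applied to the project list above. It is not the syntax of jTemplates or of any other library mentioned in this thread; the `{{key}}` placeholder style and the names `render`, `renderList`, and `escapeHtml` are assumptions made up for illustration.

```javascript
// Escape values so JSON data cannot inject markup into the page.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Replace {{key}} placeholders with escaped values from `data`.
// Missing keys render as an empty string.
function render(template, data) {
  return template.replace(/\{\{(\w+)\}\}/g, function (_, key) {
    return data[key] == null ? '' : escapeHtml(data[key]);
  });
}

// Looping: render the same template once per item and join the results.
function renderList(template, items) {
  return items.map(function (item) {
    return render(template, item);
  }).join('');
}

// The template string could equally be read from a hidden element on the page.
const projectTemplate = '<dt>{{name}}</dt><dd>{{description}}</dd>';
const projects = [
  { name: 'Clean Car', description: "I think there's something growing in there..." },
  { name: 'Replace Puked on Baby Sheets' }
];

const html = '<dl>' + renderList(projectTemplate, projects) + '</dl>';
console.log(html);
```

With jQuery on the page, the result could be placed with something like `$('#projects').html(html)`, where `#projects` is a hypothetical container. The project without a description gets an empty `<dd>` here, which is exactly where the if/then/else support mentioned above would come in.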
I have LOTS of lists and content on my site that are presented in many different ways. I'm looking to create a simple and consistent way to generate and display content without creating custom jQuery code for every different feature on my site. To me, this means I need to find or build a small framework on top of jQuery (probably as a plugin) that meets these requirements.
The only sort of framework that I've found that is anything like this is jTemplates. I don't know how good it is, as I haven't used it yet. At first glance, I'm not thrilled by its template syntax.
Anyone know of other frameworks or plugins that I should look into? Any blog posts or other resources out there that discuss doing this sort of thing? I just want to make sure that I've surveyed everything out there before building it myself.
Thanks!
Since posting this question, I have found many other templating options. I've listed many of them below. However, there was recently a jQuery templates proposal that may be the most promising solution yet. There is also a discussion about it on the jquery site. Here is the project location:
https://github.com/nje/jquery/wiki/jquery-templates-proposal
Other solutions I've come across include (in no particular order):
http://www.west-wind.com/weblog/posts/509108.aspx
http://ejohn.org/blog/javascript-micro-templating/
http://beebole.com/pure/
http://archive.plugins.jquery.com/project/jTemplates
http://archive.plugins.jquery.com/project/advancedmerge
http://archive.plugins.jquery.com/project/tempest
http://archive.plugins.jquery.com/project/jBind
http://archive.plugins.jquery.com/project/cliche
http://archive.plugins.jquery.com/project/appendDom
http://archive.plugins.jquery.com/project/openSocial-jquery-templates
http://archive.plugins.jquery.com/project/Orange-J
http://archive.plugins.jquery.com/project/fromTemplate-microtemplate
http://archive.plugins.jquery.com/project/resiglet
http://archive.plugins.jquery.com/project/databind
http://archive.plugins.jquery.com/project/jsont
http://archive.plugins.jquery.com/project/domplate
http://archive.plugins.jquery.com/project/noTemplate
http://archive.plugins.jquery.com/project/jQueryHtmlTemplates
http://github.com/trix/nano
http://aefxx.com/jquery-plugins/jqote/
http://ajaxian.com/archives/chainjs-jquery-data-binding-service
http://ajaxpatterns.org/Browser-Side_Templating
http://code.google.com/p/google-jstemplate/
http://code.google.com/p/trimpath/wiki/JavaScriptTemplates
http://embeddedjs.com/
Javascript template system - PURE, EJS, jquery plugin?
jQuery templating engines
http://goessner.net/articles/jsont/
Sounds like you want sammy.js
http://code.quirkey.com/sammy/
The tutorials there demo use of the template engine
I've used jTemplates quite a few times and from my experience it serves its intended purpose.
If we're limiting the discussion to the client side, then this is my final comment on the matter: it does the job and, despite some funky syntax, does it well. On the server side, however, I would definitely prefer the scenario where you POST some JSON, which is deserialized to an in-memory object, validated, and passed to a server-side template (such as an ASCX in ASP.NET), where you have the full power of that language.
In my opinion, if the client supports JavaScript well enough for you to be considering jTemplates, then I would recommend setting yourself up a JavaScript utility method that lets you send JSON and receive HTML, thereby cutting out the potentially troublesome middle man. Most languages have JSON-parsing ability, and jQuery can automatically parse a JSON response from the server when you set the dataType option to "json".
Even if you don't receive the JSON from the JavaScript, you can still take the JSON that you would have sent back to the browser and just send it through your server-side template instead. In ASP.NET (with MVC for example) you can have strongly-typed template files that do not need to be compiled, making changes a lot easier to implement. Therefore you would still be sending back markup, but it would have been run through a proper template with the full strength of a programming language behind it.
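The "send JSON, receive HTML" utility described above might look roughly like the sketch below. The function name `fetchRenderedHtml` and the `/render/projects` endpoint in the usage note are hypothetical; the options passed to `$.ajax` (`url`, `type`, `contentType`, `data`, `dataType`, `success`) are standard jQuery settings. jQuery is passed in as a parameter only so the sketch has no hidden global dependency.

```javascript
// POST a JavaScript value as JSON and hand the server-rendered HTML to `onHtml`.
// `$` is the jQuery object; the server is assumed to reply with an HTML fragment.
function fetchRenderedHtml($, url, payload, onHtml) {
  return $.ajax({
    url: url,
    type: 'POST',
    contentType: 'application/json', // tell the server the body is JSON
    data: JSON.stringify(payload),   // serialize the payload ourselves
    dataType: 'html',                // treat the response as markup, not JSON
    success: onHtml
  });
}
```

A call could then read `fetchRenderedHtml(jQuery, '/render/projects', projects, function (html) { jQuery('#projects').html(html); });`, with the server-side template doing all of the markup generation.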