Parsing HTML with TFHpple on iOS

I am working on an iOS project and my goal is to create a "pretty" app version of a specific website. To do this, I'm parsing all the data from said website with TFHpple for use in my app.
This website has a store, so my goal was to read the webpage with the products, then display them all in a table view and what not. However, the website doesn't display all the products on the specific page at once.
It splits the products up across 3 pages inside the one website page, so you have to click the "next page" button on the website; that runs some JavaScript and appends "?foo" to the URL, so the browser ends up at "http://www.domain.com/page?foo". I read that this "?foo" is a query string, but I don't know how I would trigger this in my Objective-C code starting from the base "http://www.domain.com/page" page.
So simply, I can't grab all the items I would like to with TFHpple because some are not in the HTML until the correct query string is used.
I'm not really sure if I am explaining my predicament very clearly, but if anyone has any info that I could use I would appreciate it immensely!

I am working on a similar project. I think your problem has nothing to do with TFHpple: TFHpple only handles HTML parsing, not downloading data from the server. My approach is like this:
// Build the URL for the current page of products (note @"..." for the format string and %ld for NSInteger)
NSInteger currentPageIndex = 1;
NSString *dataUrlString = [NSString stringWithFormat:@"http://www.domain.com/page?index=%ld", (long)currentPageIndex];
Detect the table view's scroll events; when the user reaches the end of the table view, load the next page of data from the server.
Parse the HTML of the new page, append the results to the table view's data source, and reload the table view.

Related

Saving static HTML page generated with ReactJS

Background:
I need to allow users to create web pages for various products, with each page having a standard overall appearance. So basically, I will have a template, and based on the input data I need the HTML page to be generated for each product. The input data will be submitted via a web form, following which the data should be merged with the template to produce the output.
I initially considered using a pure templating approach such as Nunjucks, but moved to ReactJS as I have prior experience with the latter.
Problem:
Once I render the output page (by merging the user input into the template file with placeholders), I get the desired output page displayed in the browser. But how can I now obtain the HTML code for this specific page?
When I try to view the source code of the page, I see the contents of 'public/index.html', which states:
This HTML file is a template.
If you open it directly in the browser, you will see an empty page.
As expected, the same happens when I try to save the HTML page via the browser (Save As...). I understand why the above happens.
But I cannot find a solution to my requirement. Can anyone tell me how I can download/save the static source code for the output page displayed in the browser?
I have read about possible solutions such as installing the 'React/Redux Development Extension', etc., but these would not work for external users (who cannot be expected to install these extensions to use my tool). I need a way to do this in a production environment.
p.s. Having read the "background" info of my task, do let me know if you can think of any better ways of approaching this.
Edit note:
My app is currently just a single page that accepts user data via a form and displays the output (in a full-screen dialog). I don't wish to have these output pages 'published' on the website; they are simply to be saved/downloaded for internal use. So simply being able to get the "source code" for the displayed view/page in the browser and save it to a file would solve my problem. But I am not sure if there is a way to do this?
It's recommended that you use a well-known static site generator such as Gatsby or Next for static sites, since "npx create-react-app my-app" is meant for single-page apps.
(ref: https://reactjs.org/docs/create-a-new-react-app.html#recommended-toolchains)
If I'm understanding correctly, you need to generate a new page link for each user. Each of your users will have their own link (http/https) to share with their users.
For example, a scheduling tool will need each user to create their own "booking page", which is a generated link (could be on your domain --> www.yourdomain.com/bookinguser1).
You'll need user profiles to store each user's custom page, a database, and such. If you're not comfortable building that, I'd use something like an e-commerce tool that will do it for you.
You can open the browser debugger (F12) and go to the "Elements" tab.
Then right-click on the <html> tag and choose "Edit as HTML".
Then select everything (Ctrl + A) and copy it.
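If the goal is to let external users do this without opening the developer tools, one rough alternative (a sketch only, not part of the steps above; the saveRenderedPage name and the "product-page.html" filename are made up for illustration) is to serialize the rendered DOM from a button inside the app and trigger a download:
// Sketch only: serialize whatever React has rendered into the live DOM and download it as a file.
// saveRenderedPage and the default filename are illustrative names, not an existing API.
function saveRenderedPage(filename: string = "product-page.html"): void {
  const html = "<!DOCTYPE html>\n" + document.documentElement.outerHTML; // the rendered page, not the empty template
  const blob = new Blob([html], { type: "text/html" });
  const link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = filename; // triggers a download instead of navigation
  link.click();
  URL.revokeObjectURL(link.href);
}
Note that external stylesheets and scripts are still referenced rather than inlined, so the saved file contains the markup only; ReactDOMServer.renderToStaticMarkup is another option if you want plain HTML for a single component rather than the whole document.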

Obtaining the URL for a Particular Page in DotNetNuke 7

I have created a page in DNN 7 and added the standard feedback module available at Codeplex to it. Now I want to link to this page using a hyperlink in the middle of another page (not from a menu).
I am able to see the URL for the feedback page via the admin pages and it seems to be consistent. So the obvious way would be to use the HTML module and simply hardcode the URL. But something feels wrong about that. I thought of creating a simple module, encapsulate the hyperlink and surrounding text in a control and use NavigateURL to obtain the URL for the feedback page. Unfortunately, I have not been able to figure out how to do that. I have seen a lot of information about getting the URL for other controls within the same module and even using ModuleID but nothing that would help me implement the code for getting the URL for a particular page at my level of experience.
Sorry about the long intro, but I was wondering whether it is good practice to hardcode the URL and, if not, how to programmatically obtain the URL for the feedback page.
TIA
The first argument to NavigateURL is TabId (pages are called tabs in the DNN API). To get the ID of the Feedback tab/page, you'll want to call a method off of the DotNetNuke.Entities.Tabs.TabController class; I'd suggest the static method TabController.GetTabByTabPath(portalId, tabPath, cultureCode), so something like this:
Globals.NavigateURL(TabController.GetTabByTabPath(this.PortalId, "//Feedback", string.Empty))
You're still hard-coding the path to the page here; you could add a setting that lets you pick the page, but that seems like overkill for a simple link. The main benefit of hard-coding the path but still using NavigateURL is that any changes you make to how URLs are generated (e.g. upgrading to the Advanced URL Provider that comes in DNN 7.1) will be picked up automatically.
Most folks don't worry much about programmatically generating links in HTML content.

How to buffer results from one webpage to another?

I am using flask for my website.
I want to submit data from a form on my website to another website and then format the results page (an HTML page) before displaying it in the browser on my website.
After the form is submitted, the browser is directed to the results page of that other website. Can I redirect it back to my site?
How can I process the results page (i.e. get its HTML content as a string in code)?
The process sounds like this: the user inputs the data -> your website gets the data and posts it to the other website -> that website returns the processed data -> your website receives this data, parses it, and displays it. I don't know whether you are familiar with APIs or not, but if you are, you'll see that the process is just like using an API. You just need to put the whole process in a view and register it with add_url_rule.
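To make that flow concrete, here is a rough sketch of the receive -> forward -> parse -> respond cycle. It is written with Node/Express, the built-in fetch, and cheerio purely for illustration; in Flask the equivalent would be a view function registered with add_url_rule, using something like requests plus an HTML parser. The target URL, form fields, and the ".result" selector are assumptions:
// Illustrative sketch only; the upstream URL, form fields, and ".result" selector are assumptions.
import express from "express";
import * as cheerio from "cheerio";

const app = express();
app.use(express.urlencoded({ extended: true })); // parse the user's submitted form

app.post("/search", async (req, res) => {
  // Forward the user's form data to the other website
  const upstream = await fetch("https://other-site.example/results", {
    method: "POST",
    body: new URLSearchParams(req.body),
  });
  // Receive the results page back as a plain HTML string
  const html = await upstream.text();
  // Parse it and keep only the parts you want to display
  const $ = cheerio.load(html);
  const rows = $(".result").map((_, el) => `<li>${$(el).text()}</li>`).get();
  // Respond with your own formatting instead of redirecting the browser to the other site
  res.send(`<ul>${rows.join("\n")}</ul>`);
});

app.listen(5000);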

Live content from HTML to HTML

I'm using UIWebView to display data from my organization's website (the data is public and legal to use); however, I only want to pull specific data from the HTML file rather than pulling in the whole URL. For example, I want to pull only the "News" section of the HTML, keep the user on that page (not letting them go to other parts of the website, e.g. the home page or contact us), and still allow them to view the PDF articles linked from the HTML file.
I've asked around and read up on the DOM and screen scraping, but it seems that the data pulled that way is usually stored in a database instead.
Is there any way I can pull just the "News" section of the HTML, with the PDF URLs, into my customized HTML file, and have it update live? For example, it might refresh every 30 seconds and pull information from the website so that the content and the list of PDFs stay up to date (e.g. if 3 new articles are added to the main website, my customized HTML file would also refresh, pull the information from the website, and update my article list).
If anyone can point me to a specific method that allows live HTML-to-HTML data passing, that would be great, and I can go do more research on it. I'm currently very lost and confused, as this is my first time doing this. Any help/feedback will be very much appreciated :)
EDIT: For example, Google Maps or Google Search. I don't want to use the whole Google webpage, just the important parts I want, like the search results or the map display.
This will involve quite a lot of learning on your part - you'll have to learn HTML / the DOM / JavaScript and iOS/UIWebVIew.
Let's leave the live refresh part for now; I'll post another answer or edit this one later on.
That's not going to be easy either (check out my earlier posting today on background execution issues that will affect you, unless the update is only to take place in the foreground:
iOS Run Code Once a Day)
You will have to do something like this. Note that I've never tried this, nor seen posts here from people who have, but in theory it should work. There will be a lot of learning, as I've said, and lots of trial and error. It's a big task when you're not familiar with these things.
1) Download the HTML page and load it in a UIWebView, but keep that UIWebView hidden so the user can't see it.
2) When the page has loaded, its DOM will be accessible.
3) Use JavaScript to access the DOM and look for the parts you want (see the sketch after this list).
How you inject and run the JavaScript in a UIWebView can be answered in a separate question (this answer would get too long if all the exact details were included).
4) Remove the parts of the DOM you are not interested in, or use events to make only the parts you are interested in appear; jQuery can probably help here.
5) Display the UIWebView
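To make steps 3 and 4 concrete, here is a minimal sketch of the kind of script you might inject (the id="news" selector is an assumption about the page's markup; in Objective-C you would pass this source to UIWebView's stringByEvaluatingJavaScriptFromString: once the page has loaded):
// Minimal sketch, assuming the site wraps its news area in an element with id="news" (an assumption).
(function () {
  var news = document.getElementById("news"); // hypothetical id for the News section
  if (news) {
    document.body.innerHTML = "";             // drop the navigation, home page links, etc.
    document.body.appendChild(news);          // keep only the News markup, PDF links included
  }
})();
Keeping the user from navigating elsewhere can then be handled in the UIWebView delegate's webView:shouldStartLoadWithRequest:navigationType: method.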
Alternatively, the HTML could be saved to a file and string parsing used to search for the bits you are looking for and create a new HTML file from them. I think this would get very messy; it's better to take advantage of the fact that UIWebView will parse the HTML page and create the DOM for you.

Is there any way of making JSON data readable by a Google spider?

Is it possible to make JSON data readable by a Google spider?
Say for instance that I have a JSON feed that contains the data for an e-commerce site. This JSON data is used to populate a human-readable page in the user's browser. (I.e. the translation from JSON data to the displayed page is done inside the user's browser; not my choice, just what I've been given to work with: it's an old legacy CGI application, not an actual server-side scripting language.)
My concern here is that the Google spiders will not be able to pick up or link directly to the item in question, so when a user clicks on a result in Google they will be presented with an index page full of all the items rather than being linked directly to the item they clicked on.
Is there any way of "informing" the Google spider, via the JSON, that it should feed the user a different link?
While Google does crawl and index JavaScript in some circumstances, it's still best to serve "normal" (X)HTML content if at all possible. In this case, it would help to know the rest of the site's setup, in particular: is the JSON content just used to create a feed of links to the product pages (with static content), or are all product pages also generated from JSON feeds?
If the feed is only used to point to the actual product pages (which are static), then one way to make the product pages discoverable could be to create an HTML sitemap page or some other alternate form of navigation. An XML Sitemap file can also help, but I would recommend not using it as the sole way of making the product pages discoverable.
If all of the content is only accessible through JSON feeds, then I think you will have to make some bigger changes if you want that content to be accessible through search results.
One way to handle it could also be to use the new JavaScript crawling/indexing proposal, which basically would result in a headless browser being set up between your site and Google: http://code.google.com/web/ajaxcrawling/ (whether setting this up or revamping the rest of the site is easier is hard to say :-))
You should make a wrapper page in server-side code around the JSON data, and respond to requests with either the wrapper or the regular version depending on the User-Agent.
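As a rough sketch of that idea (Node/Express is used here purely for illustration, since the original setup is a legacy CGI app; the /products route, the products.json feed, and the bot pattern are assumptions):
// Illustrative sketch only; the route, the products.json feed, and the bot regex are assumptions.
import express from "express";
import { readFileSync } from "fs";

const app = express();
const products: { id: string; name: string; price: string }[] =
  JSON.parse(readFileSync("products.json", "utf8")); // the same feed the in-browser page consumes

app.get("/products", (req, res) => {
  const ua = req.get("User-Agent") ?? "";
  if (/Googlebot|bingbot/i.test(ua)) {
    // Crawlers get a plain HTML wrapper around the JSON data, with one link per item
    const items = products
      .map(p => `<li><a href="/products/${p.id}">${p.name}</a> (${p.price})</li>`)
      .join("\n");
    res.send(`<html><body><ul>${items}</ul></body></html>`);
  } else {
    // Normal visitors get the regular JavaScript-driven page
    res.sendFile("index.html", { root: process.cwd() });
  }
});

app.listen(3000);
The wrapper should carry the same product information as the JSON-driven page, just rendered as plain HTML links the spider can follow.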