I am posting some data to a server using the DefaultHttpClient class, and in the response stream I get an HTML file. I save the stream as a string and pass it on to another activity, which contains a WebView to render this HTML on the screen:
response = httpClient.execute(get);
InputStream is = response.getEntity().getContent();
BufferedReader br = new BufferedReader(new InputStreamReader(is,"utf-8"));
StringBuffer sb = new StringBuffer();
String line;
while ((line = br.readLine()) != null) {
    sb.append(line);
    sb.append("\n");
}
is.close();
Intent intent = new Intent(this,Trial.class);
intent.putExtra("trial",sb.toString());
startActivity(intent);
Log.i("SB",sb.toString());
In the second activity, the code to load the WebView reads:
WebView browser = ((WebView)findViewById(R.id.trial_web));
browser.getSettings().setJavaScriptEnabled(true);
browser.loadData(html,"text/html", "utf-8");
When I run this code, the WebView is not able to render the HTML content properly; it actually shows the HTML string in URL-encoded form on the screen. Interestingly, if I copy the logger's output into an HTML file and then load that file in my WebView (using webview.loadUrl(file:///assets/xyz.html)), everything works fine.
I suspect some problem with character encoding.
What is going wrong here? Please help.
Thanks.
Try using a BasicResponseHandler rather than converting it all into a string yourself. See here for an example. I am skeptical this will help, but it will simplify your code and let you get rid of the inefficient StringBuffer.
Also, you might try switching to loadDataWithBaseURL(), as I have had poor results with loadData(). The aforementioned example shows this as well.
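For illustration, here is a rough sketch of both suggestions combined, reusing the request, the Trial activity, and the "trial" extra from your question (adjust to taste):

// first activity: BasicResponseHandler returns the body of a 2xx response as a String
// (and throws HttpResponseException for error statuses)
HttpClient httpClient = new DefaultHttpClient();
String html = httpClient.execute(get, new BasicResponseHandler());
Intent intent = new Intent(this, Trial.class);
intent.putExtra("trial", html);
startActivity(intent);

// second activity: loadDataWithBaseURL() does not expect the content to be
// URL-encoded the way loadData() does, which is likely why the raw HTML shows up encoded
WebView browser = (WebView) findViewById(R.id.trial_web);
browser.getSettings().setJavaScriptEnabled(true);
browser.loadDataWithBaseURL(null, html, "text/html", "utf-8", null);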
I am trying to parse the sidebar TOC (table of contents) of a documentation site.
Jsoup
I have tried Jsoup. I cannot get the TOC elements because the HTML content of that tag is not part of the initial HTML; it is set by JavaScript after the page loads.
You can see my previous question here: JSoup cannot parse child elements after depth 2
The suggested solution was to manually examine, from the browser Dev Tools, which requests are made and to find the final version of the page. Parsing the sidebar TOC of a documentation site is only one component of my Java program, so I cannot do this manually.
JavaFX WebView (not Android WebView)
I have tried the JavaFX WebView because I need a browser that executes JavaScript code and fills in the TOC elements.
WebView browser = new WebView();
WebEngine webEngine = browser.getEngine();
webEngine.load("https://learn.microsoft.com/en-us/ef/ef6/");
But I don't know how to retrieve the HTML of the loaded page and turn it into a Jsoup Document.
Any advice is appreciated.
WebView browser = new WebView();
WebEngine webEngine = browser.getEngine();
String url = "https://learn.microsoft.com/en-us/ef/ef6/";
webEngine.load(url);
// note: load() is asynchronous, so read the document once loading has finished
// get the w3c document from the WebEngine
org.w3c.dom.Document w3cDocument = webEngine.getDocument();
// use Jsoup's helper to convert it to an HTML string
String html = new org.jsoup.helper.W3CDom().asString(w3cDocument);
// create a Jsoup document by parsing that HTML, with the url as the base URI
Document doc = Jsoup.parse(html, url);
I can't promise this is the best way as I've not used Jsoup before and I'm not an expert on the XML API.
The org.jsoup.Jsoup class has a method for parsing HTML in String form: Jsoup.parse(String). This means we need to get the HTML from the WebView as a String. The WebEngine class has a document property that holds an org.w3c.dom.Document. This Document is the HTML content of the currently showing web page. We just need to convert this Document into a String, which we can do with a Transformer.
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.jsoup.Jsoup;
public class Utils {

    private static Transformer transformer;

    // not thread safe
    public static org.jsoup.nodes.Document convert(org.w3c.dom.Document doc)
            throws TransformerException {
        if (transformer == null) {
            transformer = TransformerFactory.newDefaultInstance().newTransformer();
        }
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return Jsoup.parse(writer.toString());
    }
}
You would call this every time the document property changes. I did some "tests" by browsing Google and printing the org.jsoup.nodes.Document to the console and everything seems to be working.
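For example, you could hook the conversion to the document property like this (a sketch only; it assumes the Utils class above and that the listener is attached before calling load()):

webEngine.documentProperty().addListener((obs, oldDoc, newDoc) -> {
    if (newDoc != null) {
        try {
            // convert the freshly loaded w3c Document into a Jsoup Document
            org.jsoup.nodes.Document jsoupDoc = Utils.convert(newDoc);
            System.out.println(jsoupDoc.title());
        } catch (TransformerException e) {
            e.printStackTrace();
        }
    }
});
webEngine.load("https://learn.microsoft.com/en-us/ef/ef6/");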
There is a caveat, though: as far as I understand it, the document property does not change when there are changes within the same web page (the Document itself may be updated, however). I'm not a web person, so pardon me if I don't make sense here, but I believe this includes things like a frame changing its content. There may be a way around this by interfacing with the JavaScript using WebEngine.executeScript(String), but I don't know how.
I have a byte array of a TIFF image. When I output it in the same format, the image opens. But when I convert it to JPG, it doesn't open (in Chrome, though it works in IE).
PS: I want to convert the byte array directly to show the image dynamically, as per my requirement.
ByteArrayOutputStream bOutStream = new ByteArrayOutputStream();
bOutStream = < ... Tiff image Stream Received from my API Call... >
byte[] chqImage = bOutStream.toByteArray();
response.setContentType("image/jpeg");
BufferedOutputStream output = null;
output = new BufferedOutputStream(response.getOutputStream());
output.write( bOutStream.toByteArray());
output.flush();
You're going to need to actually translate the image itself from TIFF to JPEG. To do that, I recommend reviewing the ImageIO library in Java.
I feel like the javadoc for that library is pretty straightforward, so you should be OK there.
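If it helps, here is a minimal sketch of that translation with ImageIO, assuming a TIFF ImageReader is available (one is bundled since Java 9; older JDKs need a plugin such as TwelveMonkeys or JAI ImageIO) and that response is the servlet response from your snippet:

import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.servlet.http.HttpServletResponse;

public class TiffToJpeg {
    public static void writeAsJpeg(byte[] tiffBytes, HttpServletResponse response) throws IOException {
        // decode the TIFF bytes into an in-memory image
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(tiffBytes));
        if (image == null) {
            throw new IOException("no registered ImageReader could decode the data");
        }
        // re-encode as JPEG straight onto the response stream
        response.setContentType("image/jpeg");
        ImageIO.write(image, "jpg", response.getOutputStream());
        response.getOutputStream().flush();
    }
}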
I have an HTML file, and I want to get a string from this HTML file. I use a WebBrowser to read the HTML, but I don't know how to get the string from the WebBrowser. Can you help me? Thanks all!
WebBrowser web = new WebBrowser();
web.Source = new Uri("Assets/text.html", UriKind.RelativeOrAbsolute);
Try this:
WebBrowser webBrowser = new WebBrowser();
webBrowser.Navigate(new Uri("Assets/text.html", UriKind.Relative));
I am assuming that your HTML file content is in the proper format.
I have a CKEditor (http://ckeditor.com/) on my site. I would like users to be able to push a button to generate a PDF. Currently, I have them press the print function that comes with CKEditor, which brings up the print window, and from most browsers they can generate a PDF from there. But I want to make it simpler. I know that generating PDFs from HTML is difficult, but are there any simple solutions to do this (generate a PDF from the HTML that CKEditor gives)?
I've heard of a few solutions like fpdf, dompdf and html2pdf.
You can use iText and XMLWorker to create a PDF from HTML code.
// requires iText 5 and the XML Worker add-on
import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public void createPDF() throws DocumentException, IOException
{
    String fileName = "path you want to create the document";
    Document document = new Document();
    PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream(fileName));
    document.open();
    String finall = "<h1>This is a Demo</h1>";
    InputStream is = new ByteArrayInputStream(finall.getBytes());
    XMLWorkerHelper.getInstance().parseXHtml(pdfWriter, document, is);
    document.close();
}
Here we are using XML Worker, so all your tags need to be closed correctly. You need the iText and XMLWorker JAR files. Hope this will help you.
I'm scraping a static html site and moving the content into a database-backed CMS. I'd like to use Textile in the CMS.
Is there a tool out there that converts HTML into Textile, so I can scrape the existing site, convert the HTML to Textile, and insert that data into the database?
I know this is an old question, but I found myself trying to do this the other day and not finding anything useful, until I found Pandoc. It can convert loads of other markup formats as well - it's quite brilliant.
Here is a C# lib that converts HTML to Textile, though it is Textile with their own additions, not pure Textile.
Since there was no javascript implementation, I wrote one:
https://github.com/cmroanirgo/to-textile
It's a little primitive at the moment, as it's a blind port of the 'to-markdown' equivalent, but should get the job done.
This is a simple markup replacement, nothing a good regex could not fix.
I recommend Perl, LWP::Simple and some regexes to do the whole thing (spidering, stripping design and menus, converting to textile, and then posting to the database.)
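To give a rough idea of the regex approach (in Java rather than Perl, and covering only a small, assumed subset of tags):

import java.util.LinkedHashMap;
import java.util.Map;

public class HtmlToTextile {
    // a deliberately small rule set; real-world HTML needs more care
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("(?s)<h1[^>]*>(.*?)</h1>", "h1. $1\n\n");
        RULES.put("(?s)<h2[^>]*>(.*?)</h2>", "h2. $1\n\n");
        RULES.put("(?s)<(strong|b)>(.*?)</\\1>", "*$2*");
        RULES.put("(?s)<(em|i)>(.*?)</\\1>", "_$2_");
        RULES.put("(?s)<a[^>]*href=\"([^\"]*)\"[^>]*>(.*?)</a>", "\"$2\":$1");
        RULES.put("(?s)<p[^>]*>(.*?)</p>", "$1\n\n");
    }

    public static String convert(String html) {
        String textile = html;
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            textile = textile.replaceAll(rule.getKey(), rule.getValue());
        }
        return textile.trim();
    }
}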
Try this simple Java code; hope it works for you.
import java.net.*;
import java.io.*;

class Crawle
{
    public static void main(String ar[]) throws Exception
    {
        URL url = new URL("https://www.google.co.in/#q=i+am+happy");
        InputStream io = url.openStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(io));
        FileOutputStream fio = new FileOutputStream("crawler/file.txt");
        PrintWriter pr = new PrintWriter(fio, true);
        String data;
        // copy the page line by line into the file and echo it to the console
        while ((data = br.readLine()) != null)
        {
            pr.println(data);
            System.out.println(data);
        }
        pr.close();
        br.close();
    }
}