My java program is storing the content of web page in the string sb and I want to parse the string to HTML DOM. How do I do that?
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Scraper {
public static void main(String[] args) throws IOException, SAXException {
URL u;
try {
u = new URL("https://twitter.com/ssjsatish");
URLConnection cn = u.openConnection();
System.out.println("content type: "+cn.getContentType());
InputStream is = cn.getInputStream();
long l = cn.getContentLengthLong();
StringBuilder sb = new StringBuilder();
if (l!=0) {
int c;
while ((c = is.read()) != -1) {
sb.append((char)c);
}
is.close();
System.out.println(sb);
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource i = new InputSource();
i.setCharacterStream(new StringReader(sb.toString()));
Document doc = db.parse(i);
}
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
}
}
You don't want to use an XML parser to parse HTML, because not all valid HTML is valid XML. I would recommend using a library specifically designed to parse "real-world" HTML, for example I have had good results with jsoup, but there are others. Another advantage of using this sort of library is that their APIs are designed with Web Scraping in mind, and provide much simpler ways of accessing data in the HTML document.
Related
I'm writing api using vertx router, and i need my api to return list of json object, is it possible? Thanks in advance
Yes its possible, here is a full example with vert.x json abstraction:
import io.vertx.core.Vertx;
import io.vertx.core.json.JsonArray;
import io.vertx.core.json.JsonObject;
import io.vertx.ext.web.Router;
public class Main {
public static void main(String[] args) {
Vertx vertx = Vertx.vertx();
Router router = Router.router(vertx);
router.get("/foo")
.handler(ctx -> {
JsonArray jsonArray = new JsonArray();
jsonArray.add(new JsonObject());
jsonArray.add(new JsonObject());
ctx.response()
.setChunked(true)
.setStatusCode(200)
.end(jsonArray.toBuffer());
});
vertx
.createHttpServer()
.requestHandler(router)
.listen(8080);
}
}
In case you want to use your own library for json manipulation + pojos, you can work with strings instead when ending the request. e.g. with jackson:
router.get("/foo")
.handler(ctx -> {
List<MyObject> myObjectList = new ArrayList<>();
ObjectMapper objectMapper = new ObjectMapper();
String myObjectListJsonStr = objectMapper.writeValueAsString(myObjectList);
ctx.response()
.setChunked(true)
.setStatusCode(200)
.end(myObjectListJsonStr);
});
I'm exploring reactive programming with Spring Webflux and therefore, I'm trying to make my code completely nonblocking to get all the benefits of a reactive application.
Currently my code for the method to parse a Json String to a JsonNode to get specific values (in this case the elementId) looks like this:
public Mono<String> readElementIdFromJsonString(String jsonString){
final JsonNode jsonNode;
try {
jsonNode = MAPPER.readTree(jsonString);
} catch (IOException e) {
return Mono.error(e);
}
final String elementId = jsonNode.get("elementId").asText();
return Mono.just(elementId);
}
However, IntelliJ notifies me that I'm using an inappropriate blocking method call with this code:
MAPPER.readTree(jsonString);
How can I implement this code in a nonblocking way? I have seen that since Jackson 2.9+, it is possible to parse a Json String in a nonblocking async way, but I don't know how to use that API and I couldn't find an example how to do it correctly.
I am not sure why it is saying it is a blocking call since Jackson is non blocking as far as I know. Anyway one way to resolve this issue is to use schedulers if you do not want to use any other library. Like this.
public Mono<String> readElementIdFromJsonString(String input) {
return Mono.just(Mapper.readTree(input))
.map(it -> it.get("elementId").asText())
.onErrorResume( it -> Mono.error(it))
.subscribeOn(Schedulers.boundedElastic());
}
Something along that line.
import reactor.core.publisher.Mono;
import java.nio.charset.StandardCharsets;
import org.springframework.core.ResolvableType;
import org.springframework.core.io.buffer.DataBufferUtils;
import org.springframework.core.io.buffer.DefaultDataBuffer;
import org.springframework.core.io.buffer.DefaultDataBufferFactory;
import org.springframework.http.codec.json.AbstractJackson2Decoder;
import org.springframework.util.MimeType;
import org.springframework.util.MimeTypeUtils;
import com.fasterxml.jackson.databind.ObjectMapper;
#FunctionalInterface
public interface MessageParser<T> {
Mono<T> parse(String message);
}
public class JsonNodeParser extends AbstractJackson2Decoder implements MessageParser<JsonNode> {
private static final MimeType MIME_TYPE = MimeTypeUtils.APPLICATION_JSON;
private static final ObjectMapper OBJECT_MAPPER = allocateDefaultObjectMapper();
private final DefaultDataBufferFactory factory;
private final ResolvableType resolvableType;
public JsonNodeParser(final Environment env) {
super(OBJECT_MAPPER, MIME_TYPE);
this.factory = new DefaultDataBufferFactory();
this.resolvableType = ResolvableType.forClass(JsonNode.class);
this.setMaxInMemorySize(100000); // 1MB
canDecodeJsonNode();
}
#Override
public Mono<JsonNode> parse(final String message) {
final byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
return decode(bytes);
}
private Mono<JsonNode> decode(final byte[] bytes) {
final DefaultDataBuffer defaultDataBuffer = this.factory.wrap(bytes);
return this.decodeToMono(Mono.just(defaultDataBuffer), this.resolvableType, MIME_TYPE, Map.of())
.ofType(JsonNode.class)
.subscribeOn(Schedulers.boundedElastic())
.doFinally((t) -> DataBufferUtils.release(defaultDataBuffer));
}
private void canDecodeJsonNode() {
if (!canDecode(this.resolvableType, MIME_TYPE)) {
throw new IllegalStateException(String.format("JsonNodeParser doesn't supports the given tar`enter code here`get " +
"element type [%s] and the MIME type [%s]", this.resolvableType, MIME_TYPE));
}
}
}
as 2.0 was like java, now in 3.0, I cannot find my error in this code:
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.Socket;
public class Main {
public static void main(String[] args){
String hostName = "localhost";
Int portNumber = 16834;
String str = "starttimer\r\n";
try {
#SuppressWarnings("resource")
Socket socket = new Socket(hostName, portNumber);
OutputStreamWriter osw = new <br>OutputStreamWriter(socket.getOutputStream(), "UTF-8");$
send(str, osw);
} catch (Exception e) {
System.err.println(e.getMessage());
}
}
static void send(String str, OutputStreamWriter o) throws IOException {
o.write(str, 0, str.length());
o.flush();
}
}
By the way dont worry about where is the data going
Obviously, it's not valid ActionScript code. It is Java. That's why you have syntax error. I suggest reading some required minimum about the ActionScript language to be able at least to distinguish it from other one before trying to implement something.
Hi guys I have a problem creating a JSON file from a google url that i have. This is my code that im using.
import android.util.Log;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
public class DownloadUrl {
public String readUrl(String strUrl) throws IOException, InterruptedException {
Log.d("URLS = ",strUrl);
Thread.sleep(2000);
String data = "";
InputStream iStream = null;
HttpURLConnection urlConnection = null;
try {
URL url = new URL(strUrl);
// Creating an http connection to communicate with url
urlConnection = (HttpURLConnection) url.openConnection();
// Connecting to url
urlConnection.connect();
// Reading data from url
iStream = urlConnection.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(iStream));
StringBuffer sb = new StringBuffer();
String line = "";
while ((line = br.readLine()) != null) {
sb.append(line);
}
data = sb.toString();
Log.d("downloadUrl", data.toString());
br.close();
} catch (Exception e) {
Log.d("Exception", e.toString());
} finally {
iStream.close();
urlConnection.disconnect();
}
return data;
}
}
It works fine when i throw a url that looks like this into it.
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.7207523,-73.383851&radius=4828&type=bar&key=MYKEY
But when i try and throw a url that looks like this into it.
https://maps.googleapis.com/maps/api/place/details/json?placeid=ChIJe3AmoGsr6IkRuWcK1LAh-DE&key=MYKEY
I get an error: D/GooglePlacesReadTask: java.lang.NullPointerException: Attempt to invoke virtual method 'void java.io.InputStream.close()' on a null object reference
I dont know how i fix this. Any help?
Aah
you did not mention this is in android,
I presume this because you said ,
android.os.NetworkOnMainThreadException
in your comment
Android does not allow time consuming tasks on main thread,
use AsyncTask to call your function or use plain old java thread
Network on main thread exception comes when you run a networking operation on main thread .
Generally AsyncTask is used for these works but if you want to use the same code you written Just add..
StrictMode.ThreadPolicy policy = new StrictMode.ThreadPolicy.Builder().permitAll().build();
StrictMode.setThreadPolicy(policy);
I have created a MySQL database with entries similar to nurse roster, Now i need to send this data to optaplanner deployed on my server. To which file do i need to send it in the optaplanner folder deployed on server to get the results displayed on my webpage.
I'm using Xstream to generate XML file.
Can any one please give me brief on how to make this functionality work and get me the desired results.
The whole dataset serialization from and to XML is part of optaplanner-examples: OptaPlanner itself doesn't provide or require any serialization format. That being said, optaplanner-examples includes the following serialization formats:
Every example: XStream XML format in data directories unsolved and solved. The format is defined by the XStream annotations (#XStreamAlias etc) on the domain classes. In some cases the XML format is too verbose, causing OutOfMemoryError, for example for the big MachineReassignment B datasets.
Most examples: Competition specific TXT format in data directories import and export. The format is defined by the competition (see docs). In the examples GUI, click on button Import to load them.
I suggested you to read the final chapter in optaplanner manual / documentation :
Chapter 15. Integration
If your data source is a database, you can annotate your domain POJO's with JPA annotations. I think it will be a waste if you still store the data from database to xml file then feed the xml file to optaplanner, it will be more wise to feed your POJO objects to optaplanner directly.
I don't know what your web application technology, but the general algorithm will be like this :
Get POJO object data from your database (you can use JPA etc.)
Construct the solution class object
Feed the solution object to optaplanner solver
Get the best solution from optaplanner solver and present it to your user in your user desire.
Take a look at CloudBalancingHelloWorld.java class to get the basic idea. Hope this can help you.
package com.jdbcxml;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import org.w3c.dom.Document;
class EmployeeDAO
{
private Connection conn = null;
static
{
try
{
Class.forName("com.mysql.jdbc.Driver");
}
catch (Exception e)
{
e.printStackTrace();
}
}
public EmployeeDAO()
{
String url = "jdbc:mysql://50.62.23.184:3306/gtuser";
String userId = "gtuser1";
String passWord = "";
try
{
conn = DriverManager.getConnection(url, userId, passWord);
}
catch (SQLException e)
{
e.printStackTrace();
}
}
public void finalize()
{
try
{
conn.close();
}
catch (SQLException e)
{
e.printStackTrace();
}
}
public Document getCustomerList()
{
Document doc = null;
try
{
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * from t7_users");
doc = JDBCUtil.toDocument(rs);
rs.close();
stmt.close();
}
catch (Exception e)
{
e.printStackTrace();
}
return doc;
}
public String getCustomerListAsString()
{
String xml = null;
try
{
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * from t7_users");
xml = JDBCUtil.toXML(rs);
rs.close();
stmt.close();
}
catch (Exception e)
{
e.printStackTrace();
}
return xml;
}
public static void main(String argv[]) throws Exception
{
EmployeeDAO dao = new EmployeeDAO();
String xml = dao.getCustomerListAsString();
System.out.println(xml);
Document doc = dao.getCustomerList();
System.out.println(doc);
//PrintWriter out = new PrintWriter(new FileWriter("output.txt"));
//out.write(doc);;
//out.close();
}
}
Here the pseudo code (I never actually use JSP, I currently using GWT) to give you the basic idea, but please do remember these notes :
I think it will be a waste to save your POJO objects to xml then use XStream library to extract it again to POJO objects. In optaplanner example, they use it because it only need a static data and for the sake of example.
I assume that you already create your approriate domain class model that fit your planning problem domain. Because this is one of the core concept of optaplanner.
In method generateCustomerRoster, you should put your own logic to convert your customer POJO objects to planning solution object.
Hope this can help you and lead you to finish your job. Thanks & Regards.
package com.jdbcxml;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import org.w3c.dom.Document;
public class EmployeeDAO
{
private Connection conn = null;
static
{
try
{
Class.forName("com.mysql.jdbc.Driver");
}
catch (Exception e)
{
e.printStackTrace();
}
}
public EmployeeDAO()
{
String url = "jdbc:mysql://50.62.23.184:3306/gtuser";
String userId = "gtuser1";
String passWord = "";
try
{
conn = DriverManager.getConnection(url, userId, passWord);
}
catch (SQLException e)
{
e.printStackTrace();
}
}
public void finalize()
{
try
{
conn.close();
}
catch (SQLException e)
{
e.printStackTrace();
}
}
public List<Customer> getCustomerList()
{
Document doc = null;
try
{
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * from t7_users");
doc = JDBCUtil.toDocument(rs);
rs.close();
stmt.close();
}
catch (Exception e)
{
e.printStackTrace();
}
return doc;
}
public CustomerRoster generateCustomerRoster(List<Customer> rawData) {
CustomerRoster result = new CustomerRoster();
// here you should write your logic to generate Customer Roster data from your Raw Data (Customer)
return result;
}
public static void main(String argv[]) throws Exception
{
// Build the Solver
SolverFactory solverFactory = SolverFactory.createFromXmlResource("yourSolverConfig.xml");
Solver solver = solverFactory.buildSolver();
// Load your problem
EmployeeDAO dao = new EmployeeDAO();
List<Customer> listCustomer = dao.getCustomerList();
CustomerRoster unsolvedCustomerRoster = generateCustomerRoster(listCustomer);
// Solve the problem
solver.solve(unsolvedCustomerRoster);
CustomerRoster solvedCustomerRoster = (CustomerRoster) solver.getBestSolution();
// Display the result
DataGrid grid = new DataGrid(solvedCustomerRoster); // Just change this line code to display to any of your view component
}
}