Parsing HTML as a string to get values by keyword

I have an HTML file that is read as a string. I want to parse it and get the values that follow <TD colSpan=2>Value :
There are around 10 values I need to get from the file. How can I do that? I am trying to use something like
startindex/endindex/getsubstring:
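String sMainBeginKeyword = "<td>Value : ";
String sBeginKeyword = "<td>Value : ";
String sEndKeyword = "</td>";
int main_begin_index = result.indexOf(sMainBeginKeyword);
while (main_begin_index != -1) {
    int begin_index = main_begin_index;
    int end_index = result.indexOf(sEndKeyword, begin_index);
    if (end_index == -1) break; // no closing </td>: stop searching
    String deloc = result.substring(begin_index + sBeginKeyword.length(), end_index);
    // ... use deloc ...
    main_begin_index = result.indexOf(sMainBeginKeyword, end_index); // advance, or the loop never ends
}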
But this looks complicated, and I cannot retrieve more values this way, since I have a lot of values with different keywords.

This sort of thing really does need to be done using an XML or DOM parser: Trying to do it with string searches is setting yourself up for failure.
If you loaded the HTML into an XML or DOM parser, the task you're trying to do would be trivial to achieve using XPath notation to find the relevant elements.
You haven't specified which language or platform you're working on (and the code sample you've given is insufficient to be sure either), so it's hard to be any more specific.
Hope that helps.
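Here is a rough illustration of the parser-based approach. Python's standard library has no XPath engine for real-world HTML, so this sketch uses the event-based html.parser.HTMLParser instead of an XPath query. It assumes each value sits in a <td> cell whose text starts with "Value :", as in the question. ValueCellParser and extract_values are hypothetical names. With a third-party library such as lxml, you could run an XPath expression over the parsed tree instead.

```python
from html.parser import HTMLParser


class ValueCellParser(HTMLParser):
    """Collects the text of every <td> cell whose content starts with a prefix."""

    def __init__(self, prefix="Value :"):
        super().__init__()
        self.prefix = prefix
        self.in_td = False
        self.buffer = []
        self.values = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <TD colSpan=2> arrives as "td"
        if tag == "td":
            self.in_td = True
            self.buffer = []

    def handle_data(self, data):
        # text inside nested tags (e.g. <b>) is collected too
        if self.in_td:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.in_td:
            self.in_td = False
            text = "".join(self.buffer).strip()
            if text.startswith(self.prefix):
                self.values.append(text[len(self.prefix):].strip())


def extract_values(html, prefix="Value :"):
    """Return the text after `prefix` for every matching <td> cell, in order."""
    parser = ValueCellParser(prefix)
    parser.feed(html)
    parser.close()
    return parser.values


page = "<table><tr><TD colSpan=2>Value : 42</TD></tr><tr><td>Other</td><td>Value : abc</td></tr></table>"
print(extract_values(page))  # ['42', 'abc']
```

The parser works on tags rather than raw substrings. Case differences (<TD> vs <td>) and extra attributes such as colSpan=2 therefore do not matter, and those are exactly the cases where the indexOf approach breaks down.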


Lua: converting a string of multiple numbers to numbers

So I use a web request to get this JSON:
{"number":"1,2,3"} or table = {number="1,2,3"}
When I use this, it shows number:
typeof(1,2,3)
But when I get the data directly from the JSON/table, it shows string. Is there any way to convert it so that it shows as a number?
A Lua pattern might also be a good choice to get the numbers from the raw string. Then use tonumber() as suggested, and add the numbers to a table as in the sample code below:
local numbers = {}
local str = '1,2,3'
for num in string.gmatch(str, '([^,]+)') do
    table.insert(numbers, tonumber(num))
end

Newtonsoft SelectToken() on a JArray element doesn't work

I came across a weird problem, and I believe there's a better and more efficient way than the one described below.
I have a (Newtonsoft) JArray 'ja' with two elements, ja[0] and ja[1]. ja prints as follows:
[ 1465164019, "{\"date\":{\"y\":16,\"d\":5,\"m\":6},\"DataDesc\":[\"0\",[\"H\",\"M\",\"S\",\"V\",\"L\",\"CL\",\"LTC\"]],\"Data\":[[23,59,23,27,38,1,61252]]}"]
Both ja[0] and ja[1] show the correct values (shown as they appear in Visual Studio):
ja[0] = {1465164019}
ja[1] = {{"date":{"y":16,"d":5,"m":6},"trafficDataDesc":["0",["H","M","S","V","L","CL","LTC"]],"trafficData":[[23,59,23,27,38,1,61252]]}}
I want to get the value with the path "date.m" as follows:
String month = ja[1].SelectToken("date.m").ToString();
That didn't work, though: SelectToken() returns null. However, if I access it by reparsing,
JObject jo = JObject.Parse(ja[1].ToString());
String month = jo.SelectToken("date.m").ToString();
I get my '6'. Reparsing the ToString() output is a detour that is neither efficient nor pretty, but I couldn't get the direct version to work. How would it be done correctly?
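The escaped quotes in the printed array suggest that ja[1] is a JSON string that contains JSON text, not a nested object. That would explain why a path like date.m finds nothing until the string is parsed. Below is a small Python analogue of the situation, not Json.NET code. It uses a shortened version of the data, and select_path is a hypothetical helper standing in for SelectToken:

```python
import json

# Shortened version of the array from the question: the second element is a
# JSON *string* (note the escaped quotes), not a JSON object.
raw = '[1465164019, "{\\"date\\":{\\"y\\":16,\\"d\\":5,\\"m\\":6}}"]'


def select_path(obj, path):
    """Hypothetical stand-in for SelectToken: follow a dotted path through dicts."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


ja = json.loads(raw)
print(type(ja[1]).__name__)          # str
print(select_path(ja[1], "date.m"))  # None: a string has no "date" property
inner = json.loads(ja[1])            # decode the embedded JSON text once
print(select_path(inner, "date.m"))  # 6
```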
Many thanks!

Setting a Lua table in Redis

I have a lua script, which simplified is like this:
local item = {};
local id = redis.call("INCR", "counter");
item["id"] = id;
item["data"] = KEYS[1]
redis.call("SET", "item:" .. id, cjson.encode(item));
return cjson.encode(item);
KEYS[1] is a stringified json object:
JSON.stringify({name : 'some name'});
What happens is that, because I'm using cjson.encode to store the item with SET, the data seems to get stringified twice, so the result is:
{"id":20,"data":"{\"name\":\"some name\"}"}
Is there a better way to be handling this?
First, regardless of your question, you're using KEYS the wrong way, and your script isn't written according to the guidelines. You should not generate key names in your script (i.e. call SET with "item:" .. id as a key name). Instead, use the KEYS input array to declare in advance any keys involved.
Secondly, instead of passing the stringified string with KEYS, use the ARGV input array.
Thirdly, you can do item["data"] = cjson.decode(ARGV[1]) to avoid the double encoding.
Lastly, perhaps you should learn about Redis' Hash data type - it may be more suitable to your needs.
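The double encoding, and the fix in the third point, can be reproduced with Python's json module standing in for cjson.encode/cjson.decode. This is an analogy only. Python's json.dumps also inserts spaces after separators, which cjson's output (as shown in the question) does not.

```python
import json

# The argument as the script receives it: already JSON text.
arg = json.dumps({"name": "some name"})

# Original script: the JSON text is stored as a plain string, then encoded again.
double = json.dumps({"id": 20, "data": arg})

# Suggested fix: decode the argument first, then encode the whole item once.
single = json.dumps({"id": 20, "data": json.loads(arg)})

print(double)  # {"id": 20, "data": "{\"name\": \"some name\"}"}
print(single)  # {"id": 20, "data": {"name": "some name"}}
```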

How to convert data to CSV or HTML format on iOS?

In my iOS application I need to export some data in CSV or HTML format. How can I do this?
RegexKitLite comes with an example of how to read a CSV file into an NSArray of NSArrays, and going in the reverse direction is pretty trivial.
It'd be something like this (warning: code typed in browser):
NSArray *data = ...; // An NSArray of NSArrays of NSStrings
NSMutableString *csv = [NSMutableString string];
NSCharacterSet *needsQuoting = [NSCharacterSet characterSetWithCharactersInString:@",\r\n"];
for (NSArray *line in data) {
    NSMutableArray *formattedLine = [NSMutableArray array];
    for (NSString *field in line) {
        // under ARC, fast-enumeration loop variables are const, so work on a copy
        NSString *value = field;
        BOOL shouldQuote = NO;
        // fields that contain a comma or a line break must be quoted
        if ([value rangeOfCharacterFromSet:needsQuoting].location != NSNotFound) {
            shouldQuote = YES;
        }
        // fields that contain a " must have them escaped to "" and be quoted
        if ([value rangeOfString:@"\""].location != NSNotFound) {
            value = [value stringByReplacingOccurrencesOfString:@"\"" withString:@"\"\""];
            shouldQuote = YES;
        }
        if (shouldQuote) {
            [formattedLine addObject:[NSString stringWithFormat:@"\"%@\"", value]];
        } else {
            [formattedLine addObject:value];
        }
    }
    NSString *combinedLine = [formattedLine componentsJoinedByString:@","];
    [csv appendFormat:@"%@\n", combinedLine];
}
[csv writeToFile:@"/path/to/file.csv" atomically:NO encoding:NSUTF8StringEncoding error:NULL];
The general solution is to use stringWithFormat: to format each row. Presumably, you're writing this to a file or socket, in which case you would write a data representation of each string (see dataUsingEncoding:) to the file handle as you create it.
If you're formatting a lot of rows, you may want to use initWithFormat: and explicit release messages, in order to avoid running out of memory by piling up too many string objects in the autorelease pool.
And always, always, always remember to escape the values correctly before passing them to the formatting method.
Escaping (along with unescaping) is a really good thing to write unit tests for. Write a function to CSV-format a single row, and have test cases that compare its result to correct output. If you have a CSV parser on hand, or you're going to need one, or you just want to be really sure your escaping is correct, write unit tests for the parsing and unescaping as well as the escaping and formatting.
If you can start with a single record containing any combination of CSV-special and/or SQL-special characters, format it, parse the formatted string, and end up with a record equal to the one you started with, you know your code is good.
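Here is a minimal Python sketch of that round-trip test. It assumes RFC 4180-style quoting: a field containing a comma, double quote or line break is wrapped in quotes, and embedded quotes are doubled. format_csv_row and parse_csv_row are hypothetical names. The parsing side uses the standard csv module, so the formatter is checked against an independent implementation.

```python
import csv
import io


def format_csv_row(fields):
    """Format one record as a CSV line (without the trailing newline)."""
    out = []
    for field in fields:
        # quote fields containing a comma, a double quote or a line break,
        # doubling any embedded double quotes
        if any(ch in field for ch in ',"\r\n'):
            field = '"' + field.replace('"', '""') + '"'
        out.append(field)
    return ",".join(out)


def parse_csv_row(line):
    """Parse one CSV record with the standard library's csv module."""
    return next(csv.reader(io.StringIO(line)))


# One record mixing CSV-special and SQL-special characters.
record = ["plain", "has,comma", 'has "quotes"', "multi\nline", "O'Brien; DROP TABLE"]
line = format_csv_row(record)
assert parse_csv_row(line) == record  # formatted, parsed back, unchanged
```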
(All of the above applies equally to CSV and to HTML. If possible, you might consider using XHTML, so that you can use XML validation tools and parsers, including NSXMLParser.)
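The same idea applies to the HTML case: escape each value before it goes into the markup. Here is a short Python sketch using the standard html.escape. format_html_row is a hypothetical helper name.

```python
from html import escape


def format_html_row(fields):
    """Format one record as an HTML table row, escaping each cell's text."""
    cells = "".join("<td>" + escape(field) + "</td>" for field in fields)
    return "<tr>" + cells + "</tr>"


print(format_html_row(["Tom & Jerry", "<b>bold?</b>", 'say "hi"']))
# <tr><td>Tom &amp; Jerry</td><td>&lt;b&gt;bold?&lt;/b&gt;</td><td>say &quot;hi&quot;</td></tr>
```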
CSV means comma-separated values.
I usually just iterate over the data structures in my application and output one set of values per line, with the values in each set separated by commas.
struct person
{
    std::string first_name;
    std::string second_name;
};
person tony = {"tony", "momo"};
person john = {"john", "smith"};
would look like:
tony, momo
john, smith

Zend Framework dom problem

I want to get a website's shortcut icon (favicon) and stylesheet paths with Zend_Dom_Query:
$dom = new Zend_Dom_Query($html);
$stylesheet = $dom->query('link[rel="stylesheet"]');
$shortcut = $dom->query('link[rel="shortcut icon"]');
The stylesheet query works, but the shortcut icon query does not. How can I fix this?
Thanks.
This appears to be an issue with Zend's CSS-style query implementation. In Zend/Dom/Query.php, the query function calls a conversion function to convert the query into proper XPath format:
public function query($query)
{
$xpathQuery = Zend_Dom_Query_Css2Xpath::transform($query);
return $this->queryXpath($xpathQuery, $query);
}
However, within the transform() method, they seem to be using a pretty basic regex to split up the string on whitespace:
$segments = preg_split('/\s+/', $path);
This means your link[rel="shortcut icon"] query is split into two segments: link[rel="shortcut and icon"]
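You can reproduce the splitting with the same pattern in Python's re module. It shows how the quoted attribute value gets cut in half:

```python
import re

query = 'link[rel="shortcut icon"]'
# same pattern as preg_split('/\s+/', $path) in Zend's transform()
segments = re.split(r"\s+", query)
print(segments)  # ['link[rel="shortcut', 'icon"]']
```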
To get around this, you can use the method Zend_Dom_Query::queryXpath() and pass it a proper XPath query, like this:
$dom->queryXpath('//link[@rel="shortcut icon"]');
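In XPath, the attribute test is written with @ ([@rel="shortcut icon"]) and treated as a single predicate, so the space in the value is harmless. The sketch below shows the same predicate with Python's xml.etree.ElementTree, which supports this limited XPath subset. It assumes a well-formed, XHTML-style fragment, since real-world HTML often is not valid XML.

```python
import xml.etree.ElementTree as ET

# A small, well-formed <head>; the href values are made up for illustration.
head = ET.fromstring(
    "<head>"
    '<link rel="stylesheet" href="/css/site.css"/>'
    '<link rel="shortcut icon" href="/favicon.ico"/>'
    "</head>"
)

# The quoted value, space included, stays inside one predicate.
icons = head.findall('.//link[@rel="shortcut icon"]')
print([link.get("href") for link in icons])  # ['/favicon.ico']
```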