I have a StorageFile that contains XML. I read the XML from the StorageFile, edit it, and then save it back to the StorageFile using the following code:
using (var writeStream = await storageFile.OpenStreamForWriteAsync())
{
    xDocument.Save(writeStream, SaveOptions.None);
}
However, when I make the contents shorter, e.g. from
<Node>
<Child>This is a verrrrrryyy long text</Child>
<Node>
to
<Node>
<Child>This is short</Child>
<Node>
The result on disk is as follows:
<Node>
<Child>This is short</Child>
<Node>rrryyy long text</Child>
<Node>
Obviously the stream only overwrites the new bytes at the start of the file and leaves the old ones intact, resulting in invalid XML the next time I try to open it, so this is probably not the right way to save...
How should I be saving it?
The SOLUTION is to truncate the stream first:
using (var writeStream = await storageFile.OpenStreamForWriteAsync())
{
    // Truncate any existing contents before writing the new XML.
    if (writeStream.CanSeek && writeStream.Length > 0)
        writeStream.SetLength(0);

    xDocument.Save(writeStream, SaveOptions.None);
}
Currently I have a module pulling sql results like this:
[{ID: 'test', NAME: 'stack'},{ID: 'test2', NAME: 'stack'}]
I want to literally have that written to a file so I can read it as an object later, but I want to write it via a stream because some of the objects are really huge and keeping them in memory isn't working anymore.
I am using mssql https://www.npmjs.org/package/mssql
and I am stuck here:
request.on('recordset', function(result) {
console.log(result);
});
How do I stream this out to a writable stream? I see options for object mode but I can't seem to figure out how to set it.
request.on('recordset', function(result) {
var readable = fs.createReadStream(result),
writable = fs.createWriteStream("loadedreports/bot"+x[6]);
readable.pipe(writable);
});
This just errors because createReadStream expects a file path...
Am I on the right track here or do I need to do something else?
You're almost on the right track: you just don't need a readable stream, since your data already arrives in chunks.
You can just create the writable stream OUTSIDE of the actual 'recordset' event; otherwise you would create a new stream every time you get a new chunk (and this is not what you want).
Try it like this:
var writable = fs.createWriteStream("loadedreports/bot"+x[6]);

request.on('recordset', function(result) {
  // result is an object, so serialize it before writing it to the stream
  writable.write(JSON.stringify(result));
});
EDIT
If the recordset is already too big, use the 'row' event instead:
request.on('row', function(row) {
  // same as above, but row by row
  writable.write(JSON.stringify(row));
});
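For completeness, here is a minimal sketch of how the pieces could fit together, assuming the streaming mode of the mssql package (request.stream = true with 'row', 'error' and 'done' events), that a connection has already been opened with sql.connect(), and that the query and output path are purely illustrative:
var fs = require('fs');
var sql = require('mssql');

var writable = fs.createWriteStream('loadedreports/bot.json');

var request = new sql.Request();
request.stream = true;                            // emit rows one by one instead of buffering the recordset
request.query('SELECT ID, NAME FROM SomeTable');  // illustrative query

request.on('row', function (row) {
  // each row arrives as a plain object, so serialize it before writing
  writable.write(JSON.stringify(row) + '\n');
});

request.on('error', function (err) {
  console.error(err);
});

request.on('done', function () {
  writable.end();                                 // close the file once all rows have arrived
});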
I wrote a small script that listens on a UDP port and stores all incoming messages (each one a JSON object) inside a single file. The file initially contains an empty array in JSON format.
I'm looking for an efficient way to store all (concurrently) incoming messages from multiple clients inside this single file.
The file size can be several hundred megabytes. Parsing the whole file and appending the new object would not be as efficient as needed.
Do you have an approach?
EDIT
My solution, based on #t-j-crowder's approach:
var dgram = require("dgram");
var fs = require("fs");

var udp_server = dgram.createSocket("udp4");
var udp_server_port = 5000;

udp_server.on("message", function (msg, rinfo) {
  var json_part = "{\"message\": " + msg + "}";
  fs.open('./data/stats.json', 'r+', function (err, fd) {
    if (err) throw err;
    fs.fstat(fd, function (err, stats) {
      if (err) throw err;
      if (stats.size > 2) {
        // file already holds entries: overwrite the trailing ']' with ',<entry>]'
        json_part = new Buffer(',' + json_part + ']', 'utf-8');
        var pos = parseInt(stats.size) - 1;
      } else {
        // file is empty (or just '[]'): write a fresh array
        json_part = new Buffer('[' + json_part + ']', 'utf-8');
        var pos = 0;
      }
      fs.write(fd, json_part, 0, json_part.length, pos, function (err, written, buffer) {
        if (err) throw err;
        fs.close(fd, function () {});
      });
    });
  });
});

udp_server.bind(udp_server_port);
Regards, Marcus
Fundamentally, you'll need to:
1. Open the file using a seekable, writable stream.
2. Seek to the end of it.
3. Back up one character (over the closing ] of the array).
4. Write out a comma (if this isn't the first entry) and the JSON of your new entry.
5. Write a closing ].
6. Close the file.
Looking at the NodeJS docs, it looks like Steps 2-4 (and arguably 5) are done all together, using the position argument of fs.write. (Be sure you open the file using r+, not one of the "append" modes.)
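By way of illustration, here is a minimal sketch of those steps (much like the asker's own edit above); the appendEntry name and file path are just placeholders, and it assumes the file already exists, either empty or containing at least []:
var fs = require('fs');

function appendEntry(filePath, obj, callback) {
  fs.open(filePath, 'r+', function (err, fd) {           // r+, not an append mode
    if (err) return callback(err);
    fs.fstat(fd, function (err, stats) {
      if (err) return callback(err);
      var isFirst = stats.size <= 2;                      // nothing but '[]' (or empty) so far
      var json = (isFirst ? '[' : ',') + JSON.stringify(obj) + ']';
      var buf = new Buffer(json, 'utf-8');
      var pos = isFirst ? 0 : stats.size - 1;             // overwrite the existing closing ']'
      fs.write(fd, buf, 0, buf.length, pos, function (err) {
        fs.close(fd, function () { callback(err); });
      });
    });
  });
}

// usage: appendEntry('./data/stats.json', { message: 'hello' }, function (err) { /* handle err */ });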
I am trying to save an image selected with FileOpenPicker. I am launching this event handler when an image is selected:
async void photoChooserTask_Completed(object sender, PhotoResult e)
{
    // get the file stream and file name
    Stream photoStream = e.ChosenPhoto;
    string fileName = Path.GetFileName(e.OriginalFileName);

    // persist data into isolated storage
    StorageFile file = await ApplicationData.Current.LocalFolder.CreateFileAsync(fileName, CreationCollisionOption.ReplaceExisting);
    using (Stream current = await file.OpenStreamForWriteAsync())
    {
        await photoStream.CopyToAsync(current);
    }
}
But this code, which should give me the length of the saved file, returns 0:
var properties = await file.GetBasicPropertiesAsync();
i = properties.Size;
Have I done something wrong in saving the image?
You may need to flush the stream.
If that does not work, add a breakpoint and check the two streams' lengths after the copy. Are they equal? They should be. Anything suspicious on those two stream objects?
Edit
In the image you posted, I can see that you use the SetSource method of a BitmapImage with the same Stream that you copy. Once you do that, the Stream's Position will be at the end, as it was just read by that call.
CopyToAsync copies everything after the current Position of the Stream you call it on. Since the position is at the end, because it was just read, CopyToAsync does not copy anything.
All you need to do to fix your problem is set the stream's Position back to 0 (photoStream.Position = 0;) before calling CopyToAsync.
I need to generate a PDF from HTML dynamically using ASP.NET. The HTML is stored in a database; it has tables and CSS, up to 10 pages. I have tried iTextSharp by passing the HTML directly, but it produces a PDF that will not open. Destination pdf.codeplex.com has no documentation, and it produces a PDF with styles from the parent page.
Any other solution will be helpful.
I've tried many HTML-to-PDF solutions, including iTextSharp, wkhtmltopdf and ABCpdf (paid).
I've currently settled on PhantomJS, a headless, open-source, WebKit-based browser. It is scriptable with a JavaScript API which is reasonably well documented.
The only disadvantage I found was that attempting to use stdin to pass HTML into the process was unsuccessful because the REPL still has some bugs. I also found that using stdout seemed to be a lot slower than simply allowing the process to write to disk.
The code below avoids stdin and stdout by creating the javascript input as a temp file, executing PhantomJS, copying the output file to a MemoryStream and cleaning up the temporary files at the end.
using System.IO;
using System.Drawing;
using System.Diagnostics;
using System.Text;

public Stream HTMLtoPDF (string html, Size pageSize) {
    string path = "C:\\dev\\";
    string inputFileName = "tmp.js";
    string outputFileName = "tmp.pdf";

    // Build the PhantomJS script that renders the page to a PDF file.
    StringBuilder input = new StringBuilder();
    input.Append("var page = require('webpage').create();");
    input.Append(String.Format("page.viewportSize = {{ width: {0}, height: {1} }};", pageSize.Width, pageSize.Height));
    input.Append("page.paperSize = { format: 'Letter', orientation: 'portrait', margin: '1cm' };");
    input.Append("page.onLoadFinished = function() {");
    input.Append(String.Format("page.render('{0}');", outputFileName));
    input.Append("phantom.exit();");
    input.Append("};");
    // html is being passed into a string literal so make sure any double quotes are properly escaped
    input.Append("page.content = \"" + html.Replace("\"", "\\\"") + "\";");
    File.WriteAllText(path + inputFileName, input.ToString());

    // Run PhantomJS against the temporary script and wait for it to finish.
    Process p;
    ProcessStartInfo psi = new ProcessStartInfo();
    psi.FileName = path + "phantomjs.exe";
    psi.Arguments = inputFileName;
    psi.WorkingDirectory = Path.GetDirectoryName(psi.FileName);
    psi.UseShellExecute = false;
    psi.CreateNoWindow = true;
    p = Process.Start(psi);
    p.WaitForExit(10000);

    // Copy the generated PDF into a MemoryStream and clean up the temp files.
    Stream strOut = new MemoryStream();
    Stream fileStream = File.OpenRead(path + outputFileName);
    fileStream.CopyTo(strOut);
    fileStream.Close();
    strOut.Position = 0;
    File.Delete(path + inputFileName);
    File.Delete(path + outputFileName);
    return strOut;
}
I have to obtain a JSON object that is embedded inside a script tag in a certain page... so I can't use regular scraping techniques, like cheerio.
Easy way out: write the file (download the page) to the server and then read it using string manipulation to extract the JSON objects (there are several), work on them and save them to my DB happily.
The thing is that I'm too new to Node.js and can't get the code to work. I think that I'm trying to read the file before it is fully written, and when I read it too early I get [object Object]...
Here's what I have so far...
var http = require('http');
var fs = require('fs');
var request = require('request');
var localFile = 'tmp/scraped_site_.html';
var url = "siteToBeScraped.com/?searchTerm=foobar"
// writing
var file = fs.createWriteStream(localFile);
var request = http.get(url, function(response) {
  response.pipe(file);
});

//reading
var readedInfo = fs.readFileSync(localFile, function (err, content) {
  callback(url, localFile);
  console.log("READING: " + localFile);
  console.log(err);
});
So first of all I think you should understand what went wrong.
The http request operation is asynchronous. This means that the callback code in http.get() will run sometime in the future, but fs.readFileSync, due to its synchronous nature, will execute and complete even before the http request is actually sent off to be executed in the background, since they are both invoked in what is commonly known as the same tick. Also, fs.readFileSync returns a value and does not take a callback.
Even if you replace fs.readFileSync with fs.readFile, the code still might not work properly, since the readFile operation might execute before the http response is fully read from the socket and written to disk.
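As a tiny illustration of that ordering (the URL here is just a placeholder):
var http = require('http');

console.log('before');
http.get('http://www.example.com/', function (response) {
  console.log('inside the callback');  // runs on a later tick, once the response starts arriving
  response.resume();                   // drain the response so the socket is released
});
console.log('after');                  // always prints before 'inside the callback'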
I strongly suggest reading: stackoverflow question and/or Understanding the node.js event loop
The correct place to invoke the file read is when the response stream has finished writing to the file, which would look something like this:
var request = http.get(url, function(response) {
  response.pipe(file);
  file.once('finish', function () {
    fs.readFile(localFile, 'utf8' /* or whatever encoding you need */, function(err, data) {
      // do something with the data if there is no error
    });
  });
});
Of course this is a very raw and not recommended way to write asynchronous code, but that is another discussion altogether.
Having said that, if you download a file, write it to disk and then read it all back into memory for manipulation, you might as well forgo the file part and just read the response into a string right away. Your code would then look something like this (it can be implemented in several ways):
var request = http.get(url, function(response) {
  var data = '';

  function read() {
    var chunk;
    while ( chunk = response.read() ) {
      data += chunk;
    }
  }

  response.on('readable', read);

  response.on('end', function () {
    console.log('[%s]', data);
  });
});
What you really should do, IMO, is create a transform stream that strips out just the data you need from the response, while not consuming too much memory, yielding this more elegant-looking code:
var request = http.get(url, function(response) {
  response.pipe(yourTransformStream).pipe(file);
});
Implementing this transform stream, however, might prove slightly more complex. So if you're a node beginner and you don't plan on downloading big files or lots of small files, then maybe loading the whole thing into memory and doing string manipulation on it is simpler.
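For illustration, here is a minimal sketch of such a transform stream; the ScriptExtractor name and the regular expression are purely placeholders for whatever matching logic the real page needs, and note that it still buffers the whole body internally before flushing:
var stream = require('stream');
var util = require('util');

// Collects the whole response body and pushes out only the matched <script>...</script> blocks.
function ScriptExtractor(options) {
  stream.Transform.call(this, options);
  this._body = '';
}
util.inherits(ScriptExtractor, stream.Transform);

ScriptExtractor.prototype._transform = function (chunk, encoding, done) {
  this._body += chunk.toString();
  done();
};

ScriptExtractor.prototype._flush = function (done) {
  var matches = this._body.match(/<script[^>]*>[\s\S]*?<\/script>/g) || [];
  this.push(matches.join('\n'));
  done();
};

// usage: response.pipe(new ScriptExtractor()).pipe(file);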
For further information about transformation streams:
node.js stream api
this wonderful guide by substack
this post from strongloop
Lastly, see if you can use any of the million node.js crawlers already out there :-) take a look at these search results on npm
According to the http module documentation, 'get' does not return the response body.
This is modified from the request example on the same page.
What you need to do is process the response within the callback (function) passed into http.request, so it can be called when it is ready (async).
var http = require('http')
var fs = require('fs')

var localFile = 'tmp/scraped_site_.html'
var file = fs.createWriteStream(localFile)

var req = http.request('http://www.google.com.au', function(res) {
  res.pipe(file)
  res.on('end', function(){
    file.end()
    fs.readFile(localFile, function(err, buf){
      console.log(buf.toString())
    })
  })
})

req.on('error', function(e) {
  console.log('problem with request: ' + e.message)
})

req.end();
EDIT
I updated the example to read the file after it is created. This works by having a callback on the end event of the response, which closes the pipe, after which the file can be reopened for reading. Alternatively you can use
res.on('data', function(chunk){...})
to process the data as it arrives without putting it into a temporary file
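For example, a rough sketch of that variant (inside the same http.request callback, so res is the response object):
var chunks = []
res.on('data', function(chunk){
  chunks.push(chunk)                    // handle or accumulate each chunk as it arrives
})
res.on('end', function(){
  var html = Buffer.concat(chunks).toString()
  console.log(html.length)              // the whole body is now in memory, no temp file needed
})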
My impression is that you are trying to pull a JS object, serialized as JSON, out of a stream that's downloading a file containing HTML. This is doable yet hard. It's difficult to know when your search expression has been found, because if you parse as the chunks come in, you never know whether you received only part of the content; what you're looking for could be split into two or many parts which are never analyzed as a whole.
You could try something like this:
http.request('u/r/l', function(res){
  res.on('data', function(data){
    // parse data as it comes in
  });
}).end();
This allows you to read the data as it comes in. You can handle it to save it to disk or a DB, or even parse it, if you accumulate the contents of the script tags into a single string and then parse the objects out of that.
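As a rough illustration of that last idea, assuming the whole page has already been accumulated into a string called body, and that the embedded object looks like var data = [...]; (both the variable name and the regular expression are hypothetical and would need to match the real page, and the literal must be valid JSON for JSON.parse to work):
// body holds the full HTML; pull the array literal out of its script tag
var match = body.match(/var data = (\[[\s\S]*?\]);/)
if (match) {
  var records = JSON.parse(match[1])   // works only if the embedded literal is valid JSON
  // work with records, save them to your db, etc.
}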