JavaMail - How to handle large attachments over IMAP?

Over IMAP I'm trying to get attachments from several messages.
It works fine, but if there is an attachment of about 20 megabytes, it seems to get stuck and Java does not continue.
Here is where the problem occurs:
I want to get the content of the attachment and save it into a String:
...
MimeBodyPart attachment = (MimeBodyPart) multipart.getBodyPart(1);
if (!Part.ATTACHMENT.equalsIgnoreCase(attachment.getDisposition())) {
    log.error("Part is not an attachment!");
} else {
    log.debug("Checking " + localFile.getName() + " with " + attachment.getFileName() + ". Attachment-Size: " + (attachment.getSize() / (1024 * 1024)) + " megabytes.");
    InputStream remoteFileIs = attachment.getInputStream();
    remoteFileContent = IOUtils.toString(remoteFileIs); // stuck here when the attachment is large
    remoteFileIs.close();
    ...
}
...
Are there any solutions to this?
Regards!

What does IOUtils.toString do? Since you're just giving it an InputStream with no charset information, it can't possibly be converting the byte stream into characters properly. And whatever it's doing, it may be doing it inefficiently for large data.
You can turn on JavaMail Session debugging and see the protocol trace as it's fetching the attachment, to determine whether it's really "stuck" or just slow.
You can also control the buffer size for fetches from the IMAP server by setting the mail.imap.fetchsize property.
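For example, a minimal sketch of both suggestions (mail.imap.fetchsize and Session.setDebug are standard JavaMail; the 1 MB value is just an illustration):

Properties props = new Properties();
// fetch message parts from the IMAP server in 1 MB chunks instead of the default 16 KB
props.setProperty("mail.imap.fetchsize", "1048576");

Session session = Session.getInstance(props);
// print the IMAP protocol trace so you can see whether the fetch is progressing or truly stuck
session.setDebug(true);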
But perhaps you should question whether you really want a 20MB attachment in a String. What are you going to do with that String once you have it?
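If all you need is to save the attachment locally, here is a sketch that avoids building a 20 MB String at all (MimeBodyPart.saveFile has been available since JavaMail 1.4; localFile is the File already used in the question's log statement):

// stream the attachment straight to disk instead of materializing it as a String
attachment.saveFile(localFile);

// or, if the bytes really are needed in memory, skip the charset conversion entirely
// byte[] remoteFileBytes = IOUtils.toByteArray(attachment.getInputStream());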

Related

Append string to a Flash AS3 shared object: For science

So I have a little flash app I made for an experiment where users interact with the app in a lab, and the lab logs the interactions.
The app currently traces a timestamp and a string when the user interacts, it's a useful little data log in the console:
trace(Object(root).my_date + ": User selected the cupcake.");
But I need to move away from using traces that show up in the debug console, because it won't work outside of the developer environment of Flash CS6.
I want to make a log, instead, in an SO ("Shared Object", the little locally saved Flash cookies). Ya' know, one of these deals:
submit.addEventListener("mouseDown", sendData)

function sendData(evt:Event)
{
    so = SharedObject.getLocal("experimentalflashcookieWOWCOOL")
    so.data.Title = Title.text
    so.data.Comments = Comments.text
    so.data.Image = Image.text
    so.flush()
}
I don't want to create any kind of architecture or server interaction, just append my timestamps and strings to an SO. Screw complexity! I intend to use all 100kb of the SO allocation with pride!
But I have absolutely no clue how to append data to the shared object. (Cough)
Any ideas how I could create a log file out of a shared object? I'll be logging about 200 lines, so it'd be awkward to generate new variable names for each line and then save each variable after 4 hours of use. Appending to a single variable would be awesome.
You could just replace your so.data.Title line with this:
so.data.Title = (so.data.Title is String) ? so.data.Title + Title.text : Title.text; //check whether so.data.Title is a String, if it is append to it, if not, overwrite it/set it
Please consider not using a capitalized first letter for instance names (as in Title). In ActionScript (and most C-based languages) instance names / variables are usually written with a lowercase first letter.
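For the actual logging use case, here is a minimal sketch along the same lines that appends a timestamped line to one field (the field name log is my own choice, nothing SharedObject requires):

var so:SharedObject = SharedObject.getLocal("experimentalflashcookieWOWCOOL");

function appendLog(entry:String):void {
    var line:String = new Date().toString() + ": " + entry + "\n";
    // start the log if it does not exist yet, otherwise append to it
    so.data.log = (so.data.log is String) ? so.data.log + line : line;
    so.flush();
}

appendLog("User selected the cupcake.");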

How to upload multiple JSON files into CouchDB

I am new to CouchDB. I need to get 60 or more JSON files in a minute from a server.
I have to upload these JSON files to CouchDB individually as soon as I receive them.
I installed CouchDB on my Linux machine.
I hope someone can help me with my requirement.
If possible, can someone help me with pseudo code?
My idea:
Write a Python script that uploads all the JSON files to CouchDB. Each JSON file should become its own document, and the data in the JSON should be inserted into CouchDB as-is (in the same format and with the same values as in the file).
Note:
These JSON files are transactional; one file is generated every second, so I need to read each file, upload it in the same format to CouchDB, and on successful upload archive the file into a different folder on the local system.
Python program to parse the JSON and insert it into CouchDB:
import sys
import glob
import errno, time, os
import couchdb, simplejson
import json
from pprint import pprint

couch = couchdb.Server() # Assuming localhost:5984
#couch.resource.credentials = (USERNAME, PASSWORD)
# If your CouchDB server is running elsewhere, set it up like this:
couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']

path = 'C:/Users/Desktop/CouchDB_Python/Json_files/*.json'
#dirPath = 'C:/Users/VijayKumar/Desktop/CouchDB_Python'
files = glob.glob(path)
for file1 in files:
    #dirs = os.listdir( dirPath )
    file2 = glob.glob(file1)
    for name in file2: # 'file' is a builtin type, 'name' is a less-ambiguous variable name.
        try:
            with open(name) as f: # No need to specify 'r': this is the default.
                #sys.stdout.write(f.read())
                json_data = f
                data = json.load(json_data)
                db.save(data)
                pprint(data)
                json_data.close()
            #time.sleep(2)
        except IOError as exc:
            if exc.errno != errno.EISDIR: # Do not fail if a directory is found, just ignore it.
                raise # Propagate other kinds of IOError.
I would use the CouchDB bulk API, even though you have specified that you need to send them to the db one by one. For example, implementing a simple queue that gets sent out every, say, 5-10 seconds via a bulk doc call will greatly increase the performance of your application.
There is one quirk, though: for bulk GETs you need to know the IDs of the docs you want to fetch from the DB. For PUTs, however, it is perfect. (That is not entirely true either: you can get ranges of docs with a bulk operation if the IDs you use for your docs can be sorted nicely.)
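A minimal sketch of that queue idea, using the same couchdb module as in the question (Database.update is python-couchdb's bulk-document call; the flush interval and database name are placeholders):

import couchdb

couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']

queue = []

def enqueue(doc):
    queue.append(doc)

def flush_queue():
    # one _bulk_docs request instead of one PUT per document
    if queue:
        db.update(queue)
        del queue[:]

# call flush_queue() from your main loop every 5-10 seconds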
From my experience working with CouchDB, I have a hunch that you are dealing with transactional documents in order to compile them into some sort of summed result and act on that data accordingly (maybe creating the next transactional doc in the series). For that you can rely on CouchDB by using 'reduce' functions on the views you create. It takes a little practice to get a reduce function working properly, and it is highly dependent on what you actually want to achieve and what data you are prepared to emit from the view, so I can't really provide you with more detail on that.
So in the end the app logic would go something like this:
get _design/someDesign/_view/yourReducedView
calculate new transaction
add transaction to queue
onTimeout
send all in transaction queue
If I got the first part wrong about why you are using transactional docs, all that would really change in my app logic is the part where you get those transactional docs.
Also, before writing your own 'reduce' function, have a look at the built-in ones (they are a lot faster than anything outside of the db engine can do).
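For example, a tiny design document using the built-in _sum reduce (the amount field is just an assumed name for whatever you want to total per transaction):

{
  "_id": "_design/someDesign",
  "views": {
    "yourReducedView": {
      "map": "function (doc) { if (doc.amount) { emit(doc._id, doc.amount); } }",
      "reduce": "_sum"
    }
  }
}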
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
EDIT:
Since you are starting out, I strongly recommend having a look at the CouchDB Definitive Guide.
NOTE FOR LATER:
Here is one hidden pitfall (well, maybe not so much hidden as just not an obvious thing for a newcomer to look out for). When you write a reduce function, make sure it does not produce too much output for an unbounded query. That will slow the entire view down dramatically, even when you pass reduce=false when getting stuff from it.
So you need to get JSON documents from a server and send them to CouchDB as you receive them. A Python script would work fine. Here is some pseudo-code:
loop (until no more docs)
get new JSON doc from server
send JSON doc to CouchDB
end loop
In Python, you could use requests to send the documents to CouchDB, and probably to get the documents from the server as well (if it is using an HTTP API).
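A sketch of the "send JSON doc to CouchDB" step with requests (the database name mydb and the credentials are placeholders):

import requests

def send_to_couchdb(doc):
    # POST lets CouchDB assign the _id; use PUT /mydb/<id> to choose it yourself
    r = requests.post('http://localhost:5984/mydb',
                      json=doc,
                      auth=('admin', 'password'))
    r.raise_for_status()
    return r.json()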
You might want to check out the pycouchdb module for Python 3. I've used it myself to upload lots of JSON objects into a CouchDB instance. My project does pretty much the same as you describe, so you can take a look at my project Pyro on GitHub for details.
My class looks like this:
class MyCouch:
    """ COMMUNICATES WITH COUCHDB SERVER """

    def __init__(self, server, port, user, password, database):
        # ESTABLISHING CONNECTION
        self.server = pycouchdb.Server("http://" + user + ":" + password + "@" + server + ":" + port + "/")
        self.db = self.server.database(database)

    def check_doc_rev(self, doc_id):
        # CHECKS REVISION OF SUPPLIED DOCUMENT
        try:
            rev = self.db.get(doc_id)
            return rev["_rev"]
        except Exception as inst:
            return -1

    def update(self, all_computers):
        # UPDATES DATABASE WITH JSON STRING
        try:
            result = self.db.save_bulk( all_computers, transaction=False )
            sys.stdout.write( " Updating database" )
            sys.stdout.flush()
            return result
        except Exception as ex:
            sys.stdout.write( "Updating database" )
            sys.stdout.write( "Exception: " )
            print( ex )
            sys.stdout.flush()
            return None
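Usage might look roughly like this (a sketch assuming the class above; host, credentials and database name are placeholders):

import glob
import json

couch = MyCouch("localhost", "5984", "admin", "password", "mydb")

docs = []
for path in glob.glob("*.json"):
    with open(path) as f:
        docs.append(json.load(f))

# save_bulk sends everything to CouchDB in one request
couch.update(docs)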
Let me know in case of any questions - I will be more than glad to help if you find some of my code usable.

How to send extra data with Image?

In the case of image galleries, an image has a title, a description and other information.
I want to transmit all of this information (along with the image data) in the response. Is this possible in a single HTTP response?
The reason I am asking is that, in the database, everything can be returned in a single query: the image path, its title, description etc. So, would it not be beneficial to return everything in a single response?
Currently, I am only able to fetch the image path and name (stored as imagepath) from the database and return the image to the user.
My (usual fetch, read and output file) backend code (PHP):
$img_id = $_GET['id'];
$con = mysqli_connect('localhost', 'username', 'password', 'mydb');
$stmt = mysqli_prepare($con, 'select imagepath,description,title from images where imgid=?');
mysqli_stmt_bind_param($stmt, 's', $img_id);
mysqli_stmt_execute($stmt);
mysqli_stmt_bind_result($stmt, $imgpath, $desc, $title);
if (!mysqli_stmt_fetch($stmt)) return;
// single result
if (file_exists($imgpath))
{
    header('Content-Type: image/png');
    header('Content-Length: ' . filesize($imgpath));
    readfile($imgpath);
}
// Currently, I am unable to send $desc and $title via this. Is there a way to send them in a single response?
Also, is there a way to retrieve them on the browser side using JavaScript?
The "X-" prefix is for non-standard headers.
So you might just send your own headers with information you need:
header('X-Image-Title: '.$title);
UPDATE: the X- convention has since been deprecated by RFC 6648:
https://www.rfc-editor.org/rfc/rfc6648
So just use your own header names without the X- prefix ;)
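Applied to the backend from the question, it could look like this (the header names are my own choice; rawurlencode keeps the values single-line ASCII, and for cross-origin requests you would also need Access-Control-Expose-Headers so scripts can read them):

header('Content-Type: image/png');
header('Content-Length: ' . filesize($imgpath));
header('Image-Title: ' . rawurlencode($title));
header('Image-Description: ' . rawurlencode($desc));
readfile($imgpath);

On the browser side you can read them with the Fetch API, e.g. fetch(url).then(r => { const title = decodeURIComponent(r.headers.get('Image-Title')); return r.blob(); });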

Base64Encoder cuts off last 4 characters of string

Edit Completely changed question after finding that the problem was elsewhere in the application.
I am working on a Heroku client in Flex and am building the authentication tool now. Heroku uses Basic HTTP Authentication, so I set up my User class to store an email and password and expose a method that returns the base64-encoded representation of the email and password separated by a colon. The encoder, however, cuts off the last 4 characters of the string (tested by encoding the same string through the openssl encoder built into *nix). The code that I am using to encode the values is as follows:
public function getAuthString():String {
    var encoder:Base64Encoder = new Base64Encoder();
    encoder.insertNewLines = false;
    encoder.encode(email + ':' + password);
    trace(email + ':' + password);
    trace(encoder.toString());
    return encoder.toString();
}
The trace of the email and password together is correct, but the encoder.toString() call returns a string that is 4 characters short (45 characters long instead of 49).
Has anyone else run into this problem before? If so how did you fix it?
The ActionScript implementation is working as expected. The openssl comparison assumes a trailing newline: echo adds one to the input by default, so openssl encodes that extra byte as well, which accounts for the extra characters. Use echo -n to compare like for like.
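You can see this from the command line (the credentials are just placeholders):

echo 'someone@example.com:secret' | openssl base64      # encodes the trailing newline too
echo -n 'someone@example.com:secret' | openssl base64   # matches the ActionScript output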

HTTP: Generating ETag Header

How do I generate an ETag HTTP header for a resource file?
As long as it changes whenever the resource representation changes, how you produce it is completely up to you.
You should try to produce it in a way that additionally:
doesn't require you to re-compute it on each conditional GET, and
doesn't change if the resource content hasn't changed
Using hashes of content can cause you to fail at #1 if you don't store the computed hashes along with the files.
Using inode numbers can cause you to fail at #2 if you rearrange your filesystem or you serve content from multiple servers.
One mechanism that can work is to use something entirely content-dependent, such as a SHA-1 hash or a version string, computed and stored once whenever your resource content changes.
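A sketch of that approach in Python (hash once, cache next to the file; the .etag sidecar file is just one possible place to store it):

import hashlib
import os

def etag_for(path):
    cache = path + ".etag"
    # reuse the stored hash unless the file changed after it was computed
    if os.path.exists(cache) and os.path.getmtime(cache) >= os.path.getmtime(path):
        with open(cache) as f:
            return f.read()
    with open(path, "rb") as f:
        tag = '"' + hashlib.sha1(f.read()).hexdigest() + '"'
    with open(cache, "w") as f:
        f.write(tag)
    return tag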
An ETag is an arbitrary string that the server sends to the client, and that the client sends back to the server the next time the file is requested.
The ETag should be computable on the server based on the file. Sort of like a checksum, but you might not want to checksum every file before sending it out.
server client
<------------- request file foo
file foo etag: "xyz" -------->
<------------- request file foo
etag: "xyz" (what the server just sent)
(the etag is the same, so the server can send a 304)
I built up a string in the format "datestamp-file size-file inode number". So, if a file is changed on the server after it has been served out to the client, the newly regenerated etag won't match if the client re-requests it.
char *mketag(char *s, struct stat *sb)
{
    /* cast explicitly: st_mtime, st_size and st_ino are usually wider than int */
    sprintf(s, "%ld-%ld-%ld", (long) sb->st_mtime, (long) sb->st_size, (long) sb->st_ino);
    return s;
}
From http://developer.yahoo.com/performance/rules.html#etags:
By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.
...
If you're not taking advantage of the flexible validation model that ETags provide, it's better to just remove the ETag altogether.
How to generate the default Apache ETag in bash:
for file in *; do printf "%x-%x-%x\t$file\n" `stat -c%i $file` `stat -c%s $file` $((`stat -c%Y $file`*1000000)) ; done
Even when I was looking for something exactly like the ETag (the browser asks for a file only if it has changed on the server), it never worked for me, and I ended up using a GET trick (adding a timestamp as a GET argument to the js files).
I've been using Adler-32 as an HTML link shortener. I'm not sure whether this is a good idea, but so far I haven't noticed any duplicates. It may work as an ETag generator, and it should be faster than hashing with a cryptographic scheme like SHA, but I haven't verified this. The code I use is:
import zlib

shortlink = str(hex(zlib.adler32(link) + (2**32 - 1) / 2))[2:-1]
I would recommend not using ETags at all and going for Last-Modified headers instead.
AskApache has a useful article on this (as they do on pretty much everything, it seems!):
http://www.askapache.com/htaccess/apache-speed-etags.html
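If you go that route with Apache, the usual configuration is something like this (FileETag is a core directive; Header unset needs mod_headers enabled):

# stop Apache from generating and sending ETags, rely on Last-Modified instead
FileETag None
Header unset ETag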
The code example of Mark Harrison is similar to what is used in Apache 2.2. But such an algorithm causes problems for load balancing when you have two servers with the same file but different inode numbers. That's why in Apache 2.4 the developers simplified the ETag scheme and removed the inode part. Also, to make the ETag shorter, it is usually encoded in hex:
#include <inttypes.h>

char *mketag(char *s, struct stat *sb)
{
    sprintf(s, "\"%" PRIx64 "-%" PRIx64 "\"", (uint64_t) sb->st_mtime, (uint64_t) sb->st_size);
    return s;
}
or for Java
etag = '"' + Long.toHexString(lastModified) + '-' +
Long.toHexString(contentLength) + '"';
for C#
// Generate an ETag from the file's size and last modification time (unix timestamp in seconds since 1970)
public static string MakeEtag(long lastMod, long size)
{
    string etag = '"' + lastMod.ToString("x") + '-' + size.ToString("x") + '"';
    return etag;
}

public static void Main(string[] args)
{
    long lastMod = 1578315296;
    long size = 1047;
    string etag = MakeEtag(lastMod, size);
    Console.WriteLine("ETag: " + etag);
    //=> ETag: "5e132e20-417"
}
The function returns an ETag compatible with Nginx. See the comparison of ETags from different servers.