How can I read a 4k-row CSV file with PHPExcel - csv

I can read a few CSV rows using PHPExcel, but when I try to read 4k rows, an exception is thrown and I get the following message:
Notice: Undefined index:
Error loading file "": Could not open for reading! File does not exist.
$nomOrigine = $_FILES["monfichier"]["name"];
$elementsChemin = pathinfo($nomOrigine);
try {
    $objPHPExcel = PHPExcel_IOFactory::load($nomOrigine);
    $objWorksheet = $objPHPExcel->getActiveSheet();
} catch (Exception $e) {
    die('Error loading file "'.pathinfo($nomOrigine, PATHINFO_BASENAME).'": '.$e->getMessage());
}

When posting data, if post_max_size is smaller than the file (or other data) being uploaded, the file will not be received on the server, and you may get (as you did) an error that doesn't seem to make sense.
Increase your post_max_size to the largest value you think you will need, but if you need a very large size, you should not be doing file uploads via a form.
I say that because with, for example, a 100MB file, the upload time will be very long, the user feedback during the upload is (last I checked) not very good, and if there is an error on the client or the server, the upload has to be restarted completely. You can research other upload methods if you think you'll need uploads of that size.
I imagine 20M (as you mentioned in a comment) will be fine, but that's really up to you and the size of files you expect to be dealing with.
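For reference, a minimal php.ini sketch for the 20M case mentioned above (these are standard PHP directives; whether you set them in php.ini, .htaccess or .user.ini depends on your hosting setup, and you will need to restart the web server / PHP-FPM afterwards):
; post_max_size must be at least as large as upload_max_filesize
upload_max_filesize = 20M
post_max_size = 20M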

Related

Camel ftp2: throw an error when there is no file present in the folder

I am working on reading files from an SFTP path and processing the plain text files that are placed on the server every 5 hours. There is a requirement that I throw an exception when no file has been placed on the server by the producer. I am using the following to read the files:
from("sftp://NUID@SERVER:PORT?preferredAuthentications=password&delete=true")
    .routeId(ROUTE_ID)
    .log("${body}")
    .process(processor)
    .end();
Right now, if there is no file present when the above route starts, it doesn't say anything; once there is a file on the server, it consumes and processes it. I want to throw an exception if no file is present during a given period of time.
Some possible ways to throw an exception when there is no file present on the target server:
1. Use the sendEmptyMessageWhenIdle option (inherited from the file component)
Setting this option to true lets your route receive an exchange with an empty message whenever the polling consumer finds no file on the target server. You can then add a step to your route that throws an exception when an empty message (rather than a normal exchange) is encountered; see the sketch after this list.
2. Set up another route with the timer component to check the last file processing time
In your original route, add a step that records the last file processing time somewhere, then have a new route periodically check whether the difference between that last update time and the current time is within an acceptable range.
Drawback: false alarms may occur due to other problems (e.g. a prolonged network issue).
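A rough Java DSL sketch of option 1, to be placed inside a RouteBuilder's configure() method (the endpoint, ROUTE_ID and processor come from the question; the exception type and message are just examples):
from("sftp://NUID@SERVER:PORT?preferredAuthentications=password"
        + "&delete=true&sendEmptyMessageWhenIdle=true")
    .routeId(ROUTE_ID)
    .choice()
        .when(body().isNull())
            // the poll found no file: fail loudly instead of staying silent
            .throwException(new IllegalStateException("No file found on the SFTP server"))
        .otherwise()
            .log("${body}")
            .process(processor)
    .end();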
What is the exception that you receive called?
I checked the docs at http://camel.apache.org/ftp2.html and it may be caused by an option not being set to true.
Please try again with ignoreFileNotFoundOrPermissionError=true, and also check the docs for other options that may apply.
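For example, appended to the endpoint URI from the question (the rest of the URI is unchanged):
from("sftp://NUID@SERVER:PORT?preferredAuthentications=password&delete=true&ignoreFileNotFoundOrPermissionError=true")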

FileNotFoundException when starting a background download even though file clearly exists

In my WinRT application I have the following code:
resultingFile = await downloadFolder.CreateFileAsync(filename, CreationCollisionOption.OpenIfExists);
var downloader = new BackgroundDownloader();
var operation = downloader.CreateDownload(new Uri(rendition.Url), resultingFile);
await operation.StartAsync();
After the CreateFileAsync call I can verify that I do have a 0-byte file at the filename path (double-verified by pulling the location out of resultingFile itself).
However, when operation.StartAsync() is called I get a FileNotFoundException claiming the system could not find the file specified. Unfortunately, that's all it tells me and there is no inner exception.
I have also verified that rendition.Url gives me a valid url that downloads the content I'm expecting to be downloading.
Am I doing something wrong here?
Apparently this code isn't what is throwing the error; it's some code the BackgroundDownloader uses to coordinate things that can't find its own file.
Uninstalling the application and redeploying it fixed it.
Good waste of 3 hours :(

How to send a really, really large JSON object as a response - node.js with express

I have been getting the error FATAL ERROR: JS Allocation failed - process out of memory, and I have pinpointed the problem to the fact that I am sending a really, really large JSON object to res.json (or JSON.stringify).
To give you some context, I am basically sending around 30,000 config files (each config file has around 10,000 lines) as one JSON object.
My question is, is there a way to send such a huge json object or is there a better way to stream it (like using socket.io?)
I am using: node v0.10.33, express@4.10.2
UPDATE: Sample code
var app = express();
app.route('/events')
    .get(function(req, res, next) {
        var configdata = [{config:<10,000 lines of config>}, ... 10,000 configs];
        res.json(configdata); // The out of memory error comes here
    });
After a lot of trying, I finally decided to go with socket.io and send one config file at a time rather than all config files at once. This solved the out-of-memory problem that was crashing my server. Thanks for all your help.
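A rough sketch of that approach (assuming io is a socket.io server attached to the same HTTP server and configs is the array of config objects; the event names are made up):
io.on('connection', function (socket) {
    // send one config per event instead of one giant JSON payload
    configs.forEach(function (config, i) {
        socket.emit('config', { index: i, total: configs.length, config: config });
    });
    socket.emit('configs-done'); // tell the client everything has been sent
});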
Try to use streams. What you need is a readable stream that produces data on demand. I'll write simplified code here:
var Readable = require('stream').Readable;
var rs = Readable();
rs._read = function () {
    // assuming one config of ~10,000 lines fits in memory:
    // push it as a string chunk, then end the stream
    rs.push(JSON.stringify({config: '<10,000 lines of config>'}));
    rs.push(null); // no more data
};
rs.pipe(res);
You can try increasing the memory node has available with the --max_old_space_size flag on the command line.
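For example (the value is in megabytes, and server.js stands in for your entry script):
node --max_old_space_size=4096 server.js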
There may be a more elegant solution. My first reaction was to suggest using res.json() with a Buffer object rather than trying to send the entire object in one shot, but then I realized that whatever converts the object to JSON will probably want the entire object in memory at once anyway. So you will run out of memory even though you are switching to a stream, or at least that's what I would expect.

Handling IE and the fact that it doesn't handle fileSize, etc.

I'm using v3.9 UI with the jQuery wrapper.
Now that I've solved my 'Maximum Request Length Exceeded' error within IE9 (from here: FineUploader Error Handling), all my FineUploader code is in that other link, and I didn't think I needed to post it again.
I'm now looking for a better way to let IE users know when they've attempted to upload a file that's too large (Chrome and FF users get the too-large-file alert, so this isn't a problem there). I don't think I need to touch the 'messages' option, as it works as it should for all the other browsers; it's IE that's not working as it should. IE users get all the way to the server with the file they've selected. To handle an upload that exceeds the fileSize property, I have server-side code that checks whether the content length is greater than 'n' and, if so, returns JSON success = false. See below:
[HttpPost]
public JsonResult UploadFile(HttpPostedFileWrapper qqfile, int surveyInstanceId, int surveyItemResultId, int itemId, int loopingIndex)
{
    bool isValid = false;
    // file is too big, throw error.
    if (qqfile.ContentLength > (1024 * 1024 * 2.5))
    {
        return CreateJsonResult(false);
    }
    // ... more code here if the file is 'good' ...
}

private JsonResult CreateJsonResult(bool isSuccess)
{
    var json = new JsonResult();
    json.ContentType = "text/plain";
    json.Data = new { success = isSuccess };
    return json;
}
This short-circuits an invalid file upload based on size. Great, but can I return more than just success = false, and have that additional JSON value be used by FineUploader to display a more useful message to the user? Currently all that shows is 'Upload Failed'. How do I reference the specific HTML element for that invalid file, so I can add a more descriptive error?
Also, I do have the .on('error') handler, but I'm not sure how to trigger it. This would be a logical place to look, as the file upload size issue IS an error. Help? Thanks.
Using the Content-Length of the request to enforce file size restrictions is not a good idea. All upload requests sent by Fine Uploader are, by default, multipart encoded, and the Content-Length of a multipart encoded request is the size of the ENTIRE request, not just the file.
In a comment on your answer to your last question, I pointed to a specific section of the documentation that lets you control the failure text that appears next to a file in Fine Uploader UI mode. All you have to do is set the mode property of the failedUploadTextDisplay option to "custom". Server-side, return the error message text you would like to appear next to the failed file in an "error" property of your JSON response. See the bottom of the "handling errors" documentation section for more details.
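A sketch of the client-side piece (the selector is a placeholder and the rest of your existing options are omitted):
$('#fine-uploader').fineUploader({
    // ...your existing options...
    failedUploadTextDisplay: {
        mode: 'custom' // show the server-supplied "error" text next to the failed file
    }
});
Server-side, CreateJsonResult would then also carry the message, for example json.Data = new { success = isSuccess, error = "File exceeds the 2.5 MB limit." };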
UPDATE
It looks like you are using the ContentLength property of the HttpPostedFileWrapper, which returns the size of the uploaded file. This was a bit confusing for me at first, since Content-Length generally refers to the size of a request. The name choice for this property was a poor one on Microsoft's part, IMHO. So you can disregard the first point in my answer (about the request's Content-Length).

SOLR - Best approach to import 20 million documents from csv file

My current task at hand is to figure out the best approach to load millions of documents into Solr.
The data file is an export from the DB in CSV format.
Currently, I am thinking about splitting the file into smaller files and having a script post these smaller ones using curl.
I have noticed that if you post a high amount of data, most of the time the request times out.
I am looking into the Data Import Handler and it seems like a good option.
Any other ideas are highly appreciated.
Thanks
Unless a database is already part of your solution, I wouldn't add that additional complexity. Quoting the SOLR FAQ: it's your servlet container that is issuing the session time-out.
As I see it, you have a couple of options (in my order of preference):
Increase container timeout
Increase the container timeout (the "maxIdleTime" parameter, if you're using the embedded Jetty instance).
I'm assuming you only occasionally index such large files? Increasing the time-out temporarily might just be the simplest option.
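With the embedded Jetty that ships with Solr, that would be a change along these lines in example/etc/jetty.xml (treat the file location and value as assumptions; they vary by Solr/Jetty version):
<!-- on the connector definition; the value is in milliseconds -->
<Set name="maxIdleTime">300000</Set>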
Split the file
Here's a simple Unix script that will do the job (splitting the file into 500,000-line chunks):
split -d -l 500000 data.csv split_files.
for file in split_files.*
do
    curl 'http://localhost:8983/solr/update/csv?fieldnames=id,name,category&commit=true' -H 'Content-type:text/plain; charset=utf-8' --data-binary @$file
done
Parse the file and load in chunks
The following Groovy script uses opencsv and SolrJ to parse the CSV file and commit changes to Solr every 500,000 lines.
@Grapes([
    @Grab(group='net.sf.opencsv', module='opencsv', version='2.3'),
    @Grab(group='org.apache.solr', module='solr-solrj', version='3.5.0'),
    @Grab(group='ch.qos.logback', module='logback-classic', version='1.0.0')
])
import au.com.bytecode.opencsv.CSVReader

import org.apache.solr.client.solrj.SolrServer
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.common.SolrInputDocument

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/");

new File("data.csv").withReader { reader ->
    CSVReader csv = new CSVReader(reader)
    String[] result
    Integer count = 1
    Integer chunkSize = 500000

    while ((result = csv.readNext()) != null) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", result[0])
        doc.addField("name_s", result[1])
        doc.addField("category_s", result[2])
        server.add(doc)
        if (count.mod(chunkSize) == 0) {
            server.commit()
        }
        count++
    }
    server.commit()
}
In SOLR 4.0 (currently in BETA), CSVs from a local directory can be imported directly using the UpdateHandler. Modifying the example from the SOLR wiki:
curl 'http://localhost:8983/solr/update?stream.file=exampledocs/books.csv&stream.contentType=text/csv;charset=utf-8'
This streams the file from its local location, so there is no need to chunk it up and POST it via HTTP.
The above answers have explained the single-machine ingestion strategies really well.
Here are a few more options if you have big-data infrastructure in place and want to implement a distributed data ingestion pipeline:
Use Sqoop to bring the data into Hadoop, or place your CSV file in Hadoop manually.
Use one of the connectors below to ingest the data:
hive-solr connector, spark-solr connector.
PS:
Make sure no firewall blocks connectivity between the client nodes and the Solr/SolrCloud nodes.
Choose the right directory factory for data ingestion; if near-real-time search is not required, use StandardDirectoryFactory.
If you get the exception below in the client logs during ingestion, tune the autoCommit and autoSoftCommit configuration in the solrconfig.xml file (see the sketch after this list):
SolrServerException: No live SolrServers available to handle this request
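A rough solrconfig.xml sketch of those settings (autoCommit and autoSoftCommit are standard update handler elements; the interval values below are only placeholders to tune for your load):
<updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
        <maxTime>60000</maxTime> <!-- hard commit every 60 s; flushes to disk -->
        <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
        <maxTime>300000</maxTime> <!-- soft commit every 5 min; makes new docs searchable -->
    </autoSoftCommit>
</updateHandler>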
Definitely just load these into a normal database first. There are all sorts of tools for dealing with CSVs (for example, Postgres' COPY), so it should be easy. Using the Data Import Handler is also pretty simple, so this seems like the most friction-free way to load your data. This method will also be faster since you won't have unnecessary network/HTTP overhead.
The reference guide says ConcurrentUpdateSolrServer could/should be used for bulk updates.
The Javadocs are somewhat incorrect (v3.6.2, v4.7.0):
ConcurrentUpdateSolrServer buffers all added documents and writes them into open HTTP connections.
It doesn't buffer indefinitely, but up to int queueSize, which is a constructor parameter.
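A small SolrJ 4.x sketch of that usage (the URL, queue size and thread count are placeholders; the three-argument constructor takes queueSize and threadCount):
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

// buffers up to 10,000 documents and sends them over 4 background threads
ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 10000, 4);

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("name_s", "example");
server.add(doc);   // queued and sent asynchronously
server.commit();   // flush whatever is still buffered and commit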