Seeing "internal error: Huge input lookup [1]" error when trying to create a large powerpoint file using OFFICER package - officer

I don't have a good way to give a reproducible example, but here's my best description. I'm running a loop that generates 60 different PowerPoint slides with officer, building them up into a single object that shows as a "pptx document with 60 slides" in my R environment. However, when I try to print this object, I see the following error:
Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
internal error: Huge input lookup [1]
I tried running the loop with only 10 slides, and the print works, creating a slide deck of 10 slides. But I guess 60 is beyond the level that is considered "huge." Is there a way to override this? I saw some other posts about how you can add a HUGE override, but I'm not exactly sure where I would do that.

Set options = c("HUGE") for read_xml().
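For example, if you are calling xml2::read_xml() yourself, the override looks like the sketch below (the file name is just a placeholder):

library(xml2)

# "HUGE" lifts libxml2's hard-coded input size limits; "NOBLANKS" is the
# read_xml() default, kept so whitespace handling stays unchanged.
doc <- read_xml("slides.xml", options = c("NOBLANKS", "HUGE"))

Note that if the failing read_xml() call happens inside officer's print() method rather than in your own code, you may not be able to pass the option through directly; this only shows the option itself.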

Related

Tcl stack trace not showing desired error line number

The -errorline element of the return options dictionary for the following Tcl script is "2":
puts [info patchlevel]
try {
    error "this is an error"
} trap {} {result ropts} {
    puts $result
    puts $ropts
}
How do I get the stack trace to display the line number in the source file where the error was actually raised (i.e. line 4 instead of 2)?
Tcl often has that information available, but doesn't use it.
It has the information available because you have a chance to retrieve it with info frame and getbytecode (which is in the tcl::unsupported namespace, mostly because we reserve the right to change how the bytecodes themselves work at any time). I'm not quite sure if that would work in your specific case, but if you put your test code in a procedure then it definitely would. (There are complexities here with fragility that I don't fully understand.)
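Here is a minimal sketch of that retrieval; info frame 0 describes the innermost frame, and for code sourced from a file the dict it returns also carries file and line entries:

proc demo {} {
    try {
        error "this is an error"
    } trap {} {result ropts} {
        # 0 is the innermost frame (this handler); negative numbers walk
        # outward toward the caller.
        puts [info frame 0]
    }
}
demo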
It doesn't use it because, for backward compatibility with existing tooling, it uses the line numbers it was using prior to the creation of the machinery that supports info frame. Those line numbers are relative to the local script fragment (whichever piece first reports the line number in the error info trace); in this case, that is the body of the try.
I don't like that it works like that at all. However, changing things is a bit tricky because we'd need to also figure out what else to report and what to do in the cases where the information genuinely isn't available (such as for automatically-generated code where things are assembled from many strings from many lines).

Splitting a feature collection by system index in Google Earth Engine?

I am trying to export a large feature collection from GEE. I realize that the Python API allows for this more easily than the JavaScript API does, but given a time constraint on my research, I'd like to see if I can extract the feature collection in pieces and then append the separate CSV files once exported.
I tried to use a filtering function to perform the task, one that I've seen used before with image collections. Here is a mini example of what I am trying to do.
Given a feature collection of 10 spatial points called "points", I tried to create a new feature collection that includes only the first five points:
var points_chunk1 = points.filter(ee.Filter.rangeContains('system:index', 0, 5));
When I execute this function, I receive the following error: "An internal server error has occurred"
I am not sure why this code is not executing as expected. If you know more than I do about this issue, please advise on alternative approaches to splitting my sample, or on where the error in my code lurks.
Many thanks!
system:index is actually an ID given by GEE for the feature, and it's not supposed to be used like an index into an array. I think the JavaScript API should be enough to export a large FeatureCollection, but there is a way to do what you want without relying on system:index, since it might not be consistent.
First, it would be a good idea to know the number of features you are dealing with, although calling size().getInfo() on large feature collections can freeze the UI and sometimes make the tab unresponsive. Here I have defined chunk and collectionSize. These must be defined client-side, because we want to call Export within the loop, which is not possible in server-side loops. Within the loop, you simply create a subset of features starting from different offsets by converting the collection to a list and converting the slice back to a FeatureCollection:
var chunk = 1000;
// Client-side total; could come from fc.size().getInfo() if the UI allows it.
var collectionSize = 10000;
for (var i = 0; i < collectionSize; i = i + chunk) {
  // Take up to `chunk` features starting at offset i and rebuild a collection.
  var subset = ee.FeatureCollection(fc.toList(chunk, i));
  Export.table.toAsset(subset, "description", "/asset/id");
}
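Since the end goal in the question was separate CSV files to append, a variant of the same loop using Export.table.toDrive should also work (the description prefix here is just illustrative):

for (var i = 0; i < collectionSize; i = i + chunk) {
  var subset = ee.FeatureCollection(fc.toList(chunk, i));
  // Each chunk becomes its own Drive export task producing one CSV.
  Export.table.toDrive({
    collection: subset,
    description: 'chunk_' + i,
    fileFormat: 'CSV'
  });
}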

How do I download gridded SST data?

I've recently been introduced to R and am trying the heatwaveR package. I get an error when loading ERDDAP data. Here's the code I have used so far:
library(rerddap)
library(ncdf4)
info(datasetid = "ncdc_oisst_v2_avhrr_by_time_zlev_lat_lon", url = "https://www.ncei.noaa.gov/erddap/")
And I get the following error:
Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) :
schannel: next InitializeSecurityContext failed: SEC_E_INVALID_TOKEN (0x80090308) - The token supplied to the function is invalid
I would like some help with this. I'm new to this website too, so I apologize if the above question is not up to standard (code to be typed in a grey box, etc.).
Someone directed this post to my attention from the heatwaveR issues page on GitHub. Here is the answer I provided for them:
I do not manage the rerddap package, so I can't say exactly why it may be giving you this error. But I can say that I have noticed lately that the OISST data are often not available on the ERDDAP server in question. I (attempt to) download fresh data every day and am often denied with an error similar to the one you posted. It's gotten to the point where I had to insert some logic gates into my download script so it tells me that the data aren't currently being hosted before it tries to download them. I should also point out that one may download the "final" data from this server, which have roughly a two-week delay from the present day, as well as the "preliminary (prelim)" data, which are near-real-time but haven't gone through all of the QC steps yet. These two products are accounted for in the following code:
# First download the list of data products on the server
server_data <- rerddap::ed_datasets(which = "griddap", "https://www.ncei.noaa.gov/erddap/")$Dataset.ID
# Check if the "final" data are currently hosted
if(!"ncdc_oisst_v2_avhrr_by_time_zlev_lat_lon" %in% server_data)
stop("Final data are not currently up on the ERDDAP server")
# Check if the "prelim" data are currently hosted
if(!"ncdc_oisst_v2_avhrr_prelim_by_time_zlev_lat_lon" %in% server_data)
stop("Prelim data are not currently up on the ERDDAP server")
If the data are available I then check the times/dates available with these two lines:
# Download final OISST meta-data
final_info <- rerddap::info(datasetid = "ncdc_oisst_v2_avhrr_by_time_zlev_lat_lon", url = "https://www.ncei.noaa.gov/erddap/")
# Download prelim OISST meta-data
prelim_info <- rerddap::info(datasetid = "ncdc_oisst_v2_avhrr_prelim_by_time_zlev_lat_lon", url = "https://www.ncei.noaa.gov/erddap/")
I ran this now and it looks like the data are currently available. Is your error from today, or from a day or two ago? The availability seems to cycle over the week, but I haven't quite made sense of any pattern yet. It is also important to note that about a day before the data go dark they are filled with all sorts of massive errors. So I've also had to add error trapping into my code that stops the data aggregation process once it detects temperatures in excess of some massive number. In this case it is something like 1e90, but the number isn't consistent, meaning it is not a missing-value placeholder.
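The error trap itself can be as simple as the sketch below; the object and column names (sst_dat, temp) and the threshold are placeholders, not the exact code from my script:

# Stop the aggregation when downloaded SST values are implausible, which
# tends to happen about a day before the data go dark.
if (max(sst_dat$temp, na.rm = TRUE) > 100) {
  stop("Implausible temperature values detected; skipping these data")
}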
To manually see for yourself if the data are being hosted you can go to this link and scroll to the bottom:
https://www.ncei.noaa.gov/erddap/griddap/index.html
All the best,
-Robert

How to convert/match a handwritten list of names? (HWR)

I would like to see if I can scan a sign-in sheet for a class. The good news is I know 90% of the names that might be written.
My idea was to use tesseract to parse an image of names, and then use the Levenshtein algorithm to compare each line with a list of names in my database; if I get a reasonably close match, then that name is right.
Does this approach sound like a good one? If not, other ideas?
I tried using tesseract on a sample sheet (see below)
I used:
tesseract simple.png -psm 4 outtxt
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
I am assuming it didn't like line 2 because I went below the line.
The results I got were:
1.. AM: (harm;
l. ’E (J 22 a 00k
2‘ wau \\) [HQ
4. KIM TAYLOE
5. LN] Davis
6‘ Mzflé! Ha K
Obviously not the greatest; my guess is the distance matches for 4 & 5 would work, but the rest are not even close.
I have control of my sign-in sheet, but not the handwriting of the folks coming in, so if there are any changes to the sheet I can make to help, please let me know.
Since your goal is to get names only, I would suggest you reduce tessedit_char_whitelist to alphanumeric characters ("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.") so that you will not get unexpected characters in the output like \\) [.
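With the 3.05 command-line syntax from your example, that would look something like (using the whitelist string suggested above):

tesseract simple.png outtxt -psm 4 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.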
Your initial approach of calculating the Levenshtein distance is fine, provided you succeed in extracting text from the handwritten image (which is a hard task for tesseract).
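If it helps, the matching step is only a few lines; here is a rough sketch in Python (the roster names are made up, and you would load yours from your database):

# Classic dynamic-programming Levenshtein edit distance.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

roster = ["KIM TAYLOR", "LORI DAVIS", "MARIE HAAK"]  # hypothetical class list
for line in ["KIM TAYLOE", "LN] Davis"]:             # OCR output lines
    best = min(roster, key=lambda name: levenshtein(line.upper(), name))
    print(line, "->", best)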
I would also suggest running some preprocessing on your image. For example, you can remove the horizontal lines and extract text ROIs around them. In the best case you will be able to extract separated characters, but even if you don't, you will get better results and will be able to distinguish the resulting names line by line.
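A rough sketch of the line-removal idea with OpenCV in Python (the kernel width and file names are assumptions you would tune for your scan):

import cv2

img = cv2.imread("simple.png", cv2.IMREAD_GRAYSCALE)
# Binarise with Otsu, text as white on black.
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# A wide, short kernel keeps only long horizontal strokes (the ruled lines).
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
# Erase the ruled lines, then invert back to black-on-white for tesseract.
cleaned = cv2.bitwise_and(thresh, cv2.bitwise_not(lines))
cv2.imwrite("simple_clean.png", cv2.bitwise_not(cleaned))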
You should also try the other recommended output quality improvement stages, which you can find in the Tesseract OCR wiki (link).

Trouble finding a DataGrid to implement for over 10,000 records with pagination, filtering, and clickable links for each row

I have tried using a few different data grids (FlexiGrid, ExtJS Grid, and YUI DataGrid) and have found YUI to work the best as far as documentation and available features go. However, I am having difficulty setting up the data source. When I try to set it up using JSON, it takes too long or times out. I have already maxed out the memory limit in the php.ini file. There will be many more records in the future as well.
I need to select data to populate the grid based on the user that is currently logged in. Once this information populates the grid, I need each id to be click-able and take me to a different page, or populate information in a div on the same page.
Does anyone have suggestions on loading 25 – 50 records at a time of dynamic data? I have tried implementing the following example to do what I want: YUI Developer Example
I cannot get the data grid to show at all. I have changed the data instance to the following.
// DataSource instance
var curDealerNumber = YAHOO.util.Dom.getElementsByClassName('dealer_number', 'input');
var ds_path = + "lib/php/json_proxy.php?dealernumber='" + curDealerNumber + "'";
var myDataSource = new YAHOO.util.DataSource("ds_path");
myDataSource.responseType = YAHOO.util.DataSource.TYPE_JSON;
myDataSource.responseSchema = {
    resultsList: "records",
    fields: [
        {key:"id", parser:"number"},
        {key:"user_dealername"},
        {key:"user_dealeraccttype"},
        {key:"workorder_num", parser:"number"},
        {key:"segment_num", parser:"number"},
        {key:"status"},
        {key:"claim_type"},
        {key:"created_at"},
        {key:"updated_at"}
    ],
    metaFields: {
        totalRecords: "totalRecords" // Access to value in the server response
    }
};
Any help is greatly appreciated, and sorry if this seems similar to other posts, but I searched and still could not resolve my problem. Thank you!
It's hard to troubleshoot without a repro case, but I'd suggest turning on logging to see where the problem might be (see the sketch after this list):
load the datatable-debug file
load the logger
either call YAHOO.widget.Logger.enableBrowserConsole() to output logs to your browser's JS console (i.e., Firebug), or call new YAHOO.widget.LogReader() to output logs to the screen.
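For instance, once the debug build of DataTable and the logger module are loaded on the page, a minimal sketch looks like this (pick one of the two outputs):

// Send YUI log messages to the browser's JS console (e.g., Firebug)...
YAHOO.widget.Logger.enableBrowserConsole();
// ...or render an on-screen log panel instead.
var myLogReader = new YAHOO.widget.LogReader();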
Also make sure the XHR request and response are well-formed, using Firebug or a similar tool.
Finally, when working with large datasets, consider:
pagination
enabling renderLoopSize (http://developer.yahoo.com/yui/datatable/#renderLoop)
chunking data loads into multiple requests (http://developer.yahoo.com/yui/examples/datatable/dt_xhrjson.html).
There is no one-size-fits-all solution for everyone, but hopefully you can find the right set of tweaks for your use case.