Is it possible to split very large RDLs at a specific breakpoint? - reporting-services

I have a very large RDL that prints to a PDF. The generated PDF results in thousands of pages and it's not really possible to reduce the result set.
I've tried many different methods of optimizing the PDF output, such as reducing unnecessary formatting, removing all images, and using embedded PDF fonts, but the report still takes a very long time to process.
If each one of my records carries an ID beginning with A-Z, is it possible to have the RDL itself trigger multiple child RDLs, so that all "A" results are produced in one output, all "B" results in another, and so on?
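One way to sketch this outside the RDL itself would be to render the report once per leading letter through SSRS URL access, assuming the report exposes a parameter that filters records by ID prefix. Everything below (server path, report path, the "IDPrefix" parameter name) is a placeholder, and authentication is omitted:
import string
import requests  # assumption: SSRS URL access is enabled and this account may render the report

REPORT_URL = "http://reportserver/ReportServer?/Sales/BigReport"   # placeholder server and report path

for letter in string.ascii_uppercase:
    # rs:Command and rs:Format are standard SSRS URL-access arguments;
    # "IDPrefix" is a hypothetical report parameter that filters records by their leading letter
    url = f"{REPORT_URL}&rs:Command=Render&rs:Format=PDF&IDPrefix={letter}"
    resp = requests.get(url, stream=True, timeout=3600)
    resp.raise_for_status()
    with open(f"report_{letter}.pdf", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)
Each letter then produces its own, much smaller PDF instead of one multi-thousand-page document.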

Related

Tesseract OCR with numeric tables

I need to OCR old statistical tables that contain numerical values for each town in a given area. I use Tesseract 4.0.0-beta.3, and in most cases I get acceptable results, but in some others the software fails to recognise the structure of the table and skips rows or entire columns.
I tried to find a more suitable configuration by checking --help-psm, but honestly I couldn't figure out which mode would improve my results. I also tried slicing the tables up into individual columns, but the results were even worse. I suppose the issue is that some cells contain 1- or 2-digit numbers, so the rows are deemed too short, which is usually a good heuristic but is rather problematic here. What settings would you use to optimise the results?
In a similar situation I was using
tesseract image test --psm 6 --oem 0 digits
I even deleted the text on the left of the page, to be processed separately.
Number recognition was OK, but my problem was that I have ~10 columns, some of which are blank in some rows, and Tesseract sometimes ignores the vertical lines and sometimes reads them as "1", unpredictably.
I tried several settings, and even deleted the vertical lines, but I couldn't get Tesseract to keep the table structure for subsequent machine reading.
Hope it helps.
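For reference, a minimal sketch of the same idea driven from Python with pytesseract (an assumption, not necessarily how it was done here): crop each numeric column out of the page and OCR it with --psm 6 and the digits config, leaving the left-hand text for a separate pass. The crop boxes are placeholders that would have to be measured for the actual layout:
from PIL import Image
import pytesseract  # assumes pytesseract and Tesseract 4.x are installed

page = Image.open("table_page.png")  # placeholder scan of one table page

# Pixel boxes (left, top, right, bottom) of the numeric columns; measure these per layout
column_boxes = [(300, 0, 420, page.height), (420, 0, 540, page.height)]

for i, box in enumerate(column_boxes):
    column = page.crop(box)
    # --psm 6 treats the crop as one uniform block; "digits" restricts the character set;
    # --oem 0 selects the legacy engine, as in the command above (needs legacy traineddata)
    text = pytesseract.image_to_string(column, config="--psm 6 --oem 0 digits")
    print(f"column {i}:", text.split())
Working column by column sidesteps the vertical-line confusion, at the cost of having to stitch the columns back together afterwards.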

Splitting Large CSV files by Column

I have a very large (4 GB) CSV file. It cannot be opened in Excel or other editors. The number of lines (rows) is nearly 3,000 and the number of columns is nearly 320,000.
One solution is to split the original file into smaller ones that can be opened in Excel or other editors.
The second solution is to take the transpose of the original data and then open it in Excel.
I could not find a tool or script for the transpose. I've found some scripts and free software for splitting, but each of them splits the CSV by row size.
Is there a way to split the original file into smaller ones of at most 15,000 columns each?
I tried to use:
import pandas as pd
pd.read_csv(file_path).T.to_csv(new_file_path, header=False)
But it takes ages to complete.
In the meantime I tried some Python scripts, but all of them failed because of memory issues.
The trial version of Delimit (http://www.delimitware.com/) handled the data perfectly.
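If a scripted route is still of interest, here is a rough pandas sketch of the column-wise split: only the header is read first, then each pass parses one slice of columns via usecols, so the whole 4 GB file never has to sit in memory at once. File names and the slice width are placeholders:
import pandas as pd

SOURCE = "big.csv"          # placeholder input path
COLS_PER_FILE = 15000       # keeps each piece under Excel's 16,384-column limit

# Read only the header row to learn the column names
columns = list(pd.read_csv(SOURCE, nrows=0).columns)

for start in range(0, len(columns), COLS_PER_FILE):
    subset = columns[start:start + COLS_PER_FILE]
    # usecols makes pandas parse just these columns on this pass;
    # the file is re-read once per slice, which is slow but memory-safe
    piece = pd.read_csv(SOURCE, usecols=subset)
    piece.to_csv(f"piece_{start // COLS_PER_FILE:03d}.csv", index=False)
If one column is a row identifier, it could be prepended to every subset so the pieces can be matched up again later.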

Manipulating 100gb Table

I have a dataset in TSV (tab-separated) format: one big text file of around 100 GB (somewhere around 255 million rows). I have to filter and extract the relevant rows so I can work with them easily. So far I know that Excel can't handle that many rows, and the familiar text editors either can't open the file or are very painful to use on tables this size. I've tried LogParser; a 36-minute query gave me a CSV output, but unfortunately the number of exported rows is way below what I guess is present in the data. I also get some parsing errors, and some columns in the exported sets are shifted. Do you have any other alternatives? Maybe I can somehow turn the data into an SQL database? Is that possible?
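Turning it into a database is certainly possible. As a rough sketch, the Python standard library alone can stream the TSV into SQLite line by line, so nothing close to 100 GB is ever held in memory; the file name, schema and filter condition below are placeholders:
import csv
import sqlite3

SOURCE = "data.tsv"        # placeholder path to the 100 GB file

conn = sqlite3.connect("data.db")
conn.execute("CREATE TABLE IF NOT EXISTS rows (col0 TEXT, col1 TEXT, col2 TEXT)")  # adjust to the real schema

with open(SOURCE, newline="", encoding="utf-8", errors="replace") as fh:
    reader = csv.reader(fh, delimiter="\t")
    batch = []
    for record in reader:
        # Example filter: keep rows whose third field equals "relevant" (placeholder condition)
        if len(record) >= 3 and record[2] == "relevant":
            batch.append(record[:3])
        if len(batch) >= 100000:
            conn.executemany("INSERT INTO rows VALUES (?, ?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:
        conn.executemany("INSERT INTO rows VALUES (?, ?, ?)", batch)
        conn.commit()
conn.close()
Once the filtered rows are in SQLite, they can be queried or exported in manageable pieces.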

Appending CSV files with columns in different orders

I need to regularly merge data from multiple CSV files into a single spreadsheet by appending the rows from each source file. Only OpenOffice/LibreOffice is able to read the UTF-8 CSV file, which has quote-delimited fields containing newline characters.
Now, each CSV file has column headings, but the order of the columns varies from file to file. Some files also have missing columns, and some have extra columns.
I have my master list of column names, and the order in which I would like them all to go. What is the best way to tackle this? LibreOffice gets the CSV parsing right (Excel certainly does not). Ultimately the files will all go into a single merged spreadsheet. Every row from each source file must be kept intact, apart from the column ordering.
The steps also need to be handed over to a non-technical third party eventually, so I am looking for an approach that does not present too many technical hurdles for a non-expert.
Okay, I'm approaching this problem a different way. I have instead gone back to the source application (WooCommerce) to fix the export, so the spreadsheets list all the same columns and all in the same order, on every export. This does have other consequences that I need to follow up, such as managing patches and trying to get the changes accepted by the source project. But it does avoid having to append the CSV files with mis-matched columns, which seems to be a common issue that no-one has any real solutions for (yes, I have searched, a lot).
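For anyone who still needs to append mismatched exports, here is a minimal sketch of how it could be done with pandas (the file locations and the master column list are placeholders); it does mean running a script, so it may not clear the non-technical-handover requirement:
import glob
import pandas as pd

MASTER_COLUMNS = ["order_id", "name", "email", "total"]   # placeholder master list, in the desired order

frames = []
for path in glob.glob("exports/*.csv"):
    # Read everything as text; quoted fields with embedded newlines are handled by the parser
    df = pd.read_csv(path, dtype=str, keep_default_na=False)
    # reindex: missing columns become empty, extra columns are dropped, order is enforced
    frames.append(df.reindex(columns=MASTER_COLUMNS, fill_value=""))

merged = pd.concat(frames, ignore_index=True)
merged.to_csv("merged.csv", index=False, encoding="utf-8")
Every source row stays intact; only the column order is normalised to the master list.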

MySQL, load data from file, into number of tables

My basic task is to import parts of the data from one single file into several different tables as fast as possible.
I currently have a file per table, and I manage to import each file into the relevant table using the LOAD DATA syntax.
Our product received new requirements from a client: he is no longer willing to send us multiple files and instead wants to send a single file that contains all the original records.
I thought of several suggestions:
I may require the client to write a single row before each batch of lines in the file, describing the table into which the batch should be loaded and how many lines belong to it.
e.g.
Table2,500
...
Table3,400
Then I could try to apply LOAD DATA to each such block of lines, discarding the table-name/line-count description row. Is this feasible?
I may require each record to contain the table name as an additional attribute; then I would need to iterate over the records and insert them one by one, although I am sure this is much slower than LOAD DATA.
I may also pre-process this file using, for example, Java, and execute the LOAD DATA statements in a loop.
I can require almost any format change I desire, but it has to be one single file and the import must be fast.
(I should say that what I call a table here is actually the name of a feature; I have decided that all data relevant to a given feature should be stored in its own table - this is transparent to the client.)
What sounds like the best solution? Is there any other suggestion?
It depends on your data file. We're doing something similar and made a small Perl script to read the data file line by line. If the line has the content we need (for example, it starts with table1,), we know it should go into table 1, so we print that line.
Then you can either save that output to a file or to a named pipe and use that with LOAD DATA.
This will probably perform much better than loading everything into temporary tables and from there into the new tables.
The perl script (but you can do it in any language) can be very simple.
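For what it's worth, a rough Python equivalent of that pre-processing idea, splitting the combined file into one file per table so each piece can then be fed to LOAD DATA (the leading table-name-and-comma prefix on every line is an assumed format):
OUTPUTS = {}   # table name -> open file handle

with open("combined.txt", encoding="utf-8") as fh:   # placeholder input file
    for line in fh:
        table, sep, rest = line.partition(",")
        if not sep:
            continue                      # skip malformed lines with no table prefix
        out = OUTPUTS.get(table)
        if out is None:
            out = OUTPUTS[table] = open(f"{table}.txt", "w", encoding="utf-8")
        out.write(rest)                   # drop the table-name prefix, keep the data

for out in OUTPUTS.values():
    out.close()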
You may have another option, which is to define a single table and load all your data into it, then use select-insert-delete to transfer the data from this table to your target tables. Depending on the total number of columns, this may or may not be possible. If it is possible, though, you don't need to write an external Java program and can rely entirely on the database to load your data, which can also be a cleaner and more optimized way of doing the job. You will most probably need an additional marker column, which can hold the name of the target table. If so, this can be considered a variant of option 2 above.
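To make that concrete, a small sketch of the staging-table route driven from Python (table names, column names and credentials are all made up; it assumes mysql-connector-python is installed and that LOCAL INFILE is permitted on both client and server):
import mysql.connector  # assumption: mysql-connector-python, LOCAL INFILE enabled

conn = mysql.connector.connect(user="app", password="secret", database="mydb",
                               allow_local_infile=True)   # placeholder credentials
cur = conn.cursor()

# 1. Load everything into one staging table whose first column is the target-table marker
cur.execute("""
    LOAD DATA LOCAL INFILE 'combined.txt'
    INTO TABLE staging
    FIELDS TERMINATED BY ','
    (target_table, col_a, col_b, col_c)
""")

# 2. Fan the rows out to their real tables, then clear the staging table
for table in ("table2", "table3"):       # placeholder target tables
    cur.execute(f"INSERT INTO {table} (col_a, col_b, col_c) "
                f"SELECT col_a, col_b, col_c FROM staging WHERE target_table = %s", (table,))
cur.execute("DELETE FROM staging")
conn.commit()
cur.close()
conn.close()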