Parallel distinct word count in Go - language-agnostic

Jakob Østergaard presented this challenge:
Write a program that reads text from standard-input, and returns (prints) the total number of distinct words found in the text.
How can we meet this challenge with parallel programming (preferably in Go, but a description in English will suffice)?

There are several possibilities, but I guess you mean "efficiently"?
The general idea would be to split the text into manageable chunks, pile those chunks into a queue, and have multiple consumers handle the chunks.
This looks like a typical Map/Reduce application to me:
           _ Worker _
          /          \
         /            \
Splitter --- Worker --- Aggregator
         \            /
          \_ Worker _/
Ideally the "multiple" queues would be a single queue with multiple consumers, so that if one worker slows down, the whole process does not slow down as much.
I would also use a signal from the Splitter to the Workers to let them know the input has been fully consumed, so they can start sending their results to the Aggregator.
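To make that concrete, here is a minimal Go sketch of that shape, assuming line-sized chunks and an arbitrary worker count; closing the chunk channel serves as the "input fully consumed" signal:
package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
    "sync"
)

func main() {
    chunks := make(chan string)            // splitter -> workers (single shared queue)
    partials := make(chan map[string]bool) // workers -> aggregator

    // Splitter: read the input line by line and queue each line as a chunk.
    go func() {
        sc := bufio.NewScanner(os.Stdin)
        for sc.Scan() {
            chunks <- sc.Text()
        }
        close(chunks) // the "input fully consumed" signal
    }()

    // Workers: each accumulates its own set of distinct words.
    const numWorkers = 4
    var wg sync.WaitGroup
    wg.Add(numWorkers)
    for i := 0; i < numWorkers; i++ {
        go func() {
            defer wg.Done()
            seen := make(map[string]bool)
            for chunk := range chunks {
                for _, w := range strings.Fields(chunk) {
                    seen[w] = true
                }
            }
            partials <- seen
        }()
    }
    go func() {
        wg.Wait()
        close(partials)
    }()

    // Aggregator: merge the partial sets and print the count.
    total := make(map[string]bool)
    for seen := range partials {
        for w := range seen {
            total[w] = true
        }
    }
    fmt.Println(len(total))
}
Note the single chunks channel shared by all workers, so a slow worker never blocks the others.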

Here's the solution, in Go, to Jakob Østergaard's problem.
/*
The problem: Write a program that reads text from
standard-input, and returns (prints) the total
number of distinct words found in the text.
C versus C++, Jakob Østergaard, February 24th, 2004
http://unthought.net/c++/c_vs_c++.html
*/
package main
import (
"bufio"
"fmt"
"os"
"unicode"
)
func main() {
rdr := bufio.NewReader(os.Stdin)
words := make(map[string]bool, 1024*1024)
word := make([]rune, 0, 256)
for {
r, n, _ := rdr.ReadRune()
if unicode.IsSpace(r) || n == 0 {
if len(word) > 0 {
words[string(word)] = true
word = word[:0]
}
if n == 0 {
break
}
} else {
word = append(word, r)
}
}
fmt.Println(len(words))
}
It's naive to add the phrase "parallel programming" to this and most other problems and expect some magical improvement. Reading input sequentially from a stream and performing trivial computation provides no significant opportunities for parallel computing.

Related

How to merge hundreds of CSVs, each with millions of rows, using minimal RAM?

I have terabytes of sensor data stored across separate CSV files in the format (timestamp, value). I need to merge these CSV files so that there is one column for the timestamp (calculated as the average of the per-file timestamps for each row) and one column per sensor, where the column name comes from the file name. The number of sensors is more than 500. What tech stack should I look into? I can't fit all the data in RAM at once.
Example:
sensor1.csv
timestamp value
10000 1.9
10010 2.2
... (millions of rows)
sensor2.csv
timestamp value
10004 3.5
10012 4.3
... (500 more files)
Result should look like this (the timestamp in this file is the average of the timestamps from all 500+ input files; the sensor names, e.g. sensor1, sensor2, etc., come from the filenames):
merged_file.csv
timestamp sensor1 sensor2 ... sensor500
10002 1.9 3.5 2.1
10011 2.2 4.3 3.5
After the merge, I would like to store the data in a database, e.g. InfluxDB, for future analysis and model training. What tools would be best to perform the merge and analysis operations on this data?
I see your post as two distinct questions: 1) How to merge 500 CSVs? 2) whatever comes next, some DB?
This is a solution for the first question. I'm using Python; there are languages/runtimes that could do this faster, but I think Python will give you a good first start at the problem, and I expect Python will be more accessible and easier to use for you.
Also, my solution is predicated on the assumption that all 500 CSVs have identical row counts.
My solution opens all 500 CSVs at once, creates an outer loop over a set number of rows, and an inner loop over each CSV:
The inner loop reads the timestamp and value for a row in each CSV, averaging the 500 timestamps into a single column, and accumulating the 500 distinct values in their own columns, and all that goes into a final merged row with 501 columns.
The outer loop repeats that process for as many rows as there are across all 500 CSVs.
I generated some sample data, 500 CSVs each with 1_000_000 rows, for 6.5G of CSVs. I ran the following script on my M1 Macbook Air. It completed in 8.3 minutes and peaked at 34.6M of RAM and produced a final CSV that is about 2G on disk.
import csv
import glob
# Fill this in based on your own knowledge, or, based on the output of 'analyze_stats.py'
NUM_ROWS = 1_000_000
sensor_filenames = sorted(glob.glob('sensor*.csv'))
# Sort: trim leading 6 chars, 'sensor', and trailing 4 chars, '.csv', leaving just the number in the middle
sensor_filenames = sorted(sensor_filenames, key=lambda x: int(x[6:-4]))
# Get handles to all files, and create input CSV readers
sensor_readers = []
for sensor_fname in sensor_filenames:
f = open(sensor_fname, newline='')
sensor_readers.append(csv.reader(f))
# Create output CSV writer
f = open('merged_sensors.csv', 'w', newline='')
writer = csv.writer(f)
# Discard all sensor headers
for reader in sensor_readers:
next(reader)
# Build up output header and write
output_header = ['timestamp']
for sensor_fname in sensor_filenames:
sensor_name = sensor_fname[:-4] # trim off '.csv'
output_header.append(sensor_name)
writer.writerow(output_header)
row_count = 0
while row_count < NUM_ROWS:
row_count += 1
values = []
timestamps = []
for reader in sensor_readers:
row = next(reader)
ts, val = row
timestamps.append(int(ts))
values.append(val)
if row_count % 1000 == 0:
print(f'Merged {row_count} rows')
avg_ts = int(sum(timestamps)/len(timestamps))
writer.writerow([avg_ts] + values)
I haven't profiled this, but I believe the only real allocations of memory that add up are going to be:
the 500 file handles and CSV readers (which is small) for the entirety of the process
each row from the input CSVs in the inner loop
the final merged row in the outer loop
At the top of the script I mention analyze_stats.py. Even if this were my data, I'd be very patient and break down the entire process into multiple steps, each of which I could verify, and I would ultimately arrive at the correct, final CSV. This is especially true for me trying to help you, because I don't control the data or "know" it like you do, so I'm going to offer up this bigger process:
Read all the CSVs and record some stats: headers, column counts, and especially row counts.
Analyze those stats for "conformance", making sure no CSV deviates from your idea of what it should be, and especially get confirmation that all 500 CSVs have the same number of columns and rows.
Use the proven row count as input into the merge process.
There are ways to write the merge script so it doesn't have to know "the row count" up front, but it's more code, it's slightly more confusing, and it won't help you if there ever is a problem... you probably don't want to find out on row 2 million that there was a problem "somewhere"; I know I hate it when that happens.
If you're new to Python or the CSV readers/writers, I recommend you read these scripts first.
get_sensor_stats.py: reads all your sensor data and records the header, the minimum and maximum number of columns seen, and the row count for each CSV; it writes all those stats out to a CSV (see the sketch after this list)
analyze_stats.py: reads in the stats CSV and checks to make sure header and column counts meet pre-defined values; it also keeps a tally of all the row counts for each file, and if there are files with different row counts it will let you know
Also, here's the script I used to generate my 6G of sample data:
gen_sensor_data.py: my attempt to meaningfully represent your problem space, both in size and complexity (which is very easy, thankfully 🙂)
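Those helper scripts aren't shown here, so as a rough stand-in, here is a minimal sketch of the stats-gathering step in Go (matching the Go merge program below; the stats columns and the panic-on-error behavior are my own choices):
package main

import (
    "encoding/csv"
    "fmt"
    "io"
    "os"
    "path/filepath"
    "strconv"
    "strings"
)

func main() {
    filenames, err := filepath.Glob("sensor*.csv")
    if err != nil {
        panic(err)
    }
    // Stats go to stdout as CSV; redirect to a file to feed an analyze step.
    out := csv.NewWriter(os.Stdout)
    out.Write([]string{"file", "header", "min_cols", "max_cols", "rows"})
    for _, fname := range filenames {
        f, err := os.Open(fname)
        if err != nil {
            panic(fmt.Errorf("could not open %q: %v", fname, err))
        }
        r := csv.NewReader(f)
        r.FieldsPerRecord = -1 // allow ragged rows; we want to report them, not die
        header, err := r.Read()
        if err != nil {
            panic(fmt.Errorf("could not read header of %q: %v", fname, err))
        }
        minCols, maxCols, rows := len(header), len(header), 0
        for {
            record, err := r.Read()
            if err == io.EOF {
                break
            }
            if err != nil {
                panic(fmt.Errorf("read error in %q after %d rows: %v", fname, rows, err))
            }
            if len(record) < minCols {
                minCols = len(record)
            }
            if len(record) > maxCols {
                maxCols = len(record)
            }
            rows++
        }
        f.Close()
        out.Write([]string{
            fname,
            strings.Join(header, "|"),
            strconv.Itoa(minCols),
            strconv.Itoa(maxCols),
            strconv.Itoa(rows),
        })
    }
    out.Flush()
}
An analyze step then only has to check that every line of that stats output agrees on the column counts and the row count.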
Same big idea as my Python solution, but in Go for a 2.5 minute speed-up and using 1/3 the RAM.
package main
import (
"encoding/csv"
"fmt"
"io"
"os"
"path/filepath"
"regexp"
"sort"
"strconv"
)
// Fill this in based on your own knowledge, or, based on the output of 'analyze_stats.py'
const NUM_ROWS = 1_000_000
// Sorting sensor filenames by a custom key, the sensor number
type BySensorNum []string
func (sn BySensorNum) Len() int { return len(sn) }
func (sn BySensorNum) Swap(i, j int) { sn[i], sn[j] = sn[j], sn[i] }
func (sn BySensorNum) Less(i, j int) bool {
re := regexp.MustCompile(`.*sensor(\d+).csv`)
// Tease out sensor num from file "A"
fnameA := sn[i]
matchA := re.FindStringSubmatch(fnameA)
numA := matchA[1]
intA, errA := strconv.Atoi(numA)
if errA != nil {
panic(fmt.Errorf("Could not parse number \"%s\" from file-name \"%s\" as int: %v\n", errA, numA, fnameA))
}
// Tease out sensor num from file "B"
fnameB := sn[j]
matchB := re.FindStringSubmatch(fnameB)
numB := matchB[1]
intB, errB := strconv.Atoi(numB)
if errB != nil {
panic(fmt.Errorf("%v: Could not parse number \"%s\" from file-name \"%s\" as int\n", errB, numB, fnameB))
}
// Compare sensor nums numerically
return intA < intB
}
func main() {
// filenames := []string{"../sensor1.csv", "../sensor2.csv", "../sensor3.csv", "../sensor4.csv", "../sensor5.csv", "../sensor6.csv", "../sensor7.csv", "../sensor8.csv", "../sensor9.csv", "../sensor10.csv"}
filenames, err := filepath.Glob("sensor*.csv")
if err != nil {
panic(err) // only expect error if Glob pattern is bad itself (malformed)
}
fileCount := len(filenames)
sort.Sort(BySensorNum(filenames))
// Create output CSV writer
outFname := "merged_sensors.csv"
f, err := os.Create(outFname)
if err != nil {
panic(fmt.Errorf("Could not create \"%s\" for writing: %v", outFname, err))
}
defer f.Close()
w := csv.NewWriter(f)
// Get handles to all files, and create input CSV readers
readers := make([]*csv.Reader, fileCount)
for i, fname := range filenames {
f, err := os.Open(fname)
if err != nil {
panic(fmt.Errorf("Could not open \"%s\": %v", fname, err))
}
defer f.Close()
readers[i] = csv.NewReader(f)
}
// Discard all sensor headers
for _, r := range readers {
r.Read()
}
// With everything created or opened, start writing...
// Build up output header and write
header := make([]string, fileCount+1)
header[0] = "timestamp"
re := regexp.MustCompile(`.*(sensor\d+)\.csv`)
for i, fname := range filenames {
sensorName := re.FindStringSubmatch(fname)[1]
header[i+1] = sensorName
}
w.Write(header)
// "Shell" of final record with fileCount+1 columns, create once and use over-and-over again
finalRecord := make([]string, fileCount+1)
for i := 1; i <= NUM_ROWS; i++ {
var tsSum int
for j, r := range readers {
record, err := r.Read()
if err == io.EOF {
break
}
if err != nil {
panic(fmt.Errorf("Could not read record for row %d of file \"%s\": %v", i, filenames[j], err))
}
timestamp, err := strconv.Atoi(record[0])
if err != nil {
panic(fmt.Errorf("Could not parse timestamp \"%s\" as int in record `%v`, row %d, of file \"%s\": %v", record[0], record, i, filenames[j], err))
}
tsSum += timestamp
finalRecord[j+1] = record[1]
}
// Add average timestamp to first cell/column
finalRecord[0] = fmt.Sprintf("%.1f", float32(tsSum)/float32(fileCount))
w.Write(finalRecord)
}
w.Flush()
}

How to debug/dump Go variable while building with cgo?

I'm trying to write a MySQL UDF in Go with cgo. I have a basic one functioning, but there are little bits and pieces that I can't figure out, because I have no idea what some of the C variables are in terms of Go.
This is an example that I have written in C that forces the type of one of the MySQL parameters to an int
my_bool unhex_sha3_init(UDF_INIT *initid, UDF_ARGS *args, char *message) {
if (args->arg_count != 2) {
strcpy(message, "`unhex_sha3`() requires 2 parameters: the message part, and the bits");
return 1;
}
args->arg_type[1] = INT_RESULT;
initid->maybe_null = 1; //can return null
return 0;
}
And that works fine, but then I try to do the same/similar thing with this other function in Go like this
//export get_url_param_init
func get_url_param_init(initid *C.UDF_INIT, args *C.UDF_ARGS, message *C.char) C.my_bool {
if args.arg_count != 2 {
message = C.CString("`get_url_param` require 2 parameters: the URL string and the param name")
return 1
}
(*args.arg_type)[0] = C.STRING_RESULT
(*args.arg_type)[1] = C.STRING_RESULT
initid.maybe_null = 1
return 0
}
With this build error
./main.go:24: invalid operation: (*args.arg_type)[0] (type uint32 does
not support indexing)
And I'm not totally sure what that means. Shouldn't this be a slice of some sort, not a uint32?
And this is where it'd be super helpful to have some way of dumping the args struct somewhere somehow (maybe even in Go syntax as a super plus) so that I can tell what I'm working with.
Well I used spew to dump the variable contents to a tmp file inside the init function (commenting out the lines that made it not compile) and I got this
(string) (len=3) "%#v"
(*main._Ctype_struct_st_udf_args)(0x7ff318006af8)({
arg_count: (main._Ctype_uint) 2,
_: ([4]uint8) (len=4 cap=4) {
00000000 00 00 00 00 |....|
},
arg_type: (*uint32)(0x7ff318006d18)(0),
args: (**main._Ctype_char)(0x7ff318006d20->0x7ff3180251b0)(0),
lengths: (*main._Ctype_ulong)(0x7ff318006d30)(0),
maybe_null: (*main._Ctype_char)(0x7ff318006d40)(0),
attributes: (**main._Ctype_char)(0x7ff318006d58->0x7ff318006b88)(39),
attribute_lengths: (*main._Ctype_ulong)(0x7ff318006d68)(2),
extension: (unsafe.Pointer) <nil>
})
Alright, so huge help from @JimB, who stuck with me even though I'm clearly less adept with Go (and especially cgo). I've got a working version of my UDF, which is an easy and straightforward (and fast) function that pulls a single parameter out of a URL string and decodes it correctly and whatnot (e.g. %20 gets returned as a space, basically how you would expect it to work).
This seemed incredibly tricky as a pure C UDF because I don't really know C (as well as I know other languages), there's a lot that can go wrong with URL parsing and URL parameter decoding, and native MySQL functions are slow (with no good, clean way to do the decoding either). So Go seemed like the better-than-perfect candidate for this kind of problem: strong performance, ease of writing, and a wide variety of easy-to-use built-ins and third-party libraries.
The full UDF and its installation/usage instructions are here: https://github.com/StirlingMarketingGroup/mysql-get-url-param/blob/master/main.go
The first problem was debugging output. I handled that by Fprintf-ing to a tmp file instead of standard output, so that I could check the file for variable dumps.
t, _ := ioutil.TempFile(os.TempDir(), "get-url-param") // error ignored for brevity
defer t.Close()
fmt.Fprintf(t, "%#v\n", args.arg_type)
And then after I got my output (I was expecting args.arg_type to be an array like it is in C, but instead it was a number) I needed to convert the data referenced by that number (the pointer to the start of the C array) to a Go array so I could set its values.
// View the C array through a Go array pointer so the writes land in the
// C struct itself (dereferencing with a leading * would modify a copy).
argsTypes := (*[2]uint32)(unsafe.Pointer(args.arg_type))
argsTypes[0] = C.STRING_RESULT
argsTypes[1] = C.STRING_RESULT
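Putting those pieces together, here's a hedged sketch of the whole init function; the copyMessage helper, the 512-byte message-buffer size, and the mysql.h include are my assumptions (newer MySQL versions dropped my_bool, so adjust for your server's headers):
package main

/*
#include <mysql.h>  // header name/path varies by MySQL version; it must
                    // declare UDF_INIT, UDF_ARGS, my_bool and STRING_RESULT
*/
import "C"

import "unsafe"

// copyMessage is a hypothetical helper: MySQL hands init functions a
// fixed-size error buffer (MYSQL_ERRMSG_SIZE, 512 bytes in the versions
// I know of), so we copy into it rather than reassigning the pointer.
func copyMessage(message *C.char, s string) {
    buf := (*[512]byte)(unsafe.Pointer(message))
    n := copy(buf[:511], s)
    buf[n] = 0 // NUL terminator
}

//export get_url_param_init
func get_url_param_init(initid *C.UDF_INIT, args *C.UDF_ARGS, message *C.char) C.my_bool {
    if args.arg_count != 2 {
        copyMessage(message, "`get_url_param` requires 2 parameters: the URL string and the param name")
        return 1
    }
    // arg_type is a C array that cgo presents as a *uint32; view it as a
    // pointer to a Go array so the writes land in the C struct itself.
    argTypes := (*[2]uint32)(unsafe.Pointer(args.arg_type))
    argTypes[0] = C.STRING_RESULT
    argTypes[1] = C.STRING_RESULT
    initid.maybe_null = 1
    return 0
}

func main() {} // required; build with: go build -buildmode=c-shared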

Inverting a function without rewriting it in Python

I have a string function (and I am sure it is reversible, so no need to test this). Could I call it in reverse to perform the opposite operation?
For example:
def sample(s):
return s[1:]+s[:1]
would put the first letter of a string on the end and return it.
'Output' would become 'utputO'.
When I want to get the opposite operation, could I use this same function?
'utputO' would return 'Output'.
Short answer: no.
Longer answer: I can think of 3, maybe 4, ways to approach what you want -- all of which depend on how you are allowed to change your functions (possibly restricting them to a sub-set of Python or a mini language), train them, or run them normally with the operands you are expecting to invert later.
So, method (1) would probably not reach 100% determinism, and would require training with a lot of random examples for each function: use a machine learning approach. That is cool because it is a hot topic; this would be almost a "machine learning hello world" to implement using one of the various frameworks that exist for Python, or even rolling your own. Just set up a neural network for string transformation, train it with a couple thousand (maybe just a few hundred) string transformations for each function you want to invert, and you should have the reverse function. I think this could be the best, or at least the "least incorrect", approach; at the very least it will be the most generic one.
Method (2): create a mini language for string transformation with reversible operands. Write your functions using this mini language. Introspect your functions and generate the reversed ones.
It may look weird, but imagine a minimal stack language that can remove an item from a position in a string and push it on the stack, pop an item from the stack back to a position in the string, and maybe perform a couple more reversible primitives you might need (say upper/lower):
OPSTACK = []
language = {
"push_op": (lambda s, pos: (OPSTACK.append(s[pos]), s[:pos] + s[pos + 1:])[1]),
"pop_op": (lambda s, pos: s[:pos] + OPSTACK.pop() + s[pos:]),
"push_end": (lambda s: (OPSTACK.append(s[-1]), s[:-1])[1]),
"pop_end": lambda s: s + OPSTACK.pop(),
"lower": lambda s: s.lower(),
"upper": lambda s: s.upper(),
# ...
}
# (or pip install extradict and use extradict.BijectiveDict to avoid having to write double entries)
reverse_mapping = {
"push_op": "pop_op",
"pop_op": "push_op",
"push_end": "pop_end",
"pop_end": "push_end",
"lower": "upper",
"upper": "lower"
}
def engine(text, function):
tokens = function.split()
while tokens:
operator = tokens.pop(0)
if operator.endswith("_op"):
operand = int(tokens.pop(0))
text = language[operator](text, operand)
else:
text = language[operator](text)
return text
def inverter(function):
inverted = []
tokens = function.split()
while tokens:
operator = tokens.pop(0)
inverted.insert(0, reverse_mapping[operator])
if operator.endswith("_op"):
operand = tokens.pop(0)
inverted.insert(1, operand)
return " ".join(inverted)
Example:
In [36]: sample = "push_op 0 pop_end"
In [37]: engine("Output", sample)
Out[37]: 'utputO'
In [38]: elpmas = inverter(sample)
In [39]: elpmas
Out[39]: 'push_end pop_op 0'
In [40]: engine("utputO", elpmas)
Out[40]: 'Output'
Method 3: If possible, it is easy to cache the input and output of each call, and just use that to operate in reverse - it could be done as a decorator in Python
from functools import wraps
def reverse_cache(func):
reverse_cache = {}
@wraps(func)
def wrapper(input_text):
result = func(input_text)
reverse_cache[result] = input_text
return result
wrapper.reverse_cache = reverse_cache
return wrapper
Example:
In [3]: @reverse_cache
... def sample(s):
... return s[1:]+s[:1]
In [4]:
In [5]: sample("Output")
Out[5]: 'utputO'
In [6]: sample.reverse_cache["utputO"]
Out[6]: 'Output'
Method 4: If the string operations are limited to shuffling the string contents in a deterministic way, like in your example (and maybe offsetting the character code values by a constant, but no other operations at all), it is possible to write a learner function without any neural-network programming: construct a string with one occurrence of each character (with code-points in ascending order), pass it through the function, and note down the numeric order of the characters in the output.
So, in your example, the reconstructed output order would be (1, 2, 3, 4, 5, 0). Given that sequence, one just has to reorder the input to the inverted function according to those indexes, which is trivial in Python:
def order_map(func, length):
sample_text = "".join(chr(i) for i in range(32, 32 + length))
result = func(sample_text)
return [ord(char) - 32 for char in result]
def invert(func, text):
map_ = order_map(func, len(text))
reordered = sorted(zip(map_, text))
return "".join(item[1] for item in reordered)
Example:
In [47]: def sample(s):
....: return s[1:] + s[0]
....:
In [48]: sample("Output")
Out[48]: 'utputO'
In [49]: invert(sample, "utputO")
Out[49]: 'Output'

Calling std::vector constructor when containing class manually allocated

I'm afraid to ask questions in case this turns out to be stupid... but I tried to search and didn't find the same situation.
I'm retrofitting a std::vector into some existing legacy code that is mostly C style. Our next major release, which isn't due for a year or two, will jettison a lot of the legacy code. But for now, the way we work is that every project gets recompiled for the customer, depending on specs. Some projects are on Visual Studio 2008, some 2010, etc. The std::vector code I'm adding has no visible problems when compiled with 2013, but I get crashes within the STL code when running the VS 2008 SP1 build.
The existing code has a struct, and a fixed size array in it:
#define MAX_REMOTE_CONN 75
typedef struct {
int rno;
int adrs;
bool integ_pending;
} RTUref;
typedef struct {
char device[64];
int port;
RTUref RTU[MAX_REMOTE_CONN];
// more stuff...
} Connect_Info;
So, my basic goal is to get rid of the hard coded size limit to the RTU array. So, I have revised it like this:
class RTUref {
public:
int rno;
int adrs;
bool integ_pending;
};
typedef std::vector <RTUref> RTUlist;
typedef struct {
char device[64];
int port;
RTUlist RTU;
// more stuff...
} Connect_Info;
The Connect_Info structs are allocated using our own memory manager. I don't know much about it, other than it is supposed to be more efficient than using malloc() and free(). I'm guessing that the constructor for RTU doesn't get called, since the struct it is contained in is allocated by our own memory manager?
Nevertheless, the code where I size the array, put values into the array all at least seem to work okay. But, when I call .clear() I get a crash from within the STL. And as I said, only if I use 2008. If I use 2013, I don't get that crash.
Assuming pct is a pointer to an allocated Connect_Info structure, the line:
pct->RTU.clear();
generates a crash on VS 2008. I am able to resize and add elements to the array. I even tried to add a check so that I don't clear unless the size is greater than zero, like so:
if (pct->RTU.size() > 0)
pct->RTU.clear();
And I still get the crash on the clear.
So, I made the educated guess that I need to call a constructor, but I wasn't quite sure how to do it. In the code where the Connect_Info struct is allocated, I tried to add constructor code like this:
pct->RTU = RTUlist();
It compiles. But, I then get a crash in the STL on that line.
I haven't yet tried to build a small contained test program, as I'm not even sure that I will be able to reproduce the problem without our memory manager. But, I will try if that is what I need to do. I thought maybe someone might see obviously what I'm doing wrong. I'm fairly novice to the STL.
A little background: there is a term in C++ called "POD Type" (or "Plain Old Data Type").
There are verbose rules, but basically things that may do special things on allocations, deallocations, or copies are not POD types. The original Connect_Info was a POD type since it didn't do special things at those times and didn't have any non-POD members.
However, since you added a std::vector (which is not a POD type, because it has to do special things at allocation, deallocation, copy, etc., since it allocates memory), Connect_Info is no longer a POD type.
POD types can be allocated safely with malloc and deallocated with free since they don't do special things. However, non-POD types cannot (except in exceedingly rare cases which you'll first see after several years of programming C++) be allocated like that.
C only has POD types, so in C malloc is perfectly acceptable. Here are a few options:
int main ( ... )
{
Connect_Info * info = new Connect_Info() ;
std::cout << info->port << std::endl ;
delete info ;
}
Or
Connect_Info * makeOne ()
{
void * ptr = malloc ( sizeof(Connect_Info) ) ;
if ( ! ptr ) return 0 ;
return new (ptr) Connect_Info () ; // "In-Place constructor"
}
void deleteOne ( Connect_Info * info )
{
if ( ! info ) return ;
info->~Connect_Info() ; // manually call its destructor with the weirdest syntax ever
// Note: the in-place (non-allocating) 'new' returns the pointer it was given,
// so calling 'free' on the original pointer is correct here
free ( static_cast<void*>(info) ) ;
}
int main ( ... )
{
Connect_Info * info = makeOne () ;
std::cout << info->port << std::endl ;
deleteOne ( info ) ;
}
If you have boost available (or C++11, which you probably don't), this is a MUCH better option (and only uses header components of boost):
boost::shared_ptr<Connect_Info> makeOne ()
{
return boost::make_shared<Connect_Info> () ;
}
int main ( ... )
{
boost::shared_ptr<Connect_Info> info = makeOne () ;
std::cout << info->port << std::endl ;
// nothing else: shared_ptr takes care of that for you
}
(If you have C++11, use std::shared_ptr and std::make_shared)

Fast JSON Parser for Matlab

Do you know a very fast JSON Parser for Matlab?
Currently I'm using JSONlab, but with larger JSON files (mine is 12 MB, 500,000 lines) it's really slow. Or do you have any tips for me to increase the speed?
P.S. The JSON file is max. 3 levels deep.
If you want to be fast, you could use the Java JSON parser.
And before this answer gets out of hand, I am going to post the stuff I put down so far:
clc
% input example
jsonStr = '{"bool1": true, "string1": "some text", "double1": 5, "array1": [1,2,3], "nested": {"val1": 1, "val2": "one"}}'
% use java..
javaaddpath('json.jar');
jsonObj = org.json.JSONObject(jsonStr);
% check out the available methods
jsonObj.methods % see also http://www.json.org/javadoc/org/json/JSONObject.html
% get some stuff
b = jsonObj.getBoolean('bool1')
s = jsonObj.getString('string1')
d = jsonObj.getDouble('double1')
i = jsonObj.getJSONObject('nested').getInt('val1')
% put some stuff
jsonObj = jsonObj.put('sum', 1+1);
% getting an array or matrix is not so easy (you get a JSONArray)
e = jsonObj.get('array1');
% what are the methods to access that JSONArray?
e.methods
for idx = 1:e.length()
e.get(idx-1)
end
% but putting arrays or matrices works fine
jsonObj = jsonObj.put('matrix1', ones(5));
% you can get these also easily ..
m1 = jsonObj.get('matrix1')
% .. as long as you don't convert the obj back to a string
jsonObj = org.json.JSONObject(jsonObj.toString());
m2 = jsonObj.get('matrix1')
If you can afford to call .NET code, you may want to have a look at this lightweight guy (I'm the author):
https://github.com/ysharplanguage/FastJsonParser#PerfDetailed
Coincidentally, my benchmark includes a test ("fathers data") that is right in the 12MB ballpark (and with a couple levels of depth also), which this parser parses into POCOs in under 250 ms on my cheap laptop.
As for the MATLAB + .NET code integration:
http://www.mathworks.com/help/matlab/using-net-libraries-in-matlab.html
HTH,
If you just want to read JSON files, and have a C++11 compiler, you can use the very fast json_read mex function.
Since Matlab 2016b, you can use jsondecode.
I have not compared its performance to other implementations.
From personal experience I can say that it is not horribly slow.