Handling DoS from untrusted sockets (and other streams) - tcl

This TIP confused me. It seems to be saying that -buffering line makes the input buffer infinitely large, when I thought line buffering only affected flushing of output? Can't I use -buffersize 5000 together with -buffering line to protect me from people sending long lines? If I can, then what good is chan pending? To discover when the buffer is full without a line break in it?
Or are there two different buffers? One that's just for pre-reading data to save time, and one internal that commands like gets and read use?
EDIT: Or is the problem created only when you use gets, because it doesn't return partial lines? Does gets put the stream into an infinitely large buffer mode, because otherwise, if the buffer filled up without a line break, gets could never return it? Is this the "line buffer mode" that the TIP talks about?

First off, the -buffersize option is for output, not input. I've never needed to set it in the past few years; Tcl's buffer management is pretty good.
Secondly, the -buffering option is also for output.
Thirdly, you're vulnerable to someone sending you a vastly long line if you're using blocking channels. You just have no opportunity to do anything other than wait for the end of the line (or the end of the file) to come.
But in non-blocking mode, things are more subtle. You get a readable fileevent for the channel (not relevant for files, but you can check that their size is sane more easily, and they're not normally a problem in any case) and do a gets $theChannel line, which returns -1. (If it returns 0 or more, you've got a complete line.)
So what does the -1 mean? Well, it means that either the line is incomplete or you've reached the end of the stream. You can distinguish the cases with fblocked/chan blocked (or eof to detect the reverse case), and you find that the line isn't there yet. What now? Check how much data has been buffered with chan pending input; if there's a silly amount (where “silly” is tunable) then it's time to give up on the channel, as the other side isn't being nice (i.e., just close it).
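A minimal sketch of that pattern, assuming a non-blocking socket in $sock, an arbitrary 1 MB cap, and a hypothetical handleLine procedure:

proc onReadable {sock} {
    if {[gets $sock line] >= 0} {
        handleLine $sock $line           ;# hypothetical: process one complete line
    } elseif {[eof $sock]} {
        close $sock
    } elseif {[chan pending input $sock] > 1048576} {
        close $sock                       ;# over 1 MB buffered with no newline: give up
    }
    # otherwise the line is merely incomplete; wait for the next readable event
}
chan configure $sock -blocking 0
chan event $sock readable [list onReadable $sock]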
I've yet to see a real use for chan pending output that isn't happier with writable fileevents, but it's not usually a big problem: just using fcopy/chan copy to spool data from large sources to the (slow) output channel works fine without bloating buffers a lot.
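For instance, a minimal sketch of that kind of spooling, with $src and $dst standing in for whatever channels you have:

# chan copy reads from $src only as fast as $dst drains, so buffers stay bounded
chan configure $src -blocking 0
chan configure $dst -blocking 0
chan copy $src $dst -command [list apply {{bytes {err ""}} {
    if {$err ne ""} { puts stderr "copy failed: $err" }
    set ::copyDone 1
}}]
vwait ::copyDone    ;# in a plain script, something must run the event loop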

Related

How to apply backpressure to Tcl output channel?

We have an application that allows a user to pass an arbitrary Tcl code block (as a callback) to a custom API that invokes it on individual elements of a large data tree. For performance, this is done using a thread pool, so things can get ripping.
The problem is, we have no control over user code, and in one case they are doing a puts that causes memory to explode and the app to crash. I can prevent this by redirecting stdout to /dev/null, which leads me to believe that Tcl's internal buffers can't be emptied fast enough, so it keeps buffering. Heap analysis seems to confirm this.
What I don't understand is that I haven't messed with any of stdout's options, so it should be line buffered, blocking, 4k. So, my first question would be: why is this happening? Shouldn't there already be backpressure applied to prevent this?
My second question would be: how do I prevent this? If the user wants to do something stupid, I'm more than willing to throttle their performance, but I don't want the app to crash. I suppose one solution would be to redefine puts to write to a file (or simply do nothing) before the callback is invoked, but I'd be interested to know if there is a way to ensure backpressure on the channel to prevent it from continuing to buffer.
Thanks for any thoughts!
It depends on the channel type and how you've configured it. However, the normal model is that writes to a synchronous channel (-blocking true) will either buffer or write immediately (according to the -buffering option), and writes to an asynchronous channel (-blocking false) will, if not processed immediately, be queued to be carried out later by an internal event handler. For most applications, that does the right thing; it sounds like you've passed an asynchronous channel to code that doesn't call into the event loop (or at least not frequently). Try using chan configure to make the channel synchronous before starting the user code; you're in a separate thread, so the blocking behaviour shouldn't be a problem for the rest of the application.
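For example, a sketch of that fix, with $userCallback standing in for the untrusted code your API was handed:

# synchronous stdout: puts now blocks when the OS buffer is full,
# giving natural backpressure instead of unbounded internal queueing
chan configure stdout -blocking 1
uplevel #0 $userCallback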
Some channels are more tricky. The one that people most normally encounter is the console channel in Tk on platforms such as Windows, where the channel ends up writing into a widget that doesn't have a maximum number of retained lines.

Speeding up puts operation to file in tcl

I have to puts large amounts of data to a file in Tcl, and it takes a very long time. I tried increasing the buffer capacity from 4 KB to 1 MB using fconfigure, but noticed no improvement whatsoever.
I am not sure whether I can flush my buffer at intervals, as I'd guess some of my data would be lost if I did so.
Is there some way I could increase the speed of puts without losing any data?
Generally the output speed is going to be limited by your disk drive's speed and your computer system's I/O bandwidth.
Increasing the buffer size is probably the only thing you can do to help.
flush will slow down the write, as it will force-push the write buffer to the operating system.
If your incoming data stream ever pauses, or comes in one big chunk that can fit into memory, you can buffer the incoming data internally and let the writes catch up later.
If your data is coming from another channel (file, socket, whatever) then you can use fcopy to move it across. The fcopy command is careful to work as efficiently as possible, and if you configure both sides (incoming and outgoing) to use binary data transfer — so no encoding conversion or EOL/EOF character processing — then it can do it with minimal data copies; it's as efficient as a user-process level system can copy data (and you'd have to do hackery to move the copy into the OS kernel to do better). Obviously, having to process encoding conversion and transformation of end-of-line markers will slow things down.
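A sketch of that fast path, assuming plain files (the names are just examples); the rb/wb open modes imply binary translation on each side:

set in  [open "input.dat" "rb"]
set out [open "output.dat" "wb"]
fcopy $in $out     ;# bulk copy, no encoding or end-of-line processing
close $in
close $out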
Otherwise, the main bottleneck will still (probably) be the device to which the output is being written. If it is going to a file, moving to writing to an SSD is the simplest option (but not necessarily the cheapest!). When writing over the network, better networking will make a gigantic difference. You really have to identify what the bottleneck really is; if Tcl is spending most of its time waiting for the hardware, there's very little point in working hard to make Tcl faster, as you'll see virtually no results for that work. Fixing hardware bottlenecks is out of the scope of Stack Overflow, though some sister sites might be able to assist.
puts will not lose data unless you do something really evil like doing a force kill (kill -9) on the process, or doing a reset on the location of the file pointer from C code.

How to carry out synchronous processes in TCL

I am trying to carry out two processes in parallel. Help me write code in Tcl that runs two processes at the same time.
In Tcl, there are two ways to run a pair of subprocesses “at the same time”.
Simplest: Without control
If you just want to fire off two processes at once without keeping any control over them, put an ampersand (&) as the last argument to exec:
exec process1 "foo.txt" &
exec process2 "bar.txt" &
Note that, apart from the process ID (returned by exec), you've got no control over these subprocesses at all. Once you set them going, you'll essentially never hear from them again (so using appropriate redirections to/from standard in/out may well be advisable!)
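For instance, redirecting each child's output to a log file (the log names here are just examples); with a trailing &, exec returns the child's process ID immediately:

set pid1 [exec process1 "foo.txt" > "process1.log" 2>@1 &]
set pid2 [exec process2 "bar.txt" > "process2.log" 2>@1 &]
puts "started processes $pid1 and $pid2"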
More complex: With control
To keep control over a subprocess while running it in the background, make it run in a pipeline created with open. The syntax for doing so is rather odd; be sure to follow it exactly (except as noted below):
set pipelineChannel1 [open |[list process1 "foo.txt" ] "r"]
set pipelineChannel2 [open |[list process2 "bar.txt" ] "r"]
These are reader pipelines where you're consuming the output of the subprocesses; that's what the (optional) r means. To get a pipeline that you write to (i.e., that you provide input to) you use w instead, and if you want to both read and write, use r+. The pipelines are then just normal channels that you use with puts, gets, read, fconfigure, etc. Just close when you are done.
The | must come outside and immediately before the [list …]. This matters especially if the name of the command (possibly a full pathname) has any Tcl metacharacters in it, and is because the specification of open says this:
If the first character of fileName is “|” then the remaining characters of fileName are treated as a list of arguments that describe a command pipeline to invoke, in the same style as the arguments for exec.
The main things to beware of when working with a pipeline are:
The processing of the subprocesses really is asynchronous. You need to take care to avoid forcing too much output through at once, though turning on non-blocking IO with fconfigure $channel -blocking 0 is usually enough there (see the sketch after this list).
The other processes can (and frequently do) buffer their output differently when outputting to a pipeline than when they're writing to a terminal. If this is a problem, you'll have to consider whether to use a package like Expect (which can also run multiple interactions at once, though that should be used much more sparingly as virtual terminals are a much more expensive and limited system resource than pipelines).
If you're doing truly complex asynchronous interactions with the subprocesses, consider using Tcl 8.6 where there are Tcllib packages built on top of the base coroutine feature that make keeping track of what's going on much easier.
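To illustrate the non-blocking approach from the first point above, here is a sketch that consumes one reader pipeline a line at a time (process1 is the same example command as before):

set pipe [open |[list process1 "foo.txt"] "r"]
fconfigure $pipe -blocking 0
fileevent $pipe readable {
    if {[gets $pipe line] >= 0} {
        puts "got: $line"     ;# handle one line of subprocess output
    } elseif {[eof $pipe]} {
        close $pipe            ;# closing also reaps the subprocess
        set done 1
    }
}
vwait done    ;# run the event loop until the pipeline hits EOF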

OS development: How to avoid an infinite loop after an exception routine

For some months I've been working on a "home-made" operating system.
Currently, it boots and goes into 32-bit protected mode.
I've loaded the interrupt table, but haven't set up paging (yet).
Now while writing my exception routines I've noticed that when an instruction throws an exception, the exception routine is executed, but then the CPU jumps back to the instruction which threw the exception! This does not apply to every exception (for example, a div by zero exception will jump back to the instruction AFTER the division instruction), but let's consider the following general protection exception:
MOV EAX, 0x8
MOV CS, EAX
My routine is simple: it calls a function that displays a red error message.
The result: MOV CS, EAX fails -> My error message is displayed -> CPU jumps back to MOV CS -> infinite loop spamming the error message.
I've talked about this issue with a teacher in operating systems and unix security.
He told me he knows Linux has a way around it, but he doesn't know which one.
The naive solution would be to parse the throwing instruction from within the routine, in order to get the length of that instruction.
That solution is pretty complex, and I feel a bit uncomfortable adding a call to a relatively heavy function in every affected exception routine...
Therefore, I was wondering if there is another way around the problem. Maybe there's a "magic" register that contains a bit that can change this behaviour?
--
Thank you very much in advance for any suggestion/information.
--
EDIT: It seems many people wonder why I want to skip over the problematic instruction and resume normal execution.
I have two reasons for this:
First of all, killing a process would be a possible solution, but not a clean one. That's not how it's done in Linux, for example, where (AFAIK) the kernel sends a signal (I think SIGSEGV) but does not immediately break execution. It makes sense, since the application can block or ignore the signal and resume its own execution. It's a very elegant way to tell the application it did something wrong IMO.
Another reason: what if the kernel itself performs an illegal operation? Could be due to a bug, but could also be due to a kernel extension. As I've stated in a comment: what should I do in that case? Shall I just kill the kernel and display a nice blue screen with a smiley?
That's why I would like to be able to jump over the instruction. "Guessing" the instruction size is obviously not an option, and parsing the instruction seems fairly complex (not that I mind implementing such a routine, but I need to be sure there is no better way).
Different exceptions have different causes. Some exceptions are normal, and the exception only tells the kernel what it needs to do before allowing the software to continue running. Examples of this include a page fault telling the kernel it needs to load data from swap space, an undefined instruction exception telling the kernel it needs to emulate an instruction that the CPU doesn't support, or a debug/breakpoint exception telling the kernel it needs to notify a debugger. For these it's normal for the kernel to fix things up and silently continue.
Some exceptions indicate abnormal conditions (e.g. that the software crashed). The only sane way of handling these types of exceptions is to stop running the software. You may save information (e.g. core dump) or display information (e.g. "blue screen of death") to help with debugging, but in the end the software stops (either the process is terminated, or the kernel goes into a "do nothing until user resets computer" state).
Ignoring abnormal conditions just makes it harder for people to figure out what went wrong. For example, imagine instructions to go to the toilet:
enter bathroom
remove pants
sit
start generating output
Now imagine that step 2 fails because you're wearing shorts (a "can't find pants" exception). Do you want to stop at that point (with a nice easy to understand error message or something), or ignore that step and attempt to figure out what went wrong later on, after all the useful diagnostic information has gone?
If I understand correctly, you want to skip the instruction that caused the exception (e.g. mov cs, eax) and continue executing the program at the next instruction.
Why would you want to do this? Normally, shouldn't the rest of the program depend on the effects of that instruction being successfully executed?
Generally speaking, there are three approaches to exception handling:
Treat the exception as an unrepairable condition and kill the process. For example, division by zero is usually handled this way.
Repair the environment and then execute the instruction again. For example, page faults are sometimes handled this way.
Emulate the instruction using software and skip over it in the instruction stream. For example, complicated arithmetic instructions are sometimes handled this way.
What you're seeing is characteristic of the General Protection Exception. The Intel System Programming Guide clearly states (6.15 Exception and Interrupt Reference / Interrupt 13 - General Protection Exception (#GP)):
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the exception.
Therefore, you need to write an exception handler that will skip over that instruction (which would be kind of weird), or just simply kill the offending process with "General Protection Exception at $SAVED_EIP" or a similar message.
I can imagine a few situations in which one would want to respond to a GPF by parsing the failed instruction, emulating its operation, and then returning to the instruction after. The normal pattern would be to set things up so that the instruction, if retried, would succeed, but one might e.g. have some code that expects to access some hardware at addresses 0x000A0000-0x000AFFFF and wish to run it on a machine that lacks such hardware. In such a situation, one might not want to ever bank in "real" memory in that space, since every single access must be trapped and dealt with separately. I'm not sure whether there's any way to handle that without having to decode whatever instruction was trying to access that memory, although I do know that some virtual-PC programs seem to manage it pretty well.
Otherwise, I would suggest that you have, for each thread, a jump vector to be used when the system encounters a GPF. Normally that vector should point to a thread-exit routine, but code which was about to do something "suspicious" with pointers could set it to an error handler suitable for that code (the code should unset the vector when leaving the region where the error handler would have been appropriate).
I can imagine situations where one might want to emulate an instruction without executing it, and cases where one might want to transfer control to an error-handler routine, but I can't imagine any where one would want to simply skip over an instruction that would have caused a GPF.

Associative cache simulation - Dealing with a Faulty Scheme

While working on simulating a fully associative cache (in MIPS assembly), a couple of questions came to mind based on some information read online;
According to some notes from the University of Maryland
Finding a slot: At most, one slot should match. If there is more than one slot that matches, then you have a faulty fully-associative cache scheme. You should never have more than one copy of the cache line in any slot of a fully-associative cache. It's hard to maintain multiple copies, and doesn't make sense. The slots could be used for other cache lines.
Does that mean I should check the whole tag list every time in order to look for a second match? After all, if I don't, I will never "realize" the fault with the cache; yet checking every single time seems quite inefficient.
In case I do check, and somehow manage to find a second match (meaning a faulty cache scheme), what should I do then? The best answer would be to fix my implementation, but I'm interested in how to handle it during execution if this situation should arise.
If more than one valid slot matches an address, then that means that when a previous search for the same address was executed, either a valid slot that should have matched was not used (perhaps because it was not checked in the first place), or more than one invalid slot was used to store a line that wasn't in the cache at all.
Without a doubt, this should be considered a bug.
But if we've just decided not to fix the bug (maybe we'd rather not commit that much hardware to a better implementation) the most obvious option is to pick one of the slots to invalidate. It will then be available for other cache lines.
As for how to pick which one to invalidate: if one of the duplicate lines is clean, invalidate it in preference to a dirty cache line. If more than one cache line is dirty and they disagree, you have an even bigger bug to fix, but at any rate your cache is out of sync and it probably doesn't matter which you pick.
Edit: here's how I might implement hardware to do this:
First off, it doesn't make a whole lot of sense to start with the assumption of duplicates; rather, we'll work around that at the appropriate time later. There are a few possibilities for what must happen when caching a new line:
The line is already in the cache; no action is needed.
The line is not in the cache, but there are invalid slots available: place the new line into one of the available slots.
The line is not in the cache and there are no invalid slots available: another valid line must be evicted and the new line takes its place.
Picking an eviction candidate has performance consequences. Clean cache lines can be evicted for free, but a poor choice can cause another cache miss in the near future. Consider the case where all but one cache line is dirty: if only the clean cache line is ever evicted, then sequential reads alternating between two addresses will cause a cache miss on every read. Cache invalidation is among the two hard problems in Comp Sci (the other being 'naming things') and out of the scope of this exact question.
I would probably implement a search that checks for the correct slot to act on for each of these. Then another block would pick the first from that list and act on it.
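Since the surrounding question is about a software simulation anyway, here is a rough sketch of that search in plain Tcl rather than MIPS, purely to pin down the logic (the slot count and the evict-slot-0 policy are arbitrary placeholders):

proc cacheLookup {slotsVar tag} {
    upvar 1 $slotsVar slots
    set firstInvalid -1
    for {set i 0} {$i < [llength $slots]} {incr i} {
        set s [lindex $slots $i]
        if {[dict get $s valid]} {
            if {[dict get $s tag] == $tag} { return [list hit $i] }
        } elseif {$firstInvalid < 0} {
            set firstInvalid $i               ;# remember the first free slot
        }
    }
    if {$firstInvalid >= 0} {
        lset slots $firstInvalid [dict create valid 1 tag $tag]
        return [list miss $firstInvalid]
    }
    lset slots 0 [dict create valid 1 tag $tag]   ;# stand-in eviction policy
    return [list evict 0]
}
set cache [lrepeat 4 [dict create valid 0 tag 0]]
puts [cacheLookup cache 26]   ;# prints "miss 0" on an empty cache

A hit is always detected before any new slot is claimed, which is why a correct search never creates duplicates.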
Now, getting back to the question: under what conditions could duplicates possibly enter the cache? If memory accesses are strictly ordered, and the implementation (as above) is correct, I don't think duplicates are possible at all. And thus there's no need to check for them.
Now let's consider a more implausible case where a single cache is shared across two CPU cores. We're going to just do the simplest thing that could work and duplicate everything except the cache memory itself for each core, so the slot-searching hardware is not shared. To support this, an extra bit per slot is used as a mutex: the search hardware cannot use a slot that is locked by the other core. Specifically:
If the address is in the cache, try to lock the slot and return that slot. If the slot is already locked, stall until it is free.
If the address is not in the cache, find an unlocked slot that is invalid or valid but evictable.
In this case we actually can end up in a position where two slots share the same address. If both cores try to write to an address that is not in the cache, they will end up getting different slots, and a duplicate line will occur. First let's think about what could happen:
Both lines were reads from main memory. They will be the same value and they will both be clean. It is correct to evict either.
Both lines were writes. Both will be dirty, but probably not equal. This is a race condition that should have been resolved by the application by issuing memory fences or some other memory-ordering instructions. We cannot guess which one should be used; if there were no cache, the race condition would persist into RAM. It is correct to evict either.
One line was a read and one was a write. The write is dirty but the read is clean. Once again this race condition would have persisted into RAM if there were no intervening cache, but the reader could have seen a different value. Evicting the clean line is right by RAM, and also has the side effect of always favoring read-then-write ordering.
So now we know what to do about it, but where does this logic belong? First let's think about what could happen if we don't do anything: a subsequent cache access for the same address on either core could return either line. Even if neither core is issuing writes, reads could keep coming up different, alternating between the two values. This breaks every conceivable idea about memory ordering.
One solution might be to just say that dirty lines belong to one core only: from the other core's point of view, the line is not merely dirty, but dirty and owned by another core.
In the case of two concurrent reads, both lines are identical, unlocked and interchangeable. It doesn't matter which line a core gets for subsequent operations.
In the case of concurrent writes, both lines are out of sync but mutually invisible. Although the race condition that this creates is unfortunate, it still leads to a reasonable memory ordering, as if all of the operations that happened on the discarded line happened before any of the operations on the cleaned line.
If a read and a write happen concurrently, the dirty line is invisible to the reading core. However, the clean line is visible to both cores, and would cause memory ordering to break down for the writer. Future writes could even cause it to lock both (because both would be dirty).
That last case pretty much dictates that dirty lines be preferred to clean ones. This forces at least some extra hardware to look for dirty lines first, and clean lines only if no dirty lines were found. So now we have a new concurrent cache implementation:
If the address is in the cache and dirty and owned by the requesting core, use that slot.
If the address is in the cache but clean:
for reads, just use that slot;
for writes, mark the slot as dirty and use that slot.
If the address is not in the cache and there are invalid slots, use an invalid slot.
If there are no invalid slots, evict a line and use that slot.
We're getting closer, but there's still a hole in the implementation: what if both cores access the same address, but not concurrently? The simplest thing is probably to just say that dirty lines are really invisible to other cores; being in the cache but dirty is the same as not being in the cache at all.
Now all we have to think about is actually providing the tool for applications to synchronize. I'd probably provide a tool that just explicitly flushes a line if it is dirty. This would invoke the same hardware that is used during eviction, but mark the line as clean instead of invalid.
To make a long post short, the idea is to deal with the duplicates not by removing them, but by making sure they cannot lead to further memory ordering issues, and leaving the deduplication work to the application or eventual eviction.