From d4975136be48f6b791bd7bd007ae811b798bd3a7 Mon Sep 17 00:00:00 2001
From: "Markus F.X.J. Oberhumer"
Date: Mon, 18 Dec 2000 08:44:25 +0000
Subject: [PATCH] *** empty log message ***

committer: mfx 977129065 +0000
---
 doc/filter.txt | 148 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 doc/filter.txt

diff --git a/doc/filter.txt b/doc/filter.txt
new file mode 100644
index 00000000..c4b76057
--- /dev/null
+++ b/doc/filter.txt
@@ -0,0 +1,148 @@
+This document explains the concept of "filtering" in UPX. Basically,
+filtering is a data preprocessing method which can improve the
+compression ratio of the files UPX processes.
+
+Currently the filters UPX uses are all based on one very special
+algorithm which works well on ix86 executable files.
+This is what UPX calls the "naive" implementation. There is also a
+"clever" method which works only with 32-bit executable file formats
+and was first implemented in UPX.
+
+Let's start with an example (from this point on I assume a 32-bit file
+format). Consider this code fragment:
+
+00025970: E877410600         calln   FatalError
+00025975: 8B414C             mov     eax,[ecx+4C]
+00025978: 85C0               test    eax,eax
+0002597A: 7419               je      file:00025995
+0002597C: 85F6               test    esi,esi
+0002597E: 7504               jne     file:00025984
+00025980: 89C6               mov     esi,eax
+00025982: EB11               jmps    file:00025995
+00025984: 39C6               cmp     esi,eax
+00025986: 740D               je      file:00025995
+00025988: 83C4F4             add     (d) esp,F4
+0002598B: 68A0A91608         push    0816A9A0
+00025990: E857410600         calln   FatalError
+00025995: FF45F4             inc     [ebp-0C]
+
+Here you can find two calls to a function called "FatalError". As you
+probably know, the compression ratio is better if the compressor engine
+finds longer sequences of repeated strings. In this case the engine
+sees the following two byte sequences:
+
+E877 410600 8B and
+E857 410600 FF.
+
+So it can find a 3-byte-long match.
+
+Now comes the trick.
+On ix86, near calls are encoded as 0xE8 followed by a 32-bit relative
+offset to the destination address. Let's see what happens if the
+position of the call is added to that offset:
+
+0x64177 + 0x25970 = 0x89AE7
+0x64157 + 0x25990 = 0x89AE7
+
+E8 E79A0800 8B
+E8 E79A0800 FF
+
+As you can see, the compressor engine now finds a 5-byte-long match,
+which means that we've just saved 2 bytes of compressed data. Not bad.
+
+So this is the basic idea (the "naive" implementation). All we have to
+do is "filter" the uncompressed data using this method before
+compression, and "unfilter" it after decompression. Simply go over the
+memory, find 0xE8 bytes and process the next 4 bytes as specified above.
+
+Of course there are several ways in which this scheme could be
+improved. First, not only calls could be handled this way - near jumps
+(0xE9 + 32-bit offset) could work similarly.
+
+A second improvement would be to limit this filtering to the
+area occupied by real code - there is no point in messing with general
+data.
+
+Another improvement comes if the byte order of the 32-bit offset is
+reversed. Why? Here is another call which follows the above fragment:
+
+000261FA: E8C9390600         calln   ErrorF
+
+0x639C9 + 0x261FA = 0x89BC3
+
+E8 C39B 0800 compare this with
+
+E8 E79A 0800
+
+As you can see, these two functions are quite close together, but the
+compressor is not able to utilize this information (2-byte-long matches
+are usually not useful) unless the byte order of the offsets is
+reversed. In this case:
+
+E8 0008 9AE7
+
+E8 0008 9BC3
+
+So the compressor engine finds a 3-byte-long match here. This is a
+nice improvement - now the engine utilizes the similarity of nearby
+destinations too.
+
+This is nice, but what happens when we find a "fake" call - i.e. a 0xE8
+which is part of another instruction? Like this:
+
+0002A3B1: C745 E8 00000000   mov     [ebp-18],00000000
+
+In this case those nice 0x00 bytes are overwritten with some less
+compressible data.
+This is the disadvantage of the "naive" implementation.
+
+So let's be clever and try to detect and process only "real" calls. In
+UPX a simple method is used to find these calls. We simply check that
+the destinations of these calls are inside the same area as the calls
+themselves (so the above code is still a false positive, but it helps
+generally). A better method would be to actually disassemble the code -
+contributions are welcome :-)
+
+But this is only half of the job. We cannot simply process one call
+and skip another - the unfiltering process needs some information
+to be able to reverse the filtering.
+
+UPX uses the following idea, which works nicely. First we assume that
+the size of the area that should be filtered is less than 16MB. Then
+UPX scans over this area and keeps a record of the bytes that
+follow the 0xE8 bytes. If we are lucky, there will be byte values that
+were never found following 0xE8. These values are our candidates to be
+used as markers.
+
+Do you still remember that we assumed that the size of the scanned area
+is less than 16MB? Well, this means that when we process a real call,
+the resulting offset will be at most 0x00FFFFFF. So the MSB is
+always 0x00, which is a nice place to store our marker. Of course we
+should reverse the byte order in the resulting offset - so this marker
+will appear just after the 0xE8 byte and not 4 bytes after it.
+
+That's all. Just go over the memory area, identify the "real" calls,
+and use this method to mark them. Then the job of the unfilter is very
+easy - it just searches for a 0xE8 + marker sequence and does the
+unfiltering if it finds one. It's clever, isn't it? :)
+
+To tell you the truth, it's not this simple in UPX. It can use an
+additional parameter ("add_value") which makes things a little bit more
+complicated (for example it can happen that a found marker proves to
+be unusable because of some overflow during an addition).
+
+And the whole algorithm is optimized for simplicity on the unfiltering
+side (assembly that is as short and as fast as possible - see
+stub/macros.ash), which makes the filtering process a little more
+difficult (fcto_ml.ch, fcto_ml2.ch, filteri.cpp).
+
+As can be seen in filteri.cpp, there are lots of variants of this
+filtering implemented - naive/clever, calls/jumps/calls&jumps,
+reversed/unreversed offsets - a total of 18 slightly different filters
+(and another 9 variants for 16-bit programs).
+
+You can select one of them using the command line parameter "--filter="
+or try most of them with "--all-filters". Or just let UPX use the one
+we defined as the default for that executable format.
+
+EOF
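
The "naive" filter described in doc/filter.txt above can be sketched in a
few lines of Python. This is only an illustration and is not taken from
the UPX sources (filteri.cpp and the stub assembly differ in many
details). The names filter_calls, unfilter_calls, base and reverse are
made up for this example, only 0xE8 calls are handled, and the sketch
assumes that a processed call is skipped as a whole (5 bytes) so that
filter and unfilter visit exactly the same positions.

```python
# Hypothetical sketch of the "naive" call filter; not UPX's actual code.

def filter_calls(data: bytes, base: int = 0, reverse: bool = True) -> bytes:
    """Add the position of each 0xE8 byte to the 32-bit value after it.

    base    -- assumed address of data[0] (illustrative parameter)
    reverse -- store the result most significant byte first
    """
    out = bytearray(data)
    order = "big" if reverse else "little"
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            rel = int.from_bytes(out[i + 1:i + 5], "little")
            value = (rel + base + i) & 0xFFFFFFFF   # wrap like 32-bit math
            out[i + 1:i + 5] = value.to_bytes(4, order)
            i += 5      # skip the bytes just rewritten
        else:
            i += 1
    return bytes(out)


def unfilter_calls(data: bytes, base: int = 0, reverse: bool = True) -> bytes:
    """Inverse of filter_calls(): subtract the position again."""
    out = bytearray(data)
    order = "big" if reverse else "little"
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            value = int.from_bytes(out[i + 1:i + 5], order)
            rel = (value - base - i) & 0xFFFFFFFF
            out[i + 1:i + 5] = rel.to_bytes(4, "little")
            i += 5
        else:
            i += 1
    return bytes(out)


# The 40-byte code fragment from the example, located at 0x25970.
FRAGMENT = bytes.fromhex(
    "E877410600 8B414C 85C0 7419 85F6 7504 89C6 EB11 39C6 740D "
    "83C4F4 68A0A91608 E857410600 FF45F4"
)

filtered = filter_calls(FRAGMENT, base=0x25970)
print(filtered[0:5].hex())     # first call  -> e800089ae7
print(filtered[32:37].hex())   # second call -> e800089ae7
```

With the fragment from the example, both calls to FatalError turn into
the same five bytes (E8 0008 9AE7), and unfilter_calls() with the same
base restores the original 40 bytes. A fake call such as the "mov
[ebp-18],00000000" above would be rewritten as well, which is exactly
the weakness of the naive variant.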