Sunday, October 2, 2011

Techniques of Computer Virus Detection



In its simplest form, an antivirus scanner is a program that looks for sequences of bytes extracted from computer viruses, in files and in memory, in order to detect them. This is one of the most popular methods of detecting computer viruses, and it is reasonably effective. Nowadays, state-of-the-art antivirus software uses many more sophisticated techniques to detect complex viruses that first-generation scanners alone cannot handle.


(1) String scanning 

String scanning uses an extracted sequence of bytes (a string) that is typical of a particular virus but unlikely to be found in clean programs. The strings extracted from computer viruses are organized into databases, which the scanning engine uses to search predefined areas of files and system areas systematically, so that it can detect viruses within the limited time allowed for scanning. Indeed, one of the most challenging tasks of an antivirus scanning engine is to use this limited time (typically no more than a couple of seconds per file) as wisely as possible.
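
A minimal sketch of plain string scanning in Python, assuming the file (or memory area) has already been read into a bytes object. The signature names and byte strings below are hypothetical; a real engine stores a large database of strings and restricts the search to predefined areas rather than checking every byte of every file.

```python
# Hypothetical signature database: virus name -> byte string.
# The names and byte sequences are made up for illustration only.
SIGNATURES: dict[str, bytes] = {
    "Demo.A": bytes.fromhex("DEAD1984BEEF"),
    "Demo.B": bytes.fromhex("CD21B44C"),
}


def scan_buffer(data: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures found anywhere in data."""
    found = []
    for name, pattern in signatures.items():
        # 'in' performs an exact substring search on bytes objects
        if pattern in data:
            found.append(name)
    return found


if __name__ == "__main__":
    sample = b"\x90\x90" + bytes.fromhex("CD21B44C") + b"\x00"
    print(scan_buffer(sample, SIGNATURES))  # ['Demo.B']
```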

(2) Wildcards


A wildcard scan allows the scanner to skip bytes or byte ranges, and some scanners also support regular expressions.

To illustrate how a wildcard scan works, here is a simple example.


The string below is a made-up virus signature:


DEAD 1984 ??69 %2 BE99 


(1) Try matching DE, if found continue.
(2) Try matching AD, if found continue.
(3) Try matching 19, if found continue.
(4) Try matching 84, if found continue.
(5) Skip the next byte, whatever its value (the ?? wildcard).
(6) Try matching 69, if found continue.
(7) Try to match BE in any of the next 3 positions (%2 allows up to 2 bytes to be skipped), and if found continue.
(8) Try matching 99, if found report infection.
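
The steps above can be sketched by translating the signature into a Python regular expression over bytes and letting the `re` module do the matching. This is only one possible implementation: the notation (`??` for one arbitrary byte, `%N` for up to N skipped bytes, groups separated by spaces) is inferred from the example, and real scanners define their own syntax and usually use dedicated matchers rather than regular expressions.

```python
import re


def compile_wildcard(signature: str) -> re.Pattern:
    """Translate a wildcard signature into a compiled bytes regex.

    Assumed syntax (modelled on the example above), with groups
    separated by spaces:
      DE  -> the literal byte 0xDE
      ??  -> any single byte
      %N  -> 0..N arbitrary bytes, so the next byte may appear
             in any of the following N + 1 positions
    """
    parts = []
    for group in signature.split():
        if group.startswith("%"):
            parts.append(b".{0,%d}" % int(group[1:]))
            continue
        # A group is a run of two-character byte tokens, e.g. "??69"
        for i in range(0, len(group), 2):
            token = group[i:i + 2]
            if token == "??":
                parts.append(b".")
            else:
                parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL lets '.' match every byte value, including 0x0A
    return re.compile(b"".join(parts), re.DOTALL)


def wildcard_scan(data: bytes, signature: str) -> bool:
    """Report whether the wildcard signature occurs anywhere in data."""
    return compile_wildcard(signature).search(data) is not None


if __name__ == "__main__":
    sig = "DEAD 1984 ??69 %2 BE99"
    # Two bytes skipped before BE: still a match
    print(wildcard_scan(bytes.fromhex("DEAD198400691122BE99"), sig))  # True
    # Three bytes skipped before BE: outside the %2 range
    print(wildcard_scan(bytes.fromhex("DEAD19840069112233BE99"), sig))  # False
```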


Wildcards are often supported at the level of nibbles (half-bytes) as well, which allows more precise matching of instruction groups. Some early-generation encrypted and even polymorphic viruses can be detected easily with wildcard-based strings.
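
A nibble wildcard constrains only half of a byte. For instance, the token `B?` matches any byte from B0 to BF, which in x86 code covers every "MOV register, immediate" opcode. The sketch below assumes a hypothetical token syntax of two hex characters per byte, where `?` in either position matches any value for that nibble; it is an illustration, not the format of any particular scanner.

```python
def nibble_match_at(data: bytes, pos: int, tokens: list[str]) -> bool:
    """Check whether the nibble-wildcard tokens match data at offset pos.

    Each token is two hex characters; '?' in either position is a
    wildcard for that nibble only (e.g. 'B?' matches 0xB0-0xBF).
    """
    if pos + len(tokens) > len(data):
        return False
    for offset, token in enumerate(tokens):
        byte = data[pos + offset]
        high, low = token[0], token[1]
        if high != "?" and (byte >> 4) != int(high, 16):
            return False
        if low != "?" and (byte & 0x0F) != int(low, 16):
            return False
    return True


def nibble_scan(data: bytes, signature: str) -> int:
    """Return the offset of the first match, or -1 if there is none."""
    tokens = signature.split()
    for pos in range(len(data) - len(tokens) + 1):
        if nibble_match_at(data, pos, tokens):
            return pos
    return -1


if __name__ == "__main__":
    # Hypothetical signature: any MOV reg, imm opcode (B0-BF), then 4C,
    # then CD 21 (INT 21h). B4 4C CD 21 is MOV AH,4Ch / INT 21h.
    sig = "B? 4C CD 21"
    print(nibble_scan(bytes.fromhex("9090B44CCD21"), sig))  # 2
    print(nibble_scan(bytes.fromhex("9090A44CCD21"), sig))  # -1
```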


(3) Mismatches
Mismatches allow any N bytes of the string to take arbitrary values, regardless of their position in the string.
For example, the string "01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10" with a mismatch value of 2 would match any of the following patterns:

01 02 03 04 05 06 AA 08 BB 0A 0B 0C 0D 0E 0F 10
01 CC 03 04 05 06 07 DD 09 0A 0B 0C 0D 0E 0F 10
EE 02 03 04 05 FF 07 08 09 0A 0B 0C 0D 0E 0F 10

Mismatches are especially useful for creating generic detections that cover a whole family of computer viruses. The downside of this technique is that mismatch scanning is rather slow.
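
A minimal, naive sketch of mismatch scanning, using the example string and patterns above. In this version every offset is compared byte by byte, because any byte of the string may differ and so none can serve as a fixed anchor for a faster search. The function names are illustrative only.

```python
def mismatch_match_at(data: bytes, pos: int, pattern: bytes, max_mismatches: int) -> bool:
    """True if pattern matches data at pos with at most max_mismatches differing bytes."""
    mismatches = 0
    for i, expected in enumerate(pattern):
        if data[pos + i] != expected:
            mismatches += 1
            if mismatches > max_mismatches:
                return False  # give up early once the budget is exceeded
    return True


def mismatch_scan(data: bytes, pattern: bytes, max_mismatches: int) -> int:
    """Return the first offset where pattern matches within the mismatch budget, or -1."""
    for pos in range(len(data) - len(pattern) + 1):
        if mismatch_match_at(data, pos, pattern, max_mismatches):
            return pos
    return -1


if __name__ == "__main__":
    signature = bytes(range(0x01, 0x11))  # 01 02 ... 0F 10
    samples = [
        "01 02 03 04 05 06 AA 08 BB 0A 0B 0C 0D 0E 0F 10",
        "01 CC 03 04 05 06 07 DD 09 0A 0B 0C 0D 0E 0F 10",
        "EE 02 03 04 05 FF 07 08 09 0A 0B 0C 0D 0E 0F 10",
    ]
    for hex_text in samples:
        print(mismatch_scan(bytes.fromhex(hex_text), signature, 2))  # 0 each time
```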

(4) Hashing
Hashing is a common technique for speeding up the search. It can be done on 16-bit or 32-bit words of the scan string. Antivirus researchers can control the hashing even better by choosing start bytes for their strings that are not common in normal files; zero bytes (0x00), for example, are best avoided. With further effort, they can select strings that start with the same common bytes, which effectively reduces the number of matches the scanner has to attempt.

To be extremely fast, some scanners do not support wildcards at all. A rather "exotic" hashing scheme does allow wildcards in the string, but it uses two hash tables and a corresponding linked list of strings; the first table usually contains index bits into the second table.

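
A minimal sketch of hash-based string lookup, assuming a 16-bit (two-byte) hash key and Python's built-in dict as the hash table. Each position in the file costs one table lookup, and full string comparisons happen only for the few signatures in the matching bucket. The signatures are made up, and the two-table wildcard scheme described above is not shown.

```python
from collections import defaultdict

# Hypothetical signatures; Demo.A and Demo.B deliberately share the
# same first 16-bit word so they land in the same bucket.
SIGNATURES: dict[str, bytes] = {
    "Demo.A": bytes.fromhex("E80000005D"),
    "Demo.B": bytes.fromhex("E800005B"),
    "Demo.C": bytes.fromhex("CD21B44C"),
}


def build_index(signatures: dict[str, bytes]) -> dict[bytes, list[tuple[str, bytes]]]:
    """Bucket every signature by its first 16-bit word (two bytes)."""
    index: dict[bytes, list[tuple[str, bytes]]] = defaultdict(list)
    for name, sig in signatures.items():
        if len(sig) < 2:
            raise ValueError(f"{name}: signature must be at least 2 bytes long")
        index[sig[:2]].append((name, sig))
    return dict(index)


def hashed_scan(data: bytes, index: dict[bytes, list[tuple[str, bytes]]]) -> list[tuple[int, str]]:
    """Return (offset, name) for every signature found in data."""
    hits = []
    for pos in range(len(data) - 1):
        # One hash lookup per position instead of one comparison per signature
        candidates = index.get(data[pos:pos + 2])
        if not candidates:
            continue
        for name, sig in candidates:
            # Full comparison only for strings in the matching bucket
            if data.startswith(sig, pos):
                hits.append((pos, name))
    return hits


if __name__ == "__main__":
    idx = build_index(SIGNATURES)
    sample = b"\x90" + bytes.fromhex("E80000005D") + bytes.fromhex("CD21B44C")
    print(hashed_scan(sample, idx))  # [(1, 'Demo.A'), (6, 'Demo.C')]
```
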
(5) Start and End Scanning

Start and end scanning (also known as top-and-tail scanning) speeds up virus detection by scanning only the beginning and the end of a file rather than the entire file. For example, every possible position within the first and last 2 KB, 4 KB, or even 8 KB of the file is checked. As CPUs have become faster, scanning has typically become I/O bound, so optimizing scan speed means reducing the number of disk reads. Because early viruses prepended themselves to, appended themselves to, or replaced their host files, the start and end scanning method became fairly popular in the early days.
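
A minimal sketch of a top-and-tail scan, assuming plain substring matching and a 4 KB window (the window size is a parameter here, not a recommendation). Regardless of file size, it performs at most two reads and one seek, which is what matters when scanning is I/O bound. A string placed in the middle of a large file is missed; that is the trade-off the technique accepts. The signatures in the demo are hypothetical.

```python
import os
import tempfile


def top_tail_regions(path: str, window: int = 4096) -> list[bytes]:
    """Read only the first and last `window` bytes of a file.

    Small files (up to 2 * window bytes) are read whole, since the
    top and tail regions would overlap anyway.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * window:
            return [f.read()]
        head = f.read(window)
        f.seek(size - window)  # jump straight to the tail region
        tail = f.read(window)
    return [head, tail]


def top_tail_scan(path: str, signatures: dict[str, bytes], window: int = 4096) -> list[str]:
    """Return names of signatures found in the top or tail region."""
    regions = top_tail_regions(path, window)
    return [name for name, sig in signatures.items()
            if any(sig in region for region in regions)]


if __name__ == "__main__":
    # Hypothetical signatures placed at the start, middle and end of a file
    sigs = {"Head": b"\xde\xad\xbe\xef", "Mid": b"\xca\xfe\xba\xbe", "Tail": b"\x13\x37\x13\x37"}
    payload = (b"\x00" * 100 + sigs["Head"] + b"\x00" * 10000 + sigs["Mid"]
               + b"\x00" * 10000 + sigs["Tail"] + b"\x00" * 50)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        # Prints ['Head', 'Tail']: the string in the middle is never read
        print(top_tail_scan(path, sigs, window=4096))
    finally:
        os.remove(path)
```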
