In its simplest form, an antivirus scanner is a program that looks for sequences of bytes extracted from computer viruses, in files and in memory, in order to detect them. This is one of the most popular methods of detecting computer viruses, and it is reasonably effective. Nowadays, state-of-the-art antivirus software uses many more sophisticated techniques to detect complex viruses that first-generation scanners alone cannot handle.
(1) String scanning
String scanning uses an extracted sequence of bytes (a string) that is typical of the
virus but is not likely to be found in clean programs. The sequences extracted from
computer viruses are then organized in databases, which the virus scanning
engines use to search predefined areas of files and system areas systematically
to detect the viruses in the limited time allowed for the scanning. Indeed, one
of the most challenging tasks of the antivirus scanning engine is to use this
limited time (typically no more than a couple of seconds per file) wisely enough
to succeed.
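As a minimal sketch of the idea above, the following Python snippet searches a buffer for every entry in a small signature database. The signature names and byte values are made up for illustration, and `scan_region` is a hypothetical helper, not the API of any real scanner.

```python
# Hypothetical signature database: name -> byte sequence extracted from a virus.
SIGNATURES = {
    "Demo.A": bytes.fromhex("DEAD1984"),
    "Demo.B": bytes.fromhex("0BADF00D4242"),
}


def scan_region(data, signatures):
    """Return {name: offset} for every signature found in the given bytes."""
    found = {}
    for name, pattern in signatures.items():
        offset = data.find(pattern)  # plain substring search
        if offset != -1:
            found[name] = offset
    return found


clean = b"MZ" + bytes(64)
infected = clean + bytes.fromhex("DEAD1984") + bytes(8)
print(scan_region(clean, SIGNATURES))
print(scan_region(infected, SIGNATURES))
```

A real engine would search only predefined areas of the file instead of the whole buffer, which is how it stays within the time allowed per file.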
(2) Wildcards
A wildcard scan allows the scanner to skip bytes or byte ranges, and some scanners
also allow regular expressions.
The following simple illustration shows how a wildcard scan works.
The string below is a pseudo virus signature:
DEAD 1984 ??69 %2 BE99
(1) Try matching DE, if found continue.
(2) Try matching AD, if found continue.
(3) Try matching 19, if found continue.
(4) Try matching 84, if found continue.
(5) Ignore this byte.
(6) Try matching 69, if found continue.
(7) Try to match BE in any of the following 3 positions and if found continue.
(8) Try matching 99, if found report infection.
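The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: `??` is taken to mean "any single byte", and `%2` is read as "up to 2 arbitrary bytes may come before the next byte", which yields the 3 candidate positions of step (7). The function names are hypothetical.

```python
def parse_signature(text):
    """Turn 'DEAD 1984 ??69 %2 BE99' into a list of (kind, value) operations."""
    ops = []
    for group in text.split():
        if group.startswith("%"):
            # Assumption: %n lets the next byte sit up to n bytes further on.
            ops.append(("skip_upto", int(group[1:])))
            continue
        for i in range(0, len(group), 2):
            pair = group[i:i + 2]
            if pair == "??":
                ops.append(("any", None))           # ignore this byte
            else:
                ops.append(("byte", int(pair, 16)))  # must match exactly
    return ops


def match_at(data, pos, ops, k=0):
    """Check whether ops[k:] match data starting at offset pos."""
    while k < len(ops):
        kind, value = ops[k]
        if kind == "skip_upto":
            # Try the rest of the signature at each allowed position.
            return any(match_at(data, pos + gap, ops, k + 1)
                       for gap in range(value + 1))
        if pos >= len(data):
            return False
        if kind == "byte" and data[pos] != value:
            return False
        pos += 1
        k += 1
    return True


def scan(data, signature):
    """Return the first offset where the signature matches, or -1."""
    ops = parse_signature(signature)
    for start in range(len(data)):
        if match_at(data, start, ops):
            return start
    return -1


SIGNATURE = "DEAD 1984 ??69 %2 BE99"
sample = bytes.fromhex("90 90 DE AD 19 84 77 69 00 00 BE 99 90")
print(scan(sample, SIGNATURE))
```

In `sample`, the byte `77` is skipped by `??`, and `BE` is found in the third allowed position after `69`.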
Wildcard strings are often supported for half-bytes (nibbles) as well, which allows more precise matches of instruction groups. Some early-generation encrypted and even polymorphic viruses can be detected easily with wildcard-based strings.
(3) Mismatches
Mismatches allow N bytes in the string to be any value, regardless of their position in the string.
For example, the string "01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10" with a mismatch value of 2 would match any of the following patterns:
01 02 03 04 05 06 AA 08 BB 0A 0B 0C 0D 0E 0F 10
01 CC 03 04 05 06 07 DD 09 0A 0B 0C 0D 0E 0F 10
EE 02 03 04 05 FF 07 08 09 0A 0B 0C 0D 0E 0F 10
Mismatches are especially useful in creating better generic detections for a family of computer viruses.
The downside of this technique is that it is a rather slow scanning algorithm.
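A minimal sketch of mismatch scanning, assuming a straightforward sliding-window comparison (the function names are hypothetical), could look like this in Python:

```python
def within_mismatches(window, pattern, max_mismatches):
    """True if window differs from pattern in at most max_mismatches bytes."""
    if len(window) != len(pattern):
        return False
    mismatches = 0
    for got, expected in zip(window, pattern):
        if got != expected:
            mismatches += 1
            if mismatches > max_mismatches:
                return False  # give up early once the budget is exceeded
    return True


def scan(data, pattern, max_mismatches):
    """Return the first offset where pattern matches within the budget, or -1."""
    size = len(pattern)
    for start in range(len(data) - size + 1):
        if within_mismatches(data[start:start + size], pattern, max_mismatches):
            return start
    return -1


PATTERN = bytes.fromhex("0102030405060708090A0B0C0D0E0F10")
variant = bytes.fromhex("010203040506AA08BB0A0B0C0D0E0F10")
print(scan(b"\x90\x90" + variant, PATTERN, 2))
```

Every window has to be compared byte by byte because a mismatch may occur anywhere, including the first byte. This is the reason the technique is slow.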
(4) Hashing
Hashing is a common technique for speeding up search algorithms. It can be done on 16-bit or 32-bit words of the scan string. Antivirus software can control the hashing even better by being selective about the start bytes of each string, for example by avoiding first bytes that are common in normal files, such as 0x00. With further effort, strings can be chosen so that many of them start with the same bytes, which effectively reduces the number of matches that have to be attempted. To be extremely fast, some scanners do not support wildcards at all. A very "exotic" hash table allows wildcards in the string, but it uses two hash tables and a corresponding linked list of strings. The first table usually contains index bits into the second table.
(5) Start and End Scanning
Start and end scanning (a.k.a. top-and-tail scanning) speeds up virus detection by scanning only the top and the end of a file rather than the entire file. For example, only the first and last 2 KB, 4 KB, or even 8 KB of the file are scanned for each possible position. As CPUs became faster, scanning became typically I/O bound, so optimizing scan speed means reducing the number of disk reads. Because most early viruses prepended themselves to, appended themselves to, or overwrote their host files, start and end scanning became fairly popular in the early days.
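Hashing and start and end scanning can be combined in one small Python sketch. It assumes that signatures are indexed by their first 16-bit word, that no wildcards are used, and that a 4 KB window is read from each end of the file. The names `build_index`, `read_top_and_tail`, and `scan_stream`, and the demo signatures, are hypothetical.

```python
import io


def build_index(signatures):
    """Hash table: first 16-bit word -> list of (name, full pattern)."""
    index = {}
    for name, pattern in signatures.items():
        index.setdefault(pattern[:2], []).append((name, pattern))
    return index


def scan_buffer(buf, index):
    """Look up each 2-byte word; compare full patterns only on a hash hit."""
    hits = set()
    for i in range(len(buf) - 1):
        for name, pattern in index.get(buf[i:i + 2], ()):
            if buf.startswith(pattern, i):
                hits.add(name)
    return hits


def read_top_and_tail(stream, window=4096):
    """Read only the first and last `window` bytes (whole file if small)."""
    size = stream.seek(0, io.SEEK_END)
    stream.seek(0)
    if size <= 2 * window:
        return [stream.read()]
    top = stream.read(window)
    stream.seek(size - window)
    return [top, stream.read(window)]


def scan_stream(stream, index, window=4096):
    hits = set()
    for buf in read_top_and_tail(stream, window):
        hits |= scan_buffer(buf, index)
    return hits


# Both demo signatures share the start word DEAD and avoid a leading 0x00.
SIGNATURES = {
    "Demo.A": bytes.fromhex("DEAD1984"),
    "Demo.B": bytes.fromhex("DEADBEEF99"),
}
INDEX = build_index(SIGNATURES)

image = bytearray(20000)
image[10:14] = SIGNATURES["Demo.A"]        # near the top: scanned
image[10000:10005] = SIGNATURES["Demo.B"]  # in the middle: not scanned
print(scan_stream(io.BytesIO(bytes(image)), INDEX))
```

The sketch performs at most two reads per file, and it misses anything placed in the middle of a large file. That is the trade-off this method accepts in exchange for speed.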