AFRL-IF-RS-TR-2001-103
gak
Final Technical Report
By
May 2001
_ a\v
few
\ lk£m
w
HI
USING A LARGE CYC-BASED ONTOLOGY TO
MODEL AND PREDICT VULNERABILITIES AT
THE REAL-WORLD INFO-SYSTEM BOUNDARY
CYCORP, Inc.
Sponsored by
Defense Advanced Research Projects Agency
DARPA Order No. H504/00
APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.
The views and conclusions contained in this document are those of the authors and should not he
interpreted as necessarily representing the official policies, either expressed or implied, of the
Defense Advanced Research Projects Agency or the U.S. Government.
AIR FORCE RESEARCH LABORATORY
INFORMATION DIRECTORATE
ROME RESEARCH SITE
ROME, NEW YORK
20010713 043
This report has been reviewed by the Air Force Research Laboratory, Information
Directorate, Public Affairs Office (IFOIPA) and is releasable to the National Technical
Information Service (NTIS). AtNTIS it will be releasable to the general public,
including foreign nations.
AFRL-IF-RS-TR-2001-103 has been reviewed and is approved for publication.
APPROVED: PETER J. ROCCI, Jr.
Project Engineer
FOR THE DIRECTOR: JAMES A. COLLINS, Acting Chief
Information Technology Division
Information Directorate
If your address has changed or if you wish to be removed from the Air Force Research
Laboratory Rome Research Site mailing list, or if the addressee is no longer employed by
your organization, please notify AFRL/IFTD, 525 Brooks Road, Rome, NY 13441-4505.
This will assist us in maintaining a current mailing list.
Do not return copies of this report unless contractual obligations or notices on a specific
document require that it be returned.
USING A LARGE CYC-BASED ONTOLOGY TO MODEL AND PREDICT
VULNERABILITIES AT THE REAL-WORLD INFO-SYSTEM BOUNDARY
Blake Shepard
Contractor: CYCORP, Inc.
Contract Number: F30602-99-C-0142
Effective Date of Contract: 11 May 1999
Contract Expiration Date: 30 November 2000
Short Title of Work: Using a Large CYC_Based Ontology to Model
and Predict Vulnerabilities at the Real-World Info-
System Boundary
Period of Work Covered: May 99 - Nov 00
Principal Investigator:
Phone:
AFRL Project Engineer:
Phone:
Blake Shepard
(512) 514-2952
Peter J. Rocci, Jr.
(315) 330-4654
Approved for public release; distribution unlimited.
This research was supported by the Defense Advanced Research
Projects Agency of the Department of Defense and was monitored
by Peter J. Rocci, Jr., AFRL/IFTD, 525 Brooks Rd, Rome, NY.
REPORT DOCUMENTATION PAGE
Form Approved
OMB No. 0704-0188
Public reporting burden for this collection of Womnetion is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, andcompleting and reviewing
the collection of information. Send comments regarding this burden estimate or any other aspect of this coleclion of information, including suggestions for reducing this < burden, to Wuhnptm Headquarters Services, Drectorata for Information
Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704*0188), Washngton, DC 20503.
1. AGENCY USE ONLY (Leave blankI 2. REPORT DATE
May 01_
4. TITLE AND SUBTITLE
USING A LARGE CYC-BASED ONTOLOGY TO MODEL AND PREDICT
VULNERABILITIES AT THE REAL-WORLD INFO-SYSTEM BOUNDARY
6. AUTHORIS)
Blake Shepard
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
CYCORP, Inc.
3721 Executive Center Drive
Austin, TX 78731
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
AFRL/IFTD
525 Brooks Rd
Rome NY 134414505
3. REPORT TYPE AND DATES COVERED
Final May 99 - Nov 00
5. FUNDING NUMBERS
- F30602-99-C-0142
- 63760E
-H504
-34
WU -01
8. PERFORMING ORGANIZATION
REPORT NUMBER
10. SP0NS0RINGJM0NIT0RING
AGENCY REPORT NUMBER
AFRL-IF-RS-TR-2001-103
11. SUPPLEMENTARY NOTES
AFRL Project Engineer: Peter J. Rocci, IFTD, 315-3304654
12a. DISTRIBUTION AVAILABILITY STATEMENT
Approved for public release; distribution unlimited.
13. ABSTRACT (Maximum 200 words)
This work is part of the new move toward content, as opposed to the architecture or methodology or algorithms, as die
critical factor in complex information systems. The Information Assurance vulnerabilities problem is the quintessential
example of "new and unexpected situations arising". In this effort, we have added a tremendous amount of knowledge to the
Cyc Knowledge Base that enables Cyc to reason about cyber and non-cyber vulnerabilities, electronic attacks, and the
relation between vulnerabilities and potential attacks.
14. SUBJECT TERMS
Information Assurance, Knowledge-Bases, Vulnerabilities
17. SECURITY CLASSIFICATION
OF REPORT
UNCLASSIFIED
18. SECURITY CLASSIFICATION
OF THIS PAGE
UNCLASSIFIED
19. SECURITY CLASSIFICATION
OF ABSTRACT
15. NUMBER OF PAGES
24
16. PRICE CODE
20. LIMITATION OF
ABSTRACT
UNCLASSIFIED | _ UL
Using a Large CYC®-Based Ontology
To Model and Predict Vulnerabilities
At the Real-World <--> Info-System Boundary
Cycorp
3721 Executive Center Drive, Suite 100
Austin, TX 78731
1. Introduction
Cycorp's work on I ASET has focused primarily on two fronts. First, we have carefully
and richly represented hundreds of concepts, constraints, and rules that enable Cyc to
reason about the cyber and non-cyber vulnerabilities of information systems
(approximately 600 constants and 4000 assertions). Second, we have developed
mechanisms to enable Cyc to learn about an information system automatically and to
provide a vulnerability assessment of it. Our twin goals were (1) to extend the existing
Cyc-based HPKB IKB (Integrated Knowledge Base) ontology to cover intrusion,
interception, battlefield, and related concepts that occur in patterns of real-world facts that
should lead an INFOSEC officer or commander to suspect a possible vulnerability, and
(2) to build a vulnerability analyst to trigger alerts and to provide a means to query about
the vulnerability of a system. We have accomplished these goals. Cyc can take an
automatically generated representation of a network in the manner just described and
reason about its vulnerabilities using Cyc's full range of common-sense and cyber¬
vulnerability specific knowedge, to provide a sophisticated vulnerability assessment of
the network.
In section 2 of this report, we will provide an overview of the Cyc technology. In section
3,we will describe the information we have added to the Cyc knowledge base about the
cyber and non-cyber vulnerabilities of information systems. In section 4, we will
described the technology we have developed to integrate Cyc with sources of
information that enable it to learn about information systems automatically.
2. Overview of Cyc
Cyc technology consists of an immense multi-contextual knowledge base, an efficient
inference engine, a set of interface tools, and a number of special-purpose application
modules for Unix, Windows NT, Solaris and other platforms. The knowledge base is
built on a core of over 1,000,000 hand-entered assertions designed to capture a large
portion of what we normally consider to be "common-sense knowledge" about the world.
1
For example, Cyc knows that trees are usually outdoors, that once people die they stop
buying things, and that glasses containing liquid should be carried nghtside-up.
2.1. The Cyc knowledge base
The Cyc knowledge base ("KB") is an immense set of assertions about the world. Those
assertions may be stated as expressions in CycL, the Cyc representation language, which
is as expressive as first-oder logic with identity. The terms of CycL expressions can be
variables, certain kinds of objects native to the computational substrate (such as strings or
integers), or Cyc constants. Cyc contains objects in the KB which are created to denote
particular concepts. Over the past fifteen years, approximately 100,000 concepts and just
over 1 and 1/4 million rules and assertions that inter-relate them have been carefully
hand-entered in the Cyc KB.
The Cyc objects which denote concepts are called "constants". They have unique names
and are wri tten with the prefix Cyc constants can either denote collections, like
"the collection of all electronic attacks", or individuals, like "the first electronic attack on
our enclave last Thursday". Every term in Cyc is an element of #$Thmg, the universal
collection. #$Thing is partitioned into #$Individual and #$SetOrCollection.
#$Individual denotes the set of all things which are not sets. Individuals in the Cyc KB
include constants such as #$CityOfSanFrancisco, #$RonaldReagan, #$Intemet,
#$SIPRNet, and #$MicrosoftOfficeSuite-ComputerProgram. #$Collection denotes the
set of all things that are not individuals. Collections in the KB include #$Person,
#$VulnerabilityType, #$AttackByComputerOperation, and #$AttackOnObject.
The fundamental type of CycL expression is the "atomic formula", where a predicate is
applied to one or more terms to indicate some relationship between the things denoted by
the terms:
(<predicate> <terml> <term2>...)
If there are no variables in the expression, all the terms are said to be "ground", and so the
expression is referred to as a "ground atomic formula", or "GAF".
Predicates are all strongly typed, and a single collection must be specified as the type for
each argument of every predicate.
For instance, the term #$performedBy, an instance of #$BinaryPredicate, has the
following assertions:
(#$argllsa #$performedBy #$Action)
(#$arg2Isa #$performedBy #$Agent)
These assertions specify the domain and range of #$performedBy, the collection whose
2
instances can be its first argument and the collection whose instances can be its second
argument. So we can only use #$performedBy to specify some agent which performs
some action.
Because every argument of every predicate in Cyc has type constraints, the space of valid
assertions is radically reduced.
2.2. The Cyc inference engine
The Cyc inference engine handles modus ponens and modus tollens (contrapositive)
inferencing, universal and existential quantification, and mathematical inferencing. It uses
contexts called "microtheories" to optimize inferencing by restricting search domains.
The Cyc knowledge base contains hundreds of thousands of assertions. Many approaches
commonly taken by other inference engines (such as frames, RETE, match, Prolog, etc.)
just don't scale well to KBs of this size. As a result, the Cyc team has been forced to
develop other techniques.
Cyc includes several hundred special-purpose inferencing modules for handling
specialized common types of inference. One set of modules handles reasoning concerning
collection membership, subsethood, and disjointness. Another handles equality reasoning.
Still others implement symmetry, transitivity and reflexivity reasoning.
Backward inferencing-the type of inferencing initiated by an ASK operation-can be
regarded as a search through a tree of nodes, where each node represents a CycL formula
for which bindings are sought, and each link represents a transformation achieved by
employing an assertion in the knowledge base.
For example, let's say I ask Cyc for bindings for the formula (#$likesObject ?x ?y). That
formula will constitute the root node of an inference search tree. What I am looking for is
any assertion which will help provide bindings for ?x and ?y. The KB may contain some
some if-then rules, such as the default rule:
(#$implies
(#$possesses ?x ?y)
(#$likesObject ?x ?y))
This assertion would constitute a link to a new node with a different formula to satisfy,
namely, the formula
(#$possesses ?x ?y)
Now Cyc might find an assertion in the KB that says
3
(#$possesses #$RonaldReagan #$ChocolateCandy002),
In which case Cyc would bind #$RonaldReagan to ?x and $ChocolateCandy002 to ?y.
The Cyc inference engine uses many specially-designed heuristic rules to decide which
leaf node to expand next in an inference search. The heuristic rules are based on the
synactic and semantic features of the formulas that occur at the nodes.
3. Representational work for IASET
Cycorp’s IASET work is part of the new move toward content, as opposed to the
architecture or methodology or algorithms, as the critical factor in complex information
systems. The novel technology is in the meaning of the assertions in the knowledge base.
There is a fundamental difference between brittle ad hoc expert systems, on the one hand,
and a use-neutral knowledge base such as Cyc in which each piece of knowledge is
entered at the highest appropriate level of generality such that new and unexpected
situations that arise in the future can exploit that accumulated corpus of knowledge. The
IA vulnerabilities problem is the quintessential example of "new and unexpected
situations arising". If one could anticipate all dangers, one could addres them. In
general, however, one can't anticipate them, so an INFOSEC officer or commander needs
to be alerted to novel vulnerabilities as soon as they are suspected. Further, knowledge
about the internal data state of a system will not, in general, suffice to ground
vulnerability assessments. Many Red Force behaviors in the real world, and other
external conditions, can indicate whether and when penetrations might occur (or may
have occurred). Cyc is the only system that is sufficiently general and complete to reason
effectively about novel suspected vulnerabilities and vulnerabilities that can be identified
by looking at information external to the data world.
To prepare Cyc to provide novel and cokmplex vulnerability assessments, we needed to
add a sophisticated representation of vulnerability to the Cyc knowledge base. For
IASET, we have added a tremendous amount of knowledge to the Cyc KB that enables
Cyc to reason about cyber and non-cyber vulnerabilities, electronic attacks, and the
relation between vulnerabilities and potential attacks. In section 3.1, the vulnerability
knowledge that has been added to Cyc is described. In 3.2, our approach to representing
electronic attacks in Cyc is discussed. Finally, in 3.3, we describe the way we represent
the relation between vulnerabilities and potential attacks.
3.1. The representation of vulnerabilities in Cyc
Representing the concept "vulnerability" is important for our project because
understanding what it is to be vulnerable, how vulnerabilities interact, and how particualr
vulnerabilities relate to particualr possible attacks is crucial for reasoning in a general
way about novel possible threats.
4
English speakers use the term "vulnerability" in a variety of ways. Thus, there is a suite of
predicates in CYC® to make assertions about vulnerability. Whenever something is
vulnerable it has the potential to incur some sort of damage. We link vulnerability to an
increased likelihood of incurring a particular type of damage rather than actual damage
should the vulnerability conditions materialize. The reason for this is that a vulnerability
itself, even in the conditions in which damage is likely to occur, is not, by itself, enough
to conclude that damage does occur.
Our CycL representations of vulnerability fall into several intersecting camps.^ We
distinguish "being vulnerable in a situation" from "being vulnerable to an object"; we
have ways to express that an object's vulnerability has increased or decreased from some
baseline; we can represent the fact that one situation (or object )makes an entity more
vulnerable to a certain sort of damage than another situation (or object) does; and we have
predicates for expressing an agent's vulnerability in virtue of some proposition being true
(this last employs the full expressive power of Cyc); we can also represent the fact that
vulnerabilities come in degrees. Finally, we sometimes want to say that something
possesses a very common or well-known type of vulnerability, such as “vulnerability to
the cold” or “Eric Allman Sendmail 8 vulnerability,” and we have developed an efficient
means of representing that sort of vulnerability.
What follows is a complete breakdown of the CycL predicates we use to represent
vulnerabilities in Cyc.
3.1.1. "Vulnerability in a situation" versus "vulnerability to an object"
#$vulnerableln and #$vulnerableTo are the CycL predicates we use to represent
"vulnerability in a situation" and "vulnerability to an object", respectively. The purpose
of #$vulnerableln is to represent the vulnerabilities a thing has by virtue of playing a role
in a situation. As an example of the distinction, consider a situation in which I am being
assaulted by a mugger with a knfe. In that situation, I am vulnerable to being killed.
Why? Because, in that situation, I am vulnerable to being mortally wounded by the knife
(#$vulnerableln THREAT-SITUATION VULNERABLE-THING HARM-TYPE)
means that when in situation THREAT-SITUATION, VULNERABLE-THING is
vulnerable to hardships of type HARM-TYPE. In contrast, the purpose of
#$vulnerableTo is to represent the vulnerabilities one object has by virtue of the harmful
capabilities of another object.
(#$vulnerableTo THREAT-OBJ OBJECT HARM-TYPE)
means that object THREAT-OBJ poses harm of type HARM-TYPE to OBJECT. In Cyc,
we have represented the connection between being vulnerable in a situation and being
vulnerable to an object as follows:
5
(#$implies
(#$vulnerableln 7SITUATI0N 7VULNERABLE-THING 7HARM-TYPE)
(#$thereExists 7THREAT-0BJ
(#$and
(#$parts 7SITUATI0N 7THREAT-0BJ)
(#$vulnerableTo 7THREAT-0BJ 7VULNERABLE-THING 7HARM-TYPE))))
This CycL rule represents the fact that whenever a thing is vulnerable to harm in a
situation, there is some threatening object which is part of the situation and which is
making the thing vulnerable to the harm. By virtue of this representation, if CYC®
knew I was vulnerable to getting killed in a mugging event, e.g.,
(#$vulnerableln #$Mugging001 #$Blake #$Killing-Biological)
Cyc would also be able to conclude that there is something in the event represented by
#$Mugging001 that makes me vulnerable to being killed:
(#$thereExists 7THREAT-OBJ
(#$and
(#$parts #$Mugging001 7THREAT-OBJ)
(#$vulnerableTo 7THREAT-OBJ #$Mugging001 #$Killing-Biological)))
3.1.2. Degrees of vulnerability
The predicate #$makesVulnerableToDegree enables Cyc to reason about changing
degrees of vulnerability.
(#$makesVulnerableToDegree SITUATION OBJECT RESULT-TYPE DEGREE)
means that when in SITUATION, OBJECT is vulnerable to hardships of type RESULT-
TYPE to degree DEGREE. We stipulate that the baseline level of vulnerability is #$Low,
and that when sound security measures are in place, vulnerability is #$VeryLow. CycL
rules, such as:
(#$implies
(#$and
(#$parts 7SIT3 7SIT1)
(#$parts 7SIT3 7SIT2)
(#$different 7SIT1 7SIT2 7SIT3)
(#$makesVulnerableToDegree 7SIT1 70BJECT 7RESULT-SPEC #$Low)
(#$makesVulnerableToDegree 7SIT2 70BJECT 7RESULT-SPEC #$Low))
(#$makesVulnerableToDegree 7SIT3 70BJECT 7RESULT-SPEC #$Medium))
6
enable CYC® to conclude that when more than one concurrent situation makes an object
vulnerable to the same ill effect to the same degree, vulnerability to that effect increases.
By virtue of reasoning with this sort of rule, CYC® would know that if being sneezed on
gave me a low-level vulnerability to catching a cold, then in a situatioin in which I was
sneezed on many times, I'd be medium-level vulnerable to catching a cold.
3.1.3. Increasing and decreasing vulnerabilities
The CycL predicates #$increasesVulnerabilityIn, #$increasesVulnerabilityTo,
#$decreasesVulnerabilityIn, and #$decreasesVulnerabilityTo (each of which is connected
to #$makesVulnerableToDegree), are predicates that enable CycL representations of
increasing and decreasing situational and object-related vulnerabilities.
(#$increasesVulnerabilityIn SITUATION OBJECT RESULT-TYPE)
means that being in SITUATION increases OBJECT'S vulnerability to RESULT-TYPE.
This predicate is similar to #$makesVulnerableToDegree except that, unlike
#$makesVulnerableToDegree, it expresses a relative increase in OBJECT'S vulnerability
rather than an absolute degree of vulnerability. #$increasesVulnerabilityIn is connected
to #$makesVulnerableToDegree via the following rule:
(#$implies
(#$and
(#$startsAfterEndingOf ?SIT2 ?SIT1)
(#$greaterThan ?DEG2 7DEG1)
(#$makesVulnerableToDegree 7SIT2 ?OBJ 7HARM 7DEG2)
(#$makesVulnerableT oDegree 7SIT1 70BJ 7HARM7DEG1))
(#$increasesVulnerabilityIn 7SIT2 70BJ 7HARM))
This rule says that if one situation follows another, and the degree of vulnerability of an
object in both situations moves from a relatively low value to a relatively high value, the
object's vulnerability increases.
3.1.4. Common types of vulnerability
We have created specific CycL contants to denote particular common types of
vulnerability. #$Vulnerability the most general type of vulnerability. #$Vulnerability is
the collection of #$StaticSituations in which the salient focal relationship which does not
change is that between a vulnerable entity and that which makes the entity vulnerable.
We use the CycL predicate #$hasVulnerabilityType to represent in Cyc that an object
has a particular type of vulnerability. For example, the assertion:
7
(#$hasVulnerabilityType #$UserAccountOO 1 #$UnauthorizedLoginVulnerability)
means that #$UserAccount001 is vulnerable to unauthorized logins. Here is a
representative list of the vulnerability types that have been represented in Cyc:
#$BufferVulnerability
#$CanonicalPasswordVulnerability
#$ComputerVirusVulnerability
#$CyberInfiltrationVulnerability
#$CyberVulnerability
#$CyberVulnerabilityExploit
#$DenialOfServiceVulnerability
#$DictionaryPasswordVulnerability
#$ExecutableStackV ulnerability
#$FilePermissionVulnerability
#$InsecureAccountV ulnerability
#$InsecureInformationVulnerability
#$InsecurePasswordV ulnerability
#$InsecureSoftwareV ulnerability
#$ JoePasswordVulnerability
#$NoteAboutVulnerability
#$PhysicalDamageVulnerability
#$PhysicalInsecurityVulnerability
#$Phy sical V ulnerability
#$PlaintextPasswordStorageVulnerability
#$PlaintextPasswordVulnerability
#$SGIDVulnerability
#$SUIDRootVulnerability
#$UnauthorizedLoginVulnerability
We represent sufficient conditions for all such common or well-known types of
vulnerability in Cyc. We do this by writing CycL rules that, in their antecedents, specify
the sufficient conditions for the reified vulnerability types mentioned in their
. consequents. For example, the rule:
(#$implies
(#$and
(#$isa 7PASSWORD #$Password-Weak)
(#SpasswordForUnixAccount 7AGENT 7ACCOUNT 7PASSWORD))
(#$has Vulnerability Type 7ACCOUNT #$UnauthorizedLoginVulnerability))
says that being a Unix account with a weak password is sufficient for having an
unauthorized login vulnerability (#$Password-Weak denotes the collection of all
canonical, short, and lexical passwords).
8
3.1.5. Sources of Vulnerability Data
A great deal of the knowledge we represented about cyber vulnerabilities for IASET was
accessed from various online sources of cyber vulnerability data. The online sources
were vulnerability databases or archived computer security mailing lists. Online
vulnerability databases, such as the one at www.securityfocus.com, are typically arranged
so that the specific vulnerabilities of computer programs are indexed by the operating
system on which they are known to cause problems. For example, Eric Allman sendmail
version 8.x is known to have a specific vulnerability for Linux, and we represented that
information in the Cyc KB for IASET. The archived vulnerability mailing lists are less
structured, but still proved to be an excellent source of representable vulnerability
information.
3.2. The representation of electronic attacks in CYC®
What in English are called "cyber attacks" or "electronic attacks" are represented in Cyc
by constants that denote subcollections of #$AttackByComputerOperation, which istelf is
a specialization of #$AttackTypeByWeaponType: it is the collection of all attacks
executed using computer operations as weapons.
Prior to the commencement of our IASET work, Cyc already knew a significant amount
about attacks in general, and that knowledge is inherited by the representations of
electronic attacks for IASET. For example, Cyc knows:
(#$implies
(#$and'
(#$isa ?ATT #$AttackOnObject)
(#$successfulForAgents ?ATT ?DOER)
(#$objectAttacked ?ATT ?OBJ))
(#$damages ?ATT ?0BJ)),
which means that a successful attack on an object damages it. CYC® also knows:
(#$implies
(#$and
(#$isa ?ATT #$AttackOnObject)
(#$performedBy ?ATT ?DOER)
(#$obj ectAttacked ?ATT ?OBJ))
(#$purposeInEvent ?DOER ?ATT
(#$damages ?ATT ?OBJ)))
which means that those who perform attacks do so with the intent of damaging the
objects they attack.
9
The vocabulary used to name constants in the hierarchy of CycL constants that represent
cyber attacks usually has the format '#$ElectronicAttack-x', where Y refers to a
conventional name of an attack (such as 'denial of sevice'). Each of these collections is a
subcollection of #$AttackByComputerOperation, and each is an instance of some type
level collection in the electronic attack hierarchy. Some constants in the
electronic attack hierarchy have the format "#$ElectronicIntelligenceAttack-x". These
constants designate subcollections of #$ElectronicIntelligenceAttack-General, which
itself is a subcollection of #$AttackByComputerOperation, and an instance of
#$ElectronicAttackType.
For IASET, we have reified numerous subcollections of #$AttackByComputerOperation.
Here is a representative list:
#$ElectronicAttack-Bonk
#$ElectronicAttack-BufferOverflow
#$ElectronicAttack-ComputerCrashing
#$ElectronicAttack-Coordinated
#$ElectronicAttack-CorruptionOfInformation
#$ElectronicAttack-DataFlooding
#$ElectronicAttack-DefacingAWebsite
#$ElectronicAttack-DenialOfService
#$ElectronicAttack-DestructionOfInformation
#$ElectronicAttack-Distributed
#$ElectronicAttack-EMailBomb
#$ElectronicAttack-LogicBomb
#$ElectronicAttack-Smurfing
#$ElectronicAttack-SynFlooding
#$ElectronicAttack-T earDrop
#$Electronic Attack-T imeBomb
#$ElectronicAttack-UDPPacketStorm
#$Electronic Attack-V andalism
#$Electronic Attack-Virus
#$Electronic Attack-W orm
The subcollections of #$AttackByComputerOperation are richly interconnected. For
example, #$ElectronicAttack-DefacingAWebsite is a more specific subcollection of
#$ElectronicAttack-Vandalism. #$ElectronicAttack-DataFlooding, #$ElectronicAttack-
EMailBomb, and #$ElectronicAttack-ComputerCrashing are all subcollections of
#$ElectronicAttack-DenialOfService. #$ElectronicAttack-Bonk, #$ElectronicAttack-
SynFlooding, #$ElectronicAttack-TearDrop, and #$ElectronicAttack-UDPPacketStorm
are all subcollections of both #$ElectronicAttack-ComputerCrashing and
#$ElectronicAttack-DataFlooding.
10
We have represented in Cyc elaborate specific information about each type of electronic
attack. For example, the rule
(#$implies
(#$and
(#$obj ectActedOn ?EVNT ?OBJ)
(#$isa ?EVNT #$ElectronicAttack-DenialOfService))
(#$thereExists ?ACT
(#$and
(#$holdsIn (#$STIB ?EVNT)
(#$behaviorCapable ?OBJ ?ACT #$deviceUsed))
(#$holdsIn (#$STIF ?EVNT)
(#$not
(#$behaviorCapable ?OBJ ?ACT #$deviceUsed))))))
says that objects that are the targets of successful denial of service attacks are capable of
functioning before they are attacked, but not after they are attacked. Also, the rule:
(#$implies
(#$isa 7SYNFLOOD #$ElectronicAttack-SynFlooding)
(#$likelihood
(#$isa 7SYNFLOOD #$ElectronicAttack-Distributed) #$HighToVeryHigh))
says that syn flooding attacks are very likely to be distributed attacks. There are dozens
of specific rules like these for all the subcollections of #$AttackByComputerOperation.
3.3. Linking vulnerabilities to potential attacks
Cyc understands the connection between vulnerabilities and potential attacks. One of the
most efficient ways Cyc can reason about the relation between vulnerabilities and
potential attacks is with what we call a "script-oriented" approach.
The script-oriented approach starts with the representation in CycL of common or well-
known types of vulnerability. We create a constant that represents a type of situation in
which an entity is vulnerable to a certain type of ill-effect. We link these representations
of common and well-known vulnerabilities to other constants in the Cyc KB by
specifying the sufficient conditions for having them. For instance, we represent "network
node access vulnerability" and we say that a network with nodes accessible to
unauthorized agents has a network node access vulnerability:
(#$ implies
(#$and
(#$isa 7NETWORK #$LocalAreaNetwork)
(#$node!nSystem 7HUB 7NETWORK)
11
(#$hasPhysicalAccess 7AGENT ?HUB)
(#$unauthorizedAgent 7 AGENT 7HUB))
(#$hasV ulnerabilityT ype 7NETW0RK #$NetworkNodeAccessVulnerability))
The next step in doing script-based vulnerability assessment in Cyc is to represent the
conditions, including types of common vulnerabilities, that enable one to move from the
performance of one action type to another.
Here's an example of the way this works in Cyc . We represent the fact that the
vulnerability represented by the CycL constant #$EricAllmanSendmail8Vulnerability
enables one to move from sending data of a particular sort to the SMTP port, to crashing
the SMTP program:
(#$actionTypeAllowsActionTypeWhen
#$SendingDataToSMTPPortProgram
#$CrashingSMTPProgram
#$EricAllmanSendmail8Vulnerability)
What this assertion says could be represented as a directed graph in which the action
types are nodes and the condition connecting the nodes is a directed link:
#$EricAllmanSendmail8Vulnerability
#$SendingDataToSMTPPortProgram #$CrashingSMTPProgram
By representing all significant condition types that are sufficient for significant action-
types, we will end up with an elaborate script that represents all possible paths from start
actions to goal actions. There are currently 133 instances of #$CycSystemPathConstant
(such as #$n-wayJunctionInSystem, #$sourceNodeInSystem, #$cutNodeInSystem, etc.),
which are richly interconnected with rules that enable Cyc to reason deeply about the
relations of nodes and links in a script.
The next step in performing script-based assessments is determining which conditions
and vulnerabilities obtain on the network being assessed. So, if it turns out that a system
is m nnin g software that would enable one to move from a performance of one action type
to the performance of another action type, Cyc knows it. Finally, it is possible to ask Cyc
which attack paths in the script can be performed on the network being assessed.
So, if we specify the start node as the action type "Scanning a network", it may turn out
12
that, acting on a particular network represented in Cyc, there are a few paths through the
script that leads to the goal electronic attack action type "Gaining root access".
4. Automatic semantic integration of security-relevant knowledge
We have enabled automatic semantic integration of a number of security-relevant
knowledge sources with Cyc. By "semantically integrated", I mean that we take the
output of each of these sources, automatically encode it in CycL, and assert it in the
knowledge base.
Currently, we have four automatic means of integrating specific network information
with the Cyc knowledge base, traceroute provides a logical network topology, nmap
provides information about what ports are open, queso identifies the OS each machine on
a network is running, and by secure shelling to each machine on a network, we can
automatically look at relevant files and run scripts to pick up MAC addresses, CPU
speed, and RAM.
From the command line, we run a single program that executes all these functions on a
network, converts their output to CycL assertions, and incorporates these assertions in the
general Cyc knowledge base.
Cyc can take the representation of a network that has been automatically generated in the
manner I just described and reason about its vulnerabilities using Cyc's full range of
common-sense and cyber-vulnerability specific knowedge, to provide a sophisticated
vulnerability assessment of the network.
5. Future Directions
The IASET work done at Cycorp has, we believe, positioned Cycorp to develop a
powerful commercial network vulnerability assessment tool. The tool we develop will be
significantly different than existing network vulnerability assessment tools. Instead of
discovering vulnerabilities by attempting expoits, the tool we develop will represent a
network declaratively and utilize Cyc’s inference engine and vulnerability knowledge to
deductively determine what vulnerabilities a network has. Instead of being limited to
reasoning about cyber vulnerabilities, the tool we develop will be able to reason about
any sort of vulnerability, drawing on Cyc’s broad common-sense real-world knowledge.
Also, our tool will not be limited to performing vulnerability assesment. It can become a
general-purpose network administration tool. A system administrator will be able use our
tool’s declarative representation of a network to test the impact of any sort of change to the
network before implementing those changes. Also, our tool will be a general risk-
management tool, that automates reasoning about the sorts of security risks it is
acceptable to take given certain assumptions about the intended functionality of a
network.
13
MISSION
OF
AFRL/INFORMATIONDIRECTORATE (IF)
The advancement and application of Information Systems Science
and Technology to meet Air Force unique requirements for
Information Dominance and its transition to aerospace systems to
meet Air Force needs.