OPES Working Group                                         H. K. Orman
Internet Draft                                           Purple Streak
                                                        August 8, 2005

                 Hopalong: A Streaming Rules Language

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   A revised version of this draft document will be submitted to the
   RFC editor as a Standards Track RFC for the Internet Community.
   Discussion and suggestions for improvement are requested, and
   should be sent to ietf-mta-filters@imc.org.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).  All Rights Reserved.

Abstract

   This draft describes a language for processing network data
   streams and dispatching them to OPES services.

draft-ietf-opes-rules-language-hopalong-00  Hopalong: A Streaming Rules Language  11 July 2005

1.0 Introduction

   Hopalong is the OPES rules language.  Its purpose is to be the
   connection between content arrival and content processing.  The
   OPES architecture document [RFC3835] shows where the rules sit in
   the processing flow.

   Hopalong is not intended for writing content-based services on an
   OPES processor; it is intended for examining content (message
   headers, parts, etc.) to determine whether it should be dispatched
   to an OPES processor using the OPES callout protocol.

   Hopalong is a stream processing language, and it can be used on a
   packet-by-packet basis.  The language depends on cooperating,
   protocol-specific modules to deal with header parsing, parts that
   span transport message boundaries, and application-level framing.
   The minimal requirements for the cooperating modules are detailed
   in this document.  The language also depends on knowing the
   context of the network connection; the operating system must
   provide this context, and its details are described herein.

2.0 Design Objectives

   The requirements guiding this design were:

   1.  Applicable to partial messages (packet-by-packet)

   2.  Need not see the entire message

   3.  Can be used in the context of initial receipt and/or when the
       modified data is returned from an OPES process

   4.  Supports compilation of header modules and rules into minimal
       processing sets

   5.  Efficient matching primitives

   6.  Bounded execution time

   7.  Does not require a separate thread for each connection or
       message

   8.  No runtime errors

   9.  Minimal support from operating system

   10. Efficient interface to OPES call-out protocols

   11. Extensible to additional protocol and message types

Table of Contents

   1.0 Overview
   2.0 Protocol modules and environment variables
   3.0 Data types
   4.0 Variables and Operators
   5.0 Errors
   6.0 Binding to OPES Callout Protocol and Services
   7.0 Security Considerations
   8.0 Comparison to Sieve
   9.0 Acknowledgements

1.0 Overview

   Hopalong is a language for writing dispatch rules for OPES.  A
   Hopalong program is an ordered set of statements, and each
   statement is a set of pre-conditions and a set of actions.

   To support fast dispatching, Hopalong has powerful matching
   functions that can be efficiently compiled.  Hopalong rules can be
   added to or deleted from a running processor by a management
   process.  Matching can be done without having the entire message
   present, and a stream of packets may encounter several independent
   matches by the rules, each one with a different action.

   Matching is performed on data streams coming into an OPES
   processor and on streams coming from an OPES callout process back
   to an OPES processor.  The two directions are indicated by special
   environment variables ("OPESIN" and "CALLOUTBACK") that can be
   used in rule preconditions.

   Because many of the decisions regarding OPES processing will
   depend on header information from lower protocols (IP, TCP, HTTP,
   SMTP, etc.), Hopalong defines requirements for the header parsing
   modules.  These must provide the headers needed by a Hopalong
   program in an array of string tuples.  Other parsing modules may
   be needed for application-specific data, such as SMTP commands or
   MIME parts.  They must also supply string tuples.  The tuple
   arrays support content-based lookup of headers.  There is a
   special module for "time" to support scheduled operations.

   Hopalong is invoked in the context of incoming Internet packets.
   The dispatch processor provides the packet and the "processing
   point" to the ruleset.  In the normal case, the first packet will
   have all the headers, and all subsequent packets in "the message"
   can be dispatched to the OPES processor.  In some cases, only part
   of the stream will be dispatched.

   The language uses an ASCII (UTF-8?) character set, but it must be
   capable of describing any character set for matching.  I don't
   know how to describe regular expression matching in generalized
   character sets, so I won't try to; this is an extension that
   "someone" must do.

   Each statement is of the form P(X) --> A, where P is a predicate
   on a set of variables X and A is a list of actions.  The action
   list indicates parallel actions and sequential actions.  An action
   is either an assignment or a name and a parameter list.  The name
   is either the name of a built-in service or the name of an OPES
   service.  The parameter list is a tuple of two lists: the
   anonymous and named parameter lists.  An OPES service is
   implicitly a tuple: the callout service name and the name of the
   callout protocol associated with it.

   Built-in actions:

   1.  Skip(until): Ignore the rest of the message.  No further rules
       are invoked; if an earlier part of the message was sent to a
       callout service, further parts will also go there; otherwise
       the packets proceed to their destinations.  Skip takes a
       parameter, e.g. skip(http.new) or skip(tcp.end).

   2.  Hold(until): Hold the message.  No more parts will be sent to
       the rules module or destination until either the message ends
       or a rule releases the hold.  This sets an environmental value
       for the packet stream, so that the rules processor knows that
       a stream is in the "hold" state.  The "until" variable must be
       an environment reference, e.g. hold(http.end) or
       hold(time+10minutes).  [TBD: multiple "holds" and
       corresponding "release".]  A stream may be held while waiting
       for a packet to return from an OPES service.  The "hold"
       guarantees that the packet stream will be delivered from the
       OPES process in the same order it was received.

   3.  Release(): Release the message; clears "hold".

   4.  Assign(name, value): Assign an attribute to the message.  This
       takes a name and a value and creates a named variable to be
       passed to the callout protocol.

   5.  Cache(): Cache the message using the message hash as the
       index.  This can fail due to resource limits.

   6.  Lookup(hash): Look up a message using a hash index.  Returns
       the message.
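
   The statement form and the built-in actions above can be modelled
   in a few lines of Python.  This is only an illustrative sketch, not
   Hopalong syntax and not part of the draft: the names Stream, Rule,
   run_rules and callout, the environment keys, and the choice to stop
   evaluating once a skip has been set are all assumptions made for
   the example.  Parallel actions, Cache() and Lookup() are not
   modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Variables set by protocol modules, keyed "module.item" (hypothetical values).
Env = Dict[str, str]


@dataclass
class Stream:
    """Per-stream state kept between rule invocations (hypothetical)."""
    hold_until: Optional[str] = None
    skip_until: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)
    callouts: List[str] = field(default_factory=list)


Action = Callable[[Stream, Env], None]


def skip(until: str) -> Action:
    """Built-in Skip(until): no further rules are invoked for the stream."""
    def act(stream: Stream, env: Env) -> None:
        stream.skip_until = until
    return act


def hold(until: str) -> Action:
    """Built-in Hold(until): mark the stream as being in the hold state."""
    def act(stream: Stream, env: Env) -> None:
        stream.hold_until = until
    return act


def release() -> Action:
    """Built-in Release(): clear the hold state."""
    def act(stream: Stream, env: Env) -> None:
        stream.hold_until = None
    return act


def assign(name: str, value: str) -> Action:
    """Built-in Assign(name, value): attach a named attribute to the message."""
    def act(stream: Stream, env: Env) -> None:
        stream.attributes[name] = value
    return act


def callout(service: str) -> Action:
    """Stand-in for dispatching to an OPES service; only records the name."""
    def act(stream: Stream, env: Env) -> None:
        stream.callouts.append(service)
    return act


@dataclass
class Rule:
    """One statement P(X) --> A: a predicate and a sequential action list."""
    predicate: Callable[[Env], bool]
    actions: List[Action]


def run_rules(rules: List[Rule], stream: Stream, env: Env) -> int:
    """Evaluate rules in order and return how many of them fired."""
    fired = 0
    for rule in rules:
        if stream.skip_until is not None:
            break  # assumption: a skip ends rule evaluation for this packet
        if rule.predicate(env):
            for action in rule.actions:
                action(stream, env)
            fired += 1
    return fired


rules = [
    Rule(lambda e: e.get("http.header.content-type", "").startswith("text/html"),
         [assign("kind", "html"), hold("http.end"), callout("OPES_SERVICE_A")]),
    Rule(lambda e: e.get("tcp.dport") == "0x19", [skip("tcp.end")]),
    Rule(lambda e: True, [callout("not_reached_after_skip")]),
]
stream = Stream()
env = {"http.header.content-type": "text/html", "tcp.dport": "0x19"}
fired = run_rules(rules, stream, env)
```

   In this sketch a single packet matches two independent rules, each
   with its own actions, and the third rule is never evaluated because
   the second one set a skip.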

   Variables for incoming messages (from the network) have the form
   "A.B", where A names the parsing module and B is the
   protocol-specific name of a parsed item.  For example, "IPv4.dest"
   is parsed by the IPv4 protocol module, and "dest" is the
   destination field.

   Variables for messages coming back from a callout service have the
   form S.N, where S is the name of the callout service and N is the
   name of the parameter.  Anonymous parameters are in an array,
   S.Anon[].

   There may be environmental variables referring to static items,
   such as files.  The rules language can direct that a local
   variable be assigned from the contents of a static environment
   variable.

2.0 Protocol modules and environment variables

   The interpretation of headers and the delineation between "header"
   and "body" is done by protocol-specific modules.  Several protocol
   layers may be in action at once, and they can set variables for
   use in Hopalong matching in rule preconditions.

   Rules indicate their reliance on protocol modules by using the
   protocol name in a variable name, such as "http.header.expires".
   If there is no module for "http", the ruleset will not pass static
   validation and cannot be used.  If "http" does not support the
   "header" part, or if it does not support a header name of
   "Expires", then the ruleset using it will not pass validation.

   Besides parsing headers and content enough to satisfy OPES rules,
   a protocol module has a core of basic variables that it must set:

      "Start header" message unit
      "Start body" message unit
      "End header" message unit
      "End body" message unit

3.0 Data types

   Numeric integers up to 64 bits, in base 10 and base 8.

   Character strings in ASCII (UTF-8?) and extended character sets
   (XML standard for representation in UTF-8?).

   Other data types are protocol specific, but the following must be
   supported:

      Module ipv4: IPv4 addresses in octet format

      Module ipv6: IPv6 addresses in octet format

      Module tcp: TCP port numbers in hex format

      Module udp: UDP port numbers in hex format

      Module dns: DNS domain names in "." separated format.  There is
      a special array variable "localdomains" for matching the right
      hand side of DNS names.

      Almost all other modules: strings in ASCII (and/or UTF-8?).
      The strings are usually case insensitive for matching header
      names.

4.0 Variables and Operators

   There are variables.

   There are arrays.  Arrays are content addressable.

   There are assignments:

      =    assign a variable from an environment variable
      =~   match a variable to an expression, assigning
           subexpressions to an array

   There are predicates: member, match

   There are built-in functions: compare, concatenate, substring,
   replace

   There are conjunctions: and, or

   There is negation: not

   There is an order of evaluation: left to right

   There are comments: //

   There are byte operations: network byte order in/out

   There are regular expressions to support matching; one expression
   can have subexpressions that are automatically bound to a variable
   of type array.  The usual left-to-right numbering of
   subexpressions applies.

   There are keywords: use, hold, release, skip

5.0 Errors

   Rules must have syntax validation before being used.  For a given
   set of services and protocol modules, it will be possible to
   validate a rule set and ensure that it will have no errors in
   referring to environmental data.  Further, every protocol module
   can be configured for limited parsing, i.e., the modules will only
   need to have enough runtime code to check for the features
   mentioned in the rules.
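
   The validation step and the "limited parsing" configuration
   described above might be sketched as follows in Python.  This is a
   hedged illustration only: the MODULES registry, its contents, the
   function names, and the decision to lower-case variable names
   before lookup are assumptions for the example, not something this
   draft defines.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical registry: module name -> items that the module can supply.
MODULES: Dict[str, Set[str]] = {
    "http": {"header.expires", "header.content-type", "end"},
    "tcp": {"end", "dport"},
}


def split_variable(variable: str):
    """Split "module.item" at the first dot, ignoring case (an assumption)."""
    module, _, item = variable.lower().partition(".")
    return module, item


def validate(variables: Iterable[str],
             modules: Dict[str, Set[str]]) -> List[str]:
    """Return one message per variable that no configured module supplies."""
    errors = []
    for variable in variables:
        module, item = split_variable(variable)
        if module not in modules:
            errors.append(f"{variable}: no module '{module}'")
        elif item not in modules[module]:
            errors.append(
                f"{variable}: module '{module}' does not supply '{item}'")
    return errors


def required_features(variables: Iterable[str],
                      modules: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Collect, per module, only the items that the rules actually mention."""
    needed: Dict[str, Set[str]] = {}
    for variable in variables:
        module, item = split_variable(variable)
        if item in modules.get(module, set()):
            needed.setdefault(module, set()).add(item)
    return needed


rule_variables = ["http.header.Expires", "smtp.rcpt", "tcp.window", "tcp.end"]
problems = validate(rule_variables, MODULES)
features = required_features(rule_variables, MODULES)
```

   A ruleset would be accepted only when the list of problems is
   empty; the per-module feature sets indicate how much parsing code
   each module needs to keep at runtime.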

   If the configuration of services and/or protocols later changes,
   the rules must be revalidated.

   Some runtime errors will be due to network disruptions.  A callout
   server may not respond in a timely manner, or data that is
   supposed to be sent through to a client or server will encounter
   connection stalls.

   The environment may set error conditions that are not associated
   with packets.  The special environment variable "error" indicates
   that some kind of error has occurred, and the rules can dispatch
   on that condition and subconditions ("error.tcpreset",
   "error.ip.src", etc.).

   Environment variables are:

   1.  Error

   2.  Callouts

   Some runtime errors may occur, even after rule validation:

   1.  Failure to allocate memory for a variable

   2.  Overflow in a regular expression evaluation

   3.  Failure to make progress on callout services

   4.  Connection timeout

   5.  Change of ruleset, leaving a content stream on permanent hold

   There's a problem with knowing which services are affected when
   there is an error.  Suppose the TCP connection is reset and we'd
   like to terminate any callout service receiving data from that
   stream.  We need to get the list of services from the operating
   environment.  This isn't difficult to do, because any service is
   invoked on a packet that is associated with a networking stack
   that keeps that information.  Therefore, the operating environment
   must supply a variable that is a list, by protocol, of active
   callout invocations.  This is an array, Callouts[], associated
   with each protocol.

   Error handling for exceptions can be designated by the rules using
   the OnError variables, e.g.

      OnError.malloc -> skip(smtp.end)

      OnError.tcp.timeout -> map("End", this.service_list)

6.0 Binding to OPES Callout Protocol and Services

   The operating environment must provide the glue between the rule
   action and the OPES callout protocols.
   It must bind the parameters so that they are recognized by the
   callout protocol and can be passed to it.

   The callout protocol core defines named parameters and anonymous
   parameters.  A rules language action lists the anonymous
   parameters first and the named parameters second.  For example:

      if ( C ) then
         OPES_SERVICE_A("1", "a", source=ipaddr.src, destn=ipaddr.dst);

   All parameters and parameter names are specified using the OCP
   grammar for parameters in section 3.1 of [RFC4037].

7.0 Security Considerations

   The rules language should not compromise authentication and/or
   privacy considerations.  It cannot initiate connections, and it
   retains minimal state between invocations.

   If the rules language is used to control authentication functions,
   it must be used with care.

   The execution environment must not misdirect or duplicate packets
   in either the network data stream or the OPES callout stream.

8.0 Comparison to Sieve

   The SMTP filtering language "Sieve" [Sieve] and its several
   extensions (variables, spamtest and virustest, body, edithead,
   IMAP flag, subaddress, relational tests, vacation, reject and
   refuse) have semantics for header and content examination and
   matching.  Though strongly oriented towards SMTP, the language
   seems suitable for application to many protocols.

   Sieve has rules that are conditional/action pairs and a defined
   order of evaluation.  The semantics allow many kinds of matching,
   and the variables extension allows matching parts to be assigned
   to variables.

   Sieve has taken some steps towards implementing services by
   defining header modification actions and even responses (the
   vacation service).  It has also moved towards working with MIME
   parts by defining extensions for matching on MIME headers and for
   matching on decoded parts.

   Should Hopalong and Sieve have the same semantics?  That
   convergence might be useful to both OPES and Sieve.
   Application writers would benefit by having to learn only one
   language in order to establish their services on a
   filtering/callout platform.

   Another view would have Sieve be an OPES service; the OPES rules
   would match information in SMTP headers and divert those messages
   to a process that would apply Sieve rules.

   The Sieve core has several assumptions that tie it to SMTP, and
   its actions are meant to dispose of the entire message rather than
   process it as a stream.

   Sieve tries to make its operations and data tests easy to write
   and easy to read, but it isn't obvious that it can hold up to
   heavy use.  For example, if there are several hundred addresses to
   test before deciding on a service, that list should not be
   expressed directly in Sieve; it should be expressed as an
   operation on a list that is loaded and named by Sieve.

   Still, Hopalong might benefit from using several of the Sieve
   semantic elements, such as the ":" prefix for comparators and the
   multiple matching operators such as "anyof" or "allof", etc.

9.0 Acknowledgements

   This document has drawn heavily on the thoughts in prior drafts on
   IRML [IRML], P [P], and Sieve [Sieve].  The authors of those
   drafts have my admiration for their contributions and careful
   thought and elucidation.

BIBLIOGRAPHY

   [RFC3835]  Barbir et al., "An Architecture for Open Pluggable Edge
              Services (OPES)", RFC 3835.

   [RFC4037]  A. Rousskov, "Open Pluggable Edge Services (OPES)
              Callout Protocol (OCP) Core", RFC 4037.

   [P]        A. Beck, A. Rousskov, "P: Message Processing Language",
              draft-ietf-opes-rules-p-02.

   [Sieve]    P. Guenther and T. Showalter, "Sieve: An Email
              Filtering Language", draft-ietf-sieve-3028bis-02.txt.

   [IRML]     A. Beck, M. Hofmann, "IRML: A Rule Specification
              Language for Intermediary Services",
              http://www.bell-labs.com/project/IRML_Parser/DOCS/draft-beck-opes-irml-01.txt