 SGMLS - class for postprocessing the output from the sgmls and nsgmls parsers.


NAME

SGMLS - class for postprocessing the output from the sgmls and nsgmls parsers.


SUPPORTED PLATFORMS

  • Windows
This module is not included with the standard ActivePerl distribution. It is available as a separate download using PPM.

SYNOPSIS

  use SGMLS;
  my $parse = new SGMLS(STDIN);
  my $event = $parse->next_event;
  while ($event) {
    SWITCH: {
      ($event->type eq 'start_element') && do {
        my $element = $event->data;    # An object of class SGMLS_Element
        # ... your code for the beginning of an element ...
        last SWITCH;
      };
      ($event->type eq 'end_element') && do {
        my $element = $event->data;    # An object of class SGMLS_Element
        # ... your code for the end of an element ...
        last SWITCH;
      };
      ($event->type eq 'cdata') && do {
        my $cdata = $event->data;      # A string
        # ... your code for character data ...
        last SWITCH;
      };
      ($event->type eq 'sdata') && do {
        my $sdata = $event->data;      # A string
        # ... your code for system data ...
        last SWITCH;
      };
      ($event->type eq 're') && do {
        # ... your code for a record end ...
        last SWITCH;
      };
      ($event->type eq 'pi') && do {
        my $pi = $event->data;         # A string
        # ... your code for a processing instruction ...
        last SWITCH;
      };
      ($event->type eq 'entity') && do {
        my $entity = $event->data;     # An object of class SGMLS_Entity
        # ... your code for an external entity ...
        last SWITCH;
      };
      ($event->type eq 'start_subdoc') && do {
        my $entity = $event->data;     # An object of class SGMLS_Entity
        # ... your code for the beginning of a subdoc entity ...
        last SWITCH;
      };
      ($event->type eq 'end_subdoc') && do {
        my $entity = $event->data;     # An object of class SGMLS_Entity
        # ... your code for the end of a subdoc entity ...
        last SWITCH;
      };
      ($event->type eq 'conforming') && do {
        # ... your code for a conforming document ...
        last SWITCH;
      };
      # Reached only when none of the cases above matched.
      die "Internal error: unknown event type " . $event->type . "\n";
    }
    $event = $parse->next_event;
  }


DESCRIPTION

The SGMLS package consists of several related classes: see SGMLS, SGMLS_Event, SGMLS_Element, SGMLS_Attribute, SGMLS_Notation, and SGMLS_Entity. All of these classes are available when you specify

  use SGMLS;

Generally, the only object which you will create explicitly will belong to the SGMLS class; all of the others will then be created automatically for you over the course of the parse. Much fuller documentation is available in the .sgml files in the DOC/ directory of the SGMLS.pm distribution.

The SGMLS class

This class holds a single parse. When you create an instance of it, you specify a file handle as an argument (if you are reading the output of sgmls or nsgmls from a pipe, the file handle will ordinarily be STDIN):

  my $parse = new SGMLS(STDIN);

The most important method for this class is next_event, which reads and returns the next major event from the input stream. It is important to note that the SGMLS class deals with most ESIS events itself: attributes and entity definitions, for example, are collected and stored automatically and invisibly to the user. The following list contains all of the methods for the SGMLS class:

next_event(): Return an SGMLS_Event object containing the next major event from the SGML parse.
element(): Return an SGMLS_Element object containing the current element in the document.
file(): Return a string containing the name of the current SGML source file (this will work only if the -l option was given to sgmls or nsgmls).
line(): Return a string containing the current line number from the source file (this will work only if the -l option was given to sgmls or nsgmls).
appinfo(): Return a string containing the APPINFO parameter (if any) from the SGML declaration.
notation(NNAME): Return an SGMLS_Notation object representing the notation named NNAME. With newer versions of nsgmls, all notations are available; otherwise, only the notations which are actually used will be available.
entity(ENAME): Return an SGMLS_Entity object representing the entity named ENAME. With newer versions of nsgmls, all entities are available; otherwise, only external data entities and internal entities used as attribute values will be available.
ext(): Return a reference to an associative array for user-defined extensions.
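The file() and line() methods are convenient for prefixing diagnostics with a source location. The following is a minimal sketch of that idea, not part of SGMLS.pm: the Mock::Parse package and the location() helper are hypothetical names introduced here so that the example runs without the module installed, and the empty-or-undefined check is an assumption about what the methods return when the -l option was not given.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an SGMLS parse object, so the sketch runs
# without SGMLS.pm; only file() and line() are imitated.
{
    package Mock::Parse;
    sub new  { my ($class, %args) = @_; return bless { %args }, $class }
    sub file { $_[0]{file} }
    sub line { $_[0]{line} }
}

# Build a "file:line" prefix for diagnostics, falling back gracefully
# when no location is available (for example, when -l was not given).
sub location {
    my ($parse) = @_;
    my ($file, $line) = ($parse->file, $parse->line);
    return 'unknown location' unless defined $file && length $file;
    return (defined $line && length $line) ? "$file:$line" : $file;
}

my $with_l    = location(Mock::Parse->new(file => 'doc.sgml', line => 42));
my $without_l = location(Mock::Parse->new());
print "$with_l\n";      # doc.sgml:42
print "$without_l\n";   # unknown location
```

With a real parse, the same helper could be called as location($parse) inside the event loop shown in the SYNOPSIS.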

The SGMLS_Event class

This class holds a single major event, as generated by the next_event method in the SGMLS class. It uses the following methods:

type(): Return a string describing the type of event: one of ``start_element'', ``end_element'', ``cdata'', ``sdata'', ``re'', ``pi'', ``entity'', ``start_subdoc'', ``end_subdoc'', or ``conforming''. See SYNOPSIS, above, for the values associated with each of these.
data(): Return the data associated with the current event (if any). For ``start_element'' and ``end_element'', returns an SGMLS_Element object; for ``entity'', ``start_subdoc'', and ``end_subdoc'', returns an SGMLS_Entity object; for ``cdata'', ``sdata'', and ``pi'', returns a string; and for ``re'' and ``conforming'', returns the empty string. See SYNOPSIS, above, for an example of this method's use.
key(): Return a string key to the event, such as an element or entity name (otherwise, the same as data()).
file(): Return the current file name, as in the SGMLS class.
line(): Return the current line number, as in the SGMLS class.
element(): Return the current element, as in the SGMLS class.
parse(): Return the SGMLS object which generated the event.
entity(ENAME): Look up an entity, as in the SGMLS class.
notation(NNAME): Look up a notation, as in the SGMLS class.
ext(): Return a reference to an associative array for user-defined extensions.
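Because type() returns a plain string, the chain of tests in the SYNOPSIS can also be written as a dispatch table keyed on the event type. The sketch below illustrates that alternative under stated assumptions: Mock::Event is a hypothetical stand-in that offers only type() and data(), and dispatch_event() and %handler are names invented for this example rather than anything supplied by SGMLS.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SGMLS_Event so the sketch runs without
# SGMLS.pm; it provides only the type() and data() methods.
{
    package Mock::Event;
    sub new  { my ($class, $type, $data) = @_;
               return bless { type => $type, data => $data }, $class }
    sub type { $_[0]{type} }
    sub data { $_[0]{data} }
}

my @seen;   # trace of what the handlers did

# One code reference per event type of interest.
my %handler = (
    cdata => sub { push @seen, 'text: ' . $_[0]->data },
    pi    => sub { push @seen, 'pi: '   . $_[0]->data },
    re    => sub { push @seen, 'record end' },
);

# Call the handler registered for the event's type, if any.
# Returns 1 when a handler ran and 0 when the type was ignored.
sub dispatch_event {
    my ($event) = @_;
    my $code = $handler{ $event->type } or return 0;
    $code->($event);
    return 1;
}

my @events = (
    Mock::Event->new('cdata',      'Hello'),
    Mock::Event->new('re',         ''),
    Mock::Event->new('conforming', ''),
);
my @handled = map { dispatch_event($_) } @events;
print "$_\n" for @seen;
```

Unlike the SWITCH block, this version silently skips event types that have no handler; add a fallback entry if unknown types should be fatal.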

The SGMLS_Element class

This class is used for elements, and contains all associated information (such as the element's attributes). It recognises the following methods:

name(): Return a string containing the name, or Generic Identifier, of the element, in upper case.
parent(): Return the SGMLS_Element object for the element's parent (if any).
parse(): Return the SGMLS object for the current parse.
attributes(): Return a reference to an associative array of attribute names and SGMLS_Attribute structures. Attribute names will be all in upper case.
attribute_names(): Return an array of strings containing the names of all attributes defined for the current element, in upper case.
attribute(ANAME): Return the SGMLS_Attribute structure for the attribute ANAME.
set_attribute(ATTRIB): Add the SGMLS_Attribute object ATTRIB to the current element, replacing any other attribute structure with the same name.
in(GI): Return true (i.e. 1) if the string GI is the name of the current element's parent, or false (i.e. 0) if it is not.
within(GI): Return true (i.e. 1) if the string GI is the name of any of the ancestors of the current element, or false (i.e. 0) if it is not.
ext(): Return a reference to an associative array for user-defined extensions.
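The name() and parent() methods are enough to reconstruct an element's position in the document tree. The sketch below shows one way to do that. Mock::Element and element_path() are hypothetical names used only so that the example runs without SGMLS.pm, and the loop assumes that parent() returns a false value for the document element.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SGMLS_Element so the sketch runs without
# SGMLS.pm; only name() and parent() are imitated.
{
    package Mock::Element;
    sub new    { my ($class, $name, $parent) = @_;
                 return bless { name => $name, parent => $parent }, $class }
    sub name   { $_[0]{name} }
    sub parent { $_[0]{parent} }
}

# Walk from the element up through its ancestors and return a
# slash-separated path of element names, outermost first.
sub element_path {
    my ($element) = @_;
    my @names;
    for (my $e = $element; $e; $e = $e->parent) {
        unshift @names, $e->name;
    }
    return join '/', @names;
}

my $doc  = Mock::Element->new('DOC');
my $sect = Mock::Element->new('SECT', $doc);
my $para = Mock::Element->new('PARA', $sect);

my $path = element_path($para);
print "$path\n";   # DOC/SECT/PARA
```

When only a yes-or-no answer is needed, the in() and within() methods described above avoid the explicit walk.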

The SGMLS_Attribute class

Each instance of an attribute for each SGMLS_Element is an object belonging to this class, which recognises the following methods:

name(): Return a string containing the name of the current attribute, all in upper case.
type(): Return a string containing the type of the current attribute, all in upper case. Available types are ``IMPLIED'', ``CDATA'', ``NOTATION'', ``ENTITY'', and ``TOKEN''.
value(): Return the value of the current attribute, if any. This will be an empty string if the type is ``IMPLIED'', a string of some sort if the type is ``CDATA'' or ``TOKEN'' (if it is ``TOKEN'', you may want to split the string into a series of separate tokens), an SGMLS_Notation object if the type is ``NOTATION'', or an SGMLS_Entity object if the type is ``ENTITY''. Note that if the type is ``CDATA'', the value will not have escape sequences for 8-bit characters, record ends, or SDATA processed -- that will be your responsibility.
is_implied(): Return true (i.e. 1) if the value of the attribute is implied, or false (i.e. 0) if it is specified in the document.
set_type(TYPE): Change the type of the attribute to the string TYPE (which should be all in upper case). Available types are ``IMPLIED'', ``CDATA'', ``NOTATION'', ``ENTITY'', and ``TOKEN''.
set_value(VALUE): Change the value of the attribute to VALUE, which may be a string, an SGMLS_Entity object, or an SGMLS_Notation object, depending on the attribute's type.
ext(): Return a reference to an associative array available for user-defined extensions.
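Since value() returns a different kind of thing for each attribute type, code that reads attributes usually branches on type(). The following sketch reduces any attribute to a list of plain strings. It is an illustration only: Mock::Attribute, Mock::Named, and attribute_strings() are hypothetical names, and the mock is_implied() simply compares the type with ``IMPLIED'', which is an assumption about how the real method behaves.

```perl
use strict;
use warnings;

# Hypothetical stand-ins so the sketch runs without SGMLS.pm.
{
    package Mock::Attribute;
    sub new        { my ($class, $name, $type, $value) = @_;
                     return bless { name => $name, type => $type,
                                    value => $value }, $class }
    sub type       { $_[0]{type} }
    sub value      { $_[0]{value} }
    sub is_implied { $_[0]{type} eq 'IMPLIED' }

    package Mock::Named;   # plays the role of SGMLS_Notation or SGMLS_Entity
    sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub name { $_[0]{name} }
}

# Reduce an attribute to a list of strings:
#   IMPLIED            -> empty list
#   TOKEN              -> the individual whitespace-separated tokens
#   CDATA              -> the raw string (escape sequences left untouched)
#   NOTATION or ENTITY -> the name of the referenced object
sub attribute_strings {
    my ($attr) = @_;
    return () if $attr->is_implied;
    my ($type, $value) = ($attr->type, $attr->value);
    return split(' ', $value) if $type eq 'TOKEN';
    return ($value)           if $type eq 'CDATA';
    return ($value->name);
}

my @tokens  = attribute_strings(Mock::Attribute->new('CLASS', 'TOKEN', 'note  warning'));
my @implied = attribute_strings(Mock::Attribute->new('ID', 'IMPLIED', ''));
my @entity  = attribute_strings(
    Mock::Attribute->new('FIG', 'ENTITY', Mock::Named->new('logo')));

print join(',', @tokens), "\n";   # note,warning
print scalar(@implied), "\n";     # 0
print "@entity\n";                # logo
```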

The SGMLS_Notation class

All declared notations appear as objects belonging to this class, which recognises the following methods:

name(): Return a string containing the name of the notation.
sysid(): Return a string containing the system identifier of the notation, if any.
pubid(): Return a string containing the public identifier of the notation, if any.
ext(): Return a reference to an associative array available for user-defined extensions.

The SGMLS_Entity class

All declared entities appear as objects belonging to this class, which recognises the following methods:

name(): Return a string containing the name of the entity, in mixed case.
type(): Return a string containing the type of the entity, in upper case. Available types are ``CDATA'', ``SDATA'', ``NDATA'' (external entities only), ``SUBDOC'', ``PI'' (newer versions of nsgmls only), or ``TEXT'' (newer versions of nsgmls only).
value(): Return a string containing the value of the entity, if it is internal.
sysid(): Return a string containing the system identifier of the entity (if any), if it is external.
pubid(): Return a string containing the public identifier of the entity (if any), if it is external.
filenames(): Return an array of strings containing any file names generated from the identifiers, if the entity is external.
notation(): Return the SGMLS_Notation object associated with the entity, if it is external.
data_attributes(): Return a reference to an associative array of data attribute names (in upper case) and the associated SGMLS_Attribute objects for the current entity.
data_attribute_names(): Return an array of data attribute names (in upper case) for the current entity.
data_attribute(ANAME): Return the SGMLS_Attribute object for the data attribute named ANAME for the current entity.
set_data_attribute(ATTRIB): Add the SGMLS_Attribute object ATTRIB to the current entity, replacing any other data attribute with the same name.
ext(): Return a reference to an associative array for user-defined extensions.
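The data attribute methods mirror the attribute methods of SGMLS_Element, so an entity's data attributes can be listed in the same way. The sketch below prints each data attribute name with its type. Mock::Attribute, Mock::Entity, and describe_data_attributes() are hypothetical names that exist only to make the example runnable without SGMLS.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-ins so the sketch runs without SGMLS.pm.
{
    package Mock::Attribute;
    sub new  { my ($class, $name, $type) = @_;
               return bless { name => $name, type => $type }, $class }
    sub name { $_[0]{name} }
    sub type { $_[0]{type} }

    package Mock::Entity;
    sub new {
        my ($class, $name, @attrs) = @_;
        my %by_name = map { ($_->name, $_) } @attrs;
        return bless { name => $name, attrs => \%by_name }, $class;
    }
    sub name                 { $_[0]{name} }
    sub data_attribute_names { keys %{ $_[0]{attrs} } }
    sub data_attribute       { $_[0]{attrs}{ $_[1] } }
}

# Return one "NAME (TYPE)" string per data attribute, sorted by name.
sub describe_data_attributes {
    my ($entity) = @_;
    my @names = sort { $a cmp $b } $entity->data_attribute_names;
    my @lines;
    for my $aname (@names) {
        my $attr = $entity->data_attribute($aname);
        push @lines, sprintf('%s (%s)', $aname, $attr->type);
    }
    return @lines;
}

my $entity = Mock::Entity->new(
    'logo',
    Mock::Attribute->new('WIDTH',  'CDATA'),
    Mock::Attribute->new('FORMAT', 'TOKEN'),
);
my @summary = describe_data_attributes($entity);
print "$_\n" for @summary;
```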


AUTHOR AND COPYRIGHT

Copyright 1994 and 1995 by David Megginson, dmeggins@aix1.uottawa.ca. Distributed under the terms of the GNU General Public License (version 2, 1991) -- see the file COPYING which is included in the SGMLS.pm distribution.


SEE ALSO:

the SGMLS::Output manpage and the SGMLS::Refs manpage.
