SGMLS - class for postprocessing the output from the sgmls and
nsgmls parsers.
This module is not included with the standard ActivePerl distribution. It is available as a separate download using PPM.
    use SGMLS;

    my $parse = new SGMLS(STDIN);

    my $event = $parse->next_event;
    while ($event) {
      SWITCH: {
        ($event->type eq 'start_element') && do {
          my $element = $event->data;    # An object of class SGMLS_Element
          # [[your code for the beginning of an element]]
          last SWITCH;
        };
        ($event->type eq 'end_element') && do {
          my $element = $event->data;    # An object of class SGMLS_Element
          # [[your code for the end of an element]]
          last SWITCH;
        };
        ($event->type eq 'cdata') && do {
          my $cdata = $event->data;      # A string
          # [[your code for character data]]
          last SWITCH;
        };
        ($event->type eq 'sdata') && do {
          my $sdata = $event->data;      # A string
          # [[your code for system data]]
          last SWITCH;
        };
        ($event->type eq 're') && do {
          # [[your code for a record end]]
          last SWITCH;
        };
        ($event->type eq 'pi') && do {
          my $pi = $event->data;         # A string
          # [[your code for a processing instruction]]
          last SWITCH;
        };
        ($event->type eq 'entity') && do {
          my $entity = $event->data;     # An object of class SGMLS_Entity
          # [[your code for an external entity]]
          last SWITCH;
        };
        ($event->type eq 'start_subdoc') && do {
          my $entity = $event->data;     # An object of class SGMLS_Entity
          # [[your code for the beginning of a subdoc entity]]
          last SWITCH;
        };
        ($event->type eq 'end_subdoc') && do {
          my $entity = $event->data;     # An object of class SGMLS_Entity
          # [[your code for the end of a subdoc entity]]
          last SWITCH;
        };
        ($event->type eq 'conforming') && do {
          # [[your code for a conforming document]]
          last SWITCH;
        };
        # Reached only when none of the cases above matched.
        die "Internal error: unknown event type " . $event->type . "\n";
      }
      $event = $parse->next_event;
    }
The SGMLS package consists of several related classes:
SGMLS, SGMLS_Event, SGMLS_Element,
SGMLS_Attribute, SGMLS_Notation, and SGMLS_Entity. All
of these classes are available when you specify
    use SGMLS;
Generally, the only object which you will create explicitly will
belong to the SGMLS class; all of the others will then be created
automatically for you over the course of the parse. Much fuller
documentation is available in the .sgml files in the DOC/
directory of the SGMLS.pm distribution.
The SGMLS class holds a single parse. When you create an instance of it,
you specify a file handle as an argument (if you are reading the
output of sgmls or nsgmls from a pipe, the file handle will
ordinarily be STDIN):
    my $parse = new SGMLS(STDIN);
The most important method for this class is next_event, which reads
and returns the next major event from the input stream. It is
important to note that the SGMLS class deals with most ESIS
events itself: attributes and entity definitions, for example, are
collected and stored automatically and invisibly to the user. The
following list contains all of the methods for the SGMLS class:
next_event(): Return an SGMLS_Event object containing the
next major event from the SGML parse.
-
element(): Return an SGMLS_Element object containing the
current element in the document.
-
file(): Return a string containing the name of the current
SGML source file (this will work only if the -l option was given to
sgmls or nsgmls).
-
line(): Return a string containing the current line number
from the source file (this will work only if the -l option was
given to sgmls or nsgmls).
-
appinfo(): Return a string containing the APPINFO
parameter (if any) from the SGML declaration.
-
notation(NNAME): Return an SGMLS_Notation object
representing the notation named NNAME. With newer versions of
nsgmls, all notations are available; otherwise, only the notations
which are actually used will be available.
-
entity(ENAME): Return an SGMLS_Entity object representing
the entity named ENAME. With newer versions of nsgmls, all
entities are available; otherwise, only external data entities and
internal entities used as attribute values will be available.
-
ext(): Return a reference to an associative array for
user-defined extensions.
-
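As an alternative to the SWITCH block in the synopsis, the event loop can be driven from a table of handlers. The following is only a minimal sketch: dispatch_events is a hypothetical helper, not part of SGMLS.pm, and it assumes (as the synopsis loop does) that next_event returns a false value once the input is exhausted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): read events from $parse until
# next_event() returns a false value, and call the handler registered for
# each event type. Events with no registered handler are skipped.
# Returns the number of events that were handled.
sub dispatch_events {
    my ($parse, %handlers) = @_;
    my $handled = 0;
    while (my $event = $parse->next_event) {
        my $handler = $handlers{ $event->type } or next;
        $handler->($event);
        $handled++;
    }
    return $handled;
}

# Possible use with a real parse:
#   my $parse = new SGMLS(STDIN);
#   dispatch_events($parse,
#       start_element => sub { print "<",  $_[0]->data->name, ">\n" },
#       end_element   => sub { print "</", $_[0]->data->name, ">\n" });
```

Because attributes and entity definitions are collected by the SGMLS class itself, the handlers only ever see the major event types listed in the synopsis.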
The SGMLS_Event class holds a single major event, as generated by the
next_event method in the SGMLS class. It provides the following
methods:
type(): Return a string describing the type of event:
``start_element'', ``end_element'', ``cdata'', ``sdata'', ``re'', ``pi'',
``entity'', ``start_subdoc'', ``end_subdoc'', and ``conforming''. See
SYNOPSIS, above, for the values associated with each of these.
-
data(): Return the data associated with the current event (if
any). For ``start_element'' and ``end_element'', returns an
SGMLS_Element object; for ``entity'', ``start_subdoc'', and
``end_subdoc'', returns an SGMLS_Entity object; for ``cdata'', ``sdata'',
and ``pi'', returns a string; and for ``re'' and ``conforming'', returns the
empty string. See SYNOPSIS, above, for an example of this
method's use.
-
key(): Return a string key to the event, such as an element
or entity name (otherwise, the same as data()).
-
file(): Return the current file name, as in the SGMLS
class.
-
line(): Return the current line number, as in the SGMLS
class.
-
element(): Return the current element, as in the SGMLS
class.
-
parse(): Return the SGMLS object which generated the
event.
-
entity(ENAME): Look up an entity, as in the SGMLS class.
-
notation(NNAME): Look up a notation, as in the SGMLS
class.
-
ext(): Return a reference to an associative array for
user-defined extensions.
-
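The type(), key(), file(), and line() methods are enough to build a short label for log or error messages. This is a sketch only: describe_event is a hypothetical helper, and it assumes that file() and line() return an undefined or empty value when the parser was not run with the -l option.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): build a label such as
# "start_element PARA (doc.sgml:12)" from an SGMLS_Event-like object.
sub describe_event {
    my ($event) = @_;
    my $type  = $event->type;
    my $label = $type;

    # For element and entity events, key() is documented to give the name.
    if ($type =~ /^(?:start_element|end_element|entity|start_subdoc|end_subdoc)$/) {
        $label .= ' ' . $event->key;
    }

    # Assumption: without -l, file() and line() are undefined or empty.
    my $file = $event->file;
    my $line = $event->line;
    if (defined $file && length $file && defined $line && length $line) {
        $label .= " ($file:$line)";
    }
    return $label;
}
```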
The SGMLS_Element class is used for elements, and contains all
associated information (such as the element's attributes). It
recognises the following methods:
name(): Return a string containing the name, or Generic
Identifier, of the element, in upper case.
-
parent(): Return the SGMLS_Element object for the
element's parent (if any).
-
parse(): Return the SGMLS object for the current parse.
-
attributes(): Return a reference to an associative array of
attribute names and SGMLS_Attribute structures. Attribute names
will be all in upper case.
-
attribute_names(): Return an array of strings containing the
names of all attributes defined for the current element, in upper
case.
-
attribute(ANAME): Return the SGMLS_Attribute structure for
the attribute ANAME.
-
set_attribute(ATTRIB): Add the SGMLS_Attribute object
ATTRIB to the current element, replacing any other attribute
structure with the same name.
-
in(GI): Return true (i.e. 1) if the string GI is the
name of the current element's parent, or false (i.e. 0) if it is
not.
-
within(GI): Return true (i.e. 1) if the string GI is the
name of any of the ancestors of the current element, or false
(i.e. 0) if it is not.
-
ext(): Return a reference to an associative array for
user-defined extensions.
-
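The element methods above can be combined to answer common questions such as "where am I in the document?" and "which attributes were actually specified?". The two helpers below are hypothetical and only a sketch; they assume that parent() returns a false value for the top-level element.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): follow parent() links upward
# to build a path such as "DOC/SECT/PARA".
# Assumption: parent() returns a false value when there is no parent.
sub element_path {
    my ($element) = @_;
    my @names;
    while ($element) {
        unshift @names, $element->name;
        $element = $element->parent;
    }
    return join '/', @names;
}

# Hypothetical helper: return a hash of attribute name => value for every
# attribute that is not implied. Values for NOTATION and ENTITY attributes
# are objects rather than strings, as described for SGMLS_Attribute.
sub specified_attributes {
    my ($element) = @_;
    my %values;
    for my $name ($element->attribute_names) {
        my $attribute = $element->attribute($name);
        next if $attribute->is_implied;
        $values{$name} = $attribute->value;
    }
    return %values;
}
```

When only a yes-or-no answer about context is needed, in(GI) and within(GI) avoid building the path at all.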
Each instance of an attribute for each SGMLS_Element is an object
belonging to the SGMLS_Attribute class, which recognises the following
methods:
name(): Return a string containing the name of the current
attribute, all in upper case.
-
type(): Return a string containing the type of the current
attribute, all in upper case. Available types are ``IMPLIED'', ``CDATA'',
``NOTATION'', ``ENTITY'', and ``TOKEN''.
-
value(): Return the value of the current attribute, if any.
This will be an empty string if the type is ``IMPLIED'', a string of
some sort if the type is ``CDATA'' or ``TOKEN'' (if it is ``TOKEN'', you may
want to split the string into a series of separate tokens), an
SGMLS_Notation object if the type is ``NOTATION'', or an
SGMLS_Entity object if the type is ``ENTITY''. Note that if the
type is ``CDATA'', the value will not have escape sequences for 8-bit
characters, record ends, or SDATA processed -- that will be your
responsibility.
-
is_implied(): Return true (i.e. 1) if the value of the
attribute is implied, or false (i.e. 0) if it is specified in the
document.
-
set_type(TYPE): Change the type of the attribute to the
string TYPE (which should be all in upper case). Available types
are ``IMPLIED'', ``CDATA'', ``NOTATION'', ``ENTITY'', and ``TOKEN''.
-
set_value(VALUE): Change the value of the attribute to
VALUE, which may be a string, an SGMLS_Entity object, or an
SGMLS_Notation object, depending on the attribute's type.
-
ext(): Return a reference to an associative array available
for user-defined extensions.
-
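The description of value() leaves two jobs to the caller: splitting ``TOKEN'' values and processing escape sequences in ``CDATA'' values. The sketch below shows one way to do both. The helper names are hypothetical, and the escape conventions are an assumption based on the output format of sgmls (\\ for a backslash, \n for a record end, \| around SDATA, and \nnn for a character given as three octal digits); check them against the parser you actually use.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): split a TOKEN attribute value
# into its separate tokens. Returns an empty list for other attribute types.
sub attribute_tokens {
    my ($attribute) = @_;
    return () unless $attribute->type eq 'TOKEN';
    return split ' ', $attribute->value;
}

# Hypothetical helper: process escape sequences in a CDATA string.
# Assumed conventions: \\ = backslash, \n = record end, \| = SDATA bracket,
# \nnn = character with octal code nnn. This sketch maps a record end to a
# newline and simply drops the SDATA brackets.
sub unescape_cdata {
    my ($text) = @_;
    $text =~ s{\\(\\|n|\||[0-7]{3})}{
          $1 eq '\\' ? '\\'
        : $1 eq 'n'  ? "\n"
        : $1 eq '|'  ? ''
        :              chr(oct($1))
    }ge;
    return $text;
}
```

An application that needs to treat SDATA specially would replace the '|' branch with its own handling instead of discarding the brackets.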
All declared notations appear as objects belonging to the
SGMLS_Notation class, which recognises the following methods:
name(): Return a string containing the name of the notation.
-
sysid(): Return a string containing the system identifier of
the notation, if any.
-
pubid(): Return a string containing the public identifier of
the notation, if any.
-
ext(): Return a reference to an associative array available
for user-defined extensions.
-
All declared entities appear as objects belonging to the SGMLS_Entity
class, which recognises the following methods:
name(): Return a string containing the name of the entity, in
mixed case.
-
type(): Return a string containing the type of the entity, in
upper case. Available types are ``CDATA'', ``SDATA'', ``NDATA'' (external
entities only), ``SUBDOC'', ``PI'' (newer versions of nsgmls only), or
``TEXT'' (newer versions of nsgmls only).
-
value(): Return a string containing the value of the entity,
if it is internal.
-
sysid(): Return a string containing the system identifier of
the entity (if any), if it is external.
-
pubid(): Return a string containing the public identifier of
the entity (if any), if it is external.
-
filenames(): Return an array of strings containing any file
names generated from the identifiers, if the entity is external.
-
notation(): Return the SGMLS_Notation object associated
with the entity, if it is external.
-
data_attributes(): Return a reference to an associative array
of data attribute names (in upper case) and the associated
SGMLS_Attribute objects for the current entity.
-
data_attribute_names(): Return an array of data attribute
names (in upper case) for the current entity.
-
data_attribute(ANAME): Return the SGMLS_Attribute object
for the data attribute named ANAME for the current entity.
-
set_data_attribute(ATTRIB): Add the SGMLS_Attribute object
ATTRIB to the current entity, replacing any other data attribute
with the same name.
-
ext(): Return a reference to an associative array for
user-defined extensions.
-
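A handler for ``entity'' events often needs a one-line summary of what it was given. The following sketch builds one from the methods listed above. describe_entity is a hypothetical helper, and it assumes that sysid(), pubid(), and notation() return an undefined or empty value when they do not apply.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): summarise an SGMLS_Entity-like
# object as, for example: logo NDATA sysid="logo.gif" notation=GIF
sub describe_entity {
    my ($entity) = @_;
    my @parts = ($entity->name, $entity->type);

    # Assumption: identifiers that do not apply are undefined or empty.
    for my $field (qw(sysid pubid)) {
        my $id = $entity->$field;
        push @parts, qq{$field="$id"} if defined $id && length $id;
    }

    # notation() is only meaningful for external entities; use it only
    # when it actually returned an object.
    my $notation = $entity->notation;
    push @parts, 'notation=' . $notation->name if ref $notation;

    return join ' ', @parts;
}
```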
Copyright 1994 and 1995 by David Megginson,
dmeggins@aix1.uottawa.ca. Distributed under the terms of the GNU
General Public License (version 2, 1991) -- see the file COPYING
which is included in the SGMLS.pm distribution.
See also the SGMLS::Output manpage and the SGMLS::Refs manpage.