WWW(3)							   WWW(3)



NAME
       WWW - World Wide Web Package

SYNOPSIS
       extract_description( FILE )
       extract_meta( FILE, NAME )
       hyperlink( LIST )

DESCRIPTION
       This  package  provides	a utility functions for the World
       Wide Web to extract descriptions of  or	meta  information
       from files, and hyperlink text.

SUBROUTINES
       The following Perl subroutines are defined and available:

       extract_description( FILE )
	      Extracts	a  description from an HTML or plain text
	      file given by the FILE  name;  FILE  should  be  an
	      absolute	 path.	  The  first  $description::chars
	      (default: 2048) characters are read.  If	the  file
	      ends  in one of the extensions htm, html, or shtml,
	      it is presumed to be an HTML file; if the file ends
	      in  txt,	it  is	presumed to be a plain text file.
	      Other extensions are not recognized and no descrip
	      tion is returned for them.

	      For  HTML	 files,	 first, if a <META NAME="descrip
	      tion" CONTENT="..."> or a	 <META	NAME="DC.descrip
	      tion"   CONTENT="...">  (Dublin  Core)  element  is
	      found, then the words specified as the value of the
	      CONTENT attribute is returned as the description.

	      Otherwise,   all	 HTML	comments,   text  between
	      <SCRIPT>, <STYLE>, and <TITLE> tags, and all  other
	      HTML tags are stripped.  If <AREA ... ALT="..."> or
	      <IMG ... ALT="..."> elements are	found,	then  the
	      words  specified as the value of the ALT attributes
	      are extracted.

	      Finally, for either HTML or plain	 text  files,  at
	      most    $description::words   (default:	50)   are
	      returned.

       extract_meta( FILE, NAME )
	      Extracts the value of the CONTENT attribute from	a
	      META  element  having the given NAME attribute from
	      an HTML file given by the FILE name; FILE should be
	      an  absolute path.  The file must end in one of the
	      extensions htm, html, or shtml to be considered  an
	      HTML file.  The first $description::chars (default:
	      2048) characters	are  read.   The  characters  are
	      cached  between  consecutive  calls  using the same
	      filename.

       hyperlink( LIST )
	      Adds hyperlinks to strings: that	is  strings  that
	      contain  substrings  that are valid URLs (according
	      to  RFC  1630)  have  the	 appropriate  HTML   tags
	      ``wrapped''  around  them	 so  that  they	 will  be
	      selectable when displayed in a browser.	The  ftp,
	      gopher, http, https, mailto, news, telnet, and wais
	      URLs are recognized.  Example:

		 Read all about it at
		 http://www.usatoday.com/

	    becomes:

		 Read all about it at
		 <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>


SEE ALSO
       perl(1)

       Tim  Berners-Lee.   ``Universal	Resource  Identifiers  in
       WWW,'' Request for Comments 1630, Network Working Group of
       the Internet Engineering Task Force, June 1994.

       Tim  Berners-Lee,  Larry	 Masinter,  and	 Mark	McCahill.
       ``Uniform  Resource Locators (URL),'' Request for Comments
       1738, Network Working Group, 1994.

       Dave Raggett, Arnaud Le Hors, and Ian Jacobs.  ``Notes  on
       helping	search	engines	 index	your Web site,'' HTML 4.0
       Specification, Appendix	B:  Performance,  Implementation,
       and Design Notes, World Wide Web Consortium, April 1998.

       --.  ``Objects, Images, and Applets: How to specify alter
       nate text,'' HTML 4.0 Specification, 13.8, World Wide Web
       Consortium, April 1998.

       Dublin Core Directorate.	 ``The Dublin Core: A Simple Con
       tent Description Model for Electronic Resources.''

       Larry Wall, et al.  Programming Perl, 3rd ed., O'Reilly	&
       Associates, Inc., Sebastopol, CA, 2000.

AUTHOR
       Paul J. Lucas <pauljlucas@mac.com>



WWW			February 12, 2000		   WWW(3)
