
The following sample program connects to a WWW server and retrieves information about the document specified as a URL (Uniform Resource Locator) on the command line. As an example of the information available for a document, it prints the date when the document was last modified.
RFC 1945, which describes the HTTP/1.0 protocol, provides the information needed for this task:
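HTTP/1.0 knows two request formats, the simple request and the full request. The simple request is defined only for the GET command, so the HEAD command is always sent as a full request. The full request format of the HEAD command is defined as follows: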
HEAD documentname HTTP/1.0<CRLF>
request header<CRLF>
For our purpose we don't need to pass additional options in the request
header field, so we can leave this field blank. However, we must not omit
the closing CRLF character pair that terminates the request header field;
otherwise the server would not accept the request as a valid command. A
full request sent to a server returns a full response in the format:
HTTP/1.0 statuscode reasonphrase<CRLF>
response header<CRLF>
The HTTP specification lists several information fields for the response
header that can appear in any order. The response to a HEAD command never
carries the document itself. For now we are only interested in the
Last-Modified field and ignore all other fields.
The following example shows a sample HEAD command sent to a server, followed by the server's response:
HEAD / HTTP/1.0<CRLF><CRLF>
Response from server:
HTTP/1.0 200 OK<CRLF>
Server: GoServe/2.45<CRLF>
Date: Thu, 18 Jul 1996 15:40:47 GMT<CRLF>
Content-Type: text/html<CRLF>
Content-Length: 1081<CRLF>
Content-Transfer-Encoding: binary<CRLF>
Last-Modified: Thu, 19 Oct 1995 16:27:52 GMT<CRLF>
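The request format described above can be sketched in a few lines of REXX. The helper name BuildRequest and its Fields argument are assumptions made for this illustration and are not part of the sample program; the sketch only assembles the request string and does not contact a server.

```rexx
/* BUILDREQ.CMD - sketch: assemble a full HEAD request string */
Request = BuildRequest("/", "")
Say "Request length:" Length(Request)
Say "Terminator    :" C2X(Right(Request, 4))
Exit

/* BuildRequest (hypothetical helper, not part of SHOWDATE.CMD) */
/* Document - document name, starting with a slash              */
/* Fields   - request header lines, each already ending in CRLF */
/*            (empty string if no fields are needed)            */
BuildRequest: Procedure
  Parse Arg Document, Fields
  CRLF = "0D0A"x
  /* request line, optional header fields, closing empty line */
  Return "HEAD" Document "HTTP/1.0" || CRLF || Fields || CRLF
```

For the document name "/" and no header fields this yields the 19-byte command shown in the sample above, ending in the hexadecimal sequence 0D0A0D0A.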
Since we are only interested in the date when the document was last
modified, we have to search the response for this keyword. During development
of this sample I discovered that most web servers use the exact spelling
shown above to identify this field, but some servers do not. To find the
date in the responses from all servers, we can simply uppercase the whole
response before searching for the Last-Modified field.
This is everything we need to know for our program. The main program is implemented as follows:
/* SHOWDATE.CMD - IBM REXX Sample Program */
/* the URL of the document is the first word on the command line */
Parse Arg URL .
/* Load REXX Socket library if not already loaded */
If RxFuncQuery("SockLoadFuncs") Then
  Do
    Call RxFuncAdd "SockLoadFuncs","RXSOCK","SockLoadFuncs"
    Call SockLoadFuncs
  End
/* retrieve the header of the document specified by URL */
Header = GetHeader(URL)
If Length(Header) \= 0 Then
  Do
    /* header could be read, find date */
    DocDate = GetModificationDate(Header)
    Say "Document date is:" DocDate
  End
Else
  Say "Document information could not be retrieved."
Exit
The 'Connect' function that connects to the server is the same as the
one already seen in the remote control application, except that it
now uses port number 80 if the caller specified no port:
/********************************************************/
/*                                                      */
/* Function:  Connect                                   */
/* Purpose:   Create a socket and connect it to server. */
/* Arguments: Server - server name, may contain port no.*/
/* Returns:   Socket number if successful, -1 otherwise */
/*                                                      */
/********************************************************/
Connect: Procedure
  Parse Arg Server
  /* if the servername has a port address specified    */
  /* then use this one, otherwise use the default http */
  /* port 80                                           */
  Parse Var Server Server ":" Port
  If Port = "" Then
    Port = 80
  /* resolve server name alias to dotted IP address */
  rc = SockGetHostByName(Server, "Host.!")
  If rc = 0 Then
    Do
      Say "Unable to resolve server:" Server
      Return -1
    End
  /* create a TCP socket */
  Socket = SockSocket("AF_INET", "SOCK_STREAM", "0")
  If Socket < 0 Then
    Do
      Say "Unable to create socket"
      Return -1
    End
  /* connect the new socket to the specified server */
  Host.!family = "AF_INET"
  Host.!port = Port
  rc = SockConnect(Socket, "Host.!")
  If rc < 0 Then
    Do
      Say "Unable to connect to server:" Server
      Call Close Socket
      Return -1
    End
  Return Socket
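The port handling at the start of 'Connect' can be tried without a network connection. The following is a minimal sketch; the helper name SplitServer, the check for a numeric port and the host name www.example.com are assumptions added for illustration and do not appear in the sample program.

```rexx
/* SPLITSRV.CMD - sketch: split "server:port" the way Connect does */
Say SplitServer("www.example.com")
Say SplitServer("www.example.com:8080")
Say "[" || SplitServer("www.example.com:http") || "]"
Exit

/* SplitServer (hypothetical helper): returns "name port", or an */
/* empty string if the port is not a whole number                */
SplitServer: Procedure
  Parse Arg Server
  /* everything after the first colon is taken as the port */
  Parse Var Server Name ":" Port
  If Port = "" Then
    Port = 80                      /* default HTTP port */
  Else If \Datatype(Port, "W") Then
    Return ""                      /* assumption: reject non-numeric ports */
  Return Name Port
```

The three calls print "www.example.com 80", "www.example.com 8080" and "[]". 'Connect' itself passes the port on unchecked, so a malformed port only shows up later as a failed connection.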
The 'SendCommand' function expects a single-line command from
the caller. It appends two CRLF pairs to the command string: the first
ends the request line and the second ends the (empty) request header, which
completes the full request. After the command has been sent, the function
reads the response from the server until no more characters can be read,
that is, until the server closes the connection, and returns the response:
/********************************************************/
/*                                                      */
/* Function:  SendCommand                               */
/* Purpose:   Send a command via the specified socket   */
/*            and return the full response to caller.   */
/* Arguments: Socket - active socket number             */
/*            Command - command string                  */
/* Returns:   Response from server or empty string if   */
/*            failed.                                   */
/*                                                      */
/********************************************************/
SendCommand: Procedure
  Parse Arg Socket, Command
  /* append two pairs of CRLF to end the command string */
  Command = Command || "0D0A0D0A"x
  BytesSent = SockSend(Socket, Command)
  /* give up if the command could not be sent */
  If BytesSent < 0 Then
    Return ""
  /* collect the response until the server stops sending */
  Response = ""
  Do Forever
    BytesRcvd = SockRecv(Socket, "RcvData", 1024)
    If BytesRcvd <= 0 Then
      Leave
    Response = Response || RcvData
  End
  Return Response
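'SendCommand' returns the status line together with the header fields, so a caller could inspect the status code before looking at individual fields. The following is a minimal sketch of that idea; the helper name StatusCode is an assumption, and the sample program itself does not perform this check. The response string is built by hand, so no network access is needed.

```rexx
/* STATUS.CMD - sketch: read the status code from a response string */
CRLF = "0D0A"x
Response = "HTTP/1.0 404 Not Found" || CRLF || "Server: GoServe/2.45" || CRLF || CRLF
Code = StatusCode(Response)
If Code = 200 Then
  Say "Request succeeded"
Else
  Say "Request failed with status code" Code
Exit

/* StatusCode (hypothetical helper): returns the second word of */
/* the first response line, or an empty string if there is none */
StatusCode: Procedure
  Parse Arg Response
  /* first line ends at the linefeed; the code is its second word */
  Parse Var Response StatusLine "0A"x
  Return Word(StatusLine, 2)
```

With the hand-made response above, the sketch prints "Request failed with status code 404".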
The 'Close' function is already well known from the previous samples:
/********************************************************/
/*                                                      */
/* Procedure: Close                                     */
/* Purpose:   Close the specified socket.               */
/* Arguments: Socket - active socket number             */
/* Returns:   nothing                                   */
/*                                                      */
/********************************************************/
Close: Procedure
  Parse Arg Socket
  Call SockShutDown Socket, 2
  Call SockClose Socket
  Return
The 'GetHeader' function isolates the server name and document name
from the passed URL, connects to the server, retrieves the full header
information and closes the connection again, returning the full header
to the caller:
/********************************************************/
/*                                                      */
/* Function:  GetHeader                                 */
/* Purpose:   Request the header for the specified URL  */
/*            from the network.                         */
/* Arguments: URL - fully specified document locator    */
/* Returns:   Full header of specified document or      */
/*            empty string if failed (also if no header */
/*            exists).                                  */
/*                                                      */
/********************************************************/
GetHeader: Procedure
  Parse Arg URL
  /* Isolate server name and document name, document */
  /* name is always preceded with a slash            */
  Parse Var URL "http://" Server "/" Document
  Document = "/" || Document
  Socket = Connect(Server)
  If Socket = -1 Then
    Return ""
  Command = "HEAD" Document "HTTP/1.0"
  Header = SendCommand(Socket, Command)
  Call Close Socket
  Return Header
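To see how the Parse template in 'GetHeader' treats different URLs, it can be run on its own. This is a sketch only: the helper name SplitURL and the host name www.example.com are placeholders chosen for illustration, and nothing is sent over the network.

```rexx
/* SPLITURL.CMD - sketch: the URL template used by GetHeader */
Say SplitURL("http://www.example.com/")
Say SplitURL("http://www.example.com")
Say SplitURL("http://www.example.com:8080/docs/index.html")
Exit

/* SplitURL (hypothetical helper): returns "server document" */
SplitURL: Procedure
  Parse Arg URL
  /* same template as in GetHeader */
  Parse Var URL "http://" Server "/" Document
  /* the document name always starts with a slash */
  Return Server "/" || Document
```

The first two calls both print "www.example.com /", so a URL without a trailing slash still requests the root document. The third call prints "www.example.com:8080 /docs/index.html": the port stays attached to the server name, which is the form 'Connect' expects. The literal "http://" is matched exactly as written, so a URL that lacks this prefix produces an empty server name.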
Finally, the function 'GetModificationDate' searches the full
header (which is passed as a single string) for the last modification
date. As already mentioned, we search only the uppercased header to avoid
problems with some web servers; as a side effect, the returned date is
uppercased as well. To find the last modification date, the function
looks for the keyword "LAST-MODIFIED:" and a trailing linefeed character
("0A"x). The extracted modification date can still contain leading
or trailing blanks or a carriage return character, which are removed
before the result is returned to the caller. Searching only for the
linefeed character as a delimiter ensures that the program also
works with web servers that use only the UNIX-style line separation character:
/********************************************************/
/*                                                      */
/* Function:  GetModificationDate                       */
/* Purpose:   Find the last-modified date in the passed */
/*            header and return just the date.          */
/* Arguments: Header - full header of document          */
/* Returns:   Date string when document was last        */
/*            modified or empty string if date was not  */
/*            found.                                    */
/*                                                      */
/********************************************************/
GetModificationDate: Procedure
  Parse Arg Header
  /* isolate date string and strip all unwanted chars */
  Parse Upper Var Header "LAST-MODIFIED:" ModDate "0A"x
  ModDate = Strip(ModDate)
  ModDate = Strip(ModDate,,"0D"x)
  Return ModDate
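Because Parse Upper converts the whole header, the function above returns the date in uppercase. If the original spelling should be kept, one possible variant locates the keyword in an uppercased copy and cuts the value out of the unchanged header. The following is a sketch under that assumption; the helper name FieldValue and the hand-made sample headers are illustrative and not part of the sample program.

```rexx
/* FLDVALUE.CMD - sketch: case-insensitive search that keeps the */
/* original spelling of the value                                */
CRLF = "0D0A"x
LF   = "0A"x
Dos  = "HTTP/1.0 200 OK" || CRLF || "Last-Modified: Thu, 19 Oct 1995 16:27:52 GMT" || CRLF
Unix = "HTTP/1.0 200 OK" || LF || "last-modified: Thu, 19 Oct 1995 16:27:52 GMT" || LF
Say FieldValue(Dos, "Last-Modified")
Say FieldValue(Unix, "Last-Modified")
Say "[" || FieldValue(Unix, "Expires") || "]"
Exit

/* FieldValue (hypothetical helper): value of the named field, */
/* or an empty string if the field is not present              */
FieldValue: Procedure
  Parse Arg Header, Name
  Key = Translate(Name) || ":"
  /* locate the keyword in an uppercased copy of the header */
  Start = Pos(Key, Translate(Header))
  If Start = 0 Then
    Return ""
  /* cut the value out of the original header, up to the linefeed */
  Rest = Substr(Header, Start + Length(Key))
  Parse Var Rest Value "0A"x
  /* drop a trailing carriage return, then surrounding blanks */
  Return Strip(Strip(Value, "T", "0D"x))
```

Both sample headers yield "Thu, 19 Oct 1995 16:27:52 GMT", whether the lines end in CRLF or in a single linefeed, and the missing field yields an empty string. Like the original function, the sketch finds the keyword anywhere in the header and does not check that it starts a line.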