The server has a single-process, multi-threaded, asynchronous I/O design. On a single-processor system this is the most efficient approach. On a multi-processor system it is limited by the single process context (ignoring scripts which execute within their own context). An obvious improvement would be to have multi-processor threading or a pool of server processes, one per CPU, servicing requests. The latter may be the approach of future refinements.
The server has been tested with up to 30 concurrent requests originating from 6 different systems and continues to provide an even distribution of data flow to each client, albeit more slowly.
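As a rough, hypothetical analogue of the single-process asynchronous design described above (not the server's own code), the Python sketch below serves several concurrent clients from one process and one asyncio event loop. Each response is written in chunks. Awaiting `drain()` applies flow control, and `asyncio.sleep(0)` explicitly yields to the event loop after each chunk so that other connections make progress, which is one way to obtain the even distribution of data flow noted above. The chunk size, the loopback address and the minimal request handling are all assumptions for illustration.

```python
import asyncio

CHUNK = 4096  # hypothetical write size per turn of the event loop


async def handle(reader, writer, payload):
    # Consume the whole (minimal) request header before replying.
    await reader.readuntil(b"\r\n\r\n")
    writer.write(b"HTTP/1.0 200 OK\r\n\r\n")
    for i in range(0, len(payload), CHUNK):
        writer.write(payload[i:i + CHUNK])
        await writer.drain()      # flow control: wait if the socket buffer is full
        await asyncio.sleep(0)    # yield so other connections can make progress
    writer.close()
    await writer.wait_closed()


async def fetch(port):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET / HTTP/1.0\r\n\r\n")
    await writer.drain()
    data = await reader.read()    # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    return data


async def main(clients=10, payload=b"x" * 65536):
    # One process, one event loop, many requests in flight at once.
    server = await asyncio.start_server(
        lambda r, w: handle(r, w, payload), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(*(fetch(port) for _ in range(clients)))


if __name__ == "__main__":
    replies = asyncio.run(main())
    print(len(replies), "replies,", len(replies[0]), "bytes each")
```

A pool of such processes, one per CPU, would be the multi-processor refinement mentioned above; the sketch deliberately stays within a single process.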
Test results are all obtained using the native Digital TCP/IP
Services executable. The NETLIB image may provide very slightly lower results
due to the additional NETLIB layer. These results are indicative only!
Simple File Request Turn-Around
Two sets of data are now reported, one with caching disabled, the other enabled.
A series of tests using batches of 200 accesses was made and the results averaged. The first test requested an empty file, measuring response and file access time without any actual data transfer. The second and third requested files of 16K and 64K characters respectively, testing performance with more realistic scenarios. Each was run with one and then ten concurrent requests.
The test system was a lightly-loaded AlphaServer 2100, VMS v7.1 and DEC TCP/IP 4.2. No Keep-Alive: functionality was employed, so each request required a complete TCP/IP connection and disposal, although the WWWRKOUT utility (see 16.6 - Server Workout (stress-test)) was used on the same system as the HTTP server, eliminating actual network transport. DNS (name resolution) was disabled. The command lines are shown below.
```
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/ht_root/exercise/0k.txt"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/ht_root/exercise/0k.txt"
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/ht_root/exercise/16k.txt"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/ht_root/exercise/16k.txt"
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/ht_root/exercise/64k.txt"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/ht_root/exercise/64k.txt"
```
The following results were derived using the v5.2 server.
Cache disabled:

File Size | Concurrent | Duration (seconds) | Requests/Second |
---|---|---|---|
0KB | 1 | 2.11 | 95 |
0KB | 10 | 1.70 | 117 |
16KB | 1 | 3.04 | 66 |
16KB | 10 | 2.50 | 80 |
64KB | 1 | 6.71 | 30 |
64KB | 10 | 5.10 | 39 |
Cache enabled:

File Size | Concurrent | Duration (seconds) | Requests/Second |
---|---|---|---|
0KB | 1 | 0.95 | 210 |
0KB | 10 | 0.82 | 244 |
16KB | 1 | 1.86 | 107 |
16KB | 10 | 1.60 | 125 |
64KB | 1 | 4.74 | 43 |
64KB | 10 | 4.25 | 47 |
Significantly, in both configurations throughput actually improves at ten concurrent requests (probably because, one request at a time, the latency of each TCP/IP connection setup and teardown is incurred serially, whereas with concurrent requests several connections are being established and disposed of at once).
Note that the response and transfer benefits of caching decline noticeably with file size (transfer time). The difference between cached and non-cached access to the zero-length file (no actual data transfer involved) gives some indication of the raw difference in response latency: cached throughput is some 220% of non-cached (210 against 95 requests per second for single requests). This is a fairly crude analysis, but it does give some indication of cache efficiencies.
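The cache comparison can be reproduced from the tables. This sketch assumes, as the figures imply, that requests per second is simply the batch size (/COUNT=200) divided by the elapsed duration; the durations are copied from the single-request rows of the two tables.

```python
COUNT = 200  # /COUNT=200 in each WWWRKOUT run

# Single-request durations in seconds: (cache disabled, cache enabled).
durations = {
    "0KB": (2.11, 0.95),
    "16KB": (3.04, 1.86),
    "64KB": (6.71, 4.74),
}


def requests_per_second(count, seconds):
    """Throughput for a batch of `count` requests that took `seconds`."""
    return count / seconds


def cache_speedup(size):
    """Cached throughput as a multiple of uncached throughput."""
    off, on = durations[size]
    return requests_per_second(COUNT, on) / requests_per_second(COUNT, off)


if __name__ == "__main__":
    for size in durations:
        print(f"{size:>4}: {cache_speedup(size):.2f}x")
```

For these rows the ratio falls from about 2.2 for the empty file to about 1.6 at 16KB and 1.4 at 64KB, which is the decline with file size described above.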
Simple File Request Transfer Rate
The simple text file request under similar conditions indicates a potential transfer rate well in excess of 1 Mbyte per second. (Remember, both client and server are on the same system, so the data, although being transported by TCP/IP networking, is not actually ending up out on a physical network.) This serves to demonstrate that server architecture should not be the limiting factor in file throughput.
```
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=10 -
    /PATH="/sys$common/sysexe/tnt$server.exe"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=10 -
    /PATH="/sys$common/sysexe/tnt$server.exe"
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=10 -
    /PATH="/sys$common/sysexe/cxx$compiler.exe"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=10 -
    /PATH="/sys$common/sysexe/cxx$compiler.exe"
```
The following results were derived using the v5.2 server.
File Size | Concurrent | Duration (seconds) | Mbytes/Second |
---|---|---|---|
2.4MB (4700 blocks) | 1 | 6.4 | 3.8 |
2.4MB (4700 blocks) | 10 | 5.7 | 4.2 |
6.9MB (13442 blocks) | 1 | 40 | 1.7 |
6.9MB (13442 blocks) | 10 | 17 | 4.0 |
Significantly, there was no dramatic drop in transfer rate between one and ten concurrent requests; in fact throughput increased!
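The Mbytes/Second column follows from the file size, the number of transfers and the duration. This sketch assumes 512-byte VMS disk blocks and 1 Mbyte = 10^6 bytes (which is what makes 4700 blocks come out at 2.4MB), and that each run transferred the file /COUNT=10 times.

```python
BLOCK_SIZE = 512   # bytes per VMS disk block
COUNT = 10         # /COUNT=10 in each WWWRKOUT run

runs = [  # (file blocks, concurrent requests, duration in seconds), from the table
    (4700, 1, 6.4),
    (4700, 10, 5.7),
    (13442, 1, 40),
    (13442, 10, 17),
]


def file_mbytes(blocks):
    """File size in Mbytes (10**6 bytes assumed)."""
    return blocks * BLOCK_SIZE / 1e6


def transfer_rate(blocks, count, seconds):
    """Aggregate Mbytes/second for `count` transfers of the file."""
    return file_mbytes(blocks) * count / seconds


if __name__ == "__main__":
    for blocks, sim, secs in runs:
        print(f"{file_mbytes(blocks):.1f}MB x{COUNT}, /SIM={sim}: "
              f"{transfer_rate(blocks, COUNT, secs):.1f} Mbytes/s")
```

Rounded to one decimal place, these reproduce the tabulated rates of 3.8, 4.2, 1.7 and 4.0 Mbytes/second.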
14.1 - File Record Format
The server can handle STREAM, STREAM_LF, STREAM_CR, FIXED and UNDEFINED record formats very much more efficiently than VARIABLE or VFC files.
With STREAM, FIXED and UNDEFINED files the assumption is that HTTP carriage-control is within the file itself (i.e. at least the newline (LF), which is all that browsers require), so no additional processing is needed. With VARIABLE record files the carriage-control is implied, so each record requires additional processing by the server to supply it. Although the HTTPd improves efficiency for variable record files by buffering multiple records before writing them collectively to the network, stream and binary files are read by virtual block and written to the network immediately, making their transfer very efficient indeed!
So significant is this efficiency improvement that a module exists to automatically convert VARIABLE record files to STREAM-LF when detected by the file transfer module. This is disabled by default, but the user is strongly encouraged to enable it and to ensure that stream format files are provided to the server by other hypertext generating and processing utilities.
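The two transfer paths can be contrasted in a short sketch (not the server's file transfer or conversion module). The stream path passes each buffer straight through; the VARIABLE path must supply a newline for every record, which is also what a conversion to STREAM-LF does once and for all. The on-disk VARIABLE layout used here is a simplified assumption: a 2-byte little-endian length word, the record bytes, then a pad byte when the length is odd. Spanned records and end-of-block markers are not handled, and the read buffer size is hypothetical.

```python
import io
import struct

BLOCK_SIZE = 512       # bytes per VMS disk block
BLOCKS_PER_READ = 32   # hypothetical buffer size (16 KB per read)


def send_stream(src, sink):
    """STREAM/FIXED/UNDEFINED path: carriage control is already in the
    data, so each buffer read is passed straight to `sink` unchanged.
    Returns the number of writes issued."""
    writes = 0
    while chunk := src.read(BLOCK_SIZE * BLOCKS_PER_READ):
        sink(chunk)
        writes += 1
    return writes


def variable_to_stream_lf(raw):
    """VARIABLE path: carriage control is implied, so a newline must be
    supplied for every record (ASSUMED simplified record layout)."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(raw):
        (length,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        out += raw[pos:pos + length] + b"\n"
        pos += length + (length & 1)   # skip the word-alignment pad byte
    return bytes(out)


def make_variable(records):
    """Build test data in the assumed VARIABLE layout."""
    raw = bytearray()
    for rec in records:
        raw += struct.pack("<H", len(rec)) + rec
        if len(rec) & 1:
            raw += b"\0"
    return bytes(raw)


if __name__ == "__main__":
    print(variable_to_stream_lf(make_variable([b"<html>", b"<p>Hi", b""])))
    # -> b'<html>\n<p>Hi\n\n'
    out = []
    print(send_stream(io.BytesIO(b"x" * 40000), out.append))  # -> 3
```

The per-record loop in `variable_to_stream_lf` is the work the server must do on every transfer of an unconverted VARIABLE file; after conversion only the simple buffer loop in `send_stream` remains.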
14.2 - Subprocess-based Scripting
Persistent subprocesses are probably the most efficient solution for child-process scripting under VMS. See 12.2 - Scripting Environment. The script's I/O must still be relayed to the client by the server.
A simple performance evaluation shows the relative merits of the three scripting environments available. Two results are provided here. Both were obtained using the WWWRKOUT utility (see 16.6 - Server Workout (stress-test)) accessing the same CGI test utility script, HT_ROOT:[SRC.CGIPLUS]CGIPLUSTEST.C, which executes in both standard CGI and CGIplus environments. A series of 200 accesses was made and the results averaged. The first test returned only the HTTP header, evaluating raw request turn-around time. The second test requested a body of 16K characters, again testing performance with a more realistic scenario. No Keep-Alive: functionality was employed, so each request required a complete TCP/IP connection and disposal, although the WWWRKOUT utility was used on the same system as the HTTP server, eliminating actual network transport. DNSLookup (host name resolution) was disabled. The test system was a lightly-loaded AlphaServer 2100, VMS v7.1 and DEC TCP/IP 4.2. The command lines are shown below:
```
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/cgi-bin/cgiplustest?0"
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/cgi-bin/cgiplustest?16"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/cgiplus-bin/cgiplustest?0"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/cgiplus-bin/cgiplustest?16"
```
The following results were derived using the v5.2 server.
Standard CGI:

Response Body | Concurrent | Duration (seconds) | Requests/Second |
---|---|---|---|
0KB | 1 | 13.65 | 14.7 |
0KB | 10 | 8.60 | 23.3 |
16KB | 1 | 13.69 | 14.6 |
16KB | 10 | 8.68 | 23.0 |
CGIplus:

Response Body | Concurrent | Duration (seconds) | Requests/Second |
---|---|---|---|
0KB | 1 | 4.20 | 47.6 |
0KB | 10 | 4.31 | 46.4 |
16KB | 1 | 4.32 | 46.3 |
16KB | 10 | 4.39 | 45.6 |
Although these results are indicative only, they do show CGIplus to have a potential for improvement over standard CGI in the order of 200% (at ten concurrent requests, 46.4 against 23.3 requests per second), a not inconsiderable improvement. Of course, this test generates the output stream very simply and efficiently, and so excludes any actual processing time that may be required by a "real" application. If the script or application has a large activation time, the reduction in response latency could be even more significant (e.g. Perl scripts and RDBMS access languages).
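Both points can be put into numbers. The first part of this sketch derives the CGIplus/CGI throughput ratio from the ten-concurrent rows of the two tables, assuming requests per second is /COUNT=200 divided by duration. The second part is a deliberately simple, hypothetical cost model (it ignores concurrency and server overhead) showing why a large activation time favours a persistent script: standard CGI pays the activation cost on every request, while a CGIplus-style script pays it once. The 0.5 second activation and 10 millisecond work figures are invented for illustration.

```python
COUNT = 200  # /COUNT=200 in each WWWRKOUT run

# Ten-concurrent durations (seconds) from the standard CGI and CGIplus tables.
cgi_secs = {"0KB": 8.60, "16KB": 8.68}
cgiplus_secs = {"0KB": 4.31, "16KB": 4.39}


def speedup(size):
    """CGIplus throughput as a multiple of standard CGI throughput."""
    return (COUNT / cgiplus_secs[size]) / (COUNT / cgi_secs[size])


def total_time(requests, activation, work, persistent):
    """Hypothetical cost model: standard CGI pays the script activation
    cost on every request; a persistent (CGIplus-style) script pays it once."""
    if persistent:
        return activation + requests * work
    return requests * (activation + work)


if __name__ == "__main__":
    for size in cgi_secs:
        print(f"{size}: CGIplus is {speedup(size):.2f}x standard CGI")
    # Hypothetical script: 0.5 s activation, 10 ms of real work per request.
    cgi = total_time(COUNT, 0.5, 0.01, persistent=False)
    plus = total_time(COUNT, 0.5, 0.01, persistent=True)
    print(f"model: {cgi:.1f} s vs {plus:.1f} s")
```

The measured ratio is close to 2 for both response sizes, while the model shows the gap widening sharply as activation time grows relative to the real work done per request.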
14.3 - DECnet-based Scripting
Using the same environment as for the subprocess-based CGI script tests (see above), this series of tests assesses the performance of the same script executed using DECnet to manage the processes. DECnet Phase IV was in use on a VMS v7.1 AlphaServer 2100.
```
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/decnet/cgiplustest?0"
$ WWWRKOUT /SIM=1 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/decnet/cgiplustest?200"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/decnet/cgiplustest?0"
$ WWWRKOUT /SIM=10 /NOBREAK /NOVARY /NOHEAD /NOOUT /COUNT=200 -
    /PATH="/decnet/cgiplustest?200"
```
The following results were derived using the v5.2 server, which now provides for DECnet connection reuse.
Without connection reuse:

Response Body | Concurrent | Duration (seconds) | Requests/Second |
---|---|---|---|
0KB | 1 | 32.4 | 6.2 |
0KB | 10 | 50.0 | 4.0 |
16KB | 1 | 25.8 | 7.8 |
16KB | 10 | 22.8 | 8.8 |
With connection reuse:

Response Body | Concurrent | Duration (seconds) | Requests/Second |
---|---|---|---|
0KB | 1 | 17.8 | 11.5 |
0KB | 10 | 11.9 | 16.8 |
16KB | 1 | 18.4 | 10.9 |
16KB | 10 | 12.2 | 16.4 |
This section comments on non-persistent scripts (i.e. those that must run up and run down with each request, general CGI behaviour). As may be seen by comparing the two tables, connection reuse offers distinct benefits in reduced response times, consistency of response times and overall throughput, showing a difference of some 200% over non-reuse (similar improvements were reported with the OSU 3.3a server).
With ten simultaneous, back-to-back scripts and no connection reuse, many more than ten network processes are generated. This is due to NETSERVER maintenance tasks such as log creation and purging, activating and deactivating the task, etc., which add latency to this script environment. Even with connection reuse, throughput was generally still lower than with subprocess-based scripting (11.5 against 14.7 for single requests, 16.8 against 23.3 for ten concurrent requests).
While earlier versions cautioned against the use of DECnet-based scripting, this caution has been relaxed somewhat now that connections are reused.
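The effect of connection reuse on the number of network processes can be illustrated with a generic pool. This is a hypothetical sketch, not the server's DECnet code: `ConnectionPool`, `run_batches` and the `connect` callable are invented names, and a plain `object` stands in for a real connection. Without reuse every request creates (and later disposes of) a connection, which for DECnet scripting means a new network task with its NETSERVER overheads; with reuse, a connection released by one request is handed to the next.

```python
class ConnectionPool:
    """Hypothetical connection-reuse sketch (not the server's code)."""

    def __init__(self, connect, reuse=True):
        self._connect = connect   # callable that opens a new connection
        self._reuse = reuse
        self._idle = []
        self.created = 0          # connections (network tasks) created so far

    def acquire(self):
        if self._reuse and self._idle:
            return self._idle.pop()
        self.created += 1
        return self._connect()

    def release(self, conn):
        if self._reuse:
            self._idle.append(conn)   # keep it open for the next request
        # without reuse the connection would simply be closed here


def run_batches(pool, total, concurrent):
    """Simulate `total` requests issued `concurrent` at a time."""
    for _ in range(total // concurrent):
        conns = [pool.acquire() for _ in range(concurrent)]
        for conn in conns:
            pool.release(conn)
    return pool.created


if __name__ == "__main__":
    print(run_batches(ConnectionPool(object, reuse=False), 200, 10))  # -> 200
    print(run_batches(ConnectionPool(object, reuse=True), 200, 10))   # -> 10
```

With reuse, only as many connections are created as are ever in use at the same time, ten in the ten-concurrent case, instead of one per request.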
14.4 - SSL
At this time there are no definitive measurements of SSL performance (see 10 - Secure Sockets Layer), as work on an SSL version of the WWWRKOUT utility has not yet been undertaken. One might expect that, because of the CPU-intensive cryptography employed in SSL requests, performance would be significantly lower, particularly where concurrent requests are in progress. In practice SSL seems to provide more-than-acceptable responsiveness.
14.5 - Suggestions
Here are some suggestions for improving the performance of the server, listed in approximate order of significance. Note that these will have proportionally less impact on an otherwise heavily loaded system.
Disabling DNS host name lookup (DNSLookup) can make a remarkable difference. The same test provided very different throughputs with DNS lookup enabled and disabled (v4.5 server, cache enabled).
DNS Lookup | Duration (seconds) | Requests/Second |
---|---|---|
DNSLookup ON | 6.30 | 32 |
DNSLookup OFF | 0.95 | 210 |
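Why the lookup costs so much can be seen in a sketch of what a server might do per request to name a client. This is a hypothetical illustration using Python's standard `socket.getnameinfo`, not the server's code, and `client_host` is an invented name. With lookup disabled the address is only formatted numerically, involving no network traffic; with lookup enabled a reverse lookup is attempted for every request, which can cost a DNS round trip or a timeout each time. In the table above this reduced throughput from 210 to 32 requests per second.

```python
import socket


def client_host(address, dns_lookup):
    """Name to record for a client (host, port) address.

    dns_lookup False: numeric formatting only, no network traffic.
    dns_lookup True: reverse lookup per call, falling back to the
    numeric form if no name is found."""
    if dns_lookup:
        try:
            host, _ = socket.getnameinfo(
                address, socket.NI_NAMEREQD | socket.NI_NUMERICSERV)
            return host
        except OSError:
            pass  # no name available: fall back to the numeric address
    host, _ = socket.getnameinfo(
        address, socket.NI_NUMERICHOST | socket.NI_NUMERICSERV)
    return host


if __name__ == "__main__":
    print(client_host(("127.0.0.1", 80), dns_lookup=False))  # -> 127.0.0.1
```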