18-213/18-613 Proxy Lab: Writing a Caching Web Proxy

1 Introduction

A proxy server is a computer program that acts as an intermediary between clients making requests to access

resources and the servers that satisfy those requests by serving content. A web proxy is a special type of proxy

server whose clients are typically web browsers and whose servers are web servers providing web content.

When a web browser uses a proxy, it contacts the proxy instead of communicating directly with the web

server; the proxy forwards the client’s request to the web server, reads the server’s response, then forwards the

response to the client.

Proxies are useful for many purposes. Sometimes, proxies are used in firewalls, so that browsers behind a

firewall can only contact a server beyond the firewall via the proxy. A proxy may also perform translations

on pages, for example, to make them viewable on web-enabled phones. Importantly, proxies are used as

anonymizers: by stripping requests of all identifying information, a proxy can make the browser anonymous

to web servers. Proxies can even be used to cache web objects by storing local copies of objects from servers

and then responding to future requests by reading them out of its cache rather than by communicating again

with a remote server.

This lab has three parts. An implementation of the first part will be submitted as your checkpoint. Your final

submission will then incorporate the extensions forming the second and third parts. For the first part, you

will create a proxy that accepts incoming connections, reads and parses requests, forwards requests to web

servers, reads the servers’ responses, and forwards the responses to the corresponding clients. The first part

will involve learning about basic HTTP operation and how to use sockets to write programs that communicate

over network connections. In the second part, you will upgrade your proxy to deal with multiple concurrent

connections. This will introduce you to dealing with concurrency, a crucial systems concept. In the third and

last part, you will add caching to your proxy using a simple main memory cache of recently accessed web

content.

You will debug and test your program with PxyDrive, a testing framework we provide, as well as by accessing

your proxy via standard tools, including a web browser. The grading of your code will involve automated

testing. Your code will also be reviewed for correctness and for style.

2 Logistics

This is an individual project. You are allowed only one grace day for the checkpoint and one grace day for the final.

3 Handout instructions

Create your GitHub Classroom repository by clicking the “Download handout” button on the proxylab Autolab

page. Then do the following on a Shark machine:

• Clone the repository that you just created using the git clone command. Do not download and

extract the zip file from GitHub.

• Type your name and Andrew ID in the header comment at the top of proxy.c.

3.1 Robust I/O package

The handout directory contains the files csapp.c and csapp.h, which comprise the CS:APP package

discussed in the CS:APP3e textbook. The CS:APP package includes the robust I/O (RIO) package. When

reading and writing socket data, you should use the RIO package instead of low-level I/O functions, such as

read and write, or standard I/O functions such as fread and fwrite.

The CS:APP package also contains a collection of wrapper functions for system calls that check the return

code and exit when there’s an error. You will find that the set of wrapper functions provided is a subset of

those from the textbook and the lecture notes. We have disabled ones for which exiting upon error is not the

correct behavior for a server program. For these, you must check the return code and devise ways to handle

these errors that minimize their impact.
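
To illustrate why a robust I/O layer matters, here is a minimal sketch of a write loop that copes with short counts and signal interruptions and reports errors to the caller instead of exiting. The name write_all is hypothetical and is not part of csapp.c; in the lab itself you should call the RIO functions, which are built around the same idea.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of csapp.c: write exactly n bytes to fd,
 * retrying after short counts and signal interruptions.
 * Returns 0 on success, or -1 on error with errno set by write(). */
int write_all(int fd, const void *buf, size_t n) {
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t written = write(fd, p, left);
        if (written < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal handler: retry */
            }
            return -1; /* real error (e.g., EPIPE): the caller decides */
        }
        p += written;
        left -= (size_t)written;
    }
    return 0;
}
```

A server would typically respond to a -1 return by closing that one connection and moving on to the next client. Note also that writing to a socket whose peer has closed raises SIGPIPE by default, which terminates the process unless the signal is ignored or handled.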

3.2 HTTP parsing library

The handout directory contains the file http_parser.h, which defines the API for a small HTTP string

parsing library. The library includes functions for extracting important data fields from HTTP response

headers and storing them in a parser_t struct. A brief overview of the library is given below. Please refer to

the source files in your handout for the full documentation of the types, structs, and functions available for use

in the library.

To create a new instance of a parser struct, call parser_new(). The returned pointer can then be used as the

first argument to the other functions. parser_parse_line() will parse a line of an HTTP request and store

the result in the provided parser_t struct. Parsed fields of specified types may be retrieved from the struct

by calling parser_retrieve() and by providing a string pointer for the function to write to. Particular

headers may also be retrieved by name via parser_lookup_header(). Headers may instead be accessed in

an iterative fashion by successive calls to parser_retrieve_next_header().

3.3 Modularity

The skeleton file proxy.c, provided in the handout, contains a main function that does practically nothing.

You should fill in that file with your proxy implementation. Modularity, though, should be an important

consideration, and it is important for you to separate the individual modules of your implementation into

different files. For example, your cache should be largely (or completely) decoupled from the rest of your

proxy, so one good idea is to move the implementation of the cache into separate code and header files

cache.c and cache.h.

3.4 Makefile

You are free to add your own source and header files for this lab. The Makefile will automatically link all

.c files into the final binary. While you are free to update the provided Makefile (for example to define the

DEBUG macro), the autograder will use the original Makefile to grade your solution. As such, the entire project

should compile without warnings.

3.5 Other provided resources

Included with your starter code, in the pxy directory, is a pair of programs PxyDrive and PxyRegress (given

as files pxydrive.py and pxyregress.py, respectively). PxyDrive is a testing framework for your proxy.

PxyRegress provides a way to run a series of standard tests on your proxy using PxyDrive. Both programs

are documented in the PxyDrive user manual, available at:

http://www.cs.cmu.edu/~18213/proxylab/pxydrive-manual.pdf.

Also included, in the tests directory, is a series of 51 test files to test various aspects of your proxy. Each of

these is a command file for PxyDrive. You will want to learn about the operation of PxyDrive and how each

of these tests operates.

Finally, you are provided with a reference implementation of a proxy, named proxy-ref. It is compiled to

execute on a Linux machine.

4 Part I: Implementing a sequential web proxy

The first step is implementing a basic sequential proxy that handles HTTP/1.0 GET requests. Your proxy need

not handle other request types, such as POST requests, but it should respond appropriately, as described below.

Your proxy also need not handle HTTPS requests (only HTTP).

When started, your proxy should listen for incoming connections on a port whose number is specified on the

command line. Once a connection is established, your proxy should read the entirety of the request from the

client and parse the request. It should determine whether the client has sent a valid HTTP request; if so, it

should 1) establish its own connection to the appropriate web server, 2) request the object the client specified,

and 3) read the server’s response and forward it to the client.
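
The listening step can be sketched with the standard sockets API as follows. This is only an illustration under stated assumptions: open_listen_socket is a hypothetical name, the backlog of 1024 is an arbitrary choice, and the CS:APP package in your handout may already provide an equivalent helper that you should prefer.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return a listening socket bound to `port`
 * (a decimal string such as "12345"), or -1 on failure. */
int open_listen_socket(const char *port) {
    struct addrinfo hints, *list, *p;
    int fd = -1;
    int on = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    hints.ai_flags = AI_PASSIVE | AI_NUMERICSERV; /* wildcard address, numeric port */

    if (getaddrinfo(NULL, port, &hints, &list) != 0) {
        return -1;
    }

    /* Try each candidate address until one can be bound and listened on. */
    for (p = list; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0) {
            continue;
        }
        /* Allow quick restarts of the proxy on the same port. */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        if (bind(fd, p->ai_addr, p->ai_addrlen) == 0 && listen(fd, 1024) == 0) {
            break; /* success */
        }
        close(fd);
        fd = -1;
    }

    freeaddrinfo(list);
    return fd;
}
```

A sequential proxy would then call accept() on the returned descriptor in a loop, handle one client connection to completion, close it, and go back to accept().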

4.1 HTTP/1.0 GET requests

When a user enters a URL such as http://www.cmu.edu:8080/hub/index.html into the address bar of a web

browser, the browser will send an HTTP request to the proxy that begins with a request line such as the

following:

GET http://www.cmu.edu:8080/hub/index.html HTTP/1.1\r\n

The proxy should parse the request URL into the host, in this case www.cmu.edu:8080, and the path,

consisting of the / character and everything following it. That way, the proxy can determine that it should

open a connection to hostname www.cmu.edu on port 8080 and send an HTTP request of its own, starting

with its own request line of the following form:

GET /hub/index.html HTTP/1.0\r\n

As these examples show, all lines in an HTTP request end with a carriage return (‘\r’) followed by a newline

(‘\n’). Also important is that every HTTP request must be terminated by an empty line, consisting of just the

string “\r\n”.

Notice in the above example that the web browser’s request line ends with HTTP/1.1, while the proxy’s

request line ends with HTTP/1.0. Modern web browsers will generate HTTP/1.1 requests, but your proxy

should handle them and forward them as HTTP/1.0 requests.

Additionally, in the above example, a port number of 8080 was specified as part of the host. If no port is

specified, the default HTTP port of 80 should be used.
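
The URL handling described in this section can be sketched as below. The function names split_http_url and format_request_line are hypothetical, and the sketch separates the hostname from the port, whereas the text above treats www.cmu.edu:8080 as a single host field. It assumes a lowercase http:// scheme and does not handle IPv6 literals. The HTTP parsing library in your handout may already perform this work, so treat this purely as an illustration of the rules.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "http://host[:port][/path]" into its parts.
 * Fills host, port (default "80"), and path (default "/"); each output
 * buffer is `cap` bytes. Returns 0 on success, -1 on a non-HTTP URL,
 * an empty host or port, or a buffer that is too small. */
int split_http_url(const char *url, char *host, char *port, char *path,
                   size_t cap) {
    static const char scheme[] = "http://";
    const size_t scheme_len = sizeof(scheme) - 1;

    if (strncmp(url, scheme, scheme_len) != 0) {
        return -1; /* e.g., https:// is out of scope for this lab */
    }

    const char *authority = url + scheme_len;
    size_t auth_len = strcspn(authority, "/"); /* "host" or "host:port" */
    const char *rest = authority + auth_len;   /* "" or "/..." */

    const char *colon = memchr(authority, ':', auth_len);
    size_t host_len = colon ? (size_t)(colon - authority) : auth_len;
    if (host_len == 0 || host_len >= cap) {
        return -1;
    }
    memcpy(host, authority, host_len);
    host[host_len] = '\0';

    if (colon != NULL) {
        size_t port_len = auth_len - host_len - 1;
        if (port_len == 0 || port_len >= cap) {
            return -1;
        }
        memcpy(port, colon + 1, port_len);
        port[port_len] = '\0';
    } else {
        if (cap < 3) {
            return -1;
        }
        strcpy(port, "80"); /* default HTTP port */
    }

    int n = snprintf(path, cap, "%s", *rest != '\0' ? rest : "/");
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Hypothetical helper: build the proxy's own HTTP/1.0 request line. */
int format_request_line(char *buf, size_t cap, const char *path) {
    int n = snprintf(buf, cap, "GET %s HTTP/1.0\r\n", path);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

For the example URL above, this yields the hostname www.cmu.edu, the port 8080, the path /hub/index.html, and the request line GET /hub/index.html HTTP/1.0 followed by a carriage return and newline.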

4.2 Request headers

Request headers are very important elements of an HTTP request. Headers are key-value pairs provided

line-by-line following the first request line of an HTTP request, with the key and value separated by the

colon (‘:’) character. Of particular importance for this lab are the Host, User-Agent, Connection, and

Proxy-Connection headers. Your proxy must perform the following operations with regard to the listed

HTTP request headers:

• Always send a Host header. This header is necessary to coax sensible responses out of many web

servers, especially those that use virtual hosting.

The Host header describes the host of the web server your proxy is trying to access. For example, to

access http://www.cmu.edu:8080/hub/index.html, your proxy would send the following header:

Host: www.cmu.edu:8080\r\n

It is possible that the client will attach its own Host header to its HTTP requests. If that is the case,

your proxy should use the same Host header as the client.
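
The key-value structure of a header line can be sketched as follows. The name split_header_line is hypothetical, and the parsing library in your handout may already expose headers in parsed form. The point of the sketch is that the split must happen at the first colon only, because a value such as www.cmu.edu:8080 contains a colon of its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "Key: value\r\n" into key and value.
 * Each output buffer is `cap` bytes. Returns 0 on success, -1 if there
 * is no colon, the key is empty, or a buffer is too small. */
int split_header_line(const char *line, char *key, char *value, size_t cap) {
    const char *colon = strchr(line, ':'); /* first colon ends the key */
    if (colon == NULL || colon == line) {
        return -1;
    }

    size_t key_len = (size_t)(colon - line);
    if (key_len >= cap) {
        return -1;
    }
    memcpy(key, line, key_len);
    key[key_len] = '\0';

    const char *v = colon + 1;
    while (*v == ' ' || *v == '\t') {
        v++; /* skip optional whitespace after the colon */
    }
    size_t val_len = strcspn(v, "\r\n"); /* stop before the line terminator */
    while (val_len > 0 && (v[val_len - 1] == ' ' || v[val_len - 1] == '\t')) {
        val_len--; /* trim trailing whitespace */
    }
    if (val_len >= cap) {
        return -1;
    }
    memcpy(value, v, val_len);
    value[val_len] = '\0';
    return 0;
}
```

Header names are case-insensitive in HTTP, so a comparison such as strcasecmp(key, "Host") is safer than strcmp when checking whether the client supplied its own Host header.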

4.3 Port numbers

There are two significant classes of port numbers for this lab: HTTP request ports and your proxy’s listening port.

The HTTP request port is an optional field in the URL of an HTTP request. That is, the URL may be

of the form, http://www.cmu.edu:8080/hub/index.html, in which case your proxy should connect to

the host www.cmu.edu on port 8080, and it should include the port number in the Host header (e.g., Host:

www.cmu.edu:8080).

Your proxy must properly function whether or not the port number is included in the URL. If no port is

specified, the default HTTP port number of 80 should be used, which should not be included in the Host

header.
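
The Host header rule above can be sketched as follows. The name format_host_header is hypothetical. The sketch also omits the port when it is explicitly given as 80, which is an assumption on our part: the text above only covers the case where no port was specified.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a Host header line into buf (cap bytes).
 * The port is appended only when it is not the default HTTP port.
 * Returns 0 on success, -1 if the buffer is too small. */
int format_host_header(char *buf, size_t cap, const char *host,
                       const char *port) {
    int n;

    if (port == NULL || strcmp(port, "80") == 0) {
        n = snprintf(buf, cap, "Host: %s\r\n", host);
    } else {
        n = snprintf(buf, cap, "Host: %s:%s\r\n", host, port);
    }
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```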

The listening port is the port on which your proxy should listen for incoming connections. Your proxy should

accept a command line argument specifying the listening port number for your proxy. For example, with the

following command, your proxy should listen for connections on port 12345:

linux> ./proxy 12345

The proxy must be given a port number every time it runs. When using PxyDrive, this will be done

automatically, but when you run your proxy on its own, you must provide a port number. You may select

any non-privileged port (greater than 1,024 and less than 32,768) as long as it is not used by other processes.
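
Validating the command line port argument can be sketched as follows. The name parse_listen_port is hypothetical, and the bounds simply mirror the range stated above.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a command line argument to a listening
 * port. Returns the port, or -1 if the argument is not a plain decimal
 * number or lies outside the range (1024, 32768). */
int parse_listen_port(const char *arg) {
    char *end;

    errno = 0;
    long value = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0') {
        return -1; /* overflow, empty string, or trailing garbage */
    }
    if (value <= 1024 || value >= 32768) {
        return -1; /* outside the non-privileged range described above */
    }
    return (int)value;
}
```

Unlike atoi, strtol makes it possible to reject inputs such as 12345x, which would otherwise be silently accepted as 12345.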

Since each proxy must use a unique listening port, and many students may be working simultaneously on

each machine, the script port-for-user.pl is provided to help you pick your own personal port number.

Use it to generate a port number based on your Andrew ID:

linux> ./port-for-user.pl bovik

bovik: 5232

The port, p, returned by port-for-user.pl is always an even number. So if you need an additional port

number, say for the Tiny server, you can safely use ports p and p + 1.

4.4 Error handling

In the case of invalid requests, or valid requests that your proxy is unable to handle, it should try to send the

appropriate HTTP status code back to the client (see clienterror() in tiny.c). Read more about HTTP

status codes at: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

In particular, your proxy must be able to respond to a POST request with the 501 Not Implemented status

code. The request line for a POST request will resemble the following:

POST http://exams.ugrad.cs.cmu.edu/Shibboleth.sso/SAML2/POST HTTP/1.1\r\n

In other cases, it is acceptable for your proxy to simply close the connection to the client when an error occurs,

using close(). Note that in all error cases, you should always clean up all resources being used to handle a

given request, including file descriptors and allocated memory.
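
As a rough sketch of the 501 case, the following builds a complete error response in a caller-supplied buffer. The name format_error_response and the HTML body are assumptions made for illustration; clienterror() in tiny.c is the reference to follow for the expected shape of the reply.

```c
#include <stdio.h>

/* Hypothetical helper: build a minimal HTTP/1.0 error response into buf
 * (cap bytes). Returns the number of bytes to send, or -1 if either the
 * body or the full response does not fit. */
int format_error_response(char *buf, size_t cap, int status,
                          const char *reason) {
    char body[256];

    int body_len = snprintf(body, sizeof(body),
                            "<html><body><h1>%d %s</h1></body></html>\r\n",
                            status, reason);
    if (body_len < 0 || (size_t)body_len >= sizeof(body)) {
        return -1;
    }

    /* Status line, headers, the empty line that ends them, then the body. */
    int n = snprintf(buf, cap,
                     "HTTP/1.0 %d %s\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: %d\r\n"
                     "\r\n"
                     "%s",
                     status, reason, body_len, body);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    return n;
}
```

For a POST request, the proxy would call this with status 501 and reason "Not Implemented", send the returned number of bytes to the client, and then close the connection and free any memory allocated for the request.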

Note: Upon normal execution, your proxy should not print anything. However, you should consider having a

verbose mode (set with -v on the command line) that prints useful information for debugging.

Completing Part I satisfies the requirements for the project checkpoint. See Section 7 regarding how your

proxy will be evaluated for the checkpoint.

5 Part II: Dealing with multiple concurrent requests

Production web proxies usually do not process requests sequentially; they process multiple requests in parallel.

This is particularly important when handling a single request can involve a lengthy delay (as it might when

contacting a remote web server). While your proxy waits for a response from the remote web server, it

can work on a pending request from another client. Indeed, most web browsers reduce latency by issuing

concurrent requests for the multiple URLs embedded in a single web page requested by a single client. Once

you have a working sequential proxy, you should alter it to simultaneously handle multiple requests.
