How to access data on the World Wide Web?
The World Wide Web, also known as WWW, acts as a repository that links information from all over the world. It provides a distributed client-server service in which the web browser acts as the client and the sites that host the documents play the role of the server. The Web mainly accesses data via the Hypertext Transfer Protocol (HTTP). HTTP works like a combination of FTP and SMTP. Like FTP, it transfers files and uses the services of TCP, but it is simpler than FTP because it uses a single TCP connection and has no separate control connection. The data transferred between client and server resembles SMTP messages, and MIME-like headers control the format of the messages. Unlike SMTP messages, however, HTTP messages are meant to be read and interpreted by the HTTP server and the HTTP client. When the client sends a request to the HTTP server, the server sends the data/content back to the client as a response. HTTP uses TCP port 80 by default.
Data transfer on the WWW
HTTP is stateless: the server keeps no memory of earlier requests, so each request is handled on its own. It uses the services of TCP to transfer data on the WWW. The client starts a transaction by sending a request message, and the server replies with a response message.
Let us look at the format of the request and response messages that travel over the WWW in a data transaction:
The formats of the request and response messages are nearly the same. A request message has three parts: a request line, headers, and a body. A response message also has three parts: a status line, headers, and a body.
Request Line & Status Line: The request line is the first line of a request message sent to the server. It consists of three parts: request type, URL, and HTTP version. The status line is the first line of the response message that the server sends back to the client. It also has three parts: HTTP version, status code, and status phrase.
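To make the two first-line formats concrete, here is a minimal sketch that splits a request line and a status line into their three parts. The function and class names are illustrative, and the sample lines assume the HTTP/1.1 text format.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str   # request type, e.g. GET
    url: str
    version: str  # e.g. HTTP/1.1


class StatusLine(NamedTuple):
    version: str
    code: int     # three-digit status code
    phrase: str   # textual description of the code


def parse_request_line(line: str) -> RequestLine:
    # Three space-separated fields: request type, URL, HTTP version.
    method, url, version = line.strip().split(" ", 2)
    return RequestLine(method, url, version)


def parse_status_line(line: str) -> StatusLine:
    # The phrase may contain spaces, so split at most twice.
    version, code, phrase = line.strip().split(" ", 2)
    return StatusLine(version, int(code), phrase)


print(parse_request_line("GET /index.html HTTP/1.1"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```

Splitting at most twice keeps a multi-word status phrase such as "Not Found" in one piece.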
The request type is given as a method. Several methods can appear as the request type in a request message:
- GET is used to request a document from the server
- POST is used to send some data/information from the client to the server
- HEAD is used to request information about a document, not the document itself
- PUT is used to send a document from the client to the server
- CONNECT is used to establish a tunnel, typically through a proxy (older specifications listed it as reserved)
- OPTIONS is used to ask about the available options
- TRACE is used to echo the incoming request back to the client
URL (Uniform Resource Locator) is a standard for specifying any kind of information on the Internet. It defines four things: the protocol, which is the client/server program used to retrieve the document (for example, HTTP); the host, which is the computer where the information is located; the port, which is optional and is inserted between the host and the path, separated from the host by a colon; and the path, which tells where the file is located on the host.
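The four parts of a URL can be pulled apart with Python's standard `urllib.parse` module. This is only a sketch: the helper name `url_parts` and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Return the protocol, host, port, and path of a URL."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not name a port
        "path": parts.path,
    }


print(url_parts("http://www.example.com:8080/docs/index.html"))
print(url_parts("http://www.example.com/docs/index.html"))
```

When the port is omitted, the client falls back to the protocol's default, which is port 80 for HTTP.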
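HTTP version identifies the version of the protocol, for example HTTP/1.1. HTTP/1.1 and the newer HTTP/2 are both in wide use. The text format of the request line and status line described here is that of HTTP/1.x; HTTP/2 carries the same information in binary frames.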
Status Code consists of 3 digits. Codes in the 100 range are informational, codes in the 200 range indicate a successful request, codes in the 300 range redirect the client to another URL, codes in the 400 range indicate a client-side error, and codes in the 500 range indicate a server-side error.
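Because the first digit decides the class, the grouping can be sketched in a few lines. The function name `status_class` and the class labels are illustrative choices, not part of any standard API.

```python
def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # Integer division by 100 keeps only the first digit.
    return classes[code // 100]


for code in (100, 204, 301, 404, 503):
    print(code, status_class(code))
```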
Status Phrase describes the status code in textual form. The table below lists common status codes with their phrases and meanings:
| Code | Phrase | Description |
|---|---|---|
| 100 | Continue | The initial part of the request is received; the client may continue with its request |
| 101 | Switching Protocols | The server is complying with a client request to switch to the protocol named in the Upgrade header |
| 200 | OK | The request is successful |
| 201 | Created | A new URL is created |
| 202 | Accepted | The request is accepted but not immediately acted upon |
| 204 | No Content | There is no content in the body |
| 301 | Moved Permanently | The requested URL is no longer in use by the server |
| 302 | Found | The requested URL has moved temporarily (older specifications use the phrase Moved Temporarily) |
| 304 | Not Modified | The document has not been modified |
| 400 | Bad Request | There is a syntax error in the request |
| 401 | Unauthorized | The request lacks proper authorization |
| 403 | Forbidden | Service is denied |
| 404 | Not Found | The document is not found |
| 405 | Method Not Allowed | The method is not supported for this URL |
| 406 | Not Acceptable | The requested format is not acceptable |
| 500 | Internal Server Error | A server-side error occurred, such as a crash |
| 501 | Not Implemented | The server does not support the requested action |
| 503 | Service Unavailable | The service is temporarily unavailable, but may be requested in the future |
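Python ships these code and phrase pairs in its standard `http.HTTPStatus` enum, so a phrase can be looked up without keeping a table by hand. The sketch below only reads from that enum; the phrases follow the current HTTP specifications and may differ slightly from those in older texts.

```python
from http import HTTPStatus


def phrase_for(code: int) -> str:
    """Return the standard status phrase for a status code."""
    return HTTPStatus(code).phrase


for code in (101, 201, 302, 404, 503):
    print(code, phrase_for(code))
```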
Header: Headers exchange additional information between the client and the server. A message may have one or more header lines. Each header line contains a header name, a colon, a space, and a header value. Header lines fall into four categories: general header, request header, response header, and entity header. A request message may contain general, request, and entity headers, while a response message may contain general, response, and entity headers.
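The "name, colon, space, value" layout of a header line is simple enough to parse by hand. The following is a minimal sketch with illustrative function names. It stores names in lower case because HTTP header names are case-insensitive.

```python
from typing import Dict, Iterable, Tuple


def parse_header_line(line: str) -> Tuple[str, str]:
    """Split one header line into (name, value) at the first colon."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a header line: {line!r}")
    return name.strip(), value.strip()


def parse_headers(lines: Iterable[str]) -> Dict[str, str]:
    """Collect header lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in lines:
        name, value = parse_header_line(line)
        headers[name.lower()] = value
    return headers


print(parse_headers([
    "Date: Thu, 01 Jan 1970 00:00:00 GMT",
    "Content-Type: text/html",
]))
```

Splitting at the first colon only matters for values that contain colons themselves, such as dates or a host with a port.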
General header gives general information about the message. It can appear in both request and response messages. The general headers and their meanings are:
- Cache-Control specifies information about caching
- Connection specifies whether the connection should be closed or not
- Date gives the date and time at which the message was created
- MIME-Version tells which MIME version is in use
- Upgrade specifies the preferred communication protocol
Request header specifies the client's configuration and preferred document format. It can appear only in request messages. The request headers are:
- Accept shows the media format the client can accept
- Accept-Charset shows the character set the client can handle
- Accept-Encoding shows the encoding scheme the client can handle
- Accept-Language shows the language the client can accept
- Authorization shows what permissions the client has
- From shows the e-mail address of the user
- Host shows the host and port number of the server
- If-Modified-Since asks for the document only if it is newer than the specified date
- If-Match asks for the document only if it matches the given tag
- If-None-Match asks for the document only if it does not match the given tag
- If-Range asks for only the portion of the document that is missing
- If-Unmodified-Since asks for the document only if it has not changed since the specified date
- Referer (spelled this way in the protocol) specifies the URL of the linked document
- User-Agent identifies the client program
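As an illustration of how several of these request headers appear together, the sketch below assembles the text of a conditional GET request. It assumes the HTTP/1.1 text format. The helper name, host, path, and User-Agent value are all made up for the example, and the request is only built as a string, not sent.

```python
from email.utils import formatdate


def build_get_request(path: str, host: str, headers: dict) -> str:
    """Assemble a GET request: request line, header lines, blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines end with CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_get_request(
    "/index.html",
    "www.example.com",
    {
        "Accept": "text/html",
        "Accept-Language": "en",
        "User-Agent": "demo-client/0.1",  # hypothetical client name
        # Ask for the document only if it changed after this date.
        "If-Modified-Since": formatdate(0, usegmt=True),
    },
)
print(request)
```

If the document has not changed since the given date, a server would normally answer with status 304 and no body.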
Response header specifies the server's configuration and special information about the request. It can appear only in response messages. Some response headers are:
- Accept-Ranges shows whether the server accepts the range requested by the client
- Age shows the age of the document
- Public shows the list of supported methods
- Retry-After specifies the date after which the server is available
- Server shows the name and version of the server
Entity header gives information about the body of the document. It appears mainly in response messages, but it can also be present in request messages that carry a body, such as those using the POST or PUT methods. The entity headers are:
- Allow lists the valid methods that can be used with the URL
- Content-Encoding specifies the encoding scheme
- Content-Language specifies the language of the document
- Content-Length shows the length of the document
- Content-Range specifies the range of the document
- Content-Type specifies the media type
- ETag gives an entity tag
- Expires gives the date and time when the content may change
- Last-Modified gives the date and time of the last change
- Location specifies the location of the created or moved document
Body: The body is an optional part of both request and response messages. It contains the document to be sent or received.
Example of an HTTP data transaction on the WWW
Suppose a client wants to send data to the server using the POST method. The request line shows the method (POST), the URL, and the HTTP version. The request in this example has four header lines, and its body contains the input information. The response message contains a status line and four header lines. The server creates a CGI document, which is included as the body of the response.
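A transaction of this shape can be sketched as two plain-text messages. Everything specific here is an assumption made for illustration: the HTTP/1.1 text format, the host, the script path, the server name, and the form data. The helper `split_message` is likewise a made-up name, and nothing is sent over the network.

```python
form_data = "name=Ada&lang=en-GB"
html_body = "<html><body>Hello, Ada</body></html>"  # stands in for the CGI output

# Request: request line, four header lines, blank line, body.
REQUEST = (
    "POST /cgi-bin/doc.pl HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Accept: */*\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(form_data)}\r\n"
    "\r\n"
    f"{form_data}"
)

# Response: status line, four header lines, blank line, body.
RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Thu, 01 Jan 1970 00:00:00 GMT\r\n"
    "Server: demo-server/0.1\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(html_body)}\r\n"
    "\r\n"
    f"{html_body}"
)


def split_message(message: str):
    """Split a message into (first line, headers dict, body)."""
    head, _, body = message.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers, body


for message in (REQUEST, RESPONSE):
    start_line, headers, body = split_message(message)
    print(start_line, len(headers), repr(body))
```

The empty line after the last header is what lets the receiver tell where the headers end and the body begins.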