- The World Wide Web (WWW) is an architectural framework for accessing linked documents.
- These linked documents are spread across many machines all over the Internet.
- It offers a powerful graphical interface that presents the required information to the user in an attractive way.
- The first widely used graphical browser, Mosaic, was developed by Marc Andreessen at the University of Illinois in 1993.
- Netscape is another browser (1994).
- Internet Explorer (IE) was the default browser on Windows.
From the user's point of view, the WWW is a collection of web documents (web pages), or simply pages.
- Each page contains links to other pages, which may be located on other machines anywhere on the Internet.
- The idea of one page pointing to another page is called hypertext.
- Pages are viewed with a program called a browser. The browser fetches the requested page, formats it, and displays it.
- On a web page, strings of text that are underlined are links to other web pages. These are called hyperlinks.
- Page fetching is done by the browser. The user's only job is to click on the required link.
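To make the idea of hyperlinks concrete, here is a minimal sketch of how a program could collect the links on a page. It assumes links are written as HTML `<a href="...">` tags; the class name `LinkCollector`, the helper `extract_links`, and the sample page are illustrative only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> hyperlink on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical page content, used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <p>See the <a href="/about.html">about page</a> or
  <a href="http://example.org/news.html">the news</a>.</p>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_PAGE, "http://example.com/index.html"):
        print(link)
```

Each collected link is a full URL that a browser could fetch when the user clicks it.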
The entire process of obtaining a web page on the client machine in response to a click on a URL falls into two major parts:
- The dynamics happening on the client machine.
- The dynamics happening on the server machine.
These are described below.
- The browser follows the hyperlinks on web pages, so a hyperlink needs a way to name pages on other machines on the Web.
- Web pages are named using Uniform Resource Locators (URLs).
For example, https://speechus.com/
A URL has three parts:
- Name of the protocol (http)
- DNS name of the machine where the page is located.
- Name of the file containing the page.
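The three parts can be pulled out of a URL with Python's standard `urllib.parse` module. This is a small sketch; the helper name `url_parts` and the second example URL are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return (protocol, DNS name, page path) for a URL."""
    parts = urlsplit(url)
    # An empty path means the server's default page is requested.
    path = parts.path or "/"
    return parts.scheme, parts.hostname, path


if __name__ == "__main__":
    print(url_parts("https://speechus.com/"))
    print(url_parts("http://www.example.com/docs/index.html"))
```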
When the hyperlink is clicked:
- The browser determines the URL.
- The browser asks DNS for the IP address of the host.
- DNS replies with the IP address.
- The browser makes a TCP connection to port 80 on the machine with that IP address (port 80 is the default for HTTP; HTTPS uses port 443).
- The browser sends a request for the specific file.
- The server sends the requested file.
- The TCP connection is closed.
- The browser displays all the text.
- The browser displays all the images.
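The client-side steps above can be sketched with plain sockets. This is an illustration under stated assumptions, not how a real browser is built: it speaks unencrypted HTTP/1.0 on port 80 only, and the function names `build_request` and `fetch` are made up for this example.

```python
import socket
from urllib.parse import urlsplit


def build_request(host, path):
    """Build a minimal HTTP/1.0 GET request for one file."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch(url, timeout=5.0):
    """Fetch one page over plain HTTP, following the steps listed above."""
    parts = urlsplit(url)                        # determine the URL
    host, path = parts.hostname, parts.path or "/"
    port = parts.port or 80                      # default HTTP port
    ip_address = socket.gethostbyname(host)      # ask DNS for the IP address
    with socket.create_connection((ip_address, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path))  # request the specific file
        chunks = []
        while True:
            data = conn.recv(4096)               # the server sends the file
            if not data:                         # the server closed the connection
                break
            chunks.append(data)
    # Leaving the "with" block closes our side of the TCP connection.
    raw = b"".join(chunks)
    headers, _, body = raw.partition(b"\r\n\r\n")
    return headers.decode("iso-8859-1"), body
```

Calling `fetch("http://example.com/")` would return the response headers and the raw page bytes; displaying text and images is left to the rendering part of the browser.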
- So that every browser can understand them, web pages are written in a standard language, HTML.
A web browser is essentially an HTML interpreter. The browser also has many buttons that make navigation easy.
E.g. – Previous page
– Next page
– Bookmark, etc.
– Not all web pages contain HTML.
– Web pages may also include PDF, JPEG, MP3, or MPEG data.
– So a general approach is used to handle them: along with each page, the server sends additional information about the page. This information uses the MIME (Multipurpose Internet Mail Extensions) format, RFC 1341.
– Whenever the browser encounters a format it cannot display itself, it consults its MIME table to find out how to display the page.
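A MIME table lookup can be sketched as a simple dictionary keyed by content type. The table entries and handler descriptions below are hypothetical; a real browser's table is larger and configurable.

```python
# Hypothetical MIME table: content type -> how this browser handles it.
MIME_TABLE = {
    "text/html": "render with built-in HTML interpreter",
    "image/jpeg": "display with built-in image decoder",
    "application/pdf": "hand off to PDF viewer",
    "audio/mpeg": "hand off to audio player",
    "video/mpeg": "hand off to video player",
}


def choose_handler(content_type_header):
    """Pick a handler from the Content-Type value sent by the server."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime_type = content_type_header.split(";", 1)[0].strip().lower()
    # Assumption: unknown types are simply offered as a download.
    return MIME_TABLE.get(mime_type, "save to disk")


if __name__ == "__main__":
    print(choose_handler("text/html; charset=utf-8"))
    print(choose_handler("application/pdf"))
```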
Server Side Operation
When a URL is clicked, the server side performs the following operations.
- Accepts a TCP connection from a client.
- Get the name of the file requested.
- Get the file from the disk.
- Return the file to the client.
- Release the TCP connection.
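The five server steps above can be sketched as a simple sequential server. This is a minimal illustration: the function names, the default port 8080, and the bare-bones HTTP/1.0 reply are assumptions, and request parsing is reduced to the first line.

```python
import os
import socket


def build_response(request_text, doc_root):
    """Find the requested file name, read the file from disk, build a reply."""
    fields = request_text.split()
    path = fields[1] if len(fields) >= 2 else "/"
    # Sketch only: keep just the last path component so a request cannot
    # escape doc_root; a real server needs proper path validation.
    name = os.path.basename(path) or "index.html"
    try:
        with open(os.path.join(doc_root, name), "rb") as f:
            body = f.read()
        status = "200 OK"
    except OSError:
        body, status = b"not found", "404 Not Found"
    header = f"HTTP/1.0 {status}\r\nContent-Length: {len(body)}\r\n\r\n"
    return header.encode("ascii") + body


def serve_forever(doc_root, host="127.0.0.1", port=8080):
    """Sequential server: one request at a time, one disk read per request."""
    with socket.create_server((host, port)) as listener:
        while True:
            conn, _addr = listener.accept()      # accept a TCP connection
            with conn:                           # connection released on exit
                request = conn.recv(4096).decode("iso-8859-1")
                conn.sendall(build_response(request, doc_root))  # return the file
```

Because every request goes to the disk and requests are handled one after another, this design is easy to follow but slow under load.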
– The problem with this design is that every request requires a disk access.
– A SCSI disk has an access time of about 5 ms, so it permits at most 1 / (5 ms) = 200 disk accesses per second.
– The rate is lower still if the files are large.
– To overcome this, web servers maintain a large cache that holds the n most recently used files. Whenever a request comes in, the server first looks in the cache and goes to the disk only if the file is not there.
– To make the server faster still, multithreading is adopted.
– Several designs exist. In one design, the server has a front-end module and k processing modules (threads), all of which have access to the cache. The front end accepts each incoming request and passes it to one of the processing modules. The processing module checks the cache and responds from it if the file is there; otherwise it reads the file from disk, caches it, and sends it to the client.
– At any instant t, out of the k modules, k − x may be free to take requests while x are waiting for a disk access or a cache search. If the number of disks is increased, more modules can access disks in parallel, which raises throughput further.
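The front end, the k processing modules, and the shared cache can be sketched with threads and a queue. This is an illustration under assumptions: the names `RecentFileCache`, `worker`, and `run_server` are made up, the cache uses least-recently-used eviction, and `read_from_disk` stands in for a real disk read.

```python
import queue
import threading
from collections import OrderedDict


class RecentFileCache:
    """Holds the n most recently used files; shared by all worker threads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._files = OrderedDict()
        self._lock = threading.Lock()

    def get(self, name):
        with self._lock:
            if name not in self._files:
                return None
            self._files.move_to_end(name)        # mark as most recently used
            return self._files[name]

    def put(self, name, data):
        with self._lock:
            self._files[name] = data
            self._files.move_to_end(name)
            if len(self._files) > self.capacity:
                self._files.popitem(last=False)  # evict least recently used


def worker(requests, cache, read_from_disk, replies):
    """Processing module: check the cache, fall back to disk, then reply."""
    while True:
        name = requests.get()
        if name is None:                         # sentinel: shut down
            break
        data = cache.get(name)
        if data is None:
            data = read_from_disk(name)          # slow path blocks this thread only
            cache.put(name, data)
        replies.put((name, data))


def run_server(names, read_from_disk, k=4, cache_size=2):
    """Front end: hand each request to one of k processing modules."""
    requests, replies = queue.Queue(), queue.Queue()
    cache = RecentFileCache(cache_size)
    threads = [threading.Thread(target=worker,
                                args=(requests, cache, read_from_disk, replies))
               for _ in range(k)]
    for t in threads:
        t.start()
    for name in names:
        requests.put(name)
    for _ in threads:
        requests.put(None)                       # one sentinel per worker
    for t in threads:
        t.join()
    return [replies.get() for _ in names]
```

While one thread waits on the disk, the others keep serving requests from the cache, which is the point of the multithreaded design.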
Each module does the following.
- Resolve the name of the web page requested.
If the URL contains no file name, a default such as index.html is used.
- Authenticate the client.
This is needed because some pages are not available to the public.
- Perform access control on the client: check whether any restrictions apply to this client.
- Perform access control on the web page: check whether there are access restrictions on the page itself.
- Check the cache.
- Fetch the requested page.
- Determine MIME type
- Take care of miscellaneous odds and ends (e.g., building a user profile, gathering statistics).
- Return the reply to the client.
- Make an entry in the server log.
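The per-request steps above can be sketched as one function. Everything here is a simplified assumption: the tables are hypothetical stand-ins for real server configuration, `user` is taken to be an already authenticated identity (or `None`), and the cache is a plain dictionary.

```python
import posixpath

# All tables below are hypothetical stand-ins for real server configuration.
PROTECTED_PAGES = {"/private/report.html": {"alice"}}   # page -> allowed users
BLOCKED_CLIENTS = {"203.0.113.7"}                       # client-level restrictions
MIME_BY_EXTENSION = {".html": "text/html", ".pdf": "application/pdf",
                     ".jpeg": "image/jpeg"}


def process_request(path, client_ip, user, cache, read_from_disk, log):
    """Run one request through the steps; return (status, MIME type, body)."""
    # Resolve the name: a URL ending in "/" gets the default page.
    if path.endswith("/"):
        path += "index.html"
    # Access control on the client and on the page itself.
    allowed_users = PROTECTED_PAGES.get(path)
    if client_ip in BLOCKED_CLIENTS or (allowed_users is not None
                                        and user not in allowed_users):
        log.append((client_ip, path, 403))
        return 403, "text/plain", b"forbidden"
    # Check the cache, then fetch from disk on a miss.
    body = cache.get(path)
    if body is None:
        body = read_from_disk(path)
        cache[path] = body
    # Determine the MIME type from the file extension.
    extension = posixpath.splitext(path)[1]
    mime_type = MIME_BY_EXTENSION.get(extension, "application/octet-stream")
    # Return the reply and make an entry in the server log.
    log.append((client_ip, path, 200))
    return 200, mime_type, body
```

A usage example: with `disk = {"/index.html": b"home"}`, calling `process_request("/", "198.51.100.1", None, {}, disk.__getitem__, [])` resolves the default page and returns it with type `text/html`.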
If too many requests come in each second, the CPU will not be able to handle the processing load, irrespective of the number of disks in parallel. The solution is to add more machines, possibly with replicated disks. This is called a server farm. A front end accepts the requests and sprays them over multiple machines rather than multiple threads, to reduce the load on each machine. The individual machines are again multithreaded, with multiple disks.
Note that the cache is local to each machine, so a page cached on one machine does not help the others. The TCP connection should terminate at the processing node and not at the front end, so that the reply can go directly to the client.
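How a front end might spray requests over the machines of a server farm can be sketched as follows. The class name `FrontEnd` and the node names are hypothetical. Round-robin matches the "spraying" described above; choosing a node by page name is an alternative added here as an assumption, since it tends to make better use of per-machine caches.

```python
import itertools
import zlib


class FrontEnd:
    """Sprays incoming requests over the processing nodes of a server farm."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next_node = itertools.cycle(self.nodes)

    def pick_round_robin(self, path):
        # Spread load evenly; the same page may end up cached on every node.
        return next(self._next_node)

    def pick_by_page(self, path):
        # Alternative (an assumption, not from the text): send each page to a
        # fixed node so that node's local cache is likely to hold it already.
        return self.nodes[zlib.crc32(path.encode("utf-8")) % len(self.nodes)]


if __name__ == "__main__":
    front_end = FrontEnd(["node0", "node1", "node2"])
    for page in ["/a.html", "/b.html", "/a.html"]:
        print(page, front_end.pick_round_robin(page), front_end.pick_by_page(page))
```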