Monday 25 July 2011

Confined Web Spiders



INTRODUCTION & OBJECTIVE
The objective of the project is to develop a system that retrieves information and documents efficiently and limits the number of returned documents by performing an intelligent search procedure. The system is designed to display only relevant information to the user and to suppress unnecessary and irrelevant information.
EXISTING SYSTEM
         Traditional Confined Web Spiders consult databases of the most frequently used words in documents, such as words drawn from a document's title and first few sentences. Hence they will not retrieve documents in which the keywords being searched for are buried deeper in the text. They are useful only for searching for specific information on the World Wide Web (WWW). Many page authors submit web pages that use tricks, such as an irrelevant title tag or words repeated in the first few lines that have nothing to do with the actual content of the page, to boost their ratings. This can lead to a situation in which not even one of the top ten sites listed is on the expected subject. Anyone can put up a web page, so results may range from academic work to internet gossip. Because HTML does not provide any standard method of identifying the contents of a document, it is extremely difficult for a Confined Web Spider to identify the contents of a web page in order to index it. As the World Wide Web keeps expanding, the threat to the quality of the information available on it keeps growing.
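The limitation described above can be illustrated with a minimal sketch. It is not the code of any real spider; the class and method names are hypothetical, and "first few sentences" is simplified to "the first sentence". One method looks only at the title and lead sentence, the other at the whole text.

```java
import java.util.Locale;

// Hypothetical illustration: title-and-lead matching versus full-text matching.
class TitleOnlyMatcher {

    // Looks for the keyword only in the title and in the first sentence of the body.
    static boolean matchesTitleOrLead(String title, String body, String keyword) {
        String k = keyword.toLowerCase(Locale.ROOT);
        int end = body.indexOf('.');
        String lead = end >= 0 ? body.substring(0, end) : body;
        return title.toLowerCase(Locale.ROOT).contains(k)
                || lead.toLowerCase(Locale.ROOT).contains(k);
    }

    // Looks for the keyword anywhere in the title or the body.
    static boolean matchesFullText(String title, String body, String keyword) {
        String k = keyword.toLowerCase(Locale.ROOT);
        return title.toLowerCase(Locale.ROOT).contains(k)
                || body.toLowerCase(Locale.ROOT).contains(k);
    }
}
```

With a page titled "Gardening tips" whose second sentence mentions "blight", the first method misses the page for the query "blight" while the second finds it.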


PROPOSED SYSTEM
         XML (Extensible Markup Language) is a simplified form of SGML (Standard Generalized Markup Language), the mother of all document-definition languages. XML is not as powerful as SGML, but it is much easier to use. Developing web pages in XML is similar to developing them in HTML, but XML lets authors invent their own tags; the tag names and their meanings are left to the author to define according to the subject matter. Most importantly, XML allows more detail about the content to be included in a document, so searches for specific topics should become more accurate and produce fewer mismatches.
         This application automates the process of sending queries to websites and presents the search results from all the sites to the user. It is a Confined Web Spider developed for easy search, built with state-of-the-art technology and operational with current technologies and practices. In addition, the user interface makes the user or administrator more comfortable by placing all the complex tools at his/her disposal. Implementing the Confined Web Spider software tool in an organization's website is practical, as it does not demand any external resources or components.
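To show why author-defined tags can make searching more accurate, here is a small sketch that uses the standard Java DOM parser. The tag names (job, title, location) and the class name are assumptions made for illustration only; the post does not specify the project's actual document format.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: search by the meaning of a tag rather than by raw text.
class TaggedSearch {

    // Returns the <title> text of every <job> whose <location> equals the given value.
    static List<String> titlesByLocation(String xml, String location) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList jobs = doc.getElementsByTagName("job");
            List<String> titles = new ArrayList<String>();
            for (int i = 0; i < jobs.getLength(); i++) {
                Element job = (Element) jobs.item(i);
                if (location.equalsIgnoreCase(text(job, "location"))) {
                    titles.add(text(job, "title"));
                }
            }
            return titles;
        } catch (Exception e) {
            // Malformed XML or parser setup failure.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of the first child element with the given tag, or "" if it is absent.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

Because the match is made against the location tag only, a job titled "Hyderabad Tour Guide" that is located in Chennai is not returned for the location "Hyderabad", which a plain keyword match over the whole page would not be able to rule out.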
To provide flexibility to the users, the interfaces have been developed to be accessible through a browser. The GUIs at the top level are categorized as:
1.      Administrative user interface
2.      The operational or generic user interface

The ‘administrative user interface’ concentrates on the consistent information that is part of the organization's activities and that requires proper authentication for data collection. These interfaces support administrators in all transactional operations, such as data insertion, data deletion and data update, along with extensive data search capabilities.
The ‘operational or generic user interface’ helps end users carry out transactions on the existing data and use the required services. It also helps ordinary users manage their own information in a customized manner, within the flexibility the system allows.
INPUT & OUTPUT REPRESENTATION
Input design is a part of overall system design. The main objectives of input design are:
·        To produce a cost-effective method of input.
·        To achieve the highest possible level of accuracy.
·        To ensure that the input is acceptable and understood by the user.

INPUT STAGES:
The main input stages can be listed as below:
·      Data recording
·      Data transcription
·      Data conversion
·      Data verification
·      Data control
·      Data transmission
·      Data validation
·      Data correction

INPUT TYPES:
It is necessary to determine the various types of inputs.  Inputs can be categorized as follows:
·      External inputs, which are prime inputs for the system.
·      Internal inputs, which are user communications with the system.
·      Operational inputs, which are the computer department’s communications to the system.
·      Interactive inputs, which are entered during a dialogue.

INPUT MEDIA:
At this stage a choice has to be made about the input media. In deciding on the input media, consideration has to be given to:
·      Type of input
·      Flexibility of format
·      Speed
·      Accuracy
·      Verification methods
·      Rejection rates
·      Ease of correction
·      Storage and handling requirements
·      Security
·      Ease of use
·      Portability
In view of the above description of input types and input media, most of the inputs are internal and interactive. As input data is keyed in directly by the user, the keyboard is the most suitable input device.

OUTPUT DESIGN:

Outputs from computer systems are required primarily to communicate the results of processing to users. They are also used to provide a permanent copy of the results for later consultation. The various types of outputs in general are:
·        External outputs, whose destination is outside the organization.
·        Internal outputs, whose destination is within the organization and which are the user’s main interface with the computer.
·        Operational outputs, whose use is purely within the computer department.
·        Interface outputs, which involve the user in communicating directly with the system.

OUTPUT DEFINITION

The outputs should be defined in terms of the following points:

§  Type of the output
§  Content of the output
§  Format of the output
§  Location of the output
§  Frequency of the output
§  Volume of the output
§  Sequence of the output
It is not always desirable to print or display data exactly as it is held on a computer. It should be decided which form of output is the most suitable.
For example:
·        Will decimal points need to be inserted?
·        Should leading zeros be suppressed?
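The two formatting questions above can be made concrete with a short sketch. It assumes, purely for illustration, that amounts are stored as whole numbers of hundredths and that identifiers are stored as zero-padded digit strings; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical helpers for turning stored values into displayed values.
class OutputFormat {

    // A value stored as hundredths (1250) is displayed with a decimal point ("12.50").
    static String insertDecimalPoint(long hundredths) {
        return BigDecimal.valueOf(hundredths, 2).toPlainString();
    }

    // Removes leading zeros but keeps a single "0" when the input is all zeros.
    static String suppressLeadingZeros(String digits) {
        return digits.replaceFirst("^0+(?!$)", "");
    }
}
```

For instance, insertDecimalPoint(1250) gives "12.50" and suppressLeadingZeros("000420") gives "420".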
OUTPUT MEDIA:
In the next stage it has to be decided which medium is the most appropriate for the output. The main considerations when deciding on the output media are:
·        The suitability of the device for the particular application.
·        The need for a hard copy.
·        The response time required.
·        The location of the users.
·        The software and hardware available.
In view of the above description, the project's outputs fall mainly under the category of internal outputs. According to the requirement specification, the outputs need to be generated both as hard copy and as query results viewed on the screen. Accordingly, the output format is taken from the outputs currently obtained through manual processing. A standard printer is to be used as the output medium for hard copies.
This application consists of the following modules:
1.      Administrator module
2.      User module
3.      Products module
4.      Jobs module
5.      Yellow pages module
6.      Resume module
1. Administrator Module:
         This module is for the administrator who maintains the application. It allows the administrator to add all objects to the application. The entire application is under the administrator's control, and the administrator has the authority to add the data that is held in the database.
2. User Module:
         This module is for the user. Through it the user can view all the functions of the application and search for products, jobs and so on. Users can post their resumes after registering in the application.
3. Products Module:
         Users can search for products here. The products added by the administrator are displayed.
4. Jobs Module:
         Users can search for jobs based on selected criteria such as experience, location, part time or full time.
5. Yellow Pages Module:
         This module maintains details of companies and organizations that provide services of various types in various sectors.
6. Resume Module:
         This module provides resume services. Through it, registered users can post their resumes.
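The criteria-based search described for the Jobs module could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the Job fields, the in-memory list (standing in for the database) and the rule that a null criterion means "any" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of filtering jobs by location, experience and job type.
class JobSearch {

    // Assumed job record; field names are illustrative, not the project's schema.
    static class Job {
        final String title;
        final String location;
        final int minExperienceYears;
        final boolean fullTime;

        Job(String title, String location, int minExperienceYears, boolean fullTime) {
            this.title = title;
            this.location = location;
            this.minExperienceYears = minExperienceYears;
            this.fullTime = fullTime;
        }
    }

    // A null location or null fullTime places no restriction on that criterion.
    static List<Job> search(List<Job> jobs, String location, int experienceYears, Boolean fullTime) {
        List<Job> matches = new ArrayList<Job>();
        for (Job job : jobs) {
            boolean locationOk = location == null || job.location.equalsIgnoreCase(location);
            boolean experienceOk = job.minExperienceYears <= experienceYears;
            boolean typeOk = fullTime == null || job.fullTime == fullTime.booleanValue();
            if (locationOk && experienceOk && typeOk) {
                matches.add(job);
            }
        }
        return matches;
    }
}
```

In a deployment like the one described, the same criteria would more likely be turned into a database query; the loop here only shows how the criteria combine.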
PERFORMANCE REQUIREMENTS
Performance is measured in terms of the output provided by the application. Requirement specification plays an important part in the analysis of a system. Only when the requirement specifications are properly given is it possible to design a system that fits into the required environment. It rests largely with the users of the existing system to give the requirement specifications, because they are the people who will finally use the system. The requirements have to be known during the initial stages so that the system can be designed according to them. It is very difficult to change a system once it has been designed; on the other hand, a system that does not cater to the requirements of the user is of no use.
The requirement specification for any system can be broadly stated as follows:
·        The system should be able to interface with the existing system.
·        The system should be accurate.
·        The system should be better than the existing system.
The existing system is completely dependent on the user to perform all the duties.

HARDWARE AND SOFTWARE REQUIREMENTS

HARDWARE REQUIREMENTS
PROCESSOR               : Intel 2.0 GHz or above
HARD DISK               : 80 GB
RAM                     : 512 MB

SOFTWARE REQUIREMENTS
OPERATING SYSTEM        : Windows XP with SP2
LANGUAGE (FRONT END)    : Java (JDK 1.5/1.6)
SERVER                  : Apache Tomcat 5.5/6.0
WEB TECHNOLOGY          : HTML, JavaScript, CSS
DATABASE (BACK END)     : Oracle 10g
ARCHITECTURE            : 3-tier architecture
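
As a rough picture of what a 3-tier split can look like in Java, the sketch below separates a data tier, a business tier and a presentation tier for a product search. All names are hypothetical, and an in-memory list stands in for the Oracle back end so that the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;

// Data tier: hypothetical interface; a real deployment would back it with the database.
interface ProductStore {
    List<String> findByKeyword(String keyword);
}

// In-memory stand-in for the database, used only to keep the sketch runnable.
class InMemoryProductStore implements ProductStore {
    private final List<String> names = new ArrayList<String>();

    void add(String name) {
        names.add(name);
    }

    public List<String> findByKeyword(String keyword) {
        List<String> hits = new ArrayList<String>();
        String k = keyword.toLowerCase();
        for (String name : names) {
            if (name.toLowerCase().contains(k)) {
                hits.add(name);
            }
        }
        return hits;
    }
}

// Business tier: cleans up the query before it reaches the data tier.
class ProductSearchService {
    private final ProductStore store;

    ProductSearchService(ProductStore store) {
        this.store = store;
    }

    List<String> search(String rawQuery) {
        String query = rawQuery == null ? "" : rawQuery.trim();
        if (query.length() == 0) {
            return new ArrayList<String>();
        }
        return store.findByKeyword(query);
    }
}

// Presentation tier: renders results as an HTML list (escaping omitted for brevity).
class ProductPage {
    static String render(List<String> results) {
        StringBuilder html = new StringBuilder("<ul>");
        for (String result : results) {
            html.append("<li>").append(result).append("</li>");
        }
        return html.append("</ul>").toString();
    }
}
```

The point of the split is that the presentation code never touches the store directly, so the in-memory store could be swapped for a database-backed one without changing the other two tiers.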

2 comments:

  1. please i need source code and documentation for this project

  2. Where can i get the source code for this project?
