A two-stage feature selection method for text
categorization by using information gain, principal
component analysis and genetic algorithm
Presented by
Project scope
• The application implements a two-stage feature selection
method for text categorization using information
gain, principal component analysis and a genetic
algorithm. Because the number of documents in
digital form keeps increasing, automated text categorization has become
more important in recent decades.
• Two-stage feature selection and feature extraction
are used to reduce the high dimensionality of a
feature space composed of a large number of
terms, remove redundant and irrelevant features
from the feature space, and thereby improve the
performance of text categorization.
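As an illustration of the information gain part of the method, the following is a minimal Java sketch, assuming that terms are scored by how much knowing a term's presence or absence reduces the entropy of the class distribution. The class name `InfoGainSketch`, its method names and the example counts are hypothetical and do not come from the slides.

```java
import java.util.Arrays;

// Hypothetical sketch: scores one term by information gain from document counts.
final class InfoGainSketch {

    // Shannon entropy (in bits) of a vector of per-class document counts.
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // withTerm[c]    = number of documents of class c that contain the term
    // withoutTerm[c] = number of documents of class c that do not contain it
    static double informationGain(int[] withTerm, int[] withoutTerm) {
        int classes = withTerm.length;
        int[] all = new int[classes];
        int nWith = 0;
        int nWithout = 0;
        for (int c = 0; c < classes; c++) {
            all[c] = withTerm[c] + withoutTerm[c];
            nWith += withTerm[c];
            nWithout += withoutTerm[c];
        }
        int n = nWith + nWithout;
        if (n == 0) {
            return 0.0;
        }
        // Class entropy after splitting the documents on term presence.
        double conditional = ((double) nWith / n) * entropy(withTerm)
                + ((double) nWithout / n) * entropy(withoutTerm);
        return entropy(all) - conditional;
    }

    public static void main(String[] args) {
        // A term that appears only in class 0 separates the two classes completely.
        System.out.println(informationGain(new int[] {10, 0}, new int[] {0, 10}));
        // A term spread evenly over both classes carries no class information.
        System.out.println(informationGain(new int[] {5, 5}, new int[] {5, 5}));
    }
}
```

Terms with the highest scores would be kept for the next stage; the cut-off is a tuning choice that the slides do not specify.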
User classes & characteristics
There are two user modules, viz. decision tree and KNN
classifier.
• Decision tree: The tree is built in two phases. The first phase is tree growing, where a
tree is built by greedily splitting each tree node. Because
the tree can overfit the training data, the overfitted
branches of the tree are removed (pruned) in the second phase.
• KNN classifier: The KNN classifier ranks the document's
neighbors among the training documents and uses the
class labels of the k most similar neighbors. Similarity
between two documents may be measured by the
Euclidean distance, cosine measure, etc.
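The KNN description above can be sketched in Java as follows. This is a minimal illustration, assuming documents are already represented as numeric term-weight vectors, that cosine similarity is the chosen measure, and that the class is decided by a simple majority vote among the k nearest neighbors. The class name `KnnSketch`, its method names and the toy data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: k-nearest-neighbor text classification with cosine similarity.
final class KnnSketch {

    // Cosine of the angle between two vectors of equal length; 0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the majority label among the k training documents most similar to the query.
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < train.length; i++) {
            order.add(i);
        }
        // Rank training documents from most to least similar.
        order.sort(Comparator.comparingDouble((Integer i) -> -cosine(train[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (int r = 0; r < Math.min(k, order.size()); r++) {
            String label = labels[order.get(r)];
            int v = votes.merge(label, 1, Integer::sum);
            // On a tie, the label that reached the count first (nearer neighbors) is kept.
            if (v > bestVotes) {
                bestVotes = v;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = {{1.0, 0.0}, {0.9, 0.1}, {0.0, 1.0}, {0.1, 0.9}};
        String[] labels = {"sports", "sports", "politics", "politics"};
        System.out.println(classify(train, labels, new double[] {1.0, 0.2}, 3));
    }
}
```

Swapping `cosine` for a negated Euclidean distance would give the other similarity measure mentioned above without changing the voting logic.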
Operating environment
This application is developed on the Java platform and will be hosted on a
system running the Java JDK and a Tomcat server. The system will
primarily be developed and tested on Windows operating systems,
but our goal is to make it a platform-independent solution. The target
platforms are:
• Linux
• Microsoft Windows
• Solaris
Design and Implementation Constraints
All design and coding will be done on the Java
platform. However, the application could also be
implemented in C#.NET.
Assumptions and Dependencies
Because the application is based on the Java platform, we assume that the user's
system has a JVM installed to run this application.
SYSTEM FEATURE
Functional requirements
Hard disk: 80 GB
RAM: 1 GB
Processor: Intel Pentium IV
Technology: Java
Tools: NetBeans
Operating system: Windows
EXTERNAL INTERFACE REQUIREMENTS
User Interfaces: The application is accessible through a web browser and interacts
with its users through web components. There are two types of user for this
system, retail manager (or analyst) and customer, and each interacts with the system through
the following UIs.
Main screen: This interface shows options according to the user type.
For analysts, the options relate to the type of analysis they want to
perform:
• Method-wise analysis
• Decision tree analysis
• KNN classifier analysis
Each of the above analyses has a separate screen showing advanced
options for that analysis, for example
buttons for choosing the format in which the output is displayed: graphical formats
such as pie charts and bar graphs, or tabular format.
Output screen:
On this screen, output is produced in graphical format with a proper description,
along with options to save the result for further use, compare it with old results, or
discard it if it is of no use.
Software Interfaces
• Name: Java
Version Number: Version 6.0
• Name: MySQL
Version Number: Version 7.0.1
The system must use MySQL server as its database
• Name: NetBeans
Version Number: Version 6 onward
Communications Interfaces
The system will communicate over the internet/network through an
Apache Tomcat server, which serves the application over HTTP.
NON-FUNCTIONAL REQUIREMENTS
Performance Requirements
• The system produces results faster with 4 GB of RAM.
• Peak loads at the main node may take more time.
• The system will be available 100% of the time. If
a fatal error occurs, the system will provide
understandable feedback to the user.
Safety and Security Requirements
• All data will be backed up automatically every day, and the
system administrator can also back up the data
manually.
• The system is designed in modules, so errors can be
detected and fixed easily. This also makes it easier to install and
update new functionality if required.
Software Quality Attributes
• Usability: The application is user-friendly because the GUI is
interactive.
• Maintainability: The application can be maintained over a long period of
time because it is implemented on the Java platform.
• Reusability: The application can be reused by extending it with
new modules.
• Performance: The application performs
faster with 4 GB of RAM. However, the basic
requirement to run the application is 1 GB.
• Security: Because the application is developed in Java, it is
considered more secure than other environments.
Data flow diagram
UML Activity diagram
UML State transition diagram
UML Sequence diagram
TECHNICAL SPECIFICATION
ADVANTAGES
• The application is platform independent because it is
developed in Java.
• The application is user-friendly because the
GUI is compatible with all operating environments.
DISADVANTAGE
• Because the application performs several tasks at the same
time, it can take a long time to generate output.
Applications
• Spam filtering: a process that tries to distinguish email
spam messages from legitimate emails.
• Email routing: forwarding an email sent to a general address to a
specific address or mailbox depending on its topic.
• Language identification: automatically determining the language of a
text.
• Genre classification: automatically determining the genre of a text.
• Readability assessment: automatically determining the degree of
readability of a text, either to find suitable materials for different age
groups or reader types, or as part of a larger text
simplification system.
