Intelligent web applications

This lecture was delivered at the AICTE-sponsored workshop on web mining. It covers information retrieval, searching, meta search engines, focused search engines, web mining, the agent-based web, knowledge management on the web, ontology management systems, and the wisdom web.


  Intelligent Applications for Web. Priti Srinivas Sajja, Associate Professor, Department of Computer Science, Sardar Patel University. Visit pritisajja.info for details. Created by Priti Srinivas Sajja.
  Introduction Natural Intelligence • Responds to situations flexibly. • Makes sense of ambiguous or erroneous messages. • Assigns relative importance to elements of a situation. • Finds similarities even though the situations might be different. • Draws distinctions between situations even though there may be many similarities between them. Artificial Intelligence • According to Rich & Knight (1991) "AI is the study of how to make computers do things, at which, at the moment, people are better". • A machine is regarded as intelligent if it exhibits human characteristics generated through natural intelligence. • AI is the study of human thought processes and moving toward problem solving in a symbolic and non-algorithmic way.
  Introduction "Artificial Intelligence(AI) is the study of how to make computers do things at which, at the moment, people are better" • Elaine Rich, Artificial Intelligence, McGraw Hill Publications, 1986
  Introduction [Figure: Constituents of artificial intelligence: human thought process; heuristic methods; tasks where people are better; non-algorithmic; characteristics we associate with intelligence; knowledge using symbols.] [Figure: Nature of AI solutions: an acceptable solution in acceptable time, versus an extreme solution, either best or worst, taking (infinite) time.]
  AI Tests: Testing Intelligence. The Turing test will fail to test for intelligence in two circumstances: 1. A machine may well be intelligent without being able to chat exactly like a human; and 2. The test fails to capture the general properties of intelligence, such as the ability to solve difficult problems or come up with original insights. If a machine can solve a difficult problem that no person could solve, it would, in principle, fail the test. [Figure: The Turing test. "Can you tell me what is 222222*67344?" "Why Sir?" The Boss could not judge who was replying, thus the machine is as intelligent as the secretary.]
  AI Tests: Can you find any test to check whether the given system is intelligent or not? [Figure: candidate behaviours: walks, perceives, understands, smells, and feels like a human; makes and tests jokes; reacts differently; solves your problem; talks like a human; translates, summarizes, and learns.] Conceptually form a test and use it in different situations before accepting it.
  Applications: Rich & Knight (1991) classified and described the different areas that Artificial Intelligence techniques have been applied to as follows. Mundane Tasks • Perception - vision and speech • Natural language understanding, generation, and translation • Commonsense reasoning • Robot control. Formal Tasks • Games - chess, backgammon, checkers, etc. • Mathematics - geometry, logic, integral calculus, theorem proving, etc. Expert Tasks • Engineering - design, fault finding, manufacturing planning, etc. • Scientific analysis • Medical diagnosis • Financial analysis
  Data Pyramid [Figure: Data pyramid. Axes: volume; sophistication and complexity. Levels from top to bottom, each with its information system (IS): • Wisdom (experience), WBS: strategy makers apply morals, principles, and experience to generate policies • Knowledge (synthesis), KBS: higher management generates knowledge by synthesizing information • Information (analysis), DSS, MIS: middle management uses reports/info. generated through analysis and acts accordingly • Data (processing of raw observations), TPS: basic transactions by operational staff using data processing]
  Knowledge Based systems [Figure: General structure of KBS: knowledge base, inference engine, explanation and reasoning, self-learning, user interface.] According to the classification by Tuthill & Levy (1991), five main types of KBS exist: • Expert systems • Linked systems • CASE based systems • Intelligent tutoring systems • Intelligent user interface for database
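  The interplay of a knowledge base and an inference engine in the structure above can be sketched in a few lines. This is only a minimal forward-chaining illustration, not the architecture of any particular KBS; the rule format, the fact names, and the function `forward_chain` are assumptions made for the example. The returned trace stands in for the explanation facility.

```python
def forward_chain(facts, rules):
    """Fire rules repeatedly until no new fact can be derived.

    facts: iterable of known facts (strings).
    rules: list of (premises, conclusion) pairs; hypothetical format.
    Returns the set of all derived facts and a trace of fired rules.
    """
    known = set(facts)
    trace = []  # simple explanation facility: which rule produced what
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                trace.append((tuple(premises), conclusion))
                changed = True
    return known, trace


# Hypothetical knowledge base with two rules.
RULES = [
    (("has_fever", "has_cough"), "suspect_flu"),
    (("suspect_flu",), "recommend_rest"),
]

derived, explanation = forward_chain({"has_fever", "has_cough"}, RULES)
for premises, conclusion in explanation:
    print(" and ".join(premises), "=>", conclusion)
```

  Keeping the rules as data, separate from the loop that applies them, mirrors the separation between the knowledge base and the inference engine.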
  Knowledge Based systems [Figure: Sources of knowledge: experience; experts; satellite broadcasting (Internet, TV, and radio); printed media.] Types of Knowledge • Tacit knowledge • Explicit knowledge • Commonsense knowledge • Informed commonsense knowledge • Heuristic knowledge • Domain knowledge • Meta knowledge
  Pros and Cons. Pros: • Intelligence, explanation and reasoning • Partial self learning, uncertainty handling • Documentation of knowledge • Proactive problem solving • Cost effectiveness. Cons: • Nature of knowledge • Large volume of knowledge • Knowledge acquisition techniques • Little support to engineer AI based systems • Shelf life of knowledge and system • Development effort
  Bio-Inspired Computing • New approaches to AI • Taking inspiration from nature and biological systems • Includes models such as Artificial Neural Network (ANN), Genetic Algorithm (GA), Swarm Intelligence (SI), etc. • Nature has virtues of self learning, evolution, emergence and immunity • The objective of bio-inspired models and techniques is to take inspiration from Mother Nature and solve problems in a more effective and intelligent way
  Artificial Neural Network (ANN) • An artificial neural network (ANN) is a connectionist model of programming using computers. • An ANN attempts to give computers humanlike abilities by mimicking the human brain's functionality. • The human brain consists of a network of on the order of a hundred billion interconnected neurons working in a parallel fashion. [Figure: A biological neuron and an artificial neuron with inputs X1 ... Xn, weights W1 ... Wn, the weighted sum of XiWi, and output y.]
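  The artificial neuron in the figure, with inputs X1 ... Xn, weights W1 ... Wn and output y, can be illustrated as a weighted sum followed by a threshold. The step activation, the threshold value, and the name `neuron_output` are assumptions chosen for this sketch; the slide does not fix an activation function.

```python
def neuron_output(inputs, weights, threshold=0.0):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# With unit weights and threshold 2, the neuron behaves like a logical AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, neuron_output([x1, x2], [1, 1], threshold=2))
```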
  A Perceptron: Multilayer Neural Network [Figure: input layer X1, X2, X3, ..., Xn; hidden layers; output layer O0, O1, ..., Om; connection weights such as W12 and W1h.]
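  A forward pass through such a layered network can be sketched as repeated weighted sums, one layer at a time. The sigmoid activation, the bias terms, and the names `layer_forward` and `mlp_forward` are assumptions for illustration; training (for example by backpropagation) is not shown.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute the outputs of one layer; weights holds one row per neuron."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in order."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs, one hidden layer of 2 neurons, 1 output.
LAYERS = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
print(mlp_forward([1.0, 2.0], LAYERS))
```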
  Swarm Intelligence • Inspired by the collective behavior of social insect colonies and other animal societies • Ant colonies, fish schools, bird flocks, and honeycombs are examples
  • The Semantic Web is an extension of the current Web in which information is given well-defined meaning by associating metadata (Berners-Lee, Hendler, & Lassila, 2001). • The basic objective of the Semantic Web is "making content machine-understandable". • The Semantic Web aims to allow Web entities (software agents, users, and programs) to interoperate, dynamically discover and use resources, extract knowledge, and solve complex problems.
  Challenges and limitations of the current Web • Lack of knowledge-based searches • Lack of effective techniques to access the Web in depth • Lack of mechanisms to deal with dynamic requirements of users • Lack of automatically constructed directories • Lack of multi-dimensional analysis and data mining support. By employing AI techniques for web functions, it is possible to partly impart intelligence in web-based business. [Figure: AI techniques (knowledge representation, knowledge management, expert system, heuristic functions, new AI methods) and Web technology (platform of Internet, protocols and standards, browser, search engine, Semantic Web, other software) together give Web Intelligence.] Web Intelligence (WI) is considered as the employment of AI techniques for the Web.
  [Figure: Components of Web Intelligence.] Semantic Web • Ontology management • Meta ontology • Interoperability • Inference. Social Web • Popular tools and techniques • Social Network Analysis. Search Engine Techniques • Search engine • Customized searches • Meta search engine • Search engine optimization. Web Knowledge Management • Knowledge management • Knowledge management architecture for Web • Security. Web Information Retrieval • Information retrieval and filtering • Performance measures • Usability. Web Mining • Web log mining • Web structure mining • Web content mining • Sensor Web mining • Pattern discovery. Web Agents • Intelligent agents • Multi agent systems. Human Computer Interaction/NLP • Personalized interface • Multi lingual interfaces • NLP
  To implement a simple Web crawler, the following steps can be performed.
  1. Start interaction with the user and seek keywords and a URL to start with.
  2. Add the URL to the list of URLs to search.
  3. Repeat while the list is not empty:
     3.1 Consider the first URL and mark it with an appropriate flag.
     3.2 If the protocol of the selected URL is not HTTP, then break.
     3.3 Follow the instructions in the robots.txt file, if any.
     3.4 Open the URL.
     3.5 If the URL is not an HTML file, then break; else add the file to the list of files found.
     3.6 Extract links by traversing the file.
     3.7 Repeat this procedure for every link within the file.
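  The crawler steps can be sketched with the Python standard library. To keep the example runnable without network access, page retrieval is passed in as a hypothetical `fetch` callable and the robots.txt check as a hypothetical `allowed` callable; a real crawler might back these with `urllib.request` and `urllib.robotparser`. The sketch reads "break" in steps 3.2 and 3.5 as "skip this URL", accepts HTTPS alongside HTTP, and omits the keyword dialogue of step 1. The page contents and URLs are invented.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, allowed=lambda url: True, max_pages=50):
    """Breadth-first crawl; fetch(url) returns (content_type, text) or None."""
    to_visit = deque([start_url])  # step 2: list of URLs to search
    seen = {start_url}             # step 3.1: flag URLs already queued
    found = []                     # step 3.5: HTML files found so far
    while to_visit and len(found) < max_pages:  # step 3
        url = to_visit.popleft()
        if urlparse(url).scheme not in ("http", "https"):
            continue               # step 3.2: not an HTTP(S) URL
        if not allowed(url):
            continue               # step 3.3: excluded by robots rules
        response = fetch(url)      # step 3.4: open the URL
        if response is None:
            continue
        content_type, text = response
        if not content_type.startswith("text/html"):
            continue               # step 3.5: not an HTML file
        found.append(url)
        parser = LinkExtractor()
        parser.feed(text)          # step 3.6: extract links
        for href in parser.links:  # step 3.7: queue every link
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return found


# Invented in-memory "web" used instead of real HTTP requests.
PAGES = {
    "http://example.test/": ("text/html", '<a href="/a.html">A</a> <a href="/doc.pdf">PDF</a>'),
    "http://example.test/a.html": ("text/html", '<a href="/">home</a> <a href="mailto:x@example.test">mail</a>'),
    "http://example.test/doc.pdf": ("application/pdf", ""),
}

visited = crawl("http://example.test/", PAGES.get)
print(visited)
```

  A focused crawler could reuse the same loop and add a relevance check before queuing a link.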
  [Figure: Web crawler process: spider, lists, index, processing, storage.] Simple crawler: searches all pages. Focused crawler: searches relevant pages. [Figure: Scope of focused crawler.]
  Information Retrieval (IR) is the science of finding, acquiring, storing, and utilizing information for problem solving. The formal steps are as follows: • Indexing • Query formulation • Matching query representation • Relevance feedback • Interactive retrieval
  Models of Information Retrieval 1. Boolean model - Boolean operators like AND, OR and NOT are applied to retrieve content. 2. Vector space model - represents the documents and queries as vectors (defined by keywords) in a space having more than one dimension. 3. Probabilistic model - ranks the retrieved content according to its estimated probability of relevance. 4. Latent semantic model - considers associations among terms and documents to retrieve the required content.
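  The first two models can be compared on a toy collection. The sketch below is illustrative only: the three documents, the whitespace tokenizer, and the use of raw term counts (rather than a weighting such as TF-IDF) are assumptions. The Boolean function implements only the AND operator, and the vector space function ranks documents by cosine similarity to the query.

```python
import math
from collections import Counter

# Invented documents for illustration.
DOCS = {
    "d1": "web mining finds patterns in web data",
    "d2": "information retrieval finds relevant documents",
    "d3": "agents search the web for information",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def boolean_and(terms, docs):
    """Boolean model: ids of documents that contain every query term."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(term in tokenize(text) for term in terms)
    )


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_rank(query, docs):
    """Vector space model: document ids ordered by similarity to the query."""
    q = Counter(tokenize(query))
    scores = {doc_id: cosine(q, Counter(tokenize(text))) for doc_id, text in docs.items()}
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(boolean_and(["web", "information"], DOCS))
print(vector_rank("web mining", DOCS))
```

  The Boolean model returns an unordered match set, while the vector space model orders every document, which is why the latter is the more usual basis for ranked search results.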
  NLP for IR [Figure: Generic NLP architecture. Labels recovered from the figure: user; terminology, grammar, lexicon, token templates; preprocessing; tokenizer, recognizer, parser, interpreter; natural query; interpreted request; search query; dialog processor; search mechanism; search result; filtered result; user profile and dialog context model; local information and domain terminology; analyzer and generator.]
  Research Trend in IR • Heuristic filtering • Semantic Information • Multimedia Data • Opinion Retrieval • Information retrieval and translation • Fuzzy Boolean model of information retrieval
  The Web follows a document-centric approach, which lacks efficient representation of and access to the content on the Web. [Figure: Typical Knowledge Management Process. Labels recovered from the figure: knowledge sources; knowledge engineer; discover; document; knowledge base; share; use; organizational requirements; standards, protocols, and services.]
  [Figure: Knowledge Management Architecture on the Web. Labels recovered from the figure: service standards; crawler; inference mechanism; knowledge discovery; metadata repository; domain ontology; knowledge processing; knowledge editor; knowledge presentation; knowledge management; local documents; user profiles; experts; administrator; users.]
  Knowledge Management on Web • Autonomous agents for knowledge discovery • Protocols for knowledge share and use • Ontology editors • K-Commerce • Knowledge management models • Virtual world • Wisdom Web
  Data Mining • Data mining techniques are dedicated techniques that extract patterns and useful information from existing known sources of data. Text Mining • Text mining techniques are used to find, organize and discover information from textual resources. Web Mining • Web mining techniques are used to find, organize and discover information from a huge unstructured platform such as the Web.
  Challenges of Web Mining • Structure: highly unstructured • Size: tremendous • Nature: dynamic • Accessibility: global, by anybody • Redundant: similar information in many formats • Noise: virus, malware and adware
  Web Mining and Other Related Activities

  Purpose                       | Any data       | Textual data          | Web data
  Finding pattern and knowledge | Data Mining    | Text Mining           | Web Mining
  Finding relevant data         | Data Retrieval | Information Retrieval | Web Retrieval

  (Column headings give the data type/sources.)
  [Figure: Web Mining divides into Web Structure Mining, Web Content Mining, and Web Log Mining.]
  Web Content Mining • It attempts to mine the content of the Web to discover useful patterns through hyperlinks. • The content may be text, images, audio, video, and structured data like tables and graphs. • Web content mining goes beyond keyword extraction and requires advanced techniques such as NLP and AI. • Web content mining strategies fall into two groups: one that directly mines the content of documents, and a second that improves on the content search of other tools like search engines.
  Classification as Web Content Mining Techniques • Classification: deals with classifying the content into various groups as accurately as possible. The training sets and test (validation) sets are provided to the classification algorithm to build and to test the classification model respectively. Typical classification techniques include: • Decision tree based methods; • Rule based classification; • Supervised learning through artificial neural network; • Evolutionary techniques; and • Support vector machines.
  Clustering as Web Content Mining Techniques • Clustering: deals with finding groups of similar objects based on the content characteristics themselves, in an unsupervised approach. [Figure: Partition and hierarchical clustering. Partition clustering: starting from initial points, 1, 2 and 3 are independent clusters. Hierarchical clustering: here cluster 3 is a subset of 2, and 2 is a subset of 1.]
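  Partition clustering from initial points can be illustrated with k-means, one common partitioning method; the slide does not name a specific algorithm, so k-means is an assumption here, as are the one-dimensional data and the function name `kmeans_1d`.

```python
def kmeans_1d(points, centers, iterations=10):
    """Assign each point to its nearest center, then move centers to cluster means."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters


# Invented data with two obvious groups; 1 and 10 are the initial points.
centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[1, 10])
print(centers, clusters)
```

  The resulting clusters are independent of one another, unlike hierarchical clustering, where one cluster can be nested inside another.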
  Association Mining as Web Content Mining Technique • Association Mining: deals with discovering interesting relations between variables in large databases. This technique finds rules that predict the occurrence of an entity based on general patterns that exist in the given data sets. • Consider the following example.

  Transaction ID | Bread | Cheese | Sauce
  1              | Yes   | Yes    | Yes
  2              | No    | No     | Yes
  3              | No    | Yes    | Yes
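  Using the three example transactions, the strength of a candidate rule such as "Cheese implies Sauce" can be measured with support and confidence, the two standard association-rule measures. The slide does not say which measures it intends, so their use here is an assumption, as are the function names.

```python
# The example transactions, written as sets of purchased items.
TRANSACTIONS = {
    1: {"Bread", "Cheese", "Sauce"},
    2: {"Sauce"},
    3: {"Cheese", "Sauce"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


print("support(Cheese, Sauce)    =", support({"Cheese", "Sauce"}, TRANSACTIONS))
print("confidence(Cheese->Sauce) =", confidence({"Cheese"}, {"Sauce"}, TRANSACTIONS))
print("confidence(Sauce->Cheese) =", confidence({"Sauce"}, {"Cheese"}, TRANSACTIONS))
```

  Every transaction with Cheese also contains Sauce, so that rule has confidence 1, while the reverse rule is weaker because transaction 2 contains Sauce without Cheese.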
  Opinion Mining as Web Content Mining Technique • Opinion Mining: deals with extracting the opinions of users to learn their attitude towards content, a person or a product. Opinion mining plays an important role in mining applications for customer relationship management, consumer attitude detection, brand and product positioning, product reviews, and market research. • Feature based opinion mining mines the Web content by given features of a specified product/entity. • Once the opinions are collected, they are further grouped and analyzed.
  Some other Web Content Mining Techniques • Structured data extraction: deals with extraction of important information about products, services and data records that are available in structured form on host pages. • Unstructured content extraction: deals with extraction of content that is not available in structured form. • Web information integration: extracts content from multiple sites, checks for redundancy, and integrates information. Vice versa, content mining can also be used for web site classification/clustering. • Detecting noise: malware, adware and viruses from multiple sites can be identified and blocked. • Opinion mining: customer surveys, opinions, sentiments, product review information, etc. can be extracted here.
  Web Usage Mining • Web usage mining gathers information about what users have accessed so far. • Web usage mining highlights the behavior of users on the Web and helps in understanding access patterns and trends. • Web usage mining deals with web logs and accumulated data on web servers in order to understand user behavior and the web structure. • There are two main purposes for web usage mining. The first is to track general access patterns and the second is customized usage tracking.
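  Both purposes can be illustrated on a few web server log lines. The log lines below are invented and assume a Common Log Format style layout; real server logs vary, and a production parser would need to be more robust. Page hit counts stand in for general access patterns, and the per-client page sequence stands in for customized usage tracking.

```python
from collections import Counter, defaultdict

# Invented log lines in an assumed Common Log Format style.
LOG_LINES = [
    '10.0.0.1 - - [16/Apr/2012:10:00:00 +0530] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [16/Apr/2012:10:00:05 +0530] "GET /courses.html HTTP/1.1" 200 2048',
    '10.0.0.2 - - [16/Apr/2012:10:01:00 +0530] "GET /index.html HTTP/1.1" 200 1024',
    'malformed line',
]


def parse_line(line):
    """Return (client, path), or None when the line does not parse."""
    try:
        client = line.split()[0]
        request = line.split('"')[1]  # e.g. GET /index.html HTTP/1.1
        _method, path, _protocol = request.split()
    except (IndexError, ValueError):
        return None
    return client, path


def mine_usage(lines):
    """Count hits per page and collect the ordered pages seen by each client."""
    page_hits = Counter()
    paths_by_client = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # cleaning: drop lines that cannot be parsed
        client, path = parsed
        page_hits[path] += 1
        paths_by_client[client].append(path)
    return page_hits, dict(paths_by_client)


hits, sessions = mine_usage(LOG_LINES)
print(hits.most_common())
print(sessions)
```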
  [Figure: Activities for web usage mining. Labels recovered from the figure: retrieval of log data, registration data, and other information; cleaning of noise and malware; identification; data integration; pattern discovery and analysis; analysis and use of patterns.]
  Web Structure Mining • The Web behaves like a hypertext document information system. Web objects such as pages and sites are generally connected by a number of links. • Web structure mining focuses on the structure of such hyperlinks on the Web. • There are two basic techniques to analyze the network of links on the Web. These methods are (i) the Hyperlink-Induced Topic Search (HITS) concept and (ii) the PageRank method. • The Web may be represented as a huge directed graph structure.
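  Treating the Web as a directed graph, the PageRank idea can be sketched as an iterative computation in which each page passes a share of its rank along its outgoing links. This is a simplified textbook formulation, not the production algorithm of any search engine; the damping factor 0.85, the iteration count, and the four-page graph are assumptions. HITS is not shown.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a graph given as {page: [linked pages]}.

    Every linked page must also appear as a key of the graph.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Invented link structure: C is linked from A, B and D.
WEB = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(WEB)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

  Page C collects links from three pages and ends up ranked highest, while D, which nothing links to, keeps only the baseline share.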
  Sensor Web Mining • Data are collected from different sensors placed at remote places. • Provides the opportunity of efficient geo-referencing in a remote fashion. • The Sensor Web consists of a number of sensor platforms called pods. • Each pod senses some dynamic environmental data in real time. • Radio is used to connect the pod with its local neighborhood. • Applications are weather forecasting, coastal area monitoring, communication and education, and eco-system information and management. [Figure: Architecture of a pod: sensor suite, memory, microcontroller, radio, solar panel.]
  AI for Web Mining • Mining pro-active agents • ANN for finding/analyzing patterns • Fuzzy partitions and clustering • Evolution of patterns from Web • Heuristic based filtering functions for mining • Sentiment mining using NLP on social network platform
  Agent Based Web [Figure: Agent: sensors to acquire environmental information and user's requirement; autonomy; mobility; cooperation; proactivity; learning; action interface.] Types of Agents • Collaborative Agent • Information Agent • Interface Agent • Intelligent Agent • Mobile Agent • Hybrid Agent
  [Figure: Agent based web. Labels recovered from the figure: filtering agent; interface agent; browsers; document management agent; URL management agent; query agent; Web/search engine; Semantic Web agent; protocols and standards; Internet; ontology tools; core ontology agent; social networking services agent; customized services.]
  [Figure 11.10 Information retrieval agent. Labels recovered from the figure: Web; local databases; domain databases; ontology base; knowledge base; query manager; search engine; visualization; Web browser; clients.]
  • Agent for semantic analysis • Verification and validation (V&V) agent • Finding suitable web services agent • Crawler agent • Explanation and reasoning agent • Natural language interface agent • Communication agent • Network traffic management agent • Mobile agent for personalized content representation
  Major References • Knowledge-based systems, Akerkar RA and Priti Srinivas Sajja, Jones & Bartlett Publishers, Sudbury, MA, USA (2009) • Intelligent technologies for web applications, Priti Srinivas Sajja, Rajendra Akerkar; CRC Press (Taylor & Francis Group), Boca Raton, FL, USA (2012) Other References • llustrationsOf.com • coders-view.blogspot.com • info.ideal.com • http://businessintelligencetalk.blogspot.in • www.gadgetcage.com • Engadget.com • scenicreflections.com • lih.univ-lehavre.fr • business2press.com • globalswarminghoneybees.blogspot.com • pritisajja.info
  Acknowledgement: To the participants and authority of the AICTE sponsored Staff Development Programme on Data Mining, 16-28 April, 2012 at the L. J. Institute of Engineering & Technology, Ahmedabad.