Introduction to Knowledge Graphs

  1. Introduction to Knowledge Graphs
      Mukul Joshi, Enterprise Architect, Tech Mahindra Ltd.
      mukul@techmahindra.com | @mukulashokjoshi
  2. Introduction to Knowledge Graphs
      Knowledge Graphs are used in some of the largest enterprises in the world, as well as in Intelligent Agents, to capture the concepts/entities in their domains (i.e. context) and the relationships among those concepts/entities, in order to:
      • drive their businesses,
      • generate insights/inferences, and
      • enhance the entities/relationships by applying appropriate learning techniques to generate new entities/relationships.
      This presentation gives a brief overview of Knowledge Graphs:
      • What? Preamble/background to Knowledge Graphs
      • Who? Who uses Knowledge Graphs
      • Why? Why use Knowledge Graphs
      • How? How to use Knowledge Graphs
      A few Knowledge Graph use cases:
      • Intelligent Agents
      • Semantic Web
      • Semantic Search Engines
      • Social Network graphs
      • Biological Networks
      • Enterprise Knowledge Graphs (including the likes of CMDB/IT Architecture)
      • Master Data Management
      • Dialog Systems/NLP/NLG
      • Financial Knowledge Graphs (risk analysis, fraud detection, GDPR, etc.)
      • Cybersecurity
  3. Knowledge Graphs: What, Why, Who, How
  4. What are Knowledge Graphs?
  5. AI Dimensions – Machine Evolution – What is the End Game?
      [Diagram labels: Evolution; Human; Machine; Artificial Intelligence; Human Intelligence; Human Machine Interface; Neural Chip (Elon Musk); Science Fiction?; Cyborg; I, Robot; Consciousness; Matrix; Control Devices; Control Humans??; Artificial Consciousness; Human Level AI; Artificial General Intelligence; Cognitive Architecture]
      Aside: the Matrix digital rain is made of sushi recipes (https://en.wikipedia.org/wiki/Matrix_digital_rain).
      Wake Up, Neo
      In an interview with MIT researcher Lex Fridman, Tesla and SpaceX CEO Elon Musk reiterated his belief that we’re all living inside a simulation. When Fridman asked him what his first question would be for the first-ever artificial general intelligence system, Musk replied: “What’s outside the simulation?”
      Pong Of My People
      Musk has floated the possibility that we’re all living in a “The Matrix”-style simulation for several years. During a 2016 interview at a tech conference, Musk asserted that there’s “a one in billions chance we’re in base reality.” The argument: video games have evolved from a “Pong, two rectangles and a dot”, he told audiences at the 2016 conference, to “photorealistic 3D simulations with millions of people playing simultaneously.” Given enough time, those games would become “indistinguishable from reality.”
      Source: https://futurism.com/elon-musk-smart-ai-simulation
  6. AI Dimensions – Intelligent Agent/System
      [Diagram: an Intelligent Agent/System classified along two axes, Thinking vs. Acting and Human vs. Rational]
      Human: the agent mimics human thinking and behavior
      • Fidelity to human performance
      • Empirical, i.e. based on observations and hypotheses about human behavior
      Rational: the agent has rational behavior
      • A system is rational if it does the “right” thing, given what it knows
      • The rationalist approach is based on a combination of mathematics and engineering
      Source: “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig
  7. AI Dimensions – Turing Test for Intelligent Agent/System
      [Diagram: an Intelligent Agent/System surrounded by the six capabilities listed below]
      The Turing test, proposed by Alan Turing, was designed to provide a satisfactory operational definition of intelligence. A computer passes the test if a human interrogator, after posing some written questions, cannot tell whether the responses come from a person or from a computer. The computer would need to possess the following capabilities:
      • Natural Language Processing – to enable it to communicate successfully
      • Knowledge Representation – to store what it knows or hears
      • Automated Reasoning – to use the stored information to answer queries and to draw new conclusions
      • Machine Learning – to adapt to new circumstances and to detect and extrapolate patterns
      The so-called total Turing Test includes a video signal so that the interrogator can test the subject’s perceptual abilities, as well as the opportunity for the interrogator to pass physical objects “through the hatch”. To pass the total Turing Test, the computer will need:
      • Computer Vision – to perceive objects, and
      • Robotics – to manipulate objects and move about
      Source: “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig
  8. AI Dimensions – Broad Learning Categories (ML Tribes)
      [Diagram: Machine Learning Tribes (Symbolists, Bayesians, Connectionists, Evolutionaries, Analogizers) and the Master Algorithm]
      “In practice, each of these algorithms is good for some things but not for others. What we really want is a single algorithm combining the key features of all of them: the ultimate master algorithm.” – Pedro Domingos, The Master Algorithm
      Source: Machine Learning Evolution Infographic: http://usblogs.pwc.com/emerging-technology/machine-learning-evolution-infographic/
  9. AI Dimensions – Third Wave of AI – DARPA
      AI Next (a DARPA program) will accelerate “the Third Wave”, which enables machines to adapt to changing situations. For instance, adaptive reasoning will enable computer algorithms to discern the difference between the use of ‘principal’ and ‘principle’ based on the analysis of surrounding words to help determine context.
      Dr. Steven Walker, Director of DARPA, said: “Today, machines lack contextual reasoning capabilities and their training must cover every eventuality – which is not only costly – but ultimately impossible. We want to explore how machines can acquire human-like communication and reasoning capabilities, with the ability to recognize new situations and environments and adapt to them.”
      Source: DARPA perspective on Artificial Intelligence: https://www.darpa.mil/about-us/darpa-perspective-on-ai
  10. AI Dimensions – Third Wave of AI – DARPA
      [Diagram: Artificial Intelligence capabilities: Perceive, Learn, Abstract, Reason]
      Source: DARPA perspective on Artificial Intelligence: https://www.darpa.mil/about-us/darpa-perspective-on-ai
  11. AI Dimensions – Third Wave of AI – DARPA
      Source: A DARPA perspective on AI: https://sites.nationalacademies.org/cs/groups/pgasite/documents/webpage/pga_177035.pdf
  12. AI Dimensions – Third Wave of AI – DARPA
      Source: A DARPA perspective on AI: https://sites.nationalacademies.org/cs/groups/pgasite/documents/webpage/pga_177035.pdf
  13. AI Dimensions – Dictionary definition of Cognitive
      cognitive (adjective) | cog·ni·tive | ˈkäg-nə-tiv
      Definition of cognitive
      1: of, relating to, being, or involving conscious intellectual activity (such as thinking, reasoning, or remembering), as in “cognitive impairment”
      2: based on or capable of being reduced to empirical factual knowledge
  14. AI Dimensions – Cognitive AI
      This concept of cognitive computing and human-like reasoning has propelled Abdallat to take a giant leap beyond the limits of "conventional AI" to help solve complex problems and engage customers. "Machine learning or neural networks is what we call the numeric side of AI," he explained. "The human brain is really not good at doing large calculations; a calculator can do that faster. But what we're good at is that symbolic side. That's where we are superior to machines. Beyond Limits takes the outputs from these numeric tools (machine learning, deep learning, neural nets) and we make actionable intelligence when the presence of human beings is not possible or when you want to automate that layer."
      [Diagram: Numeric and Symbolic combine into COGNITIVE]
      Source: AI for the Enterprise transmitted directly from Mars: https://searchcustomerexperience.techtarget.com/feature/AI-for-the-enterprise-transmitted-directly-from-Mars
  15. AI Dimensions – Cognitive AI
      Trust in the machine
      In high-risk, high-value industries such as energy, healthcare, and finance, there is too much at stake to trust the decisions of a machine at face value, with no explainable understanding of its reasoning. Since machine learning is a component of many AI systems, it’s important for us to know precisely what the machine is learning. Machine learning is a great method for handling lots of data, and it can tell you the what. But for AI systems to become trusted advisors to human decision-makers, they need to be able to explain the why.
      In order to make an AI system explainable, we add symbolic reasoning on top of numerical calculation. Symbolic reasoning allows the system to think like a person and supply human-like reasoning to its recommendations. This hybrid approach combining conventional, numeric techniques with symbolic reasoning is called cognitive AI. To comprehend this more intuitive approach, we have to turn to space exploration.
      Thinking for itself
      If you really think about it, our cognitive technology is based on concepts. You can describe a concept at a strict algorithmic level, or you can choose to add more natural language components that give the system the ability to explain itself. Beyond Limits’ cognitive systems have to say: “I have been educated to understand this kind of problem; you're presenting me with a set of features, so I need to manipulate those features relative to my education (Context).”
      That process of manipulating the features in order to perform inductive, deductive, and abductive reasoning produces an explainable trail. If the natural language declarations are in place, then the system can (at a later time or in conjunction) produce natural language descriptions of what it's doing at any given moment.
      Explainability becomes a powerful tool when AI engineers work with subject matter experts to learn about their respective specialties. The engineers study the specialty from an algorithm/process/detective perspective, and then annotate it in a form that enables the machine to provide explainability at a human level of understanding. This was a requirement for space missions that Beyond Limits’ scientists solved years ago.
      Source: Beyond Conventional AI: More Intelligent, More Explainable AI: http://stories.venturebeat.com/beyond-conventional-ai-more-intelligent-more-explainable-ai/
  16. AI Dimensions – Knowledge Graphs – Third Era of Computing
      Source: Knowledge Graphs: The third era of computing: https://medium.com/@dmccreary/knowledge-graphs-the-third-era-of-computing-a8106f343450
  17. Perspectives – Data, Information, Knowledge, Wisdom (DIKW) Pyramid
      Data is a collection of facts in a raw or unorganized form, such as numbers or characters. Without context, however, data can mean little. Adding context and value to the data gives it more meaning.
      Information is the next building block of the DIKW Pyramid. This is data that has been “cleaned” of errors and further processed in a way that makes it easier to measure, visualize and analyze for a specific purpose. By asking relevant questions about ‘who’, ‘what’, ‘when’, ‘where’, etc., we can derive valuable information from the data and make it more useful for us.
      “How” is the information, derived from the collected data, relevant to our goals? “How” are the pieces of this information connected to other pieces to add more meaning and value? And, maybe most importantly, “how” can we apply the information to achieve our goal? When we don’t just view information as a description of collected facts, but also understand how to apply it to achieve our goals, we turn it into knowledge. This knowledge is often the edge that enterprises have over their competitors.
      Wisdom is the top of the DIKW hierarchy, and to get there we must answer questions such as ‘why do something’ and ‘what is best’. In other words, wisdom is knowledge applied in action.
      How Do Enterprises and Organizations Move Up the Knowledge Pyramid?
      One easy and fast way for enterprises to take the steps from data to information to knowledge and wisdom is to use Semantic Technologies such as Linked Data and Semantic Graph Databases. These technologies can create links between disparate and heterogeneous data and infer new knowledge from existing facts. Armed with this new knowledge, enterprises can climb the mountain of wisdom and gain a competitive advantage by supporting their business decisions with data-driven analytics.
      Source: What is the DIKW Pyramid: https://ontotext.com/knowledgehub/fundamentals/dikw-pyramid/
  18. Perspectives – Knowledge Representation & Reasoning (KR&R)
      “ISRO knows that Vikram has landed on the Moon”
      Knowledge Representation
      Knowledge is a relation between a Knower and a Proposition:
      • Knower: ISRO
      • Proposition: the idea expressed by a simple declarative sentence, like “Vikram has landed on the Moon”
      What can be said about propositions? For KR&R, what matters is that propositions are abstract entities that can be true or false, right or wrong. When we say “ISRO knows that p”, we can just as well say “ISRO knows that it is true that p”.
      Representation is a relation between two domains, where the first is meant to “stand for”, or take the place of, the second (a surrogate). This can also be thought of as a “Digital Twin”. Usually the first domain, the representor, is more concrete, immediate, or accessible in some way than the second.
      • For example, a road sign indicating a sharp turn ahead stands for a less immediately visible sharp turn on the road.
      One type of representor is the formal symbol, that is, a character or group of characters taken from a predetermined alphabet.
      • The digit “7”, for example, stands for the number 7, as does the group of letters “VII”.
      Knowledge Representation, then, is the field of study concerned with using formal symbols to represent a collection of propositions believed by some agent.
      Reasoning
      Reasoning, in general, is the formal manipulation of the symbols representing a collection of believed propositions to produce representations of new ones.
      • Symbols are more accessible than the propositions they represent: they must be concrete enough that we can manipulate them so as to construct representations of new propositions.
      • “Socrates is human” and “All humans are mortal”; thus “Socrates is mortal”. This form of reasoning is called logical inference, because the final sentence represents a logical conclusion of the propositions represented by the initial ones.
      Reasoning is a form of calculation, not unlike arithmetic, but over symbols standing for propositions rather than numbers.
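The Socrates example treats reasoning as calculation over symbols. The sketch below is a minimal forward-chaining loop, assuming a toy representation in which a fact is a (predicate, individual) pair and a rule reads “every individual with predicate P also has predicate Q”; it illustrates logical inference in general and is not a particular KR&R system or the author's method.

```python
# Minimal forward chaining: reasoning as calculation over symbols.
# Assumed toy representation (not a standard KR&R format):
#   fact = (predicate, individual), e.g. ("human", "Socrates")
#   rule = (premise, conclusion), read as "all premise things are conclusion things"

Fact = tuple[str, str]
Rule = tuple[str, str]


def forward_chain(facts: set[Fact], rules: list[Rule]) -> set[Fact]:
    """Apply every rule until no new fact can be derived (a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(derived):
                new_fact = (conclusion, individual)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


facts = {("human", "Socrates")}   # "Socrates is human"
rules = [("human", "mortal")]     # "All humans are mortal"
print(("mortal", "Socrates") in forward_chain(facts, rules))  # True: "Socrates is mortal"
```

Adding a second rule such as ("mortal", "finite") lets the loop chain conclusions and also derive ("finite", "Socrates"); the original symbols are never modified, only new ones are produced.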
  19. Perspectives – Symbolic Methods of Knowledge Representation
  20. Perspectives – Ontologies
  21. Perspectives – Ontologies to drive KR&R
      In their pivotal article, Davis, Shrobe, and Szolovits set out five principles for knowledge representation:
      1. Surrogate: a KR provides a symbolic counterpart of actual objects, events and relationships.
      2. Ontological commitments: a KR is a set of statements about the categories of things that may exist in the domain under consideration.
      3. Fragmentary theory of intelligent reasoning: a KR is a model of what the things can do or can be done with.
      4. Medium for efficient computation: making knowledge understandable by computers is a necessary step for any learning curve.
      5. Medium for human expression: one of the prerequisites of a KR is to improve communication between specific domain experts on the one hand and generic knowledge managers on the other.
      Whereas models are meant to fully and consistently meet these requirements, ontologies do not: ontological commitments (2) are not supposed to apply to surrogates (1), nor to be designed for efficient computation (4). Instead, ontologies are to focus on intelligibility and transparent reasoning (3, 5), which can be seen as the main objective of enterprise architecture.
      Source: What is Knowledge Representation: http://groups.csail.mit.edu/medg/ftp/psz/k-rep.html
  22. Ontology (Knowledge Representation)
      An ontology is a formal description of knowledge as a set of concepts within a domain and the relationships that hold between them. To enable such a description, we need to formally specify components such as individuals (instances of objects), classes, attributes and relations, as well as restrictions, rules and axioms. As a result, ontologies not only introduce a sharable and reusable knowledge representation but can also add new knowledge about the domain.
      There are, of course, other methods that use formal specifications for knowledge representation, such as vocabularies, taxonomies, thesauri, topic maps and logical models. However, unlike taxonomies or relational database schemas, for example, ontologies express relationships and enable users to link multiple concepts to other concepts in a variety of ways.
      Source: What are Ontologies? https://www.ontotext.com/knowledgehub/fundamentals/what-are-ontologies/
  23. Ontology (Knowledge Representation)
      The Benefits of Using Ontologies
      One of the main features of ontologies is that, by having the essential relationships between concepts built into them, they enable automated reasoning about data. Such reasoning is easy to implement in semantic graph databases that use ontologies as their semantic schemata.
      What’s more, ontologies function like a ‘brain’. They ‘work and reason’ with concepts and relationships in ways that are close to the way humans perceive interlinked concepts.
      In addition to the reasoning feature, ontologies provide more coherent and easier navigation as users move from one concept to another in the ontology structure.
      Another valuable feature is that ontologies are easy to extend, as relationships and concept matching are easy to add to existing ontologies. As a result, this model evolves with the growth of data without impacting dependent processes and systems if something goes wrong or needs to be changed.
      Ontologies also provide the means to represent data in any format, including unstructured, semi-structured or structured data, enabling smoother data integration, easier concept and text mining, and data-driven analytics.
      Source: What are Ontologies? https://www.ontotext.com/knowledgehub/fundamentals/what-are-ontologies/
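To make “relationships built into the ontology enable automated reasoning” concrete, here is a minimal sketch that uses subclass relations as a schema and derives class memberships that were never asserted. The predicate names "type" and "subClassOf" only loosely mirror rdf:type and rdfs:subClassOf, and the classes and the individual are hypothetical; a semantic graph database would rely on its own RDF/OWL reasoner rather than code like this.

```python
# Sketch: deriving implicit facts from an ontology used as a semantic schema.
# Triples are (subject, predicate, object). "type" and "subClassOf" loosely
# mirror rdf:type / rdfs:subClassOf; all class and instance names are hypothetical.

Triple = tuple[str, str, str]


def infer_types(triples: set[Triple]) -> set[Triple]:
    """If x has type C and C is a subclass of D, then x also has type D."""
    inferred = set(triples)
    while True:
        new = {
            (x, "type", parent)
            for (x, p1, cls) in inferred if p1 == "type"
            for (child, p2, parent) in inferred if p2 == "subClassOf" and child == cls
        } - inferred
        if not new:  # fixpoint reached: nothing left to derive
            return inferred
        inferred |= new


ontology = {("Employee", "subClassOf", "Person"), ("Person", "subClassOf", "Agent")}
data = {("alice", "type", "Employee")}  # only this membership is asserted

kb = infer_types(ontology | data)
print(sorted(t for t in kb if t[0] == "alice"))
# [('alice', 'type', 'Agent'), ('alice', 'type', 'Employee'), ('alice', 'type', 'Person')]
```

Because the subclass chain lives in the schema rather than in each record, extending the ontology (say, making Agent a subclass of something broader) changes what is inferred without touching the data.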
  24. Ontology (Knowledge Representation)
      What is the Difference Between a Taxonomy and an Ontology?
      A taxonomy is a set of definitions organized in a hierarchy that starts at the most general description of something and becomes more defined and specific as you go down the hierarchy of terms. For example, a red-tailed hawk could be represented in a common-language taxonomy as follows:
      Bird > Raptors > Hawks > Red-Tailed Hawk
      An ontology describes a concept both by its position in a hierarchy of common factors, like the above description of the red-tailed hawk, and by its relationships to other concepts. For example, the red-tailed hawk would also be associated with the concept of predators or animals that live in trees. The richness of the relationships described in an ontology is what makes it a powerful tool for modeling complex business ecosystems.
      Source: Semantic Ontology – The Basics: https://www.semanticarts.com/semantic-ontology-the-basics/
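The red-tailed hawk example can be sketched as two data structures: a taxonomy as a child-to-parent map, and an ontology that keeps that hierarchy but adds other named relationships. The relation names "isA" and "livesIn" and the concepts "Predator" and "Trees" are illustrative assumptions standing in for “predators” and “animals that live in trees”.

```python
# Sketch: a taxonomy is a single "is-a" hierarchy; an ontology keeps that
# hierarchy and adds other named relationships between concepts.
# The relation names "isA" and "livesIn" are illustrative assumptions.

taxonomy = {  # child concept -> parent concept
    "Raptors": "Bird",
    "Hawks": "Raptors",
    "Red-Tailed Hawk": "Hawks",
}

ontology_relations = [  # (subject, relation, object) beyond the hierarchy
    ("Red-Tailed Hawk", "isA", "Predator"),
    ("Red-Tailed Hawk", "livesIn", "Trees"),
]


def lineage(concept: str) -> list[str]:
    """Walk up the taxonomy from a concept to its most general ancestor."""
    path = [concept]
    while path[-1] in taxonomy:
        path.append(taxonomy[path[-1]])
    return path


def relationships(concept: str) -> dict[str, list[str]]:
    """Collect all relationships of a concept: its parent plus ontology links."""
    result: dict[str, list[str]] = {}
    if concept in taxonomy:
        result["subClassOf"] = [taxonomy[concept]]
    for subject, relation, obj in ontology_relations:
        if subject == concept:
            result.setdefault(relation, []).append(obj)
    return result


print(" > ".join(reversed(lineage("Red-Tailed Hawk"))))  # Bird > Raptors > Hawks > Red-Tailed Hawk
print(relationships("Red-Tailed Hawk"))
```

The taxonomy alone can only answer “what is this a kind of?”; the extra relationships let the same concept be reached from unrelated directions, such as a query for all tree-dwelling animals.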
  25. Ontology (Knowledge Representation)
      • The history of artificial intelligence shows that knowledge is critical for intelligent systems. In many cases, better knowledge can be more important for solving a task than better algorithms. To have truly intelligent systems, knowledge needs to be captured, processed, reused, and communicated. Ontologies support all these tasks.
      • The term "ontology" can be defined as an explicit specification of a conceptualization.
      • Ontologies capture the structure of the domain, i.e. the conceptualization. This includes the model of the domain with possible restrictions. The conceptualization describes knowledge about the domain, not about a particular state of affairs in the domain. In other words, the conceptualization does not change, or changes very rarely. An ontology is then a specification of this conceptualization: the conceptualization is specified using a particular modeling language and particular terms. A formal specification is required in order to process ontologies and operate on them automatically.
      • An ontology describes a domain, while a knowledge base (based on an ontology) describes a particular state of affairs.
      • Each knowledge-based system or agent has its own knowledge base, and only what can be expressed using an ontology can be stored and used in the knowledge base. When an agent wants to communicate with another agent, it uses the constructs from some ontology. To understand each other, agents must share ontologies.
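The split between an ontology (the domain's vocabulary) and a knowledge base (a particular state of affairs) can be sketched as a knowledge base that refuses anything its ontology cannot express. The classes, the relation "operatedBy" and its domain/range, and the Vikram/ISRO individuals (borrowed from the KR&R slide) are hypothetical choices for illustration; real ontology languages such as OWL have far richer semantics.

```python
# Sketch: the ontology fixes the vocabulary of a domain (classes, and relations
# with a domain and range); the knowledge base records a particular state of
# affairs and rejects anything the ontology cannot express.
# All class and relation names are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Ontology:
    classes: set[str]
    relations: dict[str, tuple[str, str]]  # relation -> (domain class, range class)


@dataclass
class KnowledgeBase:
    ontology: Ontology
    types: dict[str, str] = field(default_factory=dict)  # individual -> class
    facts: set[tuple[str, str, str]] = field(default_factory=set)

    def add_individual(self, name: str, cls: str) -> None:
        if cls not in self.ontology.classes:
            raise ValueError(f"class {cls!r} is not defined in the ontology")
        self.types[name] = cls

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        if relation not in self.ontology.relations:
            raise ValueError(f"relation {relation!r} is not defined in the ontology")
        domain, range_ = self.ontology.relations[relation]
        if self.types.get(subject) != domain or self.types.get(obj) != range_:
            raise ValueError(f"({subject}, {relation}, {obj}) violates the domain/range")
        self.facts.add((subject, relation, obj))


onto = Ontology(classes={"Lander", "Agency"},
                relations={"operatedBy": ("Lander", "Agency")})
kb = KnowledgeBase(onto)
kb.add_individual("Vikram", "Lander")
kb.add_individual("ISRO", "Agency")
kb.add_fact("Vikram", "operatedBy", "ISRO")      # expressible, so it is stored
try:
    kb.add_fact("ISRO", "operatedBy", "Vikram")  # wrong domain/range, so it is rejected
except ValueError as err:
    print(err)
```

Types are matched exactly here for brevity; a fuller version would also accept instances of subclasses of the declared domain and range. Two agents built on the same `Ontology` object would agree on what each stored fact means.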
  26. Ontology (Knowledge Representation)
      Ontologies capture knowledge in context. The context is expressed as taxonomical trees in which higher-order entities (categories and domains) generalize the properties by which things can be grouped. All the entities, from the highest to the lowest, are concepts (and often instances) and have properties of their own. Lower-order entities (concepts and instances) inherit general properties from above and possess unique properties of their own.
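Property inheritance down a taxonomic tree can be sketched by walking from an entity up to the root and merging properties from the most general level down, so that more specific values win. The Animal/Bird/Penguin hierarchy, the instance "tux" and all property names are hypothetical examples, chosen because a penguin overrides an inherited property.

```python
# Sketch of property inheritance in a taxonomic tree: lower-order entities
# inherit general properties from above and may add or override their own.
# The entities and properties are hypothetical examples.

parent = {"Animal": None, "Bird": "Animal", "Penguin": "Bird", "tux": "Penguin"}

own_properties = {
    "Animal": {"alive": True},
    "Bird": {"has_feathers": True, "can_fly": True},
    "Penguin": {"can_fly": False},      # overrides the inherited value
    "tux": {"home": "Antarctica"},      # an instance with a unique property
}


def effective_properties(entity: str) -> dict:
    """Merge properties from the root down, so more specific values win."""
    chain = []
    node = entity
    while node is not None:
        chain.append(node)
        node = parent[node]
    merged: dict = {}
    for node in reversed(chain):  # most general first
        merged.update(own_properties.get(node, {}))
    return merged


print(effective_properties("tux"))
# {'alive': True, 'has_feathers': True, 'can_fly': False, 'home': 'Antarctica'}
```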
  27. Ontology (Knowledge Representation)
  28. Knowledge Graph – Definition – Key Concepts
      A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
      It’s a graph
      A knowledge graph is organized as a graph, which is not always true of knowledge bases. The primary benefits of a graph are that connections (relationships) in the data are first-class citizens, that you can easily connect new data items as they are injected into the data pool, and that you can easily traverse links to discover how remote parts of a domain relate to each other (there is huge value in linking information). A graph is one of the most flexible formal data structures, so you can easily map other data formats to graphs using generic tools and pipelines.
      It’s semantic
      The meaning of the data is encoded alongside the data in the graph, in the form of the ontology. A knowledge graph is self-descriptive, or, simply put, it provides a single place to find the data and understand what it’s all about. An additional benefit is that you can submit queries in a style that is much closer to natural language, using a familiar domain vocabulary. That is, the meaning of the data is typically expressed in terms of entity and relation names that are familiar to those interested in the given domain. This enables smarter search and more efficient discovery, and narrows the communication gap between data providers and consumers.
      Source: What is a Knowledge Graph? https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f
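The “it’s a graph” properties (relationships as first-class edges, new data connected by just adding edges, traversal to find how remote parts relate) can be sketched with an adjacency list and breadth-first search. The small IT-architecture graph (applications, a server, a data center) is hypothetical, echoing the CMDB use case from the introduction; the "^" prefix marking inverse edges is also an assumption of this sketch.

```python
# Sketch: relationships as first-class edges, new data connected by simply
# adding edges, and breadth-first traversal to find how two remote entities
# relate. The entity and relation names (a small IT-architecture graph) are hypothetical.

from collections import defaultdict, deque


class KnowledgeGraph:
    def __init__(self) -> None:
        self.edges = defaultdict(list)  # node -> [(relation, neighbour)]

    def add(self, subject: str, relation: str, obj: str) -> None:
        """Connect new data by adding an edge; no schema migration is needed."""
        self.edges[subject].append((relation, obj))
        self.edges[obj].append(("^" + relation, subject))  # "^" marks the inverse direction

    def connection(self, start: str, goal: str):
        """Return the shortest chain of (subject, relation, object) hops, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for relation, neighbour in self.edges[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, path + [(node, relation, neighbour)]))
        return None


kg = KnowledgeGraph()
kg.add("billing-app", "runsOn", "server-42")
kg.add("server-42", "locatedIn", "pune-datacenter")
kg.add("crm-app", "runsOn", "server-42")
print(kg.connection("billing-app", "crm-app"))
# [('billing-app', 'runsOn', 'server-42'), ('server-42', '^runsOn', 'crm-app')]
```

Breadth-first search returns a path with the fewest hops, which is usually the most readable explanation of how two entities are linked (here, two applications sharing a server).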
  29. Knowledge Graph – Definition – Key Concepts
      It’s smart
      The underlying basis of a knowledge graph is the ontology, which specifies the semantics of the data. An ontology is typically based on logical formalisms that support some form of inference, allowing implicit information to be derived from explicitly asserted data. Some of the inferred information can otherwise be hard to discover. Because knowledge graphs are actual graphs, in the proper mathematical sense, they allow the application of various graph-computing techniques and algorithms (for example, shortest-path computations or network analysis), which add further intelligence over the stored data. They are easy to extend over time, as the schema is not as strict or prohibitive as it is in SQL databases.
      It’s alive
      Knowledge graphs have a flexible structure: the ontology can be extended and revised as new data arrives. This makes it convenient to store and manage data in a knowledge graph when regular updates and data growth are important, particularly when data arrives from diverse, heterogeneous sources. A knowledge graph can support a continuously running data pipeline that keeps adding new knowledge to the graph, refining it as new information arrives. Knowledge graphs can also capture diverse metadata annotations, such as provenance or versioning information, which makes them ideal for working with a dynamic dataset. There is an increasing need to account for the provenance of data, so that consumers can assess the knowledge in terms of credibility and trustworthiness. A knowledge graph can answer what it knows, and also how and why it knows it.
      Source: What is a Knowledge Graph? https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f
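“A knowledge graph can answer what it knows, and also how and why it knows it” can be sketched by attaching provenance and version metadata to every statement. The `Statement` fields, the company "acme" and the source names are hypothetical; production systems typically model this with named graphs or statement-level annotations rather than a Python list.

```python
# Sketch: storing provenance and versioning metadata with every statement so a
# graph can answer not only what it knows but also how and when it learned it.
# The fields, sources and facts are hypothetical.

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str        # provenance: where the fact came from
    recorded_on: date  # versioning: when it was added


class ProvenanceGraph:
    def __init__(self) -> None:
        self.statements: list[Statement] = []

    def add(self, subject: str, predicate: str, obj: str, source: str, recorded_on: date) -> None:
        self.statements.append(Statement(subject, predicate, obj, source, recorded_on))

    def why(self, subject: str, predicate: str, obj: str) -> list[Statement]:
        """All statements supporting a fact, newest first (empty if unknown)."""
        support = [s for s in self.statements
                   if (s.subject, s.predicate, s.obj) == (subject, predicate, obj)]
        return sorted(support, key=lambda s: s.recorded_on, reverse=True)


g = ProvenanceGraph()
g.add("acme", "headquarteredIn", "Pune", "crm-export", date(2019, 5, 1))
g.add("acme", "headquarteredIn", "Pune", "annual-report", date(2019, 8, 1))
for s in g.why("acme", "headquarteredIn", "Pune"):
    print(s.source, s.recorded_on)
```

An empty result means the graph does not know the fact; a non-empty one lets a consumer judge credibility by looking at which sources assert it and how recently.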
  30. Who uses Knowledge Graphs?
  31. Google Knowledge Graph
  32. Google Knowledge Graph
  33. Google Knowledge Graph
      What Google says about its knowledge graph:
      • Take a query like [taj mahal]. For more than four decades, search has essentially been about matching keywords to queries. To a search engine, the words [taj mahal] have been just that: two words.
      • But we all know that [taj mahal] has a much richer meaning. You might think of one of the world’s most beautiful monuments, or a Grammy Award-winning musician, or possibly even a casino in Atlantic City, NJ. Or, depending on when you last ate, the nearest Indian restaurant. It’s why we’ve been working on an intelligent model (in geek-speak, a “graph”) that understands real-world entities and their relationships to one another: things, not strings.
      • The Knowledge Graph enables you to search for things, people or places that Google knows about (landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more) and instantly get information that’s relevant to your query. This is a critical first step towards building the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do.
      • Google’s Knowledge Graph isn’t just rooted in public sources such as Freebase, Wikipedia and the CIA World Factbook. It’s also augmented at a much larger scale, because we’re focused on comprehensive breadth and depth. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web.
      Source: Introducing the Knowledge Graph: https://www.blog.google/products/search/introducing-knowledge-graph-things-not/
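“Things, not strings” can be illustrated by letting one label resolve to several typed entities and ranking them by how well each entity's related terms match the words around the query. The entity IDs, related terms and the simple overlap heuristic are all hypothetical; this is not how Google's Knowledge Graph actually ranks results.

```python
# Sketch of "things, not strings": one label resolves to several typed entities,
# ranked by how well each entity's related terms match the query context.
# Entity IDs, related terms and the overlap heuristic are hypothetical; this is
# not how Google's Knowledge Graph actually ranks results.

entities = {
    "e1": {"label": "Taj Mahal", "type": "Monument", "related": {"agra", "mausoleum"}},
    "e2": {"label": "Taj Mahal", "type": "Musician", "related": {"blues", "grammy"}},
    "e3": {"label": "Taj Mahal", "type": "Casino", "related": {"atlantic city", "casino"}},
    "e4": {"label": "Taj Mahal", "type": "Restaurant", "related": {"indian", "restaurant"}},
}


def resolve(query: str, context: set[str]) -> list[str]:
    """Return IDs of entities whose label matches the query, best context match first."""
    ctx = {word.lower() for word in context}
    matches = [eid for eid, e in entities.items() if e["label"].lower() == query.lower()]
    # sorted() is stable, so entities with equal scores keep their original order
    return sorted(matches, key=lambda eid: len(entities[eid]["related"] & ctx), reverse=True)


best = resolve("taj mahal", {"Grammy", "blues"})[0]
print(best, entities[best]["type"])  # e2 Musician
```

The string is identical in every case; what differs is the entity it resolves to, and each entity can then be followed to its own relationships.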
  34. Diffbot Knowledge Graph
      Source: Diffbot Knowledge Graph: https://www.diffbot.com/knowledge-graph/
  35. Diffbot Knowledge Graph
      That is also what Diffbot is unveiling today: the ability to query the web as a database. This impressive feat is also based on a knowledge graph. The difference is that, in Diffbot's case, the knowledge graph is only partially curated by humans and is automatically populated by crawling the web.
      First off, you have to crawl the web. This is where Gigablast and Matt Wells come in. Gigablast is a search engine created in 2000 by Matt Wells, Diffbot's VP of Search. Tung says this is what Diffbot uses to crawl, and store, every single document on the web.
      Hard as this may be, however, it's not even half the job. The really hard part is getting the information out of documents, and this is where the magic is. Tung explains this is done using computer vision, machine learning (ML), and natural language processing (NLP).
      Computer vision helps Diffbot understand the structure of documents. It mimics the way humans break down documents, figuring out the structural elements of each document, such as headers and blocks. In a perfect world, this should be possible by inspecting the HTML structure of web documents. But not everything on the web is HTML, and HTML documents are not perfect either.
      After structure comes content. Content is parsed using a combination of NLP and ML, resulting in structured knowledge that is added to Diffbot's knowledge graph (DKG).
      Source: The web as a database: The biggest knowledge graph ever: https://www.zdnet.com/article/the-web-as-a-database-the-biggest-knowledge-graph-ever/
  36. Microsoft Graph
      Microsoft Graph exposes REST APIs and client libraries to access data on the following Microsoft 365 services:
      • Office 365 services: Delve, Excel, Microsoft Bookings, Microsoft Teams, OneDrive, OneNote, Outlook/Exchange, Planner, and SharePoint
      • Enterprise Mobility and Security services: Advanced Threat Analytics, Advanced Threat Protection, Azure Active Directory, Identity Manager, and Intune
      • Windows 10 services: activities, devices, notifications
      • Dynamics 365 Business Central
      Source: Microsoft Graph Overview: https://docs.microsoft.com/en-us/graph/overview
  37. Microsoft Graph
      Source: Microsoft Graph Overview: https://docs.microsoft.com/en-us/graph/overview
  38. Microsoft Graph
      Use Microsoft Graph to build experiences around the user's unique context to help them be more productive. Imagine an app that...
      • Looks at your next meeting and helps you prepare for it by providing profile information for attendees, including their job titles and managers, as well as information about the latest documents they're working on, and people they're collaborating with.
      • Scans your calendar, and suggests the best times for the next team meeting.
      • Fetches the latest sales projection chart from an Excel file in your OneDrive and lets you update the forecast in real time, all from your phone.
      • Subscribes to changes in your calendar, sends you an alert when you’re spending too much time in meetings, and provides recommendations for the ones you can miss or delegate based on how relevant the attendees are to you.
      • Helps you sort out personal and work information on your phone; for example, by categorizing pictures that should go to your personal OneDrive and business receipts that should go to your OneDrive for Business.
      • Analyzes at-scale Office 365 data so that decision makers can unlock valuable insights into time allocation and collaboration patterns that improve business productivity.
      • Brings custom business data into Microsoft Graph, indexing it to make it searchable along with data from Microsoft 365 services.
      Pick the first scenario about researching meeting attendees as an example. With the Microsoft Graph API, you can:
      • Get the email addresses of the meeting event attendees.
      • Look them up individually as a user in Azure Active Directory to get their profile information.
      You can then navigate to other resources using relationships:
      • Connect to their manager through a manager relationship.
      • Get valuable insights and intelligence, including the popular files trending around the user.
      • Get the most relevant people around the user.
      • Extend the scenario to get to the user's groups through a memberOf relationship.
      • Reach other members in each group.
      • Tap into other scenarios enabled by groups, such as education and teamwork.
      Source: Microsoft Graph Overview: https://docs.microsoft.com/en-us/graph/overview
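The attendee-research scenario is essentially relationship navigation: from each attendee to their profile, then along manager and memberOf edges, then to the other members of each group. The sketch below is a local, in-memory stand-in that only borrows the relationship names from the text; it does not call the Microsoft Graph REST API or SDK, and the directory contents (users, groups, job titles) are hypothetical.

```python
# Local, in-memory sketch of the attendee-research scenario as relationship
# navigation. It only borrows the relationship names "manager", "memberOf" and
# "members" from the text; it does NOT call the Microsoft Graph REST API or SDK,
# and all users and groups are hypothetical.

directory = {
    "users": {
        "alice@contoso.com": {"jobTitle": "Engineer", "manager": "bob@contoso.com", "memberOf": ["architecture"]},
        "bob@contoso.com": {"jobTitle": "Director", "manager": None, "memberOf": ["architecture", "leads"]},
        "carol@contoso.com": {"jobTitle": "Architect", "manager": "bob@contoso.com", "memberOf": ["architecture"]},
    },
    "groups": {
        "architecture": {"members": ["alice@contoso.com", "bob@contoso.com", "carol@contoso.com"]},
        "leads": {"members": ["bob@contoso.com"]},
    },
}


def attendee_briefing(attendees: list[str]) -> dict[str, dict]:
    """For each attendee: profile, manager, and fellow members of their groups."""
    briefing = {}
    for email in attendees:
        user = directory["users"][email]  # look up the user's profile
        colleagues = {
            member
            for group in user["memberOf"]  # follow memberOf to the user's groups
            for member in directory["groups"][group]["members"]  # reach other members
        } - {email}
        briefing[email] = {
            "jobTitle": user["jobTitle"],
            "manager": user["manager"],  # follow the manager relationship
            "colleagues": sorted(colleagues),
        }
    return briefing


print(attendee_briefing(["alice@contoso.com"]))
```

In a real app each dictionary lookup would become a call against the corresponding Microsoft Graph resource; the point here is only that the briefing is assembled by following named relationships rather than by joining tables.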
  39. Facebook – Social Graph – TAO
      This simple example shows a subgraph of objects and associations that is created in TAO after Alice checks in at the Golden Gate Bridge and tags Bob there, while Cathy comments on the check-in and David likes it. Every data item, such as a user, check-in, or comment, is represented by a typed object containing a dictionary of named fields. Relationships between objects, such as “liked by” or “friend of”, are represented by typed edges (associations) grouped in association lists by their origin. Multiple associations may connect the same pair of objects as long as the types of all those associations are distinct. Together, objects and associations form a labeled directed multigraph.
      Source: TAO: The power of the graph: https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/
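The objects-and-associations model on this slide can be sketched as an in-memory store: typed objects with named fields, and typed directed edges kept in per-origin association lists, where the same pair may be linked more than once only by associations of different types. The class, its method names and the association types ("authored", "at", "tagged", "comment", "likes", "liked_by") are hypothetical; this is not TAO's actual API or storage design.

```python
# Minimal in-memory sketch of the objects-and-associations model: typed objects
# with named fields, and typed directed edges grouped into association lists by
# origin. Method names and association types are hypothetical; this is not
# TAO's actual API.

from collections import defaultdict
from itertools import count


class ObjectAssociationStore:
    def __init__(self) -> None:
        self._next_id = count(1)
        self.objects: dict[int, tuple[str, dict]] = {}  # id -> (type, fields)
        self.assoc_lists = defaultdict(list)             # (id1, atype) -> [id2, ...]
        self._edges: set[tuple[int, str, int]] = set()

    def add_object(self, otype: str, **fields) -> int:
        oid = next(self._next_id)
        self.objects[oid] = (otype, fields)
        return oid

    def add_assoc(self, id1: int, atype: str, id2: int) -> bool:
        """Add a typed edge; a pair may share several edges only if their types differ."""
        if (id1, atype, id2) in self._edges:
            return False
        self._edges.add((id1, atype, id2))
        self.assoc_lists[(id1, atype)].append(id2)
        return True

    def assoc_list(self, id1: int, atype: str) -> list[int]:
        return list(self.assoc_lists[(id1, atype)])


store = ObjectAssociationStore()
alice, bob, cathy, david = (store.add_object("user", name=n) for n in ("Alice", "Bob", "Cathy", "David"))
bridge = store.add_object("location", name="Golden Gate Bridge")
checkin = store.add_object("checkin")
comment = store.add_object("comment", text="Nice view!")

store.add_assoc(alice, "authored", checkin)  # Alice checks in
store.add_assoc(checkin, "at", bridge)       # at the Golden Gate Bridge
store.add_assoc(checkin, "tagged", bob)      # and tags Bob
store.add_assoc(cathy, "authored", comment)  # Cathy comments on the check-in
store.add_assoc(checkin, "comment", comment)
store.add_assoc(david, "likes", checkin)     # David likes it
store.add_assoc(checkin, "liked_by", david)

print([store.objects[i][1]["name"] for i in store.assoc_list(checkin, "liked_by")])  # ['David']
```

Storing both "likes" and "liked_by" mirrors the idea of keeping edges grouped by origin, so either side can list its associations with a single lookup.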
  40. 40. Facebook puts an extremely demanding workload on its data backend: every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented with hundreds of pieces of information from the social graph. Users see News Feed stories; comments, likes, and shares for those stories; photos and check-ins from their friends; the list goes on. The high degree of output customization, combined with the high update rate of a typical user’s News Feed, makes it impossible to generate the views presented to users ahead of time. Thus, the data set must be retrieved and rendered on the fly in a few hundred milliseconds. This challenge is made more difficult by the fact that the data set is not easily partitionable, and by the tendency of some items, such as photos of celebrities, to have request rates that can spike significantly. Multiply this by the millions of times per second this kind of highly customized data set must be delivered to users, and you have a constantly changing, read-dominated workload that is incredibly challenging to serve efficiently.
Facebook – Social Graph – TAO
• Facebook users record their relationships, share their interests, upload text, images, and video, and curate semantic information about their data. This data is the Facebook “Social Graph”, and the personalized user experience requires timely, efficient and scalable access to this flood of data.
• TAO (The Objects & Associations) is an implementation of a simple data model and API for serving the Social Graph. TAO provides basic access to the nodes and edges of a constantly changing graph in data centers across multiple regions. It is optimized heavily for reads, and explicitly favors efficiency and availability over consistency.
• TAO is a geographically distributed data store.
• Facebook focuses on people, actions, and relationships. These entities and connections are modeled as nodes and edges in a graph. This representation is very flexible; it directly models real-life objects, and can also be used to store an application’s internal implementation-specific data. TAO’s goal is not to support a complete set of graph queries, but to provide sufficient expressiveness to handle most application needs while allowing a scalable and efficient implementation.
A single Facebook page may aggregate and filter hundreds of items from the social graph. We present each user with content tailored to them, and we filter every item with privacy checks that take into account the current viewer. This extreme customization makes it infeasible to perform most aggregation and filtering when content is created; instead, we resolve data dependencies and check privacy each time the content is viewed. As much as possible, we pull the social graph rather than pushing it. This implementation strategy places extreme read demands on the graph data store; it must be efficient, highly available, and scale to high query rates.
Source: TAO: The power of the graph: https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/
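The pull strategy described above, resolving data dependencies and checking privacy each time content is viewed, can be illustrated with a toy feed builder. Everything here is hypothetical: the graph is a Python dict, and the three-level audience rule (public, friends, only me) is a stand-in for Facebook's far richer privacy checks.

```python
# Hypothetical graph: (node, edge_type) -> neighbour ids
EDGES = {
    ("viewer", "FRIEND"): ["alice", "cathy"],
    ("alice", "POSTED"): ["checkin1"],
    ("cathy", "POSTED"): ["photo7"],
    ("eve", "POSTED"): ["status3"],
}
AUTHOR = {"checkin1": "alice", "photo7": "cathy", "status3": "eve"}
AUDIENCE = {"checkin1": "friends", "photo7": "only_me", "status3": "public"}


def neighbours(node, edge_type):
    return EDGES.get((node, edge_type), [])


def can_see(viewer, item):
    """Privacy check evaluated at view time, for this particular viewer."""
    author, audience = AUTHOR[item], AUDIENCE[item]
    if audience == "public" or author == viewer:
        return True
    if audience == "friends":
        return author in neighbours(viewer, "FRIEND")
    return False  # "only_me" and anything unrecognised


def render_feed(viewer):
    """Pull model: walk the graph and filter on every request; nothing is precomputed."""
    feed = []
    for friend in neighbours(viewer, "FRIEND"):
        for item in neighbours(friend, "POSTED"):
            if can_see(viewer, item):
                feed.append(item)
    return feed


print(render_feed("viewer"))  # ['checkin1']
```

Because nothing is precomputed, a privacy change (say, Cathy widening photo7 to friends) takes effect on the very next render, at the price of repeating the graph reads on every view, which is the read-heavy load the text describes.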
  41. 41. Airbnb – Knowledge Graph
• Imagine you’re finally getting to take that vacation you’ve dreamed of: three countries, seven cities, thousands of miles. It’s everything you could want and more, right? But where do you start? How do you know what to eat, where to visit, how to experience what makes your destination truly unique? All the information you’re looking for is on Airbnb… somewhere… The question is, “How do we surface the relevant parts of that information to you at the exact time you’re looking for it?”
• Discovering what you want and need to know about a destination is crucial to the overall trip experience, especially when traveling to a place you’ve never been to before.
• In order to surface relevant context to people, we need some way of representing relationships between distinct but related entities on Airbnb (think cities, activities, cuisines, etc.) so that we can easily access important and relevant information about them. This type of information will become increasingly important as we move towards becoming an end-to-end travel platform as opposed to just a place for staying in homes.
• The knowledge graph is our solution to this need, giving us the technical scalability we need to power all of Airbnb’s verticals and the flexibility to define abstract relationships.
Source: Scaling Knowledge Access and Retrieval at Airbnb: https://medium.com/airbnb-engineering/scaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95
  42. 42. Why is a graph structure scalable?
• Normally, engineers work with relational structures, where a schema defines what each row of data contains. This is the preferred way of holding data for transactional processes because it makes accessing rows of data very quick. However, there is an operational burden when you have many tables for distinct objects that may contain the same relational information in individual columns (e.g., the city that homes or experiences are located in, or the type of activity that an experience involves and that a destination is known for). This is where the graph structure comes into play.
• Although our underlying data store is still relational, structuring our queries in terms of this graph gives us power in maintaining data semantics. We want the Surfing that an experience is associated with to be the same Surfing that Hawaii is known for. This type of structure around the relationships between the entities on Airbnb’s platform gives us the scalability and flexibility needed to expand categorization to any number of things. By having a single, shared object represent each thing in our world, we remove the operational overhead of redefining the world whenever we introduce a new product to our platform.
• In this way, we can support our objectives to 1) encode an exponentially growing number of relationships between entities and 2) enable easy traversal along those connections.
Airbnb – Knowledge Graph Source: Scaling Knowledge Access and Retrieval at Airbnb: https://medium.com/airbnb-engineering/scaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95
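The point about a single shared “Surfing” concept can be shown with a toy set of typed edges and a reverse index. The entity IDs and edge types are hypothetical, and, as the text says, Airbnb's real store is relational underneath; this only sketches the query-side view of the graph.

```python
from collections import defaultdict

# Hypothetical typed edges (source, edge_type, target). There is exactly one
# "activity:surfing" node, shared by experiences and destinations alike.
EDGES = [
    ("experience:surf-lesson", "TAGGED_BY", "activity:surfing"),
    ("destination:hawaii", "KNOWN_FOR", "activity:surfing"),
    ("home:beach-cottage", "LOCATED_IN", "destination:hawaii"),
    ("experience:surf-lesson", "LOCATED_IN", "destination:hawaii"),
]


def build_reverse_index(edges):
    """target -> list of (source, edge_type) pairs that point at it."""
    index = defaultdict(list)
    for src, etype, dst in edges:
        index[dst].append((src, etype))
    return index


INDEX = build_reverse_index(EDGES)


def related_to(node, edge_types=None):
    """Sources linked to `node`, optionally restricted to some edge types."""
    return [src for src, etype in INDEX.get(node, [])
            if edge_types is None or etype in edge_types]


print(related_to("activity:surfing"))
# ['experience:surf-lesson', 'destination:hawaii']
print(related_to("destination:hawaii", {"LOCATED_IN"}))
# ['home:beach-cottage', 'experience:surf-lesson']
```

Because the experience and the destination point at the same node, one lookup on activity:surfing finds both; if each table instead stored its own “Surfing” string in a column, every table would have to be queried and kept consistent separately.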
  43. 43. Airbnb – Knowledge Graph
• The taxonomy of our knowledge graph refers to the vocabulary that we use to describe our inventory and the world around us. The taxonomy is hierarchical (as shown above), so that we can map high-level concepts like “Sport” down to a very specific activity such as “Surfing.” The main constraint that we want to maintain is that the knowledge graph is Mutually Exclusive and Collectively Exhaustive (MECE), which keeps the taxonomy streamlined and free of duplicate data. Because of our graph structure, it’s very easy to scale this taxonomy to tens or hundreds of layers deep and still surface the relevant inventory for high-level concepts.
• In the graph, we have nodes and edges. Nodes refer to any type of entity on the Airbnb platform (restaurants, neighborhoods, experiences, events, etc.). Edges refer to the types of relationships that exist between any of the entities in the graph. Under this model, there are different types of nodes for different types of entities and different types of edges for different types of relationships (located in, tagged by, etc.). From there, we have a flexible API to query for neighbors connected by certain types of relationships, and we can index our inventory items by the unique identifiers of their corresponding representations in the knowledge graph.
• Once a critical mass of data is reached, we can start thinking about making automatic inferences based on the data already in the graph. For example, if something is tagged with “Nature” and “Walking,” maybe we can infer that it should be tagged with “Hiking” as well.
Source: Scaling Knowledge Access and Retrieval at Airbnb: https://medium.com/airbnb-engineering/scaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95
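Two ideas from this slide, rolling a high-level concept down to specific activities and inferring new tags, can be sketched as follows. The taxonomy, item IDs and concept names other than those quoted in the text are hypothetical, and the single rule is the “Nature” + “Walking” ⇒ “Hiking” example from the text. Giving each concept exactly one parent is one simple way (not necessarily Airbnb's) to keep the hierarchy mutually exclusive.

```python
# Hypothetical taxonomy as child -> parent (one parent per concept).
PARENT = {
    "Surfing": "Water Sports",
    "Water Sports": "Sport",
    "Hiking": "Outdoor Activities",
    "Walking": "Outdoor Activities",
}

# Hypothetical inventory items indexed by the taxonomy nodes they are tagged with
TAGS = {
    "exp-101": {"Surfing"},
    "exp-202": {"Nature", "Walking"},
}

# The example rule from the text: Nature + Walking => Hiking
RULES = [({"Nature", "Walking"}, "Hiking")]


def ancestors(concept):
    """All broader concepts above `concept`, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def items_under(concept):
    """Items tagged with `concept` or with any narrower concept beneath it."""
    return sorted(item for item, tags in TAGS.items()
                  if any(t == concept or concept in ancestors(t) for t in tags))


def infer_tags(tags, rules=RULES):
    """Apply rules repeatedly until no new tag is added (forward chaining)."""
    inferred = set(tags)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= inferred and conclusion not in inferred:
                inferred.add(conclusion)
                changed = True
    return inferred


print(items_under("Sport"))                  # ['exp-101']
print(sorted(infer_tags(TAGS["exp-202"])))   # ['Hiking', 'Nature', 'Walking']
```

Forward chaining to a fixpoint means that, once more rules are added, one inferred tag can in turn satisfy the premises of another rule.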
  44. 44.
• The shopping interests of customers keep changing with their lifestyle. A customer might want to shop for a book today and for a laptop 3 months later. With an increase in such behavioural changes, there was a need to relate such users.
• The commerce graph is one such solution at Flipkart to identify users with similar shopping patterns. It relates users and products based on the user’s behaviour on our website and apps.
• Our commerce graph has more than 500 million nodes and 2 billion edges. The graph involves buyers, sellers, products and categories in a single connected graph.
• A few use cases currently being powered by our commerce graph are:
  - Generating features for insights and ML models
  - Finding heterogeneous users for beta and control sets
  - Detecting fraudulent sellers and buyers
  - Sending targeted marketing campaigns to users
A commerce graph is a data structure that connects products, categories, sellers and users who view or purchase products on Flipkart.
Flipkart Commerce Graph Source: Flipkart Commerce Graph - Evaluation of Graph Data Stores: https://tech.flipkart.com/flipkart-commerce-graph-evaluation-of-graph-data-stores-8fe0f964affd
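One generic way to find users with similar shopping patterns in a user-product graph is to compare their product neighbourhoods with Jaccard similarity. The source does not say which technique Flipkart uses; this is a swapped-in textbook method, and the interaction data and threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical user -> products viewed or purchased (edges of a user-product graph)
INTERACTIONS = {
    "u1": {"book-42", "laptop-7", "mouse-3"},
    "u2": {"book-42", "laptop-7"},
    "u3": {"saree-9"},
}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_user_pairs(interactions, threshold=0.5):
    """All user pairs whose product neighbourhoods overlap at least `threshold`."""
    pairs = []
    for u, v in combinations(sorted(interactions), 2):
        score = jaccard(interactions[u], interactions[v])
        if score >= threshold:
            pairs.append((u, v, round(score, 3)))
    return pairs


print(similar_user_pairs(INTERACTIONS))  # [('u1', 'u2', 0.667)]
```

Comparing every pair is quadratic, so at the scale quoted above (more than 500 million nodes) a candidate-generation step, for example MinHash with locality-sensitive hashing, would be needed before scoring pairs.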
  45. 45.
• The LinkedIn knowledge graph is a large knowledge base built upon “entities” on LinkedIn, such as members, jobs, titles, skills, companies, geographical locations, schools, etc. These entities and the relationships among them form the ontology of the professional world and are used by LinkedIn to enhance its recommender systems, search, monetization and consumer products, and business and consumer analytics.
• We build the LinkedIn knowledge graph from a large amount of user-generated content from members, recruiters, advertisers and company administrators, who are not aware of their participation in building the knowledge graph, and supplement it with data extracted from the Web, which is noisy and can have duplicates. The knowledge graph needs to scale as new members register, new jobs are posted, and new companies, skills and titles appear in member profiles.
• We apply machine learning techniques to solve the challenges of building the LinkedIn knowledge graph, which is essentially a process of data standardization on user-generated content and external data sources. Machine learning is applied to: entity taxonomy construction, entity relationship inference, data representation for downstream data consumers, knowledge penetration (insights) from the graph, and active data acquisition from users to validate our inference and collect training data.
• The LinkedIn knowledge graph is a dynamic graph. New entities are added to the graph and new relationships are formed continuously. Existing relationships can also change. For example, the mapping from a member to her current title changes when she has a new job. We need to update the LinkedIn knowledge graph in real time upon member profile changes and the emergence of new entities.
LinkedIn Knowledge Graph Source: Building the LinkedIn Knowledge Graph: https://engineering.linkedin.com/blog/2016/10/building-the-linkedin-knowledge-graph
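The “current title changes when she has a new job” example can be sketched as a time-stamped member-to-title edge that is closed and replaced on a profile update. The canonical title IDs and the lookup table are hypothetical: the table stands in for the machine-learned standardization the text describes, and unknown titles simply raise an error here rather than creating a new entity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical lookup standing in for ML-based title standardization:
# raw profile strings -> canonical title entity IDs.
CANONICAL_TITLES = {
    "data scientist": "title:data-scientist",
    "sr. software engineer": "title:senior-software-engineer",
    "senior software engineer": "title:senior-software-engineer",
}


def standardize(raw_title: str) -> Optional[str]:
    return CANONICAL_TITLES.get(raw_title.strip().lower())


@dataclass
class TitleEdge:
    member: str
    title: str
    start: date
    end: Optional[date] = None  # None marks the member's current title


class MemberTitleGraph:
    def __init__(self):
        self.edges = []

    def current_title(self, member):
        for e in self.edges:
            if e.member == member and e.end is None:
                return e.title
        return None

    def on_profile_change(self, member, raw_title, when):
        """Close the open member->title edge and add a new one in the same update."""
        title = standardize(raw_title)
        if title is None:
            raise ValueError(f"no canonical entity for {raw_title!r}")
        for e in self.edges:
            if e.member == member and e.end is None:
                e.end = when  # keep history instead of deleting the old edge
        self.edges.append(TitleEdge(member, title, when))


g = MemberTitleGraph()
g.on_profile_change("member:123", "Data Scientist", date(2015, 1, 1))
g.on_profile_change("member:123", "Sr. Software Engineer", date(2016, 10, 1))
print(g.current_title("member:123"))  # title:senior-software-engineer
```

Closing the old edge rather than deleting it is a design choice in this sketch: it keeps the member's career history queryable while the “current title” relationship stays unique.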
  46. 46. Semantic Web Technologies The term “Semantic Web” was coined by Tim Berners-Lee for a web of data (or data web)[6] that can be processed by machines[7]—that is, one in which much of the meaning is machine-readable. While its critics have questioned its feasibility, proponents argue that applications in library and information science, industry, biology and human sciences research have already proven the validity of the original concept.[8] Berners-Lee originally expressed his vision of the Semantic Web as follows: I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize.[9]
  47. 47. Semantic Web Technologies
The term "Semantic Web" is often used more specifically to refer to the formats and technologies that enable it.[5] The collection, structuring and recovery of linked data are enabled by technologies that provide a formal description of concepts, terms, and relationships within a given knowledge domain. These technologies are specified as W3C standards and include:
• Resource Description Framework (RDF), a general method for describing information
• RDF Schema (RDFS)
• Simple Knowledge Organization System (SKOS)
• SPARQL, an RDF query language
• Notation3 (N3), designed with human-readability in mind
• N-Triples, a format for storing and transmitting data
• Turtle (Terse RDF Triple Language)
• Web Ontology Language (OWL), a family of knowledge representation languages
• Rule Interchange Format (RIF), a framework of web rule language dialects supporting rule interchange on the Web
In more detail:
• RDF is a simple language for expressing data models, which refer to objects ("web resources") and their relationships. An RDF-based model can be represented in a variety of syntaxes, e.g., RDF/XML, N3, Turtle, and RDFa. RDF is a fundamental standard of the Semantic Web.[25][26]
• RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized hierarchies of such properties and classes.
• OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
• SPARQL is a protocol and query language for semantic web data sources.
• RIF is the W3C Rule Interchange Format. It is an XML language for expressing Web rules that computers can execute. RIF provides multiple versions, called dialects, including the RIF Basic Logic Dialect (RIF-BLD) and the RIF Production Rule Dialect (RIF-PRD).
An ontology in OWL is made up of statements about Classes (i.e., sets of things) and Properties (ways that things relate to other things). Classes and Properties are Concepts.
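To make triples, RDFS and OWL semantics concrete, here is a minimal in-memory triple store in plain Python (no RDF library). It materializes two entailment rules: rdfs9 (instances of a subclass are instances of the superclass) and the OWL symmetric-property rule. Prefixed names stand in for full IRIs, and the ex: resources and the ex:bandmateOf property are hypothetical; a real deployment would use an RDF store and SPARQL instead.

```python
# Triples are (subject, predicate, object). Prefixed names stand in for full IRIs:
#   rdf:  http://www.w3.org/1999/02/22-rdf-syntax-ns#
#   rdfs: http://www.w3.org/2000/01/rdf-schema#
#   owl:  http://www.w3.org/2002/07/owl#
#   ex:   a hypothetical example namespace
TRIPLES = {
    ("ex:Musician", "rdfs:subClassOf", "ex:Person"),
    ("ex:Alice", "rdf:type", "ex:Musician"),
    ("ex:bandmateOf", "rdf:type", "owl:SymmetricProperty"),
    ("ex:Alice", "ex:bandmateOf", "ex:Bob"),
}


def match(triples, s=None, p=None, o=None):
    """One triple pattern; None behaves like an unbound SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def infer(triples):
    """Materialize two entailment rules until nothing new is derived."""
    closed = set(triples)
    while True:
        new = set()
        # rdfs9: x rdf:type C . C rdfs:subClassOf D .  =>  x rdf:type D .
        for x, _, c in match(closed, p="rdf:type"):
            for _, _, d in match(closed, s=c, p="rdfs:subClassOf"):
                new.add((x, "rdf:type", d))
        # OWL: P rdf:type owl:SymmetricProperty . x P y .  =>  y P x .
        for prop, _, _ in match(closed, p="rdf:type", o="owl:SymmetricProperty"):
            for x, _, y in match(closed, p=prop):
                new.add((y, prop, x))
        if new <= closed:
            return closed
        closed |= new


closed = infer(TRIPLES)
people = [s for s, _, _ in match(closed, p="rdf:type", o="ex:Person")]
print(people)  # ['ex:Alice']
```

Against the inferred triples, the pattern (None, "rdf:type", "ex:Person") plays the role of the SPARQL query SELECT ?x WHERE { ?x rdf:type ex:Person } on a store that materializes these entailments, returning ex:Alice even though that triple was never asserted.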
  48. 48. Semantic Web Technologies - Wikidata Source: Wikidata: https://www.wikidata.org/wiki/Wikidata:Main_Page
  49. 49. Semantic Web Technologies - Wikidata
  50. 50. How to build Knowledge Graphs?
  51. 51. Example Content to be represented in different graph storage models
We consider the following Graph Data Storage Models:
• Labelled Property Graph (LPG)
• Resource Description Framework (RDF)
• Specialized Data Storage Model
The figure on the left-hand side (LHS) is a Summary Content Card returned by Google for a well-defined search query. We will use it as an example to show the differences between modelling with LPG and RDF graphs.
Source: Flipkart Commerce Graph – Evaluation of graph data stores: https://tech.flipkart.com/flipkart-commerce-graph-evaluation-of-graph-data-stores-8fe0f964affd
  52. 52. Labelled Property Graph (LPG) Stores
The LPG storage model is extensible and is used when extensive storage and the ability to query the data are the requirements. An LPG store has the following characteristics:
• The vertices (nodes) have a uniquely identifiable ID and a set of key-value pairs, or properties, that characterise them.
• The edges (relations) have an ID, a type and a set of key-value pairs, or properties, that characterise the connections.
The labelled property graph on the LHS depicts the Summary Card illustration. The LPG has certain elements in common with the RDF graph, along with the following distinguishing characteristics:
• The nodes have an internal structure. Example: the node ‘v-wall’ has properties such as name and genre that characterise the content stored in the node.
• The values of attributes do not represent vertices in the graph. Example: the node ‘v-floyd’ has properties such as name that characterise the content stored in the node.
Source: Flipkart Commerce Graph – Evaluation of graph data stores: https://tech.flipkart.com/flipkart-commerce-graph-evaluation-of-graph-data-stores-8fe0f964affd
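A labelled property graph keeps attributes inside nodes and edges. A minimal sketch of the Summary Card example, using the node IDs v-wall and v-floyd from the text; the labels, the PERFORMED_BY edge type and the property values are illustrative assumptions, since the slide's figure is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    labels: set
    props: dict = field(default_factory=dict)   # key-value pairs inside the node


@dataclass
class Edge:
    id: str
    type: str
    src: str
    dst: str
    props: dict = field(default_factory=dict)   # edges carry properties too


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = {}

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = Node(node_id, set(labels), props)

    def add_edge(self, edge_id, etype, src, dst, **props):
        self.edges[edge_id] = Edge(edge_id, etype, src, dst, props)

    def out(self, node_id, etype):
        """Nodes reached from `node_id` over outgoing edges of type `etype`."""
        return [self.nodes[e.dst] for e in self.edges.values()
                if e.src == node_id and e.type == etype]


g = PropertyGraph()
g.add_node("v-wall", ["Album"], name="The Wall", genre="Rock")
g.add_node("v-floyd", ["Band"], name="Pink Floyd")
g.add_edge("e-1", "PERFORMED_BY", "v-wall", "v-floyd", year=1979)

print([n.props["name"] for n in g.out("v-wall", "PERFORMED_BY")])  # ['Pink Floyd']
```

The album title, genre and performer name all live in property maps, so the whole card is two vertices and one edge.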
  53. 53. Resource Description Framework (RDF) Store
The RDF storage model is mostly used in solutions that exchange data and metadata. At its core is the notion of a ‘Triple’, a statement composed of three elements that represent two vertices connected by an edge. This structure is called subject-predicate-object:
• Subject is a resource, or a node in the graph.
• Predicate represents an edge, i.e. a relationship.
• Object is either another node or a literal value. From the RDF perspective, this is another vertex.
Resources (vertices/nodes) and relationships (edges) are identified by a URI, which is a unique identifier.
The RDF graph on the LHS depicts the Summary Card illustration. The RDF store creates an atomic decomposition of the data, so the graph contains some nodes that are resources and others that are literal values.
Source: Flipkart Commerce Graph – Evaluation of graph data stores: https://tech.flipkart.com/flipkart-commerce-graph-evaluation-of-graph-data-stores-8fe0f964affd
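Decomposing the same Summary Card into triples shows the difference from an LPG: every attribute value becomes an object vertex. This sketch uses a hypothetical http://example.org/ namespace and property names, represents literals as tagged tuples, and serializes to N-Triples with only minimal escaping (backslashes and double quotes).

```python
EX = "http://example.org/"  # hypothetical namespace for this example
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(local):
    return EX + local


def lit(value):
    """Literals are tagged tuples so they cannot be confused with IRIs."""
    return ("literal", value)


TRIPLES = [
    (iri("wall"), iri("name"), lit("The Wall")),
    (iri("wall"), iri("genre"), lit("Rock")),
    (iri("wall"), iri("releaseYear"), lit(1979)),
    (iri("wall"), iri("performer"), iri("floyd")),
    (iri("floyd"), iri("name"), lit("Pink Floyd")),
]


def vertices(triples):
    """Subjects and objects are all vertices, literals included."""
    return {t for s, _, o in triples for t in (s, o)}


def term_nt(term):
    """Serialize one term in N-Triples syntax (minimal escaping only)."""
    if isinstance(term, str):
        return f"<{term}>"
    value = term[1]
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    return "\n".join(f"{term_nt(s)} {term_nt(p)} {term_nt(o)} ." for s, p, o in triples)


print(len(vertices(TRIPLES)))  # 6
print(to_ntriples(TRIPLES))
```

Five triples yield six vertices (two resources and four literals), which is the atomic decomposition described above.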
  54. 54. Comparison Source: Flipkart Commerce Graph – Evaluation of graph data stores: https://tech.flipkart.com/flipkart-commerce-graph-evaluation-of-graph-data-stores-8fe0f964affd
  55. 55. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  56. 56. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  57. 57. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  58. 58. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  59. 59. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  60. 60. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  61. 61. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  62. 62. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  63. 63. Grakn.ai Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems: https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069
  64. 64. Grakn.ai Source: Logical inference in a Hyper-Relational Database: https://www.slideshare.net/GraknLabs/logical-inference-in-a-hyperrelational-database-80936775
  65. 65. Why use Knowledge Graphs?
  66. 66. Why use Knowledge Graphs? Source: Data Centric Architecture in Business Transformation using Knowledge Graphs: https://www.slideshare.net/AlanMorrison/datacentric-business-transformation-using-knowledge-graphs
  67. 67. Why use Knowledge Graphs? Source: Data Centric Architecture in Business Transformation using Knowledge Graphs: https://www.slideshare.net/AlanMorrison/datacentric-business-transformation-using-knowledge-graphs
  68. 68. Why use Knowledge Graphs? Source: Data Centric Architecture in Business Transformation using Knowledge Graphs: https://www.slideshare.net/AlanMorrison/datacentric-business-transformation-using-knowledge-graphs
  69. 69. Why use Knowledge Graphs? Source: Data Centric Architecture in Business Transformation using Knowledge Graphs: https://www.slideshare.net/AlanMorrison/datacentric-business-transformation-using-knowledge-graphs
  70. 70. Why use Knowledge Graphs? Source: Data Centric Architecture in Business Transformation using Knowledge Graphs: https://www.slideshare.net/AlanMorrison/datacentric-business-transformation-using-knowledge-graphs
  71. 71. Why use Knowledge Graphs? Source: Data Centric Architecture in Business Transformation using Knowledge Graphs: https://www.slideshare.net/AlanMorrison/datacentric-business-transformation-using-knowledge-graphs
  72. 72. NOTES
  73. 73. Data Models
Aspect | Relational | Hierarchical | Network
Definition | A database model to manage data as tuples grouped into relations (tables) | A structure of data organized in a tree-like model using parent-child relationships | A database model that allows multiple records to be linked to the same owner file
Structure | Arranges data in tables | Arranges data in a tree-like structure | Organizes data in a graph structure
Relationships | Represents both “one to many” and “many to many” relationships | Represents “one to many” relationships | Represents “many to many” relationships
Data access | Easier to access data | Difficult to access data | Easier to access data
Flexibility | Flexible | Less flexible | Flexible
Parents per child | n/a | Each child node can have only 1 parent | A child node can have multiple parents
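The last row of the table is the crux of the difference between the hierarchical and network models. A toy many-to-many example (hypothetical students and courses) shows that a tree must duplicate a shared child, while a network lets one record have several parents, and a relation answers both directions with a query.

```python
# Hypothetical many-to-many data: (student, course) enrolments
ENROLMENTS = [("ann", "graphs101"), ("ben", "graphs101"), ("ann", "ml201")]


# Relational: a relation of tuples; either direction is just a query over it
def courses_of(student):
    return [c for s, c in ENROLMENTS if s == student]


def students_in(course):
    return [s for s, c in ENROLMENTS if c == course]


# Hierarchical: a tree rooted at students. Each course record has exactly one
# parent, so a course shared by two students is stored twice.
def as_hierarchy(rows):
    tree = {}
    for student, course in rows:
        tree.setdefault(student, []).append({"course": course})
    return tree


# Network: one record per course, linked to as many owners (parents) as needed
def as_network(rows):
    owners = {}
    for student, course in rows:
        owners.setdefault(course, set()).add(student)
    return owners


tree = as_hierarchy(ENROLMENTS)
copies = sum(rec["course"] == "graphs101" for recs in tree.values() for rec in recs)
print(copies)                                       # 2
print(sorted(as_network(ENROLMENTS)["graphs101"]))  # ['ann', 'ben']
```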
  74. 74. Artificial Intelligence – Five Tribes of ML
• Symbolists view learning as the inverse of deduction and draw on philosophy, psychology and logic. Master algorithm: inverse deduction
• Connectionists reverse-engineer the brain and are inspired by neuroscience and physics. Master algorithm: backpropagation
• Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Master algorithm: genetic programming
• Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Master algorithm: Bayesian inference
• Analogizers learn by extrapolating from similarity judgements and are influenced by psychology and mathematical optimization. Master algorithm: support vector machine
Source: The five tribes of machine learning https://medium.com/@jrodthoughts/the-five-tribes-of-machine-learning-c74d702e88da