Elsevier: Empowering Knowledge Discovery in Research with Graphs

  1. March, 2023 Erik M. Schwartz, VP Product, Knowledge Discovery Enabling Knowledge Discovery in Research with Graphs Neo4j GraphSummit London
  2. A journey to better discovery experiences
  3. Digital Search Application at the Naval Research Laboratory
  4. No more physical space
  5. 1995 TIFF images of Elsevier journals were delivered on disks and loaded into shared drives Electronic Delivery
  6. Building Search Based Applications for more than 25 years 1994 2003 2007 2010 2018
  7. Our Mission: Elsevier helps researchers and healthcare professionals advance science and improve health outcomes for the benefit of society.
  8. At a glance: Trusted in research and health for over 140 years • 2,700+ digitized journals, including The Lancet (1823) and Cell • 43,000+ eBook titles, including iconic works such as Gray's Anatomy • Since the year 2000, more than 99% of the Nobel Laureates in science have published in Elsevier journals • 600k+ peer-reviewed articles in 2020, 89% more than a decade ago (Section tabs: Trusted, The future is open, The innovation delta, A better world)
  9. At a glance: Enriched data • Enhanced analytics • Evidence-led decisions (Section tabs: Trusted, The future is open, The innovation delta, A better world)
  10. RELX Risk Scientific, Medical and Technical A&G Corporate Health Markets Legal LexisNexis Exhibitions RELX develops information-based analytics and decision tools for professional and business customers in the Risk, Scientific, Technical & Medical, Legal and Exhibitions sectors. https://www.relx.com/
  11. Knowledge Discovery Providing Search and Recommendations services to enable research and drive better outcomes for society
  12. As a shared service, KD doesn’t go to market directly. We build collaborative partnerships with products, and share objectives. We help products grow by enabling: 1. Better Discovery experiences with Embeddings at scale 2. Access linked data more quickly with Structured Search 3. Increase engagement by using reusable Recommenders Knowledge Discovery core services KD enables Elsevier products to lead the market in academic discovery services
  13. Research Process - Simplified Discover Find existing research and experts to refine areas of focus. Stay up to date. Secure Funding Publish Establish in the system a record of the hypothesis and conclusions of research. Carefully document Methods & Protocols Assess Evaluate personal academic output, compare against peers, compare institutions. Get hired/promoted.
  14. Research Process - Simplified Discover Find existing research and experts to refine areas of focus. Stay up to date. Secure Funding Publish Establish in the system a record of the hypothesis and conclusions of research. Carefully document Methods & Protocols Assess Evaluate personal academic output, compare against peers, compare institutions. Get hired/promoted.
  15. Scopus: Editorially Curated A&I database. The most trusted source for measuring and assessing academic output. Key Use Cases • Find and assess literature • Assess my academic output • Assess my institution's academic output • Find Experts
  16. 780,000,000,000 Annual search requests
  17. 95% Percent of Structured Queries
  18. Structured Queries Use Cases 1. Find all Papers by Author 2. Find all Citations that reference a paper 3. Find all Metadata about a paper
  19. Introducing the graph: Neo4j solved our structured query problems, allowing us to move away from a search engine. Using GraphQL, we are enabling data-driven applications throughout the portfolio
  20. 4 Billion Total Relationships
  21. Our graph by the numbers References Grants Works: 311M Abstracts: 85M Authors: 47M Topics: 56K Journals: 163K Organizations: 8.8M
  22. Use case 1: Find all Papers by Author References Grants Works: 311M Abstracts: 85M Authors: 47M Topics: 56K Journals: 163K Organizations: 8.8M
  23. Use case 2: Find all Citations that reference a paper References Grants Works: 311M Abstracts: 85M Authors: 47M Topics: 56K Journals: 163K Organizations: 8.8M
  24. Use case 3: Find all Metadata about a paper References Grants Works: 311M Abstracts: 85M Authors: 47M Topics: 56K Journals: 163K Organizations: 8.8M
  25. Graphs help us build new product experiences Scopus Societal Impact Article Sustainable Development Goals (SDGs) Editorial Manager Conflict of Interest Find Reviewer Scopus and ScienceDirect Showcase my work Author Profiles ScienceDirect Read Literature Enhanced PDF Reader Author Connections ScienceDirect Find and Assess Literature Search Results Citation counts on SERP / Profiles Scopus Societal Impact Organization SDGs
  26. Practically speaking, we can now take the data that we have in the graph and create a much more precise view of our data. Combined with Embeddings we can now get a much deeper understanding of our Author profiles • Are they really an expert in a field? • Are they still working in this field? • Have they changed fields? More sophisticated ways to understand Experts
  27. Accelerating Data and Analytics PAGE RANK TO EVALUATE ACADEMIC IMPACT CONVENIENT AND EFFICIENT SUPPORT FOR DATA SCIENCE GRAPH DATA SCIENCE (GDS) LIBRARIES FOR EXPLORATIONAL EXPERIMENTS
  28. Where are we in our Graph Journey? • Evaluation: Neo4j was the best performing graph DB on the market • Integration: Connected graphs to our data pipelines with near real time performance • Scaling: Ensuring that the graph can meet our performance and scale requirements • Decision: Selected Enterprise for current and future projects • Accelerate: Solving existing and new use cases • You are here
  29. Thank You
  30. Speaker Biographies • Erik M. Schwartz • Elsevier, 5 years • e.schwartz@elsevier.com • m. +44 (0) 7880 300319 • o. +44 (0) 2074 244309 • Erik has 25+ years of building search product experiences before joining Elsevier with Convera, FAST, Microsoft, Comcast @Eschwaa https://www.linkedin.com/in/eschwaa/
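
The three structured-query use cases on slides 18 and 22 to 24 (papers by author, citations of a paper, metadata about a paper) can be sketched as traversals over a small in-memory graph. This is a minimal Python illustration only, not Elsevier's schema and not the Neo4j or GraphQL API: the `TinyGraph` class, the relationship names (`AUTHORED`, `CITES`, `PUBLISHED_IN`, `HAS_TOPIC`), and the sample ids are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TinyGraph:
    """Toy property graph (hypothetical): nodes keyed by id, typed directed edges."""
    nodes: dict = field(default_factory=dict)
    out_edges: dict = field(default_factory=lambda: defaultdict(list))
    in_edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, src, rel, dst):
        # Index every edge in both directions so each lookup is one hop.
        self.out_edges[(src, rel)].append(dst)
        self.in_edges[(dst, rel)].append(src)

    def papers_by_author(self, author_id):
        """Use case 1: follow AUTHORED edges out of an author."""
        return sorted(self.out_edges[(author_id, "AUTHORED")])

    def citations_of(self, work_id):
        """Use case 2: follow CITES edges into a work."""
        return sorted(self.in_edges[(work_id, "CITES")])

    def metadata_for(self, work_id):
        """Use case 3: gather the work and its direct neighbours."""
        return {
            "work": self.nodes[work_id],
            "authors": sorted(self.in_edges[(work_id, "AUTHORED")]),
            "journal": list(self.out_edges[(work_id, "PUBLISHED_IN")]),
            "topics": sorted(self.out_edges[(work_id, "HAS_TOPIC")]),
            "references": sorted(self.out_edges[(work_id, "CITES")]),
        }


def build_example():
    """Build a tiny sample graph; all ids and titles are made up."""
    g = TinyGraph()
    g.add_node("a1", "Author", name="A. Author")
    g.add_node("w1", "Work", title="First paper")
    g.add_node("w2", "Work", title="Second paper")
    g.add_node("j1", "Journal", name="Example Journal")
    g.add_node("t1", "Topic", name="graphs")
    g.add_edge("a1", "AUTHORED", "w1")
    g.add_edge("a1", "AUTHORED", "w2")
    g.add_edge("w2", "CITES", "w1")
    g.add_edge("w1", "PUBLISHED_IN", "j1")
    g.add_edge("w1", "HAS_TOPIC", "t1")
    return g


if __name__ == "__main__":
    g = build_example()
    print(g.papers_by_author("a1"))
    print(g.citations_of("w1"))
    print(g.metadata_for("w1"))
```

Each use case reduces to following one or two typed relationships from a known starting node, which is the kind of lookup the talk describes moving from a search engine to a graph.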

Editor's notes

  1. Orange font on dark background
  2. Good morning, everyone. My name is Erik Schwartz, and I am a knowledge discovery (KD) guy. Today, I would like to share my journey of building a knowledge graph and the lessons we have learned along the way.
  3. In 1995, I built my first search application at a Navy research facility in Washington, D.C. The library where I worked was running out of space, so we started receiving academic journals in TIFF format on CD-ROMs. We created a digital library by OCRing the TIFF images and making them fully text searchable. That was the beginning of my journey into knowledge discovery. The NRL is a historic research facility, credited with discovering radar by sending radio signals across the Potomac River and detecting passing ships. Seated across the river from Reagan National Airport in Washington, D.C., this iconic radar dish sits atop the building that holds the base commander and the library. In D.C. they lovingly refer to the dish as the world’s largest bird bath.
  4. The library was responsible for receiving journals in paper format for the researchers at the lab. The fundamental challenge the library had was that it was out of physical space.
  5. We ripped the images off the disks, OCR’d them, wrapped them into PDFs, and made them fully text searchable.
  6. A bit about me. After leaving the NRL, I worked for search engine companies, was acquired twice in 2007, and then spent 8 years at Comcast before coming overseas to London to change the search experiences at Elsevier.
  7. [Script:] As it has for so many, this pandemic has brought a lot into focus. For the people at Elsevier, our mission has never been clearer. We help researchers and healthcare professionals advance science and improve health outcomes for the benefit of society. It is the scientists, the researchers and healthcare professionals who are leading us out of this global health crisis.
  8. [Script:] Of course, you know Elsevier as a publisher, and the pace of research and knowledge creation is accelerating. Last year we published more than 600,000 peer-reviewed articles, 89% more than a decade ago. Every month, more than 18 million users visit ScienceDirect®. In 2020 more than 1.6 billion articles were downloaded.
  9. [Script:] While our publishing continues to grow, Elsevier does much more than produce content. We combine Machine Learning and Natural Language Processing with vast quantities of quality structured data to help researchers, engineers and clinicians perform their work better. It’s this unique delta of data, analytics and evidence that’s taking us in exciting directions. They say that “innovation happens at the intersections.” For example, in this chord graph we’re able to visualize the state of research in artificial intelligence: to identify connections, relationships, emerging fields, the intersections of science.
  10. Today, I work at Elsevier, which is part of RELX. RELX is made up of four segments: STM, Legal, Risk, and RX. In the STM segment, we provide three core services: text search, structured search, and recommenders. Our team serves A&G, and our primary focus is to modernize Scopus, an A&I database containing enriched titles and abstracts for almost 90 million journal articles from Elsevier and hundreds of other publishers.
  11. Who we are and what we do. We support A&G products globally and at scale
  12. Focused on 3 key areas: Search, Graph and Recommenders, to grow products while staying strategically aligned with their outcomes
  13. But let me tell you, the path to getting here has not been easy. Our team was faced with a daunting challenge - modernizing Scopus, an A&I database that contains enriched titles and abstracts for almost 90 million journal articles from Elsevier and hundreds of other publishers. Customers use it primarily to evaluate academic output and to find and assess literature. Our search engine was receiving 750 billion requests per year, and 95% of those queries were structured queries. The primary objective of using a graph was to move those structured queries to a more suitable infrastructure, away from a search engine. And that's where the drama begins.
  14. 780 billion: roughly ¾ of a trillion requests handled by our search engine per year. By comparison, Google does about 8.5 billion searches per day.
  15. 95% of our requests are structured queries. These include requests like: give me all of the metadata about a document, give me all of the information about an author, give me all of the information about my institution. This is supported today by almost 200 nodes of search indexes (Solr).
  16. So why Neo4j? We wanted a graph so that we could solve for structured queries now and leverage graph relationships for KD in the future. In our evaluation, Neo4j was the fastest graph database on the market for both ingest and query. We built a GraphQL-based system to handle structured queries. Our KD graph consists of the following services: ingestion, metrics service, taxonomy service, graph query service, and hydration. The graph data model consists of the relationships between the core entities in our academic literature, which include works (articles, books, and book chapters), abstracts, authors, topics, journals, and organizations.
  17. Four billion is the total number of relationships that we have in our graph connecting our core entities.
  18. It starts with Works. Works are articles, books, and book chapters. It’s the content that is core to our business. Associated with the Works are Abstracts. Not every article has an abstract, but roughly 75% of all articles in Scopus have one. This jumps to over 85% when we look at content published after 1985. Authors are associated with a Work, as are Topics. Works belong to a Journal. Authors are affiliated with an Organization, but there is an important temporal nature here: the association is with the organization at the time of publication, and this can change over time. Grants are associated with Authors. As you can see, this graph now allows us to start answering some pretty interesting questions. How much is a given topic worth? What is the societal impact of an Organization? What is this organization best at? By adding embeddings of Abstracts, we can enable natural language and semantic representation to engage with this data model.
  22. We have learned many lessons throughout our journey of building a knowledge graph. We have defined our metrics of success, which include expert finding use cases, using PageRank as a new way to rank academic impact, providing convenient and efficient data support for data science work, and using graph data science libraries for exploratory experiments. New technology is hard, but graph thinking enables a new way of problem-solving. We have applied graph thinking to solve problems, such as conflicts of interest and user-curated organization hierarchies, and we have found success. We have also learned that combining hierarchies and taxonomies with graph data allows us to use user-curated organization hierarchies to detect conflicts of interest at various levels of organization structures. We are setting up for success for the future. Conflicts of interest enable expert finding use cases. GraphQL and federated graphs enable acceleration for innovation. We are building hybrid recommenders leveraging our data.
  23. In conclusion, our journey of building a knowledge graph has taught us many valuable lessons. We have defined our metrics of success, applied graph thinking to solve problems, and set up for success for the future. Thank you for listening.
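
The notes above mention using PageRank over the graph as a new way to rank academic impact. As a rough illustration of the idea, here is a minimal power-iteration PageRank over a toy citation map in plain Python. It is a sketch under stated assumptions, not the talk's implementation and not the Neo4j Graph Data Science library: the `pagerank` function, the damping factor of 0.85, the fixed iteration count, and the sample work ids are all choices made for this example.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank (illustrative sketch).

    `citations` maps each work id to the list of work ids it cites.
    A work that cites nothing spreads its rank evenly over all works.
    """
    nodes = set(citations)
    for targets in citations.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution over all works.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every work receives the same baseline "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = citations.get(node, [])
            if targets:
                # Split this work's rank across the works it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling work: distribute its rank over every work.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical data: w2 and w3 both cite w1; w1 cites nothing.
    ranks = pagerank({"w2": ["w1"], "w3": ["w1"], "w1": []})
    for work, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(work, round(score, 4))
```

In this toy input the cited work `w1` ends up with the highest score, which is the intuition behind ranking impact by who cites whom instead of by raw citation counts alone.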