This document discusses using Apache Camel as a document processing platform to enrich content from Adobe Experience Manager (AEM) before indexing it into a search engine such as Solr. It describes the typical direct integration between AEM and search, along with its limitations, and proposes using Camel to offload processing and make the integration more fault tolerant. Key aspects covered include using Camel's enterprise integration patterns (EIPs) to extract content from AEM, transform and enrich it through multiple processing stages, and submit it to Solr. The presentation includes examples of how to model content as messages in Camel and build the integration using its Java DSL.
5. Typical AEM + Search Architecture
Pros
• Straightforward implementation
• Simple architecture (AEM + Search)
Cons
• Complete data model in AEM? Not all data may be in AEM
• Processing overhead: data cleansing, transformation and enrichment handled in AEM
• Fault tolerance: what if Solr is down?
• Tight coupling to search platform
7. Goals for a better Architecture
• Offload processing outside of AEM
• Improve fault tolerance
• Provide flexible platform for data cleansing,
transformation and aggregation
• Allow for changes to indexing logic without impacting
AEM
• Search engine agnostic
10. Document Processing Platform
• Roles & Responsibilities
• Enriches submitted documents prior to indexing.
• Submits documents for indexing.
• Terms & Definitions
• Enrichment: Data cleansing, filtering, transformation,
aggregation, etc.
• Processing Stage: Independent processing unit
responsible for contributing to the enrichment process.
• Pipeline: Consists of one or more processing stages or
sub-pipelines.
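The terms above can be sketched in plain Java, without any Camel dependency. This is only an illustration under assumptions: the names `ProcessingStage`, `Pipeline` and `PipelineDemo`, the `title` and `source` keys, and the convention that an empty `Optional` means "drop the document" are all hypothetical and do not come from the presentation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stage contract: return the enriched document, or empty to drop it. */
interface ProcessingStage {
    Optional<Map<String, Object>> process(Map<String, Object> doc);
}

/** A pipeline is itself a stage, so it can be nested inside another pipeline as a sub-pipeline. */
final class Pipeline implements ProcessingStage {
    private final List<ProcessingStage> stages = new ArrayList<>();

    Pipeline add(ProcessingStage stage) {
        stages.add(stage);
        return this;
    }

    @Override
    public Optional<Map<String, Object>> process(Map<String, Object> doc) {
        Optional<Map<String, Object>> current = Optional.of(doc);
        for (ProcessingStage stage : stages) {
            if (!current.isPresent()) {
                break; // an earlier stage dropped the document
            }
            current = stage.process(current.get());
        }
        return current;
    }
}

final class PipelineDemo {
    /** Builds a cleansing sub-pipeline followed by one enrichment stage. */
    static Pipeline build() {
        // Cleansing: trim the title (works on a copy, leaves the input untouched).
        ProcessingStage trimTitle = doc -> {
            Map<String, Object> copy = new HashMap<>(doc);
            Object title = copy.get("title");
            if (title instanceof String) {
                copy.put("title", ((String) title).trim());
            }
            return Optional.of(copy);
        };
        // Filtering: drop documents that have no usable title.
        ProcessingStage dropUntitled = doc -> {
            Object title = doc.get("title");
            if (title instanceof String && !((String) title).isEmpty()) {
                return Optional.of(doc);
            }
            return Optional.empty();
        };
        // Enrichment: tag the document with its (assumed) source system.
        ProcessingStage addSource = doc -> {
            Map<String, Object> copy = new HashMap<>(doc);
            copy.put("source", "aem");
            return Optional.of(copy);
        };
        Pipeline cleansing = new Pipeline().add(trimTitle).add(dropUntitled);
        return new Pipeline().add(cleansing).add(addSource);
    }
}
```

Because `Pipeline` implements `ProcessingStage`, the cleansing sub-pipeline is added to the outer pipeline exactly like a single stage.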
12. Document processing is really an integration problem, right?
Integration options, from low to high complexity:
• Categories: Integration Library; Integration Framework & Stream Processing; Enterprise Service Bus
• Products shown: Apache Camel, Spring Integration, Spring Cloud Data Flow & Cloud Stream, Mule ESB
15. Why Apache Camel?
• Lightweight: it’s a JAR
• Imposes no runtime constraints
• Routing engine
• Powerful, fluent Java DSL
• Mature open source project
• Extensive list of integration components
• Avoid writing boilerplate code by leveraging EIPs
16. Apache Camel & EIP Concepts
Message
• Unit of information exchange between applications
Exchange
• Wraps inbound & outbound message + headers
Message Channel
• Allows applications to communicate using messaging
Pipes and Filters
• Perform loosely coupled processing on a message
• Routes and Processors in Camel
19. Importing Product Content into Solr
Problem: “As an AEM developer, I need to import product
content into Solr so that I can display products via search
and on PDPs on my AEM-powered site.”
Let’s use Best Buy’s Product API as an example…
1. Fetch product data ZIP file via HTTP request.
2. Unzip product data.
3. Parse each JSON file to extract individual products.
4. Transform, enrich and cleanse each product as necessary.
5. Submit each product to Solr for indexing.
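To give a sense of what a single step involves without an integration library, here is a minimal sketch of step 2 (unzipping) using only the JDK. The class name `ProductArchive`, the `.json` suffix filter and the UTF-8 encoding are assumptions made for illustration. The HTTP fetch, JSON parsing, enrichment and Solr submission are deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ProductArchive {
    /**
     * Reads every ".json" entry of a ZIP stream into memory, keyed by entry name.
     * Assumes the entries are UTF-8 text and small enough to hold in memory.
     */
    static Map<String, String> readJsonEntries(InputStream zipStream) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".json")) {
                    continue; // getNextEntry() skips the rest of this entry
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int read;
                while ((read = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                files.put(entry.getName(),
                        new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return files;
    }
}
```

Even this one step needs stream handling, buffering and resource cleanup, which is the kind of boilerplate an integration library is meant to absorb.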
25. Enrichment Use Cases for AEM
• Search Relevancy
• Merge ratings and review signals
• Merge analytics signals (visits, page views…)
• Merge social signals (likes, shares, …)
• Cleanse data for search
• Rich content processing (Tika)
• Natural Language Processing (OpenNLP)
• Filter / drop documents
• Classify content
26. AEM: Data Model (1/3)
• Use a serializable object to represent your document
• In fact, use a HashMap
• No dependency object graph
• Most search platforms already think of documents as a
series of key/value pairs
• Use key name prefixes to model:
• Index operation type (aem.op)
• Document Fields (aem.field.<field>)
• Metadata (aem.meta.<field>)
27. AEM: Data Model (1/3)
// `page` is assumed to be the AEM Page being indexed
HashMap<String, Object> jmsDoc = new HashMap<>();
// Operation Type
jmsDoc.put("aem.op.type", "ADD_DOC");
// Document fields
jmsDoc.put("aem.field.id", page.getPath());
jmsDoc.put("aem.field.crxPath", page.getPath());
jmsDoc.put("aem.field.url", page.getPath() + ".html");
jmsDoc.put("aem.field.title", page.getTitle());
jmsDoc.put("aem.field.description", page.getDescription());
// Metadata
jmsDoc.put("aem.meta.foo", "bar");
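On the consuming side, the key prefixes make it easy to split the flat map back into document fields and metadata. The sketch below is one possible helper. The class name `DocModel` and the method `withPrefix` are hypothetical, and only the prefixes themselves come from the slides.

```java
import java.util.HashMap;
import java.util.Map;

final class DocModel {
    static final String FIELD_PREFIX = "aem.field.";
    static final String META_PREFIX = "aem.meta.";

    /** Returns the entries whose key starts with the prefix, with the prefix stripped. */
    static Map<String, Object> withPrefix(Map<String, Object> doc, String prefix) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }
}
```

For the document built above, `DocModel.withPrefix(jmsDoc, DocModel.FIELD_PREFIX)` would yield a map with the keys `id`, `crxPath`, `url`, `title` and `description`, which is close to the key/value shape most search platforms expect.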
28. AEM: Listener / JMS Producer (2/3)
• Create an AEM Listener
• Implement EventHandler interface
• Listen for the PageEvent topics
• Convert the Page resource to our data model
• Add operation type
• Add document fields
• Add metadata fields
• Send the message to JMS index topic
• Example: JmsIndexListener.java
29. AEM: JMS Camel Consumer (3/3)
• Define your Camel runtime (e.g., standalone, OSGi, etc.)
• Define your Camel routes
• Consume JMS topic
• Route operation type using content-based router
• Enrich document as needed
• Convert JMS document model to Solr model
• Submit index request
• Example: AemToSolr.java
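The content-based routing step can be illustrated in plain Java. This is a sketch of the pattern only, not the contents of AemToSolr.java: the class `OperationRouter` and the `DELETE_DOC` operation type used in the usage note are assumptions, while `aem.op.type` and `ADD_DOC` come from the data model slides. In Camel's Java DSL the same idea is normally expressed with `choice()`, `when(...)` and `otherwise()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Dispatches a document to a handler chosen by its "aem.op.type" value. */
final class OperationRouter {
    private final Map<String, Consumer<Map<String, Object>>> handlers = new HashMap<>();
    private final Consumer<Map<String, Object>> fallback;

    /** The fallback handles documents with a missing or unregistered operation type. */
    OperationRouter(Consumer<Map<String, Object>> fallback) {
        this.fallback = fallback;
    }

    OperationRouter when(String opType, Consumer<Map<String, Object>> handler) {
        handlers.put(opType, handler);
        return this;
    }

    void route(Map<String, Object> doc) {
        Object op = doc.get("aem.op.type");
        Consumer<Map<String, Object>> handler =
                (op instanceof String) ? handlers.get(op) : null;
        (handler != null ? handler : fallback).accept(doc);
    }
}
```

A consumer might register one handler that submits an add request for `ADD_DOC`, another that submits a delete request for a hypothetical `DELETE_DOC`, and a fallback that logs or dead-letters anything else.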
37. In summary…
• If you do not need enrichment, keep it simple and
use a direct indexing approach.
• If you need to enrich your AEM content,
consider using Camel as your document processing
platform.
• This architecture is NOT search-specific!
• Syndicate AEM content to other systems
• Workflow replacement
Declarative, Spring-based route definition also available
Take a minute to picture how much code would be needed to achieve this goal.
Is most of it boilerplate (e.g., setting up HTTP client, dealing with file input/output, marshaling/unmarshaling JSON, etc.)?
3 routes defined, all of which are asynchronous
Demo code available