Nowadays, enterprise digital content is scattered across several independent systems and subsystems that provide services such as user authentication, document storage, and search. Centralising enterprise data into a single repository is a growing necessity for organisations.
Piergiorgio Lucidi (TAI Solutions) and Luis Cabaceira (Alfresco) proposed an approach that could be the "silver bullet" and open a clear path to enterprise digital content centralisation.
Apache ManifoldCF began as a crawler that lets you manage content indexes in your search engines; this was the main goal of the product.
We've realised that we could leverage ManifoldCF to also migrate content, and not only indexes, making it a very good migration tool.
This talk focuses on two new output connectors that we are developing for Apache ManifoldCF.
This session was presented at the Alfresco DevCon 2018 in Lisbon.
Please see also the retrospective article available here:
https://www.open4dev.com/journal/2018/1/22/alfresco-devcon-2018-wrapping-up
4. Learn. Connect. Collaborate.
Piergiorgio Lucidi
Chief Technology Evangelist / EIM Specialist @ TAI Solutions
Alfresco Certified Instructor / Engineer / Administrator
Alfresco Forum Moderator / Community Star (OpenPJ)
Mentor / PMC Member / Committer @ ASF
• Apache ManifoldCF
• Apache Community Development
Member of Technical Advisory Group @ Microsoft
Reviewer of toolkits and white papers @ AIIM
Author and Technical Reviewer @ Packt Publishing
Project Leader / Committer @ JBoss Community
6. What is Apache ManifoldCF?
[Diagram: Repositories 1-4 and Search Servers 1-4, connected through Apache ManifoldCF]
7. Apache ManifoldCF - Concepts 1/2
• Crawling: track injected content for incremental executions
• Repository Connection: fetch from content repositories
• Output Connection: inject content into search servers and repositories
• Authority Connection: access tokens
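The concepts on this slide can be illustrated with a minimal sketch of an incremental crawl. This is not the Apache ManifoldCF API (ManifoldCF itself is written in Java); every name here (`Document`, `InMemoryRepository`, `InMemoryOutput`, `crawl`, `seen_versions`) is hypothetical and exists only to show how a repository connection, an output connection, and version tracking could fit together.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    version: str  # assumed change marker, e.g. a last-modified stamp or checksum
    content: str
    # Access tokens an authority connection could use to filter results.
    allowed_tokens: list = field(default_factory=list)


class InMemoryRepository:
    """Hypothetical stand-in for a repository connection (content source)."""

    def __init__(self, documents):
        self.documents = {d.doc_id: d for d in documents}

    def fetch_all(self):
        return list(self.documents.values())


class InMemoryOutput:
    """Hypothetical stand-in for an output connection (index or target repository)."""

    def __init__(self):
        self.injected = {}

    def inject(self, document):
        self.injected[document.doc_id] = document


def crawl(repository, output, seen_versions):
    """Inject only documents whose version changed since the previous run.

    seen_versions maps doc_id -> version; it is the tracking state that
    makes the next execution incremental. Returns the injected ids.
    """
    injected_ids = []
    for document in repository.fetch_all():
        if seen_versions.get(document.doc_id) == document.version:
            continue  # unchanged since the previous execution, skip it
        output.inject(document)
        seen_versions[document.doc_id] = document.version
        injected_ids.append(document.doc_id)
    return injected_ids


if __name__ == "__main__":
    repo = InMemoryRepository([
        Document("a", "v1", "hello", ["group:staff"]),
        Document("b", "v1", "world"),
    ])
    out = InMemoryOutput()
    state = {}
    print(crawl(repo, out, state))  # first run: every document is new
    repo.documents["b"] = Document("b", "v2", "world, updated")
    print(crawl(repo, out, state))  # second run: only the changed document
```

Keeping the tracking state outside the repository and the output is a deliberate choice in this sketch: either side can be swapped (a search server for a target repository, for example) without losing the record of what has already been injected.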
8. Apache ManifoldCF - Concepts 2/2
• Transformation Connection: manipulate fetched content before injection
• Notification Connection: send notification messages for end and error events
• Status Reports: inspect the current queues for executing jobs
• History Reports: search and discover transactions for executed jobs
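As a rough illustration of where transformations and notifications sit in a job, the sketch below applies a chain of transformations to each fetched document before injection and sends a message when the job ends or fails. It is not the Apache ManifoldCF API; `run_job`, `strip_whitespace`, `add_source_metadata`, the dictionary layout of a document, and the `"demo-repository"` value are all assumptions made for the example.

```python
def run_job(documents, transformations, inject, notify):
    """Transform each document, inject it, and notify on end or error.

    documents: iterable of dicts with "id", "content" and "metadata" keys.
    transformations: list of callables taking and returning a document dict.
    inject: callable receiving each transformed document.
    notify: callable receiving an event name ("end" or "error") and a message.
    Returns the number of injected documents.
    """
    count = 0
    try:
        for document in documents:
            for transform in transformations:
                document = transform(document)
            inject(document)
            count += 1
    except Exception as error:
        notify("error", f"job failed after {count} documents: {error}")
        raise
    notify("end", f"job finished, {count} documents injected")
    return count


def strip_whitespace(document):
    """Hypothetical transformation: trim the content."""
    return {**document, "content": document["content"].strip()}


def add_source_metadata(document):
    """Hypothetical transformation: record where the document came from."""
    metadata = {**document.get("metadata", {}), "source": "demo-repository"}
    return {**document, "metadata": metadata}


if __name__ == "__main__":
    injected = []
    run_job(
        [{"id": "1", "content": "  hello  ", "metadata": {}}],
        [strip_whitespace, add_source_metadata],
        injected.append,
        lambda event, message: print(event, message),
    )
    print(injected)
```

Each transformation returns a new dictionary instead of mutating its input, so the order of the chain stays easy to reason about.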
10. Apache ManifoldCF - What you can do
• Schedule your jobs using UI
• Configure repositories for getting contents
• Configure search servers for indexing
• Configure your transformations
• Look at the history for each job and connection
• Configure your target content repositories to migrate contents
11. Big swerve: Content Migration
Output Connectors can be used for migrating contents to target repositories
What we are doing / what we need:
• Changes to the core framework
• Brand new connector implementations
• Unit and integration tests!!!
Scheduled for the next ManifoldCF 2.10 / 3.x
50. Content Migration - Roadmap
Repository connectors
New connectors: Azure Storage
Content migration extension support on existing ones
Output Connectors
New connectors: Azure Storage and Amazon S3
Metadata mapping
Bugfixing
Testing
If you want to help us, your involvement is welcome :)