A Cookbook for Replication in AEM

Before digging into replication in AEM, I thought that “replication is just activating/publishing a page from the author to the publish environment”. As I went deeper, I came across many facets and features of replication that excited me a lot. The objective of this article is to share comprehensive knowledge of the replication process in AEM. For topics that need in-depth expertise, I’m linking the blogs/articles that I referred to, so that you can get more insight into them.

In this article, we are going to explore the following aspects of the replication process in AEM:

  • Definition of Replication in AEM
  • Types of Replication Agent
  • Different ways to trigger Replication
  • Replication Status
  • What is De-activation or unpublishing of a page
  • How to configure a default replication agent
  • How to ensure security during replication
  • Monitoring of Replication Agents
  • Troubleshooting Replication Issues
  • ACS AEM Common Tools available for Replication
  • Different Replication Scenarios

 

Replication – Definition:

Once you are done authoring your content in the author environment, the goal is to make it available in your publish environment. The process of moving a page from Author to Publish is known as Replication/Activation/Publishing of a page.

When an activation request is made to publish content, the configured replication agent packages the content and places it in the replication queue. The content is then lifted from the queue and transported to the publish environment using the configured protocol, typically HTTP. A listener servlet on the publish instance receives the content package and applies it to the publish instance. The default listener servlet on the publish instance is http://localhost:4503/bin/receive.

Types of Replication Agent:

1) Default Replication Agent – To activate/publish content from Author to Publish Instance

2) Dispatcher Flush Agent – To explicitly flush the content from dispatcher cache

3) Reverse Replication Agent – To get user generated data from Publish to Author

4) Static Replication Agent – To replicate a static representation of a node into the filesystem

5) Test and Target Replication Agent – To replicate your content to Adobe Test and Target, which will inherit the replication settings from the cloud configuration attached to a campaign

6) YouTube Publish – To replicate your content to YouTube whose authentication settings are managed under cloud services

7) Dynamic Media Hybrid Image Replication – To replicate Dynamic Media [Scene7] assets such as images, video metadata etc.

8) Offloading Outbox Replication Agent – To offload the requests of Outbox Replication Agent

Apart from these, you can create your own custom replication agent. For example, the following post by Nate Yolles describes how to create a custom replication agent that flushes cached content from the Akamai CDN [great post with a lot of info!]

http://www.nateyolles.com/blog/2016/01/aem-akamai-custom-replication-agent

 

Ways to trigger replication:

1) From the Page Editor – Publishing from the page editor is a shallow publish, i.e. only the selected page(s) are published, not any child pages. This is useful when you want to replicate specific pages after a content change.

2) From the Site Admin Console

3) By Tree Activation [Activating a complete section (tree) of your website] – If you are replicating your website content for the first time, then you can make use of this feature.

4) Using the Replication option in Package Manager – This is useful when you want to replicate a specific set of content.

5) Using Workflows – You can approve and automate page replication by using workflows. Once the edit is approved, the corresponding page/asset is activated to the publish environment [Example: Page Activation Workflow]

6) Scheduled Activation/Deactivation (On/Off Time) – You can schedule times for a page to be published/unpublished using the On Time and Off Time that can be defined in the Page Properties.

 

7) Using curl commands – You can activate content from author to publish with curl

Activate

curl -u admin:admin -X POST -F path="/content/path/to/page" -F cmd="activate" http://localhost:4502/bin/replicate.json

Deactivate

curl -u admin:admin -X POST -F path="/content/path/to/page" -F cmd="deactivate" http://localhost:4502/bin/replicate.json

Tree Activation

curl -u admin:admin -F cmd=activate -F ignoredeactivated=true -F onlymodified=true -F path=/content/geometrixx http://localhost:4502/etc/replication/treeactivation.html
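
If you would rather script these calls than type them, here is a minimal Python sketch of the activate/deactivate request. It assumes the same /bin/replicate.json endpoint and the same path and cmd fields as the curl commands above. The function names are made up for illustration, and sending the fields URL-encoded (curl -F sends multipart form data) is an assumption that you should verify against your own instance. admin:admin is only the local default used above.

```python
import base64
import urllib.parse
import urllib.request


def build_replication_request(path, cmd="activate",
                              host="http://localhost:4502",
                              user="admin", password="admin"):
    """Build (but do not send) a POST to /bin/replicate.json.

    Hypothetical helper: the endpoint and field names come from the
    curl examples; everything else is an illustrative assumption.
    """
    if cmd not in ("activate", "deactivate"):
        raise ValueError("cmd must be 'activate' or 'deactivate'")
    # Same form fields as the curl examples: path and cmd.
    body = urllib.parse.urlencode({"path": path, "cmd": cmd}).encode("ascii")
    request = urllib.request.Request(host + "/bin/replicate.json",
                                     data=body, method="POST")
    # HTTP Basic authentication, the equivalent of curl's -u user:password.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


def replicate(path, cmd="activate", **kwargs):
    """Send the request and return the HTTP status code."""
    request = build_replication_request(path, cmd, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Calling replicate("/content/path/to/page") against a running author instance would then correspond to the Activate command above. Keeping the request building separate from the sending makes it easy to inspect what would be posted before touching a real instance.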

 

Replication Status Indicator [Classic UI]:

 

 

 

Deactivation/ Unpublishing:

When you want to remove a page from the publish environment, you can deactivate it in AEM; this is referred to as unpublishing. On unpublishing, the page is removed only from Publish and remains available in the author environment for further changes until you delete it.

Configuring a Default Replication Agent:  

We have seen the different types of replication agents available in AEM and the various ways to trigger replication. Let’s see how to configure the default replication agent on the author instance, so that you can replicate your content from author to publish.

Go to Tools -> Replication -> Agents on Author

Or open http://localhost:4502/etc/replication/agents.author.html

It shows the list of available replication agents.


Select the Default Agent (publish). Clicking Edit next to Settings opens a dialog with the different tabs for configuring your replication agent.


Important Parameters to consider while configuring your replication agent:

  • In the Settings tab, you configure things like the agent name and description. The Enabled checkbox must be selected to activate that particular replication agent. The default Retry Delay is 60000 milliseconds.
  • The Serialization Type dropdown offers 4 options. Choose the one that fits your need.
  1. Default – Durbo [the default type]. Set this if the agent is to be automatically selected
  2. Dispatcher Flush – Select this if the agent is to be used for flushing the dispatcher cache
  3. Binary less – Lets content activated to the “publish” instance point to the same Data Store as the one used by the “author” instance. This avoids duplicating the Data Store across “author” and multiple “publish” instances, which saves on storage costs and can cut the time taken to activate a digital asset roughly in half. For more details on binary-less replication and the shared Data Store implementation, kindly refer to

 

http://cq-ops.tumblr.com/post/59996536676/how-to-share-the-data-store-between-author-and

https://aemcorner.com/shared-data-store/

  4. Static Content Builder – This is a specialized agent that stores a static representation of a node in the filesystem. For more details, kindly refer to

http://www.aembeginner.com/aem-6-3-static-replication-agent-configuration-and-use-cases/

  • Agent User Id – Create a dedicated user with appropriate access control for replicating content and use that user as the Agent User Id. Leave the field empty to use the default system user. The Agent User Id should not be the admin user, but a user who can only see the content that is supposed to be replicated.
  • In the Transport tab, enter the details of the target [Publish] instance, such as the URI and the user and password used to authenticate replication


  • The Triggers tab offers several useful parameters.


Ignore default – If this field is enabled, this agent will not be used to replicate content when a content author issues a replication request

On modification – If this field is enabled, modified content is replicated automatically

On/Off Time Reached – If this field is enabled, the agent auto-replicates when a page passes an on-/offtime boundary [Scheduled Page Activation]

On Receive – If this field is enabled, the agent chain-replicates whenever it receives replication events.

To know more about chain replication, kindly refer to

http://aemfaq.blogspot.in/2013/05/chain-replication-sample.html

No Versioning – Checking this avoids the creation of page versions, which improves performance
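
To make the On/Off Time boundary idea a bit more concrete, here is a small Python sketch of the window check behind scheduled activation. It only illustrates the logic and is not AEM code. The function name, the parameter names, and the choice to treat the Off Time itself as already unpublished are assumptions.

```python
from datetime import datetime


def should_be_published(now, on_time=None, off_time=None):
    """Return True if `now` falls inside the page's On/Off Time window.

    Hypothetical helper: a missing on_time or off_time means that side
    of the window is open-ended.
    """
    if on_time is not None and now < on_time:
        return False  # On Time not reached yet
    if off_time is not None and now >= off_time:
        return False  # Off Time boundary has been passed
    return True


# Example window: live during March 2024 only (made-up dates).
on = datetime(2024, 3, 1)
off = datetime(2024, 4, 1)
print(should_be_published(datetime(2024, 3, 15), on, off))
```

An agent with On/Off Time Reached enabled reacts at the moment a page crosses one of these two boundaries, activating on the first and deactivating on the second.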

How to ensure Security during Replication:

While replicating a page from the author to the publish environment, AEM uses the HTTP protocol. In addition, AEM uses a proprietary binary format for replication called Durbo, which includes the checksumming needed to ensure that replicated content is not corrupted during transport. If you want to add an extra layer of security, you can use mutual SSL (MSSL) to authenticate the replication request. With MSSL, the replication agent on author and the HTTP service on the publish instance use certificates to authenticate each other.

For further info kindly refer,

https://helpx.adobe.com/experience-manager/6-3/sites/deploying/using/mssl-replication.html

Troubleshooting Replication Issues:

Each replication agent has a single queue used to deliver replication packages to a receiving instance. This replication queue works in a FIFO [First In First Out] manner. So if an item fails to be delivered to the target instance, it gets stuck at the head of the queue and blocks the remaining items in the queue from being replicated.
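
The blocking behaviour is easier to see in a toy model. The Python sketch below is not AEM code: the drain function and the content paths are made up to show how a FIFO queue stalls behind a failing head item.

```python
from collections import deque


def drain(queue, deliver):
    """Deliver items in FIFO order and stop at the first failure.

    `deliver` is any callable returning True on success. Items behind
    a failing head item stay in the queue, untouched.
    """
    delivered = []
    while queue:
        item = queue[0]  # only the head of the queue is ever tried
        if not deliver(item):
            break  # head failed: everything behind it has to wait
        delivered.append(queue.popleft())
    return delivered


queue = deque(["/content/a", "/content/b", "/content/c"])
# Pretend that /content/b can never be delivered.
done = drain(queue, lambda path: path != "/content/b")
print(done, list(queue))
```

Here /content/c is perfectly deliverable but never gets its turn, which is why the troubleshooting steps below focus on retrying or clearing the first item in the queue.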

If you activated your content but it was not replicated to the publish instance, you can troubleshoot the replication queue by opening the URL [http://localhost:4502/etc/replication/agents.author.html], which displays the list of replication agents. Select the replication agent that you want to debug. Its page shows whether the replication queue is blocked.

Possible reasons for a blocked Replication Queue:

  1. The replication agent that you configured to activate content from author to publish may be disabled. Ensure that the particular agent is enabled [Green Mark]
  2. Replication may stop working if you change the IP address of the server hosting the publish instance. If you change the IP address of any publish server, edit the configuration of its corresponding replication agents on the author instance to avoid replication failures. Verify the connectivity with the publish instance by clicking Test Connection. It shows “Replication Test Succeeded” if the agent is able to replicate content to its target instance. If there are errors, click View Logs.
  3. The replication job could be stuck in a socket read, waiting for the publish instance or dispatcher to respond. This could mean that the publish instance or dispatcher is under high load or stuck in a lock.
  4. If you see the Replication pending message, try a force retry for the first item in the queue. If it is still pending, clear that item from the queue and try to replicate the remaining content. If the replication status is still pending, restart the replication agent and the replication-related bundles in the System Console.

Replication bundle – http://host:port/system/console/bundles/com.day.cq.cq-replication

Apache Sling Event Support bundle – http://host:port/system/console/bundles/org.apache.sling.event

Apache Felix Event Admin – http://host:port/system/console/bundles/org.apache.felix.eventadmin

  5. If the replication queue is still blocked after this, clear the whole queue. Replicating the corresponding content/pages again should then resolve the issue.

Monitoring of Replication Agents:

In general, rather than troubleshooting once replication is stuck, you can check the overall health of the replication agents periodically to make sure they are functioning properly. You can monitor replication agents in AEM in the following ways:

Through Classic UI – You can monitor the replication queue by going to Tools -> Replication -> Agents on Author and clicking the appropriate agent name to view its replication queue.


Through Touch UI – You can check the health of the replication queue in the Operations Dashboard, which is aimed at helping system operators troubleshoot problems and monitor the overall health of an instance.

 


Through JMX Console – The JMX Console available within AEM enables you to monitor and manage services on the CRX server.  For more details, kindly refer

https://helpx.adobe.com/experience-manager/6-3/sites/administering/using/jmx-console.html#ReplicationAgents
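
Periodic checks like these can also be scripted. The Python sketch below summarizes a replication queue status document so that a scheduled job could raise an alert when a queue is blocked. It rests on assumptions to verify on your own instance: that an agent exposes its queue as JSON at a path like /etc/replication/agents.author/publish/jcr:content.queue.json, and that the document contains metaData.queueStatus.isBlocked, isPaused and a queue array. The function name and the sample document are made up for illustration.

```python
import json


def summarize_queue(status_json):
    """Summarize a replication queue status document.

    The field names read here (metaData, queueStatus, isBlocked,
    isPaused, agentId, queue) are assumptions, not a documented contract.
    """
    data = json.loads(status_json)
    status = data.get("metaData", {}).get("queueStatus", {})
    return {
        "agent": status.get("agentId", "unknown"),
        "blocked": bool(status.get("isBlocked", False)),
        "paused": bool(status.get("isPaused", False)),
        "pending": len(data.get("queue", [])),
    }


# Made-up sample document, shaped like the assumed response.
sample = json.dumps({
    "metaData": {"queueStatus": {"agentId": "publish",
                                 "isBlocked": True,
                                 "isPaused": False}},
    "queue": [{"path": "/content/a"}, {"path": "/content/b"}],
})
print(summarize_queue(sample))
```

The JSON itself can be fetched with the same kind of authenticated request as the curl commands shown earlier. The parsing is kept separate here so that it can be tried without a running instance.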

ACS AEM Common Tools available for Replication:

Package Replication Status Updater – As mentioned earlier, you can replicate packages from the Package Manager console. The Package Replication Status Updater configuration helps you check the replication status of content replicated through a package. It is an event handler that listens for JCR package replications and updates the replication status of the package content accordingly.

https://adobe-consulting-services.github.io/acs-aem-commons/features/package-replication-status-updater/index.html

Automatic Package Replicator – This feature lets you replicate packages automatically using schedulers, on particular triggers, or through workflow processes.

https://adobe-consulting-services.github.io/acs-aem-commons/features/automatic-package-replicator/index.html

 

Different Replication Scenarios:

Last but not least, here are different replication scenarios with pictorial representations to give a better understanding.

       

     

     

 

I hope this post gave you an in-depth view of the replication process in AEM. I look forward to any corrections or feedback. Thanks!

 
