Let's walk through a scenario for implementing a retry mechanism in Salesforce.
Scenario:
A third-party system sends data to Salesforce; Salesforce processes that data and updates the relevant objects. If an insert/update fails, Salesforce should retry processing those records. In such cases, we can design the solution around a staging object.
Objects:
Consider that we want to insert/update Account records based on the data received. For that, we can create a staging object, let's say Accountstaging__c, which has fields similar to the Account object.
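For the sketches in the rest of this post, assume Accountstaging__c carries the Account-like data fields (for example, a hypothetical Name__c) plus a few housekeeping fields that the later sections rely on: Batch_Id__c (text, the unique id of a bulk load), BOF__c and EOF__c (checkboxes marking the begin/end-of-file records), Status__c (Success/Failed), and Retry_Counter__c (number). These field names are assumptions made for illustration.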
Design:
Using the standard APIs, the third-party system can send data into the staging object. No validation is put in place while the data is being added to the staging object; this ensures that all the data reaches the Salesforce system without any failure.
Salesforce then picks up the data using a trigger or a batch job, performs the business logic, and finally stores the data in the Account object.
Bulk Data Load to Salesforce:
This covers the case where the other system wants to send data in bulk; we then need to process all of it and update the Account object accordingly.
Let's say we are about to receive 10,000 records. We can agree with the other system's team that they send a unique id for each bulk load, plus two extra records: one record tells us that the bulk data load to Salesforce has started, and the other tells us that all the data has been sent. After that, Salesforce can start its processing.
Records will arrive in this order:
Begin of File (bof) --> Actual 10,000 records --> End of File (eof)
Here bof and eof are checkbox fields on the Account staging object.
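For illustration only, if the third-party system used the standard sObject REST resource (POST /services/data/vXX.0/sobjects/Accountstaging__c/; a real 10,000-record load would more likely go through the Bulk or Composite API), the three kinds of records could look roughly like this, using the hypothetical fields assumed above:

{ "Batch_Id__c": "LOAD-001", "BOF__c": true }              <-- begin-of-file marker
{ "Batch_Id__c": "LOAD-001", "Name__c": "Acme Corp" }      <-- one of the 10,000 data records
{ "Batch_Id__c": "LOAD-001", "EOF__c": true }              <-- end-of-file marker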
Data Processing:
We can have a trigger on the staging object that checks whether eof is true. If it is, the trigger calls a batch class, which queries all the records for that unique id and processes the data. At the end, we mark the records in the staging object as success or failed.
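A minimal sketch of such a trigger, assuming the hypothetical field names above and a batch class called AccountStagingBatch (sketched under Retry Mechanism below):

trigger AccountStagingTrigger on Accountstaging__c (after insert) {
    for (Accountstaging__c row : Trigger.new) {
        // Once the end-of-file marker arrives, kick off processing for that bulk load.
        if (row.EOF__c == true) {
            Database.executeBatch(new AccountStagingBatch(row.Batch_Id__c), 200);
        }
    }
}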
Retry Mechanism:
Suppose some records fail during the bulk update. We can get the list of failed records by querying all records marked as failed for that batch id.
These failed records can be retried either by using a time-based Process Builder to call the same batch class after a time gap, or by calling the batch class again immediately and passing in the list of failed records.
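If you prefer to keep the delayed retry in code rather than in Process Builder, System.scheduleBatch can queue the same batch class to run after a time gap. A one-line sketch, reusing the hypothetical AccountStagingBatch below:

// A hypothetical retry with a 30-minute gap instead of an immediate re-run.
String batchId = 'LOAD-001'; // unique id of the failed bulk load
System.scheduleBatch(new AccountStagingBatch(batchId), 'Retry load ' + batchId, 30);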
Based on the retry processing, we can then update the staging records' status as success or failure.
This mechanism can be repeated on the failed records if we add an integer field, a retry counter, on the staging object. The field is incremented by 1 whenever a record fails. While doing the retry, we only pick up records whose retry counter is less than 3. In this way, the failed records can be retried multiple times without retrying forever.
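Putting the pieces together, here is a minimal sketch of such a batch class, again assuming the hypothetical fields described earlier. It processes one bulk load, marks each staging row as Success or Failed, increments the retry counter on failure, and re-runs itself from finish() while failed rows with a counter below 3 remain:

public class AccountStagingBatch implements Database.Batchable<SObject> {
    private final String batchId; // unique id of the bulk load

    public AccountStagingBatch(String batchId) {
        this.batchId = batchId;
    }

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Unprocessed rows plus failed rows that still have retries left; skip the bof/eof markers.
        return Database.getQueryLocator([
            SELECT Id, Name__c, Status__c, Retry_Counter__c
            FROM Accountstaging__c
            WHERE Batch_Id__c = :batchId
              AND BOF__c = false AND EOF__c = false
              AND (Status__c = null OR (Status__c = 'Failed' AND Retry_Counter__c < 3))
        ]);
    }

    public void execute(Database.BatchableContext bc, List<Accountstaging__c> scope) {
        List<Account> accounts = new List<Account>();
        for (Accountstaging__c row : scope) {
            // Business logic / field mapping from staging to Account goes here.
            accounts.add(new Account(Name = row.Name__c));
        }
        // allOrNone = false: one bad record does not roll back the whole chunk.
        List<Database.SaveResult> results = Database.insert(accounts, false);
        for (Integer i = 0; i < results.size(); i++) {
            if (results[i].isSuccess()) {
                scope[i].Status__c = 'Success';
            } else {
                scope[i].Status__c = 'Failed';
                Decimal tries = scope[i].Retry_Counter__c;
                if (tries == null) { tries = 0; }
                scope[i].Retry_Counter__c = tries + 1;
            }
        }
        update scope;
    }

    public void finish(Database.BatchableContext bc) {
        // Immediate retry: if failed rows with retries left remain, run the same batch again.
        Integer retriable = [
            SELECT COUNT()
            FROM Accountstaging__c
            WHERE Batch_Id__c = :batchId
              AND Status__c = 'Failed'
              AND Retry_Counter__c < 3
        ];
        if (retriable > 0) {
            Database.executeBatch(new AccountStagingBatch(batchId), 200);
        }
    }
}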
If you have any thoughts on this, let's connect and discuss it further.