How do we monitor?
We use trend and summary data from Kimble Sense and the NewRelic performance monitoring framework to identify conditions that require action or intervention to resolve.
What do we monitor?
Failed Jobs
Kantata’s Sense Analysis reports the number of Failed Jobs per org. The Failed Jobs report summarizes these data to show counts by org over recent days.
Job Cleardown Failed
Completed jobs are expected to be cleared down each day. When this does not occur, job records accumulate at the org’s typical daily volume and can quickly consume storage space in the customer org if left unmanaged for multiple days.
Kantata has a NewRelic alert that will fire if the Cleardown job fails to execute in the org.
Queued Jobs
An org should process jobs regularly according to the Operation Scheduler and the org’s configured Batch Size. A rapid increase in queued jobs could indicate that job volume is abnormally high, that job processing in the org is not working properly, or both.
Kantata has a NewRelic alert that will fire if the total number of queued jobs in an org exceeds 30,000 for at least 3 hours.
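To make the "exceeds 30,000 for at least 3 hours" condition concrete, here is a minimal Python sketch of a sustained-threshold check. It is an illustration only, not the actual NewRelic alert definition: the function name, the sample format (timestamped queue counts, oldest first), and the choice to measure duration from the first sample of the current breach are all assumptions.

```python
from datetime import datetime, timedelta

QUEUE_THRESHOLD = 30_000            # queued jobs, from the alert description
SUSTAINED_FOR = timedelta(hours=3)  # minimum time spent above the threshold


def backlog_alert_due(samples, threshold=QUEUE_THRESHOLD, sustained_for=SUSTAINED_FOR):
    """Return True if the queue has stayed above `threshold` up to the
    latest sample for at least `sustained_for`.

    `samples` is a hypothetical list of (timestamp, queued_job_count)
    tuples sorted oldest first.
    """
    breach_started = None
    for taken_at, queued in samples:
        if queued > threshold:
            # Remember when the current unbroken breach began.
            if breach_started is None:
                breach_started = taken_at
        else:
            # Any sample at or below the threshold resets the breach.
            breach_started = None
    if breach_started is None:
        return False
    return samples[-1][0] - breach_started >= sustained_for
```

A single dip back to or below the threshold resets the clock in this sketch, so a queue that briefly recovers would not trigger until it has been over the threshold for a further 3 hours.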
Apex Async Processing Limit
An org has a limited capacity for Apex Async jobs within a rolling 24-hour period, calculated in 30-minute increments. Exceeding this limit effectively shuts down Apex processing in the org: throughput becomes extremely limited and processing capacity is reduced.
Salesforce typically limits an org to 250,000 Apex Async jobs in that period unless the org has more than 1,250 users, in which case the limit is higher. Salesforce can increase this limit temporarily.
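The 1,250-user figure is consistent with the rule in Salesforce's governor limits documentation, which gives the daily allowance as 250,000 or 200 times the number of user licenses, whichever is greater (200 × 1,250 = 250,000). The sketch below encodes that rule as an assumption. The function name is hypothetical, and the constants should be checked against the current Salesforce documentation and the org's actual license types.

```python
BASE_DAILY_LIMIT = 250_000   # floor on the 24-hour Apex Async allowance
PER_LICENSE_ALLOWANCE = 200  # assumed executions granted per user license


def daily_async_apex_limit(user_licenses):
    """Estimate an org's 24-hour Apex Async limit from its license count."""
    return max(BASE_DAILY_LIMIT, PER_LICENSE_ALLOWANCE * user_licenses)
```

Under these assumptions, an org with 1,000 licenses stays at 250,000, while an org with 2,000 licenses would have a limit of 400,000.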
Kantata has a NewRelic alert that will fire if Apex Async usage in an org exceeds 75% of its limit for more than 10 minutes.
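As a rough illustration of the 75% check, the sketch below computes usage from a payload shaped like the `DailyAsyncApexExecutions` entry returned by the Salesforce REST API limits resource (`Max` and `Remaining`). How the NewRelic alert actually obtains its data is not described here, so the function names and the input source are assumptions, and the "more than 10 minutes" duration condition is not modeled.

```python
ALERT_FRACTION = 0.75  # alert threshold, from the alert description


def async_apex_usage(limits):
    """Return the fraction of the daily Apex Async allowance already used.

    `limits` is assumed to be a dict containing a
    "DailyAsyncApexExecutions" entry with "Max" and "Remaining" counts.
    """
    entry = limits["DailyAsyncApexExecutions"]
    used = entry["Max"] - entry["Remaining"]
    return used / entry["Max"]


def over_alert_threshold(limits, fraction=ALERT_FRACTION):
    """Return True when usage is strictly above the alert fraction."""
    return async_apex_usage(limits) > fraction
```

For example, an org with a maximum of 250,000 and 50,000 remaining has used 80% of its allowance and would be over the threshold in this sketch.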