Request Google to crawl URLs on Sitecore Publish using Google Indexing API

Most sites hold a bunch of short-lived content like events and job postings. While reaching the intended audience for short-lived content is challenging, removing expired content from search engines is also vital for user engagement. Both can be addressed by bridging Google's indexing mechanism with Sitecore's publish mechanism using the Google Indexing API, which empowers businesses to gain maximum value by reaching the right users at the right time.

IMPORTANT: The Google Indexing API currently allows automating Google indexing only for short-lived pages like job postings and live events.

STEPS TO CONFIGURE INDEXING API

Create a Google API project using this Setup Tool
Navigate to the API Dashboard of the newly created project and select ‘ENABLE APIS AND SERVICES’.

Search for ‘Indexing API’ and enable it for the project.

Navigate to the Credentials tab and create credentials for the project.

Navigate to the Credentials tab and select ‘Manage Service Accounts’.

Select the ‘CREATE SERVICE ACCOUNT’ button to create a new service account, which will be used for sending indexing requests to Google.

Select ‘Actions’ -> ‘Manage Keys’ to create a new JSON API key.

Store the downloaded JSON file safely; it is required for sending indexing requests to Google.

Navigate to Google Search Console and then to the respective property. Select ‘Settings’ -> ‘ADD USER’ and add the service account (created earlier).
Select the ‘Actions’ button of any existing owner account and select ‘Manage Property Owners’ to add the service account as an owner of the Google Search Console property (only verified owner accounts can initiate indexing requests to Google).

STEPS TO INTEGRATE INDEXING API

This integration requires the Google.Apis.Indexing.v3 NuGet package, which needs to be added to the project. (Depending on the Sitecore version, you may also need to update the ‘oldVersion’ attribute of the ‘bindingRedirect’ configured for ‘Newtonsoft.Json’ in web.config to 0.0.0.0-12.0.0.0, as the Google API client looks for Newtonsoft.Json 12.0.0.0.)
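
For reference, the binding redirect in the runtime/assemblyBinding section of web.config would look something like this (a sketch; match newVersion to the Newtonsoft.Json assembly actually deployed in your bin folder):

    <dependentAssembly>
      <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
      <bindingRedirect oldVersion="0.0.0.0-12.0.0.0" newVersion="12.0.0.0" />
    </dependentAssembly>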

Pages that are created/updated during publish or workflow approval operations can be captured by adding a custom processor to the publishing pipeline and sent to Google, as shown below.
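
The full source is linked at the end of this post; as an illustrative sketch (the class and helper method names here are assumptions, not the exact source), such a processor could look like this, patched into the publishItem pipeline via an include config:

    using Sitecore.Publishing.Pipelines.PublishItem;

    // Sketch of a publishItem processor that notifies Google about
    // pages created/updated by the current publish operation.
    public class GoogleIndexingProcessor : PublishItemProcessor
    {
        public override void Process(PublishItemContext context)
        {
            // Resolve the item as published to the target database.
            var item = context.PublishHelper.GetTargetItem(context.ItemId);
            if (item == null || !item.IsIndexablePage()) // see ItemExtensions.cs below
                return;

            IndexingAPIHelper.SendIndexingRequest(item.GetAbsoluteUrl(), "URL_UPDATED");
        }
    }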

An event handler for the item:deleting event can be added to capture deleted page links and send them to Google, as shown below.
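
Again as a hedged sketch (names are illustrative), the handler extracts the item being deleted and sends a URL_DELETED notification; it is wired to the item:deleting event through an include config:

    using System;
    using Sitecore.Data.Items;
    using Sitecore.Events;

    // Sketch of an item:deleting handler that asks Google to remove
    // the page from its index before the item is deleted.
    public class GoogleIndexingEventHandler
    {
        public void OnItemDeleting(object sender, EventArgs args)
        {
            var item = Event.ExtractParameter(args, 0) as Item;
            if (item == null || !item.IsIndexablePage())
                return;

            IndexingAPIHelper.SendIndexingRequest(item.GetAbsoluteUrl(), "URL_DELETED");
        }
    }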

The above processor and event handler depend on IndexingAPIHelper.cs and ItemExtensions.cs, which need to be added to the solution.
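
ItemExtensions.cs carries the page-related helpers used above; a minimal sketch (the template check is a placeholder for your own logic) might be:

    using Sitecore.Data.Items;
    using Sitecore.Links;

    public static class ItemExtensions
    {
        // Placeholder check: limit notifications to short-lived page templates.
        public static bool IsIndexablePage(this Item item)
        {
            return item.TemplateName == "Job Posting" || item.TemplateName == "Event Page";
        }

        // Builds the absolute public URL Google should crawl.
        public static string GetAbsoluteUrl(this Item item)
        {
            var options = LinkManager.GetDefaultUrlOptions();
            options.AlwaysIncludeServerUrl = true;
            return LinkManager.GetItemUrl(item, options);
        }
    }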

Copy the JSON file downloaded during the setup process to the website root folder and update the file name in the GetGoogleIndexingAPIClientService method of the IndexingAPIHelper class accordingly.
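
Putting it together, here is a minimal sketch of the helper built on Google.Apis.Indexing.v3 (the key file name and method signatures are assumptions based on this post, not the exact published source):

    using Google.Apis.Auth.OAuth2;
    using Google.Apis.Indexing.v3;
    using Google.Apis.Indexing.v3.Data;
    using Google.Apis.Services;

    public static class IndexingAPIHelper
    {
        private static IndexingService GetGoogleIndexingAPIClientService()
        {
            // Service account key downloaded during setup; the file name
            // here is a placeholder, update it to match your own.
            var keyPath = Sitecore.MainUtil.MapPath("/google-indexing-key.json");

            var credential = GoogleCredential.FromFile(keyPath)
                .CreateScoped(IndexingService.Scope.Indexing);

            return new IndexingService(new BaseClientService.Initializer
            {
                HttpClientInitializer = credential,
                ApplicationName = "Sitecore Google Indexing"
            });
        }

        // type is "URL_UPDATED" for added/changed pages, "URL_DELETED" for removals.
        public static void SendIndexingRequest(string url, string type)
        {
            var service = GetGoogleIndexingAPIClientService();
            var notification = new UrlNotification { Url = url, Type = type };
            service.UrlNotifications.Publish(notification).Execute();
        }
    }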

The configuration is now complete! Indexing requests for added/updated/deleted content will be sent to Google upon publishing. Ensure that the respective pages follow Google's structured data standards (JobPosting, BroadcastEvent).
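
For reference, a minimal JobPosting structured data block (all values are placeholders) embedded in the page would look something like this:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org/",
      "@type": "JobPosting",
      "title": "Sitecore Developer",
      "description": "Example short-lived job posting.",
      "datePosted": "2021-01-04",
      "validThrough": "2021-02-04T00:00",
      "hiringOrganization": { "@type": "Organization", "name": "Example Corp" },
      "jobLocation": {
        "@type": "Place",
        "address": { "@type": "PostalAddress", "addressLocality": "Example City", "addressCountry": "US" }
      }
    }
    </script>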

Indexing API requests can be monitored from the Indexing API Metrics tab.

Please note that the default quota for indexing requests is 200 per day; you may want to request a higher quota by following the steps described here. Quota usage can be viewed from the Indexing API Quota tab.

The source code is available on GitHub. Please do share your feedback below.

Happy Indexing!

Design Considerations and Approaches for Scheduling Recurring Tasks/Workflows

Scheduling tasks is one of the key initial steps in an implementation's automation journey, yet it is not necessarily easy and straightforward. Web applications have plenty of options to achieve this (e.g., Sitecore Scheduler, Windows Task Scheduler, Azure Logic Apps, container CronJobs, Coveo Push API, Hangfire, etc.), but each option comes with pros/cons based on the needs/requirements. There are several basic and advanced considerations that need to be thought through before designing recurring scheduled tasks/workflows:

Changing demands/Extensibility – Certain jobs might place dynamic demands on flow or frequency. The approach needs to be flexible enough to accommodate these changing demands. Azure Logic Apps is extremely useful in such cases, considering its extensive configuration options.
Review/Approval – Certain jobs might need the job administrator's intervention to complete, and the selected approach needs to accommodate this review/approval step.
Logging for tracing issues – It is critical to capture web job actions and errors; this is helpful in troubleshooting issues. Log retention needs to be defined up front, and the log storage size needs to be validated during the design phase.
Alerts on failed jobs – Early detection of issues is a key consideration for any kind of job; appropriate alerts on failure over a preferred communication channel like email, Slack/Teams, etc. are highly essential.
Reporting – Based on the criticality of the job, the job administrator needs to be informed about the success of the job every time or on a daily/weekly basis.
Caching – In scenarios where a recurring job interacts with certain processed data on a regular basis, the data can be cached so that the scheduled task need not fetch/process it every time. Azure APIM and Logic Apps come in handy with extensive caching options; Sitecore CustomCache can also be leveraged.
Triggers & Execution Flexibility – To understand the need for any on-demand execution/scheduling, the triggers for the job need to be analyzed. This helps determine the best approach for enabling the intended job administrator (e.g., content author, infrastructure admin) to configure/schedule the job on demand.
Retry on Failures – Based on the nature/frequency of the job, automating 2-3 retry attempts before alerting a failure could be beneficial.
Manual/Automated cancellation – It is common for certain jobs to need pausing/cancelling for a particular period based on internal/external factors. The job administrator needs to be granted the permissions required for handling such scenarios.
Infrastructure – The job hosting platform must be capable of handling unforeseen load, or else it could even result in a site outage.
Concurrency Constraints – Concurrency constraints, like the number of simultaneous jobs or the daily allowed jobs per user or per type of job, need to be pre-defined.
Documentation – Documenting the job procedure using flowcharts/UML/storyboards will not only assist job administrators but also support maintenance/troubleshooting.
Storage in case of import jobs – Appropriate storage needs to be defined for file-based web jobs to store the artifacts involved (e.g., Azure Blob Storage, AWS S3 buckets, etc.).
Need for running jobs on holidays – Certain jobs need not run outside of business hours or during holidays due to the unavailability of data. It is essential to capture these scenarios and schedule accordingly, as it might save some bandwidth for the infrastructure.
Permissions – Besides granting job administrators the permissions to handle scheduled tasks, it is essential to restrict access so unauthorized people cannot control/update the job.
Frequency of the Job – Almost all of the available options, including Windows Task Scheduler, allow scheduling in seconds. The frequency needs to be carefully matched with data availability to avoid unnecessary load or delays.
Tools and Dependencies – Identifying tools/dependencies plays a key role in defining where jobs are hosted. The required dependencies need to be securely accessible to the scheduled job.

Here are certain key approaches that work well for Sitecore implementations:

The Sitecore Way
Scheduling jobs in Sitecore comes with multiple options, including scheduled tasks in the Sitecore interface, scheduling agents, SiteCron, etc. Sitecore PowerShell & Remoting is a great option when looking to trigger jobs based on external actions and when flexibility is needed to change flow/behavior without development. Utilizing Sitecore scheduling options when there is no interaction/involvement with Sitecore might adversely affect CM performance and hence needs to be evaluated thoroughly.
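
As a minimal sketch, a scheduling agent (the type, method, and interval here are illustrative) is registered through a config patch along these lines:

    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <sitecore>
        <scheduling>
          <!-- Runs MyProject.Tasks.CleanupAgent.Run() every 30 minutes -->
          <agent type="MyProject.Tasks.CleanupAgent" method="Run" interval="00:30:00" />
        </scheduling>
      </sitecore>
    </configuration>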

The Cloud Way
Azure WebJobs can be utilized for scheduling if the task needs to run in the context of an App Service; otherwise, Logic Apps is recommended for automating tasks/workflows. An Azure Logic App comes with a unique URL, allowing it to be invoked as and when needed, and it also provides options to configure the appearance of reports/emails. Azure Logic Apps is usually combined with API Management (and, at times, Azure Functions) for additional security (whitelisting/blacklisting, integration with AD, etc.), load balancing and failovers, caching of responses for certain kinds of requests, and extensive monitoring and telemetry capabilities.
If the implementation uses AWS, Batch jobs, Lambda functions, API Gateway services, etc. can be utilized.

Windows Server
Windows Task Scheduler is the most common job scheduling mechanism for IaaS/on-prem instances. It is usually preferred for jobs that don't involve Sitecore interactions and is commonly combined with PowerShell. Windows Task Scheduler allows scheduling multiple actions for a specific trigger.
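
As a quick sketch (the script path and task name are placeholders), such a task can be registered directly from PowerShell:

    # Run a cleanup script daily at 2 AM
    $action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-File C:\Jobs\Cleanup.ps1"
    $trigger = New-ScheduledTaskTrigger -Daily -At 2am
    Register-ScheduledTask -TaskName "NightlyCleanup" -Action $action -Trigger $trigger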

The Search Platform Way
There might be scenarios where external repository data needs to be pulled into the search platform for presenting on the site as-is (without storing/versioning in Sitecore). In such cases, data can be imported directly into the search platform instead of utilizing the Sitecore Scheduler, reducing load on the CM, especially when the job is expected to process large volumes of data. SOLR allows importing data directly with the update/dataimport handlers, and the import can be scheduled to run automatically with PowerShell/cURL (which in turn can be automated with the scheduling options available on the hosting platform). For Coveo-based implementations, the same can be achieved with the out-of-the-box options or the Coveo Push API and scheduling.
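
For instance, assuming a Solr core with a configured DataImportHandler (the core name is a placeholder), a full import is triggered with a single request that any scheduler can fire on a timer:

    curl "http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=true"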

The Container Way
It is certainly possible to run scheduled jobs in Docker/Kubernetes using the CronJob API and other open-source add-ons like Tasker, Ofelia, etc. The image for the cron job needs to be built on top of the base image(s). This option works well when the scheduled job depends only on components within the container.
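
As a minimal sketch (the name, image, and schedule are placeholders), a Kubernetes CronJob manifest using the batch/v1 API looks like this:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: nightly-cleanup
    spec:
      schedule: "0 2 * * *"   # every day at 02:00
      jobTemplate:
        spec:
          template:
            spec:
              containers:
                - name: cleanup
                  image: myregistry/cleanup-job:latest  # placeholder
              restartPolicy: OnFailure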

With the wide range of scheduling options available, spending time evaluating the options and designing the job scheduling is crucial to achieving effective, long-lasting solutions.

Happy Scheduling!