Webhook Rate Limiting

Is anyone else experiencing an outage of the webhook service from ADSK and SG today? Our instance is not reporting errors, but it has not received logs or confirmation of outgoing events since 5:30 this morning. I made a support ticket but am not getting responses. I have also escalated the outage to our territory manager (we should NOT have to do this). Production is backing up due to the outage, and I am wondering if other users are impacted.

Romey

I think the issue is isolated to our instance. I was able to remote into another instance and confirm that webhooks are working for other sites, so I suspect we have been quarantined. I wish we had more visibility into what is going on here.

Hi Romey, has the information I passed on through support helped?

From the logs, it looked like the site’s throughput was backlogged and periodically rate-limited due to volume and a slow response time on a webhook. In other words, there were too many requests for the response time of the webhook. If you identify the webhook and either filter it more tightly or improve its response time, that should reduce throttling and increase throughput.
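As an illustration of the "improve its response time" option, here is a minimal sketch of a receiving endpoint that acknowledges each delivery immediately and does the slow work on a background thread. It assumes the sender treats a fast 2xx reply as a successful delivery. The names `handle_event`, `events`, and `processed`, and the port 8080, are hypothetical and not part of any ShotGrid API.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process queue; a real deployment might prefer a durable queue.
events = queue.Queue()
processed = []


def handle_event(payload):
    """Placeholder for the slow work (database writes, API calls, and so on)."""
    processed.append(payload)


def worker():
    """Drain the queue in the background so the HTTP handler never waits on slow work."""
    while True:
        payload = events.get()
        try:
            handle_event(payload)
        finally:
            events.task_done()


def start_worker():
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        events.put(payload)      # hand the work off to the background thread
        self.send_response(200)  # reply before any slow processing happens
        self.end_headers()


if __name__ == "__main__":
    start_worker()
    HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```

The trade-off in this sketch is that a queued event is lost if the process dies before the worker reaches it, so anything that must not be dropped would need a persistent queue instead.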

However, consumption by all webhooks for the site is combined when considering rate limiting, so it is also possible that no single webhook is responsible and that there has been an overall increase in volume to all webhooks. (But I think when optimizing there are usually bigger targets that stand out. I would expect one or two webhooks to be consuming significantly more.)
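One rough way to look for those bigger targets, assuming you log a name and a handling time for each delivery on your own receiving side, is to total the time per webhook and sort. The function `rank_webhooks`, the record format, and the sample values below are all made up for illustration.

```python
from collections import defaultdict


def rank_webhooks(records):
    """Return (name, count, total_seconds) tuples, largest total time first."""
    totals = defaultdict(lambda: [0, 0.0])
    for name, seconds in records:
        totals[name][0] += 1
        totals[name][1] += seconds
    return sorted(
        ((name, count, total) for name, (count, total) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


# Made-up sample data: (webhook name, seconds spent handling one delivery).
sample = [
    ("version_created", 0.2),
    ("task_status_changed", 2.5),
    ("task_status_changed", 3.0),
    ("note_created", 0.1),
]

for name, count, total in rank_webhooks(sample):
    print(f"{name}: {count} deliveries, {total:.1f}s total")
```

Whichever webhook sits at the top of that list is the first candidate for tighter filtering or a faster handler.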

We do have work planned to improve visibility of performance bottlenecks in the webhooks UI, and also provide throttling warnings, but that is still a little bit down in our queue.

Neil

@NeilB Thank you for following up and responding here. Moving forward, I do have a small but very valuable feature request. If ADSK is going to rate limit our instance, this needs to be clear and visible to admins. We spent way too long trying to identify the issue and waiting for a response from ADSK while production was crippled. Might I suggest you guys use the ShotGrid banner system to let admins know that the webhook service is under duress due to an excessive backlog, and that all service will be rate limited until the issues are resolved?

The banner system is a great suggestion for notifications.
We are working on finally moving webhooks out of beta, and troubleshooting visibility for users is definitely one of the priorities. I’ll add banners to the approaches we are looking at.

Neil