Thumbnails not being generated

It appears that the transcoder is having issues again. We’re getting uploads without thumbnails being generated. Anything you can check on your end?

1 Like

Just came in here to check on this, and it’s the same with us, specifically with stills; movie files seem mostly fine.

Thumbnails not being generated again. Transcoders must be having issues.

1 Like

Same issues here. The transcoder service definitely seems to be having problems; Versions are also not transcoding for us currently…

2 Likes

I put in a ticket, as we are affected as well. They have updated the status page; as of writing this, it looks like the transcoder service is down:

https://status.shotgunsoftware.com/

2 Likes

It sounds like this is back up and running and there’s probably a big backlog now.

@Autodesk, is it possible to eventually have a public graph (or, if you don’t want it in the open, one accessible through our own sites) of the rolling average of the transcoder backlog?
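
For illustration, the kind of metric being asked for could be computed roughly like this. It is only a sketch under assumptions: the backlog samples, the window size, the per-job time, and the worker count are all hypothetical inputs, and nothing here reflects how the actual transcoder service is monitored.

```python
from collections import deque


class RollingBacklog:
    """Rolling average over the last `window` backlog samples (hypothetical metric)."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest sample automatically
        self._samples = deque(maxlen=window)

    def add(self, pending_jobs: int) -> float:
        """Record one sample of pending jobs and return the current rolling average."""
        self._samples.append(pending_jobs)
        return sum(self._samples) / len(self._samples)


def estimated_wait_minutes(avg_backlog: float, minutes_per_job: float, workers: int) -> float:
    """Rough wait estimate, assuming jobs are spread evenly across identical workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return avg_backlog * minutes_per_job / workers


if __name__ == "__main__":
    backlog = RollingBacklog(window=3)
    for sample in (10, 20, 30, 40):  # made-up samples, one per polling interval
        average = backlog.add(sample)
    print(average, estimated_wait_minutes(average, minutes_per_job=1.0, workers=5))
```

A graph of the value returned by `add`, sampled every few minutes, is roughly what would let a site tell "slow" from "stopped".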

Thanks
-Kessler

6 Likes

+1 for @Michael.Kessler’s request. We have 40+ shows going here and processes that rely on the thumbnails being accessible. Being able to broadcast the expected wait time would help us pivot our priorities to other things while the process catches up.

4 Likes

Sorry to harp on this one, but the status update says this was resolved on Friday:

A fix has been implemented and our transcoding service is back online. Failed transcode jobs that were submitted during the incident will be reprocessed as the system catches up.

Yet most of our thumbnails still seem to be broken. I had a ticket open with support and was told the jobs were resubmitted but failed again (no reason given). We have had no update since then.

I’ve had to manually re-import some things to keep production happy, but there’s a bunch more that I’d rather not waste my day updating if they can be re-queued. Any help here?

Also curious if others are still seeing this issue or if it’s just us.

cheers,
kp

Hi @kporangehat,

Sorry for the delay on the Thumbnail transcoding - I believe everything should be caught up now. @Michael.Kessler was right, there was a large line of jobs for the transcoder to work through and there were a handful of sites that needed an extra kick to get them moving. Our engineers are doing a thorough review of why the issue occurred in the first place so we can avoid such disruption in the future.

Let me know if your Thumbnails need an extra boot and we can get you sorted.

Cheers,
Beth

Thanks @Beth, we appreciate the extra look into failed jobs and the root cause. Ultimately, though, this has been the Achilles heel in Shotgun’s reliability; sometimes it seems that the system gets bogged down, sometimes full-stop. While we are quite happy that the holes are being discovered and patched (rather than glossed over), some reporting tools would go a long way, since “Shit Happens”, especially with services on the internet.

Even knowing whether transcodes average 1 minute or 5 minutes (when working correctly) would help with user expectation management.

Cheers!
-Kessler

2 Likes

Unfortunately @beth it’s not resolved at all.

  1. SG is once again having transcoding issues.
  2. I believe the post-incident recovery process that takes place only looks at Version transcoding, not at thumbnails on their own.

This means that thumbnails that don’t get transcoded during outages will never be processed. I think this is a very big hole in “recovering”.
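
As a stopgap, a site could at least list the entities that still lack a thumbnail after an outage. The sketch below assumes a client object with a `find(entity_type, filters, fields)` method like the one in the Shotgun Python API, and it assumes a failed thumbnail leaves the `image` field empty, which may not hold on every site (a pending placeholder may be stored instead). The function name and the project id are hypothetical.

```python
from typing import Any, Dict, List


def find_missing_thumbnails(
    sg: Any, project_id: int, entity_type: str = "Version"
) -> List[Dict[str, Any]]:
    """Return entities in a project whose thumbnail (`image`) field is empty.

    `sg` is any object exposing find(entity_type, filters, fields), for
    example a connected shotgun_api3.Shotgun instance.
    Assumption: a thumbnail that was never generated leaves `image` unset.
    """
    filters = [
        ["project", "is", {"type": "Project", "id": project_id}],
        ["image", "is", None],
    ]
    fields = ["code", "created_at"]
    return sg.find(entity_type, filters, fields)


def summarize(entities: List[Dict[str, Any]]) -> str:
    """One line per entity, suitable for pasting into a support ticket."""
    return "\n".join(
        "{} {} {}".format(e.get("type"), e.get("id"), e.get("code")) for e in entities
    )
```

The resulting list could be attached to a support ticket or used to drive a manual re-upload; whether the jobs can be re-queued server-side is something only support can answer.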

Where this hurts most for us is when we have a handful of review sessions happening where annotated frames are being created. These frames don’t get thumbnails.

And we have Deliveries coming in from vendors that are being ingested into Shotgun using the GPL tools which have their own transcoding service. So the Version media is being updated under the hood which is great, but the GPL tools also update the thumbnail and filmstrip for the Versions. These thumbnails don’t get generated during these outages and we’re left to clean up the mess ourselves :confused:

To re-echo @Michael.Kessler, yes having some tools to help with the user expectation management as to the state or performance of the transcoding services would go a long way.

And including the thumbnails in the “recovery” process when shit happens also seems like a reasonable expectation. Part of our issue is that we’re only just now hearing that thumbnails not created via a Version-uploaded movie don’t get cleaned up by Shotgun when outages occur.

Thanks for the follow-up and we appreciate the ongoing updates.

cheers,
kp

2 Likes

Hi @kporangehat,

There was an unrelated issue with the 3rd-party hosting service, AWS, this afternoon that AWS believes they have resolved, and we continue to monitor it. This has caused slow transcoding times for sites today, but the engineers said that Thumbnails should still transcode as the AWS service catches up (i.e., it shouldn’t result in failed transcodes this time).

Additionally, a post-mortem on Friday’s transcoding failure will be posted on the status page within a week or so, which should hopefully provide more color for you.

Thanks for speaking more to your processes. I appreciate understanding more about how you work and your pain points as it helps to outline to our team the ways in which our clients are relying on these services in different ways.

I encourage both you and @Michael.Kessler (and anyone else who agrees) to post a feature request to the roadmap if you think client-facing monitoring tools for the Transcoder would be helpful. As you know, the more people that speak to this, the more traction it gets.

As always, I appreciate whatever patience you have to offer during times like these.

Cheers,
Beth

3 Likes

Thanks for the info @beth. That’s very helpful to know. I do think this would be a valuable feature to have. @Michael.Kessler and I will seed that entry in the roadmap I’m sure.

I’ll add that it would be nice to know some of that detail about the issue being with S3 in the status updates. Especially in this case, it would alleviate some anxiety on our end and we wouldn’t feel the need to bug you as much :wink:

I really appreciate you filling in the gaps of information. Thanks.

cheers,
kp

2 Likes

Duly noted :wink: I’ll see what we can do haha.

1 Like

Hey folks,

Thanks for your patience as we sorted out the issues following the incident. We’ve now published a postmortem for the incident that should provide some more detail.

If you’re still noticing missing thumbnails or other transcode media, please reach out to us via support.shotgunsoftware.com and we’ll be happy to look into them.

5 Likes

Thank you @khosrow.

Just wanted to follow up and let everyone know that our issues have indeed been resolved with some extra effort from the SG team.

Thanks to everyone involved. I’m feeling much better and more confident that the next time something like this happens, it won’t be because of the same issue, and steps are in place to mitigate the issues with non-Version upload-based thumbnails as well as Version-based ones.

Appreciate everyone’s efforts here! :pray:

cheers,
kp

2 Likes

Thank you for the work you put into this.
I am missing some detail in the postmortem; I don’t know if this is a matter of security/obscurity or not. AWS is not mentioned at all.
I know from experience that these kinds of issues involving external services are notoriously hard to monitor and catch. Describing them in the postmortem would have been interesting. Not sure it would be allowed, though.

On a separate note, better client monitoring would certainly be desirable!

Thumbnail issues again. I’m assuming this has to do with the transcoders and AWS? Will thumbnails be generated eventually or do I need to republish these versions?

We’re having the same issues as well.

1 Like

Evercast uses AWS as well and they have been having all kinds of problems.
https://status.shotgridsoftware.com/