Deadline PipelineTools UI hangs Nuke: could SG be the cause?

Hi.

I’ve hit a dead end trying to isolate why the PipelineTools UI hangs on some machines and not others. Here’s a quick rundown of the issue:

Week 1:

  • Launch Nuke via the SGTK UI.
  • Open the PipelineTools UI and submit to Deadline. No issues.

Week 2:

  • Same steps as above; however, two people can no longer open the PipelineTools UI.
  • Opening the UI hangs Nuke, and the UI never fully populates; we just see a window filled with black.

Important note: no SG-related code was changed. No software on the machines was changed. No infrastructure was changed.

The only change we can identify: both machines were rebooted.

Outside of the SGTK launch environment, it works fine.

Here’s what I’ve tried in order to fix, or at least further isolate, the issue (we use Linux):

  • Deleted the Deadline settings in ~/Thinkbox/Deadline10/settings
  • Deleted the Shotgun settings in ~/.shotgun
  • Ran a clean Nuke environment, with only enough loaded to configure the SG environment and the Deadline submitter.
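Rather than deleting the settings folders from the list above, moving them aside keeps a way back if they turn out not to matter. A minimal sketch, assuming the two folder locations given above and that Nuke, Deadline and SG Desktop are closed while it runs; `move_aside` and `restore` are hypothetical helpers, not part of either product:

```python
import os
import time

# Folder locations taken from the steps above; adjust for your install.
SETTINGS_DIRS = ["~/Thinkbox/Deadline10/settings", "~/.shotgun"]


def move_aside(paths, stamp=None):
    """Rename each existing folder to '<folder>.bak-<stamp>'; return (src, dst) pairs."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for path in paths:
        src = os.path.expanduser(path).rstrip(os.sep)
        if os.path.exists(src):
            dst = "%s.bak-%s" % (src, stamp)
            os.rename(src, dst)
            moved.append((src, dst))
    return moved


def restore(moved):
    """Undo move_aside; skips (and returns) any folder the apps have since recreated."""
    skipped = []
    for src, dst in moved:
        if os.path.exists(src):
            skipped.append(src)  # leave both copies in place for a manual decision
        else:
            os.rename(dst, src)
    return skipped
```

Run `moved = move_aside(SETTINGS_DIRS)` before relaunching, and `restore(moved)` afterwards if the problem persists. Any folder the apps recreate in the meantime is left in place and reported, so nothing is silently overwritten.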

I can only assume this is machine-specific, which points to settings of some kind. Yet I’ve purged every setting I know of that could influence the issue, and it persists.

I’m also unable to determine where the crash is occurring; I’ve added log statements to IntegrationUI (as it manages the UI that never appears correctly), and these statements never appear.

What. On. Earth. Could. It. Be?

I’m also posting this on the Deadline forums, just in case there’s some wisdom hiding there.

Any guidance would be appreciated, as short of reinstalling everything (which seems extreme), I cannot solve this.

Thanks.

Are those machines able to see Deadline OK? Try making deadlineCommand calls to the repo.

“Outside of the SGTK launch environment, it works fine.”
So Deadline is working fine outside an SG environment? If so, you can disregard the previous question!

I’d look at your sys.path and PYTHONPATH and compare them between the working non-SG environment and the broken SG environment.
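That comparison can be scripted so the two sessions are checked entry by entry rather than by eye. A minimal sketch with hypothetical function names, written to run under both Nuke 11.3’s Python 2.7 and Python 3:

```python
import json
import os
import sys


def snapshot_python_env():
    """Capture the interpreter details most likely to differ between launch methods."""
    return {
        "executable": sys.executable,
        "version": sys.version,
        "sys.path": list(sys.path),
        # Drop empty items so an unset PYTHONPATH gives [] rather than [""].
        "PYTHONPATH": [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p],
    }


def diff_paths(good, bad):
    """Return (only_in_good, only_in_bad), preserving each list's search order."""
    only_good = [p for p in good if p not in bad]
    only_bad = [p for p in bad if p not in good]
    return only_good, only_bad


def dump_snapshot(path):
    """Write a snapshot to disk so two sessions can be compared afterwards."""
    with open(path, "w") as fh:
        json.dump(snapshot_python_env(), fh, indent=2)
```

Run `dump_snapshot("/tmp/sg.json")` from the Script Editor of a Nuke launched through SGTK and `dump_snapshot("/tmp/plain.json")` from one launched normally, then load both files and pass their `"sys.path"` lists to `diff_paths`. Python searches `sys.path` in order, so two lists with the same entries in a different order can still import different modules; `diff_paths` won’t flag that, so also check whether the lists are equal.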

I’ve never had the Deadline GUI hang like that; it usually raises an exception if something isn’t right.

The next step I’d take is to copy the Deadline submission Python scripts to the Deadline repo’s “custom/submission” folder and start adding logging and feature switches to isolate the issue further (e.g. try disabling any SG calls in the dialog code).
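A sketch of the logging and feature-switch idea, assuming the copied submission scripts can import a small helper module. The environment variable `DL_DEBUG_SKIP_SG` and all helper names are hypothetical. Logging goes to a dedicated file rather than stdout or Nuke’s consoles, so the output doesn’t depend on how the script’s output streams are handled, and each potentially blocking call is bracketed with start/done lines so the last line in the file shows where things stopped:

```python
import logging
import os

# Hypothetical switch name; export it before launching Nuke to skip SG calls.
SKIP_SG_ENV = "DL_DEBUG_SKIP_SG"


def make_file_logger(path, name="integration_debug"):
    """Log straight to a file. FileHandler flushes after every record, so
    lines written before a hang are still on disk afterwards."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep these lines out of Nuke's own consoles
    if not logger.handlers:
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s pid=%(process)d %(threadName)s %(message)s"))
        logger.addHandler(handler)
    return logger


def sg_calls_disabled():
    """Read the switch at call time rather than import time."""
    return os.environ.get(SKIP_SG_ENV, "0") == "1"


def guarded_sg_call(log, label, fn, fallback=None):
    """Run fn() unless the switch is on; log before and after so a hang is bracketed."""
    if sg_calls_disabled():
        log.debug("%s: skipped (%s=1)", label, SKIP_SG_ENV)
        return fallback
    log.debug("%s: start", label)
    result = fn()
    log.debug("%s: done", label)
    return result
```

Each SG call in the dialog code would be wrapped as `guarded_sg_call(log, "<label>", lambda: <original call>, fallback=<safe default>)`; a `start` line with no matching `done` points at the call that never returned.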

Hi Patrick - I appreciate your thoughts.

Likewise, I’ve not experienced anything like this, as ‘artist level’ issues are usually resolved by resetting a problematic preference file somewhere.

Yes, both environments can see Deadline fine; within the SG environment I can still submit regular Deadline jobs, but as mentioned, the PipelineTools UI hangs Nuke.

I didn’t mention one of the steps I tried: comparing the registered env vars and their contents. Nothing stood out there, but I’ll have another look.
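The env var comparison can also be done mechanically. A minimal sketch with hypothetical helper names; run `dump_environ` once in each launch environment, load the two saved files with `json.load`, and pass them to `diff_environ`:

```python
import json
import os


def dump_environ(path):
    """Save this process's environment so two launch methods can be compared."""
    with open(path, "w") as fh:
        json.dump(dict(os.environ), fh, indent=2, sort_keys=True)


def diff_environ(good, bad):
    """Return (added, removed, changed) going from the good env to the bad one."""
    added = {k: bad[k] for k in bad if k not in good}
    removed = {k: good[k] for k in good if k not in bad}
    changed = {k: (good[k], bad[k]) for k in good if k in bad and good[k] != bad[k]}
    return added, removed, changed
```

Path-style variables such as `PATH`, `PYTHONPATH`, `NUKE_PATH` and `LD_LIBRARY_PATH` show up under `changed` as single long strings; splitting both values on `os.pathsep` makes an added or reordered entry much easier to spot.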

I’ve also tried your last suggestion of creating a custom copy of the Deadline Integration submitter and adding some breakpoints, etc., yet it hangs without ever showing any logging or stopping at the breakpoints. I tested my breakpoints outside of the SG environment and everything worked as expected.

I’m going slightly crazy over this one tbh.

I’ll re-read your points with a clear head in my AM, though, as doing so sometimes provides a fresh angle of attack; there must be something simple, somewhere…

Thanks again, clinton

Did you get to the bottom of this, Clinton?

I don’t know if this is related to your issue, but I’ll share it: if the error console or Script Editor is open during the session, logs from ShotGrid can crash Nuke, presumably because they are written in a thread-unsafe way.
We’ve encountered this several times.

Another thing that crashes the Deadline submitter is multithreaded submission, which depends on the settings entered in the submitter. If I remember correctly, “Set Dependencies Based on Write Node Render Order” forced submission on the main thread.
The multithreaded workflow is broken.

Hi @Patrick, I dedicated about half a day to this and couldn’t find a resolution. The issue still only affects the two problematic machines; no more, no less.

I’ll ask IT to re-image one of the boxes if it’s not too big an issue for them, and see how it goes after that.

@mmoshev, thanks for your thoughts. As the crash happens before any Deadline submission takes place, I suspect it’s related to how DL communicates with SG. But then again, it crashes before SG is actually called…

If I know more I’ll definitely update this thread.

clinton

Yeah, the problem is probably elsewhere. The submitter communicates with the pipeline tools via input/output to a subprocess, which is not really optimal.
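When one process drives another over pipes, a common way for both to hang is each side waiting on a read the other never satisfies, or a pipe buffer filling because nobody drains it. The sketch below does not reproduce the submitter’s own protocol; it is a hypothetical diagnostic wrapper, assuming Python 3.3+ (so it would run outside Nuke 11.3’s Python 2.7), for calling a child process, such as the deadlineCommand calls suggested earlier, with a hard timeout so a stuck call returns instead of freezing the caller:

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=30.0):
    """Run cmd and return (returncode, stdout, stderr) as text.

    communicate() reads stdout and stderr together, so a child that fills
    one pipe while we wait on the other cannot deadlock us. If the child is
    still running after timeout_s it is killed and returncode is None.
    Needs Python 3.3+ (the timeout argument and DEVNULL).
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,  # child sees EOF instead of waiting for input
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        out, err = proc.communicate()  # reap the child and collect partial output
        return None, out, err
    return proc.returncode, out, err


if __name__ == "__main__":
    # Harmless self-check; swap in a deadlineCommand invocation to probe the repo.
    print(run_with_timeout([sys.executable, "-c", "print('pipe ok')"]))
```

A `None` return code means the child was killed at the timeout; whatever it had written by then is still returned, which can show how far it got.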
Could you check whether it happens when the error console is not open?

To be super clear, can you confirm which error console you’re referring to?

The error console and Script Editor in Nuke. In some cases, integrations write to them in a thread-unsafe manner, which crashes Nuke.
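One way to keep integration logging from touching Nuke’s consoles off the main thread is a logging handler that hands worker-thread records over to the main thread. A sketch with hypothetical names: inside Nuke the `dispatch` argument could be `nuke.executeInMainThread` (check its signature in your Nuke version’s Python docs), and `threading.main_thread()` assumes Python 3.4+, so under Nuke 11.3’s Python 2.7 the check would need to compare the current thread’s name with `"MainThread"` instead:

```python
import logging
import threading


class MainThreadHandler(logging.Handler):
    """Logging handler that only calls `sink` from the main thread.

    Records emitted on the main thread go straight to `sink`; records from
    worker threads are passed to `dispatch(fn, args)`, which is expected to
    run fn(*args) on the main thread later.
    """

    def __init__(self, sink, dispatch):
        logging.Handler.__init__(self)
        self.sink = sink
        self.dispatch = dispatch

    def emit(self, record):
        try:
            msg = self.format(record)
            if threading.current_thread() is threading.main_thread():
                self.sink(msg)
            else:
                self.dispatch(self.sink, (msg,))
        except Exception:
            self.handleError(record)


# Assumed usage inside Nuke (not run here):
#   import nuke
#   handler = MainThreadHandler(some_sink, dispatch=nuke.executeInMainThread)
#   logging.getLogger("my_integration").addHandler(handler)
```

Records from the main thread are written immediately; records from any other thread only reach `sink` through `dispatch`, so as long as `dispatch` really defers to the main thread, the console is never written from a worker.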

One thing to check is whether any libraries are installed in the local Python site-packages that aren’t on the other machines.
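A quick way to compare per-user packages across machines is to ask the interpreter itself where its user site-packages directory is. A minimal sketch (the helper name is hypothetical) to run in each machine’s Nuke Script Editor, then compare the `entries` lists:

```python
import os
import site
import sys


def user_site_report():
    """Describe the per-user site-packages dir for *this* interpreter.

    Run it in the same Python that shows the problem (e.g. Nuke's Script
    Editor), since each Python version has its own user site directory.
    """
    path = site.getusersitepackages()
    entries = sorted(os.listdir(path)) if os.path.isdir(path) else []
    return {
        "enabled": bool(site.ENABLE_USER_SITE),
        "path": path,
        "on_sys_path": path in sys.path,
        "entries": entries,
    }
```

Launching Nuke with `PYTHONNOUSERSITE=1` set is another quick test, assuming Nuke’s embedded Python honours it: it stops Python from adding the user site directory to `sys.path`, so if the problem disappears, something in that directory is involved.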

Thanks @Ricardo_Musch. The odd thing is that nothing changed on the machines between them working and not working; literally nothing.

The challenge I have now is that both machines are in full time use, so my testing is limited to the odd after-hours session.

@mmoshev, I’ll double-check, thanks for clarifying.

Last night all machines rebooted, and now, as feared, they all exhibit the issue.

So I’ve spent today digging some more and have managed to print the error that was stalling Nuke:

Error: ArgumentException : Path '/opt/Shotgun/Shotgun' is rooted; it must be a relative path.
   at FranticX.IO.Path2.ValidateSubPath(String subPath)
   at Deadline.StorageDB.FileStorage.GetRepositoryPath(String subDirectory, Boolean checkCustom)
   at Deadline.StorageDB.PluginStorage.GetPluginDirectory(String pluginFolder, String pluginName, Boolean checkCustom, String directoryOverride)
   at Deadline.StorageDB.PluginStorage.c(String mk, String ml, String mm, Boolean mn, String mo)
   at Deadline.StorageDB.PluginStorage.GetParamsFile(String pluginFolder, String pluginName, Boolean checkCustom, String altCustomPluginDirectory)
   at Deadline.Scripting.RepositoryUtils.GetEventPluginConfig(String eventPluginName)
  File "/mnt/cgfx/DeadlineRepository10/custom/submission/Integration/Main/IntegrationUIStandAlone.py", line 94, in __main__
    integration_dialog.AddIntegrationTabs( main_dialog, appName, addDraftTab, projectManagements )
  File "/mnt/cgfx/DeadlineRepository10/custom/submission/Integration/Main/IntegrationUI.py", line 42, in AddIntegrationTabs
    config = RepositoryUtils.GetEventPluginConfig( project )
   at Python.Runtime.PyObject.Invoke(PyObject[] args)
   at Python.Runtime.PyObject.InvokeMethod(String name, PyObject[] args)
   at FranticX.Scripting.PythonNetScriptEngine.CallFunction(String moduleName, String functionName, Object[] args)

I see now what the issue is: the integration tool wants ‘Shotgun’ passed to it (which makes sense), but it’s getting the path to the Shotgun Desktop executable instead. Interesting that a reboot caused this problem.

I’ve patched it so it returns ‘Shotgun’ when required, and will look into a non-patched solution at a later stage. That said, the further I dig, the more it looks like the incorrect value is coming from the Python 2.7 argparse module included with Nuke (we are using 11.3v2). That makes a reboot being the trigger more and more plausible, as argparse is possibly discovering the executable when it was supposed to simply see a string literal (I could be off the path with that assumption, though).
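A guard along the lines of the patch described above could look like this. It is a hypothetical sketch, not the actual patch: the allow-list and function name are assumptions, and it raises on any other value so a bad argument surfaces as a readable error instead of reaching `RepositoryUtils.GetEventPluginConfig`:

```python
import os

# Hypothetical allow-list: 'Shotgun' is the value the integration expected here.
KNOWN_PROJECT_MANAGEMENT = ("Shotgun",)


def normalize_project_management(value):
    """Map a project-management argument back to a bare plugin name.

    Accepts the bare name, or an absolute path whose last component is a
    known name (like '/opt/Shotgun/Shotgun'). Anything else raises
    ValueError so the problem is reported rather than passed along.
    """
    if value in KNOWN_PROJECT_MANAGEMENT:
        return value
    if os.path.isabs(value):
        name = os.path.basename(value.rstrip("/"))
        if name in KNOWN_PROJECT_MANAGEMENT:
            return name
    raise ValueError("unexpected project management value: %r" % (value,))
```

To test the argparse theory, logging `repr(sys.argv)` at the top of IntegrationUIStandAlone.py, before any parsing happens, would show whether the path is already present in the raw arguments or only appears after parsing.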

Thanks again for your thoughts and input.

Also, I’m Australian, and to describe something as ‘rooted’ literally means it’s f#kd. So it made me laugh when the error was printed out, as in this case our SG integration is very much f#kd!
