Distributed Config and Render Farm Setup

I'm able to run it fine from the desktop app.

The error in the log files is:

    Traceback (most recent call last):
      File "/home/dfulton/private/.shotgun/flightschool/p94.basic./cfg/install/core/python/tank/util/loader.py", line 55, in load_plugin
        module = imp.load_source(module_uid, plugin_file)
      File "/home/dfulton/private/.shotgun/bundle_cache/app_store/tk-maya/v0.9.7/engine.py", line 23, in <module>
        import maya.OpenMaya as OpenMaya
    ImportError: No module named maya.OpenMaya
    2019-10-24 16:12:50,576 [32683 INFO sgtk.core.bootstrap.manager] Progress Report (0%): Resolving project…
    2019-10-24 16:12:50,581 [32683 INFO sgtk.core.bootstrap.manager] Progress Report (10%): Resolving configuration…

In Maya's script editor it gives me an error when importing sgtk, so it seems to fail when loading the Shotgun panel.

Comparing the environments, the desktop app picks up all our environment files and the tk-shell one does not.

Pretty much, just running this starts Maya, but it doesn't pick up any of the environment files or give access to Shotgun:

    import sgtk

    sa = sgtk.authentication.ShotgunAuthenticator()
    user = sa.get_user()
    sgtk.set_authenticated_user(user)

    project = {"type": "Project", "id": 94}

    mgr = sgtk.bootstrap.ToolkitManager(sg_user=user)
    mgr.plugin_id = "basic."
    mgr.base_configuration = "sgtk:descriptor:dev?linux_path=/local/dfulton/code/fs_shotgun"

    engine = mgr.bootstrap_engine("tk-shell", entity=project)
    engine.execute_command("maya_2018")

I've also tried replacing mgr.base_configuration with "sgtk:descriptor:app_store?name=tk-config-basic", and setting mgr.pipeline_configuration to both Primary and my sandbox, with the same result. The only way I seem to be able to get this to work is to put the arguments path on the software entity on the Software page, which doesn't really allow for flexibility.

We've also written a before_app_launch.py hook to set our environment paths, which just seems to be ignored when launching from tk-shell.

By the way, the error in Maya's script editor is:

    # Error: Shotgun: Cannot restore panel_tk_multi_shotgunpanel_main: Shotgun is not currently running #
    onSetCurrentLayout "Maya Classic";
    # Error: Shotgun: Could not import sgtk! Disabling for now: No module named sgtk #

Hello again. Wondering if it might be a better idea to have a check in the publisher that takes said file and opens it with mayabatch via subprocess.Popen(), runs all the processes and commands I need, then bootstraps into tk-shell and runs tk-multi-publish2 to publish the processed file, in more of an outside approach.
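
Something like this, as a minimal sketch (the scene path and the MEL processing script are placeholders, and the project entity/plugin id are just reused from my earlier snippet; on Windows the executable would be mayabatch.exe rather than maya -batch):

    # Sketch: process the file headlessly, then bootstrap tk-shell to publish.
    import subprocess

    import sgtk

    # 1. Run the heavy processing in a batch Maya session
    #    (Linux-style invocation; placeholder paths).
    proc = subprocess.Popen(
        ["maya", "-batch",
         "-file", "/path/to/scene.ma",
         "-script", "/path/to/process_file.mel"]
    )
    proc.wait()

    # 2. Bootstrap tk-shell; tk-multi-publish2 can then be driven from the
    #    resulting engine to publish the processed file.
    user = sgtk.authentication.ShotgunAuthenticator().get_user()
    sgtk.set_authenticated_user(user)

    mgr = sgtk.bootstrap.ToolkitManager(sg_user=user)
    mgr.plugin_id = "basic."
    engine = mgr.bootstrap_engine("tk-shell", entity={"type": "Project", "id": 94})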

any thoughts on this would be much appreciated

Hey sorry for the delay in getting back to you, I was off last week.

This part of the error is intriguing:

    File "/home/dfulton/private/.shotgun/bundle_cache/app_store/tk-maya/v0.9.7/engine.py", line 23, in <module>
      import maya.OpenMaya as OpenMaya

Is that coming from the tk-maya.log file? If that code is running within Maya, I'm not sure why it should be failing to import that module. Is that code actually running within Maya?

Please could you private message me your environment output, or drop a ticket into support so I can take a closer look. I’m not sure what’s going wrong here yet.

This could work, though you shouldn’t need to do that in order to avoid this issue.

I’m going to try and mock up the Maya batch approach this afternoon, and see how it goes.

Hi David.

I just had a go myself, passing the args via the software entity, and that worked for me as well. My script was able to make use of the bootstrapped sgtk engine.

In terms of flexibility, I agree doing it that way would limit you, but you should be able to modify the launch args via the app_launch.py hook, and set any required settings in either that hook or the before_app_launch.py hook, rather than via the software entity.
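
For illustration, an app_launch.py override could look something like this (a hedged sketch: the execute() signature and return value mirror the shape of tk-multi-launchapp's default app_launch hook, but check the copy shipped with your config; the extra args and env var here are made-up examples):

    # Hypothetical override of tk-multi-launchapp's app_launch.py hook.
    import os

    import sgtk

    HookBaseClass = sgtk.get_hook_baseclass()


    class AppLaunch(HookBaseClass):
        def execute(self, app_path, app_args, version, engine_name, **kwargs):
            # Inject extra launch args here instead of hardcoding them on
            # the Software entity (example args only).
            app_args = "%s -V 2 -t" % (app_args or "")

            # Any environment the DCC needs can be set before launch.
            os.environ["MY_STUDIO_TOOLS"] = "/path/to/studio/tools"  # made up

            # Launch the DCC and report back, mirroring the default hook.
            cmd = "%s %s" % (app_path, app_args)
            exit_code = os.system(cmd)
            return {"command": cmd, "return_code": exit_code}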

I've tested that here as well and didn't run into any problems. Have you definitely added the hook to the same config you're bootstrapping? By default, the tk-shell and tk-desktop engines use the same settings, so if it works from SG Desktop it should work from your shell.
I'd be happy to take a look at your config if you'd like. Again, feel free to message it to me directly.

Cheers
Phil

I'm having another stab at this and can't get it working.
I'm successfully injecting custom args into the app_launch hook, in this case specifying -V 2 -t somepathtoscript.nk.
The problem is that the SG Nuke engine is still failing, so the script (which is a valid script for the project) fails with the following errors. Am I doing something wrong? Or is it not possible to launch Nuke in command-line mode (-t) via tk-shell (tank shell)?

The app launch command is submitting the following:

    start /B "App" "C:\Program Files\Nuke11.3v4\Nuke11.3.exe" -V 2 -t "//redacted/DEVJan30_0020_lighting_bootstrapDevTest_v002_blacksmith.nk"

The error is the usual "shotgunwrite1: writetank: unknown command".

In the Nuke shell that loads, I'm unable to interact, as it seems ambiguous which shell I'm entering commands into; e.g. import nuke runs without error, but if I then evaluate nuke it says the module doesn't exist. Very strange!

Philip, can you run through a test where you:

  • add custom args to app_launch to make it launch into a command-line Nuke session and load a valid SG path
  • open a shell with tank shell
  • launch Nuke with engine.execute_command("yournukecommand")

Make sure your test file has a Shotgun write node, to test that it loads OK.

Thanks again!
p.

I’ll give it a go this afternoon and get back to you!

Hi @Patrick, one thing that just crossed my mind: what context are you running in? If you launched with tank shell then that would be just a project context, and the write node is not added to that environment by default.
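
For example (the shot id here is hypothetical), launching the shell against a specific entity gives you that entity's context instead:

    # opens tk-shell in the context of Shot 1234 rather than the project
    tank Shot 1234 shell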

Just to follow up on this…

To make the original question more specific: has the farm_wrapper approach to farm publishing been tested by the SG devs with a distributed config? The more I dig into it, the more it feels like it's not possible (with Deadline, at least).

The problem is this:
If we're using distributed configs with local caching (which appears to be the current best practice), then we need to launch DCCs (let's concentrate on Nuke here) in the same way that we do from a workstation. The reason is that we can't know the path to the tk-nuke-writenode gizmo until it has been cached on the render node, which only happens if Nuke is launched with the Nuke engine (whose path we also don't know until it has been cached locally).

It’s a bit of a catch-22.

Yes, we could write our own custom Deadline plugin to launch DCCs via engine.execute_command("some_dcc"), but this is a non-trivial task (take a look at the Deadline Nuke plugin and see how non-trivial that setup is for correct handling of Nuke tasks and stdout).

As for the simpler issue of passing arguments to tank shell: I saw another post you wrote suggesting the context was the issue, and that does indeed fix the problem for these cases… but for the farm_wrapper on Deadline, it won't cut it.

//edit

I got to the bottom of the problem.

As it stands, without using the launch app, the engine isn't cached to wherever the bundle_cache path is set.

So, in my Deadline pre-job-load script, I'm using the descriptor of the submitting engine to cache that engine locally. I can then inject the path of the locally cached engine into my PYTHONPATH (or NUKE_PATH or MAYA_SCRIPT_PATH), so that when Deadline uses its normal DCC launch code, it launches with a working SG engine.

I gave up on attempting to use launch_app, as it is a non-trivial task to edit Deadline plugin definitions to launch via a tk-shell instance, and I imagine it would break all of Deadline's progress and error reporting… not to mention that it wouldn't have been possible to pass Deadline arguments through tk-shell launch app calls anyway.

More specifically, it only happens after the engine has been bootstrapped. The bootstrap process of an engine is what ensures all the dependencies have been cached.

There isn't one way to do this, really. Launching via the launch app is one way of doing it, but as you say, with Deadline at least, that's probably not so simple, as you need to modify the Deadline plugins to handle launching via Toolkit.

The approach that has usually been suggested is just to ensure that the env vars are passed to the job, and that you have a bootstrap script set up to run on the farm once the software has been launched.

I think my preferred approach to handling Nuke write nodes would be to have a pre job ahead of the main render job that, once Nuke was launched, bootstrapped Toolkit, converted the SG write nodes to standard write nodes, and saved a temporary Nuke script; the main render job would then run off that.
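
Roughly, the conversion part of that pre job could look like this (a sketch only, assuming it runs inside a Nuke session where tk-nuke has already been bootstrapped; the script paths are placeholders, and it uses the tk-nuke-writenode app's convert_to_write_nodes() method):

    # Pre-job conversion step inside a bootstrapped Nuke session.
    import nuke
    import sgtk

    # Assumes the tk-nuke engine is already running in this session.
    engine = sgtk.platform.current_engine()
    app = engine.apps.get("tk-nuke-writenode")

    nuke.scriptOpen("/path/to/original_script.nk")  # placeholder path

    if app:
        # Swap Shotgun write nodes for vanilla Nuke Write nodes.
        app.convert_to_write_nodes()

    # Save a temporary script for the main render job to render from.
    nuke.scriptSaveAs("/path/to/tmp_render_script.nk")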

I think what you're saying works, but it's not ideal. Wouldn't it make more sense for SG to provide a more robust method for bootstrapping that works on the farm, rather than having to hack a solution? Passing environment variables won't work if they point to files cached on a user's local machine, so it's not an approach that's compatible with SG's distributed config paradigm.

What should happen is that users can run a simple SG method on the farm and have any DCC work out of the box. As it stands, we're left scratching our heads a little, as there are a number of possible solutions, none of which feels adequate or robust for all situations.

I'm curious to hear what the devs think about this. Am I being overly pedantic (which would come as no surprise :slight_smile: ) in looking for an elegant solution here?

You shouldn’t need to point to any local files on disk.
At a bare minimum, all you need to know in order to bootstrap is:

  • A path to a sgtk API, so you can import the API that will initiate the bootstrapping (this does not need to be project specific, so it could just be a copy of the sgtk API stored on a server/location the farm can reach).
  • The entity type and id of the context you want to bootstrap into.
  • The engine name to bootstrap (though this could be hardcoded in the bootstrap script, since you will likely have a startup script that performs the bootstrapping per software anyway).

I appreciate it can be a pain to set those env vars during submission, but if you already have a custom submission tool, it's fairly simple to add.
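
For example (hedged, since the exact mechanism depends on your submission tooling), with Deadline one option is EnvironmentKeyValue entries in the job info file; the variable names here are just examples:

    EnvironmentKeyValue0=SHOTGUN_ENTITY_TYPE=Shot
    EnvironmentKeyValue1=SHOTGUN_ENTITY_ID=1234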

Ok I managed to figure out where I was going wrong.

  • I was adding the engine path only to NUKE_PATH; I didn't realise I needed to add the classic_startup folder to the path.
  • I also needed to add the TANK_CONTEXT and TANK_ENGINE env vars to our environment for the classic bootstrap to function.

So, unless I am mistaken, to get this working with a distributed config, our Deadline globalJobPreload has to do the following:

  • Bootstrap tk-shell into the correct context.
  • Get the pipeline configuration environment (for the relevant entity type, e.g. shot, asset, etc.).
  • Get the DCC engine environment, e.g. tk-nuke.
  • Get the descriptor from the engine environment.
  • Run the descriptor's ensure_local() method.
  • Get the path of the engine's local cache.
  • Add this path to the DCC env vars (e.g. for Nuke it's the descriptor cache path plus classic_startup).

With this in place, when Deadline launches Nuke, SG bootstraps correctly and the tk-writenode gizmo is available. In code, the steps above look roughly like the sketch below.
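
This is a sketch only: the environment/descriptor lookups use internal tk-core APIs that can change between releases, the env var names match the earlier Deadline example, and the "shot" environment name is an assumption that depends on your pick_environment hook.

    # Sketch of a Deadline globalJobPreload step for a distributed config.
    import os

    import sgtk

    user = sgtk.authentication.ShotgunAuthenticator().get_user()
    sgtk.set_authenticated_user(user)

    entity = {
        "type": os.environ["SHOTGUN_ENTITY_TYPE"],  # assumed var names
        "id": int(os.environ["SHOTGUN_ENTITY_ID"]),
    }

    # 1. Bootstrap tk-shell; this caches the config and its bundles locally.
    mgr = sgtk.bootstrap.ToolkitManager(sg_user=user)
    mgr.plugin_id = "basic.shell"
    engine = mgr.bootstrap_engine("tk-shell", entity=entity)

    # 2-4. Resolve the tk-nuke descriptor from the relevant environment.
    # These are internal tk-core APIs and may differ between releases.
    tk = engine.sgtk
    env = tk.pipeline_configuration.get_environment("shot")  # env name assumed
    descriptor = env.get_engine_descriptor("tk-nuke")

    # 5-6. Ensure the engine is cached locally and get its path.
    descriptor.ensure_local()
    engine_path = descriptor.get_path()

    # 7. Point Nuke at the engine's classic startup scripts and set the env
    # vars the classic bootstrap expects.
    startup = os.path.join(engine_path, "classic_startup")
    existing = os.environ.get("NUKE_PATH")
    os.environ["NUKE_PATH"] = (
        os.pathsep.join([existing, startup]) if existing else startup
    )
    os.environ["TANK_ENGINE"] = "tk-nuke"
    os.environ["TANK_CONTEXT"] = sgtk.context.serialize(engine.context)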

This also means that the farm_wrapper code that bootstraps can be disabled, as SG will already be instantiated and in the correct context for the publish.

It took a while but I finally put all the pieces of the puzzle together. Nice!

Nice work. I've read over this a number of times and I think I'm following along, but let me know if I'm missing anything.

With your approach, though, I have some questions:

  1. I'm not sure what the Deadline globalJobPreload is (it's been a few years since I've worked with Deadline). Does that run per job before any of the actual frames are rendered, i.e. only once, on one machine? If so, I guess that's what I've been referring to as a pre job.
  2. If you are relying on getting the gizmo path so you can pass it to the main job (so that Shotgun doesn't need to bootstrap but can still render the SG write node), then if you are bootstrapping on one machine, won't the path be local to that machine, so other nodes can't access the gizmo? Or do you have a centralised bundle cache?
  3. I'm not quite sure why you need to set TANK_CONTEXT and TANK_ENGINE; you're not actually bootstrapping the tk-nuke engine, are you? Or maybe you are? As I understood your steps, you're bootstrapping the shell engine and then doing some extra work to ensure the Nuke write node is pulled down, since it wouldn't be by default (it's not in the tk-shell engine's environment), then grabbing the path to the gizmo and providing it to the main job in an env var.

Here's what I would expect you to need to do for a straightforward bootstrap in Nuke (or tk-shell; you would just need to tweak the plugin id and engine):

  1. Have a custom Shotgun init.py startup script defined in the NUKE_PATH.

  2. Define env vars containing an appropriate entity type and entity id. The actual environment variable names don't matter, as the custom init.py script is responsible for reading them, so call them whatever you like.

  3. Deadline starts Nuke in whatever way it sees fit.

  4. The startup script will be invoked by Nuke because it was defined in the NUKE_PATH. The startup script would look something like this:

    # Nuke init.py
    import os
    import sys

    # import a standalone copy of the sgtk API
    sys.path.append("/a/path/on/your/server/tk-core/python")

    import sgtk

    # optionally enable debugging
    sgtk.LogManager().global_debug = True

    # Instantiate the authenticator object.
    authenticator = sgtk.authentication.ShotgunAuthenticator()

    # Create a user programmatically using the script's key.
    user = authenticator.create_script_user(
        api_script="Script Name",
        api_key="4e48f....<use the key from your Shotgun site>",
        host="https://yoursite.shotgunstudio.com"
    )

    # Tell Toolkit which user to use for connecting to Shotgun.
    sgtk.set_authenticated_user(user)

    entity_type = os.environ["SHOTGUN_ENTITY_TYPE"]
    entity_id = int(os.environ["SHOTGUN_ENTITY_ID"])

    def pre_engine_start_callback(ctx):
        # synchronize the path cache before Toolkit attempts to load the apps
        ctx.sgtk.synchronize_filesystem_structure()

    mgr = sgtk.bootstrap.ToolkitManager(sg_user=user)
    mgr.plugin_id = "basic.nuke"
    mgr.pre_engine_start_callback = pre_engine_start_callback

    engine = mgr.bootstrap_engine("tk-nuke", entity={"type": entity_type, "id": entity_id})


Now, if one were following my suggestion, this process would run as a pre job: you would also open the Nuke script, run the write node conversion, and save a temporary script for the main job to render from, thus saving all the subsequent slaves from having to bootstrap Toolkit as well. Only one slave would need to bootstrap in order to convert the Shotgun write nodes.

Hi Phil,
I think what I'm trying to do is build a more robust and software-agnostic (and maybe more portable) approach. Your suggestion makes sense for renders, but less so for publish jobs, which do require SG. Perhaps I've not hit any issues with bootstrapping all slaves because I'm not scaling in a big way yet?

You’ve not quite followed my thinking in your summary.
(1) Correct, I am using the pre job script to set up SG.
(2) The bundle cache path could be either centralised or local, depending on whether the user is in-house or remote. I'm trying to ensure either would work.
(3) I am bootstrapping tk-nuke because, perhaps incorrectly, I'm trying to create a setup that simply recreates a local workstation environment. Adding tk-nuke/classic_startup to NUKE_PATH appears to work fine for launching Nuke with SG working as expected. I'm no longer doing anything special to ensure the gizmo is pulled down, as including classic_startup in NUKE_PATH does this for me; nor am I grabbing the gizmo path, as classic_startup (well, engine.py) adds any gizmo paths to Nuke's plugin path on startup for me.

The only issue with the gizmo not being set up was that Nuke was launching into the "shot" environment rather than "shot-step" from the provided context entity. Perhaps I'm pulling the wrong entity from the context here? I'll need to investigate that. Adding the write node app to the shot and asset envs resolves the issue anyway.

One thing I've perhaps not described in detail here is that we are using SG to handle non-SG software packages: custom Python libraries, DCC configs, third-party libraries, etc. We upload these to SG as Toolkit bundles and have a custom method for caching them locally and adding them to the environment. On workstations, this is handled in the engine_init and app_launch hooks. On the farm, we need to do the same; so we do need to instantiate tk-shell so we can grab the correct Toolkit bundles, cache them locally, and run their app_launch hooks to prepare the environment.

It works really well for us and makes development, version control, and configuration quite straightforward. Given that we have to instantiate tk-shell in the job pre-load anyway, it makes sense to pass the engine bootstrap path to the DCC and let SG do all the work of preparing the SG environment in the DCC. I get the feeling this approach is against the ethos of SG?

Shall we do a screen share so I can walk you through this setup? I'm clearly not doing a good job of describing things here :slight_smile:

Thanks again
p.

Hey Patrick, I messaged you privately. I may not be able to give this topic the attention it deserves until the new year, but it will definitely be on my list!

Hey Philip!

I'm glad to hear that you'll be looking at this after the holidays. I'm working with @Patrick on the issues he described, and we'll need to work through them to get our client's pipeline functioning the way they want (which is going to be super awesome!). Should we open a support ticket, or do you prefer continuing on this forum?

-louai

Hey Louai

Welcome to the forums! Yeah, I'm happy to move this over to a support ticket for now. It would be good to feed back here once we have some kind of resolution to the discussion.

Thanks
Phil

Just to round back here: I chatted with @Patrick the other day, and it seems that he has a setup that's working for him. There really is no one way to approach farm integration.

The main thing you want to avoid is bootstrapping on too many render nodes simultaneously, as this can affect both your Shotgun site performance and render speed. For example, if you had a hundred nodes all rendering Nuke frames, you wouldn't want all of them bootstrapping.

So it is important to design your process to limit the number of interactions with Shotgun, by converting Shotgun write nodes before the main render jobs, or by running publish processes after the main frames have been created.

@philip.scadding I just read through this whole thread and got a good amount of information from it, but I would still love to see that guide you've been working on.

I’ve been trying to figure out how to get Shotgun + Deadline working on the farm w/ Nuke for the past few days and it is not easy!
