

Stoke MX Asynchronous Cache

Work In Progress, Last Edited: April 25, 2013 

Overview

Stoke MX implements a completely new Asynchronous Caching System that has not been used in any previous Thinkbox Software product. It allows for better playback performance and faster simulation iterations, using either all available resources or a user-defined portion of the memory and processing threads.

Sequential vs Parallel Processing

Traditionally, the typical particle advection workflow using FumeFX, Particle Flow and Krakatoa is fully sequential. FumeFX provides data to the FumeFX Follow operator, which assigns velocities to the particles. The particles are then moved to their new positions according to these velocities and passed to Krakatoa for saving. Krakatoa then saves a PRT file, which involves compressing the particle data stream using zlib and writing it to a large PRT file. Both the Particle Flow advection and the Krakatoa saving stages are single-threaded, and the saving stage usually takes a lot longer than the advection stage.

In contrast, Stoke MX decouples the advection process from the saving process and parallelizes both using all available CPUs and cores, or a fraction thereof. This is possible thanks to the Memory Cache / Pending Buffer described further on this page.

The following graph shows a comparison between the sequential and asynchronous approaches to processing and saving the data. 

Note that the Stoke simulation is also fully multi-threaded, so it is already significantly faster than the single-threaded Particle Flow simulation. However, if the PRT saving had to be performed sequentially on a single thread, it would negate much of that speedup.

The following table illustrates the benefit of running multiple saving threads in the background:

The first column shows the simulation, saving and total time of Particle Flow advecting up to 10 million particles over 100 frames using a FumeFX simulation and the FumeFX Follow operator, and saving to a PRT sequence using Krakatoa. Particle Flow was set to the default Half Frame integration interval; changing it to Full Frame would cut the simulation time in half.

The second column shows the performance of Stoke using 8 threads to simulate and only one thread to save to PRT. While the pure simulation performance is over 10 times higher than with Particle Flow, this advantage is negated by the comparable saving time, resulting in an overall speedup of only 1.4 times.

Adding more saving threads significantly improves the saving performance at the cost of slightly longer simulation times, because the 8 simulation threads compete with more and more saving threads for processor time. When saving with 4 threads, Stoke is already 3.5 times faster than Stoke with one thread, and nearly 5 times faster than the Particle Flow setup!

Adding 4 more saving threads has much less effect due to the nature of Hyper-Threading, but it still squeezes out a bit more performance, for a total speedup of over 6.5 times compared to Particle Flow.

Note that a large part of the saving process happens in the background while the user has full interactive control over 3ds Max and Stoke, so the actual Stoke simulation time (until the user regains control) is between 40 and 90 seconds. If the time needed to save all frames to disk does not matter to the user, but the time until they can be productive in the 3D application again does, saving with one thread is preferable: it is over 38 times faster than the alternative Particle Flow approach.
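The decoupling of simulation from saving described above is a classic bounded producer/consumer pipeline. The following is a minimal Python sketch of that pattern only, not Stoke's implementation: `simulate_frame` is a hypothetical stand-in for a simulation step, a bounded `queue.Queue` plays the role of the Pending Buffer, and compressed bytes kept in a dictionary stand in for PRT files on disk.

```python
import queue
import threading
import zlib


def simulate_frame(frame):
    """Hypothetical stand-in for one simulation step: returns raw particle bytes."""
    return bytes([frame % 256]) * 100_000


def simulate_and_save(num_frames, num_save_threads, pending_limit):
    """Simulate on the calling thread while background threads compress and 'save'."""
    pending = queue.Queue(maxsize=pending_limit)  # bounded stand-in for the Pending Buffer
    saved = {}                                    # stand-in for PRT files on disk
    saved_lock = threading.Lock()

    def saver():
        while True:
            item = pending.get()
            if item is None:          # sentinel: no more frames will arrive
                return
            frame, data = item
            compressed = zlib.compress(data)  # the expensive part of writing a PRT
            with saved_lock:
                saved[frame] = compressed

    savers = [threading.Thread(target=saver) for _ in range(num_save_threads)]
    for t in savers:
        t.start()

    for frame in range(num_frames):
        # put() blocks only when the buffer is full, so simulation rarely waits.
        pending.put((frame, simulate_frame(frame)))
    # Here the simulation is finished and the user could regain control; the
    # savers keep draining the buffer in the background (the "flush time").

    for _ in savers:
        pending.put(None)  # one sentinel per saving thread
    for t in savers:
        t.join()
    return saved
```

Calling `simulate_and_save(20, 4, 5)` simulates 20 frames with 4 saving threads and room for 5 pending frames. Raising `num_save_threads` mirrors the table above: more consumers drain the buffer faster, but in a real system they also compete with the simulation threads for CPU time.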

Memory Buffering And Performance Balancing

The amount of memory allocated for caching only affects the "Time To Be Interactive" measurement, that is, the time before Stoke lets the user work creatively with the software again. Larger Memory Limits on machines with huge amounts of RAM help reduce the waiting time until the simulation is finished, but the total time from the moment the simulation starts to the moment the last PRT frame is written to disk remains constant.

The following chart shows the same simulation as above, but performed with different Memory Limits:


The simulation was performed on a machine with 32 GB of RAM, and all frames of the whole simulation fit in 19 GB of memory. With a buffer of 24 GB, Stoke could hold all the data and never had to wait for memory to become free before simulating the next frame. With 16 GB this was still the case, because the 8 threads saving in the background freed up some memory before the next simulated frame was ready.

At around 15 GB, Stoke encountered cases where there was not enough memory to store the currently simulated frame, resulting in some waiting time. As a result, the simulation time increased by 100 seconds. But since the 8 saving threads never stopped saving in any of the test configurations, the total time remained constant: the flush time (the time from the end of the simulation until the last frame was written to disk) decreased by 100 seconds, balancing out the overall performance!
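The trade-off between simulation time and flush time can be reproduced with a toy timing model. This is a simplified sketch under stated assumptions, not a model of Stoke's internals: one producer simulates frames in order, all saving threads are lumped into a single consumer with a fixed per-frame save time, the buffer size is counted in frames rather than GB, and the numbers are made up rather than taken from the measurements above.

```python
def pipeline_times(num_frames, sim_time, save_time, buffer_frames):
    """Toy model of the simulate/save pipeline; returns (time_to_interactive, total_time).

    A frame stays in the buffer until it has been saved, and the buffer holds at
    most buffer_frames frames (buffer_frames must be >= 1).
    """
    entered = []  # when each frame was placed in the buffer
    saved = []    # when each frame finished saving and left the buffer
    for i in range(num_frames):
        sim_end = (entered[i - 1] if i > 0 else 0.0) + sim_time
        # Wait until frame i - buffer_frames has been saved, freeing a slot.
        slot_free = saved[i - buffer_frames] if i >= buffer_frames else 0.0
        entered.append(max(sim_end, slot_free))
        # The saver starts this frame once it is buffered and the previous one is done.
        save_start = max(entered[i], saved[i - 1] if i > 0 else 0.0)
        saved.append(save_start + save_time)
    return entered[-1], saved[-1]


print(pipeline_times(100, 1.0, 3.0, 100))  # (100.0, 301.0)
print(pipeline_times(100, 1.0, 3.0, 10))   # (271.0, 301.0)
```

Shrinking the buffer from 100 to 10 frames delays the moment the simulation finishes, but because saving is the bottleneck and the saver never idles, the time at which the last frame reaches disk does not change.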

Stoke Cache Components

The Stoke MX Cache System consists of the following components:

Memory Cache 

  • The Memory Cache is used mainly to speed up viewport playback.
  • Playback directly from memory is naturally much faster than reading every frame from disk (the Krakatoa PRT Loader for example reads every frame from disk).
  • The drawback of memory playback is that it consumes a lot of memory very quickly, especially with large particle counts.
  • Stoke MX allows you to specify a Memory Limit to determine exactly how much RAM will be used to accelerate viewport playback. 
  • Setting the Memory Limit to 0 will force Stoke MX to behave exactly like PRT Loader and reload every frame from disk, then drop it and load the next frame as the current time changes. 
  • Stoke MX also lets you cache Every Nth frame to memory and interpolate between these samples at playback time, thus reducing the memory usage significantly.
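This page does not say how Stoke interpolates between cached samples. As an illustration only, the sketch below uses plain linear interpolation of positions, assuming the same particles appear in the same order in both neighbouring samples; a production system would also have to match particles by ID and handle births and deaths. The function name and cache layout are hypothetical.

```python
def sample_positions(cache, nth, frame):
    """Return particle positions at `frame` from samples cached every `nth` frame.

    `cache` maps sampled frame numbers to lists of (x, y, z) tuples. Assumes the
    same particles, in the same order, exist in both neighbouring samples.
    """
    lo = (frame // nth) * nth          # nearest cached sample at or before frame
    hi = lo + nth                      # next cached sample
    if frame == lo or hi not in cache:
        return list(cache[lo])         # exact hit, or past the last sample: hold
    w = (frame - lo) / nth             # blend weight toward the later sample
    return [
        tuple(a + (b - a) * w for a, b in zip(p_lo, p_hi))
        for p_lo, p_hi in zip(cache[lo], cache[hi])
    ]
```

With samples at frames 0 and 5, frame 2 blends 60% of frame 0 with 40% of frame 5, so only one fifth of the frames has to be held in memory.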

Pending Buffer 

  • The Pending Buffer is used mainly to speed up the simulation process and the saving of PRT files to disk.
  • It is filled with particle data during simulation, and its one or more background threads continue saving PRT files after the simulation has finished.
  • This lets Stoke parallelize the PRT output by buffering the simulation data and allowing the simulation to continue without waiting for the data to be saved.
  • The Pending Buffer remains in memory even if the 3ds Max scene is released from memory and a new scene is loaded.
  • If 3ds Max is about to be shut down and the Pending Buffer still contains data, you will be prompted to wait until all data is saved, or discard the data and shut down anyway.

Disk Cache 

  • The Disk Cache is used to permanently store the simulation results and at the same time is connected with both the Memory Cache and the Pending Buffer - the former is fed by the Disk Cache when a frame that is not in memory has to be played back, while the latter generates the Disk Cache during and after simulation. 
  • The Disk Cache uses regular Krakatoa-compatible PRT files and can be used directly as Stoke's simulation output.

Cache Controls

Use Disk Cache option 

  • This option is checked by default.
  • When unchecked, all simulation data will be stored only in the Memory Cache. 
  • When checked, the data will be stored in the Disk Cache and can be recalled when needed, including in future 3ds Max sessions.
  • If the data exceeds the Memory Limit, previous frames will be dropped. If the Use Disk Cache option is off, that data is lost. If it is on, only frames that have already been saved will be dropped, ensuring no data loss.

Memory Limit 

  • The Memory Limit defines the total amount of memory to be used for the Memory Cache and Pending Buffer combined.
  • The memory distribution between the Memory Cache and the Pending Buffer is performed dynamically and automatically (see Cache Operation During Simulation below).
  • When the Limit is set to 0, some memory will still be used to store the current frame. In that case, simulation will perform poorly because all data has to be processed and saved sequentially, and playback after simulation is forced to read each frame from disk.

Cache Nth 

  • Due to the ability of the Stoke MX object to interpolate data from relatively few samples, it is advisable during initial testing to cache only every Nth frame (for example every 2nd, 5th or even 10th frame).
  • This will save a significant amount of memory, while still providing enough data to judge the quality of the simulation.
  • Once the simulation parameters have been fine-tuned, simulating one last time with Every 1st will produce the final output.
  • Note that the Disk Cache will also store only every Nth frame, making that sequence unusable for final rendering via a PRT Loader. The Stoke MX object itself, though, can be rendered directly and will interpolate the in-between frames as well as possible.

Cache Operation During Simulation

Simulating With "Use Disk Cache" On 

  • When the simulation is started, if the Use Disk Cache option is checked, the Memory Limit will be split unevenly between the Memory Cache and the Pending Buffer as follows:
  • If the Memory Limit is less than 1024 MB, the Memory Cache will be set to 128 MB and the rest will be given to the Pending Buffer.
  • If the Memory Limit is more than 1024 MB, 512 MB will be given to the Memory Cache and the rest will be allocated for the Pending Buffer.
  • During simulation, the Memory Cache plays no role, but the Pending Buffer is very busy writing out the PRT data. By providing a buffer large enough to hold the simulation without waiting for the background threads to free up space for newly simulated frames, Stoke MX can make the actual simulation time dozens of times shorter!
  • If the Memory Limit is set to 0, each frame will be simulated and then saved sequentially, defeating the purpose of the Pending Buffer system.

Simulating With "Use Disk Cache" Off 

  • If the Use Disk Cache option is unchecked, there will be no saving to PRTs during simulation, so all memory defined by the Memory Limit is allocated to the Memory Cache.
  • During the simulation, all frames are stored in the Memory Cache and can be played back immediately from memory when the simulation finishes.
  • If the amount of simulation data exceeds the Memory Limit, earlier frames will be dropped and lost, leaving only the latest frames in memory.
  • In that case, it is advisable to either increase the Cache Nth value to reduce the memory usage, or enable Use Disk Cache.
  • If the Memory Limit is set to 0, only the last frame simulated will remain in memory.
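The budget split described in the two subsections above can be summarized as a small function. It is a sketch of the stated rules, not Stoke code; the text does not cover a limit of exactly 1024 MB or a limit smaller than 128 MB, so the handling of those two cases below is an assumption.

```python
def split_memory_limit(limit_mb, use_disk_cache):
    """Return (memory_cache_mb, pending_buffer_mb) for a given Memory Limit."""
    if not use_disk_cache:
        # No PRT saving during simulation: everything goes to the Memory Cache.
        return limit_mb, 0
    # 128 MB below 1024 MB and 512 MB above it, per the text; treating exactly
    # 1024 MB as the larger tier is an assumption.
    cache_mb = 128 if limit_mb < 1024 else 512
    cache_mb = min(cache_mb, limit_mb)  # assumption: never exceed the limit itself
    return cache_mb, limit_mb - cache_mb
```

For example, a 4096 MB limit with Use Disk Cache on gives 512 MB to the Memory Cache and 3584 MB to the Pending Buffer.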

Cache Operation During Playback

Playback With "Use Disk Cache" Off 

  • If Use Disk Cache was off during simulation, all frames will already be in memory, assuming the Memory Limit was high enough.
  • Playback will be fast and directly from memory.
  • If the scene is closed or the Stoke object is deleted, the simulation data will be lost. Opening the scene in a new session will require simulating again. 

Playback With "Use Disk Cache" On 

  • If Use Disk Cache was on during simulation, frames will be either in memory, or in PRT files on disk.
  • Playback might be slow if a frame has to be fetched from disk, but once it is cached in memory, it will be fast again.
  • Once all frames are saved to PRTs, the scene can be closed or the Stoke object deleted, but the data will not be lost and could be restored from the Disk Cache in a new session.
  • If the Memory Limit is reached and a frame has to be played back that is not in memory, another frame that is already saved to the Disk Cache will be dropped from memory to make room. This way, the Memory Cache will try to stay populated around the current playback time and free up memory from the farther regions of the simulation range.
  • If the Memory Limit is set to 0, each frame will be loaded from the Disk Cache and only the current frame will be kept in memory, emulating the PRT Loader's behavior. 
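The eviction rule for playback with the Disk Cache on can be sketched as a victim-selection function. The page only says that an already-saved frame is dropped and that memory is freed from the farther regions of the range; picking the saved frame farthest from the current time, and breaking ties toward the earlier frame, is an illustrative assumption, and the function name is hypothetical.

```python
def choose_frame_to_drop(cached_frames, saved_frames, current_frame):
    """Pick a cached frame to evict when the Memory Limit is reached.

    Only frames already written to the Disk Cache are eligible, so nothing is
    lost; among those, the one farthest from the current time is dropped.
    Returns None if no cached frame has been saved yet (the caller must wait).
    """
    candidates = sorted(f for f in cached_frames if f in saved_frames)
    if not candidates:
        return None
    # max() keeps the first maximum, so ties go to the earlier frame.
    return max(candidates, key=lambda f: abs(f - current_frame))
```

With frames 0, 1, 2, 8 and 9 in memory, frame 9 not yet saved, and the time slider at frame 3, frame 8 is dropped: it is the saved frame farthest from the playback position.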

Resetting The Memory Cache

Resetting The Cache 

  • Resetting the Memory Cache is advisable when the results of the current simulation look unsatisfactory, but there is still data being written to PRTs by the Pending Buffer background thread.
  • Resetting the Memory Cache will cause all data to be removed from memory, and the Pending Buffer thread to be stopped after it finishes saving its current frame. 
  • NO PRT FILES will be deleted in this process though, so any already saved files will be left and can be overwritten by the next simulation, or deleted by hand.
  • Starting a new simulation by pressing the SIMULATE button will also cause an implicit reset of the Memory Cache and the Pending Buffer to remove any traces of the previous simulation iteration. If you want to keep the results, you should Flush the cache manually. 
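The reset behaviour (stop after the current frame, discard unsaved data, leave written files alone) can be sketched with a stop flag that the saving thread checks between frames. The class and its methods are hypothetical and cover only the Pending Buffer side of a reset.

```python
import queue
import threading


class PendingBuffer:
    """Hypothetical Pending Buffer with one background saving thread."""

    def __init__(self, write_frame):
        self._frames = queue.Queue()
        self._stop = threading.Event()
        self._write_frame = write_frame
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add(self, frame, data):
        self._frames.put((frame, data))

    def pending_count(self):
        return self._frames.qsize()

    def _run(self):
        while not self._stop.is_set():
            try:
                frame, data = self._frames.get(timeout=0.05)
            except queue.Empty:
                continue
            # A frame that has been started is always written completely.
            self._write_frame(frame, data)

    def reset(self):
        self._stop.set()      # do not start saving any further frames
        self._thread.join()   # let the frame currently being saved finish
        # Discard unsaved frames from memory; files already written are untouched.
        while True:
            try:
                self._frames.get_nowait()
            except queue.Empty:
                break
```

Checking the flag only between frames guarantees that no half-written file is left behind, at the cost of waiting for one in-flight frame during the reset.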

Loading An Existing Cache 

  • Entering an existing Disk Cache path, for example by entering a pre-existing Version folder name, will associate the Memory Cache with the selected Disk Cache.

Saving To A New PRT Sequence 

  • The [X] menu lets you save the Memory Cache to a completely new PRT sequence at any user-defined location.
  • The saving will be performed using the main thread and will lock up 3ds Max until all the saving is done.
  • The saving will be performed from both memory data and Disk Cache data, as needed.
  • If Every Nth frame was simulated, no intermediate frames will be interpolated and saved.
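The export described above can be sketched as a single synchronous loop that prefers memory and falls back to the Disk Cache. The function and its callable parameters are hypothetical; they only illustrate the lookup order and the fact that missing Nth-frame gaps are skipped rather than interpolated.

```python
def export_sequence(frames, memory_cache, read_from_disk, write_frame):
    """Write every available frame to a new sequence on the calling thread.

    memory_cache maps frame -> data; read_from_disk(frame) returns the data from
    the Disk Cache or None. Frames that were never simulated (for example when
    caching every Nth frame) are skipped rather than interpolated.
    """
    exported = []
    for frame in frames:
        data = memory_cache.get(frame)
        if data is None:
            data = read_from_disk(frame)  # fall back to the Disk Cache
        if data is None:
            continue                      # no sample for this frame: skip it
        write_frame(frame, data)
        exported.append(frame)
    return exported
```

Because the loop runs on the calling thread, it blocks until the last frame is written, which matches the note that 3ds Max is locked up during this kind of save.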