.. _simulation:

========================================
 FLAME GPU Simulation and Visualisation
========================================


Introduction
============

This chapter describes the process of building and running a simulation, together with a number of tools and procedures which simplify simulation code generation and the compilation of simulation executables.
In order to use the FLAME GPU SDK it should be placed in a directory whose path does not contain any spaces (preferably directly within the ``C:`` drive or the root of the operating system drive).
The host machine must also be running Windows with a copy of the .NET runtime (used by the XSLT template processor) and must contain NVIDIA GPU hardware with Compute level 1.0.

Generating a Functions File Template
====================================

Chapter :ref:`Summary of Agent Function Arguments` described the exact argument order for agent function declarations. In most cases, however, it is sensible to use the provided XSLT template (``functions.xslt``, located in the ``FLAMEGPU/templates`` directory within the FLAME GPU SDK) to automatically generate an agent function source file with empty agent function declarations from your XMML model file.
Once this has been generated, the agent function scripts can be implemented within the function declarations.
Care must be taken, however: if the XMML model file is later modified, the agent function arguments must be updated manually where necessary.
Likewise, be careful not to overwrite any existing function source file when generating a new one using the XSLT template.
Generation of blank function source files is not incorporated into the Visual Studio template project and must be done manually.
A .NET based XSLT processor is provided within the FLAME GPU SDK for this purpose (``XSLTProcessor.exe``, located in the ``tools`` directory) and can be used via the command line as follows (or via the ``GenerateFunctionsFileTemplate`` batch file, also located in the ``tools`` directory of the FLAME GPU SDK):

.. code-block:: bash

    XSLTProcessor.exe XMLModelFile.xml functions.xslt functions.c


Alternatively, any compliant XSLT processor such as Xalan, Unicorn or even the Firefox web browser can be used.
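
Because regenerating the template can overwrite an existing functions file, the command shown earlier in this section can be wrapped in a small guard. The following Python sketch is illustrative only and is not part of the SDK: the helper names ``build_xslt_command`` and ``generate_functions_template`` are hypothetical, and it assumes that ``XSLTProcessor.exe`` is on the ``PATH`` and takes the three positional arguments in the order used in this section.

.. code-block:: python

    import os
    import subprocess


    def build_xslt_command(model_file, template_file, output_file,
                           processor="XSLTProcessor.exe"):
        """Return the argument list: processor, model, template, output."""
        return [processor, model_file, template_file, output_file]


    def generate_functions_template(model_file, template_file, output_file,
                                    processor="XSLTProcessor.exe"):
        """Run the XSLT processor unless the output file already exists."""
        if os.path.exists(output_file):
            # Refuse to overwrite hand-written agent functions.
            raise FileExistsError(output_file + " already exists; not overwriting")
        command = build_xslt_command(model_file, template_file, output_file,
                                     processor)
        # check=True raises CalledProcessError if the processor fails.
        subprocess.run(command, check=True)

Calling ``generate_functions_template("XMLModelFile.xml", "functions.xslt", "functions.c")`` would then only invoke the processor when ``functions.c`` does not yet exist.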

FLAME GPU Template Files
========================


The FLAME GPU SDK contains a number of XSLT templates which are used to generate the dynamic simulation code.
A brief summary of the functionality and contents of each template file is as follows: 

- ``header.xslt`` This template file generates a header file which contains the agent and message data structures which are common to many of the other dynamically generated simulation source files. The template also generates function prototypes for simulation functions and for functions which are visible externally within custom C or C++ code.
- ``main.xslt`` This template file generates a source file which defines the main execution entry point function which is responsible for handling command line options and initialising the GPU device.
- ``io.xslt`` This template file generates a source file which contains functions for loading initial agent XML data files (see :ref:`Initial XML Agent Data`) into the simulation and saving the simulation state back into XML format.
- ``simulation.xslt`` This template file generates a source file containing the host side simulation code which includes loading data to and from the GPU device and making a number of CUDA kernel calls which perform the simulation process.
- ``FLAMEGPU_kernels.xslt`` This template file generates a CUDA header file which contains the CUDA kernels and device functions which make up the simulation.
- ``visualisation.xslt`` This template file generates a source file which allows basic visualisation of the simulation using a sphere based representation of agents in 3D space. The source file is responsible for CUDA OpenGL interoperability and rendering using OpenGL. The source file includes a ``visualisation.h`` file, containing a number of definitions and variables, which is not generated by any template and must be specified manually.


Compilation Using Visual Studio
===============================


The FLAME GPU SDK and examples are targeted at a specific CUDA and Visual Studio version. The Visual Studio XML editor includes validation support and XML tag auto-completion, which makes defining an XMML model much easier.
The following subsections describe the various aspects of a FLAME GPU project file and describe the build process.

Visual Studio Project Build Configurations
------------------------------------------

The FLAME GPU examples and template project file contain build configurations for 64-bit Windows (``x64``) environments. 32-bit Windows has not been supported since version 1.3.0 due to limitations on GPU memory addressing.
The project contains four configurations: debug (``Debug``) and release (``Release``) versions of both the ``console`` based simulation and the ``visualisation`` simulation.
The two debug configurations disable all compiler optimisations, generate debug information for debugging host (non GPU) code, and enable CUDA device emulation for GPU (device) debugging.
The visualisation configurations enable building of visualisation code and specify a preprocessor macro (``VISUALISATION``) which is used by a number of preprocessor conditionals to change the simulation's expected arguments (see :ref:`Simulation Execution Modes and Options`).

Visual Studio Project Virtual File Structure
--------------------------------------------

Within the FLAME GPU examples and template projects, code is organised into the following virtual folders:

- ``FLAME GPU`` Consists of a folder containing the FLAME GPU XML schemas and code generation templates. These files are shared amongst all examples, so editing them will change the simulation code generated for other projects.
- ``FLAMEModel`` Contains the XMML model file and the agent functions file (usually called ``functions.c``). Note that the ``functions.c`` file is actually excluded from the build process, as it is built by the dynamically generated ``simulation.cu`` source file which includes it.
- ``Dynamic Code`` Contains the dynamically generated FLAME GPU simulation code. This code will be overwritten each time the project is built, so any changes to these files will be lost unless template transformation is turned off using the FLAME GPU build rule (see :ref:`FLAME GPU Build Rule Options`).
- ``Additional Source Code`` This folder should contain any hard coded simulation specific source or header files. By default the FLAME GPU project template defines a single ``visualisation.h`` file in this folder, which may be modified to set a number of variables such as viewing distance and clipping. Within the FLAME GPU examples this folder is typically used to store any model specific visualisation code which replaces the dynamically generated visualisation source file.

The physical folder structure of the SDK is self-explanatory; however, it is worth noting that executable files generated by the Visual Studio build process are output to the SDK's ``bin`` folder, which also contains the CUDA runtime DLLs.

Build Process
-------------

The Visual Studio build process consists of a number of stages which call various tools, compilers and linkers.
The first of these is the FLAME GPU build tool (described in more detail in the following section), which generates the dynamic simulation code from the FLAME GPU templates and model file.
Following this, the simulation code (within the ``Dynamic Code`` folder) is built using the CUDA build rule, which compiles the source files using the NVIDIA CUDA compiler ``nvcc``.
Finally, any C or C++ source files are compiled using the MSVC compiler and are then linked with the CUDA object files to produce the executable.
To start the build process select the ``Build`` menu followed by ``Build Solution``, or use the ``F7`` hotkey.
If Visual Studio skips the FLAME GPU build tool as the first build step, a complete rebuild can be forced by selecting the ``Build`` menu followed by ``Rebuild Solution`` (or ``Ctrl + Alt + F7``).

FLAME GPU Build Rule Options
----------------------------

The FLAME GPU build rule is configured by selecting the XMML model file properties.
Within the Build rule the XSLT options tab (see Figure) allows individual template file transformations to be toggled on or off.
These options are configuration specific, and therefore console configurations by default do not process the visualisation template.

.. figure:: /images/figure2.jpg
   :alt: FLAME GPU Modelling and Simulation Processes
   :width: 75.0%
   
   FLAME GPU Build Rule XSLT Options Tab


Visual Studio Launch Configuration Command Arguments
----------------------------------------------------

In order to set the execution arguments (described in :ref:`Simulation Execution Modes and Options`) for the simulation executable in any one of the four launch configurations, the ``Command Arguments`` property can be set from the Project Properties Page (select the ``Project`` menu followed by ``FLAMEGPU_Project Properties``).
The ``Command Arguments`` property is located under ``Configuration Properties -> Debug`` (see :ref:`Agent Function Scripts and the Simulation API`).
Each configuration has its own set of ``Command Arguments`` so when moving between configurations these will need to be set.
Likewise the ``Configuration Properties`` are computer and user specific so these cannot be preset and must be specified the first time each example is compiled and run.
The Visual Studio macro ``$InputDir`` can be used to specify the working directory of the project file which makes locating initial agent data XML files for many of the examples much easier (these are normally located in the iterations folders of each example).

Once the ``Command Arguments`` have been set, the simulation executable can be launched by selecting ``Start Debugging`` from the ``Debug`` menu or using the ``F5`` hotkey (this is the same in both release and debug launch configurations).

.. figure:: /images/figure3.jpg
   :alt: FLAME GPU Project Properties Page
   :width: 75.0%
   

Compilation using Make (for Linux and Windows)
==============================================


``make`` can be used to build FLAME GPU under Linux and Windows (via a Windows implementation of ``make``).

Makefiles are provided for each example project (``examples/project/Makefile``), and for batch building all examples (``examples/Makefile``).

To build a console example in release mode, first download the FLAME GPU SDK release or alternatively clone the project using Git (it will be cloned into the folder ``FLAMEGPU``), then run:

.. code-block:: bash

    cd examples/EmptyExample/
    make console

Or for a visualisation example in release mode:

.. code-block:: bash

    cd examples/EmptyExample/
    make visualisation

*Debug* mode executables can be built by passing ``debug=1`` to ``make``, e.g. ``make all debug=1``. The generated executable can then be debugged using ``cuda-gdb``.

In the project specific portion of the Makefile (e.g. ``examples/EmptyExample/Makefile``) several variables exist which allow the project to be customised.

- ``EXAMPLE``: Controls the name of the project / executables generated.
- ``HAS_VISUALISATION``: Determines if a visualisation mode should be supported or not.
- ``CUSTOM_VISUALISATION``: Determines if a custom or the default visualisation should be used.
- ``FLAMEGPU_ROOT``: The relative path from the Makefile to the main ``FLAMEGPU`` directory, e.g. ``../../``
- ``EXAMPLE_BIN_DIR``: Path to the location to place executables.
- ``EXAMPLE_BUILD_DIR``: Path to the build directory for this project.
- ``SMS``: Sets the CUDA Compute Capabilities to build executables for.
- ``TRANSFORM_*_XSLT_DISABLED``: Prevents the relevant XSLT file from being transformed:

  - ``TRANSFORM_HEADER_XSLT_DISABLED``: ``header.xslt``
  - ``TRANSFORM_FLAMEGPU_KERNALS_XSLT_DISABLED``: ``FLAMEGPU_kernels.xslt``
  - ``TRANSFORM_IO_XSLT_DISABLED``: ``io.xslt``
  - ``TRANSFORM_SIMULATION_XSLT_DISABLED``: ``simulation.xslt``
  - ``TRANSFORM_MAIN_XSLT_DISABLED``: ``main.xslt``
  - ``TRANSFORM_VISUALISTION_XSLT_DISABLED``: ``visualisation.xslt``

For more information on building FLAME GPU via make, run ``make help`` in an example directory.

Creating a New FLAME GPU Example Project
========================================

The simplest way to create a new FLAME GPU example project is to copy and modify an existing project, renaming the Visual Studio solution / project files and modifying the Makefile.

A Python script is provided to simplify this process by making the required changes for you. For example, to create a new example project called ``NewExample``, based on the ``EmptyExample``, run the following command.

.. code-block:: bash

    python tools/new_example.py --base EmptyExample NewExample


Simulation Execution Modes and Options
======================================

FLAME GPU simulations require a number of arguments, which depend on whether the executable was built for console or visualisation mode. Both are described in the following subsections.


Console Mode
------------


Simulation executables built for console execution require two arguments, with several optional arguments.

.. code-block:: bash

    usage: EmptyExample [-h] [--help] input_path num_iterations [cuda_device_id] [XML_output_override]

    required arguments:
      input_path           Path to initial states XML file OR path to output XML directory
      num_iterations       Number of simulation iterations

    options arguments:
      -h, --help           Output this help message.
      cuda_device_id       CUDA device ID to be used. Default is 0.
      XML_output_override  Flag indicating if iteration data should be output as XML
                           0 = false, 1 = true. Default 1


The result of running the simulation will be a number of output XML files which will be numbered from ``1`` to ``n``, where ``n`` is the number of simulation iterations specified by the ``num_iterations`` argument.
It is possible to turn XML output on or off by changing the definition of the ``OUTPUT_TO_XML`` macro, located within the ``main.xslt`` template, to true (``1``) or false (``0``).
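
Since the console arguments are positional, the order matters when launching an executable from a script. The following Python sketch shows one way to assemble the argument list from the usage text above; the helper name ``build_console_args`` is hypothetical and not part of FLAME GPU. It assumes that ``cuda_device_id`` has to be supplied whenever ``XML_output_override`` is given, because the latter comes after it in the usage line.

.. code-block:: python

    def build_console_args(executable, input_path, num_iterations,
                           cuda_device_id=None, xml_output=None):
        """Build the positional argument list for a console mode executable."""
        args = [executable, input_path, str(num_iterations)]
        if xml_output is not None and cuda_device_id is None:
            # The XML flag follows the device ID, so fall back to the
            # documented default device (0) to keep the positions aligned.
            cuda_device_id = 0
        if cuda_device_id is not None:
            args.append(str(cuda_device_id))
        if xml_output is not None:
            # 0 = false, 1 = true, as in the usage text.
            args.append("1" if xml_output else "0")
        return args

The returned list can be passed directly to ``subprocess.run``; for example ``build_console_args("EmptyExample", "iterations/0.xml", 10, xml_output=False)`` yields the equivalent of ``EmptyExample iterations/0.xml 10 0 0``.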

Visualisation Mode
------------------

Simulation executables built for visualisation require only a single argument (usage shown below), which is the same as the first argument for console execution (an initial agent XML file).
The number of simulation iterations is not required, as the simulation will run indefinitely until the visualisation is closed.
As with console execution there are additional optional arguments available.

.. code-block:: bash

    usage: EmptyExample [-h] [--help] input_path [cuda_device_id]

    required arguments:
      input_path           Path to initial states XML file OR path to output XML directory

    options arguments:
      -h, --help           Output this help message.
      cuda_device_id       CUDA device ID to be used. Default is 0.


Many of the options for the default visualisation are contained within the ``visualisation.h`` header file and include the following:

- ``SIMULATION_DELAY`` Many simulations are executed extremely quickly making visualisation a blur. This definition allows an artificial delay by executing this number of visualisation render loops before each simulation iteration is processed.
- ``WINDOW_WIDTH`` and ``WINDOW_HEIGHT`` Specifies the size of the visualisation window.
- ``NEAR_CLIP`` and ``FAR_CLIP`` Specifies the near and far clipping planes used for OpenGL rendering.
- ``SPHERE_SLICES`` The number of slices used to create the sphere geometry representing a single agent in the visualisation.
- ``SPHERE_STACKS`` The number of stacks used to create the sphere geometry representing a single agent in the visualisation.
- ``SPHERE_RADIUS`` The physical size of the sphere geometry representing a single agent in the visualisation. This will need to be a sensible value which corresponds with the environment size and agent locations within your model/simulation.
- ``VIEW_DISTANCE`` The camera viewing distance. Again this will need to be a sensible value which corresponds with the environment size and agent locations within your model/simulation.
- ``LIGHT_POSITION`` The visualisation will contain a single light source which will be located at this position.
- ``PAUSE_ON_START`` If defined the simulation is paused on launch, allowing the simulation to be visualised one iteration at a time. 


The colour of spheres in the default visualisation is determined using an agent variable ``colour`` (or alternatively ``type`` or ``state``, although ``colour`` is the preferred option). This can be an ``int`` or a ``float``, with a set of distinct colours available using the following defined values:

- ``FLAME_GPU_VISUALISATION_COLOUR_BLACK``
- ``FLAME_GPU_VISUALISATION_COLOUR_RED``
- ``FLAME_GPU_VISUALISATION_COLOUR_GREEN``
- ``FLAME_GPU_VISUALISATION_COLOUR_BLUE``
- ``FLAME_GPU_VISUALISATION_COLOUR_YELLOW``
- ``FLAME_GPU_VISUALISATION_COLOUR_CYAN``
- ``FLAME_GPU_VISUALISATION_COLOUR_MAGENTA``
- ``FLAME_GPU_VISUALISATION_COLOUR_WHITE``
- ``FLAME_GPU_VISUALISATION_COLOUR_BROWN``


.. @todo - Document the controls for the default visualisation.

Creating a Custom Visualisation
===============================


Customised visualisation can easily be integrated into a FLAME GPU project by extending the automatically generated visualisation file (the output of processing ``visualisation.xslt``). Note: when doing this within Visual Studio, it is important to turn off the template processing of the ``visualisation.xslt`` file in each of the launch configurations, as processing it will overwrite any custom code.
Many of the FLAME GPU SDK examples use customised visualisations in this way.
As with the default visualisation, any custom visualisation must implement the following functions, whose prototypes are declared in the automatically generated simulation header.

.. code-block:: c
   :linenos:
       
    extern "C" void initVisualisation();

    extern "C" void runVisualisation();


The first of these can be used to initialise any OpenGL memory and CUDA OpenGL bindings as well as displaying the user interface.
The second of these functions must take control of the simulation by repeatedly calling the ``draw`` and ``singleIteration`` functions (the latter advances the simulation by a single iteration step) in a loop.
A more detailed description of the default rendering technique is provided within other FLAME GPU documentation (listed in :ref:`Purpose of This Document`).

Performance Tips
================


The GPU offers some enormous performance advantages for agent simulation over more traditional CPU based alternatives.
Despite this, it is possible to write extremely suboptimal code which will reduce performance.
The following is a list of performance tips for creating FLAME GPU model files:

General Usage of FLAME GPU

- FLAME GPU is optimal where there are very large numbers of relatively simple agents which can be parallelised.
- Populations of agents with very low numbers will perform poorly (in extreme cases slower than if they were simulated using the CPU). If you require an agent population with very few agents consider writing some custom CPU simulation code and transferring any important information into simulation constants to be read by larger agent populations during the FLAME GPU simulation step.
- Outputting information to disk (XML files) is painfully slow in comparison with simulation speeds so consider outputting information visually or only after larger numbers of simulation iterations.

Model Specification

- Minimise the number of variables within agent and message data where possible.
- Try to conceptualise and fully specify the model before completing the agent functions script to avoid making mistakes with agent function arguments. Try to think in terms of X-Machine agents!

Agent Function Scripting

- Small compute intensive agent functions are more efficient than functions which only iterate messages. Try to minimise the number of times message lists are iterated.
- Keep agent functions small and do not define more local variables than is strictly required. Reuse local variables where possible if they are no longer needed and before they go out of scope.

Message Iteration

- For small populations of agents (generally fewer than 2000, but dependent on hardware and the model), non partitioned messaging has less overhead and performs comparably to spatial partitioning.
- For large populations of distributed agents with limited communication, spatially partitioned message communication will be much faster.


Detailed profiling using NVTX
=============================

Additional profiling information can be exported for the visual profiler using the NVIDIA Tools Extension Library (NVTX).
NVTX markers and ranges can optionally be enabled to provide enhanced profiling.

Enabling NVTX Markers via makefile
----------------------------------

To achieve this using the ``Makefile``, simply add ``profile=1`` as an argument to make, on any platform:

.. code-block:: bash

    make console profile=1

Enabling NVTX Markers in Visual Studio
--------------------------------------

To enable NVTX markers in Visual Studio, the solution must be modified to add the relevant definition, include path and linker flags as follows:


- ``C/C++ > Preprocessor > Preprocessor Definitions``

  - Add ``PROFILE``

- ``CUDA C/C++ > Common > Additional Include Directories``

  - Add ``$(NVTOOLSEXT_PATH)include``

- ``Linker > General > Additional Library Directories``

  - Add ``$(NVTOOLSEXT_PATH)lib/x64``

- ``Linker > Input > Additional Dependencies``

  - Add ``nvToolsExt64_1.lib``


Parameter Exploration
=====================

Agent based simulations typically have many parameters which control aspects of the simulation and which can be used for calibration. As of FLAME GPU 1.5.0, the simplest way to explore these is to use multiple initial states files, one per simulation, each containing different values for the environmental constants, and to run the simulation on each file.

For instance, consider a model with 2 environmental constants representing model parameters, called ``SEED`` and ``INIT_POPULATION``, which are defined in ``XMLModelFile.xml`` within the ``<gpu:environment>`` tag as follows:

.. code-block:: xml
   :linenos:

   <gpu:constants>
     <gpu:variable>
       <type>unsigned int</type>
       <name>SEED</name>
       <defaultValue>0</defaultValue>
     </gpu:variable>
     <gpu:variable>
       <type>unsigned int</type>
       <name>INIT_POPULATION</name>
       <defaultValue>1</defaultValue>
     </gpu:variable>
   </gpu:constants>

If we wish to run this with ``SEED`` values ``0``, ``1`` and ``2`` and ``INIT_POPULATION`` values ``10``, ``100`` and ``1000``, this could be achieved with 9 initial states files (stored in separate folders to avoid overwriting output). A script could be used to create these files for large parameter sweeps.

This could have the following structure:

.. code-block:: none

   iterations
       ├── 0-10
       │   └── 0.xml
       ├── 0-100
       │   └── 0.xml
       ├── 0-1000
       │   └── 0.xml
       ├── 1-10
       │   └── 0.xml
       ├── 1-100
       │   └── 0.xml
       ├── 1-1000
       │   └── 0.xml
       ├── 2-10
       │   └── 0.xml
       ├── 2-100
       │   └── 0.xml
       └── 2-1000
           └── 0.xml

The contents of each file would then be different. Assuming agents are created via an ``INIT`` function, each ``0.xml`` file could look as follows. 


``0-10/0.xml`` would contain:

.. code-block:: xml
   :linenos:

   <states>
       <itno>0</itno>
       <environment>
           <SEED>0</SEED>
           <INIT_POPULATION>10</INIT_POPULATION>
       </environment>
   </states>

``0-100/0.xml`` would contain:

.. code-block:: xml
   :linenos:

   <states>
       <itno>0</itno>
       <environment>
           <SEED>0</SEED>
           <INIT_POPULATION>100</INIT_POPULATION>
       </environment>
   </states>


``0-1000/0.xml`` would contain:

.. code-block:: xml
   :linenos:

   <states>
       <itno>0</itno>
       <environment>
           <SEED>0</SEED>
           <INIT_POPULATION>1000</INIT_POPULATION>
       </environment>
   </states>

And so on. Simulations could then be launched in batch via a script, either sequentially or concurrently depending upon the memory requirements of each model, and the availability of GPUs.
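
As a minimal sketch of such a script, the following Python code writes one ``0.xml`` file per parameter combination, using the folder naming and file contents shown above, and prepares one console mode command per file. The function names ``write_initial_states`` and ``console_commands`` are hypothetical and not part of FLAME GPU, and the executable name and iteration count passed to them are assumptions to be replaced with those of your own model.

.. code-block:: python

    import itertools
    import os


    def write_initial_states(root, seeds, populations):
        """Write one 0.xml per (SEED, INIT_POPULATION) pair; return the paths."""
        paths = []
        for seed, population in itertools.product(seeds, populations):
            # One folder per combination, e.g. iterations/0-10
            folder = os.path.join(root, "{}-{}".format(seed, population))
            os.makedirs(folder, exist_ok=True)
            path = os.path.join(folder, "0.xml")
            with open(path, "w") as handle:
                handle.write(
                    "<states>\n"
                    "    <itno>0</itno>\n"
                    "    <environment>\n"
                    "        <SEED>{}</SEED>\n"
                    "        <INIT_POPULATION>{}</INIT_POPULATION>\n"
                    "    </environment>\n"
                    "</states>\n".format(seed, population)
                )
            paths.append(path)
        return paths


    def console_commands(executable, paths, num_iterations):
        """Return one console mode command (argument list) per states file."""
        return [[executable, path, str(num_iterations)] for path in paths]

For the sweep described above, ``write_initial_states("iterations", [0, 1, 2], [10, 100, 1000])`` creates the 9 files, and each command returned by ``console_commands`` can then be run in turn, for example with ``subprocess.run(command, check=True)``.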