Troubleshooting Parallel Servers and MPI
This section contains troubleshooting tips for working with parallel servers and MPI distributions.
Specifying Parallel Hosts Yields Invalid Entries
- Make sure that the specified host names are spelled and configured correctly.
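As a quick check, assuming the compute hosts are reachable from the machine on which you specify them (host1 is a placeholder for one of your host names), you can confirm that each name resolves and responds before adding it:

ping host1
nslookup host1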
Command-Line Option Fails
- Ensure that the command line shown in the GUI conforms to the options described in the Command-Line Reference.
A Windows Cluster Does Not Work
- Install the most recent release of Simcenter STAR-CCM+ using the same installation path on all machines, and ensure that the MPI component is selected and set up correctly.
- Ensure credentials are registered as described in Registering Your Credentials for Distributed Simulations on Windows.
- Verify on each machine that you can start serial and local parallel servers (see the example commands after this list).
- Start the multi-host parallel server, trying one host at a time. Starting remote servers is not supported under Windows, so when starting a multi-host parallel server the primary process must be run on the local host.
- If it does not work, make sure that your firewall settings are not affecting the cluster communication. If the problems persist, you can start the server from the command line with the -v option to get debug output, as follows:
"<STAR-CCM+_INSTALL_DIR>\starccm+" -server -on host1,host2 -v
where host1 and host2 are the two machines that you want to run on, and one of them is the current local machine.
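When verifying serial and local parallel servers on each machine, commands along the following lines can be used (the process count of 4 is only an example; adjust the path to your installation):

"<STAR-CCM+_INSTALL_DIR>\starccm+" -server
"<STAR-CCM+_INSTALL_DIR>\starccm+" -server -np 4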
Shared Memory Limits Are Too Low
Linux workstations are often configured with low limits on the amount of allowable shared memory. This restricts how much memory can be pinned by the libraries that MPI uses. These libraries can print warning messages indicating that the limits are set too low, even when running on a single host where the library is not actually used. Set workstations to have high, or preferably unlimited, limits.
With Open MPI, a fallback mechanism exists for cases when limits are too low. In such a case, Simcenter STAR-CCM+ generates a warning message. When deciding on memory limits, consider that this fallback mechanism might diminish communication performance. The Open MPI Frequently Asked Questions describe how the limits are changed. (See the answer to How can a system administrator (or user) change locked memory limits?)
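As an illustration of raising the locked memory limit on a typical Linux system (the exact mechanism depends on your distribution, and a resource manager may override these settings), check the current per-process limit with:

ulimit -l

and raise it, for example, by adding the following lines to /etc/security/limits.conf, which take effect at the next login:

* soft memlock unlimited
* hard memlock unlimited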
User Process Limits Are Too Low
The "user process" limit on Linux workstations restricts how many processes you can create. For parallel runs, this limit can be too low, resulting in undesired behavior such as not being able to connect via SSH to a compute host with a running simulation. If you encounter such behavior, set workstations or cluster nodes to have higher user process limits.
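For example, on systems that use /etc/security/limits.conf (the value shown is only illustrative), you can check the current limit with:

ulimit -u

and raise it by adding lines such as:

* soft nproc 65536
* hard nproc 65536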
Issues on TCP Networks with High Process Counts
TCP networks are usually not designed for massively parallel workloads, so significant losses of parallel efficiency can be expected when running Simcenter STAR-CCM+ on very high process counts over TCP. To avoid poor parallel efficiency and other related scaling issues on such networks, Simcenter STAR-CCM+ supports running over TCP only up to 500 cores. You can still try running at larger scales, but if you experience usability or efficiency issues, consider using a high-performance network such as InfiniBand, Omni-Path, or Elastic Fabric Adapter instead of TCP for large-scale simulations.
Suboptimal Performance on AWS EFA Systems
When running on AWS systems with Elastic Fabric Adapter (EFA), the default Open MPI may lead to suboptimal performance. Consider evaluating whether Intel MPI (-mpi intel) leads to better performance for your simulation and setup.
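For example, assuming a Linux installation path and two placeholder host names, a multi-host server can be started with Intel MPI as follows:

"<STAR-CCM+_INSTALL_DIR>/starccm+" -server -mpi intel -on host1,host2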