Balancing
Care must be taken to balance a parallel configuration to achieve maximum performance. Allocate each process to its own core on which no other jobs are running. Any additional load on a core to which a process is allocated unbalances the parallel configuration and can cause paging, resulting in a degradation in performance. In general, the slowest core in the machine limits the parallel simulation: if one core carries a greater load than the remaining cores, all cores are held back by that overloaded core.
In the case of a distributed parallel simulation, performance is limited by the slowest machine in the cluster. If one node carries a greater load than the remaining nodes, all nodes are limited by that overloaded node.
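The effect of a single overloaded core or node can be illustrated with a simple model. The sketch below is a hypothetical illustration, not part of the product. It assumes that all processes must finish their share of each step before any can proceed, so a step takes as long as the slowest process. It also assumes that a core shared with another job delivers about half its normal throughput. The function names and timings are invented for the example.

```python
# Hypothetical load-balance model (illustration only).
# Assumption: all processes synchronize at the end of each step, so the
# step lasts as long as the slowest process takes to finish its share.


def step_time(process_times):
    """Wall-clock time of one synchronized step: the slowest process's time."""
    return max(process_times)


def parallel_efficiency(process_times):
    """Mean process time divided by the slowest process time (1.0 = balanced)."""
    mean_time = sum(process_times) / len(process_times)
    return mean_time / max(process_times)


# Four processes, each needing 10 s of compute on a dedicated core.
balanced = [10.0, 10.0, 10.0, 10.0]

# Same work, but the fourth process shares its core with another job and,
# under the assumed 50% share of CPU time, takes about twice as long.
loaded = [10.0, 10.0, 10.0, 20.0]

print(step_time(balanced), parallel_efficiency(balanced))  # 10.0 1.0
print(step_time(loaded), parallel_efficiency(loaded))      # 20.0 0.625
```

In this model, the three unaffected processes sit idle for half of every step. Doubling the load on one core out of four therefore doubles the run time for the whole simulation. The same reasoning applies when each entry represents a node in a cluster rather than a core.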
For detailed information about assessing the performance capabilities of your hardware, see Performance Benchmarks.