Executable:   mympiprog.x
Resources:    32 processes, 2 nodes
Machine:      cn182
Started on:   Wed Oct 15 16:56:23 2014
Total time:   7 seconds (0 minutes)
Full path:    /home/user
Notes:

Summary: mympiprog.x is CPU-bound in this configuration
CPU:  88.6% |========|
MPI:  11.4% ||
I/O:   0.0% |
This application run was CPU-bound. A breakdown of this time and advice for
investigating further is found in the CPU section below. As very little time
is spent in MPI calls, this code may also benefit from running at larger
scales.

CPU:
A breakdown of how the 88.6% total CPU time was spent:
Scalar numeric ops:  50.0% |====|
Vector numeric ops:  50.0% |====|
Memory accesses:      0.0% |
Other:                0.0% |
The per-core performance is arithmetic-bound. Try to increase the amount of
time spent in vectorized instructions by analyzing the compiler's
vectorization reports.

MPI:
A breakdown of how the 11.4% total MPI time was spent:
Time in collective calls:      100.0% |=========|
Time in point-to-point calls:    0.0% |
Effective collective rate:     1.65e+02 bytes/s
Effective point-to-point rate: 0.00e+00 bytes/s
Most of the time is spent in collective calls with a very low transfer rate.
This suggests load imbalance is causing synchronization overhead; use an MPI
profiler to investigate further.

I/O:
A breakdown of how the 0.0% total I/O time was spent:
Time in reads:         0.0% |
Time in writes:        0.0% |
Effective read rate:   0.00e+00 bytes/s
Effective write rate:  0.00e+00 bytes/s
No time is spent in I/O operations. There's nothing to optimize here!

Memory:
Per-process memory usage may also affect scaling:
Mean process memory usage:  2.33e+07 bytes
Peak process memory usage:  2.35e+07 bytes
Peak node memory usage:     2.8% |
The peak node memory usage is very low. You may be able to reduce the amount
of allocation time used by running with fewer MPI processes and more data on
each process.
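
The CPU section above recommends checking the compiler's vectorization
reports. The sketch below is illustrative only and is not taken from
mympiprog.x: it shows the kind of unit-stride, dependence-free loop a
compiler can auto-vectorize, and the comments mention example report flags
(assuming GCC or the Intel classic compiler; other compilers use different
options). The function name and problem size are hypothetical.

    /* Illustrative kernel, not part of the profiled application.
     * Example builds with a vectorization report:
     *   GCC:   gcc -O3 -fopt-info-vec-missed axpy.c -o axpy
     *   Intel: icc -O3 -qopt-report=2 -qopt-report-phase=vec axpy.c -o axpy
     */
    #include <stddef.h>
    #include <stdio.h>

    /* Contiguous, unit-stride loop with no loop-carried dependence:
     * a good candidate for the compiler's auto-vectorizer. */
    static void axpy(size_t n, double a,
                     const double *restrict x, double *restrict y)
    {
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        enum { N = 1000 };
        double x[N], y[N];
        for (size_t i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 2.0; }
        axpy(N, 0.5, x, y);
        printf("y[0] = %f\n", y[0]);
        return 0;
    }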
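
The MPI section above attributes the collective time to load imbalance and
suggests using an MPI profiler. As a lightweight first check, before reaching
for a full profiler, each rank can time its own compute phase and its wait at
a barrier, then reduce the extremes to rank 0. This is a minimal sketch of
that idea, assuming a single compute phase per iteration; it is not code from
mympiprog.x.

    /* Minimal load-imbalance check: a large spread between the fastest and
     * slowest rank's compute time, together with a long barrier wait, points
     * to imbalance rather than slow communication. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Time the rank-local compute phase (placeholder here). */
        double t0 = MPI_Wtime();
        /* ... rank-local work would go here ... */
        double compute = MPI_Wtime() - t0;

        /* Time spent waiting for the slowest rank at the collective. */
        double w0 = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);
        double wait = MPI_Wtime() - w0;

        double c_min, c_max, w_max;
        MPI_Reduce(&compute, &c_min, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
        MPI_Reduce(&compute, &c_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        MPI_Reduce(&wait,    &w_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("compute: min %.3fs max %.3fs, worst barrier wait %.3fs\n",
                   c_min, c_max, w_max);

        MPI_Finalize();
        return 0;
    }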