How to get performance
writing fast code is writing parallel code
writing parallel code does not start with MPI or OpenMP
single-thread performance should be improved first
your goal is not scalability, but time to solution!
learn to exploit lower levels of parallelism
make it visible: the compiler will do the rest
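
A minimal sketch of what "making parallelism visible" can mean inside a single thread, using a sum over an array as an illustrative case. The function names sum_serial and sum_unrolled are hypothetical and not from the original material. In the first version every addition depends on the previous one; in the second, four independent partial sums give the compiler and the CPU work that can overlap or be vectorized.

```c
#include <stddef.h>

/* Straightforward reduction: each addition needs the result of the
   previous one, so the loop is one long dependency chain. */
double sum_serial(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Same reduction with four independent partial sums (illustrative
   unroll factor, chosen as an assumption). The four additions in the
   loop body do not depend on each other, which exposes
   instruction-level parallelism and makes vectorization easier. */
double sum_unrolled(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 3 < n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }

    /* Remainder loop: handle the last n % 4 elements. */
    for (; i < n; ++i)
        s0 += x[i];

    return (s0 + s1) + (s2 + s3);
}
```

The two versions add the elements in a different order, so for general floating-point input the results may differ in the last bits. That is also why a compiler usually cannot make this transformation by itself unless it is told that reassociating floating-point additions is allowed.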