fortran - Is there something wrong in my MPI algorithm?
I set up an algorithm to share data between different processors, and it has worked so far, but now that I'm throwing a larger problem at it, I'm witnessing some strange behavior: I'm losing pieces of data between the MPI_ISEND's and MPI_RECV's.
I present a snippet of the code below. It is comprised of three stages. First, a processor loops over all elements in a given array. Each element represents a cell in the mesh. The processor checks if the element is being used on other processors. If yes, it does a non-blocking send to that process, using the cell's unique global ID as the tag. If no, it checks the next element, and so on.
Second, the processor loops over all elements again, this time checking if the processor needs to update the data in that cell. If yes, then the data has already been sent out by another process. The current process does a blocking receive, knowing who owns the data and the unique global ID of that cell.
Finally, MPI_WAITALL is called on the request codes that were stored in the 'req' array during the non-blocking sends.
The issue I'm having is that the entire process completes; there is no hang in the code. But some of the data being received for some of the cells isn't correct. I check that the data being sent is right by printing each piece of data prior to the send operation. Note that I'm sending and receiving a slice of an array; each send passes 31 elements. When I print the array from the process that received it, 3 out of the 31 elements are garbage. All the other elements are correct. The strange thing is that it is always the same 3 elements that are garbage: the first, second, and last element.
I want to rule out that something isn't drastically wrong in my algorithm that would explain this, or perhaps it is related to the cluster I'm working on? As I mentioned, it worked on the other models I threw at it, using 31 cores; I'm only getting this behavior when I try to throw 56 cores at the problem. If nothing pops out as wrong, can you suggest a means to test why certain pieces of a send are not making it to their destination?
    do i = 1, num_cells
       ! Skip cells whose data isn't needed by other processors
       if (.not.needed(i)) cycle
       tag = gid(i)          ! the unique global ID of the cell in the entire system
       ghoster = ghosts(i)   ! the processor that needs this cell's data
       call mpi_isend(data(i,1:tot_levels),tot_levels,mpi_datatype,ghoster,tag,mpi_comm,req(send),mpierr)
       send = send + 1
    end do

    sends = send-1

    do i = 1, num_cells
       ! Skip cells that don't need a data update
       if (.not.needed_here(i)) cycle
       tag = gid(i)
       owner = owner(i)
       call mpi_recv(data(i,1:tot_levels),tot_levels,mpi_datatype,owner,tag,mpi_comm,mpi_status_ignore,mpierr)
    end do

    call mpi_waitall(sends,req,mpi_statuses_ignore,mpierr)
Is the problem that you're not receiving all of the messages? Note that just because an MPI_SEND or MPI_ISEND completes, it doesn't mean that the corresponding MPI_RECV was actually posted/completed. The return of the send call only means that the buffer can be reused by the sender. That data may still be buffered internally somewhere on either the sender or the receiver.
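To see what that guarantee does and does not cover, here is a minimal two-rank sketch (a hypothetical standalone program, not taken from the question): rank 0's MPI_WAIT can complete well before rank 1 has even posted its receive, because a small message is typically buffered inside the MPI library.

    program isend_completion_demo
      use mpi
      implicit none
      integer :: rank, ierr, req
      integer :: buf(31)
      integer :: status(MPI_STATUS_SIZE)
      double precision :: t0

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)

      if (rank == 0) then
         buf = 42
         call mpi_isend(buf, 31, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, req, ierr)
         call mpi_wait(req, status, ierr)
         ! Completion here only guarantees buf may be reused; it says nothing
         ! about whether rank 1 has received (or even asked for) the data yet.
         print *, 'rank 0: send completed locally'
      else if (rank == 1) then
         ! Deliberately delay posting the receive for a few seconds.
         t0 = mpi_wtime()
         do while (mpi_wtime() - t0 < 3.0d0)
         end do
         call mpi_recv(buf, 31, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
         print *, 'rank 1: receive completed'
      end if

      call mpi_finalize(ierr)
    end program isend_completion_demo

Run with two ranks, rank 0 will usually report completion seconds before rank 1's receive, which is exactly the situation described above.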
If it's critical to know that the message was actually received, you need to use a different variety of send, like MPI_SSEND or MPI_RSEND (or the nonblocking versions, if you prefer). Note that this won't actually solve your problem; it will probably just make it easier to figure out which messages aren't showing up.
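As a rough sketch of that idea, reusing the variable names from the snippet in the question (so treat it as illustrative rather than drop-in code): replacing MPI_ISEND with MPI_ISSEND means each request in 'req' completes only once the matching receive has started, so a hang in MPI_WAITALL points directly at the messages that were never matched.

    send = 1
    do i = 1, num_cells
       if (.not.needed(i)) cycle
       tag = gid(i)
       ghoster = ghosts(i)
       ! Synchronous nonblocking send: the request only completes once the
       ! matching receive has been posted on the destination rank.
       call mpi_issend(data(i,1:tot_levels),tot_levels,mpi_datatype,ghoster,tag,mpi_comm,req(send),mpierr)
       send = send + 1
    end do
    sends = send-1

    ! ... post the mpi_recv loop exactly as before ...

    call mpi_waitall(sends,req,mpi_statuses_ignore,mpierr)

If some requests never complete, printing gid(i) and ghosts(i) for the still-pending sends (for example by polling with MPI_TESTALL instead of waiting) identifies which cells' messages were never received.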