
Conversation

@ByzanTine (Collaborator)

This is rather hard to test; I managed to follow the TACO developers' advice and trace through their source code.

All in all, the kernel is as we expected:

int compute(taco_tensor_t *A38, taco_tensor_t *A0, taco_tensor_t *A10, taco_tensor_t *A4) {
  int A381_dimension = (int)(A38->dimensions[0]);
  int A382_dimension = (int)(A38->dimensions[1]);
  double* restrict A38_vals = (double*)(A38->vals); // >>> This is the output
  int* restrict A01_pos = (int*)(A0->indices[0][0]);
  int* restrict A01_crd = (int*)(A0->indices[0][1]);
  int* restrict A02_pos = (int*)(A0->indices[1][0]);
  int* restrict A02_crd = (int*)(A0->indices[1][1]);
  float* restrict A0_vals = (float*)(A0->vals); // >>> This is the COO matrix.
  int A101_dimension = (int)(A10->dimensions[0]);
  int A102_dimension = (int)(A10->dimensions[1]);
  double* restrict A10_vals = (double*)(A10->vals); // >>> A10 and A4 are the other two dense matrices.
  int A41_dimension = (int)(A4->dimensions[0]);
  int A42_dimension = (int)(A4->dimensions[1]);
  double* restrict A4_vals = (double*)(A4->vals);

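// Zero-initialize the dense output before the main contraction.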
  #pragma omp parallel for schedule(static)
  for (int32_t pA38 = 0; pA38 < (A381_dimension * A382_dimension); pA38++) {
    A38_vals[pA38] = 0.0;
  }


// The two outer loops iterate over the sparse input's dimensions.
// The inner loop runs over one of the dense dimensions, so the complexity is nnz(X) * shape(A4)[1].

  #pragma omp parallel for schedule(runtime)
  for (int32_t i32A0 = A01_pos[0]; i32A0 < A01_pos[1]; i32A0++) {
    int32_t i32 = A01_crd[i32A0];
    for (int32_t i33A0 = A02_pos[i32A0]; i33A0 < A02_pos[(i32A0 + 1)]; i33A0++) {
      int32_t i33 = A02_crd[i33A0];
      int32_t i33A38 = i32 * A382_dimension + i33;
      double ti35A38_val = 0.0;
      for (int32_t i35 = 0; i35 < A42_dimension; i35++) {
        int32_t i35A10 = i32 * A102_dimension + i35;
        int32_t i35A4 = i33 * A42_dimension + i35;
        ti35A38_val += (A0_vals[i33A0] * A10_vals[i35A10]) * A4_vals[i35A4];
      }
      A38_vals[i33A38] = ti35A38_val;
    }
  }
  return 0;
}

This form of checking doesn't quite scale though...
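For readability, here is a minimal numpy sketch of what the generated kernel computes (the shapes, the 0.3 density, and everything except the A0/A10/A4/A38 roles are illustrative assumptions, not code from this PR):

import numpy as np

# Illustrative shapes; A0 stands in for the sparse input, stored dense here.
m, n, d = 4, 5, 3
rng = np.random.default_rng(0)
A0 = rng.random((m, n)) * (rng.random((m, n)) < 0.3)  # sparsity pattern baked in
A10 = rng.random((m, d))
A4 = rng.random((n, d))

# The kernel computes A38[i, j] = A0[i, j] * sum_k A10[i, k] * A4[j, k],
# i.e. the dense-dense product A10 @ A4.T sampled at the nonzeros of A0.
A38_ref = A0 * (A10 @ A4.T)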

ByzanTine requested a review from LinjianMa on January 29, 2021 at 23:10.
@LinjianMa (Owner)

I see, yeah, I can see that it's not easy to test automatically. Maybe we can also just use the TACO online service to test. I think the new test function is what we need for the optimizer, so we can merge this PR.

@ByzanTine (Collaborator, Author)

Well, I feel this test alone doesn't really assert anything. If we want to check this somehow, a few options:

  1. We can time this call with d^2 >> nnz and assert a reasonable time bound.
  2. We can run sparse x dense x dense vs. dense x dense x dense and compare the runtime ratio, passing the test if the performance gap is large enough (a rough sketch follows this list).
  3. Or we trust TACO and never test these things \o/
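A rough sketch of option 2, using scipy/numpy stand-ins instead of the actual TACO kernels (sddmm_sparse, sddmm_dense, the sizes, and the 0.5 threshold are all illustrative assumptions, not code from this repo):

import time
import numpy as np
import scipy.sparse as sp

d, nnz = 2000, 5000
rng = np.random.default_rng(0)
rows = rng.integers(0, d, nnz)
cols = rng.integers(0, d, nnz)
X = sp.coo_matrix((rng.random(nnz), (rows, cols)), shape=(d, d))
B = rng.random((d, d))
C = rng.random((d, d))

def sddmm_sparse(X, B, C):
    # Evaluate (B @ C.T) only at the nonzero coordinates of X: O(nnz * d) work.
    vals = X.data * np.einsum('ij,ij->i', B[X.row], C[X.col])
    return sp.coo_matrix((vals, (X.row, X.col)), shape=X.shape)

def sddmm_dense(X, B, C):
    # Fully dense baseline: O(d^3) work, dominated by B @ C.T.
    return X.toarray() * (B @ C.T)

t0 = time.perf_counter(); sddmm_sparse(X, B, C); t_sparse = time.perf_counter() - t0
t0 = time.perf_counter(); sddmm_dense(X, B, C); t_dense = time.perf_counter() - t0

# Pass only if the sparse path is clearly faster; the threshold is machine-dependent.
assert t_sparse < 0.5 * t_dense, (t_sparse, t_dense)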

@LinjianMa (Owner)

I don't think we need to test timing at this stage. Timings can be misleading, especially when the test cases are not large enough. This benchmarking can be done later, when we want to collect experimental results.

I think it will be fine for us to just evaluate based on the flop model. If TACO implements it in a different way, then that's TACO's problem; the model itself should be correct.
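For reference, a minimal sketch of how I'd read that flop model off the generated loop nest above (sddmm_flop_estimate is a hypothetical helper, not an existing function in this repo, and counting the multiply-add as 3 flops is an assumption):

def sddmm_flop_estimate(nnz, d_inner):
    # The generated inner loop does two multiplies and one add per iteration,
    # and runs nnz(A0) * shape(A4)[1] times in total.
    return 3 * nnz * d_inner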
