Skip to content

Conversation

@Squareys
Copy link
Contributor

@Squareys Squareys commented Sep 3, 2015

Hello everybody!

Here's an work-in-progress pullrequest for an optimized implementation of 2D/3D ArrayImgs consisting of NativeType.

TODOs:

  • 2D Implementation of MapNeighborhood
  • 2D Implementation of MapNeighborhoodWithCenter
  • 3D Implementation of MapNeighborhood
  • 3D Implementation of MapNeighborhoodWithCenter
  • Alias for MapNeighborhood to optimized implementation
    • Test

Greetings,
Squareys

@Squareys Squareys force-pushed the array-img-opt-neighborhoods branch 2 times, most recently from f2d9e8a to 8184674 Compare September 4, 2015 11:13
@Squareys
Copy link
Contributor Author

Squareys commented Sep 4, 2015

@dietzc This is pretty much done. What is still left is to consider whether we want to provide this as a alternate implementation to the other implementations which use Shape as parameter. The advantage would, ofcourse, be that things could get implicitly faster. Ignoring out of bounds pixels should probably not be done implicitly, since current expected behaviour for NeighborhoodMap is that pixels outside of bounds are allways accessed, independent of whether there are any (generated by OOB).

@dietzc
Copy link
Member

dietzc commented Sep 4, 2015

We should definetively add this for rectangular shapes. The code should be more or less the same as what you did for benchmarking oob.

@Squareys
Copy link
Contributor Author

Squareys commented Sep 7, 2015

Okay, I have an idea.

@Squareys
Copy link
Contributor Author

Squareys commented Sep 7, 2015

Alright, I came up with something which saves all my aforementioned problems. I need to do some tests and then abstract everything to not have to duplicate code for the *WithCenter implementation.

This is however a performance decrease compared to using the optimized op directly. I tried to accommodate for that by using multithreading for different intervals, but no idea if that was effective. I could provide benchmarks if neccessary.

@Squareys
Copy link
Contributor Author

Squareys commented Sep 7, 2015

While testing, I found some problems which will require me to do a pullrequest for imglib2-algorithm first. This will take a while.

@dietzc
Copy link
Member

dietzc commented Sep 7, 2015

can you shortly ellaborate what you are planning.

@Squareys
Copy link
Contributor Author

Squareys commented Sep 7, 2015

I need access to the span and skipCenter of the RectangleShape for conforms() and for converting the RectangleShape parameter to the span parameter of the optimized implementation.

I already opened an issue for that a while ago, but since I need it now, I will write those getters, do a pullrequest and then I can use them here in ops.

@dietzc
Copy link
Member

dietzc commented Sep 7, 2015

Ah alright. So, your plan is to split up the image in the individual intervals and perform an individual map on each of these individual intervals. Safe interval without OOB, unsafe with. Right? If you use existing Map inside the intervals, you have multi-threading for free.

@Squareys
Copy link
Contributor Author

Squareys commented Sep 7, 2015

If you use existing Map inside the intervals, you have multi-threading for free.

I know, but I am using it outside the intervals.

@dietzc
Copy link
Member

dietzc commented Sep 7, 2015

I don't understand. Lets say we can split a 2D image in four intervals: one in the middle and four at the border. then you can use your optimized variant in the center interval and a default variant on the other intervals. this of course only works for RectangleShape and still you need to know the span.

@Squareys
Copy link
Contributor Author

Squareys commented Sep 7, 2015

Lets say we can split a 2D image in four intervals

Then those intervals will be iterated over in 4 threads. Some threads using unoptimized, standard implemention, one using the optimized implementation for the center interval.

@dietzc
Copy link
Member

dietzc commented Sep 7, 2015

Then those intervals will be iterated over in 4 threads. Some threads using unoptimized, standard implemention, one using the optimized implementation for the center interval.

why 4 threads? why don' you use the default ops().mapNeighborhood(...) in the four unsafe Intervals which uses the parallel implementation anyway?

@Squareys
Copy link
Contributor Author

Squareys commented Sep 7, 2015

why 4 threads? why don' you use the default ops().mapNeighborhood(...) in the four unsafe Intervals which uses the parallel implementation anyway?

Ah, now I understand. Well that happens anyway. But everything would need to wait for the "optimized" implementation, which is not threaded. I could of course use only two threads, one for the center interval and one for all the others.

@dietzc
Copy link
Member

dietzc commented Sep 7, 2015

Ah, now I understand. Well that happens anyway. But everything would need to wait for the "optimized" implementation, which is not threaded. I could of course use only two threads, one for the center interval and one for all the others.

Can't you easily parallelize your optimized version, too (e.g. splitting up the center interval in blocks ...? I mean, how much is the performance gain of the non-parallelized version of yours compared to the unparallellized version?

@Squareys
Copy link
Contributor Author

Squareys commented Sep 8, 2015

The idea was the following: I cannot use ops.map(), because that would recursively call my own MapNeighborhoodOp implementation. The original MapNeighborhood is already threaded, though, since it uses MapIterableToIterable, and ops can choose the Parallel variant.

I am cutting the original image in 2*numDimensions+1 intervals. Once center and many border intervals. All intervals are iterated over in parallel. Why? Because I cannot guarantee that ops chooses a parallelized implementation for iterating over the border intervals. Also, it would wait until it iterated over the entire interval whereas in my implementation it could iterate parallelized over mutliple intervals parallelized (sic: There are two "parallelized" meant to be there).

My optimized version could be parallelized, of course, but the idea behind it was to help jit be able to optimize the loops, which kicks in only after a few iterations. If we split that up, then the chunks might not be big enough for jit to realize there is a hotspot to potentially optimize/on-stack-replace or whatever it does. Disclaimer: That is not based off of any statistical evidence.

I hope that clears up any issues/concerns.
Greetings, Squareys.

@dietzc
Copy link
Member

dietzc commented Sep 8, 2015

My optimized version could be parallelized, of course, but the idea behind it was to help jit be able to optimize the loops, which kicks in only after a few iterations. If we split that up, then the chunks might not be big enough for jit to realize there is a hotspot to potentially optimize/on-stack-replace or whatever it does

Can you support this statement with a benchmark? I have the feeling, that your optimized version + palletization will be the fastest (as the JIT might recognize it across threads).

To avoid code duplication, I decided to expand the existing implementation
rather than implementing a new Op for 3D/1D. While this may not be
"optimal peformance", the maintainability outweights the minimal and
probably unsignificant "performance loss".

Signed-off-by: squareys <[email protected]>
... implementation `MapNeighborhoodNativeType`. This was requested to be
able to let OpService choose the optimized implementation implicitly.
There is however some performance loss which is hopefully traded off by
using multiple threads.

Signed-off-by: squareys <[email protected]>
@Squareys Squareys force-pushed the array-img-opt-neighborhoods branch from d87609b to 17ccca3 Compare September 14, 2015 15:23
@Squareys
Copy link
Contributor Author

Current state: I realized that my optimized implementation works only for complete ArrayImgs and not subintervals. I almost fixed that now, but the implementation will require access to the height of the original image (for skipping to the next line) which is why I will need to change the parameters of that a bit.

Should be done tomorrow.

This Op provides the same interface as all the other MapNeighborhood Ops and
therefore makes the optimized implementation of MapNeighborhood for NativeType
(namely `MapNeighborhoodNativeType`) available for implicit optimization of
depending algorithms.

Signed-off-by: squareys <[email protected]>
@Squareys Squareys force-pushed the array-img-opt-neighborhoods branch from 17ccca3 to e102430 Compare September 15, 2015 09:27
@Squareys Squareys changed the title [WIP] Neighborhood Maps for NativeType ArrayImg Neighborhood Maps for NativeType ArrayImg Sep 15, 2015
@Squareys
Copy link
Contributor Author

@dietc Done. We need to wait for imglib2-algorithm 0.3.2 before merge.

@dietzc
Copy link
Member

dietzc commented Sep 15, 2015

I kindly asked @tpietzsch to release. He will.

@Squareys
Copy link
Contributor Author

Benchmarks have run!

Minimum filter 3D with RectangleShape and sigmas of 1, 2, 4.
5 warmup iterations, 20 iterations.

sigma = 1:
imglib2               4674.617   ±   100.703  ms/op
imagej-ops-extended   2292.921   ±   143.092  ms/op
imagej-ops             492.234   ±    50.990  ms/op

sigma = 2:
imglib2               17442.640 ±  2076.525  ms/op
imagej-ops-extended    9263.320 ±   636.437  ms/op
imagej-ops             1448.860 ±   110.233  ms/op

sigma = 4:
imglib2                76368.879 ±   270.969  ms/op
imagej-ops-extended    47122.435 ±   956.176  ms/op
imagej-ops              7139.398 ±    70.917  ms/op

Summary: The imagej-ops way which is compatible with the common MapNeighborhood API is nearly twice as fast as the old imglib2 way. That is rather dissapointing, rgough, when you realize that the purely optimized version is ten times as fast.

@dietzc
Copy link
Member

dietzc commented Sep 16, 2015

Great work @Squareys. Can you shortly elaborate/confirm my assumpations on the implementations?

ImageJ-Ops:

  • Highly Optimized Jonathan version?
  • Parallelized over pixels

ImageJ-Ops-Extends:

  • Hightly-Optimized-Jonathan in the center region. Slow/ImgLib2 on borders?
  • Parallelized over regions?

ImgLib2:

  • Default existing Map?
  • Parallelized over pixels?

@Squareys
Copy link
Contributor Author

In Order:

  • Yes
  • No
  • Yes
  • Yes
  • No, just using a cursor, plain imglib2 implementation.
  • No.

@dietzc
Copy link
Member

dietzc commented Sep 17, 2015

@Squareys can you shorty explain how these results compare to something which runs parallelized over the pixels e.g. the default MapNeighborhood?

@dietzc
Copy link
Member

dietzc commented Nov 2, 2015

@Squareys can we discuss this PR next Thursday in KN1?

@Squareys
Copy link
Contributor Author

Squareys commented Nov 2, 2015

@dietzc Sure :)

@ctrueden
Copy link
Member

@Squareys Can you find any time to rebase this over the latest master? Let us know either way, because it would be great to get this merged. Thanks!

@ctrueden
Copy link
Member

I took a crack at rebasing this branch. However, the code has changed a lot (especially maps and unary/binary functions), so there were many conflicts. I pushed what I have so far to the array-img-opt-neighborhoods-CTR branch.

@Squareys Do you have any time to clean this up? If not, I will ask @LeonYang5114 to do it after he completes his current project.

@ctrueden
Copy link
Member

ctrueden commented Oct 3, 2018

Closed in favor of #395.

@ctrueden ctrueden closed this Oct 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants