Channel: OpenCV Q&A Forum - RSS feed

GaussianBlur and Canny execution times are much longer on T-API

Hello. I've just started to learn OpenCV 3. I'm on OS X Yosemite. Here is the GPU section of my clinfo output:

    Device Name                                     GeForce GT 330M
    Device Vendor                                   NVIDIA
    Device Vendor ID                                0x1022600
    Device Version                                  OpenCL 1.0
    Driver Version                                  10.0.31 310.90.10.05b12
    Device OpenCL C Version                         OpenCL C 1.1
    Device Type                                     GPU
    Device Profile                                  FULL_PROFILE
    Max compute units                               6
    Max clock frequency                             1100MHz
    Max work item dimensions                        3
    Max work item sizes                             512x512x64
    Max work group size                             512
    Preferred work group size multiple              32
    Preferred / native vector sizes
      char                                          1 / 1
      short                                         1 / 1
      int                                           1 / 1
      long                                          1 / 1
      half                                          0 / 0 (n/a)
      float                                         1 / 1
      double                                        0 / 0 (n/a)
    Half-precision Floating-point support           (n/a)
    Single-precision Floating-point support         (core)
      Denormals                                     No
      Infinity and NANs                             Yes
      Round to nearest                              Yes
      Round to zero                                 Yes
      Round to infinity                             Yes
      IEEE754-2008 fused multiply-add               No
      Support is emulated in software               No
      Correctly-rounded divide and sqrt operations  No
    Double-precision Floating-point support         (n/a)
    Address bits                                    32, Little-Endian
    Global memory size                              268435456 (256MiB)
    Error Correction support                        No
    Max memory allocation                           134217728 (128MiB)
    Unified memory for Host and Device              No
    Minimum alignment for any data type             128 bytes
    Alignment of base address                       1024 bits (128 bytes)
    Global Memory cache type                        None
    Image support                                   Yes
      Max number of samplers per kernel             16
      Max 2D image size                             4096x4096 pixels
      Max 3D image size                             2048x2048x2048 pixels
      Max number of read image args                 128
      Max number of write image args                8
    Local memory type                               Local
    Local memory size                               16384 (16KiB)
    Max constant buffer size                        65536 (64KiB)
    Max number of constant args                     9
    Max size of kernel argument                     4352 (4.25KiB)
    Queue properties
      Out-of-order execution                        No
      Profiling                                     Yes
    Profiling timer resolution                      1000ns
    Execution capabilities
      Run OpenCL kernels                            Yes
      Run native kernels                            No
    Device Available                                Yes
    Compiler Available                              Yes
    Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

I wrote a little program to test T-API, and it turns out that GaussianBlur and Canny take much longer to execute with T-API. Here's the code. It loads an image and applies cvtColor followed by these two filters, first without and then with T-API:

    // Plain Mat path. `image` and `gray` are declared and loaded
    // earlier in the program (that part is not shown here).
    double totalTime = 0;

    int64 start = getTickCount();
    cvtColor(image, gray, COLOR_BGR2GRAY);
    double timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "cvtColor ms [" << timeMs << "]" << endl;

    start = getTickCount();
    GaussianBlur(gray, gray, Size(7, 7), 1.5);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "GaussianBlur ms [" << timeMs << "]" << endl;

    start = getTickCount();
    Canny(gray, gray, 0, 50);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "Canny ms [" << timeMs << "]" << endl;

    cout << "= Total [" << totalTime << "]" << endl;

    // TAPI path: same three calls, but on UMat
    cout << endl << "TAPI results" << endl;
    totalTime = 0;

    UMat uimage;
    UMat ugray;
    imread(argv[1], CV_LOAD_IMAGE_COLOR).copyTo(uimage);

    start = getTickCount();
    cvtColor(uimage, ugray, COLOR_BGR2GRAY);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "TAPI cvtColor ms [" << timeMs << "]" << endl;

    start = getTickCount();
    GaussianBlur(ugray, ugray, Size(7, 7), 1.5);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "TAPI GaussianBlur ms [" << timeMs << "]" << endl;

    start = getTickCount();
    Canny(ugray, ugray, 0, 50);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "TAPI Canny ms [" << timeMs << "]" << endl;

    cout << "= Total [" << totalTime << "]" << endl;

Here's the output:

    cvtColor ms [85.2414]
    GaussianBlur ms [0.424749]
    Canny ms [1.04261]
    = Total [86.7088]

    TAPI results
    TAPI cvtColor ms [5.89088]
    TAPI GaussianBlur ms [15.9865]
    TAPI Canny ms [17.2889]
    = Total [39.1663]

As you can see, cvtColor is much faster with T-API (5.9 ms vs. 85.2 ms), but GaussianBlur and Canny are much slower (16.0 ms vs. 0.4 ms and 17.3 ms vs. 1.0 ms). Can you please explain how that can be? Thank you.
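
Each figure in this post comes from a single, first call of each function, so it may include one-time setup cost (for the UMat path, possibly OpenCL kernel compilation and data transfer) and not only steady-state execution time. A minimal sketch of a timing helper that runs a few untimed warm-up calls and then averages several timed calls is shown below. The name `meanTimeMs` and its parameters are hypothetical and not part of OpenCV, and it uses only the C++ standard library rather than getTickCount.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not an OpenCV function): calls `op` `warmup` times
// without timing, then returns the mean wall-clock time in milliseconds
// over `runs` timed calls.
double meanTimeMs(const std::function<void()>& op, int warmup, int runs)
{
    assert(runs > 0);

    // Untimed calls: absorb any one-time initialization cost.
    for (int i = 0; i < warmup; ++i)
        op();

    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < runs; ++i)
        op();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;

    return elapsed.count() / runs;
}
```

It could be called with a lambda wrapping one operation, for example `meanTimeMs([&] { GaussianBlur(ugray, ublurred, Size(7, 7), 1.5); cv::ocl::finish(); }, 1, 20)`, where `ublurred` is an assumed separate output UMat. The `cv::ocl::finish()` call (from `opencv2/core/ocl.hpp`) is there on the assumption that T-API work may be queued asynchronously, so the timer should wait for the device to finish before stopping.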
