GaussianBlur and Canny execution times are much longer on T-API

Hello. I've just started to learn OpenCV 3. I'm on OS X Yosemite. Here's the GPU part of my clinfo output:

    Device Name                                     GeForce GT 330M
    Device Vendor                                   NVIDIA
    Device Vendor ID                                0x1022600
    Device Version                                  OpenCL 1.0
    Driver Version                                  10.0.31 310.90.10.05b12
    Device OpenCL C Version                         OpenCL C 1.1
    Device Type                                     GPU
    Device Profile                                  FULL_PROFILE
    Max compute units                               6
    Max clock frequency                             1100MHz
    Max work item dimensions                        3
    Max work item sizes                             512x512x64
    Max work group size                             512
    Preferred work group size multiple              32
    Preferred / native vector sizes
      char                                          1 / 1
      short                                         1 / 1
      int                                           1 / 1
      long                                          1 / 1
      half                                          0 / 0 (n/a)
      float                                         1 / 1
      double                                        0 / 0 (n/a)
    Half-precision Floating-point support           (n/a)
    Single-precision Floating-point support         (core)
      Denormals                                     No
      Infinity and NANs                             Yes
      Round to nearest                              Yes
      Round to zero                                 Yes
      Round to infinity                             Yes
      IEEE754-2008 fused multiply-add               No
      Support is emulated in software               No
      Correctly-rounded divide and sqrt operations  No
    Double-precision Floating-point support         (n/a)
    Address bits                                    32, Little-Endian
    Global memory size                              268435456 (256MiB)
    Error Correction support                        No
    Max memory allocation                           134217728 (128MiB)
    Unified memory for Host and Device              No
    Minimum alignment for any data type             128 bytes
    Alignment of base address                       1024 bits (128 bytes)
    Global Memory cache type                        None
    Image support                                   Yes
      Max number of samplers per kernel             16
      Max 2D image size                             4096x4096 pixels
      Max 3D image size                             2048x2048x2048 pixels
      Max number of read image args                 128
      Max number of write image args                8
    Local memory type                               Local
    Local memory size                               16384 (16KiB)
    Max constant buffer size                        65536 (64KiB)
    Max number of constant args                     9
    Max size of kernel argument                     4352 (4.25KiB)
    Queue properties
      Out-of-order execution                        No
      Profiling                                     Yes
    Profiling timer resolution                      1000ns
    Execution capabilities
      Run OpenCL kernels                            Yes
      Run native kernels                            No
    Device Available                                Yes
    Compiler Available                              Yes
    Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

I wrote a little program to test T-API, and it turns out that GaussianBlur and Canny take much longer to execute on T-API. Here's the code. It loads an image and applies cvtColor, GaussianBlur and Canny to it, first without and then with T-API:

    double totalTime = 0;

    // Plain Mat path
    int64 start = getTickCount();
    cvtColor(image, gray, COLOR_BGR2GRAY);
    double timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "cvtColor ms [" << timeMs << "]" << endl;

    start = getTickCount();
    GaussianBlur(gray, gray, Size(7, 7), 1.5);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "GaussianBlur ms [" << timeMs << "]" << endl;

    start = getTickCount();
    Canny(gray, gray, 0, 50);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "Canny ms [" << timeMs << "]" << endl;

    cout << "= Total [" << totalTime << "]" << endl;

    // TAPI
    cout << endl << "TAPI results" << endl;
    totalTime = 0;

    UMat uimage;
    UMat ugray;
    imread(argv[1], CV_LOAD_IMAGE_COLOR).copyTo(uimage);

    start = getTickCount();
    cvtColor(uimage, ugray, COLOR_BGR2GRAY);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "TAPI cvtColor ms [" << timeMs << "]" << endl;

    start = getTickCount();
    GaussianBlur(ugray, ugray, Size(7, 7), 1.5);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "TAPI GaussianBlur ms [" << timeMs << "]" << endl;

    start = getTickCount();
    Canny(ugray, ugray, 0, 50);
    timeMs = (getTickCount() - start) / getTickFrequency() * 1000;
    totalTime += timeMs;
    cout << "TAPI Canny ms [" << timeMs << "]" << endl;

    cout << "= Total [" << totalTime << "]" << endl;

Here's the output:

    cvtColor ms [85.2414]
    GaussianBlur ms [0.424749]
    Canny ms [1.04261]
    = Total [86.7088]

    TAPI results
    TAPI cvtColor ms [5.89088]
    TAPI GaussianBlur ms [15.9865]
    TAPI Canny ms [17.2889]
    = Total [39.1663]

As you can see, cvtColor is much faster on T-API (about 5.9 ms versus 85.2 ms), but GaussianBlur and Canny are much slower (about 16.0 ms versus 0.42 ms, and 17.3 ms versus 1.04 ms). Can you please explain how that can be? Thank you.
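One thing that may be worth ruling out, offered as an assumption rather than a diagnosis: the program above times each call exactly once, so any one-time setup cost (for example, OpenCL kernel compilation on the first UMat call, or library initialization on the very first cvtColor) lands inside the measurement. A minimal sketch of a timing helper that runs a few untimed warm-up calls and then averages several timed ones is shown below. `meanTimeMs` is a hypothetical name, not an OpenCV function, and the warm-up and run counts are arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV): calls `op` `warmup` times without
// timing, then returns the mean wall-clock time in milliseconds of `runs`
// further calls.
double meanTimeMs(const std::function<void()>& op, int warmup = 3, int runs = 10)
{
    assert(warmup >= 0 && runs > 0);

    // Untimed calls, intended to absorb any one-time setup cost.
    for (int i = 0; i < warmup; ++i) {
        op();
    }

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        op();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

With the variables from the program above, a call might look like `meanTimeMs([&] { GaussianBlur(ugray, ublur, Size(7, 7), 1.5); cv::ocl::finish(); })`, where `ublur` is a separate output UMat introduced here for illustration, and `cv::ocl::finish()` (from `opencv2/core/ocl.hpp`) is there so that queued OpenCL work has completed before the clock stops. Whether this changes the picture on a GT 330M is not something the numbers in the post can confirm.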
