Quantcast
Channel: OpenCV Q&A Forum - RSS feed
Viewing all articles
Browse latest Browse all 41027

solvePnP returning incorrect values

$
0
0
Hi there everyone. I'm currently trying to implement an alternate method to webcam-based AR using an external tracking system. I have everything in my environment configured save for the extrinsic calibration. I decided to use `cv::solvePnP()` as it supposedly does pretty much exactly I want, but after two weeks I am pulling my hair out trying to get it to work. As it stands I pass in my image pixel coordinates acquired with `cv::findChessboardCorners()`. The world points are acquired with reference to a tracked marker affixed to the camera (The extrinsic is thus the transform from this marker's frame to the camera origin). I have tested this with data sets up to 50 points to mitigate the possibility of local minima, but for now I'm testing with only four 2D/3D point pairs. The resulting extrinsic I get from the rvec and tvec returned from `cv::solvePnP()` is massively off in terms of both translation and rotation relative to both a ground truth extrinsic I manually created as well as a basic visual analysis (The translation implies a 1100mm distance while the camera is at most 10mm away). I'm honestly running out of options, so if anyone can help I would be hugely in your debt. My test code is posted below and is the same as my implementation minus some rendering calls. The ground truth extrinsic I have that works with my program is as follows (Basically a pure rotation around one axis and a small translation): 1 0 0 29 0 .77 -.64 32.06 0 .64 .77 -39.86 0 0 0 1 Thanks! #include #include int main() { int imageSize = 4; int markupsSize = 4; std::vector imagePoints; std::vector markupsPoints; double tempImage[3], tempMarkups[3]; // Temp variables or iterative data construction cv::Mat CamMat = (cv::Mat_(3,3) << (566.07469648019332), 0, 318.20416967732666, 0, (565.68051204299513), -254.95231997403764, 0, 0, 1); cv::Mat DistMat = (cv::Mat_(5,1) << -1.1310542849120900e-001, 4.5797249991542077e-001, 7.8356355644908070e-003, 3.7617039978623504e-003, -1.2734302146228518e+000); cv::Mat rvec = cv::Mat::zeros(3,1, cv::DataType::type); cv::Mat tvec = cv::Mat::zeros(3,1,cv::DataType::type); cv::Mat R; cv::Mat extrinsic = cv::Mat::eye(4, 4, CV_64F); // Escape if markup lists aren't equally sized if(imageSize != markupsSize) { //TODO: Replace with try, throw error code, and catch in qSlicerLocalizationModuleWidget return 0; } // Four principal chessboard corners only imagePoints.push_back(cv::Point2d(368.906, 248.123)); imagePoints.push_back(cv::Point2d(156.583, 252.414)); imagePoints.push_back(cv::Point2d(364.808, 132.384)); imagePoints.push_back(cv::Point2d(156.692, 128.289)); markupsPoints.push_back(cv::Point3d(495.115, 39.106, 93.79)); markupsPoints.push_back(cv::Point3d(463.143, -86.286, -51.178)); markupsPoints.push_back(cv::Point3d(500.236, 121.988, 24.056)); markupsPoints.push_back(cv::Point3d(471.276, -3.23, -127.809)); // Larger data set /*imagePoints.push_back(cv::Point2d(482.066, 233.778)); imagePoints.push_back(cv::Point2d(448.024, 232.038)); imagePoints.push_back(cv::Point2d(413.895, 230.785)); imagePoints.push_back(cv::Point2d(380.653, 229.242 )); imagePoints.push_back(cv::Point2d(347.983, 227.785)); imagePoints.push_back(cv::Point2d(316.103, 225.977)); imagePoints.push_back(cv::Point2d(284.02, 224.905)); imagePoints.push_back(cv::Point2d(252.929, 223.611)); imagePoints.push_back(cv::Point2d(483.41, 200.527)); imagePoints.push_back(cv::Point2d(449.456, 199.406)); imagePoints.push_back(cv::Point2d(415.843, 197.849)); imagePoints.push_back(cv::Point2d(382.59, 196.763)); imagePoints.push_back(cv::Point2d(350.094, 195.616)); imagePoints.push_back(cv::Point2d(317.922, 194.027)); imagePoints.push_back(cv::Point2d(286.922, 192.814)); imagePoints.push_back(cv::Point2d(256.006, 192.022)); imagePoints.push_back(cv::Point2d(484.292, 167.816)); imagePoints.push_back(cv::Point2d(450.678, 166.982)); imagePoints.push_back(cv::Point2d(417.377, 165.961)); markupsPoints.push_back(cv::Point3d(457.132, 59.822, 89.247)); markupsPoints.push_back(cv::Point3d(451.634, 42.015, 69.719)); markupsPoints.push_back(cv::Point3d(447.06, 22.927, 48.635)); markupsPoints.push_back(cv::Point3d(442.424, 4.454, 28.659)); markupsPoints.push_back(cv::Point3d(437.621, -14.395, 7.495)); markupsPoints.push_back(cv::Point3d(433.386, -33.034, -12.009)); markupsPoints.push_back(cv::Point3d(429.227, -51.001, -32.269)); markupsPoints.push_back(cv::Point3d(424.291, -70.266, -52.667)); markupsPoints.push_back(cv::Point3d(460.300, 79.769, 69.948)); markupsPoints.push_back(cv::Point3d(455.020, 61.379, 49.306)); markupsPoints.push_back(cv::Point3d(450.501, 43.288, 30.250)); markupsPoints.push_back(cv::Point3d(446.062, 24.572, 8.713)); markupsPoints.push_back(cv::Point3d(441.346, 5.823, -10.997)); markupsPoints.push_back(cv::Point3d(436.670, -13.135, -31.428)); markupsPoints.push_back(cv::Point3d(432.367, -31.428, -51.785)); markupsPoints.push_back(cv::Point3d(427.745, -50.016, -72.519)); markupsPoints.push_back(cv::Point3d(464.824, 101.129, 52.251)); markupsPoints.push_back(cv::Point3d(458.628, 81.864, 30.810)); markupsPoints.push_back(cv::Point3d(454.120, 63.546, 10.458)); */ // Calculate camera pose cv::solvePnP(cv::Mat(markupsPoints), cv::Mat(imagePoints), CamMat, DistMat, rvec, tvec); cv::Rodrigues(rvec, R); // Invert results of Rodrigues by transposing rotation matrix and calculating inverted tvec R = R.t(); tvec = -R * tvec; // translation of inverse std::cout << "Tvec = " << std::endl << tvec << std::endl; // Copy R and tvec into the extrinsic matrix extrinsic( cv::Range(0,3), cv::Range(0,3) ) = R * 1; extrinsic( cv::Range(0,3), cv::Range(3,4) ) = tvec * 1; // Fill last row of homogeneous transform (0,0,0,1) double *p = extrinsic.ptr(3); p[0] = p[1] = p[2] = 0; p[3] = 1; std::cout << "Extrinsic = " << std::endl << extrinsic << std::endl << std::endl; std::cout << "Extrinsic (inv) = " << std::endl << extrinsic.inv() << std::endl; std::cin >> tempImage[0]; return 0; } EDIT 1: I tried normalizing the pixel values using Chi Xu's method (xn = (x-cx)/f, yn = (y-cy)/f). No luck :( EDIT 2: As it seems that almost everyone who uses solvePnP uses a method where they mark the checkerboard corners as the vectors for their world frame and origin, I'm going to try registering my tracker to the checkerboard. If this works as expected, the first coordinate I mark will be approximately <0,0>. The resulting transform from solvePnP will then be multiplied by the inverse of the world-to-camera-marker transformation, thus resulting in (hopefully) the extrinsic matrix. EDIT 3: I performed the steps described in step 2. After establishing a coordinate system on the checkerboard, I calculated the transform cTw from the checkerboard space to the world space and verified it in my rendering environment. I then calculated the extrinsic mTc representing the transform from the camera space to the checkerboard space. The camera marker transform wTr was acquired from the tracking system. Finally, I took all of the transforms and multiplied them as follows to move my transformation all the way from camera origin to camera marker: mTc*cTw*wTr Lo and behold it gave the the exact same result. I'm dying here for any sign of what I'm doing wrong. If anyone has any clue, I beg of you to help.

Viewing all articles
Browse latest Browse all 41027

Trending Articles