I am using the portable tesseract-ocr executable with PSM 3 to run OCR on rectangular regions of a bitmap taken from my images. The problem is that the output has incorrect spacing for some words (e.g. "this group" becomes "thisgroup"). I tried resizing the bitmap to a larger size, which fixes that case, but then a different spacing problem appears (e.g. "apple" becomes "appl e"). The words in these examples are not the ones in my test file; due to company policy I cannot reveal the real ones. I suspect resizing the bitmap is not the best way to solve the spacing problem. Are there other methods I can try? I have also heard that there is a way to wrap the word in a white rectangle, but I have not found how to do it. Can someone give me a lead on the spacing problem?
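To make the "white rectangle" idea concrete: I assume it means padding the cropped region with a white margin before handing it to Tesseract, so the text does not touch the image edges. Below is a minimal sketch of that padding step only. It assumes the crop is an 8-bit grayscale bitmap stored as a list of rows with 255 as white; `add_white_border` and the 10-pixel default are hypothetical choices for illustration, not part of Tesseract, and a real pipeline would more likely do this with an imaging library.

```python
WHITE = 255  # assumption: 8-bit grayscale where 255 is white


def add_white_border(pixels, border=10, fill=WHITE):
    """Return a copy of a row-major grayscale bitmap surrounded by `fill` pixels.

    `pixels` is a list of equal-length rows of integer pixel values.
    This helper is hypothetical and only illustrates the padding step.
    """
    if border < 0:
        raise ValueError("border must be non-negative")
    if not pixels:
        return []
    width = len(pixels[0])
    if any(len(row) != width for row in pixels):
        raise ValueError("all rows must have the same width")

    new_width = width + 2 * border
    # Full-width white rows above and below the original content.
    top = [[fill] * new_width for _ in range(border)]
    bottom = [[fill] * new_width for _ in range(border)]
    # Original rows with white padding on the left and right.
    middle = [[fill] * border + list(row) + [fill] * border for row in pixels]
    return top + middle + bottom


if __name__ == "__main__":
    crop = [[0, 255, 0], [255, 0, 255]]  # tiny stand-in for a cropped word
    padded = add_white_border(crop, border=2)
    print(len(padded), len(padded[0]))  # height and width after padding
```

The padded bitmap would then be saved and passed to the tesseract executable in place of the raw crop. Whether this helps with the spacing issue would need to be checked against the real images.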