Friday, February 17, 2017

Reading text from Images using Java

This post shows how to read text from images in Java. It uses Tess4J, a Java wrapper for the Tesseract OCR library.
You can also use the module below to check whether the captcha on your site is strong enough to resist simple OCR.

Reference:
https://github.com/tesseract-ocr/tessdata
http://stackoverflow.com/questions/18095708/tess4j-doesnt-use-its-tessdata-folder

Language Used:
Java

Git Location:
https://github.com/csanuragjain/extra/tree/master/ReadFromImages

POM Dependency:
 <!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->  
 <dependency>  
   <groupId>net.sourceforge.tess4j</groupId>  
   <artifactId>tess4j</artifactId>  
   <version>3.2.1</version>  
 </dependency>  

Pre-requisite:
1) Assume you are running this program from c:\myprogram. Follow either of the two methods below, depending on your requirements.

Space saving method: (You download only the language data you need. The English dataset needs only about 30MB.)
2) Create a folder named tessdata inside c:\myprogram\
3) Navigate to https://github.com/tesseract-ocr/tessdata
4) Download eng.traineddata to break captchas in English (trained data is available for other languages as well)
5) Place eng.traineddata inside the tessdata folder.
6) Your folder structure should now look like c:\myprogram\tessdata\eng.traineddata

Time saving method: (Download trained data for several languages at once. This takes at least 1GB of space.)
7) Alternatively, skip Step 2 to Step 5 and download tessdata-master.zip from https://github.com/tesseract-ocr/tessdata
8) Unzip the contents of tessdata-master.zip into your main project folder (here, c:\myprogram\)
9) Rename tessdata-master to tessdata
10) Your folder structure should now look like c:\myprogram\tessdata\<trained data for several languages>
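
Before running the OCR code, it can be useful to confirm that the folder layout described above is in place. The following is a minimal sketch that uses only the JDK. TessdataCheck and hasLanguageData are hypothetical names introduced for illustration; they are not part of Tess4J or of the project on GitHub.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Tess4J or the original project.
final class TessdataCheck {

    // Returns true if <projectDir>/tessdata/<language>.traineddata exists as a regular file.
    static boolean hasLanguageData(Path projectDir, String language) {
        Path trained = projectDir.resolve("tessdata").resolve(language + ".traineddata");
        return Files.isRegularFile(trained);
    }

    public static void main(String[] args) {
        // Uses the first argument as the project folder, or the current directory if none is given.
        Path projectDir = Paths.get(args.length > 0 ? args[0] : ".");
        if (hasLanguageData(projectDir, "eng")) {
            System.out.println("Found tessdata/eng.traineddata under " + projectDir.toAbsolutePath());
        } else {
            System.out.println("Missing tessdata/eng.traineddata under " + projectDir.toAbsolutePath());
        }
    }
}
```

Run it from c:\myprogram (or pass that path as the first argument) to check the structure from Step 6 or Step 10.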

Program:

ImageCracker class, crackImage method:
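 public static String crackImage(String filePath) {  
   File imageFile = new File(filePath);  
   // Create the OCR engine through the Tess4J interface  
   ITesseract instance = new Tesseract();  
   try {  
     // Run OCR on the image and return the recognised text  
     String result = instance.doOCR(imageFile);  
     return result;  
   } catch (TesseractException e) {  
     // Report the failure and return a placeholder string  
     System.err.println(e.getMessage());  
     return "Error while reading image";  
   }  
 }  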

How it works:
1) crackImage takes the path of the image that needs to be read
2) We point a File object to that image
3) We create a Tesseract object named instance
4) We call the doOCR method of the Tess4J library, passing the File object from step 2
5) doOCR returns the text read from the image, and crackImage returns it to the caller
6) In case of failure, crackImage prints the error message and returns an error string
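
Since crackImage returns the same error string for every failure, it may help to check the input file before handing it to doOCR. Below is a minimal sketch using only the JDK's ImageIO. ImagePreflight and isReadableImage are hypothetical names for illustration, and passing this check only means the JDK can decode the file; it is an assumption, not a guarantee, that Tesseract will then handle it.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of Tess4J or the original project.
final class ImagePreflight {

    // Returns true if the path is an existing file that the JDK can decode as an image.
    static boolean isReadableImage(File file) {
        if (!file.isFile()) {
            return false;
        }
        try {
            // ImageIO.read returns null when no registered reader understands the file.
            BufferedImage image = ImageIO.read(file);
            return image != null;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the image name used in this post when no argument is given.
        File file = new File(args.length > 0 ? args[0] : "testImage.PNG");
        System.out.println(file + " readable as image: " + isReadableImage(file));
    }
}
```

A caller could run this check first and only call crackImage when it returns true.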

Driver class, main method:
 public static void main(String[] args) {  
   // Print the text recognised in testImage.PNG  
   System.out.println(ImageCracker.crackImage("testImage.PNG"));  
 }  

How it works:
1) We call the crackImage method, passing the path of the image to read.
2) We print the text returned by the method on the console.

Input Image (testImage.PNG):
Output:
Create a Youtube metadata crawler using Java

Full Program:

ImageCracker class
 package com.cooltrickshome;  
 import java.io.File;  
 import net.sourceforge.tess4j.*;  
 public class ImageCracker {  
   public static String crackImage(String filePath) {  
     File imageFile = new File(filePath);  
     ITesseract instance = new Tesseract();   
     try {  
       String result = instance.doOCR(imageFile);  
       return result;  
     } catch (TesseractException e) {  
       System.err.println(e.getMessage());  
       return "Error while reading image";  
     }  
   }  
 }  

Driver class:
 package com.cooltrickshome;  
 public class Driver {  
      /**  
       * @param args  
       */  
      public static void main(String[] args) {  
           // TODO Auto-generated method stub  
           System.out.println(ImageCracker.crackImage("testImage.PNG"));  
      }  
 }  

Hope it helps :)

7 comments:

  1. I'm getting this error. I already downloaded tess4j jar. Exception in thread "main" java.lang.NoClassDefFoundError: com.sun.jna.Pointer

  2. Kindly have a look at https://stackoverflow.com/questions/44511562/tess4j-mac-noclassdeffounderror
    This should resolve the issue

  4. Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource

  5. it solved my problem

    sudo apt-get install tesseract-ocr

    You can get more info https://stackoverflow.com/questions/18419504/java-tesseract-error-in-linux-unable-to-load-library-tesseract-libtesseract/24995741
