This post shows how to read text from images in Java. It uses the Tess4J library, a Java wrapper for the Tesseract OCR engine.
You can also use the module below to check whether the captcha on your site is strong enough that it cannot easily be broken.
Reference:
https://github.com/tesseract-ocr/tessdata
http://stackoverflow.com/questions/18095708/tess4j-doesnt-use-its-tessdata-folder
Language Used:
Java
Git Location:
https://github.com/csanuragjain/extra/tree/master/ReadFromImages
POM Dependency:
<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
<groupId>net.sourceforge.tess4j</groupId>
<artifactId>tess4j</artifactId>
<version>3.2.1</version>
</dependency>
Prerequisites:
1) Assume you are running this program from c:\myprogram. You can then follow either of two methods, depending on your requirements.
Space-saving method (download only the language data you need; the English data alone requires only about 30 MB):
2) Create a folder named tessdata inside c:\myprogram\
3) Navigate to https://github.com/tesseract-ocr/tessdata
4) Download eng.traineddata to read English-language captchas (trained data is available for other languages as well).
5) Place eng.traineddata inside the tessdata folder.
6) Your folder structure should now look like c:\myprogram\tessdata\eng.traineddata
Time-saving method (downloads trained data for many languages and consumes at least 1 GB of space):
7) Alternatively, skip steps 2 to 5 and download the whole repository as tessdata-master.zip from https://github.com/tesseract-ocr/tessdata
8) Unzip the contents of tessdata-master.zip into your main project folder (here, c:\myprogram\).
9) Rename tessdata-master to tessdata.
10) Your folder structure should now look like c:\myprogram\tessdata\<trained data for several languages>
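Before running the program, it can help to confirm that the trained data is where the steps above put it. The sketch below assumes, as those steps do, that the program is started from the project folder, so the tessdata folder sits inside the working directory; the class and method names are hypothetical and not part of Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TessdataCheck {

    // Location used by the prerequisite steps: <baseDir>/tessdata/<lang>.traineddata
    static Path trainedDataFile(Path baseDir, String lang) {
        return baseDir.resolve("tessdata").resolve(lang + ".traineddata");
    }

    // True if the trained-data file for the given language exists under baseDir.
    static boolean hasTrainedData(Path baseDir, String lang) {
        return Files.isRegularFile(trainedDataFile(baseDir, lang));
    }

    public static void main(String[] args) {
        // Assumption: the program is started from the project folder (e.g. c:\myprogram),
        // so the current working directory is the base directory.
        Path baseDir = Paths.get("").toAbsolutePath();
        Path expected = trainedDataFile(baseDir, "eng");
        if (hasTrainedData(baseDir, "eng")) {
            System.out.println("Found " + expected);
        } else {
            System.out.println("Missing " + expected + ", complete the prerequisite steps first");
        }
    }
}
```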
Program:
ImageCracker class, crackImage method:
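public static String crackImage(String filePath) {
    // Point a File object at the image to be read
    File imageFile = new File(filePath);
    // Tess4J's Tesseract OCR engine
    ITesseract instance = new Tesseract();
    try {
        // Run OCR on the image and return the recognized text
        String result = instance.doOCR(imageFile);
        return result;
    } catch (TesseractException e) {
        // Report the failure and return an error string instead of the text
        System.err.println(e.getMessage());
        return "Error while reading image";
    }
}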
How it works:
1) crackImage takes the path of the image to be read.
2) We create a File object pointing to that image.
3) We create a Tesseract object named instance.
4) We call the library's doOCR method, passing the File object from step 2.
5) doOCR returns the text read from the image, and crackImage returns that text.
6) If OCR fails, the error message is printed and an error string is returned instead.
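Because crackImage returns "Error while reading image" on failure, a caller cannot tell that string apart from text actually read from an image. Below is a minimal sketch of the same steps that returns an Optional instead. The OcrEngine interface is a hypothetical stand-in for Tess4J's ITesseract so the example compiles without the library, and the fake engine in main is for illustration only.

```java
import java.io.File;
import java.util.Optional;

final class SafeImageCracker {

    // Hypothetical stand-in for the single Tess4J call used here, ITesseract.doOCR(File).
    @FunctionalInterface
    interface OcrEngine {
        String doOCR(File imageFile) throws Exception;
    }

    static Optional<String> crackImage(String filePath, OcrEngine engine) {
        // Step 2: point a File object at the image (relative paths use the working directory)
        File imageFile = new File(filePath);
        if (!imageFile.isFile()) {
            System.err.println("Image not found: " + imageFile.getAbsolutePath());
            return Optional.empty();
        }
        try {
            // Steps 4-5: run OCR and hand back the recognized text
            return Optional.ofNullable(engine.doOCR(imageFile));
        } catch (Exception e) {
            // Step 6: report the failure; the caller sees an empty Optional
            System.err.println(e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Fake engine for illustration only. With Tess4J on the classpath, a method
        // reference such as instance::doOCR (instance being an ITesseract) should fit
        // this interface, but that is an untested assumption.
        OcrEngine fake = file -> "text read from " + file.getName();
        Optional<String> text = crackImage(args.length > 0 ? args[0] : "testImage.PNG", fake);
        System.out.println(text.orElse("<could not read image>"));
    }
}
```

Depending on a small interface rather than constructing Tesseract inside the method also lets you exercise the calling code without the native Tesseract library installed.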
Driver class, main method:
public static void main(String[] args) {
    // Read the image and print the recognized text
    System.out.println(ImageCracker.crackImage("testImage.PNG"));
}
How it works:
1) We call crackImage, passing the path of the image to read.
2) We print the returned text to the console.
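The driver reads testImage.PNG relative to the directory the program is started from. To try other images without recompiling, main could take the path as a command-line argument. A minimal sketch; DriverArgs and imagePath are hypothetical names, and the actual OCR call is left as a comment so the example runs on its own.

```java
import java.io.File;

final class DriverArgs {

    // Hypothetical helper: first command-line argument if given, otherwise the
    // file name hard-coded in the post's Driver.
    static String imagePath(String[] args) {
        return (args != null && args.length > 0) ? args[0] : "testImage.PNG";
    }

    public static void main(String[] args) {
        File image = new File(imagePath(args));
        // Relative paths are resolved against the working directory, so print the
        // absolute path to make "file not found" problems easy to spot.
        if (!image.isFile()) {
            System.err.println("Image not found: " + image.getAbsolutePath());
            return;
        }
        // In the post's Driver this is where the OCR happens:
        // System.out.println(ImageCracker.crackImage(image.getPath()));
        System.out.println("Reading " + image.getAbsolutePath());
    }
}
```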
Input Image (testImage.PNG):
Output:
Create a Youtube metadata crawler using Java
Full Program:
ImageCracker class
package com.cooltrickshome;
import java.io.File;
import net.sourceforge.tess4j.*;
public class ImageCracker {
    public static String crackImage(String filePath) {
        File imageFile = new File(filePath);
        ITesseract instance = new Tesseract();
        try {
            String result = instance.doOCR(imageFile);
            return result;
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
            return "Error while reading image";
        }
    }
}
Driver class:
package com.cooltrickshome;
public class Driver {
    public static void main(String[] args) {
        // Read testImage.PNG and print the recognized text
        System.out.println(ImageCracker.crackImage("testImage.PNG"));
    }
}
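To use this program for the captcha-strength check mentioned at the start, you could run it over a set of captcha images whose answers you know and count how many it reads correctly. A minimal sketch under these assumptions: the sample file names are hypothetical, whitespace and case are ignored when comparing (adjust this for case-sensitive captchas), and a fake OCR function stands in for ImageCracker::crackImage so the example runs without Tess4J.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class CaptchaStrengthCheck {

    // Fraction of samples whose OCR text matches the known answer.
    // `samples` maps an image path to the answer that image should contain;
    // `ocr` stands in for ImageCracker::crackImage.
    static double breakRate(Map<String, String> samples, Function<String, String> ocr) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        int broken = 0;
        for (Map.Entry<String, String> sample : samples.entrySet()) {
            String read = compact(ocr.apply(sample.getKey()));
            if (!read.isEmpty() && read.equals(compact(sample.getValue()))) {
                broken++;
            }
        }
        return (double) broken / samples.size();
    }

    // Drop all whitespace (OCR output may contain stray spaces and line breaks)
    // and ignore case; the case-insensitivity is an assumption about the captcha.
    private static String compact(String s) {
        return s == null ? "" : s.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("captcha1.png", "x7kq9"); // hypothetical sample files and answers
        samples.put("captcha2.png", "m3pz2");
        // Fake OCR for illustration; with the post's code you would pass
        // ImageCracker::crackImage instead.
        Function<String, String> fakeOcr =
                path -> path.equals("captcha1.png") ? "x7 kq9\n" : "m3p22\n";
        System.out.printf("Break rate: %.0f%%%n", 100 * breakRate(samples, fakeOcr));
    }
}
```

A break rate well above zero suggests the captcha can be read by off-the-shelf OCR. A low rate only shows that this particular setup failed, not that the captcha is safe against more determined attackers.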
Hope it helps :)
I'm getting this error even though I already downloaded the tess4j jar: Exception in thread "main" java.lang.NoClassDefFoundError: com.sun.jna.Pointer
Kindly have a look at https://stackoverflow.com/questions/44511562/tess4j-mac-noclassdeffounderror
This should resolve the issue.
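A NoClassDefFoundError means a class that some code was compiled against cannot be found at run time; com.sun.jna.Pointer belongs to JNA, which Tess4J uses to call the native Tesseract library. One way to confirm what is actually on the runtime classpath is a small check like the one below. ClasspathCheck is a hypothetical helper, not part of Tess4J.

```java
final class ClasspathCheck {

    // True if the named class can be located by this class's loader.
    // initialize=false avoids running the class's static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // If com.sun.jna.Pointer is missing, the JNA jar is not on the classpath;
        // building with the tess4j Maven dependency normally pulls it in transitively.
        String[] required = {"com.sun.jna.Pointer", "net.sourceforge.tess4j.Tesseract"};
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": MISSING"));
        }
    }
}
```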
Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource
sudo apt-get install tesseract-ocr
You can get more info at https://stackoverflow.com/questions/18419504/java-tesseract-error-in-linux-unable-to-load-library-tesseract-libtesseract/24995741
It solved my problem.