Here is the first video in our series showcasing Alfresco functionality.
OCR (Optical Character Recognition) is an important aspect of Alfresco ECM. If you are new to the term, or have not yet seen OCR in action in Alfresco, take a look.
If you are a developer and want to know more about the programming side, refer to our earlier blog post: Configuring OCR in Alfresco
OCR (Optical Character Recognition) is the recognition of printed or handwritten text characters by a computer. It recognizes characters in images or scanned documents, which makes images that contain text searchable. OCR is a very useful feature for any ECM product. In this post, we will see how to configure it in Alfresco Community Edition. We have tested this with Alfresco versions 5.1.f and 5.2.e; it should also work with nearby versions.
4. Place ocr.bat (Windows) or ocr.sh (Linux) in <ALFRESCO-HOME>/
a) ocr.bat (for Windows)
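@echo off
REM Create the temp folder if needed and log each call, to see what happens
if not exist C:\tmp mkdir C:\tmp
echo from %1 to %2 >> C:\tmp\ocrtransform.log
REM Copy the source image to C:\tmp, keeping its file name and extension
copy /Y "%~1" "C:\tmp\%~n1%~x1"
echo target %~d2%~p2%~n2
REM Call tesseract; it appends .txt itself, so pass the target without its extension
"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe" "C:\tmp\%~n1%~x1" "%~d2%~p2%~n2" -l eng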
b) ocr.sh (for Linux)
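#!/bin/sh
# Save the arguments passed by Alfresco: source image and target text file
SOURCE="$1"
TARGET="$2"
TMPDIR=/tmp/Tesseract
FILENAME=$(basename "$SOURCE")
OCRFILE="$FILENAME.tif"
# Create the temp directory if it doesn't exist
# (no sudo: the script runs as the Alfresco user, which cannot answer a sudo prompt)
mkdir -p "$TMPDIR"
# Uncomment to log each call and see what happens
#echo "from $SOURCE to $TARGET" >> /tmp/ocrtransform.log
cp -f "$SOURCE" "$TMPDIR/$OCRFILE"
# Call tesseract; it appends .txt itself, so pass $TARGET without its extension
/usr/local/bin/tesseract "$TMPDIR/$OCRFILE" "${TARGET%.*}" -l eng
# Alternative if tesseract is on the PATH (e.g. installed in /usr/bin):
#tesseract "$TMPDIR/$OCRFILE" "${TARGET%.*}" -l eng
# Remove the temporary copy
rm -f "$TMPDIR/$OCRFILE"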
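Tesseract appends `.txt` to the output name it is given, which is why both scripts pass the target path without its extension (`${TARGET%.*}` in ocr.sh, `%~d2%~p2%~n2` in ocr.bat). The sketch below isolates that name handling so it can be checked without Tesseract installed. The function names `ocr_output_base` and `ocr_output_file` are hypothetical helpers for illustration, not part of Alfresco or Tesseract, and the example target path is made up.

```shell
#!/bin/sh
# Hypothetical helper: strip the last extension from the target path,
# the same way ${TARGET%.*} does in ocr.sh.
ocr_output_base() {
    printf '%s\n' "${1%.*}"
}

# Hypothetical helper: the file tesseract will actually write,
# since it appends ".txt" to the base name it receives.
ocr_output_file() {
    printf '%s.txt\n' "$(ocr_output_base "$1")"
}

# Example with an illustrative target path:
ocr_output_base /tmp/alfresco/target.txt   # prints /tmp/alfresco/target
ocr_output_file /tmp/alfresco/target.txt   # prints /tmp/alfresco/target.txt
```

Like the expansion in ocr.sh, this removes everything after the last dot, so it assumes the target path always ends in an extension such as `.txt`; a path with a dot in a directory name and no file extension would be cut incorrectly.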
Note: Make sure that the path to the tesseract command is correct in ocr.sh / ocr.bat. Typical locations are:
Linux:
/usr/local/bin or /usr/bin
Windows:
C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
or C:\Program Files\Tesseract-OCR\tesseract.exe
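On Linux, a quick way to confirm which of the usual locations listed above actually holds the tesseract binary is to test each candidate for an executable file. This is a minimal sketch; `find_tesseract` is a hypothetical helper name, and you can pass your own candidate paths if Tesseract was installed elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate path that is an executable
# file. With no arguments it checks /usr/local/bin and /usr/bin.
find_tesseract() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/local/bin/tesseract /usr/bin/tesseract
    fi
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage: put the reported path into the tesseract line of ocr.sh
if path=$(find_tesseract); then
    echo "tesseract found at $path"
else
    echo "tesseract not found in the usual locations" >&2
fi
```

If Tesseract is on the PATH, `command -v tesseract` gives the same information; checking explicit paths is useful because the Alfresco service may run with a different PATH than your login shell.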
5. If the user that runs Alfresco does not have read and execute permissions on ocr.sh, grant them:
chmod +rx <ALFRESCO-HOME>/ocr.sh
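Step 5 is conditional, so it can be expressed as a check followed by `chmod` only when needed. A minimal sketch, assuming you run it as the same user that runs Alfresco (the `-r` and `-x` tests check the current user's access); `ensure_runnable` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: add read and execute permission to the OCR script
# only if the current user lacks them. Run it as the Alfresco user.
ensure_runnable() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "no such file: $script" >&2
        return 1
    fi
    # Already readable and executable for this user: nothing to do
    if [ -r "$script" ] && [ -x "$script" ]; then
        return 0
    fi
    chmod +rx "$script"
}

# Example (replace <ALFRESCO-HOME> with your installation directory):
# ensure_runnable <ALFRESCO-HOME>/ocr.sh
```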
6. Add the following properties to the alfresco-global.properties file located at
Run C:\<ALFRESCO-HOME>\tomcat\bin\startup.bat and press Enter,
or use manager-windows.exe.
Note: Existing files in Alfresco will not be OCRed; you have to upload new image files to test.
Important:
Make sure you are passing the correct arguments in the context file (the entries differ between Windows and Linux).
Check whether your .bat or .sh script works correctly when run on its own.
Verify that tesseract creates a text file for the image file.
To verify this, go to the directory where tesseract is installed and run the following command:
tesseract ./<image-file-name> ./<text-file-name> -l eng
If a text file is created with content in it (tesseract adds the .txt extension to the name you give), tesseract is working.
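The checks above can be combined into one test of the script itself: call ocr.sh with a source image and a target file, the way Alfresco does, and confirm the target ends up non-empty. A minimal sketch; `check_ocr_script` is a hypothetical helper, it invokes the script through `sh` (so it does not depend on the execute bit), and the paths in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: run the OCR script with (source image, target file)
# and report whether any text was produced.
# Usage: check_ocr_script <script> <image-file> <target-file.txt>
check_ocr_script() {
    script="$1"
    image="$2"
    target="$3"
    rm -f "$target"
    if ! sh "$script" "$image" "$target"; then
        echo "script exited with an error" >&2
        return 1
    fi
    # ocr.sh strips the extension and tesseract adds .txt back,
    # so a target ending in .txt should now exist and be non-empty.
    if [ -s "$target" ]; then
        echo "OK: text extracted to $target"
        return 0
    fi
    echo "no text written to $target" >&2
    return 1
}

# Example (paths are placeholders):
# check_ocr_script <ALFRESCO-HOME>/ocr.sh ./sample.png /tmp/sample.txt
```

The file check matters more than the exit status here: ocr.sh ends with `rm -f`, so it exits successfully even when tesseract fails.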
Comment here if your content is still not searchable. We are happy to hear about your ECM challenges, as we love solving them. Contact us!