Image OCR using Python

Hello All,

Can anyone please help me extract the text from an image URL? I wrote Python code using Tesseract OCR, but I am not able to use that Python function in Robocorp. When I run it, it says “RETURN can only be used inside a user keyword.” I tried using it inside a keyword as well but still get the same error. Can anyone please help me with this? Or if someone has code for extracting text from an image URL in Robocorp, please send it to me.

It is unclear what you have tried. Could you also share the code?

import cv2
import pytesseract
import requests
import sys

# Point pytesseract at the local Tesseract install
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'

def download_image(image_url):
    image_path = 'C:/Users/npatel.WMP/Documents/Robots/table-challenge/output/2.jpg'
    response = requests.get(image_url, stream=True)
    if response.status_code == 200:
        with open(image_path, 'wb') as file:
            for chunk in response.iter_content(1024):
                file.write(chunk)
        print("Image downloaded successfully!")
    else:
        print("Failed to download the image.")
        return None

    image = cv2.imread(image_path)

    # Convert the image to grayscale
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Apply thresholding to preprocess the image
    threshold_image = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

    # Perform OCR using Tesseract
    text = pytesseract.image_to_string(threshold_image)

    print(text)
    # Return the text so the caller can use it
    return text

# Example usage
image_url = "https://rpachallengeocr.azurewebsites.net/invoices/10.jpg"
download_image(image_url)

I used this Python code. I need to pass it the URL, which I have in Robocorp as a variable, and get back the text that the Python code returns. The code ends at the download_image(image_url) call.

It is not clear where your “robocorp” variable is coming from. You could try:

from robot.libraries.BuiltIn import BuiltIn
url_path = BuiltIn().get_variable_value("$image_url")

Documentation link