Screen OCR sample

Here is some code that uses Tesseract OCR and pyautogui to click on a given word on screen.

This example was only a learning exercise for me and is not meant for any real use.

conda.yaml

channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.7.5
  - pip=20.1
  - numpy
  - tesseract
  - pytesseract
  - opencv
  - pyautogui

mini_ocr.py

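import cv2
import numpy as np
import pytesseract
import pyautogui


def click(search):
    """Click the centre of every on-screen word that contains `search`."""
    # Take a screenshot and convert it from RGB to OpenCV's BGR channel order.
    image = pyautogui.screenshot()
    image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)

    # Run OCR; the result is a dict of parallel lists, one entry per detected box.
    d = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

    # The screenshot can be larger than the logical screen (e.g. HiDPI displays),
    # so scale image coordinates to screen coordinates.
    dx = pyautogui.size().width / image.shape[1]
    dy = pyautogui.size().height / image.shape[0]

    for text, conf, x, y, w, h in zip(d['text'], d['conf'], d['left'], d['top'], d['width'], d['height']):
        # conf may be an int, a float, or a numeric string depending on the pytesseract version.
        if float(conf) > 50 and search.lower() in text.lower():
            pyautogui.click(int(dx * (x + w // 2)), int(dy * (y + h // 2)))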
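
The matching and scaling steps can also be tried without a screen or a mouse by pulling them into a pure function. This is only a rough sketch: the name find_word_centers, its parameters, and the sample dictionary are made up for illustration, and it assumes the dictionary layout returned by pytesseract.image_to_data with Output.DICT.

```python
def find_word_centers(data, search, screen_size, image_size, min_conf=50):
    """Return screen coordinates of the centre of each OCR word containing `search`.

    `screen_size` and `image_size` are (width, height) tuples.
    """
    # Scale factors from screenshot pixels to logical screen coordinates.
    sx = screen_size[0] / image_size[0]
    sy = screen_size[1] / image_size[1]

    centers = []
    rows = zip(data['text'], data['conf'], data['left'],
               data['top'], data['width'], data['height'])
    for text, conf, x, y, w, h in rows:
        # float() accepts ints, floats, and numeric strings such as '96.5'.
        if float(conf) > min_conf and search.lower() in text.lower():
            centers.append((int(sx * (x + w // 2)), int(sy * (y + h // 2))))
    return centers


# Hypothetical OCR output for a 2880x1800 screenshot of a 1440x900 screen.
sample = {
    'text': ['', 'Documentation', 'Docs'],
    'conf': ['-1', '96.5', '30'],
    'left': [0, 200, 400],
    'top': [0, 40, 40],
    'width': [2880, 240, 80],
    'height': [1800, 40, 40],
}

centers = find_word_centers(sample, 'documentation', (1440, 900), (2880, 1800))
print(centers)  # [(160, 30)]
```

Only the word "Documentation" passes both the confidence threshold and the text match, and its centre is halved because the sample screenshot is twice the size of the logical screen.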

tasks.robot

*** Settings ***
Library         mini_ocr.py
Library         RPA.Browser

*** Tasks ***
Mini
    Open Available Browser    https://robocorp.com
    Click    Documentation