Downloading CSV files with headless Chrome using Python

My script works perfectly in Chrome, but when I launch the browser with “open_headless_chrome_browser”, the script executes everything except downloading the CSV from a click button command.

Hello, @klagster and welcome to our forum.

Can you give more details about what you are doing? For example:

  • is error handling in place, and what error do you see when your script fails?
  • what is in your robot.yaml and conda.yaml?
  • what library are you using?
  • can you share your source code, so that we can see what is going on?
  • what tooling are you using to run this? VS Code? Agent? Pure rcc run?
  • does this happen on your machine, or in a cloud container?

Thanks jippo

I’ve printed the steps completed; the code is not throwing any errors. Not sure how to dig deeper…
I’m running from PowerShell under VS Code.
No robot.yaml or conda.yaml.

It happens on a Windows 11 Parallels VM, running on a local (Arm) MacBook.

python 3.10.5
robotframework 5.01
pip 22.1.2
chrome Version 103.0.5060.114 (Official Build) (32-bit)

VS Code Info:
Version: 1.69.0 (user setup)
Commit: 92d25e35d9bf1a6b16f7d0758f25d48ace11e5b9
Date: 2022-07-07T05:28:32.764Z
Electron: 18.3.5
Chromium: 100.0.4896.160
Node.js: 16.13.2
V8: 10.0.139.17-electron.0
OS: Windows_NT arm64 10.0.22000

from RPA.Browser.Selenium import Selenium
from RPA.FileSystem import FileSystem
import time

browser_lib = Selenium()

def is_downloaded(file_str, wt):
    # Wait up to wt seconds for a file matching file_str that was
    # modified after this function was called.
    time_stamp = time.time()
    download_time = 0
    time_waited = 0
    last_csv_created_name = 'No File'

    while download_time < time_stamp and time_waited <= wt:
        dir_query = FileSystem().find_files(file_str)
        if len(dir_query) > 0:
            # use the last match returned by find_files
            download_time = dir_query[-1].mtime
            last_csv_created_name = dir_query[-1].name
        time.sleep(1)
        time_waited += 1
    if time_waited > wt:
        return 'No File'
    else:
        return last_csv_created_name

def robot_work(start_date, end_date, browser_type):

    it_worked = False
    filename = 'No File'  # default, so the return below never hits an unbound name
    file_dir = "//Mac/Home/Downloads"
    new_file_filter = "//Mac/Home/Downloads/Invoices - *.csv"
    url = "https://www.lbmx.com/portal/"
    uid = "5LL9Z5M8"
    pwd = "DHBNFH2D"

    try:
        # Open browser
        if browser_type == 'Headless':
            browser_lib.open_headless_chrome_browser(url)
        else:
            browser_lib.open_chrome_browser(url)

        browser_lib.set_window_size(1980, 1080)
        #browser_lib.set_download_directory(file_dir)
        # switched to "Browser.set.. as Page.setDownloadBehavior is deprecated.. didn't work
        #params = {'behavior': 'allow', 'downloadPath': file_dir}
        #browser_lib.execute_cdp('Page.setDownloadBehavior', params)

        # Enter credentials
        uid_input_field = "id:username"
        pwd_input_password = "id:password"
        login_button = "name:login"
        browser_lib.input_text(uid_input_field, uid)
        browser_lib.input_text(pwd_input_password, pwd)
        browser_lib.click_button(login_button)

        # Navigate to starting point
        browser_lib.go_to("https://orders.lbmx.com/DocumentReceived/Invoice")

        # Set date query
        print(browser_lib.click_element("link:More"))
        browser_lib.input_text("name:dateAddedFrom", start_date)
        browser_lib.input_text("name:dateAddedTo", end_date)
        browser_lib.click_button('xpath://*[@id="invoiceSearchContainer"]/div[2]/div[1]/div/div[1]/div[2]/div/div/div[1]/button')
        browser_lib.wait_until_element_contains('xpath://*[@id="invoiceList"]/div[4]/ul/li[2]/span', '1', 15)
    except Exception:
        print("Something unexpected went wrong")
    finally:
        print("query complete")
        it_worked = True
    try:
        # Get CSV
        browser_lib.click_element("id:toolbarActions")
        browser_lib.click_button('xpath://*[@id="divGridToolbarButtonLeft"]/div/ul/li[5]/button')
        filename = is_downloaded(new_file_filter, 15)

        if filename == 'No File':
            it_worked = False
            print("the file didnt save")
        else:
            it_worked = True
            print("file saved")
    except Exception:
        print("Something went wrong with the download process")
        it_worked = False

    return [it_worked, filename]

print('Normal:', robot_work('2022-07-01', '2022-07-11', 'Other'))
print('Headless:', robot_work('2022-07-01', '2022-07-11', 'Headless'))

Reproduced on Mac as well.

macOS 12.4, Python 3.10.5, rpaframework 15.3.0

First of all, I hope your “uid” and “pwd” were not your real credentials, since this is a public forum. In general, it is not recommended to put credentials directly inside your scripts.
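One common way to keep credentials out of the script is to read them from environment variables. This is only a minimal sketch: the variable names LBMX_UID and LBMX_PWD and the helper load_credentials are made up for illustration, and a vault or secrets manager may suit a real deployment better.

```python
import os


def load_credentials(env=os.environ):
    """Return (uid, pwd) read from the environment.

    LBMX_UID and LBMX_PWD are hypothetical variable names chosen
    for this example; they are not defined by any library.
    """
    try:
        return env["LBMX_UID"], env["LBMX_PWD"]
    except KeyError as missing:
        # Fail early with a readable message instead of logging in with blanks.
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None
```

In the script above, `uid, pwd = load_credentials()` would then replace the two hard-coded assignments.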

About your problem: if you are not using our tooling, don’t have “robot.yaml” and “conda.yaml” in place, and are managing your environments manually, then there is not much I can help you with.

I can try to give you some hints:

  • one problem might be that you are using a fairly recent version of Python, 3.10, which might fail with some libraries (our tested version is 3.7.5)
  • add better error handling; right now the script just prints that something went wrong, but then continues
  • you might need to add verification that login was successful (like waiting until something is visible), and for other steps as well
  • in general you should alternate between verification and action, so that the robot first checks that it can actually do an action, and after the action is done, checks that the result is as expected, so that the next step can be done
  • and start using our tooling, so that you have full control over which versions are used and your environments are repeatable and not manually managed
  • and if you have not done so, check out our courses on RPA topics: RPA training courses and certifications
  • also check out the videos about writing robots in Python at https://www.youtube.com/c/Robocorp/videos

Hopefully at least some of the ideas above can help you move your project forward.
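The "alternate action and verification" hint above could look roughly like the sketch below. The helper do_and_verify is a made-up name and deliberately knows nothing about the browser: it just takes two callables, so it is a generic illustration and not part of rpaframework.

```python
import time


def do_and_verify(action, verify, timeout=10.0, poll=0.5):
    """Run action(), then poll verify() until it is truthy or timeout expires.

    Raises TimeoutError so a failed step stops the robot instead of
    letting it carry on with the next step.
    """
    action()
    deadline = time.monotonic() + timeout
    while True:
        if verify():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"step was not confirmed within {timeout}s")
        time.sleep(poll)


# Hypothetical usage with the script from this thread. The locator used
# for verification is an assumption about what is visible after login:
# do_and_verify(
#     lambda: browser_lib.click_button("name:login"),
#     lambda: browser_lib.is_element_visible("link:More"),
# )
```

Raising on failure also covers the error handling hint, since the script no longer continues after a step that did not succeed.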


I’ll add yaml files at some point as I move this to deployment.

There doesn’t seem to be much value in the tooling for pure Python projects, as it seems to be designed to support .robot projects and I’ve no interest in going there. I’m sure .robot is dandy (I’ve tried it), but my current goal is to run an executable on one Windows computer, launched manually by one user. Implementing/learning rcc for this purpose isn’t really worth it. Perhaps I’m wrong here? I’m not sure what additional information the tooling can gather from the one line of code that’s not working.

Basically my script was working fine, with the exception of the original issue I wrote to you about. I took the robot work and put it into a small program to reproduce the problem for review.

I’ll try downgrading Python, and if it doesn’t work, I’ll report back and rewrite the robot to scrape the data off the screen.

Okay, well, I switched to 3.7.5 and the issue persisted on Mac, but it turns out the file was actually being created.

It appears that when a headless Chrome session is initiated, it saves downloads by default to the location it was launched from (the project directory). I foolishly never looked there…

FYI, I’ve tried changing the directory with “browser_lib.set_download_directory()” but I have not figured out how to get it working yet. “User/username/Downloads” doesn’t work, and “User\username\Downloads” throws an error…

I just made the script look in the right place for now…
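For anyone landing here later, "looking in the right place" can be done with the standard library alone. The sketch below is not the code from my script; wait_for_download is a made-up helper that polls a directory for a file matching a glob pattern that was modified after a given start time.

```python
import time
from pathlib import Path


def wait_for_download(directory, pattern, started_at, timeout=15.0, poll=0.5):
    """Return the newest Path matching pattern modified at or after
    started_at (a time.time() value), or None if none shows up in time."""
    deadline = time.monotonic() + timeout
    while True:
        candidates = [
            p for p in Path(directory).glob(pattern)
            if p.stat().st_mtime >= started_at
        ]
        if candidates:
            return max(candidates, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Hypothetical usage: headless Chrome saved into the launch directory here,
# so poll the current working directory instead of ~/Downloads.
# started = time.time()
# ... click the export button ...
# csv_path = wait_for_download(Path.cwd(), "Invoices - *.csv", started)
```

Taking the start time before the click means an older CSV left over from a previous run is not mistaken for the new download.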

Next, on to Windows, and I’ll check back to see if 3.10 works fine.

This worked for me for changing the download folder:

file_dir = r"C:\Users\Raivo\Documents\TargetFolder"
browser_lib.set_download_directory(file_dir)
browser_lib.open_available_browser("http://rpachallenge.com/")

NB! You need to set it before opening the browser.
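Since the relative paths tried earlier in the thread did not work, building an absolute path may help. Below is a rough sketch: resolve_download_dir and open_headless_with_downloads are made-up helper names, and I have not verified that headless mode honours the setting, so treat that part as an assumption to test.

```python
from pathlib import Path


def resolve_download_dir(base=None, folder_name="Downloads"):
    """Return an absolute path to folder_name under base (default: the
    user's home directory), creating the folder if it does not exist."""
    root = Path(base) if base is not None else Path.home()
    target = (root / folder_name).resolve()
    target.mkdir(parents=True, exist_ok=True)
    return str(target)


def open_headless_with_downloads(url, download_dir):
    # Imported here so the path helper above can be used and tested
    # without rpaframework installed.
    from RPA.Browser.Selenium import Selenium

    browser_lib = Selenium()
    # Must be called before the browser is opened, as noted above.
    browser_lib.set_download_directory(download_dir)
    # Assumption: the download setting also applies when headless=True.
    browser_lib.open_available_browser(url, headless=True)
    return browser_lib
```

Usage would be something like `open_headless_with_downloads("http://rpachallenge.com/", resolve_download_dir())`.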
