Hello everyone, I need help!
I’m automating a simple task:
1 - access a website with credentials
2 - search for some cases
3 - download some pdfs
The first two steps work perfectly, but the last one doesn’t.
I’m coding in pure Python and I’ve tried to use requests, but the downloaded PDF comes out broken.
The figure below shows the link to the PDF, and the other one shows the result from the script.
What I ultimately need is to extract some data from the PDF. If someone knows how to download it, or how to get the PDF’s text without downloading it, that would help me a lot.
Thanks, and sorry for the bad English.
Hi, most probably the data you retrieve through the request is not the actual PDF content but an HTML page telling you that you’re not authorized to access that URL, so the file you save contains invalid content that a PDF viewer can’t display.
You can check this by opening the downloaded file in a text editor: a real PDF starts with something like %PDF-1.4. If you see HTML instead, that’s because the request you sent neither sets the cookie nor uses the session with which you authenticated on that website.
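That check can also be done from Python instead of an editor. This is only a small sketch: the helper names looks_like_pdf and describe_download are made up for illustration, and it assumes the PDF header sits at the very start of the file, which is the usual case.

```python
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes begin with the PDF signature, e.g. b'%PDF-1.4'."""
    return data.startswith(b"%PDF-")


def describe_download(path: str) -> str:
    """Read a saved file and say whether it looks like a PDF or like HTML."""
    data = Path(path).read_bytes()
    if looks_like_pdf(data):
        return "looks like a real PDF"
    # Login and error pages normally start with an HTML tag or doctype.
    if data.lstrip()[:1] == b"<":
        return "looks like HTML (probably a login or error page)"
    return "unknown content"
```

Calling describe_download on the file your script saved should tell you quickly which case you are in.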
To solve it, either do the auth and all the requests under a requests.Session() object, or set the Cookie header as part of your headers= dictionary when making the requests. Sometimes a URL of the form https://<usr>:<pwd>@logistica.belgo.com.br/uploaded_file?... might work as well; give it a try.
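A rough sketch of the requests.Session() route, in case it helps. Everything site-specific here is an assumption: the login URL, the form field names in credentials, and the function names fetch_pdf and login_and_download are placeholders, and the real login form may need extra fields (a CSRF token, for example). requests is a third-party package (pip install requests).

```python
from pathlib import Path


def fetch_pdf(session, url: str) -> bytes:
    """GET url with an already-authenticated session and check the body is a PDF."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    if not response.content.startswith(b"%PDF-"):
        # Typical symptom of a missing session cookie: HTML instead of a PDF.
        raise ValueError("server did not return a PDF (login or error page?)")
    return response.content


def login_and_download(login_url: str, pdf_url: str, credentials: dict, dest: str) -> None:
    """Log in and download with ONE session so the auth cookies are reused."""
    import requests  # third-party, imported here so fetch_pdf works without it

    with requests.Session() as session:
        # Hypothetical form login; adjust the URL and field names to the real site.
        session.post(login_url, data=credentials, timeout=30).raise_for_status()
        Path(dest).write_bytes(fetch_pdf(session, pdf_url))
```

The point of the design is that the login request and the download request go through the same session object, so whatever cookies the site sets at login are sent again with the PDF request.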
Or, even easier, use browser automation: navigate to that page, then click the button/element that triggers the download (don’t forget to call Set Download Directory first).
If you still can’t fix it, I need some form of access into that platform in order to replicate and find a fix myself. Good luck!
Hello cosmin, you saved me again, haha.
The data I retrieved through the request was HTML, and using an online HTML viewer I see this.
As you said, the content returned by the request was not the PDF.
As for clicking with RPA.Browser, I’ll study it more.
This way doesn’t work.
I hope you already changed the password embedded above or used a fake one, as this is a public forum and you should never expose sensitive info here.
Meanwhile please try with the other options and let me know if you hit any blocker.
Yes, it’s a fake password.
Is the other way clicking on the link? That didn’t work: clicking on the link just opens the website with the PDF.
All good then. Keep in mind that the <usr>:<pwd> I gave above as an example was meant to be replaced entirely, without keeping the brackets, like this: https://email@example.com:myPassword1234@site.com/page/to/resource. I’m not really sure how the first @ should be escaped, though.
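On the escaping question: reserved characters such as @ in the user or password part of a URL are normally percent-encoded (@ becomes %40). A small sketch with the standard library follows; the function name build_auth_url is made up for illustration, and as far as I know this URL form only helps when the server accepts HTTP Basic authentication, not when it uses a login form with cookies.

```python
from urllib.parse import quote


def build_auth_url(user: str, password: str, host: str, path: str) -> str:
    """Build https://user:password@host/path with the credentials percent-encoded."""
    # safe="" makes quote() also encode '/', so '@', ':' and '/' are all escaped.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"https://{safe_user}:{safe_password}@{host}{path}"


url = build_auth_url("email@example.com", "myPassword1234", "site.com", "/page/to/resource")
print(url)  # https://email%40example.com:myPassword1234@site.com/page/to/resource
```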
If browser automation doesn’t work here, then try in Python with requests.Session() (after doing the auth, so the cookies are stored) or use our RPA.HTTP library, starting with the Create Session keyword and using the
That doesn’t work either, but thank you very much!