Need to convert a PDF file to Excel for my process

Hi Guys,
Please help me convert a PDF file to an Excel file on Robocorp. I need a hint on how this can be done.

Hi!

  • Extract the data you want from the PDF file. You can try with RPA.PDF or RPA.Cloud.AWS (Textract).
  • After you have the data, use the RPA.Excel.Files library to create an Excel file.

Hi @jani,
I appreciate your support, it works.
However, when the PDF was converted to Excel, the whole first page was stored in one cell and the second page in another. Is there a way around this? Or do I need to use Adobe Acrobat to convert it?

Hi! Can you share some code of what you are doing currently?

Hi @jani ,

Below is a snippet of my code:

Can you also share with us an example PDF (without sensitive info), so we get an idea about the data format?
With Get Text From Pdf you'll get a single blob of text for every targeted page, while Append Rows To Worksheet may expect a list of dictionaries (since you're enabling headers as well). That means one or more rows, where every row is a dict with the corresponding header name as key and the cell value as value. In your case you're passing a dict in <page-nr>: <text> format, which stores the entire text of a PDF page in a single cell under the <page-nr> column.

I suspect that for every PDF page you want to take the text table inside it, split it into rows (by \n, for example), split each row into columns, and then convert each row to <header>: <cell-value> format. You'll end up with a list of dictionaries that fits the format the Append Rows To Worksheet keyword expects.
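
As a rough illustration of that parsing step, here is a minimal Python sketch. It assumes the first non-empty line of the page text holds the headers and that columns are separated by two or more spaces. Both are guesses until we see a real PDF, and the page_text_to_rows helper and the sample text are made up for this example.

```python
import re


def page_text_to_rows(page_text, delimiter=r"\s{2,}"):
    """Turn one page's text blob into a list of {header: cell} dicts.

    Assumes (hypothetically) that the first non-empty line is the header
    row and that columns are separated by two or more whitespace chars.
    """
    # Split the blob into lines and drop the empty ones.
    lines = [line.strip() for line in page_text.split("\n") if line.strip()]
    if not lines:
        return []

    headers = re.split(delimiter, lines[0])
    rows = []
    for line in lines[1:]:
        cells = re.split(delimiter, line)
        # Pad short rows so every header gets a value; extra cells are dropped by zip.
        cells += [""] * (len(headers) - len(cells))
        rows.append(dict(zip(headers, cells)))
    return rows


# Made-up sample of what the text of one page could look like.
sample_text = "Name  Qty  Price\nApple  2  1.50\nPear  5  0.75\n"
rows = page_text_to_rows(sample_text)
print(rows)
```

You would call it once per page on the text you get back from Get Text From Pdf, collect the resulting rows, and pass that list to Append Rows To Worksheet with headers enabled. If your PDF separates columns differently (tabs, a single space, fixed widths), the delimiter pattern will need to change.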

Let us know if you need more help with the parsing once you attach an input PDF example and an expected output Excel file corresponding to the input. Good luck!