Hi,
I am scheduling my robot in a GitLab pipeline.
I configured the pipeline with the RCC tool, but when it runs, it throws this error:
"NameError: name 'win32com' is not defined". By default, my GitLab account uses an Ubuntu image.
I have the following code in my script:
RPA.Word.Application.Open Application
RPA.Word.Application.Open File    ${attach1}
${text}=    Get All Texts
RPA.Word.Application.Quit Application
Since my GitLab pipeline runs on Linux, the script above fails, because the RPA.Word.Application library works only on Windows.
How can I handle this on Linux? I need to open a Word document and get all the text from it.
As the documentation says, RPA.Word.Application works only on Windows, so you currently cannot use it on Linux. You have to use a Windows machine for that.
@Teppo Thank you. This worked for me, but I could only get the paragraphs from the Word document.
Hyperlink text and table contents are missing. Could you please let me know how I can get all the text from the Word document?
I think all of those are available through the python-docx library (python-docx 0.8.11 documentation), but extracting them may require implementing a Python function. It is hard to write a generic function, because it depends on how you want to use the extracted text.
To help you get started, here is another Python example that extracts the text from tables:
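A .docx file is a ZIP archive whose main text lives in word/document.xml, so one way to get paragraphs, hyperlink text and table cells together is to read that XML directly. The sketch below uses only the Python standard library (zipfile and xml.etree.ElementTree) instead of python-docx, so it is an alternative approach rather than the python-docx function suggested here; the name extract_all_text is hypothetical. It assumes the text you need is in the document body: headers, footers, footnotes and comments are stored in other parts of the package and are skipped, tabs and line breaks are dropped, and text inside text boxes may show up twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in .docx files
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_all_text(filename):
    """Return the text of every paragraph in the document body, one per line.

    Hypothetical helper (standard library only, not python-docx).
    Paragraphs inside tables are included because iter() walks all
    descendants of <w:body> in document order.
    """
    with zipfile.ZipFile(filename) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(W_NS + "body")
    lines = []
    for paragraph in body.iter(W_NS + "p"):
        # Collect every <w:t> text node under this paragraph, including runs
        # nested inside <w:hyperlink>, which are not direct children of <w:p>.
        lines.append("".join(t.text or "" for t in paragraph.iter(W_NS + "t")))
    return "\n".join(lines)
```

On a Linux runner this needs no extra packages, so it avoids the win32com dependency entirely.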
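import docx


def extract_tables(filename):
    """Return the text of every table cell, one cell per line."""
    doc = docx.Document(filename)
    cell_texts = []
    # Walk every table, row and cell. python-docx may repeat a merged cell
    # once per grid column it spans, so its text can appear more than once.
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                cell_texts.append(cell.text)
    # Join once at the end so the result does not start with a blank line.
    return "\n".join(cell_texts)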
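If you need to keep the table structure rather than one flat string, a variant that returns each table as a list of rows, where each row is a list of cell strings, may be easier to work with. This is again a standard-library sketch, not python-docx, and extract_table_rows is a hypothetical name. It reads each <w:tc> element once, so a horizontally merged cell is not repeated; vertically merged continuation cells usually come back as empty strings, and tables nested inside cells or wrapped in content controls are not visited.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in .docx files
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_table_rows(filename):
    """Return each top-level body table as a list of rows of cell strings.

    Hypothetical helper (standard library only, not python-docx).
    """
    with zipfile.ZipFile(filename) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    # findall() only looks at direct children, so only top-level tables count.
    for tbl in root.find(W_NS + "body").findall(W_NS + "tbl"):
        rows = []
        for tr in tbl.findall(W_NS + "tr"):
            cells = []
            for tc in tr.findall(W_NS + "tc"):
                # A cell can hold several paragraphs; keep them on separate lines.
                paragraphs = tc.findall(W_NS + "p")
                cells.append("\n".join(
                    "".join(t.text or "" for t in p.iter(W_NS + "t"))
                    for p in paragraphs
                ))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Returning nested lists leaves the choice of output format (CSV, a Robot Framework table, a single string) to the caller, which fits the point that a generic extractor depends on how the text will be used.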