Engineering documents are full of multi-page tables, and extracting them into Excel has long been one of the most important and most tedious tasks for engineers.
PDFBlueprint offers a “Batch Extract Tables” function that greatly speeds up table extraction while keeping the results accurate.
Follow the tutorial below to see how it works.
1. First, make sure PDFBlueprint is installed on your computer. If it is not, download the installation package from the official website and install it.
Official website: https://pdf.fastcadreader.com/
2. Open a multi-page PDF in which the tables are in the same position on every page.
3. In the menu bar, open the drop-down menu of the “OCR” button and click “Batch Extract Tables”.
4. Box the table area to be extracted. Because tables on other pages may differ in length, make the box as large as possible, but do not include irrelevant information.
5. Once the box is drawn, the “Batch Extract Tables” pop-up window appears, where you can configure the extraction settings.
5.1 Range: You can select "All Pages" or "Page Range".
5.2 Options:
5.2.1 No tabular lines: If the table has no lines, or its lines are incomplete, check this option to ignore the existing table lines. You can then add vertical separators so that the columns are split accurately.
5.2.1.1 After clicking "Add Vertical Separator", left-click inside the boxed area to add vertical separators. You can hold down the Ctrl key and zoom in on the drawing to place them more precisely.
5.2.1.2 If a vertical separator is misplaced, click the “Delete” or “Move” button at the bottom left corner of the boxed area to delete or move it. When you are done, click the “Add” button to return to adding vertical separators. Once the separators are added, you can browse other pages to check them. (Note: the boxed area and vertical separators are shown on other pages only in single page mode, not in two pages mode or continuous mode.)
5.2.2 Recognize text by image: This option can be checked if the recognition results are garbled or text in a particular language is missing.
5.2.3 Include page number: Checking this option adds a page number column to the left side of the extracted table.
5.2.4 Include bookmark: Checking this option adds a bookmark column to the left side of the extracted table.
5.3 Export: Choose how the extraction results should be laid out, according to your needs.
5.3.1 Export to different sheets in Excel: If you select this option, one Excel sheet will be created for each page of the table.
5.3.2 Export to the same sheet in Excel sequentially: If you select this option, the tables extracted from each page will be placed on the same sheet in page order.
5.4 Output:
5.4.1 The output path defaults to the desktop, but you can choose a different path. You can also change the name of the extracted Excel file.
5.4.2 Open the file after extracting: Checking this option will open the Excel file after the extraction is complete.
6. When the settings are complete, click “Extract”. The “Batch Extract Tables” progress bar appears immediately.
7. Wait patiently for the table extraction to complete (tables with more pages may take longer to extract).
8. When the extraction is finished, the progress bar closes automatically. If “Open the file after extracting” is checked, the extracted Excel file opens automatically. If it is not checked, you can find the file in the output path you set.
9. In Excel, you can then post-process the extracted content, for example by proofreading it and reorganizing the data.
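To illustrate why the vertical separators from step 5.2.1 matter when a table has no lines, here is a minimal Python sketch. It is not PDFBlueprint's implementation. The function name `split_into_columns`, the `(x, text)` word tuples, and the coordinates are all hypothetical, chosen only to show the idea that each separator marks a column boundary.

```python
from bisect import bisect_right

def split_into_columns(words, separators):
    """Assign each (x, text) word to a column using separator x positions.

    Hypothetical illustration only: a word whose x position lies to the
    right of k separators lands in column k.
    """
    seps = sorted(separators)
    cells = [[] for _ in range(len(seps) + 1)]
    for x, text in sorted(words):  # left-to-right reading order
        cells[bisect_right(seps, x)].append(text)
    return [" ".join(cell) for cell in cells]

# One row of text with no table lines; the x coordinates are made up.
row = [(12.0, "P-101"), (95.5, "DN"), (110.0, "50"),
       (210.3, "Carbon"), (250.0, "steel")]

print(split_into_columns(row, separators=[80.0, 200.0]))
# ['P-101', 'DN 50', 'Carbon steel']
```

If the separator at 200.0 were missing, "DN 50" and "Carbon steel" would fall into the same cell, which is the kind of wrong split that a well-placed separator avoids.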
Notes:
a. “Batch Extract Tables” can also be used to box and batch-extract a single piece of information (such as the drawing name, drawing number, or pipeline number). Combine it with the “Include page number” option, and then use the VLOOKUP function in Excel to match the extracted information to the corresponding rows by page number.
b. Because table lengths may differ from page to page, box the table as large as possible, but do not include irrelevant information.
c. Check “No tabular lines” to ignore all existing table lines in the boxed area.
d. The “Add Vertical Separator” function is available only when "No tabular lines" is checked.
e. If the recognition results are garbled or text in a particular language is missing, you can check the "Recognize text by image" option. With this option checked, the text information in the PDF itself is ignored and the OCR results are used directly.
f. Extraction time depends on the number of PDF pages. In general, the more pages, the longer the extraction takes.
g. If the following pop-up window appears after the extraction is finished, it means that no information was extracted from the listed pages. You can copy the page numbers and check those pages one by one.
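For readers who prefer scripting to VLOOKUP, the page-number matching described in note a can be sketched in Python. This is only an illustration under stated assumptions: the column names ("Page", "Drawing No.", "Line No.", "Size"), the sample rows, and the helper `attach_by_page` are hypothetical and do not come from PDFBlueprint's output format.

```python
# Hypothetical rows, as if read from two extractions that both used
# "Include page number". All names and values are made up.
title_rows = [
    {"Page": 1, "Drawing No.": "PID-001"},
    {"Page": 2, "Drawing No.": "PID-002"},
]
table_rows = [
    {"Page": 1, "Line No.": "L-100", "Size": "DN50"},
    {"Page": 1, "Line No.": "L-101", "Size": "DN80"},
    {"Page": 2, "Line No.": "L-200", "Size": "DN25"},
    {"Page": 3, "Line No.": "L-300", "Size": "DN40"},
]

def attach_by_page(rows, lookup_rows, key="Page", field="Drawing No."):
    """Copy `field` from lookup_rows onto every row with the same key.

    Behaves like an exact-match VLOOKUP; rows whose page has no match
    get an empty string instead of an error.
    """
    lookup = {r[key]: r[field] for r in lookup_rows}
    return [{**row, field: lookup.get(row[key], "")} for row in rows]

merged = attach_by_page(table_rows, title_rows)
for row in merged:
    print(row)
```

Rows from page 3 end up with an empty "Drawing No." here, which makes it easy to spot pages where nothing was extracted (see note g) and check them by hand.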
Still have questions? Feel free to contact our technical support via Email, WhatsApp or WeChat. We provide 1-to-1 services for free!