{"id":1340,"date":"2021-12-13T09:57:39","date_gmt":"2021-12-13T09:57:39","guid":{"rendered":"https:\/\/multisite.korebots.com\/SearchAssist\/?p=1340"},"modified":"2022-02-11T00:36:12","modified_gmt":"2022-02-11T00:36:12","slug":"how-to-annotate-and-extract-faqs-from-documents","status":"publish","type":"post","link":"https:\/\/multisite.korebots.com\/SearchAssist\/concepts\/content-sources\/adding-faqs\/how-to-annotate-and-extract-faqs-from-documents\/","title":{"rendered":"Annotating to Extract FAQs"},"content":{"rendered":"<section class=\"l-section wpb_row height_auto\"><div class=\"l-section-h i-cf\"><div class=\"g-cols vc_row via_grid cols_1 laptops-cols_inherit tablets-cols_inherit mobiles-cols_1 valign_top type_default stacking_default\"><div class=\"wpb_column vc_column_container\"><div class=\"vc_column-inner\"><div class=\"wpb_text_column\"><div class=\"wpb_wrapper\"><p><span style=\"font-weight: 400;\">You may have all the FAQs related to your business in a PDF file, but not in the format required by SearchAssist. In that case, annotate the document by identifying the key sections of the content on a few of its pages. 
SearchAssist uses the pattern identified from these annotations to extract the FAQs from the entire document.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Note: This feature applies only when extracting FAQs from PDF documents.<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Select a PDF file for extraction.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Select the <\/span><b>Annotate &amp; Extract<\/b><span style=\"font-weight: 400;\"> option and click <strong>Proceed<\/strong>.<a ref=\"magnificPopup\" href=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1829 size-large\" src=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract-1024x572.png\" alt=\"\" width=\"640\" height=\"358\" srcset=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract-1024x572.png 1024w, https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract-300x168.png 300w, https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract.png 1504w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The PDF document loads in the Annotation Tool, where you can annotate the various sections of the document.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">To annotate, select the text and tag it as follows:<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><strong>Heading:<\/strong><span style=\"font-weight: 
400;\"> Apply the Heading tag to train the App to identify the questions. The content between two consecutive headings is extracted as the answer to the heading that precedes it.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Header:<\/b><span style=\"font-weight: 400;\"> Apply the Header tag to train the App to identify and ignore the headers. Avoid marking text as a header at random. Marking text such as a footer or a paragraph as a header produces invalid results.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Footer:<\/b><span style=\"font-weight: 400;\"> Apply the Footer tag to train the App to identify and ignore the footers. Avoid marking text as a footer at random. Marking text such as a header or a paragraph as a footer produces invalid results.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Exclude:<\/b><span style=\"font-weight: 400;\"> Apply the Exclude tag to a section to prevent it from being extracted.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Ignore Page:<\/b><span style=\"font-weight: 400;\"> Apply the Ignore Page tag to pages that must be excluded from extraction.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Remove Annotation:<\/b><span style=\"font-weight: 400;\"> Use this option to undo incorrect annotations and start annotating afresh.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The App uses the headings, headers, and footers in the extraction process and can learn from them. You need not annotate the entire document. 
Annotate a couple of pages with headings, headers, and footers, then extract and review the questions.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The feature also generates additional document information:<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><b>Document Info<\/b><span style=\"font-weight: 400;\"> includes the name, size, and number of pages of the document.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Annotation Summary<\/b><span style=\"font-weight: 400;\"> includes the number of annotations marked in each category, for the particular page and for the entire document.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">After you annotate, click <\/span><b>Extract<\/b> <span style=\"font-weight: 400;\">to apply the annotations to the entire document and extract FAQs from it in bulk.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The extracted FAQs are listed under Drafts, which marks the beginning of the FAQ review workflow. 
Refer to <\/span><a href=\"https:\/\/multisite.korebots.com\/SearchAssist\/?p=1396&amp;preview=true\"><span style=\"font-weight: 400;\">How to use an FAQ Workflow<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/li>\n<\/ol>\n<\/div><\/div><\/div><\/div><\/div><\/div><\/section>\n","protected":false},"excerpt":{"rendered":"You may have all the FAQs related to your business in a PDF file, but not in the format required by SearchAssist. In that case, annotate the document by identifying the key sections of the content on a few of its pages. SearchAssist uses the pattern identified from these annotations to extract the FAQs from the entire document. 
Note: This feature...","protected":false},"author":18,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[71],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/1340"}],"collection":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/comments?post=1340"}],"version-history":[{"count":22,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/1340\/revisions"}],"predecessor-version":[{"id":2835,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/1340\/revisions\/2835"}],"wp:attachment":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/media?parent=1340"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/categories?post=1340"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/tags?post=1340"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}