{"id":2840,"date":"2022-02-09T10:14:40","date_gmt":"2022-02-09T10:14:40","guid":{"rendered":"https:\/\/multisite.korebots.com\/SearchAssist\/?p=2840"},"modified":"2022-02-24T05:00:03","modified_gmt":"2022-02-24T05:00:03","slug":"extracting-faqs","status":"publish","type":"post","link":"https:\/\/multisite.korebots.com\/SearchAssist\/concepts\/extracting-faqs\/","title":{"rendered":"Extracting FAQs"},"content":{"rendered":"<p>SearchAssist enables you to extract FAQs either from files or from a URL.<\/p>\n<p>To extract FAQs from files, you must annotate the uploaded file, that is, mark the sections in it that SearchAssist should identify and save as FAQs.<\/p>\n<h3 class=\"w-post-elm post_title entry-title\"><span class=\"ez-toc-section\" id=\"Supported_File_Formats\"><\/span>Supported File Formats<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"w-post-elm post_content\">\n<section class=\"l-section wpb_row height_auto width_full\">\n<div class=\"l-section-h i-cf\">\n<div class=\"g-cols vc_row type_default valign_top\">\n<div class=\"vc_col-sm-12 wpb_column vc_column_container\">\n<div class=\"vc_column-inner\">\n<div class=\"wpb_wrapper\">\n<div class=\"wpb_text_column \">\n<div class=\"wpb_wrapper\">\n<p>The FAQ Extract and Import features support extracting FAQs only from the following:<\/p>\n<ul>\n<li>JSON, CSV, and PDF files<\/li>\n<li>Web pages<\/li>\n<\/ul>\n<h4><span id=\"Comma-Seperated_Value_CSV\" class=\"ez-toc-section\">Comma-Separated Value (CSV)<\/span><\/h4>\n<ul>\n<li>The import interprets the text in the first column as a question and the text in the second column as an answer<\/li>\n<li>The file must not have any headers<\/li>\n<li>Any headers and the text present in the other columns are ignored<\/li>\n<\/ul>\n<h4><span id=\"Portable_Document_Format_PDF\" class=\"ez-toc-section\">Portable Document Format (PDF)<\/span><\/h4>\n<ul>\n<li>The\u00a0<b>Extract FAQs<\/b> option processes the 
content from a PDF and converts it into question-answer pairs<\/li>\n<li>Documents with a table of contents: A document with a table of contents is preferred. In such cases, the table of contents is extracted first and then used to parse the document and identify headings. The information present in the table of contents is used to derive the hierarchy of headings (headings, subheadings, nested subheadings, etc.). During extraction, these levels are separated by a vertical bar as a delimiter (heading | subheading | sub-subheading)<\/li>\n<li>Documents with no table of contents: In such cases, a pre-trained machine learning model identifies headings based on either font style or font size. When font size is used, the heading hierarchy can also be derived<\/li>\n<li>The text is then formatted into uniform header and paragraph blocks<\/li>\n<\/ul>\n<h4><span id=\"Web_Pages\" class=\"ez-toc-section\">Web Pages<\/span><\/h4>\n<p>The Extract FAQs option supports the following three types of FAQ web pages:<\/p>\n<ul>\n<li>Plain FAQ pages with linear question-answer pairs<\/li>\n<li>Pages with question hyperlinks that point to answers on the same page<\/li>\n<li>Pages with question hyperlinks that point to answers on a different page<\/li>\n<\/ul>\n<p>Extraction of certain FAQs on the web page fails under the following conditions:<\/p>\n<ul>\n<li>The question text is split between multiple HTML tags on the FAQ page<\/li>\n<li>The tag applied to the answer is neither the child nor the sibling of the extracted question as per the HTML DOM structure<\/li>\n<li>The question does not have a hyperlink to the answer (applies to FAQs with hyperlinks)<\/li>\n<li>The question hyperlinks to the answer, but the question statement is not repeated above the answer (applies to FAQs with hyperlinks)<\/li>\n<\/ul>\n<p>The extraction of the entire FAQ page fails if the page combines more than one of the FAQ page types mentioned 
previously.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<p>You can review the extracted FAQs in the Review workflow to edit and enhance the FAQs and their answers.<\/p>\n<h3 class=\"w-post-elm post_title entry-title\"><span class=\"ez-toc-section\" id=\"Extracting_FAQs_from_Files\"><\/span>Extracting FAQs from Files<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"w-post-elm post_content\">\n<section class=\"l-section wpb_row height_auto width_full\">\n<div class=\"l-section-h i-cf\">\n<div class=\"g-cols vc_row type_default valign_top\">\n<div class=\"vc_col-sm-12 wpb_column vc_column_container\">\n<div class=\"vc_column-inner\">\n<div class=\"wpb_wrapper\">\n<div class=\"wpb_text_column \">\n<div class=\"wpb_wrapper\">\n<p>Use the\u00a0<b>Extract FAQs<\/b>\u00a0option to extract all the FAQs that are listed in a PDF file or in web pages.<\/p>\n<p>To add FAQs through the\u00a0<strong>Extract FAQs<\/strong>\u00a0option, take the following steps:<\/p>\n<ol>\n<li>On the Sources page, click\u00a0<b>FAQs\u00a0<\/b>in the left pane.<\/li>\n<li>On the FAQs page, click +<b>Add FAQ\u00a0<\/b>and select\u00a0<b>Extract FAQs<\/b>.<a ref=\"magnificPopup\" href=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_extract_FAQs.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1908 size-large\" src=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_extract_FAQs-1024x125.png\" sizes=\"(max-width: 640px) 100vw, 640px\" srcset=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_extract_FAQs-1024x125.png 1024w, https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_extract_FAQs-300x37.png 300w\" alt=\"\" width=\"640\" height=\"78\" data-pagespeed-url-hash=\"390547449\" \/><\/a><\/li>\n<li>In the Extract FAQs dialog 
box, enter a name in the\u00a0<b>Source Title<\/b>\u00a0field and a description in the\u00a0<b>Description\u00a0<\/b>field.<\/li>\n<li>To extract FAQs from a file, go to the\u00a0<b>Extract from File\u00a0<\/b>section, then drag and drop a file or click\u00a0<b>Browse\u00a0<\/b>to locate the PDF file.<a ref=\"magnificPopup\" href=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/Manage_Sources_Extract-FAQs-from-file.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1462 size-large\" src=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/Manage_Sources_Extract-FAQs-from-file-1024x571.png\" sizes=\"(max-width: 640px) 100vw, 640px\" srcset=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/Manage_Sources_Extract-FAQs-from-file-1024x571.png 1024w, https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/Manage_Sources_Extract-FAQs-from-file-300x167.png 300w, https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/Manage_Sources_Extract-FAQs-from-file.png 1513w\" alt=\"\" width=\"640\" height=\"357\" data-pagespeed-url-hash=\"2184506520\" \/><\/a><\/li>\n<li>For the\u00a0<b>Extract from File<\/b>\u00a0option, use\u00a0<b>Annotate and Extract<\/b> on the uploaded file to identify and include only the FAQs in that file. Refer to\u00a0<a href=\"https:\/\/multisite.korebots.com\/SearchAssist\/concepts\/extracting-faqs\/#Annotating_to_Extract_FAQs\">Annotating &amp; Extracting FAQs from documents<\/a>.<\/li>\n<\/ol>\n<p>In the\u00a0<b>Extract FAQs<\/b>\u00a0dialog box, check the extraction status. 
If required, cancel the extraction or click\u00a0<b>OK\u00a0<\/b>after the extraction is complete.<\/p>\n<\/div>\n<h3 class=\"w-post-elm post_title entry-title\"><span class=\"ez-toc-section\" id=\"Annotating_to_Extract_FAQs\"><\/span>Annotating to Extract FAQs<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"w-post-elm post_content\">\n<section class=\"l-section wpb_row height_auto width_full\">\n<div class=\"l-section-h i-cf\">\n<div class=\"g-cols vc_row type_default valign_top\">\n<div class=\"vc_col-sm-12 wpb_column vc_column_container\">\n<div class=\"vc_column-inner\">\n<div class=\"wpb_wrapper\">\n<div class=\"wpb_text_column \">\n<div class=\"wpb_wrapper\">\n<p>You may have all the FAQs related to your business in a PDF file but not in the format mandated by SearchAssist. Annotate such documents by identifying the key sections of the content on a few pages of the document. SearchAssist uses the pattern identified from the annotation to extract the FAQs from the document.<\/p>\n<p>Note: This feature is applicable only when extracting FAQs from PDF documents.<\/p>\n<ol>\n<li>Select a PDF file for extraction.<\/li>\n<li>Select the\u00a0<b>Annotate &amp; Extract<\/b>\u00a0option, and then click\u00a0<strong>Proceed<\/strong>.<a ref=\"magnificPopup\" href=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1829 size-large\" src=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract-1024x572.png\" sizes=\"(max-width: 640px) 100vw, 640px\" srcset=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract-1024x572.png 1024w, https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract-300x168.png 300w, 
https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/addFAQs_from-pdf_annotate_extract.png 1504w\" alt=\"\" width=\"640\" height=\"358\" data-pagespeed-url-hash=\"3461049340\" \/><\/a><\/li>\n<li>The PDF document is loaded into the Annotation Tool, where you can annotate the various sections in the document.<\/li>\n<li>To annotate, select the text and tag it as follows:\n<ul>\n<li><strong>Heading:<\/strong>\u00a0Apply the Heading tag to train the Search Assistant so that it can identify the questions. The content between any two consecutive headings is extracted as the answer to the first of those two headings.<\/li>\n<li><b>Header:<\/b>\u00a0Apply the Header tag to train the Search Assistant so that it can identify and ignore the headers. Do not mark arbitrary text as a header. Marking text such as a footer or a paragraph as a header produces invalid results.<\/li>\n<li><b>Footer:<\/b> Apply the Footer tag to train the Search Assistant so that it can identify and ignore the footers. Do not mark arbitrary text as a footer. Marking text such as a header or a paragraph as a footer produces invalid results.<\/li>\n<li><b>Exclude:<\/b>\u00a0Apply the Exclude tag to prevent the extraction of that section.<\/li>\n<li><b>Ignore Page:\u00a0<\/b>\u00a0Apply the Ignore Page tag to pages that should be excluded from extraction.<\/li>\n<li><b>Remove Annotation:<\/b>\u00a0Use this option to undo any incorrect annotations and\u00a0to start annotating afresh.<\/li>\n<\/ul>\n<\/li>\n<li>The Search Assistant uses the headings, headers, and footers in the extraction process and can learn from them. You need not annotate the entire document. 
Annotate a couple of pages with headings, headers, and footers, then extract and review the questions.<\/li>\n<li>The feature generates additional document information:\n<ul>\n<li><b>Document Info<\/b>\u00a0includes the name, size, and number of pages of the document.<\/li>\n<li><b>Annotation Summary<\/b>\u00a0includes the number of annotations marked in each category, both for the current page and for the entire document.<\/li>\n<\/ul>\n<\/li>\n<li>After you annotate, click\u00a0<b>Extract<\/b>\u00a0to apply the annotation to the entire document and extract FAQs from it in bulk.<\/li>\n<li>The extracted FAQs are listed under Drafts, which marks the beginning of the FAQ review workflow. Refer to\u00a0<a href=\"https:\/\/multisite.korebots.com\/SearchAssist\/concepts\/content-sources\/adding-faqs\/faq-review-workflow\/\">FAQ Review Workflow<\/a>.<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<h3 class=\"w-post-elm post_title entry-title\"><span class=\"ez-toc-section\" id=\"Extracting_FAQs_from_URL\"><\/span>Extracting FAQs from URL<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"w-post-elm post_content\">\n<section class=\"l-section wpb_row height_auto width_full\">\n<div class=\"l-section-h i-cf\">\n<div class=\"g-cols vc_row type_default valign_top\">\n<div class=\"vc_col-sm-12 wpb_column vc_column_container\">\n<div class=\"vc_column-inner\">\n<div class=\"wpb_wrapper\">\n<div class=\"wpb_text_column \">\n<div class=\"wpb_wrapper\">\n<p>Use the\u00a0<b>Extract FAQs<\/b>\u00a0option to extract all the FAQs that are available on a web page.<\/p>\n<p>To add FAQs through the Extract FAQs option, take the following steps:<\/p>\n<ol>\n<li>On the Sources tab, click\u00a0<b>FAQs\u00a0<\/b>in the left pane.<\/li>\n<li>On the FAQs page, click\u00a0<b>Add FAQ\u00a0<\/b>and select\u00a0<b>Extract FAQs<\/b>.<\/li>\n<li>In the Extract 
FAQs dialog box, enter a name in the\u00a0<b>Source Title<\/b>\u00a0field and a description in the\u00a0<b>Description\u00a0<\/b>field.<\/li>\n<li>To extract FAQs from a web page, click the\u00a0<b>Extract from URL<\/b>\u00a0tab.\u00a0Enter the URL of the FAQ page or the domain URL\u00a0in the\u00a0<b>Enter URL<\/b>\u00a0field.<\/li>\n<\/ol>\n<p>Note: If you try to extract FAQs from a misspelled or nonexistent URL, the following error message appears:<a ref=\"magnificPopup\" href=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/error_failed-to-add-source_faqs.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1827 size-medium\" src=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/error_failed-to-add-source_faqs-300x190.png\" sizes=\"(max-width: 300px) 100vw, 300px\" srcset=\"https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/error_failed-to-add-source_faqs-300x190.png 300w, https:\/\/multisite.korebots.com\/SearchAssist\/wp-content\/uploads\/sites\/18\/2021\/12\/error_failed-to-add-source_faqs.png 598w\" alt=\"\" width=\"300\" height=\"190\" data-pagespeed-url-hash=\"3757121441\" \/><\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>SearchAssist\u00a0 enables you to extract FAQs either from Files or from URL. To extract FAQs from files you have to annotate the uploaded file i.e.,\u00a0 mark the various sections in the file uploaded for SearchAssist to identify and save as FAQs. 
Supported File Formats The FAQ Extract and Import features support extracting FAQs only from&#8230;<\/p>\n","protected":false},"author":18,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[71,57,70],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/2840"}],"collection":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/comments?post=2840"}],"version-history":[{"count":11,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/2840\/revisions"}],"predecessor-version":[{"id":3606,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/2840\/revisions\/3606"}],"wp:attachment":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/media?parent=2840"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/categories?post=2840"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/tags?post=2840"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}