{"id":1380,"date":"2021-12-14T06:02:32","date_gmt":"2021-12-14T06:02:32","guid":{"rendered":"https:\/\/multisite.korebots.com\/SearchAssist\/?p=1380"},"modified":"2022-02-09T11:52:28","modified_gmt":"2022-02-09T11:52:28","slug":"supported-file-formats","status":"publish","type":"post","link":"https:\/\/multisite.korebots.com\/SearchAssist\/concepts\/content-sources\/adding-faqs\/supported-file-formats\/","title":{"rendered":"Supported File Formats"},"content":{"rendered":"<section class=\"l-section wpb_row height_auto\"><div class=\"l-section-h i-cf\"><div class=\"g-cols vc_row via_grid cols_1 laptops-cols_inherit tablets-cols_inherit mobiles-cols_1 valign_top type_default stacking_default\"><div class=\"wpb_column vc_column_container\"><div class=\"vc_column-inner\"><div class=\"wpb_text_column\"><div class=\"wpb_wrapper\"><p><span style=\"font-weight: 400;\">The FAQ Extract and Import features support only the following sources:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">JSON, CSV, and PDF files<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Web pages<\/span><\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" id=\"Comma-Seperated_Value_CSV\"><\/span><span style=\"font-weight: 400;\">Comma-Separated Values (CSV)<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">On import, the text in the first column is interpreted as the question and the text in the second column as the answer.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The file must not have a header row.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Any header row and any text in the other columns are ignored.<\/span><\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" 
id=\"Portable_Document_Format_PDF\"><\/span><span style=\"font-weight: 400;\">Portable Document Format (PDF)<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The <\/span><b>Extracted FAQs<\/b><span style=\"font-weight: 400;\"> feature processes the content of a PDF file and converts it into question-answer pairs.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Documents with a table of contents: These documents are preferred. The table of contents is extracted first and then used to parse the document and identify its headings. It also determines the heading hierarchy (headings, subheadings, nested subheadings, and so on). During extraction, these levels are joined using a vertical bar as the delimiter (heading | subheading | sub-subheading).<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Documents without a table of contents: A pre-trained machine learning model identifies headings based on either font style or font size. 
When font size is used, the heading hierarchy can also be derived.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The text is then formatted into uniform heading and paragraph blocks.<\/span><\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" id=\"Web_Pages\"><\/span>Web Pages<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The Extract FAQs feature supports the following three types of FAQ web pages:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Plain FAQ pages with linear question-answer pairs.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Pages with question hyperlinks that point to answers on the same page.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Pages with question hyperlinks that point to answers on a different page.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Extraction of individual FAQs on a web page fails under the following conditions:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The question text is split across multiple HTML tags on the FAQ page.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">In the HTML DOM, the element containing the answer is neither a child nor a sibling of the element containing the extracted question.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The question does not hyperlink to its answer (applies to pages with question hyperlinks).<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The question hyperlinks to the answer, but the question text is not repeated above the answer (applies to pages with question hyperlinks).<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The extraction of the entire FAQ page fails 
if the page combines more than one of the FAQ page types described above.<\/span><\/p>\n<\/div><\/div><\/div><\/div><\/div><\/div><\/section>\n","protected":false},"excerpt":{"rendered":"The FAQ Extract and Import features support extracting FAQs only from the following: \u00a0 JSON, CSV, PDF file formats\u00a0 and\u00a0 Webpages\u00a0 Comma-Separated Values (CSV) The imported FAQs interpret the text in the first column as a question and that in the second column as an answer. The file must not have any headers. 
Any headers...","protected":false},"author":18,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[71],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/1380"}],"collection":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/comments?post=1380"}],"version-history":[{"count":10,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/1380\/revisions"}],"predecessor-version":[{"id":2274,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/1380\/revisions\/2274"}],"wp:attachment":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/media?parent=1380"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/categories?post=1380"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/tags?post=1380"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}