{"id":454,"date":"2021-05-13T08:55:38","date_gmt":"2021-05-13T08:55:38","guid":{"rendered":"https:\/\/multisite.korebots.com\/SearchAssist\/?p=454"},"modified":"2021-06-28T07:49:31","modified_gmt":"2021-06-28T07:49:31","slug":"web-content","status":"publish","type":"post","link":"https:\/\/multisite.korebots.com\/SearchAssist\/source-management\/content\/web-content\/","title":{"rendered":"Web Content"},"content":{"rendered":"<section class=\"l-section wpb_row height_auto\"><div class=\"l-section-h i-cf\"><div class=\"g-cols vc_row via_grid cols_1 laptops-cols_inherit tablets-cols_inherit mobiles-cols_1 valign_top type_default stacking_default\"><div class=\"wpb_column vc_column_container\"><div class=\"vc_column-inner\"><div class=\"wpb_text_column\"><div class=\"wpb_wrapper\"><p><span style=\"font-weight: 400;\">Organizations often already have web pages that list the features or product details that search users look for. Business users can leverage this information and enable SearchAssist to respond to search user queries without replicating the data.<\/span><\/p>\n<p>SearchAssist ingests this content into the application through web crawling. For example, consider a banking website that contains the bulk of the information needed to answer search users\u2019 queries. 
In this scenario, SearchAssist is configured to crawl the bank&#8217;s website and index its web pages so that the indexed pages can be retrieved to answer search users\u2019 queries.<\/p>\n<\/div><\/div><div class=\"w-separator size_small with_line width_default thick_1 style_solid color_border align_center\"><div class=\"w-separator-h\"><\/div><\/div><div class=\"wpb_text_column\"><div class=\"wpb_wrapper\"><h2><span class=\"ez-toc-section\" id=\"Web_Crawling\"><\/span>Web Crawling<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Web crawling allows you to extract and index content from one or more websites so that the content is ready for search.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To crawl a web domain, follow these steps:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Log in to the application with valid credentials.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Click the <\/span><b>Indices <\/b><span style=\"font-weight: 400;\">tab at the top.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">On the left pane, under the <\/span><b>Sources <\/b><span style=\"font-weight: 400;\">section, click <\/span><b>Content<\/b><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">On the Add Content page, click <\/span><b>Crawl Web Domain<\/b><span style=\"font-weight: 400;\">.<\/span><br \/>\n<a ref=\"magnificPopup\" href=\"http:\/\/docs.kore.ai\/searchassist\/wp-content\/uploads\/sites\/4\/2021\/05\/add_content.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1205\" src=\"http:\/\/docs.kore.ai\/searchassist\/wp-content\/uploads\/sites\/4\/2021\/05\/add_content.png\" alt=\"\" width=\"154\" height=\"135\" \/><\/a><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">In the 
Crawl Web Domain dialog box, enter the domain URL in the <\/span><b>Source URL<\/b><span style=\"font-weight: 400;\"> field.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Enter a name in the <\/span><b>Source Title<\/b><span style=\"font-weight: 400;\"> field and a description in the <\/span><b>Description <\/b><span style=\"font-weight: 400;\">field.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">To schedule the web crawl, under the <\/span><b>Schedule <\/b><span style=\"font-weight: 400;\">section, turn on the toggle.<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Set the Start <\/span><b>Date<\/b><span style=\"font-weight: 400;\"> and <\/span><b>Time<\/b><span style=\"font-weight: 400;\">, and the <\/span><b>Frequency<\/b><span style=\"font-weight: 400;\"> at which the crawl should run. These fields are available only when the schedule toggle is turned on.<\/span><\/li>\n<\/ul>\n<p><a ref=\"magnificPopup\" href=\"http:\/\/docs.kore.ai\/searchassist\/wp-content\/uploads\/sites\/4\/2021\/05\/crawl_schedule.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1207\" src=\"http:\/\/docs.kore.ai\/searchassist\/wp-content\/uploads\/sites\/4\/2021\/05\/crawl_schedule.png\" alt=\"\" width=\"1057\" height=\"133\" \/><\/a><\/p><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Under the <\/span><b>Crawl Option<\/b><span style=\"font-weight: 400;\"> section, select an option from the drop-down list:<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><em>Crawl Everything<\/em> &#8211; To crawl all the URLs that belong to the web domain.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><em>Crawl Everything Except Specific URLs<\/em> &#8211; To list the URLs within the web domain that you want to exclude from crawling.<\/span><\/li>\n<li 
style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><em>Crawl Only Specific URLs<\/em> &#8211; To list only the URLs that you want to crawl from the web domain.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Select the <\/span>Crawl Settings that match your requirements<span style=\"font-weight: 400;\">:<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><em>JavaScript-rendered<\/em> &#8211; Allows crawling of websites whose content is rendered through JavaScript code.<\/span><\/li>\n<li><em>Crawl Beyond Sitemap<\/em> &#8211; Allows crawling of web pages beyond the URLs listed in the sitemap file of the target website.<\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><em>Use Cookies<\/em> &#8211; Allows crawling of web pages that require cookie acceptance.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><em>Respect robots.txt<\/em> &#8211; Honors the directives in the robots.txt file of the web domain.<\/span><\/li>\n<li><em>Crawl Depth<\/em> &#8211; The maximum depth to which a site is crawled. A value of 0 indicates no limit.<\/li>\n<li><em>Max URL Limit<\/em> &#8211; The maximum number of URLs to crawl. A value of 0 indicates no limit.<\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Click <\/span><b>Proceed<\/b><span style=\"font-weight: 400;\">.<br \/>\n<a ref=\"magnificPopup\" href=\"http:\/\/docs.kore.ai\/searchassist\/wp-content\/uploads\/sites\/4\/2021\/05\/crawl_web1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1206\" src=\"http:\/\/docs.kore.ai\/searchassist\/wp-content\/uploads\/sites\/4\/2021\/05\/crawl_web1.png\" alt=\"\" width=\"1089\" height=\"765\" \/><\/a><\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The Crawl Web Domain dialog box appears with the URL validation status. 
<\/span><\/li>\n<li style=\"font-weight: 400;\">You can choose to <strong>Crawl<\/strong> immediately or later.<\/li>\n<\/ol>\n<\/div><\/div><div class=\"w-separator size_small with_line width_default thick_1 style_solid color_border align_center\"><div class=\"w-separator-h\"><\/div><\/div><div class=\"wpb_text_column\"><div class=\"wpb_wrapper\"><h2><span class=\"ez-toc-section\" id=\"Management\"><\/span>Management<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">After you add content to the application, it must be kept up to date because website content may change. You can manage the source (schedule periodic web crawls and edit the crawl configuration) to keep the indexed content in sync with the data on the website.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1362 size-full\" src=\"http:\/\/docs.kore.ai\/searchassist\/wp-content\/uploads\/sites\/4\/2021\/05\/manage_content.png\" alt=\"\" width=\"1366\" height=\"344\" \/><\/p>\n<\/div><\/div><div class=\"wpb_text_column\"><div class=\"wpb_wrapper\"><h3><span class=\"ez-toc-section\" id=\"Manage\"><\/span>Manage<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Once the content has been added, you can perform the following actions:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">On the Content list view page, for any source in the list, you can:<\/span>\n<ol>\n<li><strong>Delete<\/strong> the source.<\/li>\n<li><strong>Recrawl<\/strong> the source, in the case of web content.<\/li>\n<\/ol>\n<\/li>\n<li>Click any content row to open the content dialog box, which displays the following details:\n<ul>\n<li>Name<\/li>\n<li>Description<\/li>\n<li>For web content\n<ul>\n<li>Pages crawled and the last updated time<\/li>\n<li>The crawl configurations described above, which are editable<\/li>\n<li>Crawl execution details along with the log<\/li>\n<\/ul>\n<\/li>\n<li>For file 
content\n<ul>\n<li>Number of pages<\/li>\n<li>Document preview and an option to download the document<\/li>\n<li>Date and time of the last update<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/div><\/div><div class=\"w-separator size_small with_line width_default thick_1 style_solid color_border align_center\"><div class=\"w-separator-h\"><\/div><\/div><\/div><\/div><\/div><\/div><\/section>\n","protected":false},"excerpt":{"rendered":"Organizations often already have web pages that list the features or product details that search users look for. 
Business users can leverage this information and enable SearchAssist to respond to search user queries without replicating the data. SearchAssist ingests this content into the application through web crawling....","protected":false},"author":12,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/454"}],"collection":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/comments?post=454"}],"version-history":[{"count":17,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/454\/revisions"}],"predecessor-version":[{"id":1226,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/posts\/454\/revisions\/1226"}],"wp:attachment":[{"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/media?parent=454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/categories?post=454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multisite.korebots.com\/SearchAssist\/wp-json\/wp\/v2\/tags?post=454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}