Web Crawler
Web Crawler crawls websites and other pages and indexes their content so that it can be used as Knowledge across the Talkdesk platform.
To retrieve webpages and index them as documents, specify the websites you want by providing their URLs. Only websites that use Hypertext Transfer Protocol Secure (HTTPS) can be crawled. If you receive an error when crawling a website, the website may not allow crawling.
If a website requires basic authentication, provide the username and password.
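For readers unfamiliar with basic authentication, the following Python sketch shows how a username and password are typically turned into an HTTP request header. It illustrates the general HTTP mechanism only, not how Web Crawler is implemented; the function name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header used by HTTP basic authentication.

    Hypothetical helper for illustration; not part of any Talkdesk API.
    """
    # The credentials are joined with a colon, encoded as UTF-8,
    # and then Base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Base64 is an encoding, not encryption, so the credentials are only protected in transit when the connection itself is secure, which is one reason HTTPS matters for sites that use basic authentication.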
The Web Crawler “Connection Settings” consist of two optional fields and one mandatory field:
- “Username” [1]: The username for the website’s basic authentication, if required.
- “Password” [2]: The password for that username.
- “URL(s)” [3]: Provide the seed or sitemap URLs of the website or websites you want to index.
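Because only HTTPS websites can be crawled, it can help to check a list of URLs before entering it. The following is a minimal Python sketch of such a check, run on your own machine; the function name `split_seed_urls` and the example URLs are assumptions for illustration and are not part of the product.

```python
from urllib.parse import urlsplit


def split_seed_urls(urls):
    """Separate URLs into those that use HTTPS and those that do not.

    Hypothetical pre-check for illustration; returns (accepted, rejected).
    """
    accepted, rejected = [], []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        # A usable URL needs the https scheme and a host name.
        if parts.scheme == "https" and parts.netloc:
            accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected


accepted, rejected = split_seed_urls([
    "https://example.com/sitemap.xml",  # HTTPS sitemap URL: accepted
    "http://example.com/",              # plain HTTP: rejected
    "example.com",                      # no scheme: rejected
])
```

This check only looks at the URL format. It cannot tell whether a website blocks crawling or requires authentication.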
Refresh Settings
For all sources, the “Refresh Settings” define when and how frequently the knowledge base is re-indexed so that the information stays up to date.
- “Initial time” [1]: Select the date and time (hours and minutes) at which the knowledge base should first be re-indexed.
- “Period” [2]: Select how frequently the data should be re-indexed.
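To see how the two settings combine, the following Python sketch lists the times at which re-indexing would occur. It assumes that the period is a fixed interval counted from the initial time; the function name `upcoming_refreshes` and the weekly period in the example are hypothetical.

```python
from datetime import datetime, timedelta


def upcoming_refreshes(initial_time, period, count):
    """Return the first `count` re-index times.

    Assumes a fixed interval: the first run is at `initial_time`,
    and each later run follows one `period` after the previous one.
    """
    return [initial_time + i * period for i in range(count)]


# Example: first re-index on 1 January 2024 at 09:30, then weekly.
schedule = upcoming_refreshes(
    datetime(2024, 1, 1, 9, 30),
    timedelta(days=7),
    3,
)
```

With these example values, the schedule contains 1, 8, and 15 January 2024, each at 09:30.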
Updating a Knowledge Base
To update an existing knowledge base, follow these steps:
- Go to the External Sources page, locate the knowledge base you wish to edit, and click the settings icon [1].
- Edit the settings as needed, filling in the fields according to the instructions above for each source.
- When you’re done, click the Save button.
Deleting a Knowledge Source
To delete an existing knowledge source, follow these steps:
- Go to the External Sources page, locate the knowledge source you wish to delete, click the settings icon [1], and choose Delete.
- To confirm the deletion, select Delete Knowledge source [2] in the pop-up window.