Daboo Meta Tag Generator Deluxe Bot
|
Version 0.7
Copyright 1997-2004 David Dienhart All Rights Reserved.
Release Date: 1-8-2005
http://www.dienhart.com
| |
License Agreement
|
|
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
See GNU-GPL_2.html for the complete license.
|
|
| |
Files
|
- DMetaGenDlx.pl (Spider Script)
- DMetaGenDlx.exe (Compiled Spider)
- DMetaGenDlx.html (Documentation)
- site.txt (search index, autogenerated)
- GNU-GPL_2.html (License Agreement)
- progresslog.txt (a list of all pages that were indexed or attempted
to be indexed)
|
| |
Requirements
|
- Linux, UNIX, or Windows
- Perl 5.8.4 (Required by DMetaGenDlx.pl)
- HTML::LinkExtor (Required by DMetaGenDlx.pl)
- LWP::UserAgent (Required by DMetaGenDlx.pl)
- HTML::TokeParser (Required by DMetaGenDlx.pl)
* Note: DMetaGenDlx.exe does not require Perl to be installed. It is
a standalone Windows executable.
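Before running the Perl source, it can help to confirm that the three modules listed above load. The following is a minimal sketch, not part of the distribution: the function name check_perl_modules is hypothetical, and it relies only on the standard perl -M switch, which fails if a module cannot be loaded.

```shell
# Hypothetical helper (not shipped with DMetaGenDlx): report whether each
# module named in the Requirements list can be loaded by the local perl.
check_perl_modules() {
    missing=0
    for module in HTML::LinkExtor LWP::UserAgent HTML::TokeParser; do
        # "perl -MModule -e 1" exits non-zero when the module is not installed
        # (or when perl itself is not on the PATH).
        if perl -M"$module" -e 1 2>/dev/null; then
            echo "ok       $module"
        else
            echo "MISSING  $module"
            missing=1
        fi
    done
    return $missing
}

# Usage: check_perl_modules || echo "install the missing modules first"
```

Any module reported as MISSING would need to be installed (for example from CPAN) before DMetaGenDlx.pl can run. DMetaGenDlx.exe is unaffected.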
|
| |
Description
|
- DMetaGenDlx.pl crawls the selected site and builds an index from each
page's address and content.
- DMetaGenDlx.exe does the same and does not require Perl to be installed.
- The primary purpose of this program is to use the power and ease
of Perl to index sites for use with other applications.
|
| |
Setup
|
|
Little setup is required: DMetaGenDlx is designed to run from the command line, with the site address passed to it as an argument.
- If you are running DMetaGenDlx.pl on a UNIX or Linux machine, set its permissions to 755 with chmod (chmod 755 DMetaGenDlx.pl).
|
| |
Usage
|
|
DMetaGenDlx is executed from the command line as follows:
- perl DMetaGenDlx.pl http://yourhost.com (if you choose to run the
Perl source)
- DMetaGenDlx.exe http://yourhost.com (if you choose to run the
binary)
- Note: pass only the site address. Do not append index.html or any
other specific page, or the robot will not function correctly.
- An ASCII table (site.txt) is generated containing the URL and
body of each indexed page.
- As the spider indexes your site, its progress is printed to the command prompt window. This is useful if you develop an application that runs DMetaGenDlx.exe and you want to capture the output and give the user feedback.
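Because the robot expects a bare site address, a calling script may want to check the argument before starting a crawl. The sketch below is one way to do that in the shell. The function name is_site_root is hypothetical, and treating a single trailing slash as acceptable is an assumption, not something this documentation confirms.

```shell
# Hypothetical helper: succeed only when the argument looks like a bare
# site address (http:// plus host, optional trailing slash) with no page.
is_site_root() {
    case "$1" in
        http://*/?*) return 1 ;;   # something follows the host, e.g. /index.html
        http://?*)   return 0 ;;   # bare host, with or without a trailing slash
        *)           return 1 ;;   # not an http:// address at all
    esac
}

# Usage (assumes DMetaGenDlx.pl is in the current directory):
#   url=http://yourhost.com
#   is_site_root "$url" && perl DMetaGenDlx.pl "$url"
```

Since progress is written to the command prompt window, the same calling script could also pipe the robot's output into a file or another program to provide feedback.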
|
| |
Notes
|
- Daboo Meta Tag Generator Deluxe Bot will not index outside the
domain it is set to crawl, and it treats sub-domains as separate domains.
For example, if it is set to index "help.dienhart.com" and that site
contains links to "free.dienhart.com", the spider considers
"free.dienhart.com" to be another domain and does not index it.
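The rule above amounts to an exact host-name comparison. The following sketch illustrates the idea in the shell. It is not the script's actual code, and the function names url_host and same_host are hypothetical.

```shell
# Hypothetical helper: print the lower-cased host part of a URL,
# using only POSIX parameter expansion.
url_host() {
    rest=${1#*://}      # drop the scheme (http://)
    rest=${rest%%/*}    # drop any path after the host
    rest=${rest%%:*}    # drop any port number
    printf '%s\n' "$rest" | tr 'A-Z' 'a-z'
}

# Succeed only when both URLs name exactly the same host. A sub-domain such
# as free.dienhart.com is therefore not "the same" as help.dienhart.com.
same_host() {
    [ "$(url_host "$1")" = "$(url_host "$2")" ]
}

# Usage:
#   same_host http://help.dienhart.com http://free.dienhart.com/x.html \
#       || echo "different host: would be reported as an invalid link"
```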
|
| |
Troubleshooting
|
- The crawler's progress is printed to the screen during execution. The
following messages should help you figure out what is going on:
- ADDED: The page was successfully indexed and added to site.txt.
- DUPLICATE NOT ADDED: The page has already been added to the index.
- INVALID LINK NOT ADDED: The link points outside the domain being
spidered, or it is a broken link within the domain.
- FILTERED NOT ADDED: Email links and certain file types are ignored
to improve crawler performance.
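When a crawl produces a lot of output, a per-message tally can make problems easier to spot. The sketch below counts the four messages listed above. It assumes each progress line begins with the message text followed by a colon, which this documentation does not spell out, and the function name summarize_progress is hypothetical.

```shell
# Hypothetical helper: read crawler output on stdin and print one line per
# message type seen, as "<message><TAB><count>", sorted by message.
# Assumes every progress line starts with the message followed by a colon.
summarize_progress() {
    awk '
        match($0, /^(ADDED|DUPLICATE NOT ADDED|INVALID LINK NOT ADDED|FILTERED NOT ADDED):/) {
            # RLENGTH includes the colon, so drop the last character.
            count[substr($0, 1, RLENGTH - 1)]++
        }
        END {
            for (status in count) printf "%s\t%d\n", status, count[status]
        }
    ' | sort
}

# Usage:
#   perl DMetaGenDlx.pl http://yourhost.com | summarize_progress
```

A large INVALID LINK NOT ADDED count relative to ADDED may simply mean that the site links heavily to other domains or sub-domains.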
|
| |
History
|
| 0.05 (11-30-2004) |
- Initial Release
- Based on DLBot.pl Revision 0.04
|
| |
| 0.7 (1-8-2005) |
- Improved script performance. DMetaGenDlx.exe now writes the index to a file as the site is crawled and, at the same time, reports the page currently being indexed to the calling program. This integrates well with Delphi.
|