[PLUTO-help] checking the links of a site

Thu 5 Apr 2007 15:15:56 CEST

simone wrote:
> I have a site with a great many pages, containing numerous links that point to other pages (.html). Since I would like to find out which links no longer point to anything, and should therefore be removed, I'm asking whether anyone knows a way to run this check.
>
> I was thinking of a script, but I wouldn't know where to start :-(
>
I haven't tried it, but one idea would be to run wget against the site 
you want to check and redirect standard error to a file, something 
like:
$ wget http://www.miosito.it -r -l 0 2> miofile.log
-r
--recursive
         Turn on recursive retrieving.

-l depth
--level=depth
           Specify recursion maximum depth level depth.  The default maximum
           depth is 5.
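Careful with -r: if you also let it follow links to other hosts (-H) 
you risk mirroring half the internet :-o. By default the recursion 
stays on the host you started from, so you remain inside your own site. 
Note that -l 0 does not switch recursion off: it is the same as -l inf, 
i.e. no depth limit, so every page of the site gets visited.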
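Once the log exists, the dead links still have to be pulled out of it. Here is a minimal sketch of that step, assuming wget ran with English messages (LC_ALL=C) and that each request in the log starts with a "--<timestamp>--  <URL>" line followed later by the HTTP status line. The function name list_404 is made up for illustration, and the log layout may differ between wget versions.

```shell
#!/bin/sh
# list_404 LOGFILE
# Print, once each, every URL in a wget log whose request got a 404.
# Assumption: English messages and request lines of the form
#   --<timestamp>--  <URL>
list_404() {
    awk '
        # A line starting with "--" opens a new request; its last field is the URL.
        /^--/ { url = $NF }
        # The status line of a failed request contains " 404 Not Found".
        / 404 Not Found/ && url != "" { print url; url = "" }
    ' "$1" | sort -u
}

# Possible usage (not run here):
#   LC_ALL=C wget http://www.miosito.it -r -l 0 2> miofile.log
#   list_404 miofile.log
```

The URLs printed are the missing targets; to find the pages that link to them you would still have to grep your HTML for each one.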

Other interesting options that I found in the man page:
 -O file
       --output-document=file
           The documents will not be written to the appropriate files, but all
           will be concatenated together and written to file.  If file already
           exists, it will be overwritten.  If the file is -, the documents
           will be written to standard output.
 -o logfile
       --output-file=logfile
           Log all messages to logfile.  The messages are normally reported to
           standard error.

--spider
           When invoked with this option, Wget will behave as a Web spider,
           which means that it will not download the pages, just check that
           they are there.  For example, you can use Wget to check your
           bookmarks:

                   wget --spider --force-html -i bookmarks.html

           This feature needs much more work for Wget to get close to the
           functionality of real web spiders.
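Putting --spider together with -r and -o gives a check that saves no pages and writes everything to a log. This is only a sketch: I am assuming a wget version where --spider works together with -r (older releases may not handle the combination well), and check_site and spider.log are names invented for the example.

```shell
#!/bin/sh
# check_site URL [LOGFILE]
# Crawl URL in spider mode (no pages are kept) and log all messages.
# check_site and the default log name are illustrative, not part of wget.
check_site() {
    url=$1
    log=${2:-spider.log}
    # LC_ALL=C keeps the messages in English, so the log is easy to grep.
    # -l inf removes the depth limit; without -H wget stays on the starting host.
    LC_ALL=C wget --spider -r -l inf -o "$log" "$url"
}

# Possible usage (not run here):
#   check_site http://www.miosito.it
#   grep '404 Not Found' spider.log
```

Compared with redirecting standard error, -o keeps the shell command simpler and --spider avoids filling the disk with a copy of the site.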

Have fun!
Bye, Fabio.