[med-svn] [liblwp-parallel-perl] 08/15: New upstream version 2.62
Andreas Tille
tille at debian.org
Tue Dec 26 12:28:33 UTC 2017
This is an automated email from the git hooks/post-receive script.
tille pushed a commit to branch master
in repository liblwp-parallel-perl.
commit 58772d4ba960102f5ec4d07e1358bc745d7280ac
Author: Andreas Tille <tille at debian.org>
Date: Tue Dec 26 13:22:21 2017 +0100
New upstream version 2.62
---
ChangeLog | 405 ++++++++++
MANIFEST | 30 +
META.json | 42 +
META.yml | 23 +
Makefile.PL | 37 +
README | 162 ++++
README.SSL | 14 +
TODO | 13 +
debian/changelog | 27 -
debian/compat | 1 -
debian/control | 35 -
debian/copyright | 34 -
debian/liblwp-parallel.docs | 3 -
debian/rules | 4 -
debian/watch | 3 -
lib/Bundle/ParallelUA.pm | 40 +
lib/LWP/Parallel.pm | 300 +++++++
lib/LWP/Parallel/Protocol.pm | 305 +++++++
lib/LWP/Parallel/Protocol/file.pm | 298 +++++++
lib/LWP/Parallel/Protocol/ftp.pm | 657 +++++++++++++++
lib/LWP/Parallel/Protocol/http.pm | 489 +++++++++++
lib/LWP/Parallel/Protocol/https.pm | 33 +
lib/LWP/Parallel/RobotUA.pm | 565 +++++++++++++
lib/LWP/Parallel/UserAgent.pm | 1571 ++++++++++++++++++++++++++++++++++++
lib/LWP/ParallelUA.pm | 59 ++
lib/LWP/RobotPUA.pm | 59 ++
t/TEST | 43 +
t/live/ENABLED.off | 0
t/live/jigsaw-auth-b.t | 51 ++
t/live/jigsaw-auth-d.t | 33 +
t/live/jigsaw-md5.t | 27 +
t/live/jigsaw-neg.t | 15 +
t/live/mozilla-ftp.t | 32 +
t/local/compatibility.t | 391 +++++++++
t/local/file.t | 260 ++++++
t/local/http.t | 370 +++++++++
t/local/timeouts.t | 249 ++++++
37 files changed, 6573 insertions(+), 107 deletions(-)
diff --git a/ChangeLog b/ChangeLog
new file mode 100755
index 0000000..00ca026
--- /dev/null
+++ b/ChangeLog
@@ -0,0 +1,405 @@
+Sun May 29 2016 Michael South <msouth at cpan.org>
+ o CPAN Release 2.62
+
+ o Fix HTTPS support with newer LWP versions, a HUGE thanks
+ to Sebastian Willing <sewi at cpan.org> for taking care of this.
+
+Mon Feb 18 2013 Michael South <msouth at cpan.org>
+ o CPAN Release 2.61
+
+ o Correcting error in Makefile.PL :(
+
+Mon Feb 18 2013 Michael South <msouth at cpan.org>
+ o CPAN Release 2.60
+
+ o using 'localhost' when creating daemons for any local/*.t
+ tests. Some machines don't return a reachable domain
+ for `hostname`. This may need further refinement.
+ o exiting explicitly from daemon in local/*.t code that
+ creates them. This was already done in timeouts.t,
+ adding it to the other two locations.
+ o attempting a fix for https://rt.cpan.org/Ticket/Display.html?id=46821
+ without having reproduced. Should at least do no harm, though.
+ o applied (some of) the patch for https://rt.cpan.org/Ticket/Display.html?id=35775
+ by CHORNY at cpan.org (issue with use of `pwd` on Win32)
+ o adding (generated) META.yml and META.json
+
+Fri Feb 15 2013 Michael South <msouth at cpan.org>
+ o CPAN Release 2.59
+
+ o some tests were failing due to prerequisites that were not specified
+ o a little more robustness/info in debug mode of t/local/compatibility.t
+
+Tue Dec 11 2012 Michael South <msouth at cpan.org>
+
+ o CPAN Release 2.58
+
+ o updates for newer versions of libwww (tests now pass with libwww 6.04)
+ o "harder" deprecation of some (previously deprecated) UA shadow methods
+ o removed some deprecated constructs that were throwing warnings
+
+Tue Feb 10 2004 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.57
+
+ o redirect bug fixed (patch by Thomas Boutell)
+
+Mon May 26 2003 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.56
+
+ o forgot to package t/TEST in the latest release, so all tests failed
+
+ o some miscenallenous bugfixes (undefined values, etc)
+
+Fri May 23 2003 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.55 (namefix release: didn't realize PAUSE had some
+ special handling mechanisms for x.yy_zz version names, which resulted
+ in the 2.54_21 release not being indexed)
+
+ o No code changes to 2.54_21!
+
+Tue Mar 11 2003 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.54_21 (bugfix release)
+
+ o Packaging
+
+Wed Feb 19 2003 Marc Langheinrich <marclang at cpan.org>
+
+ o Release 2.54_20 (beta testing only)
+
+ o bugfix: [Parallel::UserAgent.pm] called _new_response without fully
+ qualifiying the package (should be LWP::UserAgent::_new_response),
+ as this was screwed up in the bugfix in 2.54_19
+
+Wed Feb 19 2003 Marc Langheinrich <marclang at cpan.org>
+
+ o Release 2.54_19 (beta testing only)
+
+ o rearranged tests and copied LWP's t/TEST file for ease-of-use
+
+ o added local/file tests
+
+ o added live tests (mozilla, jigsaw.w3c.org), see t/TEST
+
+ o added [Parallel::Protocol::file.pm] by Jeff Behr
+
+ o added file.pm support to [Parallel::UserAgent.pm]
+
+ o bugfix: [Parallel::Protocol.pm] fixed some LWP::Debug::trace calls
+
+ o bugfix: [Parallel::Protocol.pm] receiving into a file would always
+ _append_
+
+ o bugfix: [Parallel::UserAgent.pm] called _new_reponse without $self->
+
+ o bugfix: [Parallel::UserAgent.pm] _single_request would drop parameters
+ when delegating to ->register
+
+ o bugfix: [Parallel::UserAgent.pm] send_request and request would create
+ new useragent, without copying currents ua's settings. now they operate
+ on $self instead (thus retaining all current settings)
+
+ o bugfix: [Parallel::RobotUA.pm] fixed syntax errors!! :-(
+
+ o bugfix: [Parallel::RobotUA.pm] $rules->visit would not be updated if
+ an earlier visit had failed (even though the current one worked), or
+ if we requested the robots.txt file itself.
+
+ o bugfix: [Parallel::RobotUA.pm] added ->use_sleep support
+
+ o checked [Parallel::RobotUA] against LWP::RobotUA 1.18
+
+ o updated RobotUA->delay method to use minutes instead of seconds
+ (compatible with LWP::RobotUA behavior)
+
+ o bugfix: [Parallel::Protocol::ftp.pm] fixed HTTP::Response calls (forgot to
+ add "->new", aargh!)
+
+ o bugfix: [Parallel::Protocol::ftp.pm] added $key and $account vars to make
+ conn_cache work
+
+ o bugfix: [Parallel::Protocol::ftp.pm] fixed various syntax errors!!! :-|
+
+ o renamed Parallel::Protocol::http::connect method to _connect (to indicate
+ that it's private)
+
+ o bugfix: [Parallel::Protocol::http.pm] handle_connect had wrong return
+
+ o bugfix: [Parallel::Protocol::http.pm] pushback buffer (also spotted by
+ Dongqiang Bai)
+
+ o bugfix: [Parallel::Protocol::http.pm] empty buffers passed to ->receive
+
+ o bugfix: [Parallel::Protocol::http.pm] handle_connect had wrong return
+ format
+
+Thu Mar 28 2002 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.54
+
+ o compatibility fixes only in this release (against LWP 5.64). This also
+ means still no support for HTTP/1.1 :-( Uses LWP 'backdoor' to force use
+ of HTTP/1.0
+
+ o checked against libwww-5.64
+
+Mon May 28 2001 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.51
+
+ o NO SUPPORT FOR HTTP/1.1 YET! (new in libwww-perl-5.53)
+
+ o connections that time out halfway through their data transfer now
+ call the on_failure callback (nothing was called in prior versions,
+ thanks to Jonathan Feinberg <jdf at pobox.com> for spotting this)
+
+ o prepared compatibilty for POE http://sourceforge.net/projects/poe/
+ (still needs some work, thanks to http://www.en-directo.net/mail/kirill.html)
+
+ o added (optional) non-blocking connects (use $ua->nonblock(1))
+ (experimental, thanks to http://www.en-directo.net/mail/kirill.html)
+
+ o Fixed bug in Useragent.pm assigning same response to all timed out
+ entries (thanks to John Salmon <john at thesalmons.org>)
+
+ o checked against libwww-5.53
+
+Mon Feb 21 2001 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.50
+
+ o Fixed https.pm inheritance bug
+ (thanks to Chris Osborn <cosborn at digisle.net> and countless others)
+
+ o Fixed (serious) RobotUA.pm bug, where robots.txt files get ignored
+ (thanks to Vlad Ciubotariu <ciu at infoiasi.ro>)
+
+ o checked against libwww-5.50
+
+Mon May 08 2000 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.49
+
+ o added initial https support
+ (as suggested by Marian Szabo <mszabo at coralwave.com>)
+
+Mon May 08 2000 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.48a
+
+ o Added (empty) DESTROY method to Parallel::UserAgent::Entry in
+ order to please perl5.6
+
+Wed Apr 19 2000 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.48
+
+ o Updated changes in libwww-5.48 (since 5.43)
+ UserAgent.pm: 1.66 -> 1.73
+ RobotUA.pm: 1.15 -> 1.17
+ Protocol.pm: 1.33 -> 1.36
+ http.pm: 1.46 -> 1.49
+ ftp.pm: 1.25 -> 1.27
+
+Sun Jul 18 1999 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.43
+
+ o Fixed FTP mode and finally unblocked ftp-scheme again :-)
+
+ o Checked against libwww-5.43
+
+Thu Apr 15 1999 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.41
+
+ o Updated changes in libwww-5.42:
+ UserAgent.pm: 1.64 -> 1.66
+ RobotUA.pm: 1.14 -> 1.15
+ Protocol.pm: 1.31 -> 1.33
+ http.pm: 1.44 -> 1.46
+
+ o Fixed $ua->max_size bug in Parallel/Protocol.pm
+
+Tue Jan 12 1999 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.40
+
+ o Updated changes in libwww-5.41:
+ UserAgent.pm: 1.62 -> 1.64
+ RobotUA.pm: 1.13 -> 1.14
+ http.pm: 1.43 -> 1.44
+ ftp.pm: 1.24 -> 1.25
+
+ o Fixed '$1' bug in Parallel/Protocol/http.pm
+
+Tue Nov 24 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.39
+
+ o Added incompatibility warning for libwww > supported ones
+ (currently up to 5.36)
+
+ o Added cookie jar to redirects (fix by Andrey Chernov <ache at nagual.pp.ru>)
+
+Tue Nov 17 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o Fixed problem with otherwise successful requests that took a
+ long time to have their handles closed. These would be set to "408
+ timeout" although all the data was already transmitted. Now the
+ existing status is preserved and the string (timeout) appended
+ instead.
+
+ o Added more timeout.t tests to make sure the above scheme works.
+
+Tue Nov 10 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.38
+
+ o Split parallel.t into three separate testfiles: compatibility.t,
+ parallel.t and timeouts.t
+
+ o Fixed timeout handling in Parallel::UserAgent.
+
+ o Added Timeout tests to parallel.t testscript (this included
+ using a forking HTTP::Daemon instead of our serial one, since
+ otherwise the timeout simulations would break the server).
+
+ o Fixed die's when calling ->previous methods in parallel.t
+
+Tue Sep 29 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o Fixed testcript parallel.t for broken redirection on ppc-linux.
+
+Tue Sep 22 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.37
+
+ o Fixed bug in Parallel::UserAgent where timeout would fail to create
+ HTTP::Response object.
+
+ o Added some simple examples to LWP::Parallel manpage.
+
+Mon Sep 7 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.36
+
+ o Fixed bogus version information in LWP::Parallel and Bundle::
+ ParallelUA.(forgot to update $VERSION variable accordingly)
+
+ o Fixed missing SYNOPSIS element in ParallelUA.pm and RobotPUA.pm
+
+ o Blocked ftp-scheme handling since it wasn't working at all. Thanks
+ to Gary Foster for actually being the first person to try using
+ Paralle::UserAgent with ftp requests. This will hopefully be fixed
+ in the next release.
+
+Tue Sep 1 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.35
+
+ o http.pm: updated changes in libwww-5.46 (http.pm,v 1.43). Fixed
+ problems with "die"s during operation (now caught by UserAgent.pm)
+
+ o ftp.pm: updated changes in libwww-5.46 (ftp.pm,v 1.24)
+
+ o UserAgent.pm: Fixed "eval{}" calls to Protocol.pm objects. Updated
+ changes in libwww-5.46 (UserAgent.pm,v 1.62)
+
+ o Checked against libwww-5.36
+
+Thu Jul 9 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.34
+
+ o Fixed wrong package name in LWP::Parallel::Protocol::ftp
+
+ o Fixed typo in Parallel::UserAgent
+
+ o Updated libwww-5.34 changes (one in ftp.pm,v 1.23)
+
+ o Checked against libwww-5.34
+
+Wed Jun 10 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o Developers Release 2.33a
+
+ o Fixed "die"s in Parallel/Protocol/http.pm to return
+ HTTP::Response instead.
+
+ o Adjusted Parallel/UserAgent.pm to handle http.pm errors.
+
+ o Checked against libwww-5.33
+
+Mon Apr 27 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o CPAN Release 2.32
+
+ o Updated libwww-5.31 bugfixes (one in UserAgent.pm,v 1.60)
+
+ o Checked against libwww-5.31 and 5.32
+
+Fri Mar 27 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o First official CPAN release, version 2.31
+
+ o Updated libwww5.22 bugfixes (one in UserAgent.pm,v 1.59)
+
+ o Checked against libwww-5.22.
+
+Fri Mar 13 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o Developer Release 2.3c (pre-release of 2.31)
+
+ o Fixed installation bug in Makefile.PL (LWP uses ->Version method
+ for version checking).
+
+ o Updated libwww5.21 bugfixes (two in http.pm,v 1.41).
+
+ o Checked against libwww-5.21.
+
+
+Wed Mar 11 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o Developer Release 2.3b
+
+ o Added $ua->discard_entry($entry) method. This can be used in
+ callbacks to discard the entire entry thus freeing up memory.
+
+ o Makefile.PL will check for correct version of LWP.
+
+ o Fixed some installation glitches.
+
+
+Fri Mar 6 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o Developer Release 2.3a
+
+ o Moved all modules into the LWP::Parallel subdirectory.
+
+ o LWP::ParallelUA and LWP::RobotPUA only stubs for 'real' modules
+ underneath LWP::Parallel tree (LWP::Parallel::UserAgent and
+ LWP::Parallel::RobotUA).
+
+ o Parallel::UserAgent, Parallel::RobotUA and Parallel::Protocol now
+ inherent most of their code from the corresponding LWP modules.
+
+ o Added some documentation in Parallel::UserAgent.
+
+ o Added Makefile.PL support (should compile & install now out of
+ the box using the standard sequence "perl Makefile.PL; make;
+ make test; make install").
+
+ o Supplied test script (in t/parallel.t)
+
+ o Checked against libwww-5.20
+
+
+Mon Mar 2 1998 Marc Langheinrich <marclang at cpan.org>
+
+ o Created standard perl module package format.
+
+
diff --git a/MANIFEST b/MANIFEST
new file mode 100755
index 0000000..473a8f7
--- /dev/null
+++ b/MANIFEST
@@ -0,0 +1,30 @@
+ChangeLog
+MANIFEST
+TODO
+README
+README.SSL
+Makefile.PL
+META.yml
+META.json
+lib/Bundle/ParallelUA.pm
+lib/LWP/Parallel.pm
+lib/LWP/ParallelUA.pm
+lib/LWP/RobotPUA.pm
+lib/LWP/Parallel/UserAgent.pm
+lib/LWP/Parallel/RobotUA.pm
+lib/LWP/Parallel/Protocol.pm
+lib/LWP/Parallel/Protocol/http.pm
+lib/LWP/Parallel/Protocol/https.pm
+lib/LWP/Parallel/Protocol/ftp.pm
+lib/LWP/Parallel/Protocol/file.pm
+t/TEST
+t/local/compatibility.t
+t/local/file.t
+t/local/http.t
+t/local/timeouts.t
+t/live/jigsaw-auth-b.t
+t/live/jigsaw-auth-d.t
+t/live/jigsaw-md5.t
+t/live/jigsaw-neg.t
+t/live/mozilla-ftp.t
+t/live/ENABLED.off
diff --git a/META.json b/META.json
new file mode 100644
index 0000000..27fc3b2
--- /dev/null
+++ b/META.json
@@ -0,0 +1,42 @@
+{
+ "abstract" : "Extension for LWP to allow parallel HTTP and FTP access",
+ "author" : [
+ "Marc Langheinrich <marclang at cpan.org>, Mike South (maintainer) <msouth at cpan.org>"
+ ],
+ "dynamic_config" : 1,
+ "generated_by" : "ExtUtils::MakeMaker version 6.96, CPAN::Meta::Converter version 2.150001",
+ "license" : [
+ "perl_5"
+ ],
+ "meta-spec" : {
+ "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
+ "version" : "2"
+ },
+ "name" : "ParallelUserAgent",
+ "no_index" : {
+ "directory" : [
+ "t",
+ "inc"
+ ]
+ },
+ "prereqs" : {
+ "build" : {
+ "requires" : {
+ "ExtUtils::MakeMaker" : "0"
+ }
+ },
+ "configure" : {
+ "requires" : {
+ "ExtUtils::MakeMaker" : "0"
+ }
+ },
+ "runtime" : {
+ "requires" : {
+ "IO::Socket::SSL" : "0",
+ "LWP" : "5.64"
+ }
+ }
+ },
+ "release_status" : "stable",
+ "version" : "2.62"
+}
diff --git a/META.yml b/META.yml
new file mode 100644
index 0000000..0974720
--- /dev/null
+++ b/META.yml
@@ -0,0 +1,23 @@
+---
+abstract: 'Extension for LWP to allow parallel HTTP and FTP access'
+author:
+ - 'Marc Langheinrich <marclang at cpan.org>, Mike South (maintainer) <msouth at cpan.org>'
+build_requires:
+ ExtUtils::MakeMaker: '0'
+configure_requires:
+ ExtUtils::MakeMaker: '0'
+dynamic_config: 1
+generated_by: 'ExtUtils::MakeMaker version 6.96, CPAN::Meta::Converter version 2.150001'
+license: perl
+meta-spec:
+ url: http://module-build.sourceforge.net/META-spec-v1.4.html
+ version: '1.4'
+name: ParallelUserAgent
+no_index:
+ directory:
+ - t
+ - inc
+requires:
+ IO::Socket::SSL: '0'
+ LWP: '5.64'
+version: '2.62'
diff --git a/Makefile.PL b/Makefile.PL
new file mode 100755
index 0000000..dc0d536
--- /dev/null
+++ b/Makefile.PL
@@ -0,0 +1,37 @@
+# This -*- perl -*- script writes the Makefile for LWP::Parallel
+# $Id: Makefile.PL,v 1.14 2004/02/10 15:19:10 langhein Exp $
+use ExtUtils::MakeMaker;
+
+# See lib/ExtUtils/MakeMaker.pm for details of how to influence
+# the contents of the Makefile that is written.
+WriteMakefile(
+ 'NAME' => 'LWP::Parallel',
+ 'ABSTRACT' => 'Extension for LWP to allow parallel HTTP and FTP access',
+ 'DISTNAME' => 'ParallelUserAgent',
+ 'VERSION_FROM' => 'lib/LWP/Parallel.pm', # finds $VERSION
+ 'LICENSE' => 'perl',
+ 'AUTHOR' => 'Marc Langheinrich <marclang at cpan.org>, Mike South (maintainer) <msouth at cpan.org>',
+ #
+ # Not 100% sure what is needed here--I'm stripping out
+ # the old http protocol ones though because a recent change should remove
+ # the dependency. Passes tests; if you have problems after this change,
+ # please submit a failing test.
+ #
+ 'PREREQ_PM' => {
+ 'LWP' => 5.64,
+ 'IO::Socket::SSL' => 0,
+ },
+);
+
+package MY;
+
+sub test
+{
+ q(
+test: all
+ $(FULLPERL) t/TEST
+
+);
+}
+
+
diff --git a/README b/README
new file mode 100755
index 0000000..9989521
--- /dev/null
+++ b/README
@@ -0,0 +1,162 @@
+
+ Parallel User Agent
+ ---------------------
+
+(c) 1997-2004 Marc Langheinrich <marclang at cpan.org>
+
+This library is free software; you can redistribute it and/or modify
+it under the same terms as Perl itself.
+
+ Disclaimer
+ ----------
+ Unfortunately, I have very little time to maintain this
+ module. Though I eventually get around to update it every
+ so often, please be aware that the level of support and
+ quality for this Perl module is somewhat less than what you
+ might expect from official CPAN releases :-(
+
+ Disclaimer 2.0
+ --------------
+ Hi, this is Mike South (msouth at cpan.org). I got myself added
+ as a co-maintainer to this project so that I could fix a few
+ small bugs and let it pass tests using more recent versions of
+ libwww. I have no deep knowledge of the module but I am
+ trying to address trivial blockers that are preventing legacy
+ users from being able to install on an updated system. I will
+ do my best to handle any issues that come up, and I thank you
+ in advance for your patience and suggestions/patches.
+
+
+1. What is it?
+--------------
+ParallelUserAgent (or PUA for short) is an extension of the existing
+libwww-5.x distribution. It allows you to connect to download several
+Web pages in _parallel_, without having to request each page one after
+the other.
+
+Instead of retrieving each single page using LWP::UserAgent::request
+or ::simple_request methods, you first "register" all pages that you
+wish to download in parallel and then call PUA's "wait" method, which
+will then make all the necessary connections and download the
+pages.
+
+You can define callback routines which will be called whenever a
+connection is established, is cut off, new data comes in or a request
+finishes. The subroutines can be global for all requests you
+registered, or different for every single request.
+
+Callbacks for example allow you to print status updates as the
+responses come in, or even immediately post follow-up requests based
+on the responses, all within the same single "wait" cycle.
+
+2. How to install it?
+---------------------
+
+PUA comes with an out-of-the-box installation setup. All you have to do
+is the usual
+
+perl Makefile.PL
+make
+make test
+make install
+
+Previous versions of PUA asked you to replace a few standard modules
+that came with libwww5.x. As of version 2.30, PUA will install all of
+its modules into the LWP::Parallel subtree, so that your previous
+libwww installation remains unchanged!
+
+However, it also requires the latest version of the libwww that it has
+been tested against. Earlier (or later) version might work, too, but
+you are on your own there. If you are using a different version then
+the ones listed below, either upgrade to the latest tested libwww
+distribution, or try running the testscript ("make test") and see if
+it tests ok anyways :-)
+
+Checked versions so far:
+
+ libwww-5.76 - 6.04* use ParallelUA-2.62
+
+ libwww-5.66 - 5.69 untested
+ libwww-5.64 - 5.65 use ParallelUA-2.55
+ libwww-5.60 - 5.63 use ParallelUA-2.54
+ libwww-5.52 - 5.53 use ParallelUA-2.51
+ libwww-5.48 - 5.51 use ParallelUA-2.50
+ libwww-5.44 - 5.47 untested (use 2.43)
+ libwww-5.41 - 5.43 use ParallelUA-2.43
+ libwww-5.20 - 5.36 use up to ParallelUA-2.39
+ libwww-5.19 untested
+ libwww-5.18 and below not ok!
+
+Note: For optimal results use the _latest_ version of BOTH
+ libwww-perl (i.e. 6.04) and ParallelUserAgent (i.e. 2.62).
+
+* note--not every single version in this range has been tested. 5.76,
+5.808, 6.04 I am sure of. There could be versions in between that
+need to go back to 2.57. If you find that to be the case please
+file a bug at rt.cpan.org for this distribution.
+
+3. How to use it?
+-----------------
+The current distribution unfortunately has very little documentation
+:-( So far I'm happy that I can keep up with Gisle's constant output
+of new versions of libwww and haven't had much time to think about
+proper documentation.
+
+For starters, you should look into the t/ subdirectory in this
+distribution for some examples on how to use this library! Some brief
+explanations and examples can also be found in the LWP/Parallel.pm
+file -- you can use "perldoc LWP::Parallel" after you installed this
+module to view the embedded documentation.
+
+Most of the other modules also come with some brief description of
+their interfaces which can be viewed using "perldoc <Modulename>".
+
+
+4. Remarks for users upgrading from ParallelUA 2.20:
+----------------------------------------------------
+
+You should start out by installing a clean copy of the latest
+libwww-5.x package, which should overwrite any existing PUA modules
+which used to replace some of the LWP modules.
+
+The new PUA distribution will still offer the LWP::ParallelUA and
+LWP::RobotPUA modules for backward compatibility, but you are
+encouraged to start any new projects using the "local" modules now
+residing under the LWP::Parallel subtree, namely
+LWP::Parallel::UserAgent and LWP::Parallel::RobotUA.
+
+The old "TestScript.pl" that used to come with versions 2.20 and below
+can be found in t/TestScript.pl and can be used to check backward
+compatibility (or can also be used to get a somewhat bigger example on
+how to use this module).
+
+
+5. Questions, comments, etc.
+----------------------------
+
+You can join the libwww-perl mailing list by following the
+instructions at http://lists.perl.org/list/libwww-perl.html
+(send email to libwww-subscribe at perl.org).
+
+Bug reports and suggestions for improvements can be submitted via
+the cpan RT installation:
+
+https://rt.cpan.org/Dist/Display.html?Queue=ParallelUserAgent
+
+COPYRIGHT
+
+ � 1997-2004 Marc Langheinrich. All rights reserved.
+
+This library is free software; you can redistribute it and/or modify
+it under the same terms as Perl itself.
+
+
+Have fun!
+
+--
+$Revision: 1.20 $
+
+
+
+
+
diff --git a/README.SSL b/README.SSL
new file mode 100755
index 0000000..538d634
--- /dev/null
+++ b/README.SSL
@@ -0,0 +1,14 @@
+SSL SUPPORT
+-----------
+
+ ** DISCLAIMER: https support is pretty buggy as of now. i haven't **
+ ** had time to test much of it, so feel free to see if it works **
+ ** for you, but don't expect it to :-) **
+
+If you enabled your libwww-perl package to support SSL/TLSv1 with its
+HTTP client and server classes, you can readily access HTTPS sites
+with ParallelUA as well.
+
+Please refer to README.SSL in the libwww-perl distribution for further
+information on how to enable SSL support in libwww-perl (and therefor
+in ParallelUA).
diff --git a/TODO b/TODO
new file mode 100644
index 0000000..1ad15d2
--- /dev/null
+++ b/TODO
@@ -0,0 +1,13 @@
+* Support all LWP features, e.g.,
+ - support setting specific ip address (aargh, need to use Net::HTTP?!)
+ - support non-blocking calls (? no idea what this entails, see above)
+* Improve Tests
+ - Add tests for local writes
+ - Add tests for https
+ - Add tests for POE
+ - Add tests for local ftp via Net::FTPServer::InMem
+* Include HTTP/1.1 support
+ - Add keep-alive support
+ - Add chunked support (live/jigsaw-chunked.t)
+ - Understand and implement Client-Response-Num header
+ - Implement TE/ZLIB support (live/jigsaw-te.t)
diff --git a/debian/changelog b/debian/changelog
deleted file mode 100644
index 526c767..0000000
--- a/debian/changelog
+++ /dev/null
@@ -1,27 +0,0 @@
-liblwp-parallel-perl (2.62-1) UNRELEASED; urgency=medium
-
- * Team upload.
- * New upstream version
- * Secure URI in watch file
-
- -- Andreas Tille <tille at debian.org> Tue, 26 Dec 2017 13:21:30 +0100
-
-liblwp-parallel-perl (2.57-3) experimental; urgency=low
-
- * Adjusted binary dependency to match libwww5.808-perl.
- * Added previously omitted build dependency on libwww5.808-perl
-
- -- Steffen Moeller <moeller at debian.org> Sun, 17 Oct 2010 02:10:44 +0200
-
-liblwp-parallel-perl (2.57-2) experimental; urgency=low
-
- * Renamed package (Closes: #592528, #593587).
- * Upload to experimental.
-
- -- Steffen Moeller <moeller at debian.org> Sat, 04 Sep 2010 16:43:46 +0200
-
-libparallel-useragent-perl (2.57-1) unstable; urgency=low
-
- * Initial Release.
-
- -- Richard Holland <holland at eaglegenomics.com> Thu, 19 Aug 2010 15:21:00 +0200
diff --git a/debian/compat b/debian/compat
deleted file mode 100644
index 7f8f011..0000000
--- a/debian/compat
+++ /dev/null
@@ -1 +0,0 @@
-7
diff --git a/debian/control b/debian/control
deleted file mode 100644
index 8c82c67..0000000
--- a/debian/control
+++ /dev/null
@@ -1,35 +0,0 @@
-Source: liblwp-parallel-perl
-Section: perl
-Priority: optional
-Build-Depends: debhelper (>= 7), libwww5.808-perl
-Build-Depends-Indep: perl
-Maintainer: Richard Holland <holland at eaglegenomics.com>
-Uploaders: Steffen Moeller <moeller at debian.org>
-Standards-Version: 3.9.1
-Homepage: http://search.cpan.org/dist/ParallelUserAgent/
-
-Package: liblwp-parallel-perl
-Architecture: all
-Depends: ${misc:Depends}, ${perl:Depends}, libwww5.808-perl
-Replaces: libparallel-useragent-perl
-Description: Extension for LWP to allow parallel HTTP and FTP access
- This class implements a user agent that access web sources in parallel.
- .
- Using a LWP::Parallel::UserAgent as your user agent, you typically
- start by registering your requests, along with how you want the
- Agent to process the incoming results (see $ua->register).
- .
- Then you wait for the results by calling $ua->wait. This method
- only returns, if all requests have returned an answer, or the Agent
- timed out. Also, individual callback functions might indicate that
- the Agent should stop waiting for requests and return. (see
- $ua->register)
- .
- This package was first erroneously named libparallel-useragent-perl,
- which would have reflected how the useragent is invoked in the
- Perl code, but only after importing its module LWP::Parallel.
- The module depends on a legacy version of the www Perl module.
- A better alternative to Parallel::Useragent may be to adopt
- WWW::Curl::Simple for the problem. This package was submitted to
- the experimental distribution of Debian because of its dependency
- on libwww-perl5.808-perl that was submitted to experimental.
diff --git a/debian/copyright b/debian/copyright
deleted file mode 100644
index 43fb8b9..0000000
--- a/debian/copyright
+++ /dev/null
@@ -1,34 +0,0 @@
-Format-Specification: http://svn.debian.org/wsvn/dep/web/deps/dep5.mdwn?op=file&rev=135
-Maintainer: Marc Langheinrich, marclang at cpan.org
-Source: http://search.cpan.org/dist/ParallelUserAgent/
-Name: ParallelUserAgent
-
-Files: *
-Copyright: Marc Langheinrich, marclang at cpan.org
-License: Artistic or GPL-1+
-
-Files: lib/LWP/Parallel/Protocol.pm, lib/LWP/Parallel/UserAgent.pm,
- lib/LWP/Parallel/Protocol/http.pm, lib/LWP/Parallel/RobotUA.pm
-Copyright: Marc Langheinrich, marclang at cpan.org
- Parts 1995-2004 Gisle Aas
-License: Artistic or GPL-1+
-
-Files: debian/*
-Copyright: 2010, Richard Holland <holland at eaglegenomics.com>
-License: Artistic or GPL-1+
-
-License: Artistic
- This program is free software; you can redistribute it and/or modify
- it under the terms of the Artistic License, which comes with Perl.
- .
- On Debian GNU/Linux systems, the complete text of the Artistic License
- can be found in `/usr/share/common-licenses/Artistic'.
-
-License: GPL-1+
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 1, or (at your option)
- any later version.
- .
- On Debian GNU/Linux systems, the complete text of version 1 of the
- General Public License can be found in `/usr/share/common-licenses/GPL-1'.
diff --git a/debian/liblwp-parallel.docs b/debian/liblwp-parallel.docs
deleted file mode 100644
index 0b59c59..0000000
--- a/debian/liblwp-parallel.docs
+++ /dev/null
@@ -1,3 +0,0 @@
-TODO
-README.SSL
-README
diff --git a/debian/rules b/debian/rules
deleted file mode 100755
index 2d33f6a..0000000
--- a/debian/rules
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/usr/bin/make -f
-
-%:
- dh $@
diff --git a/debian/watch b/debian/watch
deleted file mode 100644
index 91ba44e..0000000
--- a/debian/watch
+++ /dev/null
@@ -1,3 +0,0 @@
-version=4
-
-https://metacpan.org/release/ParallelUserAgent/ .*/ParallelUserAgent-v?(\d[\d_.-]+)\.(?:tar(?:\.gz|\.bz2)?|tgz|zip)$
diff --git a/lib/Bundle/ParallelUA.pm b/lib/Bundle/ParallelUA.pm
new file mode 100755
index 0000000..54c5d9e
--- /dev/null
+++ b/lib/Bundle/ParallelUA.pm
@@ -0,0 +1,40 @@
+# -*- perl -*-
+# $Id: ParallelUA.pm,v 1.9 2003/02/19 14:57:55 langhein Exp $
+
+package Bundle::ParallelUA;
+
+$VERSION = '2.54_19';
+
+1;
+
+__END__
+
+=head1 NAME
+
+Bundle::ParallelUA - CPAN Bundle for the LWP Parallel User Agent extension
+
+=head1 SYNOPSIS
+
+C<perl -MCPAN -e 'install Bundle::ParallelUA'>
+
+=head1 CONTENTS
+
+ExtUtils::MakeMaker - should be in perl disribution
+
+LWP::UserAgent - Base for Parallel::UserAgent
+
+LWP::RobotUA - Base for Parallel::RobotUA
+
+LWP::Protocol - Base Protocol implementations
+
+LWP::Parallel - Parallel User Agent itself
+
+=head1 DESCRIPTION
+
+This bundle defines all required modules for ParallelUserAgent.
+
+=head1 AUTHOR
+
+Marc Langheinrich
+
+=cut
diff --git a/lib/LWP/Parallel.pm b/lib/LWP/Parallel.pm
new file mode 100755
index 0000000..fc3a9c2
--- /dev/null
+++ b/lib/LWP/Parallel.pm
@@ -0,0 +1,300 @@
+# -*- perl -*-
+# $Id: Parallel.pm,v 1.21 2004/02/10 15:19:18 langhein Exp $
+
+package LWP::Parallel;
+
+$VERSION = '2.62';
+sub Version { $VERSION };
+
+require 5.004;
+require LWP::Parallel::UserAgent; # this should load everything you need
+
+1;
+
+__END__
+
+=head1 NAME
+
+LWP::Parallel - Extension for LWP to allow parallel HTTP and FTP access
+
+=head1 SYNOPSIS
+
+ use LWP::Parallel;
+ print "This is LWP::Parallel_$LWP::Parallel::VERSION\n";
+
+=head1 DESCRIPTION
+
+=head2 Introduction
+
+ParallelUserAgent is an extension to the existing libwww module. It
+allows you to take a list of URLs (it currently supports HTTP, FTP, and
+FILE URLs. HTTPS might work, too) and connect to all of them _in parallel_,
+then wait for the results to come in.
+
+See the Parallel::UserAgent for how to create a LWP UserAgent that
+will access multiple Web resources in parallel. The Parallel::RobotUA
+module will additionally offer proper handling of robot.txt file, the
+de-facto exclusion protocol for Web Robots.
+
+=head2 Examples
+
+The following examples might help to get you started:
+
+
+ require LWP::Parallel::UserAgent;
+ use HTTP::Request;
+
+ # display tons of debugging messages. See 'perldoc LWP::Debug'
+ #use LWP::Debug qw(+);
+
+ # shortcut for demo URLs
+ my $url = "http://localhost/";
+
+ my $reqs = [
+ HTTP::Request->new('GET', $url),
+ HTTP::Request->new('GET', $url."homes/marclang/"),
+ ];
+
+ my $pua = LWP::Parallel::UserAgent->new();
+ $pua->in_order (1); # handle requests in order of registration
+ $pua->duplicates(0); # ignore duplicates
+ $pua->timeout (2); # in seconds
+ $pua->redirect (1); # follow redirects
+
+ foreach my $req (@$reqs) {
+ print "Registering '".$req->url."'\n";
+ if ( my $res = $pua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ }
+ }
+ my $entries = $pua->wait();
+
+ foreach (keys %$entries) {
+ my $res = $entries->{$_}->response;
+
+ print "Answer for '",$res->request->url, "' was \t", $res->code,": ",
+ $res->message,"\n";
+ }
+
+Parallel::UserAgent (as well as the Parallel::RobotUA) offer three
+default methods that will be called at certain points during the
+connection: C<on_connect>, C<on_return> and C<on_failure>.
+
+
+ #
+ # provide subclassed UserAgent to override on_connect, on_failure and
+ # on_return methods
+ #
+ package myUA;
+
+ use Exporter();
+ use LWP::Parallel::UserAgent qw(:CALLBACK);
+ @ISA = qw(LWP::Parallel::UserAgent Exporter);
+ @EXPORT = @LWP::Parallel::UserAgent::EXPORT_OK;
+
+ # redefine methods: on_connect gets called whenever we're about to
+ # make a a connection
+ sub on_connect {
+ my ($self, $request, $response, $entry) = @_;
+ print "Connecting to ",$request->url,"\n";
+ }
+
+ # on_failure gets called whenever a connection fails right away
+ # (either we timed out, or failed to connect to this address before,
+ # or it's a duplicate). Please note that non-connection based
+ # errors, for example requests for non-existant pages, will NOT call
+ # on_failure since the response from the server will be a well
+ # formed HTTP response!
+ sub on_failure {
+ my ($self, $request, $response, $entry) = @_;
+ print "Failed to connect to ",$request->url,"\n\t",
+ $response->code, ", ", $response->message,"\n"
+ if $response;
+ }
+
+ # on_return gets called whenever a connection (or its callback)
+ # returns EOF (or any other terminating status code available for
+ # callback functions). Please note that on_return gets called for
+ # any successfully terminated HTTP connection! This does not imply
+ # that the response sent from the server is a success!
+ sub on_return {
+ my ($self, $request, $response, $entry) = @_;
+ if ($response->is_success) {
+ print "\n\nWoa! Request to ",$request->url," returned code ", $response->code,
+ ": ", $response->message, "\n";
+ print $response->content;
+ } else {
+ print "\n\nBummer! Request to ",$request->url," returned code ", $response->code,
+ ": ", $response->message, "\n";
+ # print $response->error_as_HTML;
+ }
+ return;
+ }
+
+ package main;
+ use HTTP::Request;
+
+ # shortcut for demo URLs
+ my $url = "http://localhost/";
+
+ my $reqs = [
+ HTTP::Request->new('GET', $url),
+ HTTP::Request->new('GET', $url."homes/marclang/"),
+ ];
+
+ my $pua = myUA->new();
+
+ foreach my $req (@$reqs) {
+ print "Registering '".$req->url."'\n";
+ $pua->register ($req);
+ }
+ my $entries = $pua->wait(); # responses will be caught by on_return, etc
+
+
+The final example will demonstrate a simple Web Robot that keeps a
+cache of the "robots.txt" permission files it has encountered so
+far. This example also uses callbacks to handle the response as it
+comes in.
+
+ require LWP::Parallel::UserAgent;
+ use HTTP::Request;
+
+ # persistent robot rules support. See 'perldoc WWW::RobotRules::AnyDBM_File'
+ require WWW::RobotRules::AnyDBM_File;
+
+ # shortcut for demo URLs
+ my $url = "http://www.cs.washington.edu/";
+
+ my $reqs = [
+ HTTP::Request->new('GET', $url),
+ # these are all redirects. depending on how you set
+ # 'redirect_ok' they either just return the status code for
+ # redirect (like 302 moved), or continue to follow redirection.
+ HTTP::Request->new('GET', $url."research/ahoy/"),
+ HTTP::Request->new('GET', $url."research/ahoy/doc/paper.html"),
+ HTTP::Request->new('GET', "http://metacrawler.cs.washington.edu:6060/"),
+ # these are all non-existant server. the first one should take
+ # some time, but the following ones should be rejected right
+ # away
+ HTTP::Request->new('GET', "http://www.foobar.foo/research/ahoy/"),
+ HTTP::Request->new('GET', "http://www.foobar.foo/foobar/foo/"),
+ HTTP::Request->new('GET', "http://www.foobar.foo/baz/buzz.html"),
+ # although server exists, file doesn't
+ HTTP::Request->new('GET', $url."foobar/bar/baz.html"),
+ ];
+
+ my ($req,$res);
+
+ # establish persistant robot rules cache. See WWW::RobotRules for
+ # non-permanent version. you should probably adjust the agentname
+ # and cache filename.
+ my $rules = new WWW::RobotRules::AnyDBM_File 'ParallelUA', 'cache';
+
+ # create new UserAgent (actually, a Robot)
+ my $pua = new LWP::Parallel::RobotUA ("ParallelUA",
+ 'yourname at your.site.com', $rules);
+
+ $pua->timeout (2); # in seconds
+ $pua->delay ( 5); # in seconds
+ $pua->max_req ( 2); # max parallel requests per server
+ $pua->max_hosts(10); # max parallel servers accessed
+
+ # for our own print statements that follow below:
+ local($\) = ""; # ensure standard $OUTPUT_RECORD_SEPARATOR
+
+ # register requests
+ foreach $req (@$reqs) {
+ print "Registering '".$req->url."'\n";
+ $pua->register ($req , \&handle_answer);
+ # Each request, even if it failed to # register properly, will
+ # show up in the final list of # requests returned by $pua->wait,
+ # so you can examine it # later.
+ }
+
+ # $pua->wait returns a pointer to an associative array, containing
+ # an '$entry' for each request made, sorted by its url. (as returned
+ # by $request->url->as_string)
+ my $entries = $pua->wait(); # give another timeout here, 25 seconds
+
+ # let's see what we got back (see also callback function!!)
+ foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+
+ # examine response to find cascaded requests (redirects, etc) and
+ # set current response to point to the very first response of this
+ # sequence. (not very exciting if you set '$pua->redirect(0)')
+ my $r = $res; my @redirects;
+ while ($r) {
+ $res = $r;
+ $r = $r->previous;
+ push (@redirects, $res) if $r;
+ }
+
+ # summarize response. see "perldoc HTTP::Response"
+ print "Answer for '",$res->request->url, "' was \t", $res->code,": ",
+ $res->message,"\n";
+ # print redirection history, in case we got redirected
+ foreach (@redirects) {
+ print "\t",$_->request->url, "\t", $_->code,": ", $_->message,"\n";
+ }
+ }
+
+ # our callback function gets called whenever some data comes in
+ # (same parameter format as standard LWP::UserAgent callbacks!)
+ sub handle_answer {
+ my ($content, $response, $protocol, $entry) = @_;
+
+ print "Handling answer from '",$response->request->url,": ",
+ length($content), " bytes, Code ",
+ $response->code, ", ", $response->message,"\n";
+
+ if (length ($content) ) {
+ # just store content if it comes in
+ $response->add_content($content);
+ } else {
+ # Having no content doesn't mean the connection is closed!
+ # Sometimes the server might return zero bytes, so unless
+ # you already got the information you need, you should continue
+ # processing here (see below)
+
+ # Otherwise you can return a special exit code that will
+ # determins how ParallelUA will continue with this connection.
+
+ # Note: We have to import those constants via "qw(:CALLBACK)"!
+
+ # return C_ENDCON; # will end only this connection
+ # (silly, we already have EOF)
+ # return C_LASTCON; # wait for remaining open connections,
+ # but don't issue any new ones!!
+ # return C_ENDALL; # will immediately end all connections
+ # and return from $pua->wait
+ }
+
+ # ATTENTION!! If you want to keep reading from your connection,
+ # you should have a final 'return undef' statement here. Even if
+ # you think that all data has arrived, it does not hurt to return
+ # undef here. The Parallel UserAgent will figure out by itself
+ # when to close the connection!
+
+ return undef; # just keep on connecting/reading/waiting
+ # until the server closes the connection.
+ }
+
+=head1 AUTHOR
+
+Marc Langheinrich, marclang at cpan.org
+
+=head1 SEE ALSO
+
+See L<LWP> for an overview on Web communication using Perl. See
+L<LWP::Parallel::UserAgent> and L<LWP::Parallel::RobotUA> for details
+on how to use this library.
+
+=head1 COPYRIGHT
+
+Copyright 1997-2004 Marc Langheinrich E<lt>marclang at cpan.org>
+
+This library is free software; you can redistribute it and/or modify
+it under the same terms as Perl itself.
+
+=cut
diff --git a/lib/LWP/Parallel/Protocol.pm b/lib/LWP/Parallel/Protocol.pm
new file mode 100755
index 0000000..593bb34
--- /dev/null
+++ b/lib/LWP/Parallel/Protocol.pm
@@ -0,0 +1,305 @@
+# -*- perl -*-
+# $Id: Protocol.pm,v 1.10 2004/02/10 15:19:19 langhein Exp $
+# derived from: Protocol.pm,v 1.39 2001/10/26 19:00:21 gisle Exp
+
+package LWP::Parallel::Protocol;
+
+=head1 NAME
+
+LWP::Parallel::Protocol - Base class for parallel LWP protocols
+
+=head1 SYNOPSIS
+
+ package LWP::Parallel::Protocol::foo;
+ require LWP::Parallel::Protocol;
+ @ISA=qw(LWP::Parallel::Protocol);
+
+=head1 DESCRIPTION
+
+This class is used a the base class for all protocol implementations
+supported by the LWP::Parallel library. It mirrors the behavior of the
+original LWP::Parallel library by subclassing from it and adding a few
+subroutines of its own.
+
+Please see the LWP::Protocol for more information about the usage of
+this module.
+
+In addition to the inherited methods from LWP::Protocol, The following
+methods and functions are provided:
+
+=head1 ADDITIONAL METHODS AND FUNCTIONS
+
+=over 4
+
+=cut
+
+#######################################################
+
+require LWP::Protocol;
+ at ISA = qw(LWP::Protocol);
+$VERSION = sprintf("%d.%02d", q$Revision: 1.10 $ =~ /(\d+)\.(\d+)/);
+
+
+use HTTP::Status ();
+use HTML::HeadParser; # thanks to Kirill
+use strict;
+use Carp ();
+
+my %ImplementedBy = (); # scheme => classname
+
+
+=item $prot = LWP::Parallel::Protocol->new();
+
+The LWP::Parallel::Protocol constructor is inherited by subclasses. As this is
+a virtual base class this method should B<not> be called directly.
+
+Note: This is inherited from LWP::Protocol
+
+=cut
+
+
+
+=item $prot = LWP::Parallel::Protocol::create($schema)
+
+Create an object of the class implementing the protocol to handle the
+given scheme. This is a function, not a method. It is more an object
+factory than a constructor. This is the function user agents should
+use to access protocols.
+
+=cut
+
+sub create
+{
+ my ($scheme, $ua) = @_;
+ my $impclass = LWP::Parallel::Protocol::implementor($scheme) or
+ Carp::croak("Protocol scheme '$scheme' is not supported");
+
+ # hand-off to scheme specific implementation sub-class
+ my $protocol = $impclass->new($scheme, $ua);
+
+ return $protocol;
+}
+
+
+=item $class = LWP::Parallel::Protocol::implementor($scheme, [$class])
+
+Get and/or set implementor class for a scheme. Returns '' if the
+specified scheme is not supported.
+
+=cut
+
+sub implementor
+{
+ my($scheme, $impclass) = @_;
+
+ if ($impclass) {
+ $ImplementedBy{$scheme} = $impclass;
+ }
+ my $ic = $ImplementedBy{$scheme};
+ return $ic if $ic;
+
+ return '' unless $scheme =~ /^([.+\-\w]+)$/; # check valid URL schemes
+ $scheme = $1; # untaint
+ $scheme =~ s/[.+\-]/_/g; # make it a legal module name
+
+ # scheme not yet known, look for a 'use'd implementation
+ $ic = "LWP::Parallel::Protocol::$scheme"; # default location
+ no strict 'refs';
+ # check we actually have one for the scheme:
+ unless (@{"${ic}::ISA"}) { # fixed in LWP 5.48
+ # try to autoload it
+ #LWP::Debug::debug("Try autoloading $ic");
+ eval "require $ic";
+ if ($@) {
+ if ($@ =~ /Can't locate/) { #' #emacs get confused by '
+ $ic = '';
+ } else { # this msg never gets to the surface - 1002, JB
+ die "$@\n";
+ }
+ }
+ }
+ $ImplementedBy{$scheme} = $ic if $ic;
+ $ic;
+}
+
+=item $prot->receive ($arg, $response, $content)
+
+Called to store a piece of content of a request, and process it
+appropriately into a scalar, file, or by calling a callback. If $arg
+is undefined, then the content is stored within the $response. If
+$arg is a simple scalar, then $arg is interpreted as a file name and
+the content is written to this file. If $arg is a reference to a
+routine, then content is passed to this routine.
+
+$content must be a reference to a scalar holding the content that
+should be processed.
+
+The return value from receive() is undef for errors, positive for
+non-zero content processed, 0 for forced EOFs, and potentially a
+negative command from a user-defined callback function.
+
+B<Note:> We will only use the file or callback argument if
+$response->is_success(). This avoids sendig content data for
+redirects and authentization responses to the file or the callback
+function.
+
+=cut
+
+sub receive {
+ my ($self, $arg, $response, $content, $entry) = @_;
+
+ LWP::Debug::trace("( [self]" .
+ ", ". (defined $arg ? $arg : '[undef]') .
+ ", ". (defined $response ?
+ (defined $response->code ?
+ $response->code : '???') . " " .
+ (defined $response->message ?
+ $response->message : 'undef')
+ : '[undef]') .
+ ", ". (defined $content ?
+ (ref($content) eq 'SCALAR'?
+ length($$content) . " bytes"
+ : '[ref('. ref($content) .')' )
+ : '[undef]') .
+ ", ". (defined $entry ? $entry : '[undef]') .
+ ")");
+
+
+ my($parse_head, $max_size, $parallel) =
+ @{$self}{qw(parse_head max_size parallel)};
+
+ my $parser;
+ if ($parse_head && $response->content_type eq 'text/html') {
+ require HTML::HeadParser; # LWP 5.60
+ $parser = HTML::HeadParser->new($response->{'_headers'});
+ }
+
+ my $content_size = $entry->content_size;
+
+ # Note: We don't need alarms here since we are not making any tcp
+ # connections. All the data we need is alread in \$content, so we
+ # just read out a string value -- nothing should slow us down here
+ # (other than processor speed or memory constraints :) ) PS: You
+ # can't just add 'alarm' somewhere here unless you fix the calls
+ # to ->receive in the subclasses such as 'ftp' or 'http' and wrap
+ # them in an 'eval' statement that will catch our alarm-exceptions
+ # we would throw here! But since we don't need alarms here, just
+ # forget what I just said - it's irrelevant.
+
+ if (!defined($arg) || !$response->is_success ) {
+ # scalar
+ if ($parser) {
+ $parser->parse($$content) or undef($parser);
+ }
+ LWP::Debug::debug("read " . length($$content) . " bytes");
+ $response->add_content($$content);
+ $content_size += length($$content);
+ $entry->content_size($content_size); # update persistant size counter
+ if (defined($max_size) && $content_size > $max_size) {
+ LWP::Debug::debug("Aborting because size limit of " .
+ "$max_size bytes exceeded");
+ $response->push_header("Client-Aborted", "max_size");
+ #my $tot = $response->header("Content-Length") || 0;
+ #$response->header("X-Content-Range", "bytes 0-$content_size/$tot");
+ return 0; # EOF (kind of)
+ }
+ }
+ elsif (!ref($arg)) {
+ # Mmmh. Could this take so long that we want to use alarm here?
+ my $file_open;
+ if (defined ($entry->content_size) and ($entry->content_size > 0)) {
+ $file_open = open(OUT, ">>$arg"); # we already have data: append
+ } else {
+ $file_open = open(OUT, ">$arg"); # no content received: open new
+ }
+ unless ( $file_open ) {
+ $response->code(&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message("Cannot write to '$arg': $!");
+ return; # undef means error
+ }
+ binmode(OUT);
+ local($\) = ""; # ensure standard $OUTPUT_RECORD_SEPARATOR
+ if ($parser) {
+ $parser->parse($$content) or undef($parser);
+ }
+ LWP::Debug::debug("[FILE] read " . length($$content) . " bytes");
+ print OUT $$content;
+ $content_size += length($$content);
+ $entry->content_size($content_size); # update persistant size counter
+ close(OUT);
+ if (defined($max_size) && $content_size > $max_size) {
+ LWP::Debug::debug("Aborting because size limit exceeded");
+ $response->push_header("Client-Aborted", "max_size");
+ #my $tot = $response->header("Content-Length") || 0;
+ #$response->header("X-Content-Range", "bytes 0-$content_size/$tot");
+ return 0;
+ }
+ }
+ elsif (ref($arg) eq 'CODE') {
+ # read into callback
+ if ($parser) {
+ $parser->parse($$content) or undef($parser);
+ }
+ LWP::Debug::debug("[CODE] read " . length($$content) . " bytes");
+ my $retval;
+ eval {
+ $retval = &$arg($$content, $response, $self, $entry);
+ };
+ if ($@) {
+ chomp($@);
+ $response->push_header('X-Died' => $@);
+ $response->push_header("Client-Aborted", "die");
+ } else {
+ # pass return value from callback through to implementor class
+ LWP::Debug::debug("return-code from Callback was '".
+ (defined $retval ? "$retval'" : "[undef]'"));
+ return $retval;
+ }
+ }
+ else {
+ $response->code(&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message("Unexpected collect argument '$arg'");
+ }
+ return length($$content); # otherwise return size of content processed
+}
+
+=item $prot->receive_once($arg, $response, $content, $entry)
+
+Can be called when the whole response content is available as
+$content. This will invoke receive() with a collector callback that
+returns a reference to $content the first time and an empty string the
+next.
+
+=cut
+
+sub receive_once {
+ my ($self, $arg, $response, $content, $entry) = @_;
+
+ # read once
+ my $retval = $self->receive($arg, $response, \$content, $entry);
+
+ # and immediately simulate EOF
+ my $no_content = '';
+ $retval = $self->receive($arg, $response, \$no_content, $entry)
+ unless $retval;
+
+ return (defined $retval? $retval : 0);
+}
+
+1;
+
+=head1 SEE ALSO
+
+Inspect the F<LWP/Parallel/Protocol/http.pm> file for examples of usage.
+
+=head1 COPYRIGHT
+
+Copyright 1997-2004 Marc Langheinrich E<lt>marclang at cpan.org>
+Parts copyright 1995-2004 Gisle Aas
+
+This library is free software; you can redistribute it and/or modify
+it under the same terms as Perl itself.
+
+=cut
+
+
diff --git a/lib/LWP/Parallel/Protocol/file.pm b/lib/LWP/Parallel/Protocol/file.pm
new file mode 100644
index 0000000..d0969f6
--- /dev/null
+++ b/lib/LWP/Parallel/Protocol/file.pm
@@ -0,0 +1,298 @@
+# Implementation of the file protocol for LWP::Parallel, based on
+# LWP::Protocol::file and LWP::Parallel::Protocol::ftp pattern.
+# contributed by Jeff Behr, October 2001
+# $Id: file.pm,v 1.2 2003/05/26 08:03:34 langhein Exp $
+
+package LWP::Parallel::Protocol::file;
+
+use HTTP::Status ();
+use HTTP::Response ();
+use LWP::MediaTypes ();
+use IO::File();
+use IO::Dir();
+
+use vars qw(@ISA);
+
+require LWP::Parallel::Protocol;
+require LWP::Protocol::file;
+ at ISA = qw(LWP::Parallel::Protocol LWP::Protocol::file);
+
+use strict;
+
+# this method just sees that the file or directory exists and can
+# be read by the user, etc., and then creates a handle for it from
+# IO::File or IO::Dir
+sub handle_connect {
+ my ($self, $request, $proxy, $timeout, $nonblock) = @_;
+
+ LWP::Debug::trace('(Entered Parallel::Protocol::file::handle_connect)');
+
+ # check proxy
+ if (defined $proxy) {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_BAD_REQUEST,
+ 'You cannot proxy through the filesystem');
+ return(undef, $res);
+ }
+
+ #check method
+ my $method = $request->method;
+ unless ($method eq 'GET' || $method eq 'HEAD' || $method eq 'DELETE') {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_METHOD_NOT_ALLOWED,
+ "Method $method not allowed for 'file:' URLs");
+ return(undef, $res);
+ }
+
+ # check url
+ my $url = $request->url;
+ my $scheme = $url->scheme;
+ if ($scheme ne 'file') {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_INTERNAL_SERVER_ERROR,
+ "LWP::file::handle_connect called for '$scheme'");
+ return(undef, $res);
+ }
+
+ ########
+ # URL OK
+ ########
+ # If we get here, URL is OK
+ my $path = $url->file;
+
+ # test file exists and is readable
+ unless (-e $path) {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_NOT_FOUND,
+ "File '$path' does not exist.");
+ return(undef, $res);
+ }
+
+ unless (-r _) {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_FORBIDDEN,
+ "User does not have read permission");
+ return(undef, $res);
+ }
+
+ if ($method eq 'DELETE' && !(-w _)) {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_FORBIDDEN,
+ "User does not have permission to delete $path");
+ return(undef, $res);
+ }
+
+ # file exists and is readable/writable ...
+ my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$filesize,
+ $atime,$mtime,$ctime,$blksize,$blocks) = stat(_);
+
+ # check if-modified-since
+ my $ims = $request->header('If-Modified-Since');
+ if (defined $ims) {
+ my $time = HTTP::Date::str2time($ims);
+ if (defined $time and $time >= $mtime) {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_NOT_MODIFIED,
+ "$method $path");
+ return(undef, $res);
+ }
+ }
+
+ # the return value is an object of IO::Handle, either
+ # IO::File or IO::Dir.
+ # Ooops. Turns out IO::Dir is not derived from IO::Handle and
+ # IO::Select calls in UserAgent->wait calls don't see a handle.
+ # for objects of IO::Dir even though they can be "connections".
+ # Return (undef, response) for directory calls, for now. Prob-
+ # ably have to one-time directory lists in the future, or skip
+ # doing dirs here in favor of list_urls in FileCopy.pm.
+ my $fh;
+ if (-d _) {
+ return (undef,
+ HTTP::Response->new(&HTTP::Status::RC_INTERNAL_SERVER_ERROR,
+ "Skipping directory handle for '$path'."));
+
+ #$fh = IO::Dir->new($path) or return (undef,
+ # HTTP::Response->new(&HTTP::Status::RC_INTERNAL_SERVER_ERROR,
+ # "Unable to create directory handle for '$path': $!"));
+ }
+ elsif (-f _) {
+ $fh = IO::File->new($path) or return (undef,
+ HTTP::Response->new(&HTTP::Status::RC_INTERNAL_SERVER_ERROR,
+ "Unable to create file handle for '$path': $!"));
+ }
+ else {
+ return (undef,
+ HTTP::Response->new(&HTTP::Status::RC_UNSUPPORTED_MEDIA_TYPE,
+ "'$path' is not a directory or file listing."));
+ }
+
+ # Response looks to be OK
+ my $response = HTTP::Response->new(&HTTP::Status::RC_OK, "OK");
+ $response->request($request);
+
+ # Add header(s)
+ $response->header('Last-Modified', HTTP::Date::time2str($mtime));
+
+ return ($fh, $response);
+}
+
+
+sub write_request {
+ my ($self, $req, $fh, $response, $arg, $timeout) = @_;
+
+ LWP::Debug::trace('(Entered Parallel::Protocol::file::write_request)');
+
+ # $fh should be an IO::File or IO::Dir
+ unless (ref($fh) eq 'IO::File' or ref($fh) eq 'IO::Dir') {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_UNSUPPORTED_MEDIA_TYPE,
+ "Socket is not IO::File or IO::Dir");
+ return(undef, $res);
+ }
+
+ # Delete the file, return the response.
+ if ($req->method eq 'DELETE') {
+ my $cnt = unlink $req->uri->file;
+ my $res;
+ if ($cnt) {
+ $res = HTTP::Response->new(&HTTP::Status::RC_OK,
+ "Deleted $req->uri->file");
+ }
+ else {
+ $res = HTTP::Response->new(&HTTP::Status::RC_METHOD_NOT_ALLOWED,
+ "Deletion failed on $req->uri->file");
+ }
+ return(undef, $res);
+ }
+
+ # return input $fh/$socket, response
+ return($fh, $response);
+}
+
+
+sub read_chunk {
+ my ($self, $response, $fh, $request, $arg, $size, $timeout, $entry) = @_;
+
+ LWP::Debug::trace('(Entered Parallel::Protocol::file::read_chunk)');
+
+ $size = 32768 unless defined $size and $size > 0;
+
+ my $path = $request->uri->path;
+ my $method = $request->method;
+ #print "Performing $method on $path\n";
+
+ # this is redundant from &handle_connect - see if it can be streamlined
+ my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$filesize,
+ $atime,$mtime,$ctime,$blksize,$blocks) = stat($path);
+
+ # Collect the data from the dir or file. If it's a dir, we get it in
+ # one shot and send it back to collect_once. Otherwise, we try to stay
+ # with the Parallel fashion and do things through wait(). Doing it this
+ # way will, I think, minimize the amount of memory that gets sucked up.
+ my $buf = "";
+ if (ref($fh) eq 'IO::File') {
+ # LWP::Proto::file does nothing with files under HEAD -
+ # sets header(s) from the values returned by stat, etc.
+ $response->header('Content-Length', $filesize);
+ my $type = LWP::MediaTypes::guess_media_type($path, $response);
+
+ my $bytes;
+ ($buf, $bytes) = $self->_read_file($fh, $response, $size);
+
+ # receive() method bases its action on the $arg value
+ my $retval = $self->receive($arg, $response, \$buf, $entry);
+
+ # $retval from Parallel::Proto->receive()
+ # this should be bytes read or a constant error value
+ # Could do more with the return value here
+ return (defined $retval ? $retval : $bytes);
+ }
+ elsif (ref($fh) eq 'IO::Dir') {
+ $buf = $self->_read_dir($fh, $response);
+ if ($ENV{DIR_AS_HTML}) {
+ ($buf, $response) = $self->_write_as_html($buf, $response);
+ $response->header('Content-Type', 'text/html');
+ } else {
+ $response->header('Content-Type', 'text/plain');
+ }
+ $response->header('Content-Length', length $buf);
+ $buf = "" if $method eq "HEAD";
+ $self->collect_once($arg, $response, $buf);
+ return 0;
+ }
+ else {
+ my $res = HTTP::Response->new(&HTTP::Status::RC_UNSUPPORTED_MEDIA_TYPE,
+ "Socket is not IO::File or IO::Dir");
+ # Not too sure about this return value
+ return 0;
+ }
+}
+
+sub close_connection {
+ my ($self, $response, $fh, $request, $socket) = @_;
+
+ LWP::Debug::trace('(Entered Parallel::Protocol::file::close_connect)');
+
+ $fh->close; # Dir or File
+ return;
+}
+
+sub _read_file {
+ my ($self, $fh, $response, $size) = @_;
+
+ my $content;
+ #$fh->binmode;
+ my $bytes_read = $fh->sysread($content, $size);
+
+ $content, $bytes_read;
+}
+
+
+# when reading directories, just get it all in one shot
+sub _read_dir {
+ my $self = shift;
+ my $fh = shift;
+ my $res = shift;
+
+ my @files = sort $fh->read;
+
+ # Make full directory listing
+ my $path = $res->request->uri->path;
+ for (@files) {
+ if($^O eq "MacOS") {
+ $_ .= "/" if -d "$path:$_";
+ } else {
+ $_ .= "/" if -d "$path/$_";
+ }
+ }
+ my $files = join "", @files;
+
+ return $files;
+}
+
+
+sub _write_as_html {
+ my ($self, $filelist, $response) = @_;
+
+ my $path = $response->request->uri->path;
+
+ # Re-Make directory listing
+ my @files = split '\n', $filelist;
+ for (@files) {
+ my $furl = URI::Escape::uri_escape($_); # file's url
+ my $desc = HTML::Entities::encode($_); # file's link
+ $_ = qq{<LI><A HREF="$furl">$desc</A>};
+ }
+
+ my $url = $response->request->uri;
+ # Ensure that the base URL is "/" terminated
+ my $base = $url->clone;
+ unless ($base->epath =~ m|/$|) {
+ $base->epath($base->epath . "/");
+ }
+ my $files = join("\n",
+ "<HTML>\n<HEAD>",
+ "<TITLE>Directory $path</TITLE>",
+ "<BASE HREF=\"$base\">",
+ "</HEAD>\n<BODY>",
+ "<H1>Directory listing of $path</H1>",
+ "<UL>", @files, "</UL>",
+ "</BODY>\n</HTML>\n");
+
+ return ($files, $response);
+}
+
+1;
diff --git a/lib/LWP/Parallel/Protocol/ftp.pm b/lib/LWP/Parallel/Protocol/ftp.pm
new file mode 100755
index 0000000..a823979
--- /dev/null
+++ b/lib/LWP/Parallel/Protocol/ftp.pm
@@ -0,0 +1,657 @@
+# -*- perl -*-
+# $Id: ftp.pm,v 1.11 2003/02/19 14:58:37 langhein Exp $
+# derived from: ftp.pm,v 1.31 2001/10/26 20:13:20 gisle Exp
+
+# Implementation of the ftp protocol (RFC 959). We let the Net::FTP
+# package do all the dirty work.
+
+package LWP::Parallel::Protocol::ftp;
+
+use Carp ();
+
+use HTTP::Status ();
+use HTTP::Negotiate ();
+use HTTP::Response ();
+use LWP::MediaTypes ();
+use File::Listing ();
+
+require LWP::Parallel::Protocol;
+require LWP::Protocol::ftp;
+ at ISA = qw(LWP::Parallel::Protocol LWP::Protocol::ftp);
+
+use strict;
+
+eval {
+ package LWP::Parallel::Protocol::MyFTP;
+
+ require Net::FTP;
+ Net::FTP->require_version(2.00);
+
+ use vars qw(@ISA);
+ @ISA=qw(Net::FTP);
+
+ sub new {
+ my $class = shift;
+ LWP::Debug::trace('()');
+
+ my $self = $class->SUPER::new(@_) || return undef;
+
+ my $mess = $self->message; # welcome message
+ LWP::Debug::debug($mess);
+ $mess =~ s|\n.*||s; # only first line left
+ $mess =~ s|\s*ready\.?$||;
+ # Make the version number more HTTP like
+ $mess =~ s|\s*\(Version\s*|/| and $mess =~ s|\)$||;
+ ${*$self}{myftp_server} = $mess;
+ #$response->header("Server", $mess);
+
+ $self;
+ }
+
+ sub http_server {
+ my $self = shift;
+ ${*$self}{myftp_server};
+ }
+
+ sub home {
+ my $self = shift;
+ my $old = ${*$self}{myftp_home};
+ if (@_) {
+ ${*$self}{myftp_home} = shift;
+ }
+ $old;
+ }
+
+ sub go_home {
+ LWP::Debug::trace('');
+ my $self = shift;
+ $self->cwd(${*$self}{myftp_home});
+ }
+
+ sub request_count {
+ my $self = shift;
+ ++${*$self}{myftp_reqcount};
+ }
+
+ sub ping {
+ LWP::Debug::trace('');
+ my $self = shift;
+ return $self->go_home;
+ }
+
+};
+my $init_failed = $@;
+
+=item ($socket, $second_arg) = $prot->handle_connect ($req, $proxy, $timeout);
+
+This method connects with the server on the machine and port specified
+in the $req object. If a $proxy is given, it will return an error,
+since the FTP protocol does not allow proxying. (See below on how such
+an error is propagated to the caller).
+
+If successful, the first argument will contain the IO::Socket object
+that connects to the specified site. The second argument is empty (for
+ftp, that is. See LWP::Protocol::http for different usage).
+
+If the connection fails, $socket is set to 'undef', and the second
+argument contains a HTTP::Response object holding a textual
+representation of the error. (You can use its 'code' and 'message'
+methods to find out what went wrong)
+
+=cut
+
+sub handle_connect {
+ my ($self, $request, $proxy, $timeout) = @_;
+
+ # mostly directly copied from the original Protocol::ftp, changes
+ # are marked with "# ML" comment (mostly return values)
+
+ # check proxy
+ if (defined $proxy)
+ {
+ return (undef, new HTTP::Response &HTTP::Status::RC_BAD_REQUEST,
+ 'You can not proxy through the ftp'); # ML
+ }
+
+ my $url = $request->url;
+ if ($url->scheme ne 'ftp') {
+ my $scheme = $url->scheme;
+ return (undef, new HTTP::Response &HTTP::Status::RC_INTERNAL_SERVER_ERROR,
+ "LWP::Protocol::ftp::request called for '$scheme'"); # ML
+ }
+
+ # check method
+ my $method = $request->method;
+
+ unless ($method eq 'GET' || $method eq 'HEAD' || $method eq 'PUT') {
+ return (undef, new HTTP::Response &HTTP::Status::RC_BAD_REQUEST,
+ 'Library does not allow method ' .
+ "$method for 'ftp:' URLs"); # ML
+ }
+
+ if ($init_failed) {
+ return (undef, new HTTP::Response &HTTP::Status::RC_INTERNAL_SERVER_ERROR,
+ $init_failed); # ML
+ }
+
+ my $host = $url->host;
+ my $port = $url->port;
+ my $user = $url->user;
+ # taken out some additional variable declarations here, that are now
+ # only needed in 'write_request' method.
+
+ #################
+ # new in LWP 5.60
+ my $account = $request->header('Account'); # ML
+
+ my $key;
+ my $conn_cache = $self->{ua}{conn_cache};
+ if ($conn_cache) {
+ $key = "$host:$port:$user";
+ $key .= ":$account" if defined($account);
+ if (my $ftp = $conn_cache->withdraw("ftp", $key)) {
+ if ($ftp->ping) {
+ LWP::Debug::debug('Reusing old connection');
+ # save it again
+ $conn_cache->deposit("ftp", $key, $ftp);
+ # added $response object # ML
+ my $response =
+ HTTP::Response->new(&HTTP::Status::RC_OK, "Document follows");
+ return ($ftp, $response);
+ }
+ }
+ }
+
+ # try to make a connection
+ my $ftp = LWP::Parallel::Protocol::MyFTP->new($host,
+ Port => $port,
+ Timeout => $timeout,
+ );
+ # XXX Should be some what to pass on 'Passive' (header??)
+ #################
+
+ my $response;
+ unless ($ftp) {
+ $@ =~ s/^Net::FTP: //; # new in LWP 5.60
+ $response = HTTP::Response->new(&HTTP::Status::RC_INTERNAL_SERVER_ERROR,$@);
+ } else {
+ # Create an initial response object
+ $response = HTTP::Response->new(&HTTP::Status::RC_OK, "Document follows");
+ #################
+ # new in LWP 5.60
+ $response->header(Server => $ftp->http_server);
+ $response->header('Client-Request-Num' => $ftp->request_count);
+ #################
+ $response->request($request);
+ }
+
+ return ($ftp, $response); # ML
+}
+
+sub write_request {
+ my ($self, $request, $ftp, $response, $arg, $timeout) = @_;
+
+ # Some of the following variable declarations, directly copied from
+ # the original Protocol::ftp module, appear both in 'handle_connect'
+ # _and_ 'write_request' method. Although it introduces additional
+ # overhead, we can't pass additional variables between those two
+ # methods, but we need some of the values in both routines. We
+ # allow the account to be specified in the "Account" header
+ my $account = $request->header('Account');
+
+ my $url = $request->url;
+ my $host = $url->host;
+ my $port = $url->port;
+ my $user = $url->user;
+ my $password = $url->password;
+
+ # If a basic autorization header is present than we prefer these over
+ # the username/password specified in the URL.
+ {
+ my($u,$p) = $request->authorization_basic;
+ if (defined $u) {
+ $user = $u;
+ $password = $p;
+ }
+ }
+
+ my $method = $request->method;
+
+ # from here on mostly directly clipped from the original
+ # Protocol::ftp. Changes are marked with "# ML" comment
+
+ # from here on it seems FTP will handle timeouts, right? # ML
+ $ftp->timeout($timeout) if $timeout;
+
+ LWP::Debug::debug("Logging in as $user (password $password)...");
+ unless ($ftp->login($user, $password, $account)) {
+ # Unauthorized. Let's fake a RC_UNAUTHORIZED response
+ my $mess = scalar($ftp->message);
+ LWP::Debug::debug($mess);
+ $mess =~ s/\n$//;
+ my $res = HTTP::Response->new(&HTTP::Status::RC_UNAUTHORIZED, $mess);
+ $res->header("Server", $ftp->http_server);
+ $res->header("WWW-Authenticate", qq(Basic Realm="FTP login"));
+ return (undef, $res); # ML
+ }
+ LWP::Debug::debug($ftp->message);
+
+ #################
+ # new in LWP 5.60
+ my $home = $ftp->pwd;
+ LWP::Debug::debug("home: '$home'");
+ $ftp->home($home);
+
+ # ML
+ my $key;
+ $key = "$host:$port:$user";
+ $key .= ":$account" if defined($account);
+ #
+
+ my $conn_cache = $self->{ua}{conn_cache};
+ $conn_cache->deposit("ftp", $key, $ftp) if $conn_cache;
+ #################
+
+ # Get & fix the path
+ my @path = $url->path_segments;
+ # removed in LWP 5.48
+ #shift(@path); # There will always be an empty first component
+ #pop(@path) while @path && $path[-1] eq ''; # remove empty tailing comps
+
+ my $remote_file = pop(@path);
+ $remote_file = '' unless defined $remote_file;
+
+ my $type;
+ if (ref $remote_file) {
+ my @params;
+ ($remote_file, @params) = @$remote_file;
+ for (@params) {
+ $type = $_ if s/^type=//;
+ }
+ }
+
+ if ($type && $type eq 'a') {
+ $ftp->ascii;
+ } else {
+ $ftp->binary;
+ }
+
+ for (@path) {
+ LWP::Debug::debug("CWD $_");
+ unless ($ftp->cwd($_)) {
+ return (undef, new HTTP::Response &HTTP::Status::RC_NOT_FOUND,
+ "Can't chdir to $_");
+ }
+ }
+
+ if ($method eq 'GET' || $method eq 'HEAD') {
+ # new in ftp.pm,v 1.23 (fixed in ftp.pm,v 1.24)
+ LWP::Debug::debug("MDTM");
+ if (my $mod_time = $ftp->mdtm($remote_file)) {
+ $response->last_modified($mod_time);
+ if (my $ims = $request->if_modified_since) {
+ if ($mod_time <= $ims) {
+ $response->code(&HTTP::Status::RC_NOT_MODIFIED);
+ $response->message("Not modified");
+ return (undef, $response);
+ }
+ }
+ }
+ # end_of_new_stuff
+
+ #################
+ # new in LWP 5.60
+
+ # We'll use this later to abort the transfer if necessary.
+ # if $max_size is defined, we need to abort early. Otherwise, it's
+ # a normal transfer
+ my $max_size = undef;
+
+ # Set resume location, if the client requested it
+ if ($request->header('Range') && $ftp->supported('REST'))
+ {
+ my $range_info = $request->header('Range');
+
+ # Change bytes=2772992-6781209 to just 2772992
+ my ($start_byte,$end_byte) = $range_info =~ /.*=\s*(\d+)-(\d+)/;
+
+ if (!defined $start_byte || !defined $end_byte ||
+ ($start_byte < 0) || ($start_byte > $end_byte) || ($end_byte < 0))
+ {
+ return (undef, HTTP::Response->new(&HTTP::Status::RC_BAD_REQUEST,
+ 'Incorrect syntax for Range request'));
+ }
+
+ $max_size = $end_byte-$start_byte;
+
+ $ftp->restart($start_byte);
+ } elsif ($request->header('Range') && !$ftp->supported('REST')) {
+ return (undef,HTTP::Response->new(&HTTP::Status::RC_NOT_IMPLEMENTED,
+ "Server does not support resume."));
+ }
+ ################
+
+
+ my $data; # the data handle
+ LWP::Debug::debug("retrieve file?");
+ if (length($remote_file) and $data = $ftp->retr($remote_file)) {
+ # remove reading from socket into 'read_chunk' method.
+ # just return our new $listen_socket here.
+ my($type, @enc) = LWP::MediaTypes::guess_media_type($remote_file);
+ $response->header('Content-Type', $type) if $type;
+ for (@enc) {
+ $response->push_header('Content-Encoding', $_);
+ }
+ my $mess = $ftp->message;
+ LWP::Debug::debug($mess);
+ if ($mess =~ /\((\d+)\s+bytes\)/) {
+ $response->header('Content-Length', "$1");
+ }
+ return ($data, $response); # ML
+ } elsif (!length($remote_file) || $ftp->code == 550) {
+ # no file, the remote file is actually a directory, so cdw into directory
+ if (length($remote_file) && !$ftp->cwd($remote_file)) {
+ LWP::Debug::debug("chdir before listing failed");
+ return (undef, new HTTP::Response &HTTP::Status::RC_NOT_FOUND,
+ "File '$remote_file' not found"); # ML
+ }
+
+ # It should now be safe to try to list the directory
+ LWP::Debug::debug("dir");
+ my @lsl = $ftp->dir;
+
+ # Try to figure out if the user want us to convert the
+ # directory listing to HTML.
+ my @variants =
+ (
+ ['html', 0.60, 'text/html' ],
+ ['dir', 1.00, 'text/ftp-dir-listing' ]
+ );
+ #$HTTP::Negotiate::DEBUG=1;
+ my $prefer = HTTP::Negotiate::choose(\@variants, $request);
+
+ my $content = '';
+
+ if (!defined($prefer)) {
+ return (undef, new HTTP::Response &HTTP::Status::RC_NOT_ACCEPTABLE,
+ "Neither HTML nor directory listing wanted"); # ML
+ } elsif ($prefer eq 'html') {
+ $response->header('Content-Type' => 'text/html');
+ $content = "<HEAD><TITLE>File Listing</TITLE>\n";
+ my $base = $request->url->clone;
+ my $path = $base->path;
+ $base->path("$path/") unless $path =~ m|/$|;
+ $content .= qq(<BASE HREF="$base">\n</HEAD>\n);
+ $content .= "<BODY>\n<UL>\n";
+ for (File::Listing::parse_dir(\@lsl, 'GMT')) {
+ my($name, $type, $size, $mtime, $mode) = @$_;
+ $content .= qq( <LI> <a href="$name">$name</a>);
+ $content .= " $size bytes" if $type eq 'f';
+ $content .= "\n";
+ }
+ $content .= "</UL></body>\n";
+ } else {
+ $response->header('Content-Type', 'text/ftp-dir-listing');
+ $content = join("\n", @lsl, '');
+ }
+
+ $response->header('Content-Length', length($content));
+
+ if ($method ne 'HEAD') {
+ # $self->receive_once($arg, $response, $content);
+ # calling receive_once is now done in UserAgent.pm #ML 7/99
+ # here we just add the content to the response:
+ $response->content($content);
+ }
+ } else {
+ my $res = new HTTP::Response &HTTP::Status::RC_BAD_REQUEST,
+ "FTP return code " . $ftp->code;
+ $res->content_type("text/plain");
+ $res->content($ftp->message);
+ return (undef, $res); # ML
+ }
+ } elsif ($method eq 'PUT') {
+ # method must be PUT
+ unless (length($remote_file)) {
+ return (undef, new HTTP::Response &HTTP::Status::RC_BAD_REQUEST,
+ "Must have a file name to PUT to"); # ML
+ }
+ my $data;
+ if ($data = $ftp->stor($remote_file)) {
+ LWP::Debug::debug($ftp->message);
+ LWP::Debug::debug("$data");
+ my $content = $request->content;
+ my $bytes = 0;
+ if (defined $content) {
+ if (ref($content) eq 'SCALAR') {
+ $bytes = $data->write($$content, length($$content));
+ } elsif (ref($content) eq 'CODE') {
+ my($buf, $n);
+ while (length($buf = &$content)) {
+ $n = $data->write($buf, length($buf));
+ last unless $n;
+ $bytes += $n;
+ }
+ } elsif (!ref($content)) {
+ if (defined $content && length($content)) {
+ $bytes = $data->write($content, length($content));
+ }
+ } else {
+ die "Bad content";
+ }
+ }
+ $data->close;
+ LWP::Debug::debug($ftp->message);
+
+ $response->code(&HTTP::Status::RC_CREATED);
+ $response->header('Content-Type', 'text/plain');
+ $response->content("$bytes bytes stored as $remote_file on $host\n")
+ } else {
+ my $res = new HTTP::Response &HTTP::Status::RC_BAD_REQUEST,
+ "FTP return code " . $ftp->code;
+ $res->content_type("text/plain");
+ $res->content($ftp->message);
+ return (undef, $res); # ML
+ }
+ } else {
+ return (undef, new HTTP::Response &HTTP::Status::RC_BAD_REQUEST,
+ "Illegal method $method"); # ML
+ }
+ return (undef, $response);
+}
+
+sub read_chunk {
+ my ($self, $response, $data, $request, $arg, $size, $timeout, $entry) = @_;
+
+ my $method = $request->method;
+ if ($method ne 'HEAD') {
+ LWP::Debug::debug('reading response');
+
+ my $buf = "";
+ # read one chunk at a time from $socket
+ my $bytes_read;
+ # decide whether to use 'read' or 'sysread'
+ $bytes_read = $data->sysread( $buf, $size ); # IO::Socket
+
+ ## XXX find a way here to check maxsize (line 298 in LWP::Protocol::ftp)
+ ## problem: get current size of response from entry object.
+ ## trim buf-content if necessary
+ ## return undef at the end when we're done, no?
+
+ # parse data from server
+ my $retval = $self->receive($arg, $response, \$buf, $entry);
+ # A return value lower than zero means a command from our
+ # callback function. Make sure it reaches ParallelUA:
+ # return (defined($retval) and (0 > $retval) ?
+ # $retval : $bytes_read);
+ return (defined $retval? $retval : $bytes_read);
+ }
+}
+
+sub close_connection {
+ my ($self, $response, $data, $request, $ftp) = @_;
+
+ my $method = $request->method;
+ if ($method ne 'HEAD') {
+ unless ($data->close) {
+ # Something did not work too well
+ $response->code(&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message("FTP close response: " . $ftp->code .
+ " " . $ftp->message);
+ }
+ }
+}
+
+sub request
+{
+ my($self, $request, $proxy, $arg, $size, $timeout) = @_;
+
+ $size = 4096 unless $size;
+
+ LWP::Debug::trace('()');
+
+ # handle connect already gives us our response object
+ # porting remark: ParallelUA expects this function to return
+ # ($socket, $fullpath). Luckily, the Net::FTP is a IO::Socket::INET
+ # object, so ParallelUA won't notice the difference between the
+ # $socket object returned by http.pm's "handle_connect" method, and
+ # the $ftp object returned by ftp.pm's "handle_connect" method :)
+ # As for the $fullpath parameter -- ParallelUA doesn't do anything
+ # with this value other than passing it as a second argument to
+ # the "write_request" method (well, and storing it in its entry list,
+ # in the meantime. But so who cares -- perl certainly doesn't -- if
+ # we store a string or a pointer to an object in there!).
+ my ($ftp, $response) = $self->handle_connect ($request, $proxy, $timeout);
+
+ # if its status is not "OK", then something went wrong during our
+ # call to handle_connect, and we should stop here and return the
+ # response object containing the reason for this error:
+ return $response unless $response->is_success;
+
+ # issue request (in case of error creates Error-Response)
+ my ($listen_socket, $error_response) =
+ $self->write_request ($request, $ftp, $response, $arg, $timeout);
+
+ unless ($error_response) {
+ # now we can start reading from our $listen_socket
+ while (1) {
+ last unless $self->read_chunk ($response, $listen_socket,
+ $request, $arg, $size, $timeout, $ftp);
+ }
+ $self->close_connection ($response, $listen_socket, $request, $ftp);
+ $listen_socket = undef;
+ } else {
+ $response = $error_response;
+ }
+
+ $ftp = undef; # close it (ditto)
+ $response;
+}
+
+1;
+
+__END__
+
+# This is what RFC 1738 has to say about FTP access:
+# --------------------------------------------------
+#
+# 3.2. FTP
+#
+# The FTP URL scheme is used to designate files and directories on
+# Internet hosts accessible using the FTP protocol (RFC959).
+#
+# A FTP URL follow the syntax described in Section 3.1. If :<port> is
+# omitted, the port defaults to 21.
+#
+# 3.2.1. FTP Name and Password
+#
+# A user name and password may be supplied; they are used in the ftp
+# "USER" and "PASS" commands after first making the connection to the
+# FTP server. If no user name or password is supplied and one is
+# requested by the FTP server, the conventions for "anonymous" FTP are
+# to be used, as follows:
+#
+# The user name "anonymous" is supplied.
+#
+# The password is supplied as the Internet e-mail address
+# of the end user accessing the resource.
+#
+# If the URL supplies a user name but no password, and the remote
+# server requests a password, the program interpreting the FTP URL
+# should request one from the user.
+#
+# 3.2.2. FTP url-path
+#
+# The url-path of a FTP URL has the following syntax:
+#
+# <cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>
+#
+# Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings
+# and <typecode> is one of the characters "a", "i", or "d". The part
+# ";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be
+# empty. The whole url-path may be omitted, including the "/"
+# delimiting it from the prefix containing user, password, host, and
+# port.
+#
+# The url-path is interpreted as a series of FTP commands as follows:
+#
+# Each of the <cwd> elements is to be supplied, sequentially, as the
+# argument to a CWD (change working directory) command.
+#
+# If the typecode is "d", perform a NLST (name list) command with
+# <name> as the argument, and interpret the results as a file
+# directory listing.
+#
+# Otherwise, perform a TYPE command with <typecode> as the argument,
+# and then access the file whose name is <name> (for example, using
+# the RETR command.)
+#
+# Within a name or CWD component, the characters "/" and ";" are
+# reserved and must be encoded. The components are decoded prior to
+# their use in the FTP protocol. In particular, if the appropriate FTP
+# sequence to access a particular file requires supplying a string
+# containing a "/" as an argument to a CWD or RETR command, it is
+# necessary to encode each "/".
+#
+# For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
+# interpreted by FTP-ing to "host.dom", logging in as "myname"
+# (prompting for a password if it is asked for), and then executing
+# "CWD /etc" and then "RETR motd". This has a different meaning from
+# <URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
+# "RETR motd"; the initial "CWD" might be executed relative to the
+# default directory for "myname". On the other hand,
+# <URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
+# argument, then "CWD etc", and then "RETR motd".
+#
+# FTP URLs may also be used for other operations; for example, it is
+# possible to update a file on a remote file server, or infer
+# information about it from the directory listings. The mechanism for
+# doing so is not spelled out here.
+#
+# 3.2.3. FTP Typecode is Optional
+#
+# The entire ;type=<typecode> part of a FTP URL is optional. If it is
+# omitted, the client program interpreting the URL must guess the
+# appropriate mode to use. In general, the data content type of a file
+# can only be guessed from the name, e.g., from the suffix of the name;
+# the appropriate type code to be used for transfer of the file can
+# then be deduced from the data content of the file.
+#
+# 3.2.4 Hierarchy
+#
+# For some file systems, the "/" used to denote the hierarchical
+# structure of the URL corresponds to the delimiter used to construct a
+# file name hierarchy, and thus, the filename will look similar to the
+# URL path. This does NOT mean that the URL is a Unix filename.
+#
+# 3.2.5. Optimization
+#
+# Clients accessing resources via FTP may employ additional heuristics
+# to optimize the interaction. For some FTP servers, for example, it
+# may be reasonable to keep the control connection open while accessing
+# multiple URLs from the same server. However, there is no common
+# hierarchical model to the FTP protocol, so if a directory change
+# command has been given, it is impossible in general to deduce what
+# sequence should be given to navigate to another directory for a
+# second retrieval, if the paths are different. The only reliable
+# algorithm is to disconnect and reestablish the control connection.
diff --git a/lib/LWP/Parallel/Protocol/http.pm b/lib/LWP/Parallel/Protocol/http.pm
new file mode 100755
index 0000000..27323bd
--- /dev/null
+++ b/lib/LWP/Parallel/Protocol/http.pm
@@ -0,0 +1,489 @@
+# -*- perl -*-
+# $Id: http.pm,v 1.13 2003/03/11 16:49:35 langhein Exp $
+# derived from: http10.pm,v 1.1 2001/10/26 17:27:19 gisle Exp $
+
+package LWP::Parallel::Protocol::http;
+
+use strict;
+
+require LWP::Debug;
+require HTTP::Response;
+require HTTP::Status;
+require Net::HTTP;
+require IO::Socket;
+require IO::Select;
+use Carp ();
+
+use vars qw(@ISA @EXTRA_SOCK_OPTS);
+
+require LWP::Parallel::Protocol;
+require LWP::Protocol::http; # until i figure out gisle's http1.1 stuff!
+ at ISA = qw(LWP::Parallel::Protocol LWP::Protocol::http);
+
+my $CRLF = "\015\012"; # how lines should be terminated;
+ # "\r\n" is not correct on all systems, for
+ # instance MacPerl defines it to "\012\015"
+
+# The following 4 methods are more or less a simple breakdown of the
+# original $http->request method:
+=item ($socket, $fullpath) = $prot->handle_connect ($req, $proxy, $timeout);
+
+This method connects with the server on the machine and port specified
+in the $req object. If a $proxy is given, it will translate the
+request into an appropriate proxy-request and return the new URL in
+the $fullpath argument.
+
+$socket is either an IO::Socket object (in parallel mode), or a
+LWP::Socket object (when used via Std. non-parallel modules, such as
+LWP::UserAgent)
+
+=cut
+
+sub handle_connect {
+ my ($self, $request, $proxy, $timeout, $nonblock) = @_;
+
+ # check method
+ my $method = $request->method;
+ unless ($method =~ /^[A-Za-z0-9_!\#\$%&\'*+\-.^\`|~]+$/) { # HTTP token
+ return (undef, new HTTP::Response &HTTP::Status::RC_BAD_REQUEST,
+ 'Library does not allow method ' .
+ "$method for 'http:' URLs");
+ }
+
+ my $url = $request->url;
+ my($host, $port, $fullpath) = $self->get_address ($proxy, $url, $method);
+
+ # connect to remote site
+ my $socket = $self->_connect ($host, $port, $timeout, $nonblock);
+
+# LWP::Debug::debug("Socket is $socket");
+
+# get LINGER get it!
+# my $data = $socket->sockopt(13); #define SO_LINGER = 13
+# my @a_data = unpack ("ii",$data);
+# $a_data[0] = 1; $a_data[1] = 0;
+# $data = pack ("ii", at a_data);
+#
+# $socket->sockopt(13, $data); #define SO_LINGER = 13
+# my $newdata = $socket->sockopt(13); #define SO_LINGER = 13
+# @a_data = unpack ("ii",$newdata);
+#
+# print "Socket $socket: SO_LINGER (", $a_data[0],", ",$a_data[1],")\n";
+# got Linger got it!
+
+
+ ($socket, $fullpath);
+}
+
+sub get_address {
+ my ($self, $proxy, $url,$method) = @_;
+ my($host, $port, $fullpath);
+
+ # Check if we're proxy'ing
+ if (defined $proxy) {
+ # $proxy is an URL to an HTTP server which will proxy this request
+ $host = $proxy->host;
+ $port = $proxy->port;
+ $fullpath = $method && ($method eq "CONNECT") ?
+ ($url->host . ":" . $url->port) :
+ $url->as_string;
+ }
+ else {
+ $host = $url->host;
+ $port = $url->port;
+ $fullpath = $url->path_query;
+ $fullpath = "/" unless length $fullpath;
+ }
+ ($host, $port, $fullpath);
+}
+
+sub _connect { # renamed to make clear that this is private sub
+ my ($self, $host, $port, $timeout, $nonblock) = @_;
+ my ($socket);
+ unless ($nonblock) {
+ # perform good ol' blocking behavior
+ #
+ # this method inherited from LWP::Protocol::http
+ $socket = $self->_new_socket($host, $port, $timeout);
+ # currently empty function in LWP::Protocol::http
+ # $self->_check_sock($request, $socket);
+ } else {
+ # new non-blocking behavior
+ #
+ # thanks to http://www.en-directo.net/mail/kirill.html
+ use Socket();
+ use POSIX();
+ $socket =
+ IO::Socket::INET->new(Proto => 'tcp', # Timeout => $timeout,
+ $self->_extra_sock_opts ($host, $port));
+
+ die "Can't create socket for $host:$port ($@)" unless $socket;
+ unless ( defined $socket->blocking (0) )
+ {
+ # IO::Handle::blocking doesn't (yet?) work on Win32 (ActiveState port)
+ # The following happens to work though.
+ # See also: perlport manpage, POE::Kernel, POE::Wheel::SocketFactory,
+ # Winsock2.h
+ if ( $^O eq 'MSWin32' )
+ {
+ my $set_it = "1";
+ my $ioctl_val = 0x80000000 | (4 << 16) | (ord('f') << 8) | 126;
+ $ioctl_val = ioctl ($socket, $ioctl_val, $set_it);
+# warn 'Win32 ioctl returned ' . (defined $ioctl_val ? $ioctl_val : '[undef]') . "\n";
+# warn "Win32 ioctlsocket failed\n" unless $ioctl_val;
+ }
+ }
+ my $rhost = Socket::inet_aton ($host);
+ die "Bad hostname $host" unless defined $rhost;
+ unless ( $socket->connect ($port, $rhost) )
+ {
+ my $err = $! + 0;
+ # More trouble with ActiveState: EINPROGRESS and EWOULDBLOCK
+ # are missing from POSIX.pm. See Microsoft's Winsock2.h
+ my ($einprogress, $ewouldblock) = $^O eq 'MSWin32' ?
+ (10036, 10035) : (POSIX::EINPROGRESS(), POSIX::EWOULDBLOCK());
+ die "Can't connect to $host:$port ($@)"
+ if $err and $err != $einprogress and $err != $ewouldblock;
+ }
+ }
+ LWP::Debug::debug("Socket is $socket");
+ $socket;
+}
+
+sub write_request {
+ my ($self, $request, $socket, $fullpath, $arg, $timeout, $proxy) = @_;
+
+ my $method = $request->method;
+ my $url = $request->url;
+
+ LWP::Debug::trace ("write_request (".
+ (defined $request ? $request : '[undef]').
+ ", ". (defined $socket ? $socket : '[undef]').
+ ", ". (defined $fullpath ? $fullpath : '[undef]').
+ ", ". (defined $arg ? $arg : '[undef]').
+ ", ". (defined $timeout ? $timeout : '[undef]').
+ ", ". (defined $proxy ? $proxy : '[undef]'). ")");
+
+ my $sel = IO::Select->new($socket) if $timeout;
+
+ my $request_line = "$method $fullpath HTTP/1.0$CRLF";
+
+ my $h = $request->headers->clone;
+ my $cont_ref = $request->content_ref;
+ $cont_ref = $$cont_ref if ref($$cont_ref);
+ my $ctype = ref($cont_ref);
+
+ # If we're sending content we *have* to specify a content length
+ # otherwise the server won't know a messagebody is coming.
+ if ($ctype eq 'CODE') {
+ die 'No Content-Length header for request with dynamic content'
+ unless defined($h->header('Content-Length')) ||
+ $h->content_type =~ /^multipart\//;
+ # For HTTP/1.1 we could have used chunked transfer encoding...
+ }
+ else {
+ $h->header('Content-Length' => length $$cont_ref)
+ if defined($$cont_ref) && length($$cont_ref);
+ }
+
+ $self->_fixup_header($h, $url, $proxy);
+
+ my $buf = $request_line . $h->as_string($CRLF) . $CRLF;
+ my $n; # used for return value from syswrite/sysread
+ my $length;
+ my $offset;
+
+ # die's will be caught if user specified "use_eval".
+
+ # syswrite $buf
+ $length = length($buf);
+ $offset = 0;
+ while ( $offset < $length ) {
+ die "write timeout" if $timeout && !$sel->can_write($timeout);
+ $n = $socket->syswrite($buf, $length-$offset, $offset );
+ die $! unless defined($n);
+ $offset += $n;
+ }
+
+ LWP::Debug::conns($buf);
+
+ if ($ctype eq 'CODE') {
+ while ( ($buf = &$cont_ref()), defined($buf) && length($buf)) {
+ # syswrite $buf
+ $length = length($buf);
+ $offset = 0;
+ while ( $offset < $length ) {
+ die "write timeout" if $timeout && !$sel->can_write($timeout);
+ $n = $socket->syswrite($buf, $length-$offset, $offset );
+ die $! unless defined($n);
+ $offset += $n;
+ }
+ LWP::Debug::conns($buf);
+ }
+ }
+ elsif (defined($$cont_ref) && length($$cont_ref)) {
+ # syswrite $$cont_ref
+ $length = length($$cont_ref);
+ $offset = 0;
+ while ( $offset < $length ) {
+ die "write timeout" if $timeout && !$sel->can_write($timeout);
+ $n = $socket->syswrite($$cont_ref, $length-$offset, $offset );
+ die $! unless defined($n);
+ $offset += $n;
+ }
+ LWP::Debug::conns($buf);
+ }
+
+ # For an HTTP request, the 'command' socket is the same as the
+ # 'listen' socket, so we just return the socket here.
+ # (In the ftp module, we usually have one socket being the command
+ # socket, and another one being the read socket, so that's why we
+ # have this overhead here)
+ return $socket;
+}
+
+# whereas 'handle_connect' (with its submethods 'get_address' and
+# 'connect') and 'write_request' mainly just encapsulate different
+# parts of the old http->request method, 'read_chunk' has an added
+# level of complexity. This is because we have to be content with
+# whatever data is available, and somehow 'save' our current state
+# between multiple calls.
+
+# To faciliate things later, when we need redirects and
+# authentication, we insist that we _always_ have a response object
+# available, which is generated outside and initialized with bogus
+# data (code = 0). Also, we can then save ourselves the trouble of
+# using a call-by-variable for $response in order to return a freshly
+# generated $response-object.
+
+# We have to provide IO::Socket-objects with a pushback mechanism,
+# which comes pretty handy in case we can't use all the information read
+# so far. Instead of changing the IO::Socket code, we just have our own
+# little pushback buffer, $pushback, indexed by $socket object here.
+
+my %pushback;
+
+sub read_chunk {
+ my ($self, $response, $socket, $request, $arg, $size,
+ $timeout, $entry) = @_;
+
+ LWP::Debug::trace ("read_chunk (".
+ (defined $response ? $response : '[undef]').
+ ", ". (defined $socket ? $socket : '[undef]').
+ ", ". (defined $request ? $request : '[undef]').
+ ", ". (defined $arg ? $arg : '[undef]').
+ ", ". (defined $size ? $size : '[undef]').
+ ", ". (defined $timeout ? $timeout : '[undef]').
+ ", ". (defined $entry ? $entry : '[undef]'). ")");
+
+ # hack! Can we just generate a new Select object here? Or do we
+ # have to take the one we created in &write_request?!?
+ my $sel = IO::Select->new($socket) if $timeout;
+
+ LWP::Debug::debug('reading response ('.
+ (defined($pushback{$socket})?length($pushback{$socket}):0) .' buffered)');
+
+ my $buf = "";
+ # read one chunk at a time from $socket
+
+ if ( $timeout && !$sel->can_read($timeout) ) {
+ $response->message("Read Timeout");
+ $response->code(&HTTP::Status::RC_REQUEST_TIMEOUT);
+ $response->request($request);
+ return 0; # EOF
+ };
+ my $n = $socket->sysread($buf, $size, length($buf));
+ unless (defined ($n)) {
+ $response->message("Sysread Error: $!");
+ $response->code(&HTTP::Status::RC_SERVICE_UNAVAILABLE);
+ $response->request($request);
+ return 0; # EOF
+ };
+ # need our own EOF detection here
+ unless ( $n ) {
+ unless ($response and $response->code) {
+ $response->message("Unexpected EOF while reading response");
+ $response->code(&HTTP::Status::RC_BAD_GATEWAY);
+ $response->request($request);
+ return 0; # EOF
+ }
+ }
+
+ # prepend contents of unprocessed buffer content from last read
+ $buf = $pushback{$socket} . $buf if $pushback{$socket};
+ LWP::Debug::conns("Buffer contents between dashes -->\n==========\n$buf==========");
+
+ # determine Protocol type and create response object
+ unless ($response and $response->code) {
+ if ($buf =~ s/^(HTTP\/\d+\.\d+)[ \t]+(\d+)[ \t]*([^\012]*)\012//) { #1.39
+ # HTTP/1.0 response or better
+ my($ver,$code,$msg) = ($1, $2, $3);
+ $msg =~ s/\015$//;
+ LWP::Debug::debug("Identified HTTP Protocol: $ver $code $msg");
+ $response->code($code);
+ $response->message($msg);
+ $response->protocol($ver);
+ # store $request info in $response object
+ $response->request($request);
+ }
+ elsif ((length($buf) >= 5 and $buf !~ /^HTTP\//) or
+ $buf =~ /\012/ ) {
+ # HTTP/0.9 or worse
+ LWP::Debug::debug("HTTP/0.9 assume OK");
+ $response->code(&HTTP::Status::RC_OK);
+ $response->message("OK");
+ $response->protocol('HTTP/0.9');
+ # store $request info in $response object
+ $response->request($request);
+ }
+ else {
+ # need more data
+ LWP::Debug::debug("need more data to know which protocol");
+ }
+ }
+
+ # if we have a protocol, read headers if neccessary
+ if ( $response && !&headers($response) ) {
+ # ensure that we have read all headers. The headers will be
+ # terminated by two blank lines
+ unless ($buf =~ /^\015?\012/ || $buf =~ /\015?\012\015?\012/) {
+ # must read more if we can...
+ LWP::Debug::debug("need more data for headers");
+ } else {
+ # now we start parsing the headers. The strategy is to
+ # remove one line at a time from the beginning of the header
+ # buffer ($buf).
+ my($key, $val);
+
+ while ($buf =~ s/([^\012]*)\012//) {
+ my $line = $1;
+
+ # if we need to restore as content when illegal headers
+ # are found.
+ my $save = "$line\012";
+
+ $line =~ s/\015$//;
+ last unless length $line;
+
+ if ($line =~ /^([a-zA-Z0-9_\-.]+)\s*:\s*(.*)/) {
+ $response->push_header($key, $val) if $key;
+ ($key, $val) = ($1, $2);
+ } elsif ($line =~ /^\s+(.*)/ && $key) {
+ $val .= " $1";
+ } else {
+ $response->push_header("Client-Bad-Header-Line" =>
+ $line);
+ }
+ }
+ $response->push_header($key, $val) if $key;
+
+ # check to see if we have any header at all
+ unless (&headers($response)) {
+ # we need at least one header to go on
+ LWP::Debug::debug("no headers found, inserting Client-Date");
+ $response->header ("Client-Date" =>
+ HTTP::Date::time2str(time));
+ }
+ } # of if then else
+ } # of if $response
+
+ # if we have both a response AND the headers, start parsing the rest
+ if ( $response && &headers($response) && length($buf)) {
+ $self->_get_sock_info($response, $socket);
+ # the CONNECT method does not need to read content
+ if ($request->method eq "CONNECT") { # from LWP 5.48's Protocol/http.pm
+ $response->{client_socket} = $socket; # so it can be picked up
+ }
+ else {
+ # all other methods want to read content, I guess...
+ # Note that we can't use $self->collect, since we don't want to give
+ # up control (by letting Protocol::collect use a $collector callback)
+ if (my @te = $response->remove_header('Transfer-Encoding')) {
+ $response->push_header('Client-Transfer-Encoding', \@te);
+ }
+ my $retval = $self->receive($arg, $response, \$buf, $entry);
+ # update pushback buffer (receive handles _all_ of current buffer)
+ $pushback{$socket} = '';
+ # return length of response read (or value of $retval, if any, which
+ # could be one of C_LASTCON, C_ENDCON, or C_ENDALL)
+ return (defined $retval? $retval : length($buf));
+ }
+ }
+
+ $pushback{$socket} = $buf;
+ return $n;
+}
+
+# This function indicates if we have already parsed the headers. In
+# case of HTTP/0.9 we (obviously?!) don't have any (which means that
+# we already 'parsed' them, so return 'true' no matter what)
+
+sub headers {
+ my ($response) = @_;
+
+ return 1 if $response->protocol eq 'HTTP/0.9';
+
+ ($response->headers_as_string ? 1 : 0);
+}
+
+sub close_connection {
+ my ($self, $response, $listen_socket, $request, $cmd_socket) = @_;
+# print "Closing socket $listen_socket\n";
+# $listen_socket->close;
+# $cmd_socket->close;
+}
+
+# the old (single request) frontend, defunct.
+sub request {
+ die "LWP::Parallel::Protocol::http does not support single requests\n";
+}
+
+
+#-----------------------------------------------------------
+# copied from LWP::Protocol::http (v1.63 in LWP5.64)
+#-----------------------------------------------------------
+package LWP::Parallel::Protocol::http::SocketMethods;
+
+sub sysread {
+ my $self = shift;
+ if (my $timeout = ${*$self}{io_socket_timeout}) {
+ die "read timeout" unless $self->can_read($timeout);
+ }
+ else {
+ # since we have made the socket non-blocking we
+ # use select to wait for some data to arrive
+ $self->can_read(undef) || die "Assert";
+ }
+ sysread($self, $_[0], $_[1], $_[2] || 0);
+}
+
+sub can_read {
+ my($self, $timeout) = @_;
+ my $fbits = '';
+ vec($fbits, fileno($self), 1) = 1;
+ my $nfound = select($fbits, undef, undef, $timeout);
+ die "select failed: $!" unless defined $nfound;
+ return $nfound > 0;
+}
+
+sub ping {
+ my $self = shift;
+ !$self->can_read(0);
+}
+
+sub increment_response_count {
+ my $self = shift;
+ return ++${*$self}{'myhttp_response_count'};
+}
+
+#-----------------------------------------------------------
+package LWP::Parallel::Protocol::http::Socket;
+use vars qw(@ISA);
+ at ISA = qw(LWP::Parallel::Protocol::http::SocketMethods Net::HTTP);
+
+#-----------------------------------------------------------
+# ^^^ copied from LWP::Protocol::http (v1.63 in LWP5.64)
+#-----------------------------------------------------------
+
+
+1;
diff --git a/lib/LWP/Parallel/Protocol/https.pm b/lib/LWP/Parallel/Protocol/https.pm
new file mode 100755
index 0000000..3cdfb5d
--- /dev/null
+++ b/lib/LWP/Parallel/Protocol/https.pm
@@ -0,0 +1,33 @@
+# -*- perl -*-
+# $Id: https.pm,v 1.3 2002/03/28 20:25:44 langhein Exp $
+# derived from: https.pm,v 1.8 1999/09/20 12:48:37 gisle Exp $
+
+use strict;
+
+package LWP::Parallel::Protocol::https;
+
+# Figure out which SSL implementation to use (copy & paste from LWP)
+use vars qw($SSL_CLASS);
+if ($IO::Socket::SSL::VERSION) {
+ $SSL_CLASS = "IO::Socket::SSL"; # it was already loaded
+} else {
+ eval { require Net::SSL; }; # from Crypt-SSLeay
+ if ($@) {
+ require IO::Socket::SSL;
+ $SSL_CLASS = "IO::Socket::SSL";
+ } else {
+ $SSL_CLASS = "Net::SSL";
+ }
+}
+
+use vars qw(@ISA);
+
+require LWP::Parallel::Protocol::http;
+require LWP::Protocol::https;
+ at ISA=qw(LWP::Protocol::https LWP::Parallel::Protocol::http);
+
+package LWP::Parallel::Protocol::https::Socket;
+
+our @ISA = qw(LWP::Protocol::https::Socket);
+
+1;
diff --git a/lib/LWP/Parallel/RobotUA.pm b/lib/LWP/Parallel/RobotUA.pm
new file mode 100755
index 0000000..fcf00e5
--- /dev/null
+++ b/lib/LWP/Parallel/RobotUA.pm
@@ -0,0 +1,565 @@
+# -*- perl -*-
+# $Id: RobotUA.pm,v 1.12 2004/02/10 15:19:19 langhein Exp $
+# derived from: RobotUA.pm,v 1.18 2000/04/09 11:21:11 gisle Exp $
+
+
+package LWP::Parallel::RobotUA;
+
+use LWP::Parallel::UserAgent qw(:CALLBACK);
+require LWP::RobotUA;
+ at ISA = qw(LWP::Parallel::UserAgent LWP::RobotUA Exporter);
+$VERSION = sprintf("%d.%02d", q$Revision: 1.12 $ =~ /(\d+)\.(\d+)/);
+
+ at EXPORT = qw();
+# callback commands
+ at EXPORT_OK = @LWP::Parallel::UserAgent::EXPORT_OK;
+%EXPORT_TAGS = %LWP::Parallel::UserAgent::EXPORT_TAGS;
+
+use LWP::Debug ();
+require HTTP::Request;
+require HTTP::Response;
+use HTTP::Date qw(time2str);
+use Carp();
+
+use strict;
+
+=head1 NAME
+
+LWP::Parallel::RobotUA - A class for Parallel Web Robots
+
+=head1 SYNOPSIS
+
+ require LWP::Parallel::RobotUA;
+ $ua = new LWP::Parallel::RobotUA 'my-robot/0.1', 'me at foo.com';
+ $ua->delay(0.5); # in minutes!
+ ...
+ # just use it just like a normal LWP::Parallel::UserAgent
+ $ua->register ($request, \&callback, 4096); # or
+ $ua->wait ( $timeout );
+
+=head1 DESCRIPTION
+
+This class implements a user agent that is suitable for robot
+applications. Robots should be nice to the servers they visit. They
+should consult the F</robots.txt> file to ensure that they are welcomed
+and they should not make requests too frequently.
+
+But, before you consider writing a robot take a look at
+<URL:http://info.webcrawler.com/mak/projects/robots/robots.html>.
+
+When you use a I<LWP::Parallel::RobotUA> as your user agent, then you do not
+really have to think about these things yourself. Just send requests
+as you do when you are using a normal I<LWP::Parallel::UserAgent> and this
+special agent will make sure you are nice.
+
+=head1 METHODS
+
+The LWP::Parallel::RobotUA is a sub-class of LWP::Parallel::UserAgent
+and LWP::RobotUA and implements a mix of their methods.
+
+In addition to LWP::Parallel::UserAgent, these methods are provided:
+
+=cut
+
+=head2 $ua = LWP::Parallel::RobotUA->new($agent_name, $from, [$rules])
+
+Your robot's name and the mail address of the human responsible for
+the robot (i.e. you) are required by the constructor.
+
+Optionally it allows you to specify the I<WWW::RobotRules> object to
+use. (See L<WWW::RobotRules::AnyDBM_File> for persistent caching of
+robot rules in a local file)
+
+=cut
+
+#' fix emacs syntax parser
+
+sub new {
+ my($class,$name,$from,$rules) = @_;
+
+ Carp::croak('LWP::Parallel::RobotUA name required') unless $name;
+ Carp::croak('LWP::Parallel::RobotUA from address required') unless $from;
+
+ my $self = new LWP::Parallel::UserAgent;
+ $self = bless $self, $class;
+
+ $self->{'delay'} = 1; # minutes again (used to be seconds)!!
+ $self->{'use_sleep'} = 1;
+ $self->{'agent'} = $name;
+ $self->{'from'} = $from;
+ # current netloc's we're checking:
+ $self->{'checking'} = {};
+
+ if ($rules) {
+ $rules->agent($name);
+ $self->{'rules'} = $rules;
+ } else {
+ $self->{'rules'} = new WWW::RobotRules $name;
+ }
+
+ $self;
+}
+
+=head2 $ua->delay([$minutes])
+
+Set/Get the minimum delay between requests to the same server. The
+default is 1 minute.
+
+Note: Previous versions of LWP Parallel-Robot used I<Seconds> instead of
+ I<Minutes>! This is now compatible with LWP Robot.
+
+=cut
+
+# reuse LWP::RobotUA::delay here (just needed to clarify usage)
+
+=head2 $ua->host_wait($netloc)
+
+Returns the number of seconds you must wait before you can make a new
+request to this server. This method keeps track of all of the robots
+connection, and enforces the delay constraint specified via the delay
+method above for each server individually.
+
+Note: Although it says 'host', it really means 'netloc/server',
+i.e. it differentiates between individual servers running on different
+ports, even though they might be on the same machine ('host'). This
+function is mostly used internally, where RobotUA calls it to find out
+when to send the next request to a certain server.
+
+=cut
+
+sub host_wait
+{
+ my($self, $netloc) = @_;
+ return undef unless defined $netloc;
+ my $last = $self->{'rules'}->last_visit($netloc);
+ if ($last) {
+ my $wait = int($self->{'delay'} * 60 - (time - $last));
+ $wait = 0 if $wait < 0;
+ return $wait;
+ }
+ return 0;
+}
+
+=head2 $ua->as_string
+
+Returns a string that describes the state of the UA.
+Mainly useful for debugging.
+
+=cut
+
+sub as_string
+{
+ my $self = shift;
+ my @s;
+ push(@s, "Robot: $self->{'agent'} operated by $self->{'from'} [$self]");
+ push(@s, " Minimum delay: " . int($self->{'delay'}) . " minutes");
+ push(@s, " Rules = $self->{'rules'}");
+ join("\n", @s, '');
+}
+
+
+#
+# private methods (reimplementations of LWP::Parallel::UserAgent methods)
+#
+
+# this method now first checks the robot rules. It will try to
+# download the robots.txt file before proceeding with any more
+# requests to an unvisited site.
+# It will also observe the delay specified in our ->delay method
+sub _make_connections_in_order {
+ my $self = shift;
+ LWP::Debug::trace('()');
+
+ my($failed_connections, $remember_failures, $ordpend_connections, $rules) =
+ @{$self}{qw(failed_connections remember_failures
+ ordpend_connections rules)};
+
+ my ($entry, @queue, %busy);
+ # get first entry from pending connections
+ while ( $entry = shift @$ordpend_connections ) {
+
+ my $request = $entry->request;
+ my $netloc = eval { local $SIG{__DIE__}; $request->url->host_port; };
+
+ if ( $remember_failures and $failed_connections->{$netloc} ) {
+ my $response = $entry->response;
+ $response->code (&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message ("Server unavailable");
+ # simulate immediate response from server
+ $self->on_failure ($entry->request, $response, $entry);
+ next;
+ }
+
+ push (@queue, $entry), next if $busy{$netloc};
+
+ # Do we try to access a new server?
+ my $allowed = $rules->allowed($request->url);
+ # PS: pending Robots.txt requests are always allowed! (hopefully)
+
+ if ($allowed < 0) {
+ LWP::Debug::debug("Host not visited before, or robots.".
+ "txt expired: ($allowed) ".$request->url);
+ my $checking = $self->_checking_robots_txt ($netloc);
+ # let's see if we're already busy checkin' this host
+ if ( $checking > 0 ) {
+ LWP::Debug::debug("Already busy checking here. Request queued");
+ push (@queue, $entry);
+ } elsif ( $checking < 0 ) {
+ # We already checked here. Seems the robots.txt
+ # expired afterall. Pretend we're allowed
+ LWP::Debug::debug("Checked this host before. robots.txt".
+ " expired. Assuming access ok");
+ $allowed = 1;
+ } else {
+ # fetch "robots.txt"
+ my $robot_url = $request->url->clone;
+ $robot_url->path("robots.txt");
+ $robot_url->query(undef);
+ LWP::Debug::debug("Requesting $robot_url");
+
+ # make access to robot.txt legal since this might become
+ # a recursive call (in case we lack bandwith to connect
+ # immediately)
+ $rules->parse($robot_url, "");
+
+ my $robot_req = new HTTP::Request 'GET', $robot_url;
+ my $response = HTTP::Response->new(0, '<empty response>');
+ $response->request($robot_req);
+
+ my $robot_entry = new LWP::Parallel::UserAgent::Entry {
+ request => $robot_req,
+ response => $response,
+ size => 8192,
+ redirect_ok => 0,
+ arg => sub {
+ # callback function (closure)
+ my ($content, $robot_res, $protocol) = @_;
+ my $netloc = eval { local $SIG{__DIE__};
+ $request->url->host_port; };
+ # unset flag - we're done checking
+ $self->_checking_robots_txt ($netloc, -1);
+ $rules->visit($netloc);
+
+ my $fresh_until = $robot_res->fresh_until;
+ if ($robot_res->is_success) {
+ my $c = $robot_res->content;
+ if ($robot_res->content_type =~ m,^text/, &&
+ $c =~ /Disallow/) {
+ LWP::Debug::debug("Parsing robot rules for ".
+ $netloc);
+ $rules->parse($robot_url, $c, $fresh_until);
+ }
+ else {
+ LWP::Debug::debug("Ignoring robots.txt for ".
+ $netloc);
+ $rules->parse($robot_url, "", $fresh_until);
+ }
+ } else {
+ LWP::Debug::debug("No robots.txt file found at " .
+ $netloc);
+ $rules->parse($robot_url, "", $fresh_until);
+ }
+ },
+ };
+ # immediately try to connect (if bandwith available)
+ push (@queue, $robot_entry), $busy{$netloc}++
+ unless $self->_check_bandwith($robot_entry);
+ # mark this host as being checked
+ $self->_checking_robots_txt ($netloc, 1);
+ # don't forget to queue the entry that triggered this request
+ push (@queue, $entry);
+ }
+ }
+
+ unless ($allowed) {
+ # we're not allowed to connect to this host
+ my $res = new HTTP::Response
+ &HTTP::Status::RC_FORBIDDEN, 'Forbidden by robots.txt';
+ $entry->response($res);
+ # silently drop entry here from ordpend_connections
+ } elsif ($allowed > 0) {
+ # check robot-wait information to see if we have to wait
+ my $wait = $self->host_wait($netloc);
+
+ # if so, push on @queue queue
+ if ($wait) {
+ LWP::Debug::trace("Must wait $wait more seconds (sleep is ".
+ ($self->{'use_sleep'} ? 'on' : 'off') . ")");
+ if ($self->{'use_sleep'}) {
+ # well, we don't really use sleep, but lets emulate
+ # the standard LWP behavior as closely as possible...
+ push (@queue, $entry);
+
+ # now we also have to raise a red flag for all
+ # remaining entries at this particular
+ # host. Otherwise we might block the first x
+ # requests to this server, but have finally waited
+ # long enough when the x+1 request comes off the
+ # queue, and then we would connect to the x+1
+ # request before any of the first x requests
+ # (which is not what we want!)
+ $busy{$netloc}++;
+ } else {
+ LWP::Debug::debug("'use_sleep' disabled, generating response");
+ my $res = new HTTP::Response
+ &HTTP::Status::RC_SERVICE_UNAVAILABLE, 'Please, slow down';
+ $res->header('Retry-After', time2str(time + $wait));
+ $entry->response($res);
+ }
+ } else { # check bandwith
+ unless ( $self->_check_bandwith($entry) ) {
+ # if _check_bandwith returns a value, it means that
+ # no bandwith is available: push $entry on queue
+ push (@queue, $entry);
+ $busy{$netloc}++;
+ } else {
+ $rules->visit($netloc);
+ }
+ }
+ }
+ }
+ # the un-connected entries form the new stack
+ $self->{'ordpend_connections'} = \@queue;
+}
+
+# this method now first checks the robot rules. It will try to
+# download the robots.txt file before proceeding with any more
+# requests to an unvisited site.
+# It will also observe the delay specified in our ->delay method
+sub _make_connections_unordered {
+ my $self = shift;
+ LWP::Debug::trace('()');
+
+ my($pending_connections, $failed_connections, $remember_failures, $rules) =
+ @{$self}{qw(pending_connections failed_connections
+ remember_failures rules)};
+
+ my ($entry, $queue, $netloc);
+
+ my %delete;
+ # check every host in sequence (use 'each' for better performance)
+ SERVER:
+ while (($netloc, $queue) = each %$pending_connections) {
+
+ # since we shouldn't alter the hash itself while iterating through it
+ # via 'each', we'll make a note here for each netloc that has an
+ # empty queue, so that we can explicitly delete them afterwards:
+ unless (@$queue) {
+ LWP::Debug::debug("Marking empty queue for '$netloc' for deletion");
+ $delete{$netloc}++;
+ next SERVER;
+ }
+
+ # check if we already tried to connect to this location, and failed
+ if ( $remember_failures and $failed_connections->{$netloc} ) {
+ LWP::Debug::debug("Removing all ". scalar @$queue .
+ " entries for unreachable host '$netloc'");
+ while ( $entry = shift @$queue ) {
+ my $response = $entry->response;
+ $response->code (&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message ("Server unavailable");
+ # simulate immediate response from server
+ $self->on_failure ($entry->request, $response, $entry);
+ }
+ # make sure we delete this netloc-entry later
+ LWP::Debug::debug("Marking empty queue for '$netloc' for deletion");
+ $delete{$netloc}++;
+ next SERVER;
+ }
+
+ # get first entry from pending connections at this host
+ while ( $entry = shift @$queue ) {
+ my $request = $entry->request;
+
+ # Do we try to access a new server?
+ my $allowed = $rules->allowed($request->url);
+ # PS: pending Robots.txt requests are always allowed! (hopefully)
+
+ if ($allowed < 0) {
+ LWP::Debug::debug("Host not visited before, or robots.".
+ "txt expired: ".$request->url);
+ my $checking = $self->_checking_robots_txt
+ ($request->url->host_port);
+ # let's see if we're already busy checkin' this host
+ if ( $checking > 0 ) {
+ # if so, don't register yet another robots.txt request!
+ LWP::Debug::debug("Already busy checking here. ".
+ "Request queued");
+ unshift (@$queue, $entry);
+ next SERVER;
+ } elsif ( $checking < 0 ) {
+ # We already checked here. Seems the robots.txt
+ # expired afterall. Pretend we're allowed
+ LWP::Debug::debug("Checked this host before. ".
+ "robots.txt expired. Assuming access ok");
+ $allowed = 1;
+ } else {
+ # queue the entry that triggered this request
+ unshift (@$queue, $entry);
+ # fetch "robots.txt" (i.e. create & issue robot request)
+ my $robot_url = $request->url->clone;
+ $robot_url->path("robots.txt");
+ $robot_url->query(undef);
+ LWP::Debug::debug("Requesting $robot_url");
+
+ # make access to robot.txt legal since this might become
+ # a recursive call (in case we lack bandwith to connect
+ # immediately)
+ $rules->parse($robot_url, "");
+
+ my $robot_req = new HTTP::Request 'GET', $robot_url;
+ my $response = HTTP::Response->new(0, '<empty response>');
+ $response->request($robot_req);
+
+ my $robot_entry = new LWP::Parallel::UserAgent::Entry {
+ request => $robot_req,
+ response => $response,
+ size => 8192,
+ redirect_ok => 0,
+ arg => sub {
+ # callback function (closure)
+ my ($content, $robot_res, $protocol) = @_;
+ my $netloc = eval { local $SIG{__DIE__};
+ $request->url->host_port; };
+ # unset flag - we're done checking
+ $self->_checking_robots_txt ($netloc, -1);
+ $rules->visit($netloc);
+
+ my $fresh_until = $robot_res->fresh_until;
+ if ($robot_res->is_success) {
+ my $c = $content; # thanks to Vlad Ciubotariu
+ if ($robot_res->content_type =~ m,^text/, &&
+ $c =~ /Disallow/) {
+ LWP::Debug::debug("Parsing robot rules for ".
+ $netloc);
+ $rules->parse($robot_url, $c, $fresh_until);
+ }
+ else {
+ LWP::Debug::debug("Ignoring robots.txt for ".
+ $netloc);
+ $rules->parse($robot_url, "", $fresh_until);
+ }
+ } else {
+ LWP::Debug::debug("No robots.txt file found at ".
+ $netloc);
+ $rules->parse($robot_url, "", $fresh_until);
+ }
+ },
+ };
+ # mark this host as being checked
+ $self->_checking_robots_txt ($request->url->host_port, 1);
+ # immediately try to connect (if bandwith available)
+ unless ( $self->_check_bandwith($robot_entry) ) {
+ unshift (@$queue, $robot_entry);
+ }
+ # we can move to the next server either way, since
+ # we'll have to wait for the results of the
+ # robot.txt request anyways
+ next SERVER;
+ }
+ }
+
+ unless ($allowed) {
+ # we're not allowed to connect to this host
+ my $res = new HTTP::Response
+ &HTTP::Status::RC_FORBIDDEN, 'Forbidden by robots.txt';
+ $entry->response($res);
+ # silently drop entry here from pending_connections
+ } elsif ($allowed > 0) {
+ my $netloc = eval { local $SIG{__DIE__};
+ $request->url->host_port; }; # LWP 5.60
+
+ # check robot-wait information to see if we have to wait
+ my $wait = $self->host_wait($netloc);
+
+ # if so, push on @$queue queue
+ if ($wait) {
+ LWP::Debug::trace("Must wait $wait more seconds (sleep is ".
+ ($self->{'use_sleep'} ? 'on' : 'off') . ")");
+ if ($self->{'use_sleep'}) {
+ unshift (@$queue, $entry);
+ next SERVER;
+ } else {
+ LWP::Debug::debug("'use_sleep' disabled");
+ my $res = new HTTP::Response
+ &HTTP::Status::RC_SERVICE_UNAVAILABLE, 'Please, slow down';
+ $res->header('Retry-After', time2str(time + $wait));
+ $entry->response($res);
+ }
+ } else { # check bandwith
+ unless ( $self->_check_bandwith($entry) ) {
+ # if _check_bandwith returns undef, it means that
+ # no bandwith is available: push $entry on queue
+ LWP::Debug::debug("Not enough bandwidth for ".
+ "request to $netloc");
+ unshift (@$queue, $entry);
+ next SERVER;
+ } else {
+ # make sure we update the time of our last
+ # visit to this site properly
+ $rules->visit($netloc);
+ }
+ }
+ }
+ LWP::Debug::debug("Queue for $netloc contains ".
+ scalar @$queue . " pending connections");
+ $delete{$netloc}++ unless scalar @$queue;
+ }
+ }
+ # clean up: (we do this outside of the loop since we're not
+ # suppose to alter an associative array (hash) while iterating
+ # through it using 'each')
+ foreach (keys %delete) {
+ LWP::Debug::debug("Deleting queue for '$_'");
+ delete $self->{'pending_connections'}->{$_}
+ }
+}
+
+
+# request-slots available at host (checks for robots lock)
+sub _req_available {
+ my ( $self, $url ) = @_;
+ # check if blocked
+ if ( $self->_checking_robots_txt($url->host_port) ) {
+ return 0;
+ } else {
+ # else use superclass method
+ $self->SUPER::_req_available($url);
+ }
+};
+
+
+#
+# new private methods
+#
+
+# sets/get robot lock for given host.
+sub _checking_robots_txt {
+ my ($self, $netloc, $lock) = @_;
+ local $^W = 0; # prevent warnings here;
+
+ $self->{'checking'}->{$netloc} = 0
+ unless defined ($self->{'checking'}->{$netloc});
+
+ if (defined $lock) {
+ $self->{'checking'}->{$netloc} = $lock;
+ } else {
+ $self->{'checking'}->{$netloc};
+ }
+}
+
+=head1 SEE ALSO
+
+L<LWP::Parallel::UserAgent>, L<LWP::RobotUA>, L<WWW::RobotRules>
+
+=head1 COPYRIGHT
+
+Copyright 1997-2004 Marc Langheinrich E<lt>marclang at cpan.org>
+
+This library is free software; you can redistribute it and/or
+modify it under the same terms as Perl itself.
+
+=cut
+
+1;
+
diff --git a/lib/LWP/Parallel/UserAgent.pm b/lib/LWP/Parallel/UserAgent.pm
new file mode 100755
index 0000000..4610b6e
--- /dev/null
+++ b/lib/LWP/Parallel/UserAgent.pm
@@ -0,0 +1,1571 @@
+# -*- perl -*-
+# $Id: UserAgent.pm,v 1.31 2004/02/10 15:19:19 langhein Exp $
+# derived from: UserAgent.pm,v 2.1 2001/12/11 21:11:29 gisle Exp $
+# and: ParallelUA.pm,v 1.16 1997/07/23 16:45:09 ahoy Exp $
+
+package LWP::Parallel::UserAgent::Entry;
+
+require 5.004;
+use Carp();
+
+# allowed fields in Parallel::UserAgent entry
+my %fields = (
+ arg => undef,
+ fullpath => undef,
+ protocol => undef,
+ proxy => undef,
+ redirect_ok => undef,
+ response => undef,
+ request => undef,
+ size => undef,
+ cmd_socket => undef,
+ listen_socket => undef,
+ content_size => undef,
+ );
+
+sub new {
+ my($class, $init) = @_;
+
+ my $self = {
+ _permitted => \%fields,
+ %fields,
+ };
+ $self = bless $self, $class;
+
+ if ($init) {
+ foreach (keys %$init) {
+ # call functions and initialize with given values
+ $self->$_($init->{$_});
+ }
+ }
+ $self;
+}
+
+sub get {
+ my $self = shift;
+ my @answer;
+ my $field;
+ foreach $field (@_) {
+ push (@answer, $self->$field() );
+ }
+ @answer;
+}
+
+use vars qw($AUTOLOAD);
+
+sub AUTOLOAD {
+ my $self = shift;
+ my $type = ref($self) || die "$self is not an object";
+ my $name = $AUTOLOAD;
+ $name =~ s/.*://; # strip fully qualified portion
+ unless ( exists $self->{_permitted}->{$name} ) {
+ Carp::croak "Can't access '$name' field in $type object";
+ }
+ if (@_) {
+ return $self->{$name} = $_[0];
+ } else {
+ return $self->{$name};
+ }
+}
+
+sub DESTROY { };
+
+package LWP::Parallel::UserAgent;
+
+use Exporter();
+
+$ENV{PERL_LWP_USE_HTTP_1.0} = "Yes"; # until i figure out gisle's http1.1 stuff
+require LWP::Parallel::Protocol;
+require LWP::UserAgent;
+ at ISA = qw(LWP::UserAgent Exporter);
+
+ at EXPORT = qw();
+# callback commands
+ at EXPORT_OK = qw(C_ENDCON C_ENDALL C_LASTCON);
+%EXPORT_TAGS = (CALLBACK => [qw(C_ENDCON C_ENDALL C_LASTCON)]);
+
+sub C_ENDCON { -1; }; # end current connection (but keep waiting/connecting)
+sub C_LASTCON{ -2; }; # don't start any new connections
+sub C_ENDALL { -3; }; # end all connections and return from 'wait'-method
+
+require HTTP::Request;
+require HTTP::Response;
+
+use Carp ();
+use LWP::Debug ();
+use HTTP::Status ();
+use HTTP::Date qw(time2str);
+use IO::Select;
+use strict;
+
+=head1 NAME
+
+LWP::Parallel::UserAgent - A class for parallel User Agents
+
+=head1 SYNOPSIS
+
+ require LWP::Parallel::UserAgent;
+ $ua = LWP::Parallel::UserAgent->new();
+ ...
+
+ $ua->redirect (0); # prevents automatic following of redirects
+ $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
+ $ua->max_req (5); # sets maximum number of parallel requests per host
+ ...
+ $ua->register ($request); # or
+ $ua->register ($request, '/tmp/sss'); # or
+ $ua->register ($request, \&callback, 4096);
+ ...
+ $ua->wait ( $timeout );
+ ...
+ sub callback { my($data, $response, $protocol) = @_; .... }
+
+=head1 DESCRIPTION
+
+This class implements a user agent that access web sources in parallel.
+
+Using a I<LWP::Parallel::UserAgent> as your user agent, you typically start by
+registering your requests, along with how you want the Agent to process
+the incoming results (see $ua->register).
+
+Then you wait for the results by calling $ua->wait. This method only
+returns, if all requests have returned an answer, or the Agent timed
+out. Also, individual callback functions might indicate that the
+Agent should stop waiting for requests and return. (see $ua->register)
+
+See the file L<LWP::Parallel> for a set of simple examples.
+
+=head1 METHODS
+
+The LWP::Parallel::UserAgent is a sub-class of LWP::UserAgent, but not all
+of its methods are available here. However, you can use its main
+methods, $ua->simple_request and $ua->request, in order to simulate
+singular access with this package. Of course, if a single request is all
+you need, then you should probably use LWP::UserAgent in the first place,
+since it will be faster than our emulation here.
+
+For parallel access, you will need to use the new methods that come with
+LWP::Parallel::UserAgent, called $pua->register and $pua->wait. See below
+for more information on each method.
+
+=over 4
+
+=cut
+
+
+#
+# Additional attributes in addition to those found in LWP::UserAgent:
+#
+# $self->{'entries_by_sockets'} = {} Associative Array of registered
+# requests, indexed via sockets
+#
+# $self->{'entries_by_requests'} = {} Associative Array of registered
+# requests, indexed via requests
+#
+
+=item $ua = LWP::Parallel::UserAgent->new();
+
+Constructor for the parallel UserAgent. Returns a reference to a
+LWP::Parallel::UserAgent object.
+
+Optionally, you can give it an existing LWP::Parallel::UserAgent (or
+even an LWP::UserAgent) as a first argument, and it will "clone" a
+new one from this (This just copies the behavior of LWP::UserAgent.
+I have never actually tried this, so let me know if this does not do
+what you want).
+
+=cut
+
+sub new {
+ my($class,$init) = @_;
+
+ # my $self = new LWP::UserAgent $init;
+ my $self = new LWP::UserAgent; # thanks to Kirill
+ $self = bless $self, $class;
+
+ # handle responses per default
+ $self->{'handle_response'} = 1;
+ # do not perform nonblocking connects per default
+ $self->{'nonblock'} = 0;
+ # don't handle duplicates per default
+ $self->{'handle_duplicates'} = 0;
+ # do not use ordered lists per default
+ $self->{'handle_in_order'} = 0;
+ # do not cache failed connection attempts
+ $self->{'remember_failures'} = 0;
+
+ # supply defaults
+ $self->{'max_hosts'} = 7;
+ $self->{'max_req'} = 5;
+
+ $self->initialize;
+}
+
+=item $ua->initialize;
+
+Takes no arguments and initializes the UserAgent. It is automatically
+called in LWP::Parallel::UserAgent::new, so usually there is no need to
+call this explicitly.
+
+However, if you want to re-use the same UserAgent object for a number
+of "runs", you should call $ua->initialize after you have processed the
+results of the previous call to $ua->wait, but before registering any
+new requests.
+
+=cut
+
+
+sub initialize {
+ my $self = shift;
+
+ # list of entries
+ $self->{'entries_by_sockets'} = {};
+ $self->{'entries_by_requests'} = {};
+
+ $self->{'previous_requests'} = {};
+
+ # connection handling
+ $self->{'current_connections'} = {}; # hash
+ $self->{'pending_connections'} = {}; # hash (of [] arrays)
+ $self->{'ordpend_connections'} = []; # array
+ $self->{'failed_connections'} = {}; # hash
+
+ # duplicates
+ $self->{'seen_request'} = {};
+
+ # select objects for reading & writing
+ $self->{'select_in'} = IO::Select->new();
+ $self->{'select_out'} = IO::Select->new();
+
+ $self;
+}
+
+=item $ua->redirect ( $ok )
+
+Changes the default value for permitting Parallel::UserAgent to follow
+redirects and authentication-requests. The standard value is 'true'.
+
+See C<$ua->register> for how to change the behaviour for particular
+requests only.
+
+=cut
+
+sub redirect {
+ my $self = shift;
+ LWP::Debug::trace("($_[0])");
+ $self->{'handle_response'} = $_[0] if defined $_[0];
+}
+
+=item $ua->nonblock ( $ok )
+
+Per default, LWP::Parallel will connect to a site using a blocking call. If
+you want to speed this step up, you can try the new non-blocking version of
+the connect call by setting $ua->nonblock to 'true'.
+The standard value is 'false' (although this might change in the future if
+nonblocking connects turn out to be stable enough.)
+
+=cut
+
+sub nonblock {
+ my $self = shift;
+ LWP::Debug::trace("($_[0])");
+ $self->{'nonblock'} = $_[0] if defined $_[0];
+}
+
+
+=item $ua->duplicates ( $ok )
+
+Changes the default value for permitting Parallel::UserAgent to ignore
+duplicate requests. The standard value is 'false'.
+
+=cut
+
+sub duplicates {
+ my $self = shift;
+ LWP::Debug::trace("($_[0])");
+ $self->{'handle_duplicates'} = $_[0] if defined $_[0];
+}
+
+=item $ua->in_order ( $ok )
+
+Changes the default value to restricting Parallel::UserAgent to
+connect to the registered sites in the order they were registered. The
+default value FALSE allows Parallel::UserAgent to make the connections
+in an apparently random order.
+
+=cut
+
+sub in_order {
+ my $self = shift;
+ LWP::Debug::trace("($_[0])");
+ $self->{'handle_in_order'} = $_[0] if defined $_[0];
+}
+
+=item $ua->remember_failures ( $yes )
+
+If set to one, enables ParalleUA to ignore requests or connections to
+sites that it failed to connect to before during this "run". If set to
+zero (the dafault) Parallel::UserAgent will try to connect to every
+single URL you registered, even if it constantly fails to connect to a
+particular site.
+
+=cut
+
+sub remember_failures {
+ my $self = shift;
+ LWP::Debug::trace("($_[0])");
+ $self->{'remember_failures'} = $_[0] if defined $_[0];
+}
+
+=item $ua->max_hosts ( $max )
+
+Changes the maximum number of locations accessed in parallel. The
+default value is 7.
+
+Note: Although it says 'host', it really means 'netloc/server'! That
+is, multiple server on the same host (i.e. one server running on port
+80, the other one on port 6060) will count as two 'hosts'.
+
+=cut
+
+sub max_hosts {
+ my $self = shift;
+ LWP::Debug::trace("($_[0])");
+ $self->{'max_hosts'} = $_[0] if defined $_[0];
+}
+
+=item $ua->max_req ( $max )
+
+Changes the maximum number of requests issued per host in
+parallel. The default value is 5.
+
+=cut
+
+sub max_req {
+ my $self = shift;
+ LWP::Debug::trace("($_[0])");
+ $self->{'max_req'} = $_[0] if defined $_[0];
+}
+
+=item $ua->register ( $request [, $arg [, $size [, $redirect_ok]]] )
+
+Registers the given request with the User Agent. In case of an error,
+a C<HTTP::Request> object containing the HTML-Error message is
+returned. Otherwise (that is, in case of a success) it will return
+undef.
+
+The C<$request> should be a reference to a C<HTTP::Request> object
+with values defined for at least the method() and url() attributes.
+
+C<$size> specifies the number of bytes Parallel::UserAgent should try
+to read each time some new data arrives. Setting it to '0' or 'undef'
+will make Parallel::UserAgent use the default. (8k)
+
+Specifying C<$redirect_ok> will alter the redirection behaviour for
+this particular request only. '1' or any other true value will force
+Parallel::UserAgent to follow redirects, even if the default is set to
+'no_redirect'. (see C<$ua->redirect>) '0' or any other false value
+should do the reverse. See LWP::UserAgent for using an object's
+C<requests_redirectable> list for fine-tuning this behavior.
+
+If C<$arg> is a scalar it is taken as a filename where the content of
+the response is stored.
+
+If C<$arg> is a reference to a subroutine, then this routine is called
+as chunks of the content is received. An optional C<$size> argument
+is taken as a hint for an appropriate chunk size. The callback
+function is called with 3 arguments: the data received this time, a
+reference to the response object and a reference to the protocol
+object. The callback can use the predefined constants C_ENDCON,
+C_LASTCON and C_ENDALL as a return value in order to influence pending
+and active connections. C_ENDCON will end this connection immediately,
+whereas C_LASTCON will inidicate that no further connections should be
+made. C_ENDALL will immediately end all requests and let the
+Parallel::UserAgent return from $pua->wait().
+
+If C<$arg> is omitted, then the content is stored in the response
+object itself.
+
+If C<$arg> is a C<LWP::Parallel::UserAgent::Entry> object, then this
+request will be registered as a follow-up request to this particular
+entry. This will not create a new entry, but instead link the current
+response (i.e. the reason for re-registering) as $response->previous
+to the new response of this request. All other fields are either
+re-initialized ($request, $fullpath, $proxy) or left untouched ($arg,
+$size). (This should only be use internally)
+
+LWP::Parallel::UserAgent->request also allows the registration of
+follow-up requests to existing requests, that required redirection or
+authentication. In order to do this, an Parallel::UserAgent::Entry
+object will be passed as the second argument to the call. Usually,
+this should not be used directly, but left to the internal
+$ua->handle_response method!
+
+=cut
+
+sub register {
+ my ($self, $request, $arg, $size, $redirect) = @_;
+ my $entry;
+
+ unless (ref($request) and $request->can('url')) {
+ Carp::carp "Can't use '$request' as an HTTP::Request object. Ignoring";
+ return LWP::UserAgent::_new_response($request, &HTTP::Status::RC_NOT_IMPLEMENTED,
+ "Unknown request type: '$request'");
+ }
+ LWP::Debug::debug("(".$request->url->as_string .
+ ", ". (defined $arg ? $arg : '[undef]') .
+ ", ". (defined $size ? $size : '[undef]') .
+ ", ". (defined $redirect ? $redirect : '[undef]') . ")");
+
+ my($failed_connections,$remember_failures,$handle_duplicates,
+ $previous_requests)= @{$self}{qw(failed_connections
+ remember_failures handle_duplicates previous_requests)};
+
+ my $response = HTTP::Response->new(0, '<empty response>');
+ # make sure our request gets stored within the response
+ # (usually this is done automatically by LWP in case of
+ # a successful connection, but we want to have this info
+ # available even when something goes wrong)
+ $response->request($request);
+
+ # so far Parallel::UserAgent can handle http, ftp, and file requests
+ # (anybody volunteering to porting the rest of the protocols?!)
+ unless ( $request->url->scheme eq 'http' or $request->url->scheme eq 'ftp'
+ # https suggestion by <mszabo at coralwave.com>
+ or $request->url->scheme eq 'https'
+ # file scheme implementation by
+ or $request->url->scheme eq 'file'
+ ){
+ $response->code (&HTTP::Status::RC_NOT_IMPLEMENTED);
+ $response->message ("Unknown Scheme: ". $request->url->scheme);
+ Carp::carp "Parallel::UserAgent can not handle '". $request->url->scheme .
+ "'-requests. Request ignored!";
+ # simulate immediate response from server
+ $self->on_failure ($request, $response);
+ return $response;
+ }
+
+ my $netloc = $self->_netloc($request->url);
+
+ # check if we already tried to connect to this location, and failed
+ if ( $remember_failures and $failed_connections->{$netloc} ) {
+ $response->code (&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message ("Server unavailable");
+ # simulate immediate response from server
+ $self->on_failure ($request, $response);
+ return $response;
+ }
+
+ # duplicates handling: check if we connected to same URL before
+ if ($handle_duplicates and $previous_requests->{$request->url->as_string}){
+ $response->code (&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message ("Duplicate Request: ". $request->url);
+ ## just ignore the request for now. if you want to simulate
+ ## immediate response from server, uncomment this line:
+ # $self->on_failure ($request, $response);
+ return $response;
+ }
+
+ # support two calling techniques: new request or follow-up
+ # 1) follow-up request:
+ if ( ref($arg) and ( ref($arg) eq "LWP::Parallel::UserAgent::Entry") ) {
+ # called with $entry object as first parameter.
+ # re-register new request with same entry:
+ $entry = $arg;
+ # link the previous response to our new response object
+ $response->previous($entry->response);
+ # and update the fields in our entry
+ $entry->request($request);
+ $entry->response($response);
+ # re-registered requests are put first in line (->unshift)
+ # and stored underneath the host they're accessing:
+ # (first make sure we have an array to push things onto)
+ $self->{'pending_connections'}->{$netloc} = []
+ unless $self->{'pending_connections'}->{$netloc};
+ unshift (@{$self->{'pending_connections'}->{$netloc}}, $entry);
+ unshift (@{$self->{'ordpend_connections'}}, $entry);
+
+ # 2) new request:
+ } else {
+ # called first time, create new entry object
+ $size ||= 8192;
+ $entry = LWP::Parallel::UserAgent::Entry->new( {
+ request => $request,
+ response => $response,
+ arg => $arg,
+ size => $size,
+ content_size => 0,
+ redirect_ok => $self->{'handle_response'},
+ } );
+ # if the user specified
+ $entry->redirect_ok($redirect) if defined $redirect;
+
+ # store new entry by request (only new entries)
+ $self->{'entries_by_requests'}->{$request} = $entry;
+
+ # new requests are put at the end
+ # (first make sure we have an array to push things onto)
+ $self->{'pending_connections'}->{$netloc} = []
+ unless $self->{'pending_connections'}->{$netloc};
+ push (@{$self->{'pending_connections'}->{$netloc}}, $entry);
+ push (@{$self->{'ordpend_connections'}}, $entry);
+ }
+ # duplicates handling: remember this entry
+ if ($handle_duplicates) {
+ $previous_requests->{$request->url->as_string} = $entry;
+ }
+
+ return;
+}
+
+# Create a netloc from the url or return an alias netloc for file: proto
+# Fix netloc for file: reqs to generic localhost.file - this can be changed
+# if necessary. Test to ensure url->scheme doesn't return undef (JB)
+sub _netloc {
+ my $self = shift;
+ my $url = shift;
+
+ my $netloc;
+ if ($url->scheme eq 'file') {
+ $netloc = 'localhost.file';
+ } else {
+ $netloc = $url->host_port; # eg www.cs.washington.edu:8001
+ }
+ $netloc;
+}
+
+
+# this method will take the pending entries one at a time and
+# decide wether we have enough bandwith (as specified by the
+# values in 'max_req' and 'max_hosts') to connect this request.
+# If not, the entry will stay on the stack (w/o changing the
+# order)
+sub _make_connections {
+ my $self = shift;
+ if ($self->{'handle_in_order'}) {
+ $self->_make_connections_in_order;
+ } else {
+ $self->_make_connections_unordered;
+ }
+}
+
+sub _make_connections_in_order {
+ my $self = shift;
+ LWP::Debug::trace('()');
+
+ my ($entry, @queue, %busy);
+ # get first entry from pending connections
+ while ( $entry = shift @{ $self->{'ordpend_connections'} } ) {
+ my $netloc = $self->_netloc($entry->request->url);
+ push (@queue, $entry), next if $busy{$netloc};
+ unless ($self->_check_bandwith($entry)) {
+ push (@queue, $entry);
+ $busy{$netloc}++;
+ };
+ };
+ # the un-connected entries form the new stack
+ $self->{'ordpend_connections'} = \@queue;
+}
+
+# unordered connections have the advantage that we do not have to
+# care about screwing up our list of pending connections. This will
+# speed up our iteration through the list
+sub _make_connections_unordered {
+ my $self = shift;
+ LWP::Debug::trace('()');
+
+ my ($entry, $queue, $netloc);
+ # check every host in sequence (use 'each' for better performance)
+ my %delete;
+ SERVER:
+ while (($netloc, $queue) = each %{$self->{'pending_connections'}}) {
+ # get first entry from pending connections at this host
+ ENTRY:
+ while ( $entry = shift @$queue ) {
+ unless ( $self->_check_bandwith($entry) ) {
+ # we don't have enough bandwith -- put entry back on queue
+ LWP::Debug::debug("Not enough bandwidth for request to $netloc");
+ unshift @$queue, $entry;
+ # we can stop here for this server
+ next SERVER;
+ }
+ } # of while ENTRY
+ # mark for deletion if we emptied the queue at this location
+ LWP::Debug::debug("Queue for $netloc contains ". scalar @$queue . " pending connections");
+ $delete{$netloc}++ unless scalar @$queue;
+ } # of while SERVER
+ # delete all netlocs that we completely handled
+ foreach (keys %delete) {
+ LWP::Debug::debug("Deleting queue for $_");
+ delete $self->{'pending_connections'}->{$_}
+ }
+}
+
+
+# this method checks the available bandwith and either connects
+# the request and returns 1, or, in case we didn't have enough
+# bandwith, returns undef
+sub _check_bandwith {
+ my ( $self, $entry ) = @_;
+ LWP::Debug::trace("($entry [".$entry->request->url."] )");
+
+ my($failed_connections, $remember_failures ) =
+ @{$self}{qw(failed_connections remember_failures)};
+
+ my ($request, $response) = ($entry->request, $entry->response);
+ my $url = $request->url;
+ my $netloc = $self->_netloc($url);
+
+ if ( $remember_failures and $failed_connections->{$netloc} ) {
+ $response->code (&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message ("Server unavailable");
+ # simulate immediate response from server
+ $self->on_failure ($request, $response, $entry);
+ return 1;
+ }
+
+ if ( $self->_active ($netloc) ) {
+ if ( $self->_req_available ( $url ) ) {
+ $self->on_connect ( $request, $response, $entry );
+ unless ( $self->_connect ( $entry ) ) {
+ # only increase connection count if _connect doesn't
+ # return error
+ $self->{'current_connections'}->{$netloc}++;
+ } else {
+ # calling ->on_failure is done within ->_connect
+ $self->{'failed_connections'}->{$netloc}++;
+ }
+ } else {
+ LWP::Debug::debug ("No open request-slots available");
+ return; };
+ } elsif ( $self->_hosts_available ) {
+ $self->on_connect ( $request, $response, $entry );
+ unless ( $self->_connect ( $entry ) ) {
+ # only increase connection count if _connect doesn't return error
+ $self->{'current_connections'}->{$netloc}++;
+ } else {
+ # calling ->on_failure is done within ->_connect
+ LWP::Debug::debug ("Failed connection for '" . $netloc ."'");
+ $self->{'failed_connections'}->{$netloc}++;
+ }
+ } else {
+ LWP::Debug::debug ("No open host-slots available");
+ return;
+ }
+ # indicate success here
+ return 1;
+}
+
+#
+# helper methods for _make_connections:
+#
+# number of active connections per netloc
+sub _active { shift->{'current_connections'}->{$_[0]}; };
+# request-slots available at netloc
+sub _req_available {
+ my ( $self, $url ) = @_;
+ $self->{'max_req'} > $self->_active($self->_netloc($url));
+};
+# host-slots available
+sub _hosts_available {
+ my $self = shift;
+ $self->{'max_hosts'} > scalar keys %{$self->{'current_connections'}};
+};
+
+
+# _connect will take the request of the given entry and try to connect
+# to the host specified in its url. It returns the response object in
+# case of error, undef otherwise.
+sub _connect {
+ my ($self, $entry) = @_;
+ LWP::Debug::trace("($entry [".$entry->request->url."] )");
+ local($SIG{"__DIE__"}); # protect against user defined die handlers
+
+ my ( $request, $response ) = $entry->get( qw(request response) );
+
+ my ($error_response, $proxy, $protocol, $timeout, $use_eval, $nonblock) =
+ $self->init_request ($request);
+ if ($error_response) {
+ # we need to manually set code and message of $response as well, so
+ # that we have the correct information in our $entry as well
+ $response->code ($error_response->code);
+ $response->message ($error_response->message);
+ $self->on_failure ($request, $error_response, $entry);
+ return $error_response;
+ }
+
+ my ($socket, $fullpath);
+
+ # figure out host and connect to site
+ if ($use_eval) {
+ eval {
+ ($socket, $fullpath) =
+ $protocol->handle_connect ($request, $proxy, $timeout, $nonblock );
+ };
+ if ($@) {
+ if ($@ =~ /^timeout/i) {
+ $response->code (&HTTP::Status::RC_REQUEST_TIMEOUT);
+ $response->message ('User-agent timeout');
+ } else {
+ # remove file/line number
+ # $@ =~ s/\s+at\s+\S+\s+line\s+\d+.*//s;
+ $response->code (&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message ($@);
+ }
+ }
+ } else {
+ # user has to handle any dies, usually timeouts
+ ($socket, $fullpath) =
+ $protocol->handle_connect ($request, $proxy, $timeout, $nonblock );
+ }
+
+ unless ($socket) {
+ # something went wrong. Explanation might be in second argument
+ unless ($response->code) {
+ # set response code and message accordingly (note: simply saying
+ # $response = $fullpath or $response = HTTP::Response->new would
+ # only affect the local copy of our response object. When using
+ # its ->code and ->message methods directly, we can affect the
+ # original instead!)
+ if (ref($fullpath) =~ /response/i) {
+ $response->code ($fullpath->code);
+ $response->message ($fullpath->message);
+ } else {
+ $response->code (&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message ("Failed on connect for unknown reasons");
+ }
+ }
+ }
+ # response should be empty, unless something went wrong
+ if ($response->code) {
+ $self->on_failure ($request, $response, $entry);
+ # should we remove $entry from 'entries_by_request' list here? no!
+ return $response;
+ } else {
+ # update $socket, $protocol, $fullpath and $proxy info
+ $entry->protocol($protocol);
+ $entry->fullpath($fullpath);
+ $entry->proxy($proxy);
+ $entry->cmd_socket($socket);
+ $self->{'entries_by_sockets'}->{$socket} = $entry;
+# LWP::Debug::debug ("Socket is $socket");
+ # last not least: register socket with (write-) Select object
+ $self->_add_out_socket($socket);
+ }
+
+ return;
+}
+
+# once we're done with a connection, we have to make sure that all
+# references to it's socket are removed, and that the counter for its
+# netloc is properly decremented.
+sub _remove_current_connection {
+ my ($self, $entry ) = @_;
+ LWP::Debug::trace("($entry [".$entry->request->url."] )");
+
+ $entry->cmd_socket(undef);
+ $entry->listen_socket(undef);
+
+ my $netloc = $self->_netloc($entry->request->url);
+ if ( $self->_active ($netloc) ) {
+ delete $self->{'current_connections'}->{$netloc}
+ unless --$self->{'current_connections'}->{$netloc};
+ } else {
+ # this is serious! better stop here
+ Carp::confess "No connections for '$netloc'";
+ }
+}
+
+=item $ua->on_connect ( $request, $response, $entry )
+
+This method should be overridden in an (otherwise empty) subclass in
+order to present customized messages for each connection attempted by
+the User Agent.
+
+=cut
+
+sub on_connect {
+ my ($self, $request, $response, $entry) = @_;
+ LWP::Debug::trace("(".$request->url->as_string.")");
+}
+
+=item $ua->on_failure ( $request, $response, $entry )
+
+This method should be overridden in an (otherwise empty) subclass in
+order to present customized messages for each connection or
+registration that failed.
+
+=cut
+
+sub on_failure {
+ my ($self, $request, $response, $entry) = @_;
+ LWP::Debug::trace("(".$request->url->as_string.")");
+}
+
+=item $ua->on_return ( $request, $response, $entry )
+
+This method should be overridden in an (otherwise empty) subclass in
+order to present customized messages for each request returned. If a
+callback function was registered with this request, this callback
+function is called before $pua->on_return.
+
+Please note that while $pua->on_return is a method (which should be
+overridden in a subclass), a callback function is NOT a method, and
+does not have $self as its first parameter. (See more on callbacks
+below)
+
+The purpose of $pua->on_return is mainly to provide messages when a
+request returns. However, you can also re-register follow-up requests
+in case you need them.
+
+If you need specialized follow-up requests depending on the request
+that just returend, use a callback function instead (which can be
+different for each request registered). Otherwise you might end up
+writing a HUGE if..elsif..else.. branch in this global method.
+
+=cut
+
+sub on_return {
+ my ($self, $request, $response, $entry) = @_;
+ LWP::Debug::trace("(".join (", ",$request->url->as_string,
+ (defined $response->code ?
+ $response->code : '[undef]'),
+ (defined $response->message ?
+ $response->message : '[undef]')) .")");
+}
+
+=item $us->discard_entry ( $entry )
+
+Completely removes an entry from memory, in case its output is not
+needed. Use this in callbacks such as C<on_return> or <on_failure> if
+you want to make sure an entry that you do not need does not occupy
+valuable main memory.
+
+=cut
+
+# proposed by Glenn Wood <glenn at savesmart.com>
+# additional fixes by Kirill http://www.en-directo.net/mail/kirill.html
+sub discard_entry {
+ my ($self, $entry) = @_;
+ LWP::Debug::trace("($entry)") if $entry;
+
+ # Entries are added to ordpend_connections in $self->register:
+ # push (@{$self->{'ordpend_connections'}}, $entry);
+ #
+ # the reason we even maintain this ordered list is that
+ # currently the user can change the "in_order" flag any
+ # time, even if we already started 'wait'ing.
+ my $entries = $self->{ordpend_connections};
+ @$entries = grep $_ != $entry, @$entries;
+
+ $entries = $self->{entries_by_requests};
+ delete @$entries{grep $entries->{$_} == $entry, keys %$entries};
+
+ $entries = $self->{entries_by_sockets};
+ delete @$entries{grep $entries->{$_} == $entry, keys %$entries};
+
+ return;
+}
+
+
+=item $ua->wait ( $timeout )
+
+Waits for available sockets to write to or read from. Will timeout
+after $timeout seconds. Will block if $timeout = 0 specified. If
+$timeout is omitted, it will use the Agent default timeout value.
+
+=cut
+
+sub wait {
+ my ($self, $timeout) = @_;
+ LWP::Debug::trace("($timeout)") if $timeout;
+
+ my $foobar;
+
+ $timeout = $self->{'timeout'} unless defined $timeout;
+
+ # shortcuts to in- and out-filehandles
+ my $fh_out = $self->{'select_out'};
+ my $fh_in = $self->{'select_in'};
+ my $fh_err; # ignore errors for now
+ my @ready;
+
+ my ($active, $pending);
+ ATTEMPT:
+ while ( $active = scalar keys %{ $self->{'current_connections'} } or
+ $pending = scalar ($self->{'handle_in_order'}?
+ @{ $self->{'ordpend_connections'} } :
+ keys %{ $self->{'pending_connections'} } ) ) {
+ # check select
+ if ( (scalar $fh_in->handles) or (scalar $fh_out->handles) ) {
+ LWP::Debug::debug("Selecting Sockets, timeout is $timeout seconds");
+ unless ( @ready = IO::Select->select ($fh_in, $fh_out,
+ undef, $timeout) ) {
+ #
+ # empty array, means that select timed out
+ LWP::Debug::trace('select timeout');
+ my ($socket);
+ # set all active requests to "timed out"
+ foreach $socket ($fh_in->handles ,$fh_out->handles) {
+ my $entry = $self->{'entries_by_sockets'}->{$socket};
+ delete $self->{'entries_by_sockets'}->{$socket};
+ unless ($entry->response->code) {
+ # moved the creation of the timeout response into the loop so that
+ # each entry gets its own response object (otherwise they'll all
+ # share the same request entry in there). thanks to John Salmon
+ # <john at thesalmons.org> for pointing this out.
+ my $response = HTTP::Response->new(&HTTP::Status::RC_REQUEST_TIMEOUT,
+ 'User-agent timeout (select)');
+ # don't overwrite an already existing response
+ $entry->response ($response);
+ $response->request ($entry->request);
+ # only count as failure if we have no response yet
+ $self->on_failure ($entry->request, $response, $entry);
+ } else {
+ my $res = $entry->response;
+ $res->message ($res->message . " (timeout)");
+ $entry->response ($res);
+ # thanks to Jonathan Feinberg <jdf at pobox.com> who finally
+ # reminded me that partial replies should trigger some sort
+ # of on_xxx callback as well. Let's try on_failure for now,
+ # unless people think that on_return is the right thing to
+ # call here:
+ $self->on_failure ($entry->request, $res, $entry);
+ }
+ $self->_remove_current_connection ( $entry );
+ }
+ # and delete from read- and write-queues
+ foreach $socket ($fh_out->handles) { $fh_out->remove($socket); }
+ foreach $socket ($fh_in->handles) { $fh_in->remove($socket); }
+ # continue processing -- pending requests might still work!
+ } else {
+ # something is ready for reading or writing
+ my ($ready_read, $ready_write, $error) = @ready;
+ my ($socket);
+
+ #
+ # WRITE QUEUE
+ #
+ foreach $socket (@$ready_write) {
+ my $so_err;
+ if ($socket->can("getsockopt")) { # we also might have IO::File!
+ ## check if there is any error (suggested by Mike Heller)
+ $so_err = $socket->getsockopt( Socket::SOL_SOCKET(),
+ Socket::SO_ERROR() );
+ LWP::Debug::debug( "SO_ERROR: $so_err" ) if $so_err;
+ }
+ # modularized this chunk so that it can be reused by
+ # POE::Component::Client::UserAgent
+ $self->_perform_write ($socket, $timeout) unless $so_err;
+
+ }
+
+ #
+ # READ QUEUE
+ #
+ foreach $socket (@$ready_read) {
+
+ # modularized this chunk so that it can be reused by
+ # POE::Component::Client::UserAgent
+ $self->_perform_read ($socket, $timeout);
+
+ }
+ } # of unless (@ready...) {} else {}
+
+ } else {
+ # when we are here, can we have active connections?!!
+ #(you might want to comment out this huge Debug statement if
+ #you're in a hurry. Then again, you wouldn't be using perl then,
+ #would you!?)
+ LWP::Debug::trace("\n\tCurrent Server: ".
+ scalar (keys %{$self->{'current_connections'}}) .
+ " [ ". join (", ",
+ map { $_, $self->{'current_connections'}->{$_} }
+ keys %{$self->{'current_connections'}}) .
+ " ]\n\tPending Server: ".
+ ($self->{'handle_in_order'}?
+ scalar @{$self->{'ordpend_connections'}} :
+ scalar (keys %{$self->{'pending_connections'}}) .
+ " [ ". join (", ",
+ map { $_,
+ scalar @{$self->{'pending_connections'}->{$_}} }
+ keys %{$self->{'pending_connections'}}) .
+ " ]") );
+ } # end of if $sel->handles
+ # try to make new connections
+ $self->_make_connections;
+ } # end of while 'current_connections' or 'pending_connections'
+
+ # should we delete fh-queues here?!
+ # or maybe re-initialize in case we register more requests later?
+ # in that case we'll have to make sure we don't try to reconnect
+ # to old sockets later - so we should create new Select-objects!
+ $self->_remove_all_sockets();
+
+ # allows the caller quick access to all issued requests,
+ # although some original requests may have been replaced by
+ # redirects or authentication requests...
+ return $self->{'entries_by_requests'};
+}
+
+# socket handling modularized in order to work better with POE
+# as suggested by Kirill http://www.en-directo.net/mail/kirill.html
+#
+sub _remove_out_socket {
+ my ($self,$socket) = @_;
+ $self->{select_out}->remove($socket);
+}
+
+sub _remove_in_socket {
+ my ($self,$socket) = @_;
+ $self->{select_in}->remove($socket);
+}
+
+sub _add_out_socket {
+ my ($self,$socket) = @_;
+ $self->{select_out}->add($socket);
+}
+
+sub _add_in_socket {
+ my ($self,$socket) = @_;
+ $self->{select_in}->add($socket);
+}
+
+sub _remove_all_sockets {
+ my ($self) = @_;
+ $self->{select_in} = IO::Select->new();
+ $self->{select_out} = IO::Select->new();
+}
+
+sub _perform_write
+{
+ my ($self, $socket, $timeout) = @_;
+ LWP::Debug::debug('Writing to Sockets');
+ my $entry = $self->{'entries_by_sockets'}->{$socket};
+
+ my ( $request, $protocol, $fullpath, $arg, $proxy) =
+ $entry->get( qw(request protocol fullpath arg proxy) );
+
+ my ($listen_socket, $response);
+ if ($self->{'use_eval'}) {
+ eval {
+ ($listen_socket, $response) =
+ $protocol->write_request ($request,
+ $socket,
+ $fullpath,
+ $arg,
+ $timeout,
+ $proxy);
+ };
+ if ($@) {
+ # if our call fails, we might not have a $response object, so we
+ # have to create a new one here
+ if ($@ =~ /^timeout/i) {
+ $response = LWP::UserAgent::_new_response($request, &HTTP::Status::RC_REQUEST_TIMEOUT,
+ 'User-agent timeout (syswrite)');
+ } else {
+ # remove file/line number
+ # $@ =~ s/\s+at\s+\S+\s+line\s+\d+.*//s;
+ $response = LWP::UserAgent::_new_response($request, &HTTP::Status::RC_INTERNAL_SERVER_ERROR,
+ $@);
+ }
+ $entry->response ($response);
+ $self->on_failure ($request, $response, $entry);
+ }
+ } else {
+ # user has to handle any dies, usually timeouts
+ ($listen_socket, $response) =
+ $protocol->write_request ($request,
+ $socket,
+ $fullpath,
+ $arg,
+ $timeout,
+ $proxy);
+ }
+
+ if ($response and !$response->is_success) {
+ $entry->response($response);
+ $entry->response->request($request);
+ LWP::Debug::trace('Error while issuing request '.
+ $request->url->as_string);
+ } elsif ($response) {
+ # successful response already?
+ LWP::Debug::trace('Fast response for request '.
+ $request->url->as_string .
+ ' ['. length($response->content) .
+ ' bytes]');
+ $entry->response($response);
+ $entry->response->request($request);
+ my $content = $response->content;
+ $response->content(''); # clear content here, so that it
+ # can be properly processed by ->receive
+ unless ($request->method eq 'DELETE') { # JB
+ $protocol->receive_once($arg, $response, $content, $entry);
+ }
+ }
+ # one write is (should be?) enough
+ delete $self->{'entries_by_sockets'}->{$socket};
+ $self->_remove_out_socket($socket);
+
+ if (ref($listen_socket)) {
+ # now make sure we start reading from the $listen_socket:
+ # file existing entry under new (listen_)socket
+ $self->_add_in_socket ($listen_socket);
+ $entry->listen_socket($listen_socket);
+ $self->{'entries_by_sockets'}->{$listen_socket} = $entry;
+ } else {
+ # remove from current_connections
+ $self->_remove_current_connection ( $entry );
+ }
+
+ return;
+}
+
+sub _perform_read
+{
+ my ($self, $socket, $timeout) = @_;
+
+ LWP::Debug::debug('Reading from Sockets');
+ my $entry = $self->{'entries_by_sockets'}->{$socket};
+
+ my ( $request, $response, $protocol, $fullpath, $arg, $size) =
+ $entry->get( qw(request response protocol
+ fullpath arg size) );
+
+ my $retval;
+ if ($self->{'use_eval'}) {
+ eval {
+ $retval = $protocol->read_chunk ($response, $socket, $request,
+ $arg, $size, $timeout,
+ $entry);
+ };
+ if ($@) {
+ if ($@ =~ /^timeout/i) {
+ $response->code (&HTTP::Status::RC_REQUEST_TIMEOUT);
+ $response->message ('User-agent timeout (sysread)');
+ } else {
+ # remove file/line number
+ # $@ =~ s/\s+at\s+\S+\s+line\s+\d+.*//s;
+ $response->code (&HTTP::Status::RC_INTERNAL_SERVER_ERROR);
+ $response->message ($@);
+ }
+ $self->on_failure ($request, $response, $entry);
+ }
+ } else {
+ # user has to handle any dies, usually timeouts
+ $retval = $protocol->read_chunk ($response, $socket, $request,
+ $arg, $size, $timeout,
+ $entry);
+ }
+
+ # examine return value. $retval is either a positive
+ # number, indicating the number of bytes read, or
+ # '0' (for EOF), or a callback-function code (<0)
+
+ LWP::Debug::debug ("'$retval' = read_chunk from $entry (".
+ $request->url.")");
+
+ # call on_return method if it's the end of this request
+ unless ($retval > 0) {
+ my $command = $self->on_return ($request, $response, $entry);
+ $retval = $command if defined $command and $command < 0;
+
+ LWP::Debug::debug ("received '". (defined $command ? $command : '[undef]').
+ "' from on_return");
+
+ }
+
+ if ($retval > 0) {
+ # In this case, just update response entry
+ # $entry->response($response);
+ } else { # zero or negative, that means: EOF, C_LASTCON, C_ENDCON, C_ENDALL
+ # read_chunk returns 0 if we reached EOF
+ $self->_remove_in_socket($socket);
+ # use protocol dependent method to close connection
+ $entry->protocol->close_connection($entry->response, $socket,
+ $entry->request, $entry->cmd_socket);
+ # $socket->shutdown(2); # see "man perlfunc" & "man 2 shutdown"
+ close ($socket);
+ $socket = undef; # close socket
+
+ # remove from current_connections
+ $self->_remove_current_connection ( $entry );
+ # handle redirects and security if neccessary
+
+ if ($retval eq C_ENDALL) {
+ # should we clean up a bit? Remove Select-queues:
+ $self->_remove_all_sockets();
+ return $self->{'entries_by_requests'};
+ } elsif ($retval eq C_LASTCON) {
+ # just delete all pending connections
+ $self->{'pending_connections'} = {};
+ $self->{'ordpend_connections'} = [];
+ } else {
+ if ($entry->redirect_ok) {
+ $self->handle_response ($entry);
+ }
+ # pop off next pending_connection (if bandwith available)
+ $self->_make_connections;
+ }
+ }
+ return;
+}
+
+=item $ua->handle_response($request, $arg [, $size])
+
+Analyses results, handling redirects and security. This method may
+actually register several different, additional requests.
+
+This method should not be called directly. Instead, indicate for each
+individual request registered with C<$ua->register()> whether or not
+you want Parallel::UserAgent to handle redirects and security, or
+specify a default value for all requests in Parallel::UserAgent by
+using C<$ua->redirect()>.
+
+=cut
+
+# this should be mainly the old LWP::UserAgent->request, although the
+# beginning and end are different (gets all of its data via $entry
+# parameter!) Also, instead of recursive calls this uses
+# $ua->register now.
+
+sub handle_response
+{
+ my($self, $entry) = @_;
+ LWP::Debug::trace("-> ($entry [".$entry->request->url->as_string.'] )');
+
+ # check if we should process this response
+ # (maybe later - for now always check)
+
+ my ( $response, $request ) = $entry->get( qw( response request ) );
+
+ my $code = $response->code;
+
+ LWP::Debug::debug('Handling result: '.
+ (HTTP::Status::status_message($code) ||
+ "Unknown code $code"));
+
+ if ($code == &HTTP::Status::RC_MOVED_PERMANENTLY or
+ $code == &HTTP::Status::RC_MOVED_TEMPORARILY) {
+
+ # Make a copy of the request and initialize it with the new URI
+ my $referral = $request->clone;
+
+ # And then we update the URL based on the Location:-header.
+ my($referral_uri) = $response->header('Location');
+ {
+ # Some servers erroneously return a relative URL for redirects,
+ # so make it absolute if it not already is.
+ local $URI::ABS_ALLOW_RELATIVE_SCHEME = 1;
+ my $base = $response->base;
+ $referral_uri = $HTTP::URI_CLASS->new($referral_uri, $base)
+ ->abs($base);
+ }
+
+ $referral->url($referral_uri);
+ $referral->remove_header('Host');
+
+ # don't do anything unless we're allowed to redirect
+ return $response unless $self->redirect_ok($referral, $response); # fix by th. boutell
+
+ # Check for loop in the redirects
+ my $count = 0;
+ my $r = $response;
+ while ($r) {
+ if (++$count > 13 ||
+ $r->request->url->as_string eq $referral_uri->as_string) {
+ $response->header("Client-Warning" =>
+ "Redirect loop detected");
+ return $response;
+ }
+ $r = $r->previous;
+ }
+ # From: "Andrey A. Chernov" <ache at nagual.pp.ru>
+ $self->cookie_jar->extract_cookies($response)
+ if $self->cookie_jar;
+ # register follow up request
+ LWP::Debug::trace("<- (registering follow up request: $referral, $entry)");
+ return $self->register ($referral, $entry);
+
+ } elsif ($code == &HTTP::Status::RC_UNAUTHORIZED ||
+ $code == &HTTP::Status::RC_PROXY_AUTHENTICATION_REQUIRED
+ )
+ {
+ my $proxy = ($code == &HTTP::Status::RC_PROXY_AUTHENTICATION_REQUIRED);
+ my $ch_header = $proxy ? "Proxy-Authenticate" : "WWW-Authenticate";
+ my @challenge = $response->header($ch_header);
+ unless (@challenge) {
+ $response->header("Client-Warning" =>
+ "Missing Authenticate header");
+ # added the argument to header here (a guess at which header)
+ # because it dies if you pass no header https://rt.cpan.org/Ticket/Display.html?id=46821
+ LWP::Debug::trace("<- ($response [".$response->header('Client-Warning').'] )');
+ return $response;
+ }
+
+ require HTTP::Headers::Util;
+ CHALLENGE: for my $challenge (@challenge) {
+ $challenge =~ tr/,/;/; # "," is used to separate auth-params!!
+ ($challenge) = HTTP::Headers::Util::split_header_words($challenge);
+ my $scheme = lc(shift(@$challenge));
+ shift(@$challenge); # no value
+ $challenge = { @$challenge }; # make rest into a hash
+ for (keys %$challenge) { # make sure all keys are lower case
+ $challenge->{lc $_} = delete $challenge->{$_};
+ }
+
+ unless ($scheme =~ /^([a-z]+(?:-[a-z]+)*)$/) {
+ $response->header("Client-Warning" =>
+ "Bad authentication scheme '$scheme'");
+ # added the argument to header here (a guess at which header)
+ # because it dies if you pass no header https://rt.cpan.org/Ticket/Display.html?id=46821
+ LWP::Debug::trace("<- ($response [".$response->header('Client-Warning').'] )');
+ return $response;
+ }
+ $scheme = $1; # untainted now
+ my $class = "LWP::Authen::\u$scheme";
+ $class =~ s/-/_/g;
+
+ no strict 'refs';
+ unless (%{"$class\::"}) {
+ # try to load it
+ eval "require $class";
+ if ($@) {
+ if ($@ =~ /^Can\'t locate/) {
+ $response->header("Client-Warning" =>
+ "Unsupport authentication scheme '$scheme'");
+ } else {
+ $response->header("Client-Warning" => $@);
+ }
+ next CHALLENGE;
+ }
+ }
+ LWP::Debug::trace("<- authenticates");
+ return $class->authenticate($self, $proxy, $challenge, $response,
+ $request, $entry->arg, $entry->size);
+ }
+ # added the argument to header here (a guess at which header)
+ # because it dies if you pass no header https://rt.cpan.org/Ticket/Display.html?id=46821
+ LWP::Debug::trace("<- ($response [".$response->header('Client-Warning').'] )');
+ return $response;
+ }
+ LWP::Debug::trace("<- standard exit ($response)");
+ return $response;
+}
+
+# helper function for (simple_)request method.
+sub _single_request {
+ my $self = shift;
+ my $res;
+ if ( $res = $self->register (@_) ) {
+ return $res->error_as_HTML;
+ }
+ my $entries = $self->wait(5);
+ foreach (keys %$entries) {
+ my $response = $entries->{$_}->response;
+# $cookie_jar->extract_cookies($response) if $cookie_jar;
+ $response->header("Client-Date" => HTTP::Date::time2str(time));
+ return $response;
+ }
+}
+
+=item DEPRECATED $ua->deprecated_simple_request($request, [$arg [, $size]])
+
+This method simulated the behavior of LWP::UserAgent->simple_request.
+It was actually kinda overkill to use this method in
+Parallel::UserAgent, and it was mainly here for testing backward
+compatibility with the original LWP::UserAgent.
+
+The name has been changed to deprecated_simple_request in case you
+need it, but because it it no longer compatible with the most recent
+version of libwww, it will no longer run by default.
+
+The following
+description is taken directly from the corresponding libwww pod:
+
+$ua->simple_request dispatches a single WWW request on behalf of a
+user, and returns the response received. The C<$request> should be a
+reference to a C<HTTP::Request> object with values defined for at
+least the method() and url() attributes.
+
+If C<$arg> is a scalar it is taken as a filename where the content of
+the response is stored.
+
+If C<$arg> is a reference to a subroutine, then this routine is called
+as chunks of the content is received. An optional C<$size> argument
+is taken as a hint for an appropriate chunk size.
+
+If C<$arg> is omitted, then the content is stored in the response
+object itself.
+
+=cut
+
+# sub simple_request
+# (see LWP::UserAgent)
+
+# Took this out because with the new libwww it goes into deep
+# recursion. I believe calls that might have hit this will now
+# just go to LWP::UserAgent's implementation. If I comment
+# these out, tests pass; with them in, you get this deep
+# recursion. I'm assuming it's ok for them to just
+# go away, since they were deprecated many years ago after
+# all.
+sub deprecated_send_request {
+ my $self = shift;
+
+ $self->initialize;
+ my $redirect = $self->redirect(0);
+ my $response = $self->_single_request(@_);
+ $self->redirect($redirect);
+ return $response;
+}
+
+=item DEPRECATED $ua->deprecated_request($request, $arg [, $size])
+
+Previously called 'request' and included for compatibility testing with
+LWP::UserAgent. Every day usage was deprecated, and now you have to call it
+with the deprecated_request name if you want to use it (because an incompatibility
+was introduced with the newer versions of libwww).
+
+Here is what LWP::UserAgent has to say about it:
+
+Process a request, including redirects and security. This method may
+actually send several different simple reqeusts.
+
+The arguments are the same as for C<simple_request()>.
+
+=cut
+
+sub deprecated_request {
+ my $self = shift;
+
+ $self->initialize;
+ my $redirect = $self->redirect(1);
+ my $response = $self->_single_request(@_);
+ $self->redirect($redirect);
+ return $response;
+}
+
+=item $ua->as_string
+
+Returns a text that describe the state of the UA. Should be useful
+for debugging, if it would print out anything important. But it does
+not (at least not yet). Try using LWP::Debug...
+
+=cut
+
+sub as_string {
+ my $self = shift;
+ my @s;
+ push(@s, "Parallel UA: [$self]");
+ push(@s, " <Nothing in here yet, sorry>");
+ join("\n", @s, '');
+}
+
+1;
+
+#
+# Parallel::UserAgent specific methods
+#
+sub init_request {
+ my ($self, $request) = @_;
+ my($method, $url) = ($request->method, $request->url);
+ LWP::Debug::trace("-> ($request) [$method $url]");
+
+ # Check that we have a METHOD and a URL first
+ return LWP::UserAgent::_new_response($request, &HTTP::Status::RC_BAD_REQUEST, "Method missing")
+ unless $method;
+ return LWP::UserAgent::_new_response($request, &HTTP::Status::RC_BAD_REQUEST, "URL missing")
+ unless $url;
+ return LWP::UserAgent::_new_response($request, &HTTP::Status::RC_BAD_REQUEST, "URL must be absolute")
+ unless $url->scheme;
+
+
+ LWP::Debug::trace("$method $url");
+
+ # Locate protocol to use
+ my $scheme = '';
+
+ my $proxy = $self->_need_proxy($url);
+ if (defined $proxy) {
+ $scheme = $proxy->scheme;
+ } else {
+ $scheme = $url->scheme;
+ }
+ my $protocol;
+ eval {
+ # add Parallel extension here
+ $protocol = LWP::Parallel::Protocol::create($scheme);
+ };
+ if ($@) {
+ # remove file/line number
+ # $@ =~ s/\s+at\s+\S+\s+line\s+\d+.*//s;
+ return LWP::UserAgent::_new_response($request, &HTTP::Status::RC_NOT_IMPLEMENTED, $@)
+ }
+
+ # Extract fields that will be used below
+ my ($agent, $from, $timeout, $cookie_jar,
+ $use_eval, $parse_head, $max_size, $nonblock) =
+ @{$self}{qw(agent from timeout cookie_jar
+ use_eval parse_head max_size nonblock)};
+
+ # Set User-Agent and From headers if they are defined
+ $request->init_header('User-Agent' => $agent) if $agent;
+ $request->init_header('From' => $from) if $from;
+ $request->init_header('Range' => "bytes=0-$max_size") if $max_size;
+ $cookie_jar->add_cookie_header($request) if $cookie_jar;
+
+ # Transfer some attributes to the protocol object
+ $protocol->can('parse_head') ?
+ $protocol->parse_head($parse_head) :
+ $protocol->_elem('parse_head', $parse_head);
+ $protocol->max_size($max_size);
+
+ LWP::Debug::trace ("<- (undef".
+ ", ". (defined $proxy ? $proxy : '[undef]').
+ ", ". (defined $protocol ? $protocol : '[undef]').
+ ", ". (defined $timeout ? $timeout : '[undef]').
+ ", ". (defined $use_eval ? $use_eval : '[undef]').")");
+
+ (undef, $proxy, $protocol, $timeout, $use_eval, $nonblock);
+}
+
+=head1 ADDITIONAL METHODS
+
+=item $ua->use_alarm([$boolean])
+
+This function is not in use anymore and will display a warning when
+called and warnings are enabled.
+
+=cut
+
+sub use_alarm {
+ warn "The Parallel::UserAgent->use_alarm method is not available anymore.\n" if $^W;
+}
+
+=head1 Callback functions
+
+You can register a callback function. See LWP::UserAgent for details.
+
+=head1 BUGS
+
+Probably lots! This was meant only as an interim release until this
+functionality is incorporated into LWPng, the next generation libwww
+module (though it has been this way for over 2 years now!)
+
+Needs a lot more documentation on how callbacks work!
+
+=head1 SEE ALSO
+
+L<LWP::UserAgent>
+
+=head1 COPYRIGHT
+
+Copyright 1997-2004 Marc Langheinrich E<lt>marclang at cpan.org>
+
+This library is free software; you can redistribute it and/or
+modify it under the same terms as Perl itself.
+
+=cut
+
+__END__
diff --git a/lib/LWP/ParallelUA.pm b/lib/LWP/ParallelUA.pm
new file mode 100755
index 0000000..9bc327c
--- /dev/null
+++ b/lib/LWP/ParallelUA.pm
@@ -0,0 +1,59 @@
+# -*- perl -*-
+# $Id: ParallelUA.pm,v 1.6 2004/02/10 15:19:19 langhein Exp $
+
+package LWP::ParallelUA;
+use Exporter();
+use LWP::Parallel::UserAgent qw(:CALLBACK);
+
+require 5.004;
+ at ISA = qw(LWP::Parallel::UserAgent Exporter);
+ at EXPORT = qw();
+ at EXPORT_OK = @LWP::Parallel::UserAgent::EXPORT_OK;
+%EXPORT_TAGS = %LWP::Parallel::UserAgent::EXPORT_TAGS;
+
+1;
+
+__END__
+
+=head1 NAME
+
+LWP::ParallelUA - Parallel LWP::UserAgent
+
+=head1 SYNOPSIS
+
+ require LWP::ParallelUA;
+ $ua = LWP::ParallelUA->new();
+
+ (see description of LWP::Parallel::UserAgent)
+
+=head1 DESCRIPTION
+
+B<ParallelUA> is a simple frontend to the B<LWP::Parallel::UserAgent>
+module. It is here in order to maintain the compatibility with
+previous releases. However, in order to prevent the previous need for
+changing the original LWP sources, all extension files have been moved
+to the LWP::Parallel subtree.
+
+If you start from scratch, maybe you should start using LWP::Parallel
+and its submodules directly.
+
+See the L<LWP::Parallel::UserAgent> for the documentation on
+this module.
+
+=head1 AUTHOR
+
+Marc Langheinrich, marclang at cpan.org
+
+=head1 SEE ALSO
+
+L<LWP::Parallel::UserAgent>
+
+=head1 COPYRIGHT
+
+Copyright 1997-2004 Marc Langheinrich E<lt>marclang at cpan.org>
+
+This library is free software; you can redistribute it and/or modify
+it under the same terms as Perl itself.
+
+=cut
+
diff --git a/lib/LWP/RobotPUA.pm b/lib/LWP/RobotPUA.pm
new file mode 100755
index 0000000..15744ab
--- /dev/null
+++ b/lib/LWP/RobotPUA.pm
@@ -0,0 +1,59 @@
+# -*- perl -*-
+# $Id: RobotPUA.pm,v 1.6 2004/02/10 15:19:19 langhein Exp $
+
+package LWP::RobotPUA;
+use Exporter();
+use LWP::Parallel::RobotUA qw(:CALLBACK);
+
+require 5.004;
+ at ISA = qw(LWP::Parallel::RobotUA Exporter);
+ at EXPORT = qw();
+ at EXPORT_OK = @LWP::Parallel::RobotUA::EXPORT_OK;
+%EXPORT_TAGS = %LWP::Parallel::RobotUA::EXPORT_TAGS;
+
+1;
+
+__END__
+
+=head1 NAME
+
+LWP::RobotPUA - Parallel LWP::RobotUA
+
+=head1 SYNOPSIS
+
+ require LWP::RobotPUA;
+ $ua = new LWP::RobotPUA 'my-robot/0.1', 'me at foo.com';
+
+ (see description of LWP::Parallel::RobotUA)
+
+=head1 DESCRIPTION
+
+RobotPUA is a simple frontend to the LWP::Parallel::RobotUA
+module. It is here in order to maintain the compatibility with
+previous releases. However, in order to prevent the previous need for
+changing the original LWP sources, all extension files have been moved
+to the LWP::Parallel subtree.
+
+If you start from scratch, maybe you should start using LWP::Parallel
+and its submodules directly.
+
+See the L<LWP::Parallel::RobotUA> for the documentation on this
+module.
+
+=head1 AUTHOR
+
+Marc Langheinrich, marclang at cpan.org
+
+=head1 SEE ALSO
+
+L<LWP::Parallel::RobotUA>, L<LWP::Parallel::UserAgent>, L<LWP::RobotUA>
+
+=head1 COPYRIGHT
+
+Copyright 1997-2004 Marc Langheinrich E<lt>marclang at cpan.org>
+
+This library is free software; you can redistribute it and/or modify
+it under the same terms as Perl itself.
+
+=cut
+
diff --git a/t/TEST b/t/TEST
new file mode 100755
index 0000000..9bcbeb6
--- /dev/null
+++ b/t/TEST
@@ -0,0 +1,43 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+# This script runs Test::Harness on the tests found under the
+# "t" directory. (stolen and adapted from LWP)
+
+# First we check if we already are within the "t" directory
+unless (-d "local") {
+ # try to move into test directory
+ chdir "t" or die "Can't chdir: $!";
+
+ # fix all relative library locations
+ foreach (@INC) {
+ $_ = "../$_" unless m,^/,;
+ }
+}
+# Pick up the library files from the ../blib directory
+unshift(@INC, "../blib/lib", "../blib/arch");
+#print "@INC\n";
+
+
+use Test::Harness;
+$Test::Harness::verbose = shift
+ if $ARGV[0] and ($ARGV[0] =~ /^\d+$/ || $ARGV[0] eq "-v");
+
+my @tests;
+
+if (@ARGV) {
+ for (@ARGV) {
+ if (-d $_) {
+ push(@tests, <$_/*.t>);
+ } else {
+ $_ .= ".t" unless /\.t$/;
+ push(@tests, $_);
+ }
+ }
+} else {
+ @tests = (<local/*.t>, <robot/*.t>);
+ push(@tests, <live/*.t>) if -f "live/ENABLED";
+}
+
+runtests @tests;
diff --git a/t/live/ENABLED.off b/t/live/ENABLED.off
new file mode 100644
index 0000000..e69de29
diff --git a/t/live/jigsaw-auth-b.t b/t/live/jigsaw-auth-b.t
new file mode 100644
index 0000000..f4dd9c6
--- /dev/null
+++ b/t/live/jigsaw-auth-b.t
@@ -0,0 +1,51 @@
+print "1..3\n";
+
+use strict;
+use LWP::Parallel::UserAgent;
+
+my $ua = LWP::Parallel::UserAgent->new(keep_alive => 1);
+
+my $req = HTTP::Request->new(GET => "http://jigsaw.w3.org/HTTP/Basic/");
+
+my $res = $ua->request($req);
+
+#print $res->as_string;
+
+print "not " unless $res->code eq "401";
+print "ok 1\n";
+
+$req->authorization_basic('guest', 'guest');
+$res = $ua->request($req);
+
+#print $res->as_string;
+print "not " unless $res->code eq "200" && $res->content =~ /Your browser made it!/;
+print "ok 2\n";
+
+{
+ package MyUA;
+ use vars qw(@ISA);
+ @ISA = qw(LWP::Parallel::UserAgent);
+
+ my @try = (['foo', 'bar'], ['', ''], ['guest', ''], ['guest', 'guest']);
+
+ sub get_basic_credentials {
+ my($self,$realm, $uri, $proxy) = @_;
+ print "$realm/$uri/$proxy\n";
+ my $p = shift @try;
+ print join("/", @$p), "\n";
+ return @$p;
+ }
+
+}
+
+$ua = MyUA->new(keep_alive => 1);
+
+$req = HTTP::Request->new(GET => "http://jigsaw.w3.org/HTTP/Basic/");
+$res = $ua->request($req);
+
+#print $res->as_string;
+
+print "not " unless $res->content =~ /Your browser made it!/;
+# && $res->header("Client-Response-Num") == 5;
+print "ok 3\n";
+
diff --git a/t/live/jigsaw-auth-d.t b/t/live/jigsaw-auth-d.t
new file mode 100644
index 0000000..4149472
--- /dev/null
+++ b/t/live/jigsaw-auth-d.t
@@ -0,0 +1,33 @@
+print "1..1\n";
+
+use strict;
+use LWP::Parallel::UserAgent;
+
+{
+ package MyUA;
+ use vars qw(@ISA);
+ @ISA = qw(LWP::Parallel::UserAgent);
+
+ my @try = (['foo', 'bar'], ['', ''], ['guest', ''], ['guest', 'guest']);
+
+ sub get_basic_credentials {
+ my($self,$realm, $uri, $proxy) = @_;
+ print "$realm/$uri/$proxy\n";
+ my $p = shift @try;
+ print join("/", @$p), "\n";
+ return @$p;
+ }
+
+}
+
+my $ua = MyUA->new(keep_alive => 1);
+
+my $req = HTTP::Request->new(GET => "http://jigsaw.w3.org/HTTP/Digest/");
+my $res = $ua->request($req);
+
+#print $res->as_string;
+
+print "not " unless $res->content =~ /Your browser made it!/;
+# && $res->header("Client-Response-Num") == 5;
+print "ok 1\n";
+
diff --git a/t/live/jigsaw-md5.t b/t/live/jigsaw-md5.t
new file mode 100644
index 0000000..27b693a
--- /dev/null
+++ b/t/live/jigsaw-md5.t
@@ -0,0 +1,27 @@
+print "1..2\n";
+
+use strict;
+use LWP::Parallel::UserAgent;
+
+my $ua = LWP::Parallel::UserAgent->new(keep_alive => 1);
+
+my $req = HTTP::Request->new(GET => "http://jigsaw.w3.org/HTTP/h-content-md5.html");
+$req->header("TE", "deflate");
+
+my $res = $ua->request($req);
+
+use Digest::MD5 qw(md5_base64);
+print "not " unless $res->header("Content-MD5") eq md5_base64($res->content) . "==";
+print "ok 1\n";
+
+print $res->as_string;
+
+my $etag = $res->header("etag");
+$req->header("If-None-Match" => $etag);
+
+$res = $ua->request($req);
+print $res->as_string;
+
+print "not " unless $res->code eq "304";
+# && $res->header("Client-Response-Num") == 2;
+print "ok 2\n";
diff --git a/t/live/jigsaw-neg.t b/t/live/jigsaw-neg.t
new file mode 100644
index 0000000..7d992a2
--- /dev/null
+++ b/t/live/jigsaw-neg.t
@@ -0,0 +1,15 @@
+print "1..1\n";
+
+use strict;
+use LWP::Parallel::UserAgent;
+
+my $ua = LWP::Parallel::UserAgent->new(keep_alive => 1);
+
+my $req = HTTP::Request->new(GET => "http://jigsaw.w3.org/HTTP/neg");
+$req->header(Connection => "close");
+my $res = $ua->request($req);
+
+print $res->as_string;
+
+print "not " unless $res->code == 300;
+print "ok 1\n";
diff --git a/t/live/mozilla-ftp.t b/t/live/mozilla-ftp.t
new file mode 100644
index 0000000..cacf749
--- /dev/null
+++ b/t/live/mozilla-ftp.t
@@ -0,0 +1,32 @@
+print "1..4\n";
+
+use strict;
+use LWP::Parallel::UserAgent;
+
+my $ua = LWP::Parallel::UserAgent->new(keep_alive => 1);
+
+my $req = HTTP::Request->new(GET => "ftp://ftp.mozilla.org/pub/");
+
+my $res = $ua->request($req);
+
+#print $res->as_string;
+
+print "not " unless $res->code eq "200";
+print "ok 1\n";
+
+print "not " unless $res->header("Content-Type") =~ /ftp-dir-listing/;
+print "ok 2\n";
+
+print "not " unless $res->content =~ /README/;
+print "ok 3\n";
+
+$req = HTTP::Request->new(GET => "ftp://ftp.mozilla.org/pub/README");
+$res = $ua->request($req);
+
+# do not print the contents in a real test -- it contains 'not' :-)
+#print $res->as_string;
+#print "not " unless $res->header("Content-Type") =~ /text\/plain/;
+#print "ok 4\n";
+
+print "not " unless $res->content =~ /mirrors.html/;
+print "ok 4\n";
diff --git a/t/local/compatibility.t b/t/local/compatibility.t
new file mode 100644
index 0000000..e894bc5
--- /dev/null
+++ b/t/local/compatibility.t
@@ -0,0 +1,391 @@
+# vim: set ft=perl :
+$| = 1; # autoflush
+
+my $DEBUG = 0;
+my $CRLF = "\015\012";
+
+#use Data::Dump ();
+#use LWP::Debug qw(+debug +trace +conns);
+#use LWP::Debug qw(+debug);
+
+# First we create HTTP server for testing our http protocol
+# (this is stolen from the libwww t/local/http.t file)
+
+require IO::Socket; # make sure this work before we try to make a HTTP::Daemon
+
+# First we make ourself a daemon in another process
+my $D = shift || '';
+if ($D eq 'daemon') {
+
+ require HTTP::Daemon;
+
+ my $d = HTTP::Daemon->new(Timeout => 10, LocalAddr=>'localhost');
+
+ print "[$$] Pleased to meet you at: <URL:", $d->url, ">\n";
+
+ open(STDOUT, ">/dev/null");
+
+ while ($c = $d->accept) {
+ $r = $c->get_request;
+ if ($r) {
+ my $p = ($r->url->path_segments)[1];
+ my $func = lc("httpd_" . $r->method . "_$p");
+ if (defined &$func) {
+ &$func($c, $r);
+ } else {
+ $c->send_error(404);
+ }
+ } else {
+ print STDERR "Failed: Reason was '". $c->reason ."'\n";
+ }
+ $c = undef; # close connection
+ }
+ print STDERR "HTTP Server terminated\n" if $DEBUG;
+ exit 0;
+} else {
+ use Config;
+ print STDERR "[$$] i'm starting the daemon now!\n" if $DEBUG;
+ open(DAEMON, "$Config{'perlpath'} local/compatibility.t daemon |") or die "Can't exec daemon: $!";
+}
+
+print "1..20\n";
+
+my $greeting = <DAEMON>;
+$greeting =~ /(<[^>]+>)/;
+print STDERR "I am [$$], greeting is [$greeting] and right now dollar 1 is [$1]\n" if $DEBUG;
+my $url_from_daemon = $1;
+
+require URI;
+my $base = URI->new($url_from_daemon);
+sub url {
+ my $u = URI->new(@_);
+ $u = $u->abs($_[1]) if @_ > 1;
+ $u->as_string;
+}
+
+print "Will access HTTP server at $base\n";
+
+# do tests from here on
+
+#use LWP::Debug qw(+);
+
+require LWP::Parallel::UserAgent;
+require HTTP::Request;
+my $ua = new LWP::Parallel::UserAgent;
+$ua->agent("Mozilla/0.01 " . $ua->agent);
+$ua->from('marclang at cpan.org');
+
+#----------------------------------------------------------------
+print "\nLWP::UserAgent compatibility...\n";
+
+# ============
+print " - Bad request...\n";
+$req = new HTTP::Request GET => url("/not_found", $base);
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+
+$req->header(X_Foo => "Bar");
+$res = $ua->request($req);
+
+print "not " unless $res->is_error
+ and $res->code == 404
+ and $res->message =~ /not\s+found/i;
+print "ok 1\n";
+print STDERR "\tResponse was '".$res->code. " ". $res->message."'\n" if $DEBUG;
+
+# we also expect a few headers
+print "not " if !$res->server and !$res->date;
+print "ok 2\n";
+
+# =============
+print " - Simple echo...\n";
+sub httpd_get_echo
+{
+ my($c, $req) = @_;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c $req->as_string;
+}
+
+$req = new HTTP::Request GET => url("/echo/path_info?query", $base);
+$req->push_header(Accept => 'text/html');
+$req->push_header(Accept => 'text/plain; q=0.9');
+$req->push_header(Accept => 'image/*');
+$req->if_modified_since(time - 300);
+$req->header(Long_text => 'This is a very long header line
+which is broken between
+more than one line.');
+$req->header(X_Foo => "Bar");
+
+$res = $ua->request($req);
+#print $res->as_string;
+
+print "not " unless $res->is_success
+ and $res->code == 200 && $res->message eq "OK";
+print "ok 3\n";
+
+$_ = $res->content;
+ at accept = /^Accept:\s*(.*)/mg;
+
+print "not " unless /^From:\s*marclang\@cpan\.org$/m
+ and /^Host:/m
+ and @accept == 3
+ and /^Accept:\s*text\/html/m
+ and /^Accept:\s*text\/plain/m
+ and /^Accept:\s*image\/\*/m
+ and /^If-Modified-Since:\s*\w{3},\s+\d+/m
+ and /^Long-Text:\s*This.*broken between/m
+ and /^X-Foo:\s*Bar$/m
+ and /^User-Agent:\s*Mozilla\/0.01/m;
+print "ok 4\n";
+
+# ===========
+print " - Send file...\n";
+
+my $file = "test-$$.html";
+open(FILE, ">$file") or die "Can't create $file: $!";
+binmode FILE or die "Can't binmode $file: $!";
+print FILE <<EOT;
+<html><title>Test</title>
+<h1>This should work</h1>
+Now for something completely different, since it seems that
+the file transfer does work ok, right?
+EOT
+close(FILE);
+
+sub httpd_get_file
+{
+ my($c, $r) = @_;
+ my %form = $r->url->query_form;
+ my $file = $form{'name'};
+ $c->send_file_response($file);
+ unlink($file) if $file =~ /^test-/;
+}
+
+$req = new HTTP::Request GET => url("/file?name=$file", $base);
+$res = $ua->request($req);
+
+# under previous versions of the library a $res->title was
+# returned--that part of this test has been removed for
+# compatibility with the new library
+print "not " unless $res->is_success
+ and $res->content_type eq 'text/html'
+ and $res->content_length == 151
+ and $res->content =~ /different, since/;
+print "ok 5\n";
+
+# A second try on the same file, should fail because we unlink it
+$res = $ua->request($req);
+#print $res->as_string;
+print "not " unless $res->is_error
+ and $res->code == 404; # not found
+print "ok 6\n";
+
+# Then try to list current directory
+$req = new HTTP::Request GET => url("/file?name=.", $base);
+$res = $ua->request($req);
+#print $res->as_string;
+use Data::Dumper;
+print Dumper($res). "\nnot " unless $res->code == 501; # NYI
+print "ok 7\n";
+
+# =============
+print " - Check redirect...\n";
+sub httpd_get_redirect
+{
+ my($c) = @_;
+ $c->send_redirect("/echo/redirect");
+}
+
+$req = new HTTP::Request GET => url("/redirect/foo", $base);
+$res = $ua->request($req);
+#print $res->as_string;
+
+print "not " unless $res->is_success
+ and $res->content =~ m|/echo/redirect|;
+print "ok 8\n";
+print "not " unless $res->previous
+ and $res->previous->is_redirect
+ and $res->previous->code == 301;
+print "ok 9\n";
+
+# Lets test a redirect loop too
+sub httpd_get_redirect2 { shift->send_redirect("/redirect3/") }
+sub httpd_get_redirect3 { shift->send_redirect("/redirect4/") }
+sub httpd_get_redirect4 { shift->send_redirect("/redirect5/") }
+sub httpd_get_redirect5 { shift->send_redirect("/redirect6/") }
+sub httpd_get_redirect6 { shift->send_redirect("/redirect2/") }
+
+$req->url(url("/redirect2", $base));
+$res = $ua->request($req);
+#print $res->as_string;
+print "not " unless $res->is_redirect
+ and $res->header("Client-Warning") =~ /loop detected/i;
+print "ok 10\n";
+$i = 1;
+while ($res->previous) {
+ $i++;
+ $res = $res->previous;
+}
+# under the old library with the old "duplicated" methods (which are now
+# named with their old names preceded by "deprecated_") this chained
+# to a depth of 6. With the new library, and those methods
+# deprecated (search for 'sub deprecated_' in /LWP/Parallel/UserAgent.pm ),
+# it gives 8.
+print "not " unless ($i == 6 or $i == 8);
+print "ok 11\n";
+
+#----------------------------------------------------------------
+print "Check basic authorization...\n";
+sub httpd_get_basic
+{
+ my($c, $r) = @_;
+ #print STDERR $r->as_string;
+ my($u,$p) = $r->authorization_basic;
+ if (defined($u) && $u eq 'ok 12' && $p eq 'xyzzy') {
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain";
+ $c->send_crlf;
+ $c->send_crlf;
+ $c->print("$u\n");
+ } else {
+ $c->send_basic_header(401);
+ $c->print("WWW-Authenticate: Basic realm=\"libwww-perl\"\015\012");
+ $c->send_crlf;
+ }
+}
+
+{
+ package MyUA; @ISA=qw(LWP::Parallel::UserAgent);
+ sub get_basic_credentials {
+ my($self, $realm, $uri, $proxy) = @_;
+ if ($realm eq "libwww-perl" && $uri->rel($base) eq "basic") {
+ return ("ok 12", "xyzzy");
+ } else {
+ return undef;
+ }
+ }
+}
+$req = new HTTP::Request GET => url("/basic", $base);
+$res = MyUA->new->request($req);
+#print $res->as_string;
+
+print "not " unless $res->is_success;
+print $res->content;
+
+# Lets try with a $ua that does not pass out credentials
+$res = $ua->request($req);
+print "not " unless $res->code == 401;
+print "ok 13\n";
+
+# Lets try to set credentials for this realm
+$ua->credentials($req->url->host_port, "libwww-perl", "ok 12", "xyzzy");
+$res = $ua->request($req);
+print "not " unless $res->is_success;
+print "ok 14\n";
+
+# Then illegal credentials
+$ua->credentials($req->url->host_port, "libwww-perl", "user", "passwd");
+$res = $ua->request($req);
+print "not " unless $res->code == 401;
+print "ok 15\n";
+
+
+#----------------------------------------------------------------
+print "Check proxy...\n";
+sub httpd_get_proxy_http
+{
+ my($c,$r) = @_;
+ if ($r->method eq "GET" and
+ $r->url->scheme eq "http") {
+ $c->send_basic_header(200);
+ $c->send_crlf;
+ } else {
+ $c->send_error;
+ }
+}
+
+sub httpd_get_proxy_ftp
+{
+ my($c,$r) = @_;
+ if ($r->method eq "GET" and
+ $r->url->scheme eq "ftp") {
+ $c->send_basic_header(200);
+ $c->send_crlf;
+ } else {
+ $c->send_error;
+ }
+}
+
+#use LWP::Debug qw(+debug +trace +conns);
+
+$ua->proxy(ftp => $base);
+$req = new HTTP::Request GET => "ftp://ftp.perl.com/proxy_ftp";
+$res = $ua->request($req);
+#print $res->as_string;
+print "not " unless $res->is_success;
+print "ok 16\n";
+
+$ua->proxy(http => $base);
+$req = new HTTP::Request GET => "http://www.perl.com/proxy_http";
+$res = $ua->request($req);
+#print $res->as_string;
+print "not " unless $res->is_success;
+print "ok 17\n";
+
+$ua->proxy(http => '', ftp => '');
+
+#----------------------------------------------------------------
+print "Check POSTing...\n";
+sub httpd_post_echo
+{
+ my($c,$r) = @_;
+ $c->send_basic_header;
+ $c->print("Content-Type: text/plain");
+ $c->send_crlf;
+ $c->send_crlf;
+ $c->print($r->as_string);
+}
+
+$req = new HTTP::Request POST => url("/echo/foo", $base);
+$req->content_type("application/x-www-form-urlencoded");
+$req->content("foo=bar&bar=test");
+$res = $ua->request($req);
+#print $res->as_string;
+
+$_ = $res->content;
+print "not " unless $res->is_success
+ and /^Content-Length:\s*16$/mi
+ and /^Content-Type:\s*application\/x-www-form-urlencoded$/mi
+ and /^foo=bar&bar=test/m;
+print "ok 18\n";
+
+#----------------------------------------------------------------
+print "\nTerminating server...\n";
+sub httpd_get_quit
+{
+ my($c) = @_;
+ $c->send_error(503, "Bye, bye");
+ exit; # terminate HTTP server
+}
+$ua->initialize;
+$req = new HTTP::Request GET => url("/quit", $base);
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+}
+print "ok 19\n";
+
+$entries = $ua->wait(5);
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+
+ print "not " unless $res->code == 503 and $res->content =~ /Bye, bye/;
+ print "ok 20\n";
+}
diff --git a/t/local/file.t b/t/local/file.t
new file mode 100644
index 0000000..7742445
--- /dev/null
+++ b/t/local/file.t
@@ -0,0 +1,260 @@
+$| = 1; # autoflush
+
+$DEBUG = 0;
+
+# uncomment the following line if you want to run these tests from the command
+# line using the local version of Parallel::UserAgent (otherwise perl will take
+# the already installed version):
+# use lib ('./lib');
+
+print "1..4\n";
+
+require LWP::Parallel::UserAgent;
+require HTTP::Request;
+my $ua = new LWP::Parallel::UserAgent;
+$ua->agent("Mozilla/0.01 " . $ua->agent);
+$ua->from('marclang at cpan.org');
+
+use Cwd;
+my $pwd = getcwd;
+
+#---------------------------------------------------------------
+print "\nLWP::Parallel::UserAgent interface...";
+print "\nSingle bad request..\n";
+$req = new HTTP::Request GET => "file:$pwd/not_found";
+$req->header(X_Foo => "Bar");
+
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+}
+print "ok 1\n";
+
+my $entries = $ua->wait(5);
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+
+ print "not " unless $res->is_error
+ and $res->code == 404
+ and $res->message =~ /not\s+exist/i;
+
+ print "ok 2\n";
+}
+
+#----------------------------------------------------------------
+print "\nMultiple Requests...\n";
+
+# first five files from directory for testing
+opendir (DIR, $pwd) or die "Can't open $pwd: $!";
+my %files;
+while (defined ($file = readdir(DIR))) {
+ next unless (-f "$pwd/$file");
+ open (FILE, "$pwd/$file") or die "Can't open $pwd/$file: $!";
+ $files{$file} = join ('', <FILE>);
+ close (FILE);
+}
+
+$ua->initialize;
+for (0..10) { # read every file 10 times
+ foreach (keys %files) {
+ $req = new HTTP::Request GET => "file:$pwd/$_";
+ print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+ if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+ last;
+ }
+ }
+}
+print "ok 3\n";
+
+$entries = $ua->wait(5);
+foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+ my $url = $res->request->url;
+ my $file = $url->as_string;
+ $file =~ s/^.*\///;
+
+ print STDERR "Answer for '$url' was \"",
+ $res->code,": ", $res->message,"\"\n"
+ if $DEBUG;
+
+ unless ( $res->content eq $files{$file} ) {
+ print "not ";
+ last;
+ }
+}
+print "ok 4\n";
+
+__END__
+#----------------------------------------------------------------
+print "\nLarger number of requests (40)..\n";
+
+$ua->initialize;
+
+for (0..40) {
+ my $page = $i % 3;
+ $req = new HTTP::Request GET => url("/page$page", $base);
+ print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+ if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+ last;
+ }
+}
+print "ok 6\n";
+$i=0;
+$entries = $ua->wait(5);
+foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+ my $url = $res->request->url;
+ $url =~ /([0-9]+)$/;
+ my $num = $1;
+
+ print STDERR "Answer for '$url' was \n\t",
+ $res->code,": ", $res->message," \"", $res->content, "\"\n"
+ if $DEBUG;
+ unless ($res->content =~ /This is page $num/)
+ {
+ print STDERR "Oops: Answer ($i) for '$url' was \n\t",
+ $res->code,": ", $res->message," \"", $res->content, "\"\n";
+
+ print ("not ");
+ last;
+ }
+ $i++;
+}
+print "ok 7\n";
+
+#----------------------------------------------------------------
+sub httpd_get_echo
+{
+ my($c, $req) = @_;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c $req->as_string;
+}
+
+sub httpd_get_redirect
+{
+ my($c) = @_;
+ $c->send_redirect("/echo/redirect");
+}
+
+print "\nCheck redirect on/off...\n";
+
+$ua->initialize;
+$ua->redirect(1);
+
+$req = new HTTP::Request GET => url("/redirect/foo", $base);
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not ok 8\nnot ok 9\n";
+} else {
+ $entries = $ua->wait(5);
+ foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+ print "not " unless $res->is_success
+ and $res->content =~ m|/echo/redirect|;
+ print "ok 8\n";
+ print "not " unless $res->previous
+ and $res->previous->is_redirect
+ and $res->previous->code == 301;
+ print "ok 9\n";
+ last;
+ }
+}
+
+$ua->initialize;
+$ua->redirect(0);
+
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not ok 10\nnot ok 11\n";
+} else {
+ $entries = $ua->wait(5);
+ foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+ print "not " if $res->is_success
+ and $res->content =~ m|/echo/redirect|;
+ print "ok 10\n";
+ print "not " unless $res->code == 301;
+ print "ok 11\n";
+ last;
+ }
+}
+
+#----------------------------------------------------------------
+print "\nTesting ordered connections...\n";
+
+my $order_num = 0;
+my @order_history = ();
+sub httpd_get_num
+{
+ my($c, $req) = @_;
+ my $num = $req->url->fragment;
+ push @order_history, $num;
+ my $msg = "Request History: ". join (", ", @order_history) . "\n";
+
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+}
+
+package main;
+
+my $uao = new myPUA { 'handle_in_order' => 0 };
+
+for (0..40) {
+ $req = new HTTP::Request GET => url("/num?$_", $base);
+ print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+ if ( $res = $uao->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+ }
+}
+print "ok 12\n";
+
+$entries = $uao->wait(5);
+
+
+
+#----------------------------------------------------------------
+print "\nTerminating server...\n";
+sub httpd_get_quit
+{
+ my($c) = @_;
+ $c->send_error(503, "Bye, bye");
+ exit; # terminate HTTP server
+}
+$ua->initialize;
+$req = new HTTP::Request GET => url("/quit", $base);
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+}
+print "ok 13\n";
+
+$entries = $ua->wait(5);
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+
+ print "not " unless $res->code == 503 and $res->content =~ /Bye, bye/;
+ print "ok 14\n";
+}
+
diff --git a/t/local/http.t b/t/local/http.t
new file mode 100644
index 0000000..83d70cb
--- /dev/null
+++ b/t/local/http.t
@@ -0,0 +1,370 @@
+if ($^O eq "MacOS") {
+ print "1..0\n";
+ exit(0);
+}
+
+$| = 1; # autoflush
+
+$DEBUG = 0;
+$NONBLOCK = 0; # set to 1 to try out non-blocking connects (new in 2.51)
+#use LWP::Debug qw(+debug +trace +conns);
+
+
+# uncomment the following line if you want to run these tests from the command
+# line using the local version of Parallel::UserAgent (otherwise perl will take
+# the already installed version):
+# use lib ('./lib');
+
+# First we create HTTP server for testing our http protocol
+# (this is stolen from the LWP t/local/http.t file)
+
+require IO::Socket; # make sure this work before we try to make a HTTP::Daemon
+
+# First we make ourself a daemon in another process
+my $D = shift || '';
+if ($D eq 'daemon') {
+
+ require HTTP::Daemon;
+
+ my $d = new HTTP::Daemon Timeout => 10, LocalAddr => 'localhost';
+
+ print "Please to meet you at: <URL:", $d->url, ">\n";
+ open(STDOUT, ">/dev/null");
+
+ while ($c = $d->accept) {
+ $r = $c->get_request;
+ if ($r) {
+ my $p = ($r->url->path_segments)[1];
+ my $func = lc("httpd_" . $r->method . "_$p");
+ if (defined &$func) {
+ &$func($c, $r);
+ } else {
+ $c->send_error(404);
+ }
+ }
+ $c = undef; # close connection
+ }
+ print STDERR "HTTP Server terminated\n" if $DEBUG;
+ exit 0;
+} else {
+ use Config;
+ open(DAEMON, "$Config{'perlpath'} local/http.t daemon |") or die "Can't exec daemon: $!";
+}
+
+package myPUA;
+use LWP::Parallel::UserAgent qw(:CALLBACK);
+ at ISA = qw(LWP::Parallel::UserAgent);
+
+# redefine methods: on_connect gets called whenever we are about to
+# make a a connection
+
+$myPUA::order_history = [];
+sub on_connect {
+ my ($self, $request, $response, $entry) = @_;
+ my $url = $request->url->as_string;
+ $url =~ s/^.*\?//;
+ push @{$myPUA::order_history}, $url;
+}
+
+sub order_history {
+ my $history = shift if $_;
+ return $myPUA::order_history unless $history;
+ $myPUA::order_history = $history;
+}
+
+package main;
+
+print "1..15\n";
+
+my $greeting = <DAEMON>;
+$greeting =~ /(<[^>]+>)/;
+
+require URI;
+my $base = URI->new($1);
+sub url {
+ my $u = URI->new(@_);
+ $u = $u->abs($_[1]) if @_ > 1;
+ $u->as_string;
+}
+
+# find additional, locally installed web server for testing
+##$uat = new LWP::UserAgent;
+##$res = $uat->head("http://localhost:80");
+##print "not " if $res->code == 500; print "ok: ". $res->code . "\n";
+
+print "Will access HTTP server at $base\n";
+
+# do tests from here on
+
+#use LWP::Debug qw(+);
+
+require LWP::Parallel::UserAgent;
+require HTTP::Request;
+my $ua = new LWP::Parallel::UserAgent;
+$ua->agent("Mozilla/0.01 " . $ua->agent);
+$ua->from('marclang at cpan.org');
+$ua->nonblock($NONBLOCK);
+
+
+#---------------------------------------------------------------
+print "\nLWP::Parallel::UserAgent interface...";
+print "\nSingle bad request..\n";
+$req = new HTTP::Request GET => url("/not_found", $base);
+$req->header(X_Foo => "Bar");
+
+print STDERR "\tRegistering '".$req->url."'\n" if $::DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+}
+print "ok 1\n";
+
+my $entries = $ua->wait(5);
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+
+ print "not " unless $res->is_error
+ and $res->code == 404
+ and $res->message =~ /not\s+found/i;
+
+ print "ok 2\n";
+ print "not " if !$res->server and !$res->date;
+ print "ok 3\n";
+}
+
+#----------------------------------------------------------------
+print "\nMultiple Requests...\n";
+sub httpd_get_page0
+{
+ my($c) = @_;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c "This is page 0";
+}
+
+sub httpd_get_page1
+{
+ my($c) = @_;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c "This is page 1";
+}
+
+sub httpd_get_page2
+{
+ my($c) = @_;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c "This is page 2";
+}
+
+$ua->initialize;
+for $i (0..11) {
+ my $page = $i % 3;
+ $req = new HTTP::Request GET => url("/page$page", $base);
+ print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+ if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+ last;
+ }
+}
+print "ok 4\n";
+
+$entries = $ua->wait(5);
+foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+ my $url = $res->request->url;
+ $url =~ /([0-9]+)$/;
+ my $num = $1;
+
+ print STDERR "Answer for '$url' was \n\t",
+ $res->code,": ", $res->message," \"", $res->content, "\"\n"
+ if $DEBUG;
+
+ unless ( $res->content =~ /This is page $num/ ) {
+ print "not ";
+ last;
+ }
+}
+print "ok 5\n";
+
+#----------------------------------------------------------------
+print "\nLarger number of requests (40)..\n";
+
+$ua->initialize;
+
+for $i (0..40) {
+ my $page = $i % 3;
+ $req = new HTTP::Request GET => url("/page$page", $base);
+ print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+ if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+ last;
+ }
+}
+print "ok 6\n";
+$i=0;
+$entries = $ua->wait(5);
+foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+ my $url = $res->request->url;
+ $url =~ /([0-9]+)$/;
+ my $num = $1;
+
+ print STDERR "Answer for '$url' was \n\t",
+ $res->code,": ", $res->message," \"", $res->content, "\"\n"
+ if $DEBUG;
+ unless ($res->content =~ /This is page $num/)
+ {
+ print STDERR "Oops: Answer ($i) for '$url' was \n\t",
+ $res->code,": ", $res->message," \"", $res->content, "\"\n";
+
+ print ("not ");
+ last;
+ }
+ $i++;
+}
+print "ok 7\n";
+
+#----------------------------------------------------------------
+sub httpd_get_echo
+{
+ my($c, $req) = @_;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c $req->as_string;
+}
+
+sub httpd_get_redirect
+{
+ my($c) = @_;
+ $c->send_redirect("/echo/redirect");
+}
+
+print "\nCheck redirect on/off...\n";
+
+$ua->initialize;
+$ua->redirect(1);
+
+$req = new HTTP::Request GET => url("/redirect/foo", $base);
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not ok 8\nnot ok 9\n";
+} else {
+ $entries = $ua->wait(5);
+ foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+ print "not " unless $res->is_success
+ and $res->content =~ m|/echo/redirect|;
+ print "ok 8\n";
+ print "not " unless $res->previous
+ and $res->previous->is_redirect
+ and $res->previous->code == 301;
+ print "ok 9\n";
+ last;
+ }
+}
+
+$ua->initialize;
+$ua->redirect(0);
+
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not ok 10\nnot ok 11\n";
+} else {
+ $entries = $ua->wait(5);
+ foreach (keys %$entries) {
+ $res = $entries->{$_}->response;
+ print "not " if $res->is_success
+ and $res->content =~ m|/echo/redirect|;
+ print "ok 10\n";
+ print "not " unless $res->code == 301;
+ print "ok 11\n";
+ last;
+ }
+}
+
+#----------------------------------------------------------------
+print "\nTesting ordered connections...\n";
+
+my @req_history = ();
+sub httpd_get_num
+{
+ my($c, $req) = @_;
+ my $num = $req->url->as_string;
+ $num =~ s/^.*\?//;
+ push @req_history, $num;
+ my $msg = "Request History: ". join (", ", @req_history) . "\n";
+
+# sleep (int(rand(3))); # sleep some
+
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c $msg;
+}
+
+my $uao = new myPUA { 'handle_in_order' => 0 };
+
+for (0..40) {
+ $req = new HTTP::Request GET => url("/num?$_", $base);
+ print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+ if ( $res = $uao->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+ }
+}
+print "ok 12\n";
+
+$entries = $uao->wait(5);
+
+my @history = @{$uao->order_history()};
+for (0..40) {
+ print "not" unless $history[$_] == $_;
+}
+
+print "ok 13\n";
+
+#----------------------------------------------------------------
+print "\nTerminating server...\n";
+sub httpd_get_quit
+{
+ my($c) = @_;
+ $c->send_error(503, "Bye, bye");
+ exit; # terminate HTTP server
+}
+$ua->initialize;
+$req = new HTTP::Request GET => url("/quit", $base);
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+}
+print "ok 14\n";
+
+$entries = $ua->wait(5);
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+
+ print "not " unless $res->code == 503 and $res->content =~ /Bye, bye/;
+ print "ok 15\n";
+}
+
diff --git a/t/local/timeouts.t b/t/local/timeouts.t
new file mode 100644
index 0000000..19bf661
--- /dev/null
+++ b/t/local/timeouts.t
@@ -0,0 +1,249 @@
+$| = 1; # autoflush
+
+$DEBUG = 0;
+
+# uncomment the following line if you want to run these tests from the command
+# line using the local version of Parallel::UserAgent (otherwise perl will take
+# the already installed version):
+# use lib ('./lib');
+
+# First we create HTTP server for testing our http protocol
+# (this is stolen from the libwww t/local/http.t file)
+
+require IO::Socket; # make sure this works before we try to make a HTTP::Daemon
+
+# First we make ourself a daemon in another process
+my $D = shift || '';
+if ($D eq 'daemon')
+ {
+ # I am the Daemon
+ require HTTP::Daemon;
+
+ my $d = new HTTP::Daemon Timeout => 10, LocalAddr=>'localhost';
+
+ print "Please to meet you at: <URL:", $d->url, ">\n";
+ open(STDOUT, ">/dev/null");
+
+ my $slave;
+ &handle_connection($slave) while $slave = $d->accept;
+ print STDERR "HTTP Server terminated\n" if $DEBUG;
+ exit 0;
+ } else {
+ # I am the testing program
+ use Config;
+ open(DAEMON, "$Config{'perlpath'} local/timeouts.t daemon |") or die "Can't exec daemon: $!";
+ }
+
+sub handle_connection {
+ my $connection = shift; # HTTP::Daemon::ClientConn
+
+ my $pid = fork;
+ if ($pid) { # spawn OK, and I'm the parent
+ close $connection;
+ return;
+ }
+ ## spawn failed, or I'm a good child
+ my $request = $connection->get_request;
+ if (defined($request)) {
+ my $p = ($request->url->path_segments)[1];
+ my $func = lc("httpd_" . $request->method . "_$p");
+ if (defined &$func) {
+ &$func($connection, $request);
+ } else {
+ $connection->send_error(404);
+ }
+ close $connection;
+ $connection = undef; # close connection
+ }
+ exit 0 if defined $pid; # exit if I'm a good child with a good parent
+}
+
+# This is the testing script
+
+print "1..25\n";
+
+my $greeting = <DAEMON>;
+$greeting =~ /(<[^>]+>)/;
+
+require URI;
+my $base = URI->new($1);
+sub url {
+ my $u = URI->new(@_);
+ $u = $u->abs($_[1]) if @_ > 1;
+ $u->as_string;
+}
+
+print "Will access HTTP server at $base\n";
+
+# do tests from here on
+
+#use LWP::Debug qw(+);
+
+require LWP::Parallel::UserAgent;
+require HTTP::Request;
+my $ua = new LWP::Parallel::UserAgent;
+$ua->agent("Mozilla/0.01 " . $ua->agent);
+$ua->from('marclang at cpan.org');
+
+#----------------------------------------------------------------
+print "\n - Checking Timeouts:\n";
+sub httpd_get_timeout
+{
+ my($c) = @_;
+ sleep(4); # do not answer for 4 seconds;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c "This page took 4 seconds";
+}
+
+$ua->initialize;
+print " * for single request..\n";
+$req = new HTTP::Request GET => url("/timeout", $base);
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+}
+print "ok 1\n";
+
+$entries = $ua->wait(2); # be impatient
+
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+
+ print "not " unless $res->is_error
+ and $res->code == 408 # timeout
+ and $res->message =~ /timeout/i;
+
+ print "ok 2\n";
+}
+
+#----------------------------------------------------------------
+$ua->initialize;
+print " * for multiple requests...\n";
+
+$req = new HTTP::Request GET => url("/timeout", $base);
+my $i;
+for $i (0..19) {
+ print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+ if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+ last;
+ }
+}
+print "ok 3\n";
+
+$entries = $ua->wait(2); # be impatient
+
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+
+ print "not " unless $res->is_error
+ and $res->code == 408 # timeout
+ and $res->message =~ /timeout/i;
+
+ print "ok 4\n";
+}
+
+
+#----------------------------------------------------------------
+$ua->initialize;
+print " * for mixed requests...\n";
+sub httpd_get_notimeout
+{
+ my($c) = @_;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c "This page took no time!";
+}
+
+sub httpd_get_sometimeout
+{
+ my($c) = @_;
+ $c->send_basic_header(200);
+ print $c "Content-Type: text/plain\015\012";
+ $c->send_crlf;
+ print $c "This page took no time to send, but 4 seconds to close";
+ sleep(4); # do not answer for 4 seconds;
+}
+
+my @kind = ("", "no", "some");
+for $i (0..17) {
+ my $page = $i % 3;
+ $req = new HTTP::Request GET => url("/". $kind[$page] . "timeout", $base);
+ print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+ if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not";
+ last;
+ }
+}
+print "ok 5\n";
+
+$entries = $ua->wait(2); # be impatient
+$i = 6;
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+ if ($res->request->url =~ /notimeout/) {
+ print "not " unless $res->code == 200
+ and $res->message !~ /timeout/i;
+ } elsif ($res->request->url =~ /sometimeout/) {
+ print "not " unless $res->code == 200
+ and $res->message =~ /timeout/i;
+ } else {
+ print "not " unless $res->is_error
+ and $res->code == 408 # timeout
+ and $res->message =~ /timeout/i;
+ }
+ print "ok ", $i++, "\n";
+}
+
+#----------------------------------------------------------------
+print "\nTerminating server...\n";
+sub httpd_get_quit
+{
+ my($c) = @_;
+ $c->send_error(503, "Bye, bye");
+ exit; # terminate HTTP server (does not work anymore since we're forking)
+}
+$ua->initialize;
+$req = new HTTP::Request GET => url("/quit", $base);
+print STDERR "\tRegistering '".$req->url."'\n" if $DEBUG;
+if ( $res = $ua->register ($req) ) {
+ print STDERR $res->error_as_HTML;
+ print "not ";
+}
+print "ok ", $i++, "\n";
+
+$entries = $ua->wait();
+foreach (keys %$entries) {
+ # each entry available under the url-string of their request contains
+ # a number of fields. The most important are $entry->request and
+ # $entry->response.
+ $res = $entries->{$_}->response;
+ print STDERR "Answer for '",$res->request->url, "' was \t",
+ $res->code,": ", $res->message,"\n" if $DEBUG;
+
+ print "not " unless $res->code == 503 and $res->content =~ /Bye, bye/;
+ print "ok ", $i++, "\n";
+}
+
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/liblwp-parallel-perl.git
More information about the debian-med-commit
mailing list