This document discusses using an HTTP proxy to load specific web pages for testing purposes. It explains that many web pages contain resources from multiple domains that cannot be saved locally. An HTTP proxy can be used to intercept requests and redirect local URLs to a test server, while passing through external URLs to the actual web server. The document provides code examples for setting up an HTTP proxy using HTTP::Proxy and modifying the LWP user agent to handle local and remote URLs differently. Using this approach allows a test loop to load repeatable web page content from both local and external sources.
Selenium sandwich-3: Being where you aren't.
1. Selenium Sandwich Part 3: What you aren't
Steven Lembark
Workhorse Computing
lembark@wrkhors.com
2. What is a Selenium Sandwich?
Tasty!!!
No really...
3. What is a Selenium Sandwich?
Last time we saw how to combine Selenium and Plack.
Selenium calls a page.
Plack returns a specific response.
Catch: You can't get there from here.
Or you can, which is the problem.
5. Getting to the server
Q: How do we get a specific page loaded?
Say a Google map, Yelp search, or *aaS dashboard?
A: Load the page from a server?
What about our static content?
7. Locally sourced
You want to test a Google page.
How?
Save it locally?
Only if you want to save all of it.
9. Trucked in
Q: How many URLs does it take to make a Google page?
A: Lots.
Banners, logos, JS libs, Java libs, ads...
Many are dynamic: they cannot be saved.
12. Relative paths
Many URLs are relative.
They recycle the scheme+host+port:
http://localhost:24680/foobar.
http://localhost:24680/<everything else>
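To see why relative links follow the page back to wherever it was loaded from, here is a minimal sketch using the URI module. The link list is hypothetical; only the base URL comes from the slide.

```perl
use strict;
use warnings;
use URI;

# The page the browser believes it loaded (host and port from the slide).
my $base = 'http://localhost:24680/foobar';

# Hypothetical links as they might appear in the page source.
my @links = qw( /static/logo.png js/app.js https://example.com/lib.js );

# new_abs resolves each link against the base, as a browser would.
my @resolved = map { URI->new_abs( $_, $base )->as_string } @links;

print "$_\n" for @resolved;
```

The two relative links inherit localhost:24680; only the absolute one still points at the remote host.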
13. Relative paths
Need to ask locally for a remote page.
With the browser having no idea where it came from.
In other words: We need a proxy.
14. HTTP Proxying
Normally for security or content filtering.
Or avoiding security and content filtering.
How?
15. Explicit proxy
Configure browser.
It asks the proxy for everything.
Proxy pulls content, returns it.
Proxy decides which content goes to test server.
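As a stand-in for the browser configuration, this sketch points an LWP::UserAgent client at an explicit proxy. The proxy address reuses the port from the other slides and is an assumption; a Selenium-driven browser would be configured through its own proxy settings instead.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumed address of the test proxy.
my $proxy_url = 'http://localhost:24680/';

my $client = LWP::UserAgent->new;

# Every http request from this client is now sent to the proxy,
# which decides whether to fetch it remotely or hand it to the test server.
$client->proxy( [ qw( http ) ], $proxy_url );

print 'http goes via: ', $client->proxy( 'http' ), "\n";
```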
16. HTTP::Proxy
Run as a daemon.
User filters.
LWP as back-end for fetching.
Slow but reliable...
17. Basic proxy setup
Grab a port...
and go!
use HTTP::Proxy;
my $proxy = HTTP::Proxy->new( port => 24680 );
# or...
my $proxy = HTTP::Proxy->new;
$proxy->port( 24680 );
# loop forever
$proxy->start;
18. Initializing HTTP::Proxy
Base class supplies “new”.
Derived class provides its own “init”.
package Mine;
use parent qw( HTTP::Proxy );
my $src_dir = '';
sub init
{
# @args == whatever was passed to new
# in this case a path.
my ( undef, %argz ) = @_;
# default to the current directory, then check that it exists.
$src_dir = $argz{ src_dir } || '.';
-d $src_dir
or die "Missing src_dir '$src_dir' in Mine";
...
}
19. Adding filters
HTTP::Proxy supports request and response filters.
Request filters modify outgoing content.
Response filters hack what comes back.
Our trick is to only filter some of it.
20. Four ways to filter content
request-headers request-body
response-headers response-body
Filters go onto a stack:
$proxy->push_filter
(
response => $filter # or request => ...
);
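A minimal sketch of the whole push, assuming the closure-based HTTP::Proxy::BodyFilter::simple helper is acceptable in place of a full filter class. The "mime" criterion limits the filter to HTML replies; the substitution is only a placeholder.

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::BodyFilter::simple;

my $proxy = HTTP::Proxy->new( port => 24680 );

# The closure receives the same arguments as a filter() method;
# the second one is a reference to the current chunk of body text.
my $filter = HTTP::Proxy::BodyFilter::simple->new
(
    sub
    {
        my ( $self, $dataref ) = @_;
        $$dataref =~ s/PERL/Perl/g;
    }
);

# Only text/html responses pass through this filter.
$proxy->push_filter
(
    mime     => 'text/html',
    response => $filter,
);

# $proxy->start;    # uncomment to serve requests (loops forever)
```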
21. Massage your body
package MyFilter;
use base qw( HTTP::Proxy::BodyFilter );
sub filter
{
# modify content in the reply
my
( $self, $dataref, $message, $protocol, $buffer )
= @_;
$$dataref =~ s/PERL/Perl/g;
}
1
__END__
22. Fix your head
package MyFilter;
use base qw( HTTP::Proxy::HeaderFilter );
# change User-Agent header in all requests
sub filter
{
my ( $self, $headers, $message ) = @_;
$message->headers->header
( User_Agent => 'MyFilter/1.0' );
...
}
23. Have to hack the request
Change:
https://whatever
to:
http://localhost:test_port/...
Or pass through to remote server.
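The rewrite itself can be sketched as a small function over URI objects. The list of local URLs, the reroute name, and the example.com addresses are hypothetical; the test server location reuses the port from the other slides.

```perl
use strict;
use warnings;
use URI;

# Assumed test server location and the URLs it should answer for.
my %test_server = ( scheme => 'http', host => 'localhost', port => 24680 );
my %local       = map { $_ => 1 } qw( https://example.com/dashboard );

sub reroute
{
    my $uri = URI->new( shift );

    # Anything not registered passes through to the remote server.
    $local{ $uri->canonical }
        or return $uri;

    # Registered URLs keep their path but change scheme, host and port.
    $uri->scheme( $test_server{ scheme } );
    $uri->host  ( $test_server{ host   } );
    $uri->port  ( $test_server{ port   } );

    $uri
}

print reroute( 'https://example.com/dashboard' ), "\n";
print reroute( 'https://example.com/other' ), "\n";
```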
25. Timing is everything
Modifying the response is too late.
That leaves the request or agent.
Request filters can easily modify headers or body.
Not where the request goes.
That leaves the agent.
27. Secret Agents
One choice is a new HTTP::Proxy subclass (is-a).
Or replacing the agent (has-a).
For now let's try the agent.
29. Wrapping LWP::UserAgent
Anything LWP does, we check first.
Any path we know goes to test.
Any we don't know goes to LWP.
Intercept all methods with AUTOLOAD.
Requires we have none of our own.
30. Generic wrapper
package Wrap::LWP;
# no parent class: inherited methods would bypass AUTOLOAD.
use Exporter::Proxy qw( wrap_lwp handle_locally );
our $wrap_lwp
= sub
{
my $lwp = shift or die ... ;
# bless a reference to the agent so that $$wrapper returns it.
my $wrapper = bless \$lwp, __PACKAGE__;
$wrapper
};
31. Generic wrapper
use Exporter::Proxy qw( wrap_lwp handle_locally );
use List::MoreUtils qw( uniq );
our @localz = ();
our $handle_locally
= sub
{
# list of URLs is on the stack.
# could be literals, regexen, objects.
# lacking smart match, use if-blocks.
@localz = uniq @localz, @_;
return
};
32. Generic wrapper
our $AUTOLOAD = '';
AUTOLOAD
{
my ( $wrapper, $request ) = @_;
my $url = $request->url;
my $path = $url->path;
# %known is assumed to be a path lookup built from @localz.
if( exists $known{ $path } )
{
# redirect this to the test server
$url->scheme( 'http' );
$url->host ( 'localhost' );
$url->port ( 24680 );
}
...
33. Generic wrapper
# now re-dispatch this to the LWP object.
# this is the same for any wrapper.
# goto preserves the call stack (e.g., croak reports the real caller).
my $i = rindex $AUTOLOAD, ':';
my $name = substr $AUTOLOAD, 1+$i;
my $agent = $$wrapper;
my $handler = $agent->can( $name )
or die ... ;
splice @_, 0, 1, $agent;
goto $handler
}
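The re-dispatch can be tried without LWP at all. This sketch uses a hypothetical Inner class in place of LWP::UserAgent and a Wrapper package with nothing but AUTOLOAD; it also skips DESTROY, which AUTOLOAD would otherwise try to forward when the wrapper goes out of scope.

```perl
use strict;
use warnings;

package Inner;      # hypothetical stand-in for LWP::UserAgent

sub new   { bless {}, shift }
sub greet { my ( $self, $name ) = @_; "hello, $name" }

package Wrapper;    # no methods of its own, only AUTOLOAD

our $AUTOLOAD = '';

sub AUTOLOAD
{
    my $wrapper = $_[0];

    # strip the package: 'Wrapper::greet' becomes 'greet'.
    my $name = substr $AUTOLOAD, 1 + rindex( $AUTOLOAD, ':' );

    # object destruction lands here too; there is nothing to forward.
    return if $name eq 'DESTROY';

    my $inner   = $$wrapper;
    my $handler = $inner->can( $name )
        or die "Inner object cannot '$name'";

    # swap the wrapper for the wrapped object, then jump:
    # the handler sees the original caller, not AUTOLOAD.
    splice @_, 0, 1, $inner;
    goto &$handler;
}

package main;

my $inner   = Inner->new;
my $wrapper = bless \$inner, 'Wrapper';

print $wrapper->greet( 'world' ), "\n";
```

Any check on the arguments, such as the URL test on the previous slide, goes between looking up the handler and the goto.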
34. Using the wrapper
use Wrap::LWP;
use HTTP::Proxy;
$handle_locally->
(
'https://foo/bar',
'http://bletch/blort?bim="bam"'
);
my $proxy = HTTP::Proxy->new( ... );
my $wrapper = $wrap_lwp->( $proxy->agent );
$proxy->agent( $wrapper );
$proxy->start;
35. TMTOWTDI
AUTOLOAD can handle known sites.
Instead of modifying the URL: just deal with it.
Upside: Skip LWP for local content.
Downside: Proxy gets more complicated.
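A rough sketch of "just deal with it": build the reply in-process rather than rewriting the URL. The %canned table and the local_response name are hypothetical, and the sketch assumes the intercepted method is one that returns an HTTP::Response.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical canned content keyed by URL.
my %canned =
(
    'https://example.com/dashboard' => '<html>fixture</html>',
);

sub local_response
{
    my $request = shift;

    # unknown URL: return nothing so the caller falls through to LWP.
    my $body = $canned{ $request->uri } // return;

    my $reply = HTTP::Response->new
    (
        200, 'OK', [ 'Content-Type' => 'text/html' ], $body
    );
    $reply->request( $request );

    $reply
}

my $known = HTTP::Request->new( GET => 'https://example.com/dashboard' );
print local_response( $known )->content, "\n";
```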
36. Result
Known pages are handled locally.
Others are passed to the cloud.
Server & client have a repeatable sequence.
The test loop is closed.
37. So...
When you need to be who you're not: Use a proxy.
HTTP::Proxy gives control of request, reply, & agent.
Handling LWP is easy enough.
Which gives us a nice, wrapped sandwich.