Heritrix 3.1.0 Source Code Analysis (22)
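This article continues the analysis of the Heritrix 3.1.0 source code. I suspect the questions that remain cannot be settled in one or two articles, and I do not want to distort the structure of the problem for the sake of a quick explanation, so the analysis of how Heritrix 3.1.0 wraps the HttpClient component will probably be spread over several articles.
As we know, Heritrix 3.1.0 communicates with servers through a wrapped HttpClient component, which in turn wraps a Socket: data is written to the socket's output stream and received from its input stream.
So how does Heritrix 3.1.0 wrap HttpClient? (Heritrix 3.1.0 uses the older Apache version of the component.)
In the FetchHTTP processor there is a static initializer block that registers one socket factory for HTTP and one for HTTPS. (Both protocols run over TCP. The relationship between the two is not covered here; readers unfamiliar with it can consult a networking tutorial.)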
/**
* Register the http and https protocols.
*/
static {
Protocol.registerProtocol("http", new Protocol("http",
new HeritrixProtocolSocketFactory(), 80));
try {
ProtocolSocketFactory psf = new HeritrixSSLProtocolSocketFactory();
Protocol p = new Protocol("https", psf, 443);
Protocol.registerProtocol("https", p);
} catch (KeyManagementException e) {
e.printStackTrace();
} catch (KeyStoreException e) {
e.printStackTrace();
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
}
}
Both classes above, HeritrixProtocolSocketFactory and HeritrixSSLProtocolSocketFactory, implement HttpClient's ProtocolSocketFactory interface and are used to create client Socket objects. (HeritrixSSLProtocolSocketFactory implements ProtocolSocketFactory indirectly.)
The ProtocolSocketFactory interface (package org.apache.commons.httpclient.protocol) declares the methods that create Socket objects:
/**
* A factory for creating Sockets.
*
* <p>Both {@link java.lang.Object#equals(java.lang.Object) Object.equals()} and
* {@link java.lang.Object#hashCode() Object.hashCode()} should be overridden appropriately.
* Protocol socket factories are used to uniquely identify <code>Protocol</code>s and
* <code>HostConfiguration</code>s, and <code>equals()</code> and <code>hashCode()</code> are
* required for the correct operation of some connection managers.</p>
*
* @see Protocol
*
* @author Michael Becke
* @author <a href="mailto:mbowler@">Mike Bowler</a>
*
* @since 2.0
*/
public interface ProtocolSocketFactory {
/**
* Gets a new socket connection to the given host.
*
* @param host the host name/IP
* @param port the port on the host
* @param localAddress the local host name/IP to bind the socket to
* @param localPort the port on the local machine
*
* @return Socket a new socket
*
* @throws IOException if an I/O error occurs while creating the socket
* @throws UnknownHostException if the IP address of the host cannot be
* determined
*/
Socket createSocket(
String host,
int port,
InetAddress localAddress,
int localPort
) throws IOException, UnknownHostException;
/**
* Gets a new socket connection to the given host.
*
* @param host the host name/IP
* @param port the port on the host
* @param localAddress the local host name/IP to bind the socket to
* @param localPort the port on the local machine
* @param params {@link HttpConnectionParams Http connection parameters}
*
* @return Socket a new socket
*
* @throws IOException if an I/O error occurs while creating the socket
* @throws UnknownHostException if the IP address of the host cannot be
* determined
* @throws ConnectTimeoutException if socket cannot be connected within the
* given time limit
*
* @since 3.0
*/
Socket createSocket(
String host,
int port,
InetAddress localAddress,
int localPort,
HttpConnectionParams params
) throws IOException, UnknownHostException, ConnectTimeoutException;
/**
* Gets a new socket connection to the given host.
*
* @param host the host name/IP
* @param port the port on the host
*
* @return Socket a new socket
*
* @throws IOException if an I/O error occurs while creating the socket
* @throws UnknownHostException if the IP address of the host cannot be
* determined
*/
Socket createSocket(
String host,
int port
) throws IOException, UnknownHostException;
}
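The Javadoc above asks implementations to override equals() and hashCode(). As a minimal sketch of why that matters, the following uses only JDK collections. The names `StatelessFactory` and `IdentityFactory` are hypothetical, and the `HashSet` merely stands in for whatever keyed structure a connection manager might use.

```java
import java.util.HashSet;
import java.util.Set;

final class FactoryEqualityDemo {

    /** Hypothetical stateless factory: every instance is interchangeable. */
    static final class StatelessFactory {
        @Override
        public boolean equals(Object obj) {
            return obj != null && obj.getClass().equals(StatelessFactory.class);
        }

        @Override
        public int hashCode() {
            return StatelessFactory.class.hashCode();
        }
    }

    /** Hypothetical factory that keeps Object's identity-based equals/hashCode. */
    static final class IdentityFactory {
    }

    /** Counts the distinct keys produced by two fresh StatelessFactory instances. */
    static int distinctStateless() {
        Set<Object> keys = new HashSet<>();
        keys.add(new StatelessFactory());
        keys.add(new StatelessFactory());
        return keys.size();
    }

    /** Same experiment with identity-based equality. */
    static int distinctIdentity() {
        Set<Object> keys = new HashSet<>();
        keys.add(new IdentityFactory());
        keys.add(new IdentityFactory());
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctStateless()); // prints 1
        System.out.println(distinctIdentity());  // prints 2
    }
}
```

With class-based equality, two factory instances collapse to one key, so anything keyed by the factory treats them as the same protocol configuration.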
HeritrixProtocolSocketFactory implements the ProtocolSocketFactory interface above and is used for HTTP communication:
public class HeritrixProtocolSocketFactory implements ProtocolSocketFactory {
/**
* Constructor.
*/
public HeritrixProtocolSocketFactory() {
super();
}
@Override
public Socket createSocket(String host, int port, InetAddress localAddress,
int localPort) throws IOException, UnknownHostException {
// TODO Auto-generated method stub
return new Socket(host, port, localAddress, localPort);
}
@Override
public Socket createSocket(String host, int port, InetAddress localAddress,
int localPort, HttpConnectionParams params) throws IOException,
UnknownHostException, ConnectTimeoutException {
// TODO Auto-generated method stub
// Below code is from the DefaultSSLProtocolSocketFactory#createSocket
// method only it has workarounds to deal with pre-1.4 JVMs. I've
// cut these out.
if (params == null) {
throw new IllegalArgumentException("Parameters may not be null");
}
Socket socket = null;
int timeout = params.getConnectionTimeout();
if (timeout == 0) {
socket = createSocket(host, port, localAddress, localPort);
} else {
socket = new Socket();
InetAddress hostAddress;
Thread current = Thread.currentThread();
if (current instanceof HostResolver) {
HostResolver resolver = (HostResolver)current;
hostAddress = resolver.resolve(host);
} else {
hostAddress = null;
}
InetSocketAddress address = (hostAddress != null)?
new InetSocketAddress(hostAddress, port):
new InetSocketAddress(host, port);
socket.bind(new InetSocketAddress(localAddress, localPort));
try {
socket.connect(address, timeout);
} catch (SocketTimeoutException e) {
// Add timeout info. to the exception.
throw new SocketTimeoutException(e.getMessage() +
": timeout set at " + Integer.toString(timeout) + "ms.");
}
assert socket.isConnected(): "Socket not connected " + host;
}
return socket;
}
@Override
public Socket createSocket(String host, int port) throws IOException,
UnknownHostException {
// TODO Auto-generated method stub
return new Socket(host, port);
}
/**
* All instances of HeritrixProtocolSocketFactory are the same.
* @param obj Object to compare.
* @return True if equal
*/
public boolean equals(Object obj) {
return ((obj != null) &&
obj.getClass().equals(HeritrixProtocolSocketFactory.class));
}
/**
* All instances of HeritrixProtocolSocketFactory have the same hash code.
* @return Hash code for this object.
*/
public int hashCode() {
return HeritrixProtocolSocketFactory.class.hashCode();
}
}
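The timeout branch of createSocket above boils down to three steps: create an unconnected socket, bind it locally, then connect with a deadline. The following is a minimal JDK-only sketch of that pattern, not Heritrix code. `TimedConnectSketch`, `connect`, and `selfCheck` are hypothetical names, and closing the socket on failure is an addition that the listing above does not make.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class TimedConnectSketch {

    /** Connects to remote, giving up after timeoutMs milliseconds (0 means no timeout). */
    static Socket connect(InetSocketAddress remote, InetAddress localAddress,
            int localPort, int timeoutMs) throws IOException {
        Socket socket = new Socket(); // created unconnected
        try {
            socket.bind(new InetSocketAddress(localAddress, localPort));
            socket.connect(remote, timeoutMs);
            return socket;
        } catch (SocketTimeoutException e) {
            socket.close();
            // Add the configured timeout to the message, as the listing does.
            throw new SocketTimeoutException(
                    e.getMessage() + ": timeout set at " + timeoutMs + "ms.");
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    /** Connects to a throwaway loopback listener and reports whether it worked. */
    static boolean selfCheck() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = connect(
                     new InetSocketAddress(loopback, server.getLocalPort()),
                     loopback, 0, 1000)) {
            return client.isConnected();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected: " + selfCheck());
    }
}
```

Passing local port 0 lets the operating system pick an ephemeral port, which is what most callers want when they have no specific local binding in mind.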
HeritrixSSLProtocolSocketFactory implements the SecureProtocolSocketFactory interface, and through it ProtocolSocketFactory, and is used for HTTPS communication.
The SecureProtocolSocketFactory interface declares the following method:
/**
* A ProtocolSocketFactory that is secure.
*
* @see org.apache.commons.httpclient.protocol.ProtocolSocketFactory
*
* @author Michael Becke
* @author <a href="mailto:mbowler@">Mike Bowler</a>
* @since 2.0
*/
public interface SecureProtocolSocketFactory extends ProtocolSocketFactory {
/**
* Returns a socket connected to the given host that is layered over an
* existing socket. Used primarily for creating secure sockets through
* proxies.
*
* @param socket the existing socket
* @param host the host name/IP
* @param port the port on the host
* @param autoClose a flag for closing the underling socket when the created
* socket is closed
*
* @return Socket a new socket
*
* @throws IOException if an I/O error occurs while creating the socket
* @throws UnknownHostException if the IP address of the host cannot be
* determined
*/
Socket createSocket(
Socket socket,
String host,
int port,
boolean autoClose
) throws IOException, UnknownHostException;
}
HeritrixSSLProtocolSocketFactory implements the SecureProtocolSocketFactory interface above:
/**
* Implementation of the commons-httpclient SSLProtocolSocketFactory so we
* can return SSLSockets whose trust manager is
* {@link org.archive.httpclient.ConfigurableX509TrustManager}.
*
* We also go to the heritrix cache to get IPs to use making connection.
* To this, we have dependency on {@link HeritrixProtocolSocketFactory};
* its assumed this class and it are used together.
* See {@link HeritrixProtocolSocketFactory#getHostAddress(ServerCache,String)}.
*
* @author stack
* @version $Id: HeritrixSSLProtocolSocketFactory.java 6637 2009-11-10 21:03:27Z gojomo $
* @see org.archive.httpclient.ConfigurableX509TrustManager
*/
public class HeritrixSSLProtocolSocketFactory implements SecureProtocolSocketFactory {
// static final String SERVER_CACHE_KEY = "heritrix.server.cache";
static final String SSL_FACTORY_KEY = "heritrix.ssl.factory";
/***
* Socket factory with default trust manager installed.
*/
private SSLSocketFactory sslDefaultFactory = null;
/**
* Shutdown constructor.
* @throws KeyManagementException
* @throws KeyStoreException
* @throws NoSuchAlgorithmException
*/
public HeritrixSSLProtocolSocketFactory()
throws KeyManagementException, KeyStoreException, NoSuchAlgorithmException{
// Get an SSL context and initialize it.
SSLContext context = SSLContext.getInstance("SSL");
// I tried to get the default KeyManagers but doesn't work unless you
// point at a physical keystore. Passing null seems to do the right
// thing so we'll go w/ that.
context.init(null, new TrustManager[] {
new ConfigurableX509TrustManager(
ConfigurableX509TrustManager.DEFAULT)}, null);
this.sslDefaultFactory = context.getSocketFactory();
}
@Override
public Socket createSocket(String host, int port, InetAddress clientHost,
int clientPort)
throws IOException, UnknownHostException {
return this.sslDefaultFactory.createSocket(host, port,
clientHost, clientPort);
}
@Override
public Socket createSocket(String host, int port)
throws IOException, UnknownHostException {
return this.sslDefaultFactory.createSocket(host, port);
}
@Override
public synchronized Socket createSocket(String host, int port,
InetAddress localAddress, int localPort, HttpConnectionParams params)
throws IOException, UnknownHostException {
// Below code is from the DefaultSSLProtocolSocketFactory#createSocket
// method only it has workarounds to deal with pre-1.4 JVMs. I've
// cut these out.
if (params == null) {
throw new IllegalArgumentException("Parameters may not be null");
}
Socket socket = null;
int timeout = params.getConnectionTimeout();
if (timeout == 0) {
socket = createSocket(host, port, localAddress, localPort);
} else {
SSLSocketFactory factory = (SSLSocketFactory)params.
getParameter(SSL_FACTORY_KEY);//SSL_FACTORY_KEY
SSLSocketFactory f = (factory != null)? factory: this.sslDefaultFactory;
socket = f.createSocket();
Thread current = Thread.currentThread();
InetAddress hostAddress;
if (current instanceof HostResolver) {
HostResolver resolver = (HostResolver)current;
hostAddress = resolver.resolve(host);
} else {
hostAddress = null;
}
InetSocketAddress address = (hostAddress != null)?
new InetSocketAddress(hostAddress, port):
new InetSocketAddress(host, port);
socket.bind(new InetSocketAddress(localAddress, localPort));
try {
socket.connect(address, timeout);
} catch (SocketTimeoutException e) {
// Add timeout info. to the exception.
throw new SocketTimeoutException(e.getMessage() +
": timeout set at " + Integer.toString(timeout) + "ms.");
}
assert socket.isConnected(): "Socket not connected " + host;
}
return socket;
}
@Override
public Socket createSocket(Socket socket, String host, int port,
boolean autoClose)
throws IOException, UnknownHostException {
return this.sslDefaultFactory.createSocket(socket, host,
port, autoClose);
}
public boolean equals(Object obj) {
return ((obj != null) && obj.getClass().
equals(HeritrixSSLProtocolSocketFactory.class));
}
public int hashCode() {
return HeritrixSSLProtocolSocketFactory.class.hashCode();
}
}
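Both factories ask the current thread whether it can resolve host names (`current instanceof HostResolver`) before falling back to an ordinary lookup. The following is a rough JDK-only sketch of that idiom. The `Resolver` interface and `ResolvingThread` class are hypothetical stand-ins shaped after the `resolve(host)` call in the listing, and mapping every host to loopback is purely a demo assumption.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicReference;

final class ThreadResolverSketch {

    /** Hypothetical stand-in shaped like the resolve(host) call in the listing. */
    interface Resolver {
        InetAddress resolve(String host);
    }

    /** Hypothetical worker thread that carries its own host lookup. */
    static final class ResolvingThread extends Thread implements Resolver {
        ResolvingThread(Runnable task) {
            super(task);
        }

        @Override
        public InetAddress resolve(String host) {
            // Demo assumption: every host maps to loopback.
            return InetAddress.getLoopbackAddress();
        }
    }

    /** Prefers the current thread's resolver and otherwise uses the normal lookup. */
    static InetSocketAddress addressFor(String host, int port) {
        Thread current = Thread.currentThread();
        InetAddress resolved = (current instanceof Resolver)
                ? ((Resolver) current).resolve(host)
                : null;
        return (resolved != null)
                ? new InetSocketAddress(resolved, port)
                : new InetSocketAddress(host, port);
    }

    /** Runs addressFor on a ResolvingThread and hands back its result. */
    static InetSocketAddress runOnResolvingThread(String host, int port) {
        AtomicReference<InetSocketAddress> result = new AtomicReference<>();
        Thread worker = new ResolvingThread(() -> result.set(addressFor(host, port)));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        // On the resolving thread the host never reaches DNS.
        System.out.println(runOnResolvingThread("example.invalid", 80));
        // On the main thread an IP literal goes through the normal path.
        System.out.println(addressFor("192.0.2.10", 80));
    }
}
```

The design keeps the factory signature unchanged: the caller's thread, rather than an extra parameter, decides how host names are resolved.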
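Socket objects for HTTPS communication are created by the SSLSocketFactory field sslDefaultFactory. To build that SSLSocketFactory,
Heritrix 3.1.0 defines ConfigurableX509TrustManager, an implementation of the X509TrustManager interface that is used for SSL communication and, at its default trust level, accepts certificates automatically: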
/**
* A configurable trust manager built on X509TrustManager.
*
* If set to 'open' trust, the default, will get us into sites for whom we do
* not have the CA or any of intermediary CAs that go to make up the cert chain
* of trust. Will also get us past selfsigned and expired certs. 'loose'
* trust will get us into sites w/ valid certs even if they are just
* selfsigned. 'normal' is any valid cert not including selfsigned. 'strict'
* means cert must be valid and the cert DN must match server name.
*
* <p>Based on pointers in
* <a href="/commons/httpclient/sslguide.html">SSL
* Guide</a>,
* and readings done in <a
* href="/j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction">JSSE
* Guide</a>.
*
* <p>TODO: Move to an ssl subpackage when we have other classes other than
* just this one.
*
* @author stack
* @version $Id: ConfigurableX509TrustManager.java 6637 2009-11-10 21:03:27Z gojomo $
*/
public class ConfigurableX509TrustManager implements X509TrustManager
{
/**
* Logging instance.
*/
protected static Logger logger = Logger.getLogger(
"org.archive.httpclient.ConfigurableX509TrustManager");
public static enum TrustLevel {
/**
* Trust anything given us.
*
* Default setting.
*
* <p>See <a href="/egs/.ssl/TrustAll.html">
* e502. Disabling Certificate Validation in an HTTPS Connection</a> from
* the java almanac for how to trust all.
*/
OPEN,
/**
* Trust any valid cert including self-signed certificates.
*/
LOOSE,
/**
* Normal jsse behavior.
*
* Seemingly any certificate that supplies valid chain of trust.
*/
NORMAL,
/**
* Strict trust.
*
* Ensure server has same name as cert DN.
*/
STRICT,
}
/**
* Default setting for trust level.
*/
public final static TrustLevel DEFAULT = TrustLevel.OPEN;
/**
* Trust level.
*/
private TrustLevel trustLevel = DEFAULT;
/**
* An instance of the SUNX509TrustManager that we adapt variously
* depending upon passed configuration.
*
* We have it do all the work we don't want to.
*/
private X509TrustManager standardTrustManager = null;
public ConfigurableX509TrustManager()
throws NoSuchAlgorithmException, KeyStoreException {
this(DEFAULT);
}
/**
* Constructor.
*
* @param level Level of trust to effect.
*
* @throws NoSuchAlgorithmException
* @throws KeyStoreException
*/
public ConfigurableX509TrustManager(TrustLevel level)
throws NoSuchAlgorithmException, KeyStoreException {
super();
TrustManagerFactory factory = TrustManagerFactory.
getInstance(TrustManagerFactory.getDefaultAlgorithm());
// Pass in a null (Trust) KeyStore. Null says use the 'default'
// 'trust' keystore (KeyStore class is used to hold keys and to hold
// 'trusts' (certs)). See 'X509TrustManager Interface' in this doc:
//
// /j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction
factory.init((KeyStore)null);
TrustManager[] trustmanagers = factory.getTrustManagers();
if (trustmanagers.length == 0) {
throw new NoSuchAlgorithmException(TrustManagerFactory.
getDefaultAlgorithm() + " trust manager not supported");
}
this.standardTrustManager = (X509TrustManager)trustmanagers[0];
this.trustLevel = level;
}
@Override
public void checkClientTrusted(X509Certificate[] certificates, String type) throws CertificateException {
if (this.trustLevel.equals(TrustLevel.OPEN)) {
return;
}
this.standardTrustManager.checkClientTrusted(certificates, type);
}
@Override
public void checkServerTrusted(X509Certificate[] certificates, String type) throws CertificateException {
if (this.trustLevel.equals(TrustLevel.OPEN)) {
return;
}
try {
this.standardTrustManager.checkServerTrusted(certificates, type);
if (this.trustLevel.equals(TrustLevel.STRICT)) {
logger.severe(TrustLevel.STRICT + " not implemented.");
}
} catch (CertificateException e) {
if (this.trustLevel.equals(TrustLevel.LOOSE) &&
certificates != null && certificates.length == 1)
{
// If only one cert and its valid and it caused a
// CertificateException, assume its selfsigned.
X509Certificate certificate = certificates[0];
certificate.checkValidity();
} else {
// If we got to here, then we're probably NORMAL. Rethrow.
throw e;
}
}
}
@Override
public X509Certificate[] getAcceptedIssuers() {
return this.standardTrustManager.getAcceptedIssuers();
}
}
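To see how a trust manager plugs into an SSLSocketFactory, here is a minimal JDK-only sketch of the wiring done in the HeritrixSSLProtocolSocketFactory constructor. `OpenTrustSketch` and `AcceptAllTrustManager` are hypothetical names. The sketch requests the "TLS" context name rather than the "SSL" name used in the listing, and it imitates only the OPEN level. Accepting every certificate may suit a crawler but is unsafe for applications that need to authenticate the server.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

final class OpenTrustSketch {

    /** Hypothetical trust manager that never rejects a chain, like the OPEN level. */
    static final class AcceptAllTrustManager implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: nothing is rejected.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: nothing is rejected.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    /** Builds a socket factory whose context uses the permissive trust manager. */
    static SSLSocketFactory newFactory() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        // Null key managers and null SecureRandom select the JDK defaults.
        context.init(null, new TrustManager[] { new AcceptAllTrustManager() }, null);
        return context.getSocketFactory();
    }

    /** Returns true when the factory could be built on this JDK. */
    static boolean selfCheck() {
        try {
            return newFactory() != null;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("factory built: " + selfCheck());
    }
}
```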
---------------------------------------------------------------------------
This Heritrix 3.1.0 source code analysis series is my own original work.
When reposting, please credit the source: 博客园 (cnblogs), 刺猬的温驯.