I tried out the pyspider crawler framework today and hit a small snag, so here are my notes.
Crawling http sites with pyspider worked fine, but crawling https sites failed with the following error:
[E 170414 21:02:52 base_handler:203] HTTP 599: SSL certificate problem: unable to get local issuer certificate
Traceback (most recent call last):
  File "d:\python\lib\site-packages\pyspider\libs\base_handler.py", line 196, in run_task
    result = self._run_task(task, response)
  File "d:\python\lib\site-packages\pyspider\libs\base_handler.py", line 175, in _run_task
    response.raise_for_status()
  File "d:\python\lib\site-packages\pyspider\libs\response.py", line 172, in raise_for_status
    six.reraise(Exception, Exception(self.error), Traceback.from_string(self.traceback).as_traceback())
  File "d:\python\lib\site-packages\six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "d:\python\lib\site-packages\pyspider\fetcher\tornado_fetcher.py", line 378, in http_fetch
    response = yield gen.maybe_future(self.http_client.fetch(request))
  File "d:\python\lib\site-packages\tornado\httpclient.py", line 102, in fetch
    self._async_client.fetch, request, **kwargs))
  File "d:\python\lib\site-packages\tornado\ioloop.py", line 457, in run_sync
    return future_cell[0].result()
  File "d:\python\lib\site-packages\tornado\concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "", line 3, in raise_exc_info
Exception: HTTP 599: SSL certificate problem: unable to get local issuer certificate
I had run into SSL certificate problems before, without any framework, and passing validate_cert=False to the fetch call solved them.
The same trick works here: appending validate_cert=False as an argument to self.crawl() makes the error go away.
In short, add one parameter to self.crawl(), so the call becomes self.crawl(XXXXX, validate_cert=False).
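For comparison, the same "skip certificate verification" idea can be sketched with nothing but the standard library: an ssl context with verification disabled plays the role that validate_cert=False plays in pyspider. This is just an illustrative sketch, and the commented-out URL is a placeholder, not a site from this post:

```python
import ssl

# Build an SSL context that skips certificate verification --
# the stdlib counterpart of pyspider's validate_cert=False.
# Note the order: check_hostname must be disabled before
# verify_mode can be set to CERT_NONE.
ctx = ssl.create_default_context()
ctx.check_hostname = False          # do not verify the hostname
ctx.verify_mode = ssl.CERT_NONE     # do not verify the certificate chain

# Usage (placeholder URL): pass the context to urllib so an
# "unable to get local issuer certificate" error is bypassed.
# import urllib.request
# urllib.request.urlopen("https://example.com", context=ctx)
```

Either way, keep in mind this is a trade-off: turning verification off silences the error but also gives up the protection the certificate check provides, so it is best kept to crawling scenarios like this one.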