Quick News | Downloading large files in Python: which method is faster?

2022-08-16 17:56:40 Source: Python人工智能技术

We usually reach for the requests library to download files, because it is so convenient to use.

Method 1

With the following streaming code, Python's memory usage stays bounded no matter how large the downloaded file is:

import requests

def download_file(url):
    local_filename = url.split("/")[-1]
    # Note the stream=True argument
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename

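If you need to decode the chunks into text, do not pass a fixed chunk_size, and add an if check to skip empty chunks. Note that calling iter_content() with no argument defaults to chunk_size=1 (one byte per iteration), which is slow and can split a multi-byte character. The version below therefore passes chunk_size=None, which reads data in whatever size it arrives, and decodes incrementally:

import codecs
import requests

def download_file(url):
    local_filename = url.split("/")[-1]
    # An incremental decoder copes with characters split across two chunks
    decoder = codecs.getincrementaldecoder("utf-8")()
    # Note the stream=True argument
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, "w", encoding="utf-8") as f:
            for chunk in r.iter_content(chunk_size=None):
                if chunk:  # skip empty chunks
                    f.write(decoder.decode(chunk))
            # Flush the decoder; raises if the data ends mid-character
            f.write(decoder.decode(b"", final=True))
    return local_filename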

The iter_content^[1] function can also do the decoding itself: just pass decode_unicode=True, and it decodes using the response's declared encoding.
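As a minimal sketch of the decode_unicode option, the helper below writes a streamed response to a text file. The name save_text and the fallback_encoding parameter are illustrative assumptions, not part of requests; the helper only relies on the response's encoding attribute and its iter_content method. When a response declares no encoding, iter_content yields bytes even with decode_unicode=True, so the sketch sets a fallback first.

```python
def save_text(response, path, fallback_encoding="utf-8"):
    """Stream a requests-style response into a text file, decoding as it goes.

    `response` is expected to be a requests.Response obtained with stream=True.
    """
    # Without a known encoding, iter_content would yield bytes instead of str.
    if response.encoding is None:
        response.encoding = fallback_encoding  # assumption: UTF-8 is a sane default
    with open(path, "w", encoding="utf-8") as f:
        # chunk_size=None reads data in whatever size it arrives
        for text in response.iter_content(chunk_size=None, decode_unicode=True):
            if text:  # skip empty chunks
                f.write(text)
    return path


# Usage (assumes requests is installed and `url` points at a text resource):
# import requests
# with requests.get(url, stream=True) as r:
#     r.raise_for_status()
#     save_text(r, "page.txt")
```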

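Note that the number of bytes in each chunk returned by iter_content is not necessarily chunk_size. Because decoding can take place, a chunk is often larger than chunk_size, and its length can be expected to differ from one iteration to the next.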
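Because chunk lengths vary, a progress counter should add up len(chunk) rather than assume chunk_size bytes per iteration. A minimal sketch, where write_chunks is a hypothetical helper name and chunks stands for any iterable of bytes, such as r.iter_content(chunk_size=8192):

```python
def write_chunks(chunks, fileobj):
    """Write an iterable of byte chunks to fileobj and return the bytes written."""
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        fileobj.write(chunk)
        total += len(chunk)  # count the actual length, not the requested chunk_size
    return total


# Usage (assumes requests is installed):
# with requests.get(url, stream=True) as r, open("file.bin", "wb") as f:
#     written = write_chunks(r.iter_content(chunk_size=8192), f)
```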

Method 2

Use Response.raw^[2] together with shutil.copyfileobj^[3]:

import requests
import shutil

def download_file(url):
    local_filename = url.split("/")[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, "wb") as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename

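This streams the file to disk without using excessive memory, and the code is simpler.

Note: according to the documentation, Response.raw does not decode the content (for example, gzip or deflate content encodings), so if you need decoding you can replace the r.raw.read method manually: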

import functools
r.raw.read = functools.partial(r.raw.read, decode_content=True)
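Putting the pieces together, a sketch of Method 2 with content decoding enabled might look like the following. The helper name save_raw and the 1 MiB buffer_size default are assumptions for illustration; shutil.copyfileobj accepts the buffer size as its optional length argument.

```python
import functools
import shutil


def save_raw(response, path, buffer_size=1024 * 1024):
    """Copy response.raw to path, decoding the content encoding on the way.

    `response` is expected to be a requests.Response obtained with stream=True.
    """
    # Make every read() call decode gzip/deflate content
    response.raw.read = functools.partial(response.raw.read, decode_content=True)
    with open(path, "wb") as f:
        # length sets how many bytes copyfileobj asks for on each read
        shutil.copyfileobj(response.raw, f, length=buffer_size)
    return path


# Usage (assumes requests is installed):
# import requests
# with requests.get(url, stream=True) as r:
#     r.raise_for_status()
#     save_raw(r, url.split("/")[-1])
```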

Speed

Method 2 is faster. In a case where Method 1 reaches 2-3 MB/s, Method 2 can reach nearly 40 MB/s.
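Throughput depends on the network, the server and the chunk size, so it is worth measuring on your own setup. A minimal timing sketch, assuming a hypothetical helper measure_throughput that takes any download function with the download_file(url) signature used in this article and returns MB/s:

```python
import os
import time


def measure_throughput(download, url):
    """Run download(url), which must return the saved file's path, and return MB/s."""
    start = time.perf_counter()
    path = download(url)
    elapsed = time.perf_counter() - start
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb / elapsed if elapsed > 0 else float("inf")


# Usage: call it once per method with the same URL and compare the results, e.g.
# print(measure_throughput(download_file, url))
```

Running each method several times and comparing the averages gives a fairer picture than a single run.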

References

[1] iter_content: https://requests.readthedocs.io/en/latest/api/#requests.Response.iter_content

[2] Response.raw: https://requests.readthedocs.io/en/latest/api/#requests.Response.raw

[3] shutil.copyfileobj: https://docs.python.org/3/library/shutil.html#shutil.copyfileobj
