<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.5">Jekyll</generator><link href="https://sanbai.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://sanbai.github.io/" rel="alternate" type="text/html" /><updated>2024-06-19T17:16:30+00:00</updated><id>https://sanbai.github.io/feed.xml</id><title type="html">sanbai’s blog</title><subtitle>My journey towards DL 🧠</subtitle><entry><title type="html">How crewAI lets agents delegate to each other autonomously 🤖</title><link href="https://sanbai.github.io/ai/agent/2024/06/09/AgentHunt-Ep.01-crewAI.html" rel="alternate" type="text/html" title="How crewAI lets agents delegate to each other autonomously 🤖" /><published>2024-06-09T10:46:04+00:00</published><updated>2024-06-09T10:46:04+00:00</updated><id>https://sanbai.github.io/ai/agent/2024/06/09/AgentHunt-Ep.01-crewAI</id><content type="html" xml:base="https://sanbai.github.io/ai/agent/2024/06/09/AgentHunt-Ep.01-crewAI.html"><![CDATA[<blockquote>
  <p>This post assumes the reader is already familiar with concepts such as AI agents, function calling, and tool use.</p>
</blockquote>

<p>A distinguishing feature of crewAI is its role-based agent design: users define agents with specific roles, goals, and tools.
<img src="/assets/images/pic-crewai.png" alt="pic-crewai" />
crewAI agents can delegate tasks to one another autonomously, without explicit instructions. Compared with frameworks that require the execution steps to be spelled out (e.g. LangChain), this is more flexible. Let's look at how these features are implemented.</p>

<h2 id="核心概念--实现">Core Concepts &amp; Implementation</h2>
<p>A <em>Crew</em> is a container holding a set of tasks to be completed, the agents that work on them, and the tools they may use.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crew</span> <span class="o">=</span> <span class="n">Crew</span><span class="p">(</span>
    <span class="n">tasks</span><span class="o">=</span><span class="p">[...],</span>
    <span class="n">agents</span><span class="o">=</span><span class="p">[</span><span class="n">researcher</span><span class="p">,</span> <span class="n">writer</span><span class="p">],</span>
    <span class="n">manager_llm</span><span class="o">=</span><span class="n">ChatOpenAI</span><span class="p">(</span><span class="n">temperature</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">model</span><span class="o">=</span><span class="s">"gpt-4"</span><span class="p">),</span>
    <span class="n">process</span><span class="o">=</span><span class="n">Process</span><span class="p">.</span><span class="n">hierarchical</span><span class="p">,</span>
<span class="p">)</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">process</code> has two options: with <code class="language-plaintext highlighter-rouge">sequential</code>, tasks run in order, and each task must be assigned an agent at creation time;
 with <code class="language-plaintext highlighter-rouge">hierarchical</code>, a manager agent schedules the tasks and dispatches them to the other agents.</p>

<p>An <em>Agent</em> mainly wraps LLM calls, with LangChain underneath. The prompt sent to the LLM is assembled from several parts, including the task and the tools:
<img src="/assets/images/pic-crewai-prompt.png" alt="pic-crewai-prompt" /></p>

<p>Once execution starts, the agent interacts with the LLM in the <a href="https://arxiv.org/abs/2210.03629">ReAct</a> style until the task is completed (or the interaction limit is exceeded).</p>
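<p>To make the ReAct interaction concrete, here is a minimal sketch of such a loop in plain Python. This illustrates the pattern only, not crewAI's actual implementation: the <code class="language-plaintext highlighter-rouge">llm</code> callable, the tool registry, and the <code class="language-plaintext highlighter-rouge">Action:</code> / <code class="language-plaintext highlighter-rouge">Final Answer:</code> markers are all assumed stand-ins.</p>

```python
def react_loop(llm, tools, task, max_iterations=10):
    """Minimal ReAct-style loop: repeatedly ask the LLM, run the
    tool it names, and feed the observation back into the transcript."""
    transcript = f"Task: {task}\n"
    for _ in range(max_iterations):
        reply = llm(transcript)  # e.g. "Action: search[query]" or "Final Answer: ..."
        transcript += reply + "\n"
        if reply.startswith("Final Answer:"):
            return reply[len("Final Answer:"):].strip()
        if reply.startswith("Action:"):
            name, _, arg = reply[len("Action:"):].strip().partition("[")
            observation = tools[name](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return None  # hit the iteration limit without finishing
```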

<h3 id="task-delegation">Task Delegation</h3>
<p>An interesting feature of crewAI is that an agent can delegate tasks to other agents, much like coworkers collaborating at work. Delegation takes two forms:</p>
<ol>
  <li>ask_question: ask a coworker (another agent) a question and return the coworker's output</li>
  <li>delegate_work: have a coworker execute a task directly</li>
</ol>

<p>This is implemented by wrapping the agent as a tool, so it can be invoked by the model through the ReAct loop:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">tools</span> <span class="o">=</span> <span class="p">[</span>
	<span class="n">StructuredTool</span><span class="p">.</span><span class="n">from_function</span><span class="p">(</span>
		<span class="n">func</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">delegate_work</span><span class="p">,</span>
		<span class="n">name</span><span class="o">=</span><span class="s">"Delegate work to co-worker"</span><span class="p">,</span>
		<span class="n">description</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">i18n</span><span class="p">.</span><span class="n">tools</span><span class="p">(</span><span class="s">"delegate_work"</span><span class="p">).</span><span class="nb">format</span><span class="p">(</span>
			<span class="n">coworkers</span><span class="o">=</span><span class="sa">f</span><span class="s">"[</span><span class="si">{</span><span class="s">', '</span><span class="p">.</span><span class="n">join</span><span class="p">([</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">agent</span><span class="p">.</span><span class="n">role</span><span class="si">}</span><span class="s">' for agent in self.agents])</span><span class="si">}</span><span class="s">]"</span>
		<span class="p">),</span>
	<span class="p">),</span>
	<span class="c1"># omitted for clarity ...
</span><span class="p">]</span>
</code></pre></div></div>

<p>The description used when an agent is exposed as a tool:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Delegate a specific task to one of the following co-workers: {coworkers}

The input to this tool should be the co-worker, the task you want them to do, and ALL necessary context to execute the task, they know nothing about the task, so share absolute everything you know, don't reference things but instead explain them.
</code></pre></div></div>

<h2 id="效果评估">Evaluation</h2>
<p>crewAI is mainly driven from Python code. Execution produces detailed logs in the terminal: green marks the agent's "thinking" process, purple marks a task's final result.
Several problems surfaced during use.</p>

<h4 id="1-使用-tool-时-因为选择参数错误导致多次调用">1. Repeated tool calls caused by wrongly guessed arguments</h4>
<p>Reading the source, tool invocation is taken directly from LangChain, and a tool's parameter list is assembled from the Python object's <code class="language-plaintext highlighter-rouge">__annotations__</code>, e.g. the search tool's parameter list:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">DuckDuckGoSearchRun</span><span class="p">.</span><span class="n">_run</span><span class="p">.</span><span class="n">__annotations__</span>
<span class="p">{</span><span class="s">'query'</span><span class="p">:</span> <span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">str</span><span class="s">'&gt;, '</span><span class="n">run_manager</span><span class="s">': typing.Optional[langchain_core.callbacks.manager.CallbackManagerForToolRun], '</span><span class="k">return</span><span class="s">': &lt;class '</span><span class="nb">str</span><span class="s">'&gt;}
</span></code></pre></div></div>
<p>This implementation is convenient, but it carries a lot of useless information that confuses weaker models.
After a few attempts the agent may guess the arguments correctly, but it has to guess all over again on the next call. One possible optimization: once a call succeeds, cache how the tool was called.</p>
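<p>The caching idea could look something like the following sketch. The <code class="language-plaintext highlighter-rouge">ToolCallCache</code> class and its method names are hypothetical illustrations, not part of crewAI or LangChain:</p>

```python
class ToolCallCache:
    """Remember which argument names worked for each tool, so later
    calls can drop keys the tool doesn't actually accept."""

    def __init__(self):
        self._known_args = {}  # tool name -> set of accepted kwarg names

    def call(self, name, func, **kwargs):
        if name in self._known_args:
            # reuse the argument shape that succeeded before
            kwargs = {k: v for k, v in kwargs.items() if k in self._known_args[name]}
        result = func(**kwargs)  # raises TypeError on a bad guess
        self._known_args[name] = set(kwargs)
        return result
```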

<h4 id="2-task-delegation-偶现找不到指定-agent">2. Task delegation occasionally fails to find the named agent</h4>
<p>crewAI uses the agent's role as the tool name, and roles are typically short identity phrases. After the LLM picks a tool, the tool name in its output may fail to match the agent role exactly due to casing or whitespace differences, producing an "agent not found" error.
A looser matching scheme would help here, e.g. fuzzy matching or edit distance.</p>
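<p>Such looser matching can be done with the standard library alone; <code class="language-plaintext highlighter-rouge">difflib.get_close_matches</code> tolerates small spelling slips via similarity scoring. A sketch (the function name and role list are illustrative):</p>

```python
from difflib import get_close_matches

def resolve_coworker(requested, roles):
    """Map an LLM-produced role name onto a known agent role,
    tolerating case, whitespace, and small spelling differences."""
    normalize = lambda s: " ".join(s.lower().split())
    by_key = {normalize(r): r for r in roles}
    key = normalize(requested)
    if key in by_key:  # exact match after normalization
        return by_key[key]
    close = get_close_matches(key, by_key, n=1, cutoff=0.8)
    return by_key[close[0]] if close else None
```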

<h4 id="3-agent-不必要-llm-调用开销">3. Unnecessary LLM call overhead</h4>
<p>Handling multi-hop questions requires the agent to run web search several times and interact with the LLM repeatedly. E.g. answering what currency is used in Bob Marley's birthplace requires first finding out who Bob Marley is and which country he was born in, then what the legal tender there is.
A common pattern when crewAI handles multi-hop questions: the first web search already returns enough information, yet the agent still decides to follow each URL in the results. The text of all those pages ends up being processed by the LLM, wasting time and tokens.</p>

<h2 id="结论">Conclusion</h2>
<p>crewAI arrived after the agent idea had already drawn wide attention, and in the agent-framework space newcomers often enjoy a late-mover advantage. crewAI is very easy to use, so its popularity is no surprise.
<img src="/assets/images/pic-crewai-starhistory.png" alt="pic-crewai-starhistory" />
On closer inspection, though, crewAI brings no fundamental change compared with similar products: stability, determinism, and task quality still depend mainly on the LLM's capability, and the framework itself adds little on top.</p>]]></content><author><name></name></author><category term="AI" /><category term="agent" /><summary type="html"><![CDATA[This post assumes the reader is already familiar with concepts such as AI agents, function calling, and tool use.]]></summary></entry><entry><title type="html">Understanding einsum</title><link href="https://sanbai.github.io/pytorch/2024/06/01/understanding-einsum.html" rel="alternate" type="text/html" title="Understanding einsum" /><published>2024-06-01T06:04:20+00:00</published><updated>2024-06-01T06:04:20+00:00</updated><id>https://sanbai.github.io/pytorch/2024/06/01/understanding-einsum</id><content type="html" xml:base="https://sanbai.github.io/pytorch/2024/06/01/understanding-einsum.html"><![CDATA[<h2 id="einsum">einsum</h2>
<p>Einstein summation is a matrix-operation notation invented by Einstein to simplify how tensor expressions are written.
<img src="/assets/images/pic-einsum-meme.png" alt="pic-einsum-meme" /></p>

<p>For example, the product of two tensors A and B can be written as <code class="language-plaintext highlighter-rouge">ij, jk -&gt; ik</code>: uniform, concise, and intuitive. The rules as I first learned them were:</p>
<blockquote>
  <ol>
    <li>a letter repeated across the inputs means those dims are multiplied;</li>
    <li>a letter omitted from the output means the result is summed over those dims;</li>
  </ol>
</blockquote>

<p>At first glance this makes sense, but these rules struggle to explain other cases, such as <code class="language-plaintext highlighter-rouge">ii -&gt; i</code>.
Let's try a different perspective and understand einsum from the standpoint of code.</p>

<h2 id="表达式中的字母其实是-iterator">The letters in the expression are really iterators</h2>
<p>Think of each letter in the input as a Python iterator:</p>
<blockquote>
  <p>The iterator <code class="language-plaintext highlighter-rouge">i</code> yields row indices (dim 0), <code class="language-plaintext highlighter-rouge">k</code> yields column indices; identical letters denote the same iterator, yielding the same values</p>
</blockquote>

<p>Let's walk through a few examples.</p>
<h4 id="ij-jk---ik-矩阵乘法">ij, jk -&gt; ik: matrix multiplication</h4>
<ul>
  <li><code class="language-plaintext highlighter-rouge">i</code> iterates over A's rows, <code class="language-plaintext highlighter-rouge">j</code> over A's columns</li>
  <li><code class="language-plaintext highlighter-rouge">j, k</code> iterate over B's rows and columns respectively</li>
  <li><code class="language-plaintext highlighter-rouge">-&gt;</code> marks the output</li>
  <li><code class="language-plaintext highlighter-rouge">i, k</code> are the row and column iterators of the result C</li>
</ul>

<p>Each position <code class="language-plaintext highlighter-rouge">(i, k)</code> in C is the dot product of row <code class="language-plaintext highlighter-rouge">i</code> of A and column <code class="language-plaintext highlighter-rouge">k</code> of B.</p>

<p>In pseudocode:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">B</span><span class="p">.</span><span class="n">cols</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">B</span><span class="p">.</span><span class="n">cols</span><span class="p">[</span><span class="n">k</span><span class="p">]:</span>
            <span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">j</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span>
</code></pre></div></div>
<p>The result is the matrix product of A and B.</p>
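<p>The triple loop above can be checked against einsum itself. NumPy's <code class="language-plaintext highlighter-rouge">np.einsum</code> uses the same notation as <code class="language-plaintext highlighter-rouge">torch.einsum</code>, so, assuming NumPy is available:</p>

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

# the triple loop from the pseudocode above
C = np.zeros((2, 4), dtype=A.dtype)
for i in range(A.shape[0]):          # i over A's rows
    for k in range(B.shape[1]):      # k over B's cols
        for j in range(A.shape[1]):  # j shared by A's cols and B's rows
            C[i, k] += A[i, j] * B[j, k]

assert np.array_equal(C, np.einsum("ij,jk->ik", A, B))
assert np.array_equal(C, A @ B)
```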

<h4 id="ij-ij---ij-元素相乘">ij, ij -&gt; ij: element-wise multiplication</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">,</span> <span class="n">B</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">cols</span><span class="p">,</span> <span class="n">B</span><span class="p">.</span><span class="n">cols</span><span class="p">:</span>
        <span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span>
</code></pre></div></div>
<p>The result is the element-wise multiplication of A and B.</p>

<h4 id="ik-jk---ij-逐行点积">ik, jk -&gt; ij: row-wise dot products</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">B</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">B</span><span class="p">.</span><span class="n">rows</span><span class="p">[</span><span class="n">j</span><span class="p">]:</span>
            <span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">j</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span>
</code></pre></div></div>
<p>The result pairs each row of A with each row of B by dot product.</p>

<p>From the examples above, for expressions with two operands on the left side, identical letters denote the same iterator, stepping over the corresponding dims of both tensors at once.</p>

<h2 id="更多例子">More examples</h2>
<p>einsum also works on a single tensor.</p>
<h4 id="ii---i-对角线上的元素">ii -&gt; i: the diagonal elements</h4>
<p>The two i's are the same iterator applied to both the rows and the columns of A:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
    <span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">i</span><span class="p">]</span>
</code></pre></div></div>
<p>So each element of C is an element of A whose row and column indices coincide: the elements on A's diagonal.</p>

<p>Similarly, <code class="language-plaintext highlighter-rouge">ii -&gt;</code> sums the diagonal elements, giving the trace of A.</p>
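<p>Both single-tensor cases are easy to verify with NumPy's einsum, which shares the notation:</p>

```python
import numpy as np

A = np.arange(9).reshape(3, 3)

assert np.array_equal(np.einsum("ii->i", A), np.diag(A))  # ii -> i: the diagonal
assert np.einsum("ii->", A) == np.trace(A)                # ii ->: diagonal, then summed
```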

<h4 id="ij---i-逐行求和">ij -&gt; i: row sums</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
        <span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span>
</code></pre></div></div>
<p>Each row of A is summed.</p>

<p>Similarly, <code class="language-plaintext highlighter-rouge">ij -&gt;</code> sums all elements.</p>
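<p>These two reductions can likewise be checked in NumPy:</p>

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

assert np.array_equal(np.einsum("ij->i", A), A.sum(axis=1))  # dropping j sums each row
assert np.einsum("ij->", A) == A.sum()                       # dropping both sums everything
```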

<h4 id="ik-jk---ijk-逐行相乘">ik, jk -&gt; ijk: row-wise products</h4>
<p>Each row of A is multiplied element-wise with each row of B, yielding a rank 3 tensor:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">B</span><span class="p">.</span><span class="n">rows</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">A</span><span class="p">.</span><span class="n">rows</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">B</span><span class="p">.</span><span class="n">rows</span><span class="p">[</span><span class="n">j</span><span class="p">]:</span>
            <span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">j</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="c1"># each C[i, j] is a vector indexed by k
</span></code></pre></div></div>
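<p>Checking the shape and one entry of the rank 3 result with NumPy:</p>

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # 2 x 3
B = np.arange(12).reshape(4, 3)  # 4 x 3

C = np.einsum("ik,jk->ijk", A, B)
assert C.shape == (2, 4, 3)                  # no letter dropped, so nothing is summed
assert np.array_equal(C[1, 2], A[1] * B[2])  # C[i, j] is row i of A times row j of B
```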

<h2 id="结论">Conclusion</h2>
<p>Finally, let's restate the einsum rules:</p>
<ol>
  <li>each letter is an iterator over one dim of a tensor, e.g. for a 3 x 5 tensor A, <code class="language-plaintext highlighter-rouge">ij</code> iterates over rows and cols respectively</li>
  <li>when the left side of -&gt; has multiple operands, identical letters across them denote the same iterator</li>
  <li>an iterator missing from the right side means that, after the element-wise operation on the left, the result is summed over the corresponding dim</li>
</ol>]]></content><author><name></name></author><category term="pytorch" /><summary type="html"><![CDATA[einsum Einstein summation is a matrix-operation notation invented by Einstein to simplify how tensor expressions are written.]]></summary></entry><entry><title type="html">Understanding broadcasting 📢</title><link href="https://sanbai.github.io/pytorch/2024/05/26/understanding-broadcasting.html" rel="alternate" type="text/html" title="Understanding broadcasting 📢" /><published>2024-05-26T14:39:02+00:00</published><updated>2024-05-26T14:39:02+00:00</updated><id>https://sanbai.github.io/pytorch/2024/05/26/understanding-broadcasting</id><content type="html" xml:base="https://sanbai.github.io/pytorch/2024/05/26/understanding-broadcasting.html"><![CDATA[<p>Broadcasting essentially stretches two tensors of different sizes until they have the same size, so they can take part in mathematical operations together.
<img src="/assets/images/pic-broadcasting_2.png" alt="broadcasting" /></p>

<h1 id="问题">The problem</h1>
<p>When tensors are computed element by element, their shapes are normally required to match:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a</span> <span class="o">=</span> <span class="n">tensor</span><span class="p">([[</span> <span class="mi">0</span><span class="p">,</span>  <span class="mi">1</span><span class="p">,</span>  <span class="mi">2</span><span class="p">],</span>
            <span class="p">[</span> <span class="mi">3</span><span class="p">,</span>  <span class="mi">4</span><span class="p">,</span>  <span class="mi">5</span><span class="p">],</span>
            <span class="p">[</span> <span class="mi">6</span><span class="p">,</span>  <span class="mi">7</span><span class="p">,</span>  <span class="mi">8</span><span class="p">],</span>
            <span class="p">[</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">]])</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">tensor</span><span class="p">([[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
            <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]])</span>
<span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
</code></pre></div></div>

<p>What if their shapes differ? We could iterate over rows / cols, compute, and assemble the result:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">ar</span><span class="p">,</span> <span class="n">ac</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">shape</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">ar</span><span class="p">):</span>
    <span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">b</span>
</code></pre></div></div>
<p>But this is slow: the Python loop runs in the interpreter, far less efficient than tensor operations implemented in C.
So we need a way to stretch b into the same shape as a.</p>

<h1 id="broadcast-规则">Broadcast rules</h1>
<p>When two tensors of different rank (<code class="language-plaintext highlighter-rouge">len(a.shape)</code>, or <code class="language-plaintext highlighter-rouge">t.ndim</code>) are computed together, the smaller one can be stretched to match the larger.
For two tensors a and b, stretching follows these rules:</p>
<blockquote>
  <ol>
    <li>compare dims one by one, starting from the last</li>
    <li>if the two sizes are equal, no stretching is needed; move on to the previous dim</li>
    <li>if a's dim is missing, or has size 1, a is "copied" along that dim</li>
    <li>…… and so on, until every dim has been processed</li>
  </ol>
</blockquote>
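<p>The rules above translate almost line for line into a small function. This is a sketch of the shape computation NumPy and PyTorch perform internally; <code class="language-plaintext highlighter-rouge">broadcast_shape</code> is an illustrative name, not a real API:</p>

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Compare dims from the last one backwards; a missing dim or a
    size-1 dim is stretched ("copied") to the other side's size."""
    result = []
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)  # equal sizes, or b stretches up to da
        elif da == 1:
            result.append(db)  # a stretches up to db
        else:
            raise ValueError(f"cannot broadcast {da} with {db}")
    return tuple(reversed(result))
```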

<h2 id="举几个例子">A few examples</h2>
<p>A, B, C, D have the following shapes:
A: 5 x 1
B: 1 x 6
C: 6
D: 1
C is a vector of 6 elements, D is a scalar.
Combining them via broadcasting:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A      (rank 2 tensor):  5 x 1
B      (rank 2 tensor):  1 x 6
Result (rank 2 tensor):  5 x 6
</code></pre></div></div>
<p>A's only column <code class="language-plaintext highlighter-rouge">a[:, 0]</code> is copied 6 times along dim 1 (across the columns), so A behaves as 5 x 6;
B's only row <code class="language-plaintext highlighter-rouge">b[0, :]</code> is copied 5 times along dim 0 (down the rows).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>B      (rank 2 tensor):  1 x 6
C      (rank 1 tensor):      6
Result (rank 2 tensor):  1 x 6
</code></pre></div></div>
<p>C's last dim matches B's; C gains the missing dim and behaves as 1 x 6.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A      (rank 2 tensor):  5 x 1
D      (scalar       ):      1
Result (rank 2 tensor):  5 x 1
</code></pre></div></div>
<p>D's single element is copied into every position, so D behaves as 5 x 1.</p>
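<p>All three worked examples can be confirmed in NumPy (assuming it is installed), whose broadcasting rules match PyTorch's:</p>

```python
import numpy as np

A = np.ones((5, 1))
B = np.ones((1, 6))
C = np.ones(6)
D = np.ones(1)

assert (A * B).shape == (5, 6)  # 5 x 1 with 1 x 6
assert (B * C).shape == (1, 6)  # 1 x 6 with (6,)
assert (A * D).shape == (5, 1)  # 5 x 1 with a one-element tensor

# NumPy can also report the broadcast shape without computing anything
assert np.broadcast_shapes((5, 1), (1, 6)) == (5, 6)
```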

<h1 id="broadcasting-实现">How broadcasting is implemented</h1>
<p>Copying the smaller tensor until it matches the larger one would cost a lot of time and memory for large tensors. The implementation of broadcasting is clever: it copies no data at all.</p>

<p><code class="language-plaintext highlighter-rouge">tensor.expand_as</code> stretches b to the same size as a, just as broadcasting does:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">b</span> <span class="o">=</span> <span class="n">b</span><span class="p">.</span><span class="n">expand_as</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="n">b</span><span class="p">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">b</span>

<span class="c1"># Output:
</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">Size</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">]),</span>
 <span class="n">tensor</span><span class="p">([[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
         <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
         <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
         <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]]))</span>
</code></pre></div></div>
<p>Yet b's underlying data has not actually been copied:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">b</span><span class="p">.</span><span class="n">storage</span><span class="p">()</span>

<span class="c1"># Output:
</span> <span class="mi">0</span>
 <span class="mi">1</span>
 <span class="mi">2</span>
<span class="p">[</span><span class="n">torch</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">TypedStorage</span><span class="p">(</span><span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="p">.</span><span class="n">int64</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">cpu</span><span class="p">)</span> <span class="n">of</span> <span class="n">size</span> <span class="mi">3</span><span class="p">]</span>
</code></pre></div></div>

<p>The secret lies in controlling the stride, the step taken when moving along each dim (axis):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">b</span><span class="p">.</span><span class="n">stride</span><span class="p">(),</span> <span class="n">b</span><span class="p">.</span><span class="n">shape</span>

<span class="c1"># Output:
</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">torch</span><span class="p">.</span><span class="n">Size</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">]))</span>
</code></pre></div></div>
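<p>The same zero-stride trick is visible in NumPy, whose <code class="language-plaintext highlighter-rouge">broadcast_to</code> also returns a view rather than a copy:</p>

```python
import numpy as np

b = np.array([0, 1, 2])          # the same three values as b above
bb = np.broadcast_to(b, (4, 3))  # "stretched" to 4 x 3 without copying

assert bb.shape == (4, 3)
assert bb.strides[0] == 0        # moving to the next row moves 0 bytes
assert np.shares_memory(b, bb)   # still backed by the original 3 elements
```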
<p>Imagine a cursor moving across the rows of b: when it needs to move to the next row, its position does not change, because the stride is 0.</p>]]></content><author><name></name></author><category term="pytorch" /><summary type="html"><![CDATA[Broadcasting essentially stretches two tensors of different sizes until they have the same size, so they can take part in mathematical operations together.]]></summary></entry></feed>