eager_load in Depth

Author: 周星  Published: 2017-10-07

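In the previous post we covered three ways to solve the 1+N query problem: includes, preload, and eager_load. This post looks at how each of the three methods works. Some database knowledge is involved, and by the end it should be clear which SQL each method generates and why.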

All of the examples revolve around these two models:

class User < ActiveRecord::Base
  has_many :addresses
end

class Address < ActiveRecord::Base
  belongs_to :user
end

Below is the seed data, which we will use to check the query results:

rob = User.create!(name: "Robert Pankowecki", email: "robert@example.org")
bob = User.create!(name: "Bob Doe", email: "bob@example.org")

rob.addresses.create!(country: "Poland", city: "Wrocław", postal_code: "55-555", street: "Rynek")
rob.addresses.create!(country: "France", city: "Paris", postal_code: "75008", street: "8 rue Chambiges")
bob.addresses.create!(country: "Germany", city: "Berlin", postal_code: "10551", street: "Tiergarten")

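In the previous post we saw that these methods generate different SQL: includes (by default) and preload split the 1+N queries into two SQL statements, while eager_load produces a single statement using LEFT OUTER JOIN. So how do includes and preload differ? includes looks at the query and decides which of the two strategies to use (a separate IN query or a LEFT OUTER JOIN), and Rails handles this for you. That raises the question of which conditions includes looks at. Consider the following example, in which includes delegates to eager_load, so both lines produce the same SQL: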

User.includes(:addresses).where("addresses.country = ?", "Poland")
User.eager_load(:addresses).where("addresses.country = ?", "Poland")

# SQL (6.3ms)  SELECT `users`.`id` AS t0_r0, `users`.`name` AS t0_r1, `users`.`email` AS t0_r2, `users`.`created_at` AS t0_r3, `users`.`updated_at` AS t0_r4, `addresses`.`id` AS t1_r0, `addresses`.`country` AS t1_r1, `addresses`.`city` AS t1_r2, `addresses`.`postal_code` AS t1_r3, `addresses`.`street` AS t1_r4, `addresses`.`created_at` AS t1_r5, `addresses`.`updated_at` AS t1_r6, `addresses`.`user_id` AS t1_r7 FROM `users` LEFT OUTER JOIN `addresses` ON `addresses`.`user_id` = `users`.`id` WHERE (addresses.country = 'Poland')
=> [#<User id: 5, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2014-06-29 13:41:18", updated_at: "2014-06-29 13:41:18">]

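In the example above, Rails detected that the where condition references a column of the table being preloaded (addresses), so includes handed the work over to eager_load. (On Rails 4.1 and later, a string condition like this one is only picked up if you also call references(:addresses); a hash condition such as where(addresses: {country: "Poland"}) is picked up automatically.) Now let's see what happens if we use preload directly: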

User.preload(:addresses).where("addresses.country = ?", "Poland")

# User Load (0.4ms)  SELECT `users`.* FROM `users` WHERE (addresses.country = 'Poland')
# ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'addresses.country' in 'where clause': SELECT `users`.* FROM `users`  WHERE (addresses.country = 'Poland')

The query fails, because nothing in it JOINs the addresses table!
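
To make the failure above concrete, here is a minimal plain-Ruby sketch of the two-query strategy, with arrays of hashes standing in for the tables. It does not use ActiveRecord and is not how Rails is implemented; USERS, ADDRESSES, load_users, load_addresses_for and preload_style are hypothetical names chosen for illustration, and the ids are made up.

```ruby
# Toy "tables" mirroring the seed data (ids are illustrative).
USERS = [
  { id: 1, name: "Robert Pankowecki" },
  { id: 2, name: "Bob Doe" }
].freeze

ADDRESSES = [
  { id: 1, user_id: 1, country: "Poland" },
  { id: 2, user_id: 1, country: "France" },
  { id: 3, user_id: 2, country: "Germany" }
].freeze

# Query 1 in this model: SELECT users.* FROM users WHERE <filter>.
# Only user columns are visible to the filter.
def load_users(&filter)
  USERS.select(&filter)
end

# Query 2 in this model: SELECT addresses.* WHERE user_id IN (ids),
# grouped by user_id so the rows can be attached to their users.
def load_addresses_for(users)
  ids = users.map { |u| u[:id] }
  ADDRESSES.select { |a| ids.include?(a[:user_id]) }
           .group_by { |a| a[:user_id] }
end

def preload_style(&filter)
  users = load_users(&filter)
  by_user = load_addresses_for(users)
  users.map { |u| u.merge(addresses: by_user.fetch(u[:id], [])) }
end

# A filter on user columns works, and every address is attached.
everyone = preload_style { |_user| true }

# A filter on an address column fails in query 1: the users rows have no
# :country key, so fetch raises KeyError (the toy analogue of "Unknown column").
error =
  begin
    preload_style { |user| user.fetch(:country) == "Poland" }
    nil
  rescue KeyError => e
    e
  end
```

The point of the sketch is that the filter is applied to the first query only, and that query never sees the addresses table.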

Let's take another look at the includes example from earlier:

User.includes(:addresses).where("addresses.country = ?", "Poland")

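You may wonder what this query actually does. Here are three options:

  1. Find all users that have an address in Poland, and preload only their Polish addresses

  2. Find all users that have an address in Poland, and preload all of their addresses

  3. Find all users, and preload their Polish addresses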

The answer is the first one!
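
Why the first option? A rough plain-Ruby model of the single-JOIN strategy shows it: the WHERE clause filters the joined rows before they are folded back into users and addresses, so rows for non-Polish addresses are already gone by then. This is a sketch under assumptions, not Rails internals; USERS, ADDRESSES, left_outer_join and eager_load_where are hypothetical names.

```ruby
# Toy "tables" mirroring the seed data (ids are illustrative).
USERS = [
  { id: 1, name: "Robert Pankowecki" },
  { id: 2, name: "Bob Doe" }
].freeze

ADDRESSES = [
  { id: 1, user_id: 1, country: "Poland" },
  { id: 2, user_id: 1, country: "France" },
  { id: 3, user_id: 2, country: "Germany" }
].freeze

# LEFT OUTER JOIN: one row per (user, address) pair; a user without
# any address still produces one row, with a nil address.
def left_outer_join
  USERS.flat_map do |user|
    matches = ADDRESSES.select { |a| a[:user_id] == user[:id] }
    matches = [nil] if matches.empty?
    matches.map { |address| { user: user, address: address } }
  end
end

# WHERE addresses.country = ? is applied to the joined rows first;
# only the surviving rows are grouped back into users with addresses.
def eager_load_where(country)
  rows = left_outer_join.select do |row|
    row[:address] && row[:address][:country] == country
  end
  rows.group_by { |row| row[:user] }.map do |user, user_rows|
    user.merge(addresses: user_rows.map { |row| row[:address] })
  end
end

result = eager_load_where("Poland")
# Only Robert is returned, and only his Polish address is attached:
# the France row was removed by the WHERE filter before grouping.
```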

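Now we have a new requirement: find all users that have at least one address in Poland, and preload all of their addresses. In other words, I need every address of each user who has at least one Polish address.
We know we only want users with a Polish address, which is easy: User.joins(:addresses).where("addresses.country = ?", "Poland"). We also know we want all of their addresses loaded up front, so appending includes(:addresses) should do the job.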

r = User.joins(:addresses).where("addresses.country = ?", "Poland").includes(:addresses)

r[0]
#=> #<User id: 1, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2013-12-08 11:26:24", updated_at: "2013-12-08 11:26:24"> 

r[0].addresses
# [
#   #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">
# ]

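Unfortunately this is not the result we wanted: the user's second address is missing. Rails again detected that the where condition references a column of the preloaded table and switched to eager_load internally. The only difference from the previous example is that the SQL now uses INNER JOIN instead of LEFT OUTER JOIN, which still does not give us what we want: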

SELECT 
"users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,
"addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7 
FROM "users"
INNER JOIN "addresses" 
ON "addresses"."user_id" = "users"."id" 
WHERE (addresses.country = 'Poland')

In this situation, as the earlier sections suggest, using preload directly instead of includes gives a better result:

r = User.joins(:addresses).where("addresses.country = ?", "Poland").preload(:addresses)

# User Load (0.3ms)  SELECT `users`.* FROM `users` INNER JOIN `addresses` ON `addresses`.`user_id` = `users`.`id` WHERE (addresses.country = 'Poland')

r[0]
# #<User id: 1, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2013-12-08 11:26:24", updated_at: "2013-12-08 11:26:24">

r[0].addresses
# [
#   #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">,
#   #<Address id: 3, user_id: 1, country: "France", street: "8 rue Chambiges", postal_code: "75008", city: "Paris", created_at: "2013-12-08 11:36:30", updated_at: "2013-12-08 11:36:30">
# ]

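This is exactly what we wanted! Because we used preload explicitly, the statement is also easy to read: joins and where decide which users are returned, and preload loads all of their addresses in a separate query.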
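
The same division of labour can be sketched in plain Ruby: one step picks the users through an INNER JOIN plus WHERE, and an independent second step loads every address of those users. This is only an illustrative model with arrays of hashes; USERS, ADDRESSES, users_with_address_in and preload_addresses are hypothetical names, not ActiveRecord API.

```ruby
# Toy "tables" mirroring the seed data (ids are illustrative).
USERS = [
  { id: 1, name: "Robert Pankowecki" },
  { id: 2, name: "Bob Doe" }
].freeze

ADDRESSES = [
  { id: 1, user_id: 1, country: "Poland" },
  { id: 2, user_id: 1, country: "France" },
  { id: 3, user_id: 2, country: "Germany" }
].freeze

# Query 1 in this model: SELECT users.* FROM users INNER JOIN addresses ...
# WHERE addresses.country = ?. Only user columns are returned. As with a
# real INNER JOIN, a user with two matching addresses would appear twice.
def users_with_address_in(country)
  ADDRESSES.select { |a| a[:country] == country }
           .map { |a| USERS.find { |u| u[:id] == a[:user_id] } }
           .compact
end

# Query 2 in this model: SELECT addresses.* WHERE user_id IN (ids).
# It knows nothing about the country filter, so every address comes back.
def preload_addresses(users)
  ids = users.map { |u| u[:id] }
  by_user = ADDRESSES.select { |a| ids.include?(a[:user_id]) }
                     .group_by { |a| a[:user_id] }
  users.map { |u| u.merge(addresses: by_user.fetch(u[:id], [])) }
end

result = preload_addresses(users_with_address_in("Poland"))
# Robert is the only user, and both of his addresses are attached.
```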

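Now for the third requirement: find all users together with their Polish addresses!

However, I don't recommend preloading only a subset of an association (this takes a moment to digest), because other parts of the program may assume the association has been loaded in full. I recommend this approach instead: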

class User < ActiveRecord::Base
  has_many :addresses
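  # Rails 3 syntax. On Rails 4 and later, write the condition as a scope:
  #   has_many :polish_addresses, -> { where(country: "Poland") }, class_name: "Address"
  has_many :polish_addresses, conditions: {country: "Poland"}, class_name: "Address"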
end

Then use it like this:

r = User.preload(:polish_addresses)

# SELECT "users".* FROM "users" 
# SELECT "addresses".* FROM "addresses" WHERE "addresses"."country" = 'Poland' AND "addresses"."user_id" IN (1, 2)

r

# [
#   #<User id: 1, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2013-12-08 11:26:24", updated_at: "2013-12-08 11:26:24">,
#   #<User id: 2, name: "Bob Doe", email: "bob@example.org", created_at: "2013-12-08 11:26:25", updated_at: "2013-12-08 11:26:25">
# ]

r[0].polish_addresses

# [
#   #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">
# ] 

r[1].polish_addresses
# []

Or like this:

r = User.eager_load(:polish_addresses)

# SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, 
#        "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7
# FROM "users" 
# LEFT OUTER JOIN "addresses" 
# ON "addresses"."user_id" = "users"."id" AND "addresses"."country" = 'Poland'

r
# [
#   #<User id: 1, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2013-12-08 11:26:24", updated_at: "2013-12-08 11:26:24">,
#   #<User id: 2, name: "Bob Doe", email: "bob@example.org", created_at: "2013-12-08 11:26:25", updated_at: "2013-12-08 11:26:25">
# ]

r[0].polish_addresses
# [
#   #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">
# ]

r[1].polish_addresses
# []
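
Note that in the eager_load SQL above the country condition sits in the ON clause of the LEFT OUTER JOIN rather than in WHERE, which is why Bob is still returned with an empty list. A small plain-Ruby model of that behaviour follows; it is a sketch under assumptions, and USERS, ADDRESSES, join_rows and eager_load_polish are hypothetical names rather than anything from Rails.

```ruby
# Toy "tables" mirroring the seed data (ids are illustrative).
USERS = [
  { id: 1, name: "Robert Pankowecki" },
  { id: 2, name: "Bob Doe" }
].freeze

ADDRESSES = [
  { id: 1, user_id: 1, country: "Poland" },
  { id: 2, user_id: 1, country: "France" },
  { id: 3, user_id: 2, country: "Germany" }
].freeze

# LEFT OUTER JOIN addresses
#   ON addresses.user_id = users.id AND addresses.country = ?
# The country test is part of the match, so a user with no matching
# address still yields one row (with a nil address) instead of vanishing.
def join_rows(country)
  USERS.flat_map do |user|
    matches = ADDRESSES.select do |a|
      a[:user_id] == user[:id] && a[:country] == country
    end
    matches = [nil] if matches.empty?
    matches.map { |address| { user: user, address: address } }
  end
end

# Fold the joined rows back into users, dropping the nil placeholders.
def eager_load_polish(country = "Poland")
  join_rows(country).group_by { |row| row[:user] }.map do |user, rows|
    user.merge(polish_addresses: rows.map { |row| row[:address] }.compact)
  end
end

result = eager_load_polish
# Both users are present; Robert has his Polish address, Bob has none.
```

Had the same country test been applied as a WHERE filter on the joined rows instead, Bob's row would have been discarded and he would not appear in the result at all.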

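But what if the association condition is only known at runtime? Which approach should be used then? I don't know. If you do, please tell me in the comments. If you have any comments or suggestions about this post, please reach me the same way; I would be very grateful.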

Original English article: http://blog.arkency.com/2013/12/rails4-preloading/
