
Ecto is a toolkit for mapping database objects to Elixir structs and provides a unified interface to manipulate that data. In this post, we will dive into the internals of Ecto - its major components, their functions, and how they work. In doing so, we’ll demystify some of the apparent magic behind Ecto. Let’s get going!

Ecto’s Modules

Ecto is built around four major modules: Repo, Query, Schema, and Changeset. We’ll look at each in turn. Let’s start with the Repo module.

Repo Module

If you use Ecto with a database (like most users out there), Repo is the heart of Ecto. It binds everything together and provides a centralized point of communication between a database and your application. Repo:

  • maintains connections
  • executes queries against a database
  • runs the migrations that change your database structure

Let’s get started with Repo. Simply call use Ecto.Repo inside your Repo module. If you generate your project with mix phx.new (or generate a repo with mix ecto.gen.repo), this is done automatically for you.
# lib/my_app/repo.ex
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres
end

These few lines of code define the repo. Adding it to the supervision tree in lib/my_app/application.ex starts the repo and gives you access to the whole set of functions Repo provides for interacting with a database. Again, this code is generated for you when using Phoenix:

defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Start the Ecto repository
      MyApp.Repo
      # Other children...
    ]

    # See the Supervisor documentation
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

With the few lines of code above, you get the following:

  • Access to the full Ecto.Repo API included in MyApp.Repo. The most common use cases include fetching records with MyApp.Repo.all/2, inserting new records with MyApp.Repo.insert/2, and updating records with MyApp.Repo.update/2.
  • A supervisor starts that keeps track of all the processes required to keep Ecto working. It initializes the adapter (Ecto.Adapters.Postgres in this case), which is responsible for all communication with the database. The Postgres adapter, in turn, starts a connection pool to your database using the DBConnection library.
  • A query cache is created. Ecto’s query planner, which plans and normalizes a query and its parameters, keeps all planned queries in this cache, an ETS table created when the repo starts. We will learn more about this when we get to the Query module.

Monitor Queries Sent to Ecto from Your Elixir Application

In addition, Ecto automatically publishes telemetry events that can be monitored. For example, to monitor statistics for all the queries sent to Ecto, you can subscribe to the [:my_app, :repo, :query] event with telemetry. Each time a query is performed, this event is emitted with measurements (such as the time spent waiting for a connection, executing the query, and decoding the results) and metadata (such as the query itself and its parameters). For more details, see the full list of Ecto telemetry events in the Ecto.Repo documentation.

There are many options available to configure the Repo or the adapter to suit your needs, but they are out of the scope of this post. Let’s just take a very quick look at how you can monitor queries with AppSignal.
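Before turning to AppSignal, here is a minimal sketch of subscribing to the query event yourself with :telemetry.attach/4. It assumes Elixir 1.12+ (for Mix.install) and fetches only the telemetry package. MyApp.QueryTimer and the handler id are hypothetical names. The :total_time measurement and :source metadata are keys Ecto documents for query events, but check the docs for your Ecto version. In a real application Ecto emits the event itself after each query; to keep the sketch runnable without a database, it fires a simulated event with :telemetry.execute/3.

```elixir
# Standalone script: fetches only the :telemetry library.
Mix.install([{:telemetry, "~> 1.2"}])

defmodule MyApp.QueryTimer do
  # Hypothetical helper: forwards the duration of every query on
  # MyApp.Repo to `pid` as {:query_ms, source, milliseconds}.
  def attach(pid) do
    :telemetry.attach(
      "my-app-query-timer",
      [:my_app, :repo, :query],
      &__MODULE__.handle_event/4,
      %{pid: pid}
    )
  end

  def handle_event([:my_app, :repo, :query], measurements, metadata, %{pid: pid}) do
    # Ecto reports durations in native time units; convert to milliseconds.
    total = measurements[:total_time] || 0
    ms = System.convert_time_unit(total, :native, :millisecond)
    send(pid, {:query_ms, metadata[:source], ms})
  end
end

:ok = MyApp.QueryTimer.attach(self())

# Simulate the event Ecto would emit after a query on the "users" table.
:telemetry.execute(
  [:my_app, :repo, :query],
  %{total_time: System.convert_time_unit(12, :millisecond, :native)},
  %{source: "users", query: ~s(SELECT u0."name" FROM "users" AS u0)}
)

receive do
  {:query_ms, source, ms} -> IO.puts("#{source}: #{ms}ms")
after
  1_000 -> IO.puts("no query event received")
end
```

Telemetry calls handlers synchronously in the process that emits the event, so a handler should stay cheap; sending a message (or pushing to a metrics library) keeps the query path fast.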

Instrumenting Ecto Queries with AppSignal in Your Elixir App

You can use AppSignal to easily get information about Ecto queries running in your Elixir application. Just set your :otp_app configuration to match your app’s OTP app name, and our Ecto instrumentation automatically hooks into your Ecto repos.

Read more in our Ecto docs. Check out our AppSignal for Elixir page.

Query Module

The Query module provides a unified API to write database-agnostic queries in Elixir. Note that building database queries with functions provided by the Ecto.Query module does not execute them. These functions return a query in the form of an Ecto.Query struct. Nothing is actually sent to the database until the built %Ecto.Query{} is passed to one of the functions provided by the Repo module. As an example, let’s look at a simple query that selects the names of all users older than 18:

import Ecto.Query

age = 18
query = from u in "users",
          where: u.age > ^age,
          select: u.name

Type this into an IEx console, and you will see that it creates a struct like this:

#Ecto.Query<from u0 in "users", where: u0.age > ^18, select: u0.name>

You can also print it as a full map to see everything inside it:

iex(9)> IO.inspect(query, structs: false)
%{
  __struct__: Ecto.Query,
  aliases: %{},
  assocs: [],
  combinations: [],
  distinct: nil,
  from: %{
    __struct__: Ecto.Query.FromExpr,
    as: nil,
    file: "iex",
    hints: [],
    line: 40,
    params: [],
    prefix: nil,
    source: {"users", nil}
  },
  group_bys: [],
  havings: [],
  joins: [],
  limit: nil,
  lock: nil,
  offset: nil,
  order_bys: [],
  prefix: nil,
  preloads: [],
  select: %{
    __struct__: Ecto.Query.SelectExpr,
    aliases: %{},
    expr: {{:., [], [{:&, [], [0]}, :name]}, [], []},
    fields: nil,
    file: "iex",
    line: 40,
    params: [],
    subqueries: [],
    take: %{}
  },
  sources: nil,
  updates: [],
  wheres: [
    %{
      __struct__: Ecto.Query.BooleanExpr,
      expr: {:>, [], [{{:., [], [{:&, [], [0]}, :age]}, [], []}, {:^, [], [0]}]},
      file: "iex",
      line: 40,
      op: :and,
      params: [{18, {0, :age}}],
      subqueries: []
    }
  ],
  windows: [],
  with_ctes: nil
}

This is much more interesting - it shows exactly how the simple query is represented internally in Ecto. Ecto.Query.FromExpr contains details about the table we are querying (users).
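Because a query is just this struct, you can keep composing it before anything runs. The sketch below assumes Elixir 1.12+ (for Mix.install) and only the ecto package, since building queries needs no database or adapter. UserQueries and its functions are hypothetical names. Each composition step returns a new %Ecto.Query{} with one more entry in wheres or a limit expression, while the query you started from is left untouched.

```elixir
# Standalone script: only the core ecto package is needed to build queries.
Mix.install([{:ecto, "~> 3.10"}])

defmodule UserQueries do
  import Ecto.Query

  # The example query from above: nothing is executed here.
  def adults(age) do
    from u in "users", where: u.age > ^age, select: u.name
  end

  # Composing appends expressions and returns a NEW %Ecto.Query{};
  # the query passed in is not modified.
  def named_a(query, max) do
    query
    |> where([u], like(u.name, "A%"))
    |> limit(^max)
  end
end

base = UserQueries.adults(18)
narrowed = UserQueries.named_a(base, 10)

IO.inspect(length(base.wheres))      # the original query still has one where
IO.inspect(length(narrowed.wheres))  # the composed query has two
IO.inspect(narrowed.limit != nil)
```

This is also why queries can be built up across several functions: each one only adds data to the struct, and a single Repo call at the end turns it into SQL.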

ASTs in the Query

The other two expressions in the query, the select (Ecto.Query.SelectExpr) and the where condition (Ecto.Query.BooleanExpr), look much more complex, but they are in a form the adapters understand and translate into the final query. If you look closely, they are ASTs (abstract syntax trees).

Note: If you are interested in learning more about ASTs, check out An Introduction to Metaprogramming in Elixir.

Let’s see what the code looks like for the where expression:

iex> Macro.to_string({:>, [], [{{:., [], [{:&, [], [0]}, :age]}, [], []}, {:^, [], [0]}]})
"&0.age() > ^0"

This is our where condition, normalized into terms the adapters understand: &0 refers to the first source in the query (users) and ^0 to the first query parameter. The adapter does the final translation of the query into something the database understands. While the adapters we usually use generate SQL, adapters don’t have to target SQL databases; some work just as well with NoSQL databases. Query generation and all database communication are clearly separated from Ecto’s core. If you want to explore this further, try building some complex queries with joins, subqueries, windows, etc., and see how they are represented internally. It is a great way to learn how abstractions are made inside Ecto.
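You can see where these ASTs come from with quote/1 from the standard library: quoting the condition as you write it in the query gives a tree with the same shape as the stored where expression. Ecto’s from macro receives this AST at compile time and, as the dump above shows, replaces the binding u with {:&, [], [0]} and the pinned age with {:^, [], [0]}, moving the value 18 into params. This sketch uses only Elixir itself; the variable names are illustrative.

```elixir
# quote/1 returns the AST of an expression without evaluating it,
# so `u` and `age` do not need to exist here.
ast = quote do: u.age > ^age

IO.inspect(ast)
IO.puts(Macro.to_string(ast))

# Same shape as the stored where expression: a `>` call whose left side is a
# field access on the binding `u` and whose right side is a pinned (^) value.
{:>, _meta, [{{:., _, [{:u, _, _}, :age]}, _, []}, {:^, _, [{:age, _, _}]}]} = ast
```

Inside quote, ^ is not special: it is kept as an ordinary {:^, meta, [expr]} node, which is exactly what lets Ecto treat pinned values as query parameters.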

Finally, this query struct is converted to a SQL statement by the adapter:

iex> Ecto.Adapters.SQL.to_sql(:all, MyApp.Repo, query)
{"SELECT u0.\"name\" FROM \"users\" AS u0 WHERE (u0.\"age\" > $1)", [18]}

Back to Erlang’s ETS Table

Back in the section about the Repo module, we saw that an ETS table is created when the repo starts. This is where that ETS table comes into play. When a query is executed multiple times, it is only prepared the first time. The prepared query is then cached in that ETS table and fetched from there for all subsequent calls.

Note: You can learn more about PREPARE in the context of Postgres.

To see the caching in action, check this out (notice the :cached in the result, which signals that this query has been cached):

iex> MyApp.Repo.all(query) # This puts the query in the cache

# Trying to prepare the query again gets a cached version
iex> Ecto.Adapter.Queryable.prepare_query(:all, MyApp.Repo, query)
{{:cached, #Function<41.44551318/1 in Ecto.Query.Planner.query_with_cache/8>,
  #Function<42.44551318/1 in Ecto.Query.Planner.query_with_cache/8>,
  {5122,
   %Postgrex.Query{
     ref: #Reference<0.3269063957.2912157697.165169>,
     name: "ecto_5122",
     statement: "SELECT u0.\"name\" FROM \"users\" AS u0 WHERE (u0.\"age\" > $1)",
     param_oids: [20],
     param_formats: [:binary],
     param_types: [Postgrex.Extensions.Int8],
     columns: ["name"],
     result_oids: [1043],
     result_formats: [:binary],
     result_types: [Postgrex.Extensions.Raw],
     types: {Postgrex.DefaultTypes, #Reference<0.3269063957.2912288771.132368>},
     cache: :reference
   }}}, [18]}

Note that this doesn’t cache the result, only the prepared statements. Prepared statements give a large performance advantage, especially for complex queries. From the Postgres docs:

Prepared statements potentially have the largest performance advantage when a single session is being used to execute a large number of similar statements.

The performance difference will be particularly significant if the statements are complex to plan or rewrite, e.g., if the query involves a join of many tables or requires the application of several rules.
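To make the prepare-once idea concrete, here is a heavily simplified, hypothetical model of it. It is not Ecto’s implementation (that lives in Ecto.Query.Planner and also stores the adapter’s prepared statement), and QueryCacheSketch is an illustrative name. The first lookup for a query key runs the expensive prepare step and stores the result in ETS; later lookups return it tagged :cached. The key deliberately leaves out parameter values, which is why one prepared statement can serve age = 18 and age = 21 alike. It uses only Erlang’s built-in :ets.

```elixir
defmodule QueryCacheSketch do
  # Hypothetical, simplified model of a prepare-once cache.

  def new do
    # A public set table keyed by the normalized query.
    :ets.new(:query_cache_sketch, [:set, :public])
  end

  # Returns {:cached, statement} on a hit. Otherwise runs `prepare_fun`
  # once, stores its result and returns {:prepared, statement}.
  def prepare(table, query_key, prepare_fun) do
    case :ets.lookup(table, query_key) do
      [{^query_key, statement}] ->
        {:cached, statement}

      [] ->
        statement = prepare_fun.()
        :ets.insert(table, {query_key, statement})
        {:prepared, statement}
    end
  end
end

table = QueryCacheSketch.new()
# The key describes the query shape only; the value 18 is not part of it.
key = {:all, "users", [{:>, :age}]}
sql = fn -> ~s(SELECT u0."name" FROM "users" AS u0 WHERE (u0."age" > $1)) end

IO.inspect(QueryCacheSketch.prepare(table, key, sql))
IO.inspect(QueryCacheSketch.prepare(table, key, sql))
```

The first call prints a {:prepared, ...} tuple and the second a {:cached, ...} tuple holding the same statement.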

The next important module in Ecto is the Schema. Let’s take a look at it now.

Schema Module

You can use Ecto without schemas (as we saw above, when we referenced the table name directly). The Schema module is responsible for defining and mapping a record’s attributes (fields and associations) from a database table to an Elixir struct. To create a schema, we write use Ecto.Schema at the top of our module and use the schema DSL. For example:

defmodule MyApp.Organization do
  use Ecto.Schema

  schema "organizations" do
    field :name, :string
  end
end

defmodule MyApp.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    belongs_to :organization, MyApp.Organization
  end
end

  • The use statement imports several macros into the module and sets some default module attributes that Ecto needs to gather data from the schema.
  • The schema macro then updates some of those attributes to mark that this is a persisted schema (there is also an embedded_schema macro for non-persisted schemas) and sets some other defaults, like the primary key.
  • The field and belongs_to calls inside the schema block store those fields in module attributes (used for type validation) and add them to the struct defined by the module. use Ecto.Schema also defines __schema__/1 and __schema__/2 reflection functions that return these field details. For example:
iex(62)> MyApp.User.__schema__(:source)
"users"
iex(63)> MyApp.User.__schema__(:fields)
[:id, :name, :organization_id]
iex(64)> MyApp.User.__schema__(:primary_key)
[:id]
iex(65)> MyApp.User.__schema__(:associations)
[:organization]
iex(66)> MyApp.User.__schema__(:association, :organization)
%Ecto.Association.BelongsTo{
  field: :organization,
  owner: MyApp.User,
  related: MyApp.Organization,
  owner_key: :organization_id,
  related_key: :id,
  queryable: MyApp.Organization,
  on_cast: nil,
  on_replace: :raise,
  where: [],
  defaults: [],
  cardinality: :one,
  relationship: :parent,
  unique: true,
  ordered: false
}

This __schema__ function is also the entry point for other parts of Ecto to reflect on the schema and perform operations on it. For example, when a schema is used as the source of a query, the query planner uses it to validate the fields referenced in the where clause, and the repo uses it to cast the data returned from the database into Elixir structs. This results in much better feedback when something is wrong.
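Generic code can rely on the same reflection. As a minimal sketch, the helper below builds a field-to-type map for any schema using only __schema__(:fields) and __schema__(:type, field). It assumes Elixir 1.12+ (for Mix.install) and the ecto package; no database is needed to compile or reflect on schemas. The two schemas repeat the example above, and SchemaInfo is a hypothetical name.

```elixir
# Standalone script: schemas compile and reflect without a database.
Mix.install([{:ecto, "~> 3.10"}])

defmodule MyApp.Organization do
  use Ecto.Schema

  schema "organizations" do
    field :name, :string
  end
end

defmodule MyApp.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    belongs_to :organization, MyApp.Organization
  end
end

defmodule SchemaInfo do
  # Hypothetical helper: builds a field => type map for any Ecto schema
  # using only the __schema__ reflection functions.
  def field_types(schema) do
    for field <- schema.__schema__(:fields), into: %{} do
      {field, schema.__schema__(:type, field)}
    end
  end
end

IO.inspect(SchemaInfo.field_types(MyApp.User))
```

Note that belongs_to contributed the organization_id field with type :id, which is what later lets cast/4 turn the string "1" into an integer.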

A Schema Module In Action on an Elixir App

Let’s try executing a query that has a typo to see the benefits of using a schema in action:

from u in "users",
  where: u.ages > 19,
  select: u.id

Executing this with Repo.all will raise a generic Postgrex.Error:

** (Postgrex.Error) ERROR 42703 (undefined_column) column u0.ages does not exist

    query: SELECT u0."id" FROM "users" AS u0 WHERE (u0."ages" > 19)

Let’s try the same query, but this time with a schema.

from u in MyApp.Accounts.User,
  where: u.ages > 19

As expected, this also raises an error, but it now includes the line number where it happened and has a more specific exception type:

** (Ecto.QueryError) lib/my_app/accounts.ex:109: field `ages` in `where` does not exist in schema MyApp.Accounts.User in query:

from u0 in MyApp.Accounts.User,
  where: u0.ages > 19,

This works because the query planner in Ecto can look at the schema’s metadata and figure out that this field doesn’t exist on the schema even before hitting the database. Ecto also does type conversions behind the scenes when using the schema. For example, it allows us to run this query:

id = "2"
query = from u in MyApp.Accounts.User, where: u.id == ^id

On the other hand, if you are not using a schema, a similar query will raise an exception:

query = from u in "users", where: u.id == ^id, select: u.name

** (DBConnection.EncodeError) Postgrex expected an integer in -9223372036854775808..9223372036854775807, got "2". Please make sure the value you are passing matches the definition in your table or in your query or convert the value accordingly.
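The difference comes down to Ecto’s type casting: with a schema, Ecto knows that id has the :id type and casts the pinned "2" to that type before sending it; without a schema, the string is passed through and the Postgres driver rejects it. Ecto.Type.cast/2 exposes that casting directly. The path the query planner takes is more involved, so treat this as an approximation of what happens. The sketch assumes Elixir 1.12+ (for Mix.install) and fetches only the ecto package, since no database is involved.

```elixir
# Standalone script: type casting needs only the ecto package.
Mix.install([{:ecto, "~> 3.10"}])

# :id is the type of primary keys and (by default) belongs_to foreign keys.
IO.inspect(Ecto.Type.cast(:id, "2"))    # a string that parses as an integer
IO.inspect(Ecto.Type.cast(:id, 2))      # already an integer
IO.inspect(Ecto.Type.cast(:id, "two"))  # cannot be cast
```

The same casting is what cast/4 in the Changeset module applies to incoming parameters.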

Changeset Module

The final module that we will look at today is Changeset. It provides an interface for validating and transforming data before it is written into a database. Similarly to Query, Changeset provides a structured way to represent changes to data. It is most commonly used with Ecto schemas, but schemaless changesets are also possible when you don’t need a full-fledged schema. Ecto.Changeset provides a comprehensive API to work with data.

The cast/4 Function

Let’s start by looking at the most commonly used changeset function, cast/4:

iex> import Ecto.Changeset
iex> changeset = %MyApp.User{name: "some name"}
     |> cast(%{"name" => "", "organization_id" => "1", "foo" => "bar"}, [:name, :organization_id])
     |> validate_required(:name)

We pass the initial data (the MyApp.User struct in this case) to cast, followed by the parameters and the list of allowed fields. cast figures out the type of each allowed field by looking at the schema metadata. It then casts each parameter value to the field’s type, or adds an error to the changeset if the value can’t be cast. Parameters that are not in the allowed list, like "foo", are ignored. For example, here we passed "1" (a string) as the organization_id. From the schema, cast can figure out that organization_id is of type :id (an integer) and casts the value to the integer 1 before putting the change inside the changeset.
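cast/4 also works without a schema if you pass a {data, types} tuple instead of a struct. As a minimal sketch (assuming Elixir 1.12+ for Mix.install and only the ecto package; UserParams and its types map are hypothetical, mirroring the MyApp.User schema), the same parameters produce the same organization_id change, and a value that can’t be cast ends up in errors instead of changes.

```elixir
# Standalone script: changesets need only the ecto package.
Mix.install([{:ecto, "~> 3.10"}])

defmodule UserParams do
  import Ecto.Changeset

  # Hypothetical field types, mirroring the MyApp.User schema.
  @types %{name: :string, organization_id: :id}

  # Schemaless cast: the data is a plain map and the types are passed
  # explicitly instead of coming from a schema.
  def cast_params(params) do
    cast({%{}, @types}, params, [:name, :organization_id])
  end
end

changeset =
  UserParams.cast_params(%{"name" => "", "organization_id" => "1", "foo" => "bar"})

# "foo" is not an allowed field and "" counts as empty,
# so organization_id is the only change.
IO.inspect(changeset.changes)

bad = UserParams.cast_params(%{"organization_id" => "abc"})
IO.inspect(bad.errors)
IO.inspect(bad.valid?)
```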

The validate_required/3 Function

The second call in the pipeline, validate_required/3, requires the field to be present (as the name suggests). By default, it trims any strings/binaries before running validations and assumes an empty string to be blank. Here’s what is printed out when you inspect the Changeset.

#Ecto.Changeset<
  action: nil,
  changes: %{organization_id: 1},
  errors: [name: {"can't be blank", [validation: :required]}],
  data: #MyApp.User<>,
  valid?: false
>

This small summary already contains most of the information we need about the changeset. It shows what changes were made to the initial data (organization_id was set to 1), and that the changeset is invalid. It lists all the errors. Let’s go one step further and inspect the full map.

iex> IO.inspect(changeset, structs: false)
%{
  __struct__: Ecto.Changeset,
  action: nil,
  changes: %{organization_id: 1},
  constraints: [],
  data: %{
    __meta__: %{
      __struct__: Ecto.Schema.Metadata,
      context: nil,
      prefix: nil,
      schema: MyApp.User,
      source: "users",
      state: :built
    },
    __struct__: MyApp.User,
    id: nil,
    name: "some name",
    organization: %{
      __cardinality__: :one,
      __field__: :organization,
      __owner__: MyApp.User,
      __struct__: Ecto.Association.NotLoaded
    },
    organization_id: nil
  },
  empty_values: [""],
  errors: [name: {"can't be blank", [validation: :required]}],
  filters: %{},
  params: %{"name" => "", "organization_id" => "1", "foo" => "bar"},
  prepare: [],
  repo: nil,
  repo_opts: [],
  required: [:name],
  types: %{
    id: :id,
    name: :string,
    organization: {:assoc,
     %{
       __struct__: Ecto.Association.BelongsTo,
       cardinality: :one,
       defaults: [],
       field: :organization,
       on_cast: nil,
       on_replace: :raise,
       ordered: false,
       owner: MyApp.User,
       owner_key: :organization_id,
       queryable: MyApp.Organization,
       related: MyApp.Organization,
       related_key: :id,
       relationship: :parent,
       unique: true,
       where: []
     }},
    organization_id: :id
  },
  valid?: false,
  validations: []
}

This contains much more information. data and params are self-explanatory: the initial data and the params we fed to cast. types contains additional data about the schema we are working with (fetched using the __schema__ function we saw in the previous section). changes and errors are where it gets interesting. cast converts the string organization_id to an integer because types lists organization_id as :id, the integer type that belongs_to gives the foreign key by default. It also treats "" as a blank value, so the validate_required call adds an error for name and marks the changeset as invalid.

The database manipulation functions from Repo understand changesets: given an invalid changeset, they return {:error, changeset} without sending anything to the database. The Changeset API provides several other functions for validating data, handling database constraints, and dealing with associations. These functions all work on the changeset struct we saw above. Finally, passing a valid changeset to a Repo function such as insert/2 or update/2 performs the actual database operation.
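The same validity gate can be tried without a repo. Ecto.Changeset.apply_action/2 returns {:ok, data} for a valid changeset and {:error, changeset} (with :action set) for an invalid one, much like Repo.insert/2 does for invalid changesets. This is a minimal sketch assuming Elixir 1.12+ (for Mix.install) and only the ecto package; UserForm and its functions are hypothetical, and the changeset is schemaless so no database or schema module is needed.

```elixir
# Standalone script: apply_action/2 needs no repo or database.
Mix.install([{:ecto, "~> 3.10"}])

defmodule UserForm do
  import Ecto.Changeset

  # Hypothetical field types, mirroring the MyApp.User schema.
  @types %{name: :string, organization_id: :id}

  # Same cast + validate_required pipeline as above, but schemaless.
  def changeset(params) do
    {%{}, @types}
    |> cast(params, [:name, :organization_id])
    |> validate_required(:name)
  end

  # {:ok, data} for a valid changeset; {:error, changeset} with
  # :action set to :insert for an invalid one.
  def submit(params) do
    params |> changeset() |> apply_action(:insert)
  end
end

IO.inspect(UserForm.submit(%{"name" => "", "organization_id" => "1"}))
IO.inspect(UserForm.submit(%{"name" => "Jane", "organization_id" => "1"}))
```

Setting :action on the returned changeset is what lets Phoenix forms decide whether to display the errors.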

Wrapping Up

In this post, we looked at the core concepts of the Ecto library:

  • We started with a Schema that defined our business objects mapped to database tables.
  • To get those objects out of the database, we used the Query API.
  • We then used changesets to change those objects and insert or update them in the database.
  • Tying everything together is the Repo module, which takes input from all the other modules and interacts with the database to fetch data or update records.

All of these modules work together to provide a structured and safe way to interact with databases. Check out the official Ecto guide to integrate Ecto into your Elixir application. I have also linked to the source code for most of this post, so feel free to dig a little deeper.

Until next time - happy digging!


This article was originally posted on the AppSignal Blog.