Ecto is a toolkit that maps database objects to Elixir structs and provides a unified interface for manipulating that data. In this post, we will dive into the internals of Ecto - its major components, their functions, and how they work. In doing so, we’ll demystify some of the apparent magic behind Ecto. Let’s get going!
Ecto’s Modules
Ecto is built of four major modules - Repo, Query, Schema, and Changeset.
We’ll look at each in turn. Let’s start with the Repo module.
Repo Module
If you use Ecto with a database (like most users out there), Repo is the heart of Ecto. It binds everything together and provides a centralized point of communication between a database and your application. Repo:
- maintains connections
- executes queries against a database
- provides an API to write migrations that interact with the database
Let’s get started with Repo. Simply call use Ecto.Repo inside your Repo module. If you use mix phx.new to generate your Elixir project, this is done automatically for you.
# lib/my_app/repo.ex
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres
end
These few lines of code define the repo. Putting it under the Supervision tree inside application.ex gives you access to a whole set of functions provided by Repo to interact with a database. Again, this is code that is generated for you when using Phoenix:
defmodule MyApp.Application do
  use Application
  @impl true
  def start(_type, _args) do
    children = [
      # Start the Ecto repository
      MyApp.Repo,
      # Other Children...
    ]
    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
With the few lines of code above, you get the following:
- Access to the full Ecto.Repo API included in MyApp.Repo. The most common use cases include fetching records with MyApp.Repo.all/2, inserting new records with MyApp.Repo.insert/2, and updating records with MyApp.Repo.update/2.
- A Supervisor starts that keeps track of all the processes required to keep Ecto working. The Supervision tree initializes the adapter (Ecto.Adapters.Postgres in this case), which is responsible for all communication with the database. The Postgres adapter, in turn, starts a connection pool to your database using the DBConnection library.
- A query planner starts that’s responsible for planning and normalizing a query and its parameters. It also keeps a cache of all planned queries in an ETS table. We will learn more about this when we get to the Query module.
Monitor Queries Sent to Ecto from Your Elixir Application
In addition, Ecto automatically publishes telemetry events that you can monitor.
For example, to monitor statistics for all the queries sent to Ecto, you can subscribe to the [:my_app, :repo, :query] event with telemetry.
Then, each time a query is performed, this event is emitted with query metadata that includes the time spent executing the query, retrieving the data from the database, and more.
For more details, see this full list of Ecto telemetry events.
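As a minimal sketch of subscribing to that event, the script below attaches a handler with :telemetry.attach/4 and prints the total query time. The MyApp.QueryLogger module and the handler id are hypothetical names chosen for illustration. Because no database is assumed here, the event is emitted by hand with :telemetry.execute/3, which stands in for what Ecto does after a real query.

```elixir
# Assumption: run as a standalone script. In a Mix project that uses Ecto,
# :telemetry is already available and this line is not needed.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule MyApp.QueryLogger do
  # Hypothetical handler module; the name is illustrative only.

  # Telemetry calls this with the event name, the measurements map,
  # the metadata map, and the config passed to :telemetry.attach/4.
  def handle_event([:my_app, :repo, :query], measurements, metadata, _config) do
    # Ecto reports durations in native time units, so convert before printing.
    total_ms = System.convert_time_unit(measurements.total_time, :native, :millisecond)
    IO.puts("[#{metadata.source}] #{metadata.query} took #{total_ms}ms")
  end
end

:ok =
  :telemetry.attach(
    "my-app-query-logger",
    [:my_app, :repo, :query],
    &MyApp.QueryLogger.handle_event/4,
    nil
  )

# Simulated event: with a running repo, Ecto emits this for every query.
:telemetry.execute(
  [:my_app, :repo, :query],
  %{total_time: System.convert_time_unit(12, :millisecond, :native)},
  %{source: "users", query: "SELECT u0.\"name\" FROM \"users\" AS u0"}
)
```

Passing the handler as a captured module function (rather than an anonymous function) avoids the performance warning that telemetry logs for local functions.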
There are many options available to configure the Repo or the adapter as per your needs, but that’s out of the scope of this post. Let’s just take a very quick look at how you can monitor queries with AppSignal.
Instrumenting Ecto Queries with AppSignal in Your Elixir App
You can use AppSignal to easily get information about Ecto queries running in your Elixir application.
Just set your :otp_app configuration to match your app’s OTP app name, and our Ecto instrumentation automatically hooks into your Ecto repos.
Read more in our Ecto docs. Check out our AppSignal for Elixir page.
Query Module
The Query module provides a unified API to write database-agnostic queries in Elixir.
Note that building database queries with functions provided by the Ecto.Query module does not result in the queries being executed.
These functions return a query in the form of an Ecto.Query struct.
Nothing is actually sent to the database until the built %Ecto.Query{} is passed to one of the functions provided by the Repo module.
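Because a query is only a data structure, it can be built up in steps without a database connection. The sketch below illustrates this under the assumption that it runs as a standalone script; QueryDemo and its functions are hypothetical names, not part of Ecto.

```elixir
# Assumption: standalone script. In a Mix project, Ecto is already a dependency.
Mix.install([{:ecto, "~> 3.10"}])

defmodule QueryDemo do
  import Ecto.Query

  # Returns an %Ecto.Query{} struct. No Repo is involved, so no SQL runs here.
  def adults(min_age \\ 18) do
    from u in "users", where: u.age > ^min_age
  end

  # An existing query can be used as the source of another query.
  def names(query) do
    from u in query, select: u.name, order_by: u.name
  end
end

query = QueryDemo.adults() |> QueryDemo.names()

IO.inspect(query.__struct__)         # Ecto.Query
IO.inspect(length(query.wheres))     # 1
IO.inspect(length(query.order_bys))  # 1
```

Only a call such as MyApp.Repo.all(query) would send anything to the database.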
As an example, let’s see a simple query that selects all users above 18 years of age:
import Ecto.Query
age = 18
query = from u in "users",
  where: u.age > ^age,
  select: u.name
Type this into an IEx console, and you will see that it creates a struct like this:
#Ecto.Query<from u0 in "users", where: u0.age > 18, select: u0.name>
You can also print it as a full map to see everything inside it:
iex(9)> IO.inspect(query, structs: false)
%{
__struct__: Ecto.Query,
aliases: %{},
assocs: [],
combinations: [],
distinct: nil,
from: %{
__struct__: Ecto.Query.FromExpr,
as: nil,
file: "iex",
hints: [],
line: 40,
params: [],
prefix: nil,
source: {"users", nil}
},
group_bys: [],
havings: [],
joins: [],
limit: nil,
lock: nil,
offset: nil,
order_bys: [],
prefix: nil,
preloads: [],
select: %{
__struct__: Ecto.Query.SelectExpr,
aliases: %{},
expr: {{:., [], [{:&, [], [0]}, :name]}, [], []},
fields: nil,
file: "iex",
line: 40,
params: [],
subqueries: [],
take: %{}
},
sources: nil,
updates: [],
wheres: [
%{
__struct__: Ecto.Query.BooleanExpr,
expr: {:>, [], [{{:., [], [{:&, [], [0]}, :age]}, [], []}, {:^, [], [0]}]},
file: "iex",
line: 40,
op: :and,
params: [{18, {0, :age}}],
subqueries: []
}
],
windows: [],
with_ctes: nil
}
This is much more interesting - it shows exactly how the simple query is represented internally in Ecto.
Ecto.Query.FromExpr contains details about the table we are querying (users).
ASTs in the Query
The other two expressions we see in the query are much more complex, but the adapters understand them and convert them into the final query.
If you look closely, they are ASTs.
Note: If you are interested in learning more about ASTs, check out An Introduction to Metaprogramming in Elixir.
Let’s see what the code looks like for the where expression:
iex> Macro.to_string({:>, [], [{{:., [], [{:&, [], [0]}, :age]}, [], []}, {:^, [], [0]}]})
"&0.age() > ^0"
This is our condition in where, just normalized into terms the adapters understand.
The adapter does the final translation of the query to actual SQL that the database understands.
Note that while we usually write SQL here, the adapters don’t need to work with SQL databases only - some adapters work just as well with NoSQL databases.
Query generation and all database communication are clearly separated from Ecto’s core.
If you want to explore this further, try building out some complex queries with joins, subqueries, windows, etc., and see how they are represented internally - it is a great way to learn how abstractions are made inside Ecto.
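As one starting point for that kind of exploration, here is a small sketch that builds a join and looks at how it is stored in the struct. JoinDemo is a hypothetical module name, and the "organizations" table is assumed only for illustration. Nothing is executed against a database.

```elixir
# Assumption: standalone script. In a Mix project, Ecto is already a dependency.
Mix.install([{:ecto, "~> 3.10"}])

defmodule JoinDemo do
  import Ecto.Query

  def query do
    from u in "users",
      join: o in "organizations",
      on: o.id == u.organization_id,
      where: u.age > 18,
      select: {u.name, o.name}
  end
end

query = JoinDemo.query()

# Each join is kept as its own expression struct in the joins list.
[join] = query.joins
IO.inspect(join.qual)    # :inner
IO.inspect(join.source)  # {"organizations", nil}

# The on condition is an AST, just like the where expression.
IO.puts(Macro.to_string(join.on.expr))
```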
Finally, this query struct is converted to a SQL statement by the adapter:
iex> Ecto.Adapters.SQL.to_sql(:all, MyApp.Repo, query)
{"SELECT u0.\"name\" FROM \"users\" AS u0 WHERE (u0.\"age\" > $1)", [18]}
Back to Erlang’s ETS Table
Back in the section about the Repo module, we saw that an ETS table is created when the Repository starts in our application.
Now is where that ETS table comes into play.
When a query is executed multiple times, the query is only prepared the first time.
Note: You can learn more about PREPARE in the context of Postgres.
It is then cached inside that ETS table and fetched from there for all subsequent calls.
To see the caching in action, check this out (notice the :cached in the result, which signals that this query has been cached):
iex> MyApp.Repo.all(query) # This puts the query in the cache
# Trying to prepare the query again gets a cached version
iex> Ecto.Adapter.Queryable.prepare_query(:all, MyApp.Repo, query)
{{:cached, #Function<41.44551318/1 in Ecto.Query.Planner.query_with_cache/8>,
#Function<42.44551318/1 in Ecto.Query.Planner.query_with_cache/8>,
{5122,
%Postgrex.Query{
ref: #Reference<0.3269063957.2912157697.165169>,
name: "ecto_5122",
statement: "SELECT u0.\"name\" FROM \"users\" AS u0 WHERE (u0.\"age\" > $1)",
param_oids: [20],
param_formats: [:binary],
param_types: [Postgrex.Extensions.Int8],
columns: ["name"],
result_oids: [1043],
result_formats: [:binary],
result_types: [Postgrex.Extensions.Raw],
types: {Postgrex.DefaultTypes, #Reference<0.3269063957.2912288771.132368>},
cache: :reference
}}}, [18]}
Note that this doesn’t cache the result, only the prepared statements. Prepared statements give a large performance advantage, especially for complex queries. From the Postgres docs:
Prepared statements potentially have the largest performance advantage when a single session is being used to execute a large number of similar statements.
The performance difference will be particularly significant if the statements are complex to plan or rewrite, e.g., if the query involves a join of many tables or requires the application of several rules.
The next important module in Ecto is the Schema. Let’s take a look at it now.
Schema Module
You can use Ecto without schemas and it works just as well (as we saw above, when we referenced the table names directly).
The Schema module is responsible for defining and mapping a record’s attributes (fields and associations) from a database table to an Elixir struct.
To create a schema, we write use Ecto.Schema at the top of our module and use the schema DSL.
For example:
defmodule MyApp.Organization do
  use Ecto.Schema
  schema "organizations" do
    field :name, :string
  end
end
defmodule MyApp.User do
  use Ecto.Schema
  schema "users" do
    field :name, :string
    belongs_to :organization, MyApp.Organization
  end
end
- The use statement includes several utility functions and macros inside the module and sets some default module attributes required for Ecto to gather data from the Schema.
- The schema macro then updates some of those attributes to mark that this is a persisted schema (there is also an embedded_schema macro to deal with non-persisted schemas) and sets some other defaults, like the primary key.
- The field and belongs_to calls inside the schema block then put those fields in the module attributes (for type validation), and add the fields to the struct defined by the module.
Ecto.Schema also generates reflection functions on the schema module to fetch field details. For example:
iex(62)> MyApp.User.__schema__(:source)
"users"
iex(63)> MyApp.User.__schema__(:fields)
[:id, :name, :organization_id]
iex(64)> MyApp.User.__schema__(:primary_key)
[:id]
iex(65)> MyApp.User.__schema__(:associations)
[:organization]
iex(66)> MyApp.User.__schema__(:association, :organization)
%Ecto.Association.BelongsTo{
field: :organization,
owner: MyApp.User,
related: MyApp.Organization,
owner_key: :organization_id,
related_key: :id,
queryable: MyApp.Organization,
on_cast: nil,
on_replace: :raise,
where: [],
defaults: [],
cardinality: :one,
relationship: :parent,
unique: true,
ordered: false
}
This __schema__ function is also the entry-point for other parts of Ecto to reflect on more details about the defined schema and perform operations on it.
For example, when a schema is used as the source of a query, Ecto uses it to validate the conditions in the where clause, and cast the data returned from the database to Elixir structs.
This results in much better feedback when there’s something wrong.
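To make the reflection idea concrete, here is a rough sketch of a field check built on __schema__(:type, field), which returns nil for fields the schema does not define. The ReflectionDemo modules and the check_field/2 helper are hypothetical. This is not Ecto’s actual planner code, only an illustration of the kind of lookup that reflection makes possible.

```elixir
# Assumption: standalone script. In a Mix project, Ecto is already a dependency.
Mix.install([{:ecto, "~> 3.10"}])

defmodule ReflectionDemo.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    field :age, :integer
  end
end

defmodule ReflectionDemo do
  # Hypothetical helper: looks up a field's type through schema reflection.
  def check_field(schema, field) do
    case schema.__schema__(:type, field) do
      nil -> {:error, "field #{inspect(field)} does not exist in schema #{inspect(schema)}"}
      type -> {:ok, type}
    end
  end
end

IO.inspect(ReflectionDemo.check_field(ReflectionDemo.User, :age))
# {:ok, :integer}
IO.inspect(ReflectionDemo.check_field(ReflectionDemo.User, :ages))
# {:error, "field :ages does not exist in schema ReflectionDemo.User"}
```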
A Schema Module In Action on an Elixir App
Let’s try executing a query that has a typo to see the benefits of using a schema in action:
from u in "users",
select: u.id,
where: u.ages > 19
Executing this with Repo.all raises a generic Postgrex.Error:
** (Postgrex.Error) ERROR 42703 (undefined_column) column u0.ages does not exist
query: SELECT u0."id" FROM "users" AS u0 WHERE (u0."ages" > 19)
Let’s try the same query, but this time with a schema.
from u in MyApp.Accounts.User,
select: u.id,
where: u.ages > 19
As expected, this also raises an error, but it now includes the line number where it happened and has a more specific exception type:
** (Ecto.QueryError) lib/my_app/accounts.ex:109: field `ages` in `where` does not exist in schema MyApp.Accounts.User in query:
from u0 in MyApp.Accounts.User,
where: u0.ages > 19,
select: u0.id
This works because the query planner in Ecto can look at the schema’s metadata and figure out that this field doesn’t exist on the schema even before hitting the database. Ecto also does type conversions behind the scenes when using the schema. For example, it allows us to run this query:
id = "2"
query = from u in MyApp.Accounts.User, where: u.id == ^id
Repo.all(query)
On the other hand, if you are not using a schema, a similar query will raise an exception:
query = from u in "users", where: u.id == ^id, select: u.id
Repo.all(query)
** (DBConnection.EncodeError) Postgrex expected an integer in -9223372036854775808..9223372036854775807, got "2". Please make sure the value you are passing matches the definition in your table or in your query or convert the value accordingly.
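The conversion in the schema-backed query relies on Ecto knowing the field’s type. A minimal way to see that kind of cast in isolation is Ecto.Type.cast/2, shown here as a sketch outside of any query. The assumption is that it behaves like the cast applied to a pinned value once the type is known from the schema.

```elixir
# Assumption: standalone script. In a Mix project, Ecto is already a dependency.
Mix.install([{:ecto, "~> 3.10"}])

# With a known type, a numeric string is converted to an integer.
IO.inspect(Ecto.Type.cast(:id, "2"))       # {:ok, 2}
IO.inspect(Ecto.Type.cast(:integer, "2"))  # {:ok, 2}

# A value that cannot be converted is reported as an error instead.
IO.inspect(Ecto.Type.cast(:id, "two"))     # :error
```

Without a schema, Ecto has no type information for u.id, so the string is handed to Postgrex unchanged and encoding fails.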
Changeset Module
The final module that we will look at today is Changeset.
It provides an interface for validating and transforming data before it is written into a database.
Similarly to Query, Changeset provides a structured way to represent changes to data. It is most commonly used with Ecto schemas, but schemaless changesets are also possible when you don’t need a full-fledged schema.
Ecto.Changeset provides a comprehensive API to work with data.
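As a brief sketch of the schemaless variant mentioned above, cast/4 also accepts a {data, types} tuple in place of a schema struct. The SignupForm module, its fields, and the age rule are hypothetical and exist only for this illustration.

```elixir
# Assumption: standalone script. In a Mix project, Ecto is already a dependency.
Mix.install([{:ecto, "~> 3.10"}])

defmodule SignupForm do
  import Ecto.Changeset

  # With no schema, the field types are supplied by hand.
  @types %{name: :string, age: :integer}

  def changeset(params) do
    {%{}, @types}
    |> cast(params, Map.keys(@types))
    |> validate_required([:name])
    |> validate_number(:age, greater_than: 18)
  end
end

changeset = SignupForm.changeset(%{"name" => "Jane", "age" => "17"})

IO.inspect(changeset.valid?)       # false
IO.inspect(changeset.changes.age)  # 17 (cast from the string "17")
IO.inspect(Keyword.keys(changeset.errors))  # [:age]
```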
The cast/4 Changeset
Let’s start by looking at the most commonly used changeset function, cast/4:
iex> import Ecto.Changeset
iex> changeset = %MyApp.User{name: "some name"}
  |> cast(%{"name" => "", "organization_id" => "1", "foo" => "bar"}, [:name, :organization_id])
  |> validate_required(:name)
We pass initial data (the MyApp.User struct in this case) to cast followed by some parameters and the list of allowed fields.
cast figures out the type of each allowed field by looking at the schema metadata. It then casts each parameter value to that type, or adds an error to the changeset if the value cannot be cast. Parameters that are not in the allowed list (like foo here) are ignored.
For example, here we passed the string "1" as the organization_id.
But from the schema, cast can figure out that organization_id is an integer field (of type :id) and casts the value to an integer before putting the change inside the Changeset.
The validate_required/3 Changeset
The second call in the pipeline, validate_required/3, requires the field to be present (as the name suggests).
By default, it trims any strings/binaries before running validations and assumes an empty string to be blank.
Here’s what is printed out when you inspect the Changeset.
#Ecto.Changeset<
action: nil,
changes: %{organization_id: 1},
errors: [name: {"can't be blank", [validation: :required]}],
data: #MyApp.User<>,
valid?: false
>
This small summary already contains most of the information we need about the changeset.
It shows what changes were made to the initial data (organization_id was set to 1), and that the changeset is invalid. It lists all the errors.
Let’s go one step further and inspect the full map.
iex> IO.inspect(changeset, structs: false)
%{
__struct__: Ecto.Changeset,
action: nil,
changes: %{organization_id: 1},
constraints: [],
data: %{
__meta__: %{
__struct__: Ecto.Schema.Metadata,
context: nil,
prefix: nil,
schema: MyApp.User,
source: "users",
state: :built
},
__struct__: MyApp.User,
id: nil,
name: "some name",
organization: %{
__cardinality__: :one,
__field__: :organization,
__owner__: MyApp.User,
__struct__: Ecto.Association.NotLoaded
},
organization_id: nil
},
empty_values: [""],
errors: [name: {"can't be blank", [validation: :required]}],
filters: %{},
params: %{"name" => "", "organization_id" => "1", "foo" => "bar"},
prepare: [],
repo: nil,
repo_opts: [],
required: [:name],
types: %{
id: :id,
name: :string,
organization: {:assoc,
%{
__struct__: Ecto.Association.BelongsTo,
cardinality: :one,
defaults: [],
field: :organization,
on_cast: nil,
on_replace: :raise,
ordered: false,
owner: MyApp.User,
owner_key: :organization_id,
queryable: MyApp.Organization,
related: MyApp.Organization,
related_key: :id,
relationship: :parent,
unique: true,
where: []
}},
organization_id: :id
},
valid?: false,
validations: []
}
This contains much more information now.
The data and params are self-explanatory - the initial data and the params we fed to cast.
types contains additional data about the schema we are working with (fetched using the __schema__ function we saw in the previous section).
changes and errors are where it gets interesting. cast automatically converts the string organization_id to an integer because it knows from the schema types that organization_id is an integer foreign key (of type :id).
What’s also interesting is that it understands that "" is a blank value and inserts an error into the changeset from the validate_required call.
The database manipulation functions from Repo understand changesets and return errors if the changeset is invalid.
The Changeset API provides several other functions for validating data and database constraints, and dealing with associations.
These functions interact with the data and eventually update the struct that we saw above. When the changeset is fed to the Repo module’s functions, Repo performs the eventual database operations.
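Repo functions such as insert/2 and update/2 return {:error, changeset} for an invalid changeset. Since those calls need a running repo, the sketch below uses Ecto.Changeset.apply_action/2 as an in-memory stand-in that returns the same tuple shape, and traverse_errors/2 to turn the errors into a plain map. ErrorDemo is a hypothetical module, and the schemaless changeset is used only to keep the example self-contained.

```elixir
# Assumption: standalone script. In a Mix project, Ecto is already a dependency.
Mix.install([{:ecto, "~> 3.10"}])

defmodule ErrorDemo do
  import Ecto.Changeset

  def changeset(params) do
    {%{}, %{name: :string}}
    |> cast(params, [:name])
    |> validate_required([:name])
  end

  # Collapses the error tuples into a map of field => list of messages.
  def errors(changeset) do
    traverse_errors(changeset, fn {msg, _opts} -> msg end)
  end
end

result =
  %{"name" => ""}
  |> ErrorDemo.changeset()
  |> Ecto.Changeset.apply_action(:insert)

case result do
  {:ok, data} -> IO.inspect(data)
  {:error, changeset} -> IO.inspect(ErrorDemo.errors(changeset))
end
# %{name: ["can't be blank"]}
```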
Wrapping Up
In this post, we looked at the core concepts of the Ecto library:
- We started with a Schema that defined our business objects mapped to database tables.
- To get those objects out of the database, we used the Query API.
- We then used Changesets to change those objects and insert or update them in the database.
- Tying everything together is the Repo module that takes inputs from all the other modules, and finally interacts with the database to fetch data or update records.
All of these modules work together to provide a structured and safe way to interact with databases. Check out the official Ecto guide to integrate Ecto into your Elixir application. I have also linked to the source code for most of this post, so feel free to dig a little deeper.
Until next time - happy digging!
------------------
This article was originally posted on AppSignal Blog