WaterLily

Introduction and Quickstart

WaterLily (Module)

WaterLily.jl

(Animation: WaterLily flow simulation)

Overview

WaterLily.jl is a simple and fast fluid simulator written in pure Julia. This project is supported by awesome libraries developed within the Julia scientific community, and it aims to accelerate and enhance fluid simulations. Watch the JuliaCon2024 talk here:

(Video: JuliaCon2024 talk)

If you have used WaterLily for research, please cite us! The 2024 paper describes the main features of the solver and provides benchmarking, validation, and profiling results.

@misc{WeymouthFont2024,
    title         = {WaterLily.jl: A differentiable and backend-agnostic Julia solver to simulate incompressible viscous flow and dynamic bodies},
    author        = {Gabriel D. Weymouth and Bernat Font},
    url           = {https://arxiv.org/abs/2407.16032},
    eprint        = {2407.16032},
    archivePrefix = {arXiv},
    year          = {2024},
    primaryClass  = {physics.flu-dyn}
}

Method/capabilities

WaterLily solves the unsteady incompressible 2D or 3D Navier-Stokes equations on a Cartesian grid. The pressure Poisson equation is solved with a geometric multigrid method. Solid boundaries are modelled using the Boundary Data Immersion Method. The solver can run on serial CPU, multi-threaded CPU, or GPU backends.

Example: Flow over a circle

WaterLily lets the user set the domain size and boundary conditions, set the fluid viscosity (which determines the Reynolds number), and immerse solid obstacles. A large selection of examples, notebooks, and tutorials can be found in the WaterLily-Examples repository. Here, we will illustrate the basics by simulating and plotting the flow over a circle.

We define the size of the simulation domain as n by m cells. The circle has radius m/8 and is centered at (m/2,m/2). The flow boundary conditions are (U,0), where we set U=1, and the Reynolds number is Re=U*2radius/ν (based on the diameter, as in the code below), where ν (Greek "nu" U+03BD, not Latin lowercase "v") is the kinematic viscosity of the fluid.

using WaterLily
function circle(n,m;Re=100,U=1)
    # signed distance function to circle
    radius, center = m/8, m/2-1
    sdf(x,t) = √sum(abs2, x .- center) - radius

    Simulation((n,m),   # domain size
               (U,0),   # domain velocity (& velocity scale)
               2radius; # length scale
               ν=U*2radius/Re,     # fluid viscosity
               body=AutoBody(sdf)) # geometry
end

The circle geometry is defined using a signed distance function. The AutoBody constructor uses automatic differentiation to infer the other geometric parameters of the body. Replace the circle's distance function with any other, and now you have the flow around something else... such as a donut or the Julia logo. For more complex geometries, ParametricBodies.jl defines a body using any parametric curve, such as a spline. See that repo (and the video above) for examples.
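
For instance, here is a minimal sketch (the square function and its sizes are ours, not from the examples repo) that swaps the circle's distance function for an axis-aligned square of half-width m/8, keeping everything else the same:

using WaterLily
function square(n,m;Re=100,U=1)
    # signed distance to an axis-aligned square (Chebyshev form; exact on the faces)
    h, center = m/8, m/2-1
    sdf(x,t) = maximum(abs.(x .- center)) - h

    Simulation((n,m),(U,0),2h;ν=U*2h/Re,body=AutoBody(sdf))
end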

The code block above returns a Simulation with the parameters we've defined. Now we can initialize a simulation (first line) and step it forward in time (second line):

circ = circle(3*2^5,2^6)
sim_step!(circ)

Note we've set n,m to be multiples of powers of 2, which is important when using the (very fast) geometric multi-grid solver.

We can now access and plot whatever variables we like. For example, we can plot the x-component of the velocity field using

using Plots
u = circ.flow.u[:,:,1] # first component is x
contourf(u') # transpose the array for the plot

Initial velocity field

As you can see, the velocity within the circle is zero, the velocity far from the circle is one, and there are accelerated and decelerated regions around the circle. The sim_step! has only taken a single time step, and this initial flow around our circle looks similar to the potential flow because the viscous boundary layer has not separated yet.

A set of flow metric functions has been implemented, and we can use them to measure the simulation. The following code block defines a function that steps the simulation to time t and then uses the pressure_force metric to measure the force on the immersed body. The function is applied over a time range, and the forces are plotted.

function get_forces!(sim,t)
    sim_step!(sim,t,remeasure=false)
    force = WaterLily.pressure_force(sim)
    force./(0.5sim.L*sim.U^2) # scale the forces!
end

# Simulate through the time range and get forces
time = 1:0.1:50 # time scale is sim.L/sim.U
forces = [get_forces!(circ,t) for t in time];

#Plot it
plot(time,[first.(forces), last.(forces)],
    labels=permutedims(["drag","lift"]),
    xlabel="tU/L",
    ylabel="Pressure force coefficients")

Pressure forces

We can also plot the vorticity field instead of the u-velocity to see a snapshot of the wake.

# Use curl(velocity) to compute vorticity `inside` the domain
ω = zeros(size(u));
@inside ω[I] = WaterLily.curl(3,I,circ.flow.u)*circ.L/circ.U

# Plot it
clims = (-6,6)
contourf(clamp.(ω,clims...)'; clims,
    color=palette(:RdBu,9),linewidth=0,levels=8,
    aspect_ratio=:equal,border=:none)

Vorticity field

As you can see, WaterLily correctly predicts that the flow is unsteady, with an alternating vortex street wake, leading to an oscillating side force and drag force.

Multi-threading and GPU backends

WaterLily uses KernelAbstractions.jl to multi-thread on CPU and run on GPU backends. The implementation method and speed-up are documented in the 2024 paper, with costs as low as 1.44 nanoseconds measured per degree of freedom per time step!

Note that multi-threading requires starting Julia with the --threads argument, see the multi-threading section of the manual. If you are running Julia with multiple threads, KernelAbstractions will detect this and multi-thread the loops automatically.
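
For example, start Julia with julia --threads auto (or set the JULIA_NUM_THREADS environment variable) and confirm the thread count before running a simulation:

# Check that this Julia session actually has multiple threads available
println(Threads.nthreads())   # should report the number requested with --threads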

Running on a GPU requires initializing the Simulation memory on the GPU, and care needs to be taken to move the data back to the CPU for visualization. As an example, let's compare a 3D GPU simulation of a sphere to the 2D multi-threaded CPU circle defined above

using CUDA,WaterLily
function sphere(n,m;Re=100,U=1,T=Float64,mem=Array)
    radius, center = m/8, m/2-1
    body = AutoBody((x,t)->√sum(abs2, x .- center) - radius)
    Simulation((n,m,m),(U,0,0), # 3D array size and BCs
                2radius;ν=U*2radius/Re,body, # no change
                T,   # Floating point type
                mem) # memory type
end

@assert CUDA.functional()      # is your CUDA GPU working??
GPUsim = sphere(3*2^5,2^6;T=Float32,mem=CuArray); # 3D GPU sim!
println(length(GPUsim.flow.u)) # 1.3M degrees of freedom!
sim_step!(GPUsim)              # compile GPU code & run one step
@time sim_step!(GPUsim,50,remeasure=false) # 40s!!

CPUsim = circle(3*2^5,2^6);    # 2D CPU sim
println(length(CPUsim.flow.u)) # 0.013M degrees of freedom!
sim_step!(CPUsim)              # compile CPU code & run one step
println(Threads.nthreads())    # I'm using 8 threads
@time sim_step!(CPUsim,50,remeasure=false) # 28s!!

As you can see, the 3D sphere set-up is almost identical to the 2D circle, but using 3D arrays means there are almost 1.3M degrees of freedom, 100x more than in 2D. Nevertheless, the simulation is quite fast on the GPU, only around 40% slower than the much smaller 2D simulation on a CPU with 8 threads. See the 2024 paper and the examples repo for many more non-trivial examples, including running on AMD GPUs.

Finally, KernelAbstractions does incur some CPU allocations for every loop, but other than this sim_step! is completely non-allocating. This is one reason why the speed-up improves as the size of the simulation increases.
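
As a rough sketch of what to expect (reusing the circle function from the Quickstart; timings and allocation counts depend on your machine), time a step after the first compiling call:

using WaterLily
sim = circle(3*2^5,2^6)        # 2D CPU simulation from above
sim_step!(sim)                 # first call compiles
@time sim_step!(sim,sim_time(sim)+1,remeasure=false) # later steps report few allocations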

Contributing and issues

We always appreciate new contributions, so please submit a pull request with your changes and help us make WaterLily even better! Note that contributions need to be submitted together with benchmark results - WaterLily should always be fast! 😃 For this, we have a fully automated benchmarking suite that conducts performance tests. In short, to compare your changes with the latest WaterLily, clone the WaterLily-Benchmarks repo and run the benchmarks with

git clone https://github.com/WaterLily-jl/WaterLily-Benchmarks && cd WaterLily-Benchmarks
sh benchmark.sh -wd "<your/waterlily/path>" -w "<your_waterlily_branch> master"
julia --project compare.jl

This will run benchmarks for CPU and GPU backends. If you do not have a GPU, simply pass -b "Array" when running benchmark.sh. More information on the benchmark suite is available in that README.

Of course, ideas, suggestions, and questions are welcome too! Please raise an issue to address any of these.

Development goals

  • Immerse obstacles defined by 3D meshes (Meshing.jl)
  • Multi-CPU/GPU simulations (https://github.com/WaterLily-jl/WaterLily.jl/pull/141)
  • Free-surface physics with Volume-of-Fluid or other methods.
source

Types, Methods, and Functions

WaterLily.AbstractBody (Type)
AbstractBody

Immersed body Abstract Type. Any AbstractBody subtype must implement

d = sdf(body::AbstractBody, x, t=0)

and

d,n,V = measure(body::AbstractBody, x, t=0, fastd²=Inf)

where d is the signed distance from x to the body at time t, and n & V are the normal and velocity vectors implied at x. A fast-approximate method can return ≈d,zero(x),zero(x) if d^2>fastd².

source
WaterLily.AbstractPoisson (Type)
Poisson{N,M}

Composite type for conservative variable coefficient Poisson equations:

∮ds β ∂x/∂n = σ

The resulting linear system is

Ax = [L+D+L']x = z

where A is symmetric, block-tridiagonal and extremely sparse. Moreover, D[I]=-∑ᵢ(L[I,i]+L'[I,i]). This means matrix storage, multiplication, etc. can be easily implemented and optimized without external libraries.

To help iteratively solve the system above, the Poisson structure holds helper arrays for inv(D), the error ϵ, and residual r=z-Ax. An iterative solution method then estimates the error ϵ=Ã⁻¹r using an approximate inverse Ã⁻¹ and increments x+=ϵ, r-=Aϵ.

source
WaterLily.AutoBody (Type)
AutoBody(sdf,map=(x,t)->x; compose=true) <: AbstractBody
  • sdf(x::AbstractVector,t::Real)::Real: signed distance function
  • map(x::AbstractVector,t::Real)::AbstractVector: coordinate mapping function
  • compose::Bool=true: Flag for composing sdf=sdf∘map

Implicitly define a geometry by its sdf and optional coordinate map. Note: the map is composed automatically if compose=true, i.e. sdf(x,t) = sdf(map(x,t),t). Both parameters remain independent otherwise. It can be particularly helpful to set compose=false when adding multiple bodies together to create a more complex one.
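
For example, a short sketch (radius and speed are assumed, not from the docstring) of a moving circle whose velocity is recovered from the map:

using WaterLily, StaticArrays
sdf(x,t) = √sum(abs2, x) - 8     # circle of radius 8 in the mapped (body) frame
move(x,t) = x - SA[t, 0.]        # the body frame translates in +x at unit speed
body = AutoBody(sdf, move)       # velocity is inferred by differentiating the map in time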

source
WaterLily.Bodies (Type)
Bodies(bodies, ops::AbstractVector)
  • bodies::Vector{AutoBody}: Vector of AutoBody
  • ops::Vector{Function}: Vector of operators for the superposition of multiple AutoBodys

Superposes multiple body::AutoBody objects together according to the operators ops. While this can be manually performed by the operators implemented for AutoBody, adding too many bodies can yield a recursion problem of the sdf and map functions not fitting in the stack. This type implements the superposition of bodies by iteration instead of recursion, and the reduction of the sdf and map functions is done in the measure function, and not before. The operators vector ops specifies the operation to call between two consecutive AutoBodys in the bodies vector. Note that + (or the alias ∪) is the only operation supported between Bodies.
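
A small sketch (centers and radii are assumed) of superposing two circles:

using WaterLily
a = AutoBody((x,t) -> √sum(abs2, x .- 32) - 8)   # circle centered near (32,32)
b = AutoBody((x,t) -> √sum(abs2, x .- 64) - 8)   # circle centered near (64,64)
two = a + b        # the AutoBody operator is fine for a few bodies
# For long lists of bodies, build a Bodies from the vector of bodies and the ops
# between consecutive pairs (see the signature above) to avoid deep recursion.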

source
WaterLily.Flow (Type)
Flow{D::Int, T::Float, Sf<:AbstractArray{T,D}, Vf<:AbstractArray{T,D+1}, Tf<:AbstractArray{T,D+2}}

Composite type for a multidimensional immersed boundary flow simulation.

Flow solves the unsteady incompressible Navier-Stokes equations on a Cartesian grid. Solid boundaries are modelled using the Boundary Data Immersion Method. The primary variables are the scalar pressure p (an array of dimension D) and the velocity vector field u (an array of dimension D+1).

source
WaterLily.Simulation (Type)
Simulation(dims::NTuple, u_BC::Union{NTuple,Function}, L::Number;
           U=norm2(u_BC), Δt=0.25, ν=0., ϵ=1, perdir=(),
           uλ=nothing, g=nothing, exitBC=false,
           body::AbstractBody=NoBody(),
           T=Float32, mem=Array)

Constructor for a WaterLily.jl simulation:

  • dims: Simulation domain dimensions.
  • u_BC: Simulation domain velocity boundary conditions, either a tuple u_BC[i]=uᵢ, i=eachindex(dims), or a time-varying function f(i,t)
  • L: Simulation length scale.
  • U: Simulation velocity scale.
  • Δt: Initial time step.
  • ν: Scaled viscosity (Re=UL/ν).
  • g: Domain acceleration, g(i,t)=duᵢ/dt
  • ϵ: BDIM kernel width.
  • perdir: Domain periodic boundary condition in the (i,) direction.
  • exitBC: Convective exit boundary condition in the i=1 direction.
  • uλ: Function to generate the initial velocity field.
  • body: Immersed geometry.
  • T: Array element type.
  • mem: memory location. Array, CuArray, ROCArray to run on CPU, NVIDIA, or AMD devices, respectively.

See files in examples folder for examples.
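
For instance, a hedged sketch (domain size, amplitude, and Reynolds number are ours) using the f(i,t) form of u_BC for a slowly oscillating free stream:

using WaterLily
Ubc(i,t) = i==1 ? 1 + 0.1*sin(0.1t) : 0.0          # time-varying x-velocity, zero y-velocity
sim = Simulation((128,64), Ubc, 32; U=1, ν=32/250)  # give the velocity scale U explicitly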

source
WaterLily.BC! (Function)
BC!(a,A)

Apply boundary conditions to the ghost cells of a vector field. A Dirichlet condition a[I,i]=A[i] is applied to the vector component normal to the domain boundary. For example aₓ(x)=Aₓ ∀ x ∈ minmax(X). A zero Neumann condition is applied to the tangential components.

source
WaterLily.Jacobi! (Method)
Jacobi!(p::Poisson; it=1)

Jacobi smoother run it times. Note: This runs for general backends, but is very slow to converge.

source
WaterLily.apply! (Method)
apply!(f, c)

Apply a vector function f(i,x) to the faces of a uniform staggered array c or a function f(x) to the center of a uniform array c.
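
A brief sketch of both forms (array sizes are arbitrary):

using WaterLily
p = zeros(66,66)
WaterLily.apply!(x -> x[1], p)                    # scalar field: f(x) at cell centers
u = zeros(66,66,2)
WaterLily.apply!((i,x) -> i==1 ? 1.0 : 0.0, u)    # vector field: f(i,x) at cell faces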

source
WaterLily.check_nthreads (Method)
check_nthreads(::Val{1})

Check the number of threads available for the Julia session that loads WaterLily. A warning is shown when running in serial (JULIA_NUM_THREADS=1).

source
WaterLily.curl (Method)
curl(i,I,u)

Compute component i of $𝛁×𝐮$ at the edge of cell I. For example curl(3,CartesianIndex(2,2,2),u) will compute ω₃(x=1.5,y=1.5,z=2) as this edge produces the highest accuracy for this mix of cross derivatives on a staggered grid.

source
WaterLily.curvature (Method)
curvature(A::AbstractMatrix)

Return H,K the mean and Gaussian curvature from A=hessian(sdf). K=tr(minor(A)) in 3D and K=0 in 2D.

source
WaterLily.exitBC! (Method)
exitBC!(u,u⁰,U,Δt)

Apply a 1D convection scheme to fill the ghost cell on the exit of the domain.

source
WaterLily.inside (Method)
inside(a)

Return CartesianIndices range excluding a single layer of cells on all boundaries.
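
For example:

using WaterLily
p = zeros(10,10)
WaterLily.inside(p)   # CartesianIndices((2:9, 2:9))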

source
WaterLily.inside_u (Method)
inside_u(dims,j)

Return CartesianIndices range excluding the ghost-cells on the boundaries of a vector array on face j with size dims.

source
WaterLily.interp (Method)
interp(x::SVector, arr::AbstractArray)

Linear interpolation from array `arr` at index-coordinate `x`.
Note: This routine works for any number of dimensions.
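
For example, a sketch (the sample point is arbitrary) using the circ simulation from the Quickstart:

using WaterLily, StaticArrays
WaterLily.interp(SA[33.5f0, 20.25f0], circ.flow.p)   # pressure sampled between grid points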
source
WaterLily.ke (Method)
ke(I::CartesianIndex,u,U=0)

Compute $½∥𝐮-𝐔∥²$ at center of cell I where U can be used to subtract a background flow (by default, U=0).

source
WaterLily.loc (Method)
loc(i,I) = loc(Ii)

Location in space of the cell at CartesianIndex I at face i. Using i=0 returns the cell center s.t. loc = I.

source
WaterLily.logger (Function)
logger(fname="WaterLily")

Set up a logger to write the pressure solver data to a logging file named WaterLily.log.

source
WaterLily.measure! (Function)
measure!(sim::Simulation,t=timeNext(sim))

Measure a dynamic body to update the flow and pois coefficients.

source
WaterLily.measure! (Method)
measure!(flow::Flow, body::AbstractBody; t=0, ϵ=1)

Queries the body geometry to fill the arrays:

  • flow.μ₀, Zeroth kernel moment
  • flow.μ₁, First kernel moment scaled by the body normal
  • flow.V, Body velocity

at time t using an immersion kernel of size ϵ.

See Maertens & Weymouth, doi:10.1016/j.cma.2014.09.007.

source
WaterLily.measure (Method)
d,n,V = measure(body::AutoBody||Bodies,x,t;fastd²=Inf)

Determine the implicit geometric properties from the sdf and map. The gradient of d=sdf(map(x,t)) is used to improve d for pseudo-sdfs. The velocity is determined solely from the optional map function. Skips the n,V calculation when d²>fastd².
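
A short sketch (the query point is arbitrary) using the circ simulation from the Quickstart:

using WaterLily, StaticArrays
d, n, V = WaterLily.measure(circ.body, SA[40f0, 32f0], 0f0)  # distance, normal, velocity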

source
WaterLily.mult! (Method)
mult!(p::Poisson,x)

Efficient function for Poisson matrix-vector multiplication. Fills p.z = p.A x with 0 in the ghost cells.

source
WaterLily.pcg! (Method)
pcg!(p::Poisson; it=6)

Conjugate-Gradient smoother with Jacobi preconditioning. Runs at most it iterations, but will exit early if the Gram-Schmidt update parameter |α| < 1% or |r D⁻¹ r| < 1e-8. Note: This runs for general backends and is the default smoother.

source
WaterLily.perBC! (Method)
perBC!(a,perdir)

Apply periodic conditions to the ghost cells of a scalar field.

source
WaterLily.reduce_sdf_map (Method)
reduce_sdf_map(sdf_a,map_a,d_a,sdf_b,map_b,d_b,op,x,t)

Reduces two different sdf and map functions, and d value.

source
WaterLily.residual! (Method)
residual!(p::Poisson)

Computes the residual r = z-Ax and corrects it such that r = 0 if iD==0, which ensures local satisfiability, and sum(r) = 0, which ensures global satisfiability.

The global correction is done by adjusting all points uniformly, minimizing the local effect. Other approaches are possible.

Note: These corrections mean x is not strictly solving Ax=z, but without the corrections, no solution exists.

source
WaterLily.sdf_map_d (Method)
sdf_map_d(ab::Bodies,x,t)

Returns the sdf and map functions, and the distance d (d=sdf(x,t)) for the Bodies type.

source
WaterLily.sim_step! (Method)
sim_step!(sim::Simulation,t_end=sim_time(sim)+Δt;max_steps=typemax(Int),remeasure=true,verbose=false)

Integrate the simulation sim up to dimensionless time t_end. If remeasure=true, the body is remeasured at every time step. Can be set to false for static geometries to speed up simulation.

source
WaterLily.sim_time (Method)
sim_time(sim::Simulation)

Return the current dimensionless time of the simulation tU/L where t=sum(Δt), and U,L are the simulation velocity and length scales.

source
WaterLily.slice (Method)
slice(dims,i,j,low=1)

Return CartesianIndices range slicing through an array of size dims in dimension j at index i. low optionally sets the lower extent of the range in the other dimensions.

source
WaterLily.solver! (Method)
solver!(A::Poisson;log,tol,itmx)

Approximate iterative solver for the Poisson matrix equation Ax=b.

  • A: Poisson matrix with working arrays.
  • A.x: Solution vector. Can start with an initial guess.
  • A.z: Right-Hand-Side vector. Will be overwritten!
  • A.n[end]: stores the number of iterations performed.
  • log: If true, this function returns a vector holding the L₂-norm of the residual at each iteration.
  • tol: Convergence tolerance on the L₂-norm residual.
  • itmx: Maximum number of iterations.
source
WaterLily.δ (Method)
δ(i,N::Int)
δ(i,I::CartesianIndex{N}) where {N}

Return a CartesianIndex of dimension N which is one at index i and zero elsewhere.
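
For example:

using WaterLily: δ
δ(1, 2)                       # CartesianIndex(1, 0)
δ(2, CartesianIndex(3, 4))    # CartesianIndex(0, 1)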

source
WaterLily.ω (Method)
ω(I::CartesianIndex{3},u)

Compute 3-vector $𝛚=𝛁×𝐮$ at the center of cell I.

source
WaterLily.ω_θ (Method)
ω_θ(I::CartesianIndex{3},z,center,u)

Compute $𝛚⋅𝛉$ at the center of cell I where $𝛉$ is the azimuth direction around vector z passing through center.

source
WaterLily.∂ (Method)
∂(i,j,I,u)

Compute $∂uᵢ/∂xⱼ$ at center of cell I. Cross terms are computed less accurately than inline terms because of the staggered grid.

source
WaterLily.@inside (Macro)
@inside <expr>

Simple macro to automate efficient loops over cells excluding ghosts. For example,

@inside p[I] = sum(loc(0,I))

becomes

@loop p[I] = sum(loc(0,I)) over I ∈ inside(p)

See @loop.

source
WaterLily.@loop (Macro)
@loop <expr> over <I ∈ R>

Macro to automate fast loops using @simd when running in serial, or KernelAbstractions when running multi-threaded CPU or GPU.

For example

@loop a[I,i] += sum(loc(i,I)) over I ∈ R

becomes

@simd for I ∈ R
    @fastmath @inbounds a[I,i] += sum(loc(i,I))
end

on serial execution, or

@kernel function kern(a,i,@Const(I0))
    I = @index(Global,Cartesian)+I0
    @fastmath @inbounds a[I,i] += sum(loc(i,I))
end
kern(get_backend(a),64)(a,i,R[1]-oneunit(R[1]),ndrange=size(R))

when multi-threading on CPU or using CuArrays. Note that get_backend is used on the first variable in expr (a in this example).

source