Hiii, it’s ya girl.
She is sleep-deprived because she needs to shift her sleep cycle by 12 hours to prepare for JuliaCon 2023 in Boston.
So this blog post is coming to you fresh from the small hours of the morning in Australia, written to keep me awake.
Expect this to be more manic than usual.
I am here today to tell you that you are probably wrong about what top-level code in Julia does.
Though if you understand precompilation well, then you already know that.
This post will explain what the point of `__init__` even is.
Let’s say I create a package: Foo.jl.
(I’m actually doing this as I write this post. You’re welcome.)
Normally I would use PkgTemplates.jl for this.
But it’s 12:20am and I am just making a demonstration.
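For anyone following along, a minimal way to scaffold the package without PkgTemplates.jl is Pkg’s built-in `generate`. This sketch assumes you are happy to work in a throwaway temporary directory; to make `using Foo` work afterwards, start Julia with `--project=Foo` from that directory.

```julia
using Pkg

cd(mktempdir())      # a disposable working directory for the demo
Pkg.generate("Foo")  # writes Foo/Project.toml (with a fresh UUID) and Foo/src/Foo.jl
```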
And let’s give Foo/src/Foo.jl the following content:
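A minimal version of such a file, assuming `x` is a top-level `const` (as the later remarks about `const` suggest):

```julia
module Foo

# Top-level code: we might expect this to pick a new number on every load.
const x = rand(1:100)

end # module Foo
```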
So we might expect this package to set x to a random value between 1 and 100 every time it is loaded.
So let’s load the package and check the value of `Foo.x`.
Ok ok, valid. Now let’s restart Julia and check again:
And again:
So that’s kinda weird: the chance of getting the same number 3 times in a row is literally one in ten thousand (and one in a million for three of one particular number).
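The arithmetic, as exact rationals: the first roll can be anything, and each of the next two must match it with probability 1/100. One in a million is the chance of all three landing on one particular number chosen in advance.

```julia
p_any_repeat = 1 * (1 // 100) * (1 // 100)  # first roll is free, next two must match it
p_specific   = (1 // 100)^3                 # all three equal one pre-chosen number

println(p_any_repeat)  # 1//10000
println(p_specific)    # 1//1000000
```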
So something weird is happening.
It’s not re-rolling the number when the package is loaded.
_(This is where some people might guess that I forgot to seed the RNG. But that isn’t the case: Julia automatically seeds the RNG using libuv’s `uv_random` entropy source.)_
Let’s make a change to our package.
Let’s add a comment about what we have observed.
Editing Foo/src/Foo.jl to say:
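For example (the comment’s wording here is hypothetical; all that matters is that the source file changed):

```julia
module Foo

# Observation: x seems to come out the same on every load. (Hypothetical wording.)
const x = rand(1:100)

end # module Foo
```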
Ok, now let’s check it again.
Uh oh. It changed. Let’s check again.
I am guessing many readers have now figured out the trick. It changes every time the package gets precompiled.
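To reproduce this without restarting the REPL by hand, here is a sketch that builds a throwaway copy of Foo in a temporary directory, loads it in fresh Julia processes, and then edits it. It assumes the default depot (normally ~/.julia) is writable so the compile cache can be saved, and it relies on a directory listed in `JULIA_LOAD_PATH` acting as a package directory, so `using Foo` finds `Foo/src/Foo.jl` inside it. `fresh_x` is a hypothetical helper name.

```julia
using UUIDs  # stdlib, used to give the throwaway package a UUID

# Lay out a minimal package: <root>/Foo/Project.toml and <root>/Foo/src/Foo.jl
root = mktempdir()
mkpath(joinpath(root, "Foo", "src"))
write(joinpath(root, "Foo", "Project.toml"), "name = \"Foo\"\nuuid = \"$(uuid4())\"\n")
foo_jl = joinpath(root, "Foo", "src", "Foo.jl")
write(foo_jl, "module Foo\nconst x = rand(1:100)\nend\n")

# Start a brand-new Julia process, load Foo, and return the Foo.x it sees.
function fresh_x(root)
    sep = Sys.iswindows() ? ";" : ":"
    cmd = `$(Base.julia_cmd()) --startup-file=no -e "using Foo; print(Foo.x)"`
    out = withenv("JULIA_LOAD_PATH" => string(root, sep, "@stdlib")) do
        read(cmd, String)  # stdout only; precompilation messages go to stderr
    end
    return parse(Int, out)
end

a = fresh_x(root)  # first load: precompiles Foo, running its top-level code
b = fresh_x(root)  # second load: reuses the cache, so the same number comes back

# Editing the source makes the cache stale, so the next load precompiles again
# (and, with probability 99/100, picks a different number).
sleep(1)  # make sure the file's modification time visibly changes
write(foo_jl, "module Foo\n# x never changes between loads?\nconst x = rand(1:100)\nend\n")
c = fresh_x(root)

println((a, b, c))
```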
So what is going on during precompilation that does this?
Most people are probably well used to precompilation loading up the package and saving stuff like the parsed functions. People who are a bit more involved will know about using precompilation to actually save the compiled code for a particular method (especially in Julia 1.9+, where that goes all the way down to native code). Maybe you’ve used PrecompileTools to mark things to be precompiled that way. But these things are actually just special cases of what precompilation actually does.
Precompilation just runs everything, and then saves the state of the Julia runtime to disk. So yes, this does mean that all the parsed and lowered function definitions are stored, and it does mean that any functions that are called get their JIT-compiled code saved (PrecompileTools just provides helpers for a bit more control over this), because that information is part of the Julia runtime state. To be a bit more precise, precompilation doesn’t store the whole runtime state; it stores the subset of it that is owned or extended by this module.
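A small illustration, using a hypothetical module `Bar`: in a package, these top-level statements would run during precompilation, so the compiled code for `double(::Int)` becomes part of the state saved for `Bar`. Run as a plain script, the same code just compiles in memory.

```julia
module Bar  # hypothetical stand-in for a real package

double(n::Integer) = 2n

# Top-level statements: during precompilation they run like any others.
precompile(double, (Int,))  # compile the method for Int without calling it
const ANSWER = double(21)   # calling it at top level also compiles it

end # module Bar
```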
Then whenever you load a package, that saved state is loaded up.
The top-level source code in the package is not run again (until the next time the package is precompiled).
When you call a function, its code is run not from the file, but out of the state that was stored during precompilation.
Which is where `__init__` comes in.
The code in `__init__`, unlike the top-level code of the module, is run whenever the module is loaded.
So let’s fix our package.
Editing Foo/src/Foo.jl again:
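A version consistent with the description that follows, where `x` is defined as a `const` from inside `__init__` via `@eval` (the exact original listing is assumed):

```julia
module Foo

# __init__ runs every time Foo is loaded, including loads from the
# precompile cache, unlike the module's other top-level code.
function __init__()
    # @eval evaluates in Foo's global scope, so this creates a global const x.
    @eval const x = rand(1:100)
end

end # module Foo
```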
We use `@eval` because, when run, it evaluates its expression in the module’s global scope.
(We can’t use `global x = rand(1:100)`, since that wouldn’t make it `const`.)
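A quick way to see that difference, using two hypothetical modules: `isconst` reports whether a module’s binding is constant.

```julia
module WithGlobal  # hypothetical
function __init__()
    global x = rand(1:100)  # creates WithGlobal.x, but as an ordinary non-const global
end
end

module WithEval  # hypothetical
function __init__()
    @eval const x = rand(1:100)  # evaluated in WithEval's global scope, so x is const
end
end

println(isconst(WithGlobal, :x))  # false
println(isconst(WithEval, :x))    # true
```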
Is it great to use `@eval` in this way? I will leave that for a future post (that’s kinda what prompted this post in the first place: I wanted to talk about `@eval`, but I needed people to understand precompilation first).
It’s kinda moot, as this particular use case is so weird it doesn’t truly matter.
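For comparison, a common way to compute a value at load time without `@eval` is a `const` `Ref` filled in by `__init__`, similar in shape to the example in the Julia manual’s section on module initialization and precompilation. Applied to this example as a sketch (the module name is hypothetical): the binding `x` stays `const`, only its contents change, and readers write `FooRef.x[]` instead of `Foo.x`.

```julia
module FooRef  # hypothetical variant of Foo

# The binding is const; the Ref's contents are filled in at load time.
const x = Ref{Int}(0)

function __init__()
    x[] = rand(1:100)  # runs on every load, so each session gets a fresh roll
end

end # module FooRef
```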
And checking that restarting Julia re-rolls it:
And we are all good.
There are some interesting consequences of the way constants defined at the top level get precompiled.
The fact that precompilation actually just stores (more or less) the whole state of the Julia runtime is a whole thing.
This is why we can do things like using `@eval` to generate lots of different methods, and have them all end up just as performant as if we had written them out by hand.
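A minimal sketch of that idea, with a hypothetical module: the loop writes out three methods we would otherwise type by hand. Inside a package the loop runs at precompile time, and what gets cached is just three ordinary methods.

```julia
module Polygons  # hypothetical module

# Generate one zero-argument method per entry.
for (name, sides) in ((:triangle, 3), (:square, 4), (:hexagon, 6))
    fname = Symbol(name, :_sides)  # e.g. :triangle_sides
    @eval $fname() = $sides        # same as writing `triangle_sides() = 3` by hand
end

end # module Polygons
```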
This can be used for data-driven code.
It has been suggested to use this for TimeZones.jl, rather than its current approach of manually serializing the data and then loading it in `__init__`.
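A rough sketch of the two styles, using a tiny hypothetical table of UTC offsets embedded as a string (a real package would more likely read a data file). This is not TimeZones.jl’s actual code, only an illustration of where the work happens: `BAKED` is built by top-level code, so in a package it would be computed once during precompilation and stored in the cache, while `LOADED` is rebuilt by `__init__` on every load.

```julia
module OffsetTable  # hypothetical module, not TimeZones.jl's implementation

# Hypothetical raw data: zone name and UTC offset in minutes.
const RAW = """
UTC 0
AEST 600
EST -300
"""

function parse_table(s::AbstractString)
    table = Dict{String,Int}()
    for line in eachline(IOBuffer(s))
        name, minutes = split(line)
        table[String(name)] = parse(Int, minutes)
    end
    return table
end

# Style 1: top-level code, evaluated at precompile time in a package.
const BAKED = parse_table(RAW)

# Style 2: rebuild the data every time the module is loaded.
const LOADED = Dict{String,Int}()
function __init__()
    merge!(LOADED, parse_table(RAW))
end

end # module OffsetTable
```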
I have also seen cases where hugely complex global values are defined, which blow up the size of the compile cache to many gigabytes.
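To get a feel for the sizes involved, `Base.summarysize` reports roughly how many bytes a value occupies. The module below is hypothetical; in a package, a top-level `const` like this would be computed during precompilation and its contents written into the cache.

```julia
module Heavy  # hypothetical module

# A top-level constant holding a million Float64s (about 8 MB of data).
const TABLE = rand(10^6)

end # module Heavy

println(Base.summarysize(Heavy.TABLE))  # a bit over 8_000_000 bytes
```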
I hope this post has been illuminating. It’s served its purpose of keeping me awake. It’s now almost 4am, so I am allowed to get ready for bed.