9. Today’s theme
• Ruby3's type.
• Some people held some meetings
to discuss Ruby3's type
– Matz, soutaro, akr, ko1, mame
– Main objective: clarify matz's hidden
requirements (and compromises) for
Ruby3's type
• (Not to decide everything behind closed doors)
• We'll explain the (current) requirements
10. Agenda
• A whirlwind tour of already-proposed
"type systems" for Ruby
• Type DB: A key concept of Ruby3's
type system
• A missing part: Type profiler
12. Type-related systems for Ruby
• Steep
– Static type check
• RDL
– (Semi) static type check
• contracts.ruby
– Only dynamic check of arguments/return values
• dry-types
– Only dynamic checks of typed structs
• RubyTypeInference (by JetBrains)
– Type information extractor by dynamic analysis
• Sorbet (by Stripe)
13. RDL: Types for Ruby
• Most famous in the academic world
– Jeff Foster at Univ. of Maryland
– Accepted at OOPSLA, PLDI, and POPL!
• The gem is available
– https://github.com/plum-umd/rdl
• We evaluated RDL
– by writing type annotations for OptCarrot
14. Basics of RDL
# load RDL library
require "rdl"
class NES
# activate type annotations for RDL
extend RDL::Annotate
# type annotation before method definition
type "(?Array<String>) -> self", typecheck: :call
def initialize(conf = ARGV)
...
15. RDL type annotation
• Accepts one optional parameter typed
Array of String
• Returns self
– Always "self" for initialize method
type "(?Array<String>) -> self", typecheck: :call
def initialize(conf = ARGV)
...
16. RDL type annotation
• "typecheck" controls type check timing
– :call: when this method is called
– :now: when this method is defined
– :XXX: when "RDL.do_typecheck :XXX" is
done
– nil: no "static check" is done
• Used to type-check code that uses the method
• A "run-time check" is still done
type "(?Array<String>) -> self", typecheck: :call
def initialize(conf = ARGV)
...
17. Annotation for instance variables
• Needs type annotations for all
instance variables
class NES
# activate type annotations for RDL
extend RDL::Annotate
var_type :@cpu, "%any"
type "() -> %any", typecheck: :call
def reset
@cpu.reset
#=> receiver type %any not supported yet
...
18. Annotation for instance variables
• Needs type annotations for all
instance variables
class NES
# activate type annotations for RDL
extend RDL::Annotate
var_type :@cpu, "[reset: () -> %any]"
type "() -> %any", typecheck: :call
def reset
@cpu.reset
#=> receiver type [reset: () -> %any] not sup
...
19. Annotation for instance variables
• Needs type annotations for all
instance variables
class NES
# activate type annotations for RDL
extend RDL::Annotate
var_type :@cpu, "Optcarrot::CPU"
type "() -> %any", typecheck: :call
def reset
@cpu.reset
# error: no type information for
# instance method `Optcarrot::CPU#reset'
20. Annotation for instance variables
• Type check succeeded
class NES
# activate type annotations for RDL
extend RDL::Annotate
type "Optcarrot::CPU","reset","()->%any"
var_type :@cpu, "Optcarrot::CPU"
type "() -> %any", typecheck: :call
def reset
@cpu.reset
...
21. Requires many annotations...
type "() -> %bot", typecheck: :call
def reset
@cpu.reset
@apu.reset
@ppu.reset
@rom.reset
@pads.reset
@cpu.boot
@rom.load_battery
end
22. Requires many annotations...
type "() -> %bot", typecheck: nil
def reset
@cpu.reset
@apu.reset
@ppu.reset
@rom.reset
@pads.reset
@cpu.boot
@rom.load_battery
end
No static check
23. … still does not work
type "() -> %bot", typecheck: nil
def reset
...
@rom.load_battery #=> [65533]
end
# Optcarrot::CPU#reset: Return type error.…
# Method type:
# *() -> %bot
# Actual return type:
# Array
# Actual return value:
# [65533]
24. Why?
• typecheck: nil doesn't mean no check
– A dynamic check is still done
• %bot means "no-return"
– Always raises exception, process exit, etc.
– But this method returns [65533]
– In short, this is my bug in the annotation
type "() -> %bot", typecheck: nil
def reset
...
@rom.load_battery #=> [65533]
end
25. Lessons: void type
• In Ruby, a lot of methods return a meaningless value
– No intention to allow users to use the value
• What type should we use in this case?
– %any, or return nil explicitly?
• We need a "void" type
– %any for the method; it can return anything
– "don't use" for users of the method
def reset
LIBRARY_INTERNAL_ARRAY.
each { … }
end
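A small plain-Ruby sketch of the situation above. The class and method names (`Device`, `reset_explicit`) and the array contents are illustrative assumptions, not OptCarrot code. It shows that a method ending in `each` hands its internal array back to the caller unless it returns nil explicitly.

```ruby
# Hypothetical library-internal data; the contents are for illustration only.
LIBRARY_INTERNAL_ARRAY = [1, 2, 3].freeze

class Device
  attr_reader :resets

  def initialize
    @resets = 0
  end

  # The author only cares about the side effect, but Array#each
  # returns its receiver, so callers get the internal array back.
  def reset
    LIBRARY_INTERNAL_ARRAY.each { @resets += 1 }
  end

  # Returning nil explicitly is one way to avoid leaking the value.
  def reset_explicit
    LIBRARY_INTERNAL_ARRAY.each { @resets += 1 }
    nil
  end
end

leaked = Device.new.reset
```

A "void" return type would let `reset` keep its body as-is while telling users not to rely on `leaked`.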
26. RDL's programmable annotation
• RDL supports meta-programming
symbols.each do |id|
attr_reader_type id, "String"
attr_reader id
end
27. RDL's programmable annotation
• RDL supports pre-condition checks
– This can also be used to generate type annotations automatically
• I like this feature, but matz doesn't
– He wants to avoid type annotations embedded in the code
– He prefers a separate, non-Ruby type definition language (as in Steep)
pre(:belongs_to) do |name|
……
type name, "() -> #{klass}"
end
28. Summary: RDL
• Semi-static type check
– The timing is configurable
• It checks the method body
– Not only dynamic check of
arguments/return values
• The implementation is mature
– Many features actually work, great!
• Needs type annotations
• Supports meta-programming
29. Steep
• Snip: You did listen to soutaro's talk
• Completely static type check
• Separated type definition language
– .rbi
– But also requires (minimal?) type
annotation embedded in .rb files
30. Digest: contracts.ruby
require 'contracts'
class Example
include Contracts::Core
include Contracts::Builtin
Contract Num => Num
def double(x)
x * 2
end
end
• RDL-like type annotation
– Run-time type check
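To illustrate what "run-time type check of arguments/return values" means, here is a toy sketch in plain Ruby. It is not how contracts.ruby is implemented; `ToyContract` and `checked` are hypothetical names invented for this example, and the wrapper is built with `Module#prepend`.

```ruby
# A toy run-time contract, NOT the contracts.ruby implementation.
# `checked` is a hypothetical helper defined here for illustration.
module ToyContract
  def checked(name, arg_classes, ret_class)
    wrapper = Module.new do
      define_method(name) do |*args|
        # Check each argument against its declared class.
        args.zip(arg_classes).each do |arg, klass|
          raise TypeError, "#{name}: expected #{klass}, got #{arg.class}" unless arg.is_a?(klass)
        end
        result = super(*args)
        # Check the return value after the original method has run.
        raise TypeError, "#{name}: expected return #{ret_class}, got #{result.class}" unless result.is_a?(ret_class)
        result
      end
    end
    prepend wrapper
  end
end

class Example
  extend ToyContract

  def double(x)
    x * 2
  end
  checked :double, [Numeric], Numeric
end
```

`Example.new.double(21)` passes both checks, while `Example.new.double("ab")` raises TypeError before the method body runs. Nothing is checked until the method is actually called.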
31. Digest: dry-types
require 'dry-types'
require 'dry-struct'
module Types
include Dry::Types.module
end
class User < Dry::Struct
attribute :name, Types::String
attribute :age, Types::Integer
end
• Can define structs with typed fields
– Run-time type check
– "type_struct" gem is similar
32. Digest: RubyTypeInference
• Type information extractor by dynamic
analysis
– Run test suites under monitoring of
TracePoint API
– Hooks method call/return events, logs the passed values, and aggregates them into type information
– Used by RubyMine IDE
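The call/return hooking described above can be sketched with the TracePoint API. This is a toy extractor, not the RubyTypeInference implementation; `trace_signatures` and `Calc` are hypothetical names, and only required positional parameters are recorded to keep the sketch short.

```ruby
# A toy dynamic extractor, not the RubyTypeInference implementation.
# `trace_signatures` is a hypothetical helper name.
def trace_signatures
  signatures = Hash.new { |h, k| h[k] = { args: [], ret: [] } }

  tracer = TracePoint.new(:call, :return) do |tp|
    key = "#{tp.defined_class}##{tp.method_id}"
    if tp.event == :call
      # Read the class of each required parameter from the callee's binding.
      required = tp.parameters.select { |kind, _name| kind == :req }
      classes = required.map { |_kind, name| tp.binding.local_variable_get(name).class }
      signatures[key][:args] |= [classes]
    else
      signatures[key][:ret] |= [tp.return_value.class]
    end
  end

  # Trace only while the given block (e.g. a test run) executes.
  tracer.enable { yield }
  signatures
end

class Calc
  def add_42(x)
    x + 42
  end
end

sigs = trace_signatures do
  calc = Calc.new
  calc.add_42(1)
  calc.add_42(1.5)
end
```

After this run, `sigs["Calc#add_42"]` holds the observed argument and return classes for both calls. The result only covers the inputs that were actually exercised.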
34. Summary of Type Systems
                   Objective                 Targets                       Annotations
Steep              Static type check         Method body                   Separated (mainly)
RDL                Semi-static type check    Method body                   Embedded in code
contracts.ruby     Dynamic type check        Arguments and return values   Embedded in code
dry-types          Typed structs             Only Dry::Struct classes      Embedded in code
RubyTypeInference  Extract type information  Arguments and return values   N/A
36. Idea
• A separate type definition file is good
• But meta-programming like attr_* is
difficult to support
– Users will try to generate it programmatically
• We may want to keep code position
– To show lineno of code in type error report
– Hard to manually keep the correspondence
between type definition and code position
in .rbi file
– We may also want to keep other information
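One way to picture "keeping the code position" is a record that pairs a signature with the location reported by `Method#source_location`. This is only a sketch under assumptions: `TypeEntry`, `entry_for`, `Sample`, and the signature string are hypothetical and do not describe the actual Type DB format.

```ruby
# Hypothetical record layout; the real Type DB format is not specified here.
TypeEntry = Struct.new(:owner, :name, :signature, :file, :line, keyword_init: true)

# Build an entry and let Ruby supply the file and line of the definition,
# so nobody has to maintain the position by hand.
def entry_for(klass, name, signature)
  file, line = klass.instance_method(name).source_location
  TypeEntry.new(owner: klass.name, name: name, signature: signature, file: file, line: line)
end

class Sample
  def reset; end
end

entry = entry_for(Sample, :reset, "() -> void")
```

A type error report could then print `entry.file` and `entry.line` next to the offending signature.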
38. How to create Type DB
[Diagram: sources that feed the Type DB]
• Steep type definition: written manually, then compiled
• stdlib: already included
• Ruby code: extracted automatically by dynamic analysis (RubyTypeInference) or by a Type Profiler
40. Type Profiler
• Another way to extract type information
from Ruby code
– Alternative "RubyTypeInference"
• Is not type inference
– Type inference for Ruby is hopeless
– Conservative static type inference can extract little information
• Type profiler "guesses" type information
– It may extract wrong type information
– Assumes that user checks the result
41. Type Profilers
• There is no "one-for-all" type profiler
– Static type profiling cannot handle
ActiveRecord
– Dynamic type profiling cannot extract
syntactic features (like void type)
• We need a variety of type profilers
– For ActiveRecord by reading DB schema
– Extracting from RDoc/YARD
42. In this talk
• We prototyped three more generic
type profilers
– Static analysis 1 (SA1)
• Mainly for user-defined classes
– Static analysis 2 (SA2)
• Mainly for builtin classes
– Dynamic analysis (DA)
• Enhancement of "RubyTypeInference"
43. SA1: Idea
• Guess a type of formal parameters
based on called method names
class FooBar
def foo(...); ...; end
def bar(...); ...; end
end
def func(x) #=> x:FooBar
x.foo(1)
x.bar(2)
end
44. SA1: Prototyped algorithm
• Gather method definitions in each class/module
– FooBar={foo,bar}
• Gather method calls for each parameter
– x={foo,bar}
– Remove general methods (like #[] and #+) to reduce false positives
– Arity, parameter and return types aren't used
• Assign a class whose methods match all the calls
class FooBar
def foo(...);...;end
def bar(...);...;end
end
def func(x)
x.foo(1)
x.bar(2)
end
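The matching step of the algorithm above can be sketched as a subset test on method-name sets. This assumes the two gathering steps have already produced the tables; `guess_classes`, `GENERAL_METHODS`, and the extra class `Foo` are hypothetical names for this example, not the prototype's code.

```ruby
require "set"

# Methods that are too generic to identify a class.
# This particular list is an assumption for the sketch.
GENERAL_METHODS = Set[:[], :+, :==, :to_s].freeze

# class_methods: { "FooBar" => Set[:foo, :bar], ... }
# calls:         method names invoked on one parameter
# Returns the names of classes that define every non-general called method.
def guess_classes(class_methods, calls)
  wanted = calls.to_set - GENERAL_METHODS
  return [] if wanted.empty? # nothing useful was called on the parameter

  class_methods.select { |_name, defs| wanted.subset?(defs) }.keys
end

class_methods = {
  "FooBar" => Set[:foo, :bar],
  "Foo"    => Set[:foo],
}
guess = guess_classes(class_methods, [:foo, :bar, :+])
```

Here only `FooBar` defines both `foo` and `bar`, so it is the single candidate. A parameter on which nothing specific is called yields no candidate at all.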
45. SA1: Evaluation
• Experimented with SA1 on WEBrick
– As sample code that has many user-defined classes
• Manually checked the guessed result
– Found some common guessing failures
• Wrong result / no-match result
– No quantitative evaluation yet
46. SA1: Problem 1
• A parameter is not used
• Many methods are affected
def do_GET(req, res)
raise HTTPStatus::NotFound, "not found."
end
DefaultFileHandler#do_GET(req:#{}, res:HTTPResponse)
FileHandler#do_GET(req:#{}, res:#{})
AbstractServlet#do_GET(req:#{}, res:#{})
ProcHandler#do_GET(request:#{}, response:#{})
ERBHandler#do_GET(req:#{}, res:HTTPResponse)
47. SA1: Problem 2
• Incomplete guessing
• Cause
– the method calls req.request_uri
– Both HTTPResponse and HTTPRequest provide request_uri
HTTPProxyServer#perform_proxy_request(
req: HTTPResponse | HTTPRequest,
res: WEBrick::HTTPResponse,
req_class:#{new}, :nil)
48. (Arguable) solution?
• Exploit the name of parameter
– Create a mapping from parameter name
to type after profiling
• "req" → HTTPRequest
– Revise guessed types using the mapping
• Fixed!
DefaultFileHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
FileHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
AbstractServlet#do_GET(req:HTTPRequest, res:HTTPResponse)
ProcHandler#do_GET(request:#{}, response:#{})
ERBHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
CGIHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
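The name-based revision can be sketched as a majority vote over unambiguous guesses, followed by a pass that fills in empty or ambiguous ones. The function name `revise_by_name` and the input table are illustrative assumptions; the prototype's actual data format is not shown in the slides.

```ruby
# guesses: { "Class#method" => { param_name => [candidate class names] } }
# Builds a param-name => type mapping from unambiguous guesses (majority vote),
# then replaces empty or ambiguous guesses using that mapping.
def revise_by_name(guesses)
  votes = Hash.new { |h, k| h[k] = Hash.new(0) }
  guesses.each_value do |params|
    params.each do |name, types|
      votes[name][types.first] += 1 if types.size == 1
    end
  end
  mapping = votes.transform_values { |tally| tally.max_by { |_type, count| count }.first }

  guesses.transform_values do |params|
    params.to_h do |name, types|
      revised = types.size != 1 && mapping.key?(name) ? [mapping[name]] : types
      [name, revised]
    end
  end
end

# Illustrative input, loosely modeled on the WEBrick results above.
guesses = {
  "CGIHandler#do_GET"                     => { "req" => ["HTTPRequest"], "res" => ["HTTPResponse"] },
  "AbstractServlet#do_GET"                => { "req" => [], "res" => [] },
  "ProcHandler#do_GET"                    => { "request" => [], "response" => [] },
  "HTTPProxyServer#perform_proxy_request" => { "req" => ["HTTPResponse", "HTTPRequest"] },
}
revised = revise_by_name(guesses)
```

Parameters named `request` and `response` stay unresolved because no unambiguous guess uses those names, which matches the ProcHandler line above. The heuristic is also only as good as the naming discipline of the code base.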
49. SA1: Problem 3
• Cannot guess return type
• Can guess in only limited cases
– Returns formal parameter
– Returns a literal or "Foo.new"
– Returns an expression that is already included in the Type DB
• See actual usage of the method?
– Requires inter-procedural or
whole-program analysis!
50. SA1: Pros/Cons
• Pros
– No need to run tests
– Can guess void type
• Cons
– Hard when parameters are not used
• This is not a rare case
– Heuristics may work, but can cause wrong guesses
51. SA2: Idea
• I believe this method expects Numeric!
def add_42(x) #=> (x:Num)=>Num
x + 42
end
52. SA2: Prototyped algorithm
• Limited type DB of stdlib
– Num#+(Num) → Num
– Str#+(Str) → Str, etc.
• "Unification-based type-inference" inspired algorithm
– Searches "α#+(Num) → β"
– Matches "Num#+(Num) → Num"
• Type substitution: α=Num, β=Num
x + 42
53. SA2: Prototyped algorithm (2)
• When multiple candidates are found
– Matches:
• Num#<<(Num) → Num
• Str#<<(Num) → Str
• Array[α]#<<(α) → Array[α]
– Just take the union type of them
• (Overloaded types might be better)
def push_42(x)
x << 42
end
#=> (x:(Num|Str|Array))=>(Num|Str|Array)
x << 42
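The lookup described on the last two slides can be sketched as a search over a small table. The table layout and the name `solve` are assumptions made for this example, and the Array entry drops its element type (as the guessed signature above does) rather than modeling Array[α].

```ruby
# A tiny hand-written type DB; entries are [receiver, method, argument, result].
# The shape of this table is an assumption for the sketch.
TYPE_DB = [
  [:Num,   :+,  :Num, :Num],
  [:Str,   :+,  :Str, :Str],
  [:Num,   :<<, :Num, :Num],
  [:Str,   :<<, :Num, :Str],
  [:Array, :<<, :Num, :Array],
].freeze

# Solve "α#meth(arg) → β": collect every entry whose method and argument
# match, and return the union of receiver types (α) and result types (β).
def solve(meth, arg)
  matches = TYPE_DB.select { |_recv, m, a, _ret| m == meth && a == arg }
  { receiver: matches.map { |e| e[0] }.uniq, result: matches.map { |e| e[3] }.uniq }
end

plus  = solve(:+, :Num)   # x + 42
shift = solve(:<<, :Num)  # x << 42
```

For `x + 42` there is a single match, so α and β are both Num. For `x << 42` three entries match and the sketch returns the union Num|Str|Array for both.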
54. SA2: Evaluation
• Experimented with SA2 on OptCarrot
– As sample code that uses many builtin types
• Manually checked the guessed result
– Found some common guessing failures
• Wrong result / no-match result
– No quantitative evaluation yet
55. SA2: Problem 1
• Surprising result
– Counterintuitive, but actually it works
with @fetch:Array[Num|Str]
def peek16(addr)
@fetch[addr] + (@fetch[addr + 1] << 8)
end
# Optcarrot::CPU#peek16(Num) => (Num|Str)
56. SA2: Problem 2
• Difficult to handle type parameters
– Requires constraint-based type-inference
@ary = [] # Array[α]
@ary[0] = 1 # unified to Array[Num]
@ary[1] = "str" # cannot unify Num and Str
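The failure above can be reproduced with a very small unifier. This is a sketch under assumptions: lowercase symbols stand for type variables, capitalized symbols for concrete types, substitutions are resolved one step only, and `unify`, `type_var?`, and `UnifyError` are names invented here.

```ruby
class UnifyError < StandardError; end

# Convention for this sketch: lowercase symbols (:a) are type variables,
# capitalized symbols (:Num, :Str) are concrete types.
def type_var?(type)
  type.to_s.match?(/\A[a-z]/)
end

# Unify two types under the substitution `subst` and return the extended
# substitution, or raise when two different concrete types meet.
def unify(left, right, subst)
  left = subst.fetch(left, left)
  right = subst.fetch(right, right)
  return subst if left == right
  return subst.merge(left => right) if type_var?(left)
  return subst.merge(right => left) if type_var?(right)

  raise UnifyError, "cannot unify #{left} and #{right}"
end

subst = unify(:a, :Num, {})  # @ary[0] = 1      binds a to Num
begin
  unify(:a, :Str, subst)     # @ary[1] = "str"  clashes with Num
rescue UnifyError => e
  failure = e.message
end
```

Once the element variable is bound to Num, the second assignment has nowhere to go. A constraint-based approach would instead collect both constraints and produce Array[Num|Str].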
57. SA2: Pros/Cons
• Pros
– No need to run tests
– Can guess void type
– Can guess parameters that are not used as a receiver
• Cons
– Causes wrong guesses
– Hard to handle type parameters (Array[α])
– Hard to scale
• The bigger the type DB is, the more wrong results occur
58. DA: Idea
• Records actual inputs/outputs of methods by using the TracePoint API
– The same as RubyTypeInference
• Additional features
– Support block types
• Required enhancement of TracePoint API
– Support container types: Array[Int]
• By sampling elements
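Element sampling for container types can be sketched as follows. `describe_type` and its `limit` parameter are hypothetical, and the sketch simply looks at the first few elements. How the prototype chooses its sample is not stated in the slides.

```ruby
# Describe a value's type; for arrays, inspect at most `limit` elements
# instead of walking the whole array. `limit` is an illustrative parameter.
def describe_type(value, limit: 3)
  return value.class.name unless value.is_a?(Array)

  sample = value.first(limit)
  elems = sample.map { |e| describe_type(e, limit: limit) }.uniq
  elems.empty? ? "Array[?]" : "Array[#{elems.join('|')}]"
end

ints   = describe_type([1, 2, 3])
mixed  = describe_type([1, "a"])
missed = describe_type([1, 2, 3, "x"], limit: 3)
```

The last call shows the trade-off: the String element sits outside the sample, so the profiler reports an array of Integer only. This is a guess that the user is expected to check.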
60. DA: Problem 1
• Very slow (in some cases)
– Recording OptCarrot may take hours
– Element-sampling for Array made it faster, but it still takes a few minutes
• Without tracing, it runs in a few seconds
– It may depend on application
• Profiling WEBrick is not so slow
61. DA: Problem 2
• Cannot guess void type
– Many methods return garbage
– DA cannot distinguish garbage and
intended return value
• SA can guess void type by heuristic
– Integer#times, Array#each, etc.
– if statement that has no "else"
– while and until statements
– Multiple assignment
• (Steep scaffold now supports some of them)
62. DA: Problem 3
• Some tests confuse the result
– Need to ignore error-handling tests by cooperating with the test framework
assert_raise(TypeError) { … }
63. DA: Pros/Cons
• Pros
– Easy to implement, and robust
– It can profile any programs
• Including meta-programming like
ActiveRecord
• Cons
– Need to run tests; it might be very slow
– Hard to handle void type
– TracePoint API is not enough yet
– Need to cooperate with test frameworks
64. Conclusion
• Reviewed already-proposed type
systems for Ruby
– Whose implementations are available
• Type DB: Ruby3's key concept
• Some prototypes and experiments of
type profilers
– Need more improvements / experiments!