I am an OOP programmer going back to the late 80s (including the cfront days of C++), and a serious user of Python since 2007.
In Python, I sometimes try data-oriented programming, using lists and dicts to structure data. And I find that it does not work well. Once I get two or more levels of nesting, I find it far too easy to get confused about which level I'm on, which is not helped by Python's lack of strong typing. In these situations, I often introduce objects that wrap the map or dict, and have methods that make sense for that level. In other words, the objects can be viewed as providing clear documentation for the whole nested structure, and how it can be navigated.
>Once I get two or more levels of nesting, I find it far too easy to get confused about which level I'm on
Author here, I agree with you. I have the working memory of a small pigeon.
The flavor of data orientation we cover in the book leverages strongly typed representations of data (as opposed to using hash maps everywhere). So you'll always know what shape it's in (and the compiler enforces it!). We spend a lot of time exploring the role that the type system can play in our programming and how we represent data.
Given the strongly typed flavour of data oriented programming, I wonder if you have any thoughts on the "proliferation of types" problem. How to avoid, especially in a nominally typed language like Java, an explosion of aggregate types for every context where there may be a slight change in what fields are present, what their types are, and which ones are optional. Basically, Rich Hickey's Maybe Not talk.
record Make(int makeId, String name) {}
record Model(int modelId, String name) {}
record Car(Make make, Model model, int year) {}
record Car(int makeId, int modelId, int year) {}
record Car(Make make, Model model) {}
record Car(int makeId, int modelId) {}
record Car(Make make, int year) {}
record Car(int makeId, int year) {}
record Car(Make make, Model model, int year, String colour) {}
record Car(int makeId, int modelId, int year, String colour) {}
record Car(int year, String colour) {}
....
Or, in a sane world, code-generate a bunch of constructors.
In the field of ontology (say OWL and RDF) there is a very different viewpoint about ‘Classes’, in which objects gain classes as they gain attributes. :Taylor_Swift is a :Person because she has a :birthDate, :birthPlace and such, but was not initially a :Musician until she :playsInstrument, :recordedTrack, :performedConcert and such. Most languages have object systems, like Java’s or C++’s, where a Person can’t start out as not a Musician and become one later, the way people can in real life.
Notably, in a system like this the terrible asymmetry of where an attribute really belongs is resolved: as in real life, you don’t have to say whether it is primary that Taylor Swift recorded the album Fearless or that Fearless was recorded by Taylor Swift.
It’s a really fascinating question in my mind how you create a ‘meta object facility’ that puts a more powerful object system at your fingertips in a language like Java or Python. For instance, you can have something like
taylorSwift.as(Musician.class)
which returns something that implements the Musician interface if the underlying object has the attributes that make it one.
What I am talking about is more dynamic, although meta-objects could be made more static too.
Particularly, I am not a Musician now but if I learned to play an instrument or performed at a concert I could become a Musician. This could be implemented as
paulHoule.isA(Musician.class)                     // false
paulHoule.as(Musician.class).playsInstruments()   // an empty Set<Instrument>
paulHoule.as(Musician.class).playsInstruments().add(trumpet)
paulHoule.isA(Musician.class)                     // now true
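Here is a minimal, hypothetical Python sketch of that behaviour (MetaObject, _View, Musician and all attribute names are invented; a real RDF-backed facility would be far richer):

class MetaObject:
    def __init__(self, **attrs):
        self._attrs = dict(attrs)

    def is_a(self, cls):
        # An object "is a" cls once at least one of the class's
        # characteristic attributes is actually populated.
        return any(self._attrs.get(name) for name in cls.TRAITS)

    def as_(self, cls):
        # Viewing an object as a class lazily creates empty slots for
        # that class's attributes; filling them changes what it "is".
        for name, empty in cls.TRAITS.items():
            self._attrs.setdefault(name, empty())
        return _View(self._attrs, cls)

class _View:
    def __init__(self, attrs, cls):
        self._attrs, self._cls = attrs, cls

    def __getattr__(self, name):
        if name in self._cls.TRAITS:
            return self._attrs[name]
        raise AttributeError(name)

class Musician:
    TRAITS = {"plays_instruments": set}  # attribute -> empty-value factory

paul = MetaObject(birth_date="1970-01-01")
print(paul.is_a(Musician))                    # False
print(paul.as_(Musician).plays_instruments)   # set()
paul.as_(Musician).plays_instruments.add("trumpet")
print(paul.is_a(Musician))                    # True

The key move is that class membership is a query over which attributes are populated, not a fact fixed at construction time.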
I really did build a meta object facility along these lines: it represented objects in an RDF graph and provided an API in Python that made those objects look mostly Pythonic. Inheritance in MOF is like Java's, so I didn't need to use any tricks to make dynamic classes (possible in RDF) available.
I haven't yet had the luxury to experiment with the latest version of Java, but this is one of the reasons why I wish Java introduced named parameters the same way Kotlin and Scala do.
Eg:
data class Make(val makeId: String, val name: String)
data class Model(val modelId: String, val name: String)
data class Car(val make: Make, val model: Model, val year: String, ...)
Now you can go ahead and order the params whichever way you wish so long as you're explicitly naming them:
val v1 = Car(make = myMake1, model = myModel1, year = "2023", ...)
val v2 = Car(model = myModel1, make = myMake1, year = "2023", ...)
Once withers land, I think you could approximate this by letting your record class have a zero argument constructor which sets every field to some blank value, and then fill the fields using `with`.
var x = new Car() with { make = "Volvo"; year = "2023"; };
If you want the Car constructor to enforce constraints, you could use this pattern in a separate Builder record:
import java.util.Objects;

record Car(String make, String year) {
    Car {
        Objects.requireNonNull(make);
        Objects.requireNonNull(year);
    }

    record Builder(String make, String year) {
        Builder() {
            this(null, null);
        }

        Car build() {
            return new Car(make, year);
        }
    }
}
var x = (new Car.Builder() with { make = "Volvo"; year = "2023"; }).build();
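For what it's worth, this "start blank, then fill fields on a copy" style already exists in today's Python via dataclasses.replace on frozen dataclasses; a rough sketch mirroring the Car above:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Car:
    make: str = ""
    year: str = ""

# Start from a blank value and "with" the fields you care about:
x = replace(Car(), make="Volvo", year="2023")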
So much syntax to enable something that other languages have had for 10+ years. That's why I can't take the "Java is as good as Kotlin now" arguments seriously.
I love that talk (and most of Rich's stuff). I consider myself a Clojure fanboy that got converted to the dark side of strong static typing.
I think, to some degree, he actually answers that question as part of his talk (in between beating up nominal types). Optionality often pops up in place of understanding (or representing) that data has a context. If you model your program so that it has "15 maybe sheep," then... you'll have 15 "maybe sheep" you've got to deal with.
The possible combinations of all data types that could be made are very different from the subset that actually expresses itself in our programs. Meaning, the actual "explosion" is fairly constrained in practice because (most) businesses can't function under combinatorial pressures. There's some stuff that matters, and some stuff that doesn't. We only have to apply typing rigor to the stuff that matters.
Where I do find type explosions tedious and annoying is not in expressing every possible combination, but in trying to express the slow accretion of information. (I think he talks about this in one of his talks, too). Invoice, then InvoiceWithCustomer, then InvoiceWithCustomerAndId, etc... the world that microservices have doomed us to representing.
I don't know a good way to model that without intersection types or something like rows in PureScript. In Java, it's a pain point for sure.
My sense is that what's needed is a generalization of the kinds of features offered by TypeScript for mapping types to new types (e.g. Partial<T>) "arithmetically".
For example, what I often really want to directly express is "T but minus/plus this field", with the transformations that attach or detach fields automated.
In an ideal world I would like to define what a "base" domain object is shaped like, and then express the differences from it I care about (optionalizing, adding, removing, etc).
For example, I might have a Widget that must always have an ID but when I am creating a new Widget I could just write "Widget - {.id}" rather than have to define an entire WidgetCreateDTO or some such.
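Python, for comparison, can only approximate this at runtime, not in the static type system. A hypothetical sketch with dataclasses (Widget and the minus helper are invented names):

import dataclasses

@dataclasses.dataclass
class Widget:
    id: int
    name: str
    price: float

def minus(cls, *dropped):
    # Derive a new dataclass with every field of cls except `dropped`:
    # a runtime stand-in for TypeScript's Omit<T, K>.
    kept = [(f.name, f.type) for f in dataclasses.fields(cls)
            if f.name not in dropped]
    return dataclasses.make_dataclass(cls.__name__ + "Create", kept)

WidgetCreate = minus(Widget, "id")             # roughly "Widget - {.id}"
draft = WidgetCreate(name="gizmo", price=9.99)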
> For example, I might have a Widget that must always have an ID but when I am creating a new Widget I could just write "Widget - {.id}" rather than have to define an entire WidgetCreateDTO or some such.
In this case you're preferring terseness over a true representation of the meaning of the type. Assuming that a Widget needs an ID, having another type to express Widget creation data makes sense: it's more verbose, but it does represent the actual functioning better. You pass data that will be used to create a valid Widget in its own type (your WidgetCreationDTO), getting a Widget as a result of the action.
> Assuming that a Widget needs an ID, having another type to express Widget creation data makes sense: it's more verbose, but it does represent the actual functioning better
I agree with this logically. The problem is that the proliferation of such types for various use cases is extremely detrimental to the development process (many more places need to be updated) and it's all too easy for a change to be improperly propagated.
What you're saying is correct and appropriate I think for mature codebases with "settled" domains and projects with mature testing and QA processes that are well into maintenance over exploration/iteration. But on the way there, the overhead induced by a single domain object whose exact definition is unstable potentially proliferating a dozen types is developmentally/procedurally toxic.
To put a finer point on it: be fully explicit when rate of change is expected to be slow, but when rate of change is expected to be high favor making changes easy.
> What you're saying is correct and appropriate I think for mature codebases with "settled" domains and projects with mature testing and QA processes that are well into maintenance over exploration/iteration. But on the way there, the overhead induced by a single domain object whose exact definition is unstable potentially proliferating a dozen types is developmentally/procedurally toxic.
> To put a finer point on it: be fully explicit when rate of change is expected to be slow, but when rate of change is expected to be high favor making changes easy.
I agree with the gist of it; at the same time, I've worked on many projects which did not care about defining a difference between those types of data in their beginning, and since they naturally change fast they accrued a large amount of technical debt quickly. Even more so when those projects were in dynamically typed languages like Python or Ruby: relying just on test cases to do rather big refactorings to extricate those logical parts is quite cumbersome, leading to avoidance of refactoring into proper data structures afterwards.
Through experience I believe you need to strike a balance: if the project is in fluid motion you do need to care more about ease of change until it settles, but separating the representations (a full-fledged entity vs. a request/action to create the entity, etc.) is not a huge overhead given the benefits down the line (1-3 years) when the project matures. Balancing this is tricky, though, and the main reason why any greenfield project requires experienced people to decide when flexibility should trump better representations or not.
> Through experience I believe you need to strike a balance: if the project is in fluid motion you do need to care more about ease of change until it settles, but separating the representations (a full-fledged entity vs. a request/action to create the entity, etc.) is not a huge overhead given the benefits down the line (1-3 years) when the project matures. Balancing this is tricky, though, and the main reason why any greenfield project requires experienced people to decide when flexibility should trump better representations or not.
I am in complete agreement, and this is why experienced architects and project managers are so key. Effective software architecture has a time dimension.
Someone needs to have the long-term picture of how the architecture of the system will develop, enforce a plan so that the project doesn't get locked in or cut off by early-stage decisions in the long term, but also doesn't suffer the costs of late-stage decisions early on, and manage the how/when of the transition process.
I think we could have better tools for this. Some of them in libraries, but others to be effective may need to be in the language itself.
Hopefully your domain is sane enough that you can read nearly all the data you are going to use up front, then pass it on to your pure functions. Speaking from a Java perspective.
> Given the strongly typed flavour of data oriented programming, I wonder if you have any thoughts on the "proliferation of types" problem.
Not a problem.
You're just making your life needlessly hard and blaming Java for the problems you're creating for yourself.
This represents, coincidentally, the bulk of the problems pinned on Java.
Everywhere else, the problem you described is a variant of an anti-pattern and code smell widely known as the telescoping constructor pattern.
The problems caused by telescoping constructors have a bunch of known cures:
- builder pattern (Lombok supports this, by the way),
- the parameter object pattern (builder pattern's poor cousin),
- semantically-appropriate factory methods.
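For a concrete flavour, here is a rough Python sketch of the parameter-object and factory-method cures (all names invented; in Java you'd reach for Lombok's @Builder or a hand-written builder instead):

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CarSpec:
    """Parameter object: one bag of optional fields instead of
    a dozen telescoping constructors."""
    make_id: str
    model_id: Optional[str] = None
    year: Optional[int] = None
    colour: Optional[str] = None

class Car:
    def __init__(self, spec: CarSpec):
        self.spec = spec

    # Semantically appropriate factory methods name the use case:
    @classmethod
    def listing(cls, make_id: str, model_id: str, year: int) -> "Car":
        return cls(CarSpec(make_id, model_id, year))

    @classmethod
    def stub(cls, make_id: str) -> "Car":
        return cls(CarSpec(make_id=make_id))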
The whole reason behind domain models taking the center stage when developing a software project is that you build your whole project around a small set of types with the necessary and sufficient expressiveness to represent your problem domain.
Also, "explosion of aggregate types" can only become a problem if for some reason you don't introduce support for type conversion when introducing specialized types.
I have thoroughly enjoyed that Hickey talk, but I think he has a very system-oriented view/take - which is very important and shows his experience - but it is also common to have control over the whole universe of our program.
In the interconnected system view, data schemas can change without notice, and the program should be backwards and forwards compatible to a reasonable degree to avoid being brittle.
This is not a problem when we control the whole universe.
I find that Haskell-esque type systems (strongly typed with frequent use of algebraic data types to represent every possible state in _that_ universe) work better for the latter, but are not the best fit for the former, and they often have to add some escape hatches at the boundaries.
Java itself is in a weird cross of these two - it has a reasonably strong type system nowadays, but it’s also a very dynamic runtime where one can easily create their own class at runtime and load it, reflect on it, etc.
So all in all — are you making that Car as part of your universe where you control everything, and it won’t change in unexpected ways? Make a record, potentially with nullable/Optional/Maybe types for the fields, if that makes sense.
If it represents some outside data that you don’t control, then you might only care about a subset of the fields: create a record type for that subset and use a converter from e.g. json to that record type, and the converter will save you from new fields. If everything is important then your best bet is basically what Clojure/JSONObject/etc do, just have a String-keyed map.
(Note: structural types can help here, and I believe OCaml has row polymorphism?)
This discussion sounds like there is confusion about the Car abstraction.
Make and model vs. makeId and modelId: Pick one. Are Make and Model referenced by Cars or not? There seems to be a slight risk of the Banana/Monkey/Jungle problem here, so maybe stick with ids, and then rely on functions that look up makes and models given ids. I think it's workable either way.
As for all the optional stuff (color, year, ...): What exactly is the problem? If Cars don't always have all of these properties then it would be foolish of Car users to just do myCar.colour, for example. Test for presence of an optional property, or use something like Optional<T> (which amounts to language-supported testing for presence). Doesn't any solution work out pretty much the same? When I have had this problem, I have not done a proliferation of types (even in an inheritance hierarchy) -- that seems overly complicated and brittle.
I'm not familiar with Java. Does it have no notion of structural types at all? If it doesn't, maybe you could wrap those fields in `Car` with `Maybe`/`Option` (I’m not sure what the equivalent is in Java) so you get something like `Car(Maybe Make, Maybe Model, Maybe Year, Maybe Colour)`?
That one is pretty simple. You have a Car object with four fields. The types of the fields are, respectively Optional<Make>, Optional<Model>, Optional<Year>, and Optional<Colour>.
So now when you have a function that takes in a Car object, you have no idea what fields those objects might have, because it's all optional! Which means the checks for the validity of each field end up spreading out to every function.
Which is no worse than the situation in a dynamically typed language where every field in every object could be optional.
Dynamic typing advocates sometimes miss that statically typed languages don't force you to encode every invariant in the type system, just those that seem important enough.
Or, if you really want to go overboard, you could use a dependently typed language and write functions that only accept cars with a specific combination of fields not being empty. But that's typically not worth the complexity.
Frankly, your contract was that you have no idea what fields those objects might have. I'm just fulfilling it. You won't have checks for validity of each field, as Optional is valid, but you will have to have code that handles Optional<> types (so things like foo.getModel().orElse()...), which is the requirement you described. That doesn't mean you'll be constantly checking the validity of each field.
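In Python terms the same approach reads as optional fields that each consumer unwraps where it needs them (a sketch mirroring the Car example; names invented):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Car:
    make: Optional[str] = None
    model: Optional[str] = None
    year: Optional[int] = None
    colour: Optional[str] = None

def describe(car: Car) -> str:
    # Absence is handled where the field is used, not validated upfront:
    return f"{car.make or 'unknown make'}, {car.model or 'unknown model'}"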
I see people conflate strong/weak and static/dynamic quite often. Python is strong[1]/dynamic, with optional static typing through annotations and a type checker (mypy, pyright, etc).
Perhaps the easiest way to add static types to data is with pydantic. Here's an example of using pydantic to type-check data provided via an external yaml configuration file:
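A minimal sketch along those lines (ServerConfig, its fields, and config.yaml are invented; pydantic v2 API):

import yaml                      # pip install pyyaml
from pydantic import BaseModel   # pip install pydantic

class ServerConfig(BaseModel):
    host: str
    port: int
    debug: bool = False

with open("config.yaml") as fh:
    raw = yaml.safe_load(fh)               # untyped dicts and scalars

config = ServerConfig.model_validate(raw)  # raises ValidationError on bad shapes
print(config.port)                         # statically an int from here on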
[1] strong/weak are not strictly defined, as compared to dynamic/static, but Python is absolutely on the strong end of the scale. You'll get a runtime TypeError if you try to add a number to a string, for example, compared to say JavaScript which will happily provide a typically meaningless "wat?"-style result.
In some significant ways, it's not strong at all. It's stronger than JavaScript, but it's difficult not to be. Python is a duck-typed language for the most part.
Duck typing is an aspect of it being dynamically typed, not whether it is strong/weak. But strong/weak is not formally defined, so if duck typing disqualifies it for you, so be it.
For example, you will find functions where the runtime value of parameters will change the return type (e.g. you get a list of things instead of one thing). So unless we want to throw out huge amounts of Python libraries (and the libraries are absolutely the best thing Python has going for it), we have to accept that it’s not a very good statically typed language experience.
The JS community, on the other hand, has adopted TypeScript very widely. JS libraries are often designed with typing in mind, so despite JS being weakly typed, the static-type experience is actually very good.
I don't disagree. However, often, when I use a library, I use it within a small function that I control, which I can then type again. Of course, if libraries change e.g. the type they return over time (which, also according to Rich, they shouldn't), you often only notice if you have a test (which you should have anyway).
Moreover, for many libraries there are `types-*` stub libraries that add types to their interface, and more and more libraries have types to begin with.
Anyway, just wanted to share that for me at least it's in practice not as bad as you make it sound, if you follow some good processes.
YMMV. I have over two decades of experience with Python and about a decade with JS though it's all backend work. I use both in my day job, but write in Python more frequently. I've found the transition to Python static typing much more seamless and easier to adopt than TS.
Amusingly, I can't recall any time where I had to deal with differently typed return values in Python, but just recently had to fix some legacy JS code that was doing that (a function that was returning null, a scalar, or an array depending upon how many values it got in response to a SQL query).
>For example, you will find functions where the runtime value of parameters will change the return type (e.g. you get a list of things instead of one thing).
I have long argued that such interfaces are doing it wrong. That's what "Special cases aren't special enough to break the rules." in the Zen is supposed to warn about, to my understanding.
Defining an operation between two different types is not at all the same thing as enabling implicit conversions. Notice for example that "1" * 2 gives "11", and not "2" nor 2. Interpreting multiplication of a string by an integer as "repeat the string that many times" doesn't require any kind of conversion (the integer is simply a counter for a repeated concatenation process). Interpreting addition as "append the base-10 representation of the integer" certainly does. (Consider: why base 10?)
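The contrast in one snippet:

print("1" * 2)    # "11": repeated concatenation, no conversion involved
try:
    "1" + 2       # no implicit str/int conversion in either direction
except TypeError as err:
    print(err)    # e.g. 'can only concatenate str (not "int") to str'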
You have a point that strong vs weak typing is not a binary and that different languages can enable a varying amount of implicit conversions in whatever context (not to mention reinterpretation of the underlying memory). But from ~20 years of experience, Python's type system is nothing like JavaScript's - and it's definitely helpful to those who understand it and don't fight against it.
In my experience it's typically people from languages like Haskell that can't see the difference.
> that's just operator overloading and it exists in many statically typed languages too
My point is that Python's "typing" guarantees allow a caller to call a function with the wrong type, and get back a wrong answer and/or silently lose data.
Strong typing is pointless if the language is unable to actually prevent common footguns, like passing in the incorrect type.
I'm moving more and more to the opinion that arguing about the spectrum of strong <-> weak typing is stupid, because type utility is on the spectrum of static <-> dynamic, with dynamic being full of footguns.
Living this dream in Python right now (inherited a code base that used nasty nesting of lists & dicts). You don't strictly need to do OOP to solve the problem, but it really does help to have a data model. Using dataclasses to map out the data structures makes the code so much more readable, and the support for type hints in Python is good enough that you can even debug problems with the type system.
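A small sketch of what that looks like (names invented): each level of the old dict-of-dicts becomes a distinct, checkable type.

from dataclasses import dataclass, field

@dataclass
class LineItem:
    sku: str
    qty: int

@dataclass
class Order:
    order_id: str
    items: list[LineItem] = field(default_factory=list)

order = Order("o-1", [LineItem("sku-9", 2)])
# A type checker now knows which level you're on:
order.items[0].qty += 1
# order.qty would be flagged: "Order" has no attribute "qty"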
I see a lot of people mentioning Pydantic here, but you should take a look into TypedDict. It provides a type structure on top of a plain dictionary, sounds like exactly what you’d want, and is a built-in that you don’t need a dependency for.
Mypy, for example, can also see what the types of the dictionary are supposed to be when you use it just like a normal dictionary.
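For instance (invented names), nested TypedDicts keep the checker aware of which level you're on:

from typing import TypedDict

class Engine(TypedDict):
    cylinders: int

class Car(TypedDict):
    make: str
    engine: Engine

car: Car = {"make": "Volvo", "engine": {"cylinders": 5}}
car["engine"]["cylinders"] += 1   # fine
# car["cylinders"] += 1           # mypy: TypedDict "Car" has no key "cylinders"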
I recommend you use pydantic for type annotations, or alternatively dataclasses. Then pair it with typeguard's @typechecked decorator and the types will be checked at runtime for each method/function. You can use mypy to check them at "compile time".
Having clear data types without OOP is possible, even in Python.
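A short sketch of that combination (Point and norm are invented names):

from dataclasses import dataclass
from typeguard import typechecked   # pip install typeguard

@dataclass
class Point:
    x: float
    y: float

@typechecked
def norm(p: Point) -> float:
    return (p.x ** 2 + p.y ** 2) ** 0.5

norm(Point(3.0, 4.0))   # 5.0
norm("not a point")     # fails at runtime with a type-check error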
Python's not really built for that AFAIK, though. In languages built for it, you can type your dicts/hashes/maps/whatever, and it's easier to see what they are and know where the functions that operate on them live. I'm most familiar with Elixir, which has structs: simply specialized maps (analogous to dicts in Python) whose "type" is the name of the module they belong to. There can only be one struct per module, so in this sense it's easy to know exactly where its functions live, and it's almost like a class, with the very key difference that modules are not stateful.
> In languages built for it, you can type your dicts/hashes/maps/whatever, and it's easier to see what they are and know where the functions that operate on them live.
I think I must be misunderstanding what you mean by that, because I can very much do that in Python.
That's what I thought. I obviously don't know Python well enough and didn't know you can name dicts (like, beyond setting them to a variable). I guess you can export from a module so they are prefixed! Didn't think of that one earlier.
I'm not sure what you mean by naming dicts, but Python has TypedDict, where you can define the names and types of specific keys. They only exist for type checking and behave exactly as a normal dict at runtime.
In modern typed Python, you can instead use dataclasses, NamedTuples (both in the standard library), attrs or Pydantic (both third-party) to represent structs/records, the latter also providing validation. Still, TypedDicts are helpful when interfacing with older code that uses dicts for heterogeneous data.
My main gripe with them is that different TypedDicts are not compatible with each other. For example, it would be very helpful if a dict with x:str and y:str fields were considered a supertype of dicts with x:str, y:str and z:str, like it is in TypeScript, but it isn't. They are considered different types, limiting their usability in some contexts.
When using homogeneous dicts, you can still use dict[str, T], and T can be Any if you don't want to type the whole thing. You can use any hashable type instead of str for keys. I often do that when reading JSON: going from a dynamically typed dict[str, Any] to dataclasses.
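Roughly like this (invented names): the dict[str, Any] stays at the boundary and everything past it is typed.

import json
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class User:
    name: str
    age: int

raw: dict[str, Any] = json.loads('{"name": "Ada", "age": 36}')
user = User(name=raw["name"], age=raw["age"])   # typed from here on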
That needs to be explicit for any interacting types. You must define separate classes and explicitly define their hierarchy. This is fine if you control all the types, but it breaks down quickly. The best example is having two TypedDicts with the same members; in Python, you cannot use one instead of the other.
from typing import TypedDict

class A(TypedDict):
    a: int
    b: str

class B(TypedDict):
    a: int
    b: str

def f(a: A) -> None: pass

b = B(a=1, b='b')
f(B)  # mypy error: Argument 1 to "f" has incompatible type "type[B]"; expected "A"  [arg-type]
On the other hand, this is legal in Typescript:
interface A {
    a: number;
    b: string;
}

interface B {
    a: number;
    b: string;
}

function f(a: A) {}

const b: B = {a: 1, b: 'b'};
f(b);
This is most useful when A has a subset of B's attributes, like this (which also doesn't work in Python):
interface A {
    a: number;
}

interface B {
    a: number;
    b: string;
}

function f(a: A) {}

const b: B = {a: 1, b: 'b'};
f(b);
Python classes are basically dictionaries that have a distinct type bound to them. Alternatively you can subclass from dictionary to give yourself a distinct type but still be a dictionary. Slotted classes are basically named tuples (and of course, Python has actual named tuples and dataclasses), so there's a lot of ways to "tag" a collection with a specific type in mind.
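A quick sketch of those "tagging" options side by side:

from collections import namedtuple
from dataclasses import dataclass

class Config(dict):                 # dict subclass: a distinct type, still a dict
    pass

Point = namedtuple("Point", ["x", "y"])   # an actual named tuple

@dataclass
class Car:                          # attributes live in an instance __dict__
    make: str
    year: int

@dataclass(slots=True)              # Python 3.10+: fixed fields, no __dict__
class Pixel:
    x: int
    y: int

print(isinstance(Config(), dict))             # True
print(Car(make="Volvo", year=2023).__dict__)  # {'make': 'Volvo', 'year': 2023}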
>"In these situations, I often introduce objects that wrap the map or dict, and have methods that make sense for that level."
I've been doing the same thing since the end of the 80s as well starting with Turbo/Borland Pascal, C++, and later any other language that supports OOP.