> It's possible to implement anything correctly with dynamic types but the point...

seanwilson · on Aug 5, 2021

> Or to be precise, there isn't any that passes any statistical or other standards. What little evidence there is goes both ways and has, at best, tiny effect sizes.

You mean the link from [1] to "An experiment about static and dynamic type systems" based on "an empirical study with 49 [undergraduate] subjects that studies the impact of a static type system for the development of a parser over 27 hours working time.". I think studies like this are more distracting here than useful (low skill level, not large scale enough, too much noise, toy example). The article here sums up a lot of my thoughts on this: https://news.ycombinator.com/item?id=27892615

And do you really need a study that proves e.g. (to pick a simpler example to summarise) a language that makes it impossible for null deferences to happen at compile-time is going to have less null dereference errors than one that lets those errors happen at runtime? It's like insisting for an empirical study that 2 + 2 = 4.

> It may be that such code is easier to write with static types, but that hasn't been shown,

The typed SQL example might look something like:

   function getUserInput(): string

   function createSanitisedString(s: string): SanitisedString
   function createSafeSqlQuery(s: SanitisedString): SqlQuery

For the code `createSafeSqlQuery(getUserInput())`, a static type checker would stop the entire program from starting with a type error pinpointing the exact line where the unsanitised data is coming from. With a dynamic type checker:

1. At best, the code will fail only after the user input is received by createSafeSqlQuery during runtime and you won't know where the input originated from.

2. At worst, the coder forgets to add a check like `typeof s === SanitisedString` or a call to `createSanitisedString` in `createSafeSqlQuery` and creates an injection attack.

The static type checker clearly wins here for me for safety and ease of implementing correctly. I don't need a study to know that compile-time errors are better than production runtime errors, that automated and exhaustive checks of correctness are better than the coder having to remember to add manual checks, and that it's better to know the exact line the unsanitised input came from over only knowing it comes from somewhere.

What languages have you used that have strong static type systems that you've written large programs in? Have you tried OCaml or Haskell for example?

simiones · on Aug 5, 2021

> function createSanitisedString(s: string): SanitisedString > function createSafeSqlQuery(s: SanitisedString): SqlQuery

The problem with the whole argument is that these functions are not actually enough to work with SQL, since they don't allow us to create dynamic SQL from safe strings.

Here are some ideas of why we can' use the functions you proposed:

  user_filter = raw_user_input_col_name ++ " LIKE ?"
  createSafeSQLQuery("SELECT * FROM my_table WHERE " ++ user_filter ++ ";") //type error

  sanitizedQuery = createSanitisedString("SELECT * FROM my_table WHERE" ++ user_filter ++ ";") //should return error or quote everything
  createSafeSQLQuery("SELECT * FROM my_table WHERE" ++ createSanitisedString(user_filter) ++ ";") //filter will be wrong, type error

Let's now add something to allow us to achieve our goal:

  function asSanitizedString(s: string) : SanitizedString {
    return s
  }
  user_filter = createSanitizedString(raw_user_input_col_name ) ++ asSanitizedString(" LIKE ?")
  query = asSanitizedString("SELECT * FROM my_table WHERE") ++ user_filter ++ asSanitizedString(";")
  createSafeSqlQuery(query) //works, nice
  createSafeSqlQuery(asSanitizedString("SELECT * FROM my_table WHERE " ++ user_col_name ++ "LIKE ?;")) //oops, this also works

You can achieve all of this with a non-string API: you can have an SQL DSL or just an SQL AST library; or you can use an ORM. But either way, you can't fix it without modeling the entirety of SQL into your type system (or as much of SQL as you are willing to support).

If you don't believe me, go looking for a library that allows arbitrary strings as SQL, but statically ensures user input can't be used to construct an SQL query. I don't know of any one.

seanwilson · on Aug 5, 2021

> The problem with the whole argument is that these functions are not actually enough to work with SQL

My bad for being unclear but I meant you would be sanitising something like the user entering "tea cups" into a shop search input form and searching your shop product database with that. I didn't mean the user would be entering SQL queries.

> But either way, you can't fix it without modeling the entirety of SQL into your type system

Using the type system to check correctness sounds good to me.

This is getting way too into the weeds about SQL anyway. Null dereference checks are a less distracting example to focus on for instance. As long as you can encode it into the type system, all my points are still relevant.

simiones · on Aug 6, 2021

> My bad for being unclear but I meant you would be sanitising something like the user entering "tea cups" into a shop search input form and searching your shop product database with that. I didn't mean the user would be entering SQL queries.

No, I understood that. But my point is that the programmer is going to be composing SQL queries, and they will be doing it based on user input. The SQL API will have a very hard time distinguishing which parts it receives are trusted programmer input and which parts are untrusted user input.

> Using the type system to check correctness sounds good to me.

Sure, but SQL is a huge language with significant differences between any 2 major DB implementations (and that depends on SQL server configurations - e.g. in MySQL you can specify via config file whether "a" is a string or an identifier). I have never seen a full SQL DSL or AST used in any SQL client library, in any language: it's just too much work.

mpweiher · on Aug 5, 2021

>You mean the link from [1] to "An experiment about static and dynamic type systems"

No. I mean, yes, that is one study, but there are a lot more. They all come out essentially the same.

Danluu did a an overview article a while ago:

https://danluu.com/empirical-pl/

Note that the A large scale study of programming languages and code quality in github paper, the one that makes some of the strongest arguments for safety of the bunch, was the one that was later completely taken apart at OOPSLA 2019. Its findings, which also had very small effect sizes and statistics significance, are completely invalid.

> do you really need a study that proves ... less null references errors

1. Languages don't have null errors, programs do.

2. I don't need a study to show that a program has less than or equal null errors in a language with null safety, because the program in a language without might still have zero. If you're going to make a logical claim, then let's stick to what's logically provable. If you're going to hand-wave...well you might as well use a dynamically typed language ;-)

3. I do need a study to show that such differences matter at all. All software has bugs, if this a significant source of bugs, then a mechanism to make me not have them might matter.

4. I do need a study to show that the net effect is positive. For example, the famous Caper Jones study showed that dynamic languages such as Smalltalk and Objective-C were significantly more productive than statically typed languages, including Haskell, C++ and Java. Studies also have long shown that bugs scale linearly with code size, a very strong correlation. So let's say null-references are 1% of you total bugs, but you now write twice the amount of code, that means a move to a null-checked language will significantly increase your total bug count, despite eliminating a certain class of bugs.

(In fact other, older studies showed that type errors accounted for 10% of bugs, so if you can save just 10% of code using a dynamically typed language, you're ahead in terms of bugs).

Of course, I can also not require such studies and make an engineering judgement. And this is fine, we do it all the time because very little in software has been demonstrated much at all. But you then need to be aware that this is a subjective judgement call, and others may reasonably come to a different conclusion.

And being aware of this makes for much, much better engineering judgement, IMHO.

igouy · on Aug 5, 2021

> … but you now write twice the amount of code…

We could take some JavaScript programs —

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

— and try to transliterate them into Dart —

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

— and see how much or how little boiler plate is required when a language supports null safety, local type inference, implicit-dynamic: false ?

seanwilson · on Aug 5, 2021

> 1. Languages don't have null errors, programs do.

> 2. I don't need a study to show that a program has less than or equal null errors in a language with null safety, because the program in a language without might still have zero. If you're going to make a logical claim, then let's stick to what's logically provable. If you're going to hand-wave...well you might as well use a dynamically typed language ;-)

I think now you're just nitpicking and being uncharitable for the sake of it instead of responding to the overall point of my message I took care to write.

> Studies also have long shown that bugs scale linearly with code size, a very strong correlation. So let's say null-references are 1% of you total bugs, but you now write twice the amount of code, that means a move to a null-checked language will significantly increase your total bug count, despite eliminating a certain class of bugs.

> (In fact other, older studies showed that type errors accounted for 10% of bugs, so if you can save just 10% of code using a dynamically typed language, you're ahead in terms of bugs).

Most software engineering studies have so many confounding factors even in isolation that combing results from multiple studies like this is nonsensical and misleading. This reads like brilliant satire if I'm honest.

> And being aware of this makes for much, much better engineering judgement, IMHO.

You didn't reply to the question about if you've written large projects with strong static type systems like OCaml or Haskell. If the answer is no, they're worth learning for more awareness instead of relying on likely very limited and/or flawed empirical studies in my opinion.

igouy · on Aug 5, 2021

> … famous Caper Jones study…

Have you ever considered the validity of that study: which purports to compare programming languages, apparently without considering obvious differences in available programming tools?

Even back in the late '80s, those Smalltalk implementations provided a very integrated development environment — writing Smalltalk in a simple text editor really isn't the same ;-)

chriswarbo · on Aug 5, 2021

> > It's possible to implement anything correctly with dynamic types but the point is static typing makes it easier to do so

> This is where the argument became circular and we can basically stop.

I think any appeal to "easier" is inherently subjective. I've certainly tried to avoid making any such claims. Instead, I've mostly tried to argue that static types can forbid certain programs from ever running, and we can use that to forbid things like vulnerable SQL queries.

> > It's possible to implement anything correctly with dynamic types

> but apparently chriswarbo does not.

I never said any such thing; besides which, it's trivially the case that anything implemented with static types could also be implemented with dynamic types, since static types only exist at compile time (they are "erased" during compilation), and hence every running program is dynamically typed.

In fact, I don't think I've said anything about correct implementations at all. My point is that static types can forbid incorrect implementations. When it comes to security, that is much more important; i.e. I would much rather bang my head against a screen filled with compiler errors, than expose some insecure program to the harsh reality of the Internet.

> The parent claim (by chriswarbo) was that, for example, SQL injection attacks were, in fact, incontrovertible proof that SQL injection attacks "are classic type errors".

I've not claimed that (in fact, I'm stuggling to parse what that sentence might even mean). I claim that SQL injection attacks are "classic type errors" in the sense that:

- It's a widespread problem, easily encountered, commonly warned about, and most developers have probably done it at some point.

- It's a class of error that can be entirely ruled out at the language/API level, through use of static types. Similar to how segfaults can be entirely ruled out at the language level, using managed memory like in Python/JS/etc.

- Due to the above, it's a common example used to illustrate the usefulness of static types in blog posts, articles, etc. One which sticks in my mind is "A type-based solution to the “strings problem”: a fitting end to XSS and SQL-injection holes?" https://blog.moertel.com/posts/2006-10-18-a-type-based-solut...

- Due to the above, it's considered "folk wisdom" (or a "folk theorem") in the relevant fields (e.g. Programming Language Theory, Formal Methods, etc.)

Other examples of "classic type errors" might include:

- Mixing up integers with strings-of-digits, e.g. 'print("You were born around " + (time.now.year - input("How old are you?"))'

- Forgetting to 'unwrap' some list element, e.g. 'open(listdir("config/")).read()' instead of 'listdir("config/").map(f => open(f).read())'

- Mixing up units of measure, e.g. 'distance = speed + time' instead of 'distance = speed * time'

Basically any common mix-up between values/components in a program, where the computer could help up to spot the mix-up (perhaps entirely automatically, if type inference is involved). In the case of SQL injection, the mix-up is between 'string of bytes' and 'SQL query'.

> And yes, that article is somewhat old, but the evidentiary situation has not changed, despite further attempts to make the claim that static typing is provably better. Claims that were resoundingly debunked.

I very much appreciate when researchers attempt to ground things in a bit more empiricism! Unfortunately those particular studies just aren't looking at the sorts of things that I find relevant; in particular, those studies (and that linked blog post, and many of the comments here), seem overly-preoccupied with mundane trivialities like "string", or "int", or "SQLColumnName".

Personally, I'm much more interested in how types can help me:

- Avoid leaking secret information https://link.springer.com/chapter/10.1007/978-3-030-17138-4_...

- Guarantee big-O resource usage https://twanvl.nl/blog/agda/sorting#a-monad-for-keeping-trac...

- Guarantee the correct order of operations https://docs.idris-lang.org/en/latest/st/machines.html

- Prevent AI agents from losing capabilities http://chriswarbo.net/projects/powerplay

mpweiher · on Aug 15, 2021

[SQL injection]

> It's a class of error that can be entirely ruled out at the language/API level, through use of static types.

Repeating this claim doesn't make it true.

Once again: this has nothing to do with static vs. dynamic types, and everything to do with modelling.

To make this clear, let's compare a dynamically and a statically type-checked version of this with the model of SQL as just strings.

Both the dynamically and the statically type-checked version of this program will be susceptible to SQL injection attacks. The statically type-checked version will verify at compile time that all the values are indeed strings.

Now let's compare a dynamically and a statically checked program with a proper model for the SQL, not strings.

Both of these will not be susceptible to SQL injection attacks.

It has nothing to do with static vs. dynamic, and everything with how you model the problem.