You have no idea about their codebase, the implementation details of their features, or how they counted the lines (comments included?). So stating that it’s dumb is beyond ridiculous.
You are right in that it’s certainly a high LoC count for Python, but still...
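To make the "comments included?" point concrete, here is a minimal sketch of how much a line count can swing depending on what you count. The function name and the sample module are made up for illustration, and the classifier is deliberately naive: docstrings and multi-line strings are counted as code.

```python
from collections import Counter


def classify_lines(source: str) -> Counter:
    """Tally blank, comment-only, and code lines in Python source text.

    Naive on purpose: docstrings and multi-line strings count as code.
    """
    counts = Counter(blank=0, comment=0, code=0)
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


# Hypothetical sample module, for illustration only.
SAMPLE = '''\
# Hypothetical module, for illustration only.

def greet(name):
    # Build the greeting.
    return "hello " + name  # trailing comments stay with the code line
'''

counts = classify_lines(SAMPLE)
print(counts["code"], "code lines out of", sum(counts.values()), "physical lines")
```

Whether a headline figure means physical lines or code lines is exactly the kind of detail nobody outside the team knows.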
And yes, knowing nothing about their code base other than A) it's in Python, and B) it's several million lines of code, I feel very confident that there is at least an order of magnitude too much of it. Instagram is just not doing anything that complicated.
(I should mention I specialize in maintaining and refactoring legacy Python code. I know what I'm talking about here.)
Features that are "not complicated" can actually very easily be "very complicated" at scale. Which Instagram does have. 500 million users, every single day.
If you need several millions of lines of Python to do what Instagram server does, the code is bloated.
My bet is that they let too many Java devs loose on the code base, without experienced Python devs reviewing the commits and managing the deluge of unnecessary classes. I've seen it happen before.
> If you need several millions of lines of Python to do what Instagram server does
I have this feeling that you're probably not all that aware of 95% of what their code actually does, and thus probably not in a position to make judgements as to whether their code base is truly bloated relative to what it does.
From a user's perspective, Instagram has:
a) a way to post pictures/videos/sound recordings to a public feed. The pictures can include overlays of links to other users, to other posts, to song lyrics that play in sync with the music, etc etc. Users viewing their posts get the ability to comment/like/link, with automatic language detection and translation on demand.
b) a way to see how other users interact with their posts, allowing comments, seeing views and other analytics, monetizing etc etc
c) a way for advertisers to place stories (a stream of short-lived, 24-hour video/audio posts that users see) or posts (static, video, or audio), with links to external sites, direct purchase links ("Shop now"/"Buy this"), etc.
Instagram is much more than a stream of user images.
That doesn't include all the "back office" stuff like spam/reporting/censorship/language translation etc etc.
As orf said you have no idea about their codebase. And you have no idea what's included in that statement -- given that they talk about startup time, they most likely are taking into account the whole framework, a plethora of admin and analytics tools, lots of debugging / debug-only infrastructure, migrations, lots of tooling whose sole purpose is making it easier to work in large teams, etc…
(And for the record, Linux is ~37 million lines of actual code, Postgres ~2 million, and gcc ~8 million)
There's nothing absurd about one of the most visited websites on earth being a couple million LOC.
> As orf said you have no idea about their codebase.
I do too: It's Python and it's several million lines.
Metaphor: you've got three pallets of goods and have hired three trucks to move them. I don't have to know how you wrapped the pallets to know that you brought two too many trucks.
I don't have to know the details of what's included in "Instagram Server" et al. to make this call (obviously) based on my experience and first-hand knowledge of similar codebases. Frankly, I am kind of disappointed in the pushback I'm getting on this. The only reason to have a multi-million line Python project is for the entertainment of devs, or, worse yet, job security.
Let me put it this way, if the CTO of Instagram showed up here I would be willing to bet US$100,000 that I could reduce the Instagram code by 90% in six months. (Do you think the devs there would appreciate that? Even the one that got laid off as a result?)
If I sound cynical, it's only because I've seen this sort of thing for myself. I'm not trying to say that the Instagram devs are dumb or nefarious; this kind of code happens organically and often despite our best efforts. But that code needs a diet. I'm sure of that.
- - - -
edit: In re:
> (And for the record, Linux is ~37 million lines of actual code, Postgres ~2 million, and gcc ~8 million)
So, call it 50M LoC, what's your ratio for Python/C? Meaning, how many lines of C code are replaced, on average, by one line of Python?
And how feature-complete are we talking? POSIX? GCC targets a lot of languages and platforms, eh?
If you were going for an integrated system, like Oberon OS or a Smalltalk IDE, I think my claim is still plausible, eh?
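To spell out the arithmetic behind that question, here is a rough sketch. The line counts are the approximate figures quoted above; the C-to-Python ratios are placeholders I picked for illustration, not measurements of anything.

```python
# Figures quoted in the thread, in millions of lines of C (approximate).
C_LOC_MILLIONS = {"Linux": 37, "Postgres": 2, "gcc": 8}


def python_equivalent_millions(c_lines_per_python_line: float) -> float:
    """Scale the combined C line count by an assumed C-to-Python ratio."""
    total_c = sum(C_LOC_MILLIONS.values())  # 47, rounded up to "50M" above
    return total_c / c_lines_per_python_line


# The ratios below are placeholders, not measurements.
for ratio in (2, 5, 10):
    print(f"{ratio}:1 -> ~{python_equivalent_millions(ratio):.1f}M lines of Python")
```

Whichever ratio you believe, the comparison depends entirely on it, which is why I'm asking.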
> Let me put it this way, if the CTO of Instagram showed up here I would be willing to bet US$100,000 that I could reduce the Instagram code by 90% in six months.
And from Instagram's POV, the ROI on that would be much less than putting in the sort of belts and braces that the article talked about.
They don't have the time or space to engage in a massive technical debt reduction program; they're too busy destroying Snapchat and other competitors, reacting to TikTok, implementing an entirely new IGTV video service that provides their customers (i.e. advertisers and marketers) the equivalent of YouTube within the Instagram universe, etc.
I'm sure that every large internet service's codebase out there could be made much leaner and smaller. The question is whether that is worth their while.
Dude, sincerely, thank you. I feel like this is the sane answer I was waiting for. Cheers! (and for your other comment in re: what all Instagram does. I appreciate it.)
That's bananas.
Nothing Instagram does requires that much code.
Also, that much Python code means you're doing it wrong.