That is an interesting project! The idea of a graphical representation of infrastructure assets, with the ability to query it, sounds very useful, especially given that in the AWS Management Console every region is a separate realm and there is no single view of your resources. However, I don't agree with equating "infrastructure assets" with AWS -- AWS is the biggest cloud, but almost every business is moving towards multi-cloud / hybrid cloud these days. Just as it provides a "bird's eye view" of assets in AWS, this concept would be even more useful in a multi-cloud / cross-cloud infrastructure. If that is on the roadmap, it probably deserves a mention in the documentation, and if not, the docs should state that this project focuses on AWS only at this time.
Terraform uses a graph to keep track of infrastructure dependencies. You can export the graph and take a look (`terraform graph`).
I once built a Terraform provider for a custom Neo4j CMDB to be able to add metadata, in the form of additional relationships, to anything provisioned with TF.
This data was used for config inventory, job scheduling and a bunch of other automations/rules.
We broke the "cloud vendor" boundary this way and used it seamlessly against on-prem vSphere and AWS.
Never got around to open-sourcing it, unfortunately.
A graph for this type of cmdb/config data is really useful.
For every business actively doing multi-cloud, I'd really love to see their business-case argument for doing so. Over the past 8 years or so, I've talked to a large number of businesses that went multi-cloud to "avoid vendor lock-in" or to "get the best features of all clouds" and ended up regretting the choice, because they tend to get the worst of all worlds: networking issues, latency problems, tooling woes, and considerably larger tooling teams that are always playing catch-up.
> this concept would be even more useful in a multi-cloud / cross-cloud infrastructure.
Thanks for the comment! Cartography can definitely be extended to do multi-cloud, but we're focusing on AWS at the moment. At the same time, though, we didn't want to discourage others from looking at this by saying in the docs that it's solely an AWS project. There's nothing really stopping it from supporting other data sources.
Edit: we're planning on adding a GitHub module soon too.
I've been loving this project. Also a big shout-out to Beagle (Sony), which has been doing good work here recently: https://github.com/yampelo/beagle .
If you're curious about this stuff for more day-to-day, we (Graphistry) add some crazy GPU graph visual analytics + visual templating / automation for daily tasks + DB connectors. Think automatically mapping your daily splunk alerts across host/network/intel data, or seeing your entire data center, or going through a ton of DNS logs for bots/intel. Super fun space right now! Ex: http://labs.graphistry.com/graph/graph.html?dataset=PyGraphi...
The name is a bit confusing. Cartography sounds like something related to maps. Perhaps the authors should consider renaming this otherwise really awesome project?
Yes, because mapping has more than one meaning, e.g. an operation that associates each element of a given set (the domain) with one or more elements of a second set (the range).
It's still just one definition: a paper map relates a point on the piece of paper to a point in some other space (topography, subway stops, interstate routes, etc.). And if you relax the "carto" part of "cartography" (e.g. someone working on OSM or Google Maps is still doing cartography), then I think cartography as "the practice of constructing maps" is still an unambiguous and non-overloaded meaning, which applies to mapping infrastructure just as well.
In the non-free world you can get similar features with a CMDB (like HP or EMC CMDBs).
This kind of software is usually licensed by the number of elements it tracks... and the costs add up quickly (hosts, network ports, IP addresses, OSes, and applications each count as one element, and you end up tracking thousands of them).
Literally working on an extremely similar project, albeit to power security; seems inevitable that an open-source solution would've popped up. Glad to see this. Any exploration in doing this cross-cloud?
Good question. As is, this does not keep anything in sync.
To keep the graph in sync with changes in the account, simply set up a cronjob to run `cartography` whenever you would need a refresh. Each sync run should guarantee that you have the most up-to-date data.
Here's how a sync works: when the sync starts, set a variable called `update_tag` to the current time. Then, pull all the data from your AWS account(s) and create Neo4j nodes and their relationships, making sure to set their `lastupdated` fields to `update_tag`.
Finally, delete the leftover nodes and relationships (i.e. those that do not have up-to-date `lastupdated` fields). This way the data stays fresh, and you can see this in the [cleanup jobs](https://github.com/lyft/cartography/tree/master/cartography/...).
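In rough Python, assuming the `neo4j` driver and an illustrative `EC2Instance` label (this is a sketch of the pattern described above, not the actual Cartography code), it looks something like this:

```python
# Illustrative sketch of the sync-and-sweep pattern, not Cartography's actual code.
import time

from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
update_tag = int(time.time())  # one tag per sync run


def upsert_instance(tx, instance_id):
    # MERGE creates the node if it's new, then refreshes its lastupdated stamp.
    tx.run(
        "MERGE (i:EC2Instance {id: $id}) SET i.lastupdated = $tag",
        id=instance_id,
        tag=update_tag,
    )


def cleanup(tx):
    # Anything not stamped in this run no longer exists in the account, so sweep it.
    tx.run(
        "MATCH (i:EC2Instance) WHERE i.lastupdated <> $tag DETACH DELETE i",
        tag=update_tag,
    )


with driver.session() as session:
    # Stand-in for the IDs you'd actually pull from the AWS APIs.
    for instance_id in ["i-0abc123", "i-0def456"]:
        session.execute_write(upsert_instance, instance_id)
    session.execute_write(cleanup)
```

The sweep at the end is what keeps the periodic approach self-correcting: anything that disappeared from the account simply never gets its `lastupdated` refreshed, so the cleanup catches it on the next run.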
Our approach requires us to stay as real-time as possible, so we're actually using CloudWatch Events to keep in sync -- the deletes become a little hard after that.
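Very roughly, an event-driven handler might look like this (hypothetical code; the event fields follow EC2 state-change notifications, but the graph schema is made up for illustration), which also shows why the delete path is the fragile part:

```python
# Hypothetical Lambda handler for an event-driven sync; illustrative only.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://neo4j.internal:7687", auth=("neo4j", "password"))


def handler(event, context):
    # EC2 "Instance State-change Notification" events carry these fields.
    instance_id = event["detail"]["instance-id"]
    state = event["detail"]["state"]
    with driver.session() as session:
        if state == "terminated":
            # Deletes only happen if the terminal event actually arrives and gets
            # processed; a missed or out-of-order event leaves a ghost node behind,
            # which is why a periodic sync-and-sweep is easier to keep correct.
            session.run("MATCH (i:EC2Instance {id: $id}) DETACH DELETE i", id=instance_id)
        else:
            session.run(
                "MERGE (i:EC2Instance {id: $id}) SET i.state = $state",
                id=instance_id,
                state=state,
            )
```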
I look forward to the progress of Cartography, though!
In an ideal world, yes. In this world, they REALLY should.
The downside of AWS providing something like this out of the box would be a complete lack of usability. There's a running not-even-a-joke that whatever UI an AWS service ships with is going to be, at best, an excuse for one. If you want to make good use of a feature/offering in their stack, you have to write your own tools for it.
Don't get me wrong, the APIs in AWS are, for the most part, quite nice to work with. [asterisk goes here] -- It's the stock UIs in their web console that are maddening and (IMO) barely fit for anything productive.
AWS Config provides essentially all of the nodes and edges required to build a nice resource graph, but unfortunately it's quite limited in the services it covers.
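For the resource types it does cover, pulling an inventory out of Config is only a few lines with boto3 (a rough sketch; the resource types listed are just examples, and anything Config doesn't record still needs direct per-service API calls):

```python
# Rough illustration: enumerate what AWS Config already knows about and treat
# each discovered resource as a candidate graph node.
import boto3

config = boto3.client("config")

# A few example resource types; Config supports many more, but not everything.
for resource_type in ["AWS::EC2::Instance", "AWS::EC2::SecurityGroup", "AWS::S3::Bucket"]:
    paginator = config.get_paginator("list_discovered_resources")
    for page in paginator.paginate(resourceType=resource_type):
        for resource in page["resourceIdentifiers"]:
            print(resource["resourceType"], resource["resourceId"])
```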
I built something similar for teaching people Docker in webcasts. It also integrates a little inspector, a terminal, hotkeys, and, for no good reason, procedural color theming. It's kind of an ugly code base because I built it for screencasting and then got a job that required a lot of attention, so I didn't end up using it much, but I like it.
Nice. We should connect. I've built this for security using a very similar stack (Neo4j, etc.). Where it's similar is the use of a graph to implement an ontology of any given system; where we diverge is that I've built a separate viz, collaboration, and management layer.
Ontologies never took off, I think, because they tried to encompass too many things, whereas graphs are a simple tool you can use to build them as part of a greater idea.
I've wanted to mock up something similar, but more along the lines of Lightbend Pipelines mixed with TouchDesigner. Or I guess a more interactive version of Netflix's Vizceral would be another way of looking at it. But having TouchDesigner's multi-tiered workspace within a node concept would be key.
I've been using JupiterOne recently, has anyone else seen it? It creates a Neo4j-like graph and CMDB from actual AWS data and many other sources. There's a free community version of the product too that I could test out in my lab.
We're building something very security-oriented if you'd like to try it and give your feedback (contact@cloudhawk.io). Thanks for mentioning JupiterOne, we'd never heard of it but it looks like an interesting product!
I'd go further and suggest that any existing, commonly understood word is a poor choice of name for a new thing, but many software projects seem hellbent on confounding their potential users along with people who have no great interest in the new thing.