Yandex open sourced it's BI tool DataLens (github.com/datalens-tech)
268 points by SergeAx on Sept 26, 2023 | 86 comments


Such a huge landscape of BI and data visualization tools. Are there any clear open source leaders? Apache Superset?


Superset and Metabase are definitely the OSS leaders.

I work on an open source code-based BI tool called Evidence, which might be of interest to you.

It's effectively a static site generator aimed at building automated reports and analysis.

https://github.com/evidence-dev/evidence

Previous discussions on HN:

https://news.ycombinator.com/item?id=28304781 - 91 comments

https://news.ycombinator.com/item?id=35645464 - 97 comments


I really, really like your idea with Evidence. I long for a mode in Metabase that’s like a “notebook mode”, where the main focus is narrative and it’s ornamented with viz, rather than the other way around.

I want to be able to publish this notebook when I’m done and then be able to hand that around, the same concept that you’ve built Evidence around. I think that’s a very good idea, so thanks.

The main thing keeping me from switching is that Metabase’s query builder and visualizations are too good for 95% of my work. It’s hard to picture going back to writing _all_ the SQL.


Thank you!

I hear that. We're making a lot of progress on reducing the amount of sql you need to write, keeping it DRY etc. Making the dev experience really buttery and high leverage is definitely a priority.

Here are a few of the things we're working on in that regard:

1. Making our components issue their own queries so that you don't need to write full sql expressions, just fragments when you're defining the chart you want.

2. Improving intellisense -- right now you get "slash commands" and snippets in our vscode extension to invoke components (which are really sweet), but we're aiming to get to a full intellisense type of experience.

3. Supporting external metrics layers where it makes sense. We've got some users using Cube, we're interested in Malloy, and the dbt semantic layer, those types of things.

One of our team members is very keen on building something he calls "Evidence studio", a sort of WYSIWYG editor you could invoke during development for generating basic queries, setting up charts, etc., that syncs the components into their text representation. That'd be further off though :)


The OS BI field is bigger than I expected.


There's redash as well.


It is a dead project


Seems to be getting updates, last commit 5 days ago


Databricks acquired it if I recall correctly.. so maybe.


Why so?


Platforms like this are always pretty funny. It's basically a gateway drug to their cloud platform (which isn't free) that they hope you use, but then they keep it open source so that they don't have to pay salaries to w2 employees. Smart thinkin'!


Thank you for mentioning it!

For anyone who is interested in our cloud service, it's an easy way to put your project online, keep it up to date with your data, and place it behind access control.

For many organizations, hosting Evidence in their own infrastructure is easy enough, but if you don't want to manage that, we are happy to manage it for you.

It is not free (that's how we pay the salaries), but pricing is available here:

https://evidence.dev/cloud


Right now I'm heavily into Grafana but when it comes to BI it kinda falls flat, I regularly have to fall back to using the Plotly plugin to create the charts (but it's getting better, at least you can do a normal scatter or bar chart out of the box since version 8. Labeling the axes is still a problem though). Navigation is also problematic, like jumping to a source table takes a lot of effort to make (basically you have to create a new dashboard and do some hyperlinking instead of there being a ready to go "view source data" button). I feel like there's a lot of friction to get Grafana to do BI, but I've also invested so much time in it I'm afraid to jump ship...


I’d advise metabase over superset.

Superset looked good, but operating superset quickly runs into the same Python issues all Python software suffers from.

Sometimes it would just break for no apparent reason. Configuring it was a nightmare of magic Python code and unclear settings. Trying to use plugins was equally painful: due to the poor boundary separating the application's dependencies from the plugins' dependencies, adding a db connector could just bork the whole application.


Or like not being able to delete a user without running some SQL:

https://github.com/apache/superset/issues/13345

I ran into this issue almost instantly while setting up a test instance of Superset. And the issue has been around for years.


Very sound advice. I started the setup process for Superset, and it's even worse on Windows. Contrast that to Superset which is a single .jar file and worked instantly.


One of those “Superset” seems wrong.


Also Metabase, which I found easier to deploy and use.


Metabase 47 also has new serialization features in paid editions that allow for git-based workflows. https://www.metabase.com/learn/administration/git-based-work...


What do people use for an "analytics.js" for reporting events with common items like campaign data, user device and user profile measurements, and related from browsers and devices?


Is there any open source BI tool that can be embedded in some other product so that users (not product developers but final users) can create their own dashboards?


QuickSight has a great embedding story.

Also CubeJS if you want a bit more flexibility.


Metabase


IMHO Metabase is too complex for end users: too much "code" (queries) to write and very limited in the presentation layer (widgets users can use to show the results).

I was thinking of something closer to Power BI, e.g. something like AppSmith but more BI-oriented (AppSmith is a generic tool to build your own applications, kind of like Visual Basic or Delphi).

https://www.appsmith.com/


This is awesome! Let's not forget about their second generation SQL database YDB which is an open source alternative to TiDB and YugabyteDB.


Yandex consistently pumps out great software. Clickhouse is awesome, for starters, as is this.


Apache 2.0 licensed (from a cursory glance at the first few repos).


Looks pretty neat. Having not fully investigated it yet, I will say the one thing I usually run into with this and other BI tools is a lack of flexibility in the UI for forming queries. It's sometimes limited to one table or view at a time. I wonder why more of them don't use more flexible querying techniques, perhaps just due to the risk of a bad query being formed?

My preferred approach is implemented in Zillion, which I use for BI at my company: https://github.com/totalhack/zillion


Nit on title:

it's => its


Just be glad it wasn't " its' " (sic), which has been showing up more and more in my input streams.


>Where can I find persistent application data storage?

>We use the .us-data folder to store PostgreSQL data permanently. You can delete this folder if you want, it will be recreated with the demo data after restarting the datalens-us container

Why "us"?



Any screenshots of how it looks?


The website has screenshots: https://datalens.tech/


Ah thanks, I was browsing through the github repos.



I'm giving up Power BI and I'm moving to Domo. I look at rolling my own occasionally.


(I'm the founder of a competitor to Domo)

I really like the concept of Domo. They have ETL, modeling, a warehouse and BI in one app ("data-stack-in-a-box"). I've interviewed 20 of their customers and the general sentiment was pretty bad. There's a long sales process, a longer process to get it set up, and they've built all the modeling and connectors themselves (vendor lock-in, none are best-in-class).

Definite (https://www.definite.app/) is a data-stack-in-a-box. We have a built-in modeling layer for core metrics and an AI assistant to answer any one-off questions.

A few ways we're different:

Built on open source - We run the data stack for you and give you a single app to manage and analyze your data, but it's all built on open source standards. So if you decide at any point you want to run it all yourself, the code is yours to lift and shift to your own infrastructure.

Battle tested connectors - We're using Meltano / Singer (open source library from Stitch) for our connectors, so they've been used heavily in production for years.

Self-serve that actually works - A lot of tools promise self-serve, but AI is making this real. We've invested heavily in making it possible to ask questions and get accurate answers. The AI queries a modeled view of your data that can answer questions that depend on well defined metrics (e.g. ARR, DAU, etc.).


> Text to SQL that actually works. Ask questions in natural language across multiple tables and our engine knows where to join.

Can you share any information about how it knows where to join?


We do a couple things here:

1. When you bring your own data warehouse, we parse the query history, convert it to an AST, and learn JOINs from there

2. When you're using our managed data warehouse and ETL, we already know most of the JOINs (e.g. we know how to join the Hubspot data we ingress to Stripe)

3. For anything not covered by #1 or #2, we have a modeling layer where you can specify how to join tables.
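
To make point #1 concrete, here is a rough sketch of mining join keys from a query history. It is not Definite's implementation: a couple of regular expressions stand in for a real SQL parser and AST, so it only handles simple `a.col = b.col` conditions, and the table names, column names, and the `learn_joins` helper are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical query history; every table and column name here is invented.
QUERY_HISTORY = [
    "SELECT c.email, s.plan FROM hubspot_contacts c "
    "JOIN stripe_customers s ON c.email = s.email",
    "SELECT * FROM stripe_customers JOIN hubspot_contacts "
    "ON stripe_customers.email = hubspot_contacts.email",
    "SELECT * FROM stripe_customers s LEFT JOIN stripe_invoices i "
    "ON i.customer_id = s.id",
]

# Words that may follow a table name but are not aliases.
_KEYWORDS = r"(?:JOIN|ON|WHERE|LEFT|RIGHT|INNER|OUTER|FULL|CROSS|GROUP|ORDER|LIMIT)\b"

# Matches "FROM table [AS] alias" and "JOIN table [AS] alias".
_TABLE_RE = re.compile(
    r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(?!" + _KEYWORDS + r")(\w+))?",
    re.IGNORECASE,
)

# Matches simple equality conditions such as "ON a.x = b.y".
_ON_RE = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)


def learn_joins(query_history):
    """Count how often each (table, column) pair is joined to another."""
    counts = Counter()
    for sql in query_history:
        # Map each alias (or bare table name) back to its table.
        aliases = {}
        for table, alias in _TABLE_RE.findall(sql):
            aliases[(alias or table).lower()] = table.lower()
        for lt, lc, rt, rc in _ON_RE.findall(sql):
            left = (aliases.get(lt.lower(), lt.lower()), lc.lower())
            right = (aliases.get(rt.lower(), rt.lower()), rc.lower())
            # Sort the two sides so "a = b" and "b = a" count as one join.
            counts[tuple(sorted((left, right)))] += 1
    return counts


joins = learn_joins(QUERY_HISTORY)
for (left, right), seen in joins.most_common():
    print(f"{left[0]}.{left[1]} = {right[0]}.{right[1]} (seen {seen}x)")
```

A real version would presumably walk the parsed AST instead, which is what would let it cope with subqueries, CTEs, and multi-column conditions that this regex approach ignores.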


all that text and no pricing page. shame.


Fair point! We have a free plan (capped at 3 users) and paid plans (which add ETL and a data warehouse) start at $500 a month.


We have Domo at work and it just seems overly complicated and insane. I'm wanting to learn it instead of just polling data sources and adding them to a local Opensearch instance, but... too verbose for me.


If you want to roll your own, maybe have a look at Plotly's Dash?


Observable Plot [0] is also nice. AFAIU it's the same library powering the visualizations within Observable itself.

[0] https://observablehq.com/plot/


Yandex open sourced all of its code a few months back when it was all stolen and leaked.



I would have no idea. Why are you asking me and sharing a link to something which tells you. That’s weird, and I’m not interested in clicking the link anyway.


Not sure why the downvotes - very useful context and phrased correctly as a question.


Cool. Rolling your own BI seems fraught with peril at most orgs where I imagine the buy vs build decision is always buy. How many PowerBI or Tableau seats do you need before rolling your own internal BI platform starts to make sense?


I don't think it ever makes sense because the large players will always be able to make new innovations (mobile apps, natural language querying, SSO integrations, etc) that, short of large corps hiring BI teams to invest in the open source ecosystem like superset, your open source solution won't have.


Yeah, unless you're at unicorn scale, I don't see how self-hosting BI would ever make sense relative to other investments you could make.


I must like to live dangerously. In all seriousness though there are low cost alternatives to those mega BI tools that suit many use cases. If I wasn't rolling my own I'd probably start with Metabase or Superset. What I use: https://github.com/totalhack/zillion


Yandex makes some pretty cool tech; they clearly have a lot of smart engineers.

It's a shame that geopolitics means most of it will have to be reinvented by someone else before it'll see any use.


Yeah. I’d love to use ClickHouse, but Yandex's ties to the Russian government make me not want to.


ClickHouse was open-sourced in 2016 and moved to an independent company in 2021, with no ties remaining with Russia or Yandex. Read more: https://clickhouse.com/blog/we-stand-with-ukraine/


Oh wow that was news to me, thanks!


Not only you. The company that I work for delivers software to government agencies. Dependencies like libraries or stack parts that are Russian or Chinese are simply banned. Even Harbor was rejected (and probably should be, as they provide a bunch of their own built Docker images, so you never know what your stack runs). I know "but it's open source" etc., and somebody may feel offended, but that's reality.


ClickHouse is a separate company registered in the US, and some part of it is owned by US-based venture funds.


It's open source, who cares what longitude and latitude the steward of the project is physically based in ... copy the code and make your own thing up


Yes, that is really a shame, although it's been fully open sourced.

Yandex is aware of how the geopolitical situation is hurting them and are therefore building a new company called Double.Cloud, based in Europe, to work around the negative public opinion on Yandex, and thereby continue being able to sell Clickhouse cloud services.


Need to borrow a tin foil hat?


For not wanting to be involved with the Russian government? Wow, what a conspiracy theory!


No need for that. I didn’t know Clickhouse was now disconnected from the Russian government.


Isn't it all open-source?


Some projects have refused code from developers because they are Russian or Chinese. Just because it is open source doesn't mean it is unbiased, free from politics, and without racism.


some projects racially discriminate developers to avoid possible bias, politics, and racism? lolz


Guys, congratulations, you have become stars. I posted your dialogue on one of the Russian resources, and it was read by 117 thousand people. https://pikabu.ru/story/yandeks_opublikoval_iskhodnyie_kodyi...


They would argue they are discriminating based on nationality not race.

I can't see that argument working in a court of law...


>I can't see that argument working in a court of law...

Don't US laws (or any other country's, for that matter) discriminate based on nationality all the time?


Generally only the government is allowed to discriminate based on nationality (i.e. for immigration); private companies cannot.


Since the discussion started in the comments already, I have a similar question: any recommendations for a solution (don’t care if OSS or not) that has the best UX for nontechnical people to assemble some data and reports anyhow? I have Salesforce, some mariadb/postgres and (optionally) hubspot as data sources.

I can buy or manually provision anything, no technical hurdles or policies from that side. My absolute focus is the raw UX for business people.

Suggestions?


Honestly, Metabase has given the best balance between allowing non-technical users to self-serve and letting technical users dig in and use raw SQL if that's what they want. It also has an OSS core, so you can self-host. It is super feature rich and has most everything in the OSS version, as long as you don't need enterprise features like SAML auth, audit logs, etc.


Came here to say this as well.

Metabase is the only tool I’ve used where I’ve managed to get non-technical users to actually engage and use the query-building tools to answer their own questions.


I'm a co-founder of Dataland.io where we're building a powerful dataset viewer + search engine that can work on top of your Postgres or data warehouse.

We designed it specifically to provide an excellent UX to business users while reducing BI burden on the data team. We find that most business users often just need to search, filter, and sort instead of looking at charts to make operational decisions.

UX-wise, what sets us apart are:

- <1s full-text search (even on billions of rows of data), feels like Cmd+F in Google sheets, but faster

- Performance: we stream billions of rows into the web browser, seamless scrolling (no paging of 50 records at a time)

- Rich cells make tables easier to scan/read (enum strings => colored tags, numbers => color-coded based on value, booleans => checkboxes, timestamptz => clear date time pills)

If that fits what you need, happy to give you a demo.

arthur(at)dataland.io

Otherwise, I think the simplest BI (if charts are important) could be something like evidence.dev or Metabase.

But I also think it's going to require some curation on your part. Can you reasonably expect business users to navigate the entire schema/table tree across these three sources? That's where I think the bottleneck often lies -- if your BI tool allows engineering to just expose a subset of curated core tables.


I'm the founder of Definite (https://www.definite.app/). We do ETL, modeling, storage and BI in one app ("data-stack-in-a-box").

> has the best UX for nontechnical people to assemble some data

If they can use Excel / pivot tables, they can use Definite. They can also just ask in natural language and we generate the report for them.

> Salesforce, some mariadb/postgres and (optionally) hubspot as data sources

We have pipelines for all of these and can spin up a managed data warehouse to store all the data if you don't already have one.

Drop me a note at mike@definite.app if you're interested


The problem is that developing the perfect UI for nontechnical people to assemble reports probably requires a bespoke frontend for your business, and one that likely lags behind the reality of its changes. Most businesses instead opt to just hire semitechnical people that can do a bit of data work and give answers to the report-writers, as they can accommodate business changes over time and understand how to construct new queries out of the overall business' data sources.

Maybe that'll change one day with AI, and when it does that will be bought by every big company in the world (-:


Veezoo (https://www.veezoo.com) is built to make it as easy as possible for nontechnical users to get answers to their ad-hoc questions.

It has followed a conversational, "ChatGPT-like" approach since 2016.

Info: I'm one of the founders.


What I dislike about your pricing is that it advertises a reasonable $29, and then the fine print says minimum 5 users. I get the reasoning behind your pricing logic, but I really dislike it. As a solo business owner, I'm gone.


If you have a single data source that you'd like to use you can even use it for free up to 5 users.


If you want a spreadsheet interface for business users, try Sigma Computing (https://www.sigmacomputing.com/)


glaring misuse of it's in the title


Well played ;-)

ps - I spotted the pedant that is technically correct, and I claim my 5 McFun bucks!


Add "'s" for genitive case in English, they said.


I'm Ukrainian


Aren’t you American?



