Yandex open sourced it's BI tool DataLens

xnx · on Sept 26, 2023

Such a huge landscape of BI and data visualization tools. Are there any clear open source leaders? Apache Superset?

amcaskill · on Sept 26, 2023

Superset and Metabase are definitely the OSS leaders.

I work on an open source code-based BI tool called Evidence, which might be of interest to you.

It's effectively a static site generator aimed at building automated reports and analysis.

https://github.com/evidence-dev/evidence

Previous discussions on HN:

https://news.ycombinator.com/item?id=28304781 - 91 comments

https://news.ycombinator.com/item?id=35645464 - 97 comments

Jgrubb · on Sept 27, 2023

I really, really like your idea with Evidence. I long for a mode in Metabase that’s like a “notebook mode”, where the main focus is narrative and it’s ornamented with viz, rather than the other way around.

I want to be able to publish this notebook when I’m done and then be able to hand that around, the same concept that you’ve built Evidence around. I think that’s a very good idea, so thanks.

The main thing keeping me from switching is that Metabase’s query builder and visualizations are too good for 95% of my work. It’s hard to picture going back to writing _all_ the SQL.

amcaskill · on Sept 27, 2023

Thank you!

I hear that. We're making a lot of progress on reducing the amount of SQL you need to write, keeping it DRY, etc. Making the dev experience really buttery and high leverage is definitely a priority.

Here are a few of the things we're working on in that regard:

1. Making our components issue their own queries so that you don't need to write full SQL expressions, just fragments when you're defining the chart you want.

2. Improving IntelliSense -- right now you get "slash commands" and snippets in our VS Code extension to invoke components (which are really sweet), but we're aiming to get to a full IntelliSense type of experience.

3. Supporting external metrics layers where it makes sense. We've got some users using Cube, we're interested in Malloy, and the dbt semantic layer, those types of things.

One of our team members is very keen on building something he calls "Evidence studio", sort of a WYSIWYG editor you could invoke during development for generating basic queries, setting up charts, etc., that syncs the components into their text representation. That'd be further off though :)
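
To make item 1 concrete, here is a minimal sketch of what "components issuing their own queries from fragments" could look like. It is not Evidence's actual API: `build_query` and its parameters are hypothetical names, and it assumes the `y` fragment is an aggregate expression.

```python
def build_query(table, x, y, where=None):
    """Expand chart-level fragments into one full SQL statement.

    Hypothetical helper for illustration only; assumes `y` is an
    aggregate expression such as "sum(amount)".
    """
    sql = f"SELECT {x} AS x, {y} AS y FROM {table}"
    if where:
        sql += f" WHERE {where}"
    # Group and order by the first selected column (the x axis).
    sql += " GROUP BY 1 ORDER BY 1"
    return sql


# The chart author supplies only fragments; the component writes the rest.
print(build_query("orders", "region", "sum(amount)", where="status = 'paid'"))
```

The point is that the person defining the chart only names the table and the two expressions, and the boilerplate around them is generated.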

3abiton · on Sept 27, 2023

The OS BI field is bigger than I expected.

cvalka · on Sept 26, 2023

There's redash as well.

dingdong33 · on Sept 26, 2023

It is a dead project

antman · on Sept 26, 2023

Seems to be getting updates, last commit 5 days ago

amcaskill · on Sept 26, 2023

Databricks acquired it if I recall correctly.. so maybe.

cvalka · on Sept 27, 2023

Why so?

otteromkram · on Sept 27, 2023

Platforms like this are always pretty funny. It's basically a gateway drug to their cloud platform (which isn't free) that they hope you use, but then they keep it open source so that they don't have to pay salaries to w2 employees. Smart thinkin'!

amcaskill · on Sept 27, 2023

Thank you for mentioning it!

For anyone who is interested in our cloud service, it's an easy way to put your project online, keep it up to date with your data, and place it behind access control.

For many organizations, hosting Evidence in their own infrastructure is easy enough, but if you don't want to manage that, we are happy to manage it for you.

It is not free (that's how we pay the salaries), but pricing is available here:

https://evidence.dev/cloud

RedShift1 · on Sept 26, 2023

Right now I'm heavily into Grafana but when it comes to BI it kinda falls flat, I regularly have to fall back to using the Plotly plugin to create the charts (but it's getting better, at least you can do a normal scatter or bar chart out of the box since version 8. Labeling the axes is still a problem though). Navigation is also problematic, like jumping to a source table takes a lot of effort to make (basically you have to create a new dashboard and do some hyperlinking instead of there being a ready to go "view source data" button). I feel like there's a lot of friction to get Grafana to do BI, but I've also invested so much time in it I'm afraid to jump ship...

FridgeSeal · on Sept 26, 2023

I’d advise metabase over superset.

Superset looked good, but operating superset quickly runs into the same Python issues all Python software suffers from.

Sometimes it would just break for no apparent reason. Configuring it was a nightmare of magic Python code and unclear settings. Trying to use plugins was equally painful: due to the poor boundary separating the application's dependencies from the plugins' dependencies, adding a db connector could just bork the whole application.

PetahNZ · on Sept 26, 2023

Or like not being able to delete a user without running some SQL:

https://github.com/apache/superset/issues/13345

I ran into this issue almost instantly when setting up a test instance of Superset. And the issue has been around for years.

xnx · on Sept 26, 2023

Very sound advice. I started the setup process for Superset, and it's even worse on Windows. Contrast that to Superset which is a single .jar file and worked instantly.

amarshall · on Sept 27, 2023

One of those “Superset” seems wrong.

noughtme · on Sept 26, 2023

Also Metabase, which I found easier to deploy and use.

dacort · on Sept 26, 2023

Metabase 47 also has new serialization features in paid editions that allow for git-based workflows. https://www.metabase.com/learn/administration/git-based-work...

doctorpangloss · on Sept 26, 2023

What do people use for an "analytics.js" for reporting events with common items like campaign data, user device and user profile measurements, and related from browsers and devices?

zxspectrum1982 · on Sept 27, 2023

Is there any open source BI tool that can be embedded in some other product so that users (not product developers but final users) can create their own dashboards?

moltar · on Sept 29, 2023

QuickSight has a great embedding story.

Also CubeJS if you want a bit more flexibility.

throw03172019 · on Sept 27, 2023

Metabase

zxspectrum1982 · on Sept 27, 2023

IMHO Metabase is too complex for end users: too much "code" (queries) to write and very limited in the presentation layer (widgets users can use to show the results).

I was thinking of something closer to Power BI, e. g. something like AppSmith but more BI-oriented (AppSmith is a generic tool to build your own applications, kind of like Visual Basic or Delphi).

https://www.appsmith.com/

cvalka · on Sept 26, 2023

This is awesome! Let's not forget about their second generation SQL database YDB which is an open source alternative to TiDB and YugabyteDB.

mdekkers · on Sept 27, 2023

Yandex consistently pumps out great software. Clickhouse is awesome, for starters, as is this.

felixhummel · on Sept 26, 2023

Apache 2.0 licensed (from a cursory glance at the first few repos).

totalhack · on Sept 26, 2023

Looks pretty neat. Having not fully investigated it yet, I will say the one thing I usually run into with this and other BI tools is a lack of flexibility in the UI for forming queries. It's sometimes limited to one table or view at a time. I wonder why more of them don't use more flexible querying techniques, perhaps just due to the risk of a bad query being formed?

My preferred approach is implemented in Zillion, which I use for BI at my company: https://github.com/totalhack/zillion

dwheeler · on Sept 26, 2023

Nit on title:

it's => its

pklausler · on Sept 26, 2023

Just be glad it wasn't " its' " (sic), which has been showing up more and more in my input streams.

unixhero · on Oct 1, 2023

>Where can I find persistent application data storage?

>We use the .us-data folder to store PostgreSQL data permanently. You can delete this folder if you want, it will be recreated with the demo data after restarting the datalens-us container

Why "us"?

nijave · on Oct 1, 2023

Unified Storage. https://github.com/datalens-tech/datalens-us

RedShift1 · on Sept 26, 2023

Any screenshots of how it looks?

capableweb · on Sept 26, 2023

The website has screenshots: https://datalens.tech/

RedShift1 · on Sept 26, 2023

Ah thanks, I was browsing through the github repos.

kasperni · on Sept 26, 2023

https://cloud.yandex.com/en/services/datalens

lasermike026 · on Sept 26, 2023

I'm giving up Power BI and I'm moving to Domo. I look at rolling my own occasionally.

mritchie712 · on Sept 26, 2023

(I'm the founder of a competitor to Domo)

I really like the concept of Domo. They have ETL, modeling, a warehouse and BI in one app ("data-stack-in-a-box"). I've interviewed 20 of their customers and the general sentiment was pretty bad. There's a long sales process, a longer process to get it set up, and they've built all the modeling and connectors themselves (vendor lock-in, none are best-in-class).

Definite (https://www.definite.app/) is a data-stack-in-a-box. We have a built-in modeling layer for core metrics and an AI assistant to answer any one-off questions.

A few ways we're different:

Built on open source - We run the data stack for you and give you a single app to manage and analyze your data, but it's all built on open source standards. So if you decide at any point you want to run it all yourself, the code is yours to lift and shift to your own infrastructure.

Battle tested connectors - We're using Meltano / Singer (open source library from Stitch) for our connectors, so they've been used heavily in production for years.

Self-serve that actually works - A lot of tools promise self-serve, but AI is making this real. We've invested heavily in making it possible to ask questions and get accurate answers. The AI queries a modeled view of your data that can answer questions that depend on well defined metrics (e.g. ARR, DAU, etc.).

philipodonnell · on Sept 27, 2023

> Text to SQL that actually works. Ask questions in natural language across multiple tables and our engine knows where to join.

Can you share any information about how it knows where to join?

mritchie712 · on Sept 27, 2023

We do a couple things here:

1. When you bring your own data warehouse, we parse the query history, convert it to an AST and learn JOINs from there

2. When you're using our managed data warehouse and ETL, we already know most of the JOINs (e.g. we know how to join the Hubspot data we ingest to Stripe)

3. For anything not covered by #1 or #2, we have a modeling layer where you can specify how to join tables.
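
A rough sketch of the idea in #1, for readers who want to picture it. This is not Definite's implementation: it swaps the AST step for a simple regex, assumes queries use unaliased table names in their `ON` clauses, and `learn_joins` and the sample history are made up for illustration.

```python
import re
from collections import Counter

# Matches "ON a.col = b.col". A regex stand-in for a real SQL parser/AST;
# it assumes unaliased table names and single-column equality joins.
ON_RE = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)


def learn_joins(query_history):
    """Count how often each pair of table.column keys is joined."""
    counts = Counter()
    for sql in query_history:
        for lt, lc, rt, rc in ON_RE.findall(sql):
            left = f"{lt}.{lc}".lower()
            right = f"{rt}.{rc}".lower()
            # Sort the pair so "a = b" and "b = a" count as the same join.
            counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical query history, invented for this example.
history = [
    "SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id",
    "SELECT * FROM customers JOIN orders ON customers.id = orders.customer_id",
    "SELECT * FROM orders JOIN items ON items.order_id = orders.id",
]
joins = learn_joins(history)
print(joins.most_common())
```

The most frequently seen key pairs become the default join paths; anything that never shows up in the history would fall through to the manual modeling layer described in #3.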

danr4 · on Sept 26, 2023

all that text and no pricing page. shame.

mritchie712 · on Sept 26, 2023

Fair point! We have a free plan (capped at 3 users) and paid plans (which add ETL and a data warehouse) start at $500 a month.

_boffin_ · on Sept 26, 2023

We have Domo at work and it just seems overly complicated and insane. I'm wanting to learn it instead of just polling data sources and adding them to a local Opensearch instance, but... too verbose for me.

RedShift1 · on Sept 26, 2023

If you want to roll your own, maybe have a look at Plotly's Dash?

chatmasta · on Sept 26, 2023

Observable Plot [0] is also nice. AFAIU it's the same library powering the visualizations within Observable itself.

[0] https://observablehq.com/plot/

wutwutwat · on Sept 26, 2023

Yandex open sourced all of its code a few months back when it was all stolen and leaked.

6c696e7578 · on Sept 26, 2023

Didn't the CEO flee due to the war?

https://www.intellinews.com/russian-tech-titan-yandex-ceo-vo...

wutwutwat · on Sept 26, 2023

I would have no idea. Why are you asking me and sharing a link to something which tells you. That’s weird, and I’m not interested in clicking the link anyway.

bthomas · on Sept 27, 2023

Not sure why the downvotes - very useful context and phrased correctly as a question.

mobileexpert · on Sept 26, 2023

Cool. Rolling your own BI seems fraught with peril at most orgs where I imagine the buy vs build decision is always buy. How many PowerBI or Tableau seats do you need before rolling your own internal BI platform starts to make sense?

htrp · on Sept 26, 2023

I don't think it ever makes sense because the large players will always be able to make new innovations (mobile apps, natural language querying, SSO integrations, etc) that, short of large corps hiring BI teams to invest in the open source ecosystem like superset, your open source solution won't have.

mritchie712 · on Sept 26, 2023

Yeah, unless you're at unicorn scale, I don't see how self-hosting BI would ever make sense relative to other investments you could make.

totalhack · on Sept 26, 2023

I must like to live dangerously. In all seriousness though there are low cost alternatives to those mega BI tools that suit many use cases. If I wasn't rolling my own I'd probably start with Metabase or Superset. What I use: https://github.com/totalhack/zillion

londons_explore · on Sept 26, 2023

Yandex makes some pretty cool tech; they clearly have a lot of smart engineers.

It's a shame that geopolitics means most of it will have to be reinvented by someone else before it'll see any use.

efxhoy · on Sept 26, 2023

Yeah. I’d love to use ClickHouse but Yandex's ties to the Russian government make me not want to.

zX41ZdbW · on Sept 26, 2023

ClickHouse was open-sourced in 2016 and moved to an independent company in 2021, with no ties remaining with Russia or Yandex. Read more: https://clickhouse.com/blog/we-stand-with-ukraine/

efxhoy · on Sept 27, 2023

Oh wow that was news to me, thanks!

kmac_ · on Sept 26, 2023

Not only you. The company that I work for delivers software to government agencies. Dependencies like libraries or stack parts that are Russian or Chinese are simply banned. Even Harbor was rejected (and probably should be, as they provide a bunch of their own built Docker images, so you never know what your stack runs). I know "but it's open source" etc., and somebody may feel offended, but that's reality.

mahoro · on Sept 26, 2023

ClickHouse is a separate company registered in US and some part of it is owned by US-based venture funds.

Eumenes · on Sept 26, 2023

It's open source, who cares what longitude and latitude the steward of the project is physically based in ... copy the code and make your own thing

Silasdev · on Sept 26, 2023

Yes, that is really a shame, although it's been fully open sourced.

Yandex is aware of how the geopolitical situation is hurting them and are therefore building a new company called Double.Cloud, based in Europe, to work around the negative public opinion on Yandex, and thereby continue being able to sell Clickhouse cloud services.

macinjosh · on Sept 27, 2023

Need to borrow a tin foil hat?

freilanzer · on Sept 27, 2023

For not wanting to be involved with the russian government? Wow, what a conspiracy theory!

efxhoy · on Sept 27, 2023

No need for that. I didn’t know Clickhouse was now disconnected from the Russian government.

cpursley · on Sept 26, 2023

Isn't it all open-source?

Dah00n · on Sept 26, 2023

Some projects have refused code from developers because they are Russian or Chinese. Just because it is open source doesn't mean it is unbiased, free from politics, and without racism.

slt2021 · on Sept 26, 2023

some projects racially discriminate developers to avoid possible bias, politics, and racism? lolz

lam0x86 · on Sept 29, 2023

Guys, congratulations, you have become stars. I posted your dialogue on one of the Russian resources, and it was read by 117 thousand people. https://pikabu.ru/story/yandeks_opublikoval_iskhodnyie_kodyi...

londons_explore · on Sept 26, 2023

They would argue they are discriminating based on nationality not race.

I can't see that argument working in a court of law...

neoromantique · on Sept 26, 2023

>I can't see that argument working in a court of law...

Don't US (or any other country's, for that matter) laws discriminate based on nationality all the time?

londons_explore · on Sept 27, 2023

Generally only the government is allowed to discriminate based on nationality (ie. for immigration), but private companies cannot.

anonyfox · on Sept 26, 2023

Since the discussion started in the comments already, I have a similar question: any recommendations for a solution (don’t care if OSS or not) that has the best UX for nontechnical people to assemble some data and reports anyhow? I have Salesforce, some mariadb/postgres and (optionally) hubspot as data sources.

I can buy or manually provision anything, no technical hurdles or policies from that side. My absolute focus is the raw UX for business people.

Suggestions?

davidarenas · on Sept 26, 2023

Honestly, Metabase has given the best balance between allowing non-technical users to self-serve and technical users to dig in and use raw SQL if that's what they want. It also has an OSS core, so you can self-host. It is super feature-rich and has almost everything in the OSS version, as long as you don't need enterprise features like SAML auth, audit log, etc.

FridgeSeal · on Sept 26, 2023

Came here to say this as well.

Metabase is the only tool I’ve used where I’ve managed to get non-technical users to actually engage and use the query building tools to answer their own questions.

arthurwu · on Sept 26, 2023

I'm a co-founder of Dataland.io where we're building a powerful dataset viewer + search engine that can work on top of your Postgres or data warehouse.

We designed it specifically to provide an excellent UX to business users while reducing BI burden on the data team. We find that most business users often just need to search, filter, and sort instead of looking at charts to make operational decisions.

UX-wise, what sets us apart are:

- <1s full-text search (even on billions of rows of data), feels like Cmd+F in Google sheets, but faster

- Performance: we stream billions of rows into the web browser, seamless scrolling (no paging of 50 records at a time)

- Rich cells make tables easier to scan/read (enum strings => colored tags, numbers => color-coded based on value => checkboxes, timestamptz => clear date time pills)

If that fits what you need, happy to give you a demo.

arthur(at)dataland.io

Otherwise, I think the simplest BI (if charts are important) could be something like evidence.dev or Metabase.

But I also think it's going to require some curation on your part. Can you reasonably expect business users to navigate the entire schema/table tree across these three sources? That's where I think the bottleneck often lies -- if your BI tool allows engineering to just expose a subset of curated core tables.

mritchie712 · on Sept 26, 2023

I'm the founder of Definite (https://www.definite.app/). We do ETL, modeling, storage and BI in one app ("data-stack-in-a-box").

> has the best UX for nontechnical people to assemble some data

If they can use Excel / pivot tables, they can use Definite. They can also just ask in natural language and we generate the report for them.

> Salesforce, some mariadb/postgres and (optionally) hubspot as data sources

We have pipelines for all of these and can spin up a managed data warehouse to store all the data if you don't already have one.

Drop me a note at mike@definite.app if you're interested

robertlagrant · on Sept 26, 2023

The problem is that developing the perfect UI for nontechnical people to assemble reports probably requires a bespoke frontend for your business, and one that likely lags behind the reality of its changes. Most businesses instead opt to just hire semitechnical people that can do a bit of data work and give answers to the report-writers, as they can accommodate business changes over time and understand how to construct new queries out of the overall business' data sources.

Maybe that'll change one day with AI, and when it does that will be bought by every big company in the world (-:

tillvz · on Sept 26, 2023

Veezoo (https://www.veezoo.com) is built to make it as easy as possible for nontechnical users to get answers to their ad-hoc questions.

It has followed a conversational "ChatGPT-like" approach since 2016.

Info: I'm one of the founders.

tomschwiha · on Sept 26, 2023

What I dislike about your pricing is that it shows me a reasonable $29 and then the fine print says minimum 5 users. I get the reasoning behind your pricing logic, but I really dislike it. Now, as a solo business owner, I'm gone.

tillvz · on Sept 26, 2023

If you have a single data source that you'd like to use you can even use it for free up to 5 users.

Aaronstotle · on Sept 26, 2023

If you want a spreadsheet interface for business users, Sigma Computing (https://www.sigmacomputing.com/)

riffic · on Sept 26, 2023

glaring misuse of it's in the title

Logans_Run · on Sept 26, 2023

Well played ;-)

ps - I spotted the pedant that is technically correct, and I claim my 5 McFun bucks!

culebron21 · on Sept 27, 2023

Add "'s" for genitive case in English, they said.

vgt · on Sept 26, 2023

I'm Ukrainian

ande-mnoc · on Sept 27, 2023

Aren’t you American?