Show HN: Smart Fruit – A Python schema-based machine learning library
44 points by madman_bob on June 30, 2018 | 5 comments
I've made a small Python library, designed for quick-and-easy prototyping of machine learning models. It's built on top of scikit-learn, and it serializes and deserializes data between the forms you're likely to have and the format scikit-learn uses.

https://github.com/madman-bob/Smart-Fruit
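
As a rough illustration of the kind of conversion described above, here is a minimal sketch of a schema that turns dict-shaped records into the numeric rows scikit-learn expects. This is not Smart Fruit's actual API: the `Number` and `Label` classes and the `to_matrix` helper are hypothetical names made up for this example, and it uses only the standard library.

```python
class Number:
    """Hypothetical field type: the value is passed through as a float."""

    def encode(self, value):
        return [float(value)]


class Label:
    """Hypothetical field type: one-hot encodes a fixed set of labels."""

    def __init__(self, labels):
        self.labels = list(labels)

    def encode(self, value):
        if value not in self.labels:
            raise ValueError(f"unknown label: {value!r}")
        return [1.0 if value == label else 0.0 for label in self.labels]


def to_matrix(schema, records):
    """Convert dict-shaped records into a list of numeric rows (a 2D matrix)."""
    return [
        [x for name, field in schema.items() for x in field.encode(record[name])]
        for record in records
    ]


# The schema declares, per column, how raw values become numbers.
schema = {
    "weight": Number(),
    "colour": Label(["red", "green"]),
}
records = [
    {"weight": "5.1", "colour": "red"},   # strings, as read from a CSV
    {"weight": 4.9, "colour": "green"},
]

X = to_matrix(schema, records)
print(X)  # [[5.1, 1.0, 0.0], [4.9, 0.0, 1.0]]
```

The resulting list of rows has the shape scikit-learn estimators accept as `X`; going the other way (numbers back to labels) would be the "deserialize" half.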

It's pretty bare-bones at the moment, but I thought I'd see if there was any interest before spending too much time on it.

Let me know what you think.



I've recently written a related library - given a DataFrame it'll run sklearn's RandomForest to check which columns predict other columns. The goal is to learn which relationships exist within a DataFrame. Typically in the exploratory process in machine learning we want to learn how the data holds together - this tool helps with that discovery exercise. It'll auto-LabelEncode text and allows classification or regression. There are two example Notebooks (Titanic & Boston) to show what it is doing. Correlations (Pearson, Spearman, Kendall) can also be calculated. The RandomForest result can show non-linear relationships that aren't exposed by correlations. https://github.com/ianozsvald/discover_feature_relationships
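
To make the last point concrete, here is a small standard-library sketch (not code from the library above) of a relationship that Pearson correlation misses entirely. The `lookup_r2` helper is a hypothetical, deliberately crude stand-in for a non-linear model such as a random forest, and it is scored on its own training data, so it only shows that the relationship is learnable.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lookup_r2(xs, ys):
    """R^2 of predicting y by the mean y observed for each distinct x."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    predictions = [sum(groups[x]) / len(groups[x]) for x in xs]
    mean_y = sum(ys) / len(ys)
    residual = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - residual / total


xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]  # y is fully determined by x, but not linearly

linear_score = pearson(xs, ys)       # symmetric parabola: no linear trend
nonlinear_score = lookup_r2(xs, ys)  # the non-linear stand-in fits it exactly

print(f"Pearson r = {linear_score:.3f}, lookup R^2 = {nonlinear_score:.3f}")
```

Here the correlation is zero even though one column predicts the other perfectly, which is the kind of gap a tree-based model can expose.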


I liked the boldness of this idea. But 'something' needs to select the sklearn model and tune its hyper-params - how long can you keep it all hidden away from the user?

The training phase can be considerably long. Have you thought of some kind of an async wrapper that Smart Fruit might provide or will the user be expected to code it up?
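
For reference, if the user had to code it up themselves, one way to do it is to push the blocking training call onto a worker thread with `asyncio`. This is only a sketch under assumptions: `train_model` is a hypothetical placeholder that sleeps to simulate a slow fit, not anything Smart Fruit provides.

```python
import asyncio
import time


def train_model(rows):
    """Hypothetical blocking training call; sleeps to simulate a slow fit."""
    time.sleep(0.2)
    return {"trained_on": len(rows)}


async def train_async(rows):
    """Run the blocking training call in a worker thread and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, train_model, rows)


async def main():
    task = asyncio.create_task(train_async([1, 2, 3]))
    ticks = 0
    while not task.done():
        # The event loop stays free to do other work while training runs.
        ticks += 1
        await asyncio.sleep(0.05)
    return await task, ticks


model, ticks = asyncio.run(main())
print(model, "event loop ticks while waiting:", ticks)
```

A thread keeps the event loop responsive, but for CPU-bound pure-Python work a `concurrent.futures.ProcessPoolExecutor` passed to `run_in_executor` would likely be the better fit.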

This is more of a user experience comment - when the interface is designed to feel as if one is interacting with a DB / ORM, the user may come to assume that the outcomes will be deterministic... While the returned results will remain deterministic provided the training data, model and hyper-parameters remain the same, it won't feel as deterministic when any of these is updated... I am not sure if I communicated my concern clearly. I am trying to understand who the intended end-user of this package is...


I would propose a potential user as someone interested in some of the meta considerations and patterns of statistical reasoning, aka machine learning. There is a vast amount of particulars behind how the second hand on my watch operates (e.g. vibrating quartz, digital), but I can use that mostly reliable device to investigate higher-level phenomena, like calculating the distance of planets by timing their movement. This library opens a direct line to these algorithms such that one might intuit, and apply, their high-level behavior; just as I could not time planets if consumed with the fidelity and reliability of resonating quartz, it would slow my ability to explore this kind of reasoning if I were concerned with the minutiae.

That said, all points taken. If this sparks interest in someone, as it stands, it would be on them to dig into all the considerations you've outlined.


I love it. Pasted in the column headers to `iris.data` from the Iris website. Voila, up and running per the instructions on GitHub. For prototyping / exploring ideas, for the syntactical layman who is conceptually familiar, what a boon.


This looks like a good porcelain for sklearn. Many, including myself, find it intimidating at times.



