Hacker News — carljv's comments

Mazda is pretty allergic to touchscreens and views them as a safety risk. Environmental controls are physical buttons and knobs in all models I’ve seen, and even the infotainment features are controlled with buttons and a central knob. I never have to reach up and touch the screen. The screen isn’t even responsive to touch while the car is in motion.


First, the link should probably be to the full report:

https://www.kaggle.com/nomilk/data-science-language-and-job-...

Second, this is data scraped from Australian job listings. So the title should probably reflect that.

Lastly, this data seems a little odd. You’ve got month-to-month swings of 20% or more happening here. This data source seems extremely noisy and I don’t think there’s anything you can reliably say about a trend here.

I think others in the comments have pointed out why, if it were real, you might see this overall trend and why it doesn’t mean salaries for the same role are falling.

This looks like a fun side project, but I would be careful reading anything into the results.


Noted. I'll incorporate this feedback in the next edition (circa end 2022 or early 2023).

The large month-to-month swings are because the Australian labour market isn't so big, so even a single employer's hiring spree can significantly affect the number of listings in a given month. There are seasonal effects in there too, of course. It's volatile data.

I did a quick mock-up using a generalized additive model, which I think is a better fit. I'll use it next edition: https://i.imgur.com/7oTn9Oq.png


In addition to that, I think some lower-paying roles that used to be analysts or BI specialists are increasingly being called Data Scientists. My guess would be if you controlled for skills and job responsibilities you’d see increases.


That is not an accurate description of the book.


There’s some overlap, but vectors are essential to the language. Every type of data in R is a vector. There are no scalars, just vectors of length 1. Instead of dictionaries, it’s idiomatic in R to use “lists”, which are generic vectors whose elements can themselves be vectors of any type. Data frames are lists constrained to have equal-length element vectors (i.e., columns). Classes are defined as lists with some metadata (a class attribute, itself a character vector) to direct method dispatch.

It’s not just vectorizing mathematical operations a la numpy.
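A short base-R session illustrates the claims above (a sketch using only standard base R functions):

```r
# Every value is a vector: a "scalar" is just a vector of length 1.
x <- 3.14
length(x)      # 1
is.vector(x)   # TRUE

# Lists are generic vectors whose elements can hold anything.
l <- list(name = "a", values = c(1, 2, 3))
is.vector(l)   # TRUE -- lists are vectors too

# A data frame is a list of equal-length column vectors.
df <- data.frame(id = 1:3, score = c(10, 20, 30))
is.list(df)    # TRUE

# An S3 "class" is just a list plus a class attribute for dispatch.
obj <- structure(list(val = 42), class = "mything")
print.mything <- function(x, ...) cat("mything:", x$val, "\n")
print(obj)     # dispatches to print.mything
```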


This. During the period where Uber and Lyft weren't operating in Austin, there were several local alternatives that popped up right away, worked just fine, operated within local regs, and that I honestly preferred service-wise. If Austin can do it, I bet California can.


Yes, but after that Uber and Lyft aggressively drove those local alternatives out of business by giving effectively-free rides for the next couple of months. The majority of my rides around the downtown area during that time cost me _maybe_ a few dollars out of pocket. That is if I even paid at all, vs using "credit" which seemed to be arbitrarily added to my account.

It is illogical to start up a new business when it is more than likely that the previous incumbent will just come back in X number of months and operate at a loss until you're gone.


Should be illegal


> It's not the job of the AI researcher to solve "social biases" in every field, it's their job to build the AI.

"'Once the rockets are up, who cares where they come down? That's not my department' says Wernher von Braun"


Tim Berners-Lee should have stopped to consider the potential for dark web criminality before he developed the web.


If one is working on AI dealing with facial recognition and is oblivious to the potential for bias, and the unethical applications of that technology, at this point in the game, I can only assume it's willful.


It’s interesting to see how the definition of “tech company” has expanded. Only one or two of these companies create and sell software or technology. These are startups, maybe, and I think we can talk about where the “innovative” companies are. But these are companies that run platforms online, or use the web to sell products in novel ways. They seem very different from most of the companies on the parent comment’s list.


This is partly a side effect of tech becoming so big, it's segmenting into products vs services. For example, Tinder isn't directly selling software to anyone, it's selling a dating service that runs exclusively on software. You could say the same about Facebook, what software is it selling directly to consumers?

None. It's selling a social network service to its users and an ad network service to its clients, built entirely on software.


Sure. Tinder and Facebook are tech firms, and so is Bloomberg.

That mattress company you put on the list isn't. Lines in the sand, yadda yadda, slapping a website on a store doesn't make it "tech".


The R ecosystem around cloud services and their APIs still seems immature, so it’s great to see folks working on packages in the space.

I’m not 100% sure if this is providing any new functionality not provided by existing cloudyr projects or is just wrapping them in a new API. I think either is fine, but it would help to better understand why you’d want to use flyio vs., say, aws.s3 or the like.

Also, there are some aspects of the API that make me a little itchy. If I’m reading the examples correctly, it seems like flyio_set_datasource sets a global variable and then there are generic functions like list_files that do different things based on that global state?

That seems risky to me, and a more idiomatic approach to this would be to have a function that returns a handle object representing a Google Cloud or AWS service, then have generic functions take that handle and dispatch to appropriate methods.
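A hypothetical sketch of that handle-based shape (all names here are invented for illustration; this is not flyio's actual API):

```r
# Constructors return handle objects tagged with S3 classes.
s3_source  <- function(bucket) structure(list(bucket = bucket),
                                         class = c("s3_source", "cloud_source"))
gcs_source <- function(bucket) structure(list(bucket = bucket),
                                         class = c("gcs_source", "cloud_source"))

# A generic that dispatches on the handle -- no global state involved.
list_files <- function(src, ...) UseMethod("list_files")

# Stub methods; a real package would call aws.s3 or
# googleCloudStorageR here instead of returning an empty vector.
list_files.s3_source  <- function(src, ...) character(0)
list_files.gcs_source <- function(src, ...) character(0)

src <- s3_source("my-bucket")
list_files(src)  # dispatches to list_files.s3_source
```

Swapping backends then means constructing a different handle, rather than mutating shared state that every subsequent call silently depends on.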

Even then, namespacing in R isn’t really a thing, and I worry that really plain function names like list_files or export_file are likely to get clobbered by other packages using names like that. For packages like readr that are intended to actually replace large swaths of IO functions, that’s fine. But I’m not sure it makes sense for a more specialized package like this.

Despite that, I do appreciate you all creating and open sourcing this. Like I mentioned, any work on cloud packages is welcome from my perspective! Interested to see how this develops.


Isn't the main USP of flyio the cloud-agnostic part? You can move between local, Google, and Amazon storage without changing the code.


The Jupyter team deserves every accolade they get and more. The console, notebook, and now JupyterLab are some of the key reasons why Python's data ecosystem thrives.

I think Jupyter notebooks are quite useful as "rich display" shells. I often use them to set up simple interactive demos or tutorials to show folks or keep notes or scratch for myself.

That being said, I do think the "reproducibility" aspect of the notebook is overblown for the reasons other comments cite. Notebooks are hard to version control and diff, and are easy to "corrupt." I often see Jupyter notebooks described as "literate programs," and I really don't think that's an apt description. The notebook is basically the IPython shell exposed to the browser where you can display rich output.

This is where I think the R ecosystem's approach to the problem is better (a bit like org-mode & org-babel). For them, there is a literate program in plain text. Code blocks can be executed interactively and results displayed inline by a "viewer" on the document (like that provided by RStudio), but executing code doesn't change the source code of the program, and diffs/versions are only created by editing the source. At any point, the file can be "compiled" or processed into a static output document like HTML or PDF.

This is essentially literate programming but with an intermediate "interactive" feature facilitated by an external program. RMarkdown source doesn't know it's being interacted with or executed, and you can edit it like any other literate program.
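A minimal RMarkdown source file shows what that looks like in practice: plain text that diffs cleanly and only changes when you edit it (a generic sketch, not tied to any particular project):

````markdown
---
title: "Analysis"
output: html_document
---

Some prose explaining the analysis.

```{r}
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
```

More prose. Knitting produces the HTML or PDF; the source stays untouched.
````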

Interaction, reproducibility, and publication have fundamental tensions with each other. Jupyter notebooks are trying to do all three in the same software/format, and my sense is that they're starting to strain against those tensions.


Notebooks can be reproducible, they just aren’t automatically so. It requires a little bit of effort and discipline, if reproducibility is a goal. https://www.svds.com/jupyter-notebook-best-practices-for-dat... is an excellent starting point.

Personally, I use notebooks to keep a record of large computational pipelines. The key is to cache all results to disk. This allows for an iterative process where I modify the notebook, kill the kernel, and rerun everything. Only new calculations will be executed; everything previously calculated will simply be loaded from disk.

In the end, I have a reproducible record of the entire project (and rerunning the notebook is fast). This kind of make-like functionality is implemented through the doit Python package (http://pydoit.org). An example workflow for this is http://clusterjob.readthedocs.io/en/latest/pydoit_pipeline.h...


So basically you write a script instead of a notebook? If you save data to disk, is it still displayed with the rich formatting of Jupyter?


Well, it also contains markdown comments (often with LaTeX formulas, so the graphical rendering is appreciated), and, most importantly, plots of the results (which are typically fast to generate, so they are not cached).


I agree, 120%.

I like the R approach so much more.


I mean, as a medium for interactive exploration where you might want graphs and widgets or other rich/dynamic output, I still think the notebook is superior. But as a medium for developing complete, share-able, reproducible data analyses, I do think R has the upper hand.


Graphs, widgets and other rich/dynamic output is also possible with the R approach.

https://rmarkdown.rstudio.com/

Additionally, RStudio is an incredibly powerful IDE for data analysis.

EDIT: Interestingly, however, I still use ESS https://ess.r-project.org/ but that's because I love Emacs too much :D


I understand. I believe I pointed all that out in my comment above. I wasn't saying that I find notebooks superior because they allow for rich & dynamic output, but that I find them superior to RStudio when all you want is a quick exploratory REPL capable of rich/dynamic output. I simply find it easier to fire up a notebook and start noodling around than to write an RMarkdown notebook. That really only holds if I'm not overly concerned with keeping or sharing the notebook. Otherwise, I believe RMarkdown is the better option.

I also tend to gravitate towards ESS, and probably split my R development time between Emacs and RStudio. I've even written a very kludgy Rmd notebook mode that uses overlays to show evaluation results from code chunks. But RStudio is very well designed, and ESS just doesn't compare feature-wise, sadly.


I just like python pandas better than R.


Not me. I'd take dplyr and related libraries over pandas any day. I've been using pandas for 6 years and I'm still regularly tripped up by parts of its API.
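For a flavor of the dplyr style being preferred here, a small pipeline (assuming the dplyr package is installed; an illustration, not a benchmark of either API):

```r
library(dplyr)

# Mean mpg by cylinder count from the built-in mtcars data.
summary_tbl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n(), .groups = "drop") %>%
  arrange(cyl)

summary_tbl  # three rows: cyl = 4, 6, 8
```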

