
TensorFlow might not be the fastest in terms of computation speed, but it can take you from research to production with TensorFlow Serving.

As such, you won't need to reimplement your model or convert it to another format before deploying it.


It is definitely doable. You can refer to this blog post by AWS (https://aws.amazon.com/blogs/big-data/join-amazon-redshift-a...) to set up a foreign data wrapper (FDW) to Redshift.

What is more exciting is that you can leverage Redshift's MPP architecture with this method.


We use Airflow at Tech in Asia as well.


I once worked on a similar project. About 5 TB of data came in each day.

If your data are event data (e.g. user activity, clicks, etc.), they are non-volatile and should be preserved as-is so that you can enrich them later for analysis.

You can store these flat files in S3, use EMR (Hive, Spark) to process them, and store the results in Redshift. If your files are character-delimited, you can easily create a table definition with Hive/Spark and query the files as if they were in an RDBMS. You can process your files in EMR using spot instances, which can cost less than a dollar per hour.
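To make the "table definition over a delimited file" idea concrete, here is a minimal local sketch. It is not Hive, Spark, or EMR: it swaps in Python's standard `csv` and `sqlite3` modules purely to illustrate the pattern of declaring a schema over flat files and then querying them with SQL. The column names, the pipe delimiter, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical pipe-delimited event data; in the setup described above this
# would be a flat file sitting in S3 rather than an inline string.
RAW_EVENTS = """user_id|event|ts
u1|click|2015-06-01T10:00:00
u2|view|2015-06-01T10:00:05
u1|click|2015-06-01T10:01:00
"""


def load_events(raw_text, delimiter="|"):
    """Declare a table schema and load the delimited rows into it."""
    conn = sqlite3.connect(":memory:")
    # The "table definition": column names and types for the flat file.
    conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, ts TEXT)")
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    rows = [(r["user_id"], r["event"], r["ts"]) for r in reader]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def clicks_per_user(conn):
    """Query the loaded file with plain SQL, as one would an RDBMS table."""
    cur = conn.execute(
        "SELECT user_id, COUNT(*) FROM events "
        "WHERE event = 'click' GROUP BY user_id ORDER BY user_id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(clicks_per_user(load_events(RAW_EVENTS)))
```

With Hive or Spark the raw event files stay untouched in storage and the schema is applied when they are read, which is what makes the "preserve as-is, enrich later" approach practical.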


Correct me if I'm wrong. I watched the Safari Content Blocker video presented at WWDC 2015, and it mentioned that the list of content to be filtered is compiled to bytecode instead of being read as a JSON file, which makes it more efficient and less draining on the CPU. Since it is compiled down to bytecode, 32-bit will not be compatible with 64-bit, and that's why only the newer iPhones and iPads are supported. It is not that the iPhone 5 isn't powerful enough; its CPU architecture simply isn't supported.


That's the most artificially overengineered solution I've seen in a while. Since the adblock list is custom, it would have to be "compiled" on the phone anyway, so arch mismatch simply doesn't apply. Even if it did, it could be done at phone startup. It's "compiling" a list of strings, not building an office suite...

There are so many high-performance/low-power ways to solve the extremely complicated problem of "does a given string appear in a given list?"... this is just Apple looking for excuses to force people on 5 to upgrade, as usual.
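As a rough illustration of the "does a given string appear in a given list?" framing, here is a minimal sketch using a hash set, which gives constant-time lookups on average. It assumes exact hostname matching plus parent-domain matching. Real content-blocker rules may be URL patterns rather than exact strings, so this is not a description of how Safari does it. The function names and sample hosts are hypothetical.

```python
def build_blocklist(hosts):
    """Normalize the hosts once and store them in a frozenset."""
    return frozenset(h.strip().lower() for h in hosts if h.strip())


def is_blocked(host, blocklist):
    """Return True if the host or any of its parent domains is listed."""
    parts = host.lower().split(".")
    # Check "a.b.c", then "b.c", then "c" against the set.
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))


if __name__ == "__main__":
    blocklist = build_blocklist(["ads.example.com", "tracker.test"])
    print(is_blocked("cdn.ads.example.com", blocklist))
    print(is_blocked("example.com", blocklist))
```

Each lookup costs a handful of hash probes, one per label in the hostname, regardless of how many entries the list holds.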


If Apple was looking for excuses to force people on 5 to upgrade, they'd simply not support iOS 9 on that device at all...


Supporting old devices is a kind of marketing against Android. How well those devices actually work with the new OS is another matter.


They would need to be compiled to a different architecture. Guess Apple doesn't want to write the code to compile the list to older 32-bit ARM.


Can totally relate to this. I have written, scrapped, and rewritten the code a few times over the past 4 years (1461 days). I am almost there!

Great advice and now I need to get things started again.


It is the buzz surrounding Hadoop that makes people misunderstand its uses and capabilities. I have met non-technical analysts who want RDBMS performance on Hadoop. They expect seconds-to-minutes scale queries on hundreds of GB of data.

I always throw this analogy at people who misunderstand Hadoop: would you crack an egg with a stone or a spoon?

Hadoop and RDBMS only have a thin overlapping region in the Venn diagram that describes their capabilities and use cases.

Ultimately, it is cost vs efficiency. Hadoop can solve all data problems. Likewise for RDBMS. This is an engineering tradeoff that people have to make.


I totally agree with you. "LIKE" capability will drive Hadoop adoption, but Hadoop should not be seen as a replacement for RDBMS. These are two different tools made for different purposes.


> They expect seconds to minutes scale queries on hundreds of GB of data.

Use BigQuery from Google.


On-premise cluster.

Cloud solutions are totally out due to the nature of the data. Not everything can be done in the cloud.

If you have such a huge amount of data, the total time it takes to transfer it there and compute is not competitive with an on-premise solution, unless all your data already lives in the cloud.
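A back-of-the-envelope sketch of the transfer cost: the data size and link speed below are hypothetical, and the calculation assumes decimal units, a fully saturated link, and no protocol overhead, so real transfers would take longer.

```python
def transfer_hours(data_tb, link_gbps):
    """Hours needed to move data_tb terabytes over a link_gbps link."""
    bits = data_tb * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)  # link speed in bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: 5 TB over a dedicated 1 Gbps uplink.
    print(f"{transfer_hours(5, 1):.1f} hours")
```

Under those assumptions, 5 TB over 1 Gbps takes roughly 11 hours before any computation starts, which is the overhead an on-premise cluster avoids.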


I would look into https://spark.apache.org/ then. You can get quite good performance out of it, but you need to spend more effort on babysitting your data.

