yuanchuan's comments

yuanchuan · on June 17, 2017

TensorFlow might not be the fastest in terms of computation speed, but it can take you from research to production with TensorFlow Serving.

As such, you won't need to reimplement your model or convert it into another format for production use.

yuanchuan · on Nov 15, 2016

It is definitely doable. You can refer to this blog post by AWS (https://aws.amazon.com/blogs/big-data/join-amazon-redshift-a...) to set up FDW to Redshift.

What is more exciting is that you can leverage Redshift's MPP architecture with this method.

yuanchuan · on Nov 11, 2016

We use Airflow at Tech in Asia as well.

yuanchuan · on Feb 25, 2016

I once worked on a similar project. About 5 TB of data came in each day.

If your data are event data (e.g. user activity, clicks), they are non-volatile: you should preserve them as-is and enrich them later on for analysis.

You can store these flat files in S3, use EMR (Hive, Spark) to process them, and load the results into Redshift. If your files are character-delimited, you can easily create a table definition with Hive/Spark and query them as if they were in an RDBMS. You can process your files in EMR using spot instances, which can cost less than a dollar per hour.
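To make the "define a table over delimited files and query it" idea concrete, here is a minimal local sketch. It swaps in Python's built-in sqlite3 for Hive/Spark on EMR, and the sample rows, the `events` table, and its column names are all made up for illustration. Unlike a Hive external table, which reads the files in place, this loads the rows into an in-memory database first.

```python
import csv
import io
import sqlite3

# Hypothetical pipe-delimited event data; in the setup described above this
# would be a flat file sitting in S3.
RAW_EVENTS = """\
u1|click|2016-02-24T10:00:00
u2|view|2016-02-24T10:00:05
u1|click|2016-02-24T10:01:00
"""


def load_events(raw: str) -> sqlite3.Connection:
    """Declare a schema for the delimited rows, then load them into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, action TEXT, ts TEXT)")
    rows = csv.reader(io.StringIO(raw), delimiter="|")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def clicks_per_user(conn: sqlite3.Connection) -> list:
    """Query the loaded events with plain SQL, as you would an RDBMS."""
    query = """
        SELECT user_id, COUNT(*)
        FROM events
        WHERE action = 'click'
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(clicks_per_user(load_events(RAW_EVENTS)))
```

The shape is the same at scale: declare the columns and the delimiter once, then everything downstream is ordinary SQL.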

yuanchuan · on Sept 25, 2015

Correct me if I'm wrong. I watched the Safari Content Blocker video presented at WWDC 2015, and it mentioned that the list of content to be filtered is compiled to bytecode instead of being read as a JSON file, which makes it more efficient and less draining on the CPU. Since it is compiled down to bytecode, 32-bit will not be compatible with 64-bit, and that's why only the newer iPhones and iPads are compatible. It is not that the iPhone 5 is not powerful enough; its CPU architecture simply doesn't support it.

toyg · on Sept 25, 2015

That's the most artificially overengineered solution I've seen in a while. Since the adblock list is custom, it would have to be "compiled" on the phone anyway, so arch mismatch simply doesn't apply. Even if it did, it could be done at phone startup. It's "compiling" a list of strings, not building an office suite...

There are so many high-performance/low-power ways to solve the extremely complicated problem of "does a given string appear in a given list?"... this is just Apple looking for excuses to force people on 5 to upgrade, as usual.

kalleboo · on Sept 25, 2015

If Apple was looking for excuses to force people on 5 to upgrade, they'd simply not support iOS 9 on that device at all...

vbezhenar · on Sept 25, 2015

Supporting old devices is kind of marketing against Android. How those devices actually work with the new OS is another matter.

kccqzy · on Sept 25, 2015

They would need to be compiled to a different architecture. Guess Apple doesn't want to write the code to compile the list to older 32-bit ARM.

yuanchuan · on Sept 16, 2015

Can totally relate to this. I have written, scrapped, and rewritten the code a few times over the past 4 years (1461 days). I am almost there!

Great advice and now I need to get things started again.

yuanchuan · on Jan 19, 2015

It is the buzz surrounding Hadoop that makes people misunderstand its use and capability. I have met non-technical analysts who want RDBMS performance on Hadoop. They expect seconds-to-minutes scale queries on hundreds of GB of data.

I always give this analogy to people who misunderstand Hadoop: would you crack an egg with a stone or a spoon?

Hadoop and RDBMS only have a thin overlapping region in the Venn diagram that describes their capabilities and use cases.

Ultimately, it is cost vs efficiency. Hadoop can solve all data problems. Likewise for RDBMS. This is an engineering tradeoff that people have to make.

sleepythread · on Jan 19, 2015

I totally agree with you. Capability "LIKE" will drive Hadoop adoption. Hadoop should not be seen as a replacement for an RDBMS. These are two different tools made for different purposes.

pacala · on Jan 19, 2015

> They expect seconds to minutes scale queries on hundreds of GB of data.

Use BigQuery from Google.

yuanchuan · on Jan 20, 2015

On-premise cluster.

Cloud solutions are totally out due to the nature of the data. Not everything can be done in the cloud.

If you have such a huge amount of data, the total time it takes to transfer it there and compute is not competitive with an on-premise solution, unless all your data already live in the cloud.
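A rough back-of-the-envelope sketch of the transfer cost, assuming decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second). The function name and the `efficiency` parameter are hypothetical, and the figures in the usage lines are illustrative rather than measurements.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move data_tb terabytes over a link.

    efficiency is the assumed fraction of the nominal link speed that is
    actually achieved (1.0 means the full line rate).
    """
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8                       # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative only: 5 TB over a fully used 1 Gbps link, then 10 Gbps.
    print(round(transfer_hours(5, 1), 1))
    print(round(transfer_hours(5, 10), 1))
```

On these assumptions, 5 TB takes roughly 11 hours on a fully saturated 1 Gbps uplink, before any compute starts, which is the gap an on-premise cluster avoids.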

pacala · on Jan 20, 2015

I would look into https://spark.apache.org/ then. You can get quite good performance out of it, but you need to spend more effort on babysitting your data.