Even with large scale data, Hadoop/Spark tend to be used in ways that make no sense, as if something being self-described as "big data" means that as soon as you cross some threshold you SHOULD be using them.
Recently had an argument with a senior engineer on our team because a pipeline that processed several PB of data, scaled to 1000+ machines, and was by all accounts a success was just a Python script using multiprocessing, distributed with ECS, and didn't use Spark.
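For anyone curious what that pattern looks like, here's a minimal sketch of the idea (not the actual pipeline; the env var names, manifest format, and `process_object` are all hypothetical). Each ECS task is an identical copy of the script, picks its slice of the input from two environment variables, and fans out locally with multiprocessing:

```python
import json
import os
from multiprocessing import Pool

def process_object(key: str) -> None:
    # Placeholder for the real per-object work (fetch, transform, write out).
    print(f"processing {key}")

def main() -> None:
    # ECS runs SHARD_COUNT copies of this task; copy i takes every i-th key.
    shard_index = int(os.environ["SHARD_INDEX"])  # 0 .. SHARD_COUNT-1, set per task
    shard_count = int(os.environ["SHARD_COUNT"])
    with open(os.environ["INPUT_MANIFEST"]) as f:  # JSON list of object keys
        keys = json.load(f)
    my_keys = keys[shard_index::shard_count]
    with Pool() as pool:  # defaults to one worker per CPU core
        pool.map(process_object, my_keys)

if __name__ == "__main__":
    main()
```

No cluster manager, no shuffle, no driver: the scheduler is just "run N copies of this container", which is plenty when the work is embarrassingly parallel.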