- Languages: Java.
- Scripting languages: Perl, Python or similar.
- CS algorithms: sorting, hashing, recursion, trees, graphs, etc.
- Hadoop core: MapReduce, HDFS.
- Hadoop utilities: Oozie, ZooKeeper.
- Relational algebra (SQL).
- Unix shell programming (sh, bash, csh, zsh): pipes, redirection, process control, etc.
- Unix pipeline utilities: awk, sed, grep, find, etc.
- Unix system utilities: cron, at, kill, ssh, sftp, etc.
- Regular expressions.
- Hadoop cluster administration: queues, quotas, replication, block size, decommission nodes, add nodes, etc.
- JVM-based functional languages: Scala, Clojure.
- Hadoop pipeline frameworks: Streaming, Crunch, Cascading.
- Hadoop productivity frameworks: Scrunch, Scoobi.
- Hadoop query languages: Pig, Hive, Scalding, Cascalog, PyCascading.
- Hadoop libraries: Mahout.
- Alternative HDFS-based computing frameworks: Spark (Pregel).
- Serialization frameworks: Avro, Thrift, Protocol Buffers.
- Distributed databases: Cassandra, Voldemort, HBase, MongoDB, CouchDB.
- Real-time event streaming: Storm, S4, InfoSphere Streams (IBM).
- Statistics, data mining or machine learning: expectation, regression, clustering, etc.
- Specific experience with the Cloudera Hadoop distribution.
- Unix system administration: sudo, mountd, bind, sendmail, etc.
- Database administration: MySQL, SQLite, Oracle, or similar.
- Scripting languages: Perl, Python or similar.
- CS algorithms: sorting, hashing, recursion, trees, graphs, etc.
- Hadoop core: MapReduce, HDFS.
- Hadoop utilities: Oozie, ZooKeeper.
- Relational algebra (SQL).
- Unix shell programming (sh, bash, csh, zsh): pipes, redirection, process control, etc.
- Unix pipeline utilities: awk, sed, grep, find, etc.
- Unix system utilities: cron, at, kill, ssh, sftp, etc.
- Regular expressions.
- Hadoop cluster administration: queues, quotas, replication, block size, decommission nodes, add nodes, etc.
- JVM-based functional languages: Scala, Clojure.
- Hadoop pipeline frameworks: Streaming, Crunch, Cascading.
- Hadoop productivity frameworks: Scrunch, Scoobi.
- Hadoop query languages: Pig, Hive, Scalding, Cascalog, PyCascading.
- Hadoop libraries: Mahout.
- Alternative HDFS-based computing frameworks: Spark (Pregel).
- Serialization frameworks: Avro, Thrift, Protocol Buffers.
- Distributed databases: Cassandra, Voldemort, HBase, MongoDB, CouchDB.
- Real-time event streaming: Storm, S4, InfoSphere Streams (IBM).
- Statistics, data mining or machine learning: expectation, regression, clustering, etc.
- Specific experience with the Cloudera Hadoop distribution.
- Unix system administration: sudo, mountd, bind, sendmail, etc.
- Database administration: MySQL, SQLite, Oracle, or similar.