Sunday, December 16, 2012

Hadoop ecosystem

- Languages: Java.

- Scripting languages: Perl, Python or similar.

- CS algorithms: sorting, hashing, recursion, trees, graphs, etc.

- Hadoop core: MapReduce, HDFS.

- Hadoop utilities: Oozie, ZooKeeper.

- Relational algebra (SQL).

- Unix shell programming (sh, bash, csh, zsh): pipes, redirection, process control, etc.

- Unix pipeline utilities: awk, sed, grep, find, etc.

- Unix system utilities: cron, at, kill, ssh, sftp, etc.

- Regular expressions.

- Hadoop cluster administration: queues, quotas, replication, block size, decommission nodes, add nodes, etc.

- JVM-based functional languages: Scala, Clojure.

- Hadoop pipeline frameworks: Streaming, Crunch, Cascading.

- Hadoop productivity frameworks: Scrunch, Scoobi.

- Hadoop query languages: Pig, Hive, Scalding, Cascalog, PyCascading.

- Hadoop libraries: Mahout.

- Alternative HDFS-based computing frameworks: Spark (with Bagel, its Pregel-style graph API).

- Serialization frameworks: Avro, Thrift, Protocol Buffers.

- Distributed databases: Cassandra, Voldemort, HBase, MongoDB, CouchDB.

- Real-time event streaming: Storm, S4, InfoSphere Streams (IBM).

- Statistics, data mining or machine learning: expectation, regression, clustering, etc.

- Specific experience with the Cloudera Hadoop distribution.

- Unix system administration: sudo, mountd, bind, sendmail, etc.

- Database administration: MySQL, SQLite, Oracle, or similar.
