Sunday, December 16, 2012

Hadoop ecosystem

- Languages: Java.

- Scripting languages: Perl, Python or similar.


- CS algorithms: sorting, hashing, recursion, trees, graphs, etc.


- Hadoop core: MapReduce, HDFS.


- Hadoop utilities: Oozie, ZooKeeper.


- Relational algebra (SQL).


- Unix shell programming (sh, bash, csh, zsh): pipes, redirection, process control, etc.


- Unix pipeline utilities: awk, sed, grep, find, etc.


- Unix system utilities: cron, at, kill, ssh, sftp, etc.


- Regular expressions.



- Hadoop cluster administration: queues, quotas, replication, block size, decommission nodes, add nodes, etc.


- JVM-based functional languages: Scala, Clojure.


- Hadoop pipeline frameworks: Streaming, Crunch, Cascading.


- Hadoop productivity frameworks: Scrunch, Scoobi.


- Hadoop query languages: Pig, Hive, Scalding, Cascalog, PyCascading.


- Hadoop libraries: Mahout.


- Alternative HDFS-based computing frameworks: Spark (Pregel-style graph processing).


- Serialization frameworks: Avro, Thrift, Protocol Buffers.


- Distributed databases: Cassandra, Voldemort, HBase, MongoDB, CouchDB.


- Real-time event streaming: Storm, S4, InfoSphere Streams (IBM).


- Statistics, data mining or machine learning: expectation, regression, clustering, etc.



- Specific experience with the Cloudera Hadoop distribution.


- Unix system administration: sudo, mountd, bind, sendmail, etc.


- Database administration: MySQL, SQLite, Oracle, or similar.
