- NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.
Tuesday, December 18, 2012
Monday, December 17, 2012
Cloudera Hadoop
Cloudera Hadoop offers:
- HDFS – Self-healing distributed file system
- MapReduce – Powerful, parallel data processing framework
- Hadoop Common – A set of utilities that support the Hadoop subprojects
- HBase – Hadoop database for random read/write access
- Hive – SQL-like queries and tables on large datasets
- Pig – Dataflow language and compiler
- Oozie – Workflow for interdependent Hadoop jobs
- Sqoop – Integrate databases and data warehouses with Hadoop
- Flume – Highly reliable, configurable streaming data collection
- ZooKeeper – Coordination service for distributed applications
The Hadoop Distributed File System (HDFS) is a distributed file
system designed to run on commodity hardware. It has many similarities with
existing distributed file systems. However, the differences from other
distributed file systems are significant. HDFS is highly fault-tolerant and is
designed to be deployed on low-cost hardware. HDFS provides high throughput
access to application data and is suitable for applications that have large
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to
file system data.
Name Node:
NameNode manages the namespace, file system metadata, and access
control. There is exactly one NameNode in each cluster. We can say the NameNode is
the master and the DataNodes are the slaves. It holds all the information about
the data (i.e. the metadata).
Data Node:
DataNode holds the actual file system data. Each data node
manages its own locally-attached storage (i.e. the node’s hard disk) and stores
a copy of some or all blocks in the file system. There are one or more
DataNodes in each cluster.
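To make the NameNode/DataNode split more concrete, here is a toy bash model of the metadata the NameNode keeps: which blocks make up a file, and which DataNodes hold a copy of each block. It is a sketch under stated assumptions, not how HDFS is implemented: it assumes a 64 MB block size (the Hadoop 1.x default for dfs.block.size) and a replication factor of 3 (the dfs.replication default), the DataNode names dn1..dn4 are hypothetical, and the round-robin placement is a simplification of HDFS's real rack-aware placement policy.

```bash
#!/usr/bin/env bash
# Toy model of NameNode metadata: file -> blocks -> DataNodes holding replicas.
# ASSUMPTIONS: 64 MB blocks (Hadoop 1.x dfs.block.size default), replication 3
# (dfs.replication default), hypothetical DataNode names, and simple round-robin
# placement instead of HDFS's real rack-aware policy.

BLOCK_SIZE=$((64 * 1024 * 1024))
REPLICATION=3
DATANODES=(dn1 dn2 dn3 dn4)

# Number of blocks needed for a file of the given size in bytes (ceiling division).
block_count() {
  local size=$1
  echo $(( (size + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# Print one line per block: "<file> blk_<i>: <datanode> <datanode> <datanode>".
block_map() {
  local file=$1 size=$2
  local n i r idx
  local nodes
  local count=${#DATANODES[@]}
  n=$(block_count "$size")
  for (( i = 0; i < n; i++ )); do
    nodes=()
    for (( r = 0; r < REPLICATION; r++ )); do
      idx=$(( (i + r) % count ))   # next DataNode in round-robin order
      nodes+=("${DATANODES[idx]}")
    done
    echo "${file} blk_${i}: ${nodes[*]}"
  done
}

block_map /logs/access.log $((200 * 1024 * 1024))
```

A 200 MB file needs four 64 MB blocks (the last one holds only 8 MB), so this prints:

/logs/access.log blk_0: dn1 dn2 dn3
/logs/access.log blk_1: dn2 dn3 dn4
/logs/access.log blk_2: dn3 dn4 dn1
/logs/access.log blk_3: dn4 dn1 dn2

Only this mapping lives on the NameNode; the block contents themselves live on the DataNodes listed for each block.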
Install / Deploy Hadoop:
Hadoop can be installed in three modes:
1. Standalone mode:
To deploy Hadoop in standalone mode, we just need to set JAVA_HOME. In this
mode there is no need to start the daemons or to format the NameNode, because
data is saved on the local disk.
2. Pseudo-distributed mode:
In this mode all the daemons (NameNode, DataNode,
SecondaryNameNode, JobTracker, TaskTracker) run on a single machine.
3. Fully distributed mode:
In this mode the daemons run on separate machines of a multi-node cluster.
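One rough way to tell the three modes apart is to look at two Hadoop 1.x settings: fs.default.name (core-site.xml) and mapred.job.tracker (mapred-site.xml). Their shipped defaults, file:/// and local, are exactly the standalone setup. The bash sketch below is a heuristic, not an official check: treating localhost in both values as pseudo-distributed and any other host as fully distributed is an assumption for illustration, the function name hadoop_mode is hypothetical, and the host names in the last call are made up.

```bash
#!/usr/bin/env bash
# Rough heuristic: classify a Hadoop 1.x deployment mode from
# fs.default.name (core-site.xml) and mapred.job.tracker (mapred-site.xml).
# ASSUMPTION: "localhost"/"127.0.0.1" in both values means pseudo-distributed,
# any other host means fully distributed; real setups may not follow this.

hadoop_mode() {
  local fs_default=$1 job_tracker=$2
  if [[ $fs_default == file:* && $job_tracker == local ]]; then
    # Shipped defaults: local file system, in-process job runner, no daemons.
    echo standalone
  elif [[ $fs_default == hdfs://localhost* || $fs_default == hdfs://127.0.0.1* ]] &&
       [[ $job_tracker == localhost:* || $job_tracker == 127.0.0.1:* ]]; then
    # All daemons on one machine, reached over localhost.
    echo pseudo-distributed
  else
    # Daemons addressed by other host names across the cluster.
    echo fully-distributed
  fi
}

hadoop_mode "file:///" "local"
hadoop_mode "hdfs://localhost:9000" "localhost:9001"
hadoop_mode "hdfs://namenode.example.com:8020" "jobtracker.example.com:8021"
```

The three calls print standalone, pseudo-distributed and fully-distributed, in that order.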
Besides Apache Hadoop, it's more or less a three-horse race for Hadoop distributions between Hortonworks, Cloudera and MapR. Then there are Greenplum HD and IBM InfoSphere BigInsights.
In Apache, all the projects (Pig, Hive, etc.) are independent. Cloudera makes sure all these frameworks work properly with each other and packages them as CDH. CDH has regular releases, which I haven't seen with Apache. Another point is that it's difficult to get support for Apache Hadoop, while Cloudera and others provide commercial support for their own versions of Hadoop.
CDH versions:
- CDH4.2.0
Sunday, December 16, 2012
Hadoop ecosystem
- Languages: Java.
- Scripting languages: Perl, Python or similar.
- CS algorithms: sorting, hashing, recursion, trees, graphs, etc.
- Hadoop core: MapReduce, HDFS.
- Hadoop utilities: Oozie, ZooKeeper.
- Relational algebra (SQL).
- Unix shell programming (sh, bash, csh, zsh): pipes, redirection, process control, etc.
- Unix pipeline utilities: awk, sed, grep, find, etc.
- Unix system utilities: cron, at, kill, ssh, sftp, etc.
- Regular expressions.
- Hadoop cluster administration: queues, quotas, replication, block size, decommission nodes, add nodes, etc.
- JVM-based functional languages: Scala, Clojure.
- Hadoop pipeline frameworks: Streaming, Crunch, Cascading.
- Hadoop productivity frameworks: Scrunch, Scoobi.
- Hadoop query languages: Pig, Hive, Scalding, Cascalog, PyCascading.
- Hadoop libraries: Mahout.
- Alternative HDFS-based computing frameworks: Spark (Pregel).
- Serialization frameworks: Avro, Thrift, Protocol Buffers.
- Distributed databases: Cassandra, Voldemort, HBase, MongoDB, CouchDB.
- Real-time event streaming: Storm, S4, InfoSphere Streams (IBM).
- Statistics, data mining or machine learning: expectation, regression, clustering, etc.
- Specific experience with the Cloudera Hadoop distribution.
- Unix system administration: sudo, mountd, bind, sendmail, etc.
- Database administration: MySQL, SQLite, Oracle, or similar.
Difference between 'hadoop dfs' and 'hadoop fs'
With my latest assignment I have started exploring Hadoop and related technologies. While exploring HDFS and playing with it, I came across these two syntaxes for querying HDFS:
> hadoop dfs
> hadoop fs
Initially I could not tell the two apart and kept wondering why there are two different syntaxes for the same purpose. I searched the web, found other people asking the same question, and below is their reasoning:
According to Chris's explanation, there is no difference between the two syntaxes. Look at the definitions of the two commands (hadoop fs and hadoop dfs) in $HADOOP_HOME/bin/hadoop:
...
elif [ "$COMMAND" = "datanode" ] ; then
CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS"
elif [ "$COMMAND" = "fs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
...
That is his reasoning: both commands are mapped to the same class, org.apache.hadoop.fs.FsShell, so there is no difference. I was not convinced by this, so I looked for a more convincing answer; here are a few excerpts that made more sense to me:
fs relates to a generic file system, which can point to any file system such as the local file system, HDFS, etc., whereas dfs is specific to HDFS. So fs can perform operations from/to the local file system or HDFS, while dfs operations relate only to HDFS.
Below are excerpts from the Hadoop documentation, which describes these two as different shells:
FS Shell
The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands.
DFShell
The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands.
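The scheme rule in these excerpts (use the scheme if the path carries one, otherwise fall back to the configured default) can be sketched in a few lines of bash. This is a simplified, string-based illustration, not FsShell's actual Java implementation: the function name resolve_path is hypothetical, and relative paths (which real HDFS resolves against the user's home directory, /user/<name>) are not handled.

```bash
#!/usr/bin/env bash
# Simplified sketch of the scheme rule in the FS/DFS shell docs:
# a path with its own scheme is used as-is; a bare absolute path is
# resolved against the configured default file system.
# ASSUMPTION: plain string handling; the real FsShell uses Java URI/Path
# classes, and relative paths (resolved against /user/<name>) are not handled.

resolve_path() {
  local path=$1 default_fs=$2
  if [[ $path =~ ^[A-Za-z][A-Za-z0-9+.-]*:// ]]; then
    echo "$path"                        # explicit scheme wins
  else
    echo "${default_fs%/}/${path#/}"    # fall back to the default file system
  fi
}

resolve_path /parent/child hdfs://namenodehost
resolve_path /parent/child file:///
resolve_path hdfs://namenodehost/parent/child file:///
```

The first two calls send the same bare path to different file systems (hdfs://namenodehost/parent/child and file:///parent/child) depending only on the default, while the third, an absolute URI, resolves to hdfs://namenodehost/parent/child whatever the default is.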
From the above we can conclude that it all depends on the configured scheme. When the two commands are used with an absolute URI, i.e. scheme://a/b, their behavior is identical. The only cause of the difference in behavior is the default configured scheme: file for fs and hdfs for dfs, respectively.
source
Saturday, December 15, 2012
NoSQL database
LIST OF NOSQL DATABASES [currently 150]
Core NoSQL Systems: [Mostly originated out of a Web 2.0 need]
Wide Column Store / Column Families
Hadoop / HBase
API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java, Concurrency: ?, Misc: Links: 3 Books [1, 2, 3]
Cassandra
API: many Thrift languages, Protocol: ?, Query Method: MapReduce, Replication: , Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook, Slides, Clients, Installation
Hypertable
API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable. Commercial support
Accumulo
Accumulo is based on BigTable and is built on top of Hadoop, Zookeeper, and Thrift. It features improvements on the BigTable design in the form of cell-based access control, improved compression, and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.
Amazon SimpleDB
Misc: not open source / part of AWS, Book (will be outperformed by DynamoDB ?!)
Cloudata
Google's Bigtable clone like HBase. Article
Cloudera
Professional Software & Services based on Hadoop.
HPCC
from LexisNexis, info, article
Stratosphere
(research system) massively parallel & flexible execution, M/R generalization and extension (paper, poster).
Document Store
MongoDB
API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++, Concurrency: Update in Place, Misc: Indexing, GridFS, Links: Talk, Notes
Couchbase Server
API: Memcached API+protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets, commercially supported version available, Links: Wiki, Article
CouchDB
API: JSON, Protocol: REST, Query Method: MapReduceR of JavaScript Funcs, Replication: Master Master, Written in: Erlang, Concurrency: MVCC, Misc: Links: 3 CouchDB books, Couch Lounge (partitioning / clustering), Dr. Dobbs
RavenDB
.Net solution. Provides HTTP/JSON access. LINQ queries & Sharding supported. Misc
MarkLogic Server
(freeware+commercial) API: JSON, XML, Java, Protocols: HTTP, REST, Query Method: Full Text Search, XPath, XQuery, Range, Geospatial, Written in: C++, Concurrency: Shared-nothing cluster, MVCC, Misc: Petabyte-scalable, cloudable, ACID transactions, auto-sharding, failover, master slave replication, secure with ACLs. Developer Community
Clusterpoint Server
(freeware+commercial) API: XML, PHP, Java, .NET, Protocols: HTTP, REST, native TCP/IP, Query Method: full text search, XML, range and XPath queries, Written in: C++, Concurrency: ACID-compliant, transactional, multi-master cluster, Misc: Petabyte-scalable document store and full text search engine. Information ranking. Replication. Cloudable
ThruDB
Uses Apache Thrift to integrate multiple backend databases such as BerkeleyDB, Disk, MySQL, S3.
Terrastore
API: Java & http, Protocol: http, Language: Java, Querying: Range queries, Predicates, Replication: Partitioned with consistent hashing, Consistency: Per-record strict consistency, Misc: Based on Terracotta
RaptorDB
JSON-based document store database with compiled .NET map functions, automatic hybrid bitmap indexing and LINQ query filters
JasDB
Lightweight document database written in Java for high performance. API: JSON, Java, Protocol: REST, Query Method: REST OData Style Query language, Java fluent Query API, Written in: Java, Concurrency: Atomic document writes, Indexes: eventually consistent indexes
SisoDB
A Document Store on top of SQL Server.
SDB
For small online databases, PHP / JSON interface, implemented in PHP.
SchemaFreeDB
Allows powerful SQL statements through its enhanced SFQL query language. Provides join-like operations with simple dot-notated relations. Complex queries across complex relationships at run-time. Delivered as a pure JSON-over-HTTP Web API.
djondb
djonDB API: BSON, Protocol: C++, Query Method: dynamic queries and map/reduce, Drivers: Java, C++, PHP, Misc: ACID compliant, full shell console over Google V8 engine, djondb requirements are submitted by users, not by the market. License: GPL and commercial
EJDB
Embedded JSON database engine based on Tokyo Cabinet. API: C/C++ BSON, Node.js binding, Protocol: Native, Written in: C, Query language: MongoDB-like dynamic queries, Concurrency: RW locking, transactional, Misc: Indexing, collection level rw locking, collection level transactions, License: LGPL
densodb
DensoDB is a new NoSQL document database, written for the .NET environment in C#. It’s simple, fast and reliable. Source
Key Value / Tuple Store
DynamoDB
Automatic ultra scalable NoSQL DB based on fast SSDs. Multiple Availability Zones. Elastic MapReduce Integration. Backup to S3 and much more...
Azure Table Storage
Collections of free form entities (row key, partition key, timestamp). Blob and Queue Storage available, 3 times redundant. Accessible via REST or ATOM.
Riak
API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters, Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks), Misc: ... Links: talk
Redis
API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronously to disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues. Cheat-Sheet, great slides, Admin UI, From the Ground up
Aerospike
Ultra-fast & Web Scale DB. In-memory + Native flash. Predictable Performance - balanced 250k/50k TPS reads/writes, 99% under 1 ms. Concurrency: ACID + Tunable Consistency. Replication: Zero Config, Zero Downtime, auto clustering, cross datacenter replication, rolling upgrades. Written in: C. APIs: Many. Links: Native Flash/SSDs, 1M TPS on $5k server, 17x lower TCO, Zero Downtime
LevelDB
Fast & Batch updates. DB from Google. Written in C++. Blog, hot Benchmark, Article (in German). Java access.
Chordless
API: Java & simple RPC to vals, Protocol: internal, Query Method: M/R inside value objects, Scaling: every node is master for its slice of namespace, Written in: Java, Concurrency: serializable transaction isolation
GenieDB
Immediate consistency sharded KV store with an eventually consistent AP store bringing eventual consistency issues down to the theoretical minimum. It features efficient record coalescing. GenieDB speaks SQL and co-exists / does inter-table joins with SQL RDBMSs.
Scalaris
Written in: Erlang, Replication: Strong consistency over replicas, Concurrency: non-blocking Paxos.
Tokyo Cabinet / Tyrant
Links: nice talk, slides, Misc: Kyoto Cabinet
Scalien
API / Protocol: http (text, html, JSON), C, C++, Python, Java, Ruby, PHP, Perl. Concurrency: Paxos.
Berkeley DB
API: Many languages, Written in: C, Replication: Master / Slave, Concurrency: MVCC, License: Sleepycat, Berkeley DB Java Edition: API: Java, Written in: Java, Replication: Master / Slave, Concurrency: serializable transaction isolation, License: Sleepycat
Voldemort
Open-Source implementation of Amazon's Dynamo Key-Value Store.
Dynomite
Open-Source implementation of Amazon's Dynamo Key-Value Store, written in Erlang, with "data partitioning, versioning, and read repair, and user-provided storage engines provide persistence and query processing".
KAI
Open Source Amazon Dynamo implementation, Misc: slides
MemcacheDB
API: Memcache protocol (get, set, add, replace, etc.), Written in: C, Data Model: Blob, Misc: Is Memcached writing to BerkeleyDB.
Faircom C-Tree
API: C, C++, C#, Java, PHP, Perl, Written in: C, C++. Misc: Transaction logging. Client/server. Embedded. SQL wrapper (not core). Been around since 1979.
LSM
Key-Value database that was written as part of SQLite4. They claim it is faster than LevelDB. Instead of supporting custom comparators, they have a recommended data encoding for keys that allows various data types to be sorted.
HamsterDB
(embedded solution) ACID Compliance, Lock Free Architecture (transactions fail on conflict rather than block), Transaction logging & fail recovery (redo logs), In Memory support – can be used as a non-persisted cache, B+ Trees – supported [Source: Tony Bain]
STSdb
API: C#, Written in: C#, embedded solution, generic XTable
Tarantool/Box
API: C, Perl, PHP, Python, Java and Ruby. Written in: Objective C, Protocol: asynchronous binary, memcached, text (Lua console). Data model: collections of dimensionless tuples, indexed using primary + secondary keys. Concurrency: lock-free in memory, consistent with disk (write ahead log). Replication: master/slave, configurable. Other: call Lua stored procedures.
Maxtable
API: C, Query Method: MQL, native API, Replication: DFS Replication, Consistency: strict consistency, Written in: C.
Pincaster
For geolocalized apps. Concurrency: in-memory with asynchronous disk writes. API: HTTP/JSON. Written in: C. License: BSD.
RaptorDB
A pure key value store with optimized b+tree and murmur hashing. (In the near future it will be a JSON document database much like MongoDB and CouchDB.)
TIBCO Active Spaces
Peer-to-peer distributed in-memory (with persistence) datagrid that implements and expands on the concept of the Tuple Space. Has SQL Queries and ACID (=> NewSQL).
allegro-C
Key-Value concept. Variable number of keys per record. Multiple key values, Hierarchic records. Relationships. Diff. record types in same DB. Indexing: B*-Tree. All aspects configurable. Full scripting language. Multi-user ACID. Web interfaces (PHP, Perl, ActionScript) plus Windows client.
nessDB
A fast key-value Database (using LSM-Tree storage engine), API: Redis protocol (SET, MSET, GET, MGET, DEL etc.), Written in: ANSI C
HyperDex
Distributed searchable key-value store. Fast (latency & throughput), scalable, consistent, fault tolerant, using hyperspace hashing. APIs for C, C++ and Python.
Mnesia
(ErlangDB)
LightCloud
(based on Tokyo Tyrant)
Hibari
Hibari is a highly available, strongly consistent, durable, distributed key-value data store.
BangDB
API: Get, Put, Delete, Protocol: Native, HTTP, Flavor: Embedded, Network, Elastic Cache, Replication: P2P based Network Overlay, Written in: C++, Concurrency: ?, Misc: robust, crash proof, Elastic, throw machines to scale linearly, Btree/Ehash
OpenLDAP
Key-value store, B+tree. Lightning fast reads+fast bulk loads. Memory-mapped files for persistent storage with all the speed of an in-memory database. No tuning conf required. Full ACID support. MVCC, readers run lockless. Tiny code, written in C, compiles to under 32KB of x86-64 object code. Modeled after the BerkeleyDB API for easy migration from Berkeley-based code. Benchmarks against LevelDB, Kyoto Cabinet, SQLite3, and BerkeleyDB are available, plus full paper and presentation slides.
Elliptics
Github Page
[SubRecord, Mo8onDb, Dovetaildb]
Graph Databases
Neo4J
API: lots of langs, Protocol: Java embedded / REST, Query Method: SparQL, native Java API, JRuby, Replication: typical MySQL style master/slave, Written in: Java, Concurrency: non-blocking reads, writes lock involved nodes/relationships until commit, Misc: ACID possible, Links: Video, good Blog
Infinite Graph
(by Objectivity) API: Java, Protocol: Direct Language Binding, Query Method: Graph Navigation API, Predicate Language Qualification, Written in: Java (Core C++), Data Model: Labeled Directed Multi Graph, Concurrency: Update locking on subgraphs, concurrent non-blocking ingest, Misc: Free for Qualified Startups.
InfoGrid
API: Java, http/REST, Protocol: as API + XPRISO, OpenID, RSS, Atom, JSON, Java embedded, Query Method: Web user interface with html, RSS, Atom, JSON output, Java native, Replication: peer-to-peer, Written in: Java, Concurrency: concurrent reads, write lock within one MeshBase, Misc: Presentation
HyperGraphDB
API: Java (and Java Langs), Written in: Java, Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, especially for AI and Semantic Web.
DEX
API: Java, .NET, C++, Blueprints Interface, Protocol: Embedded, Query Method: APIs (Java, .Net, C++) + Gremlin (via Blueprints), Written in: C++, Data Model: Labeled Directed Attributed Multigraph, Concurrency: yes, Misc: Free community edition up to 1 million nodes, Links: Intro, Tutorial
GraphBase
Sub-graph-based API, query language, tools & transactions. Embedded Java, remote-proxy Java or REST. Distributed storage & processing. Read/write all Nodes. Permissions & Constraints frameworks. Object storage, vertex-embedded agents. Supports multiple graph models. Written in Java.
Trinity
API: C#, Protocol: C# Language Binding, Query Method: Graph Navigation API, Replication: P2P with Master Node, Written in: C#, Concurrency: Yes (Transactional update in online query mode, Non-blocking read in Batch Mode), Misc: distributed in-memory storage, parallel graph computation platform (Microsoft Research Project)
AllegroGraph
API: Java, Python, Ruby, C#, Perl, Clojure, Lisp, Protocol: REST, Query Method: SPARQL and Prolog, Libraries: Social Networking Analytics & GeoSpatial, Written in: Common Lisp, Links: Learning Center, Videos
BrightstarDB
A native, .NET, semantic web database with code first Entity Framework, LINQ and OData support. API: C#, Protocol: SPARQL HTTP, C#, Query Method: LINQ, SPARQL, Written in: C#
Bigdata
API: Java, Jini service discovery, Concurrency: very high (MVCC), Written in: Java, Misc: GPL + commercial, Data: RDF data with inference, dynamic key-range sharding of indices, Misc: Blog (parallel database, high-availability architecture, immortal database with historical views)
Meronymy
RDF enterprise database management system. It is cross-platform and can be used with most programming languages. Main features: high performance, guaranteed database transactions with ACID, secure with ACLs, SPARQL & SPARUL, ODBC & JDBC drivers, RDF & RDFS.
OpenLink Virtuoso
Hybrid DBMS covering the following models: Relational, Document, Graph
VertexDB
FlockDB
by Twitter
BrightstarDB
Execom IOG
List of SPARQL implementations can be found here
Multimodel Databases
OrientDB
Languages: Java, Schema: Has features of an Object-Database, DocumentDB, GraphDB or Key-Value DB, Written in: Java, Query Method: Native and SQL, Misc: really fast, lightweight, ACID with recovery.
ArangoDB
API: REST, blueprint, different programming languages like Ruby, Python, Java, PHP, Data Model: document oriented & graphs with shapes, Protocol: HTTP using JSON, Query Method: declarative query language, query by example, map/reduce, key/value, Replication: master-master asynchronous, master-master synchronous, Written in: C/C++/JavaScript (using V8), Concurrency: MVCC, Misc: "stored procedures" (in Ruby & JavaScript), schema-free schematas (note: work in progress, this list includes features for V2 available in Autumn 2012)
AlchemyDB
GraphDB + RDBMS + KV Store + Document Store. Alchemy Database is a low-latency high-TPS NewSQL RDBMS embedded in the NoSQL datastore Redis. Extensive datastore-side scripting is provided via deeply embedded Lua.
[Mostly NOT originated out of a Web 2.0 need but worth a look for great non-relational solutions]
Object Databases
Versant
API: Languages/Protocol: Java, C#, C++, Python. Schema: language class model (easily changeable). Modes: always consistent and eventually consistent. Replication: synchronous fault tolerant and peer to peer asynchronous. Concurrency: optimistic and object based locks. Scaling: can add physical nodes on the fly for scale out/in and migrate objects between nodes without impact to application code. Misc: MapReduce via parallel SQL-like query across logical database groupings.
db4o
API: Java, C#, .Net Langs, Protocol: language, Query Method: QBE (by Example), Soda, Native Queries, LINQ (.NET), Replication: db4o2db4o & dRS to relationals, Written in: Java, Concurrency: ACID serialized, Misc: embedded lib, Links: DZone Refcard #53, Book
Objectivity
API: Languages: Java, C#, C++, Python, Smalltalk, SQL access through ODBC. Schema: native language class model, direct support for references, interoperable across all language bindings. 64 bit unique object ID (OID) supports multi exa-byte. Platforms: 32 and 64 bit Windows, Linux, Mac OSX, *Unix. Modes: always consistent (ACID). Concurrency: locks at cluster of objects (container) level. Scaling: unique distributed architecture, dynamic addition/removal of clients & servers, cloud environment ready. Replication: synchronous with quorum fault tolerant across peer to peer partitions.
Progress
Gemstone
Starcounter
API: C# (.NET languages), Schema: Native language class model, Query method: SQL, Concurrency: Fully ACID compliant, Storage: In-memory with transactions secured on disk, Reliability: Full checkpoint recovery, Misc: VMDBMS - Integrating the DBMS with the virtual machine for maximal performance and ease of use.
Perst
API: Java, Java ME, C#, Mono. Query method: OO via Perst collections, QBE, Native Queries, LINQ, native full-text search, JSQL. Replication: Async+sync (master-slave). Written in: Java, C#. Caching: Object cache (LRU, weak, strong), page pool, in-memory database. Concurrency: Pessimistic+optimistic (MVCC) + async or sync (ACID). Index types: Many tree models + Time Series. Misc: Embedded lib., encryption, automatic recovery, native full text search, on-line or off-line backup.
VelocityDB
Written in: 100% pure C#, Concurrency: ACID/transactional, pessimistic/optimistic locking, Misc: compact data, B-tree indexes, LINQ queries, 64bit object identifiers (Oid) supporting multi millions of databases and high performance. Deploy with a single DLL of around 400KB.
HSS Database
Written in: 100% C#. The HSS DB v3.0 (HighSpeed-Solutions Database) is a client-based, zero-configuration, auto schema evolution, ACID/transactional, LINQ Query DBMS for Microsoft .NET 4/4.5, Windows 8 (Windows Runtime), Windows Phone 7.5/8, Silverlight 5, MonoTouch for iPhone and Mono for Android
ZODB
API: Python, Protocol: Internal, ZEO, Query Method: Direct object access, zope.catalog, gocept.objectquery, Replication: ZEO, ZEORAID, RelStorage, Written in: Python, C, Concurrency: MVCC, License: Zope Public License (OSI approved), Misc: Used in production since 1998
Magma
Smalltalk DB, optimistic locking, Transactions, etc.
NEO
API: Python - ZODB "Storage" interface, Protocol: native, Query Method: transactional key-value, Replication: native, Written in: Python, Concurrency: MVCC (internally), License: GPL "v2 or later", Misc: Load balancing, fault tolerant, hot-extensible.
PicoLisp
Language and Object Database, can be viewed as a Database Development Framework. Schema: native language class model with relations + various indexes. Queries: language built in + a small Prolog-like DSL Pilog. Concurrency: synchronization + locks. Replication, distribution and fault tolerance are not implemented by default but can be implemented with native functionality. Written in C (32bit) or assembly (64bit).
siaqodb
An object database engine that currently runs on .NET, Mono, Silverlight, Windows Phone 7, MonoTouch, MonoAndroid, CompactFramework; it has implemented a Sync Framework Provider and can be synchronized with MS SQL Server; Query method: LINQ
Sterling
A lightweight object-oriented database for .NET with support for Silverlight and Windows Phone 7. It features in-memory keys and indexes, triggers, and support for compressing and encrypting the underlying data.
Morantex
Stores .NET classes in a datapool. Built for speed. SQL Server integration. LINQ support.
EyeDB
EyeDB is an LGPL OODBMS, provides an advanced object model (inheritance, collections, arrays, methods, triggers, constraints, reflexivity), an object definition language based on ODMG ODL, an object query and manipulation language based on ODMG OQL. Programming interfaces for C++ and Java.
FramerD
Object-Oriented Database designed to support the maintenance and sharing of knowledge bases. Optimized for pointer-intensive data structures used by semantic networks, frame systems, and many intelligent agent applications. Written in: ANSI C.
Ninja Database Pro
Ninja Database Pro is a .NET ACID compliant relational object database that supports transactions, indexes, encryption, and compression. It currently runs on .NET Desktop Applications, Silverlight Applications, and Windows Phone 7 Applications.
NDatabase
C# & embedded
Grid & Cloud Database Solutions
GigaSpaces
Popular SpaceBased Grid Solution.
Infinispan
Scalable, highly available data grid platform, open source, written in Java.
Queplix
NOSQL Data Integration Environment, can integrate relational, object, BigData – NOSQL easily and without any SQL.
Hazelcast
P2P Data Grid Solution on java.util.*, On a 100 Node EC2 Cluster
XML Databases
EMC Documentum xDB
(commercial system) API: Java, XQuery, Protocol: WebDAV, web services, Query method: XQuery, XPath, XPointer, Replication: lazy primary copy replication (master/replicas), Written in: Java, Concurrency: concurrent reads, writes with lock; transaction isolation, Misc: Fully transactional persistent DOM; versioning; multiple index types; metadata and non-XML data support; unlimited horizontal scaling. Developer Network »eXist
API: XQuery, XML:DB API, DOM, SAX, Protocols: HTTP/REST, WebDAV, SOAP, XML-RPC, Atom, Query Method: XQuery, Written in: Java (open source), Concurrency: Concurrent reads, lock on write; Misc: Entire web applications can be written in XQuery, using XSLT, XHTML, CSS, and Javascript (for AJAX functionality). (1.4) adds a new full text search index based on Apache Lucene, a lightweight URL rewriting and MVC framework, and support for XProc.Sedna
Misc: ACID transactions, security, indices, hot backup. Flexible XML processing facilities include a W3C XQuery implementation, tight integration of XQuery with full-text search facilities, and a node-level update language.
BaseX
BaseX is a fast, powerful, lightweight XML database system and XPath/XQuery processor with highly conformant support for the latest W3C Update and Full Text Recommendations. Client/server architecture, ACID transaction support, user management, logging, open source (BSD license), written in Java, runs out of the box.
Qizx
Commercial and open-source versions, API: Java, Protocols: HTTP, REST, Query Method: XQuery, XQuery Full-Text, XQuery Update, Written in: Java (full source can be purchased), Concurrency: concurrent reads & writes, isolation, Misc: terabyte scalable, emphasizes query speed.
Berkeley DB XML
API: Many languages, Written in: C++, Query Method: XQuery, Replication: Master / Slave, Concurrency: MVCC, License: Sleepycat
Multidimensional Databases
Globals:
By InterSystems; multidimensional arrays. Node.js API, array-based APIs (Java/.NET), and a Java-based document API.
InterSystems Caché
Post-relational system. Multidimensional array APIs, object APIs, and relational support (fully SQL-capable: JDBC, ODBC, etc.); document APIs are new in the upcoming 2012.2.x versions. Available for Windows, Linux and OpenVMS.
GT.M
API: M, C, Python, Perl, Protocol: native, in-process C, Misc: wrappers: M/DB for SimpleDB-compatible HTTP, MDB:X for XML, PIP for mapping to tables for SQL, Features: small footprint (17MB), terabyte scalability, Unicode support, database encryption, secure, ACID transactions (single node), eventual consistency (replication), License: AGPL v3 on x86 GNU/Linux.
SciDB
Array data model for scientists.
rasdaman
Rasdaman is a scientific database that stores and retrieves multidimensional raster data (arrays) of unlimited size through an SQL-style query language. API: C++/Java, Written in: C++, Query method: SQL-like query language rasql, as well as via the OGC standards WCPS, WCS, WPS.
Multivalue Databases
U2
(UniVerse, UniData): MultiValue Databases, Data Structure: MultiValued, Supports nested entities, Virtual Metadata, API: BASIC, InterCall, Socket, .NET and Java APIs, IDE: Native, Record Oriented, Scalability: automatic table space allocation, Protocol: Client Server, SOA, Terminal Line, X-OFF/X-ON, Written in: C, Query Method: Native mvQuery (Retrieve/UniQuery) and SQL, Replication: yes, Hot standby, Concurrency: Record and File Locking (Fine and Coarse Granularity)
OpenInsight
API: Basic+, .Net, COM, Socket, ODBC; Protocol: TCP/IP, Named Pipes, Telnet, VT100, HTTP/S; Query Method: RList, SQL & XPath; Written in: Native 4GL, C, C++, Basic+, .Net, Java; Replication: Hot Standby; Concurrency: table and/or row locking, optionally transaction-based with commit and rollback; Data structure: relational and/or MultiValue, supports nested entities; Scalability: rows and tables size dynamically.
Reality
(Northgate IS): The original MultiValue data set database, virtual machine, enquiry and rapid development environment. Delivers ultra efficiency, scalability and resilience, and is extended for the web with built-in auto sizing, failsafe and more. Interoperability includes Web Services, Java Classes, XML, ActiveX, Sockets, C and, for those that have to interoperate with the SQL world, ODBC/JDBC and two-way transparent SQL data access.
OpenQM
Supports nested data. Fully automated table space allocation. Concurrency control via task locks, file locks & shareable/exclusive record locks. Case insensitivity option. Secondary key indices. Integrated data replication. QMBasic programming language for rapid development. OO programming integrated into QMBasic. QMClient connectivity from Visual Basic, PowerBasic, Delphi, PureBasic, ASP, PHP, C and more. Extended multivalue query language.
ESENT
(by Microsoft) ISAM storage technology. Access using index or cursor navigation. Denormalized schemas, wide tables with sparse columns, multi-valued columns, and sparse and rich indexes. C# and Delphi drivers available. Backend for a number of Microsoft products, such as Exchange.
jBASE
Event Sourcing
Event Store
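Event Store is listed without a description. The idea behind event sourcing is that state is never overwritten in place: every change is appended to a log as an immutable event, and the current state is rebuilt by replaying the events of a stream. The following is a minimal in-memory sketch of that idea, assuming a toy bank-account domain; the names `Event`, `InMemoryEventStore`, and `current_balance` are hypothetical and are not Event Store's actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical immutable event: a fact that happened to one stream."""
    stream: str   # e.g. "account-42"
    kind: str     # "Deposited" or "Withdrawn" in this toy example
    amount: int


@dataclass
class InMemoryEventStore:
    """Toy append-only log: events are only ever appended, never updated or deleted."""
    _log: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self._log.append(event)
        return len(self._log) - 1  # position of the event in the global log

    def read_stream(self, stream: str) -> List[Event]:
        # Events of one stream, in the order they were appended.
        return [e for e in self._log if e.stream == stream]


def current_balance(store: InMemoryEventStore, stream: str) -> int:
    """Rebuild current state by replaying the stream from the beginning."""
    balance = 0
    for e in store.read_stream(stream):
        if e.kind == "Deposited":
            balance += e.amount
        elif e.kind == "Withdrawn":
            balance -= e.amount
    return balance


store = InMemoryEventStore()
store.append(Event("account-42", "Deposited", 100))
store.append(Event("account-42", "Withdrawn", 30))
store.append(Event("account-7", "Deposited", 5))
print(current_balance(store, "account-42"))  # 70
```

Because the log keeps every event, earlier states can be reconstructed by replaying only a prefix of a stream; a real event store adds durable storage, per-stream ordering guarantees, and concurrency checks on append.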
other NoSQL related databases
IBM Lotus/Domino
Type: Document Store, API: Java, HTTP, IIOP, C API, REST Web Services, DXL, Languages: Java, JavaScript, LotusScript, C, @Formulas, Protocol: HTTP, NRPC, Replication: Master/Master, Written in: C, Concurrency: Eventually Consistent, Scaling: Replication Clusters
eXtremeDB
Type: In-Memory Database; Written in: C; API: C/C++, SQL, JNI, C#(.NET), JDBC; Replication: Async+sync (master-slave), Cluster; Scalability: 64-bit and MVCC
RDM Embedded
APIs: C++, Navigational C. Embedded solution that is ACID-compliant, with multi-core, on-disk and in-memory support. Distributed capabilities, hot online backup, supports all main platforms. Supports B-tree and hash indexing. Replication: Master/Slave, Concurrency: MVCC. Client/Server: In-process/Built-in.
ISIS Family
(semistructured databases)
VaultDB
Next-gen NoSQL encrypted document store. Multi-recipient / group encryption. Features: concurrency, indices, ACID transactions, replication and PKI management. Supports PHP and many others. Written in C++. Commercial but has a free version. API: JSON
Prevayler
Java RAM data structure journalling.
Yserial
Python wrapper over sqlite3
unresolved and uncategorized
VMware vFabric GemFire
(coming soon!)
Btrieve
(by Pervasive Software) key/index/tuple DB, using pages.
KirbyBase
Written in: Ruby.
Tokutek:
Recutils:
GNU tool for text files containing records and fields.
FileDB:
Mainly targeted at Silverlight/Windows Phone developers, but also well suited to any .NET application that needs a simple local database. Extremely lightweight (less than 50K), stores one table per file including its index, compiled versions for Windows Phone 7, Silverlight and .NET, fast, free to use in your applications.
CodernityDB
Written in Python.
Twisted Storage, Java-Chronicle, Ringo, Sherpa, tin, Dryad, SkyNet, Disco
Possibly the oldest NoSQL DB (together with MUMPS): Adabas. VSAM by IBM is also a good candidate.