====== Learnexa Storage Solution ======

{{:storage_learnexa.jpg|}}

===== Current Production Setup =====

We currently run a single-node GlusterFS setup. It is capped at 1 Gbps of network bandwidth and cannot scale unless we add more nodes or move to a better storage solution.

===== Requirement(s) =====

  - High performance (read and write I/O)
  - Distributed (files stored across storage nodes)
  - Fault-tolerant (clients should survive a storage node failure)

==== Known Technical Limitations ====

  * Not every distributed file system (DFS) offers direct parallel access to both the filesystem and the network. Even when one does, the storage solution is not easy to manage or maintain.

==== Solutions Available ====

  * GlusterFS
  * BeeGFS
  * MapR-FS

A solution that can be considered for future storage needs is Amazon EFS. Amazon EFS was still in preview when this page was written.

===== GlusterFS (Gluster File System) =====

GlusterFS seems to be a good overall solution, for the following reasons:

  - Fault tolerance
    * Available when it is configured as Distributed-Replicated (similar to RAID 10), which requires at least 4 nodes.
  - Scales up better
    * It is better than a traditional NFS solution because it uses each server's internal storage.
  - Self-healing capabilities
    * Allows automatic rebalancing of data once a failed node recovers.
  - Works with any common NFS version 3 client.
  - Easy to manage.

What does GlusterFS lack?

  - Even though GlusterFS is a parallel filesystem, Gluster clients cannot talk to all the servers in parallel at once, which adds a network bottleneck.
  - GlusterFS is based on technology similar to rsync over NFS, so it depends on the underlying translator, which auto-balances data across nodes.
  - GlusterFS depends entirely on how the underlying filesystem is configured.
  - Though it is easy to manage, it is difficult to monitor client-side failures.
  - Performance is lower than BeeGFS and MapR-FS.

For more FAQs, please visit: http://www.gluster.org/community/documentation/index.php/GlusterFS_Technical_FAQ

===== BeeGFS/FhGFS Parallel File System =====

BeeGFS/FhGFS seems to be a good solution for performance, for the following reasons:

  - Network and storage performance
    * Both storage and network performance scale up as new nodes are added.
  - Requires a special client module, which is easy to configure.
  - Performance is similar to MapR-FS.
  - Easy to manage, with better monitoring capabilities and excellent administrative tools.

What does BeeGFS lack?

  - It has no automatic self-healing and no proper fault tolerance.

===== MapR-FS (HDFS - Hadoop Distributed File System) =====

MapR-FS seems to be a good overall solution for performance and storage, for the following reasons:

  - Network and storage performance
    * Both storage and network performance scale up as new nodes are added.
  - Performance is similar to BeeGFS.
  - Fault-tolerant, and allows replicated sets to be spread across what are called data nodes.

What does MapR-FS lack?

  - License-based: only one node can be added under the community version.
  - No proper mounting capability.
  - Does not show disk utilization over NFS.
  - Difficult to manage: it behaves more like a Tomcat application and therefore needs a Java runtime environment.

==== Final Solution AWS S3 ====

We tested S3 performance on AWS and found it to be efficient and highly available. Here are some statistics:

  1073741824 bytes (1.1 GB) copied, 9.97808 s, 108 MB/s
  4294967296 bytes (4.3 GB) copied, 37.887 s, 113 MB/s

These transfers come close to Gigabit line rate on S3, similar to what we saw with HDFS, and S3 is not limited by the bandwidth of our own storage server.

Note: performance depends on the instance type used. The test above was run on a c1.xlarge machine.
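To make the comparison with a Gigabit link explicit, the two transfer figures above can be converted into a fraction of line rate. The following is only a rough sketch: the helper names ''throughput_mb_s'' and ''link_fraction'' are made up for illustration, and it assumes a nominal 1 Gbit/s link with protocol overhead ignored.

```python
# Byte counts and durations copied from the transfer statistics above.
# The tool reports MB as 10**6 bytes, so the same convention is used here.
RUNS = [
    (1073741824, 9.97808),  # 1 GiB test file
    (4294967296, 37.887),   # 4 GiB test file
]

# Assumption: nominal 1 Gbit/s link, protocol overhead ignored.
GIGABIT_BPS = 1_000_000_000


def throughput_mb_s(num_bytes, seconds):
    """Average throughput in MB/s (1 MB = 10**6 bytes)."""
    return num_bytes / seconds / 1e6


def link_fraction(num_bytes, seconds, link_bps=GIGABIT_BPS):
    """Share of the link's nominal bit rate used by the transfer."""
    return (num_bytes * 8 / seconds) / link_bps


if __name__ == "__main__":
    for num_bytes, seconds in RUNS:
        mb_s = throughput_mb_s(num_bytes, seconds)
        share = link_fraction(num_bytes, seconds)
        print(f"{num_bytes} bytes in {seconds} s: {mb_s:.1f} MB/s, {share:.0%} of 1 Gbit/s")
```

Under these assumptions the two runs work out to about 86% and 91% of a Gigabit link, which is in line with the usable payload rate one would expect once protocol overhead is taken into account.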
====== Suggested Solution (Obsolete) ======

  * We would have to go for a mixed solution, since the application software usually keeps 2 copies of media files.
  * We can place application data (including recording and streaming) on GlusterFS, while BeeGFS handles a second copy of the recording and streaming data.
  * Otherwise, we can run multiple GlusterFS nodes in the short term until Amazon EFS is ready for production use. EFS is our ultimate solution and includes an auto-burst rate.