LinkedIn has open sourced Dr. Elephant, a powerful tool that helps users of Hadoop and Spark understand, analyze, and improve the performance of their flows.
Introducing DataStax Enterprise Graph
DataStax has announced a new component of DataStax Enterprise (DSE): DSE Graph, a scale-out graph database for managing complex and highly connected data. DSE Graph is a critical part of the largest upcoming release in the company's history, which includes new versions of its server, management/monitoring, and development tools.
Articles, Tutorials and Talks
Analyzing the Panama Papers with Neo4j: Data Models, Queries & More
In this post, we look at the graph data model used by the ICIJ and show how to construct it using Cypher in Neo4j. We dissect an example from the leaked data, recreating it using Cypher, and show how the model could be extended.
Data Modeling in Cassandra from a Postgres Perspective
How easy is it to model my schema in Cassandra if all I know is Postgres? Cassandra has its own SQL-like dialect, Cassandra Query Language (CQL), that mirrors many of the semantics of SQL, but the similarity stops there. You'll need to know how SQL and CQL differ and how to model the data properly in Cassandra.
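To make the difference concrete, here is a minimal sketch of query-first modeling, which is commonly described as the main shift when moving from a relational schema to Cassandra: one table per query, with the same data written to each. It is a toy in-memory stand-in, not CQL and not a Cassandra driver; `QueryTable`, `save_post`, and the table names are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class QueryTable:
    """Toy stand-in for a Cassandra table (hypothetical, in-memory only).

    Rows are grouped by partition key and kept ordered by clustering key
    inside each partition, which is roughly what a table declared with
    PRIMARY KEY ((partition_key), clustering_key) gives you.
    """

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        rows = self._partitions[partition_key]
        rows.append((clustering_key, row))
        # Keep each partition sorted by clustering key.
        rows.sort(key=lambda pair: pair[0])

    def read(self, partition_key, limit=None):
        # A read targets exactly one partition; there are no joins.
        rows = self._partitions.get(partition_key, [])
        return [row for _, row in rows[:limit]]


# One table per query, instead of one normalized table plus joins.
posts_by_author = QueryTable()
posts_by_tag = QueryTable()


def save_post(author, tag, posted_at, title):
    """Write the same post to every table that answers a query (denormalization)."""
    row = {"author": author, "tag": tag, "posted_at": posted_at, "title": title}
    posts_by_author.insert(author, posted_at, row)
    posts_by_tag.insert(tag, posted_at, row)
```

In Postgres the same data would more likely live in one normalized `posts` table queried with different `WHERE` clauses or joins; here the write path does the extra work so that each read touches a single partition.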
Rethinking Streaming Analytics for Scale
Helena Edelson addresses new architectures emerging for large-scale streaming analytics, based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK), as well as other streaming analytics platforms and frameworks using Apache Flink or GearPump. Edelson discusses the problem domain and what is needed in terms of strategies, architecture, application design and code to begin leveraging simpler data flows.
Apache Showdown: Flink vs. Spark
In this post, we will describe the evaluation and decision process, and show why Apache Flink best fulfilled our requirements, as opposed to Spark.
ArangoDB: Polyglot Persistence Without Cost
Hello World, Kafka Connect + Kafka Streams
Managing Data Storage with Blockchain and BigchainDB
Code-generating away the boilerplate in our migration back to SpiderMonkey
Hadoop Real World Solutions Cookbook - Second Edition
Over 100 hands-on recipes to help you learn and master the intricacies of Apache Hadoop 2.X, YARN, Hive, Pig, Oozie, Flume, Sqoop, Apache Spark, and Mahout.
Interesting Projects, Tools and Libraries
Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark. It automatically gathers all the metrics, runs analysis on them, and presents them in a simple way for easy consumption. Its goal is to improve developer productivity and increase cluster efficiency by making it easier to tune the jobs.
Docussandra marries MongoDB's simple data storage model with the horizontal scaling of Cassandra. It enables developers to store arbitrary payloads as BSON, as they would with MongoDB, in a Cassandra cluster. It supports indexing, filtering, sorting, querying and pagination (via familiar limit and offset semantics), all at blazing speed. Simple JSON document storage with effortless scaling, exposed as a service: that's Docussandra!
TiKV is a distributed key-value database modeled mainly on the designs of Google Spanner and HBase, but much simpler: it does not depend on any distributed file system. We've implemented the Raft consensus algorithm in Rust and store the consensus state in RocksDB. TiKV not only guarantees data consistency but also uses a placement driver to handle sharding (split and merge) and data migration automatically. The transaction model is similar to Google's Percolator, with some performance improvements. We provide snapshot isolation (SI) and serializable snapshot isolation (SSI), as well as externally consistent reads and writes in distributed transactions.
Application examples (quick starters) that showcase Redis's simple and advanced features.
Interactive and Reactive Data Science using Scala and Spark.
redis-migrate-tool is a convenient tool for migrating data between Redis deployments.