NoSQL Weekly 278
Articles, Tutorials and Talks

A tale of troubleshooting database performance, with Cassandra and sysdig
In this article, I'll walk you through a Cassandra performance issue we recently dealt with: how we spotted the problem, what troubleshooting we did to understand it better, and how we eventually solved it.

RDBMS & Graphs: Drivers for Connecting to a Graph Database
This week, we'll discuss the language drivers and APIs specific to Neo4j, with plenty of resources for further exploration. If you are curious about other, non-Neo4j graph databases, we encourage you to explore the drivers available within their respective communities.

DynamoDB Design Patterns and Best Practices
In this talk, we'll walk you through common NoSQL design patterns for a variety of applications to help you learn how to design a schema and how to store and retrieve data with DynamoDB. We will also discuss best practices for using DynamoDB to develop IoT, AdTech, and gaming apps.

Redis Performance Monitoring with the ELK Stack
This post looks at how you can monitor Redis performance using the ELK Stack to ship, analyze, and visualize the data.

Analyze a Time Series in Real Time with AWS Lambda, Amazon Kinesis and Amazon DynamoDB Streams
This post explains how to perform time-series analysis on a stream of Amazon Kinesis records, without the need for any servers or clusters, using AWS Lambda, Amazon Kinesis Streams, Amazon DynamoDB and Amazon CloudWatch. We demonstrate how to do time-series analysis on live web analytics events stored in Amazon Kinesis Streams and present the results in near real time for use cases like live key performance indicators, ad-hoc analytics, and quality assurance, as used in our AWS-based data science and analytics RAVEN (Reporting, Analytics, Visualization, Experimental, Networks) platform at JustGiving.
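As a rough illustration of the serverless time-series idea in the Kinesis item above (not JustGiving's RAVEN code), a Python Lambda handler can decode the base64-encoded Kinesis record payloads and bucket events into per-minute counts. The JSON payload shape with `timestamp` (epoch seconds) and `event_type` fields is a hypothetical schema chosen for this sketch. Persisting the counts to DynamoDB is only noted in a comment, not implemented.

```python
import base64
import json
from collections import Counter
from datetime import datetime, timezone


def minute_bucket(epoch_seconds):
    """Truncate a Unix timestamp to the start of its minute, as a UTC ISO 8601 string."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.replace(second=0, microsecond=0).isoformat()


def lambda_handler(event, context):
    """Count events per (minute, event_type) across one batch of Kinesis records.

    Assumes (hypothetically) that each record's data is base64-encoded JSON
    shaped like {"timestamp": <epoch seconds>, "event_type": "<name>"}.
    """
    counts = Counter()
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded under kinesis.data.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        key = (minute_bucket(payload["timestamp"]), payload["event_type"])
        counts[key] += 1
    # A real pipeline might persist these counts (for example as atomic counter
    # updates in a DynamoDB table); this sketch just returns them.
    return {f"{minute}|{etype}": n for (minute, etype), n in sorted(counts.items())}


if __name__ == "__main__":
    def fake_record(obj):
        """Build a Kinesis-style record with a base64-encoded JSON payload."""
        data = base64.b64encode(json.dumps(obj).encode()).decode()
        return {"kinesis": {"data": data}}

    sample = {"Records": [
        fake_record({"timestamp": 1700000000, "event_type": "page_view"}),
        fake_record({"timestamp": 1700000030, "event_type": "page_view"}),
        fake_record({"timestamp": 1700000090, "event_type": "donation"}),
    ]}
    print(lambda_handler(sample, None))
```

Because Lambda processes a Kinesis stream in batches, each invocation only sees part of a minute's events. That is why per-batch counts like these would typically be merged into a shared store rather than kept in memory.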
Churn Prediction with PySpark using MLlib and ML Packages
Churn prediction is heavily data-driven and often uses advanced machine learning techniques. In this post, we'll take a look at what types of customer data are typically used, do some preliminary analysis of the data, and generate churn prediction models, all with PySpark and its machine learning frameworks.

SASIIndex
SASIIndex, or "SASI" for short, is an implementation of Cassandra's Index interface that can be used as an alternative to the existing implementations. This post goes on to describe how to get up and running with SASI, demonstrates usage with examples, and provides some details on its implementation.

Building an Async Networking Layer for mongos
How to design an HBase data model for recommendations
Getting Started with Couchbase and Spark on Apache Zeppelin
Strong consistency in Manhattan
Financial Markets are Graphs
Using Neo4j to Take Us to the Stars

Books

Spark: Big Data Cluster Computing in Production
"Spark: Big Data Cluster Computing in Production" goes beyond general Spark overviews to provide targeted guidance on using lightning-fast big data clustering in production. Written by an expert team well known in the big data community, this book walks you through the challenges of moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, database connectors, streaming, security, and much more.

Interesting Projects, Tools and Libraries

Chronicle-Map
Replicate your key-value store across your network, with consistency, persistence and performance.

mailcap
A mail capture and archival server for RethinkDB.

DalmatinerDB
DalmatinerDB is a metric database written in pure Erlang. It takes advantage of some special properties of metrics to make some tradeoffs. The goal is to make a store for metric data (time, value of a metric) that is fast, has a low overhead, and is easy to query and manage.

HDocDB
HDocDB is a client layer for using HBase as a store for JSON documents. It implements many of the interfaces in the OJAI framework.

Upcoming Events and Webinars

Webinar: Continuous Applications: Spark, Kafka, Beam, and Beyond
After this webcast, you'll be able to:
Understand streaming concepts such as watermarks and triggers, and how they affect system requirements and design
Understand the complexities of continuous application architectures and the design trade-offs involved
Develop complex big data applications beyond simple aggregations with simplicity, completeness and changing semantics

Webinar: Low-latency ingestion and analytics with Apache Kafka and Apache Apex (Hadoop)
This talk will cover fully fault-tolerant, scalable, and operational ingestion from Kafka using an Apache Apex application running natively in Hadoop. It will dive deep into the technical details of the connectors in Apache Malhar and will also discuss production use cases.

Webinar: Data Modeling, Data Querying, and NoSQL: A Deep Dive
Attend and learn:
How a flexible data model and a structured query language simplify development
Best practices for data modeling: when to embed data and when to reference it
How to do SQL-like queries on semi-structured data

Webinar: MongoDB and Analytics: Building Solutions with the MongoDB BI Connector
In this webinar, we will cover the architecture needed to use the BI Connector with MongoDB. We will also demonstrate how to build reports with your data.