Java, NoSQL, SQL, REST API and other scary words

AngularJS – implement Emoji

Today I’ll show you how to implement custom emoji (in fact, how to replace any text with any image) with AngularJS. Fair warning: this can kill your browser, because it is a pretty heavy operation, and for a chat application that refreshes every 3 seconds this kind of solution is a real disaster.

So, the first step is to implement a custom binding for your text; you can easily do it this way:


As you can see, I’m using the emoji function here, and here is my code listing:

var emoticons = {
      ':)': getUrlToMessages + 'img_smile.png',
      ':(': getUrlToMessages + 'img_sad.png',
      ':D': getUrlToMessages + 'img_haha.png',
      ':o': getUrlToMessages + 'img_omg.png'
    },
    patterns = [],
    metachars = /[[\]{}()*+?.\\|^$\-,&#\s]/g;

$scope.emoji = function (message) {
  if (message != null) {
    // build a regex pattern for each defined emoticon
    for (var i in emoticons) {
      if (emoticons.hasOwnProperty(i)) { // escape metacharacters
        patterns.push('(' + i.replace(metachars, "\\$&") + ')');
      }
    }
    // build the regular expression and replace each emoticon with an <img> tag
    var html = message.replace(new RegExp(patterns.join('|'), 'g'), function (match) {
      return typeof emoticons[match] != 'undefined'
        ? '<img src="' + emoticons[match] + '" />'
        : match;
    });
    return $sce.trustAsHtml(html);
  }
};

Since I don’t want to use any custom images, I’ll just use decimal-code emoji. This is not so straightforward, because Angular’s $sanitize service attempts to convert the characters to their HTML-entity equivalents. To keep that HTML from going through $sanitize, pass your string through $sce.trustAsHtml:

$scope.emoji = function (message) {
  return $sce.trustAsHtml(message);
};



Cassandra Datastax and Java – best way to set up connection

I’ll research the best way to make a connection from my Java code to Cassandra here. There are a lot of examples of how to do that, but the main thing is that I’m developing a kind of chat application on my localhost (single insert/update statements, etc.), while all these Spark examples are perfect for analytical workloads.

The first example uses Spark 1.6:

public static JavaSparkContext getCassandraConnector() {
    SparkConf conf = new SparkConf();
    conf.set("spark.driver.allowMultipleContexts", "true");
    conf.set("", ""); // connection settings left blank in the original

    JavaSparkContext sc = new JavaSparkContext(conf);
    return sc;
}

I also got an example for Spark 2.x, where the builder automatically reuses an existing SparkContext if one exists and creates one if it does not. Configuration options set in the builder are automatically propagated to Spark and Hadoop during I/O. Continue reading
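As a sketch of that Spark 2.x style (the app name, master URL, and Cassandra host are illustrative; it assumes Spark 2.x and the spark-cassandra-connector on the classpath):

```java
import org.apache.spark.sql.SparkSession;

public class CassandraSession {
    public static SparkSession getSession() {
        // builder() reuses an existing SparkContext if one is already running,
        // and creates a new one otherwise
        return SparkSession.builder()
                .appName("chat-app")    // illustrative name
                .master("local[*]")     // local development, as in the post
                // connector setting; host is an assumption for localhost setups
                .config("spark.cassandra.connection.host", "127.0.0.1")
                .getOrCreate();
    }
}
```

This avoids the `spark.driver.allowMultipleContexts` workaround from the 1.6 snippet entirely, since the builder manages the single context for you.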


Spark Dataset API implementation


Spark introduced Dataframes in the Spark 1.3 release. Dataframes overcome the key challenges that RDDs had.

A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or an R/Python Dataframe. Along with Dataframes, Spark also introduced the Catalyst optimizer, which leverages advanced programming features to build an extensible query optimizer.

Dataframe Features

  • Distributed collection of Row objects: A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database, but with richer optimizations under the hood.
  • Data processing: Processes structured and unstructured data formats (Avro, CSV, Elasticsearch, Cassandra) and storage systems (HDFS, Hive tables, MySQL, etc.). It can read from and write to all these data sources.
  • Optimization using the Catalyst optimizer: Catalyst powers both SQL queries and the DataFrame API. DataFrames use the Catalyst tree-transformation framework in four phases: 1. analyzing a logical plan to resolve references, 2. logical plan optimization, 3. physical planning, 4. code generation to compile parts of the query to Java bytecode.
  • Hive compatibility: Using Spark SQL, you can run unmodified Hive queries on your existing Hive warehouses. It reuses the Hive frontend and metastore and gives you full compatibility with existing Hive data, queries, and UDFs.
  • Tungsten: Tungsten provides a physical execution backend which explicitly manages memory and dynamically generates bytecode for expression evaluation.
  • Programming languages supported: The DataFrame API is available in Java, Scala, Python, and R.
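To make the named-column idea concrete, here is a minimal sketch of the DataFrame API in Java (the file name `people.csv` and the column names are my own assumptions; it requires Spark 2.x on the classpath):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DataFrameDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dataframe-demo")
                .master("local[*]")
                .getOrCreate();

        // read a CSV file into a DataFrame; header and schema inference
        // give us named, typed columns
        Dataset<Row> people = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("people.csv"); // hypothetical input file

        // operations on named columns are optimized by Catalyst before execution
        people.filter("age > 21")
              .groupBy("city")
              .count()
              .show();

        spark.stop();
    }
}
```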

Continue reading


Modification of the SQL ER model to NoSQL Cassandra

Original MySQL ER-model:


There are no difficulties in implementing the account-management and security cases based on the UI architecture – there is no need to join tables there.

The first problem appeared on the attempt to implement friendship data and messages – we need to join the users_link and users tables for friendship data, and we need to join messages with that friendship data to filter the messages to display.

There are some shortcuts which will allow us to solve these problems. Continue reading


Spark and Data Formats. Introduction

This is a pretty short compilation about data formats; I will ignore JSON, XML, and a lot of other formats here.

From my point of view, this presentation is very important for understanding the difference between formats (I used a lot of other data sources as well, just in case):


The CSV (“Comma-Separated Values”) file format is often used to exchange data between dissimilar applications. The CSV format:

  • Each record is one line – the line separator may be LF (0x0A) or CRLF (0x0D0A); a line separator may also be embedded in the data (making a record span more than one line, which is still acceptable).
  • Fields are separated with commas.
  • Leading and trailing whitespace is ignored – unless the field is delimited with double-quotes, in which case the whitespace is preserved.
  • Embedded commas – the field must be delimited with double-quotes.
  • Embedded double-quotes – embedded double-quote characters must be doubled, and the field must be delimited with double-quotes.
  • Embedded line breaks – fields must be surrounded by double-quotes.
  • Always delimiting – fields may always be delimited with double-quotes; the delimiters will be parsed and discarded by the reading application.
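The quoting rules above can be sketched as a small helper (a minimal sketch; the class and method names are my own):

```java
public class CsvEscape {
    // Quote a field per the CSV rules above: fields containing commas,
    // double-quotes, or line breaks must be delimited with double-quotes,
    // and embedded double-quotes must be doubled.
    public static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));      // plain
        System.out.println(escape("a,b"));        // "a,b"
        System.out.println(escape("say \"hi\"")); // "say ""hi"""
    }
}
```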

Example: Continue reading


Spark Introduction. RDD

Once again, the key thing about queries in Cassandra (see my previous article about it) – there is no way to do joins there, so you should be very careful with the database model. Anyway, if there is a strong need to perform relational queries over data stored in Cassandra clusters – use Spark.

Let’s start with short introduction about Spark, it is:

  • A lightning-fast cluster computing technology, designed for fast computation,
  • A unified relational query language for traversing Spark Resilient Distributed Datasets (RDDs),
  • Support for a variant of the query language used in relational databases,
  • Not a database and query language of its own – Spark brings a query language to other databases (in our case, Cassandra). You can execute Spark queries in Java applications that traverse Cassandra column families.
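To show what an RDD looks like in practice, here is a minimal sketch in Java (names are illustrative; it assumes Spark core on the classpath and runs locally, without Cassandra):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddIntro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("rdd-intro")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // an RDD is a distributed, immutable collection; transformations
        // (filter, map) are lazy and only execute when an action (count) fires
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
        long evens = numbers.filter(n -> n % 2 == 0).count(); // 2 and 4

        System.out.println(evens);
        sc.close();
    }
}
```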

One pretty nice article about Spark is here:

Spark and Hadoop difference


What to add? Continue reading


Cassandra. Introduction

NoSQL is generally faster than SQL for this kind of access pattern. This isn’t surprising: NoSQL’s simpler denormalized store allows you to retrieve all information about a specific item in a single request. There’s no need for JOINs or complex SQL queries.

That said, your project design and data requirements will have the most impact. A well-designed SQL database will almost certainly perform better than a badly designed NoSQL equivalent, and vice versa.

I decided to start with Cassandra because, from my point of view, it is the most trending NoSQL database today.

Cassandra is good for:

  • Simple setup and maintenance
  • Fast random reads/writes
  • Flexible parsing / wide-column requirements
  • No need for multiple secondary indexes

Not good for:

  • Secondary indexes
  • Relational data
  • Transactional operations (rollback, commit)
  • Primary & financial records
  • Data requiring stringent security and authorization
  • Dynamic queries/searching on column data
  • Low latency

Key usage case: Twitter Continue reading