Java, NoSQL, SQL, REST API and other scary words

Cassandra Datastax and Java – best way to set up connection

Here I research the best way to connect to Cassandra from my Java code. There are a lot of examples of how to do that, but the catch is that I’m developing a kind of chat application on my localhost (single insert/update statements, etc.), whereas all the Spark examples are best suited to analytical workloads.

The first example is for Spark 1.6:

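import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public static JavaSparkContext getCassandraConnector(){
    SparkConf conf = new SparkConf();
    conf.setAppName("Chat");
    conf.set("spark.driver.allowMultipleContexts", "true");
    // Cassandra node used by the Spark Cassandra connector
    conf.set("spark.cassandra.connection.host", "127.0.0.1");
    conf.set("spark.rpc.netty.dispatcher.numThreads","2");
    // Run Spark locally with two worker threads
    conf.setMaster("local[2]");

    JavaSparkContext sc = new JavaSparkContext(conf);
    return sc;
}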

I also found an example for Spark 2.x, where the builder automatically reuses an existing SparkContext if there is one and creates a new one otherwise. Configuration options set in the builder are automatically propagated to Spark and Hadoop during I/O.

import org.apache.spark.sql.SparkSession;

public static SparkSession getSparkSession(){
    SparkSession sparkSession = SparkSession
        .builder()
        .appName("Chat")
        .config("spark.driver.allowMultipleContexts","true")
        .config("spark.sql.warehouse.dir", "/file:C:/temp")
        .config("spark.cassandra.connection.host", "127.0.0.1")
        .config("spark.cassandra.connection.port", "9042")
        .master("local[2]")
        .getOrCreate();
    return sparkSession;
}

I also looked into pooling options (an example for the Session):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public static Session getPoolSession(){
    PoolingOptions poolingOptions = new PoolingOptions();
    poolingOptions
        .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
        .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
        .setMaxRequestsPerConnection(HostDistance.LOCAL, 32768)
        .setMaxRequestsPerConnection(HostDistance.REMOTE, 2000)
        .setHeartbeatIntervalSeconds(120);

    Cluster cluster = Cluster.builder()
        .addContactPoints("127.0.0.1")
        .withPoolingOptions(poolingOptions)
        .build();

    // Connect to the "chat" keyspace
    Session session = cluster.connect("chat");
    return session;
}
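To get a feel for what these numbers mean: the ceiling on simultaneous in-flight requests to one host is roughly the maximum number of connections per host multiplied by the maximum number of requests per connection. Here is a minimal sketch of that arithmetic, using the LOCAL values from the example above. `PoolCapacity` and `maxInFlightPerHost` are hypothetical names made up for illustration; they are not part of the driver.

```java
class PoolCapacity {
    // Upper bound on concurrent in-flight requests to a single host:
    // each connection can carry up to maxRequestsPerConnection requests at once.
    static long maxInFlightPerHost(int maxConnectionsPerHost, int maxRequestsPerConnection) {
        return (long) maxConnectionsPerHost * maxRequestsPerConnection;
    }

    public static void main(String[] args) {
        // LOCAL values from the pooling example: 10 connections, 32768 requests each.
        System.out.println(maxInFlightPerHost(10, 32768)); // prints 327680
    }
}
```

For a chat application running on localhost, that is very likely far more headroom than I will ever need.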

I wondered what the most efficient way to make a connection is. In my case, the best way is to use the regular DataStax Java driver (because, once again, I have no analytical workloads):

http://docs.datastax.com/en/developer/java-driver-dse/1.1/

and

https://www.datastax.com/dev/blog/datastax-enterprise-java-driver-1-0-0-released

So, here is the code for my localhost:

import com.datastax.driver.core.Row;
import com.datastax.driver.dse.DseCluster;
import com.datastax.driver.dse.DseSession;

DseCluster dseCluster = null;
try {
   dseCluster = DseCluster.builder()
           .addContactPoint("127.0.0.1")
           .build();
   DseSession dseSession = dseCluster.connect();
 
   Row row = dseSession.execute("select release_version from system.local").one();
   System.out.println(row.getString("release_version"));
} finally {
   if (dseCluster != null) dseCluster.close();
}

This driver appeared about half a year ago, and I somehow missed it. I don’t need to touch the pooling option parameters because the default values should do the job. Here is an explanation:

http://docs.datastax.com/en/drivers/java/2.2/com/datastax/driver/core/PoolingOptions.html
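The localhost snippet above opens the cluster, runs one query and closes everything again. A long-running chat application would more likely create the session once and reuse it for every request, which is what the driver documentation generally recommends. Below is a minimal sketch of that idea in plain Java. `SharedResource` and `FakeSession` are hypothetical names of mine, and `FakeSession` only stands in for a real session so the example runs without Cassandra.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: creates an expensive resource once and hands out the same instance.
final class SharedResource<T extends AutoCloseable> implements AutoCloseable {
    private final Supplier<T> factory;
    private T instance; // guarded by "this"

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the single shared instance, creating it on first use.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    // Closes the shared instance if one was ever created.
    @Override
    public synchronized void close() {
        if (instance == null) {
            return;
        }
        try {
            instance.close();
        } catch (Exception e) {
            throw new IllegalStateException("closing the shared resource failed", e);
        } finally {
            instance = null;
        }
    }
}

// Stand-in for a real session (assumption: with the driver, the factory
// would be something like () -> dseCluster.connect()).
class FakeSession implements AutoCloseable {
    static final AtomicInteger OPENED = new AtomicInteger();

    FakeSession() {
        OPENED.incrementAndGet();
    }

    @Override
    public void close() {
        // Nothing to release in the stand-in.
    }
}

class SharedResourceDemo {
    public static void main(String[] args) {
        try (SharedResource<FakeSession> sessions = new SharedResource<FakeSession>(FakeSession::new)) {
            sessions.get();
            sessions.get(); // reuses the instance created by the first call
        }
        System.out.println("opened " + FakeSession.OPENED.get() + " time(s)");
    }
}
```

The `synchronized` methods keep the sketch simple; that is a design choice for readability, not a tuned solution.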

Thanks to @nevsv from Stack Overflow for the help.

 

P.S.

Do not forget to use a PreparedStatement for queries that are executed multiple times in your application:

PreparedStatement prepared = session.prepare(
    "insert into product (sku, description) values (?, ?)");
BoundStatement bound = prepared.bind("234827", "Mouse");
session.execute(bound);
session.execute(prepared.bind("987274", "Keyboard"));
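The point of a prepared statement is to prepare once and bind many times, so it helps to keep the prepared object around instead of preparing the same query string again. Here is a minimal sketch of a prepare-once cache in plain Java. `StatementCache` and `fakePrepare` are hypothetical names; `fakePrepare` only stands in for `session.prepare(query)` so the example runs without a cluster.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: prepares each distinct query string once and reuses the result.
final class StatementCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> preparer;

    StatementCache(Function<String, S> preparer) {
        this.preparer = preparer;
    }

    // Returns the cached statement for this query, preparing it on first use only.
    S get(String query) {
        return cache.computeIfAbsent(query, preparer);
    }
}

class StatementCacheDemo {
    // Counts how many times the stand-in "prepare" step actually runs.
    static final AtomicInteger PREPARES = new AtomicInteger();

    // Stand-in for session.prepare(query); a real preparer would return a PreparedStatement.
    static String fakePrepare(String query) {
        PREPARES.incrementAndGet();
        return "prepared:" + query;
    }

    public static void main(String[] args) {
        StatementCache<String> cache = new StatementCache<String>(StatementCacheDemo::fakePrepare);
        String insert = "insert into product (sku, description) values (?, ?)";
        cache.get(insert);
        cache.get(insert); // served from the cache, no second prepare
        System.out.println("prepare calls: " + PREPARES.get());
    }
}
```

With the real driver, I would expect the type parameter to be `PreparedStatement` and the preparer to be `session::prepare`, but I have not wired that up here.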

It is currently recommended to not create prepared statements for ‘SELECT *’ queries if you plan on making schema changes involving adding or dropping columns. Alternatively you should list all columns of interest in your statement, i.e.: SELECT a, b, c FROM tbl.

This will be addressed in a future release of both Cassandra and the driver. Follow CASSANDRA-10786 and JAVA-1196 for more information.
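One way to follow the advice about listing columns is to build the query text from a fixed list of column names kept in code. Here is a minimal sketch; `ExplicitSelect` and `select` are hypothetical names, and the helper assumes the table and column names are trusted identifiers from your own code, so it does no escaping.

```java
import java.util.Arrays;
import java.util.List;

class ExplicitSelect {
    // Builds "SELECT a, b, c FROM tbl" from a fixed list of column names.
    // Assumes trusted identifiers; nothing is escaped or validated beyond emptiness.
    static String select(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(select("tbl", Arrays.asList("a", "b", "c")));
        // prints: SELECT a, b, c FROM tbl
    }
}
```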
