Java, NoSQL, SQL, REST API and other scary words

Cassandra Datastax and Java – best way to set up connection

I’ll research the best way to connect to Cassandra from my Java code here. There are a lot of examples of how to do that, but the main thing is that I’m developing a kind of chat application on my localhost (it will run single insert/update statements, etc.), whereas all these Spark examples are perfect for analytical workflows.

The first example is for Spark 1.6:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public static JavaSparkContext getCassandraConnector(){
    SparkConf conf = new SparkConf();
    conf.set("spark.driver.allowMultipleContexts", "true");
    conf.set("", "");

    JavaSparkContext sc = new JavaSparkContext(conf);
    return sc;
}

I also found an example for Spark 2.x, where the builder automatically reuses an existing SparkContext if one exists and creates a new one if it does not. Configuration options set in the builder are automatically propagated to Spark and Hadoop during I/O.

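import org.apache.spark.sql.SparkSession;

public static SparkSession getSparkSession(){
    SparkSession sparkSession = SparkSession
        .builder()
        .config("spark.sql.warehouse.dir", "/file:C:/temp")
        .config("", "")
        .config("spark.cassandra.connection.port", "9042")
        .getOrCreate();  // reuses an existing session or creates a new one
    return sparkSession;
}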
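The "reuse it if it exists, otherwise create it" behaviour described above can be illustrated in plain Java. This is only a sketch of the idea, not how Spark implements it; the class name SharedInstance and its method are made up for illustration.

```java
import java.util.function.Supplier;

// Hypothetical holder illustrating get-or-create semantics; this is not Spark code.
final class SharedInstance<T> {
    private T instance;

    // Returns the existing instance, or builds and remembers one on first use.
    synchronized T getOrCreate(Supplier<T> factory) {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        SharedInstance<Object> holder = new SharedInstance<>();
        Object first = holder.getOrCreate(Object::new);
        Object second = holder.getOrCreate(Object::new);
        // The second call reuses the instance created by the first one.
        System.out.println(first == second); // prints "true"
    }
}
```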

I also researched Pooling Options (example for the Session), like:

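import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public static Session getPoolSession(){
    PoolingOptions poolingOptions = new PoolingOptions()
        .setCoreConnectionsPerHost(HostDistance.LOCAL,  4)
        .setMaxConnectionsPerHost( HostDistance.LOCAL, 10)
        .setMaxRequestsPerConnection(HostDistance.LOCAL, 32768)
        .setMaxRequestsPerConnection(HostDistance.REMOTE, 2000);

    Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")  // assumption: local node; the original builder calls were lost
        .withPoolingOptions(poolingOptions)
        .build();

    Session session = cluster.connect("chat");
    return session;
}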
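To put those pooling numbers in perspective: a rough upper bound on simultaneous in-flight requests to one host is the number of connections multiplied by the requests allowed per connection. Here is a small sketch of that arithmetic using only the LOCAL values from the snippet; PoolCapacity and maxInFlight are hypothetical names, not part of the driver.

```java
// Hypothetical helper, not part of the DataStax driver.
final class PoolCapacity {
    // Upper bound on simultaneous in-flight requests to a single host.
    static long maxInFlight(int connectionsPerHost, int requestsPerConnection) {
        return (long) connectionsPerHost * requestsPerConnection;
    }

    public static void main(String[] args) {
        // Values taken from the LOCAL settings in the pooling example.
        System.out.println(maxInFlight(4, 32768));   // 4 core connections: prints 131072
        System.out.println(maxInFlight(10, 32768));  // all 10 connections: prints 327680
    }
}
```

That is probably far more than a chat application running on localhost needs.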

I wondered what the most efficient way to make a connection is. In my case, the best way is to use the regular DataStax Java driver (because, once again, I have no analytical workflows).


So, here is the code for my localhost:

import com.datastax.driver.core.Row;
import com.datastax.driver.dse.DseCluster;
import com.datastax.driver.dse.DseSession;

DseCluster dseCluster = null;
try {
   dseCluster = DseCluster.builder()
           .addContactPoint("127.0.0.1")  // assumption: local node, since this runs on localhost
           .build();
   DseSession dseSession = dseCluster.connect();
   Row row = dseSession.execute("select release_version from system.local").one();
} finally {
   if (dseCluster != null) dseCluster.close();
}

It appeared about half a year ago, so I missed it somehow. I don’t need to touch the pooling option parameters because the default values should do the work. Here is an explanation:

Thanks to @nevsv from Stack Overflow for the help.



Do not forget to use a PreparedStatement for queries that are executed multiple times in your application:

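PreparedStatement prepared = session.prepare(
    "insert into product (sku, description) values (?, ?)");

// Bind values once, then execute the bound statement
BoundStatement bound = prepared.bind("234827", "Mouse");
session.execute(bound);
// Or bind and execute in one step
session.execute(prepared.bind("987274", "Keyboard"));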
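One way to make sure each query string is prepared only once is to keep the prepared statements in a small cache. The sketch below is driver-independent so it can run on its own: PreparedCache is a hypothetical helper, and in a real application the function passed in would presumably be something like session::prepare.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: keeps one prepared statement per query string.
final class PreparedCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> preparer;

    PreparedCache(Function<String, S> preparer) {
        this.preparer = preparer;
    }

    // Prepares the query on first use and returns the cached statement afterwards.
    S get(String cql) {
        return cache.computeIfAbsent(cql, preparer);
    }

    public static void main(String[] args) {
        AtomicInteger prepareCalls = new AtomicInteger();
        // Stand-in for session::prepare so the example runs without a cluster.
        PreparedCache<String> statements = new PreparedCache<>(cql -> {
            prepareCalls.incrementAndGet();
            return "prepared:" + cql;
        });

        String cql = "insert into product (sku, description) values (?, ?)";
        statements.get(cql);
        statements.get(cql);
        System.out.println(prepareCalls.get()); // prints 1: prepared only once
    }
}
```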

It is currently recommended to not create prepared statements for ‘SELECT *’ queries if you plan on making schema changes involving adding or dropping columns. Alternatively you should list all columns of interest in your statement, i.e.: SELECT a, b, c FROM tbl.

This will be addressed in a future release of both Cassandra and the driver. Follow CASSANDRA-10786 and JAVA-1196 for more information.
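To avoid ‘SELECT *’ in prepared statements, the column list can be spelled out explicitly. Here is a minimal sketch that builds such a query string; SelectBuilder is a made-up helper, and it assumes the table and column names are trusted constants from your own code, not user input.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that lists columns explicitly instead of using SELECT *.
final class SelectBuilder {
    static String select(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("list at least one column");
        }
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    public static void main(String[] args) {
        // prints "SELECT a, b, c FROM tbl"
        System.out.println(select("tbl", Arrays.asList("a", "b", "c")));
    }
}
```

The resulting string can then be passed to session.prepare(...) like any other query.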

