I researched the best way to make a connection from my Java application to Cassandra. There are a lot of examples of how to do that, but most of them are Spark-oriented, while I’m developing a kind of chat application on my localhost (single insert/update statements, etc.), and all these Spark examples are really aimed at analytical workloads.
The first example is for Spark 1.6:
public static JavaSparkContext getCassandraConnector() {
    SparkConf conf = new SparkConf();
    conf.setAppName("Chat");
    conf.set("spark.driver.allowMultipleContexts", "true");
    // Cassandra node to connect to (localhost for development)
    conf.set("spark.cassandra.connection.host", "127.0.0.1");
    conf.set("spark.rpc.netty.dispatcher.numThreads", "2");
    conf.setMaster("local[2]");
    JavaSparkContext sc = new JavaSparkContext(conf);
    return sc;
}
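Just for context, this is roughly how such a context would be used with the spark-cassandra-connector Java API. This is only a sketch: it assumes the connector is on the classpath and reuses the getCassandraConnector() method above; the "chat" keyspace and "messages" table are placeholder names.

import com.datastax.spark.connector.japi.CassandraRow;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

public class CassandraReadExample {
    public static void main(String[] args) {
        JavaSparkContext sc = getCassandraConnector();
        // Read the whole table as an RDD of CassandraRow (keyspace/table are placeholders)
        JavaRDD<CassandraRow> rows = javaFunctions(sc).cassandraTable("chat", "messages");
        System.out.println("Row count: " + rows.count());
        sc.stop();
    }
}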
I also found an example for Spark 2.x, where the builder automatically reuses an existing SparkContext if one exists and creates one if it does not. Configuration options set in the builder are automatically propagated to Spark and Hadoop during I/O.
public static SparkSession getSparkSession() {
    SparkSession sparkSession = SparkSession
            .builder()
            .appName("Chat")
            .config("spark.driver.allowMultipleContexts", "true")
            // Workaround for the warehouse-path issue on Windows; note the file:/// URI form
            .config("spark.sql.warehouse.dir", "file:///C:/temp")
            .config("spark.cassandra.connection.host", "127.0.0.1")
            .config("spark.cassandra.connection.port", "9042")
            .master("local[2]")
            .getOrCreate();
    return sparkSession;
}
I also researched PoolingOptions (an example for the Session):
public static Session getPoolSession() {
    PoolingOptions poolingOptions = new PoolingOptions();
    poolingOptions
            .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
            .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
            .setMaxRequestsPerConnection(HostDistance.LOCAL, 32768)
            .setMaxRequestsPerConnection(HostDistance.REMOTE, 2000)
            .setHeartbeatIntervalSeconds(120);
    Cluster cluster = Cluster.builder()
            .addContactPoints("127.0.0.1")
            .withPoolingOptions(poolingOptions)
            .build();
    Session session = cluster.connect("chat");
    return session;
}
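To show how such a session would be used for the kind of single-statement writes a chat app does, here is a small sketch; the messages table and its columns are made up purely for illustration.

import java.util.UUID;

// Hypothetical chat.messages table: (id uuid, author text, body text)
Session session = getPoolSession();
PreparedStatement insertMessage = session.prepare(
        "INSERT INTO messages (id, author, body) VALUES (?, ?, ?)");
session.execute(insertMessage.bind(UUID.randomUUID(), "alice", "Hello!"));
session.close();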
Then I wondered what the most efficient way to make a connection is, and in my case the best option is the regular DataStax Java driver (because, once again, I have no analytical workloads):
http://docs.datastax.com/en/developer/java-driver-dse/1.1/
and
https://www.datastax.com/dev/blog/datastax-enterprise-java-driver-1-0-0-released
So, here is the code for my localhost:
import com.datastax.driver.core.Row;
import com.datastax.driver.dse.DseCluster;
import com.datastax.driver.dse.DseSession;

DseCluster dseCluster = null;
try {
    dseCluster = DseCluster.builder()
            .addContactPoint("127.0.0.1")
            .build();
    DseSession dseSession = dseCluster.connect();
    Row row = dseSession.execute("select release_version from system.local").one();
    System.out.println(row.getString("release_version"));
} finally {
    if (dseCluster != null) dseCluster.close();
}
The DSE driver appeared only about half a year ago, so I had missed it somehow. I don’t need to touch the pooling option parameters, because the default values should do the job. Here is an explanation:
http://docs.datastax.com/en/drivers/java/2.2/com/datastax/driver/core/PoolingOptions.html
Thanks to @nevsv from Stack Overflow for the help.
P.S.
Do not forget to use a PreparedStatement for queries that are executed multiple times in your application:
PreparedStatement prepared = session.prepare(
        "insert into product (sku, description) values (?, ?)");
BoundStatement bound = prepared.bind("234827", "Mouse");
session.execute(bound);
session.execute(prepared.bind("987274", "Keyboard"));
It is currently recommended not to create prepared statements for "SELECT *" queries if you plan on making schema changes that involve adding or dropping columns. Instead, list all columns of interest in your statement, i.e.: SELECT a, b, c FROM tbl.
This will be addressed in a future release of both Cassandra and the driver. Follow CASSANDRA-10786 and JAVA-1196 for more information.
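To make that concrete, here is a small sketch of preparing a SELECT with an explicit column list instead of SELECT *; the messages table, its columns, and the placeholder id are, again, just made up for illustration.

import java.util.UUID;

UUID messageId = UUID.fromString("00000000-0000-0000-0000-000000000000"); // placeholder id
// Explicit column list instead of "SELECT *" – safer across schema changes
PreparedStatement selectMessage = session.prepare(
        "SELECT id, author, body FROM messages WHERE id = ?");
Row row = session.execute(selectMessage.bind(messageId)).one();
if (row != null) {
    System.out.println(row.getString("author") + ": " + row.getString("body"));
}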