Java, NoSQL, SQL, REST API and other scary words

Spark and Data Formats. Introduction

This is a pretty short compilation about data formats and I will ignore JSON, XML and a lot of other formats here.

From my point of view – this one presentation is very important to understand difference between different formats (I used a lot of other data sources, just in case):

http://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-stampedecon-2015

CSV

The CSV (“Comma Separated Values”) file format is often used to exchange data between differently similar applications. The CSV Format:

  • Each record is one line – Line separator may be LF (0x0A) or CRLF (0x0D0A), a line separator may also be embedded in the data (making a record more than one line but still acceptable).
  • Fields are separated with commas.
  • Leading and trailing whitespace is ignored – Unless the field is delimited with double-quotes in that case the whitespace is preserved.
  • Embedded commas – Field must be delimited with double-quotes.
  • Embedded double-quotes – Embedded double-quote characters must be doubled, and the field must be delimited with double-quotes.
  • Embedded line-breaks – Fields must be surrounded by double-quotes.
  • Always Delimiting – Fields may always be delimited with double quotes, the delimiters will be parsed and discarded by the reading applications.

Example: Continue reading

Standard
Java, NoSQL, SQL, REST API and other scary words

Spark Introduction. RDD

Once again, the key thing about queries in Cassandra (my previous article about it) –  there is no way to do joins there. So you should be very accurate with the model of the database. Anyway, if  there is a strong need to perform relational queries over data stored in Cassandra clusters – use Spark.

Let’s start with short introduction about Spark, it is:

  • A lightning-fast cluster computing technology, designed for fast computation,
  • A unified relational query language for traversing over Spark Resilient Distributed Datasets (RDDs),
  • Support of a variation of the query language used in relational databases,
  • Not about their own database and query language – Spark is about query language and other databases (in our case – Cassandra). You can execute Spark queries in Java applications that traverse over Cassandra column families.

One pretty nice article about Spark is here: http://spark.apache.org/docs/latest/programming-guide.html

Spark and Hadoop difference

1

What to add? Continue reading

Standard
Java, NoSQL, SQL, REST API and other scary words

Cassandra. Introduction

NoSQL is definitely faster than SQL. This isn’t surprising; NoSQL’s simpler denormalized store allows you to retrieve all information about a specific item in a single request. There’s no need for related JOINs or complex SQL queries.

That said, your project design and data requirements will have most impact. A well-designed SQL database will almost certainly perform better than a badly designed NoSQL equivalent and vice versa.

I decided to start with Cassandra cause, from my point of view, this is the most trending NoSQL for today.

Cassandra is good for:

  • Simple setup, maintenance code
  • Fast random read/write
  • Flexible parsing/wide column requirement
  • No multiple secondary index needed

Not good for:

  • Secondary index
  • Relational data
  • Transactional operations (Rollback, Commit)
  • Primary & Financial record
  • Stringent and authorization needed on data
  • Dynamic queries/searching  on column data
  • Low latency

Key usage case: Twitter Continue reading

Standard
QA (eng)

Test Suites for Simple Chat Application (REST API)

Introduction

This is a framework with test suites which allows me to test my Simple Chat Application UI in 5 minutes.

test app

I could provide bitbucket project access upon request and this is a link to original chat application.

Short Demo

Architecure

  • TestNG and Java
  • Webdriver is a main test library
  • JDBC to work with DB (clean up test data)
  • ANT to display test results
  • Test Suites and data stores as xls files and their status updates with every run
  • Easy to use with Jenkins

Framework example to begin with

Step by step instruction how to create framework

Standard
Java, NoSQL, SQL, REST API and other scary words

Simple SQL commands

Term Description Syntax
SELECT The SELECT statement is used to select data from a database. The result is stored in a result table, called the result-set. SELECT column_name,column_name
FROM table_name;
DISTINCT In a table, a column may contain many duplicate values; and sometimes you only want to list the different (distinct) values.
The DISTINCT keyword can be used to return only distinct (different) values.
SELECT DISTINCT column_name,column_name
FROM table_name;
WHERE The WHERE clause is used to extract only those records that fulfill a specified criterion. SELECT column_name,column_name
FROM table_name
WHERE column_name operator value;
AND/OR
ORDER BY The ORDER BY keyword is used to sort the result-set by one or more columns.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a descending order, you can use the DESC keyword.
INSERT INTO he INSERT INTO statement is used to insert new records in a table. INSERT INTO table_name
VALUES (value1,value2,value3,…);INSERT INTO table_name (column1,column2,column3,…)
VALUES (value1,value2,value3,…);
UPDATE The UPDATE statement is used to update existing records in a table.
DELETE
SELECT TOP The SELECT TOP clause is used to specify the number of records to return.
LIKE The LIKE operator is used to search for a specified pattern in a column.
IN The IN operator allows you to specify multiple values in a WHERE clause.
BETWEEN The BETWEEN operator is used to select values within a range.
Joins SQL joins are used to combine rows from two or more tables:
INNER JOIN: Returns all rows when there is at least one match in BOTH tables
LEFT JOIN: Return all rows from the left table, and the matched rows from the right table
RIGHT JOIN: Return all rows from the right table, and the matched rows from the left table
FULL JOIN: Return all rows when there is a match in ONE of the tables
INNER JOIN The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns in both tables. INNER JOIN is the same as JOIN.
Общие значения
LEFT JOIN The LEFT JOIN keyword returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match. SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name=table2.column_name;
RIGHT JOIN The RIGHT JOIN keyword returns all rows from the right table (table2), with the matching rows in the left table (table1). The result is NULL in the left side when there is no match. SELECT column_name(s)
FROM table1
RIGHT OUTER JOIN table2
ON table1.column_name=table2.column_name;
FULL JOIN The FULL OUTER JOIN keyword returns all rows from the left table (table1) and from the right table (table2).
The FULL OUTER JOIN keyword combines the result of both LEFT and RIGHT joins.
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name=table2.column_name;
UNION The SQL UNION operator combines the result of two or more SELECT statements.
The UNION operator selects only distinct values by default. To allow duplicate values, use the ALL keyword with UNION.
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
———————————————————
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
SELECT INTO The SELECT INTO statement copies data from one table and inserts it into a new table. SELECT *
INTO newtable [IN externaldb]
FROM table1;
INSERT INTO SELECT With SQL, you can copy information from one table into another.
The INSERT INTO SELECT statement copies data from one table and inserts it into an existing table
INSERT INTO table2
SELECT * FROM table1;
CREATE DATABASE The CREATE DATABASE statement is used to create a database.
CREATE TABLE NOT NULL – Indicates that a column cannot store NULL value
UNIQUE – Ensures that each row for a column must have a unique value
PRIMARY KEY – A combination of a NOT NULL and UNIQUE. Ensures that a column (or combination of two or more columns) have an unique identity which helps to find a particular record in a table more easily and quickly
FOREIGN KEY – Ensure the referential integrity of the data in one table to match values in another table
CHECK – Ensures that the value in a column meets a specific condition
DEFAULT – Specifies a default value when specified none for this column
INCREMENT TABLE
DROP INDEX Indexes, tables, and databases can easily be deleted/removed with the DROP statement. DROP INDEX index_name ON table_name
DROP INDEX table_name.index_name
DROP INDEX index_name
ALTER TABLE table_name DROP INDEX index_name
DROP TABLE table_name
DROP DATABASE database_name
TRUNCATE TABLE table_name
ALTER TABLE The ALTER TABLE statement is used to add, delete, or modify columns in an existing table. ALTER TABLE table_name
ADD column_name datatype
Views In SQL, a view is a virtual table based on the result-set of an SQL statement.
A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from one single table.
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
Date Data Types MySQL comes with the following data types for storing a date or a date/time value in the database:DATE – format YYYY-MM-DD
DATETIME – format: YYYY-MM-DD HH:MM:SS
TIMESTAMP – format: YYYY-MM-DD HH:MM:SS
YEAR – format YYYY or YY
SQL Server comes with the following data types for storing a date or a date/time value in the database:DATE – format YYYY-MM-DD
DATETIME – format: YYYY-MM-DD HH:MM:SS
SMALLDATETIME – format: YYYY-MM-DD HH:MM:SS
TIMESTAMP – format: a unique number
SQL Functions SQL Aggregate Functions
SQL aggregate functions return a single value, calculated from values in a column.Useful aggregate functions:AVG() – Returns the average value
COUNT() – Returns the number of rows
FIRST() – Returns the first value
LAST() – Returns the last value
MAX() – Returns the largest value
MIN() – Returns the smallest value
SUM() – Returns the sum
SQL Scalar functions
SQL scalar functions return a single value, based on the input value.Useful scalar functions:UCASE() – Converts a field to upper case
LCASE() – Converts a field to lower case
MID() – Extract characters from a text field
LEN() – Returns the length of a text field
ROUND() – Rounds a numeric field to the number of decimals specified
NOW() – Returns the current system date and time
FORMAT() – Formats how a field is to be displayed

Continue reading

Standard
Java, NoSQL, SQL, REST API and other scary words

Simple Chat Application (REST API)

Introduction

This is a simple restfull chat application to send messages between friends. Key features:

  • Register to the System
  • Login
  • Send/Approve Friendship Request
  • View friends list
  • Ignore User
  • Send Message
  • Delete account Minor features:
  • Auto update messages and friends lists
  • Security by Token

Link to the project upon request.

Short Demo

Continue reading

Standard
QA (eng)

Documents generator

It’s not a secret that there are a lot of common features in a huge amount of different applications. That’s why it will be very useful to collect all this documents in one repository and make it open for everyone. At this repository analysts will be able to  collect their specifications, voting for other specifications and searching the best of them (according to previous votes of all users) for reuse.

Solving this task I see a lot of difficulties:

  1. All of my documents are confident and a have no opportunity to share them. But I have some documentation, which were created for my own ideas, and, of course, I can share them (they got only one minus – they were wrote at my native language and it’s not English).
  2. Not all of analysts ready to share their documents, because it’s the same like teaching your competitors.
  3. It could be quite hard to understand, what was wrote in each document, because each company have their own standard (in Russia everyone tries to prove, that they are working according RUP, IEEE, K. Wiegers books,…, but, as a saw, that is not true).
  4. And the last one – people are too lazy to post something, which, furthermore, connected with their job.

I decided that the first step should solve the problem of creating standardized specifications and the easiest way is to create application, which will not allow describing window/web frames in different ways (that is why I decline the idea of describing fields in Excel or Word documents).

At this point I was inspired by IBM Web Content Manager (WCM). It took about one hour to understand – how should I create web forms with amount of fields and their properties (as type, length and so on). I don’t know if it will be correct to add print screens here, but, hover, it will be easy for you to find them. So, according to WCM work process, it will be enough for analyst to specify – how/where fields should be displayed at GUI and their properties. After that, developer’s work will be decreased 2-3 times, but this is not a first step for me, and, furthermore:

  1. WCM cannot generate all of specifications, what we need at our projects.
  2. It’s too expensive while we are working only on non-portal solutions.
  3. It could work only with web frames, can’t include tables and so on.

Why I decided to made another one application on this theme

I’m educating some students (private courses) and they should understand next important things, which are easy to show on working application, which could be immediately modified:

  1. How to collect requirements
  2. How to provide impact analysis
  3. How to improve usability and make application user friendly
  4. How to work with non-functional requirements
  5. What attributes each field or table should have
  6. How to prepare a test plan for application

 Domain model

After my primary analysis of the problem – I’ll enumerate primary features for application, which will be able to generate such kind of specifications:

  1. Creating GUI description
  2. Adding fields and table parts
  3. Adding checks for input data
  4. Adding formulas for calculated fields
  5. Generating text description

This generator should be available from web or user should have an opportunity to install it on his PC without configuring database procedures. That is why it is appropriate to save all necessary data at xml-files.

Domain model:

1

In fact, I want to see many frames/GUI at project and I want to reuse one frame in other project. Then, I want each frame consist of fields (with opportunity of copying field and field’s property to another frame) and tables. Tables should consist of many frames and have their own attributes (for example, are they editable or not). And all of the verification procedures should be cross project entities.

Version 0.1

Today I made my first version, which, of course, not ready to use at my department. Nevertheless, application already has an export to XLS function and it is quite easy to understand how to work with it.

2

3

Day by day I’m planning to improve it quality and amount of useful features. I will add
description of my Frames for business analysts, rewrite dropdown lists, to make already added frames and field excluded. I have been working as analyst and QA for 6 years, so I have a complicated view on my program and all of the processes, which I’m planning to automate.

However, you will be able to download my application soon. It consists of executable JAR file and a BAT file. BAT file will help you to start my application. I didn’t want to made users install any additional software or database engines, that’s why application uses only XML files for storing information.

Standard