SOA, or Service-Oriented Architecture, is not a couple of Web Services!
Time and again I see products being sold with an SOA label, or people on LinkedIn talking about their experience with SOA, but when you ask them about it, the answer is usually the same: the product implements some web services, or the person knows how to develop web services. That drives me crazy!
Other people think they implement SOA just because they have web services orchestrated by an ESB (Enterprise Service Bus) or a BPM (Business Process Management) tool. That's better than just talking about web services, but it still misses the bigger context.
OK, SOA is somehow tied to the service concept. But it lives in a bigger scenario, and I'm not only talking about service orchestration; I'm talking about a top-down approach across the whole organization, an SOA Governance.
In 2009 I had to deal with an SOA Governance project, to define an SOA Center of Excellence. At that time, I didn't have a clear picture of SOA, just some ideas and concepts. I also couldn't find any good reference, or anyone who could help me define it. The best approach, the one I based my proposal on, came from The Open Group. At that time, they only had a work-in-progress document about SOA Governance, based on the COBIT and ITIL methodologies.
Basically, the idea was to apply an SOA Governance methodology together with the Project Management and Development methodologies. And that's the big picture, the recipe to structure SOA. Of course, the amount of detail in each methodology is really impressive, and they are all related to each other. It also depends on what the company has already adopted - you can't just throw away everything the company uses and reimplement everything with the new methodologies. You have to adapt, and have a detailed plan to make it feasible.
You should try it. Ask anyone about SOA. You'll find lots of "experts" that can talk for hours about... web services!
Thursday, June 27, 2013
Version Control
Every time I'm starting a new project, I take a look around the web, talk to friends, and check magazines for the latest hot technologies I could apply to the project. Of course, I can't apply everything, since it's a risk, so I pick what I can depending on several factors, with deadlines being the determining one.
And that's how I started moving on with version control. My first experience was with a Borland tool; I can't remember the name, but I remember it came bundled with a Delphi 4 package, if I'm remembering the version correctly. Anyway, there was no flexibility at all, and the tool was full of bugs. But we got used to them, and I used it for a while.
With Java projects, I initially started using CVS, or Concurrent Versions System. In 2001 I participated in one specific project that required reliable version control, since we had three distinct teams, located in two different cities, working together on the same project. And CVS responded as expected. We had some minor issues when renaming files and when working on the same file, but the benefits of using CVS on that project were unquestionable.
I don't remember exactly when I first tried Apache SVN, but once I tested its branching, I left CVS behind and never used it again. The tooling is much better and widely supported - the Eclipse plugin, for example, makes it easy to use.
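For reference, this is roughly what branching looks like in SVN - just a minimal sketch, with a hypothetical repository URL and branch name:
### create a branch as a cheap server-side copy of trunk
svn copy http://svn.example.com/repos/myproject/trunk http://svn.example.com/repos/myproject/branches/new-feature -m "creating a branch for the new feature"
### switch the working copy to the new branch
svn switch http://svn.example.com/repos/myproject/branches/new-feature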
I kept working with SVN until 2011, when I tried out Git. I remember checking out the speed of the commits, and the ability to easily and quickly have concurrent local branches, commit to some of them, and push them to the central repository. The merging also impressed me. Much better than SVN for what I was doing! I know it's not only that, but considering how fast and easy it was compared to SVN... I moved on to Git.
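The workflow I'm describing looks roughly like this - a minimal sketch, with a made-up branch name:
### create and switch to a local branch, no server round trip needed
git checkout -b new-feature
### work and commit locally as many times as needed
git add .
git commit -m "implement the new feature"
### merge it back and push to the central repository
git checkout master
git merge new-feature
git push origin master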
And when we talk about Git, we can't forget to mention GitHub. I've just created my first repository there. The idea of having a Git repository hosting server available to everyone is awesome!
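Publishing a local repository there takes just a couple of commands - a minimal sketch, with placeholder user and repository names:
### point the local repository to a (hypothetical) GitHub repository
git remote add origin https://github.com/youruser/yourproject.git
### push the local history to GitHub
git push -u origin master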
When considering a version control solution for a new project, there are, for sure, lots of good options. And the decision on which one to pick must be tied to the technology (like Team Foundation Server for a .NET project), the project requirements, and the preferences of the architect and developers. Also, if it's new to you and your team, spend some time testing it. Make sure you're selecting a solution that's easy to work with.
Shell Scripts
We can't deny that scripting is powerful, depending on what we are supposed to do. But some people look at bash scripts as a really complex technology. Early this year, in 2013, for example, I was asked to help with a proof of concept for a specific client. The team was already working on the main tasks, but they were struggling with some validations and with automating the handling of the PoC files.
So, to validate the files and their contents, I wrote a couple of Java programs. But for the file automation (copy/uncompress/move to another file system - Hadoop), with some extra validations, I decided to write a generic bash script on the Linux box. So, I decided to write down here some really simple chunks of code to show how easy it is to use, and how it solves some issues in a simple way:
- Before starting to write the main logic, it's necessary to define the global variables. For example:
### Set the main directories
rootDir=/
targetHadoopDir=/user/jdoe/client/data/
tempDir=/home/jdoe/tmp/
- The first action is checking whether a parameter was passed on the command line. If not, a message is printed and the execution stops:
### check if there's a directory name passed as parameter
if [ $# -ne 1 ]
then
    echo "Need a directory as a parameter!"
    exit 1
else
    rootDir=$1
fi
- After that, it was necessary to check whether a temporary directory exists. If it's there, the script is supposed to clean it up. If not, it has to create the directory. The call to the subroutine responsible for performing this logic is done just by using the subroutine name:
### check if a temporary directory exists
checkTemp
- And the routine has to be defined before the main logic:
### check if temp directory exists, and create it or clean it
checkTemp() {
    if [ -d "${tempDir}" ]
    then
        rm -r $tempDir
        if [ $? -gt 0 ]
        then
            ### error
            echo "Error removing directory '$tempDir'!"
            exit 1
        fi
    fi
    mkdir $tempDir
    if [ $? -gt 0 ]
    then
        ### error
        echo "Error creating temp directory '$tempDir'!"
        exit 1
    else
        echo "Temp directory '$tempDir' ready to be used!"
    fi
}
- You might have noticed how simple it is to run a command and check whether it succeeded or not. Another tip is to print the steps and comment your script. Try to be clear, since you might later forget why you were performing certain steps.
- So, the last step of the main logic is to process the files. In fact, all the steps above were basically a preparation for processing the files. So, in the main logic I called the main subroutine using the root directory as a parameter:
### run down the directory tree and work with the files
processDir $rootDir
echo "Done!"
- This routine is recursive. It checks all the entries under the given directory, calling itself for subdirectories and processing regular files:
### process directory search, recursively
processDir() {
    for file in "$1"/*
    do
        if [ -f "$file" ]
        then
            ### it's a file
            processFile "$file"
        else
            if [ -d "$file" ]
            then
                ### it's a directory
                processDir "$file"
            fi
        fi
    done
}
- How to get the parameter within the subroutine:
### process each file
processFile() {
    fileName=$1
- Comparing the file name against a string:
    if echo $fileName | grep -q "_food_"
    then
        prefix="f"
        echo " it's a food file!"
    else
All the other steps are basically linux and hadoop commands to safely perform the copy.
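Just to give an idea of what those remaining steps looked like, here is a minimal sketch of the copy/uncompress/move part, assuming gzip-compressed files and that the target Hadoop directory already exists; the prefix logic and the extra validations from the real script are left out:
### (sketch) copy the file to the temp directory and uncompress it there - assumes gzip-compressed files
cp "$fileName" $tempDir
gunzip ${tempDir}*.gz
### (sketch) send the uncompressed content to the hadoop file system, checking the result
### assumes the target hadoop directory already exists
hadoop fs -put ${tempDir}* $targetHadoopDir
if [ $? -gt 0 ]
then
    ### error
    echo "Error copying files to hadoop directory '$targetHadoopDir'!"
    exit 1
fi
### (sketch) clean up the temp directory for the next file
rm ${tempDir}*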
After performing some tests and adjusting some minor issues, I left it running for the whole weekend and got the results the following Monday.
The PoC ended up being a success, and this step in the process, which could have become a huge manual task, was solved with a simple bash script.
So, in my opinion, if you are thinking of a solution for repetitive tasks, or to orchestrate complex tasks, you should consider writing a script. It can be the easiest and most efficient solution.