Thursday, August 8, 2013

HTML5, CSS3 and JavaScript, a great recipe!

HTML, CSS and JavaScript are a great combination for web applications, mobile or not! And with the improvements in HTML5 and CSS3, the capabilities have multiplied!

I'll leave here my notes about the basics of HTML, CSS and JavaScript integration, some pieces of code illustrating them, and a bit of HTML5 and CSS3.

I won't explain what HTML, CSS and JS are, or their structures. If you don't have the basic concepts, you'd better not continue reading this post.


CSS Addressing HTML Elements

First thing: keep your CSS code in a separate file. Don't embed it in your HTML file. This way you have standard, reusable code that is easy to maintain.

There are three ways to address HTML elements on your CSS:

  1. Directly to a specific element type. For example, if you want all <h1> elements to have a font size of 25 pixels, just add a piece of code like this to your CSS:
    h1 {
        font-size: 25px;
    }
  2. Specify a class to be used as a tag attribute in the HTML file. First, create the class in the CSS file - it starts with a dot:
    .blue_paragraph {
        background-color: blue;
    }

    Then, refer to this class within the HTML file:
    <p class="blue_paragraph">
  3. Refer directly to an id attribute on an HTML element:
    <p id="main_paragraph">
    Then, the CSS addresses that id - a hash has to be used before the id name in the CSS file:
    #main_paragraph {
        font-size: 25px;
    }
The Box Model is another important thing to highlight. For the border, it is possible to specify how thick it is. The padding and margin define the spacing inside and outside the box, respectively. You can use them in two ways: removing the whole margin:
    margin: 0px;
or defining just the top, left, right or bottom:
    margin-top: 0px;
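For example, a box combining the three parts could look like this (the id and values below are just an illustration):
    #content_box {
        border: 2px solid #000;   /* border thickness, style and color */
        padding: 10px;            /* spacing inside the border */
        margin: 20px;             /* spacing outside the border */
        margin-top: 0px;          /* individual sides can still be overridden */
    }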


JavaScript Important Notes

First thing: keep your JS code in separate files. Don't embed it in your HTML file. Document your code as much as you can. This way you have standard, reusable code that is easy to maintain.

Make use of the console. In Chrome, for example, the "Inspect Element" option gives you access to the Console. Within your JavaScript, use it to print values, variables and messages:

    console.log('Hello');

Keep a list of all available events at hand, like the following one:
http://www.quirksmode.org/dom/events/index.html
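For instance, reacting to one of those events takes just a listener (the element id save_button below is made up):
    var button = document.getElementById('save_button');
    button.addEventListener('click', function () {
        console.log('save_button was clicked');
    });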

Make use of objects, and translate them into a JSON document when necessary:
    var user = new Object();
    user.name = 'Pedro';
    user.lastName = 'Catoira';

    var jsonObject = JSON.stringify(user);


New Features on CSS3

The new pseudo-classes target elements based on their position in the document (a short example follows this list):
  • :root – targets the root element of a document;
  • :only-child – element in the document tree that is the only child of its parent;
  • :empty – targets elements that don’t have any children or any text, like <h1></h1>;
  • :nth-child(n) – targets child elements in relation to their position within the parent, using an index;
  • :first-of-type – targets the first of a specific type of element within a parent, and is the opposite of :last-of-type;
  • :first-child – targets the first child element in a parent. It is the opposite of :last-child;
  • :not(s) – this one targets elements that are not matched by the specified selector.
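As a small illustration (the selectors below are just examples), a few of these pseudo-classes in use:
    li:first-child {
        font-weight: bold;
    }
    tr:nth-child(2n) {
        background-color: #EEE;   /* stripe every even table row */
    }
    p:not(.blue_paragraph) {
        color: #333;
    }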

Vendor prefixes help browsers interpret the code. Below is a short list of the vendor prefixes for the major browsers:
  • -moz- : Firefox
  • -webkit- : Webkit browsers such as Safari and Chrome
  • -o- : Opera
  • -ms- : Internet Explorer
Rounded corners without images or JavaScript are one of the biggest features of CSS3. We can do this with only a few lines of code:
#my_id {
    height: 100px;
    width: 100px;
    border: 1px solid #FFF;
    /* For WebKit: */
    -webkit-border-radius: 15px;
}

Border images allow developers and designers to take their site to the next level, styling a whole border with a single image:
#id_xyz {
    /* source image, slice values (top right bottom left), repeat style */
    border-image: url(border.png) 30 30 30 30 round round;
}

Before CSS3, we had to use either a shadow image or JavaScript to apply a shadow, or create the shadow directly in the image. With CSS3 Box Shadow we can apply shadows to almost every element of our website:
#id_xyz {
    background: #FFF;
    border: 1px solid #000;
    /* For WebKit: offset-x, offset-y, blur, color */
    -webkit-box-shadow: 5px 5px 5px #999;
}

With the Multi-Column Layout, it's possible to arrange text in a more newspaper-like way. You can choose the number of columns, the column width, the gap between columns, and the column separator:
#id_xyz {
    font-size: 12px;
    /* For WebKit: */
    -webkit-column-gap: 1em;
    -webkit-column-rule: 1px solid #000;
    -webkit-column-count: 3;
}

Create multiple backgrounds on a single element:
#id_xyz {
    background:
        url(topbg.jpg) top left no-repeat,
        url(middlebg.jpg) center left no-repeat,
        url(bottombg.jpg) bottom left no-repeat;
}

@font-face - The new CSS3 implementation allows developers and designers to use any licensed TrueType (.ttf) or OpenType (.otf) font in their web designs:
@font-face {
    font-family: "my-font";
    src: url(my-font.ttf) format("truetype");
}
#id_xyz {
    font-family: "my-font", sans-serif;
}

In CSS3, three additional attribute selectors are available for matching substrings of an attribute value:
Select elements with title prefix of “x”:
p[title^=x] {
    (...)
}

Select elements with title suffix of “x”:
p[title$=x] {
    (...)
}

Select elements whose title contains at least one instance of “x”:
p[title*=x] {
    (...)
}

Opacity is also supported:
#id_xyz {
    background: #F00;
    opacity:  0.5;
}

RGBA Colors:
#id_xyz {
    background: rgba(255, 212, 45, 0.5);
}

But you don't really have to keep all of that in mind. You just have to know the capabilities - there are plenty of reference sites out there to support writing nice CSS code.

New Features on HTML5

The list of new features in HTML5 is huge. In summary, it simplifies what we had before, and also adds some important new features. Here are some of them:

  • Simple Doctype - just need this now:
    <!DOCTYPE html>
  • The new elements:
    http://www.w3schools.com/html/html5_new_elements.asp;
  • The type attribute is not required anymore for script and link tags;
  • Editable contents:
    <h2> To-Do List </h2>  
    <ul contenteditable="true">  
        <li> Break mechanical cab driver. </li>  
        <li> Drive to abandoned factory </li>
        <li> Watch video of self </li>  
    </ul>  
  • Small validations, like checking the typed e-mail address, are now available in HTML5. The browser tests the value and displays an error message without submitting the form:
    <form action="" method="get">
        <label for="email">Email:</label>
        <input id="email" name="email" type="email" />

        <button type="submit"> Submit Form </button>
    </form>  
  • Regular Expressions:
    <input type="text" name="username"
           placeholder="4 <> 10"
           pattern="[A-Za-z]{4,10}" />
  • Required and Autofocus attributes:
    <input type="text" name="email" required autofocus />
  • Native placeholders for input boxes:
    <input name="email" placeholder="jdoe@gmail.com" />
  • Local Storage is the HTML5 feature, used in combination with JavaScript, that allows data to be saved locally and used offline, like in the example below:
    localStorage.clear();
    var names = 'Pedro, John';                      // values are stored as strings
    localStorage.setItem('names', names);
    console.log(localStorage.getItem('names'));
This is an amazing website that helps check compatibility between browsers, including mobile ones: http://caniuse.com/

SQL or NoSQL, that's the question!

We hear a lot about noSQL in the market now. Sometimes it's being applied without making any sense, just because it sounds smart to say you're implementing a whole architecture based on storing your data in a noSQL database.

Of course, we can easily find some really good uses of noSQL databases. But it's really important to understand why you are picking a non-relational database. It's a big decision, even selecting which noSQL database you'll work with.

The first thing to be clear about is that we can implement anything on either a noSQL or a relational database. The overhead of processing the data, of developing and maintaining the environment, and the final performance are the key factors. So, eventually, your final architecture may combine both relational and noSQL databases.

Another idea to keep in mind is that it is pretty easy to set up either noSQL or relational databases for small tests/PoCs. The relational database can be kept simple for your tests - you may not need to set up all constraints, normalize your tables, etc. You wouldn't be doing anything like that with a noSQL database anyway.

In my opinion, the approach has to be reasonable. We also have to try to solve the problem in a simple way, and not create a beautiful and complex architecture to solve a small issue.

A good way to do that is to iterate on your own solution: you get a problem, study the solution, put it in context and question whether it fits the big picture. Compare it against the whole scenario. If it makes sense, go ahead and attack the next problem. Sometimes you might step back and combine solutions into one.

For example, Riak is a well-known noSQL database that supports clustering out of the box, so it provides high availability even under failure conditions. But you have to ask yourself if that's really necessary: the big players in relational databases can provide 99.999% availability or more.

In other scenarios, it's easy to justify your decisions. If we're talking about document-based data, like blogs, you may consider a document-based noSQL database, like MongoDB. A similar scenario applies to InfoGraph, if we are talking about graph databases. And we can't forget about Hadoop for really big data, if you need performance.
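For example, in MongoDB a blog post and its comments can live in a single document (the collection and field names below are made up), which is exactly the kind of data that fits a document store:
    db.posts.insert({
        title: "SQL or NoSQL, that's the question!",
        author: "Pedro",
        tags: ["nosql", "architecture"],
        comments: [ { user: "John", text: "Nice overview!" } ]
    });
    db.posts.find({ tags: "nosql" });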

There are lots of matrices available on the web mapping the databases and their usage, and also lots of reports comparing the products. These materials are interesting; I recommend some research before defining your strategy.

Basically, there are lots of noSQL solutions available. We just have to understand the problem we are dealing with well, and pick the relational database or the specific noSQL database that solves it. There may not be a 100% correct solution in all cases. So, write down your assumptions and the reasons you picked a specific solution... and good luck!

Monday, July 15, 2013

The Power Behind the JMS

In every discussion about middleware, ESB tools, Web Services and EJBs are always in focus. But we can't forget that JMS provides a reliable and loosely coupled way to have asynchronous communication.

Of course, every scenario demands a different combination of software components. And the success formula is directly related to the way you design your final architecture.

In 2007 I was called in to solve some bad issues a big retail company was having in their environment. Basically, they had one of the biggest consulting companies developing their solutions in a JEE environment, with lots of integrations. But some decisions made during the design phase were causing lots of trouble, and giving the retailer's IT Manager a huge headache.

Besides some basic issues they couldn't resolve, like adjusting the heap size for the application server, the major problems weren't that easy to address. That's delicate: telling them they had spent money on a poorly designed architecture, and that it would fail, is not a straightforward task. So I decided on a different approach, showing how to fix the current scenario, but also recommending changes to support better performance in the future. And one of the points was related to JMS.

The design for that project used JMS to exchange data between an ERP and a custom Java system responsible for populating other systems. That was the biggest bottleneck. The performance was critical, and it was failing to deliver some of the messages.

So, the first thing to analyze was the amount of data transmitted on a daily basis: more than 40 GB. Looks bad... Then I picked the biggest file: around 10 MB. Basically, if you work with persisted messages, the data segments within the JMS store have a predefined size, and there is a predefined number of files that may or may not rotate when they're full. These values can be changed, of course, but they had left the defaults in place. So, when a 10 MB message was about to be stored in a data segment smaller than 10% of the message's size, it was failing.

The final document presented two scenarios. For the first one, "make it work as it is", the recommendation was to set a huge data segment size. The second one was to make the JEE application responsible for the integration act as an FTP hub, fed by JMS notifications:

  • For messages below a certain amount of data, let's say 100 KB, the message would contain the real data, and the process would stay the same;
  • Above that, the application would use FTP to get the data and send it to the target system, database or file system.
Some other recommendations about the use of JMS were included in the final document, like running concurrent processes instead of just one. But one of the issues they were having was related to the availability of the target JMS provider. In fact, they were thinking about it the wrong way around. The application sending the messages was running in an application server, so it should send to a JMS queue within that same application server. Then, if the target consumer is not available, the sender doesn't stop; the consumer is responsible for going to the source and fetching the messages. With this setup, you won't have any trouble. It works perfectly!
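Just to illustrate the pattern (a minimal sketch, not the project's code), a sender posting to a queue hosted in its own application server could look like the snippet below; it uses the standard javax.jms API, and the JNDI names jms/ConnectionFactory and jms/OrdersQueue are assumptions that depend on how the application server is configured:
import javax.jms.*;
import javax.naming.InitialContext;

public class LocalQueueSender {
    public static void main(String[] args) throws Exception {
        // JNDI names are hypothetical; they depend on the application server setup
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/OrdersQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            // Small payloads travel inside the message itself;
            // bigger ones would carry only a reference (an FTP location, for instance)
            TextMessage message = session.createTextMessage("<order id=\"42\"/>");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}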

I also created a PoC to show them what we were talking about, and how to implement it. When I arrived there, JMS was being questioned, and they were even considering replacing it with web services, because it was failing. With that kind of architecture, Web Services would have become the "big bad wolf" sooner or later. Anyway, when I finished my assignment, JMS was in a good light again.

In summary, JMS is a really powerful tool. But you have to apply it with some premises in mind:
  • Is the communication asynchronous?
  • What kind of information is being loaded into the messages?
  • Is it confidential information? Not a good idea to use it on queues!
  • What is the amount of data, per message? Focus on the biggest messages;
  • And the final question is: do you really need JMS? I mean, I once saw a design where JMS was supposed to be used as a logger for a standalone application. The person responsible for it wasn't able to answer this question with any good, reasonable arguments.
During the architecture design, we should bring up discussions about the peaks in daily/monthly processing, and about dependencies. You might need to take this information into account in your final design, for the eventual concurrency setup and for the strategy to adopt for distributed transactions.

JMS can be a really nice addition to your projects, but the project should require it, and not the opposite.

Thursday, July 11, 2013

A bit of AWS

I had always been reading about Amazon Web Services and liked the idea of having remote services to build a complete cloud solution, but I left it as a side note to try later.

Well, today I decided to try it. I created my AWS free account, and was really impressed with the amount of features and components available to build solutions!

My intention here is not to explain what AWS is, since there are lots of websites, including the AWS one itself, with tutorials and videos explaining it. I'd like to show what I've done, leaving a general picture of what I did, how I did it, and the hidden tricks in the configurations.

I decided to focus on the main features, which for me means a Linux instance on the cloud, a relational database, an application server and the network configuration to make it all accessible from my local environment.

I tried two different approaches, and so I'll explain these two scenarios, just to show the freedom you have to architect and specify your projects on the cloud using AWS.


Scenario 1

Within this scenario, I'll show how to create and use an EC2 instance. In other words, I'll pick an Ubuntu instance on the cloud. There are also other Linux options, and even Windows. The setup is straightforward, not a big deal. But let's keep it simple.

Go to the Management Console and click on EC2. From there, hit the Launch Instance button.

While creating the instance, you should have created a Key Pair. Don't lose your local key, and set its permissions so that only you can read it:
chmod 400 <your key name>.pem
Go to the Network Interfaces, and find your Public DNS.

To access the Ubuntu via SSH, you must use the ubuntu user, your local key pair, and the public DNS:
ssh -i <your key name>.pem ubuntu@<your public DNS>
Done! Now you're accessing your remote ubuntu instance!

From there, I installed tomcat7:
sudo apt-get update
sudo apt-get install tomcat7
Just to make sure it's running:
ps -ef | grep tomcat
Now you have to open port 8080 to make it available to the outside world:

  • First you have to go to Security Groups, and on Inbound tab, add the port 8080;
  • Apply changes and go back to Instances;
  • Right-click your instance and click Connect - get the IP for the Public DNS;
  • Open another tab on your browser and look for the port 8080 on your Public DNS.

If you get the It Works! message from Tomcat, you're good to go! You can now access your Linux instance remotely, and reach port 8080 without any problem.

I'm not going far with this scenario. I just wanted to show how to create the OS instance and access it from your local machine. I set up Tomcat just to show that it's necessary to configure the authorization to access the specified Tomcat port externally.

Even though it's simple, from there you can expand your environment as you need and want. For example, having the basic environment created, you can just install a database and a version control component (SVN, GIT) on your Ubuntu, add Jenkins, update the configurations and quickly you can have an integrated development environment.

Replicate the environment, and you can create the QA and Production environments. Implement promotion processes and have a full environment ready.


Scenario 2

For this scenario, I decided to pick different components available on AWS: the database and the application server. Note that you won't need the EC2 instance for this scenario.


Application Server - Elastic Beanstalk

Go to Elastic Beanstalk and follow the steps to create a new Application Server instance. In no time you have a new App Server running and accessible! I selected Tomcat 7.

Compared to having it tied to the OS, the option to have it independent from the EC2 Ubuntu instance is extremely interesting. And working together with the Eclipse plugin, you can deploy to it directly.


Database - RDS

Go to RDS and click on Launch a DB instance button.

With several options, including MySQL, Oracle and SQL Server, I picked MySQL - it's just a test. Basically, you can access it from your local machine using an SSH connection, and it's pretty easy to do from MySQL Workbench. If you have a MySQL instance running on your box, it's easier to turn it off first. On Linux, just type a command like:
mysqladmin -u <root> -p<password> shutdown
To configure the SSH tunnel on MySQL Workbench, click on New Server Instance:

  • You can get the Address from your DB Instance: when you click on the details, check the address in the Endpoint field;
  • Follow the next steps, and done! You have now access to your database on the cloud.
Don't forget to download Connector/J from the MySQL website. This is the official JDBC driver for MySQL.


Create a sample table

From MySQL Workbench, create a simple table to be used as an example:
create table grosseries (
  code    int,
  name    varchar(50),
  date    datetime
);

Test the connection from your local Eclipse

Create a Java Project, a class called Connect, and use a code like the following one:
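Something along these lines should work - a minimal sketch, assuming Connector/J 5.x is on the build path; the endpoint, database name, user and password are placeholders to be replaced with your own values:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Connect {
    public static void main(String[] args) throws Exception {
        // Placeholders: use your RDS Endpoint, database name, user and password
        String url = "jdbc:mysql://<your endpoint>:3306/<your database>";

        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(url, "<user>", "<password>");
        try {
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("select code, name from grosseries");
            while (rs.next()) {
                System.out.println(rs.getInt("code") + " - " + rs.getString("name"));
            }
        } finally {
            conn.close();
        }
    }
}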

Just a heads-up on the JDBC URL:
  • To get the server name, go to your instance and click on top of the row. The details will be presented. Get the name from the EndPoint field;
  • The database name is the same you created during the installation.

Now it's just a matter of running it and checking if you get any results on your console.


Install the Eclipse Plug-in

In order to create an Eclipse project and deploy it to Elastic Beanstalk, it is necessary to install the AWS Plugin for Eclipse:
  • On Eclipse, select the option Install New Software under the menu option Help;
  • Type http://aws.amazon.com/eclipse in the Work With field and click the Add button;
  • Type a name for the installation - I used AWS Plugin and hit the Ok button;
  • Select the top option and follow the next steps until the plugin is installed;
  • Restart Eclipse and configure the plugin:
    • Open the AWS Management Perspective;
    • On EC2 Instances tab, hit on the link to login with your account;
    • To create/view your Access Key, and get your Account Name and Secret Access Key, go to: https://portal.aws.amazon.com/gp/aws/securityCredentials
    • Type your information, and connect - you must see your EC2 Instance listed below.

Test the Deployment to the Elastic Beanstalk

I'll not list all the steps to create a Java project here. Feel free to create anything web-based, so you can access it from a web browser.

The idea here is to create a local project on your Eclipse and then deploy to AWS Elastic Beanstalk. In order to accomplish that, the following rules have to be considered:
  • Create an AWS Java Web Project in Eclipse;
  • Use a local Application Server for local tests;
  • When the application is ready to go to AWS, follow the steps below:
    • On the Project Explorer, right-click your project and select the option Deploy to AWS Elastic Beanstalk, under the Amazon Web Services option;
    • Select the correct server and version and hit Next;
    • Type the application name and environment name and hit Next;
    • Set the option Deploy with a key pair;
    • Hit Finish - it took me some time to deploy my application.

Now go to your Elastic Beanstalk on AWS and check the results.

Note that you can also just manually create an application on AWS Elastic Beanstalk and upload the WAR file.


Conclusions

I kept it simple. But the capabilities are impressive! You can create a highly available environment with load balancing, set up regular backups, build huge solutions and count on more resources as you need them, monitor... Anyway, I could make an enormous list of features and benefits, but the message is just one: you can find anything you need to build a solid solution on the cloud with AWS. Of course, there's a price to be paid in the end. So, my 50 cents: start with the minimum, and grow as you need.

Thursday, June 27, 2013

The Misunderstanding about SOA

SOA, or Service-Oriented Architecture, is not just a couple of Web Services!

Many times I see products being sold with an SOA label, or people on LinkedIn talking about their experience with SOA, but when you ask about it, the answer is usually the same: the product implements some web services, or the person knows how to develop web services. That drives me crazy!

Other people think they implement SOA just because they have web services orchestrated by an ESB (Enterprise Service Bus) tool or a BPM (Business Process Management) tool. That's better than just talking about web services, but it still misses the bigger context.

OK, SOA is somehow tied to the service concept. But it sits in a bigger scenario, and I'm not only talking about service orchestration; I'm talking about a top-down approach across the whole organization: SOA Governance.

In 2009 I had to deal with a SOA Governance project, to define a SOA Center of Excellence. At that time, I didn't have a clear picture of SOA, just some ideas and concepts. I also couldn't find any good reference, or someone who could help me define it. The best approach, the one I based my proposal on, came from The Open Group. At that time, they only had a work-in-progress document about SOA Governance, based on the COBIT and ITIL methodologies.

Basically, the idea was to apply a SOA Governance methodology together with the Project Management and Development methodologies. And that's the big picture, the recipe to structure SOA. Of course, the amount of detail in each methodology is really impressive, and they are all related to each other. Also, it depends on what the company has already adopted - you can't just throw away everything the company uses and reimplement everything with the new methodologies. You have to adapt, and have a detailed plan to make it possible.

You should try it. Ask anyone about SOA. You'll find lots of "experts" that can talk for hours about... web services!

Version Control

Every time I'm starting a new project, I take a look at the web, talk to friends and check in magazines what the latest hot technologies are that I could apply to the project. Of course, I can't apply everything - it's a risk - so I pick what I can depending on several factors, deadlines being the determining one.

And that's how I started moving forward with version control. My first experience was with a Borland tool; I can't remember the name, but I remember that it came bundled with the Delphi 4 package, if I'm not mistaken about the version. Anyway, there was no flexibility at all, and the tool was full of bugs. But we got used to them, and I used it for a while.

With Java projects, I initially started using CVS, or Concurrent Versions System. In 2001 I participated in a project that required trustworthy version control, since we had three distinct teams, located in two different cities, working together on the same project. And CVS responded as expected. We had some minor issues with renaming files, and when we worked on the same file, but the benefits of using CVS on that project were unquestionable.

I don't remember exactly when I tried Apache Subversion (SVN), but when I tested the branching, I left CVS behind and never used it again. The tools are much better and widely supported - the Eclipse plugin, for example, makes it easy to use.

I kept working with SVN until 2011, when I tried out Git. I remember checking out the speed of the commits, and the ability to easily and quickly have concurrent local branches, commit some of them, and push them to the central repository. The merging also impressed me. Much better than SVN for what I was doing! I know, it's not only that, but seeing how fast and easy it was compared to SVN... I moved on to Git.
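Just to illustrate (a minimal sketch with made-up branch and remote names), the local-branch workflow that won me over looks like this:
git checkout -b feature/login      # create and switch to a local branch
git add .
git commit -m "Add login form"     # commits are local and fast
git checkout master
git merge feature/login            # merge the local branch back
git push origin master             # only now does anything reach the central repository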

And when we talk about Git, we can't forget to mention GitHub. I just published my first repository there. The idea of having a Git repository hosting service available to everyone is awesome!

When considering a version control solution for a new project, there are, for sure, lots of good options. And the decision on which one to pick must be tied to the technology (like Team Foundation Server for a .NET project), the project requirements and the preferences of the architect and developers. Also, if it's new to you and your team, spend some time testing it. Make sure you're selecting a solution that's easy to work with.

Shell Scripts

We can't deny scripting is powerful, depending on what we need to do. But some people look at bash scripts as a really complex technology. Early this year, in 2013, for example, I was asked to help with a proof of concept for a specific client. The team was already working on the main tasks, but they were struggling with some validations and with automating the handling of the PoC files.

So, to validate the files and their contents, I wrote a couple of Java programs. But for the file automation (copy/uncompress/move to another file system - Hadoop), with some extra validations, I decided to write a generic bash script on the Linux box. Here are some really simple chunks of that code, to show how easy it is to use and how it solves some issues in a simple way:

  • Before starting to write the main logic, it's necessary to define the global variables. For example:
    ### Set main directory
    rootDir=/
    targetHadoopDir=/user/jdoe/client/data/
    tempDir=/home/jdoe/tmp/
  • The first action is checking whether a parameter was passed on the command line. If not, a message is printed and the execution stops:
    ### check if there's a directory name passed as parameter
    if [ $# -ne 1 ]
    then
      echo "Need a directory as a parameter!"
      exit 1
    else
      rootDir=$1
    fi
  • After that, it was necessary to check whether a temporary directory exists. If it's there, the script cleans it up; if not, it creates the directory. The call to the subroutine responsible for this logic is made just by using the subroutine name:
    ### check if a temporary directory exists
    checkTemp
  • And the routine has to be defined before the main logic:
    ### check if temp directory exists, and create it or clean it
    checkTemp() {
      if [ -d "${tempDir}" ]
      then
        rm -r $tempDir
        if [ $? -gt 0 ]
        then
          ### error
          echo "Error removing directory '$tempDir'!"
          exit 1
        fi
      fi
      mkdir $tempDir
      if [ $? -gt 0 ]
      then
        ### error
        echo "Error creating temp directory '$tempDir'!"
        exit 1
      else
        echo "Temp directory '$tempDir' ready to be used!"
      fi
    }
  • You might have noticed how simple it is to run a command and check whether it succeeded. Another tip is to print the steps and comment your script. Try to be clear, since you might forget later why you were performing certain steps;
  • So, the last step in the main logic is to process the files. In fact, all the steps above were basically preparation for processing the files. In the main logic I called the main subroutine, using the root directory as a parameter:
    ### run down the directory tree and work with the files
    processDir $rootDir
    echo "Done!"
  • This routine is recursive. It goes through all the entries under the given directory, calls itself if an entry is a directory, or processes the entry if it's a file:
    ### process directory search, recursively
    processDir() {
      for file in $1/*
      do
        if [ -f "$file" ]
        then
          ### it's a file
          processFile $file
        else
          if [ -d "$file" ]
          then
            ### it's a directory
            processDir $file
          fi
        fi
      done
    }
I won't write out the whole processFile routine and all the other subroutines it calls here, but I'll point out some interesting pieces of code in it:
  • How to get the parameter within the subroutine:
    ### process each file
    processFile() {
      fileName=$1
  • Comparing the file name against a string:
    if echo $fileName | grep -q "_food_"
    then
      prefix="f"
      echo "  it's a food file!"
    else
All the other steps are basically Linux and Hadoop commands to safely perform the copy.
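Just to give an idea (this is not the original routine, only a sketch reusing the variables above), the copy into Hadoop boils down to something like:
    ### copy the processed file into the target Hadoop directory
    hadoop fs -put "$fileName" $targetHadoopDir
    if [ $? -gt 0 ]
    then
      echo "Error copying '$fileName' to Hadoop!"
      exit 1
    fi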

After performing some tests and adjusting some minor issues, I left it running for the whole weekend, and got the results on the following Monday.

The PoC ended up being a success, and this step in the process, which could have become a huge manual task, was solved with a simple bash script.

So, in my opinion, if you are thinking of a solution for repetitive tasks or to orchestrate complex tasks, you should consider writing a script. That can be the easiest and most efficient solution.