Maxim Gubin's Tech Blog

Thursday, April 4, 2019

Adobe AEM Content Syncing

Background

Over the last couple of months I’ve been working on a migration project from AEM 6.2 to 6.4. One thing we struggled with was maintaining content parity between the existing and the newly created environments. The issues arose because the existing tools (vltrcp, Grabbit, packages) did not work well for us. Each one has limitations that prevented us from relying on any single tool. The main issue was that whenever someone asked us to mirror the current production environment, it could take on the order of two days. It was a real time sink, and by the time the sync was done the prod content had already changed and no longer matched.

At first we looked at vltrcp, which proved error prone and quite slow, taking on the order of days for a single full sync (content and images). Delta syncs had other issues too, such as requiring us to remove the ordering of the nodes in some instances. That was a problem because we needed the content to keep the same order within the jcr:content/par nodes.

Then we looked at TWC’s Grabbit tool, which we found quite awesome in theory. It was super fast at syncing plain content, since it did RCP-style syncs that serialize the data into tiny bits using Google Protobuf. It was still slow at syncing binary content, since that can't really be compressed. In practice, though, it had a lot of issues keeping content synced correctly. It would skip over nodes when using the delta option, primarily because of the way the delta sync works: it checks the timestamp of the last run and only copies nodes whose timestamp is newer than that. So if a job failed to finish, the next delta run would not copy the content that the failed run never got to. It also had issues with full syncs: the DAM still took a substantial amount of time (12-14 hours; the DAM is approximately 80-90 GB).
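To make the delta problem concrete, here is a minimal Python sketch of a timestamp-only delta sync. It is not Grabbit's code: the function name, the dictionaries standing in for repositories, and the assumption that the last-run timestamp is recorded even when a job fails are all hypothetical, chosen only to reproduce the behavior described above.

```python
def delta_sync(source, target, last_run, now, fail_after=None):
    """Copy nodes modified after last_run and return the new last-run timestamp.

    source and target map node paths to a modification timestamp.
    fail_after simulates a job that dies after copying that many nodes.
    """
    candidates = sorted(path for path, modified in source.items() if modified > last_run)
    for copied, path in enumerate(candidates):
        if fail_after is not None and copied >= fail_after:
            break  # simulated mid-job failure
        target[path] = source[path]
    # Assumption: the run timestamp is recorded even though the job failed.
    return now


source = {"/content/a": 5, "/content/b": 6, "/content/c": 7}
target = {}

# First run dies after copying one node.
last_run = delta_sync(source, target, last_run=0, now=10, fail_after=1)
# Second run only looks at nodes modified after timestamp 10, so it copies nothing.
last_run = delta_sync(source, target, last_run=last_run, now=20)

missing = sorted(set(source) - set(target))
print(missing)
```

The two nodes the failed run never reached stay missing, because nothing about them is newer than the recorded run time. Comparing the actual path lists of both environments avoids depending on that timestamp at all.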

Eureka Moment

Thinking about this problem, I realized that there might be a better way. We have tools like the ACS Commons Query Packager, and we also have an HTTP-based tool that AEM provides OOTB: the querydebug/QueryBuilder tool.

What we ended up doing was an initial sync via a modified Grabbit (we customized it to continue running on errors; otherwise it would stop in the middle of a sync and we had to restart again and again). For subsequent changes we simply used the ACS Commons Query Packager in conjunction with custom Python scripts built on the QueryBuilder API.

This allowed us to figure out exactly what differs between environments, copy over only the missing content in the form of packages, and replicate only those paths without needing any tree activations. Each phase was made into a separate modular script, and the scripts can be chained together.
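The core of the diff step can be sketched in a few lines of Python. This is an illustration only, not the actual createContentDiffList.py: the function names are made up, and the input is assumed to be QueryBuilder-style JSON that has already been fetched from each environment, with every hit carrying a "path" key.

```python
def paths_from_hits(response):
    """Collect node paths from a QueryBuilder-style JSON response (assumed shape)."""
    return {hit["path"] for hit in response.get("hits", [])}


def missing_on_target(source_response, target_response):
    """Return the paths present on the source but absent on the target, sorted."""
    return sorted(paths_from_hits(source_response) - paths_from_hits(target_response))


# Hand-written sample responses standing in for the two environments.
source = {"hits": [{"path": "/content/site/en"}, {"path": "/content/site/en/news"}]}
target = {"hits": [{"path": "/content/site/en"}]}

to_package = missing_on_target(source, target)
print(to_package)
```

The resulting list is what would be handed to the packaging step, so only the paths that are actually missing end up in the package.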

The one tool that we didn't get a chance to explore is oak-upgrade (crx2oak), which could be used to copy content between environments.

Scenarios where using the sync scripts could be useful

• Automated syncs from prod to lower environments on a scheduled basis
• Migration from one AEM author to another and making sure the content is kept in sync until launch time
• Finding any orphaned pages in publishers and removing them

How these scripts can be chained together

Sync between Source Author and Target Author

Find the differences between what exists on the source author and on the target, and generate a package from those differences for any particular path. The package can then be installed via curl, and we can replicate just the paths that were added instead of doing a tree activation, which is a costly operation.

createContentDiffList.py
createPackageFromPaths.py
toggleWorkflows.py - disable the workflows
toggleComponents.py - disable any pre-processor component
curl to upload to author server (maybe should be part of another script or a separate script)
replicatePaths.py or unzipAndReplicateQueryPackagePaths.py
toggleComponents.py - enable any pre-processor component
toggleWorkflows.py - enable the workflows
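The ordering of the steps above matters: whatever gets disabled has to be re-enabled in reverse order, even if the upload or replication fails. A hedged Python sketch of that chaining follows. The toggle, upload and replicate callables are hypothetical stand-ins for the real scripts, which are not shown in this post.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_disabled(feature, toggle):
    """Disable a feature, and always re-enable it when the block exits."""
    toggle(feature, False)
    try:
        yield
    finally:
        toggle(feature, True)


def run_sync(paths, toggle, upload_package, replicate_paths):
    """Run upload and replication with workflows and components switched off."""
    with temporarily_disabled("workflows", toggle):
        with temporarily_disabled("components", toggle):
            upload_package(paths)
            replicate_paths(paths)


# Stand-ins that only record what would have been called.
events = []
run_sync(
    ["/content/site/en/news"],
    toggle=lambda feature, enabled: events.append((feature, enabled)),
    upload_package=lambda paths: events.append(("upload", len(paths))),
    replicate_paths=lambda paths: events.append(("replicate", len(paths))),
)
for event in events:
    print(event)
```

Nesting the two context managers gives the same order as the list: workflows off, components off, upload, replicate, components on, workflows on.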

Find and publish any pages on author that were published but are not on publisher

These are pages on author that are marked as published but do not exist on the publishers. Somehow they were missed, but they should exist on the publishers since they are marked published on author.

createContentDiffList.py using the --source_published True flag
toggleWorkflows.py - disable the workflows
toggleComponents.py - disable any pre-processor component
replicatePaths.py
toggleComponents.py - enable any pre-processor component
toggleWorkflows.py - enable the workflows

Find and unpublish any pages that are marked un-published on author but exist in publisher (orphaned pages)

createContentDiffList.py using the --source_unpublished True flag
toggleWorkflows.py - disable the workflows
toggleComponents.py - disable any pre-processor component
replicatePaths.py - use the --deactivate flag to unpublish instead of publishing
toggleComponents.py - enable any pre-processor component
toggleWorkflows.py - enable the workflows
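Both publisher scenarios come down to comparing the publication state recorded on author with what actually exists on the publisher. A minimal Python sketch of that classification is below. The function name and sample data are hypothetical, and the action strings simply mirror AEM's cq:lastReplicationAction values; how the real scripts detect publication state is not shown here, so treat that part as an assumption.

```python
def plan_publisher_fixes(author_pages, publisher_paths):
    """Split mismatches into paths to activate and paths to deactivate.

    author_pages maps each author path to its last replication action
    ("Activate", "Deactivate", or None if the page was never replicated).
    """
    on_publisher = set(publisher_paths)
    to_publish, to_unpublish = [], []
    for path, action in sorted(author_pages.items()):
        if action == "Activate" and path not in on_publisher:
            to_publish.append(path)    # published on author, missing on publisher
        elif action != "Activate" and path in on_publisher:
            to_unpublish.append(path)  # orphaned page on publisher
    return to_publish, to_unpublish


author_pages = {
    "/content/site/en": "Activate",
    "/content/site/en/news": "Activate",
    "/content/site/en/old": "Deactivate",
    "/content/site/en/draft": None,
}
publisher_paths = ["/content/site/en", "/content/site/en/old"]

to_publish, to_unpublish = plan_publisher_fixes(author_pages, publisher_paths)
print(to_publish)
print(to_unpublish)
```

The first list corresponds to the publish scenario and the second to the orphaned-page scenario, where replication would run in deactivate mode.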

Monday, April 26, 2010

What is going to happen to Java?

This has been discussed by some lately, but I think many people are still wondering and don't ask. Java has been around for quite some time now and has infiltrated many businesses with its simple syntax yet powerful features. A lot of developers still write Java and companies have invested heavily in it, so I feel like it is here to stay. It is definitely starting to lose its luster lately, though, for a few reasons.
One is the state of affairs with Java's creator and long-time guardian of its innovation.
The other is that multiple processors are becoming the norm in computers.

The former has been quite a pain point for the Java community, even before the takeover. The JCP has been relatively slow in keeping up with people's needs. Some of the other emerging languages are giving Java a run for its money in terms of the features and paradigms that they offer.

The Ruby community (rubyflow.com, rubyinside.com) has been gaining a lot of momentum in the last few years and I really like a lot of the stuff that is going on there. There is a lot of innovation, and the community is really growing and contributing to the language's growth. It is evolving at a more rapid pace than any other community as of late.

Erlang has picked up interest lately because of its distributed nature, fault tolerance and high availability, which go hand in hand when dealing with communication systems.

There are definitely some interesting languages emerging on the JVM itself (JRuby of course being one).
We have Clojure, Scala, Groovy, Fantom to name a few.

From what I see, Java is slowly going to be superseded by some of these. I know Groovy is already being used quite a bit, and the fact that we now have a statically typed version of it called Groovy++ is going to pique quite a bit of interest.

Scala is really interesting for me personally, because it is in the unique position of being a hybrid language: it is both object-oriented and functional at the same time. Its speed is nearly identical to that of Java itself.
The Akka framework being developed on top of Scala is the most interesting framework for me these days.

Clojure is really interesting too. It is a nearly pure functional programming language, and its built-in STM gives it a feature that not many functional languages have (I don't know if any do, actually).

Fantom is one of the new kids on the block; yes, I say that even though some of these other languages haven't been around for more than a few years.
It brings an interesting aspect where code written in Fantom can compile for both the CLR (.NET) and the JVM, an interesting way to bridge the gap between the two systems.

I have been reading up on all of these languages because I think they will all be players in the game. In fact they already are. Like I said though, I think Java is here to stay.
Look at COBOL (I really don't want to compare Java to COBOL, but still): how many years later is it still around?

We live in interesting times and I don't think Java developers have to worry about job security, but I do think the circumstances over the last couple of years have spawned some really interesting languages that people should check out or at least be aware of. If you're a Java developer, I would highly recommend learning at least one of the new languages, as learning is always good for you and it also opens you up to interesting new points of view and paradigms that you may never have been aware existed.

Monday, January 18, 2010

My humble opinion on what is OSGi's purpose

There was a recent post on TheServerSide regarding Spring's decision to move Spring dm Server to Eclipse. A lot of the replies sounded to me like people do not really understand what OSGi is for and what its benefits are outside of the embedded realm.

www.theserverside.com post:
http://www.theserverside.com/news/thread.tss?thread_id=59183

Here is what I wrote there:

The responses suggest that many people don't consider OSGi a trivial concept.
And at first it isn't, but after reading a bit about it, it actually makes a lot of sense.
The primary problem that enterprise (as opposed to embedded) OSGi tries to solve is JVM class loading.
The second is separation of concerns.

As far as the JVM class-loading issue is concerned, I'm sure everyone who has worked with Java to a degree has run into JVM classpath issues, whether it's related to having the same package and class names, or having multiple JARs with slight version variations loaded into the JVM memory.

OSGi allows the developer to specify which exact packages should be loaded into the JVM and even allows multiple JARs with let's say different versions to be loaded at the same time.

However, when you import, you are given the option to specify which version of the package you want to import. This provides much more granularity than just specifying the versions of the JAR files themselves.

When you create a jar, it is good practice to specify the versions of all of the packages that you are exporting as well.

As far as the second issue (separation of concerns), OSGi makes you start thinking about your code in terms of modules. A module should do something very specific.
As an example, you may have one module that has a specific JDBC driver for your database. Another one that deals with DAO layer. Another module that deals with exposing your business logic via REST or SOAP.

Furthermore, a good practice is to split each of your modules into an interface-only bundle and an implementation-only bundle. That way you can swap out your implementation with another bundle at any time at runtime, without compromising the contracts defined in the interface bundle.

As for the complexity of dealing with the manifest file, that isn't really an issue: you don't need to edit your manifest file directly.

As Stuart mentioned, if you use Maven, the Apache Felix maven-bundle-plugin will manage the relationships between bundles and packages for you.
You still need to specify which packages you want to import and export.

And another great feature of OSGi is that it allows you to swap out a piece of your functionality without bringing down the server.
In conventional containers the entire web application has to be restarted. With OSGi, nothing is down and only a piece of your code is actually refreshed with newer code.

Thursday, August 27, 2009

Using Groovy to gather a file path list

Just thought I'd share a good use case of Groovy as a basic script.

I created this script to speed up one of our legacy manual processes: gathering the files to be released to another environment (QA, Staging and Prod).
Basically, when we used CVS, we had to list all of the files to be pushed by hand. This script lets you specify the base directory from which to start recursing. You can also specify the prefix to use, since the path where the CVS checkout lives may differ from one computer to another. A third, optional argument restricts the output to file names containing a given string.
The script also skips the CVS and SVN metadata files, and it switches the slashes from Windows style (backslash) to Unix style (forward slash).


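// Builds the output path for a file: strips a leading ".", normalizes
// slashes, and drops everything before the first occurrence of prependString.
def outputMatch = { prependString, file ->
  def fileName = file.path
  fileName = fileName.replaceAll(/^\./, "")
  fileName = fileName.replaceAll("\\\\", "/")
  def prependStringStartPos = fileName.indexOf(prependString)
  // indexOf returns -1 when the prefix is absent; keep the whole path then
  prependStringStartPos < 0 ? fileName : fileName.substring(prependStringStartPos)
}

def baseDirectory = null
def prependString = ""
def lookupString = null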

if (args.length > 0 && args[0] != null && !args[0].equals("")){
  // i.e. C:\dev\cvsrepo\devel\fr_site\frdart
  baseDirectory = args[0]
}

if (args.length > 1 && args[1] != null && !args[1].equals("")){
  // i.e. fr_site/frdart
  prependString = args[1]
}

if (args.length > 2 && args[2] != null && !args[2].equals("")){
  // i.e. frdart
  lookupString = args[2]
}

def count = 0

if (baseDirectory == null)
  baseDirectory = "."

new File(baseDirectory).eachFileRecurse{file ->
  
  if (file.isFile() && 
      !(file.parent.toLowerCase().endsWith("cvs") || file.parent.toLowerCase().endsWith("svn"))){
    if ((lookupString != null && file.name.contains(lookupString))
        || lookupString == null){
      println outputMatch(prependString, file)
      count++
    } 
  } 
} 

println "\nNumber of Files=" + count

Mockito and the Builder Pattern for speeding up fixtures in Unit Tests

Mockito is a great mocking library for the JVM. I love its terseness and all of the functionality that it offers, just like EasyMock.

Problem Description

In one of my projects at my current job, I have been using it to mock out classes in my unit tests of some FreeMarker templates. Technically, these are more like integration tests, but since the library that generates the content is itself well unit-tested, I consider them to be unit tests in effect.

So, I came up with using the Builder pattern in conjunction with Mockito to simplify the creation of fixtures. We have a complex object that consists of many other objects, and in order to test the templates thoroughly we needed to construct many different variants of this object. Using the Agile approach, I began creating many of these variants with helper methods. I kept on refactoring until every hierarchical piece was eventually wrapped up in a separate method.

Just as an example of what I am talking about, I will create an imaginary hierarchical structure that mimics my company's object.

Domain

Let's say that the one object that we are talking about is an object that represents a country.

A country is made up of many states.

A state is made up of many cities.

Let's keep it this simple for the moment.
At each level we will keep track of the total population, the full name of the leader at that post (president, governor, mayor) and the name of the entity.

Classes

Country.java


package com.mzgubin.mockito_builder_example;

import java.util.List;
import java.util.ArrayList;

public class Country {

  private String name;
  private String leaderFullName;
  private long population = 0;
  private List states;

  public void setName(String name){
    this.name = name;
  }

  public String getName(){
    return name;
  }

  public void setLeaderFullName(String leaderFullName){
    this.leaderFullName = leaderFullName;
  }

  public String getLeaderFullName(){
    return leaderFullName;
  }

  public void setPopulation(long population){
    this.population = population;
  }

  public long getPopulation(){
    return population;
  }

  public void setStates(List states){
    this.states = states;
  }

  public List getStates(){
    return states;
  }
}

State.java


package com.mzgubin.mockito_builder_example;

import java.util.List;
import java.util.ArrayList;

public class State {

  private String name;
  private String leaderFullName;
  private long population = 0;
  private List cities;

  public void setName(String name){
    this.name = name;
  }

  public String getName(){
    return name;
  }

  public void setLeaderFullName(String leaderFullName){
    this.leaderFullName = leaderFullName;
  }

  public String getLeaderFullName(){
    return leaderFullName;
  }

  public void setPopulation(long population){
    this.population = population;
  }

  public long getPopulation(){
    return population;
  }

  public void setCities(List cities){
    this.cities = cities;
  }

  public List getCities(){
    return cities;
  }
}

City.java


package com.mzgubin.mockito_builder_example;

public class City {

  private String name;
  private String leaderFullName;
  private long population = 0;

  public void setName(String name){
    this.name = name;
  }

  public String getName(){
    return name;
  }

  public void setLeaderFullName(String leaderFullName){
    this.leaderFullName = leaderFullName;
  }

  public String getLeaderFullName(){
    return leaderFullName;
  }

  public void setPopulation(long population){
    this.population = population;
  }

  public long getPopulation(){
    return population;
  }
}

Mocking
Before using the Builder pattern, I had to create all of the mock objects manually.
(I am excluding imports etc, just because I want to concentrate on the important code)



package com.mzgubin.mockito_builder_example;

import java.util.List;
import java.util.ArrayList;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class Helper {
  // Mocks a single city with a fixed name, population and leader.
  public City getMockedCity(){
    City city = mock(City.class);
    when(city.getName()).thenReturn("New York City");
    // long literal, because getPopulation() returns long
    when(city.getPopulation()).thenReturn(10L);
    when(city.getLeaderFullName()).thenReturn("Michael Bloomberg");

    return city;
  }

  // Mocks a state that contains the mocked city.
  public State getMockedState(){
    List<City> cities = new ArrayList<City>();

    City city = getMockedCity();
    cities.add(city);

    State state = mock(State.class);
    when(state.getName()).thenReturn("New York");
    when(state.getPopulation()).thenReturn(100L);
    when(state.getLeaderFullName()).thenReturn("David Paterson");
    when(state.getCities()).thenReturn(cities);

    return state;
  }

  // Mocks a country that contains the mocked state.
  public Country getMockedCountry(){
    State state = getMockedState();
    List<State> states = new ArrayList<State>();
    states.add(state);
    Country country = mock(Country.class);

    when(country.getPopulation()).thenReturn(1000L);
    when(country.getName()).thenReturn("United States");
    when(country.getLeaderFullName()).thenReturn("Barack Obama");
    when(country.getStates()).thenReturn(states);

    return country;
  }
}

This is quite an easy example, but you can imagine that with more complex objects this becomes quite tedious to maintain as the code and tests change.

Enter the Builder Pattern
The idea really came from Groovy's builders and I initially wanted to create the builders in Groovy. I did not have much time to investigate using Groovy's builders, so instead I chose to go with Java since I am much more familiar with it.
I created a single Builder per class in the following format:


package com.mzgubin.mockito_builder_example;

import org.apache.log4j.Logger;
import java.util.List;
import java.util.ArrayList;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class CountryBuilder {

  private final static Logger log = Logger.getLogger(CountryBuilder.class);

  private String name;
  private String leaderFullName;
  private long population;
  private List states;
  private Country country;

  public CountryBuilder() {
    country = mock(Country.class);
  }

  public CountryBuilder(Country country) {
    if (country != null) {
      this.country = country;
      populateFields(country);
    } else {
      // assign the field, not the constructor parameter
      this.country = mock(Country.class);
    }
  }

  // Seeds the builder fields from an existing Country.
  private void populateFields(Country country) {
    this.name = country.getName();
    this.leaderFullName = country.getLeaderFullName();
    this.population = country.getPopulation();
    this.states = country.getStates();
  }
  
  public String toString(){
    StringBuilder sb = new StringBuilder();
    sb.append("Country:\n");
    sb.append("name=" + name).append("\n");
    sb.append("leaderFullName=" + leaderFullName).append("\n");
    sb.append("population=" + population).append("\n");
    sb.append("states=" + (states == null ? 0 : states.size())).append("\n");
  
    return sb.toString();
  }
  
  public Country build(){
    if (states == null){
      states = new ArrayList();
    }
  
    when(country.getName()).thenReturn(name);
    when(country.getLeaderFullName()).thenReturn(leaderFullName);
    when(country.getPopulation()).thenReturn(population);
    when(country.getStates()).thenReturn(states);
  
    dump();
  
    return country;
  }
  
  public void dump(){
    log.debug(toString());
  }
  
  public CountryBuilder name(String name){
    this.name = name;
    return this;
  }
  
  public CountryBuilder leaderFullName(String leaderFullName){
    this.leaderFullName = leaderFullName;
    return this;
  }
  
  public CountryBuilder population(long population){
    this.population = population;
    return this;
  }
  
  public CountryBuilder states(List states){
    this.states = states;
    return this;
  }
  
  // Adds a single state, creating the list on first use.
  public CountryBuilder state(State state){
    if (this.states == null){
      this.states = new ArrayList();
    }
    this.states.add(state);
    return this;
  }
  
  // Convenience overload: builds the nested StateBuilder and adds the result.
  public CountryBuilder state(StateBuilder stateBuilder){
    return this.state(stateBuilder.build());
  }
}

And I am not going to list the other two Builder implementations of State and City, as they are nearly identical to this implementation.

And now, ladies and gentlemen, voilà!
We no longer need to mess with Mockito too much when we set up the fixtures, except when the Builders themselves need to be augmented.

We can now build a Country fixture object like so:


Country country = new CountryBuilder()
  .name("United States")
  .leaderFullName("Barack Obama")
  .population(1000)
  .state(
    new StateBuilder()
      .name("New York")
      .leaderFullName("David Paterson")
      .population(100)
      .city(
        new CityBuilder()
          .name("New York City")
          .leaderFullName("Michael Bloomberg")
          .population(10)
      ) // end of city
  ) // end of state
  .build(); // end of country

Now, that is nice and clean!

Maybe next time I will show how to do the same utilizing Groovy's Builder pattern!

Just beware that I did not test this code, but hopefully it will compile. I will try and run it later to make sure it at least compiles. If you find any mistakes, please let me know.

Saturday, June 27, 2009

My pitch at trying to get my company to use Groovy alongside Java

This is a post I previously published on my other blog.

The pitch was turned down, since Groovy was deemed too young a language and not enough people know it.
What are your thoughts on this matter?
How would you pitch Groovy to your company if you had to, and would you?

Here is the link to the presentation that I've shown:
http://www.slideshare.net/mzgubin/groovy-finesse-1655111

Rethinking relational database backed domain mindsets

Yes, there are issues with relational databases, but they have been a developer's bread and butter.

Almost every issue has been solved in some form or another.
Data duplication and inconsistency are usually solved by normalization.
The impedance mismatch has been addressed with an ORM layer.
The latest problem, distributing the database, is becoming an issue though. It is emerging as companies try to scale out and use more hardware or VMs to handle the mounting pressure of a growing number of hits.

So what is the solution to this problem? Well, it appears that relational databases are being overtaken by the key/value, or map-reduce type of databases, like: Amazon's Dynamo, Google's BigTable; and the open-source types like: BerkeleyDB, CouchDB, HBase, Hypertable, Tokyo Cabinet, Project Voldemort, Redis, MongoDB and the list goes on.

Object databases (db4o) and graph databases like Neo4j are in a different category.

These provide a way not only to distribute your databases, but also to get the speed and efficiency that is needed. You also get eventual consistency.

There is a drawback to this though: we need to rewire the way we think about domain design. There are no more relations per se, and eventual consistency means there is a chance of inconsistency.
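To illustrate the shift in thinking, here is a small Python sketch that answers the same question two ways. Plain dictionaries stand in for both kinds of store, and the keys, field names and sample data are all made up for the example; no real database API is implied.

```python
# Relational mindset: normalized rows linked by foreign keys.
countries = {1: {"name": "United States"}}
states = {10: {"name": "New York", "country_id": 1}}
cities = {100: {"name": "New York City", "state_id": 10}}


def city_with_country_relational(city_id):
    """Follow the foreign keys city -> state -> country (the 'join')."""
    city = cities[city_id]
    state = states[city["state_id"]]
    country = countries[state["country_id"]]
    return city["name"], country["name"]


# Key/value mindset: one self-contained, denormalized document per key.
kv_store = {
    "city:100": {
        "name": "New York City",
        "state": "New York",
        "country": "United States",
    }
}


def city_with_country_kv(key):
    """A single lookup; the related data is duplicated inside the document."""
    doc = kv_store[key]
    return doc["name"], doc["country"]


print(city_with_country_relational(100))
print(city_with_country_kv("city:100"))
```

Both lookups return the same pair, but the second one trades the join for duplicated data: renaming the country now means updating every city document, and until all copies are updated a reader may see the old name.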

So, how do you go about switching your relational database type of mindset to this one? Do you need to rethink the way you design the structure of your domains?
Maybe this just needs some type of a eureka effect to take place to get this.
Does anybody else have these types of questions?