Integrating GeoTools into gvSIG CE (II)

The gvSIG CE codesprint took place last week in Munich. Víctor and I had a great time working with everyone there, especially with Ben and Fran, the other developers.

Although we couldn’t devote as much time as we wanted to the GeoTools integration, some progress was made on the technical side and lots of things were discussed. The most important conclusions I drew during the week are:

  • We need to investigate the cost of the integration further. We produced a TODO list that must be explored before we can have an accurate idea of that cost. It can be seen at the end of this post.
  • Creating an adapter layer from the current gvSIG API to the new GeoTools-based one would reduce the incompatibilities with existing extensions. However, creating and maintaining such an adapter is quite expensive, and it was one of the things that slowed me down most during the codesprint. As Fran suggested, it would be cool to have a migration guide from the old API to the new one, but backwards compatibility at the development level will not be considered from now on.
  • There is a will to collaborate between people from the two forks. Starting with the next CE version, we’ll share NavTable, and OpenCADTools will also be integrated. Hopefully, the collaboration around these projects will be only the beginning of a tighter collaboration.

In the same branch where we worked on the GeoTools integration, we did some other tasks that also improve the development experience, which we consider very important.

One of these tasks is the Maven management: we are adding Maven support to all gvSIG projects. With this improvement it is no longer necessary to care about transitive dependencies and, since these dependencies are automatically managed by Maven, it is not necessary to keep them in the repository either. Thus, not only do we obtain a much cleaner way of managing the dependencies and the configuration of the Eclipse projects, but we also remove one of the heaviest sets of files in the repository. However, the Maven management is not completely done: it is still necessary to configure Maven to generate the plugins and artifacts needed to build a complete gvSIG distribution.

Another important task is the cleanup work: we removed a lot of unused features, dead code, extensions and more. We also formatted the code and organized the imports of all Java files in order to have much more readable source code with fewer warnings (indeed, I get confused whenever I have more than 15.000 warnings; I know, I’m getting older). With these improvements we, on the one hand, remove useless code that cannot realistically be maintained by the gvSIG CE team and, on the other hand, obtain much clearer source code. To quantify the weight lost with these changes, we counted the number of source code lines in .xml and .java files, as well as the total size in MB of the applications, extensions, frameworks and libraries folders:

Before the Code Sprint

  • 153.113 lines in .xml files.
  • 1.313.508 lines in .java files.
  • 543MB in all files.

After the Code Sprint

  • 149.179 lines in .xml files.
  • 1.181.750 lines in .java files.
  • 303MB in all files.

So, in percentages, we managed to reduce the following amounts:

  • 2.57% in .xml files.
  • 10.03% in .java files.
  • 44.20% in total MB.
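
The percentages above follow directly from the before and after figures. A minimal Java sketch to reproduce them (the class and method names are made up for illustration):

```java
import java.util.Locale;

// Hypothetical helper, only used to check the arithmetic of the figures above.
class SizeReduction {

    // Relative reduction from "before" to "after", as a percentage.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, ".xml lines:  %.2f%%",
                reductionPercent(153_113, 149_179)));
        System.out.println(String.format(Locale.ROOT, ".java lines: %.2f%%",
                reductionPercent(1_313_508, 1_181_750)));
        System.out.println(String.format(Locale.ROOT, "total MB:    %.2f%%",
                reductionPercent(543, 303)));
    }
}
```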

The next steps will be to go on estimating the cost of the GeoTools integration. I plan to dedicate two or three days before the end of the month to the TODO list to gain better visibility of what remains to be done. Then, hopefully, we’ll have enough information to make some decisions.

As for the aforementioned TODO list, we’ve identified the following tasks:

1.- Replace the adapter layer for DataStores.
This involves removing the adapters completely. We thought about creating an adapter layer that maps calls to the old gvSIG adapter layer onto the new GeoTools-based one, but we found some problems:

  • The layer is very big: it would have to cover reading as well as writing operations.
  • The layer has random access to the features, while GeoTools has only sequential access.

We have decided that investing in this layer is not worth it. We prefer to remove the old API completely and adapt the extensions. This will lead to incompatibility with extensions maintained for Association gvSIG, but in the case of NavTable and OpenCADTools there seems to be some interest in decoupling the data access layer so that 95% of the code is compatible.
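
To give an idea of why the random versus sequential access mismatch is a problem, here is a minimal sketch of what an adapter would have to do. It assumes the underlying source can only be read from the beginning; a plain java.util.Iterator stands in for the GeoTools iterator, and the class name is hypothetical, not gvSIG or GeoTools code:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical wrapper: emulates random access on top of a source
// that can only be scanned sequentially from the start.
class RandomAccessOverSequential<T> {
    private final Supplier<Iterator<T>> scanOpener; // opens a fresh sequential scan
    private long visited = 0;                       // elements read so far, over all calls

    RandomAccessOverSequential(Supplier<Iterator<T>> scanOpener) {
        this.scanOpener = scanOpener;
    }

    // Returns the element at the given position by scanning from the start,
    // so each call reads index + 1 elements instead of just one.
    T get(int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("negative index: " + index);
        }
        Iterator<T> it = scanOpener.get();
        T current = null;
        for (int i = 0; i <= index; i++) {
            if (!it.hasNext()) {
                throw new IndexOutOfBoundsException("no element at " + index);
            }
            current = it.next();
            visited++;
        }
        return current;
    }

    long visited() {
        return visited;
    }

    public static void main(String[] args) {
        List<String> features = Arrays.asList("f0", "f1", "f2", "f3");
        RandomAccessOverSequential<String> view =
                new RandomAccessOverSequential<String>(features::iterator);
        System.out.println(view.get(3));
        System.out.println(view.get(1));
        System.out.println("elements read: " + view.visited());
    }
}
```

In this toy run two lookups already cost six reads (four for position 3, two for position 1). A real adapter would need caching or indexing on top, which is part of what makes the layer expensive.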

– There are many points where the instance of the adapter is checked. This can be implemented by methods at the FLayer level (isFile, isDB, etc.). The layer gets the information from the DataStore or at construction time.
– Remove the adapters and see how many compile errors we get. We can compare it with the CRS replacement, where we got hundreds of errors and it only took 30 hours to get it compiling again (there is pending effort there).
– Adapt selection to DataStores (http://docs.geotools.org/latest/userguide/library/main/filter.html#handling-selection)
– Editing. Good point. Research on GT is necessary here.
– Intelligent usage of DataStores. They are heavy objects. They should be cached, shared between layers and closed when not used anymore (or when not used for some time). SourceManager in libFMap is a very primitive implementation of this.
– FLyrWFS could be just a FLyrVect with a WFS DataStore on it.
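
The "intelligent usage of DataStores" point could be approached with a small reference-counted cache. The following is only a sketch under assumptions: the class name is hypothetical, any AutoCloseable stands in for a DataStore, and closing after an idle period is left out:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: one shared store per key, closed when the last user releases it.
class SharedStoreCache<K, S extends AutoCloseable> {

    private static final class Entry<T> {
        final T store;
        int refs;
        Entry(T store) { this.store = store; }
    }

    private final Map<K, Entry<S>> entries = new HashMap<>();
    private final Function<K, S> opener; // knows how to open a store for a key

    SharedStoreCache(Function<K, S> opener) {
        this.opener = opener;
    }

    // Returns the shared store for the key, opening it on first use.
    synchronized S acquire(K key) {
        Entry<S> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<S>(opener.apply(key));
            entries.put(key, entry);
        }
        entry.refs++;
        return entry.store;
    }

    // Gives the store back; the last release closes and forgets it.
    synchronized void release(K key) {
        Entry<S> entry = entries.get(key);
        if (entry == null) {
            return;
        }
        if (--entry.refs == 0) {
            entries.remove(key);
            try {
                entry.store.close();
            } catch (Exception e) {
                throw new IllegalStateException("could not close store for " + key, e);
            }
        }
    }

    synchronized int openCount() {
        return entries.size();
    }

    public static void main(String[] args) {
        SharedStoreCache<String, AutoCloseable> cache =
                new SharedStoreCache<String, AutoCloseable>(
                        key -> () -> System.out.println("closing " + key));
        cache.acquire("roads.shp"); // first layer using the file
        cache.acquire("roads.shp"); // second layer shares the same store
        cache.release("roads.shp");
        System.out.println("open stores: " + cache.openCount());
        cache.release("roads.shp"); // last user gone, store is closed
        System.out.println("open stores: " + cache.openCount());
    }
}
```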

2.- Remove libDriverManager, remove writers, remove GDBMS.
We should remove all these projects and see how many compile errors we get. We can compare it with the CRS replacement, where we got hundreds of errors and it only took 30 hours to get it compiling again (there is pending effort there).

– It is still necessary to show the user a list of available DB connectors, file formats with writing capabilities, etc. So, in place of the drivers, a “Source” abstraction must be built.
– To connect to databases it is necessary to create connections, query all the available tables, etc. We are not sure whether this can be done with GT; it needs to be checked. If it is not possible, maybe a specialization of Source called DBSource should be built to manage that.
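
As a rough illustration of the “Source” idea, the sketch below shows one possible shape. Everything in it is an assumption: apart from the Source and DBSource names mentioned above, the methods, the registry and the sample sources are invented for the example and nothing is decided yet:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical registry that the UI could query instead of the old drivers.
class SourceRegistry {
    private final List<Source> sources = new ArrayList<>();

    void register(Source source) {
        sources.add(source);
    }

    // Names of the sources that can be offered as export targets.
    List<String> writableNames() {
        List<String> names = new ArrayList<>();
        for (Source s : sources) {
            if (s.canWrite()) names.add(s.name());
        }
        return names;
    }

    // Names of the sources that are database connectors.
    List<String> dbConnectorNames() {
        List<String> names = new ArrayList<>();
        for (Source s : sources) {
            if (s instanceof DBSource) names.add(s.name());
        }
        return names;
    }

    public static void main(String[] args) {
        SourceRegistry registry = new SourceRegistry();
        registry.register(new FileFormatSource("Shapefile", true));
        registry.register(new FileFormatSource("SomeReadOnlyFormat", false));
        registry.register(new DBSource() {
            public String name() { return "PostGIS"; }
            public boolean canWrite() { return true; }
            public List<String> listTables(String connectionUrl) {
                return Arrays.asList("roads", "parcels"); // canned answer for the demo
            }
        });
        System.out.println("writable: " + registry.writableNames());
        System.out.println("databases: " + registry.dbConnectorNames());
    }
}

// What the application needs to know about any data source (hypothetical).
interface Source {
    String name();
    boolean canWrite();
}

// Specialization for databases: can also list the tables behind a connection.
interface DBSource extends Source {
    List<String> listTables(String connectionUrl);
}

// Simple file-format source used in the demo above.
class FileFormatSource implements Source {
    private final String name;
    private final boolean writable;

    FileFormatSource(String name, boolean writable) {
        this.name = name;
        this.writable = writable;
    }

    public String name() { return name; }
    public boolean canWrite() { return writable; }
}
```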

3.- Remove all raster stuff under FLyrRaster.
– Analogous to the adapter layer removal.

4.- Make layers become structured data object holders.
They don’t hold any functionality related to data. They will just provide a DataStore, a GridCoverage, a GeoTools WMS client, etc., together with the name of the layer and its relation to the other layers (parent, sibling, etc.).

– There is no need for most of the FLyrVect subclasses, since the difference between file and DB is at the DataStore level. As holders of a DataStore, a file-based layer is exactly the same as a database-based one. So VectorialDBLayer, VectorialFileLayer, etc. should be removed.
– Some subclasses just add some functionality, like VoronoiAndTinInputLyr. This can be implemented like the adapter system in Eclipse EMF generated classes, and can be easier for persistence than subclassing.
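
A minimal sketch of a layer as a plain holder with attachable adapters, loosely in the spirit of the Eclipse adapter idea mentioned above. The class, interface and method names are hypothetical and a String stands in for the real data object:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical holder: a layer only carries a name and a data object
// (a DataStore, a GridCoverage, a WMS client...), plus optional adapters.
class DataLayer<D> {
    private final String name;
    private final D data;
    private final Map<Class<?>, Object> adapters = new HashMap<>();

    DataLayer(String name, D data) {
        this.name = name;
        this.data = data;
    }

    String getName() { return name; }
    D getData() { return data; }

    // Attaches extra behaviour without subclassing the layer.
    <A> void addAdapter(Class<A> type, A adapter) {
        adapters.put(type, adapter);
    }

    // Looks up extra behaviour by type; empty if this layer does not have it.
    <A> Optional<A> getAdapter(Class<A> type) {
        return Optional.ofNullable(type.cast(adapters.get(type)));
    }

    public static void main(String[] args) {
        DataLayer<String> layer = new DataLayer<String>("parcels", "placeholder for a DataStore");
        layer.addAdapter(VoronoiInput.class, () -> 42);
        layer.getAdapter(VoronoiInput.class)
             .ifPresent(v -> System.out.println("points: " + v.pointCount()));
        System.out.println("has Runnable adapter: "
                + layer.getAdapter(Runnable.class).isPresent());
    }
}

// Hypothetical capability, standing in for what a subclass such as
// VoronoiAndTinInputLyr adds today.
interface VoronoiInput {
    int pointCount();
}
```

With this shape, persistence only has to deal with one layer class plus a list of adapters, instead of a hierarchy of subclasses.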

5.- Include OGR DataStores: OGR providers available as GT DataStores.

6.- Include image-io-ext providers: GDAL providers available as GT GridCoverages.

7.- Mavenization of build.
– We can take advantage of the efforts done in 2.0.
– It will allow us to run tests easily, produce coverage reports and Sonar reports, keep dependencies out of the SVN repository (faster checkouts and commits), put an integration server in place, etc.

8.- Tests (lots of)
– Even if it is not the perfect indicator, we should produce a coverage report in order to motivate people to add tests.
– We could allow any refactoring (in the strict meaning of the word) that extracts some functionality and covers it with tests. Even if something breaks, it’s better in the long term.

9.- Persistence through extensible schemas.
– Create an extensible schema.
– Keep the methods that read old projects and adapt them to the new changes. Add an “Import old project” option that calls these methods.

10.- Recover WMS Dimension support (lost after Fran’s WMS refactoring)

11.- Data access abstraction layer for OpenCADTools and NavTable

12.- Andami simplification
– Make all plugins work under the same classloader -> Faster startup time.
