MDC Project Notes

2005-07-18 Progress Report
The following are now completed;
1.) Configuration and packaging - I have updated the ant build, which now does a complete build of Mdc and creates the appropriate
distribution files. It includes handling the extras to be installed in Exist (such as Apache Fop and Citeproc) and also the procedure
for installing Mdc in a local tomcat installation.
2.) User installation documentation.
3.) Resolver BaseUrl is now configurable and has been updated with the latest url.
4.) Changes to search UI have been done as requested.

2005-07-14 Progress on Federated Search:
The algorithm that I have done should satisfy Howard's requirement. The only thing is, it is in the mdc code as it is intimately tied into the exist cache, hence my question to Mathew re the jafer parallel query - perhaps I can exploit that?

The logic now works as follows;
I use zclient to do the search on each jafer target, but only return the number of hits for each (same as jafer parallel).
I then divide the max_records by the number of targets and apportion the number to be retrieved for each target.
I then loop through each target and retrieve the apportioned number for that target and store them in the cache
I then return an array of the keys for the full set in the cache, so the front-end can further sort/filter.

For example;
Say 3 targets T1, T2 and T3 with hits h1=160, h2=10, h3=42 (max_records = 100);
Then we will retrieve from;
T1 - 33 + 12 + 3 = 48
T2 - 10
T3 - 33 + 9 = 42

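That is, the algorithm divides max_records by ntargets, allocates up to that number to each target from its hits, then loops again, dividing the unallocated remainder among the targets that still have hits, until all of max_records has been allocated.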
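
A minimal sketch of the apportioning described above. The class and method names (FederatedAllocation, allocate) are hypothetical and not taken from the mdc code. It is also an assumption that, when the remainder is smaller than the number of targets still holding hits, records are handed out one at a time in target order.

```java
import java.util.Arrays;

class FederatedAllocation {

    /**
     * Apportions maxRecords across targets given the hit count of each target.
     * Returns how many records to retrieve from each target.
     */
    static int[] allocate(int[] hits, int maxRecords) {
        int[] allocated = new int[hits.length];
        int remaining = maxRecords;

        while (remaining > 0) {
            // Count the targets that still have unallocated hits.
            int active = 0;
            for (int i = 0; i < hits.length; i++) {
                if (hits[i] > allocated[i]) {
                    active++;
                }
            }
            if (active == 0) {
                break; // every hit has been allocated already
            }

            // Equal share for this pass; at least 1 so the loop always progresses (assumption).
            int share = Math.max(1, remaining / active);

            for (int i = 0; i < hits.length && remaining > 0; i++) {
                int available = hits[i] - allocated[i];
                if (available <= 0) {
                    continue;
                }
                int take = Math.min(share, Math.min(available, remaining));
                allocated[i] += take;
                remaining -= take;
            }
        }
        return allocated;
    }

    public static void main(String[] args) {
        // The worked example: h1=160, h2=10, h3=42, max_records=100.
        System.out.println(Arrays.toString(allocate(new int[] {160, 10, 42}, 100)));
        // prints [48, 10, 42]
    }
}
```

The three passes for the example are 33/10/33, then 12/9 for T1 and T3, then the last 3 for T1, which matches the totals listed above.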



2005-07-01 Notes on Federated Search:
I'm busy with configuration now and we need to finalise the logic on how the federated search works and what/how many records are returned in a query.

The present logic is;
1.) A set of jafer targets is configured in a config file (loaded at startup).
2.) Each jafer target has a max records parameter (target-max).
3.) We have a global parameter for max search records returned (search-max).
4.) When searching we apply the following logic;
Call jafer on the 1st target and retrieve up to target-max records, but
stop once the total retrieved reaches search-max.
If not stopped, call the next jafer target as above.

That is, if search-max is say 100 and target-max is 100 for each target and there are say 160 results from target1, you'll only get the first 100 from target1 and target2 won't be looked at.
Or if search-max is 100 and target-max is 50 for both target1 and target2, you'll get the first 50 (of the 160) from target1 and up to 50 from target2, if it has any.
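
The present logic can be sketched roughly as below. The names (SequentialSearch, retrieve) are hypothetical. It is an assumption that a target is never asked for more than the records still needed to reach search-max, and the hit counts stand in for the actual jafer calls.

```java
class SequentialSearch {

    /**
     * Walks the targets in configured order and returns how many records
     * are taken from each, given the hits available on each target.
     */
    static int[] retrieve(int[] hits, int[] targetMax, int searchMax) {
        int[] retrieved = new int[hits.length];
        int total = 0;

        for (int i = 0; i < hits.length; i++) {
            if (total >= searchMax) {
                break; // search-max reached: later targets are not looked at
            }
            // Up to target-max from this target, capped by what search-max still allows.
            int take = Math.min(hits[i], targetMax[i]);
            take = Math.min(take, searchMax - total);
            retrieved[i] = take;
            total += take;
        }
        return retrieved;
    }
}
```

With hits of 160 and 30, retrieve(new int[] {160, 30}, new int[] {100, 100}, 100) gives 100 and 0, while a target-max of 50 for both gives 50 and 30.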


2005-07-01 Progress Report:
We've got the SRW service up and running now with MDC. It does the basic keyword query as we discussed. We'll enhance it as we go. The client side is also now integrated into the MDC Desktop Client. I'm going to be working on the packaging of the mdc app for distribution next week, so I'll also package the SRW stuff I've done along with the appropriate ant tasks, so that you can add it to the Jafer distribution.


2005-06-23 Note on SRW service.
We only have Mathew's "lite" srw client packaged with the desktop. Also, just to be precise, the desktop is actually communicating with the MDC server, which is implementing a web service which in turn uses mdc's current jafer query to search mdc's currently configured targets. Actually, mdc's web service is implemented from the srw api packaged by Mathew as part of the jafer code base.


2005-06-22 Progress Report:
Citeproc (the release version 0.7) is now integrated into Mdc. It uses a general purpose mdc-xhtml stylesheet adapted from one of Bruce's for the output rendering, presently just showing the resource list title as meta information and the resources formatted according to a user selected choice of one of Bruce's CSL scripts. Presently the list of CSL scripts is "hardcoded" in the "Create Doc" stylesheet drop-down - we can include these as part of mdc config when I do that work. Note also that you can select a citeproc stylesheet and PDF output and it generates a PDF document as well!

Try it out at http://oxfordltgdev.org.uk:8080/mdctest/search.jsp


2005-06-07 Progress Report:
I've uploaded the new version to http://www.oxfordltgdev.org.uk:8080/mdctest/search.jsp.
Significant changes are;
1.) Sorted out the author field. This now displays the <displayForm> field if it exists or the <namePart> field if not.
2.) Widths of columns in the grids have been increased (these I can only do as an overall width and then give "hints" as to the widths of the individual columns - the browser still decides on its own actual widths).
3.) Browse Repository tab now instead of popup window. You can "open" a RL or format it into viewable documents (html, pdf, etc).
4.) The "Create Document" functions for xhtml, pdf and mods xml are working from both the Save Resource popup and the Browse Repository tab. Here I am using a "standard" stylesheet for mods3 docs which I created based on the original sample done with mdc1. It uses the cocoon instance installed with exist to do the formatting for xhtml, pdf and mods.
5.) I have reorganised how the RL headers are stored internally in the repository. They are now also stored as mods documents with the <relatedItem> tag being used to reference the constituent resources. This means that you can search the repository and see both RL's and resource components in the results.
6.) Transport classes for Andy with ModsUtils performing extract/formatting functions for use in export/import.
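
The author-field rule in item 1.) above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes the mods <name> element is passed in as a string with unprefixed element names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AuthorDisplay {

    /** Returns the <displayForm> text if present, otherwise the first <namePart>, otherwise null. */
    static String authorOf(String nameXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(nameXml)));
            String display = firstText(doc, "displayForm");
            return display != null ? display : firstText(doc, "namePart");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse name xml", e);
        }
    }

    /** Text of the first element with the given tag name, or null if missing or empty. */
    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return null;
        }
        String text = nodes.item(0).getTextContent().trim();
        return text.isEmpty() ? null : text;
    }
}
```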

Outstanding;
1.) Import button. Not working yet, although I have written all the code for this. It now just needs implementing and testing. I should finish this by tomorrow.
2.) Bruce's citeproc - Still need to get saxon working with jafer before we can try this.
3.) SRW service.
4.) There are still a few configuration things outstanding, e.g specifying number of jafer results returned, multiple jafer targets, etc.
5.) Packaging and documentation in cvs for download/use by third parties.

Hopefully you can use this version for demo purposes. There may still be some "funnies" with conversion of marc to mods with documents that have fields I have not encountered during testing. We can deal with these as we encounter them.


2005-06-02 Work tasks and estimates:
1.) Get mods2xhtml stylesheet working for cocoon mechanism (by the way this was working but with rli, so I have to translate the xsl into mods and deal with the new header we're now using). This so you can demo.
2.) Fix minor layout problems on grid on UI. This so you can demo.
3.) Change UI to have "Browse Resources" tab instead of popup and allow user to "view as xhtml" (dependent on 1.) - also so you can demo.
4.) Get Transfer class sorted for Andy for import as per his request. This I can complete by Monday.
5.) Get Transfer class doing export for Andy. This I should be able to complete by Monday.
6.) Complete import classes on server to allow user to upload xml files for import. By later next week. Say Wednesday.
7.) Integration with citeproc. Matheus may have done it in an hour with php, but we have a conflict with jafer and it WILL take a lot longer than that to resolve if we run it under MDC. I will try Wolfgang's suggestion on running saxon8 under exist, but reconfiguring java to use saxon is going to require research of the docs and testing, etc. I still think it is going to take a couple of days work to get citeproc properly integrated with what we're doing (eg, sorting out how we add our header).
8.) SRW/CQL support from the mdc server to accept a search request from the client. I still need to investigate this in more detail. If there is a library available, then this may be a day or two's work. If we have to code up the SRW protocol on the server side this will take longer. I have done axis-style web service applications before and it is much more complex than just coding up a wsdl. But let me complete the investigation of this first.



2005-06-01 Citeproc and Mdc:
I have downloaded citeproc and it would be ideal for generating formatted resourcelists from MDC. Effectively, the formatting is specified using CSL, an easy-to-use xml markup language for specifying citation formatting. Bruce has a number of sample CSL files in the distribution. Also, there is the ability to use output drivers for different document types although I'm unsure as to what is currently available (if Bruce could clarify?).

There are two ways in which citeproc can be used with mdc;

1.) Integrated as part of MDC with one or more "standard" CSL's being available for users to select when the "Create Document" button is selected on a particular resource list.

2.) Where citeproc is run separately by the user (for example as a standalone application from the command line, or integrated into another web application perhaps) and it "gets" the RL to be formatted from mdc via HTTP (using SRW/CQL).

I have tried to get 1.) set up but have run into the following problem: citeproc utilises xslt 2.0, which requires saxon8. After installing this with mdc, the present jafer transforms don't work due to some conflict with the xslt 1.0 xalan used by jafer. This means I am going to have to spend some time trying to resolve this conflict - it could be a day or two's work.

With 2.) we are currently investigating how best to provide SRW support from mdc, but this is also not trivial. I have not yet completed the investigation.

The question is then, do you want me to first get a basic xslt 1.0 mods-to-xhtml stylesheet working, so we can at least produce an html view of an RL and use that for demo purposes, or should I press on and try to get citeproc working with jafer?


2005-05-31 Progress Report:
I'm busy with reorganising the UI as per our discussion. Wrt the repository browse;

The user then selects the RL they want from the table, the popup is closed and the "Repository..." popup now shows the list from the repository that the user selected. When the user in turn closes this popup, the original search window is refreshed with the current (newly loaded) RL items.

the user clicks this, the search window is effectively replaced by the Repos Browse window and the user selects a RL from the list. The question is, what should the next screen the user sees be: 1.) the "Resources..." popup, or 2.) the Search window again, but with the loaded RL items?

The basic logic on import is that you call txfer.loadResourceList(String incomingRL), which parses the big xml doc and takes the 1st mods tag as the header: it creates a new ResourceHeader and calls its setRawXml() method, then does the same for each subsequent mods tag, creating ResourceItems. The setRawXml() method then "extracts" the fields and populates the fields map.
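
A rough sketch of that import flow, under stated assumptions: a single Resource class stands in for both ResourceHeader and ResourceItem, the incoming document uses unprefixed <mods> elements, and only a "title" field is extracted. None of this is the actual Transfer class code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TransferImportSketch {

    /** Stand-in for ResourceHeader / ResourceItem: raw xml plus the extracted fields. */
    static class Resource {
        String rawXml;
        final Map<String, String> fields = new LinkedHashMap<String, String>();

        void setRawXml(String xml) {
            rawXml = xml;
            // Hypothetical extraction: only the first <title> is pulled out here.
            NodeList titles = parse(xml).getElementsByTagName("title");
            if (titles.getLength() > 0) {
                fields.put("title", titles.item(0).getTextContent().trim());
            }
        }
    }

    Resource header;
    final List<Resource> items = new ArrayList<Resource>();

    /** First <mods> element becomes the header, every later one becomes an item. */
    void loadResourceList(String incomingRL) {
        NodeList mods = parse(incomingRL).getElementsByTagName("mods");
        for (int i = 0; i < mods.getLength(); i++) {
            Resource r = new Resource();
            r.setRawXml(serialize(mods.item(i)));
            if (i == 0) {
                header = r;
            } else {
                items.add(r);
            }
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse xml", e);
        }
    }

    /** Writes one DOM node back out as an xml string without the xml declaration. */
    static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize node", e);
        }
    }
}
```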


2005-05-24 Progress Report:
I've just uploaded the latest version. This is at http://www.oxfordltgdev.org.uk:8080/mdctest/search.jsp (I've created the separate demo and test webapps on the dev machine (mdc and mdctest), so when you're happy with this we can copy it to mdc).

This version now includes;
1.) All popups fully working.
2.) Save to repository working.
3.) Search repository working.
4.) Filter search results working.
5.) Repository browse working.
6.) Load RL from repository for update.
7.) Update or Save As New for RL's.

Outstanding;
1.) Create Document - needs the MODS3TORLI xsl - still to do.
2.) Import - needs the integration class Andy and I are working on.
3.) Fields for other resource types, e.g video, etc, not finalised.

2005-05-23 Transfer Classes - Notes
The basic logic is looking like;
On the ResourceHeader and ResourceItems I've included a getUpdatedXml() to get the XML after applying whatever is in the fields Map, whereas getRawXml() gets the original xml that was, for example, loaded during import. That is, in a newly created transfer object, getRawXml() would return null. So toMods() calls getUpdatedXml() on the ResourceHeader, then iterates through and calls getUpdatedXml() on each ResourceItem, and so on.
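
As an illustration only, the export side might look something like the sketch below. The single Resource class, the <modsCollection> wrapper and the way each field is written out as a flat element named after its key are all simplifying assumptions; the real getUpdatedXml() would produce proper mods markup from the fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TransferExportSketch {

    /** Stand-in for ResourceHeader / ResourceItem. */
    static class Resource {
        private String rawXml; // stays null unless the object was loaded by an import
        final Map<String, String> fields = new LinkedHashMap<String, String>();

        String getRawXml() {
            return rawXml;
        }

        /** Builds xml from the current contents of the fields map (flat, hypothetical layout). */
        String getUpdatedXml() {
            StringBuilder sb = new StringBuilder("<mods>");
            for (Map.Entry<String, String> e : fields.entrySet()) {
                sb.append('<').append(e.getKey()).append('>')
                  .append(escape(e.getValue()))
                  .append("</").append(e.getKey()).append('>');
            }
            return sb.append("</mods>").toString();
        }
    }

    final Resource header = new Resource();
    final List<Resource> items = new ArrayList<Resource>();

    /** Header first, then each item in order. */
    String toMods() {
        StringBuilder sb = new StringBuilder("<modsCollection>");
        sb.append(header.getUpdatedXml());
        for (Resource item : items) {
            sb.append(item.getUpdatedXml());
        }
        return sb.append("</modsCollection>").toString();
    }

    /** Escapes the characters that are not allowed in xml text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```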



2005-05-14 Progress Report:
1.) UI
I have got the web UI more-or-less completed, although not all the functionality is working yet. You'll see there are main tabs that handle the main user operations (search, manual input, import and repository). The search tab allows the user to select "federated" or "repository" as targets (although the repos search isn't finished yet) and add resources from the result set to the RL (these can be intermixed from jafer and exist). The manual input tab allows the user to input resource meta data (by tabs Book, Journal, Image, Video), although the fields aren't finalised yet. The Resource List... button allows the user to specify RL metadata and to save the list to the repository and also to export the list in the various export formats. This is also not yet working as we don't yet have mods-rli stylesheets (existing extracts expect rli format). The repository tab (also not finished yet) allows the user to search the repository for RL's by creator/title and a table of these is displayed. The user can select one and it is "opened" into the "Your List" table. The user can then work with this as before. The import tab allows users to import a list (also not complete till the RLI/MODS stylesheet is done).
On the search tab I haven't yet included the search current results - still working on that.
I have also got the extra columns and vertical and horiz scrolling all working now.

2.) Backend
All the MODS functions are now working and resource records are stored in MODS format in the db. I have also changed the RL to be stored internally as a simple xml structure, e.g.

<mdcresourcelist>
  <owner>Adrian</owner>
  <author>Adrian</author>
  <title>test list</title>
  <annotation>anno</annotation>
  <resources>
    <resourceid>1bda4aacd812a26d04173c0eeaadefc0</resourceid>
    <resourceid>46e18052a05db7a4cf1ce9d5e6d2fcf1</resourceid>
  </resources>
</mdcresourcelist>

Then the list to be loaded is a join on this with the mods documents (this is effectively Bruce's recommendation). I have not stored this as RLI as it was too complicated. We will need to have a stylesheet to export an internal mdc RL in RLI or whatever else is required. As mentioned above, we will require MODS-RLI xsl to get the exports working.

Main work to do now is bolting the rest of the UI with the underlying classes. Still a few days work required. I have spent about 7 fte days on this phase so far.


Progress Note as at 2005-04-13;
The interface as it was in mdc1 is now set up on the dev server and using the repository. You can access it at
http://www.oxfordltgdev.org.uk:8080/mdc/search2.jsp. It follows the process as before, i.e.
1.) Perform the jafer searches and add items to your list
2.) Manipulate your list.
3.) Click 'create list' and the RLI form is displayed.
4.) Complete the form and press 'save'. The RL is then added to the
repository in the 'public' collection.
5.) You can then export the list in RLI xml, XHTML and PDF. Note, that there is no flow included in these pages yet (i.e. navigating back to search2.jsp, clearing results, etc). You would need to let me know what you want there.

Additionally, you can then see the document in the repository by going
into the repository front-end at http://www.oxfordltgdev.org.uk:8080/exist/mdc/home.xql and choosing to
'browse the public store'. You will see your list and can open it and
also do the exports from the repository.
The jafer search option is still in the repository front-end, I will
remove it in due course.

Progress note as at 2005-03-31
Firstly, for the front-end work, I have taken an alternative approach to the MDC 1 jsp approach (jsp, tag libraries, java beans) by using xquery scripts for the user interface. Exist provides seamless support for coding the front-end pages, querying the database xml documents, integrating to java (the jafer code) and executing xslt transforms all from within the xquery language! Although there is a learning curve with xquery it is really worth the effort as it provides a much more cohesive architecture for building web applications. It is more mature than I expected. I have subscribed to the Exist mailing list and have been monitoring the posts. I have also scanned the forum archives. Exist has actually now been going for 3 years, so it has reached a pretty stable level. From an efficiency point of view, it is pretty similar to jsp (compiles and caches xqueries, etc). I believe Exist is definitely viable for a production environment and handling a reasonably large user community.

The functionality I have completed at this stage is:
- Jafer configs are stored as an xml file in the database (presently uploaded using Exist).
- Basic search facility same as before, but allows user to select which jafer config to use and the max records that should be returned.
- Results are shown in tabs as before and the user can "mark" records for adding to the resource list. Each tab could be for a different jafer target. The user can switch between tabs and all marked records are remembered. All of the tab information is now kept in the database (as opposed to databeans in session variables), so the user can work with a large number of searches.
- The xquery accesses jafer code directly doing away with the need for all the previous "middle layer" java classes. In fact I have only added one wrapper method to ZClient, otherwise it uses just the native jafer code and none of the mdc jafer classes are now required.
- The user can "openUrl" records at any stage.
- The user can generate a resource list from the marked records (across all tabs) and a resource list form is presented as before. The user can add a list-level annotation and record-level annotations (these are now added to the resources when the list is saved).
- After generating the list and adding annotations the user saves the list (specifying creator, title). The list is saved in a public store. Later we can let it be saved to a private store. After saving, the user could go back and mark other records in the tabs and generate another list, or the user can clear tabs and start again.
- The user can browse resource lists in the public store and select one to view. I have catered for a search facility here as well, but more work needs to be done on this.
- The user can view a stored resource list and export it (at this stage) in rli xml, html, pdf formats. Later we can quite easily add "editing" to this view and allow the user to change annotations, title, etc and then re-save the list.
- The user can import a resource list in rli xml format and store it in the public store under a specified creator/title. This is then browsable/viewable as described above.

Outstanding with the above;
- Layout - "prettying" up the layout and getting the page-flow properly working - I still need to do this. There are some graphics missing, table borders and colours, etc. Also menus and help information need to be done. The javascript message "Jafer search in progress" still needs to be added.
- Bugs
- there are some bugs where an exception is thrown when some fields are missing in the search results.
- there is a bug relating to session timeouts - if the user leaves the screen for a while and the session times out an exception is thrown.
- The "drill-down" search (when clicking on an author or title) is not working yet.

Outstanding from design document;
There are still a number of items outstanding from the document. Perhaps once you've been through the app, we can revise what's still outstanding. I haven't yet looked at the mods stuff, etc.

Upload to Demo Server;
I am still sorting out a few problems, but I should be able to do an upload later this evening. I will also update cvs tomorrow morning.

Progress note as at 2005-03-17
I've made some progress - I now have jafer integrated into the exist web context and I've created some test XQueries to submit the jafer queries and process the results directly into the db without having to go through the whole databeans mechanism. This simplifies the java side and opens up much more flexibility in working with the front-end. I have started on some new front-end user-interface stuff, but there's still a way to go on this. I'm also still busy with integrating the cocoon stylesheets to format the downloads.

Progress note as at 2005-03-08
I spent a couple of hours with Colin this afternoon going through what I had
done and discussing the various technical alternatives. In terms of next
steps at least related to what I have been doing it would seem these could
be;
1.) Get the cocoon downloads (pdf,xml,etc) directly from Exist sorted out. I
would need information on where the existing cocoon script/s are.
2.) Work on the server/db side of using multiple jafer targets. This should
probably start with a detailed technical spec document as to exactly how this is going to be done as it seems there are a number of possible approaches.
This should be done in close conjunction with Colin and Mathew. I'll
elaborate on this when we speak tomorrow.
3.) Work up some detail on how/what should be done with the resource lists that are stored in the exist db. Presently they are just stored primitively
under an "owner/title" entry, but there's nothing more than that. Again, a
spec document would probably be a good idea for this.
4.) Housekeeping stuff. Presently the session search results are just stored
in the exist-db, but are never cleared out and would have to be cleared
manually through Exist admin facilities. Some automated procedures should be put in place to do this. Also, backup/restore procedures should at least be documented for when production resource lists are being stored.

Progress note as at 2005-02-17
I have set up my development environment and own test server and have got the existing mdc app working on it. I have now created a new package stream org.mdcog and the packages org.mdcog.jafer, org.mdcog.databeans, org.mdcog.db
and org.mdcog.util. I have separated out all the classes used for mdc from the original jafer classes and moved these into the above packages. I have
also written an ant script to build the mdc stuff and prepare a fileset of
the appropriate files for both mdc and jafer for copy to tomcat (or
generating a war).

I have now worked through (to understand) the existing db layer and worked out a strategy for handling both session state persistence and resource list persistence using exist-db (see discussion below).

I have created an org.mdcog.JaferClient bean (from the original ZClientDB)
which will be used to handle the jafer configuration, queries and result sets
and transforms. Persistence is delegated to org.mdcog.Persistor which handles databean creation and the exist-db interface. The jsp code will interact directly with JaferClient as its principal bean. I am currently working through exist-db to fully understand it (setting up some test code to
exercise its api, etc), before coding up the Persistor class.

The approach I have taken is that JaferClient provides the public api to the
jsp search code and keeps session scoped databeans and/or interacts with the Persistor. The basic logic for a query is;
-Jsp search calls JaferClient submitQuery method with query keys.
-JaferClient executes query on jafer target (presently just one).
-JaferClient reads all (or part) of the records in jafer resultset,
transforms them (RLI xsl) to xml and persists them as xml documents in the db keyed by sessionId,recordSetId (tabid) and recordNo.
-JaferClient generates a set of databeans for each record containing
Author,Title,docId,openURL. This set is returned to jsp for display.
-User selects list of records to add to resource list, etc. Existing beans
such as ReadingListBean work as is, but they are modified to call the
Persistor to retrieve records for generating the download list, etc.
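
The keyed persistence step in that flow could be sketched along these lines. Everything here is hypothetical: an in-memory map stands in for exist-db, the key layout "sessionId/recordSetId/recordNo" is only one possible encoding, and the jafer query and RLI transform are replaced by ready-made xml strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QueryPersistenceSketch {

    // In-memory stand-in for the documents the Persistor would store in exist-db.
    private final Map<String, String> store = new LinkedHashMap<String, String>();

    /** Builds the document id from session, record set (tab) and record number. */
    static String docId(String sessionId, int recordSetId, int recordNo) {
        return sessionId + "/" + recordSetId + "/" + recordNo;
    }

    /**
     * Persists the already transformed records of one result set and returns
     * their document ids in order; record numbers start at 1 (assumption).
     */
    List<String> persistResults(String sessionId, int recordSetId, List<String> recordXml) {
        List<String> ids = new ArrayList<String>();
        for (int i = 0; i < recordXml.size(); i++) {
            String id = docId(sessionId, recordSetId, i + 1);
            store.put(id, recordXml.get(i));
            ids.add(id);
        }
        return ids;
    }

    /** Looks a record up again, e.g. when generating the download list. */
    String retrieve(String docId) {
        return store.get(docId);
    }
}
```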

The main difference from the existing system is obviously using exist-db as
the persistence store, but also the separate packaging of the different
functions, hopefully making it easier to implement your design goals, and
also for later packaging mdc2 as a toolkit with its own well defined user
api.

I think everything above is envisaged in your design document. The one thing that comes to mind that I can see as a possible "limitation" to the end user is this: when we execute a jafer query, there could be a large number of results returned. This is presently restricted to some configured or user specified maximum - but there is no way for the user to see the "extra" results. That is, there's no equivalent to the "page tabs" you get in google for example. (The present tabs are per query, not per page of results for one query). I'm not sure if you would want to cater for this, but I will keep David's mechanism of subclassing jafer's setRecordCursor as that could facilitate it.