MDC Project Notes
2005-07-18 Progress Report
The following are now complete:
1.) Configuration and packaging - I have updated the ant build, which does a complete build of Mdc and creates the appropriate
distribution files. It includes handling the extras to be installed in Exist (such as Apache Fop and Citeproc) and also a procedure
for installing Mdc in a local Tomcat installation.
2.) User installation documentation.
3.) Resolver BaseUrl is now configurable and has been updated with the latest URL.
4.) Changes to the search UI have been made as requested.
2005-07-14 Progress on Federated Search:
The algorithm I have implemented should satisfy Howard's requirement. The
only catch is that it lives in the mdc code, as it is intimately tied to the
exist cache; hence my question to Mathew about the jafer parallel query:
perhaps I can exploit that?
The logic now works as follows:
I use zclient to do the search on each jafer target, but only return the
number of hits for each (same as jafer parallel).
I then divide the max_records by the number of targets and apportion the
number to be retrieved for each target.
I then loop through each target, retrieve the apportioned number for
that target and store the records in the cache.
I then return an array of the keys for the full set in the cache, so the
front-end can further sort/filter.
For example:
Say there are 3 targets, T1, T2 and T3, with hits h1=160, h2=10 and h3=42 (max_records
= 100).
Then we will retrieve from:
T1 - 33 + 12 + 3 = 48
T2 - 10
T3 - 33 + 9 = 42
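That is, the algorithm divides max_records by the number of targets and allocates
up to that number to each target (capped by its hits), then loops again, dividing
what is left among the targets that still have unallocated hits, until max_records
have been allocated or every target's hits are used up.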
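A minimal Java sketch of the apportioning loop, assuming the hit counts from the first (count-only) pass are already in an int array in target order. The class and method names are hypothetical (this is not the mdc code), and the Math.max(1, ...) guard is an assumption about how a remainder smaller than the number of remaining targets gets handed out. Run on the example above it reproduces the 48/10/42 split.

```java
import java.util.Arrays;

public class FederatedApportion {

    /**
     * Splits maxRecords across targets in equal shares, capping each target at
     * its hit count, and re-dividing whatever is left over among the targets
     * that still have unallocated hits.
     */
    public static int[] apportion(int[] hits, int maxRecords) {
        int[] alloc = new int[hits.length];
        int remaining = maxRecords;
        while (remaining > 0) {
            // Count targets that still have hits left to allocate.
            int active = 0;
            for (int i = 0; i < hits.length; i++) {
                if (alloc[i] < hits[i]) {
                    active++;
                }
            }
            if (active == 0) {
                break; // every hit on every target is already allocated
            }
            // Equal share of what is left; at least 1 so each pass makes progress.
            int share = Math.max(1, remaining / active);
            for (int i = 0; i < hits.length && remaining > 0; i++) {
                int take = Math.min(share, Math.min(hits[i] - alloc[i], remaining));
                alloc[i] += take;
                remaining -= take;
            }
        }
        return alloc;
    }

    public static void main(String[] args) {
        // The worked example: h1=160, h2=10, h3=42, max_records=100.
        int[] alloc = apportion(new int[] {160, 10, 42}, 100);
        System.out.println(Arrays.toString(alloc)); // [48, 10, 42]
    }
}
```

Pass one gives 33/10/33, pass two splits the remaining 24 as 12 and 9 (T3 runs out), and pass three gives the last 3 to T1.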
2005-07-01 Notes on Federated Search:
I'm busy with configuration now and we need to finalise the logic on how
the federated search works and what/how many records are returned in a
query.
The present logic is:
1.) A set of jafer targets is configured in a config file (loaded at
startup).
2.) Each jafer target has a max records parameter (target-max).
3.) We have a global parameter for max search records returned (search-max).
4.) When searching we apply the following logic;
Call jafer on the 1st target and retrieve up to target-max records, but
stop once the total retrieved reaches search-max.
If not stopped, call next jafer target as above.
That is, if search-max is say 100 and targets are each 100 and there are
say 160 results from target1, you'll only get the first 100 from target1
and target2 won't be looked at.
Or, if search-max is 100 and target-max is 50 for both target1 and target2, you'll get the
first 50 (of the 160) from target1 and up to 50 from target2, if any.
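For comparison, the sequential rule can be sketched as below. It assumes per-target hit counts are known up front (in practice each target is only queried when it is reached), and capping each target at what is left of search-max is an assumption read from the "stop at search-max" rule. The names are hypothetical.

```java
import java.util.Arrays;

public class SequentialSearchPlan {

    /**
     * Returns how many records would be taken from each target when targets are
     * queried in order, each capped at its target-max, stopping once the running
     * total reaches searchMax. Later targets get 0 (they are never queried).
     */
    public static int[] plan(int[] hits, int[] targetMax, int searchMax) {
        int[] taken = new int[hits.length];
        int total = 0;
        for (int i = 0; i < hits.length; i++) {
            if (total >= searchMax) {
                break; // search-max reached: remaining targets are not looked at
            }
            int n = Math.min(hits[i], Math.min(targetMax[i], searchMax - total));
            taken[i] = n;
            total += n;
        }
        return taken;
    }

    public static void main(String[] args) {
        // search-max 100, target-max 100 each, 160 hits on target1.
        System.out.println(Arrays.toString(plan(new int[] {160, 80}, new int[] {100, 100}, 100))); // [100, 0]
        // search-max 100, target-max 50 each, 160 hits on target1, 30 on target2.
        System.out.println(Arrays.toString(plan(new int[] {160, 30}, new int[] {50, 50}, 100))); // [50, 30]
    }
}
```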
2005-07-01 Progress Report:
We've got the SRW service up and running now with MDC. It does the basic
keyword query as we discussed, and we'll enhance it as we go. The client
side is also now integrated into the MDC Desktop Client. I'm going to be
working on the packaging of the mdc app for distribution next week, so
I'll also package the SRW stuff I've done along with the appropriate ant
tasks, so that you can add it to the Jafer distribution.
2005-06-23 Note on SRW service:
We only have Mathew's "lite" SRW
client packaged with the desktop. Also, to be precise, the desktop
is actually communicating with the MDC server, which implements a
web service that in turn uses mdc's current jafer query to search mdc's
currently configured targets. The mdc web service itself is built on
the SRW API that Mathew packaged as part of the jafer code base.
2005-06-22 Progress Report:
Citeproc (release version 0.7) is now integrated into Mdc. It uses a
general-purpose mdc-xhtml stylesheet adapted from one of Bruce's for the
output rendering, presently just showing the resource list title as
meta information and the resources formatted according to the user's
choice of one of Bruce's CSL scripts. Presently the list of CSL
scripts is "hardcoded" in the "Create Doc" stylesheet drop-down; we
can include these as part of mdc config when I do that work. Note also
that you can select a citeproc stylesheet and PDF output and it
generates a PDF document as well!
Try it out at http://oxfordltgdev.org.uk:8080/mdctest/search.jsp
2005-06-07 Progress Report:
I've uploaded the new version to
http://www.oxfordltgdev.org.uk:8080/mdctest/search.jsp.
Significant changes are;
1.) Sorted out the author field. This now displays the <displayForm>
field if it exists or the <namePart> field if not.
2.) Widths of columns in the grids have been increased (I can only set
an overall width and then give "hints" as to the widths of the
individual columns; the browser still decides on the actual widths).
3.) There is now a Browse Repository tab instead of a popup window. You can "open" an
RL or format it into viewable documents (html, pdf, etc).
4.) The "Create Document" functions for xhtml, pdf and mods xml are
working from both the Save Resource popup and the Browse Repository tab.
Here I am using a "standard" stylesheet for mods3 docs which I created
based on the original sample done with mdc1. It uses the cocoon instance
installed with exist to do the formatting for xhtml, pdf and mods.
5.) I have reorganised how the RL headers are stored internally in the
repository. They are now also stored as mods documents, with the
<relatedItem> tag being used to reference the constituent resources.
This means that you can search the repository and see both RLs and
resource components in the results.
6.) Transport classes for Andy, with ModsUtils performing
extract/formatting functions for use in export/import.
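Item 1.) above, the author fallback, amounts to something like the following DOM sketch. It assumes namespaced MODS v3 input, and that several <namePart> elements are joined with ", "; both the joining and the class/method names are assumptions, not the mdc code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ModsAuthor {

    static final String MODS_NS = "http://www.loc.gov/mods/v3";

    /** Uses <displayForm> if present and non-empty, otherwise the <namePart> text. */
    public static String authorOf(Element name) {
        NodeList display = name.getElementsByTagNameNS(MODS_NS, "displayForm");
        if (display.getLength() > 0) {
            String text = display.item(0).getTextContent().trim();
            if (!text.isEmpty()) {
                return text;
            }
        }
        // Assumption: several <namePart> elements are joined with ", ".
        NodeList parts = name.getElementsByTagNameNS(MODS_NS, "namePart");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.getLength(); i++) {
            String part = parts.item(i).getTextContent().trim();
            if (part.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(part);
        }
        return sb.toString();
    }

    /** Parses a single MODS <name> element from a string (namespace-aware). */
    public static Element parseName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String ns = " xmlns=\"" + MODS_NS + "\"";
        System.out.println(authorOf(parseName(
                "<name" + ns + "><namePart>Smith</namePart><displayForm>John Smith</displayForm></name>")));
        System.out.println(authorOf(parseName(
                "<name" + ns + "><namePart>Smith</namePart><namePart>John</namePart></name>")));
    }
}
```

The first call prints "John Smith" (displayForm wins); the second has no displayForm and prints "Smith, John".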
Outstanding:
1.) Import button. Not working yet, although I have written all the code
for this. It now just needs implementing and testing. I should finish
this by tomorrow.
2.) Bruce's citeproc - Still need to get saxon working with jafer before
we can try this.
3.) SRW service.
4.) There are still a few configuration things outstanding, e.g.
specifying the number of jafer results returned, multiple jafer targets, etc.
5.) Packaging and documentation in cvs for download/use by third parties.
Hopefully you can use this version for demo purposes. There may still be
some "funnies" with conversion of marc to mods with documents that have
fields I have not encountered during testing. We can deal with these as
we encounter them.
2005-06-02 Work tasks and estimates:
1.) Get the mods2xhtml stylesheet working with the cocoon mechanism (this was
working, but against rli, so I have to translate the xsl to mods
and deal with the new header we're now using). This is so you can demo.
2.) Fix minor layout problems on the grid in the UI. This is so you can demo.
3.) Change UI to have "Browse Resources" tab instead of popup and allow
user to "view as xhtml" (dependent on 1.) - also so you can demo.
4.) Get Transfer class sorted for Andy for import as per his request.
This I can complete by Monday.
5.) Get Transfer class doing export for Andy. This I should be able to
complete by Monday.
6.) Complete import classes on the server to allow the user to upload xml files
for import. By later next week, say Wednesday.
7.) Integration with citeproc. Matheus may have done it in an hour with
php, but we have a conflict with jafer and it WILL take a lot longer
than that to resolve if we run it under MDC. I will try Wolfgang's
suggestion on running saxon8 under exist, but reconfiguring java to use
saxon is going to require research of the docs and testing, etc. I still
think it is going to take a couple of days work to get citeproc properly
integrated with what we're doing (eg, sorting out how we add our header).
8.) SRW/CQL support from the mdc server to accept a search request from
the client. I still need to investigate this in more detail. If there is
a library available, then this may be a day or two's work. If we have to
code up the SRW protocol on the server side this will take longer. I
have done axis-style web service applications before and it is much more
complex than just coding up a wsdl. But let me complete the
investigation of this first.
2005-06-01 Citeproc and Mdc:
I have downloaded citeproc and it would be ideal for generating
formatted resource lists from MDC. Effectively, the formatting is
specified using CSL, an easy-to-use xml markup language for specifying
citation formatting. Bruce has a number of sample CSL files in the
distribution. Also, there is the ability to use output drivers for
different document types although I'm unsure as to what is currently
available (if Bruce could clarify?).
There are two ways in which citeproc can be used with mdc:
1.) Integrated as part of MDC with one or more "standard" CSL's being
available for users to select when the "Create Document" button is
selected on a particular resource list.
2.) Where citeproc is run separately by the user (for example as a
standalone application from the command line, or integrated into another
web application perhaps) and it "gets" the RL to be formatted from mdc
via HTTP (using SRW/CQL).
I have tried to get 1.) set up but ran into the following problem:
citeproc utilises xslt 2.0, which requires saxon8. After installing this
with mdc, the present jafer transforms don't work due to some conflict
with the xslt 1.0 xalan used by jafer. This means I am going to have to
spend some time trying to resolve this conflict - it could be a day or
two's work.
With 2.) we are currently investigating how best to provide SRW support
from mdc, but this is also not trivial. I have not yet completed the
investigation.
The question is then: do you want me to first get a basic xslt 1.0
mods-to-xhtml stylesheet working, so we can at least produce an html
view of an RL and use that for demo purposes, or should I press on and
try to get citeproc working with jafer?
2005-05-31 Progress Report:
I'm busy reorganising the UI as per our discussion. With regard to the
repository browse:
- We currently invoke this as a popup from the "Repository..." popup.
The user then selects the RL they want from the table, the popup is
closed and the "Repository..." popup now shows the list from the
repository that the user selected. When the user in turn closes this
popup, the original search window is refreshed with the current (newly
loaded) RL items.
- I am now putting a "Browse Repository" tab on the search window. When
the user clicks this, the search window is effectively replaced by the
Repos Browse window and the user selects an RL from the list. The
question is, what should the next screen the user sees be: 1.) the
"Resources..." popup, or 2.) the Search window again, but with the loaded RL
items?
The basic logic on import is: you call txfer.loadResourceList(String
incomingRL), which parses the big xml doc and takes the 1st mods tag as the
header, creating a new ResourceHeader and calling its setRawXml() method;
it then does the same for each subsequent mods tag, creating ResourceItems. The
setRawXml() method then "extracts" the fields and populates the fields map.
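The import flow just described might look roughly like this. ResourceHeader and ResourceItem here are hypothetical stand-ins that only keep the raw XML (the real setRawXml() also populates the fields map), and the sketch assumes the incoming document is a MODS modsCollection whose direct <mods> children are the header followed by the items.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class Transfer {

    static final String MODS_NS = "http://www.loc.gov/mods/v3";

    /** Hypothetical stand-ins; the real setRawXml() also fills a fields map. */
    public static class ResourceHeader {
        private String rawXml;
        public void setRawXml(String xml) { rawXml = xml; }
        public String getRawXml() { return rawXml; }
    }

    public static class ResourceItem {
        private String rawXml;
        public void setRawXml(String xml) { rawXml = xml; }
        public String getRawXml() { return rawXml; }
    }

    private ResourceHeader header;
    private final List<ResourceItem> items = new ArrayList<>();

    public ResourceHeader getHeader() { return header; }
    public List<ResourceItem> getItems() { return items; }

    /** First <mods> child of the root becomes the header; the rest become items. */
    public void loadResourceList(String incomingRL) {
        header = null;
        items.clear();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(incomingRL)))
                    .getDocumentElement();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE
                        || !MODS_NS.equals(n.getNamespaceURI())
                        || !"mods".equals(n.getLocalName())) {
                    continue; // skip whitespace, comments and non-MODS elements
                }
                String xml = serialize(n);
                if (header == null) {
                    header = new ResourceHeader();
                    header.setRawXml(xml);
                } else {
                    ResourceItem item = new ResourceItem();
                    item.setRawXml(xml);
                    items.add(item);
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not load resource list", e);
        }
    }

    /** Serializes one element (and its subtree) back to an XML string. */
    private static String serialize(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) {
        String rl = "<modsCollection xmlns=\"" + MODS_NS + "\">"
                + "<mods><titleInfo><title>My list</title></titleInfo></mods>"
                + "<mods><titleInfo><title>Book A</title></titleInfo></mods>"
                + "<mods><titleInfo><title>Book B</title></titleInfo></mods>"
                + "</modsCollection>";
        Transfer txfer = new Transfer();
        txfer.loadResourceList(rl);
        System.out.println("items: " + txfer.getItems().size()); // items: 2
    }
}
```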
2005-05-24 Progress Report:
I've just uploaded the latest version. This is at
http://www.oxfordltgdev.org.uk:8080/mdctest/search.jsp (I've created the
separate demo and test webapps on the dev machine (mdc and mdctest), so
when you're happy with this we can copy it to mdc).
This version now includes;
1.) All popups fully working.
2.) Save to repository working.
3.) Search repository working.
4.) Filter search results working.
5.) Repository browse working.
6.) Load RL from repository for update.
7.) Update or Save As New for RLs.
Outstanding:
1.) Create Document - needs the MODS3TORLI xsl - still to do.
2.) Import - needs the integration class Andy and I are working on.
3.) Fields for other resource types, e.g. video, etc, not finalised.
2005-05-23 Transfer Classes - Notes
The basic logic is looking like this:
On the ResourceHeader and ResourceItems I've included a getUpdatedXml()
to get the XML after applying whatever is in the fields Map, whereas
getRawXml() gets the original xml that was, for example, loaded during
import. That is, in a newly created transfer object, getRawXml() would
return null. So toMods() calls getUpdatedXml() on the ResourceHeader, then
iterates through and calls getUpdatedXml() on each ResourceItem, and so on.
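A sketch of that export order, with a hypothetical MdcRecord interface standing in for ResourceHeader and ResourceItem. The toy NewRecord rebuilds a bare MODS record from a single title field; it is only meant to show why getRawXml() is null for a newly created object while getUpdatedXml() still produces output, not how the real field mapping works.

```java
import java.util.Arrays;
import java.util.List;

public class ModsExport {

    /** Hypothetical common interface for ResourceHeader and ResourceItem. */
    interface MdcRecord {
        String getRawXml();     // XML as originally loaded (e.g. on import); null if created new
        String getUpdatedXml(); // XML after applying the current fields map
    }

    /** Toy record created from the UI: no raw XML, MODS rebuilt from one title field. */
    static class NewRecord implements MdcRecord {
        private final String title;

        NewRecord(String title) { this.title = title; }

        public String getRawXml() { return null; }

        public String getUpdatedXml() {
            return "<mods><titleInfo><title>" + escape(title) + "</title></titleInfo></mods>";
        }
    }

    /** Header first, then each item in order, inside one modsCollection. */
    static String toMods(MdcRecord header, List<? extends MdcRecord> items) {
        StringBuilder sb = new StringBuilder("<modsCollection xmlns=\"http://www.loc.gov/mods/v3\">");
        sb.append(header.getUpdatedXml());
        for (MdcRecord item : items) {
            sb.append(item.getUpdatedXml());
        }
        return sb.append("</modsCollection>").toString();
    }

    /** Minimal escaping for text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        MdcRecord header = new NewRecord("test list");
        List<MdcRecord> items = Arrays.asList(new NewRecord("Book A"), new NewRecord("Book B"));
        System.out.println(header.getRawXml()); // null
        System.out.println(toMods(header, items));
    }
}
```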
2005-05-14 Progress Report:
1.) UI
I have got the web UI more-or-less completed, although not all the
functionality is working yet. You'll see there are main tabs that handle
the various main user operations (search, manual input, import and
repository). The search tab allows the user to select "federated" or
"repository" as targets (although the repos search isn't finished yet)
and add resources from the result set to the RL (these can be intermixed
from jafer and exist). The manual input tab allows the user to input
resource meta data (by tabs Book, Journal, Image, Video), although the
fields aren't finalised yet. The Resource List... button allows the user
to specify RL metadata and to save the list to the repository and also
to export the list in the various export formats. This is also not yet
working as we don't yet have mods-rli stylesheets (existing extracts
expect rli format). The repository (also not finished yet) allows the
user to search the repository for RLs by creator/title and a table of
these is displayed. The user can select one and it is "opened" into the
"Your List" table. The user can then work with this as before. The
import tab allows users to import a list (also not complete till
RLI/MODS stylesheet is done).
On the search tab I haven't yet included the search current results -
still working on that.
I have also got the extra columns and vertical and horiz scrolling all
working now.
2.) Backend
All the MODS functions are now working and resource records are stored
in MODS format in the db. I have also changed the RL to be stored
internally as a simple xml structure, e.g.
<mdcresourcelist>
  <owner>Adrian</owner>
  <author>Adrian</author>
  <title>test list</title>
  <annotation>anno</annotation>
  <resources>
    <resourceid>1bda4aacd812a26d04173c0eeaadefc0</resourceid>
    <resourceid>46e18052a05db7a4cf1ce9d5e6d2fcf1</resourceid>
  </resources>
</mdcresourcelist>
Then the list to be loaded is a join on this with the mods documents
(this is effectively Bruce's recommendation). I have not stored this as
RLI as it was too complicated. We will need to have a stylesheet to
export an internal mdc RL in RLI or whatever else is required. As
mentioned above, we will require MODS-RLI xsl to get the exports working.
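In the database this join would be done in XQuery over the stored MODS documents; to show its shape in Java, here is a sketch that resolves each <resourceid> against an in-memory map standing in for the MODS collection. The map, the class name and the choice to skip ids with no matching record are all assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ResourceListJoin {

    /** Returns the MODS records referenced by an <mdcresourcelist>, in list order. */
    public static List<String> join(String rlXml, Map<String, String> modsById) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rlXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed resource list", e);
        }
        NodeList ids = doc.getElementsByTagName("resourceid");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ids.getLength(); i++) {
            String mods = modsById.get(ids.item(i).getTextContent().trim());
            if (mods != null) {
                result.add(mods); // assumption: ids with no stored record are skipped
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String rl = "<mdcresourcelist><owner>Adrian</owner><title>test list</title>"
                + "<resources><resourceid>b</resourceid><resourceid>a</resourceid>"
                + "<resourceid>missing</resourceid></resources></mdcresourcelist>";
        Map<String, String> store = new HashMap<>();
        store.put("a", "<mods>A</mods>");
        store.put("b", "<mods>B</mods>");
        System.out.println(join(rl, store)); // [<mods>B</mods>, <mods>A</mods>]
    }
}
```

Keeping the list order from <resources> (rather than the store's order) preserves the author's ordering of the RL.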
The main work to do now is bolting the rest of the UI onto the underlying
classes. There are still a few days' work required. I have spent about 7 fte days
on this phase so far.
Progress note as at 2005-04-13:
The interface as it was in mdc1 is now set up on the dev server and using the repository. You can access it at
http://www.oxfordltgdev.org.uk:8080/mdc/search2.jsp. It follows the
process as before, i.e.:
1.) Perform the jafer searches and add items to your list
2.) Manipulate your list.
3.) Click 'create list' and the RLI form is displayed.
4.) Complete the form and press 'save'. The RL is then added to the
repository in the 'public' collection.
5.) You can then export the list in RLI xml, XHTML and PDF.
Note that there is no flow included in these pages yet (i.e. navigating
back to search2.jsp, clearing results, etc). You would need to let me know what you want there.
Additionally, you can then see the document in the repository by going
into the repository front-end at http://www.oxfordltgdev.org.uk:8080/exist/mdc/home.xql and choosing to
'browse the public store'. You will see your list and can open it and
also do the exports from the repository.
The jafer search option is still in the repository front-end, I will
remove it in due course.
Progress note as at 2005-03-31
Firstly, for the front-end work, I have taken an alternative approach to
the MDC 1 jsp approach (jsp, tag libraries, java beans) by using xquery
scripts for the user interface. Exist provides seamless support for
coding the front-end pages, querying the database xml documents,
integrating to java (the jafer code) and executing xslt transforms all
from within the xquery language! Although there is a learning curve with
xquery it is really worth the effort as it provides a much more cohesive
architecture for building web applications. It is more mature than I
expected. I have subscribed to the Exist mailing list and have been
monitoring the posts. I have also scanned the forum archives. Exist has
actually now been going for 3 years, so it has reached a pretty stable
level. From an efficiency point of view, it is pretty similar to jsp
(compiles and caches xqueries, etc). I believe Exist is definitely
viable for a production environment and handling a reasonably large user
community.
The functionality I have completed at this stage is:
- Jafer configs are stored as an xml file in the database (presently uploaded using Exist).
- Basic search facility same as before, but allows user to select which
jafer config to use and the max records that should be returned.
- Results are shown in tabs as before and the user can "mark" records
for adding to the resource list. Each tab could be for a different jafer
target. The user can switch between tabs and all marked records are
remembered. All of the tab information is now kept in the database (as
opposed to databeans in session variables), so the user can work with a
large number of searches.
- The xquery accesses jafer code directly doing away with the need for
all the previous "middle layer" java classes. In fact I have only added
one wrapper method to ZClient, otherwise it uses just the native jafer
code and none of the mdc jafer classes are now required.
- The user can "openUrl" records at any stage.
- The user can generate a resource list from the marked records (across
all tabs) and a resource list form is presented as before. The user can
add a list-level annotation and record-level annotations (these are now
added to the resources when the list is saved).
- After generating the list and adding annotations the user saves the
list (specifying creator, title). The list is saved in a public store.
Later we can let it be saved to a private store. After saving, the user
could go back and mark other records in the tabs and generate another
list, or the user can clear tabs and start again.
- The user can browse resource lists in the public store and select one
to view. I have catered for a search facility here as well, but more
work needs to be done on this.
- The user can view a stored resource list and export it (at this stage)
in rli xml, html, pdf formats. Later we can quite easily add "editing"
to this view and allow the user to change annotations, title, etc and
then re-save the list.
- The user can import a resource list in rli xml format and store it in
the public store under a specified creator/title. This is then
browsable/viewable as described above.
Outstanding with the above:
- Layout - "prettying" up the layout and getting the page-flow properly
working - I still need to do this. There are some graphics missing,
table borders and colours, etc. Also menus and help information need to
be done. The javascript message "Jafer search in progress" still needs to
be added.
- Bugs
- there are some bugs where an exception is thrown when some fields are
missing in the search results.
- there is a bug relating to session timeouts - if the user leaves the
screen for a while and the session times out an exception is thrown.
- The "drill-down" search (when clicking on an author or title) is not
working yet.
Outstanding from design document:
There are still a number of items outstanding from the document. Perhaps
once you've been through the app, we can revise what's still
outstanding. I haven't yet looked at the mods stuff, etc.
Upload to Demo Server:
I am still sorting out a few problems, but I should be able to do an
upload later this evening. I will also update cvs tomorrow morning.
Progress note as at 2005-03-17
I've made some progress - I now have jafer integrated into the exist web
context and I've created some test XQueries to submit the jafer queries
and process the results directly into the db without having to go
through the whole databeans mechanism. This simplifies the java side and
opens up much more flexibility in working with the front-end. I have
started on some new front-end user-interface stuff, but there's still a
way to go on this. I'm also still busy with integrating the cocoon
stylesheets to format the downloads.
Progress note as at 2005-03-08
I spent a couple of hours with Colin this afternoon going through what I had
done and discussing the various technical alternatives. In terms of next
steps, at least those related to what I have been doing, it would seem these
could be:
1.) Get the cocoon downloads (pdf,xml,etc) directly from Exist sorted out. I
would need information on where the existing cocoon script/s are.
2.) Work on the server/db side of using multiple jafer targets. This should
probably start with a detailed technical spec document as to exactly how this is
going to be done, as it seems there are a number of possible approaches.
This should be done in close conjunction with Colin and Mathew. I'll
elaborate on this when we speak tomorrow.
3.) Work up some detail on how/what should be done with the resource lists
that are stored in the exist db. Presently they are just stored primitively
under an "owner/title" entry, but there's nothing more than that. Again, a
spec document would probably be a good idea for this.
4.) Housekeeping stuff. Presently the session search results are just stored
in the exist-db, but are never cleared out, so this would have to be done manually
through Exist admin facilities. Some automated procedures should be put
in place to do this. Also, backup/restore procedures should at least be
documented for when production resource lists are being stored.
Progress note as at 2005-02-17
I have set up my development environment and own test server and have got
the existing mdc app working on it. I have now created a new package
stream org.mdcog and the packages org.mdcog.jafer, org.mdcog.databeans,
org.mdcog.db and org.mdcog.util. I have separated out all the
classes used for mdc from the original jafer classes and moved these
into the above packages. I have also written an ant script to build the
mdc stuff and prepare a fileset of the appropriate files for both mdc and
jafer for copy to tomcat (or generating a war).
I have now worked through (to understand) the existing db layer and
worked out a strategy for handling both session state persistence and
resource list persistence using exist-db (see discussion below).
I have created an org.mdcog.JaferClient bean (from the original ZClientDB),
which will be used to handle the jafer configuration, queries, result sets
and transforms. Persistence is delegated to org.mdcog.Persistor, which
handles databean creation and the exist-db interface. The jsp code will
interact directly with JaferClient as its principal bean. I am
currently working through exist-db to fully understand it (setting up
some test code to exercise its api, etc) before coding up the Persistor class.
The approach I have taken is that JaferClient provides the public api to the
jsp search code and keeps session-scoped databeans and/or interacts with the Persistor. The basic logic for a query is:
-Jsp search calls the JaferClient submitQuery method with query keys.
-JaferClient executes the query on the jafer target (presently just one).
-JaferClient reads all (or part) of the records in the jafer resultset,
transforms them (RLI xsl) to xml and persists them as xml documents in
the db, keyed by sessionId, recordSetId (tabid) and recordNo.
-JaferClient generates a set of databeans, one per record, each containing
Author, Title, docId and openURL. This set is returned to the jsp for display.
-User selects the list of records to add to the resource list, etc. Existing beans
such as ReadingListBean work as is, but they are modified to call the
Persistor to retrieve records for generating the download list, etc.
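The per-record keying in the third step could be sketched as below. The "sessionId/recordSetId/recordNo" string format and the in-memory map standing in for exist-db are assumptions, as is the class name; clearSession() is included only to show that a session prefix makes bulk clean-up of old search results straightforward.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class SessionRecordStore {

    // In-memory stand-in for the exist-db collection of per-session records.
    private final Map<String, String> docs = new TreeMap<>();

    /** Hypothetical key format: sessionId/recordSetId/recordNo. */
    static String docId(String sessionId, String recordSetId, int recordNo) {
        return sessionId + "/" + recordSetId + "/" + recordNo;
    }

    /** Stores one transformed record and returns its docId for the display bean. */
    public String persist(String sessionId, String recordSetId, int recordNo, String xml) {
        String id = docId(sessionId, recordSetId, recordNo);
        docs.put(id, xml);
        return id;
    }

    public String fetch(String docId) {
        return docs.get(docId);
    }

    /** Deletes every record stored for one session; returns how many were removed. */
    public int clearSession(String sessionId) {
        String prefix = sessionId + "/";
        int removed = 0;
        for (Iterator<String> it = docs.keySet().iterator(); it.hasNext(); ) {
            if (it.next().startsWith(prefix)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        SessionRecordStore store = new SessionRecordStore();
        String id = store.persist("s1", "tab1", 1, "<rec>1</rec>");
        store.persist("s1", "tab2", 1, "<rec>2</rec>");
        store.persist("s2", "tab1", 1, "<rec>3</rec>");
        System.out.println(id);                       // s1/tab1/1
        System.out.println(store.clearSession("s1")); // 2
        System.out.println(store.fetch("s2/tab1/1")); // <rec>3</rec>
    }
}
```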
The main difference from the existing system is obviously using exist-db as
the persistence store, but also the separate packaging of the different
functions, hopefully making it easier to implement your design goals, and
also for later packaging mdc2 as a toolkit with its own well defined user
api.
I think everything above is envisaged in your design document. The one
thing that comes to mind that I can see as a possible "limitation" to
the end user is: when we execute a jafer query, there could be a large
number of results returned. This is presently restricted to some
configured or user-specified maximum, but there is no way for the user
to see the "extra" results. That is, there's no equivalent to the "page
tabs" you get in google, for example. (The present tabs are per query,
not per page of results for one query.) I'm not sure if you would want
to cater for this, but I will keep David's mechanism of subclassing
jafer's setRecordCursor, as that could facilitate it.
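If page tabs were added, the arithmetic for driving a record cursor is small. This sketch assumes 1-based page numbers and 1-based record positions; whether jafer's setRecordCursor counts from 1 is not confirmed here, and the class and method names are hypothetical.

```java
public class ResultPager {

    /** Number of page tabs needed to show totalHits records, pageSize per page. */
    static int pageCount(int totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalHits + pageSize - 1) / pageSize;
    }

    /** 1-based position of the first record on a 1-based page. */
    static int firstRecord(int page, int pageSize) {
        return (page - 1) * pageSize + 1;
    }

    /** Records shown on a page; the last page may be short. */
    static int recordsOnPage(int page, int pageSize, int totalHits) {
        int remaining = totalHits - (page - 1) * pageSize;
        return Math.max(0, Math.min(pageSize, remaining));
    }

    public static void main(String[] args) {
        // 160 hits at 50 per page: 4 tabs, the last starting at record 151 with 10 records.
        System.out.println(pageCount(160, 50));        // 4
        System.out.println(firstRecord(4, 50));        // 151
        System.out.println(recordsOnPage(4, 50, 160)); // 10
    }
}
```

Each tab would then only need to move the cursor to firstRecord(page, pageSize) and fetch recordsOnPage(...) records, rather than holding the whole result set.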