Grimoires.DistributedQueryDiscovery

Discussion between Simon Miles, Nedim Alpdemir, Arijit Mukherjee on DQP document publishing and discovery (September 2003)

Simon

After the talk I had with you last week, I'm keen to continue looking into registering queries in our registry that a DQP could process. Do you have a sample OQL query that someone other than the author might wish to use? If it's parameterised, that's all the better! I'll also happily accept any other information you think would be worth registering or discovering about a query.

Nedim

The attached file (DQPrequest_synch_simple.xml) is an example of a request document (let's call it GDS-perform-document --- GDS stands for Grid Data Service) that contains an OQL query. This document is what the DQP expects from a client for submitting a query. Its format is dictated by the Grid Data Service (GDS) perform document schema (I attached the schema file grid_data_service_types.xsd which contains the relevant XML types). Note that the request not only contains the OQL query string, but it also encapsulates information on the nature of interaction (synchronous/asynchronous), and it can potentially contain other information to tell the DQP what to do with the result of the query (although the one attached does not contain such information).

I am not entirely sure how DQP will be integrated into myGrid, but I suspect there will be a tool that would read a document (let's call it DQP-request-document) which (as opposed to GDS-perform-document) would contain not only a query but also the information required to configure the DQP so that it can import database schemas of all databases that need to participate in a distributed query. This tool can then create a web service wrapper around the DQP for that particular request. Currently the configuration information is passed as a separate document before the actual query request (i.e. GDS-perform-document) is submitted. So if we combine both the configuration information and the query request in a single document (remember we called this DQP-request-document), it might look something like the third attached file, DQP-request-document-example.xml.
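
As a rough illustration of the combined document described above, the sketch below models a DQP-request-document as a list of data sources plus a GDS-perform-document. The class names, field names and sample values are assumptions made purely for illustration; they are not taken from the attached files or from the GDS schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PerformDocument:
    """Stands in for the GDS-perform-document: the query plus interaction mode."""
    oql: str
    synchronous: bool = True


@dataclass
class DQPRequestDocument:
    """Configuration and query request carried together in one document."""
    # Handles of the databases whose schemas the DQP must import (assumed form).
    data_source_handles: List[str] = field(default_factory=list)
    # The query request that is currently submitted as a separate document.
    perform: Optional[PerformDocument] = None


# Made-up example values; the OQL text and handles are placeholders only.
request = DQPRequestDocument(
    data_source_handles=["handle-of-database-A", "handle-of-database-B"],
    perform=PerformDocument(oql="select p.name from p in proteins"),
)
```

Keeping the two parts as separate fields mirrors the current practice of sending the configuration and the query as two documents, so either part could still be extracted and submitted on its own.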

I suspect there are two scenarios here:

1 - It is the Web-Service wrapper generated by the tool which gets registered to the registry.
2 - It is the DQP-request-document which gets registered to the registry.

I think the more natural approach would be to register the web-service (option 1) with some metadata that tells its function -- i.e. the data sources it integrates and a description of the query. Or maybe the metadata would be the DQP-request-document that was used to generate the web service wrapper.

For the second option the metadata could contain the end point of a generic web service wrapper that can take a DQP-request-document and execute it on the fly. So in that case the user discovers the DQP-request-document and uses the end point of the web service wrapper to execute the request.

Or maybe we should support both approaches. Or any other ideas? What do you think?
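
To make the difference between the two options concrete, here is a minimal sketch of what a registry entry might hold in each case. The `RegistryEntry` type, the `wrapper_endpoint` metadata key and the example.org addresses are all hypothetical; nothing here reflects the actual registry interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RegistryEntry:
    kind: str       # "service" for option 1, "document" for option 2
    payload: str    # wrapper end point (option 1) or document text (option 2)
    metadata: Dict[str, str] = field(default_factory=dict)


def endpoint_for(entry: RegistryEntry) -> Optional[str]:
    """Return the end point a client would call, if the entry names one."""
    if entry.kind == "service":
        return entry.payload
    # Option 2: the end point of a generic wrapper may be carried as metadata.
    return entry.metadata.get("wrapper_endpoint")


# Placeholder values, not real services or documents.
option1 = RegistryEntry("service", "http://example.org/dqp-wrapper",
                        {"description": "what the query does"})
option2 = RegistryEntry("document", "<DQP-request-document/>",
                        {"wrapper_endpoint": "http://example.org/generic-dqp"})
```

With option 2 the end point is optional: `endpoint_for` returns `None` when no wrapper is named, and the client would then have to discover one separately.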

Simon

Thanks for that detailed response! I think the scenario we were considering was option 2 exactly as you describe (registering the DQP-request-document with both OQL query/GDS-perform-document plus required data source information). This is better than option 1 both because a more interesting process and description are involved, and because the client discovering the DQP-request-document can choose a generic web service DQP wrapper based on the resources provided by that wrapper/DQP, e.g. using local network vs. whole Grid, which may be appropriate in different circumstances. For the latter reason, we would expect the web service endpoint not to be included in the registration details. Pretty much this is what happens for discovering and enacting workflows. However, option 1 is perfectly reasonable as well (and much easier to do at the moment) and you could see circumstances where, in option 2, a wrapper endpoint might usefully be included in the description as with web services in UDDI.

To start with, unless you urgently need web service wrappers registered for myGrid demos or Luc disagrees with the above, I'll look at option 2. I'll first try and understand the structure of the DQP-request-document, so I'll probably be back with questions very soon! Is there any document you can point me at which might help?

Also, anything you can tell me on the parameterisation of queries? Is the query in the document attached parameterised? If not, would it be possible to make it parameterised in a simple (possibly contrived) way? What sort of variable would the parameter represent? A constraint on a field value?

Arijit

From the points given by Norman during our initial discussion about the document we produced (it's in the Twiki), there could be another possibility (I'm not quite sure if the 1st option in Nedim's mail refers to this one or not).

The tool would generate a Java object (say DQPQueryInstance) - which will be parameterized with the handle of the data sources it would query, and the query itself. This object can be stored in the MIR (just as the workflows are), and we can even think of the WSDL type definition of this to register it with the registry (we need to look into the feasibility of this). While using this "DQPQueryInstance", you give another parameter to it (the handle of the DAISRegistry it would search for the GDQS factory - or you can give the handle of the GDQS factory directly) - and this object does the rest. This could have methods to perform all its functions.

We cannot maintain a particular instance because that would imply an infinite lifetime for it. That's why this will be a generic object which takes in the factory handle and creates the proper DQP instance.
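
A minimal sketch of this idea follows, written in Python for brevity although the proposal is for a Java object. The method names (`create_instance`, `add_data_source`, `execute`) are invented for illustration and are not the real GDQS interface, the two stub classes exist only so that the sketch runs, and the factory is passed in directly rather than looked up in a DAISRegistry.

```python
class DQPQueryInstance:
    """Hypothetical generic object: holds the data source handles and the query."""

    def __init__(self, data_source_handles, query):
        self.data_source_handles = list(data_source_handles)
        self.query = query

    def run(self, factory):
        # A new DQP instance is created for each run, so no long-lived
        # instance has to be kept around.
        instance = factory.create_instance()
        for handle in self.data_source_handles:
            instance.add_data_source(handle)
        return instance.execute(self.query)


class StubInstance:
    """Test double standing in for a DQP instance; it only records calls."""

    def __init__(self):
        self.sources = []

    def add_data_source(self, handle):
        self.sources.append(handle)

    def execute(self, query):
        return {"query": query, "sources": list(self.sources)}


class StubFactory:
    """Test double standing in for the GDQS factory."""

    def create_instance(self):
        return StubInstance()


# Placeholder handles and query text.
result = DQPQueryInstance(["source-a", "source-b"], "select ...").run(StubFactory())
```

Because the factory is supplied only at run time, the same stored object could be run against different factories.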

And we must not forget the tool - in my view, this can be a part of the workbench - a graphical view which would let the user mention all the stuff (even build the OQL query)?

Simon

I'm not sure I understand your proposal exactly. Do you mean that a generic object would be published that took arbitrary queries with data sources and performed DQP? If so, then, apart from the fact that it probably should be a web service rather than a Java object, that is how I understood Norman's suggestion. It seems like a useful thing to do but not the same thing. If a biologist wishes to search for analysis tools giving more information on given data values then they should discover queries (along with services and workflows) that do useful searches of remote databases given those values, rather than a generic DQP service.

Or do you mean that an object specific to a query would be stored in the MIR and registered? If so, the disadvantages I can see of this approach are the following. First, it relies on the client being written in Java. Second, while the workflows are stored in the MIR currently, they would ideally be drawn from the MIR and published as a whole document (rather than a reference) in the registry to allow you to search for workflows based on, for example, the activities they perform, and by semantic descriptions attached to parts of the workflow. The same seems to apply to DQP request documents, where you might search based on a data source used, for example. In this case, the DQPQueryInstance sounds to me exactly like something that would be constructed by the client from what was discovered in the registry, rather than being registered itself, and sounds like it would be useful as such from your description.
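
The kind of search mentioned above, finding request documents by a data source they use, only works if the registry holds the document contents rather than a reference. A toy sketch, assuming (hypothetically) that each registered document is reduced to a name and its list of data source handles:

```python
def find_by_data_source(registered_documents, handle):
    """Return the names of registered documents that use the given data source."""
    return [
        name
        for name, data_sources in registered_documents.items()
        if handle in data_sources
    ]


# Placeholder registry contents: document name -> data source handles it uses.
registered = {
    "query-1": ["source-a", "source-b"],
    "query-2": ["source-c"],
}
```

For example, `find_by_data_source(registered, "source-a")` yields only `query-1`. A real registry would presumably index richer metadata than this.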

I've possibly misunderstood in both cases, though!

Your idea of giving a WSDL-type definition with the parameters being inputs to the query processing is exactly what I had in mind, though from my discussion with Nedim, it seemed to be that parameterising based on data source would not normally be useful because there would be few sources that could be replaced one for another in a query due to differing schemas. Do you agree with this?

I'll take a look at your TWiki document, thanks for the reference.

Arijit

Initially I thought of having objects stored in the MIR - but then, yes, the client would have a problem if it's not in Java.

The second one sounds more reasonable - constructing the instance from the document. Only additional thing apart from the document would be the handle of the DAISRegistry (it shouldn't be inside the document).

> Your idea of giving a WSDL-type definition with the parameters being inputs to the query processing is exactly what I had in mind, though from my discussion with
> Nedim, it seemed to be that parameterising based on data source would not normally be useful because there would be few sources that could be replaced one for
> another in a query due to differing schemas. Do you agree with this?

I think so. But Nedim would be the best person to analyse this...I don't quite know how the data source and schema stuff works...

Simon

> The second one sounds more reasonable - constructing the instance from the document. Only additional thing apart from the document would be the handle of the DAISRegistry (it shouldn't be inside the
> document).

So the DAISRegistry (or GDQS Factory, I was considering, but that's only because I don't know what the DAISRegistry is :-)) would be supplied by the client after discovering the document and constructing the object? This sounds a sensible approach. I was envisaging a GDQS Factory, one appropriate to the client's needs, being discovered from the registry separately by the client. Would it be more appropriate for the client to use/discover a DAISRegistry?

Arijit

Right now, the demo client searches the DAISRegistry which contains the handle of the GDQS factory, and using that, an empty GDQS instance is created, to which the data sources are added. I think if you want to match the client's needs, then the factory should hold some relevant information - at this point, I am not sure it does - does it Nedim?

If it doesn't hold such information, then I think providing the registry handle would be sufficient. If it holds information such as which data resources it can support, then based on the user requirements, the client can search the registry...

I think we are coming to the point about the type of DQP instances - empty, semi-configured, fully-configured......

Nedim, supposing we use the "fully-configured" scheme, would the GDQS factory hold information about the data sources it supports somewhere?

Nedim

> Thanks for that detailed response! I think the scenario we were considering was option 2 exactly as you describe
> (registering the DQP-request-document with both OQL query/GDS-perform-document plus required data source information).
> This is better than option 1 both because a more interesting process and description are involved, and because the
> client discovering the DQP-request-document can choose a generic web service DQP wrapper based on the resources
> provided by that wrapper/DQP, e.g. using local network vs. whole Grid, which may be appropriate in different
> circumstances.

This approach assumes that the generic DQP wrapper does not know anything about the "set of data sources" required to run a query. What it does is just take the DQP-request-document, create an instance of DQP using the DQP factory, take the first part of the DQP-request-document (the data source list) and cause the DQP instance to import the database schemas of the data sources, and finally submit the query request (extracting the second part of the DQP-request-document) to the DQP instance and deliver the results. So essentially the web service wraps the factory rather than a DQP instance. In that case, as you suggest, the end point of the actual DQP wrapper can be discovered as a second step based on some other metadata exposed by different wrappers. However, it might make sense to include a default wrapper end point to save time (in most cases you might end up discovering the same wrapper). For this scenario there is no need for a tool that takes a DQP-request-document and generates a wrapper service. You write the wrapper once, but you might deploy it to different servers to exploit different computational resources on the Grid (hence alternatives for discovery).

However, what I was suggesting was slightly different. The tool I mentioned in my e-mail would read a DQP-request-document and generate the code for a web service wrapper (as an off-line process). The wrapper would then know about the data sources being integrated, and would implement a method for each query contained in the document (so the document can contain multiple query requests). If the queries are parameterised, then these parameters would be defined as the parameters of the generated method which corresponds to that query (see the attachment for a possible example of a document that contains parameterised queries). This will result in a set of canned queries represented as a set of corresponding methods on the port type of the wrapper web service.

In this case it makes sense to include the end point of this wrapper web service in the registration, I guess (or does it not?). The end point information can either be an additional metadata or in fact it can be an additional element in the document (not sure what the implications of these two are, or whether they are fundamentally different).
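
The "canned queries as methods" idea can be sketched as follows. Note that this builds the methods at run time, whereas the tool described above would generate wrapper code off-line; the `make_wrapper` helper, the `?` placeholder, the query text and the parameter name are all assumptions for illustration, and `execute` stands in for whatever actually submits the query to the DQP.

```python
def make_wrapper(canned_queries, execute):
    """Build an object with one method per canned query.

    canned_queries maps a method name to (query template, parameter names).
    execute is the callable that would submit the query to the DQP.
    """

    class Wrapper:
        pass

    def make_method(template, param_names):
        def method(**kwargs):
            # The query's parameters are exactly the method's parameters.
            if set(kwargs) != set(param_names):
                raise TypeError(f"expected parameters {sorted(param_names)}")
            return execute(template, kwargs)
        return method

    wrapper = Wrapper()
    for name, (template, param_names) in canned_queries.items():
        setattr(wrapper, name, make_method(template, param_names))
    return wrapper


# Placeholder query and parameter; execute just echoes its inputs here.
wrapper = make_wrapper(
    {"proteins_by_name": ("select p from p in proteins where p.name = ?", ["name"])},
    execute=lambda template, params: (template, params),
)
```

A client would then call `wrapper.proteins_by_name(name=...)` without ever seeing the OQL text, which is why registering the service itself seems natural for this scenario.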

> For the latter reason, we would expect the web service endpoint not to be included in the registration details. Pretty
> much this is what happens for discovering and enacting workflows. However, option 1 is perfectly reasonable as well
> (and much easier to do at the moment) and you could see circumstances where, in option 2, a wrapper endpoint might
> usefully be included in the description as with web services in UDDI.

> To start with, unless you urgently need web service wrappers registered for myGrid demos or Luc disagrees with the
> above, I'll look at option 2. I'll first try and understand the structure of the DQP-request-document, so I'll probably
> be back with questions very soon! Is there any document you can point me at which might help?

As I said above, for a web service wrapper that effectively implements canned queries, I would think that registering the service itself would make more sense.

> Also, anything you can tell me on the parameterisation of queries? Is the query in the document attached parameterised? If not, would it be possible to make it parameterised in a simple (possibly contrived) way? What sort of variable would the parameter represent? A constraint on a field value?

I have attached a possible example document that contains a parameterised query (DQP-request-document-example2.xml). The examples in the document illustrate how OGSA-DAI parameterised queries are currently represented. This might differ in the DQP case, but the idea would be similar. If the wrapper service is registered as I described above, the parameters will simply be represented as the method parameters. So, that should be straightforward to handle for your registry. If, however, we register the request-document, the parameters will be represented in the document. The exact format needs to be decided though.

As a general comment, we will need to prepare a more detailed description of the ideas discussed here. Probably Arijit will be more involved in this. Do you have a particular deadline/date by which you want us to respond with such a detailed description?

-- SimonMiles - 20 Dec 2004

Copyright © 2004 by the University of Southampton