JADE Environment Development Ideas

Using the "last replayed audit timestamp" as the _timestamp on the L records

It is my understanding that when the full clones are extracted into the .tbl files as a full set of Load ("L") records, the _timestamp value of each L record generated is the actual date and time of the extract (i.e. the write to the file). However, the data position of the L record is actually the point of replay at which the data pump was stopped - i.e. the "last replayed audit timestamp" as reported in SDS Admin.

If my understanding is correct, and I'm not missing anything, wouldn't the "last replayed audit timestamp" be a better choice for the L record _timestamp value than the time of extract, given that timestamp is the correct data position? This could even be made optional, at extract time or via the INI file.

I don't have a recent example, but I'm sure this has caused us problems in the past in the Live environment, where there is no "quiet" time on the Primary - so while the data pump is stopped for the extraction and load of the full clones, data is still being modified in the Primary. This may have been because we were relying exclusively on the _timestamp value for operation ordering/oid modification, rather than also considering the oid edition number, but I can't recall, sorry.

Even though we do have the _edition numbers to track the latest version of a particular oid, relying on the _timestamp values can become confusing when there is a modification in the Primary while the data pump is stopped but before the class L records are extracted.

For example, say we stop the data pump at 12:00 and at 12:01 an instance of a class is deleted on the Primary, so a Delete ("D") operation for that oid goes into RPS SQL with a _timestamp of 12:01. Then, with the data pump still stopped, we extract a clone of that class at 12:05, so we end up with an L record for the same oid with a _timestamp of 12:05 - though of course its edition number will be lower. Suppose we then start the data pump and sync: it is confusing to see an oid deleted at 12:01 yet still present in the set of load records at 12:05.
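
Purely as an illustration (a sketch in Python with invented oid, edition and timestamp values - not actual RPS output), ordering by _timestamp alone in this scenario disagrees with the edition order:

    from datetime import datetime

    # Hypothetical RPS rows for one oid: the delete is replayed at its journal
    # position (12:01), but the full-clone L record is stamped at extract time (12:05).
    records = [
        {"oid": "1234.5", "_operation": "D", "_timestamp": datetime(2022, 3, 30, 12, 1), "_edition": 7},
        {"oid": "1234.5", "_operation": "L", "_timestamp": datetime(2022, 3, 30, 12, 5), "_edition": 6},
    ]

    # Latest state by _timestamp: the L record wins, so the object looks as if
    # it still exists after its deletion.
    print(max(records, key=lambda r: r["_timestamp"])["_operation"])  # 'L' - misleading

    # Latest state by _edition: the delete correctly wins.
    print(max(records, key=lambda r: r["_edition"])["_operation"])    # 'D' - correct

If the L record instead carried the "last replayed audit timestamp" (12:00), ordering by _timestamp alone would agree with the edition order.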

I may be misunderstanding how this works, but if I'm not, could using the "last replayed audit timestamp" as the _timestamp on the L records be considered, since it accurately reflects the data position?

The reason for requesting this again now is that we believe it makes sense in terms of the data position, and the example I used in the contact did cause us an issue in Production last year. Admittedly there was a lot to consider in that issue - this was only part of it - but it made the investigation more complex.

In that case, in July last year, we had an issue with the RPS data pump following the monthly system release, which took a number of days to resolve (via a Par contact and fix). The full clones/class extracts and loads generated via the RPS mapping changes in that system release didn't get generated on the RPS node until a few days after the release, so they carried _timestamp values a couple of days after the actual release. Some instances in those full clones ("L" records) had been deleted on the Primary in the meantime, which led to the "D" records having _timestamps earlier than the "L" records. That added complexity to the investigation, and perhaps also caused some data loss/inconsistency - though I can't be sure on the latter as it was quite a while ago.

We would like to request this change if possible - if nothing else, it is one less thing for us to have to recall in any similar issues in future. We would really appreciate it if the _timestamp value could more closely match the data position it represents.


  • John Richards
  • Mar 30 2022
  • IDE Backlog
  • Roger Rawcliffe commented
    October 07, 2022 09:12

    Hi John


    Thank you very much for taking the time to understand our problem; we appreciate you picking up this improvement for us.


    Many thanks


    Rog

  • Admin
    John Richards commented
    October 07, 2022 01:00

    Hi Roger,

    Thank you for your further comments and the insights into the technical problem that you are trying to solve at Skipton.

    As you observe, the _timestamp value of each Load record is the actual date and time of the extract operation itself (i.e. when the write to the file occurs).

    We concur that this value is of limited use, especially given your data reintegration use case.

    After discussing various options, we also agree that using the 'last replayed audit timestamp' on the Secondary would be a better choice to allow correct sequencing of records.

    We will add an item to our work program to make this improvement.

    Regards, John Richards



  • Roger Rawcliffe commented
    May 13, 2022 14:54

    Hi John/Kevin

    Thank you for this; we appreciate you taking the time to look into it. I agree with Kevin's comments.

    I have asked one of my colleagues in the data engineering team for added insight into the app/business problem, and I hope he's detailed it well:


    "The issue relates to how we process the CORE data from RPS. All non-Many-Many classes are mapped in the Historical mode so we receive all Jade transactions with a timestamp. This is fundamental to how we process data for KNIGHT [our datawarehouse, effectively] in time-bound windows (followed by the creation of snapshots), and means that we can retrospectively catch-up on processing/snapshotting if we need to.

    Another factor to consider is that we retain deleted data in KNIGHT (marked as deleted based on the ‘D’ operation received in RPS) and that previously-deleted records are not returned in a Full Clone (‘L’).

    The way that the timestamp is set on a Full Clone means that we cannot just integrate Full Clone records with the C/U/D records. Instead we built some bespoke, quite complicated logic that works on certain assumptions.

    It works reasonably well in normal circumstances but can lead to issues in exceptional circumstances. For example, when we had the issue in July 2021, the data extracts were made on 18/07/2021 but the timestamp on them was 22/07/2021, when we got RPS running again. This caused data issues, particularly for objects that had undergone a subsequent deletion in the interim period.

    We’re now building a new platform in Azure using different processing tools and re-architecting from scratch, and there is uncertainty whether the previous bespoke logic can be recreated. It would be much slicker and much more robust for us if the timestamp on the Full Clone was more meaningful."
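
    As a purely illustrative sketch of the point above (Python, with a hypothetical oid and the July 2021 dates - not our actual KNIGHT pipeline), applying records in _timestamp order makes the later-stamped Full Clone record resurrect an object that was deleted in the interim:

        from datetime import datetime

        # Hypothetical rows for one oid: the delete carries its journal timestamp,
        # but the full-clone row is stamped when the extract was finally written.
        rows = [
            {"oid": "1234.5", "op": "D", "_timestamp": datetime(2021, 7, 20, 12, 1)},
            {"oid": "1234.5", "op": "L", "_timestamp": datetime(2021, 7, 22, 9, 30)},
        ]

        # Time-bound catch-up processing: apply records in _timestamp order and
        # keep the latest state seen for each oid.
        state = {}
        for row in sorted(rows, key=lambda r: r["_timestamp"]):
            state[row["oid"]] = "deleted" if row["op"] == "D" else "present"

        print(state)  # {'1234.5': 'present'} - the deleted object appears to come back

    If the Full Clone row instead carried the last replayed audit timestamp (around 18/07), it would sort before the delete and the deletion would stand.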


    I hope this helps, but please let me know if you need more details and we'll get them together.

    Many thanks

    Rog

  • Kevin Saul commented
    May 13, 2022 03:03

    I believe Roger is referring to problems that arise where SQL queries depend on the _timestamp ordering being consistent with the operation and object edition order. When a table is reloaded, due to a full clone/rebuild or as part of replaying re-orgs from a primary deployment, the time at which that occurs on the RPS secondary can be after a subsequent transaction has already occurred on the primary that updated/deleted objects in the table. When that subsequent transaction is replayed, the timestamp used is from the journal, reflecting when the transaction occurred, not when it was replayed. Using the current timestamp when reloading a table is therefore inconsistent, so we can end up in situations where we have L records that appear newer than records from U/D operations (having a record deleted before it was reloaded would be especially confusing!). What Roger's asking for doesn't seem unreasonable - the timestamp used for L records should always be based on the last journal replayed or being replayed.
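
    As a rough illustration of the consistency property described here (a Python sketch with invented values, not RPS internals): for any oid, sorting by _timestamp should give the same order as sorting by _edition. An L record stamped at extract time can break this; one stamped with the last journal replayed would not.

        from collections import defaultdict
        from datetime import datetime

        def timestamp_order_matches_edition_order(rows):
            """For every oid, sorting by _timestamp should yield editions in ascending order."""
            by_oid = defaultdict(list)
            for r in rows:
                by_oid[r["oid"]].append(r)
            for group in by_oid.values():
                editions_by_ts = [r["_edition"] for r in sorted(group, key=lambda r: r["_timestamp"])]
                if editions_by_ts != sorted(editions_by_ts):
                    return False
            return True

        rows = [
            {"oid": "1234.5", "op": "D", "_edition": 7, "_timestamp": datetime(2022, 3, 30, 12, 1)},
            {"oid": "1234.5", "op": "L", "_edition": 6, "_timestamp": datetime(2022, 3, 30, 12, 5)},  # stamped at extract time
        ]
        print(timestamp_order_matches_edition_order(rows))  # False - the reload appears newer than the delete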

  • Admin
    John Richards commented
    May 13, 2022 01:53

    Hi Roger,

    We discussed this JEDI at our review meeting this week. Our first thought was the same as the reply in the PAR, but before we go down that track could you please give us some insight into the application or business problem you are trying to solve? If you prefer, you could reply via the Skipton Teams channel.

    Cheers, John R.