XML in, JSONB out: Best-Effort XML → JSONB in YugabyteDB YSQL

Most modern apps want to store semi-structured data as JSON/JSONB, but a lot of legacy or external systems still speak XML. YugabyteDB YSQL has excellent JSONB support but no built-in XML type or parser.

In this tip we’ll build a generic XML → JSONB user-defined function (UDF) that:

  • ● Works in plain YugabyteDB YSQL / yugabyted.

  • ● Does not hard-code any tag names or schemas.

  • ● Handles nested elements and repeated tags (arrays).

  • ● Produces a nested jsonb tree from arbitrary “reasonable” XML.

We’ll also add an optional helper to auto-cast numeric-looking strings into numeric values.

Finally, we’ll run it on a realistic sample: a RiskAssessmentEnvelope XML payload from a (fictional) risk scoring gateway.

1. Generic XML → JSONB in YSQL

This function walks an XML string as plain text:

  • ● Strips the XML declaration (<?xml ...?>).

  • ● Scans for tags <Tag ...> / </Tag>.

  • ● Ignores attributes (keeps only tag names and content).

  • ● Builds a JSON tree:

    • ○ Each tag name becomes a key.

    • ○ If there are multiple siblings with the same tag name, they become a JSON array.

    • ○ Nested XML becomes nested objects.

⚠️ This is a best-effort XML parser, not a full XML engine. It works well for “API-style” XML but ignores attributes and namespaces.

Function: xml_to_jsonb_generic(text) → jsonb

				
					CREATE OR REPLACE FUNCTION xml_to_jsonb_generic(p_xml text)
RETURNS jsonb
LANGUAGE plpgsql
AS $$
DECLARE
  v_xml        text := p_xml;
  v_children   jsonb := '{}'::jsonb;  -- key -> array of values
  v_result     jsonb := '{}'::jsonb;

  pos          int := 1;
  xml_len      int;
  sub          text;
  start_rel    int;
  start_tag_pos int;
  end_tag_rel  int;
  end_tag_pos  int;
  tag_full     text;
  tag_name_raw text;
  tag_name     text;

  after_open_pos int;
  sub_after_open text;
  close_rel      int;
  close_pos      int;
  close_end_pos  int;
  inner_text     text;
  closing        text;

  child_value   jsonb;
  existing_vals jsonb;

  k             text;
  arr           jsonb;
BEGIN
  -- Strip XML declaration like <?xml ...?>
  v_xml := regexp_replace(v_xml, '^\s*<\?xml[^>]*>\s*', '', 'g');
  xml_len := length(v_xml);

  -- Walk the XML text and parse tags one by one
  WHILE pos <= xml_len LOOP
    sub := substring(v_xml from pos);
    start_rel := strpos(sub, '<');
    IF start_rel = 0 THEN
      EXIT; -- no more tags
    END IF;

    start_tag_pos := pos + start_rel - 1;

    -- Skip closing tags like </foo>
    IF substring(v_xml from start_tag_pos+1 for 1) = '/' THEN
      pos := start_tag_pos + 2;
      CONTINUE;
    END IF;

    -- Skip declarations / comments / DOCTYPE: <?...>, <! ... >
    IF substring(v_xml from start_tag_pos+1 for 1) IN ('?', '!') THEN
      sub := substring(v_xml from start_tag_pos);
      end_tag_rel := strpos(sub, '>');
      IF end_tag_rel = 0 THEN
        EXIT;
      END IF;
      pos := start_tag_pos + end_tag_rel;
      CONTINUE;
    END IF;

    -- Find end of opening tag: <tag ...>
    sub := substring(v_xml from start_tag_pos);
    end_tag_rel := strpos(sub, '>');
    IF end_tag_rel = 0 THEN
      EXIT; -- malformed
    END IF;
    end_tag_pos := start_tag_pos + end_tag_rel - 1;

    -- Extract tag name (first token, trim any trailing '/')
    tag_full := substring(v_xml from start_tag_pos+1 for end_tag_pos - start_tag_pos - 1);
    tag_name_raw := split_part(tag_full, ' ', 1);
    tag_name := trim(trailing '/' from tag_name_raw);

    -- Self-closing tag like <foo/> → treat as empty value
    IF right(tag_full, 1) = '/' THEN
      inner_text := '';
      close_end_pos := end_tag_pos + 1;
    ELSE
      -- Find matching closing tag </tag_name>
      closing := '</' || tag_name || '>';
      after_open_pos := end_tag_pos + 1;

      sub_after_open := substring(v_xml from after_open_pos);
      close_rel := strpos(sub_after_open, closing);
      IF close_rel = 0 THEN
        -- Malformed; treat as empty
        inner_text := '';
        close_end_pos := end_tag_pos + 1;
      ELSE
        close_pos := after_open_pos + close_rel - 1;
        inner_text := substring(v_xml from after_open_pos for close_pos - after_open_pos);
        close_end_pos := close_pos + length(closing);
      END IF;
    END IF;

    -- Determine value: nested structure or plain text?
    IF position('<' IN inner_text) > 0 THEN
      child_value := xml_to_jsonb_generic(inner_text);
    ELSE
      child_value := to_jsonb(btrim(inner_text));
    END IF;

    -- Append child_value to v_children[tag_name] array
    existing_vals := v_children -> tag_name;
    IF existing_vals IS NULL THEN
      v_children := v_children || jsonb_build_object(tag_name, jsonb_build_array(child_value));
    ELSE
      v_children :=
        jsonb_set(
          v_children,
          ARRAY[tag_name],
          existing_vals || jsonb_build_array(child_value),
          true
        );
    END IF;

    -- Advance position
    pos := close_end_pos;
  END LOOP;

  -- If no children parsed, return leaf text
  IF v_children = '{}'::jsonb THEN
    RETURN to_jsonb(btrim(v_xml));
  END IF;

  -- Convert arrays into either scalar or array in final result
  FOR k, arr IN SELECT key, value FROM jsonb_each(v_children) LOOP
    IF jsonb_array_length(arr) = 1 THEN
      v_result := v_result || jsonb_build_object(k, arr->0);
    ELSE
      v_result := v_result || jsonb_build_object(k, arr);
    END IF;
  END LOOP;

  RETURN v_result;
END;
$$;
				
			

What you get:

  • ● The root tag becomes the top-level key (e.g. "RiskAssessmentEnvelope").

  • ● Each child tag becomes a field.

  • ● Repeated tags at the same level (<UltimateBeneficialOwner>…</UltimateBeneficialOwner> multiple times) become JSON arrays.

2. Optional: Auto-casting numeric-looking strings in JSONB

By default, all values come out as strings (because XML is text). Sometimes that’s fine; sometimes you want { "OwnershipPercent": 37.5 } as a number, not "37.5".

This helper walks any JSONB and:

  • ● Leaves numbers, booleans, null as-is.

  • ● For string values:

    • ○ If they match a simple numeric pattern (123 or 123.45), converts to numeric.

    • ○ Otherwise leaves them as strings.

  • ● Recurses through objects and arrays.

Function: jsonb_autocast_numbers(jsonb) → jsonb

				
					CREATE OR REPLACE FUNCTION jsonb_autocast_numbers(p_j jsonb)
RETURNS jsonb
LANGUAGE plpgsql
AS $$
DECLARE
  v_type text;
  v_key  text;
  v_val  jsonb;
  v_obj  jsonb := '{}'::jsonb;
  v_arr  jsonb := '[]'::jsonb;
  v_text text;
BEGIN
  IF p_j IS NULL THEN
    RETURN NULL;
  END IF;

  v_type := jsonb_typeof(p_j);

  IF v_type = 'string' THEN
    -- Get the underlying text
    v_text := p_j #>> '{}';

    -- Simple numeric pattern: integer or decimal
    IF v_text ~ '^[0-9]+(\.[0-9]+)?$' THEN
      RETURN to_jsonb(v_text::numeric);
    ELSE
      RETURN p_j;
    END IF;

  ELSIF v_type = 'object' THEN
    FOR v_key, v_val IN
      SELECT key, value FROM jsonb_each(p_j)
    LOOP
      v_obj := v_obj || jsonb_build_object(
        v_key,
        jsonb_autocast_numbers(v_val)
      );
    END LOOP;
    RETURN v_obj;

  ELSIF v_type = 'array' THEN
    FOR v_val IN
      SELECT value FROM jsonb_array_elements(p_j)
    LOOP
      v_arr := v_arr || jsonb_build_array(
        jsonb_autocast_numbers(v_val)
      );
    END LOOP;
    RETURN v_arr;

  ELSE
    -- number, boolean, null → unchanged
    RETURN p_j;
  END IF;
END;
$$;
				
			

Use it when you want numeric-looking things ("37.5", "3000", "2") converted, without hard-coding field names.

3. Example: RiskAssessmentEnvelope XML → JSONB

Here’s a sample XML from a fictitious Risk Gateway:

				
					<?xml version="1.0" encoding="UTF-8"?>
<RiskAssessmentEnvelope xmlns="http://schemas.tr.com/risk/1.0">
  <Header>
    <MessageId>RISK-REQ-742981</MessageId>
    <Timestamp>2025-11-18T16:42:11Z</Timestamp>
    <SourceSystem>YB-Risk-Gateway</SourceSystem>
    <CorrelationId>74a7a09b-e678-4b1c-8328-52e9f6ff3d92</CorrelationId>
    <Environment>UAT</Environment>
  </Header>

  <ClientContext>
    <Organization>
      <OrganizationId>ORG-003914</OrganizationId>
      <OrganizationName>Global Compliance Advisors LLC</OrganizationName>
      <RegionCode>EMEA</RegionCode>
      <MarketSegment>FINANCIAL_INSTITUTIONS</MarketSegment>
    </Organization>
    <User>
      <UserId>j.smith.risk.analyst</UserId>
      <Role>SeniorRiskOfficer</Role>
      <Department>EnterpriseRisk</Department>
      <Locale>en-GB</Locale>
    </User>
  </ClientContext>

  <RiskRequestBody>
    <RiskAssessmentRequest>
      <RequestId>RA-2025-000172</RequestId>
      <AssessmentType>KYC_ONBOARDING</AssessmentType>
      <Channel>API</Channel>
      <Priority>HIGH</Priority>

      <Subject>
        <SubjectId>CUST-2048557</SubjectId>
        <SubjectType>LEGAL_ENTITY</SubjectType>
        <LegalName>Nordic Atlas Trading S.A.</LegalName>
        <CountryOfIncorporation>LU</CountryOfIncorporation>
        <PrimaryJurisdiction>EU</PrimaryJurisdiction>
        <IndustryCode>5221</IndustryCode>
        <IndustryDescription>Securities and Commodities Trading</IndustryDescription>
      </Subject>

      <OwnershipStructure>
        <UltimateBeneficialOwner>
          <Name>Elena Kovác</Name>
          <OwnershipPercent>37.5</OwnershipPercent>
          <CountryOfResidence>CZ</CountryOfResidence>
        </UltimateBeneficialOwner>
        <UltimateBeneficialOwner>
          <Name>Atlas Holdings Trust</Name>
          <OwnershipPercent>62.5</OwnershipPercent>
          <CountryOfResidence>LI</CountryOfResidence>
        </UltimateBeneficialOwner>
      </OwnershipStructure>

      <RiskDimensions>
        <Dimension name="AML">
          <RequestedChecks>
            <CheckCode>PEP_SCREENING</CheckCode>
            <CheckCode>SANCTIONS_LISTS</CheckCode>
            <CheckCode>ADVERSE_MEDIA</CheckCode>
          </RequestedChecks>
        </Dimension>
        <Dimension name="KYC">
          <RequestedChecks>
            <CheckCode>IDENTITY_VERIFICATION</CheckCode>
            <CheckCode>BUSINESS_REGISTRY_LOOKUP</CheckCode>
          </RequestedChecks>
        </Dimension>
        <Dimension name="CREDIT">
          <RequestedChecks>
            <CheckCode>COUNTERPARTY_RISK_SCORE</CheckCode>
          </RequestedChecks>
        </Dimension>
      </RiskDimensions>

      <Configuration>
        <ScoringModelId>TR-RISK-MODEL-V5</ScoringModelId>
        <MaxLatencyMs>3000</MaxLatencyMs>
        <IncludeDebugInfo>false</IncludeDebugInfo>
        <ReturnLevelOfDetail>SUMMARY_AND_FACTS</ReturnLevelOfDetail>
      </Configuration>
    </RiskAssessmentRequest>

    <NotificationPreferences>
      <Callback>
        <CallbackUrl>https://api.example.com/risk/callback/v1</CallbackUrl>
        <Method>POST</Method>
        <AuthType>OAUTH2_CLIENT_CREDENTIALS</AuthType>
        <ClientId>risk-client-019</ClientId>
      </Callback>
      <EmailAlerts>
        <Enabled>true</Enabled>
        <SeverityThreshold>HIGH</SeverityThreshold>
        <Recipient>risk-alerts@example.com</Recipient>
      </EmailAlerts>
    </NotificationPreferences>
  </RiskRequestBody>
</RiskAssessmentEnvelope>
				
			
3.1 Convert XML → JSONB
				
					SELECT xml_to_jsonb_generic($$
<?xml version="1.0" encoding="UTF-8"?>
<RiskAssessmentEnvelope xmlns="http://schemas.tr.com/risk/1.0">
  <Header>
    <MessageId>RISK-REQ-742981</MessageId>
    <Timestamp>2025-11-18T16:42:11Z</Timestamp>
    <SourceSystem>YB-Risk-Gateway</SourceSystem>
    <CorrelationId>74a7a09b-e678-4b1c-8328-52e9f6ff3d92</CorrelationId>
    <Environment>UAT</Environment>
  </Header>

  <ClientContext>
    <Organization>
      <OrganizationId>ORG-003914</OrganizationId>
      <OrganizationName>Global Compliance Advisors LLC</OrganizationName>
      <RegionCode>EMEA</RegionCode>
      <MarketSegment>FINANCIAL_INSTITUTIONS</MarketSegment>
    </Organization>
    <User>
      <UserId>j.smith.risk.analyst</UserId>
      <Role>SeniorRiskOfficer</Role>
      <Department>EnterpriseRisk</Department>
      <Locale>en-GB</Locale>
    </User>
  </ClientContext>

  <RiskRequestBody>
    <RiskAssessmentRequest>
      <RequestId>RA-2025-000172</RequestId>
      <AssessmentType>KYC_ONBOARDING</AssessmentType>
      <Channel>API</Channel>
      <Priority>HIGH</Priority>

      <Subject>
        <SubjectId>CUST-2048557</SubjectId>
        <SubjectType>LEGAL_ENTITY</SubjectType>
        <LegalName>Nordic Atlas Trading S.A.</LegalName>
        <CountryOfIncorporation>LU</CountryOfIncorporation>
        <PrimaryJurisdiction>EU</PrimaryJurisdiction>
        <IndustryCode>5221</IndustryCode>
        <IndustryDescription>Securities and Commodities Trading</IndustryDescription>
      </Subject>

      <OwnershipStructure>
        <UltimateBeneficialOwner>
          <Name>Elena Kovác</Name>
          <OwnershipPercent>37.5</OwnershipPercent>
          <CountryOfResidence>CZ</CountryOfResidence>
        </UltimateBeneficialOwner>
        <UltimateBeneficialOwner>
          <Name>Atlas Holdings Trust</Name>
          <OwnershipPercent>62.5</OwnershipPercent>
          <CountryOfResidence>LI</CountryOfResidence>
        </UltimateBeneficialOwner>
      </OwnershipStructure>

      <RiskDimensions>
        <Dimension name="AML">
          <RequestedChecks>
            <CheckCode>PEP_SCREENING</CheckCode>
            <CheckCode>SANCTIONS_LISTS</CheckCode>
            <CheckCode>ADVERSE_MEDIA</CheckCode>
          </RequestedChecks>
        </Dimension>
        <Dimension name="KYC">
          <RequestedChecks>
            <CheckCode>IDENTITY_VERIFICATION</CheckCode>
            <CheckCode>BUSINESS_REGISTRY_LOOKUP</CheckCode>
          </RequestedChecks>
        </Dimension>
        <Dimension name="CREDIT">
          <RequestedChecks>
            <CheckCode>COUNTERPARTY_RISK_SCORE</CheckCode>
          </RequestedChecks>
        </Dimension>
      </RiskDimensions>

      <Configuration>
        <ScoringModelId>TR-RISK-MODEL-V5</ScoringModelId>
        <MaxLatencyMs>3000</MaxLatencyMs>
        <IncludeDebugInfo>false</IncludeDebugInfo>
        <ReturnLevelOfDetail>SUMMARY_AND_FACTS</ReturnLevelOfDetail>
      </Configuration>
    </RiskAssessmentRequest>

    <NotificationPreferences>
      <Callback>
        <CallbackUrl>https://api.example.com/risk/callback/v1</CallbackUrl>
        <Method>POST</Method>
        <AuthType>OAUTH2_CLIENT_CREDENTIALS</AuthType>
        <ClientId>risk-client-019</ClientId>
      </Callback>
      <EmailAlerts>
        <Enabled>true</Enabled>
        <SeverityThreshold>HIGH</SeverityThreshold>
        <Recipient>risk-alerts@example.com</Recipient>
      </EmailAlerts>
    </NotificationPreferences>
  </RiskRequestBody>
</RiskAssessmentEnvelope>
$$);
				
			

You’ll get a JSONB structure (formatted here for readability) roughly like:

				
					{
    "RiskAssessmentEnvelope": {
        "Header": {
            "MessageId": "RISK-REQ-742981",
            "Timestamp": "2025-11-18T16:42:11Z",
            "Environment": "UAT",
            "SourceSystem": "YB-Risk-Gateway",
            "CorrelationId": "74a7a09b-e678-4b1c-8328-52e9f6ff3d92"
        },
        "ClientContext": {
            "User": {
                "Role": "SeniorRiskOfficer",
                "Locale": "en-GB",
                "UserId": "j.smith.risk.analyst",
                "Department": "EnterpriseRisk"
            },
            "Organization": {
                "RegionCode": "EMEA",
                "MarketSegment": "FINANCIAL_INSTITUTIONS",
                "OrganizationId": "ORG-003914",
                "OrganizationName": "Global Compliance Advisors LLC"
            }
        },
        "RiskRequestBody": {
            "RiskAssessmentRequest": {
                "Channel": "API",
                "Subject": {
                    "LegalName": "Nordic Atlas Trading S.A.",
                    "SubjectId": "CUST-2048557",
                    "SubjectType": "LEGAL_ENTITY",
                    "IndustryCode": "5221",
                    "IndustryDescription": "Securities and Commodities Trading",
                    "PrimaryJurisdiction": "EU",
                    "CountryOfIncorporation": "LU"
                },
                "Priority": "HIGH",
                "RequestId": "RA-2025-000172",
                "Configuration": {
                    "MaxLatencyMs": "3000",
                    "ScoringModelId": "TR-RISK-MODEL-V5",
                    "IncludeDebugInfo": "false",
                    "ReturnLevelOfDetail": "SUMMARY_AND_FACTS"
                },
                "AssessmentType": "KYC_ONBOARDING",
                "RiskDimensions": {
                    "Dimension": [
                        {
                            "RequestedChecks": {
                                "CheckCode": [
                                    "PEP_SCREENING",
                                    "SANCTIONS_LISTS",
                                    "ADVERSE_MEDIA"
                                ]
                            }
                        },
                        {
                            "RequestedChecks": {
                                "CheckCode": [
                                    "IDENTITY_VERIFICATION",
                                    "BUSINESS_REGISTRY_LOOKUP"
                                ]
                            }
                        },
                        {
                            "RequestedChecks": {
                                "CheckCode": "COUNTERPARTY_RISK_SCORE"
                            }
                        }
                    ]
                },
                "OwnershipStructure": {
                    "UltimateBeneficialOwner": [
                        {
                            "Name": "Elena Kovác",
                            "OwnershipPercent": "37.5",
                            "CountryOfResidence": "CZ"
                        },
                        {
                            "Name": "Atlas Holdings Trust",
                            "OwnershipPercent": "62.5",
                            "CountryOfResidence": "LI"
                        }
                    ]
                }
            },
            "NotificationPreferences": {
                "Callback": {
                    "Method": "POST",
                    "AuthType": "OAUTH2_CLIENT_CREDENTIALS",
                    "ClientId": "risk-client-019",
                    "CallbackUrl": "https://api.example.com/risk/callback/v1"
                },
                "EmailAlerts": {
                    "Enabled": "true",
                    "Recipient": "risk-alerts@example.com",
                    "SeverityThreshold": "HIGH"
                }
            }
        }
    }
}
				
			

Note:

  • ● The root element is "RiskAssessmentEnvelope".

  • ● UltimateBeneficialOwner is an array (two owners).

  • ● RiskDimensions.Dimension is an array (AML, KYC, CREDIT).

  • ● Attributes like Dimension name="AML" are not retained (we ignore attributes).

3.2 Optional: Auto-cast numeric strings

To turn numeric-like strings ("37.5", "5221", "3000") into numeric values:

				
					SELECT jsonb_autocast_numbers(
         xml_to_jsonb_generic($$
<?xml version="1.0" encoding="UTF-8"?>
<RiskAssessmentEnvelope xmlns="http://schemas.tr.com/risk/1.0">
  ... full XML here ...
</RiskAssessmentEnvelope>
$$)
       ) AS envelope_json_typed;

				
			

Now things like:

  • ● "OwnershipPercent": "37.5"37.5 (numeric)

  • ● "MaxLatencyMs": "3000"3000

  • ● "IndustryCode": "5221"5221

… are actual numeric values in the JSON tree, without you hard-coding those field names.

4. Storing and querying the JSONB in YugabyteDB

A simple table to store these envelopes:

				
					CREATE TABLE risk_envelopes (
  id              uuid PRIMARY KEY,
  message_id      text,
  created_at      timestamptz DEFAULT now(),
  envelope_json   jsonb
);
				
			

Insert from XML:

				
					INSERT INTO risk_envelopes (id, message_id, envelope_json)
SELECT
  gen_random_uuid(),
  'RISK-REQ-742981',
  jsonb_autocast_numbers(
    xml_to_jsonb_generic($$
<?xml version="1.0" encoding="UTF-8"?>
<RiskAssessmentEnvelope xmlns="http://schemas.tr.com/risk/1.0">
  ... full XML here ...
</RiskAssessmentEnvelope>
$$)
  );
				
			
4.1 Example queries

All HIGH-priority requests:

				
					SELECT id, message_id
FROM risk_envelopes
WHERE envelope_json
      -> 'RiskAssessmentEnvelope'
      -> 'RiskRequestBody'
      -> 'RiskAssessmentRequest'
      ->> 'Priority' = 'HIGH';
				
			

Example:

				
					yugabyte=# SELECT id, message_id
yugabyte-# FROM risk_envelopes
yugabyte-# WHERE envelope_json
yugabyte-#       -> 'RiskAssessmentEnvelope'
yugabyte-#       -> 'RiskRequestBody'
yugabyte-#       -> 'RiskAssessmentRequest'
yugabyte-#       ->> 'Priority' = 'HIGH';
                  id                  |   message_id
--------------------------------------+-----------------
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | RISK-REQ-742981
(1 row)
				
			

All UBOs with OwnershipPercent > 25:

				
					SELECT
  id,
  ubo->>'Name'                 AS ubo_name,
  (ubo->>'OwnershipPercent')::numeric AS pct
FROM risk_envelopes e
CROSS JOIN LATERAL
  jsonb_array_elements(
    e.envelope_json
      -> 'RiskAssessmentEnvelope'
      -> 'RiskRequestBody'
      -> 'RiskAssessmentRequest'
      -> 'OwnershipStructure'
      -> 'UltimateBeneficialOwner'
  ) AS ubo
WHERE (ubo->>'OwnershipPercent')::numeric > 25;
				
			

Example:

				
					yugabyte=# SELECT
yugabyte-#   id,
yugabyte-#   ubo->>'Name'                 AS ubo_name,
yugabyte-#   (ubo->>'OwnershipPercent')::numeric AS pct
yugabyte-# FROM risk_envelopes e
yugabyte-# CROSS JOIN LATERAL
yugabyte-#   jsonb_array_elements(
yugabyte(#     e.envelope_json
yugabyte(#       -> 'RiskAssessmentEnvelope'
yugabyte(#       -> 'RiskRequestBody'
yugabyte(#       -> 'RiskAssessmentRequest'
yugabyte(#       -> 'OwnershipStructure'
yugabyte(#       -> 'UltimateBeneficialOwner'
yugabyte(#   ) AS ubo
yugabyte-# WHERE (ubo->>'OwnershipPercent')::numeric > 25;
                  id                  |       ubo_name       | pct
--------------------------------------+----------------------+------
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | Elena Kovác          | 37.5
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | Atlas Holdings Trust | 62.5
(2 rows)
				
			

All requested risk checks (flattened):

				
					SELECT DISTINCT
  e.id,
  rc AS check_code
FROM risk_envelopes e
CROSS JOIN LATERAL
  jsonb_array_elements(
    e.envelope_json
      -> 'RiskAssessmentEnvelope'
      -> 'RiskRequestBody'
      -> 'RiskAssessmentRequest'
      -> 'RiskDimensions'
      -> 'Dimension'
  ) AS dim
CROSS JOIN LATERAL
  jsonb_array_elements_text(
    CASE jsonb_typeof(dim->'RequestedChecks'->'CheckCode')
      WHEN 'array' THEN dim->'RequestedChecks'->'CheckCode'
      ELSE jsonb_build_array(dim->'RequestedChecks'->'CheckCode')
    END
  ) AS rc;
				
			

Example:

				
					yugabyte=# SELECT DISTINCT
yugabyte-#   e.id,
yugabyte-#   rc AS check_code
yugabyte-# FROM risk_envelopes e
yugabyte-# CROSS JOIN LATERAL
yugabyte-#   jsonb_array_elements(
yugabyte(#     e.envelope_json
yugabyte(#       -> 'RiskAssessmentEnvelope'
yugabyte(#       -> 'RiskRequestBody'
yugabyte(#       -> 'RiskAssessmentRequest'
yugabyte(#       -> 'RiskDimensions'
yugabyte(#       -> 'Dimension'
yugabyte(#   ) AS dim
yugabyte-# CROSS JOIN LATERAL
yugabyte-#   jsonb_array_elements_text(
yugabyte(#     CASE jsonb_typeof(dim->'RequestedChecks'->'CheckCode')
yugabyte(#       WHEN 'array' THEN dim->'RequestedChecks'->'CheckCode'
yugabyte(#       ELSE jsonb_build_array(dim->'RequestedChecks'->'CheckCode')
yugabyte(#     END
yugabyte(#   ) AS rc;
                  id                  |        check_code
--------------------------------------+--------------------------
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | COUNTERPARTY_RISK_SCORE
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | PEP_SCREENING
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | ADVERSE_MEDIA
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | IDENTITY_VERIFICATION
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | SANCTIONS_LISTS
 c1933f5d-9bde-47f2-a89c-4ba11d02b0f2 | BUSINESS_REGISTRY_LOOKUP
(6 rows)
				
			
5. Caveats and when to parse XML in the app tier

This pattern is practical and works well for:

  • ● “API-shaped” XML with nested elements and repeated tags.

  • ● Use cases where you mostly need a JSON representation for querying.

  • ● Demos and prototypes where installing extensions or server-side languages isn’t desirable.

But it’s still a simplified XML view:

  • ● Attributes (like Dimension name="AML") are ignored.

  • ● Namespaces are not modeled; only the element names are used.

  • ● It assumes reasonably well-formed XML; malformed or very exotic XML may not parse correctly.

  • ● For very high-throughput or compliance-sensitive use cases, a proper XML library in the application tier (Java, Python, Go, etc.) is still the best option.

Summary

With just two YSQL UDFs:

  • ● xml_to_jsonb_generic(text) → jsonb

  • ● jsonb_autocast_numbers(jsonb) → jsonb

… you can:

  • ● Ingest XML into YugabyteDB Anywhere / yugabyted without extra extensions.

  • ● Get a reasonably faithful JSONB representation of nested XML.

  • ● Automatically convert numeric-looking values to actual numbers.

  • ● Use YugabyteDB’s distributed SQL and JSONB features to query, index, and analyze what used to be “just XML blobs”.

Have Fun!