• John Walker posted an update in the group Updates 1 month, 1 week ago

    2019 July 8

    We've been getting lots of warnings recently from
    what appears to be routine load on the server.  These conditions
    abate without any transient problems or lasting consequences.
    To reduce the noise level, I changed the $DANGER threshold in:
    from 1_500_000 to 1_250_000.  This is still well above the level
    where we'd get out of memory failures from php_fpm.
    On 2019-06-14, after getting undefined variable warnings from
    the /server/cron/video_theatre/ CRON job, I
    added diagnostic code which dumps the JSON response from the
    YouTube API request which resulted in the error.  At 22:00 UTC
    on 2019-07-06 this trap sprang and reported the following:
        videoID undefined for query,id&order=date&maxResults=1&fields=items/id/videoId,items/snippet(publishedAt,title,description)
          reply was:
         "items": [
          {
           "snippet": {
            "publishedAt": "2019-07-06T16:55:54.000Z",
            "title": "Grenade Launchers",
            "description": ""
           }
          }
         ]
          Parsed as:
        $VAR1 = {
            'items' => [
                         {
                           'snippet' => {
                             'publishedAt' => '2019-07-06T16:55:54.000Z',
                             'title' => 'Grenade Launchers',
                             'description' => ''
                           }
                         }
                       ]
          };
    The query was for the most recent video at the time from the
    "Forgotten Weapons" channel.  By the time I got around to
    running this down, a newer video had been published on the
    channel and the problem had gone away.  To try to run it down, I
    composed a manual query for the five most recent items on the
    channel, requesting all fields, not just the selected subset we
    do in the CRON job:,id&order=date&maxResults=5
    This returned the following relevant information for the
    offending item:
       "kind": "youtube#searchResult",
       "etag": "\"Bdx4f4ps3xCOOo1WZ91nTLkRZ_c/5PUFJbztY3rd5S1fhKKwMBQH-3Y\"",
       "id": {
        "kind": "youtube#playlist",
        "playlistId": "PL9e3UCcU00TQ03q09ZLuOSuBfqIcBg0q9"
       },
       "snippet": {
        "publishedAt": "2019-07-06T16:55:54.000Z",
        "channelId": "UCrfKGpvbEQXcbe68dzXgJuA",
        "title": "Grenade Launchers",
        "description": "",
        "thumbnails": {
         "default": {
          "url": "",
          "width": 120,
          "height": 90
         },
         "medium": {
          "url": "",
          "width": 320,
          "height": 180
         },
         "high": {
          "url": "",
          "width": 480,
          "height": 360
         }
        },
        "channelTitle": "Forgotten Weapons",
        "liveBroadcastContent": "none"
       }
    Now I get it!  This is a *playlist*, not a video, as identified
    by the '"kind": "youtube#playlist"' in the "id" section.  Since
    it has no videoId in that section, we come up blank for that field.
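    This failure mode can be reproduced outside the CRON job.  The
    following Python sketch (the CRON job's actual extraction code
    isn't shown here, so the helper function is hypothetical) parses a
    trimmed-down copy of the offending reply and shows that an item
    whose "id" is of kind "youtube#playlist" yields no videoId:

```python
import json

# Trimmed-down stand-in for the API reply quoted above: a playlist
# result whose "id" object carries a playlistId but no videoId.
reply = json.loads("""
{
 "items": [
  {
   "id": {
    "kind": "youtube#playlist",
    "playlistId": "PL9e3UCcU00TQ03q09ZLuOSuBfqIcBg0q9"
   },
   "snippet": {
    "publishedAt": "2019-07-06T16:55:54.000Z",
    "title": "Grenade Launchers",
    "description": ""
   }
  }
 ]
}
""")

def latest_video_id(reply):
    """Return the videoId of the first item which is actually a
       video, or None when the newest items are playlists or channel
       posts.  (Hypothetical helper, not the CRON job's own code.)"""
    for item in reply.get("items", []):
        ident = item.get("id", {})
        if ident.get("kind") == "youtube#video":
            return ident.get("videoId")
    return None

# The playlist item has no videoId, so the lookup comes up blank
# and this prints None.
print(latest_video_id(reply))
```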
    It turns out that the YouTube Data API query format:
    includes a "type" item which you can set to any combination of
    "channel", "playlist", or "video" (comma separated) and defaults
    to all three if not specified.  So, it appears all we need to do
    is add a "type=video" parameter to our query to see only videos
    and exclude playlist and channel posts.  Let's try it in a
    manual query:,id&type=video&order=date&maxResults=5
    Yup, it appears to work.
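    The amended query can also be composed programmatically.  This
    Python sketch builds the search request with the added type=video
    parameter; the channel ID is the one from the dump above, the
    endpoint is the standard YouTube Data API v3 search method, and
    API_KEY is a placeholder:

```python
from urllib.parse import urlencode

# Compose the search query with the new type=video parameter, which
# restricts results to videos and excludes playlist and channel posts.
params = {
    "part": "snippet,id",
    "channelId": "UCrfKGpvbEQXcbe68dzXgJuA",   # Forgotten Weapons
    "order": "date",
    "maxResults": 5,
    "type": "video",                           # the fix
    "key": "API_KEY",                          # placeholder
}
query = urlencode(params)
url = "https://www.googleapis.com/youtube/v3/search?" + query
print(url)
```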
    I added the "type=video" to the query composed in:
    Running too many of these queries risks exhausting the quota
    permitted by the YouTube Data API and being locked out for 24
    hours.  After having run a number of manual queries in debugging
    this, I'll just let the CRON job run in the normal course of
    affairs (we run it every 12 hours at 08:00 and 20:00) and see
    how it worked after the next scheduled run.
    Note that we still have the $Debug flag set in,
    which causes each YouTube Data query and reply to be dumped in a
    "debug" subdirectory of /server/cron/video_theatre.  As these
    files are named from the YouTube channel ID, they overwrite those
    from the last run and don't accumulate over time.  I'm going to
    leave this on until I'm confident we've fixed these intermittent
    query failures once and for all.
    Created an AWS Route 53 Health Check for, ID
    52cbed43-2a30-4205-b149-8601baad4a18.  This contacts the site
    via https: and requests the home page, using the defaults.  I did not
    set up an alarm for the health check failures, and will not do
    so until I'm happy it's running properly.  The health check is
    configured to poll from the default set of
    AWS regions:
        US East (N. Virginia)
        US West (N. California)
        US West (Oregon)
        EU (Ireland)
        Asia Pacific (Singapore)
        Asia Pacific (Sydney)
        Asia Pacific (Tokyo)
        South America (São Paulo)
    at the default interval of 30 seconds (you can't set it any
    longer). As no special options were requested, this health check
    falls under the free service provided for an AWS account.
    After running the health check for an hour, it reported the site
    healthy with no anomalies.  But the load of retrieving the home
    page, which involves firing up the full WordPress stack and
    dynamically generating a 100 Kb page on the fly, is sufficient,
    combined with the frequency of polls and the number of sites
    polling, to begin to chew into our CPU credit.  This won't do.
    I disabled the health check for the moment.  After I think about
    this for a while, I'll specify an endpoint for the health check
    which isn't as costly for the server to process (perhaps a
    static page outside the purview of WordPress or a CGI script
    which performs a proof of life for the site).  The health check
    for Fourmilab, which only retrieves the root frameset for the
    site as a static document, imposes no substantial load on that
    server.
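    As one illustration of the cheaper-endpoint idea, a proof of life
    check could be a small CGI script which answers without invoking
    WordPress at all.  The following Python sketch is purely
    hypothetical: probe_database() is a stub standing in for whatever
    site-specific test we'd actually perform before declaring the
    site alive.

```python
#!/usr/bin/env python3
# Hypothetical proof-of-life CGI endpoint for the Route 53 health
# check: answers cheaply, without firing up the WordPress stack.

def probe_database():
    # Placeholder: e.g. open a connection and run "SELECT 1".
    return True

def health_response():
    """Return the (status_line, body) pair for the health check."""
    if probe_database():
        return "Status: 200 OK", "alive"
    return "Status: 503 Service Unavailable", "degraded"

if __name__ == "__main__":
    status, body = health_response()
    print(status)
    print("Content-Type: text/plain")
    print()
    print(body)
```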
    The scheduled update of the Video Theatre at 20:00 UTC ran