Converting my website to AMP (Accelerated Mobile Pages)
Introduction
Viewing web pages on a mobile device (who are we even kidding, on bigger devices too) is a pretty awful experience. Everything takes a long time to load and shifts around while doing so, page layouts are often entirely broken on phones, and there are a bunch of obnoxious interstitial or otherwise obtrusive ads all over the place. In October 2015, Google announced the Accelerated Mobile Pages Project (hereafter "AMP", but also try using "amphtml" in web searches), a restricted subset of HTML5 that requires conformant pages to follow practices that guarantee they can be loaded and rendered quickly by mobile devices. (There's no reason you can't also use AMP to make pages for beefier non-mobile devices, although the practice doesn't seem to be common yet.) Among other restrictions, this is accomplished by disallowing external resources like JavaScript and CSS files that can block the rest of the page from being rendered, and by requiring pages to define the sizes of things like images upfront so they can be loaded asynchronously without shifting other elements around. There's also a Google-run CDN called the AMP Cache that promises to automatically cache and serve all valid AMP documents quickly.
At first glance, AMP doesn't seem all that different from Facebook
Instant Articles or Apple News Format
(why do you hate the word "the" so much, Apple?). Both of those standards
have the stated goal of making it more pleasant to read articles on mobile devices, too.
Facebook Instant
Articles are also HTML5-based, and Instant Articles and AMP basically even have the same logo.
But there are some big differences in how the standards are used. AMP pages can be served directly from your personal server and displayed by any modern browser (since they're just regular HTML5 pages with some AMP-provided JavaScript and CSS). Facebook Instant Articles and Apple News articles can only be viewed using Facebook's mobile apps or the Apple News app, respectively, and Facebook and Apple ultimately get to decide what gets published; the formats are designed to improve the article-reading experience on those closed platforms but don't do anything to improve the open web. Google searches are one way to get to the AMP versions of pages, but AMP is also used by Twitter, Pinterest, and LinkedIn, and Microsoft even announced that they were adding support to the Bing App in September 2016.
After reading about AMP, I decided to take a stab at creating AMP versions of the pages on my personal site (where you're probably reading this). This page has a list of the main requirements for an AMP document. The ones that stuck out the most to me are:
- Pages need to include AMP boilerplate CSS via <style> and contain an AMP <script> element.
- All custom CSS rules need to be inlined in a single <style> element in <head> (setting per-element CSS via style attributes isn't permitted).
- No custom JavaScript is allowed. Depending on the page, this maybe isn't as bad as it sounds: there's a big collection of custom elements that are implemented via web components.
- <img> is replaced with <amp-img>, which can apparently do all kinds of fancy lazy-loading and transcoding tricks. width and height attributes need to be present so the page can be laid out before any images have been downloaded.
- AMP pages need to link to a (possibly-non-AMP) canonical URL.
- There are some more easy, miscellaneous requirements, like including an amp attribute in the <html> element. (You can instead include a ⚡ character (U+26A1, "HIGH VOLTAGE SIGN"), which seems like a great idea if you're a masochist who wants to constantly copy-and-paste that symbol instead of just typing amp.)
My site
So, taking stock of the current state of my site: It's aggressively static. There's no database or
server-side scripting, and I generate all of the HTML files locally before copying them to
NearlyFreeSpeech.NET, my webhost. To do that, I have a Ruby script named template_lib.rb that defines
functions like start_page (returns <html><head>...</head><body>), end_page (returns </body></html>),
and a few other helpers like start_box and end_box to emit containers. Each page on my site has a
.rhtml file that looks similar to this:
<%= start_page("Page title", [other args]) %>
<%= start_box("Hello", [other args]) %>
<p>Here's some content!</p>
<p>(I never said it was interesting content.)</p>
<%= add_image("image.png", 1024, 768, [other args]) %>
<%= end_box %>
<%= end_page([args]) %>
Each <%= ... %>
tag gets replaced by the output of the Ruby code inside of it. There's
also a build script that checks if there's an up-to-date .html
file for each .rhtml
file, and if not, generates one using ERB with a command
like:
erb -r ./template_lib.rb my_page.rhtml >my_page.html
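The up-to-date check could look roughly like the following. This is a sketch of the idea, not the actual build script: stale?, erb_command, and rebuild_stale are hypothetical names, and only the erb command line itself comes from the text above.

```ruby
require 'shellwords'

# Returns true if the generated file is missing or older than its template.
def stale?(template, output)
  !File.exist?(output) || File.mtime(output) < File.mtime(template)
end

# Builds the erb command line shown above for one template (hypothetical helper).
def erb_command(template)
  output = template.sub(/\.rhtml\z/, '.html')
  "erb -r ./template_lib.rb #{Shellwords.escape(template)} >#{Shellwords.escape(output)}"
end

# Regenerates only the pages in dir whose templates are newer than their output.
def rebuild_stale(dir = '.')
  Dir.glob(File.join(dir, '*.rhtml')).each do |template|
    output = template.sub(/\.rhtml\z/, '.html')
    system(erb_command(template)) if stale?(template, output)
  end
end
```

Calling rebuild_stale with no arguments would process the current directory.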
Most pages have almost no JavaScript. On desktop, there's a fixed-position site tree (called navbox
in my CSS rules) on the left side of the page. On mobile, the navbox appears at the top of the page and
there's a button that can be clicked to expand and collapse it using some trivial JavaScript that changes an
element's classes to trigger
CSS
transitions.
template_lib.rb
also contains the site hierarchy and uses it to generate the navbox. The
hierarchy is rendered differently for different pages: if the current page contains multiple headings or
subpages of its own, that portion of the tree is expanded.
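To illustrate the expand-only-the-relevant-branch idea, here is a simplified sketch. The Node structure and the function names are assumptions made for this example, and the rule it uses (expand any branch that leads to the current page) is a simplification of the behavior described above.

```ruby
# A site-tree node: title, URL, and child nodes (hypothetical structure).
Node = Struct.new(:title, :url, :children)

# True if the node is the current page or has it somewhere below.
def contains?(node, current_url)
  node.url == current_url || node.children.any? { |c| contains?(c, current_url) }
end

# Renders nested <ul> lists, descending only into branches that lead to the current page.
def render_nav(nodes, current_url)
  items = nodes.map do |n|
    expand = contains?(n, current_url) && !n.children.empty?
    sub = expand ? render_nav(n.children, current_url) : ''
    "<li><a href=\"#{n.url}\">#{n.title}</a>#{sub}</li>"
  end
  "<ul>#{items.join}</ul>"
end
```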
(Thankfully, almost all of the preceding is outdated: I finally rewrote the site generation code in the Go programming language in May 2020. You can see the new generation code at https://codeberg.org/derat/intransigence.)
There's a base set of CSS rules, and then separate sets for desktop and mobile that are hidden behind @media(min-width:641px)
and @media(max-width:640px)
media selectors. I got on a kick recently about trying to get a
perfect score from Google PageSpeed
Insights, so I'd already made some fortuitous changes like inlining these CSS rules into each page
instead of loading them as external files. I'd also dropped Google Analytics, which makes you lose a point
due to loading a JavaScript file that has a cache expiry of just two hours. (There's the added bonus that
not tracking users seems like a nice thing to do.)
Remember when I said "almost no JavaScript"? That's true for most of the pages, but there are two that are more complicated: my glucometer page, which includes a lot of SVG graphs rendered by d3.js, and my pullup-bars-in-San-Francisco page, which has an embedded Google Maps map with custom markers and a bit of custom JS. I decided I'd just skip those for now.
Comparing the current site to AMP's list of rules, it was clear that the mobile navbox behavior would need to change: pages can't use custom JS to set classes to expand or collapse the navbox. I skimmed through some lists of AMP components and discovered amp-sidebar. It seemed like it'd be a better user experience than what I have right now on mobile, so let's use it!
It was also clear that I'd need to mush all of my custom CSS together into a single <style>
element instead of having separate elements for the base, desktop, and mobile rules. While I was at it, I
figured I should probably drop the desktop rules entirely when generating AMP pages and also omit the media
selector around the mobile rules for AMP — otherwise, an AMP page loaded in a desktop browser would be
missing a bunch of styles.
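A minimal sketch of that CSS-merging step is below. The function names are hypothetical, and for simplicity the non-AMP branch also emits a single block rather than separate elements. The amp-custom attribute is what AMP expects on the one custom <style> element.

```ruby
# Combines the rule sets into the body of one <style> element.
# For AMP, the desktop rules are dropped and the mobile rules are emitted unwrapped.
def combined_css(base, desktop, mobile, amp)
  return "#{base}\n#{mobile}" if amp

  [base,
   "@media(min-width:641px){#{desktop}}",
   "@media(max-width:640px){#{mobile}}"].join("\n")
end

# Wraps the combined rules in the appropriate <style> element.
def style_element(base, desktop, mobile, amp)
  open_tag = amp ? '<style amp-custom>' : '<style>'
  "#{open_tag}#{combined_css(base, desktop, mobile, amp)}</style>"
end
```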
Converting one page by hand
To get started on the conversion, I made a copy of a simple .html
page, inlined the base and
mobile CSS rules, added amp
to the <html>
element, and loaded it in Chrome
with #development=1
on the end of the URL and the JS console open to see what's broken.
Wow, a lot of stuff is broken!
AMP displays pretty decent error messages for validation errors, so I:
- changed <img> tags to <amp-img> and added closing </amp-img> tags,
- saw a ton of errors about the navbox and just commented out its HTML and CSS for now,
- added minimum-scale=1 to the <meta name="viewport"> tag (it already had initial-scale=1),
- dropped style="background-position-x: ..." attributes from some elements,
- dropped style="width: ..." attributes from image containers,
- removed a DOM-loaded <script> tag that wired up toggle-navbox event listeners,
- moved the logo to the top of the page instead of above the navbox,
- noticed that AMP overrides <body>'s margin, setting it to 0, and instead set a margin on the top-level container element,
- and made a bunch of other miscellaneous CSS additions.
It validates!
The next step was adding an <amp-sidebar>
to bring back the navigation links. This was
surprisingly straightforward: I added a <script>
before the main v0.js
AMP
<script>
to pull in the sidebar JavaScript, moved the navbox <div>
within a new pair of <amp-sidebar>
opening and closing tags, and then added an <amp-img>
(temporarily using my old expand/collapse triangle icon as a placeholder) with an on
attribute
telling it to open the navbox when clicked:
<amp-sidebar id="sidebar" layout="nodisplay" side="right">
<div id="navbox">
<!-- navigation items elided... -->
</div>
</amp-sidebar>
<div id="top-container">
<!-- site logo elided... -->
<amp-img id="menu-button" src="collapse.png" alt="menu"
width="19" height="17" tabindex="0" role="button"
on="tap:sidebar.open"></amp-img>
</div>
Updating the workflow
I felt like I had a pretty good grasp of the sort of changes that I'd need to make to get most of the pages
validating now, so I turned toward adding this stuff to template_lib.rb
. Some of it seemed
desirable for non-AMP pages as well, like adding minimum-scale=1
, so I did that first. For the
rest, like using <amp-img>
rather than <img>
, I knew I'd need to make
template_lib.rb
produce different output depending on whether it was generating an AMP page or
a non-AMP one. An environment variable seemed like the easiest way to pass data into ERB, so I updated the
build script to run ERB a second time for each .rhtml
template with AMP=1
prepended to the beginning of the command line, like:
AMP=1 erb -r ./template_lib.rb my_page.rhtml >my_page.amp.html
In template_lib.rb
, I set a global variable at the top of the file using:
$amp = ENV['AMP'] == '1'
and then later did stuff like:
def add_image(...)
img_tag = $amp ? 'amp-img' : 'img'
img_end = $amp ? '</amp-img>' : ''
# additional code elided...
"<#{img_tag} ...>#{img_end}"
end
I spent roughly forever trying to figure out why this wasn't working when I first used
$amp = ENV['AMP'] == 1. Then I remembered that Ruby, unlike JavaScript, doesn't coerce strings into
integers or vice versa for comparisons. I think I might've even followed this up with still more confusion
by using $amp = ENV['AMP'] and running AMP= erb ... for the non-AMP pages, forgetting that empty
strings, the string "0", and even the integer 0 evaluate to true in Ruby. Sigh.
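Both pitfalls are easy to reproduce in isolation. In this small demonstration, amp_enabled? and truthy? are hypothetical helper names used only for illustration:

```ruby
# ENV values are always strings (or nil), and == never coerces between String and Integer.
def amp_enabled?(value)
  value == '1'
end

# Only nil and false are falsy in Ruby; everything else counts as true.
def truthy?(value)
  value ? true : false
end

puts '1' == 1            # false
puts amp_enabled?('1')   # true
puts amp_enabled?('')    # false
puts truthy?('')         # true
puts truthy?('0')        # true
puts truthy?(0)          # true
puts truthy?(nil)        # false
```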
Once I'd fixed that, I integrated the rest of my changes into template_lib.rb. This included adding AMP's
boilerplate CSS rules, along with creating a new file containing my own AMP-specific CSS rules (inlined into
my big <style>
tag by template_lib.rb
) and updating many of the script's
functions to check $amp
and produce appropriate output. There was also a bit of trickiness
around using the correct order for the elements in my <head>
tag; AMP has strict rules
around this.
Notable tricky bits
One possibly-interesting change I had to make involved places where template_lib.rb was setting random
background-position-x
inline styles on some elements to vary their appearance. AMP doesn't
allow style
attributes on elements, so I had to find some other way of doing this. With the
help of Stack Overflow, I realized that I could approximate the current look (with periodic repetition) by
using the :nth-of-type pseudo-class
to vary the position used on successive elements:
.contentbox:nth-of-type(8n) .boxtop {
background-position-x: 0;
}
.contentbox:nth-of-type(8n+1) .boxtop {
background-position-x: 1024px;
}
/* additional rules elided... */
.contentbox:nth-of-type(8n+6) .boxtop {
background-position-x: -768px;
}
.contentbox:nth-of-type(8n+7) .boxtop {
background-position-x: 256px;
}
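Rules like these are repetitive enough that they could be generated rather than typed out. The sketch below assumes a hypothetical position_rules helper, and the offsets passed to it are placeholders, not the values elided above.

```ruby
# Emits one :nth-of-type rule per offset so the pattern repeats every offsets.length boxes.
def position_rules(offsets)
  n = offsets.length
  offsets.each_with_index.map do |offset, i|
    step = i.zero? ? "#{n}n" : "#{n}n+#{i}"
    ".contentbox:nth-of-type(#{step}) .boxtop {\nbackground-position-x: #{offset};\n}"
  end.join("\n")
end

# Placeholder offsets giving a pattern that repeats every three boxes.
puts position_rules(%w[0 100px -50px])
```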
Some unexpected work came from AMP's requirement that AMP pages link to their content's canonical (usually
non-AMP) URLs using a <link rel="canonical">
tag, while the canonical pages link to the
AMP URLs with <link rel="amphtml">
— this gives search crawlers a way to discover AMP
pages. I was using relative URLs throughout my site, and template_lib.rb didn't have any idea of what an
absolute, canonical URL should even look like. I had to add some more logic to the script to make it able to
generate links in both directions. I should mention here that there's an
AMP URL API that you can use to request
both cached- and non-cached AMP copies of a given canonical URL, but it seems like it's targeted at sites
and apps like Pinterest or Twitter that might want to rewrite tons of external links to point at
faster-loading AMP versions instead.
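The link generation in both directions might be sketched like this. SITE_ROOT and cross_link are hypothetical names, and the .html to .amp.html naming follows the file names used elsewhere in this post.

```ruby
SITE_ROOT = 'https://www.erat.org'

# Returns the <link> tag for a page's <head>: AMP pages point at the canonical
# page, and canonical pages point at their AMP counterpart.
def cross_link(path, amp)
  if amp
    %(<link rel="canonical" href="#{SITE_ROOT}#{path}">)
  else
    amp_path = path.sub(/\.html\z/, '.amp.html')
    %(<link rel="amphtml" href="#{SITE_ROOT}#{amp_path}">)
  end
end
```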
I'm able to get an idea of how often my regular pages are being loaded by looking at my web server's logs, but AMP pages might end up getting served directly from Google via the AMP Cache. I decided to add Google Analytics tracking back to the AMP pages so I can at least see if anyone's loading them. Luckily, this was basically no work at all: You can add <amp-analytics> elements that send pings in response to various events (in my case, page load). For Google Analytics, I just copy-and-pasted Google's "page tracking" example and inserted my Analytics-supplied property ID.
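For reference, a helper that emits such an element could look roughly like this. The configuration follows the general shape of Google's page-tracking example as I understand it; analytics_tag is a hypothetical name, 'UA-XXXXX-Y' is a placeholder property ID, and the page would also need the amp-analytics extension script in its <head>.

```ruby
require 'json'

# Builds an <amp-analytics> element that sends a pageview when the page becomes visible.
def analytics_tag(property_id)
  config = {
    'vars' => { 'account' => property_id },
    'triggers' => {
      'trackPageview' => { 'on' => 'visible', 'request' => 'pageview' }
    }
  }
  [
    '<amp-analytics type="googleanalytics">',
    '<script type="application/json">',
    JSON.pretty_generate(config),
    '</script>',
    '</amp-analytics>'
  ].join("\n")
end

puts analytics_tag('UA-XXXXX-Y') # placeholder property ID
```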
How to link to things
So at this point, there are three ways to load my pages. Taking the main index.html
page as an
example, you can use:
- https://www.erat.org/index.html, the canonical non-AMP version served by my webhost
- https://www.erat.org/index.amp.html, the AMP version served by my webhost
- https://cdn.ampproject.org/c/s/www.erat.org/index.amp.html, the AMP version served by Google from the AMP Cache
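The mapping from an origin URL to the cache URL is mechanical. Here is a sketch based on my understanding of the URL format used above, where /c marks a content document and the extra /s marks an https origin; amp_cache_url is a hypothetical name.

```ruby
require 'uri'

# Maps an origin URL to the cdn.ampproject.org form shown above.
def amp_cache_url(origin_url)
  uri = URI.parse(origin_url)
  secure = uri.scheme == 'https' ? '/s' : ''
  "https://cdn.ampproject.org/c#{secure}/#{uri.host}#{uri.path}"
end

puts amp_cache_url('https://www.erat.org/index.amp.html')
# https://cdn.ampproject.org/c/s/www.erat.org/index.amp.html
```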
The CDN version ought to load quickly, but that URL is really ugly. There's also a big question that you
should ask whenever using someone else's version of a URL instead of your own canonical URL: how long
will this keep working? I know that www.erat.org
URLs will work as long as I keep paying
the bills, but Google has... uh... a
not-so-stellar track record
around supporting products that don't make money. Luckily,
someone
asked about this on Stack Overflow. The circa-July-2016 answer there was, "We recommend that people link
to the canonicals[,] not to the Google AMP Cache versions of their pages." In my case, I think that
"canonicals" refers to my non-AMP canonical URLs — I don't want to link directly to my AMP pages since
they won't look right for desktop users (although this might change if or when I update the AMP pages to
look reasonable on larger devices). There's been more discussion since then on the linked (and now-deleted)
Twitter conversation, and there are now
AMP Cache
Guidelines that strongly suggest that it should be safe to link to CDN URLs:
4. [An AMP cache] pledges to maintain URL space forever (even beyond the lifetime of the cache itself):
i. This can be achieved by donating the URL space to a trustworthy third party entity such as archive.org.
ii. This means that, should a cache decide to no longer operate, URLs should redirect to the origin URL or be served by another cache.
But there's something else that makes me reluctant to depend on the Google AMP Cache: I don't have any
recourse when things break. If the cache goes down entirely, or starts serving a stale or incorrect version
of my page, or decides that it no longer validates, or whatever, there's nothing I can do about it. I
actually ran into this immediately after uploading my AMP pages and loading them via the cdn.ampproject.org
URLs: the orange image across the top of my content boxes was missing. From the dev console, I could see
that the AMP Cache was returning a 404 error instead of the image. This was especially weird since other
images served by the cache (and even the similar blue image across the top of the navbox!) were working. I
submitted a bug report, and it turned out to be a problem with the way that the cache tries to resize
extremely-wide-and-short images (the orange image was 4096x2). But it stayed broken for weeks, which wasn't
confidence-inspiring.
In the meantime, I set the image's background-color style (which I should've done in the first place) so visitors at least see a solid orange color when loading pages from the cache. Then I gave up and switched to a differently-sized image that didn't trigger the bug.
I eventually decided to stick to the following practices:
- When I link to one of my pages from somewhere else, I should use the non-AMP version hosted at my domain, e.g. https://www.erat.org/dark_mode.html.
- My non-AMP pages can continue to link to other pages and images, JS files, etc. from my domain using relative URLs like /dark_mode.html or /resources/erat.svg.
- My AMP pages should use relative URLs in <amp-img> elements (since I want the AMP Cache to be able to rewrite those to be served from the cache), but they should use absolute URLs like https://www.erat.org/dark_mode.amp.html (note that I'm linking to the AMP versions of other pages) and https://www.erat.org/pubkey.asc when linking to other pages or downloadable files within the domain. The AMP Cache seems to try to rewrite relative URLs intelligently when it's serving pages, but I figure it's safer to do this myself instead of relying on whatever heuristics it's using.
To make these rules easier to follow, I added a helper function to template_lib.rb
that
rewrites a given relative URL appropriately depending on whether it's generating an AMP or non-AMP page and
then updated all of my .rhtml
files to call it instead of including relative URLs directly.
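A helper along these lines could encode the linking rules. This is only a sketch of the rules as stated above, not the real function: the name site_url and the :image, :page, and :file labels are assumptions.

```ruby
# Rewrites a site-relative URL according to the linking rules above.
# kind is :image, :page, or :file (hypothetical labels).
def site_url(path, kind, amp, root = 'https://www.erat.org')
  return path unless amp            # non-AMP pages keep relative URLs
  return path if kind == :image     # let the AMP Cache rewrite image URLs

  # Other pages are linked by their AMP versions; files keep their names.
  path = path.sub(/\.html\z/, '.amp.html') if kind == :page
  "#{root}#{path}"
end
```

In a template, a call such as site_url('/dark_mode.html', :page, $amp) would then stand in for the bare relative URL.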
There's actually one other option here: I could configure my web server to detect mobile user agents and
redirect them to the AMP version of the page automatically. This can result in pretty frustrating moments
for users, though: how many times have you ended up seeing the mobile version of a page on a desktop machine
after following a shared link, or struggled to convince a site to serve the full desktop version of a page
to your phone instead of some useless scaled-down mobile version? Additionally, a redirect forces an extra
network round-trip before the client can render anything (although
HTTP/2 server push
may change this). Google has a page describing
some best practices
for serving separate mobile content, but the <link rel="amphtml">
tag already makes
it likely that Google will direct mobile users to the AMP versions of my pages, so I decided not to try to
do anything tricky.
Validation
My build script already validates each page that it generates by uploading it to validator.w3.org with a command like:
curl --silent --show-error \
-F 'action=check' \
-F 'uploaded_file=@[local file]' \
http://validator.w3.org/check
and looking for the text "This document was successfully checked" in the output. That seems to work fine for HTML5 documents, but it'd be nice to do something more to check that the AMP pages are abiding by all the additional rules that are present there — this is even more important since the AMP Cache will only cache valid documents. There's an online AMP validator, but it seems oddly geared toward validate-as-you-type use, and it wasn't clear to me whether there's any way to upload a local file to it. The only way I could find to validate local files was by using the amphtml-validator Node.js package as described in the "Command Line Tool" section of the Validate AMP Pages doc. I wouldn't call it blazingly fast, but it seems to get the job done. Note that the AMP validator currently reports failure when passed AMP Cache URLs.
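The success check on the W3C validator's response amounts to a substring test. In this sketch, w3c_valid? and failed_pages are hypothetical names, and the responses are assumed to have already been fetched with the curl command above.

```ruby
SUCCESS_TEXT = 'This document was successfully checked'

# True if the validator's response contains the success message.
def w3c_valid?(response_body)
  response_body.include?(SUCCESS_TEXT)
end

# Returns the pages whose validator responses lack the success message.
# responses maps file names to response bodies.
def failed_pages(responses)
  responses.reject { |_file, body| w3c_valid?(body) }.keys
end
```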
Iframes!
I had AMP versions of almost all of my site's pages now, but there were still those two holdouts: the
glucometer page with the d3.js <svg>
graphs and the
pullup-bars page with the embedded map. For the glucometer page, it was
pretty clear there was no way I was going to get the graphs working within the AMP page — d3.js is a
JavaScript library, after all. I could've just screenshotted the graphs and used static images in the AMP
page, but after I saw that AMP supports iframes via the
<amp-iframe>
element, I decided to try to move the graph-rendering code into a new non-AMP HTML file and embed it
into the AMP page.
After searching for terms like "responsive svg", I found some pages like
this one
that were super-helpful for setting the right styles on my <svg>
elements to make them
scale to their containers' widths. I eventually ended up with this in the AMP page:
iframe.graph {
background-color: transparent;
border: solid 1px #ddd;
padding: 0;
overflow: hidden;
}
<amp-iframe class="graph" width="300" height="200"
layout="responsive" frameborder="0" sandbox="allow-scripts"
src="https://www.erat.org/iframes/glucometer_graph.html?w=300&h=200&g=historical">
</amp-iframe>
I put an absolute URL in the src
attribute since the framed page isn't AMP-compliant and won't
get cached by the AMP Cache. I strongly suspect that the cache is already built to handle this case by
rewriting links to non-cached framed pages, but I decided to be explicit instead of counting on that. It
does make development a bit trickier, though, as local copies of the main page still embed the online iframe
page.
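On the template side, the tag could be produced by a small helper like the one sketched below. graph_iframe is a hypothetical name; the URL, attributes, and query parameters are taken from the markup above. The helper escapes the ampersands in the src attribute, which browsers decode back to the same URL.

```ruby
require 'cgi'
require 'uri'

# Builds the <amp-iframe> element for one graph; w and h fix the aspect ratio.
def graph_iframe(graph, width, height)
  query = URI.encode_www_form(w: width, h: height, g: graph)
  src = "https://www.erat.org/iframes/glucometer_graph.html?#{query}"
  %(<amp-iframe class="graph" width="#{width}" height="#{height}" ) +
    %(layout="responsive" frameborder="0" sandbox="allow-scripts" ) +
    %(src="#{CGI.escapeHTML(src)}"></amp-iframe>)
end

puts graph_iframe('historical', 300, 200)
```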
glucometer_graph.html
contains some JavaScript that extracts the w
, h
,
and g
query parameters from window.location
and renders the requested graph at the
requested size. (I believe the size is needed to determine the graph's aspect ratio, which should remain
fixed even when the graph is scaled.) This page uses the following CSS to make its SVG element take up all
the space allotted to it:
body {
margin: 0;
overflow: hidden;
}
svg.graph {
width: 100%;
height: 100%;
display: inline-block;
position: absolute;
background-color: white;
}
Some additional attributes are set on the <svg>
element when d3.js adds it when the page
is first loaded (width
and height
here are the dimensions that are used for the
graph on the non-AMP page and that define the bounds used when placing elements in it):
var svg = d3.select(selector)
.append("svg:svg")
.data(/* data to graph... */)
.attr("preserveAspectRatio", "xMinYMin meet")
.attr("viewBox", "0 0 " + width + " " + height)
.attr("class", "graph");
This seems to mostly work. The iframe maintains its aspect ratio while expanding to fill the page width, and the SVG element within it scales the position and dimensions of all of its elements accordingly. AMP is happy since the loading and execution of this slow, bulky JavaScript doesn't block the main page; in fact, I even get an animated spinner in each iframe if I scroll down to it before it's ready.
I'm punting on making an AMP version of the other page with the embedded map, because I'm not quite sure how or even if I'll be able to make the map interact with the rest of the page. It's likely that I'll need to pare down the functionality of the page in its AMP version.
Structured data
At this point, I felt like I was pretty much done with the converting-to-AMP effort, but I got email from Google (by virtue of having registered my site in the Search Console) complaining that my site was missing Schema.org structured data to make it easier to index. There's a strong suggestion at the bottom of that AMP basic markup page to include this, too: it's apparently a requirement to be considered for inclusion in the Google Search news carousel. The chance of that happening for any of my pages seems like it's basically zero, but I figured I'd go ahead and add structured data in the JSON-LD format anyway.
The AMP Project provides a
listing of metadata
examples, including
this
one that's pretty much exactly what I wanted. I didn't see this at first, so I instead spent too long
reading about JSON-LD and the fields
that Google wants and exploring the Schema.org people's bizarre taxonomical urges that drive them to try
to classify everything in the world. All that I actually needed to do was to add some code that includes a
JSON blob like this in each page's <head>
:
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Article",
"mainEntityOfPage": "https://www.erat.org/sleep.html",
"headline": "How to Get More Sleep",
"description": "Tips for sleeping longer",
"datePublished": "2011-07-06",
"dateModified": "2011-07-17",
"author": {
"@type": "Person",
"name": "Daniel Erat",
"email": "dan-website@erat.org"
},
"publisher": {
"@type": "Organization",
"name": "erat.org",
"url": "https://www.erat.org/",
"logo": {
"@type": "ImageObject",
"url": "https://www.erat.org/resources/erat-125-20160914.png",
"width": 125,
"height": 45
}
},
"image": {
"@type": "ImageObject",
"url": "https://www.erat.org/sleep/shades.jpg",
"width": 1024,
"height": 768
}
}
</script>
I hadn't kept track of when I originally wrote some of these pages, so I had to do a lot of git spelunking
and just guess in some cases. I also don't have images that are at least 696 pixels wide (the minimum that Google's guidelines asked for) for most of them, so I just left out
the image
section for those. Google's
Structured Data Testing Tool was helpful for checking that I didn't mess anything up.
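Generating the blob from per-page data, with the image section omitted when no suitable image exists, might look like the sketch below. structured_data and the page hash keys are hypothetical, and the author and publisher sections are left out here for brevity.

```ruby
require 'json'

# Builds the JSON-LD <script> element; the image section is left out when image is nil.
def structured_data(page, image = nil)
  data = {
    '@context' => 'http://schema.org',
    '@type' => 'Article',
    'mainEntityOfPage' => page[:url],
    'headline' => page[:title],
    'datePublished' => page[:published],
    'dateModified' => page[:modified]
  }
  data['image'] = { '@type' => 'ImageObject' }.merge(image) if image
  "<script type=\"application/ld+json\">\n#{JSON.pretty_generate(data)}\n</script>"
end
```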
Note that adding lots of data above the <link rel="amphtml">
tags on your canonical pages
can apparently result in your AMP pages not being found by Google, so don't
do that.
Wrapping up
So, that mostly wrapped up the effort. I'm still a bit ambivalent about some aspects of AMP and hesitant to
go AMP-only. There are users with older browsers that don't support recent standards like web components,
and while I'd expect much of the content on my pages to still render for them, tags like <amp-sidebar>
and <amp-img>
may just get silently dropped.
AMP allows <noscript> tags in some places
to handle the no-JavaScript scenario, but I'm concerned about the middle ground of browsers that support
JavaScript, but poorly.
I'm also ceding some control of my site, both by depending on AMP JavaScript files that could change at any time (I've noticed a few issues around sidebar behavior; one was fixed quickly and the other is still being discussed) and by letting my site be mirrored by the AMP Cache (remember the earlier image-serving bug?). For the former issue, if the canonical AMP JS files break horribly, I guess I can always switch to using my own locally-hosted copies from an older version. There's a good chance I wouldn't even notice if my site breaks due to an upstream change, but one could also make the argument that browser vendors are always pushing new versions of their code that could break my site as well.
Amusingly (I guess... ?), the 100/100 mobile speed score I'd managed to get from Google's PageSpeed Insights tool actually fell to 89/100 in the AMP versions of my pages. This seems to be due to the pages not being renderable until the AMP JavaScript is loaded (which is kind of the whole point of AMP) and the AMP CDN serving the scripts with a 50-minute cache timeout, shorter than the one-week-to-one-year range that PageSpeed Insights recommends for static-ish content. I think the idea is that AMP will be (or maybe already is) ubiquitous enough that browsers will typically already have AMP's JavaScript files cached when loading a page, obviating the need for blocking on additional network requests. I suppose it's a good thing that PageSpeed Insights doesn't carve out exceptions for AMP, although it feels weird that my score dropped when adopting a standard that's trying to adhere to best practices for creating mobile-friendly sites.
This touches on another criticism of AMP that I've seen: it's entirely possible to get all of the advantages of AMP (with the possible exception of preloading), or to potentially achieve even better performance, without actually using AMP. As an extreme example of this, consider a simple HTML page with no JavaScript and no external resources. It'd be unfortunate if search engines started favoring AMP pages over just-as-fast non-AMP pages. So far, this hasn't happened.
The AMP tech lead's stance on the mailing list seems reasonable to me: "Its true that super trivial pages are faster than trivial AMP pages, but we find that most real world pages do not fall into the former bucket. AMP goes beyond raw performance. It e.g. supports a prerender mode that gives both privacy guarantees and saves CPU and bandwidth. Super fast non-AMP pages do not support such a mode (at least cannot be provably known to support it)." You can find more discussion about the politics of AMP scattered across the web, of course.
Looking at this as a user, AMP seems great. I preferentially click links with the AMP logo next to them due to how quickly they load and how much less obtrusive the advertising on them is. For example, compare the non-AMP version of this Los Angeles Times article against the AMP version (note that you'll need to follow the AMP link on a phone or with a mobile user-agent lest you get redirected to the non-AMP page). The non-AMP version takes a long time to load, and shifts around, and displays an interstitial ad covering the whole page, and then includes an auto-playing video that follows me when I try to scroll past it! In contrast, the AMP page loads quickly and just includes a few tasteful ads.
And even as a developer, AMP seems like a good thing to me. Just doing some simple benchmarking of my graphics page from my home network, I see similar times to load the main HTML file and do the first paint (likely because I'd already optimized the page). The AMP version sometimes takes a bit longer to completely finish loading the page; I suspect that this is due to below-the-fold or background scripts that don't impact the user experience. The story changes when I load the AMP Cache's copy. Now, the HTML file frequently loads in 100 milliseconds, the first paint happens before the 200-millisecond mark, and the whole thing is often done in less than half a second. I'm essentially getting a Google-run CDN for free.
Page | Got HTML (ms) | First paint (ms) | Loaded (ms) |
---|---|---|---|
non-AMP | 260 | 320 | 650 |
AMP | 260 | 320 | 800 |
AMP Cache | 100 | 170 | 400 |
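To put those numbers in relative terms, here is a quick calculation. It treats the table values as milliseconds, per the prose above; change_vs_baseline is a hypothetical helper.

```ruby
# Timings from the table above, in milliseconds.
TIMINGS = {
  'non-AMP'   => { html: 260, paint: 320, loaded: 650 },
  'AMP'       => { html: 260, paint: 320, loaded: 800 },
  'AMP Cache' => { html: 100, paint: 170, loaded: 400 }
}.freeze

# Percentage change of a metric relative to the non-AMP page (negative means faster).
def change_vs_baseline(page, metric)
  base = TIMINGS['non-AMP'][metric]
  ((TIMINGS[page][metric] - base) * 100.0 / base).round(1)
end

puts change_vs_baseline('AMP Cache', :loaded)  # -38.5
puts change_vs_baseline('AMP', :loaded)        # 23.1
```

So the AMP Cache copy finishes loading about 38% sooner than the non-AMP page, while the self-hosted AMP page takes about 23% longer.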
After I did all of this, I came across a page describing someone else's experiences generating an AMP version of their website. They don't seem too different from my own.
To summarize, I like AMP and think you should use it! If you give it a try, here are some useful resources:
- amphtml-discuss mailing list
- Stack Overflow questions tagged "amp-html"
- GitHub issue tracker for the amphtml project
- Google Webmasters Help Community (there used to be a dedicated "accelerated-mobile-pages-amp" category, but categories seem like they're gone now)
Addendum: HERE COMES A NEW IFRAME (2016-10-27)
I was feeling ambitious and/or cocky, so I decided to try to create an AMP version of that one last page with the embedded map. Here's a high-level overview of its original state:
- There's a <script> element that loads the Maps JavaScript API.
- There's a chunk of my own JavaScript that uses the API to embed a map in the page and place a bunch of markers on it.
- Below the map, there are blurbs describing the various featured locations.
- When a map marker is clicked, an info bubble is displayed containing a #some-blurb-id link that jumps down to the corresponding blurb.
- Each blurb has a link of its own that calls a JavaScript function that scrolls the page back up to the embedded map and selects the corresponding marker.
There's no place for custom JavaScript in AMP, so it seemed obvious that I'd need to use the same
iframe-based approach that I used for the glucometer page. It was straightforward to
move all of the page's JavaScript into a new page and embed it using an <amp-iframe>
element (which maybe loads a tiny bit slower than embedding the map directly into the main page). The tricky
part was going to be preserving (or at least approximating) the existing
clicking-on-map-marker-scrolls-main-page, clicking-on-link-in-main-page-selects-map-marker behavior.
First, I decided to try to get this working again on the non-AMP version of the page. Per the same-origin policy, the framed page and the main page are able to access each other as long as they're served from the same place. The framed page can reach out using window.parent or window.top, so I made the map info bubble set window.top.location.hash to scroll the main page to a given blurb.
Going in the other direction, I decided that it'd be cleanest if the main page uses postMessage to send objects to the framed page; this works even across origins. I ended up with the following in the main page:
// Wire up links to post messages to the iframe to activate markers.
document.addEventListener('DOMContentLoaded', function() {
  var iframe = document.getElementById('map');
  var anchors = document.getElementsByClassName('map-link');
  for (var i = 0; i < anchors.length; i++) {
    var a = anchors[i];
    var id = a.parentElement.parentElement.id;
    var f = iframe.contentWindow.postMessage.bind(
        iframe.contentWindow, {id: id}, '*', []);
    a.addEventListener('click', f, false);
  }
}, false);
The framed page then listens for messages that get posted to it:
window.addEventListener('message', function(e) {
  selectPoint(e.data.id, true);
}, false);
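Since the main page posts with a target origin of '*', the listener above will also act on messages from any other window that holds a reference to the frame. Here's a minimal sketch of a stricter handler, assuming I know which origins the outer page can be served from. makeMessageHandler and the allowed-origin list are hypothetical names of my own, not part of the original page; selectPoint is the existing function used above.

```javascript
// Build a 'message' handler that ignores unexpected senders and payloads.
// allowedOrigins is an assumption: the original listener did not check origins.
function makeMessageHandler(allowedOrigins, selectPoint) {
  return function(e) {
    // Drop messages from origins that aren't explicitly listed.
    if (allowedOrigins.indexOf(e.origin) === -1) {
      return false;
    }
    // Drop payloads that don't look like {id: 'some-blurb-id'}.
    var data = e.data;
    if (!data || typeof data.id !== 'string' || data.id === '') {
      return false;
    }
    selectPoint(data.id, true);
    return true;
  };
}

// In the framed page this could be wired up with something like:
//   window.addEventListener('message',
//       makeMessageHandler(['https://www.erat.org'], selectPoint), false);
```

The handler returns a boolean only to make it easy to exercise outside a browser; addEventListener ignores the return value.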
This worked, so it was time to look at the AMP version. The JavaScript in the main page had to go when $amp is set; I instead made the blurbs link to #map to just scroll the map into view without selecting any markers. On the iframe side, I initially tried to continue updating window.top.location.hash, or even window.top.location.href after reading the old href and appending the fragment. AMP requires iframes to be sandboxed, so to convince the browser to permit this, I needed to set allow-top-navigation, along with allow-same-origin... which AMP explicitly forbids when AMP pages and framed content are served from the same origin. The reason for this makes sense: an AMP page might get served from a cache, in which case it'll no longer be coming from the same origin as the framed page. It seems reasonable to block this upfront so developers don't get confused later when their cached pages don't work.
I discovered that I was still able to assign to window.top.location.href from the framed page without allow-same-origin, just not read from it. Assigning to location.hash seems like it's treated the same as reading from location.href (maybe that's how it's implemented?), so I wasn't able to do that. And to make the iframe construct a main page URL containing a fragment, I realized I'd somehow need to pass it the main page URL; otherwise, it wouldn't know if it should navigate to https://www.erat.org/pullups.html#foo or https://www.erat.org/pullups.amp.html#foo.
I tried passing the outer URL to the framed page via a query string:
<amp-iframe id="map" width="640" height="480"
            layout="responsive" frameborder="0"
            sandbox="allow-scripts allow-top-navigation"
            src="https://www.erat.org/iframes/pullups_map.html?https://www.erat.org/pullups.amp.html">
</amp-iframe>
In the framed page, I grabbed this URL using:
var pageUrl = window.location.search.substring(1);
and then later scrolled the main page with:
window.top.location = pageUrl + '#' + id;
This mostly works, but when the page is served by the CDN (or by a local server I'm running for development), this code navigates back to the copy of the page on my webhost. The outer page can't pass its actual location from window.location because doing so would require JavaScript.
A Stack Overflow answer suggesting document.referrer offered a glimmer of hope. I was initially concerned that the referrer wouldn't be passed to the framed page, but the following returns the outer URL (minus any fragment) regardless of whether it's on my webhost, the CDN, or a local server:
var pageUrl = document.referrer.split('#', 1)[0];
So this brings the AMP page almost to parity with the non-AMP version: clicking links in blurbs jumps up to the map without selecting markers, but the more-important links in the map scroll down to the corresponding blurbs as before.
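The two approaches can also be combined. document.referrer can come back as an empty string (for example, when a referrer policy suppresses it), so here's a rough sketch that prefers the referrer and falls back to the query string. The function names are hypothetical helpers of my own, and the fallback behavior is an assumption rather than something the live page does.

```javascript
// Strip any fragment from a URL string.
function stripFragment(url) {
  return url.split('#', 1)[0];
}

// Pick the outer page's URL: prefer the referrer, then fall back to the
// query string ('?https://...') carried by the <amp-iframe> src attribute.
// Note: decodeURIComponent throws a URIError on malformed percent-escapes.
function resolveOuterUrl(referrer, search) {
  if (referrer) {
    return stripFragment(referrer);
  }
  if (search && search.length > 1) {
    return stripFragment(decodeURIComponent(search.substring(1)));
  }
  return null;
}

// Build the URL to assign to window.top.location for a given blurb ID.
function blurbUrl(referrer, search, id) {
  var base = resolveOuterUrl(referrer, search);
  return base === null ? null : base + '#' + encodeURIComponent(id);
}

// In the framed page, usage would look something like:
//   var url = blurbUrl(document.referrer, window.location.search, id);
//   if (url !== null) window.top.location = url;
```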
When I tried to load the AMP page on a laptop instead of a phone, I ran into another restriction:
"amp-iframe may not appear close to the top of the document (except for iframes that use placeholder as described below). They must be either 600px away from the top or not within the first 75% of the viewport when scrolled to the top – whichever is smaller."
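Taken literally, that rule boils down to a single threshold. Here's a rough sketch of how I read it; the function names are mine, the handling of the exact boundary is a guess, and the real check lives inside the AMP runtime.

```javascript
// Minimum distance (in px) from the top of the document, per a literal
// reading of the rule: the smaller of 600px and 75% of the viewport height.
function minIframeOffset(viewportHeight) {
  return Math.min(600, 0.75 * viewportHeight);
}

// Whether an iframe whose top edge sits iframeTop px down the page would
// satisfy the rule. Treating "exactly at the threshold" as allowed is an
// assumption.
function iframeAllowedAt(iframeTop, viewportHeight) {
  return iframeTop >= minIframeOffset(viewportHeight);
}
```

With a hypothetical 640px-tall phone viewport the threshold works out to 480px, while a 1000px-tall laptop window gets the full 600px. That might explain why a page can pass on a phone and fail on a laptop, although the layout differences between the two matter as well.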
Well, that seems straightforward enough. I loaded the page on a laptop with a high-DPI display and took a screenshot of the map. Then I added an <amp-img> element with a placeholder attribute between the <amp-iframe> tags:
<amp-img layout="fill" placeholder
         src="pullups/map_placeholder-1448.png"
         width="640" height="480"
         alt="[map placeholder]"></amp-img>
I decided it'd also be nice to show a placeholder on the non-AMP page, so I first styled the <iframe> in the outer page to display the image before its contents show up:
.mapbox iframe {
  background-image: url('pullups/map_placeholder-1448.png');
  background-size: 100% 100%;
  /* additional properties... */
}
Within the framed page, the map itself can take a while to load, so I also styled the <body> to display the placeholder image:
body {
  margin: 0;
  overflow: hidden;
  background-image: url('pullups/map_placeholder-1448.png');
  background-size: 100% 100%;
}
The map object apparently fires an event once it has fully loaded and become idle, so I made it initially hidden:
#map-div {
  width: 100%;
  height: 100%;
  display: inline-block;
  visibility: hidden;
  position: absolute;
}
#map-div.loaded {
  visibility: visible;
}
Here's the JavaScript that makes the map visible once it's ready:
// Only show the map once it's fully loaded.
google.maps.event.addListenerOnce(map, 'idle', function() {
  mapDiv.className = 'loaded';
});
This works pretty well! The real map's scale can be different from the placeholder's depending on the browser's display size, and there's a chance that users might try to interact with the placeholder and get confused, but it's still less obtrusive than seeing a gray box when the page is first loaded.
You can see the AMP version of the pullup-bars page.
Addendum: Google not indexing AMP pages (2016-10-29)
I noticed that Google stopped linking to the AMP versions of my pages when I searched using the AMP search preview from a phone (or when I did a regular mobile Google search; AMP pages were added to the main mobile search index in September 2016, so the demo site is probably no longer necessary). This was weird, because it had definitely been displaying them with the AMP logo next to them before. My AMP pages were still validating and being returned by the AMP Cache, and the Search Console reported that they were indexed and even receiving traffic, so I was at a loss for why they weren't being returned as results.
I found this then-unanswered Stack Overflow question describing a similar problem, but it wasn't any help — Google was actually now returning the AMP version of the page mentioned in the question, and I didn't have enough reputation points on the site to be able to leave a comment asking the original poster if they'd changed anything to fix the problem.
I tried asking Google if it knew about the AMP versions of my pages using the ampUrls.batchGet API as described in Google's "Link to AMP Content" document. It didn't:
% curl -i -s -k -X POST \
    -H "Content-Type: application/json" \
    -H "X-Goog-Api-Key: [my key]" \
    -d "{urls: ['https://www.erat.org/pullups.html']}" \
    "https://acceleratedmobilepageurl.googleapis.com/v1/ampUrls:batchGet"
HTTP/1.1 200 OK
...
{
  "urlErrors": [
    {
      "errorCode": "NO_AMP_URL",
      "errorMessage": "No AMP URL for the request URL.",
      "originalUrl": "https://www.erat.org/pullups.html"
    }
  ]
}
When I searched for someone else's page, like https://ampbyexample.com, the API would return the AMP (in this case, same as the canonical) and CDN versions of the page:
{
  "ampUrls": [
    {
      "originalUrl": "https://ampbyexample.com",
      "ampUrl": "https://ampbyexample.com/",
      "cdnAmpUrl": "https://cdn.ampproject.org/c/s/ampbyexample.com/"
    }
  ]
}
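When checking more than a couple of URLs, it's handy to flatten both response shapes into one lookup table. Here's a small sketch that relies only on the fields visible in the two responses above (ampUrls, urlErrors, and their members); summarizeBatchGet is a hypothetical helper of my own, and the real API may return fields that this ignores.

```javascript
// Flatten an ampUrls:batchGet response into a map keyed by the original URL.
// Only the fields shown in the sample responses are read.
function summarizeBatchGet(response) {
  var results = {};
  (response.ampUrls || []).forEach(function(entry) {
    results[entry.originalUrl] = {
      ok: true,
      ampUrl: entry.ampUrl,
      cdnAmpUrl: entry.cdnAmpUrl
    };
  });
  (response.urlErrors || []).forEach(function(err) {
    results[err.originalUrl] = {
      ok: false,
      errorCode: err.errorCode,
      errorMessage: err.errorMessage
    };
  });
  return results;
}
```

Passing in the parsed JSON body gives an object where each requested URL maps to either its AMP and CDN URLs or the error code that explains why there aren't any.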
I compared my site's pages to other AMP pages being indexed by Google and quickly noticed one difference: in my non-AMP pages, the <link rel="amphtml" ...> element was appearing pretty far down the page, often more than 1300 bytes from the start of the file. (The huge JSON-LD structured data blob was mostly to blame for this.) Other pages put it near the top. I remembered that <meta charset> elements are supposed to appear within the first 1024 bytes of an HTML file, and it seemed totally plausible that Google could decide to also only scan the first kilobyte of a file looking for <link rel="amphtml">.
So, I updated my canonical pages so the <link rel="amphtml"> tags appeared as close to the top as possible, just after <meta charset>:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<link rel="amphtml" href="https://www.erat.org/pullups.amp.html">
<!-- additional elements... -->
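To keep the tag from drifting back down the page, a build step could measure where it lands. Here's a minimal sketch; the function names are hypothetical, the regular expression only handles a rel attribute whose sole value is amphtml, and the 1024-byte limit is my guess rather than anything documented.

```javascript
// Return the UTF-8 byte offset of the first <link rel="amphtml"> tag in an
// HTML string, or -1 if there isn't one.
function amphtmlLinkByteOffset(html) {
  var match = /<link\b[^>]*\brel\s*=\s*["']?amphtml["']?[^>]*>/i.exec(html);
  if (!match) {
    return -1;
  }
  // Count bytes rather than characters, since non-ASCII text ahead of the
  // tag takes more than one byte per character.
  return new TextEncoder().encode(html.slice(0, match.index)).length;
}

// Whether the tag starts within the first limitBytes bytes of the file.
function amphtmlLinkIsEarly(html, limitBytes) {
  var offset = amphtmlLinkByteOffset(html);
  return offset !== -1 && offset < limitBytes;
}
```

Calling amphtmlLinkIsEarly(html, 1024) on each generated canonical page would flag any page where the tag has slipped past the first kilobyte.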
By the next morning, around eight hours later, the API was reporting that it knew about the AMP and CDN versions of my pages:
{
  "ampUrls": [
    {
      "originalUrl": "https://www.erat.org/pullups.html",
      "ampUrl": "https://www.erat.org/pullups.amp.html",
      "cdnAmpUrl": "https://cdn.ampproject.org/c/s/www.erat.org/pullups.amp.html"
    }
  ]
}
And just a few hours after that, my AMP pages were also getting returned for mobile searches, so I answered the Stack Overflow question and filed an AMP bug requesting that this be clarified in the docs.
Addendum: More indexing problems (2016-11-19)
After a month or so, I was still excited enough about AMP that I decided to convert eatnum.com, another site that I made, to use it. I could probably write another long blog post about the process, but the upshot was that I ended up with dynamically-generated, canonical (i.e. no non-AMP versions) AMP pages at URLs like https://eatnum.com/19120.
More than a week after updating the site to use AMP, the Search Console was still reporting zero AMP pages, though. It's basically just a bunch of machine-generated pages filled with numbers (*cough* although at least not covered with ads and Flash like other nutrition data websites), so I could accept Google choosing to take a while to recrawl it, but I noticed that the ampUrls.batchGet API also reported URL_IS_INVALID_AMP (Request URL is an invalid AMP URL.) for all of the AMP pages. This was weird, since both in-browser validation and the online validator said they were fine.
I eventually figured out that the problem was a hack that I'd added to make the non-interactive AMP versions of these pages link to the dynamic site: I'd placed <a> tags around an <input> element with hopes of replacing the static page with a dynamic version of the same view when the user clicked on the input field to search for something else. This worked as I'd intended in Chrome, but it's against the rules: the W3C's spec says that <a> can't contain interactive elements. After getting rid of the linked-<input> hack, the API suddenly became happy with the AMP pages.
If I'd been using an HTML5 validator, I would've caught this earlier, but the two HTML5 validators that I'm aware of (the W3C's and Validator.nu's) display so many errors related to AMP's custom elements and attributes that they're unusable for AMP pages unless I bolt on a bunch of filtering to ignore the expected problems. Without doing that, I'm not sure how to prevent this from happening again given how lenient the AMP validator seems to be, though.
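One crude way to guard against this particular mistake without a full validator is to scan the generated markup for interactive elements opened inside an <a>. This is only a sketch: it uses a regular expression instead of a real HTML parser (so comments, scripts, and attribute values containing ">" will confuse it), the list of interactive elements is a simplified subset of the spec's, and interactiveInsideAnchors is a name I made up.

```javascript
// Simplified subset of the HTML spec's interactive content. The real rules
// have exceptions (e.g. <input type="hidden">) that this ignores.
var INTERACTIVE = ['a', 'button', 'details', 'embed', 'iframe', 'input',
                   'label', 'select', 'textarea'];

// Return the names of interactive elements that are opened while an <a>
// element is still open.
function interactiveInsideAnchors(html) {
  var tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*>/g;
  var anchorDepth = 0;
  var found = [];
  var m;
  while ((m = tagRe.exec(html)) !== null) {
    var closing = m[1] === '/';
    var name = m[2].toLowerCase();
    if (closing) {
      if (name === 'a' && anchorDepth > 0) {
        anchorDepth--;
      }
      continue;
    }
    if (anchorDepth > 0 && INTERACTIVE.indexOf(name) !== -1) {
      found.push(name);
    }
    if (name === 'a') {
      anchorDepth++;
    }
  }
  return found;
}
```

An empty array means nothing suspicious was found; anything else is worth a look before blaming the crawler.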
A bit later: While I haven't seen it mentioned in many places, there's also a Google-supplied AMP validator. I'm not sure whether it's stricter than the AMP project's validator, but I wouldn't be surprised if it includes some extra checks that can help when trying to figure out why Google isn't indexing the AMP version of a page. However, even with it reporting that my pages are "eligible for AMP search features in Google search results" and "[have] valid structured data", the search console itself still reports zero AMP pages for eatnum.com. So... *shrug*.