Webmaster level: All
In web development context, semantics refers to semantic markup, which means markup used according to its meaning and purpose.
Markup used according to its purpose means using heading elements (for instance,
h1
to
h6
) to mark up headings, paragraph elements (
p
) for paragraphs, lists (
ul
,
ol
,
dl
, also
datalist
or
menu
) for lists, tables for data tables, and so on.
Stating the obvious became necessary in the old days, when the Web consisted of only a few web sites and authors used tables to code entire sites, table cells or paragraphs for headings, and thought about other creative ways to achieve the layout they wanted. (Admittedly, these authors had fewer instruments at their disposal than authors have today. There were times when coding a three column layout was literally impossible without using tables or images.)
Up until today authors were not always certain about what HTML element to use for what functional unit in their HTML page, though, and “living” specs like HTML 5 require authors to keep an eye on what elements will be there going forward to mark up what otherwise calls for “meaningless” fallback elements like
div
or
span
.
To know what elements HTML offers, and what meaning these elements have, it’s necessary to consult the HTML specs. There are indices—covering
all HTML specs and elements—that make it a bit simpler to look up and find out the meaning of an element. However, in many cases it may be necessary to check what
the HTML spec says.
For example, take
the code
element:
The code
element represents a fragment of computer code. This could be an XML element name, a filename, a computer program, or any other string that a computer would recognize.
Author-controlled semantics
HTML elements carry meaning as defined by the HTML specs, yet ID and class names can bear meaning too. ID and class names, just like
microdata, are typically under author control, the only exception being
microformats. (We will not cover microdata or microformats in this article.)
ID and class names give authors a lot of freedom to work with HTML elements. There are a few basic rules of thumb that, when followed, make sure this freedom doesn’t turn into problems:
Advantages of using semantic markup
Using markup according to how it’s meant to be used, as well as modest use of functional ID and class names, has several advantages:
- It’s the professional thing to do.
- It’s more accessible.
- It’s more maintainable.
Special cases
“Neutral” elements, elements with ambiguous meaning, and presentational elements constitute special cases.
div
and
span
offer a “generic mechanism for adding structure to documents.” They can be used whenever there is no other element available that matches what the contents in question represent.
In the past a lot of confusion was caused by the
b
,
strong
,
i
, and
em
elements. Authors cursed
b
and
i
for being presentational, and typically suggested a 1:1 replacement with
strong
and
em
. Not to stir up the past, here’s what HTML 5 says, granting all four elements a
raison d’être:
b |
“a span of text to be stylistically offset from the normal prose without conveying any extra importance, such as key words in a document abstract, product names in a review, or other spans of text whose typical typographic presentation is boldened” |
<p>The <b>frobonitor</b> and <b>barbinator</b> components are fried. |
strong |
“strong importance for its contents” |
<p><strong>Warning.</strong> This dungeon is dangerous. |
i |
“a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized” |
<p>The term <i>prose content</i> is defined above. |
em |
“stress emphasis of its contents” |
<p><em>Cats</em> are cute animals. |
Last but not least, there are truly presentational elements. These elements will be supported by user agents (browsers) for forever but shouldn’t be used anymore as
presentational markup is not maintainable, and should be handled by style sheets instead. Some popular ones are:
How to tell whether you’re on track
A quick and dirty way to check the semantics of your page and understand how it might be interpreted by a screen reader is to disable CSS, for example using the Web Developer Toolbar extension available
for Chrome and
Firefox. This only identifies issues around the use of CSS to convey meaning, but can still be helpful.
There are also tools like
W3C’s semantic data extractor that provide cues on the meaningfulness of your HTML code.
Other methods range from peer reviews (coding best practices) to user testing (accessibility).
Do’s and Don’ts
Don’t |
Do |
Reason |
<p class"heading">foo</p>
|
<h1>foo</h1>
|
For headings there are heading elements. |
<p><font size="2">bar</font></p>
|
<p>bar</p>
p { font-size: 1em; }
|
Presentational markup is expensive to maintain. |
<table>
<tr>
<td class="heading">baz</td>
</tr>
<tr>
<td>scribble</td>
</tr>
</table>
|
<h1>baz</h1>
<p>scribble</p>
|
Use table elements for tabular data. |
<div class="newrow">foo</div>
<div>1</div>
<div class="newrow">bar</div>
<div>2</div>
|
<table>
<tr>
<th>foo</th>
<td>1</td>
</tr>
<tr>
<th>bar</th>
<td>2</td>
</tr>
</table>
|
Use table elements for tabular data. |
foo bar.<br><br>baz scribble.
|
<p>foo bar.</p>
<p>baz scribble.</p>
|
Denote paragraphs by paragraph elements, not line breaks. |
Written by Jens O. Meiert, Senior Web Architect, Google Webmaster Team