What things should a programmer implementing the technical details of a web site address before making the site public? If Jeff Atwood can forget about HttpOnly cookies [1], sitemaps [2], and cross-site request forgeries [3] all in the same site, what important thing could I be forgetting as well?
I'm thinking about this from a web developer's perspective, such that someone else is creating the actual design and content for the site. So while usability and content may be more important than the platform, you the programmer have little say in that. What you do need to worry about is that your implementation of the platform is stable, performs well, is secure, and meets any other business goals (like not costing too much, not taking too long to build, and ranking as well with Google as the content supports).
Think of this from the perspective of a developer who's done some work for intranet-type applications in a fairly trusted environment, and is about to have his first shot at putting out a potentially popular site for the entire big bad world wide web.
Also: I'm looking for something more specific than just a vague "web standards" response. I mean, HTML, JavaScript, and CSS over HTTP are pretty much a given, especially when I've already specified that you're a professional web developer. So going beyond that, which standards? In what circumstances, and why? Provide a link to the standard's specification.
This question is community wiki, so please feel free to edit that answer to add links to good articles that will help explain or teach each particular point. To search in only the answers from this question, use the inquestion:this option [4].
The idea here is that most of us should already know most of what is on this list. But there just might be one or two items you haven't really looked into before, don't fully understand, or maybe never even heard of.
Interface and User Experience
rel="nofollow"
to user-generated links
to avoid spam
[13].Security
Performance
Have a favicon.ico file in the root of the site, i.e. /favicon.ico. Browsers will automatically request it [46], even if the icon isn't mentioned in the HTML at all. If you don't have a /favicon.ico, this will result in a lot of 404s, draining your server's bandwidth.
SEO (Search Engine Optimization)
Use "search engine friendly" URLs, i.e. use example.com/pages/45-article-title instead of example.com/index.php?page=45.
history.pushState({"foo":"bar"}, "About", "./?page=1");
Is a great command. So even though the address bar has changed the page does not reload. This allows you to use ? instead of #! to keep dynamic content and also tell the server when you email the link that we are after this page, and the AJAX does not need to make another extra request./sitemap.xml
.
<link rel="canonical" ... />
[48] when you have multiple URLs that point to the same content, this issue can also be addressed from
Google Webmaster Tools
[49].301 Moved Permanently
) asking for www.example.com
to example.com
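For example, a minimal PHP sketch of such a redirect (assuming example.com is the chosen canonical host; an equivalent mod_rewrite rule in .htaccess works just as well):
<?php
// Send a 301 to the canonical host so search engines index only one site.
$canonicalHost = 'example.com'; // illustrative choice: non-www wins here
if ($_SERVER['HTTP_HOST'] !== $canonicalHost) {
    header('Location: http://' . $canonicalHost . $_SERVER['REQUEST_URI'], true, 301);
    exit;
}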
Technology
Bug fixing
Lots of stuff has been omitted, not necessarily because it isn't useful, but because it's either too detailed, out of scope, or goes a bit too far for someone looking to get an overview of the things they should know. If you're one of those people, you can read the rest of the answers for more detailed information about the things mentioned in this list. If I get the time, I'll add links to the various answers that go into detail about the things mentioned in this list. Please feel free to edit this as well; I probably missed some stuff or made some mistakes.
Rule number one of security:
Never trust user input.
(And that includes things like $_SERVER in PHP.) - Xeoncross
Not specific to public websites, but useful nevertheless:
Security
SEO
Performance
Productivity
User experience
Core Web technologies
I, personally, avoid using extensions like .php in my URLs. For example:
http://www.example.com/contact
http://www.example.com/contact.php
Not only does the first URL look cleaner, but if I decided to switch languages, it would be less of an issue.
So how does one implement this? Here is the .htaccess code I found works best:
# Enable the rewrite engine (required for the rules below)
RewriteEngine On
# If requested URL-path plus ".php" exists as a file
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
# Rewrite to append ".php" to extensionless URL-path
RewriteRule ^(([^/]+/)*[^.]+)$ /$1.php [L]
Source: http://www.webmasterworld.com/apache/3609508.htm
An .htaccess file like that won't cover everything, but it is a good start at least. On a few sites I direct it all through one file (I've chosen r.php?r=$1) from where I load up templates and, if possible, whole cached versions of the pages (a page like /about rarely changes, so I cut some milliseconds off the load time by not generating it every time). - Frank
Here are a couple of thoughts.
First, staging. For most simple sites, developers overlook the idea of having one or more test or staging environments available to smoothly implement changes to architecture, code, or sweeping content. Once the site is live, you must have a way to make changes in a controlled way so the production users aren't negatively affected. This is most effectively implemented in conjunction with the use of a version control system (CVS, Subversion, etc.) and an automated build mechanism (Ant, NAnt, etc.).
Second, backups! This is especially relevant if you have a database back-end serving content or transaction information. Never rely on the hosting provider's nightly tape backups to save you from catastrophe. Make triple-sure you have an appropriate backup and restore strategy mapped out just in case a critical production element gets destroyed (database table, configuration file, whatever).
I'll add one:
In addition to caching
It might be a bit outside of the scope, but I'd say that knowing how robots.txt [1] and search engine spiders work is a plus.
[1] http://www.robotstxt.org/robotstxt.html
Aside from basic competence in the base language and the key technologies which might be assumed (although they shouldn't be taken for granted):
This is all before you even get to a web environment really. Once you get into a web environment you would really expect them to understand (in no particular order):
After that:
And then - if your site is data driven
Well, everyone else has already mentioned most things I thought of - but one thing I always forget is a favicon [1]. Sounds stupid, I know, but I think it's one of those little things that helps to emphasise your brand, and I never seem to remember it. Please check Scott Hanselman's post about how to use it carefully [2].
I agree with some of the rest too - I think it's important to know as much as possible about your chosen language, so that you can code with best practices and maintainability in mind. I've come across functions and patterns that I wish I'd known about when I did my first few crappy, amateur projects, as it would have saved me writing some horrible WTF-ey workarounds!
[1] http://www.w3.org/2005/10/howto-favicon
The cruel, hard facts:
Users spend about as much time on your website as an interviewer spends reading your resume when it's submitted in a pile of thousands of others.
Everything about websites and website design revolves around these facts.
This is just an outline on why it is so important to adhere to standards and read those website design books.
When to say "no" to the designer or client, and how to do so gracefully and diplomatically.
If you have any influence on design, please read, "Don't Make Me Think" by Steve Krug. It is an easy read, and will almost certainly make you think...
You also have to:
I follow several security related blogs and podcasts.
In addition, I get email alerts from SANS https://portal.sans.org/. (You need to register, but it's a great source.)
(I'm always interested in learning about other good sources, too.)
[1] http://www.schneier.com/blog/index.rdf
How to work with absolute and relative paths.
I would think that knowing all you can about your deployment environment would rank up there.
IIS, MSSQL or Apache, MySQL, etc? ASP.NET, PHP, etc.?
Perhaps this is a no-brainer, but surely someone out there has written code that relies on [insert dependency] only to find out their client's server was missing [aforementioned dependency].
Good thread. Here are some areas I think no one's mentioned:
Accessibility (A11y), WAI-ARIA tags and so forth; and since it's 2010, why not start adding some HTML5 into the mix too.
Check out Selenium for JUnit-style client-side testing.
And lastly, Content Distribution Networks: don't host your static files yourself if you can avoid it, e.g. use Akamai or Google's hosted instance of jQuery.
How to build a scalable design on the off chance that the site becomes really popular.
Know how to hinder Denial of Service (DoS) attacks on user login forms by keeping track of the number of failed logins over a given period of time. In the event you hit a certain threshold above the running average, add a delay (say 5 sec.) to all subsequent login attempts.
Someone feel free to modify for clarity :)
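A rough PHP sketch of that idea (countRecentFailures() is a hypothetical stand-in for however you store failed-login counts):
<?php
// Throttle logins once recent failures exceed a threshold above the
// running average; every further attempt then costs an extra 5 seconds.
function throttleLogin(string $ip): void
{
    $recentFailures = countRecentFailures($ip, 600); // hypothetical: failures in the last 10 minutes
    $threshold = 10; // tune this against your normal running average
    if ($recentFailures > $threshold) {
        sleep(5); // delay the response to slow down brute forcing
    }
}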
Security:
Use System.Net.Cookie. Set the httpOnlyCookies attribute on the authentication cookie. Internet Explorer 6 Service Pack 1 supports this attribute, which prevents client-side script from accessing the cookie from the document.cookie property.
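That advice is ASP.NET-specific; a rough PHP equivalent (cookie name and lifetime are illustrative) uses the httponly parameter of setcookie():
<?php
// Mark the auth cookie HttpOnly (7th argument) so client-side script
// can't read it via document.cookie; also restrict it to HTTPS (6th).
$token = bin2hex(random_bytes(32)); // illustrative session token, PHP 7+
setcookie('auth', $token, time() + 3600, '/', '', true, true);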
Performance:
Found a new one today:
CSS reset style sheets: style sheets you include as a baseline when starting a project to give you more consistent behavior across different browsers. See this question:
http://stackoverflow.com/questions/167531/is-it-ok-to-use-a-css-reset-stylesheet
On a public site, make sure you are using an XML sitemap [1] to help search engine crawlers crawl your content more intelligently.
If you have non-HTML content on your site, you should also look into Google's extensions of the sitemap protocol [2] to make sure you are using whatever is appropriate. They have specific extensions for News [3], Video [4], Code [5], Mobile-specific [6] content and Geospatial [7] content.
One thing I learned that was not obvious in the Google help, is that each of these content-specific sitemaps should be a separate file and joined together at the root with a sitemap index file [8]. For some reason Google doesn't like you to mix content in one sitemap. Also, when you use Google Webmaster tools [9] to tell Google about your sitemaps, tell it about each of the special sitemaps you have separately and use the drop-down to specify the type. You would think the crawler could use the XML to auto-detect this stuff, but apparently not.
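A sketch of emitting such a sitemap index from PHP (the file names are illustrative; see sitemaps.org for the schema):
<?php
// Emit a sitemap index that points at the content-specific sitemaps.
header('Content-Type: application/xml; charset=utf-8');
$sitemaps = ['sitemap-pages.xml', 'sitemap-news.xml', 'sitemap-video.xml']; // illustrative
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($sitemaps as $file) {
    echo "  <sitemap><loc>http://www.example.com/{$file}</loc></sitemap>\n";
}
echo '</sitemapindex>';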
[1] http://www.sitemaps.org/
Ensure that whatever framework/server-side scripting/web server/other you're using doesn't expose error messages directly to the user.
Check that whatever has been put in place to facilitate debugging during development is switched off or reversed before launch. Obviously the preference is to have this stuff properly configured in the first place - but it will still occur time and time again.
That's mainly written from a security standpoint, but very much related is the usability issue of ensuring that, should errors occur, the user gets something that makes sense to them and that tries, as best as possible, to get them back to what they were doing.
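In PHP, for instance, the production half of that looks something like this (a sketch; the log path is hypothetical):
<?php
// Never show raw errors to visitors; log them for developers instead.
ini_set('display_errors', '0');
ini_set('log_errors', '1');
ini_set('error_log', '/var/log/php-errors.log'); // hypothetical path
error_reporting(E_ALL); // still report everything internally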
Good knowledge of HTTP, including caching and expiry headers
How to avoid cross-site request forgeries (XSRF) (which are not the same as cross-site scripting (XSS)).
Now I'll probably be modded down for overuse of parentheses.
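On the XSRF point: one common defence, sketched in PHP (field and session key names are illustrative; assumes PHP 7+): issue a per-session token and require every state-changing POST to echo it back.
<?php
// Generate a per-session anti-CSRF token once.
session_start();
if (empty($_SESSION['csrf_token'])) {
    $_SESSION['csrf_token'] = bin2hex(random_bytes(32));
}
// Every form must include the token as a hidden field named csrf_token.
// On submission, reject requests whose token doesn't match.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    if (!hash_equals($_SESSION['csrf_token'], $_POST['csrf_token'] ?? '')) {
        http_response_code(403);
        exit('Invalid CSRF token');
    }
}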
I don't have sufficient karma to edit, so here's my contribution: looks like everyone here is from the US :)
Most of the essentials have been covered by the top 10 answers, but here are a few of the ones I missed up there:
For browser compatibility testing, use browsershots.org (free) or better yet, litmus (cheap)
For stress testing, use the command-line tool ab (ApacheBench) [1] on Linux/Mac OS X. It will let you find the 'heavier' pages, so you can do your performance tweaking where it will matter the most (that is, caching!). "A slow page is a DoS attack waiting to happen."
If you, like most, will be using a web host rather than hosting your own web server, spend a couple of weeks (yes, weeks!) on the WebHostingTalk.com forum to get a feel for which hosting providers are currently the best in the lands. That forum is THE one and only gathering place for serious web hosting nerds, and these cats have the dirt on everyone. If you are serious about your web sites, you need to background check your hosting providers on WebHostingTalk.
Use a remote distributed system for monitoring your uptime (e.g. to determine whether it's time to move to a different hosting provider) - host-tracker.com comes to mind, but there are many others
Do not write your own CAPTCHA. I repeat: Do NOT write your own CAPTCHA!
Make sure (unlike me) you don't develop your site using FF3 and IE8 and then, at the end, check IE7 and see that it looks a mess and you need to spend days tweaking it.
Always check that the site renders OK in a number of different browsers during development; don't leave it till the end.
Begin by designing your page as if HTML was your only tool and JavaScript and CSS didn't exist, and make sure it validates. (This is not an excuse to use <font> tags, I'm talking about making good semantic code here!)
Then, add CSS (from an external file), and gently style your work, adding as little extra HTML as possible.
Finally, make your JavaScript (I'd use jQuery) enhance the user experience - again adding as little extra markup as possible.
This is an interesting video (more about web applications, but always a good thing).
Cross-browser support, particularly with respect to CSS.
You should consult the OWASP [1] web site and understand the vulnerabilities listed there. Keep in mind OWASP does not talk about issues like scalability, session state management issues, and browser compatibility. Those areas will need to be understood as well. But I would argue that they certainly are less important than security.
[1] http://www.owasp.org/
Web standards:
HTTP protocols.
UI Design
Web Security
Web Caching
Some web server knowledge (Apache httpd, IIS, lighttpd)
LAMP
If you implement a "I forgot my password" feature, don't email their password back in plaintext. Instead, email them a time-expiring link which will take them to a page that allows them to select a new password.
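A sketch of that flow in PHP (the storage and mail details are stand-ins for your own code; $userEmail comes from the account lookup):
<?php
// 1. Generate an unguessable, time-limited token.
$token = bin2hex(random_bytes(32));
$expires = time() + 3600; // link valid for one hour
// 2. Store hash('sha256', $token) and $expires against the user record
//    (never store or email the password itself).
// 3. Email a link containing the token.
$link = 'https://www.example.com/reset?token=' . $token;
mail($userEmail, 'Password reset',
    "Use the link below within one hour to choose a new password:\n" . $link);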
Ten+ things you have to do before you launch a website [1]
[1] http://blog.eike.se/2010/11/ten-things-you-have-to-do-before-you.html
Setting aside all the technical aspects, skills, and security, I would make sure that the site is easy to use and really does what the user expects. Human-computer interaction is important. Layout and flow are important. Otherwise no one will use it, other than scammers, spammers, and robots.
:)
//W
Read about the Principle of Least Astonishment in "Principle of least astonishment" [1] and "User-Friendly Programming" [2].
[1] http://en.wikipedia.org/wiki/Principle_of_least_astonishment
Having a backup strategy is really important (as has already been mentioned), but checking the backup is equally important. There is no point having hundreds of backups if they are all corrupt. Your restoration strategy should be known and tested, depending on the needs of the business.
Have a basic understanding of Web Analytics so you can understand how users are interacting with your site.
HTTP Compression is often overlooked and can drastically speed up a website.
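If the web server isn't already doing it (Apache's mod_deflate or nginx's gzip module are usually the better place), PHP itself can compress output; a minimal sketch:
<?php
// Buffer output through PHP's gzip handler so responses are compressed
// when the client sends Accept-Encoding: gzip. (Alternatively, set
// zlib.output_compression = On in php.ini; don't combine the two.)
ob_start('ob_gzhandler');
echo str_repeat('Highly compressible content. ', 1000);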
A web developer should know:
What should a developer know before building a public web site?
What about the data?
Cross Browser Compatibility
SEO
Horizontal/Vertical Scaling
Advantages/Disadvantages of Caching
Duplicate slashes in a path are normally harmless, but <a href="//index.html"> does not mean what you think it means: a URL beginning with // is protocol-relative, so the browser will treat index.html as a hostname.
One very important thing for UI Heavy Sites is taking care of screen resolution. It can totally make or break the UI experience of your site.
If it's unusable, you have no chance!
From a systems perspective, document how the application works and the subsystems involved, and add instrumentation to the application for the systems in which it will run (e.g. event logs or performance monitor in Windows).
The application has to be run by some support personnel and they need tools to track possible problems that may appear.
Consider your design from your potential users' perspectives. How will they use the site? What will benefit them most? What will annoy, frustrate, or keep them from using it? If you're trying to decide on a design element that will benefit you, but not the user, scrap it.
I am new to Web Development, and the things I had problems with are:
Regarding credit cards and debit cards, at least within the United States, be aware of PCI compliance and the various rules and responsibilities that it covers. Accepting credit cards for a small e-commerce application can open a very nasty can of worms if the proper security measures are not in place. It goes way beyond having SSL enabled on the web site. Search for PCI-DSS on your favorite search engine and make sure you, and your clients, understand the regulations that they will need to follow. Other locales have similar rules under different names, but all of the major payment card players are getting serious about securing cardholder data.
Especially for SEO, but for some other reasons [1] as well: remove [2] session IDs from (public) URLs. They might have been added by the web framework for cookie-less browsers, but they may not be required for public browsing anyhow.
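In PHP, for example, two ini settings keep session IDs out of URLs entirely (a sketch; set them in php.ini if you can):
<?php
// Never fall back to putting the session ID in the URL...
ini_set('session.use_trans_sid', '0');
// ...and only accept session IDs delivered via cookies.
ini_set('session.use_only_cookies', '1');
session_start();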
[1] http://randomcoder.com/articles/jsessionid-considered-harmful
This may have already been mentioned, but know how the client plans on updating the site. If the client has someone who "knows HTML", then prepare for problems. It's best to have a good CMS in place for updates if the client wishes to update the website themselves; NEVER let them have access to all of your code.
Develop for Gecko and Webkit browsers first, then use conditional comments [1] to address IE issues that cannot be fixed by tweaking CSS (e.g. for more specificity, rules that trigger IE's ' hasLayout [2]', etc.)
[1] http://www.quirksmode.org/css/condcom.html
Nothing. If you're building your first site, just build it. Get dirty, make mistakes, and learn. Because after you've built hundreds, and lots of advanced tricks are second nature to you, after you've done it all and seen it all, the one thing you'll always need to remember is the one thing you knew when you started: you don't know everything. Especially if you're worried about security. Even if you cover all the bases, someone will come up with something new. It's the downside to being one of the Good Guys.
You need very little knowledge to put a site out to the public. Don't forget that there are billions of sites out there, and you don't want to spend months of your valuable time building something that nobody wants.
All the skills you need are basic HTML, CSS, and JavaScript to quickly throw up a prototype and put it out in front of the big bad web. Think about it this way - if you build out something really awesome over several months, let's say, and you put it out on the web, and nobody clicks on that link to Get Started, then something has gone terribly wrong.
Either you were working on the wrong problem, or a problem that nobody had, or they didn't know they had a problem. You could simply test your early hypothesis by putting up a nice fancy mockup landing page with a link saying "Get Started", and when users click that, you take them to a thank you page asking them for their email/contact information to inform them for when you do actually go live.
I have recently been introduced to the idea of a Minimum Viable Product [1] (MVP), which is quite radical in terms of what it is. It's not a minimum viable product in the sense that most developers would think of it. Here's a nice interview with Eric Ries that talks about the idea in detail - http://venturehacks.com/articles/minimum-viable-product.
Kent Beck, the creator of the Extreme Programming methodology, had an interesting story to share at the Startup Lessons Learned [2] conference today in San Francisco. He had the idea of introducing a payment gateway to charge users for unlocking higher levels of a game he was building. They estimated it was going to take a little while to implement the whole thing, so he decided to just put up a button saying "Buy the Next Level" on the game page. When users clicked that link, they just let them into the next level without charging anything. It didn't hurt them at all, as they didn't have a million+ user base, and they collected valuable information about how many users were actually willing to buy the next level.
So I would recommend you don't wait until you build a nicely polished and finished product before reaching out to your users. And to get started with that, you don't need a whole lot of knowledge. Basic HTML/CSS/JavaScript skills are more than sufficient to get started.
[1] http://en.wikipedia.org/wiki/Minimum_viable_product
I agree with "The Professor": there's no point in having a beautifully built site that validates correctly and is accessible to all if the content is rubbish. In addition to his comment, though, I'd add spell checking and proofreading. I find that the majority of tweaks that have to be made after a site has gone live are down to spelling/grammatical issues.
If you are going to accept user input, learn input validation. This is the biggest thing programmers make mistakes on: they accept user input in some random location, and it allows script kiddies to come along and remotely include a file that then gives them full control over your machine.
"Be lenient in what you accept, but strict in what you output"
However, don't trust any user-generated input in any way, shape, or form. Don't trust it!
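A small PHP sketch of the principle: validate against a whitelist, parameterize queries, and escape output (the DSN and credentials are placeholders):
<?php
// Validate: reject anything that isn't a well-formed email address.
$email = filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL);
if ($email === false || $email === null) {
    exit('Invalid email address');
}
// Query: use parameterized statements, never string-spliced SQL.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder DSN
$stmt = $pdo->prepare('SELECT id FROM users WHERE email = ?');
$stmt->execute([$email]);
// Output: escape before echoing anything user-supplied.
echo htmlspecialchars($email, ENT_QUOTES, 'UTF-8');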
Understand how to monitor a site for intrusion and make it easy for the person who manages the site to recover to a known-good state. Even if you aren't going to be managing the site you should educate the site-owner in this regard before handing it over.
Even if your code is bulletproof, the server that the site is hosted on can be compromised (especially in a shared-server environment), so it seems like it's not so much a question of whether your site will be hacked, as when it will happen and how much pain will be involved in cleaning it up.
So you'll want to design with this in mind; e.g., craft your URL scheme such that it is easy to spot malicious requests in the access logs; think carefully before storing page templates in a database; and so forth.
Design and develop the site with localization for other languages in mind.
You need to know what is easy to use for the general public, not for an IT professional or software developer.
Take a look at a good web usability book, e.g. Don't Make Me Think: A Common-Sense Approach to Web Usability [1], by Steve Krug.
[1] http://rads.stackoverflow.com/amzn/click/0321344758
If for some reason you don't trust Google, or you want to have more control over the collected data, try Piwik [1] as an analytics tool. It is open source and extensible via plugins.
[1] http://piwik.org
The most important thing for a web site developer to know is that there really is no such thing as a standard. The standards exist, but they are often ignored or incorrectly implemented.
The only way to know if your pages are going to operate correctly on all web browsers is to try them on every browser you can find: IE, Firefox, Opera, Safari, and Chrome for a start.
So, yes, of course, use standard practices. But then test and remove those features which do not work across all browsers.
One of the key things is to understand how you are going to debug your system. This means understanding the 'big picture'. So know your environment (OS, database, framework, networking et al.) and at least know where to 'look' if you have ten users each calling with their own issue, even if you did not write all that server-side code.
Oftentimes, good user interface design (error logging with the right amount of detail, log levels, hooks to display some details on demand) will go a long way.
Know how to resist session hijacking. HttpOnly is only one aspect of this, and not necessarily the most likely for some threat models (it applies when people can insert HTML onto your site).
There are session hijacking attacks which are regarded as remotely exploitable by NIST, and exploits are in the wild today. Here are some refs:
http://fscked.org/projects/cookiemonster
CVE-2002-1152 [1]
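Beyond HttpOnly, a couple of PHP-level mitigations worth knowing (a sketch, not a complete defence):
<?php
// Keep the session cookie away from script and off plain HTTP.
ini_set('session.cookie_httponly', '1');
ini_set('session.cookie_secure', '1'); // assumes the site runs over HTTPS
session_start();
// Issue a fresh session ID at privilege changes (e.g. after login)
// so a fixated or sniffed ID becomes useless.
session_regenerate_id(true);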
[1] http://www.google.com/search?client=safari&rls=en-us&q=CVE-2002-1152&ie=UTF-8&oe=UTF-8
Make sure someone in the organization already has the content maintenance, ongoing SEO, and marketing plan worked out fully. Because if they haven't, they're going to default to you to provide all of those things (possibly with little compensation).
Don't put users' email addresses in plain text, as they will get spammed to death.
Yeah, I'm angry at alexa.com; before they somehow got my contact info, I did not get any spam. Now I'm spammed to death! - Frank
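On the plain-text email point: one low-tech PHP mitigation is to encode the address as HTML entities so naive harvesters miss it (a sketch; determined scrapers can still decode it):
<?php
// Render an email address as numeric HTML entities; browsers display it
// normally, but simple text-matching scrapers won't see a plain address.
function obfuscateEmail(string $email): string
{
    $out = '';
    foreach (str_split($email) as $char) {
        $out .= '&#' . ord($char) . ';';
    }
    return $out;
}
echo obfuscateEmail('user@example.com'); // illustrative address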