Google Crawl Error 404

A word about 404 errors

One of the most common crawling errors is the 404 error, which occurs when somebody tries to access a page that does not exist (usually because the page has been deleted or the user clicked a broken or incorrect link). Most of the time, 404 errors can be ignored. However, if you’re seeing a lot of traffic leading to a URL that 404s, check your site for broken links. More information about dealing with 404 errors is available here:

http://support.google.com/webmasters/bin/static.py?hl=en&page=checklist.cs&tab=1095580
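
If you want to hunt down broken links yourself, a few lines of code are enough to test each URL’s HTTP status. Below is a minimal sketch in Java using java.net.HttpURLConnection; the URL list is a placeholder for your own pages.

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class BrokenLinkCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder URLs - replace with the links you want to verify.
            String[] links = { "http://example.com/", "http://example.com/missing-page" };
            for (String link : links) {
                HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
                conn.setRequestMethod("HEAD");   // only the status code is needed
                int status = conn.getResponseCode();
                System.out.println(status == 404 ? "Broken link (404): " + link
                                                 : status + " -> " + link);
                conn.disconnect();
            }
        }
    }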

Windows 7 – Enable Telnet

It’s very rare that I use Telnet these days, so it took me a long time to notice that it is not included by default in Windows 7. I did some research and found out that this was also true for Windows Vista. More than likely this was an attempt to make Windows more secure by default, as Telnet is very insecure; whenever you have the choice, you should use SSH instead.
However, with that being said, you can quickly re-enable Telnet by following these steps:
  • Start
  • Control Panel
  • Programs And Features
  • Turn Windows features on or off
  • Check Telnet Client
  • Hit OK
After that you can start Telnet via the Command Prompt.

Difference between POP3 and IMAP

What is the Difference between POP and IMAP Mail Server?

Using IMAP to access your mailbox has advantages over POP3; the differences in how the two work are summarized in the following point-by-point comparison (a short code sketch follows).

POP3: Since email needs to be downloaded to the desktop PC before being displayed, you may have the following problems with POP3 access:
  • You need to download all email again when using another desktop PC to check your email.
  • It may get confusing if you need to check email both in the office and at home.
The downloaded email may also be deleted from the server, depending on the settings of your email client.
IMAP: Since email is kept on the server, IMAP access gives the following benefits:
  • No need to download all email again when using another desktop PC to check your email.
  • Easier to identify unread email.

POP3: All messages, as well as their attachments, are downloaded to the desktop PC during the ‘check new email’ process.
IMAP: A whole message is downloaded only when it is opened for display.

POP3: Mailboxes can only be created on the desktop PC; only one mailbox (INBOX) exists on the server.
IMAP: Multiple mailboxes can be created on the desktop PC as well as on the server.

POP3: Filters can move incoming/outgoing messages only to local mailboxes.
IMAP: Filters can move incoming/outgoing messages to other mailboxes, no matter where those mailboxes are located (on the server or the PC).

POP3: Outgoing email is stored only locally on the desktop PC.
IMAP: Outgoing email can be filtered to a mailbox on the server so it is accessible from other machines.

POP3: Messages are deleted on the desktop PC; comparatively, it is inconvenient to clean up your mailbox on the server.
IMAP: Messages can be deleted directly on the server, which makes it more convenient to clean up your mailbox.

POP3: Messages may be reloaded onto the desktop PC several times due to corruption of system files.
IMAP: Messages are reloaded from the server to the PC much less often than with POP3.

Source: http://email.cityu.edu.hk/faq/popimap.htm
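
To make the protocol difference concrete, here is a minimal sketch using the JavaMail API (javax.mail). The host, user and password are placeholders, and the only change needed to switch protocols is the store name ("imaps" vs. "pop3").

    import java.util.Properties;
    import javax.mail.Folder;
    import javax.mail.Session;
    import javax.mail.Store;

    public class MailboxCheck {
        public static void main(String[] args) throws Exception {
            Session session = Session.getInstance(new Properties());

            // "imaps" keeps mail on the server; swap in "pop3" (or "pop3s") to download instead.
            Store store = session.getStore("imaps");
            store.connect("mail.example.com", "user@example.com", "secret"); // placeholders

            // POP3 exposes only INBOX; IMAP can expose additional server-side folders.
            Folder inbox = store.getFolder("INBOX");
            inbox.open(Folder.READ_ONLY);
            System.out.println("Messages: " + inbox.getMessageCount()
                    + ", unread: " + inbox.getUnreadMessageCount());

            inbox.close(false);
            store.close();
        }
    }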

How to make Lucene indexing faster

Here are some things to try to speed up indexing in your Lucene application. Please see ImproveSearchingSpeed for how to speed up searching. (A short code sketch combining several of these tips follows the list.)

  • Be sure you really need to speed things up. Many of the ideas here are simple to try, but others will necessarily add some complexity to your application. So be sure your indexing speed is indeed too slow and the slowness is indeed within Lucene.
  • Make sure you are using the latest version of Lucene.
  • Use a local filesystem. Remote filesystems are typically quite a bit slower for indexing. If your index needs to be on the remote filesystem, consider building it first on the local filesystem and then copying it up to the remote filesystem.
  • Get faster hardware, especially a faster IO system. If possible, use a solid-state disk (SSD). These devices have come down substantially in price recently, and much lower cost of seeking can be a very sizable speedup in cases where the index cannot fit entirely in the OS’s IO cache.
  • Open a single writer and re-use it for the duration of your indexing session.
  • Flush by RAM usage instead of document count. For Lucene <= 2.2: call writer.ramSizeInBytes() after every added doc, then call flush() when it’s using too much RAM. This is especially good if you have small docs or highly variable doc sizes. You need to first set maxBufferedDocs large enough to prevent the writer from flushing based on document count. However, don’t set it too large, otherwise you may hit LUCENE-845. Somewhere around 2-3X your “typical” flush count should be OK. For Lucene >= 2.3: IndexWriter can flush according to RAM usage itself. Call writer.setRAMBufferSizeMB() to set the buffer size. Be sure you don’t also have any leftover calls to setMaxBufferedDocs, since the writer will flush “either or” (whichever comes first).
  • Use as much RAM as you can afford. More RAM before flushing means Lucene writes larger segments to begin with, which means less merging later. Testing in LUCENE-843 found that around 48 MB is the sweet spot for that content set, but your application could have a different sweet spot.
  • Turn off compound file format. Call setUseCompoundFile(false). Building the compound file format takes time during indexing (7-33% in testing for LUCENE-888). However, note that doing this will greatly increase the number of file descriptors used by indexing and by searching, so you could run out of file descriptors if mergeFactor is also large.
  • Re-use Document and Field instances. As of Lucene 2.3 there are new setValue(…) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It’s best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(…), etc.), and then re-add your Document instance. Note that you cannot re-use a single Field instance within a Document, and you should not change a Field’s value until the Document containing that Field has been added to the index. See Field for details.
  • Always add fields in the same order to your Document, when using stored fields or term vectors. Lucene’s merging has an optimization whereby stored fields and term vectors can be bulk-byte-copied, but the optimization only applies if the field name -> number mapping is the same across segments. Future Lucene versions may attempt to assign the same mapping automatically (see LUCENE-1737), but until then the only way to get the same mapping is to always add the same fields in the same order to each document you index.
  • Re-use a single Token instance in your analyzer. Analyzers often create a new Token for each term in sequence that needs to be indexed from a Field. You can save substantial GC cost by re-using a single Token instance instead.
  • Use the char[] API in Token instead of the String API to represent token text. As of Lucene 2.3, a Token can represent its text as a slice into a char array, which saves the GC cost of new’ing and then reclaiming String instances. By re-using a single Token instance and using the char[] API you can avoid new’ing any objects for each term. See Token for details.
  • Use autoCommit=false when you open your IndexWriter. In Lucene 2.3 there are substantial optimizations for Documents that use stored fields and term vectors, to save merging of these very large index files. You should see the best gains by using autoCommit=false for a single long-running session of IndexWriter. Note however that searchers will not see any of the changes flushed by this IndexWriter until it is closed; if that is important you should stick with autoCommit=true instead, or periodically close and re-open the writer.
  • Instead of indexing many small text fields, aggregate the text into a single “contents” field and index only that (you can still store the other fields).
  • Increase mergeFactor, but not too much. A larger mergeFactor defers merging of segments until later, thus speeding up indexing, because merging is a large part of indexing. However, this will slow down searching, and you will run out of file descriptors if you make it too large. Values that are too large may even slow down indexing, since merging more segments at once means much more seeking for the hard drives.
  • Turn off any features you are not in fact using. If you are storing fields but not using them at query time, don’t store them. Likewise for term vectors. If you are indexing many fields, turning off norms for those fields may help performance.
  • Use a faster analyzer. Sometimes analysis of a document takes a lot of time. For example, StandardAnalyzer is quite time consuming, especially in Lucene versions <= 2.2. If you can get by with a simpler analyzer, then try it.
  • Speed up document construction. Often the process of retrieving a document from somewhere external (database, filesystem, crawled from a Web site, etc.) is very time consuming.
  • Don’t optimize… ever.
  • Use multiple threads with one IndexWriter. Modern hardware is highly concurrent (multi-core CPUs, multi-channel memory architectures, native command queuing in hard drives, etc.) so using more than one thread to add documents can give good gains overall. Even on older machines there is often still concurrency to be gained between IO and CPU. Test the number of threads to find the best performance point.
  • Index into separate indices then merge. If you have a very large amount of content to index then you can break your content into N “silos”, index each silo on a separate machine, then use the writer.addIndexesNoOptimize to merge them all into one final index.
  • Run a Java profiler. If all else fails, profile your application to figure out where the time is going. I’ve had success with a very simple profiler called JMP. There are many others. Often you will be pleasantly surprised to find some silly, unexpected method is taking far too much time.

Source: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
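
Several of the tips above can be combined in a few lines. The sketch below is written against the Lucene 2.x-era API discussed on this page (exact constructors and constants vary between Lucene versions, so treat it as illustrative rather than definitive); the index path and the content source are placeholders.

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class FastIndexer {
        public static void main(String[] args) throws Exception {
            // One writer, re-used for the whole indexing session, on a local filesystem.
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/tmp/index")),
                    new StandardAnalyzer(),
                    true,                              // create a fresh index
                    IndexWriter.MaxFieldLength.UNLIMITED);

            writer.setRAMBufferSizeMB(48);     // flush by RAM usage instead of doc count
            writer.setUseCompoundFile(false);  // skip building the compound file format
            writer.setMergeFactor(10);         // keep mergeFactor moderate

            // Re-use one Document and its Field instances across all added documents.
            Document doc = new Document();
            Field idField = new Field("id", "", Field.Store.YES, Field.Index.NOT_ANALYZED);
            Field bodyField = new Field("contents", "", Field.Store.NO, Field.Index.ANALYZED);
            doc.add(idField);
            doc.add(bodyField);

            // Hypothetical content source; replace with your own document construction.
            String[][] rows = { { "1", "hello lucene" }, { "2", "faster indexing" } };
            for (String[] row : rows) {
                idField.setValue(row[0]);
                bodyField.setValue(row[1]);
                writer.addDocument(doc);       // same Document instance, new field values
            }

            writer.close();
        }
    }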

Apache CouchDB

Apache CouchDB™ is a database that uses JSON for documents, JavaScript for MapReduce queries, and regular HTTP for an API

CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents with your web browser, via HTTP. Query, combine, and transform your documents with JavaScript. CouchDB works well with modern web and mobile apps. You can even serve web apps directly out of CouchDB. And you can distribute your data, or your apps, efficiently using CouchDB’s incremental replication. CouchDB supports master-master setups with automatic conflict detection.
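
Because the whole API is plain HTTP and JSON, any HTTP client can talk to CouchDB. Below is a minimal sketch in Java that creates a document, assuming a local CouchDB on the default port 5984 and an already-created database named "demo" (both assumptions on my part).

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class CouchDbHello {
        public static void main(String[] args) throws Exception {
            // PUT /<database>/<document id> stores a JSON document.
            URL url = new URL("http://localhost:5984/demo/my-first-doc");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            String json = "{\"type\":\"greeting\",\"text\":\"hello world\"}";
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes("UTF-8"));
            }
            System.out.println("CouchDB responded: " + conn.getResponseCode()); // 201 on success
        }
    }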

CouchDB comes with a suite of features, such as on-the-fly document transformation and real-time change notifications, that make web app development a breeze. It even comes with an easy-to-use web administration console. You guessed it, served up directly out of CouchDB! We care a lot about distributed scaling. CouchDB is highly available and partition tolerant, but is also eventually consistent. And we care a lot about your data. CouchDB has a fault-tolerant storage engine that puts the safety of your data first.

See the introduction, technical overview, or one of the guides for more information.

MongoDB

MongoDB (from “humongous”) is a scalable, high-performance, open source NoSQL database. Written in C++, MongoDB features the following (a short client sketch follows this list):

Document-oriented storage » JSON-style documents with dynamic schemas offer simplicity and power.

Full Index Support » Index on any attribute, just like you’re used to.

Replication & High Availability » Mirror across LANs and WANs for scale and peace of mind.

Auto-Sharding » Scale horizontally without compromising functionality.

Querying » Rich, document-based queries.

Fast In-Place Updates » Atomic modifiers for contention-free performance.

Map/Reduce » Flexible aggregation and data processing.

GridFS » Store files of any size without complicating your stack.

Commercial Support » Enterprise class support, training, and consulting available.
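
As a rough illustration of the document model and querying, here is a minimal sketch using the legacy 2.x MongoDB Java driver (the API differs in newer drivers). It assumes a local mongod on the default port; the database, collection and field names are made up.

    import java.util.Arrays;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;

    public class MongoHello {
        public static void main(String[] args) throws Exception {
            MongoClient client = new MongoClient("localhost", 27017);
            DB db = client.getDB("demo");
            DBCollection people = db.getCollection("people");

            // Document-oriented storage: no schema to declare up front.
            people.insert(new BasicDBObject("name", "Ada")
                    .append("langs", Arrays.asList("c", "lisp")));

            // Index on any attribute, then run a document-based query.
            people.ensureIndex(new BasicDBObject("name", 1));
            DBObject found = people.findOne(new BasicDBObject("name", "Ada"));
            System.out.println(found);

            client.close();
        }
    }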

JustOneDB

The Relational Database for Big Data

JustOneDB is a new class of database – a NewSQL database that feels like a traditional relational database yet performs and adapts to change like no other.

The likelihood is that your application is best suited to a relational database – most are. But with exploding data volumes driving spiraling hardware and software license costs, the options for keeping pace with the tsunami of data are daunting.

JustOneDB removes all of that pain. It can handle the biggest data volumes today but at a fraction of the cost and complexity of alternative solutions.

Fast Facts
Fully-functional relational database
– SQL99 compliant
– Fully transactional
– PostgreSQL compatible
– Industry standard interfaces for BI tools and languages
– No need to design indexes or partitions for query performance
– No need for schema transformations
– Very fast row inserts
– Index-like query performance
– Concurrent updates and queries
– Can use DAS, NAS and SAN storage
– Supports stored procedures
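
If the PostgreSQL compatibility listed above extends to the standard JDBC driver (an assumption on my part, so check the product documentation), connecting should look like any other PostgreSQL-style database. The host, port, database name and credentials below are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class JustOneDbHello {
        public static void main(String[] args) throws Exception {
            // Assumes the stock PostgreSQL JDBC driver can reach a JustOneDB instance.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/demo", "user", "secret");

            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE events (id BIGINT, payload VARCHAR(200))");
                stmt.execute("INSERT INTO events VALUES (1, 'hello')"); // no index design needed
                ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM events");
                while (rs.next()) {
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
            conn.close();
        }
    }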

Performance
Performance per SATA HDD and 3GHz CPU
– Insert up to 500,000 column values per second
– Eliminate up to 1 billion rows per second for selective queries

Capacity
– Up to 65535 tables per database
– Up to 1024 columns per table
– Up to 65535 bytes per text value
– Number values +/- 10^75 at up to 512-bit precision

Limitations
Release 1.1 has the following temporary limitations, which will be removed in future releases:
– Analytical queries currently use conventional join strategies and row aggregation and perform similarly to a fully indexed row store

– Features not currently supported:
– Triggers
– Save-points
– Unique and key constraints
– Text search
– Spatial search
– Object extensions

RavenDB

RavenDB is a transactional, open-source Document Database written in .NET, offering a flexible data model designed to address requirements coming from real-world systems.

RavenDB allows you to build high-performance, low-latency applications quickly and efficiently.

Features:

– Safe by default

Based on years of experience with real, live enterprise systems, RavenDB is built to ensure data access is done right. No locking, no abuse of network or system resources. With RavenDB your application is guaranteed to be as fast and reliable as it gets.

– Transactional

ACID transactions are fully supported, even between different nodes. If you put data in, it is going to stay there. We care about your data and we keep it safe.

– Scalable

Sharding, replication and multi-tenancy are supported out-of-the-box. Scaling out is as easy as it gets.

– Schema free

Forget about tables, rows, mappings or complex data-layers. RavenDB is a document-oriented database you can just dump all your objects into.

– Get running in 5 minutes

5 minutes, that’s all it takes to start using RavenDB. Designed not to get in your way, RavenDB requires no complex installation process; just download and run. Check out our Quickstart Tutorials.

It Just Works

Stop fighting the database and get ready to go into a world full of fun, with a database that cares. The fluent and intuitive API makes building data-backed applications a breeze. As a guideline, zero administration is required on the server. Just unzip, run and start writing code.

Fast queries

RavenDB can satisfy any query at the speed of light, as no processing whatsoever is done at query time. All indexing operations are done in the background and have no effect on querying, writing or reading from the database.

Best practices built in

Enjoy working with the bleeding edge of modern software development, using friction-free methodologies.

High performance

RavenDB is a very fast persistence layer for every type of data model. Skip creating complicated mapping or multi-layer DALs, just persist your entities. It Just Works, and it does the Right Thing.

Caching built in

Multiple levels of caching operate automatically, both on the server and on the client, by default and transparently. Yet caching is completely configurable, and advanced modes like Aggressive Caching exist.

APIs

Access RavenDB from any language and technology. Client/server communication is done via REST over HTTP, and client APIs are available for .NET (including Linq and F# support), Silverlight and JavaScript.
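
As a rough illustration of that REST surface from a non-.NET language, the Java sketch below PUTs a document and reads it back. The server address, port and the /docs/ path are assumptions based on older RavenDB builds, so consult the documentation for your version.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RavenDbRestSketch {
        public static void main(String[] args) throws Exception {
            // Assumed endpoint layout: http://localhost:8080/docs/<document key>
            URL url = new URL("http://localhost:8080/docs/users/1");

            // Store a document as plain JSON over HTTP.
            HttpURLConnection put = (HttpURLConnection) url.openConnection();
            put.setRequestMethod("PUT");
            put.setRequestProperty("Content-Type", "application/json");
            put.setDoOutput(true);
            try (OutputStream out = put.getOutputStream()) {
                out.write("{\"Name\":\"Example\"}".getBytes("UTF-8"));
            }
            System.out.println("PUT status: " + put.getResponseCode());

            // Read it back.
            HttpURLConnection get = (HttpURLConnection) url.openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(get.getInputStream(), "UTF-8"))) {
                System.out.println(in.readLine());
            }
        }
    }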

Built-in management studio

Easily manage your database and data using the graphical UI bundled with every instance of RavenDB server.

Carefully designed

Every bit of code was carefully considered. RavenDB was designed with best-practices in mind, and it ensures that everything Just Works.

Map / Reduce

Indexes are defined using easy-to-write Map/Reduce functions in Linq syntax. With support for concepts like multi-maps and boosting, indexes are simple to write, yet very powerful.

Feature rich and extensible

Built with extensibility in mind, RavenDB can be easily extended both on the client and the server. Many integration points ensure you can always squeeze more out of RavenDB. You aren’t shackled to a One Size Fits None solution.

Embeddable

RavenDB can be embedded in any .NET application, making it a perfect fit for desktop applications as well.

Bundles

RavenDB ships with server-side plugins extending it in various helpful ways. It is just a matter of dropping a DLL to the server folder.

Index replication to SQL

To allow you to take advantage of the reporting tools available in the relational world, RavenDB allows you to easily replicate indexes to SQL tables.

Full-text search built in

No need to plug in any external tool to support advanced searches on text fields. Full-text searches are supported out of the box by the server and the client API.

Advanced search techniques

The built-in full-text search engine (Lucene) allows RavenDB to support a lot of other cool stuff, including but not limited to:

Geo-spatial search support

Out of the box, with easy to use API

Easy backups

Make backups asynchronously, without disturbing normal DB operations. Backup and Restore are both supported by the DB, and a utility tool that makes the process even easier is bundled with the server.

Multi-tenancy

Host multiple databases in one RavenDB server.

Attachments

RavenDB supports storing data streams that are not actual documents, like images and other binary data you don’t want to store as a document but still want available.

Online Index Rebuilds

Indexes are updated in the background, without requiring any interaction from the user or interrupting the normal ACID operation of the database.

Fully async (C# 5 ready)

RavenDB already supports the brand new async API introduced by C# 5.

Community

RavenDB enjoys a great and supportive community you can meet in the mailing list and on JabbR.

Cloud hosting available

No need to host the server yourself. Run RavenDB on the cloud with RavenHQ, CloudBird, AppHarbor or Windows Azure.