Is it possible to redirect garbage external URLs to a search routine instead of a 404 page?
Posted: 11 April 2011 02:00 AM
Joined: 2011-03-08
417 posts

 
Google Webmaster Tools is complaining of numerous links returning with “An Error Was Encountered [400]”.

Edit - start:

Just noticed that my original post was truncated :(

What I would like to do is to somehow trap the URL before it fails the routing tests, etc.

I would like to use the following code:

$bad_chars = array('width', 'height', '=', ':', '<', '>', 'alt', '//', 'etc');

// replace each junk token with a slash (REQUEST_URI is an alternative source)
$good_words = str_replace($bad_chars, '/', $_SERVER['REDIRECT_URL']);

header('Location: /my_search_routine/' . $good_words);
exit;

Edit - end:


The following is an example which results in application/errors/error_general.php

 
http://website.com/afiles/images/santa-email.jpg" width="100" height="50" alt="image"></a> </div> <div class="c0 r"><a

 Signature 

Ongoing project:
  http://johns-jokes.com/joke-of-the-day/2010/May.html
My Hippy Trail:
  http://the-road-to-kathmandu.johns-jokes.com/
In case you forget:
  http://deformedweb.co.uk/php_variable_tests.php

 
Posted: 11 April 2011 02:42 AM   [ # 1 ]
Joined: 2008-11-04
4422 posts

Make sure your images exist?

 Signature 

Me: WanWizard.eu | My company: Exite | Datamapper: DataMapper ORM

 
Posted: 11 April 2011 03:12 AM   [ # 2 ]
Joined: 2011-03-08
417 posts

 
The images all exist, the problem is the trailing junk.

Try appending the junk onto a known URL image on your site and see what happens.


I have just tried appending the junk onto your avatar and the response I get is “Oops! This link appears to be broken.”

 

http://ellislab.com/images/avatars/uploads/avatar_78055.jpg
Oops! This link appears to be broken.

 
 
 


 
Posted: 11 April 2011 03:31 AM   [ # 3 ]
Joined: 2009-06-19
6267 posts

The problem comes from the href content, which starts with a single quote but is erroneously closed with a double quote, so it's not actually closed until another single quote is found further down. So all of the following is taken as the URL:

http://www.snapshotjourneys.com/uploads/images/BORNEO/borneo-kota-kinabalu-malaysia/borneo-kota-kinabalu-malaysia-3-university.jpg" width="81" height="50" alt="image"></a> </div> <div class="c0 r"><a

 Signature 

Certified State of CT Computer Programming Teacher.
Custom Designed Icons, eBook Covers Software Boxes. CD, DVD Etc. New iPhone® Tab Bar Icons and iPhone® Applications Icons.

STOP! Before posting your questions, remember the WWW Golden rule:
What did you try? What did you get? What did you expect to get?

Input -> Controller | Processing -> Model | Output -> View

 
Posted: 11 April 2011 04:22 AM   [ # 4 ]
Joined: 2011-03-08
417 posts

 
Just updated my original post to include the requirements which were truncated.

 
 
 


 
Posted: 11 April 2011 01:28 PM   [ # 5 ]
Joined: 2008-11-04
4422 posts

I still don’t see how you can get into this situation other than invalid HTML or invalid links.
Which is your problem as a developer, and you should fix that, not work around it.


 
Posted: 11 April 2011 02:13 PM   [ # 6 ]
Joined: 2011-03-08
417 posts

I got into this situation by other webmasters using incorrect hotlinks. I have no control over these other sites but it appears I am being penalised by Google for not having corresponding landing pages for the bad URLs.

Here are the first two of the eighteen web sites that Google Webmaster Tools reports as having invalid links:

http://ezentials.com/eqk-7-days-before-santa-rfc.html

http://fivestarsmarketplace.com/lov-diagram-santa-pictures-printables.html

Search the source code for "afiles/images" and you will see that the first part of each image URL is correct but the complete URL is invalid.

I was hoping to find a way to test the URL before CI routed the URL to an error page. This would also be ideal for filtering all the other hotlinked images.
 
 
 


 
Posted: 11 April 2011 02:32 PM   [ # 7 ]
Joined: 2008-11-04
4422 posts

Ok. So this is about other sites linking to your site?

Then instead of a standard CI 404, route to a 404 controller that returns a 200 status and displays a 404 page with links to important parts of your application.
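
A minimal sketch of that idea in plain PHP, outside CodeIgniter. The function name build_not_found_response and the link list are hypothetical, and the 200 status is simply the value suggested above; a real controller would send the status and echo the body.

```php
<?php
// Hypothetical helper: builds the response for an unknown URL.
// It returns data only, so it can be exercised without sending headers.
function build_not_found_response(array $links, $status = 200)
{
    $items = '';
    foreach ($links as $label => $href) {
        // Escape both parts so a stray quote cannot break the markup.
        $items .= '<li><a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
                . htmlspecialchars($label, ENT_QUOTES) . '</a></li>';
    }
    return array(
        'status' => $status,
        'body'   => '<h1>Page not found</h1><ul>' . $items . '</ul>',
    );
}

// Illustrative link list; the paths are assumptions.
$response = build_not_found_response(array(
    'Home'   => '/',
    'Search' => '/joke/search/',
));
```

Returning an array rather than calling header() directly keeps the page-building logic separate from the framework's own error handling.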

 Signature 

Me: WanWizard.eu | My company: Exite | Datamapper: DataMapper ORM

 
Posted: 11 April 2011 03:01 PM   [ # 8 ]
Joined: 2011-03-08
417 posts
WanWizard - 11 April 2011 06:32 PM

Ok. So this is about other sites linking to your site?

Then instead of a standard CI 404, route to a 404 controller that returns a 200 status and displays a 404 page with links to important parts of your application.

 

 
Ah the “penny has dropped”.

I was curious to know why my code was being ignored in /application/errors/error_general.php. I will try commenting out the header("HTTP/1.1 404 Not Found"); line and report back tomorrow… now it is way past my bedtime :)

Many thanks.
 
 
 


 
Posted: 12 April 2011 12:19 PM   [ # 9 ]
Joined: 2011-03-08
417 posts

Nearly there but cannot get both conditions to work together.

What I would like to do is to somehow trap the external URL before it fails the routing tests, etc.

The following .htaccess in the images folder is supposed to:
1. accept image links from my own site
2. intercept all external links and divert them to ./images/index.php
  (where the URL is parsed and routed to a search routine).

.htaccess

RewriteEngine on

# this line redirects everything to index.php including links from my own site
# RewriteRule (.*) index.php

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?johns-jokes\.com [NC]

# RewriteRule \.$ ./index.php
# RewriteRule (.*) index.php/$1 [R,NC,L]


 
./images/index.php

<?php
// this works fine
// 1. parses the URI
// 2. formats the results
// 3. redirects the results to my search routine with parameters

$x = $_SERVER['REQUEST_URI'];
if (strpos($x, '.') !== false)
{
    // bad link used for testing
    // $x = http://johns-jokes.com//afiles/images/days-before-christmas.png" width="39" alt="image"/>

    $x = substr($x, 15);        // skip the leading folder prefix
    $i2 = strpos($x, '.');
    $x = substr($x, 0, $i2);    // keep the file name up to the first dot

    $x = str_replace('-', '/', $x);

    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://johns-jokes.com/joke/search/' . $x, TRUE, 301);
    exit;
}
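
The parsing steps in the script can be pulled into a function and checked without a web server. This is only a sketch: the name image_uri_to_search_path is hypothetical, and the '/afiles/images/' prefix (15 characters, matching the substr offset) is an assumption taken from the URLs quoted in this thread.

```php
<?php
// Hypothetical helper mirroring the parsing steps of ./images/index.php.
// Returns the search path, or null when the URI is not a usable image link.
function image_uri_to_search_path($uri, $prefix = '/afiles/images/')
{
    // Only handle URIs that start with the assumed image folder.
    if (strpos($uri, $prefix) !== 0) {
        return null;
    }
    $name = substr($uri, strlen($prefix));
    $dot  = strpos($name, '.');
    // No extension, or nothing before the dot: nothing to search for.
    if ($dot === false || $dot === 0) {
        return null;
    }
    // "days-before-christmas" becomes "days/before/christmas".
    return str_replace('-', '/', substr($name, 0, $dot));
}

$path = image_uri_to_search_path('/afiles/images/days-before-christmas.png" width="39" alt="image"/>');
echo $path === null ? 'no match' : '/joke/search/' . $path;
// prints: /joke/search/days/before/christmas
```

Checking the prefix explicitly, instead of relying on a fixed offset, means a URI from another folder returns null rather than a mangled search path.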
  

 
 
 
