The Church Media Community
Equipping You to Communicate Effectively

Website Coding Discussions PHP, MySQL, Java, Javascript, ASP, etc.

  #1 (permalink)  
Old Friday, February 6th, 2009, 04:38 PM
New Church Media Member

 
 Join Date: Sep 2008 
 Last Online: Monday, August 16th, 2010 
Question php regex problem with url

Anybody familiar with regex, please keep reading:

On one of my databases the external links are saved in this format:
Code:
[url=www.google.com/index/]Link[/url]
It works fine when the domain is followed by a path, as above.
BUT, it won't recognize a bare domain with nothing after the TLD.
Can anybody spot the problem right away? Here's the regex:
Code:
$table_field = preg_replace(
"/\[url=www\.([a-z0-9\-]+\.(com|org|net|us|eu|fi))(\/[a-zA-Z0-9\-\_\/]+)(\.(jpg|gif))?\](.+?)\[\/url\]/",
"<a href='http://www.$1$3$4'>$6</a>",
$table_field);
Thanks in advance!
  #2 (permalink)  
Old Friday, February 6th, 2009, 06:40 PM
Tim Eason - ChurchMedia.net Community Founder 1999-2008
Spectator

 
Have you tried removing the "www" parts? Most sites don't need it anymore, and some sites might look something like blogs.domain.com. It may be causing some weird conflict....

It looks fine from what I can tell.
  #3 (permalink)  
Old Friday, February 6th, 2009, 08:22 PM
shayward's Avatar
New PA System Survivor

 
 Join Date: Jun 2006 
 Last Online: Thursday, November 10th, 2011 
I'm not much good with php but I like regular expressions a lot. I analyzed your regex and I think I see the problem. I'm assuming * means "zero or more" in php regex.

\[url=www\.([a-z0-9\-]+\.(com|org|net|us|eu|fi))(\/[a-zA-Z0-9\-\_\/]+)*(\.(jpg|gif))?\](.+?)\[\/url\]

What you were doing is saying that you always require at least one / followed by some various characters. In the case of the last URL you posted, there was no / followed by extra characters. So by adding the * after (\/[a-zA-Z0-9\-\_\/]+) it made it optional.
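
To see the difference side by side, here is a minimal sketch that runs the original pattern and the pattern with the added * against a bare domain and a domain with a path. The sample links and variable names are assumptions for illustration only; the patterns and replacement string are the ones from this thread.

```php
<?php
// Original pattern: the path group (\/...) is mandatory.
$original = '/\[url=www\.([a-z0-9\-]+\.(com|org|net|us|eu|fi))(\/[a-zA-Z0-9\-\_\/]+)(\.(jpg|gif))?\](.+?)\[\/url\]/';
// Same pattern with * after the path group, so the path may be absent.
$fixed = '/\[url=www\.([a-z0-9\-]+\.(com|org|net|us|eu|fi))(\/[a-zA-Z0-9\-\_\/]+)*(\.(jpg|gif))?\](.+?)\[\/url\]/';
$replacement = "<a href='http://www.$1$3$4'>$6</a>";

// Hypothetical sample inputs.
$bare = '[url=www.google.com]Link[/url]';
$with_path = '[url=www.google.com/index/]Link[/url]';

// No match, so preg_replace returns the input unchanged.
$bare_original = preg_replace($original, $replacement, $bare);
// Matches; the unmatched groups $3 and $4 expand to empty strings.
$bare_fixed = preg_replace($fixed, $replacement, $bare);
// Still matches links that do have a path.
$path_fixed = preg_replace($fixed, $replacement, $with_path);

echo $bare_original, "\n"; // [url=www.google.com]Link[/url]
echo $bare_fixed, "\n";    // <a href='http://www.google.com'>Link</a>
echo $path_fixed, "\n";    // <a href='http://www.google.com/index/'>Link</a>
```

Since the character class inside the path group already includes /, a single pass of the group swallows the whole path, so a ? in place of the * should behave the same way here.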

Now with all of that said, I don't believe the pattern you are using will properly validate all URLs. Even with the * to solve one problem, this URL won't validate:

java.sun.com
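
A quick check of that claim, as a sketch only: the pattern insists on a literal www. followed by exactly one label and one of the listed TLDs, so host names with extra labels fall through. The sample links below are made up for illustration.

```php
<?php
// Pattern from this thread, with the * fix applied.
$pattern = '/\[url=www\.([a-z0-9\-]+\.(com|org|net|us|eu|fi))(\/[a-zA-Z0-9\-\_\/]+)*(\.(jpg|gif))?\](.+?)\[\/url\]/';

// Hypothetical sample links.
$samples = array(
    '[url=www.java.com]Link[/url]',     // www + one label + TLD
    '[url=java.sun.com]Link[/url]',     // no leading www.
    '[url=www.java.sun.com]Link[/url]', // "sun" is not in the TLD list
);

$results = array();
foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[] = preg_match($pattern, $sample);
    echo $sample, ' => ', end($results) ? 'match' : 'no match', "\n";
}
```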

Have a look at the specification at http://www.w3.org/Addressing/rfc1738.txt and you'll be surprised how much simpler you could make your regex.
__________________
Sanctus Software
More RegEx: (?<BookTitle>[A-Za-z0-9 ]+)\s(?<ChapterNumber>\d{1,3})[:](?<VerseNumber>\d{1,3})
  #4 (permalink)  
Old Friday, February 6th, 2009, 08:50 PM
shayward's Avatar
New PA System Survivor

 
 Join Date: Jun 2006 
 Last Online: Thursday, November 10th, 2011 
Here's perhaps something to have a look at:

PHP Code:
((?<Protocol>[a-z]+)://)?
(?<HostName>
    (?<Label>[a-z0-9][a-z0-9-]*)
    ([.](?<Label>[a-z0-9][a-z0-9-]*))*
)
([:](?<Port>\d{1,5}))?
(?<Path>([/]([a-z0-9-.]|(%[0-9a-f]{2}))*)*|[/])?
I haven't tested a huge variety of host names and protocols but it supports things like:

http://sun.java.com:90/dir1/dir2/dir3/pic.jpg
http://sun.java.com:90/dir1/dir2/dir3
http://sun.java.com:90/
http://sun.java.com
ftp://downloads.somesite.org
http://sun.java.com/special%0Fcaracters

So it supports:
- capturing groups... feel free to strip them out if you don't need them
- protocol (http, ftp) optionally
- any number of labels in the host name (but not imposing length limits)
- port numbers optionally
- path including special characters (%xx) where "x" is a hex digit.

I didn't do query strings. Also, this is compliant with .NET (C#, VB, ASP.NET) so you might need to add some escape characters for php.
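
For php, a rough adaptation might look like the sketch below. It is not tested beyond the sample URLs, and it makes a few assumptions that are not in the pattern above: the function name split_url is made up, the pattern is anchored with ^ and $, the x and i modifiers stand in for the .NET whitespace and case options, and the repeated Label group is left unnamed because PCRE rejects duplicate group names by default.

```php
<?php
// Hypothetical helper: returns the named parts of a URL, or null if it does not match.
function split_url($url)
{
    // x = ignore whitespace in the pattern, i = case-insensitive.
    // ~ is used as the delimiter so the slashes need no escaping.
    $pattern = '~^
        ((?<Protocol>[a-z]+)://)?
        (?<HostName>
            [a-z0-9][a-z0-9-]*
            ([.][a-z0-9][a-z0-9-]*)*
        )
        ([:](?<Port>\d{1,5}))?
        (?<Path>([/]([a-z0-9.-]|(%[0-9a-f]{2}))*)*|[/])?
    $~xi';

    if (preg_match($pattern, $url, $m) !== 1) {
        return null;
    }
    // Groups that did not take part in the match may be missing or empty.
    return array(
        'Protocol' => isset($m['Protocol']) ? $m['Protocol'] : '',
        'HostName' => isset($m['HostName']) ? $m['HostName'] : '',
        'Port'     => isset($m['Port']) ? $m['Port'] : '',
        'Path'     => isset($m['Path']) ? $m['Path'] : '',
    );
}

$full = split_url('http://sun.java.com:90/dir1/dir2/dir3/pic.jpg');
print_r($full);
```

For general-purpose URL splitting, php's built-in parse_url() is probably the simpler route; the regex is mainly useful when you want to restrict which characters are allowed.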
  #5 (permalink)  
Old Friday, February 6th, 2009, 10:48 PM
New Church Media Member

 
 Join Date: Sep 2008 
 Last Online: Monday, August 16th, 2010 
Thanks Shayward!

Honestly, I was not expecting much of an answer from this forum, but I'm too lazy to sign up for new forums, so I gave it a shot...fortunately!

I'm intentionally avoiding any sun.java.com or :90 -type urls and especially any ftp urls. All I'm allowing is the basic www.java.com or with some subdirs and with/without jpg|gif file. Nothing else.

But thanks a thousand for your input! I hope that * will solve the problem. I'll give it a shot in the morning.
  #6 (permalink)  
Old Saturday, February 7th, 2009, 07:37 AM
shayward's Avatar
New PA System Survivor

 
 Join Date: Jun 2006 
 Last Online: Thursday, November 10th, 2011 
I'm glad it helped

You never know what someone on CMN is going to come up with
  #7 (permalink)  
Old Saturday, February 7th, 2009, 08:07 AM
New Church Media Member

 
 Join Date: Sep 2008 
 Last Online: Monday, August 16th, 2010 
Worked out perfect! Thanks shayward! The * was the fix.

It's always easier to invent something if you have a sample to look at

Tags
php, preg_replace, problem, regex


All times are GMT -6.

   
 
© 1995-2008, ChurchMedia™, ChurchMedia LLC
