Solved Check URL using DOS script

born2achieve

Hi,

I am using Windows 7 64-bit. I have a text file, "sample.txt", that contains 200000 image URLs like the ones below:


http://www.imagesup.net/img/icon_index1.png
http://www.imagesup.net/img/icon_index2.png
http://www.imagesup.net/img/icon_index3.png
http://www.imagesup.net/img/icon_index3.png
http://www.imagesup.net/img/icon_index5.png
http://www.imagesup.net/img/icon_index6.png
http://www.imagesup.net/img/icon_index7.png
http://www.imagesup.net/img/icon_index8.png
........

I want to check whether each image exists on the server. All my images are hosted remotely. Is it possible to achieve this with a batch script? When I execute the batch file, it should write the URLs that don't have an image to an output.txt file. Is it possible to achieve this using a DOS script? Any sample code, please.

Thanks
 

My Computer

Computer type
PC/Desktop
OS
windows7 64 bit
Can you post the txt file somewhere (preferably zipped) and post a link to it?

You just want to check whether http://www.imagesup.net/img/icon_index6.png exists (and the same for a very large number of files)? So the files have to be checked on the web server. Is that a Linux or Windows machine?
 

My Computer

Computer type
Laptop
Computer Manufacturer/Model Number
ACER ASPIRE 5742G
OS
Microsoft Windows 7 Home Premium 64-bits 7601 Multiprocessor Free Service Pack 1
CPU
Intel(R) Core(TM) i3 CPU M 370 @ 2.40GHz
Motherboard
Acer Aspire 5742G
Memory
4,00 GB
Graphics Card(s)
ATI Mobility Radeon HD 5400 Series
Sound Card
(1) AMD High Definition Audio Device (2) Realtek High Defi
Screen Resolution
1366 x 768 x 32 bits (4294967296 colors) @ 60 Hz
Hard Drives
WDC WD5000BEVT-22ZAT0
Hi,

Thanks for your reply. It's a Windows server, but I don't have access to it. Is it possible to check from my local machine? Could you please assist me?
 

Is it just a large number of URLs you want to check?
Does each line in the txt file have one URL?

Please post the txt file zipped.
 

Hey Dude,

Thanks for the reply. I was finally able to install GetGnuWin32, and I followed all the installation steps specified in the document. Now I can see wget.exe, and I tried:

D:\GnuWin32\GetGnuWin32\bin> wget.exe http://pagead2.googlesyndication.com/simgad/11417432194530698365

I could see the result in the command prompt, and I can see the image downloaded to the root folder. Now could you please help me read URL.txt, which has 200000 URLs, and write the URLs that don't have an image to output.txt?

[Note: I don't want to download the images to my folder. I just need
to get the URLs that don't have an image.]

Could you please help me turn this process into a batch script?


Any help, please.
 

Can you post the txt file? Best to zip it first.
I want to see the format and test. Or would you rather not, due to privacy?
In that case, post only a small part.
 

I made a small batch file, no third-party tools used. But this solution is going to be very slow and might take days if you have 200000 URLs to check! I suggest you do a small test first like I've done.

This batch file assumes you run it from the folder where the sample.txt file is. As content for my sample.txt file I used the URLs listed in your first post and added 4 URLs for pictures that actually exist. None of the URLs from your first post work.

sample.txt content:
http://www.imagesup.net/img/icon_index1.png
http://www.imagesup.net/img/icon_index2.png
http://www.imagesup.net/img/icon_index3.png
http://www.imagesup.net/img/icon_index3.png
http://www.imagesup.net/img/icon_index5.png
http://www.imagesup.net/img/icon_index6.png
http://www.imagesup.net/img/icon_index7.png
http://www.imagesup.net/img/icon_index8.png
http://www.sevenforums.com/images/styles/window7/misc/r1sha-1c.jpg
http://www.sevenforums.com/images/ranks/MCC130c.png
http://www.theregister.co.uk/Design/graphics/std/rlogo.png
https://images.cdn.static.malwareby...s/20141121-15225cba892/images/header-logo.jpg

The batch file:
Code:
@ECHO OFF

REM Only URLs that failed will be printed

REM Create Vbscript to check URL:
 >"TestURL.vbs" echo URL = wscript.arguments(0)
>>"TestURL.vbs" echo Set objXMLHTTP = CreateObject("MSXML2.XMLHTTP")
>>"TestURL.vbs" echo objXMLHTTP.open "GET", URL, false
>>"TestURL.vbs" echo objXMLHTTP.send()
>>"TestURL.vbs" echo While Not objXMLHTTP.ReadyState = 4
>>"TestURL.vbs" echo     WScript.Sleep 10
>>"TestURL.vbs" echo Wend
>>"TestURL.vbs" echo If Not objXMLHTTP.Status = 200 Then
>>"TestURL.vbs" echo     WScript.Echo(URL)
>>"TestURL.vbs" echo End if
>>"TestURL.vbs" echo Set objXMLHTTP = Nothing

REM Read each line in txt file and call the Vbscript with the line as parameter:
for /F "delims=" %%a in (sample.txt) do (
    cscript /nologo TestURL.vbs "%%a"
)
Output when I run the batch file:
http://www.imagesup.net/img/icon_index1.png
http://www.imagesup.net/img/icon_index2.png
http://www.imagesup.net/img/icon_index3.png
http://www.imagesup.net/img/icon_index3.png
http://www.imagesup.net/img/icon_index5.png
http://www.imagesup.net/img/icon_index6.png
http://www.imagesup.net/img/icon_index7.png
http://www.imagesup.net/img/icon_index8.png
Only the failed URLs were listed. Not the 4 URLs I added.

For testing purposes the batch file doesn't write to output.txt. Once you've tested it and want to create output.txt, run the batch file like this, which redirects the output to the file instead of the command window:
nameOfBatchFile > output.txt
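
For readers who have Python available (nothing in this thread uses it, so that is an assumption), the same read-each-line-and-check idea can be sketched as below. It is only an illustration: the names url_exists and write_missing are made up for this example, and it sends a HEAD request instead of the GET used in the VBScript above, so that only the headers are requested. Some servers answer HEAD differently from GET, so treat a reported failure as a hint and not as proof.

```python
import http.client
import urllib.request


def url_exists(url, timeout=10.0):
    """Return True if a HEAD request for `url` is answered with HTTP 200."""
    try:
        # HEAD asks the server for the headers only, not the image itself
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # 404 and other HTTP errors, unreachable hosts, timeouts, malformed URLs
        return False


def write_missing(in_path, out_path, check=url_exists):
    """Write every URL from in_path that fails check() to out_path; return the count."""
    missing = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            url = line.strip()
            if url and not check(url):  # skip blank lines
                dst.write(url + "\n")
                missing += 1
    return missing
```

Calling write_missing("sample.txt", "output.txt") from the folder that holds sample.txt would then produce the same kind of output.txt as the redirect above. The check parameter is there so the file handling can be tried with a stand-in function before touching the network.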
 

My Computer

Computer type
Laptop
Computer Manufacturer/Model Number
HP Elitebook 8540p
OS
Windows 7 Pro 32
CPU
Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz
Motherboard
Hewlett-Packard 1521
Memory
4,00 GB (Usable 2,98)
Graphics Card(s)
NVIDIA NVS 5100M
Sound Card
NVIDIA High Definition Audio
Screen Resolution
1600x900
Hard Drives
INTEL SSDSA2CW120G3
Antivirus
F-Secure Internet Security
Browser
IE, Firefox, Opera
Other Info
Sandboxie,
SRP (Software Restriction Policy),
EMET (Enhanced Mitigation Experience Toolkit),
WFC (Windows Firewall Control by BiniSoft),
Malwarebytes Premium
Hey Dude,

I was finally able to install GetGnuWin32, and I followed all the installation steps specified in the document. Now I can see wget.exe, and I tried:

D:\GnuWin32\GetGnuWin32\bin> wget.exe http://pagead2.googlesyndication.com/si ... 4530698365

I could see the result in the command prompt, and I can see the image downloaded to the root folder. I needed to read URL.txt, which has 200000 URLs, and write the URLs that don't have an image to output.txt.
I then got the exact output I wanted with the code below:

Code:
@echo off
(for /f "usebackq delims=" %%a in ("url-list.txt") do (
    "D:\GnuWin32\GetGnuWin32\bin\wget.exe" --spider "%%a" || echo missing %%a
))>url.log
pause

I am good now, but this doesn't seem to be the fastest way, as it takes plenty of time to produce the result. Please guide me to the fastest way to achieve this.
 

Hey Guyz,

Thanks for all your timely help. Could you please help me achieve this faster? Any suggestions, please.
 

Sorry, that's the best I can do. The problem is that your PC has to access every URL and wait for the web site to finish loading it before you know whether the picture exists or not.

Normally these kinds of things are done with web crawler tools from a server, but that's another issue and one I can't help you with.
 

Thanks Tookeri,

Can you please suggest the best web crawler tool? How can I achieve this functionality in a better and faster way?
 

Use wget -S --spider

and post the results.
 

Hi Kaktusoft,

Yes, I am using wget -S --spider for this purpose, but it takes a huge amount of time (hours and hours) and it still hasn't finished.

So what is the best way to achieve this? Please guide me.
 

Here's a very simple solution that should make it roughly 10 times faster, since each check spends most of its time waiting for the server:

Split the file with URLs to 10 separate files and create 10 versions of the batch file that reads each URL file, and create 10 different output files. Then run all 10 simultaneously. When all have finished you add all output files into one single file.
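
The same idea can also be sketched inside a single program. As an illustration only, and assuming Python is available (nothing in this thread uses it), the snippet below swaps the 10 batch files for a pool of 10 worker threads, which is a different technique from the manual split described above. find_missing_parallel and fake_check are made-up names, and fake_check merely stands in for a real network check.

```python
from concurrent.futures import ThreadPoolExecutor


def find_missing_parallel(urls, check, workers=10):
    """Return the URLs for which check(url) is false, keeping input order.

    `check` is any callable that takes a URL and returns True when the image
    exists; up to `workers` calls run at the same time on separate threads.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever check finishes first
        results = list(pool.map(check, urls))
    return [url for url, ok in zip(urls, results) if not ok]


def fake_check(url):
    # Placeholder for a real network check: pretend only ".jpg" URLs exist.
    return url.endswith(".jpg")


print(find_missing_parallel(["a.png", "b.jpg", "c.png"], fake_check, workers=2))
# prints ['a.png', 'c.png']
```

Because the results come back in one ordered list, there are no separate output files to merge at the end. A very high worker count may overload the remote server, so a modest number is the safer starting point.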
 

Hi Tookri,

Good solution. I split the list into 20 files of 10000 URLs each, and it finished within 3-4 hours. I like this idea.

Thanks for the great tip. Thanks a lot to everyone who participated and helped me.
 

Happy to hear it worked out for you :)
 
