
Andrei M. · Automation · 14 min read

Case Study: How a Stationery Brand Scraped 6,000 Products from a Legacy Supplier System

A stationery brand needed to import 6,000 products from a supplier whose only catalog was a legacy web application with no export functionality. URL-based scraping completed the extraction in under a day.


A stationery brand selling notebooks, pens, art supplies, office organization products, and specialty paper had sourced from a key supplier for 11 years. When the brand decided to migrate their catalog management to MicroPIM and centralize all product data, they discovered that this supplier — responsible for 6,000 SKUs — had no way to export their product catalog. The supplier’s inventory and product data lived in a legacy web application from the early 2000s. No CSV export, no API, no product feed. The only way to access the data was through the product pages of the supplier’s web interface. Manual copy-paste was projected to take 3 weeks of full-time work for two people. Product data scraping via URL import completed the extraction in under a day.


The Challenge

The legacy supplier system was a bespoke web application built in the early 2000s that the supplier had never replaced. It served its original purpose — managing the supplier’s internal inventory — but it had not been built with data portability in mind. From the stationery brand’s perspective, the system had one relevant characteristic: each product had a stable URL that displayed a product detail page with the full product record, including SKU, name, description, category, dimensions, weight, material, color, price, and stock status.

The system had no export function. The supplier’s technical contact confirmed this directly: there was no button, no menu option, and no backend tool that would generate a file containing the product catalog. The supplier was aware of the limitation but had not prioritized replacing the system because it still functioned for their internal operations. A full system replacement was on their roadmap but at least 18 months away.

The stationery brand’s options as assessed by their operations team:

Option 1: Manual copy-paste. One team member could copy-paste approximately 25-30 complete product records per hour from the supplier web interface into a spreadsheet. At 6,000 products and 25 products per hour, this would require 240 hours — approximately 6 weeks for one person working full-time, or 3 weeks for two people working in parallel. Either timeline delayed the broader catalog migration project, which was on a 4-week schedule.

Option 2: Custom web scraping script. The brand’s developers could write a custom scraping script in Python or Node.js to automate the extraction (a minimal sketch of what this involves appears after this list). This was technically feasible, but the developer time estimate was 2 to 4 days for script development and testing, plus additional time for handling pagination, rate limiting, and error recovery for a legacy system that behaved inconsistently. The developer resource was not available within the migration timeline.

Option 3: Third-party scraping service. External scraping-as-a-service vendors were quoted at €1,200 to €2,400 for a one-time catalog extraction of this scale, with delivery timelines of 5 to 10 business days and no guarantee of field mapping to the brand’s catalog schema.

Option 4: Accept incomplete data. Proceeding with the migration without the supplier’s catalog data and filling it in retrospectively. This was rejected because the supplier’s 6,000 products represented 38% of the brand’s active SKU count, and going live with 38% of the catalog absent would make the system unusable for its intended purpose.

None of the four options were clearly acceptable given the migration timeline and budget constraints.
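
For a sense of what Option 2 would have involved, here is a minimal, hypothetical sketch of such a scraper in Python using requests and BeautifulSoup. The URLs, selectors, and table structure are placeholder assumptions, not the brand’s actual code — and the 2-to-4-day estimate comes from hardening this kind of skeleton against pagination, throttling, and the legacy system’s inconsistent behavior.

```python
# Hypothetical sketch of the Option 2 custom scraper (not the approach the
# brand ultimately used). URLs and selectors are placeholder assumptions.
import csv
import time

import requests
from bs4 import BeautifulSoup


def scrape_product(url: str) -> dict:
    """Fetch one legacy product page and pull fields from its spec table."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    record = {"url": url}
    # Legacy systems of this era often render specs as plain two-cell rows:
    # <tr><td>Weight</td><td>120 g</td></tr>
    for row in soup.select("table tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == 2:
            record[cells[0]] = cells[1]
    return record


def scrape_catalog(urls: list[str], out_path: str = "catalog.csv") -> None:
    """Scrape every product URL and write the records to a CSV file."""
    records = []
    for url in urls:
        records.append(scrape_product(url))
        time.sleep(3)  # be gentle with a legacy server of unknown capacity
    fieldnames = sorted({key for record in records for key in record})
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```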


What They Tried First

Before identifying the MicroPIM URL import approach, the brand’s operations team had attempted a partial manual extraction. Two team members spent one full working day on manual copy-paste, working through the supplier’s product categories systematically. By end of day, they had captured 340 product records out of 6,000 — approximately 5.7% of the catalog.

The manual work also revealed a structural complexity they had not anticipated: the supplier’s legacy system displayed product variants (different colors or sizes of the same base product) as separate full product pages rather than as variants on a parent record. This meant the 6,000 URL count was actually 6,000 distinct product pages, some of which were color or size variants of the same physical product. Identifying and grouping variants would require additional manual analysis on top of the basic data extraction.

A developer who had seen the supplier’s page source code estimated that writing a scraping script would be straightforward from a technical standpoint — the legacy system used simple HTML tables for specification data with consistent markup — but the time estimate was 3 to 4 days including testing. That developer was committed to another project for the next 2 weeks.


The Solution

The solution used MicroPIM’s URL import feature to systematically scrape the supplier’s product pages in batches, extract the product data from the HTML, and map the extracted fields to the brand’s catalog schema.

Step 1: Build the URL List

The first prerequisite was a complete list of the 6,000 product page URLs from the supplier system. The supplier’s web interface had a category hierarchy with 42 product categories, each with a paginated product listing. While the system had no export function, the listing pages did support a “view all” mode that displayed all products in a category on a single page with their individual product page links.

The operations manager spent approximately 2 hours systematically navigating to each category’s full listing page, selecting all product links, and pasting them into a URL list spreadsheet. This produced 5,987 distinct product URLs — the full catalog minus 13 products that appeared to be discontinued (no-content pages).

The URL collection was the manual groundwork necessary to enable the automated scraping — 2 hours of work rather than weeks of transcription.

[SCREENSHOT: Spreadsheet showing URL list with 5,987 rows — each row contains a product page URL from the supplier legacy system, organized by product category column]
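
In this case the “view all” pages made manual collection fast, but on a larger catalog the same groundwork could itself be scripted. A hypothetical sketch in Python — the listing URLs and the “/product/” link pattern are assumptions that would need checking against the real system:

```python
# Hypothetical sketch: harvest product-page links from category listing
# pages. The "/product/" link pattern is an assumption for illustration.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def collect_product_urls(listing_urls: list[str]) -> list[str]:
    """Gather distinct product-page links from category 'view all' pages."""
    seen: set[str] = set()
    for listing_url in listing_urls:
        response = requests.get(listing_url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        for link in soup.find_all("a", href=True):
            href = urljoin(listing_url, link["href"])
            if "/product/" in href:  # adjust to the real link pattern
                seen.add(href)
    return sorted(seen)
```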

Step 2: Configure the URL Import and Field Mapping

The URL list was uploaded to MicroPIM’s URL import tool. Before running the full import, the team ran a test batch of 20 URLs from three different categories to validate that the extraction was capturing all required fields.

The test batch returned extracted data for all 20 products. The legacy system’s HTML structure used consistent table markup for specifications across all categories, which meant the extraction pattern worked uniformly. Extracted fields from the supplier pages included:

  • Supplier SKU (the supplier’s internal product code)
  • Product name (in the supplier’s format)
  • Full description text
  • Category path (3 levels deep in the supplier’s taxonomy)
  • Material specification
  • Dimensions (length, width, height) as a concatenated string in the format “LxWxH cm”
  • Weight
  • Color (where applicable)
  • Pack size (for products sold in multiples)
  • Supplier price (the cost price shown in the brand’s logged-in supplier portal)

The dimension field required a transformation rule — the supplier stored “LxWxH cm” as a single string, while MicroPIM stores length, width, and height as separate numeric fields. A split transformation was configured in the field mapping to parse the string into three separate values.

[SCREENSHOT: MicroPIM URL import field mapping configuration showing supplier page extracted fields on the left mapped to MicroPIM catalog attributes on the right, with the dimension string split transformation highlighted]
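
For illustration, the split transformation’s logic can be expressed as a short Python function. This is a conceptual sketch — the actual rule is configured in MicroPIM’s mapping UI, and the field names here are placeholders:

```python
# Illustrative parse of the supplier's "LxWxH cm" dimension string into
# three numeric fields. Conceptual only; the real rule is configured in
# MicroPIM's field mapping UI, and the output field names are placeholders.
import re


def split_dimensions(raw: str) -> dict[str, float]:
    """Parse strings like '21x14.8x1.5 cm' into separate numeric fields."""
    match = re.fullmatch(
        r"\s*([\d.,]+)\s*x\s*([\d.,]+)\s*x\s*([\d.,]+)\s*cm\s*",
        raw,
        flags=re.IGNORECASE,
    )
    if match is None:
        # Records that deviate from the standard pattern surface for review.
        raise ValueError(f"Unexpected dimension format: {raw!r}")
    length, width, height = (float(g.replace(",", ".")) for g in match.groups())
    return {"length_cm": length, "width_cm": width, "height_cm": height}


# split_dimensions("21x14.8x1.5 cm")
# -> {'length_cm': 21.0, 'width_cm': 14.8, 'height_cm': 1.5}
```

A strict pattern like this is also what surfaces deviating records for review — the handful of QA findings described later in the results were dimension strings that departed from the standard format.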

Step 3: Run the Full Import in Batches

With the field mapping validated on the test batch, the full 5,987-URL import was queued in batches of 500. MicroPIM rate-limited requests to the supplier server to avoid overloading the legacy system — the rate limit was set conservatively at one request every 3 seconds given the age and unknown capacity of the supplier’s infrastructure.

Eleven batches of 500 and a final batch of 487 ran sequentially. Total extraction time for all 5,987 products at one request every 3 seconds: approximately 5.5 hours, including batch queuing overhead.

During the import run, 34 URLs returned error responses — connection timeouts that coincided with the supplier system throttling responses during its peak usage hours. These URLs were re-queued and run again after a 30-minute interval, and all 34 extracted successfully on the second attempt.
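
The behavior described in this step — a fixed request interval, sequential batches, and a one-shot retry queue after a cooldown — reduces to a simple pattern. A minimal sketch of that pattern in Python (MicroPIM handles this internally; the function here is purely illustrative):

```python
# Illustrative sketch of rate-limited fetching with a retry queue,
# mirroring the import behavior described above. MicroPIM handles this
# internally; this function exists only to show the logic.
import time

import requests


def fetch_batch(urls: list[str], interval: float = 3.0,
                retry_wait: float = 1800.0) -> dict[str, str]:
    """Fetch each URL at a fixed interval; re-queue failures for one retry."""
    results: dict[str, str] = {}
    failed: list[str] = []
    for url in urls:
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            results[url] = response.text
        except requests.RequestException:
            failed.append(url)  # e.g. a connection timeout from the legacy system
        time.sleep(interval)

    if failed:
        time.sleep(retry_wait)  # wait out peak-hour throttling before retrying
        for url in failed:
            response = requests.get(url, timeout=30)
            response.raise_for_status()  # in this case all retries succeeded
            results[url] = response.text
            time.sleep(interval)
    return results
```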

Extraction completion: 5,987 products, 0 failures after re-queuing, elapsed time 7 hours total including the re-queue wait.

Step 4: Review, Variant Grouping, and Schema Mapping

The extracted records required three post-extraction steps before they were catalog-ready.

First, the brand’s SKU codes were not present in the supplier system — the supplier’s product codes were used as identifiers. The team ran a cross-reference against their existing internal records to match supplier SKUs to brand SKUs for the approximately 3,200 products that already existed in the brand’s system under their own codes. This matching took approximately 4 hours using MicroPIM’s bulk edit view with a SKU mapping import.
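
Conceptually, that cross-reference is a join on the supplier SKU. A minimal sketch of the matching logic, assuming hypothetical CSV exports and column names:

```python
# Minimal sketch of the supplier-SKU to brand-SKU cross-reference.
# File paths and column names are hypothetical assumptions.
import csv


def match_skus(extracted_path: str, existing_path: str):
    """Split extracted records into SKU-matched and new products."""
    with open(existing_path, newline="", encoding="utf-8") as f:
        # Map supplier SKU -> brand SKU from the brand's existing records.
        brand_by_supplier = {
            row["supplier_sku"]: row["brand_sku"] for row in csv.DictReader(f)
        }
    matched, new_products = [], []
    with open(extracted_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            brand_sku = brand_by_supplier.get(row["supplier_sku"])
            if brand_sku:
                row["brand_sku"] = brand_sku
                matched.append(row)
            else:
                new_products.append(row)
    return matched, new_products
```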

Second, variant grouping: the 5,987 extracted records included approximately 1,400 products that were color or size variants of a shared parent product. MicroPIM’s duplicate detection identified 487 product name clusters where multiple records shared the same base name with color or size suffixes. The team reviewed these clusters and configured parent-child variant relationships for 412 of them (75 were genuinely distinct products despite similar names).
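
The clustering step can be pictured as grouping records by a normalized base name with color and size suffixes stripped. A hypothetical sketch — the suffix vocabulary is an assumption, and MicroPIM’s duplicate detection is configured rather than coded:

```python
# Hypothetical sketch of grouping variant records by base product name.
# The suffix vocabulary is an assumption for illustration; MicroPIM's
# duplicate detection is configured in the product, not coded.
import re
from collections import defaultdict

SUFFIX = re.compile(
    r"\s*[-,]?\s*(red|blue|black|green|a4|a5|a6|small|medium|large)\s*$",
    flags=re.IGNORECASE,
)


def cluster_variants(names: list[str]) -> dict[str, list[str]]:
    """Group product names that share a base name after suffix stripping."""
    clusters: defaultdict[str, list[str]] = defaultdict(list)
    for name in names:
        base = SUFFIX.sub("", name).strip()
        clusters[base].append(name)
    # Only multi-member clusters are variant candidates for human review.
    return {base: members for base, members in clusters.items() if len(members) > 1}


# cluster_variants(["Gel Pen - Red", "Gel Pen - Blue", "Stapler"])
# -> {"Gel Pen": ["Gel Pen - Red", "Gel Pen - Blue"]}
```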

Third, the supplier’s category taxonomy was mapped to the brand’s internal category hierarchy. The supplier used 42 categories; the brand used 28. A category mapping table was built and applied in bulk.
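
The mapping table itself is a many-to-one lookup, one entry per supplier category. A minimal sketch with hypothetical category names:

```python
# Minimal sketch of a many-to-one category mapping table.
# Category names are hypothetical examples.
CATEGORY_MAP = {
    "Writing Instruments > Gel Pens": "Pens",
    "Writing Instruments > Ballpoints": "Pens",
    "Paper Goods > Specialty Paper": "Specialty Paper",
    # ... one entry per supplier category, 42 in total
}


def map_category(supplier_category: str) -> str:
    """Translate a supplier category into the brand's category hierarchy."""
    try:
        return CATEGORY_MAP[supplier_category]
    except KeyError:
        raise ValueError(f"Unmapped supplier category: {supplier_category!r}")
```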

[SCREENSHOT: MicroPIM bulk import results summary showing 5,987 products extracted, 3,200 matched to existing SKUs, 2,787 new products created, 412 variant parent-child relationships established, and 42 supplier categories mapped to 28 brand categories]


The Results

Total time for product data extraction and catalog import: 7 hours for automated scraping plus 1 working day for post-extraction review, SKU matching, variant grouping, and category mapping. Total elapsed time: approximately 2 working days from starting the URL list collection to having 5,987 products in the catalog.

Comparison to manual approach: The manual copy-paste estimate had been 3 weeks for two people. The URL import approach completed the equivalent work in 2 days for one person plus automated processing. An estimated 225 person-hours saved.

Migration timeline impact: The catalog migration project completed within its 4-week schedule. The supplier data import, which had been identified as the primary schedule risk, came in 2.5 weeks ahead of the manual estimate, creating buffer time for QA on the broader migration.

Data completeness: The extracted records were more complete than the manual approach would have produced. Manual transcription tends to degrade in accuracy over a multi-week effort as attention fatigue sets in, while the automated extraction applied the same field mapping uniformly to all 5,987 records. Post-import QA on a sample of 200 products found 195 fully accurate records (97.5%) — the 5 exceptions were extraction errors on the dimension field, where the supplier’s format had varied slightly from the standard pattern.

Supplier relationship: The supplier’s technical team received a summary of the extracted data to verify accuracy. Their assessment was that the extraction had produced a more complete and structured version of their product data than the legacy system held internally, since the system stored its data in a less organized format than what the product pages displayed. The supplier used the extracted, structured data to begin planning their own system migration.

Ongoing process: For the 200-400 new products the supplier adds to their system each quarter, a URL update process was established: the operations team collects new product URLs from the supplier’s “New Arrivals” category listing monthly and runs a targeted import batch of 200-400 URLs. This takes approximately 30 minutes per month and keeps the catalog synchronized with the supplier’s new additions.


Key Takeaways

  • Legacy supplier systems without export functionality are a common constraint in supplier-diverse catalogs. The product data scraping approach removes the binary choice between manual transcription and expensive custom development.
  • The manual groundwork — collecting the URL list — is the necessary step that URL import cannot replace. In this case, that took 2 hours for a 6,000-product catalog. Identifying a method to collect URLs systematically (category listing pages, paginated search results) reduces this to a tractable task.
  • Extraction accuracy from HTML pages depends heavily on the consistency of the page markup. Legacy systems built with structured HTML tables often scrape more reliably than modern single-page applications built with JavaScript-rendered content.
  • Rate limiting imports on legacy supplier systems is not optional — overloading a system with no capacity headroom will get your requests blocked and potentially disrupt the supplier’s operations.
  • The post-extraction work (SKU matching, variant grouping, category mapping) is often underestimated when planning a scraping project. Allow as much time for post-processing as for the extraction itself.

If your catalog has supplier data locked in a legacy system with no export functionality, the manual copy-paste math almost certainly does not work within any reasonable project timeline. MicroPIM’s URL import converts a months-long manual project into a day-scale automated one. Test the approach on a sample of your supplier’s product pages by creating a free account at app.micropim.net/register.



Frequently Asked Questions

What types of legacy supplier systems does URL import work with?

MicroPIM’s URL import works with any system that displays product data as standard HTML on a stable, accessible URL. This includes legacy web applications built with PHP, ASP, ColdFusion, or early Java frameworks — common in systems built in the 2000s and early 2010s. It also works with modern ecommerce platforms (Shopify, Magento, WooCommerce) and B2B supplier portals. The key requirements are: the product data must be rendered in the HTML response (not loaded exclusively via JavaScript after page load), and the product must have a distinct, stable URL. Systems that display product data only through search results or require a session login before displaying prices may require additional configuration. URL imports do not work with PDF product catalogs — MicroPIM has a separate PDF import function for those.
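
A quick way to test the “rendered in the HTML response” requirement for any candidate system is to fetch a product page without a browser and check whether a known field value appears in the raw HTML. A hypothetical sketch:

```python
# Quick check: is the product data present in the raw HTML response,
# or only rendered client-side by JavaScript? URL and value are placeholders.
import requests


def data_is_in_html(product_url: str, known_value: str) -> bool:
    """Return True if a known field value (e.g. a SKU) appears in the raw HTML."""
    response = requests.get(product_url, timeout=30)
    response.raise_for_status()
    return known_value in response.text


# data_is_in_html("https://supplier.example.com/product/12345", "SKU-12345")
```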

How do you handle a supplier system that requires login credentials to display full product data including prices?

For supplier portals that show full product data including pricing only to authenticated users, MicroPIM’s URL import supports authenticated sessions. You configure the authentication credentials (username/password, or session cookie) in the import settings, and the importer makes authenticated requests to the product pages. This is the typical setup for B2B supplier portals where catalog data and pricing are behind a login wall. For the stationery brand in this case study, the supplier’s legacy system showed all product data publicly, so no authentication configuration was needed — but the authentication feature exists for systems that require it.
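
The underlying mechanics are the same as any authenticated HTTP client. A hypothetical Python sketch of the two common modes — form login or a pre-captured session cookie — with placeholder URLs and field names (in MicroPIM itself this is configured in the import settings, not coded):

```python
# Hypothetical sketch of the two common authentication modes for supplier
# portals. URLs, form field names, and cookie names are placeholders; in
# MicroPIM this is configured in the import settings rather than coded.
import requests


def fetch_with_form_login(login_url: str, product_url: str,
                          username: str, password: str) -> str:
    """Log in via a form post, then request the product page in-session."""
    session = requests.Session()
    session.post(login_url, data={"username": username, "password": password})
    response = session.get(product_url, timeout=30)
    response.raise_for_status()
    return response.text


def fetch_with_session_cookie(product_url: str, cookie_name: str,
                              cookie_value: str) -> str:
    """Reuse a pre-captured session cookie instead of a form login."""
    session = requests.Session()
    session.cookies.set(cookie_name, cookie_value)
    response = session.get(product_url, timeout=30)
    response.raise_for_status()
    return response.text
```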

What happens when a supplier’s legacy system changes its page structure after you have built a scraping configuration?

If a supplier updates their system and changes the HTML structure of their product pages, the field extraction pattern in MicroPIM may begin returning incorrect or missing values. This is detected during import when extracted fields are empty or contain unexpected values — the post-import review step will catch this for a manual review batch, or the import validation will flag it if required fields are missing. When this happens, the field mapping configuration needs to be updated to match the new HTML structure. The frequency of this issue depends on how actively the supplier updates their system — for a legacy system that has not changed in 15 years, the risk of near-term structural changes is low.
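
Detection comes down to a required-field check on each extracted record. A minimal sketch of that validation, with a hypothetical field list:

```python
# Minimal sketch of the validation that flags a structural change:
# required fields that come back missing or empty. Field list is hypothetical.
REQUIRED_FIELDS = ["supplier_sku", "name", "price"]


def validate_record(record: dict) -> list[str]:
    """Return the names of required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


# A batch where most records fail this check usually means the page
# structure changed and the field mapping needs updating.
```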

Is it appropriate to scrape a supplier’s website without their explicit permission?

Best practice is to inform your supplier that you are using automated URL access to collect product data for your own catalog management purposes. In a supplier relationship, the product data you are extracting belongs to the shared commercial relationship — you are collecting data about products you sell, from pages the supplier has made accessible to their trade customers. Most suppliers have no objection to this when explained in context. You should review the supplier’s website terms of service and consult your legal team if there is any uncertainty about the specific terms in your jurisdiction or the supplier’s terms. For the stationery brand in this case, the supplier was informed of the approach and provided their support — the URL import process was used with the supplier’s knowledge.

Written by

Andrei M.

Founder, MicroPIM

Entrepreneur and founder of MicroPIM, passionate about helping e-commerce businesses scale through smarter product data management.

"Your most unhappy customers are your greatest source of learning." — Bill Gates
