
Andrei M. · AI Tools · 12 min read

Case Study: How a Lighting Manufacturer Enriched 1,500 Products Using Scraped Specs and AI

A lighting manufacturer had 1,500 products with incomplete technical specifications. They scraped missing data from component supplier websites, then used AI to generate complete product descriptions.

A lighting manufacturer selling LED panels, chandeliers, track systems, and desk lamps had 1,500 products in their catalog. Their internal records contained a model number, a wattage figure, and a price for each product. Everything else — lumens, color temperature, CRI, beam angle, IP rating, dimensions, and certifications — was missing. Without that technical data, their ecommerce channel was generating a conversion rate of 0.8% from product page visits, and the support team was fielding 40 to 60 pre-purchase specification queries per week that the product pages should have been answering automatically.


The Challenge

The missing technical data was not a mystery. It existed in detailed form on the websites of the component suppliers and LED chip manufacturers whose parts went into each lighting fixture. The problem was that none of this data had ever been collected and stored against the product catalog records. The catalog had been built for internal inventory purposes (model numbers, stock levels, and cost prices), not for customer-facing commerce.

The scale of the data enrichment problem was significant when broken down by attribute type:

  • 1,500 products with no lumen output values recorded
  • 1,500 products with no color temperature (CCT) classification
  • 1,200 products with no CRI (Color Rendering Index) value
  • 980 products with no beam angle specification
  • 1,500 products with no product dimensions
  • 870 products with no IP rating for moisture and dust resistance

Six critical technical attributes, missing from between 58% and 100% of the catalog.
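The 58% to 100% range follows directly from the counts in the list. A minimal sketch of that arithmetic, using hypothetical attribute keys for illustration:

```python
# Missing-attribute counts as listed above; the catalog holds 1,500 products.
# The dictionary keys are illustrative names, not fields from the source system.
CATALOG_SIZE = 1500

missing_counts = {
    "lumens": 1500,
    "cct": 1500,
    "cri": 1200,
    "beam_angle": 980,
    "dimensions": 1500,
    "ip_rating": 870,
}


def missing_share(count: int, total: int = CATALOG_SIZE) -> float:
    """Return the percentage of the catalog that lacks an attribute."""
    return round(100 * count / total, 1)


shares = {attr: missing_share(n) for attr, n in missing_counts.items()}

# Print from the best-covered attribute to the worst.
for attr, pct in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{attr:<12}{pct:>6.1f}%")
```

The lowest share is the IP rating (870 of 1,500) and the highest is any attribute missing from every record.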

The business impact was measurable. The 0.8% product page conversion rate was roughly one-third of the industry average for lighting products (2.1% to 2.8% per benchmarks from similar B2C lighting retailers). The specification query volume — 40 to 60 per week — was an operational overhead that the product pages were supposed to eliminate. And on marketplaces, incomplete specification data meant many products were not appearing in filtered searches at all, because marketplace faceted search requires structured attribute values to include a listing in a filtered result set.

The B2B wholesale buyers who represented 60% of revenue by value were particularly sensitive to this gap. Industrial buyers sourcing lighting for commercial installations require exact specification data for compliance, design, and procurement documentation. Multiple wholesale accounts had explicitly told the sales team that they were qualifying suppliers partly based on whether complete technical specification sheets were available on the product listing. Some had stopped requesting quotes for new products because the specification research burden was too high.


What They Tried First

The first attempt at product data enrichment was a manual spreadsheet project. A product data intern was assigned to research and fill missing specifications for each product by visiting the relevant component supplier websites and entering values into a master spreadsheet.

After 6 weeks, the intern had completed enrichment for 94 products. The working pace was roughly 6 products per full research day, allowing for the research time each product required. At that pace, the remaining 1,406 products would take approximately 234 working days (1,406 ÷ 6), or nearly a full year of one person’s time. The cost of that resourcing was not viable, and the project was paused.
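The projection can be reproduced in a few lines. This is only a sketch of the arithmetic; the 250 working days per year is an assumption added here to check the "nearly a full year" claim and does not come from the case study.

```python
CATALOG_SIZE = 1500
COMPLETED = 94
PRODUCTS_PER_DAY = 6          # pace quoted in the text
WORKING_DAYS_PER_YEAR = 250   # assumption, not from the case study

remaining = CATALOG_SIZE - COMPLETED
days_needed = remaining / PRODUCTS_PER_DAY
years_needed = days_needed / WORKING_DAYS_PER_YEAR

print(f"{remaining} products left, about {round(days_needed)} working days "
      f"({years_needed:.2f} working years)")
```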

The second attempt was supplier outreach. The catalog manager emailed 22 component suppliers requesting complete technical specification sheets in a structured format (CSV or Excel). Eleven suppliers responded. Of those, 7 provided data that was usable in some form — either a structured CSV or a PDF with well-organized specification tables. The other 4 sent general product brochures with marketing language rather than technical values.

Supplier-provided data covered approximately 380 products, about 25% of the catalog. The data quality varied: some suppliers used different measurement methodologies (for example, measuring lumen output at the LED chip rather than at the fixture, which overstates actual output by 15-30% depending on driver efficiency). Those discrepancies required validation before the data could be trusted. Supplier outreach took 4 weeks, plus a further 2 weeks of validation work, and produced coverage for only a quarter of the catalog.

Neither approach was going to deliver complete product data enrichment at scale within a reasonable timeframe.


The Solution

The team shifted to a two-stage process: scrape technical specification data from component supplier product pages to fill the missing attributes, then use MicroPIM’s AI description generator to produce customer-facing product descriptions from the enriched specification data.

Step 1: Identify the Data Sources

The catalog team compiled a mapping of each product to its primary component supplier. For LED panels, this was typically the LED chip and driver supplier. For track lighting, the optics manufacturer. For desk lamps, the LED module supplier. Each had a product page or specification sheet accessible on their public-facing website or B2B portal.

This mapping was done in a spreadsheet with two columns: product SKU and supplier page URL. For 1,500 products, this mapping exercise took approximately 3 days, working from the bill-of-materials records in the internal catalog. This was necessary groundwork — the scraping step required a URL to target per product.
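Because every later step depends on this sheet, it is worth checking it before any URL is submitted. The sketch below assumes the sheet is exported as CSV with hypothetical column names `sku` and `supplier_url`; the SKUs and the `supplier.example` URLs are invented sample data.

```python
import csv
import io
from urllib.parse import urlparse


def load_mapping(csv_text: str):
    """Parse a SKU -> supplier URL mapping and report rows that cannot be scraped."""
    mapping, problems = {}, []
    # Row numbering starts at 2 because row 1 is the header.
    for row_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        sku = (row.get("sku") or "").strip()
        url = (row.get("supplier_url") or "").strip()
        parsed = urlparse(url)
        if not sku:
            problems.append((row_no, "missing SKU"))
        elif sku in mapping:
            problems.append((row_no, f"duplicate SKU {sku}"))
        elif parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append((row_no, f"invalid URL for {sku}"))
        else:
            mapping[sku] = url
    return mapping, problems


sample = """sku,supplier_url
LP-6060-40W,https://supplier.example/panels/6060-40w
TR-SPOT-12W,
LP-6060-40W,https://supplier.example/panels/duplicate
DL-ARM-8W,https://supplier.example/desk/arm-8w
"""

mapping, problems = load_mapping(sample)
print(len(mapping), "usable rows")
for row_no, reason in problems:
    print(f"row {row_no}: {reason}")
```

Rows with a missing URL or a repeated SKU are reported rather than silently dropped, so they can be fixed in the sheet itself.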

[SCREENSHOT: MicroPIM URL import queue showing 150 component supplier product URLs staged for batch import, with SKU mapping column linking each URL to the corresponding catalog product]

Step 2: Scrape Specification Data via URL Import

With the URL mapping complete, the team used MicroPIM’s URL import feature to extract specification data from the component supplier pages. URLs were submitted in batches of 150. MicroPIM extracted the structured specification data from each page — typically from a specification table or structured data block — and returned the extracted fields for review.
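Splitting a URL list into fixed-size batches is simple to do before upload. A minimal sketch, where the 400 sample URLs and the `supplier.example` host are placeholders rather than data from the project:

```python
from typing import Iterator, List

BATCH_SIZE = 150  # batch size quoted in the text


def batched(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder URLs standing in for the real supplier pages.
urls = [f"https://supplier.example/product/{i}" for i in range(400)]
batches = list(batched(urls))

for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {len(batch)} URLs")
```

The final batch is simply shorter than the rest, so no URL is dropped or padded.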

The extraction covered lumens, color temperature, CRI, beam angle, IP rating, dimensions, and certifications (CE, RoHS, UL) for the products where these were present on the supplier page.

Extraction accuracy varied by supplier site quality. For suppliers with well-structured HTML specification tables, field extraction was direct and required minimal review. For suppliers with PDF-embedded or image-rendered specifications, manual transcription was still required for those specific products. Of the 1,500 products, approximately 1,180 (79%) had supplier pages with machine-readable specification data. The remaining 320 required a mix of PDF extraction and manual entry.
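The case study does not describe how MicroPIM performs extraction internally. As a rough illustration of why a well-structured HTML table counts as machine-readable, the sketch below reads label/value rows with Python's standard-library parser. The sample markup and class name are invented.

```python
from html.parser import HTMLParser


class SpecTableParser(HTMLParser):
    """Collect two-cell table rows as {label: value} pairs."""

    def __init__(self):
        super().__init__()
        self.specs = {}
        self._cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []          # start a fresh row
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data   # accumulate text inside the current cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and len(self._cells) == 2:
            label, value = (cell.strip() for cell in self._cells)
            if label and value:
                self.specs[label.lower()] = value


# Invented example of a supplier specification table.
page = """
<table class="specs">
  <tr><th>Luminous flux</th><td>4000 lm</td></tr>
  <tr><th>CCT</th><td>4000 K</td></tr>
  <tr><th>CRI</th><td>Ra 80</td></tr>
  <tr><th>IP rating</th><td>IP44</td></tr>
</table>
"""

parser = SpecTableParser()
parser.feed(page)
print(parser.specs)
```

A specification rendered as an image or embedded in a PDF offers no such rows to read, which is why those products fell back to manual entry.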

The 1,180 products with scrapeable supplier pages were processed in batches of 150 over 2 days. Total time including review and correction of extraction anomalies: 16 hours. The remaining 320 products took an additional 3 days of manual work.

Step 3: Validate and Map Extracted Specifications

Extracted specification values were reviewed for three types of errors before being committed to the catalog: unit inconsistencies (lumens vs. kilolumens), measurement methodology discrepancies (chip-level vs. fixture-level lumen output), and field mis-extractions where the scraper had pulled from the wrong table row.
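The unit check is the most mechanical of the three. A minimal sketch, assuming scraped values arrive as strings such as "4,000 lm" or "3.5 klm" and that commas are thousands separators (both assumptions, since the source does not show raw values):

```python
import re

# Accepts a number followed by "lm" or "klm", ignoring case and extra spaces.
_LUMEN_PATTERN = re.compile(r"^\s*([\d.,]+)\s*(klm|lm)\s*$", re.IGNORECASE)


def to_lumens(raw: str) -> float:
    """Normalize a scraped luminous flux string to lumens."""
    match = _LUMEN_PATTERN.match(raw)
    if not match:
        raise ValueError(f"unrecognized lumen value: {raw!r}")
    # Assumption: commas are thousands separators, not decimal commas.
    number = float(match.group(1).replace(",", ""))
    factor = 1000.0 if match.group(2).lower() == "klm" else 1.0
    return number * factor


for raw in ["4,000 lm", "3.5 klm", "850lm"]:
    print(raw, "->", to_lumens(raw))
```

Values that do not match the pattern raise an error instead of being guessed at, which routes them to manual review.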

The validation pass took approximately 8 hours, working through flagged anomalies in bulk using MicroPIM’s bulk edit view. Products where the extracted lumen value was more than 20% higher than expected based on comparable products of the same wattage were flagged for manual review — a simple heuristic that caught the majority of chip-level vs. fixture-level measurement discrepancies.
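The 20% heuristic can be sketched as follows. The text does not say how the "expected" value was calculated, so this version assumes it is the median lumen value of products with the same wattage. The SKUs and figures are invented.

```python
from collections import defaultdict
from statistics import median


def flag_lumen_anomalies(products, threshold=0.20):
    """Return SKUs whose lumens exceed the same-wattage median by more than `threshold`."""
    by_wattage = defaultdict(list)
    for product in products:
        by_wattage[product["watts"]].append(product["lumens"])

    flagged = []
    for product in products:
        # Assumption: "expected" is the median of comparable products.
        expected = median(by_wattage[product["watts"]])
        if product["lumens"] > expected * (1 + threshold):
            flagged.append(product["sku"])
    return flagged


products = [
    {"sku": "LP-A", "watts": 40, "lumens": 4000},
    {"sku": "LP-B", "watts": 40, "lumens": 4100},
    {"sku": "LP-C", "watts": 40, "lumens": 5200},  # plausibly a chip-level figure
    {"sku": "DL-A", "watts": 8, "lumens": 700},
    {"sku": "DL-B", "watts": 8, "lumens": 720},
]

print(flag_lumen_anomalies(products))
```

A median is used so that one inflated value does not drag the baseline up and hide itself. Wattage groups with only one or two products give a weak baseline and would need a different comparison.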

[SCREENSHOT: MicroPIM bulk edit view showing 1,500 products with newly populated specification attributes — lumens, CCT, CRI, beam angle, IP rating — with red flags on 47 products flagged for lumen value anomalies]

Step 4: Generate Product Descriptions with AI

With the technical specification attributes populated across the catalog, the team used MicroPIM’s AI description generator to produce customer-facing product descriptions from the enriched data.

The AI generator uses the product’s structured attribute data as its input. A prompt template was configured to instruct the AI to write descriptions that addressed three buyer segments present in the customer base: residential consumers (prioritizing warmth, aesthetics, and ease of installation), commercial buyers (prioritizing lumen output per watt, certifications, and longevity), and wholesale buyers (prioritizing specification completeness and compliance data).

The prompt template included instructions to reference the specific CCT value in Kelvin, to state CRI as a numeric value with the “Ra” designation, to mention IP rating using the standard IPxx format, and to include beam angle in degrees where relevant for directional fixtures.
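MicroPIM's template syntax is not shown in the case study. The sketch below only illustrates the idea of rendering structured attributes into formatting-specific instructions; the function name, attribute keys, and wording are all hypothetical.

```python
def build_description_prompt(attrs: dict) -> str:
    """Render structured attributes into an instruction prompt for a text model."""
    spec_lines = [
        f"- Color temperature: {attrs['cct_kelvin']} K",
        f"- Color rendering: Ra {attrs['cri']}",
        f"- Ingress protection: IP{attrs['ip_code']}",
        f"- Luminous flux: {attrs['lumens']} lm at {attrs['watts']} W",
    ]
    # Beam angle only applies to directional fixtures, so it is optional.
    if attrs.get("beam_angle_deg") is not None:
        spec_lines.append(f"- Beam angle: {attrs['beam_angle_deg']}°")

    return "\n".join([
        f"Write a product description for: {attrs['name']}.",
        "Address residential, commercial, and wholesale buyers in separate short sections.",
        "Quote every specification below exactly as written; do not invent values.",
        "Specifications:",
        *spec_lines,
    ])


prompt = build_description_prompt({
    "name": "Track spot 12 W",
    "cct_kelvin": 3000,
    "cri": 90,
    "ip_code": "20",
    "lumens": 1100,
    "watts": 12,
    "beam_angle_deg": 36,
})
print(prompt)
```

Formatting the values before they reach the model keeps units and notation (K, Ra, IPxx, degrees) consistent across all descriptions.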

Generating descriptions for all 1,500 products took approximately 4 hours, running in batches of 100. Each batch was reviewed by a catalog manager before being published. Minor edits were required on roughly 8% of generated descriptions — primarily for fixtures with unusual specifications where the AI’s phrasing needed adjustment for technical precision.


The Results

The enrichment and AI description generation project ran for 3.5 weeks in total. Hands-on work within that window came to roughly 3 days for the URL mapping groundwork, 2 days for batch scraping, 3 days of manual entry for the 320 products without machine-readable supplier pages, 1 day for validation, and 4 hours for AI description generation plus a review day. This compared to the estimated 234 working days the manual approach would have required.

Conversion rate: Product page conversion rate increased from 0.8% to 2.1% in the 60 days following full catalog enrichment. The improvement was consistent across product categories.

Pre-purchase specification queries: Support ticket volume for specification-related pre-purchase questions dropped from 40-60 per week to 8-12 per week, a reduction of approximately 80%. The product pages were now answering the questions that had previously required a support interaction.

Marketplace visibility: On the two marketplaces where the brand had listings, products appearing in filtered searches (using facets for color temperature, CRI, and IP rating) increased from 340 to 1,490 products after enrichment. The additional 1,150 products were now appearing in filtered result sets where they had previously been absent due to missing attribute values.

Wholesale account activity: Four wholesale accounts that had reduced request-for-quote activity over the prior quarter cited improved product data depth in their follow-up communications after the catalog update. RFQ volume from wholesale accounts increased 34% in the 6 weeks following enrichment.

Data enrichment pipeline: The URL mapping methodology and MicroPIM scraping workflow became the standard process for new product onboarding. New products are now enriched within 24 hours of catalog entry rather than being added with skeleton records.


Key Takeaways

  • Manual product data enrichment at scale is not a resourcing problem; it is an architectural problem. Each researcher fills records at 5-8 products per day, so adding headcount raises cost in step with throughput. The workflow needs to change, not the headcount.
  • Component supplier websites are the most accurate source of technical specification data for manufactured products. The data exists; it just needs to be extracted and mapped to your catalog rather than retrieved one field at a time.
  • URL-based scraping via MicroPIM covers the majority of a catalog where supplier pages are well-structured HTML. The remainder — PDFs, image-rendered specs — still requires manual work, but that remainder is typically 20-30% of products rather than 100%.
  • AI description generation from structured specification data is reliable when the attribute schema is complete. The quality problem in AI-generated product content is almost always insufficient input data, not model capability.
  • The operational payoff from data enrichment comes from multiple directions simultaneously: conversion rate, support volume reduction, marketplace visibility, and wholesale account engagement. Measuring only one of these understates the total return.

If your catalog has products with functional specification data sitting on supplier websites that has never been collected, the product data enrichment gap is likely costing you more in conversion rate and support overhead than a structured scraping project would cost to execute. Set up a free account at app.micropim.net/register to test MicroPIM’s URL import on your first batch of supplier pages — the data extraction and enrichment workflow is available from day one.



Frequently Asked Questions

What types of supplier pages does MicroPIM’s URL import handle best for product data enrichment?

Pages with HTML-rendered specification tables or structured data blocks (schema.org/Product markup) produce the most reliable extraction results. Standard ecommerce product pages from major component suppliers such as Philips, Osram, or Cree, and from LED module suppliers with similarly formatted pages, typically fall into this category. Pages where specifications are embedded in PDFs, rendered as images, or loaded via JavaScript after page load require additional handling: either MicroPIM’s PDF import functionality or manual transcription for those specific products. In practice, supplier pages built on standard ecommerce platforms (Magento, WooCommerce, Shopify) scrape cleanly in the majority of cases.
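For readers unfamiliar with schema.org/Product markup, the sketch below shows what such a structured data block looks like and how its values can be read. It illustrates the markup only and says nothing about MicroPIM's own extraction code. The sample page is invented, and it assumes the supplier publishes specifications under the schema.org `additionalProperty` field.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._buffer = None


def product_specs(html: str) -> dict:
    """Return name/value pairs from the first schema.org Product block, if any."""
    parser = JsonLdParser()
    parser.feed(html)
    for block in parser.blocks:
        if isinstance(block, dict) and block.get("@type") == "Product":
            return {p["name"]: p["value"] for p in block.get("additionalProperty", [])}
    return {}


# Invented supplier page fragment.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "LED module 12 W",
 "additionalProperty": [
   {"@type": "PropertyValue", "name": "CRI", "value": 90},
   {"@type": "PropertyValue", "name": "CCT", "value": "3000 K"}
 ]}
</script>
</head><body></body></html>
"""

print(product_specs(page))
```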

How do you handle specification data from multiple suppliers for a single product?

When a product uses components from two or more suppliers (for example, an LED module from one supplier and a driver/ballast from another), the enrichment process requires pulling data from both pages and combining the fields. In the lighting manufacturer’s workflow, this applied mainly to track lighting systems where the fixture and the LED module had separate supplier pages. MicroPIM allows you to import from multiple URLs and merge the extracted data into a single product record during the review step. The merge requires manual judgment for a small number of products, but the majority of attributes are non-overlapping (the fixture supplier provides physical dimensions and IP rating; the LED module supplier provides lumens, CRI, and CCT).
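The merge logic described here can be sketched in plain Python. This is an illustration of the idea, not MicroPIM's merge feature: non-overlapping fields are combined automatically, and any field where two sources disagree is set aside for manual judgment. Source names and values are invented.

```python
def merge_specs(*sources):
    """Merge (source_name, specs) pairs; return (merged, conflicts).

    Fields with differing values across sources are excluded from `merged`
    and listed in `conflicts` with every competing value.
    """
    merged, conflicts = {}, {}
    for source_name, specs in sources:
        for field, value in specs.items():
            if field in conflicts:
                conflicts[field].append((source_name, value))
            elif field in merged and merged[field][1] != value:
                # Move the earlier value out of `merged` and record both.
                conflicts[field] = [merged.pop(field), (source_name, value)]
            elif field not in merged:
                merged[field] = (source_name, value)
    return {field: value for field, (_, value) in merged.items()}, conflicts


fixture_page = ("fixture", {"ip_rating": "IP20", "length_mm": 180, "cct_kelvin": 3000})
module_page = ("led_module", {"lumens": 1100, "cri": 90, "cct_kelvin": 2700})

merged, conflicts = merge_specs(fixture_page, module_page)
print("merged:", merged)
print("needs review:", conflicts)
```

Keeping the source name with each conflicting value lets the reviewer apply a rule such as "trust the LED module supplier for photometric fields".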

Can the AI description generator write for multiple buyer audiences from the same product data?

Yes. The prompt template configuration in MicroPIM’s AI generator allows you to specify audience context, tone, and the specific attribute fields to emphasize. In the lighting manufacturer’s case, a single prompt template was configured to address residential, commercial, and wholesale buyer contexts within a single product description by structuring the output into distinct sections. For storefronts with distinct buyer segments, you can also configure separate AI prompt templates per channel — one for the B2C storefront and one for the B2B wholesale catalog — using the same underlying enriched attribute data as input.

How do you maintain data enrichment quality as new products are added to the catalog?

The key is making enrichment a step in the product onboarding workflow rather than a retrospective project. Once the URL mapping convention is established (each new product record includes the component supplier page URL at the time of onboarding), the scraping and AI generation steps can be run on new products in batches of 10-20 within the same session. The lighting manufacturer runs a weekly enrichment batch every Monday for products added in the prior week. This keeps the catalog at near-complete attribute coverage without the need for periodic catch-up projects.
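Selecting the weekly batch is a date filter over new records. A minimal sketch, assuming each product record carries a creation date and an optional supplier URL (the field names and sample records are hypothetical):

```python
from datetime import date, timedelta


def prior_week_window(run_date: date):
    """Return (Monday, Sunday) of the calendar week before the week of run_date."""
    this_monday = run_date - timedelta(days=run_date.weekday())
    return this_monday - timedelta(days=7), this_monday - timedelta(days=1)


def select_for_enrichment(products, run_date: date):
    """Pick SKUs created in the prior week that already have a supplier URL."""
    start, end = prior_week_window(run_date)
    return [
        p["sku"]
        for p in products
        if start <= p["created"] <= end and p.get("supplier_url")
    ]


products = [
    {"sku": "NEW-1", "created": date(2024, 6, 4), "supplier_url": "https://supplier.example/a"},
    {"sku": "NEW-2", "created": date(2024, 6, 9), "supplier_url": ""},
    {"sku": "NEW-3", "created": date(2024, 6, 10), "supplier_url": "https://supplier.example/c"},
    {"sku": "OLD-1", "created": date(2024, 5, 31), "supplier_url": "https://supplier.example/d"},
]

# 10 June 2024 is a Monday, so the window is 3 June to 9 June.
print(select_for_enrichment(products, date(2024, 6, 10)))
```

Records without a supplier URL are skipped rather than failing the run, which makes the missing-URL cases easy to list and chase separately.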


Written by

Andrei M.

Founder MicroPIM

Entrepreneur and founder of MicroPIM, passionate about helping e-commerce businesses scale through smarter product data management.

"Your most unhappy customers are your greatest source of learning." — Bill Gates
