· Andrei M. · Data Quality · 11 min read
Case Study: How a Beauty Brand Scraped Competitor Specs to Rebuild Their Product Data
A beauty brand had incomplete product specifications that were losing them search visibility. They used web scraping to collect competitor data as a benchmark, then rebuilt their own product records.
A mid-sized skincare brand with 340 active SKUs was watching their organic search rankings decline over six months while their competitors’ product pages climbed into the top three positions for almost every category keyword. The cause was not their marketing budget or their domain authority — it was their product data. Specifically, it was the absence of it.
The Challenge
The brand’s product catalog had been built from supplier-provided data sheets, which gave them a functional starting point but not a competitive one. Their average product page contained a product name, a 40-word description, a price, and a weight. That was the full extent of it.
Their competitors — established skincare retailers and DTC brands — were ranking with product pages that contained:
- Full ingredient lists (INCI format) with percentages where applicable
- Step-by-step usage instructions (4-6 steps per product)
- Skin type compatibility flags
- Certification badges (cruelty-free, vegan, dermatologist-tested)
- SPF rating and water resistance duration for sunscreen products
- Fragrance-free, paraben-free, and sulfate-free attribute flags
- Size variants with volume in ml and oz
- Country of manufacture
- Shelf life and period after opening (PAO)
A structured content audit comparing 40 competitor product pages against their own quantified the gap: competitors averaged 18 distinct attributes per product. The brand averaged 4.
Google Search Console data made the business impact concrete. Their impressions for ingredient-related search queries — terms like “niacinamide serum fragrance free” or “vitamin C moisturizer SPF 30 vegan” — had dropped 61% over the prior 8 months. These were exactly the specification-driven queries that their bare product data could not serve.
The conversion rate problem was compounding the visibility problem. When shoppers did land on the product pages from branded search or direct traffic, the bounce rate was 74%. Exit surveys attributed the majority of bounces to “not enough information to make a purchase decision.”
What They Tried First
The first response was to hire a content writer to manually research and rewrite product descriptions. This produced well-written copy for 12 products over three weeks — about 4 products per week at roughly 3 hours per product, much of that time spent researching ingredient effects, certifications, and usage instructions.
At that pace, completing the full catalog of 340 products would take 85 weeks. New products were being added to the catalog at a rate of 15-20 per month. The writing project was falling behind before it had meaningfully started.
The second attempt was to reach back to their suppliers and request enriched data sheets. Responses came from about 30% of suppliers, and of those responses, roughly half provided data in a usable format. The rest sent PDFs of printed product packaging that required manual transcription. After 6 weeks of supplier outreach, they had enriched data for 47 products — 14% of their catalog.
Neither approach was going to close the gap within a timeframe that mattered for the current search cycle.
The Solution
The insight that changed the approach came from their ecommerce manager, who noted that their competitors were already doing the research work. Every competitor product page had the full attribute set they were trying to build. The problem was not that the data did not exist — it was that the data was sitting in competitor catalogs rather than in their own.
The strategy shifted to using competitor product pages as a structural benchmark. They were not looking to copy descriptions or steal content; they were looking to understand the complete attribute schema that well-ranked beauty product pages contained, and then use that schema to structure their own data enrichment project.
MicroPIM’s URL import feature became the operational tool for this approach.
Step 1: Identify the Structural Benchmark
The team selected three direct competitors whose product pages consistently ranked in positions 1-5 for their category keywords. They picked one product per competitor from three categories — serum, moisturizer, and SPF — giving them 9 competitor product pages to analyze.
Using MicroPIM’s URL import, they imported each competitor page. The import extracted structured data from the page: product name, description fields, specification tables, attribute lists, and content sections. The goal was not to import the content as catalog data — it was to use the extracted structure to build a comprehensive attribute schema.
[SCREENSHOT: MicroPIM URL import interface showing a competitor skincare product URL being submitted for data extraction]
From 9 pages across 3 competitors, the analysis produced a consolidated attribute list of 22 distinct specification fields that appeared consistently across well-ranked beauty product pages. These were the fields their own product pages were missing.
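MicroPIM performs this extraction and consolidation internally, but the underlying idea can be sketched in a few lines. This is an illustrative approximation, not MicroPIM's actual implementation: it assumes spec fields live in `<th>` cells of an HTML table, parses each page, and keeps only attributes that recur across multiple pages.

```python
from collections import Counter
from html.parser import HTMLParser

class SpecFieldParser(HTMLParser):
    """Collect the text of <th> cells inside a product specification table."""
    def __init__(self):
        super().__init__()
        self.in_th = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.in_th = True

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False

    def handle_data(self, data):
        if self.in_th and data.strip():
            self.fields.append(data.strip().lower())

def extract_fields(html):
    """Return the set of spec-field labels found on one page."""
    parser = SpecFieldParser()
    parser.feed(html)
    return set(parser.fields)

def consolidate(pages, min_pages=2):
    """Keep only attributes that appear on at least `min_pages` pages."""
    counts = Counter()
    for html in pages:
        counts.update(extract_fields(html))
    return sorted(f for f, n in counts.items() if n >= min_pages)

# Three simplified stand-ins for fetched competitor pages.
pages = [
    "<table><tr><th>SPF</th><td>30</td></tr><tr><th>Skin Type</th><td>Oily</td></tr></table>",
    "<table><tr><th>SPF</th><td>50</td></tr><tr><th>Vegan</th><td>Yes</td></tr></table>",
    "<table><tr><th>Skin Type</th><td>Dry</td></tr><tr><th>SPF</th><td>15</td></tr></table>",
]
print(consolidate(pages))  # ['skin type', 'spf']
```

The `min_pages` threshold is what filters out one-off attributes that appear on a single competitor page, leaving only the fields that consistently define the category.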
Step 2: Rebuild the Attribute Schema in MicroPIM
With the 22-field benchmark defined, the team rebuilt their product attribute schema in MicroPIM to support all 22 fields. Several of these were structured attributes — skin type was a multi-select field, certifications were boolean flags, SPF was a numeric field with a defined range, fragrance-free was a boolean. Others were text fields for ingredients and usage instructions.
The attribute schema rebuild took about 4 hours and resulted in a product data structure that matched the richness of the best-performing pages in their category.
[SCREENSHOT: MicroPIM attributes panel showing 22 skincare-specific attribute fields configured with their data types and validation rules]
Step 3: Use Competitor Data as a Research Reference, Then Write Original Content
For each product in their catalog, the team used MicroPIM’s URL import to pull the competitor page for the closest equivalent product. The extracted data provided:
- The standard ingredient format used in their category (INCI naming)
- The typical structure of usage instructions (number of steps, action-verb format)
- Which certifications were being claimed and how they were displayed
- The attribute value vocabulary — for example, whether “sensitive skin” or “sensitive/reactive skin” was the standard term for SEO purposes
Armed with this benchmark data, they wrote original content for each product using the competitor pages as a structural reference. The actual descriptions, ingredient lists, and instructions were written from their own product documentation, manufacturer specifications, and lab sheets — the competitor data was used only to establish completeness standards, not to copy.
The benchmark data cut research time per product from 3 hours to under 45 minutes. They knew exactly what fields to fill, what format the data should be in, and what level of detail was expected in each field because the competitor pages had already demonstrated the standard.
Step 4: Import the Enriched Data at Scale
Once the enriched data was prepared for the first 80 products in a structured spreadsheet, they used MicroPIM’s CSV import to push all 80 records into the catalog in a single operation. Field mapping from the spreadsheet columns to the new attribute schema was configured once and saved as a reusable template.
[SCREENSHOT: MicroPIM CSV import field mapping screen showing spreadsheet columns being mapped to the 22 skincare attribute fields]
Subsequent batches of 40-60 products were imported using the same template. The field mapping did not need to be set up again for each batch — the saved template applied it automatically.
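The reusable-template idea is essentially a saved dictionary mapping spreadsheet column headers to schema field names, applied to every row of each batch. The sketch below illustrates that pattern with Python's standard `csv` module; the column names and template format are illustrative, not MicroPIM's actual template structure.

```python
import csv
import io

# Saved once, reused for every batch: spreadsheet column -> schema field.
TEMPLATE = {
    "Product Name": "name",
    "SPF Rating": "spf",
    "Fragrance Free": "fragrance_free",
}

def import_batch(csv_text, template):
    """Apply the saved field mapping to every row of a CSV batch."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({field: row[col] for col, field in template.items()})
    return records

batch = (
    "Product Name,SPF Rating,Fragrance Free\n"
    "Daily Defense Cream,30,yes\n"
    "Night Repair Serum,0,yes\n"
)
records = import_batch(batch, TEMPLATE)
print(records[0])  # {'name': 'Daily Defense Cream', 'spf': '30', 'fragrance_free': 'yes'}
```

Because the mapping lives in the template rather than in the spreadsheet, the next batch of 40-60 products only needs to keep the same column headers for the import to work unchanged.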
The Results
The catalog enrichment project ran for 14 weeks, enriching all 340 existing products and incorporating the new 22-attribute schema into the product onboarding process for new additions.
Search visibility recovery: Google Search Console impressions for ingredient and specification-driven queries increased 83% in the 90 days following the enrichment of the first batch of products. By the end of the 14-week project, impressions were up 147% year-over-year for their top 50 category keywords.
Ranking improvements: 34 product pages that had been unranked for their target keywords moved into positions 1-20 within 60 days of enrichment. 12 pages reached the top 5 positions for their primary keywords, compared to zero at the start of the project.
Bounce rate reduction: Product page bounce rate dropped from 74% to 51% within 60 days of the initial batch going live. Post-enrichment exit surveys showed “not enough information” responses fell from 67% of bounces to 18%.
Time efficiency: The benchmark-informed writing workflow averaged 42 minutes per product. The previous approach had averaged 3 hours per product. The 14-week project used approximately 240 person-hours of content work — the original manual approach would have required approximately 1,020 person-hours for the same output.
Revenue attribution: In the 90 days following full catalog enrichment, revenue from organic search increased 31% compared to the same 90-day period the previous year, despite overall site traffic growing at only 12% year-over-year. The conversion rate improvement from better-specified product pages was contributing as much to the revenue gain as the traffic increase.
Key Takeaways
- Thin product data is a search visibility problem before it is a content quality problem. Incomplete attribute schemas mean product pages cannot rank for specification-driven queries regardless of how well-written the descriptions are.
- Competitor product pages are a legitimate structural benchmark for product attribute schemas. Using them to identify the completeness standard for your category is different from copying their content.
- Product data scraping via URL import works at the research and benchmarking level. Importing competitor pages to understand what fields a well-ranked product page contains gives you the schema blueprint without requiring you to use their content.
- The operational constraint is usually not the writing itself — it is knowing what to write and to what standard. A clear benchmark cuts per-product enrichment time significantly more than better writing tools do.
- Connecting the enrichment work to measurable search and conversion outcomes lets you prioritize the product pages that matter most first, rather than working through the catalog alphabetically or by recency.
If your product catalog has more than 50 products and your average product page has fewer than 10 distinct specification attributes, the search and conversion opportunity from enrichment is likely larger than you expect. Start with a sample of 10 products in your top-performing category, enrich them to match the depth of the best-ranked pages in your space, and measure the impact before committing to the full catalog project.
Start a free 14-day trial at app.micropim.net/register — MicroPIM’s URL import and CSV import tools are available from day one, no credit card required.
Related Reading
- Every Import Format Explained — A complete breakdown of one-link imports, CSV, and API feed methods
- Feed Your Store: One-Link Product Import — How MicroPIM’s URL import feature works in detail
- Case Study: Garden Tools Supplier Catalog Scraping — A parallel approach using product data extraction from supplier pages
Frequently Asked Questions
Is using competitor pages for product data scraping legally permissible?
Using publicly accessible web pages to understand content structure, attribute formats, and schema conventions is a standard research practice in ecommerce. The key distinction is between scraping competitor content for reuse — which raises copyright and fair use questions — and using competitor pages to benchmark the completeness and format of your own original content. In the approach described here, competitor pages were used to identify what fields to fill and what format to use; all actual product data was written from the brand’s own product documentation and manufacturer materials. Consult your legal team regarding the terms of service for specific competitor sites and applicable regulations in your jurisdiction.
How many competitor pages do you need to scrape to build a reliable schema benchmark?
In practice, 6-12 pages across 2-4 competitors provides sufficient pattern recognition for most product categories. You are looking for the attributes that appear consistently across multiple well-ranked pages — those are the fields that matter for your category. One-off attributes that appear on a single competitor page without appearing elsewhere are less likely to be structurally important. Diminishing returns set in quickly beyond about 15 pages for a single category.
What happens to competitor data imported into MicroPIM?
URL imports create catalog records in MicroPIM based on the extracted page data. In the workflow described here, those imported records were used as research references and then discarded — they were not published to any storefront or integrated into the brand’s live catalog. MicroPIM gives you full control over which imported records are published, staged, or deleted, so using URL imports for research purposes without those records reaching your storefront is straightforward.
Will enriching product attributes alone improve search rankings, or are other factors required?
Attribute enrichment addresses one of the primary structural reasons product pages fail to rank for specification-driven queries: the content is not present to match the query. However, it is one factor among several. Page load speed, mobile usability, internal linking, domain authority, and structured data markup (schema.org/Product) all contribute to ranking performance. The results described in this case study reflect a combination of attribute enrichment and existing baseline SEO health. Enrichment is unlikely to rescue a product page that has severe technical SEO problems, but for a technically sound site with thin content, it is frequently the highest-leverage intervention available.
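For teams combining enrichment with structured data markup, the enriched attributes map naturally onto schema.org/Product JSON-LD. The example below is a hypothetical product with made-up values, showing one common way to carry spec attributes via `additionalProperty`; check Google's structured data guidelines for which properties are eligible for rich results.

```python
import json

# Illustrative schema.org/Product markup for an enriched product page.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Daily Defense Cream SPF 30",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "SPF", "value": 30},
        {"@type": "PropertyValue", "name": "Fragrance-free", "value": True},
        {"@type": "PropertyValue", "name": "Skin type", "value": "oily, combination"},
    ],
}

# Serialized JSON-LD, ready to embed in a <script type="application/ld+json"> tag.
jsonld = json.dumps(product, indent=2)
print(jsonld)
```
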

