
Software Engineer, Web Crawling

Company name: NewsBreak
Annual base salary: $165,000 to $250,000

Posted on SalaryPine

About NewsBreak

NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, our mission is to foster safer, more vibrant, and authentically connected lives. Through robust collaborations with thousands of local publishers and businesses across the nation, NewsBreak is revolutionizing how a new wave of readers accesses and engages with essential, locally sourced content and information.

Since our inception in 2015, our trajectory has been nothing short of remarkable. We proudly stand as the nation’s premier local news app.

As a Series C unicorn startup, we are headquartered in the tech hub of Mountain View, California, with other offices in New York City and Seattle. For more information, visit www.newsbreak.com/about

About the Role

We're seeking a founding engineer to lead the design and development of our next-generation web crawling and dynamic indexing infrastructure. Your mission will be to create an adaptive, real-time crawling system that not only integrates seamlessly with external search providers but also evolves rapidly based on user queries and interactions, continuously expanding and refining NewsBreak’s proprietary content index and recommendation knowledge graph.

This is far beyond a traditional web crawling role. You’ll architect and implement sophisticated crawling strategies informed directly by real user search patterns, enabling our AI agents to provide fresh, accurate, and hyper-localized responses. You will build infrastructure capable of dynamically responding to user queries, proactively crawling and indexing content within minutes, rather than days or weeks.

Your work will directly empower our AI-driven question-answering and recommendation systems, creating a closed-loop feedback mechanism where user queries trigger real-time crawling and indexing tasks, continuously improving our content quality and comprehensiveness. This is an opportunity to rethink web crawling as a foundational intelligence layer, rather than a static data collection tool.

Responsibilities

  • Design, develop, and deploy a real-time, adaptive web crawling and indexing infrastructure capable of proactively responding to user-generated queries and integrating external search results.
  • Architect dynamic crawling strategies that rapidly prioritize, fetch, parse, and index web pages based on real-time demand signals.
  • Implement scalable crawling systems supporting millions of URLs per day with low latency (minutes-level) from discovery to indexing.
  • Collaborate closely with AI, search, and recommendation teams to build a tightly coupled feedback loop between user queries, crawling decisions, and content indexing.
  • Own the full lifecycle of the crawler infrastructure, from discovery algorithms, URL state management, garbage collection, deduplication, and storage optimization, to downstream indexing integration.
  • Optimize crawler performance, reliability, and resource utilization through rigorous profiling, monitoring, and tooling.
  • Mentor junior engineers and help build out a high-performing infrastructure team with deep expertise in intelligent web crawling systems.

Requirements

  • Bachelor's degree or higher in Computer Science, Engineering, or a related technical field.
  • 5+ years of proven experience designing and operating large-scale web crawling and indexing infrastructure at major technology companies or innovative startups.
  • Extensive experience with distributed systems, crawler frontier design, real-time URL prioritization, and high-QPS crawling infrastructure.
  • Strong system-level coding skills in Python, Go, or C++.
  • Demonstrated ability to integrate web crawling systems with downstream indexing, search engines, or NLP/AI pipelines.
  • Solid understanding of web technologies, web protocols (HTTP/HTTPS), JavaScript rendering, and anti-scraping countermeasures.
  • Experience building responsive crawling systems driven by real-time user signals or query logs is a strong plus.
  • Excellent problem-solving, analytical, and communication skills, with a proactive attitude towards system improvement and innovation.

Benefits

We offer a competitive benefits package:

  • Health, dental, and vision care for you and your family (100% coverage for the employee)
  • Top-tier 401(k) plan with company matching
  • Paid time off and paid holidays
  • FSA, HSA and commuter benefits programs
  • Team activity budget

The US base salary range for this full-time position is listed below. Pay may vary based on a number of factors, including job-related skills, level, experience, geographic location, and relevant education or training. At NewsBreak, we design our overall rewards package to attract top talent. Depending on the position, the role may also be eligible for a discretionary bonus and options. Your recruiter can share more details during the hiring process.

Annual Base Pay Range: $165,000 to $250,000 USD

