
Software Engineer, Web Crawling
- Company name
- NewsBreak
- Annual base salary
- $165,000 — $250,000
- Location
On-site from
- Posted on SalaryPine
About NewsBreak
NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, our mission is to foster safer, more vibrant, and authentically connected lives. Through robust collaborations with thousands of local publishers and businesses across the nation, NewsBreak is revolutionizing how a new wave of readers access and engage with essential, locally sourced content & information.
Since our inception in 2015, our trajectory has been nothing short of remarkable. We proudly stand as the nation’s premier local news app.
We are a Series C unicorn startup headquartered in the tech hub of Mountain View, California, with additional offices in New York City and Seattle. For more information, visit www.newsbreak.com/about
About the Role
We're seeking a founding engineer to lead the design and development of our next-generation web crawling and dynamic indexing infrastructure. Your mission will be to create an adaptive, real-time crawling system that not only integrates seamlessly with external search providers but rapidly evolves based on user queries and interactions, continuously expanding and refining NewsBreak’s proprietary content index and recommendation knowledge graph.
This is far beyond a traditional web crawling role. You’ll architect and implement sophisticated crawling strategies informed directly by real user search patterns, enabling our AI agents to provide fresh, accurate, and hyper-localized responses. You will build infrastructure capable of dynamically responding to user queries, proactively crawling and indexing content within minutes, rather than days or weeks.
Your work will directly empower our AI-driven question-answering and recommendation systems, creating a closed-loop feedback mechanism where user queries trigger real-time crawling and indexing tasks, continuously improving our content quality and comprehensiveness. This is an opportunity to rethink web crawling as a foundational intelligence layer, rather than a static data collection tool.
Responsibilities
- Design, develop, and deploy a real-time, adaptive web crawling and indexing infrastructure capable of proactively responding to user-generated queries and integrating external search results.
- Architect dynamic crawling strategies that rapidly prioritize, fetch, parse, and index web pages based on real-time demand signals.
- Implement scalable crawling systems that support millions of URLs per day with minutes-level latency from discovery to indexing.
- Collaborate closely with AI, search, and recommendation teams to build a tightly coupled feedback loop between user queries, crawling decisions, and content indexing.
- Own the full lifecycle of the crawler infrastructure, from discovery algorithms, URL state management, garbage collection, deduplication, and storage optimization through to downstream indexing integration.
- Optimize crawler performance, reliability, and resource utilization through rigorous profiling, monitoring, and tooling.
- Mentor junior engineers and help build out a high-performing infrastructure team with deep expertise in intelligent web crawling systems.
Requirements
- Bachelor's degree or higher in Computer Science, Engineering, or a related technical field.
- 5+ years of proven experience designing and operating large-scale web crawling and indexing infrastructure at major technology companies or innovative startups.
- Extensive experience with distributed systems, crawler frontier design, real-time URL prioritization, and high-QPS crawling infrastructure.
- Strong system-level coding skills in Python, Go, or C++.
- Demonstrated ability to integrate web crawling systems with downstream indexing, search engines, or NLP/AI pipelines.
- Solid understanding of web technologies, web protocols (HTTP/HTTPS), JavaScript rendering, and anti-scraping countermeasures.
- Experience building responsive crawling systems driven by real-time user signals or query logs is a strong plus.
- Excellent problem-solving, analytical, and communication skills, with a proactive attitude towards system improvement and innovation.
Benefits
We offer a competitive benefits package:
- Health, dental, and vision care for you and your family (100% coverage for employees)
- Top-tier 401(k) plan with company matching
- Paid time off and paid holidays
- FSA, HSA, and commuter benefits programs
- Team activity budget