About the role:
You will be responsible for expanding and optimizing our data and data pipeline architecture while ensuring data quality with rigorous testing of data and data-processing code. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up.
Signal Automotive has created a streamlined end-to-end workflow platform and inventory management tool to scale its wholesale business. We believe that rich, high-quality data is a key component in transforming the automotive wholesale industry. We are focused on using AI/ML to better understand the complex market dynamics in the US auto industry and to share the learned insights with our customers so they can operate more efficiently. We have made solid progress here, and a common refrain from customers is ‘Signal knows my business better than I do’.
You will work with truly awesome Business and Technology teams and will play a key role in surfacing the information hidden in vast amounts of data. Your primary focus will be on transforming tables and streams of data into well-designed data models in a variety of database systems, including Google BigQuery, PostgreSQL, and Elasticsearch. In addition to typical SQL queries and analytical functions, these transformations include the application of machine-learning models to enrich the data. Candidates with an interest in machine learning are therefore well suited to this opportunity.
As a Data Engineer you will:
- Create and maintain optimal data pipeline architecture
- Assemble large, complex data sets that meet functional and non-functional business requirements
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Python, and Google Cloud big-data technologies, e.g., BigQuery, GCS, and Dataflow/Apache Beam
- Contribute to building and maintaining a system for continuous integration and deployment of changes to data systems
- Build analytics tools that utilize the data pipeline to provide actionable insights
- Work with stakeholders including the product, operations, and development teams to assist with data-related technical issues and support their data infrastructure needs
- Create data management tools that help analysts and data scientists build and optimize our product into an innovative industry leader
- Work with data and analytics experts to improve the functionality of our data systems
Preferred candidates have:
- Expert knowledge of SQL and experience optimizing SQL queries in multiple databases
- Experience building and optimizing big-data pipelines, architectures, and data sets
- Experience delivering real-time analytics and machine learning predictions
- A successful history of manipulating, processing, and extracting value from large, disconnected data sets
- Experience maintaining data quality and data equivalence between ML-model training systems and ML-model serving systems
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
- Proficiency with Python and basic libraries for machine learning
- Experience with message queuing technologies such as RabbitMQ, Kafka, and/or Pulsar
- Experience with horizontally scalable data warehouse technologies, e.g., Google BigQuery, Hadoop+Hive, Snowflake
- Experience with various SQL and NoSQL databases, including PostgreSQL and Elasticsearch
- Experience with data pipeline and workflow management tools, e.g., Airflow, Dagster, Prefect, Cloud Composer
- Experience with stream-processing systems, e.g., Spark Streaming, Flink, Apache Beam, Kafka Streams
- Strong understanding of data structures and algorithms
- Familiarity with Linux, Docker, and Kubernetes
- Experience with the lifecycle of machine-learning model development
- Familiarity with the processes and technical vocabulary of Agile software teams
- Excellent teamwork skills
- University degree in Information/Computer Science or a related field
- Very comfortable with English