Data Engineer Resume

Jersey City, NJ

SUMMARY:

  • Technology professional with 7+ years of experience providing data-driven, highly accurate, and results-oriented solutions to challenging business problems.
  • Adept at collecting, analyzing, and interpreting large data sets and at developing new forecasting models.
  • Experience in analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for Data Mining, Data Cleansing, Data Munging, and Machine Learning.
  • Good experience in developing web applications implementing the Model-View-Controller (MVC) architecture using Python with the Django and Flask web application frameworks.
  • Strong experience in Python programming to build APIs, web services, machine learning models, data transformations, and data engineering pipelines.
  • Extensive experience in data manipulation using Python for loading and extraction, and with Python libraries such as NumPy, SciPy, Matplotlib, and Pandas for data analysis and numerical computations (see the sketch after this list).
  • Exposure to building web applications using Python's Flask framework.
  • Hands-on experience with Spark using Scala and PySpark.
  • Sound knowledge of Data Quality and Data Governance practices and processes.
  • Expertise in developing SSIS packages for data migration from sources such as flat files, Excel, Oracle, Sybase, DB2, and MS Access into SQL Server.
  • Hands-on experience working with Amazon Web Services (AWS), using EC2, S3, IAM, Lambda, and RDS for data processing.
  • Good experience in data pre-processing, data analysis, and machine learning to gain insights into structured and unstructured data.
  • Experience in generating and communicating insights through visualizations, reports, and dashboards using packages/tools such as Matplotlib, Pandas, Power BI, Tableau, and NetworkX.
  • Extensively used Python and SQL to build ETL pipelines, automation, and data science tooling in Linux and Windows environments, while writing clean, test-covered, modular code.
  • Strong ability to conduct qualitative and quantitative analysis for effective data-driven decision making.
  • Experience in developing web-based applications using Python, Django, JavaScript, C++, HTML, XML, CSS, jQuery, RESTful web services, and AJAX.
  • Hands-on experience in developing web applications, RESTful web services, and APIs using Python and Flask.
  • Good experience in handling database issues and connections with SQL and NoSQL databases such as SQL Server, Postgres, Oracle, and MongoDB by installing and configuring various packages in Python.
  • Experience working with RESTful web services and invoking them using Postman.
  • Hands-on experience with version control systems such as Git.
  • Proficient in building reports and dashboards in Tableau (BI Tool).
  • Expertise in a broad range of technologies, including business process tools such as Microsoft Project, MS Excel, MS Access, and MS Visio.
  • Experience in building applications in different operating systems like Windows and Linux.
  • Excellent interpersonal and communication skills, efficient time management and organization skills, ability to handle multiple tasks and work well in a team environment.
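
A minimal sketch of the kind of Pandas/NumPy data analysis described in the list above; the file name, columns, and outlier threshold are illustrative assumptions, not details from an actual engagement:

    import numpy as np
    import pandas as pd

    # Load a raw extract (hypothetical file and columns).
    df = pd.read_csv("transactions.csv", parse_dates=["trade_date"])

    # Cleansing: drop exact duplicates and rows missing key fields.
    df = df.drop_duplicates().dropna(subset=["account_id", "amount"])

    # Numerical computation: flag amounts more than 3 standard deviations from the mean.
    z = (df["amount"] - df["amount"].mean()) / df["amount"].std()
    df["is_outlier"] = np.abs(z) > 3

    # Aggregate by month for reporting.
    summary = df.groupby(df["trade_date"].dt.to_period("M"))["amount"].agg(["count", "sum", "mean"])
    print(summary.head())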

TECHNICAL SKILLS:

Data Engineering: Data Analysis, Scripting, ETL & Data Pipelines, Machine Learning, Text Analytics & NLP, Statistics, Deep Learning, Mathematical Modelling

Languages: Python, Scala, R, C++, Java

Python Libraries: Matplotlib, Seaborn, SciPy, NumPy, NetworkX

Cloud Services: AWS (VPC creation, S3 buckets, Boto3, EC2 instances, RDS instances, CloudFront, IAM, AWS CLI, security groups, S3 security & encryption, S3 object lifecycle management, Route 53, DynamoDB, DNS, Redshift, Lambda, Step Functions, bootstrap scripts), Kubernetes, Kafka, Docker

Frameworks & API Tools: Flask, Postman

Visualization Tools: Power BI, Tableau

Databases: SQL (SQL Server, Postgres, Oracle), NoSQL (MongoDB)

Other Tools: Airflow, Linux, Git, REST APIs (FastAPI), Spark/PySpark, Hadoop/HDFS

PROFESSIONAL EXPERIENCE:

Confidential, Jersey City, NJ

Data Engineer

Responsibilities:

  • Worked on infrastructure deployment on AWS using EC2 (virtual cloud servers), RDS (managed relational database service), VPC (Virtual Private Cloud) with network and security management, Route 53, CloudFormation, Direct Connect, S3, AWS OpsWorks (operations automation), IAM, Glacier (archival cloud storage), and Amazon CloudWatch monitoring.
  • Used the AWS CLI to automate backups of short-lived data stores to S3 buckets and EBS, and created nightly backup AMIs for the mission-critical production servers (see the Boto3 sketch after this list).
  • Developed and optimized ETL workflows in both legacy and distributed environments.
  • Worked with CloudWatch, EC2, security management, and Elastic Load Balancing on AWS.
  • Practical knowledge of AWS OpsWorks, CloudFormation, and Elastic Beanstalk.
  • Created SQL Server (T-SQL) stored procedures, views, and Power BI reports for project planning and regulatory reporting.
  • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources, and built various graphs for business decision-making using Python's Matplotlib library.
  • Added support for Amazon S3 and RDS to host static/media files and the database in the Amazon cloud.
  • Created model dependency mappings by recursive tree traversal over model IDs extracted from model descriptions via regex (see the sketch after this list).
  • Used HTML/CSS, JavaScript and AJAX for development of the website's user interface.
  • Developed views and templates with Django's view controller and templating language.
  • Designed REST APIs and packages that abstract feature extraction and complex prediction/forecasting algorithms on time series data.
  • Created and maintained data dictionaries, database schema, SQL & Python coding style/standard guides and runbooks for ETL & exposure reporting processes.
  • Performed data ingestion and data manipulation for Vertica, SharePoint, delimited files, APIs, and other sources using Python.
  • Performed data analysis, data cleaning, data visualization and data transformation using the SciPy stack (pandas, matplotlib, NumPy) and Power BI to discover insights in data.
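
A minimal Boto3 sketch of the nightly backup automation described in the bullets above; the region, instance ID, bucket, and file paths are hypothetical placeholders:

    import datetime

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption
    s3 = boto3.client("s3")

    stamp = datetime.date.today().isoformat()

    # Create a nightly AMI for a mission-critical instance (ID is hypothetical).
    ec2.create_image(
        InstanceId="i-0123456789abcdef0",
        Name=f"nightly-backup-{stamp}",
        NoReboot=True,  # avoid restarting a production server
    )

    # Push a short-lived data-store dump to S3 (bucket and key are hypothetical).
    s3.upload_file("/tmp/datastore-dump.sql", "example-backup-bucket", f"dumps/{stamp}.sql")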
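
And a hypothetical sketch of the regex-based model dependency mapping: extract referenced model IDs from free-text descriptions, then recursively walk the references to collect each model's transitive dependencies. The ID format and sample descriptions are assumptions:

    import re

    # Hypothetical model descriptions that reference other models by ID.
    descriptions = {
        "MDL-1": "Combines outputs of MDL-2 and MDL-3.",
        "MDL-2": "Standalone curve fit.",
        "MDL-3": "Adjusts MDL-2 for seasonality.",
    }

    ID_PATTERN = re.compile(r"MDL-\d+")

    # Direct dependencies extracted with the regex (excluding self-references).
    direct = {
        model_id: {m for m in ID_PATTERN.findall(text) if m != model_id}
        for model_id, text in descriptions.items()
    }

    def all_deps(model_id, seen=None):
        """Recursively collect transitive dependencies, guarding against cycles."""
        seen = set() if seen is None else seen
        for dep in direct.get(model_id, ()):
            if dep not in seen:
                seen.add(dep)
                all_deps(dep, seen)
        return seen

    print({m: sorted(all_deps(m)) for m in descriptions})
    # {'MDL-1': ['MDL-2', 'MDL-3'], 'MDL-2': [], 'MDL-3': ['MDL-2']}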

Environment: Python, Django Web Framework, Pandas, Matplotlib, SciPy, NumPy, Power BI, REST API, Vertica, SharePoint, AWS (RDS, IAM, S3, CloudWatch, Route 53, VPC, Auto Scaling, shell scripts, SES, SQS, SNS), PyUnit, MySQL, Git, Jira, HTML, HTML5/CSS, jQuery, JavaScript, Unix/Linux, Mac, Windows, T-SQL, SQL Server

Confidential, Bedminster, NJ

Data Engineer

Responsibilities:

  • Evaluated and extracted/transformed data for analytical purposes in a big data environment.
  • Involved in designing ETL processes and developing source to target mappings.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
  • Developed Spark applications using Python (PySpark) to transform data according to business rules (see the sketch after this list).
  • Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, NLTK in Python for developing data pipelines and various machine learning algorithms.
  • Implemented AWS Lambda functions to drive real-time monitoring dashboards from system logs.
  • Sourced data from various systems (Teradata, Oracle) into the Hadoop ecosystem using big data tools such as Sqoop.
  • Developed shell scripts to install Snowflake JARs, Python packages, and Spark executors from artifacts.
  • Worked on exporting data to Snowflake and analyzing it to produce visualizations and generate reports for the BI team. Used Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) as the storage mechanism.
  • Interpreted problems and provided solutions to business problems using data analysis, data mining, optimization tools, and machine learning techniques and statistics.
  • Developed Spark (Scala) and Python regular expression (regex) processing in the Hadoop/Hive environment on Linux/Windows for big data sources.
  • Analyzed SQL scripts and designed solutions to implement them using PySpark.
  • Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python (see the imputation sketch after this list).
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Created various types of data visualizations using Python and Tableau.
  • Involved in the development of web services using REST APIs for sending and receiving data from the external interface in JSON format.
  • Worked on complex SQL queries and PL/SQL procedures and converted them to ETL tasks.
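
A minimal PySpark sketch of the business-rule transformations referenced above; the input path, columns, and rules are illustrative assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("business-rules").getOrCreate()

    # Hypothetical source table of trades.
    df = spark.read.parquet("/data/raw/trades")

    # Example business rules: normalize currency codes, drop cancelled trades,
    # and bucket notional amounts for downstream reporting.
    transformed = (
        df.withColumn("currency", F.upper(F.col("currency")))
        .filter(F.col("status") != "CANCELLED")
        .withColumn(
            "notional_bucket",
            F.when(F.col("notional") < 1_000_000, "small").otherwise("large"),
        )
    )

    transformed.write.mode("overwrite").parquet("/data/curated/trades")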
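
And a hedged sketch of the missing-value imputation mentioned above, using Pandas with scikit-learn; the sample frame and median strategy are assumptions for illustration:

    import pandas as pd
    from sklearn.impute import SimpleImputer

    # Hypothetical dataset with gaps in numeric features.
    df = pd.DataFrame({"age": [34, None, 45, 29],
                       "income": [72000, 58000, None, 61000]})

    # Median imputation is robust to outliers; mean or model-based
    # strategies are common alternatives depending on the feature.
    imputer = SimpleImputer(strategy="median")
    df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])

    print(df)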

Environment: Python, Django, Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, NLTK, Spark, Scala, PySpark, AWS Lambda, EC2, S3, Tableau, Hadoop, Hive, HDFS, Shell Scripting, REST API, SQL, Linux, Windows

Confidential, Bloomington, IL

Data Engineer/Python Developer

Responsibilities:

  • Involved in the entire project lifecycle, including design, development, testing, deployment, implementation, and support.
  • Developed and deployed Python scripts, scheduled via cron jobs, to replace moving-averages-based SQL anomaly detection with time series clustering and news-sentiment-based anomaly detection, reducing false positives and saving 1.5 hours/day.
  • Supported a team of data analysts managing historical time series data for various exchange-traded assets by performing data sourcing, cleaning, loading, and processing from sources such as ICE, Morningstar, and Refinitiv Eikon.
  • Developed RESTful APIs using Python Flask and T-SQL data models, and ensured code quality by writing unit tests using Pytest (see the sketch after this list).
  • Worked with data vendors, client services, reference data and other organizations to validate and facilitate the availability of market data in the Riskmetrics tool for clients and other users.
  • Imported data from Snowflake queries into Spark DataFrames and performed transformations and actions on the DataFrames.
  • Built various graphs for business decision-making using Python's Matplotlib library.
  • Used NumPy for numerical analysis of insurance premiums.
  • Created tasks and workflows in the Workflow Manager and monitored the sessions in the Workflow Monitor.
  • Successfully liaised with vendors to meet their requirements and resolve their queries.
  • Utilized PyUnit, a Python framework for unit testing, for all Python applications.
  • Wrote Python scripts to automate manual tasks.
  • Utilized Git for version control via CLI.
  • Wrote SQL queries to identify and validate data inconsistencies in data warehouse against source system.
  • Assisted users in creating/modifying worksheets and data visualization dashboards in Tableau.
  • Used Power BI and Tableau for Data Visualization and Analytics.
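
A minimal sketch of a Flask endpoint with a Pytest unit test, in the spirit of the API work described above; the route, payload, and values are hypothetical:

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/premiums/<int:policy_id>")
    def get_premium(policy_id):
        # The real service would query the T-SQL data model; a canned
        # response keeps this sketch self-contained.
        return jsonify({"policy_id": policy_id, "premium": 1250.0})

    # Pytest unit test (would normally live in a separate test module).
    def test_get_premium():
        client = app.test_client()
        resp = client.get("/premiums/42")
        assert resp.status_code == 200
        assert resp.get_json()["policy_id"] == 42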

Environment: Python, Flask, Matplotlib, NumPy, Pandas, Workflow Manager, Power BI, Tableau, Cron Jobs, SQL, HTML, HTML5/CSS, jQuery, JavaScript, Linux, Windows, PyUnit, Git

Confidential

Python Developer

Responsibilities:

  • Responsible for requirements gathering, system analysis, design, development, testing, and deployment; involved in all SDLC phases.
  • Developed human-centered user interfaces using CSS, HTML, PHP, JavaScript and jQuery.
  • Developed and tested various features in an Agile environment using Python, Django, HTML5/CSS, Bootstrap, and JavaScript.
  • Created and implemented Business Logic using Python/Django.
  • Created a MySQL database and implemented a Python Django API to extract data (see the sketch after this list).
  • Utilized Amazon EC2 and SQS to upload and retrieve project history.
  • Utilized Django to develop RESTful web services for both producers and consumers.
  • Successfully liaised with vendors to meet their requirements and resolve their queries.
  • Utilized PyUnit, a Python framework for unit testing, for all Python applications.
  • Wrote Python scripts to automate manual tasks.
  • Utilized Git for version control via CLI.
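
A minimal sketch of a Django view exposing MySQL-backed data as JSON, along the lines of the API described above; the model, fields, and route are hypothetical assumptions:

    # models.py (hypothetical model backed by the MySQL database)
    from django.db import models

    class Project(models.Model):
        name = models.CharField(max_length=100)
        status = models.CharField(max_length=20)

    # views.py
    from django.http import JsonResponse

    def project_list(request):
        # Extract data via the ORM and return it as JSON.
        data = list(Project.objects.values("id", "name", "status"))
        return JsonResponse({"projects": data})

    # urls.py would map the view, e.g.:
    # urlpatterns = [path("projects/", project_list)]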

Environment: Python, Django Web Framework, PyUnit, MySQL, Git, Jira, HTML, HTML5/CSS, jQuery, Pandas, JavaScript, Linux, Mac, Windows

Confidential

Python Developer

Responsibilities:

  • Worked on several stages of the Software Development Life Cycle (SDLC), such as requirements gathering, planning, research/analysis, modeling, design, and development.
  • Developed and deployed various complex, efficient SQL queries and PL/SQL functions.
  • Performed Python scripting to automate production tasks.
  • Designed and developed REST-architecture web services using Python, Flask, and a Postgres database.
  • Used Git for version control: performed branching and committed code changes to the master branch on Linux.
  • Utilized Python's Beautiful Soup package for web scraping (see the sketch after this list).
  • Wrote tests for RESTful APIs using Python's Requests library and the pytest unit testing framework.
  • Utilized Jenkins to deploy projects and run unit tests.
  • Wrote multiple test cases to automate the CLI configuration using Python.
  • Created unit testing programs with Python's PyUnit.
  • Traced and fixed bugs in an already developed application.
  • Worked on scanning and analyzing system logs using Bash scripting.
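
A minimal Beautiful Soup scraping sketch matching the bullet above; the URL and CSS selector are hypothetical:

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical target page; any static HTML page works the same way.
    resp = requests.get("https://example.com/releases")
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")

    # Collect the text of every release title (selector is an assumption).
    titles = [h2.get_text(strip=True) for h2 in soup.select("h2.release-title")]
    print(titles)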

Environment: Python, Django, PyUnit, PL/SQL, Git, HTML, HTML5/CSS, DOM, jQuery, AJAX, JavaScript, Linux, Mac, Windows
