Data Engineer Resume

Owings Mills, MD

SUMMARY:

  • Proven track record as a Data Engineer working on Amazon cloud services, Big Data/Hadoop applications, and product development.
  • Well versed in Big Data on AWS cloud services, i.e. EC2, S3, Glue, Athena, DynamoDB, and Redshift.
  • Experience with job/workflow scheduling and monitoring tools such as Oozie, AWS Data Pipeline, and AutoSys.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Experience creating and running Docker images with multiple microservices.
  • Docker container orchestration using ECS, ALB, and Lambda.
  • Experience with Unix/Linux systems, including scripting and building data pipelines.
  • Experience with cloud databases and data warehouses (SQL Azure and Amazon Redshift/RDS).
  • Played a key role in migrating Cassandra and Hadoop clusters to AWS and defined different read/write strategies.
  • Strong SQL development skills including writing Stored Procedures, Triggers, Views, and User Defined functions.
  • Expert in developing SSIS/DTS Packages to extract, transform and load (ETL) data into data warehouse/data marts from heterogeneous sources.
  • Good understanding of software development methodologies, including Agile (Scrum).
  • Expertise in developing reports and dashboards using a variety of Tableau visualizations.
  • Hands-on experience with different programming languages such as Java, Python, R, and SAS.
  • Experience using Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, and Kafka, as well as crontab tools.
  • Expert in creating Hive UDFs in Java to analyze data sets with complex aggregation requirements.
  • Experience developing ETL applications on large volumes of data using different tools: MapReduce, Spark-Scala, PySpark, Spark SQL, and Pig.
  • Experience using Sqoop to import and export data between RDBMS and HDFS/Hive.
  • Created user-friendly GUIs and web pages using HTML, CSS, and JSP.
  • Experience with MS SQL Server, including SSRS, SSIS, and T-SQL.

PROFESSIONAL EXPERIENCE:

Confidential, Owings Mills, MD

Data Engineer

Responsibilities:

  • Created on-demand tables over S3 files using Lambda functions and AWS Glue, written in Python and PySpark.
  • Coordinated with the team to develop a framework that generates daily ad hoc reports and extracts from enterprise data, automated with Oozie.
  • Worked on cloud deployments using Maven, Docker, and Jenkins.
  • Coordinated with the Data Science team to design and implement advanced analytical models on the Hadoop cluster over large datasets.
  • Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
  • Used AWS Glue for data transformation, validation, and cleansing.
  • Used the Python Boto3 library to configure AWS Glue, EC2, and S3.

Confidential, Madison, SD

Data Engineer

Responsibilities:

  • Wrote scripts and defined an indexing strategy for a migration from SQL Server and MySQL databases to Amazon Redshift.
  • Used the AWS Glue Data Catalog with a crawler to read data from S3 and run SQL queries against it.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Used JSON schema to define table and column mapping from S3 data to Redshift
  • Wrote indexing and data distribution strategies optimized for sub-second query response
  • Developed a statistical model using artificial neural networks to rank students in support of the admissions process.
  • Designed and developed schema data models.
  • Performed data cleaning and preparation on XML files.
  • Automated data cleaning and preparation in Python (robotic process automation).
  • Built analytical dashboards to track the student records and GPAs across the board.
  • Used deep learning frameworks such as MXNet, Caffe2, TensorFlow, Theano, CNTK, and Keras to help clients build deep learning models.
  • Participated in requirements meetings and data mapping sessions to understand business needs.

Confidential

Data Engineer

Responsibilities:

  • Designed and built a multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Amazon Redshift, handling millions of records every day.
  • Worked with Big Data on AWS cloud services, i.e. EC2, S3, EMR, and DynamoDB.
  • Managed security groups on AWS with Terraform templates, focusing on high availability, fault tolerance, and auto scaling, and set up continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline.
  • Implemented and managed ETL solutions and automated operational processes.
  • Optimized and tuned the Redshift environment, enabling queries to run up to 100x faster for Tableau and SAS Visual Analytics.
  • Wrote various data normalization jobs for new data ingested into Redshift
  • Advanced knowledge of Amazon Redshift and MPP database concepts.
  • Migrated the on-premises database structure to an Amazon Redshift data warehouse.
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This made the reporting interface faster and more reliable, giving sub-second query response for basic queries.
  • Published interactive data visualization dashboards and reports/workbooks on Tableau and SAS Visual Analytics.
  • Expert knowledge of Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology for each job.

Confidential

Data Analyst

Responsibilities:

  • Developed stored procedures in MS SQL to fetch data files from different servers over FTP and process them to update tables.
  • Designed logical and physical data models for various data sources on Amazon Redshift.
  • Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
  • Built data pipelines in Python, PySpark, Hive SQL, Presto, and BigQuery, and built Python DAGs in Apache Airflow.
  • Created an ETL pipeline using Spark and Hive to ingest data from multiple sources.
  • Used SAP, including transactions in the SAP SD module, to handle the client's customers and generate sales reports.
  • Coordinated with clients directly to get data from different databases.
  • Worked on MS SQL Server, including SSRS, SSIS, and T-SQL.
  • Designed and developed schema data models.
  • Documented business workflows for stakeholder review.

Confidential, IND

Application Developer

Responsibilities:

  • Worked on developing “Ecommerce”, a web-based application that relies on SAP (ERP), using Java, JSP, HTML, CSS, and JavaScript.
  • Developed reports for the business using the Google Charts API.
  • Wrote SQL queries to build reports for pre-sales and secondary sales estimations.
  • Used JavaScript and jQuery for client-side validations.
  • Created user-friendly GUIs and web pages using HTML, CSS, and JSP.
  • Established connection between portal and SAP using JCo Connectors.
  • Designed and developed Session Beans for implementing Business logic.
  • Troubleshot and debugged the code of “Ezcommerce”, a web-based application that relies on SAP (ERP), and provided support to the client.
  • Created complex SQL queries and used JDBC connectivity to access the database.
