
Sr Data Engineer Resume

Miami, FL

SUMMARY

  • 10+ years of IT experience as a Data Engineer/Analyst, with analytical programming in SQL, Python, Snowflake, and AWS.
  • Good working knowledge of multi-tiered distributed environments and a good understanding of the Software Development Lifecycle (SDLC), including Agile and Waterfall methodologies.
  • Ability to develop reliable, maintainable, and efficient code in SQL, Hive, and Python.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Management Systems (RDBMS) and from RDBMS to HDFS.
  • Strong knowledge of various data warehousing methodologies and data modeling concepts.
  • Experience in the development of ETL processes and frameworks, such as Matillion, for large-scale, complex datasets.
  • Experience with application development on Linux with Python, RDBMS, NoSQL, and ETL solutions.
  • Experience designing and operating very large Data Warehouses.
  • Good knowledge of BI and data visualization tools such as Tableau.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, and Pig.
  • Knowledge of writing AWS CloudFormation templates and deploying AWS resources.
  • Design and build data processing pipelines using big data tools and frameworks such as Spark and Hive.
  • Understanding of data transformation and translation requirements and of which tools to leverage to get the job done.
  • Understanding of data pipelines and modern approaches to automating them using cloud-based techniques.
  • Good understanding and hands-on experience in setting up and maintaining NoSQL databases like MongoDB and HBase.
  • Familiarity with new advances in the data engineering space such as EMR and NoSQL technologies like DynamoDB.
  • Ability to work with several Python packages such as Pandas, NumPy, Matplotlib, Beautiful Soup, and PySpark.
  • Hands on experience in working with Continuous Integration and Deployment (CI/CD).
  • Defining user stories and driving the agile board in JIRA during project execution, participating in sprint demos and retrospectives.
  • Experience in using various version control systems like Git.
  • Strong analytical skills with the ability to collect, organize and analyze large amounts of information with attention to detail and accuracy.
  • Possess good interpersonal, analytical, and presentation skills, with the ability to work in both self-managed and team environments.

TECHNICAL SKILLS

Programming Languages: Python, SQL, PHP, C++, Shell Scripting

SDLC Methodologies: Agile/SCRUM, Waterfall

Operating Systems: Windows, Linux, Unix

Python Libraries: PySpark, Pandas, Beautiful Soup, Jinja2, NumPy, SciPy, Matplotlib, unittest

Big Data Tools: Hadoop 3.3, Hive 3.2.1, Kafka 2.8, Scala, MapReduce, Sqoop

Cloud Tools: Azure and AWS (S3, RDS, DynamoDB, EMR, Redshift, Glue)

Databases: MS SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, SQLite, DynamoDB

Version Control: Git, GitHub, Bitbucket

Other Tools: Snowflake, Databricks, Hive, Spark, Matillion, ETL, JIRA, Docker, MS Excel, Data Pipeline, Data Modeling

PROFESSIONAL EXPERIENCE

Confidential

Sr Data Engineer

Responsibilities:

  • Working as a Data Engineer, assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks.
  • Highly involved in data architecture and application design using Cloud and Big Data solutions on AWS.
  • Worked independently to develop and test Hive-based blocks to load and register data into the Data Lake and Data Warehouse.
  • Migrated applications from an internal data center to the AWS cloud platform.
  • Used and developed Python scripts to migrate data to AWS.
  • Involved in Dataset Registration in Exchange for S3, One Lake, and Snowflake.
  • Created ETL jobs using Matillion to load server data into the Snowflake Data Warehouse.
  • Worked on data management and data integration.
  • Implemented solutions for ingesting data from various sources and processing data at rest utilizing Big Data technologies such as Hadoop, MapReduce, and Hive.
  • Provided guidance to development team working on PySpark as ETL platform.
  • Developed data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Maintain existing ETL workflows, data management and data query components.
  • Worked extensively with AWS services, with a wide and in-depth understanding of each of them.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Built REST APIs to easily add new analytics or issuers into the model.
  • Developed MapReduce programs for parsing data and loading it into HDFS.
  • Used Hive to do transformations, event joins, and some pre-aggregations before storing and registering data in Snowflake.
  • Used Hive SQL to analyze the data and identify different correlations.
  • Used Hive scripts and Amazon Elastic MapReduce for unit testing.
  • Imported data from DynamoDB to Redshift in batches using AWS Batch with the TWS scheduler.
  • Performed data analysis and data validation on Snowflake and the Hue editor.
  • Performed capacity planning for cloud infrastructure with the AWS console, EC2 instances, AWS EMR, and S3.
  • Used Databricks as a platform for high-scale analytics using Spark.
  • Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka.
  • Used Python scripts to update content in the database and manipulate files.
  • Involved in supporting a cloud-based data warehouse environment such as Snowflake.
  • Implemented code in Python to retrieve and manipulate data.
  • Worked with the data governance team to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Extensively involved in managing and reviewing Hadoop log files.
  • Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
  • Design and build ETL pipelines to automate ingestion of structured and unstructured data.
  • Demonstrated a full understanding of the Fact/Dimension data warehouse design model, including star and snowflake design methods.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Responsible for the design, implementation, and architecture of very large-scale data intelligence solutions around the Snowflake Data Warehouse.
  • Worked on Lambda functions that aggregate data from incoming events and then store the results in Amazon DynamoDB (see the sketch after this list).
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Spark and Databricks.
  • Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
  • Created a load balancer on AWS EC2 for a stable cluster and services that provide fast and effective data processing.
  • Developed an API using AWS Lambda to manage the servers and run the code in AWS.
  • Designed and implemented a REST API to access the Snowflake platform.
  • Created different types of reports, including cross-tab, conditional, drill-down, OLAP, and sub-reports.
  • Worked on Oozie workflow engine for job scheduling.
  • Developed tools using Python, shell scripting, and XML to automate repetitive tasks.
  • Wrote many T-SQL Stored procedures to perform specific tasks as per the requirements.
  • Presented data in a visually appealing way using Tableau.
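
Below is a minimal sketch of the Lambda aggregation pattern referenced in the list above. It assumes a Kinesis/S3-style event payload with an "issuer" field on each record and a hypothetical DynamoDB table named event_aggregates; none of these names come from the actual project.

      import json
      import boto3
      from collections import Counter

      # Hypothetical table name; the real table is not named in the resume.
      dynamodb = boto3.resource("dynamodb")
      table = dynamodb.Table("event_aggregates")

      def lambda_handler(event, context):
          # Count incoming records per issuer; the payload shape is assumed.
          counts = Counter(r.get("issuer", "unknown") for r in event.get("Records", []))
          # Persist one aggregate item per issuer using an atomic counter update.
          for issuer, count in counts.items():
              table.update_item(
                  Key={"issuer": issuer},
                  UpdateExpression="ADD event_count :c",
                  ExpressionAttributeValues={":c": count},
              )
          return {"statusCode": 200, "body": json.dumps(dict(counts))}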

Environment: Agile, Snowflake, Hadoop 3.3, Python 3.1, AWS, Matillion, Jenkins 2.3, DynamoDB, HDFS, Hive 3.2.1, Lambda, EC2, EMR, S3, OLAP, REST API, T-SQL, Oozie, MapReduce, XML, Kafka 2.8, Tableau.

Confidential, Miami, FL

Sr. Data Engineer

Responsibilities:

  • Worked independently in the development, testing, implementation, and maintenance of systems of moderate-to-large size and complexity.
  • Used the Agile methodology throughout the different phases of the Software Development Life Cycle (SDLC).
  • Worked extensively with AWS services, with a wide and in-depth understanding of each of them.
  • Handled business logic with backend Python programming to achieve optimal results.
  • Responsible for ETL (Extract, Transform, and Load) processes to bring data from multiple sources into a single warehouse environment.
  • Worked on standard Python packages like boto and boto3 for AWS.
  • Investigated data sources to identify new data elements needed for data integration.
  • Worked with delivery of Data & Analytics applications involving structured and unstructured data on Hadoop-based platforms.
  • Carried out various mathematical operations for calculation purposes using Python libraries.
  • Involved in designing and deploying AWS Solutions using EC2, S3, RDS and Redshift.
  • Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
  • Implemented solutions for ingesting data from various sources and processing data at rest utilizing Big Data technologies including Hadoop, MapReduce, Pig, and Hive.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
  • Wrangled data and created data pipelines using fast, efficient Python code.
  • Used Git for version control and deployed the project to AWS.
  • Configured inbound/outbound rules in AWS security groups according to the requirements.
  • Used AWS Glue for data transformation, validation, and cleansing.
  • Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
  • Designed and developed data management system using MySQL.
  • Worked with Analysts on various requirements gathered in JIRA.
  • Used EMR for data pre-analysis by creating EC2 instances.
  • Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using Visio.
  • Created complex program units using PL/SQL records and collection types.
  • Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
  • Effectively loaded real time data from various data sources into HDFS using Kafka.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations (see the sketch after this list).
  • Developed AWS Lambda python functions using S3 triggers to automate workflows.
  • Prepared Dashboards using calculations, parameters, calculated fields, groups, sets and hierarchies in Tableau.
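
Below is a minimal sketch of a Databricks PySpark table-to-table job of the kind referenced in the list above. The table and column names (raw.orders, curated.daily_order_totals, and so on) are illustrative assumptions, not names from the project.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      # On Databricks a SparkSession already exists; getOrCreate() reuses it.
      spark = SparkSession.builder.appName("table_to_table_job").getOrCreate()

      orders = spark.table("raw.orders")        # source tables registered in the metastore
      customers = spark.table("raw.customers")

      # Join, filter, and aggregate before writing to a curated table.
      daily_totals = (
          orders.join(customers, "customer_id")
                .where(F.col("order_status") == "COMPLETED")
                .groupBy("order_date", "region")
                .agg(F.sum("order_amount").alias("total_amount"))
      )

      daily_totals.write.mode("overwrite").saveAsTable("curated.daily_order_totals")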

Environment: Hadoop 3.0, Agile, AWS, Python, MS Visio, JIRA, MySQL, HDFS, Kafka 1.1, Git, EC2, S3, Spark, OLTP, ODS, MongoDB, Tableau.

Confidential - New York, NY

Data Engineer

Responsibilities:

  • Worked as a Data Engineer with the analysis and management teams and supported them based on their requirements.
  • Worked on Agile, testing methodologies, resource management and scheduling of tasks.
  • Worked on all phases of data warehouse development lifecycle, from gathering requirements to testing, implementation, and support.
  • Worked with Amazon Web Services (AWS) for improved efficiency of storage and fast access.
  • Worked on Data migration project from Teradata to Snowflake.
  • Developed data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Worked on AWS Redshift and RDS to implement models and data.
  • Implemented data pipelines for big data processing using Spark transformations, the Python API, and clusters in AWS.
  • Specified nodes and performed data analysis queries on Amazon Redshift clusters on AWS.
  • Involved in developing ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL (see the sketch after this list).
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Added support for Amazon S3 and RDS to host static/media files and the database in the Amazon cloud.
  • Used Oozie to automate data loading into the HDFS and PIG to pre-process the data.
  • Called Data Rules and Transform Rules functions using Informatica Stored Procedure Transformation.
  • Implemented data models, database designs, data access, table maintenance and code changes together with our development team.
  • Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
  • Analyzed data from multiple data sources and developed a process to integrate the data into a single, consistent view.
  • Performed data analytics on the Data Lake using PySpark on the Databricks platform.
  • Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and developing reports using advanced SQL queries in Snowflake.
  • Involved in interactive BI dashboards and Tableau Publisher.
  • Created multiple dashboards in Tableau for multiple business needs.
  • Used Excel VLOOKUPs to look up customer data and created pivot tables to easily access and validate data.
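
Below is a minimal sketch of a Python plus SnowSQL-style load into Snowflake as described in the list above. It assumes the snowflake-connector-python package and uses placeholder account, stage, and table names (etl_stage, staging.customer_events, and so on) that are not from the project.

      import snowflake.connector

      # Connection parameters are placeholders.
      conn = snowflake.connector.connect(
          account="my_account",
          user="etl_user",
          password="***",
          warehouse="ETL_WH",
          database="ANALYTICS",
          schema="STAGING",
      )

      cur = conn.cursor()
      try:
          # Load staged files into a staging table, then merge into the warehouse table.
          cur.execute("""
              COPY INTO staging.customer_events
              FROM @etl_stage/customer_events/
              FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
          """)
          cur.execute("""
              MERGE INTO analytics.customer_events AS tgt
              USING staging.customer_events AS src
              ON tgt.event_id = src.event_id
              WHEN NOT MATCHED THEN
                  INSERT (event_id, customer_id, event_ts)
                  VALUES (src.event_id, src.customer_id, src.event_ts)
          """)
      finally:
          cur.close()
          conn.close()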

Environment: Agile, Snowflake, AWS, Spark, HDFS, Sqoop, Oozie, Scala, Cassandra, Pig, HBase, Tableau, Excel, PySpark, Databricks, Informatica.

Confidential - San Jose, CA

Data Analyst/Engineer

Responsibilities:

  • Worked as a Data Analyst/Engineer to generate data models using Erwin and developed relational database systems.
  • Analyzed data where it lives by mounting Azure Data Lake and Blob Storage to Databricks (see the sketch after this list).
  • Involved in extracting and mining data for analysis to aid in solving business problems.
  • Used Azure Data Lake as the source and pulled data using Azure PolyBase.
  • Formulated SQL queries, Aggregate Functions, and database schema to automate information retrieval.
  • Involved in manipulating data to fulfill analytical and segmentation requests.
  • Managed data privacy and security in Power BI.
  • Wrote complex SQL queries for data analysis to meet business requirements.
  • Used data visualization tools and techniques to best share data with business partners.
  • Designed and implemented a Data Lake to consolidate data from multiple sources using Hadoop stack technologies like Sqoop and Hive.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Reviewed code and system interfaces and extracts to handle the migration of data between systems/databases.
  • Developed a Conceptual model using Erwin based on requirements analysis.
  • Involved in ETL mapping documents in data warehouse projects.
  • Involved in loading data into Teradata from legacy systems and flat files using complex MultiLoad and FastLoad scripts.
  • Created Azure Data Factory (ADF) pipelines using Azure PolyBase and Azure Blob Storage.
  • Developed star and snowflake schema based dimensional models to build the data warehouse.
  • Developed T-SQL code, stored procedures, views, functions, and other database objects to supply data for downstream applications and fulfill business requirements.
  • Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target system from multiple sources.
  • Implemented data ingestion and handled clusters for real-time processing using Kafka.
  • Created and tuned PL/SQL procedures and SQL queries for data validation in the ETL process.
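
Below is a minimal sketch of mounting Azure Blob Storage to Databricks as mentioned in the list above. It assumes a Databricks notebook context (where dbutils and spark are predefined) and uses placeholder storage account, container, and secret scope names.

      # Databricks notebook cell; storage account, container, and secret names are placeholders.
      storage_account = "examplestorageacct"
      container = "raw-data"

      dbutils.fs.mount(
          source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
          mount_point=f"/mnt/{container}",
          extra_configs={
              f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
                  dbutils.secrets.get(scope="etl-secrets", key="storage-account-key")
          },
      )

      # Once mounted, files can be read in place with Spark.
      df = spark.read.parquet(f"/mnt/{container}/events/")
      df.printSchema()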

Environment: Azure Data Lake, Erwin, Power BI, Hadoop, HBase, Teradata, T-SQL, SSIS, PL/SQL.

Confidential

Data Analyst

Responsibilities:

  • Performed data analysis using SQL queries on source systems to identify data discrepancies and determine data quality.
  • Performed extensive data validation and verification against the Data Warehouse and debugged SQL statements and stored procedures for business scenarios (see the sketch after this list).
  • Designed and developed Tableau dashboards using stacked bars, bar graphs, scatter plots, and Gantt charts.
  • Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
  • Worked on ER/Studio for conceptual, logical, and physical data modeling and for generating DDL scripts.
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Analyzed the data that consumed the most resources and made changes to the back-end code using PL/SQL stored procedures and triggers.
  • Performed data completeness, correctness, data transformation and data quality testing using SQL.
  • Involved in designing Business Objects universes and creating reports.
  • Conducted design walk-through sessions with the Business Intelligence team to ensure that reporting requirements were met for the business.
  • Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area.
  • Wrote UNIX shell scripts to invoke all the stored procedures, parse the data and load into flat files.
  • Created reports analyzing large-scale databases utilizing Microsoft Excel analytics within the legacy system.
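
Below is a minimal sketch of the kind of source-versus-warehouse validation comparison described in the list above, using pandas and SQLAlchemy. The connection strings and the dbo.orders table are illustrative placeholders, not from the project.

      import pandas as pd
      from sqlalchemy import create_engine

      # Connection strings are placeholders for the source system and the Data Warehouse.
      source_engine = create_engine("mssql+pyodbc://user:pass@source_dsn")
      warehouse_engine = create_engine("mssql+pyodbc://user:pass@dw_dsn")

      # The same completeness/correctness check is run on both systems.
      check_sql = """
          SELECT CAST(order_date AS DATE) AS order_date,
                 COUNT(*)                 AS row_count,
                 SUM(order_amount)        AS total_amount
          FROM dbo.orders
          GROUP BY CAST(order_date AS DATE)
      """

      src = pd.read_sql(check_sql, source_engine)
      dw = pd.read_sql(check_sql, warehouse_engine)

      # Rows where counts or totals disagree point to data discrepancies.
      merged = src.merge(dw, on="order_date", suffixes=("_src", "_dw"))
      mismatches = merged[(merged.row_count_src != merged.row_count_dw) |
                          (merged.total_amount_src != merged.total_amount_dw)]
      print(mismatches if not mismatches.empty else "No discrepancies found")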

Environment: SQL, PL/SQL, OLAP, OLTP, UNIX, MS Excel, T-SQL.
