Sr Data Engineer Resume
Miami, FL
SUMMARY
- 10+ years of IT experience as a Data Engineer/Analyst, with analytical programming in SQL, Python, Snowflake, and AWS.
- Good working knowledge of multi-tiered distributed environments and a solid understanding of the Software Development Lifecycle (SDLC), including Agile and Waterfall methodologies.
- Ability to develop reliable, maintainable, and efficient code in SQL, Hive, and Python.
- Experience importing and exporting data with Sqoop between HDFS and Relational Database Management Systems (RDBMS).
- Strong knowledge of various data warehousing methodologies and data modeling concepts.
- Experience with application development on Linux using Python, RDBMS, NoSQL, and ETL solutions.
- Experience designing and operating very large Data Warehouses.
- Experience developing ETL processes and frameworks, such as Matillion, for large-scale, complex datasets.
- Good knowledge of BI and data visualization tools such as Tableau.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, and Pig.
- Knowledge of writing AWS CloudFormation templates and deploying AWS resources.
- Design and build data processing pipelines, selecting the appropriate tools and frameworks for the job.
- Understanding of data transformation and translation requirements and of which tools to leverage to meet them.
- Understanding of data pipelines and modern, cloud-based approaches to automating them.
- Good understanding of and hands-on experience in setting up and maintaining NoSQL databases like MongoDB and HBase.
- Familiarity with new advances in the data engineering space, such as EMR, and NoSQL technologies like DynamoDB.
- Ability to work with several Python packages, including Pandas, NumPy, Matplotlib, Beautiful Soup, and PySpark (see the sketch at the end of this summary).
- Hands-on experience with Continuous Integration and Continuous Deployment (CI/CD).
- Defined user stories and drove the Agile board in JIRA during project execution; participated in sprint demos and retrospectives.
- Experience in using various version control systems like Git.
- Strong analytical skills with the ability to collect, organize and analyze large amounts of information with attention to detail and accuracy.
- Good interpersonal, analytical, and presentation skills; able to work in self-managed and team environments.
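To illustrate the Python tooling above, here is a minimal PySpark/Pandas sketch; the S3 path and the column names (region, amount) are hypothetical and used only for demonstration.

```python
# Minimal sketch, assuming a hypothetical CSV feed of order events with
# columns order_id, region, and amount; path and names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("summary-demo").getOrCreate()

# Read raw data into a Spark DataFrame.
orders = spark.read.csv("s3://example-bucket/orders/*.csv", header=True, inferSchema=True)

# Simple aggregation in PySpark.
totals = orders.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# Hand off the small result set to Pandas for ad hoc analysis or plotting.
totals_pd = totals.toPandas()
print(totals_pd.head())
```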
TECHNICAL SKILLS
Programming Languages: Python, SQL, PHP, C++, Shell Scripting
SDLC Methodologies: Agile/SCRUM, Waterfall
Operating Systems: Windows, Linux, Unix
Python Libraries: PySpark, Pandas, Beautiful Soup, Jinja2, NumPy, SciPy, Matplotlib, unittest
Big Data Tools: Hadoop 3.3, Hive 3.2.1, Kafka 2.8, Scala, MapReduce, Sqoop
Cloud Tools: Azure and AWS (S3, RDS, DynamoDB, EMR, Redshift, Glue)
Databases: MS SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, SQLite, DynamoDB
Version Controls: Git, GitHub, Bitbucket
Other Tools: Snowflake, Databricks, Hive, Spark, Matillion, ETL, JIRA, Docker, MS Excel, Data Pipeline, Data Modeling
PROFESSIONAL EXPERIENCE
Confidential
Sr Data Engineer
Responsibilities:
- Working as a Data Engineer, assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks.
- Highly involved in data architecture and application design using cloud and big data solutions on AWS.
- Worked independently to develop and test Hive-based blocks to load and register data into the data lake and data warehouse.
- Migrated applications from an internal data center to the AWS cloud platform.
- Developed and used Python scripts to migrate data to AWS.
- Involved in Dataset Registration in Exchange for S3, One Lake, and Snowflake.
- Created ETL jobs using Matillion to load server data into the Snowflake data warehouse.
- Worked on data management and data integration.
- Implemented solutions for ingesting data from various sources and processing data at rest using big data technologies such as Hadoop, MapReduce, and Hive.
- Provided guidance to the development team using PySpark as the ETL platform.
- Developed data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Maintained existing ETL workflows, data management, and data query components.
- Worked extensively with AWS services, with a broad and in-depth understanding of each of them.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Built REST APIs to easily add new analytics or issuers into the model.
- Developed MapReduce programs for parsing data and loading it into HDFS.
- Used Hive for transformations, event joins, and pre-aggregations before storing and registering data in Snowflake.
- Used Hive SQL to analyze the data and identify correlations.
- Used Hive scripts and Amazon Elastic MapReduce for unit testing.
- Imported data from DynamoDB to Redshift in batches using AWS Batch with the TWS scheduler.
- Performed data analysis and data validation in Snowflake and the Hue editor.
- Performed capacity planning for cloud infrastructure using the AWS console, EC2 instances, AWS EMR, and S3.
- Used Databricks as the platform for high-scale analytics with Spark.
- Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka.
- Used Python scripts to update content in the database and manipulate files.
- Involved in supporting a cloud-based data warehouse environment such as Snowflake.
- Implemented Python code to retrieve and manipulate data.
- Worked with the data governance team to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Extensively involved in managing and reviewing Hadoop log files.
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
- Designed and built ETL pipelines to automate ingestion of structured and unstructured data.
- Demonstrated a full understanding of the Fact/Dimension data warehouse design model, including star and snowflake design methods.
- Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
- Responsible for the design, implementation, and architecture of very large-scale data intelligence solutions around the Snowflake data warehouse.
- Worked on Lambda functions that aggregate data from incoming events and store the results in Amazon DynamoDB (see the sketch after this role).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Spark and Databricks.
- Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
- Created a load balancer on AWS EC2 for a stable cluster and services that provide fast and effective data processing.
- Developed APIs using AWS Lambda to manage servers and run code in AWS.
- Designed and implemented REST APIs to access the Snowflake database platform.
- Created different types of reports, including cross-tab, conditional, drill-down, OLAP, and sub-reports.
- Worked on Oozie workflow engine for job scheduling.
- Developed tools using Python, shell scripting, and XML to automate routine tasks.
- Wrote many T-SQL stored procedures to perform specific tasks per the requirements.
- Presented data visually using Tableau.
Environment: Agile, Snowflake, Hadoop 3.3, Python 3.1, AWS, Matillion, Jenkins 2.3, DynamoDB, HDFS, Hive 3.2.1, Lambda, EC2, EMR, S3, OLAP, REST API, T-SQL, Oozie, MapReduce, XML, Kafka 2.8, Tableau.
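A rough sketch of the Lambda-to-DynamoDB aggregation described in this role; the event_aggregates table name and the event payload fields (source, count) are assumptions for illustration, not the production schema.

```python
# Minimal sketch, assuming a hypothetical "event_aggregates" table and an event
# payload whose records carry "source" and "count" fields.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("event_aggregates")  # hypothetical table name

def lambda_handler(event, context):
    # Aggregate counts per source across the incoming batch of records.
    totals = {}
    for record in event.get("Records", []):
        source = record.get("source", "unknown")
        totals[source] = totals.get(source, 0) + int(record.get("count", 1))

    # Persist one aggregate item per source.
    for source, count in totals.items():
        table.put_item(Item={"source": source, "event_count": count})

    return {"aggregated_sources": len(totals)}
```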
Confidential, Miami FL
Sr. Data Engineer
Responsibilities:
- Worked independently in the development, testing, implementation, and maintenance of systems of moderate-to-large size and complexity.
- Followed the Agile methodology through the different phases of the Software Development Life Cycle (SDLC).
- Worked extensively with AWS services, with a broad and in-depth understanding of each of them.
- Implemented business logic in backend Python to achieve optimal results.
- Responsible for ETL (Extract, Transform, and Load) processes to bring data from multiple sources into a single warehouse environment.
- Worked with standard Python packages such as boto and boto3 for AWS.
- Investigated data sources to identify new data elements needed for data integration.
- Worked on delivery of data and analytics applications involving structured and unstructured data on Hadoop-based platforms.
- Carried out various mathematical operations and calculations using Python libraries.
- Involved in designing and deploying AWS Solutions using EC2, S3, RDS and Redshift.
- Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
- Implemented solutions for ingesting data from various sources and processing data at rest using big data tools, including Hadoop, MapReduce, Pig, and Hive.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Proficient in wrangling data and creating data pipelines using fast, efficient Python code.
- Used Git for version control and deployed the project to AWS.
- Configured inbound/outbound rules in AWS security groups according to requirements.
- Used AWS Glue for data transformation, validation, and cleansing.
- Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
- Designed and developed data management system using MySQL.
- Worked with Analysts on various requirements gathered in JIRA.
- Used EMR for data pre-analysis by creating EC2 instances.
- Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using Visio.
- Created complex program units using PL/SQL records and collection types.
- Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
- Effectively loaded real time data from various data sources into HDFS using Kafka.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Developed AWS Lambda Python functions using S3 triggers to automate workflows (see the sketch after this role).
- Prepared Dashboards using calculations, parameters, calculated fields, groups, sets and hierarchies in Tableau.
Environment: Hadoop 3.0, Agile, AWS, Python, MS Visio, JIRA, MySQL, HDFS, Kafka 1.1, Git, EC2, S3, Spark, OLTP, ODS, MongoDB, Tableau.
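A minimal sketch of the S3-triggered Lambda automation mentioned above, assuming a hypothetical downstream Glue job named nightly_transform; bucket, key, and argument names are illustrative only.

```python
# Minimal sketch: an S3-triggered Lambda that starts a (hypothetical) Glue job
# for each newly arrived object.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    started = []
    # Each S3 event record carries the bucket and object key that triggered it.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Kick off a downstream Glue job for the new object (job name is assumed).
        response = glue.start_job_run(
            JobName="nightly_transform",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        started.append(response["JobRunId"])

    return {"job_runs": started}
```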
Confidential - New York, NY
Data Engineer
Responsibilities:
- Worked as a Data Engineer with the analysis and management teams and supported them based on their requirements.
- Worked on Agile, testing methodologies, resource management and scheduling of tasks.
- Worked on all phases of data warehouse development lifecycle, from gathering requirements to testing, implementation, and support.
- Worked with Amazon Web Services (AWS) for improved efficiency of storage and fast access.
- Worked on Data migration project from Teradata to Snowflake.
- Developed data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Worked on AWS Redshift and RDS for implementing models and data.
- Implemented data pipelines for big data processing using Spark transformations and the Python API on clusters in AWS.
- Specified nodes and performed data analysis queries on Amazon Redshift clusters in AWS.
- Involved in developing ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL (see the sketch after this role).
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Used Oozie to automate data loading into HDFS and Pig to pre-process the data.
- Called Data Rules and Transform Rules functions using Informatica Stored Procedure Transformation.
- Implemented data models, database designs, data access, table maintenance and code changes together with our development team.
- Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
- Analyzed data from multiple data sources and developed a process to integrate the data into a single, consistent view.
- Performed data analytics on the data lake using PySpark on the Databricks platform.
- Added support for Amazon S3 and RDS to host static/media files and the database in the AWS cloud.
- Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and developing reports using advanced SQL queries in Snowflake.
- Involved in interactive BI dashboards and Tableau Publisher.
- Created multiple dashboards in Tableau for various business needs.
- Used Excel VLOOKUPs to examine customer data and created pivot tables to easily access and validate data.
Environment: Agile, Snowflake, AWS, Spark, HDFS, Sqoop, Oozie, Scala, Cassandra, Pig, HBase, Tableau, Excel, PySpark, Databricks, Informatica.
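A minimal sketch of a Python-driven Snowflake load step like the pipelines described above, using the snowflake-connector-python package; the account, credentials, stage (@STG_ORDERS), and target table (ORDERS) are placeholders, not the actual environment.

```python
# Minimal sketch: load staged files into Snowflake and run a simple post-load check.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # placeholder
    user="etl_user",             # placeholder
    password="***",              # placeholder
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Load files already staged (via an external stage) into the target table.
    cur.execute("COPY INTO ORDERS FROM @STG_ORDERS FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
    # Simple post-load row count used for validation/reporting.
    cur.execute("SELECT COUNT(*) FROM ORDERS")
    print("rows loaded:", cur.fetchone()[0])
finally:
    conn.close()
```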
Confidential - San Jose, CA
Data Analyst/Engineer
Responsibilities:
- Worked as a Data Analyst/Engineer to generate data models using Erwin and developed a relational database system.
- Analyzed data in place by mounting Azure Data Lake and Blob Storage to Databricks (see the sketch after this role).
- Involved in extracting and mining data for analysis to aid in solving business problems.
- Used Azure Data Lake as the source and pulled data using PolyBase.
- Formulated SQL queries, Aggregate Functions, and database schema to automate information retrieval.
- Involved in manipulating data to fulfill analytical and segmentation requests.
- Managed data privacy and security in Power BI.
- Wrote complex SQL queries for data analysis to meet business requirements.
- Used data visualization tools and techniques to share data effectively with business partners.
- Designed and implemented a data lake to consolidate data from multiple sources, using Hadoop stack technologies like Sqoop and Hive.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Reviewed code and system interfaces and extracts to handle the migration of data between systems/databases.
- Developed a Conceptual model using Erwin based on requirements analysis.
- Involved in ETL mapping documents in data warehouse projects.
- Involved in loading data into Teradata from legacy systems and flat files using complex MultiLoad and FastLoad scripts.
- Created Azure Data Factory (ADF) pipelines using PolyBase and Azure Blob Storage.
- Developed star- and snowflake-schema-based dimensional models to build the data warehouse.
- Developed T-SQL code, stored procedures, views, functions, and other database objects to supply data for downstream applications and fulfill business requirements.
- Used SQL Server Integration Services (SSIS) to extract, transform, and load data into the target system from multiple sources.
- Implemented data ingestion and handling clusters in real time processing using Kafka.
- Created and tuned PL/SQL procedures and SQL queries for data validation in the ETL process.
Environment: Azure Data Lake, Erwin, Power BI, Hadoop, HBase, Teradata, T-SQL, SSIS, PL/SQL.
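A minimal sketch of mounting Azure Blob Storage to Databricks as described above; it assumes a Databricks notebook context (where dbutils and spark are predefined), and the storage account, container, and secret scope names are placeholders.

```python
# Minimal sketch: mount an Azure Blob container to Databricks and read it in place.
storage_account = "examplestorage"   # placeholder
container = "raw"                    # placeholder
mount_point = "/mnt/raw"             # placeholder

# Mount only if not already mounted.
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
        mount_point=mount_point,
        extra_configs={
            f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
                dbutils.secrets.get(scope="etl-scope", key="storage-key")  # placeholder scope/key
        },
    )

# Read the mounted data with Spark for analysis where it lives.
df = spark.read.parquet(f"{mount_point}/sales/")
df.printSchema()
```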
Confidential
Data Analyst
Responsibilities:
- Performed Data Analysis using SQL queries on source systems to identify data discrepancies and determine data quality.
- Performed extensive data validation and verification against the data warehouse and debugged SQL statements and stored procedures for business scenarios (see the sketch after this role).
- Designed and developed Tableau dashboards using stacked bars, bar graphs, scatter plots, and Gantt charts.
- Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
- Worked with ER/Studio for conceptual, logical, and physical data modeling and for generating DDL scripts.
- Handled performance requirements for databases in OLTP and OLAP models.
- Analyzed the data consuming the most resources and made back-end code changes using PL/SQL stored procedures and triggers.
- Performed data completeness, correctness, data transformation and data quality testing using SQL.
- Involved in designing Business Objects universes and creating reports.
- Conducted design walkthrough sessions with the Business Intelligence team to ensure that reporting requirements were met.
- Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area.
- Wrote UNIX shell scripts to invoke all the stored procedures, parse the data and load into flat files.
- Created reports analyzing a large-scale database using Microsoft Excel analytics within the legacy system.
Environment: SQL, PL/SQL, OLAP, OLTP, UNIX, MS Excel, T-SQL.
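The row-count and completeness checks described in this role can be illustrated with a short Python sketch; an in-memory SQLite database stands in for the actual source and warehouse systems, and the table and column names are hypothetical.

```python
# Minimal sketch: SQL-based validation (row-count reconciliation and NULL checks),
# using SQLite purely as a stand-in for the real source/warehouse databases.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in source and warehouse tables with a deliberate discrepancy.
cur.executescript("""
    CREATE TABLE src_customers (id INTEGER, email TEXT);
    CREATE TABLE dw_customers  (id INTEGER, email TEXT);
    INSERT INTO src_customers VALUES (1, 'a@x.com'), (2, 'b@x.com'), (3, NULL);
    INSERT INTO dw_customers  VALUES (1, 'a@x.com'), (2, 'b@x.com');
""")

# Row-count reconciliation between source and warehouse.
src_count = cur.execute("SELECT COUNT(*) FROM src_customers").fetchone()[0]
dw_count = cur.execute("SELECT COUNT(*) FROM dw_customers").fetchone()[0]
print(f"source={src_count}, warehouse={dw_count}, missing={src_count - dw_count}")

# Completeness check: flag NULL emails in the source feed.
null_emails = cur.execute(
    "SELECT COUNT(*) FROM src_customers WHERE email IS NULL"
).fetchone()[0]
print(f"source rows with NULL email: {null_emails}")

conn.close()
```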