
Sr. Data Engineer (AWS) Resume


Plano, TX

SUMMARY

  • Over 6.5 years of professional IT experience as a Data Engineer building data pipelines using the Big Data Hadoop ecosystem, Spark, Amazon Web Services (AWS), data science concepts, Python, SQL, Tableau, GitHub and ETL tools.
  • Experience writing SQL queries (DDL and DML), creating databases, and writing stored procedures.
  • Experienced in multiple domains, including Finance, Retail, E-commerce and Healthcare.
  • Used Spark and Amazon Web Services to build scalable, fault-tolerant infrastructure that processes 15 TB of data per day, resulting in a 15% increase in the total number of users.
  • Experience with AWS data ingestion services, including Amazon Kinesis Firehose (fully managed real-time streaming into Amazon S3), AWS Snowball (bulk migration of on-premises storage and Hadoop clusters to Amazon S3), and AWS Storage Gateway (integration of on-premises data processing platforms with Amazon S3-based data lakes).
  • Proficient with the Hive data warehouse tool: creating tables, distributing data using partitioning and bucketing strategies, and writing and optimizing HiveQL queries.
  • Experience in the ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Apache Spark, Spark SQL and Spark Streaming.
  • Implemented advanced procedures such as text analytics, text mining and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Expertise in collecting, exploring, analyzing and visualizing data by building Tableau and Looker reports and dashboards, as well as Amazon QuickSight visualizations.
  • Experience in designing, developing and deploying projects on the AWS suite, covering use cases such as on-demand big data analytics, clickstream analysis, event-driven Extract, Transform, Load (ETL), smart applications and data warehousing.
  • Designed, tested and maintained data management and processing systems using Spark, AWS, Hadoop and shell scripting.
  • Built the infrastructure required for optimal extraction, transformation and loading of data from a wide variety of data sources using SQL, AWS and big data technologies.
  • Knowledge of AWS services such as EC2, S3, AWS Lambda, container services, Amazon VPC, Amazon RDS, DynamoDB, Redshift, Aurora, AWS Database Migration Service, Amazon DocumentDB, DAX, in-memory services, security services, and monitoring and analytics services.
  • Some experience in Azure development, working with Azure web applications, Azure Storage, Azure SQL Database, Virtual Machines, Azure Data Factory, HDInsight, Azure Search and Notification Hubs.

TECHNICAL SKILLS

Languages: Scala, Python, PySpark, SQL, Java, HiveQL, Shell Scripting

Big Data: Apache Spark, HDFS, YARN, Hive, Sqoop, MapReduce, Tez, Ambari, ZooKeeper, Data Warehousing

Databases: MS SQL Server 2016/2014, DB2, Oracle 12c/11g, Cassandra, Teradata, BigQuery, Druid

Cloud: Amazon Web Services (S3, EC2, Lambda, Athena, EMR, Redshift, Kinesis Data Analytics, Elasticsearch, QuickSight, Glue, Deep Learning AMI, SageMaker), Azure

Methodologies: Agile, Waterfall, UML, System Development Life Cycle (SDLC)

Data Science: ML/DL algorithms, TensorFlow, Keras.

Data Visualization Tools: Tableau, Amazon QuickSight, Looker, Power BI, Microsoft Excel (pivot tables, graphs, charts, dashboards)

O-R mapping: Hibernate, JPA

Tools: Automic, Hue, Looker, IntelliJ IDEA, Eclipse, PyCharm, Maven, ZooKeeper, VMware, PuTTY, DbVisualizer

Operating System: Windows, Linux, Mac

PROFESSIONAL EXPERIENCE

SR. DATA ENGINEER (AWS)

Confidential, PLANO, TX

RESPONSIBILITIES:

  • The objective of this project was to migrate all services from on-premises infrastructure to the cloud (AWS), including building a cloud-based data lake on Amazon S3 to serve as a single source of truth.
  • Provided meaningful and valuable information for better decision-making.
  • Migrated various data types, including streaming, structured and unstructured data from various sources, as well as legacy data.
  • Utilized AWS services focused on big data analytics, enterprise data warehousing and business intelligence solutions to ensure optimal architecture, scalability and flexibility.
  • Designed the AWS architecture and cloud migration, covering AWS EMR, DynamoDB, Redshift and event processing with AWS Lambda functions.
  • Built a NoSQL solution for unstructured data using Amazon DynamoDB.
  • Built data warehousing solutions for analytics and reporting using Amazon Redshift.
  • Developed Python programs to consume data from APIs as part of several data extraction processes and store the data in AWS S3.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Developed code to import data into and export data from HDFS and Hive using Sqoop.
  • Created Hive external tables to stage data, then moved the data from staging to main tables.
  • Wrote Hive join queries to fetch information from multiple tables, and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Migrated data from Hive to Presto for faster query execution and data retrieval.
  • Used Presto's built-in connectors for Redshift and Hive to prepare datasets for applying advanced analytics (ML) to specific use cases.
  • Optimized Presto SQL queries to improve query retrieval times.
  • Used query execution plans in Presto to tune the queries that serve as data sources for Tableau dashboards.
  • Designed the Redshift data model and worked on Redshift performance improvements that sped up query retrieval and improved the dependent reporting and analytics layers.
  • Developed ETL programs that move data from DynamoDB to Amazon Redshift using AWS Lambda functions written in Python, triggered by specific events depending on the use case.

ENVIRONMENT: AWS services (S3, DynamoDB, EMR, Redshift, Lambda); data engineering tools (HDFS, Sqoop, Hive, Tableau, ETL processing, SQL, Python).
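
The DynamoDB-to-Redshift Lambda step above might look roughly like the following minimal Python sketch. It is not the project's actual code and rests on several assumptions:
- the function is triggered by DynamoDB Streams, using the standard record shape with `eventName` and a `dynamodb` object holding `NewImage`/`Keys`;
- the columns in `COLUMNS` are hypothetical;
- uploading the resulting CSV to S3 and running a Redshift `COPY` happen in a separate step that is not shown.

```python
import csv
import io

# Hypothetical column order of a Redshift staging table; not taken from the project.
COLUMNS = ["order_id", "customer_id", "amount", "event_name"]


def unmarshal(attr):
    """Convert one DynamoDB-typed attribute such as {"S": "abc"} or {"N": "1.5"} to a Python value."""
    dtype, value = next(iter(attr.items()))
    if dtype == "N":
        # DynamoDB serializes numbers as strings; keep integers as int, everything else as float.
        return int(value) if value.lstrip("-").isdigit() else float(value)
    if dtype == "NULL":
        return None
    # S, BOOL and any other type are returned as-is; nested L/M types are not unpacked here.
    return value


def record_to_row(record):
    """Flatten one DynamoDB Streams record into a list ordered like COLUMNS."""
    ddb = record["dynamodb"]
    # REMOVE events have no NewImage, so fall back to the key attributes.
    image = ddb.get("NewImage") or ddb.get("Keys", {})
    row = {name: unmarshal(attr) for name, attr in image.items()}
    row["event_name"] = record["eventName"]
    return [row.get(col) for col in COLUMNS]


def lambda_handler(event, context):
    """Render the batch as CSV text for a later Redshift COPY from S3.

    Uploading the CSV to S3 and running COPY are assumed to happen in a separate step.
    """
    records = event.get("Records", [])
    buf = io.StringIO()
    writer = csv.writer(buf)
    for record in records:
        writer.writerow(record_to_row(record))
    return {"rows": len(records), "csv": buf.getvalue()}
```

Emitting CSV keeps the Lambda free of database connections, because Redshift bulk-loads most efficiently through `COPY` from S3 rather than through row-by-row inserts.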

SR. DATA ENGINEER

Confidential, CAMDEN, NJ

RESPONSIBILITIES:

  • Worked with analysis and management teams, supporting them according to their requirements.
  • Wrote PL/SQL scripts for data manipulation and validation, and created materialized views for remote instances.
  • Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
  • Created large datasets by combining individual datasets using inner and outer joins in SAS/SQL, and sorted and merged datasets using SAS/Base.
  • Developed live reports in drill-down mode to improve usability and user interaction.
  • Extensively worked on Shell scripts for running SAS programs in batch mode on Linux.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Used Python to extract weekly information from XML files.
  • Developed Python scripts to clean the raw data.
  • Used the AWS CLI to aggregate cleaned files in Amazon S3, and worked on Amazon EC2 clusters to deploy files into S3 buckets.
  • Used the AWS CLI with IAM roles to load data into the Redshift cluster.
  • Responsible for in-depth data analysis and creation of data extract queries in both Netezza and Teradata databases.
  • Extensive development on the Netezza platform using PL/SQL and advanced SQL.
  • Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell) and various reporting procedures.
  • Designed SSRS reports to create, execute and deliver tabular reports using shared and specified data sources; also debugged and deployed SSRS reports.
  • Optimized query performance by modifying SQL queries, establishing joins and creating clustered indexes.
  • Used Hive and Sqoop utilities and Oozie workflows for data extraction and data loading.
  • Developed routines to capture and report data quality issues and exceptional scenarios.
  • Created data mapping documents and data flow diagrams.
  • Generated dual-axis bar charts, pie charts and bubble charts with multiple measures, using data blending when merging different sources.
  • Developed dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to explore the data on the fly using quick filters for on-demand information.
  • Created dashboard-style reports using QlikView components such as list boxes, sliders, buttons, charts and bookmarks.
  • Coordinated with data architects and data modelers to create new schemas and views in Netezza that improved report execution time, and created optimized data mart reports.
  • Performed QA on the data and added data sources, snapshots and caching to the reports.
  • Involved in troubleshooting at the database level, error handling, and performance tuning of queries and procedures.

ENVIRONMENT: Data engineering tools (SQL, PL/SQL, Advanced Excel, SAS, HDFS, Sqoop, Hive, Tableau, ETL, Oozie); AWS services (CLI, IAM, EC2, S3, Lambda); QlikView.
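
The XML parse, clean and load scripts described above could follow a pattern like this minimal sketch, which uses only Python's standard library. The resume does not describe the real XML schema or target database, so several pieces are hypothetical:
- the `<record>`/`<account>`/`<amount>` layout;
- the `weekly_amounts` table;
- the validation rule that drops incomplete rows;
- in-memory SQLite, which stands in for the actual database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input layout; the real XML schema is not described in the resume.
SAMPLE_XML = """<records>
  <record week="2019-W01"><account>A100</account><amount>250.75</amount></record>
  <record week="2019-W01"><account>A200</account><amount> 99 </amount></record>
  <record week="2019-W02"><account></account><amount>10</amount></record>
</records>"""


def parse_records(xml_text):
    """Yield cleaned (week, account, amount) tuples, skipping rows that fail basic validation."""
    root = ET.fromstring(xml_text)
    for rec in root.iter("record"):
        account = (rec.findtext("account") or "").strip()
        amount_text = (rec.findtext("amount") or "").strip()
        if not account or not amount_text:
            continue  # drop incomplete rows instead of loading them
        yield rec.get("week"), account, float(amount_text)


def load(conn, rows):
    """Create the target table if needed and bulk-insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weekly_amounts (week TEXT, account TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO weekly_amounts VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for the real database
    load(conn, parse_records(SAMPLE_XML))
    for row in conn.execute("SELECT week, SUM(amount) FROM weekly_amounts GROUP BY week"):
        print(row)
```

Parsing is a generator, so cleaning and loading stream through `executemany` without first building the whole file's rows in memory.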

HADOOP DEVELOPER

Confidential, PA

RESPONSIBILITIES:

  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Worked with numerous file formats, including text, SequenceFiles, Avro, Parquet, ORC, JSON, XML and flat files, using MapReduce programs.
  • Extended the daily process to incrementally import data from DB2 and Teradata into Hive tables using Sqoop.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Resolved performance issues in Hive and Pig scripts by analyzing joins, grouping and aggregation, and how they translate into MapReduce jobs.
  • Hands-on experience with the Hadoop ecosystem (HDFS, MapReduce, HBase, Hive, Impala, Spark, Kafka, Kudu, Solr).
  • Stored data in Spark RDDs and performed in-memory computations to generate output matching the requirements.
  • Wrote Spark applications in Scala to perform data cleansing, validation, transformation and summarization according to the requirements.
  • Developed data pipelines using Spark, Hive and Sqoop to ingest, transform and analyze operational data.
  • Extensively used HiveQL to query data in Hive tables, and loaded data into HBase tables.
  • Worked extensively with partitions, dynamic partitioning and bucketed tables in Hive; designed both managed and external tables, and optimized Hive queries.
  • Designed Oozie workflows for job scheduling and batch processing.

ENVIRONMENT: Hadoop, Spark, Scala, Teradata, Hive, Pig, Impala, Sqoop, Oozie, SQL, DB2, Spark SQL.
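
The daily incremental import from DB2 and Teradata mentioned above relies on a high-watermark pattern. Sqoop exposes it through `--incremental append` with `--check-column` and `--last-value`. The following minimal Python sketch shows only that idea; it is not Sqoop itself or the project's job. It makes these assumptions:
- an in-memory SQLite table named `orders` stands in for the source;
- a Python list stands in for the Hive target;
- `incremental_import` is a hypothetical helper.

```python
import sqlite3


def incremental_import(source, table, check_column, last_value):
    """Return rows whose check column exceeds last_value, plus the new watermark.

    Mirrors the idea behind Sqoop's append mode (--incremental append,
    --check-column, --last-value); table and column names are trusted config here.
    """
    # Identifiers cannot be bound as SQL parameters, so they are interpolated directly.
    cur = source.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    names = [d[0] for d in cur.description]
    rows = cur.fetchall()
    idx = names.index(check_column)
    # Keep the old watermark when nothing new arrived.
    new_last = max((row[idx] for row in rows), default=last_value)
    return rows, new_last


if __name__ == "__main__":
    # Hypothetical source table standing in for DB2/Teradata.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

    staged = []  # stands in for the Hive target table
    batch, watermark = incremental_import(src, "orders", "id", 0)
    staged.extend(batch)

    src.execute("INSERT INTO orders VALUES (3, 30.0)")  # a new row lands in the source
    batch, watermark = incremental_import(src, "orders", "id", watermark)
    staged.extend(batch)
    print(staged, watermark)
```

Persisting the returned watermark between runs is what makes each daily run pick up only new rows. Sqoop's saved jobs store `--last-value` for the same reason.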

SR. SOFTWARE DEVELOPER

Confidential

RESPONSIBILITIES:

  • Used SOAP-based XML web services to transfer data to supply chain and domain-expertise monitoring systems.
  • Used Maven as the build tool for building JAR files, and used the Hibernate ORM framework to interact with the database.
  • Used the Struts Tiles framework for layout management, and worked on the design, analysis, development and testing phases of the application.
  • Developed named HQL queries and Criteria queries for use in the application, and developed the user interface using JSP and HTML.
  • Used JDBC for database connectivity. Involved in Java and Java EE web application projects to create fully integrated client management systems.
  • Consistently met deadlines as well as requirements for all production work orders.
  • Executed SQL statements to search for contractors based on criteria, and developed and integrated the application using the Eclipse IDE.
  • Involved in building, testing and debugging JSP pages, and in multi-tiered J2EE design using the Spring (IoC) architecture and Hibernate.
  • Involved in the development of front-end screens using technologies like JSP, HTML, AJAX and JavaScript.
  • Configured Spring-managed beans and used the Spring Security API to configure security.

ENVIRONMENT: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, JavaScript, PL/SQL, JUnit, AJAX, HQL, HTML, JDBC, Maven, Eclipse.
