
Senior AWS Data Engineer Resume


SUMMARY

  • 7+ years of overall experience as a Big Data Engineer, ETL Developer, and Python Developer, comprising the design, development, and implementation of data models for enterprise-level applications.
  • Good knowledge of technologies and frameworks that process large volumes of data in highly distributed environments on Cloudera and Hortonworks Hadoop distributions and Amazon AWS.
  • Strong knowledge of Spark architecture and its components; proficient with Spark Core, Spark SQL, and Spark Streaming, and skilled in building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing.
  • Excellent knowledge of Hadoop architecture and the daemons of Hadoop clusters, including the NameNode, DataNode, Resource Manager, Node Manager, and Job History Server.
  • Hands-on experience with Hadoop ecosystem components such as HDFS, Hive, Sqoop, HBase, Spark, Spark Streaming, Spark SQL, Oozie, Zookeeper, the MapReduce framework, YARN, and Scala.
  • Extensive experience working with NoSQL databases and their integrations, including DynamoDB, Cosmos DB, and HBase.
  • Experience designing Spark Streaming jobs that receive continuous data from Apache Kafka and store the streamed data in HDFS (see the streaming sketch after this list), and skill in using Spark SQL with data sources such as JSON, Parquet, and Hive.
  • Widely used the Spark DataFrames API on Cloudera to analyze data in Hive, and applied DataFrame operations to perform the required validations on the data.
  • Wrote complex HiveQL queries to extract the required data from Hive tables and developed Hive user-defined functions (UDFs) as needed.
  • Good knowledge of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the table-design sketch after this list).
  • Solid experience working with Tableau and enabling JDBC/ODBC connectivity from it to Hive tables.
  • Capable of converting Hive/SQL queries into Spark transformations using DataFrames and Datasets.
  • Created ETL pipelines from RDBMS sources to Hadoop using Sqoop, MapReduce, and Hive.
  • Strong understanding of NoSQL databases such as DynamoDB and HBase; worked with HBase to load and retrieve data for real-time processing using a REST API.
  • Good understanding of workflow scheduling and coordination tools/services such as Oozie, Zookeeper, Airflow, and Apache NiFi.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Experience with ETL operations using Apache Airflow DAGs and Informatica PowerCenter to load data into a data warehouse.
  • Skilled in using AWS services such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
  • Capable of working with Amazon EC2 to provide compute, query processing, and storage for a wide range of applications.
  • Proficient in using AWS S3 with data transfer over SSL and automatic encryption of uploaded data; skilled in using Amazon Redshift for large-scale database migrations.
  • Knowledge of migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, as well as controlling and granting database access and migrating on-premises databases to Azure Data Lake using Azure Data Factory.
  • Hands-on experience writing data pipelines on the Azure cloud.
  • Good experience in text analytics for various business problems and in producing data visualizations using Python and R.
  • Proficient in Python scripting; worked in depth with NumPy, visualization with Matplotlib, and Pandas for organizing data.
  • Loaded structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
  • Wrote Python scripts to parse JSON records and load the data into the database, and Python routines to log in to websites and fetch data for selected options.
  • Used PySpark to perform data transformations and actions on DataFrames.
  • Used PySpark and Spark SQL to create a Spark application and applied transformations according to business rules.
  • Proficient in building interactive analysis, batch processing, and stream processing applications in PySpark.
  • Good experience writing scripts using the Python, PySpark, and Spark APIs to analyze data.
  • Good programming skills at a high level of abstraction using Python and PySpark.
  • Expertise in configuring and using Jenkins (CI/CD pipelines), Docker, Git, and Kubernetes, and in Kafka installation.
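
A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS pattern described above. The broker address, topic name, and HDFS paths are illustrative placeholders, and the spark-sql-kafka connector package is assumed to be on the cluster's classpath.

```python
# Consume a Kafka topic and persist the raw stream to HDFS as Parquet.
# Requires the spark-sql-kafka-0-10 package to be available to Spark.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-stream")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "events")                      # placeholder topic
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/events")            # placeholder output path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```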
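
A short Spark SQL sketch of the managed and external partitioned Hive table design mentioned above; the table names, columns, and HDFS location are hypothetical.

```python
# Create a managed partitioned table and an external table, then load the
# managed table with a dynamic-partition insert.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-table-design")
         .enableHiveSupport()
         .getOrCreate())

# Managed table, partitioned by date so queries can prune partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_managed (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
""")

# External table over files already in HDFS; dropping it leaves the data in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        order_id   STRING,
        amount     DOUBLE,
        order_date STRING
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///data/raw/sales'
""")

# Dynamic-partition load from the external table into the managed one.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_managed PARTITION (order_date)
    SELECT order_id, amount, order_date FROM sales_raw
""")
```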

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Hadoop, MapReduce, Sqoop, Hive, Oozie, Spark, Zookeeper, and Cloudera Manager.

NoSQL Databases: HBase, DynamoDB.

Monitoring and Reporting: Power BI, Tableau, custom shell scripts.

Hadoop Distribution: Hortonworks, Cloudera.

Application Servers: Apache Tomcat, JDBC.

Build Tools: Maven

Programming & Scripting: Python, Scala, SQL, Shell Scripting.

Databases: Oracle, MySQL, Teradata

Version Control: Git, Bitbucket

IDE Tools: Eclipse, Jupyter, PyCharm.

Operating Systems: Linux, Unix, Ubuntu, CentOS, Windows

Cloud: AWS, Azure.

Cluster Managers: Docker, Kubernetes

Development methods: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential

Senior AWS Data Engineer

Responsibilities:

  • Developed an end-to-end ETL data pipeline that takes data from Surge and loads it into the RDBMS using Spark.
  • Had prior experience with the Hadoop framework, HDFS, and MapReduce processing.
  • Worked on a live-node Hadoop cluster running the Cloudera distribution (CDH 5.9) as well as cloud-deployed persistent AWS EMR clusters, and configured the clusters.
  • Created SSIS (ETL) packages to extract data from a variety of heterogeneous data sources, including an Access database, Excel spreadsheets, and flat files, and to maintain the data in SQL Server.
  • Developed data-load functions that read the schema of the input data and load the data into a table.
  • Wrote Scala applications that run on an Amazon EMR cluster, fetch data from the Amazon S3 data lake location, and queue it in an Amazon SQS (Simple Queue Service) queue.
  • Worked with Spark SQL to analyze and apply transformations to DataFrames created from the SQS queue, load them into database tables, and query them.
  • Worked with Amazon S3 to persist the transformed Spark DataFrames in S3 buckets and used S3 as the data lake for the data pipeline running on Spark and MapReduce.
  • Developed logging functions in Scala that store pipeline logs in Amazon S3 buckets.
  • Developed email reconciliation reports for ETL loads in Scala using Java libraries within the Spark framework.
  • Created and refactored Airflow DAGs to automate data validation and data-quality (DQ) checks.
  • Used Apache Airflow to author, schedule, and monitor data pipelines, and designed several DAGs (directed acyclic graphs) to automate ETL pipelines (see the DAG sketch after this list).
  • Worked on AWS CloudFormation templates (CFTs), creating a template and stack to provision the EMR cluster.
  • Worked on AWS SNS, with AWS Lambda subscribed to the topic so that an SNS alert is raised when data reaches the lake.
  • Added steps to the EMR cluster within the bootstrap actions from AWS Lambda.
  • Worked with AWS SQS to add and poll messages from AWS S3.
  • Used Hive and Linux scripts to implement data integrity and quality checks in Hadoop.
  • Wrote Hive queries and functions for evaluating, filtering, loading, and storing data.
  • Imported data from various sources, such as HDFS and HBase, into Spark RDDs and used PySpark to perform computations and generate the output response.
  • Used Python and PySpark to ingest large amounts of data from multiple sources (AWS S3, Parquet files, APIs) and performed ETL operations (see the S3 ETL sketch after this list).
  • Exposure to Spark architecture and how RDDs work internally, processing data from local files, HDFS, and RDBMS sources by creating RDDs and optimizing them for performance.
  • Performed end-to-end architecture and implementation evaluations of different AWS services such as AWS EMR, Redshift, S3, Athena, Glue, and Kinesis.
  • Worked on fetching data from various source systems such as Hive, Amazon S3, and AWS Kinesis.
  • Spark Streaming gathers this data from AWS Kinesis in near real time, performs the necessary transformations and aggregations on the fly, and persists the data in a NoSQL store (HBase) to build the common learner data model.
  • Developed and implemented ETL pipelines on S3 Parquet files in a data lake using AWS Glue.
  • Enhanced scripts in existing Python modules and wrote APIs to load the processed data into HBase tables.
  • Extensively used accumulators and broadcast variables to tune Spark applications and monitor Spark jobs.
  • Tracked Hadoop cluster job performance and capacity planning; tuned Hadoop for high availability and assisted with Hadoop cluster recovery.
  • Used Tableau as a front-end BI tool and MS SQL Server as a back-end database to design and develop dashboards, workbooks, and complex aggregate calculations.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
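
A hedged sketch of the kind of Airflow DAG described above, chaining an ETL step and a data-quality check. The task callables, IDs, and schedule are illustrative placeholders rather than the actual pipeline definitions.

```python
# Illustrative Airflow DAG: run an ETL step, then a data-quality check.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_etl(**context):
    # Placeholder for triggering the Spark/EMR or Glue job.
    print("running ETL step")


def run_dq_checks(**context):
    # Placeholder for row-count and null-rate validations.
    print("running data-quality checks")


with DAG(
    dag_id="etl_with_dq_checks",          # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    etl = PythonOperator(task_id="run_etl", python_callable=run_etl)
    dq_checks = PythonOperator(task_id="run_dq_checks", python_callable=run_dq_checks)
    etl >> dq_checks
```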
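
A minimal PySpark sketch of the S3 Parquet ETL pattern referenced above: read from an S3 data lake, apply simple business-rule transformations, and write the result back partitioned by date. The bucket names and columns are hypothetical.

```python
# Read raw Parquet from S3, clean it, and write a curated, partitioned copy.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3-parquet-etl").getOrCreate()

# Placeholder bucket/prefix; "s3://" assumes an EMR-style filesystem (use "s3a://" elsewhere).
raw = spark.read.parquet("s3://example-datalake/raw/orders/")

cleaned = (raw
           .dropDuplicates(["order_id"])                      # hypothetical key column
           .filter(F.col("amount") > 0)                       # simple business rule
           .withColumn("order_date", F.to_date("order_ts")))

(cleaned.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3://example-datalake/curated/orders/"))           # placeholder output prefix
```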

Environment: Scala, Spark 2.3, Spark SQL, EMR, AWS SQS, AWS SNS, CFT, AWS Lambda, SQL, Java, Cassandra, MySQL, Hive, HDFS, Teradata, Tableau, Zookeeper, HBase, NiFi, Agile, NoSQL, Pig.

Confidential

Azure Data Engineer

Responsibilities:

  • Developed and implemented Azure Data Factory pipelines for various groups, providing serverless data integration from a range of data sources.
  • Used Azure Databricks to process the data, which was then ingested into Azure services such as Azure Data Lake, Azure Data Lake Analytics, and Azure SQL Database.
  • Used Terraform to provision data factories, storage accounts, and access to the key vault.
  • Used ADF and PySpark with Databricks to create pipelines, data flows, and complex data transformations and manipulations.
  • Used Data Factory, Databricks, SQL DB, and SQL Data Warehouse to implement both ETL and ELT architectures in Azure.
  • Created linked services to Snowflake, Blob Storage, and SQL Database.
  • Used DataFrames and Spark SQL to develop Databricks PySpark scripts for transforming and loading the data to different targets (see the notebook sketch after this list).
  • Created Python Databricks notebooks to handle large amounts of data, transformations, and computations.
  • Used Databricks extensively to access, process, transform, and analyze large amounts of data.
  • Working knowledge of Python programming, including a variety of packages such as NumPy, Matplotlib, SciPy, and Pandas.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, aggregation, and analysis across multiple file formats.
  • Utilized Python/PySpark in Databricks notebooks when creating ETL pipelines in Azure Data Factory.
  • Used a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics to extract, transform, and load data from source systems to Azure data storage services.
  • Worked with Azure Blob Storage, ADLS Gen1 and Gen2, and other data storage options.
  • Experience using Azure Data Factory to bulk-import data from CSV, XML, and flat files.
  • Used Azure DevOps tools to completely automate the CI/CD process.
  • Developed the Azure Data Factory pipelines in collaboration with product owners and architects.
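
A minimal Databricks-style PySpark sketch of the read-transform-load pattern described above, assuming the cluster is already configured with credentials for an ADLS Gen2 account; the storage account, containers, and columns are hypothetical.

```python
# Read CSV from ADLS Gen2, apply light transformations, and write Parquet back.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls-transform").getOrCreate()

source_path = "abfss://raw@examplestorage.dfs.core.windows.net/customers/"      # placeholder
target_path = "abfss://curated@examplestorage.dfs.core.windows.net/customers/"  # placeholder

customers = (spark.read
             .option("header", "true")
             .csv(source_path)
             .dropDuplicates(["customer_id"])        # hypothetical key column
             .withColumn("load_date", F.current_date()))

customers.write.mode("overwrite").parquet(target_path)
```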

Environment: Azure, Terraform, Snowflake, Azure Databricks, Azure data lake, Azure Blob, SQL, T-SQL, Spark-SQL, Python, NumPy, Matplotlib, SciPy, Pandas, YAML.

Confidential

Hadoop Developer

Responsibilities:

  • Built scalable distributed data solutions using Hadoop.
  • Maintained the cluster: added and removed cluster nodes, monitored and troubleshot the cluster, and managed and reviewed data backups and log files.
  • Worked hands-on with the ETL process.
  • Responsible for running Hadoop streaming jobs to process terabytes of XML data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Loaded data from the UNIX file system into HDFS.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially implemented in Python (PySpark); see the sketch after this list.
  • Used Spark SQL to load data, created schema RDDs on top of the data that load into Hive tables, and handled structured data using Spark SQL.
  • Worked on a Hadoop cluster using different big data analytic tools, including Hive, HBase, Zookeeper, Sqoop, Spark, and Kinesis.
  • Created Hive tables, loaded data, and wrote Hive queries.
  • Analyzed data using the Hadoop components Hive and Impala.
  • Created partitioned and bucketed tables in Hive.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, YARN, DataFrames, pair RDDs, and PySpark.
  • Wrote MapReduce programs over log data to transform it into structured form and derive user location, time spent, and age group.
  • Used Sqoop to extract the data from Teradata into HDFS.
  • Used Sqoop to export the analyzed patterns back to Teradata.
  • Implemented unit testing and integration testing.
  • Set up the Oozie workflow engine to run several Hive jobs that execute independently based on time and data availability.
  • Managed and reviewed Hadoop log files.
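
A hedged sketch of the MapReduce-to-PySpark migration pattern mentioned above: a classic key/count aggregation rewritten as RDD transformations and saved to a Hive table. The input path, record layout, and table name are illustrative.

```python
# Count events per user from tab-separated log records and save to Hive.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mapreduce-to-pyspark")
         .enableHiveSupport()
         .getOrCreate())
sc = spark.sparkContext

# Assumes each line is a tab-separated record whose first field is a user id.
counts = (sc.textFile("hdfs:///data/logs/")              # placeholder input path
          .map(lambda line: line.split("\t"))
          .map(lambda fields: (fields[0], 1))
          .reduceByKey(lambda a, b: a + b))

(counts.toDF(["user_id", "event_count"])
 .write.mode("overwrite")
 .saveAsTable("user_event_counts"))                      # hypothetical Hive table
```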

Environment: Hadoop, MapReduce, Hive, Sqoop, Oozie, Flume, Spark, YARN, RDDs, Spark SQL, UNIX.

Confidential

Jr Data Analyst

Responsibilities:

  • Worked as part of a team to logically and physically model the transactional database platform for baseline security configuration monitoring, using standard data modeling techniques such as normalization.
  • Created pivot tables and charts in Excel to display and report logistic rules, and implemented complex metrics and transformed them with Excel formulas to allow for better data slicing and dicing.
  • Used Excel to create daily ad hoc reports that show the privacy impact assessment (PIA) and risk assessment.
  • Used Python and other programming languages to create and improve a machine learning pipeline (see the sketch after this list).
  • Developed Power BI dashboards for the company that display risk metrics used to minimize organizational risks.
  • Used Power BI to create multiple KPIs; senior managers monitor these KPIs to assess performance across all departments and identify any possible threats to the company.
  • Conducted peer reviews of SQL scripts and dashboard designs in Power BI.
  • Framed a business challenge as a modeling problem and developed machine learning models to increase business value.
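
A minimal sketch of the kind of machine learning pipeline mentioned above, using scikit-learn; the CSV file, feature columns, and target are hypothetical placeholders.

```python
# Train and evaluate a simple classification pipeline on tabular risk data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("risk_assessments.csv")                      # placeholder dataset
X = df[["metric_a", "metric_b", "metric_c"]]                  # hypothetical features
y = df["high_risk"]                                           # hypothetical binary target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```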

Environment: Excel, Python, SQL, Power BI.

Confidential

Software Engineer

Responsibilities:

  • Participated in the analysis, design, and development phases of the Software Development Life Cycle (SDLC).
  • Created a web application using Java EE and an Oracle database.
  • Designed and developed server-side J2EE components and used the Struts framework so the web application follows the MVC architecture.
  • Wrote SQL queries and sequences for the backend Oracle database.
  • Developed the front-end UI using HTML, CSS, JSP, Struts, and AngularJS, and session validation using Spring AOP.
  • Used Java multithreading to implement batch jobs with JDK 1.5 features and deployed them on the JBoss server.
  • Documented all of the work done following the traditional method.

Environment: Java/J2EE, Spring, Oracle, Linux, JDBC, HTML, CSS, AngularJS, Struts 1.1, JSP, WebLogic, CVS, Eclipse.
