Sr. Big Data Engineer Resume

Dunwoody, GA

SUMMARY

  • 7+ years of experience with a strong emphasis on the design, development, implementation, and deployment of software applications.
  • 5+ years of comprehensive experience in data engineering, with a strong emphasis on Big Data and Hadoop ecosystem frameworks.
  • Hands-on experience with Hadoop ecosystem components such as Spark and MapReduce (processing), HDFS (storage), Hive and Impala (analytical querying), YARN, Sqoop, HBase, Oozie, and Kafka.
  • Strong knowledge of multiple programming languages, with expertise in Java, Scala, and Python.
  • Extensive experience writing end-to-end Spark applications in both Scala and Python using Spark RDDs, DataFrames, Spark SQL, and Spark Streaming.
  • Good experience troubleshooting long-running Spark jobs and tuning performance bottlenecks.
  • Good experience building real-time streaming pipelines with Kafka, using Spark Streaming to consume the streams.
  • Experience working with both Hadoop distributions (CDH, HDP) and cloud services, primarily AWS.
  • Solid experience building data pipelines with native AWS services such as S3, EMR, Athena, Glue, Redshift, and AWS SWF.
  • Experience working with NoSQL databases like HBase and Cassandra.
  • Good experience manipulating data with Python scripts and developing Python scripts for automation.
  • Experience developing MapReduce jobs in Java for data cleaning and pre-processing.
  • Expertise in writing Hive scripts and extending their functionality with user-defined functions (UDFs).
  • Expertise in modeling data efficiently in Hive tables using partitioning and bucketing.
  • Expertise in building interactive data visualizations in Tableau from multiple data sources.
  • Expertise in Object-Oriented Analysis and Design (OOAD), including UML, and in applying various design patterns.
  • Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, AJAX, jQuery, XML, and HTML.
  • Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, data structures, and serialization.
  • Extensive experience in Java and J2EE technologies such as Servlets, JSP, JSF, JDBC, JavaScript, Ext JS, Spring, Hibernate, and JUnit testing.
  • Performed unit testing with the JUnit framework and used Log4j to monitor error logs.
  • Experience in process improvement, normalization/denormalization, and data extraction, cleansing, and manipulation.
  • Expertise in working with transactional databases such as Oracle, SQL Server, MySQL, and DB2.
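The Hive partitioning and bucketing listed above can be illustrated with a minimal Python sketch, assuming a hypothetical table partitioned by a `dt` column and bucketed by `customer_id` into four buckets. Each partition becomes a `dt=<value>` directory, and each row lands in a bucket chosen by hashing the bucket column modulo the bucket count. CRC32 is used here only for illustration; Hive applies its own hash function, so real bucket numbers will differ.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical, like CLUSTERED BY (customer_id) INTO 4 BUCKETS

def bucket_for(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Illustrative hash only; Hive uses its own hash function, not CRC32.
    return zlib.crc32(value.encode("utf-8")) % num_buckets

def layout(rows, partition_col, bucket_col):
    """Map each row to a (partition directory, bucket number) pair."""
    files = defaultdict(list)
    for row in rows:
        part_dir = f"{partition_col}={row[partition_col]}"
        files[(part_dir, bucket_for(str(row[bucket_col])))].append(row)
    return dict(files)

# Hypothetical sample rows for a sales-style table.
rows = [
    {"dt": "2020-01-01", "customer_id": "c1", "amount": 10},
    {"dt": "2020-01-01", "customer_id": "c2", "amount": 5},
    {"dt": "2020-01-02", "customer_id": "c1", "amount": 7},
]
for (part_dir, bucket), part_rows in sorted(layout(rows, "dt", "customer_id").items()):
    print(part_dir, bucket, len(part_rows))
```

Because partition values become directory names, a filter such as `WHERE dt = '2020-01-01'` lets the engine skip the other directories, while a consistent hash keeps every row for a given key in the same bucket number across partitions, which bucketed joins can take advantage of.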

TECHNICAL SKILLS

Big Data Ecosystem: Spark, PySpark, MapReduce, Hive, HDFS, Sqoop, HBase, Flume, Oozie, Impala, Kafka, NiFi, Airflow, Databricks

Languages: Java, Scala, Python

NoSQL: HBase, Cassandra, MongoDB

Databases: MySQL, Teradata, Oracle

IDEs: Eclipse, IntelliJ, PyCharm

Other Tools: Maven, Jenkins, PuTTY, WinSCP, Jira, Confluence

Version Control: GitHub, SVN, CVS

Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, Dunwoody, GA

Sr. Big Data Engineer

Responsibilities:

  • Worked on development, testing and deployment of Spark applications.
  • Worked on troubleshooting and fine-tuning Spark jobs.
  • Worked on building real time pipelines using Kafka and Spark Streaming.
  • Wrote Kafka producers using the Kafka Producer API and integrated them with Spark Streaming applications that consume the stream messages.
  • Automated the data pipelines and ensured their reliability.
  • Used Spark JDBC readers to connect to external databases and pull data into the S3 data lake.
  • Used Spark JDBC writers to connect to Redshift and write processed DataFrames to it.
  • Used Hive scripting to produce custom and ad hoc datasets requested by downstream business teams.
  • Evaluated Databricks on AWS as an option to replace EMR clusters and to create data pipelines using Databricks notebooks.
  • As part of the evaluation, built working POCs for connecting Databricks clusters to external metastores and for using Databricks Utilities.
  • Used the AWS Glue metastore service to store all Hive metadata.
  • Used Amazon Athena, an interactive query service, for data analysis.
  • Automated launching EMR Spark clusters with the AWS Java SDK and terminating them once the step finished.
  • Created Spring Boot-based REST applications that expose metadata and previews of the processed data to downstream application teams.
  • Automated CI/CD builds and deployments using Jenkins.

Environment: AWS EMR, Spark, HDFS, Python, Hive, HBase, HiveQL, Sqoop, Java, Scala, Unix, IntelliJ, Autosys, Maven

Confidential, Bloomfield, CT

Sr. Big Data Developer

Responsibilities:

  • Contributed to the roadmap for migrating enterprise data from multiple sources, such as SQL Server and provider databases, into S3, which serves as a centralized data hub across the organization.
  • Loaded and transformed large sets of structured and semi-structured data from various downstream systems.
  • Developed ETL pipelines using Spark and Hive for performing various business-specific transformations.
  • Built data applications and automated Spark pipelines for both bulk and incremental loads of various datasets.
  • Worked closely with data science teams and business consumers to shape the datasets to their requirements.
  • Automated the data pipeline to ETL all datasets with both full and incremental loads.
  • Performed bulk loads of JSON data from an S3 bucket into Snowflake.
  • Used Snowflake functions to parse semi-structured data entirely with SQL statements.
  • Utilized AWS services such as EMR, S3, Glue Metastore, and Athena extensively for building the data applications.
  • Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets. Created a Lambda deployment function and configured it to receive events from the S3 bucket.
  • Built input adapters in Apache Spark for data dumps from FTP servers.
  • Generated data models using Erwin 9.6, developed a relational database system, and performed logical modeling using dimensional modeling techniques such as star schema and snowflake schema.
  • Wrote Spark applications to inspect, clean, load, and transform large sets of structured and semi-structured data.
  • Developed Spark applications in Scala and Spark SQL for testing and processing data.
  • Made Spark job statistics reporting, monitoring, and data quality checks available for each dataset.
  • Used SQL programming skills to work with relational SQL databases.
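The star schema modeling mentioned above can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_date`, `dim_product`) and sample rows are hypothetical: a central fact table holds measures and foreign keys, and each dimension table holds descriptive attributes that queries join on and group by.

```python
import sqlite3

def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a tiny hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20200101, 2020, 1), (20200201, 2020, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "books"), (2, "games")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(20200101, 1, 10.0), (20200101, 2, 5.0), (20200201, 1, 7.5)])

def sales_by_month_and_category(conn: sqlite3.Connection):
    # Typical star-schema query: join the fact table to each dimension, then aggregate.
    return conn.execute("""
        SELECT d.month, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON f.date_key = d.date_key
        JOIN dim_product p ON f.product_key = p.product_key
        GROUP BY d.month, p.category
        ORDER BY d.month, p.category
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by_month_and_category(conn))  # [(1, 'books', 10.0), (1, 'games', 5.0), (2, 'books', 7.5)]
```

A snowflake schema would further normalize the dimensions (for example, splitting `category` into its own table referenced by `dim_product`), trading an extra join for less duplication.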

Environment: AWS Cloud Services, Apache Spark, Spark-SQL, Unix, Kafka, Scala, SQL Server.

Confidential, NYC, NY

Big Data/Hadoop Engineer

Responsibilities:

  • Involved in importing and exporting data between Hadoop Data Lake and Relational Systems like Oracle, MySQL using Sqoop.
  • Developed Spark applications to perform ELT-style operations on the data.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and Spark SQL APIs.
  • Used Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
  • Created Hive external tables to perform ETL on data produced daily.
  • Validated the data ingested into Hive for further filtering and cleansing.
  • Developed Sqoop jobs for incremental loads from RDBMS into HDFS and applied further Spark transformations.
  • Loaded data into Hive tables from Spark using the Parquet columnar format.
  • Created Oozie workflows to automate and productionize the data pipelines.
  • Migrated MapReduce code into Spark transformations using Spark and Scala.
  • Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
  • Did a POC on GCP services, assessing the feasibility of migrating the on-prem setup to GCP and using services such as Dataproc, BigQuery, and Cloud Storage.
  • Documented operational problems in JIRA following standards and procedures.
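The Sqoop incremental loads mentioned above follow a high-water-mark pattern: each run pulls only rows whose check column exceeds the last value seen, then records the new maximum for the next run (Sqoop exposes this through `--incremental append`, `--check-column`, and `--last-value`). The sketch below reproduces the idea in Python with `sqlite3` rather than Sqoop itself; the `orders` table and the `incremental_extract` helper are hypothetical.

```python
import sqlite3

def incremental_extract(conn, table, check_column, last_value):
    """Fetch rows whose check_column exceeds last_value; return (rows, new_last_value).

    Table and column names are interpolated directly, so they must be trusted
    identifiers; only last_value is passed as a bound parameter.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    rows = cur.fetchall()
    if not rows:
        return rows, last_value  # nothing new; keep the old high-water mark
    idx = [col[0] for col in cur.description].index(check_column)
    return rows, max(row[idx] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

rows, last = incremental_extract(conn, "orders", "id", 0)     # initial load: ids 1 and 2
conn.execute("INSERT INTO orders VALUES (3, 30.0)")           # a new row arrives upstream
rows, last = incremental_extract(conn, "orders", "id", last)  # picks up only id 3
print(rows, last)  # [(3, 30.0)] 3
```

Append mode like this only captures new rows; picking up updates to existing rows would need a timestamp-style check column (Sqoop's `lastmodified` mode) instead.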

Environment: Hadoop, Hive, Impala, Oracle, Spark, Pig, Sqoop, Oozie, Map Reduce, GIT, Confluence, Jenkins.

Confidential

Hadoop Developer

Responsibilities:

  • Imported data from Microsoft SQL Server, MySQL, and Teradata into HDFS using Sqoop.
  • Developed Oozie workflows to automate loading data into HDFS.
  • Used Hive to analyze partitioned and bucketed data and compute various reporting metrics.
  • Created Hive tables, loaded data, and wrote queries that run internally as MapReduce jobs.
  • Created Hive external tables for HDFS data.
  • Resolved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Used Spark for transformations, event joins, and some aggregations before storing the data in HDFS.
  • Troubleshot and resolved data quality issues and maintained a high level of accuracy in the reported data.
  • Analyzed large datasets to determine the optimal way to aggregate them.
  • Worked on Oozie workflows to run multiple Hive and Pig jobs.
  • Created custom Hive UDFs.
  • Developed automated shell scripts to execute Hive queries.
  • Processed ingested raw data using Apache Pig.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Worked with file formats such as JSON, Avro, ORC, and Parquet and compression codecs such as Snappy, zlib, and LZ4.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs.
  • Gained knowledge of creating Tableau dashboards for reporting analyzed data.
  • Worked extensively with NoSQL databases such as HBase.
  • Managed and reviewed Hadoop log files.
  • Used GitHub as the code repository and Jenkins for continuous integration.
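Converting a Hive aggregation into Spark RDD transformations usually means expressing `GROUP BY` as a map step that emits key/value pairs followed by a reduce-by-key step. The sketch below mimics that shape in plain Python so it runs without a cluster; `map_phase` and `reduce_by_key` are hypothetical stand-ins for `rdd.map(...)` and `rdd.reduceByKey(...)`, and the `page` field is an assumed example column.

```python
from collections import defaultdict
from functools import reduce
from operator import add

def map_phase(records):
    # Stand-in for rdd.map(lambda r: (r["page"], 1)): emit one (key, 1) pair per record.
    return [(r["page"], 1) for r in records]

def reduce_by_key(pairs, func):
    # Stand-in for rdd.reduceByKey(func): combine all values that share a key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}

# Equivalent of: SELECT page, COUNT(*) FROM logs GROUP BY page
logs = [{"page": "home"}, {"page": "cart"}, {"page": "home"}]
counts = reduce_by_key(map_phase(logs), add)
print(counts)  # {'home': 2, 'cart': 1}
```

In real Spark, `reduceByKey` also combines values on each partition before the shuffle, which is why it is generally preferred over grouping all values first and reducing afterwards.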

Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Shark, Spark, Oozie, MySQL, Eclipse, Git, GitHub, Jenkins.

Confidential

Java/J2EE Developer

Responsibilities:

  • Designed class and sequence diagrams in UML, as well as data flow diagrams.
  • Implemented an MVC architecture using the Struts framework for the Free Quote feature.
  • Designed and developed the front end using JSP, Struts (Tiles), XML, JavaScript, and HTML.
  • Used Struts tag libraries to create JSPs.
  • Implemented Spring MVC, dependency injection (DI), and aspect-oriented programming (AOP) features along with Hibernate.
  • Implemented navigation using Spring MVC.
  • Used Hibernate for object-relational mapping persistence.
  • Implemented message-driven beans that read messages from queues and forward them to the support team using MSend commands.
  • Experienced with Hibernate core interfaces such as Configuration, SessionFactory, Transaction, and Criteria.
  • Reviewed the requirements and took part in database design for new requirements.
  • Wrote complex SQL queries to perform various database operations using TOAD.
  • Used the JavaMail API to notify agents about free quotes and to email customers a promotion code for validation.
  • Tested using JUnit.
  • Developed the application in Eclipse and deployed it on WebSphere Application Server.
  • Used SVN for version control.
  • Used SVN for version control.

Environment: Java, Spring, Hibernate, JMS, Web Services, EJB, SQL, PL/SQL, HTML, CSS, JSP, JavaScript, Ant, JUnit, WebSphere.
