Sr. Big Data Engineer Resume
Dunwoody, GA
SUMMARY
- 7+ years of experience in the design, development, implementation, and deployment of software applications.
- 5+ years of comprehensive experience in data engineering, with a strong emphasis on big data and Hadoop ecosystem frameworks.
- Hands-on experience with Hadoop ecosystem components such as Spark and MapReduce (processing), HDFS (storage), Hive and Impala (analytical querying), YARN, Sqoop, HBase, Oozie, and Kafka.
- Strong knowledge of multiple programming languages, with expertise in Java, Scala, and Python.
- Extensive experience writing end-to-end Spark applications in both Scala and Python using Spark RDDs, DataFrames, Spark SQL, and Spark Streaming.
- Experienced in troubleshooting long-running Spark jobs and tuning performance bottlenecks.
- Good experience creating real-time streaming pipelines using Kafka, with Spark Streaming as the consumer.
- Experience with both Hadoop distributions (CDH, HDP) and cloud services, primarily AWS.
- Solid experience building data pipelines with native AWS services such as S3, EMR, Athena, Glue, Redshift, and AWS SWF.
- Experience with NoSQL databases such as HBase and Cassandra.
- Good experience manipulating data with Python scripts and developing Python scripts for automation.
- Experience developing MapReduce jobs in Java for data cleaning and preprocessing.
- Expertise in writing Hive scripts and extending their functionality with user-defined functions (UDFs).
- Expertise in modeling data efficiently in Hive tables using partitioning and bucketing.
- Expertise in building interactive data visualizations in Tableau from multiple data sources.
- Expertise in Object-Oriented Analysis and Design (OOAD), including UML and various design patterns.
- Experience with Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
- Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, data structures, and serialization.
- Extensive experience with Java and J2EE technologies such as Servlets, JSP, JSF, JDBC, JavaScript, Ext JS, Spring, Hibernate, and JUnit testing.
- Performed unit testing with the JUnit framework and used Log4j to monitor error logs.
- Experience in process improvement, normalization/denormalization, and data extraction, cleansing, and manipulation.
- Expertise in working with transactional databases such as Oracle, SQL Server, MySQL, and DB2.
TECHNICAL SKILLS
Big Data Ecosystem: Spark, PySpark, MapReduce, Hive, HDFS, Sqoop, HBase, Flume, Oozie, Impala, Kafka, NiFi, Airflow, Databricks
Languages: Java, Scala, Python
NoSQL: HBase, Cassandra, MongoDB
Databases: MySQL, Teradata, Oracle
IDEs: Eclipse, IntelliJ, PyCharm
Other Tools: Maven, Jenkins, PuTTY, WinSCP, Jira, Confluence
Version Control: GitHub, SVN, CVS
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Dunwoody, GA
Sr. Big Data Engineer
Responsibilities:
- Developed, tested, and deployed Spark applications.
- Troubleshot and fine-tuned Spark jobs.
- Built real-time pipelines using Kafka and Spark Streaming.
- Wrote Kafka producers using the Kafka Producer API and integrated them with Spark Streaming applications that consume the stream messages.
- Automated the data pipelines and ensured their reliability.
- Used Spark JDBC readers to connect to external databases and pull data into the S3 data lake.
- Used Spark JDBC writers to connect to Redshift and write the processed DataFrames to it.
- Used Hive scripting to produce custom and ad hoc datasets requested by downstream business teams.
- Evaluated Databricks on AWS as an option to replace EMR clusters and build data pipelines using Databricks notebooks.
- As part of the evaluation, built working POCs for connecting Databricks clusters to external metastores and using Databricks Utilities (dbutils).
- Used the AWS Glue Data Catalog as the metastore for all Hive metadata.
- Used Amazon Athena, AWS's interactive query service, for data analysis.
- Automated launching EMR Spark clusters using the AWS SDK for Java and terminating them once the step finished.
- Built Spring Boot REST applications that expose metadata and previews of the processed data to downstream application teams.
- Automated CI/CD builds and deployments using Jenkins.
Environment: AWS EMR, Spark, HDFS, Python, Hive, HBase, HiveQL, Sqoop, Java, Scala, Unix, IntelliJ, Autosys, Maven
Confidential, Bloomfield, CT
Sr. Big Data Developer
Responsibilities:
- Involved in developing a roadmap for migrating enterprise data from multiple sources, such as SQL Server and provider databases, into S3, which serves as a centralized data hub across the organization.
- Loaded and transformed large sets of structured and semi-structured data from various upstream source systems.
- Developed ETL pipelines using Spark and Hive for performing various business specific transformations.
- Built data applications and automated Spark pipelines for both bulk and incremental loads of various datasets.
- Worked closely with the data science team and business consumers to shape the datasets to their requirements.
- Automated the data pipeline to ETL all datasets with both full and incremental loads.
- Performed bulk loads of JSON data from an S3 bucket into Snowflake.
- Used Snowflake functions to parse semi-structured data entirely with SQL statements.
- Used AWS services such as EMR, S3, the Glue metastore, and Athena extensively to build the data applications.
- Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets. Created a Lambda deployment function and configured it to receive events from the S3 bucket.
- Built input adapters in Apache Spark for data dumps from FTP servers.
- Generated data models using Erwin 9.6, developed a relational database system, and performed logical modeling using dimensional modeling techniques such as star and snowflake schemas.
- Wrote Spark applications to inspect, clean, load, and transform large sets of structured and semi-structured data.
- Developed Spark applications in Scala and Spark SQL for processing and testing data.
- Made Spark job statistics reporting, monitoring, and data quality checks available for each dataset.
- Used SQL programming skills to work with relational SQL databases.
Environment: AWS Cloud Services, Apache Spark, Spark-SQL, Unix, Kafka, Scala, SQL Server.
Confidential, NYC, NY
Big Data/Hadoop Engineer
Responsibilities:
- Imported and exported data between the Hadoop data lake and relational systems such as Oracle and MySQL using Sqoop.
- Developed Spark applications to perform ELT operations on the data.
- Converted existing MapReduce jobs to Spark transformations and actions using Spark RDDs, DataFrames, and Spark SQL APIs.
- Used Hive partitioning and bucketing, and performed various kinds of joins on Hive tables.
- Created Hive external tables to perform ETL on data produced daily.
- Validated the data being ingested into Hive for further filtering and cleansing.
- Developed Sqoop jobs for performing incremental loads from RDBMS into HDFS and further applied Spark transformations
- Loaded data into Hive tables from Spark using the Parquet columnar format.
- Created Oozie workflows to automate and productionize the data pipelines.
- Migrated MapReduce code to Spark transformations using Spark and Scala.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Conducted a POC on GCP to assess the feasibility of migrating the on-premises setup to GCP and using services such as Dataproc, BigQuery, and Cloud Storage.
- Documented operational problems in JIRA following standards and procedures.
Environment: Hadoop, Hive, Impala, Oracle, Spark, Pig, Sqoop, Oozie, Map Reduce, GIT, Confluence, Jenkins.
Confidential
Hadoop Developer
Responsibilities:
- Imported data from Microsoft SQL Server, MySQL, and Teradata into HDFS using Sqoop.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Used Hive to analyze partitioned and bucketed data and compute various reporting metrics.
- Created Hive tables, loaded data, and wrote queries that run internally as MapReduce jobs.
- Involved in creating Hive External tables for HDFS data.
- Resolved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
- Used Spark for transformations, event joins and some aggregations before storing the data into HDFS.
- Troubleshot and resolved data quality issues and maintained a high level of accuracy in the reported data.
- Analyzed large datasets to determine the optimal way to aggregate them.
- Worked on the Oozie workflow to run multiple Hive and Pig jobs.
- Created custom Hive UDFs.
- Developed automated shell scripts to execute Hive queries.
- Involved in processing ingested raw data using Apache Pig.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked with file formats such as JSON, Avro, ORC, and Parquet, and compression codecs such as Snappy, zlib, and LZ4.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Gained knowledge of creating Tableau dashboards to report analyzed data.
- Worked with NoSQL databases such as HBase.
- Managed and reviewed Hadoop log files.
- Used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.
Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Shark, Spark, Oozie, MySQL, Eclipse, Git, GitHub, Jenkins.
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in designing Class and Sequence diagrams with UML and Data flow diagrams.
- Implemented an MVC architecture using the Struts framework to support the Free Quote feature.
- Designed and developed front end using JSP, Struts (tiles), XML, JavaScript, and HTML.
- Used Struts tag libraries to create JSP.
- Implemented Spring MVC, dependency injection (DI), and aspect-oriented programming (AOP) features along with Hibernate.
- Implemented navigation using Spring MVC.
- Used Hibernate for object-relational mapping persistence.
- Implemented message-driven beans to consume messages from queues and forward them to the support team using MSend commands.
- Worked with Hibernate core interfaces such as Configuration, SessionFactory, Transaction, and Criteria.
- Reviewed the requirements and was involved in database design for new requirements.
- Wrote complex SQL queries to perform various database operations using TOAD.
- Used the JavaMail API to notify agents about free quotes and to email customers a promotion code for validation.
- Involved in testing using Junit.
- Performed application development using Eclipse and Web Sphere Application Server for deployment.
- Used SVN for version control.
Environment: Java, Spring, Hibernate, JMS, Web Services, EJB, SQL, PL/SQL, HTML, CSS, JSP, JavaScript, Ant, JUnit, WebSphere.
