Sr. Big Data Engineer Resume Dunwoody, GA - Hire IT People

SUMMARY

Over 7 years of experience, with a strong emphasis on the design, development, implementation, and deployment of software applications.
Over 5 years of comprehensive experience in data engineering, with a strong emphasis on Big Data and Hadoop ecosystem frameworks.
Hands-on experience with Hadoop ecosystem components such as Spark and MapReduce (processing), HDFS (storage), Hive and Impala (analytical querying), YARN, Sqoop, HBase, Oozie, and Kafka.
Strong knowledge of several programming languages, with expertise in Java, Scala, and Python.
Extensive experience writing end-to-end Spark applications in both Scala and Python using Spark RDDs, Spark DataFrames, Spark SQL, and Spark Streaming.
Experienced in troubleshooting long-running Spark jobs and tuning them to remove performance bottlenecks.
Good experience creating real-time streaming pipelines with Kafka, using Spark Streaming to consume the streams.
Experience working with both Hadoop distributions (CDH, HDP) and cloud services, primarily AWS.
Solid experience using native AWS services such as S3, EMR, Athena, Glue, Redshift, and SWF to build data pipelines.
Experience working with NoSQL databases like HBase and Cassandra.
Good experience manipulating data with Python scripts and developing Python scripts for automation.
Experience in developing MapReduce jobs in Java for data cleaning and pre-processing.
Expertise in writing Hive scripts and extending their functionality with user-defined functions (UDFs).
Expertise in modeling data efficiently in Hive tables using partitioning and bucketing.
Expertise in preparing interactive data visualizations in Tableau from different data sources.
Expertise in Object-Oriented Analysis and Design (OOAD), including UML and the use of various design patterns.
Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, data structures, and serialization.
Extensive experience in Java and J2EE technologies such as Servlets, JSP, JSF, JDBC, JavaScript, ExtJS, Spring, Hibernate, and JUnit testing.
Performed unit testing with the JUnit framework and used Log4j to monitor error logs.
Experience in process improvement, normalization/denormalization, and data extraction, cleansing, and manipulation.
Expertise in working with transactional databases such as Oracle, SQL Server, MySQL, and Db2.

TECHNICAL SKILLS

Big Data Ecosystem: Spark, PySpark, MapReduce, Hive, HDFS, Sqoop, HBase, Flume, Oozie, Impala, Kafka, NiFi, Airflow, Databricks

Languages: Java, Scala, Python

NoSQL: HBase, Cassandra, MongoDB

Databases: MySQL, Teradata, Oracle

IDEs: Eclipse, IntelliJ, PyCharm

Other Tools: Maven, Jenkins, PuTTY, WinSCP, Jira, Confluence

Version Control: GitHub, SVN, CVS

Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, Dunwoody, GA

Sr. Big Data Engineer

Responsibilities:

Developed, tested, and deployed Spark applications.
Troubleshot and fine-tuned Spark jobs.
Built real-time pipelines using Kafka and Spark Streaming.
Wrote Kafka producers using the Kafka Producer API and integrated them with Spark Streaming applications that consume the stream messages.
Automated the data pipelines and ensured their reliability.
Used Spark JDBC readers to connect to external databases and pull data into the S3 data lake.
Used Spark JDBC writers to connect to Redshift and write the processed DataFrames to Redshift.
Used Hive scripting to produce custom and ad hoc datasets requested by downstream business teams.
Evaluated Databricks on AWS as a replacement for EMR clusters, creating data pipelines in Databricks notebooks.
As part of the evaluation, built working proofs of concept (POCs) for connecting Databricks clusters to external metastores and for using Databricks Utilities (dbutils).
Used the AWS Glue metastore to store all Hive metadata.
Used Amazon Athena, the AWS interactive query service, to perform data analysis.
Automated the launch of EMR Spark clusters using the AWS SDK for Java, terminating each cluster once its step finished.
Created Spring Boot REST applications that expose metadata and previews of the processed data to downstream application teams.
Automated CI/CD builds and deployments using Jenkins.

Environment: AWS EMR, Spark, HDFS, Python, Hive, HBase, HiveQL, Sqoop, Java, Scala, Unix, IntelliJ, Autosys, Maven
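Spark JDBC reads like the ones above are commonly parallelized with the JDBC source's `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options. The pure-Python sketch below is a simplified approximation, not Spark's actual code: it shows how such options could be turned into one WHERE predicate per partition. The function name `jdbc_partition_predicates` is hypothetical. As in Spark, the bounds only set the stride; the first and last ranges are open-ended, so no rows are filtered out.

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Split a numeric key range into per-partition predicates (simplified sketch)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    if num_partitions == 1:
        return [None]  # a single partition needs no WHERE clause
    stride = (upper - lower) // num_partitions
    predicates = []
    start = lower
    for i in range(num_partitions):
        end = start + stride
        if i == 0:
            # First range is open below and also picks up NULL keys.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last range is open above, so rows beyond `upper` are still read.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
        start = end
    return predicates


for predicate in jdbc_partition_predicates("id", 0, 100, 4):
    print(predicate)
```

Each predicate would back one parallel query against the source database, so a well-distributed partition column matters more than exact bounds.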

Confidential, Bloomfield, CT

Sr. Big Data Developer

Responsibilities:

Helped develop the roadmap for migrating enterprise data from multiple sources, such as SQL Server and provider databases, into S3, which serves as the organization's centralized data hub.
Loaded and transformed large sets of structured and semi-structured data from various source systems.
Developed ETL pipelines using Spark and Hive to perform business-specific transformations.
Built data applications and automated Spark pipelines for both bulk and incremental loads of various datasets.
Worked closely with the data science team and business consumers to shape the datasets to their requirements.
Automated the ETL pipeline for all datasets, covering both full and incremental loads.
Performed bulk loads of JSON data from S3 buckets into Snowflake.
Used Snowflake functions to parse semi-structured data entirely with SQL statements.
Used AWS services such as EMR, S3, the Glue metastore, and Athena extensively to build the data applications.
Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets. Created a Lambda deployment function and configured it to receive events from an S3 bucket.
Built input adapters in Apache Spark for data dumps from FTP servers.
Generated data models using Erwin 9.6, developed a relational database system, and performed logical modeling using dimensional modeling techniques such as star schema and snowflake schema.
Wrote Spark applications to inspect, clean, load, and transform large sets of structured and semi-structured data.
Developed Spark applications in Scala and Spark SQL for testing and processing data.
Made Spark job statistics, monitoring, and data quality checks available for each dataset.
Used SQL programming skills to work with relational databases.

Environment: AWS Cloud Services, Apache Spark, Spark-SQL, Unix, Kafka, Scala, SQL Server.
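The full versus incremental loads mentioned above are commonly driven by a high-water mark: a full load reads everything, and each later run reads only rows changed since the last recorded timestamp. The sketch below assumes each record carries a hypothetical `updated_at` field and that the watermark is persisted between runs; it illustrates the idea in plain Python and is not the pipeline's actual code.

```python
from datetime import datetime


def extract_batch(rows, last_watermark=None, ts_field="updated_at"):
    """Return (batch, new_watermark) for a full or incremental load."""
    if last_watermark is None:
        batch = list(rows)  # no watermark recorded yet: full load
    else:
        # Incremental load: only rows modified after the previous run.
        batch = [r for r in rows if r[ts_field] > last_watermark]
    # Advance to the newest timestamp seen; keep the old one if nothing new arrived.
    new_watermark = max((r[ts_field] for r in batch), default=last_watermark)
    return batch, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2021, 1, 1)},
    {"id": 2, "updated_at": datetime(2021, 1, 2)},
]
full, wm = extract_batch(source)  # first run: full load of both rows
source.append({"id": 3, "updated_at": datetime(2021, 1, 3)})
incremental, wm = extract_batch(source, wm)  # second run: only the new row
```

The strict `>` comparison can miss late-arriving rows that share the watermark timestamp, so pipelines often re-read a small overlap window and deduplicate.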

Confidential, NYC, NY

Big Data/Hadoop Engineer

Responsibilities:

Imported and exported data between the Hadoop data lake and relational systems such as Oracle and MySQL using Sqoop.
Developed Spark applications to perform ELT operations on the data.
Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and Spark SQL APIs.
Used Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
Created Hive external tables to perform ETL on data produced daily.
Validated the data ingested into Hive for further filtering and cleansing.
Developed Sqoop jobs to perform incremental loads from RDBMS into HDFS and then applied Spark transformations.
Loaded data into Hive tables from Spark using the Parquet columnar format.
Created Oozie workflows to automate and productionize the data pipelines.
Migrated MapReduce code to Spark transformations using Spark and Scala.
Collected and aggregated large amounts of log data using Apache Flume, staging it in HDFS for further analysis.
Completed a POC on GCP services to assess the feasibility of migrating the on-premises setup to GCP, using services such as Dataproc, BigQuery, and Cloud Storage.
Documented operational problems in JIRA, following standards and procedures.

Environment: Hadoop, Hive, Impala, Oracle, Spark, Pig, Sqoop, Oozie, MapReduce, Git, Confluence, Jenkins.
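Hive partitioning and bucketing, used above, lay data out in two ways: each partition column value becomes a `col=value` directory, and within a partition each row goes to one of N bucket files chosen by hashing the bucketing column. The sketch below models that layout in plain Python. It uses `zlib.crc32` as a stand-in hash, which is an assumption: Hive uses its own hash functions, so these bucket numbers will not match real Hive files. The function names are hypothetical.

```python
import zlib


def partition_path(row, partition_cols):
    """Build a Hive-style partition directory such as 'dt=2021-01-01/country=US'."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def bucket_for(key, num_buckets):
    """Pick a bucket with a stable hash; zlib.crc32 stands in for Hive's hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def prune(paths, col, value):
    """Partition pruning: keep only directories whose `col` equals `value`."""
    return [p for p in paths if f"{col}={value}" in p.split("/")]


rows = [
    {"dt": "2021-01-01", "country": "US", "user_id": 42},
    {"dt": "2021-01-02", "country": "US", "user_id": 7},
]
layout = [(partition_path(r, ["dt", "country"]), bucket_for(r["user_id"], 4)) for r in rows]
```

A query filtering on `dt` only needs to scan matching directories, and rows with the same `user_id` always land in the same bucket, which is what makes bucketed joins and sampling on that key efficient.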

Confidential

Hadoop Developer

Responsibilities:

Imported data from Microsoft SQL Server, MySQL, and Teradata into HDFS using Sqoop.
Developed Oozie workflows to automate loading data into HDFS.
Used Hive to analyze partitioned and bucketed data to compute various reporting metrics.
Created Hive tables, loaded data, and wrote queries that run internally as MapReduce jobs.
Created Hive external tables for HDFS data.
Resolved performance issues in Hive and Pig scripts by applying an understanding of joins, grouping, and aggregation and how they execute as MapReduce jobs.
Used Spark for transformations, event joins, and some aggregations before storing the data in HDFS.
Troubleshot and resolved data quality issues, maintaining a high level of accuracy in the reported data.
Analyzed large datasets to determine the optimal way to aggregate them.
Worked on Oozie workflows to run multiple Hive and Pig jobs.
Created custom Hive UDFs.
Developed automated shell scripts to execute Hive queries.
Processed ingested raw data using Apache Pig.
Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
Worked with file formats such as JSON, Avro, ORC, and Parquet, and compression codecs such as Snappy, zlib, and LZ4.
Converted Hive/SQL queries into Spark transformations using Spark RDDs.
Gained knowledge of creating Tableau dashboards to report analyzed data.
Expertise with NoSQL databases such as HBase.
Managed and reviewed Hadoop log files.
Used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.

Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Shark, Spark, Oozie, MySQL, Eclipse, Git, GitHub, Jenkins.
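Hive queries that run internally as MapReduce jobs follow a map, shuffle, reduce pattern. As a rough single-process illustration, not Hive's actual execution engine, the sketch below evaluates the equivalent of `SELECT region, SUM(amount) FROM sales GROUP BY region`; the table and column names are hypothetical.

```python
from collections import defaultdict


def map_phase(records, key_fn, value_fn):
    """Emit one (key, value) pair per input record."""
    for record in records:
        yield key_fn(record), value_fn(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))  # reducers see keys in sorted order


def reduce_phase(groups, reducer):
    """Apply the aggregate function to each key's list of values."""
    return {key: reducer(values) for key, values in groups.items()}


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
totals = reduce_phase(
    shuffle(map_phase(sales, lambda r: r["region"], lambda r: r["amount"])),
    sum,
)
# totals == {"east": 17, "west": 5}
```

Because the shuffle moves every mapped pair across the network, Hive and Pig tuning usually aims to shrink map output, for example by filtering early or using map-side aggregation.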

Confidential

Java/J2EE Developer

Responsibilities:

Designed class and sequence diagrams in UML, as well as data flow diagrams.
Implemented an MVC architecture using the Struts framework for the Free Quote feature.
Designed and developed the front end using JSP, Struts (Tiles), XML, JavaScript, and HTML.
Used Struts tag libraries to create JSPs.
Implemented Spring MVC, dependency injection (DI), and aspect-oriented programming (AOP) features along with Hibernate.
Implemented navigation using Spring MVC.
Used Hibernate for object-relational mapping and persistence.
Implemented message-driven beans that read messages from queues and forward them to the support team using MSend commands.
Worked with Hibernate core interfaces such as Configuration, SessionFactory, Transaction, and Criteria.
Reviewed the requirements and participated in database design for new requirements.
Wrote complex SQL queries to perform various database operations using TOAD.
Used the JavaMail API to notify agents about free quotes and to email customers a promotion code for validation.
Tested using JUnit.
Developed the application in Eclipse and deployed it on WebSphere Application Server.
Used SVN for version control.

Environment: Java, Spring, Hibernate, JMS, Web Services, EJB, SQL, PL/SQL, HTML, CSS, JSP, JavaScript, Ant, JUnit, WebSphere.
