Big Data Engineer Resume
Quincy, MA
PROFESSIONAL SUMMARY:
- Over 7 years of experience in software development, including 5 years in Big Data as a Hadoop/Spark developer, 2+ years in AWS, and 2+ years in analysis, design, development and testing of enterprise software applications using Java/J2EE technologies.
- Strong hands-on experience in developing and integrating big data applications, allowing for flexible and scalable data transformation with data quality controls.
- Experience working with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, HBase, Flume, Spark, Storm, Kafka, Oozie and Zookeeper.
- Excellent understanding of Hadoop architecture (YARN & MRv1) and experience in setting up, configuring and monitoring Hadoop clusters.
- Used the Spark API over Cloudera and HDP to perform analytics on data in Hive (a minimal sketch of this pattern follows this summary).
- Analyzed large data sets of structured, semi-structured and unstructured data using AWS Athena, HiveQL and Spark programs.
- Worked with data in various file formats including Avro, ORC and Parquet.
- Developed Spark code using Scala and Spark-SQL for data processing and testing.
- Experienced with Amazon AWS services such as EMR, EC2 and S3 buckets, which provide fast and efficient processing of Big Data.
- Experienced in Extraction, Transformation and Loading (ETL) processes based on business needs, using AWS Glue and Oozie to execute multiple Spark, Hive, Shell and SSH actions.
- Good understanding of NoSQL databases like HBase and Cassandra.
- Implemented Talend Big Data to load data into HDFS, S3 and Hive.
- Experienced in Cluster maintenance, commissioning and decommissioning data nodes, troubleshooting, managing & reviewing Hadoop log files.
- Solid understanding and extensive experience working with databases such as Oracle, SQL Server and MySQL, and writing stored procedures, functions, joins and triggers for different data models.
- Excellent Java development skills using J2EE, Servlets and JUnit, with familiarity with popular frameworks such as Spring and Hibernate.
- Extensive experience in PL/SQL, developing stored procedures with optimization techniques.
- Expertise in Waterfall and Agile - SCRUM methodologies.
- Excellent team player, with pleasant disposition and ability to lead a team.
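The following is a minimal, illustrative sketch of the Hive-backed Spark analytics and S3 output described above, written in PySpark for brevity (much of the actual work was in Scala); the database, table, column and bucket names are hypothetical placeholders.

```python
# Minimal PySpark sketch: query a Hive table with Spark SQL and write Parquet to S3.
# Database, table, column and bucket names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-analytics-sketch")
    .enableHiveSupport()          # lets Spark SQL read tables from the Hive metastore
    .getOrCreate()
)

# Aggregate a (hypothetical) transactions table stored in Hive.
daily_totals = spark.sql("""
    SELECT txn_date, count(*) AS txn_count, sum(amount) AS total_amount
    FROM sales.transactions
    GROUP BY txn_date
""")

# Persist the result to S3 as Parquet, partitioned by date.
(daily_totals.write
    .mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3a://example-bucket/analytics/daily_totals/"))
```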
TECHNICAL SKILLS:
Programming Skills: Python, Scala, Java, Shell Scripting.
Big Data Hadoop: MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Spark, Cassandra, Oozie, Flume, Ambari, Kafka, Mahout, Impala, Hue.
Amazon Web Services: EC2, S3, EMR, DynamoDB, Redshift, Athena, Glue.
Databases: Oracle, MySQL, SQL Server 2008, Apache Cassandra, MongoDB, HBase.
Operating Environment: Red Hat Linux 6/5.2, UNIX and IBM AIX 7.1.
Developer Tools: IntelliJ, Eclipse IDE, Toad, SQL Developer.
Frameworks: Struts, Hibernate, Spring MVC, Spring Core, Spring JDBC.
Tools: FileZilla, Visual Studio, Tableau, Zeppelin.
PROFESSIONAL EXPERIENCE:
Big Data Engineer
Confidential, Quincy, MA
Responsibilities:
- Redesigned and architected a new data ingestion and load solution for the Confidential DWH using Spark, Scala and Hive on a YARN cluster for performance improvement.
- Performed API calls using Python scripting and performed reads and writes to S3 using the Boto3 library (see the Boto3 sketch after this list).
- Working on a data lake creation project using Hive, Sqoop, Spark and AWS S3.
- Developed a custom parser in Spark/Scala for complex healthcare data and performed CDC; deployed this application on an AWS EMR cluster as well as on Hortonworks (HDP) cloud.
- Developed an Airflow DAG for daily incremental loads, which pulls data from Oracle and imports it into Hive tables using Sqoop (a sketch of such a DAG appears at the end of this section).
- Worked on the Scala code base for Apache Spark, performing transformations and actions on RDDs, DataFrames and Datasets using Spark SQL.
- Performed data ingestion using Sqoop, used HiveQL for data processing and scheduled complex workflows using Oozie.
- Used AWS EMR for processing ETL jobs and loading to S3 buckets, and AWS Athena for ad-hoc/low-latency querying of S3 data.
- Implemented bucketing in Hive and partitioning and dynamic partitions in Spark and Hive.
- Executed complex HiveQL queries for required data extraction from Hive tables.
- Strong familiarity with Hive joins; used HQL for querying the databases, eventually progressing to complex HiveQL.
- Developed Slowly Changing Dimensions (SCDs), populating the data to S3 using Spark/Scala.
- Worked with large databases and wrote complex SQL queries to get the right input files to process and analyze in the Hadoop environment.
- Developed Oozie workflows to ingest/parse the raw data, populate staging tables and store the refined data in partitioned Hive tables.
- Assisted in installing, configuring and maintaining a Hadoop cluster on the Hortonworks Data Platform with Hadoop tools like Spark, Hive, Pig, HBase, Zookeeper and Sqoop for application development.
- Actively monitor, research and analyze ways in which the services in AWS can be improved.
- Leveraged build tools such as SBT and Maven for building the Spark applications.
- Set up CI/CD for Spark applications using Jenkins and Git.
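Below is a minimal sketch of the Boto3 S3 reads/writes referenced above; the bucket names, keys and filtering step are hypothetical, and error handling is omitted for brevity.

```python
# Minimal Boto3 sketch: read an object from S3 and write a processed copy back.
# Bucket names, keys and the filtering rule are hypothetical placeholders.
import json
import boto3

s3 = boto3.client("s3")

# Read a JSON object from S3.
response = s3.get_object(Bucket="example-raw-bucket", Key="claims/2019/01/claims.json")
records = json.loads(response["Body"].read())

# Keep only records that carry a member id (illustrative cleanup step).
cleaned = [r for r in records if r.get("member_id")]

# Write the cleaned records back to a curated prefix.
s3.put_object(
    Bucket="example-curated-bucket",
    Key="claims/2019/01/claims_clean.json",
    Body=json.dumps(cleaned).encode("utf-8"),
)
```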
Environment: Spark, Hive, HDFS, Sqoop, AWS EC2, AWS EMR, AWS S3, Airflow, Hortonworks Data Platform, Maven, MySQL, AWS Athena, Agile-Scrum, Scala, Python, PuTTY, IntelliJ, Git.
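As noted in the list above, the daily incremental load could look like the minimal Airflow DAG sketched here; the connection string, table names and Sqoop options are hypothetical placeholders, not the actual project configuration.

```python
# Minimal Airflow DAG sketch: daily incremental Sqoop import from Oracle into Hive.
# The JDBC URL, credentials path, table names and Sqoop options are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

dag = DAG(
    dag_id="daily_oracle_to_hive_incremental",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
)

# Sqoop incremental import keyed on a last-modified column, loading straight into Hive.
sqoop_import = BashOperator(
    task_id="sqoop_incremental_import",
    bash_command=(
        "sqoop import "
        "--connect jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB "
        "--username etl_user --password-file /user/etl/.oracle_pwd "
        "--table CLAIMS --incremental lastmodified --check-column UPDATED_AT "
        "--last-value '{{ prev_ds }}' "
        "--hive-import --hive-table staging.claims"
    ),
    dag=dag,
)
```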
HADOOP/SPARK DEVELOPER
Confidential, Reston, VA
Responsibilities:
- Extensively migrated the existing architecture to Spark Streaming for live streaming of data (see the streaming sketch after this list).
- Executed Spark code written in Scala for Spark Streaming/SQL to process data faster.
- Developed Oozie Bundles to schedule Sqoop and Hive jobs to create data pipelines.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Python.
- Developed a Kafka producer for message handling (a producer sketch appears at the end of this section).
- Involved in importing real-time data into Hadoop using Kafka and implemented the corresponding Oozie job.
- Used the AWS CLI for data transfers to and from Amazon S3 buckets.
- Executed Hadoop/Spark jobs on AWS EMR against data stored in S3 buckets.
- Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it and created DataFrames in Spark to perform further analysis.
- Worked with data serialization formats, converting complex objects into sequences of bytes using ORC and Parquet.
- Worked on analyzing and examining customer behavioral data using HiveQL.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Involved in daily SCRUM meetings to discuss the development/progress.
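A minimal sketch of streaming ingestion from Kafka, written here with PySpark Structured Streaming purely for illustration (the actual work was Scala-based per the environment list); broker addresses, topic and output paths are hypothetical.

```python
# Minimal PySpark Structured Streaming sketch: read events from Kafka and land them on HDFS.
# Broker addresses, the topic name and output paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to a Kafka topic; requires the spark-sql-kafka package on the classpath.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the value to string for downstream parsing.
parsed = events.select(col("value").cast("string").alias("json_payload"))

# Write the raw payloads to HDFS as Parquet, with checkpointing for fault tolerance.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streaming/events/")
    .option("checkpointLocation", "hdfs:///checkpoints/events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```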
Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Oozie, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera Distribution, Maven, MySQL, AWS, Agile-Scrum, Scala, PuTTY, IntelliJ, Git.
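The producer mentioned in the list above is sketched below with the kafka-python client for illustration; broker addresses, the topic name and the sample event are hypothetical, and the production code may well have used the Java client instead.

```python
# Minimal Kafka producer sketch using the kafka-python client.
# Broker addresses, the topic name and the sample event are hypothetical placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",          # wait for full acknowledgement before a send is considered complete
    retries=3,
)

# Publish a sample event; in practice these would come from an upstream feed.
event = {"event_type": "page_view", "user_id": 12345, "ts": "2018-06-01T12:00:00Z"}
producer.send("user-events", value=event)

# Flush buffered messages before shutting down.
producer.flush()
producer.close()
```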
HADOOP/SPARK DEVELOPER
Confidential, Tampa, FL
Responsibilities:
- Used Sqoop to ingest data from the server into HDFS and Hive as part of data acquisition.
- Used Spark to remove missing data and apply data transformations to create new features in the pre-processing phase (see the sketch after this list).
- Used Hive and Impala to get insights about the customer data in the data exploration stage.
- Used Sqoop, Hive, Spark and Oozie for building data pipeline.
- Installed and configured Hadoop MapReduce and HDFS.
- Developed multiple MapReduce jobs in Java for data cleaning and processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented Spark RDD transformations to map the business analysis and applied actions on top of the transformations.
- Experienced in managing and reviewing Hadoop log files.
- Involved in configuring Hadoop ecosystem components like HBase, Hive, Pig and Sqoop.
- Used HIVE to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked with the business analyst team for gathering requirements and client needs.
- Hands-on experience loading data from the UNIX file system and Teradata into HDFS.
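A minimal sketch of the Spark pre-processing step described above; the input path, column names and the derived feature are hypothetical placeholders.

```python
# Minimal PySpark pre-processing sketch: drop missing values and derive a new feature.
# Input path, column names and the derived feature are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, datediff, current_date

spark = SparkSession.builder.appName("preprocessing-sketch").getOrCreate()

# Load raw customer records ingested earlier (e.g. via Sqoop into HDFS).
customers = spark.read.parquet("hdfs:///data/raw/customers/")

cleaned = (
    customers
    .dropna(subset=["customer_id", "signup_date"])   # remove rows missing key fields
    .withColumn("tenure_days", datediff(current_date(), col("signup_date")))  # new feature
    .filter(col("tenure_days") >= 0)                  # guard against bad dates
)

# Persist the engineered dataset for the exploration stage (Hive/Impala).
cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/customers_features/")
```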
Environment: Hadoop, MapReduce, Spark, Spark SQL, Flume, HDFS, Hive, Pig, Sqoop, Oozie, SQL, Scala, Java, Zookeeper, Shell script.
HADOOP DEVELOPER/ADMINISTRATOR
Confidential, Bentonville, AR
Responsibilities:
- Installed and configured Hive, Pig, Sqoop and Flume on the Hadoop cluster.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Processed logs and semi-structured content using Pig.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive scripts implementing dynamic partitions (see the sketch after this list).
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Used Sqoop to efficiently transfer data between databases and HDFS.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components like Hive and HBase.
- Involved in gathering the requirements, designing, development and testing.
- Worked on loading and transforming large sets of structured and semi-structured data into the Hadoop system.
- Debugged and identified issues with Hadoop jobs reported by QA.
- Worked on the Hue interface for querying the data.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
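A minimal sketch of a dynamic-partition load in Hive, shown here through the PyHive client purely for illustration (the original scripts would more likely have been plain .hql run via the Hive CLI or Beeline); database, table and column names are hypothetical.

```python
# Minimal sketch: Hive dynamic-partition insert, driven from Python via PyHive.
# Database, table and column names are hypothetical placeholders; the same HQL
# could equally be run as a .hql script from the Hive CLI or Beeline.
from pyhive import hive

conn = hive.connect(host="hive-server-host", port=10000, username="etl_user")
cursor = conn.cursor()

# Dynamic partitioning must be enabled for the session.
cursor.execute("SET hive.exec.dynamic.partition = true")
cursor.execute("SET hive.exec.dynamic.partition.mode = nonstrict")

# Load processed log records into a table partitioned by event date
# (the partition column comes last in the SELECT list).
cursor.execute("""
    INSERT OVERWRITE TABLE analytics.web_logs PARTITION (event_date)
    SELECT user_id, url, response_code, event_date
    FROM staging.raw_web_logs
""")

cursor.close()
conn.close()
```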
Environment: Hadoop (HDP 2.X), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, Flume, Python, Sqoop, MySQL, Tableau.
JAVA DEVELOPER
Confidential
Responsibilities:
- Designed and developed user interface using Struts tags, JSP, HTML and JavaScript.
- Developed the user-specific highlights (dashboard menu) section, home page, admin home page and user module (modify/search users, create-user screens with assignment of various roles) using the Spring MVC framework, Hibernate ORM module, Spring Core module, XML, JSP and XSLT.
- Implemented various UI components using jQuery, HTML and CSS.
- Implemented various services required for the business application.
- Involved in designing the user interfaces using HTML, CSS, and JSPs.
- Configured Hibernate and Spring to map the business objects to Oracle Database using XML configuration file.
- Wrote shell scripts to export Oracle table data into flat files, performed unit testing using JUnit, and used Log4j for logging and automated batch jobs.
- Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
Environment: Spring Core, Spring MVC, Hibernate, HTML, CSS, jQuery, AJAX, IBM WebSphere, Oracle, Eclipse, Maven, JIRA, Git.