Big Data Engineer Resume
Quincy, MA
PROFESSIONAL SUMMARY:
- Over 7 years of experience in software development, including 5 years in Big Data as a Hadoop/Spark developer, 2+ years in AWS, and 2+ years in analysis, design, development and testing of enterprise software applications using Java/J2EE technologies.
- Strong hands-on experience in developing and integrating big data applications, allowing for flexible and scalable data transformation with data quality controls.
- Experience working with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, HBase, Flume, Spark, Storm, Kafka, Oozie and Zookeeper.
- Excellent understanding of Hadoop architecture (YARN & MRv1) and experience in setting up, configuring and monitoring Hadoop clusters.
- Used the Spark API over Cloudera and HDP to perform analytics on data in Hive (a minimal sketch of this pattern follows this summary).
- Analyzed large data sets of structured, semi-structured and unstructured data using AWS Athena, HiveQL and Spark programs.
- Worked with data in various file formats including Avro, ORC and Parquet.
- Developed Spark code using Scala and Spark-SQL for data processing and testing.
- Experienced with Amazon AWS services such as EMR, EC2 and S3 buckets, which provide fast and efficient processing of Big Data.
- Experienced in Extraction, Transformation and Loading (ETL) processes based on business needs, using AWS Glue and Oozie to execute multiple Spark, Hive, Shell and SSH actions.
- Good understanding of NoSQL databases like HBase and Cassandra.
- Implemented Talend Big Data to load data into HDFS, S3 and Hive.
- Experienced in Cluster maintenance, commissioning and decommissioning data nodes, troubleshooting, managing & reviewing Hadoop log files.
- Solid understanding and extensive experience working with databases such as Oracle, SQL Server and MySQL, and writing stored procedures, functions, joins and triggers for different data models.
- Excellent Java development skills using J2EE, Servlets and JUnit, with familiarity with popular frameworks such as Spring and Hibernate.
- Extensive experience in PL/SQL, developing stored procedures with optimization techniques.
- Expertise in Waterfall and Agile - SCRUM methodologies.
- Excellent team player, with pleasant disposition and ability to lead a team.
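The following is a minimal, illustrative sketch of the Hive-backed Spark analytics and S3 output described above, written in PySpark for brevity (much of the actual work was in Scala); the database, table, column and bucket names are hypothetical placeholders.

```python
# Minimal PySpark sketch: query a Hive table with Spark SQL and write Parquet to S3.
# Database, table, column and bucket names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-analytics-sketch")
    .enableHiveSupport()          # lets Spark SQL read tables from the Hive metastore
    .getOrCreate()
)

# Aggregate a (hypothetical) transactions table stored in Hive.
daily_totals = spark.sql("""
    SELECT txn_date, count(*) AS txn_count, sum(amount) AS total_amount
    FROM sales.transactions
    GROUP BY txn_date
""")

# Persist the result to S3 as Parquet, partitioned by date.
(daily_totals.write
    .mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3a://example-bucket/analytics/daily_totals/"))
```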
TECHNICAL SKILLS:
Programming Skills: Python, Scala, Java, Shell Scripting.
Big Data Hadoop: MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Spark, Cassandra, Oozie, Flume, Ambari, Kafka, Mahout, Impala, Hue.
Amazon Web Services: EC2, S3, EMR, DynamoDB, Redshift, Athena, Glue.
Databases: Oracle, MySQL, SQL Server 2008, Apache Cassandra, MongoDB, HBase.
Operating Environment: Red Hat Linux 6/5.2, UNIX and IBM AIX 7.1.
Developer Tools: IntelliJ, Eclipse IDE, Toad, SQL Developer.
Frameworks: Struts, Hibernate, Spring MVC, Spring Core, Spring JDBC.
Tools: FileZilla, Visual Studio, Tableau, Zeppelin.
PROFESSIONAL EXPERIENCE:
Big Data Engineer
Confidential, Quincy, MA
Responsibilities:
- Redesigned and architected a new data ingestion and load solution for the Confidential DWH using Spark, Scala and Hive on a YARN cluster for performance improvement.
- Performed API calls using Python scripting and performed reads and writes to S3 using the Boto3 library (see the Boto3 sketch after this list).
- Working on a data lake creation project using Hive, Sqoop, Spark and AWS S3.
- Developed a custom parser in Spark/Scala for complex healthcare data and performed CDC; deployed this application on an AWS EMR cluster as well as on Hortonworks (HDP) cloud.
- Developed an Airflow DAG for daily incremental loads, which pulls data from Oracle and imports it into Hive tables using Sqoop (a sketch of such a DAG appears at the end of this section).
- Worked on the Scala code base for Apache Spark, performing transformations and actions on RDDs, DataFrames and Datasets using Spark SQL.
- Performed data ingestion using Sqoop, used HiveQL for data processing and scheduled complex workflows using Oozie.
- Used AWS EMR for processing ETL jobs and loading to S3 buckets, and AWS Athena for ad-hoc/low-latency querying of S3 data.
- Implemented bucketing in Hive and partitioning and dynamic partitions in Spark and Hive.
- Executed complex HiveQL queries for required data extraction from Hive tables.
- Strong familiarity with Hive joins; used HQL for querying the databases, eventually progressing to complex HiveQL.
- Developed Slowly Changing Dimensions (SCDs), populating the data to S3 using Spark/Scala.
- Worked with large databases and wrote complex SQL queries to get the right input files to process and analyze in the Hadoop environment.
- Developed Oozie workflows to ingest/parse the raw data, populate staging tables and store the refined data in partitioned Hive tables.
- Assisted in installing, configuring and maintaining a Hadoop cluster on the Hortonworks Data Platform with Hadoop tools like Spark, Hive, Pig, HBase, Zookeeper and Sqoop for application development.
- Actively monitor, research and analyze ways in which the services in AWS can be improved.
- Leveraged build tools such as SBT and Maven for building the Spark applications.
- Set up CI/CD for Spark applications using Jenkins and Git.
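Below is a minimal sketch of the Boto3 S3 reads/writes referenced above; the bucket names, keys and filtering step are hypothetical, and error handling is omitted for brevity.

```python
# Minimal Boto3 sketch: read an object from S3 and write a processed copy back.
# Bucket names, keys and the filtering rule are hypothetical placeholders.
import json
import boto3

s3 = boto3.client("s3")

# Read a JSON object from S3.
response = s3.get_object(Bucket="example-raw-bucket", Key="claims/2019/01/claims.json")
records = json.loads(response["Body"].read())

# Keep only records that carry a member id (illustrative cleanup step).
cleaned = [r for r in records if r.get("member_id")]

# Write the cleaned records back to a curated prefix.
s3.put_object(
    Bucket="example-curated-bucket",
    Key="claims/2019/01/claims_clean.json",
    Body=json.dumps(cleaned).encode("utf-8"),
)
```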
Environment: Spark, Hive, HDFS, Sqoop, AWS EC2, AWS EMR, AWS S3, Airflow, Hortonworks Data Platform, Maven, MySQL, AWS Athena, Agile-Scrum, Scala, Python, PuTTY, IntelliJ, Git.
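As noted in the list above, the daily incremental load could look like the minimal Airflow DAG sketched here; the connection string, table names and Sqoop options are hypothetical placeholders, not the actual project configuration.

```python
# Minimal Airflow DAG sketch: daily incremental Sqoop import from Oracle into Hive.
# The JDBC URL, credentials path, table names and Sqoop options are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

dag = DAG(
    dag_id="daily_oracle_to_hive_incremental",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
)

# Sqoop incremental import keyed on a last-modified column, loading straight into Hive.
sqoop_import = BashOperator(
    task_id="sqoop_incremental_import",
    bash_command=(
        "sqoop import "
        "--connect jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB "
        "--username etl_user --password-file /user/etl/.oracle_pwd "
        "--table CLAIMS --incremental lastmodified --check-column UPDATED_AT "
        "--last-value '{{ prev_ds }}' "
        "--hive-import --hive-table staging.claims"
    ),
    dag=dag,
)
```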
HADOOP/SPARK DEVELOPER
Confidential, Reston, VA
Responsibilities:
- Extensively migrated the existing architecture to Spark Streaming for live streaming of data (see the streaming sketch after this list).
- Executed Spark code written in Scala for Spark Streaming/SQL to process data faster.
- Developed Oozie Bundles to schedule Sqoop and Hive jobs to create data pipelines.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Python.
- Developed a Kafka producer for message handling (a producer sketch appears at the end of this section).
- Involved in importing real-time data into Hadoop using Kafka and implemented the corresponding Oozie job.
- Used the AWS CLI for data transfers to and from Amazon S3 buckets.
- Executed Hadoop/Spark jobs on AWS EMR against data stored in S3 buckets.
- Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it and created DataFrames in Spark to perform further analysis.
- Worked with data serialization formats, converting complex objects into sequences of bytes using ORC and Parquet.
- Worked on analyzing and examining customer behavioral data using HiveQL.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Involved in daily SCRUM meetings to discuss the development/progress.
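A minimal sketch of streaming ingestion from Kafka, written here with PySpark Structured Streaming purely for illustration (the actual work was Scala-based per the environment list); broker addresses, topic and output paths are hypothetical.

```python
# Minimal PySpark Structured Streaming sketch: read events from Kafka and land them on HDFS.
# Broker addresses, the topic name and output paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to a Kafka topic; requires the spark-sql-kafka package on the classpath.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the value to string for downstream parsing.
parsed = events.select(col("value").cast("string").alias("json_payload"))

# Write the raw payloads to HDFS as Parquet, with checkpointing for fault tolerance.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streaming/events/")
    .option("checkpointLocation", "hdfs:///checkpoints/events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```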
Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Oozie, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera Distribution, Maven, MySQL, AWS, Agile-Scrum, Scala, PuTTY, IntelliJ, Git.
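The producer mentioned in the list above is sketched below with the kafka-python client for illustration; broker addresses, the topic name and the sample event are hypothetical, and the production code may well have used the Java client instead.

```python
# Minimal Kafka producer sketch using the kafka-python client.
# Broker addresses, the topic name and the sample event are hypothetical placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",          # wait for full acknowledgement before a send is considered complete
    retries=3,
)

# Publish a sample event; in practice these would come from an upstream feed.
event = {"event_type": "page_view", "user_id": 12345, "ts": "2018-06-01T12:00:00Z"}
producer.send("user-events", value=event)

# Flush buffered messages before shutting down.
producer.flush()
producer.close()
```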
HADOOP/SPARK DEVELOPER
Confidential, Tampa, FL
Responsibilities:
- Used Sqoop to ingest data from the server into HDFS and Hive as part of data acquisition.
- Used Spark to remove missing data and apply data transformations to create new features in the pre-processing phase (see the sketch after this list).
- Used Hive and Impala to get insights about the customer data in the data exploration stage.
- Used Sqoop, Hive, Spark and Oozie for building data pipeline.
- Installed and configured Hadoop MapReduce and HDFS.
- Developed multiple MapReduce jobs in Java for data cleaning and processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented Spark RDD transformations to map the business analysis and applied actions on top of the transformations.
- Experienced in managing and reviewing Hadoop log files.
- Involved in configuring Hadoop ecosystem components like HBase, Hive, Pig and Sqoop.
- Used HIVE to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked with the business analyst team for gathering requirements and client needs.
- Hands-on experience loading data from the UNIX file system and Teradata into HDFS.
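A minimal sketch of the Spark pre-processing step described above; the input path, column names and the derived feature are hypothetical placeholders.

```python
# Minimal PySpark pre-processing sketch: drop missing values and derive a new feature.
# Input path, column names and the derived feature are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, datediff, current_date

spark = SparkSession.builder.appName("preprocessing-sketch").getOrCreate()

# Load raw customer records ingested earlier (e.g. via Sqoop into HDFS).
customers = spark.read.parquet("hdfs:///data/raw/customers/")

cleaned = (
    customers
    .dropna(subset=["customer_id", "signup_date"])   # remove rows missing key fields
    .withColumn("tenure_days", datediff(current_date(), col("signup_date")))  # new feature
    .filter(col("tenure_days") >= 0)                  # guard against bad dates
)

# Persist the engineered dataset for the exploration stage (Hive/Impala).
cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/customers_features/")
```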
Environment: Hadoop, MapReduce, Spark, Spark SQL, Flume, HDFS, Hive, Pig, Sqoop, Oozie, SQL, Scala, Java, Zookeeper, Shell script.
HADOOP DEVELOPER/ADMINISTRATOR
Confidential, Bentonville, AR
Responsibilities:
- Installed and configured Hive, Pig, Sqoop and Flume on the Hadoop cluster.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Processed logs and semi-structured content using Pig.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive scripts implementing dynamic partitions (see the sketch after this list).
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Used Sqoop to efficiently transfer data between databases and HDFS.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components like Hive and HBase.
- Involved in gathering the requirements, designing, development and testing.
- Worked on loading and transforming large sets of structured and semi-structured data into the Hadoop system.
- Debugged and identified issues with Hadoop jobs reported by QA.
- Worked on the Hue interface for querying the data.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
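A minimal sketch of a dynamic-partition load in Hive, shown here through the PyHive client purely for illustration (the original scripts would more likely have been plain .hql run via the Hive CLI or Beeline); database, table and column names are hypothetical.

```python
# Minimal sketch: Hive dynamic-partition insert, driven from Python via PyHive.
# Database, table and column names are hypothetical placeholders; the same HQL
# could equally be run as a .hql script from the Hive CLI or Beeline.
from pyhive import hive

conn = hive.connect(host="hive-server-host", port=10000, username="etl_user")
cursor = conn.cursor()

# Dynamic partitioning must be enabled for the session.
cursor.execute("SET hive.exec.dynamic.partition = true")
cursor.execute("SET hive.exec.dynamic.partition.mode = nonstrict")

# Load processed log records into a table partitioned by event date
# (the partition column comes last in the SELECT list).
cursor.execute("""
    INSERT OVERWRITE TABLE analytics.web_logs PARTITION (event_date)
    SELECT user_id, url, response_code, event_date
    FROM staging.raw_web_logs
""")

cursor.close()
conn.close()
```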
Environment: Hadoop (HDP 2.X), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, Flume, Python, Sqoop, MySQL, Tableau.
JAVA DEVELOPER
Confidential
Responsibilities:
- Designed and developed user interface using Struts tags, JSP, HTML and JavaScript.
- Developed the user-specific highlights (dashboard menu) section, home page, admin home page and user module (modify/search users, create-user screens with assignment of various roles) using the Spring MVC framework, Hibernate ORM module, Spring Core module, XML, JSP and XSLT.
- Implemented various UI components using jQuery, HTML and CSS.
- Implemented various services required for the business application.
- Involved in designing the user interfaces using HTML, CSS, and JSPs.
- Configured Hibernate and Spring to map the business objects to Oracle Database using XML configuration file.
- Wrote shell scripts to export Oracle table data into flat files, performed unit testing using JUnit, and used Log4j for logging and automated batch jobs.
- Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
Environment: Spring Core, Spring MVC, Hibernate, HTML, CSS, jQuery, AJAX, IBM WebSphere, Oracle, Eclipse, Maven, JIRA, Git.