Big Data Engineer/Application Architect Resume
New York, NY
PROFESSIONAL SUMMARY:
- Over 9 years of experience as a Big Data developer, designing and developing applications on Big Data and Python open-source technologies.
- Strong development skills in Hadoop, HDFS, MapReduce, Hive, Sqoop and HBase, with a solid understanding of Hadoop internals.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Leveraged strong skills in developing applications involving Big Data technologies such as Hadoop, MapReduce, YARN, Flume, Hive, Pig, Sqoop, HBase, Cloudera, MapR, Avro, Spark and Scala.
- Extensively worked on major components of the Hadoop ecosystem such as HDFS, HBase, Hive, Sqoop, Pig and MapReduce.
- Developed various scripts and numerous batch jobs to schedule Hadoop programs.
- Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
- Hands-on experience in importing and exporting data between databases such as Oracle and MySQL and HDFS/Hive using Sqoop.
- Good knowledge of NoSQL databases such as MongoDB, Cassandra and HBase.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Hive, Pig and Splunk.
- Experience in programming and developing Java modules for an existing Java-based web portal using JSP, Servlets, JavaScript, HTML and Angular with an MVC architecture.
- Good knowledge of Amazon Web Services (AWS) offerings such as EMR and EC2, which provide fast, efficient processing for Big Data analytics.
- Experienced in collecting log and JSON data into HDFS using Flume and processing the data using Hive/Pig.
- Expertise in developing web-based applications using J2EE technologies such as JSP, Servlets and JDBC.
- Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service) and setting up EMR (Elastic MapReduce).
- Worked extensively with Core Java, Struts, JSF, Spring, Hibernate, Servlets and JSP, with hands-on experience in PL/SQL, XML and SOAP.
- In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.
- Extensively worked on Linux-based CentOS with strong hands-on experience in Linux commands.
- Well versed in working with relational database management systems such as Oracle, MS SQL Server and MySQL.
- Hands-on experience with advanced Big Data technologies such as the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka and predictive analytics.
- Knowledge of the Software Development Life Cycle (SDLC) and Agile and Waterfall methodologies.
- Experience working with Eclipse IDE, NetBeans and Rational Application Developer.
- Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper, Databricks
Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake, Data Factory
Hadoop Distributions: Cloudera, Hortonworks, MapR
Programming Languages: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets
Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS
Web Technologies: HTML, CSS, JavaScript, jQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Databases: Oracle 12c/11g, SQL
Operating Systems: Linux, Unix, Windows 10/8/7
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven, Visual Studio
NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo
Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere
SDLC Methodologies: Agile, Waterfall
Version Control: Git, SVN, CVS, CodeCommit
ADDITIONAL SKILLS:
J2EE, MVC, Java, Hibernate, JSON, jQuery, Eclipse, Spring, JavaScript, Hadoop, Hive, MongoDB, Zookeeper, Spark, MapR, Pig, Sqoop, Agile, Azure, Jenkins, HDFS, NoSQL, HBase, Impala, MapReduce, YARN, Oozie, Oracle, PL/SQL, NiFi, XML, MySQL
WORK EXPERIENCE:
Confidential, New York, NY
Big Data Engineer/Application Architect
Responsibilities:
- Working on the Hadoop ecosystem on the AWS cloud, leveraging services such as EMR, EC2, S3, CloudFormation, Lambda, Athena, Glue, DynamoDB and AWS Cost Explorer.
- Responsible for developing and managing analytical/machine learning capabilities on the AWS cloud across Amex.
- Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce, Hive and Spark RDDs.
- Involved in designing Hadoop architecture on AWS leveraging Service Catalog, CloudFormation, AWS EMR, DynamoDB and event processing using Lambda functions.
- Developed data-governance tools using Python and Spark for securely placing enterprise data on AWS S3.
- Designed and configured the Hadoop cluster using AWS EMR based on user behavior.
- Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
- Built an end-to-end automated tool that extracts zip files, loads the data into the respective Hive tables in compressed format using shell scripts and PySpark RDDs, and runs QC checks (a minimal sketch follows this role's environment list).
- Responsible for providing support to users across Amex on their data processing and modeling pipelines.
- Provided performance-tuning and query-optimization guidance to users for their Hive and Spark jobs.
- Worked closely with business users to gather requirements and troubleshoot issues with machine learning algorithms.
- Performed data modeling using gradient boosting and tree-based algorithms such as XGBoost, GBDT and CatBoost.
- Worked closely with business vendors to enhance Big Data and machine learning platforms on the AWS cloud per business needs.
- Performed several POCs on newly onboarded AWS and Big Data services to help enhance the platform.
- Managed and led the development effort with the help of a diverse internal and overseas group.
- Developed a UI application using Angular and NVD3 to display network graphs of all the interlinked customers.
- Participated in scrum and retrospective meetings and worked closely with the scrum master to create features and stories in Jira.
- Extensively worked in Excel, generating pivot tables and performing VLOOKUPs to join records from multiple sheets.
Environment: AWS, EMR, EC2, S3, RDS, Glue, Athena, Service Catalog, CloudFormation, Lambda Functions, Hadoop, Spark, Hive, Python, Pandas, XGBoost, TensorFlow, AngularJS, NVD3, Linux, HDFS, Spark Streaming
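Illustrative sketch (referenced above): a minimal PySpark version of the zip-to-Hive loading flow. The directories, table name and QC rule below are hypothetical stand-ins, and the production tool also wrapped these steps in shell scripts.

    # Hypothetical sketch of a zip -> Hive loader; paths, table names and the
    # QC rule are illustrative only.
    import glob
    import zipfile

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("zip-to-hive-loader")
             .enableHiveSupport()
             .getOrCreate())

    LANDING_DIR = "/data/landing"            # where zip archives arrive (assumed)
    STAGING_DIR = "/data/staging/extracted"  # staging area visible to Spark (assumed)

    # 1. Extract every zip archive in the landing directory.
    for archive in glob.glob(f"{LANDING_DIR}/*.zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(STAGING_DIR)

    # 2. Read the extracted delimited files and write them to a Hive table
    #    in a compressed format (Parquet with Snappy here).
    df = (spark.read
          .option("header", "true")
          .csv(f"file://{STAGING_DIR}/*.csv"))

    (df.write
       .mode("append")
       .format("parquet")
       .option("compression", "snappy")
       .saveAsTable("analytics.customer_events"))  # hypothetical target table

    # 3. Simple QC: compare the rows just read against what the table now holds.
    print("QC: source rows =", df.count(),
          "| table rows (total) =", spark.table("analytics.customer_events").count())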
Confidential, San Antonio, TX
Sr. Big Data Developer
Responsibilities:
- As a Sr. Big Data Developer, worked on Hadoop ecosystem components including Hive, HBase, ZooKeeper and Spark Streaming on the CDH distribution.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Worked on analyzing the Hadoop cluster and various big data analytic tools including Pig, the HBase database and Sqoop.
- Involved in Agile methodologies, daily scrum meetings, sprint planning.
- Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
- Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
- Loaded data into Spark RDDs and performed in-memory computation to generate output per the requirements.
- Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
- Designed & Developed a Flattened View (Merge and Flattened dataset) de-normalizing several Datasets in Hive/HDFS which consists of key attributes consumed by Business and other down streams.
- Worked on NoSQL support enterprise production and loading data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Handled importing data from various sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
- Worked on MongoDB and HBase, databases which differ from classic relational databases.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (a PySpark illustration follows this role's environment list).
- Integrated Kafka with Spark Streaming for high-throughput, reliable processing.
- Worked on Apache Flume for collecting and aggregating large amounts of log data, storing it on HDFS for further analysis.
- Worked on tuning Hive and Pig scripts to improve performance and resolved performance issues in both.
Environment: Hadoop 3.0, Hive 2.3, CDH4, MongoDB, Python, pandas, Zookeeper, Spark, MapR, Pig 0.17, Sqoop, Agile, Azure, Jenkins, HDFS, NoSQL, HBase, Impala, MapReduce, YARN, Oozie, Oracle 12c, PL/SQL, NiFi, XML, JSON, MySQL, Java
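Illustrative sketch (referenced above): the HiveQL-to-Spark conversions on this project were written in Scala; purely for illustration, a minimal PySpark equivalent of one such rewrite is shown below, with hypothetical table and column names.

    # Hypothetical example of rewriting a HiveQL aggregation as Spark DataFrame
    # transformations; table and column names are illustrative only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hiveql-to-spark")
             .enableHiveSupport()
             .getOrCreate())

    # Original HiveQL, for reference:
    #   SELECT account_id, SUM(amount) AS total_amount
    #   FROM transactions
    #   WHERE txn_date >= '2018-01-01'
    #   GROUP BY account_id;

    # Equivalent DataFrame pipeline:
    totals = (spark.table("transactions")
              .filter(F.col("txn_date") >= "2018-01-01")
              .groupBy("account_id")
              .agg(F.sum("amount").alias("total_amount")))

    totals.show(10)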
Confidential, Sunnyvale, CA
Big Data/Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
- Developed Apache Spark applications for data processing from various streaming sources.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs and processed the data as DataFrames.
- Worked with Kafka and REST APIs to collect and load data onto the Hadoop file system, and used Sqoop to load data from relational databases.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (a sketch follows this role's environment list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Involved in transforming data from legacy tables to HDFS and Hive tables using Sqoop.
- Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data, and responsible for managing data from different sources.
- Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Involved in migrating MapReduce jobs to RDDs (Resilient Distributed Datasets) and creating Spark jobs for better performance.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed batch scripts to fetch data from the ECS cloud and perform the required transformations in Scala using the Spark framework.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
- Performed transformations such as event joins, bot-traffic filtering and pre-aggregations using Pig.
- Developed Java code that creates mappings in Elasticsearch before data is indexed into it.
- Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Created and maintained various Shell and Python scripts for automating processes, and optimized MapReduce code and Pig scripts through performance tuning and analysis.
Environment: Hadoop 3.0, Spark, Python, Hive 2.3, Agile, MapReduce, Kafka, HBase, HDFS, Sqoop, Scala, RDBMS, Oozie, Pig 0.17, Cassandra 3.11, NoSQL, Elasticsearch, Java
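Illustrative sketch (referenced above): a minimal PySpark Structured Streaming consumer for a Kafka topic. The broker address, topic, schema and output paths are hypothetical, and the Parquet sink here stands in for the HBase write, which in production went through a connector.

    # Hypothetical sketch: consume a Kafka topic with Spark Structured Streaming,
    # parse the JSON payload and persist the processed stream. Requires the
    # spark-sql-kafka connector package on the classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream-consumer").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
           .option("subscribe", "events")                       # assumed topic
           .load())

    # Kafka delivers key/value as bytes; cast the value and parse the JSON payload.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", event_schema).alias("e"))
              .select("e.*"))

    # Stand-in sink: append the parsed events to Parquet with checkpointing.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/data/streams/events")
             .option("checkpointLocation", "/data/checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()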
Confidential
Java/J2EE Developer
Responsibilities:
- Developed and utilized J2EE Services and JMS components for messaging communication in WebSphere Application Server.
- Implemented MVC architecture by separating the business logic from the presentation layer.
- Developed code using Java, J2EE and Spring, and used Hibernate as an ORM tool for object-relational mapping.
- Used JNDI to perform lookup services for the various components of the system.
- Created REST web services using Spring Boot to send data in JSON format to different systems.
- Extensively used jQuery to provide a dynamic user interface and for client-side validations.
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Participated in object-oriented design, development and testing of REST APIs using Java.
- Implemented the Dependency Injection (IoC) feature of the Spring framework to inject dependencies into objects.
- Developed the data access layer by integrating Spring and Hibernate.
- Used the Hibernate framework for data persistence and developed Hibernate objects for persisting data to the database.
- Responsible for developing Hibernate configuration and mapping files for the persistence layer (object-relational mapping).
- Developed object-oriented JavaScript code and was responsible for client-side validations using jQuery.
- Extensively used Spring IoC features for bean injection and transaction management.
- Used the Spring Framework as the middle-tier application framework, with a persistence strategy based on Spring's Hibernate support for database integration.
- Involved in designing the application using the MVC pattern.
- Created a JDBC data source and connection pooling for the application, and Hibernate mapping files when needed.
- Consumed RESTful web services to establish communication between different applications.
- Implemented business services using Core Java and Spring.
- Wrote object-oriented JavaScript for transparent presentation of both client- and server-side validation.
Environment: J2EE, MVC, Java, Hibernate, JSON, jQuery, Eclipse, Spring, JavaScript