Big Data/Hadoop Developer Resume
Newark, NJ
SUMMARY:
- 6+ years of experience in analysis, design, development, testing, implementation, maintenance and enhancement of various IT projects, including Big Data experience implementing end-to-end Hadoop solutions.
- Experience in working in environments using Agile (SCRUM), RUP and Test-Driven development methodologies.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing.
- Extensive experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala & Spark.
- Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers like Apache Tomcat.
- Experience with different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks Data Platform (HDP).
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Extended Hive and Pig core functionality by using custom UDFs.
- Diverse experience utilizing Java tools in business, web, and client-server environments including Java Platform, J2EE, EJB, JSP, Java Servlets, Struts, and Java Database Connectivity (JDBC) technologies.
- Good understanding of integrating various data sources like RDBMS, spreadsheets, text files, JSON and XML files.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Implemented Service Oriented Architecture (SOA) using Web Services and JMS (Java Message Service).
- Implemented J2EE Design Patterns such as MVC, Session Façade, DAO, DTO, Singleton Pattern, Front Controller and Business Delegate.
- Experienced in developing web services with XML-based technologies such as SOAP, Axis, UDDI and WSDL.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala) and NoSQL databases like MongoDB, HBase and Cassandra.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Hands-on experience in configuring and administering Hadoop clusters using major Hadoop distributions like Apache Hadoop and Cloudera.
- Good knowledge of SQL and PL/SQL for writing stored procedures and functions, and of writing unit test cases using JUnit.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
- Hands-on experience working with databases like Oracle 12g, SQL Server 2010 and MySQL.
- Experience in developing web-based enterprise applications using Java, J2EE, Servlets, JSP, EJB, JDBC, Spring IoC, Spring AOP, Spring MVC, Spring Web Flow, Spring Boot, Spring Security, Spring Batch, Spring Integration, Web Services (SOAP and REST) and ORM frameworks like Hibernate.
- Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop and Zookeeper.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
- Expertise in JavaScript, JavaScript MVC patterns, Object Oriented JavaScript Design Patterns and AJAX calls.
- Expertise in using XML related technologies such as XML, DTD, XSD, XPATH, XSLT, DOM, SAX, JAXP, JSON and JAXB.
TECHNICAL SKILLS:
Big Data Technologies: Apache Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Flume, Apache Oozie, Apache Zookeeper
Hadoop Distributions: Cloudera
Development Tools: IntelliJ IDEA, Eclipse
Programming Languages: Scala, Python, Java
Build Tools: Maven, SBT
NoSQL Databases: HBase
Version Control Tools: Git, SVN
Cloud: AWS
Databases: MySQL, Oracle 10g, 11g
Operating Systems: Windows 7/10, Linux (CentOS, Red Hat, Ubuntu), Mac OS
PROFESSIONAL EXPERIENCE:
Confidential, Newark, NJ
Big Data/Hadoop Developer
Responsibilities:
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
- Developed solutions to load data into HDFS and analyzed the data using MapReduce, Pig, and Hive to produce summary results from Hadoop for downstream systems.
- Built servers on AWS: imported volumes, launched EC2 instances, and configured security groups, auto-scaling, load balancers, Route 53, SES and SNS in the defined virtual private cloud (VPC).
- Wrote MapReduce code to process and parse data from various sources and stored the parsed data in HBase and Hive using HBase-Hive integration.
- Loaded and transformed large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Created managed and external tables in Hive and loaded data from HDFS.
- Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to extract data from data files and load it into HDFS.
- Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell script, Sqoop, package and MySQL.
- Optimized the Hive tables using optimization techniques like partitioning and bucketing to provide better performance.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Implemented Hadoop on AWS EC2 using a few instances to gather and analyze log files.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
- Analyzed the Hadoop cluster and various Big Data analytics tools, including Pig, Hive, HBase and Sqoop.
- Improved performance by tuning Hive and MapReduce.
Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell Scripts, Teradata, Oracle, HBase, Cassandra, Cloudera, AWS, JavaScript, JSP, Kafka, Spark, Scala, ETL, Python.
Confidential, Minnesota, MN
Big Data/Hadoop Developer
Responsibilities:
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Imported data from MySQL into HDFS on a regular basis using Sqoop.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Used the Spark API on Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Wrote Hive queries for data analysis to meet the business requirements.
- Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
- Explored Spark to improve the performance and optimization of existing Hadoop MapReduce algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Deployed MapReduce and Spark jobs on Amazon Elastic MapReduce using datasets stored on S3.
- Participated in weekly meetings with technical collaborators and in code review sessions with senior and junior developers.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Knowledge of designing and deploying Hadoop clusters and various Big Data analytics tools, including Hive, HBase, Oozie, Sqoop, Flume, Spark, Impala and Cassandra.
- Performed real-time streaming of data using Spark with Kafka.
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Implemented Spark RDD transformations and actions to support business analysis.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Integrated user data from Cassandra with data in HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Created Hive tables, loaded data, and wrote Hive UDFs.
Environment: CDH4, CDH5, Scala, Spark, Spark Streaming, Spark SQL, HDFS, AWS, Hive, Pig, Linux, Oozie, Hue, Flume, MapReduce, Apache Kafka, Sqoop, Oracle, Shell Scripting, Cassandra, Hortonworks.
Confidential, Irving, TX
Hadoop Developer
Responsibilities:
- Provided an application demo to the client by designing and developing a search engine, report analysis trends, and application administration prototype screens using AngularJS and Bootstrap JS.
- Took ownership of the complete application design for the Java part and Hadoop integration.
- Assisted the architect in analyzing the existing and future systems. Prepared design blueprints and application flow documentation.
- Experienced in managing and reviewing Hadoop log files. Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Managed data coming from different sources and applications. Supported MapReduce programs running on the cluster.
- Worked with message broker systems such as Kafka. Extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.
- Created MapReduce jobs to extract content from HBase and configured them in Oozie workflows to generate analytical reports.
- Wrote MapReduce programs to organize the data and ingest it in a client-specified format suitable for analytics.
- Hands-on experience in writing Python scripts to optimize performance. Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
- Wrote Spark applications using Scala. Hands-on experience in creating RDDs, transformations, and actions while implementing Spark applications.
- Good knowledge of creating DataFrames using Spark SQL. Loaded data into the Cassandra NoSQL database.
- Stored the results derived from analysis in HBase and made them available for ingestion into Solr for indexing.
- Involved in integration of the Java search UI, Solr and HDFS. Involved in code deployments using the Jenkins continuous integration tool.
- Documented all the challenges and issues involved in dealing with the security system and implemented best practices.
- Created project structures and configurations according to the project architecture and made them available to junior developers to continue their work.
- Served as onsite coordinator to deliver work to the offshore team. Involved in code reviews and application lead support activities.
Environment: Cassandra, Spring 3.2, RESTful services using CXF web services framework, Spring Data, Solr 5.2.1, Pig, Hive, MapReduce, Sqoop, Zookeeper, SVN, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, YARN, Ambari
Confidential
Java Developer
Responsibilities:
- Gathered system requirements for the application, worked with the business team to review the requirements, and went through the Software Requirement Specification and Architecture documents.
- Worked on intensive user interface (UI) operations and client-side validations using an AJAX toolkit.
- Used SOAP to expose company applications as a Web Service to outside clients.
- Used the log package for debugging.
- Used ClearCase for version control.
- Used Web Services for creating the rate summary, used WSDL and SOAP messages for getting insurance plans from different modules, and used XML parsers for data retrieval.
- Developed business components and integrated them using Spring features such as dependency injection and auto-wiring of components such as DAO layers and service proxy layers.
- Used Spring AOP to implement distributed declarative transactions throughout the application.
- Wrote Hibernate configuration XML files to manage data persistence.
- Used TOAD to generate SQL queries for the applications, and to see the reports from log tables.
- Involved in the migration of data from Excel, flat files, Oracle and XML files to SQL Server using the BCP and DTS utilities.
Environment: Java/J2EE, MVC Arch with CICS interaction, HTML, Axis, SOAP, Servlets, Web services, RESTful Web Services, Sybase, Spring, DB2, RAD, Rational ClearCase, WCF, AJAX, Toad.
