
Hadoop/Spark Developer Resume


Princeton, NJ

SUMMARY:

  • Overall 6+ years of IT experience in a variety of industries, which includes hands on experience in Big Data Analytics and development.
  • Experienced in installing, configuring, and administering Hadoop applications in multi-node clusters using Hortonworks distributions.
  • Expertise with tools in the Hadoop ecosystem, including Hadoop, YARN, MapReduce, Tez, Sqoop, Spark, Flume, Hive, HDFS and Kafka.
  • Excellent knowledge on Hadoop Ecosystems such as HDFS, Name Node, Data Node and Map Reduce programming paradigm.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience in importing and exporting data using Sqoop from HDFS/AWS S3 to relational database systems and vice versa.
  • Experience in Infrastructure development and operations involving AWS cloud platforms, EC2, EBS, S3, AWS RDS, AWS Redshift, AWS API Gateway, AWS VPC, AWS IAM Roles and Security groups.
  • Good understanding of Avro, JSON & XML parsers and technologies like XPath, XML Schema.
  • Good experience in Shell Script.
  • Experience in Continuous Integration and related tools (i.e. Jenkins, Maven).
  • Good exposure with Agile software development process.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
  • Strong experience with Data Warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP and AutoSys.
  • Experience in Implementing Rack Topology scripts to the Hadoop Cluster.
  • Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Excellent communication skills. Successfully working in fast-paced multitasking environment both independently and in collaborative team, a self-motivated enthusiastic learner.
  • Worked in large and small teams for systems requirement, design & development.
  • Key participant in all phases of software development life cycle with Analysis, Design, Development, Integration, Implementation, Debugging, and Testing of Software Applications in client server environment, Object Oriented.
  • Good working knowledge of Cassandra.
  • Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Experience in using various IDEs Eclipse, IntelliJ and repositories SVN and Git.
  • Experience of using build tools Ant, Maven.
  • Preparation of standard code guidelines, and of analysis and testing documentation, for technology and web-based applications.

TECHNICAL SKILLS:

Bigdata/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper and Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, Shell Scripting.

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB, RESTful

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Cloud Computing Tools: Amazon Web Services (AWS)

Operating Systems: Windows, UNIX, MS DOS, Sun Solaris.

Databases: Microsoft SQL Server, MySQL, Oracle, DB2

Build Tools: Jenkins, Maven, ANT

Business Intelligence Tools: Tableau, Splunk, QlikView

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall

Version Control Tools: Git, SVN

PROFESSIONAL EXPERIENCE:

Confidential, Princeton, NJ

Hadoop/Spark Developer

Responsibilities:

  • Developed Scala scripts and UDFs using DataFrames/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters; explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed Spark/Scala, Python, and R code for a regular-expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources. Used the K-Means clustering technique to identify outliers and classify unlabeled data.
  • Worked on R packages to interface with Caffe Deep Learning Framework
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, transformations, and other techniques during the ingestion process itself.
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, using RDBMS and NoSQL data stores for data access and analysis.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, to decide whether to adopt Impala in the project.
  • Worked on a cluster of 130 nodes.
  • Addressed overfitting by implementing regularization methods such as L1 and L2.
  • Used Principal Component Analysis in feature engineering to analyze high dimensional data.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau /Spotfire.
  • Communicated results to the operations team to support decision-making.
  • Collected data needs and requirements by interacting with other departments.
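The K-Means outlier detection described in the bullets above can be sketched in plain Python. This is a minimal illustration with made-up data and a hypothetical distance threshold; in the actual project this would run on Spark (e.g. via MLlib's KMeans) rather than a single machine:

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(pts):
    """Component-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(len(pts[0])))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's K-Means: return k cluster centers after a fixed number of iterations."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[idx].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

def outliers(points, centers, threshold):
    """Flag points whose distance to the nearest center exceeds the threshold."""
    return [p for p in points if min(dist(p, c) for c in centers) > threshold]
```

Fitting on known-good data and then flagging any point that falls far from every center is one common way to use K-Means for outlier detection; the threshold here is an assumption chosen per dataset.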

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Talend, Oozie, Jenkins, Cloudera, Oracle 12c, Linux.

Confidential, NEW YORK, NY

Hadoop Developer

Responsibilities:

  • Performed data ETL by collecting, exporting, merging and massaging data from multiple sources and platforms, including SSIS (SQL Server Integration Services) and SSRS in SQL Server.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked with cross-functional teams (including the data engineering team) to extract and rapidly process data from MongoDB through the MongoDB Connector for Hadoop.
  • Performed data cleaning and feature selection using MLlib package in PySpark.
  • Consumed data from Kafka using Apache Spark.
  • Used Python to perform an ANOVA test to analyze the differences among hotel clusters.
  • Involved in loading data from LINUX file system to HDFS.
  • Responsible for loading data files from various external sources like ORACLE, MySQL into staging area in MySQL databases.
  • Delivered analysis support for hotel recommendations and provided an online A/B test.
  • Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed summary reports and dashboards.
  • Developed hybrid model to improve the accuracy rate.
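The ANOVA step above is typically done with a library call such as `scipy.stats.f_oneway`; as a self-contained sketch, the one-way F-statistic can also be computed directly (the group values in the test are made up for illustration, not actual hotel-cluster data):

```python
def f_oneway(*groups):
    """One-way ANOVA F-statistic: between-group variance over within-group variance."""
    k = len(groups)                            # number of groups
    n = sum(len(g) for g in groups)            # total number of observations
    grand = sum(sum(g) for g in groups) / n    # grand mean over all observations
    means = [sum(g) / len(g) for g in groups]  # per-group means
    # Between-group sum of squares (weighted by group size).
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares.
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ssb / (k - 1)) / (ssw / (n - k))   # F = MSB / MSW
```

A large F value relative to the F-distribution's critical value suggests the group means (here, hotel clusters) genuinely differ.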

Environment: Hadoop, SQL Server, Pig, Apache Hive, Kafka, Apache Spark, Storm, Solr, Shell Scripting, HBase, Python, Kerberos, Agile, ZooKeeper, Maven, Ambari, Hortonworks, MySQL

Confidential

Hadoop Developer

Responsibilities:

  • Loaded data from different data sources into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Developed pig scripts to transform the data into structured format.
  • Developed Hive queries for Analysis across different banners.
  • Developed Hive UDFs to bring all customers' email IDs into a structured format.
  • Developed Oozie Workflows for daily incremental loads, which gets data from Teradata and then imported into hive tables.
  • Developed bash scripts to bring log files from the FTP server and process them for loading into Hive tables. All bash scripts are scheduled using the Resource Manager Scheduler.
  • Moved data from HDFS to Cassandra using Map Reduce and BulkOutputFormat class.
  • Developed MapReduce programs to apply business rules to the data. Developed and executed Hive queries for denormalizing the data. Worked on analyzing data with Hive and Pig.
  • Worked in Implementing Rack Topology scripts to the Hadoop Cluster.
  • Worked with the admin team on designing and executing the upgrade from CDH 3 to CDH 4.
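As an illustration of the kind of normalization the email-structuring UDF above performs, here is a Python sketch (the real UDF runs inside Hive, typically as a Java class; the matching rules below are illustrative assumptions, not the project's actual rules):

```python
import re

# Simplified pattern for a well-formed email address (an assumption for this sketch).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def normalize_email(raw):
    """Return the first well-formed email found in raw, lowercased, or None."""
    match = EMAIL_RE.search(raw)
    return match.group(0).lower() if match else None
```

Applied per row, such a function turns free-form customer fields into a single canonical column suitable for joins and deduplication in Hive.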

Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Pig, Sqoop, Oozie, Java, SQL, Cloudera Manager, Linux, Cluster Management.

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed all the UI using JSP and Spring MVC with client-side validations using JavaScript. Developed the DAO layer using Hibernate. Designed class and sequence diagrams for Enhancements.
  • Developed the user interface presentation screens using HTML, XML, CSS, and jQuery. Worked with Spring MVC using AOP and DI/IoC. Coordinated with the QA leads on development of the test plan, test cases, and unit test code.
  • Involved in testing and deployment of the application on Apache Tomcat Application Server during integration and QA testing phase. Involved in building JUNIT test cases for various modules.
  • Maintained the existing code base developed in the Spring and Hibernate frameworks by incorporating new features and fixing bugs.
  • Written Java classes to test UI and Web services through JUnit.
  • Performed functional and integration testing, extensively involved in release/deployment related critical activities.
  • Used SVN for version control. Log4J was used to log both user-interface and domain-level messages.
  • Used Soap UI for testing the Web Services.
  • Involved in Application Server Configuration and in Production issues resolution.
  • Wrote SQL queries and Stored Procedures for interacting with the Oracle database.
  • Documentation of common problems prior to go-live and while actively involved in a Production Support role.

Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, JUnit, Log4J, AJAX, Web services, JNDI, JMS, HTML, XML, XSD, XML Schema, SVN, Git.
