
Hadoop Developer Resume


El Segundo, CA

PROFESSIONAL SUMMARY:

  • Over 8 years of professional IT experience in Big Data, BI and Java across the Financial, Insurance and Digital Services industries.
  • Worked with Big Data distributions such as Cloudera (CDH 3 and 4) with Cloudera Manager.
  • Hands-on experience with major components of the Hadoop ecosystem such as MapReduce, HDFS, YARN, Hive, Pig, HBase, Sqoop, Oozie, Cassandra, Impala and Flume.
  • Experience with the Hadoop 2.0 YARN architecture and developing YARN applications on it.
  • Experience with Apache Spark's Core, Spark SQL, Streaming and MLlib components.
  • Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses.
  • Experienced in developing UDFs for Hive using Java.
  • Firm grip on data modeling, database performance tuning and NoSQL map-reduce systems.
  • Responsible for setting up processes for Hadoop based application design and implementation.
  • Experience in managing HBase database and using it to update/modify the data.
  • Experience in running MapReduce and Spark jobs over YARN.
  • Experience handling data in various file formats such as Sequence files, Avro, RC, Parquet and ORC.
  • Strong knowledge of the scalability and applications of Spark and its components: Core, SQL and DataFrames.
  • Hands-on experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Involved in developing complex ETL transformation & performance tuning.
  • Worked extensively with Teradata utilities such as BTEQ, FastExport, FastLoad and MultiLoad to export and load data to/from different source systems, including flat files.
  • Hands-on experience using query tools such as Teradata SQL Assistant, TOAD, PL/SQL Developer and Queryman.
  • Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Experience with middleware architecture using Sun Java technologies such as J2EE, JSP and Servlets, and application servers such as WebSphere and WebLogic.

TECHNICAL SKILLS:

Big Data: HDFS, MapReduce, Hive, Pig, ZooKeeper, Apache Spark (Core, MLlib, Spark SQL and DataFrames)

Languages: C, C++, Java, Python, J2EE, PL/SQL, MR, Pig Latin, HiveQL, Unix shell scripting and Scala

Operating Systems: Sun Solaris, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Databases and Datawarehousing: Teradata, DB2, Oracle 9i/10g/11g, SQL Server, MySQL

Tools and IDEs: Maven, Toad, Eclipse, NetBeans, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

PROFESSIONAL EXPERIENCE:

Confidential, El Segundo, CA

Hadoop Developer

Responsibilities:

  • Extracted data into HDFS and updated it using the Sqoop import and export command-line utilities
  • Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS
  • Developed transformations using custom MapReduce, Pig and Hive
  • Performed map-side joins in both Pig and Hive
  • Optimized joins in Hive using techniques such as sort-merge join and map-side join
  • Controlled parallelism at the relational level and script level in Pig
  • Implemented partitioning and bucketing techniques in Hive
  • Developed scripts to create external tables and update partitioning information on a daily basis
  • Converted MapReduce algorithms into Spark transformations and actions by creating RDDs and pair RDDs
  • Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries
  • Converted Hive/SQL queries into Spark functionality and analyzed them using the Scala API
  • Loaded cache data into HBase using Sqoop
  • Built Spark DataFrames to process large volumes of structured data
  • Used JSON to represent complex data structures within a MapReduce job
  • Stored and preprocessed logs and semi-structured content on HDFS using MapReduce and imported it into the Hive warehouse
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS
  • Streamlined Hadoop jobs and workflow operations using Oozie workflows, scheduled through AutoSys on a monthly basis
  • Performed data analysis on NoSQL databases such as HBase and Cassandra
  • Analyzed HBase data in Hive by creating external partitioned and bucketed tables
  • Performed a POC on single-member debugging with Spark and Hive

Environment: Hadoop 2.x, Apache Spark, Spark SQL, DataFrames, Scala, HDFS, Hive, Oozie, AutoSys, Oracle, Teradata, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Core Java, Cloudera Hadoop Distribution, PL/SQL, Toad, Windows NT, Linux

Confidential, River Woods, IL

Hadoop Developer

Responsibilities:

  • Developed MapReduce jobs in both Pig and Hive for data cleaning and pre-processing
  • Monitored multiple Hadoop cluster environments using Ganglia, and monitored workload, job performance and capacity planning using Cloudera Manager
  • Developed Sqoop scripts for loading data into HDFS from DB2 and preprocessed it with Pig
  • Ran MapReduce programs on log data to transform it into a structured form and derive user location, age group and time spent
  • Automated the tasks of loading the data into HDFS and pre-processing with Pig by developing workflows using Oozie
  • Loaded data from the UNIX file system to HDFS and wrote Hive user-defined functions
  • Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON and Parquet
  • Created MapReduce jobs in Python for ad hoc purposes
  • Used Sqoop to load data from DB2 to HBase for faster querying and performance optimization
  • Worked on streaming to collect data from Flume and performed real-time batch processing
  • Developed Hive scripts for implementing dynamic partitions
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using a testing library
  • Collected log data from web servers and integrated it into HDFS using Flume
  • Worked on developing ETL workflows in Python on the collected data, processing it in HDFS and HBase using Oozie
  • Performed POCs on Spark test environment
  • Wrote ETL jobs to visualize the data and generate reports from a MySQL database using DataStage

Environment: Hadoop, HDFS, Hive, Pig, Flume, Python, HBase, Sqoop, Oozie, DataStage, Linux, Hortonworks Distribution, Relational Databases

Confidential, New York

Big Data Engineer

Responsibilities:

  • Extracted data from relational databases such as SQL Server and MySQL by developing Scala and SQL code
  • Uploaded the extracted data to Hive and combined new tables with existing databases
  • Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON and Parquet
  • Configured big data workflows to run on top of Hadoop, comprising heterogeneous jobs such as Pig, Hive, Sqoop and MapReduce
  • Loaded various formats of structured and unstructured data from Linux file system to HDFS
  • Used Combiners and Partitioners in MapReduce programming
  • Wrote Pig scripts to ETL data into a NoSQL database for faster analysis
  • Read data from Flume and pushed batches to HDFS and HBase for real-time processing of the files
  • Parsed XML data into a structured format and loaded it into HDFS
  • Scheduled various ETL processes and Hive scripts by developing Oozie workflows
  • Utilized Tableau to visualize the analyzed data and performed report design and delivery
  • Created a POC for the Flume implementation
  • Involved in reviewing both functional and non-functional aspects of the business model
  • Communicated and presented these models to business customers and executives

Environment: Hadoop, HDFS, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Hive, Oozie, Core Java, Hortonworks Distribution, Linux

Confidential

Business Intelligence/ETL Developer

Responsibilities:

  • Involved in design & development of operational data source and data marts in Oracle
  • Reviewed source data and recommended data acquisition and transformation strategies
  • Involved in conceptual, logical and physical data modeling and used star schema in designing the data warehouse
  • Designed ETL process using Teradata to load the data from various source databases and flat files to target data warehouse in Oracle
  • Used PowerMart Workflow Manager to design sessions and event wait/raise, assignment, e-mail and command tasks to execute mappings
  • Created parameter-based mappings and Router and Lookup transformations
  • Involved in projects migrating data from data warehouses on Oracle/DB2 to Teradata
  • Optimized mappings using transformation features like Aggregator, Filter, Joiner, Expression and Lookups
  • Created daily and weekly workflows and scheduled them to run based on business needs

Environment: Data modeling, SQL Server SSIS, SSRS, Oracle 10g, Teradata 6, XML, TOAD, SQL, PL/SQL, IBM AIX, UNIX Shell Scripts, Web Intelligence, DSBASIC, Cognos, Erwin, StarTeam, Remedy, Maestro job scheduler, Mercury Quality Center, Control-M

Confidential

Java Developer

Responsibilities:

  • Involved in the core product development using J2EE, JSF and Hibernate
  • Actively involved in the full life cycle of object-oriented application development: Object Modeling, Database Mapping, GUI Design
  • Used JavaScript to perform client side validations and Struts-Validator framework for server-side validation
  • Worked on requirements gathering and high-level design, following the Waterfall model
  • Created data access using SQL and PL/SQL stored procedures
  • Used Hibernate annotations with Java for various stages in the application
  • Built web services upon SOAP to export and import attachments from file to associated applications
  • Developed DAOs (data access objects) using Spring Framework
  • Deployed the components to WebSphere Application Server
  • Used HTML/CSS and JavaScript for UI development
  • Wrote SQL queries including joins, triggers, stored procedures and views using MySQL
  • Implemented the JSPs and EJBs in the JSF Framework to handle the workflow of the application
  • Developed unit test cases and used JUnit for unit testing of the application

Environment: Java, J2EE, Struts, SQL, JAX-RPC, XML, RAD, WebSphere, MQ, Agile, JSPs, SOAP
