Hadoop Developer Resume El Segundo, CA - Hire IT People

PROFESSIONAL SUMMARY:

Over 8 years of professional IT experience in Big Data, BI and Java across the financial, insurance and digital services industries.
Worked with Big Data distributions such as Cloudera (CDH 3 and 4) with Cloudera Manager.
Hands-on experience with major components of the Hadoop ecosystem such as MapReduce, HDFS, YARN, Hive, Pig, HBase, Sqoop, Oozie, Cassandra, Impala and Flume.
Experience with new Hadoop 2.0 architecture YARN and developing YARN Applications on it.
Experience with Apache Spark’s Core, Spark SQL, Streaming and MLlib components.
Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses.
Experienced in developing UDFs for Hive using Java.
Firm grip on data modeling, database performance tuning and NoSQL map-reduce systems.
Responsible for setting up processes for Hadoop based application design and implementation.
Experience in managing HBase database and using it to update/modify the data.
Experience in running MapReduce and Spark jobs over YARN.
Experience handling data in various file formats such as SequenceFile, Avro, RCFile, Parquet and ORC.
Strong knowledge of the scalability and applications of Spark and its components: Core, SQL and DataFrames.
Hands-on experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
Involved in developing complex ETL transformation & performance tuning.
Extensively worked with Teradata utilities like BTEQ, FastExport, FastLoad and MultiLoad to export and load data to/from different source systems including flat files.
Hands-on experience using query tools like Teradata SQL Assistant, TOAD, PL/SQL Developer and Queryman.
Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
Experience with middleware architecture using Sun Java technologies like J2EE, JSP and Servlets, and application servers like WebSphere and WebLogic.

TECHNICAL SKILLS:

Big Data: HDFS, MapReduce, Hive, Pig, ZooKeeper, Apache Spark (Core, MLlib, Spark SQL and DataFrames)

Languages: C, C++, Java, Python, J2EE, PL/SQL, MR, Pig Latin, HiveQL, Unix shell scripting and Scala

Operating Systems: Sun Solaris, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Databases and Datawarehousing: Teradata, DB2, Oracle 9i/10g/11g, SQL Server, MySQL

Tools and IDEs: Maven, Toad, Eclipse, NetBeans, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

PROFESSIONAL EXPERIENCE:

Confidential, El Segundo, CA

Hadoop Developer

Responsibilities:

Imported data into HDFS and exported updated data using the Sqoop import and export command-line utilities.
Responsible for developing data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store in HDFS.
Develop transformations using custom MapReduce, Pig and Hive
Perform Map side joins in both Pig and Hive
Optimize joins in Hive using techniques such as Sort-Merge join and Map side join
Control parallelism at relational level and script level in Pig
Implement partitioning and bucketing techniques in Hive
Develop scripts to create external tables and update partitioning information on a daily basis
Convert MR algorithms into Spark transformations and actions by creating RDDs, pair RDDs
Build reusable Hive UDF libraries for business requirements which enabled users to use these UDFs in Hive querying
Convert Hive/SQL queries into Spark functionality and analyze them using the Scala API
Loaded cache data into HBase using Sqoop.
Build Spark DataFrames to process huge amounts of structured data
Use JSON to represent complex data structures within a MapReduce job
Store and preprocess the logs and semi structured content on HDFS using MapReduce and import it into Hive warehouse
Develop Pig Latin scripts to extract the data from the web server output files to load into HDFS
Streamline Hadoop jobs and workflow operations using Oozie workflows, scheduled through AutoSys on a monthly basis
Perform data analysis on NoSQL databases such as HBase and Cassandra
Analyzed HBase data in Hive by creating external partitioned and bucketed tables
Perform POC on single member debug on Spark and Hive

Environment: Hadoop 2.x, Apache Spark, Spark SQL, DataFrames, Scala, HDFS, Hive, Oozie, AutoSys, Oracle, Teradata, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Core Java, Cloudera Hadoop Distribution, PL/SQL, Toad, Windows NT, Linux

Confidential, River Woods, IL

Hadoop Developer

Responsibilities:

Developed MapReduce jobs in both PIG and Hive for data cleaning and pre-processing
Monitored multiple Hadoop clusters environments using Ganglia, monitored workload, job performance and capacity planning using Cloudera Manager
Developed Sqoop scripts for loading data into HDFS from DB2 and preprocessed with PIG
Ran MapReduce programs on log data to transform it into a structured form and derive user location, age group and time spent.
Automated the tasks of loading the data into HDFS and pre-processing with Pig by developing workflows using Oozie
Loaded data from the UNIX file system to HDFS and wrote Hive user-defined functions
Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON and Parquet
Created Map Reduce jobs in Python for ad-hoc purposes
Used Sqoop to load data from DB2 to HBase for faster querying and performance optimization
Collected streaming data from Flume and processed it in near-real-time batches
Developed Hive scripts for implementing dynamic partitions
Developed a suite of unit test cases for Mapper, Reducer and Driver classes using a testing library
Collected log data from web servers and integrated it into HDFS using Flume
Developed ETL workflows in Python on the collected data, processing it in HDFS and HBase using Oozie
Performed POCs on Spark test environment
Wrote ETL jobs to visualize the data and generate reports from a MySQL database using DataStage

Environment: Hadoop, HDFS, Hive, Pig, Flume, Python, HBase, Sqoop, Oozie, DataStage, Linux, Hortonworks Distribution, Relational Databases

Confidential, New York

Big Data Engineer

Responsibilities:

Extracted data from relational databases such as SQL Server and MySQL by developing Scala and SQL code
Uploaded the extracted data to Hive and combined the new tables with existing databases
Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON and Parquet
Configured big data workflows to run on top of Hadoop, comprising heterogeneous jobs such as Pig, Hive, Sqoop and MapReduce
Loaded various formats of structured and unstructured data from Linux file system to HDFS
Used Combiners and Partitioners in MapReduce programming
Wrote Pig scripts to load data into a NoSQL database for faster analysis
Read data from Flume and pushed batches to HDFS and HBase for real-time processing of the files
Parsed XML data into a structured format and loaded it into HDFS
Scheduled various ETL processes and Hive scripts by developing Oozie workflows
Utilized Tableau to visualize the analyzed data and performed report design and delivery
Created POC for Flume implementation
Involved in reviewing both functional and non-functional aspects of the business model
Took the lead in communicating and presenting the models to business customers and executives

Environment: Hadoop, HDFS, Map Reduce, Sqoop, HBase, Shell Scripting, PIG, HIVE, Oozie, Core Java, Hortonworks Distribution, LINUX

Confidential

Business Intelligence/ETL Developer

Responsibilities:

Involved in design & development of operational data source and data marts in Oracle
Reviewed source data and recommended data acquisition and transformation strategy
Involved in conceptual, logical and physical data modeling and used star schema in designing the data warehouse
Designed ETL process using Teradata to load the data from various source databases and flat files to target data warehouse in Oracle
Used PowerMart Workflow Manager to design sessions and event wait/raise, assignment, e-mail and command tasks to execute mappings
Created parameter-based mappings and Router and Lookup transformations
Involved in projects migrating data from Oracle/DB2 data warehouses to Teradata
Optimized mappings using transformation features like Aggregator, Filter, Joiner, Expression and Lookups
Created daily and weekly workflows and scheduled to run based on business needs

Environment: Data modeling, SQL Server SSIS, SSRS, Oracle 10g, Teradata 6, XML, TOAD, SQL, PL/SQL, IBM AIX, UNIX Shell Scripts, Web Intelligence, DSBASIC, Cognos, Erwin, StarTeam, Remedy, Maestro job scheduler, Mercury Quality Center, Control-M

Confidential

Java Developer

Responsibilities:

Involved in the core product development using J2EE, JSF and Hibernate
Actively involved in the full life cycle of object-oriented application development: object modeling, database mapping and GUI design
Used JavaScript to perform client side validations and Struts-Validator framework for server-side validation
Worked on requirements gathering and high-level design, following the Waterfall model
Created data access using SQL and PL/SQL stored procedures
Used Hibernate annotations with Java for various stages in the application
Built web services upon SOAP to export and import attachments from file to associated applications
Developed DAOs (data access objects) using Spring Framework
Deployed the components to WebSphere Application Server
Used HTML/CSS and JavaScript for UI development
Wrote SQL queries including joins, triggers, stored procedures and views using MySQL
Implemented the JSPs and EJBs in the JSF Framework to handle the workflow of the application
Developed unit test cases and used JUnit for unit testing of the application

Environment: Java, J2EE, Struts, SQL, JAX-RPC, XML, RAD, WebSphere, MQ, Agile, JSPs, SOAP
