Hadoop Developer Resume
El Segundo, CA
PROFESSIONAL SUMMARY:
- 8+ years of professional IT experience in Big Data, BI and Java across the financial, insurance and digital services industries.
- Worked with Big Data distributions such as Cloudera (CDH 3 and 4) with Cloudera Manager.
- Hands-on experience with major components of the Hadoop ecosystem such as MapReduce, HDFS, YARN, Hive, Pig, HBase, Sqoop, Oozie, Cassandra, Impala and Flume.
- Experience with the Hadoop 2.0 YARN architecture and developing YARN applications on it.
- Experience with Apache Spark Core, Spark SQL, Spark Streaming and MLlib components.
- Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses.
- Experienced in developing UDFs for Hive using Java.
- Firm grip on data modeling, database performance tuning and NoSQL map-reduce systems.
- Responsible for setting up processes for Hadoop based application design and implementation.
- Experience in managing HBase database and using it to update/modify the data.
- Experience in running MapReduce and Spark jobs over YARN.
- Experience handling data in various file formats such as SequenceFile, Avro, RCFile, Parquet and ORC.
- Strong knowledge of the scalability and applications of Spark and its components: Core, SQL and DataFrames.
- Hands-on experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Involved in developing complex ETL transformation & performance tuning.
- Worked extensively with Teradata utilities such as BTEQ, FastExport, FastLoad and MultiLoad to export and load data to/from different source systems, including flat files.
- Hands-on experience using query tools such as Teradata SQL Assistant, TOAD, PL/SQL Developer and Queryman.
- Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
- Experience with middleware architecture using Sun Java technologies such as J2EE, JSP and Servlets, and application servers such as WebSphere and WebLogic.
TECHNICAL SKILLS:
Big Data: HDFS, MapReduce, Hive, Pig, ZooKeeper, Apache Spark (Core, MLlib, Spark SQL and DataFrames)
Languages: C, C++, Java, Python, J2EE, PL/SQL, MR, Pig Latin, HiveQL, Unix shell scripting and Scala
Operating Systems: Sun Solaris, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Databases and Datawarehousing: Teradata, DB2, Oracle 9i/10g/11g, SQL Server, MySQL
Tools and IDEs: Maven, Toad, Eclipse, NetBeans, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
PROFESSIONAL EXPERIENCE:
Confidential, El Segundo, CA
Hadoop Developer
Responsibilities:
- Imported data into HDFS and exported updates using the Sqoop import and export command-line utilities.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Develop transformations using custom MapReduce, Pig and Hive
- Perform Map side joins in both Pig and Hive
- Optimize joins in Hive using techniques such as Sort-Merge join and Map side join
- Control parallelism at relational level and script level in Pig
- Implement partitioning and bucketing techniques in Hive
- Develop scripts to create external tables and update partitioning information on a daily basis
- Convert MR algorithms into Spark transformations and actions by creating RDDs, pair RDDs
- Build reusable Hive UDF libraries for business requirements which enabled users to use these UDFs in Hive querying
- Convert Hive/SQL queries into Spark transformations and analyze them using the Scala API
- Loaded cache data into HBase using Sqoop.
- Build Spark Dataframes to process huge amounts of structured data
- Use JSON to represent complex data structures within a MapReduce job
- Store and preprocess logs and semi-structured content on HDFS using MapReduce and import it into the Hive warehouse
- Develop Pig Latin scripts to extract the data from the web server output files to load into HDFS
- Streamline Hadoop jobs and workflow operations using Oozie workflows, scheduled through AutoSys on a monthly basis
- Perform data analysis on NoSQL databases such as HBase and Cassandra
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables
- Perform a POC on single-member debugging in Spark and Hive
Environment: Hadoop 2.x, Apache Spark, Spark SQL, DataFrames, Scala, HDFS, Hive, Oozie, AutoSys, Oracle, Teradata, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Core Java, Cloudera Hadoop Distribution, PL/SQL, Toad, Windows NT, Linux
Confidential, River Woods, IL
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs in both Pig and Hive for data cleaning and pre-processing
- Monitored multiple Hadoop cluster environments using Ganglia, and monitored workload, job performance and capacity planning using Cloudera Manager
- Developed Sqoop scripts for loading data into HDFS from DB2 and preprocessed it with Pig
- Ran MapReduce programs on log data to transform it into a structured form and derive user location, age group and time spent
- Automated the tasks of loading the data into HDFS and pre-processing with Pig by developing workflows using Oozie
- Loaded data from the UNIX file system into HDFS and wrote Hive user-defined functions (UDFs)
- Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON and Parquet
- Created MapReduce jobs in Python for ad-hoc purposes
- Used Sqoop to load data from DB2 to HBase for faster querying and performance optimization
- Collected streaming data from Flume and processed it in real-time batches
- Developed Hive scripts for implementing dynamic partitions
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using a testing library
- Collected log data from web servers and integrated it into HDFS using Flume
- Developed ETL workflows in Python, orchestrated with Oozie, to process the collected data in HDFS and HBase
- Performed POCs in a Spark test environment
- Wrote ETL jobs to visualize the data and generate reports from a MySQL database using DataStage
Environment: Hadoop, HDFS, Hive, Pig, Flume, Python, HBase, Sqoop, Oozie, DataStage, Linux, Hortonworks Distribution, Relational Databases
Confidential, New York
Big Data Engineer
Responsibilities:
- Extracted data from relational databases such as SQL Server and MySQL by developing Scala and SQL code
- Uploaded the extracted data to Hive and combined the new tables with existing databases
- Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON and Parquet
- Configured big data workflows to run on top of Hadoop, comprising heterogeneous jobs such as Pig, Hive, Sqoop and MapReduce
- Loaded various formats of structured and unstructured data from the Linux file system to HDFS
- Used Combiners and Partitioners in MapReduce programming
- Wrote Pig scripts to ETL the data into a NoSQL database for faster analysis
- Read data from Flume and pushed batches to HDFS and HBase for real-time processing of the files
- Parsed XML data into a structured format and loaded it into HDFS
- Scheduled various ETL processes and Hive scripts by developing Oozie workflows
- Utilized Tableau to visualize the analyzed data and performed report design and delivery
- Created POC for Flume implementation
- Involved in reviewing both functional and non-functional aspects of the business model
- Took the lead in communicating and presenting the models to business customers and executives
Environment: Hadoop, HDFS, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Hive, Oozie, Core Java, Hortonworks Distribution, Linux
Confidential
Business Intelligence/ETL Developer
Responsibilities:
- Involved in design & development of operational data source and data marts in Oracle
- Reviewed source data and recommended data acquisition and transformation strategies
- Involved in conceptual, logical and physical data modeling and used star schema in designing the data warehouse
- Designed ETL process using Teradata to load the data from various source databases and flat files to target data warehouse in Oracle
- Used PowerMart Workflow Manager to design sessions and event wait/raise, assignment, e-mail and command tasks to execute mappings
- Created parameter-based mappings and Router and Lookup transformations
- Involved in migration projects moving data from Oracle/DB2 data warehouses to Teradata
- Optimized mappings using transformation features like Aggregator, Filter, Joiner, Expression and Lookups
- Created daily and weekly workflows and scheduled them to run based on business needs
Environment: Data modeling, SQL Server SSIS, SSRS, Oracle 10g, Teradata 6, XML, TOAD, SQL, PL/SQL, IBM AIX, UNIX Shell Scripts, Web Intelligence, DSBASIC, Cognos, Erwin, STAR team, Remedy, Maestro job scheduler, Mercury Quality Center, Control-M
Confidential
Java Developer
Responsibilities:
- Involved in the core product development using J2EE, JSF and Hibernate
- Actively involved in full life cycle object-oriented application development: object modeling, database mapping and GUI design
- Used JavaScript to perform client side validations and Struts-Validator framework for server-side validation
- Worked on requirements gathering and high-level design following the Waterfall model
- Created data access using SQL and PL/SQL stored procedures
- Used Hibernate annotations with Java for various stages in the application
- Built web services upon SOAP to export and import attachments from file to associated applications
- Developed DAOs (data access objects) using Spring Framework
- Deployed the components into WebSphere Application Server
- Used HTML/CSS and JavaScript for UI development
- Wrote SQL queries including joins, triggers, stored procedures and views using MySQL
- Implemented the JSPs and EJBs in the JSF Framework to handle the workflow of the application
- Developed unit test cases and used JUnit for unit testing of the application
Environment: Java, J2EE, Struts, SQL, JAX-RPC, XML, RAD, WebSphere, MQ, Agile, JSPs, SOAP
