- 7+ years of IT experience in software development and support, including developing strategic methods for deploying big data technologies, specifically Hadoop, to efficiently solve big data processing requirements.
- 2 years of hands-on experience with the Hadoop framework and its ecosystem, including but not limited to Hadoop MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, and Flume.
- Hands-on experience writing MapReduce jobs (in native Java), Pig scripts, and Hive queries for various business use cases.
- Hands-on experience optimizing MapReduce jobs with Mappers, Reducers, Combiners, and Partitioners to deliver the best results on large datasets (see the job configuration sketch after this summary).
- Familiarity with Hadoop administration, configuration management, monitoring, debugging, and performance tuning.
- Hands-on experience writing user-defined functions to provide custom Hive and Pig capabilities.
- Working experience with Pig Latin, a scripting language for processing data on HDFS. Experience designing both time-driven and data-driven automated workflows to run Hadoop MapReduce and Pig jobs.
- Working experience setting up SSH, SCP, and SFTP connectivity between UNIX hosts and writing HDFS admin shell commands.
- Involved in writing custom UDFs by extending Hive and Pig core functionality.
- Hands-on experience using Sqoop to import data from RDBMS into HDFS and export it back.
- Deep understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
- Expertise in designing and developing web and enterprise applications using technologies such as JSP, Servlets, Struts, Hibernate, Spring, JDBC, XML, AJAX, SOAP, and web services.
- Experience writing SQL queries, PL/SQL code, packages, and triggers in Oracle and SQL Server.
- Good team player who takes initiative in implementing tasks and sharing knowledge with other team members.
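A minimal sketch of the kind of MapReduce job configuration summarized above, assuming a hypothetical word-count style use case; the class names (WordCountDriver, TokenMapper, SumReducer, FirstCharPartitioner) and the partitioning rule are illustrative, not actual project code.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Emits (word, 1) for every token in the input line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Used both as the Combiner (map-side pre-aggregation) and the final Reducer.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Illustrative custom Partitioner: routes keys by their first character.
    public static class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            if (key.getLength() == 0) {
                return 0;
            }
            return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setPartitionerClass(FirstCharPartitioner.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reusing the reducer as the combiner pre-aggregates counts on the map side, which is the usual way to cut shuffle volume for associative aggregations on large datasets.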
Databases: SQL Server 2008 R2/2012/2014, MS Access, Oracle 9i/10g/11g, MySQL
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C++, Java, J2EE, PL/SQL, HiveQL, Shell Scripting
Hadoop Stack: MapReduce, HDFS, Hive, Sqoop, Pig, HBase, ZooKeeper, Flume
ETL & Reporting: SSRS 2005/2008, SSIS 2005/2008
Operating Systems: Windows NT Server, Windows XP/8, UNIX, Red Hat Linux
Version Control Tools: CVS, Tortoise SVN, PVCS
Web Technologies: JSP, Servlet, HTML, JavaScript, XML, CSS, AJAX, jQuery
Confidential, Pittsburgh, PA
- Collected log data from the data warehouse and integrated it into HDFS using Sqoop.
- Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Analyzed the data with Hive Query Language and Pig Latin scripts.
- Wrote Hive UDFs to analyze the data; involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the UDF sketch after this project).
- Developed and optimized Pig and Hive UDFs (user-defined functions) to pull in functionality from external languages as and when required.
- Followed Pig and Hive best practices for tuning.
- Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
- Supported MapReduce programs running on the cluster and developed Java UDFs for operational assistance.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed several shell scripts that act as wrappers to start these Hadoop jobs and set their configuration parameters.
- Developed a workflow using Sqoop to load data into HDFS and pre-process it with Pig.
- Used compression techniques (Snappy) with appropriate file formats to make efficient use of HDFS storage.
- Used HBase for random data access and record-level updates.
Environment: MapReduce, HDFS, Sqoop, Linux, Hadoop, Pig, Hive, HBase.
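A minimal sketch of a Hive UDF of the kind referenced in this project, assuming a simple string-normalization use case; the class name, function name, and behavior are illustrative only.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: trims and lower-cases a string column.
@Description(name = "normalize_str", value = "_FUNC_(str) - trims and lower-cases the input string")
public class NormalizeString extends UDF {

    private final Text result = new Text();

    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        result.set(input.toString().trim().toLowerCase());
        return result;
    }
}
```

Such a UDF is typically packaged in a JAR, registered from the Hive CLI with ADD JAR and CREATE TEMPORARY FUNCTION, and then used like any built-in function in HiveQL.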
Confidential, Reston, VA
Senior Hadoop Developer
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Fully involved in the requirements analysis phase and worked on technical specification documents.
- Developed Sqoop scripts to enable data interchange between Pig and the MySQL database.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics. Tested raw data and executed performance scripts.
- Created Hive tables to store the processed results in a tabular format.
- Involved in writing script files for processing data and loading it to HDFS, and in writing HDFS CLI commands.
- Developed UNIX shell scripts for creating reports from Hive data.
- Managed and reviewed Hadoop log files.
- Responsible for building scalable distributed data solutions using Hadoop.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms (see the compression sketch after this project).
- Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
- Worked with different file formats such as text files, SequenceFiles, and Record Columnar (RC) files.
Environment: Hadoop 1.2.1, Hive Query Language, MapReduce, Sqoop, Pig, HDFS, HBase, ZooKeeper, PL/SQL, Flume.
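A minimal sketch of the compression tuning mentioned above, assuming Snappy libraries are installed on the cluster; the property names follow the Hadoop 1.x era implied by the environment, and the helper class is illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class CompressionConfig {

    // Applies Snappy compression to intermediate (map) output and to the final job output.
    public static void enableSnappy(Job job) {
        Configuration conf = job.getConfiguration();

        // Compress the shuffle data written by the mappers (Hadoop 1.x property names).
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec", SnappyCodec.class, CompressionCodec.class);

        // Write the final output as block-compressed SequenceFiles.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
    }
}
```

Compressing map output reduces shuffle I/O, while block-compressed SequenceFiles keep the stored data splittable for downstream jobs.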
- Used RUP development methodology for the application.
- Worked on the software design and development.
- Involved in creating SRS and design specification documents using UML use cases, sequence diagrams and activity diagrams.
- Installed and configured JCO in the system and created the runtime environment in which the application was deployed.
- Used Struts Framework to implement MVC architecture for this application.
- Retrieved the custom-made configuration and transaction tables from the SAP system into the user interface.
- Used internationalization to make the application available in different locales and regions, so users from different regions can work in their native language (see the resource-bundle sketch after this project).
- Generated Crystal Reports to display information corresponding to sales orders.
- Used Rational ClearCase for source code configuration management and ClearQuest for bug tracking.
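A minimal sketch of the internationalization approach described above, assuming standard Java ResourceBundle property files; the bundle name, key, and locales are illustrative.

```java
import java.util.Locale;
import java.util.ResourceBundle;

public class MessageHelper {

    // Looks up a localized message for the given user's locale.
    // Assumes property files such as messages_en.properties and messages_de.properties on the classpath.
    public static String getMessage(String key, Locale userLocale) {
        ResourceBundle bundle = ResourceBundle.getBundle("messages", userLocale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        // The same key resolves to different text depending on the user's region.
        System.out.println(getMessage("order.confirmation", Locale.GERMANY));
    }
}
```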
Confidential, Austin, TX
- Involved in designing, developing and configuring server side J2EE components.
- Responsible for the communication architecture between JSPs and controllers, and between EJBs and other components.
- Involved in designing UML class diagrams, activity diagrams and sequence diagrams.
- Used the Apache Beehive framework to implement page flows for the application in the WebLogic Portal environment.
- Implemented controllers, forms, and validations using the Apache Beehive framework.
- Involved in creating user interfaces using HTML, JSP, and NetUI tag libraries.
- Implemented the Factory, Service Controller, Business Delegate, and Session Façade design patterns.
- Worked on REST web services to invoke VAS components.
- Used the JAX-RS API to implement the REST web services (see the resource sketch after this list).
- Developed the marketplace product listing to list all products based on product categories and functional areas.
- Developed application-level logging using Log4j.
- Used CVS for source code configuration management.
- Performed code review and unit testing.
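A minimal sketch of a JAX-RS resource of the kind described above, assuming a hypothetical product-listing endpoint; the path, query parameter, and returned data are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

// Illustrative REST resource exposing a product listing filtered by category.
@Path("/products")
public class ProductResource {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public List<String> listProducts(@QueryParam("category") String category) {
        // In a real service this would delegate to a business component;
        // a fixed list stands in for the lookup here.
        if ("books".equals(category)) {
            return Arrays.asList("Hadoop Guide", "Java Patterns");
        }
        return Arrays.asList("Sample Product A", "Sample Product B");
    }
}
```

Serializing the list as JSON assumes a JSON provider (such as Jackson) is registered with the JAX-RS runtime.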