
Big Data Hadoop Developer Resume

Pittsburgh, PA


  • 11 years of extensive IT experience, including work as a Big Data/Hadoop developer, master-slave architecture designer, and Hive/Pig developer.
  • Experience in Big Data engineering and analytics in a Hadoop environment, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Sqoop.
  • Experienced in processing large datasets of different forms including structured, semi-structured and unstructured data.
  • Expertise in SQL Server 2005/2008 Reporting Services (SSRS) and SQL Server 2005 Integration Services (SSIS) for Business Intelligence (BI).
  • Hands-on experience with Cloudera and with multi-node clusters on Hortonworks Sandbox.
  • Expertise in designing tables in Hive and MySQL, and in using Sqoop to import and export database data to and from HDFS.
  • Experienced in data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, machine learning, and advanced data processing.
  • Experience optimizing ETL workflows that process data coming from different sources.
  • ETL data extraction, management, aggregation, and loading into HBase.
  • Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL) queries.
  • Extensive knowledge of ZooKeeper for various types of centralized configuration.
  • Experienced in integrating various data sources, including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
  • Experience managing and reviewing Hadoop log files using Flume and Kafka; also developed Pig UDFs and Hive UDFs to pre-process data for analysis.
  • Hands-on experience using Spark to handle streaming data.
  • Hands-on experience with Spring Tool Suite for developing Scala applications.
  • Shell scripting to load and process data from various Enterprise Resource Planning (ERP) sources.
  • Hands-on experience writing Pig Latin and using the Pig interpreter to run MapReduce jobs.
  • Expertise in Hadoop components such as YARN, Pig, Hive, HBase, Flume, and Oozie, and in shell scripting with Bash.
  • Good understanding of Hadoop architecture and its daemons, including NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.
  • Hands-on experience ingesting data into a data warehouse using various data loading techniques.
  • Good knowledge of data modeling and data mining to model data according to business requirements.
  • Hands-on experience with the MapReduce and Pig programming models, and with installing and configuring Hadoop, HBase, Hive, Pig, Sqoop, and Flume using Linux commands.
  • Working knowledge of Agile and Waterfall development models.


Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala

Operating System: Windows, Linux, Unix.

Languages: Java, J2EE, SQL, Python, Scala

Databases and SQL: MS SQL Server 2012/2008, Oracle 11g (PL/SQL) and MySQL 5.6, MongoDB

Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.

Version Control Tools: Git, SVN, CVS

IDE: IBM RAD, Eclipse, IntelliJ

Tools: TOAD, SQL Developer, ANT, Log4J

Web Services: WSDL, SOAP.

ETL: Talend ETL, Talend Studio

Web/App Server: UNIX server, Apache Tomcat


Confidential, Pittsburgh, PA

Big Data Hadoop Developer


  • Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
  • Worked on the Hortonworks Data Platform Hadoop distribution, using Hive to store, retrieve, and query data.
  • Implemented optimized Hive joins to gather data from different sources and run ad-hoc queries on them.
  • Implemented custom aggregate functions using Spark SQL and performed interactive querying.
  • Coordinated with Hortonworks and with the development and operations teams on platform-level issues.
  • Worked extensively on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
  • Worked with Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
  • Used Sqoop to transfer data between databases and HDFS, and used Kafka to stream log data from servers.
  • Used Pig to perform data transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
  • Implemented various analytical algorithms as MapReduce programs applied to data in HDFS.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
  • Familiar with MongoDB write concern to avoid loss of data during system failures.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Performed extensive CRUD operations such as put, get, scan, delete, and update on HBase.
  • Wrote Hive generic UDFs to perform business logic operations at the table level.
  • Developed Oozie workflows to automate loading data into HDFS and preprocessing it with Pig, Hive, and Sqoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch.
  • Used Apache Kafka as a messaging system to load log data and application data into HDFS.
  • Developed a proof of concept (POC) in Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked with various file formats (Text, Avro, Parquet) and compression codecs (Snappy, bz2, gzip).
  • Implemented test scripts to support test driven development and continuous integration.
  • Scheduled cron jobs for file system checks using fsck and wrote shell scripts to generate alerts.
  • Performed data scrubbing and processing with Oozie.
  • Provided technical support for the Research in Information Technology program.
  • Managed and upgraded Linux and OS X server systems.
  • Responsible for installation, configuration, and management of Linux systems.

Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra

Confidential, Conshohocken, PA

Hadoop Developer/Data Engineer


  • Good understanding of, and related experience with, Hadoop stack internals, Hive, Pig, and MapReduce. Experience setting up clusters using Cloudera Manager.
  • Wrote MapReduce jobs to parse web logs stored in HDFS.
  • Involved in optimizing Hive queries and joins to get better results for ad-hoc Hive queries.
  • Explored Spark for improving the performance of, and optimizing, existing algorithms in Hadoop.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded them into MongoDB for further analysis; also extracted files from MongoDB through Flume and processed them.
  • Involved in creating Oozie workflow and coordinator jobs that kick off Hive jobs on time for data availability.
  • Generated reports from Hive tables for visualization purposes.
  • Experience setting up a fan-in workflow in Flume, a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Understanding of managed distributions of Hadoop, like Cloudera and Hortonworks.
  • Developed Spark projects in Scala and ran them using spark-submit.
  • Hands-on experience developing MapReduce programs with Apache Hadoop for analyzing Big Data.
  • Expertise in implementing ad-hoc MapReduce programs using Pig scripts.
  • Experience importing and exporting data between RDBMS and HDFS, Hive tables, and HBase using Sqoop.
  • Wrote Flume configuration files for importing streaming log data into HBase.
  • Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
  • Used Flume with a spooling directory to load data from the local file system (LFS) into HDFS.
  • Wrote and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, as well as DML and DDL.
  • Experience importing streaming data into HDFS using Flume sources and sinks, and transforming the data using Flume interceptors.
  • Knowledge of managing Kafka clusters; created several topologies to support real-time processing requirements.
  • Hands-on experience migrating complex MapReduce programs into Apache Spark RDD transformations.
  • Design and programming experience developing Internet applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX, and web-based development tools.

Environment: Hadoop, HDFS, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Kafka, HBase, Oozie, Java, SQL scripting, Linux shell scripting, Eclipse and Cloudera.

Confidential, Charlotte, NC

DW/ BI Developer


  • Designed the target schema definition and ETL jobs using DataStage.
  • Used DataStage Director to view and clear logs and to validate jobs.
  • Mapped data items from source systems to the target system.
  • Tuned the performance of ETL jobs.
  • Involved in creating stored procedures, views, tables, and constraints.
  • Generated reports from the cubes by connecting to the Analysis server from SSRS.
  • Designed and created report templates, bar graphs, and pie charts.
  • Modified and enhanced existing SSRS reports.

Environment: MS SQL Server Enterprise 2000, InfoSphere DataStage 7.5, Oracle, XML, Unix Shell Script.

Confidential, Wayne, PA

SQL BI Developer


  • Worked with the client to understand and analyze business requirements and provide possible technical solutions.
  • Reviewed and modified software programs to ensure their technical accuracy and reliability.
  • Translated business requirements into software applications and models.
  • Worked with database objects such as tables, views, synonyms, sequences and database links as well as custom packages tailored to business requirements.
  • Built complex queries using SQL and wrote stored procedures using PL/SQL.
  • Used bulk collections, indexes, and materialized views to improve query execution.
  • Ensured compliance with standards and conventions in developing programs.
  • Created SQL scripts to convert legacy data (including validations) and load it into the tables.
  • Created complex stored procedures, triggers, functions, indexes, tables, views, joins, and other SQL code to implement business rules.
  • Performed System Acceptance Testing (SAT) and User Acceptance Testing (UAT) for databases. Performed unit testing and path testing for the application.
  • Analyzed available data from MS Excel, MS Access, and SQL Server.
  • Involved in implementing data integrity validation checks.
  • Troubleshot and resolved complex issues.

Environment: MS SQL Server 2005/2008, Windows 2003/2008, SSIS, SQL Server Management Studio, SQL Server Business Intelligence Studio, SQL Profiler, Microsoft Excel and Access.

Confidential, OH

SQL Developer


  • Accountable for the business definition and integration of source systems with the data warehouse to complete consolidation of vendor funds and required reporting for retail across the enterprise.
  • Researched existing systems and processes, developed / documented business requirements and held meetings with Finance, IT and Retail Operations teams to complete the development schedule.
  • Designed and developed summary/aggregate tables using SQL Server procedures and views.
  • Worked with the architects and business analysts to understand the reporting design requirements.
  • Interacted with SMEs and end users for requirement gathering, process improvement, problem identification, project analysis and review meetings, and progress reporting.
  • Designed an SSIS package for automated data migration, transforming data from SQL Server 2000 to SQL Server 2008.
  • Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
  • Created several dashboards and scorecards with Key Performance Indicators (KPIs) in SQL Server Analysis Services (SSAS) 2005/2008.
  • Well versed in defining, creating, and handling data sources, data source views, and parameterized reports in SSRS 2008.
  • Designed OLAP cubes, data models, dashboard reports, and scorecards according to business requirements.
  • Served as technical expert guiding choices to implement analytical and reporting solutions for complex financial scenarios.
  • Created/Updated Stored Procedures, Triggers, Functions, Views, and Indexes with extensive use of T-SQL.
  • Designed and developed SSIS (ETL) packages for loading data from Oracle and flat files (3 GB) into a SQL Server database.

Environment: SQL Server 2005/2008, SQL BI Suite (SSIS, SSAS, SSRS), Enterprise Manager, PPS, XML, MS PowerPoint, MS Project, MS Access 2003 & Windows Server 2003.
