
Big Data Developer Resume

Los Angeles


  • Over 7 years of total IT experience, including 3 years in Big Data technologies and 4 years implementing data warehousing solutions using IBM DataStage V8 and V11.
  • Extensive experience with Hadoop ecosystem components such as HDFS, MapReduce, Oozie, Pig, Hive, Sqoop, Flume, Kafka, Impala, HBase, and ZooKeeper.
  • Experience with Apache Spark, Spark SQL, and NoSQL databases such as Cassandra, MongoDB, and HBase.
  • Experience installing, configuring, and maintaining Hadoop clusters, including YARN configuration, on Cloudera, Hortonworks, and AWS.
  • Experienced in integrating Hadoop with Apache Storm and Kafka. Expertise in loading clickstream data from Kafka into HDFS, HBase, and Hive through Storm.
  • Expertise in the Scala programming language and Spark Core.
  • Experience benchmarking Hadoop clusters to tune them for the best performance.
  • Good knowledge of Amazon S3 buckets, DynamoDB, and Redshift.
  • Very good understanding of SQL, ETL, and data warehousing technologies.
  • Familiar with all stages of the software development life cycle (SDLC), issue tracking, version control, and deployment.
  • Extensively wrote, tuned, and profiled MapReduce jobs, including advanced MapReduce, in Java.
  • Experience writing shell scripts, cron automation, regular expressions, and MRUnit tests.
  • Hands-on experience with compression codecs such as Snappy and BZIP2.
  • Implemented Oozie workflows using Sqoop, MapReduce, Hive, Java, and shell actions.
  • Good knowledge of working with Avro and Parquet formats.
  • Excellent knowledge of the data flow lifecycle and of implementing transformations and analytic solutions.
  • Extended Hive and Pig core functionality by writing custom UDFs and creating SerDes in Hive.
  • Sound knowledge of designing data warehousing applications using tools such as Teradata, Oracle, and SQL Server.
  • Experience using the Talend ETL tool.
  • Knowledge of the Java Virtual Machine (JVM) and multithreaded processing.
  • Developed a web services module for integration using SOAP and REST.
  • Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
  • Used DataStage Version Control to migrate the project from one environment to another.
  • Experience using Parallel Extender for parallel processing to improve job performance with bulk data sources. Worked with most of the parallel stages, applying different partitioning techniques.
  • Worked with various databases, including Oracle 10g/9i/8i/8.0/7.x, DB2, MS Access, and SQL Server, and with other major relational database platforms.
  • Excellent analytical, problem solving, communication and interpersonal skills, with ability to interact with individuals at all levels.
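For illustration only: a minimal plain-Python sketch of the map, shuffle, and reduce phases behind the MapReduce jobs mentioned above. It is not code from any of the projects listed. The word-count task and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and a real Hadoop job would express the same phases as Java Mapper and Reducer classes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: sum the counts collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases, as the framework would between mappers and reducers."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big", "data pipeline"]))
```

The shuffle step is where Hadoop moves data across the network, which is why combiners and compression codecs such as Snappy matter for performance.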


Big Data Technologies: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Scala, Spark, Python, PySpark, MongoDB with Python, Neo4j, Cassandra.

ETL Tools: IBM InfoSphere DataStage 11.3/9.1/8.5/8.1 (Manager, Designer, Director, Administrator), DataStage PX (Parallel Extender), QualityStage.

Databases: Oracle 10g/9i/8i/8.x/7.x, DB2 9.0/8.0/7.0, MS SQL Server 2005/7.0/6.5, MS Access 2000, SQL*Plus, SQL*Loader, TOAD 7.0, and Developer 2000

Operating System: Sun Solaris 2.7/2.6, HP-UX 10.2/9.0, IBM AIX 4.3/4.2, Linux, MS DOS 6.22, Win 3.x/95/98/XP, Win NT 4.0, Sun Ultra, HP9000, IBM RS6000, AS400

Programming Skills: UNIX Shell Scripting, SQL, PL/SQL, SQL*Plus 3.3/8.0, Business Intelligence, C, C++, Java, JavaScript, SQL*Loader, VB, ASP, COBOL, HTML, XML.

Data Modeling Tools: Dimensional Data Modeling, Star Join Schema Modeling, Snowflake Modeling, Fact and Dimension Tables, Physical and Logical Data Modeling, Enterprise Database Integration and Management, Microsoft Visual Studio.

BI Tools: OBIEE 11g, BI Publisher, Tableau 9.2, Cognos 10.x


Confidential, Los Angeles

Big Data Developer


  • Involved in the design, development, and testing phases of the software development life cycle.
  • Installed and configured Hadoop Ecosystem components.
  • Used the Spark DataFrame API to process structured and semi-structured files and load the results back into S3 buckets.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Performed text analytics and processing using the in-memory computing capabilities of Spark.
  • Automated and scheduled Sqoop jobs using Unix shell scripts.
  • Imported data from Oracle sources into HDFS using Sqoop.
  • Developed a streaming data pipeline using Kafka and Storm to store data into HDFS.
  • Implemented a proof of concept (POC) with Spark SQL to parse complex JSON records.
  • Automated the extraction of data from warehouses and web logs by developing workflows and coordinator jobs in Oozie.
  • Developed MapReduce jobs to convert data files into Parquet format.
  • Executed Hive queries on Parquet tables to perform data analysis that met the business requirements.
  • Created table definitions and made the contents available as a schema-backed RDD (SchemaRDD).
  • Developed business-specific custom UDFs in Hive and Pig.
  • Exported the aggregated data to Oracle using Sqoop for reporting on the Tableau dashboard.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
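As a rough illustration of what interpreting nested JSON involves, the plain-Python sketch below flattens nested objects into dot-separated column names, similar to how Spark SQL lets you address a nested struct field as user.geo.city. It is an assumption-laden stand-in, not the POC itself: the record layout and the flatten helper are hypothetical, and arrays are left untouched rather than exploded.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON objects into dot-separated column names."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column prefix.
            flat.update(flatten(value, column))
        else:
            # Scalars and lists become leaf columns as-is.
            flat[column] = value
    return flat


# Hypothetical clickstream-style record, used only for the example.
raw = '{"user": {"id": 7, "geo": {"city": "Los Angeles"}}, "event": "click"}'
print(flatten(json.loads(raw)))
```

With Spark available, the same result would normally come from reading the file as a DataFrame and selecting the nested columns directly instead of flattening records by hand.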

Environment: HDFS, MapReduce, Kafka, Storm, S3, Parquet, Pig, Hive, Sqoop, Spark, Oracle, Oozie, RedHat Linux, Tableau.

Confidential, Detroit

Big Data Developer


  • Involved in installing the cluster and configuring Hadoop ecosystem components.
  • Worked with the Hadoop administrator on rebalancing blocks and decommissioning nodes in the cluster.
  • Responsible for managing data coming from different sources.
  • Ingested data into HDFS using Flume and Kafka.
  • Imported and exported data between RDBMS and HDFS on a regular basis using Sqoop.
  • Developed, monitored, and optimized MapReduce jobs for data cleaning and preprocessing.
  • Built a data pipeline using Pig and Java MapReduce.
  • Implemented MapReduce jobs to write data into Avro format.
  • Automated all jobs that pull data and load it into Hive tables using Oozie workflows.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Developed custom SerDes in Hive to meet specific requirements.
  • Implemented pattern-matching algorithms with regular expressions, built profiles using Hive, and stored the results in HBase.
  • Used Maven to build the application.
  • Implemented Unit Testing using MRUnit.
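A minimal sketch of regular-expression pattern matching used to build simple per-user profiles, in plain Python rather than Hive or HBase. The log format, the LOG_PATTERN expression, and the build_profiles function are all hypothetical; the point is only that lines are matched against named groups, non-matching lines are dropped, and matches are aggregated by key.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log format: "<user_id> <ACTION> <url>", e.g. "u42 VIEW /products/7".
LOG_PATTERN = re.compile(r"^(?P<user>u\d+)\s+(?P<action>[A-Z]+)\s+(?P<url>\S+)$")


def build_profiles(lines):
    """Count actions per user; lines that do not match the pattern are skipped."""
    profiles = defaultdict(Counter)
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            profiles[match["user"]][match["action"]] += 1
    # Convert to plain dicts so the result is easy to store or serialize.
    return {user: dict(counts) for user, counts in profiles.items()}


logs = ["u1 VIEW /home", "u1 CLICK /ad/3", "u2 VIEW /home", "garbage line", "u1 VIEW /cart"]
print(build_profiles(logs))
```

In a Hadoop setting the matching would typically run in a mapper or a Hive UDF, with the per-user counts written out as HBase rows keyed by user ID.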

Environment: HDP, HDFS, Flume, Kafka, Sqoop, Pig, Hive, MapReduce, HBase, Oozie, MRUnit, Maven, Avro, RedHat Linux, RDBMS.

Confidential, Los Angeles

DataStage Developer


  • Designed, compiled, tested, scheduled, and ran DataStage jobs.
  • Worked with various techniques such as schema bound views, partitioning, and ETL/Query optimization.
  • Worked across all phases of the development life cycle, including data cleansing, data integration, data conversion, and performance tuning.
  • Developed Server jobs for extracting, transforming, integrating and loading data to targets.
  • Expertise in data warehousing techniques such as data cleansing, slowly changing dimensions (SCD), and change data capture (CDC).
  • Used DataStage Version Control to migrate the project from one environment to another.
  • Worked with SQL Server and created jobs to load data from SQL Server to Oracle.
  • Involved in extracting, transforming, loading, and testing data from XML files, flat files, Oracle, and DB2 using DataStage jobs.
  • Involved in performance tuning of the DataStage jobs and queries.
  • Wrote SQL in DB2 for use in DataStage jobs and for testing the data.
  • Troubleshot and performance-tuned ETL jobs.
  • Used DataStage Manager to import metadata from the repository, create new job categories, and create new data elements.
  • Worked on NDM jobs for secure file transfers between servers.
  • Used Autosys scripts to schedule jobs.
  • Used Microsoft Visual Studio to migrate jobs from one environment to another.
  • Used TFS to check in jobs from DataStage and files from the backend, and used a DSX generator to convert the files to .dsx format for migration.
  • Provided front-end and back-end support for Hyperion Interactive Reporting.
  • Supported production instances through Service Manager tickets.
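The change data capture mentioned above boils down to comparing a before and an after snapshot by key. The plain-Python sketch below shows that idea under simple assumptions (each snapshot fits in memory as a dict keyed by primary key). It is not DataStage's Change Capture stage; the capture_changes function and the sample rows are hypothetical.

```python
def capture_changes(before: dict, after: dict) -> dict[str, list]:
    """Compare two snapshots keyed by primary key and classify the differences."""
    inserts = sorted(k for k in after if k not in before)
    deletes = sorted(k for k in before if k not in after)
    # A key present in both snapshots whose row values differ is an update.
    updates = sorted(k for k in after if k in before and after[k] != before[k])
    return {"insert": inserts, "update": updates, "delete": deletes}


# Hypothetical customer rows: key -> (name, city).
before = {1: ("Ann", "LA"), 2: ("Bob", "Detroit"), 3: ("Cy", "NYC")}
after = {1: ("Ann", "LA"), 2: ("Bob", "Chicago"), 4: ("Di", "Austin")}
print(capture_changes(before, after))
```

Unchanged keys (here, key 1) produce no output, which is what keeps incremental loads small compared with full reloads.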

Environment: IBM InfoSphere DataStage 8.5, 9.5 (DataStage Designer, DataStage Director, DataStage Manager, DataStage Administrator), Autosys, Microsoft Visual Studio, Oracle, UNIX Shell Programming.


DataStage Developer


  • Extensively used DataStage Designer to develop various Parallel jobs to extract, cleanse, transform, integrate and load data into Enterprise Data Warehouse tables.
  • Worked with the Business analysts and the DBA for requirements gathering, business analysis, testing, and project coordination.
  • Worked with DataStage Manager to import and export metadata and DataStage components between projects.
  • Involved in design and in source-to-target mappings from sources to operational staging targets using a star schema, and implemented logic for slowly changing dimensions.
  • Involved in performance tuning of parallel jobs using performance statistics.
  • Used various standard and custom routines in DataStage jobs.
  • Tuned the Parallel jobs for better performance.
  • Responsible for adopting standard naming conventions for stages and links.
  • Created and edited the design specification documents for the jobs.
  • Participated in discussions with the team leader, group members, and technical manager on technical and business requirement issues.
  • Developed parameterized, reusable DataStage jobs that can run as multiple instances.
  • Performed unit testing of the developed jobs to ensure that they met the requirements.
  • Coordinated with team members when business requirements or the data mart schema changed.
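The slowly changing dimension logic mentioned above is usually Type 2: instead of overwriting a changed attribute, the current row is expired and a new version is appended. Below is a minimal in-memory sketch of that technique in plain Python. The DimRow layout, the apply_scd2 function, and the sample data are hypothetical, and a DataStage job would implement the same rule with lookup, change capture, and database stages.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the open-ended current version
    is_current: bool = True


def apply_scd2(dimension: list[DimRow], incoming: dict[int, str], load_date: date) -> list[DimRow]:
    """Expire changed current rows and append new versions (SCD Type 2)."""
    current = {row.customer_id: row for row in dimension if row.is_current}
    result: list[DimRow] = []
    for row in dimension:
        new_city = incoming.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Close out the old version instead of overwriting it.
            result.append(replace(row, valid_to=load_date, is_current=False))
        else:
            result.append(row)
    for customer_id, city in incoming.items():
        old = current.get(customer_id)
        if old is None or old.city != city:
            # New key, or changed attribute: insert a fresh current version.
            result.append(DimRow(customer_id, city, valid_from=load_date))
    return result


dim = [DimRow(1, "LA", date(2020, 1, 1)), DimRow(2, "Detroit", date(2020, 1, 1))]
for row in apply_scd2(dim, {1: "LA", 2: "Chicago", 3: "Austin"}, date(2021, 6, 1)):
    print(row)
```

Keeping the expired rows with their validity dates is what lets fact tables join to the attribute values that were in effect at the time of each transaction.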

Environment: DataStage 8.x (Designer, Director, Manager, Administrator), Oracle 8i, PL/SQL, UNIX Shell Programming, Windows NT.
