Big Data Developer Resume

Irving, TX


  • Big Data/Hadoop, Data Analysis, and Data Modeling professional with over 7 years of experience in applied information technology.
  • Proficient in Data Modeling, DW, Big Data, Hadoop, Data Integration, Master Data Management, Data Migration, Operational Data Store, and BI Reporting projects, with a deep focus on design, development, and deployment of BI and data solutions using custom, open source, and off-the-shelf BI tools.
  • Experienced in technical consulting and end-to-end delivery with data modeling, data governance, and design, development, and implementation of solutions.
  • Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
  • Expertise in configuring monitoring and alerting tools, such as AWS CloudWatch, according to requirements.
  • Proficiency in Big Data Practices and Technologies like HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, Kafka.
  • Data Warehousing: Full life-cycle project leadership, business-driven requirements, capacity planning, gathering, feasibility analysis, enterprise and solution architecture, design, construction, data quality, profiling and cleansing, source-target mapping, gap analysis, data integration/ETL, SOA, ODA, data marts, Inmon/Kimball methodology, Data Modeling for OLTP, canonical modeling, and Dimensional Modeling for data warehouse star/snowflake design.
  • Experience in BI/DW solutions (ETL, OLAP, data marts), Informatica, and BI reporting tools like Tableau and QlikView; also experienced in leading teams of application, ETL, and BI developers and testers.
  • Experience in developing, supporting, and maintaining ETL (Extract, Transform and Load) processes using Talend Integration Suite.
  • Worked on Informatica PowerCenter tools: Designer, Repository Manager, Workflow Manager.
  • Proficiency in multiple databases like MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
  • Extensive use of Talend ELT, database, data set, HBase, Hive, Pig, HDFS, and Sqoop components.
  • Experience in installation, configuration, and administration of Informatica PowerCenter 8.x, 9.1 Client/Server.
  • Experienced in the Hadoop ecosystem and Big Data components including Apache Spark, Scala, Python, HDFS, MapReduce, and Kafka.
  • Expertise in reading and writing data from and to multiple source systems such as Oracle, HDFS, XML, delimited files, Excel, positional, and CSV files.
  • Experience in Business Intelligence (BI) project development and implementation using the MicroStrategy product suite, including MicroStrategy Desktop/Developer, Web, OLAP Services, Administrator, and Intelligence Server.
  • Logical and physical database designing like Tables, Constraints, Index, etc. using Erwin, ER Studio, TOAD Modeler and SQL Modeler.
  • Good understanding and hands on experience with AWS S3 and EC2.
  • Good experience with the programming languages Python and Scala.
  • Experience in performance tuning of Informatica (sources, mappings, targets, and sessions) and tuning SQL queries.
  • Excellent knowledge of creating reports in SAP Business Objects, including Webi reports for multiple data providers.
  • Created and maintained UDB DDL for databases, table spaces, tables, views, triggers, and stored procedures. Resolved lock escalations, lock-waits, and deadlocks.
  • Experience in working with business intelligence and data warehouse software, including SSAS, Pentaho, Cognos Database, Amazon Redshift, or Azure Data Warehouse.
  • Extensive ETL testing experience using Informatica 9x/8x, Talend, Pentaho.
  • Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT & Dimension tables.
  • Experience with relational (3NF) and dimensional data architectures. Experience in leading cross-functional, culturally diverse teams to meet strategic, tactical and operational goals and objectives.
  • Good exposure to BI reporting with MicroStrategy 8i, 9i and Tableau; SQL programming; and RDBMS: Teradata, Oracle, and SQL Server.
  • Expertise in relational data modeling (3NF) and dimensional data modeling.
  • Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements. Practical understanding of data modeling (dimensional and relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, and Fact and Dimension tables.


Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm and ZooKeeper.

Data Modeling Tools: Erwin, Oracle Designer, ER/Studio.

Cloud Platform: Azure, AWS.

NoSQL Databases: HBase, Cassandra, MongoDB.

ETL Tools: Pentaho, Informatica PowerCenter 9.6, etc.

Operating Systems: HP-UX, Red Hat Linux, Ubuntu Linux and Windows.

Web/Application servers: Apache Tomcat, WebLogic, JBoss.

Databases: Cassandra, MongoDB, DB2, SQL Server, MySQL, Teradata, Oracle 9i/10g/11g/12c.

Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer.

Version control: SVN, CVS, Git.

Web Services: REST, SOAP.

Languages: C, Python, Scala, PL/SQL, Pig Latin, HiveQL, Unix shell scripts.


Confidential, Irving, TX

Big Data Developer


  • Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Performed data transformations in Hive and used partitions and buckets for performance improvements.
  • Developed Spark scripts and UDFs using both the Spark DSL and Spark SQL for data aggregation, querying, and writing data back into RDBMS through Sqoop.
  • Designed and developed a Data Lake using Hadoop for processing raw and processed claims via Hive and Informatica.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Created ETL/Talend jobs, both design and code, to process data to target databases.
  • Worked with Hadoop infrastructure to store data in HDFS and used Spark / Hive SQL to migrate the underlying SQL codebase in Azure.
  • Experience in testing Big Data Hadoop (HDFS, Hive, Sqoop, and Flume), Master Data Management (MDM), and Tableau reports.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Experienced in loading real-time data into NoSQL databases like Cassandra.
  • Developed Pig scripts for transforming data, making extensive use of event joins, filters, and pre-aggregations.
  • Used Apache NiFi for data scrubbing and processing and for workflow automation and coordination.
  • Used Sqoop to import data into HDFS and Hive from Oracle database.
  • Worked on the implementation of a log producer in Scala that watches application logs, transforms incremental logs, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Used Talend for Big data Integration using Spark and Hadoop.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Optimized Hive queries to extract customer information from HDFS.
  • Used PolyBase for the ETL/ELT process with Azure Data Warehouse to keep data in Blob Storage with almost no limitation on data volume.
  • Involved in various phases of development; analyzed and developed the system following Agile Scrum methodology.
  • Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Analyzed the partitioned and bucketed data using Hive and computed various metrics for reporting.
  • Built Azure Data Warehouse Table Data sets for Power BI Reports.
  • Imported data from sources like HDFS/HBase into Spark RDDs.
  • Good experience in developing Hive DDLs to create, alter, and drop Hive tables.
  • Worked on BI reporting with AtScale OLAP for Big Data.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Implemented Kafka for streaming data and filtered and processed the data.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
  • Created Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote some Java code to capture global map variables and use them in the job.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Developed shell scripts for scheduling and automating the job flow.
  • Developed a workflow using NiFi to automate the tasks of loading data into HDFS.
  • Developed MapReduce jobs to calculate the total data usage of commercial routers in different locations, and developed MapReduce programs for data sorting in HDFS.
  • Performed load balancing of ETL processes and performance tuning of databases and ETL processing tools.
  • Loaded data from Teradata to HDFS using Teradata Hadoop connectors.

Environment: Spark, YARN, Hive, Pig, Scala, Mahout, NiFi, Python, Hadoop, Azure, DynamoDB, Kibana, NoSQL, Sqoop, MySQL.

United Airlines, Chicago, IL

Big Data Engineer


  • Installed Hadoop, MapReduce, HDFS, and AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Implemented solutions for ingesting data from various sources and processing data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Implemented Spark GraphX application to analyze guest behavior for data science segments.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked on batch processing of data sources using Apache Spark and Elasticsearch.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
  • Developed the Talend mappings using various transformations, sessions, and workflows. Teradata was the target database; the source was a combination of flat files, Oracle tables, Excel files, and a Teradata database.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
  • Created data pipelines as per business requirements and scheduled them using Oozie coordinators.
  • Worked with NoSQL database HBase in getting real time data analytics.
  • Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as MapReduce, Hive, Pig, and Sqoop.
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
  • Loaded data from different sources (databases and files) into Hive using Talend.
  • Conducted POCs for ingesting data using Flume.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Designed, developed and maintained data integration programs in Hadoop and RDBMS environment with both RDBMS and NoSQL data stores for data access and analysis.
  • Used all major ETL transformations to load the tables through Informatica mappings.
  • Created Hive queries and tables that helped the line of business identify trends by applying strategies on historical data before promoting them to production.
  • Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
  • Worked on Sequence files, RC files, Map side joins, bucketing, Partitioning for Hive performance enhancement and storage improvement.
  • Used Talend to extract, transform, and load data into the Netezza data warehouse from various sources like Oracle and flat files.
  • Developed Pig scripts to parse the raw data, populate staging tables, and store the refined data in partitioned DB2 tables for business analysis.
  • Worked on managing and reviewing Hadoop log files. Tested and reported defects following Agile methodology.
  • Conducted and participated in project team meetings to gather status and discuss issues and action items.
  • Provided support for research and resolution of testing issues.
  • Coordinated with the business for UAT sign-off.

Environment: Hadoop, Cloudera, Talend, Scala, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, YARN, NDM, Informatica, AWS, Windows & Microsoft Office.

Confidential, Bronx, NY

Sr. Data Modeler/Analyst


  • Served as Data Modeler/Analyst on the Data Architecture team, responsible for the conceptual, logical, and physical models for the Supply Chain project.
  • Created and maintained logical and physical data models for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
  • Involved in designing data warehouses and data lakes on conventional (Oracle, SQL Server) and high-performance big data (Hadoop: Hive and HBase) databases. Modeled data and designed, implemented, and deployed high-performance custom applications at scale on Hadoop/Spark.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
  • Loaded data into the MDM landing table for MDM base loads and Match and Merge.
  • Designed the ETL process using Talend to load from sources to targets through data transformations.
  • Translated business requirements into working logical and physical data models for OLTP and OLAP systems.
  • Created BTEQ, FastExport, MultiLoad, TPump, and FastLoad scripts for extracting data from various production systems.
  • Reviewed stored procedures for reports and wrote test queries against the source system (SQL Server-SSRS) to match the results with the actual report against the data mart (Oracle).
  • Owned and managed all changes to the data models. Created data models, solution designs, and data architecture documentation for complex information systems.
  • Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator.
  • Worked on normalization techniques. Normalized the data into 3rd Normal Form (3NF).
  • Performed gap analysis and dependency analysis for current & future systems.
  • Involved in Extract, Transform and Load (ETL) data from spreadsheets, flat files, database tables and other sources using SQL Server Integration Services (SSIS) and SQL Server Reporting Service (SSRS) for managers and executives.
  • Designed Star Schema Data Models for Enterprise Data Warehouse using Power Designer.
  • Created mapping documents for the Staging, ODS, and Data Mart layers.
  • Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
  • Created Communities, Domains, Assets, hierarchies in Collibra.
  • Spearheaded the establishment of the Enterprise Business Glossary, including Business Terms, BT Descriptions, and Business Rules; the Tiering Criteria, encompassing Tier 1, 2 or 3; and the Data Linkages between the Metadata and Lineage documents for Collibra, IDQ, and IMM data governance tools. Integrated process to manage data quality.
  • Prepared business metadata (Collibra) and technical metadata (IBM InfoSphere).
  • Worked on data cleansing and standardization using the cleanse functions in Informatica MDM.
  • Worked on OLAP for data warehouse and data mart development using the Ralph Kimball methodology, as well as OLTP models, interacting with all involved stakeholders and SMEs to derive the solution.
  • Created use case diagrams using UML to define the functional requirements of the application.
  • Created the best-fit physical data model based on discussions with DBAs and ETL developers.
  • Identified required dimensions and facts using Erwin for the dimensional model.
  • Implemented ETL techniques for Data Conversion, Data Extraction and Data Mapping for different processes as well as applications.
  • Developed Talend jobs to load data into Hive tables and HDFS files, and developed Talend jobs to integrate Hive tables with the Teradata system.
  • Created conceptual, logical and physical data models, data dictionaries, DDL and DML to deploy and load database table structures in support of system requirements.
  • Designed ER diagrams (physical and logical, using Erwin) and mapped the data into database objects.
  • Validated and updated the appropriate models to process mappings, screen designs, use cases, business object model, and system object model as they evolved and changed.
  • Created Model reports including Data Dictionary, Business reports.
  • Generated SQL scripts and implemented the relevant databases with related properties from keys, constraints, indexes, and sequences.

Environment: OLTP, DBAs, DDL, DML, Erwin, UML diagrams, Snowflake schema, SQL, Data Mapping, Metadata, SAS, Informatica 9.5


Data Modeler/Analyst


  • Performed data analysis, data modeling, data migration, and data profiling using complex SQL on various source systems, including Oracle and Teradata.
  • Involved in the analysis of the existing credit card processing system, the mapping phase according to functionality, and the data conversion procedure.
  • Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
  • Worked on data cleansing and standardization using the cleanse functions in Informatica MDM.
  • Designed Star and Snowflake data models for the Enterprise Data Warehouse using Erwin.
  • Validated and updated the appropriate LDMs to process mappings, screen designs, use cases, business object model, and system object model as they evolved and changed.
  • Created business requirement documents and integrated the requirements and underlying platform functionality.
  • Maintained the data model and synchronized it with changes to the database.
  • Designed and developed use cases, activity diagrams, and sequence diagrams using UML.
  • Extensively involved in the modeling and development of the Reporting Data Warehousing System.
  • Designed the database tables and created table- and column-level constraints using the suggested naming conventions for constraint keys.
  • Used ETL tool BO DS to extract, transform and load data into data warehouses from various sources like relational databases, application systems, temp tables, flat files etc.
  • Developed stored procedures and triggers.
  • Wrote packages, procedures, functions, exceptions using PL/SQL.
  • Reviewed the database programming for triggers, exceptions, functions, packages, and procedures.
  • Involved in the testing phase, from unit testing through user acceptance testing.
  • Involved in all phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.

Environment: Erwin 4, MS Visio, Oracle 10g, SQL Server 2000, Business Object Data Integrator R2
