Big Data Developer Resume
Los Angeles
PROFESSIONAL SUMMARY:
- Over 7 years of total IT experience, including 3 years in Big Data technologies and 4 years implementing data warehousing solutions using IBM DataStage V8 and V11.
- Extensive experience using Hadoop ecosystem components such as HDFS, MapReduce, Oozie, Pig, Hive, Sqoop, Flume, Kafka, Impala, HBase, and Zookeeper.
- Experience in Apache Spark, Spark SQL, and NoSQL databases such as Cassandra, MongoDB, and HBase.
- Experience in installing, configuring, and maintaining Hadoop clusters, including YARN configuration, using Cloudera, Hortonworks, and AWS.
- Experienced in integrating Hadoop with Apache Storm and Kafka. Expertise in loading clickstream data from Kafka into HDFS, HBase, and Hive by integrating with Storm.
- Expertise in the Scala programming language and Spark Core.
- Experience in benchmarking Hadoop clusters to tune them for the best performance.
- Good knowledge of S3 buckets, DynamoDB, and Redshift.
- Very good understanding of SQL, ETL, and data warehousing technologies.
- Familiar with all stages of the Software Development Life Cycle, issue tracking, version control, and deployment.
- Extensive experience writing, tuning, and profiling MapReduce and advanced MapReduce jobs in Java.
- Experience writing shell scripts, cron automation, regular expressions, and MRUnit tests.
- Hands-on experience with compression codecs such as Snappy and BZip2.
- Implemented workflows in Oozie using Sqoop, MapReduce, Hive and other Java and Shell actions.
- Good knowledge of working with Avro and Parquet formats.
- Excellent knowledge of Data Flow Lifecycle and implementing transformations and analytic solutions.
- Extended Hive and Pig core functionality by writing custom UDFs and creating SerDes in Hive (a brief sketch follows this summary).
- Sound knowledge of designing data warehousing applications using tools such as Teradata, Oracle, and SQL Server.
- Experience using the Talend ETL tool.
- Knowledge of the Java Virtual Machine (JVM) and multithreaded processing.
- Developed web services modules for integration using SOAP and REST.
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
- Used Datastage Version Control to migrate the project from one environment to the other.
- Experience working with Parallel Extender for parallel processing to improve job performance with bulk data sources. Worked with most of the parallel stages, applying different partitioning techniques.
- Worked with various databases including Oracle 10g/9i/8i/8.0/7.x, DB2, MS Access, and SQL Server; experienced with the major relational database platforms.
- Excellent analytical, problem-solving, communication, and interpersonal skills, with the ability to interact with individuals at all levels.
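As a brief illustration of the custom Hive UDF work noted above, the Scala sketch below implements a minimal UDF against the org.apache.hadoop.hive.ql.exec.UDF API; the class name and the normalization logic are hypothetical examples, not the actual project code.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trims and lower-cases a string column so that
// grouping and joining on free-form text behaves consistently in Hive.
class NormalizeText extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase)
  }
}
```

Packaged into a JAR, a class like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function.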
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Scala, Spark, Python, PySpark, MongoDB with Python, Neo4j, Cassandra.
ETL Tools: IBM Infosphere DataStage 11.3/9.1/8.5/8.1 (Manager, Designer, Director, Administrator), DataStage PX (Parallel Extender), Quality Stage.
Databases: Oracle 10g/9i/8i/8.x/7.x, DB2 9.0/8.0/7.0, MS SQL Server 2005/7.0/6.5, MS Access 2000, SQL*Plus, SQL*Loader, TOAD 7.0 and Developer 2000
Operating Systems: Sun Solaris 2.7/2.6, HP-UX 10.2/9.0, IBM AIX 4.3/4.2, Linux, MS-DOS 6.22, Windows 3.x/95/98/XP, Windows NT 4.0, Sun Ultra, HP9000, IBM RS6000, AS400
Programming Skills: UNIX Shell Scripting, SQL, PL/SQL, SQL*Plus 3.3/8.0, Business Intelligence, C, C++, Java, JavaScript, SQL*Loader, VB, ASP, COBOL, HTML, XML.
Data Modeling Tools: Dimensional Data Modeling, Star Join Schema Modeling, Snowflake Modeling, Fact and Dimension Tables, Physical and Logical Data Modeling, Enterprise Database Integration and Management, Microsoft Visual Studio.
BI Tools: OBIEE 11g, BI Publisher, Tableau 9.2, Cognos 10.x
PROFESSIONAL EXPERIENCE:
Confidential, Los Angeles
Big Data Developer
Responsibilities:
- Involved in the design, development, and testing phases of the Software Development Life Cycle.
- Installed and configured Hadoop Ecosystem components.
- Used the Spark DataFrame API to process structured and semi-structured files and load the results back into an S3 bucket (a Scala sketch follows this list).
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Automated and scheduled Sqoop jobs using UNIX shell scripts.
- Imported data from the Oracle source and populated it into HDFS using Sqoop.
- Developed a streaming data pipeline using Kafka and Storm to store data in HDFS.
- Implemented a POC with Spark SQL to interpret complex JSON records.
- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
- Developed MapReduce jobs to convert data files into Parquet file format.
- Executed Hive queries on Parquet tables to perform data analysis to meet the business requirements.
- Created table definitions and made the contents available as a schema-backed RDD.
- Developed business-specific custom UDFs in Hive and Pig.
- Exported the aggregated data to Oracle using Sqoop for reporting on the Tableau dashboard.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
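A minimal sketch, in Scala, of the kind of Spark DataFrame job described in the list above: it reads semi-structured JSON from S3, filters it, and writes Parquet back to S3. The bucket paths, column name, and application name are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object ClickstreamJsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClickstreamJsonToParquet") // hypothetical app name
      .getOrCreate()

    // Read semi-structured JSON records from S3; Spark infers the schema.
    val events = spark.read.json("s3a://example-input-bucket/clickstream/")

    // Keep only well-formed events (hypothetical column) and write them back
    // to S3 in Parquet format for downstream Hive queries.
    events.filter(events("eventType").isNotNull)
      .write
      .mode("overwrite")
      .parquet("s3a://example-output-bucket/clickstream-parquet/")

    spark.stop()
  }
}
```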
Environment: HDFS, MapReduce, Kafka, Storm, S3, Parquet, Pig, Hive, Sqoop, Spark, Oracle, Oozie, RedHat Linux, Tableau.
Confidential, Detroit
Big Data Developer
Responsibilities:
- Involved in installing the cluster and configuring Hadoop ecosystem components.
- Worked with Hadoop administrator in rebalancing blocks and decommissioning nodes in the cluster.
- Responsible for managing data coming from different sources.
- Ingested data into HDFS using Flume and Kafka.
- Imported and exported data using Sqoop to load data from RDBMS to HDFS and vice versa on a regular basis.
- Developed, Monitored and Optimized MapReduce jobs for data cleaning and preprocessing.
- Built data pipeline using Pig and MapReduce in Java.
- Implemented MapReduce jobs to write data into Avro format.
- Automated all the jobs for pulling data and loading it into Hive tables using Oozie workflows.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Developed custom SerDes in Hive specific to the requirements.
- Implemented pattern-matching algorithms with regular expressions, built profiles using Hive, and stored the results in HBase.
- Used Maven to build the application.
- Implemented unit testing using MRUnit (a brief sketch follows this list).
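A small MRUnit-style sketch in Scala of the unit testing mentioned in the last bullet: a hypothetical cleansing mapper is driven in isolation with one input record and one expected output record. The mapper, its logic, and the sample records are illustrative assumptions, not the actual project code.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper
import org.apache.hadoop.mrunit.mapreduce.MapDriver

// Hypothetical cleansing mapper: trims whitespace and lower-cases each record.
class CleansingMapper extends Mapper[LongWritable, Text, LongWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, LongWritable, Text]#Context): Unit = {
    context.write(key, new Text(value.toString.trim.toLowerCase))
  }
}

// MRUnit drives the mapper by itself: one input record in, one expected record out.
object CleansingMapperTest {
  def main(args: Array[String]): Unit = {
    MapDriver.newMapDriver(new CleansingMapper)
      .withInput(new LongWritable(1L), new Text("  RAW Record  "))
      .withOutput(new LongWritable(1L), new Text("raw record"))
      .runTest()
  }
}
```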
Environment: HDP, HDFS, Flume, Kafka, Sqoop, Pig, Hive, MapReduce, HBase, Oozie, MRUnit, Maven, Avro, RedHat Linux, RDBMS.
Confidential, Los Angeles
DataStage Developer
Responsibilities:
- Experience in designing, compiling, testing, scheduling, and running DataStage jobs.
- Worked with various techniques such as schema-bound views, partitioning, and ETL/query optimization.
- Worked across all phases of the development life cycle, including data cleansing, data integration, data conversion, and performance tuning.
- Developed Server jobs for extracting, transforming, integrating, and loading data to targets.
- Expertise in data warehousing techniques such as data cleansing, Slowly Changing Dimensions, and Change Data Capture.
- Used Datastage Version Control to migrate the project from one environment to the other.
- Worked with SQL server and created jobs to load data from SQL to Oracle.
- Involved in Extracting, transforming, loading and testing data from XML files, Flat files, Oracle and DB2 using DataStage jobs.
- Involved in performance tuning of the DataStage jobs and queries.
- Wrote SQL in DB2 for use in DataStage and for testing the data.
- Troubleshot and performance-tuned ETL jobs.
- Used DataStage Manager for importing metadata from the repository, creating new job categories, and creating new data elements.
- Worked on NDM jobs for secure file transfers from one server to another.
- Used AutoSys scripts for scheduling the jobs.
- Used Microsoft Visual Studio to migrate jobs from one environment to another.
- Used TFS to check in jobs from DataStage and files from the backend, and used a DSX generator to convert the files to .dsx for migration purposes.
- Supported Hyperion Interactive Reporting on both the front end and the back end.
- Supported production instances via Service Manager tickets.
Environment: IBM InfoSphere DataStage 8.5/9.5 (DataStage Designer, DataStage Director, DataStage Manager, DataStage Administrator), Autosys, Microsoft Visual Studio, Oracle, UNIX Shell Programming.
Confidential
DataStage Developer
Responsibilities:
- Extensively used DataStage Designer to develop various Parallel jobs to extract, cleanse, transform, integrate and load data into Enterprise Data Warehouse tables.
- Worked with the Business analysts and the DBA for requirements gathering, business analysis, testing, and project coordination.
- Worked with DataStage Manager to import/export metadata, DataStage Components between the projects.
- Involved in designing source-to-target mappings from sources to operational staging targets using a Star Schema; implemented logic for Slowly Changing Dimensions.
- Involved in performance tuning of Parallel jobs using performance statistics.
- Used various standard and custom routines in DataStage jobs.
- Tuned the Parallel jobs for better performance.
- Responsible for adopting the standards for Stage and Link naming conventions.
- Created and edited the design specification documents for the jobs.
- Participated in discussions with Team leader, Group Members and Technical Manager regarding any technical and Business Requirement issues.
- Developed parameterized, reusable DataStage jobs that can be run as multiple instances.
- Performed unit testing of the developed jobs to ensure they met the requirements.
- Coordinated with team members at times of change in Business requirements and change in Data Mart Schema.
Environment: DataStage 8.x (Designer, Director, Manager, Administrator), Oracle 8i, PL/SQL, UNIX Shell Programming, Windows NT.