Big Data / Teradata Developer Resume
SUMMARY
- IT Professional with 7+ years of experience in the design, development, analysis, testing, and operations of software applications, including work with Data Warehouse, Big Data, AWS, Hadoop, Spark, Python, Informatica, Teradata, Oracle, and Java/J2EE.
- Strong experience processing big data in the cloud using AWS: S3, Spark on EMR, AWS Lambda, Snowflake, DynamoDB, and Airflow.
- Hands-on development experience with Apache Hadoop/Cloudera ecosystem components.
- Good knowledge of real-time and batch data processing using Apache Spark (PySpark), Spark Streaming, and Kafka.
- Experienced in using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, and pair RDDs (see the brief sketch following this summary).
- Hands-on experience handling different file formats such as Text, CSV, JSON, Avro, and Parquet.
- Hands-on experience setting up workflow management and scheduling using Oozie and Airflow.
- Experience in Unix Shell scripting and Job Scheduling tools like Autosys, Tivoli and Control-M.
- Strong knowledge of data warehousing concepts such as Star schema and Snowflake schema dimensional modeling.
- Experience in Teradata Utilities such as FastLoad, MultiLoad, FastExport, TPT and BTEQ.
- Solid understanding of Object-Oriented Programming (OOP) concepts, Java, and J2EE.
- Sound knowledge of database architecture for OLTP and OLAP applications, Data Analysis and ETL Processes.
- Experienced in writing SQL and PL/SQL Stored Procedures, Triggers and Functions.
- Experienced in MongoDB installation, patching, troubleshooting, performance tracking/tuning, backup and recovery in dynamic environments, and managing the MongoDB cluster.
- Experience in managing the lifecycle of MongoDB including sizing, automation, monitoring and tuning.
- Hands-on experience with Business Intelligence (BI) tools such as Tableau and Power BI.
- Experience with Hadoop security requirements and integration with the Kerberos authentication infrastructure, including KDC server setup and creating and managing realms/domains.
- Good experience with use-case development and software methodologies such as Waterfall and Agile.
- Experienced working independently or as part of a team; highly effective at communicating with all levels of management and coworkers and committed to delivering superior-quality work.
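Illustrative only: a minimal, self-contained PySpark sketch (with hypothetical data and names) of the two Spark styles mentioned above, showing the same aggregation written once with pair RDDs and once with the DataFrame/Spark SQL API.

```python
# Minimal PySpark sketch (hypothetical data): the same aggregation done with
# pair RDDs and with the DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pair-rdd-vs-dataframe").getOrCreate()
sc = spark.sparkContext

sales = [("NY", 100.0), ("CA", 250.0), ("NY", 75.0), ("TX", 40.0)]

# Pair-RDD style: key by state, then reduceByKey to sum the amounts.
rdd_totals = sc.parallelize(sales).reduceByKey(lambda a, b: a + b)
print(sorted(rdd_totals.collect()))

# DataFrame / Spark SQL style: the Catalyst optimizer plans the aggregation.
df = spark.createDataFrame(sales, ["state", "amount"])
df.groupBy("state").agg(F.sum("amount").alias("total")).show()

spark.stop()
```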
TECHNICAL SKILLS
Cloud: AWS- S3, EMR, AWS Lambda, CloudFormation, DynamoDB, Snowflake
Hadoop Ecosystem: Hadoop MR, HDFS, YARN, Hive, Sqoop, Impala, Oozie, Kafka, HBase, Kerberos, etc.
Programming Languages: Python, Scala, Java, SQL and Unix Shell scripting
ETL Tools: Informatica Power Center, Airflow, Datastage
Databases: Teradata, Oracle, SQL Server, MySQL, Vertica, MongoDB, Cassandra
Scheduling Tools: Autosys, Control-M, Tivoli
Java Frameworks: Struts, JSP, Spring, J2EE, Hibernate
Other Tools: IntelliJ, Eclipse IDE, Toad, ERwin, Visio
PROFESSIONAL EXPERIENCE
Confidential
BigData / Teradata Developer
Responsibilities:
- Involved in the design and development of the ETL process loading benefits and offers data into the data warehouse from different sources.
- Created UNIX scripts to check and validate the files received from the source.
- Developed Informatica Workflows, Mappings, Mapplets, and Reusable Transformations to facilitate the ETL process, working with transformations such as Lookup and Joiner.
- Created and used Parameter Files to provide values to the variables used across sessions and workflows of Informatica.
- Extensively used the Teradata utilities BTEQ, Fastload, Multiload, etc., and created various Teradata Macros in SQL Assistant to serve the analysts.
- Worked with the migration team to move data and processes to the AWS cloud environment; responsible for building cloud solutions.
- Involved in data acquisition, analysis, and pre-processing (data cleansing and transformation) of various source data types such as JSON and text, along with imports from RDBMS, using Python/Spark.
- Responsible for the design and development of Spark SQL scripts in Python, based on functional specifications, to load data into Snowflake; implemented workflows in Airflow (Python) to automate these tasks (a brief sketch follows this list).
- Developed complex business rules using Hive and Impala to transform and store the data efficiently for trend analysis, billing, and business intelligence. Used Hive in the data exploration stage to gain insights about the Offers/Benefits, and created Hive external tables on top of validated data sets.
- Integrated Hadoop with Teradata and Oracle RDBMS systems by importing and exporting data using Sqoop. Wrote Hive user-defined functions to implement critical logic.
- Secured the Hadoop cluster using MIT Kerberos authentication and accessed the Hadoop environment using Hue.
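Illustrative only: a minimal sketch, assuming Airflow 2.x, of the kind of Airflow orchestration described above; the DAG id, schedule, and script paths are hypothetical placeholders rather than actual production names.

```python
# Hypothetical Airflow DAG sketch (Airflow 2.x API assumed): runs a file check,
# then a spark-submit job that loads benefits/offers data into Snowflake.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "etl",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="benefits_offers_to_snowflake",   # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",           # nightly at 2 AM
    catchup=False,
    default_args=default_args,
) as dag:

    validate_files = BashOperator(
        task_id="validate_source_files",
        bash_command="python /opt/etl/validate_files.py --date {{ ds }}",  # hypothetical script
    )

    spark_load = BashOperator(
        task_id="spark_load_to_snowflake",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/etl/load_offers_snowflake.py --run-date {{ ds }}"        # hypothetical script
        ),
    )

    validate_files >> spark_load
```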
Environment: Informatica Power Center 10.2, Informatica Cloud, Oracle, Teradata, Hadoop, Hive, Hue, HDFS, Kerberos, Sqoop, TDCH, SQL Assistant, Shell Scripting, AWS S3, EMR, Lambda, Python, Spark, Airflow and Red Hat Linux.
Confidential, Washington, DC
ETL Developer
Responsibilities:
- Used Informatica Designer to create complex mappings using different transformations like Source Qualifier, Expression, etc., to meet the business requirements.
- Created and configured Workflows, Worklets, and Sessions in Informatica Workflow Manager to transport data from heterogeneous sources to flat files.
- Scheduled sessions to extract, transform, and load data into the warehouse database according to business requirements.
- Developed generic UNIX scripts reusable throughout the ETL process. Monitored sessions and workflows using Workflow Monitor.
- Optimized the performance of the mappings by various tests on sources, targets, and transformations.
- Loaded data into Teradata tables from flat files using MLOAD and FLOAD. Wrote several Teradata BTEQ scripts to implement the business logic.
- Designed, developed, and enhanced the existing UNIX shell scripts to automate the ETL process and scheduled jobs in Autosys.
- Attended business meetings and analyzed the source data coming from flat files, SQL server and Oracle databases.
- Thoroughly unit tested the ETL code and worked closely with the integration test group.
- Involved in generating various reports using Business Objects.
Environment: HP-UX, Informatica Power Center 9.1, Oracle, Flat files, Teradata, TOAD, UNIX, Business Objects, Shell Scripting and Autosys.
Confidential
Hadoop Developer
Responsibilities:
- Offloaded the existing EDW and used Hadoop as the staging layer when implementing the ETL pipeline.
- Imported OLTP, OLAP, and CRM table contents into Hadoop using Sqoop.
- Implemented different kinds of joins, such as map-side and reduce-side joins, to integrate data from different data sets.
- Worked extensively on creating Hive tables, loading data, writing Hive queries, and generating partitions and buckets for optimization; performed aggregations such as count, average, and sum in Hive and exported the results to the EDW, providing low-latency, frequent querying capability for BI tools (see the sketch following this list).
- Worked on fine-tuning the PySpark code for optimized utilization of Hadoop resources in the production run.
- Developed Python scripts for movement of data across different systems.
- Worked with AutoSys job scheduler to automate Python jobs.
- Automated the data processing with Oozie for loading data into the Hadoop Distributed File System.
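Illustrative only: a minimal PySpark sketch of the Hive aggregation-and-partitioning pattern described above; the database, table, and column names are hypothetical.

```python
# Hypothetical PySpark sketch: aggregate staged data and write it to a
# partitioned Hive table, the pattern described in the bullets above.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("staging-aggregation")
    .enableHiveSupport()          # needed so saveAsTable targets the Hive metastore
    .getOrCreate()
)

# Hypothetical staged table loaded earlier via Sqoop.
orders = spark.table("staging.orders")

daily_totals = (
    orders.groupBy("order_date", "region")
          .agg(
              F.count("*").alias("order_count"),
              F.avg("amount").alias("avg_amount"),
              F.sum("amount").alias("total_amount"),
          )
)

# Write as a Hive table partitioned by order_date so BI queries can prune partitions.
(
    daily_totals.write
        .mode("overwrite")
        .partitionBy("order_date")
        .format("parquet")
        .saveAsTable("analytics.daily_order_totals")
)

spark.stop()
```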
Environment: Hadoop, Hive, PySpark, Python, Sqoop, Oozie, Shell Scripting and Autosys.