
Big Data Developer Minneapolis, MN Resume


SUMMARY

  • 5+ years of IT experience in analysis, design, implementation & administration of business application systems for Telecom and Retail Sectors.
  • Experience in ETL (Extract Transform Load), Data Integration and Data Warehousing using Informatica, Teradata and Oracle technologies.
  • Expertise in design and development of SSIS (ETL) packages.
  • Loaded streaming log data from various webservers into HDFS using Flume.
  • Experience in loading files into Hive and HDFS from Oracle and SQL Server using SQOOP.
  • Extensive experience in requirement analysis, workflow analysis, design, development, implementation, testing, and deployment across the complete software development life cycle (SDLC) for desktop applications and Microsoft web and client/server technologies.
  • Experienced in major Hadoop ecosystem projects such as MapReduce, Hive, Pig, HBase, Sqoop, Spark, and Oozie with Cloudera Manager.
  • Good understanding of Hadoop architecture and its daemons, including NameNode, DataNode, Job Tracker, Task Tracker, Resource Manager, Node Manager, and Application Master.
  • Expertise in implementing complex business rules by creating robust mappings and reusable transformations, using transformations such as Unconnected Lookup, Connected Lookup, Joiner, Router, Expression, Aggregator, Filter, and Update Strategy.
  • Experience working with Teradata Parallel Transporter (TPT), BTEQ, FastLoad, MultiLoad, SQL Assistant, and DDL and DML commands.
  • Handled text, JSON, XML, Avro, SequenceFile, and Parquet log data using Hive SerDes and Pig, and filtered the data based on query criteria (a minimal PySpark sketch of this pattern follows this list).
  • Proficient in Teradata EXPLAIN plans, the COLLECT STATISTICS option, Primary Indexes (PI, NUPI), Secondary Indexes (USI, NUSI), Partitioned Primary Indexes (PPI), Join Indexes (JI), and Volatile, Global Temporary, and Derived tables.
  • Experience with pushdown optimization in Informatica.
  • Hands-on experience with HP Vertica SQL analytics and with loading and exporting data.
  • Experience in Oracle-supplied packages, dynamic SQL, records, and PL/SQL tables.
  • Expertise in creating indexed tables with primary key, foreign key, and composite key constraints to maintain referential integrity and performance.
  • Hands-on experience in ETL processes using SQL Server Integration Services (SSIS), Bulk Copy Program (BCP), and Data Transformation Services (DTS).
  • Developed and implemented ETL routines according to data warehouse design and architecture.
  • Proficient in extracting, transforming, cleansing, and loading data from heterogeneous data sources such as Oracle, MS Access, and flat files.
  • Experienced in integrating relational databases such as SQL Server, Oracle, and Microsoft Access with Excel spreadsheets.
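
Most of the semi-structured data handling listed above was done with Hive SerDes and Pig; the same pattern is sketched below in PySpark. The paths, field names, and partition column are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Minimal sketch: read raw JSON log records, derive a partition column,
    # and persist them as date-partitioned Parquet that a Hive external
    # table can be defined over. All paths and field names are illustrative.
    spark = (SparkSession.builder
             .appName("json-logs-to-parquet")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.json("hdfs:///data/raw/web_logs/")

    cleaned = (raw
               .filter(F.col("status").isNotNull())
               .withColumn("event_date", F.to_date("event_ts")))

    (cleaned.write
            .mode("overwrite")
            .partitionBy("event_date")
            .parquet("hdfs:///data/curated/web_logs_parquet/"))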

PROFESSIONAL EXPERIENCE

Confidential, Minneapolis, MN

Big Data Developer

Responsibilities:

  • Used Python for pattern matching in build logs to format errors and warnings.
  • Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Used Flume for collecting, aggregating, and loading log data from multiple sources into HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created Hive tables with appropriate static and dynamic partitions for efficiency and queried them using HiveQL.
  • Leveraged AWS S3 as the storage layer for HDFS.
  • Configured Zookeeper to coordinate and support the distributed applications as it offers high throughput and availability with low latency.
  • Involved in the ETL process using Spark and Python on S3-backed Hive tables.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark.
  • Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the source-to-target mappings (STMs) developed.
  • Created Hive tables on HDFS to store the data processed by Apache Spark on the Cloudera Hadoop cluster in Parquet format (a minimal Spark SQL sketch follows this list).
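
A minimal Spark SQL sketch of the partitioned-Hive reporting step described above; the database, table, and column names are illustrative rather than the project's actual schema.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("daily-log-metrics")
             .enableHiveSupport()
             .getOrCreate())

    # Filtering on the partition column lets Hive/Spark prune partitions,
    # so only the requested day's data is scanned.
    daily_metrics = spark.sql("""
        SELECT  host,
                COUNT(*)                                        AS request_count,
                SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END)  AS server_errors
        FROM    logs_db.web_logs
        WHERE   event_date = '2019-01-15'
        GROUP BY host
    """)

    # Persist the result as a Parquet-backed Hive table for the BI/reporting layer.
    (daily_metrics.write
                  .mode("overwrite")
                  .format("parquet")
                  .saveAsTable("reporting_db.daily_host_metrics"))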

Environment: HDFS, Hive, Spark, SQL, Cloudera Manager, Sqoop, Zookeeper, Flume, Oozie, Java (JDK 1.6), Eclipse

Confidential, OR

Big Data Developer

Responsibilities:

  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Involved in writing Pig Scripts for Cleansing the data and implemented Hive tables for the processed data in tabular format.
  • Involved in running Hadoop streaming jobs to process terabytes of XML-format data (a minimal Python mapper sketch follows this list).
  • Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
  • Ingested data using Sqoop and the HDFS put/copyFromLocal commands.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Completely involved in the requirement analysis phase.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Implemented SQL, PL/SQL Stored Procedures.
  • Involved in developing Shell scripts to orchestrate the execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
  • Serialized JSON data and stored it in Hive tables.
  • Defined schemas for multi-nested JSON files using Hive SerDes.
  • Applied Hive data sampling and bucketing (CLUSTERED BY) methods to table schemas.
  • Involved in developing Hive UDFs for functionality not available out of the box in Apache Hive.
  • Provided upper management with daily updates on project progress, including the classification levels achieved on the data.
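
A minimal Python mapper of the kind used by the Hadoop streaming jobs mentioned above, assuming one XML record per input line; the element names are hypothetical.

    #!/usr/bin/env python
    # Hadoop Streaming mapper sketch: read XML records from stdin and emit
    # tab-separated key/value pairs for the reduce phase to aggregate.
    import sys
    import xml.etree.ElementTree as ET

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records instead of failing the task
        order_id = record.findtext("orderId", default="")
        amount = record.findtext("amount", default="0")
        if order_id:
            print("%s\t%s" % (order_id, amount))

A mapper like this would typically be submitted with the standard hadoop-streaming jar (-input, -output, -mapper, -file options), with Hive tables then defined over the cleansed output.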

Environment: Hadoop, MapReduce, NoSQL, Hive, Pig, Sqoop, Core Java, HDFS, Eclipse.

Confidential, Dallas, TX

Teradata Developer

Responsibilities:

  • Understood the structure of the data, built the data architecture, implemented the data model in Vertica, and carried out data mapping from the legacy system to Vertica.
  • Solely responsible for developing and implementing vsql scripts covering COPY, projections, segmentation, encoding, partitions, merges, and functions.
  • Fine-tuned the performance of Vertica queries using projection segmentation and various encoding and compression techniques.
  • Developed Teradata FastLoad, MultiLoad, BTEQ, TPT, and FastExport scripts to load landing-zone files into Stage and then into Target.
  • Created Teradata DML scripts and executed them with BTEQ in UNIX shell scripts.
  • Developed complex UNIX shell scripts to handle the data load and extraction process in the Vertica system.
  • Actively participated in scrum calls for project planning and tracking daily status.
  • Created Incremental Data Load from various sources into SQL Server Database Engine with ETL Operations.
  • Designed, developed, and built Informatica PowerCenter mappings and workflows using Teradata external loaders.
  • Extracted data from various source systems such as Oracle, SQL Server, and flat files as per the requirements.
  • Developed scripts for loading data into the base tables in the EDW and for moving data from source to staging and from staging to target tables using the Teradata FastLoad, MultiLoad, and BTEQ utilities.
  • Wrote scripts for data cleansing, data validation, and data transformation for data coming from different source systems.
  • Reviewed the SQL for missing joins and join constraints, data format issues, mismatched aliases, and casting errors.
  • Developed procedures to populate the customer data warehouse with transaction data, cycle and monthly summary data, and historical data.
  • Handled initial, delta, and incremental data as well as migration data loaded into Teradata.
  • Extensively used derived tables, volatile tables, and global temporary tables (GTTs) in many of the ETL scripts.
  • Created Python scripts for migrating data from Teradata to Vertica (a minimal sketch follows this list).
  • Loaded flat files into the database using FastLoad and then used them in queries to perform joins.
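
A pared-down sketch of the kind of Python migration script referred to above. The teradatasql driver, the vsql COPY invocation, and all connection and table names are assumptions for illustration, not the project's exact tooling.

    import csv
    import subprocess
    import teradatasql

    EXTRACT_FILE = "/tmp/customer_extract.psv"   # hypothetical landing file

    # 1. Extract the Teradata table to a pipe-delimited flat file.
    with teradatasql.connect(host="td_host", user="etl_user", password="***") as con:
        cur = con.cursor()
        cur.execute("SELECT customer_id, customer_name, balance FROM edw.customer")
        with open(EXTRACT_FILE, "w", newline="") as out:
            writer = csv.writer(out, delimiter="|")
            for row in cur.fetchall():
                writer.writerow(row)

    # 2. Bulk-load the file into Vertica with a vsql COPY (DIRECT to ROS storage).
    copy_sql = ("COPY stage.customer (customer_id, customer_name, balance) "
                "FROM LOCAL '{}' DELIMITER '|' DIRECT ABORT ON ERROR;".format(EXTRACT_FILE))
    subprocess.run(["vsql", "-h", "vertica_host", "-U", "etl_user",
                    "-w", "***", "-c", copy_sql], check=True)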

Environment: Informatica 8.6, Teradata, MS SQL Server, BTEQ, FastLoad, MultiLoad, FastExport, TPump, HP Vertica, vsql, Mercury Quality Center 9.0, Shell Scripts (Bash, Korn), Teradata SQL Assistant 13.0, TOAD, Notepad++, WinSCP.

Confidential, Bentonville, AR

BI Developer

Responsibilities:

  • Created ETL packages using SSIS to extract data from different data sources, reformat the data, and load it into destination tables.
  • Created incremental data loads from various sources into the SQL Server Database Engine with ETL operations (a watermark-based sketch follows this list).
  • Responsible for transferring data from Excel and flat-file sources to the database using SSIS packages.
  • Responsible for creating and scheduling jobs.
  • Developed Stored Procedures, Functions, Tables, Views and other T-SQL code and SQL joins for applications.
  • Published migrated data using SSIS (SQL Server Integration Services) through data flow tasks, look up transformations and script tasks.
  • Experience with the SSIS Script Task, Lookup transformations, and Data Flow Tasks using T-SQL and Visual Basic (VB) scripts.
  • Back-end experience in MS SQL Server development, writing T-SQL stored procedures, views, triggers, cursors, and UDFs.
  • Created SSIS packages for cleansing data and loading data from various sources.
  • Worked on various tasks and transformations such as Execute SQL Task, Execute Package Task, Conditional Split, Script Component, Merge, and Lookup while loading data into the destination.
  • Involved in populating data, generating reports using Reporting Services (SSRS) and deploying them on the server.
  • Involved in generating parameterized tabular and matrix reports on data sets imported from the SQL Server Database Engine and Analysis Services cubes, as well as migrating reports from Crystal Reports and Microsoft Access to SQL Server Reporting Services.
  • Troubleshot, supported, and monitored production and test environments.
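
The incremental loads above were built in SSIS and T-SQL; purely as an illustration, here is a rough Python/pyodbc sketch of a common watermark-based approach to incremental loading. The server, table, column, and procedure names are all hypothetical.

    # Illustration only (the project itself used SSIS/T-SQL packages):
    # watermark-driven incremental load into a SQL Server staging table.
    import pyodbc

    SRC = ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=src-host;"
           "DATABASE=SalesSrc;Trusted_Connection=yes;")
    DST = ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=dw-host;"
           "DATABASE=SalesDW;Trusted_Connection=yes;")

    with pyodbc.connect(SRC) as src, pyodbc.connect(DST) as dst:
        dst_cur = dst.cursor()

        # 1. Read the watermark left by the previous successful load.
        dst_cur.execute("SELECT last_loaded_at FROM etl.load_control "
                        "WHERE table_name = 'Orders'")
        watermark = dst_cur.fetchone()[0]

        # 2. Pull only rows changed since that watermark.
        src_cur = src.cursor()
        src_cur.execute("SELECT order_id, customer_id, amount, modified_at "
                        "FROM dbo.Orders WHERE modified_at > ?", watermark)
        rows = src_cur.fetchall()

        if rows:
            # 3. Stage the slice, then let a T-SQL proc merge it into the target.
            dst_cur.fast_executemany = True
            dst_cur.executemany("INSERT INTO stg.Orders "
                                "(order_id, customer_id, amount, modified_at) "
                                "VALUES (?, ?, ?, ?)", rows)
            dst_cur.execute("EXEC etl.usp_merge_orders")  # hypothetical upsert proc

            # 4. Advance the watermark only after the merge succeeds.
            dst_cur.execute("UPDATE etl.load_control SET last_loaded_at = ? "
                            "WHERE table_name = 'Orders'",
                            max(r.modified_at for r in rows))
        dst.commit()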

Environment: MS SQL Server 2008/2008R2/2012, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SSAS, SharePoint, JIRA, Bitbucket, SourceTree, Confluence.
