
Data Engineer Resume


TX

SUMMARY

  • 9+ years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
  • Highly dedicated and results-oriented Hadoop Developer with 5+ years of strong end-to-end experience in Hadoop development, with varying levels of expertise across different Big Data environment projects and Big Data technologies like MapReduce, YARN, HDFS, Apache Cassandra, HBase, Oozie, Hive, Sqoop, Pig and ZooKeeper.
  • In-depth knowledge of HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce programming.
  • Expertise in converting MapReduce programs into Spark transformations using Spark RDDs.
  • Expertise in Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming and Spark MLlib.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Experience in implementing Real-Time event processing and analytics using messaging systems like Spark Streaming.
  • Proficient in managing the entire data science project life cycle and actively involved in all phases of the project life cycle, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, testing and validation, and data visualization.
  • Experience in using Kafka and Kafka brokers to initiate the Spark context and process live streaming information with the help of RDDs.
  • Good knowledge on Amazon AWS concepts like EMR and EC2 web services which provides fast and efficient processing of Big Data.
  • Experience with all flavors of Hadoop distributions, including Cloudera, Hortonworks, MapR and Apache.
  • Experience in installation, configuration, support and management of Hadoop clusters using Apache and Cloudera (5.x) distributions and on Amazon Web Services (AWS).
  • Expertise in implementing Spark Scala application using higher order functions for both batch and interactive analysis requirement.
  • Creating Databricks notebooks using SQL and Python and automating notebooks using jobs.
  • Extensive experience working with Spark tools like RDD transformations, Spark MLlib and Spark SQL.
  • Expertise in transforming business resources and requirements into manageable data formats and analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Implemented Spring boot microservices to process the messages into the Kafka cluster setup.
  • Hands-on experience in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
  • Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
  • Extensive experience in collecting and storing stream data like log data in HDFS using Apache Flume.
  • Used Spring Kafka API calls to process the messages smoothly on Kafka Cluster setup.
  • Implemented the frontend and backend using Jinja, JavaScript, Python (Flask and Pyramid frameworks), and Neo4j.
  • Creating Spark clusters and configuring high concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.
  • Experienced in using Pig scripts to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
  • Involved in creating custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HQL (HiveQL).
  • Good experience with NoSQL databases like HBase, MongoDB and Cassandra.
  • Used Confidential Power BI to extract data from external sources and modify data to the format required in Excel, and created SSIS packages to load Excel sheets from the PC to the database. Used Power BI Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
  • Experience in using Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • An accomplished Hadoop/Spark developer experienced in ingestion, storage, querying, processing and analysis of big data.
  • Expertise in performing real time analytics on big data using HBase and Cassandra.
  • Hands on experience in querying and analyzing data from Cassandra for quick searching, sorting and grouping through CQL.
  • Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premise databases to Azure Data Lake Store using Azure Data Factory.
  • Experience working with MongoDB for distributed storage and processing.
  • Good knowledge of and experience in extracting files from MongoDB through Sqoop, placing them in HDFS and processing them.
  • Worked on importing data into HBase using HBase Shell and HBase Client API.
  • Experience in designing and developing tables in HBase and storing aggregated data from Hive Table.
  • Experience with the Oozie Workflow Engine in running workflow jobs with actions that run Java MapReduce and Pig jobs.
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (an illustrative PySpark sketch follows this list).
  • Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.
  • Integrated custom visuals based on business requirements using Power BI Desktop
  • Provided continued development and maintenance of bug fixes for the existing and new Power BI Reports
  • Implemented data science algorithms like shift detection in critical data points using Spark, doubling the performance.
  • Involved in the entire data science project life cycle and actively involved in all phases, including data extraction, data cleaning, statistical modeling and data visualization with large data sets of structured and unstructured data.
  • Extensive experience in working with various distributions of Hadoop, including enterprise versions of Cloudera (CDH4/CDH5) and Hortonworks, and good knowledge of the MapR distribution, IBM BigInsights and Amazon's EMR (Elastic MapReduce).
  • Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Responsible for estimating the cluster size and monitoring and troubleshooting the Spark Databricks cluster.
  • Installed and documented the Informatica PowerCenter setup on multiple environments.
  • Developed automated processes for flattening the upstream data from Cassandra, which is in JSON format, using Hive UDFs to flatten the JSON data.
  • Expertise in developing responsive front-end components with JavaScript, JSP, HTML, XHTML, Servlets, Ajax, and AngularJS.
  • Experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, Servlets, JSP, JSF, EJB, JDBC and SQL.
  • Experience in Data Science and Analytics including Artificial Intelligence/Deep Learning/Machine Learning, Data Mining and Statistical Analysis
  • Good knowledge of scheduling jobs in Hadoop using the FIFO, Fair and Capacity schedulers.
  • Experienced in designing both time driven and data driven automated workflows using Oozie and Zookeeper.
  • Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
  • Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files, XML files, and Databases.
  • Supported various reporting teams and have experience with the data visualization tool Tableau.
  • Designed and engineered on-premise to off-premise CI/CD Docker pipelines (integration and deployment) with ECS, Glue, Lambda, ELK, Spark Databricks, Firehose and Kinesis streams.
  • Implemented data quality in the ETL tool Talend and have good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica and Talend.
  • Experienced in and in-depth knowledge of cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, Redshift, and Confidential Azure.
  • Detailed understanding of Software Development Life Cycle (SDLC) and strong knowledge in project implementation methodologies like Waterfall and Agile.
  • Develop and implement Oozie workflows for Big Data technologies and schedule them via crontab while monitoring their performance using Hue, an open-source SQL workbench for data warehouses.
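
As a minimal, illustrative sketch of the Spark SQL/Databricks extract-transform-aggregate work referenced above: the PySpark snippet below reads raw events and reference data, joins and aggregates them, and writes the result back as Parquet. The paths, schema and column names (usage_events, customer_id, duration_sec, and so on) are hypothetical placeholders, not taken from any specific project.

  # Hypothetical PySpark sketch of a multi-format extract/transform/aggregate job.
  # All paths and column names are illustrative placeholders.
  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

  # Extract: raw usage events from CSV, customer reference data from Parquet.
  events = spark.read.option("header", True).csv("/data/raw/usage_events/")
  customers = spark.read.parquet("/data/curated/customers/")

  # Transform: cast types, join, and aggregate usage per customer per day.
  daily_usage = (
      events
      .withColumn("event_ts", F.to_timestamp("event_ts"))
      .withColumn("event_date", F.to_date("event_ts"))
      .withColumn("duration_sec", F.col("duration_sec").cast("double"))
      .join(customers, on="customer_id", how="inner")
      .groupBy("customer_id", "event_date")
      .agg(F.count("*").alias("event_count"),
           F.sum("duration_sec").alias("total_duration_sec"))
  )

  # Load: write the aggregate back out as partitioned Parquet for downstream BI tools.
  (daily_usage.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("/data/curated/daily_usage/"))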

TECHNICAL SKILLS

Languages: C, C++, Python, R, PL/SQL, Java, HiveQL, Pig Latin, Scala, UNIX shell scripting.

Hadoop Ecosystem: HDFS, YARN, Scala, Map Reduce, Hive, Pig, Zookeeper, Sqoop, Oozie, Bedrock, Flume, Kafka, Impala, NiFi, MongoDB, HBase.

Databases: Oracle, MS-SQL Server, MySQL, PostgreSQL, NoSQL (HBase, Cassandra, MongoDB), Teradata.

Tools: Eclipse, NetBeans, Informatica, IBM DataStage, Talend, Maven, Jenkins.

Hadoop Platforms: Hortonworks, Cloudera, Azure, Amazon Web services (AWS).

Operating Systems: Windows XP/2000/NT, Linux, UNIX.

Amazon Web Services: Redshift, EMR, EC2, S3, RDS, Cloud Search, Data Pipeline, Lambda.

Version Control: GitHub, SVN, CVS.

Packages: MS Office Suite, MS Visio, MS Project Professional.

PROFESSIONAL EXPERIENCE

Confidential, TX

Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experience in creating Kafka producers and Kafka consumers for Spark Streaming, which gets the data from the different learning systems of the patients.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (an illustrative PySpark sketch follows this list).
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Evaluated the performance of Apache Spark in analyzing genomic data.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Designed and engineered on-premise to off-premise CI/CD Docker pipelines (integration and deployment) with ECS, Glue, Lambda, ELK, Spark Databricks, Firehose and Kinesis streams.
  • Implemented Elastic Search on Hive data warehouse platform.
  • Experience in design, development, and maintenance and support of Big Data Analytics using Hadoop Ecosystem components
  • Setup GCP Firewall rules to allow or deny traffic to and from the VM's instances based on specified configuration and used GCP cloud CDN (content delivery network) to deliver content from GCP cache locations drastically improving user experience and latency.
  • Used Confidential Power BI to extract data from external sources and modify data to the format required in Excel, and created SSIS packages to load Excel sheets from the PC to the database. Used Power BI Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
  • Good experience in analyzing the Hadoop cluster and different analytic tools like Pig and Impala.
  • Worked on SnowSQL and Snowpipe. Converted Talend Joblets to support the Snowflake functionality. Created Snowpipe for continuous data loads. Created data sharing between two Snowflake accounts. Created internal and external stages and transformed data during load. Redesigned the views in Snowflake to increase performance.
  • Worked on google cloud platform (GCP) services like compute engine, cloud load balancing, cloud storage, cloud SQL, stack driver monitoring
  • Developed a job server (REST API, spring boot, ORACLE DB) and job shell for job submission, job profile storage, job data (HDFS) query/monitoring.
  • Extracted files from CouchDB through Sqoop and placed in HDFS and processed.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats like text files and CSV files.
  • Created concurrent access for Hive tables with shared and exclusive locking that can be enabled in Hive with the help of the ZooKeeper implementation in the cluster.
  • Storing and loading the data from HDFS to Amazon S3 and backing up the Namespace data into NFS.
  • Implemented Name Node backup using NFS. This was done for High availability.
  • Spun up HDInsight clusters and used Hadoop ecosystem tools like Kafka, Spark and Databricks for real-time streaming analytics, and Sqoop, Pig, Hive and Cosmos DB for batch jobs.
  • Implemented Spark RDD transformations to Map business analysis and apply actions on top of transformations.
  • Written Storm topology to accept the events from Kafka producer and emit into Cassandra DB.
  • Created POC using Spark SQL and MLlib libraries.
  • Experienced in managing and reviewing Hadoop log files.
  • Integrated custom visuals based on business requirements using Power BI Desktop
  • Provided continued development and maintenance of bug fixes for the existing and new Power BI Reports
  • Transform data by running a Python activity in Azure Databricks.
  • Configure Project Environment for Flex - Spring Java Communication using BlazeDS Remoting.
  • Used Ajax/JSON POST calls for accessing Spring methods.
  • Server-side templating languages such as Jinja2 and Mako were used in the technology stack for development.
  • Worked closely with EC2 infrastructure teams to troubleshoot complex issues.
  • Worked with the AWS cloud and created EMR clusters with Spark to analyze and process raw data and access data from S3 buckets.
  • Involved in installing EMR clusters on AWS.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in Amazon S3 bucket.
  • Designed the NiFi/HBase pipeline to collect the processed customer data into HBase tables.
  • Apply Transformation rules on the top of Data Frames.
  • Migrating on-prem ETLs from MS SQL server to Azure Cloud using Azure Data Factory and Databricks.
  • Involved in SQL Server configuration, administration, implementation and troubleshooting for business work. Migrated existing self-service reports and ad-hoc reports to Power BI.
  • Developed custom calculated measures using DAX in Power BI to satisfy business requirements.
  • Worked with different file formats like TEXTFILE, AVROFILE, ORC, and PARQUET for Hive querying and processing.
  • Developed Hive UDFs and UDAFs for rating aggregation.
  • Used spring annotations as well as xml configuration for dependency injection and Spring Batch for running batch jobs.
  • Developed java client API for CRUD and analytical Operations by building a restful server and exposing data from No-SQL databases like Cassandra via rest protocol.
  • Created Hive tables and involved in data loading and writing Hive UDFs.
  • Worked extensively with Sqoop to move data from DB2 and Teradata to HDFS.
  • Experienced in developing web-based applications using Python, Django, AWS, Jinja, WSGI, PostgreSQL, Redis, HTML, CSS, JavaScript, JQuery, and XML.
  • Collected the logs data from web servers and integrated in to HDFS using Kafka.
  • Provided ad-hoc queries and data metrics to the Business Users using Hive, Impala.
  • Worked on various performance optimizations like using distributed cache for small datasets, partition, bucketing in hive, map side joins etc.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Extensive experience with frameworks including Struts, ORM (Hibernate) and Spring (Spring MVC, Spring AOP, Spring Context Dependency Injection, Spring JDBC, Spring DAO, Spring ORM, Spring Security and Spring Boot).
  • Implemented row-level updates and real-time analytics using CQL on Cassandra data.
  • Used Cassandra (CQL) with Java APIs to retrieve data from Cassandra tables.
  • Created notebooks in Databricks to pull the data from S3, process it with the transformation rules and load the data back to the persistence area in S3 in Apache Parquet format.
  • Worked on analyzing and examining customer behavioral data using Cassandra.
  • Worked on Solr configuration and customizations based on requirements.
  • Indexed documents using Apache Solr.
  • Extensively use Zookeeper as job scheduler for Spark Jobs.
  • Created basic reports using Confidential files as the source to fetch the data in Power BI. Designed and developed Power BI graphical and visualization solutions based on business requirement documents and plans for creating interactive dashboards.
  • Creating Databricks notebooks using SQL and Python and automating notebooks using jobs.
  • Worked with BI teams in generating reports in Tableau.
  • Developed different REST APIs in the Flask framework with Jinja templates, using Python scripting.
  • Web application development using HTML, CSS, SASS, JavaScript, JQuery, AJAX, JSON, Bootstrap and jinja2.
  • Responsible for design and developing Persistence classes using Hibernate, and spring boot data Template frameworks to save data in database tables.
  • Used JIRA for bug tracking and CVS for version control.
  • Built metric reports, identified the formulas and functionality of the dashboard reports, and digitized the metric dashboards with the Power BI service.
  • Generated custom and parameterized reports using SSRS. Created various visualizations like waterfall, funnel, matrix visualization, scatter plots, combo charts, gauges, cards and KPI. Knowledge of publishing the Power BI Desktop models to Power BI Service to create highly informative dashboards, collaborate using workspaces, apps, and to get quick insights about datasets. Experience in publishing Power BI Desktop reports created in Report view to the Power BI service.
  • Met wif business/user groups to understand the requirement for new Data Lake Project.
  • Worked in Agile Iterative sessions to create Hadoop Data Lake for the client.
  • Defined the reference architecture for Big Data Hadoop to maintain structured and unstructured data within the enterprise.
  • Led the efforts to develop and deliver the data architecture plan and data models for the multiple data warehouses and data marts attached to the Data Lake project.
  • Created Talend jobs to copy the files from one server to another and utilized Talend FTP components
  • Used Power Query to acquire data and Power BI desktop for designing rich visuals.
  • Developed Spring Boot based microservices and implemented Spring Cloud/Netflix API architecture patterns (Eureka service discovery, configuration server).
  • Created Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote Java code to capture global map variables and use them in the jobs. Used ETL methodologies and best practices to create Talend ETL jobs. Followed and enhanced programming and naming standards.
  • Developed Talend jobs to populate the claims data to data warehouse - star schema.
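
For illustration, the sketch below is a minimal PySpark Structured Streaming job of the Kafka-to-HDFS kind referenced in this list (the job described above was written in Scala). The broker addresses, topic name and HDFS paths are hypothetical placeholders, and the spark-sql-kafka connector package is assumed to be available on the classpath.

  # Hypothetical PySpark Structured Streaming sketch: Kafka in, Parquet on HDFS out.
  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

  # Read the live event stream from Kafka (placeholder brokers and topic).
  raw = (
      spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "patient.events")
      .option("startingOffsets", "latest")
      .load()
  )

  # Kafka delivers key/value as binary; cast the payload to string for downstream parsing.
  events = raw.select(
      F.col("key").cast("string").alias("key"),
      F.col("value").cast("string").alias("payload"),
      F.col("timestamp")
  )

  # Write micro-batches to HDFS as Parquet, with checkpointing for fault tolerance.
  query = (
      events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/events/")
      .option("checkpointLocation", "hdfs:///checkpoints/kafka-to-hdfs/")
      .outputMode("append")
      .start()
  )

  query.awaitTermination()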

Environment: Hadoop, MapReduce, HDFS, PIG, Hive, Jinja2, Sqoop, Spring Boot, Oozie, Storm, Kafka, Spark, Spark Streaming, Databricks, Scala, Cassandra, Cloudera, ZooKeeper, AWS, Solr, Power BI, MySQL, Shell Scripting, Java, Tableau.

Confidential, Seattle WA

Big Data developer

Responsibilities:

  • Extensively migrated the existing architecture to Spark Streaming to process live streaming data.
  • Responsible for Spark Core configuration based on the type of input source.
  • Executed Spark code using Scala for Spark Streaming/SQL for faster processing of data.
  • Performed SQL joins among Hive tables to get input for the Spark batch process.
  • Gathered the business requirements from the business partners and subject matter experts.
  • Creating Reports in Looker based on Snowflake Connections
  • Developed PySpark code to mimic the transformations performed in the on-premise environment.
  • Analyzed the SQL scripts and designed solutions to implement them using PySpark.
  • Used Jinja2 if statements for control structure. Used PyChecker for testing code. Created custom new columns depending upon the use case while ingesting the data into the Hadoop lake using PySpark.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to find which of them better suits the current requirement.
  • Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
  • Implemented Spark using Scala and used Pyspark using Python for faster testing and processing of data.
  • Designed multiple Python packages that were used within a large ETL process to load 2 TB of data from an existing Oracle database into a new PostgreSQL cluster.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Loading data from the Linux file system to HDFS and vice versa.
  • Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, writing results back into OLTP through Sqoop.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
  • Implementing advanced procedures like text analytics and processing using the in-memory computing capabilities like Apache Spark written in Scala.
  • Installed and monitored Hadoop ecosystems tools on multiple operating systems like Ubuntu, CentOS.
  • Developed Scala scripts using both Data frames/SQL/Datasets and RDD/MapReduce in Spark for Data aggregation, queries and writing data back into OLTP system through Sqoop.
  • Extensively use Zookeeper as job scheduler for Spark Jobs.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Built the web application by using Python, Django, AWS, MongoDB, Jinja, WSGI, Fabric, PostgreSQL, and Redis.
  • Experience working with the Azure SQL Database Import and Export Service.
  • Experience in deploying SQL databases in Azure.
  • Experience in moving data in and out of Windows Azure SQL Databases and Blob Storage.
  • Experience with Azure Synapse, an analytics service that brings together data warehousing and Big Data analytics with a unified experience to ingest, prepare, manage and serve data for BI and machine learning needs.
  • Experience in designing Kafka for multi data center cluster and monitoring it.
  • Designed number of partitions and replication factor for Kafka topics based on business requirements.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using python (PySpark).
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
  • Experience on Kafka and Spark integration for real time data processing.
  • Developed Kafka producer and consumer components for real time data processing.
  • Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.
  • Experience in configuring, designing, implementing and monitoring Kafka clusters and connectors.
  • Involved in loading data from the UNIX file system to HDFS using shell scripting.
  • Hands-on experience in Linux shell scripting.
  • Created Airflow scheduling scripts in Python (an illustrative DAG sketch follows this list) and worked extensively on Sqooping a wide range of data sets.
  • Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
  • Importing and exporting data into HDFS from Oracle database using NiFi.
  • Started using Apache NiFi to copy the data from the local file system to HDFS.
  • Worked on NiFi data pipelines to process large sets of data and configured lookups for data validation and integrity.
  • Worked with different file formats like JSON, Avro and Parquet.
  • Developed different REST APIs in the Flask framework with Jinja templates, using Python scripting.
  • Experienced in using Apache Hue and Ambari to manage and monitor the Hadoop clusters.
  • Experienced in using version control systems like SVN and Git, the build tool Maven, and the continuous integration tool Jenkins.
  • Good experience in using the relational databases Oracle, SQL Server and PostgreSQL.
  • Worked with the Agile, Scrum and Confidential software development frameworks for managing product development.
  • Used Ambari to monitor node health and the status of jobs in Hadoop clusters.
  • Implemented Kerberos for strong authentication to provide data security.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Experience in creating dashboards and generating reports using Tableau by connecting to tables in Hive and HBase.
  • Created Sqoop jobs to populate data present in relational databases to Hive tables.
  • Experience in importing and exporting data using Sqoop from HDFS/Hive/HBase to relational database systems and vice versa. Skilled in data migration and data generation in the Big Data ecosystem.
  • Oracle SQL tuning using explain plan.
  • Manipulate, serialize, model data in multiple forms like JSON, XML.
  • Involved in setting up MapReduce 1 and MapReduce 2.
  • Prepared Avro schema files for generating Hive tables.
  • Used Impala connectivity from the user interface (UI) and queried the results using Impala QL.
  • Worked on physical transformations of data model which involved in creating Tables, Indexes, Joins, Views and Partitions.
  • Involved in analysis, design, system architectural design, process interface design, and documentation.
  • Used Jira for bug tracking and Bit Bucket to check-in and checkout code changes.
  • Involved in Cassandra Data modelling to create key spaces and tables in multi Data Center DSE Cassandra DB.
  • Utilized Agile and Scrum methodology to help manage and organize a team of developers, with regular code review sessions.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and Impala with Elasticsearch.
  • Worked closely with different business teams to gather requirements and prepare functional and technical documents and the UAT process for creating data quality rules in Cosmos & Cosmos Streams.
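
A minimal, illustrative Airflow DAG for the kind of scheduling described above (a Sqoop ingest followed by a Spark job) is sketched below. The connection string, table name, file paths and schedule are hypothetical placeholders, and Airflow 2.x's BashOperator is assumed.

  # Hypothetical Airflow DAG sketch: nightly Sqoop import followed by a PySpark transform.
  from datetime import datetime, timedelta

  from airflow import DAG
  from airflow.operators.bash import BashOperator

  default_args = {
      "owner": "data-engineering",
      "retries": 2,
      "retry_delay": timedelta(minutes=10),
  }

  with DAG(
      dag_id="daily_oracle_ingest",
      default_args=default_args,
      start_date=datetime(2021, 1, 1),
      schedule_interval="0 2 * * *",   # run nightly at 02:00
      catchup=False,
  ) as dag:

      # Ingest a relational table into HDFS with Sqoop (placeholder connection and table).
      sqoop_import = BashOperator(
          task_id="sqoop_import_orders",
          bash_command=(
              "sqoop import "
              "--connect jdbc:oracle:thin:@//dbhost:1521/ORCL "
              "--username etl_user --password-file /user/etl/.pwd "
              "--table ORDERS --target-dir /data/raw/orders/{{ ds }} -m 4"
          ),
      )

      # Transform the ingested data with a PySpark job (placeholder script path).
      spark_transform = BashOperator(
          task_id="transform_orders",
          bash_command="spark-submit /opt/jobs/transform_orders.py --date {{ ds }}",
      )

      sqoop_import >> spark_transform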

Environment: Hadoop, HDFS, PIG, Hive, Sqoop, Jinja, Oozie, Cloudera, ZooKeeper, Oracle, Shell Scripting, NiFi, Unix, Linux, BigSQL.

Confidential, Dallas TX

Big Data developer

Responsibilities:

  • Installed and configured Hadoop Map Reduce, HDFS, developed multiple Map Reduce jobs in java for data cleaning and preprocessing.
  • Migrated existing SQL queries to HiveQL queries to move to big data analytical platform.
  • Integrated Cassandra file system to Hadoop using Map Reduce to perform analytics on Cassandra data.
  • Installed and configured Cassandra DSE multi-node, multi-data center cluster.
  • Developed the application using the Struts framework, which leverages the classical Model View Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams and activity diagrams were used.
  • Participated in requirement gathering and converting the requirements into technical specifications.
  • Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
  • Created Business Logic using Servlets, Session beans and deployed them on Web logic server.
  • Wrote complex SQL queries and stored procedures.
  • Developed the XML Schema and Amazon Web services for the data maintenance and structures.
  • Worked on analyzing, writing Hadoop MapReduce jobs using Java API, Pig and hive.
  • Selecting the appropriate AWS service based upon data,compute, system requirements.
  • Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
  • Experience in creating integration between Hive and HBase for effective usage, and performed MRUnit testing for the MapReduce jobs.
  • Gained good experience with NoSQL databases like MongoDB.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database.
  • Designed and implemented a 24 node Cassandra cluster for single point inventory application.
  • Analyzed the performance of the Cassandra cluster using nodetool tpstats and cfstats for thread analysis and latency analysis.
  • Implemented Real time analytics on Cassandra data using thrift API.
  • Responsible to manage data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Worked on installing cluster, commissioning & decommissioning of data node, name node recovery, capacity planning, and slots configuration.
  • Loaded and transformed large data sets into HDFS using Hadoop fs commands.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Implemented UDFs and UDAFs in Java and Python for Hive to process data that can't be handled with Hive's built-in functions (an illustrative Python TRANSFORM script follows this list).
  • Performed various performance optimizations like using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop for analysis, visualization and to generate report.
  • Involved in writing optimized Pig scripts, as well as developing and testing Pig Latin scripts.
  • Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
  • Designed the logical and physical data model and wrote DML scripts for the Oracle 9i database.
  • Used the Hibernate ORM framework with the Spring framework for data persistence.
  • Wrote test cases in JUnit for unit testing of classes.
  • Involved in templates and screens in HTML and JavaScript.
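
As an illustration of the Python-for-Hive work mentioned above, the sketch below shows one common way to plug Python logic into HiveQL: a streaming script invoked via SELECT TRANSFORM(...) USING 'python clean_events.py'. The column layout and cleansing rules are hypothetical placeholders.

  #!/usr/bin/env python
  # Hypothetical Hive TRANSFORM script: Hive streams rows as tab-separated values
  # on stdin and expects tab-separated rows back on stdout.
  import sys

  for line in sys.stdin:
      fields = line.rstrip("\n").split("\t")
      if len(fields) < 3:
          continue  # drop malformed rows

      user_id, raw_amount, status = fields[0], fields[1], fields[2]

      # Custom logic not easily expressed with built-in Hive functions:
      # normalize the amount and collapse status codes into a simple flag.
      try:
          amount = round(float(raw_amount), 2)
      except ValueError:
          amount = 0.0
      is_active = "1" if status.upper() in ("A", "ACTIVE") else "0"

      # Emit the cleaned, tab-separated row back to Hive.
      print("\t".join([user_id, str(amount), is_active]))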

Environment: Java, HDFS, Cassandra, Map Reduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive.

Confidential, Dallas TX

Data Engineer

Responsibilities:

  • Worked with different types of subqueries (correlated, inline and nested), cursors, REF cursors and loops to suit the business logic.
  • Expertise in using Oracle Streams for site-to-site replication.
  • Generated SQL and PL/SQL scripts involving Tables, Views, Primary keys, Indexes, Constraints, Sequences and Synonyms.
  • Used Java and HTML for back-end support.
  • Used normalization techniques to fine-tune the database to reduce data redundancy, and fixed bugs within packages and stored procedures using Explain Plan and DBMS_OUTPUT.
  • Mostly involved in code reviews, debugging and troubleshooting performance issues using optimization techniques.
  • Wrote UNIX shell scripts to create temporary tables, schedule batch jobs, run diagnostics, automate processes, handle administration tasks and deploy Oracle Forms and Reports to production servers.
  • Worked on Oracle Data Integrator (ODI) as a custom Data warehouse development for customer's information system data.
  • Worked on Oracle OLAP tool to produce analytic measures, including time-series calculations, financial models, and forecasts.
  • Created Complex ETL Packages using SSIS to extract data from staging tables to partitioned tables wif incremental load.
  • Written and edited shell scripts to generate ad hoc reports and to automate the execution of the procedures.
  • Worked with Java developers to repair PL/SQL packages and improve processing time through code optimizations and indexes.

Environment: Oracle 9i, Perl Scripting, ODI (Oracle Data Integrator), SQL Integration Services (SSIS), PL/SQL TOAD, SQL*LOADER, Java, HTML, UNIX, MS Access, Putty

Confidential

Data Engineer

Responsibilities:

  • Managed Enterprise and Distributed Data Warehouses in Developmental and Production Environments using SQL2005.
  • Optimizing the current ETL process, identifying and eliminating bottle necks and decreasing data latency.
  • Created a synchronized mechanism for data management in SQL Server.
  • Wrote Complex Stored Procedures, Triggers, Views and Queries.
  • Created indexes, Constraints and rules on database objects.
  • Migrated data from Oracle to a SQL Server database using ODBC connectivity.
  • Optimized schema, performance, and capacity planning of various data transformation processes related to the reports.
  • Exported and imported data from flat files and CSV files to/from the SQL Server database using SSIS.
  • Used the UNIX environment for performing testing.
  • Created databases and database objects like tables, stored procedures, views, triggers, rules, defaults, user-defined data types and functions.
  • Involved in implementation, maintenance and monitoring of database backups, database space management, error log monitoring, patch application, etc.
  • Troubleshot SQL Server performance issues for internal and external customers.
  • Created Ad-Hoc reports for business users using SSRS
  • Built the tabs with various charts from the Show Me toolbar (line chart, pie chart, bar chart).
  • Automated data fetches using UNIX shell scripts.
  • Wrote PL/SQL code using the technical and functional specifications.

Environment: PL/SQL, SQL Developer, Toad, SQL*Plus, Oracle 10g/11g, Erwin, UNIX.

Confidential

Java Developer

Responsibilities:

  • Involved in all the phases of (SDLC) Software Development Life Cycle including analysis, designing, coding, testing and deployment of the application.
  • Developed class diagrams, sequence diagrams and state diagrams using Rational Rose.
  • Developed user interface using JSP, JSP Tag libraries JSTL, HTML, CSS, and Java Script to simplify the complexities of the application.
  • Adapted various design patterns like Business Delegate, Data Access Objects, and MVC.
  • Used the Spring framework to implement the MVC architecture.
  • Implemented layout management using the Struts Tiles Framework.
  • Used the Struts validation Framework in the presentation layer.
  • Used Core Spring framework for Dependency injection.
  • Developed JPA mapping to the Database tables to access the data from the Oracle database.
  • Developed DAO to handle queries using JPA-Hibernate and Transfer objects.
  • Monitored the error logs using Log4j and fixed the problems.
  • The batch framework made heavy use of XML/XSL transforms.
  • Used Ant scripts to fetch, build and deploy the application to the development environment.
  • Used Eclipse IDE for writing code.
  • Creating JUnit test case design logic and implementation throughout application.
  • Extensively used Clear Case for version controlling.

Environment: Java 6, J2EE 5, Hibernate 3.0(JPA), Spring, XML, JSP, JSTL, CSS, JavaScript, HTML, AJAX, JUnit, Oracle 10g, Log4J 1.2.1, Eclipse 3.4, UNIX.
