Hadoop Developer Resume

Duluth, GA

SUMMARY:

  • 5+ years of total IT experience spanning Java development, web application development, database management, and Big Data ecosystem technologies.
  • 3+ years of strong, hands-on experience developing applications with Big Data technologies such as Hadoop, Spark, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Cassandra, Hortonworks, Cloudera, Mahout, Avro, and Scala.
  • Worked on migrating MapReduce programs into Spark transformations using PySpark and Scala (see the PySpark sketch after this list).
  • Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations applied during the ingestion process itself.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Hands-on experience with Hadoop applications (administration, configuration management, monitoring, debugging, and performance tuning).
  • Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
  • Very good experience designing and implementing MapReduce jobs to support distributed processing of large data sets on the Hadoop cluster.
  • Good experience transferring data from MySQL to Solr using the DataImportHandler in standalone mode.
  • Experience setting up Hadoop environments in the cloud (AWS, Oracle) and spinning up new services on existing nodes.
  • Good experience importing data from MySQL to Solr using Sqoop in the Hadoop environment.
  • Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
  • Worked with highly unstructured and semi-structured data of 90 TB (270 TB with a replication factor of 3).
  • Implemented commissioning and decommissioning of new nodes on the existing cluster.
  • Extracted data from relational databases (SQL Server, Oracle, MySQL) into HDFS using Sqoop.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
  • Developed Hive scripts for end-user/analyst requirements to perform ad hoc analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the table DDL sketch after this list).
  • Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
  • Developed UDFs in Java and Python as needed for use in Pig and Hive queries (see the Python UDF sketch after this list).
  • Experience using SequenceFile, RCFile, Parquet, Avro, and HAR file formats.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Implemented authentication using Kerberos and authorization using Apache Sentry.
  • Worked with the admin team on designing and carrying out the upgrade from CDH 4 to CDH 5.
  • Good knowledge of Amazon Web Services components such as EC2, EMR, and S3.
  • Experience with and knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Good working knowledge of BI tools such as Tableau and Birst.
  • Extensive experience with SQL, PL/SQL, and database concepts.
  • Proactive and well organized, with effective time management and problem-solving skills.
  • Experience and knowledge of microservices, Docker containers, and Linux commands.
  • Experienced in shell scripting to execute commands and notify the group of results via IM.
  • Good interpersonal skills and ability to work as part of a team; exceptional ability to learn and master new technologies and to deliver results on short deadlines.
  • Participated in requirement analysis, reviews, and working sessions to understand the requirements and system design.
  • Good experience designing relational databases and creating ER diagrams.
  • Trained in machine learning concepts in Python and NLP.
  • Worked on a private machine learning project to improve seat filling.
  • Good experience in website scraping and loading data into an RDBMS.
  • Advanced knowledge of performance troubleshooting and tuning of MySQL clusters.
  • Very good experience in query tuning to decrease latency.
  • Strong knowledge of Oracle database administration issues (i.e., schemas, users, and security).
  • Experienced in bulk insert, bulk copy, dynamic SQL, XML query methods (XML parsing), indexing and optimization, and stored procedures.
  • Experienced in DBA tasks such as database resizing, encryption, and security.
  • Expertise in relational database administration, including configuration, implementation, data modeling, maintenance, redundancy/HA, security, troubleshooting and performance tuning, upgrades, database/data/server migrations, and SQL tuning and troubleshooting of database operations.
  • Experienced in capacity planning and disk space management.
  • Experience troubleshooting Linux/Unix operating systems for database performance.
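
For illustration, a minimal PySpark sketch of the kind of MapReduce-to-Spark migration mentioned above (a word-count style job rewritten as Spark transformations); the application name and HDFS paths are hypothetical placeholders, not project artifacts:

    from pyspark.sql import SparkSession

    # Hypothetical example: a classic word-count MapReduce job expressed as
    # Spark RDD transformations, running on YARN against data in HDFS.
    spark = (SparkSession.builder
             .appName("mr-to-spark-migration")          # illustrative name
             .getOrCreate())

    lines = spark.sparkContext.textFile("hdfs:///data/weblogs/part-*")  # assumed input

    counts = (lines
              .flatMap(lambda line: line.split())       # "map" side: emit tokens
              .map(lambda token: (token, 1))            # key/value pairs
              .reduceByKey(lambda a, b: a + b))         # "reduce" side: sum per key

    counts.saveAsTextFile("hdfs:///data/weblogs/token_counts")          # assumed output
    spark.stop()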
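
A sketch of the partitioned external Hive table design referred to above, issued here through Spark SQL with Hive support; the database, table, columns, and locations are assumptions for illustration only:

    from pyspark.sql import SparkSession

    # Hypothetical DDL for a partitioned external Hive table; a separately run
    # Sqoop incremental import would land new files under a date-stamped
    # directory, after which the partition is registered as shown below.
    spark = (SparkSession.builder
             .appName("hive-external-table-sketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_ext (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///warehouse/external/orders'
    """)

    spark.sql("""
        ALTER TABLE sales_db.orders_ext
        ADD IF NOT EXISTS PARTITION (load_date='2016-01-01')
        LOCATION 'hdfs:///warehouse/external/orders/load_date=2016-01-01'
    """)
    spark.stop()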
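
A minimal sketch of a Python UDF of the kind mentioned above, used through Hive's TRANSFORM (streaming) mechanism; the column layout and script name are assumed:

    #!/usr/bin/env python
    # Hypothetical Hive streaming UDF: reads tab-separated rows from stdin,
    # normalizes the second column to upper case, and writes rows to stdout.
    # Used from Hive roughly as:
    #   ADD FILE normalize.py;
    #   SELECT TRANSFORM(id, name) USING 'python normalize.py' AS (id, name) FROM src;
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            fields[1] = fields[1].strip().upper()
        print("\t".join(fields))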

TECHNICAL SKILLS:

Operating Systems: Linux (Ubuntu, CentOS), Windows, Mac OS

Hadoop Ecosystem: Hadoop, MapReduce, Solr, YARN, HDFS, HBase, Impala, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Spark, Scala, Zeppelin

NoSQL Databases: HBase, Cassandra, MongoDB, GraphDB

Programming Languages: C, Python, Scala, Core Java, J2EE (Servlets, JSP, JDBC, JavaBeans, EJB), C#, ASP.NET

Frameworks: Spring, Hibernate

Documentation: Atlassian-Confluence & JIRA

Repositories: BitBucket

Web Technologies: HTML, CSS, XML, JavaScript, Maven

Scripting Languages: JavaScript, UNIX Shell, Python

Databases: Oracle 11g & 12c, MS Access, MySQL, SQL Server 2000/2005/2008/2012, Teradata

SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS).

IDE: Eclipse, Visual Studio, IDLE, IntelliJ

Web Services: Restful, SOAP

Tools & Methodologies: Bugzilla, QuickTest Pro (QTP) 9.2, Selenium, Quality Center, TestLink, TWS, SPSS, SAS, Documentum, Tableau, Mahout; Agile, UML, Design Patterns

PROFESSIONAL EXPERIENCE:

Confidential, Duluth, GA

Hadoop Developer

Responsibilities:

  • Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Analyzed the SQL scripts and designed the solution to implement them using PySpark.
  • Implemented Spark jobs using PySpark and Spark SQL for faster testing and processing of data (see the S3-to-DataFrame sketch after this list).
  • Imported data from AWS S3 into Spark DataFrames and performed transformations and actions on them.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables (a SerDe DDL sketch also follows this list).
  • Used HBase to store audit data for future analysis.
  • Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for high data volumes.
  • Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Extensively involved in the design phase and delivered design documents.
  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • Involved in validating the aggregate table based on the rollup process documented in the data mapping; developed HiveQL and Spark RDD/SQL code and automated the flow using shell scripting.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Developed database triggers and procedures to update the real-time cash balances.
  • Worked closely with the testing team to create new test cases, and created the use cases for the module before the testing phase.
  • Coordinated work with DB team, QA team, Business Analysts and Client Reps to complete the client requirements efficiently.
  • Involved in the performance tuning of PL/SQL statements.
  • Involved in fetching brand data from social media applications such as Facebook and Twitter.
  • Developed and updated social media analytics dashboards on a regular basis.
  • Performed data mining investigations to find new insights related to customers.
  • Involved in forecasting based on current results and insights derived from data analysis.
  • Created a complete processing engine, based on Cloudera's distribution, enhanced for performance.
  • Managed and reviewed Hadoop log files.
  • Developed and generated insights based on brand conversations, which in turn helped effectively drive brand awareness, engagement, and traffic to social media pages.
  • Involved in identifying topics and trends and building context around the brand.
  • Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
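
For illustration, a hedged PySpark sketch of the S3-to-DataFrame flow described in the list above; the bucket, field names, and Hive table are assumptions rather than the project's actual objects:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical example: read weblog JSON from S3 into a DataFrame, apply
    # transformations, and persist an aggregate to a Hive table for analysis.
    spark = (SparkSession.builder
             .appName("s3-weblog-ingest-sketch")
             .enableHiveSupport()
             .getOrCreate())

    logs = spark.read.json("s3a://example-bucket/weblogs/*.json")    # assumed bucket/path

    daily_hits = (logs
                  .filter(F.col("status") == 200)                    # keep successful requests
                  .withColumn("day", F.to_date(F.col("timestamp")))  # derive a date column
                  .groupBy("day", "page")
                  .agg(F.count("*").alias("hits")))

    daily_hits.write.mode("overwrite").saveAsTable("analytics.daily_page_hits")
    spark.stop()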
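
Similarly, a sketch of exposing raw JSON as a Hive table through a JSON SerDe (here the Hive-bundled org.apache.hive.hcatalog.data.JsonSerDe, which requires the hive-hcatalog-core jar on the classpath); the table name, columns, and location are illustrative:

    from pyspark.sql import SparkSession

    # Hypothetical DDL: a Hive external table over JSON audit events, with the
    # SerDe handling deserialization at query time.
    spark = (SparkSession.builder
             .appName("json-serde-table-sketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.audit_events_json (
            event_id   STRING,
            event_type STRING,
            event_time STRING,
            payload    STRING
        )
        ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        STORED AS TEXTFILE
        LOCATION 'hdfs:///staging/audit_events'
    """)
    spark.stop()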

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g & 12c, Core Java, Cloudera, HDFS, Eclipse.

Confidential, Mobile, AL

Oracle Developer

Responsibilities:

  • Designed and developed Oracle packages, functions, and procedures, as well as Postgres functions.
  • Participated in the analysis phase and prepared the technical specification, development, and testing documents.
  • Involved in SQL query tuning and provided tuning recommendations for ERP jobs and time/CPU-consuming queries.
  • Created database objects such as tables, views, procedures, packages, and cursors using TOAD, PL/SQL Developer, SQL Developer, and SQL Navigator.
  • Ensured data warehouse and data mart designs efficiently support BI and end users.
  • Collaborated with BI teams to create reporting data structures.
  • Built and maintained Oracle PL/SQL packages and stored procedures (using TOAD and PL/SQL Developer), workflows, and SQR reports as requested by the client for processing and analysis.
  • Used ref cursors and collections to access complex data by joining many tables (see the sketch after this list).
  • Extensively involved in performing database, SQL, and API tuning and optimization.
  • Handled exceptions extensively for ease of debugging and for displaying error messages in the application.
  • Created numerous simple to complex queries involving self-joins, correlated subqueries, functions, cursors, and dynamic SQL.
  • Developed Oracle partitions and performed performance/query tuning.
  • Experienced in creating ETL transformations and jobs in Oracle using dynamic packages and procedures.
  • Responsible for Internet Native Banner Development and Self-Service Banner Programming.
  • SQL and PL/SQL development.
  • Responsible for creating new student, faculty, and administration APIs to enhance the registration process, academic records, alumni, the payment process, coursework submission, etc.
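
For illustration, a minimal Python (cx_Oracle) sketch of calling a packaged procedure that returns a ref cursor built by joining several tables, as mentioned above; the connection details, package, and procedure names are hypothetical placeholders:

    import cx_Oracle  # assumed driver; any DB-API driver with ref cursor support is similar

    # Hypothetical example: invoke a PL/SQL procedure that opens a SYS_REFCURSOR
    # over a multi-table join and iterate the rows on the client side.
    conn = cx_Oracle.connect("app_user", "app_password", "dbhost/ORCLPDB1")
    cur = conn.cursor()

    ref_cursor = conn.cursor()                      # OUT ref cursor parameter
    cur.callproc("student_pkg.get_registrations",   # hypothetical package.procedure
                 [2016, ref_cursor])                # e.g. term year + OUT cursor

    for row in ref_cursor:
        print(row)

    ref_cursor.close()
    cur.close()
    conn.close()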

Environment: Oracle 10g/11g, Oracle SQL Developer, TOAD, SQL*Loader, FTP, Linux, SQL.

Confidential

Oracle Developer

Responsibilities:

  • Collected business requirements from the team and translated them into technical specs and design docs for development.
  • Created Stored Procedures, Triggers, Indexes, User defined Functions, Constraints etc. on various database objects to obtain the required results.
  • Created complex ETL packages in Oracle to extract data from staging tables into partitioned tables with incremental loads.
  • Developed, monitored and deployed Oracle packages.
  • Responsible for Scheduling Jobs, Alerting and Maintaining Oracle packages.
  • Ensured performance, security, and availability of databases.
  • Worked on optimizing MySQL server performance, monitoring, and query tuning.
  • Responsible for creating MySQL user accounts.
  • Responsible for capturing long-running queries (a sketch follows this list).
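
A minimal sketch (assuming the mysql-connector-python driver) of capturing long-running queries by polling information_schema.PROCESSLIST; the connection details and the 60-second threshold are placeholders:

    import mysql.connector  # assumed driver

    # Hypothetical example: list active statements running longer than 60 seconds.
    conn = mysql.connector.connect(host="db-host", user="monitor", password="secret")
    cur = conn.cursor()

    cur.execute("""
        SELECT id, user, host, db, time, state, info
        FROM information_schema.PROCESSLIST
        WHERE command <> 'Sleep' AND time > 60
        ORDER BY time DESC
    """)

    for proc_id, user, host, db, secs, state, query in cur:
        print(f"{proc_id}: {secs}s [{state}] {query}")

    cur.close()
    conn.close()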

Environment: Oracle 11g, SQL, PL/SQL, TOAD, Oracle Developer, SQL*Loader, SQL*PLUS.

Confidential

Data Analyst

Responsibilities:

  • Responsible for loading data from Excel into the database using SQL*Loader.
  • Worked on creating views on sales, expenditure, and transport data.
  • Responsible for creating customized reports in SSRS using SQL Server 2008 R2 with properties such as chart controls, filters, interactive sorting, and SQL parameters.
  • Added and changed tables, columns, triggers, and T-SQL code; used indexes, views, triggers, and stored procedures to update and clean both existing data and new data to be added.
  • Created SSRS Report Model Projects in BI Studio as well as created, modified and managed various report models with multiple model objects, source fields and expressions.
  • Interacted with business users for gathering requirements for Project Dashboards.
  • Deployed SSRS reports to SharePoint sites for web-based access by multiple end users.
  • Designed and developed matrix and tabular reports with cascading and parameterized features, drill-down, drill-through, and custom groups using SSRS.

Environment: SQL Server 2008/2005 Enterprise Edition, SQL BI Suite (SSAS, SSIS, SSRS), VBScript, Enterprise Manager, XML, MS PowerPoint, OLAP, OLTP, MOSS 2007, MS Project, MS Access 2008, Windows Server 2008, Oracle.
