Big Data Engineer Resume
Allen, TX
SUMMARY:
- Overall 4+ years of IT experience working on software development projects.
- Excellent analytical, problem-solving, communication, and interpersonal skills; adapt to new technologies with ease; a good team player able to interact with individuals at all levels.
- Experience with Lucene index-based search in Elasticsearch and the ELK log-analytics stack (Elasticsearch, Logstash, Kibana).
- Designed NoSQL database schemas to help migrate legacy applications' datastores to Elasticsearch.
- Designed Elasticsearch, Logstash, and Kibana based logs and metrics pipelines and performed KPI-based cloud monitoring.
- Experienced in performing in-memory data processing for batch, real-time, and advanced analytics.
- Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data (see the sketch below).
- Good knowledge of Hadoop architecture and experienced in developing with HDFS, Hive, Pig, and Spark.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Developed a presentation layer using JSP, HTML, and CSS, with client-side validation in JavaScript.
- Worked with Kibana dashboards for overall build status with drill-down features and created real-time dashboards.
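Illustrative of the Spark Streaming and Kafka integration noted above, a minimal PySpark Structured Streaming sketch that consumes a Kafka topic and parses JSON payloads; the broker address, topic name, and payload schema are hypothetical placeholders, not details from any specific engagement.

    # Minimal PySpark Structured Streaming sketch: consume a Kafka topic and
    # print parsed records. Broker, topic, and schema are hypothetical
    # placeholders (requires the spark-sql-kafka package on the classpath).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Assumed schema of the JSON payload carried in each Kafka message.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("component", StringType()),
        StructField("timestamp", LongType()),
    ])

    # Read the raw stream from Kafka; the value column arrives as bytes.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", "events")                      # placeholder topic
           .load())

    # Parse the JSON payload into typed columns for downstream analytics.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Write the parsed stream to the console for demonstration purposes.
    query = events.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()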
TECHNICAL AND ANALYTICAL SKILLS:
Roles: Big Data Engineer, Spark /Hadoop Developer, Data Analyst
Programming: Python, R, C, SQL, Java
Tools: Spyder, IPython Notebook/Jupyter, Spark Notebook, Zeppelin notebook
Cloud: AWS/EMR/EC2/S3 (also direct-Hadoop-EC2)
Big Data: Elasticsearch, Spark, Hadoop, Hive, Pig, Sqoop
Databases/Query Languages: Oracle, SQL, PL/SQL, HiveQL, Spark SQL
Domain: Big Data, Data Mining, Data Analytics.
EXPERIENCE HISTORY:
Confidential
Big Data Engineer, Allen, TX
Responsibilities:
- Involved in configuring Elasticsearch, Logstash, and Kibana (ELK) stacks and in Elasticsearch performance tuning and optimization.
- Converted JSON feature data for the Elastic Stack, flowing from Logstash through Elasticsearch into Kibana.
- Indexed logs from the data lake into Elasticsearch using Spark for visualization in Kibana (see the sketch below).
- Performed data analytics using Elasticsearch, Logstash, and Kibana (ELK) to aggregate millions of lines of logs from Cloud DVR components into actionable, human-readable dashboards.
- Configured ZooKeeper, Kafka, and Logstash clusters for data ingestion and Elasticsearch performance optimization; worked on Kafka for live streaming of data.
- Worked with Kibana dashboards for overall build status with drill-down features and created real-time dashboards.
- Migrated data into Elasticsearch through ES-Spark integration and created mappings and indexes in Elasticsearch for quick retrieval.
- Troubleshot build issues with ELK and worked toward solutions.
- Used the Elasticsearch Curator API for data backup and restore.
Environment: Elasticsearch, Logstash, Kibana, Kafka, Python, IntelliJ, Hadoop, Spark.
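A minimal sketch of the ES-Spark indexing described above, assuming the elasticsearch-hadoop (elasticsearch-spark) connector is available; the data lake path, node address, and index name are hypothetical placeholders.

    # Minimal ES-Spark indexing sketch using the elasticsearch-hadoop connector
    # (the elasticsearch-spark jar must be on the Spark classpath). The data
    # lake path, node address, and index name are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("logs-to-es-sketch").getOrCreate()

    # Read raw JSON logs from the data lake (placeholder path).
    logs = spark.read.json("s3://data-lake/cloud-dvr/logs/")

    # Index the records into Elasticsearch for visualization in Kibana.
    (logs.write
         .format("org.elasticsearch.spark.sql")
         .option("es.nodes", "es-node:9200")  # placeholder Elasticsearch host
         .option("es.resource", "dvr-logs")   # placeholder index name
         .mode("append")
         .save())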
Confidential
Big Data Intern, Chicago
Responsibilities:
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, the HBase NoSQL database, and Sqoop.
- Imported and exported data in HDFS and Hive using Sqoop (see the sketch below).
- Experience with NoSQL databases.
- Extracted data from HBase through Sqoop, placed it in HDFS, and processed it.
- Wrote Hive UDFs to extract data from staging tables.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
- Involved in the regular Hadoop Cluster maintenance such as patching security holes and updating system packages.
- Managed Hadoop log files.
- Analyzed web log data using HiveQL.
Environment: Java, Eclipse, Hadoop, MapReduce, Hive, HBase, Linux, HDFS, Shell Scripting, MySQL.
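A minimal sketch of the kind of Sqoop import used to move relational data into HDFS/Hive, wrapped in a small Python driver for scripting; the JDBC URL, credentials, and table names are hypothetical placeholders.

    # Sketch of a Sqoop import wrapped in a small Python driver. The JDBC URL,
    # credentials, and table names are hypothetical placeholders.
    import subprocess

    sqoop_import = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/sales",  # placeholder JDBC URL
        "--username", "etl_user",                  # placeholder user
        "--password-file", "/user/etl/.sqoop.pw",  # placeholder HDFS password file
        "--table", "web_logs",                     # placeholder source table
        "--hive-import",                           # load directly into Hive
        "--hive-table", "staging.web_logs",        # placeholder Hive target
        "--num-mappers", "4",                      # parallel map tasks
    ]

    # Fail the job if Sqoop returns a non-zero exit code.
    subprocess.run(sqoop_import, check=True)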
Confidential
Java Developer/ Python Developer
Responsibilities:
- Worked on preparing LLD docs, test plans and code changes, then tested the changes.
- Wrote shell scripts to automate business processes.
- Designed and developed middleware components using the WebLogic Application Server: persistence registration, request-entry handling (controller), concurrency, and transaction objects.
- Prepared packages and moved them to the LIFE and PROD environments.
- Used JavaScript for client-side validations.
- Worked on Python OpenStack APIs and used NumPy for numerical analysis.
- Used Ajax and jQuery for transmitting JSON data objects between frontend and controllers.
- Developed a wrapper in Python for instantiating a multi-threaded application.
- Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
- Managed datasets using pandas data frames and MySQL; queried the MySQL database from Python using the Python MySQL connector (MySQLdb) package to retrieve information (see the sketch below).
Environment: Python 2.7, Django 1.4, J2EE (JDK 1.6, JSP, Servlets, JDBC), Struts, XML, JavaScript, Oracle 8i, WebLogic, Eclipse, ANT, CVS, Linux, Remedy, SNOW, Access-DB, CR-DB, and tracking tool.
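A minimal sketch of querying MySQL from Python with the MySQLdb package named above and loading the result into a pandas DataFrame; the host, credentials, and query are hypothetical placeholders.

    # Minimal sketch: query MySQL via MySQLdb and build a pandas DataFrame.
    # Host, credentials, and the query are hypothetical placeholders.
    import MySQLdb
    import pandas as pd

    conn = MySQLdb.connect(host="dbhost", user="app", passwd="secret", db="appdb")
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT id, name, created_at FROM customers LIMIT 100")
        rows = cursor.fetchall()
        # Column names come from the cursor description of the executed query.
        columns = [desc[0] for desc in cursor.description]
        df = pd.DataFrame(list(rows), columns=columns)
    finally:
        conn.close()

    print(df.head())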
Confidential
Java/J2EE developer
Responsibilities:
- Involved in the complete Software Development Life Cycle (SDLC) phases of the project.
- Involved in designing and developing dynamic web pages using HTML, CSS and JavaScript.
- Designed the Struts Action Servlets.
- Used JavaScript for client-side validation.
- Used Hibernate Framework for object relational mapping with the persistent database.
- Wrote JUnit test cases to test the functionality of each method in the DAO classes developed.
- Responsible for generating the build script using ANT that compiles the code, pre-compiles the JSPs, builds an EAR file, and deploys the application on the application server.
- Used the JBoss application server to deploy the application.
- Used SVN version control to maintain code versions.
- Used Log4J for logging of messages in all environments.
Environment: Java, Servlets, HTML, JavaScript, CSS, Struts, Hibernate, MySQL, SVN, Log4j, JBoss application server, Ant, Eclipse.